Arrays in TAP_SCHEMA

Patrick Dowler pdowler.cadc at gmail.com
Wed May 31 19:54:10 CEST 2017


The last thing I am trying to include in TAP-1.1 is the addition of
xtype in the tap_schema, which would *almost* make the type system the
same -- arraysize being more limited in TAP-1.1 than in
VOTable/VODataService.

The first thing to accomplish was to remove "size" and shift services
and clients to something else: arraysize.

The intent is to support array types in TAP (I already do it in my
service), at least as something you can select but maybe not do
anything with in other parts of the query. I think you can do that now
without breaking clients (for 1-D arrays) and I do that in my
implementation.

The tap_schema in TAP-1.1 says arraysize is an integer, so it supports
1-d arrays only. We could change that to (allow) array-of-integer to
support multi-dimensional arrays and in most cases services would
continue to return a single number. I'd have to work out how drastic
that change looked but my feeling is that services could declare that
column to be length 1 or longer so existing services would not need to
be changed and services that wanted to expose multi-dim arrays would
make the change. Clients would have to be flexible.
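
A rough sketch of what "flexible" could mean on the client side, in
Python: accept either a single number (TAP-1.0 "size" style) or an
array of integers from tap_schema.columns.arraysize and turn both into
one shape. The function name normalize_arraysize and the use of a tuple
for the shape are assumptions made for illustration only; nothing here
comes from the TAP-1.1 draft.

```python
from typing import Optional, Sequence, Tuple, Union

# Hypothetical type for whatever a client reads from the arraysize column.
ArraySize = Union[None, int, Sequence[int]]


def normalize_arraysize(value: ArraySize) -> Optional[Tuple[int, ...]]:
    """Return the array shape as a tuple, or None for a scalar column."""
    if value is None:
        return None                      # no arraysize: scalar column
    if isinstance(value, str):
        raise TypeError("expected an integer or a sequence of integers")
    if isinstance(value, int):
        return (value,)                  # single number: 1-d, like "size"
    shape = tuple(int(v) for v in value)
    if not shape:
        raise ValueError("arraysize must have at least one dimension")
    return shape
```

With this, normalize_arraysize(256) and normalize_arraysize([256]) give
the same 1-d shape, so a client would not need to care which form a
service returns.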

That still doesn't allow for variable length and I have found that I
need to know variable length in my implementation and was going to add
a boolean column to tap_schema.columns for that. I had not intended to
push that into the standard as I'm not sure anyone else needs to know,
but the move from "adql" xtypes to VOTable datatype + DALI xtype
leaves that out of the tap_schema (except when implied by things like
xtype="polygon").

In my prototype of adding xtype to tap_schema and using tap_schema as
the definitive source of all metadata (impl design), I ended up
adding a DataType construct that looks like this:

DataType
- String datatype
- Integer arraysize
- Boolean varsize
- String xtype

If this was extended to

DataType
- String datatype
- Integer[] arraysize
- Boolean varsize
- String xtype

and we added varsize (name TBD) to tap_schema.columns then it would
match features of VOTable. We already nominally accepted that adding
xtype was OK for TAP-1.1 so I think the same would apply to varsize...
and I think we can specify tap_schema.columns.arraysize so that it is
length 1 (TAP-1.0 compat with "size") or longer and then other columns
that are 1- or multi-d arrays could be described. Services that don't
support arrays would simply have arraysize that was just like "size"
in TAP-1.0.
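
A minimal Python sketch of the extended DataType construct and of how
it could be rendered as a VOTable arraysize string. The four fields
mirror the list above; the method name votable_arraysize, and the rule
that varsize marks the last dimension as variable (with the stored
number as its maximum), are assumptions for illustration and are not
taken from the prototype or the TAP-1.1 draft.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataType:
    datatype: str                                 # VOTable datatype, e.g. "double"
    arraysize: Optional[Tuple[int, ...]] = None   # one entry per dimension
    varsize: bool = False                         # name TBD in the proposal
    xtype: Optional[str] = None                   # DALI xtype, e.g. "polygon"

    def votable_arraysize(self) -> Optional[str]:
        """Render a VOTable-style arraysize string, or None for a scalar."""
        dims = "x".join(str(n) for n in (self.arraysize or ()))
        if self.varsize:
            return dims + "*"    # assumed: "*" flags a variable last dimension
        return dims or None
```

For example, DataType("char", (256,), True) would render as "256*",
DataType("double", (2,)) as "2", and a service that does not support
arrays would only ever populate a single-entry arraysize.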

I agree with the goal of supporting array-typed columns, I think 1-d
arrays are OK now, and I think this is within the realm of the TAP-1.1
update. It would make TAPType in VODataService obsolete since we could
just use VOTableType.
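
To illustrate why VOTableType could be enough: a VOTable-style
arraysize string such as "2x3" or "256*" already carries both the shape
and the variable-length flag. The Python sketch below splits such a
string into a shape plus a varsize boolean. The function name
parse_votable_arraysize and the use of None for an unbounded last
dimension (as in "64x*" or "*") are assumptions for illustration; they
are not part of any of the standards discussed here.

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]


def parse_votable_arraysize(text: Optional[str]) -> Tuple[Shape, bool]:
    """Split a VOTable arraysize string into (shape, varsize)."""
    if text is None or not text.strip():
        return (), False                 # no arraysize: scalar value
    text = text.strip()
    varsize = text.endswith("*")
    if varsize:
        text = text[:-1]                 # drop the trailing "*"
    parts = text.split("x")
    dims = []
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        if part == "" and varsize and is_last:
            dims.append(None)            # unbounded variable last dimension
        else:
            dims.append(int(part))       # ValueError on malformed input
    return tuple(dims), varsize
```

Here "256*" gives ((256,), True) and "2x3" gives ((2, 3), False), which
is the arraysize and varsize pair of the DataType construct above.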

And I think this is actually a pretty small change and I'm almost
there in my prototype.

Thoughts?


Pat

On 31 May 2017 at 08:08, alberto micol <amicol.ivoa at googlemail.com> wrote:
> My use case regarding arraysize="2" in TAP is described here.
>
> SSA 1.1 requires, in the VOTable output, a single field for the spatial
> location,
> expressed as:
>
> <FIELD ID="SpatialLocation" name="SpatialLocation" datatype="double"
> ucd="pos.eq" utype="ssa:Char.SpatialAxis.Coverage.Location.Value"
> arraysize="2" unit="deg">
>
> At ESO we are implementing SSA on top of TAP.
>
> I therefore created the spatial location column in the TAP_SCHEMA.columns
> table declaring size=2,
> but TAP does not translate this information into the (naively) expected
> arraysize="2",
> for the reasons that Gregory explained.
>
> Hence, we are stuck with the development of the ESO SSAP...
> unless some smart DAL person comes up with a solution!
>
> Many thanks,
> Alberto
>
>
> On 31 May 2017, at 16:34, Grégory Mantelet <gmantele at ari.uni-heidelberg.de>
> wrote:
>
> Dear DAL members,
>
> Sorry to come back again with the "array" topic, but I have more and more
> requests for having arrays in my TAP-Library (and, personally, I will also
> need that quite soon) but I do not know how to proceed. Even though nothing
> formally forbids it, there is actually no possibility to declare arrays in
> TAP_SCHEMA...so in a way it is kind of preventing/forbidding the usage of
> arrays if nobody can really know that a column is an array.
>
> I have searched in TAP-1.0, the coming TAP-1.1 and in VODataService in the
> hope to find something leading us toward a solution. Here is what I found
> and my related questions:
>
>
> ## In TAP 1.0
>
> In REC-TAP-1.0, two columns of TAP_SCHEMA.columns let one specify the type of a
> published column, defined as follows:
>        - datatype - "ADQL datatype as in section 2.5"
>        - size          - "length of variable length datatypes"
>
> With the following additional description:
>
>        "Data types and how they map to VOTable datatypes are described in
> section 2.5
>         above. The “size” gives the length of variable length datatypes, for
> example
>         varchar(256); this size does not map to the VOTable arraysize
> attribute when the
>         latter specifies the size and shape of a multi-dimensional array."
>
> As written here, "size" does not aim to tell whether the value is a scalar
> or an array; it is just the N in CHAR(N), VARCHAR(N), BINARY(N) and
> VARBINARY(N).
>
>
> ## In TAP 1.1
>
> In WD-TAP-1.1, in addition to the above two columns, "arraysize" has been
> added. So the datatype descriptive columns are now:
>        - datatype  - ?? (the description disappeared in this WD)
>        - "size"       - ?? (idem)
>        - arraysize  - ?? (idem)
>
> With the following additional description:
>
>        "The arraysize column gives the length of variable length datatypes,
> for
>         example varchar(256); this arraysize does not map exactly to the
> VOTable
>         arraysize attribute because the latter can specify the size and
> shape of a
>         multi-dimensional array as well as the variable size.
>         [...]
>         In the next major version of TAP, the "size" column
>         will be removed."
>
> So, even in TAP-1.1 there will be no way to add information about arrays.
>
> ==> Furthermore, though I can understand the reason why "size" should be
> deprecated (collision with an ADQL reserved keyword....by the way, will we
> still have reserved keywords with the PEG grammar for ADQL?), is it really a
> good idea to call a column "arraysize" if it is not about an array?
>
> ==> And then, why have the same name as in VOTable if it does not do the
> same thing?
>
>
> ## In VODataService 1.1
>
> In REC-VODataService-1.1 (used to describe published columns in TAP's entry
> point '/tables'), the datatype of a column can be expressed using two types
> of type:
>        - VOTableType (e.g. <dataType xsi:type="vs:VOTableType"
> arraysize="*"> char </dataType>)
>        - TAPType         (e.g. <dataType xsi:type="vs:TAPType" size="8" >
> CHAR </dataType>)
>
> According to the XML schema of VODataService-1.1, TAPType is the only one
> that can have a "size" attribute defined as described in TAP 1.0 (i.e. "The
> length of the variable-length data type."). Ok, that makes sense since it is
> only something coming from TAP.
>
> ==> By the way, is it also planned to deprecate "size" from VODataService as
> in TAP-1.1?
>
> However, both VOTableType and TAPType can have an "arraysize" attribute
> defined as described in VOTable (i.e. an ArrayShape = " An expression of a
> the shape of a multi-dimensional array of the form LxNxM... where each value
> between gives the integer length of the array along a dimension. An asterisk
> (*) as the last dimension of the shape indicates that the length of the last
> axis is variable or undetermined.").
>
> So, here, we have a completely different definition of "arraysize" than in
> WD-TAP-1.1.
>
> ==> Is there a mistake here? If yes, which standard has to be updated:
> VODataService or TAP? And in which direction?
>
>
> ## To conclude,
>
> ==> considering these three documents and knowing that TAP-1.1 is still in
> WD, how can we declare arrays in TAP_SCHEMA (and /tables result)?
>
> I personally like to have something consistent and so I would go for
> re-defining the new column "arraysize" as in VODataService and VOTable.
>
> ==> But does it make sense to combine this VOTable piece of information with
> the datatypes of TAP (i.e. the so-called TAPType like VARCHAR, BIGINT, BLOB,
> ...)? If not, what other alternative(s) do we have?
>
> Cheers,
> Grégory
>
>



-- 
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada

