Arrays in TAP_SCHEMA

Grégory Mantelet gmantele at ari.uni-heidelberg.de
Tue Jun 6 11:17:51 CEST 2017


Pat,


On 05/31/2017 07:54 PM, Patrick Dowler wrote:
> The last thing I am trying to include in TAP-1.1 is the addition of
> xtype in the tap_schema, which would *almost* make the type system the
> same -- arraysize being more limited in TAP-1.1 than in
> VOTable/VODataService.


The addition of xtype may duplicate the TAPType (e.g. POINT, TIMESTAMP), 
but if you say we are going to deprecate TAPType and rather use 
VOTable-ish types, it is all good for me: I am ok with the addition of 
xtype.


> The first thing to accomplish was to remove "size" and shift services
> and clients to something else: arraysize.

Ok

> The intent is to support array types in TAP (I already do it in my
> service), at least as something you can select but maybe not do
> anything with in other parts of the query. I think you can do that now
> without breaking clients (for 1-D arrays) and I do that in my
> implementation.
>
> The tap_schema in TAP-1.1 says arraysize is an integer so supports 1-d
> array only.


Is there another version of the draft document for TAP 1.1 than:

http://www.ivoa.net/documents/TAP/20160428/WD-TAP-1.1-20160428.html

If not, nowhere does it say that "arraysize" is a 1-D array
length.....or at least it does not say so in section 4.3:

    The arraysize column gives the length of variable length datatypes,
    for example varchar(256); this arraysize does not map exactly to the
    VOTable arraysize attribute because the latter can specify the size
    and shape of a multi-dimensional array as well as the variable size.
    The "size" column is retained for backwards compatibility to TAP-1.0
    and must contain the same value as arraysize.


However, if there is another version of the TAP 1.1 document, could you
tell me where I can take a look at it?

And if you can guarantee that TAP 1.1's arraysize is clearly the length of
arrays (and not the length of variable-length datatypes, as "size" is),
I am happy with that and I can start adapting my TAP-Library in that
direction.


> We could change that to (allow) array-of-integer to
> support multi-dimensional arrays and in most cases services would
> continue to return a single number. I'd have to work out how drastic
> that change looked but my feeling is that services could declare that
> column to be length 1 or longer so existing services would not need to
> be changed and services that wanted to expose multi-dim arrays would
> make the change. Clients would have to be flexible.
>
> That still doesn't allow for variable length and I have found that I
> need to know variable length in my implementation and was going to add
> a boolean column to tap_schema.columns for that. I had not intended to
> push that into the standard as I'm not sure anyone else needs to know,
> but the move from "adql" xtypes to VOTable datatype + DALI xtype
> leaves that out of the tap_schema (except when implied by things like
> xtype="polygon").
>
> In my prototype of adding xtype to tap_schema and using tap_schema as
> the definitive source of all metadata (impl design), I ended up
> adding a DataType construct that looks like this:
>
> DataType
> - String datatype
> - Integer arraysize
> - Boolean varsize
> - String xtype
>
> If this was extended to
>
> DataType
> - String datatype
> - Integer[] arraysize
> - Boolean varsize
> - String xtype
>
> and we added varsize (name TBD) to tap_schema.columns then it would
> match features of VOTable. We already nominally accepted that adding
> xtype was OK for TAP-1.1 so I think the same would apply to varsize...
> and I think we can specify tap_schema.columns.arraysize so that it is
> length 1 (TAP-1.0 compat with "size") or longer and then other columns
> that are 1- or multi-d arrays could be described. Services that don't
> support arrays would simply have arraysize that was just like "size"
> in TAP-1.0.


For 1-D arrays, I agree that making arraysize an INTEGER makes perfect
sense, and that would be enough for TAP-1.1. But if later, in TAP-1.2 (or
TAP-2.0), we want to support multi-dim arrays, we would have to change
the type of arraysize (to VARCHAR or INTEGER[] as you suggest), which
would impose a fairly significant transition on TAP clients...another
one, right after TAP 1.1. For that reason, I think it would be better
to define arraysize as in VOTable (particularly if we are heading in that
direction in the future: VOTable-ish types): char(*) or VARCHAR.

If we agree to have a VARCHAR arraysize in TAP_SCHEMA.columns, we can
still say that in TAP-1.1 we only support 1-D arrays (and yes, it would
be just an integer stored in a string, which is ugly, I agree). And then in
TAP-1.2 or 2.0, we would still have the possibility to choose a better
string syntax for multi-dim arrays (a syntax other than the VOTable
one...probably simpler). Moreover, having a VARCHAR arraysize would
allow variable-length arrays as in VOTable: with a '*' for instance. I
think that it is a good compromise that works for TAP-1.1 but still
allows an easy and flexible evolution for later versions of the TAP
protocol.
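
To make the idea concrete, here is a minimal sketch (in Python) of how a
client might read such a VARCHAR arraysize under the 1-D restriction
discussed above. It is only an illustration, not something taken from the
TAP draft: the names ArraySize and parse_arraysize are hypothetical, and the
accepted syntax (an optional integer followed by an optional '*') is an
assumption borrowed from the 1-D subset of the VOTable arraysize syntax.

```python
import re
from typing import NamedTuple, Optional


class ArraySize(NamedTuple):
    """Hypothetical description of a 1-D array column."""
    length: Optional[int]  # fixed (or maximum) length; None if not given
    variable: bool         # True when the value ends with '*'


# Assumed TAP-1.1 restriction: one dimension only, optionally variable.
_ONE_DIM = re.compile(r"^(\d+)?(\*)?$")


def parse_arraysize(value: Optional[str]) -> Optional[ArraySize]:
    """Return None for a scalar column, else the 1-D array description."""
    if value is None or value.strip() == "":
        return None  # NULL or empty arraysize: not an array
    match = _ONE_DIM.match(value.strip())
    if match is None:
        # e.g. "2x3": multi-dim shapes are left to a later TAP version
        raise ValueError(f"unsupported arraysize: {value!r}")
    digits, star = match.groups()
    return ArraySize(int(digits) if digits else None, star is not None)
```

With this sketch, "2" would describe a fixed array of two values, "*" an
array of unspecified variable length, and a NULL arraysize a scalar. A later
version of the protocol could relax the pattern without changing the column
type.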

Anyway, I agree that it makes the datatypes less safe than what you suggest,
but I think that having more and more columns just to define a datatype
is becoming too complicated for so little benefit....but that is just my
preliminary point of view on the topic, so maybe I am wrong.


About the variable-length datatypes (e.g. CHAR, VARCHAR, ....), I think
it depends on the datatype system we choose. If we want to keep TAPType,
"size" could still be used for this purpose, but if we want to get rid of
"size", why not encode the length of these special datatypes directly
in the datatype, as databases do: CHAR(n), VARCHAR(n), .... Since the
"datatype" column is a VARCHAR, this is still possible, but
it of course means that TAP clients would have to slightly change the way
they parse datatypes (e.g. startsWith(...) instead of equals(...)).
Otherwise, if we choose a VOTable-ish datatype system, arraysize
is quite sufficient for variable-length datatypes, because that is already
how it is done in VOTable....am I wrong?
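
As an illustration of the first option, a client could split a datatype
value such as VARCHAR(256) into a type name and an optional length, roughly
as in the Python sketch below. The function name parse_datatype and the
exact pattern are assumptions made for this example only; nothing here is
defined by TAP or by my library.

```python
import re
from typing import Optional, Tuple

# Assumed syntax: a type name, optionally followed by "(n)" as in SQL.
_DATATYPE = re.compile(r"^\s*([A-Za-z_]+)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def parse_datatype(value: str) -> Tuple[str, Optional[int]]:
    """Split e.g. 'VARCHAR(256)' into ('VARCHAR', 256).

    The length is None when the datatype carries no '(n)' suffix.
    """
    match = _DATATYPE.match(value)
    if match is None:
        raise ValueError(f"unrecognised datatype: {value!r}")
    name, length = match.groups()
    # Normalise the name so that 'varchar' and 'VARCHAR' compare equal.
    return name.upper(), (int(length) if length else None)
```

A client that currently tests datatype.equals("VARCHAR") would instead
compare only the first element of the returned pair, which is the change in
parsing I mention above.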


Grégory


> I agree with the goal of supporting array-typed columns, I think 1-d
> arrays are OK now, and I think this is within the realm of the TAP-1.1
> update. It would make TAPType in VODataService obsolete since we could
> just use VOTableType.
>
> And I think this is actually a pretty small change and I'm almost
> there in my prototype.
>
> Thoughts?
>
>
> Pat
>
> On 31 May 2017 at 08:08, alberto micol <amicol.ivoa at googlemail.com> wrote:
>> My use case regarding arraysize="2" in TAP is described here.
>>
>> SSA 1.1 requires, in the VOTable output, a single field for the spatial
>> location,
>> expressed as:
>>
>> <FIELD ID="SpatialLocation" name="SpatialLocation" datatype="double"
>> ucd="pos.eq" utype="ssa:Char.SpatialAxis.Coverage.Location.Value"
>> arraysize="2" unit="deg">
>>
>> At ESO we are implementing SSA on top of TAP.
>>
>> I therefore created the spatial location column in the TAP_SCHEMA.columns
>> table declaring size=2,
>> but TAP does not translate this information into the (naively) expected
>> arraysize="2",
>> for the reasons that Gregory explained.
>>
>> Hence, we are stuck with the development of the ESO SSAP...
>> unless some smart DAL person comes up with a solution!
>>
>> Many thanks,
>> Alberto
>>
>>
>> On 31 May 2017, at 16:34, Grégory Mantelet <gmantele at ari.uni-heidelberg.de>
>> wrote:
>>
>> Dear DAL members,
>>
>> Sorry to come back again with the "array" topic, but I have more and more
>> requests for having arrays in my TAP-Library (and, personally, I will also
>> need that quite soon) but I do not know how to proceed. Even though nothing
>> formally forbids it, there is actually no possibility to declare arrays in
>> TAP_SCHEMA...so in a way it is kind of preventing/forbidding the usage of
>> arrays if nobody can really know that a column is an array.
>>
>> I have searched in TAP-1.0, the coming TAP-1.1 and in VODataService in the
>> hope to find something leading us toward a solution. Here is what I found
>> and my related questions:
>>
>>
>> ## In TAP 1.0
>>
>> In REC-TAP-1.0, two columns of TAP_SCHEMA.columns let one specify the type of a
>> published column, defined as follows:
>>         - datatype - "ADQL datatype as in section 2.5"
>>         - size          - "length of variable length datatypes"
>>
>> With the following additional description:
>>
>>         "Data types and how they map to VOTable datatypes are described in
>> section 2.5
>>          above. The “size” gives the length of variable length datatypes, for
>> example
>>          varchar(256); this size does not map to the VOTable arraysize
>> attribute when the
>>          latter specifies the size and shape of a multi-dimensional array."
>>
>> As written here, "size" does not aim to tell whether the value is a scalar
>> or an array ; it is just the N in CHAR(N), VARCHAR(N), BINARY(N) and
>> VARBINARY(N).
>>
>>
>> ## In TAP 1.1
>>
>> In WD-TAP-1.1, in addition to the above two columns, "arraysize" has been
>> added. So the datatype descriptive columns are now:
>>         - datatype  - ?? (the description disappeared in this WD)
>>         - "size"       - ?? (idem)
>>         - arraysize  - ?? (idem)
>>
>> With the following additional description:
>>
>>         "The arraysize column gives the length of variable length datatypes,
>> for
>>          example varchar(256); this arraysize does not map exactly to the
>> VOTable
>>          arraysize attribute because the latter can specify the size and
>> shape of a
>>          multi-dimensional array as well as the variable size.
>>          [...]
>>          In the next major version of TAP, the "size" column
>>          will be removed."
>>
>> So, even in TAP-1.1 there will be no way to add information about arrays.
>>
>> ==> Furthermore, though I can understand the reason why "size" should be
>> deprecated (collision with an ADQL reserved keyword....by the way, will we
>> still have reserved keywords with the PEG grammar for ADQL?), is it really a
>> good idea to call a column "arraysize" if it is not about an array?
>>
>> ==> And then, why have the same name as in VOTable if it does not do the
>> same thing?
>>
>>
>> ## In VODataService 1.1
>>
>> In REC-VODataService-1.1 (used to describe published columns in TAP's entry
>> point '/tables'), the datatype of a column can be expressed using two kinds
>> of type:
>>         - VOTableType (e.g. <dataType xsi:type="vs:VOTableType"
>> arraysize="*"> char </dataType>)
>>         - TAPType         (e.g. <dataType xsi:type="vs:TAPType" size="8" >
>> CHAR </dataType>)
>>
>> According to the XML schema of VODataService-1.1, TAPType is the only one
>> that can have a "size" attribute defined as described in TAP 1.0 (i.e. "The
>> length of the variable-length data type."). Ok, that makes sense since it is
>> only something coming from TAP.
>>
>> ==> By the way, is it also planned to deprecate "size" from VODataService as
>> in TAP-1.1?
>>
>> However, both VOTableType and TAPType can have an "arraysize" attribute
>> defined as described in VOTable (i.e. an ArrayShape = " An expression of a
>> the shape of a multi-dimensional array of the form LxNxM... where each value
>> between gives the integer length of the array along a dimension. An asterisk
>> (*) as the last dimension of the shape indicates that the length of the last
>> axis is variable or undetermined.").
>>
>> So, here, we have a completely different definition of "arraysize" than in
>> WD-TAP-1.1.
>>
>> ==> Is there a mistake here? If yes, which standard has to be updated:
>> VODataService or TAP? And in which direction?
>>
>>
>> ## To conclude,
>>
>> ==> considering these three documents and knowing that TAP-1.1 is still in
>> WD, how can we declare arrays in TAP_SCHEMA (and /tables result)?
>>
>> I personally like to have something consistent and so I would go for
>> re-defining the new column "arraysize" as in VODataService and VOTable.
>>
>> ==> But does it make sense to combine this VOTable piece of information with
>> the datatypes of TAP (i.e. the so-called TAPType like VARCHAR, BIGINT, BLOB,
>> ...)? If not, what other alternative(s) do we have?
>>
>> Cheers,
>> Grégory
>>
>>
>
>
