Array element access in ADQL

Marco Molinaro molinaro at oats.inaf.it
Fri Jul 7 08:55:23 CEST 2017


Hi DAL,
(replying after Gregory's but with Pat's TAP-reply in mind)

2017-07-06 13:24 GMT+02:00 Gregory Mantelet <gmantele at ari.uni-heidelberg.de>:
> Hi DAL,
>
> I agree with Pat: how to convey the knowledge to TAP clients that arrays are
> supported is a major issue?
> And personally, I am more thinking about what should I write in
> TAP_SCHEMA.columns.datatype for an array?

Do you mean, allowing for non homogeneous arrays?
Otherwise shouldn't datatype be the primitive VOTable-type of
the single array element?

> Once these points are solved, agreeing on a syntax with or without UDFs will
> be quite easy.
>
> PostgreSQL lets create an array with a string like '{12.3,125.987}' (1D) or
> '{{1,2},{3,4}}' (2D and more). So we can imagine any UDF working on a string
> like that. Even though a database system does not support arrays, with
> string manipulation it could be possible to simulate one. It is not ideal,
> indeed. But we could also state that a string with the above syntax would be
> just the serialisation of an array, which does not mean it has to be stored
> and manipulated like that in the database system.

I'd vote for the serialisation approach, for instance MySQL knows nothing
about arrays, but has recently introduced direct JSON serialisation that would
help in that sense.

> So, the major problem is that a new datatype has to be added in ADQL/TAP.

Goes down to VOTable, as Pat explained.

With the caveat that those fields will hardly be usable as query constraints,
at least in the general case. Of course specific implementations or
back ends can allow also for that.

Cheers,
     Marco

> Without this addition in the language, the only thing we can do is probably
> going on a solution based on strings: an array would be typed as a VARCHAR
> in TAP_SCHEMA. Setting a xtype="array" (or similar) would probably help the
> TAP clients to consider these strings as arrays (but xtype in TAP_SCHEMA may
> be possible only from TAP 1.1, is it right?). However it is also possible
> that a TAP implementation returns those arrays in VOTable (and FITS?) as
> arrays, which would allow clients like TOPCAT to interpret and manipulate
> them as such instead of simple strings.
>
> Grégory
>
>
>
> On 05/07/2017 17:43, Mark Taylor wrote:
>>
>> Pat and Walter,
>>
>> thanks for these pointers.  Something like PostgreSQL square
>> bracket or Oracle round bracket indexing obviously looks
>> intuitive and would be nice to have, and this option has come
>> up in our discussions.  But neither of these options can appear
>> directly in syntactically legal ADQL 2.0.
>>
>> I expect that DPAC would support some kind of standard array indexing
>> syntax along these lines in a future version of ADQL, but
>> it doesn't solve our immediate problem which is allowing array
>> element access in TAP queries in time for Gaia DR2 (April 2018).
>> I'm guessing that a revised ADQL version containing this
>> not-previously-discussed syntax is not likely on that timescale.
>> So unless I'm missing something we either have to supply this
>> capability using UDFs, or require syntactically-illegal ADQL in queries.
>>
>> Mark
>>
>> On Tue, 4 Jul 2017, Walter Landry wrote:
>>
>>> Oracle calls them Varrays
>>>
>>>    https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/05_colls.htm
>>>
>>> It looks like they use parentheses '()' to access elements.  SQLite
>>> does not support arrays.
>>>
>>>    https://www.sqlite.org/datatype3.html
>>>
>>> It looks like both Postgres and Oracle start indexing at 1 :(
>>>
>>> Cheers,
>>> Walter Landry
>>>
>>> Patrick Dowler <pdowler.cadc at gmail.com> wrote:
>>>>
>>>> PostgreSQL supports array types with normal looking array access by
>>>> index:
>>>>
>>>> https://www.postgresql.org/docs/9.6/static/arrays.html
>>>>
>>>> I haven't found a compelling reason to use something like this (yet)
>>>> but it has been around for some time. The tricky bit probably isn't so
>>>> much ADQL speciyfing it but rather TAPRegExt providing the means to
>>>> tell clients/users whether array types/access is supported and
>>>> possibly for which primitive types.
>>>>
>>>> Pat
>>>>
>>>> On 4 July 2017 at 05:03, Ger van Diepen <diepen at astron.nl> wrote:
>>>>>
>>>>> Hi Mark,
>>>>>
>>>>>
>>>>> You could take a look at Casacore's TaQL, the query language for its
>>>>> tables.
>>>>> It uses array syntax similar to Python.
>>>>>
>>>>> I understood that an ADQL query command is parsed and rewritten to a
>>>>> command
>>>>> for the underlying query language. That should make it possible to use
>>>>> python-like array syntax and rewrite it to UDFs as needed. Probably not
>>>>> needed for PostgreSQL that supports arrays.
>>>>>
>>>>> Note that TaQL also support units in a natural way.
>>>>>
>>>>> See http://casacore.github.io/casacore-notes/199.html
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> ger
>>>>>
>>>>>>>> Mark Taylor <M.B.Taylor at bristol.ac.uk> 04-Jul-17 13:43 >>>
>>>>>
>>>>>
>>>>> Dear DAL,
>>>>>
>>>>> Upcoming releases of the Gaia catalogue will contain some
>>>>> array-valued columns in the source catalogues, things like
>>>>> time series, spectra and correlation matrices.
>>>>> TAP does not prevent array-valued columns, but as far as
>>>>> I know there is no standard way to access array elements
>>>>> within ADQL queries.  In DPAC we are considering ways to allow this.
>>>>>
>>>>> We can define User-Defined Functions for this purpose, and
>>>>> some experimental functionality along these lines has been
>>>>> implemented.  But we're interested in input from the IVOA:
>>>>>
>>>>>     - have other people encountered this and come up with solutions
>>>>>       that we can copy?
>>>>>
>>>>>     - should we try to come up with something (a de facto standard)
>>>>>       that can be used by other services?
>>>>>
>>>>>     - is there a case for language support for these features in
>>>>>       a future version of ADQL?
>>>>>
>>>>> Here is the initial discussion item reported by Alcione Mora from
>>>>> the DPAC issue tracking system (ref for DPAC insiders: C9GACS-239):
>>>>>
>>>>>     Experimental support for array functions has been added to Gaia
>>>>> Archive
>>>>>     v1.3.0. Array types are supported as valid output formats. In
>>>>> addition,
>>>>>     some user defined functions have been defined for direct
>>>>> manipulation,
>>>>>     most notably (see Archive help):
>>>>>
>>>>>     GET_DOUBLE_ARRAY_ELEMENT(array,indexes): Returns the selected
>>>>> element
>>>>>     from the array of double precision values, where:
>>>>>
>>>>>        - array [double]:
>>>>>             Input array
>>>>>        - indexes [string]:
>>>>>             String with the selected indexes with the format '[i][j]..'
>>>>>
>>>>>     The syntax is functional, but should be considered a work in
>>>>> progress
>>>>>     until DR2.
>>>>>
>>>>>     Extra functionality is needed Some suggestions to discuss include
>>>>>     the following:
>>>>>
>>>>>        - get_length(array_column, any type), null for null input cell
>>>>>        - create_double_array(list_of_columns), autocasting
>>>>>        - create_int_array(list_of_columns), autocasting
>>>>>        - Is ADQL function overload supported? I could not find
>>>>>          any reference in ADQL and TAPRegExt (neither allowing nor
>>>>>          forbidding). If yes, GET_ARRAY_ELEMENT should be implemented
>>>>> using
>>>>>          a single function name for all data types transparently to the
>>>>>          user. If not, consider some of the following six bullet points
>>>>>        - Similar functions for float, long, int, byte and boolean
>>>>>          get_double and get_float should accept all numeric types as
>>>>> input
>>>>>          (automatic casting)
>>>>>        - get_long should accept long, int, byte and boolean (1=true,
>>>>> 0=false)
>>>>>        - get_int should accept int and byte and boolean (1=true,
>>>>> 0=false)
>>>>>        - get_byte should accept byte and boolean (1=true, 0=false)
>>>>>        - Alternatively or complementary, the CAST ADQL 2.1 optional
>>>>>          function could be implemented
>>>>>
>>>>> If anyone in VO-land has input on this, we'd be pleased to hear it,
>>>>> so we can do something re-usable or re-used as much as possible.
>>>>> Otherwise, we'll go ahead and do whatever looks like a best fit
>>>>> to Gaia requirements.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Mark
>>>>>
>>>>> --
>>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>>>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>>>>
>>>>
>>>>
>>>> --
>>>> Patrick Dowler
>>>> Canadian Astronomy Data Centre
>>>> Victoria, BC, Canada
>>
>> --
>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>
>
>


More information about the dal mailing list