Array element access in ADQL

Gregory Mantelet gmantele at ari.uni-heidelberg.de
Thu Jul 6 13:24:53 CEST 2017


Hi DAL,

I agree with Pat: how to convey the knowledge to TAP clients that arrays 
are supported is a major issue?
And personally, I am more thinking about what should I write in 
TAP_SCHEMA.columns.datatype for an array?

Once these points are solved, agreeing on a syntax with or without UDFs 
will be quite easy.

PostgreSQL lets create an array with a string like '{12.3,125.987}' (1D) 
or '{{1,2},{3,4}}' (2D and more). So we can imagine any UDF working on a 
string like that. Even though a database system does not support arrays, 
with string manipulation it could be possible to simulate one. It is not 
ideal, indeed. But we could also state that a string with the above 
syntax would be just the serialisation of an array, which does not mean 
it has to be stored and manipulated like that in the database system.

So, the major problem is that a new datatype has to be added in ADQL/TAP.

Without this addition in the language, the only thing we can do is 
probably going on a solution based on strings: an array would be typed 
as a VARCHAR in TAP_SCHEMA. Setting a xtype="array" (or similar) would 
probably help the TAP clients to consider these strings as arrays (but 
xtype in TAP_SCHEMA may be possible only from TAP 1.1, is it right?). 
However it is also possible that a TAP implementation returns those 
arrays in VOTable (and FITS?) as arrays, which would allow clients like 
TOPCAT to interpret and manipulate them as such instead of simple strings.

Grégory


On 05/07/2017 17:43, Mark Taylor wrote:
> Pat and Walter,
>
> thanks for these pointers.  Something like PostgreSQL square
> bracket or Oracle round bracket indexing obviously looks
> intuitive and would be nice to have, and this option has come
> up in our discussions.  But neither of these options can appear
> directly in syntactically legal ADQL 2.0.
>
> I expect that DPAC would support some kind of standard array indexing
> syntax along these lines in a future version of ADQL, but
> it doesn't solve our immediate problem which is allowing array
> element access in TAP queries in time for Gaia DR2 (April 2018).
> I'm guessing that a revised ADQL version containing this
> not-previously-discussed syntax is not likely on that timescale.
> So unless I'm missing something we either have to supply this
> capability using UDFs, or require syntactically-illegal ADQL in queries.
>
> Mark
>
> On Tue, 4 Jul 2017, Walter Landry wrote:
>
>> Oracle calls them Varrays
>>
>>    https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/05_colls.htm
>>
>> It looks like they use parentheses '()' to access elements.  SQLite
>> does not support arrays.
>>
>>    https://www.sqlite.org/datatype3.html
>>
>> It looks like both Postgres and Oracle start indexing at 1 :(
>>
>> Cheers,
>> Walter Landry
>>
>> Patrick Dowler <pdowler.cadc at gmail.com> wrote:
>>> PostgreSQL supports array types with normal looking array access by index:
>>>
>>> https://www.postgresql.org/docs/9.6/static/arrays.html
>>>
>>> I haven't found a compelling reason to use something like this (yet)
>>> but it has been around for some time. The tricky bit probably isn't so
>>> much ADQL speciyfing it but rather TAPRegExt providing the means to
>>> tell clients/users whether array types/access is supported and
>>> possibly for which primitive types.
>>>
>>> Pat
>>>
>>> On 4 July 2017 at 05:03, Ger van Diepen <diepen at astron.nl> wrote:
>>>> Hi Mark,
>>>>
>>>>
>>>> You could take a look at Casacore's TaQL, the query language for its tables.
>>>> It uses array syntax similar to Python.
>>>>
>>>> I understood that an ADQL query command is parsed and rewritten to a command
>>>> for the underlying query language. That should make it possible to use
>>>> python-like array syntax and rewrite it to UDFs as needed. Probably not
>>>> needed for PostgreSQL that supports arrays.
>>>>
>>>> Note that TaQL also support units in a natural way.
>>>>
>>>> See http://casacore.github.io/casacore-notes/199.html
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> ger
>>>>
>>>>>>> Mark Taylor <M.B.Taylor at bristol.ac.uk> 04-Jul-17 13:43 >>>
>>>>
>>>> Dear DAL,
>>>>
>>>> Upcoming releases of the Gaia catalogue will contain some
>>>> array-valued columns in the source catalogues, things like
>>>> time series, spectra and correlation matrices.
>>>> TAP does not prevent array-valued columns, but as far as
>>>> I know there is no standard way to access array elements
>>>> within ADQL queries.  In DPAC we are considering ways to allow this.
>>>>
>>>> We can define User-Defined Functions for this purpose, and
>>>> some experimental functionality along these lines has been
>>>> implemented.  But we're interested in input from the IVOA:
>>>>
>>>>     - have other people encountered this and come up with solutions
>>>>       that we can copy?
>>>>
>>>>     - should we try to come up with something (a de facto standard)
>>>>       that can be used by other services?
>>>>
>>>>     - is there a case for language support for these features in
>>>>       a future version of ADQL?
>>>>
>>>> Here is the initial discussion item reported by Alcione Mora from
>>>> the DPAC issue tracking system (ref for DPAC insiders: C9GACS-239):
>>>>
>>>>     Experimental support for array functions has been added to Gaia Archive
>>>>     v1.3.0. Array types are supported as valid output formats. In addition,
>>>>     some user defined functions have been defined for direct manipulation,
>>>>     most notably (see Archive help):
>>>>
>>>>     GET_DOUBLE_ARRAY_ELEMENT(array,indexes): Returns the selected element
>>>>     from the array of double precision values, where:
>>>>
>>>>        - array [double]:
>>>>             Input array
>>>>        - indexes [string]:
>>>>             String with the selected indexes with the format '[i][j]..'
>>>>
>>>>     The syntax is functional, but should be considered a work in progress
>>>>     until DR2.
>>>>
>>>>     Extra functionality is needed Some suggestions to discuss include
>>>>     the following:
>>>>
>>>>        - get_length(array_column, any type), null for null input cell
>>>>        - create_double_array(list_of_columns), autocasting
>>>>        - create_int_array(list_of_columns), autocasting
>>>>        - Is ADQL function overload supported? I could not find
>>>>          any reference in ADQL and TAPRegExt (neither allowing nor
>>>>          forbidding). If yes, GET_ARRAY_ELEMENT should be implemented using
>>>>          a single function name for all data types transparently to the
>>>>          user. If not, consider some of the following six bullet points
>>>>        - Similar functions for float, long, int, byte and boolean
>>>>          get_double and get_float should accept all numeric types as input
>>>>          (automatic casting)
>>>>        - get_long should accept long, int, byte and boolean (1=true, 0=false)
>>>>        - get_int should accept int and byte and boolean (1=true, 0=false)
>>>>        - get_byte should accept byte and boolean (1=true, 0=false)
>>>>        - Alternatively or complementary, the CAST ADQL 2.1 optional
>>>>          function could be implemented
>>>>
>>>> If anyone in VO-land has input on this, we'd be pleased to hear it,
>>>> so we can do something re-usable or re-used as much as possible.
>>>> Otherwise, we'll go ahead and do whatever looks like a best fit
>>>> to Gaia requirements.
>>>>
>>>> Thanks
>>>>
>>>> Mark
>>>>
>>>> --
>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>>>
>>>
>>> -- 
>>> Patrick Dowler
>>> Canadian Astronomy Data Centre
>>> Victoria, BC, Canada
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/




More information about the dal mailing list