Array element access in ADQL

Patrick Dowler pdowler.cadc at gmail.com
Thu Jul 6 17:48:55 CEST 2017


With the latest change in WD-TAP-1.1 (svn rev 4139 iirc), the
tap_schema specifies arraysize as equivalent to the VOTable arraysize
attribute. So that means multi-dimensional arrays are supported and
describable. Only one-dimensional arrays could be used in TAP-1.0.

But that just means storing and select(ing) those values and not
accessing them -- either via indexed access to scalar values or via
array-specific functions we might want to standardise. TBD

In general we try to write the spec so that extensions that allow
non-standard features can be put into production. In this case it is
an extension to ADQL so maybe LANG=ADQL-array-proto??

Things t think about,

Pat
PS-new TAP WD is imminent

On 6 July 2017 at 04:24, Gregory Mantelet
<gmantele at ari.uni-heidelberg.de> wrote:
> Hi DAL,
>
> I agree with Pat: how to convey the knowledge to TAP clients that arrays are
> supported is a major issue?
> And personally, I am more thinking about what should I write in
> TAP_SCHEMA.columns.datatype for an array?
>
> Once these points are solved, agreeing on a syntax with or without UDFs will
> be quite easy.
>
> PostgreSQL lets create an array with a string like '{12.3,125.987}' (1D) or
> '{{1,2},{3,4}}' (2D and more). So we can imagine any UDF working on a string
> like that. Even though a database system does not support arrays, with
> string manipulation it could be possible to simulate one. It is not ideal,
> indeed. But we could also state that a string with the above syntax would be
> just the serialisation of an array, which does not mean it has to be stored
> and manipulated like that in the database system.
>
> So, the major problem is that a new datatype has to be added in ADQL/TAP.
>
> Without this addition in the language, the only thing we can do is probably
> going on a solution based on strings: an array would be typed as a VARCHAR
> in TAP_SCHEMA. Setting a xtype="array" (or similar) would probably help the
> TAP clients to consider these strings as arrays (but xtype in TAP_SCHEMA may
> be possible only from TAP 1.1, is it right?). However it is also possible
> that a TAP implementation returns those arrays in VOTable (and FITS?) as
> arrays, which would allow clients like TOPCAT to interpret and manipulate
> them as such instead of simple strings.
>
> Grégory
>
>
>
> On 05/07/2017 17:43, Mark Taylor wrote:
>>
>> Pat and Walter,
>>
>> thanks for these pointers.  Something like PostgreSQL square
>> bracket or Oracle round bracket indexing obviously looks
>> intuitive and would be nice to have, and this option has come
>> up in our discussions.  But neither of these options can appear
>> directly in syntactically legal ADQL 2.0.
>>
>> I expect that DPAC would support some kind of standard array indexing
>> syntax along these lines in a future version of ADQL, but
>> it doesn't solve our immediate problem which is allowing array
>> element access in TAP queries in time for Gaia DR2 (April 2018).
>> I'm guessing that a revised ADQL version containing this
>> not-previously-discussed syntax is not likely on that timescale.
>> So unless I'm missing something we either have to supply this
>> capability using UDFs, or require syntactically-illegal ADQL in queries.
>>
>> Mark
>>
>> On Tue, 4 Jul 2017, Walter Landry wrote:
>>
>>> Oracle calls them Varrays
>>>
>>>    https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/05_colls.htm
>>>
>>> It looks like they use parentheses '()' to access elements.  SQLite
>>> does not support arrays.
>>>
>>>    https://www.sqlite.org/datatype3.html
>>>
>>> It looks like both Postgres and Oracle start indexing at 1 :(
>>>
>>> Cheers,
>>> Walter Landry
>>>
>>> Patrick Dowler <pdowler.cadc at gmail.com> wrote:
>>>>
>>>> PostgreSQL supports array types with normal looking array access by
>>>> index:
>>>>
>>>> https://www.postgresql.org/docs/9.6/static/arrays.html
>>>>
>>>> I haven't found a compelling reason to use something like this (yet)
>>>> but it has been around for some time. The tricky bit probably isn't so
>>>> much ADQL speciyfing it but rather TAPRegExt providing the means to
>>>> tell clients/users whether array types/access is supported and
>>>> possibly for which primitive types.
>>>>
>>>> Pat
>>>>
>>>> On 4 July 2017 at 05:03, Ger van Diepen <diepen at astron.nl> wrote:
>>>>>
>>>>> Hi Mark,
>>>>>
>>>>>
>>>>> You could take a look at Casacore's TaQL, the query language for its
>>>>> tables.
>>>>> It uses array syntax similar to Python.
>>>>>
>>>>> I understood that an ADQL query command is parsed and rewritten to a
>>>>> command
>>>>> for the underlying query language. That should make it possible to use
>>>>> python-like array syntax and rewrite it to UDFs as needed. Probably not
>>>>> needed for PostgreSQL that supports arrays.
>>>>>
>>>>> Note that TaQL also support units in a natural way.
>>>>>
>>>>> See http://casacore.github.io/casacore-notes/199.html
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> ger
>>>>>
>>>>>>>> Mark Taylor <M.B.Taylor at bristol.ac.uk> 04-Jul-17 13:43 >>>
>>>>>
>>>>>
>>>>> Dear DAL,
>>>>>
>>>>> Upcoming releases of the Gaia catalogue will contain some
>>>>> array-valued columns in the source catalogues, things like
>>>>> time series, spectra and correlation matrices.
>>>>> TAP does not prevent array-valued columns, but as far as
>>>>> I know there is no standard way to access array elements
>>>>> within ADQL queries.  In DPAC we are considering ways to allow this.
>>>>>
>>>>> We can define User-Defined Functions for this purpose, and
>>>>> some experimental functionality along these lines has been
>>>>> implemented.  But we're interested in input from the IVOA:
>>>>>
>>>>>     - have other people encountered this and come up with solutions
>>>>>       that we can copy?
>>>>>
>>>>>     - should we try to come up with something (a de facto standard)
>>>>>       that can be used by other services?
>>>>>
>>>>>     - is there a case for language support for these features in
>>>>>       a future version of ADQL?
>>>>>
>>>>> Here is the initial discussion item reported by Alcione Mora from
>>>>> the DPAC issue tracking system (ref for DPAC insiders: C9GACS-239):
>>>>>
>>>>>     Experimental support for array functions has been added to Gaia
>>>>> Archive
>>>>>     v1.3.0. Array types are supported as valid output formats. In
>>>>> addition,
>>>>>     some user defined functions have been defined for direct
>>>>> manipulation,
>>>>>     most notably (see Archive help):
>>>>>
>>>>>     GET_DOUBLE_ARRAY_ELEMENT(array,indexes): Returns the selected
>>>>> element
>>>>>     from the array of double precision values, where:
>>>>>
>>>>>        - array [double]:
>>>>>             Input array
>>>>>        - indexes [string]:
>>>>>             String with the selected indexes with the format '[i][j]..'
>>>>>
>>>>>     The syntax is functional, but should be considered a work in
>>>>> progress
>>>>>     until DR2.
>>>>>
>>>>>     Extra functionality is needed Some suggestions to discuss include
>>>>>     the following:
>>>>>
>>>>>        - get_length(array_column, any type), null for null input cell
>>>>>        - create_double_array(list_of_columns), autocasting
>>>>>        - create_int_array(list_of_columns), autocasting
>>>>>        - Is ADQL function overload supported? I could not find
>>>>>          any reference in ADQL and TAPRegExt (neither allowing nor
>>>>>          forbidding). If yes, GET_ARRAY_ELEMENT should be implemented
>>>>> using
>>>>>          a single function name for all data types transparently to the
>>>>>          user. If not, consider some of the following six bullet points
>>>>>        - Similar functions for float, long, int, byte and boolean
>>>>>          get_double and get_float should accept all numeric types as
>>>>> input
>>>>>          (automatic casting)
>>>>>        - get_long should accept long, int, byte and boolean (1=true,
>>>>> 0=false)
>>>>>        - get_int should accept int and byte and boolean (1=true,
>>>>> 0=false)
>>>>>        - get_byte should accept byte and boolean (1=true, 0=false)
>>>>>        - Alternatively or complementary, the CAST ADQL 2.1 optional
>>>>>          function could be implemented
>>>>>
>>>>> If anyone in VO-land has input on this, we'd be pleased to hear it,
>>>>> so we can do something re-usable or re-used as much as possible.
>>>>> Otherwise, we'll go ahead and do whatever looks like a best fit
>>>>> to Gaia requirements.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Mark
>>>>>
>>>>> --
>>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>>>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>>>>
>>>>
>>>>
>>>> --
>>>> Patrick Dowler
>>>> Canadian Astronomy Data Centre
>>>> Victoria, BC, Canada
>>
>> --
>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>
>
>



-- 
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


More information about the dal mailing list