Array element access in ADQL

Ger van Diepen diepen at astron.nl
Tue Jul 4 14:03:04 CEST 2017


Hi Mark, 

You could take a look at Casacore's TaQL, the query language for its
tables. It uses array syntax similar to Python. 
I understood that an ADQL query command is parsed and rewritten to a
command for the underlying query language. That should make it possible
to use python-like array syntax and rewrite it to UDFs as needed.
Probably not needed for PostgreSQL that supports arrays. 
Note that TaQL also support units in a natural way. 
See http://casacore.github.io/casacore-notes/199.html 

Cheers, 
ger

>>> Mark Taylor <M.B.Taylor at bristol.ac.uk> 04-Jul-17 13:43 >>>
Dear DAL,

Upcoming releases of the Gaia catalogue will contain some
array-valued columns in the source catalogues, things like
time series, spectra and correlation matrices.
TAP does not prevent array-valued columns, but as far as
I know there is no standard way to access array elements
within ADQL queries.  In DPAC we are considering ways to allow this.

We can define User-Defined Functions for this purpose, and
some experimental functionality along these lines has been
implemented.  But we're interested in input from the IVOA:

   - have other people encountered this and come up with solutions
     that we can copy?

   - should we try to come up with something (a de facto standard)
     that can be used by other services?

   - is there a case for language support for these features in
     a future version of ADQL?

Here is the initial discussion item reported by Alcione Mora from
the DPAC issue tracking system (ref for DPAC insiders: C9GACS-239):

   Experimental support for array functions has been added to Gaia
Archive
   v1.3.0. Array types are supported as valid output formats. In
addition,
   some user defined functions have been defined for direct
manipulation,
   most notably (see Archive help):

   GET_DOUBLE_ARRAY_ELEMENT(array,indexes): Returns the selected
element
   from the array of double precision values, where:

      - array [double]:
           Input array
      - indexes [string]:
           String with the selected indexes with the format '[i][j]..'

   The syntax is functional, but should be considered a work in
progress
   until DR2.

   Extra functionality is needed Some suggestions to discuss include
   the following:

      - get_length(array_column, any type), null for null input cell
      - create_double_array(list_of_columns), autocasting
      - create_int_array(list_of_columns), autocasting
      - Is ADQL function overload supported? I could not find
        any reference in ADQL and TAPRegExt (neither allowing nor
        forbidding). If yes, GET_ARRAY_ELEMENT should be implemented
using
        a single function name for all data types transparently to the
        user. If not, consider some of the following six bullet points
      - Similar functions for float, long, int, byte and boolean
        get_double and get_float should accept all numeric types as
input
        (automatic casting)
      - get_long should accept long, int, byte and boolean (1=true,
0=false)
      - get_int should accept int and byte and boolean (1=true,
0=false)
      - get_byte should accept byte and boolean (1=true, 0=false)
      - Alternatively or complementary, the CAST ADQL 2.1 optional
        function could be implemented

If anyone in VO-land has input on this, we'd be pleased to hear it,
so we can do something re-usable or re-used as much as possible.
Otherwise, we'll go ahead and do whatever looks like a best fit
to Gaia requirements.

Thanks

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University,
UK
m.b.taylor at bris.ac.uk +44-117-9288776 
http://www.star.bris.ac.uk/~mbt/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/dal/attachments/20170704/46a34d2e/attachment-0001.html>


More information about the dal mailing list