ADQL XMATCH

Mark Taylor M.B.Taylor at bristol.ac.uk
Mon Apr 11 23:58:14 CEST 2016


On Mon, 11 Apr 2016, Patrick Dowler wrote:

> TL;DR - I think that we should redefine all the geometry functions
> without coord sys now and (since overloading seems to be OK) we can
> keep the old deprecated ones if we have to. Then the 2-arg DISTANCE
> function with point args is my preferred solution. I don't see this
> strictly as syntactic sugar to be used instead of a crafty CONTAINS
> (equiv as a predicate) because the user can also add DISTANCE(...) to
> the select list.

I understand Pat's arguments, but I still prefer the 4-arg DISTANCE
(or SKYDISTANCE).

While I understand that it's a bit fiddly to infer a POINT type
from an (ra,dec) argument pair, it doesn't sound that difficult
to do given that you're rewriting queries anyway.
I bet that if you have pos, ra and dec columns, you'd get a lot
of people writing POINT(ra,dec) in any case.

In particular:

> We have several catalogues in our TAP service with the coordinates in
> a column described with xtype="adql:POINT" (let's ignore the details of
> the adql prefix for now).  If the query on those tables uses that
> column, the relevant indexing comes into play. It is true that the
> tables also have separate RA and DEC columns and in principle I could
> detect DISTANCE(RA, DEC, uploaded.c1, uploaded.c2) and replace RA, DEC
> with the POS column, but what do I do if:
> 
> - query refers to the wrong columns in the table (e.g. DISTANCE(foo,
> bar, uploaded.c1, uploaded.c2))
> - query just gets them in the wrong order (e.g. DISTANCE(DEC, RA,
> uploaded.c1, uploaded.c2))

Answer: execute the queries as submitted.  Maybe they're wrong, but
it's probably not the first time that user will have submitted
a query that gives the wrong answer.

> I would be inclined to have the job fail rather than run it. It makes

Disagree.  The user may have her reasons for foo, bar.
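A minimal sketch of the rewrite described above, under stated assumptions. It is not any service's actual code. It assumes the DISTANCE arguments have already been parsed into strings, and that the service knows which (ra, dec) column pair each point-valued column corresponds to. `PointColumn` and `rewrite_distance` are hypothetical names. When the first pair matches a known pair, the point column is substituted so that its index can be used. Otherwise, for example with DISTANCE(foo, bar, ...) or DISTANCE(DEC, RA, ...), the query is run as submitted rather than failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointColumn:
    """Hypothetical metadata: a POINT column equivalent to an (ra, dec) pair."""
    name: str
    ra: str
    dec: str


def rewrite_distance(args, point_columns):
    """Map 4-arg DISTANCE(ra1, dec1, ra2, dec2) arguments to the 2-arg form.

    Points are represented as ("POINT", ra, dec) tuples (an assumption made
    for illustration). Column names are compared case-insensitively, on the
    assumption that they are unquoted identifiers.
    """
    if len(args) != 4:
        raise ValueError("expected DISTANCE(ra1, dec1, ra2, dec2)")
    ra1, dec1, ra2, dec2 = args
    second = ("POINT", ra2, dec2)
    for col in point_columns:
        if (ra1.lower(), dec1.lower()) == (col.ra.lower(), col.dec.lower()):
            # Exact match: use the indexed point-valued column.
            return (col.name, second)
    # No match (wrong columns or wrong order): don't fail the job,
    # just evaluate the query as written.
    return (("POINT", ra1, dec1), second)


cols = [PointColumn("pos", "ra", "dec")]
print(rewrite_distance(("RA", "DEC", "uploaded.c1", "uploaded.c2"), cols))
print(rewrite_distance(("DEC", "RA", "uploaded.c1", "uploaded.c2"), cols))
```

The first call is rewritten to use `pos`. The second, with its swapped arguments, is left as POINT(DEC, RA) and executed as is.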

The main reason I'm uncomfortable with the 2-arg form (which I agree
is cleaner - it's how I'd want to see it in a java API for instance)
is that provision of POINT datatypes in TAP services is still
a minority pastime.  I count (using GLOTS) only three services
providing at least one table with a POINT or adql:POINT column:
CADC, GAVO-DC and SimTAP.  There are also a number of JVO
services with the non-standard type "jpoint", I presume doing
something similar.  The other 80-odd registered TAP services
out there get by representing sky positions with RA and Dec columns.
So queries to most services are going to be more verbose
(requiring DISTANCE(POINT(ra1,dec1), POINT(ra2,dec2))
rather than DISTANCE(ra1,dec1,ra2,dec2)) so that queries to
a few services can be shorter (DISTANCE(pos1,pos2), or
often DISTANCE(pos,POINT(upload.ra,upload.dec))).
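To show that the two forms differ only in spelling, here is a minimal sketch of both signatures computing the same angular separation. It assumes positions in degrees and uses the haversine formula. That formula is an assumption, and it is not necessarily how any ADQL implementation evaluates DISTANCE. Python has no overloading by arity, so `*args` stands in for ADQL's overloading. `Point` and `distance` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for an ADQL POINT (ra, dec in degrees)."""
    ra: float
    dec: float


def _haversine_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def distance(*args):
    """DISTANCE(p1, p2) or DISTANCE(ra1, dec1, ra2, dec2)."""
    if len(args) == 2:
        p1, p2 = args
        return _haversine_deg(p1.ra, p1.dec, p2.ra, p2.dec)
    if len(args) == 4:
        # The 4-arg form simply builds the two points and delegates.
        ra1, dec1, ra2, dec2 = args
        return distance(Point(ra1, dec1), Point(ra2, dec2))
    raise TypeError("DISTANCE takes 2 points or 4 coordinates")
```

For example, `distance(10, 20, 11, 21)` and `distance(Point(10, 20), Point(11, 21))` give the same value. The only difference for the user is whether POINT(...) has to be written out.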

The other thing is that if you write (e.g. in examples or
documentation) "DISTANCE(ra1,dec1,ra2,dec2)", everybody
understands at a glance how it works.  DISTANCE(pos1, pos2)
takes a little more explanation: some columns are point-valued,
and if there isn't a point-valued column, you have to write
POINT(ra, dec).

One could argue those are not good enough reasons, and that
data providers and astronomers should be made to offer
databases and construct queries in a way that reflects
the geometrical datatypes that are really being manipulated.
I don't plan to fight this to the death, but that's my 2p.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

