REGION
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Tue May 8 10:45:21 PDT 2007
My general thoughts:
A boolean function and a boolean operator are both valid ways to express this.
Both forms can be extended with other function names and arguments or other
operators (in the future).
The difference is that the <column> <operator> <literal> construct is very
simple and exactly like other SQL conditions. It separately defines the
values (REGION literals) and the operator, which the functional approach does
not (it combines them into one thing, which will inherently be harder to
document and implementation will be less uniform as a result).
So, they mean roughly the same thing, but I argue that an operator approach is
easier to specify clearly because it breaks the issue down into two simple
parts.
Pedro - I will prepare a short presentation of the REGION issues for the
meeting.
Pat
More comments inline:
On Tuesday 08 May 2007 06:25, Alex Szalay wrote:
> I have an other concern about how we set out to do REGION. In the WHERE
> clause of SQL every item is a Boolean expression which evaluates to true
> or false. These have the form of value OPERATOR (value|set), the latter
> for the case of IN. In any case, the left hand side of a where clause is
> always a scalar value evaluated from the next tuple. In our current
> REGION syntax we have a list of columns on the left hand side, not a
> scalar, quite different from anything else in the SQL WHERE.
No, this is not what was described. The idea is that if REGION is considered a
base datatype, the the TAP service can declare a logical column (eg position)
and the ADQL would refer to this single column on the left-hand side of the
operator. On the right-hand side is a literal value, eg REGION("...") with
whatever STC says we can serialise in there. That is it. Users do not have to
figure out which columns go together to form the composite type.
Below I will re-write your examples in this way, just to be clear.
> After being negative, here is a suggestion, that came up today in a
> discussion with Benjamin. He did not quite like it, but I think it is worth
> mentioning it. This may have come up before I was involved in the VOQL, and
> discarded, but let me try it anyway. These are of course rough examples.
>
> Consider a boolean built-in REGION function, with possibly a variable
> number of arguments. We can then write a spatial query like
>
> SELECT objid, ra,dec FROM Catalog c
> WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.ra, c.dec)
SELECT objid, position FROM Catalog c
WHERE position OVERLAPS REGION('CIRCLE ICRS 185 32.5 0.5')
> SELECT objid, gl,gb FROM Catalog c
> WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','GII',c.gl,c.gb)
If a catalog stores coordinates in a different coordinate system from the
literal value, it has to do the transformation (typically of the literal). If
the user wants/needs to see output in an arbitrary coordinate system, that
can/should be nadles with coordinate conversion function call(s) in the
SELECT or client-side.
> One can also do the "get all regions which contain the point" query. Assume
> that we have a table of regions. Then
>
> SELECT regionid FROM regionTable r
> WHERE REGION.CONTAINS(r.regionDef,'ICRS',185,32.5)
SELECT regionid FROM regionTable r
WHERE regionDef OVERLAPS REGION('185,32.5 ICRS')
> We can also have an intersect function, which can say if two regions
> intersect, like
>
> SELECT regionid FROM regionTable
> WHERE REGION.INTERSECTS(regionDef,'CIRCLE ICRS 185 32.5 0.5')
Same as above with a different REGION literal value.
It is true that you have to know that the OVERLAPS operator can handle points
and "areas", which I think I mentioned some geometry libraries handle the
same way and others different. For our purposes though, I think this sort of
definition of the simplest, most general operator, satisfies the 90-95% rule.
Again, I would personally prefer INTERSECTS except for the existing SQL
reserved word...
> These are all automatically valid SQL statements, provided that the
> mapping of the actual parameters on to the precise definition of
> the function is correct. One can even think of implementing this
> directly... and extending it later is trivial (REGION.DISJOINT, etc).
I may be wrong about this, not being a big parser guru, but I believe that the
BNF for operators can be quite explicit and thus ADQL parsers will more
easily handle the operator... at least it seems like they would be easier to
parse because you can write the rules for argument types in the BNF. Maybe
just easier to write (and read) the BNF :)
> The . notation is also arising in the INFORMATION_SCHEMA.* cloud, so it
> is not unlike existing stuff. By the way, in SDSS we have 30,000 regions
> stored, and I believe that Gretchen has over 100K in her HST exposures
> database. So I think we will store regions in the database.
Yes, we have lots of such things in the database too, they just aren't
actually STC REGIONs: they are polygons, points, and ellipses stored in my
own binary format.
--
Patrick Dowler
Tel/Tél: (250) 363-6914 | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada | Gouvernement du Canada
5071 West Saanich Road | 5071, chemin West Saanich
Victoria, BC | Victoria (C.-B.)
More information about the voql-teg
mailing list