REGION

Tue May 8 10:45:21 PDT 2007

My general thoughts:

A boolean function and a boolean operator are both valid ways to express this. 
Both forms can be extended with other function names and arguments or other 
operators (in the future).

The difference is that the <column> <operator> <literal> construct is very 
simple and exactly like other SQL conditions. It separately defines the 
values (REGION literals) and the operator, which the functional approach does 
not (it combines them into one thing, which will inherently be harder to 
document and implementation will be less uniform as a result). 

So, they mean roughly the same thing, but I argue that an operator approach is 
easier to specify clearly because it breaks the issue down into two simple 
parts. 

Pedro - I will prepare a short presentation of the REGION issues for the 
meeting.

Pat

More comments inline:

On Tuesday 08 May 2007 06:25, Alex Szalay wrote:
> I have an other concern about how we set out to do REGION. In the WHERE
> clause of SQL every item is a Boolean expression which evaluates to true
> or false. These have the form of value OPERATOR (value|set), the latter
> for the case of IN. In any case, the left hand side of a where clause is
> always a scalar value evaluated from the next tuple. In our current
> REGION syntax we have a list of columns on the left hand side, not a
> scalar, quite different from anything else in the SQL WHERE.

No, this is not what was described. The idea is that if REGION is considered a 
base datatype, the the TAP service can declare a logical column (eg position) 
and the ADQL would refer to this single column on the left-hand side of the 
operator. On the right-hand side is a literal value, eg REGION("...") with 
whatever STC says we can serialise in there. That is it. Users do not have to 
figure out which columns go together to form the composite type. 

Below I will re-write your examples in this way, just to be clear.

> After being negative, here is a suggestion, that came up today in a
> discussion with Benjamin. He did not quite like it, but I think it is worth
> mentioning it. This may have come up before I was involved in the VOQL, and
> discarded, but let me try it anyway. These are of course rough examples.
>
> Consider a boolean built-in REGION function, with possibly a variable
> number of arguments. We can then write a spatial query like
>
> 	SELECT objid, ra,dec FROM Catalog c
> 	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.ra, c.dec)

   SELECT objid, position FROM Catalog c
   WHERE position OVERLAPS REGION('CIRCLE ICRS 185 32.5 0.5')

> 	SELECT objid, gl,gb FROM Catalog c
> 	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','GII',c.gl,c.gb)

If a catalog stores coordinates in a different coordinate system from the 
literal value, it has to do the transformation (typically of the literal). If 
the user wants/needs to see output in an arbitrary coordinate system, that 
can/should be nadles with coordinate conversion function call(s) in the 
SELECT or client-side.

> One can also do the "get all regions which contain the point" query. Assume
> that we have a table of regions. Then
>
> 	SELECT regionid FROM regionTable r
> 	WHERE REGION.CONTAINS(r.regionDef,'ICRS',185,32.5)

   SELECT regionid FROM regionTable r
   WHERE regionDef OVERLAPS REGION('185,32.5 ICRS')

> We can also have an intersect function, which can say if two regions
> intersect, like
>
> 	SELECT regionid FROM regionTable
> 	WHERE REGION.INTERSECTS(regionDef,'CIRCLE ICRS 185 32.5 0.5')

Same as above with a different REGION literal value. 

It is true that you have to know that the OVERLAPS operator can handle points 
and "areas", which I think I mentioned some geometry libraries handle the 
same way and others different. For our purposes though, I think this sort of 
definition of the simplest, most general operator, satisfies the 90-95% rule. 
Again, I would personally prefer INTERSECTS except for the existing SQL 
reserved word...

> These are all automatically valid SQL statements, provided that the
> mapping of the actual parameters on to the precise definition of
> the function is correct. One can even think of implementing this
> directly... and extending it later is trivial (REGION.DISJOINT, etc).

I may be wrong about this, not being a big parser guru, but I believe that the 
BNF for operators can be quite explicit and thus ADQL parsers will more 
easily handle the operator... at least it seems like they would be easier to 
parse because you can write the rules for argument types in the BNF. Maybe 
just easier to write (and read) the BNF :)

> The . notation is also arising in the INFORMATION_SCHEMA.* cloud, so it
> is not unlike existing stuff. By the way, in SDSS we have 30,000 regions
> stored, and I believe that Gretchen has over 100K in her HST exposures
> database. So I think we will store regions in the database.

Yes, we have lots of such things in the database too, they just aren't 
actually STC REGIONs: they are polygons, points, and ellipses stored in my 
own binary format.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)