REGION

Alex Szalay szalay at jhu.edu
Sat May 5 00:40:36 PDT 2007


This is a good start, but I think that we need a much clearer focus.
Also, after reading this I am still confused about what a REGION datatype is.
I will try to keep my comments short.

In a typical spatial framework there are several different spatial datatypes
(POINTSET, LINESET, POLYGON). These datatypes are typically not simple; even
the description of a point can be quite complex (see STC), not to mention a
complex region. Of course these can be serialized into a string, but I would
not want to put the coordinates into "ra dec" strings.

Of course, Pat and Benjamin also want to extend this to even more abstract
concepts like time and energy intervals, which none of the GIS systems
handle, although for intervals I think the BETWEEN clause (or several of
them for a more complex interval set) might just do the job.

One can then define various RELATIONS and various OPERATIONS between them.
The relations (CONTAINS, TOUCHES, DISJOINT, INTERSECT, ...) can be understood
as an enumerated return value from an operation between two different
spatial objects.
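As an illustration of a relation returned as an enumerated value, here is a minimal Python sketch for the simplest extended sky region, a circle (spherical cap). The Cap and Relation names, the extra WITHIN value (the converse of CONTAINS), and the tolerance used to detect TOUCHES are assumptions made for this example, not part of any VO standard; it also assumes radii well below 90 degrees to keep the geometry simple.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    DISJOINT = "DISJOINT"
    TOUCHES = "TOUCHES"      # boundaries meet at a single point
    INTERSECT = "INTERSECT"  # partial overlap
    CONTAINS = "CONTAINS"    # first region contains the second
    WITHIN = "WITHIN"        # hypothetical extra value: converse of CONTAINS


@dataclass(frozen=True)
class Cap:
    """A circle on the sky: centre (ra, dec) and radius, all in degrees."""
    ra: float
    dec: float
    radius: float


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def relation(a: Cap, b: Cap, tol: float = 1e-9) -> Relation:
    """Enumerated relation of cap a to cap b (assumes radii well below 90 deg)."""
    d = angular_separation(a.ra, a.dec, b.ra, b.dec)
    if d > a.radius + b.radius + tol:
        return Relation.DISJOINT
    if abs(d - (a.radius + b.radius)) <= tol:
        return Relation.TOUCHES
    if d + b.radius <= a.radius + tol:
        return Relation.CONTAINS
    if d + a.radius <= b.radius + tol:
        return Relation.WITHIN
    return Relation.INTERSECT
```

Note that the relation is not symmetric: relation(a, b) is CONTAINS exactly when relation(b, a) is WITHIN, which is why an enumerated result carries more information than a single yes/no predicate.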

One can also have OPERATIONS among spatial objects; these (INTERSECTION,
UNION, DIFFERENCE) form a Boolean algebra, with some restrictions.
These return another spatial object.
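The Boolean algebra of OPERATIONS is easiest to see in one dimension. The sketch below is a hypothetical example, not anything proposed in this thread: a region is a sorted list of disjoint half-open intervals [lo, hi). With half-open intervals, INTERSECTION, UNION and DIFFERENCE always return another interval set of the same kind, which avoids the endpoint bookkeeping that closed intervals would need under DIFFERENCE.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open [lo, hi)


def normalize(ivs: List[Interval]) -> List[Interval]:
    """Sort, drop empty intervals, and merge overlapping or adjacent ones."""
    out: List[Interval] = []
    for lo, hi in sorted(iv for iv in ivs if iv[0] < iv[1]):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    return normalize(list(a) + list(b))


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    a, b = normalize(a), normalize(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def difference(a: List[Interval], b: List[Interval]) -> List[Interval]:
    a, b = normalize(a), normalize(b)
    out: List[Interval] = []
    for lo, hi in a:
        cur = lo  # start of the part of [lo, hi) not yet removed
        for blo, bhi in b:
            if bhi <= cur:
                continue  # b interval lies entirely to the left
            if blo >= hi:
                break     # b is sorted, so nothing further can cut [lo, hi)
            if blo > cur:
                out.append((cur, blo))
            cur = max(cur, bhi)
            if cur >= hi:
                break
        if cur < hi:
            out.append((cur, hi))
    return out
```

Each operation returns a normalized interval set, so results can be fed back into further operations; for example, union(difference(a, b), intersection(a, b)) reproduces normalize(a).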

If we restrict ourselves only to POINTSETS (our catalogs) and POLYGONS (say,
REGIONS), there are still many different things we might want to do. These
are all questions that SDSS users have been asking of the database as
part of their research.

(1) Give me all the POINTS within a REGION from a certain set of tables
(2) Give me all the POINTS which are within 10 arcsec of a REGION (errors)
(3) Tell me if this POINT is within this REGION
(4) Which REGIONS in the database contain this POINT (is it in the photo 
	footprint but not in the spectro, for example)
(5) What is the distance from this point to the boundary
(6) What percentage of this point's 30" neighborhood is inside the survey
	footprint
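Questions (1), (2), (3) and (5) can be sketched for a single polygon with two classic planar routines: an even-odd (ray casting) point-in-polygon test and a point-to-segment distance. Purely for illustration, this assumes the polygon vertices and the points have already been projected onto a local tangent plane in arcsec, which is only reasonable for small regions away from the poles; a real service would work on the sphere. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in arcsec on an assumed local tangent plane


def contains(polygon: Sequence[Point], p: Point) -> bool:
    """Question (3): even-odd ray casting test for a simple polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_boundary(polygon: Sequence[Point], p: Point) -> float:
    """Question (5): shortest distance from p to any edge of the polygon."""
    px, py = p
    best = math.inf
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        seg2 = dx * dx + dy * dy
        # projection of p onto the edge, clamped to the segment
        t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / seg2))
        best = min(best, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return best


def points_within(polygon: Sequence[Point], points: Sequence[Point],
                  margin: float = 0.0) -> List[Point]:
    """Question (1), or question (2) when margin > 0: inside or within margin."""
    return [p for p in points
            if contains(polygon, p) or distance_to_boundary(polygon, p) <= margin]
```

Under the arcsec assumption, points_within(footprint, catalog, margin=10.0) would answer question (2). Questions (4) and (6) need the same primitives applied to many stored regions or integrated over a neighborhood, which is where the efficiency concerns below come in.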

One can also think of storing REGION (POLYGON) data in the database and
performing operations on those plus the incoming user-defined regions. This
is a very complex task, and to do it efficiently one typically needs a
binary representation inside the DB, i.e. an object-oriented or an
object-relational DB. I do not want to go there, since my one page is up.

I think this is a very hard problem and requires further discussion.

--Alex

-----Original Message-----
From: owner-voql-teg at eso.org [mailto:owner-voql-teg at eso.org] On Behalf Of Patrick Dowler
Sent: Friday, May 04, 2007 5:33 PM
To: VOQL-TEG
Subject: REGION


note: I had to violate my one-screen email limit on this one, but it is a
"report" :)

Benjamin and I exchanged a few emails off-line about region, and came up
with this preferred format for expressing a condition:

   something OVERLAPS REGION("...")

where something is a column name or alias from the table, OVERLAPS is an
operator, and REGION("...") is thus a literal value. REGION is a reserved
word used to form literals (above) and to declare the type of "something". 
That is, a TAP service would say that there is a column of type REGION and
that tells the user exactly how to formulate the condition. 

We considered other reserved words for the operator (INTERSECT, IN) but
discounted IN because it implies complete inclusion, which we thought is not
the general meaning when both the column and the literal are extended
regions (rather than points). INTERSECT in SQL is used to mean "set
intersection" (if I recall), so this would not be so bad if you think of a
region as the "set of all points" within a boundary. Using INTERSECT would
mean overloading its meaning (i.e. it means something special if the
arguments are regions). We nominally adopted OVERLAPS (the term does appear
in the SIA 1.0 document, at least). In geometry, I think INTERSECT is the
general term one would use, and it has all the correct implications whether
you are talking about points, lines/segments, curves, or arbitrary shapes.
We also looked at, but rejected, the PostgreSQL overlaps operator && as
being obscure.

Since I prefer a word with the trailing S, OVERLAPS seems slightly better
(than INTERSECT). Some other reserved word might be better, but "overlap" is
suitably general (it also appears in the SIA 1.0 doc and means the same
thing there as here).

As for STC, it is just the (one?) way to specify the REGION literal. That
is, STC says what to put in the string "...".

** Summary **

REGION is a datatype and literals are REGION("...") where ... is specified
by STC. We add an operator OVERLAPS that is used between two REGIONs
(typically a column of type REGION and a literal). It should work for
columns of energy and time or whatever else is in STC. A TAP service
declares (logical) columns of type REGION to say exactly where/how the
OVERLAPS operator can be used with no ambiguity. 

** sales pitch **

* multiple region columns in table or via join

You can have multiple REGION columns in a table (in theory), and there is no
need to say that 2 or more columns go together (e.g. ra and dec): you just
have a column like "position" of type REGION. In an observation catalog you
could in principle have columns like "bounds", "center" and
"target_position", all of type REGION and all with different values.

* separate REGION output from query capability:

A TAP service could in principle have columns of type REGION (for output)
and yet not support the OVERLAPS operator. I think it is good to decouple
this as all DBs can store them but not all can do decent spatial querying.
It is up to the TAP service to decide.

* energy and time axes (intervals):

I realised (but didn't express to Benjamin, so he hasn't seen this) that this
actually works as is for the energy and time axes that STC also covers. You
can declare a column named energy (for example) of type REGION, and then use
STC to write the literal (interval or single value) and thus form a
condition that is valid. Thus, one should be able to use 

   energy OVERLAPS REGION("<serialised STC energy region>")

as well. The column metadata (utype) would indicate what kind of literal
(which STC coordinate axis) to use.
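To make the energy case concrete, here is a minimal sketch of how a service might evaluate "energy OVERLAPS REGION(...)" once the STC literal has been parsed into a closed interval. The parsing step is deliberately left out, and the Interval class, the where_overlaps helper and the row layout are all hypothetical. A single value is just an interval with lo == hi, so the same test covers both forms of literal mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; a single value has lo == hi (hypothetical type)."""
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if self.lo > self.hi:
            raise ValueError("interval lower bound exceeds upper bound")

    def overlaps(self, other: "Interval") -> bool:
        # two closed intervals share at least one point
        return self.lo <= other.hi and other.lo <= self.hi


def where_overlaps(rows: List[Dict[str, Any]], column: str,
                   literal: Interval) -> List[Dict[str, Any]]:
    """Keep the rows whose REGION-typed column OVERLAPS the parsed literal."""
    return [row for row in rows if row[column].overlaps(literal)]
```

For example, with rows whose energy values are [0, 1], [1.5, 2.5] and the single value 3, the literal [2, 3] selects the last two. The test itself is the standard overlap condition, which in plain SQL would be a pair of ordinary comparisons (energy_lo <= 3.0 AND energy_hi >= 2.0, for hypothetical columns energy_lo and energy_hi).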

Alex mentioned a few things, 3 of which fit in fine and the 4th --
expressing unions and intersections and such -- we thought maybe too much
for the query language, but could be discussed.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de données astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
