REGION

Alex Szalay szalay at jhu.edu
Tue May 8 06:25:47 PDT 2007


I have an other concern about how we set out to do REGION. In the WHERE
clause of SQL every item is a Boolean expression which evaluates to true
or false. These have the form of value OPERATOR (value|set), the latter 
for the case of IN. In any case, the left hand side of a where clause is
always a scalar value evaluated from the next tuple. In our current 
REGION syntax we have a list of columns on the left hand side, not a scalar,
quite different from anything else in the SQL WHERE.

After being negative, here is a suggestion, that came up today in a
discussion with Benjamin. He did not quite like it, but I think it is worth
mentioning it. This may have come up before I was involved in the VOQL, and
discarded, but let me try it anyway. These are of course rough examples.

Consider a boolean built-in REGION function, with possibly a variable 
number of arguments. We can then write a spatial query like

	SELECT objid, ra,dec FROM Catalog c
	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.ra, c.dec)
or	
	SELECT objid, ra,dec FROM Catalog c
	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.x,c.y,c.z)
or 
	SELECT objid, gl,gb FROM Catalog c
	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','GII',c.gl,c.gb)

One can also do the "get all regions which contain the point" query. Assume
that we have a table of regions. Then

	SELECT regionid FROM regionTable r
	WHERE REGION.CONTAINS(r.regionDef,'ICRS',185,32.5)

We can also have an intersect function, which can say if two regions
intersect, like

	SELECT regionid FROM regionTable
	WHERE REGION.INTERSECTS(regionDef,'CIRCLE ICRS 185 32.5 0.5')

These are all automatically valid SQL statements, provided that the 
mapping of the actual parameters on to the precise definition of
the function is correct. One can even think of implementing this
directly... and extending it later is trivial (REGION.DISJOINT, etc).

The . notation is also arising in the INFORMATION_SCHEMA.* cloud, so it
is not unlike existing stuff. By the way, in SDSS we have 30,000 regions
stored, and I believe that Gretchen has over 100K in her HST exposures
database. So I think we will store regions in the database. 

Let me know what you think. Sorry for the two page size.

--Alex


-----Original Message-----
From: owner-voql-teg at eso.org [mailto:owner-voql-teg at eso.org] On Behalf Of
Patrick Dowler
Sent: Monday, May 07, 2007 2:06 PM
To: 'VOQL-TEG'
Subject: Re: REGION


Many small comments interspersed...

On Saturday 05 May 2007 00:40, Alex Szalay wrote:
> This is a good start, but I think that we need a much clearer focus.
> Also, after reading this I still feel confused what a REGION datatype is.
> I will try to keep my comments short.
>
> In a typical spatial framework there are several different spatial
> datatypes
>
> (POINTSET, LINESET, POLYGON). These datatypes are typically not simpole,
> even
> the description of a point can be quite complex (see STC), not to mention
a
> complex region. Of course these can be serialized into a string. But I
> would not want to put the coordinates into "ra dec" strings.

REGION is the base class of all these things you mention. If that is not the

correct word (from STC or otherwise) we can pick a different one, but the 
idea is that it is the base class, which makes it the most general and we
can 
do quite a lot now and in future without adding more reserved words (eg 
without changing ADQL).

Yes, they are complex (in STC) but I don't see any alternative to that. I
have 
not looked at STC closely and considered it more or less pre-ordained that
we 
should use whatever VO-standard coordinate system spec is available. I don't

want to put a huge chunk of XML inside the REGION("...")... 

> Of course here Pat and Benjamin also want to
> extend this to even more abstract concepts like time and energy intervals,
> that none of the GIS systems do, although for intervals I think the
BETWEEN
> clause (or several for a more complex interval set) might just do the job.
>
> One can then define various RELATIONS and various OPERATIONS between them.
> The relations can be (CONTAINS, TOUCHES, DISJOINT, INTERSECT,...)
> understood
> as an enumerated return value from an operation between two different
> spatial objects.

Yes, all these operations are possible with geometry. However, I think 
OVERLAPS comparison (aka INTERSECT) covers 95% of the uses without blocking 
future enhancements (eg if a future version of ADQL introduces CONTAINS it 
would keep OVERLAPS and the meaning would not change). I am happy enough to 
re-use the INTERSECT reserved word if others are happy with that. 

> One can also have OPERATIONS among spatial objects, these are
> (INTERSECTION, UNION, DIFFERENCE) which form a Boolean algebra, 
> with some restrictions. These return another spatial object.

IMO, too fancy for ADQL 1.0


> If we only restrict ourselves to POINTSETS (our catalogs) and POLYGONS
(say
> =REGIONS) there are still many different things we might want to do. These
> are all questions that the SDSS users have neen asking from the database
as
> part of their research
>
> (1) Give me all the POINTS within a REGION from a certain set of tables

SELECT * FROM <table>
WHERE <position column> OVERLAPS REGION("...")

I don't see any problem treating points as regions (with 0 area). Some 
geometry libraries do this, others do not.

> (2) Given me all the POINTS which are within 10 arcsec to a REGION
(errors)

This is just a different region, the client software can (should) implement 
this feature.

> (3) Tell me if this POINT is within this REGION

I don't see how this differs from #1. Is it just that both args to OVERLAPS 
are literals? That should be OK.

> (4) Which REGIONS in the database contain this POINT (is it in the photo
> 	footprint but not in the spectro, for example)

SELECT * FROM <table>
WHERE <bounds column> OVERLAPS REGION("123 45 ICRS")

(for example, I don't know if that is the correct string version of a
point).

> (5) What is the distance of this point to the boundary

IMO, too fancy for ADQL 1.0

> (6) What percent of this points 30" neighborhood is inside the survey
> footprint

IMO, too fancy for ADQL 1.0

> One can also think of storing REGION (POLYGON) data in the database, and
> perform operations on those plus the incoming user defined regions. This
is
> a
> very complex task and to do this efficiently, one typically needs a binary
> representation inside the DB, i.e. an object oriented or an object
> relational DB. I do not want to go there, since my one page is up.

I agree technically but conceptually it is quite straightforward to define
the 
basic comparison operator (overlaps) that is applicable to the base type 
(region). That is what we have done and it covers ~95% of the ADQL use cases

and is "future proof" in the sense that further advanced features do not
make 
it incorrect or out of place (at least, that is a concern and I think the 
proposal satisfies it).

Also, despite there being a column of type REGION, I don't envision people 
actually storing regions in the DB... it is intended to be a logical 
declaration only.

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)





More information about the voql-teg mailing list