REGION

Iñaki Ortiz de Landaluce Inaki.Ortiz at sciops.esa.int
Wed May 9 07:01:48 PDT 2007


Hi there,

Alex suggested that we have a look at the NVO Footprint Service to check
whether it can be of use within ADQL. As he is traveling today and
might not be able to start the discussion, I will comment on it here,
pointing to the relevant pages.
In our opinion, everything we need to specify a region (though not the
full predicate construct) is already there.  From the main pages of the
Footprint Service (announced last November, 
http://www.voservices.net/footprint/), you can access a link to a very 
simple and useful grammar definition that we could use:
 http://www.pha.jhu.edu/~gyuri/htm/HtmPrimer/regiongrammar.html
This looks to us like a clean and reasonable approach. So please disregard
my previous suggestion of contacting Arnold Rots for clarification on a
possible STC-like string serialization. Fortunately, everything is
already there :-)
Please tell us what you think.
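
To make the function-style predicate from Alex's message (quoted below)
a bit more concrete, here is a rough Python sketch of how CONTAINS and
INTERSECTS might be evaluated for circular regions. It is only an
illustration under stated assumptions: the string form
'CIRCLE <frame> <ra> <dec> <radius>' with all values in degrees is taken
from Alex's examples, not from the Footprint Service grammar; the names
parse_circle, region_contains and region_intersects are hypothetical;
and conversion between coordinate systems is deliberately left out.

```python
import math
from typing import NamedTuple


class Circle(NamedTuple):
    frame: str     # coordinate system label, e.g. "ICRS"
    ra: float      # centre longitude, degrees
    dec: float     # centre latitude, degrees
    radius: float  # angular radius, degrees


def parse_circle(text: str) -> Circle:
    """Parse 'CIRCLE <frame> <ra> <dec> <radius>' (assumed form, degrees)."""
    parts = text.split()
    if len(parts) != 5 or parts[0].upper() != "CIRCLE":
        raise ValueError(f"not a circle specification: {text!r}")
    return Circle(parts[1].upper(), float(parts[2]), float(parts[3]), float(parts[4]))


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def region_contains(region: str, frame: str, ra: float, dec: float) -> bool:
    """True if the point (ra, dec), given in `frame`, lies inside the circle."""
    circle = parse_circle(region)
    if circle.frame != frame.upper():
        # A real service would transform coordinates here; the sketch does not.
        raise NotImplementedError("frame conversion is outside this sketch")
    return angular_separation(circle.ra, circle.dec, ra, dec) <= circle.radius


def region_intersects(region_a: str, region_b: str) -> bool:
    """True if two circles, given in the same frame, overlap or touch."""
    a, b = parse_circle(region_a), parse_circle(region_b)
    if a.frame != b.frame:
        raise NotImplementedError("frame conversion is outside this sketch")
    return angular_separation(a.ra, a.dec, b.ra, b.dec) <= a.radius + b.radius


if __name__ == "__main__":
    print(region_contains("CIRCLE ICRS 185 32.5 0.5", "ICRS", 185, 32.5))
    print(region_intersects("CIRCLE ICRS 185 32.5 0.5", "CIRCLE ICRS 185 34 0.25"))
```

Both functions return a plain boolean, which is the property that lets
the predicate sit in a WHERE clause like any other condition.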
Inaki

Iñaki Ortiz de Landaluce wrote:
> Hi All
>
> I tend to agree with Alex that we should build the REGION definition as
> a function returning a boolean rather than adding new comparison
> operators. I feel that the region definition and the column descriptions
> should be tied together as they influence each other. The algorithm
> resulting from the ADQL translation depends on the region type, the
> coordinate system and the operator/verb. Therefore I think it's better
> to have everything encapsulated into one single entity.
>
> Jeff sent some comments on this issue a few months ago. He was
> basically proposing either building a function-like region predicate or
> using the derived table approach. In the last draft of the specification
> I sketched a tentative region predicate definition consistent with
> Jeff's first approach, which I personally think is fine (see appendices
> A and B). As you can see it is far from complete: it is missing the STC
> bits (and their usage, which is not trivial at all!). The relevant
> syntax rules were as follows:
>
> <predicate> ::= <comparison_predicate> | <between_predicate> |
> <in_predicate> | <like_predicate> | <null_predicate>
> | <exists_predicate> | <region_predicate>
>
> <region_predicate> ::=
> <table_reference> <period> REGION
> <left_paren>
> <region_specification>
> [ <comma> <column_reference> [ { <comma> <column_reference> }... ] ]
> <right_paren>
>
> <region_specification> ::= !! Region definition from STC schema here
>
> I believe all this fits with Alex's suggestion. We would just need to 
> qualify the REGION keyword with a verb from a list of possible values. 
> This list could be extended but I would propose, for the time being, to 
> include INTERSECTS/OVERLAPS (and CONTAINS?) only. This could be 
> described in BNF as follows:
>
> <region_predicate> ::=
> REGION<period><region_verbs>
> <left_paren>
> <region_specification>
> [ <comma> <column_reference> [ { <comma> <column_reference> }... ] ]
> <right_paren>
>
> <region_verbs> ::= INTERSECTS | CONTAINS ...
>
> Therefore, the missing bit would just be the <region_specification> 
> syntax rule definition, for which we should ask for inputs from the STC 
> experts, at least for circular and rectangular shapes. In any case, 
> Alex's example of the circular region, which I guess comes from the old 
> ADQL, looks good to me.
>
> Inaki
>
>
> Alex Szalay wrote:
>   
>> I have another concern about how we set out to do REGION. In the WHERE
>> clause of SQL every item is a Boolean expression which evaluates to true
>> or false. These have the form of value OPERATOR (value|set), the latter 
>> for the case of IN. In any case, the left hand side of a where clause is
>> always a scalar value evaluated from the next tuple. In our current 
>> REGION syntax we have a list of columns on the left hand side, not a scalar,
>> quite different from anything else in the SQL WHERE.
>>
>> Having been negative, here is a suggestion that came up today in a
>> discussion with Benjamin. He did not quite like it, but I think it is worth
>> mentioning. This may have come up before I was involved in the VOQL, and
>> been discarded, but let me try it anyway. These are of course rough examples.
>>
>> Consider a boolean built-in REGION function, with possibly a variable 
>> number of arguments. We can then write a spatial query like
>>
>> 	SELECT objid, ra,dec FROM Catalog c
>> 	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.ra, c.dec)
>> or	
>> 	SELECT objid, ra,dec FROM Catalog c
>> 	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','ICRS',c.x,c.y,c.z)
>> or 
>> 	SELECT objid, gl,gb FROM Catalog c
>> 	WHERE REGION.CONTAINS('CIRCLE ICRS 185 32.5 0.5','GII',c.gl,c.gb)
>>
>> One can also do the "get all regions which contain the point" query. Assume
>> that we have a table of regions. Then
>>
>> 	SELECT regionid FROM regionTable r
>> 	WHERE REGION.CONTAINS(r.regionDef,'ICRS',185,32.5)
>>
>> We can also have an intersect function, which can say if two regions
>> intersect, like
>>
>> 	SELECT regionid FROM regionTable
>> 	WHERE REGION.INTERSECTS(regionDef,'CIRCLE ICRS 185 32.5 0.5')
>>
>> These are all automatically valid SQL statements, provided that the 
>> mapping of the actual parameters on to the precise definition of
>> the function is correct. One can even think of implementing this
>> directly... and extending it later is trivial (REGION.DISJOINT, etc).
>>
>> The . notation is also arising in the INFORMATION_SCHEMA.* cloud, so it
>> is not unlike existing stuff. By the way, in SDSS we have 30,000 regions
>> stored, and I believe that Gretchen has over 100K in her HST exposures
>> database. So I think we will store regions in the database. 
>>
>> Let me know what you think. Sorry for the two page size.
>>
>> --Alex
>>
>>
>> -----Original Message-----
>> From: owner-voql-teg at eso.org [mailto:owner-voql-teg at eso.org] On Behalf Of
>> Patrick Dowler
>> Sent: Monday, May 07, 2007 2:06 PM
>> To: 'VOQL-TEG'
>> Subject: Re: REGION
>>
>>
>> Many small comments interspersed...
>>
>> On Saturday 05 May 2007 00:40, Alex Szalay wrote:
>>> This is a good start, but I think that we need a much clearer focus.
>>> Also, after reading this I still feel confused about what a REGION
>>> datatype is. I will try to keep my comments short.
>>>
>>> In a typical spatial framework there are several different spatial
>>> datatypes (POINTSET, LINESET, POLYGON). These datatypes are typically
>>> not simple, even the description of a point can be quite complex (see
>>> STC), not to mention a complex region. Of course these can be
>>> serialized into a string. But I would not want to put the coordinates
>>> into "ra dec" strings.
>>
>> REGION is the base class of all these things you mention. If that is not
>> the correct word (from STC or otherwise) we can pick a different one, but
>> the idea is that it is the base class, which makes it the most general and
>> we can do quite a lot now and in future without adding more reserved words
>> (eg without changing ADQL).
>>
>> Yes, they are complex (in STC) but I don't see any alternative to that. I
>> have not looked at STC closely and considered it more or less pre-ordained
>> that we should use whatever VO-standard coordinate system spec is
>> available. I don't want to put a huge chunk of XML inside the
>> REGION("...")...
>>
>>> Of course here Pat and Benjamin also want to
>>> extend this to even more abstract concepts like time and energy intervals,
>>> that none of the GIS systems do, although for intervals I think the
>>> BETWEEN clause (or several for a more complex interval set) might just
>>> do the job.
>>>
>>> One can then define various RELATIONS and various OPERATIONS between them.
>>> The relations can be (CONTAINS, TOUCHES, DISJOINT, INTERSECT,...)
>>> understood as an enumerated return value from an operation between two
>>> different spatial objects.
>>
>> Yes, all these operations are possible with geometry. However, I think 
>> OVERLAPS comparison (aka INTERSECT) covers 95% of the uses without blocking 
>> future enhancements (eg if a future version of ADQL introduces CONTAINS it 
>> would keep OVERLAPS and the meaning would not change). I am happy enough to 
>> re-use the INTERSECT reserved word if others are happy with that. 
>>
>>> One can also have OPERATIONS among spatial objects, these are
>>> (INTERSECTION, UNION, DIFFERENCE) which form a Boolean algebra,
>>> with some restrictions. These return another spatial object.
>>
>> IMO, too fancy for ADQL 1.0
>>
>>
>>> If we only restrict ourselves to POINTSETS (our catalogs) and POLYGONS
>>> (say =REGIONS) there are still many different things we might want to
>>> do. These are all questions that the SDSS users have been asking of the
>>> database as part of their research:
>>>
>>> (1) Give me all the POINTS within a REGION from a certain set of tables
>>
>> SELECT * FROM <table>
>> WHERE <position column> OVERLAPS REGION("...")
>>
>> I don't see any problem treating points as regions (with 0 area). Some 
>> geometry libraries do this, others do not.
>>
>>> (2) Give me all the POINTS which are within 10 arcsec of a REGION
>>> (errors)
>>
>> This is just a different region, the client software can (should) implement 
>> this feature.
>>
>>> (3) Tell me if this POINT is within this REGION
>>
>> I don't see how this differs from #1. Is it just that both args to OVERLAPS 
>> are literals? That should be OK.
>>
>>> (4) Which REGIONS in the database contain this POINT (is it in the photo
>>> 	footprint but not in the spectro, for example)
>>
>> SELECT * FROM <table>
>> WHERE <bounds column> OVERLAPS REGION("123 45 ICRS")
>>
>> (for example, I don't know if that is the correct string version of a
>> point).
>>
>>> (5) What is the distance of this point to the boundary
>>
>> IMO, too fancy for ADQL 1.0
>>
>>> (6) What percent of this point's 30" neighborhood is inside the survey
>>> footprint
>>
>> IMO, too fancy for ADQL 1.0
>>
>>> One can also think of storing REGION (POLYGON) data in the database, and
>>> perform operations on those plus the incoming user defined regions. This
>>> is a very complex task and to do this efficiently, one typically needs a
>>> binary representation inside the DB, i.e. an object oriented or an
>>> object relational DB. I do not want to go there, since my one page is up.
>>
>> I agree technically but conceptually it is quite straightforward to define
>> the basic comparison operator (overlaps) that is applicable to the base
>> type (region). That is what we have done and it covers ~95% of the ADQL
>> use cases and is "future proof" in the sense that further advanced
>> features do not make it incorrect or out of place (at least, that is a
>> concern and I think the proposal satisfies it).
>>
>> Also, despite there being a column of type REGION, I don't envision people 
>> actually storing regions in the DB... it is intended to be a logical 
>> declaration only.
>>
>
> ================================================================================================
> This message and any attachments are intended for the use of the addressee or addressees only. The
> unauthorised disclosure, use, dissemination or copying (either in whole or in part) of its content
> is prohibited. If you received this message in error, please delete it from your system and notify
> the sender. E-mails can be altered and their integrity cannot be guaranteed. ESA shall not be liable
> for any e-mail if modified.
> =================================================================================================



