MyUCDs & Registry

Tony Linde ael at star.le.ac.uk
Tue May 20 08:22:28 PDT 2003


Hi Tom,

> I find this unconvincing.  How is a 'generic' query to be 
> able to do anything with column names?

The column names have to be inserted into the query before it is sent out.
It may start out as generic but, in order to be resolved at the dataset end,
it must have some specificity added.

> We have no standardized mechanism for specifying 
> the names of columns (and as far as I know we have not even 
> begun an effort to define such) so manifestly it is currently 
> impossible to use column names for a generic query.

That's why we have these working groups :)

I've proposed a possible approach to inclusion of names in my original
discourse:
  http://www.ivoa.net/twiki/bin/view/IVOA/TonyOnUCDs
where the <useColumn columnName="DEJ2000" inResource="cat2" /> tag modifies
the generic <field ucd="POS_EQ_DEC"> tag of which it is a part.
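As a rough sketch of how that override could behave (an illustration only,
not something specified on the page above): the registry layout, the second
column name "DEB1950", the catalogue "cat1" and the function resolve_column
are all hypothetical. Only the <field> and <useColumn> fragments come from
the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry content: resource -> UCD -> columns carrying that UCD.
REGISTRY = {
    "cat1": {"POS_EQ_DEC": ["DEC"]},
    "cat2": {"POS_EQ_DEC": ["DEJ2000", "DEB1950"]},
}

# Generic field with one per-resource override, as in the proposal.
FIELD_XML = """
<field ucd="POS_EQ_DEC">
  <useColumn columnName="DEJ2000" inResource="cat2" />
</field>
"""


def resolve_column(field, resource, registry):
    """Return the column name to use for `field` in `resource`."""
    # An explicit <useColumn> for this resource wins over the UCD lookup.
    for override in field.findall("useColumn"):
        if override.get("inResource") == resource:
            return override.get("columnName")
    # Otherwise fall back to the UCD, which must match exactly one column.
    ucd = field.get("ucd")
    candidates = registry[resource].get(ucd, [])
    if len(candidates) == 1:
        return candidates[0]
    raise LookupError(f"{resource}: {len(candidates)} columns match UCD {ucd}")


field = ET.fromstring(FIELD_XML)
resolved = {res: resolve_column(field, res, REGISTRY) for res in REGISTRY}
print(resolved)
```

Here cat1 is resolved from the UCD alone, while cat2 needs the <useColumn>
hint because two of its columns share the UCD.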

> This doesn't mean 
> that someone couldn't make a query against the observation 
> centers, but that would be a 'manual' query.

But they are both queries and the query mechanism needs to cater for both
possibilities.

> How does any kind of 
> automated program pick between 'ra_usno' and 'ra_sdss'?

It certainly cannot, and I wasn't proposing that it could. In this case the
workflow should either stop and ask the user to intervene and resolve the
duplication, or just pick one column and go ahead (the user should set this
option at the beginning of the workflow).

But where the query is specified at the outset of constructing the workflow,
the workflow engine can consult the registry to determine if any duplication
conflicts occur and ask the user to resolve them before the job is
submitted.
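A minimal sketch of such a pre-submission check, under stated assumptions:
the registry layout, the table name "sdss_usno_xmatch", the UCD
"POS_EQ_RA" (by analogy with POS_EQ_DEC), the "cat2" columns, and the
functions find_conflicts and settle are all hypothetical. Only the column
names ra_usno and ra_sdss come from Tom's example.

```python
# Hypothetical registry content: resource -> UCD -> columns tagged with that UCD.
REGISTRY = {
    "sdss_usno_xmatch": {
        "POS_EQ_RA": ["ra_usno", "ra_sdss"],
        "POS_EQ_DEC": ["dec_usno", "dec_sdss"],
    },
    "cat2": {"POS_EQ_RA": ["RAJ2000"], "POS_EQ_DEC": ["DEJ2000"]},
}


def find_conflicts(registry, resources, ucds):
    """Map (resource, ucd) to its column list wherever a UCD is ambiguous."""
    conflicts = {}
    for resource in resources:
        for ucd in ucds:
            columns = registry.get(resource, {}).get(ucd, [])
            if len(columns) > 1:
                conflicts[(resource, ucd)] = columns
    return conflicts


def settle(conflicts, policy):
    """Apply the user's up-front choice; anything but 'pick_first' means stop."""
    if policy == "pick_first":
        # Arbitrary but predictable: take the first column listed.
        return {key: columns[0] for key, columns in conflicts.items()}
    if conflicts:
        raise RuntimeError(f"user must resolve: {sorted(conflicts)}")
    return {}


conflicts = find_conflicts(REGISTRY, ["sdss_usno_xmatch", "cat2"], ["POS_EQ_RA"])
choices = settle(conflicts, policy="pick_first")
print(conflicts)
print(choices)
```

With the policy set to "stop" the same conflict raises an error instead, which
is the point at which the user would be asked to choose.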

> To my mind UCD's provide precisely the information that is 
> useful in making automated decisions about which column to 
> query. They identify the kind of information that is in the 

I agree completely but we need to provide mechanisms for those situations
where UCDs do not solve the problem.

Cheers,
Tony. 

PS: Sorry about the cross-post. This seems to be more of a VOQL question
now, so shall we continue the discussion on that list?

> -----Original Message-----
> From: Tom McGlynn [mailto:Thomas.A.McGlynn at nasa.gov] 
> Sent: 20 May 2003 15:22
> To: registry at ivoa.net
> Subject: Re: MyUCDs & Registry
> 
> 
> > 5. Should the Registry store the column names and units used in a 
> > catalog or data table? I would say 'definitely' for the column names 
> > and 'probably' for the units. The column names are essential to 
> > resolve duplicate UCDs before a generic query is farmed out.
> 
> Hi Tony,
> 
> I find this unconvincing.  How is a 'generic' query to be 
> able to do anything with column names?
> 
> Generic queries must use some standardized mechanism
> to identify the elements used in the query (else they aren't 
> generic).  We have no standardized mechanism for specifying 
> the names of columns (and as far as I know we have not even 
> begun an effort to define such) so manifestly it is currently 
> impossible to use column names for a generic query.
> 
> However, I believe that the currently planned implementation 
> of UCDs will in practice be adequate virtually all of the time.
> 
> The putative problem with UCD's is that multiple columns
> in a given table may share the same UCD.  Let's look at some 
> scenarios in more detail.  One example given in Cambridge was 
> an object catalog whose entries contained not only the object 
> position but the center of the image on which the object was 
> detected. This is precisely the kind of case the 'main' UCD 
> qualifier would (and does) address.  It is easily handled 
> using either the current or proposed UCD frameworks.  A user 
> querying this table is getting objects, and the position of 
> the object should be the 'main' position.  This doesn't mean 
> that someone couldn't make a query against the observation 
> centers, but that would be a 'manual' query.
> 
> There are cases where the 'main' column is not obvious.  
> E.g., we might have a table which is the cross-correlation 
> between the SDSS and USNO-B object catalogs.  This contains 
> two positions neither really subordinate to the other. Which 
> should be used in querying the table?
> 
> I'll grant the UCD's do not magically solve this problem, but 
> the column names don't really help.  How does any kind of 
> automated program pick between 'ra_usno' and 'ra_sdss'?
> 
> In practice in this case I would expect the creators of this 
> table to pick one of the sets of position -- perhaps the one 
> with the smallest error -- and suggest that this is the 
> primary set of positions.  How might they do this most 
> easily?  The 'main' qualifier in the UCD descriptions is 
> again the obvious candidate.  The choice here is a bit 
> arbitrary, but so would any automated choice based on 
> position be.  Regardless it will not matter very much, since 
> the positions will be very close to one another.
> 
> For a third scenario let's consider a catalog of gamma-ray 
> bursts where the position of each burst is given as a 
> quadrilateral in the sky with four bounding positions.  Here 
> it's clearly impossible to pick any one of these positions over 
> the others.  The best resolution of this problem is probably to 
> define a UCD for the group of columns that define the bounding 
> box.  Again having column names 
> doesn't help automated software pick a column -- this table 
> just isn't easily searchable using simple cone-search 
> techniques.  Fortunately this is a relatively rare sort of table.
> 
> 
> 
> To my mind UCD's provide precisely the information that is 
> useful in making automated decisions about which column to 
> query. They identify the kind of information that is in the 
> column in a standard way.  When there are multiple columns 
> that have the same kind of information I don't think it's 
> reasonable to expect automated choices to be made unless 
> there is a hint from the creator of the table.
> 
> This doesn't mean that we don't want to have column names in 
> the registry. They may be helpful for directed (rather than 
> automated) queries, for textual matches, and to help the user 
> understand what is in a table before actually running a query.
> 
> 
> I agree with you about units.  Basically they are an 
> implementation detail and the registry should not be exposing 
> this implementation choice.
> 
> 		Tom McGlynn
> 



