[CATALOGUE] Starting Data Model Subgroup

Mon Aug 2 13:04:30 PDT 2004

Pedro Osuna wrote:

>Dear all,
>
>at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to
>take on the vacant responsibility of coordinating the efforts in a
>"Catalogue" subgroup of the Data Model.
>  
>
It is great that someone is taking this on!

>
>DEFINITION OF A CATALOGUE
>-------------------------
>
>From "Webster's Revised Unabridged Dictionary (1913)":
>
>"[...]A list or enumeration of names, or articles arranged
>methodically, often in alphabetical order; as, a catalogue of
>the students of a college, or of books, or of the stars.[...]"
>
>
>In the case of astronomy, thus, a catalogue would be a list or
>enumeration of certain astronomical objects (to be clarified later) in a
>certain order and including certain information per object.
>
>The definition of an astronomical object in this context would vary.
>An astronomical object could be anything from Stars to Galaxies, etc.,
>but also something more general like Observations, Sources or
>Observatories.
>
>In this sense, the Catalogue data model would not have to describe the
>inner details of the object it is cataloging (those should be described
>in other data models), but only the information relevant to the
>catalogue itself.
>
I agree with everything up to this point.

>It is also true that some of the internal properties
>of the astronomical objects would appear in the catalogue itself through
>its columns.
>
Here is where I disagree.  The mere mention of columns is, in my
opinion, out of place.  The concept of rows and columns should not
appear in any component of our data model; it belongs in a relational
database data model.  Here I think we are working at a more abstract
level, in which objects may contain other objects, resulting in
tree-like structures.  We should worry about transformation into a set
of interrelated relational tables only after the VO data model for this
is complete.  I believe that Roy correctly chimed in that VOTable can
already do this, but only because Pedro incorrectly brought up the
issue of describing rows and columns.
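
A minimal Python sketch of that later step, assuming the nested objects
are written as XML: the tree is flattened into one table per element
type, and each row carries a `parent_id` link to its parent's row.  The
element names, the `id`/`parent_id` columns and the `flatten` helper are
hypothetical illustrations, not part of any IVOA specification.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical nested catalogue fragment: objects contain other objects.
DOC = """
<galaxy name="galaxy-1">
  <region type="interarm">
    <observation>
      <image filter="F814W" exptime="1200"/>
    </observation>
  </region>
  <region type="spiral arm"/>
</galaxy>
"""

def flatten(root):
    """Flatten an element tree into one table (list of row dicts) per tag.

    Every row gets a surrogate `id`, unique across all tables, and a
    `parent_id` referring to the row of its parent element (None at the root).
    """
    tables = defaultdict(list)
    counter = 0

    def visit(elem, parent_id):
        nonlocal counter
        counter += 1
        row = {**elem.attrib, "id": counter, "parent_id": parent_id}
        tables[elem.tag].append(row)
        for child in elem:
            visit(child, row["id"])

    visit(root, None)
    return dict(tables)

tables = flatten(ET.fromstring(DOC))
for name, rows in tables.items():
    print(name, rows)
```

The point of the sketch is the ordering: the tree is the model, and the
relational tables are derived from it mechanically afterwards.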

>
>For example, the XMM-Newton "1XMM" is a list of serendipitous sources
>detected by the satellite in its observing campaign. The model for this
>catalogue could consist of things like the provenance (ESA), number of
>columns (400), number of rows (~32000), etc., or it might give more
>relevant information like: column number three in the catalogue is the
>Source.likelihood where likelihood is an attribute of the Source Data
>Model.
>I think this is an interesting point for discussion.....
>
>  
>
A catalog should be a list of sourceObjects, each of which
holds/contains/aggregates Quantities.  The quantities should be allowed
to be of arbitrary depth and detail; that is, one should be free to
enter QuantitySets of QuantitySets.

To make this more concrete, let's talk about a general catalog of
galaxies.  At a minimum we wish to provide basic data about each galaxy
(i.e. simple quantities: magnitudes, RA, Dec, morphological class).
One also wants the Observations of each galaxy, such as Images.  We may
want to hold only the crucial metadata about each image (exposure time,
RA, Dec, filter) and perhaps a URL to the actual data.  But we may also
want to group these images into various regions, so we have
/galaxy/region/observation/image so far.  A Region may specify not just
a location on the celestial sphere, but also the type of region (spiral
arm, interarm, open cluster region, outer halo, etc.).  There may be
photometry catalogs created from these images that are to be included.
These catalogs should have starObjects with magnitudes, errors and
filter information, plus location pointers to pixel coordinates in the
image.  Some of the photometryCatalogs are children of images, but
others may be concatenations of several tables within a region; those
would be children of the region.  A region may also contain a crowded
subregion with higher-resolution images
(/galaxy/region/region/observation/photoCatalog).  We may want to point
out variable stars, supernovae, etc., so one has special subCatalogs of
these.  Others may have reasons to attach additional information about
the variable stars, since they may be contaminating the TRGB distances.
Finally, there are the outputs of the tip edge detectors, together with
their input parameters.
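
The "QuantitySets of QuantitySets" idea can be sketched with two
recursive Python types.  The names Quantity and QuantitySet follow this
message, but their fields and the slash-separated `find` lookup are
assumptions for illustration only, and the values are made up.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Quantity:
    name: str
    value: object
    unit: Optional[str] = None
    error: Optional[float] = None

@dataclass
class QuantitySet:
    """A named node holding Quantities and, recursively, other QuantitySets."""
    name: str
    quantities: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def find(self, path):
        """Return all nodes reached by a slash-separated path of node names.

        The first segment must match this node, e.g. 'galaxy/region/observation'.
        """
        head, _, rest = path.partition("/")
        if head != self.name:
            return []
        if not rest:
            return [self]
        return [hit for child in self.children for hit in child.find(rest)]

image = QuantitySet("image", quantities=[
    Quantity("exposureTime", 1200, "s"),
    Quantity("filter", "F814W"),
    Quantity("url", "http://example.org/img1.fits"),  # placeholder URL
])
star = QuantitySet("star", quantities=[Quantity("mag", 21.3, "mag", 0.05)])
photo = QuantitySet("photoCatalog", children=[star])
crowded = QuantitySet("region", quantities=[Quantity("type", "crowded")],
                      children=[QuantitySet("observation", children=[photo])])
region = QuantitySet("region", quantities=[Quantity("type", "interarm")],
                     children=[QuantitySet("observation", children=[image]),
                               crowded])
galaxy = QuantitySet("galaxy", quantities=[Quantity("morphologicalClass", "Sb")],
                     children=[region])

print(len(galaxy.find("galaxy/region/observation/image")))                # 1
print(len(galaxy.find("galaxy/region/region/observation/photoCatalog")))  # 1
```

Because every level is the same type, depth is unlimited and no fixed
set of columns is ever assumed.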

Columns do not mean anything in this context.  Although one could, and
will, provide a mechanism to serialize this via VOTable, a more
object-oriented method is preferred, not because it is easier for a
human to read, but because it is easier for a machine to read.  To make
it manageable for humans, one can write XSLT scripts for each object
type: they can provide skeleton views showing the general nested
structure, from which one can click on an object to display it more
completely.
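
A skeleton view would normally be an XSLT stylesheet; as a stand-in, this
Python sketch walks an XML tree and returns only the element names (plus
a `type` attribute when present), indented by depth, collapsing runs of
identical sibling tags.  The element names and the choice of `type` as
the displayed attribute are assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

DOC = """
<galaxy>
  <region type="interarm">
    <observation><image/><photoCatalog><star/><star/></photoCatalog></observation>
    <region type="crowded"><observation><photoCatalog/></observation></region>
  </region>
</galaxy>
"""

def skeleton(elem, depth=0):
    """Return outline lines showing nesting only, without the bulk data."""
    label = elem.tag
    if "type" in elem.attrib:
        label += f'[@type="{elem.attrib["type"]}"]'
    lines = ["  " * depth + label]
    # Consecutive siblings with the same tag are shown once, with a count;
    # only the first sibling of each run is expanded.
    for _, run in groupby(elem, key=lambda e: e.tag):
        run = list(run)
        sub = skeleton(run[0], depth + 1)
        if len(run) > 1:
            sub[0] += f" x{len(run)}"
        lines.extend(sub)
    return lines

print("\n".join(skeleton(ET.fromstring(DOC))))
```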

>A place to find literally thousands of catalogues is the CDS, where they
>have 5587 Catalogues available. Their classification of the catalogues
>follows the type of data they are cataloging, e.g., Astrometric Data,
>Photometric data, Spectroscopic data, etc. The same question as above
>arises: would we have to create a specific data model for each of the
>eventual astronomical object categories we are cataloging?
>
>  
>
I think this is what the DM is all about.  We are creating a spectral
object, a bandpass object, and an STCobject.  These are the building
blocks for spectralCatalog, photometricCatalog, and astrometricCatalog
respectively.  However, I think 90% of what one wants in any
astronomicalObject is satisfied by the same set of things.  The
universe, clusters of galaxies, galaxies, clusters, stars, planets and
comets can all take STC for location, Quantity for any global property,
Region for subregions, Layer for layers such as a convection zone or
mesosphere, and Members (or perhaps Parts) for component parts.
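
A sketch of one class reused for a galaxy, a region, a star and a
layer.  The fields echo the components listed above, but the simple
RA/Dec Location and plain-dict quantities are assumptions standing in
for the real STC and Quantity models.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Location:
    """Simplified stand-in for an STC position (degrees assumed)."""
    ra: float
    dec: float

@dataclass
class AstronomicalObject:
    kind: str                                       # "galaxy", "star", ...
    name: str
    location: Optional[Location] = None             # STC-like location
    quantities: dict = field(default_factory=dict)  # global properties
    regions: list = field(default_factory=list)     # e.g. spiral arm, interarm
    layers: list = field(default_factory=list)      # e.g. convection zone
    parts: list = field(default_factory=list)       # members / components

    def walk(self):
        """Yield this object, then every nested region, layer and part."""
        yield self
        for child in self.regions + self.layers + self.parts:
            yield from child.walk()

star = AstronomicalObject("star", "star-1", quantities={"spectralType": "G2V"},
                          layers=[AstronomicalObject("layer", "convection zone")])
arm = AstronomicalObject("region", "arm-1", quantities={"type": "spiral arm"},
                         parts=[star])
galaxy = AstronomicalObject("galaxy", "galaxy-1", Location(10.0, 40.0),
                            quantities={"morphologicalClass": "Sb"},
                            regions=[arm])

print([o.kind for o in galaxy.walk()])  # ['galaxy', 'region', 'star', 'layer']
```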

The real power of this schema is that one can establish a data model
schema for queries that is acceptable to all data centers, but that
completely hides each data center's internal organization.  A query for
galaxies with supergiant stars in the interarm region is a simple XPath:

//galaxy[.//region[@type="interarm"]//star//spectralType/value="supergiant"]

This query could be sent to all data centers and be decomposed at each
data center into a set of SQL queries that retrieve the appropriate data
and then construct a galaxyCatalog for output.  Used inside an XQuery,
the request could compose an alternative structure for the output XML
object, such as a starCatalog rather than a galaxyCatalog.
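
Python's standard xml.etree.ElementTree supports only a subset of XPath
(no nested path predicates), so this minimal sketch evaluates the galaxy
query in two steps: select every galaxy, then keep those containing an
interarm region with a supergiant star.  The sample document is
hypothetical; a full XPath 1.0 engine (for example lxml's `xpath()`
method) could evaluate a single expression instead.

```python
import xml.etree.ElementTree as ET

DOC = """
<galaxyCatalog>
  <galaxy name="A">
    <region type="interarm">
      <star><spectralType><value>supergiant</value></spectralType></star>
    </region>
  </galaxy>
  <galaxy name="B">
    <region type="spiral arm">
      <star><spectralType><value>supergiant</value></spectralType></star>
    </region>
    <region type="interarm">
      <star><spectralType><value>dwarf</value></spectralType></star>
    </region>
  </galaxy>
</galaxyCatalog>
"""

# ElementTree path subset: descendant steps, an attribute test, and a
# [child='text'] test on the spectralType element.
CONDITION = (".//region[@type='interarm']"
             "//star//spectralType[value='supergiant']")

def galaxies_with_interarm_supergiants(root):
    """Return galaxy elements for which CONDITION finds at least one match."""
    return [g for g in root.iter("galaxy") if g.find(CONDITION) is not None]

root = ET.fromstring(DOC)
print([g.get("name") for g in galaxies_with_interarm_supergiants(root)])  # ['A']
```

Galaxy B is excluded because its supergiant lies in a spiral-arm region,
while its interarm region holds only a dwarf.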

Ed