[CATALOGUE] Starting Data Model Subgroup
Ed Shaya
edward.j.shaya.1 at gsfc.nasa.gov
Mon Aug 2 13:04:30 PDT 2004
Pedro Osuna wrote:
>Dear all,
>
>at the last IVOA meeting in Cambridge, Boston, I approached Jonathan to
>get the vacant responsibility of coordinating the efforts in a
>"Catalogue" subgroup of the Data Model.
>
>
It is great that someone is taking this on!
>
>
>
>DEFINITION OF A CATALOGUE
>-------------------------
>
>From "Webster's Revised Unabridged Dictionary (1913)":
>
>"[...]A list or enumeration of names, or articles arranged
>methodically, often in alphabetical order; as, a catalogue of
>the students of a college, or of books, or of the stars.[...]"
>
>
>In the case of astronomy, thus, a catalogue would be a list or
>enumeration of certain astronomical objects (to be clarified later) in a
>certain order and including certain information per object.
>
>The definition of an astronomical object in this context would vary.
>An astronomical object could be anything from Stars to Galaxies, etc.,
>but also something more general like Observations, Sources or
>Observatories.
>
>In this sense, the Catalogue data model would not have to describe the
>inner details of the object it is cataloging (those should be described
>in other data models), but just the information relevant for the
>catalogue itself.
>
I agree with everything up to this point.
>It is also true that some of the internal properties
>of the astronomical objects would appear in the catalogue itself through
>its columns.
>
Here I disagree. The mere mention of columns is, in my opinion, out of
place. The concept of rows and columns should not appear in any
component of our data model; it belongs in a relational database data
model. I think we are working here at a more abstract level, in which
objects may contain other objects. This results in tree-like
structures. We should worry about transformation into a set of
interrelated relational tables only after the VO data model for this is
complete. Roy chimed in, correctly, that VOTable can already do this,
but that point came up only because Pedro (incorrectly, in my view)
brought up the issue of describing rows and columns.
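To illustrate that the relational form can be derived after the fact, here is a minimal Python sketch. It assumes a hypothetical nested-dict representation of catalog objects (the `type`, `props` and `children` keys are illustrative, not from any IVOA schema) and flattens the tree into one table per object type, linking each row to its enclosing object through a `parent_id`.

```python
from collections import defaultdict
from itertools import count


def flatten(tree):
    """Flatten a nested object tree into per-type relational tables.

    Each node is a dict with hypothetical keys: 'type' (str),
    'props' (dict of simple quantities) and 'children' (list of nodes).
    Returns {type_name: [row, ...]}; every row carries its own 'id'
    and the 'parent_id' of the enclosing object (None at the root).
    """
    tables = defaultdict(list)
    ids = count(1)

    def visit(node, parent_id):
        node_id = next(ids)
        row = {"id": node_id, "parent_id": parent_id}
        row.update(node.get("props", {}))
        tables[node["type"]].append(row)
        for child in node.get("children", []):
            visit(child, node_id)  # children point back at this node

    visit(tree, None)
    return dict(tables)


# Hypothetical example object; all names and values are placeholders.
galaxy = {
    "type": "galaxy",
    "props": {"name": "example galaxy", "morphology": "Sb"},
    "children": [
        {"type": "region", "props": {"kind": "interarm"},
         "children": [{"type": "image", "props": {"filter": "F814W"}}]},
        {"type": "region", "props": {"kind": "spiral arm"}},
    ],
}

for table, rows in flatten(galaxy).items():
    print(table, rows)
```

The tree stays the primary description; the tables are one possible serialization of it.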
>
>For example, the XMM-Newton "1XMM" is a list of serendipitous sources
>detected by the satellite in its observing campaign. The model for this
>catalogue could consist of things like the provenance (ESA), number of
>columns (400), number of rows (~32000), etc., or it might give more
>relevant information like: column number three in the catalogue is the
>Source.likelihood where likelihood is an attribute of the Source Data
>Model.
>I think this is an interesting point for discussion.....
>
>
>
A catalog should be a list of sourceObjects that hold/contain/aggregate
Quantities. The quantities should be allowed to be of arbitrary depth
and detail; that is, one should be free to enter QuantitySets of
QuantitySets. To make this more concrete, let's talk about a general
catalog of galaxies. We wish to provide, at a minimum, basic data about
each galaxy (i.e., simple quantities: magnitudes, RA, Dec, morphological
class). One also wants the Observations of each galaxy, such as Images.
We may just want to hold crucial metadata about each image (exposure
time, RA, Dec, filter) and perhaps a URL to the actual data. But we may
want to group these images into various regions. So far, then, we have
/galaxy/region/observation/image. Region may specify not just the
location on the celestial sphere, but also give information on the type
of region (spiral arm, interarm, open cluster region, outer halo, etc.).
There may be photometry catalogs created from these images that are to
be included. These catalogs should have starObjects with magnitudes,
errors and filter info, and location pointers to pixel coordinates in
the image. Some of the photometryCatalogs are children of images, but
some may be concatenations of several tables within a region; such a
catalog would be a child of the region. The region may also contain
some higher-resolution images of a crowded area
(/galaxy/region/region/observation/photoCatalog). We may want to point
out variable stars, supernovae, etc., so one has special subCatalogs of
these. Others may have reasons to attach additional info about the
variable stars, since they may be messing up the TRGB distances.
Finally, there are the outputs of the tip edge detectors and their
input parameters as well.
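A minimal Python sketch of such an arbitrarily deep catalog entry follows. The class names `Quantity` and `SourceObject` echo the terms used above, but their fields are assumptions made for illustration, and every value is a placeholder. A `Quantity` value may itself be a list of Quantities, which gives QuantitySets of QuantitySets; `paths()` lists the nested structure in the same /galaxy/region/... notation.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Quantity:
    """A named value with optional unit and error (fields are assumed).

    'value' may itself be a list of Quantity objects, which gives
    QuantitySets of QuantitySets to any depth.
    """
    name: str
    value: Any
    unit: Optional[str] = None
    error: Optional[float] = None


@dataclass
class SourceObject:
    """A catalog entry: its own Quantities plus nested child objects."""
    kind: str  # plays the role of the element name: galaxy, region, ...
    quantities: List[Quantity] = field(default_factory=list)
    children: List["SourceObject"] = field(default_factory=list)

    def add(self, child: "SourceObject") -> "SourceObject":
        """Attach a child and return it, so nesting can be chained."""
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> List[str]:
        """Return every root-to-node path, e.g. 'galaxy/region/observation'."""
        here = f"{prefix}/{self.kind}" if prefix else self.kind
        result = [here]
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Placeholder values throughout; nothing here comes from a real catalog.
galaxy = SourceObject("galaxy", [
    Quantity("morphologicalClass", "Sc"),
    Quantity("ra", 150.0, "deg"),
    Quantity("dec", 2.0, "deg"),
    Quantity("photometry", [Quantity("V", 12.3, "mag", 0.05),
                            Quantity("I", 11.1, "mag", 0.04)]),
])
interarm = galaxy.add(SourceObject("region", [Quantity("type", "interarm")]))
interarm.add(SourceObject("observation")).add(
    SourceObject("image", [Quantity("exposureTime", 1200.0, "s"),
                           Quantity("filter", "F555W")]))
crowded = interarm.add(SourceObject("region", [Quantity("type", "crowded")]))
crowded.add(SourceObject("observation")).add(SourceObject("photoCatalog"))

for p in galaxy.paths():
    print(p)
```

Nothing in the structure fixes the depth in advance; a region inside a region is just another child.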
Columns do not mean anything in this context. Although one could, and
will, provide a mechanism to serialize this via VOTable, a more
object-oriented method is preferred, not because it is easier for
humans to read, but because it is easier for machines to read. To make
it manageable for humans, one has XSLT scripts for each object type.
One can provide skeleton views to see the general nested structure and
then click on an object to display it more completely.
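The skeleton view can be sketched in a few lines. The text proposes XSLT; since the Python standard library has no XSLT processor, this sketch swaps in a small recursive function over `xml.etree.ElementTree`. The element names in `CATALOG_XML` are hypothetical and the values are placeholders. Runs of same-named siblings are collapsed into one line with a count, and only the first element of each run is expanded.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialized catalog; element names and values are placeholders.
CATALOG_XML = """
<galaxyCatalog>
  <galaxy name="example-1">
    <morphologicalClass>Sc</morphologicalClass>
    <region type="interarm">
      <observation>
        <image filter="F555W" exposureTime="1200"/>
        <photoCatalog>
          <star><mag filter="F555W" error="0.05">24.1</mag></star>
          <star><mag filter="F555W" error="0.07">24.6</mag></star>
        </photoCatalog>
      </observation>
    </region>
  </galaxy>
</galaxyCatalog>
"""


def skeleton(elem, depth=0):
    """Return indented element names, collapsing runs of same-tag siblings.

    Only the first element of each run is expanded, so the output shows
    the general nested structure rather than every object.
    """
    lines = [("  " * depth) + elem.tag]
    children = list(elem)
    i = 0
    while i < len(children):
        tag = children[i].tag
        j = i
        while j < len(children) and children[j].tag == tag:
            j += 1  # find the end of this run of identical tags
        sub = skeleton(children[i], depth + 1)
        if j - i > 1:
            sub[0] += f" (x{j - i})"
        lines.extend(sub)
        i = j
    return lines


root = ET.fromstring(CATALOG_XML.strip())
print("\n".join(skeleton(root)))
```

"Clicking on an object" would then amount to rendering one selected element in full, for example with `ET.tostring(element, encoding="unicode")`.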
>A place to find literally thousands of catalogues is the CDS, where they
>have 5587 Catalogues available. Their classification of the catalogues
>follows the type of data they are cataloging, e.g., Astrometric Data,
>Photometric Data, Spectroscopic Data, etc. The same question as above
>arises: would we have to create a specific data model for each of the
>eventual astronomical object categories we are cataloging?
>
>
>
I think this is what the DM is all about. We are creating a spectral
object, a bandpass object, and an STCobject. These are building blocks
for spectralCatalog, photometricCatalog, and astrometricCatalog
respectively. However, I think 90% of what one wants in any
astronomicalObject is satisfied by the same set of things. Universe,
cluster of galaxies, galaxy, cluster, star, planet, and comet can all
take STC for location, Quantity for any global property, Region for
subregions, Layer for layers like the convection zone or mesosphere,
and Members or perhaps Parts for component parts.
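One way to picture "the same set of things" is a single class reused for every kind of object. This is a minimal sketch under stated assumptions: the field names are illustrative stand-ins for STC, Quantity, Region, Layer and Members, `location` is just a string placeholder for a real STC description, and `walk()` is a hypothetical helper that visits every nested part.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class AstronomicalObject:
    """One schema shared by universe, galaxy, star, planet, comet, ...

    Field names are illustrative; they are not taken from any IVOA model.
    """
    kind: str
    name: str
    location: Optional[str] = None  # stand-in for a full STC description
    quantities: Dict[str, Any] = field(default_factory=dict)
    regions: List["AstronomicalObject"] = field(default_factory=list)
    layers: List["AstronomicalObject"] = field(default_factory=list)
    members: List["AstronomicalObject"] = field(default_factory=list)

    def walk(self) -> Iterator["AstronomicalObject"]:
        """Yield this object, then every nested region, layer and member."""
        yield self
        for child in (*self.regions, *self.layers, *self.members):
            yield from child.walk()


sun = AstronomicalObject(
    "star", "Sun",
    quantities={"spectralType": "G2V"},
    layers=[AstronomicalObject("layer", "convection zone")],
)
earth = AstronomicalObject(
    "planet", "Earth",
    layers=[AstronomicalObject("layer", "mesosphere")],
)
# "planetarySystem" is a hypothetical kind used only for this example.
system = AstronomicalObject("planetarySystem", "Solar System",
                            members=[sun, earth])

print([(o.kind, o.name) for o in system.walk()])
```

A star and a planet differ here only in their `kind` and contents, not in their schema.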
The real power of this schema is that one can establish a data model
schema for queries that is acceptable to all data centers, but that
completely hides each data center's internal organization. A query for
galaxies with supergiant stars in the interarm region is a simple XPath:
//galaxy[.//region[@type="interarm"]//star//spectralType/value="supergiant"]
This query could be sent to all data centers and be decomposed at each
data center into a set of SQL queries that retrieve the appropriate
data and then construct a galaxyCatalog for output. Used inside an
XQuery, the request could compose an alternate structure for the output
XML object, such as a starCatalog rather than a galaxyCatalog.
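The decomposition step could look roughly like the following Python sketch using `sqlite3` and `xml.etree.ElementTree`. Everything about the data center is assumed: the `galaxy`/`region`/`star` tables, their columns and rows are hypothetical, and the SQL is a hand-written equivalent of the XPath query above written as a galaxy-selecting predicate. No automatic XPath-to-SQL translator is implied. The hits are wrapped in a `galaxyCatalog` element for output.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical internal schema of one data center; users never see it.
SCHEMA = """
CREATE TABLE galaxy (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE region (id INTEGER PRIMARY KEY, galaxy_id INTEGER, type TEXT);
CREATE TABLE star   (id INTEGER PRIMARY KEY, region_id INTEGER,
                     spectral_type TEXT);
"""

# Hand-written SQL for
# //galaxy[.//region[@type="interarm"]//star//spectralType/value="supergiant"]
QUERY = """
SELECT DISTINCT g.id, g.name
FROM galaxy g
JOIN region r ON r.galaxy_id = g.id
JOIN star s   ON s.region_id = r.id
WHERE r.type = 'interarm' AND s.spectral_type = 'supergiant'
ORDER BY g.id
"""


def galaxy_catalog(conn):
    """Run the decomposed query and wrap the hits in a galaxyCatalog."""
    catalog = ET.Element("galaxyCatalog")
    for gid, name in conn.execute(QUERY):
        ET.SubElement(catalog, "galaxy", id=str(gid), name=name)
    return catalog


# Placeholder rows: only gal-A has a supergiant in an interarm region.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO galaxy VALUES (?, ?)",
                 [(1, "gal-A"), (2, "gal-B")])
conn.executemany("INSERT INTO region VALUES (?, ?, ?)",
                 [(10, 1, "interarm"), (11, 2, "spiral arm")])
conn.executemany("INSERT INTO star VALUES (?, ?, ?)",
                 [(100, 10, "supergiant"), (101, 10, "dwarf"),
                  (102, 11, "supergiant")])

print(ET.tostring(galaxy_catalog(conn), encoding="unicode"))
```

These joins only cover stars directly inside a region that sits directly inside a galaxy. Honoring the `//` steps over nested regions (/galaxy/region/region/...) would need something like a recursive common table expression.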
Ed