UCDs status and perspectives
Kirk Borne
borne at rings.gsfc.nasa.gov
Tue Mar 25 14:30:55 PST 2003
Ray: I believe that your suggestions carry a lot of merit. I had a
nagging feeling when reading Jonathan's comments that our metadata
efforts could easily diverge (as you suggest) if we are not careful.
The data model is key to this -- consequently, the ability of XML
Schema to carry the knowledge about both the data model and the
metadata relationships, using standardized techniques, makes a lot
of sense -- especially with regard to reconciling the two metadata
approaches that you mention. I think that disaster can be averted,
and Jonathan's and your suggestions can illuminate the way. The
PHOT_FLUX(PHOT_BAND_ID) is an excellent example of the problem
and a good solution.
- Kirk
> Date: Tue, 25 Mar 2003 15:15:13 -0600 (CST)
> From: Ray Plante <rplante at poplar.ncsa.uiuc.edu>
> To: ucd at ivoa.net
> Subject: Re: UCDs status and perspectives
>
> We have been bouncing around in our community two models for tagging
> metadata that we will eventually need to reconcile. One is essentially
> XML-based and the other is based on the current UCD set. The former makes
> the most sense for descriptions stored in a registry, while the latter is
> useful for tagging a set of data (e.g. in a table column). Both are
> important and necessary. Both reflect a common data model. However, it
> would be inconvenient, if not disastrous, if there were not a direct
> correlation between the two representations.
>
> I feel the answer can be found in existing XML technologies. I would
> claim that the atomic descriptors being discussed (PAD, PCD, etc.) are
> really just a short hop from an XML model. The main thing that PCD has
> in common with XML tags is that both are essentially pointers into a
> data model. The main difference between them is that with a UCD/PCD, we
> need that pointer to be representable as a simple string (e.g. that can be
> put in the ucd attribute in a VOTable).
>
> XML has such a pointer; it's called an XPath. If we define the data
> model as an XML Schema, then our UCD/PCDs fall right out. There are other
> advantages:
> * XML Schema provides a machine readable form for a data dictionary.
> * The extensibility of XML schemas provides the extensibility for
> UCDs automatically.
> * When necessary, metadata in direct XML form (as envisioned in a
> registry) can be converted into UCD-tagged VOTable data.
> * The modeling that Jonathan proposed is largely still applicable.
> * The approach is consistent with the data modeling activities that
> have been done to date.
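
As a rough illustration of the third point above, the Python sketch below
walks a small XML description and emits one (path, value) pair per leaf
element, i.e. the kind of simple string that could sit in a VOTable ucd
attribute. The function name leaf_paths and the sample document are
assumptions made for illustration only; they are not part of any IVOA
specification.

```python
import xml.etree.ElementTree as ET

def leaf_paths(elem, prefix=""):
    """Yield (path, text) for every leaf element at or below elem."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        # A leaf: report its slash-separated path and its text content.
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)

# Sample record; the element names follow the PHOT example in this thread,
# and the values are copied from it as plain strings.
doc = ET.fromstring("<PHOT><FREQ>1.6 GHz</FREQ><FLUX>0.25 GHz</FLUX></PHOT>")
pairs = list(leaf_paths(doc))
for path, value in pairs:
    print(path, "=", value)
```

Each path here is a plain string, so it could be stored as a column tag
while still pointing back into the same data model as the XML form.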
>
> As an example, consider the example Jonathan cited:
>
> > For instance the two UCDs PHOT_FLUX_RADIO_1.4G
> > and PHOT_FLUX_RADIO_1.6G would map to a single PCD
> > PHOT_FLUX(PHOT_BAND_ID) with PHOT_BAND_ID taking the values 1.4 GHz and
> > 1.6 GHz.
>
> Suppose a data model defined more or less in the following way:
>
> <element name="PHOT">
>   <complexType>
>     <sequence>
>       <!-- fluxValue and freqValue are defined elsewhere -->
>       <element name="FLUX" type="fluxValue"/>
>       <element name="FREQ" type="freqValue"/>
>     </sequence>
>   </complexType>
> </element>
>
> In direct XML, such a flux would be rendered as:
>
> <PHOT>
>   <FREQ>1.6 GHz</FREQ>
>   <FLUX>0.25 GHz</FLUX>
> </PHOT>
>
> The XPath pointing to the flux value would be:
>
> PHOT[FREQ='1.6 GHz']/FLUX
>
> Note that there doesn't need to be a direct XML representation like the
> one shown above for the XPath to carry meaning; it points into a data
> model describing Photometry.
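
When a direct XML rendering does exist, the same XPath can be evaluated
against it. The sketch below uses Python's standard xml.etree.ElementTree,
which supports the [tag='text'] predicate used in the path. The enclosing
catalog element and the 1.4 GHz record are made up for contrast; the
1.6 GHz record copies the rendering shown in the message.

```python
import xml.etree.ElementTree as ET

# Two PHOT records inside a hypothetical <catalog> wrapper.
xml_text = """
<catalog>
  <PHOT><FREQ>1.4 GHz</FREQ><FLUX>0.31 GHz</FLUX></PHOT>
  <PHOT><FREQ>1.6 GHz</FREQ><FLUX>0.25 GHz</FLUX></PHOT>
</catalog>
"""

root = ET.fromstring(xml_text)

# The path from the message: select FLUX inside the PHOT element whose
# FREQ child has exactly the text '1.6 GHz'.
matches = root.findall("PHOT[FREQ='1.6 GHz']/FLUX")
flux_values = [m.text for m in matches]
print(flux_values)
```

The predicate is a plain string comparison, so '1.6 GHz' and '1.6GHz'
would not match each other; that is one reason a consistent pattern for
values and units matters.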
>
> To make this model work, I think we need to...
> * set some restrictions on how metadata are defined in XML Schema.
> We may consider avoiding XML attributes altogether; this would make
> XPaths simpler and it would integrate into SOAP more easily.
>
> * devise a consistent pattern representing measurements (i.e. value
> and unit). I'm *not* recommending the form in the above example.
>
> * build data models from the bottom up starting with common concepts
> (e.g. physical quantities) that can be reused in different
> contexts. (Jonathan's model is relevant here.) A frequency
> should be represented the same way whether it is describing a flux
> measurement or a bandwidth.
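
One way to picture the last two points is a single value-plus-unit type
that every higher-level concept reuses. The Python sketch below is only an
illustration under that assumption: the names Quantity, parse_quantity,
FluxMeasurement and Bandwidth are hypothetical, and the flux unit (Jy) is
an assumed example rather than something stated in the thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """A number with its unit; the shared bottom-level building block."""
    value: float
    unit: str

def parse_quantity(text: str) -> Quantity:
    """Split a string such as '1.6 GHz' into a number and a unit."""
    number, unit = text.split(maxsplit=1)
    return Quantity(float(number), unit)

@dataclass(frozen=True)
class FluxMeasurement:
    flux: Quantity
    freq: Quantity  # the frequency at which the flux was measured

@dataclass(frozen=True)
class Bandwidth:
    width: Quantity  # a frequency again, represented by the same type

obs_freq = parse_quantity("1.6 GHz")
measurement = FluxMeasurement(flux=parse_quantity("0.25 Jy"), freq=obs_freq)
band = Bandwidth(width=parse_quantity("0.1 GHz"))
print(measurement.freq, band.width)
```

Because both contexts hold the same Quantity type, any code that knows how
to read or compare a frequency works unchanged for either one.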
>
> cheers,
> Ray
+------------------------------------+-------------------------------------+
| Dr. Kirk D. Borne | mailto:Kirk.Borne at gsfc.nasa.gov |
| Institute for Science & Technology, Raytheon (IST at R) |
| NASA Goddard Space Flight Center | |
| Astrophysics Data Facility | Phone: 301-286-0696 |
| Code 631 | or 301-286-2772:Kathy Starling |
| Greenbelt, MD 20771 | FAX: 301-286-1771 |
+------------------------------------+-------------------------------------+
US Virtual Observatory: http://us-vo.org/
Staff page: http://rings.gsfc.nasa.gov/~borne/bio_borne_kirk.html