UCDs and XML
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Tue Mar 25 14:04:25 PST 2003
Please pardon if you've already seen this post to the ucd list.
---------- Forwarded message ----------
Date: Tue, 25 Mar 2003 15:15:13 -0600 (CST)
From: Ray Plante <rplante at ncsa.uiuc.edu>
To: ucd at ivoa.net
Subject: Re: UCDs status and perspectives
We have been bouncing around in our community two models for tagging
metadata that we will eventually need to reconcile. One is essentially
XML-based and the other is based on the current UCD set. The former makes
the most sense for descriptions stored in a registry, while the latter is
useful for tagging a set of data (e.g. in a table column). Both are
important and necessary. Both reflect a common data model. However, it
would be inconvenient, if not disastrous, if there were no direct
correlation between the two representations.
I feel the answer can be found in existing XML technologies. I would
claim that the atomic descriptors being discussed (PAD, PCD, etc.) are
really just a short hop from an XML model. The main thing that PCD has
in common with XML tags is that both are essentially pointers into a
data model. The main difference between them is that with a UCD/PCD, we
need that pointer to be representable as a simple string (e.g. one that
can be put in the ucd attribute in a VOTable).
XML has such a pointer; it's called an XPath. If we define the data
model as an XML Schema, then our UCD/PCDs fall right out. There are other
advantages:
* XML Schema provides a machine readable form for a data dictionary.
* The extensibility of XML schemas provides the extensibility for
UCDs automatically.
* When necessary, metadata in direct XML form (as envisioned in a
registry) can be converted into UCD-tagged VOTable data.
* The modeling that Jonathan proposed is largely still applicable.
* The approach is consistent with the data modeling activities that
have been done to date.
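The conversion in the third point can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the helper names `leaf_items` and `record_to_table` are hypothetical, putting each leaf's slash-separated path into the FIELD `ucd` attribute is a convention assumed here for illustration, and the output is a bare TABLE fragment rather than a complete, namespaced VOTable document.

```python
import xml.etree.ElementTree as ET


def leaf_items(elem, prefix=""):
    """Yield (path, text) for every leaf element, building a slash-separated path."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_items(child, path)


def record_to_table(xml_text):
    """Convert one direct-XML record into a bare VOTable-style TABLE whose
    FIELD ucd attributes carry the path of each value (assumed convention)."""
    record = ET.fromstring(xml_text)
    table = ET.Element("TABLE")
    row = ET.Element("TR")
    for i, (path, text) in enumerate(leaf_items(record)):
        # One FIELD per leaf, declared before DATA as VOTable requires.
        ET.SubElement(table, "FIELD", name=f"col{i}", ucd=path,
                      datatype="char", arraysize="*")
        ET.SubElement(row, "TD").text = text
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    tabledata.append(row)
    return table


table = record_to_table("<PHOT><FREQ>1.6 GHz</FREQ><FLUX>0.25 Jy</FLUX></PHOT>")
print(ET.tostring(table, encoding="unicode"))
```

The metadata that a registry would hold as nested XML ends up as column declarations whose ucd strings are pointers back into the same model.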
As an example, consider the case Jonathan cited:
> For instance the two UCDs PHOT_FLUX_RADIO_1.4G
> and PHOT_FLUX_RADIO_1.6G would map to a single PCD
> PHOT_FLUX(PHOT_BAND_ID) with PHOT_BAND_ID taking the values 1.4 GHz and
> 1.6 GHz.
Suppose a data model is defined more or less as follows:
<element name="PHOT">
  <complexType>
    <sequence>
      <element name="FLUX" type="fluxValue"/>  <!-- fluxValue and freqValue -->
      <element name="FREQ" type="freqValue"/>  <!-- defined elsewhere -->
    </sequence>
  </complexType>
</element>
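Because the schema is itself XML, the pointer strings can be enumerated mechanically from it. The following is a minimal Python sketch, assuming only locally declared elements with anonymous complex types (it does not follow `ref=` or named types such as `fluxValue`) and adding the XML Schema namespace declaration that the fragment above omits; `element_paths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema fragment above, with the XML Schema namespace declared.
SCHEMA = """
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="PHOT">
    <complexType>
      <sequence>
        <element name="FLUX" type="fluxValue"/>
        <element name="FREQ" type="freqValue"/>
      </sequence>
    </complexType>
  </element>
</schema>
"""


def element_paths(node, prefix=""):
    """Yield a slash-separated path for every nested element declaration."""
    for child in node:
        if child.tag == f"{XS}element":
            name = child.get("name")
            path = f"{prefix}/{name}" if prefix else name
            yield path
            yield from element_paths(child, path)
        else:
            # complexType, sequence, etc.: descend without extending the path.
            yield from element_paths(child, prefix)


for p in element_paths(ET.fromstring(SCHEMA)):
    print(p)
```

Each printed path is a candidate descriptor, so the list of UCD/PCDs is a by-product of the schema rather than a separately maintained vocabulary.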
In direct XML, such a flux would be rendered as:
<PHOT>
  <FREQ>1.6 GHz</FREQ>
  <FLUX>0.25 Jy</FLUX>
</PHOT>
The XPath pointing to the flux value would be:
PHOT[FREQ='1.6 GHz']/FLUX
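The XPath can be evaluated against direct XML with Python's `xml.etree.ElementTree`, whose `find` method supports a limited XPath subset that includes `[tag='text']` predicates. In this sketch the `<DATA>` wrapper, the 1.4 GHz record and its flux value are invented sample data, and the `LEGACY_UCD_TO_XPATH` table is a hypothetical mapping showing that the two band-specific UCDs quoted above become XPaths differing only in the FREQ predicate.

```python
import xml.etree.ElementTree as ET

# Invented sample records in direct XML form; the hypothetical <DATA>
# wrapper makes the PHOT elements children of the search root.
DOC = """
<DATA>
  <PHOT><FREQ>1.4 GHz</FREQ><FLUX>0.31 Jy</FLUX></PHOT>
  <PHOT><FREQ>1.6 GHz</FREQ><FLUX>0.25 Jy</FLUX></PHOT>
</DATA>
"""

# Hypothetical mapping from legacy UCDs to XPaths into the Photometry model.
LEGACY_UCD_TO_XPATH = {
    "PHOT_FLUX_RADIO_1.4G": "PHOT[FREQ='1.4 GHz']/FLUX",
    "PHOT_FLUX_RADIO_1.6G": "PHOT[FREQ='1.6 GHz']/FLUX",
}


def lookup(root, xpath):
    """Return the text of the first element matched by xpath, or None."""
    elem = root.find(xpath)
    return None if elem is None else elem.text


root = ET.fromstring(DOC)
for ucd, xpath in LEGACY_UCD_TO_XPATH.items():
    print(f"{ucd:22} {xpath:28} {lookup(root, xpath)}")
```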
Note that there doesn't need to be a direct XML representation like the
one shown above for the XPath to carry meaning; it points into a data
model describing Photometry.
To make this model work, I think we need to...
* set some restrictions on how metadata are defined in XML Schema.
We may consider avoiding the definition of attributes altogether;
this would make XPaths simpler and would integrate into SOAP more easily.
* devise a consistent pattern for representing measurements (i.e. value
and unit). I'm *not* recommending the form in the above example.
* build data models from the bottom up starting with common concepts
(e.g. physical quantities) that can be reused in different
contexts. (Jonathan's model is relevant here.) A frequency
should be represented the same way whether it is describing a flux
measurement or a bandwidth.
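The last two points can be illustrated together. This Python sketch assumes one possible value-and-unit pattern (separate VALUE and UNIT child elements) purely for illustration; the class names `Quantity`, `FluxMeasurement` and `Bandwidth` and the `BAND`/`WIDTH` tags are hypothetical. The point it shows is that a frequency built from the same reusable type serializes identically whether it describes a flux measurement or a bandwidth.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical reusable value-and-unit pattern for any physical quantity."""
    value: float
    unit: str

    def to_xml(self, tag: str) -> ET.Element:
        # Value and unit go in separate child elements (an assumed layout).
        elem = ET.Element(tag)
        ET.SubElement(elem, "VALUE").text = str(self.value)
        ET.SubElement(elem, "UNIT").text = self.unit
        return elem


@dataclass(frozen=True)
class FluxMeasurement:
    """A flux at a given frequency; both parts reuse Quantity."""
    flux: Quantity
    freq: Quantity

    def to_xml(self) -> ET.Element:
        phot = ET.Element("PHOT")
        phot.append(self.freq.to_xml("FREQ"))
        phot.append(self.flux.to_xml("FLUX"))
        return phot


@dataclass(frozen=True)
class Bandwidth:
    """A band described by its center frequency and width, again via Quantity."""
    center: Quantity
    width: Quantity

    def to_xml(self) -> ET.Element:
        band = ET.Element("BAND")
        band.append(self.center.to_xml("FREQ"))
        band.append(self.width.to_xml("WIDTH"))
        return band


m = FluxMeasurement(flux=Quantity(0.25, "Jy"), freq=Quantity(1.6, "GHz"))
b = Bandwidth(center=Quantity(1.6, "GHz"), width=Quantity(0.05, "GHz"))
print(ET.tostring(m.to_xml(), encoding="unicode"))
print(ET.tostring(b.to_xml(), encoding="unicode"))
```

Because both FREQ subtrees come from the same type, any tool that understands FREQ in one context understands it in the other, which is the bottom-up reuse argued for above.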
cheers,
Ray