UCDs and XML

Ray Plante rplante at poplar.ncsa.uiuc.edu
Tue Mar 25 14:04:25 PST 2003


Please pardon if you've already seen this post to the ucd list.  

---------- Forwarded message ----------
Date: Tue, 25 Mar 2003 15:15:13 -0600 (CST)
From: Ray Plante <rplante at ncsa.uiuc.edu>
To: ucd at ivoa.net
Subject: Re: UCDs status and perspectives

We have been bouncing around in our community two models for tagging
metadata that we will eventually need to reconcile.  One is essentially
XML-based and the other is based on the current UCD set.  The former makes
the most sense for descriptions stored in a registry, while the latter is
useful for tagging a set of data (e.g. in a table column).  Both are
important and necessary.  Both reflect a common data model.  However, it
would be inconvenient, if not disastrous, if there were not a direct
correlation between the two representations.

I feel the answer can be found in existing XML technologies.  I would 
claim that the atomic descriptors being discussed (PAD, PCD, etc.) are 
really just a short hop from an XML model.  The main thing that PCD has 
in common with XML tags is that both are essentially pointers into a 
data model.  The main difference between them is that with a UCD/PCD, we 
need that pointer to be representable as a simple string (e.g. one that 
can be put in the ucd attribute in a VOTable).  

XML has such a pointer; it's called an XPath.  If we define the data 
model as an XML Schema, then our UCD/PCDs fall right out.  There are other 
advantages:
  *  XML Schema provides a machine readable form for a data dictionary.
  *  The extensibility of XML schemas provides the extensibility for 
     UCDs automatically.  
  *  When necessary, metadata in direct XML form (as envisioned in a 
     registry) can be converted into UCD-tagged VOTable data.  
  *  The modeling that Jonathan proposed is largely still applicable.
  *  The approach is consistent with the data modeling activities that 
     have been done to date.
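
As a rough illustration of the third point, here is a minimal Python sketch
that flattens a small XML fragment into path/value pairs of the sort that
could tag table columns.  The fragment, the flatten() helper, and the
placeholder values are all hypothetical; nothing here is a defined VOTable
or UCD mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; element names and values are placeholders.
DOC = """
<PHOT>
  <FREQ>1.6 GHz</FREQ>
  <FLUX>0.25 GHz</FLUX>
</PHOT>
"""

def flatten(elem, prefix=""):
    """Yield (path, text) pairs for every leaf element under elem."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        # A leaf element: its slash-joined path acts as the tag string.
        yield path, (elem.text or "").strip()
    for child in children:
        yield from flatten(child, path)

columns = dict(flatten(ET.fromstring(DOC)))
for path, value in columns.items():
    print(path, "=", value)
```

Each path is a plain string, so it could in principle sit wherever a UCD
string sits today.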

As an example, consider the case Jonathan cited:

> For instance the two UCDs PHOT_FLUX_RADIO_1.4G
> and PHOT_FLUX_RADIO_1.6G would map to a single PCD
> PHOT_FLUX(PHOT_BAND_ID) with PHOT_BAND_ID taking the values 1.4 GHz and
> 1.6 GHz. 

Suppose a data model defined more or less in the following way:

 <element name="PHOT">     
   <complexType>           
     <sequence>            
       <element name="FLUX" type="fluxValue"/>  <!-- fluxValue and freqValue -->
       <element name="FREQ" type="freqValue"/>  <!--   defined elsewhere     -->
     </sequence>           
   </complexType>          
 </element>                

In direct XML, such a flux would be rendered as:

 <PHOT>
   <FREQ>1.6 GHz</FREQ>
   <FLUX>0.25 GHz</FLUX>
 </PHOT>

The XPath pointing to the flux value would be:

  PHOT[FREQ='1.6 GHz']/FLUX

Note that there doesn't need to be a direct XML representation like the
one shown above for the XPath to carry meaning; it points into a data
model describing Photometry.  
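
To show that such a pointer really is just a string that standard tools can
resolve, here is a minimal sketch using Python's xml.etree.ElementTree, which
supports only a limited subset of XPath but does handle this child-text
predicate.  The <catalog> wrapper, the flux_pointer() helper, and the 1.4 GHz
entry are assumptions added for illustration; the 1.6 GHz entry copies the
strings used above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a wrapper element holding two PHOT measurements.
CATALOG = """
<catalog>
  <PHOT><FREQ>1.4 GHz</FREQ><FLUX>0.40 GHz</FLUX></PHOT>
  <PHOT><FREQ>1.6 GHz</FREQ><FLUX>0.25 GHz</FLUX></PHOT>
</catalog>
"""

def flux_pointer(freq):
    """Build the XPath-style string pointing at the flux for one frequency."""
    return f"PHOT[FREQ='{freq}']/FLUX"

root = ET.fromstring(CATALOG)
pointer = flux_pointer("1.6 GHz")

# find() resolves the pointer relative to the wrapper element.
flux = root.find(pointer)
print(pointer, "->", flux.text)
```

The same one-parameter pattern would cover both of the UCDs in Jonathan's
example, with only the frequency string changing.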

To make this model work, I think we need to...
  * set some restrictions on how metadata are defined in XML Schema.
    We may consider avoiding the definition of attributes; this would
    make XPaths simpler and would integrate with SOAP more easily.

  * devise a consistent pattern representing measurements (i.e. value
    and unit).  I'm *not* recommending the form in the above example.

  * build data models from the bottom up starting with common concepts
    (e.g. physical quantities) that can be reused in different
    contexts.  (Jonathan's model is relevant here.)  A frequency
    should be represented the same way whether it is describing a flux
    measurement or a bandwidth.  

cheers,
Ray


