UCDs status and perspectives

Kirk Borne borne at rings.gsfc.nasa.gov
Tue Mar 25 14:30:55 PST 2003


Ray:  I believe that your suggestions carry a lot of merit.  I had a
nagging feeling when reading Jonathan's comments that our metadata
efforts could easily diverge (as you suggest) if we are not careful.
The data model is key to this -- consequently, the ability of XML
Schema to carry the knowledge about both the data model and the
metadata relationships, using standardized techniques, makes a lot
of sense -- especially with regard to reconciling the two metadata
approaches that you mention.  I think that disaster can be averted,
and Jonathan's and your suggestions can illuminate the way.  The
PHOT_FLUX(PHOT_BAND_ID) is an excellent example of the problem
and a good solution.

- Kirk


> Date: Tue, 25 Mar 2003 15:15:13 -0600 (CST)
> From: Ray Plante <rplante at poplar.ncsa.uiuc.edu>
> To: ucd at ivoa.net
> Subject: Re: UCDs status and perspectives
> 
> We have been bouncing around in our community two models for tagging
> metadata that we will eventually need to reconcile.  One is essentially
> XML-based and the other is based on the current UCD set.  The former makes
> the most sense for descriptions stored in a registry, while the latter is
> useful for tagging a set of data (e.g. in a table column).  Both are
> important and necessary.  Both reflect a common data model.  However, it
> would be inconvenient, if not disastrous, if there were not a direct
> correlation between the two representations.
> 
> I feel the answer can be found in existing XML technologies.  I would 
> claim that the atomic descriptors being discussed (PAD, PCD, etc.) are 
> really just a short hop from an XML model.  The main thing that PCD has 
> in common with XML tags is that both are essentially pointers into a 
> data model.  The main difference between them is that with a UCD/PCD, we 
> need that pointer to be representable as a simple string (e.g. that can be 
> put in the ucd attribute in a VOTable).  
> 
> XML has such a pointer; it's called an XPath.  If we define the data 
> model as an XML Schema, then our UCD/PCDs fall right out.  There are other 
> advantages:
>   *  XML Schema provides a machine readable form for a data dictionary.
>   *  The extensibility of XML schemas provides the extensibility for 
>      UCDs automatically.  
>   *  When necessary, metadata in direct XML form (as envisioned in a 
>      registry) can be converted into UCD-tagged VOTable data.  
>   *  The modeling that Jonathan proposed is largely still applicable.
>   *  The approach is consistent with the data modeling activities that 
>      have been done to date.
> 
> As an example, consider the example Jonathan cited:
> 
> > For instance the two UCDs PHOT_FLUX_RADIO_1.4G
> > and PHOT_FLUX_RADIO_1.6G would map to a single PCD
> > PHOT_FLUX(PHOT_BAND_ID) with PHOT_BAND_ID taking the values 1.4 GHz and
> > 1.6 GHz. 
> 
> Suppose a data model defined more or less in the following way:
> 
>  <element name="PHOT">     
>    <complexType>           
>      <sequence>            
>        <element name="FLUX" type="fluxValue"/>  <!-- fluxValue and freqValue -->
>        <element name="FREQ" type="freqValue"/>  <!--   defined elsewhere     -->
>      </sequence>           
>    </complexType>          
>  </element>                
> 
> In direct XML, such a flux would be rendered as:
> 
>  <PHOT>
>    <FREQ>1.6 GHz</FREQ>
>    <FLUX>0.25 GHz</FLUX>
>  </PHOT>
> 
> The XPath pointing to the flux value would be:
> 
>   PHOT[FREQ='1.6 GHz']/FLUX
> 
> Note that there doesn't need to be a direct XML representation like the
> one shown above for the XPath to carry meaning; it points into a data
> model describing Photometry.  
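
[A minimal sketch of the pointer idea above, assuming Python's standard
xml.etree.ElementTree, which supports only a limited XPath subset that
happens to include this predicate form.  The <source> wrapper element,
the flux_pointer and lookup helpers, and the flux numbers are made up
for illustration; they are not part of Ray's proposal.]

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element and illustrative values; the message above
# shows only a single <PHOT> element.
DOC = """
<source>
  <PHOT><FREQ>1.4 GHz</FREQ><FLUX>0.31</FLUX></PHOT>
  <PHOT><FREQ>1.6 GHz</FREQ><FLUX>0.25</FLUX></PHOT>
</source>
"""


def flux_pointer(freq: str) -> str:
    """Build the simple-string pointer for the flux at a given frequency."""
    return f"PHOT[FREQ='{freq}']/FLUX"


def lookup(root: ET.Element, pointer: str) -> list:
    """Return the text of every element the pointer selects under root."""
    return [el.text for el in root.findall(pointer)]


root = ET.fromstring(DOC)
pointer = flux_pointer("1.6 GHz")
fluxes = lookup(root, pointer)
print(pointer, fluxes)
```

[The same string could be carried in a ucd attribute without any XML
instance behind it; the lookup here only shows that the string resolves
to the expected element when such an instance does exist.]
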
> 
> To make this model work, I think we need to...
>   * set some restrictions on how metadata are defined in XML Schema.
>     We may consider not defining attributes; this would make
>     XPaths simpler and it would integrate into SOAP more easily.
> 
>   * devise a consistent pattern representing measurements (i.e. value
>     and unit).  I'm *not* recommending the form in the above example.
> 
>   * build data models from the bottom up starting with common concepts
>     (e.g. physical quantities) that can be reused in different
>     contexts.  (Jonathan's model is relevant here.)  A frequency
>     should be represented the same way whether it is describing a flux
>     measurement or a bandwidth.  
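
[One possible reading of the value-and-unit point, sketched in Python
with the standard xml.etree.ElementTree.  The quantity helper, the
<value> and <unit> child elements, the BANDWIDTH element, and its number
are all hypothetical; the message does not specify a pattern.  Child
elements are used instead of attributes, following the first bullet.]

```python
import xml.etree.ElementTree as ET


def quantity(tag: str, value: float, unit: str) -> ET.Element:
    """Build <tag><value>..</value><unit>..</unit></tag> with child elements only."""
    el = ET.Element(tag)
    ET.SubElement(el, "value").text = str(value)
    ET.SubElement(el, "unit").text = unit
    return el


phot = ET.Element("PHOT")
# The same helper represents a frequency in both contexts.
phot.append(quantity("FREQ", 1.6, "GHz"))
phot.append(quantity("BANDWIDTH", 0.05, "GHz"))  # made-up value

xml_text = ET.tostring(phot, encoding="unicode")
print(xml_text)
```

[Because both quantities share one shape, a path such as FREQ/unit or
BANDWIDTH/unit can be read the same way wherever a frequency appears.]
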
> 
> cheers,
> Ray


+------------------------------------+-------------------------------------+
| Dr. Kirk D. Borne                  | mailto:Kirk.Borne at gsfc.nasa.gov     |
| Institute for Science & Technology, Raytheon (IST at R)                     |
| NASA Goddard Space Flight Center   |                                     |
| Astrophysics Data Facility         | Phone: 301-286-0696                 |
| Code 631                           |     or 301-286-2772:Kathy Starling  |
| Greenbelt, MD  20771               | FAX:   301-286-1771                 |
+------------------------------------+-------------------------------------+
  US Virtual Observatory:  http://us-vo.org/
  Staff page:     http://rings.gsfc.nasa.gov/~borne/bio_borne_kirk.html


