UCD for SIAP

Doug Tody dtody at nrao.edu
Fri Jun 20 11:55:29 PDT 2003


Sebastien -

Good - we agree I think.  This is exactly the point I was trying to
make about data models and UCDs.  The attributes of formal data models
need to be defined precisely and unambiguously.  The attributes of
different data models need to be uniquely identified by some means, e.g.,
a globally unique name or reference (e.g., a form of UCD), a namespace
(e.g., our temporary VOX namespace), or some hierarchical structure as
in IDHA.  The attributes of different data models, although they need to
be distinguished from one another, may well share the same fundamental
type, and a UCD could be used to express this.

Using different approaches to naming data model attributes and types (UCDs),
as I think you are suggesting below, is one way to solve the problem.
This provides both the precision required to identify DM attributes, and the
means to associate elements of different data models for interoperability.

The only problem I see with this is that we would like flexibility in
how we represent data models and metadata.  Mapping DM attributes into
the columns of a flat table, as in SIA or in a FITS header, is convenient
and can simplify representations, up to a point.  If datasets get complex
enough then eventually one needs more structure and an approach such as
IDHA or HDX may be called for.  In many cases the simpler representation
is adequate.  It would be good if the underlying mechanisms, such as UCDs
and how we define data models, were flexible enough to permit a variety
of such representations.

If we map the attributes of a DM into table columns and we do NOT use the
UCD to identify the DM attribute, then we need another tag of some sort
for this purpose.  This would be no problem in XML, but we would have
the nuisance of carrying along an additional tag separate from the UCD.
In VOTable this would give us NAME, ID, UCD, plus a new tag for the
formal DM attribute association (conceivably ID could be used for this
purpose, but it already has other uses).  In a representation such as
FITS (e.g., if we try to represent VO data in FITS), it is harder.
In this case one might want to use the comment field of a FITS keyword
to contain something like a UCD:  keyword = value / UCD.  I am not saying
we necessarily want to do this, but it is an example of representation
flexibility and it would be good if our scheme could extend to this level.
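As a minimal sketch of the FITS case, assuming a simplified card of the form
keyword = value / UCD, a program could recover the UCD from the comment
field as below.  Real FITS cards have a fixed 80-column layout, free-text
comments, and doubled quotes inside strings, none of which is handled here.
The helper parse_card is hypothetical and the UCD strings are illustrative.

```python
from typing import NamedTuple, Optional


class Card(NamedTuple):
    keyword: str
    value: str
    ucd: Optional[str]


def parse_card(card: str) -> Card:
    """Split a simplified 'KEYWORD = value / UCD' card (hypothetical helper).

    Assumes the comment field holds only a UCD. Fixed-column layout and
    doubled quotes inside strings are deliberately ignored.
    """
    keyword, _, rest = card.partition("=")
    rest = rest.strip()
    if rest.startswith("'"):
        # Quoted string: the closing quote ends the value, so a '/' inside
        # the string is not taken as the comment separator.
        end = rest.index("'", 1)
        value, tail = rest[1:end], rest[end + 1:]
    else:
        # Unquoted value: the first '/' starts the comment field.
        slash = rest.find("/")
        value, tail = (rest, "") if slash < 0 else (rest[:slash], rest[slash:])
    tail = tail.strip()
    ucd = tail[1:].strip() if tail.startswith("/") else ""
    return Card(keyword.strip(), value.strip(), ucd or None)


print(parse_card("RA      = 180.5 / POS_EQ_RA_MAIN"))
# Card(keyword='RA', value='180.5', ucd='POS_EQ_RA_MAIN')
```

The point is only that a UCD can ride along in an existing slot of the
format without changing how other FITS readers see the value.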

If we DO use the UCD to carry this additional meaning, then the global
UCD namespace could include both formal DM attribute names, and the more
fundamental types used to associate different data elements as at present.
UCDs would then provide a global naming index, with a single name (the
UCD) being sufficient to carry all this meaning.  Given the UCDs and
an understanding of the associated DM (stored separately) we would then
be able to recognize that different metadata elements (table columns in
this case) are associated, define and use an XML schema to verify the
integrity of the DM subset in these columns, use semantic relationships
for inference, and so forth.
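A rough illustration of the integrity check mentioned above, using a plain
set comparison as a simpler stand-in for real XML Schema validation: given
the attribute names a formal DM defines, a program could report which are
missing from a table's column UCDs and which columns claim the model's
namespace without being defined by it.  The model, its attribute names, and
check_dm_subset are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical formal DM: the attribute names (carried in the UCD slot)
# that a complete representation of the model should contain.
BANDPASS_DM: Set[str] = {
    "ModelA:Bandpass.min",
    "ModelA:Bandpass.max",
    "ModelA:Bandpass.unit",
}


def check_dm_subset(column_ucds: Dict[str, str], dm_attributes: Set[str],
                    prefix: str) -> Tuple[List[str], List[str]]:
    """Return (missing, undefined): DM attributes with no column, and
    column UCDs in the model's namespace that the DM does not define."""
    present = set(column_ucds.values())
    missing = sorted(dm_attributes - present)
    undefined = sorted(u for u in present
                       if u.startswith(prefix) and u not in dm_attributes)
    return missing, undefined


table = {"lo": "ModelA:Bandpass.min", "hi": "ModelA:Bandpass.max",
         "bw": "ModelA:Bandpass.width", "ra": "POS_EQ_RA_MAIN"}
print(check_dm_subset(table, BANDPASS_DM, "ModelA:Bandpass."))
# (['ModelA:Bandpass.unit'], ['ModelA:Bandpass.width'])
```

Columns tagged with ordinary UCDs (the "ra" column here) are simply outside
the model's namespace and are not flagged.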

In this case what we would do is use the UCD tag in a representation to
convey the data model attribute name, uniquely identifying both the data
model and the attribute of the data model.  The formal definition of the DM
would then define each attribute of the DM, ** giving for each attribute
the UCD type of the attribute **.  If this UCD type is elemental then we
would have the desired interoperability, and the means to associate and
compare similar data elements.  UCDs would thus provide the metadata "glue"
to link related concepts such as fundamental quantities and data models,
making possible a uniform representation for both.
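To make the "glue" idea concrete, here is a small sketch under stated
assumptions: each hypothetical DM definition maps a formal attribute name
(carried in the UCD slot) to an elemental UCD type, and columns from two
tables are associated when their UCDs resolve to the same elemental type.
The model names, attribute names, and ELEM: type strings are placeholders,
not real UCDs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical DM definitions: formal attribute name -> elemental UCD type.
DM_DEFINITIONS: Dict[str, str] = {
    "ModelA:Bandpass.min": "ELEM:wavelength",
    "ModelA:Bandpass.max": "ELEM:wavelength",
    "ModelA:Position.ra": "ELEM:ra",
    "ModelB:Spectral.lowerBound": "ELEM:wavelength",
    "ModelB:Pointing.ra": "ELEM:ra",
}


def elemental_type(ucd: str) -> str:
    """Resolve a DM-attribute UCD to its elemental type; a UCD that no
    loaded DM defines is assumed to be elemental already."""
    return DM_DEFINITIONS.get(ucd, ucd)


def associate(columns_a: Dict[str, str],
              columns_b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair columns (given as {column_name: ucd}) from two tables whose
    UCDs resolve to the same elemental type."""
    by_type: Dict[str, List[str]] = defaultdict(list)
    for name, ucd in columns_b.items():
        by_type[elemental_type(ucd)].append(name)
    pairs = []
    for name, ucd in columns_a.items():
        for other in by_type.get(elemental_type(ucd), []):
            pairs.append((name, other))
    return pairs


print(associate({"wl_lo": "ModelA:Bandpass.min", "ra": "ModelA:Position.ra"},
                {"lambda_min": "ModelB:Spectral.lowerBound", "RAJ2000": "ELEM:ra"}))
# [('wl_lo', 'lambda_min'), ('ra', 'RAJ2000')]
```

Falling back to the UCD itself in elemental_type lets DM-attribute names and
elemental types share one namespace, in the spirit of a single global naming
index; the precise attribute name is still available when exact
identification matters.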

To summarize, UCDs or something like them can play a key role in structuring
and linking fundamental metadata and data models.  The issue has already come
up in interfaces like SIA and IDHA.  Can we come up with something which is
sufficiently powerful and general to provide both types of representations?

	- Doug



On Fri, 20 Jun 2003, Sebastien Derriere wrote:
> Doug Tody wrote:
> > 
> > The key problem I see with trying to use existing UCDs is that historically
> > UCDs have been used primarily as fuzzy tags to link similar fields in
> > catalogs.  In data access metadata such as is introduced in SIA we are
> > using UCDs to identify the fields of a formal data model.  Here the tag
> > is not fuzzy at all, linking similar fields of unrelated catalogs, rather
> > it is a link to a field of a formally defined data model.  Precision is
> > important for these data models - we are precisely defining attributes
> > of the data model.
> > 
> > We should formally define data models such as spectralBandpass or WCS
> > and define, as part of the data model, the UCD tag used to identify an
> > attribute of the data model.  When we represent a data model as a set of
> > related columns in a table, or as an entity struct in XML (as in IDHA or
> > HDX), we will use the UCDs to formally type the data model attributes so
> > that programs can use them unambiguously, so that we can use XML Schemas
> > for automated validation, and so forth.
> 
>   Hello,
> 
>   The primary goal of UCDs is to ensure interoperability between
> heterogeneous datasets. That's why they have been defined to some
> "reasonable" level of precision (what you call fuzziness).
>   Internal attributes of a formally defined data model can be defined
> at any level of precision, and have their own names. But you can
> have *in addition* a UCD attached to every attribute (see the case 
> of the IDHA model). Those UCD can ensure interoperability between
> different data models, and between data models and datasets. 
>   The names of the attributes cannot a priori ensure this,
> because nothing prevents the same concept from being named
> differently in different models.
> 
> Sebastien.


