handling metadata with multiple values

Alberto Micol Alberto.Micol at eso.org
Wed Aug 13 02:53:45 PDT 2003


Going extreme ...
And, for sure, I will regret this.

Naive question: What's wrong with the following syntax

<UCDList>
         <x1/>
         <x2/>
</UCDList>

where x1 and x2 are two UCDs ?
After all, the UCDs are a well-known set of words, and nobody is allowed
to invent his/her own UCDs, right?
I don't know enough about SAX and DOM, so I would appreciate it if the gurus
could explain to me the pros and cons of this.

And with such a scheme, I can also add a value to each UCD:

<UCDList>
         <x1> val1 </x1>
         <x2> val2 </x2>
   <!-- e.g.: -->
         <FLUX>
            <WAVELENGTH unit="nm"> 555 </WAVELENGTH>
         </FLUX>

</UCDList>

(
  Now I have moved the multiple values problem one level down ...
  but I think that we can live with:
  <UCDList>
           <x1> val11 </x1>
           <x1> val12 </x1>
           <x2> val2 </x2>
  </UCDList>
)
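
To make the "we can live with it" claim concrete, here is a minimal sketch
in Python of how a DOM-style reader might gather repeated tags into lists.
The tag names x1 and x2 are the placeholders used above, not real UCDs, and
collect_values is a hypothetical helper, not part of any registry software.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder document: x1 and x2 stand in for real UCD names.
DOC = """
<UCDList>
    <x1> val11 </x1>
    <x1> val12 </x1>
    <x2> val2 </x2>
</UCDList>
"""

def collect_values(xml_text):
    """Map each child tag of the root element to the list of its text values."""
    root = ET.fromstring(xml_text)
    values = defaultdict(list)
    for child in root:
        # A repeated tag simply appends another value to the same list.
        values[child.tag].append((child.text or "").strip())
    return dict(values)

ucd_values = collect_values(DOC)
print(ucd_values)
```

The reader needs no advance knowledge of the tag names to collect the
values; it needs the dictionary only when it has to check that a tag is
a legal UCD.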

One criticism will probably be that this might be OK for UCDs, but there is other
metadata out there that is not made of UCDs, and that other metadata could still have
multiple values, so we are back to the original problem.

On the contrary, and that's where I go extreme, I think that all metadata
should be part of a well defined dictionary. Not only that, but even metadata
values should be part of the same dictionary!

For example, when I go to the pub and, over a beer, we start a nice conversation
about WFPC2, we never refer to it as "instrument wfpc2",
that is, we never use the syntax:

<INSTRUMENT> WFPC2 </INSTRUMENT> (or whatever is the UCD for instrument_name)

but we always use

<WFPC2/>

because WFPC2 is part of our dictionary.

I'm claiming that WFPC2 should become a sort of UCD, at least within
the HST context. (One could imagine the existence of specialised sets of
context-dependent UCDs managed by each individual project.)
The same is true for all other metadata.

You will claim that elevating UCDs and other metadata info
to the level of XML tags is not a flexible approach.
In elementary school, when I misspelled a noun, or invented one,
I did not get a good mark for my flexibility, nor for my imagination.
That is the price to pay to be understood.

You will claim that it could be hard to build such a huge dictionary,
and maybe even harder to use it.
But without a dictionary I will not know how to formulate queries
like:

Select
   service_name, service_description, service_rowcount
from
   Registry
where
   service_category = "CATALOG BROWSER"
   and data_class = "OBJECT CATALOG"
   and subject = "GALAXY"
   and queryable_parameter = "GALAXY DISTANCE"
   and output_parameter = "GALAXY NAME"

Both the left-hand side (the parameter names) and the right-hand side (their
values) must be known to both client and server; otherwise the query can
neither be formulated by the client nor interpreted by the server.
The dictionary is the place to list all those "words".
When we speak our language we typically do not use more than a couple
of thousand words, even though my dictionary at home lists more than 100,000
words. A perfect case for the 90/10 rule, I suppose: let's make those 90 happy.
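
As a rough illustration of that point, here is a small Python sketch in
which a client checks its constraints against a shared dictionary before
sending a query. The DICTIONARY contents and the unknown_terms helper are
hypothetical. Only a few field names and values are borrowed from the
example query above, and a real vocabulary would be far larger.

```python
# Hypothetical shared dictionary: for each known field (left-hand side),
# the set of values (right-hand side) that client and server agree on.
DICTIONARY = {
    "data_class": {"OBJECT CATALOG"},
    "subject": {"GALAXY"},
    "output_parameter": {"GALAXY NAME"},
}

def unknown_terms(constraints):
    """Return the (field, value) pairs that the dictionary does not list."""
    problems = []
    for field, value in constraints.items():
        # An unknown field has no allowed values, so every value fails.
        if value not in DICTIONARY.get(field, set()):
            problems.append((field, value))
    return problems

good = unknown_terms({"data_class": "OBJECT CATALOG", "subject": "GALAXY"})
bad = unknown_terms({"subject": "GALAXIES", "colour": "RED"})
print(good)
print(bad)
```

A misspelled value or an invented field is caught on the client side,
before the server has to guess what was meant.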


Enough provocations for today ...

Alberto
