[Cube/vo-dml] ivoa datatypes

Laurino, Omar olaurino at cfa.harvard.edu
Tue May 6 08:03:23 PDT 2014


Hi Pierre,



> May I precise my position.
>

Your feedback has been valuable in the Tiger Team and is always welcome.

=====
TL;DR reply (more details follow):

> I said one year ago that the VO-DML VOTable serialization proposed by
> Gerard tended to move some meta information such as *UCD*, *unit* or
> *datatype* outside the VOTable FIELD entity towards the proposed GROUP
> VO-DML hierarchy extension. I noted that this point would be extremely
> annoying for all VOTable clients such as TOPcat or Aladin for which this
> metadata information must stay in the FIELD entities.


I am not sure exactly what you are referring to. If it is what Gerard
commented on, yes, this was fixed long ago after you made this comment.

If it is not, I am giving more information in the second part, but in
summary we are trying to standardize the serialization of Data Models also
for the reason you mention: allowing clients to know where to look for
metadata, which is tricky, to say the least, with the current usages and
standards (see the second part of the email for details and examples).

> For bypassing this issue, and if I correctly understand the current
> 2014-05-03 XML basic IVOA model description (
> https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml),
> the "quantity" entry duplicates now the UCD role and unit role.


We are not duplicating existing standards, we are defining a standardized
way to describe and serialize data models in a machine-readable way. You
might be confusing the two levels of the solution, which correspond to two
different documents: VODML descriptions of data models, and the
serialization of such data models in VOTable. In the second document we use
the standardized units and ucd and the corresponding VOTable standard
attributes.



>   And I have to say that the current basic IVOA model appears for me too
> heteroclite to be used without fear: "identity, rational, complex,
> duration, anyURI, boolean, real, nonnegativeInteger, datetime, integer,
> string, quantity". For a no-DM person, it is quite difficult to understand
> why such or such data type is considered as a basic datatype (duration ?
> datetime ? anyURI ?), and why others are not (char ?, range ? frequency ?
> ...).
>

Where to draw the line is a good question, and the current descriptions
have been open for comment for about a year, so we are happy that we are
finally discussing them!

=====


More detailed responses below.



>  I said one year ago that the VO-DML VOTable serialization proposed by
> Gerard tended to move some meta information such as *UCD*, *unit* or *datatype*
> outside the VOTable FIELD entity towards the proposed GROUP VO-DML
> hierarchy extension. I noted that this point would be extremely annoying
> for all VOTable clients such as TOPcat or Aladin for which this metadata
> information must stay in the FIELD entities.
>


I am not sure whether you refer to the fact that in an early proof of
concept serialization there were standalone PARAMs for unit and ucd. If
that's the case, as Gerard pointed out this was fixed long ago in response
to your feedback and the result is in section 6.8 of the UTYPEs draft we
presented in Heidelberg one year ago, as well as in the actual examples
(Reference 1 below).

It may also sound like you are worried about FIELDref having the UCD
metadata as opposed to FIELDs. If that's the case, there are several
current standards and production implementations that use UCDs in
FIELDrefs. I am not going to elaborate too much on this, since I am not
sure whether this is really what you meant, but I will give a couple of
references, just in case. The PhotDM, in section C.2 (Reference 2), provides
an example of a Cone Search response and uses FIELDrefs (with UCDs). FIELDs
are not even mentioned. This is, I believe, taken directly from the note by
Sebastien et al. (Reference 3) on how to serialize Photometry Measurements
in VOTable. The only examples that make use of FIELDs (sections 4.1 and
4.2) have two sets of (different) UCDs, one for the FIELDs and one for the
FIELDrefs. The other examples do not mention FIELDs.


In any case, whether you meant the first or the second interpretation, more
generally, the problem is that the current standards make it hard for
clients to make sense of the metadata, and this is one of the reasons why
we are trying to standardize the serialization of data models: to make
clients' life easier.

As far as I know this only applies to UCDs and UTYPEs, because FIELDrefs
can only have these attributes (Reference 4, Section 7.2).

Some models (e.g. Spectrum 1.1, Reference 5) reify UCDs by creating
UCD fields in the model (thus creating many *.ucd UTYPEs). For instance,
see the VOTable example in section 8.2 (I'm including a snippet for
convenience):

> <PARAM ID="DataFluxUcd" datatype="char" name="DataFluxUcd"
> utype="spec:Spectrum.Data.FluxAxis.Ucd" value="phot.flux.density;em.wl"
> arraysize="*">
> <DESCRIPTION>UCD for flux</DESCRIPTION>
> </PARAM>


Notice that, as opposed to Gerard's 2012 proof of concept, this is stated
in a *standard* document.

The status quo is that a client parsing a *standard* Spectrum 1.1 VOTable
(I am using the example above, but there may be other examples in other
models) can find a UCD in many different places:
  - a FIELDref with @utype spec:Spectrum.Data.FluxAxis
  - a FIELD referenced by a FIELDref and without a @utype
  - a FIELD with @utype spec:Spectrum.Data.FluxAxis
  - a PARAM with @utype spec:Spectrum.Data.FluxAxis.Ucd
  - a TD relative to a FIELD with @utype spec:Spectrum.Data.FluxAxis.Ucd

This is what we are trying to standardize, so that it is clear to clients
how to look for metadata in an unambiguous way. Even better, with a
standard like the one suggested by the Tiger Team, parsing a VOTable
according to a data model becomes a mechanical effort, so that users and
developers can use libraries, which is currently impossible (if not
convinced by the above example see the Current Usages document, Reference
6).
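To make the list of possible UCD locations above more concrete, here is a
rough Python sketch of what a client has to do today. Everything in it is
an assumption made for illustration: the VOTable fragment is invented (and
has no XML namespace, unlike real files), the function name
find_ucd_candidates is hypothetical, and the TD case is left out for
brevity. It is not taken from any existing library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free VOTable fragment, for illustration only.
VOTABLE = """
<VOTABLE>
  <RESOURCE>
    <TABLE>
      <GROUP>
        <PARAM ID="DataFluxUcd" datatype="char" name="DataFluxUcd"
               utype="spec:Spectrum.Data.FluxAxis.Ucd"
               value="phot.flux.density;em.wl" arraysize="*"/>
        <FIELDref ref="flux" utype="spec:Spectrum.Data.FluxAxis"
                  ucd="phot.flux.density;em.wl"/>
      </GROUP>
      <FIELD ID="flux" name="flux" datatype="double" ucd="phot.flux.density"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def find_ucd_candidates(root, utype):
    """Collect every (location, ucd) pair a client might consider for `utype`."""
    found = []
    fields_by_id = {f.get("ID"): f for f in root.iter("FIELD") if f.get("ID")}

    # Case 1: the FIELDref itself carries the UCD.
    # Case 2: the FIELD it points to carries a (possibly different) UCD.
    for ref in root.iter("FIELDref"):
        if ref.get("utype") != utype:
            continue
        if ref.get("ucd"):
            found.append(("FIELDref@ucd", ref.get("ucd")))
        target = fields_by_id.get(ref.get("ref"))
        if target is not None and target.get("ucd"):
            found.append(("referenced FIELD@ucd", target.get("ucd")))

    # Case 3: a FIELD annotated directly with the utype.
    for field in root.iter("FIELD"):
        if field.get("utype") == utype and field.get("ucd"):
            found.append(("FIELD@ucd", field.get("ucd")))

    # Case 4: a PARAM holding the reified UCD as its value.
    for param in root.iter("PARAM"):
        if param.get("utype") == utype + ".Ucd":
            found.append(("PARAM@value", param.get("value")))

    return found


root = ET.fromstring(VOTABLE)
candidates = find_ucd_candidates(root, "spec:Spectrum.Data.FluxAxis")
for location, ucd in candidates:
    print(location, ucd)
```

In this invented fragment the client gets three candidates, and two of them
disagree. Nothing in the file tells it which one to trust.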



> For bypassing this issue, and if I correctly understand the current
> 2014-05-03 XML basic IVOA model description (
> https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml),
> the "quantity" entry duplicates now the UCD role and unit role.
>

I believe you are confusing two levels, which are represented by two
documents. One level is the data model description. Data Models can (in
fact they do, see Spectrum 1.1) define ucd and unit as fields of their
models, reifying them. Even when they don't, there are some cases (I can
provide examples from production services) where the data publisher needs
to reify some of the metadata. For instance consider a column where the
same quantity is expressed in different units: in this case the unit piece
of metadata becomes data and you need a column to store them.
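As a small illustration of the "unit becomes data" case, here is a Python
sketch with made-up values. The Measurement tuple, the conversion table and
the to_jansky helper are all hypothetical names chosen for this example;
the point is only that the unit has to travel with each row instead of
being declared once for the whole column.

```python
from collections import namedtuple

# Each row carries its own unit, so the unit is data, not column metadata.
Measurement = namedtuple("Measurement", ["value", "unit"])

# Made-up flux densities, each published in a different unit.
rows = [
    Measurement(1.2, "Jy"),
    Measurement(850.0, "mJy"),
    Measurement(3.0e5, "uJy"),
]

# Illustrative conversion factors to jansky.
TO_JY = {"Jy": 1.0, "mJy": 1e-3, "uJy": 1e-6}


def to_jansky(measurement):
    """Convert one row to jansky using the unit stored in that row."""
    return measurement.value * TO_JY[measurement.unit]


for row in rows:
    print(row.value, row.unit, "->", to_jansky(row), "Jy")
```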

So, VODML supports all of these real world examples. This has nothing to do
with VOTable or any other serialization. As a matter of fact, VODML is
indeed an effort to make serializations of Data Models interoperable.

The other level is the one of serialization. Since VOTable has a @ucd
attribute, it's smart to use it, and that's what we do in the serialization
document.



>  Personally, I am not sure that this solution to duplicate this kind of
> information will be the more appropriate approach: 1) we redo our VO
> efforts already done on UCDs and units...
>

Nope. When you serialize a data model instance in VOTable you use the
standard UCDs and Units, and the standard VOTable attributes for them
(again, section 6.8 in the UTYPEs WD).


> 2) we will have to manage correspondances between FIELD-UCD/FIELD-unit and
> VO-DML-quantity.
>

You already need to do that now, but with VODML and the serialization
document there is a standard to be implemented, so application developers
do not need to "guess" or to assume conventions.


>   And I have to say that the current basic IVOA model appears for me too
> heteroclite to be used without fear: "identity, rational, complex,
> duration, anyURI, boolean, real, nonnegativeInteger, datetime, integer,
> string, quantity". For a no-DM person, it is quite difficult to understand
> why such or such data type is considered as a basic datatype (duration ?
> datetime ? anyURI ?), and why others are not (char ?, range ? frequency ?
> ...).
>

Where to draw the line is a good question, and the current descriptions
have been open for comment for about a year, so we are happy that we are
finally discussing them!

Notice, however, that a no-DM person shouldn't care: VODML descriptions are
meant to be used by software developers who need to know how to map the
IVOA types to their language, and only DM people need to create models,
so...

Primitive types are special in that they need to be defined beforehand so
that developers can map them to their own "primitive" classes or
structures. All other types can be derived from them, and that can be done
mechanically in any language (we have prototypes and reference
implementations in Java and Python already, as I showed in Hawaii).

I believe primitive types should all be domain-independent: frequency is a
physics concept, you won't find it as a primitive type in MySQL or Java,
while datetime is general and can be found in both (I am using "primitive"
in a broad sense, not in a language-specific sense... e.g. Java doesn't
have a datetime "primitive", but datetime and duration have corresponding
classes in the standard Java library).

Also, they should map at least to the VOTable concepts.
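To give an idea of what "mapping the primitive types to a language" could
look like, here is one possible mapping to the Python standard library. This
is only a sketch: the table and the python_type_for helper are my
assumptions for this email, not the reference implementation, and I left
out "identity" and "quantity" because their mapping is less obvious.

```python
import datetime
import fractions

# One possible (assumed) mapping from IVOA primitive type names to Python types.
IVOA_TO_PYTHON = {
    "boolean": bool,
    "integer": int,
    "nonnegativeInteger": int,  # the non-negativity check is not enforced here
    "real": float,
    "rational": fractions.Fraction,
    "complex": complex,
    "string": str,
    "anyURI": str,
    "datetime": datetime.datetime,
    "duration": datetime.timedelta,
}


def python_type_for(ivoa_type):
    """Return the Python type assumed for an IVOA primitive type name."""
    try:
        return IVOA_TO_PYTHON[ivoa_type]
    except KeyError:
        raise ValueError(f"no mapping for IVOA type {ivoa_type!r}") from None


print(python_type_for("duration"))
print(python_type_for("rational"))
```

A domain-specific name such as "frequency" has no entry in the table, which
is the kind of line I am arguing for: the standard library of a general
purpose language has nothing to map it to.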

Of course, this is all to some extent arbitrary and fuzzy. For instance,
you mention char and duration: the first one would be good to include
because it maps directly to a VOTable datatype. The second one is really on
the fuzzy edge. I think it makes sense to include it among the primitive
types, but I wouldn't be against leaving it out of the list.

Thanks for the feedback!

Omar.

Reference 1. UTYPEs WD
http://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/doc/UTYPEs-WD-v1.0.pdf

Reference 2. PhotDM REC
http://www.ivoa.net/documents/PHOTDM/20130928/PR-PhotDM-1.0-20130928.pdf

Reference 3. PhotDM in VOTable NOTE
http://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf

Reference 4. VOTable 1.3 REC
http://www.ivoa.net/documents/VOTable/20130920/REC-VOTable-1.3-20130920.html

Reference 5. Spectrum 1.1 REC
http://www.ivoa.net/documents/SpectrumDM/20111120/REC-SpectrumDM-1.1-20111120.pdf

Reference 6. UTYPEs: Current Usages NOTE
http://www.ivoa.net/documents/Notes/UTypesUsage/20130213/NOTE-utypes-usage-1.0-20130213.html


-- 
Omar Laurino
Smithsonian Astrophysical Observatory
Harvard-Smithsonian Center for Astrophysics
100 Acorn Park Dr. R-377 MS-81
02140 Cambridge, MA
(617) 495-7227