[Cube/vo-dml] ivoa datatypes

Carlos Rodrigo crb at cab.inta-csic.es
Tue May 6 11:45:22 PDT 2014


Ah, OK. That is a good example that I should remember hehe :)

My question was actually more about how data models are expected to be serialized right now, before
vo-dml; that is, a typical VOTable serialization of any data model (spectrum, photometry, etc.).
That is where I'm still confused.

But I didn't notice that we were talking about vo-dml, so the question may be out of place.

Thanks

Carlos

On 06/05/14 19:48, Laurino, Omar wrote:
> Carlos,
> 
> you implemented one of the prototypes I presented in Hawaii, so you saw this in action, even though
> that was a while ago and we were using prototype model descriptions. Also, I now realize we
> made some mistakes, which is to be expected for a prototype. Whenever you want, we can look at
> those issues and fix them.
> 
> In any case, in the prototype you correctly used the pattern that Pierre suggested, with the UCD,
> unit, and datatype in the FIELDs.
> 
> Your prototype is also a good example of how the proposal is backward compatible: your FIELDs
> have the "old-style" UTYPEs, while the FIELDrefs have the "new-style" ones, so an existing
> client can make sense of the file even if it ignores the VODML GROUPs.
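A minimal Python sketch of the backward-compatibility point above, using only the standard library's `xml.etree.ElementTree`. The VOTable fragment is hypothetical: the VOTable namespace is omitted, and the `ex:` utypes on the GROUP and FIELDref are made-up stand-ins for "new-style" UTYPEs, not the actual VO-DML serialization draft. A client that never looks at GROUPs still finds ucd and unit on the FIELD, while a model-aware client reaches the same FIELD through the FIELDref's `ref` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment. The FIELD carries the "old-style"
# utype plus ucd/unit/datatype; the FIELDref carries a made-up "new-style"
# utype and only points at the FIELD through its ref attribute.
VOTABLE_FRAGMENT = """<VOTABLE>
  <RESOURCE>
    <TABLE>
      <GROUP utype="ex:Spectrum">
        <FIELDref ref="flux" utype="ex:Spectrum.flux"/>
      </GROUP>
      <FIELD ID="flux" name="FLUX" utype="spec:Data.FluxAxis.Value"
             ucd="phot.flux.density;em.wl" unit="erg/cm2/s/A" datatype="double"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def legacy_view(root):
    """What a client that ignores GROUPs sees: FIELD attributes only."""
    return {f.get("name"): (f.get("utype"), f.get("ucd"), f.get("unit"))
            for f in root.iter("FIELD")}


def model_view(root):
    """Follow each FIELDref back to the FIELD it references."""
    fields = {f.get("ID"): f for f in root.iter("FIELD")}
    result = {}
    for ref in root.iter("FIELDref"):
        field = fields[ref.get("ref")]
        # ucd and unit are read from the FIELD, not duplicated on the FIELDref.
        result[ref.get("utype")] = (field.get("name"), field.get("ucd"), field.get("unit"))
    return result


root = ET.fromstring(VOTABLE_FRAGMENT)
print(legacy_view(root))
print(model_view(root))
```

Both views end up reading the same ucd and unit values, because they live in one place only.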
> 
> Omar.
> 
> 
> 
> 
> On Tue, May 6, 2014 at 12:04 PM, Carlos Rodrigo <crb at cab.inta-csic.es> wrote:
> 
>     Hi
> 
>     I have always had a doubt that could have something to do with this discussion (unless I am
>     misunderstanding everything).
> 
>     I want to serialize a spectrum in a VOTable.
>     I have two fields: wavelength and flux.
> 
>     <FIELD name="WAVELENGTH" utype="spec:Data.SpectralAxis.Value" ucd="em.wl" unit="angstrom"
>     datatype="double"/>
>     <FIELD name="FLUX" utype="spec:Data.FluxAxis.Value" ucd="phot.flux.density;em.wl" unit="erg/cm2/s/A"
>     datatype="double"/>
> 
>     The name, UCD, and unit of the spectral and flux axes are given there.
> 
>     But reading the Spectrum DM (at least version 2.0, though I think the previous version and other
>     data models are similar), I get the impression that I must duplicate this information in a
>     Characterization group:
> 
>     <GROUP name="Characterization">
>      <GROUP name="Char.FluxAxis" utype="spec:Char.FluxAxis">
>       <PARAM name="FluxAxisName" utype="spec:Char.FluxAxis.name" value="FLUX" .../>
>       <PARAM name="FluxAxisUcd"  utype="spec:Char.FluxAxis.ucd"  value="phot.flux.density;em.wl" .../>
>       <PARAM name="FluxAxisUnit" utype="spec:Char.FluxAxis.unit" value="erg/cm2/s/A" .../>
>      </GROUP>
>      <GROUP name="Char.SpectralAxis">
>       <PARAM name="SpectralAxisName" utype="spec:Char.SpectralAxis.name" value="WAVELENGTH" .../>
>       <PARAM name="SpectralAxisUcd"  utype="spec:Char.SpectralAxis.ucd"  value="em.wl" .../>
>       <PARAM name="SpectralAxisUnit" utype="spec:Char.SpectralAxis.unit" value="angstrom" .../>
>      </GROUP>
>     </GROUP>
> 
>     where I state again the name, UCD, and unit of the spectral and flux axes.
>
>     Is that really needed? What for? I've always found this odd.
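The duplication described above means a client holding both copies has to check that they agree. A minimal Python sketch of that consistency check, using only `xml.etree.ElementTree` and assuming a namespace-free version of the fragment above with the elided (`...`) PARAM attributes simply dropped; the function name `char_mismatches` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace-free version of the example above; elided PARAM attributes dropped.
CHAR_FRAGMENT = """<TABLE>
  <GROUP name="Characterization">
    <GROUP name="Char.FluxAxis" utype="spec:Char.FluxAxis">
      <PARAM name="FluxAxisName" utype="spec:Char.FluxAxis.name" value="FLUX"/>
      <PARAM name="FluxAxisUcd" utype="spec:Char.FluxAxis.ucd" value="phot.flux.density;em.wl"/>
      <PARAM name="FluxAxisUnit" utype="spec:Char.FluxAxis.unit" value="erg/cm2/s/A"/>
    </GROUP>
  </GROUP>
  <FIELD name="FLUX" utype="spec:Data.FluxAxis.Value" ucd="phot.flux.density;em.wl"
         unit="erg/cm2/s/A" datatype="double"/>
</TABLE>"""


def char_mismatches(root, axis):
    """Return {attr: (param_value, field_value)} where the spec:Char.<axis>
    PARAMs disagree with the FIELD they name; an empty dict means consistent."""
    prefix = f"spec:Char.{axis}."
    # Collect e.g. {"name": "FLUX", "ucd": "...", "unit": "..."} from the PARAMs.
    params = {
        p.get("utype")[len(prefix):]: p.get("value")
        for p in root.iter("PARAM")
        if (p.get("utype") or "").startswith(prefix)
    }
    # The ".name" PARAM identifies the FIELD by its name attribute.
    field = next(f for f in root.iter("FIELD") if f.get("name") == params["name"])
    return {
        attr: (params[attr], field.get(attr))
        for attr in ("ucd", "unit")
        if attr in params and params[attr] != field.get(attr)
    }


print(char_mismatches(ET.fromstring(CHAR_FRAGMENT), "FluxAxis"))  # {}
```

Nothing here says which copy is authoritative when they disagree; the sketch can only report the conflict.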
> 
>     Carlos
> 
>     On 06/05/14 17:03, Laurino, Omar wrote:
>     > Hi Pierre,
>     >
>     >
>     >
>     >     Let me clarify my position.
>     >
>     >
>     > Your feedback has been valuable in the Tiger Team and is always welcome.
>     >
>     > =====
>     > TL;DR reply (more details follow):
>     >
>     >     I said one year ago that the VO-DML VOTable serialization proposed by Gerard tended to move
>     >     some meta information, such as *UCD*, *unit* or *datatype*, outside the VOTable FIELD entity
>     >     towards the proposed GROUP VO-DML hierarchy extension. I noted that this point would be
>     >     extremely annoying for all VOTable clients, such as TOPCAT or Aladin, for which this metadata
>     >     must stay in the FIELD entities.
>     >
>     >
>     > I am not sure exactly what you are referring to. If it is what Gerard commented on, yes, this was
>     > fixed long ago after you made that comment.
>     >
>     > If it is not, I give more information in the second part; in summary, we are trying to
>     > standardize the serialization of Data Models also for the reason you mention: letting clients
>     > know where to look for metadata, which is tricky, to say the least, with the current usages and
>     > standards (see the second part of the email for details and examples).
>     >
>     >     To bypass this issue, and if I correctly understand the current 2014-05-03 XML basic IVOA
>     >     model description
>     >     (https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml), the
>     >     "quantity" entry now duplicates the UCD role and the unit role.
>     >
>     >
>     > We are not duplicating existing standards; we are defining a standardized way to describe and
>     > serialize data models in a machine-readable way. You might be confusing the two levels of the
>     > solution, which correspond to two different documents: VODML descriptions of data models, and the
>     > serialization of such data models in VOTable. In the second document we use the standard units
>     > and UCDs, and the corresponding standard VOTable attributes.
>     >
>     >
>     >
>     >     And I have to say that the current basic IVOA model appears to me too heteroclite to be used
>     >     without fear: "identity, rational, complex, duration, anyURI, boolean, real, nonnegativeInteger,
>     >     datetime, integer, string, quantity". For a non-DM person, it is quite difficult to understand
>     >     why a given data type is considered a basic datatype (duration? datetime? anyURI?),
>     >     and why others are not (char? range? frequency? ...).
>     >
>     >
>     > Where to draw the line is a good question, and the current descriptions have been open for
>     > comment for about a year, so we are happy we are finally discussing them!
>     >
>     > =====
>     >
>     >
>     > More detailed responses below.
>     >
>     >
>     >
>     >     I said one year ago that the VO-DML VOTable serialization proposed by Gerard tended to move
>     >     some meta information, such as *UCD*, *unit* or *datatype*, outside the VOTable FIELD entity
>     >     towards the proposed GROUP VO-DML hierarchy extension. I noted that this point would be
>     >     extremely annoying for all VOTable clients, such as TOPCAT or Aladin, for which this metadata
>     >     must stay in the FIELD entities.
>     >
>     >
>     >
>     > I am not sure whether you are referring to the fact that an early proof-of-concept serialization
>     > had standalone PARAMs for unit and ucd. If that's the case, as Gerard pointed out, this was fixed
>     > long ago in response to your feedback, and the result is in Section 6.8 of the UTYPEs draft we
>     > presented in Heidelberg one year ago, as well as in the actual examples (Reference 1 below).
>     >
>     > It may also sound like you are worried about FIELDrefs carrying the UCD metadata as opposed to
>     > FIELDs. If that's the case, there are several current standards and production implementations
>     > that use UCDs in FIELDrefs. I am not going to elaborate too much on this, since I am not sure
>     > whether this is really what you meant, but I will give a couple of references, just in case.
>     > PhotDM, in Section C.2 (Reference 2), provides an example of a Cone Search response that uses
>     > FIELDrefs (with UCDs); FIELDs are not even mentioned. This is, I believe, taken directly from
>     > the note by Sebastien et al. (Reference 3) on how to serialize Photometry Measurements in
>     > VOTable. The only examples that make use of FIELDs (Sections 4.1 and 4.2) have two sets of
>     > (different) UCDs, one for the FIELDs and one for the FIELDrefs. The other examples do not
>     > mention FIELDs.
>     >
>     >
>     > In any case, whichever interpretation you meant, the more general problem is that the current
>     > standards make it hard for clients to make sense of the metadata, and this is one of the reasons
>     > why we are trying to standardize the serialization of data models: to make clients' life easier.
>     >
>     > As far as I know, this only applies to UCDs and UTYPEs, because FIELDrefs can only have these
>     > attributes (Reference 4, Section 7.2).
>     >
>     > Some models (e.g. Spectrum 1.1, Reference 5) reify UCDs by creating UCD fields in the model
>     > (thus creating many *.ucd UTYPEs). For instance, see the VOTable example in Section 8.2 (I'm
>     > including a snippet for convenience):
>     >
>     >     <PARAM ID="DataFluxUcd" datatype="char" name="DataFluxUcd"
>     >     utype="spec:Spectrum.Data.FluxAxis.Ucd" value="phot.flux.density;em.wl" arraysize="*">
>     >     <DESCRIPTION>UCD for flux</DESCRIPTION>
>     >     </PARAM>
>     >
>     >
>     > Notice that, as opposed to Gerard's 2012 proof of concept, this is stated in a *standard* document.
>     >
>     > The status quo is that a client parsing a *standard* Spectrum 1.1 VOTable (I am using the example
>     > above, but there may be other examples in other models) can find a UCD in many different places:
>     >   - a FIELDref with @utype spec:Spectrum.Data.FluxAxis
>     >   - a FIELD referenced by a FIELDref and without a @utype
>     >   - a FIELD with @utype spec:Spectrum.Data.FluxAxis
>     >   - a PARAM with @utype spec:Spectrum.Data.FluxAxis.Ucd
>     >   - a TD relative to a FIELD with @utype spec:Spectrum.Data.FluxAxis.Ucd
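To make the ambiguity in this list concrete, here is a hedged Python sketch (standard-library `xml.etree.ElementTree` only) of what a client currently has to do to find "the" flux UCD. The fragment is contrived: no namespace, invented IDs and column names, and all five locations packed into one table, which a real Spectrum 1.1 file need not do. The hypothetical `ucd_candidates` function just collects every candidate and leaves the choice to the caller.

```python
import xml.etree.ElementTree as ET

# Contrived, namespace-free fragment with one UCD in each of the five places.
SPEC11_FRAGMENT = """<TABLE>
  <GROUP utype="spec:Spectrum.Data">
    <FIELDref ref="f1" utype="spec:Spectrum.Data.FluxAxis" ucd="phot.flux.density;em.wl"/>
  </GROUP>
  <PARAM name="DataFluxUcd" utype="spec:Spectrum.Data.FluxAxis.Ucd" datatype="char"
         arraysize="*" value="phot.flux.density;em.wl"/>
  <FIELD ID="f1" name="FLUX" ucd="phot.flux.density" datatype="double"/>
  <FIELD ID="f2" name="FLUX2" utype="spec:Spectrum.Data.FluxAxis"
         ucd="phot.flux.density;em.wl" datatype="double"/>
  <FIELD ID="f3" name="FLUX_UCD" utype="spec:Spectrum.Data.FluxAxis.Ucd"
         datatype="char" arraysize="*"/>
  <DATA><TABLEDATA>
    <TR><TD>1.0e-15</TD><TD>1.1e-15</TD><TD>phot.flux.density;em.freq</TD></TR>
  </TABLEDATA></DATA>
</TABLE>"""


def ucd_candidates(root, utype="spec:Spectrum.Data.FluxAxis"):
    """List (location, ucd) pairs for every place the flux UCD may live."""
    found = []
    fields = root.findall(".//FIELD")  # document order matches TD column order
    by_id = {f.get("ID"): f for f in fields}
    # 1 and 2: a FIELDref with the utype, and the utype-less FIELD it points to.
    for ref in root.iter("FIELDref"):
        if ref.get("utype") != utype:
            continue
        if ref.get("ucd"):
            found.append(("FIELDref/@ucd", ref.get("ucd")))
        target = by_id.get(ref.get("ref"))
        if target is not None and target.get("utype") is None and target.get("ucd"):
            found.append(("referenced FIELD/@ucd", target.get("ucd")))
    # 3: a FIELD carrying the utype directly.
    for f in fields:
        if f.get("utype") == utype and f.get("ucd"):
            found.append(("FIELD/@ucd", f.get("ucd")))
    # 4: a PARAM whose value is the reified UCD.
    for p in root.iter("PARAM"):
        if p.get("utype") == utype + ".Ucd":
            found.append(("PARAM/@value", p.get("value")))
    # 5: table cells of a FIELD whose values are reified UCDs.
    for col, f in enumerate(fields):
        if f.get("utype") == utype + ".Ucd":
            for tr in root.iter("TR"):
                found.append(("TD", tr.findall("TD")[col].text))
    return found


for location, ucd in ucd_candidates(ET.fromstring(SPEC11_FRAGMENT)):
    print(f"{location:22} {ucd}")
```

In this fragment the five locations yield three different UCD strings, and nothing in the file tells the client which one wins.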
>     >
>     > This is what we are trying to standardize, so that it is clear to clients how to look for metadata
>     > unambiguously. Even better, with a standard like the one suggested by the Tiger Team,
>     > parsing a VOTable according to a data model becomes a mechanical task, so that users and
>     > developers can use libraries, which is currently impossible (if you are not convinced by the
>     > above example, see the Current Usages document, Reference 6).
>     >
>     >
>     >
>     >     To bypass this issue, and if I correctly understand the current 2014-05-03 XML basic IVOA
>     >     model description
>     >     (https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml), the
>     >     "quantity" entry now duplicates the UCD role and the unit role.
>     >
>     >
>     > I believe you are confusing two levels, which are represented by two documents. One level is the
>     > data model description. Data models can (in fact they do, see Spectrum 1.1) define ucd and unit
>     > as fields of their models, reifying them. Even when they don't, there are some cases (I can
>     > provide examples from production services) where the data publisher needs to reify some of the
>     > metadata. For instance, consider a column where the same quantity is expressed in different
>     > units: in this case the unit piece of metadata becomes data, and you need a column to store it.
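The mixed-unit case can be sketched in a few lines of Python. This is an illustration only: the rows are made up, and the hand-written conversion table stands in for a proper units library; it is not an IVOA-defined mechanism. Once the unit varies from row to row it is data, so a client reads it from a column rather than from a single unit attribute.

```python
# Hypothetical rows: the same quantity (a wavelength) in a different unit per
# row, so the unit is reified as a second column instead of one FIELD/@unit.
ROWS = [(5000.0, "angstrom"), (650.0, "nm"), (1.2, "um")]

# Hand-written factors to angstrom (1 nm = 10 A, 1 um = 1e4 A); illustrative only.
TO_ANGSTROM = {"angstrom": 1.0, "nm": 10.0, "um": 1.0e4}


def to_angstrom(rows):
    """Normalize (value, unit) rows to angstrom using the per-row unit column."""
    return [value * TO_ANGSTROM[unit] for value, unit in rows]


print(to_angstrom(ROWS))
```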
>     >
>     > So, VODML supports all of these real-world examples. This has nothing to do with VOTable or any
>     > other serialization. As a matter of fact, VODML is an effort to make serializations of Data
>     > Models interoperable.
>     >
>     > The other level is that of serialization. Since VOTable has a @ucd attribute, it's smart to use
>     > it, and that's what we do in the serialization document.
>     >
>     >
>     >
>     >     Personally, I am not sure that this solution of duplicating this kind of information is the
>     >     most appropriate approach: 1) we would redo the VO work already done on UCDs and units...
>     >
>     >
>     > Nope. When you serialize a data model instance in VOTable you use the standard UCDs and Units, and
>     > the standard VOTable attributes for them (again, section 6.8 in the UTYPEs WD).
>     >
>     >
>     >     2) we will have to manage correspondences between FIELD-UCD/FIELD-unit and VO-DML-quantity.
>     >
>     >
>     > You already need to do that now, but with VODML and the serialization document there is a standard
>     > to implement, so application developers do not need to "guess" or to assume conventions.
>     >
>     >
>     >     And I have to say that the current basic IVOA model appears to me too heteroclite to be used
>     >     without fear: "identity, rational, complex, duration, anyURI, boolean, real, nonnegativeInteger,
>     >     datetime, integer, string, quantity". For a non-DM person, it is quite difficult to understand
>     >     why a given data type is considered a basic datatype (duration? datetime? anyURI?),
>     >     and why others are not (char? range? frequency? ...).
>     >
>     >
>     > Where to draw the line is a good question, and the current descriptions have been open for
>     > comment for about a year, so we are happy we are finally discussing them!
>     >
>     > Notice, however, that a non-DM person shouldn't care: VODML descriptions are meant to be used by
>     > software developers who need to know how to map the IVOA types to their language, and only DM
>     > people need to create models, so...
>     >
>     > Primitive types are special in that they need to be defined beforehand so that developers can map
>     > them to their own "primitive" classes or structures. All other types can be derived from them, and
>     > that can be done mechanically in any language (we have prototypes and reference implementations in
>     > Java and Python already, as I showed in Hawaii).
>     >
>     > I believe primitive types should all be domain-independent: frequency is a physics concept, you
>     > won't find it as a primitive type in MySQL or Java, while datetime is general and can be found in
>     > both (I am using "primitive" in a broad sense, not in a language-specific sense, e.g. Java doesn't
>     > have a datetime "primitive", but datetime and duration have corresponding classes in the standard
>     > Java library).
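As an illustration of mapping these "primitive" types onto a language's own types, here is a Python sketch using only the standard library (`datetime`, `fractions`). The mapping is hypothetical and not taken from IVOA.vo-dml.xml or from the Java/Python prototypes mentioned above. Assumptions worth flagging: duration is read as a number of seconds, anyURI is kept as a plain string, the accepted boolean spellings are a guess, and "identity" and "quantity" are left out because their semantics are not spelled out in this thread.

```python
from datetime import datetime, timedelta
from fractions import Fraction


def _boolean(text):
    # Accepted spellings are an assumption, not an IVOA definition.
    token = text.strip().lower()
    if token in ("true", "t", "1"):
        return True
    if token in ("false", "f", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def _nonnegative_integer(text):
    value = int(text)
    if value < 0:
        raise ValueError(f"nonnegativeInteger must be >= 0, got {value}")
    return value


# Hypothetical mapping from IVOA primitive type names to Python parsers.
PARSERS = {
    "boolean": _boolean,
    "integer": int,
    "nonnegativeInteger": _nonnegative_integer,
    "real": float,
    "complex": complex,
    "rational": Fraction,
    "string": str,
    "anyURI": str,
    "datetime": datetime.fromisoformat,
    "duration": lambda text: timedelta(seconds=float(text)),  # assumption: seconds
}


def parse(ivoa_type, text):
    """Convert a serialized value to the Python type chosen for ivoa_type."""
    return PARSERS[ivoa_type](text)


print(parse("datetime", "2014-05-06T11:45:22"))
print(parse("rational", "3/4"))
```

Every entry maps to a general-purpose standard-library type; a domain concept such as frequency has no natural slot here, which is the distinction drawn above.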
>     >
>     > Also, they should map at least to the VOTable concepts.
>     >
>     > Of course, this is all to some extent arbitrary and fuzzy. For instance, you mention char and
>     > duration: the first one would be good to include because it maps directly to a VOTable datatype.
>     > The second one is really on the fuzzy edge. I think it makes sense to include it among the
>     > primitive types, but I wouldn't be against leaving it out of the list.
>     >
>     > Thanks for the feedback!
>     >
>     > Omar.
>     >
>     > Reference 1. UTYPEs WD, http://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/doc/UTYPEs-WD-v1.0.pdf
>     >
>     > Reference 2. PhotDM REC, http://www.ivoa.net/documents/PHOTDM/20130928/PR-PhotDM-1.0-20130928.pdf
>     >
>     > Reference 3. PhotDM in VOTable NOTE, http://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf
>     >
>     > Reference 4. VOTable 1.3 REC, http://www.ivoa.net/documents/VOTable/20130920/REC-VOTable-1.3-20130920.html
>     >
>     > Reference 5. Spectrum 1.1 REC, http://www.ivoa.net/documents/SpectrumDM/20111120/REC-SpectrumDM-1.1-20111120.pdf
>     >
>     > Reference 6. UTYPEs: Current Usages NOTE, http://www.ivoa.net/documents/Notes/UTypesUsage/20130213/NOTE-utypes-usage-1.0-20130213.html
>     >
>     >
>     > --
>     > Omar Laurino
>     > Smithsonian Astrophysical Observatory
>     > Harvard-Smithsonian Center for Astrophysics
>     > 100 Acorn Park Dr. R-377 MS-81
>     > 02140 Cambridge, MA
>     > (617) 495-7227
> 
> 
> 
> 


