[Cube/vo-dml] ivoa datatypes
Gerard Lemson
lemson at MPA-Garching.MPG.DE
Tue May 6 09:42:16 PDT 2014
Hi Carlos
>
> OK, so I don't need to repeat the UCDs, units and so on. They are in the data model,
> but in a VOTable serialization the utype and ucd are expected (or are OK) to be
> present as PARAM or FIELD attributes.
Correct, but ONLY if you use the right types from the ivoa:quantity package.
Hence it is important that more people comment on the contents of that model, because we need to be able to cover these common cases.
> Good. I always thought that they were expected to be repeated, once as
> attributes and once as PARAMs themselves.
>
This is precisely what Pierre noted in an early phase of the Tiger Team effort.
This is why we think we need to define these standard quantity types; they are the ones allowed to be mapped this way.
> Second, for your specific example, two questions:
>
> Why do you need FIELDrefs? Wouldn't this be OK:
>
> <GROUP utype="photdm:TransmissionCurve">
>   <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
>     <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
>       utype="photdm:TransmissionPoint.spectralValue"
>       datatype="double"/>
>     <FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???"
>       utype="photdm:TransmissionPoint.transmissionValue"
>       datatype="double"/>
>   </GROUP>
> </GROUP>
>
> instead of:
>
> > <GROUP utype="photdm:TransmissionCurve">
> >   <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
> >     <FIELDref ref="_WL" utype="photdm:TransmissionPoint.spectralValue"/>
> >     <FIELDref ref="_TV" utype="photdm:TransmissionPoint.transmissionValue"/>
> >   </GROUP>
> > </GROUP>
> > <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
> >   datatype="double"/>
> > <FIELD ID="_TV" name="TRANSMISSION_VALUE"
> >   ucd="???" unit="???" datatype="double"/>
>
> And, if you use FIELDrefs, do you need to put the utypes in the FIELDref? (Maybe
> this is explained in the utype data model document, which I haven't read; forgive
> me if so.) I mean:
>
> <GROUP utype="photdm:TransmissionCurve">
>   <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
>     <FIELDref ref="_WL"/>
>     <FIELDref ref="_TV"/>
>   </GROUP>
> </GROUP>
>
> <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
>   utype="photdm:TransmissionPoint.spectralValue" datatype="double"/>
> <FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???"
>   utype="photdm:TransmissionPoint.transmissionValue" datatype="double"/>
>
Indeed, the UTYPE spec explains why we want you to use FIELDref-s and put the utypes on them.
In any case, FIELD-s inside a GROUP are illegal in VOTable, I believe.
Using FIELDref-s, we can in principle also have multiple annotations for the same FIELD.
Also, the utype attribute on the FIELD (and also on TABLE) can be used for ad hoc, application specific purposes.
The VO-DML+UTYPE spec does everything in GROUPs :)
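
As an illustration of the point about FIELDref-s, here is a minimal Python sketch (standard library only) of how a client might read the PhotDM annotation: it follows each FIELDref to the FIELD it references and takes the ucd and unit from that FIELD, so nothing is repeated in the GROUP. The function name and the shape of its result are hypothetical, and the snippet omits the VOTable XML namespace for brevity; a real document declares one, which ElementTree then reports as part of every tag name.

```python
import xml.etree.ElementTree as ET

# The PhotDM example from this thread, written WITHOUT the VOTable namespace
# (an assumption made for brevity). "???" are the same ucd/unit placeholders.
VOTABLE_SNIPPET = """
<TABLE>
  <GROUP utype="photdm:TransmissionCurve">
    <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
      <FIELDref ref="_WL" utype="photdm:TransmissionPoint.spectralValue"/>
      <FIELDref ref="_TV" utype="photdm:TransmissionPoint.transmissionValue"/>
    </GROUP>
  </GROUP>
  <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom" datatype="double"/>
  <FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???" datatype="double"/>
</TABLE>
"""


def resolve_annotations(table: ET.Element) -> dict:
    """Map each FIELDref utype to the metadata of the FIELD it references.

    ucd and unit are read from the FIELD only; the FIELDref contributes
    just the model role (its utype).
    """
    fields = {f.get("ID"): f for f in table.iter("FIELD")}
    resolved = {}
    for ref in table.iter("FIELDref"):
        field = fields[ref.get("ref")]
        resolved[ref.get("utype")] = {
            "name": field.get("name"),
            "ucd": field.get("ucd"),
            "unit": field.get("unit"),
        }
    return resolved


annotations = resolve_annotations(ET.fromstring(VOTABLE_SNIPPET.strip()))
print(annotations["photdm:TransmissionPoint.spectralValue"])
# {'name': 'WAVELENGTH', 'ucd': 'em.wl', 'unit': 'angstrom'}
```

Because the utype sits on the FIELDref rather than on the FIELD, a second GROUP could point at the same FIELD with a different utype without modifying it.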
Cheers
Gerard
> Thanks
>
> Carlos
>
> On 06/05/14 18:16, Gerard Lemson wrote:
> > Hi Carlos
> > I think the problem you have is similar to one that PhotDM solves rather
> > elegantly.
> > PhotDM has a PhotometryFilter which has a TransmissionCurve, which is
> > composed of TransmissionPoint-s.
> > See fig. 1 in http://www.ivoa.net/documents/PHOTDM/20130928/PR-PhotDM-1.0-20130928.pdf.
> > Each of the latter consists of two PhysicalQuantityDouble-s, one for
> > spectralValue, one for transmissionValue.
> > This PhysicalQuantityDouble is an ad hoc quantity type defined inside that
> > specific model.
> > The argument of my reply to Pierre, and the goal of this special ivoa data
> > model, is that if an ivoa:quantity.RealQuantity were used instead, the
> > following would be a legal annotation of a TABLE under the utype standard,
> > requiring no ucd or unit other than those already defined on the FIELD (note
> > that I don't know the ucd/unit for a transmission value):
> >
> > <GROUP utype="photdm:TransmissionCurve">
> >   <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
> >     <FIELDref ref="_WL" utype="photdm:TransmissionPoint.spectralValue"/>
> >     <FIELDref ref="_TV" utype="photdm:TransmissionPoint.transmissionValue"/>
> >   </GROUP>
> > </GROUP>
> > <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
> >   datatype="double"/>
> > <FIELD ID="_TV" name="TRANSMISSION_VALUE"
> >   ucd="???" unit="???" datatype="double"/>
> >
> > If, in fig. 12 of the Spectrum 2.0 model, you added similar attributes to the
> > SPPoint type, say spectralValue and flux (and backgroundValue), and gave all of
> > them the datatype ivoa:quantity.RealQuantity, you could similarly write (note
> > that I am making some assumptions about the utypes of the types and the prefix
> > of the model):
> >
> > <GROUP utype="specdm:Spectrum">
> >   <GROUP utype="specdm:Spectrum.data">
> >     <FIELDref ref="_WL" utype="specdm:SPPoint.spectralValue"/>
> >     <FIELDref ref="_FL" utype="specdm:SPPoint.flux"/>
> >   </GROUP>
> > </GROUP>
> > <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
> >   datatype="double"/>
> > <FIELD ID="_FL" name="FLUX" ucd="phot.flux.density;em.wl"
> >   unit="erg/cm2/s/A" datatype="double"/>
> >
> > Note that you should then remove the fluxAxis and spectralAxis (and
> > backgroundValue) collections from the model. Such constructs were originally
> > in PhotDM as well, but were replaced with the transmission point.
> > I.e. this says that a Spectrum is basically a collection of SPPoint-s, each of
> > which has a spectralValue, a flux and possibly a backgroundValue. This is
> > elegantly serialized to, and annotated on, a TABLE: each row represents one
> > such point.
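
To make the row-per-point reading concrete, a minimal sketch assuming the hypothetical SPPoint attributes and specdm utypes above: once a client knows which column each utype points to (hard-coded here, as if already derived from the FIELDref-s), every TABLE row becomes one point. The SPPoint class and the column mapping are illustrative only, not part of any published model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SPPoint:
    # Attribute names follow the hypothetical SPPoint sketched above.
    spectralValue: float
    flux: float
    backgroundValue: Optional[float] = None


# Hypothetical utype -> column index mapping, as a client might derive it
# from the FIELDref-s and the order of the FIELDs in the TABLE.
COLUMN_FOR = {"specdm:SPPoint.spectralValue": 0, "specdm:SPPoint.flux": 1}


def rows_to_points(rows, column_for=COLUMN_FOR):
    """Turn each TABLE row (TD strings) into one SPPoint instance."""
    points = []
    for row in rows:
        kwargs = {}
        for utype, index in column_for.items():
            attribute = utype.split(".")[-1]  # "specdm:SPPoint.flux" -> "flux"
            kwargs[attribute] = float(row[index])
        points.append(SPPoint(**kwargs))
    return points


rows = [("5000.0", "1.2e-15"), ("5010.0", "1.3e-15")]
points = rows_to_points(rows)
print(points[0])
# SPPoint(spectralValue=5000.0, flux=1.2e-15, backgroundValue=None)
```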
> >
> > Cheers
> > Gerard
> >
> > PS
> > No guarantee I haven't made some typos.
> >
> >>
> >> I have long had a doubt that may have something to do with this
> >> discussion (unless I'm misunderstanding everything).
> >>
> >> I want to serialize a spectrum in a VOTable.
> >> I have two fields: wavelength and flux.
> >>
> >> <FIELD name="WAVELENGTH" utype="spec:Data.SpectralAxis.Value"
> >> ucd="em.wl" unit="angstrom"
> >> datatype="double"/>
> >> <FIELD name="FLUX" utype="spec:Data.FluxAxis.Value"
> >> ucd="phot.flux.density;em.wl" unit="erg/cm2/s/A"
> >> datatype="double"/>
> >>
> >> The UCD, unit and also name of the spectral and
> >> flux axes are given there.
> >>
> >> But reading the Spectrum DM (at least version 2.0, but I think it was
> >> similar in the previous version and in other data models), I get the
> >> impression that I must duplicate this information in a Characterization group:
> >>
> >> <GROUP name="Characterization">
> >> <GROUP name="Char.FluxAxis" utype="spec:Char.FluxAxis">
> >> <PARAM name="FluxAxisName" utype="spec:Char.FluxAxis.name"
> >> value="FLUX" .../>
> >> <PARAM name="FluxAxisUcd" utype="spec:Char.FluxAxis.ucd"
> >> value="phot.flux.density;em.wl" .../>
> >> <PARAM name="FluxAxisUnit" utype="spec:Char.FluxAxis.unit"
> >> value="erg/cm2/s/A" .../>
> >> </GROUP>
> >> <GROUP name="Char.SpectralAxis">
> >> <PARAM name="SpectralAxisName" utype="spec:Char.SpectralAxis.name"
> >> value="WAVELENGTH" .../>
> >> <PARAM name="SpectralAxisUcd" utype="spec:Char.SpectralAxis.ucd"
> >> value="em.wl" .../>
> >> <PARAM name="SpectralAxisUnit" utype="spec:Char.SpectralAxis.unit"
> >> value="angstrom" .../>
> >> </GROUP>
> >> </GROUP>
> >>
> >> where I repeat the name, UCD and unit of the spectral and flux axes.
> >>
> >> Is that really needed? What for? I've always found this odd.
> >>
> >> Carlos
> >>
> >> On 06/05/14 17:03, Laurino, Omar wrote:
> >>> Hi Pierre,
> >>>
> >>>
> >>>
> >>> Let me clarify my position.
> >>>
> >>>
> >>> Your feedback has been valuable in the Tiger Team and is always welcome.
> >>>
> >>> =====
> >>> TL;DR reply (more details follow):
> >>>
> >>> I said one year ago that the VO-DML VOTable serialization proposed by
> >>> Gerard tended to move some metadata such as *UCD*, *unit* or *datatype*
> >>> outside the VOTable FIELD entity towards the proposed GROUP VO-DML
> >>> hierarchy extension. I noted that this point would be extremely annoying
> >>> for all VOTable clients such as TOPCAT or Aladin, for which this metadata
> >>> must stay in the FIELD entities.
> >>>
> >>>
> >>> I am not sure exactly what you are referring to. If it is what Gerard
> >>> commented on, yes, this was fixed long ago after you made this comment.
> >>>
> >>> If it is not, I give more information in the second part; in summary,
> >>> we are trying to standardize the serialization of data models partly for
> >>> the reason you mention: allowing clients to know where to look for
> >>> metadata, which is tricky, to say the least, with the current usages and
> >>> standards (see the second part of this email for details and examples).
> >>>
> >>> To work around this issue, and if I correctly understand the current
> >>> 2014-05-03 XML basic IVOA model description
> >>> (https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml),
> >>> the "quantity" entry now duplicates the UCD role and the unit role.
> >>>
> >>>
> >>> We are not duplicating existing standards; we are defining a
> >>> standardized way to describe and serialize data models in a
> >>> machine-readable way. You might be confusing the two levels of the
> >>> solution, which correspond to two different documents: VODML
> >>> descriptions of data models, and the serialization of such data models
> >>> in VOTable. In the second document we use the standard units and UCDs
> >>> and the corresponding standard VOTable attributes.
> >>>
> >>>
> >>>
> >>> And I have to say that the current basic IVOA model appears to me too
> >>> heteroclite to be used without fear: "identity, rational, complex,
> >>> duration, anyURI, boolean, real, nonnegativeInteger, datetime, integer,
> >>> string, quantity". For a non-DM person, it is quite difficult to
> >>> understand why one data type or another is considered a basic datatype
> >>> (duration? datetime? anyURI?), and why others are not (char? range?
> >>> frequency? ...).
> >>>
> >>>
> >>> Where to draw the line is a good question, and the current
> >>> descriptions have been open for comments for about a year, so we are
> >>> happy we are finally discussing them!
> >>>
> >>> =====
> >>>
> >>>
> >>> More detailed responses below.
> >>>
> >>>
> >>>
> >>> I said one year ago that the VO-DML VOTable serialization proposed by
> >>> Gerard tended to move some metadata such as *UCD*, *unit* or *datatype*
> >>> outside the VOTable FIELD entity towards the proposed GROUP VO-DML
> >>> hierarchy extension. I noted that this point would be extremely annoying
> >>> for all VOTable clients such as TOPCAT or Aladin, for which this metadata
> >>> must stay in the FIELD entities.
> >>>
> >>>
> >>>
> >>> I am not sure whether you are referring to the fact that an early
> >>> proof-of-concept serialization had standalone PARAMs for unit and ucd.
> >>> If that's the case, as Gerard pointed out, this was fixed long ago in
> >>> response to your feedback, and the result is in section 6.8 of the
> >>> UTYPEs draft we presented in Heidelberg one year ago, as well as in the
> >>> actual examples (Reference 1 below).
> >>>
> >>> It may also sound as if you are worried about FIELDrefs carrying the
> >>> UCD metadata rather than FIELDs. If that's the case, there are several
> >>> current standards and production implementations that use UCDs in
> >>> FIELDrefs. I am not going to elaborate too much on this, since I am not
> >>> sure whether this is really what you meant, but I will give a couple of
> >>> references, just in case. PhotDM, in section C.2 (Reference 2),
> >>> provides an example of a Cone Search response that uses FIELDrefs (with
> >>> UCDs). FIELDs are not even mentioned. This is, I believe, taken directly
> >>> from the note by Sebastien et al. (Reference 3) on how to serialize
> >>> Photometry Measurements in VOTable. The only examples there that make
> >>> use of FIELDs (sections 4.1 and 4.2) have two sets of (different) UCDs,
> >>> one for the FIELDs and one for the FIELDrefs. The other examples do not
> >>> mention FIELDs.
> >>>
> >>>
> >>> In any case, whether you meant the first or the second interpretation,
> >>> the general problem is that the current standards make it hard for
> >>> clients to make sense of the metadata, and this is one of the reasons
> >>> why we are trying to standardize the serialization of data models: to
> >>> make clients' lives easier.
> >>>
> >>> As far as I know, this only applies to UCDs and UTYPEs, because
> >>> FIELDrefs can only have these attributes besides @ref (Reference 4,
> >>> Section 7.2).
> >>>
> >>> Some models (e.g. Spectrum 1.1, Reference 5) reify UCDs by creating
> >>> UCD fields in the model (thus creating many *.ucd UTYPEs). For
> >>> instance, see the VOTable example in section 8.2 (I'm including a
> >>> snippet for convenience):
> >>>
> >>> <PARAM ID="DataFluxUcd" datatype="char" name="DataFluxUcd"
> >>>   utype="spec:Spectrum.Data.FluxAxis.Ucd" value="phot.flux.density;em.wl"
> >>>   arraysize="*">
> >>>   <DESCRIPTION>UCD for flux</DESCRIPTION>
> >>> </PARAM>
> >>>
> >>>
> >>> Notice that, as opposed to Gerard's 2012 proof of concept, this is
> >>> stated in a *standard* document.
> >>>
> >>> The status quo is that a client parsing a *standard* Spectrum 1.1
> >>> VOTable (I am using the example above, but there may be other examples
> >>> in other models) can find a UCD in many different places:
> >>> - a FIELDref with @utype spec:Spectrum.Data.FluxAxis
> >>> - a FIELD referenced by a FIELDref and without a @utype
> >>> - a FIELD with @utype spec:Spectrum.Data.FluxAxis
> >>> - a PARAM with @utype spec:Spectrum.Data.FluxAxis.Ucd
> >>> - a TD relative to a FIELD with @utype spec:Spectrum.Data.FluxAxis.Ucd
> >>>
> >>> This is what we are trying to standardize, so that it is clear to
> >>> clients how to look for metadata in an unambiguous way. Even better,
> >>> with a standard like the one suggested by the Tiger Team, parsing a
> >>> VOTable according to a data model becomes a mechanical effort, so that
> >>> users and developers can use libraries, which is currently impossible
> >>> (if you are not convinced by the above example, see the Current Usages
> >>> document, Reference 6).
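
The list of places where a flux UCD can hide can be turned into code to show what a client has to do today. This is a minimal sketch, not a Spectrum 1.1 parser: it collects every candidate from the places listed, using a made-up document (no VOTable namespace, for brevity) with deliberately inconsistent values. Deciding which candidate wins is exactly the convention a client currently has to guess; the function name and the labels are hypothetical.

```python
import xml.etree.ElementTree as ET

FLUX = "spec:Spectrum.Data.FluxAxis"

# Made-up document using three of the listed places, with conflicting values.
SAMPLE = """
<RESOURCE>
  <PARAM name="DataFluxUcd" utype="spec:Spectrum.Data.FluxAxis.Ucd"
         datatype="char" arraysize="*" value="phot.flux.density;em.wl"/>
  <TABLE>
    <GROUP utype="spec:Spectrum.Data">
      <FIELDref ref="flux" utype="spec:Spectrum.Data.FluxAxis"
                ucd="phot.flux.density;em.wl"/>
    </GROUP>
    <FIELD ID="flux" name="FLUX" ucd="phot.flux.density" datatype="double"/>
    <DATA><TABLEDATA><TR><TD>1.2e-15</TD></TR></TABLEDATA></DATA>
  </TABLE>
</RESOURCE>
"""


def flux_ucd_candidates(votable: ET.Element) -> list:
    """Collect (source, ucd) pairs from every place in the list above.

    Assumes a single TABLE, so a FIELD's position equals its TD position.
    """
    found = []
    fields = list(votable.iter("FIELD"))
    by_id = {f.get("ID"): f for f in fields if f.get("ID")}
    for ref in votable.iter("FIELDref"):
        if ref.get("utype") != FLUX:
            continue
        if ref.get("ucd"):
            found.append(("FIELDref@ucd", ref.get("ucd")))
        target = by_id.get(ref.get("ref"))
        if target is not None and target.get("utype") is None and target.get("ucd"):
            found.append(("referenced FIELD@ucd", target.get("ucd")))
    for field in fields:
        if field.get("utype") == FLUX and field.get("ucd"):
            found.append(("FIELD@ucd", field.get("ucd")))
    for param in votable.iter("PARAM"):
        if param.get("utype") == FLUX + ".Ucd":
            found.append(("PARAM@value", param.get("value")))
    for index, field in enumerate(fields):
        if field.get("utype") == FLUX + ".Ucd":
            # Here the UCD is data: one value per row, in this column's TD.
            for row in votable.iter("TR"):
                found.append(("TD", row.findall("TD")[index].text))
    return found


for source, ucd in flux_ucd_candidates(ET.fromstring(SAMPLE.strip())):
    print(f"{source}: {ucd}")
```

With a standardized serialization, the loop over alternatives would collapse to a single, documented lookup.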
> >>>
> >>>
> >>>
> >>> To work around this issue, and if I correctly understand the current
> >>> 2014-05-03 XML basic IVOA model description
> >>> (https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml),
> >>> the "quantity" entry now duplicates the UCD role and the unit role.
> >>>
> >>>
> >>> I believe you are confusing two levels, which are represented by two
> >>> documents. One level is the data model description. Data models can
> >>> (in fact they do, see Spectrum 1.1) define ucd and unit as fields of
> >>> their models, reifying them. Even when they don't, there are some cases
> >>> (I can provide examples from production services) where the data
> >>> publisher needs to reify some of the metadata. For instance, consider a
> >>> column where the same quantity is expressed in different units: in this
> >>> case the unit becomes data, and you need a column to store it.
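
A minimal sketch of the case where the unit becomes data: the unit is read row by row from its own column rather than from the FIELD's unit attribute. The conversion table and the unit strings are illustrative assumptions, not taken from any IVOA document.

```python
# Hypothetical factors to angstrom: 1 nm = 10 angstrom, 1 um = 1e4 angstrom.
TO_ANGSTROM = {"angstrom": 1.0, "nm": 10.0, "um": 1.0e4}


def normalize_wavelengths(rows):
    """Convert (value, unit) rows, where unit is a per-row column, to angstrom."""
    return [value * TO_ANGSTROM[unit] for value, unit in rows]


rows = [(5000.0, "angstrom"), (500.0, "nm"), (0.5, "um")]
print(normalize_wavelengths(rows))  # [5000.0, 5000.0, 5000.0]
```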
> >>>
> >>> So, VODML supports all of these real-world examples. This has nothing
> >>> to do with VOTable or any other serialization. As a matter of fact,
> >>> VODML is indeed an effort to make serializations of data models
> >>> interoperable.
> >>>
> >>> The other level is serialization. Since VOTable has a @ucd attribute,
> >>> it's smart to use it, and that's what we do in the serialization
> >>> document.
> >>>
> >>>
> >>>
> >>> Personally, I am not sure that duplicating this kind of information is
> >>> the most appropriate approach: 1) we would redo the VO efforts already
> >>> made on UCDs and units...
> >>>
> >>>
> >>> Nope. When you serialize a data model instance in VOTable, you use the
> >>> standard UCDs and units, and the standard VOTable attributes for them
> >>> (again, section 6.8 in the UTYPEs WD).
> >>>
> >>>
> >>> 2) we will have to manage correspondences between FIELD-UCD/FIELD-unit
> >>> and VO-DML-quantity.
> >>>
> >>>
> >>> You already need to do that now, but with VODML and the serialization
> >>> document there is a standard to be implemented, so application
> >>> developers do not need to "guess" or to assume conventions.
> >>>
> >>>
> >>> And I have to say that the current basic IVOA model appears to me too
> >>> heteroclite to be used without fear: "identity, rational, complex,
> >>> duration, anyURI, boolean, real, nonnegativeInteger, datetime, integer,
> >>> string, quantity". For a non-DM person, it is quite difficult to
> >>> understand why one data type or another is considered a basic datatype
> >>> (duration? datetime? anyURI?), and why others are not (char? range?
> >>> frequency? ...).
> >>>
> >>>
> >>> Where to draw the line is a good question, and the current
> >>> descriptions have been open for comments for about a year, so we are
> >>> happy we are finally discussing them!
> >>>
> >>> Notice, however, that a non-DM person shouldn't care: VODML
> >>> descriptions are meant to be used by software developers who need to
> >>> know how to map the IVOA types to their language, and only DM people
> >>> need to create models, so...
> >>>
> >>> Primitive types are special in that they need to be defined
> >>> beforehand so that developers can map them to their own "primitive"
> >>> classes or structures. All other types can be derived from them, and
> >>> that can be done mechanically in any language (we have prototypes and
> >>> reference implementations in Java and Python already, as I showed in
> >>> Hawaii).
> >>>
> >>> I believe primitive types should all be domain-independent:
> >>> frequency is a physics concept that you won't find as a primitive type
> >>> in MySQL or Java, while datetime is general and can be found in both
> >>> (I am using "primitive" in a broad sense, not in a language-specific
> >>> sense... e.g. Java doesn't have a datetime "primitive", but datetime and
> >>> duration have corresponding classes in the standard Java library).
> >>>
> >>> Also, they should map at least to the VOTable concepts.
> >>>
> >>> Of course, this is all to some extent arbitrary and fuzzy. For
> >>> instance, you mention char and duration: the first one would be good to
> >>> include because it maps directly to a VOTable datatype. The second one
> >>> is really on the fuzzy edge. I think it makes sense to include it among
> >>> the primitive types, but I wouldn't be against leaving it out of the
> >>> list.
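
The mapping of primitive types to language types described above can be sketched in Python as a table from type names to parsers of their serialized string form. This is not the prototype mentioned above: the "ivoa:" prefixes, the choice of Python types and the seconds-based duration format are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from fractions import Fraction


def _boolean(text: str) -> bool:
    # Accepts the XML Schema lexical forms of a boolean.
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def _nonnegative_integer(text: str) -> int:
    value = int(text)
    if value < 0:
        raise ValueError(f"nonnegativeInteger must be >= 0, got {value}")
    return value


def _duration_seconds(text: str) -> timedelta:
    # Assumption: a duration written as a number of seconds, not ISO 8601.
    return timedelta(seconds=float(text))


# Hypothetical mapping from the type names quoted above (with an assumed
# "ivoa:" prefix) to Python parsers; identity and quantity are left out.
IVOA_PRIMITIVES = {
    "ivoa:boolean": _boolean,
    "ivoa:integer": int,
    "ivoa:nonnegativeInteger": _nonnegative_integer,
    "ivoa:real": float,
    "ivoa:complex": complex,
    "ivoa:rational": Fraction,
    "ivoa:string": str,
    "ivoa:anyURI": str,
    "ivoa:datetime": datetime.fromisoformat,
    "ivoa:duration": _duration_seconds,
}


def parse_value(type_name: str, text: str):
    """Parse a serialized value according to its (hypothetical) IVOA type."""
    return IVOA_PRIMITIVES[type_name](text)


print(parse_value("ivoa:rational", "3/4"))                  # 3/4
print(parse_value("ivoa:datetime", "2014-05-06T09:42:16"))  # 2014-05-06 09:42:16
print(parse_value("ivoa:duration", "90"))                   # 0:01:30
```

Everything outside this table would be derived mechanically from these entries, which is why the list of primitives has to be fixed beforehand.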
> >>>
> >>> Thanks for the feedback!
> >>>
> >>> Omar.
> >>>
> >>> Reference 1. UTYPEs WD
> >>> http://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/doc/UTYPEs-WD-v1.0.pdf
> >>>
> >>> Reference 2. PhotDM REC
> >>> http://www.ivoa.net/documents/PHOTDM/20130928/PR-PhotDM-1.0-20130928.pdf
> >>>
> >>> Reference 3. PhotDM in VOTable NOTE
> >>> http://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf
> >>>
> >>> Reference 4. VOTable 1.3 REC
> >>> http://www.ivoa.net/documents/VOTable/20130920/REC-VOTable-1.3-20130920.html
> >>>
> >>> Reference 5. Spectrum 1.1 REC
> >>> http://www.ivoa.net/documents/SpectrumDM/20111120/REC-SpectrumDM-1.1-20111120.pdf
> >>>
> >>> Reference 6. UTYPEs: Current Usages NOTE
> >>> http://www.ivoa.net/documents/Notes/UTypesUsage/20130213/NOTE-utypes-usage-1.0-20130213.html
> >>>
> >>>
> >>> --
> >>> Omar Laurino
> >>> Smithsonian Astrophysical Observatory Harvard-Smithsonian Center for
> >>> Astrophysics
> >>> 100 Acorn Park Dr. R-377 MS-81
> >>> 02140 Cambridge, MA
> >>> (617) 495-7227
> >