[Cube/vo-dml] ivoa datatypes
Carlos Rodrigo
crb at cab.inta-csic.es
Tue May 6 10:14:18 PDT 2014
Hi
OK, so I don't need to repeat the UCDs, units and so on. They are in the data model, but in a VOTable
serialization the utype and ucd are expected (or at least allowed) to be present as PARAM or FIELD attributes.
Good. I always thought that they were expected to be repeated, once as attributes and once as PARAMs
themselves.
Second, for your specific example, two questions.
Why do you need FIELDrefs? Wouldn't this be OK:
<GROUP utype="photdm:TransmissionCurve">
<GROUP utype="photdm:TransmissionCurve.transmissionPoint">
<FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
utype="photdm:TransmissionPoint.spectralValue" datatype="double"/>
<FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???"
utype="photdm:TransmissionPoint.transmissionValue" datatype="double"/>
</GROUP>
</GROUP>
instead of:
> <GROUP utype="photdm:TransmissionCurve">
> <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
> <FIELDref ref="_WL" utype="photdm:TransmissionPoint.spectralValue"/>
> <FIELDref ref="_TV" utype="photdm:TransmissionPoint.transmissionValue"/>
> </GROUP>
> </GROUP>
> <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom" datatype="double"/>
> <FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???" datatype="double"/>
And if you use FIELDrefs, do you need to put the utypes in the FIELDref? (Maybe this is explained
in the utype data model document, which I haven't read; forgive me if that's the case.) I mean:
<GROUP utype="photdm:TransmissionCurve">
<GROUP utype="photdm:TransmissionCurve.transmissionPoint">
<FIELDref ref="_WL"/>
<FIELDref ref="_TV"/>
</GROUP>
</GROUP>
<FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom"
utype="photdm:TransmissionPoint.spectralValue" datatype="double"/>
<FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???"
utype="photdm:TransmissionPoint.transmissionValue" datatype="double"/>
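To make the question concrete, here is a minimal Python sketch of how a client might cope with both placements of the utype. It is only an illustration under stated assumptions: the function name resolve_fieldrefs and the TABLE wrapper are hypothetical, the identifier attribute is spelled ID as in the VOTable schema, and the rule "a utype on the FIELDref wins, otherwise fall back to the one on the FIELD" is my assumption, not something taken from the utype document.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, wrapped in a TABLE element so that it parses on its own.
# The first FIELDref carries its own utype; the second leaves it on the FIELD.
VOTABLE_FRAGMENT = """
<TABLE>
  <GROUP utype="photdm:TransmissionCurve">
    <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
      <FIELDref ref="_WL" utype="photdm:TransmissionPoint.spectralValue"/>
      <FIELDref ref="_TV"/>
    </GROUP>
  </GROUP>
  <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom" datatype="double"/>
  <FIELD ID="_TV" name="TRANSMISSION_VALUE" datatype="double"
         utype="photdm:TransmissionPoint.transmissionValue"/>
</TABLE>
"""


def resolve_fieldrefs(xml_text):
    """Return one dict per FIELDref, merged with the FIELD it points to."""
    root = ET.fromstring(xml_text)
    # Index the FIELD elements by their ID so each FIELDref can be looked up.
    fields = {field.get("ID"): field for field in root.iter("FIELD")}
    resolved = []
    for ref in root.iter("FIELDref"):
        field = fields[ref.get("ref")]
        resolved.append({
            "name": field.get("name"),
            "unit": field.get("unit"),
            # Assumed precedence: FIELDref attribute first, then the FIELD's.
            "ucd": ref.get("ucd") or field.get("ucd"),
            "utype": ref.get("utype") or field.get("utype"),
        })
    return resolved


for column in resolve_fieldrefs(VOTABLE_FRAGMENT):
    print(column["name"], column["utype"])
```

With a fallback like this the client gets the same utype for a column wherever it was written, but that only works if the precedence is agreed upon, which is really what I am asking.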
Thanks
Carlos
On 06/05/14 18:16, Gerard Lemson wrote:
> Hi Carlos
> I think that the problem you have is similar to one that PhotDM solves rather elegantly, I believe.
> PhotDM has a PhotometryFilter which has a TransmissionCurve, which is composed of TransmissionPoint-s.
> See fig 1 in http://www.ivoa.net/documents/PHOTDM/20130928/PR-PhotDM-1.0-20130928.pdf.
> The latter consist of two PhysicalQuantityDouble-s, one for spectralValue, one for transmissionValue.
> This PhysicalQuantityDouble is an ad hoc quantity type defined inside this special model.
> The argument of my reply to Pierre, and the goal of this special ivoa data model, is that if an ivoa:quantity.RealQuantity were used instead, the following would be a legal annotation of a TABLE, following the utype standard and not requiring any ucd or utype other than the one already defined in the FIELD (note that I don't know the ucd/unit for a transmission value):
>
> <GROUP utype="photdm:TransmissionCurve">
> <GROUP utype="photdm:TransmissionCurve.transmissionPoint">
> <FIELDref ref="_WL" utype="photdm:TransmissionPoint.spectralValue"/>
> <FIELDref ref="_TV" utype="photdm:TransmissionPoint.transmissionValue"/>
> </GROUP>
> </GROUP>
> <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom" datatype="double"/>
> <FIELD ID="_TV" name="TRANSMISSION_VALUE" ucd="???" unit="???" datatype="double"/>
>
> If in fig 12 of the spectrum 2.0 model you'd add similar attributes in the SPPoint type, say spectralValue and flux (and backgroundValue), give all of these datatype ivoa:quantity.RealQuantity, you could similarly write (note that I make some assumptions on utypes of the types and the prefix of the model):
>
> <GROUP utype="specdm:Spectrum">
> <GROUP utype="specdm:Spectrum.data">
> <FIELDref ref="_WL" utype="specdm:SPPoint.spectralValue"/>
> <FIELDref ref="_FL" utype="specdm:SPPoint.flux"/>
> </GROUP>
> </GROUP>
> <FIELD ID="_WL" name="WAVELENGTH" ucd="em.wl" unit="angstrom" datatype="double"/>
> <FIELD ID="_FL" name="FLUX" ucd="phot.flux.density;em.wl" unit="erg/cm2/s/A" datatype="double"/>
>
> Note that you should then remove the fluxAxis and spectralAxis (and backgroundValue) collections from the model. Such constructs were there originally in the PhotDM as well, but were replaced with the transmission point.
> I.e. this says that a Spectrum is basically a collection of SPPoint-s, each of which has a spectralValue, flux and possibly a backgroundValue. And this is elegantly serialized and annotated to a TABLE: each row represents one such point.
>
> Cheers
> Gerard
>
> PS
> No guarantee that I have not made some typos.
>
>>
>> I have always had a doubt that could have something to do with this discussion
>> (if I'm not understanding everything wrong)
>>
>> I want to serialize a spectrum in a VOTable.
>> I have two fields: wavelength and flux.
>>
>> <FIELD name="WAVELENGTH" utype="spec:Data.SpectralAxis.Value"
>> ucd="em.wl" unit="angstrom"
>> datatype="double"/>
>> <FIELD name="FLUX" utype="spec:Data.FluxAxis.Value"
>> ucd="phot.flux.density;em.wl" unit="erg/cm2/s/A"
>> datatype="double"/>
>>
>> the information about ucd, unit and also name for the Spectral and Flux axis is
>> given there.
>>
>> But reading the Spectrum DM (at least version 2.0, but I think that it was similar
>> in the previous one and in other DataModels) I get the impression that I must
>> duplicate this information in a Characterization group:
>>
>> <GROUP name="Characterization">
>> <GROUP name="Char.FluxAxis" utype="spec:Char.FluxAxis">
>> <PARAM name="FluxAxisName" utype="spec:Char.FluxAxis.name"
>> value="FLUX" .../>
>> <PARAM name="FluxAxisUcd" utype="spec:Char.FluxAxis.ucd"
>> value="phot.flux.density;em.wl" .../>
>> <PARAM name="FluxAxisUnit" utype="spec:Char.FluxAxis.unit"
>> value="erg/cm2/s/A" .../>
>> </GROUP>
>> <GROUP name="Char.SpectralAxis">
>> <PARAM name="SpectralAxisName" utype="spec:Char.SpectralAxis.name"
>> value="WAVELENGTH" .../>
>> <PARAM name="SpectralAxisUcd" utype="spec:Char.SpectralAxis.ucd"
>> value="em.wl" .../>
>> <PARAM name="SpectralAxisUnit" utype="spec:Char.SpectralAxis.unit"
>> value="angstrom" .../>
>> </GROUP>
>> </GROUP>
>>
>> where I say again the name, ucd and unit for the spectral and flux axis.
>>
>> Is that really needed? what for? I've always found this odd.
>>
>> Carlos
>>
>> On 06/05/14 17:03, Laurino, Omar wrote:
>>> Hi Pierre,
>>>
>>>
>>>
>>> May I precise my position.
>>>
>>>
>>> Your feedback has been valuable in the Tiger Team and is always welcome.
>>>
>>> =====
>>> TL;DR reply (more details follow):
>>>
>>> I said one year ago that the VO-DML VOTable serialization proposed by Gerard tended to move some
>>> meta information such as *UCD*, *unit* or *datatype* outside the VOTable FIELD entity towards
>>> the proposed GROUP VO-DML hierarchy extension. I noted that this point would be extremely
>>> annoying for all VOTable clients such as TOPcat or Aladin for which this metadata information
>>> must stay in the FIELD entities.
>>>
>>>
>>> I am not sure what you are exactly referring to. If it is what Gerard
>>> commented on, yes, this was fixed long ago after you made this comment.
>>>
>>> If it is not, I am giving more information in the second part, but in
>>> summary we are trying to standardize the serialization of Data Models
>>> also for the reason you mention: allowing clients to know where to
>>> look for metadata, which is tricky, to say the least, with the current usages and
>>> standards (see the second part of the email for details and examples).
>>>
>>> For bypassing this issue, and if I correctly understand the current 2014-05-03 XML basic IVOA
>>> model description
>>> (https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml), the
>>> "quantity" entry duplicates now the UCD role and unit role.
>>>
>>>
>>> We are not duplicating existing standards, we are defining a
>>> standardized way to describe and serialize data models in a
>>> machine-readable way. You might be confusing the two levels of the
>>> solution, which correspond to two different documents: VODML
>>> descriptions of data models, and the serialization of such data models in
>>> VOTable. In the second document we use the standardized units and ucd and the
>>> corresponding VOTable standard attributes.
>>>
>>>
>>>
>>> And I have to say that the current basic IVOA model appears for me too heteroclite to be used
>>> without fear: "identity, rational, complex, duration, anyURI, boolean, real, nonnegativeInteger,
>>> datetime, integer, string, quantity". For a no-DM person, it is quite difficult to understand
>>> why such or such data type is considered as a basic datatype (duration ? datetime ? anyURI ?),
>>> and why others are not (char ?, range ? frequency ? ...).
>>>
>>>
>>> Where to draw the line is a good question, and the current
>>> descriptions have been available for comment for about a year, so we are
>>> happy we are finally discussing them!
>>>
>>> =====
>>>
>>>
>>> More detailed responses below.
>>>
>>>
>>>
>>> I said one year ago that the VO-DML VOTable serialization proposed by Gerard tended to move some
>>> meta information such as *UCD*, *unit* or *datatype* outside the VOTable FIELD entity towards
>>> the proposed GROUP VO-DML hierarchy extension. I noted that this point would be extremely
>>> annoying for all VOTable clients such as TOPcat or Aladin for which this metadata information
>>> must stay in the FIELD entities.
>>>
>>>
>>>
>>> I am not sure whether you refer to the fact that in an early proof of
>>> concept serialization there were standalone PARAMs for unit and ucd.
>>> If that's the case, as Gerard pointed out this was fixed long ago in
>>> response to your feedback and the result is in section 6.8 of the UTYPEs draft
>>> we presented in Heidelberg one year ago, as well as in the actual examples
>>> (Reference 1 below).
>>>
>>> It may also sound like you are worried about FIELDrefs having the UCD
>>> metadata as opposed to FIELDs.
>>> If that's the case, there are several current standards and production
>>> implementations that use UCDs in FIELDrefs. I am not going to
>>> elaborate too much on this, since I am not sure whether this is really
>>> what you meant, but I will give a couple of references, just in case.
>>> The PhotDM, in section
>>> C.2 (Reference 2), provides an example of a Cone Search response, and uses
>>> FIELDrefs (with UCDs).
>>> FIELDs are not even mentioned. This is, I believe, taken directly from
>>> the note by Sebastien et al (Reference 3) on how to serialize
>>> Photometry Measurements in VOTable. The only examples that make use
>>> of FIELDs (sections 4.1 and 4.2) have two sets of (different) UCDs, one for the
>>> FIELDs and one for the FIELDrefs. The other examples do not mention FIELDs.
>>>
>>>
>>> In any case, whether you meant the first or the second interpretation,
>>> more generally, the problem is that the current standards make it hard
>>> for clients to make sense of the metadata, and this is one of the
>>> reasons why we are trying to standardize the serialization of data models: to
>>> make clients' life easier.
>>>
>>> As far as I know this only applies to UCDs and UTYPEs, because
>>> FIELDrefs can only have these attributes (Reference 4, Section 7.2).
>>>
>>> Some models (e.g. Spectrum 1.1, Reference 5) reify UCDs by
>>> creating UCD fields in the model (thus creating many *.ucd UTYPEs).
>>> For instance, see the VOTable example in section 8.2 (I'm including a snippet
>>> for convenience):
>>> <PARAM ID="DataFluxUcd" datatype="char" name="DataFluxUcd"
>>> utype="spec:Spectrum.Data.FluxAxis.Ucd" value="phot.flux.density;em.wl"
>>> arraysize="*">
>>> <DESCRIPTION>UCD for flux</DESCRIPTION>
>>> </PARAM>
>>>
>>>
>>> Notice that, as opposed to Gerard's 2012 proof of concept, this is stated in a
>>> *standard* document.
>>>
>>> The status quo is that a client parsing a *standard* Spectrum 1.1
>>> VOTable (I am using the example above, but there may be other examples in
>>> other models) can find a UCD in many different places:
>>> - a FIELDref with @utype spec:Spectrum.Data.FluxAxis
>>> - a FIELD referenced by a FIELDref and without a @utype
>>> - a FIELD with @utype spec:Spectrum.Data.FluxAxis
>>> - a PARAM with @utype spec:Spectrum.Data.FluxAxis.Ucd
>>> - a TD relative to a FIELD with @utype spec:Spectrum.Data.FluxAxis.Ucd
>>>
>>> This is what we are trying to standardize, so that it is clear to
>>> clients how to look for metadata in an unambiguous way. Even better,
>>> with a standard like the one suggested by the Tiger Team, parsing a
>>> VOTable according to a data model becomes a mechanical effort, so that
>>> users and developers can use libraries, which is currently impossible (if not
>>> convinced by the above example, see the Current Usages document, Reference 6).
>>>
>>>
>>>
>>> For bypassing this issue, and if I correctly understand the current 2014-05-03 XML basic IVOA
>>> model description
>>> (https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/ivoa/IVOA.vo-dml.xml), the
>>> "quantity" entry duplicates now the UCD role and unit role.
>>>
>>>
>>> I believe you are confusing two levels, which are represented by two
>>> documents. One level is the data model description. Data Models can
>>> (in fact they do, see Spectrum 1.1) define ucd and unit as fields of
>>> their models, reifying them. Even when they don't, there are some
>>> cases (I can provide examples from production services) where the data
>>> publisher needs to reify some of the metadata. For instance, consider a column
>>> where the same quantity is expressed in different units: in this case the unit
>>> piece of metadata becomes data and you need a column to store it.
>>>
>>> So, VODML supports all of these real world examples. This has nothing
>>> to do with VOTable or any other serialization. As a matter of fact,
>>> VODML is indeed an effort to make serializations of Data Models
>>> interoperable.
>>>
>>> The other level is the one of serialization. Since VOTable has a @ucd
>>> attribute, it's smart to use it, and that's what we do in the serialization
>>> document.
>>>
>>>
>>>
>>> Personally, I am not sure that this solution to duplicate this kind of information will be the
>>> more appropriate approach: 1) we redo our VO efforts already done on UCDs and units...
>>>
>>>
>>> Nope. When you serialize a data model instance in VOTable you use the
>>> standard UCDs and Units, and the standard VOTable attributes for them
>>> (again, section 6.8 in the UTYPEs WD).
>>>
>>>
>>> 2) we will have to manage correspondences between FIELD-UCD/FIELD-unit and VO-DML-quantity.
>>>
>>>
>>> You already need to do that now, but with VODML and the serialization
>>> document there is a standard to be implemented, so application developers do
>>> not need to "guess", or to assume conventions.
>>>
>>>
>>> And I have to say that the current basic IVOA model appears for me too heteroclite to be used
>>> without fear: "identity, rational, complex, duration, anyURI, boolean, real, nonnegativeInteger,
>>> datetime, integer, string, quantity". For a no-DM person, it is quite difficult to understand
>>> why such or such data type is considered as a basic datatype (duration ? datetime ? anyURI ?),
>>> and why others are not (char ?, range ? frequency ? ...).
>>>
>>>
>>> Where to draw the line is a good question, and the current
>>> descriptions have been available for comment for about a year, so we are
>>> happy we are finally discussing them!
>>>
>>> Notice, however, that a no-DM person shouldn't care: VODML
>>> descriptions are meant to be used by software developers who need to
>>> know how to map the IVOA types to their language, and only DM people need
>>> to create models, so...
>>>
>>> Primitive types are special in that they need to be defined beforehand
>>> so that developers can map them to their own "primitive" classes or
>>> structures. All other types can be derived from them, and that can be
>>> done mechanically in any language (we have prototypes and reference
>>> implementations in Java and Python already, as I showed in Hawaii).
>>>
>>> I believe primitive types should all be domain-independent: frequency
>>> is a physics concept, you won't find it as a primitive type in MySQL
>>> or Java, while datetime is general and can be found in both (I am
>>> using "primitive" in a broad sense, not in a language-specific
>>> sense... e.g. Java doesn't have a datetime "primitive", but datetime and
>>> duration have corresponding classes in the standard Java library).
>>>
>>> Also, they should map at least to the VOTable concepts.
>>>
>>> Of course, this is all to some extent arbitrary and fuzzy. For
>>> instance, you mention char and duration: the first one would be good
>>> to include because it maps directly to a VOTable datatype. The second
>>> one is really on the fuzzy edge. I think it makes sense to include it
>>> among the primitive types, but I wouldn't be against leaving it out of the list.
>>>
>>> Thanks for the feedback!
>>>
>>> Omar.
>>>
>>> Reference 1. UTYPEs WD
>>> http://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/doc/UTYPEs-WD-v1.0.pdf
>>>
>>> Reference 2. PhotDM REC
>>> http://www.ivoa.net/documents/PHOTDM/20130928/PR-PhotDM-1.0-20130928.pdf
>>>
>>> Reference 3. PhotDM in VOTable NOTE
>>> http://wiki.ivoa.net/internal/IVOA/PhotometryDataModel/NOTE-PPDMDesc-0.1-20101202.pdf
>>>
>>> Reference 4. VOTable 1.3 REC
>>> http://www.ivoa.net/documents/VOTable/20130920/REC-VOTable-1.3-20130920.html
>>>
>>> Reference 5. Spectrum 1.1 REC
>>> http://www.ivoa.net/documents/SpectrumDM/20111120/REC-SpectrumDM-1.1-20111120.pdf
>>>
>>> Reference 6. UTYPEs: Current Usages NOTE
>>> http://www.ivoa.net/documents/Notes/UTypesUsage/20130213/NOTE-utypes-usage-1.0-20130213.html
>>>
>>>
>>> --
>>> Omar Laurino
>>> Smithsonian Astrophysical Observatory
>>> Harvard-Smithsonian Center for Astrophysics
>>> 100 Acorn Park Dr. R-377 MS-81
>>> 02140 Cambridge, MA
>>> (617) 495-7227
>