Spectral 2.0 implementor feedback

Tue Feb 24 00:44:46 CET 2015

Markus,

It's great to see feedback from implementation!
I'm not sure how this folds into the process; you raise some new points
here.

Many of your concerns seem to stem from this document not including some of
the improvements/benefits of the cube and vo-dml work which has occurred
since this document first went through PR.  It is true, this document
reflects the state of things as of a couple years ago, and the requirements
placed upon the generation of this version in general.  Things have
progressed quite a bit since then and are reflected in the newer (cube)
documents.

The priority for the DM group right now is the Cube model, so I can't spend
too much time iterating on this document.

A note about SSA vs SDM.
   SDM (1 or 2) is the model for a Spectrum instance.
   SSA is an access protocol for discovering said instances.

SDM does not use/reuse any elements from SSA.  SSA makes use of SDM.
As such, utype changes in SDM can make for an incompatibility with the SSA
protocol, since SSA tied its definition of the query response so directly
to the model.
This is one reason why it was mandated that this revision of the model
should have 'minimal impact on UTypes'.

Some responses below:

On Wed, Feb 18, 2015 at 9:55 AM, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

> Dear DM,
>
> I've finally added support for the spectral data model 2 to DaCHS, which
> actually means DaCHS can now generate spectra in VOTable that I kinda
> hope conform to the SDM2 from SSA metadata and matching data.
> Note that I'm not using any of the new features, in particular not the
> photometryPoint.
>
> This is based on current support of SDM1 (in VOTable and FITS) -- what
> few clients support SDM1 appear to have been ok with what it has been
> producing.
>
> The actual implementation was mainly replacing some utypes and removing
> some of the more baroque aspects of SDM1 (like the XML namespace
> declaration for the utype prefix).  And then a bit to let people ask
> for SDM2.
>
> The following is a report on my experiences during implementation.  I'm
> sorry I've not found time to do this during public review -- I'm taking
> my Registry chair hat off for this, and none of this is meant as part of
> TCG review.  So, consider this an extremely late part of public review.
>
> My first challenge was: How do I let people tell me they want SDM2
> files?  I've been equating application/x-votable+xml with SDM1 files,
> so I can't really use that.  Having a (more or less) proper media
> type for SDM2 is IMHO important in many settings, however (think
> ObsCore).  I'm now using
>
>   application/x-votable+xml;content=spec2
>
>
I think this is an SSA question, outside the scope of SDM.
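
For illustration only, here is a minimal Python sketch of how a client
might recognise the media type Markus describes and encode it for a FORMAT
parameter.  The function name is hypothetical, and "content=spec2" is
Markus's current implementation choice, not something defined by SDM or
SSA.

```python
from urllib.parse import quote

def parse_media_type(value):
    """Split a media type string into (type, {parameter: value})."""
    base, *params = [part.strip() for part in value.split(";")]
    parsed = {}
    for param in params:
        if "=" in param:
            key, val = param.split("=", 1)
            parsed[key.strip().lower()] = val.strip().strip('"')
    return base.lower(), parsed

# Media type taken from Markus's message; not codified in any document yet.
media_type = "application/x-votable+xml;content=spec2"
base, params = parse_media_type(media_type)

# Assumption: a client treats content=spec2 as "SDM2 serialized in VOTable".
is_sdm2 = base == "application/x-votable+xml" and params.get("content") == "spec2"

# Percent-encode for use as a FORMAT query parameter value.
encoded = quote(media_type, safe="/")
print(is_sdm2, encoded)
```

The encoded value has the same form as the FORMAT value in the example
links quoted below.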

> -- I think the document should codify that (or something similar), or
> clients can't really figure out what's in a VOTable spectrum (before
> downloading it).  For me, this media type is an option in the AccessData
> services' FORMAT parameter.
>
> So, you can now pull such spectra from the Heidelberg spectral services
> using datalink data access services.  As client support for that is
> thin, here are two direct links to randomly selected spectra
> generated in that way -- absent a validator I appreciate any hints
> what I'm doing wrong:
>
> A plain, observational spectrum:
>
>
> http://dc.g-vo.org/flashheros/q/sdl/dlget?ID=ivo%3A//org.gavo.dc/%7E%3Fflashheros/data/ca92/f0006.mt&FORMAT=application/x-votable%2Bxml%3Bcontent%3Dspec2
>
> A plain, theoretical spectrum (more on this below; this is known to be
> non-compliant, since position in space and time are required in
> SDM2):
>
>
> http://dc.g-vo.org/theossa/q/sdl/dlget?ID=ivo%3A%2F%2Fwww.g-vo.org%2Ftheossa%2Fq%2Fdata%2F0038000_5.70_H_9.968E-01_HE_3.167E-03_02000-03000A_2008-08-02_07_20_01&FORMAT=application/x-votable%2Bxml%3Bcontent%3Dspec2
>
> The "plain" above refers to my original hope that SDM2 would let me
> properly write Echelle spectra.  Here's an example for my improvised
> Echelle-as-sequence-of-almost-SDM1 hack -- I understand this is not in
> scope for SDM2, but frankly I'd love to be corrected:
>
> http://dc.g-vo.org/flashheros/q/echdl/dlget?ORDER_MAX=114&ORDER_MIN=112&ID=ivo%3A%2F%2Forg.gavo.dc%2F%7E%3Fflashheros%2Fdata_raw%2Fls95%2Fblue%2Fn0043.mt
>
>
I don't believe this model can accommodate Echelle spectra.  It is
something I'd like to address too, since it's been on the list for so long.
The priority now is the Cube work.  With that done, representing Spectrum
('plain' or Echelle) and TimeSeries in terms of a common framework should
be pretty straightforward.

> Then, here's a list of issues I encountered during this work:
>
>
> (1) 2.1.7 Dataset.dataID:DataID promises 2.6 would talk about "any
> associations with various collections" -- does it?  I'm asking because I
> was considering including SSA's association metadata (I'm not saying it
> should be included, but I'm wondering).  If this is a remnant of earlier
> modelling, I suggest removing it.  If "any associations" refers to
> DataID.collection, shouldn't that be made explicit?
>
>
The text in 2.1.7 is a high level description of what the DataID object
does in this particular context.  In this case, it simply summarizes the
content of the object.

Section 2.6 describes the DataID object, where:
"high level identification metadata" = title, datasetID, creatorDID,
observationID, etc.
"associations with various collections" => collection

I'm not real familiar with it, but it looks like the SSA Association object
is an element of the query response for grouping various datasets in the
response.

> (2) Like SSA, SDM2 has both DataID.Version and Curation.Version.  Here's
> what the explanation in SDM2 is:
>
>   2.4.5 Curation.version:string
>   Version is provided by the publisher or creator and may be any string.
>   (RM:Curation.Version)
>
>   2.6.7 DataID.version:string
>   Version of the creator-produced dataset.
>
> As an implementor, I'd certainly appreciate some guidance as to what the
> difference between the two is and what scenarios for using one or
> the other might be.
>

As I mentioned on the twiki, I don't know what the difference is. I assume
it accommodates a scenario like:
  + creator processes observation; releases version 1.
  + data reprocessed (multiple times?).. generates version 4
  + data provider receives dataset (for the first time)
      Curation.version == '1'
      DataID.version == '4'

   + curator modifies/updates some curation metadata (maybe their dataset id?)
      Curation.version == '2'
      DataID.version == '4'
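
The scenario above can be sketched in a few lines of Python.  This is only
an illustration of the assumed reading, not something the document defines:
the class and function names are hypothetical, and counting versions as
integers is an assumption (the document allows any string).

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DatasetVersions:
    dataid_version: str    # DataID.version: version of the creator's dataset
    curation_version: str  # Curation.version: version on the publisher's side

def update_curation_metadata(versions):
    """Curator edits curation metadata; only Curation.version moves."""
    bumped = str(int(versions.curation_version) + 1)
    return replace(versions, curation_version=bumped)

# Provider receives the creator's 4th processing for the first time.
received = DatasetVersions(dataid_version="4", curation_version="1")

# Curator later updates some curation metadata.
updated = update_curation_metadata(received)
print(received)
print(updated)
```

Under this reading, reprocessing by the creator would change
DataID.version, while edits made by the data provider would change only
Curation.version.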

> (3) In a similar vein, it would be useful to explain why Derived.varAmpl
> is there when AFAICS we don't actually talk about time series anywhere
> else (incidentally, the explanation in 2.11.6 detailing why there's
> Target.redshift and Derived.redshift is a good example for the sort of
> explanation that's really helpful for implementors).
>

There has been no previous request to remove it.  In the cube work, there
is/will be more discussion on the Derived object modeling to allow more
flexibility (these 3 items are not universally relevant).

>
> (4) What are the CoordSys.ID and the other STC IDs supposed to do?
> Where would I reference them from?  I also couldn't figure out what I
> might want to do with CoordFrame -- some indication would be great.
>

This is the tag used to reference the coordinate system.  In cube/vo-dml
work, this would not be a modeled element, but rather a property of the
object.  It would NOT have a UType.  The definition in this document is
unchanged from the current REC.

> (5) Why is it CalibStatus in Char.SpectralAxis.CalibStatus but
> Char.(TimeAxis|FluxAxis|SpatialAxis).CalibrationStatus otherwise?  (and
> is it actually worth the trouble to rename this from SSA's Calibration?)
>

ERGH! Can't believe I missed one.  All references to CalibStatus in UTypes
and text (including the SpectralAxis description in section 4.11) now use
CalibrationStatus, in order to be more consistent with Characterisation
(the owner of those components) and ObsCore (1.1, not the current REC).

> (6) This one is really important to me, and for this one I could as
> well put my Registry chair hat on -- SDM1 and SDM2 utypes are
> different, and they should be discernible without context (e.g., in the
> utype column in TAP_SCHEMA or VOResource).  Hence, the "prefix" (or
> whatever preferred terminology you have for the thing in front of a
> colon in legacy utypes) *must not* be spec.  I'm now using spec2 in my
> implementation, and something like this will need to go to 7.1.1.  I
> don't know about 8.1.1, but I have a bad feeling about the photometry DM
> using photdm and SDM2 using phot.  IMHO that's inviting confusion, and
> I'd much prefer if we had, say, photpoint here.
>

Hmm.  I hadn't expected the prefix to change for every version of any given
model.  The DataModel object would tell the user which model/version was
being used.
NOTE: This is another element which the vo-dml process improves.  The
DataModel element is absorbed as a property of the dataset.  At that point,
the user would need to interpret the 'Model' description in the VOTable.
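
To make Markus's "without context" concern concrete, here is a small
Python sketch of resolving a model version from a bare utype string, as a
client reading a utype column might have to.  The function names and the
prefix table are hypothetical; "spec2" is the prefix Markus says he uses in
his implementation, not one fixed by the document.

```python
def split_utype(utype):
    """Split a legacy utype into (prefix, path); prefix is None if absent."""
    prefix, sep, path = utype.partition(":")
    return (prefix, path) if sep else (None, utype)

# Assumed mapping: only possible if each model version has its own prefix.
PREFIX_TO_MODEL = {"spec": "SDM1", "spec2": "SDM2"}

def model_for(utype):
    """Guess the data model version from the utype prefix alone."""
    prefix, _ = split_utype(utype)
    return PREFIX_TO_MODEL.get(prefix, "unknown")

print(model_for("spec2:Curation.PublisherDID"))
print(model_for("spec:Curation.PublisherDID"))
print(model_for("Curation.PublisherDID"))
```

If both versions shared the "spec" prefix, a lookup like this could not
tell them apart, and the client would have to find the DataModel
declaration in the surrounding document instead.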

> (7) Char.SpatialAxis.SamplingPrecision.SamplingPrecisionRefval.fillFactor
> and its three friends -- this is mapped from SSA's
> Char.SpatialAxis.SamplingPrecision.fillFactor, and I give you both are
> not beauties.  But frankly, given that char2 isn't REC yet, maybe we
> don't have to blindly follow this.  Doesn't anyone else feel a utype
> with almost 80 characters and "SamplingPrecision" twice in it is an
> indication that we're doing it wrong?
>

The changes here bring Spectrum in line with Characterisation (which owns
those elements).  It was a major goal/effort to improve the consistency
between the models, so that a proper separation of concerns could happen
in the next pass (Cube).

> (8) I'm still entirely at a loss as to what to do with mandatory spatial
> axis coverage (both value and extent) and mandatory time axis start and
> stop.  I have lots of theoretical spectra, for which these make no sense
> at all.  Should I now write invalid SDM instances or give up on it for
> theoretical spectra?  And what's the rationale for disallowing them?
>
> (9) A minor thing, but for, e.g., comparing with other utype lists,
> having the list in Appendix A sorted strictly (and not only
> approximately) alphabetically would be nice.
>

I can see your point here, but a strictly alphabetical list would be
awkward in other cases (Char.*Axis.Name|ucd|unit would be distributed oddly
within the complex objects).

> (10) It seems a bit weird to me to annotate both RESOURCE and TABLE with
> utype="spec2:Spectrum" -- even if I'm not quite sure what legacy utypes
> really point to, it seems unwise to point to the same thing from
> two fundamentally different entities.
>
> (11) I'm still uncertain if it's a good idea to have both
> spec2:Curation.PublisherDID and spec2:DataID.DatasetID; I don't even want
> to imagine circumstances in which the two would be different (also keep
> in mind that there's creatorDID on top of these two).  If we keep them,
> I'd suggest to have the (at least so far) far more common
> ssa:Curation.PublisherDID in the example rather than, as now,
> DataID.DatasetID.
>
> Also (and you might again notice my Registry hat behind my back), I'd
> of course like to change the "unique within the namespace controlled
> by the publisher" in 2.4.3.  We're using IVORNs here specifically
> such that pubDIDs are unique within the entire VO.  If I understand
> the provenance of these items the difference between PublisherDID and
> DatasetID was intended to be in persistence, not uniqueness.
>
> (12) I'm unhappy that the way to serialise list-valued items --
> repeating PARAMS, as shown for spec:DataId.Collection in the example --
> isn't actually properly defined anywhere. This needs discussion (in the
> document) as I'd claim the instinct of most people would have been to
> put at least atomic values into an array or a single PARAM.  Also, I
> think it needs to be stressed (somewhere) that in consequence,
> multiple PARAMs with identical utypes may occur.
>
>
I think this serialization is in line with where the vo-dml serialization
convention is going... and consistent with the description you were
advocating for the attribute.
A (singular) collection is a string PARAM
The DataID.collection attribute can have >1 of these.

If collection were a complex object instead of a string, then
each instance of collection would be a GROUP containing the content
of that object.

The multiplicity issue is described in the "Serialization Issues" section
at the end.  The possibility of UTypes occurring multiple times has always
existed, and was one of the drivers for the UTypes work, wasn't it?
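
As a rough sketch of the repeated-PARAM serialization discussed above, the
following Python builds one string PARAM per collection, all carrying the
same utype.  The enclosing GROUP, its utype, the PARAM name, and the
collection values are assumptions made for the illustration; the PARAM
utype is spelled as in Markus's message.

```python
import xml.etree.ElementTree as ET

# utype as quoted in Markus's message; the example in the document may differ.
COLLECTION_UTYPE = "spec:DataId.Collection"

def collection_params(collections):
    """Serialize a list-valued attribute as repeated string PARAMs."""
    # Hypothetical wrapper GROUP, used here only to hold the PARAMs.
    group = ET.Element("GROUP", utype="spec:DataID")
    for value in collections:
        ET.SubElement(
            group, "PARAM",
            name="collection", datatype="char", arraysize="*",
            utype=COLLECTION_UTYPE, value=value,
        )
    return group

# Hypothetical collection names.
group = collection_params(["Survey A", "Survey B"])
print(ET.tostring(group, encoding="unicode"))
```

A client reading such a file has to collect every PARAM with that utype
rather than stop at the first match.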