Spectral 2.0 implementor feedback

Wed Feb 18 15:55:40 CET 2015

Dear DM,

I've finally added support for the spectral data model 2 to DaCHS, which
actually means DaCHS can now generate spectra in VOTable that I kinda
hope conform to the SDM2 from SSA metadata and matching data.
Note that I'm not using any of the new features, in particular not the
photometryPoint.

This is based on current support of SDM1 (in VOTable and FITS) -- what
few clients support SDM1 appear to have been ok with what it has been
producing.

The actual implementation was mainly replacing some utypes and removing
some of the more baroque aspects of SDM1 (like the XML namespace
declaration for the utype prefix).  And then a bit to let people ask
for SDM2.

The following is a report on my experiences during implementation.  I'm
sorry I've not found time to do this during public review -- I'm taking
my Registry chair hat off for this, and none of this is meant as part of
TCG review.  So, consider this an extremely late part of public review.

My first challenge was: How do I let people tell me they want SDM2
files?  I've been equating application/x-votable+xml with SDM1 files,
so I can't really use that.  Having a (more or less) proper media
type for SDM2 is IMHO important in many settings, however (think
ObsCore).  I'm now using

  application/x-votable+xml;content=spec2

-- I think the document should codify that (or something similar), or
clients can't really figure out what's in a VOTable spectrum (before
downloading it).  For me, this media type is an option in the AccessData
services' FORMAT parameter.
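
For illustration, here is a minimal sketch of how a client might request
this media type and recognise it afterwards.  It uses only the Python
standard library.  The helper names build_access_url and vot_content are
made up for this example, and the content=spec2 parameter is the
improvised convention described above, not something the document
defines.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Improvised media type from the text above; not (yet) codified anywhere.
SDM2_MEDIA_TYPE = "application/x-votable+xml;content=spec2"


def build_access_url(base_url, pub_did, media_type=SDM2_MEDIA_TYPE):
    """Return base_url with ID and FORMAT as percent-encoded parameters."""
    # urlencode escapes '+', ';' and '=' so they survive inside FORMAT.
    return base_url + "?" + urlencode({"ID": pub_did, "FORMAT": media_type})


def vot_content(media_type):
    """Return the value of the content parameter, or None if absent."""
    base, *params = [part.strip() for part in media_type.split(";")]
    if base.lower() != "application/x-votable+xml":
        return None
    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "content":
            return value.strip()
    return None


url = build_access_url(
    "http://dc.g-vo.org/flashheros/q/sdl/dlget",
    "ivo://org.gavo.dc/~?flashheros/data/ca92/f0006.mt")
# Decode the query again, as a server would, and inspect FORMAT.
requested = parse_qs(urlsplit(url).query)["FORMAT"][0]
print(url)
print(vot_content(requested))
```

The percent-encoding matters here: a bare "+" in a query string is
commonly decoded as a space, which would silently turn the media type
into something else.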

So, you can now pull such spectra from the Heidelberg spectral services
using datalink data access services.  As client support for that is
thin, here are two direct links to randomly selected spectra
generated in that way -- absent a validator, I'd appreciate any hints
as to what I'm doing wrong:

A plain, observational spectrum:

http://dc.g-vo.org/flashheros/q/sdl/dlget?ID=ivo%3A//org.gavo.dc/%7E%3Fflashheros/data/ca92/f0006.mt&FORMAT=application/x-votable%2Bxml%3Bcontent%3Dspec2

A plain, theoretical spectrum (more on this below; this is known to be
non-compliant since position in space and time are required in
SDM2):

http://dc.g-vo.org/theossa/q/sdl/dlget?ID=ivo%3A%2F%2Fwww.g-vo.org%2Ftheossa%2Fq%2Fdata%2F0038000_5.70_H_9.968E-01_HE_3.167E-03_02000-03000A_2008-08-02_07_20_01&FORMAT=application/x-votable%2Bxml%3Bcontent%3Dspec2

The "plain" above refers to my original hope that SDM2 would let me
properly write Echelle spectra.  Here's an example for my improvised
Echelle-as-sequence-of-almost-SDM1 hack -- I understand this is not in
scope for SDM2, but frankly I'd love to be corrected:
http://dc.g-vo.org/flashheros/q/echdl/dlget?ORDER_MAX=114&ORDER_MIN=112&ID=ivo%3A%2F%2Forg.gavo.dc%2F%7E%3Fflashheros%2Fdata_raw%2Fls95%2Fblue%2Fn0043.mt

Then, here's a list of issues I encountered during this work:

(1) 2.1.7 Dataset.dataID:DataID promises 2.6 would talk about "any
associations with various collections" -- does it?  I'm asking because I
was considering including SSA's association metadata (I'm not saying it
should be included, but I'm wondering).  If this is a remnant of earlier
modelling, I suggest removing it.  If "any associations" refers to
DataID.collection, shouldn't that be made explicit?

(2) Like SSA, SDM2 has both DataID.Version and Curation.Version.  Here's
what the explanation in SDM2 is:

  2.4.5 Curation.version:string
  Version is provided by the publisher or creator and may be any string.
  (RM:Curation.Version)

  2.6.7 DataID.version:string
  Version of the creator-produced dataset.

As an implementor, I'd certainly appreciate some guidance as to what the
difference between the two is and what scenarios for using one or
the other might be.

(3) In a similar vein, it would be useful to explain why Derived.varAmpl
is there when AFAICS we don't actually talk about time series anywhere
else (incidentally, the explanation in 2.11.6 detailing why there's
Target.redshift and Derived.redshift is a good example for the sort of
explanation that's really helpful for implementors).

(4) What are the CoordSys.ID and the other STC IDs supposed to do?
Where would I reference them from?  I also couldn't figure out what I
might want to do with CoordFrame -- some indication would be great.

(5) Why is it CalibStatus in Char.SpectralAxis.CalibStatus but
Char.(TimeAxis|FluxAxis|SpatialAxis).CalibrationStatus otherwise?  (and
is it actually worth the trouble to rename this from SSA's Calibration?)

(6) This one is really important to me, and for this one I could as
well put my Registry chair hat on -- SDM1 and SDM2 utypes are
different, and they should be discernible without context (e.g., in the
utype column in TAP_SCHEMA or VOResource).  Hence, the "prefix" (or
whatever preferred terminology you have for the thing in front of a
colon in legacy utypes) *must not* be spec.  I'm now using spec2 in my
implementation, and something like this will need to go to 7.1.1.  I
don't know about 8.1.1, but I have a bad feeling about the photometry DM
using photdm and SDM2 using phot.  IMHO that's inviting confusion, and
I'd much prefer if we had, say, photpoint here.

(7) Char.SpatialAxis.SamplingPrecision.SamplingPrecisionRefval.fillFactor
and its three friends -- this is mapped from SSA's
Char.SpatialAxis.SamplingPrecision.fillFactor, and I grant you neither
is a beauty.  But frankly, given that char2 isn't REC yet, maybe we
don't have to blindly follow this.  Doesn't anyone else feel a utype
with almost 80 characters and "SamplingPrecision" twice in it is an
indication that we're doing it wrong?

(8) I'm still entirely at a loss as to what to do with mandatory spatial
axis coverage (both value and extent) and mandatory time axis start and
stop.  I have lots of theoretical spectra, for which these make no sense
at all.  Should I now write invalid SDM instances or give up on it for
theoretical spectra?  And what's the rationale for disallowing them?

(9) A minor thing, but for, e.g., comparing with other utype lists,
having the list in Appendix A sorted strictly (and not only
approximately) alphabetically would be nice.

(10) It seems a bit weird to me to annotate both RESOURCE and TABLE with
utype="spec2:Spectrum" -- even if I'm not quite sure what legacy utypes
really point to, it seems unwise to point to the same thing from
two fundamentally different entities.

(11) I'm still uncertain if it's a good idea to have both
spec2:Curation.PublisherDID and spec2:DataID.DatasetID; I don't even want
to imagine circumstances in which the two would be different (also keep
in mind that there's creatorDID on top of these two).  If we keep them,
I'd suggest having the (at least so far) far more common
ssa:Curation.PublisherDID in the example rather than, as now,
DataID.DatasetID.

Also (and you might again notice my Registry hat behind my back), I'd
of course like to change the "unique within the namespace controlled
by the publisher" in 2.4.3.  We're using IVORNs here specifically
such that pubDIDs are unique within the entire VO.  If I understand
the provenance of these items correctly, the difference between
PublisherDID and DatasetID was intended to be in persistence, not
uniqueness.

(12) I'm unhappy that the way to serialise list-valued items --
repeating PARAMs, as shown for spec:DataId.Collection in the example --
isn't actually properly defined anywhere. This needs discussion (in the
document) as I'd claim the instinct of most people would have been to
put at least atomic values into an array or a single PARAM.  Also, I
think it needs to be stressed (somewhere) that in consequence,
multiple PARAMs with identical utypes may occur.
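
To make the consequence for clients concrete, here is a minimal sketch
of reading list-valued items serialised as repeated PARAMs.  The VOTable
fragment is hypothetical (no namespace, invented values, spec2 prefix as
in my implementation), and params_by_utype is a made-up helper name.
The point is only that a reader has to collect values per utype rather
than assume each utype occurs once.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: two PARAMs share one utype to express a list.
SAMPLE = """
<TABLE utype="spec2:Spectrum">
  <PARAM name="collection" utype="spec2:DataID.Collection"
         datatype="char" arraysize="*" value="Survey A"/>
  <PARAM name="collection" utype="spec2:DataID.Collection"
         datatype="char" arraysize="*" value="Survey B"/>
  <PARAM name="title" utype="spec2:DataID.Title"
         datatype="char" arraysize="*" value="Example spectrum"/>
</TABLE>
"""


def params_by_utype(table_xml):
    """Map each utype to the list of PARAM values carrying it."""
    grouped = defaultdict(list)
    for param in ET.fromstring(table_xml.strip()).iter("PARAM"):
        utype = param.get("utype")
        if utype is not None:
            # Document order is kept, so list order follows the file.
            grouped[utype].append(param.get("value"))
    return dict(grouped)


values = params_by_utype(SAMPLE)
for utype, items in values.items():
    print(utype, items)
```

A client that stores PARAMs in a plain utype-to-value dictionary would
silently keep only one of the two collections here.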

Still with my humble implementor hat firmly on: Given that the low
implementation activity suggests nobody is urgently waiting for the
PhotometryPoint feature to become available -- is there any
implementation at all? -- and "normal" use cases apparently are covered
by SDM1 quite as well as by SDM2:

(1) Can't we wait until the VO-DML serialisation spec is out and then
    use a principled way of representing the data model instances
    described here?  The document admits weaknesses in the
    serialisation part itself (which, in turn, is what I as
    implementor care about most). I'd volunteer as guinea pig, and
    I'd start working on this tomorrow.  This would provide a good
    justification for the spec change, and even better: The new
    annotation could sit next to the SDM1 one in one single
    spectrum, so you'd have perfect forward and backward
    compatibility.

(2) What I'd really like to see (and we still have a CSP priority on it)
    is some standard format to represent time series in.  It seems to
    me SimpleTimeSeries hasn't gotten too much adoption, so I guess
    we have a second chance for doing it right from the start.  With
    (1) in place, I expect two additional pages in this document
    (so, about an additional 2% in material) would be enough to
    cover these, too.  So, while (1) is going on, we could add that
    material and have that solved, too.

(3) I've had a fairly scratchy itch in representing Echelle spectra for
    quite a while now -- existing serialisations are under scrutiny
    by developers' rights advocacy groups.  Again, I think time
    invested there would be well invested, and again I promise I'd
    accompany any activity with avid implementation.  I don't believe
    it'd be much more than another two pages or so.

Doing this would have the additional advantage that we avoid conflicts
between the cube/image and spectrum DMs, as changes in the former can be
fed back into the latter.

Cheers (and sorry again for having procrastinated this for so long),

          Markus