[VEP-0001] DataLink semantics vocabulary enhancement proposal

Tue Oct 22 12:21:50 CEST 2019

Hi,

I agree that we must be careful not to put just anything into the VO 
vocabulary, and that the endorsed vocabulary must correspond to concrete 
use cases.

In fact, the VEP-0001 list complies with this policy, since it just 
sets out a list of science products currently used in astronomy, which 
could then be tagged with VO vocabulary.

If too fine a vocabulary can lose clients, the same is true of one that 
is too coarse. It is a question of trade-off.
Typically, "associated-timeseries" is too coarse, since it just tells 
us that the associated dataset is a set of (timestamp + anything) points. 
Telling more about what "anything" is will obviously help the clients.
Let's take my use case. The GRB monitor (SVOM) I'm working on will 
deliver time series of spectra. I suppose that generalist clients such as 
SPLAT would be happy to be notified that this dataset must not be plotted 
as a simple light curve.

Cheers
LM

On 22/10/2019 at 10:53, Markus Demleitner wrote:
> Hi DAL,
> 
> On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
>>> So, a question to all (including Carlos, who's posted on voevent@):
>>> Which of these terms do you actually need *now* (or at least for data
>>> that you will want to publish in the safely foreseeable future)?  And
>>> can you see a clear scenario for a *machine* to have to understand
>>> matters at that level of detail (for humans, there's always the
>>> description in datalink)?
>>
>> I would like to point out that the list suggested by Francois is still not
>> sufficient for many archival (Vizier e.g.) and future surveys data.
> 
> Well, you see, that is the question: sufficient for exactly *what*?
> These terms are directed at machines, and for humans there's still
> description (and a lot more channels).  So, the question is: how much
> distinction do you have to convey to a machine client?
> 
> As far as I can see, there are two use cases in general for datalink
> semantics:
> 
> (a) link filtering: The client, based on the semantics, selects a
> subset of the links provided to present to its users -- for instance,
> calibration data will not be shown outside of a debugging session.
> Or they're just used for grouping.  This was, I think, the original
> use case that triggered the introduction of the semantics column.
> 
> (b) figure out what to do with a link: When Aladin implemented
> datalink, they found that based on what's in a datalink row, they
> didn't know how to deal with a link: they'd like to send spectra to
> clients listening to spectrum.load.ssa-generic, images to those
> listening to image.load.fits and so forth.  The datalink content_type
> column isn't quite sufficient for this, because
> application/x-votable+xml can be a spectrum or an object catalog,
> whereas image/fits might be some kind of cube or a plain image (or an
> IRAF spectrum, or still something else).  That's the "SAMP sending
> use case" that, I think, was largely missed when we wrote datalink.
> 
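As an illustration of the two use cases above, here is a minimal Python sketch, not taken from any existing client. The row values, the helper names and the HIDDEN_OUTSIDE_DEBUG set are assumptions made up for the example; only the semantics and content_type column names come from datalink, and table.load.votable is a SAMP mtype added here to stand for the "object catalog" case.

```python
# Terms a client might hide outside a debugging session (assumed policy).
HIDDEN_OUTSIDE_DEBUG = {"#calibration"}

# Media type -> SAMP mtypes that could plausibly receive it (illustrative,
# not exhaustive: image/fits could also be a cube, for instance).
CANDIDATE_MTYPES = {
    "application/x-votable+xml": ["spectrum.load.ssa-generic", "table.load.votable"],
    "image/fits": ["image.load.fits"],
}


def visible_links(rows, debug=False):
    """Use case (a): select the subset of links to present to the user."""
    if debug:
        return list(rows)
    return [row for row in rows if row["semantics"] not in HIDDEN_OUTSIDE_DEBUG]


def candidate_mtypes(row):
    """Use case (b): list the mtypes the bare media type is compatible with."""
    base_type = row["content_type"].split(";")[0].strip().lower()
    return CANDIDATE_MTYPES.get(base_type, [])


# Hypothetical datalink rows, reduced to the two columns used here.
links = [
    {"semantics": "#this", "content_type": "application/x-votable+xml"},
    {"semantics": "#calibration", "content_type": "image/fits"},
    {"semantics": "#preview", "content_type": "image/fits"},
]

shown = visible_links(links)
# Rows for which content_type alone does not tell the client where to send the data.
ambiguous = [row for row in shown if len(candidate_mtypes(row)) > 1]
```

With these rows the calibration link is filtered out, and the VOTable link stays ambiguous between a spectrum and a catalog, which is the gap described in (b).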
> Does anyone have more use cases for Datalink semantics?  If so, this
> would be the perfect moment to bring them forward, in particular so
> we can put them into Datalink 1.1.
> 
> 
> Having established this much, after a mail from Ada I had another of
> my dangerous epiphanies.  That is, if we really want to deal with use
> case (b) in semantics, we'll end up reproducing the distinction that
> VEP-0001 proposes in every branch: not only will we have
> 
> #associated-cube #associated-image #associated-radialvelocitycurve ...
> 
> but also
> 
> #derivation-cube #derivation-image #derivation-radialvelocitycurve ...
> 
> and (we've already seen use cases for that)
> 
> #progenitor-cube #progenitor-image #progenitor-radialvelocitycurve ...
> 
> We *could* do this.  But if we go there, we should be aware of what
> ugly thing we're doing.  And I'd suggest we think about alternatives
> first.
> 
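To make the growth concrete, a small Python sketch, assuming terms are simply built as "#branch-producttype". It uses only the three branches and the three product types quoted above; VEP-0001 lists more product types, so the real count would be larger.

```python
from itertools import product

# The three branches and three product types named in the lists above.
branches = ["associated", "derivation", "progenitor"]
product_types = ["cube", "image", "radialvelocitycurve"]

# One term per (branch, product type) pair.
terms = [f"#{branch}-{ptype}" for branch, ptype in product(branches, product_types)]
```

The number of terms is len(branches) * len(product_types), so each new product type costs one new term per branch.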
> First off: I think #associated-data as such is a good term, although
> we may want to try to get the distinction to the existing #auxiliary a
> bit clearer.  Essentially, if we model provenance as a tree, then
> #progenitor is an ancestor of the current item, #derivation a
> descendant, and #associated-data a sibling.  I like it, and I can see
> why this fits into use case (a).  Also, we have Gaia DR2, where this
> can be immediately applied.
> 
> I'm still unhappy about putting #auxiliary against #associated-data;
> the fact that the description of the former is just "auxiliary
> resources" may underline the importance of trying hard to come up
> with helpful descriptions.  But that's for another day.
> 
> Let's look at use case (b).  Really, what we'd like to have is a
> mapping of "something" to the SAMP mtypes
> (https://wiki.ivoa.net/twiki/bin/view/IVOA/SampMTypes).  I suppose
> we're doing our adopters a favour if we start from obscore
> dataproduct_types, because they'll have to deal with them anyway.
> I think François' intent has been pretty much that in the proposed
> vocabulary, which largely takes up 3.3.1 of obscore, except for
> the attempt to additionally describe the nature of cube axes in that
> scheme (which we could discuss separately).
> 
> If we accept this, the question transforms into: "Where can we
> communicate an obscore dataproduct_type in datalink?".
> 
> I can see three options:
> 
> (1) The semantics column -- the consequences I've described above.
> No disaster, but certainly ugly.
> 
> (2) The datalink content_type column.  As said above, media types
> don't quite work out of the box, because dataproduct types and media
> types don't really map onto each other.  However, RFC 6838 media
> types have structure: You can add parameters.  We already exploit
> this in datalink to say that datalink documents should come with a
> media type of application/x-votable+xml;content=datalink.
> 
> What if we just said, in datalink: "Wherever possible, the
> content_type should indicate the dataproduct type communicated, using
> a content parameter taken from the vocabulary associated with obscore
> dataproduct_type.  For instance, a spectrum in a VOTable would have
> application/x-votable+xml;content=spectrum, whereas some kind of cube
> in a FITS serialisation would be application/fits;content=cube."
> 
> We can immediately start doing this; there are strings attached,
> though, in that I doubt too many clients parse media types at this
> point, and these might become confused if we did this.
> 
> (3) Adding a dataproduct_type column in datalink.  If we started from
> scratch, this is probably what I'd do.  As things are now... don't
> know.  As for (2), this can start immediately (because datalink lets
> you add extra columns), and it would even have the advantage that
> clients that don't parse media types would still understand
> content_type.
> 
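A minimal Python sketch of how a client might read options (2) and (3), under stated assumptions: the content parameter and the extra dataproduct_type column are the proposals discussed above, not current datalink practice, and the function names are invented for the example. The parser is deliberately naive and does not handle quoted parameter values containing ";".

```python
def parse_media_type(value):
    """Split a media type string into (base type, {parameter: value})."""
    base, *params = [part.strip() for part in value.split(";")]
    parsed = {}
    for param in params:
        name, sep, val = param.partition("=")
        if sep:  # ignore fragments that are not name=value pairs
            parsed[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), parsed


def product_type(row):
    """Return the dataproduct type advertised by a datalink row, or None."""
    # Option (3): a dedicated column (hypothetical, added as an extra column).
    explicit = row.get("dataproduct_type")
    if explicit:
        return explicit
    # Option (2): a content parameter on the media type.
    _, params = parse_media_type(row.get("content_type") or "")
    return params.get("content")
```

For example, product_type({"content_type": "application/fits;content=cube"}) yields "cube", while a row carrying only "image/fits" yields None, leaving the client where it is today.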
> Any opinions or preferences from datalink adopters or authors?
> 
> 
> Coming back to the vocabulary as such -- Petr's mail IMHO admirably
> makes clear that the full problem is probably beyond the means of a
> single term from a vocabulary and thus underlines my appeal to try
> and solve problems we have right now and know can be solved with
> simple terms.  See:
> 
>> E.g. what is missing is the associated link to timeseries where the
>> horizontal axis is not time but circular phase associated with given
>> frequency in a periodogram and the associated periodogram itself.
>>
> [...]
>> If you want the example of timeseries of spectra
>> there is so called dynamical spectrum (e.g. in my old pictures
> [...]
>> There are of course better examples of quick time resolved spectroscopy etc
> [...]
>> Also I can imagine the time series of datacubes (in ALMA, radio) ...
>>
>> And lastly, what about the gravitational wave associated information
>> (strain/frequency - I have asked people from GW community for detailed
>> examples ...
>> and it seems that the common "timeseries" they use is
>> either strain/time   or power density of strain/frequency
>> (strain is relative displacement/baseline of mirrors)
> [...]
> 
>> As something more understandable for optical astronomers we should think
>> about folded curves as well as so called phase portraits of those curves
>> (important for analysis of deterministic chaos - which some sources may be
>> driven by)
> [...]
> 
>> If I go to details - even the single order spectrum has associated the 2D
>> image of spectrum (e.g. the rainbow) on a CCD chip as a strip of light and
>> in echelle - still not properly handled even by SSAP it is even complicated
>> ... perhaps the cutout of whole echellogram of a given spectral order is a
>> good approximation for proposed "associated image"
> 
> (I've elided a few more cases of stuff we would have to annotate if
> we wanted to machine-readably label all possible kinds of data
> products). Which is why I like Petr's conclusions:
> 
>> IMHO we should have easily extensible vocabulary and let the client
>> developers to decide how they will use the information
>> The people publishing certain product at datalink end will have clear vision
>> what they want to show and the new clients will be able to display this ....
>>
>>
>> But in practice I think that the most different part of clients is the
>> dimension - e.g. timeseries as light curves, folded light curves (in phases)
>> , spectra, power spectra, gravitational waves etc ... are just the same task
>> to display as 1D vector - and all "semantics": is given by description of
>> axes - units, variables...
>>
>> This is what we wanted to express in our IVOA note - SPLAT is tool for
>> displaying 1D vectors. No semantics needed. That's why we could use it to
>> time series immediately with changing a few lines of code ;-)
>>
>> The image is domain of Aladin and we need a 3D viewers for data cubes ...
>> That's all - number of axes determines the product and client to use.
> 
> So -- I'd now say #associated-data is enough to satisfy the filtering
> use case (a).  Whereas the SAMP sending use case (b) is probably
> better solved by something else.
> 
>               -- Markus
> 

-- 
---- Laurent MICHEL              Tel  (33 0) 3 68 85 24 37
      Observatoire de Strasbourg  Fax  (33 0) 3 68 85 24 32
      11 Rue de l'Universite      Mail laurent.michel at astro.unistra.fr
      67000 Strasbourg (France)   Web  http://astro.u-strasbg.fr/~michel
---