[VEP-0001] DataLink semantics vocabulary enhancement proposal

Mon Oct 21 17:38:32 CEST 2019

> So, a question to all (including Carlos, who's posted on voevent@):
> Which of these terms do you actually need *now* (or at least for data
> that you will want to publish in the safely forseeable future)?  And
> can you see a clear scenario for a *machine* to have to understand
> matters at that level of detail (for human, there's always the
> description in datalink)?

I would like to point out that the list suggested by Francois is still not 
sufficient for much archival data (e.g. VizieR) or for data from future surveys.

What is missing, for example, is a link to an associated time series whose 
horizontal axis is not time but the cyclic phase corresponding to a given 
frequency in a periodogram, together with a link to the periodogram itself.

This is used massively in asteroseismology and exoplanet hunting,
e.g. all the Kepler exoplanet publications show images from this tool 
somewhere: see namely Fig. 4 here

https://www.researchgate.net/publication/248394956_The_NASA_Exoplanet_Archive_Data_and_Tools_for_Exoplanet_Research

Atlases of variable stars show the period-folded curves, as here

http://ogle.astrouw.edu.pl/atlas/anomalous_Cepheids.html
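
For readers less familiar with folding, here is a minimal sketch of how a 
period-folded curve is obtained from an ordinary time series. The function 
names, the epoch argument and the choice of plain Python lists are my 
assumptions for illustration only, not part of any proposal or existing tool.

```python
def fold_phase(times, period, epoch=0.0):
    """Return the cyclic phase (fractional number of cycles since epoch)
    for each time stamp, given a trial period in the same time unit."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [((t - epoch) / period) % 1.0 for t in times]


def fold_light_curve(times, values, period, epoch=0.0):
    """Pair every measurement with its phase and sort by phase.

    The result is the 'folded curve': the same data points, but with
    phase instead of time on the horizontal axis.
    """
    phases = fold_phase(times, period, epoch)
    return sorted(zip(phases, values))
```

The point for the vocabulary discussion is that the folded curve has the 
same shape as a time series (one independent axis, one or more observables), 
only the independent axis is phase rather than time.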

If you want an example of a time series of spectra,
there is the so-called dynamical spectrum, e.g. in my old pictures

https://arxiv.org/pdf/1112.2787.pdf

and, for a small number of spectra,

https://arxiv.org/pdf/1407.1765.pdf

(Fig. 1)

There are of course better examples of fast time-resolved spectroscopy 
etc. I would need to find them (if needed).

I can also imagine time series of data cubes (in ALMA, radio) ...

And lastly, what about the information associated with gravitational waves 
(strain/frequency)? I have asked people from the GW community for detailed 
examples, and it seems that the common "timeseries" they use is
either strain versus time or the power density of strain versus frequency
(strain being the relative displacement of the mirrors divided by the 
baseline).
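
To make the two GW products concrete, here is a rough sketch of how they 
relate. Everything in it is an assumption of mine for illustration: the 
function names, the direct (slow) DFT, and the simple (dt / N) * |X_k|^2 
normalisation, which is only one of several conventions and surely not what 
the GW pipelines really use.

```python
import cmath
import math


def strain(displacements, baseline):
    """Dimensionless strain: mirror displacement divided by the baseline."""
    return [d / baseline for d in displacements]


def periodogram(samples, dt):
    """Return (frequencies, power density) for the non-negative frequencies.

    samples: evenly sampled values (e.g. strain), dt: sampling step.
    Uses a direct O(N^2) discrete Fourier transform, so it is only
    suitable for short illustrative series.
    """
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        freqs.append(k / (n * dt))
        power.append(dt / n * abs(x_k) ** 2)
    return freqs, power
```

Both outputs are again 1D vectors: the first over time, the second over 
frequency.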

IMHO if we are going to address important "astroparticle" use cases 
(not only GW but also particle bursts), we should fulfil the requirements of 
projects which will supply PBs of data very soon, even if they look exotic 
to us.

As something more understandable for optical astronomers, we should think 
about folded curves as well as the so-called phase portraits of those curves 
(important for the analysis of deterministic chaos, which may drive some 
sources):
https://arxiv.org/pdf/1711.09029.pdf

It will also be important for LSST to have the multiband periodogram 
associated with targets ...

e.g.

http://jakevdp.github.io/multiband_LS/paper.pdf

> SPLAT, in turn, for each photometry point in a time series, could see
> that there is an associated image (not yet for Gaia DR2, but it's
> there for BGDS, ivo://org.gavo.dc/bgds/l/tsform, for instance) and
> then have that displayed as users select individual points on the
> time series so people can immediately check if something is odd with
> the source of an outlier (say).

This was the original motivation for using datalink together with my and 
Jiri's sparse cube TS model, as I showed in Sydney in 2015:
https://wiki.ivoa.net/internal/IVOA/InteropOct2015Applications/lightcurves_skoda.pdf

The access_url attached to every data point in our model was interpreted by 
Aladin to download the associated image. In the IVOA note we have also shown 
the possible cutout behind this, so everyone can see that a particular 
point on the light curve is strange because there was a cosmic ray hit on 
the star image ...

>
> This would lead me to the following terms:
>
> associated-data: The data at access_url contains information on the
>  discovered item not directly related to the original observation
>  [which would probably be a progenitor or derivation] but giving
>  additional insights in the source or phenomenon observed.

In fact, with both Carlos' proposal and this one we are trying to substitute 
for PROVENANCE ..... every product is derived from raw data by certain steps 
...

If I go into details: even a single-order spectrum has an associated 2D 
image of the spectrum (i.e. the rainbow) on the CCD chip as a strip of light, 
and in echelle (still not properly handled even by SSAP) it is even more 
complicated ... perhaps a cutout of a given spectral order from the whole 
echellogram is a good approximation to the proposed "associated image".

>
> That's the root term; I'd rather not talk about "dataset" here, as
> that, at least in some parts of the VO, is a strong term.  For
> instance, associated images will almost always be cutouts of
> datasets, and we may not carelessly want to promote these to
> datasets themselves.
>
> associated-timeseries: The data at access_url is a mapping from
>  time to one or more scalar observables (examples: a lightcurve or a
>  time series of radial velocities) giving additional insight into the
>  discovered item.

I do not like "a mapping from time to scalar observables".
As written earlier, a mapping from some variable which is a function of 
time (e.g. frequency) to some derived or observed variable (e.g. power 
spectrum) is more general.

Or what about visibility curves? They are functions of time as well, just 
mapped to the virtual u,v plane (time trajectories, in fact).

>
> associated-image: The data at access_url is an image giving
>  additional insight into the discovered item.
>
> Now that I write this, I frankly think that associated-image as given
> in the use case above should actually be progenitor: that's clearly
> what a postage stamp that shows the image that the photometry point
> was derived from.

see above

>
> So -- does anyone have another use case (as I said: try describing
> what a client would do with the term) for associated-image?  Or for
> any other term other than associated-data and associated-timeseries?
> And let me stress again because I think that point is often neglected
> in the VO: "use case" should really include proposed client
> behaviour.

Yes, it would be nice to see the original 2D spectrum on the chip linked 
to the extracted 1D one ... A lot of common mistakes in data reduction would 
be seen immediately, and the whole "provenance" of the final result would be 
visible ...

> I'd be great if we didn't introduce terms nobody may be using for
> many years.

I am afraid that what I have seen at ADASS forces us to start addressing 
immediately just the current requirements of big projects, which are in 
fact restricted in using the VO (even if they would like to) as we do not 
provide them with an end-to-end solution for their archives ... So they are 
mostly limited to using TAP to query catalogues or observing logs .... the 
lucky ones can even use tables of some measurements ...

> Bonus: We don't need to quarrel about what's the
> difference between associated-cube and associated-timeseries-image
> when it's not really clear what any client might be doing with
> either.

The client should rather understand what to do with the end of the datalink 
from the description in the VOTable itself (it should be encoded there that 
this is a spectrum with such axes and such a model behind it).

But I can see Mark Taylor's SAMP point of view as well:
don't send the datalink end to this application, as it does not understand 
how to display a spectrum ....

IMHO we should have an easily extensible vocabulary and let the client 
developers decide how they will use the information.
The people publishing a certain product at the datalink end will have a 
clear vision of what they want to show, and the new clients will be able to 
display this ....

But in practice I think that what differs most between clients is the 
dimension: e.g. time series as light curves, folded light curves (in 
phase), spectra, power spectra, gravitational waves etc. are all just the 
same task of displaying a 1D vector, and all the "semantics" is given by 
the description of the axes: units, variables ...

This is what we wanted to express in our IVOA note: SPLAT is a tool for 
displaying 1D vectors. No semantics needed. That's why we could use it for 
time series immediately by changing a few lines of code ;-)

Images are the domain of Aladin, and we need 3D viewers for data cubes ...
That's all: the number of axes determines the product and the client to use.
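
As a sketch of the client behaviour I have in mind, the choice of viewer 
could depend only on the number of axes, with the axis descriptions used 
just for labelling. The dictionary, the function names and the axis 
description format below are hypothetical and purely illustrative; only 
the client names come from the text above.

```python
# Hypothetical mapping from number of axes to a suitable kind of client.
VIEWER_BY_NAXES = {
    1: "SPLAT",           # 1D vectors: light curves, folded curves, spectra
    2: "Aladin",          # images
    3: "3D cube viewer",  # data cubes
}


def choose_viewer(axes):
    """Pick a viewer from the number of independent axes alone.

    axes: list of dicts such as {"name": "phase", "unit": ""}.
    Returns None when no viewer is known for that dimensionality.
    """
    return VIEWER_BY_NAXES.get(len(axes))


def axis_labels(axes):
    """Build plot labels from the axis descriptions (name plus unit)."""
    return [f'{a["name"]} [{a["unit"]}]' if a["unit"] else a["name"]
            for a in axes]
```

For instance, a folded light curve described by a single "phase" axis and a 
power spectrum described by a single "frequency" axis would both go to the 
same 1D viewer, differing only in their axis labels.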

This reminds me of another image associated with a spectrum: in echelle it 
is the format of pixels (or rebinned wavelengths) over order number ...

In case of interest I can give more details ....

Best regards,

Petr

*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute CAS         Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                             skoda at asu.cas.cz          *
*************************************************************************