[VEP-0001] DataLink semantics vocabulary enhancement proposal
Petr Skoda
skoda at sunstel.asu.cas.cz
Mon Oct 21 17:38:32 CEST 2019
> So, a question to all (including Carlos, who's posted on voevent@):
> Which of these terms do you actually need *now* (or at least for data
> that you will want to publish in the safely foreseeable future)? And
> can you see a clear scenario for a *machine* to have to understand
> matters at that level of detail (for humans, there's always the
> description in datalink)?
I would like to point out that the list suggested by Francois is still not
sufficient for many archival data sets (e.g. VizieR) and for future survey data.
For example, what is missing is an associated link to a time series whose
horizontal axis is not time but the circular phase associated with a given
frequency in a periodogram, and a link to the associated periodogram itself.
This is massively used in asteroseismology and exoplanet hunting;
e.g. all the Kepler exoplanet publications show somewhere images from
this tool: see namely Fig. 4 here
https://www.researchgate.net/publication/248394956_The_NASA_Exoplanet_Archive_Data_and_Tools_for_Exoplanet_Research
Atlases of variable stars show period-folded curves, as here:
http://ogle.astrouw.edu.pl/atlas/anomalous_Cepheids.html
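
For readers less familiar with folding, here is a minimal sketch of how a
period-folded curve is obtained from a time series. The function names, the
epoch parameter and the choice of phase range [0, 1) are assumptions made for
illustration only; they are not taken from any of the tools cited above.

```python
def fold_phases(times, period, epoch=0.0):
    """Return the phase in [0, 1) of each time stamp for a trial period.

    `epoch` is a hypothetical reference time that defines phase zero.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # Python's % returns a non-negative result for a positive divisor,
    # so times before the epoch also land in [0, 1).
    return [((t - epoch) / period) % 1.0 for t in times]


def folded_curve(times, values, period, epoch=0.0):
    """Return (phase, value) pairs sorted by phase, i.e. the folded curve."""
    phases = fold_phases(times, period, epoch)
    return sorted(zip(phases, values))
```

The horizontal axis of the result is phase, not time, which is exactly why
"a mapping from time" does not describe such a product.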
If you want an example of a time series of spectra,
there is the so-called dynamical spectrum, e.g. in my old pictures
https://arxiv.org/pdf/1112.2787.pdf
and,
for a small number of spectra,
https://arxiv.org/pdf/1407.1765.pdf
Fig. 1
There are of course better examples of fast time-resolved spectroscopy
etc ....
I would need to find them (if needed).
I can also imagine time series of datacubes (in ALMA, radio) ...
And lastly, what about the information associated with gravitational waves
(strain/frequency)? I have asked people from the GW community for detailed
examples ...
and it seems that the common "timeseries" they use is
either strain/time or power density of strain/frequency
(strain is the displacement of the mirrors relative to the baseline).
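
To make the second kind of product concrete, here is a minimal sketch of
turning a sampled series (strain, flux, anything) into power as a function of
frequency. It is a plain, unnormalised discrete Fourier periodogram written
for illustration under my own assumptions; it is not the calibrated power
spectral density estimate that GW pipelines actually use, and the function
name is hypothetical.

```python
import cmath
import math


def periodogram(times, values, frequencies):
    """Return the power |sum x_k exp(-2 pi i f t_k)|^2 / N per trial frequency.

    The mean is subtracted first so that a constant offset does not
    dominate the lowest frequencies.
    """
    n = len(values)
    if n == 0 or n != len(times):
        raise ValueError("times and values must be non-empty and of equal length")
    mean = sum(values) / n
    powers = []
    for f in frequencies:
        s = sum((v - mean) * cmath.exp(-2j * math.pi * f * t)
                for t, v in zip(times, values))
        powers.append(abs(s) ** 2 / n)
    return powers
```

The input is observable versus time, the output is power versus frequency:
two different horizontal axes for what people casually call one "timeseries".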
IMHO if we are going to address important "astroparticle" use cases
(not only GW but particle bursts as well) we should fulfil the requirements of
projects which will supply PBs of data very soon, even if they look exotic
to us.
As something more understandable for optical astronomers, we should think
about folded curves as well as the so-called phase portraits of those curves
(important for the analysis of deterministic chaos, by which some sources may
be driven)
https://arxiv.org/pdf/1711.09029.pdf
It will also be important for LSST to have the multiband periodogram
associated with targets ...
e.g.
http://jakevdp.github.io/multiband_LS/paper.pdf
> SPLAT, in turn, for each photometry point in a time series, could see
> that there is an associated image (not yet for Gaia DR2, but it's
> there for BGDS, ivo://org.gavo.dc/bgds/l/tsform, for instance) and
> then have that displayed as users select individual points on the
> time series so people can immediately check if something is odd with
> the source of an outlier (say).
This was the original motivation for using datalink together with my and
Jiri's sparse cube TS model, as I showed in Sydney in 2015
https://wiki.ivoa.net/internal/IVOA/InteropOct2015Applications/lightcurves_skoda.pdf
The access_url attached to every data point in our model was interpreted by
Aladin to download the associated image. We have also shown in the IVOA
note a possible cutout behind this, so everyone can see that a particular
point on the light curve is strange because there was a cosmic ray hit on
the star image ...
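
As a rough sketch of the client behaviour described here (not the actual
sparse cube model, and not Aladin's code), a client could flag outlying
points and hand their access_url on to an image viewer. The class, the
function name and the median-absolute-deviation threshold are all assumptions
chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class LightCurvePoint:
    time: float
    flux: float
    access_url: str  # per-point link to the associated image or cutout


def suspicious_points(points, n_mad=5.0):
    """Return points whose flux deviates from the median by more than
    n_mad times the median absolute deviation (a hypothetical criterion)."""
    fluxes = [p.flux for p in points]
    med = median(fluxes)
    mad = median([abs(f - med) for f in fluxes])
    if mad == 0:
        return []  # no scatter to compare against
    return [p for p in points if abs(p.flux - med) > n_mad * mad]
```

The URLs of the returned points are what the client would fetch so that the
user can inspect the image behind each odd measurement.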
>
> This would lead me to the following terms:
>
> associated-data: The data at access_url contains information on the
> discovered item not directly related to the original observation
> [which would probably be a progenitor or derivation] but giving
> additional insights in the source or phenomenon observed.
In fact, with both Carlos' proposal and this one we want to substitute for
PROVENANCE ..... every product is derived from raw data by certain steps
...
If I go into details: even a single-order spectrum has an associated 2D
image of the spectrum (i.e. the rainbow) on the CCD chip as a strip of light,
and in echelle (still not properly handled even by SSAP) it is even more
complicated ... Perhaps a cutout of the whole echellogram for a given
spectral order is a good approximation of the proposed "associated image".
>
> That's the root term; I'd rather not talk about "dataset" here, as
> that, at least in some parts of the VO, is a strong term. For
> instance, associated images will almost always be cutouts of
> datasets, and we may not carelessly want to promote these to
> datasets themselves.
>
> associated-timeseries: The data at access_url is a mapping from
> time to one or more scalar observables (examples: a lightcurve or a
> time series of radial velocities) giving additional insight into the
> discovered item.
I do not like "a mapping from time to scalar observables".
As written earlier, a mapping from some variable which is a function of
time (e.g. frequency) to some derived or observed variable (e.g. power
spectrum) is more general.
Or what about visibility curves? They are a function of time as well, just
mapped to the virtual u,v plane (time trajectories, in fact).
>
> associated-image: The data at access_url is an image giving
> additional insight into the discovered item.
>
> Now that I write this, I frankly think that associated-image as given
> in the use case above should actually be progenitor: that's clearly
> what a postage stamp that shows the image that the photometry point
> was derived from.
see above
>
> So -- does anyone have another use case (as I said: try describing
> what a client would do with the term) for associated-image? Or for
> any other term other than associated-data and associated-timeseries?
> And let me stress again because I think that point is often neglected
> in the VO: "use case" should really include proposed client
> behaviour.
Yes - it would be nice to see the original 2D spectrum on the chip linked
to the extracted 1D one ... A lot of common mistakes in data reduction would
be seen immediately, and the whole "provenance" of the final result would be
visible ...
> It'd be great if we didn't introduce terms nobody may be using for
> many years.
I am afraid that what I have seen at ADASS is forcing us to start
immediately addressing the current requirements of the big projects, which
are in fact restricted in using the VO (even if they would like to) as we do
not provide them with an end-to-end solution for their archives ... So they
are mostly limited to using TAP to query catalogues or observing logs ....
the lucky ones can even use tables of some measurements ...
> Bonus: We don't need to quarrel about what's the
> difference between associated-cube and associated-timeseries-image
> when it's not really clear what any client might be doing with
> either.
The client should rather understand what to do with the end of a datalink
from the description in the VOTable itself (it should be encoded there that
this is a spectrum with such axes and such a model behind it).
But I can see Mark Taylor's SAMP point of view as well:
don't send the datalink end to this application, as it does not understand
how to display a spectrum ....
IMHO we should have an easily extensible vocabulary and let the client
developers decide how they will use the information.
The people publishing a certain product at the datalink end will have a clear
vision of what they want to show, and new clients will be able to display
this ....
But in practice I think that what differs most between clients is the
dimension: e.g. time series as light curves, folded light curves (in
phases), spectra, power spectra, gravitational waves etc. are all the
same task, displaying a 1D vector, and all the "semantics" is given by the
description of the axes: units, variables ...
This is what we wanted to express in our IVOA note: SPLAT is a tool for
displaying 1D vectors. No semantics needed. That's why we could use it for
time series immediately by changing a few lines of code ;-)
Images are the domain of Aladin, and we need 3D viewers for data cubes ...
That's all - the number of axes determines the product and the client to use.
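
A minimal sketch of this idea, under my own assumptions: the product is
described only by its axes (name and unit), and the client is picked from the
number of independent axes. The class names, the lookup table and the
"3D cube viewer" label are hypothetical; only SPLAT for 1D vectors and Aladin
for images come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Axis:
    name: str  # e.g. "time", "phase", "frequency", "wavelength"
    unit: str  # e.g. "d", "", "Hz", "nm"


@dataclass
class Product:
    independent_axes: List[Axis]
    observable: Axis


# Hypothetical mapping from the number of independent axes to a client.
VIEWER_BY_DIMENSION = {1: "SPLAT", 2: "Aladin", 3: "3D cube viewer"}


def choose_viewer(product: Product) -> Optional[str]:
    """Pick a client from the dimensionality alone; None if nothing fits."""
    return VIEWER_BY_DIMENSION.get(len(product.independent_axes))
```

A light curve, a folded light curve and a power spectrum all end up in the
same client here; they differ only in the axis descriptions, not in the
display task.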
This reminds me of another image associated with a spectrum: in echelle it is
the format of pixels (or rebinned wavelengths) over order number ...
If there is interest I can give more details ....
Best regards,
Petr
*************************************************************************
* Petr Skoda Phone : +420-323-649201, ext. 361 *
* Stellar Department +420-323-620361 *
* Astronomical Institute CAS Fax : +420-323-620250 *
* 251 65 Ondrejov e-mail: skoda at sunstel.asu.cas.cz *
* Czech Republic skoda at asu.cas.cz *
*************************************************************************