[VEP-0001] DataLink semantics vocabulary enhancement proposal

Wed Oct 23 14:01:02 CEST 2019

I will now be quite provocative concerning the omnipotence of proper
semantics ;-)

All we are trying to do is put intelligence into the VO tools - they
should be clever enough to prevent the user from doing wrong things (e.g.
sending a catalogue table to SPLAT). So the tool decides for him and does
not allow it.

But it is always useful if you can switch off the (artificial)
intelligence and use a manual override (see autopilots in airplanes,
safety systems in cars, ABS ...). This is not currently the case for VO
clients - the decision algorithms are hidden, hard-coded deep in the tools
(without direct contact with Pierre, Mark and Margarida I would not have
been able to understand why certain tables did not behave as expected
after being received or sent by SAMP - the answer was always: I am looking
for a certain column name, ucd, utype ... in a table to decide what to
do).

But what if the user wants to display a catalogue of spectra - that is, a
catalogue of objects where one column is the accref to the spectra? This
can easily be loaded instead of the internally generated results of an SSA
query (the difference is that the table may cover the whole sky, so the
spatial restriction imposed by an SSA query is not applied, or the spectra
may be the result of machine learning that gives their IDs).

Most of the problems I have had in my attempts to introduce real
interoperability (namely in experiments with interlinking time series,
spectra and images using SAMP) came from the censorship that clients
applied to particular VOTables ... So it required introducing hacks to
allow e.g. SPLAT to accept VOTables from outside and plot them, or to
force Aladin to plot the coordinates in a VOTable received by SAMP (going
back to 2014).

In my extensive work with SAMP I have never used the Broadcast function,
and I doubt that there is a really working use case for it (I mean one
useful for science analysis). Yet the wish to have it working is the main
driver behind Markus's use case 2) of Datalink semantics - the SAMP case.

In the SAMP model the hub decides for you whether to send a particular
VOTable to a given client - so the sending application must use a
particular message (e.g. load spectrum). The destination client declares
that it can accept spectra.
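
As a rough illustration of the difference between a broadcast and a
targeted send, the sketch below models the hub's routing with plain
dictionaries and no running hub. The MTypes (spectrum.load.ssa-generic,
table.load.votable, image.load.fits) and the samp.mtype / samp.params keys
are standard SAMP names; the helpers make_message and recipients, the
client ids and the example.org URLs are assumptions made for illustration
only. Real subscriptions may also contain wildcards, which are ignored
here.

```python
# Sketch only: SAMP-shaped dictionaries, no hub or network involved.

def make_message(mtype, params):
    """Build a SAMP-style message map (hypothetical helper)."""
    return {"samp.mtype": mtype, "samp.params": dict(params)}


def recipients(message, subscriptions, target=None):
    """Return the client ids that would receive `message`.

    `subscriptions` maps a client id to the set of MTypes it declared.
    target=None mimics a broadcast; otherwise only `target` is tried.
    """
    mtype = message["samp.mtype"]
    subscribed = sorted(
        cid for cid, mtypes in subscriptions.items() if mtype in mtypes
    )
    if target is None:
        return subscribed
    return [target] if target in subscribed else []


# Made-up declarations of two clients.
subscriptions = {
    "splat": {"spectrum.load.ssa-generic", "table.load.votable"},
    "aladin": {"image.load.fits", "table.load.votable"},
}

spectrum_msg = make_message(
    "spectrum.load.ssa-generic",
    {"url": "http://example.org/spectrum.vot", "meta": {}},
)
table_msg = make_message(
    "table.load.votable", {"url": "http://example.org/catalogue.vot"}
)

print(recipients(spectrum_msg, subscriptions))               # broadcast
print(recipients(table_msg, subscriptions))                  # broadcast
print(recipients(table_msg, subscriptions, target="splat"))  # targeted
```

With a real hub, astropy's SAMPIntegratedClient provides notify_all(message)
for the broadcast and notify(recipient_id, message) for the targeted send.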

Perfect idea. But in practice the scientist will use SAMP to transfer
data from one particular application to another particular target. So
he builds a pipeline, chaining the given applications by SAMP.

I would guess that I am not a typical SAMP user in trying to do what I
showed in Paris (a spectrum together with an image of the object + custom
code to display a local picture of the spectrum, somehow processed).
But even here I could not use the broadcast, as certain directions and
certain tables did not get through - I had to use the TOPCAT functionality
(activation actions), which in fact transforms the data for particular
clients (including the browser).

In the SPLAT case it turned out to be easier to let the user decide how
to deal with a table received by SAMP (it depends on the use case) - there
are switches in the menu. So now you can construct a table with certain
values in TOPCAT and send it to SPLAT, where you say: interpret the
received table as a spectrum (or a time series).

If the received table is wrong (does not contain a spectrum), SPLAT simply
displays nonsense or nothing and reports an error.

So it is the user's responsibility to decide how to interpret the tables
received by SAMP and how to build the pipeline.
He has the right to make a wrong setup, and then he gets errors or
nonsense out ...

The whole effort behind the semantics at the DataLink end comes from the
desire to identify the target before the target VOTable is opened.

Wouldn't it be better to have a clear dataproduct type written in the
VOTable itself - so that once the metadata of the VOTable is read, the
client can decide whether it knows how to interpret the content?

It would require some more communication - but isn't the VOTable
serialization designed so that you can read just the "preamble" while the
contents are still flowing ...?
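
A minimal sketch of such a "preamble only" read, using the streaming XML
parser from the Python standard library. It assumes the publisher puts the
product type into an INFO element named dataproduct_type; that INFO name
is hypothetical and not part of the VOTable standard, and read_preamble
and the sample document are likewise made up for illustration. Parsing is
incremental, so for a large table leaving the loop at DATA means the rest
of the stream is never requested.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def read_preamble(stream, info_name="dataproduct_type"):
    """Return (product type or None, field names), stopping at DATA."""
    product_type = None
    fields = []
    # Attributes are already available on "start" events.
    for _event, elem in ET.iterparse(stream, events=("start",)):
        name = local(elem.tag)
        if name == "INFO" and elem.get("name") == info_name:
            product_type = elem.get("value")
        elif name == "FIELD":
            fields.append(elem.get("name"))
        elif name == "DATA":
            break  # stop here; nothing after this point is inspected
    return product_type, fields


SAMPLE = b"""<?xml version="1.0"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <INFO name="dataproduct_type" value="spectrum"/>
    <TABLE>
      <FIELD name="wavelength" datatype="double"/>
      <FIELD name="flux" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>6562.8</TD><TD>1.0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

product_type, fields = read_preamble(io.BytesIO(SAMPLE))
print(product_type, fields)
```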

Everyone publishing a data table in the VO knows well what its contents
are - so why not describe individual tables semantically there ... (I am
an image, I am a spectrum ...)

So if I want to attach a link to other datasets, I do not describe their
nature. Only when the link is used is the header of the table loaded, and
the client says - I will not deal with this (in the ideal case it shows a
window saying - Sorry, the table you want to load is an IMAGE - I do not
handle images ...)

The client can then say, for an arbitrary type - what you want to download
is XXXX (e.g. a power spectrum) - I am not able to handle it.
But the user may switch on a button which will try to display it anyway.
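
The refusal with a manual override could look roughly like the sketch
below. The function decide, the HANDLED set and the force flag are all
hypothetical names; the declared type is assumed to come from the table
header, as proposed above.

```python
# Hypothetical capabilities of a spectrum-oriented client.
HANDLED = {"spectrum", "timeseries"}


def decide(product_type, handled=HANDLED, force=False):
    """Return (load, message) for a table with a declared product type.

    force=True is the user's manual override: try to display it anyway.
    """
    if product_type in handled:
        return True, "Loading " + product_type
    if force:
        return True, "Declared type is " + product_type + ", trying anyway"
    return False, (
        "Sorry, the table you want to load is "
        + product_type
        + " - I do not handle it"
    )


print(decide("spectrum"))
print(decide("power spectrum"))
print(decide("power spectrum", force=True))
```

A specialized period analyzer would simply add "power spectrum" to its own
set of handled types and never reach the refusal branch.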

Then a new specialized application (a period analyzer) can handle a power
spectrum - so it will not complain and will display the power spectrum
properly (but it decides only after reading the preamble of the VOTable at
the end of the link).

Of course this is not an optimal solution - I just want to show the
practical side of working with the VO ... Let the user bear the
responsibility for what happens. He may find a nice, tricky usage, doing
something inconceivable during the design phase.
---------------------------------

Concerning the timeseries-of-someproduct:
There are two sorts of time series, as said:

Either it has its own data model and is properly described by a
dataproduct type - e.g. a light curve in a 2-column table, or wrapped by
the spectral data model - so it is loaded as a whole and the client needs
to understand its content,

Or it is a simple set of other products (images, spectra) which have
some associated variable (called time) - which may be anything, as long
as it has the important property that it can be ordered in an increasing
or decreasing sequence.
Then the client should let the user select by which variable (and
in which direction) the series will be shown.
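
For the second sort, the selection of variable and direction amounts to a
sort over the set of products. A small sketch, in which order_series, the
dictionary keys and all values are made up for illustration:

```python
def order_series(products, variable, descending=False):
    """Sort products by a user-chosen variable, skipping those lacking it."""
    usable = [p for p in products if variable in p]
    return sorted(usable, key=lambda p: p[variable], reverse=descending)


# Made-up series of spectra with two candidate ordering variables.
spectra = [
    {"accref": "a.fits", "hjd": 2454003.5, "phase": 0.40},
    {"accref": "b.fits", "hjd": 2454001.5, "phase": 0.85},
    {"accref": "c.fits", "hjd": 2454002.5, "phase": 0.10},
]

by_time = [p["accref"] for p in order_series(spectra, "hjd")]
by_phase = [p["accref"] for p in order_series(spectra, "phase")]
newest_first = [
    p["accref"] for p in order_series(spectra, "hjd", descending=True)
]
print(by_time, by_phase, newest_first)
```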

So instead of saying - plot me stacked spectra in increasing order of the
variable HJD or JD (or ISO timestamp) - you can say in increasing order of
circular phase (usually implied). For example look at

https://wiki.ivoa.net/internal/IVOA/InterOpOct2008DAL/stelSSAcutout.pdf
slide 3 - the left image is a spectral series folded by the circular phase
corresponding to the given period, from 0 to 2 (just a cutout of a certain
spectral line) - this is common in asteroseismology.
The right image is ordered by the difference in time from some reference
date.
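
The folding itself is simple arithmetic: the phase is the fractional part
of (t - t0) / period. A sketch with made-up time stamps and period; the
function names fold and two_cycles are assumptions, and repeating each
phase one cycle later is just one common way to get the 0 to 2 range.

```python
def fold(times, period, t0=0.0):
    """Return the circular phase in [0, 1) of each time stamp."""
    return [((t - t0) / period) % 1.0 for t in times]


def two_cycles(phases):
    """Repeat each phase one cycle later, so the axis runs from 0 to 2."""
    return sorted(phases + [p + 1.0 for p in phases])


times = [0.5, 3.0, 4.5, -0.5]  # made-up time stamps, e.g. in days
phases = fold(times, period=2.0)
print(phases)
print(two_cycles(phases))
```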

In the case of images you can say - make an animation with frames ordered
by that variable.

I have no idea how to work with time series of datacubes (except by
applying some slicing/cutouts so that you get a series of spectra or
images ...)

Sorry for the detour from the practical discussion of how to name the
proper target links in the DL table ...

Petr

*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute CAS         Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                             skoda at asu.cas.cz          *
*************************************************************************