[VEP-0001] DataLink semantics vocabulary enhancement proposal

François Bonnarel francois.bonnarel at astro.unistra.fr
Thu Oct 24 16:28:53 CEST 2019


Hi Petr,

Hi all,

A couple of answers to your interesting email.
On 23/10/2019 at 14:01, Petr Skoda wrote:
>
> I will be now quite provocative concerning the omnipotence of proper 
> semantics ;-)
>
> All we try to do is to input the intelligence into the VO tools - they 
> should be clever to prevent the user to do wrong things (e.g. sending 
> catalogue table to SPLAT). So it decides for him not to allow to do this.
>
> But it is always useful if you can switch off the (artificial) 
> intelligence and use manual override (see autopilots in airplanes, 
> secure cars, ABS ...) This is not current case of VO clients - the 
> decision algorithms are hidden hard-coded deep in tools (without 
> direct contact with Pierre, Mark and Margarida I would not be able to 
> understand why certain tables do not do what expected after 
> receiving/send by SAMP - answer was always - I am looking for certain 
> column name, ucd, utype ... in a table to decide what to do).
>
> But what if the user wants to display catalogue of spectra - so it is 
> the catalogue of objects where one column is the accref to spectra. 
> This can be easily loaded instead of the internally generated Results 
> from SSA query (the difference is: the table may cover whole sky - so 
> spatial restriction imposed by SSA query is not introduced, or the 
> spectra may be result of machine learning giving their IDs).
>
>
> Most of problems I have had with my trials to introduce a real 
> interoperability (namely due to experiments with interlinking time 
> series, spectra and images using SAMP) was the censorship of clients 
> applied on particular votables...   So it required introducing hacks 
> to allow e.g. the SPLAT to accept votables from outside and plot them 
> or to force Aladin to plot the coordinates in a votable received by 
> SAMP (down to 2014).
>
> I have never used in my extensive work with SAMP the Broadcast 
> function and I have doubts that  there is a really working real use 
> case (I mean useful for science analysis). The reason to have it 
> working is the main driver behind Markus's use case 2) of Datalink 
> semantics - the SAMP case.
>
> In practice the SAMP hub decides for you whether to send particular 
> votable to given client - so the sending application must use 
> particular message (e.g. load spectrum). The destination client 
> declares it can accept spectra.
>
> Perfect idea. But in practice the scientist will use the SAMP to 
> transfer data from one particular application to another particular 
> target one. So he builds the pipeline chaining the given applications 
> by SAMP.

All of the above is partially true, but a little extreme.

In practice, if we concentrate on the single DataLink {links} table 
case, semantics is also there to tag the nature of the link in a 
standard way for HUMAN readers. It is a standard guide to understanding 
the description, just as in the VOTable columns of a catalogue, where 
the ucd is readable and gives you more standardized information than 
the DESCRIPTION element embedded in the FIELD.

In the very simple relationship    VO item <-> DataLink-link <-> 
link-response, it allows the user (not only the client) to decide in 
advance whether or not to load the link target.

>
> I would guess that I am not a typical SAMP user trying to do what I 
> was showing in Paris (spectrum together with image of object + custom 
> code to display local picture of spectrum somehow processed). But even 
> here I could not use the broadcast as certain directions and certain 
> tables did not get through - I had to use the functionality of TOPCAT 
> (activation actions) - which in fact transform the data for particular 
> clients (including browser).
>
> It showed to be easier in SPLAT case to allow to user to decide how to 
> deal with SAMP received table (depends on use case) - in menu there 
> are switches. So now you can construct table with certain values in 
> TOPCAT and send this table to SPLAT where you say interpret received 
> table as spectrum (or time series)
>
> If the table received is wrong (not containing the spectrum), it 
> simply displays nonsense or nothing and shows bug.
>
> So it is the user's responsibility to decide how to interpret the SAMP 
> received tables. and how to build the pipeline.
> He has the right to do the wrong setup and he gets bugs or nonsense out ..
>
> The whole effort behind the semantics  of datalink end is the desire 
> to be identified before the target votable is opened .
>
> Wouldn't it be better to have a clear dataproduct type written in the 
> VOTABLE itself - so once the metadata of votable is read, the client 
> can decide whether he knows how to interpret the content ?
This is the DALI idea of using an INFO tag at the beginning of the 
VOTable to give some idea of the nature of the VOTable content.
This indeed works for SSA, SIA and TAP service responses, for the 
DataLink {links} response itself, and for SCS-next as well. In the case 
of SSA/SIA we should discover inside the response table: "Well, dear 
user, I'm a SIAP2.0 response"
or "I'm an excerpt of catalogue so-and-so retrieved by TAP".
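
To make this concrete, here is a minimal sketch (not any client's actual 
code) of reading only the preamble of a VOTable response and collecting 
its INFO name/value pairs before any table rows are handled. The function 
name read_votable_preamble and the sample document are made up for 
illustration. The standardID value is what I understand DataLink to use 
for {links} responses; please check it against the standard before 
relying on it.

```python
import io
import xml.etree.ElementTree as ET


def read_votable_preamble(stream):
    """Collect INFO name/value pairs found before the first DATA element.

    Hypothetical helper: it stops pulling events as soon as DATA starts,
    so the table rows themselves are never processed here.
    """
    infos = {}
    for _event, elem in ET.iterparse(stream, events=("start",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace, if any
        if tag == "INFO":
            # Attributes are already available on the "start" event.
            infos[elem.get("name")] = elem.get("value")
        elif tag == "DATA":
            break  # preamble is over, rows follow
    return infos


# Invented sample response, reduced to the bare minimum.
SAMPLE = b"""<?xml version="1.0"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <INFO name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.0"/>
    <TABLE>
      <FIELD name="ID" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA><TR><TD>ivo://example/obs1</TD></TR></TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

infos = read_votable_preamble(io.BytesIO(SAMPLE))
print(infos)
```

A client could compare infos.get("standardID") with the list of response 
types it knows how to handle, and warn the user (or offer a manual 
override) when there is no match.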

In the case of a VOTable serialization of a dataproduct (spectrum, 
timeseries), we currently have either an old (spectrum) or no 
(timeseries) standard representation based on a model. When we have 
one, we should recommend adding this kind of INFO tag at the beginning 
of the VOTable.

It would indeed be nice to have this for better processing. But ...
>
> It would require some more communication - but isn't the votable 
> designed in serialization that allows to read just "preamble" while 
> the contents is still flowing ....?
>
> Everyone publishing some data table in VO knows well what is its 
> contents - so why not to describe individual tables semantically here 
> .... (I am image, I am spectrum ....)
>
> So if I want attach link to other datasets I do not describe their 
> nature. Just when it is used the header of table is loaded and client 
> says - I will not deal with this (an ideal case will show window 
> telling - Sorry the table you want to load is a IMAGE - I do not 
> handle images ...)
>
> I can then arbitrarily say - what you want to download is XXXX (e.g. 
> power spectrum) - I am not able to handle it.
> But the user may switch on button which will try to display it anyway
Well, systematically loading products just to find out what they are is 
a rather heavy solution. What if the products are huge?

That's why I think we must have sufficient information in the {links} 
table itself.

So a combination of good semantics/description/content-type in the 
{links} table and a standard INFO at the beginning of responses is the 
best solution.
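
As a rough illustration of deciding from the {links} table alone, the 
sketch below filters links by content_type and content_length before 
anything is downloaded. The column names follow the DataLink {links} 
table; the Link class, the worth_loading function, the URLs and the size 
threshold are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """One row of a {links} table, reduced to the columns used here."""
    access_url: str
    semantics: str
    description: str
    content_type: str
    content_length: Optional[int]  # bytes, None when the service gives no size


def worth_loading(link, accepted_types, max_bytes):
    """Hypothetical client policy: decide without touching the link target."""
    if link.content_type not in accepted_types:
        return False
    if link.content_length is not None and link.content_length > max_bytes:
        return False
    return True  # unknown sizes are accepted in this sketch


# Invented rows, only to exercise the policy.
links = [
    Link("http://example.org/obs1.fits", "#this",
         "Full spectral cube", "application/fits", 4_000_000_000),
    Link("http://example.org/obs1.png", "#preview",
         "Quick-look image", "image/png", 120_000),
    Link("http://example.org/obs1-aux.xml", "#auxiliary",
         "Extracted light curve", "application/x-votable+xml", None),
    Link("http://example.org/obs1-cal.xml", "#calibration",
         "Calibration table", "application/x-votable+xml", 50_000_000),
]

accepted = {"image/png", "application/x-votable+xml"}
selected = [l.semantics for l in links if worth_loading(l, accepted, 10_000_000)]
print(selected)
```

The semantics and description columns are what a human would read to make 
the same choice by hand; a client with a manual override would simply let 
the user bypass worth_loading.
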
>
> Then a new specialized application (period analyzer) can handle power 
> spectrum - so it will not complain and display power spectrum properly.
> (but it decides just after reading the preamble of votable at end of 
> link)
>
> Of course this is not an optimal solution - but just I want to show the 
> practical side of working with VO .... To let the user bear the 
> responsibility what happens. He may find a nice tricky usage doing 
> something inconceivable during design phase.
> ---------------------------------
>
> Concerning the timeseries-of-someproduct
> There are two sort of time series as said:
>
> Either it has its own data model and is properly described by 
> dataproduct type - like e.g. lightcurve in 2 column table or wrapped 
> by spectra data model - so it is loaded in whole and the client needs 
> to understand its content,
>
> Or it is a simple set of other products (images, spectra) which have 
> associated some variable (called time) - which may be anything but 
> having the important property - to be ordered in a increasing or 
> decreasing sequence.
> Then the client should be able to allow to select which variable (and 
> in which direction) it will be shown.
>
> So instead of saying - plot me stacked spectra in increasing order of 
> variable HJD  or JD (or ISO timestamp) you can say increasing order of 
> circular phase  (usually implied). For example look at
>
> https://wiki.ivoa.net/internal/IVOA/InterOpOct2008DAL/stelSSAcutout.pdf
> slide 3 - image left is spectral series folded by circular phase 
> corresponding to the given period) - from 0 to 2 (just cutout of certain
> spectral line) - this is common in asteroseismology.
> Image right - order by some difference in time from some reference date.
>
> In case of images you can say - make the animation - frames ordered by 
> that variable.
>
> How to work with timeseries of datacubes I do not have idea (except if 
> you apply some slicing/cutouts and you get series of spectra or images 
> ...)
>
> Sorry for detour from the practical discussion how to name the proper 
> target links in DL table ....
Here we are clearly dealing with the issue of dataproduct representation.
For this kind of thing, Laurent Michel and others (including me) have 
proposed the CAB-MSD idea. Look here:
https://github.com/lmichel/Model-For-Source-Data/blob/master/cab-msd.pdf

But that's another story, no longer about the DataLink vocabulary.

Cheers
François

>
> Petr
>
> *************************************************************************
> *  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
> *  Stellar Department                         +420-323-620361           *
> *  Astronomical Institute CAS         Fax   : +420-323-620250           *
> *  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
> *  Czech Republic                             skoda at asu.cas.cz          *
> *************************************************************************
>
>


