[VEP-0001] DataLink semantics vocabulary enhancement proposal

François Bonnarel francois.bonnarel at astro.unistra.fr
Wed Oct 23 10:47:08 CEST 2019


Hi Petr, all


On 21/10/2019 at 17:38, Petr Skoda wrote:
>
>> So, a question to all (including Carlos, who's posted on voevent@):
>> Which of these terms do you actually need *now* (or at least for data
>> that you will want to publish in the safely foreseeable future)? And
>> can you see a clear scenario for a *machine* to have to understand
>> matters at that level of detail (for humans, there's always the
>> description in datalink)?
>
> I would like to point out that the list suggested by Francois is still
> not sufficient for much archival (e.g. VizieR) and future survey data.
>
> For example, what is missing is a link to associated time series whose
> horizontal axis is not time but the circular phase corresponding to a
> given frequency in a periodogram, and to the associated periodogram itself.
>
> This is widely used in asteroseismology and exoplanet hunting;
> e.g. all the Kepler exoplanet publications show images from this
> tool somewhere: see in particular Fig. 4 here
>
>
>
> https://www.researchgate.net/publication/248394956_The_NASA_Exoplanet_Archive_Data_and_Tools_for_Exoplanet_Research 
>
>
> Atlases of variable stars show period-folded curves, as here:
>
> http://ogle.astrouw.edu.pl/atlas/anomalous_Cepheids.html
>
I wonder whether all this is "derived" data rather than "associated" data,
because obtaining it from direct observations requires more processing
than calibration. Let's say analysis.
And we lack a vocabulary for this kind of data product, which doesn't
really fit within the ObsCore scope at the moment.
But for this kind of data, that is true beyond the DataLink context as well.
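To make the "derived" point concrete: a phase-folded light curve cannot be produced without a period, and that period is itself the output of an analysis step (for instance the highest peak of a periodogram), not something delivered with the observation. The following is a minimal Python sketch assuming the period is already known; `fold_light_curve` and the sample values are hypothetical and not part of any VO standard or library.

```python
def fold_light_curve(times, mags, period, epoch=0.0):
    """Phase-fold a light curve.

    Each observation time t is mapped to the phase ((t - epoch) / period) mod 1,
    so the returned curve has phase in [0, 1) on its horizontal axis instead
    of time. Returns (phase, mag) pairs sorted by phase.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    folded = []
    for t, m in zip(times, mags):
        phase = ((t - epoch) / period) % 1.0
        if phase >= 1.0:  # guard: rounding of a tiny negative value can give 1.0
            phase = 0.0
        folded.append((phase, m))
    return sorted(folded)


# Hypothetical observations; the period would normally come from an
# analysis step such as picking the strongest periodogram peak.
times = [0.0, 1.25, 2.5, 3.75]
mags = [10.0, 11.0, 12.0, 13.0]
print(fold_light_curve(times, mags, period=2.0))
# [(0.0, 10.0), (0.25, 12.0), (0.625, 11.0), (0.875, 13.0)]
```

The time axis disappears entirely in the output, which is why such a product sits more naturally under "derived" than under "associated" data.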
>
> If you want an example of a time series of spectra,
> there is the so-called dynamical spectrum (e.g. in my old pictures
>
> https://arxiv.org/pdf/1112.2787.pdf
>
> and,
>
> for a small number of spectra,
>
> https://arxiv.org/pdf/1407.1765.pdf
>
> Fig. 1)
Sure, thanks
>
>
> There are of course better examples from fast time-resolved
> spectroscopy etc.
>
> I would need to find them (if needed).
>
>
> I can also imagine time series of data cubes (ALMA, radio) ...
>
Yes, that exists too, and for solar observations as well.
> And lastly, what about the associated gravitational-wave information
> (strain/frequency)? I have asked people from the GW community for
> detailed examples,
> and it seems that the common "time series" they use is
> either strain vs. time or the power spectral density of strain vs. frequency
> (strain is the relative displacement of the mirrors divided by the baseline).
>
> IMHO, if we are going to address important "astroparticle" use cases
> (not only GW but also particle bursts), we should fulfil the requirements
> of projects which will supply PBs of data very soon, even if they look
> exotic to us.
>
No idea on these two at the moment. But GW and neutrino people are
now entering the VO (for example via ESCAPE), so let's see their
requirements.
> As something more familiar to optical astronomers, we should
> think about folded curves as well as the so-called phase portraits of those
> curves (important for the analysis of deterministic chaos, which may
> drive some sources)
> https://arxiv.org/pdf/1711.09029.pdf
>
> It will also be important for LSST to have multiband periodograms
> associated with targets ...
>
These are also derived products, I imagine.
> e.g.
>
> http://jakevdp.github.io/multiband_LS/paper.pdf
>
>
>
>> SPLAT, in turn, for each photometry point in a time series, could see
>> that there is an associated image (not yet for Gaia DR2, but it's
>> there for BGDS, ivo://org.gavo.dc/bgds/l/tsform, for instance) and
>> then have that displayed as users select individual points on the
>> time series so people can immediately check if something is odd with
>> the source of an outlier (say).
>
> This was the original motivation for using DataLink together with Jiri's
> and my sparse-cube TS model, as I showed in Sydney in 2015:
> https://wiki.ivoa.net/internal/IVOA/InteropOct2015Applications/lightcurves_skoda.pdf 
>
>
>
>
> In our model, Aladin interpreted the access_url attached to every data
> point to download the associated image. In the IVOA note we also
> showed the possible cutout behind this, so everyone can see that a
> particular point on the light curve is odd because there was a cosmic-ray
> hit on the star image ...
>
>
>
>>
>> This would lead me to the following terms:
>>
>> associated-data: The data at access_url contains information on the
>>  discovered item not directly related to the original observation
>>  [which would probably be a progenitor or derivation] but giving
>>  additional insights into the source or phenomenon observed.
>
>
> In fact, with both Carlos' proposal and this one we are trying to
> substitute for PROVENANCE ... every product is derived from raw data by
> certain steps ...
>
> Going into detail: even a single-order spectrum has an associated
> 2D image of the spectrum (e.g. the rainbow) on a CCD chip, as a strip of
> light. In echelle spectra, which are still not properly handled even by
> SSAP, it is even more complicated ... perhaps the cutout of a given
> spectral order from the whole echellogram is a good approximation of the
> proposed "associated image".
>
Well, there is much more in the Provenance data model than "progenitor" or
"documentation".

Provenance is an organised set of metadata to describe the history of data.

The progenitor and documentation semantics tags in DataLink only carry
simple information about the link. They do not presuppose any
standardisation of what comes back from the link, apart from the
format given by contentType.

So these links are useful when we are not able to provide the full
provenance metadata.

By the way, I imagine that "provenance" could become a semantics tag in the
future, to link an item to its full provenance metadata.
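The division of labour described here, where the semantics tag only states how the linked resource relates to the item and the content type alone tells the client what format comes back, can be sketched as a client-side filter over DataLink rows. This is a minimal Python sketch: the field names follow columns of the DataLink `{links}` response (`ID`, `access_url`, `semantics`, `description`, `content_type`), but `LinkRow`, `select_links` and all row values and URLs are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class LinkRow:
    """A subset of the columns of one row of a DataLink {links} response."""
    ID: str
    access_url: Optional[str]  # None for rows with nothing to download directly
    semantics: str
    description: str = ""
    content_type: Optional[str] = None


def select_links(rows: Iterable[LinkRow], semantics: str,
                 content_types: Optional[Set[str]] = None) -> List[LinkRow]:
    """Return rows tagged with `semantics` that point at a downloadable resource.

    The semantics tag only says how the linked resource relates to the item;
    whether this client can handle what comes back is decided from
    content_type alone (content_types=None accepts any content type).
    """
    selected = []
    for row in rows:
        if row.semantics != semantics or row.access_url is None:
            continue
        if content_types is not None and row.content_type not in content_types:
            continue
        selected.append(row)
    return selected


# Made-up rows for illustration only.
rows = [
    LinkRow("ivo://example/ts?1", "https://example.org/raw/1.fits",
            "#progenitor", "raw frame", "application/fits"),
    LinkRow("ivo://example/ts?1", "https://example.org/doc/1.html",
            "#documentation", "instrument notes", "text/html"),
    LinkRow("ivo://example/ts?1", "https://example.org/prev/1.png",
            "#preview", "preview image", "image/png"),
]
fits_progenitors = select_links(rows, "#progenitor", {"application/fits"})
print([r.access_url for r in fits_progenitors])
# ['https://example.org/raw/1.fits']
```

Nothing in the filter interprets the content behind the link, which matches the point that these tags carry no standardisation of the returned data beyond its format.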
>
>
>
>
>>
>> That's the root term; I'd rather not talk about "dataset" here, as
>> that, at least in some parts of the VO, is a strong term.  For
>> instance, associated images will almost always be cutouts of
>> datasets, and we may not want to carelessly promote these to
>> datasets themselves.
>>
>> associated-timeseries: The data at access_url is a mapping from
>>  time to one or more scalar observables (examples: a lightcurve or a
>>  time series of radial velocities) giving additional insight into the
>>  discovered item.
>
>
> I do not like "a mapping from time to scalar observables".
> As written earlier, a mapping from some variable which is a function
> of time
>
> (e.g. frequency) to some (derived, observed) variable (e.g. a power
> spectrum) is more general.
>
> Or what about visibility curves? They are functions of time as well,
> just mapped onto the virtual (u,v) plane (time trajectories, in fact).
>
>
>
>>
>> associated-image: The data at access_url is an image giving
>>  additional insight into the discovered item.
>>
>> Now that I write this, I frankly think that associated-image as given
>> in the use case above should actually be progenitor: that's clearly
>> what a postage stamp showing the image that the photometry point
>> was derived from is.
>
> see above
>
>
>>
>> So -- does anyone have another use case (as I said: try describing
>> what a client would do with the term) for associated-image?  Or for
>> any other term other than associated-data and associated-timeseries?
>> And let me stress again because I think that point is often neglected
>> in the VO: "use case" should really include proposed client
>> behaviour.
>
>
> Yes: it would be nice to see the original 2D spectrum on the chip
> linked to the extracted 1D one ... A lot of common mistakes in data
> reduction would be immediately visible, and so would the whole
> "provenance" of the final result ...
>
>
>
>> It'd be great if we didn't introduce terms nobody may be using for
>> many years.
>
> I am afraid that what I have seen at ADASS forces us to start
> addressing, immediately, the current requirements of big projects,
> which are in fact restricted in their use of the VO (even if they would
> like to use it more) because we do not provide them with an end-to-end
> solution for their archives ... So they are mostly limited to using TAP
> to query catalogues or observing logs ... the lucky ones can even use
> tables of some measurements ...
>
>
>
>
>> Bonus: We don't need to quarrel about what's the
>> difference between associated-cube and associated-timeseries-image
>> when it's not really clear what any client might be doing with
>> either.
Everything below is no longer about DataLink semantics. You are perfectly
right that we may have to standardize what comes back from the link
when it has been tagged as "associated light curve", as well as in the
context of ObsCore with dataproduct_type = timeseries, subtype =
lightcurve or whatever.
This has been discussed for a long time in the TimeSeries working group,
in collaboration with DM, Apps and sometimes DAL, and you participated in
that with Jiri,
and there is no obvious solution yet.

In short: "this is another thread"

Cheers
François
>
> It would be better for the client to understand what to do with the end
> of the DataLink from the description in the VOTable itself (that is where
> it should be encoded that this is a spectrum with such axes and such a
> model behind it).
>
> But I can see Mark Taylor's SAMP point of view as well:
> don't send the DataLink end to an application that does not
> understand how to display a spectrum ...
>
> IMHO we should have an easily extensible vocabulary and let client
> developers decide how they will use the information.
> The people publishing a certain product at the DataLink end will have a
> clear vision of what they want to show, and new clients will be able to
> display it ...
>
>
> But in practice I think that what most distinguishes clients is the
> dimensionality: e.g. time series such as light curves, folded light curves
> (in phase), spectra, power spectra, gravitational waves etc. are all
> the same display task, a 1D vector, and all the "semantics" is given
> by the description of the axes: units, variables ...
>
> This is what we wanted to express in our IVOA note: SPLAT is a tool for
> displaying 1D vectors. No semantics needed. That's why we could use it
> for time series immediately, by changing a few lines of code ;-)
>
> Images are the domain of Aladin, and we need 3D viewers for data cubes ...
> That's all: the number of axes determines the product and the client to use.
>
> This reminds me of another image associated with a spectrum: in echelle,
> it is the format, i.e. pixels (or rebinned wavelengths) vs. order number ...
>
> If there is interest, I can give more details ...
>
>
> Best regards,
>
> Petr
>
>
> *************************************************************************
> *  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
> *  Stellar Department +420-323-620361           *
> *  Astronomical Institute CAS         Fax   : +420-323-620250           *
> *  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
> *  Czech Republic skoda at asu.cas.cz          *
> *************************************************************************


