Time Series Cube DM - IVOA Note

Fri Mar 3 02:07:48 CET 2017

Hi all,

Jiri is leaving for a holiday; he could not follow the discussion as he
was quite busy, and I have just arrived after a month of travelling...

I am not sure if Jiri will come to Shanghai; I will try to.

I would just like to explain some issues without going into details...

What is described is implemented in DaCHS, and we have adapted SPLAT-VO to
work with it, so it shows light curves from our OSPS survey from the 1.5 m
Danish telescope in Chile. (I have already shown it several times at
ADASS, the Interops and ASTERICS meetings, but with forced SSAP.) Now it
works the same way on the client side using the new data model and an
ObsCore query (also a new window in SPLAT-VO). There are a lot of issues
still to solve, but basically it works.

The advantage of representing everything as a table is the possibility to
send a light curve to TOPCAT and work with individual points: every
corresponding "column" may be activated, so we get e.g. the original image,
or its cutout, from which the particular stellar aperture was integrated to
give the point on the light curve. If you send this to Aladin, it starts to
download the image, thanks to Pierre's modification from the end of 2015.

The main goal was to allow associating multiple links and multiple
metadata with every point. LSST plans to add to every point the whole
probability distribution function or a complex statistical description.
This is possible in our model.

One important issue is the definition of time series.

I had explicitly stated that a time series is anything that has at least
one time-dependent axis; in other words, this axis is a FUNCTION of
time, f(t).
The main idea is to make it possible to mark the Fourier spectrum, power
spectrum, periodogram, etc. with the dataproduct type TIMESERIES and to
link them to the time series. But a function also means that time may be
implicit or even eliminated! A very important case is a time axis
replaced by the circular phase (folded with a given period).
Or you may have (for machine learning) the histogram of the various time
differences between individual points on the x-axis.
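The two transformed axes mentioned above, a phase-folded axis and a histogram of time differences, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fold` and `time_difference_histogram` are hypothetical, the epoch `t0` and the period are assumed to be already known, and the half-open binning is a choice made here, not part of any model.

```python
import itertools
import math
from collections import Counter
from typing import Dict, List, Sequence


def fold(times: Sequence[float], period: float, t0: float = 0.0) -> List[float]:
    """Replace each time by its circular phase in [0, 1).

    t0 (phase zero) and period are assumed to be known, e.g. from an
    earlier period search; they are inputs of this sketch.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return [((t - t0) / period) % 1.0 for t in times]


def time_difference_histogram(times: Sequence[float],
                              bin_width: float) -> Dict[int, int]:
    """Histogram of all pairwise differences |t_i - t_j| with i < j.

    Bin k counts differences in [k * bin_width, (k + 1) * bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(math.floor(abs(b - a) / bin_width)
                     for a, b in itertools.combinations(times, 2))
    return dict(sorted(counts.items()))


times = [0.0, 0.5, 1.25, 3.0]
print(fold(times, period=1.0))                          # [0.0, 0.5, 0.25, 0.0]
print(time_difference_histogram(times, bin_width=1.0))  # {0: 2, 1: 2, 2: 1, 3: 1}
```

Either output can be the x-axis of a product that still counts as a time series in the sense above, because it is a function of time.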

I would say that 90% of future usage of light curves will be connected
with period analysis or some advanced statistical analysis (e.g. wavelet
transforms, or even machine learning products such as Gaussian mixture
models, or associated multi-D errors).
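As an example of such period analysis, and of a periodogram that could itself be published with the dataproduct type TIMESERIES, here is a minimal sketch of the classical Lomb-Scargle periodogram for irregularly sampled data in plain Python. The function names, the normalization by the sample variance and the linear frequency grid are assumptions of this sketch; real work would use a vetted library implementation.

```python
import math
from typing import List, Sequence, Tuple


def lomb_scargle(times: Sequence[float], values: Sequence[float],
                 frequencies: Sequence[float]) -> List[float]:
    """Classical Lomb-Scargle power at each trial frequency (cycles per time unit)."""
    n = len(times)
    if n < 3 or n != len(values):
        raise ValueError("need at least 3 (time, value) pairs")
    mean = sum(values) / n
    centred = [y - mean for y in values]
    var = sum(y * y for y in centred) / (n - 1)   # sample variance
    if var == 0:
        raise ValueError("constant series has no periodogram")
    powers = []
    for f in frequencies:
        if f <= 0:
            raise ValueError("frequencies must be positive")
        w = 2.0 * math.pi * f
        # Offset tau that makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(y * ci for y, ci in zip(centred, c))
        ys = sum(y * si for y, si in zip(centred, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        powers.append((yc * yc / cc + ys * ys / ss) / (2 * var))
    return powers


def best_period(times: Sequence[float], values: Sequence[float],
                f_min: float, f_max: float, steps: int) -> Tuple[float, float]:
    """Scan a linear frequency grid and return (period, power) of the highest peak."""
    freqs = [f_min + (f_max - f_min) * k / (steps - 1) for k in range(steps)]
    powers = lomb_scargle(times, values, freqs)
    k = max(range(steps), key=powers.__getitem__)
    return 1.0 / freqs[k], powers[k]


# Irregularly sampled sinusoid with a period of 2.5 time units (synthetic data).
times = [0.37 * j + 0.05 * math.sin(1.3 * j) for j in range(200)]
values = [math.sin(2 * math.pi * t / 2.5) for t in times]
period, power = best_period(times, values, f_min=0.05, f_max=1.2, steps=1151)
print(f"best period ~ {period:.3f}")
```

The resulting list of powers against frequency, and the light curve folded with the best period, are exactly the kind of derived products that would be linked back to the original time series.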

We have followed all the available science use cases as collected by the
CSP (namely by Enrique, as cited) and tried to find some new ones not yet
mentioned.

But our imagination was limited by the primary goal: to describe some kind
of linear structure (in machine learning terms, a 1D feature vector)
marking single points with a value dependent on (a function of) time, and
with every point having associated metadata, products of further
processing or analysis, or links to previous states of pre-processing,
back to the original data. In principle, the whole provenance of a single
point may be associated here.
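The linear structure described above, with metadata, links and provenance attached to each point, might be sketched as the following Python data classes. All class, field and link names, the example.org URLs and the sample values are hypothetical illustrations; this is not the serialization used in DaCHS or SPLAT-VO.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Point:
    """One point of a time series (hypothetical sketch, not the IVOA model)."""
    coordinate: float            # time, or a function of time such as phase
    value: float                 # e.g. magnitude or flux
    error: Optional[float] = None
    metadata: Dict[str, str] = field(default_factory=dict)
    links: Dict[str, str] = field(default_factory=dict)   # e.g. "image", "cutout"
    provenance: List[str] = field(default_factory=list)   # earlier processing states


@dataclass
class TimeSeries:
    axis: str                    # label of the independent axis, e.g. "HJD" or "phase"
    points: List[Point] = field(default_factory=list)

    def column(self, name: str) -> list:
        """One attribute of every point, i.e. one column of the table."""
        return [getattr(p, name) for p in self.points]

    def link_column(self, kind: str) -> List[Optional[str]]:
        """The link of one kind for every point (None where a point has none)."""
        return [p.links.get(kind) for p in self.points]


lc = TimeSeries(axis="HJD", points=[
    Point(2457800.51, 12.31, 0.02, {"filter": "R"},
          {"cutout": "https://example.org/cutout/1"},
          ["https://example.org/raw/frame1.fits"]),
    Point(2457801.48, 12.35, 0.02, {"filter": "R"}),
])
print(lc.column("value"))          # [12.31, 12.35]
print(lc.link_column("cutout"))    # ['https://example.org/cutout/1', None]
```

Keeping every per-point item as an attribute means each one maps naturally onto a table column, which is what makes the TOPCAT and Aladin workflow described earlier possible.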

But this was also the boundary of our mental concept.

The idea was to give the community a simple way to express the wealth of
transients, light curves and period-analysis results, and to catalogue
them.

Our intention was not to describe the (multi-D+1) datacube as a time axis
linked to multi-D datacubes. This would bring all the problems we have
seen with SIAP2, etc.

We also state explicitly that the physical domain of each axis is not the
subject of the proposal; the particular semantics tied to a given domain
are the task of other models.

We do not solve this and we do not care. The client will interpret just
what it understands; extending its knowledge of particular contents may
simply be done by adding a module implementing another model.

Example (somewhat artificial, however...):

In the majority of input time series, the photometric filter will be
described by name, and it is the task of a filter profile service to find
the particular transmission curve using metadata referring to the
photometric system (or instrument).

IMHO all users will appreciate it if the client labels multiple light
curves by filter names rather than by complex vectors.

If some advanced client knows the protocol, it may open a picture of the
transmission curve, but IMHO it would be better to use SAMP and send the
light curve to another client, which will extract the links to the
filters and display them.

On Thu, 2 Mar 2017, François Bonnarel wrote:

> Dear all,
>
>
> Mireille Louys, Laurent Michel and I discussed the TimeSeries Cube data
> model here in Strasbourg.
>
> Before going to serialization we try to go back to the basic concepts needed 
> to represent TimeSeries and try to match them to Cube Data model as Jiri did 
> (although we apparently differ eventually)
>
>
> In our approach, we focus on the time axis, considering it as generally
> irregularly sampled, in other words "sparse".
>
>
> For each time sample we have a (set of) measurements, which may be one
> single flux (in the case of light curves) or whatever scalar value, but can
> also be an observation dataset spanned on other data axes (spectrum, image,
> radio cube, velocity map...). Actually, for each time sample we have an ND
> cube (of whatever dimension, excluding time). And if a single data point, or
> single value (flux), can be seen as a degenerate case of an ND cube, then
> everything is a set of NDCubes for different time samples!!!
>
>
>     This concept allows us to describe light curves and time sequences of
> spectra, of 2D images, and of (hyper)cubes.

I am afraid that describing e.g. radio maps at multiple frequencies
repeated multiple times (at irregular intervals) is physically feasible,
but this would push our model into the position of the ALL-INCLUDING,
all-VO-describing model of the Universe (and life, etc. ;-)

That is beyond my imagination (and implementability).

At the beginning I did not want to immerse this model in the data cube,
but it was tempting, and Jiri convinced me that it can work after he
modified DaCHS (in collaboration with Markus, who is also guilty, as he
was the first to mention the Data Cube model at our hackathon in Garching
during the SCIOPS 2015 workshop).

>
>
> By doing this we are not fully consistent with the ND cube data model: we
> have something like a mixture between SparseCube and NDImage: the Time axis
> is sparse and each sample on the Time axis indexes an ND Cube. Could it be a
> third specialisation of a generic NDCube?
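The structure proposed in the quoted message, an irregularly sampled time axis where each sample indexes an ND cube (with a single flux as the degenerate 0-D case), might be sketched as follows. `NDCube` and `SparseTimeCube` are hypothetical names for this illustration only, not classes of the Cube DM, and the requirement of strictly increasing times is an assumption added here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NDCube:
    """Values on the non-time axes; shape () is the degenerate 0-D case (one flux)."""
    shape: Tuple[int, ...]
    data: List[float]            # flattened values, row-major

    def __post_init__(self) -> None:
        expected = math.prod(self.shape)   # prod(()) == 1 for a scalar
        if len(self.data) != expected:
            raise ValueError(f"shape {self.shape} needs {expected} values, "
                             f"got {len(self.data)}")


@dataclass
class SparseTimeCube:
    """Irregularly sampled time axis where each time sample indexes one NDCube."""
    times: List[float]
    cubes: List[NDCube]

    def __post_init__(self) -> None:
        if len(self.times) != len(self.cubes):
            raise ValueError("need exactly one cube per time sample")
        if any(b <= a for a, b in zip(self.times, self.times[1:])):
            raise ValueError("time samples must be strictly increasing")

    def is_light_curve(self) -> bool:
        """True when every cube is 0-D, i.e. one scalar per time sample."""
        return all(c.shape == () for c in self.cubes)

    def at(self, t: float) -> NDCube:
        """The cube recorded at exactly time t."""
        return self.cubes[self.times.index(t)]


lc = SparseTimeCube([0.0, 0.7, 3.1],
                    [NDCube((), [12.3]), NDCube((), [12.4]), NDCube((), [12.2])])
spectra = SparseTimeCube([0.0, 2.0],
                         [NDCube((3,), [1.0, 0.9, 1.1]), NDCube((3,), [1.0, 0.8, 1.2])])
print(lc.is_light_curve(), spectra.is_light_curve())   # True False
print(spectra.at(2.0).data)                             # [1.0, 0.8, 1.2]
```

The same container holds a light curve and a time sequence of spectra, which illustrates both the generality of the proposal and why it tends toward the all-including model discussed in the reply.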

>>
>>     >   2) Interoperability
>>
>>     Interoperability is actually what this is about.  If we build
>>     Megamodels doing everything, we either can't evolve the model or will
>>     break all kinds of clients needlessly all the time -- typically,
>>     whatever annotation they expect *would* be there, but because their
>>     position in the embedding DM changed, they can't find it any more.

>>     Client authors will, by the way, quickly figure this out and start
>>     hacking around it in weird ways, further harming interoperability;
>>     we've seen it with VOTable, which is what led us to the
>>     recommendations in the XML versioning note.
>>
>>     If we keep individual DMs small and as independent as humanly
>>     possible, then even if one has to be changed incompatibly, most other
>>     functionality will just keep working and code won't have to be
>>     touched (phewy!).

This was our initial idea!!! With mainly SPLAT-VO in mind (yes, SPLAT-VO
now understands time series).

>>
>>     I'd argue that by pulling all the various aspects into one structure,
>>     we're following the God object anti-pattern
>>     (https://en.wikipedia.org/wiki/God_object).

Nice!!! The definition is exactly what most VO standards are about:

"that knows too much or does too much"

"its role in the program becomes God-like (all-knowing and 
all-encompassing) "

>>
>>     I have to admit that I find the current artefacts for current STC on
>>     volute somewhat hard to figure out. But from what I can see I'd be
>>     unsure how that binding would help me as a client; that may, of
>>     course, be because I've not quite understood the pattern.

As I understand it, the coordinate system, or better the space-time
coordinate system, is the most difficult and controversial part of every
VO DM.

My naive view is this:

STC is required to compare the position and time of occurrence of some
transient (e.g. a supernova) observed from a satellite with the same place
observed by a ground-based telescope (e.g. for VOEvent). Then it is
crucial to be able to convert all times and coordinate systems into one
unified system, as I will query different databases, each with its own
metadata for coordinate systems and units.

But in the case of publishing time series, the main goal is to study the
temporal behaviour of some variable in the same coordinate and time
system. In fact, the system is not important: it will only be mentioned
in the axis label (e.g. by name: HJD (see below) or satellite on-board
time) or in the legend (when comparing two stars, their names go in the
legend).

I suppose the full processing and transformation of the coordinate system
will be done during the data preparation phase, before publishing.
A number of important time series are light curves folded with a given
period; the period is then a label of the particular curve.

In all cases, what is presented is an already homogenized dataset, such as
would be printed in a publication.

The issue with HJD (for Arnold): as said, we are describing our
implementation for the DK154 survey, and here HJD is required by users,
as it is customary in the variable-star community. The processing
pipeline outputs it, so it is here.

>>     from.  What information, in addition to what you get from STC or
>>     comparable annotation, does your code require, and is there really no
>>     other way to communicate it without having to have a hard link
>>     between NDCube and STC (or any other "physical" DM, really)?

Exactly: STC is not the main visualizable content of a time series, but it
may be used when "clicking" on a particular point.

I hope I have explained the motivations of our effort and why the
current version is not suitable for expressing a whole ALMA observation
run ;-) as Francois is already thinking of...

But of course, any help is welcome !

*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute CAS         Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                             skoda at asu.cas.cz          *
*************************************************************************