Time Series Cube DM - IVOA Note

Jiří Nádvorník nadvornik.ji at gmail.com
Mon Mar 13 11:14:13 CET 2017


Hi all,

I am sorry about the long reaction time on my part; I had other
priorities on the table, but now I will focus on the Time Series Cube DM
discussion as much as I am able.

I will now try to respond to the emails that I collected as comments on
the Time Series Cube DM, and reference the original messages where I can.

@Arnold Rots

> I undoubtedly will have other comments, but even without a magnifying
> glass I was able to spot the string "HJD".
> That's a no-no; not defined, ambiguous, and absolutely undesirable.


Which time domain axis model we use here is not of that much importance,
though it still needs to be discussed. The important thing is that we want
to describe the time-related metadata the same way as other axis metadata,
with the AxisModel and a reference to the data model that describes the
physical interpretation of the Axis Domain. Whether that model uses HJD or
MJD or something completely different is not to be defined within the Time
Series Cube DM; it will just be referenced from it.

What can be defined within the Time Series Cube DM is that an Axis Domain
Model, e.g., STC, needs to be referenced, as we are talking about a *time
series* data cube, not a generic data cube where there is no such
restriction. These relationships are still open to discussion.

@Mark Cresitello-Dittmar

>  + Figure 2 and 3
>       These nearly match cube model Section 4.2..
>       Time Series cube 'imports' DatasetDM ~= basically identifies it as a
> SparseCubeDataset
>       referencing one (only 1?) TimeSeriesCube which is an extension of
> the SparseCube data product.
>

We are indeed using the Sparse Cube Dataset, as a Time Series Cube is a
subtype of Sparse Cube. The Dataset, though, is a collection of such
entities, which means we can store 0..n Time Series Cube entities in the
Sparse Cube Dataset we are using.
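
To illustrate that multiplicity, here is a toy Python sketch (the class
and attribute names are just my assumptions for illustration, not the
formal definitions from the models):

    class SparseCubeDataset:
        """Toy stand-in for the Dataset: it holds 0..n cubes."""
        def __init__(self, cubes=()):
            self.cubes = list(cubes)  # zero or more TimeSeriesCube instances

    ds = SparseCubeDataset()               # an empty Dataset (0 cubes) is valid
    ds.cubes.extend(["cube_a", "cube_b"])  # ...as is any number of cubes (n)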


>    + Section 3.1
>       "because we use them as black boxes, it does not matter if these
> data models change.."
>       I'm not sure this is true.. or if it is, it is due to glossing over
> some details.
>         o it looks like you're saying that a Time Series instance has
> ObsDataset metadata as
>            described in 'the Dataset model'.. presumably any version
> thereof
>         o but you show the model import.. the vo-dml model import includes
> the URL to a specific version
>         o and Figure 4 shows the cube relation of SparseCube with the
> SparseCubeDataset which
>            extends the Dataset:ObsDataset object..
>       It seems important for interoperability to have the explicit
> relations so that V2.0 of models
>       can go through a vetting process before being acceptable/useable in
> subsequent models
>       which use the V1.0.
>

Right, it seems this was not spelled out entirely correctly. The idea is
that if the Dataset or VO-DML model changes, we don't mind, because we are
not extending them; we are not building anything on top of them that would
break when these models change. Still, we are dependent on them, as the
Time Series Cube DM won't work without them: it is a dependency on their
existence, not on their form.

Model import details:

   - Dataset DM - If it changes, we don't mind. We need to store a
   collection of our cubes, but we leave the specification of how to do so
   entirely to the Dataset DM.
   - VO-DML - We need it for annotating parts of the serialization against
   the entities mentioned below. Either way, we depend on the existence of
   such a mapping, not on its syntax, as we are only adopting it and not
   trying to build something on top of it.
      - Parts of the model (entities defined by the model)
      - The Data Model itself

I feel this is a newer concept for the IVOA community, and we need to make
sure we all understand it in the same way. If I am dependent on an external
model, that means the serialization of my data will change if that model
changes. That does not mean, however, that I need to "embed" it into my
data model; my data model does not change if the one I depend on changes.
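
To make the "dependency on existence, not on form" point concrete, here is
a minimal Python sketch; all class names and URI strings are hypothetical,
purely for illustration:

    class Axis:
        def __init__(self, name, domain_model_ref):
            self.name = name
            # Opaque reference (e.g. a model URI). The cube model never
            # looks inside the referenced domain model, so internal
            # changes to that model cannot break the cube model itself;
            # only the annotated serialization follows the new version.
            self.domain_model_ref = domain_model_ref

    class TimeSeriesCube:
        def __init__(self, axes):
            self.axes = axes

    cube = TimeSeriesCube(axes=[
        Axis("time", "ivo://example/STC#TimeCoord"),     # hypothetical URI
        Axis("flux", "ivo://example/PhotDM#FluxCoord"),  # hypothetical URI
    ])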


>    + Section 3.1.3
>       "We are not importing the whole VO-DML"
>       I'm not sure what this means.. are you saying that this is not
> attempting to be fully vo-dml compliant?
>

This means we *use* the parts that are important for us, without trying to
build on top of them. Effectively we are importing only parts of those
models where we don't mind the syntax, only the semantics of what we are
importing. We can discuss this with examples if needed.

>    + Section 3.2.1: Cube DM
>       "however, it is not describing Image Cube DM (as could be
> erroneously understood from the title)"
>       What?  pixelated image data is covered by NDImage (section 6 of the
> cube model)
>

That may be caused by my confusion. Is the data model defined in Figure 6
of the IVOA N-Dimensional Cube Model
<https://volute.g-vo.org/svn/trunk/projects/dm/CubeDM-1.0/doc/WD-CubeDM-1.0-20170203.pdf>
the same one as described in the IVOA Image Data Model document
<http://wiki.ivoa.net/internal/IVOA/ImageDM/WD-ImageDM-20130812.pdf>?


>    + Section 3.2.3: Image DM
>       see above..  the cube model covers the full scope of Doug's Image
> Cube model (2013)
>

Same question - is Figure 6 in the N-Dimensional Cube Model describing the
same data model as Figure 2 in the IVOA Image Data Model?

>    + Section 4/Figure 4
>       your description of the TimeSeriesCube aligns pretty well with the
> SparseCube as it is..
>       I'm not sure it is necessary to 'override' the content (btw you
> could just extend "PointDataProduct")
>       o the cube model SparseCube has a collection of NDPoints (ie rows),
> which contains a collection
>          of DataAxis, each referring to an instance of a measurement type
> coordinate (value + errors)
>          * your representation and mine simply reverse the row/column
> order.
>          * in cube, the Observable is any instance of Coordinate which
> follows the pattern described
>            in STC2-coords model.  The modeling of that instance/domain
> does NOT have to be in
>            any particular model.. so the Axis Domain DMs scenario you show
> works fine.
>          * But.. for interoperability sake, we do require them to use the
> same pattern (by linking
>            Observable to the abstract coords:DerivedCoordinate which is
> bound to a pattern)
>

Yes, it is pretty similar - we used it as an inspiration and originally
believed that we would just build upon it without changing it. The problems
and difficulties are listed below.

   1. Axis Domain models - the physical meaning of the data in an axis of
   the data cube should not be part of the cube model. The Frame mappings
   and CoordSys entities taken from STC are also such domain models IMHO.
   2. Row/column inversion - this is a bigger difference than it seems at
   first glance. Logically we are storing the same information, but I don't
   want to explicitly store a reference to an axis in every single point of
   the data cube. By this, the Sparse Cube Model is IMHO saying that the
   data can be unordered, and so we need to store the reference for every
   point? Doing it the other way around, the axis can just specify where I
   can find the axis coordinates in the data element of the cube, no matter
   how we serialize it (see the sketch after this list).
   3. What is the Observable entity's purpose in the diagram? Please
   explain.
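
To show the difference in point 2, here is a rough Python sketch with
made-up values; both layouts carry the same information, but only the
first stores an axis reference in every point:

    # (a) Point-major, as I read the SparseCube model: every point
    #     carries its own axis/coordinate references.
    points = [
        {"time": 2457000.5, "flux": 13.2},
        {"time": 2457001.5, "flux": 13.4},
    ]

    # (b) Axis-major, as in the Time Series Cube DM: axis metadata is
    #     stated once and says where its coordinates live in the data.
    axes = {
        "time": {"column": 0, "domain_model_ref": "..."},  # ref elided
        "flux": {"column": 1, "domain_model_ref": "..."},
    }
    data = [
        (2457000.5, 13.2),
        (2457001.5, 13.4),
    ]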

>     + Section 4.2.2
>        cube model does not have dependency on specific axis domains any
> more (since ~Nov)


What about those FrameMappings and CoordSys entities? These store the
physical meaning of the data, not just a description of the structure of
the data (the data cube) itself.

>
>     + Section 4.2.3:
>        "it can go to the axis domain model through the model parameter of
> that axis"
>        ??  is this describing how one would follow the vo-dml annotation
> to find the axis/frame metadata?


Yes, exactly. For example, in a VOTable serialization this would be
"implemented" as a GROUPRef, where the referenced GROUP is annotated
against the data model that is used for that GROUP. This is how we
implement the loose coupling principle for our IVOA DMs, IMHO.
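
As a sketch of how a client could follow such a reference, consider the
fragment below; it is a mock-up, not valid VOTable (VOTable itself defines
FIELDref/PARAMref, and the GROUPref element and PARAM names here are
hypothetical, only to show the indirection):

    import xml.etree.ElementTree as ET

    votable = """
    <VOTABLE>
      <GROUP name="timeAxis">
        <GROUPref ref="timeDomain"/>
      </GROUP>
      <GROUP ID="timeDomain">
        <PARAM name="model" datatype="char" arraysize="*"
               value="ivo://example/STC#TimeCoord"/>
      </GROUP>
    </VOTABLE>
    """

    root = ET.fromstring(votable)
    # Follow the axis's reference to the GROUP holding its annotation...
    ref = root.find("./GROUP[@name='timeAxis']/GROUPref").get("ref")
    # ...then read off which data model interprets that axis.
    domain = root.find("./GROUP[@ID='%s']" % ref)
    print(domain.find("PARAM[@name='model']").get("value"))
    # -> ivo://example/STC#TimeCoord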

@Petr
For Petr's email, I will only highlight and complete some comments:

> The main goal was to allow to associate multiple links and multiple
> metadata with every point. The LSST plans to add to every point the whole
> probablity distribution function or complex statistical description.


We are preparing here for new Axis Domain Models yet to come. The
Provenance DM is almost ready, it seems; the probability distribution or
statistical models are not there yet, but the data cube can store the data
nonetheless - just the physical interpretation (metadata) for these is not
present yet.

On Thu, 2 Mar 2017, François Bonnarel wrote:
>
>>
>> Mireille Louys, Laurent Michel and I discussed the TimeSeries Cube data
>> model here in Strasbourg.
>> Before going to serialization we try to go back to the basic concepts
>> needed to represent TimeSeries and try to match them to Cube Data model as
>> Jiri did (although we apparently differ eventually)
>> In our approach, we focus on the time axis considering it as generally
>> irregularly sampled, in other words "sparsed".
>> For each time sample we have a (set of) measurements, which may be one
>> single flux (in the case of light curves) or whatever scalar value, but can
>> also be an observations dataset spanned on other data axes (spectrum,
>> image, radio cube, velocity map....) Actually for each time sample we have
>> an ND cube (of whatever dimension excluding time). And if a single data
>> point , or single value (flux) can be seen as a degenerate case of an ND
>> cube then everything is a set of NDCubes for different time samples !!!
>
>
>     This concept allows us to describe light curves and time-sequences
> of spectra, of 2D-images, of (hyper)cubes.
> I am afraid that describing e.g. radio maps at multiple frequencies
> repeated multiple times (in irregular intervals) is physically feasible but
> this would bring our model to the position of the ALL-INCLUDING
> all-VO-describing model of the Universe (and life etc ;-)


This is not entirely impossible, though: a referenced Axis Domain Model
can actually be a model of another Time Series Cube. We need to think about
the implications, though, and whether this is actually desirable behaviour
for the data model.
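
Reusing the toy Axis/TimeSeriesCube classes from the sketch above, such a
nesting could look like this (again purely hypothetical):

    # An axis whose "domain model" is itself another Time Series Cube,
    # e.g. a repeated radio map: the outer cube's time axis indexes
    # inner cubes.
    inner = TimeSeriesCube(axes=[
        Axis("frequency", "ivo://example/Spectral#Freq"),  # hypothetical
        Axis("flux", "ivo://example/PhotDM#FluxCoord"),
    ])
    outer = TimeSeriesCube(axes=[
        Axis("time", "ivo://example/STC#TimeCoord"),
        Axis("map", inner),  # the reference is another cube, not a URI
    ])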

>>   2) Interoperability
>>
>>     Interoperability is actually what this is about.  If we build
>>     Megamodels doing everything, we either can't evolve the model or will
>>     break all kinds of clients needlessly all the time -- typically,
>>     whatever annotation they expect *would* be there, but because their
>>     position in the embedding DM changed, they can't find it any more.
>>
>
>>     Client authors will, by the way, quickly figure this out and start
>>     hacking around it in weird ways, further harming interoperability;
>>     we've seen it with VOTable, which is what led us to the
>>     recommendations in the XML versioning note.
>>
>>     Keeping individual DMs small and as independent as humanly possible,
>>     even if one has to be incompatibly changed, most other functionality
>>     will just keep working and code won't have to be touched (phewy!)
>
>

> This was our initial idea !!! With mainly SPLAT-VO in mind (yes, SPLAT-VO
> now understands time series)


>>     I'd argue by pulling all the various aspects into one structure,
>>     we're following the God object anti-pattern
>>     (https://en.wikipedia.org/wiki/God_object)
>>
>
> Nice !!! the definition is exactly what most VO standards are about


This is it! This is the main motivation and idea of the Time Series Cube
DM. We are describing a data cube in the first place (that is, a data
structure), with the physical interpretation (information about the data)
being held within other data models.

@François Bonnarel

>
> We are facing what the VizieR group had to develop a couple of years ago
> to provide SED built from VizieR content (see joint VOTABLE).
> We can indeed imagine that as an NDpoint with 2 or more axes (time+flux or
> any other observable+ ...) is a good representation of each table row.
> appropriate utypes for ND cube data model , photometry data model, etc...
> should be sufficient to define the role of each FIELD in the table.


My understanding here is that you are talking about the utype mapping for
the physical interpretation of the data, meaning the Axis Domain Model in
the terminology of the Time Series Cube DM. Our model is really not trying
to re-implement any photometric or other physical domain models. The Time
Series Cube DM describes what data is held in the data cube, not the
information about how to interpret that data.

This is what we are trying to achieve with this loose coupling - a dataset
(which has no real physical meaning) stores a set of data cubes (which have
no physical meaning either). The physical meaning of what is actually in
the dataset and its elements is defined within the Axis Domain Models used
by these data cubes, and their specification is not, and should not be,
part of the Time Series (or generic) Cube DM.

_________________________________________________________________

I hope this sheds some more light on the data model itself, and I am
looking forward to discussing these points at the ASTERICS conference in
Strasbourg next week.

Cheers,

Jiri



2017-03-06 18:03 GMT+01:00 François Bonnarel <
francois.bonnarel at astro.unistra.fr>:

>
> Dear Petr, dear all,
>
>      If what we want is basically a scalar observable dependent on Time
> and maybe some other physical axis, the table serialization is indeed what
> we need.
>
> We are facing what the VizieR group had to develop a couple of years ago
> to provide SED built from VizieR content (see joint VOTABLE).
> We can indeed imagine that as an NDpoint with 2 or more axes (time+flux or
> any other observable+ ...) is a good representation of each table row.
> appropriate utypes for ND cube data model , photometry data model, etc...
> should be sufficient to define the role of each FIELD in the table.
> Each NDpoint could also have additional "attributes" (VOTABLE FIELDS) to
> link to progenitor datasets or metadata. We just have to agree on the right
> attribute names for these linking features. I don't agree with the idea to
> see those as some special kind of "axes".
>
>       To be clear : I like the prototype. But I think it should be done
> with a little bit less complexity (and I'm quite confident it could).
> Cheers
> François
>
> On 03/03/2017 at 02:07, Petr Skoda wrote:
>
>>
>>
>> Hi all,
>>
>> Jiri is leaving for a holiday and he could not watch the discussion as
>> he was quite busy, and I have just arrived after a month of travelling...
>>
>> I am not sure if Jiri will come to Shanghai; I will try to.
>>
>> I would like just to explain some issue without going to details ...
>>
>> What is described is implemented in DaCHS and we have adapted SPLAT-VO
>> to work with it - so it shows light curves from our OSPS survey from the
>> 1.5 m Danish telescope in Chile. (I have shown it several times already
>> at ADASS, Interops and ASTERICS.., but with forced SSAP.) Now it works
>> the same way on the client side using the new data model and an obscore
>> query (also a new window in SPLAT-VO).... There are a lot of issues to
>> solve, but basically it works.
>>
>> The advantage of representing everything as a table is the possibility
>> to send a light curve to TOPCAT and work with individual points (every
>> corresponding "column" may be activated - so we get e.g. the original
>> image or its cutout from which the particular stellar aperture was
>> integrated to give the point on the light curve). If you send this to
>> Aladin, it starts to download the image.... thanks to Pierre's
>> modification from the end of 2015.
>>
>> The main goal was to allow to associate multiple links and multiple
>> metadata with every point. The LSST plans to add to every point the whole
>> probability distribution function or complex statistical description.
>> This is possible in our model.
>>
>> One important issue is the definition of time series.
>>
>> I had explicitly stated that a time series is everything which has at
>> least one time-dependent axis - in other words, this axis is a FUNCTION
>> of time f(t)....
>> The main idea is to have the possibility to mark as dataproduct type
>> TIMESERIES the Fourier spectrum, power spectrum, periodogram etc...
>> and to link them to the time series. But the function also means that
>> time may be implied or even eliminated! A very important case is a time
>> axis replaced by the circular phase (folded with a given period).
>> Or you may have (for machine learning) on the x-axis the histogram of
>> various time differences between individual points.
>>
>> I would say that 90% of future usage for light curves will be connected
>> with period analysis or some advanced statistical analysis (e.g. wavelet
>> transform, or even machine learning products such as Gaussian mixture
>> models) - or associated multi-D errors.
>>
>> We have followed all available science use cases as collected by the
>> CSP (namely Enrique, as cited) and tried to find some new ones not yet
>> mentioned.
>>
>> But our imagination was limited by the primary goal of describing some
>> kind of linear structure (in machine learning terms, a 1D feature
>> vector) marking a single point with a value dependent on a (function of)
>> time, and with every point associated metadata or products of further
>> processing or analysis, or a link to previous states of pre-processing
>> up to the original data. In principle the whole provenance of a single
>> point may be associated here.
>>
>> But this was the boundary of our mental concept.
>>
>> The idea was to give the community a simple idea of how to express the
>> wealth of transients, light curves and period analysis results, and to
>> catalogue them.
>>
>> Our intention was not to describe the multi-D+1 datacube as a time axis
>> linked to multi-D datacubes. This would bring all the problems we had
>> seen with SIAP2 etc...
>>
>> We also explicitly state that the physical domain of every axis is not
>> the subject of the proposal, and that the particular semantics joined
>> with a given domain is a task for other models.
>>
>> We do not solve this and we do not care.... The client will interpret
>> just what it understands - extending the knowledge about particular
>> contents may be done just by adding some module implementing another
>> model.
>>
>> Example (somewhat artificial, however...):
>>
>> The photometric filter will be described, in the majority of input time
>> series, by name - and it is a task for a filter profile service to find
>> the particular transmission curve using metadata referring to the
>> photometric system (or instrument).
>>
>> IMHO all users will appreciate it if the client labels multiple light
>> curves by the filter names and not by complex vectors.....
>>
>> If some advanced client knows the protocol, it may open the picture of
>> transmissivity, but IMHO it will be better to use SAMP and send the
>> light curve to another client, which will extract the links to filters
>> and display them.
>>
>>
>> On Thu, 2 Mar 2017, François Bonnarel wrote:
>>
>> Dear all,
>>>
>>>
>>> Mireille Louys, Laurent Michel and I discussed the TimeSeries Cube
>>> data model here in Strasbourg.
>>>
>>> Before going to serialization we try to go back to the basic concepts
>>> needed to represent TimeSeries and try to match them to Cube Data model as
>>> Jiri did (although we apparently differ eventually)
>>>
>>>
>>> In our approach, we focus on the time axis considering it as generally
>>> irregularly sampled, in other words "sparsed".
>>>
>>>
>>> For each time sample we have a (set of) measurements, which may be one
>>> single flux (in the case of light curves) or whatever scalar value, but can
>>> also be an observations dataset spanned on other data axes (spectrum,
>>> image, radio cube, velocity map....) Actually for each time sample we have
>>> an ND cube (of whatever dimension excluding time). And if a single data
>>> point , or single value (flux) can be seen as a degenerate case of an ND
>>> cube then everything is a set of NDCubes for different time samples !!!
>>>
>>>
>>>     This concept allows us to describe light curves and time-sequences
>>> of spectra, of 2D-images, of (hyper)cubes.
>>>
>>
>> I am afraid that describing e.g. radio maps at multiple frequencies
>> repeated multiple times (in irregular intervals) is physically feasible but
>> this would bring our model to the position of the ALL-INCLUDING
>> all-VO-describing model of the Universe (and life etc ;-)
>>
>> Which is beyond my imagination (and implementability).
>>
>> I did not want at the beginning to immerse this model into the data
>> cube, but it was tempting (and Jiri convinced me that it can work after
>> he modified DaCHS, in collaboration with Markus, who is also guilty, as
>> he was the first to mention the Data Cube model at our hackathon in
>> Garching during the SCIOPS 2015 workshop).
>>
>>
>>
>>
>>>
>>> By doing this we are not fully consistent with the ND cube data model:
>>> we have something like a mixture between SparseCube and NDImage: the
>>> Time axis is sparse and each sample on the Time Axis indexes an ND
>>> Cube. Could it be a third specialisation of a generic NDCube?
>>>
>>
>>
>>
>>>>     >   2) Interoperability
>>>>
>>>>     Interoperability is actually what this is about.  If we build
>>>>     Megamodels doing everything, we either can't evolve the model or
>>>> will
>>>>     break all kinds of clients needlessly all the time -- typically,
>>>>     whatever annotation they expect *would* be there, but because their
>>>>     position in the embedding DM changed, they can't find it any more.
>>>>
>>>
>>>>     Client authors will, by the way, quickly figure this out and start
>>>>     hacking around it in weird ways, further harming interoperability;
>>>>     we've seen it with VOTable, which is what led us to the
>>>>     recommendations in the XML versioning note.
>>>>
>>>>     Keeping individual DMs small and as independent as humanly possible,
>>>>     even if one has to be incompatibly changed, most other functionality
>>>>     will just keep working and code won't have to be touched (phewy!).
>>>>
>>>
>>
>> This was our initial idea !!! With mainly SPLAT-VO in mind (yes SPLAT-VO
>> now understands time series)
>>
>>
>>>>     I'd argue by pulling all the various aspects into one structure,
>>>>     we're following the God object anti-pattern
>>>>     (https://en.wikipedia.org/wiki/God_object
>>>>
>>>
>> Nice !!! the definition is exactly what most VO standards are about
>>
>> "that knows too much or does too much"
>>
>> "its role in the program becomes God-like (all-knowing and
>> all-encompassing) "
>>
>>
>>
>>
>>
>>>>     I have to admit that I find the current artefacts for current STC on
>>>>     volute somewhat hard to figure out. But from what I can see I'd be
>>>>     unsure how that binding would help me as a client; that may, of
>>>>     course, be because I've not quite understood the pattern.
>>>>
>>>
>> As I understand it, the coordinate system, or better the space-time
>> coordinate system, is the most difficult and controversial part of every
>> VO DM.
>>
>> My naive view is that :
>>
>> The STC is required to be able to compare the position and time of
>> occurrence of some transient (e.g. a supernova) observed from a
>> satellite with the same place observed by a ground-based telescope (e.g.
>> for VOEvent). Then it is crucial to be able to convert all times and
>> coordinate systems into one unified system, as I will query different
>> databases, each with its own metadata for coordsys and units.
>>
>> But in the case of publishing time series, the main goal is to study
>> the temporal behaviour of some variable in the same coordinate and time
>> system.. In fact the system is not important - it will only be mentioned
>> in the axis label (e.g. by name - HJD (see below....) or satellite board
>> time....) or in the legend (when comparing two stars - names in the
>> legend...)
>>
>> I suppose the full processing and transformation of the coordinate
>> system will be done during the data preparation phase, before
>> publishing....
>> A number of important time series are light curves folded with a given
>> period. This is a label of the particular curve....
>>
>> In all cases, what is presented is an already homogenized dataset which
>> would be printed in a publication.
>>
>> The issue with HJD (for Arnold..): as said, we are describing our
>> implementation for the DK154 survey. And here HJD is required by the
>> users, as it is a habit in the community of variable stars. The
>> processing pipeline outputs it, so it is here.
>>
>>
>>
>>>>     from.  What information, in addition to what you get from STC or
>>>>     comparable annotation, does your code require, and is there really
>>>> no
>>>>     other way to communicate it without having to have a hard link
>>>>     between NDCube and STC (or any other "physical" DM, really)?
>>>>
>>>
>> Exactly - the STC is not the main visualized quantity of the time
>> series. But it may be used when "clicking" on a particular point.
>>
>>
>> I hope I have revealed the motivations of our effort and explained why
>> the current version is not suitable for expressing the whole ALMA
>> observation run ;-) as François is already thinking of....
>>
>>
>> But of course, any help is welcome !
>>
>> *************************************************************************
>> *  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
>> *  Stellar Department +420-323-620361           *
>> *  Astronomical Institute CAS         Fax   : +420-323-620250           *
>> *  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
>> *  Czech Republic skoda at asu.cas.cz          *
>> *************************************************************************
>>
>
>

