[timeDomain: model for Time series] discussion on Timeseries Data model Note / ND-point .... or not ??? !!!

Sun Sep 3 14:48:07 CEST 2017

Hi Mark,

Answers/thoughts inline.

Cheers,

Jiri

From: CresitelloDittmar, Mark [mailto:mdittmar at cfa.harvard.edu] 
Sent: Thursday, August 31, 2017 10:22 PM
To: Jiří Nádvorník <nadvornik.ji at gmail.com>
Cc: François Bonnarel <francois.bonnarel at astro.unistra.fr>; Mireille Louys <mireille.louys at unistra.fr>; Data Models mailing list <dm at ivoa.net>; voevent at ivoa.net; dal at ivoa.net
Subject: Re: [timeDomain: model for Time series] discussion on Timeseries Data model Note / ND-point .... or not ??? !!!

All,

I am finally able to devote some time to the VO efforts, and have just been reading through these discussions.

Maybe someone could summarize for me what the current state is.  I would like to get back to iterating through example serializations as that seemed to make things more 'concrete' and provided important feedback about what is reasonable and usable.

Anyway.  My impression about the discussion...

1) NDPoint.

   One of the differences between Jiri's approach in the Note and the Cube model is that my model has this element which, as Francois noted, exists primarily to mark a set of Observables as related.  The NDPoint collects data axes (time, pos, energy, mag).  Each instance of NDPoint is a set of values which describe a single 'event'.  I believe the column-based approach loses this important relation.  Recall, the primary use case for this is an Event List.

[[Jiri Nadvornik]] I'd just like to point out that the difference here is that in the TSCube model we collect the data axes (time, pos, energy, mag) directly in the SparseCube class, because we don't find it useful to store axis metadata at the level of each Observable, but rather once for the whole Dataset.
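
To make the contrast concrete, here is a minimal sketch of the two layouts in Python. The class names RowCube and ColumnCube, the dict-based axes, and the sample values are assumptions made for illustration only; they are not taken from the Cube model or the TSCube Note. The sketch only shows that a column-wise layout can still recover one 'event' as long as the row index is treated as the link between axes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NDPoint:
    """One 'event': a related set of values, one per data axis."""
    values: Dict[str, float]


@dataclass
class RowCube:
    """Row-wise layout (hypothetical): the cube is a list of NDPoint instances."""
    points: List[NDPoint] = field(default_factory=list)


@dataclass
class ColumnCube:
    """Column-wise layout (hypothetical): one list per axis, stored on the dataset."""
    axes: Dict[str, List[float]] = field(default_factory=dict)

    def event(self, i: int) -> NDPoint:
        # The relation between values is implicit: same index in every column.
        return NDPoint({name: column[i] for name, column in self.axes.items()})


def to_columns(cube: RowCube) -> ColumnCube:
    """Regroup row-wise events into per-axis columns."""
    axes: Dict[str, List[float]] = {}
    for point in cube.points:
        for name, value in point.values.items():
            axes.setdefault(name, []).append(value)
    return ColumnCube(axes)


# Made-up sample data: two events with a time and a magnitude each.
rows = RowCube([
    NDPoint({"time": 1.0, "mag": 15.2}),
    NDPoint({"time": 2.0, "mag": 15.4}),
])
cols = to_columns(rows)
```

In this sketch cols.event(1) rebuilds the second NDPoint, so the grouping survives only by convention on the row index, which is the point under discussion.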

2) Errors

   The Observables in the cube model are STC CoordMeasure instances, which include the measured value, associated error, and frame reference.   The error modeling to date assumes a 1-1 association of the value to its error (e.g. Value +/- Error).  The modeling of errors in STC2 is very extensible and targets the most common error types.  A separate effort to model other sorts of errors (statistical distributions) by extending the base STC-2 error class would be fully compatible with this.

[[Jiri Nadvornik]] I agree that the Error class in STC2 is very extensible, but I would go one step further here: when we have a "statistical distribution" for the Error, what is the Value then? I'd suppose it is just some Expectation or Mean value, which is already part of that statistical distribution.

If the relation is a bit broader, and the 'measure' itself is a statistical distribution kind of thing, then what you are describing is not an Observable.  This sort of thing I expected might come up with simulated data (when we think about bringing SimDM and Cube more in line with each other; they are quite compatible).  If this is the case, we should define a different kind of DataAxis which better accommodates the statistical/mathematical nature of the measurement.

[[Jiri Nadvornik]] I don't agree completely with the first statement here. I think that even an Observable 'measure' can be represented by a statistical distribution. Typically, a point spread function will have a Gaussian distribution, where the Flux Value+Error would be just the Mean+Sigma of that statistical distribution.
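
As a minimal sketch of that correspondence, the Python below maps a Gaussian onto a Value +/- Error pair and back. The Measure class and both helper functions are hypothetical names, not STC2 classes, and the flux numbers are made up; NormalDist comes from the Python standard library (3.8+). It assumes a symmetric 1-sigma error, which is the only case where the round trip is lossless.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Measure:
    """Hypothetical stand-in for a value with a symmetric 1-sigma error."""
    value: float
    error: float


def measure_from_gaussian(dist: NormalDist) -> Measure:
    # Value is the mean of the distribution, Error is its standard deviation.
    return Measure(value=dist.mean, error=dist.stdev)


def gaussian_from_measure(m: Measure) -> NormalDist:
    # Inverse mapping: valid only under the Gaussian assumption.
    return NormalDist(mu=m.value, sigma=m.error)


# Made-up flux distribution, arbitrary units.
flux = NormalDist(mu=12.5, sigma=0.3)
m = measure_from_gaussian(flux)
```

For any other distribution the Value +/- Error pair would drop information, which is where a dedicated distribution-valued axis or error class would be needed.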

If the data can not be described as a series of 'events'

  "at time T1, we measured these items"

  "this source has these properties"

then the data is not a SparseCube.  I'm under the impression that most time series do fit this description, but if there are some which do not, we should evaluate an example to see if it needs a different sort of DataProduct.

[[Jiri Nadvornik]] I still think we can fit into this 'event' description with all our use cases. I think in every case the Time Axis points can actually be represented by a single Value, rather than needing some kind of distribution here.

Does anyone have a time series use case where there is uncertainty about exactly when the 'measurement' was taken?

Mark
