[timeDomain: model for Time series] discussion on Timeseries Data model Note / ND-point .... or not ??? !!!

Arnold Rots arots at cfa.harvard.edu
Wed Sep 6 20:41:50 CEST 2017

My responses also in line.


  - Arnold

Arnold H. Rots                          Chandra X-ray Science Center
Smithsonian Astrophysical Observatory   tel:  +1 617 496
60 Garden Street, MS 67                 fax:  +1 617 495 7356
Cambridge, MA 02138
arots at cfa.harvard.edu

On Sun, Sep 3, 2017 at 8:48 AM, Jiří Nádvorník <nadvornik.ji at gmail.com>
wrote:

> Hi Mark,
> Answers/thoughts inline.
> Cheers,
> Jiri
> *From:* CresitelloDittmar, Mark [mailto:mdittmar at cfa.harvard.edu]
> *Sent:* Thursday, August 31, 2017 10:22 PM
> *To:* Jiří Nádvorník <nadvornik.ji at gmail.com>
> *Cc:* François Bonnarel <francois.bonnarel at astro.unistra.fr>; Mireille
> Louys <mireille.louys at unistra.fr>; Data Models mailing list <dm at ivoa.net>;
> voevent at ivoa.net; dal at ivoa.net
> *Subject:* Re: [timeDomain: model for Time series] discussion on
> Timeseries Data model Note / ND-point .... or not ??? !!!
> All,
> I am finally able to devote some time to the VO efforts, and have just
> been reading through these discussions.
> Maybe someone could summarize for me what the current state is.  I would
> like to get back to iterating through example serializations as that seemed
> to make things more 'concrete' and provided important feedback about what
> is reasonable and usable.
> Anyway.  My impression about the discussion...
> 1) NDPoint.
>    One of the differences between Jiri's approach in the Note, and the
> Cube model is that my model has this element which, as Francois noted, is
> primarily to associate a set of Observables as being related.  The NDPoint
> collects data axes (time, pos, energy, mag ).  Each instance of NDPoint is
> a set of values which describe a single 'event'.  I believe the
> column-based approach loses this important relation.  Recall, the primary
> use case for this is an Event List.
> *[[Jiri Nadvornik]] I’d just like to point out that the difference here
> is that in the TSCube model we collect the data axes in the SparseCube
> class directly (time, pos, energy, mag) because we don’t find it useful to
> store axis metadata on the level of each Observable, but rather on the
> whole Dataset.*
I am not sure there is a difference here and it may actually be a
serialization matter. Presumably, the individual items in the different
arrays(?) in Jiri's case are associated with each other by index. In other
words, it is a kind of "table" model of the data - which is, when
serialized, indistinguishable from an STC serialization in a table,
where the coordinate values are put in table columns and the axis metadata
in the table header associated with the appropriate columns.
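
To make the index-association point concrete, here is a minimal Python
sketch. It is an illustration only: the NDPoint class, the AXES tuple, the
helper functions and the sample values are all hypothetical and are not taken
from the Cube model or the TSCube Note. It shows that per-axis columns
associated by index and a list of per-event points carry the same
information and convert into each other without loss.

```python
from dataclasses import dataclass

# Hypothetical axis names, following the (time, pos, energy, mag) example.
AXES = ("time", "pos", "energy", "mag")


@dataclass(frozen=True)
class NDPoint:
    """One 'event': values on several axes that belong together."""
    time: float
    pos: tuple
    energy: float
    mag: float


# Column-oriented ("table") layout: one array per axis, associated by index.
# The numbers are made up for the illustration.
columns = {
    "time": [10.0, 11.5, 13.0],
    "pos": [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9)],
    "energy": [0.5, 1.2, 0.8],
    "mag": [14.2, 14.3, 14.1],
}


def columns_to_points(cols):
    """Row i of every column forms one NDPoint."""
    return [NDPoint(*row) for row in zip(*(cols[a] for a in AXES))]


def points_to_columns(points):
    """Inverse operation: split the points back into per-axis columns."""
    return {a: [getattr(p, a) for p in points] for a in AXES}


points = columns_to_points(columns)
round_trip = points_to_columns(points)
```

In this sketch the round trip reproduces the original columns exactly, which
is the sense in which the two layouts differ only in serialization.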

> 2) Errors
>    The Observables in the cube model are STC CoordMeasure instances, which
> include the measured value, associated error, and frame reference.   The
> error modeling to date assumes a 1-1 association of the value to its
> error (e.g. Value +/- Error).  The modeling of errors in STC2 is very
> extensible and targets the most common error types.  It would be quite
> compatible to have a separate effort model other sorts of errors
> (statistical distributions) and extend the base STC-2 error class.
> *[[Jiri Nadvornik]] I agree that the Error class in STC2 is very
> extensible, but I would go even one step further here because when we have
> „statistical distribution“ for Error, what’s the Value then? I’d suppose it
> is just some Expectation or Mean value, which is already part of that
> statistical distribution.*

You can do that in any way you want and that makes sense, since the Error
class is an empty prototype. That means that you can derive an Error class
that defines a certain type of distribution with appropriate parameters, or
even an enumerated distribution function.
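
As a rough illustration of deriving from an empty prototype, the Python
sketch below uses stand-in classes. None of the class or method names
(Error, GaussianError, EnumeratedError, pdf, mean_offset) come from the STC-2
model; they are assumptions made up for this example.

```python
import math


class Error:
    """Stand-in for an empty Error prototype (hypothetical, not STC-2)."""


class GaussianError(Error):
    """Derived error type: a Gaussian distribution with one parameter."""

    def __init__(self, sigma):
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.sigma = sigma

    def pdf(self, offset):
        """Probability density at a given offset from the quoted value."""
        norm = self.sigma * math.sqrt(2.0 * math.pi)
        return math.exp(-0.5 * (offset / self.sigma) ** 2) / norm


class EnumeratedError(Error):
    """Derived error type: a distribution tabulated as (offset, weight)."""

    def __init__(self, offsets, weights):
        if len(offsets) != len(weights):
            raise ValueError("offsets and weights must have equal length")
        total = sum(weights)
        self.offsets = list(offsets)
        # Normalize the weights so that they sum to one.
        self.probs = [w / total for w in weights]

    def mean_offset(self):
        """Expectation value of the offset under the tabulated distribution."""
        return sum(o * p for o, p in zip(self.offsets, self.probs))


gauss = GaussianError(sigma=2.0)
table = EnumeratedError(offsets=[-1.0, 0.0, 1.0], weights=[1.0, 2.0, 1.0])
```

Whether the quoted Value is then the mean of such a distribution is a choice
the derived class can document; the prototype itself does not force it.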

> If the relation is a bit more broad, and the 'measure' itself is a
> statistical distribution kind of thing, then what you are describing is not
> an Observable.  This sort of thing I expected might come up with simulated
> data (when we think about bringing SimDM and Cube more in line with each
> other (they are quite compatible)).  If this is the case, we should
> define a different kind of DataAxis which better facilitates the
> statistical/mathematical nature of the measurement.
> *[[Jiri Nadvornik]] I don’t agree completely on the first statement here –
> I think that even an Observable ‚measure‘ can be represented with a
> statistical distribution. Typically, a point spread function will have a
> gaussian distribution where the Flux Value+Error would be just Mean+Sigma
> of that statistical distribution.*

But here we are talking about Resolution, not Error/Uncertainty. Anyway,
the same argument applies.

> If the data cannot be described as a series of 'events'
>   "at time T1, we measured these items"
>   "this source, has these properties"
> then the data is not a SparseCube.  I'm under the impression that most
> time series do fit this description, but if there are some which do not, we
> should evaluate an example to see if it needs a different sort of
> DataProduct.
> *[[Jiri Nadvornik]] I still think we can fit into this ‚event‘ description
> with all our use cases. I think in every case the Time Axis points can be
> actually represented by a single Value, rather than needing some kind of
> distribution here.*
No, any time stamp is characterized by the digital accuracy of the clock
used (and usually represents a truncated value within a bin corresponding
to the lowest clock bit), by the errors inherent in the clock (how stable,
how correct is the long-term clock rate, including relativistic effects if
necessary), and clock readout errors (was there a lag). So, at least
conceptually, one cannot make the argument that just a time stamp provides
sufficient information on absolute time.

> *Does anyone have a time series use case where we are uncertain about
> exactly when the ‚measurement‘ was taken?*
HEA event files contain two error measures for time (systematic, or long
term, and short term), as well as the size of the time stamp bins (the
accuracy of the clock).
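
A time stamp that carries these three quantities could be sketched in Python
as follows. This is an illustration under stated assumptions: the TimeStamp
class and its field names are hypothetical and do not correspond to any
actual event file keywords. The quadrature combination, which treats the bin
as a uniform distribution and the errors as independent, is likewise an
assumption of this sketch and not something the files prescribe.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeStamp:
    """Hypothetical time stamp with its bin size and two error measures."""
    value: float             # recorded time, truncated to the start of its bin
    bin_size: float          # size of the time stamp bin (clock accuracy)
    systematic_error: float  # long-term error
    short_term_error: float  # short-term error

    def interval(self):
        """Bin within which the clock reading fell, assuming truncation."""
        return (self.value, self.value + self.bin_size)

    def combined_sigma(self):
        """Single-number uncertainty (assumption: independent errors,
        uniform distribution within the bin, added in quadrature)."""
        bin_variance = self.bin_size ** 2 / 12.0
        return math.sqrt(
            self.systematic_error ** 2
            + self.short_term_error ** 2
            + bin_variance
        )


# Made-up numbers, in arbitrary but consistent time units.
stamp = TimeStamp(value=100.0, bin_size=6.0,
                  systematic_error=1.0, short_term_error=0.0)
low, high = stamp.interval()
sigma = stamp.combined_sigma()
```

The point of the sketch is only that the bare value is one field among
several; how the pieces are combined depends on the use case.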

> Mark