Time Series Cube DM - IVOA Note

Wed Apr 26 13:13:14 CEST 2017

Dear DM,

For context: We're talking about annotation such as on
http://dc.zah.uni-heidelberg.de/getproduct/k2c9vst/data/OGLE-2016-BLG-0937_VST_r_SDSS78.t

On Thu, Apr 20, 2017 at 10:57:30AM -0400, CresitelloDittmar, Mark wrote:
> On Tue, Apr 18, 2017 at 4:35 AM, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> > (2) I never liked the separation of frames from the remaining
> > annotation in STC; I believe the original idea has been that you
> > could somehow re-use tuples of frames on the various axes, and I
> > doubt any savings (or functionalities?) possible with this are
> > proportional to the complication of having to go through references.
> > Hence, in my draft, the coordinates simply have a frame attribute,
> > done.  I'd even propose that just based on the type of the thing you
> > find in frame, clients should figure out the nature of the axis, and
> > you could get rid of everything in coordsys:domain except the frames
> > themselves.
> >
> >
> couple comments:
>   a) I don't see the 'frame' connection for your Measurements (mag, df)
>       ie: the 'value' should be a Coord type.. with relation to 'frame'

Well, here my hope still is that we can keep annotations from
different data models independent; I'm not saying it's terribly
important here, because both STC and whatever defines Measurement
will be referenced all the time, but as an example of the general
pattern that's good enough.

Essentially, there is:

      <INSTANCE ID="mag-inst" dmtype="ivoa:Measurement">
        <ATTRIBUTE dmrole="statError">
          <COLUMN ref="e_mag"/>
        </ATTRIBUTE>
        <ATTRIBUTE dmrole="value">
          <COLUMN ref="mag"/>
        </ATTRIBUTE>
      </INSTANCE>

This (and perhaps a reference to a similar group for a derivative) is
for general, "Measurement" clients that want, for instance, to plot
error bars by default.  It's a complete annotation for any
measurement, not just for STC coordinates.
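
As a rough illustration of what such a generic "Measurement" client
might do, here is a minimal Python sketch that maps each value column
to its error column.  The enclosing VODML element and the function
name measurement_columns are assumptions made for this example; only
the INSTANCE/ATTRIBUTE/COLUMN structure is taken from the snippet
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; the INSTANCE mirrors the mag-inst example.
ANNOTATION = """
<VODML>
  <INSTANCE ID="mag-inst" dmtype="ivoa:Measurement">
    <ATTRIBUTE dmrole="statError"><COLUMN ref="e_mag"/></ATTRIBUTE>
    <ATTRIBUTE dmrole="value"><COLUMN ref="mag"/></ATTRIBUTE>
  </INSTANCE>
</VODML>
"""


def measurement_columns(root):
    """Map each value column to its statError column (None if absent)."""
    pairs = {}
    for inst in root.iter("INSTANCE"):
        if inst.get("dmtype") != "ivoa:Measurement":
            continue
        # Collect the column referenced by each role of this instance.
        roles = {}
        for attr in inst.findall("ATTRIBUTE"):
            col = attr.find("COLUMN")
            if col is not None:
                roles[attr.get("dmrole")] = col.get("ref")
        if "value" in roles:
            pairs[roles["value"]] = roles.get("statError")
    return pairs


error_columns = measurement_columns(ET.fromstring(ANNOTATION))
```

A plotting client could then draw error bars for every key of
error_columns whose value is not None, without knowing anything about
photometry or STC.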

If I had a photometry DM description for this, there'd be a
*different* instance (I'm making things up in this particular case
because the photometry in my time series is extremely tricky):

  <INSTANCE ID="phot-inst" dmtype="phot:PhotometryPoint">
    <ATTRIBUTE dmrole="zeroPoint">
      <INSTANCE dmtype="phot:LinearFlux">
        <ATTRIBUTE dmrole="flux">
          <LITERAL>23.2</LITERAL>
        </ATTRIBUTE>
        <ATTRIBUTE dmrole="referenceMagnitude">
          <LITERAL>-0.5</LITERAL>
        </ATTRIBUTE>
      </INSTANCE>
    </ATTRIBUTE>
    <ATTRIBUTE dmrole="value">
      <COLUMN ref="mag"/>
    </ATTRIBUTE>
  </INSTANCE>

-- even a client understanding photometry would take the "normal"
measurement properties from  #mag-inst, and it could take
off-the-shelf code for that.

The extra photometry metadata is then added using #phot-inst.
Photometry doesn't need to care about error models or similar (unless
it absolutely had to because perhaps photometers have their own,
hyper-proprietary conventions too wonky to mention outside of the
field -- which I don't want to assert, of course).  It just specifies
what's special about photometry.

And it's similar for STC.  For instance, the hjd column in the
example is referenced as a part of the STC structure, which is where
its frame is defined (I hope with this Arnold will forgive the use of
the three-letter sequence hjd):

      <INSTANCE ID="ndgtaonoglha" dmtype="stc2:Coords">
        [...]
        <ATTRIBUTE dmrole="time">
          <INSTANCE ID="ndgtniuabtea" dmtype="stc2:Coord">
            <ATTRIBUTE dmrole="loc">
              <COLUMN ref="hjd"/>
            </ATTRIBUTE>
            <ATTRIBUTE dmrole="frame">
              <INSTANCE ID="ndgtniuabtha" dmtype="stc2:TimeFrame">
                <ATTRIBUTE dmrole="kind">
                  <LITERAL dmtype="ivoa:string">JD</LITERAL>
                </ATTRIBUTE>
                <ATTRIBUTE dmrole="refPosition">
                  <LITERAL dmtype="ivoa:string">BARYCENTER</LITERAL>
                </ATTRIBUTE>
                <ATTRIBUTE dmrole="timescale">
                  <LITERAL dmtype="ivoa:string">UTC</LITERAL>
                </ATTRIBUTE>
              </INSTANCE>
            </ATTRIBUTE>
          </INSTANCE>
        </ATTRIBUTE>

*as well as* with its role within ndcube:Cube:

      <INSTANCE ID="ndgtniuabnha" dmtype="ndcube:Cube">
        [...]
        <ATTRIBUTE dmrole="independent_axes">
          <COLUMN ref="hjd"/>
        </ATTRIBUTE>
      </INSTANCE>

Incidentally, even for "shallow parsing" clients that's not a major
issue; they could just look for //COLUMN[@ref="my_column"] elements
and could immediately gather all annotations for their column,
regardless of DM and DM major version, which I think is a very sane
pattern.  Once a client learns a new DM, it will automatically be
picked up in such an architecture.
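
The shallow-parsing pattern can be sketched in a few lines of Python.
This is only an illustration: the VODML wrapper element, the sample
annotation and the function name annotations_for are assumptions, and
a real client would read the annotation from the VOTable itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abridged annotation using the dmtypes from above.
ANNOTATION = """
<VODML>
  <INSTANCE dmtype="ivoa:Measurement">
    <ATTRIBUTE dmrole="value"><COLUMN ref="mag"/></ATTRIBUTE>
  </INSTANCE>
  <INSTANCE dmtype="stc2:Coord">
    <ATTRIBUTE dmrole="loc"><COLUMN ref="hjd"/></ATTRIBUTE>
  </INSTANCE>
  <INSTANCE dmtype="ndcube:Cube">
    <ATTRIBUTE dmrole="independent_axes"><COLUMN ref="hjd"/></ATTRIBUTE>
  </INSTANCE>
</VODML>
"""


def annotations_for(root, column_id):
    """Return (dmtype, dmrole) pairs for every annotation of column_id."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    found = []
    # Equivalent of the XPath //COLUMN[@ref="my_column"].
    for col in root.findall(".//COLUMN[@ref='%s']" % column_id):
        attribute = parents[col]
        instance = parents[attribute]
        found.append((instance.get("dmtype"), attribute.get("dmrole")))
    return found


hjd_annotations = annotations_for(ET.fromstring(ANNOTATION), "hjd")
# -> [("stc2:Coord", "loc"), ("ndcube:Cube", "independent_axes")]
```

Nothing in annotations_for depends on a particular DM, so annotations
from a newly introduced model would show up in the result unchanged.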

>   b) on Coords and Frame relations
>       Frame is an ObjectType, so it can only be in Composition or
>       Reference relations to Coord.  Since a Coord does not own the
>       Frame, and multiple Coord-s may be in the same Frame, the
>       relation is Reference.  So, the modeling of this should not
>       change, and I hope this then becomes a Mapping concern.

Hm, well...  Is there a deep reason why a Coord doesn't "own" a frame
(see a while ago for why I'm putting the quotes around the own)?  As
far as I am concerned, frame metadata is just a (complex) attribute
of a value.

STC traditionally has had a "double" grouping: there was a complex
frame consisting of time and space sub-frames (and various other
things that are neither time nor space) and a complex object
containing the coordinates themselves, again grouping them.

I'm a big fan of keeping information in one place.  Hence, I'd argue
that grouping the actual coordinates is enough; you can tell, in my
examples, that the measurement was taken at ra, dec at hjd because
these FIELDs are referenced in the various attributes of a single
stc2:Coords instance.

Once you're there, you can attach temporal frames to times and
spatial frames to points -- where they belong: These things just
"come into existence" because they happen to be what was used to
reduce the measurements reported, so they're data types all right
rather than object types.  Which to me is another indication that
they should not live outside of actual instances, merely referenced
through artificial ids that they shouldn't have (there's no identity
to a frame ICRS TOPOCENTER for Palomar Observatory at Epoch J2015).

Technically, I'm not using the COMPOSITION element here because I
think ATTRIBUTE is enough on the annotation side.  But that's a
separate discussion.

>   c) on Coords and Frame relations - part 2
>       This IS a model concern.
> 
>       I notice that you are linking the Coord to Frame, where the
>       models/examples we've been putting out are linking each Coord
>       value to its AXIS in the Frame.  Arnold will be happy to see
>       this!  However, it presents two problems (which is why the
>       relation was put on Axis).
>        1) a Coord is a value along a particular axis, so this
>        allows you to know which axis the value is.. (x or y for
>        example) For some applications, they may be more interested
>        in the Frame particulars, but for others, they may be more
>        interested in knowing what the legal value range is
>        (domainMin/Max).  I prefer to keep the relation where it
>        conceptually belongs, and the user can access the info they
>        want.

I think I agree with this, but I can't see how that is related to the
location of Frame.  So, min/max are statistical properties that I'd
like to represent in ivoa:Measurement anyway (they're in no way
special to coordinates, are they?).  Frames, on the other hand, are
special to STC, or at least there's no point having uniform
descriptions of, say, STC frames and photometry systems.  So,
describing the two things in two different places seems very natural
to me.

>            As reference to Frame, there is no way for the user to
>            really know which is 'RA' and which is 'DEC' in an ICRS
>            Frame, you can only say there is a value pair associated
>            with an ICRS frame, and assume the ordering.  This may
>            be 'obvious' in this instance, but what if you have
>            CARTESIAN frame [x,y,z] and are only providing values
>            along a plane [x,z], [y,z], or  [x,y]?   Or a single
>            'ra' or 'dec' value without the other?  How do you know
>            which one-s are being provided?

Well, yes, the point modelling was really shoddy in the original
annotation.  I've just changed it to 

      <INSTANCE dmtype="stc2:Coords">
        <ATTRIBUTE dmrole="space">
          <INSTANCE dmtype="stc2:Coord">
            <ATTRIBUTE dmrole="loc">
              <INSTANCE dmtype="stc2:SphericalPoint">
                <ATTRIBUTE dmrole="latitude">
                  <CONSTANT ref="<to dec PARAM>"/>
                </ATTRIBUTE>
                <ATTRIBUTE dmrole="longitude">
                  <CONSTANT ref="<to ra PARAM>"/>
                </ATTRIBUTE>
              </INSTANCE>
            </ATTRIBUTE>
            <ATTRIBUTE dmrole="frame">
              [...]
            </ATTRIBUTE>
          </INSTANCE>
        </ATTRIBUTE>

which I think isn't terribly far from what current STC says.  The
measurement annotation, if I could make one, would still be something
like

      <INSTANCE dmtype="ivoa:Measurement">
        <ATTRIBUTE dmrole="statError">
          <COLUMN ref="e_ra"/>
        </ATTRIBUTE>
        <ATTRIBUTE dmrole="value">
          <COLUMN ref="ra"/>
        </ATTRIBUTE>
      </INSTANCE>
      <INSTANCE dmtype="ivoa:Measurement">
        <ATTRIBUTE dmrole="statError">
          <COLUMN ref="e_dec"/>
        </ATTRIBUTE>
        <ATTRIBUTE dmrole="value">
          <COLUMN ref="dec"/>
        </ATTRIBUTE>
      </INSTANCE>

though -- separate from stc.

>        2) this doesn't play well when we consider Coord-s which are
>            in different domains, but have co-dependent errors
>            (e.g. [ t, x, y ] with ellipsoid error).
>            There is no single frame to point at.
>            I think this could still be accommodated, but I really
>            think the relation to Axis serves more applications better.
> 

It is true that this doesn't let us annotate correlated errors, but
I'd suggest that in this first version that's an acceptable
compromise.  A correlatedErrors type could be defined later, even in
a minor revision of wherever Measurement is defined; it could
reference, e.g., our statErrors as the (square roots of the) diagonal
elements of a covariance matrix or similar, with the off-diagonal
elements defined somewhere else.
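
To make the relation between statErrors and a covariance matrix
concrete, here is a small Python sketch.  How the off-diagonal
elements would actually be annotated is left open above; expressing
them as correlation coefficients keyed by axis index is purely an
assumption of this example, as is the function name.

```python
def covariance_matrix(stat_errors, correlations):
    """Build a covariance matrix from per-axis errors.

    stat_errors  -- list of standard deviations, one per axis
    correlations -- dict {(i, j): rho_ij} of correlation coefficients
                    (hypothetical encoding of the off-diagonal part)
    """
    n = len(stat_errors)
    cov = [[0.0] * n for _ in range(n)]
    # The statErrors are the square roots of the diagonal elements.
    for i in range(n):
        cov[i][i] = stat_errors[i] ** 2
    # Off-diagonal elements: rho_ij * sigma_i * sigma_j, kept symmetric.
    for (i, j), rho in correlations.items():
        cov[i][j] = cov[j][i] = rho * stat_errors[i] * stat_errors[j]
    return cov


cov = covariance_matrix([0.5, 2.0], {(0, 1): 0.25})
# -> [[0.25, 0.25], [0.25, 4.0]]
```

A client that only understands statError keeps working on the diagonal
alone, which is why adding the correlated part later need not break
anything.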

>    d) on "I'd even propose that just based on the type of the thing you
> find in frame, clients should figure out the nature of the axis, and you
> could get rid of everything in coordsys:domain except the frames"
> From the 'read' point of view, this is probably true.. but for the
> models, I want to be able to say This element should be a
> TimeMeasure  ( value associated with Time Frame + error )... which
> requires the domain specialization.  For the TimeSeries example,
> don't you want to require that the independent axis is Time
> related?

It's fine if models pose additional requirements over what's
visible in the annotation, so that's a question that may be somewhat
detached from actual annotation.

In the concrete case, though, I'm mildly convinced there shouldn't be
an explicit notion of time series in the models at all.  A time
series simply is a cube with just one independent axis that happens
to be time-like.  I could be wrong there, but I think that's close to
optimal in terms of generality and sensible handling by clients.

For discovery, the Dataset (or Obscore, which has it already, I
think) model can introduce the term timeseries in about this loose
form.  See

        <ATTRIBUTE dmrole="dataProductType">
          <LITERAL dmtype="ivoa:string">TIMESERIES</LITERAL>
        </ATTRIBUTE>

in the ds:Dataset annotation in the example.

I don't think a general validator will be able to, from this
annotation, check that the independent axis is time-like.  Adding
such rules to VO-DML would kill it.

There's nothing wrong with custom code in an extra "time series
validator" that perhaps validates extra, natural-language constraints
in the ds:Dataset specification ("datasets with a TIMESERIES
dataProductType are limited to a single independent axis which must
be annotated with an stc:TimeFrame" or so), though.
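
Such an extra validator could look roughly like the following Python
sketch.  It checks that a dataset declared as TIMESERIES has cubes
with exactly one independent axis, and that this axis is the loc of
an stc2:Coord whose frame is an stc2:TimeFrame.  The VODML wrapper,
the abridged annotation and all function names are assumptions for
illustration; the rule itself is just one reading of the
natural-language constraint quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged annotation built from the snippets in this mail.
ANNOTATION = """
<VODML>
  <INSTANCE dmtype="ds:Dataset">
    <ATTRIBUTE dmrole="dataProductType">
      <LITERAL dmtype="ivoa:string">TIMESERIES</LITERAL>
    </ATTRIBUTE>
  </INSTANCE>
  <INSTANCE dmtype="stc2:Coord">
    <ATTRIBUTE dmrole="loc"><COLUMN ref="hjd"/></ATTRIBUTE>
    <ATTRIBUTE dmrole="frame">
      <INSTANCE dmtype="stc2:TimeFrame"/>
    </ATTRIBUTE>
  </INSTANCE>
  <INSTANCE dmtype="ndcube:Cube">
    <ATTRIBUTE dmrole="independent_axes"><COLUMN ref="hjd"/></ATTRIBUTE>
  </INSTANCE>
</VODML>
"""


def attribute(instance, role):
    """Return the direct ATTRIBUTE child with the given dmrole, or None."""
    for attr in instance.findall("ATTRIBUTE"):
        if attr.get("dmrole") == role:
            return attr
    return None


def instances(root, dmtype):
    """All INSTANCE elements of the given dmtype, at any depth."""
    return [i for i in root.iter("INSTANCE") if i.get("dmtype") == dmtype]


def is_timeseries(root):
    """True if some ds:Dataset declares dataProductType TIMESERIES."""
    for dataset in instances(root, "ds:Dataset"):
        ptype = attribute(dataset, "dataProductType")
        if ptype is None:
            continue
        if (ptype.findtext("LITERAL") or "").strip() == "TIMESERIES":
            return True
    return False


def time_columns(root):
    """Columns used as loc of an stc2:Coord that has an stc2:TimeFrame."""
    columns = set()
    for coord in instances(root, "stc2:Coord"):
        loc, frame = attribute(coord, "loc"), attribute(coord, "frame")
        if loc is None or frame is None:
            continue
        column, frame_instance = loc.find("COLUMN"), frame.find("INSTANCE")
        if (column is not None and frame_instance is not None
                and frame_instance.get("dmtype") == "stc2:TimeFrame"):
            columns.add(column.get("ref"))
    return columns


def validate_timeseries(root):
    """Return a list of problems; an empty list means the rule holds."""
    if not is_timeseries(root):
        return []  # the extra rule only applies to time series
    timelike = time_columns(root)
    problems = []
    for cube in instances(root, "ndcube:Cube"):
        axes = attribute(cube, "independent_axes")
        refs = [] if axes is None else [
            c.get("ref") for c in axes.findall("COLUMN")]
        if len(refs) != 1:
            problems.append(
                "expected exactly one independent axis, found %d" % len(refs))
        elif refs[0] not in timelike:
            problems.append(
                "independent axis %r has no stc2:TimeFrame" % refs[0])
    return problems


problems = validate_timeseries(ET.fromstring(ANNOTATION))
```

All of this is ordinary client code working on the generic annotation,
so nothing in VO-DML itself has to know about time series.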

>        Similar for Target.pos  .. I really want a spatial Position there...
> nothing else.

I'd not object to having a complex type in Target.pos, but I'd
propose to have it point at the entire STC structure.  After all, the
time of observation is every bit as relevant as the place.  Like
this (that's already in the annotation the service produces as of now):

      <INSTANCE ID="stc-mainpos" dmtype="stc2:Coords">
        <ATTRIBUTE dmrole="space">
          <INSTANCE ID="ndgtaonognea" dmtype="stc2:Coord">
      ....
      </INSTANCE>

      <INSTANCE dmtype="ds:Dataset">
        <ATTRIBUTE dmrole="curation">
          [...]
        <ATTRIBUTE dmrole="target">
          <INSTANCE dmtype="ds:Target">
            <ATTRIBUTE dmrole="location">
              <REFERENCE>
                <IDREF>stc-mainpos</IDREF>
              </REFERENCE>
            </ATTRIBUTE>
          </INSTANCE>

I give you that in this particular case, it'd be painful to not use
STC annotation as the reference target, and I guess that's normal when
you're dealing with really complex structures.  STC will be
referenced from many data models, which makes it even more important
we keep it minimal in the beginning -- you can always add things in
minor versions (which won't break all the other DMs).  Changing or
taking away things is a disaster.
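
For what it's worth, following such a reference is cheap for a client.
The Python sketch below resolves REFERENCE/IDREF pairs against INSTANCE
IDs; the VODML wrapper, the abridged annotation and the function name
resolve_references are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged version of the Target annotation shown above.
ANNOTATION = """
<VODML>
  <INSTANCE ID="stc-mainpos" dmtype="stc2:Coords"/>
  <INSTANCE dmtype="ds:Dataset">
    <ATTRIBUTE dmrole="target">
      <INSTANCE dmtype="ds:Target">
        <ATTRIBUTE dmrole="location">
          <REFERENCE><IDREF>stc-mainpos</IDREF></REFERENCE>
        </ATTRIBUTE>
      </INSTANCE>
    </ATTRIBUTE>
  </INSTANCE>
</VODML>
"""


def resolve_references(root):
    """Map the dmrole of each referencing ATTRIBUTE to its target INSTANCE."""
    # Index every INSTANCE that carries an ID.
    by_id = {inst.get("ID"): inst
             for inst in root.iter("INSTANCE") if inst.get("ID")}
    resolved = {}
    for attr in root.iter("ATTRIBUTE"):
        idref = attr.findtext("REFERENCE/IDREF")
        if idref is not None:
            # Dangling references resolve to None rather than raising.
            resolved[attr.get("dmrole")] = by_id.get(idref.strip())
    return resolved


targets = resolve_references(ET.fromstring(ANNOTATION))
location_type = targets["location"].get("dmtype")
```

Here location_type comes out as "stc2:Coords", i.e. the client lands on
the whole STC structure rather than on a bare spatial position.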

I'd still say the annotation paper has to deal with the fact that we
do have xtype=POINT, xtype=POLYGON, and xtype=INTERVAL now.  For
instance, I'd *much* prefer

<PARAM datatype="float" arraysize="2" xtype="interval" 
  value="23.5 42.7"/>

as the VOTable serialisation of interval-like things (min/max values,
say).  It's what SODA uses for intervals, and it'd be a pain if other
interval representations had to come in when these things get a
proper DM annotation.
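
Reading such an interval PARAM takes very little code, as the
following Python sketch shows.  The function name parse_interval is
made up for this example, and the sketch only handles the simple
two-element float case shown above.

```python
import xml.etree.ElementTree as ET

# The PARAM from above, on one line.
PARAM = ('<PARAM datatype="float" arraysize="2" xtype="interval" '
         'value="23.5 42.7"/>')


def parse_interval(param):
    """Return (low, high) for a PARAM with xtype="interval"."""
    if param.get("xtype") != "interval":
        raise ValueError("not an interval PARAM")
    tokens = param.get("value", "").split()
    if len(tokens) != 2:
        raise ValueError("an interval needs exactly two values")
    low, high = (float(token) for token in tokens)
    return low, high


interval = parse_interval(ET.fromstring(PARAM))
# -> (23.5, 42.7)
```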

>        I don't seem to have this as a specific requirement (on the twiki),
> but have been treating it as one.
> 
> 
> > (3) In ndcube:Cube, I'm using sequences of dependent and independent
> > rather than somehow packing them into Observable instances, and
> > I'm directly annotating the columns rather than other DM annotations.
> >
> 
> We are looking at this bit in the modeling, there is a relation I'm not
> quite comfortable with, but the
> general structure is, I think OK as is.. a collection of DataAxis with
> attribute for 'dependent' and a value (Measurement).
> In your annotation, are you still trying to cross ref information?  The

Yes, though I call it co-reference-based annotation; I guess most of
the discussion above applies here as well. 

Plus, if I want to understand the cube-like structure of the VOTable,
I'd much rather deal with the actual FIELDs than with another level
of annotation.  Figuring out the error for a measurement is equally
simple both with direct co-references to FIELDs (my technique) and
referencing VO-DML elements.  So, on balance I'd again say let's
reference FIELDs directly whenever we really mean them.

       -- Markus