Time Series Cube DM - IVOA Note

Jiří Nádvorník nadvornik.ji at gmail.com
Sat Apr 1 11:41:14 CEST 2017


Dear DM,

I have gone through the ideas raised since my last response and would
like to sum them up as follows. I think, generally, the discussion is
going in the right direction - but sometimes I get the impression we
are talking about the same things and just calling them by different
names. An example of serialization would be exactly what we need now.

Most important points that I see in the discussion now:

1. Validation
I will not quote here, as this one has multiple remarks in the
conversation already. My opinion is that loosely coupled models are
better - any client can then validate just what it needs. It can
validate STC, Photometry, or Quantity separately, and then if I want to
validate a TimeSeriesCube, I go into the TimeSeriesCube DM and see: OK,
for a valid TimeSeriesCube instance I need one valid STC and one valid
Quantity (+ maybe Photometry) annotated in the dataset. So the
TimeSeriesCube DM won't actually *contain* everything it requires for
validation - it will just say which *other DMs* need to be valid as
part of it.

This gives clients a tool (not merely a possibility) to validate only
parts of an incoming time series VOTable, so Mark Taylor can now say:
no, this is not a valid Time Series data cube, but it has valid STC and
Photometry annotations and I can work with those.
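
To make this concrete, a hand-written sketch of what I mean (the tsc:
and quantity: type names and the role names here are placeholders of
mine, not from any published model):

  <GROUP vodml-type="tsc:TimeSeriesCube">
    <!-- delegate the time axis to a separately validatable STC annotation -->
    <GROUP ref="timecoord" vodml-role="independentAxis"/>
    <!-- delegate the measured values to a Quantity annotation -->
    <GROUP ref="fluxquantity" vodml-role="dependentAxis"/>
  </GROUP>
  <GROUP ID="timecoord" vodml-type="stc:Coordinate">
    <FIELDref vodml-role="value" ref="obs_time"/>
  </GROUP>
  <GROUP ID="fluxquantity" vodml-type="quantity:Quantity">
    <FIELDref vodml-role="value" ref="flux"/>
  </GROUP>

A TimeSeriesCube validator then only checks that the two referenced
groups exist and validate against their own models; an STC-only client
can use the "timecoord" group and ignore the rest.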


2. Quantity

> Agreed.. again, since your Quantity == my Coordinate, the 'coords' model
> would be that place.


I must disagree with this, because in this particular case I think we
are calling two different things by the same name. If I use your
described objects:

>
>   coordsys: == Coordinate Systems and Frame specification pattern
>                + domain implementations
>   coords:   == Coordinates specification ( value + errors )
>   trans:    == Transform model
>   cube:     == NDCube model elements
>   ds:       == DatasetMetadata elements.


Then you have only value+error for coords, along with its coordsys
(which can be delegated to STC - I will get to that later). For
Quantity, though, it's not only about annotating value and error; it's
also about describing this quantity's distribution. So, put into your
format:

coords:   == ( value + errors ) - a simple model of the data, not metadata
quantity: == ( value + errors ) + this quantity's metadata ( mean +
sigma + quartiles + ... ). The metadata is there to help users decide
what they want to search for within these values and how to filter them.

What alternative would you suggest?
> It is used in:
>    Dataset once
>    Coords many times
>    CoordSys a few times


This is actually important not only for the Cube, so I'd also vouch for
putting it into a separate model - one which for now would be just a
single page listing the statistical parameters you would like to keep
for describing a quantity, along with the value and error columns.
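
As a strawman for that single page, a hand-written sketch (the role
names mean/sigma/quartiles are mine, just to illustrate the idea):

  <GROUP vodml-type="quantity:Quantity">
    <FIELDref vodml-role="value" ref="mag"/>
    <FIELDref vodml-role="error" ref="mag_err"/>
    <!-- distribution metadata describing the column as a whole -->
    <PARAM vodml-role="mean" name="mean" datatype="double" value="14.2"/>
    <PARAM vodml-role="sigma" name="sigma" datatype="double" value="0.6"/>
    <PARAM vodml-role="quartiles" name="quartiles" datatype="double"
           arraysize="3" value="13.8 14.1 14.5"/>
  </GROUP>
  <FIELD ID="mag" name="mag" datatype="double" unit="mag"/>
  <FIELD ID="mag_err" name="mag_err" datatype="double" unit="mag"/>

A client can then decide from the quartiles alone whether it wants to
filter or even download the values at all.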

3. Extensibility, God-likeness, loose coupling

I'd pick out the following important statements.

Our standards live a lot longer than the last new-fangled
> distributed peer-to-peer NoSQL social blockchain web glitz.  And
> therefore for us dependencies are even more expensive.


The best way to make standards live longer is to make them closed for
modification (as few *major* versions as possible) but open for
extension. If anybody realizes the data model does not provide enough
for them, they can still extend it with custom attributes (staying
compatible with the current version), and once we have several of these
extensions, we just take their intersection: what's common will form
the baseline for a new *minor* version, bringing more standard tools to
bear for the clients.
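
For example (a hypothetical sketch - the custom: prefix and the
robustSigma attribute are invented here): a data provider could add a
non-standard attribute inside an otherwise standard annotation, and
clients that don't know the role would simply skip it:

  <GROUP vodml-type="quantity:Quantity">
    <FIELDref vodml-role="value" ref="mag"/>
    <!-- custom extension; unknown roles are ignored by standard clients -->
    <PARAM vodml-role="custom:robustSigma" name="robustSigma"
           datatype="double" value="0.5"/>
  </GROUP>

If several archives converge on something like robustSigma, it becomes
a natural candidate for the next *minor* version.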

If we realize that we designed something incorrectly in the *major*
version (made it too restrictive) and somebody needs to change
attributes that are already part of the model, that's the expensive
part, and we need to create a new *major* version. Still, with loose
coupling we can keep annotations of both major versions in the same
dataset to keep things backwards compatible - more on that in the
serialization examples.
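
A hand-written sketch of what I mean by coexisting major versions (the
stc2: prefix is purely hypothetical):

  <!-- version 1 annotation, kept for legacy clients -->
  <GROUP vodml-type="stc:Coordinate">
    <PARAMref vodml-role="value" ref="pt"/>
  </GROUP>
  <!-- version 2 annotation of the same PARAM, for newer clients -->
  <GROUP vodml-type="stc2:Coordinate">
    <PARAMref vodml-role="location" ref="pt"/>
  </GROUP>
  <PARAM ID="pt" name="pt" xtype="point" datatype="double"
         arraysize="2" value="23.3 41"/>

A client that understands both prefers the newer group and ignores the
older one.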

To sum it up, I would use the rule Petr mentioned:

> In using the 80/20 rule - we may not cover everything ...


Exactly - within the model we can cover 80 percent, but to make our
major versions stable we need to predict where those 20 percent might
head and make the model easily extensible (non-restrictive) in that
direction. Decoupling models into smaller parts helps a lot here,
because it's much easier to extend a small model than a God-like
object.


> I think pulling out the generic stuff from STC will get us a long way
> towards a good start for quantity, and I'm not averse to specifying
> quantity in a REC together with STC.  But I shouldn't need to pull in
> STC and its frames (not to mention transformations and geometry) just
> to express that I have something with a value, an error, and whatever
> else.


Yes! We should not be afraid of pulling parts of models out into new
stand-alone models. This is the most natural way of software
engineering: you design a small object; you keep adding functionality
to it; once you realize the object is too big, or has multiple
responsibilities, you refactor part of it into a new object that is
just referenced from the original.

"Is the object too big, or does it have multiple responsibilities?" -
this question needs to be asked at every major version of a model, or
we won't improve the quality of IVOA standards at all.

And yes, if adding your own stuff to the model becomes easier, so does
abusing it, as Omar wrote:

> At the same time, I think we should make sure that when such an update is
> really necessary, our framework makes it easy to update the downstream
> model. It's always a trade-off. The easier it is to adopt healthy patterns,
> the easier it is to fall into disruptive anti-pattern pits.


But that's not a reason to make it harder. The task of IVOA here should
be to guide the people who are interested in using and extending our
standards, and to tell them what is actually a healthy addition and
what is an abuse of the model.

4. Examples of serialization
The most important part right now, because it will straighten out the
vocabulary and uncover the ambiguities in how we are using it.

We are working on this right now - we have some sample XMLs for
TimeSeriesCube DM 1.1 written by hand, but I would like to hold off
sharing them here until we have tried to implement them and seen how
they work in practice - hopefully sometime next week.


Cheers,

Jiri



2017-03-31 13:49 GMT+02:00 Markus Demleitner <msdemlei at ari.uni-heidelberg.de>:

> Hi Omar,
>
> One short point, one longer one:
>
> On Tue, Mar 28, 2017 at 10:21:46AM -0400, Laurino, Omar wrote:
> > In the time series example, more than a time series *data model* I think
> > time series can just be seen as *instances* of a common, more generic
> data
> > model, that is itself a lightweight one. A client could specialize into
>
> Absolutely -- at least my goal in this is to have time series just be
> an NDCube that happens to have just one non-degenerate independent
> axis that furthermore happens to have time-like STC annotation; I
> think our adopters would rightfully develop solid resentments against
> us if we did something very different.
>
>
> > >   <GROUP vodml-type="stc:Coordinate">
> > >     <PARAMref vodml-role="value" vodml-type="Coord2" ref="pt"/>
> > >   </GROUP>
> > >   <PARAM ID="pt" xtype="POINT" datatype="real"
> > >     arraysize="2" value="23.3 41"/>
> > >   <GROUP vodml-type="stc:Coordinate">
> > >     <GROUP vodml-role="value" vodml-type="Coord2">
> > >       <PARAMref vodml-role="C1" ref="ra"/>
> > >       <PARAMref vodml-role="C2" ref="dec"/>
> > >     </GROUP>
> > >   </GROUP>
> > >   <PARAM ID="ra" value="23.3"/>
> > >   <PARAM ID="dec" value="41"/>
> >
> >
> > Would you have both annotations in the same file? How should a client
> > (unaware of the enclosing model) know this is two different
> representations
> > of the same coordinate rather than two distinct coordinates? I would
> rather
> > be in favor of specific mapping rules for certain types, if that makes
> > sense, which is what we already do for ivoa:Quantity. Coord2 would be
> > serialized as a DALI POINT, if that makes sense. Admittedly, I haven't
> > given this possibility enough thought, so I am not sure how convenient
> that
> > would be or what repercussions it might have down the road.
>
> I guess this is a good example for a distinction between two use
> cases that we perhaps haven't sufficiently made in past DM work to
> our detriment.  I think issues become a lot clearer if we separate
> two related but actually distinct things:
>
> (1) We want to define standard serialisations; that's stuff like an
> obscore table, an SSA response, or whatever.  Here, we have to be
> strict and precise on the serialisation details.  I think obscore
> gets it right, simply saying "column/FIELD with name s_ra, floating
> point type, in unit foo, UCD such-and-such, preferred description
> this-and-that, reference frame ICRS".  Note how, this way, further
> annotation is actually not necessary for anything in the core data
> content, and that's how things must be if one wants to write
> multi-service queries or join results from different services without
> a lot of logic.  I'd say that's "baseline interoperability".
>
> Personally, I'm not even sure the notion of a data model is
> terribly useful for these *as such*.  Grammars or, as in obscore, a
> simple database schema seem more appropriate to me.  Be that as it
> may, by now I'm convinced that even with VO-DML and the mapping
> document, we'll still have to define concrete serialisation(s), for
> me preferably in documents of type (1) themselves.  But that's, I'd
> say, tangential for now.
>
> Of course, once you add local extensions to such predefined
> serialisations (e.g., extra columns in obscore, custom fields in DAL
> responses), things are different, and then we have one example of
> (2).
>
> (2) We want complex metadata schemes for physical (or whatever)
> entities which generically work wherever these entities turn up;
> that could be filter names and zero points in photometry, time scales
> and reference positions for times, or statistical properties, error
> models, etc. for measurements of all kinds.  These *may* go on top of
> the well-defined serialisations, but where they are really needed is
> when you have "free" responses, e.g., in TAP, datalink/SODA parameter
> declarations, custom extensions, etc.  I'd call this "spontaneous
> interoperability", because client and server don't need to pre-arrange
> anything above the transport and annotation layers.
>
> That's complex in the general case, but it's not black magic.  Hence,
> (not only) I still think it's a disgrace that 15 years into the VO
> all we have is a deprecated (and fairly limited) way to say "this
> pair of columns [this POINT, POLYGON...] is a position in ICRS
> BARYCENTER for Epoch J2015.0".  At least this very basic annotation
> simply must work for essentially all representations that sensible
> and almost-sensible people (as well as data-writing astronomers) may
> choose.
>
> With that distinction: No, I do not believe we'll end up at a useful
> standard if we leave open in a given standard whether positions are
> given as RA/DEC or a POINT in cases like (1).  They have to say that
> or you can never, say, write an obscore query that works on more than
> one service.
>
> But the data models and in particular annotation scheme (make that a
> plural once we tackle FITS or HDF5) must still be flexible enough to
> cover (2).  Let's see that we can finally annotate VOTables in
> sufficient detail that a client can reliably bring a catalog in ICRS on
> Epoch J1992.25 to Galactic in J2015 (or notice when that's not possible
> for lack of proper motions).  I'd say that's an achievable goal.
>
> And since I'd really like this to not share the fate of the
> STC-in-VOTable Note, my feeling at this point is that proper error
> treatment is for when we've gathered some experience; that would mean
> that we can for now delay modelling correlated, non-Gaussian or
> otherwise real-world errors (but keep Quantity open for adding that
> later).
>
> A bird in the hand is worth two in the bush.
>
>
>       -- Markus
>