Coordinates model - Working draft.

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue Jan 15 13:42:37 CET 2019


Hi Laurent,

On Tue, Jan 15, 2019 at 10:51:54AM +0100, Laurent MICHEL wrote:
> I believe that the scope of models must not be limited just because there is
> a risk of de-synchronization between model stuff and data. This risk is

So... What *is* the scope of the model?  I have, so far, assumed
the STC model will

(a) organise quantities to complete space-time coordinates (i.e.
where, when, d/dt where) and
(b) associate frames with these coordinates.

Do we agree up to here?  Would you like to widen the scope?
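
Just so (a) and (b) don't stay entirely abstract, here's a rough,
purely illustrative Python sketch (class and attribute names are mine,
not the model's): the model groups existing quantities into one
position and attaches a frame to that group, and that's about it.

  from dataclasses import dataclass

  # illustrative only -- not the actual STC model classes
  @dataclass
  class SpaceFrame:
      ref_frame: str            # e.g. "ICRS"

  @dataclass
  class SkyPosition:
      longitude: float          # deg, taken from an existing column
      latitude: float           # deg, taken from an existing column
      pm_longitude: float       # mas/yr, the d/dt part
      pm_latitude: float        # mas/yr
      epoch: str                # e.g. "J2015.5"
      frame: SpaceFrame         # (b): the frame attached to the group

  pos = SkyPosition(83.63, 22.01, 1.2, -0.8, "J2015.5", SpaceFrame("ICRS"))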

> present in any case and it is a part of the data provider's responsibility.
> A simple side effect in a loop building VOTable fields can make an RA column
> tagged as a magnitude. This never occurs just because services are well
> validated. There is no reason to assume that things will get worse when dealing

...except that annotation errors always occur (check the output of the
global validators if you don't take my word for it) -- and why set
traps for serialisers and increase the number of ways to get things
wrong?

> A model cannot tell the client "this quantity is an STC:TimeStamp and do
> your best to get more information. I'm sorry I cannot tell more because I'm
> afraid to contradict my host".

But why *should* a model say that?  And to which question would that
be an answer?

The information that something is, say, a floating-point number in
Julian years is already in the container format, and at least for
timestamps the client is probably even isolated from the fact that
something was written in ISO format in a VOTable and in some other
format in a FITS table -- the client hopefully just sees, say, a
datetime.datetime instance if they're using Python.
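
For concreteness, a minimal sketch of that layering, assuming astropy
and a hypothetical obs.vot file with an ISO-format obs_time column:
the datatype comes from the VOTable FIELD itself, and the time strings
turn into plain datetime.datetime objects without any help from a data
model.

  from astropy.io.votable import parse_single_table
  from astropy.time import Time

  # the column's datatype comes from the FIELD declaration,
  # i.e. from the container format, not from any DM annotation
  table = parse_single_table("obs.vot").to_table()
  print(table["obs_time"].dtype)

  # the ISO strings become proper time objects; the client ends up
  # with datetime.datetime instances either way
  epochs = Time(table["obs_time"], format="isot", scale="utc")
  print(epochs[0].datetime)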

So, again: what do you want to accomplish by repeating, in the data
model, serialisation details that the container format library may
already abstract away?


> In my opinion, a model must be self-consistent. It must be able to
> describe data without referring to any <FIELD> attribute and even without
> referring to any specific file format (FITS, VOTable, ...).

Self-consistent does not mean all-encompassing; in particular, just
having TimeInstant *is* self-consistent (what would it conflict
with?).

But what do you mean by not referring to any FIELD?  Are you
suggesting DMs should somehow be serialisable independently of
existing file formats?  If so, why?  The file formats are there.
People *want* to use them, or at least they don't want to have to use
another one.


> The best modeling process (including annotation) is the one that allows
> clients to retrieve model instances without running inferences on
> ucd/datatype/xtype.

UCDs -- yes, they're orthogonal to model efforts.

For the rest, I'm not quite sure what you mean by "inference" here.
How would you even read a VOTable without looking at the datatype?
And once you have the data and the annotation, what's left for
inferencing?

VOTable (or FITS binary tables, for that matter, which also transmit
datatypes and units, as does essentially any other modern scientific
serialisation) are good at what they do, and I simply can't see a
reason to second-guess them, least of all from a data model.

I've always hoped we're working towards a layered architecture, where
DM annotation leaves the existing serialisation formats in place and
just adds labels and structures on top of them.  Are you saying this
expectation is wrong?
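
To make "layered" a bit more tangible, here is a hedged sketch of what
I have in mind on the client side (the annotation dictionary and role
names are made up for illustration, not a proposed syntax): the
annotation only points at existing FIELDs and adds model-level labels
such as a frame, while datatype and unit keep coming from the
serialisation layer.

  from astropy.io.votable import parse_single_table

  table = parse_single_table("obs.vot").to_table()

  # hypothetical annotation layer: model roles point at FIELD names and
  # add frame metadata; datatype and unit are *not* repeated here
  annotation = {
      "coords:TimeInstant": {"field": "obs_time", "timescale": "TT"},
      "coords:Longitude":   {"field": "ra",  "frame": "ICRS"},
      "coords:Latitude":    {"field": "dec", "frame": "ICRS"},
  }

  for role, meta in annotation.items():
      col = table[meta["field"]]
      # type and unit still come from the VOTable FIELD underneath
      print(role, col.dtype, col.unit, meta)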

Anyway, to get this away from abstract speculation: Are there actual
use cases that would profit from repeating type/unit/value
serialisation in the data model?  If so, perhaps there are better,
more layer-respecting solutions for them?

      -- Markus

