MIVOT: fully qualified attribute names

Thu Feb 16 18:28:27 CET 2023

Dear Laurent,

On Thu, Feb 16, 2023 at 04:40:32PM +0100, Laurent Michel wrote:
> We could continue playing ping-pong, but we reached a point where
> there is no longer great benefits to expect by doing so.

Hm, I don't know about that -- at least I'm learning quite a bit, and
I think figuring things out in this way will help many others, too.

First, it made me notice another (perhaps small) snag:

> - all needed VODML files

These are a bit hairy to find at this point.  Admittedly, they *are*
on https://ivoa.net/xml/, but they're intermingled with all kinds of
XSDs, and it's really hard to see what's ancient and (excuse me)
stillborn and what's fresh and current.  My suggestion would be: Have an
extra section for VO-DML files in the Schema repo so they are not
drowned in the legacy of more or less rusty XSD from the DM group.
And perhaps use a slightly different table pattern so there are links
to both the .vo-dml.xml and the generated HTML documentation.

I'd suggest "Formal VO-DML models" as a section header, perhaps with
VO-DML being a link to the spec.

> At this point, the annotator knows that it must build an instance with a role given by the model; 2 options:
> taking the NAME (Markus option) —> @dmrole=error
> taking the VODML-ID prefixed with the model name (MIVOT) —> @dmrole=meas:Measure.error

Ah! See, this is what I had missed -- sorry about that.  The
attribute names are not computed, they're taken from the VO-DML
files, more specifically, their vodml-id elements (the existence of
which I had forgotten since I contributed to VO-DML).
Still, for all I can tell, the syntax of the dmrole attribute is not
specified in MIVOT yet (at least searching for vodml-id just turned
up

  element_model.tex:All models that define vodml-ids used in the annotation must be declared.

for me).  Perhaps you should put, somewhere above Table 2, some
language like:

  The values of both dmrole and dmtype are constructed by taking
  the vodml-id of the respective VO-DML elements and prefixing it with
  the canonical short name of the model and a colon.

I can't *promise* that would have prevented me from having asked for
a clarification, but I'd claim there's a good chance.
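
To make the proposed rule concrete, here is a minimal sketch of how an
annotator might derive these strings.  The VO-DML fragment is a
simplified, hypothetical stand-in (real VO-DML files are namespaced and
much richer), and the function name qualified_names is made up for
illustration:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for a VO-DML file; real files use
# XML namespaces and carry far more structure than this.
VODML_FRAGMENT = """
<model>
  <name>meas</name>
  <objectType>
    <vodml-id>Measure</vodml-id>
    <name>Measure</name>
    <attribute>
      <vodml-id>Measure.error</vodml-id>
      <name>error</name>
    </attribute>
  </objectType>
</model>
"""


def qualified_names(vodml_source):
    """Return every vodml-id prefixed with the model short name and a colon."""
    root = ET.fromstring(vodml_source)
    # Assumption: the model's short name is the <name> child of the root.
    prefix = root.findtext("name")
    return [f"{prefix}:{el.text.strip()}" for el in root.iter("vodml-id")]


names = qualified_names(VODML_FRAGMENT)
print(names)
```

The point is only that nothing is computed from the attribute names: the
strings are looked up in the model and prefixed.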

> with the roles ala MIVOT , the client selecting
> “.//INSTANCE[@dmrole=meas:Measure.error]: is sure of what it gets
> with the roles ala Markus , the client selecting
> “.//INSTANCE[@dmrole=error]: will return anything playing any role
> named error, either from the meas:Measure class or any other. This
> is the price for using non-unique identifiers.

I note in passing that probably no client could ever learn anything
useful by doing this in either case, so that's a weak argument.
But then: While I'm still not particularly happy with the choice of
vodml-id, and I still think there is no good reason for that choice,
given the dmrole values are at least well-defined, it is certainly
not a blocker any more.
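
For the record, the difference between the two selections can be
sketched as follows.  The annotation snippet is hand-made and
hypothetical (no MIVOT namespace, an invented "other" model, and
illustrative dmtype values), so it only shows the matching behaviour,
not a valid MIVOT block:

```python
import xml.etree.ElementTree as ET

# Hand-made, hypothetical annotation; namespace omitted for brevity and
# the "other" model is invented purely for the comparison.
ANNOTATION = """
<TEMPLATES>
  <INSTANCE dmtype="meas:Position">
    <INSTANCE dmrole="meas:Measure.error" dmtype="meas:Error"/>
  </INSTANCE>
  <INSTANCE dmtype="other:Thing">
    <INSTANCE dmrole="other:Thing.error" dmtype="other:Noise"/>
  </INSTANCE>
</TEMPLATES>
"""

root = ET.fromstring(ANNOTATION)

# Selection by fully qualified role: matches only the meas:Measure error.
qualified = root.findall(".//INSTANCE[@dmrole='meas:Measure.error']")

# Selection by bare name: matches anything whose role ends in "error",
# regardless of the class or model it belongs to.
bare = [
    el for el in root.iter("INSTANCE")
    if el.get("dmrole", "").rsplit(".", 1)[-1] == "error"
]

print(len(qualified), len(bare))
```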

> We can add a section in the spec relating the process above. I
> admit that would help.

A non-normative appendix should indeed say where to find models and
where to find the vodml-ids.

> I will be less open-minded about the requirement of assigning the
> RFC ending with the release of software pieces operating all steps
> of the model mapping process with a good test coverage furthermore;
> very new in the IVOA landscape!

Well, the role string-computing function is no longer needed, because
it's now clear where the role strings come from.  So scratch that.

What's not new in the VO is the demonstration of two (or, when there
are producers and consumers, three) interoperating implementations.

I *am* trying to provide one on the server side (that's why I'm
here), but so far failed, partly because I couldn't figure out how to
use the existing models to annotate even the basic aspects of the
Gaia source catalogue (linking positions and proper motions, linking
scalar errors, defining photometries; note that I'm not asking for
annotation of the correlations any more -- sigh!).

If I can't do that, I'd say it indicates *some* sort of problem:
Perhaps our models are still insufficient -- but then there's no
particular hurry to pass MIVOT (because we can't do anything useful
with it until the models are there).

Another possibility is that I'm too dumb or too grumpy -- in that
case someone else should jump in and provide a... well, "second"
implementation.

I'm using quotes here because very frankly what I have seen so far as
implementation examples is either outdated and does not correspond to
the spec as present, or strongly looks like hand-crafted examples.
Whether or not the hand-crafted material is right: For "implementation",
I'd strongly prefer something that exists in the wild (i.e., in an
actual data centre) and is consumed by something that is not completely
contrived.

Remember my suggestion for an annotation syntax back in the Workshop
days, <https://github.com/msdemlei/astropy>?  You know, that *would*
have worked as a PR against astropy, and the things it consumed were
generated by a machine from an abstract representation in an actual
data centre.  It's that sort of thing I'd really like to see for
something as wide-reaching and ambitious as MIVOT.

But, really, at this point I'd already be content with just something
coming out of an actual data centre (like mine) that some of your
client-side code can consume.  But that's really the minimum in terms
of what I'll take as a "reference implementation" (as required by
Stdproc forever).

> The purpose of MIVOT is to map data on models serialised in VODML.
> All the implementations we provide show that this requirement is
> achieved even if we could dream about tools making the job easier.

That's the second point: I have no idea just *what part* of MIVOT is
actually exercised by the examples mentioned on the RFC page.  There
is simply *soo* much material in MIVOT that I feel unable to decide
what is and what is not covered by implementation examples.  To
cite the (to me) scariest part of the spec: There is a 5-way
definition by cases for COLLECTION in mostly very dense language, and
I freely admit I can't understand most of it.

I'd trust it a lot more if there were a non-author implementor who
had looked at it and found that it at least does not contradict
anything else in the spec.  I expect I'd have to spend days working
out what that is and writing test cases and all -- and sorry, I'll
not spend that time on something I think we should think about in
version 1.4 or version 1.5 of MIVOT, after we know *a lot* more.

> Actually I’ve code generating MIVOT instances from VODML
> (github:ivoa/modelinstanceinvot-code
> branch:feature/instance-generator), i’m using it to build the
> snippets I was talking about i my previous mail. This code is not
> ready to be published, I would be happy to do it but I’ve just no
> time to complete the job; but anyway, this must not block the REC.

But on the other hand, no urgent use case is actually waiting for this
to become REC (or is one?), and given we're betting our coordinate
annotation on it, we can at least wait with REC until we successfully
exchange a 6-parameter astrometric solution with it and do an epoch
propagation, with one client and two services implemented by three
different people.  I'm volunteering to be one on the server side, but
I'll need help making sense of MCT.

Even then I'd feel *a lot* more comfortable if MIVOT 1.0 just
contained whatever is necessary for that exercise, because I am
almost 100% sure anything not exercised in that way will contain lots
of contradictions and errors that will bug us later (that's
invariably been the case for unused parts of specs that I have
written).

But if you really cannot be moved to postpone the complex stuff until
we better understand what we're doing, I'd not stand in the way once
we have the epoch propagation use case properly demonstrated.  But
that, I feel, is not asking too much.

Thanks,

           Markus