Vocabulary terms in MANGO

Tue Nov 5 18:32:26 CET 2024

Hi Laurent,

On Tue, Nov 05, 2024 at 05:39:32PM +0100, Laurent Michel via semantics wrote:
> In some places, this model uses literal enumerations to specify some property roles:
>
>     - Calibration level
>     - Colour definition (magnitude ratio vs hardness ratio)
>     - Shape definition (MOC, STCS...)
>     - Possibly error distributions (poisson, gaussian...) (see issue #59)
>
>
> This works fine, but has a major drawback: if some enumeration needs to be
> updated, we would have to create a new model version via a new RFC process.
> :-(
>
> This issue has already been discussed in the IVOA and there was a consensus
> that vocabularies should preferably be used in data models AFAIR.
>
> My questions are:
> ================
>
> 1) Do you still think it is better to replace literal enumerations
> with a vocabulary?

Generally, a vocabulary is good if you can write some sort of generic
code to handle things; in-schema definitions are preferable when
"deep" code changes are likely necessary to deal with some change.
For instance, I doubt that any code will sensibly be able to deal
with an addition to an enumeration starting with moc, stc-s,
dali-shape.  Each of these items needs a profoundly different
treatment with very little common code.

Against that, a new calibration level probably won't break any code;
statistical distributions... are somewhere in between.

An extra consideration: Whenever there is some implied hierarchy,
vocabularies are particularly attractive, even when specialised code
is necessary, for instance, if you can fall back to more general code
for a more general term, perhaps losing some precision.  As an
example, consider reference positions, where for light travel time
corrections it is still useful to know something is in low earth
orbit even if you don't have an HST ephemeris (say).
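
To make the fallback idea concrete, here is a minimal sketch in Python.
The term names, the WIDER table, and the set of supported terms are all
made up for illustration; they are not taken from any actual IVOA
vocabulary, and a real client would read the wider-term relations from
the published vocabulary rather than hard-code them.

```python
from collections import deque

# Hypothetical excerpt of a reference position vocabulary: each term
# maps to the list of its wider (more general) terms.
WIDER = {
    "hst": ["low-earth-orbit"],
    "low-earth-orbit": ["earth-vicinity"],
    "earth-vicinity": [],
}

# Hypothetical set of terms this particular client has code for.
SUPPORTED = {"low-earth-orbit"}


def resolve_term(term, wider, supported):
    """Return term if it is supported, else the nearest wider term
    that is supported, else None.

    The search is breadth-first, so terms fewer steps up the
    hierarchy are preferred over more general ones.
    """
    queue = deque([term])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in supported:
            return current
        # Unknown terms simply have no wider terms to try.
        queue.extend(wider.get(current, []))
    return None


if __name__ == "__main__":
    # No HST-specific code here, so we fall back to the orbit class.
    print(resolve_term("hst", WIDER, SUPPORTED))
```

With a plain enumeration, a client that has never heard of "hst" can
only give up; with the hierarchy, it can still apply its low earth orbit
treatment and accept the loss of precision.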

> 2) Should this vocabulary be specific to MANGO or should it have a
> wider scope?

Vocabularies should be designed to model a particular part of
reality.  Anything dealing with that part of reality should be
able to use that model.  Hence: No vocabulary should be marked as
MANGO-private, and when designing them, please keep reusability in
mind.

Since we have so many notions of calibration levels (see the EPN-TAP
spec for how bad the situation is in solar system science alone), I
would particularly like a shared vocabulary there.

> 3) Should it be included in some existent IVOA vocabulary?

Included?  Probably not.  If no existing vocabulary does what you
need, create a new one.  Semantic resources in general are easier to
use when they are small and cover only small parts of reality,
and become harder as their extension (the subset of the world they
cover) grows.

On the other hand, you should avoid creating vocabularies that in
some sense share concepts with other vocabularies without a very good
reason (such a very good reason might exist, for instance, for having
object-type in addition to the full UAT, because the restriction to
object types may allow you to use a more rigorous formalism).

> 4) What is the process to create a new vocabulary (a simple VEP?)?

You write down the terms and definitions, ask here for opinions,
and explain in your WD how to use the vocabulary.  After that, we
upload the vocabulary to the vocabulary repo with all terms marked as
preliminary, and it is developed alongside your WD; at that point it
can still be arbitrarily edited and even be withdrawn.  It is
community-reviewed together with the PR and as part of your
standard's RFC.  When your standard becomes REC, the vocabulary
becomes stable, and only then do VEPs come into play.

            -- Markus