[vodml] EnumLiterals and vodml-id

Thu Jan 14 09:30:10 CET 2016

Hi,

I feel a bit bad for keeping this discussion alive since it's quite a
bit removed from where I stand.  But with the use cases proposed here
this is entering a domain I do care about; cf. my mail in May,
http://mail.ivoa.net/pipermail/dm/2015-May/005180.html.

Back then the discussion ended (apparently off-list) in me conceding
that there is a use case for enum-like structures in a data model;
very roughly, the line is where there'd be a switch statement on the
labels in (well-designed) code.

That's an important distinction, because for such things changes in
the domain of admitted values necessarily mean code changes; hence,
the domain *must* be part of the DM.  That is explicitly not the
case for many vocabulary-type applications; typically, you won't have
to change your program just because there's a new spectral band (say,
someone splitting up submm, or we include particles or gravitational
waves) or creation type, hence...

On Wed, Jan 13, 2016 at 05:49:06PM -0500, CresitelloDittmar, Mark wrote:
> Generally this is not an issue since enumerations are typically a simple
> value set, but in the DatsetMetadata model, we already have 4 examples
> which violate this pattern.
>    SpectralBand:
>      "X-ray"
>      "Gamma-ray"
> 
>    CreationType:
>      "catalog extraction"
>      "spectral extraction"

...I doubt these should be enums in the first place.  What code would
a switch statement control here?  From my registry experience -- and
we have the creation type and the spectral band there, too -- I
would strongly advise against putting creation type in an enum
("baking it into the DM").

Don't ask how many times I've cursed the baked-in enums in, say,
vr:ContentLevel, needlessly hindering evolution -- no code would
break if you're changing the vocabulary, but you cannot because
schema changes (or DM changes) are expensive, even if they'd not
break any software except through the namespace change.
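
To make the distinction concrete, here is a minimal sketch in Python.
All names in it (Interpolation, interpolate, SPECTRAL_BANDS,
select_by_band) are hypothetical and not taken from any IVOA model,
and the band list is illustrative only.  The first half is a closed
domain where code branches on each literal; the second half is an
open word list that generic code merely passes around.

```python
from enum import Enum


class Interpolation(Enum):
    """Hypothetical closed domain: every literal needs its own code path."""
    NEAREST = "nearest"
    LINEAR = "linear"


def interpolate(kind, x0, y0, x1, y1, x):
    # The "switch statement": a new literal cannot work without a new
    # branch, so the set of literals has to be part of the data model.
    if kind is Interpolation.NEAREST:
        return y0 if abs(x - x0) <= abs(x1 - x) else y1
    if kind is Interpolation.LINEAR:
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"unhandled interpolation: {kind}")


# Open domain: the terms are data (a plain set standing in for a
# vocabulary loaded from elsewhere); no code branches on single terms.
SPECTRAL_BANDS = {"Radio", "X-ray", "Gamma-ray"}


def is_known_band(term, vocabulary=SPECTRAL_BANDS):
    # Validation is a membership test against whatever the vocabulary
    # currently contains.
    return term in vocabulary


def select_by_band(records, band):
    # Generic filter: works unchanged for any term, old or new.
    return [r for r in records if r["band"] == band]
```

Adding a literal to Interpolation forces a change to interpolate(),
whereas adding a term to SPECTRAL_BANDS changes no code at all; that
is the case where a vocabulary is the better fit.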

So, I consider this an abuse of enums, re-raising exactly the concerns
I uttered in May.

[While I'm speaking, but that's a side issue here, really: I've still
not understood what the case for putting presentation (strings
ostensibly intended for display) into the data model is in the first
place.  It'd help me feel better about titles on enums if someone
proposed one.]

> I really can't speculate what things might be enumerated in future models,
> but some examples of things which would not pass this pattern test:
>   + anything with multiple words
>         o CreationType above
>         o Country:  "Papua New Guinea"

This is another classic for what IMHO should not be modelled in an
enum.  You don't want to have to change your DM (and hence, at least
conceptually, any software using it) every time there's a new
country.  And what's the switch statement that would do something
plausible with such a thing?

>         o Institute: "Centre de Données astronomiques de Strasbourg"

Why would institutes come in an enum, i.e., what's a plausible switch
statement here?  Note that that would mean even well-designed
programs would have to be changed when there's a new institute or
even just if the CDS changed its long form (which has happened
before).  Again, I'd suggest this is about as clear a case for using
a vocabulary as they come.

So I guess my message here is: Based on experiences in Registry I'd
ask everyone involved in modelling at this point to have a look into
VO-DML's SemanticConcept.

Gerard: Perhaps a very quick cheat sheet on "How to make a vocabulary
accompanying a data model and how topConcept can be used to keep the
number of files down" might be helpful?  I don't really know how this
integrates into common modelling tools, but it'd be great if this
could become the obvious tool when all people want is word lists;
more obvious than enums, anyway.
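
As a rough illustration of the topConcept idea, the following Python
sketch keeps two word lists in one toy vocabulary and uses a top
concept to say which branch an attribute may draw from.  It assumes
that topConcept restricts admissible terms to the named concept and
everything narrower than it; the exact semantics should be checked
against the VO-DML document.  The BROADER mapping, its terms, and
is_under() are made up for this example.

```python
# Toy stand-in for a single vocabulary file: term -> broader term.
# None marks a top concept.  All terms are invented for illustration.
BROADER = {
    "spectral-band": None,
    "x-ray": "spectral-band",
    "gamma-ray": "spectral-band",
    "creation-type": None,
    "catalog-extraction": "creation-type",
    "spectral-extraction": "creation-type",
}


def is_under(term, top_concept, broader=BROADER):
    """Return True if term is top_concept or has it as an ancestor."""
    seen = set()  # guards against accidental cycles in the mapping
    while term is not None and term not in seen:
        if term == top_concept:
            return True
        seen.add(term)
        term = broader.get(term)
    return False
```

With this, an attribute tied to "spectral-band" accepts "x-ray" and
rejects "catalog-extraction", although both live in the same file.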

Cheers,

         Markus