[vodml] EnumLiterals and vodml-id

Thu Jan 14 16:35:31 CET 2016

Hi Mark
I will also continue the discussion, and I hope you will do so too, without your chair hat on, as I think not all of the points brought forward have been answered.

> -----Original Message-----
> From: CresitelloDittmar, Mark [mailto:mdittmar at cfa.harvard.edu]
> Sent: Wednesday, January 13, 2016 5:49 PM
> To: Gerard Lemson <glemson1 at jhu.edu>
> Cc: Data Models mailing list <dm at ivoa.net>
> Subject: Re: [vodml] EnumLiterals and vodml-id
> 
> All,
> 
> 
> chair hat on:
> 
>   I've certainly been the most vocal on this topic.. but don't want to gum up the
> works.  I'm going to summarize my position here, then step back a bit.  I suggest
> we give the topic a week or so for other opinions/discussion to take place, then
> select a course of action (say ~ 1/22 or 1/29) so Gerard can make whichever
> adjustments need to be made.
> 
> chair hat off:
> > Mark, Arnold, any problems simply making the 'name' of the enumliterals
> conform to these rules?
> 
> 
> Basically yes, I don't think this is what we want.
> 
> This option means that all vo-dml compliant models must only use
> EnumerationLiterals which conform to the vodml-id pattern '[a-zA-Z_][\w_]*'.
> 
> Since the value is directly tied to the definition of the EnumerationLiteral, these
> are the values which data providers MUST give when representing these literals.
> 
No, this is absolutely not true.
Data providers SHOULD annotate the enumerated values that appear in their own data sets with vodml-refs identifying the VO-DML EnumLiteral in some model that (best) represents their value. That's it. 
Nowhere in our proposal is there a statement that VO-DML models MUST be implemented 1-1. In fact, the whole effort started from the desire for an annotation mechanism based on formal data models that can also annotate existing/legacy data sets.

> 
> Generally this is not an issue since enumerations are typically a simple value set,
> but in the DatasetMetadata model, we already have 4 examples which violate this
> pattern.
> 
>    SpectralBand:
>      "X-ray"
>      "Gamma-ray"
> 
> 
>    CreationType:
> 
>      "catalog extraction"
> 
>      "spectral extraction"
> 

For the record I want to note that these examples were not invented by data providers but by IVOA data modelers, who could easily create values that are human readable and nevertheless conform to rules that make the values usable in software contexts. 
And by human readable I do not necessarily mean that any human should, just from the value, be able to infer the meaning. 
These models are meant to be used by other data modelers, or by coders aiming to write software that can understand (say) VOTables annotated with these terms.
All those people MUST be expected to read the actual data model and understand the concepts represented by these names. This is why there is a description attribute on ReferableElement and this is why we will generally write an extra document for our specifications apart from the machine readable VO-DML/XML.

Nor can we expect that a particular value in our model will correspond exactly to the value used by some data provider in their data, so they'll need a mapping from value to value anyway. One possibility for this is given in the mapping document (latest version:
https://volute.g-vo.org/svn/trunk/projects/dm/vo-dml/doc/MappingDMtoVOTable-v1.0-201506xx.docx and .pdf).
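
To make the point about annotation versus value concrete, here is a minimal sketch of such a value-to-value mapping. It is only an illustration: the table, the `ds` prefix, the literal ids and the `annotate` helper are all hypothetical, and this is not the mechanism defined in the mapping document.

```python
# Hypothetical provider-side lookup: the provider's own stored value on the
# left, the vodml-ref of the EnumLiteral that best represents it on the right.
# The "ds" prefix and the literal ids are made up for illustration.
LOCAL_TO_VODML_REF = {
    "X-ray": "ds:SpectralBand.XRay",
    "xray": "ds:SpectralBand.XRay",
    "Gamma-ray": "ds:SpectralBand.GammaRay",
}


def annotate(value):
    """Return (value, vodml_ref); vodml_ref is None if no literal is mapped."""
    return value, LOCAL_TO_VODML_REF.get(value)


# The provider keeps serving its own value; only the annotation points
# at the model.
print(annotate("X-ray"))
print(annotate("radio"))
```

The stored value never has to match the vodml-id pattern; only the right-hand side of the table does.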

> 
> I really can't speculate what things might be enumerated in future models, but
> some examples of things which would not pass this pattern test:
> 
>   + anything with multiple words
> 
>         o CreationType above
> 
>         o Country:  "Papua New Guinea"
> 
>         o Institute: "Centre de Données astronomiques de Strasbourg"
> 
>         o StellarClass: "Brown Dwarf", "Red Giant", "Neutron Star", "X-ray Binary
> Star"
> 
>   + anything containing '.' or '-' or non-word character, or starting with a number
> 
>        o see SpectralBand above
> 
>        o Quasar: "SDSS J1004+4112", "QSO B1359+154", "3C 273",  "PKS 0637-752"
> 
> 
> Again, I'm not suggesting any of these would be enumerated, only giving the
> flavor.

But this is precisely one of the problems I have with many examples in our discussions: they are built around straw-man models.
I believe that in our discussions we should try to avoid using those.

In my opinion, none of these cases should appear as an Enumeration in any VO-DML model. Countries are ObjectTypes, as are Institutes.
Stellar class is likely a <<semanticconcept>>, your Quasars are identifiers. SpectralBand is defined already in the Photometry DM (as PhotometryFilter) as an object type. Your CreationType should really be a reference to an object type defined to model provenance properly. 
But in any case this last example is a concept that most likely does not occur very commonly in data providers' databases, as it represents a concept in a model built specifically for the IVOA. Hence users would most likely have to transform their values anyway. 

To me it sometimes seems as if some expect VO-DML to be a language that can be naively translated (say using a code generator) into code AND then be immediately used to create a UI for end users who we do not expect to understand VO-DML data models. That is simply NOT the goal of VO-DML and there is no benefit in relaxing restrictions so that that becomes possible.

Similarly, it seems that in the cardinality discussion there is a desire for models that, when translated naively, yield efficient and highly optimized code or data structures. Again, this is not the goal of VO-DML; optimization is the task of the implementer.

> All of these would be valid EnumerationLiterals in standard UML, but not be
> compliant vodml EnumLiteral-s.
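
For reference, the pattern quoted earlier in this thread can be checked against the example values with a few lines of Python. This is a minimal sketch: the helper name is hypothetical, and restricting `\w` to ASCII via `re.ASCII` is an assumption made here, not something the thread specifies.

```python
import re

# Pattern as quoted in the thread. re.ASCII limits \w to [a-zA-Z0-9_];
# that restriction is an assumption of this sketch.
VODML_ID = re.compile(r"[a-zA-Z_][\w_]*", re.ASCII)


def is_vodml_id(value):
    """True if the whole string matches the vodml-id pattern."""
    return VODML_ID.fullmatch(value) is not None


examples = [
    "X-ray", "Gamma-ray", "catalog extraction", "spectral extraction",
    "Papua New Guinea", "3C 273", "XRay", "CatalogExtraction",
]
for value in examples:
    print(f"{value!r}: {is_vodml_id(value)}")
```

Values with spaces, hyphens or a leading digit fail the check, while the CamelCase forms pass.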

Just for the record: the fact that UML allows one to create models with such enumerations does not mean that such models should be created or used by us. We are NOT using standard UML. We're defining a simpler, more restricted language for use in *our* IVOA context. 
One motivation for restrictions such as these is that (hopefully) they make one consider whether the concepts that one tries to represent as an Enumeration(Literal) should not be modelled differently. Above I argued that, for the examples given, they should be.
Similar motivations are at play in some of the other discussions we're having (cardinalities etc). The restrictions are there to assist (if not force) one in making certain choices rather than others that might be available if we leave too much freedom for redundancy.

Cheers
Gerard

> 
> They would need to be converted to a vodml compliant form:
> 
>   + using CamelCase
> 
>         "XRay", "CatalogExtraction", "PapuaNewGuinea",
> "CentreDeDonneesAstronomiquesDeStrasboug", etc
> 
> 
>   + others more tricky
> 
>         "SDSS_J1004p4112", "PKS_0637d752"
> 
> 
>   + not sure what we'd do with a numeric.. perhaps just a sequence ["A", "B".. ]
> 
> 
> Whatever the mechanism, most of these would still be understandable as far as the
> tag goes.
> 
> My concern is that since EnumerationLiterals are unique in that the value is
> directly tied to the literal, these are the values data providers would have
> to give in query responses, and data products when referencing these
> literals.  I don't think that is good.
> 
> 
> The second option proposed would
> 
>   + allow the definition of a vodml pattern compliant name
> 
>   + and associate it with the real-world literal value (label|title|fullName)
> 
> 
> Which I think is better.
> 
> 
> 
>