PR#2 for Provenance DM

Fri Sep 6 09:43:49 CEST 2019

Hi Gerard,

On Thu, Sep 05, 2019 at 06:32:20PM +0000, Gerard Lemson wrote:
[provDM relevant stuff moved up here]
> One more thing. VO-DML does not say too much about *how* the
> attribute's value should identify the semantic concept.  This is a
> serialization issue. May also depend on the nature of the
> vocabulary maybe? In SKOS we assumed a URI.  Or, if the
> vocabularyURI was set in the model, the prefLabel of the concept
> might be sufficient.

I'm a bit relieved you consider the question of term representation
rather open, because as I said I'd rather not dump long URIs into
fields humans might look at unless we had to (and even then I'd much
rather introduce "readable" abbreviations like, say, CURIEs with
sensible prefix definitions).

In the presence of SemanticConcept/@vocabularyURI, however, we
clearly don't *need* full URIs because the vocabularyURI is a fixed
prefix.  I'd still not use the prefLabel to identify the concrete
term, because you can't use labels to build the URIs with which you
can go and look up the term's properties (plus, at least in
principle, there are no uniqueness guarantees on labels).

So, my take would be that somewhere we should have some language
like:

  When SemanticConcept/@vocabularyURI is given, the full term URI is
  obtained by concatenating the vocabularyURI with the attribute
  value.
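
To make the proposed rule concrete, here is a minimal sketch in
Python.  The function name, the vocabulary URI and the term are all
made up for illustration; nothing here is taken from VO-DML or
ProvDM.  It also assumes that the vocabularyURI already ends with
its separator ("#" or "/"), since the rule is plain concatenation.

```python
def full_term_uri(vocabulary_uri: str, value: str) -> str:
    """Return the full term URI for an attribute value.

    This is plain string concatenation, as in the proposed wording.
    Assumption: vocabulary_uri already ends in '#' or '/'.
    """
    return vocabulary_uri + value


# Hypothetical vocabulary URI and term, for illustration only.
EXAMPLE_VOCABULARY_URI = "http://example.org/rdf/somevocab#"

if __name__ == "__main__":
    print(full_term_uri(EXAMPLE_VOCABULARY_URI, "someTerm"))
```

The point is that the annotation only carries the short "someTerm",
while a client can still build a URI to look up the term's
properties.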

VO-DML itself would be a good place.  Perhaps we could put it on the
VO-DML_1_0-Next page (that should exist anyway by DocStd 2.0) for now?

And with this, ProvDM would just put the vocabulary URIs into the
VO-DML, mention them in the text, perhaps mention the initial terms
(people wanted that in VOResource), and then have nice-looking, short
terms in their annotations.  I, for one, like it.

Warning to ProvDM folks: SimDM stuff only below here

> > [I wrote:]
> > -- the topConcept part would be interesting if pulling things from, say, the
> > UAT (which, incidentally, might be a good idea in SimDM), but I guess that's
> > irrelevant here.
> > 
> Note that topconcept was indeed introduced in SimDM (then called
> broadestConcept) upon Norman Gray's suggestions.  He suggested that
> in the world of semantic (then SKOS) vocabularies one should be
> able to declare that an attribute should have a value that
> identifies a concept in some vocabulary, and that that concept must
> be narrower than the declared (top)concept. I.e. no restriction to

Yes, that's what I meant by my short SimDM remark.  For instance,
in SimDM, you have a vocabulary for astronomical object types, and
I'm sure it's eminently useful to have such a vocabulary beyond
SimDM (e.g., to make SSA's Target.Class field useful).

So far, the plan has been to have a custom IVOA vocabulary for those
(draft at http://ivoa.net/rdf/theory/AstronomicalObjects),
and that's perhaps the most straightforward way to go.  On the other
hand, the UAT (http://astrothesaurus.org) probably already has almost
all of the terms we have (and more).

Now, if the UAT were to introduce a term "Object Type" and made all
object types narrower than that, you could use the UAT as
vocabularyURI and the hypothetical uat:ObjectType as top concept.
Except for an occasional filling of gaps (that would benefit the
entire community on top) we'd not have to do a thing and get an
up-to-date extensive list of object types and their hierarchy for
free.
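
A validator for such a top concept constraint could look roughly like
the following sketch.  It assumes the vocabulary's "broader" relations
have already been loaded into a dict; the function name and all terms
in the example (including uat:ObjectType, which is hypothetical
anyway) are invented for illustration.  I've chosen to require
*strictly* narrower terms here, so the top concept itself is not
accepted; that's a design choice, not something any spec says.

```python
def is_narrower_than(term, top_concept, broader):
    """Return True if top_concept is reachable from term via broader links.

    broader maps each term to a list of its directly broader terms.
    The term itself does not count as narrower than itself.
    """
    seen = set()
    # Start from the parents so that term == top_concept is rejected.
    stack = list(broader.get(term, ()))
    while stack:
        current = stack.pop()
        if current == top_concept:
            return True
        if current in seen:
            continue  # guard against cycles and repeated work
        seen.add(current)
        stack.extend(broader.get(current, ()))
    return False


# Invented toy hierarchy; these are not actual UAT terms.
BROADER = {
    "uat:Galaxy": ["uat:ObjectType"],
    "uat:SpiralGalaxy": ["uat:Galaxy"],
    "uat:Photometry": ["uat:Technique"],
}

if __name__ == "__main__":
    print(is_narrower_than("uat:SpiralGalaxy", "uat:ObjectType", BROADER))
    print(is_narrower_than("uat:Photometry", "uat:ObjectType", BROADER))
```

With the toy hierarchy, the spiral galaxy term passes the check and
the photometry term does not.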

> a given vocabulary.  This is very loose I think and may not be too
> useful for simple query engines that would prefer some predefined
> (though maybe extendable) list of valid values.  To support the
> latter purpose we have the vocabularyURI.

> (Markus, do I read from this that you would be happier with the
> vocabularies in SimDM1.1 if we would only have a topconcept in our
> standard, rather than a vocabulary URI?

It depends.  Where most of the terms are in UAT with similar use
cases (and beyond object types I suspect physical processes and
physical quantities might be), I'd say it'd be preferable to re-use
what they have and improve it.

For data object types and algorithms, I suspect the UAT's goals are
so far removed from yours (and probably ours in the wider IVOA) that
it makes perfect sense to have vocabularies of our own (though in
particular for data object types -- which might find use, e.g., in
obscore's dataproduct_subtype -- I'd prefer straight RDFS because of
the clearer semantics).

Lest SimDM authors panic: all in the department of "In A Perfect
World"; I'm not saying semantics would veto a new SimDM if you keep
your own vocabularies.  It'd still be great if folks from the Theory
IG could look into whether the UAT might help them.  

Vocabulary work is hard, and if a duplication of effort can be
avoided, everyone profits.

          -- Markus