Comments on Vocabularies in the VO 2.0

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Fri Dec 13 09:47:49 CET 2019


Hi Alberto,

Thanks for your feedback.  Here are some thoughts from me, with my
Semantics chair hat off for most paragraphs:

On Thu, Dec 12, 2019 at 01:44:30PM -0800, Accomazzi, Alberto wrote:
> * Given that the UAT exists and is being used as a source for concepts
> describing the literature and IVOA registry resources, how does the IVOA
> community see its role, if any, in the new vocabulary specification?

I certainly would like to see uptake of the UAT wherever applicable.
That already concerns VOResource (where I hope the compliance of the
resource records will improve as journal keyword systems evolve
towards UAT; but see footnote 1 below).

The current Vocabularies WD, on the other hand, has its central focus
on defining technology and processes for the smallish, rather formal
vocabularies we've been introducing in several standards (Datalink,
VOResource, VOTable).

However, in defining a subset of SKOS that, I hope, will be easy to
digest for clients, we have of course looked towards the UAT (and
*perhaps* similar vocabularies), so they are in scope, and you're
right that we have to understand the implications for them before
going on.

> * Do you see the UAT as a source of vocabulary for the IVOA to be somehow
> incorporated in the proposed framework, or is it to be used as an external
> resource?

So far, I've considered it to be an external resource beyond IVOA's
control.  

I am fairly sure the UAT governance (in particular the journals) will
not want to follow IVOA procedures or be subject to TCG review, and
hence I'd expect the UAT to remain external in that sense for the
foreseeable future.

On the other hand, since the UAT is referenced by at least one VO
standard, I acknowledge that it might deserve some sort of
first-class citizen status.  And, perhaps more pertinently, it would
clearly be nice if VO tools didn't need special tooling to process
UAT terms (over dealing with other IVOA vocabularies).

In that sense, it seems to me the problem isn't really political in
nature (the IVOA Exec, at least, would certainly not object to endorsing
the UAT).  It seems to me, to a large degree, to be technical.

> * If there is going to be an "IVOA UAT version," what would be the process
> to coordinate this vocabulary with the further development of the UAT?

Once we have, for instance, validator software actually evaluating
semantic annotation as envisioned in the current WD, one location
they should be working on is VOResource's subject field, where it
says: "Terms for Subject should be drawn from the Unified Astronomy
Thesaurus", and hence a warning could be issued if they are not.  As
I said, it would be preferable if they didn't have to special-case
the UAT.
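
To make the validator idea concrete, here is a minimal sketch in
Python.  It assumes the vocabulary is already available as a plain set
of accepted terms; the function name check_subjects and the term
spelling used in the usage example are hypothetical, since how UAT
terms should be written in the subject field is not settled.

```python
import warnings


def check_subjects(subjects, vocabulary_terms):
    """Return the subjects that are not in vocabulary_terms.

    A warning (not an error) is issued for each unknown subject,
    matching the "should" in the VOResource wording.
    """
    unknown = [s for s in subjects if s not in vocabulary_terms]
    for subject in unknown:
        warnings.warn(f"Subject {subject!r} is not drawn from the vocabulary")
    return unknown
```

A caller would pass the subjects of one resource record and the set of
known terms, for instance check_subjects(["cold-dark-matter"], terms).
Nothing in the function is specific to the UAT, which is the point of
not special-casing it.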

I can see two options for how to do that:

(1) As soon as there's a UAT release, a piece of software produces a
version of it following the IVOA rules for how vocabularies need to
be deployed and formatted.

(2) The UAT itself follows these rules.

Option (1) sounds a little unattractive to me mainly because we'd be
doubling the number of concepts out there (remember, each URI
introduces a new concept, and the URIs in the IVOA copy would
certainly be different from the UAT ones).  On the other hand, this
could be mitigated a bit by linking IVOA concepts and UAT concepts
with skos:exactMatch.
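
As a rough illustration of that mitigation, the following Python sketch
writes one skos:exactMatch statement in N-Triples syntax.  The helper
name exact_match_triple is hypothetical, and so is whatever IVOA-side
URI gets passed in; only the SKOS property URI is fixed by the SKOS
standard.

```python
# The skos:exactMatch property, spelled out as a full URI.
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def exact_match_triple(ivoa_uri, uat_uri):
    """Return an N-Triples line linking an IVOA concept to a UAT concept."""
    return f"<{ivoa_uri}> <{SKOS_EXACT_MATCH}> <{uat_uri}> ."
```

A conversion tool would emit one such line per concept, so clients
that care can still tell that the two URIs denote the same thing.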

Option (2) seems hard to me.  Although of course the "IVOA rules" are
still negotiable, I suspect the requirements on the two ends are too
different. For instance, I'd expect that the UAT wants many of the
fancier features of SKOS (multi-language labels, hidden labels,
perhaps closeMatch and broadMatch relations to other
vocabularies...), whereas we want maximally simple and predictable
consumability by client software or, for instance, human-readable
concept URIs.

The deal with these human-readable concept URIs is that the UAT
currently follows best... well... librarian practice in having
language-agnostic concept URIs.  For instance, the concept "cold dark
matter" is identified as http://astrothesaurus.org/uat/265.  This is
done (mainly) in order to not privilege any specific language.  It
works admirably well in Wikidata, but on the other hand is a certain
stumbling block for newcomers there.

In the IVOA, however, we're privileging English anyway, to the point
of basically excluding any other language in the current Vocabulary
draft (as in, for that matter, VOResource).  If we didn't have these
"basically human readable" concept URIs (in the example, it would
probably be http://www.ivoa.net/rdf/uat#ColdDarkMatter or perhaps
#cold-dark-matter, as we prefer), we couldn't really include
concepts into table columns as in datalink or RegTAP[1] (well, we
could, but then software clients would again have a much harder life
to translate these things into something humans would want to look
at).
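
To show what deriving such fragments could look like, here is a
minimal sketch in Python.  It assumes the fragment is built from an
English preferred label alone; the function names are hypothetical,
and which of the two spellings to use is exactly the open choice
mentioned above.

```python
import re


def label_to_hyphenated(label):
    """Turn a label like "cold dark matter" into "cold-dark-matter"."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return "-".join(words)


def label_to_camel_case(label):
    """Turn a label like "cold dark matter" into "ColdDarkMatter"."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return "".join(word.capitalize() for word in words)
```

Both functions drop punctuation and non-ASCII characters, so distinct
labels can collide; a real tool would have to detect that.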

After these considerations, I am becoming convinced that the "have a
reformatted, English-only UAT copy at IVOA with skos:exactMatch links
to upstream" option (1) above, albeit unattractive at first, is
actually helpful and a good idea.  I'd also volunteer to write the
associated software.

Of course, I might be completely mistaken in some of my assessments.
So, perhaps we should have some live conversation on that?  Much as I
despise telecons, I'd be available for one, perhaps in the week Jan
13-16?

Thanks again for the feedback -- this is, indeed, an omission in the
current draft, and I've largely missed the implications so far.

          -- Markus


[1] That's actually a skeleton in the closet that I only noticed when
working on the Vocabularies WD: VOResource says "use UAT", but it
doesn't say how.  I don't think we want UAT URIs in VOResource
<subject>.  The numbers I like even less.  But then do we want the
preferred labels?  But are these stable enough?  As soon as we've worked
out the Vocabularies business, we'll have to have a clarification for
that.  It's not overly urgent, though, as it seems everyone is
ignoring the UAT "should" on VOResource content/subject so far.

