UAT in VOResource

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon May 25 10:40:12 CEST 2020


Dear Registry, dear Semantics,

[warning: long mail]

In its last meeting, the UAT steering committee has agreed to let us
have an IVOA-ised version of the Unified Astronomy Thesaurus
(original: http://astrothesaurus.org; IVOA rendition:
http://www.ivoa.net/rdf/uat).  See below ("Reminder") on how we got
to want this in case you're missing the big picture here.

With this, it's now on us to figure out how to fulfill VOResource
1.1's promise to "use the UAT" and then make use of that.

Below, I'm outlining the four steps that I see we have to take to get
there.  I'd be very grateful for comments, advice, or volunteers on
any of this.


(1) Reaching consensus on the term form
=======================================

I am almost entirely sure we will have to use the "terms" (i.e., the
fragments in the term URIs, e.g., virtual-observatories) in
VOResource's subject field.  Hence, their form is user-visible to
some degree (e.g., in the RegTAP rr.res_subject table).  On the other
hand, it's constrained by URI syntax.

In the current UAT "technology study", the standard rule to make such
a term is to replace all non-alphanumeric material in the labels with
dashes (unless there are collisions; and the mapping stays constant
after term creation). This gives terms like virtual-observatories,
active-galactic-nuclei, low-mass-x-ray-binary-stars, or
type-ia-supernovae.  UIs still ought to translate these to their
(UAT English preferred) labels, but as I said, people will interact
with them outside of such mapping clients.
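
To make the rule concrete, here is a minimal sketch of it in Python.
It is not the code used for the IVOA rendition: the function name is
made up, lowercasing is an assumption inferred from the example terms,
and collision handling is left out entirely.

```python
import re


def label_to_term(label: str) -> str:
    """Turn a label into a URI-fragment-style term (hypothetical helper)."""
    # Assumption: lowercase first, since the example terms are all lowercase.
    lowered = label.lower()
    # Replace every run of non-alphanumeric characters with a single dash.
    dashed = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Assumption: dashes left over at either end are dropped.
    return dashed.strip("-")


for label in ["Virtual observatories", "Type Ia supernovae"]:
    print(label, "->", label_to_term(label))
```

Note that this sketch also turns non-ASCII letters into dashes; whether
that is wanted would be part of what needs to be agreed on.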

So... can we agree on this "term syntax"?  Do we want something else?


(2) Adopting the IVOA-flavoured UAT
===================================

The WD on Vocabularies in the VO 2 says that adopting an external
vocabulary requires an endorsed note setting forth processes and
conventions.  I'd draft that, perhaps even starting as soon as people
are reasonably happy with what's done in UAT mapping now, as a proof
of concept for what VocInVO2 says on externally managed vocabularies.


(3) VOResource explanation on the UAT
=====================================

Once we agree on what the IVOA-flavoured UAT looks like, the next
question is what we do in VOResource 1.1.  The meagre "Terms for
Subject should be drawn from the Unified Astronomy Thesaurus
(http://astrothesaurus.org)." at least needs *some* form of
explanation.

My first proposal here would be to have an erratum that changes that
annotation to something like 

  Subject keywords should be terms from the IVOA rendition
  http://www.ivoa.net/rdf/uat of the Unified Astronomy Thesaurus
  (http://astrothesaurus.org).  Use the fragment part of the term's
  URI (i.e., virtual-observatories for
  http://www.ivoa.net/rdf/uat#virtual-observatories, which is the
  same as http://astrothesaurus.org/uat/1774).
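
For illustration, pulling the subject keyword out of a term URI as
described in that proposed annotation needs nothing beyond standard URL
parsing.  A small sketch in Python (the function name is hypothetical):

```python
from urllib.parse import urlsplit


def subject_from_term_uri(uri: str) -> str:
    """Return the fragment part of a vocabulary term URI."""
    fragment = urlsplit(uri).fragment
    if not fragment:
        # URIs like http://astrothesaurus.org/uat/1774 carry no fragment.
        raise ValueError(f"no fragment in {uri!r}")
    return fragment


print(subject_from_term_uri("http://www.ivoa.net/rdf/uat#virtual-observatories"))
```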

As to timing, I'd say this would have to wait until VocInVO is at
least in RFC, ideally REC.

As to content, I think we can think about it already at this point,
in particular whether we keep it at "should" (which would mean that
the RofR validator will eventually issue warnings for non-UAT subject
keywords) and whether we need more explanation, perhaps even in the
main text of VOResource.


(4) Making this useful
======================

A major purpose of the whole effort with subject keywords in general
and using a specific controlled vocabulary in particular is to
finally be able to organise VO resources within a hierarchical,
controlled vocabulary.

Let's be honest, though: It'll be a long, long time before registry
record authors will put UAT keywords into their records in sufficient
numbers to make such an organisation useful.  That's even more true
as long as we're in the vicious cycle of too few "good" keywords to
make keywords useful, hence nothing exploiting them, hence no
incentive to put in compliant keywords, hence only very few of them.

On the other hand, if you execute

  select count(*) as ct, res_subject
  from rr.res_subject
  group by res_subject

on some RegTAP service (just tried on http://reg.g-vo.org/tap),
you'll see that right now there are only ~1000 distinct res_subject
values set, and a lot of them have rather direct matches in the UAT.

Which makes me wonder: Perhaps the three operators of the RegTAP
services around could form an ad-hoc group and agree on a mapping of
existing terms to UAT terms and then add the (ivoid,
term-mapped-to-UAT) pairs to res_subject?  Or perhaps even replace
the mapped values (of course, whatever is not mappable would remain
as-is)?
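
To show what I mean by the two variants, here is a rough sketch in
Python.  Everything in it is made up for illustration: the mapping
entries, the ivoids, the function name, and the case-insensitive lookup
are assumptions, not an agreed-upon mapping or procedure.

```python
# Hypothetical mapping: lowercased legacy subject -> UAT term.
LEGACY_TO_UAT = {
    "agn": "active-galactic-nuclei",
    "virtual observatory": "virtual-observatories",
}


def map_subjects(rows, mapping, replace=False):
    """Map (ivoid, res_subject) pairs to UAT terms where possible.

    With replace=False, the mapped pair is added next to the original;
    with replace=True, it takes the original's place.  Unmappable
    subjects are kept as-is in both cases.
    """
    seen = set()
    result = []

    def emit(pair):
        # Keep the first occurrence of each pair, preserving order.
        if pair not in seen:
            seen.add(pair)
            result.append(pair)

    for ivoid, subject in rows:
        term = mapping.get(subject.strip().lower())
        if term is None or not replace:
            emit((ivoid, subject))
        if term is not None:
            emit((ivoid, term))
    return result


rows = [
    ("ivo://example.org/res1", "AGN"),
    ("ivo://example.org/res1", "my own keyword"),
]
for pair in map_subjects(rows, LEGACY_TO_UAT):
    print(pair)
```

Calling map_subjects(rows, LEGACY_TO_UAT, replace=True) instead would
give the "replace the mapped values" variant.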

I grant you it's probably a bit fringe, but then RegTAP already says
to map deprecated terms (in other vocabularies) before ingestion, so
it's not totally unheard-of.


Reminder: Why do all this?
==========================

The main reason why we want this is that VOResource 1.1 says, in its
annotation for the subject child of vr:Content:

  Terms for Subject should be drawn from the Unified Astronomy
  Thesaurus (http://astrothesaurus.org).

The trouble is that that's not nearly enough for people to do
anything, because it doesn't say just *how* to use it (and sure
enough, as far as I know, nobody is even trying to comply with this).

"Using the UAT" is not trivial because UAT identifiers look like
"http://astrothesaurus.org/uat/1774".  Traditionally, however, our
subjects were human-readable, and clients such as TOPCAT exploit that
fact by matching searches against subjects.  I claim we shouldn't
break that.

We could use the labels from the UAT, which nicely look like what
we've always used ("Virtual observatories", say, for 1774, where
admittedly I'm not so enthusiastic about the plural).  But the labels
aren't guaranteed to be stable, nor even to be unique across different
concepts, so going down that road opens up all kinds of pitfalls.

Hence the plan of defining a stable mapping from English-language
term URIs to UAT URIs -- which then also yields the benefit that you
can use the vocabulary programmatically, without needing RDF tools,
through VocInVO2's mechanisms.


So, again: I'll be grateful for whatever ideas on all this you have
to share.

Thanks,

            Markus

