UAT in VOResource

Baptiste Cecconi ceccobapts at yahoo.fr
Wed May 27 14:01:53 CEST 2020


Hi Markus and all, 

Good to see this going forward. I have recently explored using the UAT a bit while preparing DataCite metadata for data collection DOIs at PADC. 
The DataCite model gives each subject a valueURI, a schemeURI, and a value. Examples:

<subject valueURI="http://astrothesaurus.org/uat/1338" schemeURI="http://astrothesaurus.org">Radio astronomy</subject>
<subject valueURI="http://astrothesaurus.org/uat/1426" schemeURI="http://astrothesaurus.org">Saturn</subject>
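
A minimal sketch of how such subject elements could be generated, assuming Python's standard xml.etree.ElementTree module. The helper name make_subject is hypothetical and not part of any DataCite tooling; only the attribute names and values come from the examples above.

```python
import xml.etree.ElementTree as ET


def make_subject(label, value_uri, scheme_uri):
    """Build a DataCite-style <subject> element (hypothetical helper)."""
    subject = ET.Element(
        "subject",
        {"valueURI": value_uri, "schemeURI": scheme_uri},
    )
    # The element text is the human-readable value.
    subject.text = label
    return subject


element = make_subject(
    "Saturn",
    "http://astrothesaurus.org/uat/1426",
    "http://astrothesaurus.org",
)
print(ET.tostring(element, encoding="unicode"))
```

The same helper would work for a term taken from another scheme: only the two URIs and the label change.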

The good thing with this is that I can refer to another thesaurus or another URI for terms I can't find in the UAT (for instance, the name of the space mission). 

As Laurent noted, the wording of a term may change while its URI stays unchanged. So using the URI seems better for interoperability and sustainability. However, it is clearly not human-readable. 

The overall proposal that you drafted looks OK to me. 

Since I'd like to keep it open to external terms, the use of terms outside the UAT (or its VO flavour) should require the same kind of metadata as in the DataCite model (i.e., a valueURI, a schemeURI, and a human-readable value). Example (excerpt from another DataCite DOI record of mine):

<subject valueURI="https://nssdc.gsfc.nasa.gov/nmc/spacecraft/display.action?id=1997-061A" schemeURI="https://nssdc.gsfc.nasa.gov/nmc/">Cassini Orbiter</subject>

This URI is not strictly a thesaurus term, but it is a URI pointing to the definition of a term. 

Cheers
Baptiste


> On 25 May 2020, at 10:40, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> 
> Dear Registry, dear Semantics,
> 
> [warning: long mail]
> 
> In its last meeting, the UAT steering committee has agreed to let us
> have an IVOA-ised version of the Unified Astronomy Thesaurus
> (original: http://astrothesaurus.org; IVOA rendition:
> http://www.ivoa.net/rdf/uat).  See below ("Reminder") on how we got
> to want this in case you're missing the big picture here.
> 
> With this, it's now on us to figure out how to fulfill VOResource
> 1.1's promise to "use the UAT" and then make use of that.
> 
> Below, I'm outlining the four steps that I see we have to take to get
> there.  I'd be very grateful for comments, advice, or volunteers on
> any of this.
> 
> 
> (1) Reaching consensus on the term form
> =======================================
> 
> I am almost entirely sure we will have to use the "terms" (i.e., the
> fragments in the term URIs, e.g., virtual-observatories) in
> VOResource's subject field.  Hence, their form is user-visible to
> some degree (e.g., in the RegTAP rr.res_subject table).  On the other
> hand, it's constrained by URI syntax.
> 
> In the current UAT "technology study", the standard rule to make such
> a term is to replace all non-alphanumeric material in the labels with
> dashes (unless there are collisions, and the mapping is constant
> after term creation). This gives terms like virtual-observatories,
> active-galactic-nuclei, low-mass-x-ray-binary-stars, or
> type-ia-supernovae.  UIs still ought to translate these to their
> (UAT English preferred) labels, but as I said, people will interact
> with them outside of such mapping clients.
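
A minimal sketch of the label-to-term rule described above, in Python. Lowercasing, collapsing runs of non-alphanumeric characters into a single dash, and trimming leading or trailing dashes are assumptions inferred from the example terms; the collision handling mentioned above is not implemented, and the function name label_to_term is hypothetical.

```python
import re


def label_to_term(label):
    """Turn a label into a URI-fragment-safe term (assumed rule)."""
    # Replace every run of non-alphanumeric characters with one dash.
    term = re.sub(r"[^A-Za-z0-9]+", "-", label)
    # Assumption: terms are lowercase and carry no leading/trailing dash.
    return term.strip("-").lower()


for label in ("Virtual observatories", "Type Ia supernovae"):
    print(label, "->", label_to_term(label))
```

Since labels may change while terms must stay constant after creation, a rule like this would only be applied once, when a term is first minted.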
> 
> So... can we agree on this "term syntax"?  Do we want something else?
> 
> 
> (2) Adopting the IVOA-flavoured UAT
> ===================================
> 
> The WD on Vocabularies in the VO 2 says that adopting an external
> vocabulary requires an endorsed note setting forth processes and
> conventions.  I'd draft that, perhaps even starting as soon as people
> are reasonably happy with what's done in UAT mapping now, as a proof
> of concept for what VocInVO2 says on externally managed vocabularies.
> 
> 
> (3) VOResource explanation on the UAT
> =====================================
> 
> Once we agree on what the IVOA-flavoured UAT looks like, the next
> question is what we do in VOResource 1.1.  The meagre "Terms for
> Subject should be drawn from the Unified Astronomy Thesaurus
> (http://astrothesaurus.org)." at least needs *some* form of
> explanation.
> 
> My first proposal here would be to have an erratum that changes that
> annotation to something like 
> 
>  Subject keywords should be terms from the IVOA rendition
>  http://www.ivoa.net/rdf/uat of the Unified Astronomy Thesaurus
>  (http://astrothesaurus.org).  Use the fragment part of the term's
>  URI (i.e., virtual-observatories for
>  http://www.ivoa.net/rdf/uat#virtual-observatories, which is the
>  same as http://astrothesaurus.org/uat/1774).
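
A minimal sketch of "use the fragment part of the term's URI", assuming Python's standard urllib.parse module. The function name subject_keyword and the decision to reject URIs outside the IVOA rendition are illustrative assumptions, not something the proposed annotation prescribes.

```python
from urllib.parse import urldefrag

IVOA_UAT = "http://www.ivoa.net/rdf/uat"


def subject_keyword(term_uri):
    """Return the fragment of an IVOA UAT term URI (hypothetical helper)."""
    base, fragment = urldefrag(term_uri)
    # Assumption: only URIs from the IVOA rendition yield subject keywords.
    if base != IVOA_UAT or not fragment:
        raise ValueError("not an IVOA UAT term URI: " + term_uri)
    return fragment


print(subject_keyword("http://www.ivoa.net/rdf/uat#virtual-observatories"))
```

An original UAT URI such as http://astrothesaurus.org/uat/1774 has no fragment, so this sketch rejects it; resolving it to the matching term would require the vocabulary itself.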
> 
> As to timing, I'd say this would have to wait until VocInVO is at
> least in RFC, ideally REC.
> 
> As to content, I think we can think about it already at this point,
> in particular whether we keep it at "should" (which would mean that
> the RofR validator will eventually issue warnings for non-UAT subject
> keywords) and whether we need more explanation, perhaps even in the
> main text of VOResource.
> 
> 
> (4) Making this useful
> ======================
> 
> A major purpose of the whole effort with subject keywords in general
> and using a specific controlled vocabulary in particular is to
> finally be able to organise VO resources within a hierarchical,
> controlled vocabulary.
> 
> Let's be honest, though: It'll be a long, long time before registry
> record authors will put UAT keywords into their records in sufficient
> numbers to make such an organisation useful.  That's even more true
> as long as we're in the vicious cycle of too few "good" keywords to
> make keywords useful, hence nothing exploiting them, hence no
> incentive to put in compliant keywords, hence only very few of them.
> 
> On the other hand, if you execute
> 
>  select count(*) as ct, res_subject
>  from rr.res_subject
>  group by res_subject
> 
> on some RegTAP service (just tried on http://reg.g-vo.org/tap),
> you'll see that right now there are just ~1000 distinct res_subject
> values set, and a lot of them have rather direct matches in the UAT.
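
A minimal sketch of how the query above could be sent to a RegTAP service through the synchronous TAP endpoint, using only Python's standard library. It only builds the request URL and performs no network access. The FORMAT value csv is an assumption about what the service supports, and the function name tap_sync_url is hypothetical.

```python
from urllib.parse import urlencode

ADQL = (
    "select count(*) as ct, res_subject "
    "from rr.res_subject "
    "group by res_subject"
)


def tap_sync_url(base_url, adql):
    """Build a TAP synchronous query URL (hypothetical helper)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "csv",  # assumption: the service offers CSV output
        "QUERY": adql,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


print(tap_sync_url("http://reg.g-vo.org/tap", ADQL))
```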
> 
> Which makes me wonder: Perhaps the three operators of the RegTAP
> services around could form an ad-hoc group and agree on a mapping of
> existing terms to UAT terms and then add the (ivoid,
> term-mapped-to-UAT) pairs to res_subject?  Or perhaps even replace
> the mapped values (of course, whatever is not mappable would remain
> as-is)?
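
A minimal sketch of the "add mapped pairs" variant, in Python. The mapping table, the sample ivoids, and the function name add_uat_pairs are all made up for illustration; an agreed mapping would have to come from the RegTAP operators. Unmappable subjects are kept as-is, and matching on the lowercased, stripped subject is an assumption.

```python
def add_uat_pairs(rows, mapping):
    """Return the original (ivoid, subject) pairs plus mapped UAT pairs."""
    result = []
    seen = set()
    for ivoid, subject in rows:
        # Assumption: legacy subjects are matched case-insensitively.
        mapped = mapping.get(subject.strip().lower())
        for pair in ((ivoid, subject), (ivoid, mapped)):
            if pair[1] is not None and pair not in seen:
                seen.add(pair)
                result.append(pair)
    return result


# Hypothetical mapping and records, for illustration only.
mapping = {"agn": "active-galactic-nuclei"}
rows = [("ivo://example/a", "AGN"), ("ivo://example/b", "Cassini")]
for pair in add_uat_pairs(rows, mapping):
    print(pair)
```

The "replace" variant would differ only in dropping the original pair whenever a mapped term exists.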
> 
> I grant you it's probably a bit fringy, but then RegTAP already says
> to map deprecated terms (in other vocabularies) before ingestion, so
> it's not totally unheard-of.
> 
> 
> Reminder: Why do all this?
> ==========================
> 
> The main reason why we want this is that VOResource 1.1 says, in its
> annotation for the subject child of vr:Content:
> 
>  Terms for Subject should be drawn from the Unified Astronomy
>  Thesaurus (http://astrothesaurus.org).
> 
> The trouble is that that's not nearly enough for people to do
> anything, because it doesn't say just *how* to use it (and sure
> enough, as far as I know, nobody is even trying to comply with this).
> 
> "Using the UAT" is not trivial because UAT identifiers look like
> "http://astrothesaurus.org/uat/1774".  Traditionally, however, our
> subjects were human-readable, and clients such as TOPCAT exploit that
> fact by matching searches against subjects.  I claim we shouldn't
> break that.
> 
> We could use the labels from the UAT, which nicely look like what
> we've always used ("Virtual observatories", say, for 1774, where
> admittedly I'm not so enthusiastic about the plural).  But the labels
> aren't guaranteed to be stable and not even unique between different
> concepts, so going down that road opens up all kinds of pitfalls.
> 
> Hence the plan of defining a stable mapping from English-language
> term URIs to UAT URIs -- which then also yields the benefit that you
> can programmatically use the vocabulary without needing RDF tools by
> VocInVO2's mechanisms.
> 
> 
> So, again: I'll be grateful for whatever ideas on all this you have
> to share.
> 
> Thanks,
> 
>            Markus


