UAT in VOResource

Mon May 25 12:20:57 CEST 2020

Hello,

For the record, I have the same issue with the source model (CAB-MSD),
which allows UAT terms to be used as semantic tags.

On 25/05/2020 at 10:40, Markus Demleitner wrote:
> Dear Registry, dear Semantics,
> 
> [warning: long mail]
> 
> In its last meeting, the UAT steering committee has agreed to let us
> have an IVOA-ised version of the Unified Astronomy Thesaurus
> (original: http://astrothesaurus.org; IVOA rendition:
> http://www.ivoa.net/rdf/uat).  See below ("Reminder") on how we got
> to want this in case you're missing the big picture here.
> 
> With this, it's now on us to figure out how to fulfill VOResource
> 1.1's promise to "use the UAT" and then make use of that.
> 
> Below, I'm outlining the four steps that I see we have to take to get
> there.  I'd be very grateful for comments, advice, or volunteers on
> any of this.
> 
> 
> (1) Reaching consensus on the term form
> =======================================
> 
> I am almost entirely sure we will have to use the "terms" (i.e., the
> fragments in the term URIs, e.g., virtual-observatories) in
> VOResource's subject field.  Hence, their form is user-visible to
> some degree (e.g., in the RegTAP rr.res_subject table).  On the other
> hand, it's constrained by URI syntax.
> 
> In the current UAT "technology study", the standard rule to make such
> a term is to replace all non-alphanumeric material in the labels with
> dashes (unless there are collisions, and the mapping is constant
> after term creation). This gives terms like virtual-observatories,
> active-galactic-nuclei, low-mass-x-ray-binary-stars, or
> type-ia-supernovae.  UIs still ought to translate these to their
> (UAT English preferred) labels, but as I said, people will interact
> with them outside of such mapping clients.
> 
> So... can we agree on this "term syntax"?  Do we want something else?

As these terms are not meant to be decoded to recover the original
wording, I agree with this encoding, which has the big advantage of
preserving their human readability.

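To make the rule quoted above concrete, here is a minimal sketch of how a term could be derived from a label. It is only my reading of the rule: the function name label_to_term is hypothetical, I assume that runs of non-alphanumeric characters collapse into a single dash and that the result is lowercased (as in the examples given), and I ignore collision handling entirely.

```python
import re


def label_to_term(label: str) -> str:
    """Turn a label into a URI-fragment-safe term (hypothetical helper)."""
    # Assumption: each run of non-alphanumeric ASCII characters becomes one
    # dash, leading/trailing dashes are dropped, and the term is lowercased.
    return re.sub(r"[^A-Za-z0-9]+", "-", label).strip("-").lower()


for label in ["Virtual observatories", "Active galactic nuclei",
              "Low-mass x-ray binary stars", "Type Ia supernovae"]:
    print(label, "->", label_to_term(label))
```

Since the mapping has to stay constant after term creation, such a function would only be run once per concept; later label changes must not regenerate the term.
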
> 
> 
> (2) Adopting the IVOA-flavoured UAT
> ===================================
> 
> The WD on Vocabularies in the VO 2 says that adopting an external
> vocabulary requires an endorsed note setting forth processes and
> conventions.  I'd draft that, perhaps even starting as soon as people
> are reasonably happy with what's done in UAT mapping now, as a proof
> of concept for what VocInVO2 says on externally managed vocabularies.
> 

The way the UAT is mapped looks fine. My concern is rather how this
mapping will be maintained.

> 
> (3) VOResource explanation on the UAT
> =====================================
> 
> Once we agree on what the IVOA-flavoured UAT looks like, the next
> question is what we do in VOResource 1.1.  The meagre "Terms for
> Subject should be drawn from the Unified Astronomy Thesaurus
> (http://astrothesaurus.org)." at least needs *some* form of
> explanation.
> 
> My first proposal here would be to have an erratum that changes that
> annotation to something like
> 
>    Subject keywords should be terms from the IVOA rendition
>    http://www.ivoa.net/rdf/uat of the Unified Astronomy Thesaurus
>    (http://astrothesaurus.org).  Use the fragment part of the term's
>    URI (i.e., virtual-observatories for
>    http://www.ivoa.net/rdf/uat#virtual-observatories, which is the
>    same as http://astrothesaurus.org/uat/1774).
> 
> As to timing, I'd say this would have to wait until VocInVO is at
> least in RFC, ideally REC.
> 
> As to content, I think we can think about it already at this point,
> in particular whether we keep it at "should" (which would mean that
> the RofR validator will eventually issue warnings for non-UAT subject
> keywords) and whether we need more explanation, perhaps even in the
> main text of VOResource.

I'm not involved in VOResource, but I'll try to be compliant for CAB-MSD.

> 
> 
> (4) Making this useful
> ======================
> 
> A major purpose of the whole effort with subject keywords in general
> and using a specific controlled vocabulary in particular is to
> finally be able to organise VO resources within a hierarchical,
> controlled vocabulary.
> 
> Let's be honest, though: It'll be a long, long time before registry
> record authors will put UAT keywords into their records in sufficient
> numbers to make such an organisation useful.  That's even more true
> as long as we're in the vicious cycle of too few "good" keywords to
> make keywords useful, hence nothing exploiting them, hence no
> incentive to put in compliant keywords, hence only very few of them.
> 
> On the other hand, if you execute
> 
>    select count(*) as ct, res_subject
>    from rr.res_subject
>    group by res_subject
> 
> on some RegTAP service (just tried on http://reg.g-vo.org/tap),
> you'll see that right now there are just ~1000 distinct res_subject
> values set, and a lot of them have rather direct matches in the UAT.
> 
> Which makes me wonder: Perhaps the three operators of the RegTAP
> services around could form an ad-hoc group and agree on a mapping of
> existing terms to UAT terms and then add the (ivoid,
> term-mapped-to-UAT) pairs to res_subject?  Or perhaps even replace
> the mapped values (of course, whatever is not mappable would remain
> as-is)?
> 
> I give you it's probably a bit fringy, but then RegTAP already says
> to map deprecated terms (on other vocabularies) before ingestion, so
> it's not totally unheard-of.
> 
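The "replace the mapped values" variant could look roughly like the sketch below. Everything in it is an assumption made for illustration: the UAT_TERMS sample, the hand-curated MANUAL table, the ivo://example/... identifiers and the function names are hypothetical, and a real mapping would need the full vocabulary plus human review.

```python
import re

# Hypothetical sample of IVOA UAT terms; a real run would load the whole list.
UAT_TERMS = {"virtual-observatories", "active-galactic-nuclei"}

# Hypothetical hand-curated mapping for subjects without a direct match.
MANUAL = {"agn": "active-galactic-nuclei"}


def normalise(subject):
    """Bring a free-text subject into the dashed, lowercase term form."""
    return re.sub(r"[^A-Za-z0-9]+", "-", subject).strip("-").lower()


def map_subjects(rows):
    """Replace subjects that map to a UAT term; leave everything else as-is."""
    mapped = []
    for ivoid, subject in rows:
        candidate = normalise(subject)
        candidate = MANUAL.get(candidate, candidate)
        mapped.append((ivoid, candidate if candidate in UAT_TERMS else subject))
    return mapped


rows = [("ivo://example/a", "Virtual Observatories"),
        ("ivo://example/b", "AGN"),
        ("ivo://example/c", "my favourite topic")]
for pair in map_subjects(rows):
    print(pair)
```

Adding the mapped pairs next to the original ones instead of replacing them would only change the append step.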
> 
> Reminder: Why do all this?
> ==========================
> 
> The main reason why we want this is that VOResource 1.1 says, in its
> annotation for the subject child of vr:Content:
> 
>    Terms for Subject should be drawn from the Unified Astronomy
>    Thesaurus (http://astrothesaurus.org).
> 
> The trouble is that that's not nearly enough for people to do
> anything, because it doesn't say just *how* to use it (and sure
> enough, as far as I know, nobody is even trying to comply with this).
> 
> "Using the UAT" is not trivial because UAT identifiers look like
> "http://astrothesaurus.org/uat/1774".  Traditionally, however, our
> subjects were human-readable, and clients such as TOPCAT exploit that
> fact by matching searches against subjects.  I claim we shouldn't
> break that.
> 
> We could use the labels from the UAT, which nicely look like what
> we've always used ("Virtual observatories", say, for 1774, where
> admittedly I'm not so enthusiastic about the plural).  But the labels
> aren't guaranteed to be stable and not even unique between different
> concepts, so going down that road opens up all kinds of pitfalls.
> 
> Hence the plan of defining a stable mapping from English-language
> term URIs to UAT URIs -- which then also yields the benefit that you
> can programmatically use the vocabulary without needing RDF tools by
> VocInVO2's mechanisms.
> 
For each item, we have three elements:
- the UAT URL
- the word (e.g. active galactic nuclei)
- the term (dashed word)
The concern is that the "word" can change, but if we do not want to
break existing stuff, the "term" must remain persistent even after being
made obsolete by a "word" change.
The straightforward way to do it is to maintain a mapping looking like
this (I'm a JSON fan):
{ "word": "our word",
   "uat_uri": "http://astrothesaurus.org/uat/999999",
   "terms": [ "our-word", "my-word"]
}
and to put on top of this something that redirects both
http://www.ivoa.net/rdf/uat#our-word and
http://www.ivoa.net/rdf/uat#my-word to
http://astrothesaurus.org/uat/999999.
Updating such a mapping could easily be automated.

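As an illustration only, the lookup and update on top of such a mapping could be sketched as follows. The function names (build_index, resolve, rename_word) are hypothetical, the mapping is assumed to be a JSON list of entries like the one above, and I assume the first element of "terms" is the current term while the others are kept for backward compatibility.

```python
import json

IVOA_UAT = "http://www.ivoa.net/rdf/uat#"

# The example entry from above, wrapped in a list (assumed file layout).
MAPPING_JSON = """
[
  {"word": "our word",
   "uat_uri": "http://astrothesaurus.org/uat/999999",
   "terms": ["our-word", "my-word"]}
]
"""


def build_index(entries):
    """Map every term, current or obsolete, to its UAT URI."""
    index = {}
    for entry in entries:
        for term in entry["terms"]:
            if term in index:
                raise ValueError("term used twice: " + term)
            index[term] = entry["uat_uri"]
    return index


def resolve(ivoa_uri, index):
    """Return the UAT URI an IVOA term URI should redirect to, or None."""
    if not ivoa_uri.startswith(IVOA_UAT):
        return None
    return index.get(ivoa_uri[len(IVOA_UAT):])


def rename_word(entry, new_word, new_term):
    """Record a word change while keeping all earlier terms resolvable."""
    entry["word"] = new_word
    if new_term not in entry["terms"]:
        entry["terms"].insert(0, new_term)


entries = json.loads(MAPPING_JSON)
index = build_index(entries)
print(resolve(IVOA_UAT + "our-word", index))
print(resolve(IVOA_UAT + "my-word", index))
```

The point of rename_word is that a word change only ever adds a term; nothing already published stops resolving.
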
This mapping mechanism could be described in VocInVO2 if needed.

Laurent
> 
> So, again: I'll be grateful for whatever ideas on all this you have
> to share.
> 
> Thanks,
> 
>              Markus
> 

-- 
---- English version:
      https://www.deepl.com/

---- Laurent MICHEL              Tel  (33 0) 3 68 85 24 37
      Observatoire de Strasbourg  Fax  (33 0) 3 68 85 24 32
      11 Rue de l'Universite      Mail laurent.michel at astro.unistra.fr
      67000 Strasbourg (France)   Web  http://astro.u-strasbg.fr/~michel
---