[VEP-0001] DataLink semantics vocabulary enhancement proposal

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Nov 13 09:52:38 CET 2019


Hi Pat,

On Tue, Nov 12, 2019 at 09:21:49AM -0800, Patrick Dowler wrote:
> On the ObsCore dataproduct_type and subtype, I also have the feeling there
> that the (optional) subtype isn't a highly useful construct when I contrast
> it with the alternative of making dataproduct_type a real extensible
> vocabulary. The catch, of course, would be to make it feasible for people
> to query (eg in TAP+ADQL) a vocabulary column. Output is not a problem but
> querying right now would be by exact match only... it would be really cool
> if you could do something like "where ivo_vocab_match(dataproduct_type,
> 'cube')" and that would match "cube" and child terms... or you could be
> more specific (down to dataproduct_type = 'specific type').

First, with my Semantics chair hat on, I like that a lot.  This kind
of application is exactly why we plan to have strictly forest-like
vocabularies (i.e., each term has zero or one parent).

My gut feeling (in brainstorm mode) is that we'd like to have two
functions: ivo_generalizing_match(col, term), which expands into term
and all its ancestors, and ivo_specializing_match(col, term)[1], which
expands into term and all its descendants.  We'd have to collect a few
use cases, though, and defining reasonable behaviour with SKOS would
be cool, too.
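
To make the two expansions concrete, here is a minimal Python sketch
of what they might compute on a forest-like vocabulary.  Everything in
it is an assumption for illustration: the toy terms, the PARENTS
mapping, and the helper names are made up, and nothing here is an
agreed ADQL function or an existing DaCHS API.

```python
# Hypothetical toy vocabulary: term -> parent (None marks a root term).
# The terms and their hierarchy are invented for this sketch.
PARENTS = {
    "image": None,
    "cube": None,
    "spectral-cube": "cube",
    "polarized-spectral-cube": "spectral-cube",
    "time-cube": "cube",
}


def generalizing_terms(term):
    """Return term followed by all its ancestors, nearest first."""
    result = []
    while term is not None:
        result.append(term)
        term = PARENTS[term]  # raises KeyError for unknown terms
    return result


def specializing_terms(term):
    """Return the set containing term and all its descendants."""
    if term not in PARENTS:
        raise KeyError(term)
    found = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child, parent in PARENTS.items():
            if parent == current:
                found.add(child)
                stack.append(child)
    return found


def matches(column_value, term, expand):
    """Emulate a match function: is column_value in the expansion of term?"""
    return column_value in expand(term)
```

With this, matches("spectral-cube", "cube", specializing_terms) is
true, which is the "cube and child terms" behaviour from the quoted
message, and matches("cube", "spectral-cube", generalizing_terms) is
true as well.  A server could presumably expand the term once and
rewrite the condition into a plain IN list, but that is a design guess
rather than a plan.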

I'm officially in python3 porting retreat now, but if you poke me
again when DaCHS is python3-clean, this sounds like a little note and
implementation I'd like to write.

> I think I could handle this feasibly if the vocab function just dealt with
> the base terms, eg "where ivo_vocab_base(dataproduct_type) = 'cube'" -- that
> would give the same query power as now but allow extending the vocabulary
> to more specific types. I think I like that better than a type/subtype
> pair...
> 
> Thoughts?

You have my (preliminary) vote, in particular because that would
further the utility of having dataproduct_type in datalink in
parallel to obscore.  Conceptually, having an IVOA-defined content
parameter in the media type has more appeal to me.  However, I think
both client authors (surprisingly little code exists that properly
parses media types) and server operators (who may have a hard time
configuring their Apache servers to properly hand out such media
types) will probably like it best if they can simply share code
between obscore and datalink, and put terms into and fetch them from
the much more accessible table columns.

        -- Markus


[1] Bonus points if someone comes up with names that are spelled the
same on both sides of the Atlantic.

