UCDs and DataLink

Mon Aug 15 10:50:45 CEST 2022

Dear Gregory, dear DAL,

On Tue, Aug 09, 2022 at 10:13:36AM +0000, Dubois-Felsmann, Gregory P. wrote:
> But Markus made a stronger point, which I think I can go ahead and quote, since the issue is public:
> > I'd argue against making any requirements on UCDs [in these sorts of contexts].
> > UCDs are mainly intended for ad-hoc or discovery use, like:
> > 
> > * a client sees an arbitrary VOTable and wants to get an idea what
> >   kind of physics is represented in order to suggest plots for it,
> >   perhaps match it with columns in other tables or similar.
> > 
> > * a client is looking for tables having a particular sort of data in
> >   the registry.
> 
> I'd like to push back a bit on how narrow that field of applicability is.
> 
> In particular, in the context of service descriptors, we are
> finding, as we actually implement the DataLink-intensive design of
> the Rubin Science Platform (RSP), that client software often needs
> some hints as to how to present, in a UI, the "optional" parameters
> to a service named in a service descriptor.  If a service
> descriptor represents a service for which there is an IVOA
> standard, of course, the client software UI can be written against
> the whole of that standard.  But if (as is _very_ frequently the
> case for us) the service descriptor points to a custom data
> service, the UCDs can be useful in providing rendering hints
> without our having to hard-code the client software against the
> specifics of the custom data services.

Well, that is *exactly* the kind of ad-hoc use I was talking about
(ad-hoc meaning: not governed by a standard on, if you will, the
presentation or application layers in ISO/OSI lingo).  To enable that
kind of thing, we indeed need to urge data providers to annotate
their data with UCDs (and usually advise them as to what good UCDs
might be).

What I was arguing against is standards requiring ("MUST") all of
column name, utype, and UCD at the same time.  That has led to a
continuous stream of errata, while actual clients didn't care
because they were using either the name (e.g., ObsCore) or the utype
(e.g., SSAP).  And rightly so.  We should give *one* way to find
columns, and only one (my take: column names are just fine).
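
To illustrate what "one way to find columns" could look like on the
client side, here is a minimal Python sketch that locates a FIELD by
its name alone.  The sample document is a made-up, namespace-free
VOTable fragment with a few ObsCore-style column names, and
find_field_by_name is a hypothetical helper, not part of any existing
library:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real VOTable carries an XML namespace
# and actual data, both omitted here to keep the sketch short.
SAMPLE = """<VOTABLE><RESOURCE><TABLE>
<FIELD name="s_ra" datatype="double" ucd="pos.eq.ra" unit="deg"/>
<FIELD name="s_dec" datatype="double" ucd="pos.eq.dec" unit="deg"/>
<FIELD name="t_exptime" datatype="double"
  ucd="time.duration;obs.exposure" unit="s"/>
</TABLE></RESOURCE></VOTABLE>"""


def find_field_by_name(votable_xml, name):
    """Return (column index, FIELD element) for the given name, or None.

    Only the name attribute is consulted; utype and UCD are ignored.
    """
    root = ET.fromstring(votable_xml)
    for index, field in enumerate(root.findall(".//FIELD")):
        if field.get("name") == name:
            return index, field
    return None


if __name__ == "__main__":
    found = find_field_by_name(SAMPLE, "s_dec")
    if found is not None:
        index, field = found
        print(index, field.get("ucd"))
```

With a lookup like this, the UCD remains available as a hint for
whatever the client wants to do with the column, but nothing breaks
if the provider chose a different one.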

A side benefit of not having exact UCDs as requirements in standards
is, by the way, that providers can attach richer semantics to their
columns as appropriate.  As an example, take SSAP, which says that
services can give a column with the utype Target.Redshift.  The
current spec *forces* that column's UCD to be src.redshift.

Now consider an SSA service with Gaia RP/BP (low-resolution) spectra;
whatever redshifts you estimate from those probably won't count as
spectroscopic, and so to reinforce that point, I might like to have a
UCD of src.redshift.phot there.  Since SSA has fixed UCDs, my service
would (probably[1]) become invalid then -- for no good reason.
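
As a rough sketch of the alternative, a validator could treat the
UCD given in a standard as advice and merely note deviations, while
accepting UCDs that refine the advised one.  Everything below is
hypothetical: the function names, the "refines" rule (same primary
word, or a dotted child of it), and the one-entry advice table are my
assumptions for illustration, not anything a current standard or
validator defines:

```python
def primary_word(ucd):
    """Return the first (primary) word of a semicolon-separated UCD."""
    return ucd.split(";")[0].strip()


def refines(actual_ucd, advised_ucd):
    """True if the actual primary word equals the advised one or is a
    dotted child of it (e.g. src.redshift.phot under src.redshift)."""
    actual = primary_word(actual_ucd)
    advised = primary_word(advised_ucd)
    return actual == advised or actual.startswith(advised + ".")


def check_column(utype, actual_ucd, advice):
    """Return "ok" or an informational note; never an error."""
    advised = advice.get(utype)
    if advised is None or refines(actual_ucd, advised):
        return "ok"
    return "note: UCD %r differs from advised %r" % (actual_ucd, advised)


# Assumed advice table holding just the example discussed above.
SSAP_ADVICE = {"Target.Redshift": "src.redshift"}

if __name__ == "__main__":
    print(check_column("Target.Redshift", "src.redshift.phot", SSAP_ADVICE))
    print(check_column("Target.Redshift", "phot.mag", SSAP_ADVICE))
```

Under these assumptions the photometric-redshift column passes
unchanged, and a genuinely unrelated UCD only yields a note for the
operator to look at.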

That's where my proposal comes from:  In future standards, let's give
advice as to suitable UCDs, but let's not require them, and let's not
even recommend them ("should"), as that would still result in
warnings.  Whether we ought to open up existing standards as we
revise them -- well, you'd certainly have my support...

              -- Markus

[1] Though of course, given that that column is tagged optional, it
is unclear whether a validator should raise an error if there is a
column with utype Target.Redshift but some other UCD; repeating my
usual pitch: Let's avoid optional features, and let's be clear on
what breaks if a requirement is violated.  If it turns out that
nothing breaks, then let's drop the requirement.