DataLink local_semantics optional column proposal

Mark Taylor m.b.taylor at bristol.ac.uk
Fri Aug 12 10:54:27 CEST 2022


Hi DAL,

This message is to propose a new optional column in the {links} response
table of the DataLink standard.
I initially raised it as a github issue; you can see a bit more
discussion at https://github.com/ivoa-std/DataLink/issues/88,
and an earlier incarnation on slide "7/8" of my Victoria 2018 presentation
https://wiki.ivoa.net/internal/IVOA/InterOpMay2018DAL/dlfeedback.pdf.

The problem I want to solve arises when a client looks at the links
tables for several different rows of a parent table: given a row in
the links table for one parent table row, how does the client identify
the corresponding row in the links table for a different parent table row?

For instance: Gaia DR3 queries on gaia_source can return a service
descriptor associating a links table with each row.
For one source (i.e. parent table row) that links table might look like:

   semantics, description, [other cols]
   ---------, -----------, ------------
   #this, MCMC MSC source Gaia DR3 4040949706019490560, ...
   #this, XP mean sampled spectra source Gaia DR3 4040949706019490560, ...
   #this, XP mean continuous spectra source Gaia DR3 4040949706019490560, ...
   #this, MCMC GSP-Phot source Gaia DR3 4040949706019490560, ...

and for another row like:

   semantics, description, [other cols]
   ---------, -----------, ------------
   #this, MCMC MSC source Gaia DR3 4040165887420469760, ...
   #this, XP mean continuous spectra source Gaia DR3 4040165887420469760, ...
   #this, XP mean sampled spectra source Gaia DR3 4040165887420469760, ...

If a user selects e.g. the "XP mean sampled spectra" datalink item
for the first source they will probably want the same thing when they
look at the next source, and it would be nice for a client like topcat
to be able to default to the corresponding links row rather than
forcing the user to select manually for each source.
It's obvious to a human which this corresponding row is,
but at present there is no reliable way for software to identify it
(admission: topcat currently does some unholy partial string matching
on the description column hacked to do the right thing for the ESA
Gaia DR3 service).

So I'd like to see an additional column to facilitate this.
Markus has suggested the following definition for such a column:

   column name: local_semantics
   type: text
   UCD: meta.id.assoc
   description: An identifier that allows clients to associate rows from
      different datalink documents on the same service with each other.

The above examples might then look like:

   semantics, local_semantics, description, [other cols]
   ---------, ---------------, -----------, ------------
   #this, 1, MCMC MSC source Gaia DR3 4040949706019490560, ...
   #this, 3, XP mean sampled spectra source Gaia DR3 4040949706019490560, ...
   #this, 4, XP mean continuous spectra source Gaia DR3 4040949706019490560, ...
   #this, 2, MCMC GSP-Phot source Gaia DR3 4040949706019490560, ...

   semantics, local_semantics, description, [other cols]
   ---------, ---------------, -----------, ------------
   #this, 1, MCMC MSC source Gaia DR3 4040165887420469760, ...
   #this, 3, XP mean sampled spectra source Gaia DR3 4040165887420469760, ...
   #this, 4, XP mean continuous spectra source Gaia DR3 4040165887420469760, ...
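
As a rough illustration of how a client might use such a column, here
is a minimal Python sketch.  It is not topcat code: the function name
corresponding_row and the representation of links rows as dicts are
assumptions made for the example, as is the choice to compare values
by their string form.

```python
def corresponding_row(rows, previous_key):
    """Return the row whose local_semantics matches previous_key, or None.

    rows is a list of dicts, one per links table row (a hypothetical
    in-memory form; a real client would read these from a VOTable).
    """
    if previous_key is None:
        return None
    for row in rows:
        value = row.get("local_semantics")
        # Assumption: compare string forms so that 3 and "3" match.
        if value is not None and str(value) == str(previous_key):
            return row
    return None


# Links table for the first source (values taken from the example above).
links_a = [
    {"semantics": "#this", "local_semantics": "1",
     "description": "MCMC MSC source Gaia DR3 4040949706019490560"},
    {"semantics": "#this", "local_semantics": "3",
     "description": "XP mean sampled spectra source Gaia DR3 4040949706019490560"},
    {"semantics": "#this", "local_semantics": "4",
     "description": "XP mean continuous spectra source Gaia DR3 4040949706019490560"},
    {"semantics": "#this", "local_semantics": "2",
     "description": "MCMC GSP-Phot source Gaia DR3 4040949706019490560"},
]

# Links table for the second source.
links_b = [
    {"semantics": "#this", "local_semantics": "1",
     "description": "MCMC MSC source Gaia DR3 4040165887420469760"},
    {"semantics": "#this", "local_semantics": "3",
     "description": "XP mean sampled spectra source Gaia DR3 4040165887420469760"},
    {"semantics": "#this", "local_semantics": "4",
     "description": "XP mean continuous spectra source Gaia DR3 4040165887420469760"},
]

# The user picks "XP mean sampled spectra" for the first source ...
chosen = links_a[1]
# ... and the client looks up the counterpart for the second source.
match = corresponding_row(links_b, chosen["local_semantics"])
print(match["description"])
```

If the second source has no counterpart (for instance the GSP-Phot
row with value 2), the function returns None and the client can fall
back to asking the user.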

The intention is that within a given context (parent table at least,
perhaps data service or similar) a given local_semantics value
appears at most once in each links response table and always means
the "same" thing (corresponding type of data product).
The content could be either an opaque value like the numeric
tokens in the example above or some more descriptive text
(I'd be inclined to allow any data type for this column rather than
requiring text content, but I'm not adamant).
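
The uniqueness requirement could be checked by a service or validator
along the following lines.  Again this is only a sketch under stated
assumptions: the function name duplicate_local_semantics is made up
for the example, rows are represented as dicts, and values of any
data type are compared by their string form, which the proposal does
not itself specify.

```python
from collections import Counter


def duplicate_local_semantics(rows):
    """Return the local_semantics values that occur more than once.

    Rows without a local_semantics value are ignored, since the column
    is optional.  Values are compared as strings (an assumption).
    """
    counts = Counter(
        str(row["local_semantics"])
        for row in rows
        if row.get("local_semantics") is not None
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Each value appears once: acceptable.
good = [{"local_semantics": 1}, {"local_semantics": 3}, {"local_semantics": 4}]
# The value 1 appears twice (once as a number, once as text): ambiguous.
bad = [{"local_semantics": 1}, {"local_semantics": "1"}, {"local_semantics": 4}]

print(duplicate_local_semantics(good))
print(duplicate_local_semantics(bad))
```

An empty result means a client can safely use the column to pick a
single corresponding row.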

Such a column would be strictly optional and supplied on a best efforts
basis by the service.  It could be documented in a future version of
the DataLink standard, but until then data providers and consumers
could agree informally to make use of it.  Since links response tables
are allowed to contain non-standard columns, this would not infringe
any standards.

Any comments?  Assuming some agreement or lack of disagreement is
established here about the general idea and specifics, then if
at least one data provider implements this I will add code in
topcat to make use of it.

Thanks

Mark

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/

