linking capabilities with tablesets

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Mar 28 10:27:21 CET 2024


Dear Gilles,

On Mon, Mar 25, 2024 at 02:50:27PM +0100, gilles landais via registry wrote:
> Changing the VizieR granularity from catalogue to datasets is of course not
> possible for so many reason that I will not detail here.

Well, this is not (necessarily) about VizieR, but only VizieR's
interface to the VO Registry.  I'd respectfully suggest that creating
two or more registry records out of a single internal VizieR record
might not cause too much upheaval in your code and resource
management.

But sure, it's easy for me to say that.  I don't say it lightly,
though: I am really convinced that unless we completely overturn the
way we are doing data discovery, everything *will* creak and groan as
long as you squeeze -- in the Gaia example -- a source catalogue and
a collection of time series into one resource record.

This is not only a problem for the human-readable description as
outlined in my previous mail.  Let me in particular point out the
product-type issue again:
https://github.com/ivoa-std/VODataService/pull/1.

Suppose a client discovers "There are time series in
ivo://cds.vizier/i/355".  How does it then decide it has to query
I/355/epphot rather than one of the other tables in the resource
record?  And if you merge the XP spectra into this record, too, even
a link to the obscore table won't help any more.

This problem immediately disappears if you create an extra resource
record -- which, again, doesn't mean that VizieR has to change
anything with the internal management of the resource, except perhaps
some extra hint of the type "make a time series resource record with
this table and capabilities A and B here".
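
To make that hint a bit more concrete, here is a minimal sketch of
what such a split could look like.  Nothing in it is VizieR or
Registry code: RegistryRecord, split_record, the "/cat" and "/ts"
suffixes, the product-type label of the first record, the source
string and the choice of I/355/gaiadr3 as the catalogue table are all
assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegistryRecord:
    """Hypothetical, much simplified registry record."""
    ivoid: str
    source: str
    product_type: str
    tables: list
    capabilities: list


def split_record(ivoid, source, hints):
    """Create one registry record per hint from one internal record.

    Each hint is a dict with the keys suffix, product_type, tables
    and capabilities.  The shared source (the reference article) is
    copied unchanged into every resulting record.
    """
    return [
        RegistryRecord(
            ivoid=f"{ivoid}/{hint['suffix']}",  # invented naming scheme
            source=source,
            product_type=hint["product_type"],
            tables=list(hint["tables"]),
            capabilities=list(hint["capabilities"]),
        )
        for hint in hints
    ]


records = split_record(
    "ivo://cds.vizier/i/355",
    "placeholder-for-the-reference-article",
    [
        # product-type label and table choice are placeholders
        {"suffix": "cat", "product_type": "source-catalogue",
         "tables": ["I/355/gaiadr3"], "capabilities": ["tap", "scs"]},
        # "A" and "B" stand for whatever serves the time series
        {"suffix": "ts", "product_type": "time-series",
         "tables": ["I/355/epphot"], "capabilities": ["A", "B"]},
    ],
)
```

The point of the sketch is only that the internal record stays a
single unit, while each derived record carries exactly one kind of
data product together with the tables and capabilities belonging to
it.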


> Just keep in mind that a catalogue correspond to a reference article and can
> contain several datasets(tables).

Sure -- but there is nothing wrong with having multiple resource
records with the same content/source field.

> The logic to gather these datasets in a single entity that includes common
> metadata is a valuable capability of the registry.

...and of course there's nothing wrong with sharing whatever metadata
is common between these resource records.

> Linking tablesets with its interfaces seem to be natural.

I'll read this as "Linking tables with capabilities".  And yes, it's
certainly something that's natural if you have several pairs of
tables and interfaces in the resource record that are strongly
related within the pair but only very loosely across the pairs.

Regrettably, to the rest of what we do in VO discovery at this time
it's not natural.  Before we go this way, we will need a good plan for
how these relationships will be exploited *in discovery*.  Continuing
the example above, suppose someone writes, with a future
pyvo.registry Dataproducttype constraint:

  rscs = pyvo.registry.search(
    Dataproducttype('time-series'),
    Keywords("Gaia"))

What happens then?  How would we implement
rscs[0].look_for_a_time_series_at(ra, dec)?

Of course, a similar challenge happens for my "Recent results from
the Volute radio dish":

  import pyvo
  from pyvo.registry import Keywords, Servicetype

  rscs = pyvo.registry.search(
    Servicetype("scs"),
    Keywords("Quasars"))

What do I do then to actually query the table with the quasars rather
than that of cataclysmic binaries?
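
As a toy illustration of that ambiguity, here is a sketch of what a
client sees in such a record.  This is a plain-dict model, not pyVO
code and not the actual Registry data model; the ivoid, the table
names, the access URLs and both helper functions are invented.

```python
# Toy model of one resource record with two tables and two cone
# search capabilities.  All identifiers and URLs are made up.
record = {
    "ivoid": "ivo://example/volute/recent",
    "keywords": ["Quasars", "Cataclysmic binaries"],
    "tables": ["volute.quasars", "volute.cvs"],
    "capabilities": [
        {"type": "scs",
         "access_url": "http://example.org/volute/q/scs"},
        {"type": "scs",
         "access_url": "http://example.org/volute/cv/scs"},
    ],
}


def candidate_services(rec, service_type):
    """Return the access URLs of all capabilities of a given type."""
    return [cap["access_url"] for cap in rec["capabilities"]
            if cap["type"] == service_type]


def pick_service(rec, service_type):
    """Return the single matching access URL, or refuse to guess."""
    urls = candidate_services(rec, service_type)
    if len(urls) != 1:
        raise LookupError(
            f"{len(urls)} {service_type} capabilities in "
            f"{rec['ivoid']}; the record does not say which table "
            "each of them serves")
    return urls[0]
```

Both the keyword and the service type match at the record level, so
the record is discovered all right.  But candidate_services(record,
"scs") returns two URLs, and nothing in the record ties either of
them to the quasar table, so all pick_service can do is raise an
error.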

> Talking about protocol  - ok, but I think that it is another debate!
> I agree and in favor to create a new SCS, more DALI compliant and that would
> serve tablesets collection - It would be a good feature, may be better
> included in the registry. But it doesn't exist yet and when it will,  this 
> migration of an architecture with service specific to dataset  to an
> architecture that serve tablesets collection (like TAP)  will impact a non
> negligible part of the VO architecture.

I am convinced that making the table-capability link work in practice
is significantly more work than that -- but I'd be happy to be taught
otherwise, of course.

> For the story, we discovered this issue recently when we added notebooks in
> order to educate our users with pyvo.

For the benefit of innocent bystanders: That's
<https://github.com/astropy/pyvo/pull/505> and its immediate
environment.

I have to say that I think the solution we came up with in pyVO is
not unreasonable (and neither is TOPCAT's handling of this kind of
thing) given the metadata we have and the structure of *that*
problem, and I can't really see how the extra table-capability link
could improve on that.

So, I think the immediate pain is somewhat relieved, in particular
because clients notice something is amiss when they try to use
multi-conesearch resources without another look.

But I grant you it does not solve the discovery problem I outlined
with the Volute Radio Dish mock example.  However, neither does the
table-capability link, at least without serious changes to our
Registry operations.

Perhaps we should have a Registry running meeting on this?

           -- Markus


