Registry and Data Discovery

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue May 30 18:40:15 CEST 2023


Hi Paul,

On Tue, May 30, 2023 at 12:07:41PM +0000, Paul Harrison wrote:
> to see from the comments that DataCollection was being deprecated
> in favour of CatalogResource.

(just for completeness: or DataResource in case there's no relational
schema below the resource).

> 1. It breaks a fundamental part of the original Registry Data model
> design in that the Data and the Services that could supply the data
> were separately registered and the relationships between them
> registered (“service for” etc.)

Well, it is actually intended to *enable* that (as you say a bit
further down) without too much effort.

> DataCollection isA Resource
> CatalogResource isA DataResource isA Service isA Resource

I grant you that this inheritance hierarchy is not overly pretty, and
perhaps we should have introduced capabilities in vs:DataResource
independently of vr:Service (we could still do that, by the way,
without breaking anything; incidentally, the resource types are not
used in data or service discovery in the current Registry, and I
don't see that changing any time soon).

But auxiliary capabilities -- which are behind the existing
inheritance tree -- *do* make data discovery a lot simpler in
practice, so going back to the capability-less vs:DataCollection-s is
a recipe for grief.  That was the original plan, and it was
decidedly horrible.
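
To illustrate the difference, here is a minimal sketch rather than
actual Registry code.  It uses an in-memory SQLite database as a toy
stand-in for a few RegTAP tables, with only the columns the example
needs.  The records, the example.org URLs, the helper function names
and the 'IsServedBy' spelling in the toy data are all assumptions made
up for the illustration; the queries are plain SQL, not ADQL run
against a real RegTAP service.

```python
import sqlite3

# Toy stand-in for a handful of RegTAP tables (columns trimmed down).
# All records and URLs below are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS rr")
con.executescript("""
CREATE TABLE rr.res_table (ivoid TEXT, table_name TEXT);
CREATE TABLE rr.capability (ivoid TEXT, cap_index INTEGER, standard_id TEXT);
CREATE TABLE rr.interface (ivoid TEXT, cap_index INTEGER, access_url TEXT);
CREATE TABLE rr.relationship (ivoid TEXT, relationship_type TEXT, related_id TEXT);

-- Style A: the data resource itself carries an auxiliary TAP capability.
INSERT INTO rr.res_table VALUES ('ivo://example/cat_a', 'a.main');
INSERT INTO rr.capability VALUES ('ivo://example/cat_a', 1, 'ivo://ivoa.net/std/tap#aux');
INSERT INTO rr.interface VALUES ('ivo://example/cat_a', 1, 'http://example.org/tap_a');

-- Style B: a capability-less collection pointing to a separate service record.
INSERT INTO rr.res_table VALUES ('ivo://example/coll_b', 'b.main');
INSERT INTO rr.relationship VALUES ('ivo://example/coll_b', 'IsServedBy', 'ivo://example/tap_b');
INSERT INTO rr.capability VALUES ('ivo://example/tap_b', 1, 'ivo://ivoa.net/std/tap');
INSERT INTO rr.interface VALUES ('ivo://example/tap_b', 1, 'http://example.org/tap_b');
""")


def find_via_aux(con, table_name):
    """Return TAP access URLs declared on the record that has the table."""
    rows = con.execute("""
        SELECT i.access_url
        FROM rr.res_table AS t
        JOIN rr.capability AS c ON c.ivoid = t.ivoid
        JOIN rr.interface AS i
          ON i.ivoid = c.ivoid AND i.cap_index = c.cap_index
        WHERE t.table_name = ?
          AND c.standard_id = 'ivo://ivoa.net/std/tap#aux'
        """, (table_name,))
    return [row[0] for row in rows]


def find_via_relationship(con, table_name):
    """Return TAP access URLs by following a relationship to another record."""
    rows = con.execute("""
        SELECT i.access_url
        FROM rr.res_table AS t
        JOIN rr.relationship AS r ON r.ivoid = t.ivoid
        JOIN rr.capability AS c ON c.ivoid = r.related_id
        JOIN rr.interface AS i
          ON i.ivoid = c.ivoid AND i.cap_index = c.cap_index
        WHERE t.table_name = ?
          AND r.relationship_type = 'IsServedBy'
          AND c.standard_id = 'ivo://ivoa.net/std/tap'
        """, (table_name,))
    return [row[0] for row in rows]


print(find_via_aux(con, "a.main"))
print(find_via_relationship(con, "b.main"))
```

In this toy the relationship variant costs only one extra join, but the
access URL now lives on a different record than the table metadata, and
a client that does not know which style a publisher chose has to try
both patterns.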

[perhaps this is another case of overuse of inheritance]

> 2. The comment suggests that all data are catalogues - though I
> note that there is a DataResource that would be a more suitable
> general replacement - the comment seems to make

True, if we had many resources that weren't backed by relational
tables, I'd tend to agree, but that's not our VO; even our
collections of spectra or images are well modelled as tables.  Given
that, nudging people to declare the tableset for their (meta) data
is, I'd argue, the right thing to do -- and that's what the
recommendation to use CatalogResource does.

> have objected with an 8+ on the https://blog.g-vo.org/building-consensus.html scale.

Thanks for pitching the consensus scale :-)  But perhaps I can
mollify your dislike a bit...

> objections. It seems a shame to me that some more effort was not
> made to work round the objections to retain this conceptual clarity

I admit I'd have been grateful for a bit more community participation
in cleaning up the data discovery problem, too (which still is rather
serious for me, because it drove everyone doing TAP-related discovery
to GloTS) -- but I'd still claim I tried really hard to get by with
DataCollection, and it just didn't work out.

> height. RegTAP was a recognition that just about everyone was
> storing the registry in RDBs and that the original registry search

Aw, my main motivation was that astronomers should learn ADQL anyway,
and I wanted to let them do smart things in the Registry without
learning yet another language, and that...

> interface was practically useless in its vagueness. However, since
> then there has been a general rise in the use of “noSQL” databases
> and it might be that there is a way using other query languages -
> e.g. SPARQL that are more suitable for making data discovery style
> queries on the model (or some projection of it). Even after the

...still holds, although I grant you that for something like the
Registry, SPARQL would have been an excellent match.  But I'd still
not want to ask astronomers to learn SPARQL, and I'd still like to
let astronomers do smart things in the Registry with the ADQL they
hopefully have learned.

> https://wiki.ivoa.net/internal/IVOA/InterOpMay2023Registry/hendrik_heinl.pdf,
> and I worry that the solutions proposed in the note might have been
> just point fixes rather than stepping back and re-examining some of
> the fundamentals

Perhaps -- but in a thing like the VO, stepping back and changing
fundamentals has a large XKCD 927 risk factor.  Don't get me started
on getting people to put tablesets into their registry records,
which isn't even approaching a fundamental change...

> ensure that the model does work better for data discovery. It is
> clear that the 1.x registry data model is not sufficient to do good
> data discovery, but I think that a better direction of travel to
> expand the DataCollection part of the model rather than compress
> everything into a Service.

Oh, that's not what we're trying.  It's more "everything can have
(auxiliary) capabilities".  We *could* have achieved about the same
thing by expressing the auxiliary capabilities with relationships, but
the price in terms of query complexity was so high that the slight
blurring of the DM seemed a good deal.  If I may quote the Zen of
Python:

$ python -c "import this" | egrep "break|practicality"
Special cases aren't special enough to break the rules.
Although practicality beats purity.

            -- Markus

