Registry and Data Discovery
Paul Harrison
paul.harrison at manchester.ac.uk
Tue May 30 14:07:41 CEST 2023
Hi,
I have been experimenting with unifying data models (https://github.com/pahjbo/DataModelPlayground), as there are several shared concepts between some of the existing (Registry) and proposed (https://github.com/ivoa-std/DatasetDM) data models. I have had a go at an automated conversion of the registry XML schema into VO-DML (more on the details of that in another thread). While reading the schema files in the process, I was rather surprised to see from the comments that DataCollection was being deprecated in favour of CatalogResource.
A rough translation of the RegistryDM to VO-DML can be seen here (if you can find a way of displaying it at a suitable magnification):
https://github.com/pahjbo/DataModelPlayground/blob/interface_models/RegistryDM1.2.png
This is bad on two fronts:
1. It breaks a fundamental part of the original Registry Data model design in that the Data and the Services that could supply the data were separately registered and the relationships between them registered (“service for” etc.)
DataCollection isA Resource
CatalogResource isA DataResource isA Service isA Resource
2. The comment suggests that all data are catalogues - though I note that there is a DataResource, which would be a more suitable general replacement - the comment seems to make
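To make the difference between the two inheritance chains in point 1 concrete, here is a minimal sketch in Python. The class names simply mirror the types named above; the classes are empty placeholders written by hand for illustration, not something generated from the actual schema or from the VO-DML translation.

```python
# Illustrative only: empty placeholder classes mirroring the isA chains above.
class Resource: ...
class Service(Resource): ...
class DataCollection(Resource): ...
class DataResource(Service): ...
class CatalogResource(DataResource): ...


def ancestry(cls):
    """Return the isA chain from cls up to Resource as a list of class names."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(" isA ".join(ancestry(DataCollection)))
print(" isA ".join(ancestry(CatalogResource)))

# A DataCollection is not a Service, whereas a CatalogResource is one.
print(issubclass(DataCollection, Service))   # False
print(issubclass(CatalogResource, Service))  # True
```

The last two lines are the point: in the original design the data sits beside Service in the hierarchy, whereas the replacement type sits underneath it.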
I realise that the changes above are rather old, and I clearly have not been paying enough attention to what is happening in registry, but if I had I would have objected with an 8+ on the https://blog.g-vo.org/building-consensus.html scale.
Anyway, in catching up, I see that most of the motivation is in https://www.ivoa.net/documents/discovercollections/20190520/EN-discovercollections-1.1-20190520.html#tth_sEc1.2.2, where it is recognised that the original registry design is attractive for its conceptual clarity, but which then lists some objections. It seems a shame to me that more effort was not made to work round the objections and retain this conceptual clarity - e.g. given that there are a relatively small number of searchable registries, they could perhaps do some automated translation/record generation when harvesting from an “old” publishing registry to alleviate points 1 and 2. The third objection is probably more related to the slight collapsing of the model that is necessary to represent it in RegTAP, and then to the unsuitability of SQL for easily making queries on the relationship parts of the model.
It is a favourite old hobby horse of mine that https://www.w3.org/TR/xquery-31/ (which we used in Astrogrid, because we were storing the registry natively as XML) is a very expressive and exact query language for a data model that is expressed in XML - but I am not expecting that to be a popular option now when it was not popular in the VO when XML was at its height. RegTAP was a recognition that just about everyone was storing the registry in RDBs and that the original registry search interface was practically useless in its vagueness. However, since then there has been a general rise in the use of “noSQL” databases, and it might be that other query languages - e.g. SPARQL - are more suitable for making data discovery style queries on the model (or some projection of it).
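As a rough illustration of what a query on the relationship parts of the model looks like when the records are held as XML, here is a small Python sketch. It uses the limited XPath subset in the standard library's ElementTree rather than XQuery, and the record layout (element names, the type and target attributes, the example identifiers and URLs) is a simplified, namespace-free toy invented for the example, not the real registry schema.

```python
import xml.etree.ElementTree as ET

# Toy records: structure and identifiers are assumptions for illustration only.
REGISTRY_XML = """
<registry>
  <resource type="DataCollection" id="ivo://example/survey">
    <title>Example survey</title>
  </resource>
  <resource type="Service" id="ivo://example/survey/tap">
    <title>Example TAP service</title>
    <accessURL>http://example.org/tap</accessURL>
    <relationship type="service-for" target="ivo://example/survey"/>
  </resource>
  <resource type="Service" id="ivo://example/other/cone">
    <title>Unrelated cone search</title>
    <accessURL>http://example.org/cone</accessURL>
  </resource>
</registry>
"""


def services_for(root, collection_id):
    """Return (id, accessURL) for each service declaring service-for on collection_id."""
    found = []
    for res in root.findall("resource[@type='Service']"):
        for rel in res.findall("relationship[@type='service-for']"):
            if rel.get("target") == collection_id:
                found.append((res.get("id"), res.findtext("accessURL")))
    return found


root = ET.fromstring(REGISTRY_XML)
print(services_for(root, "ivo://example/survey"))
```

The data and the service stay separately registered here, and the discovery step is just a walk along the registered relationship from the one to the other.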
Even after the discovering data services note there are still problems discovering data - https://wiki.ivoa.net/internal/IVOA/InterOpMay2023Registry/hendrik_heinl.pdf, and I worry that the solutions proposed in the note might have been just point fixes rather than stepping back and re-examining some of the fundamentals.
I used to be more of the opinion that the registry was just a “technical” service, only useful for discovering services and their endpoint URLs, but I realise that it is the only central federated data model that the VO has and that it would be a good idea to ensure that the model works better for data discovery. It is clear that the 1.x registry data model is not sufficient to do good data discovery, but I think that a better direction of travel is to expand the DataCollection part of the model rather than compress everything into a Service.
I could go on, but I will stop here - I just wanted to raise a flag on this issue…
Paul.