XML invalid entries in RoR
Jenkins, Dustin
Dustin.Jenkins at nrc-cnrc.gc.ca
Tue Feb 11 20:51:31 CET 2025
The re-harvesting code is already there. The build produces an executable JAR, "rofr-identities-harvesting-uber.jar", which can be run directly with the Java binary. It currently runs once a week.
Best,
Dustin
________________________________
From: registry <registry-bounces at ivoa.net> on behalf of Markus Demleitner via registry <registry at ivoa.net>
Sent: February 11, 2025 1:04:41 AM
To: Paul Harrison via registry
Subject: Re: XML invalid entries in RoR
Hi Paul,
On Tue, Feb 11, 2025 at 08:31:29AM +0000, Paul Harrison via registry wrote:
> there is quite a large proportion of the entries for registries in
> the RoR that are technically XML invalid
>
> e.g.
>
> % curl "http://rofr.ivoa.net/oai?verb=GetRecord&metadataPrefix=ivo_vor&identifier=ivo://src.pas/__system__/services/registry"
[...]
> is invalid as there is no declaration of the namespace of the vg:
> prefix. Although it is obvious what namespace this is referring to,
> if the output is read via an XML processor it will immediately stop
> at this error. I am not sure what the official way of
> getting these entries fixed is, but clearly the most efficient would
> be a bulk update on the RoR content if someone has that access.
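To make the kind of check involved concrete, here is a minimal sketch in Python. It assumes the vg: prefix shows up in an xsi:type attribute value, as is common in VOResource records. The elided record above is not reproduced, so the sample strings, the namespace URIs in them, and the helper name undeclared_type_prefixes are all illustrative assumptions rather than anything taken from the RofR.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

def undeclared_type_prefixes(xml_text):
    """Return prefixes used in xsi:type values that have no xmlns declaration in scope."""
    in_scope = []  # stack of prefixes declared by the currently open elements
    missing = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end-ns", "start")):
        if event == "start-ns":
            in_scope.append(payload[0])   # payload is (prefix, uri)
        elif event == "end-ns":
            in_scope.pop()                # the declaring element has closed
        else:
            prefix, sep, _ = payload.get(XSI_TYPE, "").partition(":")
            if sep and prefix not in in_scope:
                missing.append(prefix)
    return missing

# Illustrative samples only; not the actual RofR output.
RI = "http://www.ivoa.net/xml/RegistryInterface/v1.0"
VG = "http://www.ivoa.net/xml/VORegistry/v1.0"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
GOOD = f'<ri:Resource xmlns:ri="{RI}" xmlns:vg="{VG}" xmlns:xsi="{XSI}" xsi:type="vg:Registry"/>'
BAD = f'<ri:Resource xmlns:ri="{RI}" xmlns:xsi="{XSI}" xsi:type="vg:Registry"/>'

print(undeclared_type_prefixes(GOOD))
print(undeclared_type_prefixes(BAD))
```

An undeclared prefix on an element or attribute name is a different case: there the parser itself refuses the document (ElementTree raises ParseError), so no extra check is needed to notice it.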
Ach... yeah, we should make more of an effort to maintain the RofR
codebase (or perhaps re-implement it, shedding some of the cruft);
I've had "add vocabulary validation" on my todo list for a long
while.
At least in this case, I'm pretty sure the problem is that when
re-harvesting the record, the namespace declarations on the OAI-PMH
root element get lost; to see what probably happened, see:
<http://pithia.cbk.waw.pl/oai.xml?verb=GetRecord&metadataPrefix=ivo_vor&identifier=ivo://src.pas/__system__/services/registry>
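How declarations on the OAI-PMH root can get lost is easy to reproduce outside the RofR. The following Python sketch is not the RofR code (which is Java) and only illustrates the suspected mechanism: the envelope is a made-up example, the function names are hypothetical, and the repair only looks at an xsi:type on the record's own root element.

```python
import io
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
RI = "http://www.ivoa.net/xml/RegistryInterface/v1.0"
VG = "http://www.ivoa.net/xml/VORegistry/v1.0"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Made-up envelope: every namespace is declared on the OAI-PMH root only.
ENVELOPE = f"""<oai:OAI-PMH xmlns:oai="{OAI}" xmlns:ri="{RI}"
    xmlns:vg="{VG}" xmlns:xsi="{XSI}">
  <oai:GetRecord><oai:record><oai:metadata>
    <ri:Resource xsi:type="vg:Registry"><title>Example</title></ri:Resource>
  </oai:metadata></oai:record></oai:GetRecord>
</oai:OAI-PMH>"""

def find_resource(envelope_text):
    root = ET.fromstring(envelope_text)
    return root.find(f".//{{{RI}}}Resource")

def extract_naive(envelope_text):
    # Serialising the record alone: ElementTree re-declares only namespaces
    # used in element and attribute names, so the prefix inside the
    # xsi:type value is left without a declaration.
    return ET.tostring(find_resource(envelope_text), encoding="unicode")

def extract_keeping_type_prefix(envelope_text):
    # Remember every prefix declared anywhere in the envelope ...
    declared = dict(
        payload
        for _, payload in ET.iterparse(io.StringIO(envelope_text), events=("start-ns",))
    )
    resource = find_resource(envelope_text)
    prefix, sep, _ = resource.get(f"{{{XSI}}}type", "").partition(":")
    # ... and copy the one the xsi:type value needs onto the record itself.
    if sep and prefix in declared:
        resource.set(f"xmlns:{prefix}", declared[prefix])
    return ET.tostring(resource, encoding="unicode")

print(extract_naive(ENVELOPE))
print(extract_keeping_type_prefix(ENVELOPE))
```

The first output still carries xsi:type="vg:Registry" but no xmlns:vg; the second re-declares the prefix on the record. A real fix would want to carry over all in-scope declarations rather than just the one used by xsi:type.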
Paul, in case you'd like to try your hand fixing this, the RofR
source is at <https://github.com/ivoa/rofr.ivoa.net>, although I
suspect the re-harvesting code is not in there yet (Dustin?).
Thanks,
Markus