<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<meta content="text/html; charset=UTF-8">
<style type="text/css" style="">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<p>The re-harvesting code is already in the repository. The build produces a JAR called "<span>rofr-identities-harvesting-uber.jar</span>" that can be run with the Java binary. The harvest currently runs once a week.</p>
<p><br>
</p>
<p>Best,</p>
<p>Dustin<br>
</p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> registry &lt;registry-bounces@ivoa.net&gt; on behalf of Markus Demleitner via registry &lt;registry@ivoa.net&gt;<br>
<b>Sent:</b> February 11, 2025 1:04:41 AM<br>
<b>To:</b> Paul Harrison via registry<br>
<b>Subject:</b> Re: XML invalid entries in RoR</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">
Hi Paul,<br>
<br>
On Tue, Feb 11, 2025 at 08:31:29AM +0000, Paul Harrison via registry wrote:<br>
> there is quite a large proportion of the entries for registries in<br>
> the RoR that are technically XML invalid<br>
><br>
> e.g.<br>
><br>
> % curl "<a href="http://rofr.ivoa.net/oai?verb=GetRecord&metadataPrefix=ivo_vor&identifier=ivo://src.pas/__system__/services/registry">http://rofr.ivoa.net/oai?verb=GetRecord&metadataPrefix=ivo_vor&identifier=ivo://src.pas/__system__/services/registry</a>"<br>
[...]<br>
<br>
> is invalid as there is no declaration of the namespace of the vg:<br>
> prefix. Although it is obvious what namespace this is referring to,<br>
> if the output is read via an XML processor it will immediately stop<br>
> at this error. I am not sure what the official way of<br>
> getting these entries fixed is, but clearly the most efficient would<br>
> be a bulk update on the RoR content if someone has that access.<br>
<br>
Ach... yeah, we should make more of an effort to maintain the RofR<br>
codebase (or perhaps re-implement it, shedding some of the cruft);<br>
I've had "add vocabulary validation" on my to-do list for a long<br>
while.<br>
<br>
At least in this case, I'm pretty sure the problem is that when<br>
re-harvesting the record, the namespace declarations on the OAI-PMH<br>
root element get lost; to see what probably happened, see:<br>
&lt;<a href="http://pithia.cbk.waw.pl/oai.xml?verb=GetRecord&metadataPrefix=ivo_vor&identifier=ivo://src.pas/__system__/services/registry">http://pithia.cbk.waw.pl/oai.xml?verb=GetRecord&metadataPrefix=ivo_vor&identifier=ivo://src.pas/__system__/services/registry</a>&gt;<br>
<br>
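The suspected failure mode can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RofR's actual (Java) harvesting code: the element names and the namespace URI are made up, and it assumes the record is copied out of the OAI-PMH response as text, without the declarations that sat on the root element.<br>
<br>
<pre>
```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only.
VG_NS = "urn:example:vg"


def parses(xml_text):
    """Return True if ElementTree can parse xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared prefix on an element name ends up here.
        return False


# Prefix declared on the envelope root: the whole response parses.
envelope = (
    f'<envelope xmlns:vg="{VG_NS}">'
    '<record><vg:item/></record>'
    '</envelope>'
)

# The record copied out on its own, without the root's declarations.
bare_record = '<record><vg:item/></record>'

# The same record with the declaration carried over onto it.
fixed_record = f'<record xmlns:vg="{VG_NS}"><vg:item/></record>'

print(parses(envelope), parses(bare_record), parses(fixed_record))
# prints: True False True
```
</pre>
With the declaration repeated on the record itself, the copy parses again, which is presumably what a fix on the re-harvesting side would have to ensure.<br>
<br>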
Paul, in case you'd like to try your hand at fixing this, the RofR<br>
source is at &lt;<a href="https://github.com/ivoa/rofr.ivoa.net">https://github.com/ivoa/rofr.ivoa.net</a>&gt;, although I<br>
suspect the re-harvesting code is not in there yet (Dustin?).<br>
<br>
Thanks,<br>
<br>
Markus<br>
<br>
</div>
</span></font>
</body>
</html>