[Ops] Inconsistency between full searchable registries
Menelaus Perdikeas
mperdikeas at sciops.esa.int
Tue Mar 15 09:02:58 CET 2016
Hi Thomas,
Indeed, as I've explained, we were blacklisted by HEASARC for performing too many queries. However, Tom McGlynn managed to get the block lifted, and I see that we have now successfully harvested the HEASARC resources.
HEASARC aside, I would have to investigate the other missing resources case by case, but in general what Theresa explained (resources falling through the cracks in incremental harvests) is a plausible explanation. If a registry updates, deletes or creates a resource and does not accurately reflect the time-stamp in the OAI-PMH envelope, that record is missed by incremental harvests and is only fetched when a full harvest occurs (once a month in the case of Euro-VO).
I think that race conditions are also possible in the design of the OAI-PMH protocol itself (but for practical purposes these can be addressed by letting the time windows of successive incremental harvests overlap comfortably).
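To make the overlap idea concrete, here is a minimal sketch in Python. It is not the actual Euro-VO harvester: the function name, the two-day overlap and the example base URL are assumptions chosen for illustration, and the "ivo_vor" prefix and "ivo_managed" set are assumed to be what the harvested registry exposes. Only the standard OAI-PMH request arguments (verb, metadataPrefix, set, from) are used.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def incremental_harvest_url(base_url, last_harvest,
                            overlap=timedelta(days=2),
                            metadata_prefix="ivo_vor",
                            oai_set="ivo_managed"):
    """Build an OAI-PMH ListRecords URL for an incremental harvest.

    The window starts `overlap` before the previous harvest, so records
    whose time-stamps lag slightly behind are requested again instead of
    being skipped. `last_harvest` must be a timezone-aware datetime.
    """
    start = (last_harvest - overlap).astimezone(timezone.utc)
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed prefix
        "set": oai_set,                     # assumed set name
        "from": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
last = datetime(2016, 3, 15, 9, 0, 0, tzinfo=timezone.utc)
print(incremental_harvest_url("http://registry.example.org/oai", last))
```

Records returned twice because of the overlap are harmless as long as the harvester replaces existing records by identifier. An overlap does not help when a registry never updates the time-stamp at all; those records still have to wait for the full harvest.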
We produce a report that identifies mismatches (broken down per managed authority) between any two registries, and discrepancies of all kinds do indeed exist. I guess that as long as they are kept to a reasonable minimum, affect only "recently changed" resources, and the different registries eventually converge, it shouldn't matter too much.
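For illustration, a comparison of that kind could be sketched as follows. This is not the code behind our report: the function names are made up, the inputs are assumed to be plain lists of identifiers already harvested from each registry, and the "example.authority" identifier is a placeholder.

```python
from collections import defaultdict


def authority_of(ivoid):
    """Return the authority part of an identifier like ivo://authority/path."""
    prefix = "ivo://"
    if not ivoid.startswith(prefix):
        raise ValueError(f"not an IVOA identifier: {ivoid!r}")
    return ivoid[len(prefix):].split("/", 1)[0]


def mismatches_by_authority(ids_a, ids_b):
    """Group identifiers present in only one of two registries by authority."""
    set_a, set_b = set(ids_a), set(ids_b)
    report = defaultdict(lambda: {"only_in_a": [], "only_in_b": []})
    for ivoid in sorted(set_a - set_b):
        report[authority_of(ivoid)]["only_in_a"].append(ivoid)
    for ivoid in sorted(set_b - set_a):
        report[authority_of(ivoid)]["only_in_b"].append(ivoid)
    return dict(report)


# Registry A has skyview but not hectospec; registry B is the reverse.
registry_a = ["ivo://nasa.heasarc/skyview/skyview",
              "ivo://example.authority/common"]   # placeholder identifier
registry_b = ["ivo://cfa.tdc/hectospec/hectospec_public.ssap.q/ssa",
              "ivo://example.authority/common"]
for authority, diff in sorted(mismatches_by_authority(registry_a, registry_b).items()):
    print(authority, diff)
```

Grouping by authority makes it easier to see whether one publishing registry accounts for most of the gaps, as was the case with HEASARC here.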
Cheers,
Menelaus.
From: "Thomas Boch" <thomas.boch at astro.unistra.fr>
To: registry at ivoa.net
Cc: ops at ivoa.net
Sent: Monday, March 14, 2016 3:43:45 PM
Subject: Inconsistency between full searchable registries
Hi Registry-enthusiasts,
I would like to report on an inconsistency I found between resources available in the EuroVO registry and in the VAO/STScI registry.
I perform a full harvest of the registry every day (through OAI-PMH) in order to retrieve and filter the services of interest to Aladin Desktop. I used to query the STScI registry for this task until I found out that some active resources were missing (for instance ivo://cfa.tdc/hectospec/hectospec_public.ssap.q/ssa ). I then switched to the EuroVO registry and have just found out that some other resources, for instance ivo://nasa.heasarc/skyview/skyview , are missing there (but are available in the STScI registry).
The full list of missing resources for each registry is attached to this message. From a quick look:
- The STScI registry is missing mostly VizieR resources (1300 of them).
- The EuroVO registry is missing mostly HEASARC services. Menelaus confirmed to me that they had an issue with querying the HEASARC registry.
What should I do? I am not really keen on querying the two registries and merging the results, as I feel this should not be done on my side. I would expect consistency between full registries, at least for resources older than one week. Am I missing something?
Cheers,
Thomas