Registry Interfaces 1.1 RFC

Theresa Dower dower at stsci.edu
Wed Feb 8 18:56:26 CET 2017


Hello,

I can tone down the language on incremental harvesting, sure. 

Almost conversely, since all of the searchable registries have independently been doing a full re-harvest from scratch every N (6?) months to account for unannounced deleted records, I'm okay with declaring that activity an operational requirement for searchable registries.  Compliance isn't as easy to validate as it is for a defined API, but the requirement is still worth stating.  I'll add this unless I see a good objection.

--Theresa

-----Original Message-----
From: registry-bounces at ivoa.net [mailto:registry-bounces at ivoa.net] On Behalf Of Markus Demleitner
Sent: Wednesday, February 08, 2017 8:55 AM
To: registry at ivoa.net
Subject: Re: Registry Interfaces 1.1 RFC

Hi Walter,

On Tue, Feb 07, 2017 at 10:39:23AM -0800, Walter Landry wrote:
> Theresa Dower <dower at stsci.edu> wrote:
> I guess I missed this last time.
> 
>   In its Identify response, an OAI-PMH-compliant registry must declare
>   its support for deleted records. This can be one of
> 
>   no
>     - the registry will never notify harvesters of records that have
      become unavailable. In an environment like the VO, where
      searchable registries frequently harvest publishing registries,
>       this is severely discouraged, as without deleted records,
>       harvesters need to perform full harvests every time or risk
>       delivering stale records.
> 
> The total amount of data that is being transferred is tiny.  Is there 
> really any need to discourage full harvests?  Retaining deleted 
> records increases implementation effort.

Even if the amount of data isn't that great (I'd not call some 100s of megabytes "tiny", though), processing and ingesting this stuff is a non-trivial effort, so sure, even at the current size of the VO, it's great if we *can* do incremental harvests.

The passage you quote says, however, that you don't *have to* implement them if you don't mind the consequences; for a smallish registry, that might be a perfectly valid decision.

I grant you the language could be toned down a little bit, perhaps to "...frequently harvest publishing registries, this should be avoided, as it leaves harvesting registries the choice of doing frequent full harvests or risk serving stale records."

So, while I'm usually the first to speak out against optional features, in this particular instance I feel there is a good case for optionalism: if "safe" incrementals are worth it for you, implement them; otherwise don't.  Your clients can easily figure out what you do.
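
For illustration, here is a minimal sketch of how a harvesting client might figure that out from an Identify response. It assumes the response has already been fetched as a string; the function names deleted_record_policy and announces_deletions are made up for this example and are not part of any registry software. The namespace and the three values (no, transient, persistent) are those defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# The three deletedRecord values OAI-PMH 2.0 allows.
KNOWN_POLICIES = {"no", "transient", "persistent"}


def deleted_record_policy(identify_xml: str) -> str:
    """Return the deletedRecord declaration from an Identify response.

    Hypothetical helper: takes the response body as a string.
    """
    root = ET.fromstring(identify_xml)
    elem = root.find(f"{OAI_NS}Identify/{OAI_NS}deletedRecord")
    if elem is None or elem.text is None:
        raise ValueError("Identify response has no deletedRecord element")
    policy = elem.text.strip()
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unexpected deletedRecord value: {policy!r}")
    return policy


def announces_deletions(policy: str) -> bool:
    """True if the registry tells harvesters about deleted records at all.

    With "no", only a full harvest can reveal that a record is gone.
    """
    return policy != "no"
```

Whether "transient" is good enough for incremental harvesting is a judgment call for the harvester, since such a registry does not promise to keep deletion notices forever.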

In practice, I think all searchable registries regularly (me, every 6 months) do full re-harvests, by the way, so even if you don't implement incrementals, records you drop will eventually disappear.
I could be swayed to lower that interval for registries without deleted records.
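
To make the scheduling idea concrete, here is a rough sketch of how a searchable registry might decide when a full re-harvest is due and which local records to drop afterwards. The 6-month interval is the one mentioned above; the 30-day interval for registries without deleted records is purely an assumed placeholder, not an agreed value, and the function names are invented for this example.

```python
from datetime import datetime, timedelta

# Roughly the 6 months mentioned in this thread.
FULL_INTERVAL_WITH_DELETES = timedelta(days=182)
# ASSUMPTION: a shorter interval for registries declaring deletedRecord "no".
# The actual value is undecided; 30 days is only a placeholder.
FULL_INTERVAL_NO_DELETES = timedelta(days=30)


def needs_full_harvest(policy: str, last_full: datetime, now: datetime) -> bool:
    """Decide whether a full re-harvest of a publishing registry is due.

    policy is the registry's deletedRecord declaration
    ("no", "transient" or "persistent").
    """
    if policy == "no":
        interval = FULL_INTERVAL_NO_DELETES
    else:
        interval = FULL_INTERVAL_WITH_DELETES
    return now - last_full >= interval


def stale_ids(local_ids: set, harvested_ids: set) -> set:
    """Identifiers held locally that a full harvest no longer returned.

    These are the unannounced deletions a full re-harvest cleans up.
    """
    return local_ids - harvested_ids
```

For example, with these assumed intervals a registry declaring "no" that was last fully harvested 38 days ago would be due again, while one declaring "persistent" would not.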

        -- Markus

