Featherweight Publishing Registries
Pierre Fernique
Pierre.Fernique at astro.unistra.fr
Fri Oct 21 10:38:30 CEST 2016
Hi Landry,
I'm not sure that I understand your intention. Do you want to start a
discussion on a new or alternative registry protocol? Is your FPR
proposal meant to be an alternative to the OAI-PMH solution for
non-publishing VO registries?
I'm not at all an OAI fan, but I think that we have to look carefully
at what impact such an evolution could have.
Pierre Fernique
On 20/10/2016 at 20:42, Walter Landry wrote:
> Hi Everyone,
>
> Here at IRSA, we run our own publishing registry, and it is a giant
> pain. The standard is rather complex, so we use a pre-packaged Perl
> script. That script is incredibly slow, which means that it takes a
> long time for anyone to harvest our repository. We recently had to
> change all of our records, and it turned out that the best way to do
> it was to have everyone manually delete and then re-harvest our
> records. This is way harder than it should be.
>
> So I would like to propose something I call a Featherweight Publishing
> Registry (FPR). It does not use OAI-PMH. It uses static files.
> Fetching the FPR URL would return a single html file. That file would
> have a list of links. Following those links would return the XML
> document for one (or maybe more) of the services.
>
> As a concrete example, the FPR entry for IRSA would be something like
>
> http://irsa.ipac.caltech.edu/FPR
>
> Fetching it would give an HTML document with links to other URLs:
>
> <!doctype html>
> <html>
> <a href="http://irsa.ipac.caltech.edu/FPR/2MASS/Catalog/CalMPSIT"></a>
> <a href="http://irsa.ipac.caltech.edu/FPR/2MASS/Catalog/CalMXSIT"></a>
> <a href="http://irsa.ipac.caltech.edu/FPR/2MASS/Catalog/CalPSWDB"></a>
> <a href="http://irsa.ipac.caltech.edu/FPR/2MASS/Catalog/CalScanInfo"></a>
> ...
> </html>
>
> There is no semantic meaning to the URLs. They could also be
> completely undescriptive.
>
> <!doctype html>
> <html>
> <a href="http://irsa.ipac.caltech.edu/xyzzy"></a>
> <a href="http://irsa.ipac.caltech.edu/zzggy"></a>
> <a href="http://irsa.ipac.caltech.edu/1bdDlXc"></a>
> <a href="http://irsa.ipac.caltech.edu/RboG305ntki"></a>
> ...
> </html>
>
> Fetching those URLs would return the XML registry document for one
> or more services. This setup makes it so that services can use an
> ordinary link checker to verify that the targets exist.
>
> There is no explicit method for adding or removing services. If a
> service is not in any of the XML registry documents, it is presumed to
> not exist anymore.
>
> This would greatly simplify creating and deploying a publishing
> registry. An archive would just have to create some static files.
>
> One objection to this scheme might be that it is wasteful of
> bandwidth. A harvesting service cannot rely on OAI-PMH for
> intelligent updates. It has to fetch all of the URLs again.
>
> I would argue that the bandwidth used is trivial. Here at IRSA, we
> have hundreds of services, giving us (I believe) the third largest
> number of services. If someone harvested our complete registry every
> minute, the bandwidth used would be less than 1% of our total outbound
> bandwidth. I doubt that, in practice, it would be a burden even for
> CDS, which has something like 30,000 services.
>
> Moreover, the current harvesting services already do a full harvest
> regularly. I understand that one reason they do not do it more
> frequently is that everyone is using this horribly slow Perl
> script. Static files can be served quickly and easily.
>
> In any event, I will be around during the Interop. So maybe we can
> discuss this then.
>
> Cheers,
> Walter Landry
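
For illustration, the harvesting flow described in the quoted proposal
(fetch the FPR index, collect its links, fetch each linked document) might
look roughly like the sketch below. This is not part of the proposal: the
names LinkCollector, fetch_text and harvest are hypothetical, UTF-8 decoding
is an assumption, and nothing here parses the XML registry documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an FPR index page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the index URL.
                self.links.append(urljoin(self.base_url, href))


def fetch_text(url):
    """Default fetcher: plain HTTP GET, decoded as UTF-8 (an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest(index_url, fetch=fetch_text):
    """Return {link: document_text} for every link found in the index page."""
    parser = LinkCollector(index_url)
    parser.feed(fetch(index_url))
    parser.close()
    # A full harvest: every linked document is fetched again on each run.
    return {link: fetch(link) for link in parser.links}
```

Calling harvest("http://irsa.ipac.caltech.edu/FPR") would return a dictionary
keyed by link URL. Passing a different fetch function makes the sketch easy to
try against local files instead of a live server.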