Featherweight Publishing Registries

Walter Landry wlandry at caltech.edu
Fri Oct 21 21:28:07 CEST 2016


Sarah Weissman <sweissman at stsci.edu> wrote:
> Do you know why the script is so slow? Is it because of an implementation
> flaw or is it because of the self-imposed Retry-after wait period that is
> built into the protocol? Or if you are storing all of your records as
> files on disk is it because of an IO bottleneck? I agree that the protocol
> is complicated, but it seems like there is no reason that transferring
> data via OAI-PMH should be much slower than any other protocol for passing
> data as XML records.

I believe it is slow because, for each request, the Perl service has
to read all of the XML files.  But I have not dug into the
implementation because I swore off Perl long ago.
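To show why re-reading every file on each request would hurt, here is a toy Python illustration. It is not the actual Perl service. The directory layout (one `*.xml` record per file) and the function and class names are hypothetical. The naive handler re-parses every record per request, so its cost grows with the number of records. The cached variant still checks file modification times on each request but only re-parses when something has changed.

```python
import glob
import os
import xml.etree.ElementTree as ET


def handle_request_naive(record_dir):
    """Hypothetical handler: re-parse every XML record on every request."""
    records = []
    for path in sorted(glob.glob(os.path.join(record_dir, "*.xml"))):
        # Full parse of each file, every time: cost scales with the number of records.
        records.append(ET.parse(path).getroot())
    return records


class CachedRecords:
    """Hypothetical alternative: parse once, re-parse only when a file changes."""

    def __init__(self, record_dir):
        self.record_dir = record_dir
        self._stamp = None
        self._records = []

    def _current_stamp(self):
        # Cheap check: (path, modification time) for every record file.
        paths = sorted(glob.glob(os.path.join(self.record_dir, "*.xml")))
        return tuple((p, os.path.getmtime(p)) for p in paths)

    def get(self):
        stamp = self._current_stamp()
        if stamp != self._stamp:
            # Something was added, removed, or modified: rebuild the parsed list.
            self._records = [ET.parse(p).getroot() for p, _ in stamp]
            self._stamp = stamp
        return self._records
```

Whether this is what actually slows down the Perl service is only a guess, as noted above.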

> If you are proposing to switch to a model where each registry returns a
> feed of all its entries, without operations for subselecting based on
> dates for example, then I would suggest looking into using Atom
> syndication https://validator.w3.org/feed/docs/atom.html, which seems to
> be designed for exactly this purpose and is already an accepted and widely
> used standard on the web.

That is a little heavier than what I am suggesting.  It requires
'title', 'updated', and 'id' fields for each entry and for the
document as a whole.  A minimal Atom feed would look like this:

  <?xml version="1.0" encoding="utf-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">

    <title>NASA/IPAC IRSA Publishing Registry</title>
    <link href="http://irsa.ipac.caltech.edu/"/>
    <updated>2003-12-13T18:30:02Z</updated>
    <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

    <entry>
      <title></title>
      <link href="http://irsa.ipac.caltech.edu/registry/2MASS/Catalog/CalMPSIT"/>
      <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
      <updated>2003-12-13T18:30:02Z</updated>
    </entry>

  </feed>

I cannot say that I would be ecstatic about making sure that I do not
mess up the 'id' and 'updated' elements, but it would sure beat the
current situation.

Cheers,
Walter Landry

