OAI coordination
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Tue Nov 25 14:17:21 PST 2003
Hello Harvesters and Harvestees,
We are beginning to see some OAI interfaces coming (back) on-line, so we
will need to coordinate on a few items regarding how we use them. As you
may know, there are several places where we, as a community, have some
latitude in presenting our metadata through this interface.
Nevertheless, we need a certain amount of uniformity.
Here is a list of recommendations on things we need to agree on. My
preferences in each case are not as important as the fact that we need to
agree, so I welcome your alternative suggestions.
1. Metadata format name for VOResource metadata: ivo_vor
This was Sebastien's suggestion. The exact choice is not too
important, but we should reduce the likelihood that the name might be
used by another community.
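To illustrate where the format name shows up in practice, here is a minimal sketch of how a harvester might build an OAI-PMH ListRecords request using it as the metadataPrefix. The base URL and the helper name list_records_url are hypothetical, chosen only for this example; the verb, metadataPrefix, and from arguments are standard OAI-PMH request parameters.

```python
from urllib.parse import urlencode

# The format name proposed in item 1.
METADATA_PREFIX = "ivo_vor"


def list_records_url(base_url, metadata_prefix=METADATA_PREFIX, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The base URL below is a placeholder, not a real registry.
    print(list_records_url("http://registry.example.org/cgi-bin/oai.pl"))
```

A harvester that only understands VOResource metadata would pass this same prefix on every request, which is why the name needs to be agreed on by everyone.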
2. "root" element for VOResource metadata; that is, the child of the
<oai:metadata> element:
I recommend we use <VOResource>, for the following reasons:
1. The <VOResource> is constrained by the schema to contain only 1
Resource. In contrast, <VODescription> allows multiple resources, so
validation would not catch a record that wrongly contains more than one.
2. The alternative of allowing <Resource> or one of its sub-classes
(e.g. <Organisation>, <Service>, etc.) will likely complicate the
handling of the data on the harvester's end if several
possible elements are allowed at this level (depending on how the
harvester is implemented).
This may not be a big deal in the short term, but in the long term
it will make it easier for a harvester to decide if it can handle
the record. In general, any application must answer the
following questions:
* is the XML instance valid (for the schemas I know/care about)?
* is the root element what I need/expect it to be?
The second question is easier to answer if there is only one
possible root element to check for.
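The second question can be sketched on the harvester's side as follows. This is only an illustration, assuming the harvester already has a single OAI <record> element as a string; the sample record and its urn:example:voresource namespace are made up, and the check looks only at the element's local name. It does not perform schema validation (the first question).

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # OAI-PMH 2.0 namespace
EXPECTED_ROOT = "VOResource"                     # root element proposed in item 2


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def metadata_root(record_xml):
    """Return the single child of <oai:metadata>, or None if there is not exactly one."""
    record = ET.fromstring(record_xml)
    metadata = record.find("{%s}metadata" % OAI_NS)
    if metadata is None or len(metadata) != 1:
        return None  # e.g. a deleted record, which carries no metadata
    return metadata[0]


def can_handle(record_xml):
    """True if the record's metadata root is the one element we expect."""
    root = metadata_root(record_xml)
    return root is not None and local_name(root.tag) == EXPECTED_ROOT


# Made-up sample record; the VOResource namespace here is a placeholder.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>ivo://example.org/sample</identifier></header>
  <metadata>
    <VOResource xmlns="urn:example:voresource"><Resource/></VOResource>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(can_handle(SAMPLE))
```

With several permitted root elements, can_handle would need a list of names and per-element handling, which is the complication described above.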
3. The form of the OAI identifier; i.e. the value of <oai:identifier>.
I would like to see us use our IVOA identifiers (in their URI forms)
here. Otherwise, we will find ourselves having to keep track of two
identifiers.
This might seem like a no-brainer, but several of us (including us at
NCSA!) are using the OAI interface script from Virginia Tech, which
creates its own OAI identifiers based on the local XML file name.
Ramon has placed a modified version of this script at
http://nvo.ncsa.uiuc.edu/VO/software/XMLFileDP_vo.pm that is meant to
serve as a drop-in replacement for XMLFileDP.pm. (Replace your old
XMLFileDP.pm in the perl library directory used by your oai.pl script.)
This version will override the default OAI identifiers with the IVOA
ones found in the corresponding VOResource files. It also has the
added benefit of supporting deleted records.
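For those not using the Perl script, the idea of taking the OAI identifier from the VOResource file can be sketched roughly as below. This is not the logic of XMLFileDP_vo.pm. It assumes that the record contains an <Identifier> element with <AuthorityID> and optional <ResourceKey> children, and that the URI form is ivo://AuthorityID/ResourceKey; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def ivoa_identifier(voresource_xml):
    """Return the URI form of the first <Identifier> found, or None.

    Assumes <Identifier> holds <AuthorityID> and, optionally, <ResourceKey>.
    """
    root = ET.fromstring(voresource_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "Identifier":
            continue
        parts = {local_name(child.tag): (child.text or "").strip() for child in elem}
        authority = parts.get("AuthorityID")
        if not authority:
            return None
        key = parts.get("ResourceKey")
        return "ivo://" + authority + ("/" + key if key else "")
    return None


# Made-up sample; element layout is an assumption, not copied from the schema.
SAMPLE = """<VOResource>
  <Resource>
    <Identifier>
      <AuthorityID>example.org</AuthorityID>
      <ResourceKey>surveys/demo</ResourceKey>
    </Identifier>
  </Resource>
</VOResource>"""

if __name__ == "__main__":
    # This value would be used as <oai:identifier> instead of a file-name-based one.
    print(ivoa_identifier(SAMPLE))
```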
Let me know if any of the above needs more clarification. Ramon and I
will be happy to work with anyone needing additional help with the OAI
interface.
There's also another item that comes to mind that is independent of OAI:
every publishing registry (i.e. registry that can be harvested from)
should export:
* one <Authority> record (see
http://nvo.ncsa.uiuc.edu/VO/schemas/vomdoc-v0.9/VORegistry-v0.2.html#element_Authority)
for each AuthorityID it creates records for.
* at least one <Registry> record (see
http://nvo.ncsa.uiuc.edu/VO/schemas/vomdoc-v0.9/VORegistry-v0.2.html#element_Registry)
that describes itself. This record should include a listing of all
the AuthorityIDs it manages (i.e. that it has Authority records for)
using the <ManagedAuthority> element.
No two registries can claim to manage the same AuthorityID. While this
may not remain true in the future, for now, this is the way we track
resource records back to their origin (as discussed by Alex).
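A harvester could check this rule once it has collected the <ManagedAuthority> values from each <Registry> record. The sketch below assumes that step has already been done and that the result is a plain mapping from registry identifier to the AuthorityIDs it claims; the function name and the sample identifiers are made up.

```python
from collections import defaultdict


def conflicting_authorities(managed):
    """Map each AuthorityID claimed by more than one registry to its claimants.

    `managed` maps a registry identifier to the AuthorityIDs listed in its
    <ManagedAuthority> elements (already extracted by the harvester).
    """
    claims = defaultdict(set)
    for registry, authority_ids in managed.items():
        for authority_id in authority_ids:
            claims[authority_id].add(registry)
    return {a: sorted(r) for a, r in claims.items() if len(r) > 1}


if __name__ == "__main__":
    # Hypothetical registries and AuthorityIDs, for illustration only.
    managed = {
        "ivo://reg-a.example.org/registry": ["reg-a.example.org", "shared.example.org"],
        "ivo://reg-b.example.org/registry": ["reg-b.example.org", "shared.example.org"],
    }
    print(conflicting_authorities(managed))
```

An empty result means every AuthorityID traces back to exactly one registry, which is what tracking records to their origin depends on.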
We need a way to register our harvesting interfaces as well. We can
either:
a. create a new standard service type, and define the appropriate
metadata
b. add additional metadata to Registry.
I prefer the latter, but we should go with whichever is easier.
Thoughts?
cheers,
Ray