OAI coordination

Ray Plante rplante at poplar.ncsa.uiuc.edu
Tue Nov 25 14:17:21 PST 2003


Hello Harvesters and Harvestees,

We are beginning to see some OAI interfaces coming (back) on-line, so we 
will need to coordinate on a few items regarding how we use them.  As you 
may know, there are several places where we, as a community, have some 
latitude in presenting our metadata through this interface.  
Nevertheless, we need a certain amount of uniformity.

Here is a list of recommendations on things we need to agree on.  My 
preferences in each case are not as important as the fact that we need to 
agree, so I welcome your alternative suggestions.

1.  Metadata format name for VOResource metadata: ivo_vor

    This was Sebastien's suggestion.  The exact choice is not too 
    important, but we should reduce the likelihood that the name might be 
    used by another community.

2.  "root" element for VOResource metadata; that is, the child of the 
    <oai:metadata> element:  

    I recommend we use <VOResource>, for the following reasons:

    1. The schema constrains <VOResource> to contain only one Resource.  
       In contrast, <VODescription> allows multiple resources, so 
       validation could not catch a record that wrongly holds more than 
       one.

    2. The alternative, allowing <Resource> or one of its sub-classes 
       (e.g. <Organisation>, <Service>, etc.) directly at this level, will 
       likely complicate the handling of the data on the harvester's end 
       (depending on how the harvester is implemented), since several 
       possible elements would have to be recognized.

       This may not be a big deal in the short term, but in the long term 
       a single root element will make it easier for a harvester to 
       decide if it can handle the record.  In general, any application 
       must answer the following questions:
         *  is the XML instance valid (for the schemas I know/care about)?
         *  is the root element what I need/expect it to be?

       The second question is easier to answer if there is only one 
       possible root element to check for.  

3. The form of the OAI identifier; i.e. the value of <oai:identifier>.

   I would like to see us use our IVOA identifiers (in their URI forms) 
   here.  Otherwise, we will find ourselves having to keep track of two 
   identifiers.  

   This might seem like a no-brainer, but several of us (including us at 
   NCSA!) are using the OAI interface script from Virginia Tech, which 
   creates its own OAI identifiers based on the local XML file name.  

   Ramon has placed a modified version of this script at 
   http://nvo.ncsa.uiuc.edu/VO/software/XMLFileDP_vo.pm that is meant to 
   serve as a drop in replacement for XMLFileDP.pm.  (Replace your old 
   XMLFileDP.pm in the perl library directory used by your oai.pl script.)  
   This version will override the default OAI identifiers with the IVOA 
   ones found in the corresponding VOResource files.  It also has the 
   added benefit of supporting deleted records.  
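
To make items 1-3 concrete, here is a rough harvester-side sketch in 
Python; it is only an illustration, not part of any existing script.  It 
builds a ListRecords request using the proposed ivo_vor prefix and then 
checks one <oai:record> against the recommendations above.  The function 
names are made up for this example, and the check for an "ivo://" prefix 
is my assumption about what the URI form of an IVOA identifier looks like.  
Element names are compared without their namespace so that the sketch 
does not depend on a particular VOResource schema version.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # OAI-PMH 2.0 namespace
METADATA_PREFIX = "ivo_vor"    # format name proposed in item 1
EXPECTED_ROOT = "VOResource"   # root element proposed in item 2
IVOA_SCHEME = "ivo://"         # assumed URI form of an IVOA identifier


def list_records_url(base_url):
    """Build an OAI-PMH ListRecords request for VOResource metadata."""
    query = urlencode({"verb": "ListRecords",
                       "metadataPrefix": METADATA_PREFIX})
    return base_url + "?" + query


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def check_record(record_xml):
    """Return a list of problems found in one <oai:record> (empty if OK)."""
    problems = []
    record = ET.fromstring(record_xml)
    header = record.find("{%s}header" % OAI_NS)

    # Item 3: the OAI identifier should be the IVOA identifier.
    identifier = None
    if header is not None:
        identifier = header.findtext("{%s}identifier" % OAI_NS)
    if identifier is None or not identifier.strip().startswith(IVOA_SCHEME):
        problems.append("OAI identifier is not an IVOA identifier")

    # Deleted records carry no metadata, so there is nothing more to check.
    if header is not None and header.get("status") == "deleted":
        return problems

    # Item 2: <oai:metadata> should hold exactly one <VOResource> child.
    metadata = record.find("{%s}metadata" % OAI_NS)
    children = list(metadata) if metadata is not None else []
    if len(children) != 1 or local_name(children[0].tag) != EXPECTED_ROOT:
        problems.append("child of <oai:metadata> is not a single <VOResource>")
    return problems
```

For example, list_records_url("http://example.org/oai.pl") (a placeholder 
base URL) gives a request ending in 
"?verb=ListRecords&metadataPrefix=ivo_vor".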

Let me know if any of the above needs more clarification.  Ramon and I 
will be happy to work with anyone needing additional help with the OAI 
interface.  

There's also another item that comes to mind that is independent of OAI: 
every publishing registry (i.e. registry that can be harvested from) 
should export:

  *  one <Authority> record (see 
     http://nvo.ncsa.uiuc.edu/VO/schemas/vomdoc-v0.9/VORegistry-v0.2.html#element_Authority) 
     for each AuthorityID it creates records for.  

  *  at least one <Registry> record (see 
     http://nvo.ncsa.uiuc.edu/VO/schemas/vomdoc-v0.9/VORegistry-v0.2.html#element_Registry) 
     that describes itself.  This record should include a listing of all 
     the AuthorityIDs it manages (i.e. that it has Authority records for) 
     using the <ManagedAuthority> element.

No two registries can claim to manage the same AuthorityID.  While this 
may not remain true in the future, for now, this is the way we track 
resource records back to their origin (as discussed by Alex).  
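
A harvester could check this rule with something like the following 
Python sketch.  It assumes the <ManagedAuthority> values have already been 
pulled out of each harvested <Registry> record into a plain dictionary; 
the function name and the identifiers in the usage note are made up for 
illustration.

```python
from collections import defaultdict


def find_authority_conflicts(managed):
    """Find AuthorityIDs claimed by more than one registry.

    `managed` maps a registry's identifier to the AuthorityIDs listed in
    its <ManagedAuthority> elements.  The result maps each contested
    AuthorityID to the sorted list of registries that claim it.
    """
    claimants = defaultdict(set)
    for registry_id, authority_ids in managed.items():
        for authority_id in authority_ids:
            claimants[authority_id].add(registry_id)
    return {authority_id: sorted(registries)
            for authority_id, registries in claimants.items()
            if len(registries) > 1}
```

Given two hypothetical registries "ivo://reg.a/registry" and 
"ivo://reg.b/registry" that both list the AuthorityID "shared.org", the 
result would contain a single entry for "shared.org" naming both 
registries; an empty result means the rule holds.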

We need a way to register our harvesting interfaces as well.  We can 
either: 
  a. create a new standard service type, and define the appropriate 
     metadata 
  b. add additional metadata to Registry.
I prefer the latter, but we should go with whichever is easier.  
Thoughts?

cheers,
Ray

 





