towards an ID spec

Ray Plante rplante at poplar.ncsa.uiuc.edu
Wed Feb 19 11:55:43 PST 2003


Hi,

I'm working on a proposal for a specification of unique identifiers
for the IVOA.  (You can peek at my gory notes thus far at
http://rai.ncsa.uiuc.edu/~rplante/VO/metadata/oidspec.txt.)  There are
a few items in particular that I'm looking for feedback on now.  These
items will be a topic of discussion at this week's NVO MWG telecon;
however, broader feedback via email would be very helpful.

The general gist of the spec is that any ID that conforms to the IETF
standard for URIs may be used as a global identifier.  However, IDs
that start with "ivoaid:" (or whatever we want to call it) imply
certain things, principally that the ID has been registered in some
form.  Use of the "ivoaid" scheme would impose additional requirements
on the ID and what you can do with it.  I also attempt to incorporate
Arnold's ideas for addressing mirrors and location-independent
names.

The first issue I'm wondering about is which URI form we want to go
with for "ivoaid" IDs.  We should probably go with one of the two
common forms of URI referred to in the standard
(http://www.ietf.org/rfc/rfc2396.txt)...

  1) URN syntax:    
     e.g. urn:ncsa.uiuc.edu:ADIL:95.DR.01

       * a colon (:) is the primary delimiter 
       * commonly used in the digital library world
       * (we're not restricted to using "urn" as the leading scheme)

  2) a net-based form of the generic syntax:  
     e.g.  ivoaid://ncsa.uiuc.edu/ADIL/95.DR.01

       * a slash (/) is the primary delimiter
       * commonly used in the Web/XML world

Which do people prefer?  I am partial to the second one myself (based
on what I'm proposing to do with it); however, I don't think it matters
that much either way.  I'd like to hear other people's opinions,
particularly in light of the 2nd issue below.
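
To make the comparison concrete, here is a minimal sketch (my own
illustration, not part of any spec) showing that both forms carry the
same ordered components and differ only in the delimiter.  The function
names are hypothetical.

```python
from typing import List


def split_urn_form(identifier: str) -> List[str]:
    """Split a colon-delimited ID into its scheme and components."""
    return identifier.split(":")


def split_net_form(identifier: str) -> List[str]:
    """Split a scheme://authority/path ID into its scheme and components."""
    scheme, rest = identifier.split("://", 1)
    return [scheme] + rest.strip("/").split("/")


# The two example IDs from the list above.
print(split_urn_form("urn:ncsa.uiuc.edu:ADIL:95.DR.01"))
print(split_net_form("ivoaid://ncsa.uiuc.edu/ADIL/95.DR.01"))
```

Apart from the leading scheme name, the two lists are identical, so
either form can express the same hierarchy.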

The 2nd issue concerns using an ID to retrieve descriptions of
things.  In general, I don't think it's a good idea to require that
every "ivoaid:" ID have a registered, retrievable description
associated with it.  That is, you may want to refer to an image in a
collection with an "ivoaid:" ID (say, in an SIA query result) but not
bother to actually register it explicitly.  This may be because:

   *  you've got too many images and it would be too much work
   *  the image or its ID is not persistent
   *  the collection's contents are changing all the time.

Instead, we would simply require that at least one of its enclosing
collections be registered.  To make it possible to learn about an ID,
whether it is explicitly registered or not, I propose that the
authority that issues the ID support a "Describe" service that works
as follows:

   1. suppose I have an image ID of the form, 
        ivoaid://ncsa.uiuc.edu/ADIL/95.DR.01.01
   2. I give this ID to the service.  If that ID is registered
        explicitly, its description is returned.
   3. If that ID is not registered, the service looks for its
        enclosing collection, ivoaid://ncsa.uiuc.edu/ADIL.
   4. The hierarchy is ascended until a description is found.  At a 
        minimum, the top level, ivoaid://ncsa.uiuc.edu, must be
        registered. 

Thus, you can always learn something about an ID.
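
The fall-back lookup in the steps above can be sketched roughly as
follows.  This is only an illustration under my own assumptions: the
registry is modelled as a plain dictionary, the function names are
hypothetical, and the descriptions are placeholders.

```python
from typing import Dict, List, Optional, Tuple

SCHEME = "ivoaid://"  # placeholder scheme name, as in the examples above


def ancestors(identifier: str) -> List[str]:
    """Return the ID followed by each enclosing collection, most specific first."""
    if not identifier.startswith(SCHEME):
        raise ValueError("not an ivoaid ID: " + identifier)
    parts = identifier[len(SCHEME):].strip("/").split("/")
    return [SCHEME + "/".join(parts[:n]) for n in range(len(parts), 0, -1)]


def describe(identifier: str, registry: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (registered ID, description) for the ID or its nearest registered ancestor."""
    for candidate in ancestors(identifier):
        if candidate in registry:
            return candidate, registry[candidate]
    return None  # only reached if not even the top level is registered


# Hypothetical registry contents, for illustration only.
registry = {
    "ivoaid://ncsa.uiuc.edu": "description of the ncsa.uiuc.edu authority",
    "ivoaid://ncsa.uiuc.edu/ADIL": "description of the ADIL collection",
}
print(describe("ivoaid://ncsa.uiuc.edu/ADIL/95.DR.01.01", registry))
```

Here the image itself is not registered, so the lookup falls back to
the description of the enclosing ADIL collection.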

My questions on this issue are:
  o  Should we require that all "ivoaid:" IDs be explicitly
     registered, or can we get away with just requiring registration,
     at the least, of one of the enclosing collections?
  o  Is the "fall-back" Describe service a good idea?
  o  If so, it requires that / (or :, in the URN syntax) imply
     containment.  Is this a problem?
  o  Does the "fall-back" Describe functionality affect which URI form
     we choose?

The 3rd issue (if you're still with me) concerns mirroring and data
relocation.  Arnold proposed a three-component ID of the form "L:P:D",
where L=resource location, P=project/service, and D=dataset (see 
http://www.ivoa.net/forum/registry/0060.htm).  "L:P:D" points to a
specific instance of a dataset at a specific location.  "P:D" can be
used as a location-independent ID for the dataset which is resolvable
to a location by querying a registry for "P".  

I would propose folding this idea in as follows.  Suppose
SDSS hosts a collection with the ID, "ivoaid://sdss.jhu.edu/SDSS/catalogs".
And suppose STScI wants to mirror that collection.  It would re-use
the "SDSS/catalogs" part of the ID for its mirror (that's the "P"
part); it could register this as
"ivoaid://stsci.edu/mirrors/SDSS/catalogs".  Now suppose that I want
to access one of the items in this collection:
"SDSS/catalogs/extended" (that's the "P:D" part).  I would resolve
this to a list of locations using a "Match ID" service of a registry.
The registry would first look for all IDs that end in
"SDSS/catalogs/extended".   Since this is not registered, it won't
find anything, so it ascends the ID and looks for IDs ending in
"SDSS/catalogs".  This would return both occurrences:
"ivoaid://sdss.jhu.edu/SDSS/catalogs" and
"ivoaid://stsci.edu/mirrors/SDSS/catalogs". 

Note: just because 2 IDs share some portion does not by itself
indicate that they are mirrors.  To determine this definitively, one
would have to look at the metadata for the two collections.  We can
imagine specific metadata for describing this.
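
As a rough illustration of the "Match ID" lookup described above, here
is a minimal sketch.  It assumes the registry can simply be scanned as
a list of registered IDs and that matching is done on whole
"/"-delimited segments; the function names are hypothetical.

```python
from typing import Iterable, List


def path_segments(identifier: str) -> List[str]:
    """Split an ID (with or without a 'scheme://' prefix) into '/' segments."""
    return identifier.split("://", 1)[-1].strip("/").split("/")


def match_id(partial: str, registered: Iterable[str]) -> List[str]:
    """Return registered IDs ending in the partial ID or its nearest matching ancestor."""
    wanted = path_segments(partial)
    registered = list(registered)
    while wanted:
        hits = [rid for rid in registered
                if path_segments(rid)[-len(wanted):] == wanted]
        if hits:
            return hits
        wanted = wanted[:-1]  # ascend: drop the last segment and retry
    return []


# The two registrations from the example above, plus an unrelated one.
registered_ids = [
    "ivoaid://sdss.jhu.edu/SDSS/catalogs",
    "ivoaid://stsci.edu/mirrors/SDSS/catalogs",
    "ivoaid://ncsa.uiuc.edu/ADIL",
]
print(match_id("SDSS/catalogs/extended", registered_ids))
```

The result is only a list of candidates: as noted, the metadata would
still have to be checked before treating them as mirrors.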

Questions:
  o  Is this a good framework for handling mirrors/data relocation?
  o  (Arnold:)  does this satisfy the requirements for
     location-independent names (as needed by the journals)? 

I look forward to feedback.  We'll also talk about this at this week's
NVO MWG.

thanks,
Ray



