VO and ADEC identifiers

Alberto Accomazzi aaccomazzi at cfa.harvard.edu
Wed Sep 17 06:45:44 PDT 2003


Hi Arnold and Ray,

I've also been thinking about Ray's proposal, and in the last couple of 
days I have read up on the IVOA identifier WD and spoken to both 
Guenther and Arnold about this.

I think the proposal is good and can be adopted for use by the ADEC 
dataset linking/verification system.  For the sake of clarity (and to 
confirm that I am fully understanding what Ray is proposing) let me 
restate a few things.  Ray, please let me know if this matches up with 
what you have been thinking.

The identifier to be used in journals to refer to a dataset will be in 
the form:

	authorityID/datacollection#PrivateID

where both "authorityID" and "authorityID/datacollection" have entries 
in the registry.  PrivateID can be anything the authorityID has chosen 
it to be, and represents a unique identifier that has been assigned to a 
dataset within the particular data collection.

For instance, MAST has a number of missions it holds data for, each with 
a set of datacollections available (see 
http://archive.stsci.edu/dataset_verifier.html).  Assuming the 
authorityID of "NASA.HST" has been created for the HST mission, the 
identifiers for its datasets would look something like this:

	NASA.HST/STIS#O4LT010E0
	NASA.HST/ACS#J8FF03021
	NASA.HST/WFPC2#U32L0104T
	...
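
As a concrete illustration of the syntax, here is a minimal Python sketch 
of how a tool might split such an identifier into its three parts.  The 
names DatasetId and parse_dataset_id are made up for this example and are 
not part of Ray's proposal or the identifier WD.  I am also assuming that 
the split happens at the first "#", since the PrivateID is opaque to 
everyone but the authority that assigned it.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    """Hypothetical container for the three parts of an identifier."""
    authority_id: str     # e.g. "NASA.HST"
    datacollection: str   # e.g. "STIS"
    private_id: str       # e.g. "O4LT010E0"


def parse_dataset_id(identifier: str) -> DatasetId:
    """Split 'authorityID/datacollection#PrivateID' into its parts."""
    # Split at the first '#' before looking for '/', because the
    # PrivateID is opaque and may itself contain '/' characters.
    registered, sep, private_id = identifier.partition("#")
    if not sep or not private_id:
        raise ValueError(f"missing '#PrivateID' in {identifier!r}")
    # Everything before the first '/' is the authority ID.
    authority_id, sep, datacollection = registered.partition("/")
    if not sep or not authority_id:
        raise ValueError(f"missing 'authorityID/' in {identifier!r}")
    return DatasetId(authority_id, datacollection, private_id)


print(parse_dataset_id("NASA.HST/STIS#O4LT010E0"))
# prints: DatasetId(authority_id='NASA.HST', datacollection='STIS', private_id='O4LT010E0')
```

The registry lookup would then use authority_id and 
authority_id + "/" + datacollection as keys, and hand private_id to 
whatever service the data collection entry points to.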

I'm not sure what to do with missions that do not currently have 
explicit data collections defined for them, e.g. IUE.  One solution would 
be to allow the identifiers to have an empty datacollection:

	NASA.IUE/#LWP25899

This would be akin to being able to specify URLs of the kind 
http://ads.harvard.edu/ rather than http://ads.harvard.edu/index.html, 
but it complicates things a bit when it comes to implementing the 
registry lookup, since NASA.IUE would have to appear as both an 
authority ID and a datacollection.  Or maybe we should similarly 
stipulate that a default data collection name should be used if none has 
been specified.
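
To show what the second option could look like in practice, here is a 
rough Python sketch that computes the registry key for an identifier and 
substitutes a default name when the datacollection is empty.  The function 
name registry_key and the placeholder value "default" are assumptions of 
mine for illustration only; no default name has been agreed upon.

```python
# Hypothetical placeholder; the actual default name is still to be decided.
DEFAULT_COLLECTION = "default"


def registry_key(identifier: str, default: str = DEFAULT_COLLECTION) -> str:
    """Return the 'authorityID/datacollection' key to look up in the registry."""
    # Only the part before the first '#' has an entry in the registry.
    registered = identifier.partition("#")[0]
    authority_id, _, datacollection = registered.partition("/")
    # Fall back to the default name when no datacollection was given.
    return f"{authority_id}/{datacollection or default}"


print(registry_key("NASA.IUE/#LWP25899"))       # prints: NASA.IUE/default
print(registry_key("NASA.HST/STIS#O4LT010E0"))  # prints: NASA.HST/STIS
```

With this approach NASA.IUE only needs to appear in the registry as an 
authority ID, and the lookup code does not need a special case for 
identifiers without a datacollection.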

Also, while we're at it, I would suggest specifying an explicit scheme 
prefix to indicate what domain these identifiers belong to.  I'm 
thinking of simply prepending vo: or vo:// to it, like Ray has specified 
in the VO identifier WD:

	vo://NASA.HST/STIS#O4LT010E0

This will mark the identifier unambiguously as something that 
needs to be resolved against the VO registry.  It also means that in 
principle we can mix identifiers of different kinds in our documents and 
leave the resolution to the appropriate authorities and tools.  This is 
similar to what the DOI foundation does for their own identifiers, which 
use a syntax of "doi:authority/object_id".  (Speaking of DOIs, there is 
no reason in my mind why we could not make use of them as well, but I'll 
save that for another discussion.)
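
A short Python sketch of the kind of dispatch I have in mind follows.  
The function names and the table of resolvers are invented for this 
example, and I am assuming that both the "vo:" and "vo://" forms should 
be accepted until we settle on one.

```python
from typing import Tuple


def split_scheme(identifier: str) -> Tuple[str, str]:
    """Return (scheme, rest), e.g. ('vo', 'NASA.HST/STIS#O4LT010E0')."""
    scheme, sep, rest = identifier.partition(":")
    # Require an alphabetic scheme so that a ':' inside a PrivateID is
    # not mistaken for a scheme separator.
    if not sep or not scheme.isalpha():
        raise ValueError(f"no scheme prefix in {identifier!r}")
    # Accept both 'vo:' and 'vo://' by dropping an optional leading '//'.
    if rest.startswith("//"):
        rest = rest[2:]
    return scheme.lower(), rest


# Hypothetical mapping from scheme to the authority that resolves it.
RESOLVERS = {"vo": "VO registry", "doi": "DOI resolver"}


def choose_resolver(identifier: str) -> str:
    """Name the resolver responsible for an identifier, based on its scheme."""
    scheme, _ = split_scheme(identifier)
    if scheme not in RESOLVERS:
        raise ValueError(f"unknown scheme {scheme!r}")
    return RESOLVERS[scheme]


print(choose_resolver("vo://NASA.HST/STIS#O4LT010E0"))  # prints: VO registry
print(choose_resolver("doi:authority/object_id"))       # prints: DOI resolver
```

A document processor would only need to look at the scheme to decide 
where to send each identifier, and would not have to understand the 
syntax of the part that follows.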

I still have some questions about how the deployment of resolvers should 
take place both in the short and in the long run, but I don't see a 
reason why this should stop us from adopting this syntax right away.


-- Alberto


Arnold Rots wrote:
> Ray,
> 
> I have been mulling this over and I am not sure what the merit is of
> introducing a third element into the ADEC identifier.
> Can't we just have an authority Id and a dataset identifier, where the
> former (everything before the first /) is resolved to a physical URL
> with the latter provided as, say, a parameter to that URL?
> I'm not particularly fond of the # and, frankly, I don't see the
> advantage.
> 
> Also, instead of going through the system of type and status
> attributes, would it not be preferable to just have an expiration
> date?  That would automatically indicate whether the Id is persistent
> (please note spelling), expired, or whatever.
> 
>   - Arnold
> 
> Ray Plante wrote:
> 
>>Hi ADECers,
>>
>>After looking over Alberto's page, I feel I have a much clearer idea of 
>>how your scheme is meant to work, and I think providing compatibility with 
>>VO Registries can be a fairly simple matter.  I shared my ideas about it 
>>on the registry list, but I also wanted to try some direct discussion with 
>>you guys.
>>
>>I like your architecture; I don't see any inherent incompatibilities with 
>>what we're doing in the VO (not surprising).  I think the only tweaking on 
>>your end that would be necessary would be the syntactic use of dots and 
>>slashes that we talked about in Victoria; that is, if ADEC identifiers 
>>were valid IVOA identifiers (in URI form), then that should do it.  That 
>>is, we both have a 2-component identifier where the first component is a 
>>namespace name.  Namespaces based on telescope names are fine.  Also, my 
>>suggestion (below) recommends use of a # in the dataset ID portion.  Is 
>>that ok?
>>
>>The other tweak on your end is to allow (encourage) the data resolver
>>services at the data centers that ADS queries to be registered in VO
>>registries.  
>>
>>The main tweaking I'm recommending is on the IVOA end, and it depends
>>on the definition of a logical, location-independent name called a
>>"LogicalIdentifier".  This would be a separate ID from the
>>organization-dependent ResourceID.  Like the ResourceID, the LogicalID
>>has the two components.  Often the ResourceID and LogicalID will be
>>the same.  The definite exception would be a mirror of a resource:
>>here, the LogicalID would be the same as the LogicalID of the thing it
>>mirrors, but its ResourceID would be unique.
>>
>>So far, this bit about the LogicalIdentifier is largely independent of
>>the ADEC work; we could (should) do this anyway.  The connection to
>>the ADEC work comes in the form of a recommendation to data providers
>>for choosing ADEC identifiers.  Assuming that the dataset is not itself
>>registered in a VO registry, the ADEC identifier would be the
>>LogicalIdentifier of the dataset's registered DataCollection, a # sign
>>followed by a dataset name.  E.g.
>>
>>         HC.BIMA/BDA#t421/c110.ori
>>         \_____/ \_/ \___________/
>>            |     |        |
>>  IVOA   authID  ResKey  Dataset name
>>         \_____/ \_________________/
>>            |             |
>>  ADEC   InstID    dataset ID   
>>
>>or 
>>         bima.org/BDA#t421/c110.ori
>>
>>This would allow users to go to a VO registry and resolve the ADEC
>>identifier to a data collection (by chopping off the bit after the
>>#).  That data collection description could include a reference to the
>>data resolver service that can resolve it to a URL that can be used
>>for access.  Thus, VO registries can be used to resolve ADEC
>>identifiers.  
>>
>>To make this all clean and open and accessible to everyone, I would
>>also recommend the following:
>>
>>   o  that the interface that ADS uses to query its selection of data
>>      centers be the same as the interface UCP uses to query ADS.  
>>
>>   o  that the DataResolver service be developed as an IVOA standard.
>>
>>This way anyone could implement the DataResolver service.  ADS could
>>choose to add a particular implementation to its list of data centers
>>or not.  Users could use ADS's data resolver, which may or may not be
>>able to resolve the ID; they could also use a VO registry if it comes from
>>a data center that does not happen to be in ADS's list.  
>>
>>How does this sound?
>>
>>cheers,
>>Ray
>>
> 
> --------------------------------------------------------------------------
> Arnold H. Rots                                Chandra X-ray Science Center
> Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
> 60 Garden Street, MS 67                              fax:  +1 617 495 7356
> Cambridge, MA 02138                             arots at head-cfa.harvard.edu
> USA                                     http://hea-www.harvard.edu/~arots/
> --------------------------------------------------------------------------


-- 

****************************************************************************
Alberto Accomazzi
NASA Astrophysics Data System                     http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics      http://cfa-www.harvard.edu
60 Garden Street, MS 31, Cambridge, MA 02138 USA
****************************************************************************


