VO and ADEC identifiers
Alberto Accomazzi
aaccomazzi at cfa.harvard.edu
Wed Sep 17 06:45:44 PDT 2003
Hi Arnold and Ray,
I've also been thinking about Ray's proposal, and in the last couple of
days I have read up on the IVOA identifier WD and spoken to both
Guenther and Arnold about this.
I think the proposal is good and can be adopted for use by the ADEC
dataset linking/verification system. For the sake of clarity (and to
confirm that I am fully understanding what Ray is proposing) let me
restate a few things. Ray, please let me know if this matches up with
what you have been thinking.
The identifier to be used in journals to refer to a dataset will be in
the form:
authorityID/datacollection#PrivateID
where both "authorityID" and "authorityID/datacollection" have entries
in the registry. PrivateID can be anything the authorityID has chosen
it to be, and represents a unique identifier that has been assigned to a
dataset within the particular data collection.
For instance, MAST has a number of missions it holds data for, each with
a set of datacollections available (see
http://archive.stsci.edu/dataset_verifier.html). Assuming the
authorityID of "NASA.HST" has been created for the HST mission, the
identifiers for its datasets would look something like this:
NASA.HST/STIS#O4LT010E0
NASA.HST/ACS#J8FF03021
NASA.HST/WFPC2#U32L0104T
...
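
To make the syntax concrete, here is a rough sketch in Python of how a tool
might split such an identifier. The names DatasetId and parse_dataset_id are
made up for illustration and are not part of any ADEC or IVOA software, and
the strict rejection of an empty data collection is an assumption of the
sketch only. It splits on the first "#" so that a PrivateID containing
slashes stays in one piece.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    """Parts of an identifier of the form authorityID/datacollection#PrivateID."""
    authority_id: str
    data_collection: str
    private_id: str

    @property
    def registry_key(self) -> str:
        # The part that is expected to have an entry in the registry.
        return f"{self.authority_id}/{self.data_collection}"


def parse_dataset_id(identifier: str) -> DatasetId:
    """Split an identifier into its three parts (hypothetical helper)."""
    # Everything after the first '#' is the PrivateID, slashes included.
    resource, hash_sep, private_id = identifier.partition("#")
    if not hash_sep or not private_id:
        raise ValueError(f"missing '#PrivateID' in {identifier!r}")
    # Everything before the first '/' is the authority ID.
    authority_id, slash_sep, data_collection = resource.partition("/")
    if not slash_sep or not authority_id or not data_collection:
        # Assumption: this sketch requires a non-empty data collection.
        raise ValueError(f"expected 'authorityID/datacollection' in {identifier!r}")
    return DatasetId(authority_id, data_collection, private_id)


if __name__ == "__main__":
    print(parse_dataset_id("NASA.HST/STIS#O4LT010E0"))
```

The registry_key property is the string one would look up in the registry;
the PrivateID is left for the data center to interpret.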
I'm not sure what to do with missions that do not currently have
explicit data collections defined for them, e.g. IUE. One solution would be to
allow the identifiers to have an empty datacollection:
NASA.IUE/#LWP25899
This would be akin to being able to specify URLs of the kind
http://ads.harvard.edu/ rather than http://ads.harvard.edu/index.html,
but it complicates things a bit when it comes to implementing the
registry lookup, since NASA.IUE would have to appear as both an
authority ID and a datacollection. Or maybe we should similarly
stipulate that a default data collection name should be used if none has
been specified.
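
The second option could look roughly like the Python sketch below. The
function name registry_key and the placeholder name "default" are both
invented here for illustration; no default name has been agreed on.

```python
# Hypothetical placeholder; the actual default name is not decided.
DEFAULT_COLLECTION = "default"


def registry_key(identifier: str, default_collection: str = DEFAULT_COLLECTION) -> str:
    """Return the authorityID/datacollection part to look up in the registry.

    If the data collection is empty or absent, substitute the default name
    so the registry only ever sees two-component keys.
    """
    # Drop the '#PrivateID' part; only the resource part goes to the registry.
    resource, _, _ = identifier.partition("#")
    authority_id, _, collection = resource.partition("/")
    return f"{authority_id}/{collection or default_collection}"


if __name__ == "__main__":
    print(registry_key("NASA.IUE/#LWP25899"))
    print(registry_key("NASA.HST/STIS#O4LT010E0"))
```

With this approach NASA.IUE only needs to appear in the registry as an
authority ID, plus one entry for its default collection.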
Also, while we're at it, I would suggest specifying an explicit scheme
prefix to indicate what domain these identifiers belong to. I'm
thinking of simply prepending vo: or vo:// to it, like Ray has specified
in the VO identifier WD:
vo://NASA.HST/STIS#O4LT010E0
This will mark the identifier unambiguously as something that
needs to be resolved against the VO registry. It also means that in
principle we can mix identifiers of different kinds in our documents and
leave the resolution to the appropriate authorities and tools. This is
similar to what the DOI foundation does for their own identifiers, which
use a syntax of "doi:authority/object_id" (Speaking of DOIs, there is
no reason in my mind why we could not make use of them as well, but I'll
save that for another discussion).
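
As an illustration of mixing identifier kinds, the following Python sketch
routes an identifier by its scheme prefix. Everything in it (split_scheme,
route, the HANDLERS table and the strings the handlers return) is
hypothetical and stands in for real resolvers, which are not specified here.

```python
from typing import Callable, Dict, Tuple


def split_scheme(identifier: str) -> Tuple[str, str]:
    """Split 'scheme:rest' or 'scheme://rest' into (scheme, rest)."""
    scheme, sep, rest = identifier.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme prefix in {identifier!r}")
    # Accept both the 'vo:' and the 'vo://' spellings.
    if rest.startswith("//"):
        rest = rest[2:]
    return scheme.lower(), rest


# Placeholder handlers: each just describes what a real resolver would do.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "vo": lambda body: f"look up {body.partition('#')[0]} in a VO registry",
    "doi": lambda body: f"hand {body} to a DOI resolver",
}


def route(identifier: str) -> str:
    """Pick the handler registered for the identifier's scheme."""
    scheme, body = split_scheme(identifier)
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise LookupError(f"no resolver known for scheme {scheme!r}")
    return handler(body)


if __name__ == "__main__":
    print(route("vo://NASA.HST/STIS#O4LT010E0"))
    print(route("doi:authority/object_id"))
```

The point of the table is that a document processor does not need to
understand any one identifier family; it only needs the prefix to decide
who to ask.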
I still have some questions about how the deployment of resolvers should
take place both in the short and in the long run, but I don't see a
reason why this should stop us from adopting this syntax right away.
-- Alberto
Arnold Rots wrote:
> Ray,
>
> I have been mulling this over and I am not sure what the merit is of
> introducing a third element into the ADEC identifier.
> Can't we just have an authority Id and a dataset identifier, where the
> former (everything before the first /) is resolved to a physical URL
> with the latter provided as, say a parameter, to that URL?
> I'm not particularly fond of the # and, frankly, I don't see the
> advantage.
>
> Also, instead of going through the system of type and status
> attributes, would it not be preferable to just have an expiration
> date? That would automatically indicate whether the Id is persistent
> (please note spelling), expired, or whatever.
>
> - Arnold
>
> Ray Plante wrote:
>
>>Hi ADECers,
>>
>>After looking over Alberto's page, I feel I have a much clearer idea of
>>how your scheme is meant to work, and I think providing compatibility with
>>VO Registries can be a fairly simple matter. I shared my ideas about it
>>on the registry list, but I also wanted to try some direct discussion with
>>you guys.
>>
>>I like your architecture; I don't see any inherent incompatibilities with
>>what we're doing in the VO (not surprising). I think the only tweaking on
>>your end that would be necessary would be the syntactic use of dots and
>>slashes that we talked about in Victoria; that is, if ADEC identifiers
>>were valid IVOA identifiers (in URI form), then that should do it. That
>>is, we both have a 2-component identifier where the first component is a
>>namespace name. Namespaces based on telescope names are fine. Also, my
>>suggestion (below) recommends use of a # in the dataset ID portion. Is
>>that ok?
>>
>>The other tweak on your end is to allow (encourage) the data resolver
>>services at the data centers that ADS queries to be registered in VO
>>registries.
>>
>>The main tweaking I'm recommending is on the IVOA end, and it depends
>>on the definition of a logical, location-independent name called a
>>"LogicalIdentifier". This would be a separate ID from the
>>organization-dependent ResourceID. Like the ResourceID, the LogicalID
>>has the two components. Often the ResourceID and LogicalID will be
>>the same. The definite exception would be a mirror of a resource:
>>here, the LogicalID would be the same as the LogicalID of the thing it
>>mirrors, but its ResourceID would be unique.
>>
>>So far, this bit about the LogicalIdentifier is largely independent of
>>the ADEC work; we could (should) do this anyway. The connection to
>>the ADEC work comes in the form of a recommendation to data providers
>>for choosing ADEC identifiers. Assuming that the dataset is not itself
>>registered in a VO registry, the ADEC identifier would be the
>>LogicalIdentifier of the dataset's registered DataCollection, a # sign
>>followed by a dataset name. E.g.
>>
>>      HC.BIMA/BDA#t421/c110.ori
>>      \_____/ \_/ \___________/
>>         |     |        |
>>  IVOA authID ResKey Dataset name
>>      \-----/ \_______________/
>>         |            |
>>  ADEC InstID    dataset ID
>>
>>or
>> bima.org/BDA#t421/c110.ori
>>
>>This would allow users to go to a VO registry and resolve the ADEC
>>identifier to a data collection (by chopping off the bit after the
>>#). That data collection description could include a reference to the
>>data resolver service that can resolve it to a URL that can be used
>>for access. Thus, VO registries can be used to resolve ADEC
>>identifiers.
>>
>>To make this all clean and open and accessible to everyone, I would
>>also recommend the following:
>>
>> o that the interface that ADS uses to query its selection of data
>> centers be the same as the interface UCP uses to query ADS.
>>
>> o that the DataResolver service be developed as an IVOA standard.
>>
>>This way anyone could implement the DataResolver service. ADS could
>>choose to add a particular implementation to its list of data centers
>>or not. Users could use ADS's data resolver, which may or may not be
>>able to resolve the ID; they could also use a VO registry if it comes from
>>a data center that does not happen to be in ADS's list.
>>
>>how does this sound?
>>
>>cheers,
>>Ray
>>
>
> --------------------------------------------------------------------------
> Arnold H. Rots Chandra X-ray Science Center
> Smithsonian Astrophysical Observatory tel: +1 617 496 7701
> 60 Garden Street, MS 67 fax: +1 617 495 7356
> Cambridge, MA 02138 arots at head-cfa.harvard.edu
> USA http://hea-www.harvard.edu/~arots/
> --------------------------------------------------------------------------
--
****************************************************************************
Alberto Accomazzi
NASA Astrophysics Data System http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden Street, MS 31, Cambridge, MA 02138 USA
****************************************************************************