Resource Identifiers: discussion synthesis

Tony Linde ael at star.le.ac.uk
Thu Jun 5 00:46:40 PDT 2003


Good job, Ray. Let's deal with the easy bits first.

>  1. Q:  Should IDs carry any semantic information?
>  A:  They can, but they are not required to.  More precisely, the ID 
>      standard and its use in standard registry interfaces should not rely 
>      on it.  

I assume you mean other than the fact that there are two components to the
ID: AuthorityID and ResourceKey? So, agreed.
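
To make that concrete, here is a minimal Python sketch of what "not relying on semantics" could look like in software: the ID is split into its two components and compared, and nothing else is read into the strings. The `ivo://` scheme, the `ResourceID` type and the function names are illustrative assumptions, not something the proposal fixes.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ResourceID(NamedTuple):
    authority_id: str  # the namespace; typically looks like a DNS name
    resource_key: str  # only needs to be unique within that namespace


def parse_id(uri: str) -> ResourceID:
    """Split an ID into its two components; nothing else is interpreted."""
    parts = urlsplit(uri)
    if not parts.netloc:
        raise ValueError(f"no AuthorityID found in {uri!r}")
    return ResourceID(parts.netloc, parts.path.lstrip("/"))


def same_resource(a: str, b: str) -> bool:
    # Identity is a plain comparison of the two components. Any meaning a
    # human reads into the characters plays no part in it.
    return parse_id(a) == parse_id(b)
```

A registry interface written this way works the same whether the ResourceKey is a readable path or an opaque string.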

>  3. Q: How do IDs address the problem of transience and replication?  
>  A: They do not.  Replicated resources have different IDs; one must
>     consult the resources' metadata to know that they are mirrors.
>     IDs may persist when the resource moves around within its own
>     namespace; however, they cannot persist when the resource is
>     curated by a new organization with a different namespace.
> 
>  Q: Do they need to do more?
>  A: No.  A URN system should be built on top of the ID standard and
>     registry interfaces.  

Also agreed.
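
As a rough illustration of layering a URN system on top of the IDs: mirrors keep distinct IDs, and an index built from the resource metadata maps a URN to the set of IDs that declare it. The `mirror_of` metadata field, the example IDs and the URN are all invented for this sketch.

```python
from collections import defaultdict

# Hypothetical resource descriptions keyed by ID. "mirror_of" is an invented
# metadata element naming a location-independent URN.
descriptions = {
    "ivo://site-a.example/survey": {"mirror_of": "urn:example:survey"},
    "ivo://site-b.example/survey-copy": {"mirror_of": "urn:example:survey"},
    "ivo://site-a.example/other": {},
}


def build_urn_index(descriptions):
    """Group IDs by the URN their metadata declares, if any."""
    index = defaultdict(set)
    for resource_id, metadata in descriptions.items():
        urn = metadata.get("mirror_of")
        if urn is not None:
            index[urn].add(resource_id)
    return dict(index)


urn_index = build_urn_index(descriptions)
```

The IDs themselves carry no hint that the two surveys are mirrors. That knowledge lives entirely in the metadata and in the index built from it.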

>   2. Q: Who controls the components of an ID?
>   Original A:  the registry
>   Revised A:  the registering data provider.  Thus, the AuthorityID no 
>     longer implies the existence of any registry service specific to the 
>     AuthorityID.  The specification merely requires that AuthorityIDs
>     are uniquely associated with organizations (or individuals) that
>     own them. 

Less sure. I'm happy with the idea that an organisation can 'own' an
authorityID but how is this determined? If it is based on certificates then
it comes down to one individual. And all registries have to implement this
ownership checking - and in the same way.

How about a modification of the original idea: the registry can control
*multiple* authorityIDs. These authorityIDs are 'owned' by organisations or
users but one registry controls their allocation. 

So an organisation 'tells' a registry to reserve an authorityID. The
registry lists the authorityID as a resource with metadata listing the
registry as controller. This authority resource is then replicated around
all the full registries. If anyone attempts to register a resource with that
authorityID at any other registry it is disallowed. The organisation then
uses that one registry to register all its resources.
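
A minimal sketch of that flow, under some loud assumptions: the shared dictionary stands in for replication of authority records between full registries, and the class, method and exception names are hypothetical. How a registry verifies who is asking is deliberately left out, since that would be up to each registry.

```python
class AuthorityTaken(Exception):
    """Raised when an authorityID has already been reserved somewhere."""


class Registry:
    def __init__(self, name, authority_records):
        self.name = name
        # Shared dict: a stand-in for authority records replicated
        # around all the full registries.
        self.authority_records = authority_records
        self.resources = set()

    def reserve_authority(self, authority_id, owner):
        if authority_id in self.authority_records:
            raise AuthorityTaken(authority_id)
        # The authority is itself listed as a resource whose metadata
        # names this registry as controller.
        self.authority_records[authority_id] = {
            "owner": owner,
            "controller": self.name,
        }

    def register(self, authority_id, resource_key):
        record = self.authority_records.get(authority_id)
        if record is None or record["controller"] != self.name:
            raise PermissionError(
                f"{authority_id!r} is not controlled by {self.name}"
            )
        self.resources.add((authority_id, resource_key))


shared = {}
reg_a = Registry("registry-a", shared)
reg_b = Registry("registry-b", shared)
reg_a.reserve_authority("datacentre.example", owner="Example Data Centre")
reg_a.register("datacentre.example", "catalogues/xray")
```

With this in place, `reg_b.register("datacentre.example", ...)` is refused because the replicated record names `registry-a` as controller.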

The most common way this would be used is for a data centre with a simple
local registry serving one organisation to register one authorityID. But it
also allows for one registry to serve multiple organisations. And it allows
the organisation to choose its own authorityID. It also reserves the
possibility that someone can set up a registry on our original basis for
those who aren't bothered about the authorityID.

The only registry implementation that has to worry about checking that the
'owner' of an authorityID is registering under that id is the one
controlling multiple authorityIDs. We then don't have to mandate the same
method of checking throughout the VO, which we would have to do if we allowed
people to register any resource with any authorityID at any registry.

What do you think?

Cheers,
Tony. 

> -----Original Message-----
> From: Ray Plante [mailto:rplante at poplar.ncsa.uiuc.edu] 
> Sent: 03 June 2003 22:52
> To: registry at ivoa.net
> Subject: Resource Identifiers: discussion synthesis
> 
> 
> Hi all,
> 
> Thanks to everyone who provided comments on the ID proposal.  As I've
> mentioned, I felt like it was in need of wider discussion.  And while the
> discussion may seem to have gotten a bit chaotic, I think we can distill
> some cogent issues.  From my reading, I believe the comments that have
> been raised address the following questions:
> 
>   1. Should IDs carry any semantic information?
> 
>   2. Who chooses/controls what is contained in the components of a 
>      specific ID?  We specifically discussed who chooses the AuthorityID.
> 
>   3. How do IDs address the problem of transience and replication?  Do 
>      they need to do more?  
> 
> In this message I will address each in turn.  While it is true that these
> are interrelated, I think conflating them makes it harder to make
> progress.  I encourage others to attempt to separate them as far as
> possible.  
> 
> Here's my punchline ahead of time:  I think all of the concerns raised can
> be addressed with a minor adjustment to the original proposal regarding
> who controls AuthorityIDs.  
> 
> --------------------------------------------------------------------------
> 1. Should IDs carry any semantic information?
> --------------------------------------------------------------------------
> 
> Personally, I'm finding the myriad analogies to the book 
> industry, email, and stock tickers of limited value as they 
> too quickly veer off the mark and confuse the issue.  The 
> best analogy for a VO identifier (which is so close, it could 
> cease to be an analogy) is one we all understand: URLs.  
> Do URLs carry semantic content?  Sure they do: from a URL, we 
> can often deduce all sorts of things about what it points to.  Is
> there a standard for how semantic meaning is encoded?  
> Absolutely not.  Do machines universally rely on interpreting 
> the semantic content?  No.  (In general, the programs that do 
> "micro-parse" the URLs are necessarily controlled by the same 
> people that control the content.)  The ID proposal intends no 
> more than this.
> 
> Whether or not the URL characters contain anything meaningful to anyone
> does not affect the ability of browsers and servers to talk to each other.
> Nevertheless, I think we can say that it is incredibly helpful that we can
> put little messages into them that help humans remember them, copy them
> without error, and debug the systems that use them.  This brings up a
> related question: are URLs intended for human consumption?  The answer is,
> no, normally not.  When hidden behind highlighted text, they can usually
> be ignored.  Nevertheless, humans do occasionally handle them directly.  
> 
> Adopting a URI-based identifier allows for this same flexibility, in a
> manner that people are used to from URLs.  (The XML version is equivalent
> in composition; however, the parseable components are tagged individually
> to allow easier handling through XML parsers.)  Where the URL analogy
> *potentially* breaks down is addressed in the next section.
> 
>  Q:  Should IDs carry any semantic information?
>  A:  They can, but they are not required to.  More precisely, the ID 
>      standard and its use in standard registry interfaces should not rely 
>      on it.  
> 
> Is this acceptable?
> 
> ---------------------------------------------------------------------------
> 2.  Who controls the components of an ID? 
> ---------------------------------------------------------------------------
> 
> Back in February, the NVO project generated a set of requirements for IDs;
> one of them stated that the framework should maximize the freedom of data
> providers to choose identifiers for resources under their control.  This
> was the major point of discussion of the NVO telecon.  
> 
> The ID proposal intended that the AuthorityID (which would typically look
> like a DNS name) would be strictly associated with a standard registry
> interface.  In my mind, this was simply a mechanism to help ensure that
> IDs in total are globally unique: once a registry's AuthorityID is
> determined unique, the registry need only ensure that all its ResourceKeys
> are locally unique.  Thus, the AuthorityID establishes a namespace that
> the registry ultimately controls.  The data provider therefore does not
> control the namespace *unless* they decide to run their own registry.  It
> was assumed that most providers would run their own, so this restriction
> would only affect a few (?) smaller providers.  
> 
> This is where the URL analogy breaks down.  A URL assumes 
> that there is a service running on the machine with the DNS 
> name matching the URL's host-id component.  The intention of 
> VO ID proposal was similar but a bit more vague: there would 
> be a registry interface running on the registry machine that 
> given the ID could return a resource description.  However, 
> that interface is not yet defined, and it was not determined 
> if the ID should be automatically convertible to a service 
> interface URL/handle.  
> 
> Critics of the proposal suggested that the choice of an AuthorityID,
> which establishes a namespace, should be controlled by the registrant.
> This would allow organizations to have complete control over their own
> namespace without having to implement any standard registry service.  If
> we have full registries that really do contain all registered resources,
> then we do not need the AuthorityID to be tied to the registry where the
> resource was first registered.  
> 
> It is worth noting that regardless of who controls the 
> AuthorityID, introducing a new one will always require that 
> it be checked against a VO-wide registry of namespaces to 
> determine if it has been used before.  Thus, revising the 
> proposal to tie the AuthorityID to an organization does not 
> change how we determine if the AuthorityID is already in use.  
> It is harder, though, to ensure that the "owner" of the 
> namespace retains sole control over its use:  if a publisher 
> registers some resources in a namespace with one registry and 
> some with another, both registries need to know that the 
> publisher truly "owns" the namespace it is attempting to refer 
> to.  It can be done (e.g. with grid-based certificates).  
> 
> The fundamental question, though, is: does the ID 
> specification need to be locked into the registry 
> infrastructure?  At most, all the ID framework needs is a way 
> to determine who owns an AuthorityID.  If the standard does 
> not lock IDs into the registry infrastructure, then we can 
> potentially allow a number of implementations--either 
> simultaneously or as a sequence that evolves over time--that 
> encourage or enforce ID uniqueness.  This could include an 
> implementation that requires the publisher run a particular 
> registry service.
> 
>   Q: Who controls the components of an ID?
>   Original A:  the registry
>   Revised A:  the registering data provider.  Thus, the AuthorityID no 
>     longer implies the existence of any registry service specific to the 
>     AuthorityID.  The specification merely requires that AuthorityIDs
>     are uniquely associated with organizations (or individuals) that
>     own them. 
> 
> ------------------------------------------------------------------------
> 3.  How do IDs address Transience and Replication? 
> ------------------------------------------------------------------------
> 
> This issue, as Arnold points out, touches on the need for 
> having persistent names that can refer to a resource in 
> perpetuity even when support of the resource changes over 
> time or is replicated across multiple locations (See 
> http://archives.us-vo.org/metadata/0762.html).  
> This is exactly what a URN (a type of URI) does.  
> 
> In my mind, the ID proposal does *not* address the use case 
> Arnold described; that is, VO identifiers are not URNs.  In 
> particular, if a data collection is mirrored at two different 
> locations and thus accessible through interfaces with 
> different URLs/handles, then the two mirrors are considered 
> distinct and therefore have different resource identifiers.  
> VO identifiers are tied to an organization that maintains the 
> resource they identify via the AuthorityID.  If the access to 
> a resource moves to a different machine, its ID need not 
> change; the resource description it points to can be updated 
> to the new location.  However, if curation is transferred to a 
> new organization, the ID cannot persist unless ownership of 
> the original namespace is transferred in whole as well.  
> 
> A URN scheme is certainly needed; however, we also need a way 
> of distinguishing mirrors.  Thus, VO identifiers should not be URNs.  
> 
> A URN system will necessarily need to build on top of both 
> the ID standard and the registry interfaces.  In 
> particular (as Arnold explains), registries should be able to 
> map a URN to a set of matching identifiers that are mirrors 
> of the same resource. 
> 
>  Q: How do IDs address the problem of transience and replication?  
>  A: They do not.  Replicated resources have different IDs; one must
>     consult the resources' metadata to know that they are mirrors.
>     IDs may persist when the resource moves around within its own
>     namespace; however, they cannot persist when the resource is
>     curated by a new organization with a different namespace.
> 
>  Q: Do they need to do more?
>  A: No.  A URN system should be built on top of the ID standard and
>     registry interfaces.  
> 
> ---------------------------------------------------------------------
> In conclusion, I am recommending that an AuthorityID be "owned" and 
> controlled by a registering organization, but that the mechanism for 
> encouraging or enforcing that control not be part of the ID 
> specification.  
> 
> My apologies for the length of this installment, but I hope it will help
> focus our discussion. 
> 
> cheers,
> Ray
> 



