Resource Identifiers: discussion synthesis
Wil O'Mullane
womullan at skysrv.pha.jhu.edu
Tue Jun 3 16:42:42 PDT 2003
I think I buy all that...
wil
On Tue, Jun 03, 2003 at 04:51:52PM -0500, Ray Plante wrote:
> Hi all,
>
> Thanks to everyone who provided comments on the ID proposal. As I've
> mentioned, I felt it was in need of wider discussion. And while the
> discussion may seem to have gotten a bit chaotic, I think we can distill
> some cogent issues. From my reading, I believe the comments that have
> been raised address the following questions:
>
> 1. Should IDs carry any semantic information?
>
> 2. Who chooses/controls what is contained in the components of a
> specific ID? We specifically discussed who chooses the AuthorityID.
>
> 3. How do IDs address the problem of transience and replication? Do
> they need to do more?
>
> In this message I will address each in turn. While it is true that these
> are interrelated, I think conflating these makes it harder to make
> progress. I encourage others to separate these as much as possible.
>
> Here's my punchline ahead of time: I think all of the concerns raised can
> be addressed with a minor adjustment to the original proposal regarding
> who controls AuthorityIDs.
>
> --------------------------------------------------------------------------
> 1. Should IDs carry any semantic information?
> --------------------------------------------------------------------------
>
> Personally, I'm finding the myriad analogies to the book industry, email,
> and stock tickers of limited value as they too quickly veer off the mark
> and confuse the issue. The best analogy for a VO identifier (which is so
> close, it could cease to be an analogy) is one we all understand: URLs.
> Do URLs carry semantic content? Sure they do: from a URL, we can often
> deduce all sorts of things about what it points to. Is there a standard
> for how semantic meaning is encoded? Absolutely not. Do machines
> universally rely on interpreting the semantic content? No. (In general,
> the programs that do "micro-parse" the URLs are necessarily controlled by
> the same people that control the content.) The ID proposal intends no
> more than this.
>
> Whether or not the characters of a URL contain anything meaningful to
> anyone does not affect the ability of browsers and servers to talk to each
> other.
> Nevertheless, I think we can say that it is incredibly helpful that we can
> put little messages into them that help humans remember them, copy them
> without error, and debug the systems that use them. This brings up a
> related question: are URLs intended for human consumption? The answer is,
> no, normally not. When hidden behind highlighted text, they can usually
> be ignored. Nevertheless, humans do occasionally handle them directly.
>
> Adopting a URI-based identifier provides this same flexibility in a
> manner that people are already used to from URLs. (The XML version
> is equivalent in composition; however, the parseable components are tagged
> individually to allow easier handling through XML parsers.) Where the URL
> analogy *potentially* breaks down is addressed in the next section.
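A minimal sketch of the composition described above, assuming an identifier of the form `ivo://<AuthorityID>/<ResourceKey>`. The `ivo` scheme name, the function names, and the example identifier are illustrative assumptions, not part of the proposal. In keeping with the view that the standard should not rely on semantic content, the parser interprets only the AuthorityID and treats the ResourceKey as an opaque string.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

SCHEME = "ivo"  # hypothetical scheme name; not fixed by the proposal


@dataclass(frozen=True)
class VOIdentifier:
    authority_id: str  # namespace, typically DNS-like
    resource_key: str  # opaque; unique only within its AuthorityID


def parse_identifier(uri: str) -> VOIdentifier:
    """Split a URI-style identifier into its AuthorityID and ResourceKey.

    No meaning is inferred from the ResourceKey; it is returned as the
    URI path minus its leading slash.
    """
    parts = urlsplit(uri)
    if parts.scheme != SCHEME:
        raise ValueError(f"expected scheme {SCHEME!r}, got {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("missing AuthorityID")
    return VOIdentifier(parts.netloc, parts.path.lstrip("/"))


def to_uri(ident: VOIdentifier) -> str:
    """Reassemble the URI form from its two components."""
    return f"{SCHEME}://{ident.authority_id}/{ident.resource_key}"


ident = parse_identifier("ivo://example.org/surveys/deep-field")
print(ident.authority_id, ident.resource_key)  # example.org surveys/deep-field
```

Any meaning a provider embeds in the ResourceKey ("surveys/deep-field") stays available to humans, but nothing in the code depends on it.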
>
> Q: Should IDs carry any semantic information?
> A: They can, but they are not required to. More precisely, the ID
> standard and its use in standard registry interfaces should not rely
> on it.
>
> Is this acceptable?
>
> ---------------------------------------------------------------------------
> 2. Who controls the components of an ID?
> ---------------------------------------------------------------------------
>
> Back in February, the NVO project generated a set of requirements for IDs;
> one of them stated that the framework should maximize the freedom of data
> providers to choose identifiers for resources under their control. This
> was the major point of discussion at the NVO telecon.
>
> The ID proposal intended that the AuthorityID (which would typically look
> like a DNS name) would be strictly associated with a standard registry
> interface. In my mind, this was simply a mechanism to help ensure that
> IDs in total are globally unique: once a registry's AuthorityID is
> determined unique, the registry need only ensure that all its ResourceKeys
> are locally unique. The AuthorityID thus establishes a namespace that
> the registry ultimately controls; consequently, the data provider does not
> control the namespace *unless* they decide to run their own registry. It
> was assumed that most providers would run their own, so this restriction
> would only affect a few (?) smaller providers.
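To make the uniqueness argument concrete, here is a sketch of the original arrangement, in which each registry owns exactly one AuthorityID and checks only its own ResourceKeys. The `Registry` class and the `ivo://` form are hypothetical; the point is that a globally unique AuthorityID combined with a locally unique ResourceKey yields a globally unique identifier.

```python
class Registry:
    """Original proposal (sketch): a registry owns one AuthorityID and
    need only keep its own ResourceKeys locally unique."""

    def __init__(self, authority_id: str):
        self.authority_id = authority_id  # assumed already checked VO-wide
        self._keys = set()  # ResourceKeys issued so far

    def register(self, resource_key: str) -> str:
        # Only a local check is needed; the AuthorityID supplies global scope.
        if resource_key in self._keys:
            raise KeyError(f"{resource_key!r} already used in {self.authority_id}")
        self._keys.add(resource_key)
        return f"ivo://{self.authority_id}/{resource_key}"


a = Registry("example.org")
b = Registry("example.edu")
print(a.register("catalog"))  # ivo://example.org/catalog
print(b.register("catalog"))  # ivo://example.edu/catalog
```

Both registries issue the key `catalog` without collision because their AuthorityIDs differ.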
>
> This is where the URL analogy breaks down. A URL assumes that there is a
> service running on the machine with the DNS name matching the URL's
> host-id component. The intention of the VO ID proposal was similar but a bit
> more vague: there would be a registry interface running on the registry
> machine that given the ID could return a resource description. However,
> that interface is not yet defined, and it was not determined if the ID
> should be automatically convertible to a service interface URL/handle.
>
> Critics of the proposal suggested that the choice of an AuthorityID,
> which establishes a namespace, should be controlled by the registrant.
> This would allow organizations to have complete control over their own
> namespace without having to implement any standard registry service. If
> we have full registries that really do contain all registered resources,
> then we do not need the AuthorityID to be tied to the registry where the
> resource was first registered.
>
> It is worth noting that regardless of who controls the AuthorityID,
> introducing a new one will always require that it be checked against a
> VO-wide registry of namespaces to determine if it has been used
> before. Thus, revising the proposal to tie the AuthorityID to an
> organization does not change how we determine if the AuthorityID is
> already in use. It is harder, though, to ensure that the "owner" of
> the namespace retains sole control over its use: if a publisher
> registers some resources in a namespace with one registry and some
> with another, both registries need to know that the publisher truly
> "owns" the namespace it is attempting to refer to. It can be done
> (e.g. with grid-based certificates).
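Under the revised arrangement, the two checks described above might be sketched as follows: one VO-wide list of namespaces recording who owns each AuthorityID, and any number of publishing registries that consult it before accepting a resource. All class and method names are hypothetical, and comparing owner names merely stands in for a real proof of ownership such as the grid-based certificates mentioned above.

```python
class NamespaceRegistry:
    """Hypothetical VO-wide record of which organization owns each AuthorityID."""

    def __init__(self):
        self._owners = {}  # AuthorityID -> owner

    def claim(self, authority_id: str, owner: str) -> None:
        # A new AuthorityID is checked against every one used before.
        current = self._owners.get(authority_id)
        if current is not None and current != owner:
            raise PermissionError(f"{authority_id!r} is owned by {current!r}")
        self._owners[authority_id] = owner

    def owns(self, authority_id: str, owner: str) -> bool:
        return self._owners.get(authority_id) == owner


class PublishingRegistry:
    """Any registry may accept resources in a namespace once it has
    confirmed that the publisher owns that namespace."""

    def __init__(self, namespaces: NamespaceRegistry):
        self.namespaces = namespaces
        self.records = {}  # identifier -> publisher

    def publish(self, publisher: str, authority_id: str, resource_key: str) -> str:
        # Name comparison stands in for a real ownership proof (e.g. a certificate).
        if not self.namespaces.owns(authority_id, publisher):
            raise PermissionError(f"{publisher!r} does not own {authority_id!r}")
        ident = f"ivo://{authority_id}/{resource_key}"
        self.records[ident] = publisher
        return ident


ns = NamespaceRegistry()
ns.claim("example.org", "Example Observatory")
first, second = PublishingRegistry(ns), PublishingRegistry(ns)
first.publish("Example Observatory", "example.org", "images")
second.publish("Example Observatory", "example.org", "spectra")
```

This sketch does not stop the owner from reusing a ResourceKey across two registries; keeping keys unique within its namespace remains the owner's responsibility.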
>
> The fundamental question, though, is: does the ID specification need
> to be locked into the registry infrastructure? All the ID framework
> really needs is a way to determine who owns an AuthorityID. If the
> standard does not lock IDs into the registry infrastructure, then we
> can potentially allow a number of implementations--either
> simultaneously or in a sequence that evolves over time--that encourage or
> enforce ID uniqueness. This could include an implementation that
> requires the publisher to run a particular registry service.
>
> Q: Who controls the components of an ID?
> Original A: the registry
> Revised A: the registering data provider. Thus, the AuthorityID no
> longer implies the existence of any registry service specific to the
> AuthorityID. The specification merely requires that AuthorityIDs
> are uniquely associated with organizations (or individuals) that
> own them.
>
> ------------------------------------------------------------------------
> 3. How do IDs address Transience and Replication?
> ------------------------------------------------------------------------
>
> This issue, as Arnold points out, touches on the need for having
> persistent names that can refer to a resource in perpetuity even when
> support of the resource changes over time or is replicated across
> multiple locations (See http://archives.us-vo.org/metadata/0762.html).
> This is exactly what a URN (a type of URI) does.
>
> In my mind, the ID proposal does *not* address the use case Arnold
> described; that is, VO identifiers are not URNs. In particular, if a
> data collection is mirrored at two different locations and thus
> accessible through interfaces with different URLs/handles, then the
> two mirrors are considered distinct and therefore have different
> resource identifiers. VO identifiers are tied to an organization
> that maintains the resource they identify via the AuthorityID. If
> the access to a resource moves to a different machine, its ID need not
> change; the resource description it points to can be updated to the
> new location. However, if curation is transferred to a new
> organization, the ID cannot persist unless ownership of the original
> namespace is transferred in whole as well.
>
> A URN scheme is certainly needed; however, we also need a way of
> distinguishing mirrors. Thus, VO identifiers should not be URNs.
>
> A URN system will necessarily need to build on top of both the ID
> standard as well as registry interfaces. In particular (as Arnold
> explains), registries should be able to map a URN to a set of matching
> identifiers that are mirrors of the same resource.
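As a rough illustration of that layering, the sketch below keeps a mapping from one persistent URN to the distinct identifiers of each mirror. The `URNResolver` class, the in-memory dictionary standing in for a registry interface, and the example URN and identifiers are all assumptions for illustration.

```python
from collections import defaultdict


class URNResolver:
    """Hypothetical layer above VO identifiers: one persistent URN maps to
    the distinct identifiers of every known mirror."""

    def __init__(self):
        self._mirrors = defaultdict(set)  # URN -> set of identifiers

    def add_mirror(self, urn: str, identifier: str) -> None:
        self._mirrors[urn].add(identifier)

    def resolve(self, urn: str) -> list:
        # .get() avoids creating an empty entry for unknown URNs.
        return sorted(self._mirrors.get(urn, ()))


resolver = URNResolver()
resolver.add_mirror("urn:example:survey-x", "ivo://mirror-a.example/survey-x")
resolver.add_mirror("urn:example:survey-x", "ivo://mirror-b.example/survey-x")
print(resolver.resolve("urn:example:survey-x"))
```

The identifiers themselves stay distinct, so a client can still tell the mirrors apart; only the URN layer groups them.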
>
> Q: How do IDs address the problem of transience and replication?
> A: They do not. Replicated resources have different IDs; one must
> consult the resources' metadata to know that they are mirrors.
> IDs may persist when the resource moves around within its own
> namespace; however, they cannot persist when the resource is
> curated by a new organization with a different namespace.
>
> Q: Do they need to do more?
> A: No. A URN system should be built on top of the ID standard and
> registry interfaces.
>
> ---------------------------------------------------------------------
> In conclusion, I am recommending that an AuthorityID be "owned" and
> controlled by a registering organization, but that the mechanism for
> encouraging or enforcing that control not be part of the ID specification.
>
> My apologies for the length of this installment, but I hope it will help
> focus our discussion.
>
> cheers,
> Ray
More information about the registry mailing list