IDs and ID services

Mon Feb 3 13:31:10 PST 2003

Hi Ray,

In general I think I agree with what you say below.
One thing that perhaps I should emphasize is that
what I'm calling a key is probably a more primitive
object than what you are calling an ID.  Keys can
be used to build IDs but can also be used in other contexts.

A key is no more than something which identifies an element
of a set.  One can build more complex structures out of this.
E.g., one might consider a SetTaggedKey which is a concatenation
of two keys: the key defining the set from which the key
was derived, and the individual tag within the set.
This would be the kind of self describing ID that you talk
of below (e.g., "//cas.le.ac.uk/TonyLinde").  But if I have 10^9
sources in my table, I'm unlikely to want to repeat the 40
byte VO table identifier all 10^9 times.  At least at the lowest
level I'll store the set name as metadata for the table and just the
individual identifier inside the table.
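
As a rough sketch of that layout (Python; the names SetTaggedKey,
Table and full_key are made up for illustration and are not a
proposed interface), the set key is stored once as table metadata
and the self-describing key is only assembled on demand:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SetTaggedKey:
    """Hypothetical self-describing key: the set's key plus the tag within it."""
    set_key: str  # identifies the set, e.g. "//cas.le.ac.uk"
    tag: str      # identifies the element within that set

    def __str__(self) -> str:
        return f"{self.set_key}/{self.tag}"


@dataclass
class Table:
    """Hypothetical table that stores the set key once, as metadata."""
    set_key: str
    rows: Dict[str, dict] = field(default_factory=dict)  # keyed by bare tag

    def add(self, tag: str, row: dict) -> None:
        self.rows[tag] = row

    def full_key(self, tag: str) -> SetTaggedKey:
        # Build the self-describing key on demand instead of storing it per row.
        if tag not in self.rows:
            raise KeyError(tag)
        return SetTaggedKey(self.set_key, tag)


table = Table(set_key="//cas.le.ac.uk")
table.add("TonyLinde", {"note": "example row"})
key = table.full_key("TonyLinde")
print(key)  # //cas.le.ac.uk/TonyLinde
```

Two SetTaggedKeys compare equal only when both the set key and the
tag match, while the rows themselves carry nothing but the tag.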

This is a simple illustration of why I worry about defining
the concepts of identity too broadly.  Every context will
have different requirements that may mandate exactly what is
in the 'ID'.  I think we may want substantially more
detailed requirements on IDs in the context of data files and
observations than we necessarily do in terms of catalog table
rows.  E.g., I'm not sure that the issues regarding image formats
really have a counterpart for a catalog.

The discussions in these arenas may be clarified if we think in
terms of keys though.  E.g., we might mandate that data archives
return data indexed with observation SetTaggedKeys and that the format
of the data is given as a separate key.  It would be up to the
user to decide whether they wished to consider the concatenation
of the observation and format as the fundamental key, or just
the observation itself.  We provide the user with sufficient
information to enable them to decide sameness; we don't make
that decision prematurely.  I'm not suggesting this particular
approach, just trying to show how the abstract keys can help us define
the registries.
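
To make that concrete, here is a small sketch (Python again; the
ArchiveResult record, the two comparison functions and the
"//archive.example/obs42" key are all hypothetical).  The archive
returns the observation key and the format key separately, and the
user picks which notion of sameness applies:

```python
from typing import NamedTuple


class ArchiveResult(NamedTuple):
    """Hypothetical record returned by a data archive."""
    observation: str  # stands in for an observation SetTaggedKey
    data_format: str  # separate key describing the data format


def same_observation(a: ArchiveResult, b: ArchiveResult) -> bool:
    # The user treats the observation alone as the fundamental key.
    return a.observation == b.observation


def same_file(a: ArchiveResult, b: ArchiveResult) -> bool:
    # The user treats observation + format as the fundamental key.
    return (a.observation, a.data_format) == (b.observation, b.data_format)


fits = ArchiveResult("//archive.example/obs42", "FITS")
jpeg = ArchiveResult("//archive.example/obs42", "JPEG")

print(same_observation(fits, jpeg))  # True
print(same_file(fits, jpeg))         # False
```

Neither comparison is built into the keys themselves; the archive
only supplies enough information for either choice.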

Also, note that the keys aren't necessarily real software objects,
though I've tended to treat them as such.  We may simply have agreements
indicating UCDs for keys or some way of recognizing them.  The methods
I've ascribed to keys would then be implicit.

	Tom

Ray Plante wrote:
> I'm happy to see the international discussion picking up.  This message is 
> mainly about identifiers; however, I wanted to preface it with some 
> context about the discussion initiated within the NVO Metadata Working 
> Group.  While registries and the use of IDs are interdependent, I have 
> been trying to talk about them separately as we define the scope of the 
> problem we want to address (and hopefully bring some order to what 
> otherwise might be a random walk).  
> 
> On the ID side, I assembled some requirements for IDs from the many
> comments from the discussion 
> (http://rai.ncsa.uiuc.edu/~rplante/VO/metadata/oidreq.txt).  This 
> should be looked at like a menu--pick the things you agree with.  Better yet: 
> indicate the things you don't agree with.  Requirements are generally 
> considered a good first step to design as it defines the scope of the 
> problem you are trying to solve.  As Andy implied, the requirements for 
> IVOA as a whole might be looser than what we specify for NVO.
> 
> This week, Tom shared some thoughtful comments about how we might
> approach IDs.  (He brought up the term "keys", by analogy to DBs; however,
> the term is synonymous.)  This message is an attempt to enunciate how his
> framework maps onto the requirements list--I hope Tom can comment on
> whether I got it right.  The main conclusion is that IDs are only used
> to determine "sameness"; inferring ownership or derivation directly from IDs
> is not to be required.
> 
>   1. Single Framework:  Okay 
> 
>   2. Determine "sameness" from 2 IDs:  Okay for the most part
> 	*  direct determination only possible when IDs have been issued by 
>            the same ID service.
>         *  cross-ID service comparison requires translation.  
> 	*  because data mirrors can reuse identifiers (req. 7), it is 
> 	   expected that the need for translation will be rare and
> 	   practically unnecessary.
> 
>   3. Avoiding needless duplication:  Okay (see notes above).
> 
>   4. Identifying different formats of same object:  Unspecified (No?)
>   
>   5. Identifying curator:  No
>         *  The ID is not necessarily issued by the curator.  However,
> 	   it is possible to retrieve this information by using the ID
> 	   to access the item's description.
> 
>      This might suggest a different requirement (which I've added):
> 
>        5.b.  It should be easy to identify the resource having issued the
> 	     identifier.
> 
>      This would allow one to get more information about the item
>      identified.  Note, however, that Tom's outline does not imply
>      this requirement (since one must query an ID service to determine
>      if it includes a particular ID).
> 
>      Another requirement I might add which was part of the registry
>      requirements:
> 
>        5.c  It should be possible to use the ID to access a unique
>             description of the item it identifies.
> 
>      That is, for any information you can't glean from an ID, you can
>      get from a description accessed with the ID.
> 
>   6. Collection membership:  No
> 
>   7. Reuse of IDs by mirrors:  Yes
> 
>   8. Derivation:  No
> 
>   9. Provider control over assignment:  Yes
>         *  Provider can run their own ID service.
> 
> Personally, I think this approach is about right in its scope.
> However, I think it would be useful if, when all you had was an ID, you
> could infer which service you could invoke to get more information
> about it.  As an example, if you have an ID, "//cas.le.ac.uk/TonyLinde", 
> you could go to "http://cas.le.ac.uk/getDescription?//cas.le.ac.uk/TonyLinde"
> and discover what this TonyLinde is and who takes care of it.  I also 
> think that we need to decide whether (for example) the same images in 
> different formats (e.g. as returned from an SIA image query) should have 
> the same ID.  
> 
> cheers,
> Ray