On ID "sameness"

Wed Feb 5 07:50:07 PST 2003

Hi Tony,

On Wed, 5 Feb 2003, Tony Linde wrote:
> I still prefer an ID for any resource to be unique. I was thinking more
> along the lines of a unique ID and some other field, like Reagan's
> 'semantic ID' being used for making two resources the 'same' (needs some
> sort of naming standards). Then you have other fields to identify other
> relationships. That said, I guess if we make the ID a semantic one, we
> could have another 'Unique ID' field to uniquely identify a specific
> resource.

So, are you saying that we would need to make a distinction between a 
'semantic id' and an 'instance id', where the latter points to a 
particular copy?

If so, I think this is largely consistent with my suggestions (or at least
we can adjust the suggestions accordingly).  In my mind, a 'semantic id'
should be considered a piece of metadata attached to something identified
by an 'instance id': the semantic id says that this instance is
semantically the same as another object.

A semantic id, however, should probably not be a static or permanent 
characteristic of an object since (as Tom pointed out) whether two objects
count as the same may depend on the application.  Rather, it's an
association made within a particular application.  As an example, the SIA
could specify that one of the columns in the VOTable returned by an image
query be a semantic ID that is the same for all images (i.e. rows) that
differ only in their format (and access URL).  This ID could be equivalent
to the instance ID of one of the images in the group.
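To make the SIA example concrete, here is a minimal sketch of how a
service might fill in such a column.  The column names (instance_id,
target, band, format, access_url, semantic_id) and the ID strings are
hypothetical illustrations, not part of any SIA specification; the only
idea taken from above is that rows differing only in format and access URL
share a semantic ID, which reuses the instance ID of one row in the group.

```python
from collections import defaultdict

# Columns ignored when deciding whether two rows are "the same" image.
# These column names are hypothetical, chosen only for illustration.
IGNORED = ("instance_id", "semantic_id", "format", "access_url")

def assign_semantic_ids(rows):
    """Give each row a 'semantic_id' shared by all rows that differ only
    in the ignored columns; it reuses the first such row's instance ID."""
    first_seen = {}
    for row in rows:
        # Key built from every column that is NOT ignored.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in IGNORED))
        first_seen.setdefault(key, row["instance_id"])
        row["semantic_id"] = first_seen[key]
    return rows

def group_by_semantic_id(rows):
    """Map each semantic ID to the instance IDs of the rows in its group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["semantic_id"]].append(row["instance_id"])
    return dict(groups)

# Hypothetical query result: one R-band image in two formats, plus a B-band image.
rows = [
    {"instance_id": "example.org/m51-r/fits", "target": "M51", "band": "R",
     "format": "image/fits", "access_url": "http://example.org/m51r.fits"},
    {"instance_id": "example.org/m51-r/jpeg", "target": "M51", "band": "R",
     "format": "image/jpeg", "access_url": "http://example.org/m51r.jpg"},
    {"instance_id": "example.org/m51-b/fits", "target": "M51", "band": "B",
     "format": "image/fits", "access_url": "http://example.org/m51b.fits"},
]

groups = group_by_semantic_id(assign_semantic_ids(rows))
```

Here the FITS and JPEG versions of the R-band image end up in one group
keyed by "example.org/m51-r/fits", while the B-band image forms its own
group.  Note that the sameness rule lives in the service (the IGNORED
list), not in the images themselves, which is the point about semantic IDs
being application-specific.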

One possible inconsistency is in my point 4, in which I suggest that a 
mirror copy of something can have the same ID.  This would prevent 
an application from distinguishing between copies of a dataset, say, 
in the derivation history of a derived product.  Perhaps this should be 
handled in an analogous way: copies have different instance IDs but can 
share the same 'reference ID'.  A recommendation for forming an instance
ID from a reference ID might be appropriate.  Do you think this is the
preferred way to go?  
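As an illustration only, one conceivable recommendation would be to form
an instance ID by appending a copy label to the reference ID after a '#'
separator.  The '#' convention and the function names below are
hypothetical, not an agreed scheme; the sketch just shows that such a rule
lets copies keep distinct instance IDs while any application can still
recover the shared reference ID.

```python
def make_instance_id(reference_id: str, copy_label: str) -> str:
    """Hypothetical convention: instance ID = reference ID + '#' + copy label."""
    if "#" in reference_id:
        raise ValueError("reference ID must not contain '#'")
    return f"{reference_id}#{copy_label}"

def reference_id_of(instance_id: str) -> str:
    """Recover the reference ID by dropping everything from the first '#'."""
    return instance_id.split("#", 1)[0]

def same_reference(a: str, b: str) -> bool:
    """True if two instance IDs denote copies of the same reference object."""
    return reference_id_of(a) == reference_id_of(b)

# An original and a mirror copy of the same (hypothetical) dataset.
original = make_instance_id("example.org/survey/field42", "primary")
mirror = make_instance_id("example.org/survey/field42", "mirror1")
```

A derivation history can then record exactly which copy was used
(original vs. mirror differ), while a query for "the same dataset" compares
reference IDs via same_reference().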

On the con side is the case of dynamically generated data (e.g.
mosaics), which may differ over time for a given set of inputs
because the algorithm changes.  In this case, an instance ID may not be
of much use.  One possible solution is to say that instance IDs should not
be required in all cases; in particular, standard services (e.g. SIA)
should avoid requiring them.  Or we do away with the concept of instance
ID altogether: if you want to describe the copy you have, you consult the
metadata describing its origin.  
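A minimal sketch of the "optional instance ID" alternative, assuming a
hypothetical record type (DatasetRecord) with a free-form origin field
standing in for whatever provenance metadata we might define.  When an
instance ID exists it identifies the copy; when it does not (e.g. an
on-the-fly mosaic), the description falls back to the origin metadata.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetRecord:
    reference_id: str
    # Optional: may be absent for dynamically generated data such as mosaics.
    instance_id: Optional[str] = None
    # Hypothetical provenance metadata (algorithm version, creation date, ...).
    origin: dict = field(default_factory=dict)

def describe_copy(rec: DatasetRecord) -> str:
    """Identify the copy by instance ID if present, else by origin metadata."""
    if rec.instance_id is not None:
        return rec.instance_id
    details = ", ".join(f"{k}={v}" for k, v in sorted(rec.origin.items()))
    return f"{rec.reference_id} [{details}]"

stored = DatasetRecord("example.org/survey/field42",
                       instance_id="example.org/survey/field42#primary")
mosaic = DatasetRecord("example.org/mosaic/m51",
                       origin={"algorithm": "v2", "generated": "2003-02-05"})
```

For the stored image this returns its instance ID; for the mosaic it
returns "example.org/mosaic/m51 [algorithm=v2, generated=2003-02-05]".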

thoughts?
Ray