On ID "sameness"
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Wed Feb 5 07:50:07 PST 2003
Hi Tony,
On Wed, 5 Feb 2003, Tony Linde wrote:
> I still prefer an ID for any resource to be unique. I was thinking more
> along the lines of a unique ID and some other field, like Reagan's
> 'semantic ID' being used for making two resources the 'same' (needs some
> sort of naming standards). Then you have other fields to identify other
> relationships. That said, I guess if we make the ID a semantic one, we
> could have another 'Unique ID' field to uniquely identify a specific
> resource.
So, are you saying that we would need to make a distinction between a
'semantic id' and an 'instance id', where the latter points to a
particular copy?
If so, I think this is largely consistent with my suggestions (or at least
we can adjust the suggestions accordingly). In my mind, a 'semantic id'
should be considered a piece of metadata of something identified with an
'instance id': the former says, this instance is semantically the same as
another object.
A semantic id, however, should probably not be a static or permanent
characteristic of an object since (as Tom pointed out) this may depend on
the application. Rather, it's an association made within a particular
application. As an example, the SIA could specify that one of the columns
in the VOTable returned by an image query be a semantic ID that is the
same for all images (i.e. rows) that differ only in their format (and
access URL). This ID could be equivalent to the instance ID of one of the
images in the group.
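
To make the SIA example concrete, here is a minimal sketch in Python of how a client might use such a column. The field names (instance_id, semantic_id, format, access_url) and all ID values are assumptions made up for illustration; nothing here is defined by SIA. Note that each group's semantic ID is simply the instance ID of one of its members.

```python
from collections import defaultdict

# Hypothetical rows, as if parsed from the VOTable returned by an image query.
# Field names and ID values are illustrative assumptions, not SIA-defined.
rows = [
    {"instance_id": "img-001-fits", "semantic_id": "img-001-fits",
     "format": "image/fits", "access_url": "http://example.org/img-001.fits"},
    {"instance_id": "img-001-jpeg", "semantic_id": "img-001-fits",
     "format": "image/jpeg", "access_url": "http://example.org/img-001.jpg"},
    {"instance_id": "img-002-fits", "semantic_id": "img-002-fits",
     "format": "image/fits", "access_url": "http://example.org/img-002.fits"},
]


def group_by_semantic_id(rows):
    """Collect rows that the service declared semantically the same."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["semantic_id"]].append(row)
    return dict(groups)


groups = group_by_semantic_id(rows)
for semantic_id, members in groups.items():
    # Rows in one group differ only in format and access URL.
    print(semantic_id, [m["format"] for m in members])
```

The grouping lives entirely in the query response, which fits the idea that the association is made by the application rather than being a permanent property of each image.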
One possible inconsistency is in my point 4, in which I suggest that a
mirror copy of something can have the same ID. This would prevent
an application from making a distinction between copies of a dataset, say,
in the derivation history of a derived product. Perhaps this should be
handled in an analogous way: copies have different instance IDs but can
have a 'reference ID' that is the same. A recommendation for
forming an instance ID from a reference ID might be appropriate. Do you
think this is a preferred way to go?
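
As one illustration of what such a recommendation could look like, the sketch below forms an instance ID by appending a copy label to the reference ID. The "#" separator, the function names, and the example IDs are all hypothetical; this is only one possible convention, not a proposal drawn from any existing standard.

```python
SEPARATOR = "#"  # assumed convention for this sketch only


def make_instance_id(reference_id, copy_label):
    """Form an instance ID for one particular copy of a resource."""
    if SEPARATOR in copy_label:
        raise ValueError("copy label must not contain the separator")
    return f"{reference_id}{SEPARATOR}{copy_label}"


def reference_id_of(instance_id):
    """Recover the shared reference ID from an instance ID."""
    reference_id, sep, _ = instance_id.rpartition(SEPARATOR)
    # An ID without a separator is treated as its own reference ID.
    return reference_id if sep else instance_id


original = make_instance_id("survey-x/field-12", "origin")
mirror = make_instance_id("survey-x/field-12", "mirror-a")

# The copies stay distinguishable, yet share one reference ID.
print(original != mirror)
print(reference_id_of(original) == reference_id_of(mirror))
```

With a convention like this, a derivation history could record exactly which copy was used, while an application that only cares about content compares reference IDs.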
On the con side is the case for dynamically generated data (e.g.
mosaics) which may be different over time for a given set of inputs
because the algorithm changes. In this case, an instance ID may not be
much use. One possible solution is to say that instance IDs should not be
required in all cases; in particular, standard services (e.g. SIA) should
avoid requiring them. Or we do away with the concept of instance ID
altogether: if you want to describe the copy you have, you go to the
metadata describing origin.
Thoughts?
Ray