On ID "sameness"
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Wed Feb 5 00:48:30 PST 2003
Hi,
We still have this issue of "sameness": that is, when should two
instances of an object be considered the same, and thus be referred to by
the same identifier? Reagan brought up the concept of "semantic
copies", copies that are semantically the same but might have a
different byte-representation. Tom indicated that what might be considered
semantically equivalent might depend on the context; he suggested that
we should leave it up to the user to decide when objects are the same
rather than locking it in up front.
Drawing on ideas presented earlier by others, I'd like to recommend
the following principles on defining "sameness". They draw on the
requirements discussed in my previous message.
1. Two identifiers refer to the same thing when the identifiers are
character-for-character identical.
2. Two local IDs are identical only when the context of the ID is the
same. Global IDs lock in a specific context, and thus can be
compared in an absolute sense.
3. A description of a resource, service, or data collection might
reference identifiers associated with various aspects of the
subject. Examples might include:
* observation ID
* "derived from" or "mirror of" ID.
* parent collection ID
4. When two instances of an object can be considered the same is up to
the curating resource and will depend on the object being
identified. Curators should consider the following
recommendations:
* Two resources can be given the same ID if their
descriptions are identical apart from the access point.
* Two services can be given the same ID if:
o their descriptions are identical (including the interface
inputs and outputs) apart from the access point.
o the implementations are identical, or otherwise return
byte-for-byte identical output for any given set of inputs.
* Two data collections (i.e. anything that is
byte-instantiatable) can be given the same ID if they are
byte-for-byte identical.
It may be necessary to establish rules or conventions that control
who is allowed to declare a "mirror" of something.
5. Because an identifier can be assigned to a variety of things, be
they abstract/virtual (e.g. resource IDs, observation IDs) or real
byte-instantiatable (e.g. collection IDs), services that
specify an ID as part of input or output should be very clear as to
what the ID refers to (e.g. an image, a table row, an
observation from which a data item is derived, etc.).
In particular with respect to data in a VOTable, it should maximize
the reader's options for determining whether two rows are effectively
the same for a particular purpose. This will likely require
definitions of various ID UCDs.
One useful UCD might be one for a "semantic identifier", that
refers to another data item that, in the eyes of the writer, can
be considered equivalent to the item being described. This could
be used by the SIA to group images that differ primarily in
format.
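
To make principles 1 and 2 and the "semantic identifier" idea a bit more
concrete, here is a minimal Python sketch. Everything in it is an
assumption for illustration only: the Identifier class, the context
strings, the row dictionaries, and the "semantic_id" key are hypothetical
and do not come from any existing registry, VOTable, or SIA definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical ID model: a local ID plus the context that scopes it."""
    local_id: str
    context: Optional[str] = None  # None means the context is unknown

    def same_as(self, other: "Identifier") -> bool:
        # Principle 2: local IDs are only comparable within a known,
        # shared context.
        if self.context is None or other.context is None:
            return False
        # Principle 1: a character-for-character match is required.
        return self.context == other.context and self.local_id == other.local_id


def group_by_semantic_id(rows):
    """Group row IDs by the item their writer declared them equivalent to.

    A row without a semantic identifier forms (or anchors) its own group.
    Chains of references are not followed in this sketch.
    """
    groups = defaultdict(list)
    for row in rows:
        key = row.get("semantic_id") or row["id"]
        groups[key].append(row["id"])
    return dict(groups)


# Hypothetical image rows: the JPEG declares itself equivalent to the FITS.
rows = [
    {"id": "img-1-fits", "format": "fits", "semantic_id": None},
    {"id": "img-1-jpeg", "format": "jpeg", "semantic_id": "img-1-fits"},
    {"id": "img-2-fits", "format": "fits", "semantic_id": None},
]
groups = group_by_semantic_id(rows)
```

Here the reader, not the writer, decides whether to use the grouping: the
rows keep their own IDs, and the semantic identifier is only a hint that
two images differ primarily in format.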
hope this helps,
Ray