On ID "sameness"

Wed Feb 5 00:48:30 PST 2003

Hi,

We still have this issue of "sameness": that is, when should two
instances of an object be considered the same, and thus be referred to
by the same identifier?  Reagan brought up the concept of "semantic
copies", copies that are semantically the same but might have a
different byte representation.  Tom indicated that what might be
considered semantically equivalent might depend on the context; he
suggested that we should leave it up to the user to decide when
objects are the same rather than locking it in up front.

Drawing on ideas presented earlier by others, I'd like to recommend
the following principles on defining "sameness".  They draw on the
requirements discussed in my previous message.

1. Two identifiers refer to the same thing when the identifiers are
   character-for-character identical.

2. Two local IDs are identical only when the context of the ID is the
   same.  Global IDs lock in a specific context, and thus can be
   compared in an absolute sense.

3. A description of a resource, service, or data collection might
   reference identifiers associated with various aspects of the
   subject.  Examples might include:
     *  observation ID
     *  "derived from" or "mirror of" ID
     *  parent collection ID

4. Whether two instances of an object can be considered the same is up
   to the curating resource and will depend on the object being
   identified.  Curators should consider the following
   recommendations:
     * Two resources can be given the same ID if their
       descriptions are identical apart from the access point.
     * Two services can be given the same ID if:
         o  their descriptions are identical (including the interface
            inputs and outputs) apart from the access point.
         o  their implementations are identical, or otherwise return
            byte-for-byte identical output for any given set of inputs.
     * Two data collections (i.e. anything that is
       byte-instantiatable) can be given the same ID if they are
       byte-for-byte identical.
   It may be necessary to establish rules or conventions that control
   who is allowed to declare a "mirror" of something.  

5. Because an identifier can be assigned to a variety of things, be
   they abstract/virtual (e.g. resource IDs, observation IDs) or real
   and byte-instantiatable (e.g. collection IDs), services that
   specify an ID as part of their input or output should be very clear
   as to what the ID refers to (e.g. an image, a table row, an
   observation from which a data item is derived, etc.).

   In particular, with respect to data in a VOTable, a service should
   maximize the reader's options for determining whether two rows are
   effectively the same for a particular purpose.  This will likely
   require definitions of various ID UCDs.

   One useful UCD might be one for a "semantic identifier", one that
   refers to another data item that, in the eyes of the writer, can
   be considered equivalent to the item being described.  This could
   be used by the SIA to group images that differ primarily in
   format.  
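
As a rough illustration of principles 1, 2, and 4 above, here is a
minimal Python sketch of the comparisons involved.  Everything in it
is an assumption made for illustration: the Identifier class, the
convention that context=None marks a global ID, the "access_point"
key, and the example ID values are all made up and not taken from any
agreed spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical ID record; context=None marks a global ID (assumption)."""
    value: str
    context: Optional[str] = None


def same_thing(a: Identifier, b: Identifier) -> bool:
    """Principles 1 and 2: identical characters, compared in the same context."""
    return a.value == b.value and a.context == b.context


def descriptions_match(desc_a: dict, desc_b: dict,
                       ignore: tuple = ("access_point",)) -> bool:
    """Principle 4, resources: descriptions identical apart from the access point."""
    def strip(desc: dict) -> dict:
        # Drop the keys that are allowed to differ (hypothetical key name).
        return {k: v for k, v in desc.items() if k not in ignore}
    return strip(desc_a) == strip(desc_b)


def collections_identical(data_a: bytes, data_b: bytes) -> bool:
    """Principle 4, data collections: byte-for-byte identical."""
    return data_a == data_b


# Made-up example values, for illustration only.
local_a = Identifier("obs-0042", context="archive-A")
local_b = Identifier("obs-0042", context="archive-B")
print(same_thing(local_a, local_b))  # False: same characters, different context
```

Note that the comparison is deliberately strict: it does no case
folding or whitespace trimming, since principle 1 asks for a
character-for-character match.  Looser, "semantic" equivalence would be
left to the reader, as suggested in principle 5.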

hope this helps,
Ray