IDs and ID services (fwd)

Mon Feb 3 12:37:15 PST 2003

From the NVO metadata list...

---------- Forwarded message ----------
Date: Thu, 30 Jan 2003 13:01:45 -0500
From: Tom McGlynn <Thomas.A.McGlynn at nasa.gov>
To: metadata at us-vo.org
Subject: IDs and ID services

Here are some thoughts on IDs that I tried to say a little
bit about in the telecon.

Let's start abstractly...

An ID identifies an element of a set.

When we define a 'unique id' we are attempting to set up
a one-to-one correspondence between a set of id's and
a set of referents.

There are many possible sets for which we would
like IDs.  Indeed it is likely that we
want to have IDs for the sets that we are identifying.

If we become a little more concrete and start looking
at the Virtual Observatory, we see that we have many different
cases where we want to identify set elements: different archives,
different observations within a mission, different objects within
a table.  In many cases there is already an ID available within
the limited context of the resource being considered.  E.g., many
tables have a row ID field included, or the observation ID may be
well defined.

Within the VO we want to use IDs for three purposes: to distinguish,
to match and to show a relationship.

If two objects are different, then they should have different IDs.  E.g.,
the HEASARC WGACAT catalog has multiple observations of the same X-ray
objects.  Their target name is the same, but there is another unique identifier
which distinguishes these objects.

If two objects are the same in the context chosen, then they should
have the same ID.  If I am interested in whether two datasets are derived from
the same observation, then the observation ID should be used as the ID
even though the datasets may be quite different (e.g., one is a raw data
set and the other is a list of objects generated from running a source
extractor on the first).

We also use ID's to show relationships among objects.  We can do this either
by encoding ID's of the related objects, or by having ID fields that refer to other
objects.  E.g., the observation ID may encode information about the instrument
used, or we may have a field in the observation table that shows the instrument.
Both of these preserve a relationship between the set of 'observations' and the
set of instruments (for a given mission).

There is a lot of commonality between the ID concepts and the use of unique and
foreign keys in relational databases.

So rather than spend a lot of time talking about specific requirements on IDs and how
they might be used, I would suggest
that we define a "Key" object to be one of the kinds of objects that VO services
can talk about.  A Key object is quite simple.  Its only characteristic is
that two Key objects can be compared for equality.

In addition to Key objects we would define Key Services.  A key service
corresponds to a set for which we are defining keys.  In general we
may think of each key as belonging to some key service.

A Key service provides the following functionality:
   The service as a whole has a unique Key associated with it.
   It may be able to indicate whether a given Key is a member of the
      set of Keys known to this service.
   It may be able to provide a new Key for dynamic sets of objects.
   It may be able to provide a link to some other VO object
      associated with the key.
   Given a Key that is associated with some other service, and the key
      to that Key Service, it may be able to translate the key from the
      other service to this service.

In JavaSpeak we have:

      // A Key only needs to support comparison for equality.
      boolean match     = key.equals(someOtherKey);        // Required

      // Only getServiceKey() is required of a key service.
      Key serviceKey    = service.getServiceKey();         // Required
      boolean included  = service.includes(key);           // Optional
      Key newKey        = service.getNewKey();             // Optional
      Key translated    = service.translate(keyDefiningSomeOtherService,
                                            keyToSomeObjectUsingOtherKeyService); // Optional
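
One way these calls might be written down as compilable Java is sketched
below.  This is only an illustration under stated assumptions: the names
Key, KeyService, StringKey and CountingKeyService, and the choice of
Optional to signal an unsupported optional operation, are hypothetical and
not an agreed VO interface.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: all names and the use of Optional are assumptions.
class KeySketch {

    /** A Key only has to support comparison for equality. */
    interface Key {
        boolean equals(Object other);
    }

    /** A key service stands for one set of keys. */
    interface KeyService {
        Key getServiceKey();                                   // required

        // Optional operations return Optional.empty() when unsupported.
        default Optional<Boolean> includes(Key key) { return Optional.empty(); }
        default Optional<Key> getNewKey() { return Optional.empty(); }
        default Optional<Key> translate(Key otherServiceKey, Key keyFromOtherService) {
            return Optional.empty();
        }
    }

    /** Simplest possible Key: a wrapped string. */
    static final class StringKey implements Key {
        private final String value;
        StringKey(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof StringKey && ((StringKey) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    /** A service that supports includes() and getNewKey() but not translate(). */
    static final class CountingKeyService implements KeyService {
        private final Key serviceKey;
        private final Set<Key> known = new HashSet<>();
        private int next = 0;
        CountingKeyService(String name) { this.serviceKey = new StringKey(name); }
        @Override public Key getServiceKey() { return serviceKey; }
        @Override public Optional<Boolean> includes(Key key) {
            return Optional.of(known.contains(key));
        }
        @Override public Optional<Key> getNewKey() {
            Key k = new StringKey(serviceKey + ":" + next++);
            known.add(k);                                      // remember issued keys
            return Optional.of(k);
        }
    }

    public static void main(String[] args) {
        KeyService service = new CountingKeyService("ExampleKeys");
        Key k = service.getNewKey().get();
        System.out.println(k + " included: " + service.includes(k).get());
        System.out.println("translate supported: "
                + service.translate(new StringKey("Other"), k).isPresent());
    }
}
```

Returning an empty Optional for the optional calls is just one way to let a
caller discover what a given key service supports; a capabilities query or
an exception would serve equally well.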

Some kind of root VO Key service might be needed.  Note that the only functionality
a key service is required to have is that it itself be identified, i.e., we need
to uniquely identify the set we are talking about.

Using this very basic concept of keys, other services can build up their
own identifiers.  E.g., to use Ray's DSS example, the metadata for the STScI
DSS service would indicate that the string formed by some fields is a key
for the plates for this service.  It would also indicate that the
key for the DSS Key service is DSSKey.  At Illinois, they might have a mirror
of the DSS, with the same entries.  Noting that the services use the same key
service, someone trying to compare data could just check to see if the keys
were the same.  At the NOAO they may have the same plates, but they use a different
key system to identify the plates.  So they have a different key service.
If someone wants to check that two plates are the same between ST and
NOAO, then they can try to translate the key from one service to the other.
If someone has built the translator, all is copacetic; otherwise the user is
out of luck.
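
The DSS scenario above can be sketched roughly as follows.  The plate
identifiers, the "NOAOPlateKey" service name, and the ScopedKey and
Translator classes are all invented for illustration; only "DSSKey" comes
from the example itself.  An empty result stands for "no translator
available, so we cannot tell".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: plate IDs, "NOAOPlateKey" and class names are invented.
class DssKeyExample {

    /** A key together with the key of the key service that defines it. */
    static final class ScopedKey {
        final String serviceKey;
        final String value;
        ScopedKey(String serviceKey, String value) {
            this.serviceKey = serviceKey;
            this.value = value;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ScopedKey)) return false;
            ScopedKey other = (ScopedKey) o;
            return serviceKey.equals(other.serviceKey) && value.equals(other.value);
        }
        @Override public int hashCode() { return 31 * serviceKey.hashCode() + value.hashCode(); }
    }

    /** A translator into one target key service, if someone has built it. */
    static final class Translator {
        private final String targetService;
        private final Map<ScopedKey, ScopedKey> table = new HashMap<>();
        Translator(String targetService) { this.targetService = targetService; }
        void add(ScopedKey from, String toValue) {
            table.put(from, new ScopedKey(targetService, toValue));
        }
        Optional<ScopedKey> translate(ScopedKey other) {
            return Optional.ofNullable(table.get(other));
        }
    }

    /** Compare directly within one key service; otherwise try to translate b. */
    static Optional<Boolean> samePlate(ScopedKey a, ScopedKey b, Translator toServiceOfA) {
        if (a.serviceKey.equals(b.serviceKey)) return Optional.of(a.equals(b));
        if (toServiceOfA == null) return Optional.empty();     // out of luck
        return toServiceOfA.translate(b).map(a::equals);
    }

    public static void main(String[] args) {
        ScopedKey stsci    = new ScopedKey("DSSKey", "plate-0001");
        ScopedKey illinois = new ScopedKey("DSSKey", "plate-0001");
        ScopedKey noao     = new ScopedKey("NOAOPlateKey", "N-42");

        Translator noaoToDss = new Translator("DSSKey");
        noaoToDss.add(noao, "plate-0001");

        System.out.println(samePlate(stsci, illinois, null));      // same key service
        System.out.println(samePlate(stsci, noao, noaoToDss));     // via translation
        System.out.println(samePlate(stsci, noao, null));          // no translator
    }
}
```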

A registry might require that each table service define the key fields and key service
used by that table, and similarly for each archive of observations.  The idea is that
we provide a mechanism by which systems can in general show that data is the same,
but we don't mandate some overarching key system that everyone is required
to maintain (other than perhaps the root KeyService).
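
As a rough illustration of what such a registry entry could carry, the
sketch below records a key service and a list of key fields for a table,
and forms a key string from those fields.  The class names, the field names
("region", "plate", "epoch") and the "|" separator are all assumptions made
up for this example, not a proposed registry schema.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: field names and the "|" separator are invented.
class RegistryKeySketch {

    /** What a registry might record about a table's keys. */
    static final class TableKeyInfo {
        final String keyService;        // key of the key service, e.g. "DSSKey"
        final List<String> keyFields;   // fields whose values together form the key
        TableKeyInfo(String keyService, List<String> keyFields) {
            this.keyService = keyService;
            this.keyFields = keyFields;
        }

        /** Build the key string for one row by joining the key field values. */
        String keyFor(Map<String, String> row) {
            return keyFields.stream().map(row::get).collect(Collectors.joining("|"));
        }
    }

    /** Rows are compared by key only when both tables use the same key service. */
    static boolean sameEntry(TableKeyInfo t1, Map<String, String> r1,
                             TableKeyInfo t2, Map<String, String> r2) {
        return t1.keyService.equals(t2.keyService)
                && t1.keyFor(r1).equals(t2.keyFor(r2));
    }

    public static void main(String[] args) {
        TableKeyInfo stsci  = new TableKeyInfo("DSSKey", Arrays.asList("region", "plate"));
        TableKeyInfo mirror = new TableKeyInfo("DSSKey", Arrays.asList("region", "plate"));

        Map<String, String> row1 = new HashMap<>();
        row1.put("region", "N");
        row1.put("plate", "0123");
        row1.put("epoch", "1950");      // non-key field, ignored when forming the key

        Map<String, String> row2 = new HashMap<>();
        row2.put("region", "N");
        row2.put("plate", "0123");

        System.out.println(stsci.keyFor(row1));
        System.out.println(sameEntry(stsci, row1, mirror, row2));
    }
}
```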

I hope this isn't more confusing than helpful.  It just seems to me that detailed
discussions of how we build and use keys are really premature.  They will
be used in very different ways in different places and times.  What I think
we can do is begin to indicate that certain things are keys and try to establish
a mechanism that defines the context of the key (i.e., the key service).

	Tom