UTYPEs for SIAP

Tom McGlynn Thomas.A.McGlynn at nasa.gov
Thu Apr 1 03:50:28 PST 2004


While I'm reluctant to get into this discussion -- I'm not sure
that I can see any useful outcome in putting forward so
divergent a view -- I feel this approach is just not right.
Since I can't attend today's telecon, here are my thoughts after
reading Jonathan's proposal.  Basically I don't see how it
advances what we can do already.  I just need to look in a
different field for a different set of strings to do the
same operations that I'm doing today.

Perhaps my problem can be traced back to my conception of what
data models are for: they should provide a generic description of what
can be done with a given data element.  If data models are so
specific that they are tied directly to the columns of a table,
then what is their utility?

The discussion of data models in SIAP needs first to identify
the generic concepts that we want the SIAP to support.   Here's
a hastily assembled sketch, intended more to illustrate the approach
than as a serious suggestion.

An SIA response should be an example of
a ResourceResponse, which has a single mandatory getNextResource()
method.   This single method represents the ResourceResponse data
model (there might be optional capabilities).  In this light both the
SSA and the ConeSearch protocols might also return ResourceResponses.

In recent metadata telecons, there has been a request for an enhancement
to this data model.  Arnold has suggested that there should be a capability
of providing a setOrderResourcesByDesirability() method.  Although his
request was specific to the SIAP, that's the way it might be phrased in
a more general discussion.
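
To make this concrete, here is a minimal Java-style sketch of the
ResourceResponse model as just described, with the optional ordering
capability folded in.  Only the method names come from the discussion;
the signatures and return types are illustrative guesses, and the
ResourceDescriptor type is described just below.

    // Sketch of the abstract ResourceResponse data model.
    public interface ResourceResponse {

        // Mandatory: step through the resources matched by the query.
        // Returns null when no further resources are available.
        ResourceDescriptor getNextResource();

        // Optional capability, along the lines of Arnold's request: ask
        // the service to order the remaining resources by some measure
        // of desirability.  A service that cannot order might ignore it.
        void setOrderResourcesByDesirability();
    }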

The getNextResource() method returns a ResourceDescriptor.   The following
methods are available for ResourceDescriptor:
    getResource() gets the data described in the resource.
    getResourceMeta() gets metadata associated with the resource.
    getAssociatedQuicklook() gets a quicklook resource associated with
      this ResourceDescriptor.
The last addresses my longstanding concern with the SIA, where I want to
get the quicklook data associated with a given image.  However, again,
this is not tied to the SIA -- all of these would work identically for
the SSA.
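
A corresponding sketch of the ResourceDescriptor interface might look
like the following; the Object return types are placeholders, since the
text above only pins down the operations, not the concrete types.

    // Sketch of the ResourceDescriptor methods listed above.
    public interface ResourceDescriptor {

        // The data described by this resource (an image, a spectrum, ...).
        Object getResource();

        // Metadata associated with the resource (e.g., the WCS for an image).
        Object getResourceMeta();

        // A quicklook resource associated with this one, or null if
        // there is none.  This works identically for SIA and SSA.
        ResourceDescriptor getAssociatedQuicklook();
    }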

At the level below this we'd finally be getting something that
was more explicitly tied to images.  Ideally the
getResource() method would return an object that satisfied some
Image data model, and getResourceMeta() would return something
that had some of the attributes of an image: the WCS in this case.
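
Purely by way of illustration -- only the idea of an Image data model
and the WCS attribute come from the text, the method names are invented
-- the objects returned for the image case might look something like:

    // Hypothetical interfaces for the image case.
    interface Image {
        int[] getShape();                  // axis lengths
        double getPixel(int[] position);   // pixel value at a position
    }

    interface ImageMeta {
        Object getWCS();                   // the world coordinate system
    }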

As you can see, this is a completely different approach.
We haven't even begun to discuss the VOTable.   First we have
defined the high-level abstract data models which the data can satisfy.
Only now do we even begin to build the structure for defining
concrete instantiations of this data model.

The next step is building the 'real' SIARequestResponse object....
What becomes immediately clear is that translating our abstract data model
to work on the real data is not just a matter of looking at columns.
The association of quicklook data with non-quicklook data involves
multiple rows, i.e., rows that have the same logical name value but
different formats (a sketch of this appears after this paragraph).  So
the data model is not simply a description of the format, it is a layer
of software on top of the structure.  To make any progress beyond what
we already do with the current SIA and its UCD-defined columns we need
to recognize that we are dealing with objects, not just structures.  So
what do we need in the VOTable to indicate that we're going to get an
SIARequestResponse?   Probably nothing -- it's part of the protocol --
but maybe one would add a VOTable-level field indicating this.  Nothing
would need to be provided at the column level, at least not until we
know a lot more...
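
Here is an illustrative sketch, in Java, of the kind of software layer
this implies: rows sharing a logical name are grouped, and the quicklook
row in a group is identified by its format.  The column names and types
are invented for the example; the point is only that this grouping step
cannot be expressed as a per-column mapping.

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative only: one row of an SIA VOTable response, reduced to
    // the three (invented) columns this sketch needs.
    class SiaRow {
        String logicalName;   // shared by full-resolution and quicklook rows
        String format;        // e.g. "image/fits" or "image/jpeg"
        String accessUrl;     // where to fetch the data
    }

    class SiaResponseBuilder {

        // Group rows by logical name: each group corresponds to one
        // resource, possibly with a quicklook in another format.
        static Map<String, List<SiaRow>> groupByLogicalName(List<SiaRow> rows) {
            Map<String, List<SiaRow>> groups = new LinkedHashMap<String, List<SiaRow>>();
            for (SiaRow row : rows) {
                List<SiaRow> group = groups.get(row.logicalName);
                if (group == null) {
                    group = new ArrayList<SiaRow>();
                    groups.put(row.logicalName, group);
                }
                group.add(row);
            }
            return groups;
        }

        // Within a group, pick the quicklook row by its format.  This is
        // the step that involves comparing rows, not reading columns.
        static SiaRow findQuicklook(List<SiaRow> group) {
            for (SiaRow row : group) {
                if ("image/jpeg".equals(row.format)) {
                    return row;
                }
            }
            return null;
        }
    }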

Consider a very different data model, the one for spectra.  As I recall
the discussion of this, it also breaks down to essentially one mandatory
method:

     (wavelength/channel, flux/brightness) = getNextSpectralBin()

where the user can iterate over the channels in the spectrum.
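
In Java-like terms this might look like the sketch below.  SpectralBin
is just a hypothetical holder for the pair returned by
getNextSpectralBin(); only that method name comes from the text.

    // The one-method spectral data model, as an interface plus a small
    // value holder.
    interface Spectrum {
        // Returns the next (wavelength/channel, flux/brightness) pair,
        // or null once all bins have been delivered.
        SpectralBin getNextSpectralBin();
    }

    class SpectralBin {
        double wavelength;   // or channel number
        double flux;         // or brightness
    }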

What we want to be able to do is treat all different kinds of spectra
with the same data model, e.g., a spectrum might be a table with a
wavelength and flux in each row, or it might be a table with just a
flux, but with a starting wavelength and a delta in the metadata.

The user doesn't (and shouldn't) care about the internal organization of
the spectrum; they just use the public method.  So this single abstract
data model has at least two different concrete representations.  In the
second case, the data model requires a software operation to compute the
wavelength that is returned.
Even in this very simple case, it's clear that the data model is more
than just a specification of structure: it includes software
transformations from the concrete data to the abstract model.
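
The second representation makes the point concrete.  Assuming the
Spectrum and SpectralBin sketches above, an implementation might look
like this -- the table stores only fluxes, and the wavelength of each
bin is computed from the start and delta held in the metadata.

    // Concrete representation #2: flux-only table plus start/delta
    // metadata.  The wavelength returned for each bin is computed, not
    // read -- the "software operation" the data model requires.
    class RegularGridSpectrum implements Spectrum {
        private final double[] flux;
        private final double startWavelength;
        private final double deltaWavelength;
        private int next = 0;

        RegularGridSpectrum(double[] flux, double startWavelength, double deltaWavelength) {
            this.flux = flux;
            this.startWavelength = startWavelength;
            this.deltaWavelength = deltaWavelength;
        }

        public SpectralBin getNextSpectralBin() {
            if (next >= flux.length) {
                return null;                          // no bins left
            }
            SpectralBin bin = new SpectralBin();
            bin.wavelength = startWavelength + next * deltaWavelength;
            bin.flux = flux[next];
            next++;
            return bin;
        }
    }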

Once we recognize this, it's clear that an events list can also satisfy
the spectral data model -- binning the data is just a slightly more
complex transformation of the data.  It's one that involves combining
row and column information.
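
Again purely as a sketch (the channel width and channel count are
invented parameters, and the same Spectrum/SpectralBin sketches are
assumed), an event list can satisfy the interface by binning the events
up front, so that each returned bin combines information from many rows.

    // An event list exposed through the same Spectrum interface.  The
    // transformation is a binning step: event energies are histogrammed
    // into channels, and each returned bin is a (channel midpoint,
    // counts) pair built from many rows of the underlying table.
    class EventListSpectrum implements Spectrum {
        private final double[] counts;
        private final double channelWidth;
        private int next = 0;

        EventListSpectrum(double[] eventEnergies, double channelWidth, int nChannels) {
            this.channelWidth = channelWidth;
            this.counts = new double[nChannels];
            for (double energy : eventEnergies) {
                int channel = (int) (energy / channelWidth);
                if (channel >= 0 && channel < nChannels) {
                    counts[channel] += 1.0;           // bin the event
                }
            }
        }

        public SpectralBin getNextSpectralBin() {
            if (next >= counts.length) {
                return null;
            }
            SpectralBin bin = new SpectralBin();
            bin.wavelength = (next + 0.5) * channelWidth;  // channel midpoint
            bin.flux = counts[next];
            next++;
            return bin;
        }
    }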


So here's what I think...

- Data models are interfaces (in JavaSpeak) that define what we can do
  with the data.
- Data models need to be instantiated as concrete objects.
- The transformation between a serialization (e.g., an SIA VOTable
  response) and a concrete instantiation of a data model can be complex
  and cannot be reduced to a purely structural transformation (i.e., the
  data model can only be realized by an object which may have methods
  that operate on the fields defined in the serialization).
- Attempts to define a one-to-one correspondence between elements of the
  public data model and the serialization of the data model are pointless
  (or at least they will often fail).
- A given concrete object may satisfy multiple data models (e.g., events
  lists).

The real issue here is that we can't use data models effectively
unless we're willing to treat them as objects, not just structures.
So we need to face up to either providing a common language in which
to define these objects, or looking at things like CORBA so that we
know how to transfer objects between different languages.  That's the
price we have to pay.

Finally, here's where I think the data model effort should be going:

- Defining the high-level interfaces for spectra, images and time series.
- Defining the high-level interfaces for basic transactions between VO
  entities.

There is as yet no way to register software that transforms a serialization
of an object into the object.  Without that, data models are useful only in
helping to document our tools.  Once we have some sense of how this is to
be done, we can look at how the serialization should store information about
the data models that it satisfies, and where and how we can use UTYPEs
and UCDs -- and we can use data models as the basic building blocks
within the VO.

       Cheers,
       Tom


