Putting the pieces together...

Thomas McGlynn tam at lheapop.gsfc.nasa.gov
Thu May 13 08:31:46 PDT 2004


[Mail note.  I've sent this to the DM and APPS groups.  It's
relevant to others, but I don't want to get 5 copies of
every response.]

There has certainly been plenty of mail on data models, registries,
STC, UCDs, UTYPEs, measurements, etc., but a lot of it is
frustrating for me.  I can't seem to get a handle on how much of it
will actually be used, or on what the consequences will be for what
the developer or user sees.  We seem to spend a lot of
time discussing abstract data and data structures, but there
has been very little about how software finds and uses this information.

Until these discussions are anchored in some more concrete framework
for how the data models, UTYPEs, and STC are used, it may
be very difficult to come to any real consensus on what
they should look like.

Below I go through an extended example which shows how I think
we might use many of the concepts that have been the
subject of discussion.  It's clear to me that there are enormous,
largely undiscussed gaps between these data concepts and
their use in software.


------------------------------------------------------------------
Scenario:

Find and plot spectra of TY Pyxis in the range from the soft X-ray
to the optical (1 A to 10000 A). Let's call the software tool
that does this VSPlot.

Step 1.

VSPlot needs to know the location of at least one registry.
VSPlot makes a query of the registry using a standard registry protocol.
VSPlot asks for all SSAP services that might have data in the desired
regimes and location.

   Issue 1.A: Is the registry query syntax different from the VOQL protocol?
   If so what is it?

   Issue 1.B: What is the protocol for the registry query?  Is it defined
   by some standard registry WSDL?

   Issue 1.C: VSPlot hardwires the connection between spectra and SSAP services.
   Are we restricted to a single kind of service for each kind of data?  Do
   we need to register the attributes of the kinds of services so that if I'm
   interested in getting archival spectral data I might learn that there are SSA
   services and maybe other kinds of services that I want to query?

   Issue 1.D: How does VSPlot know where the registry is?  Is there a registry
   of registries?  What is the root of the hierarchy?

Step 2.

VSPlot parses the set of services returned from the registry.

   Issue 2.A: Where is the structure/protocol of this returned data defined?

   Issue 2.B: What is the contract regarding these services?  I.e.,
   given that I asked for services that meet some criteria (spectral
   and spatial coverage), do I know that these services will actually
   have data that meets these criteria?  Probably not, I think.

   Issue 2.C: How much information is stored in the registry
   about each of the SSA services?

Step 3.

VSPlot queries the potentially matching services one at a time to
get links to candidate spectral data using the SSA protocol.

   Issue 3: We need a definition of the SSA protocol.

Step 4.

VSPlot now has links to a list of files that may be of interest
for plotting.  We begin a loop over this list.

VSPlot copies a spectral file into local storage.

Step 5.

VSPlot determines if the file supports the Spectrum data model.
If the file does not support this data model it is discarded.

   Issue 5.A:  How do we find out if a data element supports a given
   data model?  Is it required that any file returned by the SSA
   support the Spectrum data model?  If so where do we put the mapping
   between service types and the data models that the returned
   data is going to support?

   Issue 5.B: Is there some list of the potential data models that
   any file might support?

Step 6.

VSPlot looks for frame information for this file to confirm
that it is a spectrum at the appropriate location and in the appropriate
spectral regime for further processing.

   Issue 6.A: For a FITS file I know how to do this.  I'm much less
   clear how to do this for arbitrary data returned by an SSA service.
   Is there a standard method associated with the Spectrum data
   model that enables me to find this out?  Basically we're asking
   how we discover the STC information for a given dataset and the
   comparable spectral info.

   Issue 6.B: Is coverage information (spatial and spectral) required to be
   in a standard format?  If so what is that format?  If not do we have
   standard conversion services or is it the responsibility of the application
   to convert?
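
As a minimal sketch of the spectral half of the Step 6 check, suppose
coverage has already been reduced to a wavelength interval in Angstroms.
That reduction is exactly what Issue 6.B leaves open, so the
SpectralCoverage record and the overlaps function below are
hypothetical names, not part of any agreed model.  The spatial check
would be analogous and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpectralCoverage:
    """Wavelength interval in Angstroms (hypothetical, simplified record)."""
    min_angstrom: float
    max_angstrom: float


def overlaps(a: SpectralCoverage, b: SpectralCoverage) -> bool:
    """Two closed intervals overlap when each starts before the other ends."""
    return a.min_angstrom <= b.max_angstrom and b.min_angstrom <= a.max_angstrom


# The scenario's request: soft X-ray to optical, 1 A to 10000 A.
requested = SpectralCoverage(1.0, 10000.0)

optical = SpectralCoverage(3800.0, 9200.0)  # inside the requested range
radio = SpectralCoverage(1.0e9, 2.0e9)      # roughly 10 to 20 cm, outside it

print(overlaps(requested, optical))  # True
print(overlaps(requested, radio))    # False
```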

Step 7.

VSPlot iteratively uses the standard (in this scenario) getNextElement method defined
in the spectrum data model to extract data from the file.

   Issue 7.A:  How do we use the data model in real code?  Is the
   data model associated with a set of Java classes that we can
   invoke on the data?  If the data model is more than documentation
   we need to be able to instantiate behavior in some TBD way.
   How do we preserve language independence? (Or do we?)

   Issue 7.B: Does the data model describe behavior that is defined
   for the data element or does it indicate that the data is convertible
   to some fiducial form?  If the latter who is responsible for the conversion?
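
To make Issue 7.A concrete, here is one way a data model could carry
behavior as well as fields.  This is only a sketch under my own
assumptions: SpectrumModel, InMemorySpectrum and read_all are
hypothetical names, get_next_element stands in for the getNextElement
method of the scenario, and an element is assumed to be a simple
(spectral coordinate, flux) pair.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

# One spectral sample: (spectral coordinate, flux).  Units are left to the caller.
Element = Tuple[float, float]


class SpectrumModel(ABC):
    """Hypothetical interface for the Spectrum data model."""

    @abstractmethod
    def get_next_element(self) -> Optional[Element]:
        """Return the next sample, or None when the data are exhausted."""


class InMemorySpectrum(SpectrumModel):
    """Toy implementation backed by a list; a real one might wrap FITS or VOTable."""

    def __init__(self, samples: List[Element]) -> None:
        self._samples = list(samples)
        self._pos = 0

    def get_next_element(self) -> Optional[Element]:
        if self._pos >= len(self._samples):
            return None
        element = self._samples[self._pos]
        self._pos += 1
        return element


def read_all(spectrum: SpectrumModel) -> List[Element]:
    """Step 7: call get_next_element until the spectrum is exhausted."""
    elements: List[Element] = []
    while True:
        element = spectrum.get_next_element()
        if element is None:
            return elements
        elements.append(element)
```

The client code in read_all depends only on the interface, which is
the property Issue 7.B asks about: either the data element implements
the behavior itself, or something converts it to a form that does.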


Step 8.

The user had indicated that they wanted the spectrum to be flux versus
wavelength.  VSPlot needs to see if it can convert the data extracted
from the file into those units.  VSPlot looks at the UCDs and Units
associated with the spectra.  It converts columns to the desired
units where possible.  Spectra where the data are not convertible
are discarded.

   Issue 8.A. How does VSPlot know which column to look at as the flux-like
   column and which as the wavelength-like column?  It could look through
   a list of potential UCDs, or UTYPEs could be invoked here.  Could
   the UCD and UTYPE seem to conflict?

   Issue 8.B.  How do we do the transformations?  Is this VSPlot's responsibility
   or do we support standard VO transformation services?

   Issue 8.C.  This is a hard step.  How does VSPlot know enough to distinguish
   between raw and background-subtracted spectra and the myriad details like that?
   Is this a characteristic of the flux column or of the entire spectral file?
   This seems to be where all of the discussion of measurements and quantities
   needs to provide some benefit to the user.

   Issue 8.D. How does VSPlot find and use the measurement data model
   information to help here?  What functionality is associated
   with a measurement?
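
For the spectral axis, the lookup in Issue 8.A and the conversion in
Issue 8.B might look like the sketch below if VSPlot does the work
itself.  The UCD-like strings and unit spellings in the table are
illustrative assumptions, not taken from any approved list, and the
column records are hypothetical dictionaries.  Only the physics is
firm: lambda = h*c / E and lambda = c / nu.

```python
from typing import Callable, Dict, List, Optional, Tuple

HC_KEV_ANGSTROM = 12.398419843    # h*c in keV * Angstrom
C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstrom / s

# (UCD, unit) -> function giving wavelength in Angstroms.
# Keys are illustrative assumptions, not an agreed vocabulary.
TO_ANGSTROM: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("em.wl", "Angstrom"): lambda x: x,
    ("em.wl", "nm"): lambda x: x * 10.0,
    ("em.energy", "keV"): lambda e: HC_KEV_ANGSTROM / e,
    ("em.freq", "Hz"): lambda nu: C_ANGSTROM_PER_S / nu,
}


def find_spectral_column(columns: List[Dict[str, str]]) -> Optional[str]:
    """Return the name of the first column with a known conversion, else None."""
    for col in columns:
        if (col["ucd"], col["unit"]) in TO_ANGSTROM:
            return col["name"]
    return None


def to_angstrom(value: float, ucd: str, unit: str) -> Optional[float]:
    """Convert one spectral coordinate to Angstroms, or None if not convertible."""
    convert = TO_ANGSTROM.get((ucd, unit))
    return None if convert is None else convert(value)


columns = [
    {"name": "COUNTS", "ucd": "phot.count", "unit": "ct"},
    {"name": "ENERGY", "ucd": "em.energy", "unit": "keV"},
]
print(find_spectral_column(columns))         # ENERGY
print(to_angstrom(1.0, "em.energy", "keV"))  # 12.398419843
print(to_angstrom(1.0, "em.energy", "erg"))  # None
```

A spectrum for which to_angstrom returns None is one of those that
Step 8 discards.  Nothing here addresses Issue 8.C: the table knows
units, not whether a column is raw or background subtracted.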


Step 9.

The data is searched for error columns using UCDs and error bars are computed.

   Issue 9.A. Errors need to be transformed if the data is transformed, but
   the transformations can be complex.  Where is this handled?

   Issue 9.B.  How do we associate the error columns with the appropriate
   measurements?  Again this seems to be part of the measurement discussion
   but I need to know how this model is instantiated for it to be useful.
   Does it use Groups in VOTables?  Are there other mechanisms?
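
The simplest case of Issue 9.A is a transformation that is a pure
scaling, where the error scales by the same factor.  The sketch below
converts a flux density per unit frequency to one per unit wavelength,
F_lambda = F_nu * c / lambda**2, treating the wavelength as exact.
The function name and the choice of units are my assumptions; the
complex cases (correlated errors, uncertain wavelengths) are not
handled.

```python
C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstrom / s


def fnu_to_flambda(f_nu, sigma_f_nu, wavelength_angstrom):
    """Convert F_nu +/- sigma (erg/s/cm^2/Hz) to F_lambda +/- sigma
    (erg/s/cm^2/Angstrom).

    F_lambda = F_nu * c / lambda**2.  With lambda treated as exact, the
    transformation is a pure scaling, so the error scales identically.
    """
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    scale = C_ANGSTROM_PER_S / wavelength_angstrom**2
    return f_nu * scale, sigma_f_nu * scale


f_lambda, sigma = fnu_to_flambda(1.0e-26, 1.0e-27, 5000.0)
print(f_lambda, sigma)
```

The relative error (10% in this example) is unchanged by the
conversion, which is a quick sanity check on any scaling of this kind.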


Step 10.

The data and errors are plotted.  Fini!
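
Steps 3 through 9 can be read as one loop with a filter at each stage.
The skeleton below is only a sketch of that control flow: every
callable it takes is a hypothetical placeholder for a piece that the
issues above leave undefined, and the registry query of Steps 1 and 2
is assumed to have already produced the list of services.

```python
from typing import Callable, Iterable, List, Optional


def collect_plottable_spectra(
    services: Iterable[str],
    query_service: Callable[[str], List[str]],    # step 3: service -> file links
    fetch: Callable[[str], object],               # step 4: link -> local copy
    is_spectrum: Callable[[object], bool],        # step 5: data model check
    in_range: Callable[[object], bool],           # step 6: coverage check
    convert: Callable[[object], Optional[object]],  # steps 7-9: None if not convertible
) -> List[object]:
    """Return the converted spectra that survive every filter, ready to plot."""
    results: List[object] = []
    for service in services:
        for link in query_service(service):
            data = fetch(link)
            if not is_spectrum(data) or not in_range(data):
                continue  # discarded in step 5 or step 6
            converted = convert(data)
            if converted is not None:
                results.append(converted)
    return results
```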

-----------------------------------------------------------

This is intended to give only an example of how these
pieces might play together.  I don't know that there is any more
formal description of this architecture -- nor do I know
who is responsible for one.  Without such a broader picture
of how all these things interact it's very difficult
to assess all the myriad proposals that show
up in the mail.

The issues seem to repeat two themes:

   How do I find and get the various data structures that we've
   been talking about?

   What functionality is associated with them?  If we're talking
   about data models as objects, then what are the methods as
   well as the fields?


If we can focus on what we really want to use the quantity
model for, or the UTYPEs, or the spectral data model, then I think
we'll be a lot more successful at defining them -- and we'll make
a lot of progress towards deliverable VO applications!

The places where we've been most successful in the VO are those where
we have balanced the definition of structures with protocols for using
these structures: e.g., VOTables, UCDs and CGI access in the SIAP.

		Regards,
		Tom McGlynn



