getCapabilities() and graininess

Doug Tody dtody at nrao.edu
Wed May 9 15:45:34 PDT 2007


On Wed, 9 May 2007, Ray Plante wrote:

> > Why not leave this to the people who define the service standard: so whoever
> > defines SIAP, SSAP, TAP etc can define the fine and coarse level of metadata
> > (if necessary - they might be equivalent), depending on the needs of those
> > who use the services.
>
> Clearly from the opinions raised regarding the role of registries, it
> is not simply a DAL issue; registries have a role in this as well.
> It's the curation of the metadata in the registries that has folks
> bothered.  And even in the DAL and VOQL groups there is division about
> whether table metadata should be managed by the registry.

So far, registry is the only group that has this concept of coarse vs
fine-grained metadata.  I think this is because the registry was
originally based on discovery/description of resources via their
resource metadata, but then some folks wanted to cache more fine-grained
metadata for specific types of resources to make workflows more efficient.
This information didn't necessarily have to go into the registry (it could
still have been cached, but someplace else), but for a modest amount of
additional metadata this was arguably a reasonable approach.

For DAL it is more a matter of service metadata (describing the service
instance, as described by getCapabilities) and "data-related metadata"
(describing the data managed by the service).  The data-related metadata
is in general open-ended.  It includes database/table/column information
(at least for table data, and possibly for all services if we include the
queryData output) and "dataset metadata" describing physical or virtual
dataset instances.  There are some more detailed levels below that as
well: metadata which is normally only needed for analysis and
which one normally only gets by retrieving the actual dataset; extension
metadata, which adds information beyond what the core standard defines; and
project-specific metadata, when we get all the way down to the level of
native project data.

When it comes to describing the columns of a queryData "table",
or the columns of an actual data table, there is no question that
this information should come from the service which generates or
operates upon those tables.  It is information which is useful directly
to a client application, not just for registry-based discovery or
cache-based efficiency improvements.  In the case of a data service,
a science-oriented client data analysis application is the primary
client, not the registry, and design decisions (e.g., query capabilities,
default output format) should be driven by the needs of such applications.

In any case, I agree with Ray that it could be useful to have a uniform
way to describe basic column information for both data tables and the
DAL queryData (or similar) response table.  Whether or not it is useful
to cache such data in the registry, or link it to the registry in any
way, is another issue.  In either case the information should come from
the service.  Probably the default output format should be VOTable as
this is what the services already support as the default format for
manipulation of table data.  An XML format more compatible with the
internals of the registry would also be possible, as an alternative or
optional format (or CSV/TSV for that matter).
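
To make the idea concrete, here is a rough sketch (not a proposal for
the actual interface) of a service emitting the same column descriptions
either as a VOTable containing only FIELD elements, or as CSV.  The column
list, the function names columns_to_votable and columns_to_csv, and the
UCD strings are all made up for illustration; the XML namespace is
omitted for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column descriptions; names, units and UCDs are illustrative only.
COLUMNS = [
    {"name": "ra", "datatype": "double", "unit": "deg", "ucd": "pos.eq.ra"},
    {"name": "dec", "datatype": "double", "unit": "deg", "ucd": "pos.eq.dec"},
    {"name": "title", "datatype": "char", "arraysize": "*", "ucd": "meta.title"},
]


def columns_to_votable(columns, table_name="columns"):
    """Describe columns as a VOTable with FIELD elements and no DATA section."""
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name=table_name)
    for col in columns:
        # Each key of the column dict becomes an attribute of the FIELD.
        ET.SubElement(table, "FIELD", attrib=col)
    return ET.tostring(votable, encoding="unicode")


def columns_to_csv(columns):
    """Describe the same columns as CSV, one row per column."""
    keys = ["name", "datatype", "arraysize", "unit", "ucd"]
    buf = io.StringIO()
    # restval fills in attributes a column does not define (e.g. no unit).
    writer = csv.DictWriter(buf, fieldnames=keys, restval="")
    writer.writeheader()
    writer.writerows(columns)
    return buf.getvalue()


if __name__ == "__main__":
    print(columns_to_votable(COLUMNS))
    print(columns_to_csv(COLUMNS))
```

Both outputs are generated from the one description held by the service,
so a registry or cache could take a copy in whichever format suits it
without becoming the source of the information.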

	- Doug


