Summary of data representation issues

Tue Apr 20 08:54:27 PDT 2004

Hi Folks -

For what it's worth, here is an attempt to summarize the state of the
data representation discussions (cribbed from the NVO QR).

	- Doug

6.2 Data representation (VOTable, etc.)

How best to represent science data in the VO is still a controversial
issue.  This continues to be discussed at length in the VOTable, DM,
and DAL forums without a clear consensus.  We have general agreement on
the following points:

    o   All data objects internal to the VO should have a formally defined
        data model.

    o   Data models should be defined independently of data representations.
        Multiple representations of the same data are permissible.

    o   XML in some form should be supported for data representation,
        although this should not be the only form of representation.
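
As an illustration of the second point, here is a minimal Python sketch
of one data model carried by two different representations.  Everything
in it is hypothetical and made up for this example: the SourcePosition
model, its field names, and both encodings.  It is not taken from any VO
specification.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class SourcePosition:
    """Hypothetical data model: a named sky position in degrees."""
    name: str
    ra_deg: float
    dec_deg: float


def to_xml(pos: SourcePosition) -> str:
    """Representation 1: a small XML element (illustrative layout)."""
    elem = ET.Element("SourcePosition", name=pos.name)
    ET.SubElement(elem, "ra").text = repr(pos.ra_deg)
    ET.SubElement(elem, "dec").text = repr(pos.dec_deg)
    return ET.tostring(elem, encoding="unicode")


def from_xml(text: str) -> SourcePosition:
    """Rebuild the model object from the XML representation."""
    elem = ET.fromstring(text)
    return SourcePosition(
        elem.get("name"),
        float(elem.findtext("ra")),
        float(elem.findtext("dec")),
    )


def to_row(pos: SourcePosition) -> str:
    """Representation 2: one tab-separated table row."""
    return f"{pos.name}\t{pos.ra_deg!r}\t{pos.dec_deg!r}"


def from_row(line: str) -> SourcePosition:
    """Rebuild the model object from the table row."""
    name, ra, dec = line.split("\t")
    return SourcePosition(name, float(ra), float(dec))


pos = SourcePosition("M31", 10.6847, 41.269)
# Both representations recover the same model object.
same_from_xml = from_xml(to_xml(pos)) == pos
same_from_row = from_row(to_row(pos)) == pos
```

The model is defined once, and neither encoding is privileged: a third
representation could be added without touching the class definition.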

There appears to be a consensus on the following points, although some
controversy may remain:

    o   The specification of a data model should include a standard schema
        and serialization.  Data representations are not required to
        use either explicitly, but should be defined in terms of these,
        ideally with an explicit transformation, to allow data objects
        to be extracted from some arbitrary representation so that the
        objects could be verified with a schema or instantiated with some
        class code.

    o   FITS should continue to be used for transmission of bulk binary
        data, at least for the foreseeable future.  In general we probably
        do not want to use FITS for metadata transport except as necessary
        to self-document a binary data element.

    o   VOTable should continue to be used with evolutionary enhancements,
        at least until there is some clearly better XML-based alternative.
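
The first point above (an explicit transformation from an arbitrary
representation, followed by verification) might be sketched roughly as
follows in Python.  This is only a sketch under stated assumptions: the
SCHEMA table, the HEADER_TO_MODEL mapping, and the field names are all
invented for illustration, and the "schema" here is just a dictionary of
required fields and types rather than a real XML schema.

```python
# Hypothetical "schema": required model fields and their types.
SCHEMA = {"name": str, "ra_deg": float, "dec_deg": float}

# Hypothetical explicit transformation: which keyword of a FITS-style
# header supplies which field of the model.
HEADER_TO_MODEL = {"OBJECT": "name", "RA": "ra_deg", "DEC": "dec_deg"}


def extract(header: dict) -> dict:
    """Pull the model fields out of an arbitrary keyword/value header.

    Keywords that the mapping does not mention are ignored.
    """
    return {
        field: header[key]
        for key, field in HEADER_TO_MODEL.items()
        if key in header
    }


def verify(obj: dict) -> list:
    """Return a list of problems; an empty list means the object passes."""
    errors = []
    for field, ftype in SCHEMA.items():
        if field not in obj:
            errors.append(f"missing field: {field}")
        elif not isinstance(obj[field], ftype):
            errors.append(
                f"wrong type for {field}: expected {ftype.__name__}"
            )
    return errors


header = {"OBJECT": "M31", "RA": 10.6847, "DEC": 41.269, "TELESCOP": "x"}
obj = extract(header)
problems = verify(obj)
```

The representation itself never mentions the schema; only the mapping
ties the two together, which is what lets the extracted object be
checked or handed to class code.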

The use of standard table mechanisms for the transmission of data objects
containing tabular data remains controversial.  Some believe that only
a native XML encoding should be used, to make it easier to implement
Web services and to make such data easier to process with XML tools.
Others believe that VOTable (or some comparable standard table mechanism)
should be used for tabular data in order to allow generic table-based
tools to be used upon such data.  Some believe that an "open" and easily
extensible document-centric approach using a generic container to hold
multiple component data models is required for datasets which are too
complex to describe with a single standard data model.