Data representation using VOTable, FITS, XML etc.

Doug Tody dtody at aoc.nrao.edu
Wed Apr 14 10:48:53 PDT 2004



---------- Forwarded message ----------
Date: Wed, 14 Apr 2004 11:46:34 -0600 (MDT)
From: Doug Tody <dtody at zia.aoc.nrao.edu>
To: VOTable at ivoa.net
Subject: Re: Comments on V1.1 - Future of VOTable (flame bait sigh)

Didn't we just have this same discussion a while back?

There are some significant advantages to a generic table mechanism.

    o   By definition, a generic table mechanism should work well for
        storing tabular data.  General table management software can be
        written to implement the table abstraction, and this software can
        then be reused throughout a data analysis system.  Having factored
        the table abstraction off into common software, it is worthwhile
        investing effort in that software, e.g., to handle bulk data
        efficiently.

    o   Data stored in a generic table format can be manipulated with
        generic table tools.  This has been very successful in the past
        with FITS tables and we are seeing it again now with VOTable.

    o   Compatibility with existing astronomical software, much of which
        is table-based.  As Clive mentions, it is easy to modify such
        software to read VOTable as well as FITS ASCII and binary tables,
        text tables, etc.  Integration of VO and legacy astronomical
        formats such as FITS binary tables is much easier if both
        implement the table abstraction.

    o   Compatibility with non-astronomical software which is also
        table-based, e.g., databases, spreadsheets, statistical analysis
        tools.

    o	Any approach which uses a general container mechanism is likely
	to be more open than one which uses a custom schema designed for a
	single class of data.  One can represent the core elements of the
	data model in the container, and extract them from the container
	later to manipulate the object in class-specific code.	But other
	information can be stored in the generic container as well.
	This flexibility is important to allow data representation to
	evolve, or to adapt to subclasses of data.
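As a rough sketch of that last point, a simple 1D spectrum might be carried
in a generic VOTable container like the one below.  The element and attribute
names are standard VOTable 1.1 (RESOURCE, TABLE, PARAM, FIELD, TABLEDATA);
the target name, units, UCDs, and values are purely illustrative and not
taken from any particular data model.

    <VOTABLE version="1.1">
      <RESOURCE name="spectrum">
        <TABLE name="sed">
          <!-- core data model elements carried as ordinary table metadata -->
          <PARAM name="TargetName" datatype="char" arraysize="*" value="NGC 1068"/>
          <FIELD name="wavelength" datatype="double" unit="Angstrom" ucd="em.wl"/>
          <FIELD name="flux"       datatype="double" unit="Jy"       ucd="phot.flux.density"/>
          <DATA>
            <TABLEDATA>
              <TR><TD>4000.0</TD><TD>0.0012</TD></TR>
              <TR><TD>4001.0</TD><TD>0.0013</TD></TR>
            </TABLEDATA>
          </DATA>
        </TABLE>
      </RESOURCE>
    </VOTABLE>

Any generic table tool can read and manipulate this without knowing it is a
spectrum, while spectrum-aware code can pull out the core elements by name
or UCD, and extra columns or PARAMs can be added without breaking either.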

If all one wants to do is serialize a single data model in XML then I
agree the simplest thing to do is to define a custom schema specifically
for that data model.  While simple, this is very restrictive.  Anything
which does not fit into the predefined schema is either disallowed, or
awkward to handle via the schema approach.  As soon as we try to model
complex datasets by aggregating multiple component data models (as we do
in the real world all the time) then the schema-based approach starts to
break down.  In general the schema approach only works at the level of
individual, well defined data models.

Perhaps we should try both approaches.  Any tabular data, be it a catalog
or a 1D spectrum, can be reasonably expressed using a generic table
mechanism, permitting use of generic tools and providing scalability to
very large datasets.  For any self-contained, well defined data model
it is natural to define an XML schema.  In cases where the data model
is simple enough we can implement a Web service which is schema-based.
More complex cases are probably better addressed using a generic,
flexible/open document-centric approach such as VOTable or FITS.
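For contrast, the same spectrum serialized under a custom, schema-based
approach might look something like the sketch below.  Every element name
and the namespace are hypothetical, invented only for illustration; the
point is that the structure is fixed by the schema, so a column or
attribute the schema did not anticipate has no obvious place to go.

    <Spectrum xmlns="http://example.org/spectrum/v1">  <!-- hypothetical schema -->
      <Target>NGC 1068</Target>
      <SpectralAxis unit="Angstrom"/>
      <FluxAxis unit="Jy"/>
      <Point><Wavelength>4000.0</Wavelength><Flux>0.0012</Flux></Point>
      <Point><Wavelength>4001.0</Wavelength><Flux>0.0013</Flux></Point>
    </Spectrum>

Such a document is easy to validate and to bind to class-specific code, but
generic table tools can do little with it unless they are taught this
particular schema.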

	- Doug



