Spectral DM document update

Anita Richards amsr at jb.man.ac.uk
Tue Oct 10 09:05:01 PDT 2006


For many high-resolution science-ready spectra, you typically have
thousands of data points which all share the same characteristics apart
from the spectral coordinate and the flux density (and possibly the
statistical error on the flux density).

In such a case, there may be e.g. 10 or 20 other pieces of metadata
(times, positions, position errors, spectral resolution per bin, accuracy
of the central spectral coordinate, etc.) which do not need repeating.

As I understand it, the Spectrum model can 'be' the data, in which case
there would indeed be horrible bloat; a 100 M VOTable is far more
reasonable than a 1 G one.  For some instances - especially for SEDs,
with a few points that often carry very different metadata - that is
reasonable.  But I think I agree with Norman that, in practice, the
Spectrum model will be far more useful for describing large data sets
held in any recognised format (including XML) than for reproducing them.
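To make the bloat argument concrete, here is a minimal sketch comparing two serialisations of the same spectrum: one that repeats the full element structure for every bin, and one that states the shared metadata once and packs the values into whitespace-separated arrays (roughly the VOTable approach). All element and attribute names below are illustrative, not taken from the Spectrum model or the VOTable schema.

```python
# Hypothetical comparison: per-point XML tagging vs. shared metadata
# plus packed arrays. Names like <point>, <wave>, <flux> are made up
# for illustration only.

N = 100_000  # number of spectral bins

# Style 1: every bin carries its own element structure.
tagged = "<spectrum>\n" + "".join(
    f"  <point><wave>{i * 0.01:.4f}</wave>"
    f"<flux>{1.0:.4f}</flux></point>\n"
    for i in range(N)
) + "</spectrum>\n"

# Style 2: metadata stated once; values packed into two arrays.
waves = " ".join(f"{i * 0.01:.4f}" for i in range(N))
fluxes = " ".join(f"{1.0:.4f}" for i in range(N))
packed = (
    "<spectrum pos='12:34:56 -01:02:03' resolution='0.01'>\n"
    f"  <wave>{waves}</wave>\n"
    f"  <flux>{fluxes}</flux>\n"
    "</spectrum>\n"
)

print(f"tagged: {len(tagged)} bytes, packed: {len(packed)} bytes, "
      f"ratio: {len(tagged) / len(packed):.1f}x")
```

The per-point markup multiplies the file size by a factor of several even in this tiny example; with 10 or 20 further metadata items repeated per bin, the gap only widens - which is Norman's "strings of numbers within XML" trade-off below.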

cheers
a

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AstroGrid Astronomer
MERLIN/VLBI National Facility, University of Manchester, 
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K. 
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).


On Tue, 10 Oct 2006, Norman Gray wrote:

>
> Greetings.
>
> Arguably, this whole discussion is moot.  If you're transporting enough data 
> that XML efficiency becomes an issue, then you probably shouldn't be using 
> XML -- that's not what it's for.  A Swiss army knife is a wonderful thing, 
> but shouldn't be used for brain surgery.
>
> As Doug said:
>
> On 2006 Oct 9 , at 16.09, Doug Tody wrote:
>
>> (None of this may matter in the end as most people will probably use
>> VOTable and FITS for spectra, but nonetheless array handling in XML
>> is an important issue to consider).
>
> While I take the second point, I would still maintain that using XML for this 
> sort of transport is probably an abuse of tools.
>
> There are ways of being efficient about XML, if that's what's really 
> required.  I have a paper sitting here by Peter Buneman and co at Edinburgh, 
> on 'Vectorizing and Querying Large XML Repositories', DOI 
> 10.1109/ICDE.2005.150 <http://dx.doi.org/10.1109/ICDE.2005.150>.  It 
> describes a scheme (and points to others) for effectively compressing away 
> the XML overhead, and transparently making it column-accessible, without 
> actually losing the useful structuring.  Bob Mann is one of the authors and 
> could probably say more about it.
>
> If bulk data and XML structuring are both seen as vital, then something like 
> this is, I would think, a more stable solution to the problem than the 
> parser-inside-parser solution of having strings of numbers within XML.
>
> All the best,
>
> Norman
>
>
> -- 
> ----------------------------------------------------------------------------
> Norman Gray  /  http://nxg.me.uk
> eurovotech.org  /  University of Leicester, UK
>
>
>
>
