SED Data Model: Questions and Comments

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Wed Feb 16 09:32:58 PST 2005


I think the big win for dimensional analysis is that when you write code to 
handle unit strings, you essentially parse a bunch of ugly stuff using lookup 
tables and turn it into a form you can work with - the dimensional analysis 
exponents and scale factors - then you use that to convert values to your 
preferred unit system.

So why have everyone write that parsing code? It seems simple enough when 
all you have is m <-> ft and maybe sec <-> days, but it quickly degenerates 
into something very fragile that you keep tacking extra bits onto, keep 
adding rules to your lookup tables, etc.

As for converting existing data, it isn't all that bad because only the data 
provider has to understand their own small set of unit strings they used. 
That could be done without a general purpose parsing toolkit/code...
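To make the argument concrete, here is a minimal sketch of that "parse once 
into exponents and scale factors, then convert" approach. This is not any VO 
or FITS standard - the table, function names, and the choice of only three 
base dimensions are illustrative assumptions:

```python
# Hypothetical sketch: every unit reduces to a scale factor times powers
# of base dimensions, and conversion is then a single division plus a
# compatibility check on the exponents.

# Dimension exponents ordered as (length, mass, time). A real toolkit
# would carry all seven SI base dimensions; three are enough to sketch.
BASE_UNITS = {
    "m":  (1.0,     (1, 0, 0)),
    "ft": (0.3048,  (1, 0, 0)),
    "kg": (1.0,     (0, 1, 0)),
    "s":  (1.0,     (0, 0, 1)),
    "d":  (86400.0, (0, 0, 1)),   # day
}

def reduce_unit(terms):
    """Reduce a list of (unit, power) terms to (scale, exponents)."""
    scale, dims = 1.0, [0, 0, 0]
    for name, power in terms:
        s, d = BASE_UNITS[name]
        scale *= s ** power
        for i in range(3):
            dims[i] += d[i] * power
    return scale, tuple(dims)

def convert(value, src_terms, dst_terms):
    """Convert a value between two units expressed as term lists."""
    s_scale, s_dims = reduce_unit(src_terms)
    d_scale, d_dims = reduce_unit(dst_terms)
    if s_dims != d_dims:
        raise ValueError(f"incompatible dimensions: {s_dims} vs {d_dims}")
    return value * s_scale / d_scale

# 100 ft/d expressed in m/s:
speed = convert(100.0, [("ft", 1), ("d", -1)], [("m", 1), ("s", -1)])
```

The fragile part that the thread complains about is the string-to-term-list 
parsing that would sit in front of `reduce_unit`; once data carry the 
exponents and scale factor directly, that whole layer disappears.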

So, I would go for this in principle, despite having not read the paper :-)

Pat


On 16.2.2005 05:54, David Berry wrote:
> Pedro,
>        I have a certain sympathy with your attitude towards handling
> legacy FITS files with all their diverse ad-hoc approaches to meta-data
> representation! The situation is somewhat similar to the handling of
> WCS.
> Many different schemes have been used in the past for representing WCS
> in
> FITS files. Now there is a published standard, we are faced with the
> problem of what to do with all the non-conforming FITS files. The
> problem
> is similar to that of the use of non-standard units strings. In the case
> of WCS it looks like the solution is either to change all VO data to use
> the standard, or to use FITS interprets that known how to interpret the
> common WCS variants (which is what happens at the moment with things
> like
> AST and WCSTOOLS). Of course you then have to define what you mean by
> "common"...
>
> In the case of units, I'm just not sure that adding a dimensional
> analysis to every data set is any less work than correcting the units
> string of every data set. The process would presumably be:
>
>    for every legacy VO data file
>       interpret the existing units string
>       create a corresponding dimensional analysis and add it to the file
>    next file
>
> as opposed to:
>
>    for every legacy VO data file
>       interpret the existing units string
>       create a corresponding standardised unit string and replace the
>          original units string in the file.
>    next file
>
> The first doesn't seem any easier than the second. Or am I missing
> something?
>
> David
>
> > [...]FITS WCS paper one suggests that unit strings should be
> > standardised[...]
> >
> > yes, and again the problem is that some data providers do already have
> > their units written in other formats. Some of them are inside very old
> > "standard" names inside FITS files that will never be changed, just to
> > give an example.
> >
> > [...]So, given that some standardisation effort
> > is necessary, and that data will presumably always include a human
> > readable units string, why not standardise that string rather than
> > introducing an additional dimensional analysis standard?[...]
> >
> > because the dimensional analysis standard consists of only one line,
> > whereas the units standard consists of many names. And I'm not asking
> > for removal of the string names, I'm asking for inclusion of
> > dimensional parameters.
> >
> > For your interest, I was asked by the FITS community to send
> > information about this dimensional analysis thing, and I attach the
> > answer back from Greisen himself (one of the writers of the FITS WCS
> > III paper). He understands that the idea is nice and gives his reasons
> > for not including it in paper III (as he understood that was the
> > proposal, which it certainly was not). Among them is the absence of a
> > rigorous formulation, and that's the reason why we are writing
> > something on it. Please see the attached mail.
> >
> > [...]But it also introduces extra redundant meta-data, increasing data
> > size and complexity[...]
> >
> > as I say, the dimensional parameters are just two, normally the same
> > for many of the providers' data. Not much overhead.
> >
> > [...]and requires more effort on the part of data providers (in
> > that they have to work out what the dimensional analysis and scale
> > factor are)[...]
> >
> > we can help people on this. On the other hand, data providers will
> > not have to change their units inside their files, but just give the
> > correct dimeq-scaleq in the metadata. This would allow their data
> > (though old as they might be) to be able to play in the VO without
> > having to modify them. Still, I believe it's worth the effort.
> >
> > Cheers,
> > P.
> >
> > On Wed, 2005-02-16 at 12:51, David Berry wrote:
> > > Pedro,
> > >
> > > > parsing of strings is the traditional way to handle units, and we
> > > > believe there are more than enough examples of cases where units
> > > > are named wrongly, despite any effort to homogenise unit names
> > > > (which vary, by the way, sometimes from FITS WCS paper I to A&A
> > > > recommended units conventions (a la Vizier, I believe), to CODATA
> > > > ones, etc.).
> > >
> > > Sure, people need to abide by some standard language if
> > > communication is to be possible. FITS WCS paper one suggests that
> > > unit strings should be standardised, and you suggest that the
> > > dimensional analysis description should be standardised. Either
> > > way, data providers have to check that their data conforms with
> > > *something*. So, given that some standardisation effort is
> > > necessary, and that data will presumably always include a human
> > > readable units string, why not standardise that string rather than
> > > introducing an additional dimensional analysis standard?
> > >
> > > > However, we insist that for superimposition of different
> > > > spectra in different units, the dimensional approach gives -even
> > > > algorithmically- a lot of benefits.
> > >
> > > But it also introduces extra redundant meta-data, increasing data
> > > size and complexity, gives rise to the possibility of inconsistency
> > > within the meta-data, and requires more effort on the part of data
> > > providers (in that they have to work out what the dimensional
> > > analysis and scale factor are).
> > >
> > > David
> >
> > --
> > Pedro Osuna Alcalaya
> >
> >
> > Software Engineer
> > Science Archive Team
> > European Space Astronomy Centre
> > (ESAC/ESA)
> > e-mail: Pedro.Osuna at esa.int
> > Tel + 34 91 8131314
> > ---------------------------------
> > European Space Astronomy Centre
> > European Space Agency
> > P.O. Box 50727
> > E-28080 Villafranca del Castillo
> > MADRID - SPAIN

-- 
Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)
