general comments to SSAP and SDM from outside

Wed Aug 22 07:12:34 PDT 2007

Dear colleagues,

I decided to join IVOA DAL after discussions with some of you 
during both ESAC workshops in March and June, and after reading the 
SSAP 1.01 and DM documents and the related mailing list archives. I 
am not sure about sending a copy of this to the DM list, but I hope 
it will be read by the DM people as well (I am not sure whether the 
spectrum DM belongs to the DM group or is rather the business of DAL; 
the division of competences feels somewhat fuzzy to me). So I am sending 
my general comments here to the DAL group, and further detailed comments 
on both documents will follow soon on the respective lists.

I try to understand all the reasons leading to the Spectrum data 
model and SSAP as proposed, but after discussions with many of you 
I feel the need to give my impression of the overall effort connected 
with SDM and SSAP from outside - as a representative of a VO-aware 
scientific community that has experience with stellar optical 
spectroscopy and knows well the data reduction procedures of most of 
the world's optical spectrographs (both echelle and single order).

I have a feeling that the whole spectral part of the VO is being developed 
in a "space-instruments-centric" manner. Of course, it is clear that the 
development is driven by the requirements of large (mostly space-based) 
projects. Unfortunately, the developers are surrounded by scientists 
working only with a particular type of data, and from this comes a false 
feeling that most spectra should have properties similar to those 
produced by satellites, and that whatever is different is "obsolete" or 
"incomplete" and thus not worth proper handling in the VO (e.g. 1D FITS 
in image format).

Maybe I am wrong, but my current understanding of SSA and SDM is that the 
only good spectrum formats are VOTable and FITS with binary tables. A 
spectrum without absolute flux calibration in physical units (like W cm-2 
s-1 A-1) is considered not "fully science-ready" because of the missing 
Flux-axis information etc. When I asked some of you how to display 
continuum-normalized spectra, I was told exactly this - as the protocol 
before 1.0 required SCALEQ and DIMEQ to be represented in units, I was 
told the units for flux should be "n/a".

I think that normalization does not deserve such neglect. Let me 
comment on this a little: I think it is again a space-centric view. 
Flux calibration of ground-based spectra is quite rare - it is done 
only for certain studies (e.g. spectral classification), mostly on 
low-resolution spectra. Moreover, it is extremely difficult to do it 
precisely (changing extinction, seeing ...).

The major part of ground-based optical spectra from medium and high 
resolution spectrographs is uncalibrated (in some counts or data 
numbers, after extraction by the pipeline), and most people do their 
science on them after continuum normalization - look at the examples 
of many graphs in ApJ or A&A!

Moreover, all the modern high resolution spectrographs are echelle - 
here the separate orders are published normalized. If attempts are 
made to merge the orders into one long spectrum (with very uncertain 
results due to the complex behaviour of the blaze function), the final 
spectrum is in artificial units like counts or is normalized again. 
I have not yet seen a flux-calibrated spectrum from a GROUND-BASED 
echelle.

All abundance studies, asteroseismology, disentangling of multiple 
stars, and studies of the time evolution of envelopes, circumstellar 
shells or stellar winds need normalized spectra. RV measurement - either 
direct or by cross-correlation - needs a well normalized spectrum as 
well (at least for processing by classical programs like fxcor). 
Even properly calculated synthetic spectra (in absolute flux) are 
usually normalized to the continuum to allow comparison with 
observations (and sometimes the continuum is corrected using the 
synthetic model).

I think the ROUGH spectral classification of unknown objects (especially 
in extragalactic research) is the only application for absolute flux 
calibrated spectra. Low resolution spectra are used for this. 
But even topics like detailed spectral classification or analysis 
of chemical composition (if done properly) compare continuum-normalized 
synthetic and observed spectra (e.g. by Chi^2 fits). 
The problem of how to fit the physical continuum is another story (it 
depends on experience and on the physical nature of the object), but in 
general it has to be done somehow to allow most of the analyses named above.
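
To make the term concrete, here is a minimal sketch of continuum 
normalization: a straight line is fitted by least squares through the 
pixels marked as continuum, and the counts are divided by that fit. The 
function names, the mask and the sample numbers are hypothetical 
illustrations, and a straight line is only the simplest possible 
continuum model, not a recommendation.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize(waves, counts, is_continuum):
    """Divide counts by a linear continuum fitted to the flagged pixels."""
    xs = [w for w, keep in zip(waves, is_continuum) if keep]
    ys = [c for c, keep in zip(counts, is_continuum) if keep]
    slope, intercept = fit_line(xs, ys)
    return [c / (slope * w + intercept) for w, c in zip(waves, counts)]


# Hypothetical data: a sloping continuum in counts with one absorption pixel.
waves = [6560.0, 6561.0, 6562.0, 6563.0, 6564.0, 6565.0, 6566.0]
counts = [1000.0, 1010.0, 1020.0, 515.0, 1040.0, 1050.0, 1060.0]
is_continuum = [True, True, True, False, True, True, True]

normalized = normalize(waves, counts, is_continuum)
print(normalized)
```

The result is dimensionless: continuum pixels come out close to 1 and 
the line pixel gives the relative depth, whatever the original count 
units were.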

Space-based research works mostly with SEDs. OK, the collection 
of various spectral regions is a nice demonstration of VO power (and 
the one most often mentioned in VO propaganda ;-), so a non-VO-aware 
astronomer can get the feeling that the whole VO effort is made just 
to collect very rough multispectral data and that its main result 
is the SED!

I think continuum-normalized spectra should be regarded as 
fully science-ready both in observation and in theory, and proper support 
should be given to them in VO tools and protocols. (I will comment 
more on this in my next mails.) Concerning DAL business - one of the SSA 
results can be (in addition to VOTable) FITS. The tricky part is 
which FITS (binary table FITS is implied, but this is not immediately 
apparent in either the DM or the SSAP proposal).

From further reading of the previous documents and the general history of 
SSAP I got the feeling that all variants except binary tables are 
considered not worth supporting and are left to the benevolence of the 
client to support and display - or rather, that the intention is to pass 
them through as the NATIVE format to some external legacy application. I 
understand this - every project can have its own special data format, and 
for the most complicated instruments bintable FITS was used.

But again, most optical ground-based spectra are not in bintables but in 1D 
images! The reason is as follows: when someone is asked to reduce spectra 
(and has no pipeline derived from some space project), he takes either 
MIDAS or IRAF tasks to produce extracted spectra (in onedspec or echelle 
format). As IRAF still dominates the world, the results of a MIDAS 
reduction have to be finally converted to FITS readable by IRAF splot or 
spectool, the most powerful spectra analysis tools. On the other hand, 
MIDAS can read IRAF-produced FITS if it is rebinned and the non-linear WCS 
(using WAT keywords, unique to IRAF) is replaced by simple CDELT1, 
CRVAL1.
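
As an illustration of what such a rebinning step involves, here is a 
minimal sketch that resamples a spectrum with unevenly spaced wavelengths 
onto an evenly spaced grid by linear interpolation and returns the 
CRVAL1 and CDELT1 that describe the new grid. The function name and the 
sample numbers are hypothetical; this is not how IRAF or MIDAS implement 
it, and plain interpolation does not conserve flux.

```python
from bisect import bisect_right


def rebin_linear(waves, flux, npix=None):
    """Resample (waves, flux) onto an evenly spaced wavelength grid.

    waves must be strictly increasing. Returns (crval1, cdelt1, new_flux),
    where pixel i (counted from 0) has wavelength crval1 + cdelt1 * i.
    """
    n = len(waves) if npix is None else npix
    crval1 = waves[0]
    cdelt1 = (waves[-1] - waves[0]) / (n - 1)
    new_flux = []
    for i in range(n):
        w = crval1 + cdelt1 * i
        # Index of the input interval [waves[j], waves[j + 1]] holding w,
        # clamped so the last grid point uses the last interval.
        j = min(max(bisect_right(waves, w) - 1, 0), len(waves) - 2)
        t = (w - waves[j]) / (waves[j + 1] - waves[j])
        new_flux.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return crval1, cdelt1, new_flux


# Hypothetical input with a non-linear dispersion.
waves = [5000.0, 5001.0, 5003.0, 5006.0]
flux = [1.0, 2.0, 4.0, 7.0]

crval1, cdelt1, rebinned = rebin_linear(waves, flux)
print(crval1, cdelt1, rebinned)
```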

So from both sides we end up with the 1D FITS image format (perhaps with 
several more spectra for variance etc.). Echelle spectra have to be 
expanded into separate files for each echelle order (again with 
CDELT1, CRVAL1), otherwise they are not interchangeable with other 
tools. Spectra are very seldom analyzed in the form of a 
wavelength-flux ASCII table, and certainly not in VOTable format. All 
the medium and high resolution spectroscopy I have seen (and 
discussed at various stellar conferences) is usually conducted in 
1D FITS where only CDELT1, CRVAL1 and NAXIS=1 are present -- it is 
the common denominator of all formats used for spectra exchange. 
Many one-man customized tools for spectrum analysis even look for 
only these keywords directly in the header (a lot of experienced 
scientists I know are not even aware of the FITSIO library).
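
The whole wavelength axis of such a file follows from these few 
keywords. A minimal sketch, with the header represented as a plain 
dictionary and hypothetical sample values; treating the reference pixel 
as 1 when CRPIX1 is absent is an assumption of this sketch, chosen 
because that is what many simple tools do:

```python
def wavelength_axis(header):
    """Linear wavelength grid from a minimal 1D FITS image header."""
    npix = int(header["NAXIS1"])
    crval1 = float(header["CRVAL1"])  # wavelength at the reference pixel
    cdelt1 = float(header["CDELT1"])  # wavelength step per pixel
    # Assumption of this sketch: reference pixel 1 if CRPIX1 is missing.
    crpix1 = float(header.get("CRPIX1", 1.0))
    # FITS pixels are counted from 1.
    return [crval1 + cdelt1 * (pixel - crpix1) for pixel in range(1, npix + 1)]


# Hypothetical header of a short spectrum.
header = {"NAXIS": 1, "NAXIS1": 5, "CRVAL1": 6500.0, "CDELT1": 0.5}

waves = wavelength_axis(header)
print(waves)
```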

So I think that the requirement for binary table FITS should be 
complemented by the 1D FITS image in the simple form using CRVAL1, CDELT1. 
I know a lot of professional astronomers who understand the FITS format 
only in this form (and have never heard about binary tables), and I think 
most astronomers can work with such a format and know how to display such 
spectra or have converters to their favourite tools. (BTW even amateur 
spectroscopic SW and commercial SW for laboratory spectroscopy provide 1D 
FITS with only a few keywords and CRVAL1, CDELT1 ... after extraction, too.)

The objections against rebinning (losing accuracy) are clearly 
understandable, but in practice the loss is quietly ignored (and has 
proved not to be critical). Many people would immediately benefit from 
VO tools supporting (esp. displaying) and servers delivering spectra in 
such a format. After all, the metadata should clearly describe what the 
axes mean, in what units, and what the precision and treatment of the 
data are (e.g. by the pipeline).

The problem with serialization is here as well (I need to know the 
size of each axis ...), but to be honest - how long is the longest 
spectrum or single echelle order in pixels? Only several thousand! (Today 
the largest CCD chip is 4600 pixels long; mosaicing chips to cover one 
whole spectral (echelle) order is nonsense - such a spectrum should be 
divided into two parts as a complex object.) So keeping several 
thousand numbers in memory should not be a serious serialization 
problem.
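
The arithmetic can be checked quickly. A minimal sketch, assuming 
64-bit floats and one flux array plus one variance array of 4600 pixels 
each (the choice of two arrays is only an illustration):

```python
from array import array

npix = 4600  # pixel count quoted above

flux = array("d", [0.0] * npix)      # "d" = C double, normally 8 bytes
variance = array("d", [0.0] * npix)

nbytes = len(flux) * flux.itemsize + len(variance) * variance.itemsize
print(nbytes, "bytes, about", round(nbytes / 1024), "KiB")
```

With 8-byte doubles this comes to 73600 bytes, roughly 72 KiB for one 
spectrum with its variance.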

The effort to make the standards cover everything 
conceivable is appreciated, and the purity of the model is understandable 
- it is clear that a UCD should be devised for every physical 
variable ... But the question is what is important for scientists 
today and who are the scientists the VO is being built for. The VO 
should ASAP offer something not only to SED investigators and galaxy 
classifiers; it should attract all the "classical" stellar 
astronomers as well, not only as consumers but also as providers.

There is a wealth of spectra being produced by spectrographs on 
smaller telescopes all over the world and reduced mostly in IRAF (or 
MIDAS). Despite the problem of data privacy (another story ...), there 
is a will to publish even small collections of spectra in the VO - but 
there is a lack of tools and tutorials on how to do this easily.

BTW - if you look at the list of 21 SSA services in the VO Registry, most 
of them serve space data. Only 3 ground-based archives are there - some 
services just re-serve data from another server (Elodie data in BeSS) 
or hold just project data from several days of observation elsewhere. 
Most of the ground-based spectra are served by a single server 
tool (Pleinpot by P. Prugniel).

I am one of the scientists with SW development abilities who has some 
knowledge of the VO (and tries to understand it more deeply), but 
the current VO spectra publishing tools (as presented at the June 
workshop) do not help much in easily publishing a small collection of 
IRAF-reduced 1D spectra. BTW, I was not able to find any tool allowing 
easy conversion of 1D FITS to binary tables (e.g. the TABLES 
package tasks did not accept the IRAF *ms* spectra 
produced by the apall task).

In the examples in the appendices of both documents there is a lot of 
stuff, but no simple recommendations on how to use the documents for 
building a simple SSA 1.01-compliant service for a bunch of normalized 
1D FITS files, which might be the main goal of many people from the 
general astronomical community.

I am pretty sure that a tutorial showing the minimal required parameters 
and example values for such simple spectra is needed to 
attract more spectra publishers from smaller observatories, and 
amateurs as well.

I do not mean this as a criticism of the VO effort and I do not want my 
objections to delay the approval of the documents - I just want to 
emphasize the practical needs of the ground-based astronomical community. 
I presented more inputs for further VO development in my March 
presentation (and if somebody is interested I would be glad to 
discuss the potential benefits of the VO approach for everyday 
astronomical work).

Best regards,

Petr Skoda

*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 * 
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute AS CR       Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                             pskoda at mbox.cesnet.cz     *
*************************************************************************