roadmap 2010-2011
Petr Skoda
skoda at sunstel.asu.cas.cz
Wed Sep 15 05:49:46 PDT 2010
Dear all,
I am glad that the question of the next DAL development has been opened.
I am strongly biased towards optical stellar spectroscopy, where I see a
lack of interest from the VO community and, unfortunately, also a lack of
interest in the VO among spectroscopists. I would expect similar problems
in other fields as well; I just do not know enough about the style of work
of other scientists.
I see all the time that astronomers (even those at ESO ;-) ignore the VO
because it does not give them the tools they need.
Yesterday I gave a lecture about the VO at the Crimean Astrophysical
Observatory, the source of some of the most advanced ideas about the
structure of stars, stellar evolution, complicated numerical 3D
simulations of mass exchange in binaries etc. They were excited by a
simple demonstration of vodesktop and SPLAT, but no one had seen anything
like SPLAT before, and they had only the vague notion that the VO is
something about complicated access to catalogues ;-)
And what is worse, some people working in stellar research have tried the
VO, hoping to make their work more efficient, but after a short time they
had to conclude that they see no difference between downloading the
spectra directly from an archive and obtaining them through a VO search in
SPLAT. "No added value is provided!" they said.
In fact I have to repeat all the time that the tools we have in the VO for
spectroscopy are not in the VO spirit. The only practical work you can do
with the current tools is rudimentary plotting of as many spectra as fit
into the limited memory of the tool. In fact the only work I have seen
successfully done with VO tools (by SVO people) is the combination of
several spectra in different regions and their comparison with theoretical
spectra - ONE by ONE.
The problem is clearly seen in the case of the ESO spectra most wanted
among stellar astronomers (and obtained not through the VO tools): the
UVES echelle spectra. If you want to download 100 spectra from UVES, it
takes minutes even within the local ESO network. From the rest of the
world it is practically impossible to work with series of spectra from
UVES, ELODIE, HARPS etc ... And the client will often crash or slow down
because of the large memory requirements ...
So this approach is not in the VO spirit as it has often been presented
(avalanche of data, seamless way of working with large sets ...).
So the solution is processing on the server (cutout of spectral lines,
rebinning, normalization ...). We have the SSA standard (probably the only
one consistent with the data model), but in practice the only really large
spectral set (SDSS) is not accessible through the VO - it stands apart.
For most astronomers I have seen working with SDSS spectra, all the effort
in VO development is obscured by SDSS's own CASJOB interface (and very few
people even know about the spectraservices web).
The rest of the SSA services are quite confusing, as most queries (e.g.
from vodesktop) usually return errors or nothing, because the services do
not understand the more advanced parameters (like BAND, TIME ...). Some
pretend to have more spectra, but these are just different representations
of the same data, and it is difficult to distinguish a particular type of
data (e.g. at ESO, HARPS vs UVES). And some SSA services return time
series instead (e.g. COROT), but there is no way for clients to recognize
this and behave accordingly (e.g. to do period analysis).
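
To make the point about the advanced parameters concrete, here is a
minimal sketch of how such a query could be built. The parameter names
(REQUEST, POS, SIZE, BAND, TIME, FORMAT) are taken from the SSAP standard;
the service URL, the coordinates and the helper name build_ssa_query are
hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
SERVICE_URL = "http://example.org/ssa"


def build_ssa_query(ra_deg, dec_deg, size_deg, band_m, time_range):
    """Return an SSA queryData URL restricted in wavelength and time.

    band_m is a (min, max) wavelength pair in metres; time_range is a
    (start, end) pair of ISO 8601 strings. Ranges are written "min/max".
    """
    params = {
        "REQUEST": "queryData",
        "POS": f"{ra_deg},{dec_deg}",
        "SIZE": str(size_deg),
        "BAND": f"{band_m[0]}/{band_m[1]}",
        "TIME": f"{time_range[0]}/{time_range[1]}",
        "FORMAT": "all",
    }
    # Keep "/" and "," readable instead of percent-encoding them.
    return SERVICE_URL + "?" + urlencode(params, safe="/,")


url = build_ssa_query(84.41, 21.14, 0.1, (6.5e-7, 6.6e-7),
                      ("2005-01-01", "2010-01-01"))
print(url)
```

A service that ignores or rejects BAND and TIME would answer such a query
with an error or with nothing, which is the behaviour described above.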
The problem in the current SSA is that there is no way to describe the
type of processing used in generating the "virtual data". E.g. for my
spectra cutout I have to use 2 services - what if I add e.g. rebinning,
convolution to a given resolution etc ... how many services would we need
then?
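
A rough way to see how quickly this grows, as a sketch only: assume (my
assumption for the illustration, not a description of any existing
service) that every combination of processing operations needed its own
service, with the order of operations ignored.

```python
from itertools import combinations

# Operations named in the text; the one-service-per-combination model
# is an assumption made only for this illustration.
OPERATIONS = ["cutout", "rebinning", "convolution", "wavelength shift"]


def count_services(operations):
    """Count non-empty combinations of operations, ignoring their order."""
    return sum(1 for k in range(1, len(operations) + 1)
               for _ in combinations(operations, k))


for n in range(1, len(OPERATIONS) + 1):
    print(n, "operations ->", count_services(OPERATIONS[:n]), "services")
```

Under that assumption the count is 2^n - 1 for n operations, so it roughly
doubles with each operation added; describing the processing in the query
itself avoids this.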
So this is why I am so sceptical about the plans for the advanced GDS DAL
interface: we do not yet have a practical testbed covering the whole SSA
documentation (something understanding everything written in the SSAP
standard). We need obligatory keywords for the description of
post-processing operations even for simple spectra (cutout, rebinning,
convolution, wavelength shift ...) and especially for theoretical spectra
(convolution with a given rotational velocity ...).
And in addition to that, there is no practical description of how to
implement an SSA service if you have a bunch of FITS spectra (perhaps
SAADA has something, but it is just partial - not according to the full
SSA specification).
I already pointed this out in about November ..
I think that Doug has precisely expressed the whole spirit of the VO
ideas. We should still think about the VO as a tool for astronomers, who
expect to do their work more efficiently with the VO while having similar
capabilities. But the current development of the VO is all about
background infrastructure - who will write the tools that can use it?
> In general virtual data generation can involve some combination of
> subsetting, filtering, or transformation.
> This is vital to what we are trying to do with VO, to be able to scale
> up to the very large datasets which are coming.
EXACTLY !!!!
> spectra, e.g. cutting out a small 2D image region (or hundreds of
> them), reprojecting 2D image data, or cutting out a region around
> a single spectral line in a high resolution spectrum.
Not even high resolution - even low resolution spectrographs (e.g. LAMOST)
now have 4000+ pixels, and for the analysis of time evolution (for which a
whole series of spectra is needed) you have to zoom in on a particular
range only. In practice downloading, say, 500 spectra takes time (minutes
...) and zooming takes time (e.g. large memory - swapping, plotting,
interpolating pixels which are not used after zooming) etc.
Instead, cutting out short wavelength regions on the server and then
downloading and displaying only these is much faster, even if I need to
download another set for a different spectral range ...
I have practical experience with this, using my cutout SSA server and
SPLAT-VO on a 3GB notebook all over the world (with different speeds and
network latency).
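
A minimal sketch of the saving, in plain Python. The function name cutout,
the wavelength range and the flat test spectrum are assumptions chosen
only for the illustration; this is not the code of the cutout server
mentioned above.

```python
def cutout(wavelengths, fluxes, centre, half_width):
    """Keep only the pixels within centre +/- half_width (same units)."""
    lo, hi = centre - half_width, centre + half_width
    kept = [(w, f) for w, f in zip(wavelengths, fluxes) if lo <= w <= hi]
    return [w for w, _ in kept], [f for _, f in kept]


# Assumed test spectrum: 4000 pixels spread evenly over 3700-9000 Angstrom
# with a flat continuum. The numbers are illustrative, not instrument data.
N_PIXELS = 4000
wavelengths = [3700.0 + i * (9000.0 - 3700.0) / (N_PIXELS - 1)
               for i in range(N_PIXELS)]
fluxes = [1.0] * N_PIXELS

# Cut out +/- 20 Angstrom around H-alpha (6562.8 Angstrom).
cut_w, cut_f = cutout(wavelengths, fluxes, 6562.8, 20.0)

# Compare the number of pixels transferred for a series of 500 spectra.
n_spectra = 500
full_pixels = n_spectra * N_PIXELS
cut_pixels = n_spectra * len(cut_w)
print(f"pixels to transfer: {full_pixels} (whole) vs {cut_pixels} (cutout)")
```

With these assumed numbers the cutout carries about a hundred times fewer
pixels than the whole spectra, so even fetching a second set for another
line remains cheap.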
> We need both simple whole-file and virtual data access capabilities.
> Virtual data access capabilities are essential to enabling distributed
> data analysis (analysis performed directly on remote data without
> first downloading the data), and to scaling up.
YES YES YES !!!!!!!
> Discovering and downloading whole archive files for local processing
> is of course a major use case - this is probably still the dominant
> form of data access.
I am afraid that downloading can be done easily with archive tools (e.g.
tar.gz creation on FTP servers, retransmission in case of failure by
rsync or wget ...).
The really big spectra may already be a problem. Concerning discovery -
when I need a series, it is usually from the same instrument, so I know
where it is. And many people doing "random" discovery (e.g. who has
observed my object and when) are interested only in a simple feature
(like: are the HeI lines seen in emission? or did they also observe a
good quality profile of the H-alpha line?), so they would not like to
download gigabytes of spectra, open them all and zoom in on the given
range. For the first question they need post-processing (cutout of the
line); for the second they need ObsTAP giving the range and SNR ...
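
The second kind of question could look roughly like the sketch below,
which only builds the ADQL text. The table name ivoa.obscore and the
column names are assumed from the ObsCore data model, the helper name and
coordinates are hypothetical, and no SNR column is used because I do not
want to assume one exists.

```python
def obstap_line_query(ra_deg, dec_deg, radius_deg, line_m):
    """Return ADQL selecting spectra that cover wavelength line_m (metres)."""
    return (
        "SELECT obs_publisher_did, t_min, t_max, em_min, em_max, access_url "
        "FROM ivoa.obscore "
        "WHERE dataproduct_type = 'spectrum' "
        f"AND em_min <= {line_m} AND em_max >= {line_m} "
        "AND CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


# H-alpha at 6562.8 Angstrom, expressed in metres.
H_ALPHA_M = 6.5628e-7
print(obstap_line_query(84.41, 21.14, 0.01, H_ALPHA_M))
```

The answer tells who observed the object, when, and whether the spectral
range covers the line, without any spectrum being downloaded.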
> But the typed interfaces like SIA/SSA already
> support this simple mode of access; this is the "simple" mode these
> interfaces support.
Simple currently means (in practice) whole data.
> Soon we will have ObsTAP with both ADQL and
> PQL query interfaces, which will provide a simpler alternative for
> whole file discovery and access, adding the capability to access and
> associate any type of data, at the cost of some lost object-specific
> metadata.
That's the most wanted feature - to know IS IT SOMEWHERE?, WHERE IS IT?,
HOW and WHEN WAS IT OBSERVED?
and only then comes WHAT DOES IT LOOK LIKE?
> to add knowledge of and advanced access capabilities for a specific
> type of astronomical data.
YES - astronomers want to work with the data, not just look at them.
>
> The typed interfaces extend the generic query interface in important
> ways for each type of data, and add virtual data generation
> capabilities (the queryData response can describe virtual data).
That is nice - a self-describing response - but how will the client work
with it? (and who will write such a client?)
> While the query may look
> similar in each interface these added semantics are extremely important
> as they represent the difference between (for example) a catalog or
> an image or a spectrum or a theoretical model.
Yes - you cannot measure RV on an image or compare a synthetic spectrum
with a one-line cut from a 2D image of a galaxy.
> Hence one might use ObsTAP with PQL to discover all the
> data for a region on the sky, and then use SIA or SSA etc. for data
> access more advanced than merely downloading entire archive files.
BUT what if the data discovered by ObsTAP are of TB volumes?
We are approaching the 4th paradigm in astroinformatics ;-) and people
will want to dig inside PB volumes to find something new about the
Universe.
> If whole file access is sufficient then ObsTAP alone might be enough.
I can imagine some will be happy with just a small amount of full files,
but the real power of the VO can be achieved only with value-added
services.
> More complex data analysis use cases require the capabilities of the
> typed DAL interfaces with their customized parameter interfaces.
YES
Thanks, Doug, for the concise and clear summary.
Petr Skoda
*************************************************************************
* Petr Skoda Phone : +420-323-649201, ext. 361 *
* Stellar Department +420-323-620361 *
* Astronomical Institute AS CR Fax : +420-323-620250 *
* 251 65 Ondrejov e-mail: skoda at sunstel.asu.cas.cz *
* Czech Republic *
*************************************************************************