roadmap 2010-2011

Petr Skoda skoda at sunstel.asu.cas.cz
Wed Sep 15 05:49:46 PDT 2010


Dear all,

I am glad that the question of the next DAL development steps has been 
raised. I am strongly biased towards optical stellar spectroscopy, where I 
see a lack of interest from the VO community and, unfortunately, also a 
lack of interest in the VO from spectroscopists. I would expect similar 
problems in other fields as well; I just do not know enough about how 
other scientists work.

I see this all the time: people (even ESO astronomers ;-) ignore the VO 
because it does not give them the tools they need.

Yesterday I gave a lecture about the VO at the Crimean Astrophysical 
Observatory, where very advanced ideas about stellar structure, stellar 
evolution, complicated numerical 3D simulations of mass exchange in 
binaries etc. come from. They were excited by a simple demonstration of 
VODesktop and SPLAT, but no one had seen anything like SPLAT before, and 
their vague notion was that the VO is something about complicated access 
to catalogues ;-)

And what is worse, some people working in stellar research tried the VO, 
hoping to make their work more efficient, but after a short time they 
concluded that they see no difference between downloading the spectra 
directly from the archive and obtaining them through a VO search in SPLAT.
"No added value is provided!", they said.


In fact, I have to repeat all the time that the tools we have in the VO 
for spectroscopy are not in the VO spirit. The only practical work you can 
do with current tools is rudimentary plotting of a few spectra, as many as 
the limited memory of the tool can handle. The only work I have seen 
successfully done with VO tools (by the SVO people) is combining several 
spectra covering different regions and comparing them with theoretical 
spectra, ONE by ONE.

The problem is clearly seen with the spectra from ESO that are most wanted 
among stellar astronomers (and not accessed through VO tools): the UVES 
echelle spectra. If you want to download 100 UVES spectra, it takes 
minutes even on the local ESO network. From elsewhere in the world it is 
practically impossible to work with series of spectra from UVES, ELODIE, 
HARPS etc. And the client will often crash or slow down because of its 
large memory requirements...

So this approach is not in the VO spirit as it has often been presented 
(avalanche of data, seamless way of working with large data sets...).

So the solution is on-the-server processing (cutout of spectral lines, 
rebinning, normalization...), and we have the SSA standard (probably the 
only one consistent with a data model). But in practice the only really 
large spectral data set (SDSS) is not accessible through the VO; it stands 
apart. For most astronomers I have seen working with SDSS spectra, all the 
VO development effort is overshadowed by SDSS's own CasJobs interface 
(very few people even know about the spectraservices web).
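To make "on-the-server processing" concrete, here is a minimal sketch of 
two of the operations named above, a wavelength cutout followed by 
continuum normalization. It assumes a 1D spectrum given as plain lists of 
wavelengths and fluxes, and a straight-line continuum fitted to 
user-chosen windows; the function names and the synthetic data are 
hypothetical, not taken from any existing SSA service.

```python
from typing import List, Sequence, Tuple


def cutout(wave: Sequence[float], flux: Sequence[float],
           lo: float, hi: float) -> Tuple[List[float], List[float]]:
    """Keep only the pixels whose wavelength lies in [lo, hi]."""
    kept = [(w, f) for w, f in zip(wave, flux) if lo <= w <= hi]
    return [w for w, _ in kept], [f for _, f in kept]


def normalize(wave: Sequence[float], flux: Sequence[float],
              windows: Sequence[Tuple[float, float]]) -> List[float]:
    """Divide by a straight-line continuum fitted by least squares
    to the pixels that fall inside the given continuum windows."""
    def in_windows(w: float) -> bool:
        return any(a <= w <= b for a, b in windows)

    xs = [w for w in wave if in_windows(w)]
    ys = [f for w, f in zip(wave, flux) if in_windows(w)]
    if len(xs) < 2:
        raise ValueError("need at least two continuum pixels")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return [f / (my + slope * (w - mx)) for w, f in zip(wave, flux)]


# Synthetic example: sloped continuum with an absorption dip near H-alpha.
wave = [6400.0 + 0.5 * i for i in range(400)]
flux = [(2.0 + 0.001 * (w - 6500.0)) * (0.5 if abs(w - 6562.8) < 1.0 else 1.0)
        for w in wave]
cut_w, cut_f = cutout(wave, flux, 6540.0, 6590.0)
norm = normalize(cut_w, cut_f, [(6540.0, 6550.0), (6575.0, 6590.0)])
print(len(wave), "->", len(cut_w), "pixels")
```

Only the 101 pixels around the line would leave the server, already 
normalized, instead of the full 400-pixel spectrum.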

The rest of the SSA services are quite confusing, as most queries (e.g. 
from VODesktop) return errors or nothing, because the services do not 
understand the more advanced parameters (like BAND or TIME)... Some 
pretend to have more spectra, but these are just different representations 
of the same data, and it is difficult to distinguish particular types of 
data (e.g. HARPS vs UVES at ESO). And some SSA services return time series 
(e.g. CoRoT) instead, but there is no way for clients to recognize this 
and behave accordingly (e.g. to run a period analysis).
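For reference, a sketch of the kind of queryData request a client sends 
when it uses BAND and TIME. It assumes the SSAP conventions of a spectral 
range in metres and an ISO 8601 time range, both written as "lo/hi"; the 
service URL, position and ranges below are hypothetical examples.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def ssa_query_url(base_url: str, ra_deg: float, dec_deg: float,
                  size_deg: float, band: Optional[str] = None,
                  time: Optional[str] = None) -> str:
    """Build an SSA queryData URL. BAND is a wavelength range in metres
    ("lo/hi"), TIME an ISO 8601 range; both are optional."""
    params = {"REQUEST": "queryData",
              "POS": f"{ra_deg},{dec_deg}",
              "SIZE": str(size_deg)}
    if band is not None:
        params["BAND"] = band
    if time is not None:
        params["TIME"] = time
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)


# Hypothetical service URL; the position and ranges are only examples.
url = ssa_query_url("http://example.org/ssa", 279.2347, 38.7837, 0.01,
                    band="6.5e-7/6.6e-7", time="2009-01-01/2010-01-01")
print(url)
print(parse_qs(urlsplit(url).query)["BAND"])
```

A service that does not understand BAND or TIME should still parse such a 
request; the complaint above is that many return an error or nothing.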

The problem in current SSA: there is no way to describe the type of 
processing used to generate the "virtual data". E.g. for my spectral 
cutouts I have to use 2 services. What if I add e.g. rebinning, convolution 
to a given resolution etc.? How many services would we need then?
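To put a number on "how many services": if every combination of 
processing steps needed its own dedicated service, the count grows 
quickly. This sketch assumes the four example operations listed in this 
mail (cutout, rebinning, convolution, wavelength shift) and simply counts 
the possible chains.

```python
from itertools import combinations, permutations

# Example post-processing steps mentioned in the text.
OPERATIONS = ["cutout", "rebin", "convolve", "shift"]


def unordered_chains(ops):
    """Every non-empty subset of operations (order ignored)."""
    return [c for k in range(1, len(ops) + 1) for c in combinations(ops, k)]


def ordered_chains(ops):
    """Every non-empty sequence without repeats (order matters, e.g.
    rebin-then-convolve differs from convolve-then-rebin)."""
    return [p for k in range(1, len(ops) + 1) for p in permutations(ops, k)]


print(len(unordered_chains(OPERATIONS)))
print(len(ordered_chains(OPERATIONS)))
```

This gives 15 subsets, or 64 ordered chains, for just four operations, 
whereas a single service that accepts the processing steps as parameters 
stays at one.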

So this is why I am so sceptical about the plans for the advanced GDS DAL 
interface: we do not yet have a practical testbed that implements the 
whole SSA specification (something that understands everything written in 
the SSAP standard).

We need obligatory keywords describing post-processing operations, even 
for simple spectra (cutout, rebinning, convolution, wavelength shift...), 
and especially for theoretical spectra (convolution with a given 
rotational velocity...).
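As an illustration of why such operations need to be described, even the 
simplest one, a wavelength shift, depends on a parameter (the velocity) 
and on an approximation. This sketch assumes the non-relativistic Doppler 
formula lambda_obs = lambda_rest * (1 + v/c); the function names are 
hypothetical.

```python
C_KMS = 299792.458  # speed of light in km/s


def shift_wavelengths(wave, v_kms):
    """Apply a non-relativistic Doppler shift: lambda_obs = lambda_rest * (1 + v/c)."""
    factor = 1.0 + v_kms / C_KMS
    return [w * factor for w in wave]


def radial_velocity(lambda_rest, lambda_obs):
    """Invert the same approximation: v = c * (lambda_obs - lambda_rest) / lambda_rest."""
    return C_KMS * (lambda_obs - lambda_rest) / lambda_rest


observed = shift_wavelengths([6562.8], 30.0)[0]
print(round(observed, 4))
print(round(radial_velocity(6562.8, observed), 6))
```

A client receiving shifted "virtual data" would need to know both the 
velocity used and whether this or the relativistic formula was applied.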

And in addition, there is no practical description of how to implement an 
SSA service if you have a bunch of FITS spectra (perhaps SAADA has 
something, but it is only partial, not following the full SSA 
specification).

I already pointed this out around November...


I think that Doug has precisely expressed the whole spirit of the VO ideas.
We should still think of the VO as a tool for astronomers who expect to do 
their work more efficiently with it, with similar capabilities. But the 
current development of the VO is all about background infrastructure, and 
who will write the tools that can use it?


> In general virtual data generation can involve some combination of
> subsetting, filtering, or transformation.

> This is vital to what we are trying to do with VO, to be able to scale
> up to the very large datasets which are coming.

EXACTLY !!!!


> spectra, e.g.  cutting out a small 2D image region (or hundreds of
> them), reprojecting 2D image data, or cutting out a region around
> a single spectral line in a high resolution spectrum.

Not even high resolution: even low-resolution spectrographs (e.g. LAMOST) 
now have 4000+ pixels, and for analysing time evolution (for which a whole 
series of spectra is needed) you have to zoom in on a particular range 
only. In practice, downloading, say, 500 spectra takes time (minutes...), 
and zooming takes time (large memory use causing swapping, plotting and 
interpolating pixels that are not used after zooming) etc. Instead, 
cutting short wavelength regions on the server, then downloading and 
displaying them, is much faster, even if I need to download another set 
for a different spectral range...
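A rough estimate of the data volume involved. It assumes 500 spectra of 
4000 pixels each (the figures above), wavelength and flux stored as 8-byte 
floating-point values, and a hypothetical cutout window of 200 pixels; real 
files carry extra headers and columns, so these are lower bounds.

```python
BYTES_PER_VALUE = 8  # assumption: double-precision values


def volume_bytes(n_spectra: int, n_pixels: int, arrays_per_pixel: int = 2) -> int:
    """Raw bytes for n_spectra spectra with wavelength and flux per pixel."""
    return n_spectra * n_pixels * arrays_per_pixel * BYTES_PER_VALUE


full = volume_bytes(500, 4000)  # whole spectra
cut = volume_bytes(500, 200)    # server-side cutouts (assumed 200-pixel window)
print(full, cut, full // cut)
```

Under these assumptions the cutouts are 20 times smaller (1.6 MB instead 
of 32 MB), and the client never has to hold or interpolate the unused 
pixels in memory.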

I have practical experience with this, using my cutout SSA server and 
SPLAT-VO on a 3 GB notebook all over the world (with different speeds and 
network latencies).

> We need both simple whole-file and virtual data access capabilities.
> Virtual data access capabilities are essential to enabling distributed
> data analysis (analysis performed directly on remote data without
> first downloading the data), and to scaling up.

YES YES YES !!!!!!!


> Discovering and downloading whole archive files for local processing
> is of course a major use case - this is probably still the dominant
> form of data access.

I am afraid that downloading can be done easily with archive tools (e.g. 
tar.gz creation on FTP servers etc., retransmission in case of failure by 
rsync or wget...).

Really big spectra may already be a problem. Concerning discovery: when I 
need a series, it is usually from the same instrument, so I know where it 
is. And many people doing "random" discovery (e.g. who has observed my 
object and when) are interested only in a simple feature (are HeI lines 
seen in emission? did they also observe a good-quality profile of the 
H-alpha line?), so they would not like to download gigabytes of spectra, 
open them all and zoom in on a given range.

For the first question they need post-processing (a cutout of the line); 
for the second they need ObsTAP giving the spectral range and SNR...
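A sketch of the second kind of question as an ADQL query against an 
ObsCore table. It assumes the standard ObsCore columns (dataproduct_type, 
s_ra, s_dec, em_min, em_max in metres, access_url) in a table named 
ivoa.ObsCore; the position and radius are only examples. The columns used 
here do not include a signal-to-noise ratio, so that part of the question 
is left out.

```python
def obscore_line_query(ra_deg: float, dec_deg: float, radius_deg: float,
                       line_m: float) -> str:
    """ADQL for ObsCore spectra around a position whose spectral
    coverage [em_min, em_max] (metres) includes the given line."""
    return (
        "SELECT obs_collection, t_min, em_min, em_max, access_url\n"
        "FROM ivoa.ObsCore\n"
        "WHERE dataproduct_type = 'spectrum'\n"
        "  AND CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1\n"
        f"  AND em_min <= {line_m} AND em_max >= {line_m}"
    )


HALPHA_M = 6.5628e-7  # H-alpha, about 656.28 nm, in metres
print(obscore_line_query(279.2347, 38.7837, 0.01, HALPHA_M))
```

The query string would then be submitted to a TAP service (e.g. its 
synchronous endpoint with LANG=ADQL); the answer tells you whether and 
when the line was observed, without downloading any spectra.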


> But the typed interfaces like SIA/SSA already
> support this simple mode of access; this is the "simple" mode these
> interfaces support.

In practice, "simple" currently means whole files.

> Soon we will have ObsTAP with both ADQL and
> PQL query interfaces, which will provide a simpler alternative for
> whole file discovery and access, adding the capability to access and
> associate any type of data, at the cost of some lost object-specific
> metadata.

That's the most wanted feature: to know IS IT SOMEWHERE? WHERE IS IT? 
HOW and WHEN WAS IT OBSERVED?
And only then comes WHAT DOES IT LOOK LIKE?

> to add knowledge of and advanced access capabilities for a specific
> type of astronomical data.
YES, astronomers want to work with data, not just look at them.

>
> The typed interfaces extend the generic query interface in important
> ways for each type of data, and add virtual data generation
> capabilities (the queryData response can describe virtual data).

That is nice (a self-describing response), but how will the client work 
with it? (And who will write such a client?)


> While the query may look
> similar in each interface these added semantics are extremely important
> as they represent the difference between (for example) a catalog or
> an image or a spectrum or a theoretical model.

Yes, you cannot measure RV on an image, or compare a synthetic spectrum 
with one line cut from a 2D image of a galaxy.

> Hence one might use ObsTAP with PQL to discover all the
> data for a region on the sky, and then use SIA or SSA etc. for data
> access more advanced than merely downloading entire archive files.
BUT what if the data discovered by ObsTAP amount to terabytes?

We are approaching the 4th paradigm in astroinformatics ;-) and people 
will want to dig into petabyte volumes to find something new about the 
Universe.

> If whole file access is sufficient then ObsTAP alone might be enough.
I can imagine some will be happy with just a small number of full files, 
but the real power of the VO can be achieved only with added-value services.

> More complex data analysis use cases require the capabilities of the
> typed DAL interfaces with their customized parameter interfaces.

YES

Thanks, Doug, for the concise and clear summary.

Petr Skoda

*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute AS CR       Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                                                       *
*************************************************************************

