The IVOA in 2006: Assessment and Future Roadmap

Wed Jun 7 08:24:28 PDT 2006

Dear IVOA Exec

As requested, I present the Technical Coordination Committee's
assessment of IVOA progress, “The IVOA in 2006: Assessment and Future
Roadmap”. The PDF document is available at:
http://www.ivoa.net/Documents/Notes/RoadMap/IVOARoadMap-2006.pdf

The document explains the architecture of the IVOA in terms of
Views, Services, and Registry; makes eleven recommendations to the
working groups; and sets out the agreed roadmap for each working group
for the next year.

I would like to thank all the Working Group and Interest Group chairs
for their help in preparing this document, particularly Francoise
Genova, Tony Linde, Maria Nieto, Guy Rixon, and Doug Tody.

I reproduce the recommendations below. The TCC welcomes comments and
corrections from the IVOA Exec or anyone else.

Roy Williams
Chair, IVOA Technical Coordination Committee

-----------------------------------------

(1) Crossmatch standards (TCC): The Interop meeting in Victoria (May
2006) generated a wide-ranging discussion on the nature of
astronomical crossmatch – meaning a fuzzy join of source catalogs in
order to associate multiple observations of the same astrophysical
object. It is a core operation of statistical astronomy, and a widely
implemented crossmatch standard is essential if catalogs are to be
richly federated. A standard crossmatch cannot, of course, carry a
scientific problem through to its conclusion, but it should provide a
“good selection of candidates” to which the scientist can apply
custom processing. We therefore recommend that the Technical
Coordination Committee set up a small international “tiger team” to
investigate and report on crossmatch by October 2006, specifically
covering use cases, algorithms, complexity, and notation:
·       Produce a number of scientific use cases for crossmatch  
algorithms, such that the convex hull of these use cases covers most  
practical applications;
·       Research and enumerate the different algorithm components,
such as distance and the chi-squared (Szalay et al.) algorithm,
including algorithms that use non-spatial information;
·       Evaluate the strengths and weaknesses of these algorithms  
with respect to the scientific use cases;
·       Research and understand the technical implementation of these  
algorithms with relational database technology, assigning  
computational complexity to the algorithms;
·       Evaluate the conceptual complexity of the algorithms by
building suitable notation based on ADQL or SQL, and writing the
use-case queries in terms of this notation.

The report should be placed on the IVOA wiki, with a discussion
section that allows the whole IVOA to evaluate the use cases,
algorithms, complexity measures, and notations.
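To make the idea of a fuzzy join concrete, here is a minimal Python sketch of a purely positional crossmatch. It assumes each catalog is a list of (id, RA, Dec, positional error) tuples and scores pairs with a simple normalized separation, chi2 = sep² / (σ_a² + σ_b²), for circular Gaussian errors. This scoring is a stand-in chosen for illustration, not the Szalay et al. algorithm, and the brute-force double loop ignores the spatial indexing a relational implementation would need.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); all values in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def crossmatch(cat_a, cat_b, max_chi2=9.0):
    """Return candidate pairs (id_a, id_b, sep_arcsec, chi2).

    Catalog rows are (id, ra_deg, dec_deg, sigma_arcsec). The score
    chi2 = sep**2 / (sigma_a**2 + sigma_b**2) assumes circular Gaussian
    positional errors; it is an illustrative stand-in, not the
    Szalay et al. method. Brute force, O(len(cat_a) * len(cat_b)).
    """
    candidates = []
    for id_a, ra_a, dec_a, sig_a in cat_a:
        for id_b, ra_b, dec_b, sig_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            chi2 = sep ** 2 / (sig_a ** 2 + sig_b ** 2)
            if chi2 <= max_chi2:
                candidates.append((id_a, id_b, sep, chi2))
    return candidates
```

The default threshold keeps pairs whose separation is within three combined sigmas. The output is meant as a candidate list for further, science-specific processing, not a final association.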

(2) Simple Image Access (DAL): One of the early successes of the IVOA  
standards process was the Simple Image Access Protocol, a standard  
service based on a view of an image survey as a covering of the sky  
with a small number of named filters. This protocol has been widely
accepted and implemented. Emerging in 2006 is a more sophisticated  
“Datacube Access” version, where the image can be a multidimensional  
datacube, the metadata aligns with the characterization data model,  
and other enhancements. We hope that the older, simple level can be  
retained as “Simple Image Access”, in addition to the new protocol  
for datacubes, because (a) many sites will continue using the  
original standard, and (b) the simpler protocol can often do  
everything that is needed. We recommend continued support of “Simple  
Image Access” in the same form as it has been, and welcome the  
addition of the new datacube protocol.
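The following sketch shows how simple the original protocol is: it builds a Simple Image Access query URL from the POS, SIZE, and FORMAT parameters of the SIA 1.0 interface. The service URL is a hypothetical placeholder, and the helper name is illustrative only.

```python
from urllib.parse import urlencode

def sia_query_url(base_url: str, ra: float, dec: float, size_deg: float,
                  fmt: str = "image/fits") -> str:
    """Build an SIA 1.0-style query: POS and SIZE in decimal degrees."""
    params = {
        "POS": f"{ra},{dec}",   # RA,Dec of the search centre
        "SIZE": size_deg,       # angular size of the region of interest
        "FORMAT": fmt,          # requested image MIME type
    }
    # Append correctly whether or not the base URL already has a query string.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode(params)

# Hypothetical service endpoint, for illustration only.
print(sia_query_url("http://example.org/sia", 180.0, -30.0, 0.2))
# prints http://example.org/sia?POS=180.0%2C-30.0&SIZE=0.2&FORMAT=image%2Ffits
```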

(3) Spectral Model/Interface/Access (DM, DAL): The spectrum data  
model and access services have appeared as Working Drafts in the Data  
Models and Data Access Layer groups. We recommend accelerated  
implementations of these standards, and experiments on
interoperability between them, which should speed their approval as
IVOA Recommendations. In this way, the exposure of
spectral data by data centers can be brought to the same level of  
maturity as image and catalog views.

(4) Source Catalog View (DM, VOQL): Many astronomical databases (but
not all) are object catalogs. A choice of views is emerging between
the “Table” and “Catalog” concepts. In the Table view, any database
table can be exposed and its relational schema used to create
queries; in the Catalog view, a table of astronomical sources is
exposed through a standard data model, so that, for example,
“positional error” is always written the same way rather than
according to an arbitrary name chosen by the table author. We
recommend, and hope for, a vigorous discussion within the IVOA of this
Source Catalog Data Model, with the objective of international
agreement on a standard representation for a source catalog.
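At its simplest, the Table/Catalog distinction is a mapping step. The hypothetical Python sketch below renames author-chosen columns to a small set of standard field names, so the same query logic can run against differently authored catalogs. The field names, catalog names, and column names are invented for illustration and are not taken from any agreed Source Catalog Data Model.

```python
# Hypothetical standard fields of a source-catalog data model.
STANDARD_FIELDS = ("ra", "dec", "pos_error")

# Per-catalog mapping from author-chosen column names (hypothetical examples).
COLUMN_MAPS = {
    "survey_x": {"RAJ2000": "ra", "DEJ2000": "dec", "e_Pos": "pos_error"},
    "survey_y": {"alpha": "ra", "delta": "dec", "poserr_arcsec": "pos_error"},
}

def to_catalog_view(catalog_name: str, row: dict) -> dict:
    """Rename a table row's columns to the standard catalog fields.

    Columns with no standard counterpart are dropped; a row missing any
    standard field is rejected rather than silently returned incomplete.
    """
    mapping = COLUMN_MAPS[catalog_name]
    view = {mapping[col]: value for col, value in row.items() if col in mapping}
    missing = [f for f in STANDARD_FIELDS if f not in view]
    if missing:
        raise KeyError(f"{catalog_name} lacks standard fields: {missing}")
    return view
```

Unit differences between catalogs (for example, arcseconds versus degrees for positional error) are ignored here. They are the subject of recommendation (11).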

(5) Registry Implementation (Registry): As with many IVOA standards,  
it is time to finalize the schema for the Registry to enable a clear  
path to implementation. A new plan agreed at the May 2006 Interop
elaborates the idea of Service into a family: the parent Service
contains Interfaces and Capabilities.
·       We recommend that this change in registry schema should be  
the last for a long time – at least the last schema change that would  
invalidate old records.
·       We also recommend that the registry WG define and reach  
agreement on the scope of the registry in terms of the variety and  
granularity of metadata. Registries can cache detailed metadata on a  
regular basis, or maintain limited (but valid) metadata and fetch  
detail only when required.
·       We also recommend that the “Registry of Registries” should be  
created immediately and/or advertised on the IVOA website, even if it  
is informal (a web page), so that information can be gathered at the  
same time as the formal specification is built.
·       We hope to define more precisely the idea of
annotation/augmentation of existing registry records by an entity
that is not their author. We recommend that the Registry group
provide use cases for this concept.
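Below is a hypothetical Python sketch of the Service family and of the “limited (but valid) metadata, fetch detail when required” option. It nests Interfaces inside Capabilities, which is one plausible reading of the new plan. The fetch function stands in for whatever call a registry would make to a publishing site. None of the class or field names are taken from the registry schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Interface:
    access_url: str
    protocol: str                 # illustrative label, e.g. "HTTP GET"

@dataclass
class Capability:
    standard_id: str              # which standard this capability implements
    interfaces: List[Interface] = field(default_factory=list)

@dataclass
class Service:
    identifier: str               # IVOA identifier (hypothetical values in examples)
    title: str
    capabilities: List[Capability] = field(default_factory=list)

class LazyRegistry:
    """Holds limited (but valid) summaries; fetches full records on demand."""

    def __init__(self, fetch_detail: Callable[[str], Service]):
        self._fetch_detail = fetch_detail          # hypothetical remote lookup
        self.summaries: Dict[str, str] = {}        # identifier -> title only
        self._detail_cache: Dict[str, Service] = {}

    def register(self, identifier: str, title: str) -> None:
        self.summaries[identifier] = title

    def detail(self, identifier: str) -> Service:
        # Fetch the full record the first time it is needed, then reuse it.
        if identifier not in self._detail_cache:
            self._detail_cache[identifier] = self._fetch_detail(identifier)
        return self._detail_cache[identifier]
```

The alternative named in the text, caching detailed metadata on a regular basis, would instead call the fetch function for every identifier on a schedule. The scope question is which of these trade-offs registries should adopt.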

(6) Registry Query Language (Registry, VOQL): Querying a registry of
services is semantically rather different from querying a star
catalog. The former may involve small data in complex schemas, the
latter large data in simple schemas. A star catalog query is helped
by specific language constructs (e.g., a region of the sky) that may
mean nothing in the context of a registry query. We recommend that a
sub-committee of the Registry and VOQL groups examine the case for
and against a separate query language for registries, customized for
registry queries and independent of future development of the
catalog query language.

(7) Table and Catalog Access (VOQL, DAL): The Query Language group
has made considerable advances in generalizing and standardizing
levels of compliance and utility. For general catalog access, cone
search, trivial as it is, has proven a good start, as it provides
easy access to data via a simple interface. SkyNode addresses the
much harder problems of providing a general query language,
crossmatching of large catalogs, and distributed crossmatches. What
is needed is an intermediate approach to basic catalog access: one
that provides both a language-based interface (ADQL) and a
parameter-based interface more sophisticated than cone search, and,
eventually, data model mediation via standard catalog data models.

·       In the language interface, a relational database is exposed
through its relational schema (the table names and table attributes),
together with the ability to build a query using that metadata. We
recommend the creation of a standard interface called Simple Table
Access Protocol (STAP) that implements this view. It would be derived
from the basic SkyNode interface and the core ADQL language.
·       In the catalog view, queries can be created within the  
Catalog Data Model, language-based and/or parameter-based, so that  
the same query can be sent to multiply-authored source catalogs, and  
the results returned in the context of that view. We recommend the
creation of a standard catalog access service interface to support
this view.
·       The most sophisticated queries involve distributed crossmatch,
where multiple source catalogs generate associations of observations
of the same physical object. See recommendation (1) on crossmatch.
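The contrast between parameter-based and language-based access can be sketched in Python. The RA, DEC, and SR parameters come from the Simple Cone Search interface. The ADQL text uses the CONTAINS/POINT/CIRCLE geometry functions of the later ADQL 2.0 syntax, not the 2006 drafts. The endpoint, table, and column names are hypothetical.

```python
from urllib.parse import urlencode

def _join_query(base_url: str, params: dict) -> str:
    # Append parameters whether or not the base URL already has a query string.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode(params)

def cone_search_url(base_url: str, ra: float, dec: float, radius_deg: float) -> str:
    """Parameter-based access: Simple Cone Search RA, DEC, SR in decimal degrees."""
    return _join_query(base_url, {"RA": ra, "DEC": dec, "SR": radius_deg})

def adql_cone_query(table: str, ra_col: str, dec_col: str,
                    ra: float, dec: float, radius_deg: float) -> str:
    """Language-based access: the same cone as an ADQL 2.0-style query.
    Table and column names are inserted verbatim (no quoting or validation)."""
    return (f"SELECT * FROM {table} "
            f"WHERE 1 = CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
            f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))")
```

The language-based form needs the table's schema (here, the column names `ra` and `dec`). A catalog data model would standardize that schema across catalogs.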

(8) VOSpace (GWS): The VOSpace effort within the Grid/Web Services
working group is developing semantics, a schema, an interface, and a
prototype. The view and capabilities of VOSpace are revealed at
three levels of depth:
·       Data are stored as files/blobs, but MIME types are recorded
against them so that they can be understood after being fetched out
of VOSpace.
·       MIME types are used to allow access to parts of a file, or to  
allow dynamic reformatting during output from VOSpace.
·       Data are stored in a way that makes their internal
structure accessible through an alternate interface on the same
logical service; e.g., data put into VOSpace as a VOTable become
accessible via a SkyNode interface.

The spaces themselves may be structured in three levels of federation:
·       Data objects are siblings with no hierarchy and spaces are  
not linked.
·       Data objects can be grouped and arranged in a hierarchy of  
directories.
·       There are symbolic links between VOSpaces, allowing global  
federation.
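A toy in-memory model can illustrate the three federation levels: a flat set of nodes, '/'-separated paths acting as directories, and link nodes that point into another space. All class names and the link format are hypothetical, and this is not the VOSpace interface. MIME types are stored alongside the bytes, as the first level of depth describes.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class DataNode:
    data: bytes
    mime_type: str        # recorded so the bytes can be interpreted after fetch

@dataclass
class LinkNode:
    target_space: str     # name of another space in the federation (hypothetical)
    target_path: str

Node = Union[DataNode, LinkNode]

class VOSpaceSketch:
    """Toy space: '/'-separated keys stand in for a directory hierarchy."""

    def __init__(self, name: str, federation: Dict[str, "VOSpaceSketch"]):
        self.name = name
        self.nodes: Dict[str, Node] = {}
        self.federation = federation
        federation[name] = self           # join the (toy) federation

    def put(self, path: str, node: Node) -> None:
        self.nodes[path] = node

    def get(self, path: str, max_hops: int = 8) -> DataNode:
        node = self.nodes[path]
        hops = 0
        # Follow symbolic links across spaces until real data is reached.
        while isinstance(node, LinkNode):
            hops += 1
            if hops > max_hops:
                raise RuntimeError("too many link hops (possible cycle)")
            node = self.federation[node.target_space].nodes[node.target_path]
        return node
```

The hop limit guards against link cycles. With links between independently run spaces, this is a concern that a real federation would also have to address.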

We recommend the preparation of a VOSpace Use Cases document, to
define the direction of this fine effort more closely, and to
differentiate it from related efforts in the grid community.

(9) Interoperable Security (GWS): Security and authentication are
being implemented in several new efforts. The UK AstroGrid project
has built a sophisticated workflow system for asynchronous
computations and is adding authentication; a complementary effort
in the US NVO project is exploring the idea of “graduated security”
for giving the community access to high-performance computing. We
recommend a study of these and other “grid” projects to promote
interoperability.

(10) Space Time Coordinates (DM): An effect of a sophisticated data  
model can be the impression in the community that all levels of  
complexity must be understood before any part of it can be used. It  
would be better to have data models that can be used at different  
levels of sophistication. A jewel of the IVOA is the Space-Time
Coordinate system specification, because of its rigor and accuracy.
While it has become immensely more usable over the last year, it
could be improved further by presenting a “toolkit” for expressing
coordinates that permits rigor and accuracy rather than forcing them
on a scientist even when there are reasons against this.
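One way to read the “toolkit” suggestion is layered. A position needs only RA and Dec, and frame, epoch, and error are added when the science needs them. The dataclass below is a hypothetical Python sketch of that idea, not part of the STC specification. The ICRS default is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SkyPosition:
    """Only ra/dec are required; extra rigor is opt-in (hypothetical toolkit)."""
    ra: float                          # degrees
    dec: float                         # degrees
    frame: str = "ICRS"                # assumed default reference frame
    epoch: Optional[float] = None      # e.g. 2000.0; None if not stated
    error_arcsec: Optional[float] = None

    def describe(self) -> str:
        # Mention only the levels of detail the user actually supplied.
        text = f"{self.frame} ({self.ra}, {self.dec})"
        if self.epoch is not None:
            text += f" epoch {self.epoch}"
        if self.error_arcsec is not None:
            text += f" +/- {self.error_arcsec} arcsec"
        return text
```

A casual user writes `SkyPosition(10.0, 20.0)`. A rigorous one fills in the optional fields, and both produce valid objects of the same type.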

(11) Units (DM): Most scientific quantities carry units, and data
returned from IVOA services should also carry explicit unit
information when the units are not implicitly clear. Units should
follow the IAU recommendation[1], which follows the SI convention.
When a user makes a query based on a quantity, units can either be
user-defined or fixed. In the former case, the user may express the
quantity in arbitrary units (e.g., calories per square furlong per
hour!) or choose from an enumerated list (e.g., Angstroms OR
nanometers). In the case of fixed units, the data model of the query
is bound to specific units (e.g., all angles must be in decimal
degrees). We recommend a study by the Data Model Working Group of how
units are used in IVOA views and services, where it would be
appropriate simply to fix the units, and where it is necessary to
allow freedom of choice. In the latter case, the report should also
recommend how unit conversion is implemented: who is responsible for
it and the nature of the software.
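Here is a minimal Python sketch of the enumerated-choice case, assuming the service itself is responsible for conversion into the unit it is fixed to. The unit list, unit labels, and function name are illustrative assumptions, not an IVOA convention.

```python
# Factors to metres for an enumerated choice of wavelength units (illustrative list).
WAVELENGTH_TO_M = {"m": 1.0, "um": 1e-6, "nm": 1e-9, "Angstrom": 1e-10}

def to_service_units(value: float, unit: str, service_unit: str = "nm") -> float:
    """Convert a user-supplied wavelength into the unit a service is fixed to.

    Here the service performs the conversion (one possible answer to
    "who is responsible"); units outside the enumerated list are rejected.
    """
    if unit not in WAVELENGTH_TO_M or service_unit not in WAVELENGTH_TO_M:
        raise ValueError(f"unsupported unit: {unit!r} or {service_unit!r}")
    return value * WAVELENGTH_TO_M[unit] / WAVELENGTH_TO_M[service_unit]
```

Supporting arbitrary user-defined units would require parsing compound unit strings (the “calories per square furlong per hour” case) rather than a fixed lookup table. That is a larger piece of software, and where it should live is part of the recommended study.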