The IVOA in 2006: Assessment and Future Roadmap
Roy Williams
roy at cacr.caltech.edu
Wed Jun 7 08:24:28 PDT 2006
Dear IVOA Exec
As requested, I present the assessment of the IVOA progress from the
Technical Coordination Committee: The IVOA in 2006: Assessment and
Future Roadmap. The PDF document can be found at the link:
http://www.ivoa.net/Documents/Notes/RoadMap/IVOARoadMap-2006.pdf
There is an explanation of the architecture of the IVOA expressed as
Views, Services, and Registry; a set of eleven recommendations to the
working groups; and the agreed roadmaps for each working group for
the next year.
I would like to thank all the Working Group and Interest Group chairs
for their help in making this document, particularly Francoise
Genova, Tony Linde, Maria Nieto, Guy Rixon, and Doug Tody.
I reproduce the recommendations below, and the TCC welcomes comments
and corrections from the IVOA Exec or anyone else.
Roy Williams
Chair, IVOA Technical Coordination Committee
-----------------------------------------
(1) Crossmatch standards (TCC): The Interop meeting in Victoria (May
2006) generated a wide-ranging discussion on the nature of
astronomical crossmatch – meaning a fuzzy join of source catalogs in
order to associate multiple observations of the same astrophysical
object. It is a core operation of statistical astronomy, and a widely-
implemented crossmatch standard is essential if catalogs are to be
richly federated. The standard crossmatch cannot, of course, solve a
scientific problem to its conclusion, but should be a way to obtain a
“good selection of candidates” to which the scientist can apply
custom processing. Therefore we recommend that the Technical
Coordination Committee set up a small international “tiger team” to
investigate and report on crossmatch with a deadline of October 2006,
specifically covering use-case, algorithm, complexity, and notation:
· Produce a number of scientific use cases for crossmatch
algorithms, such that the convex hull of these use cases covers most
practical applications;
· Research and enumerate the different algorithm components,
such as distance and the chi-squared (Szalay et al.) algorithm,
including algorithms that use non-spatial information;
· Evaluate the strengths and weaknesses of these algorithms
with respect to the scientific use cases;
· Research and understand the technical implementation of these
algorithms with relational database technology, assigning
computational complexity to the algorithms;
· Evaluate the conceptual complexity of the algorithms by
building suitable notation based on ADQL or SQL, and writing the
use-case queries in terms of this notation.
The report should be online at the IVOA wiki, with a discussion
section to allow the use-cases, algorithms, complexity measures, and
notations to be evaluated by the whole IVOA.
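
As a concrete reference point for the "distance" component mentioned
above, here is a minimal sketch of a purely positional crossmatch. It
is not a proposed standard: the catalog layout (id, RA, Dec in decimal
degrees), the function names, and the sample sources are assumptions
made for illustration, and the brute-force loop stands in for whatever
spatial indexing a database would use.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in decimal degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def crossmatch(cat_a, cat_b, radius_arcsec):
    """Return (id_a, id_b, separation_arcsec) for every pair within the radius.

    Brute force, O(len(cat_a) * len(cat_b)). The result is a list of
    candidates, not a final association.
    """
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= radius_arcsec:
                matches.append((id_a, id_b, sep))
    return matches

# Hypothetical sample catalogs: (id, RA, Dec) in decimal degrees.
optical = [("opt1", 10.0000, 20.0000), ("opt2", 150.0, -5.0)]
radio = [("rad1", 10.0002, 20.0001), ("rad2", 200.0, 45.0)]
candidates = crossmatch(optical, radio, radius_arcsec=2.0)
```

The output is deliberately a list of candidate pairs with their
separations, in the spirit of a “good selection of candidates” that
the scientist then filters with custom processing.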
(2) Simple Image Access (DAL): One of the early successes of the IVOA
standards process was the Simple Image Access Protocol, a standard
service based on a view of an image survey as a covering of the sky
with a small number of named filters. This protocol has been widely
accepted and implemented. Emerging in 2006 is a more sophisticated
“Datacube Access” version, where the image can be a multidimensional
datacube, the metadata aligns with the characterization data model,
and other enhancements. We hope that the older, simple level can be
retained as “Simple Image Access”, in addition to the new protocol
for datacubes, because (a) many sites will continue using the
original standard, and (b) the simpler protocol can often do
everything that is needed. We recommend continued support of “Simple
Image Access” in the same form as it has been, and welcome the
addition of the new datacube protocol.
(3) Spectral Model/Interface/Access (DM, DAL): The spectrum data
model and access services have appeared as Working Drafts in the Data
Models and Data Access Layer groups. We recommend accelerated
implementations of these standards, and experiments on
interoperability between these, which will lead to accelerated
approval to Recommendation by the IVOA. In this way, the exposure of
spectral data by data centers can be brought to the same level of
maturity as image and catalog views.
(4) Source Catalog View (DM, VOQL): Many astronomical databases (but
not all) are object catalogs. A choice of views is emerging between
the “Table” and “Catalog” concepts. In the Table view, any database
table can be exposed and its relational schema used to create
queries; and in the Catalog data model, a table of astronomical
sources is exposed in a standard data model – so that, for example,
“positional error” is always written the same way rather than
according to an arbitrary name chosen by the table author. We
recommend (hope for) a vigorous discussion within the IVOA of this
Source Catalog Data Model, with the objective of international
agreement on a standard representation for a source catalog.
(5) Registry Implementation (Registry): As with many IVOA standards,
it is time to finalize the schema for the Registry to enable a clear
path to implementation. A new plan was agreed at the May 2006
Interop that elaborates the idea of Service into a family: the
parent Service contains Interfaces and Capabilities.
· We recommend that this change in registry schema should be
the last for a long time – at least the last schema change that would
invalidate old records.
· We also recommend that the registry WG define and reach
agreement on the scope of the registry in terms of the variety and
granularity of metadata. Registries can cache detailed metadata on a
regular basis, or maintain limited (but valid) metadata and fetch
detail only when required.
· We also recommend that the “Registry of Registries” should be
created immediately and/or advertised on the IVOA website, even if it
is informal (a web page), so that information can be gathered at the
same time as the formal specification is built.
· We hope to clarify and define closely the idea of annotation/
augmentation of existing registry records by an entity that is not
the author. We recommend that the Registry group provide use-cases
for this concept.
(6) Registry Query Language (Registry, VOQL): Querying a registry of
services is rather different, semantically, from querying a star
catalog. The former may involve small data in complex schemas, and
the latter large data in simple schemas. The star catalog query is
helped by specific language constructs (e.g., region of the sky) that
may mean nothing in the context of the registry query. We recommend
that a sub-committee of the Registry and VOQL groups examine the case
for and against a separate query language for the registry that would
be customized for registry queries and independent of future
development of the catalog query language.
(7) Table and Catalog Access (VOQL, DAL): The Query Language group
has made considerable advances in generalizing and standardizing
levels of compliance and utility. For general catalog access, cone
search, trivial as it is, has proven to be a good start as it
provides easy access to data via a simple interface. SkyNode
addresses the much harder problems of providing a general query
language, crossmatching of large catalogs, and distributed cross
matches. What is needed is an intermediate approach for basic catalog
access: something that provides both a language-based interface
(ADQL) and a parameter-based interface more sophisticated than
cone search, and eventually data model mediation via standard
catalog data models.
· In the language interface, a relational database is exposed
through relational schema: the table names and table attributes,
together with the ability to build a query using that metadata. We
recommend the creation of a standard interface called Simple Table
Access Protocol (STAP) that implements this view. It would be derived
from the basic SkyNode interface and the core ADQL language.
· In the catalog view, queries can be created within the
Catalog Data Model, language-based and/or parameter-based, so that
the same query can be sent to multiply-authored source catalogs, and
the results returned in the context of that view. We recommend the
creation of a standard catalog access service interface to support
this view.
· The most sophisticated queries involve distributed
crossmatch, where multiple source catalogs generate associations of
observations of the same physical object. See recommendation (1)
above on crossmatch.
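
To make the contrast between the parameter-based and language-based
interfaces concrete, here is a small sketch of the same positional
request expressed both ways. RA, DEC, and SR (decimal degrees) are the
cone search parameter names; everything else is an assumption for
illustration: the base URL, the table and column names, and the
SQL-style box query, which is not the STAP interface (that does not
yet exist) and ignores both the cos(Dec) factor and RA wrap-around.

```python
from urllib.parse import urlencode

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Parameter-based interface: position and search radius as RA, DEC, SR."""
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    return base_url + "?" + urlencode(params)

def box_query(table, ra_col, dec_col, ra_deg, dec_deg, half_width_deg):
    """Language-based interface: a SQL-style query built from schema metadata.

    The caller must know the table and column names, which is exactly
    the relational-schema view. Simplified: a flat box in RA and Dec.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ra_col} BETWEEN {ra_deg - half_width_deg:.4f} "
        f"AND {ra_deg + half_width_deg:.4f} "
        f"AND {dec_col} BETWEEN {dec_deg - half_width_deg:.4f} "
        f"AND {dec_deg + half_width_deg:.4f}"
    )

# Hypothetical service endpoint and schema names.
url = cone_search_url("http://example.org/cone", 10.0, 20.0, 0.1)
sql = box_query("sources", "ra", "dec", 10.0, 20.0, 0.1)
```

The URL form needs no knowledge of the schema, while the query form
depends on the author's column names. That dependence is what a
standard catalog data model would remove.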
(8) VOSpace (GWS): The VOSpace effort within the Grid/Web services
working group is building semantics, schema, interface, and
prototype. The view and capabilities of VOSpace are revealed at
three levels of depth:
· Data are stored as files/blobs, but MIME types are recorded
against them so that they may be understood after being fetched out
of VOSpace.
· MIME types are used to allow access to parts of a file, or to
allow dynamic reformatting during output from VOSpace.
· Data are stored in some way that makes their internal
structure accessible through an alternate interface on the same
logical service. E.g., data put into VOSpace as a VOTable become
accessible via a SkyNode interface.
The spaces themselves may be structured in three levels of federation:
· Data objects are siblings with no hierarchy and spaces are
not linked.
· Data objects can be grouped and arranged in a hierarchy of
directories.
· There are symbolic links between VOSpaces, allowing global
federation.
We recommend the formation of a VOSpace Use Cases document, to more
closely define the direction of this fine effort, and to
differentiate it from related efforts in the grid community.
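
As one possible input to such a use-cases discussion, the three levels
of federation can be sketched as a toy in-memory model. This is not
the VOSpace interface: the class names, method names, and space names
below are all hypothetical, and the hierarchy is modeled simply by
path convention.

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    """A stored file/blob with the MIME type recorded against it."""
    content: bytes
    mime_type: str

@dataclass
class LinkNode:
    """A symbolic link to a path in another space."""
    target_space: str
    target_path: str

@dataclass
class Space:
    name: str
    nodes: dict = field(default_factory=dict)  # path -> DataNode or LinkNode

    def put(self, path, content, mime_type):
        self.nodes[path] = DataNode(content, mime_type)

    def link(self, path, target_space, target_path):
        self.nodes[path] = LinkNode(target_space, target_path)

    def list_dir(self, prefix):
        """Second level: treat path prefixes as directories."""
        prefix = prefix.rstrip("/") + "/"
        return sorted(p for p in self.nodes if p.startswith(prefix))

def resolve(spaces, space_name, path, max_hops=8):
    """Third level: follow links between spaces until data is reached."""
    for _ in range(max_hops):
        node = spaces[space_name].nodes[path]
        if isinstance(node, DataNode):
            return node
        space_name, path = node.target_space, node.target_path
    raise RuntimeError("too many link hops")

# Hypothetical spaces and paths.
spaces = {"siteA": Space("siteA"), "siteB": Space("siteB")}
spaces["siteA"].put("/results/m31.vot", b"<VOTABLE/>", "text/xml")
spaces["siteB"].link("/shared/m31.vot", "siteA", "/results/m31.vot")
node = resolve(spaces, "siteB", "/shared/m31.vot")
```

With only `put` and flat paths this is the first level (siblings, no
links); `list_dir` adds the directory hierarchy, and `link` with
`resolve` adds federation between spaces.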
(9) Interoperable Security (GWS): Security and authentication are
being implemented in several new efforts. The UK Astrogrid project
has built a sophisticated workflow system for asynchronous
computations and is adding authentication; a complementary project
from the US NVO project is exploring the idea of “graduated security”
for giving community access to high-performance computing. We
recommend a study of these and other “grid” projects to promote
interoperability.
(10) Space Time Coordinates (DM): An effect of a sophisticated data
model can be the impression in the community that all levels of
complexity must be understood before any part of it can be used. It
would be better to have data models that can be used at different
levels of sophistication. A jewel of the IVOA is the Space-Time
Coordinate system specification, because of its rigor and accuracy.
While it has become immensely more usable over the last year, it
could be improved further by presenting a “toolkit” for expressing
coordinates that allows rigor and accuracy rather than forcing a
scientist to use accuracy and rigor even when there are reasons
against this.
(11) Units (DM): Most scientific quantities carry units, and data
returned from IVOA services should also carry explicit unit
information when not clear implicitly. Units should follow the IAU
recommendation[1], which follows the SI convention. When a user makes
a query based on a quantity, units can either be user-defined or
fixed. In the former case, the user has the freedom to express the
quantity in arbitrary units (e.g., calories per square furlong per
hour!), or an enumerated choice (e.g., Angstroms OR nanometers). In
the case of fixed units, the data model of the query is bound to
specific units (e.g., all angles must be in decimal degrees). We
recommend a
study by the Data Model Working Group of how units are used in IVOA
views and services, where it would be appropriate to simply fix the
units, and where it is necessary to allow freedom of choice. In the
latter case, the report should also recommend how unit conversion
is implemented: who is responsible and the nature of the software.
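
To illustrate the difference between an enumerated choice and fixed
units, here is a minimal sketch in which the service is bound to one
unit and converts user input on arrival. The unit labels, the choice
of nanometers as the fixed unit, and the function names are
assumptions for illustration; where such conversion should live is
exactly the open question above.

```python
# Conversion factors to meters for an enumerated set of wavelength units.
# The unit labels are illustrative, not a proposed standard vocabulary.
TO_METERS = {"Angstrom": 1e-10, "nm": 1e-9, "um": 1e-6, "m": 1.0}

# Hypothetical fixed unit that the query data model is bound to.
FIXED_UNIT = "nm"

def convert(value, from_unit, to_unit):
    """Convert between two units from the enumerated set."""
    for unit in (from_unit, to_unit):
        if unit not in TO_METERS:
            raise ValueError(f"unsupported unit: {unit}")
    return value * TO_METERS[from_unit] / TO_METERS[to_unit]

def normalize_query(value, unit):
    """Express a user-supplied wavelength in the service's fixed unit."""
    return convert(value, unit, FIXED_UNIT)

wavelength_nm = normalize_query(5000.0, "Angstrom")
```

An arbitrary unit such as "furlong" is rejected here rather than
guessed at, which is the practical cost of the enumerated approach.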