The IVOA in 2006: Assessment and Future Roadmap - Registries
Paul Harrison
pharriso at eso.org
Fri Jun 9 02:42:17 PDT 2006
I have been informed that this message, the original in this
particular thread, did not make it to the list.
Paul Harrison
On 07.06.2006, at 17:24, Roy Williams wrote:
> (5) Registry Implementation (Registry): As with many IVOA
> standards, it is time to finalize the schema for the Registry to
> enable a clear path to implementation. A new plan has been agreed
> at the May 2006 Interop, that elaborates the idea of Service into a
> family: the parent Service contains Interfaces and Capabilities.
> · We recommend that this change in registry schema should be
> the last for a long time – at least the last schema change that
> would invalidate old records.
Although allowing schema extensions has always been part of the
philosophy behind the registry, so as to facilitate innovation, up
until the Victoria meeting this caused problems, as the XML-based
registries have been able to support extensions easily and the
RDBMS-based registries have not. However, an important decision was
made at Victoria that *all* registries should return a complete
registry entry, including any extensions, even if they allow
searching only on elements of the core schema.
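To make the Victoria decision concrete for an RDBMS-based registry, here is a minimal sketch. It assumes a simplified record layout and a hypothetical table named `resource`. Only core fields get indexed columns, while the complete XML record, extensions included, is stored verbatim and returned whole.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record: core <identifier> and <title>, plus one
# extension element from a non-core namespace.
RECORD = """<Resource>
  <identifier>ivo://example.org/spectra</identifier>
  <title>Example Spectra Service</title>
  <ext:detail xmlns:ext="urn:example:ext">extension data</ext:detail>
</Resource>"""


def make_registry() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Only core fields become searchable columns; the full record is kept verbatim.
    db.execute(
        "CREATE TABLE resource (identifier TEXT PRIMARY KEY, title TEXT, xml TEXT)"
    )
    return db


def harvest(db: sqlite3.Connection, xml_text: str) -> None:
    root = ET.fromstring(xml_text)
    db.execute(
        "INSERT INTO resource VALUES (?, ?, ?)",
        (root.findtext("identifier"), root.findtext("title"), xml_text),
    )


def search_by_title(db: sqlite3.Connection, word: str) -> list[str]:
    # The search uses core columns only, but the complete record
    # (extensions included) is what gets returned.
    rows = db.execute("SELECT xml FROM resource WHERE title LIKE ?", (f"%{word}%",))
    return [row[0] for row in rows]
```

In this sketch the extension content cannot be searched, but it is never lost.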
> · We also recommend that the registry WG define and reach
> agreement on the scope of the registry in terms of the variety and
> granularity of metadata. Registries can cache detailed metadata on
> a regular basis, or maintain limited (but valid) metadata and fetch
> detail only when required.
The recent NVO usability report from STScI highlighted the
disappointment of astronomer-users with the Registry when trying
to use it as a "Google" to find out information about a particular
source. This is not (nor should it be) the role of the registry; a
higher-level tool is needed, such as astroscope or datascope, that
uses the registry as a starting point to discover resources that
might provide more information about the desired astronomical object.
Registries are there primarily to be an information source for the
coordination between other services and resources. End-user astronomers
should not need to be aware that the registry exists....
As a centralized information source, the registry can act as a cache
for the fine-grained information required to call a service, which
can greatly speed up the end-user experience.
A typical use case is a user who wants to make a query that applies
some non-mandatory selection criterion to an SSAP service, for
instance. With a coarse-grained registry the user tool needs to
1. query the registry to get a list of candidate services
2. query each service *in turn* to determine whether it supports the
extra selection criterion
This can lead to a significant delay for the user, and obviously does
not scale well as the number of deployed SSAP services increases. With
a fine-grained registry the user tool need only make a single query
to determine which of the SSAP services can be successfully queried.
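The difference in round trips between the two approaches can be sketched as follows. `ServiceRecord`, `ask_service` and the parameter names are hypothetical stand-ins; a real tool would issue network requests to the registry and to each SSAP service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    ivoid: str
    # Optional query parameters the service supports. Only a fine-grained
    # registry stores this; the field name is hypothetical.
    optional_params: frozenset = frozenset()


def ask_service(record: ServiceRecord, param: str) -> bool:
    # Hypothetical stand-in for querying a live SSAP service about its
    # capabilities; in practice this is a network round trip.
    return param in record.optional_params


def coarse_grained(records, param):
    calls = 1  # the initial registry query
    hits = []
    for record in records:
        calls += 1  # one extra round trip per candidate service, in turn
        if ask_service(record, param):
            hits.append(record.ivoid)
    return hits, calls


def fine_grained(records, param):
    # The registry already holds optional_params, so the single registry
    # query can answer the question directly.
    hits = [r.ivoid for r in records if param in r.optional_params]
    return hits, 1
```

With N candidate services the coarse-grained path costs N + 1 requests in this model, against 1 for the fine-grained one.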
This does not necessarily impose a greater burden in creating the
registry entry, as in fact only the core registry metadata need be
entered by hand; the registry itself could then query the service
to fill in the missing metadata automatically. It is part of the
GridWG "standard interface" for web services that a service returns
its own registry metadata, and older standards like SIAP also
return this metadata, albeit in a different format. An implementor of
a service will probably already have to maintain some sort of mapping
between his internal data model and that of the relevant IVOA
standard, so it is very little extra work for him to provide metadata
about this mapping to the registry in a standard way. In addition, if
he changes the facilities offered by his service (e.g. adds extra
parameters to a SIAP query), he need only update the service, and the
registry entry will be updated the next time that the registry does
an automatic update. This provides local curation of the metadata,
which is the most natural place to do it.
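A minimal sketch of how a registry might refresh an entry automatically, assuming the service can report its own metadata as a simple dictionary and that a hypothetical set of fields (`HAND_ENTERED`) is curated by hand:

```python
# Hypothetical split between fields the publisher curates by hand and
# fields the service reports about itself.
HAND_ENTERED = {"title", "publisher", "contact"}


def refresh_entry(entry: dict, self_description: dict) -> dict:
    """Return an updated registry entry.

    Hand-entered core fields are kept from the existing entry; every
    other field is taken from what the service currently reports, so a
    change to the service (e.g. a new query parameter) propagates on the
    next automatic update.
    """
    updated = {k: v for k, v in entry.items() if k in HAND_ENTERED}
    updated.update(
        {k: v for k, v in self_description.items() if k not in HAND_ENTERED}
    )
    return updated
```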
The process of "web-crawling" the registered services to check whether
their metadata are up to date can be combined with regular service
validation, which adds further value to the fine-grained registry
approach. There is always the risk that the registry metadata are
out of date, but there is a datestamp on the registry entry that
can be used to judge the likelihood of a stale entry. I suspect
that even a single registry would take only a matter of hours to
trawl all of the currently registered services, so this could be
done daily; moreover, the work is already naturally divided between
the various registry deployments, as each need only check the
services for which it is the publishing registry.
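Choosing which entries to re-harvest could look like the sketch below. It assumes each entry carries its ivo identifier, the identifier of its publishing registry and an `updated` datestamp; the field names and the one-day threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def due_for_recrawl(entries, my_registry_id, now, max_age=timedelta(days=1)):
    """Pick the entries this registry should re-harvest.

    Only services published through this registry are considered (so
    the work is shared among registries), and only those whose datestamp
    is older than max_age.
    """
    return [
        e["ivoid"]
        for e in entries
        if e["publisher_registry"] == my_registry_id and now - e["updated"] > max_age
    ]
```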
We can do better in the case of the required initial registration of
resources as well. Each of the registries has its own
implementation of a "maintenance portal" for manually entering
registry records, and they are of considerably different quality.
This is a waste of effort, as in principle one single portal could be
built that could talk to any of the registry implementations. I
would suggest that the best of the existing portals be refactored so
that it could be packaged as a standalone tool; effort could then be
concentrated on that one tool to make it easy to use and to provide
all of the validation and searching needed to aid in creating
good-quality registry entries.
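One way to structure such a shared portal, sketched under the assumption that each registry implementation can be wrapped behind a single `submit` operation. The `RegistryBackend` interface, the required fields and the validation rules are all hypothetical.

```python
from typing import Protocol


class RegistryBackend(Protocol):
    """Hypothetical minimal interface a shared portal would need."""

    def submit(self, record: dict) -> None: ...


REQUIRED = ("identifier", "title", "publisher")


def validate(record: dict) -> list[str]:
    """Return a list of problems; empty means the record may be submitted."""
    problems = [f"missing {k}" for k in REQUIRED if not record.get(k)]
    ident = record.get("identifier", "")
    if ident and not ident.startswith("ivo://"):
        problems.append("identifier must start with ivo://")
    return problems


def publish(record: dict, backend: RegistryBackend) -> list[str]:
    # Validation lives in the portal, so every backend benefits from it.
    problems = validate(record)
    if not problems:
        backend.submit(record)
    return problems


class InMemoryBackend:
    """Toy backend standing in for any real registry implementation."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def submit(self, record: dict) -> None:
        self.records.append(record)
```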
> · We also recommend that the “Registry of Registries” should
> be created immediately and/or advertised on the IVOA website, even
> if it is informal (a web page), so that information can be gathered
> at the same time as the formal specification is built.
> · We hope to clarify and define closely the idea of
> annotation/augmentation of existing registry records by an entity
> that is not the author. We recommend that the Registry group
> provide use-cases for this concept.
>
> (6) Registry Query Language (Registry, VOQL): Querying a registry
> of services is rather different, semantically, from querying a star
> catalog. The former may involve small data in complex schemas, and
> the latter large data in simple schema. The star catalog query is
> helped by specific language constructs (eg. Region of the sky) that
> may mean nothing in the context of the registry query. We recommend
> a sub-committee of the Registry and VOQL groups should examine the
> case for and against a separate query language for registry, that
> would be customized for registry queries and independent of future
> development of the catalog query language.
Standardization is good, but "one size fits all" can take things too
far, and this I think is the case with trying to use ADQL for
registry querying. I argued this a while back, see the thread starting at
http://www.ivoa.net/forum/registry/0504/1300.htm - basically the aims
of the query are too dissimilar between catalogues and the registry,
and in fact different customised extensions/modifications of the
underlying SQL are required in each case. I do think that
it is now time to define a separate registry query language,
particularly as it appears that a schism is opening even within
the VOQL community on what exactly should be part of the query
language. The problem here, in my opinion, is again that the language
and interfaces that services need in order to formulate queries
amongst themselves are not necessarily the same as the interface and
language in which the astronomer user wants to formulate the query.
There should be translation layers between these two levels that keep
the interface definitions separate, but related.
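A translation layer of this kind might be sketched as a mapping from user-level terms to registry-level constraints. The vocabulary, the path strings and the `translate` function are hypothetical illustrations, not part of any IVOA specification.

```python
# Hypothetical vocabulary: what the astronomer asks for, and the
# registry-level constraint it becomes. Paths and IDs are illustrative.
USER_TERMS = {
    "spectra": ("capability/@standardID", "ivo://ivoa.net/std/SSA"),
    "images": ("capability/@standardID", "ivo://ivoa.net/std/SIA"),
}


def translate(request: dict) -> list[tuple[str, str]]:
    """Turn a user-facing request into registry-level constraints.

    The user layer speaks in terms of data products and wavebands; the
    service layer speaks in schema paths. Keeping the mapping in one
    place lets each interface evolve separately.
    """
    constraints = [USER_TERMS[request["want"]]]
    if "waveband" in request:
        constraints.append(("coverage/waveband", request["waveband"]))
    return constraints
```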
The registry data model (for better or worse) has always been defined
in terms of the XML Schema language, and there is a very natural
candidate for a query language for XML, namely XQuery. However,
although this is easy for the XML-based registries to implement, it
is difficult for the RDBMS-based registries; at the Kyoto meeting it
was agreed to make XQuery an optional query language. It does have
most of the richness required: it allows complex search relations
between different parts of the data model, and allows only the
desired portions of the registry record to be returned. It does not
have any specific "cone search" or "cross match" operators for doing
astrometric selections on registry records, but as I argued above I
think that it is probably up to a higher-level facility to do this
sort of thing. In any case, most XQuery/XPath implementations do allow
you to add custom functions that could support these sorts of
operations on STC coverage information.
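The kind of selection described above (a path-based search that returns only part of each record, plus a custom spatial predicate) can be mimicked with Python's ElementTree, which supports a limited XPath subset. The record structure and the `covers` function are simplified, hypothetical stand-ins for VOResource and STC; this is not an XQuery implementation.

```python
import math
import xml.etree.ElementTree as ET

# Toy records; element and attribute names are simplified stand-ins.
RECORDS = ET.fromstring("""<registry>
  <Resource>
    <title>Survey A</title>
    <capability type="ConeSearch"/>
    <coverage ra="10.0" dec="20.0" radius="1.0"/>
  </Resource>
  <Resource>
    <title>Survey B</title>
    <capability type="SSAP"/>
    <coverage ra="200.0" dec="-30.0" radius="2.0"/>
  </Resource>
</registry>""")


def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def covers(resource, ra, dec):
    """Custom spatial predicate, playing the role of an extension function."""
    c = resource.find("coverage")
    centre = float(c.get("ra")), float(c.get("dec"))
    return angular_sep(ra, dec, *centre) <= float(c.get("radius"))


def titles_covering(ra, dec, cap_type):
    # Path-based selection on the record structure...
    matches = [r for r in RECORDS.findall("Resource")
               if r.find(f"capability[@type='{cap_type}']") is not None]
    # ...then return only the desired portion (the title), not whole records.
    return [r.findtext("title") for r in matches if covers(r, ra, dec)]
```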
The adoption of XQuery as *the* registry query language would be
problematic because of the RDBMS-based registries. I do not have an
easy solution; perhaps some very simplified form of XQuery would do,
one that could be easily translated into SQL. A principal requirement
would be that the query itself was NOT expressed as XML, though....
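As a rough illustration of a simplified, non-XML query language that translates easily into SQL, here is a sketch. It assumes a tiny hypothetical grammar (`path = 'value'` or `path contains 'value'`, joined by `and`) and a hypothetical mapping from schema paths to table columns.

```python
import re

# Hypothetical mapping from the few paths the simplified language allows
# to columns of an RDBMS registry table.
PATH_TO_COLUMN = {
    "title": "title",
    "curation/publisher": "publisher",
    "capability/@standardID": "standard_id",
}

CONDITION = re.compile(r"\s*([\w/@]+)\s*(=|contains)\s*'([^']*)'\s*")


def to_sql(query: str) -> tuple[str, list[str]]:
    """Translate e.g. "title contains 'quasar' and curation/publisher = 'ESO'"
    into a parameterised SQL statement.

    Only 'and' is supported, and values must not themselves contain the
    word ' and ' or a single quote (a limitation of this sketch).
    """
    clauses, params = [], []
    for part in re.split(r"\s+and\s+", query.strip()):
        m = CONDITION.fullmatch(part)
        if m is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        path, op, value = m.groups()
        column = PATH_TO_COLUMN[path]  # KeyError for unsupported paths
        if op == "=":
            clauses.append(f"{column} = ?")
            params.append(value)
        else:
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{value}%")
    return "SELECT identifier FROM resource WHERE " + " AND ".join(clauses), params
```

Values are passed as SQL parameters rather than spliced into the string, so the translation layer does not open an injection hole.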
Paul Harrison