Single or collection resources
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Thu Aug 6 04:28:44 PDT 2009
Hey Markus et al,
Certainly, we have both approaches in practice today in the VO. For
example, Vizier has one ConeSearch service for each catalog that supports
it. This works well because:
a) They have so many catalogs across the broadest astronomical scope
that, registered as a single whole, the collection would be returned by
virtually every subject keyword search of the registry.
b) They have detailed metadata--particularly a finely tuned
description--that can usefully differentiate between the catalogs in
a typical query.
On the other hand, observatory archives also have a diverse set of
collections. A collection resource makes more sense if the metadata is
less rich, particularly compared to the typical query. If "spiral galaxy"
is a typical query, then returning 100 resources related to spiral
galaxies (plus 500 more related to galaxies or spirals) is not so helpful.
My own recommendation given the current state of the VO tools is to prefer
collection resources. You can always go back later and register at a
finer level once our tools can effectively deal with finer-grain
registration.
I did want to highlight some aspects of the registry model that are meant
to help with this issue. First, it is possible to register a data
collection separate from the services that access it. In VOResource
speak, the former is a resource of type DataCollection; the latter would
be a CatalogService with an SIA capability. The DataCollection resource
can reference the service as a "related resource" (of the type
"served-by"). This allows for N DataCollections being registered, but
just one SIA. In particular, people looking specifically for SIA services
will just get one resource from you; those doing more of a subject search
would find the DataCollections.
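
To make the N-collections-to-one-service arrangement concrete, here is a
rough sketch in Python. It is only a toy in-memory model, not real
VOResource records or any actual registry interface: the Resource class,
the ivo://example.archive/... identifiers, the subject keywords, and the
three helper functions are all made up for illustration. The only things
taken from the model described above are the DataCollection and
CatalogService types, the SIA capability, and the "served-by"
relationship type.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy stand-in for a registry record (not a real VOResource schema)."""
    ivoid: str                                          # hypothetical identifier
    rtype: str                                          # "DataCollection" or "CatalogService"
    subjects: list = field(default_factory=list)        # subject keywords
    capabilities: list = field(default_factory=list)    # e.g. ["SIA"]
    relationships: list = field(default_factory=list)   # (relationship type, related ivoid)


# One service record carrying the SIA capability.
SIA = Resource("ivo://example.archive/sia", "CatalogService", capabilities=["SIA"])

# N collection records, each pointing at the single service via "served-by".
collections = [
    Resource(
        f"ivo://example.archive/{name}",
        "DataCollection",
        subjects=subjects,
        relationships=[("served-by", SIA.ivoid)],
    )
    for name, subjects in [
        ("spirals-survey", ["spiral galaxy"]),
        ("cluster-imaging", ["galaxy cluster"]),
        ("nebulae", ["planetary nebula"]),
    ]
]

registry = [SIA] + collections


def find_by_capability(registry, capability):
    """Service-oriented search: records that declare the given capability."""
    return [r for r in registry if capability in r.capabilities]


def find_by_subject(registry, keyword):
    """Subject search: records with a subject containing the keyword."""
    return [r for r in registry if any(keyword in s for s in r.subjects)]


def services_for(registry, resource):
    """Follow "served-by" relationships from a collection to its service(s)."""
    by_id = {r.ivoid: r for r in registry}
    return [
        by_id[target]
        for rel, target in resource.relationships
        if rel == "served-by" and target in by_id
    ]
```

With this layout, find_by_capability(registry, "SIA") yields only the one
service record, while find_by_subject(registry, "spiral") yields the
matching DataCollection, and services_for() is the "jump" from that
collection record to the service that a registry tool would need to offer.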
One barrier to this approach is the extent to which registries support
it. That is, your registry will need to let you (easily) set the related
resource metadata, and registry tools would need to interpret these values
and provide an easily accessible link to jump from the collection record
to the service. I'm not sure how close we are to that; however, I would
definitely like to push to make it possible.
That said, we can approach this in steps. Register your archive as a
DataCollection, and register your SIA as a service to that collection.
Later, you can register the individual image collections as
DataCollections as support for this level of granularity improves.
In closing, I'll note that the DAL services aren't currently good for
finding data based on the science subject matter that the datasets
address. If we consider our goal of re-purposing data, then perhaps they
shouldn't be; science arguably happens at a higher level. The only other
service for doing discovery by subject matter is the registry. We could
use some improvements that allow a user to effectively make the jump
between the very general and the individual dataset.
hope this helps,
Ray