Single or collection resources

Ray Plante rplante at poplar.ncsa.uiuc.edu
Thu Aug 6 04:28:44 PDT 2009


Hey Markus et al,

Certainly, we have both approaches in practice today in the VO.  For 
example, Vizier has one ConeSearch service for each catalog that supports 
it.  This works well because:
   a) They have so many catalogs across the broadest astronomical scope
      that, registered as a whole, the collection would be returned by
      virtually every subject keyword search of the registry.
   b) They have detailed metadata--particularly a finely tuned
      description--that can usefully differentiate between the catalogs in
      a typical query.

On the other hand, observatory archives also have a diverse set of 
collections.  A collection resource makes more sense if the metadata is 
less rich, particularly compared to the typical query.  If "spiral galaxy" 
is a typical query, then returning 100 resources related to spiral 
galaxies (plus 500 more related to galaxies or spirals) is not so helpful.

My own recommendation given the current state of the VO tools is to prefer 
collection resources.  You can always go back later and register at a 
finer level once our tools can effectively deal with finer-grain 
registration.

I did want to highlight some aspects of the registry model that are meant 
to help with this issue.  First, it is possible to register a data 
collection separate from the services that access it.  In VOResource 
speak, the former is a resource of type DataCollection, the latter would 
be a CatalogService with an SIA capability.  The DataCollection resource 
can reference the service as a "related resource" (of the type 
"served-by").  This allows N DataCollections to be registered, but 
just one SIA service.  In particular, people looking specifically for SIA services 
will just get one resource from you; those doing more of a subject search 
would find the DataCollections.
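
To make the shape of this concrete, here is a minimal sketch in Python of 
three DataCollections that all point at one SIA service.  It is only a toy 
in-memory model, not how any registry actually stores or searches records: 
the identifiers, subject keywords, and function names are all made up for 
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    ivoid: str                 # hypothetical identifier, for illustration only
    res_type: str              # "DataCollection" or "CatalogService"
    subjects: set = field(default_factory=set)
    capabilities: set = field(default_factory=set)
    served_by: list = field(default_factory=list)  # ids of services giving access


def find_by_capability(registry, capability):
    """What someone looking specifically for, say, SIA services would get."""
    return [r for r in registry if capability in r.capabilities]


def find_by_subject(registry, keyword):
    """What someone doing a subject keyword search would get."""
    return [r for r in registry if keyword in r.subjects]


SIA_ID = "ivo://example.archive/sia"  # assumed identifier

registry = [
    # One service record carrying the SIA capability.
    Resource(SIA_ID, "CatalogService", capabilities={"SIA"}),
    # N collection records, each related to the service as "served-by".
    Resource("ivo://example.archive/spirals", "DataCollection",
             subjects={"spiral galaxy"}, served_by=[SIA_ID]),
    Resource("ivo://example.archive/clusters", "DataCollection",
             subjects={"galaxy cluster"}, served_by=[SIA_ID]),
    Resource("ivo://example.archive/nebulae", "DataCollection",
             subjects={"planetary nebula"}, served_by=[SIA_ID]),
]

sia_hits = find_by_capability(registry, "SIA")
subject_hits = find_by_subject(registry, "spiral galaxy")
```

Here the capability search returns only the service record, while the 
subject search returns only the matching collection, which still carries 
the identifier of the service that serves it.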

One barrier to this approach is the extent to which registries support 
it.  That is, your registry will need to let you (easily) set the related 
resource metadata, and registry tools would need to interpret these values 
and provide an easily accessible link to jump from the collection record 
to the service.  I'm not sure how close we are to that; however, I would 
definitely like to push to make it possible.
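
As a rough idea of what "interpreting these values" could mean on the tool 
side, the sketch below follows "served-by" relationships from a collection 
record to the service records it names.  The record layout (plain 
dictionaries keyed by identifier), the identifiers, and the access URL are 
assumptions for the example, not the format of any existing registry 
interface.

```python
def services_for(collection, records_by_id):
    """Return the service records a collection names as "served-by"."""
    services = []
    for rel_type, target_id in collection.get("relationships", []):
        # Skip other relationship types and ids this registry does not hold.
        if rel_type == "served-by" and target_id in records_by_id:
            services.append(records_by_id[target_id])
    return services


# Hypothetical records, keyed by identifier.
records = {
    "ivo://example.archive/sia": {
        "title": "Example Archive SIA",
        "type": "CatalogService",
        "access_url": "http://archive.example.org/sia?",
    },
    "ivo://example.archive/spirals": {
        "title": "Example Spiral Galaxy Images",
        "type": "DataCollection",
        "relationships": [("served-by", "ivo://example.archive/sia")],
    },
}

linked = services_for(records["ivo://example.archive/spirals"], records)
```

A tool could then show each entry in `linked` as a link next to the 
collection record, which is the "jump" from collection to service.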

That said, we can approach this in steps.  Register your archive as a
DataCollection, and register your SIA as a service to that collection. 
Later, you can register the individual image collections as 
DataCollections as support for this level of granularity improves.

In closing, I'll note that the DAL services aren't currently good for 
finding data based on the science subject matter that the datasets 
address.  If we consider our goal of re-purposing data, then perhaps they 
shouldn't; science arguably happens at a higher level.  The only other 
service for doing discovery by subject matter is the registry.  We could 
use some improvements that allow a user to effectively make the jump 
between the very general and the dataset.

hope this helps,
Ray


