Single or collection resources
Robert Hanisch
hanisch at stsci.edu
Thu Aug 6 03:15:16 PDT 2009
I would just add that you can put plenty of text in the description metadata
element to make it clear that your collection includes data of certain
types, and use the other metadata tags for instrument names, the subject
element for object types (fine to have more than one). The philosophy
behind the registry is to be inclusive, at least in my view. Users see
resources that may be of interest, and winnow things down by examining the
metadata.
More often than not, I think people are not finding relevant data in the
registry because the data providers have not provided very thorough
metadata. And given that data providers are already quite minimalist in
their metadata, moving toward more individual resources or adding more
metadata tags is not likely to make things better. So I remain in favor of
good quality aggregate collections, with complete metadata descriptions in
the registry.
cheers,
Bob
On 8/5/09 2:22 PM, "Markus Demleitner" <msdemlei at ari.uni-heidelberg.de> wrote:
> Doug, Robert, DAL folks,
>
> Thanks for your responses, but let me follow up --
>
>
> On Tue, Aug 04, 2009 at 09:02:57AM -0400, Robert Hanisch wrote:
>> Hello, Markus. My preference would be for option 2.
> [collection services]
>
>> The resource metadata was intended to describe a data collection, e.g., data
>> from a particular instrument, of a particular class of objects, or
>> pertaining to some phenomenon. In cases where the metadata elements are not
>> unique for the collection (your example of CREATOR) it is perfectly ok to
>> use values like "Various". The metadata for data collections is supposed to
>> aid discovery, not be a full description of each and every image in the
>> collection. The FITS keywords for the individual files should contain
>> additional metadata specific to each image.
> Yes, well; I see that you can embed a fair amount of provenance in
> FITS. On the other hand, such "what *have* you found"-metadata
> is not the only application of RMI metadata. I'm much more worried
> about people looking for "services exposing data from instrument X"
> or "services that have 'carte du ciel' in their description" in the
> registry. With collection services, this would likely not yield the
> desired results.
>
> And of course, the people who supplied you with the data are pleased
> when you can query *their* data using VOExplorer. While that's kind of
> silly, we are (or at least I am) still not in a situation where
> people queue up to get their data into the VO, so things like that
> help.
>
> I guess that's, in a few more words, the case for single services I
> made in the original mail.
>
>> I agree that having the same data served through multiple services is likely
>> to confuse users. And we don't really want an explosion of SIA services,
>> each with a separate registry entry.
> Ye-es... Well, I wasn't sure about that, and this was in part why I
> wrote the first mail: is that "keep the number of services down"
> policy rough consensus within the VO? If it is, that's of course a
> strong case for collection services.
>
>
> On Tue, 4 Aug 2009 13:56:35 -0600, Doug Tody wrote:
>
>> Any of these approaches would work. In general the same data can
>> be available from multiple services (e.g. due to replication of a
>> popular collection) hence this situation cannot really be avoided.
> Yes -- but at least as long as we do not have reliable "artifact ids"
> that would allow clients to weed out duplicates, I have a feeling Bob
> is right and we should *try* to avoid it as best we can. But that's
> just a feeling not based on actual user feedback.
>
>> Re option #2, while the RMI cannot fully describe the data in this case
>> since there are multiple individual collections, the SIA query response
>> can, since each image is separately described. Hence something
>> like CREATOR (DataID.Creator), COLLECTION, PublisherDID, etc. can
>> be specified separately for each image. This metadata is included
> Hm, I suspected that SIAv2 would make such collection services a bit
> more attractive, but that metadata is still only included in the
> *response* and is not available while people are trying to locate
> resources in the registry, so even SIAv2 won't help me much.
>
> I guess what I'd really be looking for would be some way of
> registering the individual data sets and pointing them all to one SIAP
> service. But that's bad as well, since then data will come back that
> doesn't match the registry metadata.
>
> Still a bit at a loss,
>
> Markus