VO and ADEC identifiers
Alberto Accomazzi
aaccomazzi at cfa.harvard.edu
Thu Sep 18 14:24:41 PDT 2003
I completely agree with Patrick's comments about the ambiguity and
overuse of the term "dataset." However, in the context of this thread
(verification of datasets published in the literature), one can
generally assume that the word "dataset" refers to a rather fine-grained
instance of astronomical observations (i.e. a single FITS file +
ancillary data rather than a survey). In that sense, it is unreasonable
(at least in my mind) to assume that each and every dataset will have an
entry in the registry. Even if you consider bigger sets of
observations, each survey paper may refer to its own custom-made
collection (obtained according to some criteria), and it's unreasonable
to think that each of these will be entered in the registry.
Just to add some more perspective (but hopefully not confusion) to the
topic, if one considers the ADS as an archive of bibliographic datasets,
there is no reason not to think of a single record (bibcode) as a datum
that can be verified and linked to. So we could presumably define an
entry in the registry corresponding to ADS as an archive and its
bibliographic datasets as "data collections." It would also make sense
to register a verification service that can be used by other data
centers to create and maintain bibcode links (right now this is
performed using customized tools). However, it would be insane to
consider adding all of its 3.2M bibcodes (now seen as data identifiers)
to the registry.
So I guess my point is we should not assume that the registry contains
*everything* that we may want to obtain metadata about. We can however
assume that it contains entries for all the services that can be used to
get to this metadata.
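
To make the distinction concrete, here is a minimal sketch in Python, not an actual registry or ADS interface. Everything in it is an assumption made for illustration: the names RegistryEntry, REGISTRY, looks_like_bibcode and verify_identifier are hypothetical, the check only tests the 19-character bibcode shape rather than querying an archive, and the bibcode used is a made-up placeholder. The point is that the registry holds one entry per collection and service, while individual identifiers are verified through that service and never stored in the registry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RegistryEntry:
    """One registry record: a data collection plus its verification service."""
    collection: str
    description: str
    verify: Callable[[str], bool]  # hypothetical stand-in for a remote service


def looks_like_bibcode(identifier: str) -> bool:
    # Illustrative check only: 19 characters starting with a 4-digit year.
    # A real verification service would look the record up in the archive.
    return len(identifier) == 19 and identifier[:4].isdigit()


# The registry lists collections and services, not the millions of identifiers.
REGISTRY: Dict[str, RegistryEntry] = {
    "ads": RegistryEntry("ads", "ADS bibliographic records", looks_like_bibcode),
}


def verify_identifier(collection: str, identifier: str) -> Optional[bool]:
    """Return the service's verdict, or None if no service is registered."""
    entry = REGISTRY.get(collection)
    if entry is None:
        return None
    return entry.verify(identifier)


if __name__ == "__main__":
    # Made-up placeholder bibcode, not a real publication.
    print(verify_identifier("ads", "2003XXXXX...1....1A"))
    print(verify_identifier("unregistered", "anything"))
```

A data center wanting to check a link would look up the service in the registry once and then send each identifier to that service.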
-- Alberto
Patrick Dowler wrote:
> On Thursday 18 September 2003 12:36, Tony Linde wrote:
>
>>If there is a single service which sits in front of a collection of
>>datasets, each of which is a table within a database, how does a query sent
>>to the service work? Does it query every dataset with the same criteria?
>>
>>Are all the datasets just blocks within a single table so that a query is
>>effectively on the collection as a whole and the data returned can be from
>>many datasets?
>>
>>If a user queries the registry looking for a service which can provide data
>>of some description, how is the collection of datasets described under a
>>single service? - ie does the metadata (coverage, content etc) embrace all
>>the datasets as if they all existed in a single table?
>>
>>Sorry if this is AstroInformatics 101 :)
>
>
> "dataset" is a heavily (over-)used word. To some people it means one or more
> related files from a telescope or archive (1+ images). To another, the whole
> SDSS source catalog is a dataset (ie. many RDB tables). There are cases
> where a "dataset" is a set of images, spectra, and a source catalog to go
> with it. I think the confusion comes from the fact that "data" is (over-)used
> to mean both the observational data (images, spectra, time series, etc) and
> the derived or extracted information (source catalogs, for example).
>
> Whether the use of data and dataset is over-use can be argued until the end of
> time. It certainly is a vague concept in practice and in my experience even
> individuals tend to use it loosely and differently (which doesn't help :-).
>
--
****************************************************************************
Alberto Accomazzi
NASA Astrophysics Data System http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden Street, MS 31, Cambridge, MA 02138 USA
****************************************************************************