VO and ADEC identifiers

Thu Sep 18 14:24:41 PDT 2003

I completely agree with Patrick's comments about the ambiguity and 
overuse of the term "dataset."  However, in the context of this thread 
(verification of datasets published in the literature), one can 
generally assume that the word "dataset" refers to a rather fine-grained 
instance of astronomical observations (i.e. a single FITS file + 
ancillary data rather than a survey).  In that sense, it is unreasonable 
(at least in my mind) to assume that each and every dataset will have an 
entry in the registry.  Even if you consider bigger sets of 
observations, each survey paper may refer to its own custom-made 
collection (obtained according to some criteria), and it's unreasonable 
to think that each of these will be entered in the registry.

Just to add some more perspective (but hopefully not confusion) to the 
topic, if one considers the ADS as an archive of bibliographic datasets, 
there is no reason not to think of a single record (bibcode) as a datum 
that can be verified and linked to.  So we could presumably define an 
entry in the registry corresponding to ADS as an archive and its 
bibliographic datasets as "data collections."  It would also make sense 
to register a verification service that can be used by other data 
centers to create and maintain bibcode links (right now this is 
performed using customized tools).  However, it would be insane to 
consider adding all of its 3.2M bibcodes (now seen as data identifiers) 
to the registry.

So I guess my point is we should not assume that the registry contains 
*everything* that we may want to obtain metadata about.  We can however 
assume that it contains entries for all the services that can be used to 
get to this metadata.
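
To make that concrete, here is a rough sketch of the idea (nothing more 
than an illustration). The class names, the verify() call and the 
placeholder identifiers are all made up for this example and do not 
correspond to any existing registry interface: the registry holds one 
entry per archive service, and questions about an individual identifier 
are delegated to that service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ServiceEntry:
    """A registry entry describing a service, not an individual record."""
    archive: str                    # hypothetical archive name, e.g. "ADS"
    verify: Callable[[str], bool]   # checks an identifier within that archive


class Registry:
    """Toy registry: one entry per archive service, never one per datum."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        self._services[entry.archive] = entry

    def verify(self, archive: str, identifier: str) -> Optional[bool]:
        """Delegate verification to the archive's own service.

        Returns None when no service is registered for the archive.
        """
        entry = self._services.get(archive)
        if entry is None:
            return None
        return entry.verify(identifier)

    def __len__(self) -> int:
        return len(self._services)


# Stand-in for the archive's holdings; these identifiers are placeholders,
# not real bibcodes.
KNOWN_BIBCODES = {"bibcode-0001", "bibcode-0002"}

registry = Registry()
registry.register(ServiceEntry("ADS", lambda b: b in KNOWN_BIBCODES))
```

The registry stays at a single entry no matter how many identifiers the 
archive holds, yet any identifier can still be checked through the 
registered service, e.g. registry.verify("ADS", "bibcode-0001").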

-- Alberto

Patrick Dowler wrote:
> On Thursday 18 September 2003 12:36, Tony Linde wrote:
> 
>>If there is a single service which sits in front of a collection of
>>datasets, each of which is a table within a database, how does a query
>>sent to the service work? Does it query every dataset with the same
>>criteria?
>>
>>Are all the datasets just blocks within a single table so that a query
>>is effectively on the collection as a whole and the data returned can be
>>from many datasets?
>>
>>If a user queries the registry looking for a service which can provide
>>data of some description, how is the collection of datasets described
>>under a single service? - ie does the metadata (coverage, content etc)
>>embrace all the datasets as if they all existed in a single table?
>>
>>Sorry if this is AstroInformatics 101 :)
> 
> 
> "dataset" is a heavily (over-)used word. To some people it means one or more
> related files from a telescope or archive (1+ images). To another, the whole 
> SDSS source catalog is a dataset (ie. many RDB tables).  There are cases 
> where a "dataset" is a set of images, spectra, and a source catalog to go 
> with it. I think the confusion comes from the fact that "data" is (over-)used
> to mean both the observational data (images, spectra, time series, etc) and
> the derived or extracted information (source catalogs, for example). 
> 
> Whether the use of data and dataset is over-use can be argued until the end of 
> time. It certainly is a vague concept in practice and in my experience even 
> individuals tend to use it loosely and differently (which doesn't help :-).
> 

-- 

****************************************************************************
Alberto Accomazzi
NASA Astrophysics Data System                     http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics      http://cfa-www.harvard.edu
60 Garden Street, MS 31, Cambridge, MA 02138 USA
****************************************************************************