VO and ADEC identifiers

Thu Sep 18 14:58:12 PDT 2003

To expand a bit more on that:
Yes, these datasets are loosely defined as the collections of data
files required to interpret a particular observation or set of
observations - often a tarball.

On the bright side, metadata for all of them are contained in a
registry of sorts: the mission's (observatory's) observation catalog.

One should also be aware of a few other properties:

The identifiers may not lead to the datasets themselves, but to a URL
that allows the user to inspect, browse, and download the dataset(s).

It is at the discretion of the datacenter whether the dataset pointed
to will be the exact same representation of the observation that the
author used or a different (newer, better) version. [No, I am not
inviting another discussion on the essence of the concept of
"sameness" >:-| ]
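
A rough sketch of the first property, purely as an illustration: an
identifier is split into a datacenter part and a datacenter-specific key,
and resolved to a page where the dataset can be inspected, not to the data
files. Everything named here is an assumption made for the example: the
"authority#key" syntax, the LANDING_PAGES table, the "Sa.CXO" label, and
the example.org URL. Which version of the dataset that page then serves
stays with the datacenter and is deliberately not modelled.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical table: one entry per datacenter, not one per dataset.
# The URL template is a placeholder, not a real service.
LANDING_PAGES = {
    "Sa.CXO": "https://example.org/cxo/browse?id={key}",
}


@dataclass(frozen=True)
class DatasetId:
    authority: str  # datacenter responsible for the dataset
    key: str        # the datacenter's own key, e.g. an observation id


def parse_identifier(identifier: str) -> DatasetId:
    """Split an 'authority#key' string (illustrative syntax only)."""
    authority, sep, key = identifier.partition("#")
    if not sep or not authority or not key:
        raise ValueError(f"malformed identifier: {identifier!r}")
    return DatasetId(authority, key)


def landing_url(identifier: str) -> str:
    """Return the URL of a browse/download page, not of the data itself."""
    ds = parse_identifier(identifier)
    try:
        template = LANDING_PAGES[ds.authority]
    except KeyError:
        raise LookupError(f"unknown datacenter: {ds.authority!r}") from None
    # Percent-encode the key so it is safe inside a query string.
    return template.format(key=quote(ds.key, safe=""))
```

With this table, landing_url("Sa.CXO#obs/635") yields a browse-page URL on
the placeholder host; an identifier from a datacenter that is not in the
table raises LookupError rather than guessing.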

  - Arnold

Alberto Accomazzi wrote:
> I completely agree with Patrick's comments about the ambiguity and 
> overuse of the term "dataset."  However, in the context of this thread 
> (verification of datasets published in the literature), one can 
> generally assume that the word "dataset" refers to a rather fine-grained 
> instance of astronomical observations (i.e. a single FITS file + 
> ancillary data rather than a survey).  In that sense, it is unreasonable 
> (at least in my mind) to assume that each and every dataset will have an 
> entry in the registry.  Even if you consider bigger sets of 
> observations, each survey paper may refer to its own custom-made 
> collection (obtained according to some criteria), and it's unreasonable 
> to think that each of these will be entered in the registry.
> 
> Just to add some more perspective (but hopefully not confusion) to the 
> topic, if one considers the ADS as an archive of bibliographic datasets, 
> there is no reason not to think of a single record (bibcode) as a datum 
> that can be verified and linked to.  So we could presumably define an 
> entry in the registry corresponding to ADS as an archive and its 
> bibliographic datasets as "data collections."  It would also make sense 
> to register a verification service that can be used by other data 
> centers to create and maintain bibcode links (right now this is 
> performed using customized tools).  However, it would be insane to 
> consider adding all of its 3.2M bibcodes (now seen as data identifiers) 
> to the registry.
> 
> So I guess my point is we should not assume that the registry contains 
> *everything* that we may want to obtain metadata about.  We can however 
> assume that it contains entries for all the services that can be used to 
> get to this metadata.
> 
> -- Alberto
> 
> 
> Patrick Dowler wrote:
> > On Thursday 18 September 2003 12:36, Tony Linde wrote:
> > 
> >>If there is a single service which sits in front of a collection of
> >>datasets, each of which is a table within a database, how does a query
> >>sent to the service work? Does it query every dataset with the same
> >>criteria?
> >>
> >>Are all the datasets just blocks within a single table so that a query
> >>is effectively on the collection as a whole and the data returned can
> >>be from many datasets?
> >>
> >>If a user queries the registry looking for a service which can provide
> >>data of some description, how is the collection of datasets described
> >>under a single service? - ie does the metadata (coverage, content etc)
> >>embrace all the datasets as if they all existed in a single table?
> >>
> >>Sorry if this is AstroInformatics 101 :)
> > 
> > 
> > "dataset" is a heavily (over-)used word. To some people it means one or more
> > related files from a telescope or archive (1+ images). To another, the whole 
> > SDSS source catalog is a dataset (ie. many RDB tables).  There are cases 
> > where a "dataset" is a set of images, spectra, and a source catalog to go 
> > with it. I think the confusion comes from the fact that "data" is (over-)used
> > to mean both the observational data (images, spectra, time series, etc) and
> > the derived or extracted information (source catalogs, for example). 
> > 
> > Whether the use of data and dataset is over-use can be argued until the end of 
> > time. It certainly is a vague concept in practice and in my experience even 
> > individuals tend to use it loosely and differently (which doesn't help :-).
> > 
> 
> 
> -- 
> 
> ****************************************************************************
> Alberto Accomazzi
> NASA Astrophysics Data System                     http://adswww.harvard.edu
> Harvard-Smithsonian Center for Astrophysics      http://cfa-www.harvard.edu
> 60 Garden Street, MS 31, Cambridge, MA 02138 USA
> ****************************************************************************
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head-cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------