ADEC and VO data registration

Arnold Rots arots at head-cfa.cfa.harvard.edu
Fri Sep 19 06:59:24 PDT 2003


Doug Tody wrote:
> Arnold -
> 
> This one may come down to a matter of preference so here is my view.  Both
> forms can work but since the authority ID and resource ID are fundamental to
> what is being described, they are best represented explicitly in the syntax.

I'm not sure the distinction is fundamental for the references from
the literature - hence my question.
My aim is maximum simplicity for the users (i.e., authors) and
compatibility with IVO identifiers.

> 
> Some advantages of the first form:
> 
>     o	The distinction between the authority ID and the resource (data
>     	collection in this case) is clear from the syntax.  Otherwise
> 	one has to look up the authority ID metadata and apply some
> 	heuristics to determine what is being referred to, what is the
> 	authority ID and what is the resource.  This would be much more
> 	prone to interpretation error, and would be more complex, as
> 	runtime queries would be needed to resolve what otherwise would
> 	be clear from the syntax alone.

As I said, I don't think that distinction is meaningful in this
particular context.
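
To make the two candidate forms concrete, here is a minimal Python sketch
(not part of any ADEC or IVO specification) that splits either form into
its parts.  The names Identifier and parse_identifier are made up for
illustration, and treating everything after the slash as the dataset Id
when there is no #-sign is an assumption based on the examples in this
thread.

```python
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    """Parts of an identifier (hypothetical type, for illustration only)."""
    authority: str
    resource_key: Optional[str]  # None when the identifier has no '#'
    dataset: str


def parse_identifier(text: str) -> Identifier:
    """Split an identifier of either form into its parts.

    Three-element form: <AuthorityId>/<ResourceKey>#<DatasetId>
    Two-element form:   <AuthorityId>/<DatasetId>
    """
    authority, slash, rest = text.partition("/")
    if not slash or not authority or not rest:
        raise ValueError(f"not a recognized identifier: {text!r}")
    if "#" in rest:
        resource_key, _, dataset = rest.partition("#")
        if not resource_key or not dataset:
            raise ValueError(f"empty resource key or dataset Id: {text!r}")
        return Identifier(authority, resource_key, dataset)
    # Assumption: with no '#', the part after '/' is the dataset Id.
    return Identifier(authority, None, rest)
```

With this sketch, Sa.HST/STIS#O4LT010E0 yields the authority Sa.HST and the
resource key STIS directly from the syntax, while Sa.HST.STIS/O4LT010E0
yields Sa.HST.STIS as one opaque authority string and no resource key.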

> 
>     o	Separating out the resource in this way makes it easier to associate
>     	multiple resources with the same authority, e.g., different data
> 	collections, or a service which goes along with a data collection.
> 
> 	For example, if Sa.HST.STIS is an authority ID, then
> 	Sa.HST.STIS/STIS-V1.1 might be a versioned data collection
> 	controlled by the authority Sa.HST.STIS, and Sa.HST.STIS/sia might
> 	be a SIA service for this data collection, again controlled by
> 	the same authority.  I included STIS in the authority ID here
> 	to illustrate how hard it is to tell from the name alone what
> 	is being referred to - Sa.HST.STIS might well be the naming
> 	authority for STIS data, and not the/a STIS data collection.

I would strongly caution against using the authority Ids designed for
literature links for any other purpose.

> 
>     o	By separating out the resource ID there are fewer restrictions
> 	on the form this takes, e.g., a longer name could be used to
> 	describe different versions of a data collection in the naming
> 	syntax alone (you could do this with the second form as well but
> 	it would result in less consistent authority ID names).

I think that's a matter of preference (like most of this).

> 
> Although current ADEC proposals emphasize naming individual datasets,
> any scheme intended for publications should recognize data collections as
> well as datasets.  One project might analyze only a few datasets which
> are explicitly referenced in a paper, while another project may perform
> statistical analysis of many datasets and it will be more appropriate to
> reference the entire data collection in a published paper.

What we are planning for Chandra is to create custom "container"
identifiers for large collections of observations: instead of having
to specify 300 strings Sa.CXO/<n> where <n> takes on 300 different
values, we would create a custom Id that can be referred to as
Sa.CXO/DougsProject, which will internally point to the 300
observations that you used.
I think such an explicit mechanism is far preferable to anybody
vaguely referring to "the entire collection Sa.HST.STIS" (or Sa.HST/STIS).

But this is really getting into implementation details at the
datacenter level.
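
As a rough illustration of the container idea, the following Python
sketch keeps a lookup table from a container Id to its member
observations.  The table CONTAINERS, the function expand, and the
observation numbers 4001 and 4002 are placeholders invented for this
example; how a datacenter would actually store this is not specified here.

```python
from typing import Dict, List

# Hypothetical table kept by the datacenter: container Id -> member Ids.
# A real container might list 300 observations; three placeholders suffice.
CONTAINERS: Dict[str, List[str]] = {
    "Sa.CXO/DougsProject": [f"Sa.CXO/{n}" for n in (4000, 4001, 4002)],
}


def expand(identifier: str) -> List[str]:
    """Return the dataset identifiers that an identifier stands for.

    A container Id expands to its members; any other Id stands for itself.
    """
    return list(CONTAINERS.get(identifier, [identifier]))
```

An author would then cite only Sa.CXO/DougsProject, and the datacenter
would resolve it to the full list of observations.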

> 
> I like the use of # to delimit the resource-specific namespace (e.g.
> dataset ID), so long as this does not change when the ID is used in
> different contexts.
> 
> > Sa.HST.STIS/O4LT010E0
> > Sa.HST.WFPC2/U32L0104T
> 
> Ignoring for the moment the blending of authority and resource, it might be
> better here to use names like
> 
>     STIS.HST.Sa
>     WFPC2.HST.Sa
> 
> to be more consistent with existing DNS usage.  I prefer left-to-right
> myself from a logical point of view, but unless there are other existing
> conventions pushing us in this direction we should be consistent with
> common URL usage or it will just confuse everyone.

Ah, that's a messy issue.  I don't particularly care, but might note
that a host name like aoc.nrao.edu is actually the form that's backwards.
Everything else in a URL goes left-to-right from most to least significant,
and that's how we write our numbers and how, for instance, DOI is designed.
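
The two orderings differ only in the direction of the dotted components,
so converting between them is mechanical.  A small Python sketch (the
function name reverse_components is made up for this example):

```python
def reverse_components(name: str) -> str:
    """Reverse the order of the dot-separated components of a name."""
    return ".".join(reversed(name.split(".")))
```

Applied to Sa.HST.STIS this gives the DNS-style STIS.HST.Sa, and applied
to aoc.nrao.edu it gives the most-significant-first edu.nrao.aoc.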


> 
> 	- Doug
> 
> 
> 
> On Thu, 18 Sep 2003, Arnold Rots wrote:
> 
> > In the interest of simplicity for authors, can anyone explain what the
> > advantage is of this three-element Identifier definition:
> > 
> >         <AuthorityId>/<ResourceKey>#<DatasetId>
> > 
> > which would result in things like:
> > 
> >         Sa.CXO/4000
> >         Sa.HST/STIS#O4LT010E0
> >         Sa.HST/WFPC2#U32L0104T
> >         Sa.IUE/LWP25899
> > 
> > Over the two-element identifier:
> > 
> >         <AuthorityId>/<DatasetId>
> > 
> > that would result in identifiers like:
> > 
> >         Sa.CXO/4000
> >         Sa.HST.STIS/O4LT010E0
> >         Sa.HST.WFPC2/U32L0104T
> >         Sa.IUE/LWP25899
> > 
> > In both cases the same number of resources have to be registered,
> > though in the second case they are all different authority Ids, while
> > in the first case some of them are resource keys.
> > Actually, come to think of it, the first case requires more registry
> > records since the authority Ids as well as the resource keys need to
> > be registered.
> > In either case Sa.HST.STIS and Sa.HST/STIS need to be resolved to a
> > physical location.  What's the difference?
> > 
> > I don't see any advantage and unless someone can convince us that it's
> > a much better idea, I propose that we drop the #-sign and return to
> > the two-element model - it's simpler and cleaner.
> > 
> >   - Arnold
> > 
> > --------------------------------------------------------------------------
> > Arnold H. Rots                                Chandra X-ray Science Center
> > Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
> > 60 Garden Street, MS 67                              fax:  +1 617 495 7356
> > Cambridge, MA 02138                             arots at head-cfa.harvard.edu
> > USA                                     http://hea-www.harvard.edu/~arots/
> > --------------------------------------------------------------------------
> 


