ADEC and VO data registration

Doug Tody dtody at nrao.edu
Thu Sep 18 14:59:51 PDT 2003


Arnold -

This one may come down to a matter of preference, so here is my view.  Both
forms can work, but since the authority ID and resource ID are fundamental to
what is being described, they are best represented explicitly in the syntax.

Some advantages of the first form:

    o	The distinction between the authority ID and the resource (data
	collection in this case) is clear from the syntax.  Otherwise
	one has to look up the authority ID metadata and apply
	heuristics to determine which part is the authority ID and
	which is the resource.  That is more complex and more prone to
	interpretation error, since runtime queries would be needed to
	resolve what would otherwise be clear from the syntax alone.

    o	Separating out the resource in this way makes it easier to associate
    	multiple resources with the same authority, e.g., different data
	collections, or a service which goes along with a data collection.

	For example, if Sa.HST.STIS is an authority ID, then
	Sa.HST.STIS/STIS-V1.1 might be a versioned data collection
	controlled by the authority Sa.HST.STIS, and Sa.HST.STIS/sia might
	be a SIA service for this data collection, again controlled by
	the same authority.  I included STIS in the authority ID here
	to illustrate how hard it is to tell from the name alone what
	is being referred to - Sa.HST.STIS might well be the naming
	authority for STIS data, and not the/a STIS data collection.

    o	By separating out the resource ID there are fewer restrictions
	on the form this takes, e.g., a longer name could be used to
	describe different versions of a data collection in the naming
	syntax alone (you could do this with the second form as well but
	it would result in less consistent authority ID names).
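
To illustrate the first point, here is a rough Python sketch of how the
three-element form can be taken apart from the syntax alone, with no
registry lookup.  This is only an illustration under my own assumptions,
not part of any ADEC or VO specification; the names Identifier and
parse_identifier are hypothetical.

```python
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    """Hypothetical container for the three parts of an identifier."""
    authority_id: str
    resource_key: Optional[str]
    dataset_id: Optional[str]


def parse_identifier(text: str) -> Identifier:
    """Split '<AuthorityId>/<ResourceKey>#<DatasetId>' using syntax only.

    The resource key and dataset ID are optional; missing parts are None.
    """
    # Everything before the first '/' is the authority ID.
    authority, _slash, rest = text.partition("/")
    if not authority:
        raise ValueError(f"missing authority ID in {text!r}")
    # Within the remainder, '#' separates the resource key from the dataset ID.
    resource, _hash, dataset = rest.partition("#")
    return Identifier(authority, resource or None, dataset or None)


if __name__ == "__main__":
    for name in ("Sa.HST/STIS#O4LT010E0", "Sa.HST.STIS/sia", "Sa.HST.STIS"):
        print(name, "->", parse_identifier(name))
```

With the two-element form the same function cannot say whether
Sa.HST.STIS names an authority or a data collection; that would take a
lookup of the authority ID metadata.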

Although current ADEC proposals emphasize naming individual datasets,
any scheme intended for publications should recognize data collections as
well as datasets.  One project might analyze only a few datasets, which
are explicitly referenced in a paper, while another might perform a
statistical analysis of many datasets, in which case it is more appropriate
to reference the entire data collection in the published paper.

I like the use of # to delimit the resource-specific namespace (e.g.
dataset ID), so long as this does not change when the ID is used in
different contexts.

> Sa.HST.STIS/O4LT010E0
> Sa.HST.WFPC2/U32L0104T

Ignoring for the moment the blending of authority and resource, it might be
better here to use names like

    STIS.HST.Sa
    WFPC2.HST.Sa

to be more consistent with existing DNS usage.  I prefer left-to-right
myself from a logical point of view, but unless there are other existing
conventions pushing us in this direction we should be consistent with
common URL usage or it will just confuse everyone.
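
The two orderings carry the same information, so converting between them
is mechanical.  A minimal Python sketch, assuming the components are
separated by dots and contain no dots themselves (to_dns_order is a
hypothetical name):

```python
def to_dns_order(authority_id: str) -> str:
    """Reverse the dotted components, e.g. most-general-first to DNS order."""
    return ".".join(reversed(authority_id.split(".")))


if __name__ == "__main__":
    for name in ("Sa.HST.STIS", "Sa.HST.WFPC2"):
        print(name, "->", to_dns_order(name))
```

Applying the function twice gives back the original name, so the choice
is purely one of convention.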

	- Doug



On Thu, 18 Sep 2003, Arnold Rots wrote:

> In the interest of simplicity for authors, can anyone explain what the
> advantage is of this three-element Identifier definition:
> 
>         <AuthorityId>/<ResourceKey>#<DatasetId>
> 
> which would result in things like:
> 
>         Sa.CXO/4000
>         Sa.HST/STIS#O4LT010E0
>         Sa.HST/WFPC2#U32L0104T
>         Sa.IUE/LWP25899
> 
> Over the two-element identifier:
> 
>         <AuthorityId>/<DatasetId>
> 
> that would result in identifiers like:
> 
>         Sa.CXO/4000
>         Sa.HST.STIS/O4LT010E0
>         Sa.HST.WFPC2/U32L0104T
>         Sa.IUE/LWP25899
> 
> In both cases the same number of resources have to be registered,
> though in the second case they are all different authority Ids, while
> in the first case some of them are resource keys.
> Actually, come to think of it, the first case requires more registry
> records since the authority Ids as well as the resource keys need to
> be registered.
> In either case Sa.HST.STIS and Sa.HST/STIS need to be resolved to a
> physical location.  What's the difference?
> 
> I don't see any advantage and unless someone can convince us that it's
> a much better idea, I propose that we drop the #-sign and return to
> the two-element model - it's simpler and cleaner.
> 
>   - Arnold
> 
> --------------------------------------------------------------------------
> Arnold H. Rots                                Chandra X-ray Science Center
> Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
> 60 Garden Street, MS 67                              fax:  +1 617 495 7356
> Cambridge, MA 02138                             arots at head-cfa.harvard.edu
> USA                                     http://hea-www.harvard.edu/~arots/
> --------------------------------------------------------------------------


