ADEC and VO data registration
Arnold Rots
arots at head-cfa.cfa.harvard.edu
Fri Sep 19 06:59:24 PDT 2003
Doug Tody wrote:
> Arnold -
>
> This one may come down to a matter of preference so here is my view. Both
> forms can work but since the authority ID and resource ID are fundamental to
> what is being described, they are best represented explicitly in the syntax.
I'm not sure the distinction is fundamental for the references from
the literature - hence my question.
My aim is maximum simplicity for the users (i.e., authors) and
compatibility with IVO identifiers.
>
> Some advantages of the first form:
>
> o The distinction between the authority ID and the resource (data
> collection in this case) is clear from the syntax. Otherwise
> one has to look up the authority ID metadata and apply some
> heuristics to determine what is being referred to, what is the
> authority ID and what is the resource. This would be much more
> prone to interpretation error, and would be more complex, as
> runtime queries would be needed to resolve what otherwise would
> be clear from the syntax alone.
As I said, I don't think that distinction is meaningful in this
particular context.
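
To make the comparison concrete, here is a minimal sketch of how a client might split either form from the syntax alone. The names Identifier and parse_identifier are hypothetical, and the rule that a "#" marks the three-element form <AuthorityId>/<ResourceKey>#<DatasetId> while its absence means <AuthorityId>/<DatasetId> is an assumption made for illustration, not an agreed convention. Without a "#" the syntax cannot say whether the part after the slash is a dataset or a collection, so the sketch simply treats it as a dataset Id.

```python
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    """Hypothetical container for the parts of a dataset identifier."""
    authority_id: str
    resource_key: Optional[str]  # None when no "#" is present
    dataset_id: str


def parse_identifier(text: str) -> Identifier:
    """Split an identifier on the first "/" and, if present, the first "#".

    Assumption: "#" only ever separates a resource key from a dataset Id.
    """
    authority_id, slash, rest = text.partition("/")
    if not slash or not authority_id or not rest:
        raise ValueError(f"expected <AuthorityId>/<...>, got {text!r}")

    if "#" in rest:
        # Three-element form: <AuthorityId>/<ResourceKey>#<DatasetId>
        resource_key, _, dataset_id = rest.partition("#")
        if not resource_key or not dataset_id:
            raise ValueError(f"empty resource key or dataset Id in {text!r}")
        return Identifier(authority_id, resource_key, dataset_id)

    # Two-element form: <AuthorityId>/<DatasetId>
    return Identifier(authority_id, None, rest)
```

With this sketch, Sa.HST/STIS#O4LT010E0 yields the authority Sa.HST, the resource key STIS and the dataset O4LT010E0, while Sa.HST.STIS/O4LT010E0 yields the authority Sa.HST.STIS and the same dataset with no resource key. Either way no registry query is needed just to split the string.
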
>
> o Separating out the resource in this way makes it easier to associate
> multiple resources with the same authority, e.g., different data
> collections, or a service which goes along with a data collection.
>
> For example, if Sa.HST.STIS is an authority ID, then
> Sa.HST.STIS/STIS-V1.1 might be a versioned data collection
> controlled by the authority Sa.HST.STIS, and Sa.HST.STIS/sia might
> be a SIA service for this data collection, again controlled by
> the same authority. I included STIS in the authority ID here
> to illustrate how hard it is to tell from the name alone what
> is being referred to - Sa.HST.STIS might well be the naming
> authority for STIS data, and not the/a STIS data collection.
I would strongly caution against using the authority Ids designed for
literature links for any other purpose.
>
> o By separating out the resource ID there are fewer restrictions
> on the form this takes, e.g., a longer name could be used to
> describe different versions of a data collection in the naming
> syntax alone (you could do this with the second form as well but
> it would result in less consistent authority ID names).
I think that's a matter of preference (like most of this).
>
> Although current ADEC proposals emphasize naming individual datasets,
> any scheme intended for publications should recognize data collections as
> well as datasets. One project might analyze only a few datasets which
> are explicitly referenced in a paper, while another project may perform
> statistical analysis of many datasets and it will be more appropriate to
> reference the entire data collection in a published paper.
What we are planning for Chandra is to create custom "container"
identifiers for large collections of observations: instead of having
to specify 300 strings Sa.CXO/<n> where <n> takes on 300 different
values, we would create a custom Id that can be referred to as
Sa.CXO/DougsProject, which will internally point to the 300
observations that you used.
I think such an explicit mechanism is far preferable to anybody
vaguely referring to "the entire collection Sa.HST.STIS" (or Sa.HST/STIS).
But this is really getting into implementation details at the
datacenter level.
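
As a rough illustration of the container idea, the sketch below expands a container Id into its member identifiers. It is not the Chandra implementation: the table CONTAINERS, the function expand and the observation numbers 1 to 300 are placeholders invented for the example, and a real datacenter would presumably keep this mapping in its own database.

```python
from typing import Dict, List

# Hypothetical container table: one custom Id pointing at many observations.
# The numbers 1..300 are placeholders, not real Chandra observation Ids.
CONTAINERS: Dict[str, List[str]] = {
    "Sa.CXO/DougsProject": [f"Sa.CXO/{n}" for n in range(1, 301)],
}


def expand(identifier: str, containers: Dict[str, List[str]] = CONTAINERS) -> List[str]:
    """Return the member identifiers of a container Id.

    An identifier that is not a known container is returned unchanged,
    as a one-element list.
    """
    return list(containers.get(identifier, [identifier]))
```

An author would then cite the single string Sa.CXO/DougsProject, and the datacenter would expand it to the 300 observations on request. An ordinary identifier such as Sa.CXO/4000 passes through unchanged.
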
>
> I like the use of # to delimit the resource-specific namespace (e.g.
> dataset ID), so long as this does not change when the ID is used in
> different contexts.
>
> > Sa.HST.STIS/O4LT010E0
> > Sa.HST.WFPC2/U32L0104T
>
> Ignoring for the moment the blending of authority and resource, it might be
> better here to use names like
>
> STIS.HST.Sa
> WFPC2.HST.Sa
>
> to be more consistent with existing DNS usage. I prefer left-to-right
> myself from a logical point of view, but unless there are other existing
> conventions pushing us in this direction we should be consistent with
> common URL usage or it will just confuse everyone.
Ah, that's a messy issue. I don't particularly care, but I might note
that a host name like aoc.nrao.edu is actually the part that's backwards.
The rest of any URL goes left-to-right from most to least significant, and
that's how we write our numbers and how, for instance, DOI is designed.
>
> - Doug
>
>
>
> On Thu, 18 Sep 2003, Arnold Rots wrote:
>
> > In the interest of simplicity for authors, can anyone explain what the
> > advantage is of this three-element Identifier definition:
> >
> > <AuthorityId>/<ResourceKey>#<DatasetId>
> >
> > which would result in things like:
> >
> > Sa.CXO/4000
> > Sa.HST/STIS#O4LT010E0
> > Sa.HST/WFPC2#U32L0104T
> > Sa.IUE/LWP25899
> >
> > Over the two-element identifier:
> >
> > <AuthorityId>/<DatasetId>
> >
> > that would result in identifiers like:
> >
> > Sa.CXO/4000
> > Sa.HST.STIS/O4LT010E0
> > Sa.HST.WFPC2/U32L0104T
> > Sa.IUE/LWP25899
> >
> > In both cases the same number of resources have to be registered,
> > though in the second case they are all different authority Ids, while
> > in the first case some of them are resource keys.
> > Actually, come to think of it, the first case requires more registry
> > records since the authority Ids as well as the resource keys need to
> > be registered.
> > In either case Sa.HST.STIS and Sa.HST/STIS need to be resolved to a
> > physical location. What's the difference?
> >
> > I don't see any advantage and unless someone can convince us that it's
> > a much better idea, I propose that we drop the #-sign and return to
> > the two-element model - it's simpler and cleaner.
> >
> > - Arnold
> >
> > --------------------------------------------------------------------------
> > Arnold H. Rots Chandra X-ray Science Center
> > Smithsonian Astrophysical Observatory tel: +1 617 496 7701
> > 60 Garden Street, MS 67 fax: +1 617 495 7356
> > Cambridge, MA 02138 arots at head-cfa.harvard.edu
> > USA http://hea-www.harvard.edu/~arots/
> > --------------------------------------------------------------------------
>
--------------------------------------------------------------------------
Arnold H. Rots Chandra X-ray Science Center
Smithsonian Astrophysical Observatory tel: +1 617 496 7701
60 Garden Street, MS 67 fax: +1 617 495 7356
Cambridge, MA 02138 arots at head-cfa.harvard.edu
USA http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------