Cube model - Dataset IDs

Douglas Tody dtody at
Wed Mar 18 23:02:54 CET 2015

The concept for DataID.datasetID dates back to the spectrum DM; it was
originally suggested by Jonathan.  The idea is that the PublisherDID is
assigned by the local archive; CreatorDID by the dataset creator (survey,
pipeline, etc.), and DatasetID is for a global index DID such as would
be assigned by the ADS.

PublisherDID should be reserved for the archive / data center; it is the
only DID we can really count on being assigned.  If we start linking
publications back to archival data (a long-standing goal), then supporting
the ADS DID (datasetID) is important.  Although the PublisherDID could
possibly be used for an ADS DID, that is bending the rules a bit (the
ADS is not the actual publisher; the archive would no longer be able to
consistently assign PubDIDs), and I think the distinction is still
useful.  A non-null DatasetID, for example, would tell us something unique.
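
As a rough illustration of the three roles described above, here is a
minimal Python sketch.  The class name, field names, helper functions and
all identifier strings are hypothetical placeholders, not taken from any
IVOA document; the only assumptions carried over from this thread are who
assigns each ID, that only the PublisherDID can be counted on, and that
copies of one dataset at different archives share a CreatorDID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetIdentifiers:
    """Hypothetical container for the three dataset IDs in this thread."""
    publisher_did: str                 # assigned by the archive / data center
    creator_did: Optional[str] = None  # assigned by the creator (survey, pipeline, ...)
    dataset_id: Optional[str] = None   # global index DID, e.g. one assigned by the ADS


def has_global_id(ids: DatasetIdentifiers) -> bool:
    """A non-null dataset_id says a global index has registered the dataset."""
    return ids.dataset_id is not None


def same_creation(a: DatasetIdentifiers, b: DatasetIdentifiers) -> bool:
    """Copies held at different archives are expected to share a creator DID."""
    return a.creator_did is not None and a.creator_did == b.creator_did


# Made-up identifier strings, for illustration only.
original = DatasetIdentifiers(
    publisher_did="ivo://archive-a.example/cubes#0001",
    creator_did="ivo://survey.example/dr1#cube-42",
)
mirror = DatasetIdentifiers(
    publisher_did="ivo://archive-b.example/holdings#77",
    creator_did="ivo://survey.example/dr1#cube-42",
    dataset_id="global-index.example/cube-42",
)
```

Keeping dataset_id as a separate optional field is what lets a check like
has_global_id() carry information; folding it into publisher_did would
lose that signal.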

 	- Doug

On Wed, 18 Mar 2015, CresitelloDittmar, Mark wrote:

> All,
> I'm working to get an update to the cube docs out and have a question based
> on Markus' inputs to the Spectral doc.
> Dataset Metadata has 3 dataset IDs defined, which originate from the
> ObsCore and Spectrum docs.
>  1) Curation.publisherDID
>       obscore: IVOA ID assigned by publisher
>       spectrum: string locating dataset within publisher holdings
>       ssa: IVOA ID assigned by publisher, with no meaning outside that
>            namespace.
>  2) DataID.datasetID
>       obscore: <not in obscore>
>       spectrum: IVOA ID assigned by publisher
>       ssa: IVOA ID assigned by publisher (or maybe someone else?)
>  3) DataID.creatorDID
>       all: IVOA ID assigned by creator
> The question is in the distinction between 1 & 2.
> The best I can gather is in persistence.  The Curation.publisherDID can
> change over time, while
> the DataID.datasetID is supposed to be persistent.  If the publisher
> re-locates the dataset, or changes the interface, this could affect the
> publisherDID, but shouldn't affect the datasetID.  The datasetID folds in
> the possibility of being a journal-based ID (e.g. from ADS) to provide a
> static ID.  (Note: Obscore does state that the publisherDID should remain
> static through time.)
> I'm not sure what to do with these.  Are they both still necessary, or have
> the usages crystallized enough to consolidate them?  It looks like TAP
> services must rely on Curation.publisherDID, with the expectation that it
> is static.  The SSA/SIA services have, I think, both, and appear to rely on
> publisherDID more.
> I'm leaning heavily toward Markus' suggestion to consolidate them, i.e. drop
> DataID.datasetID.
> Describe Curation.publisherDID as an IVOA ID (therefore globally unique
> since the publisher has a unique authority id), which identifies the
> dataset within the publisher's holdings.  The ID should be persistent and
> may be a journal-based ID (e.g. from ADS).  The same dataset published at
> multiple locations would have different publisherDIDs, but the same
> creatorDID.
> Thoughts?
> Mark
