Cube model - Dataset IDs

Douglas Tody dtody at nrao.edu
Thu Mar 19 00:13:31 CET 2015


More on this:

If ADS (or some other global index) is the publisher, linking to
externally published datasets, then yes, they could just use the PubDID
(their PubDID) in their index.  The issue I was referring to is the case
where an archive/data center curates a dataset, and selected datasets are
linked to publications via (e.g.) the ADS; then it can be useful to
replicate the ADS DID in the archive.  From an archive-centric point of
view, it can be useful to indicate which datasets are linked to external,
global publishers.  The alternative is to either drop the global DatasetID
in local archives, or use it as the local PubDID for selected datasets.
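
To make the archive-centric case concrete, here is a minimal sketch of an
archive record that carries all three identifiers.  The class name, field
names, and identifier strings are all hypothetical placeholders, not taken
from any IVOA document or from the ADS; the only point is that a non-null
global dataset ID lets the archive pick out the datasets linked to an
external, global publisher.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetIds:
    """Hypothetical container for the three dataset identifiers."""
    publisher_did: str                 # assigned by the local archive / data center
    creator_did: Optional[str] = None  # assigned by the dataset creator (survey, pipeline, ...)
    dataset_id: Optional[str] = None   # assigned by a global index (e.g. the ADS), if any


def linked_to_global_index(records: List[DatasetIds]) -> List[DatasetIds]:
    """Return the records that carry a global dataset ID."""
    return [r for r in records if r.dataset_id is not None]


# Made-up identifiers, for illustration only.
holdings = [
    DatasetIds(
        publisher_did="ivo://example.archive/data?obs-0001",
        creator_did="ivo://example.survey/pipeline?obs-0001",
        dataset_id="global-index/EXAMPLE#obs-0001",
    ),
    DatasetIds(publisher_did="ivo://example.archive/data?obs-0002"),
]

linked = linked_to_global_index(holdings)
for record in linked:
    print(record.publisher_did, "->", record.dataset_id)
```

Every record keeps its locally assigned PubDID, so the archive can still
assign PubDIDs consistently; the global ID is an optional extra.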

 	- Doug


On Wed, 18 Mar 2015, Douglas Tody wrote:

> The concept for DataID.datasetID dates back to the spectrum DM; it was
> originally suggested by Jonathan.  The idea is that the PublisherDID is
> assigned by the local archive; CreatorDID by the dataset creator (survey,
> pipeline, etc.), and DatasetID is for a global index DID such as would
> be assigned by the ADS.
>
> PublisherDID should be reserved for the archive / data center; it is the
> only DID we can really count on being assigned.  If we start linking
> publications back to archival data (a long-time goal) then supporting
> the ADS DID (datasetID) is important.  Although the PublisherDID could
> possibly be used for an ADS DID, that is bending the rules a bit (the
> ADS is not the actual publisher; the archive would no longer be able to
> consistently assign PubDIDs), and I think the distinction is still
> useful.  A non-null DatasetID, for example, would tell us something unique.
>
> 	- Doug
>
>
> On Wed, 18 Mar 2015, CresitelloDittmar, Mark wrote:
>
>> All,
>> 
>> I'm working to get an update to the cube docs out and have a question based
>> on Markus' inputs to the Spectral doc.
>> 
>> Dataset Metadata has 3 dataset IDs defined, which originate from the
>> ObsCore and Spectrum docs.
>>  1) Curation.publisherDID
>>       obscore: IVOA ID assigned by publisher
>>       spectrum: string locating dataset within publisher holdings
>>       ssa: IVOA ID assigned by publisher, with no meaning outside that
>>            namespace.
>>
>>  2) DataID.datasetID
>>       obscore: <not in obscore>
>>       spectrum: IVOA ID assigned by publisher
>>       ssa: IVOA ID assigned by publisher (or maybe someone else?)
>>
>>  3) DataID.creatorDID
>>       all: IVOA ID assigned by creator
>> 
>> The question is about the distinction between 1 & 2.
>> As best I can gather, it lies in persistence.  The Curation.publisherDID
>> can change over time, while the DataID.datasetID is supposed to be
>> persistent.  If the publisher re-locates the dataset, or changes the
>> interface, this could affect the publisherDID, but shouldn't affect the
>> datasetID.  The datasetID folds in the possibility of being a
>> journal-based ID (e.g. from ADS) to provide a static ID.  (Note: Obscore
>> does state that the publisherDID should remain static through time.)
>> 
>> I'm not sure what to do with these.  Are they both still necessary, or have
>> the usages crystallized enough to consolidate them?  It looks like TAP
>> services must rely on Curation.publisherDID, with the expectation that it
>> is static.  The SSA/SIA services have, I think, both, and appear to rely on
>> publisherDID more.
>> 
>> I'm leaning heavily toward Markus' suggestion to consolidate them: drop
>> DataID.datasetID.
>> Describe Curation.publisherDID as an IVOA ID (therefore globally unique
>> since the publisher has a unique authority id), which identifies the
>> dataset within the publisher's holdings.  The ID should be persistent and
>> may be a journal-based ID (e.g. from ADS).  The same dataset published at
>> multiple locations would have different publisherDIDs, but the same
>> creatorDID.
>> 
>> Thoughts?
>> Mark
>> 
>

