[Dataset] Model document update

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Mar 31 11:46:56 CEST 2016

Hi Mark,

On Tue, Mar 29, 2016 at 02:18:04PM -0400, CresitelloDittmar, Mark wrote:
> On Mon, Mar 21, 2016 at 5:54 AM, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> Rather than putting some in here, and some in NDCube, I thought it
> best to keep it all together.  Do you have another suggestion on how
> to handle this dependency while STC2 is being reviewed?

I'm afraid I don't have enough understanding of the technical nature
of the dependency -- why doesn't a simple reference to some internal
working draft work for the moment?

> re: Characterisation
> Section 3 is for the ObsDataset extension, and is, therefore, one of the
> specific Dataset types which is pulling in Characterisation.   Other types
> (eg: SimDataset if it were cast into this framework), may or may not
> pull in Characterisation, and may or may not want to extend that to
> include other simulation specific characterisation.
> Perhaps ObsDataset should be moved into the Observation/Experiment
> package-model.

My main interest is that I can use Dataset without having to process
the entire STC model *in VO-DML*.  So, what I want is for the
various VO-DML documents to be independent.  If that can be arranged,
I'm not so worried about the rest of the hierarchy.

> > (4) Talking about Curation.rights: This now has a multiplicity 0..1.
> > [my take: strike AccessRights and make Curation.rights point to
> > RightsType directly -- I don't think the potential benefit of having
> > this kind of thing machine-readable outweighs the cost in terms of
> > complexity]
> >
> When you say 'point to RightsType directly', that would not be possible as
> RightsType is a DataType; it would be an attribute (as it was previously).
> I don't follow the 'machine-readable' part of your comment.

Well, machine-readable means that having DM attributes for the time
span of the access rights means that a computer can, in principle,
figure out when a dataset will change its status (and with higher
multiplicity, even figure out when that will be).  Since I don't see
a use case proportional to the added complexity, I indeed proposed
going back to having a plain atomic attribute.
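To illustrate what "machine-readable" would buy, here is a minimal Python sketch. It assumes a hypothetical RightsPeriod record with a validity interval; the name and fields are invented for illustration and are not the draft's AccessRights. With such records, a client can compute the status on a given day and the next date on which it changes. With a plain atomic attribute, it can only read the current value.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# Hypothetical stand-in for a time-bounded rights entry; this is NOT the
# AccessRights class from the draft, just an illustration of the idea.
@dataclass
class RightsPeriod:
    rights: str                  # e.g. "proprietary" or "public"
    start: date                  # first day this status applies
    end: Optional[date] = None   # exclusive end; None means open-ended


def rights_at(periods: List[RightsPeriod], day: date) -> Optional[str]:
    """Rights value in force on `day`, or None if no period covers it."""
    for p in periods:
        if p.start <= day and (p.end is None or day < p.end):
            return p.rights
    return None


def next_change(periods: List[RightsPeriod], day: date) -> Optional[date]:
    """Earliest period boundary strictly after `day`, if any."""
    boundaries = [p.start for p in periods]
    boundaries += [p.end for p in periods if p.end is not None]
    later = [b for b in boundaries if b > day]
    return min(later) if later else None


# Made-up example: proprietary for one year, then public.
periods = [
    RightsPeriod("proprietary", date(2016, 1, 1), date(2017, 1, 1)),
    RightsPeriod("public", date(2017, 1, 1)),
]
```

On the made-up example, rights_at(periods, date(2016, 6, 1)) gives "proprietary" and next_change(periods, date(2016, 6, 1)) gives date(2017, 1, 1). The extra class and interval logic are the complexity being weighed against this use case.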

> > (6) I'm not happy with the inflation of places where dataset
> > identifiers can stand.  There's now Curation.publisherDID,
> > DataID.creatorDID, and  DataID.datasetID.  I don't think we're doing
> > our users a service by multiplying the concepts here, even though I
> > admit that each of these has a use case.
> >
> > I'd much rather see an Identifier type:
> >
> >   Identifier.kind: (publisher, creator, persistent, ...)
> >   Identifier.form: (doi, ivoid, generic-uri,  ...)
> >   Identifier.value: (well, you know).
> I haven't inflated anything.  These are the same set which has been
> in the prior models.  I do like the idea of using an Identifier
> type rather than anyURI.  Should be more adaptable to evolving
> standards/forms.  I would resist the 'kind' attribute.  As I said
> above, these groupings are associated with the dataset by different
> parties and the distinction is pervasive across the existing
> Resource documents.

...and has led to much confusion.  I frankly don't see that these
different parts of a DM instance will be maintained by different
people.  And for the publisher it's much easier if they have one
central location for all the various identifiers -- which also helps
make their relationships clear.  It also helps when, for instance,
the creator has assigned both a DOI and an IVOA creatorDID to a
dataset.
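The following is a minimal Python sketch of the single Identifier type proposed in (6). It assumes the attribute names and enumeration values listed there, with the open-ended "..." entries left out. The helper identifiers_of and all identifier values are hypothetical. The sketch holds the case just mentioned, a creator-assigned DOI and an IVOA creatorDID, in one list next to a publisher identifier:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


# Illustrative enumerations: only the values named in (6) are included;
# the "..." in the proposal means the real lists would be longer.
class IdKind(Enum):
    PUBLISHER = "publisher"
    CREATOR = "creator"
    PERSISTENT = "persistent"


class IdForm(Enum):
    DOI = "doi"
    IVOID = "ivoid"
    GENERIC_URI = "generic-uri"


@dataclass(frozen=True)
class Identifier:
    kind: IdKind   # role of the identifier (publisher, creator, persistent)
    form: IdForm   # tells a client how to parse `value`
    value: str


def identifiers_of(ids: List[Identifier], kind: IdKind) -> List[Identifier]:
    """Hypothetical helper: all identifiers of one kind, from a single list."""
    return [i for i in ids if i.kind is kind]


# Made-up identifier values, for illustration only.
dataset_ids = [
    Identifier(IdKind.CREATOR, IdForm.DOI, "10.5072/example-1"),
    Identifier(IdKind.CREATOR, IdForm.IVOID, "ivo://example.org/obs?1"),
    Identifier(IdKind.PUBLISHER, IdForm.IVOID, "ivo://example.org/pub?1"),
]
```

In this sketch, identifiers_of(dataset_ids, IdKind.CREATOR) returns both creator identifiers. A client therefore filters one list by kind rather than consulting separate attributes such as Curation.publisherDID and DataID.creatorDID.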

> > (7) Publication
> >
> > Here, we should be explicit about what the publication reference is.
> > Much as I would like the bibcode to rule supreme forever, this is
> > almost certainly not what is going to happen.  Either this gets a
> > form attribute as in (6) or we say "This should be a URI with a
> > scheme; use bibcode: for bibcodes, doi: for DOIs.  In a pinch,
> > non-URI, freetext references are ok".
> >
> Isn't this what 2.9.1 says?  Is there specific language you'd like changed
> there?

Well, perhaps something like:

  This should be interpreted as a URI.  Bibcodes should use the ad-hoc
  scheme bibcode: (unless Alberto protests loudly); DOIs should use the
  form with the doi: scheme.  Freetext references are discouraged.  If
  they are used nevertheless, they must not start with "[a-zA-Z]+:" to
  ensure they are not interpreted as URIs.
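The rule in the proposed wording can be checked mechanically. The following minimal Python sketch splits a reference into its scheme and remainder, or flags it as freetext. The function name classify_reference is hypothetical.

```python
import re
from typing import Tuple

# The prefix test from the proposed wording: letters followed by a colon.
_SCHEME_PREFIX = re.compile(r"^([a-zA-Z]+):")


def classify_reference(ref: str) -> Tuple[str, str]:
    """Split a publication reference into (scheme, remainder).

    References without a leading "[a-zA-Z]+:" are treated as freetext and
    returned as ("freetext", ref).
    """
    match = _SCHEME_PREFIX.match(ref)
    if match is None:
        return ("freetext", ref)
    # URI schemes are case-insensitive, so normalise to lower case.
    return (match.group(1).lower(), ref[match.end():])
```

For example, classify_reference("doi:10.5072/example-1") returns ("doi", "10.5072/example-1"), while "Smith et al., in prep." comes back as freetext. A freetext reference such as "Note: ..." would be misread as a URI with scheme note, which is the case the "must not start with" clause rules out. The [a-zA-Z]+: test is also narrower than the RFC 3986 scheme grammar, which allows digits, "+", "-" and "." after the first letter.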

> > (10) Having said that, I think orcids will become a smash hit in the
> > near future if they aren't one already.  Hence, I'd add
> >
> >   identifier
> >
> > to the Party attributes.  The stuff on defining identifiers as in (7)
> > applies here, too (if we go the URI way, we should say whether we
> > want orcid:0000-... or http://orcid.org/0000-...)
> >
> >
> Can you elaborate?  Having an ID at the Party level could be
> confusing, as an individual (me/you) could have a different ID
> depending on the Role we are playing at the time.  That is why I
> left them up at the Role extensions (Publisher.publisherID).

Well, our ORCID would presumably be the same, no?  And even if I have
a different id when I'm a publisher than when I'm a creator: I'm not
sure it helps if the attributes have different names.  Isn't it
enough that the two items differ by role?
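Item (10) raises the question of orcid:0000-... versus http://orcid.org/0000-... If a single Party identifier is adopted, clients could accept either spelling and store one canonical form. The following minimal Python sketch makes several assumptions:

* It stores the https://orcid.org/ form by default. This is a choice made for the sketch, not one the thread settled.
* It treats orcid: simply as an accepted input spelling.
* The function names are hypothetical.

It validates the final character with the ISO 7064 MOD 11-2 checksum that ORCID iDs use:

```python
import re

# 16-character ORCID iD body: four hyphenated groups, last char may be X.
_ORCID_BODY = re.compile(r"^(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])$")
# Accepted input prefixes, matched case-insensitively.
_PREFIXES = ("https://orcid.org/", "http://orcid.org/", "orcid:")


def orcid_check_char(digits15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in digits15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(text: str, prefix: str = "https://orcid.org/") -> str:
    """Strip an accepted prefix, validate the iD, re-emit it with `prefix`."""
    body = text.strip()
    for p in _PREFIXES:
        if body.lower().startswith(p):
            body = body[len(p):]
            break
    body = body.upper()
    match = _ORCID_BODY.match(body)
    if match is None:
        raise ValueError(f"not an ORCID iD: {text!r}")
    digits = "".join(match.groups())
    if orcid_check_char(digits[:15]) != digits[15]:
        raise ValueError(f"bad ORCID checksum: {text!r}")
    return prefix + body
```

With ORCID's published sample iD, normalize_orcid("orcid:0000-0002-1825-0097") and normalize_orcid("http://orcid.org/0000-0002-1825-0097") both yield "https://orcid.org/0000-0002-1825-0097". In this sketch, the same person keeps one identifier whatever role they play.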
