Data origin in the Virtual Observatory

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Sep 1 09:13:15 CEST 2022


Dear Gilles, dear Gus,

On Wed, Aug 31, 2022 at 06:01:47PM +0200, gilles landais wrote:
> In order to improve the traceability and the citation of the resources
> consumed in the VO network, we propose to make more visible the data origin
> in the VO.
> We mean by Data Origin the basic provenance like authors, institutes, DOI,
> references, …

Thanks for taking this action!  Count me in on helping out on it.
What about a side meeting on this in the next Interop's gathertown?

A few early comments on your use cases:

(a) on the "get basic provenance" and "trace data origin": I always
like it better if a use case actually tells a story that is outside
of the thing itself.  In this case: "What do I do with that basic
provenance?"  I'd hence say the first could be something like:

* A researcher has data in a VOTable that shows an odd feature.  They
  would now like to talk to the creator of the data to help figure
  out whether that feature is physics or an artefact. [Requirement:
  contact information for the producers is present; but let's not
  make that a MUST: this can be GDPR-relevant data, and it must be
  possible to leave it out if it is]

* A researcher revisits work they did six months earlier in an ad-hoc
  fashion and would now like to reproduce it in a more structured
  fashion.  To do that, they need to know, say, which queries against
  which services, or perhaps which programs, produced the files.
  [Requirement: have the request parameters and a service
  identification (access url? ivoid?) in the data origin]
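
To make the second requirement a bit more tangible, here is a rough
Python sketch of what "request parameters plus a service
identification" could look like when kept next to a result file.
Everything in it is an assumption for illustration: the DataOrigin
class, its field names, the example.org URL and the ivoid are all
made up, and nothing here is a proposed serialisation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlencode


@dataclass
class DataOrigin:
    """Hypothetical minimal record of where a result file came from."""
    access_url: str                      # endpoint that was queried
    ivoid: str                           # registry identifier of the service
    request_params: Dict[str, str] = field(default_factory=dict)
    contact: Optional[str] = None        # optional, may be omitted (GDPR)

    def replay_url(self) -> str:
        """Rebuild a GET URL from the stored endpoint and parameters."""
        # Sort the parameters so the rebuilt URL is stable between runs.
        query = urlencode(sorted(self.request_params.items()))
        return self.access_url + "?" + query


# Made-up example values; no real service is implied.
origin = DataOrigin(
    access_url="http://example.org/tap/sync",
    ivoid="ivo://example.org/tap",
    request_params={
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "QUERY": "SELECT TOP 5 * FROM t",
    },
)
print(origin.replay_url())
```

The point of the sketch is only that, with the access URL and the
parameters recorded, re-running the request six months later becomes
mechanical, while the contact field can stay empty.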

(b) The "final users to cite" and "bibliography of everything" (and
perhaps even "AAS citation template") use cases imply, I think,
roughly the same requirements.  I'd hence merge them to something
like:

* While preparing a publication, a researcher would like to properly
  cite the software and data that went into their results.  They now
  run a program to extract that information from the digital artefacts
  going into the publication -- perhaps even in separate parts of
  citations and acknowledgements.  [Requirement: The data origin must
  indicate requests for citation and/or acknowledgement in a
  machine-readable way, preferably in a way that machines can
  generate BibTeX for whatever they specify]
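
As a sketch of what "machines can generate BibTeX" might mean in
practice, assume the data origin carried a small citation record with
keys such as author, title, year, publisher and doi.  Those key
names, the to_bibtex function and the sample record below are
assumptions for illustration only, not something the use cases
prescribe.

```python
def to_bibtex(key, record):
    """Render a hypothetical citation record as a BibTeX @misc entry."""
    lines = []
    for name in ("author", "title", "year", "publisher", "doi"):
        value = record.get(name)
        if not value:
            continue                      # skip fields the record lacks
        if isinstance(value, (list, tuple)):
            value = " and ".join(value)   # BibTeX separates authors by "and"
        lines.append(f"  {name} = {{{value}}}")
    return "@misc{" + key + ",\n" + ",\n".join(lines) + "\n}"


# Made-up record; it does not describe a real dataset.
sample = {
    "author": ["Doe, J.", "Roe, R."],
    "title": "An example catalogue",
    "year": 2022,
    "doi": "10.0000/example",
}
entry = to_bibtex("example2022", sample)
print(entry)
```

A real extractor would in addition have to tell citation requests
from acknowledgement requests; the sketch leaves that out.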

Thanks,

         Markus

