Data origin in the Virtual Observatory
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Thu Sep 1 09:13:15 CEST 2022
Dear Gilles, dear Gus,
On Wed, Aug 31, 2022 at 06:01:47PM +0200, gilles landais wrote:
> In order to improve the traceability and the citation of the resources
> consumed in the VO network, we propose to make more visible the data origin
> in the VO.
> We mean by Data Origin the basic provenance like authors, institutes, DOI,
> references, …
Thanks for taking this action! Count me in on helping out on it.
What about a side meeting on this in the next Interop's gathertown?
A few early comments on your use cases:
(a) on the "get basic provenance" and "trace data origin": I always
like it better if a use case actually tells a story that is outside
of the thing itself. In this case: "What do I do with that basic
provenance?" I'd hence say the first could be something like:
* A researcher has data in a VOTable that shows an odd feature. They
would now like to talk to the creator of the data to help figure
out whether that feature is physics or an artefact. [Requirement:
contact information for the producers is present; but then let's not make
that a MUST: This can be GDPR-relevant data, and it must be
possible to leave it out if it is]
* A researcher revisits work they did six months earlier in an ad-hoc
fashion and would now like to reproduce it in a more structured
fashion. To do that, they need to know, say, which queries against
which services, or perhaps which programs, produced the files.
[Requirement: have the request parameters and a service
identification (access url? ivoid?) in the data origin]
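To make the second requirement a bit more concrete, here is a minimal sketch of how a client could pull the request parameters and a service identification back out of a VOTable. It assumes the data origin is carried in INFO elements; the INFO names used here (service_ivoid, access_url, request) and the example.org service are purely hypothetical, as nothing in this thread fixes them.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

# Hypothetical INFO names for data origin; these are assumptions
# made for illustration, not names from any standard.
ORIGIN_KEYS = ("service_ivoid", "access_url", "request")

# A made-up VOTable fragment carrying the hypothetical origin INFOs.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <INFO name="service_ivoid" value="ivo://example.org/tap"/>
    <INFO name="access_url" value="http://example.org/tap/sync"/>
    <INFO name="request" value="LANG=ADQL&amp;QUERY=SELECT TOP 5 * FROM demo.main"/>
    <INFO name="QUERY_STATUS" value="OK"/>
  </RESOURCE>
</VOTABLE>"""


def extract_origin(votable_text):
    """Return a dict of the origin-related INFO values found in a VOTable."""
    root = ET.fromstring(votable_text)
    origin = {}
    for info in root.iter("{%s}INFO" % VOTABLE_NS):
        name = info.get("name")
        if name in ORIGIN_KEYS:
            origin[name] = info.get("value")
    return origin


if __name__ == "__main__":
    for key, value in extract_origin(SAMPLE).items():
        print("%s: %s" % (key, value))
```

With something like this, the researcher from the use case could recover the service and the query from a six-month-old file and re-run the request.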
(b) The "final users to cite" and "bibliography of everything" (and
perhaps even "AAS citation template") use cases imply, I think,
roughly the same requirements. I'd hence merge them to something
like:
* While preparing a publication, a researcher would like to properly
cite the software and data that went into their results. They now
run a program to extract that information from the digital artefacts
going into the publication -- perhaps even in separate parts of
citations and acknowledgements. [Requirement: The data origin must
indicate requests for citation and/or acknowledgement in a
machine-readable way, preferably in a way that machines can
generate BibTeX for whatever they specify]
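As a rough illustration of what "machines can generate BibTeX" could mean, here is a small sketch that turns data origin records into BibTeX entries and keeps acknowledgement texts separate. The record layout (a "cite" dict of BibTeX fields and an optional "acknowledge" string) and the sample values are assumptions made up for this example; the actual data origin format remains to be defined.

```python
def to_bibtex(key, fields, entry_type="misc"):
    """Format one BibTeX entry from a dict of field names to values."""
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in fields.items():
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


def collect(origins):
    """Split origin records into BibTeX citations and acknowledgements.

    `origins` maps a citation key to a record with an optional "cite"
    dict and an optional "acknowledge" string (hypothetical layout).
    """
    citations, acknowledgements = [], []
    for key, origin in origins.items():
        if "cite" in origin:
            citations.append(to_bibtex(key, origin["cite"]))
        if "acknowledge" in origin:
            acknowledgements.append(origin["acknowledge"])
    return citations, acknowledgements


if __name__ == "__main__":
    # Made-up records standing in for origins extracted from data files.
    origins = {
        "demo2022": {
            "cite": {"title": "Demo catalogue", "year": "2022"},
            "acknowledge": "This work used the Demo service.",
        },
    }
    citations, acknowledgements = collect(origins)
    print("\n\n".join(citations))
    print("\n".join(acknowledgements))
```

Keeping citations and acknowledgements in separate lists mirrors the "separate parts" idea in the use case.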
Thanks,
Markus