Fwd: Fwd: Data origin in the Virtual Observatory

gilles landais gilles.landais at astro.unistra.fr
Fri Sep 2 18:08:26 CEST 2022


Dear Markus,

Thank you for your literary proposal: the use cases sound friendlier that way.

Some comments on your use case (a): what do you mean by contact
information? Adding an email address?
Email management is not easy, but author/producer names, maybe
affiliation, and especially ORCID could help users contact the people
concerned.

I agree that "contact information" should be optional. In fact, I
suggest banning MUST for all "origin" metadata: what can be implemented
depends on how rich the content held by each data provider is.


I like your idea of a gathertown meeting at the next Interop: we can plan one.

Regards,

Gilles



-------- Forwarded Message --------
Subject: 	Re: Data origin in the Virtual Observatory
Date: 	Thu, 1 Sep 2022 09:13:15 +0200
From: 	Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
To: 	datacp at ivoa.net



Dear Gilles, dear Gus,

On Wed, Aug 31, 2022 at 06:01:47PM +0200, gilles landais wrote:
> In order to improve the traceability and the citation of the resources
> consumed in the VO network, we propose to make more visible the data origin
> in the VO.
> We mean by Data Origin the basic provenance like authors, institutes, DOI,
> references, …

Thanks for taking this action! Count me in on helping out on it.
What about a side meeting on this in the next Interop's gathertown?

A few early comments on your use cases:

(a) on the "get basic provenance" and "trace data origin": I always
like it better if a use case actually tells a story that is outside
of the thing itself. In this case: "What do I do with that basic
provenance?" I'd hence say the first could be something like:

* A researcher has data in a VOTable that shows an odd feature. They
would now like to talk to the creator of the data to help figure
out whether that feature is physics or an artefact. [Requirement:
contact information to producers present; but then let's not make
that a MUST: This can be GDPR-relevant data, and it must be
possible to leave it out if it is]

* A researcher revisits work they did six months earlier in an ad-hoc
fashion and would now like to reproduce it in a more structured
fashion. To do that, they need to know, say, which queries against
which services, or perhaps which programs, produced the files.
[Requirement: have the request parameters and a service
identification (access url? ivoid?) in the data origin]
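
To make the second requirement concrete, here is a minimal sketch of how a
client might pull a service identifier and the original request back out of
a VOTable. It assumes the data origin is carried in INFO elements; the INFO
names "ivoid" and "request", the sample identifier, and the sample query are
all hypothetical and not agreed keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOTable fragment; the INFO names "ivoid" and "request"
# are assumptions for illustration, not agreed data-origin keys.
VOTABLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.4">
  <RESOURCE type="results">
    <INFO name="ivoid" value="ivo://example.org/tap"/>
    <INFO name="request" value="SELECT TOP 10 * FROM example.table"/>
  </RESOURCE>
</VOTABLE>"""


def read_origin(votable_text, wanted=("ivoid", "request")):
    """Collect the value of each wanted INFO element, ignoring namespaces."""
    origin = {}
    for elem in ET.fromstring(votable_text).iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if tag == "INFO" and elem.get("name") in wanted:
            origin[elem.get("name")] = elem.get("value")
    return origin


origin = read_origin(VOTABLE)
```

With both values in hand, the researcher could in principle re-run the same
query against the same service instead of reconstructing it from memory.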

(b) The "final users to cite" and "bibliography of everything" (and
perhaps even "AAS citation template") use cases imply, I think,
roughly the same requirements. I'd hence merge them to something
like:

* While preparing a publication, a researcher would like to properly
cite the software and data that went into their results. They now
run a program to extract that information from the digital artefacts
going into the publication -- perhaps even in separate parts of
citations and acknowledgements. [Requirement: The data origin must
indicate requests for citation and/or acknowledgement in a
machine-readable way, preferably in a way that machines can
generate BibTeX for whatever they specify]
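
As a rough illustration of "machines can generate BibTeX", the sketch below
turns a flat data-origin record into a BibTeX @misc entry. The field names
(author, title, year, publisher, doi), the record contents, and the citation
key are all hypothetical; the actual data-origin vocabulary is still to be
defined.

```python
def to_bibtex(key, origin):
    """Render a flat metadata dict as a BibTeX @misc entry.

    Only the fields present (and non-empty) in `origin` are written.
    """
    fields = ["author", "title", "year", "publisher", "doi"]
    lines = [f"@misc{{{key},"]
    for name in fields:
        if origin.get(name):
            lines.append(f"  {name} = {{{origin[name]}}},")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data-origin record; every value here is a placeholder.
origin = {
    "author": "Doe, J.",
    "title": "Example catalogue",
    "year": "2022",
    "doi": "10.0000/example",
}

entry = to_bibtex("example2022", origin)
print(entry)
```

A real tool would also need to escape special characters and to keep
citation requests apart from acknowledgement requests; neither is attempted
here.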

Thanks,

Markus