RFC Provenance Data Model
Harry Enke
henke at aip.de
Wed Aug 21 18:45:07 CEST 2019
Dear all,
while reading the document, I ran into a couple of statements that
seem unclear or misleading, so here are my 2c:
Intro,para2: (p1.)
"The provenance of scientific data is a part of the open publishing
policy for science data and follows some of the FAIR principles for data
sharing."
I suggest replacing this sentence with:
"Provenance of scientific data is one component of the FAIR
principles: "(4.3) Published data should refer to their sources with
rich enough metadata and provenance to enable proper citation"
(Force11, FAIR Principles).
This provenance model goes beyond the FAIR principles; its intent is to
make the generation of astronomical data accessible."
=> This sentence is confusing and slightly misleading: there is no "open
publishing policy", because a policy requires a body that imposes the
policy on itself and implements some enforcement measures. And why not
point directly to the reference in FAIR? It's good that we have one.
"In astronomy, such entities are generally datasets composed of
VOTables, FITS files, database tables or files containing values
(spectra, light curves), any value, logs, documents, as well as physical
objects such as instruments, detectors or photographic plates."
I suggest clarifying:
"In astronomy, such entities are generally datasets composed of
VOTables, FITS files, database tables or files containing values
(spectra, light curves), any value, logs, documents, as well as
descriptions of physical objects such as instruments, detectors or
photographic plates, or information about software."
=> (see paragraph above: 'information about...')
a) not the machinery itself, but its description can be included as
provenance information
b) I think that software (even though it's not explicitly addressed)
could also have provenance information
"General Remarks:" (p8)
"Another important usage of provenance information is to assess the
pertinence of a product for scientific objectives, which can be
facilitated through the selection of the relevant provenance information
attached to an entity that is delivered to a science user."
I suggest:
"Provenance information delivers additional information about the
scientific data set to enable the scientist to evaluate its relevance
for his work. "
=> This is hard to understand: if one substitutes "pertinence" with
"relevance" (they are synonyms), you get a kind of tautology
(relevance of a product => select relevant provenance information)
Best Practices: (p.9)
"The following additional points are recommended when managing
provenance information within the VO context:"
should be
"The following additional points are recommended when providing
provenance information within the VO context:"
=> since all the following statements are clearly for providers of
provenance information
13. Role ... (p. 10)
"The IVOA Provenance Data Model is structuring and adding metadata to
trace the original process followed during the data production for
providing astronomical data. "
should read:
"The IVOA Provenance Data Model is structuring and adding metadata to
trace the processes of the data production for providing astronomical data."
=> It's neither reasonable nor required to restrict the provenance to
'original processes'.
Best,
Harry Enke
--
******************************************************************
* Dr. Harry Enke E-Science & Supercomputing *
* Phone : +49-331-7499-433 *
* Email : henke at aip.de FAX : +49-331-7499-526 *
******************************************************************
* Leibniz Institut für Astrophysik Potsdam (AIP) *
* An der Sternwarte 16, D-14482 Potsdam *
* Vorstand: Prof. Dr. Matthias Steinmetz, Matthias Winker *
* Stiftung bürgerlichen Rechts *
* Stiftungsverzeichnis Brandenburg: 26 742-00/7026 *
******************************************************************