Prov-WD: W3C compatibility review

Ole Streicher ole at aip.de
Fri Jun 21 09:53:30 CEST 2019


Hi all,

as you may know, one of the goals of the IVOA provenance data model is
compatibility with the W3C provenance data model. The current W3C
provenance standard may be found at

https://www.w3.org/TR/2013/REC-prov-dm-20130430/

I started to look into a possible W3C mapping of the Provenance data
model. One example is here:

https://openprovenance.org/store/documents/929

The example activity took a spectrum and a redshift and generated a
new, shifted spectrum (trivial activity, just to have an example
here). The same mapping is done in the experimental provSAP server on
top of our MuseWISE Provenance database and serves as a foundation of
our HTML visualization, which was presented as part of my Paris talk.

In the following, I will list some problems that I found, followed by
the mapping of the IVOA Provenance DM to the W3C model. I use the
"prov:" prefix to explicitly refer to elements of the W3C provenance
standard, and "voprov:" to refer to elements of the IVOA Provenance DM.

Note that the mapping was mainly done to check the current model for
problems. It is not a complete, reviewed proposal, but may serve as the
basis for developing a standardized mapping. Also, the mapping is based
on earlier work for ProvSAP authored by the Provenance WG and should be
integrated there. I do not claim exclusive authorship.

Note also that this is not meant as a disagreement with the Provenance
WD. It is rather meant to document its limitations with respect to the
W3C Provenance standard and discuss them with the DM group, and to
provide a starting point for developing a standardized, semantically
correct mapping to the W3C Provenance model (e.g. for ProvSAP) as far
as this is possible.


Problems
========

ActivityConfiguration
---------------------

One of the reasons to have a separate ActivityConfiguration is that
there are use cases where it is semantically wrong to treat a Parameter
as an entity. Parameters are tightly bound to the Activity (like an
attribute) and can't be seen independently of it. The same is true for
ConfigurationFiles. We therefore can't represent Parameters and
ConfigurationFiles as entities in W3C provenance in a semantically
correct way. An incomplete "hack" could be to add the Parameter values
as attributes to prov:Activity, using the Parameter name and an
application-specific namespace as the attribute name.

The example above does not use ActivityConfiguration, but stores the
configuration as normal entities and is therefore not affected.
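
The attribute "hack" described in this section could be sketched as
follows. This is only an illustration: the "app" prefix, the function
name and the activity id are hypothetical, and all values are rendered
as plain strings in PROV-N notation.

```python
# Hypothetical application namespace prefix; not defined by W3C or IVOA.
APP_PREFIX = "app"


def activity_with_parameters(activity_id, parameters, prefix=APP_PREFIX):
    """Render a PROV-N activity statement carrying parameters as attributes.

    Start and end time are left unset ("-"). Every parameter value is
    written as a plain string literal, which loses its original type.
    """
    attrs = ", ".join(
        f'{prefix}:{name}="{value}"'
        for name, value in sorted(parameters.items())
    )
    return f"activity({activity_id}, -, -, [{attrs}])"


print(activity_with_parameters("ex:shift_spectrum", {"redshift": 0.5}))
```

The result shows why this is incomplete: the parameter has no
description, no unit and no identity of its own once it is flattened
into an attribute.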

Missing identifiers
-------------------

The description classes in the IVOA Provenance data model do not
contain an identifier, but they are mapped to prov:Entity which requires
one. The id needs to be generated on the fly here.
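
One possible way to generate such an id is to derive it from the
content of the description, so that the same description always gets
the same id. This is an assumption for illustration, not what the
Provenance DM or the example above prescribes; the "ex" prefix and the
function name are hypothetical.

```python
import hashlib
import json


def description_id(kind, attributes, prefix="ex"):
    """Derive a stable identifier for a description without its own id.

    The id is a truncated SHA-1 hash over the class name and the
    attributes, so repeated exports of the same description agree.
    """
    payload = json.dumps(
        {"kind": kind, "attributes": attributes}, sort_keys=True
    )
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}:{kind}_{digest}"


first = description_id("ActivityDescription", {"name": "shift spectrum"})
second = description_id("ActivityDescription", {"name": "shift spectrum"})
print(first == second)
```

A random id (e.g. from the uuid module) would work as well, but would
differ between two exports of the same description.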

Confusing attribute names
-------------------------

voprov:UsageDescription and voprov:GenerationDescription have the
attribute "role". This does *not* represent the role of the description,
but the role of the Used or WasGeneratedBy relation, respectively.
Mapping it to prov:role would be misleading and syntactically wrong
(prov:role is not allowed in prov:Entity), so a specific mapping is
required here. Since voprov:role should be human readable, it is mapped
to prov:label.

Similarly, the voprov:type attribute of voprov:ActivityDescription,
voprov:EntityDescription, voprov:UsageDescription and
voprov:GenerationDescription does not refer to the type of the
description, but to the type of the object that refers to it. This is
solved by mapping to specific names (e.g. voprov:ActivityType).
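
The renaming of "role" and "type" for description classes could be
sketched like this. The target attribute names are the ones used in
this mail; the function name, the dictionary layout and the sample
values are hypothetical, and all other attributes are simply passed
through in the voprov namespace.

```python
# Target name for the "type" attribute, per description class.
TYPE_ATTRIBUTE = {
    "ActivityDescription": "voprov:ActivityType",
    "EntityDescription": "voprov:EntityType",
    "UsageDescription": "voprov:usageType",
    "GenerationDescription": "voprov:generationType",
}

# Description classes whose "role" belongs to the relation, not to them.
ROLE_CLASSES = ("UsageDescription", "GenerationDescription")


def map_description_attributes(kind, attributes):
    """Map the attributes of a description to attributes of a prov:Entity."""
    mapped = {"prov:type": f"voprov:{kind}"}
    for name, value in attributes.items():
        if name == "type":
            mapped[TYPE_ATTRIBUTE[kind]] = value
        elif name == "role" and kind in ROLE_CLASSES:
            # prov:role is not allowed in prov:Entity, so use the label.
            mapped["prov:label"] = value
        else:
            mapped[f"voprov:{name}"] = value
    return mapped


print(map_description_attributes(
    "UsageDescription", {"role": "input spectrum", "type": "Main"}
))
```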

Unrepresentable references
--------------------------

Some of the relations in the Provenance data model don't have a
semantically correct representation in W3C. In the example above, these
links are all implemented by attributes containing the id of the target
object; however, alternatives (where possible) are discussed here:

* Entity --> EntityDescription (and their subclasses)

This can't be a prov:WasDerivedFrom relation, since the Entity is not
derived from the description, nor the reverse. Both may be created
independently. The other W3C provenance relations don't apply here
either.

* Activity --> ActivityDescription

This can't be a prov:WasInfluencedBy relation, for the same reason as
above. However, this relation /could/ be represented as a
prov:WasAssociatedWith relation, which requires qualifying the mapped
ActivityDescription as a prov:Plan and implies an agent for the relation.

* Used --> UsageDescription, WasGeneratedBy --> GenerationDescription

In W3C provenance, Used/WasGeneratedBy are relations. There are no W3C
relations that refer to another relation. Semantically, the
UsageDescription/GenerationDescription is the prov:role of prov:Used
and prov:WasGeneratedBy. There may already be a voprov:role attribute in
voprov:Used and voprov:WasGeneratedBy, which however is redundant (it
must be equal to the voprov:role attribute in voprov:UsageDescription or
voprov:GenerationDescription, respectively) and can therefore be omitted.

* ActivityDescription --> UsageDescription/GenerationDescription

The description of input and output could be seen as part of the
description of an activity, or as a specialization of this
description. That would imply using the prov:hadMember or the
prov:specializationOf relation. The same is basically true for
ActivityDescription --> ParameterDescription/ConfigFileDescription.
Using prov:hadMember would require qualifying the mapped
ActivityDescription as a prov:Collection.

* UsageDescription/GenerationDescription --> EntityDescription

Again, this is semantically not a prov:WasDerivedFrom, since both may
be created independently. Other W3C relations also don't apply here.
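
The attribute-based linking mentioned at the start of this section
could look roughly like this in PROV-N. The attribute name
"voprov:description", the function name and the ids are hypothetical;
the point is only that the link is an attribute holding the target id
rather than a W3C relation.

```python
def link_by_attribute(entity_id, description_id,
                      link_attribute="voprov:description"):
    """Return PROV-N statements linking an entity to its description.

    The description becomes a typed prov:Entity, and the entity refers
    to it through an attribute containing the description's id.
    """
    return [
        f'entity({description_id}, [prov:type="voprov:EntityDescription"])',
        f"entity({entity_id}, [{link_attribute}='{description_id}'])",
    ]


for statement in link_by_attribute("ex:shifted_spectrum",
                                   "ex:spectrum_description"):
    print(statement)
```

A generic W3C tool will display such a link as an ordinary attribute
and will not draw an edge between the two entities.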


Mapping
=======

Genuine Classes
---------------

* voprov:Activity --> prov:Activity
* voprov:Entity --> prov:Entity
* voprov:ValueEntity --> prov:Entity(type="voprov:ValueEntity")
* voprov:DatasetEntity --> prov:Entity(type="voprov:DatasetEntity")
* voprov:Collection --> prov:Collection
* voprov:Agent --> prov:Agent
* voprov:ActivityDescription
  --> prov:Entity(type="voprov:ActivityDescription")
* voprov:UsageDescription
  --> prov:Entity(type="voprov:UsageDescription")
* voprov:GenerationDescription
  --> prov:Entity(type="voprov:GenerationDescription")
* voprov:EntityDescription
  --> prov:Entity(type="voprov:EntityDescription")
* voprov:ValueDescription
  --> prov:Entity(type="voprov:ValueDescription")
* voprov:DatasetDescription
  --> prov:Entity(type="voprov:DatasetDescription")
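
For an implementation, the list above can be kept as a simple lookup
table. This is a sketch only; the table layout and the function name
are hypothetical, while the class names are the ones listed above.

```python
# voprov class name -> (W3C class, value of the type attribute or None)
CLASS_MAPPING = {
    "Activity": ("prov:Activity", None),
    "Entity": ("prov:Entity", None),
    "ValueEntity": ("prov:Entity", "voprov:ValueEntity"),
    "DatasetEntity": ("prov:Entity", "voprov:DatasetEntity"),
    "Collection": ("prov:Collection", None),
    "Agent": ("prov:Agent", None),
    "ActivityDescription": ("prov:Entity", "voprov:ActivityDescription"),
    "UsageDescription": ("prov:Entity", "voprov:UsageDescription"),
    "GenerationDescription": ("prov:Entity", "voprov:GenerationDescription"),
    "EntityDescription": ("prov:Entity", "voprov:EntityDescription"),
    "ValueDescription": ("prov:Entity", "voprov:ValueDescription"),
    "DatasetDescription": ("prov:Entity", "voprov:DatasetDescription"),
}


def map_class(voprov_class):
    """Look up the W3C class and type attribute for a voprov class name."""
    name = voprov_class.removeprefix("voprov:")  # needs Python 3.9+
    return CLASS_MAPPING[name]


print(map_class("voprov:ValueEntity"))
```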

Classes in the IVOA model that are mapped to W3C relations
----------------------------------------------------------

* voprov:Used --> prov:Used
* voprov:WasGeneratedBy --> prov:WasGeneratedBy
* voprov:WasAttributedTo --> prov:WasAttributedTo
* voprov:WasAssociatedWith --> prov:WasAssociatedWith

Relations in the IVOA model that are mapped to W3C relations
------------------------------------------------------------

* voprov:wasInformedBy --> prov:wasInformedBy
* voprov:wasDerivedFrom --> prov:wasDerivedFrom
* voprov:hadMember --> prov:hadMember

Attributes and references
-------------------------

All attributes in the IVOA model that have W3C counterparts are mapped
to them (e.g. voprov:location --> prov:location in Entity).
Exceptions:

 * voprov:name is mapped to prov:label.

 * The voprov:type attributes of voprov:UsageDescription,
   voprov:GenerationDescription, voprov:EntityDescription and
   voprov:ActivityDescription are mapped to voprov:usageType,
   voprov:generationType, voprov:EntityType and voprov:ActivityType,
   respectively, in prov:Entity.

 * The voprov:role attributes of voprov:UsageDescription and
   voprov:GenerationDescription are mapped to prov:label.

 * The voprov:role attributes of voprov:Used and voprov:WasGeneratedBy
   are mapped to prov:role if voprov:usageDescription or
   voprov:generationDescription, respectively, does not exist, and are
   ignored otherwise.

 * The voprov:usageDescription and voprov:generationDescription
   references in voprov:Used and voprov:WasGeneratedBy are mapped to the
   prov:role attribute.

Attributes that don't have W3C counterparts are generally mapped using
the voprov namespace (e.g. voprov:comment --> voprov:comment).

References and compositions in the IVOA model are mapped by the same
rules, using the identifier to identify the target.
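
The two role rules for voprov:Used and voprov:WasGeneratedBy listed
above could be sketched as follows. The relation is assumed to be a
plain dictionary of its voprov attributes; this layout, the function
name and the sample ids are hypothetical.

```python
def resolve_prov_role(relation):
    """Return the value of prov:role for a Used/WasGeneratedBy relation.

    A reference to a usage or generation description takes precedence
    and is mapped to prov:role by its identifier. The relation's own
    role is only used if no such reference exists.
    """
    description = (relation.get("usageDescription")
                   or relation.get("generationDescription"))
    if description is not None:
        return description
    return relation.get("role")


# The description reference wins; the redundant role is ignored.
print(resolve_prov_role({"usageDescription": "ex:usage_spectrum",
                         "role": "input spectrum"}))
# Without a description, the role itself becomes prov:role.
print(resolve_prov_role({"role": "input spectrum"}))
```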

Best regards

Ole

