FAIR semantics

Mon Jun 7 11:12:34 CEST 2021

Dear Françoise,

On Fri, Jun 04, 2021 at 11:40:09AM +0200, Francoise Genova wrote:
> As you know the FAIR principles are currently a hot topic, and requiring
> that data produced by projects is FAIR is on the agenda of many projects.
> The I2 FAIR guiding principles from Wilkinson et al. states that '(meta)data
> uses vocabularies that follow FAIR principles'. There is ongoing work to
> define what a 'FAIR vocabulary' means, in particular in the FAIRsFAIR
> project and in the RDA Vocabulary and Semantic Services IG (VSSIG). The
> current version of the FAIR Semantics Recommendations is here:
> https://doi.org/10.5281/zenodo.4314321

I've had a look at an earlier version of that paper in the run-up to
Vocabularies 2.0, and I think we're doing pretty well.  Below is a
short run-down of the requirements from the current version and how I
think we stand on them for VocInVO-compliant vocabularies.

Do you think there is any need to follow up on this in some way at
this point?

        -- Markus

Requirements from https://doi.org/10.5281/zenodo.4314321

P-Rec. 1: All our vocabularies are uniquely identified by
http://www.ivoa.net/rdf/<vocname>; they resolve to
human-readable vocabulary descriptions and the term lists by default,
to machine-readable RDF resources by content negotiation as per W3C
best practices otherwise.
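
For illustration, a client could ask for the machine-readable form
roughly like this.  This is only a minimal sketch: the function name
build_vocab_request is made up for this example, refframe merely
serves as a sample vocabulary name, and the media types are the
standard ones for Turtle and RDF/XML; I have not checked here what
the server actually sends back for each of them.

```python
from urllib.request import Request

# Base URI of the vocabulary repository, as given under P-Rec. 1
IVOA_RDF_BASE = "http://www.ivoa.net/rdf/"

# Standard media types for the two RDF serialisations (assumed to be
# the ones the server negotiates on)
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}


def build_vocab_request(vocname, fmt=None):
    """Return a urllib Request for the vocabulary called vocname.

    Without fmt, no Accept header is set, which should yield the
    human-readable default.  With fmt ("turtle" or "rdfxml"), the
    matching Accept header asks for the RDF serialisation instead.
    """
    req = Request(IVOA_RDF_BASE + vocname)
    if fmt is not None:
        req.add_header("Accept", RDF_MEDIA_TYPES[fmt])
    return req


req = build_vocab_request("refframe", "turtle")
```

Passing such a request to urllib.request.urlopen would then perform
the actual retrieval.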

P-Rec. 2: If I understand the aim of this requirement correctly, it
is about something like our registry records per vocabulary that can
be retrieved without retrieving the whole vocabulary.  That we do not
have (the vocabulary metadata is present machine-readably in the
files, though, and our files are compact enough that a harvester
wouldn't be overloaded either way).  If and when a standard is
created that defines how the metadata record should be serialised, it
should be simple to produce it from the in-vocabulary metadata; until
then, I would claim our vocabulary repo index at
http://www.ivoa.net/rdf/ is about the best we can do.

P-Rec. 3: This minimum metadata is defined for us by Vocabularies
2.0.  Adopting some external schema is mainly a matter of picking
one; that, I think, is an implementation issue that depends on some
client wanting to consume a specific form of such metadata.

P-Rec. 4: I'd hope ivoa.net counts as trustworthy :-)

P-Rec. 5: This is not very concrete at this point; I would argue that
by following the W3C best practices we are good on this.  The more
complex APIs hinted at appear to be about offering clients ways to
edit resources, which is not within our use cases.

P-Rec. 6: Since we only operate a single repo, this does not apply to
us.

P-Rec. 7: Since our resource ids use the http scheme, this is not
trivially satisfied, and changing this would break existing terms and
specifications.  However, all vocabularies can be retrieved through
HTTPS (as is necessary to make them usable from current client
JavaScript), so I'd say we're good here, too.
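
To make the distinction between identifier and retrieval URL
concrete, here is a minimal sketch of how a client might derive an
HTTPS retrieval URL while leaving the http identifier itself alone.
The function name retrieval_url and the sample term are made up for
this example.

```python
from urllib.parse import urlsplit, urlunsplit


def retrieval_url(term_uri):
    """Return an https URL suitable for fetching term_uri.

    The identifier keeps its http scheme; only the URL used for the
    actual download is switched to https.  URIs with any other scheme
    are returned unchanged.
    """
    parts = urlsplit(term_uri)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Sample term URI, used for illustration only
fetch_from = retrieval_url("http://www.ivoa.net/rdf/refframe#ICRS")
```

The point of keeping the two apart is that existing terms and
specifications continue to use the http form unchanged.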

P-Rec. 8: That's Vocabularies in the VO 2.

P-Rec. 9: We're producing RDF/XML and Turtle, based on RDFS and SKOS.

P-Rec. 10: I *suppose* there's room for improvement here (but then
it's optional).

P-Rec. 11: That's fairly far beyond what we're doing at the moment
(and it's optional).

P-Rec. 12: We're already using skos:exactMatch in our UAT, and
similar devices are envisioned for a vocabulary of facilities and
instruments; so, for now I'd claim we're in the realm of the "In many
cases" language in this requirement.

P-Rec. 13: Whenever this becomes more formal, this would be covered
in the adoption ENs.

P-Rec. 14: We are re-using SKOS and RDFS; let's see what else becomes
useful.

P-Rec. 15: This currently applies to the UAT, where we're doing it
with skos:exactMatch.  We will have a similar mechanism for
SIMBAD-derived object types.

P-Rec. 16: Our own vocabularies are CC-0; other licenses are
possible for externally managed vocabularies (where I suppose we
won't adopt them if they're un-FAIR).  The licenses are declared in
both human-readable and machine-readable ways.

P-Rec. 17: We use VEPs for that (which of course are not terribly
machine-readable yet); formal provenance for freshly-created
vocabularies we don't have (yet).  Human-readably, the IVOA RFC
process should provide sufficient provenance there, though.

On the best practices -- well, I agree on most of them, but as usual
with best practices, there are limits.  For instance, while VocInVO2 is
nudging people to use lower-case-with-dashes terms (that's BP.1),
that doesn't always work.  refframe, for instance, needs to
accommodate the lexical form from VOTable, and relationship is
well-advised to just adopt the lexical forms of DataCite.  For BP.6,
we've tried to provide some guidelines in
https://ivoa.net/documents/Vocabularies/20210525/REC-Vocabularies-2.0.html#tth_sEc5.2.4,
but again having this reflected in the actual discussion processes is
not always easy.

BP.8 is of course my personal big desideratum.  You wouldn't believe
how hard that is.