Comments on Vocabulary 2.0

Fri Aug 28 17:19:01 CEST 2020

Dear members of the Semantics W.G., 

I read the Vocabulary 2.0 document.

I noticed a few compatibility issues with the Theory I.G. recommendations. The Simulation Data Model and the Simulation Data Access Layer are two recommendations that rely strongly on Semantics as described in Vocabulary 1.19, as well as on the philosophy introduced by Norman Gray concerning the use of web semantics at IVOA, which underlies Vocabulary 1.19.

The main issues are:

1 - Vocabulary 2.0 seems to imply that only official IVOA vocabularies can be used. 
This topic was discussed many times in Semantics and Theory sessions a few years ago. While defining our two Theory recommendations, we decided, in agreement with the Semantics W.G., that there would be official vocabularies and non-official vocabularies, the latter being managed directly by the data producers / publishers rather than by an IVOA W.G. Indeed, to describe simulations and astrophysical codes according to SimDM, many quantities need to be defined that are unique to specific codes. The most obvious example is the input parameters of codes, which are often code-specific. It would be complicated to include these concepts in general IVOA vocabularies and to follow the vocabulary approval procedure described in Vocabulary 2.0. Other quantities are more general and can fit in official IVOA vocabularies (more specifically VO-Theory vocabularies, that is, vocabularies managed by simulation experts). This difficulty was resolved by the decision to have two kinds of vocabularies and, eventually, mappings between vocabularies. We do not wish to have single vocabularies because they would be too complicated to set up and would not fit our needs. This is in line with Vocabulary 1.19:

Reference : Vocabulary 1.19 - Section 1.3:
"We find ourselves in the situation where there are multiple vocabularies in use, describing a broad range of resources of interest to professional and amateur astronomers, and members of the public. These different vocabularies use different terms and different relationships to support the different constituencies they cater for. … 
One approach to this problem is to create a single consensus vocabulary, which draws terms from the various existing vocabularies to create a new vocabulary which is able to express anything its users might desire. The problem with this is that such an effort would be very expensive, both in terms of time and effort on the part of those creating it, and to the potential users, who have to learn to navigate around it, recognise the new terms, and who have to be supported in using the new terms correctly (or, more often, incorrectly).
The alternative approach to the problem is to evade it, and this is the approach taken in this document. Rather than deprecating the existence of multiple overlapping vocabularies, we embrace it, help interest groups formalise as many of them as are appropriate, and standardise the process of formally declaring the relationships between them. This means that:
* The various vocabularies are allowed to evolve separately, on their own timescales, managed either by the IVOA, individual working groups within the IVOA, or by third parties;
* Specialised vocabularies can be developed and maintained by the community with the most knowledge about a specific topic, ensuring that the vocabulary will have the most appropriate breadth, depth, and precision;
… "

2 - Vocabulary 2.0 seems to imply that each concept must have a proper definition. 
That is a nice idea. Unfortunately, it is not achievable in some cases. The Simulation Data Model requires SKOS concepts for input parameters of codes, algorithms, objects that can be simulated (real astrophysical objects and theoretical objects such as particles, meshes, …), physical quantities, and physical processes. The number of quantities is huge (several thousand or tens of thousands). Providing a definition for each concept is not achievable.

I recognise that asking for a definition of each concept is a very pragmatic position from a developer's point of view. But, at the same time, asking for definitions of thousands of concepts is not pragmatic because it will never be done. For example, who will spend time providing definitions for the Hubble constant, velocity, thermal pressure, etc., and for all physical and astrophysical concepts? It will slow down the process of providing tools and standards that allow data publishers to publish their data in the Virtual Observatory.

A solution would be to distinguish between different categories of vocabularies: 
- those gathering concepts strongly tied to IVOA standards (e.g. the Datalink vocabulary). Usually these vocabularies have a limited number of concepts, and it is worthwhile to provide precise definitions;
- those gathering scientific concepts. Often these are large vocabularies, used in specific science domains, that belong to disciplinary knowledge. Their definitions are known and established in these communities and so do not have to be managed at the level of the IVOA.

In some cases, it may be difficult to know in which category a vocabulary falls. Maybe that should be discussed case by case. 

3 - Vocabulary 2.0 seems not to support SKOS ALT labels. The ALT label is a basic building block of web semantics, since it allows a system to manage synonyms. ALT labels are used in implementations of SimDM and are supported in Vocabulary 1.19. Dropping support for ALT labels is a step backwards that is incompatible with the way theory services use SKOS concepts.
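
To illustrate the role of ALT labels, here is a minimal sketch in Python of how a service might resolve synonyms to a single concept. It is only an illustration under stated assumptions, not how any SimDM implementation actually works: the URIs, the labels, and the function names build_label_index and resolve are all hypothetical.

```python
# Hypothetical concepts: the URIs and labels are illustrative only and are
# not taken from any published IVOA or SimDM vocabulary.
CONCEPTS = {
    "http://example.org/vocab#HubbleConstant": {
        "prefLabel": "Hubble constant",
        "altLabels": ["H0", "Hubble parameter at z=0"],
    },
    "http://example.org/vocab#ThermalPressure": {
        "prefLabel": "thermal pressure",
        "altLabels": ["gas pressure"],
    },
}


def build_label_index(concepts):
    """Map every preferred and alternative label (case-insensitive) to its concept URI."""
    index = {}
    for uri, labels in concepts.items():
        for label in [labels["prefLabel"], *labels["altLabels"]]:
            index[label.casefold()] = uri
    return index


def resolve(term, index):
    """Return the concept URI for a user-supplied term, or None if it is unknown."""
    return index.get(term.strip().casefold())


INDEX = build_label_index(CONCEPTS)
print(resolve("H0", INDEX))                # found through an alternative label
print(resolve("Thermal Pressure", INDEX))  # found through the preferred label
```

Without the alternative labels, a query for "H0" would find nothing, although the concept itself is present in the vocabulary.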

In conclusion, Vocabulary 1.19 was created with the goal of starting to introduce web semantics in VO services. The next steps would have been to work on the links / mappings between vocabularies and on the versioning of vocabularies.
Ref: - Vocabulary 1.19, section 5 : "Part of the motivation for formalising vocabularies within the VO is to support mapping between vocabularies, so that an application which understands, or can natively process, one vocabulary, can use a mapping to provide at least partial support for data described using another vocabulary."
Here, Vocabulary 2.0 no longer seems to focus on web semantics but rather on strong and rigorous rules to manage and use “simple” vocabularies. The goals seem different. I think that if a new standard is required, it should not break what has been done before and what existing recommendations, such as those of the Theory I.G., rely on.

A solution could be to extend Vocabulary 2.0 slightly to distinguish between two kinds of vocabularies: those strongly tied to IVOA standards and infrastructure, and those that describe scientific concepts. The first kind, such as the Datalink vocabulary, can follow the specifications described in Vocabulary 2.0; in particular, a definition is expected for each concept, and the vocabularies must be approved by IVOA bodies such as the Semantics W.G. I am not sure the TCG is the proper level, because not all W.G. chairs may feel concerned. The second kind can have very long lists of concepts specific to scientific domains. The rules must be more flexible for these, and they should be managed by scientists who are experts in the field.

Best regards
Franck