[vodml] Attribute multiplicity

Wed Dec 30 00:21:19 CET 2015

Hi Mark
Thanks for moderating the discussions. This is a long email and you have another one that may need discussing as well.
In this response I have mainly tried to interpret some of the cases you mention as valid VO-UML diagrams. PNGs of the diagrams are attached and comments are added to the text below.

From: dm-bounces at ivoa.net [mailto:dm-bounces at ivoa.net] On Behalf Of CresitelloDittmar, Mark
Sent: Tuesday, December 29, 2015 12:57 PM
To: Data Models mailing list <dm at ivoa.net>
Subject: [vodml] Attribute multiplicity

All,

I am attempting to spawn this thread out of the 'vo-dml for cube' subject which has expanded into multiple inter-related topics.
  http://mail.ivoa.net/pipermail/dm/2015-December/005271.html
The request is that the vo-dml specification be modified to allow a variable multiplicity in Attributes.
The current document (Oct 2015) states in Section 4.14: Attribute
  "The multiplicity of Attributes is restricted to be either 0..1, or n..n where n is a positive integer literal.  i.e. open ended collections are not allowed, nor is 0..n, with n>1."
  and in Section 4.19: Multiplicity
    "A special case is the assignment of a Multiplicity to an Attribute.  The only combinations of minOccurs..maxOccurs that are allowed on attribute definitions are 0..1, 1..1 (or simply 1), and n..n (or simply n) with n>1.  i.e. for multiplicity greater than 1 the attribute can be interpreted as an array of fixed size."
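A minimal Python sketch of the rule quoted above, for readers who prefer it as a check. The function name and the use of None for an open-ended upper bound ('*') are assumptions made for illustration; they are not part of the VO-DML specification.

```python
from typing import Optional


def attribute_multiplicity_allowed(min_occurs: int, max_occurs: Optional[int]) -> bool:
    """Return True if min_occurs..max_occurs is allowed on an Attribute.

    max_occurs=None is a hypothetical stand-in for an open-ended bound ('*').
    """
    if max_occurs is None:
        # Open-ended collections are not allowed on attributes.
        return False
    if (min_occurs, max_occurs) == (0, 1):
        # Optional single value.
        return True
    # n..n with n a positive integer: a single value (n = 1) or a fixed-size array.
    return min_occurs == max_occurs and min_occurs >= 1
```

Under this reading 0..1, 1..1 and 3..3 pass, while 0..3 and 0..* are rejected.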
Gerard explains the reasoning as:
  "Motivation for disallowing * is somewhat philosophical, related to the semantics of datatype. Also related to the type of data models VO-DML is targeting. VO-DML is aimed at models that can give a conceptualization of a domain, rather than providing an implementation/serialization. For the latter we have our VOTable mapping document for example, where the vo-dml is used in annotations.
Idea is that if you have a collection-like concept for which you do not know how many instances you will have (i.e. multiplicity is *), it becomes a meaningful action to first state that such an instance exists and then that it is contained in the collection on a certain object. Such a concept should be modelled by collection of object types, not list of data types."
There are, however, several instances in the current model work where this approach has not been followed.
  STC2:
    - array of coefficients in a Polynomial transform definition
It is easy to create a very generic model of an N-dimensional polynomial as in the attached NDPolynomial.png.
It defines the axes explicitly (the dimension is the size of the axis collection) and one may even want to add an identification between an axis and some observable. The polynomial consists of a collection of Monomials that are assumed to be summed.
Each monomial is the product of one or more exponentiated axes.

What this model assumes is that it is meaningful to state that a particular polynomial, or expression, exists and that the definition of its terms, coefficients and powers is meaningful as well. In fact when the polynomial is the result of fitting parameters one may wish to add uncertainty estimates to this pattern for example. More generally, one may wish to describe the provenance of the polynomial that has been defined.
Compared to the “array of coefficients” this model is much more explicit, we do not have to rely on order in an array to define the power of the corresponding term for example.

And though this may not be relevant for some use cases considered in STC2, it even allows one to have negative or non-integer powers if desired.
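The polynomial pattern described above can be sketched roughly as follows. This is only an illustration in Python; the class and field names (Axis, Factor, Monomial, Polynomial, coefficient, power) are assumptions and need not match the names in NDPolynomial.png.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Axis:
    name: str  # could also be linked to an observable


@dataclass
class Factor:
    axis: Axis
    power: float  # negative or non-integer powers are possible


@dataclass
class Monomial:
    coefficient: float
    factors: List[Factor] = field(default_factory=list)

    def evaluate(self, point: Dict[str, float]) -> float:
        # Product of the coefficient and the exponentiated axes.
        value = self.coefficient
        for factor in self.factors:
            value *= point[factor.axis.name] ** factor.power
        return value


@dataclass
class Polynomial:
    axes: List[Axis]
    monomials: List[Monomial]

    @property
    def dimension(self) -> int:
        # The dimension is the size of the axis collection.
        return len(self.axes)

    def evaluate(self, point: Dict[str, float]) -> float:
        # The monomials are assumed to be summed.
        return sum(m.evaluate(point) for m in self.monomials)


x, y = Axis("x"), Axis("y")
p = Polynomial(
    axes=[x, y],
    monomials=[
        Monomial(2.0, [Factor(x, 2.0), Factor(y, 1.0)]),  # 2 * x^2 * y
        Monomial(3.0, [Factor(y, -1.0)]),                 # 3 * y^-1
    ],
)
```

Here each term names its axes and powers itself, so nothing depends on the position of a coefficient in an array.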
In fact it is actually easy to generalize this pattern further to a generic mathematical expression as in figure Expression1.png or Expression2.png. The latter includes Sum, Product and Power as instances of Function. This would allow us to describe transformations from Cartesian to spherical coordinates etc using the same model components.
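As a rough illustration of the generalization to expressions, here is a small Python sketch of an expression tree in which Sum, Product and Power are instances of a Function type. All names (Expression, Constant, Variable, Application, Function) are assumptions for the sake of the example and are not taken from Expression1.png or Expression2.png.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


class Expression:
    def evaluate(self, env: Dict[str, float]) -> float:
        raise NotImplementedError


@dataclass
class Constant(Expression):
    value: float

    def evaluate(self, env: Dict[str, float]) -> float:
        return self.value


@dataclass
class Variable(Expression):
    name: str

    def evaluate(self, env: Dict[str, float]) -> float:
        return env[self.name]


@dataclass
class Function:
    name: str
    apply: Callable[[List[float]], float]


# Sum, Product and Power are instances of Function, not subclasses.
SUM = Function("Sum", lambda args: math.fsum(args))
PRODUCT = Function("Product", lambda args: math.prod(args))
POWER = Function("Power", lambda args: args[0] ** args[1])


@dataclass
class Application(Expression):
    function: Function
    arguments: List[Expression]

    def evaluate(self, env: Dict[str, float]) -> float:
        return self.function.apply([a.evaluate(env) for a in self.arguments])


def square(e: Expression) -> Expression:
    return Application(POWER, [e, Constant(2.0)])


# Radial coordinate of a Cartesian-to-polar transform: (x^2 + y^2)^0.5
radius = Application(
    POWER,
    [Application(SUM, [square(Variable("x")), square(Variable("y"))]), Constant(0.5)],
)
```

The polynomial case is then just a Sum of Products of Powers built from the same components.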

    - number of coordinate values along an enumerated coordinate axis (Enum transform)
I do not know what you’re referring to here; maybe another email explains.
  Dataset:
    - Curation.reference = string[0..*]
Using string as a serialization of an id identifying a publication. I.e. as a serialization of a reference.

    - DataID.collection= string[0..*]
Idem, using string to identify a collection. Most likely the association between dataset and collection can have further attributes, e.g. the id by which the dataset is known in that collection etc.

    - DataID.contributor= string[0..*]
Idem, using name to identify a person contributing (“playing a role”) in the creation of the dataset. Likely could be further refined by describing the actual role the person played, e.g. “creator”, “curator”, “publisher”.
I have created a diagram (Dataset.png) linking Dataset to other concepts without the intermediate DataID and Curation types. It models these strings as references and associative types. For example, Contributor is a role played by a Party (a common design pattern to generalize Individual and Organization). This Party is also used for defining Authors of a Publication. Again, no need to discuss these in detail I think, but I believe these simple models are already more explicit, and I would argue more “correct”, than the implicit models relying on arrays of strings to (maybe) identify instances of some concepts that themselves are not explicitly defined.
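A rough Python sketch of the Party/role pattern described above. The class and field names (Party, Contributor, CollectionMembership, id_in_collection and so on) are assumptions chosen for illustration and may differ from those in Dataset.png.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    name: str


@dataclass
class Individual(Party):
    pass


@dataclass
class Organization(Party):
    pass


@dataclass
class Publication:
    identifier: str
    authors: List[Party] = field(default_factory=list)


@dataclass
class Contributor:
    # A role played by a Party in the creation of a dataset.
    party: Party
    role: str  # e.g. "creator", "curator", "publisher"


@dataclass
class CollectionMembership:
    # Associative type: the link itself can carry attributes.
    collection_id: str
    id_in_collection: str


@dataclass
class Dataset:
    contributors: List[Contributor] = field(default_factory=list)
    references: List[Publication] = field(default_factory=list)
    memberships: List[CollectionMembership] = field(default_factory=list)


# Hypothetical example values, not taken from any real dataset.
someone = Individual("A. Person")
dataset = Dataset(
    contributors=[Contributor(someone, "creator")],
    references=[Publication("example-publication-id", authors=[someone])],
    memberships=[CollectionMembership("example-collection", "obj-001")],
)
```

The same Party instance is referenced both as a contributor and as an author, which an array of name strings cannot express.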

    NOTE: the STC2 prototype also contains the above transform related items
  Cube:
    - PointData.customAxes = GenericCoord[0..*]
        a list of coordinate values not representable by the domain-specific coordinate types.
        GenericCoord is a dataType, so customAxes is an Attribute, and this multiplicity is not allowed.
Not yet looked at this.

  VO-DML
    - Model.author = string[0..*]
Please note first that VO-DML is *not* a conceptual data model written in VO-DML; it is a language for expressing data models, and there was no requirement to have it conform to its own rules. If you want, it may be possible to consider it an application-specific serialization/denormalization of a more comprehensive/normalized representation. Representing an “author” by a string is clearly not doing justice to the Author concept, but it makes it easier to write and parse/interpret a model document. If we think it is important to do something about it, in particular to make it VO-DML compliant, the Party/Author pattern used above can be applied without much trouble.
Plus some speculated cases which have not been modeled.
  - pixel data values in a spectrum
Several models have already moved to describing pixels (or voxels) through a collection of objecttype-s on the parent similar to the attached Spectrum.png. Note that I am not advocating this model in all detail, but it shows how one might gather relevant info in one objecttype.  And note once more, serializations do not have to follow the model 1-1.
I think in general our models, aimed as they are at annotation to assist interoperability/information integration, rather than at direct implementation, should be more explicit about how data is obtained. I believe the measurement pattern as in uml2.narod.ru/files/docs/13/AnalysisPatterns.pdf is useful. It was used in SimDB and originally in the “domain model” (for those who are old enough to remember this ☺).

  - number of samples in a time series.
  basically dimensionality of values;  ndim[1],  values[ndim]
I guess similar to previous case. No need to define ndim if the size of the collection of samples gives one that information.
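To illustrate the point about not needing a separate count, here is a small Python sketch in which the number of samples is derived from the collection. The names TimeSeries, Sample and n_samples are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    time: float
    value: float


@dataclass
class TimeSeries:
    samples: List[Sample] = field(default_factory=list)

    @property
    def n_samples(self) -> int:
        # Derived from the collection, so it can never disagree with it.
        return len(self.samples)


series = TimeSeries([Sample(0.0, 1.2), Sample(1.0, 1.5)])
series.samples.append(Sample(2.0, 1.1))
```

Because the count is computed rather than stored, adding a sample cannot leave a stale value behind.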
Gerard has suggested:
  "I think we should allow 0..n for attributes, but with a different interpretation from its counterpart in composition relations and references.  It would imply that the value of the attribute is either 0, or an array of length exactly n.  In relations the interpretation remains that one can have between 0 and n (inclusive) instances in the collection."

and added:
  "Allowing n to be an integer attribute with multiplicity 1 I am still strongly against.  For example, what if, after initializing the 0..n attribute, you change the value of n?"
Laurent argued that a Constraint definition would be able to deal with this. I agree with that. I think my main argument is that this would introduce a kind of redundancy. Why have a separate integer attribute if the value of that concept could be derived from the length of the collection/array? My main point is that since this “solution” would leave it open to the instantiation to decide how long the collection is, it is apparently a meaningful statement to declare the existence of these instances, from which it follows (see my argument quoted by Mark above) that they should be represented as ObjectType-s.
(Which is an excellent point. BTW)
However, this suggestion does not, I think, resolve any of the above cases since the number of instances is not known.
Indeed, my main argument is that I think those examples are flawed, and so I need not have made the suggestion.
For the simple string cases, creating an object to hold the string datatype would resolve the compliance issue, but seems overly complicated. One could argue that you wouldn't, necessarily, need to serialize the containers, but I would not like to see different serialization rules for simple containers vs complex ones.
It is only overly complicated if you ignore that in arguably all these cases a single string does not do justice to the concept that is being represented. There are in general no serialization rules that must be followed. It generally will depend on the application how one wishes to do that.
For the more complex cases, one could/should question the modeling to be sure the approach is correct (on an appropriate separate thread), which may or may not find an alternate solution.
For those requesting n = non-negative integer attribute with multiplicity 1:
  perhaps in these cases, n is not a modeled property, but derived from the length of the array.  if so,
  then these become open-ended values[*] which matches the above cases. constraints can restrict '* = 1,2,3'
So the issue is still unresolved, and in my opinion, the string cases may be the better ones to focus on since they would generate the most overhead for the least benefit. We would have several objects which simply hold the primitive value
  Author.name:string[1]
  Reference.refString:string[1]
  Collection.tag:string[1]
  Contributor.entity:string[1]
or maybe just
  Author.value:string[1]
  Reference.value:string[1]
  Collection.value:string[1]
  Contributor.value:string[1]
I hope my alternatives show that in fact these things are related to referencing instances of ObjectTypes, and in our conceptual model should be modeled like that. BUT serializations can create patterns that are similar to what you describe here, the extra annotation around them will simply make these mappings more explicit.

I'll close with an opinion on the suggested change to allow 0..n with special interpretation.
I expect there may be use cases where there will be a need for optional fixed array attributes
(served by 0 OR n with literal n), but I don't think we've seen them yet.  I think that using the
notation 0..n will be widely misinterpreted when people are reading the model diagrams.
The notation '0,n' would be more intuitive, but not standard.  So, this feels like a combination
of multiplicity and constraint:   multiplicity = '*' with constraint "* = 0,n"
Fair enough. It really would be “0” or “n”, which is different from anywhere between 0 and n. Of course the context would make this clear, but if people think this is confusing, we may come up with an alternate design for the Multiplicity class in VO-DML.
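The two readings of 0..n discussed in this thread can be contrasted in a short Python sketch. The function names are hypothetical, and treating None or an empty list as "zero occurrences" is an assumption made for this example.

```python
from typing import Optional, Sequence


def attribute_value_valid(value: Optional[Sequence], n: int) -> bool:
    """Proposed attribute reading: absent, or an array of length exactly n."""
    if value is None:
        return True
    return len(value) in (0, n)


def relation_collection_valid(items: Sequence, n: int) -> bool:
    """Relation reading: anywhere between 0 and n instances, inclusive."""
    return 0 <= len(items) <= n
```

With n = 3, a two-element value is rejected under the attribute reading but accepted under the relation reading, which is exactly the potential for confusion raised above.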

Cheers
Gerard
Mark

-------------- next part --------------
A non-text attachment was scrubbed...
Name: NDPolynomial.png
Type: image/png
Size: 9910 bytes
Desc: NDPolynomial.png
URL: <http://mail.ivoa.net/pipermail/dm/attachments/20151229/a02a65d4/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Expression1.png
Type: image/png
Size: 24538 bytes
Desc: Expression1.png
URL: <http://mail.ivoa.net/pipermail/dm/attachments/20151229/a02a65d4/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Expression2.png
Type: image/png
Size: 9643 bytes
Desc: Expression2.png
URL: <http://mail.ivoa.net/pipermail/dm/attachments/20151229/a02a65d4/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Spectrum.png
Type: image/png
Size: 5468 bytes
Desc: Spectrum.png
URL: <http://mail.ivoa.net/pipermail/dm/attachments/20151229/a02a65d4/attachment-0007.png>