[vodml] Attribute multiplicity

CresitelloDittmar, Mark mdittmar at cfa.harvard.edu
Tue Dec 29 18:56:49 CET 2015


I am attempting to spawn this thread out of the 'vo-dml for cube' subject
which has expanded into multiple inter-related topics.

The request is that the vo-dml specification be modified to allow a
variable multiplicity in Attributes.
The current document (Oct 2015) states in Section 4.14 (Attribute):
  "The multiplicity of Attributes is restricted to be either 0..1, or n..n
where n is a positive integer literal.  i.e. open ended collections are not
allowed, nor is 0..n, with n>1."

  and in Section 4.19 (Multiplicity):
    "A special case is the assignment of a Multiplicity to an Attribute.
The only combinations of minOccurs..maxOccurs that are allowed on attribute
definitions are 0..1, 1..1 (or simply 1), and n..n (or simply n) with n>1.
i.e. for multiplicity greater than 1 the attribute can be interpreted as an
array of fixed size."
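The quoted rule can be restated as a small check. This is only an illustrative Python sketch: the function name and the (minOccurs, maxOccurs) encoding, with None standing for '*', are assumptions and not part of the VO-DML specification.

```python
from typing import Optional


def is_allowed_attribute_multiplicity(min_occurs: int, max_occurs: Optional[int]) -> bool:
    """Check a minOccurs..maxOccurs pair against the attribute rules quoted above.

    max_occurs=None stands for the unbounded '*' (an assumed encoding).
    """
    if max_occurs is None:
        # Open-ended collections ('*') are not allowed on attributes.
        return False
    if (min_occurs, max_occurs) == (0, 1):
        # Optional single value.
        return True
    # n..n with n a positive integer: a fixed-size array (1..1 is the n = 1 case).
    return min_occurs == max_occurs and min_occurs >= 1


# Curation.reference = string[0..*] would be rejected:
print(is_allowed_attribute_multiplicity(0, None))  # False
print(is_allowed_attribute_multiplicity(3, 3))     # True
```

Under these rules 0..3 is also rejected, since "0..n, with n>1" is explicitly excluded.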

Gerard explains the reasoning as:
  "Motivation for disallowing * is somewhat philosophical, related to the
semantics of datatype. Also related to the type of data models VO-DML is
targeting. VO-DML is aimed at models that can give a conceptualization of a
domain, rather than providing an implementation/serialization. For the
latter we have our VOTable mapping document for example, where the vo-dml
is used in annotations.
Idea is that if you have a collection-like concept for which you do not
know how many instances you will have (i.e. multiplicity is *), it becomes
a meaningful action to first state that such an instance exists and then
that it is contained in the collection on a certain object. Such a concept
should be modelled by collection of object types, not list of data types."
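One way to picture this distinction in code: a datatype behaves like an immutable value compared by content, while an object type has its own identity and is added to a collection by an explicit step. The Python sketch below is only an analogy; all class names are hypothetical and not taken from any VO-DML model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CoordValue:
    """Hypothetical DataType: no identity, two instances are equal if their values are."""
    name: str
    value: float


@dataclass(eq=False)
class CustomAxis:
    """Hypothetical ObjectType: each instance has its own identity."""
    coord: CoordValue


@dataclass(eq=False)
class PointDataSketch:
    """Hypothetical container composing an open-ended (0..*) collection of objects."""
    custom_axes: List[CustomAxis] = field(default_factory=list)

    def add_axis(self, axis: CustomAxis) -> None:
        # The axis instance exists first; this step places it in the collection.
        self.custom_axes.append(axis)


p = PointDataSketch()
p.add_axis(CustomAxis(CoordValue("temperature", 5800.0)))
print(len(p.custom_axes))  # 1
```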

There are, however, several instances in the current model work where this
approach has not been followed.
    - array of coefficients in a Polynomial transform definition
    - number of coordinate values along an enumerated coordinate axis (Enum

    - Curation.reference = string[0..*]
    - DataID.collection= string[0..*]
    - DataID.contributor= string[0..*]
    NOTE: the STC2 prototype also contains the above transform-related items

    - PointData.customAxes = GenericCoord[0..*]
        a list of coordinate values not representable by the domain-specific coordinate types.
        GenericCoord is a dataType, so customAxes is an Attribute, and this multiplicity is not allowed.

    - Model.author = string[0..*]

There are also some speculative cases that have not yet been modeled:
  - pixel data values in a spectrum
  - number of samples in a time series
  i.e. cases where the length of the values array is set by a dimensionality attribute:  ndim[1],  values[ndim]
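The ndim[1], values[ndim] pattern couples an array's length to the value of another attribute. A minimal Python sketch, with a hypothetical class name, shows that the coupling can be checked at construction time but can drift if ndim is changed afterwards.

```python
class NDimValues:
    """Hypothetical type with ndim[1] and values[ndim]."""

    def __init__(self, ndim: int, values: list) -> None:
        # The coupling can be checked when the instance is built...
        if len(values) != ndim:
            raise ValueError("len(values) must equal ndim")
        self.ndim = ndim
        self.values = list(values)

    def is_consistent(self) -> bool:
        return len(self.values) == self.ndim


v = NDimValues(3, [1.0, 2.0, 3.0])
v.ndim = 2                 # ...but nothing ties later changes of ndim to values
print(v.is_consistent())   # False: values still holds 3 entries
```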

Gerard has suggested:
  "I think we should allow 0..n for attributes, but with a different
interpretation from its counterpart in composition relations and
references.  It would imply that the value of the attribute is either 0, or
an array of length exactly n.  In relations the interpretation remains that
one can have between 0 and n (inclusive) instances in the collection."

and added:
  "Allowing n to be an integer attribute with multiplicity 1 I am still
strongly against.  For example, what if, after initializing the 0..n
attribute, you change the value of n?"

(Which is an excellent point, BTW.)
However, I don't think this suggestion resolves any of the above cases,
since the number of instances is not known in advance.
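The difference between the two proposed readings of 0..n can be made concrete. In the Python sketch below (function names are hypothetical), the attribute check accepts only zero values or exactly n, while the relation check accepts any count up to n. Because n is a literal fixed in the model, neither check helps when the number of entries, as with Model.author, is not known in advance.

```python
from typing import Optional, Sequence


def attribute_0_n_ok(values: Optional[Sequence], n: int) -> bool:
    """Proposed reading for attributes: no values at all, or exactly n of them."""
    count = 0 if values is None else len(values)
    return count in (0, n)


def relation_0_n_ok(items: Sequence, n: int) -> bool:
    """Usual reading for compositions and references: anywhere from 0 to n."""
    return 0 <= len(items) <= n


print(attribute_0_n_ok(["a", "b"], 3))  # False: 2 is neither 0 nor 3
print(relation_0_n_ok(["a", "b"], 3))   # True
# With n fixed in the model, a longer author list fails both checks.
four = ["a", "b", "c", "d"]
print(attribute_0_n_ok(four, 3), relation_0_n_ok(four, 3))  # False False
```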

For the simple string cases, creating an object to hold the string datatype
would resolve the compliance issue, but seems overly complicated.  One
could argue that you wouldn't necessarily need to serialize the
containers, but I would not like to see different serialization rules for
simple containers vs. complex ones.
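The overhead of the compliant workaround can be seen side by side. The Python sketch below uses DataID.contributor from the list above, but the wrapper class Contributor and both container classes are hypothetical illustrations, not definitions from any model.

```python
from dataclasses import dataclass, field
from typing import List


# Option A: what the current rules disallow, a string[0..*] attribute.
@dataclass
class DataIDWithAttribute:
    contributor: List[str] = field(default_factory=list)


# Option B: compliant workaround, a hypothetical object type whose only
# job is to hold one string, composed 0..* into DataID.
@dataclass
class Contributor:
    name: str


@dataclass
class DataIDWithObjects:
    contributor: List[Contributor] = field(default_factory=list)


names = ["Alice", "Bob"]
a = DataIDWithAttribute(contributor=names)
b = DataIDWithObjects(contributor=[Contributor(n) for n in names])
# Same information; option B adds one wrapper object per string.
print([c.name for c in b.contributor] == a.contributor)  # True
```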

For the more complex cases, one could (and should) question the modeling to
be sure the approach is correct (on an appropriate separate thread).  That
review may or may not find an alternative solution.

For those requesting that n be a non-negative integer attribute with multiplicity 1:
  perhaps in these cases n is not a modeled property, but is derived from the
length of the array.  If so,
  then these become open-ended values[*], which matches the cases above.
Constraints can then restrict the allowed lengths, e.g. '* = 1,2,3'.
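Under that reading, n is a derived property rather than a stored attribute. A minimal Python sketch, with hypothetical names, keeps only the array, computes n from its length, and expresses the '* = 1,2,3' constraint as a set of allowed lengths.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class OpenEndedValues:
    """Hypothetical values[*] attribute; n is derived, not stored."""
    values: List[float] = field(default_factory=list)
    # Constraint on the open multiplicity, e.g. '* = 1,2,3'.
    allowed_lengths: FrozenSet[int] = frozenset({1, 2, 3})

    @property
    def n(self) -> int:
        # Derived from the array, so it can never disagree with it.
        return len(self.values)

    def satisfies_constraint(self) -> bool:
        return self.n in self.allowed_lengths


p = OpenEndedValues([0.5, 1.5])
print(p.n, p.satisfies_constraint())  # 2 True
p.values.extend([2.5, 3.5])
print(p.n, p.satisfies_constraint())  # 4 False
```

Because n is computed rather than assigned, the "what if n changes" problem cannot arise; only the constraint check can fail.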

So the issue is still unresolved.  In my opinion, the string cases may
be the better ones to focus on, since they would generate the most overhead for
the least benefit: we would have several objects that simply hold a
primitive value.

or maybe just

I'll close with an opinion on the suggested change to allow 0..n with a
special interpretation.
I expect there may be use cases that need optional fixed-size array
attributes (served by 0 OR n, with n a literal), but I don't think we've
seen them yet.
I think the notation 0..n will be widely misinterpreted by people reading
the model diagrams.
The notation '0,n' would be more intuitive, but it is not standard.  So this
feels like a combination of multiplicity and constraint:   multiplicity = '*' with constraint "* =
