Proposed erratum to clarify arraysize="1"

Tom Donaldson tdonaldson at stsci.edu
Thu Feb 15 17:31:37 CET 2018


Mark and all,

I think the R2 ruling proposal is interesting.  By simply making arraysize="1" illegal, we would:

   - Eliminate the complications of explaining that option in the spec or of interpreting it in clients.
   - Possibly be doing what was meant by the original spec anyway (“The arraysize attribute exists when the corresponding table cell contains more than one…”).

As you say, it has the downside of not supporting 1-element arrays. I admit to feeling a certain appeal in the consistency and completeness of offering 1-element arrays (just as those are available in programming languages), but I can’t think of a specific need, so I won’t push hard either way.

I do think it’s important that with either R1 or R2, clients should not have to change.  The erratum is just another input to consider when deciding their implementation.

As a side note, I agree that STIL/TOPCAT does do the sensible thing.  ☺  For astropy, however, it’s worth noting that although it treats arraysize="1" values as “arrays”, those 1-element arrays are flexible enough within the astropy/numpy framework that the value ends up being treated as a scalar when one is needed.

As a simple example, this output:

One value is an array, the other is a scalar.
f1 value = [1] (arraysize="1")
f2 value = 1   (scalar)

The array value is treated as a scalar when needed.
f1val + f2val == 2

Is generated by this code:

import io

from astropy.table import Table

vot = """<?xml version="1.0"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
      <RESOURCE>
            <TABLE name="trap">
                  <FIELD datatype="short" name="f1" arraysize="1"/>
                  <FIELD datatype="short" name="f2"/>
                  <DATA>
                        <TABLEDATA>
                              <TR>
                                    <TD>1</TD>
                                    <TD>1</TD>
                              </TR>
                        </TABLEDATA>
                  </DATA>
            </TABLE>
      </RESOURCE>
</VOTABLE>
"""

# Parse the in-memory VOTable document into an astropy Table.
t = Table.read(io.BytesIO(bytes(vot, 'utf-8')), format='votable')

for row in t:
    f1val = row['f1']
    f2val = row['f2']

    print('One value is an array, the other is a scalar.')
    print('f1 value = {} (arraysize="1")'.format(f1val))
    print('f2 value = {}   (scalar)'.format(f2val))

    print('\nThe array value is treated as a scalar when needed.')
    # A 1-element boolean array is truthy or falsy like a plain bool.
    if f1val + f2val == 2:
        print('f1val + f2val == 2')

I expect that the main problems come up in cases where it’s not clear a scalar is expected, such as re-serialization for sharing with other consumers.
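
To make that concrete, here is a minimal numpy-only sketch (not astropy's actual VOTable writer) of why arithmetic hides the difference while anything that inspects the shape, such as a serializer, still sees it. The names f1val and f2val mirror the example above, but the values are built by hand on the assumption that arraysize="1" yields a 1-element int16 array. to_plain is a hypothetical helper standing in for a re-serialization step.

```python
import json

import numpy as np

# Hand-built stand-ins for the values in the example above (assumption:
# a 1-element int16 array for arraysize="1", a numpy scalar otherwise).
f1val = np.array([1], dtype=np.int16)
f2val = np.int16(1)


def to_plain(value):
    """Hypothetical re-serialization step: convert to plain Python types."""
    return np.asarray(value).tolist()


# Arithmetic and comparison broadcast, so the array acts like a scalar.
total_is_two = bool((f1val + f2val == 2).all())

# Shape-aware code still sees two different things.
shapes = (np.shape(f1val), np.shape(f2val))
serialized = (json.dumps(to_plain(f1val)), json.dumps(to_plain(f2val)))

print(total_is_two, shapes, serialized)
```

A consumer of the serialized form receives a list in one case and a bare number in the other, which is where the ambiguity could resurface.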

Cheers,
Tom


From: <apps-bounces at ivoa.net> on behalf of Mark Taylor <m.b.taylor at bristol.ac.uk>
Date: Thursday, February 15, 2018 at 5:36 AM
To: Applications WG <apps at ivoa.net>
Subject: Re: Proposed erratum to clarify arraysize="1"

Tom, Markus, others,

On Mon, 12 Feb 2018, Tom Donaldson wrote:

> Since this is an erratum, I was thinking in terms of putting the wording
> there that we would have put in the original if we had realized the lack
> of clarity.  In that frame of mind, the use of “must” makes sense.
> That is what the spec should have said.  (I do see this as a clarification
> rather than a change.  Section 4.1 does already say, “The arraysize
> attribute exists when the corresponding table cell contains more than
> one…”. )

Yes well that does make me think.  That spec text which you quote
implies, though it does not explicitly state, that the use of
arraysize="1" is simply incorrect.

Given that, behaviour on encountering that construction is unspecified,
so VOTable consumers should either refuse to process such elements
(not very helpful) or if not, should do the most sensible thing.
As the erratum points out, "it is not likely that the provider meant
for the values to be interpreted as an array.", so what current
client software ought to do is to treat arraysize="1" as a scalar.

The Erratum currently says:

   "Clients that interpret arraysize="1" exclusively as a single scalar
    value should consider interpreting those values as arrays of size 1,
    since that's what compliant services intend."

But, as I (and the Erratum) have said above, it's also most likely
the *opposite* of what pre-Erratum services intend.

[Attentive readers will have spotted that in my argument above
I'm characterising STIL/TOPCAT behaviour as "doing the sensible
thing", and Astropy as ... well, not that.  (If anybody familiar
with other VOTable software can report what that does in such
circumstances, it would be interesting to hear.)  Obviously, I'm not
a disinterested observer in this, so if anybody wants to shoot this
down, please fire away.]

So I'm still not convinced that this erratum looks like a clarification
rather than a new decision about what VOTable ought to have said
in the first place.  So much for the legal philosophy.

But in practical terms, the Erratum is saying:
   R1:
      - if arraysize="1" is present, it means a 1-element array
      - don't write arraysize="1" unless you mean a 1-element array

Given that ruling:

   1a) VOTable producers should reconsider whether they really mean
          arraysize="1"
   1b) VOTable validators should present a warning for arraysize="1"
          that producers may or may not need to do something about
   1c) Astropy-like clients can carry on unchanged
   1d) STIL-like clients should go from treating arraysize="1" as a scalar
          to treating it like an array (but may wish to exercise flexibility)
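
The two client behaviours in (1c) and (1d) can be sketched as a single function with a switchable policy. This is only an illustration: interpret_cell and its policy argument are hypothetical names, the sketch handles whitespace-separated integer cells only, and it does not reproduce what STIL or Astropy actually do internally.

```python
def interpret_cell(text, arraysize=None, policy="scalar"):
    """Interpret one TABLEDATA cell of an integer column (hypothetical helper).

    policy="scalar": STIL-like, arraysize="1" is read as a plain scalar.
    policy="array":  Astropy-like, arraysize="1" is read as a 1-element array.
    """
    if policy not in ("scalar", "array"):
        raise ValueError("unknown policy: {!r}".format(policy))
    values = [int(token) for token in text.split()]
    # Only arraysize="1" is ambiguous; any other arraysize means an array.
    is_array = arraysize is not None and not (
        arraysize == "1" and policy == "scalar"
    )
    if is_array:
        return values
    if len(values) != 1:
        raise ValueError("expected exactly one value, got {}".format(len(values)))
    return values[0]


print(interpret_cell("1"))                                  # no arraysize
print(interpret_cell("1", arraysize="1", policy="scalar"))  # STIL-like
print(interpret_cell("1", arraysize="1", policy="array"))   # Astropy-like
print(interpret_cell("1 2 3", arraysize="3"))               # ordinary array
```

Under R1 a STIL-like client would change its default policy from "scalar" to "array"; keeping the switch is one way to "exercise flexibility".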

As I've said in an earlier message, I'm not going to make the change (1d)
any time soon, at least in the default configuration of TOPCAT/STILTS,
because it's likely to break stuff.

An alternative possibility would be:

   R2:
      - arraysize="1" is illegal
      - don't write arraysize="1", ever

(note this does not correspond to any of the options in Markus's
final "What do we do?" slide from
http://wiki.ivoa.net/internal/IVOA/InterOpOct2017Apps/arraysize-slides.pdf)

Given that ruling:

   2a) VOTable producers should stop writing arraysize="1"
   2b) VOTable validators should present an unambiguous error message
          for arraysize="1"
   2c) Astropy-like and STIL-like clients can carry on unchanged

Thus (2c) clients can continue to implement unspecified but plausible
behaviour for arraysize="1", but behavioural differences will
disappear as (2a) services stop emitting ambiguous VOTables.
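
For validators, the difference between (1b) and (2b) comes down to the severity attached to the same finding. A rough sketch, using only the Python standard library, might look like the following. check_arraysize is a hypothetical helper, not part of any existing validator, and it inspects FIELD elements only.

```python
import xml.etree.ElementTree as ET


def check_arraysize(votable_xml, ruling):
    """Return (severity, message) pairs for each arraysize="1" FIELD.

    ruling="R1" reports a warning, ruling="R2" reports an error.
    Hypothetical helper for illustration only.
    """
    severity = {"R1": "WARNING", "R2": "ERROR"}[ruling]
    findings = []
    for elem in ET.fromstring(votable_xml).iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "FIELD" and elem.get("arraysize") == "1":
            name = elem.get("name", "?")
            findings.append(
                (severity, 'FIELD "{}" declares arraysize="1"'.format(name))
            )
    return findings


doc = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE><TABLE name="trap">
    <FIELD datatype="short" name="f1" arraysize="1"/>
    <FIELD datatype="short" name="f2"/>
  </TABLE></RESOURCE>
</VOTABLE>"""

print(check_arraysize(doc, "R1"))
print(check_arraysize(doc, "R2"))
```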

I am therefore wondering if the Erratum should say R2 rather than R1.
The prescription is simpler, and the implementation is less disruptive.

The downside of R2 is that there is then no way to specify a
1-element array.  Question: is that a problem?  Does anybody
actually need to communicate 1-element arrays as distinct from scalars?
As I pointed out earlier in this thread, FITS BINTABLE has got away
with that situation since its invention.
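
To illustrate the FITS point: a BINTABLE column format (TFORMn) carries an optional repeat count that defaults to 1, so "J" and "1J" describe the same column and there is no separate notation for a 1-element array. The sketch below shows this with a simplified parser. parse_tform is an illustrative helper, not a function from any FITS library, and it ignores everything after the type code.

```python
import re

# FITS BINTABLE TFORMn values have the form rT..., where the repeat
# count r is optional and defaults to 1.
TFORM_RE = re.compile(r"^(\d*)([LXBIJKAEDCMPQ])")


def parse_tform(tform):
    """Return (repeat, type_code) for a TFORMn value (illustrative helper)."""
    match = TFORM_RE.match(tform.strip())
    if match is None:
        raise ValueError("unrecognised TFORM: {!r}".format(tform))
    repeat = int(match.group(1)) if match.group(1) else 1
    return repeat, match.group(2)


print(parse_tform("J"))   # scalar 32-bit integer column
print(parse_tform("1J"))  # same column, repeat count written out
print(parse_tform("3J"))  # 3-element array column
```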

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/