RegTAP 1.1

Markus Demleitner msdemlei at
Wed Jan 17 15:15:09 CET 2018

Hi Mark, Hi Registry,

On Sun, Jan 07, 2018 at 12:14:49AM +0000, Mark Taylor wrote:
> On Thu, 7 Dec 2017, Markus Demleitner wrote:
> > Since I guess VOResource 1.1 won't change terribly much any more, I
> > went ahead and put up RegTAP 1.1 on the document repository, also in
> > celebration of the REC's fourth birthday:
> > 
> >
> Data model identifier:
> ---------------------
> Section 7 discusses "declaring support for the data model Registry 1.0"
> by adding an element to the RegTAP capability:
>    <dataModel ivo-id="ivo://"
>      >Registry 1.0</dataModel>
> This document discusses RegTAP 1.1, so I *think* that all the "1.0"s
> in the above should be replaced by "1.1".

Indeed.  Fixed.

> Column data types:
> -----------------
> The listed types for each table column tabulated in sections 8.1-8.14
> have changed since RegTAP 1.0.  They used to be things like
> "VARCHAR(*)" and "REAL(1)".  Now they are things like "char(*)"
> and "float(1)" (I'm not sure if this text is auto-generated or
> if the change is intentional).  There are a few issues here.

It's autogenerated, essentially from DaCHS' schema, and the change as
such is intentional to follow TAP 1.1 TAP_SCHEMA changes.  I've not
paid attention, though, and thus things went... erm... bad.

As you write below, the actual VOTable serialisation details should
rarely matter, and so I'm mapping both char(*) and unicodeChar(*) to
"string"; but I'm feeling this needs some elaboration, so I put this
paragraph into the introduction to the table listing:

  Many of the columns specified below are defined as having a ``string''
  data type.  This is to be translated into arrays of \texttt{char} or
  \texttt{unicodeChar} on VOTable output depending on the service
  operators' decisions as to the representation of non-ASCII data in the
  database.  For requirements and recommendations regarding national
  characters in RegTAP, see Sect.~\ref{utfreq}.  The length of these
  arrays is not defined by this standard; obviously, implementations
  should not impose artificial length limits.  Implementors who have
  to limit the length of strings in their databases are referred to

I'm not 100% happy with this.  Better ideas are welcome.

>   - This change is not noted in the change log Appendix E.1

Fixed, thanks.

>   - The headings of these tables in the text still say "ADQL types",
>     but the listed values are now VOTable types.

Oops.  This is now "datatypes".

>   - "TIMESTAMP(1)" has been changed to "char(*)" for column
>     "created" in table "rr.resource".  The textual description
>     hasn't changed however, and it doesn't say how to serialize
>     the creation time in a char(*).  Presumably some mention of
>     ISO-8601 or DALI would be in order.  There may be other
>     similar cases in other tables/columns, I haven't checked.

Ah, the joy of xtypes.  Yes, that was a bug in the translation
script.  I now add xtypes as appropriate, and I've added introductory

  Some of the types are given as ``datatype+xtype''.  In these cases,
  the xtype MUST be given on VOTable output, and the serialisation
  rules from DALI \citep{2017ivoa.spec.0517D} apply.
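
For readers who have not looked at DALI recently, here is a minimal
Python sketch of what serialising a creation time under the timestamp
xtype might look like.  It assumes the DALI form YYYY-MM-DDThh:mm:ss,
taken to be UTC and written without a timezone designator; the function
name to_dali_timestamp is made up for this illustration and is not part
of any library or of RegTAP.

```python
from datetime import datetime, timedelta, timezone


def to_dali_timestamp(dt):
    """Format a datetime as a DALI-style timestamp string.

    Assumption: naive datetimes are already in UTC; aware datetimes
    are converted to UTC before the timezone information is dropped.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    # timespec="seconds" drops any sub-second part.
    return dt.isoformat(timespec="seconds")


# A naive value is passed through unchanged ...
print(to_dali_timestamp(datetime(2018, 1, 17, 15, 15, 9)))
# ... while a CET (+01:00) value is shifted to UTC first.
cet = timezone(timedelta(hours=1))
print(to_dali_timestamp(datetime(2018, 1, 17, 15, 15, 9, tzinfo=cet)))
```

The resulting string is what would go into the char(*) cell, with
xtype="timestamp" declared on the corresponding FIELD.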

>   - Usages like "float(1)" as opposed to "float" seem questionable
>     with reference to recent discussions about whether for VOTable
>     arraysize a missing value is equivalent to a value of "1".

Another oversight in the new translation scripts.  Fixed.
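
To make the intended translation concrete, here is a rough Python
sketch of how a type declaration from the column tables could be turned
into VOTable FIELD attributes.  The declaration spellings ("string",
"integer", "real", "string+timestamp"), the function name and the
non_ascii switch are assumptions made for this illustration only; the
document's actual labels may differ.

```python
# Hypothetical abstract type names mapped to VOTable scalar datatypes.
SCALAR_TYPES = {"integer": "int", "real": "float"}


def votable_field_attrs(type_decl, non_ascii=False):
    """Return FIELD attributes for a declaration like 'string+timestamp'.

    non_ascii stands in for the operator's decision on how non-ASCII
    data is represented in the database.
    """
    base, _, xtype = type_decl.partition("+")
    if base == "string":
        # Strings become variable-length character arrays.
        attrs = {
            "datatype": "unicodeChar" if non_ascii else "char",
            "arraysize": "*",
        }
    else:
        # Scalars carry no arraysize at all (i.e. "float", not "float(1)").
        attrs = {"datatype": SCALAR_TYPES[base]}
    if xtype:
        # When an xtype is declared, it is written out as well.
        attrs["xtype"] = xtype
    return attrs


print(votable_field_attrs("string"))
print(votable_field_attrs("string+timestamp"))
print(votable_field_attrs("real"))
```

Leaving arraysize out for scalars sidesteps the question of whether a
missing arraysize and arraysize="1" mean the same thing.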

>   - In general I would favour type descriptions in this context
>     which are not tied to VOTable (or ADQL), since as I understand
>     it it's the data model not the serialization that is being
>     defined here.  Compare discussions near
>     The current TAP 1.1 working version (since volute revision 4286,
>     currently unpublished) has adopted this suggestion and describes
>     column types as e.g. "string" or "integer".
>     In this case there might be an argument for distinguishing
>     "unicode string" and "ASCII string" or similar.

Agreed, except that I *would* like to force operators to indeed
serialise timestamps to VOTables as foreseen by DALI, and the simplest
way to make people do that is to give the xtype.

The exposition of this could certainly be improved, and I'd
gratefully consider any commits to that end to the repository.

> standard_id in Examples:
> -----------------------
> The example in Sec 10.1 includes the clause
>    WHERE standard_id like 'ivo://'
> but 10.2 (and some others) has
>    WHERE standard_id='ivo://'
> I've got a feeling there's a good reason that the pattern-matched
> form makes sense for TAP but not for SIA, but I've forgotten
> what it is.  Since these examples are intended to be pedagogical,
> and in case other readers are as ill-informed/forgetful about IVOID
> as me, it would be nice to add some text explaining the discrepancy
> (or avoid it if there is no good reason).

The text you vaguely remember is in Identifiers 2.0, sect 4.2,

And true, I simply hadn't updated the respective examples.
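
For anyone wanting to see the practical difference between the two
forms, here is a small sketch using Python's sqlite3 module as a
stand-in for a RegTAP service.  The table layout, the identifiers under
ivo://example.org and the fragment suffix are all invented for this
illustration; they are not the identifiers from the examples, and the
assumption that a standard's identifiers may differ only in a trailing
part is mine.

```python
import sqlite3

# Made-up rows: two capabilities of the same hypothetical standard
# (one carrying a suffix) and one of an unrelated standard.
ROWS = [
    ("ivo://example.org/a", "ivo://example.org/std/foo"),
    ("ivo://example.org/b", "ivo://example.org/std/foo#sync-1.1"),
    ("ivo://example.org/c", "ivo://example.org/std/bar"),
]


def count_matches(where_clause):
    """Count rows of the toy capability table matching a WHERE clause."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE capability (ivoid TEXT, standard_id TEXT)")
    con.executemany("INSERT INTO capability VALUES (?, ?)", ROWS)
    query = "SELECT COUNT(*) FROM capability WHERE " + where_clause
    n = con.execute(query).fetchone()[0]
    con.close()
    return n


# Equality only finds the identifier written exactly like this.
print(count_matches("standard_id = 'ivo://example.org/std/foo'"))
# The pattern also picks up the variant with a suffix.
print(count_matches("standard_id LIKE 'ivo://example.org/std/foo%'"))
```

Under these assumptions the equality test finds one row and the pattern
finds two, which is the kind of difference the examples need to explain.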

> -----
> The term "IVORN" is used several times (I count 15) in the text.
> My understanding is that this term is deprecated following section
> 1.1 of IVOID 2.0.

Right.  Fixed.  Thanks.

> QName example:
> -------------
> The formatted text renders the QName example in sec 5 as
> "".
> I think this is missing some curly brackets that have been
> eaten by LaTeX.

(Brown Bag).  I can hardly believe I never spotted such a glaring
typo when proofreading RegTAP 1.0.  Well, serves me right for using
\texttt{} rather than \verb|| in that place.

> Formatting:
> ----------
> It looks a bit surprising to have some lowercase XML element
> and attribute names rendered in smallcaps (e.g. "The STATUS attribute
> of VR:RESOURCE ..." in sec 8.1 - LaTeX and the browser know those are
> lower case strings, but humans may not).  This appears to be the work
> of the \vorent macro.  But probably anybody paying enough attention
> to notice can work out what's going on, so it's not a very big deal.

Now that you put it like that I notice that perhaps that hasn't been
my greatest typographical choice.  Until someone else speaks out I'll
not change it, though, since I suspect it's less confusing than one
might think and the convention is already there in several other
published documents that we can't really recall or easily fix.

Thanks a lot for the review.  For the benefit of other reviewers,
I've put a PDF of the fixed document on, and of course you can simply check
out the latest text from
(which is also the best way to review the changes I've just made; all
of the above is between revisions 4684 and 4685).

        -- Markus
