RegTAP Post-RFC: Whitespace/NULL

Menelaus Perdikeas mperdikeas at sciops.esa.int
Fri May 23 06:44:42 PDT 2014


Hi Markus,

If memory serves, I had voiced a preference for maintaining the distinction between NULLs and empty strings, purely on the grounds of not losing information in the database layer: let the clients decide for themselves whether this distinction is material (but don't make that decision for them). That was based entirely on a domain-agnostic, abstract, shall we say, line of reasoning, without considering use cases (with which I am rather unfamiliar).

Another consideration, again abstract, is that stripping and implicitly converting to NULL would require more fields in the RegTAP schema (more precisely: "in a concrete implementation of the RegTAP schema for a particular RDBMS") to be made NULLable, resulting in a looser, more permissive schema. But I suppose that in practice most fields will likely be declared NULLable to begin with (regardless of whether they are actually mandatory as per the XSDs), either purposefully or for convenience.

At any rate, I can certainly change the ingestion process to accommodate your proposed stripping and conversion to NULL.

Cheers,
Menelaus.

----- Original Message -----
From: "Markus Demleitner" <msdemlei at ari.uni-heidelberg.de>
To: registry at ivoa.net
Sent: Wednesday, May 21, 2014 9:11:53 PM
Subject: RegTAP Post-RFC: Whitespace/NULL

Dear list,

as promised, here's the second installment of the Post-RFC
consultation on RegTAP based on

http://wiki.ivoa.net/internal/IVOA/InterOpMay2014Registry/regtaprfc.pdf

The issues at hand have to do with whitespace and are discussed at
some length in 5. and 6. in that document.

I believe the question of whitespace normalisation is tame.  The few
cases in which our stuff doesn't come in xs:token in the first place
(and rather in xs:string) look essentially like oversights to me --
nobody intended for leading or trailing whitespace to be part of
these things.

So, given the vagaries of what the parsers actually do, as discussed
in the lecture notes, I am inclined to require that all strings be
whitespace-stripped; I wouldn't say anything about normalising
internal whitespace, as I don't foresee relevant interoperability
problems there.

The question whether NULLs and empty strings should be made
equivalent in the database itself is not quite so clear.  In the
session, basically everyone who said something argued for folding
them, mainly on the grounds that there's no good reason for telling
them apart in the first place.

I'd accommodate that by saying all empty strings (and, by the
whitespace-stripping requirement, whitespace-only strings) MUST be
mapped to NULL on ingestion.
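
As a minimal sketch of that rule, assuming a Python-based ingestor:
the function name normalise_for_ingestion is hypothetical, and
restricting "whitespace" to the four XML whitespace characters is an
assumption for illustration, not something settled here.

```python
from typing import Optional

# Assumption: "whitespace" means the XML whitespace characters
# (space, tab, carriage return, line feed).
XML_WHITESPACE = " \t\r\n"


def normalise_for_ingestion(value: Optional[str]) -> Optional[str]:
    """Strip leading/trailing whitespace; fold empty results into None.

    None stands in for a database NULL.  Internal whitespace is left
    untouched, since only stripping is proposed.
    """
    if value is None:
        return None
    stripped = value.strip(XML_WHITESPACE)
    # Empty and whitespace-only strings both end up empty here.
    return stripped if stripped else None
```

With this, "  M 31 " would be stored as "M 31", while "" and "   "
would both be stored as NULL.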

Pre-Interop consultation with other implementors, on the other hand,
indicated that would be unpopular with them.  Database people tend to
be suspicious of NULLs, and there are good reasons for that, both
from a theoretical and a practical point of view.

Me, I just want all registries to do the same thing.  Which thing
that is I don't care much.  If nobody comes up with widely accepted
arguments for why NULL-folding is a terrible idea, I'd do it as
sketched above.

Cheers,

         Markus
