SIA-2.0: query params for string values

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Jun 26 00:25:24 PDT 2014


Hi,

I'm afraid I simply cannot keep my mouth shut when we're talking
about parameter syntax and semantics...

On Wed, Jun 25, 2014 at 09:53:34AM -0700, Patrick Dowler wrote:
> Now, we have a proposal to define string-value comparisons such that:
> 
> FOO=abc
> 
> is satisfied if "abc" matches a substring of the stored value (foo
> contains "abc"). I haven't really thought this through, but there are
> cases where exact match is the better behaviour (e.g.
> ID=<obs_publisher_did>) and others where substring matching is
> arguably better... but I'm not convinced and I'm not crazy about
> different params behaving differently depending on one sentence in
> the spec.

Nah, special rules by parameter are terrible, in particular because
people will use custom parameters, and then nobody will know which
semantics those have.  And again, it's going to be hard to make such
things discoverable.

For reasons of consistency (also with the outside world) "normal"
parameter behaviour should be comparison for equality.  For special
effects, I think we should point to ADQL.  If we insist we want
special effects on our S-protocols, I'm again proposing extra
parameters (may I throw in "Hungarian Notation" to give this a bit of
academic credibility?) -- a service that actually supports partial
matches would just declare support for a parameter

FOO_PARTICLE

and any value(s) given would allow partial matches; whoever adds
parameters that do partial matching would use the _PARTICLE suffix,
too.
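
To make this a bit more concrete, here is a minimal Python sketch of
what I have in mind.  Only the _PARTICLE suffix is from the proposal
above; the function name, the dict-based record, and the idea that
several values given for one parameter are alternatives are
assumptions made up for illustration.

```python
PARTIAL_SUFFIX = "_PARTICLE"  # suffix proposed above; the rest is hypothetical


def matches(record, params):
    """Return True if record satisfies every query parameter.

    record: dict mapping parameter names (e.g. "FOO") to stored strings
    params: dict mapping parameter names to lists of acceptable values
    """
    for name, values in params.items():
        if name.endswith(PARTIAL_SUFFIX):
            # FOO_PARTICLE: any given value may match a substring of FOO.
            stored = record.get(name[:-len(PARTIAL_SUFFIX)])
            if stored is None or not any(v in stored for v in values):
                return False
        else:
            # Plain FOO: case-sensitive comparison for equality.
            stored = record.get(name)
            if stored is None or stored not in values:
                return False
    return True
```

With this, FOO=abc would not select a record storing "xabcx", while
FOO_PARTICLE=abc would.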

[either that or someone sits down and actually writes a full-blown
query language -- VizieR's would be a good starting point with
relatively few wrinkles to iron out; for strings, that might actually
work out much better than it would for numbers as we wouldn't have to
lie that blatantly about types here.]


> A second, completely orthogonal issue that has come up before is case
> sensitivity. Should we define these comparisons to always be
> non-case-sensitive? There is the fact that ObsCore dataproduct_type
> has fixed lowercase values defined, so this doesn't make sense there,
> but we can word carefully for such "enumerated" values to not open it
> up to mixed or upper case there.

Case-insensitivity is a bane for more reasons than I care to
enumerate here.  After having written RegTAP, I start groping for my
stress therapist's phone number as soon as I type the second
"n" in the c-i word.

I'd be fine with declaring _PARTICLE parameters as "ignoring case in
some operator-defined sense"[1], as these need a lot of massaging, and
aren't useful for "exact" matches anyway, but normal parameter
matching should be case-sensitive ("compare for equality").

More precisely, I believe we shouldn't say anything about case
folding in the protocol.  That's because unfortunately, we're already
knee-deep in case-insensitivity hell: IVORNs were defined as being
c-i (which of course leads to no end of interoperability woes because
few services actually treat them accordingly).  So, if there were a
PUBDID parameter (which could be useful in certain datalink
scenarios), backend matching would have to be case-insensitive.  But
that's a (regrettable) property of IVORNs, not the protocol.
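
A rough Python sketch of that separation, in which the protocol only
ever says "compare for equality" and the backend decides what
equality means for a given type.  The PUBDID parameter is the
hypothetical one from above; the function names and the comparator
table are likewise invented for illustration, and I am assuming the
identifiers are plain ASCII so that lower() is good enough.

```python
def ivorn_equal(stored, given):
    # IVORNs are declared case-insensitive; assuming plain ASCII,
    # lower-casing both sides is sufficient for the comparison.
    return stored.lower() == given.lower()


# Hypothetical table of parameters whose values need special equality.
# Anything not listed here is compared as-is, i.e. case-sensitively.
COMPARATORS = {"PUBDID": ivorn_equal}


def param_equal(name, stored, given):
    compare = COMPARATORS.get(name, lambda a, b: a == b)
    return compare(stored, given)
```

Nothing about case folding has to appear in the protocol text for
this to work; the table lives entirely in the service.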

Now what was that stress therapist's phone number?

Taking a deep breath,

          Markus


[1] Note that case folding outside of plain ASCII is tricky business,
including interesting locale dependencies (lower('I'), for instance,
is different in Turkish from lower('I') everywhere else, and
upper('ß') has caused snappy press reports over here).
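
For the curious, a few of these wrinkles can be seen from Python 3,
whose str methods use locale-independent Unicode mappings (so the
Turkish behaviour is exactly what you do not get by default).  The
helper name is made up for illustration.

```python
def survives_round_trip(s):
    """Return True if upper-casing and then lower-casing gives s back."""
    return s.upper().lower() == s


# Plain ASCII is harmless.
assert survives_round_trip("abc")

# upper() of sharp s (U+00DF) is "SS", which then lowers to "ss".
assert "\u00df".upper() == "SS"
assert not survives_round_trip("\u00df")

# Turkish dotless i (U+0131) uppercases to plain "I",
# which lowers to the dotted "i" rather than back to U+0131.
assert "\u0131".upper() == "I"
assert not survives_round_trip("\u0131")

# Capital I with dot above (U+0130) lowers to "i" plus a combining
# dot above, i.e. two code points instead of one.
assert len("\u0130".lower()) == 2
```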


