ADQL polymorphic functions

Grégory Mantelet gregory.mantelet at astro.unistra.fr
Fri Apr 23 10:37:03 CEST 2021


Hi Markus,


On 23/04/2021 08:14, Markus Demleitner wrote:
> Hi Mark,
>
> On Thu, Apr 22, 2021 at 05:44:59PM +0100, Mark Taylor wrote:
>> On Thu, 22 Apr 2021, Markus Demleitner wrote:
>>>    SELECT access_url, gavo_specconv(em_min, 'keV') as en_min
>>>    FROM ivoa.obscore
>>>    WHERE 1=CONTAINS(s_region, CIRCLE(30, 20, 5))
>>>      AND gavo_specconv(1, 'keV', 'm') BETWEEN em_min and em_max
>> A function with an optional argument *in between* two mandatory arguments
>> looks ... quite surprising to me.  It's a sufficiently strange/confusing
> The "inserted optional parameter" is a pattern that is, I think,
> surprisingly common; think of python's range builtin that comes as
> range(stop) or range(start, stop[, step]).  And while this isn't
> pretty, I'd argue having it any other way would be a lot less pretty.
>
> But the question where to put the optional parameter in this
> particular UDF I'd like to defer to a different (future) thread.


I know it is still a bit off-topic, but I completely agree with Mark about this
"inserted optional parameter". It is a practice I don't really appreciate in
programming languages (e.g. Python, CSS). It makes function usage much less
explicit and therefore more complicated, and I don't want to confuse ADQL/TAP
users even more.


> Given that, I've thought about alternatives.  For instance, we could
> expand the syntax of UDF signatures (parsing these is always
> best-effort, so we wouldn't really break anything seriously).  In
> this particular case,
>
>    <form>gavo_specconv(expr DOUBLE PRECISION[, expr_unit TEXT],
>      dest_unit TEXT)</form>
>    <description>
>      In the two parameter form, this...; the third parameter...
>    </description>
>
> -- in the spirit of EBNF -- would do the trick.  But this syntax is a
> bit of a burden on clients that actually want to look at the
> signatures, and it'll become ugly when we want to express type
> polymorphism.  So, I can't say I like it much.


Personally, I quite like this. It is easily readable by both humans and
machines. It is compact enough to appear as a single UDF while still allowing
some flexibility in its arguments.
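
To illustrate how little a client would need in order to cope with such a
form, here is a minimal sketch in Python. It assumes that optional groups are
written in square brackets and are never nested; the function name
expand_optional is hypothetical and not part of any existing library.

```python
import itertools
import re

# One optional group such as "[, expr_unit TEXT]" (assumption: no nesting).
OPTIONAL = re.compile(r"\[([^\[\]]*)\]")


def expand_optional(form):
    """Return every explicit signature encoded by a form with [...] groups."""
    # re.split with one capturing group alternates literal and optional text.
    parts = OPTIONAL.split(form)
    literals, optionals = parts[0::2], parts[1::2]
    signatures = []
    # Each optional group is either dropped (False) or kept (True).
    for keep_flags in itertools.product((False, True), repeat=len(optionals)):
        text = literals[0]
        for keep, optional, literal in zip(keep_flags, optionals, literals[1:]):
            text += (optional if keep else "") + literal
        # Collapse line breaks and repeated blanks inside the form.
        signatures.append(" ".join(text.split()))
    return signatures


form = """gavo_specconv(expr DOUBLE PRECISION[, expr_unit TEXT],
     dest_unit TEXT)"""
for signature in expand_optional(form):
    print(signature)
```

With the form quoted above, this prints the two-parameter and the
three-parameter signatures, which is exactly what a client would otherwise
receive as two separate UDF declarations.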

But I see 2 possible problems:

   1/ square brackets could be a problem if, at some point in the future, ADQL
      introduces square brackets as a way to access array items, as is
      generally the case.

   2/ as you said, type polymorphism would be trickier. We could have a syntax
      like:

	my_function(expr (INTEGER|DOUBLE PRECISION))

      Though still compact, it can make the UDF signature much longer if it
      has more than one type-polymorphic argument...so it is a solution, but
      probably not the best one.
      An alternative could be to introduce special types like:
      ANY, ANY_NUMERIC, ...
About datatypes in UDF signatures: for the moment, nothing specifies which
datatypes are allowed. This is especially true in ADQL, where there is no
strong typing.

Personally, when reading such datatypes, my ADQL parser simplifies them to
"numeric", "string", "geometry" and "unknown" (a synonym of "any") in order to
check types; this lets it perform a simple query check with approximate
datatypes. That way, it does not reject a UDF inside an ADQL query just because
a float is given instead of a double; in most cases, the DBMS can apply some
automatic casting. But it is possible that the way my library currently deals
with UDFs can be improved, and if so, I would be glad to hear about better
approaches.
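
As an illustration of this kind of approximate check, here is a minimal Python
sketch. It is not the code of my library: the lists of type names and the
function names coarse_type and compatible are assumptions made up for this
example.

```python
import re

# Illustrative (assumed) lists of declared type names per coarse category.
NUMERIC = {"SMALLINT", "INTEGER", "BIGINT", "REAL", "FLOAT", "DOUBLE",
           "DOUBLE PRECISION", "ANY_NUMERIC"}
STRING = {"CHAR", "VARCHAR", "TEXT"}
GEOMETRY = {"POINT", "CIRCLE", "BOX", "POLYGON", "REGION"}


def coarse_type(declared):
    """Map a declared datatype to numeric, string, geometry or unknown."""
    name = " ".join(declared.upper().split())
    # Drop a trailing length or precision, e.g. VARCHAR(32) -> VARCHAR.
    name = re.sub(r"\s*\(.*\)$", "", name)
    if name in NUMERIC:
        return "numeric"
    if name in STRING:
        return "string"
    if name in GEOMETRY:
        return "geometry"
    return "unknown"  # anything else, including ANY, matches everything


def compatible(declared, actual):
    """Accept an argument unless both coarse types are known and differ."""
    expected, given = coarse_type(declared), coarse_type(actual)
    return "unknown" in (expected, given) or expected == given


print(compatible("DOUBLE PRECISION", "REAL"))
print(compatible("TEXT", "INTEGER"))
```

The first call accepts a float where a double is declared, leaving the actual
cast to the DBMS; the second one rejects a number where a string is declared.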

So, in conclusion, I do not know if it is a good idea to go so far with types
in UDF form declarations. An ANY_NUMERIC would probably be enough...but I admit
that it may be difficult to get rid of it if we want to change that in the
future.


>> Is that 9-way overload really something you can imagine?
> Fair question, and paging through the postgres documentation I notice
> that while optional arguments are reasonably common (though less
> common than I had estimated), they don't have a lot of type
> polymorphism (an interesting one is to_ascii on
> https://www.postgresql.org/docs/13/functions-string.html, and
> unsurprisingly this comes up with to_char,
> https://www.postgresql.org/docs/13/functions-formatting.html).


I also rather prefer having several functions to solve the "type polymorphism"
issue.

But still, I do not know how an ADQL parser should behave if datatypes are too
precise.

Then, of course, from the client/user point of view, it does not really matter,
as these UDF signatures are just informative.


> Perhaps our geometry exercises (with CIRCLE(coosys, ra, dec, radius),
> CIRCLE(ra, dec, radius), CIRCLE(center, radius) and perhaps even (I
> forget) CIRCLE(coosys, center, radius)) have made me a bit
> over-paranoid there.
>
>> Given those comments, the above option of listing the UDF multiple
>> times in multiple features elements doesn't sound like it would be
>> too burdensome in practice.
> Given the results of my informal survey of the postgres docs, I think
> I'm leaning that way, too.


+1


>>>    <feature>
>>>      <form>gavo_specconv(expr DOUBLE PRECISION, dest_unit TEXT)
>>>          -&gt; DOUBLE PRECISION</form>
>>>      <form>gavo_specconv(expr DOUBLE PRECISION, expr_unit TEXT, dest_unit TEXT)
>>>          -&gt; DOUBLE PRECISION</form>
>>>        <description>returns the spectral value expr converted to dest_unit.
>>>          expr is assumed to be given in expr_unit...
>>>        </description>
>>>    </feature>
>>>
>>> -- this would be my favourite, except it needs schema changes in
>>> TAPRegExt, and I'd not bet on how well existing capabilities parsers
>>> would cope with this.  Conversely, I think some of the other features
>>> could profit from allowing multiple form-s, too.  Hm.
>> this one seems quite reasonable.  You can also imagine listing multiple
>> UDFs with different names alongside each other in this way if it
>> was semantically convenient to combine their documentation together.
> Given this would, I think, solve this particular problem rather
> elegantly, but perhaps is solving something that's much less of a
> problem than I had initially felt: Does anyone feel multiple forms
> per feature might give us other benefits?


In a client, I assume such a thing would probably be displayed as multiple UDFs
(as in your "9-way overload" solution). So my impression is that it does not
bring anything new. Besides, if in an n-th form the UDF name is different
(because of a typo...or not), the client may raise an error, just ignore that
form, or simply interpret it as a new UDF (which brings us back to the "9-way
overload" solution).

But it's quite possible that I just missed something in my understanding of
this solution....

Cheers,
Grégory


> If not, I think for now I'd just have two gavo_specconv features and
> think again if my concerns about an inflation of UDF features
> actually substantiate.
>
> Thanks,
>
>              Markus



More information about the dal mailing list