MAXREC in Datalink

Mon Mar 30 14:13:34 CEST 2020

Markus et al.,

Initial thoughts (I reserve the right to change my mind if somebody
comes up with good arguments to the contrary):

On Mon, 30 Mar 2020, Markus Demleitner wrote:

> After reading
> 
>   If the client submits more ID values than a service is prepared to
>   process, the service should process ID values up to the limit and
>   must include an overflow indicator in the output as described in
>   DALI. The service must not truncate the output within the set of
>   rows (links) for a single ID value if the request exceeds such an
>   input limit.
> 
> in the current master branch of the datalink 1.1 draft, I somewhat
> hotheadedly assumed that Datalink was supposed to support MAXREC and
> put such code into DaCHS.  Only when I noticed that my current
> implementation didn't do any QUERY_STATUS at all, I started to
> wonder, and, sure enough, Datalink 1.0 didn't say anything about
> QUERY_STATUS, and Datalink 1.1 doesn't say anything about MAXREC.
> 
> Now, I'd say there's not terribly much merit in supporting MAXREC on
> datalink services (DaCHS will do it anyway, now that I've already put
> it in) -- I can't see much of a scenario there, at least if we agree
> that a Datalink service shouldn't return more than a couple of hundred
> links per ID, and I think we should do that for UI considerations.

That sounds fairly persuasive.  There may be some weird cases in
which a datalink service has a large number of links per ID,
but as you say that is not to be encouraged, and since such cases
ought to be rare, missing the non-essential MAXREC feature is not
much of a problem.
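
For concreteness, the input-limit rule in the draft text quoted above
(process IDs up to a limit, never cut within the links of a single ID,
flag the overflow) might look roughly like this on the service side.
This is only a sketch: links_with_id_limit, lookup_links and max_ids
are made-up names for illustration, not taken from the draft or from
any implementation.

```python
def links_with_id_limit(ids, lookup_links, max_ids):
    """Collect link rows for at most max_ids IDs.

    Returns (rows, overflow).  overflow is True when some IDs were
    dropped, in which case the response would carry the
    QUERY_STATUS=OVERFLOW INFO element.
    """
    accepted = ids[:max_ids]
    overflow = len(ids) > max_ids
    rows = []
    for id_ in accepted:
        # Always emit the complete set of links for an accepted ID;
        # truncation happens only at ID boundaries, never inside one.
        rows.extend(lookup_links(id_))
    return rows, overflow
```

Note that the limit applies to the number of IDs, not to the number of
output rows, which is why a row-based MAXREC does not fit in naturally.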

> But if we say that we should have a 
> 
>   <INFO name="QUERY_STATUS" value="OVERFLOW"/> 
> 
> (which the above passage does), I'd say we should also say services
> must produce
> 
>   <INFO name="QUERY_STATUS" value="OK"/> 
> 
> as per DALI 1.2.  Of course, that would make datalink 1.0 services
> nonvalid for 1.1, which I think we're not allowed to declare.
> 
> So, perhaps we should say they "should" produce the query status, and
> that it defaults to QUERY_STATUS=OK?  Perhaps that defaulting would
> even be a good addition to DALI?

QUERY_STATUS=OK doesn't really do any work; if I were writing DALI from
scratch I don't think I'd put it in.  I can't imagine clients
(apart from validators) changing behaviour if they do or don't find
a QUERY_STATUS=OK.  If an overflow or error is flagged then deal with it,
if not, assume normality.  This is kind of underlined by the provision
in DALI that you have to ignore an earlier QUERY_STATUS=OK if a later
QUERY_STATUS appears saying something different.
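
A minimal client-side sketch of that logic, assuming the response is a
VOTable document held in a string: take the last QUERY_STATUS INFO
found in document order, and fall back to OK when there is none.  The
function name effective_query_status is invented for illustration, the
OK default is the suggestion under discussion rather than something
DALI mandates, and for brevity the sketch does not check that the INFO
sits inside the results RESOURCE.

```python
import xml.etree.ElementTree as ET


def effective_query_status(votable_xml: str) -> str:
    """Return the last QUERY_STATUS value, or "OK" if none is present."""
    root = ET.fromstring(votable_xml)
    status = "OK"  # assumed default when no QUERY_STATUS INFO is found
    for elem in root.iter():  # iterates in document order
        # Drop any "{namespace}" prefix so this works with or without xmlns.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "INFO" and elem.get("name") == "QUERY_STATUS":
            # A later status overrides an earlier one.
            status = elem.get("value", status)
    return status
```

A client would then only need to special-case the non-OK values, for
example warning the user when the result is "OVERFLOW".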

> But when we do that, the next question is if there's any way a
> datalink QUERY_STATUS could ever be ERROR, given that at least many
> errors are reported in-table.  Is there a point for really serious
> errors ("No database connectivity") being reported in this way
> (rather than just throwing an HTTP 500)?  While I'd say consistency
> is nice, that would raise the number of error types a client will
> have to watch out for to three (HTTP-Level, global INFO, per-row
> faults).  Hm.

My client code usually has enough error-handling code in it without
wanting to lobby for more conditions.  As you say, HTTP-level errors
have to be caught anyway, so unless there's some pressing concrete
use case for flagging an error within the document, I would not
support this.

So I'd say: just use QUERY_STATUS=OVERFLOW and not the others.
Agreed, that may be a bit inconsistent with other standards, but
personally I'm not too bothered.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/