DataLink 1.1 : step towards PR+RFC ????

Mon Jan 9 11:13:23 CET 2023

Dear DAL, Dear François,

On Fri, Jan 06, 2023 at 06:49:21PM +0100, BONNAREL FRANCOIS wrote:
>        It would be nice to have the RFC starting before next interop. I think
> it's mature enough.
> 
>       Thoughts ?

Sure, let's go ahead.  Except...  there's one skeleton in the closet
that recently came up again, and perhaps we can still somehow bury
it before RFC.

The problem is the following text:

  Unless the incoming request included a RESPONSEFORMAT parameter
  requesting a different format, the content-type header of the
  response MUST be ``application/x-votable+xml'' with the ``content''
  parameter set to ``datalink'', with the canonical form given in
  \ref{sec:mime} strongly recommended.

The purpose of this language is that clients can (relatively) easily
work out that they are dealing with a Datalink document regardless of
where they get it from (as long as it's http).  I think that's a good
idea, although I'm not aware of a client that actually looks at
content-type when retrieving things that could be datalink documents.

But at the same time this is blocking an important use case:
Displaying datalink documents in the browser (Background:
http://mail.ivoa.net/pipermail/dal/2021-April/008426.html and
https://github.com/msdemlei/datalink-xslt).  When I wrote the XSLT
for that in ~2016, I planned it as a temporary hack until there are
good datalink clients, but now I think letting people open datalinks
with the browser and getting something actually usable is a major use
case in itself. 

The trouble with this: Web browsers will not apply the XSLT to
documents with a media type of
application/x-votable+xml;content=datalink.  I have to give them
text/xml to start the whole magic.

I hence at the moment have the choice of violating the standard or
breaking a use case important to me.  I weaseled around that first by
inspecting user agent strings and only returning text/xml if the user
agent looked as if I was dealing with a web browser, praying nobody would
notice.  But that broke rather quickly (I forget the details), and I
switched to inspecting the accept header.  If I find a text/html in
there, I return text/xml (yeah, it's that twisted), otherwise I'm
compliant with the datalink spec.
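
For illustration, the Accept-based dispatch just described could be
sketched in Python roughly as follows.  This is not the actual server
code; the function names are made up for this example, and the parsing
deliberately ignores q-values and wildcards.

```python
# Media type as written earlier in this mail.
DATALINK_TYPE = "application/x-votable+xml;content=datalink"


def accepted_media_types(accept_header):
    """Return the lowercased media types listed in an Accept header.

    Parameters such as q-values are dropped; a missing header yields
    an empty list.
    """
    types = []
    for item in (accept_header or "").split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def current_content_type(accept_header):
    """Pick the response content-type the way the workaround does."""
    # Anything that claims to accept text/html is treated as a browser
    # and gets text/xml so that the XSLT is applied.
    if "text/html" in accepted_media_types(accept_header):
        return "text/xml"
    # Everything else gets the media type the spec requires.
    return DATALINK_TYPE
```

Any library that silently adds text/html to its Accept header ends up
in the first branch, which is exactly how programmatic clients get
caught by this.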

But it's still a violation of the standard.  I had hoped programmatic
use would not be impacted, but it turns out that, for instance, the
JVMs earlier than 11 actually indicate acceptance of text/html, too.
Sigh.

So... it's trouble, and I have not found any solution that doesn't
make me cringe.  But I increasingly have the impression that ignoring
the problem will only make matters worse.

The least horrible proposal I have would be to replace the text
quoted above with:

  When a datalink service returns a datalink VOTable (i.e., absent a
  RESPONSEFORMAT parameter requesting something else), it MUST
  indicate that in the response's content-type header.  When the
  request's accept header includes ``application/x-votable+xml'',
  then it MUST be ``application/x-votable+xml'' with the ``content''
  parameter set to ``datalink'', with the canonical form given in
  \ref{sec:mime} strongly recommended.  Otherwise, any legal VOTable
  media type, including text/xml, is allowed.

That is: clients wishing to do dispatch based on the datalink media
type must indicate that they accept VOTable.  It's a pretty safe bet
that major browsers won't do that (and potential future VO-enabled
browsers wouldn't need the XSLT, I'm sure).  And although HTTP
content negotiation isn't as popular as it should be, I think it's
implementationally not very intrusive.
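
To give an idea of how little code that is on either side, here is a
rough Python sketch of the proposal.  The function names and the
example.org URL are hypothetical, text/xml is just one of the fallback
types the proposed text would allow, and the Accept parsing is a
simplification (no q-values, no wildcards) rather than full HTTP
content negotiation.

```python
import urllib.request

VOTABLE_TYPE = "application/x-votable+xml"
DATALINK_TYPE = "application/x-votable+xml;content=datalink"


def accepts(accept_header, media_type):
    """True if media_type is literally listed in the Accept header.

    Parameters are ignored, so a client listing
    application/x-votable+xml;content=datalink also counts.
    """
    listed = {item.split(";")[0].strip().lower()
              for item in (accept_header or "").split(",")}
    return media_type in listed


def option_one_content_type(accept_header):
    """Server side: choose the content-type under the proposed text."""
    if accepts(accept_header, VOTABLE_TYPE):
        # The client declared VOTable support: the MUST applies.
        return DATALINK_TYPE
    # Otherwise any legal VOTable media type is allowed; text/xml
    # keeps the browser XSLT working.
    return "text/xml"


# Client side: a client that wants to dispatch on the datalink media
# type only has to declare that it accepts VOTable.  Building the
# request object does not touch the network.
request = urllib.request.Request(
    "http://example.org/datalink?ID=ivo://example/ds1",  # hypothetical
    headers={"Accept": VOTABLE_TYPE})
```

With this, a request carrying a browser-style Accept header, or none
at all, falls through to text/xml, while a client that sets the header
as above gets the datalink media type.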

The only alternative I could come up with would be to codify what I'm
currently doing:

  Unless the incoming request included a RESPONSEFORMAT parameter
  requesting a different format, and unless the user agent indicates
  it will accept text/html, the content-type header of the response
  MUST be ``application/x-votable+xml'' with the ``content''
  parameter set to ``datalink'', with the canonical form given in
  \ref{sec:mime} strongly recommended.

We could then have a footnote explaining what the text/html exception
is supposed to do.  The downside here is that it's really an ugly
hack to return text/xml when accept has text/html, and there's too
much library code that wantonly sticks text/html into accept behind
the programmers' backs.

Given that the media type hasn't seen much use so far, and that any
client wanting to use it would be new code anyway, I'd go for option
one.

But if anyone had a less painful idea, that'd be even better.  Does
anyone?

         -- Markus