Draft CORS guidance for an IVOA JSON protocol

Thu May 30 05:03:52 CEST 2024

Markus Demleitner via grid <grid at ivoa.net> writes:
> On Wed, May 29, 2024 at 07:15:52AM -0700, Russ Allbery via grid wrote:

>> Ah, I think I see my error: I assumed that the information available
>> from the /capabilities endpoint would also make it into the registry,
>> and it sounds like that's a bad assumption.

> No, that assumption is good.  The content of /capabilities is intended
> to be, modulo the root element, identical to the capabilities element in
> the registry record.

Ah, so the center of my confusion was that the example I was discussing
was SODA, and while it does indeed register separate sync and async
services, this is not that meaningful because use of SODA isn't really
registry-driven.

>> You can try to run them on the same URL with content negotiation, but
>> you're increasing the complexity a fair bit and my intuition is that a
>> lot of things could go wrong.  It's conceptually simpler to provide two
>> different interfaces.

> Hm... ok.  Current discovery patterns wouldn't break if you declared
> these (in both <capabilities> -- the registry record -- and
> /capabilities -- the VOSI endpoint) as separate interfaces, as in

>   <capability standardID="ivo://ivoa.net/std/sia">
>     <interface xsi:type="vs:ParamHTTP">
>       <!-- legacy form post -->
>       <accessURL>http://example.org/images/sia.xml</accessURL>
>     </interface>

>     <interface xsi:type="vs:JSONPost">
>       <!-- CSRF-hardened new-fangled thing -->
>       <accessURL>http://example.org/images/sia.xml</accessURL>
>     </interface>
>   </capability>

> I don't foresee much trouble there *if* clients have a good way
> to know when to request which type of interface.

I think you meant the URL of that second entry to be sia.json?  If so,
then yes, this sort of thing is what I had in mind.
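
A minimal Python sketch of how a client might choose between the two
interfaces in a record like the one quoted above.  Several things here are
assumptions for illustration only: vs:JSONPost is the hypothetical
interface type from this thread (it is not defined in any standard), the
sample record uses sia.json for the second accessURL, and the function
names and the "authenticated clients prefer JSONPost" policy are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xsi:type under its fully qualified attribute name.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Sample record modeled on the quoted example.  vs:JSONPost is hypothetical.
SAMPLE = """\
<capabilities xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1">
  <capability standardID="ivo://ivoa.net/std/sia">
    <interface xsi:type="vs:ParamHTTP">
      <accessURL>http://example.org/images/sia.xml</accessURL>
    </interface>
    <interface xsi:type="vs:JSONPost">
      <accessURL>http://example.org/images/sia.json</accessURL>
    </interface>
  </capability>
</capabilities>
"""


def interfaces_for(xml_text, standard_id):
    """Map each interface's xsi:type to its accessURL for one capability."""
    root = ET.fromstring(xml_text)
    found = {}
    for cap in root.iter("capability"):
        if cap.get("standardID") != standard_id:
            continue
        for iface in cap.findall("interface"):
            found[iface.get(XSI_TYPE)] = iface.findtext("accessURL", "").strip()
    return found


def pick_access_url(xml_text, standard_id, needs_auth):
    """Return (type, url) using an assumed preference order, or None."""
    found = interfaces_for(xml_text, standard_id)
    if needs_auth:
        order = ["vs:JSONPost", "vs:ParamHTTP"]
    else:
        order = ["vs:ParamHTTP", "vs:JSONPost"]
    for kind in order:
        if kind in found:
            return kind, found[kind]
    return None


print(pick_access_url(SAMPLE, "ivo://ivoa.net/std/sia", needs_auth=True))
```

The xsi:type value is compared as a plain string here; a more careful
client would resolve the "vs" prefix against the declared namespace.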

> I think this is an important observation: The whole matter only is an
> issue for services that require authentication.

Yes.  I've been trying to make that clear; this whole conversation about
CSRF is in the context of authenticated services.  Unauthenticated
services do not need to care about CSRF.

> As it happens, these are services that likely won't work with current
> clients anyway because few of them can do any auth beyond HTTP Basic,
> and if you're doing auth with HTTP Basic, you are probably not *very*
> concerned about security in the first place.

No, I don't think this is correct.  HTTP Basic Auth is equivalent, as far
as the security properties of the protocol are concerned, to bearer token
authentication where the user provides the token.  In both cases, you're
sending the authenticator in the request.

HTTP Basic Auth *with passwords* is less secure, but only because
passwords are not very secure (and also annoying to deal with for a bunch
of other reasons, which is why people generally do some variation of OAuth
or SAML these days for authentication of humans).  But there is no
requirement that the "password" field in HTTP Basic Auth be a password.
We use bearer tokens and support HTTP Basic Auth for all of our services;
the user just puts the token where the password would go.  This is no more
or less secure than RFC 6750 with a user-supplied token, and works with
clients that only support HTTP Basic Auth.  In that, we're in the same
company as other services that care a great deal about security, such as
GitHub. [1]

[1] https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api?apiVersion=2022-11-28#using-basic-authentication
    https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#using-a-personal-access-token-on-the-command-line
    I believe the latter is using HTTP Basic Auth under the hood, based on
    the Git documentation, but I'm not 100% sure.
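
To make the equivalence concrete, here is a small Python sketch that puts
the same token into either header form.  The function names, the
placeholder username, and the example token are invented for
illustration; a real service defines its own rules for what goes in the
username field.

```python
import base64


def basic_auth_header(username, secret):
    """Authorization value for HTTP Basic (RFC 7617): base64 of user:secret."""
    raw = f"{username}:{secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def bearer_auth_header(token):
    """Authorization value for a bearer token (RFC 6750)."""
    return "Bearer " + token


def secret_from_basic(header_value):
    """Recover the 'password' field from a Basic Authorization value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    _user, _, secret = decoded.partition(":")
    return secret


token = "example-token-not-real"  # hypothetical token value
# Either way, the authenticator travels in the request itself.
print(basic_auth_header("placeholder-user", token))
print(bearer_auth_header(token))
```

Base64 is an encoding, not encryption, so the server recovers exactly the
same token from either header; both forms depend on TLS for
confidentiality.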

Full OpenID Connect with device registration is more secure, but it's also
considerably more complex and requires a lot of work on both the server
and client sides.  Supporting that is the long-term plan, but I suspect
we'll continue to support HTTP Basic Auth with bearer tokens well into the
future.

> Let me suggest that this observation might point the way to a nice
> compromise between not wanting to break everything that works relatively
> nicely now and wanting to plug gaping CSRF holes in our protocols:
> Define a way to derive JSON-posting interfaces from our "normal"
> form-posting interfaces and then tell people: "If you want to do
> SSO-compliant Auth, write a vs:JSONPost interface rather than a
> vs:ParamHTTP one".

If the only goal were addressing CSRF issues, this might make sense, but I
don't think this approach would provide several of the other, more
significant motivating benefits of this work.

> Cf. the SIA/SIA2 disaster for the time frames you are looking at for a
> full transition; we're probably not halfway through for SIA2 yet.

Yes, it will take a long time.  Sooner started, sooner finished.  The
benefits for new service implementations accrue faster, since the
milestone of sufficient client support such that services no longer have
to implement the old protocols will be reached earlier.  General clients
will necessarily have a much longer transition period.

>> For now, with the existing IVOA protocols, I think the best fallback
>> CSRF protection is to require an Authorization header on all POST
>> requests, which forces them to not be simple requests.  This is still a
>> protocol

> Hm... Since the overwhelming majority of VO requests don't use Auth at
> all, this would feel a bit odd to me.

I'm sorry, I thought it was obvious from context that this was only about
services that require authentication, so I was sloppy in my wording.  Yes,
to be more precise, I think the best fallback for services that require
authentication and want to support the current simple-request POST
protocols is to require an Authorization header on all authenticated POST
requests.  In other words, do not support cookie authentication; it's the
combination of cookie authentication plus simple requests that creates the
CSRF issues.
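
The reasoning can be sketched in Python.  This is a simplified model of
the CORS "simple request" rules, not a full implementation of the Fetch
standard (which also restricts header values and treats Range
specially), and the function names are invented for illustration.  The
headers passed in are the ones the page author sets; headers the browser
adds on its own, such as Origin or User-Agent, do not count.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}
# Simplified list of CORS-safelisted request headers.
SAFELISTED_HEADERS = {
    "accept", "accept-language", "content-language", "content-type",
}


def is_simple_request(method, headers):
    """True if a cross-origin request would be sent without a preflight."""
    if method.upper() not in SIMPLE_METHODS:
        return False
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return False  # e.g. Authorization forces a preflight
        if lname == "content-type":
            mime = value.split(";", 1)[0].strip().lower()
            if mime not in SIMPLE_CONTENT_TYPES:
                return False  # e.g. application/json forces a preflight
    return True


def accept_authenticated_post(headers):
    """Assumed server policy: authenticate only from Authorization."""
    # Cookies are deliberately ignored, since the browser attaches them
    # automatically to cross-site form posts.
    return any(name.lower() == "authorization" for name in headers)


form = {"Content-Type": "application/x-www-form-urlencoded"}
print(is_simple_request("POST", form))                              # True
print(is_simple_request("POST", {**form, "Authorization": "Bearer x"}))  # False
print(accept_authenticated_post({"Cookie": "session=abc"}))         # False
```

A form post carrying only cookies is a simple request, so another site
can trigger it; once the service insists on Authorization, the request
either fails the server check or stops being simple.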

The downside to this CSRF approach is that such services cannot be
driven by HTML forms, with possibly the special exception of HTML forms
hosted in the same origin as the service and that use HTTP Basic Auth
(although that browser UI is not great and I wouldn't recommend it).

> In a system like the VO, where old services literally keep running for
> decades and (sometimes grossly cobbled-together) clients hang around on
> legacy systems for about as long, I don't think such a mechanism exists.

I think unmaintained legacy clients talking to new services is the use
case that is the most incompatible with a protocol transition.

It is not possible to do a protocol transition without eventually breaking
something, almost by definition.  Otherwise, all services (including
newly-written services) and all clients have to support all variations of
the protocol forever, or at least the oldest ones.  That's not a
transition; that's an accretion.  It provides some benefit, in that new
features can be used with new protocols, but the benefit comes mostly in
the form of additive new features, and the costs for new service
implementations are high.

The only way that I see to get the implementation cost reduction is with a
protocol transition to a protocol that is less expensive, in terms of
human resources, to implement.  The good news is that if we can do this
successfully, we free up resources that can be spent on more and better
service implementations.

> We are probably in a situation much like the HTTP folks who made sure
> that Tim Berners-Lee's original www client still interoperates with
> even the latest web servers (provided they spit out HTML rather than
> some Javascript soup, that is).

That would be an awful and depressing situation for astronomy to be in.
HTTP is a complexity train wreck; see, for example, the difficulty we've
had just puzzling out the details of CORS.  HTTP/2 and HTTP/3 are major
breaking changes to try to address some of the limitations, but they still
pay a substantial legacy cost.  Thankfully in the case of HTTP that
complexity cost is paid partly by entities with more resources than the
entirety of astronomy, so we have excellent, robust HTTP implementations
that try to hide as much of that complexity as possible.

Also, even for HTTP, I'm dubious that support of the original client is as
widespread as you claim.  It's common practice in the HTTP world to
consult the browser compatibility matrices provided on MDN and to use
features that are supported by all of the browsers that you care about.
As one very obvious example, many web sites require TLS and are simply
inaccessible to any browser that doesn't implement it.

I don't believe we're in the same situation as HTTP.  I think we have a
tractable number of major clients that would be expected to support new
services, and a tractable number of people who are writing IVOA protocol
implementations.  We can escape the complexity trap of supporting every
old client in every new service forever.

> Hmyeah... I'd have something to say on these, too, like "As long as
> the Registry uses OAI-PMH, people will need to learn XML and its
> tooling anyway."

I don't think it's all-or-nothing; incremental changes are possible,
but yes, we need to talk about which places have the most leverage, and it
will require some thought and care because the assumption that services
use XML is fairly deeply entwined into the standards.  I think one of the
big pieces of this work is figuring out how to separate the encoding from
the data model so that it's easier to add new encodings in the future.
JSON won't be the last.

> If you feel like discussing that, can I invite you to open a thread on
> this over on the Registry list?

I appreciate the invitation!  I think I have as many irons in the fire as I
can handle at the moment, so I won't take you up on that immediately, but
I have it in my queue if no one gets to it before I do.

-- 
Russ Allbery (eagle at eyrie.org)             <https://www.eyrie.org/~eagle/>