Draft CORS guidance for an IVOA JSON protocol (was: x-www-form-urlencoded prohibition)

Thu May 23 21:18:10 CEST 2024

If one isn't that familiar with CORS already, it may be hard to follow the
discussion in my previous message about how to use CORS with a
hypothetical future RESTful JSON protocol.  We also need to start drafting
guidance at some point anyway, so I may as well give it a shot now to make
this more concrete.

The following applies only to a protocol that forces browser preflight
checks by default.  Examples include protocols that require Authorization
headers or require that the Content-Type of the body be application/json.

As with anything related to CORS preflight, this does not apply to GET
unless extra headers are required, since a plain GET is sent without a
preflight check.  The standard web service guidance is to not use GET for
anything that would be risky if a malicious web site could make the
user's browser send that GET, even without being able to see the
response.

There are three basic cases:

1. The service is unauthenticated.  CSRF concerns do not apply and you
   should use an open CORS policy.  To do this, register an OPTIONS
   handler for every API route (usually one catch-all OPTIONS handler is
   sufficient if you don't have different policies per route) that
   responds with a 204 code and the following headers:

       Access-Control-Allow-Origin: *
       Access-Control-Allow-Methods: GET, POST, OPTIONS

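   You may also want to send Access-Control-Allow-Headers if your web
   service accepts headers that are not safelisted for simple requests.
   The actual (non-OPTIONS) responses need the same
   Access-Control-Allow-Origin header, or the browser will not let
   JavaScript read them; add Access-Control-Expose-Headers to those
   responses if your service returns extra headers that should be
   readable by JavaScript.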

   This is a very common use case and many web frameworks support setting
   this up for you with a few lines of code. [1]

2. The service is authenticated, but you do not believe that any of the
   operations that it supports are sufficiently risky to warrant concern
   about CSRF.  In practice, this means that the service doesn't provide
   destructive commands (or you accept the risk of someone's browser being
   tricked into issuing them), and you aren't concerned about denial of
   service attacks (which is a reasonable position for most sites).

   Similar to case 1, in this case CSRF concerns do not apply and you
   should use an open CORS policy.  You have to do this slightly
   differently from case 1 because you have to allow the browser to send
   you credentials, and browsers refuse the wildcard origin (*) on
   credentialed requests.  Standard headers in your OPTIONS response look
   like:

       Access-Control-Allow-Origin: <copy of the Origin header in request>
       Access-Control-Allow-Methods: GET, POST, OPTIONS
       Access-Control-Allow-Credentials: true

   (Yes, this means that, in the authenticated case, OPTIONS handlers
   cannot return a static response and need to vary the response based on
   the request.)  As above, you may need to add additional headers.

   As with case 1, this is a common pattern and many web frameworks will
   set this up for you, although they may want a whitelist of origins (see
   case 3b) so you may have to write a bit of code to allow any origin.

3. The service is authenticated and you don't want to allow arbitrary web
   sites to drive your service with JavaScript.  There are four common
   subcases:

   a. You do not want to allow remote web sites to drive your service, or
      at least you don't care enough about this use case to want to
      support it.  Do nothing; the browser's default policy (with no CORS
      headers at all) will work fine for you.  Other web sites won't be
      able to drive your service with client-side JavaScript, but direct
      requests from non-browser clients anywhere on the Internet (with the
      correct authentication credentials) will work without any further
      action required on your part.

   b. You want to allow a specific whitelisted set of remote web sites to
      drive your service.  In this case, add that whitelist to your
      configuration and, when you receive an incoming OPTIONS request,
      check the Origin header against that whitelist.  If it matches,
      respond as in case 2.  If it does not, reply with a 400 error.  This
      is another very common use case that is probably supported by your
      web framework.

   c. You want to allow every well-known astronomical portal to drive your
      API service, but not random sites on the Internet.  This is the case
      where some IVOA registry work would be useful.  If this turns out to
      be a common desired policy, we could provide a mechanism for sites
      to register the origins (in the specific technical sense of the
      Origin HTTP header) of their portals that support cross-site IVOA
      protocol requests, and IVOA services could retrieve this list of
      origins from the registry and configure their CORS OPTIONS responses
      to whitelist those origins using a mechanism similar to 3b.

   d. You want to allow users to whitelist any web site that they use to
      drive your API, but you don't want to allow any random web site to
      drive your API without an authenticated user explicitly whitelisting
      it.  This is the hardest case and will require a bit of effort.  You
      could, for example, provide users with a configuration screen where
      they could add origins (generally https plus the hostname) of the
      portals they use to a local database, and then the service would
      allow any origin listed in that database.  Or you could go further
      and provide some OAuth-style dynamic registration service for web
      portals to register themselves with your services as valid origins.

      A lot could be done here, but it's effort, so I would want to wait
      to see how often this use case arises before doing design work.  I
      suspect this use case may be relatively rare if we can provide a
      nice solution for case 3c.  In the initial draft guidance, I think
      the place to start is some vague guidance about giving the user a
      way to enter the hostnames of the web portals they use.
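
To make the OPTIONS logic for cases 1, 2, and 3b through 3d more
concrete, here is a minimal, framework-independent sketch in Python.  The
function preflight_headers and its parameters are hypothetical, not part
of any framework.  It assumes the allowlist for cases 3b through 3d has
already been loaded into a set (from configuration for 3b, from the
registry for 3c, or from the database of user-entered origins for 3d);
how to obtain that set is left out.  Case 3a means not calling this at
all.

```python
from typing import Dict, Mapping, Optional, Set, Tuple

# Methods accepted by the hypothetical API, as in the example headers above.
ALLOWED_METHODS = "GET, POST, OPTIONS"


def preflight_headers(
    request_headers: Mapping[str, str],
    authenticated: bool,
    allowed_origins: Optional[Set[str]] = None,
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a CORS preflight (OPTIONS) request.

    authenticated=False               -> case 1: open, uncredentialed policy
    authenticated=True, no allowlist  -> case 2: reflect any origin
    authenticated=True, allowlist     -> cases 3b-3d: reflect listed origins
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    origin = headers.get("origin")

    if not authenticated:
        # Case 1: a static response is fine because no credentials are allowed.
        return 204, {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Methods": ALLOWED_METHODS,
        }

    # Browsers do not send credentials on the preflight itself, so for
    # case 3d allowed_origins would be every origin entered by any user.
    if origin is None or (
        allowed_origins is not None and origin not in allowed_origins
    ):
        # Not an allowed cross-origin caller: reply with no CORS headers.
        return 400, {}

    # Cases 2 and 3b-3d: "*" is not allowed with credentials, so echo the
    # Origin back, and tell caches that the response depends on it.
    return 204, {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ALLOWED_METHODS,
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }
```

For example, with allowed_origins={"https://portal.example.org"} (a
made-up portal), a preflight from that origin gets a 204 echoing it,
while any other origin gets a 400 with no CORS headers.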

For the case of x-www-form-urlencoded POST, cases 1 and 2 are the default
behavior (the browser sends such requests cross-site without a preflight
check), provided that the site doesn't expect or send extra HTTP headers
and that cookies are used for authentication.  If the API
requires any headers that aren't whitelisted for simple requests
(Authorization is the most common), the browser will do CORS preflight and
the service will still have to implement the above logic.
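
The browser's decision about whether to send a preflight can be
approximated as follows.  This is a simplified Python sketch of the rules
in the WHATWG Fetch standard; it omits details such as the limits on
header value length, and needs_preflight is a hypothetical helper, not a
browser or library API.  The headers passed in are the ones the page's
JavaScript sets.  Headers the browser adds itself, such as Cookie, do not
count, which is why cookie authentication alone does not force a
preflight.

```python
from typing import Mapping

# CORS-safelisted methods and request headers, simplified from the Fetch
# standard (value-length limits and other details are omitted).
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, script_headers: Mapping[str, str]) -> bool:
    """Approximate whether a browser will send a CORS preflight.

    script_headers are the headers set by the page's JavaScript; headers
    the browser adds itself (Cookie, User-Agent, Origin, ...) are excluded.
    """
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in script_headers.items():
        lowered = name.lower()
        if lowered not in SAFELISTED_HEADERS:
            # For example, Authorization always forces a preflight.
            return True
        if lowered == "content-type":
            # Compare only the MIME type, ignoring parameters like charset.
            mime = value.split(";", 1)[0].strip().lower()
            if mime not in SIMPLE_CONTENT_TYPES:
                return True
    return False
```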

If case 3 applies but the service also needs to support cookie
authentication without any extra HTTP headers, then one of the other CSRF
mitigation strategies must be used.  This is the case that I think is
worth documenting in a standardization of the x-www-form-urlencoded
network encoding.  The obvious thing to document is probably some version
of the synchronizer token pattern [2], although in some cases it may be
possible to get away with using SameSite=Strict cookies.  (SameSite=Strict
cookies have some significant limitations, however, which would deserve a
longer discussion.)  Documenting this as part of the standard would allow
use of an authenticated, CSRF-protected x-www-form-urlencoded POST API by
non-browser clients such as PyVO or TOPCAT.
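
As a rough illustration of what the synchronizer token pattern involves,
here is a minimal Python sketch using only the standard library.  The
in-memory store, the function names, and the csrf_token form field are
all hypothetical.  A real service would keep the token with its existing
session data, and the standard would still need to say how a non-browser
client obtains the token.

```python
import hmac
import secrets
from typing import Dict
from urllib.parse import parse_qsl

# Hypothetical in-memory store mapping session ID -> CSRF token.
_csrf_tokens: Dict[str, str] = {}

# Hypothetical name of the form field carrying the token.
TOKEN_FIELD = "csrf_token"


def issue_csrf_token(session_id: str) -> str:
    """Return the session's CSRF token, creating it on first use.

    The service embeds this in its HTML forms (for example as a hidden
    field), or hands it to non-browser clients by some documented means.
    """
    token = _csrf_tokens.get(session_id)
    if token is None:
        token = secrets.token_urlsafe(32)
        _csrf_tokens[session_id] = token
    return token


def check_csrf_token(session_id: str, body: str) -> bool:
    """Check the token in an x-www-form-urlencoded POST body."""
    expected = _csrf_tokens.get(session_id)
    submitted = dict(parse_qsl(body)).get(TOKEN_FIELD)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison so the check does not leak the token via timing.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The protection comes from the fact that a cross-site attacker can make
the browser send the session cookie but cannot read the page containing
the token, so it cannot supply a matching csrf_token field.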

One final note: CORS protection does not apply to same-origin requests.
Any JavaScript running within the same origin as your API service (in
practice, this means scheme, hostname, and port) can make authenticated
requests to your API service without regard to your CORS policy.
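
For illustration, here is a simplified Python sketch of how two URLs are
compared by origin.  The helper names are hypothetical, and the sketch
ignores details that browsers also handle, such as internationalized
hostnames and opaque origins.  The default port for the scheme is filled
in, so an explicit :443 does not create a separate origin, but a
different scheme, port, or hostname (including a different subdomain)
does.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports used when the URL does not give one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, hostname, port) triple that defines an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """True if the two URLs share scheme, hostname, and port."""
    return origin_of(a) == origin_of(b)
```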

It is possible to try to add mitigations if you have untrusted sites in
the same origin, but this is strongly discouraged by all standard web
security advice.  The browser JavaScript security boundary is the origin;
if you have web sites with different trust levels, they should run in
different origins.  This is the advice we should give in IVOA standards.

This means that many cases that superficially appear to be case 3a turn
out to be case 3b, because there's often a reason to serve the web UI from
a different origin from the API.

[1] For FastAPI, for example, see
    https://fastapi.tiangolo.com/tutorial/cors/#use-corsmiddleware.  The
    documentation discourages case 2 for the normal web security reasons,
    but it does support case 2 with no additional code by using
    allow_origin_regex=r".*" together with allow_credentials=True.

[2] https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html#synchronizer-token-pattern

-- 
Russ Allbery (eagle at eyrie.org)             <https://www.eyrie.org/~eagle/>