[p3t] A thought on DataLink in a JSON-based protocol

Fri May 31 19:53:46 CEST 2024

Russ Allbery via p3t <p3t at ivoa.net> writes:

> This does not (at all) solve the problem of how to then embed these
> things in VOTables, since OpenAPI does not (so far as I know) have an
> XML representation for a schema.  I think only JSON and YAML are
> defined.  So that's another obvious drawback that I'm not sure how to
> approach.

I went for a walk and thought about this some more, and I think I have an
idea.  This will at least make it obvious what sort of complexity would be
involved in going this route.

Currently service descriptors define two separate things: a simple schema
or standard protocol reference for an associated service, and a set of
instructions for how to set parameters to that service from either static
values or columns in the VOTable.

I have no desire to specify an XML serialization of an OpenAPI schema.
That doesn't sound like a good time.  But what if we separated those two
things?  A service descriptor would then be a link to a schema
(specifically, in the OpenAPI context, to an operation specified by that
schema), and a set of instructions for how to set input values.  Those
instructions would be an identifier for the input parameter, which would
be a string whose meaning is determined by the type of schema, and the
value, which, similar to today, would be either a static value or a
reference to a column in the associated VOTable.

We would continue to define an XML serialization of this service
descriptor, as well as a JSON serialization (and any future protocol
serialization).  The JSON version of a service descriptor would look
something like this:

    {
      "schema": {
        "type": "openapi",
        "url": "https://example.com/service/openapi.json#/<json-pointer>"
      },
      "inputParams": [
        {
          "parameter": "<json-pointer>",
          "value": "static-value"
        },
        {
          "body": "<json-pointer? json-path? see below>",
          "ref": "primaryID"
        }
      ]
    }

(This is very incomplete; more fields would be needed.)
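
To make the intended client behavior concrete, here is a rough Python sketch
of turning such a descriptor plus one table row into a set of input values.
The JSON-Pointer strings, the row contents, and the resolve_inputs helper are
all hypothetical and only there for illustration; the row is modeled as a
plain dict keyed by column ID rather than a real VOTable.

```python
# Hypothetical descriptor following the shape sketched above. The pointer
# strings are made up for illustration and are not part of any standard.
descriptor = {
    "schema": {
        "type": "openapi",
        "url": "https://example.com/service/openapi.json#/paths/~1cutout/get",
    },
    "inputParams": [
        {"parameter": "/parameters/0", "value": "static-value"},
        {"body": "/target/id", "ref": "primaryID"},
    ],
}


def resolve_inputs(descriptor, row):
    """Map each input identifier to a concrete value for one table row.

    `row` is assumed to be a dict keyed by column ID.
    """
    resolved = {}
    for param in descriptor["inputParams"]:
        # The identifier lives under "parameter" or "body".
        kind = "parameter" if "parameter" in param else "body"
        identifier = param[kind]
        if "value" in param:
            # Static value supplied directly by the descriptor.
            value = param["value"]
        else:
            # Reference to a column in the associated table.
            value = row[param["ref"]]
        resolved[(kind, identifier)] = value
    return resolved


row = {"primaryID": "ivo://example/obs/42"}
inputs = resolve_inputs(descriptor, row)
```

The result is keyed by (kind, identifier) so that a later step can decide
whether a value belongs in a parameter or in the request body.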

The end of the URL is a JSON-Pointer to the operation in the schema.  This
means that in the case of web services that publish their own OpenAPI
schemas, the service descriptor can just point to the OpenAPI schema
published by the service with the appropriate JSON-Pointer to reference
the specific intended operation.  (An "operation" is basically a path plus
an HTTP verb.)
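
As a minimal sketch of what a client would do with that URL, the following
Python splits off the fragment and resolves it as a JSON-Pointer (RFC 6901)
against an already-parsed schema.  The /cutout path, the operationId, and the
inline schema are hypothetical stand-ins for whatever the service actually
publishes; a real client would fetch the document from the schema URL.

```python
from urllib.parse import unquote, urldefrag


def split_schema_url(url):
    """Split a schema URL into the document URL and the JSON-Pointer."""
    schema_url, fragment = urldefrag(url)
    # A pointer carried in a URI fragment may be percent-encoded.
    return schema_url, unquote(fragment)


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON-Pointer against a parsed JSON document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON-Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escaping: "~1" means "/" and "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical, heavily trimmed OpenAPI document.
openapi = {
    "openapi": "3.1.0",
    "paths": {
        "/cutout": {
            "get": {
                "operationId": "cutout",
                "parameters": [
                    {"name": "id", "in": "query", "schema": {"type": "string"}}
                ],
            }
        }
    },
}

schema_url, pointer = split_schema_url(
    "https://example.com/service/openapi.json#/paths/~1cutout/get"
)
operation = resolve_pointer(openapi, pointer)
```

Note that the "/" in the path has to be escaped as "~1" inside the pointer,
which is one of the small annoyances of pointing at OpenAPI paths this way.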

Input parameters then reference either an OpenAPI parameter or a field in
the body of the request.  Referencing parameters is easy, since they're
right there in the schema so this can just be another JSON-Pointer
relative to the operation.  The body is a bit harder, since (to support
content negotiation) it's defined in the OpenAPI schema as a content type
mapped to a schema, and then the schema defines the nested structure.  I'm
not sure if we want a JSON-Pointer to the schema definition or (for JSON
network serializations) a JSON path within the body.
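
The difference between the two options for the body is easier to see with an
example.  This is only a sketch: the operation, its field names, and the
helper functions are hypothetical, and the "path within the body" is written
in JSON-Pointer syntax purely for simplicity, not because that syntax has
been chosen.

```python
def tokens(pointer):
    """Split an RFC 6901 JSON-Pointer into unescaped reference tokens."""
    return [
        t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]
    ]


def get_by_pointer(document, pointer):
    """Look up the value a pointer refers to."""
    for token in tokens(pointer):
        if isinstance(document, list):
            document = document[int(token)]
        else:
            document = document[token]
    return document


def set_in_body(body, pointer, value):
    """Set a value in a request body, creating intermediate objects."""
    *parents, last = tokens(pointer)
    for token in parents:
        body = body.setdefault(token, {})
    body[last] = value


# Hypothetical operation, as it would appear in an OpenAPI schema.
operation = {
    "parameters": [
        {"name": "format", "in": "query", "schema": {"type": "string"}}
    ],
    "requestBody": {
        "content": {
            "application/json": {
                "schema": {
                    "type": "object",
                    "properties": {
                        "target": {
                            "type": "object",
                            "properties": {"id": {"type": "string"}},
                        }
                    },
                }
            }
        }
    },
}

# Parameters: a pointer relative to the operation.
param = get_by_pointer(operation, "/parameters/0")

# Body, option A: a pointer to the field's definition in the schema.
field_schema = get_by_pointer(
    operation,
    "/requestBody/content/application~1json/schema"
    "/properties/target/properties/id",
)

# Body, option B: a path within the request body itself.
body = {}
set_in_body(body, "/target/id", "ivo://example/obs/42")
```

Option A gives the client the field's type information directly but bakes
the content type into the identifier; option B is much shorter but leaves the
client to find the matching schema definition on its own.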

Advantages:

1. We're leaning into OpenAPI already, at least for now, and we can tag
   the type of schema to give us the flexibility to use other types of
   schemas in the future (AsyncAPI or whatever).

2. Being able to point directly to the OpenAPI schema provided by the
   service is a nice reduction in duplicated information.  Right now, our
   DataLink services have to first write the service, which creates an
   OpenAPI schema, and then separately also define the same information in
   a DataLink snippet, which is annoying and involves two sources of data
   that can get out of sync.

3. This gives us the ability to handle any HTTP verb and any format that
   can be defined by OpenAPI, or any similar specification standard we
   adopt in the future.  This is way more power than the current service
   descriptor syntax has.

Disadvantages:

1. Parsing the semantics of an entire OpenAPI schema is way more work than
   clients are going to want to do.  I think we would need to be able to
   define a really narrow subset of information in the schema that clients
   would need to look at and allow clients to ignore everything else, and
   we're still paying some complexity cost because there probably aren't
   great OpenAPI schema client libraries for doing this sort of operation.
   And we do want the client to parse the schema in order to get things
   like a list of allowable values, minimum and maximum values, etc.  Most
   of the JSON Schema libraries I'm aware of are focused on validation of
   input data, not on parsing the schema for semantic information that one
   might want to use to, for instance, build an HTML form.

2. Related, even finding the schema is going to be complicated,
   particularly for a request body.  I'm not sure if a pointer into the
   schema is the right approach, because the schema can be a reference to
   a different part of the OpenAPI schema, which in turn can reference
   other parts of the OpenAPI schema.

3. I'm not sure how good the library support for JSON-Pointers is.  It
   does look like there are a few good Python libraries out there.

4. If we want the specification of the service to continue to contain IVOA
   type information (and I think we do), we will need to define an OpenAPI
   extension that carries that information.  I suspect we'll want to do
   that anyway, because I suspect we'll want to add IVOA type information
   to all of our schemas, but I'm not sure how hard that's going to be.
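
To give a sense of the narrow subset a client might need, here is a rough
Python sketch that follows a chain of local "$ref" references and pulls out
only the fields useful for building a form.  The schema contents, the
form_hints helper, the choice of keys, and the "x-ivoa-ucd" extension name
are all hypothetical; OpenAPI only requires that extension fields start with
"x-".  Remote references and percent-encoded fragments are not handled.

```python
def resolve_ref(root, schema, max_depth=32):
    """Follow a chain of local "$ref" pointers such as "#/components/..."."""
    for _ in range(max_depth):
        ref = schema.get("$ref")
        if ref is None:
            return schema
        if not ref.startswith("#/"):
            raise ValueError(f"only local references are handled: {ref!r}")
        # Walk the pointer from the root of the OpenAPI document.
        schema = root
        for token in ref[2:].split("/"):
            schema = schema[token.replace("~1", "/").replace("~0", "~")]
    raise ValueError("reference chain too deep or circular")


# Hypothetical subset of schema keys a form-building client cares about.
FORM_KEYS = ("type", "enum", "minimum", "maximum", "default", "description")


def form_hints(root, schema):
    """Extract form-relevant fields, ignoring everything else."""
    resolved = resolve_ref(root, schema)
    hints = {key: resolved[key] for key in FORM_KEYS if key in resolved}
    # Carry along extension fields, where IVOA type information might live.
    hints.update({k: v for k, v in resolved.items() if k.startswith("x-")})
    return hints


# Hypothetical schema fragment with two levels of indirection.
root = {
    "components": {
        "schemas": {
            "Band": {"$ref": "#/components/schemas/BandName"},
            "BandName": {
                "type": "string",
                "enum": ["g", "r", "i"],
                "x-ivoa-ucd": "instr.bandpass",  # made-up extension name
                "pattern": "^[a-z]$",  # ignored by form_hints
            },
        }
    }
}

hints = form_hints(root, {"$ref": "#/components/schemas/Band"})
```

Even this much glosses over composition keywords such as allOf and oneOf,
which is part of why the subset would need to be specified carefully.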

-- 
Russ Allbery (eagle at eyrie.org)             <https://www.eyrie.org/~eagle/>