[p3t] A thought on DataLink in a JSON-based protocol

Fri May 31 17:22:32 CEST 2024

One of the thought-provoking ideas that came out of my long conversation
with Markus is what DataLink would look like in a new protocol world.

One obvious question is whether the links response should be a VOTable.
Replacing usage of VOTable for astronomy data is not one of our goals, but
I think we should look at whether it makes sense in the places where it's
used as a wire encoding for protocol elements that aren't actual data.
The links response feels more like a protocol element than astronomy data,
so I would lean towards defining the semantics of a DataLink response and
supporting encoding it in JSON.  That looks like it would be reasonably
straightforward for the main links response.
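
To make that concrete, here is a rough sketch of what a JSON links
response could look like.  The field names are the existing DataLink links
table columns; the JSON layout (a top-level "links" array, with null
fields omitted) and the example URLs are purely my assumptions, not
anything that has been specified.

```python
import json


def links_to_json(rows):
    """Serialize link rows (dicts) to JSON, dropping fields that are null.

    The {"links": [...]} envelope is an assumed layout for illustration.
    """
    links = [{k: v for k, v in row.items() if v is not None} for row in rows]
    return json.dumps({"links": links}, indent=2)


# One row using the column names from the DataLink links table.
# The identifier and URL are made-up example values.
example_rows = [
    {
        "ID": "ivo://example.org/data?obs-1",
        "access_url": "https://example.org/data/obs-1.fits",
        "service_def": None,
        "error_message": None,
        "description": "Primary image",
        "semantics": "#this",
        "content_type": "application/fits",
        "content_length": 1048576,
    },
]

print(links_to_json(example_rows))
```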

The service descriptors are the more interesting part, because they are
both returned in the links response and intended to be embedded in
VOTables alongside the data.  I think we had an earlier round of
discussion of this, and there are essentially three options:

1. Embed something other than XML, such as a literal JSON blob, inside the
   VOTable.

2. Define both XML and JSON serializations of the service descriptor.

3. Stick with only XML for service descriptors.

The thing with service descriptors, though, is that the current language
for specifying the parameters that can be passed into the service API is
*very* restrictive and only useful for simple cases.  It essentially can
only characterize a service that takes key/value pairs as input, and for
ad hoc services it only supports GET.  (It does have a vitally important
facility for specifying a data column that should be used as the source of
an input parameter.)  It also can't specify any sort of structure, which
we would want for the general case of services that use JSON for their
input parameters.

Here's my radical thought (and feel free to tell me that this would be too
much): what if service descriptors could be OpenAPI schemas?

The point of a service descriptor is to define an API that the client can
call and to tell the client exactly what input parameters are allowed.
This sounds familiar!  We have a whole language for defining APIs that
other people developed and that has well-known and well-understood
semantics.  We have additional information that we need to embed in that
schema for IVOA purposes (the reference to the field that should be used
to populate that parameter, the more-specific IVOA typing information),
but OpenAPI supports extensions via x-* fields, so we can add that in
additional x-ivoa-* fields.  And potentially a service descriptor for a
standard service could just use an external reference to the canonical
OpenAPI schema for that service, although we would have to think about how
to convey the field references when doing so.
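
As a rough sketch of what I have in mind, here is a service descriptor
written as an OpenAPI document (as a Python dict), plus the client-side
step of filling in parameters from a data row.  The extension names
x-ivoa-ref, x-ivoa-unit, and x-ivoa-ucd, the /cutout path, and the column
names are all hypothetical; nothing here is defined by any standard yet.

```python
# Hypothetical service descriptor expressed as an OpenAPI document.
descriptor = {
    "openapi": "3.1.0",
    "info": {"title": "Example cutout service", "version": "1.0"},
    "paths": {
        "/cutout": {
            "get": {
                "parameters": [
                    {
                        "name": "id",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string"},
                        # Assumed extension: column that supplies the value.
                        "x-ivoa-ref": "obs_publisher_did",
                    },
                    {
                        "name": "radius",
                        "in": "query",
                        "schema": {"type": "number", "minimum": 0},
                        # Assumed extensions: IVOA-specific typing.
                        "x-ivoa-unit": "deg",
                        "x-ivoa-ucd": "phys.angSize",
                    },
                ]
            }
        }
    },
}


def bind_parameters(operation, row):
    """Fill every parameter carrying x-ivoa-ref from the named row column."""
    bound = {}
    for param in operation.get("parameters", []):
        column = param.get("x-ivoa-ref")
        if column is not None and column in row:
            bound[param["name"]] = row[column]
    return bound


row = {"obs_publisher_did": "ivo://example.org/data?obs-1", "s_ra": 10.5}
print(bind_parameters(descriptor["paths"]["/cutout"]["get"], row))
```

Parameters without a field reference (radius here) are left for the user
or client to supply.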

I think the most obvious drawback is that OpenAPI schemas are pretty
complicated and a lot of uses of DataLink in clients are pretty simple.
We would at the very least have to think about sharply restricting what
features of OpenAPI schemas could be used in service descriptors:
essentially, define a subset of OpenAPI that would be tractable for
clients to implement.  But this would replace all of the ad hoc,
astronomy-specific logic in DataLink about how to specify field
constraints, enumerations, field descriptions, etc. with a standard and
well-tested API description mechanism that could handle nested structure,
other REST verbs besides GET, and all sorts of other features that may be
useful in the future.
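
To illustrate what "a subset of OpenAPI" might mean in practice, here is a
minimal sketch of a checker that reports anything outside an allowed
profile.  The particular allowed methods and schema keywords are
placeholders I picked for the example, not a proposal, and the sketch only
looks at parameters, not request bodies.

```python
# All HTTP methods that OpenAPI path items can define.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

# Hypothetical restricted profile for service descriptors.
ALLOWED_METHODS = {"get", "post"}
ALLOWED_SCHEMA_KEYS = {
    "type", "enum", "minimum", "maximum", "default",
    "description", "properties", "items", "required",
}


def check_schema(schema, where, problems):
    """Record schema keywords outside the profile, recursing into children."""
    for key in schema:
        if key.startswith("x-"):
            continue  # extension fields are always permitted
        if key not in ALLOWED_SCHEMA_KEYS:
            problems.append(f"{where}: schema keyword {key}")
    for sub in schema.get("properties", {}).values():
        check_schema(sub, where, problems)
    if isinstance(schema.get("items"), dict):
        check_schema(schema["items"], where, problems)


def unsupported_features(doc):
    """Return a list of features in doc that fall outside the profile."""
    problems = []
    for path, item in doc.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # summary, shared parameters, etc.
            if method not in ALLOWED_METHODS:
                problems.append(f"{path}: method {method}")
                continue
            for param in operation.get("parameters", []):
                where = f"{path} {param.get('name')}"
                check_schema(param.get("schema", {}), where, problems)
    return problems


example = {
    "openapi": "3.1.0",
    "info": {"title": "Example", "version": "1.0"},
    "paths": {
        "/cutout": {
            "get": {
                "parameters": [
                    {
                        "name": "pos",
                        "in": "query",
                        "schema": {"oneOf": [{"type": "string"}]},
                    }
                ]
            },
            "delete": {},
        }
    },
}

for problem in unsupported_features(example):
    print(problem)
```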

This does not (at all) solve the problem of how to then embed these things
in VOTables, since OpenAPI does not (so far as I know) have an XML
representation for a schema.  I think only JSON and YAML are defined.  So
that's another obvious drawback that I'm not sure how to approach.
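
For reference, the literal-JSON-blob option (option 1 above) could be
sketched roughly like this.  Carrying the JSON as the text of an INFO
element named "openapi" inside a meta RESOURCE is entirely my invention
for the sake of the example; I'm not claiming it's a good convention or
that existing VOTable tooling would handle it gracefully.

```python
import json
import xml.etree.ElementTree as ET


def embed_descriptor(descriptor, resource_id):
    """Wrap an OpenAPI document as JSON text inside a meta RESOURCE.

    The INFO name="openapi" convention is hypothetical.
    """
    resource = ET.Element(
        "RESOURCE",
        {"type": "meta", "utype": "adhoc:service", "ID": resource_id},
    )
    info = ET.SubElement(
        resource, "INFO", {"name": "openapi", "value": "application/json"}
    )
    info.text = json.dumps(descriptor)  # ElementTree escapes &, <, > on output
    return resource


def extract_descriptor(resource):
    """Recover the OpenAPI document from a RESOURCE built as above."""
    info = resource.find("INFO[@name='openapi']")
    return json.loads(info.text)


descriptor = {
    "openapi": "3.1.0",
    "info": {"title": "Example <cutout> service", "version": "1.0"},
    "paths": {},
}

xml_text = ET.tostring(embed_descriptor(descriptor, "cutout"), encoding="unicode")
round_tripped = extract_descriptor(ET.fromstring(xml_text))
print(round_tripped == descriptor)
```

The round trip works, but the client now needs both an XML parser and a
JSON parser to read a single VOTable.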

What do folks think about this?  Interesting?  Too ambitious?  Any ideas
on how to handle the VOTable integration?

-- 
Russ Allbery (eagle at eyrie.org)             <https://www.eyrie.org/~eagle/>