[p3t] Draft documents demonstrating standards layout

Mon Aug 26 20:26:13 CEST 2024

Dave Morris <dave.morris at metagrid.co.uk> writes:

> I suggest we split the content over 4 documents.

> 1) IVOA Service protocol
> A document containing generic definitions of what services, operations
> are etc. This is pretty close to what you have in
> https://sqr-091.lsst.io/, but replace the phrase 'web service' with
> simply 'Service'.

Good call.  I like that better.

> 2) IVOA HTTP protocol
> A document describing how the IVOA should use aspects of the HTTP
> protocol. This includes the formatting of HTTP GET parameters from
> https://sqr-092.lsst.io/, and how IVOA services should use the HTTP
> content negotiation process. This also gives us somewhere to talk about
> HTTP error codes, caching, cross-site-scripting, and things like the
> POST-redirect-GET pattern.

> 3) IVOA JSON serialisation
> This is just the details of how IVOA services should map object data
> into a JSON serialisation. This is a stand-alone document, separate from
> how we use the HTTP transport protocol. Pretty much what you have in
> https://sqr-092.lsst.io/, but without the HTTP parts. It should be
> explicitly clear what is inherited from the existing JSON standard and
> what is IVOA specific.

I combined these two because I wasn't sure that we would have cases in the
future where we would use one of these but not the other, but I'm happy to
break them apart again.  The only goal of combining them was to have fewer
documents, which is a benefit but not a major one if there are other
reasons to split them.

It sounds like your hope is that document 2 can be used for any service
going forward even if it is still using some XML-based serialization
protocol, and if that's possible, that would be a good reason to split
them.

That said, I will say that I doubt the HTTP document will be useful for
future serialization mechanisms, even if they are using HTTP under the
hood.  The example that I had in mind while writing this was gRPC, and
gRPC doesn't use GET parameters so far as I know, wouldn't use HTTP
content negotiation, doesn't use REST-style HTTP verbs, has much different
(and much more limited) cross-site scripting concerns, has its own
separate error code concept, etc.  So most of the HTTP document would not
be relevant to one obvious possible future protocol, and I suspect that
will be somewhat typical.

But that's a guess about the future and it also doesn't really matter that
much.  If we do split the documents and then want to standardize something
like gRPC, it can just say that it doesn't use either of those documents.

So I guess that's a long-winded way of saying yup, this sounds good to me.

I may hold off on separating the documents in my drafts just to focus on
trying to add more content, but that's not a sign of disagreement, just a
convenience, and when we start working on proper IVOA documents we should
follow this split.

> 4) IVOA WebService protocol
> A specific sub-type of the [IVOA Service Protocol] for IVOA service HTTP
> based WebServices.

> An [IVOA WebService]
> * MUST format HTTP request parameters as defined in [IVOA HTTP protocol].
> * MUST use HTTP headers for content negotiation as defined in [IVOA HTTP
>   protocol].
> * MAY specify [IVOA JSON serialisation] as the default serialisation for
>   objects in requests and responses.
> * MAY include other serialisations for objects in requests and responses.

I think the implication here is that we're distinguishing between an "IVOA
service," for which document 1 applies but none of the others, and an
"IVOA web service," for which all the documents apply?

I was hoping to make document 4 at least somewhat generic in that the
pieces would be there to retarget it at something other than the style of
HTTP web service you're considering here, although since it needs to
include the instantiation with the JSON-based protocol and the OpenAPI
schema, there will be a portion of the document that is specific to what
it looks like as a JSON-based web service.

Note that point two will not be possible for a lot of future protocols
that I'm anticipating.  I consider it partly an aspect of the JSON-based
protocol (for the client to indicate what type of data response it wants
to get) and *maybe* as a way to support several encodings of an HTTP-based
protocol, but my guess is that the latter probably won't work out for
future protocols we want to support since I don't think the HTTP
negotiation mechanism will be sufficient.

I'm a little worried about the idea of a "default serialization" since I
want to make sure that the path to adopting new protocols in the future is
very clear.  In my mind at least, the service is a fairly abstract
specification of inputs, outputs, and semantics, and we then instantiate
it for one or more protocols, all of which are in some sense on equal
footing.  Here's how you do TAP over REST+JSON, here's how you do TAP over
gRPC, here's how you do TAP over REST+XML, etc.  Adding a new
instantiation should be fairly easy; one can just apply the new protocol
and encoding rules to the defined data types, and ideally one has a
working protocol, maybe with a few issues to resolve around the edges.
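
To make that concrete, here is a minimal Python sketch of one abstract
data type instantiated under two sets of encoding rules.  Everything in it
is hypothetical: CutoutRequest, its fields, and the to_json and to_xml
helpers are invented for illustration and are not taken from SODA, TAP, or
any other IVOA standard.

```python
import json
from dataclasses import asdict, dataclass
from xml.etree import ElementTree as ET


@dataclass
class CutoutRequest:
    """Hypothetical abstract input type, independent of any encoding."""

    dataset_id: str
    ra: float
    dec: float
    radius: float


def to_json(request: CutoutRequest) -> str:
    """Instantiate the abstract type using (assumed) JSON encoding rules."""
    return json.dumps(asdict(request), sort_keys=True)


def to_xml(request: CutoutRequest) -> str:
    """Instantiate the same type using (assumed) XML encoding rules."""
    root = ET.Element("CutoutRequest")
    for name, value in asdict(request).items():
        # One child element per field, with the value as element text.
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The point of the sketch is only that the data type is defined once and
each instantiation is a separate, fairly mechanical function, so adding a
third encoding would not touch the definition of CutoutRequest.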

Maybe the concern around specifying something as a default serialization
is to push people towards a single instantiation for interoperability
concerns?  That definitely is a concern, but if it is, I suspect we don't
want each protocol separately picking its preferred default?  Maybe that's
the world we'll necessarily end up in anyway, though.

> A particular IVOA standard, e.g. SODA, MAY specify a minimal set of
> content types. We already do this when we say TAP requires VOTable, but
> we should use the HTTP content type headers for this rather than a
> separate FORMAT parameter.

Yes, completely agreed.  I think we should try to specify what content
types the protocol is supposed to support, provide some idea of what it
might optionally support, and specify how to use HTTP content negotiation
for the client to choose the format in which it wants its results.
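
As a rough illustration of the server side of that negotiation, here is a
simplified Python sketch that picks a response format from the Accept
header.  The function names and the media types in the usage example are
assumptions for illustration only.  It also deliberately simplifies the
HTTP rules: it orders entries by q-value alone and ignores the precedence
that more specific media ranges are supposed to have.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        quality = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_range, quality))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_content_type(header: str, supported: List[str]) -> Optional[str]:
    """Return the supported type the client prefers, or None if none match."""
    for media_range, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_range == "*/*":
            return supported[0]  # server's own preference order
        for candidate in supported:
            if media_range == candidate:
                return candidate
            # "type/*" matches any supported type with the same main type.
            if media_range.endswith("/*") and candidate.startswith(media_range[:-1]):
                return candidate
    return None
```

For example, with supported types ["application/x-votable+xml",
"application/json"], a request sending "Accept: application/json" would
get JSON, and a None result would normally translate into an HTTP 406
response.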

> * The main part of an IVOA Service standard defines the data model.
> * A sub-section of the standard refers to [IVOA WebService protocol] as an
>  implementation.
> * The [IVOA WebService protocol] says a WebService MUST use content
>   negotiation for HTTP requests.
> * The [IVOA WebService protocol] says a WebService SHOULD specify JSON as
>  the default content type serialisation.

As above, I think the third requirement there is pretty limiting and means
that most future protocols will have to work outside of this box (and thus
wouldn't count as "WebService protocols" presumably).  This may be okay,
but I'm not sure I see a ready example of how one could use HTTP content
negotiation to negotiate the serialization, as opposed to the MIME type of
data responses.  I guess if we defined an XML serialization as well, that
would be one?  I wasn't sure if we were going to do that.

> For example, the main part of the SODA Standard would describe the
> operations and data model in an abstract way, without referring to a
> transport protocol or serialization format.

> The SODA Standard would include a section on [Implementations], with a
> sub-section for the [WebService] implementation, which describes how to
> implement the SODA service using the [IVOA WebService] protocol.

Yes, that's exactly what I was going for.

> As part of this, the SODA Standard can define a minimum set of content
> type serialisations that a [SODA WebService] MUST support.  In 2024,
> this minimum set would include JSON as the default.

> End result, Rubin et al get what they want; the 2024 version of SODA
> requires JSON, BUT we do not build it in from the start. Change one
> single line in the SODA Standard and a future version could require
> [not_invented_yet] as the default serialization.

This sounds right to me.

> One of the issues with the IVOA standards raised in the P3T team remit
> was the extent of the document cross referencing. To address the
> complaint that "to understand one standard you need to refer to at least
> three other documents each of which refer to three more" which makes it
> hard to get started.

I think the main place where we help with this is by publishing the schema
as part of the service document.  That combines most of the critical
information from all of the documents in one place.  If we stick with that
pattern (in whatever the native form of that schema is; it likely will not
be OpenAPI for future protocols) for each instantiation of the service, I
suspect most implementers will be able to work off of the schema and only
refer to the other documents if something is unclear.

For example, for gRPC, we would presumably publish the service and message
protobuf specifications for the API, which is enough to write both a
client and a server in conjunction with the semantic description in the
service document.  For XML, we would publish XML schema for each of the
data elements plus an OpenAPI schema for the REST portions.  Etc.

Also, along a different axis, I think we'll get the most benefit from
trying to build the document layering so that it mirrors a typical
layering in an implementation strategy.  The lower-level documents
hopefully correspond to lower-level libraries; for example, serializing a
timestamp for the JSON serialization format is something I would write
once and stick in a library and then any time I was implementing an IVOA
web service and saw a timestamp, I would just use that predefined data
type from the library with its attached serialization and deserialization
properties.  That means I only look at the network encoding document when
writing the low-level library, and only look at the service specification
when writing a new service.
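
A minimal Python sketch of the sort of low-level library helper I have in
mind is below.  The wire format it uses (ISO 8601 in UTC with a trailing
"Z" and whole seconds) is purely an assumption for the example and not a
statement of what the JSON serialization document would require, and the
function names are made up.

```python
from datetime import datetime, timezone

# Assumed wire format for this sketch only: UTC, whole seconds, "Z" suffix.
_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def serialize_timestamp(value: datetime) -> str:
    """Convert a timezone-aware datetime to the assumed wire format."""
    if value.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Normalize to UTC first; fractional seconds are dropped by the format.
    return value.astimezone(timezone.utc).strftime(_FORMAT)


def parse_timestamp(text: str) -> datetime:
    """Parse the assumed wire format back into a UTC datetime."""
    return datetime.strptime(text, _FORMAT).replace(tzinfo=timezone.utc)
```

A service implementation would then call these two functions wherever
the schema has a timestamp and never look at the encoding rules directly.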

-- 
Russ Allbery (eagle at eyrie.org)             <https://www.eyrie.org/~eagle/>