UWS as a REST protocol

Tue Mar 6 14:27:53 PST 2007

Greetings,

On 2007 Feb 26 , at 15.43, Guy Rixon wrote:

> there has been a lot of debate recently in the industry about SOAP vs.
> REST as the basis of web services.  I've written an IVOA Note on how UWS
> might be presented as a REST service: please see

I'm catching up on this discussion a little late, so please forgive  
the compendium response.  I've tried not to overlap too much with the  
replies already posted, with, I think, varying success.

I've been pushing REST for a little while, and have written a couple  
of RESTful applications[1], so what's below is a mixture of reading  
and personal experience.

Matthew said:

> The main point of REST is having resources which can be accessed  
> through a standard CRUD interface that maps onto the HTTP methods:
>
> CREATE = POST
> RETRIEVE = GET
> UPDATE = PUT
> DELETE = DELETE

I would tend to think of PUT as create/replace (RFC 2616: `the PUT  
method requests that the enclosed entity be stored under the supplied  
Request-URI'), and POST as update, since it to some extent mutates or  
(RFC 2616 again) `creates a new subordinate' of the Request-URI.
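
To make that distinction concrete, here is a minimal sketch of the two methods acting on a toy in-memory store.  The class name, the paths, and the choice of 201/200 status codes are my own illustrative assumptions, not anything taken from the UWS Note.

```python
import itertools


class ResourceStore:
    """Toy in-memory model of a server's resources, keyed by URI path.

    Hypothetical illustration only; not part of any UWS specification.
    """

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, path, entity):
        """PUT: store the entity under exactly the path the client supplied."""
        created = path not in self.resources
        self.resources[path] = entity          # create, or replace wholesale
        return (201 if created else 200), path

    def post(self, path, entity):
        """POST: the server creates a new subordinate of the given path."""
        child = f"{path.rstrip('/')}/{next(self._ids)}"
        self.resources[child] = entity         # the server picked the name
        return 201, child


store = ResourceStore()
print(store.put("/jobs/alpha", "v1"))    # client names the resource
print(store.put("/jobs/alpha", "v2"))    # same name again: replaced
print(store.post("/jobs", "payload"))    # server names the new subordinate
```

The point of the sketch is only who chooses the name: with PUT the client does, with POST the server does.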

That said, I hadn't thought of this mapping between REST and CRUD,  
but that's really useful!

Matthew again, responding to John:

> You're right about the HTTP Accept header but this is difficult to  
> set from a browser.

While it's true that you can't control Accept headers from a (HTML)  
browser, would you want to?  What a browser can handle is text/html  
or at least text/* -- any other representations of a resource would  
naturally be handled by different types of client.

Indeed, perhaps the canonical REST client is 'curl', not Mozilla.
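
As a rough sketch of what the server does with an Accept header, the following picks one of the representations it has on offer.  It is deliberately simplified: it honours q-values but ignores the specificity rules of RFC 2616, and the function names and media types are illustrative assumptions.

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media:
            ranges.append((media, q))
    # sorted() is stable, so equal q-values keep their order in the header
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    """True if a range such as text/* or */* covers the given media type."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def choose(accept, available):
    """Pick the first available type covered by the best-ranked range."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue                      # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None                           # a real server might answer 406


available = ["text/html", "application/rdf+xml"]
print(choose("text/*;q=0.8, */*;q=0.1", available))   # a browser-like client
print(choose("application/rdf+xml", available))       # a scripted client
```

A non-browser client sets the header explicitly; with curl that is the -H option, as in curl -H 'Accept: application/rdf+xml' followed by the URL.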

Guy (but overlapping points in Dave Morris's previous message):

> However, if the identifier for a VOSpace node is
> http://mumble.mumble/... then it's tied to some specific site-name. The
> vos:// notation lets us move the node transparently to a different host
> (or, at least it lets us move the _space_; maybe not individual nodes);

True, but the mumble.mumble host needn't have much to do with where  
the bitbucket of the resource is.  HTTP defines a range of 3xx  
responses, and if the mumble.mumble server's job is to return one of  
those responses, then it's not architecturally hugely different from  
a registry, except with an easy-to-use lookup protocol.  Indeed, when  
an HTTP server returns a redirection like this, I imagine it'd be  
defensible for the Location to use a protocol other than 'http',  
perhaps one such as 'gridftp' (this, by the way, seems to at least  
look towards addressing the VOSpace requirements 0 and 1 in a message  
from Paul late in the thread).
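
A sketch of that idea, under loud assumptions: the table, the path, the storage host names, and the choice of 302 rather than some other 3xx code are all hypothetical.  The stable http name is looked up and answered with a redirection to wherever the bits currently live.

```python
# Hypothetical lookup table: stable name -> current location of the data.
LOCATIONS = {
    "/vospace/node42": "gridftp://storage-a.example.org/data/node42",
}


def resolve(path):
    """Answer a GET on a stable name with a redirection, registry-style."""
    target = LOCATIONS.get(path)
    if target is None:
        return 404, {}
    return 302, {"Location": target}


print(resolve("/vospace/node42"))

# Moving the node means updating the table; the public name is unchanged.
LOCATIONS["/vospace/node42"] = "gridftp://storage-b.example.org/data/node42"
print(resolve("/vospace/node42"))
```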

> we can't do this with http:// URIs except by DNS trickery

One person's `trickery' is another person's `full use of the protocol  
stack'.  I'd imagine that with a combination of DNS round-robining  
and 300 (Multiple Choices) you could create a very flexible  
distributed system without making anyone too queasy.  But the proof  
would be in the implementation.

To build on Dave's example of an asynchronous third-party transfer,  
consider the following sequence of actions:

1. POST to http://orchestrator.org/transfer some suitable payload  
specifying the transfer (with a suitable Content-Type); the response  
is 303 with Location:http://orchestrator.org/transfer/<id>/status.   
The payload you could think of as a message, but you could also view  
it as a description of the transfer you want, or the result you want  
achieved (nouns rather than verbs, again).

2. GET http://orchestrator.org/transfer/<id>/status; response is 503  
(Service Unavailable) with a Retry-After: containing the server's  
estimate of when it's worth asking again

3. Wait and retry, until the status is 200, 204, or perhaps 502 (Bad  
Gateway = `it's not my fault').

4. If you get bored waiting, then DELETE  
http://orchestrator.org/transfer/<id>/status.

...or something like that.  You can probably do most or all of this  
transaction using 'curl' (I think, but wouldn't swear, that it can  
grok Retry-After headers).
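
The four steps above can be sketched as a client loop.  To keep it runnable without a network, the orchestrator is faked in memory; the class, the function names, the polls_until_done knob, and the assumption that Retry-After carries a number of seconds (it may also be an HTTP date) are all mine, for illustration only.

```python
import time


class FakeOrchestrator:
    """In-memory stand-in for the hypothetical orchestrator.org service."""

    def __init__(self, polls_until_done=2):
        self.polls_until_done = polls_until_done
        self.polls_left = {}    # transfer id -> 503 responses still to send
        self.next_id = 1

    def request(self, method, path, body=None):
        """Return (status, headers, body) for a single request."""
        if (method, path) == ("POST", "/transfer"):
            transfer_id = str(self.next_id)
            self.next_id += 1
            self.polls_left[transfer_id] = self.polls_until_done
            return 303, {"Location": f"/transfer/{transfer_id}/status"}, ""
        parts = path.strip("/").split("/")
        known = (len(parts) == 3 and parts[0] == "transfer"
                 and parts[2] == "status" and parts[1] in self.polls_left)
        if not known:
            return 404, {}, ""
        transfer_id = parts[1]
        if method == "GET":
            if self.polls_left[transfer_id] > 0:
                self.polls_left[transfer_id] -= 1
                return 503, {"Retry-After": "1"}, ""
            return 200, {}, "transfer complete"
        if method == "DELETE":
            del self.polls_left[transfer_id]
            return 204, {}, ""
        return 405, {}, ""


def run_transfer(server, payload, max_polls=10, sleep=time.sleep):
    """Steps 1-4: POST, follow the 303, poll while 503, DELETE when bored."""
    status, headers, _ = server.request("POST", "/transfer", payload)
    if status != 303:
        raise RuntimeError(f"unexpected response to POST: {status}")
    status_url = headers["Location"]
    for _ in range(max_polls):
        status, headers, body = server.request("GET", status_url)
        if status != 503:
            return status, body          # 200, 204, 502, ... : stop polling
        sleep(int(headers.get("Retry-After", "1")))
    server.request("DELETE", status_url)  # bored: cancel the transfer
    return None, "cancelled"


# A no-op sleep keeps the demonstration instant.
print(run_transfer(FakeOrchestrator(), "transfer A to B", sleep=lambda s: None))
```

Note that the client needs no knowledge of the service beyond the first URL and the meaning of the HTTP status codes; the status URL itself arrives in the Location header.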

Roy:

> What is the theological basis of RESTfulness?

The claim is that it's a better impedance match to the actual web,  
and that it's worked so far (it just didn't have a fancy name).   
Compare HTTP and CORBA: which protocol's endpoint identifiers do you  
see written on the sides of busses?

> Why is it better than VERBishness?

I don't think there's a one-line answer to that.  I think, however,  
that most answers would be elaborations of `the web is a messy  
heterogeneous place, and needs a small(ish) universal protocol, which  
HTTP hits the sweet-spot of'.  See the previous answer.

> Why is it bad for a URL to have side-effects?

It's not bad for `a URL' to have side-effects -- indeed the PUT and  
DELETE actions on the URL necessarily have side-effects, and POST may  
have.  The point is that if you define GETting a URL to have no side  
effects (or at least none that the requestor is accountable for),  
then you can reason about the properties of proxies, caches,  
security, and so on, and so make the web work.  Optimisations are a  
function of the strength of the assertions you can make about a system.

> Why can't VERBish things be cached just as well?

It's not just about caching.  The REST thing is not just motiveless  
`URLs should be nouns not verbs'.  My summary is:

* URIs name things (possibly abstract things like the weather in  
Glasgow, or relatively concrete things like the status of a  
transfer).  As such, they can be passed around straightforwardly on  
busses (diesel or memory) with less chance of everyone getting  
confused.  You can't do this so easily with a verb/message: whom can  
I send this message to?, when?, can I replay it?, are _you_ allowed  
to?, can I store/duplicate/discard it?  A name's just a name.

* There are a _few_ CRUD things you can do with a name (GET/PUT/...),  
and they are orthogonal to the representations of the thing  
(Content-Type, Accept), and orthogonal to the name itself.  If you've chosen  
your set of names skillfully (not trivial, of course), then you can  
probably use those few methods to do all that you want with the  
names.  Since HTTP will change on a vastly slower timescale than your  
SOAP protocol spec, that's a vast amount of confusion and brittleness  
you've taken out of the world.

* HTTP is a damn clever protocol, when you look at it closely.  And  
RFC 2616 is possibly larger than you think, with 'curl' implementing  
quite a lot of it.

Paul:

> In the pure RESTful alternative what if the underlying data change  
> (e.g. improved calibration) then really by the RESTful theology the  
> response should still return the original data not the improved data?

No, the name is the same, but its state, and thus the representation  
of that state, is now different (better calibrated), and not  
necessarily as the result of a PUT.  You can have a URL which names  
`the current weather in Glasgow' -- that's going to change on a  
minute-by-minute basis.  It's the Expires and If-Modified-Since  
class of headers that handles the fact that while names might well be  
long-lived, representations come and go, and might need to be  
re-retrieved on any timescale between minutes and years.
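
Here is a small sketch of how If-Modified-Since lets the same name serve a changing representation.  The Resource class, the function name, and the dates are invented for the example; only the header names and the 200/304 status codes come from HTTP.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class Resource:
    """A named thing whose state, and so its representation, can change."""

    def __init__(self, body, modified):
        self.body = body
        self.modified = modified         # timezone-aware UTC, whole seconds

    def update(self, body, modified):
        self.body = body
        self.modified = modified


def conditional_get(resource, if_modified_since=None):
    """Return (status, headers, body), honouring If-Modified-Since."""
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if resource.modified <= since:
            return 304, {}, None         # the client's copy is still good
    headers = {"Last-Modified": format_datetime(resource.modified, usegmt=True)}
    return 200, headers, resource.body


data = Resource("raw calibration",
                datetime(2007, 3, 6, 12, 0, 0, tzinfo=timezone.utc))
status, headers, body = conditional_get(data)
stamp = headers["Last-Modified"]
print(status, stamp)
print(conditional_get(data, stamp)[0])   # unchanged since the first fetch

# The calibration improves: same name, new state, new representation.
data.update("improved calibration",
            datetime(2007, 3, 7, 9, 30, 0, tzinfo=timezone.utc))
print(conditional_get(data, stamp)[0])
```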

> Maybe what we want is a mixture.....REST for the stateful, job  
> management part of services, and SOAP (or REST HTTP GET with URL  
> parameters) for the initial job creation - most astronomers do  
> simply want to think of the action that they want to perform in a  
> procedural fashion.

...and if the named Thing is a `processing element', then PUTting a  
job onto it (creating it) and GETting the results back sound pretty  
procedural.

In my experience, the hard thing about designing a RESTful service  
is deciding just what is the set of Things that you're going to name  
and therefore expose as the conceptual state of your service.

This isn't massively different from the work you do deciding on  
classes and methods in an O-O design, but because (in effect) the  
method names are chosen for you, and because the set of names is (in  
effect) your API, it becomes a weightier design decision.  The upside  
is that this forces you to ask yourself some very useful questions  
about what it is you're designing, and pushes you towards a design  
that's simple and powerful.

One of the stronger arguments in Fielding's thesis (he who named the  
`REST' notion) is the discussion of Unix pipes and character  
streams.  Representing everything as a character stream, and thus  
having successive tools parsing and re-serialising, seems clumsy when  
you first look at it, but the discipline of fitting in with that  
pattern pushes you towards designing Unix tools in a way which  
encourages orthogonal, simple, and robust components, which is  
powerful because it matches the ecology/environment you find in  
Unix.  That is, pipes and streams go with the flow in Unix, in the  
way that (say) more heavyweight file objects chimed with VMS, and (it  
is plausibly claimed) the way that HTTP simply chimes with the web.

All the best,

Norman

[1] Currently, temporarily, at <http://thor.roe.ac.uk/quaestor/> and  
<http://thor.roe.ac.uk/utype-resolver/>.  The first of the two  
implements a generic reasoner, which allows you to upload RDF, and  
submit elaborate queries against it, retrieving the results in  
multiple formats.  I wouldn't swear to it being canonical REST style,  
but it's surely non-trivial, and it seems to work OK.

-- 
----------------------------------------------------------------------------
Norman Gray  /  http://nxg.me.uk
eurovotech.org  /  University of Leicester, UK