TAP Implementation Issues: Final Comment: TAP and UWS, sync and async

Fri Nov 6 10:44:58 PST 2009

Hi Paul,

What I had in mind was that at some level you have to have a synchronous 
service -- something actually has to run whatever it is you want to do 
-- and it was natural to me to think of UWS as being built as a proxy on 
top of that.  Where I suffered brain damage was in forgetting the time 
limitations on synchronous HTTP requests, so while one can build on top 
of them, it's going to -- as Guy so gently pointed out -- defeat the 
primary goal of being able to run long requests.  He was kind enough to 
give me a couple of chances to realize it on my own!
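
To make the failure mode concrete, here is a minimal Python sketch of a 
UWS-style proxy sitting on top of a synchronous call.  Everything in it 
is assumed for illustration: the ProxyJob class, the fast_query and 
slow_query functions (stand-ins for a request to the synchronous 
service) and the timeout values.  Only the phase names are taken from 
UWS.  A real proxy would return from the "run" step immediately; this 
one blocks to keep the example short.

```python
import concurrent.futures
import time


class ProxyJob:
    """Hypothetical UWS-style job that forwards its parameters to a synchronous call."""

    def __init__(self, sync_call, params, timeout_s):
        self.sync_call = sync_call   # stand-in for an HTTP request to the sync service
        self.params = dict(params)   # cached verbatim; the proxy never inspects them
        self.timeout_s = timeout_s   # stand-in for the time limit on a sync request
        self.phase = "PENDING"
        self.result = None

    def run(self):
        self.phase = "EXECUTING"
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(self.sync_call, self.params)
        try:
            self.result = future.result(timeout=self.timeout_s)
            self.phase = "COMPLETED"
        except concurrent.futures.TimeoutError:
            # The underlying work may still be running, but the proxy has
            # lost its only handle on it; all it can do is resubmit.
            self.phase = "ERROR"
        finally:
            pool.shutdown(wait=False)
        return self.phase


def fast_query(params):
    return "rows for " + params["QUERY"]


def slow_query(params):
    time.sleep(0.5)  # pretend the query outlives the time limit
    return "rows for " + params["QUERY"]


quick = ProxyJob(fast_query, {"QUERY": "SELECT 1"}, timeout_s=2.0)
slow = ProxyJob(slow_query, {"QUERY": "SELECT 2"}, timeout_s=0.1)
print(quick.run(), quick.result)
print(slow.run(), slow.result)
```

The short job completes normally.  The long one ends in ERROR even 
though the work behind it carries on, which is exactly the case where 
the proxy buys nothing.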

Were there no such limits, then I don't think the copying issues are 
significant -- it might increase latency a bit but I doubt it would have 
much effect on throughput.  I don't think there need be any duplication 
of storage regardless of which way you do things.

	Tom

Paul Harrison wrote:
> Hi,
> 
> I think that Tom's implementation idea is legal wrt UWS and not 
> necessarily "bad" as long as the 
> http://tap/async/id/phase?phase=run (btw should be a POST) step returns 
> "immediately" - in fact I had been thinking about offering a similar 
> generic service for people who said that async TAP was too tricky for 
> them (for Guy - this is what the CEA "HTTP" style server does). Granted 
> there might be some (internal) inefficiency with data being transferred 
> from the sync to the async storage area, and that internally the UWS 
> part might have to resubmit the sync step if it timed out, but the 
> behaviour to the end client should appear to be standard UWS still.
> 
> I think that the "better" (i.e. more efficient) implementation is to 
> write a fundamentally async service and layer the sync service on this 
> as detailed in section 5 of the UWS document, but as long as the UWS 
> interface is adhered to then people are free to implement as they want. 
> However, I do not think that this requires a change to the UWS document 
> to say that UWS is a separate "service" as Tom suggests, because that 
> would favour the inefficient implementation over the efficient one.
> 
> Paul.
> 
> On 2009-11-06, at 17:47, Guy Rixon wrote:
> 
>> Tom,
>>
>> your suggested implementation has massive, inherent problems: it loses 
>> most of the benefits of an asynchronous interface!
>>
>> By depending on a synchronous HTTP endpoint to run the query, your 
>> implementation breaks any time that synchronous thing times out, and 
>> it breaks if the network connection with the synchronous thing drops. 
>> In those cases, your UWS has no way to regain control of the query, 
>> even if the DB part is still running it. The UWS has to resubmit the 
>> query.
>>
>> The whole point of UWS is /not/ to depend on a synchronous 
>> HTTP-connection for long-running jobs.
>>
>> You can build synchronous TAP as a wrapper around a UWS (Pat D. has 
>> done so) but it doesn't work the other way around.
>>
>> Guy
>>
>>
>> On 6 Nov 2009, at 17:33, Tom McGlynn wrote:
>>
>>> Hi Guy,
>>>
>>> I'm not sure there is a big area of disagreement here.  In terms of 
>>> the text that users write and get back in the TAP asynchronous 
>>> interface I'm not suggesting that a single byte needs to be 
>>> changed.  It's all in what tasks are doing the processing.  I've put 
>>> in some text that I hope clarifies what I was saying below in context.
>>>
>>> Tom
>>>
>>> Guy Rixon wrote:
>>>> Tom,
>>>> whenever you use UWS in a service definition, you have to say what 
>>>> parameters it takes when setting up a job and the work done by that 
>>>> job. That's the "application of the UWS pattern" to use the terms 
>>>> from the UWS standard.
>>>
>>> I'm not sure I understand this.  While there is talk of JDL and such 
>>> in the UWS standard, I don't see any requirements that show it 
>>> actually being used in any way.  So while a given UWS implementation 
>>> might restrict parameters being used, I don't see how that is done 
>>> within the UWS protocol itself.
>>>
>>>> The UWS specification is supposed to be reusable between 
>>>> applications; hence the U in the title. Therefore, it can't specify 
>>>> the application-specific parameters.
>>>>
>>>
>>> Right.  I'm not suggesting that.  What I'm saying is that it's easy 
>>> to write a UWS that can handle any parameters -- as indeed you 
>>> suggest you have done already below.
>>>
>>>> It's possible to specify a UWS-conforming service for more than one 
>>>> application. CEA does this. The modern interface of this kind is 
>>>> called UWS-PA ("UWS for parameterized applications") and its 
>>>> forerunner (which is SOAPy) is the Common Execution Connector. In these 
>>>> kinds of services, the applications are pluggable.
>>> Sounds like the kind of thing I was looking at.  I suggested in the 
>>> original message that this has likely come up in earlier discussions.
>>>
>>>> AstroGrid DSA/Catalogue has had a CEC interface for years. It uses a 
>>>> generic CEC implementation and passes the requests through to an 
>>>> ADQL-query application plugged inside it.
>>>> The downside of generalizing a job-control service in this way is 
>>>> complication and divergence from the synchronous case. TAP/UWS is 
>>>> quite like synchronous TAP: you do an asynchronous query by 
>>>> POSTing the same parameters you could use for a synchronous query. 
>>>> If you try to use CEC or UWS-PA to start a TAP query then you have 
>>>> a different interface. Because that interface is more general, it's 
>>>> not as simple, either to implement in a service or to call from a 
>>>> client.
>>> Here's where I think I'm getting a little lost.  My suggestion is 
>>> that I have a UWS service running above TAP that is simply a proxy 
>>> for the TAP synchronous service.  So by definition, they could not 
>>> get disassociated.  I'm getting the sense that for you, the UWS 
>>> service needs to know about the parameters it's going to pass along 
>>> to whatever it calls when it does a run.  However, as far as I can 
>>> see a UWS service can be entirely agnostic about parameters.  It can 
>>> simply take whatever parameters the user specifies and pass them along, 
>>> leaving it to the underlying synchronous call to handle validity.  In 
>>> fact, for TAP that's pretty much the case since the names of the 
>>> parameters used in TAP are not bounded.
>>>
>>>
>>>> I think that the current boundary between TAP and UWS is just where 
>>>> we  need it for the simplest implementations.
>>>
>>> I'm not so much concerned with boundaries as with the sense in which 
>>> UWS is instantiated.  Let me give a concrete example.  I have a TAP 
>>> service with a base URL of http://tap/, so http://tap/sync is the 
>>> synchronous access point and http://tap/async is the async access point.
>>>
>>> What happens when someone references the latter URL?  In my current 
>>> implementation, a TAP servlet starts up, notes that I'm using an 
>>> asynchronous request and calls the appropriate methods and classes 
>>> that TAP has defined for this.  If I had multiple asynchronous 
>>> services these would likely be in a nice little UWS library.  All is 
>>> copacetic: UWS is a layer within TAP.  It works fine but TAP and the 
>>> UWS layer are pretty tightly coupled.
>>>
>>> What I think I'm going to do when I get back from the IVOA is a bit 
>>> different.  When I invoke http://tap/async I start a servlet whose 
>>> only knowledge of TAP is that there is a synchronous service at 
>>> http://tap/sync.  It knows nothing of the internals of TAP and is 
>>> completely independent of it.  At some point the user does a 
>>> http://tap/async/id/phase?phase=run and this UWS service takes the 
>>> parameters that the user has specified for this job and invokes the 
>>> http://tap/sync URL with those parameters.  The results get saved 
>>> somehow and whenever the user sends the appropriate URL the results 
>>> are sent back.   The only thing the UWS service ever knows about TAP 
>>> is the base URL.  Everything else is supplied by the user.
>>>
>>> Why do I like this better?  Well it makes the TAP code simpler.  It 
>>> makes it easy for me to provide UWS functionality to all of my web 
>>> services.  E.g., I'd have a UWS interface to SkyView by simply 
>>> changing the synchronous URL. And if UWS changes so that, e.g., 
>>> there's now a security resource, I can plug it in without any change 
>>> whatsoever to my TAP servlet.   For me it will be a big win.
>>>
>>> I'm not suggesting that this implementation be required.  It would be 
>>> fine to keep things coupled in one TAP implementation.  However if 
>>> the paradigm (and here I mean it in its literal sense of exemplar) is 
>>> that a UWS service runs on top of a TAP service then the way to describe 
>>> the relationship between TAP and UWS changes. In particular I think 
>>> it then makes a lot more sense to simply say that a UWS service can 
>>> be used to provide asynchronous access to a TAP service.  The 
>>> standard can require that if we decide async access is mandatory (as 
>>> I think we have).  So the TAP document becomes simpler -- and far 
>>> less tightly coupled with the UWS document.
>>>
>>>> Cheers,
>>>> Guy
>>>> On 6 Nov 2009, at 15:33, Tom McGlynn wrote:
>>>>> I'm sure everyone will be happy to see the word 'Final' in the 
>>>>> title...
>>>>>
>>>>> In the past couple of days I've gotten the UWS asynchronous 
>>>>> implementation of TAP working (though doubtless still bug-ridden).
>>>>>
>>>>> When I read and implemented the TAP and UWS standards I had the 
>>>>> sense of UWS as being a layer within TAP.  In retrospect I think 
>>>>> it would have been better (for my implementation at least), if I 
>>>>> had distinguished them more clearly.
>>>>>
>>>>> Suppose we think of UWS not as an interface layer but as the 
>>>>> definition of how to build asynchronous proxies.  UWS becomes a 
>>>>> service definition, not an access protocol.  The proxy accepts and 
>>>>> caches input parameters from the users, starts the underlying 
>>>>> request when told to, caches the response and sends it back to the 
>>>>> user when requested.  [I haven't followed the discussions of UWS 
>>>>> earlier in the Grid list, so my apologies if I am just discovering 
>>>>> what everyone already knows....]
>>>>>
>>>>> If I think of things this way, then I can implement UWS completely 
>>>>> independently of the underlying application. Indeed the binding to 
>>>>> the underlying application could be dynamic: I can provide a UWS 
>>>>> layer over any number of distinct synchronous applications.  I 
>>>>> don't need to know anything about what parameters they use, just 
>>>>> some root URL.  The one piece of the specification that might 
>>>>> cause problems is the desire to support multiple outputs as well 
>>>>> as a single result.  That's not at issue in TAP, but even this 
>>>>> could easily be handled by returning a list of the outputs -- 
>>>>> which is what UWS does now anyway.
>>>>>
>>>>> UWS is not described this way in its standards document: it is 
>>>>> shown as a layer within some bigger application, not as a 
>>>>> separable entity. Similarly TAP shows the asynchronous interface 
>>>>> tightly coupled within the rest of the TAP.
>>>>>
>>>>> In this new view, the TAP document would say very little about the 
>>>>> asynchronous interface.  TAP itself would be synchronous, but if 
>>>>> we want asynchronous access to be mandatory then the requirement 
>>>>> is that a TAP implementation must specify a corresponding UWS 
>>>>> service through which the TAP implementation can be invoked.  We 
>>>>> could still have a TAP service that is only available 
>>>>> asynchronously: we allow that this TAP service is not directly 
>>>>> callable: only the associated UWS service can access it.  I'm not 
>>>>> trying to take sides here in the sync/async wars.
>>>>>
>>>>> Changes to the UWS document would be rather more subtle, noting 
>>>>> that the interface can be implemented without reference to the 
>>>>> underlying implementation, and perhaps explicitly supporting the 
>>>>> kind of dynamic association with the underlying synchronous 
>>>>> service mentioned above. Maybe provide a convenience resource to 
>>>>> get the output in the single output case (rather than having to 
>>>>> parse the output list).
>>>>>
>>>>> The advantage, had we taken this approach before, is that it largely 
>>>>> decouples TAP and UWS.  The TAP standard is shorter and simpler. 
>>>>> The UWS standard is largely unchanged.  We can change UWS in the 
>>>>> future without worrying about any impact on TAP.
>>>>>
>>>>> This is probably a bridge too far in terms of the TAP standard. 
>>>>> For UWS it's really a change in tone more than content -- hints 
>>>>> to the user -- so perhaps it is doable were it to be thought a 
>>>>> good idea.  Regardless, I do anticipate revising my own 
>>>>> implementation to use this approach after the Interop.
>>>>>
>>>>> Tom McGlynn
>>
> 
> Dr. Paul Harrison
> JBCA, Manchester University
> http://www.manchester.ac.uk/jodrellbank
> 
> 
>
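
For comparison, the layering that Guy and Paul describe in the quoted 
messages (a fundamentally asynchronous job, with the synchronous 
interface as a thin wrapper around it) might be sketched in Python as 
follows.  This is only an illustration under assumptions: AsyncJob, 
run_sync and query are hypothetical names, the "database" is a sleep, 
and only the phase names come from UWS.

```python
import threading
import time


class AsyncJob:
    """Hypothetical job object: the work runs in its own thread, not inside a request."""

    def __init__(self, work, params):
        self.work = work
        self.params = dict(params)
        self.phase = "PENDING"
        self.result = None

    def start(self):
        # Returns at once; the caller checks the phase later.
        self.phase = "QUEUED"
        threading.Thread(target=self._execute, daemon=True).start()

    def _execute(self):
        self.phase = "EXECUTING"
        try:
            self.result = self.work(self.params)
            self.phase = "COMPLETED"
        except Exception:
            self.phase = "ERROR"


def run_sync(work, params, poll_s=0.01):
    """Synchronous facade: create a job, start it, then block until it finishes."""
    job = AsyncJob(work, params)
    job.start()
    while job.phase not in ("COMPLETED", "ERROR"):
        time.sleep(poll_s)
    return job


def query(params):
    time.sleep(0.05)  # pretend the database takes a while
    return "rows for " + params["QUERY"]


job = run_sync(query, {"QUERY": "SELECT 1"})
print(job.phase, job.result)
```

Here a dropped synchronous caller costs nothing: the job object still 
exists and can be polled again, which is the property the proxy 
arrangement cannot offer.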