TAP Implementation Issues: Final Comment: TAP and UWS, sync and async

Fri Nov 6 10:19:56 PST 2009

Hi Guy,

You're absolutely right that what I said doesn't work!

I'll still be doing essentially what I said, but I'm taking advantage of 
the fact that my Web interface is simply a wrapper around a CLI 
interface, and the CLI interface isn't subject to HTTP timeouts. 
[That's true for SkyView too, so I could do the same there.]  So this 
has been valuable for me, since it will help me build my implementation 
more cleanly, but it's not going to work generally and this last 
exchange was a waste of everyone else's time. Thanks for setting me 
straight.

With egg on face I am respectfully yours,
	Tom

Guy Rixon wrote:
> Tom,
> 
> your suggested implementation has massive, inherent problems: it loses
> most of the benefits of an asynchronous interface!
> 
> By depending on a synchronous HTTP endpoint to run the query, your 
> implementation breaks any time that synchronous thing times out, and it 
> breaks if the network connection with the synchronous thing drops. In 
> those cases, your UWS has no way to regain control of the query, even if 
> the DB part is still running it. The UWS has to resubmit the query.
> 
> The whole point of UWS is /not/ to depend on a synchronous
> HTTP connection for long-running jobs.
> 
> You can build synchronous TAP as a wrapper around a UWS (Pat D. has done 
> so) but it doesn't work the other way around.
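The direction Guy describes (sync built on async) can be sketched as a client that drives one UWS job to completion. This is a generic illustration, not any particular implementation. `UwsClient` is a hypothetical interface, not a real library. Its four methods stand in for the UWS HTTP calls: create a job from the query parameters, set its phase to RUN, poll the phase, and fetch the result. A timeout here only abandons the caller's wait, and the job itself stays under the service's control.

```python
import time
from typing import Dict, Protocol


class UwsClient(Protocol):
    """Hypothetical minimal view of a UWS job list; not a real library API."""

    def create_job(self, params: Dict[str, str]) -> str: ...
    def start(self, job_id: str) -> None: ...
    def phase(self, job_id: str) -> str: ...
    def result(self, job_id: str) -> bytes: ...


TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def run_sync(client: UwsClient, params: Dict[str, str],
             poll_interval: float = 1.0, timeout: float = 600.0) -> bytes:
    """Emulate a synchronous query by driving one asynchronous UWS job."""
    job_id = client.create_job(params)   # POST the params to the job list
    client.start(job_id)                 # POST PHASE=RUN to <job>/phase
    deadline = time.monotonic() + timeout
    while True:
        phase = client.phase(job_id)     # GET <job>/phase
        if phase in TERMINAL_PHASES:
            break
        if time.monotonic() > deadline:
            # Only this caller gives up; the job keeps running server-side
            # and can still be polled or fetched later by its job id.
            raise TimeoutError(f"job {job_id} still {phase}")
        time.sleep(poll_interval)
    if phase != "COMPLETED":
        raise RuntimeError(f"job {job_id} ended in phase {phase}")
    return client.result(job_id)         # GET the single result
```

The reverse construction has no equivalent of the `job_id` handle. Once a blocking synchronous call is lost, nothing remains to poll.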
> 
> Guy
> 
> 
> On 6 Nov 2009, at 17:33, Tom McGlynn wrote:
> 
>> Hi Guy,
>>
>> I'm not sure there is a big area of disagreement here. In terms of
>> the text that users write and get back in the TAP asynchronous
>> interface, I'm not suggesting that a single byte needs to be changed.
>> It's all in what tasks are doing the processing. I've put in some
>> text below, in context, that I hope clarifies what I was saying.
>>
>> Tom
>>
>> Guy Rixon wrote:
>>> Tom,
>>> whenever you use UWS in a service definition, you have to say what
>>> parameters it takes when setting up a job and the work done by that
>>> job. That's the "application of the UWS pattern", to use the terms
>>> from the UWS standard.
>>
>> I'm not sure I understand this.  While there is talk of JDL and such 
>> in the UWS standard, I don't see any requirements that show it 
>> actually being used in any way.  So while a given UWS implementation 
>> might restrict parameters being used, I don't see how that is done 
>> within the UWS protocol itself.
>>
>>> The UWS specification is supposed to be reusable between
>>> applications; hence the U in the title. Therefore, it can't specify
>>> the application-specific parameters.
>>>
>>
>> Right.  I'm not suggesting that.  What I'm saying is that it's easy to 
>> write a UWS that can handle any parameters -- as indeed you suggest 
>> you have done already below.
>>
>>> It's possible to specify a UWS-conforming service for more than one
>>> application. CEA does this. The modern interface of this kind is
>>> called UWS-PA ("UWS for parameterized applications") and its
>>> forerunner (which is SOAPy) is the Common Execution Connector. In
>>> these kinds of services, the applications are pluggable.
>> Sounds like the kind of thing I was looking at.  I suggested in the 
>> original message that this has likely come up in earlier discussions.
>>
>>> AstroGrid DSA/Catalogue has had a CEC interface for years. It uses a
>>> generic CEC implementation and passes the requests through to an
>>> ADQL-query application plugged inside it.
>>>
>>> The downside of generalizing a job-control service in this way is
>>> complication and divergence from the synchronous case. TAP/UWS is
>>> quite like synchronous TAP: you do an asynchronous query by POSTing
>>> the same parameters you could use for a synchronous query. If you
>>> try to use CEC or UWS-PA to start a TAP query then you have a
>>> different interface. Because that interface is more general, it's
>>> not as simple, either to implement in a service or to call from a
>>> client.
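Guy's point that the asynchronous request mirrors the synchronous one fits in a few lines. This sketch assumes the TAP 1.0 parameter names (`REQUEST=doQuery`, `LANG=ADQL`, `QUERY`, `FORMAT`). It borrows the illustrative base URL http://tap/ from Tom's example later in the thread, and `tap_post` is a hypothetical helper. The form body is identical in both cases, and only the endpoint changes.

```python
from urllib.parse import urlencode


def tap_post(base_url: str, mode: str, query: str, fmt: str = "votable"):
    """Return (url, form_body) for a TAP query POST; only the endpoint differs."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    params = {
        "REQUEST": "doQuery",   # assumed TAP 1.0 request name
        "LANG": "ADQL",
        "QUERY": query,
        "FORMAT": fmt,
    }
    # Same body for both modes; the async endpoint returns a job instead of rows.
    return base_url.rstrip("/") + "/" + mode, urlencode(params)
```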
>> Here's where I think I'm getting a little lost. My suggestion is
>> that I have a UWS service running above TAP that is simply a proxy for
>> the TAP synchronous service. So by definition, they could not get
>> disassociated. I'm getting the sense that for you, the UWS service
>> needs to know about the parameters it's going to pass along to
>> whatever it calls when it does a run. However, as far as I can see, a
>> UWS service can be entirely agnostic about parameters. It can simply
>> take whatever parameters the user specifies and pass them along,
>> leaving it to the underlying synchronous call to check their validity.
>> In fact, for TAP that's pretty much the case, since the names of the
>> parameters used in TAP are not bounded.
>>
>>
>>> I think that the current boundary between TAP and UWS is just where
>>> we need it for the simplest implementations.
>>
>> I'm not so much concerned with boundaries as with the sense in which
>> UWS is instantiated. Let me give a concrete example. I have a TAP
>> service with a base URL of http://tap/, so http://tap/sync is the
>> synchronous access point and http://tap/async is the async access point.
>>
>> What happens when someone references the latter URL? In my current
>> implementation, a TAP servlet starts up, notes that I'm using an
>> asynchronous request, and calls the appropriate methods and classes
>> that TAP has defined for this. If I had multiple asynchronous
>> services, these would likely be in a nice little UWS library. All is
>> copacetic: UWS is a layer within TAP. It works fine, but TAP and the
>> UWS layer are pretty tightly coupled.
>>
>> What I think I'm going to do when I get back from the IVOA is a bit
>> different. When I invoke http://tap/async I start a servlet whose
>> only knowledge of TAP is that there is a synchronous service at
>> http://tap/sync. It knows nothing of the internals of TAP and is
>> completely independent of it. At some point the user does a
>> http://tap/async/id/phase?phase=run and this UWS service takes the
>> parameters that the user has specified for this job and invokes the
>> http://tap/sync URL with those parameters. The results get saved
>> somehow, and whenever the user sends the appropriate URL the results
>> are sent back. The only thing the UWS service ever knows about TAP
>> is the base URL. Everything else is supplied by the user.
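Tom's proposed proxy can be sketched in memory in a few dozen lines. This is an illustration, not his servlet. `UwsProxy`, `Job` and the injected `fetch` callable are hypothetical names, and `fetch` stands in for an HTTP POST to http://tap/sync. Here `run` executes inline, where a real service would hand it to a background worker. The comment in `run` marks the weakness Guy raises at the top of the thread.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical transport: POST form params to a URL and return the body.
Fetch = Callable[[str, Dict[str, str]], bytes]


@dataclass
class Job:
    params: Dict[str, str]
    phase: str = "PENDING"
    result: Optional[bytes] = None
    error: Optional[str] = None


class UwsProxy:
    """Parameter-agnostic async front end for one synchronous endpoint."""

    def __init__(self, sync_url: str, fetch: Fetch):
        self.sync_url = sync_url   # the only thing the proxy knows about TAP
        self.fetch = fetch
        self.jobs: Dict[str, Job] = {}
        self._ids = itertools.count(1)

    def create(self, params: Dict[str, str]) -> str:
        job_id = str(next(self._ids))
        # Parameters are cached as given and never validated here.
        self.jobs[job_id] = Job(params=dict(params))
        return job_id

    def run(self, job_id: str) -> None:
        job = self.jobs[job_id]
        job.phase = "EXECUTING"
        try:
            # Blocks for the whole query. If the HTTP call times out or the
            # connection drops, the proxy cannot regain control of the query
            # (even if the database is still running it); it can only resubmit.
            job.result = self.fetch(self.sync_url, job.params)
            job.phase = "COMPLETED"
        except OSError as exc:     # covers TimeoutError and urllib's URLError
            job.error, job.phase = str(exc), "ERROR"
```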
>>
>> Why do I like this better? Well, it makes the TAP code simpler. It
>> makes it easy for me to provide UWS functionality to all of my web
>> services. E.g., I'd have a UWS interface to SkyView by simply
>> changing the synchronous URL. And if UWS changes so that, e.g.,
>> there's now a security resource, I can plug it in without any change
>> whatsoever to my TAP servlet. For me it will be a big win.
>>
>> I'm not suggesting that this implementation be required.  It would be 
>> fine to keep things coupled in one TAP implementation.  However if the 
>> paradigm (and here I mean it in its literal sense of exemplar) is a 
>> UWS service runs on top of a TAP service then the way to describe the 
>> relationship between TAP and UWS changes. In particular I think it 
>> then makes a lot more sense to simply say that a UWS service can be 
>> used to provide asynchronous access to a TAP service.  The standard 
>> can require that if we decide async access is mandatory (as I think
>> we have).  So the TAP document becomes simpler -- and far less tightly 
>> coupled with the UWS document.
>>
>>> Cheers,
>>> Guy
>>> On 6 Nov 2009, at 15:33, Tom McGlynn wrote:
>>>> I'm sure everyone will be happy to see the word 'Final' in the title...
>>>>
>>>> In the past couple of days I've gotten the UWS asynchronous
>>>> implementation of TAP working (though doubtless still bug-ridden).
>>>>
>>>> When I read and implemented the TAP and UWS standards I had the sense
>>>> of UWS as being a layer within TAP. In retrospect I think it would
>>>> have been better (for my implementation at least) if I had
>>>> distinguished them more clearly.
>>>>
>>>> Suppose we think of UWS not as an interface layer but as the
>>>> definition of how to build asynchronous proxies. UWS becomes a
>>>> service definition, not an access protocol. The proxy accepts and
>>>> caches input parameters from the users, starts the underlying
>>>> request when told to, caches the response, and sends it back to the
>>>> user when requested. [I haven't followed the discussions of UWS
>>>> earlier in the Grid list, so my apologies if I am just discovering
>>>> what everyone already knows....]
>>>>
>>>> If I think of things this way, then I can implement UWS completely
>>>> independently of the underlying application. Indeed, the binding to
>>>> the underlying application could be dynamic: I can provide a UWS
>>>> layer over any number of distinct synchronous applications. I
>>>> don't need to know anything about what parameters they use, just
>>>> some root URL. The one piece of the specification that might
>>>> cause problems is the desire to support multiple outputs as well as
>>>> a single result. That's not at issue in TAP, but even this could
>>>> easily be handled by returning a list of the outputs, which is
>>>> what UWS does now anyway.
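The dynamic binding Tom describes might look like the following sketch. All names are hypothetical. A registry maps service names to root URLs, each job records which backend it targets, and each job's outputs are kept as a list of (name, content) pairs. A single-result service like TAP is then just the one-element case.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: POST form params to a URL and return the body.
Fetch = Callable[[str, Dict[str, str]], bytes]


class MultiUws:
    """One generic UWS layer bound at run time to many synchronous services."""

    def __init__(self, fetch: Fetch):
        self.fetch = fetch
        self.backends: Dict[str, str] = {}                     # name -> root URL
        self.jobs: Dict[str, Tuple[str, Dict[str, str]]] = {}  # id -> (name, params)
        self.results: Dict[str, List[Tuple[str, bytes]]] = {}

    def bind(self, name: str, root_url: str) -> None:
        # No knowledge of the backend's parameters is needed, only its URL.
        self.backends[name] = root_url

    def submit(self, job_id: str, name: str, params: Dict[str, str]) -> None:
        if name not in self.backends:
            raise KeyError(f"no backend bound as {name!r}")
        self.jobs[job_id] = (name, dict(params))

    def run(self, job_id: str) -> List[Tuple[str, bytes]]:
        name, params = self.jobs[job_id]
        body = self.fetch(self.backends[name], params)
        # Always a list of outputs; a single-result service yields one entry.
        self.results[job_id] = [("result", body)]
        return self.results[job_id]
```

Adding a second backend is one call, e.g. `uws.bind("skyview", "http://skyview.example/sync")`, where the URL is made up for illustration. Nothing in the job-handling code changes.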
>>>>
>>>> UWS is not described this way in its standards document: it is shown
>>>> as a layer within some bigger application, not as a separable
>>>> entity. Similarly, TAP shows the asynchronous interface tightly
>>>> coupled with the rest of TAP.
>>>>
>>>> In this new view, the TAP document would say very little about the
>>>> asynchronous interface. TAP itself would be synchronous, but if we
>>>> want asynchronous access to be mandatory then the requirement is
>>>> that a TAP implementation must specify a corresponding UWS service
>>>> through which the TAP implementation can be invoked. We could
>>>> still have a TAP service that is only available asynchronously: we
>>>> would allow that such a TAP service is not directly callable, and
>>>> only the associated UWS service can access it. I'm not trying to
>>>> take sides here in the sync/async wars.
>>>>
>>>> Changes to the UWS document would be rather more subtle: noting that
>>>> the interface can be implemented without reference to the underlying
>>>> implementation, and perhaps explicitly supporting the kind of
>>>> dynamic association with the underlying synchronous service
>>>> mentioned above. Maybe provide a convenience resource to get the
>>>> output in the single-output case (rather than having to parse the
>>>> output list).
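The parsing that such a convenience resource would spare a client can be sketched with the standard library. The sketch assumes the UWS 1.0 results layout: `uws:result` elements carrying an `xlink:href`, in the `http://www.ivoa.net/xml/UWS/v1.0` namespace. The function name is hypothetical, and it accepts either a full job document or just the results list.

```python
import xml.etree.ElementTree as ET

UWS_NS = "http://www.ivoa.net/xml/UWS/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def single_result_href(uws_xml: str) -> str:
    """Return the href of the only <uws:result>, or raise if not exactly one."""
    root = ET.fromstring(uws_xml)
    # iter() searches the whole tree, so a <uws:job> document also works.
    results = list(root.iter(f"{{{UWS_NS}}}result"))
    if len(results) != 1:
        raise ValueError(f"expected one result, found {len(results)}")
    href = results[0].get(f"{{{XLINK_NS}}}href")
    if href is None:
        raise ValueError("result has no xlink:href")
    return href
```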
>>>>
>>>> The advantage, had we taken this approach before, is that it largely
>>>> decouples TAP and UWS. The TAP standard would be shorter and simpler.
>>>> The UWS standard would be largely unchanged. We could change UWS in
>>>> the future without worrying about any impact on TAP.
>>>>
>>>> This is probably a bridge too far in terms of the TAP standard. For
>>>> UWS it's really a change in tone more than content (hints to the
>>>> user), so perhaps it is doable were it to be thought a good idea.
>>>> Regardless, I do anticipate revising my own implementation to use
>>>> this approach after the Interop.
>>>>
>>>> Tom McGlynn
>