TAP Implementation Issues: Final Comment: TAP and UWS, sync and async
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Fri Nov 6 10:44:58 PST 2009
Hi Paul,
What I had in mind was that at some level you have to have a synchronous
service -- something actually has to run whatever it is you want to do
-- and it was natural to me to think of UWS as being built as a proxy on
top of that. Where I suffered brain damage was in forgetting the time
limitations on synchronous HTTP requests so while one can build on top
of them, it's going to -- as Guy so gently pointed out -- obviate the
primary goal of being able to run long requests. He was kind enough to
give me a couple of chances to realize it on my own!
Were there no such limits, then I don't think the copying issues are
significant -- it might increase latency a bit but I doubt it would have
much effect on throughput. I don't think there need be any duplication
of storage regardless of which way you do things.
Tom
Paul Harrison wrote:
> Hi,
>
> I think that Tom's implementation idea is legal wrt UWS and not
> necessarily "bad" as long as the
> http://tap/async/id/phase?phase=run (btw should be a POST) step returns
> "immediately" - in fact I had been thinking about offering a similar
> generic service for people who said that async TAP was too tricky for
> them (for Guy - this is what the CEA "HTTP" style server does). Granted
> there might be some (internal) inefficiency with data being transferred
> from the sync to the async storage area, and that internally the UWS
> part might have to resubmit the sync step if it timed out, but the
> behaviour to the end client should appear to be standard UWS still.
>
> I think that the "better" (i.e. more efficient) implementation is to
> write a fundamentally async service and layer the sync service on this
> as detailed in section 5 of the UWS document, but as long as the UWS
> interface is adhered to then people are free to implement as they want.
> However, I do not think that this requires a change to the UWS document
> to say that UWS is a separate "service" as Tom suggests, because that
> would favour the inefficient implementation over the efficient one.
>
> Paul.
>
> On 2009-11-06, at 17:47, Guy Rixon wrote:
>
>> Tom,
>>
>> your suggested implementation has massive, inherent problems: it loses
>> most of the benefits of an asynchronous interface!
>>
>> By depending on a synchronous HTTP endpoint to run the query, your
>> implementation breaks any time that synchronous thing times out, and
>> it breaks if the network connection with the synchronous thing drops.
>> In those cases, your UWS has no way to regain control of the query,
>> even if the DB part is still running it. The UWS has to resubmit the
>> query.
>>
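The failure mode described above can be sketched as follows. This is a minimal illustration under assumptions, not code from any of the implementations discussed: the function `run_with_resubmit`, the `SyncCaller` signature and the `flaky` stand-in are hypothetical, and the sketch assumes that a timed-out or dropped connection surfaces as `TimeoutError` or `ConnectionError`.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature: (sync URL, parameters) -> response body.
SyncCaller = Callable[[str, Dict[str, str]], bytes]


def run_with_resubmit(
    call_sync: SyncCaller,
    sync_url: str,
    params: Dict[str, str],
    max_attempts: int = 3,
) -> Tuple[bytes, int]:
    """Call the synchronous endpoint, resubmitting from scratch on failure.

    Returns the response body and the number of submissions that were made.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call_sync(sync_url, params), attempt
        except (TimeoutError, ConnectionError):
            # The proxy has lost the request. Any work the back end did for it
            # cannot be reclaimed, so the only option is to submit again.
            continue
    raise TimeoutError(f"no response after {max_attempts} submissions")


if __name__ == "__main__":
    calls = []

    def flaky(url: str, params: Dict[str, str]) -> bytes:
        # Stand-in for a sync service that times out on the first two calls.
        calls.append(url)
        if len(calls) < 3:
            raise TimeoutError("simulated HTTP timeout")
        return b"result table"

    body, submissions = run_with_resubmit(
        flaky, "http://tap/sync", {"QUERY": "SELECT 1"}
    )
    print(body, submissions)
```

Each retry starts the whole query again, so a job that always runs longer than the HTTP timeout never completes, however many times it is resubmitted.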
>> The whole point of UWS is /not/ to depend on a synchronous
>> HTTP-connection for long-running jobs.
>>
>> You can build synchronous TAP as a wrapper around a UWS (Pat D. has
>> done so) but it doesn't work the other way around.
>>
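The direction that does work, a synchronous call layered on an asynchronous job service, can be sketched like this. It is an illustration under assumptions, not Pat D.'s implementation or any UWS library: `ToyJobService` is a hypothetical in-memory stand-in for the job service, and `run_sync` is a hypothetical wrapper that creates a job, starts it, and polls the phase until it finishes.

```python
import threading
import time
from typing import Callable, Dict


class ToyJobService:
    """In-memory stand-in for an asynchronous job service (hypothetical)."""

    def __init__(self, work: Callable[[Dict[str, str]], str]) -> None:
        self._work = work
        self._jobs: Dict[int, dict] = {}
        self._lock = threading.Lock()

    def create(self, params: Dict[str, str]) -> int:
        with self._lock:
            job_id = len(self._jobs) + 1
            self._jobs[job_id] = {
                "phase": "PENDING",
                "params": dict(params),
                "result": None,
            }
        return job_id

    def start(self, job_id: int) -> None:
        """Return immediately; the work runs in a background thread."""
        job = self._jobs[job_id]
        job["phase"] = "EXECUTING"

        def target() -> None:
            try:
                job["result"] = self._work(job["params"])
                job["phase"] = "COMPLETED"  # set only after the result is stored
            except Exception as exc:
                job["result"] = str(exc)
                job["phase"] = "ERROR"

        threading.Thread(target=target, daemon=True).start()

    def phase(self, job_id: int) -> str:
        return self._jobs[job_id]["phase"]

    def result(self, job_id: int):
        return self._jobs[job_id]["result"]


def run_sync(service: ToyJobService, params: Dict[str, str],
             poll_interval: float = 0.01, timeout: float = 5.0):
    """Synchronous facade: create a job, start it, block until it finishes."""
    job_id = service.create(params)
    service.start(job_id)
    deadline = time.monotonic() + timeout
    while service.phase(job_id) not in ("COMPLETED", "ERROR", "ABORTED"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(poll_interval)
    if service.phase(job_id) != "COMPLETED":
        raise RuntimeError(f"job {job_id} failed: {service.result(job_id)}")
    return service.result(job_id)
```

If the caller of `run_sync` gives up or loses its connection, the job keeps running inside the service and can still be collected later, which is exactly what a proxy over a synchronous endpoint cannot offer.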
>> Guy
>>
>>
>> On 6 Nov 2009, at 17:33, Tom McGlynn wrote:
>>
>>> Hi Guy,
>>>
>>> I'm not sure there is a big area of disagreement here. In terms of
>>> the text that users write and get back in the TAP asynchronous
>>> interface I'm not suggesting that a single byte needs to be
>>> changed. It's all in what tasks are doing the processing. I've put
>>> in some text that I hope clarifies what I was saying below in context.
>>>
>>> Tom
>>>
>>> Guy Rixon wrote:
>>>> Tom,
>>>> whenever you use UWS in a service definition, you have to say what
>>>> parameters it takes when setting up a job and the work done by that
>>>> job. That's the "application of the UWS pattern" to use the terms
>>>> from the UWS standard.
>>>
>>> I'm not sure I understand this. While there is talk of JDL and such
>>> in the UWS standard, I don't see any requirements that show it
>>> actually being used in any way. So while a given UWS implementation
>>> might restrict parameters being used, I don't see how that is done
>>> within the UWS protocol itself.
>>>
>>>> The UWS specification is supposed to be reusable between
>>>> applications; hence the U in the title. Therefore, it can't specify
>>>> the application-specific parameters.
>>>>
>>>
>>> Right. I'm not suggesting that. What I'm saying is that it's easy
>>> to write a UWS that can handle any parameters -- as indeed you
>>> suggest you have done already below.
>>>
>>>> It's possible to specify a UWS-conforming service for more than one
>>>> application. CEA does this. The modern interface of this kind is
>>>> called UWS-PA ("UWS for parameterized applications") and its
>>>> forerunner (which is SOAPy) is the Common Execution Connector. In
>>>> this kind of service, the applications are pluggable.
>>> Sounds like the kind of thing I was looking at. I suggested in the
>>> original message that this has likely come up in earlier discussions.
>>>
>>>> AstroGrid DSA/Catalogue has had a CEC interface for years. It uses a
>>>> generic CEC implementation and passes the requests through to an
>>>> ADQL-query application plugged inside it.
>>>> The downside of generalizing a job-control service in this way is
>>>> complication and divergence from the synchronous case. TAP/UWS is
>>>> quite like synchronous TAP: you do an asynchronous query by
>>>> POSTing the same parameters you could use for a synchronous query.
>>>> If you try to use CEC or UWS-PA to start a TAP query then you have
>>>> a different interface. Because that interface is more general, it's
>>>> not as simple, either to implement in a service or to call from a
>>>> client.
>>> Here's where I think I'm getting a little lost. My suggestion is
>>> that I have a UWS service running above TAP that is simply a proxy
>>> for the TAP synchronous service. So by definition, they could not
>>> get disassociated. I'm getting the sense that for you, the UWS
>>> service needs to know about the parameters it's going to pass along
>>> to whatever it calls when it does a run. However, as far as I can
>>> see a UWS service can be entirely agnostic about parameters. It can
>>> simply take whatever parameters the user specifies and pass it along,
>>> leaving it to the underlying synchronous call to handle validity. In
>>> fact, for TAP that's pretty much the case since the names of the
>>> parameters used in TAP are not bounded.
>>>
>>>
>>>> I think that the current boundary between TAP and UWS is just where
>>>> we need it for the simplest implementations.
>>>
>>> I'm not so much concerned with boundaries as in the sense in which
>>> UWS is instantiated. Let me give a concrete example. I have a TAP
>>> service with a base URL of http://tap/, so http://tap/sync is the
>>> synchronous access point and http://tap/async is the async access point.
>>>
>>> What happens when someone references the latter URL? In my current
>>> implementation, a TAP servlet starts up, notes that I'm using an
>>> asynchronous request and calls the appropriate methods and classes
>>> that TAP has defined for this. If I had multiple asynchronous
>>> services these would likely be in a nice little UWS library. All is
>>> copacetic: UWS is a layer within TAP. It works fine but TAP and the
>>> UWS layer are pretty tightly coupled.
>>>
>>> What I think I'm going to do when I get back from the IVOA is a bit
>>> different. When I invoke http://tap/async I start a servlet whose
>>> only knowledge of TAP is that there is a synchronous service at
>>> http://tap/sync. It knows nothing of the internals of TAP and is
>>> completely independent of it. At some point the user does a
>>> http://tap/async/id/phase?phase=run and this UWS service takes the
>>> parameters that the user has specified for this job and invokes the
>>> http://tap/sync URL with those parameters. The results get saved
>>> somehow and whenever the user sends the appropriate URL the results
>>> are sent back. The only thing the UWS service ever knows about TAP
>>> is the base URL. Everything else is supplied by the user.
>>>
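The parameter-agnostic proxy described above can be sketched in a few lines. This is only an illustration under assumptions, not the servlet discussed in this thread: the class `AsyncProxy`, its methods and the injected `call_sync` function are hypothetical names, and `run` executes the synchronous call inline for brevity, whereas a real service would run it in the background so that the phase change returns immediately.

```python
import itertools
from typing import Callable, Dict, Optional

# Hypothetical signature: (sync URL, parameters) -> response body.
SyncCaller = Callable[[str, Dict[str, str]], bytes]


class AsyncProxy:
    """Job store in front of one synchronous URL (illustrative only)."""

    def __init__(self, sync_url: str, call_sync: SyncCaller) -> None:
        self.sync_url = sync_url  # the only thing the proxy knows about the service
        self.call_sync = call_sync
        self.jobs: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def create_job(self, params: Dict[str, str]) -> str:
        job_id = str(next(self._ids))
        # Parameters are cached exactly as given; checking them is left to
        # the underlying synchronous service.
        self.jobs[job_id] = {
            "phase": "PENDING",
            "params": dict(params),
            "result": None,
        }
        return job_id

    def run(self, job_id: str) -> None:
        job = self.jobs[job_id]
        job["phase"] = "EXECUTING"
        try:
            job["result"] = self.call_sync(self.sync_url, job["params"])
            job["phase"] = "COMPLETED"
        except Exception:
            job["phase"] = "ERROR"

    def phase(self, job_id: str) -> str:
        return self.jobs[job_id]["phase"]

    def result(self, job_id: str) -> Optional[bytes]:
        return self.jobs[job_id]["result"]


if __name__ == "__main__":
    def fake_sync(url: str, params: Dict[str, str]) -> bytes:
        # Stand-in for an HTTP request to the synchronous service.
        return f"{url} ran {params['QUERY']}".encode()

    proxy = AsyncProxy("http://tap/sync", fake_sync)
    job = proxy.create_job({"QUERY": "SELECT 1"})
    proxy.run(job)
    print(proxy.phase(job), proxy.result(job))
```

Pointing the same class at a different synchronous URL is all it takes to put the proxy in front of another service, since nothing in it depends on the parameter names.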
>>> Why do I like this better? Well it makes the TAP code simpler. It
>>> makes it easy for me to provide UWS functionality to all of my web
>>> services. E.g., I'd have a UWS interface to SkyView by simply
>>> changing the synchronous URL. And if UWS changes so that, e.g.,
>>> there's now a security resource, I can plug it in without any change
>>> whatsoever to my TAP servlet. For me it will be a big win.
>>>
>>> I'm not suggesting that this implementation be required. It would be
>>> fine to keep things coupled in one TAP implementation. However if
>>> the paradigm (and here I mean it in its literal sense of exemplar) is
>>> that a UWS service runs on top of a TAP service then the way to describe
>>> the relationship between TAP and UWS changes. In particular I think
>>> it then makes a lot more sense to simply say that a UWS service can
>>> be used to provide asynchronous access to a TAP service. The
>>> standard can require that if we decide async access is mandatory (as
>>> I think we have). So the TAP document becomes simpler -- and far
>>> less tightly coupled with the UWS document.
>>>
>>>> Cheers,
>>>> Guy
>>>> On 6 Nov 2009, at 15:33, Tom McGlynn wrote:
>>>>> I'm sure everyone will be happy to see the word 'Final' in the
>>>>> title...
>>>>>
>>>>> In the past couple of days I've gotten the UWS asynchronous
>>>>> implementation of TAP working (though doubtless still bug-ridden).
>>>>>
>>>>> When I read and implemented the TAP and UWS standards I had the
>>>>> sense of UWS as being a layer within TAP. In retrospect I think
>>>>> it would have been better (for my implementation at least), if I
>>>>> had distinguished them more clearly.
>>>>>
>>>>> Suppose we think of UWS not as an interface layer but as the
>>>>> definition of how to build asynchronous proxies. UWS becomes a
>>>>> service definition, not an access protocol. The proxy accepts and
>>>>> caches input parameters from the users, starts the underlying
>>>>> request when told to, caches the response and sends it back to the
>>>>> user when requested. [I haven't followed the discussions of UWS
>>>>> earlier in the Grid list, so my apologies if I am just discovering
>>>>> what everyone already knows....]
>>>>>
>>>>> If I think of things this way, then I can implement UWS completely
>>>>> independently of the underlying application. Indeed the binding to
>>>>> the underlying application could be dynamic: I can provide a UWS
>>>>> layer over any number of distinct synchronous applications. I
>>>>> don't need to know anything about what parameters they use, just
>>>>> some root URL. The one piece of the specification that might
>>>>> cause problems is the desire to support multiple outputs as well
>>>>> as a single result. That's not at issue in TAP, but even this
>>>>> could easily be handled by returning a list of the outputs --
>>>>> which is what UWS does now anyway.
>>>>>
>>>>> UWS is not described this way in its standards document: it is
>>>>> shown as a layer within some bigger application, not as a
>>>>> separable entity. Similarly TAP shows the asynchronous interface
>>>>> tightly coupled within the rest of the TAP.
>>>>>
>>>>> In this new view, the TAP document would say very little about the
>>>>> asynchronous interface. TAP itself would be synchronous, but if
>>>>> we want asynchronous access to be mandatory then the requirement
>>>>> is that a TAP implementation must specify a corresponding UWS
>>>>> service through which the TAP implementation can be invoked. We
>>>>> could still have a TAP service that is only available
>>>>> asynchronously: we allow that this TAP service is not directly
>>>>> callable: Only the associated UWS service can access it. I'm not
>>>>> trying to take sides here in the sync/async wars.
>>>>>
>>>>> Changes to the UWS document would be rather more subtle, noting
>>>>> that the interface can be implemented without reference to the
>>>>> underlying implementation, and perhaps explicitly supporting the
>>>>> kind of dynamic association with the underlying synchronous
>>>>> service mentioned above. Maybe provide a convenience resource to
>>>>> get the output in the single output case (rather than having to
>>>>> parse the output list).
>>>>>
>>>>> The advantage, had we taken this approach before, is that it largely
>>>>> decouples TAP and UWS. The TAP standard is shorter and simpler.
>>>>> The UWS standard is largely unchanged. We can change UWS in the
>>>>> future without worrying about any impact on TAP.
>>>>>
>>>>> This is probably a bridge too far in terms of the TAP standard.
>>>>> For UWS it's really a change in tone more than content -- hints
>>>>> to the user -- so perhaps it is doable were it to be thought a
>>>>> good idea. Regardless, I do anticipate revising my own
>>>>> implementation to use this approach after the Interop.
>>>>>
>>>>> Tom McGlynn
>>
>
> Dr. Paul Harrison
> JBCA, Manchester University
> http://www.manchester.ac.uk/jodrellbank
>
>
>