TAP Implementation Issues: Final Comment: TAP and UWS, sync and async
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Fri Nov 6 10:19:56 PST 2009
Hi Guy,
You're absolutely right that what I said doesn't work!
I'll still be doing essentially what I said, but I'm taking advantage of
the fact that my Web interface is simply a wrapper around a CLI
interface, and the CLI interface isn't subject to HTTP timeouts.
[That's true for SkyView too, so I could do the same there.] So this
has been valuable for me, since it will help me build my implementation
more cleanly, but it's not going to work generally and this last
exchange was a waste of everyone else's time. Thanks for setting me
straight.
With egg on face I am respectfully yours,
Tom
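
A minimal sketch of the CLI-backed approach described above, assuming the
query can be started as an ordinary command-line process. The class name
CliJob, the result.out file name and the demo command are hypothetical and
not taken from any actual implementation; the phase names follow the UWS
job phases. The job starts the process, returns at once, and is polled
later, so no HTTP connection stays open while the query runs.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


class CliJob:
    """One asynchronous job backed by a command-line process (hypothetical)."""

    def __init__(self, argv, workdir):
        self.argv = list(argv)
        self.output_path = Path(workdir) / "result.out"
        self.process = None

    def run(self):
        # Start the command and return immediately. Output goes to a file,
        # so nothing has to hold a connection open while the query runs.
        with open(self.output_path, "wb") as out:
            self.process = subprocess.Popen(self.argv, stdout=out)

    def phase(self):
        if self.process is None:
            return "PENDING"
        code = self.process.poll()  # None while the process is still running
        if code is None:
            return "EXECUTING"
        return "COMPLETED" if code == 0 else "ERROR"

    def result(self):
        return self.output_path.read_bytes()


def wait_for(job, timeout=30.0, interval=0.05):
    """Poll the job until it leaves EXECUTING or the timeout expires."""
    deadline = time.monotonic() + timeout
    while job.phase() == "EXECUTING" and time.monotonic() < deadline:
        time.sleep(interval)
    return job.phase()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for the real query command.
        job = CliJob([sys.executable, "-c", "print('42 rows')"], workdir)
        job.run()
        print(wait_for(job), job.result())
```

Because the process handle stays with the job, the service can keep
checking on a long-running query, which is what a synchronous HTTP call
cannot offer.
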
Guy Rixon wrote:
> Tom,
>
> your suggested implementation has massive, inherent problems: it loses
> most of the benefits of an asynchronous interface!
>
> By depending on a synchronous HTTP endpoint to run the query, your
> implementation breaks any time that synchronous thing times out, and it
> breaks if the network connection with the synchronous thing drops. In
> those cases, your UWS has no way to regain control of the query, even if
> the DB part is still running it. The UWS has to resubmit the query.
>
> The whole point of UWS is /not/ to depend on a synchronous
> HTTP-connection for long-running jobs.
>
> You can build synchronous TAP as a wrapper around a UWS (Pat D. has done
> so) but it doesn't work the other way around.
>
> Guy
>
>
> On 6 Nov 2009, at 17:33, Tom McGlynn wrote:
>
>> Hi Guy,
>>
>> I'm not sure there is a big area of disagreement here. In terms of
>> the text that users write and get back in the TAP asynchronous
>> interface I'm not suggesting that a single byte needs to be changed.
>> It's all in what tasks are doing the processing. I've put in some
>> text that I hope clarifies what I was saying below in context.
>>
>> Tom
>>
>> Guy Rixon wrote:
>>> Tom,
>>> whenever you use UWS in a service definition, you have to say what
>>> parameters it takes when setting up a job and the work done by that
>>> job. That's the "application of the UWS pattern" to use the terms
>>> from the UWS standard.
>>
>> I'm not sure I understand this. While there is talk of JDL and such
>> in the UWS standard, I don't see any requirements that show it
>> actually being used in any way. So while a given UWS implementation
>> might restrict parameters being used, I don't see how that is done
>> within the UWS protocol itself.
>>
>>> The UWS specification is supposed to be reusable between
>>> applications; hence the U in the title. Therefore, it can't specify
>>> the application-specific parameters.
>>>
>>
>> Right. I'm not suggesting that. What I'm saying is that it's easy to
>> write a UWS that can handle any parameters -- as indeed you suggest
>> you have done already below.
>>
>>> It's possible to specify a UWS-conforming service for more than one
>>> application. CEA does this. The modern interface of this kind is
>>> called UWS-PA ("UWS for parameterized applications") and its
>>> forerunner (which is SOAPy) is the Common Execution Connector. In
>>> this kind of service, the applications are pluggable.
>> Sounds like the kind of thing I was looking at. I suggested in the
>> original message that this has likely come up in earlier discussions.
>>
>>> AstroGrid DSA/Catalogue has had a CEC interface for years. It uses a
>>> generic CEC implementation and passes the requests through to an
>>> ADQL-query application plugged inside it.
>>> The downside of generalizing a job-control service in this way is
>>> complication and divergence from the synchronous case. TAP/UWS is
>>> quite like synchronous TAP: you do an asynchronous query by POSTing
>>> the same parameters you could use for a synchronous query. If you
>>> try to use CEC or UWS-PA to start a TAP query then you have a
>>> different interface. Because that interface is more general, it's
>>> not as simple, either to implement in a service or to call from a
>>> client.
>> Here's where I think I'm getting a little lost. My suggestion is
>> that I have a UWS service running above TAP that is simply a proxy for
>> the TAP synchronous service. So by definition, they could not get
>> disassociated. I'm getting the sense that for you, the UWS service
>> needs to know about the parameters it's going to pass along to
>> whatever it calls when it does a run. However, as far as I can see a
>> UWS service can be entirely agnostic about parameters. It can simply
>> take whatever parameters the user specifies and pass them along, leaving
>> it to the underlying synchronous call to handle validity. In fact,
>> for TAP that's pretty much the case since the names of the parameters
>> used in TAP are not bounded.
>>
>>
>>> I think that the current boundary between TAP and UWS is just where
>>> we need it for the simplest implementations.
>>
>> I'm not so much concerned with boundaries as in the sense in which UWS
>> is instantiated. Let me give a concrete example. I have a TAP
>> service with a base URL of http://tap/, so http://tap/sync is the
>> synchronous access point and http://tap/async is the async access point.
>>
>> What happens when someone references the latter URL? In my current
>> implementation, a TAP servlet starts up, notes that I'm using an
>> asynchronous request and calls the appropriate methods and classes
>> that TAP has defined for this. If I had multiple asynchronous
>> services these would likely be in a nice little UWS library. All is
>> copacetic: UWS is a layer within TAP. It works fine but TAP and the
>> UWS layer are pretty tightly coupled.
>>
>> What I think I'm going to do when I get back from the IVOA is a bit
>> different. When I invoke http://tap/async I start a servlet whose
>> only knowledge of TAP is that there is a synchronous service at
>> http://tap/sync. It knows nothing of the internals of TAP and is
>> completely independent of it. At some point the user does a
>> http://tap/async/id/phase?phase=run and this UWS service takes the
>> parameters that the user has specified for this job and invokes the
>> http://tap/sync URL with those parameters. The results get saved
>> somehow and whenever the user sends the appropriate URL the results
>> are sent back. The only thing the UWS service ever knows about TAP
>> is the base URL. Everything else is supplied by the user.
>>
>> Why do I like this better? Well it makes the TAP code simpler. It
>> makes it easy for me to provide UWS functionality to all of my web
>> services. E.g., I'd have a UWS interface to SkyView by simply
>> changing the synchronous URL. And if UWS changes so that, e.g.,
>> there's now a security resource, I can plug it in without any change
>> whatsoever to my TAP servlet. For me it will be a big win.
>>
>> I'm not suggesting that this implementation be required. It would be
>> fine to keep things coupled in one TAP implementation. However if the
>> paradigm (and here I mean it in its literal sense of exemplar) is that a
>> UWS service runs on top of a TAP service, then the way to describe the
>> relationship between TAP and UWS changes. In particular I think it
>> then makes a lot more sense to simply say that a UWS service can be
>> used to provide asynchronous access to a TAP service. The standard
>> can require that if we decide async access is mandatory (as I think
>> we have). So the TAP document becomes simpler -- and far less tightly
>> coupled with the UWS document.
>>
>>> Cheers,
>>> Guy
>>> On 6 Nov 2009, at 15:33, Tom McGlynn wrote:
>>>> I'm sure everyone will be happy to see the word 'Final' in the title...
>>>>
>>>> In the past couple of days I've gotten the UWS asynchronous
>>>> implementation of TAP working (though doubtless still bug-ridden).
>>>>
>>>> When I read and implemented the TAP and UWS standards I had the sense
>>>> of UWS as being a layer within TAP. In retrospect I think it would
>>>> have been better (for my implementation at least), if I had
>>>> distinguished them more clearly.
>>>>
>>>> Suppose we think of UWS not as an interface layer but as the
>>>> definition of how to build asynchronous proxies. UWS becomes a
>>>> service definition, not an access protocol. The proxy accepts and
>>>> caches input parameters from the users, starts the underlying
>>>> request when told to, caches the response and sends it back to the
>>>> user when requested. [I haven't followed the discussions of UWS
>>>> earlier in the Grid list, so my apologies if I am just discovering
>>>> what everyone already knows....]
>>>>
>>>> If I think of things this way, then I can implement UWS completely
>>>> independently of the underlying application. Indeed the binding to
>>>> the underlying application could be dynamic: I can provide a UWS
>>>> layer over any number of distinct synchronous applications. I
>>>> don't need to know anything about what parameters they use, just
>>>> some root URL. The one piece of the specification that might
>>>> cause problems is the desire to support multiple outputs as well as
>>>> a single result. That's not at issue in TAP, but even this could
>>>> easily be handled by returning a list of the outputs -- which is
>>>> what UWS does now anyway.
>>>>
>>>> UWS is not described this way in its standards document: it is shown
>>>> as a layer within some bigger application, not as a separable
>>>> entity. Similarly TAP shows the asynchronous interface tightly
>>>> coupled within the rest of the TAP.
>>>>
>>>> In this new view, the TAP document would say very little about the
>>>> asynchronous interface. TAP itself would be synchronous, but if we
>>>> want asynchronous access to be mandatory then the requirement is
>>>> that a TAP implementation must specify a corresponding UWS service
>>>> through which the TAP implementation can be invoked. We could
>>>> still have a TAP service that is only available asynchronously: we
>>>> allow that this TAP service is not directly callable: Only the
>>>> associated UWS service can access it. I'm not trying to take sides
>>>> here in the sync/async wars.
>>>>
>>>> Changes to the UWS document would be rather more subtle, noting that
>>>> the interface can be implemented without reference to the underlying
>>>> implementation, and perhaps explicitly supporting the kind of
>>>> dynamic association with the underlying synchronous service
>>>> mentioned above. Maybe provide a convenience resource to get the
>>>> output in the single output case (rather than having to parse the
>>>> output list).
>>>>
>>>> The advantage, had we taken this approach before, is that it largely
>>>> decouples TAP and UWS. The TAP standard is shorter and simpler.
>>>> The UWS standard is largely unchanged. We can change UWS in the
>>>> future without worrying about any impact on TAP.
>>>>
>>>> This is probably a bridge too far in terms of the TAP standard. For
>>>> UWS it's really a change in tone more than content -- hints to the
>>>> user -- so perhaps it is doable were it to be thought a good idea.
>>>> Regardless, I do anticipate revising my own implementation to use
>>>> this approach after the Interop.
>>>>
>>>> Tom McGlynn
>
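
The parameter-agnostic proxy discussed in the thread can be sketched as
follows. This is only an illustration under stated assumptions: the
AsyncProxy class and the call_sync callable are hypothetical stand-ins
(call_sync plays the role of a request to the synchronous endpoint such as
http://tap/sync), and a real service would run the call in a worker rather
than inline. The proxy caches whatever parameters it is given, passes them
through unchanged when told to run, and caches the response. The except
branch shows the weakness Guy points out: if the synchronous call times
out or the connection drops, the proxy cannot regain control of the query
and can only report an error so that the job is resubmitted.

```python
import uuid


class AsyncProxy:
    """Parameter-agnostic job store in front of one synchronous call."""

    def __init__(self, call_sync):
        # call_sync(params) -> bytes is a hypothetical stand-in for an
        # HTTP request to the synchronous endpoint.
        self.call_sync = call_sync
        self.jobs = {}

    def create(self, params):
        # Cache the parameters without inspecting or validating them.
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {
            "params": dict(params),
            "phase": "PENDING",
            "result": None,
            "error": None,
        }
        return job_id

    def run(self, job_id):
        job = self.jobs[job_id]
        job["phase"] = "EXECUTING"
        try:
            # Validity checking is left to the underlying synchronous call.
            job["result"] = self.call_sync(job["params"])
            job["phase"] = "COMPLETED"
        except Exception as exc:  # e.g. a timeout or dropped connection
            # The proxy cannot reattach to a query that may still be
            # running in the database; the only recovery is to resubmit.
            job["error"] = str(exc)
            job["phase"] = "ERROR"

    def phase(self, job_id):
        return self.jobs[job_id]["phase"]

    def result(self, job_id):
        return self.jobs[job_id]["result"]


if __name__ == "__main__":
    proxy = AsyncProxy(lambda p: ("rows for " + p["QUERY"]).encode())
    job_id = proxy.create({"QUERY": "SELECT 1", "FORMAT": "votable"})
    proxy.run(job_id)
    print(proxy.phase(job_id), proxy.result(job_id))
```
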