SSO authentication: a new approach

Eric Saunders saunders at astro.ex.ac.uk
Mon Mar 14 10:50:08 PST 2005


Guy,

Thank you for your detailed reply. As I understand it, you have two main 
concerns:

1) The user agent may be transient, e.g. a laptop web browser, and thus 
cannot be guaranteed to be available to orchestrate the workflow. Even if 
it were available, it may not be capable of doing so if it is a simple 
browser-like entity.

2) Unless user identity is passed freely, data must pass back through the
intervening software layers, with each communication incurring a
potentially large and wasteful data transfer.


To answer your first point: I think we may have a misunderstanding about 
what we each mean by 'user agent'. You describe the user agent as the 
primary interface between the human astronomer and the grid architecture. 
You give the example of a browser-like process running on a laptop.

To me, this browser application is just the *interface* to a persistent
user agent (an intelligent agent), running permanently on my local
community server, as you have described. For what follows, we can
essentially ignore the browser, as we would like this to remain decoupled
from the rest of the system. I shall use the phrase 'local agent' to
describe the other, persistent, local software entity.

Now I think we are mostly in agreement. As you have argued, the local
agent must have full user privileges - that is fine. The local agent is
persistent - we have some guarantee that it is not going to disappear
during the lifetime of the workflow. This means it can deal with 
coordinating workflow components.

Let's assume that we use the laptop 'user agent' as a way to talk to the
local agent, as if we were logging on to an internet email service using a
username and password. Once connected to the local agent, we can make
our experiment request, and then log out. It's now up to the local agent
to carry out our wishes, to the best of its ability.
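As a minimal sketch of this browser/local-agent split (the queue and all names here are purely illustrative, not part of any real eSTAR interface): the 'browser' session just submits a request and disconnects, while the persistent local agent carries the work forward.

```python
import queue
import threading

request_queue = queue.Queue()
results = {}

def local_agent():
    """Runs permanently on the community server; survives browser logouts."""
    while True:
        request_id, experiment = request_queue.get()
        results[request_id] = f"completed: {experiment}"  # orchestrate the workflow here
        request_queue.task_done()

threading.Thread(target=local_agent, daemon=True).start()

# The 'browser' session: log in, submit the experiment request, log out.
request_queue.put(("req-1", "run simulation set A"))
request_queue.join()  # only for demonstration; in reality we would simply log out
```

The `join()` is there only so the sketch can show a completed result; the point is that nothing after the `put()` requires the submitting session to stay alive.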


Now to your second point. In the example with the portal, I suggested that 
the portal was authorised to run backend jobs. In the general case, that 
may not be true - user privileges might be required to access a 
specialised service elsewhere as part of the workflow.

There are many ways this could be handled. Obviously we could just pass
our full privileges to the portal, and let it freely talk to anything we
have access to. I have argued against this.

We could pass a unique, single-use ticket to the portal, digitally signed 
by the local agent, allowing access to the specialised service under 
whatever restrictions we like: one-time write, write-only access, an 
expiry date, and so on.
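As a rough illustration of such a ticket, here is a Python sketch. An HMAC with a shared secret stands in for the local agent's digital signature, and everything here (the key, the field layout, the restriction set) is a hypothetical example rather than a proposed format:

```python
import hashlib
import hmac
import json
import time
import uuid

AGENT_SECRET = b"local-agent-signing-key"  # hypothetical; a real agent would sign with its private key

def issue_ticket(resource, holder, permissions, ttl_seconds):
    """Issue a single-use ticket, 'signed' by the local agent."""
    body = {
        "ticket_id": uuid.uuid4().hex,   # unique: lets the service reject replays
        "resource": resource,            # e.g. one directory of my VOSpace
        "holder": holder,                # the service the ticket is locked to
        "permissions": permissions,      # e.g. ["write"]
        "expires": time.time() + ttl_seconds,
    }
    payload = json.dumps(body, sort_keys=True).encode()
    signature = hmac.new(AGENT_SECRET, payload, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}

used_tickets = set()  # the verifying service remembers spent ticket ids

def verify_ticket(ticket, presenter, action):
    """Check signature, single use, holder lock, permissions and expiry."""
    body = ticket["body"]
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(AGENT_SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["signature"]):
        return False                     # forged or tampered with
    if body["ticket_id"] in used_tickets:
        return False                     # single use: already spent
    if presenter != body["holder"] or action not in body["permissions"]:
        return False                     # locked to one holder and action set
    if time.time() > body["expires"]:
        return False                     # expired
    used_tickets.add(body["ticket_id"])
    return True
```

Presenting the same ticket twice fails the single-use check, which is what makes handing it to a partially trusted portal tolerable.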

To get around the problem of multiple data transfers, we essentially need 
to pass this privilege on to whoever needs to do the data transfer. It is 
likely that we don't actually know who that will be, if it is some backend 
service hidden behind a portal. Looking at the portal example again, let's 
say that the backend scratch store needs to return the data directly to my 
VOSpace. Now there are several possibilities:

1) The portal pretends to be me, and simply authorises the transfer. 

2) Since we trusted the portal with the authorisation to write to our
VOSpace, we give the portal the right to pass the ticket on to whomever it
deems fit. However, the ticket is valid only for the portal, not any other
service. One solution would be to make the ticket globally valid (i.e.
relax the checking constraint), but encrypted with the portal public key,
so it could not be intercepted by others. Once the portal has the ticket,
if it wants to pass it on, it can do so, by decrypting and resending.  
Since the ticket is still only single use, and we trusted the portal anyway,
this seems reasonable.
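A toy model of this delegation step, with 'encryption to the portal's key' modelled as a sealed envelope that only the named recipient can open; a real implementation would encrypt to the portal's actual public key, and all names here are invented:

```python
def seal(ticket, recipient):
    """Stand-in for encrypting the ticket to the recipient's public key."""
    return {"recipient": recipient, "ticket": ticket}

def open_envelope(envelope, me):
    """Stand-in for decryption: only the addressee can recover the ticket."""
    if envelope["recipient"] != me:
        raise PermissionError("not addressed to us")  # interception fails
    return envelope["ticket"]

# The local agent issues a globally valid, single-use ticket for the portal...
ticket = {"resource": "vospace:/mydata", "permissions": ["write"],
          "single_use": True, "valid_for": "any-service"}
envelope = seal(ticket, "portal")

# ...and the portal, if it sees fit, decrypts and re-seals it for a backend:
delegated = seal(open_envelope(envelope, "portal"), "scratch-store")
```

An eavesdropper who captures the envelope in transit cannot open it; only the portal can choose where the ticket goes next.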

3) We don't give the portal any user authorisations, but we tell it what 
personal actions we will allow. In this case, we tell the portal that we 
will allow a single data transfer to our VOSpace. The portal passes the 
message on to the backend scratch space, which responds asking for access 
to the VOSpace. The portal passes on the access request, with the server 
id of the previously hidden backend scratch service. We choose to accept 
or deny the request. If we accept, we generate the ticket, locked to the 
backend service, and send the ticket back. Now the backend service has the 
direct connection it requires, and sends the data to our VOSpace.
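The message flow of option 3 can be sketched as follows. All class and service names are invented for illustration, and a real exchange would also carry signatures and expiry checks on the ticket:

```python
class LocalAgent:
    """Holds the user's privileges; grants tickets only on explicit request."""
    def __init__(self):
        self.remaining_transfers = 1   # we allow a single transfer to our VOSpace

    def handle_access_request(self, service_id, destination):
        # The backend's identity is revealed only now, relayed by the portal.
        if self.remaining_transfers <= 0:
            return None                                      # deny the request
        self.remaining_transfers -= 1
        return {"holder": service_id, "resource": destination}  # ticket locked to the backend


class BackendStore:
    """Hidden behind the portal; needs direct access to the user's VOSpace."""
    service_id = "scratch-store-42"

    def deliver(self, ticket, destination):
        # A real service would also verify the agent's signature on the ticket.
        if ticket and ticket["holder"] == self.service_id:
            return f"data sent directly to {destination}"
        return "transfer refused"


class Portal:
    """Relays the access request; holds no user authorisations of its own."""
    def __init__(self, agent, backend):
        self.agent, self.backend = agent, backend

    def run_job(self, destination):
        ticket = self.agent.handle_access_request(self.backend.service_id,
                                                  destination)
        return self.backend.deliver(ticket, destination)
```

Note that the portal never holds a usable ticket itself: the ticket is locked to the backend's identity, and a second request is refused once the single allowed transfer is spent.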


The thing I like about this architecture is that nothing needs to be known
a priori. All the trust relationships, and the logic required to make the
decisions at each stage, are local. Our local agent doesn't even have to be
aware that a particular service exists. This is important when we have a
service landscape in constant flux, and one which has the potential to
scale upwards indefinitely. By combining small numbers of relatively
'stupid' local agents that know only about their specific local
environment, we can discover paths to arbitrary successful workflows, and 
authorise those paths in a secure manner.

Cheers

Eric

-------------------------------------------
Eric Saunders
eSTAR Project (http://www.estar.org.uk)
Astrophysics Group
University of Exeter
-------------------------------------------




On Sun, 13 Mar 2005, Guy Rixon wrote:

> Eric,
> 
> you've given us a lot to think about.  I'd like to reply first concerning your
> initial question and your detailed sequence of operations; I'll defer
> discussion of the other web-of-trust model to later email.
> 
> On Thu, 10 Mar 2005, Eric Saunders wrote:
> 
> > Guy,
> >
> > I have a couple of issues with the trust model as it's been defined here.
> >
> > Firstly, what is the benefit of allowing fully trusted agents other than
> > your personal user agent to assume your identity? If each step of a
> > workflow is authorised, then it follows that the whole workflow must be
> > valid.
> 
> If you run a workflow or similar long-running experiment, and if you leave it
> to run unattended, then something has to be left in charge, activating each
> stage of the experiment when it comes due. If this is your personal user-agent
> (running as a background process on your desktop PC, say), then you don't need
> any other agent to have access to your full set of privileges. However, if
> your local user-agent isn't running all the time (e.g. it is on your lap-top)
> then you may have another agent, on a server, quite probably associated with
> your portal/community, looking after things for you. In this case, it helps
> for the latter agent to have all your privileges at hand.
> 
> The set of tickets you generate to authorize bits of an experiment varies
> according to which services you choose.  If the agent doing the workflow has
> all your privileges, then it can choose among equivalent services at the time
> of execution to optimize things; if it doesn't, then you'll have to choose all
> the services and issue all the tickets when you set the experiment up.
> 
> 
> > Consider the following simple example. I want to run a set of simulations
> > on some remote facility, and have the final output returned to my VOSpace
> > for further processing. The delegation proceeds as follows.
> 
> You've made a few assumptions in this sequence that don't hold for all likely
> use cases.  I'll point them out as they come up.
> 
> 
> > 1. I instruct my user agent.
> >
> > 2. My user agent contacts the portal agent for the facility and makes the
> > request. My user agent provides the portal with a single use 'ticket'
> > which allows the portal access to a single directory of my VOSpace, for
> > writing only, with a short but reasonable expiry date.
> 
> You're assuming that your user agent can do this.  If it's a web browser, then
> it probably can't issue tickets by itself. It might be able to get the tickets
> from, say, your community, but it may be difficult to push these tickets back
> to the portal (would they go as HTTP cookies, perhaps?).
> 
> 
> > 3. The portal agent verifies my community and decides to trust me.
> >
> > 4. The portal agent authorises cpu time and temporary scratch disk
> > allocation from separate services at the remote facility, *using its own
> > identity*. These facilities are completely abstracted from everybody
> > beyond the portal.
> 
> This assumes that the portal itself has privileges at the remote facility. Can
> we require this in all cases?
> 
> If the privileges are actually tied to your identity, then either the portal
> can't use them; or you need a ticket authorizing the portal to do so; or you
> need to open your identity and all privileges to the portal.
> 
> 
> >
> > 5. The job runs. The portal agent uses the ticket to dump the data from
> > the internal data store to my VOSpace location. The ticket is now
> > invalidated. The portal returns a status message to my user agent.
> 
> You're assuming that the data flow to VOSpace via the portal. This is
> undesirable since it involves two transfers of every byte. Some systems may
> want to move the data direct from the producer to VOSpace. AstroGrid's CEA
> does this routinely.
> 
> If the data go direct from the remote service to VOSpace, then the ticket for
> access to VOSpace has to be passed to the remote service. It has to authorize
> that service, not the portal. Therefore, either the user agent has to be made
> aware of the specific service being used, or the portal has to be able to make
> or get a new ticket without reference to the UA.
> 
> > In this example, we only had to export our user privileges to access the
> > final data write back. When we did so, we minimised the extent of these
> > privileges. If we needed other services requiring user privileges, the
> > same temporary ticket model still works. This is just an example of the
> > Gang of Four 'Facade' pattern: allow encapsulation of a subsystem using a
> > high-level interface, simplifying subsystem usage and hiding structural
> > details.
> 
> I like encapsulation, too. That's why I think it's useful to make the portal
> fully trusted and authorized: so that we don't have to break encapsulation by
> passing authorization details up to the UA.
> 
> > In this case, our high-level interface is simply the set of
> > allowed personal actions which we make available to the other components
> > of the workflow. We could even make this explicitly OO by passing the
> > interface API of our agent to the other workflow components.  Now, when
> > the portal wants to give us our data back, it calls a 'dumpDataToVOSpace'
> > method on our user agent, which verifies the portal is allowed to do this,
> > then goes ahead and unlocks the storage location, passing back a suitable
> > URI or whatever to the portal.
> 
> The idea of a callback for authorization - it's usually
> called "pull authorization" - is good. However, the agent making the
> authorization responses needs to be on-line all the time. As discussed above,
> this may be a job better done in the portal, or in the community service.
> 
> >
> > The advantages of this approach are the same as those of encapsulation in
> > any other piece of software. The less each component knows about each
> > other component, the less can go wrong, whether maliciously, because of
> > bugs, erroneous assumptions or simply poor security. Each component is now
> > effectively sandboxed.
> 
> > The other big advantage is that we are not giving away our private
> > identity to arbitrary software entities, and simply trusting that they
> > will do the right thing, now and forever. This is inherently risky,
> > because if a single point in the trust model ever fails, we lose our
> > identity integrity permanently.
> 
> I agree with both points. However, I think there are cases where the
> encapsulation counts for more than the restriction of the identity.
> 
> Regards,
> Guy
> 
> Guy Rixon 				        gtr at ast.cam.ac.uk
> Institute of Astronomy   	                Tel: +44-1223-337542
> Madingley Road, Cambridge, UK, CB3 0HA		Fax: +44-1223-337523
> 



