[Deploying Sakai] SSL load balancing and fault tolerance for Sakai

Stephen Marquard stephen.marquard at uct.ac.za
Wed Aug 12 11:07:33 PDT 2009


Why do you need such a complex setup? For example, can't you let Sakai do the LDAP auth directly?

Regardless of that, the only current solution for in-session failover is Terracotta and, as you observe, it won't preserve tool state across all tools, though the user wouldn't need to log in again.

OTOH outright server failure is not common (for Sakai reasons, that is). Generally sites want to take one server in a cluster out of service (without taking down the whole cluster) to do rolling updates. That can be done if the load balancer supports disabling servers in the cluster (i.e. preserving existing sessions but not sending new sessions there), given enough time for the active sessions on that server to finish.
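
As a rough illustration of the "disable but keep existing sessions" behaviour described above, here is a minimal Python sketch of a toy load balancer. The class and method names (DrainingBalancer, drain, safe_to_update) and the server names are hypothetical and do not correspond to any Cisco CSS or Sakai API; it only models the routing rule, assuming sessions are identified by some sticky key.

```python
class DrainingBalancer:
    """Toy sticky load balancer with a per-server 'drain' flag (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.draining = set()   # servers that accept no NEW sessions
        self.sessions = {}      # session id -> server it is pinned to
        self._next = 0          # round-robin counter for new sessions

    def drain(self, server):
        """Stop sending new sessions to `server`; existing ones stay put."""
        self.draining.add(server)

    def enable(self, server):
        """Put `server` back into rotation after the update."""
        self.draining.discard(server)

    def route(self, session_id):
        # An existing session always goes back to its own server,
        # even if that server is draining.
        if session_id in self.sessions:
            return self.sessions[session_id]
        # A new session may only land on a server that is not draining.
        candidates = [s for s in self.servers if s not in self.draining]
        if not candidates:
            raise RuntimeError("no servers accepting new sessions")
        server = candidates[self._next % len(candidates)]
        self._next += 1
        self.sessions[session_id] = server
        return server

    def end_session(self, session_id):
        """Call when a user logs out or the session times out."""
        self.sessions.pop(session_id, None)

    def safe_to_update(self, server):
        # A draining server can be restarted once no session is pinned to it.
        return server in self.draining and server not in self.sessions.values()
```

In this model a rolling update is: drain one server, wait until safe_to_update() is true for it, update and restart it, enable it again, then repeat for the next server. The waiting step is where the hours go.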

If we had real cluster-wide sessions, it would be possible to roll out a version update within 10-20 minutes without disrupting users, rather than the hours it can take if one waits for user sessions to finish on each server in turn.

Regards
Stephen 
 
>>> "Grossman,John E" <john.grossman at mdanderson.org> 8/12/2009 7:56 PM >>> 
We are deploying Sakai 2.6 on Windows/Tomcat.
We would like to have load balancing and in-session fault tolerance/failover.
We tried to deploy to support the following scenario (everything over SSL):

-       User authenticates via a web app using LDAP
-       The authentication app connects through a Cisco CSS load balancer configured for SSL sticky sessions to the Sakai login web service to create a Sakai session on one of our Sakai servers
-       The authentication app passes the Sakai session to the user's browser with the load balancer virtual IP in the address
-       User hits the load balancer
-       User is directed to one of the Sakai servers by the load balancer
-       If the server fails while a session is active the user is directed to another server and continues the session.

Problems
-       The load balancer doesn't necessarily send the user to the same Sakai server that created the Sakai session. The SSL sticky capability on the load balancer is based on the SSL session ID. The SSL session ID associated with the Sakai session is assigned to the authentication app - not to the user's browser. The user's browser doesn't get an SSL session ID until it connects to Sakai for the first time.
-       Even if the load balancing worked, we still wouldn't have in-session fault tolerance because one Sakai host isn't aware of sessions created on another host.
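
The first problem above can be reproduced with a small Python model. This is only a sketch under stated assumptions: the balancer is reduced to a table keyed by SSL session ID, new IDs are assigned round-robin (a real device may pick differently), and the server names, SSL IDs and session ID are made up.

```python
class SslStickyBalancer:
    """Toy balancer that pins each SSL session ID to one server (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.by_ssl_id = {}   # SSL session ID -> server
        self._next = 0        # round-robin counter for unseen SSL IDs

    def route(self, ssl_session_id):
        # Stickiness is keyed on the SSL session ID and nothing else.
        if ssl_session_id not in self.by_ssl_id:
            server = self.servers[self._next % len(self.servers)]
            self.by_ssl_id[ssl_session_id] = server
            self._next += 1
        return self.by_ssl_id[ssl_session_id]


# Each Sakai server only knows the Sakai sessions it created itself.
sakai_sessions = {"sakai1": set(), "sakai2": set()}
lb = SslStickyBalancer(["sakai1", "sakai2"])

# 1. The authentication app connects with ITS OWN SSL session and
#    creates the Sakai session on whichever server it is routed to.
login_server = lb.route("ssl-id-of-auth-app")
sakai_sessions[login_server].add("sakai-session-123")

# 2. The browser then connects with a NEW SSL session ID, so the
#    balancer treats it as an unrelated client.
browser_server = lb.route("ssl-id-of-browser")

# 3. The server the browser reaches may not know the Sakai session.
session_known = "sakai-session-123" in sakai_sessions[browser_server]
print(login_server, browser_server, session_known)
```

In this toy run the browser lands on a different server than the one holding the Sakai session. The stickiness key (SSL session ID) and the thing that needs to be sticky (the Sakai session) belong to two different clients.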

I've read the Jira documents about Terracotta, but I can't see a great deal of value in enabling Terracotta at this time since many tools don't appear to support Terracotta without some customization. We're trying to avoid customization as much as possible.

Has anyone else solved this problem using only standard Sakai and load balancer configurations? Our fallback is to implement simple round-robin load balancing in the authentication app. In this scenario we would not have in-session fault tolerance - anyone connected to a server at the time of server failure would have to log in again to the surviving server. Ugh!
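
For reference, the fallback described above might look roughly like the following Python sketch. Everything specific in it is an assumption for illustration: the server hostnames, the create_sakai_session placeholder (standing in for the real call to the Sakai login web service), and the "sakai.session" query parameter used to hand the session to the browser, which should be checked against the actual Sakai configuration.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical direct (non-virtual-IP) addresses of the Sakai servers.
SAKAI_SERVERS = [
    "https://sakai1.example.edu",
    "https://sakai2.example.edu",
]
_next_server = cycle(SAKAI_SERVERS)


def create_sakai_session(server, user_id):
    """Placeholder for the call to the Sakai login web service on `server`.

    A real implementation would make the web service request here and
    return the session ID that Sakai hands back.
    """
    return f"session-for-{user_id}"


def login_redirect(user_id, create_session=create_sakai_session):
    """Pick the next server round-robin and build the URL for the browser."""
    server = next(_next_server)
    session_id = create_session(server, user_id)
    # The browser is sent to the SAME server that created the session,
    # so no load balancer stickiness is needed.
    # "sakai.session" as the parameter name is an assumption.
    query = urlencode({"sakai.session": session_id})
    return f"{server}/portal?{query}"
```

This keeps each user on the server that owns their session, but as noted it gives no in-session failover: if that server dies, the session is gone and the user has to authenticate again.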


John Grossman
The University of Texas M. D. Anderson Cancer Center






