[Building Sakai] distributed state replication in sakai 2.6

Thu Dec 17 17:14:50 PST 2009

John,

I know you've already settled on a non-replication solution, but I 
thought I should give the elevator pitch on where Terracotta fits in 
with Sakai session replication. Basically, the issue is that solutions 
like ehCache or Tomcat's built-in session replication cannot deserialize 
session attributes whose classes are defined by Sakai component 
classloaders. Terracotta solves this directly with the notion of a 
"global loader."[1] Also, because Terracotta avoids object 
serialization, modifications to the class definitions for distributed 
objects do not present the issues you'd encounter with a more 
traditional replication approach, which is potentially useful if the 
goal is session resilience in general.
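
To make the classloader issue concrete, here is a minimal sketch of how 
serialization-based replication has to resolve classes. It is not 
Terracotta's mechanism (Terracotta avoids serialization altogether), and 
the names LoaderAwareDemo and LoaderAwareObjectInputStream are 
hypothetical, not Sakai, ehCache, or Tomcat APIs. A stock 
ObjectInputStream resolves classes through whatever loader is visible to 
the calling code, so a container-level replicator cannot see classes that 
exist only in a component classloader. The sketch assumes the replicator 
could somehow be handed the right loader, which is exactly what is hard 
to arrange in Sakai.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.List;

public class LoaderAwareDemo {

    /** Hypothetical helper: resolves classes against a caller-supplied loader. */
    static class LoaderAwareObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                // Look the class up in the supplied loader first.
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default lookup (this also covers primitive types).
                return super.resolveClass(desc);
            }
        }
    }

    /** Serializes a value to bytes, as a replicator would before sending it to a peer. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    /** Deserializes bytes, resolving classes through the given loader. */
    static Object deserialize(byte[] bytes, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new LoaderAwareObjectInputStream(new ByteArrayInputStream(bytes), loader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a session attribute; a real one would be a component-defined class.
        List<String> attribute = new ArrayList<>(List.of("site-id", "user-id"));
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Object copy = deserialize(serialize(attribute), loader);
        System.out.println(copy);
    }
}
```

With a plain ObjectInputStream in place of the subclass, the same 
round trip throws ClassNotFoundException for any attribute class the 
replicator's own loader cannot see.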

Framework-level support for Terracotta-based session replication is in 
the K1 trunk [2],[3]. Work remains, though, to Terracotta-enable the 
tools/services that would benefit from such replication [4]. In the case 
of the Wiley project which sponsored this work, Terracotta is being used 
for session replication in Sakai, but not for vanilla Sakai tooling. 
That project is not yet in production. I am unaware of any institutions 
currently using Sakai+Terracotta in production.

Hope that helps.

- Dan

1 - http://forums.terracotta.org/forums/posts/list/962.page
2 - http://confluence.sakaiproject.org//x/CgBx
3 - http://confluence.sakaiproject.org//x/BwBx
4 - http://confluence.sakaiproject.org//x/DABx

John Bush wrote:
> If we need to do distributed state replication in sakai 2.6, what is  
> the suggested way?  Is there an agreed upon way?  I know a  
> messageservice exists in contrib, or hide under ehcache, or  
> terracotta.  Is it just pick your own poison?  Seems like unicon has  
> done the most work in this area that I'm aware of anyway.  Is this  
> something the product council should make a recommendation on?
> 
> The reason I'm asking is we've uncovered a rather nasty bug in webdav  
> that is caused by the mac client in a cluster.  It brings down the  
> server.  It's due to resource contention around locking.  I think it's  
> probably caused by how we are doing load balancing.  It's not clear in  
> our setup quite yet how to force webdav to be sticky; we are still  
> looking into that.  If we pursue a state replication solution to  
> replicate the locking state around the cluster, we'd like this work to  
> make it into core, and would like some guidance on the appropriate way  
> to do this for general adoption.
> 
> We are still researching this issue, expect a JIRA with more detail  
> later today.
> 
> John