[Building Sakai] Maximum number of Sakai nodes

Fri Jun 15 13:27:31 PDT 2012

Having thought through this issue a little further since the BoF, I'd add something else to consider: every node you add further fragments your caching of realm membership, site properties, etc. That effect adds to your database load in a broader way than just multiplying the pain of the event polling.

Other parameters we never discussed in the BoF but which play into this:

 * How long is your session timeout? We've set ours to one hour, but I believe some schools set it much longer.

 * Are you using HTTP or AJP connectors in Tomcat?

 * What value have you set for your Tomcat Connector's maxThreads? We use 300, although our maximum in use at any point typically hovers around 15-20 and maxes out at around 200.

 * Have you moved any static resources (such as /library) to be served up from outside of Tomcat? We found that Apache could serve up static files 10-100 times faster than Tomcat in most cases, and that the /library webapp, which consists solely of static files, accounted for a third or more of our HTTP requests (as well as eating into our available Tomcat threads). Since we're using Apache with mod_proxy_http in front of Tomcat, it was a simple matter to set /library URIs to not proxy and instead pull from the filesystem directly.
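
For anyone wanting to try the /library change described in the last item, a minimal sketch of the Apache side might look like the following. The filesystem path, the Tomcat host and port, and the catch-all proxy rule are assumptions for illustration, not our actual configuration; the static files would need to be available at whatever path the Alias points to.

```apache
# Sketch only: the path, host, and port below are hypothetical.
# The "!" exclusion must come before the general ProxyPass rule,
# otherwise /library requests are still proxied to Tomcat.
ProxyPass        /library !
Alias            /library /var/www/sakai/library

<Directory "/var/www/sakai/library">
    # Apache 2.4 syntax; on 2.2 use "Order allow,deny" and "Allow from all".
    Require all granted
</Directory>

# Everything else continues to go to Tomcat over HTTP.
ProxyPass        / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
```

With something along these lines in place, /library requests never reach a Tomcat connector thread, which is where the savings against maxThreads come from.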

As I mentioned at the BoF, Virginia Tech's usage peaks out at 6000 concurrent users on four physical app servers running single Tomcats using 18GB heaps. We have continued running at peak times just fine with only three app servers, and even with 2000 concurrent sessions per app server, we don't see a significant slowdown in HTTP response time versus our usage troughs. I'm confident we could run our peak loads on just two app servers, though we've not tested that any time recently.
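
As a rough illustration of the arithmetic behind those numbers, the sketch below divides peak concurrent users by node count. It assumes the load balancer spreads sessions evenly across nodes, which is a simplification; the function name is made up for this example.

```python
def sessions_per_node(peak_users: int, nodes: int) -> float:
    """Average concurrent sessions per app server, assuming an even spread."""
    if nodes <= 0:
        raise ValueError("nodes must be a positive integer")
    return peak_users / nodes


# Virginia Tech figures from this message: 6000 concurrent users at peak.
VT_PEAK_USERS = 6000
for nodes in (4, 3, 2):
    per_node = sessions_per_node(VT_PEAK_USERS, nodes)
    print(f"{nodes} app servers: about {per_node:.0f} sessions each")

# For comparison, the figures quoted below: 14.5K users on 35 nodes.
print(f"35 nodes: about {sessions_per_node(14500, 35):.0f} sessions each")
```

The three-server case works out to the 2000 sessions per app server mentioned above, and two servers would mean roughly 3000 each.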

Regarding our larger-than-average heaps, we haven't run into problems with GC delays, though it's worth noting that we restart our app servers once a week when our DBAs have an Oracle restart scheduled. That may be why we're lucky in that department (at the expense of about 10-15 minutes of outage during the lowest usage period of every week, 5am on Saturday morning).

I don't have an answer for how many app servers is too many, but I guess the first question that comes to my mind is, what performance problems led you to add app servers beyond 10 or 16 or 25? Out of memory or CPU on the app servers? Response time slowdown? Database connection saturation? Tomcat thread exhaustion?

In my experience, response time slowdown is almost certainly a database issue. Tomcat thread exhaustion is generally also triggered by a database slowdown, unless the maxThreads value is set extremely low. We've almost never found CPU utilization to be a bottleneck of any sort on the app servers, even on four-year-old hardware.

So, no answers from me, but hopefully those thoughts can trigger some more discussion or further ideas. And of course I'm interested in hearing more about the exact details of your outages.

David Adams

Kusnetz, Jeremy wrote:
> At that Sakai conference’s high concurrency BoF and later TCC meetings it was pointed out that there is probably a maximum number of Sakai nodes running in a cluster before you start lowering performance because of how event caching is currently pulled through the database.  Does anyone have a good feel to what that number may be?  Obviously some load testing is in order but I’m looking for a good starting point.
>
> Also it was mentioned that there may be a maximum JVM heap size before you start having issues with long garbage collection times even with proper JVM tuning.  I’ve heard numbers in the 8GB range.
>
> Basically I’m looking for what you guys feel (especially from the TCC team) what the largest Sakai cluster would look like before you start getting performance penalties from growing larger.
>
> As shared in the BoF we currently have maxed at about 14.5K concurrent users and are looking to grow to about 20K concurrent users by fall.  Currently we easily hit 12K concurrent users weekly.  We have 35 Sakai application nodes with 7GB heaps running on VMs.  At the BoF it was widely seen that this is too many nodes and we were probably hurting our performance.  I’d like to look at turning off some of those VMs, adding the memory gained by shutting off those VMs and adding them back to the existing VMs and increase their heaps.  During peak times on average we are using about 50% of the heap, although some nodes can get closer to 75% while others are less.
>
-- 
David Adams
Director, Learning Systems Integration and Support
Virginia Tech Learning Technologies