[Building Sakai] my conclusions from yesterday

Hedrick Charles hedrick at rutgers.edu
Mon Sep 30 15:28:32 PDT 2013


I believe what happened last night was simply running out of memory. In sakai-summary.jsp, there's a line labelled "mem". It shows the amount of memory, in GB, at the last GC. If it's 12, you've got a problem with that VM.

Last night all three systems that were up were at 13. They were unusable, so the load balancer took them out of use. To be honest, there are still many open questions about exactly how the LB works. Does it sometimes say "Sakai is down" when just one server is bad? Maybe. But last night all three were unusable at once.

Recommendation: watch memory usage. When it gets to 11 or 12, take that system out of service and restart it.
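A minimal sketch of that check in Python, assuming the "mem" line in sakai-summary.jsp looks something like "mem: 12.3" (the real page format may differ) and that you collect one reading per server yourself. The 11 GB cutoff comes from the recommendation above; the function and constant names are hypothetical.

```python
import re

# Cutoff from the recommendation: at 11-12 GB after GC, drain and restart.
DRAIN_THRESHOLD_GB = 11.0

def parse_mem_gb(summary_text):
    """Return the GB figure from the first line starting with 'mem', or None.

    ASSUMPTION: the page contains a line like 'mem: 12.3'. The actual
    sakai-summary.jsp layout may differ, so adjust the pattern to match it.
    """
    for line in summary_text.splitlines():
        m = re.match(r"\s*mem\b\D*([0-9]+(?:\.[0-9]+)?)", line)
        if m:
            return float(m.group(1))
    return None

def servers_to_restart(readings):
    """readings: dict mapping server name -> GB in use at the last GC.

    Returns the servers at or above the cutoff, worst first.
    """
    over = [name for name, gb in readings.items() if gb >= DRAIN_THRESHOLD_GB]
    return sorted(over, key=lambda name: readings[name], reverse=True)
```

Listing the worst servers first makes it easier to drain and restart them one at a time instead of waiting until all of them are unusable.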

If we get into last night's situation again, about all you can do is restart. I'd also try adding servers; depending on the underlying issue, that may well help. Short of that, try watching the systems closely enough that you can reboot them one at a time before the whole system fails.

We need to look for known memory issues. KNL-1037 looks particularly interesting; it might be worth installing.