[Deploying Sakai] Deployment sizing question [Rutgers data and plans]
Charles Hedrick
hedrick at rutgers.edu
Tue May 19 09:00:07 PDT 2009
Data from Rutgers, covering 2 of our 3 campuses.
Note that Continuous Education uses eCollege, which is also available
for on-campus use. The third campus uses Blackboard. We moved from
WebCT 4 to Sakai. WebCT has been decommissioned for a year.
We have 40,000 students. About 36,000 people use Sakai in a given month,
including up to 600 high school students in a district that has a
contract with us to run Sakai for them. (They are on the same servers.)
About 45% of our undergrad sections use Sakai. We do not precreate
sites, so this should be a real number. At least half of our sites are
non-instructional. It is used for all activities, including
politically sensitive activities involving the upper level
administration. We also use OSP for portfolios, though that's not in
heavy use yet. We're slowly phasing in the evaluation system for our
course evaluations.
The most users I've seen on at once is 2500, and only once. That is a
conservative number; Sakai would show a lot more sessions. It is the
number of distinct IP addresses in the output of netstat -n -a, which
should represent the number of people making requests in the last 2
minutes.
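For anyone who wants to reproduce that count, here is a minimal sketch of the distinct-IP tally. It assumes Linux-style netstat -n -a output (local and foreign addresses in columns 4 and 5, written as ip:port); Solaris prints addresses differently and would need a different split. The function name, the optional local_port filter, and the sample addresses are illustrative and not part of our actual setup.

```python
def distinct_remote_ips(netstat_output, local_port=None):
    """Return the set of remote IPs seen in TCP rows of `netstat -n -a` output.

    Assumes Linux-style columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    ips = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # Listening sockets have no real peer address.
        if state == "LISTEN":
            continue
        # Optionally keep only connections to one local port (hypothetical filter).
        if local_port is not None and local.rsplit(":", 1)[-1] != str(local_port):
            continue
        ips.add(remote.rsplit(":", 1)[0])
    return ips


# Made-up sample using documentation address ranges.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.7:51240      TIME_WAIT
tcp        0      0 192.0.2.10:80           203.0.113.9:40022       TIME_WAIT
tcp        0      0 192.0.2.10:22           203.0.113.50:60000      ESTABLISHED
"""

print(len(distinct_remote_ips(SAMPLE, local_port=80)))
```

Rows in TIME_WAIT are counted on purpose, since those lingering connections are what make the figure cover recent activity rather than only the current instant. Several people behind one NAT address count once, which is part of why the number is conservative.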
Infrastructure:
A pair of Barracuda load balancers, with auto failover.
DB:
A pair of Sun X4150s. 2 x Intel 5355, 2.66 GHz, total of 8 cores. 16
GB of memory.
We run MySQL. The second machine is a slave, maintained in sync with
the primary. The slave is not normally used. It's there in case the
primary fails. The slave also hosts the database for our test
infrastructure.
Front ends:
5 Sun X4100, 2 x Opteron 275, 2.2 GHz, total of 4 cores, 16 GB of
memory. Only 4 of these are in production. The 5th, with the same code
and pointing at the production database, is used when we need to put
up a fix or new feature for one faculty member, or we want to verify
that something works in the production configuration, but don't want
to redeploy the public systems.
We run a single 64-bit JVM on each, using about 13 GB of memory. The
JVM typically stays up for at least a month. We've just begun
restarting them after a month of uptime, though it's not clear whether
this is needed.
We've seen Sakai become unresponsive for two reasons:
* a problem with a specific application. Very rare. I think it's
happened once this semester.
* very long GCs, over 3 minutes. One or two a week. This is long
enough that it sometimes triggers the load balancer to mark the system
down, requiring users to log in again.
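One way to find pauses like these is to scan the JVM's GC log. The following is only a sketch: it assumes the JVM was started with -verbose:gc and writes lines in the usual Sun format, such as "[Full GC 267628K->83769K(776768K), 1.8479984 secs]". The function name, the 180-second default threshold, and the sample log lines are made up for illustration.

```python
import re

# Matches "[GC ..., N.NNN secs]" and "[Full GC ..., N.NNN secs]" lines.
PAUSE_RE = re.compile(r"\[(Full GC|GC)\b.*?([0-9]+\.[0-9]+) secs\]")


def long_pauses(log_text, threshold_secs=180.0):
    """Return (kind, seconds) for each collection at or above the threshold."""
    hits = []
    for line in log_text.splitlines():
        match = PAUSE_RE.search(line)
        if match is None:
            continue
        seconds = float(match.group(2))
        if seconds >= threshold_secs:
            hits.append((match.group(1), seconds))
    return hits


# Invented sample log, not real measurements.
SAMPLE_LOG = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[Full GC 11267628K->8083769K(13631488K), 185.8479984 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
"""

for kind, seconds in long_pauses(SAMPLE_LOG):
    print(kind, seconds)
```

Lowering threshold_secs to just under the load balancer's health-check timeout would show which collections are long enough to get a front end marked down.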
I'm still worried about the long GCs. Everyone says a number of GC
bugs have been fixed, so as we move to 2.6 tomorrow we're also moving to
Java 1.6.0 update 13 (still building under Java 5, but this may be
paranoia). We've done the last 2 weeks of testing for 2.6 under this
Java.
If they release update 14 fairly soon, I am hoping to go to the new
garbage collector, G1, for fall. We'll deploy it slowly, first on test
systems, then on one of the front ends. The current GCs are simply not
designed for very large JVMs. I don't think we'll ever see
trouble-free operation with them.