[Building Sakai] patching and downtime

Mon Oct 7 08:56:19 PDT 2013

Will-

Here at UVa, we use 'rolling restarts', as you say, to deploy new code and/or reboot Sakai app servers without a downtime. We do this through a combination of a BIG-IP F5 load balancer and a jsf script that kills the F5 cookie whenever a user has no active JSESSION. Users who are actively using Sakai on an F5-disabled server can keep doing so, while anyone who previously logged out and is establishing a new Sakai session gets assigned to an F5-active server instead.
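The cookie-killing decision can be sketched roughly as follows. This is not our actual jsf script. It is a minimal Python sketch that assumes the F5 uses cookie persistence with its default `BIGipServer<pool>` cookie name and that the app can tell whether a JSESSIONID belongs to a live session, modeled here as a hypothetical `active_sessions` set. The function name is also hypothetical.

```python
from http.cookies import SimpleCookie

# Assumption: F5 cookie persistence names its cookie "BIGipServer<pool>"
# by default; adjust the prefix to match your virtual server's profile.
F5_COOKIE_PREFIX = "BIGipServer"
SESSION_COOKIE = "JSESSIONID"


def expire_f5_cookie_headers(cookie_header: str, active_sessions: set[str]) -> list[str]:
    """Return Set-Cookie values that expire the F5 persistence cookie
    when the request carries no active Sakai session.

    If JSESSIONID names a live session, return an empty list so the user
    stays pinned to their current (possibly F5-disabled) server.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)

    session = cookies.get(SESSION_COOKIE)
    if session is not None and session.value in active_sessions:
        return []  # Active user: leave the sticky cookie alone.

    headers = []
    for name in cookies:
        if name.startswith(F5_COOKIE_PREFIX):
            # A cookie dated in the past tells the browser to delete it, so the
            # next request is load-balanced afresh instead of following persistence.
            headers.append(f"{name}=; Path=/; Expires=Thu, 01 Jan 1970 00:00:00 GMT")
    return headers
```

The cookie has to go because, as we understand F5's 'disabled' member state, a disabled server still accepts persistent connections. A logged-out user who keeps the old cookie would therefore keep landing on the server we are trying to drain.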

Our usual deployment schedule is weekly. On Friday afternoon we disable half our servers to let users trickle off (usually 6-8 hours), then deploy to, restart and re-enable them. On Friday evening we disable the other half, and we deploy to and restart that second half on Saturday. We picked those days and times because our user load is at its lowest then (it tends to pick back up on Sundays). This way, half our app servers are always available to users, so they don't see any downtime.
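The schedule amounts to a two-phase plan whose key property is that one half is always enabled. Here is a small Python sketch that builds such a plan and checks that property. The server names and action labels are hypothetical, and the actual drain wait between steps is left out.

```python
def rolling_plan(servers: list[str]) -> list[tuple[str, list[str]]]:
    """Two-phase rolling deploy: drain and update one half while the other
    half keeps serving users, then swap halves."""
    mid = len(servers) // 2
    first, second = servers[:mid], servers[mid:]
    return [
        ("disable", first),   # Friday afternoon: no new sessions on half A
        ("deploy_restart_enable", first),
        ("disable", second),  # Friday evening: drain half B
        ("deploy_restart_enable", second),  # Saturday
    ]


def min_enabled(servers: list[str], plan: list[tuple[str, list[str]]]) -> int:
    """Replay the plan and return the fewest servers enabled at any step."""
    enabled = set(servers)
    lowest = len(enabled)
    for action, group in plan:
        if action == "disable":
            enabled -= set(group)
        else:
            enabled |= set(group)
        lowest = min(lowest, len(enabled))
    return lowest


servers = ["app1", "app2", "app3", "app4"]
print(min_enabled(servers, rolling_plan(servers)))  # 2
```

With an odd number of servers, one phase necessarily leaves only the smaller half (n // 2 servers) in service. Capacity planning should assume that smaller half can carry the off-peak load on its own.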

We've been using this process for several years now, and it's been working great. We also recently brought in the 'global alerts' tool. Over the summer we used it to tell users on F5-disabled servers that a restart was coming and that they should log out and back in, which drastically reduced the time it took for users to leave those servers. However, our load over the last few weeks has been a bit high for that approach.
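Deciding whom to alert comes down to finding sessions pinned to disabled servers. The hypothetical Python sketch below assumes you can obtain a session-ID-to-server map from somewhere. It only illustrates the selection logic and does not show how the global alerts tool works internally.

```python
def alert_targets(session_server: dict[str, str], disabled: set[str]) -> list[str]:
    """Return the session IDs currently pinned to F5-disabled servers.

    session_server maps a (hypothetical) session ID to the app server
    hosting it; disabled is the set of servers taken out of the F5 pool.
    Sorted so repeated runs produce a stable list.
    """
    return sorted(sid for sid, server in session_server.items() if server in disabled)


def drained(session_server: dict[str, str], disabled: set[str]) -> bool:
    """A disabled half is safe to restart once no sessions remain on it."""
    return not alert_targets(session_server, disabled)
```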

For Apache, we've found that restarting the Apache server causes no interruption for our users (and is almost instantaneous), so if and when it's necessary, we just do it.

As for databases, we use MySQL 5.5. We initially used an F5 pool so we could swap database servers in and out behind the scenes, but the F5 added too much overhead to database query times and slowed Sakai down dramatically. We now operate with a single 'production' database machine and a 'hot spare' waiting in the wings. The spare holds a continuously replicated copy of the Sakai database, and we can manually switch to it if the main one goes down for any reason. Unfortunately, this means that any operation requiring a database or OS restart needs a scheduled downtime. Luckily, these are infrequent (once or twice a year, I'd say).
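Before manually switching to the hot spare, it is worth confirming that it has applied everything it received from the primary. This minimal Python sketch assumes you have already fetched one row of MySQL's `SHOW SLAVE STATUS` output as a dict keyed by column name. The connection code is omitted and the function name is hypothetical. The check compares the binlog coordinates the I/O thread has read against those the SQL thread has executed.

```python
def spare_caught_up(status: dict) -> bool:
    """Return True if the replica has executed every binlog event it fetched.

    `status` is one row of MySQL 5.5 SHOW SLAVE STATUS output, keyed by
    column name. Positions may arrive as strings or ints depending on the driver.
    """
    # The SQL thread must be running (not stopped on a replication error).
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    # Read (I/O thread) and executed (SQL thread) coordinates must refer to
    # the same file and offset in the primary's binary log.
    same_file = status.get("Master_Log_File") == status.get("Relay_Master_Log_File")
    same_pos = int(status.get("Read_Master_Log_Pos", -1)) == int(status.get("Exec_Master_Log_Pos", -2))
    return same_file and same_pos
```

The I/O thread deliberately goes unchecked here, because it will usually show as not running once the primary has died. With asynchronous replication, this check only proves the spare has applied what it received. Transactions the primary committed but never shipped before failing are not covered.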

Kevin @ UVa

On 10/1/2013 9:10 AM, Will Humphries wrote:
> Hi everyone,
>
> What approaches do you take to roll out updates/patches to Sakai or its
> dependencies (appserver OS, apache, db server/OS, etc.)? Do you take a
> downtime for these events? If not, what strategies do you employ to
> avoid downtime, and how well have they worked for you?
>
> We're taking a downtime for all of these activities at Tufts. On the
> appserver side, we've talked about rolling Tomcat restarts when making
> the kind of changes that don't need to be on every appserver at once to
> function.
>
> On the db side, we run Oracle with Data Guard, but some operations (site
> creation, maybe others) can be interrupted by the temporary loss of db
> connectivity that occurs when switching from primary to secondary db, so
> we don't use that feature to avoid stopping Tomcat.
>
> Thanks,
> Will
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev