[Building Sakai] Lesson Builder may be triggering a kernel deadlock

Hedrick Charles hedrick at rutgers.edu
Tue Oct 11 16:57:56 PDT 2011


If you are using Lesson Builder, and your users use the release control features, please look at KNL-815. It turns out that the realm update code in the kernel is not thread-safe. Updating the same site or realm from two threads at the same time can easily trigger a deadlock that causes all of Sakai to pause for 50 seconds.

My preferred solution is to install the kernel patch attached to the Jira issue. However, I have also just patched both Lesson Builder 1.3.x and trunk, and the LB patch should keep Lesson Builder from triggering this deadlock. Please see the writeup for KNL-815 for details.
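The underlying rule is simply "don't update the same realm from two threads at once." As a rough illustration of how a client could enforce that on its own side, here is a minimal sketch that serializes updates per realm ID inside a single JVM. It is NOT the actual Lesson Builder or kernel patch (those are in KNL-815); the class name `RealmUpdateSerializer` and the `update(realmId, action)` method are hypothetical, and the `Runnable` stands in for whatever code actually saves the realm.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper (not part of the Sakai API): ensures that, within one
 * JVM, at most one thread at a time runs an update for a given realm ID.
 */
public final class RealmUpdateSerializer {
    // One lock per realm ID. Entries are never removed in this sketch,
    // so memory grows with the number of distinct realms touched.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the update while holding the lock for realmId. */
    public void update(String realmId, Runnable update) {
        ReentrantLock lock = locks.computeIfAbsent(realmId, id -> new ReentrantLock());
        lock.lock();
        try {
            update.run(); // hypothetical placeholder for the real realm save
        } finally {
            lock.unlock(); // always release, even if the update throws
        }
    }
}
```

Note that a lock like this only protects threads in the same application server; in a clustered deployment, two nodes could still update the same realm concurrently, which is one reason a fix in the kernel itself is preferable.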

I recommend either updating Lesson Builder or installing the kernel patch (or both).

Rutgers is currently running in production with the first version of the kernel patch. It appears to have fixed the deadlocks (I'm about 90% confident: it has only been a day, but before that we had several hangs per day). Unfortunately, I can't try the LB patch alone in production without removing the kernel patch, and the problems were serious enough for us that I'm not willing to do that. (It's not that I mistrust my workaround; rather, the kernel problem can be triggered by things other than Lesson Builder, and the kernel patch should catch all of them.)

Note that there's nothing wrong with the original LB code. It uses core APIs exactly as documented. The bug is in the kernel. However, if we can change LB so that it doesn't trigger the bug, that seems worth doing. I can come up with plausible scenarios not involving LB that would likely cause the problem as well (e.g. if you use a course management provider, and many users who have been added to courses log in at the same time).

LB 1.2 could also cause this problem. The same patch would work with it, but I'm no longer making changes to 1.2.



More information about the sakai-dev mailing list