[Building Sakai] Lesson Builder may be triggering a kernel deadlock

Steve Swinsburg steve.swinsburg at gmail.com
Tue Oct 11 17:42:26 PDT 2011


If the workaround solves the immediate issue then go for it. It doesn't look like anyone else has had this yet so it can buy some time to fix the kernel so that other tools don't need the same workaround.

cheers,
Steve


On 12/10/2011, at 11:37 AM, Hedrick Charles wrote:

> Yes, I would think so. And in fact we've recently started using the signup tool as well. My conjecture however is that Lesson Builder is likely to produce a higher rate of changes. I certainly want to fix the kernel. That's what we've done at Rutgers. Do you think trying a workaround in Lesson Builder also isn't worth it? It's easy enough to revert the change if the community doesn't think it's a good idea.
> 
> On Oct 11, 2011, at 8:31:40 PM, Steve Swinsburg wrote:
> 
>> I commented on the Jira for this, but if this is the issue, then it wouldn't just be LessonBuilder, it would be any code that allows multiple users to update a group at once, wouldn't it? So a number of eager administrators using the Realms tool or instructors in the same site modifying a group list?
>> 
>> regards,
>> Steve
>> 
>> 
>> 
>> On 12/10/2011, at 10:57 AM, Hedrick Charles wrote:
>> 
>>> If you are using Lesson Builder, and your users use the release control features, please look at KNL-815. It turns out that the realm update code in the kernel is not thread-safe. Updating the same site or realm at the same time in two threads can easily trigger a deadlock that will cause all of Sakai to pause for 50 sec.
>>> 
>>> My preferred solution would be to install the kernel patch in the Jira. However I've just patched both Lesson Builder 1.3.x and trunk. The LB patch should cause it not to trigger this deadlock. Please see the writeup for KNL-815 for details.
>>> 
>>> I recommend either updating Lesson Builder or installing the kernel patch (or both).
>>> 
>>> Currently Rutgers is running in production with the first version of the kernel patch. It has fixed the deadlocks (with 90% confidence – it's been a day; we had several hangs per day before that). Unfortunately I won't be able to try the LB patch alone in production without removing the kernel patch. The problems were sufficiently serious for us that I don't think I'm willing to do that. (It's not that I mistrust my workaround – rather, the kernel problem can be triggered by things other than Lesson Builder, and the kernel patch should get them all.)
>>> 
>>> Note that there's nothing wrong with the original LB code. It uses core APIs exactly as documented. The bug is in the kernel. However, if we can fix LB to not trigger the bug, it seems worth doing. I can come up with plausible scenarios not involving LB that would likely cause the problem as well (e.g. if you use a course management provider, and lots of users who have added courses log in at the same time).
>>> 
>>> LB 1.2 could also cause this problem. The same patch would work with it, but I'm no longer changing 1.2.
>>> 
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev at collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>> 
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"
>> 
> 
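
The thread does not say what either the kernel patch or the Lesson Builder patch actually changes; KNL-815 has those details. As a rough illustration only of the general idea of keeping two threads from updating the same realm at the same time, here is a minimal sketch. The class RealmUpdateSerializer and its update method are hypothetical names invented for this example, not Sakai kernel or Lesson Builder APIs. It assumes a single JVM and Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper, not part of Sakai: serializes updates per realm id
 * so that two threads never run an update for the same realm concurrently.
 */
final class RealmUpdateSerializer {
    // One lock per realm id, created lazily on first use and never removed.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the given update while holding the lock for realmId. */
    void update(String realmId, Runnable update) {
        ReentrantLock lock = locks.computeIfAbsent(realmId, id -> new ReentrantLock());
        lock.lock();
        try {
            // In a real tool this is where the realm or group change would be saved.
            update.run();
        } finally {
            // Always release, even if the update throws.
            lock.unlock();
        }
    }
}
```

Updates to different realms still run in parallel, because each realm id has its own lock. A lock like this only coordinates threads inside one JVM, so on its own it would not serialize updates arriving at different servers in a cluster.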


