[Building Sakai] Severe DB Performance Issues for 2.9

Noah Botimer botimer at umich.edu
Wed Sep 4 09:09:29 PDT 2013


If threads are waiting, you may be able to capture a stack/thread dump that shows the offender (I think jstack is the easiest, but sending the JVM a kill -3 signal also works).
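From the command line, `jstack -l <pid>` against the Tomcat JVM is the usual route. `kill -3 <pid>` writes the same kind of dump to the JVM's stdout, which is typically catalina.out. If you would rather collect dumps from inside the JVM, the standard java.lang.management API can produce a filtered version. The sketch below is illustrative only: the class name ThreadDumpSketch is hypothetical, and it lists only BLOCKED/WAITING threads with their top frames, so a hot query path would show up as many threads parked in the same JDBC/storage frames.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sakai: summarizes threads that are blocked or waiting.
public class ThreadDumpSketch {

    // Returns one entry per BLOCKED / WAITING / TIMED_WAITING thread, with up to maxFrames frames.
    public static List<String> blockedOrWaiting(int maxFrames) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // Ask for lock details only if this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info == null) continue; // thread died between listing and dumping
            Thread.State s = info.getThreadState();
            if (s != Thread.State.BLOCKED && s != Thread.State.WAITING
                    && s != Thread.State.TIMED_WAITING) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ").append(s);
            if (info.getLockName() != null) sb.append(" on ").append(info.getLockName());
            StackTraceElement[] frames = info.getStackTrace();
            for (int i = 0; i < Math.min(maxFrames, frames.length); i++) {
                sb.append("\n    at ").append(frames[i]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : blockedOrWaiting(10)) {
            System.out.println(t + "\n");
        }
    }
}
```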

If that doesn't point to a primary culprit, you could attach a polling (sampling) profiler; a tracing profiler would probably add too much overhead and take too long to catch anything useful.
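A polling profiler just takes repeated stack snapshots and counts which methods keep appearing, so its overhead is bounded by the sampling interval rather than by how often each method runs. The sketch below shows that idea using Thread.getAllStackTraces(). It is hypothetical: the class name, the sample counts, and the "org.sakaiproject." filter prefix are assumptions. It also only sees threads in the JVM it runs in, so in practice an existing profiler attached to the Tomcat JVM is the realistic option.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sampling sketch: counts how often methods matching a prefix appear on thread stacks.
public class SamplingSketch {

    public static Map<String, Integer> sample(int samples, long intervalMs, String framePrefix)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                // Count each method at most once per stack per sample (an "inclusive" count).
                Set<String> seen = new HashSet<>();
                for (StackTraceElement f : stack) {
                    String key = f.getClassName() + "." + f.getMethodName();
                    if (key.startsWith(framePrefix) && seen.add(key)) {
                        counts.merge(key, 1, Integer::sum);
                    }
                }
            }
            Thread.sleep(intervalMs);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed filter prefix; print the 20 most frequently sampled methods.
        sample(50, 100, "org.sakaiproject.").entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(20)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Methods that sit near the top of this list across many samples (for example, a storage-layer method that issues the per-page SELECTs) are the ones to inspect first.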

I'm curious about caching -- can you check the cache statistics?

Some of this should have been covered by KNL-1011 (site visits, drop-down tool menu), but there appears to be some place where each tool/page is being loaded directly. With that kind of execution count, it almost has to be in an n^2 (or worse) loop, so it should show up pretty easily with either of the above diagnostics.
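To make the n^2 reasoning concrete, here is a toy model, not Sakai code. CountingPageToolStore is a hypothetical stand-in for a per-page lookup like the SAKAI_SITE_TOOL query above, and it only counts calls. If every page render rebuilds a menu that loads every page's tools directly, 20 pages cost 20 × 20 = 400 queries. The same loops routed through a cache keyed by page ID cost 20. How Sakai's real site/page caching is wired is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of query fan-out; all names are hypothetical.
public class QueryFanoutSketch {

    // Stand-in for a per-page tool lookup; each call represents one DB round trip.
    static class CountingPageToolStore {
        int queries = 0;

        List<String> toolsForPage(String pageId) {
            queries++; // like SELECT ... FROM SAKAI_SITE_TOOL WHERE PAGE_ID = :1
            List<String> tools = new ArrayList<>();
            tools.add(pageId + "-tool");
            return tools;
        }
    }

    // Naive pattern: one menu build per rendered page, each loading every page's tools directly.
    static int naive(List<String> pageIds) {
        CountingPageToolStore store = new CountingPageToolStore();
        for (int render = 0; render < pageIds.size(); render++) {
            for (String pageId : pageIds) {
                store.toolsForPage(pageId);
            }
        }
        return store.queries;
    }

    // Same loops, but lookups go through a cache keyed by page ID.
    static int cached(List<String> pageIds) {
        CountingPageToolStore store = new CountingPageToolStore();
        Map<String, List<String>> cache = new HashMap<>();
        for (int render = 0; render < pageIds.size(); render++) {
            for (String pageId : pageIds) {
                cache.computeIfAbsent(pageId, store::toolsForPage);
            }
        }
        return store.queries;
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < 20; i++) pages.add("page" + i);
        System.out.println("naive:  " + naive(pages));  // naive:  400
        System.out.println("cached: " + cached(pages)); // cached: 20
    }
}
```

This is also why the cache statistics matter: a page cache with a very low hit rate behaves like the naive loop.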

These queries should also be relatively easy to trace back through the code if you can't pinpoint the caller. They will come through the site/base storage classes and are specific enough that they are probably triggered from only a couple of places.
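A plain `grep -rn SAKAI_SITE_PAGE_PROPERTY` over a kernel checkout is the quickest way to find where these statements are defined; from there you can follow the callers. The Java sketch below does the same search and is included only as a portable equivalent. The class name and the file-extension filter (.java, .xml, .sql) are assumptions. Note that a search for SAKAI_SITE_TOOL will also match longer table names that share that prefix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists "path:line: text" for source lines containing a table name.
public class FindQuerySites {

    public static List<String> find(Path root, String needle) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String n = p.toString();
                        return n.endsWith(".java") || n.endsWith(".xml") || n.endsWith(".sql");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path p : files) {
            List<String> lines;
            try {
                // ISO-8859-1 never fails to decode, and table names are plain ASCII.
                lines = Files.readAllLines(p, StandardCharsets.ISO_8859_1);
            } catch (IOException e) {
                continue; // skip unreadable files
            }
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains(needle)) {
                    hits.add(p + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String table : new String[] {"SAKAI_SITE_TOOL", "SAKAI_SITE_PAGE_PROPERTY"}) {
            System.out.println("== " + table);
            find(root, table).forEach(System.out::println);
        }
    }
}
```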

Thanks,
-Noah

On Sep 4, 2013, at 11:36 AM, Sobieralski, Damian Michael wrote:

> IU is in the 2nd week of classes, and yesterday we saw heavy CPU load on the database, which then backed up the app servers. Our users experienced system latency, which resulted in a lot of errors and very frustrated users. As traffic slowed, the system recovered.
> 
> We're running a base 2.9.0 with cherry picked enhancements and performance fixes.  We did apply KNL-1011.  
> 
> What we are noticing is that under average to high load the DB is running at 100%. We are seeing a LOT of calls such as:
> 
> SELECT TOOL_ID,  
>   REGISTRATION,  
>   TITLE,  
>   LAYOUT_HINTS,  
>   PAGE_ORDER  
> FROM SAKAI_SITE_TOOL  
> WHERE PAGE_ID = :1  
> ORDER BY PAGE_ORDER ASC
> 
> And
> 
> SELECT NAME,  
>   VALUE  
> FROM SAKAI_SITE_PAGE_PROPERTY  
> WHERE ( PAGE_ID = :1 )
> 
> Before the 2.9 upgrade, we had only one query that ran over a million times. Both of these have had over 140 million executions.
> 
> Any ideas what might be causing this behavior? And, more importantly, what can be done to reduce it? This is a SEVERE problem for us: yesterday our system basically ground to a halt with the DB locked at 100%.
> 
> We did find this: https://jira.sakaiproject.org/browse/SAK-23841 
> 
> But as far as I know we're not using XFrame. Could this still be the issue?
> 
> Thanks.
> 
> ---
> Damian Sobieralski
> Indiana University
> 
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> 
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"


