[Building Sakai] 2.7.2 production deadlock

Zhen Qian zqian at umich.edu
Tue Oct 23 11:44:01 PDT 2012


A couple of days ago, one of our app server showed high CPU usage, and
dbpool maxed out (148 vs 250 max), which resulted in auto-restart of the
app server afterwards.

There are errors of maxThreads hit and then the pool connection error in
the app log file:

"ERROR org.apache.tomcat.util.threads.ThreadPool - All threads (768) are
currently busy, waiting. Increase maxThreads (768) or check the servlet
status
WARN  org.sakaiproject.db.impl.BasicSqlService - Sql.dbRead: sql: select
SRR.ROLE_NAME from SAKAI_REALM_RL_GR SRRG inner join SAKAI_REALM SR on
SRRG.REALM_KEY = SR.REALM_KEY inner join SAKAI_REALM_ROLE SRR on
SRRG.ROLE_KEY = SRR.ROLE_KEY where SR.REALM_ID = ? and SRRG.USER_ID = ? and
SRRG.ACTIVE = '1' /site/2588f17d-5d1f-40e8-b357-dc9a968c80d8
bc5f3970-742f-4726-a2c7-c6c9d3e29558
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool
error Timeout waiting for idle object
...."

There are hundreds of stuck threads like the following in the thread dump
file, all starts from a forum delete action.

 "TP-Processor570" daemon prio=10 tid=0x0000000041f6f000 nid=0x2623 in
Object.wait() [0x00007f02fabe7000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x000000059701f4d0> (a
org.apache.commons.pool.impl.GenericObjectPool$Latch)
 at
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1123)
 - locked <0x000000059701f4d0> (a
org.apache.commons.pool.impl.GenericObjectPool$Latch)
 at
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:96)
 at
org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:880)
 at
org.sakaiproject.db.impl.BasicSqlService.borrowConnection(BasicSqlService.java:254)
...
 at
org.sakaiproject.api.app.messageforums.cover.SynopticMsgcntrManagerCover.getUserToNewMessagesForForumMap(SynopticMsgcntrManagerCover.java:67)
 at
org.sakaiproject.tool.messageforums.DiscussionForumTool.processActionDeleteForum(DiscussionForumTool.java:1097)
 "

I think we found the student issued the forum delete and here was her
comment on the issue "Trying to delete a forum, and then it just froze.  It
had been freezing earlier when I was trying to change the settings.".

The above stacktrace pattern looks similar to the one reported in
POOL-147<https://issues.apache.org/jira/browse/POOL-147>and
POOL-149 <https://issues.apache.org/jira/browse/POOL-149> from Apache
Commons Pool project. However, both JIRAs are fixed in 1.5.3 version, the
version used by sakai 2.7.2 release. I don't understand then why we still
seeing this problem in 2.7.2.

The commons-dbcp version in Sakai 2.9rc2 is the same r1.2.2, while
commons-pool version is a little higher, 1.5.6. I suspect we will still hit
by the same problem going down the road with 2.9 deployment.

Is it possible that hibernate usage could trigger connection pool maxout,
similar to this reported problem <http://jira.grails.org/browse/GRAILS-8311>
?

All suggestions are welcome!

Thanks,

- Zhen

-----------------------------------
Zhen Qian
Application Developer Sr.
University of Michigan
734-327-9066
------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20121023/f8a4fea2/attachment.html 


More information about the sakai-dev mailing list