[Deploying Sakai] System 'pause'

Tom Hall thall at brocku.ca
Wed Sep 22 09:19:44 PDT 2010


  Good day everyone

Last night at about 7:40 pm our entire cluster 'paused' for about a 
minute.  There are no entries in the SAKAI_EVENT table or in the 
catalina.out logs during this time.  We have 5 application servers, of 
which 3 did not report any type of error.  The last entries in the 
catalina.out logs of the other 2 are of the form (copied from one of them):

2010-09-21 19:39:53,950  WARN 
org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3 
org.sakaiproject.db.impl.BasicSqlService - Sql.dbWrite(): error code: 
1205 sql: insert into SAKAI_EVENT (EVENT_DATE, EVENT, REF, SESSION_ID, 
EVENT_CODE, CONTEXT) values     (?, ?, ?, ?, ?, ?) binds:  
20100921233858108 pres.begin /presence/ECON2P91D02FW2010MAIN-presence 
54bc1fe8-e44c-4526-9340-8dc66298d912 m null
java.sql.SQLException: Lock wait timeout exceeded; try restarting 
transaction
         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
         at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3491)
         at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3423)
         at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
         at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
         at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
         at 
com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
         at 
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2019)
         at 
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1937)
         at 
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1922)
         at 
org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102)
         at 
org.sakaiproject.db.impl.BasicSqlService.dbWrite(BasicSqlService.java:1239)
         at 
org.sakaiproject.db.impl.BasicSqlService.dbWrite(BasicSqlService.java:1094)
         at 
org.sakaiproject.event.impl.ClusterEventTracking.writeBatchEvents(ClusterEventTracking.java:334)
         at 
org.sakaiproject.event.impl.ClusterEventTracking.run(ClusterEventTracking.java:489)
         at java.lang.Thread.run(Thread.java:619)
2010-09-21 19:39:53,950  WARN 
org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3 
org.sakaiproject.event.impl.ClusterEventTracking - 
org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3 at 448d5a91.writeBatchEvents: 
java.lang.RuntimeException: SqlService.dbWrite failure

At the time this happened there were between 1400 and 1500 active 
sessions.  The load averages were normal on all applications servers, 
the database server and the file server.  Network traffic seemed normal 
during the period leading up to the pause and seemed normal afterwards 
(based on 5 minute snapshots from the switches and sar stats).

Upon recovery, the systems seem to have simply continued operation 
normally.  There have been no reports of anything from users, just our 
nagios monitoring system.

Anybody seen anything similar to this?

Tom


More information about the production mailing list