[Deploying Sakai] System 'pause'
Tom Hall
thall at brocku.ca
Wed Sep 22 09:19:44 PDT 2010
Good day everyone
Last night at about 7:40 pm our entire cluster 'paused' for about a
minute. There are no entries in the SAKAI_EVENT table or in the
catalina.out logs during this time. We have 5 application servers, of
which 3 did not report any type of error. The last entries in the
catalina.out logs of the other 2 are of the form (copied from one of them):
2010-09-21 19:39:53,950 WARN
org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3
org.sakaiproject.db.impl.BasicSqlService - Sql.dbWrite(): error code:
1205 sql: insert into SAKAI_EVENT (EVENT_DATE, EVENT, REF, SESSION_ID,
EVENT_CODE, CONTEXT) values (?, ?, ?, ?, ?, ?) binds:
20100921233858108 pres.begin /presence/ECON2P91D02FW2010MAIN-presence
54bc1fe8-e44c-4526-9340-8dc66298d912 m null
java.sql.SQLException: Lock wait timeout exceeded; try restarting
transaction
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3491)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3423)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
at
com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
at
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2019)
at
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1937)
at
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1922)
at
org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102)
at
org.sakaiproject.db.impl.BasicSqlService.dbWrite(BasicSqlService.java:1239)
at
org.sakaiproject.db.impl.BasicSqlService.dbWrite(BasicSqlService.java:1094)
at
org.sakaiproject.event.impl.ClusterEventTracking.writeBatchEvents(ClusterEventTracking.java:334)
at
org.sakaiproject.event.impl.ClusterEventTracking.run(ClusterEventTracking.java:489)
at java.lang.Thread.run(Thread.java:619)
2010-09-21 19:39:53,950 WARN
org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3
org.sakaiproject.event.impl.ClusterEventTracking -
org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3 at 448d5a91.writeBatchEvents:
java.lang.RuntimeException: SqlService.dbWrite failure
At the time this happened there were between 1400 and 1500 active
sessions. The load averages were normal on all applications servers,
the database server and the file server. Network traffic seemed normal
during the period leading up to the pause and seemed normal afterwards
(based on 5 minute snapshots from the switches and sar stats).
Upon recovery, the systems seem to have simply continued operation
normally. There have been no reports of anything from users, just our
nagios monitoring system.
Anybody seen anything similar to this?
Tom
More information about the production
mailing list