[Deploying Sakai] [Building Sakai] System 'pause'

Wed Sep 22 10:58:16 PDT 2010

  HI Stephen

Thanks for getting back to me.

None of our maintenance scripts use the SAKAI_EVENT table at all (except 
indirectly through web service calls).  The only one that puts any heavy 
load on the system (enrollments) runs at 5:45 AM so it wasn't involved 
in this instance.  The only maintenance script that could run at 
approximately this time of the pause had nothing to do at that time (I 
checked the source data or this job not just its log). it would have run 
at 40 minutes past the hour and the problem started at 3 minutes.

Tom

On 9/22/2010 12:36 PM, Stephen Marquard wrote:
> It looks like something locked the SAKAI_EVENT table for a long
> time(lock wait timeout is 50s by default). Do you have an external
> maintenance script that does a very slow operation on it which would
> cause it to be locked? It shouldn't ever get locked like that by Sakai
> code itself.
>
> Regards
> Stephen
>
>>>> Tom Hall<thall at brocku.ca>  9/22/2010 6:19 PM>>>
> Good day everyone
>
> Last night at about 7:40 pm our entire cluster 'paused' for about a
> minute.  There are no entries in the SAKAI_EVENT table or in the
> catalina.out logs during this time.  We have 5 application servers, of
>
> which 3 did not report any type of error.  The last entries in the
> catalina.out logs of the other 2 are of the form (copied from one of
> them):
>
> 2010-09-21 19:39:53,950  WARN
> org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3
>
> org.sakaiproject.db.impl.BasicSqlService - Sql.dbWrite(): error code:
> 1205 sql: insert into SAKAI_EVENT (EVENT_DATE, EVENT, REF, SESSION_ID,
>
> EVENT_CODE, CONTEXT) values     (?, ?, ?, ?, ?, ?) binds:
> 20100921233858108 pres.begin /presence/ECON2P91D02FW2010MAIN-presence
> 54bc1fe8-e44c-4526-9340-8dc66298d912 m null
> java.sql.SQLException: Lock wait timeout exceeded; try restarting
> transaction
>           at
> com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
>           at
> com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
>           at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3491)
>           at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3423)
>           at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
>           at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
>           at
> com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
>           at
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
>           at
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2019)
>           at
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1937)
>           at
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1922)
>           at
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:102)
>           at
> org.sakaiproject.db.impl.BasicSqlService.dbWrite(BasicSqlService.java:1239)
>           at
> org.sakaiproject.db.impl.BasicSqlService.dbWrite(BasicSqlService.java:1094)
>           at
> org.sakaiproject.event.impl.ClusterEventTracking.writeBatchEvents(ClusterEventTracking.java:334)
>           at
> org.sakaiproject.event.impl.ClusterEventTracking.run(ClusterEventTracking.java:489)
>           at java.lang.Thread.run(Thread.java:619)
> 2010-09-21 19:39:53,950  WARN
> org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3
>
> org.sakaiproject.event.impl.ClusterEventTracking -
> org.sakaiproject.event.impl.ClusterEventTracking$$EnhancerByCGLIB$$ef6ff8d3 at 448d5a91.writeBatchEvents:
>
> java.lang.RuntimeException: SqlService.dbWrite failure
>
> At the time this happened there were between 1400 and 1500 active
> sessions.  The load averages were normal on all applications servers,
> the database server and the file server.  Network traffic seemed normal
>
> during the period leading up to the pause and seemed normal afterwards
>
> (based on 5 minute snapshots from the switches and sar stats).
>
> Upon recovery, the systems seem to have simply continued operation
> normally.  There have been no reports of anything from users, just our
>
> nagios monitoring system.
>
> Anybody seen anything similar to this?
>
> Tom
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to
> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
> "unsubscribe"
>
>
>
>
>
> ###
> UNIVERSITY OF CAPE TOWN
>
> This e-mail is subject to the UCT ICT policies and e-mail disclaimer
> published on our website at
> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from
> +27 21 650 4500. This e-mail is intended only for the person(s) to whom
> it is addressed. If the e-mail has reached you in error, please notify
> the author. If you are not the intended recipient of the e-mail you may
> not use, disclose, copy, redirect or print the content. If this e-mail
> is not related to the business of UCT it is sent by the sender in the
> sender's individual capacity.
>
> ###
>