[Deploying Sakai] FW: 2.6 Production Issues

Charles Hedrick hedrick at rutgers.edu
Wed Sep 2 18:42:42 PDT 2009


hibernate.dialect=org.hibernate.dialect.MySQLDialect
vendor@org.sakaiproject.db.api.SqlService=mysql
driverClassName@javax.sql.BaseDataSource=com.mysql.jdbc.Driver
url@javax.sql.BaseDataSource=jdbc:mysql://....:3306/sakai?useUnicode=true&characterEncoding=UTF-8&useServerPrepStmts=false&cachePrepStmts=true&prepStmtCacheSize=4096&prepStmtCacheSqlLimit=4096&elideSetAutoCommits=true&useLocalSessionState=true
username@javax.sql.BaseDataSource=***OVERRIDE IN LOCAL.PROPERTIES***
password@javax.sql.BaseDataSource=***OVERRIDE IN LOCAL.PROPERTIES***
testOnBorrow@javax.sql.BaseDataSource=false
validationQuery@javax.sql.BaseDataSource=
#validationQuery@javax.sql.BaseDataSource=select 1 from DUAL
defaultTransactionIsolationString@javax.sql.BaseDataSource=TRANSACTION_READ_COMMITTED
initialSize@javax.sql.BaseDataSource=300
maxActive@javax.sql.BaseDataSource=300
maxIdle@javax.sql.BaseDataSource=300
minIdle@javax.sql.BaseDataSource=0

At one point testOnBorrow, validationQuery, and probably minIdle were
critical to work around bugs in dbcp. I don't know whether 2.6 is
using a new enough version that this is no longer needed. MySQL can
handle large numbers of connections. I wonder whether the caching helps
with the query you mention.
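
If it turns out you do want validation back on (for instance, so a
front end can survive a database restart), a minimal sketch of the
relevant settings, assuming the dbcp properties Sakai already exposes
here, would be something like:

testOnBorrow@javax.sql.BaseDataSource=true
validationQuery@javax.sql.BaseDataSource=select 1 from DUAL
# the minIdle value below is just an illustrative number, not a recommendation
minIdle@javax.sql.BaseDataSource=10

That trades a small check on every connection checkout for the ability
to drop dead connections instead of handing them to the application.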

This is 2.6.0, Java 6 and MySQL 4. Front ends are 2x dual-core Opterons
(a few years old). The database is 2x quad-core Intel, a year or so old.
All are 16 GB machines. One JVM per front end, with minimal JVM tuning. (I
believe in simple configurations. Java 5 requires a bit more tuning.)
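
For what "minimal JVM tuning" might look like in practice, a
hypothetical set of Tomcat JAVA_OPTS (the sizes are purely
illustrative) would be along these lines:

JAVA_OPTS="-server -Xms2g -Xmx2g -XX:MaxPermSize=256m"

On Java 6 that is mostly just fixing the heap and permanent-generation
sizes and letting the defaults handle the rest.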

I just looked at our connection, and out of 1361 selects, 80 looked for
SITE_PAGE_PROPERTY.

As in past semesters, our MySQL server is using about the same CPU as
each of the 4 front ends. Common queries seem to include:

realm permission checking

select NAME, VALUE from SAKAI_SITE_TOOL_PROPERTY where ( TOOL_ID = x'65623135323431322D353937342D343531382D393630642D333963643666623632366262' )

select TOOL_ID, REGISTRATION, TITLE, LAYOUT_HINTS, PAGE_ORDER from SAKAI_SITE_TOOL where PAGE_ID = x'37633433303061642D326433392D346361342D623962392D653630326537363839386261' order by PAGE_ORDER ASC

the same kind of stuff I've seen in the past.
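
If you want a quick look at your own query mix, a rough sketch
(assuming you can run administrative statements on the MySQL server)
is:

SHOW FULL PROCESSLIST;
SHOW STATUS LIKE 'Com_select';
SHOW STATUS LIKE 'Questions';

The process list shows what is executing right now; the Com_* and
Questions counters are cumulative, so sampling them a few minutes
apart gives an approximate query rate. For a per-statement breakdown
you would need the general query log (or the slow query log with a
low threshold).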

Your symptoms sound like the dbcp problem we had until we adopted the
parameters given above to work around the bugs. Note that those
parameters make it less likely that a front end can survive a database
restart or network problem. But our front ends and database are on the
same switch, so network problems are unlikely, and our database stays
up for months at a time (sometimes a year at a time, although we're
going to be doing work that will keep it down to months at a time).
Our JVMs normally come down only when we need to do a redeploy.

There does seem to be a database connection leak, which looks new to
2.6 (and/or new to something we've added since last semester). We
currently have 25-50 open connections from each front end. They are
almost certainly not in current use. If we can't find the problem,
that will force periodic JVM restarts.
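
One rough way to keep an eye on this from the database side, assuming
you have the PROCESS privilege on the MySQL server, is:

SHOW PROCESSLIST;
SHOW STATUS LIKE 'Threads_connected';

The process list gives one row per connection with the client host,
the Command column (Sleep when idle), and how long it has been idle;
Threads_connected gives the total. That tells you how many connections
each front end is holding, but telling pool-idle connections apart
from ones a request checked out and never returned still has to be
done on the JVM side (dbcp's BasicDataSource reports numActive and
numIdle).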


On Aug 31, 2009, at 10:08:05 PM, May, Megan Marie wrote:

> Cross posting for Michelle since she's not on this list (yet!)
>
> -----Original Message-----
> From: sakai-dev-bounces at collab.sakaiproject.org [mailto:sakai-dev- 
> bounces at collab.sakaiproject.org] On Behalf Of Wagner, Michelle R.
> Sent: Monday, August 31, 2009 10:04 PM
> To: sakai-dev at collab.sakaiproject.org; production at collab.sakaiproject.org
> Subject: [Building Sakai] 2.6 Production Issues
>
> Hi all,
> Today IU experienced our heaviest load since upgrading to 2.6.
> Starting last week (several of our campuses started last Wednesday),
> Oncourse (our local Sakai brand) began experiencing sporadic
> behavior where all of the application-configured DB connections were
> being used, resulting in "An attempt by a client to checkout a
> Connection has timed out" errors. These errors typically lasted
> about 30 seconds or so and then the pool recovered.
>
> Today all connections on all 8 of the servers were consumed.  In  
> response, we increased the maxPoolSize and restarted the servers.   
> This brought the application back, but we are still experiencing  
> short but high increases in the number of connections used by our  
> app servers.  At our peak usage, we had just over 200,000 requests  
> per 5 minutes.
>
> We are hoping to get some insight and/or recommendations from others  
> in the community, especially those of you who are running 2.6.
>
> * Is anyone else experiencing these short but high increases in  
> connection usage?
>
> * What connection pool settings are you using?  We currently have  
> the following.  We increased our maxPoolSize from 150 to 250 this  
> afternoon when our application went down:
> initialPoolSize at javax.sql.DataSource=50
> maxPoolSize at javax.sql.DataSource=250
> minPoolSize at javax.sql.DataSource=50
> maxConnectionAge at javax.sql.DataSource=0
> maxIdleTime at javax.sql.DataSource=0
> checkoutTimeout at javax.sql.DataSource=5000
> acquireRetryAttempts at javax.sql.DataSource=1
> maxIdleTimeExcessConnections at javax.sql.DataSource=300
> maxStatementsPerConnection at javax.sql.DataSource=0
> maxStatements at javax.sql.DataSource=0
>
> * We are still concerned about the query to  
> SAKAI_SITE_PAGE_PROPERTY.  It is being executed about 4x as many  
> times as any other query.  Does anyone have special cache settings  
> for this table?  What kind of results do you see in production?  Our  
> table has no records in it.  We are using the default cache:
> [ name = org.sakaiproject.db.BaseDbFlatStorage.SAKAI_SITE_PAGE_PROPERTY
>   status = STATUS_ALIVE eternal = false overflowToDisk = true
>   maxElementsInMemory = 10000 maxElementsOnDisk = 0
>   memoryStoreEvictionPolicy = LRU
>   timeToLiveSeconds = 120 timeToIdleSeconds = 120
>   diskPersistent = false diskExpiryThreadIntervalSeconds = 120
>   cacheEventListeners:
>   hitCount = 25624 memoryStoreHitCount = 25456 diskStoreHitCount = 168
>   missCountNotFound = 1743800 missCountExpired = 1680 ]
>
> * What are your top queries in terms of executions and/or cpu time?   
> We want to get a better idea of what is "normal" for 2.6.
>
> Thanks so much for any help/insight you can provide,
> Michelle
