[Building Sakai] Sakai Memory Issues

John Bush jbush at anisakai.com
Wed Oct 9 14:43:23 PDT 2013


We do tool identification with a POJO custom rule for business
transactions, so everything rolls up to a tool.  I've attached
screenshots of our config for that and for the data collectors.
Turning on the timer instrumentation is nice; Sakai has a lot of
timers.  We tried using the lite version for about a year, and it
really just doesn't do enough.  Right now I have a trial license, and
I'm going to buy a few licenses and just move them around as needed.
They offer the controller in the cloud now, which is really nice; it
literally took us 15 minutes to deploy it into prod.
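
For anyone wiring this up without the screenshots: the rough idea (a
sketch, not our exact rules) is to point the custom match rule and the
data collector at the Sakai getter chains that expose the current user
and tool placement.  The class below is illustrative only, but the
SessionManager/ToolManager calls are the standard Sakai APIs.

import org.sakaiproject.tool.api.Placement;
import org.sakaiproject.tool.api.Session;
import org.sakaiproject.tool.cover.SessionManager;
import org.sakaiproject.tool.cover.ToolManager;

// Illustrative only: getter chains an AppDynamics POJO data collector or
// custom match rule can be pointed at to pull user/site/tool per request.
public class RequestContextProbe {

    // Logged-in user's enterprise id, or null for anonymous requests.
    public static String currentUserEid() {
        Session session = SessionManager.getCurrentSession();
        return (session == null) ? null : session.getUserEid();
    }

    // Site id plus tool registration id (e.g. "sakai.announcements")
    // for the current placement.
    public static String currentSiteAndTool() {
        Placement placement = ToolManager.getCurrentPlacement();
        if (placement == null) {
            return "no-placement";
        }
        return placement.getContext() + "/" + placement.getTool().getId();
    }
}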

Their pricing model stinks for us: we have over 150 JVMs, and putting
this everywhere would put us out of business.  We tried to pass the
cost on to clients, but that puts us out of business too, because we
end up being too expensive compared to our competitors.  It's such a
great tool, and I really haven't found anything else that compares, so
a few licenses is probably the best we can do right now.  In my
experience it really delivers in finding root cause.



On Wed, Oct 9, 2013 at 2:28 PM, David McIntyre
<david.mcintyre at stanford.edu> wrote:
> John,
>
> I'm working on installing AppDynamics right now, and I'd like to see your
> configuration for assigning users to a business request. Even better would
> be tool identification.
>
> Thanks,
>   David
>
>
> On Wed, Oct 9, 2013 at 1:52 PM, John Bush <jbush at anisakai.com> wrote:
>>
>> I think I just found the root cause.  I was going back and looking at
>> requests right before the heap was exhausted, and I've been able to
>> trace every single one back to loading the announcements tool or its
>> synoptic view.  I'm seeing sites that have 1,500 announcements, and
>> two that have 150,000 announcements in there, all created at the same
>> time (presumably from a site dupe or import or something).  So these
>> requests obviously take forever, memory fills up, a big GC pause comes
>> in which backs up requests, and a death spiral ensues.
>>
>> I'm not sure exactly what defect or user activity is at play here, but
>> something is going on to have created so many announcements.  I'd
>> guess site dupe or import, or the announcement merge feature, or
>> something like that.  If anyone else is still having trouble, maybe
>> mine your database and see if you have any sites with a ridiculous
>> number of announcements.
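>>
>> If it helps, here is roughly what I mean by mining the database.  This
>> assumes the stock ANNOUNCEMENT_MESSAGE table (the channel id embeds the
>> site id, e.g. /announcement/channel/<siteId>/main); adjust the table
>> name and threshold for your schema.  It's wrapped in a throwaway JDBC
>> class so it can run outside Sakai:
>>
>> import java.sql.Connection;
>> import java.sql.DriverManager;
>> import java.sql.PreparedStatement;
>> import java.sql.ResultSet;
>>
>> // Sketch: list announcement channels with a suspiciously large number
>> // of messages.  The threshold of 1000 is arbitrary.
>> public class AnnouncementCountCheck {
>>     public static void main(String[] args) throws Exception {
>>         String url = args[0];   // e.g. jdbc:mysql://dbhost/sakai
>>         String user = args[1];
>>         String pass = args[2];
>>         String sql = "SELECT CHANNEL_ID, COUNT(*) AS CNT "
>>                    + "FROM ANNOUNCEMENT_MESSAGE "
>>                    + "GROUP BY CHANNEL_ID HAVING COUNT(*) > ? "
>>                    + "ORDER BY CNT DESC";
>>         Connection conn = DriverManager.getConnection(url, user, pass);
>>         try {
>>             PreparedStatement ps = conn.prepareStatement(sql);
>>             ps.setInt(1, 1000);
>>             ResultSet rs = ps.executeQuery();
>>             while (rs.next()) {
>>                 System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
>>             }
>>         } finally {
>>             conn.close();
>>         }
>>     }
>> }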
>>
>> I have a data collector in AppDynamics that gives me the user and site
>> data for business requests.  If anyone else is using AppDynamics I can
>> tell you how I did that; it's pretty handy.
>>
>> On Wed, Oct 9, 2013 at 7:27 AM, John Bush <jbush at anisakai.com> wrote:
>> > Yes, we could take it early on.  We have an alert in place so we can
>> > grab one next time; it hasn't occurred again since we set that up.  It
>> > hasn't occurred again since Oct 4th, and GC has been keeping up OK.
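>> >
>> > (The alert itself lives in AppDynamics.  For anyone without it, a
>> > minimal in-JVM sketch of the same idea is below: arm a usage threshold
>> > on the old-gen pool and write an .hprof the first time it trips, so
>> > the dump happens before the node is too far gone.  This is
>> > illustrative, not our actual setup; the simpler fallback is
>> > -XX:+HeapDumpOnOutOfMemoryError, but by then the node is usually
>> > already unusable.)
>> >
>> > import java.lang.management.ManagementFactory;
>> > import java.lang.management.MemoryNotificationInfo;
>> > import java.lang.management.MemoryPoolMXBean;
>> > import javax.management.Notification;
>> > import javax.management.NotificationEmitter;
>> > import javax.management.NotificationListener;
>> > import com.sun.management.HotSpotDiagnosticMXBean;
>> >
>> > public class EarlyHeapDump {
>> >
>> >     // Arm an old-gen usage threshold and dump the heap once when it trips.
>> >     public static void install(final String dumpPath, double fraction) throws Exception {
>> >         for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
>> >             // Pool name depends on the collector: "PS Old Gen", "CMS Old Gen", "Tenured Gen", ...
>> >             if ((pool.getName().contains("Old Gen") || pool.getName().contains("Tenured"))
>> >                     && pool.isUsageThresholdSupported()) {
>> >                 pool.setUsageThreshold((long) (pool.getUsage().getMax() * fraction));
>> >             }
>> >         }
>> >         NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
>> >         emitter.addNotificationListener(new NotificationListener() {
>> >             private volatile boolean dumped = false;
>> >             public void handleNotification(Notification n, Object handback) {
>> >                 if (dumped || !MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
>> >                     return;
>> >                 }
>> >                 dumped = true;
>> >                 try {
>> >                     HotSpotDiagnosticMXBean hotspot = ManagementFactory.newPlatformMXBeanProxy(
>> >                             ManagementFactory.getPlatformMBeanServer(),
>> >                             "com.sun.management:type=HotSpotDiagnostic",
>> >                             HotSpotDiagnosticMXBean.class);
>> >                     hotspot.dumpHeap(dumpPath, true); // live objects only
>> >                 } catch (Exception e) {
>> >                     e.printStackTrace();
>> >                 }
>> >             }
>> >         }, null, null);
>> >     }
>> > }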
>> >
>> > In this particular instance the server was just too far gone; it
>> > couldn't recover even by bleeding off users.  We are testing IU's
>> > courier fix with my jforum fix and pushing that out soon.  I'm not
>> > convinced either of these fixes is the root of our problem, but it's
>> > the best we've got until we get more info.
>> >
>> > On Wed, Oct 9, 2013 at 6:24 AM, David Haines <dlhaines at umich.edu> wrote:
>> >> John,
>> >>
>> >>    Would your setup allow getting a heap dump early on, before the
>> >> node gets overloaded?  The server would be unavailable during the
>> >> dump, which may take many minutes.  At Michigan we've gotten heap
>> >> dumps by taking a server out of the active cluster after it has been
>> >> running for a while, and then getting a heap dump the next day, once
>> >> people with active sessions on that server have drifted away.
>> >>
>> >> - Dave
>> >>
>> >>
>> >> On Tue, Oct 8, 2013 at 12:10 PM, John Bush <jbush at anisakai.com> wrote:
>> >>>
>> >>> OK, this is interesting.  Jeremy, we tried to get a heap dump, but
>> >>> our node was so overloaded at that point it was impossible.  The patch
>> >>> from IU is for a ConcurrentHashMap filling up.  I would have expected
>> >>> AppDynamics to find that.
>> >>>
>> >>> I was going to try to use YourKit to look at the heap; I think it can
>> >>> do it, although I haven't used it for that purpose before.  Jeremy,
>> >>> it might be helpful if you could review this patch and dig in your
>> >>> heap a bit and see if it looks like the same thing.
>> >>>
>> >>>
>> >>> On Tue, Oct 8, 2013 at 8:51 AM, Thomas, Gregory J <gjthomas at iu.edu>
>> >>> wrote:
>> >>> > Indiana recently had a memory leak issue with the delivery objects
>> >>> > in courier and the chat tool.  We have a patch here:
>> >>> > https://jira.sakaiproject.org/browse/SAK-21398  It hasn't been
>> >>> > committed to trunk yet, as some changes to its implementation were
>> >>> > requested first.  However, we have the current patch in our
>> >>> > production code as of Thursday last week.  It passed our load
>> >>> > testing, and it passed our dreaded Sunday evening of high server
>> >>> > load/usage.
>> >>> >
>> >>> > Not sure if it's related to the issues brought up here, but I
>> >>> > thought I'd throw it out there as something we've definitely run
>> >>> > into.
>> >>> >
>> >>> > Thanks,
>> >>> > Greg
>> >>> >
>> >>> >
>> >>> >
>> >>> > On 10/8/13 10:57 AM, "Kusnetz, Jeremy" <JKusnetz at APUS.EDU> wrote:
>> >>> >
>> >>> >>We have been seeing similar memory problems since upgrading to 2.9.3
>> >>> >>also.  The old-gen space just continually grows until it's out of
>> >>> >>space.  GC just doesn't seem to clean everything up.  We have tried
>> >>> >>lots of different combinations of GC settings; the best we can do is
>> >>> >>slow down how quickly we run out of memory, so we are forced to
>> >>> >>restart Sakai multiple times a week.
>> >>> >>
>> >>> >>Interestingly, we are running two instances of Sakai, both running
>> >>> >>from the same code base.  One instance has the memory issue while
>> >>> >>the other does not.
>> >>> >>
>> >>> >>The one that does NOT have the memory issues is primarily using
>> >>> >>jforums, assignment1, and gradebook1.  The one that IS having memory
>> >>> >>issues is primarily using msgcntr, assignment2, and gradebook2.
>> >>> >>There is a small amount of jforum usage in the one with memory
>> >>> >>issues, but it's only in a couple of sites.
>> >>> >>
>> >>> >>There are a few other differences.  The one with memory issues has
>> >>> >>35 Sakai nodes, while the one without memory issues has only 2 Sakai
>> >>> >>nodes.  But on average the one with 2 nodes has more sessions on
>> >>> >>each of those nodes than the one with 35 nodes.  The one without
>> >>> >>memory issues is using LDAP, while the one with memory issues is not
>> >>> >>using LDAP; all users are local, but we do use the SOAP Axis
>> >>> >>SakaiPortalLogin for most of its session creation.
>> >>> >>
>> >>> >>We are also using AppDynamics to try to find the memory leaks, but
>> >>> >>its automatic leak detection only tracks Map and Collection classes.
>> >>> >>There is another tool to track any custom object, but we would need
>> >>> >>to know what object to look for.  The only things AppDynamics flags
>> >>> >>as memory leaks with its automatic leak detection are the various
>> >>> >>Ehcache caches.
>> >>> >>
>> >>> >>We did try dumping the heap and opening it in the Eclipse Memory
>> >>> >>Analyzer.  I don't have much experience with it.  Even though the
>> >>> >>heap dump was close to 7GB, it only identified about 2.5GB; the rest
>> >>> >>were unreferenced objects.  I'm not sure why those objects weren't
>> >>> >>GCed.
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>-----Original Message-----
>> >>> >>From: sakai-dev-bounces at collab.sakaiproject.org
>> >>> >>[mailto:sakai-dev-bounces at collab.sakaiproject.org] On Behalf Of John
>> >>> >> Bush
>> >>> >>Sent: Tuesday, October 08, 2013 9:54 AM
>> >>> >>To: Mike Jennings
>> >>> >>Cc: sakai-dev Developers
>> >>> >>Subject: Re: [Building Sakai] Sakai Memory Issues
>> >>> >>
>> >>> >>We are on the 2.9 tag.  I don't have a smoking gun yet to really say
>> >>> >>whether that's the root cause or not.  We are using AppDynamics to
>> >>> >>look at things in production.  What we are seeing is a single
>> >>> >>request running this type of query literally hundreds of times:
>> >>> >>
>> >>> >>[SELECT special_access_id, forum_id, topic_id, start_date,
>> >>> >> hide_until_open, end_date, allow_until_date, override_start_date,
>> >>> >> override_hide_until_open, override_end_date, override_allow_until_date,
>> >>> >> password, lock_end_date, users
>> >>> >> FROM jforum_special_access
>> >>> >> WHERE forum_id = '116964' AND topic_id > 0]
>> >>> >>
>> >>> >>Each one is fast, and our database has no issue.  Eventually we see
>> >>> >>getting connections taking a while and threads getting blocked, but
>> >>> >>we aren't running into any kind of bottleneck, load- or
>> >>> >>connection-wise, on the db.  I think the app servers are simply
>> >>> >>running stupid amounts of queries, eating up connections, and then
>> >>> >>blocking and keeping things in memory.  It's just a hunch right now.
>> >>> >>It's the best we've got so far.
>> >>> >>
>> >>> >>I wrote some code to cache this stuff yesterday; we are QA'ing it
>> >>> >>today.  Again, I'm not sure it's jforum, but jforum certainly isn't
>> >>> >>helping.
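>> >>> >>
>> >>> >>The caching patch isn't attached; the basic shape of it (illustrative
>> >>> >>only, the Dao interface below is a stand-in for jforum's real data
>> >>> >>layer, not its actual API) is to memoize the per-forum special-access
>> >>> >>rows so a page that touches the same forum hundreds of times hits the
>> >>> >>db once:
>> >>> >>
>> >>> >>import java.util.List;
>> >>> >>import java.util.concurrent.ConcurrentHashMap;
>> >>> >>import java.util.concurrent.ConcurrentMap;
>> >>> >>
>> >>> >>public class SpecialAccessCache {
>> >>> >>    // Hypothetical stand-in for the DAO that runs the SELECT above.
>> >>> >>    public interface Dao { List<Object[]> selectByForum(int forumId); }
>> >>> >>
>> >>> >>    private final ConcurrentMap<Integer, List<Object[]>> byForum =
>> >>> >>            new ConcurrentHashMap<Integer, List<Object[]>>();
>> >>> >>    private final Dao dao;
>> >>> >>
>> >>> >>    public SpecialAccessCache(Dao dao) { this.dao = dao; }
>> >>> >>
>> >>> >>    public List<Object[]> forForum(int forumId) {
>> >>> >>        List<Object[]> rows = byForum.get(forumId);
>> >>> >>        if (rows == null) {
>> >>> >>            rows = dao.selectByForum(forumId);
>> >>> >>            List<Object[]> prior = byForum.putIfAbsent(forumId, rows);
>> >>> >>            if (prior != null) rows = prior;
>> >>> >>        }
>> >>> >>        return rows;
>> >>> >>    }
>> >>> >>}
>> >>> >>
>> >>> >>(A real fix also needs to invalidate entries when access rules change.)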
>> >>> >>
>> >>> >>On Tue, Oct 8, 2013 at 6:22 AM, Mike Jennings
>> >>> >> <mike_jennings at unc.edu>
>> >>> >>wrote:
>> >>> >>> John,
>> >>> >>>
>> >>> >>> We are using JForum and Sakai 2.9.2 and are not seeing any memory
>> >>> >>> issues to date.  What version of JForum do you suspect this memory
>> >>> >>> leak is in?
>> >>> >>>
>> >>> >>> Mike Jennings
>> >>> >>>
>> >>> >>>
>> >>> >>> On 10/07/2013 05:39 PM, John Bush wrote:
>> >>> >>>>
>> >>> >>>> We've been having an issue like this with one client; we think it
>> >>> >>>> might be related to jforum.  Do you use jforum?
>> >>> >>>>
>> >>> >>>> On Mon, Oct 7, 2013 at 1:11 PM, William Karavites
>> >>> >>>> <willkara at oit.rutgers.edu> wrote:
>> >>> >>>>>
>> >>> >>>>> Hello Community,
>> >>> >>>>>
>> >>> >>>>> I was wondering if there were any outstanding or recently fixed
>> >>> >>>>> memory usage issues for Sakai.  We recently came across one that
>> >>> >>>>> took Sakai down twice in a night for 30 minutes each time.  We
>> >>> >>>>> found the culprit for that one and want to try to be proactive
>> >>> >>>>> about any others in the future.  We are currently running Sakai
>> >>> >>>>> 2.9.1 and Kernel 1.3.1.
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> So if there are any improved memory usage patches in Sakai,
>> >>> >>>>> please
>> >>> >>>>> let me know.
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> Thank you,
>> >>> >>>>> William Karavites
>> >>> >>>>>
>> >>> >>>>> ------------------------------------
>> >>> >>>>> William Karavites
>> >>> >>>>> Application Programmer
>> >>> >>>>> OIT/OIRT- Rutgers University
>> >>> >>>>> Office: 732-445-8726
>> >>> >>>>> Cell: 732-822-9405
>> >>> >>>>> willkara at rutgers.edu
>> >>> >>>>> ------------------------------------
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> =========================================
>> >>> >>> Mike Jennings
>> >>> >>> Teaching and Learning Developer
>> >>> >>> T: 919.843.5013
>> >>> >>> E: mike_jennings at unc.edu
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>--
>> >>> >>John Bush
>> >>> >>602-490-0470
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> John Bush
>> >>> 602-490-0470
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > John Bush
>> > 602-490-0470
>> >
>>
>>
>>
>> --
>> John Bush
>> 602-490-0470
>>
>
>
>
>
> --
> David McIntyre
> Library Technology
> CourseWork QA Lead
> (650) 498-7209



-- 
John Bush
602-490-0470

** This message is neither private nor confidential in fact the US
government is storing it in a warehouse located in Utah for future
data mining use cases should they arise. **
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-10-09 at 2.34.54 PM.png
Type: image/png
Size: 110088 bytes
Desc: not available
Url : http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20131009/1db4b189/attachment.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-10-09 at 2.34.38 PM.png
Type: image/png
Size: 54938 bytes
Desc: not available
Url : http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20131009/1db4b189/attachment-0001.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-10-09 at 2.34.30 PM.png
Type: image/png
Size: 54908 bytes
Desc: not available
Url : http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20131009/1db4b189/attachment-0002.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-10-09 at 2.33.55 PM.png
Type: image/png
Size: 46261 bytes
Desc: not available
Url : http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20131009/1db4b189/attachment-0003.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-10-09 at 2.33.50 PM.png
Type: image/png
Size: 37492 bytes
Desc: not available
Url : http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20131009/1db4b189/attachment-0004.png 

