[Deploying Sakai] out of memory exceptions after upgrade to Sakai CLE 2.9
ecrossman at ufl.edu
Wed May 15 14:04:06 PDT 2013
I'm writing to see if any other institutions have encountered this strange out-of-memory exception after upgrading to CLE 2.9. We hit this exception after a given Tomcat node has been running for 2-3 days. The exception looks like the following:
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to allocate stack guard pages failed.
mmap failed for CEN and END part of zip file
Exception in thread "Thread-423" java.lang.OutOfMemoryError
at java.util.zip.ZipFile.open(Native Method)
at java.security.AccessController.doPrivileged(Native Method)
From our research so far, it seems that the JVM is not actually out of memory but has hit a Linux kernel limit on the number of memory-mapped files for a single process. On our RHEL 5.x boxes, this limit is 65536. While we could raise this value, we would likely only delay the inevitable failure of the JVM once it hits the new limit. We have observed that a freshly restarted Sakai/Tomcat instance starts with around 1100-1200 memory-mapped files and grows continuously from there.
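For anyone who wants to compare numbers, here is a rough sketch of how the mapping count of a process can be checked against the kernel limit. It assumes the limit in question is the vm.max_map_count sysctl (readable at /proc/sys/vm/max_map_count) and that each line of /proc/<pid>/maps is one mapping. The function names and the proc_root parameter are made up for illustration.

```python
from pathlib import Path


def count_mappings(maps_text):
    """Count mappings in the text of /proc/<pid>/maps (one mapping per non-empty line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_usage(pid, proc_root="/proc"):
    """Return (mappings_in_use, limit) for a process.

    Assumption: the per-process limit is vm.max_map_count.
    proc_root is a hypothetical parameter so the function can be
    pointed at a copy of the files for testing.
    """
    root = Path(proc_root)
    maps_text = (root / str(pid) / "maps").read_text()
    limit = int((root / "sys" / "vm" / "max_map_count").read_text().strip())
    return count_mappings(maps_text), limit
```

Calling mapping_usage(<tomcat pid>) from a cron job every few minutes and logging the result should show how quickly a node approaches the limit. Reading /proc/<pid>/maps requires running as the same user as the JVM or as root.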
We have also observed a large population of temporary /dev/shm files that were memory mapped and later deleted, which is contributing to this mmap count being so high. Here is an example from one box:
2aaabbeb4000-2aaabbeb5000 rw-s 00000000 00:13 9279287 /dev/shm/sem.m1TZAN (deleted)
2aaabbeb5000-2aaabbeb6000 rw-s 00000000 00:13 9279288 /dev/shm/sem.oU5lY9 (deleted)
2aaabbeb6000-2aaabbeb7000 rw-s 00000000 00:13 9279289 /dev/shm/sem.bTTLlw (deleted)
2aaabbeb7000-2aaabbeb8000 rw-s 00000000 00:13 9279290 /dev/shm/sem.RHXiKS (deleted)
2aaabbeb8000-2aaabbeb9000 rw-s 00000000 00:13 9272028 /dev/shm/sem.epwwAM (deleted)
2aaabbeb9000-2aaabbeba000 rw-s 00000000 00:13 9272029 /dev/shm/sem.yZCZxN (deleted)
2aaabbeba000-2aaabbebb000 rw-s 00000000 00:13 9272030 /dev/shm/sem.46lwvO (deleted)
2aaabbebb000-2aaabbebc000 rw-s 00000000 00:13 9272031 /dev/shm/sem.iRdcuP (deleted)
2aaabbebc000-2aaabbebd000 rw-s 00000000 00:13 9272032 /dev/shm/sem.SyKWsQ (deleted)
2aaabbebd000-2aaabbebe000 rw-s 00000000 00:13 9272033 /dev/shm/sem.eAEKrR (deleted)
2aaabbebe000-2aaabbebf000 rw-s 00000000 00:13 9272034 /dev/shm/sem.NMFAqS (deleted)
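To get a feel for how much of the total these entries account for, something like the following sketch can tally the deleted mappings in a maps dump. It assumes the six-column layout shown above, with " (deleted)" appended to the path. Grouping by the part of the file name before the first dot is only a heuristic chosen for this illustration, and the function name is hypothetical.

```python
from collections import Counter

DELETED_SUFFIX = " (deleted)"


def deleted_mapping_summary(maps_text):
    """Tally deleted file mappings in /proc/<pid>/maps text, grouped by name prefix."""
    counts = Counter()
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, device, inode, path
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no path column
        path = fields[5].strip()
        if not path.endswith(DELETED_SUFFIX):
            continue
        path = path[: -len(DELETED_SUFFIX)]
        directory, _, name = path.rpartition("/")
        if "." in name:
            # Heuristic: /dev/shm/sem.m1TZAN is counted under /dev/shm/sem.*
            key = directory + "/" + name.split(".", 1)[0] + ".*"
        else:
            key = path
        counts[key] += 1
    return counts
```

Feeding it the contents of /proc/<pid>/maps and printing counts.most_common(10) gives a quick ranking of which deleted files dominate.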
For reference, here are the versions of the software we are using:
Linux Kernel 2.6.18-348.4.1.el5
Red Hat Enterprise Linux 5.8
Has anyone seen this scenario, and do you have any advice on how to troubleshoot it further?
Thanks in advance,
University of Florida