[WG: Production / Deploying Sakai] Revisiting app server GC settings - advice sought

Stephen Marquard stephen.marquard at uct.ac.za
Mon Mar 16 02:06:26 PDT 2009


Hi all,

Since we moved to 2-6-x in production, we seem to be running into more GC issues, as reflected in longer and more frequent full GCs. We do also have about 30-40% more overall usage than last year, so it could be unrelated to 2-6-x but a consequence of more activity.

We're looking for recommendations to improve our GC settings to reduce the time taken by the Full GCs and/or reduce the frequency of long Full GCs.

The app servers are 8G RAM, 64 bit Linux blades running jdk1.5.0_17, running

Linux srvslscle004 2.6.16.60-0.31-smp #1 SMP Tue Oct 7 16:16:29 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux

Our current GC settings are:

-server -d64 -Xms5000m -Xmx5000m -Xmn1g -XX:MaxPermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:GCTimeRatio=19
-XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps  -Xloggc:/usr/local/sakai/logs/gc.log  

At the time when we start seeing excessive GC times, the app servers have about 600 user sessions on them. They're not unusable at this stage, but they tend to trip all our performance monitoring systems (as you'd expect from a response time to a single request of 10s vs the typical < 100ms) and back up a few hundred requests on the loadbalancer for short periods.

Some extracts from our gc.log files are below. The first column is seconds since app server start. The lines that I've left out are all the short "ParNew" GCs. The problematic behaviour is a whole series of Full GCs in a row, e.g. starting at 70622.

Any advice welcome.

Cheers
Stephen


23.484: [GC 23.484: [ParNew: 838912K->26306K(943744K), 0.1099360 secs] 838912K->26306K(5015168K), 0.1101380 secs]
41.227: [GC 41.227: [ParNew: 865218K->55009K(943744K), 0.1683830 secs] 865218K->55009K(5015168K), 0.1685900 secs]
62.656: [GC 62.656: [ParNew: 893921K->91343K(943744K), 0.2479720 secs] 893921K->91343K(5015168K), 0.2481470 secs]

...

1204.353: [GC 1204.353: [ParNew: 943744K->88172K(943744K), 0.3344730 secs] 2052652K->1232478K(5015168K), 0.3348100 secs]
1204.746: [Full GC 1204.747: [CMS (concurrent mode failure): 1144305K->721470K(4071424K), 4.5633040 secs] 1236779K->721470K(5015168K), [CMS Perm : 295586K->294923K(296000K)], 4.5647950 secs]
1227.854: [GC 1227.854: [ParNew: 838912K->60849K(943744K), 0.1849720 secs] 1560382K->782319K(5015168K), 0.1853440 secs]

...

6867.341: [GC 6867.341: [ParNew: 943744K->104832K(943744K), 0.2778480 secs] 2976047K->2150987K(5015168K), 0.2782800 secs]
6867.649: [GC [1 CMS-initial-mark: 2046155K(4071424K)] 2151783K(5015168K), 0.1256860 secs]
6867.775: [CMS-concurrent-mark-start]
6869.869: [CMS-concurrent-mark: 2.089/2.089 secs]
6869.870: [CMS-concurrent-preclean-start]
6869.900: [CMS-concurrent-preclean: 0.030/0.031 secs]
6869.900: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 6870.931: [CMS-concurrent-abortable-preclean: 0.172/1.031 secs]
6870.992: [GC[YG occupancy: 169328 K (943744 K)]6870.992: [Rescan (parallel) , 0.1486120 secs]6871.141: [weak refs processing, 0.2962240 secs] [1 CMS-remark: 2046155K(4071424K)] 2215484K(5015168K), 0.4554950 secs]
6871.448: [CMS-concurrent-sweep-start]
6873.631: [CMS-concurrent-sweep: 2.181/2.183 secs]
6873.632: [CMS-concurrent-reset-start]
6873.673: [CMS-concurrent-reset: 0.041/0.041 secs]
6928.483: [GC 6928.483: [ParNew: 943744K->76937K(943744K), 0.2661860 secs] 1890814K->1050838K(5015168K), 0.2666060 secs]

...

22966.977: [GC 22966.978: [ParNew: 938175K->104832K(943744K), 0.2944090 secs] 4059662K->3247647K(5015168K), 0.2948490 secs]
22967.293: [GC [1 CMS-initial-mark: 3142815K(4071424K)] 3247647K(5015168K), 0.1178050 secs]
22967.411: [CMS-concurrent-mark-start]
22970.373: [CMS-concurrent-mark: 2.962/2.962 secs]
22970.373: [CMS-concurrent-preclean-start]
22970.419: [CMS-concurrent-preclean: 0.044/0.046 secs]
22970.419: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 22971.442: [CMS-concurrent-abortable-preclean: 0.221/1.023 secs]
22971.477: [GC[YG occupancy: 293626 K (943744 K)]22971.477: [Rescan (parallel) , 0.3664450 secs]22971.844: [weak refs processing, 0.3327230 secs] [1 CMS-remark: 3142815K(4071424K)] 3436441K(5015168K), 0.7121780 secs]
22972.191: [CMS-concurrent-sweep-start]
22976.137: [CMS-concurrent-sweep: 3.902/3.946 secs]
22976.137: [CMS-concurrent-reset-start]
22976.204: [CMS-concurrent-reset: 0.067/0.067 secs]
22988.679: [GC 22988.679: [ParNew: 943744K->104832K(943744K), 0.3070180 secs] 2264005K->1446269K(5015168K), 0.3074830 secs]
23018.916: [GC 23018.916: [ParNew: 943744K->104831K(943744K), 0.6239100 secs] 2285181K->1562978K(5015168K), 0.6243830 secs]

...

29269.687: [GC 29269.687: [ParNew: 917913K->82191K(943744K), 0.1931320 secs] 3954592K->3118870K(5015168K), 0.1935970 secs]
29291.625: [Full GC 29291.626: [CMS: 3036679K->1387557K(4071424K), 10.3931740 secs] 3622124K->1387557K(5015168K), [CMS Perm : 378953K->367365K(524288K)], 10.3935970 secs]
29329.916: [GC 29329.917: [ParNew: 838852K->22236K(943744K), 0.0878990 secs] 2226410K->1409793K(5015168K), 0.0883050 secs]

...

31630.811: [GC 31630.812: [ParNew: 943744K->104832K(943744K), 0.2950120 secs] 2647679K->1827315K(5015168K), 0.2954270 secs]
31666.923: [Full GC 31666.923: [CMS: 1722483K->1259876K(4071424K), 7.6969950 secs] 2427064K->1259876K(5015168K), [CMS Perm : 378977K->378676K(524288K)], 7.6974040 secs]
31721.418: [GC 31721.418: [ParNew: 838912K->62904K(943744K), 0.1879970 secs] 2098788K->1322780K(5015168K), 0.1883910 secs]

...

47752.229: [GC 47752.229: [ParNew: 943744K->104832K(943744K), 3.0367960 secs] 3959521K->3266969K(5015168K), 3.0371490 secs]
47755.295: [GC [1 CMS-initial-mark: 3162137K(4071424K)] 3266969K(5015168K), 0.1733800 secs]
47755.468: [CMS-concurrent-mark-start]
47759.998: [CMS-concurrent-mark: 4.529/4.529 secs]
47759.998: [CMS-concurrent-preclean-start]
47760.034: [CMS-concurrent-preclean: 0.036/0.036 secs]
47760.034: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 47761.081: [CMS-concurrent-abortable-preclean: 0.188/1.047 secs]
47761.375: [GC[YG occupancy: 210327 K (943744 K)]47761.375: [Rescan (parallel) , 0.1313130 secs]47761.559: [weak refs processing, 0.1358960 secs] [1 CMS-remark: 3162137K(4071424K)] 3372465K(5015168K), 0.3247780 secs]
47761.700: [CMS-concurrent-sweep-start]
47764.757: [CMS-concurrent-sweep: 3.058/3.058 secs]
47764.758: [CMS-concurrent-reset-start]
47764.802: [CMS-concurrent-reset: 0.045/0.045 secs]
47807.777: [GC 47807.789: [ParNew: 942285K->75183K(943744K), 0.2329460 secs] 2608910K->1763039K(5015168K), 0.2332810 secs]

...

59333.603: [GC 59333.604: [ParNew: 937558K->84202K(943744K), 0.2865440 secs] 4049912K->3213354K(5015168K), 0.2869480 secs]
59333.905: [GC [1 CMS-initial-mark: 3129152K(4071424K)] 3219092K(5015168K), 0.2183720 secs]
59334.123: [CMS-concurrent-mark-start]
59341.598: [CMS-concurrent-mark: 7.475/7.475 secs]
59341.598: [CMS-concurrent-preclean-start]
59341.648: [CMS-concurrent-preclean: 0.049/0.050 secs]
59341.648: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 59342.656: [CMS-concurrent-abortable-preclean: 0.210/1.008 secs]
59342.659: [GC[YG occupancy: 161328 K (943744 K)]59342.659: [Rescan (parallel) , 0.1135730 secs]59342.773: [weak refs processing, 0.0733360 secs] [1 CMS-remark: 3129152K(4071424K)] 3290480K(5015168K), 0.1885900 secs]
59342.848: [CMS-concurrent-sweep-start]
59349.502: [CMS-concurrent-sweep: 6.653/6.653 secs]
59349.502: [CMS-concurrent-reset-start]
59349.553: [CMS-concurrent-reset: 0.051/0.051 secs]
59373.776: [GC 59373.776: [ParNew: 923114K->83329K(943744K), 0.1881390 secs] 1925328K->1085543K(5015168K), 0.1885860 secs]

...

64467.374: [GC 64467.375: [ParNew: 943744K->86693K(943744K), 0.3723400 secs] 4053132K->3223656K(5015168K), 0.3728030 secs]
64467.763: [GC [1 CMS-initial-mark: 3136962K(4071424K)] 3226527K(5015168K), 0.1098040 secs]
64467.873: [CMS-concurrent-mark-start]
64472.090: [CMS-concurrent-mark: 4.217/4.217 secs]
64472.090: [CMS-concurrent-preclean-start]
64472.120: [CMS-concurrent-preclean: 0.029/0.031 secs]
64472.120: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 64473.155: [CMS-concurrent-abortable-preclean: 0.239/1.034 secs]
64473.158: [GC[YG occupancy: 286923 K (943744 K)]64473.158: [Rescan (parallel) , 0.2824040 secs]64473.440: [weak refs processing, 0.3771440 secs] [1 CMS-remark: 3136962K(4071424K)] 3423886K(5015168K), 0.6649970 secs]
64473.823: [CMS-concurrent-sweep-start]
64479.367: [CMS-concurrent-sweep: 5.544/5.544 secs]
64479.367: [CMS-concurrent-reset-start]
64479.435: [CMS-concurrent-reset: 0.068/0.068 secs]
64484.612: [GC 64484.612: [ParNew: 925605K->89701K(943744K), 0.2006340 secs] 2643568K->1807664K(5015168K), 0.2011030 secs]

...

67611.147: [GC 67611.147: [ParNew: 943363K->95216K(943744K), 0.2194970 secs] 4059018K->3226094K(5015168K), 0.2198840 secs]
67611.391: [GC [1 CMS-initial-mark: 3130878K(4071424K)] 3227321K(5015168K), 0.1104910 secs]
67611.501: [CMS-concurrent-mark-start]
67617.077: [CMS-concurrent-mark: 5.575/5.575 secs]
67617.077: [CMS-concurrent-preclean-start]
67617.127: [CMS-concurrent-preclean: 0.047/0.050 secs]
67617.127: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 67618.190: [CMS-concurrent-abortable-preclean: 0.260/1.063 secs]
67618.240: [GC[YG occupancy: 737409 K (943744 K)]67618.240: [Rescan (parallel) , 0.5994670 secs]67618.840: [weak refs processing, 0.3035280 secs] [1 CMS-remark: 3130878K(4071424K)] 3868288K(5015168K), 0.9111950 secs]
67619.152: [CMS-concurrent-sweep-start]
67622.544: [GC 67622.545: [ParNew: 934128K->98773K(943744K), 0.2225970 secs] 3128963K->2298886K(5015168K), 0.2230080 secs]
67624.951: [CMS-concurrent-sweep: 5.560/5.786 secs]
67624.951: [CMS-concurrent-reset-start]
67624.994: [CMS-concurrent-reset: 0.043/0.043 secs]
67639.837: [GC 67639.837: [ParNew: 937685K->99019K(943744K), 0.2392750 secs] 2651140K->1817564K(5015168K), 0.2397650 secs]

...

69549.701: [GC 69549.701: [ParNew: 943744K->100744K(943744K), 0.2499910 secs] 4059008K->3228774K(5015168K), 0.2504360 secs]
69549.952: [GC [1 CMS-initial-mark: 3128029K(4071424K)] 3231360K(5015168K), 0.1253200 secs]
69550.078: [CMS-concurrent-mark-start]
69557.721: [CMS-concurrent-mark: 7.628/7.643 secs]
69557.721: [CMS-concurrent-preclean-start]
69557.765: [CMS-concurrent-preclean: 0.042/0.044 secs]
69557.765: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 69558.793: [CMS-concurrent-abortable-preclean: 0.259/1.028 secs]
69558.796: [GC[YG occupancy: 577488 K (943744 K)]69558.797: [Rescan (parallel) , 0.7291500 secs]69559.526: [weak refs processing, 0.4361640 secs] [1 CMS-remark: 3128029K(4071424K)] 3705517K(5015168K), 1.1718250 secs]
69559.971: [CMS-concurrent-sweep-start]
69564.181: [CMS-concurrent-sweep: 4.197/4.197 secs]
69564.181: [CMS-concurrent-reset-start]
69564.236: [CMS-concurrent-reset: 0.055/0.055 secs]
69565.550: [GC 69565.550: [ParNew: 939656K->98025K(943744K), 0.2241920 secs] 2737108K->1901480K(5015168K), 0.2246160 secs]

...

70374.420: [GC 70374.420: [ParNew: 933875K->83227K(943744K), 0.2482430 secs] 3331332K->2502494K(5015168K), 0.2487240 secs]
70375.336: [Full GC 70375.336: [CMS: 2419267K->1828063K(4071424K), 12.2888470 secs] 2517247K->1828063K(5015168K), [CMS Perm : 406349K->401138K(524288K)], 12.2892630 secs]
70396.647: [GC 70396.647: [ParNew: 838912K->104832K(943744K), 0.4042130 secs] 2666975K->1948260K(5015168K), 0.4046490 secs]
70412.707: [Full GC 70412.707: [CMS: 1843428K->1807106K(4071424K), 10.2653580 secs] 2558811K->1807106K(5015168K), [CMS Perm : 401165K->401155K(524288K)], 10.2659650 secs]
70423.602: [Full GC 70423.602: [CMS: 1807106K->1799785K(4071424K), 9.9146110 secs] 1898270K->1799785K(5015168K), [CMS Perm : 401177K->401163K(524288K)], 9.9151040 secs]
70434.239: [Full GC 70434.240: [CMS: 1799785K->1806317K(4071424K), 9.6978270 secs] 1930839K->1806317K(5015168K), [CMS Perm : 401196K->401183K(524288K)], 9.6983740 secs]
70444.662: [Full GC 70444.662: [CMS: 1806317K->1806478K(4071424K), 9.8252720 secs] 1939530K->1806478K(5015168K), [CMS Perm : 401195K->401187K(524288K)], 9.8257810 secs]
70458.281: [Full GC 70458.282: [CMS: 1806478K->1813899K(4071424K), 9.0004990 secs] 2257135K->1813899K(5015168K), [CMS Perm : 401232K->401217K(524288K)], 9.0010200 secs]
70467.756: [Full GC 70467.756: [CMS: 1813899K->1808548K(4071424K), 10.0453770 secs] 1859903K->1808548K(5015168K), [CMS Perm : 401228K->401218K(524288K)], 10.0459120 secs]
70478.317: [Full GC 70478.317: [CMS: 1808548K->1810302K(4071424K), 9.6284660 secs] 1850598K->1810302K(5015168K), [CMS Perm : 401236K->401222K(524288K)], 9.6289730 secs]
70488.473: [Full GC 70488.473: [CMS: 1810302K->1811830K(4071424K), 8.8405820 secs] 1851481K->1811830K(5015168K), [CMS Perm : 401238K->401224K(524288K)], 8.8410730 secs]
70506.701: [GC 70506.701: [ParNew: 838912K->45125K(943744K), 0.1405100 secs] 2650742K->1856956K(5015168K), 0.1411210 secs]
70513.974: [GC 70513.975: [ParNew: 884037K->61990K(943744K), 0.1370560 secs] 2695868K->1873821K(5015168K), 0.1375380 secs]
70517.738: [Full GC 70517.738: [CMS: 1811830K->1836153K(4071424K), 10.2075070 secs] 2277839K->1836153K(5015168K), [CMS Perm : 401342K->401314K(524288K)], 10.2080350 secs]
70528.542: [Full GC 70528.543: [CMS: 1836153K->1832106K(4071424K), 8.7894580 secs] 1908693K->1832106K(5015168K), [CMS Perm : 401335K->401317K(524288K)], 8.7902530 secs]
70537.757: [Full GC 70537.757: [CMS: 1832106K->1834519K(4071424K), 9.6428030 secs] 1885272K->1834519K(5015168K), [CMS Perm : 401327K->401319K(524288K)], 9.6433480 secs]
70547.857: [Full GC 70547.857: [CMS: 1834519K->1837896K(4071424K), 8.6115270 secs] 1883284K->1837896K(5015168K), [CMS Perm : 401334K->401320K(524288K)], 8.6120590 secs]
70564.155: [GC 70564.156: [ParNew: 838912K->35345K(943744K), 0.1117160 secs] 2676808K->1873242K(5015168K), 0.1122400 secs]
70568.323: [Full GC 70568.323: [CMS: 1837896K->1843347K(4071424K), 9.8451750 secs] 2363845K->1843347K(5015168K), [CMS Perm : 401356K->401343K(524288K)], 9.8457220 secs]
70578.694: [Full GC 70578.694: [CMS: 1843347K->1844855K(4071424K), 8.7805860 secs] 1895053K->1844855K(5015168K), [CMS Perm : 401370K->401355K(524288K)], 8.7811110 secs]
70587.814: [Full GC 70587.814: [CMS: 1844855K->1834670K(4071424K), 9.8209280 secs] 1869303K->1834670K(5015168K), [CMS Perm : 401371K->401354K(524288K)], 9.8215000 secs]
70598.057: [Full GC 70598.057: [CMS: 1834670K->1844248K(4071424K), 9.3784210 secs] 1878226K->1844248K(5015168K), [CMS Perm : 401372K->401362K(524288K)], 9.3790120 secs]
70613.855: [GC 70613.856: [ParNew: 838912K->38639K(943744K), 0.1058890 secs] 2683160K->1882887K(5015168K), 0.1064020 secs]
70621.945: [GC 70621.945: [ParNew: 877551K->68099K(943744K), 0.1549250 secs] 2721799K->1912347K(5015168K), 0.1554800 secs]
70622.937: [Full GC 70622.937: [CMS: 1844248K->1854709K(4071424K), 9.0322210 secs] 1981171K->1854709K(5015168K), [CMS Perm : 401391K->401371K(524288K)], 9.0327500 secs]
70632.358: [Full GC 70632.358: [CMS: 1854709K->1833790K(4071424K), 10.0332030 secs] 1892899K->1833790K(5015168K), [CMS Perm : 401385K->401362K(524288K)], 10.0337790 secs]
70642.877: [Full GC 70642.877: [CMS: 1833790K->1840344K(4071424K), 8.5311480 secs] 1891402K->1840344K(5015168K), [CMS Perm : 401381K->401368K(524288K)], 8.5316940 secs]
70652.038: [Full GC 70652.039: [CMS: 1840344K->1848224K(4071424K), 9.5606510 secs] 1943580K->1848224K(5015168K), [CMS Perm : 401377K->401369K(524288K)], 9.5612150 secs]
70665.292: [Full GC 70665.292: [CMS: 1848224K->1853711K(4071424K), 9.1706600 secs] 2379802K->1853711K(5015168K), [CMS Perm : 401386K->401370K(524288K)], 9.1713840 secs]
70674.979: [Full GC 70674.979: [CMS: 1853711K->1850172K(4071424K), 8.4414800 secs] 1934760K->1850172K(5015168K), [CMS Perm : 401384K->401373K(524288K)], 8.4420140 secs]
70683.854: [Full GC 70683.854: [CMS: 1850172K->1851004K(4071424K), 8.7758960 secs] 1894064K->1851004K(5015168K), [CMS Perm : 401384K->401374K(524288K)], 8.7764240 secs]
70693.204: [Full GC 70693.204: [CMS: 1851004K->1851241K(4071424K), 9.1485970 secs] 1922225K->1851241K(5015168K), [CMS Perm : 401385K->401383K(524288K)], 9.1491360 secs]
70710.190: [GC 70710.191: [ParNew: 838912K->46674K(943744K), 0.1271790 secs] 2690153K->1897915K(5015168K), 0.1277290 secs]
70720.391: [GC 70720.391: [ParNew: 885577K->52774K(943744K), 0.1228470 secs] 2736818K->1904015K(5015168K), 0.1233340 secs]



More information about the production mailing list