Analyzing and Fixing a JVM Crash (JBoss) Caused by Large Objects


水火云树 2010-11-09 09:16:38 Category: Java heap memory / GC

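Recently, in one of our projects, the JBoss JVM crashed after the web application had been running for a while, and the web logs showed no exceptions at all.

In the log directory there was a file named hs_err_pid25052.log. The presence of this file is a sure sign that the JVM itself crashed.

Opening the file and analyzing it: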

---------------  T H R E A D  ---------------   
   
Current thread (0x0000000050682000):  GCTaskThread [stack: 0x00000000413fb000,0x00000000414fc000] [id=25059]   
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000018   
Registers:   
(omitted) ...
Top of Stack: (sp=0x00000000414fae70)   
(omitted) ...
Instructions: (pc=0x00002aaffd33163c)   
(omitted) ...
Stack: [0x00000000413fb000,0x00000000414fc000],  sp=0x00000000414fae70,  free space=3ff0000000000000018k   
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)   
V  [libjvm.so+0x62263c]   
V  [libjvm.so+0x6230b0]   
V  [libjvm.so+0x625672]   
V  [libjvm.so+0x365dda]   
V  [libjvm.so+0x5da2af]

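Analysis 1: When the JVM crashed, the executing thread was a GCTaskThread, and the signal was SIGSEGV (SEGV_MAPERR) at address 0x18. The top frame, V [libjvm.so+0x62263c], carries the V prefix, which stands for VM code, so the JVM crashed while executing the virtual machine's own code rather than application code.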
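The checks in Analysis 1 can be sketched in code. This is a minimal illustration, not part of the original investigation: the class name HsErrThreadInfo and its methods are hypothetical, and the regular expressions assume the line layout shown in the log excerpt above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls a few fields out of single hs_err log lines. */
public class HsErrThreadInfo {
    // Assumed layout: "Current thread (0x...):  GCTaskThread [stack: ...] [id=25059]"
    private static final Pattern CURRENT_THREAD =
        Pattern.compile("^Current thread \\((0x[0-9a-fA-F]+)\\):\\s+(\\S+)");
    // Assumed layout: "siginfo:si_signo=SIGSEGV: si_errno=0, ..."
    private static final Pattern SIGNAL = Pattern.compile("si_signo=(\\w+)");

    /** Returns the thread type on a "Current thread" line, or null if the line does not match. */
    public static String currentThreadType(String line) {
        Matcher m = CURRENT_THREAD.matcher(line.trim());
        return m.find() ? m.group(2) : null;
    }

    /** Returns the signal name on a "siginfo" line, or null if none is found. */
    public static String signalName(String line) {
        Matcher m = SIGNAL.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Returns true if a native-frame line is prefixed with "V", i.e. VM code. */
    public static boolean isVmFrame(String line) {
        return line.trim().startsWith("V ");
    }

    public static void main(String[] args) {
        String threadLine = "Current thread (0x0000000050682000):  GCTaskThread "
            + "[stack: 0x00000000413fb000,0x00000000414fc000] [id=25059]";
        String sigLine = "siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), "
            + "si_addr=0x0000000000000018";
        String frameLine = "V  [libjvm.so+0x62263c]";

        System.out.println(currentThreadType(threadLine)); // prints GCTaskThread
        System.out.println(signalName(sigLine));           // prints SIGSEGV
        System.out.println(isVmFrame(frameLine));          // prints true
    }
}
```

For a Java application thread the same line would name JavaThread instead, which is one quick way to tell a crash inside the VM's GC threads from one in application or JNI code.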

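Analysis 2: The next part of the log lists all threads that existed at the time of the crash (the list is not reproduced here). In that list, the entry flagged "exited" is the thread that was executing; hs_err logs also prefix the current thread with "=>".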

VM state:at safepoint (normal execution)   
   
VM Mutex/Monitor currently owned by a thread:  ([mutex/lock_event])   
[0x0000000050663370] Threads_lock - owner thread: 0x00000000506d2000   
[0x0000000050663870] Heap_lock - owner thread: 0x00000000517c4800   
   
Heap   
 PSYoungGen      total 43328K, used 43136K [0x00002aaacb760000, 0x00002aaace200000, 0x00002aaad6200000)   
  eden space 43008K, 100% used [0x00002aaacb760000,0x00002aaace160000,0x00002aaace160000)   
  from space 320K, 40% used [0x00002aaace1b0000,0x00002aaace1d0000,0x00002aaace200000)   
  to   space 320K, 40% used [0x00002aaace160000,0x00002aaace180000,0x00002aaace1b0000)   
 PSOldGen        total 349568K, used 119371K [0x00002aaab6200000, 0x00002aaacb760000, 0x00002aaacb760000)   
  object space 349568K, 34% used [0x00002aaab6200000,0x00002aaabd692e50,0x00002aaacb760000)   
 PSPermGen       total 131072K, used 62813K [0x00002aaaae200000, 0x00002aaab6200000, 0x00002aaab6200000)   
  object space 131072K, 47% used [0x00002aaaae200000,0x00002aaab1f57670,0x00002aaab6200000)

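Analysis 3: Eden is 100% used, the VM is at a safepoint, and the crashing thread is a GCTaskThread. Taken together, the log above shows that the fault occurred during a Young GC, and that is what crashed the JVM.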
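The reasoning in Analysis 3 combines three observations from the log. The sketch below encodes them as a rough heuristic. The class name HsErrEdenUsage, its methods, and the rule itself are assumptions made for illustration; a match is a hint about where to look, not a diagnosis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rough "crashed during Young GC?" check on hs_err lines. */
public class HsErrEdenUsage {
    // Assumed layout: "  eden space 43008K, 100% used [0x..."
    private static final Pattern EDEN =
        Pattern.compile("eden space\\s+(\\d+)K,\\s+(\\d+)% used");

    /** Returns eden usage in percent, or -1 if the line is not an eden line. */
    public static int edenUsedPercent(String line) {
        Matcher m = EDEN.matcher(line);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    /**
     * Heuristic only: a GC worker thread crashing at a safepoint while eden
     * is completely full suggests the fault happened during a young collection.
     */
    public static boolean looksLikeYoungGcCrash(String currentThreadLine,
                                                String vmStateLine,
                                                String edenLine) {
        return currentThreadLine.contains("GCTaskThread")
            && vmStateLine.contains("at safepoint")
            && edenUsedPercent(edenLine) == 100;
    }

    public static void main(String[] args) {
        String threadLine = "Current thread (0x0000000050682000):  GCTaskThread [id=25059]";
        String vmState = "VM state:at safepoint (normal execution)";
        String eden = "  eden space 43008K, 100% used "
            + "[0x00002aaacb760000,0x00002aaace160000,0x00002aaace160000)";

        System.out.println(edenUsedPercent(eden));                             // prints 100
        System.out.println(looksLikeYoungGcCrash(threadLine, vmState, eden));  // prints true
    }
}
```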

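Searching online for this error showed that many people had hit the same problem. The Sun JDK website lists known issues for this version, 6u18 (1.6.0_18), and one of them describes exactly our problem: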

Card-Marking Optimization Issue   
•   A flaw in the implementation of a card-marking performance optimization in the JVM can cause heap corruption under some circumstances. This issue affects the CMS garbage collector prior to 6u18, and the CMS, G1 and Parallel Garbage Collectors in 6u18. The serial garbage collector is not affected. Applications most likely to be affected by this issue are those that allocate very large objects which would not normally fit in Eden, or those that make extensive use of JNI Critical Sections (JNI Get/Release*Critical).   
This issue will be fixed in the next Java SE 6 update.   
Meanwhile, as a workaround to the issue, users should disable this performance optimization by -XX:-ReduceInitialCardMarks.

Solution: add the JVM startup option -XX:-ReduceInitialCardMarks to disable this optimization and avoid the problem.
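After restarting with the new option, it can be useful to confirm that the running JVM actually picked it up. The following is a minimal sketch, assuming a HotSpot JVM that exposes the com.sun.management.HotSpotDiagnosticMXBean; the class name CardMarkFlagCheck and the helper flagValue are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical check: reads the effective value of a HotSpot VM flag. */
public class CardMarkFlagCheck {
    /** Returns the flag's current value as a string, or null if it cannot be read. */
    public static String flagValue(String name) {
        try {
            // Proxy-based lookup; assumes the HotSpot diagnostic MXBean is registered.
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return null; // flag (or the MXBean) does not exist in this JVM
        } catch (IOException e) {
            return null; // could not talk to the MBean server
        }
    }

    public static void main(String[] args) {
        // With -XX:-ReduceInitialCardMarks in effect, the value should read "false".
        System.out.println("ReduceInitialCardMarks = " + flagValue("ReduceInitialCardMarks"));
    }
}
```

Run it with the same JVM options as the application server; a result of null means this JVM does not know the flag at all.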

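Summary:

When the JVM crashes, analyze the hs_err_pid<pid>.log file that the JVM generates automatically. Once you have established that the problem lies in the JVM itself, search online (Google) and check Sun's official site for known issues.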