现在 IT 界普遍高并发, 分布式环境, 难免遇到
死锁
,死循环
等问题, 平时开发我们一般都可以停掉服务, 然后打 trace ---> 编译 ---> 修改源代码 ---> 重新编译 ---> ..... ---> 解决问题
, 或者通过集成开发环境(如:IDEA
,Eclipse
)提供的Debug
功能打断点 ---> watch variable ---> 进入/跳出方法 ---> .....---> 解决问题
, 但是, 当遇到生产环境却往往不知所措, 因为生产环境是不能随便让你停止服务的, 并且生产环境是不会开放 debug 端口的(debug 需要开放调试<>)
, 那么, 线上死锁如何定位? 怎么---> 解决问题
?实际上, Java 已经为我们提供了一些好用的定位分析工具, 当然, 这是一句废话, 你肯定知道, 你还知道哪些命令在 JDK 的安装目录下的
bin
下和java
以及javac
在一块躺着. 是的, 的确如此, 你可真是一个小机灵鬼哦(奈何从来没用过是不是? 今天帅帅给你一个应用场景, 哈哈....)
01
—
jps
1.1 jps 虚拟机进程状况工具
要定位分析 java 死锁, 死循环你得先知道发生问题的
LVMID
(Logical Volume Manager ID), 就好比你要看 Windows 上的一个应用的详细信息, 你得先找到这个应用运行在哪个进程上一样. 或者你可以将LVMID
理解为PID
, 通常LVMID
和 进程PID
是一致的, 但是并不非得一样.因此, 要想知道
LVMID
, 可以通过查询 PID, 但是其一这是不保险的, 其二, 当同时启动了多个虚拟机进程时, 无法通过名称区分, 就必须得使用jps
了(任务管理器里面有 n 个 java.exe).
JackLi:~ dreamli$ jps
1461
1847 GradleDaemon
1210
1851 RunnerApplication
1852 Jps
JackLi:~ dreamli$
上面是一个简单的使用, 第一列的数字就是
LVMID
, 后面跟的是启动的主类. 帅帅使用的gradle
构建工具, 所以有一个GradleDaemon
, 然后jfoa
项目的启动入口在RunnerApplication
1.2 jps 参数
jps 有以下几个参数
-q
只输出LVMID
JackLi:~ dreamli$ jps -q
1875
1461
1847
1210
1851
-m
输出传给 main 函数的参数
JackLi:~ dreamli$ jps -m
1876 Jps -m
1461
1847 GradleDaemon 6.2.2
1210
1851 RunnerApplication
-l
输出主类的全名, 如果进程执行的是 jar, 则会输出 jar 路径
JackLi:~ dreamli$ jps -l
1461
1877 sun.tools.jps.Jps
1847 org.gradle.launcher.daemon.bootstrap.GradleDaemon
1210
1851 club.javafamily.runner.RunnerApplication
-v
输出虚拟机进程启动时 JVM 参数
JackLi:~ dreamli$ jps -v
1461 -Dosgi.requiredJavaVersion=1.8 -Dosgi.instance.area.default=@user.home/eclipse-workspace -XX:+UseG1GC -XX:+UseStringDeduplication -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -Dosgi.requiredJavaVersion=1.8 -Xms256m -Xmx1024m -Xdock:icon=../Resources/Eclipse.icns -XstartOnFirstThread -Dorg.eclipse.swt.internal.carbon.smallFonts -javaagent:/Applications/Eclipse.app/Contents/Eclipse/lombok.jar
1878 Jps -Dapplication.home=/Users/dreamli/Software/java/zulu-jfx-252/zulu-8.jdk/Contents/Home -Xms8m
1847 GradleDaemon -Xmx2g -Dfile.encoding=UTF-8 -Duser.country=CN -Duser.language=zh -Duser.variant
1210 -Xms128m -Xmx2048m -XX:ReservedCodeCacheSize=240m -XX:+UseCompressedOops -Dfile.encoding=UTF-8 -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -ea -Dsun.io.useCanonCaches=false -Djava.net.preferIPv4Stack=true -Djdk.http.auth.tunneling.disabledSchemes="" -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -Xverify:none -XX:ErrorFile=/Users/dreamli/java_error_in_idea_%p.log -XX:HeapDumpPath=/Users/dreamli/java_error_in_idea.hprof -javaagent:/Applications/IntelliJ IDEA.app/Contents/bin/jetbrains-agent.jar -Djb.vmOptionsFile=/Users/dreamli/Library/Preferences/IntelliJIdea2019.3/idea.vmoptions -Didea.home.path=/Applications/IntelliJ IDEA.app/Contents -Didea.executable=idea -Didea.paths.selector=IntelliJIdea2019.3
1851 RunnerApplication -Djava.rmi.server.hostname=127.0.0.1 -Dspring.profiles.active=dev -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/Users/dreamli/Workspace/MyRepository/javafamily/jfoa/runner/build/server-temp -Duser.country=CN -Duser.language=zh -Duser.variant
02
—
jstack
2.1 jstack: java 堆栈跟踪工具
jstack
命令用于生成当前时刻线程快照(一般称为threaddump
文件或者javacore
文件), 线程快照就是当前虚拟机内每一条线程正在执行的方法堆栈集合, 生成线程快照的目的是定位线程出现长时间停顿的原因, 如: 线程死锁, 死循环, 请求外部资源的长时间等待等都是导致线程长时间停顿的常见原因. 基本命令格式如下:jstack [options] lvmid
JackLi:~ dreamli$ jps -m
1461
1847 GradleDaemon 6.2.2
1210
1851 RunnerApplication
1980 Jps -m
JackLi:~ dreamli$ jstack 1851
2020-11-26 23:59:27
Full thread dump OpenJDK 64-Bit Server VM (25.252-b14 mixed mode):
"Attach Listener" #62 daemon prio=9 os_prio=31 tid=0x00007faece823800 nid=0xac03 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"MessageBroker-2" #61 prio=5 os_prio=31 tid=0x00007faed3018800 nid=0x15403 waiting on condition [0x000070000ab6c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000773124fa8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2.2 jstack 参数
jstack
有以下几个参数:
-F
: 当正常输出的请求不被响应时, 强制输出线程堆栈-l
: 除堆栈外显示关于锁的信息-m
: 调用本地方法的话, 可以显示 C/C++ 的堆栈
03
—
死锁定位分析
昨晚代码撸到半夜, 提交了很多代码, 没想到......死锁------留下了心疼自己的泪水...
3.1 jps 定位 lvmid
第一步不用说了, 先找到
lvmid
3.2 jstack 获取线程堆栈
3.1 提点 Linux 小知识
- 将线程堆栈写入本地文件
jstack -m 1851 >> ....您的路径../jfoa/jstack-2020-11-26.log
- 获取远程 Linux 上的堆栈文件
scp root@javafamily.club:/root/jfoa-logs/jstack-2020-11-26.log ./jstack.txt
3.2 线程堆栈文件分析
这里贴出来一些核心的地方
"http-nio-80-exec-6" #58 daemon prio=5 os_prio=0 tid=0x00007efc15ccb800 nid=0x40 waiting on condition [0x00007efbdd490000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e051e308> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:107)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:33)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
at club.javafamily.runner.web.portal.service.SubjectVoteService.changeVote(SubjectVoteService.java:93)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:749)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$848/1976926828.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00000000dd01f1a0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"https-jsse-nio-443-exec-5" #45 daemon prio=5 os_prio=0 tid=0x00007efc15a17800 nid=0x33 waiting on condition [0x00007efbde19a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getVoteCurrentCount(SubjectVoteService.java:159)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
...
Locked ownable synchronizers:
- <0x00000000e0520d38> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"https-jsse-nio-443-exec-10" #50 daemon prio=5 os_prio=0 tid=0x00007efc15c2d800 nid=0x38 waiting on condition [0x00007efbddc94000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0363ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at club.javafamily.runner.web.portal.service.SubjectVoteService.getSubjectVoteDto(SubjectVoteService.java:61)
at club.javafamily.runner.web.portal.service.SubjectVoteService$$FastClassBySpringCGLIB$$c43301b2.invoke()
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
Locked ownable synchronizers:
- <0x00000000e051fc38> (a java.util.concurrent.ThreadPoolExecutor$Worker)
这里主要分析
Locked ownable synchronizers
以及"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
Locked ownable synchronizers
显示了锁信息"jfoa-1" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [0x00007efbda87f000]
如下
"线程名" #78 prio=5 os_prio=0 tid=0x00000000009ef000 nid=0x52 waiting on condition [等待锁状态, 0(0x0000000000000000) 代表没锁]
java.lang.Thread.State: 线程状态
然后结合方法调用堆栈分析死锁, 死循环的原因.
04
—
问题重现
帅帅的问题最终定位到原因是
ReentrantReadWriteLock
在读锁中获取写锁.(在写锁中获取读锁不会死锁)
- Service
package org.jack.thread01;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
/**
* @Description: 在读锁中获取写锁
* @Warning: 死锁
* @Author: Jack Li
* @Package: thread-01 - ThreadService
* @Date: Nov 26, 2020 10:57:49 PM
* @Version: 1.0.0
* @TimeComplexity: Required[O(n)] ---- Current[O(n^2)]
* @ExecuteResult: Success!
* @Status: Accepted
*/
public class ThreadService {
public void method1() {
System.out.println("Method 1 准备获取读锁");
lock.readLock().lock();
System.out.println("Method 1 得到读锁");
try {
Thread.sleep(2000);
System.out.println("Method 1 准备获取写锁");
method3();
System.out.println("Method 1 获得写锁并释放");
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.readLock().unlock();
System.out.println("Method 1 释放读锁");
}
}
public void method2() {
System.out.println("Method 2 准备获取写锁");
lock.writeLock().lock();
System.out.println("Method 2 获得写锁");
try {
Thread.sleep(1000);
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.writeLock().unlock();
System.out.println("Method 2 释放写锁");
}
}
public void method3() {
lock.writeLock().lock();
try {
Thread.sleep(500);
}
catch(Exception e) {
e.printStackTrace();
}
finally {
lock.writeLock().unlock();
}
}
private final ReadWriteLock lock = new ReentrantReadWriteLock();
}
- 程序入口
public class DealLockTest {
public static void main(String[] args) {
new Thread(() -> {
threadService.method1();
}) .start();
new Thread(() -> {
threadService.method2();
}) .start();
}
private static ThreadService threadService = new ThreadService();
// private static ThreadService2 threadService = new ThreadService2();
}
- 线程堆栈
"Thread-1" #11 prio=5 os_prio=31 tid=0x00007fbda01c2000 nid=0xa903 waiting on condition [0x0000700003e8c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.jack.thread01.ThreadService.method2(ThreadService.java:50)
at org.jack.thread01.DealLockTest.lambda$1(DealLockTest.java:11)
at org.jack.thread01.DealLockTest$$Lambda$2/295530567.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-0" #10 prio=5 os_prio=31 tid=0x00007fbda01be800 nid=0x5503 waiting on condition [0x0000700003d89000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076abbc1b8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.jack.thread01.ThreadService.method3(ThreadService.java:68)
at org.jack.thread01.ThreadService.method1(ThreadService.java:31)
at org.jack.thread01.DealLockTest.lambda$0(DealLockTest.java:7)
at org.jack.thread01.DealLockTest$$Lambda$1/250421012.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Thread-0 先获取了读锁, 然后在读锁还没释放时又去获取写锁, 获取写锁时又要等待所有的读锁释放.从而导致了死锁.