这是一个开发中遇到的一个问题,场景是单次http请求需要调用很多次上游接口,所以采用线程池并发调用。开发中没有注意,外层任务和内层任务都是用了线程池,并且使用的是同一个线程池,造成了资源竞争,外层需要内层执行结束的结果,内层需要外层释放的线程,两者相持不下,造成程序卡死。
用代码模拟个场景,同一个线程池,submit一个任务之后,又在任务中submit一个新的线程任务。这里定一个只有一个核心线程数的线程池,用于展示现象,并不特指只有单线程才会出现bug现象。
package com.example.springdemo;
import org.springframework.scheduling.concurrent.CustomizableThreadFactory;
import java.util.concurrent.*;
public class TestA {
private static ThreadPoolExecutor es = new ThreadPoolExecutor(1, 1,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(), new CustomizableThreadFactory("test-thread-"), new ThreadPoolExecutor.CallerRunsPolicy());
/*private static ThreadPoolExecutor esIn = new ThreadPoolExecutor(2, 2,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(), new CustomizableThreadFactory("test-thread-in-"), new ThreadPoolExecutor.CallerRunsPolicy());*/
public static void main(String[] args) throws ExecutionException, InterruptedException {
//商品信息
System.out.println("商品信息查询开始");
Future<?> fProduct = es.submit(() -> {
//这里需要组装两部分信息 1是基本信息 2是促销信息
//为了加快速度,促销信息用一个新线程查,基本信息用当前线程查
Future<?> fPromo = es.submit(() -> {
System.out.println("这句代表查询了商品促销信息接口");
});
System.out.println("这句代表查询了商品基本信息接口");
try {
fPromo.get();
System.out.println("基本信息和促销信息都拿到了");
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
});
fProduct.get();
System.out.println("商品信息查询结束");
}
}
调试运行,结果是只打印了两句。基本可以证明查询促销任务没有执行(因为没有打印执行的文案)。
更加严谨,我们看下jstack里到底发生了什么
"test-thread-1" #14 prio=5 os_prio=0 tid=0x0000000020e28800 nid=0x3ea8 waiting on condition [0x000000002186e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076c2e9430> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.example.springdemo.TestA.lambda$main$1(TestA.java:27)
at com.example.springdemo.TestA$$Lambda$1/407858146.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)"Service Thread" #13 daemon prio=9 os_prio=0 tid=0x000000001ffc6000 nid=0x2cd4 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread3" #12 daemon prio=9 os_prio=2 tid=0x000000001ff38000 nid=0x1518 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread2" #11 daemon prio=9 os_prio=2 tid=0x000000001ff35800 nid=0x4614 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread1" #10 daemon prio=9 os_prio=2 tid=0x000000001ff1c000 nid=0x2374 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #9 daemon prio=9 os_prio=2 tid=0x000000001ff19800 nid=0x33e8 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"JDWP Command Reader" #8 daemon prio=10 os_prio=0 tid=0x000000001eff1000 nid=0x2338 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"JDWP Event Helper Thread" #7 daemon prio=10 os_prio=0 tid=0x000000001eff0000 nid=0x4738 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"JDWP Transport Listener: dt_socket" #6 daemon prio=10 os_prio=0 tid=0x000000001efe4800 nid=0x1848 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001ef80800 nid=0x254 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001efdb000 nid=0x301c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x000000001d077000 nid=0x13b0 in Object.wait() [0x000000001f43e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076b388ec8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x000000076b388ec8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x000000001d070000 nid=0x2fc4 in Object.wait() [0x000000001ef3f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076b386b68> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x000000076b386b68> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)"main" #1 prio=5 os_prio=0 tid=0x000000000344b000 nid=0xb84 waiting on condition [0x00000000033df000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076c2bdd50> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.example.springdemo.TestA.main(TestA.java:33)"VM Thread" os_prio=2 tid=0x000000001d067000 nid=0x3fac runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x0000000003460800 nid=0x2528 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x0000000003462000 nid=0x3f30 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x0000000003464000 nid=0x2a0 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000003465800 nid=0x4ae8 runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x0000000003468800 nid=0x49ac runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x0000000003469800 nid=0x6a0 runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x000000000346d000 nid=0x4098 runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x000000000346e000 nid=0x2944 runnable
"VM Periodic Task Thread" os_prio=2 tid=0x00000000200e2000 nid=0x4304 waiting on condition
JNI global references: 3061
可以看到,jstack发现两处java.lang.Thread.State: WAITING (parking),分别是线程main和test-thread-1 ,行数是33和27行。
分析一下,33行park是因为27行park,这里没法拿到返回结果,所以阻塞住了,那么27行为什么停止呢?
27行应该是22行这个任务提交了,但是线程没有执行(没有执行的println打出),所以get肯定拿不到结果,造成阻塞。
为什么提交却不执行呢?这里需要知道线程池的执行机制,提交任务时先增加worker,当到worker到核心线程数量时,就会加入队列了。。。。。(剩下的过程就不说了,解答这个问题已经够了)。
因为我们线程池es核心线程数是1,所以19行就占用了一个线程,那么22行提交的时候,因为已经到了核心线程数,那么这个任务就会加入队列,等待线程池中的线程空闲了消费队列中的任务。27行去尝试拿结果,那么这个线程就会陷入等待任务完成的状态。因为这个线程陷入等待的状态,所以外层33就拿不到执行结果,也是一直wait,一直占用着这一个线程,不会释放,27行等待的任务也不会拿到线程去执行然后返回结果,这个就造成了jstack两个等待的情况。
总体而言,造成阻塞的原因就是内层需要外层释放线程去执行,外层需要拿到内层执行的结果。外层不拿到结果就不释放线程,内层不拿到线程就得不到结果。造成了另一种形式的资源竞争死锁。但是通过jconsole等工具却检查不到,排查还是有一定难度的。