References:
docker run hangs: troubleshooting record
Alibaba: approaches and methods for troubleshooting Kubernetes cluster problems

1. First, confirm that docker version reports consistent versions
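
One way to script this check is sketched below. It assumes that "consistent" means the docker client and the daemon report the same version; if the comparison is meant across hosts, run it on each host and compare the printed lines. The function name docker_versions_match is made up for illustration.

```shell
# Compare the docker client and daemon versions (hypothetical helper).
# Returns 0 if they match, 1 if they differ, 2 if docker itself failed.
docker_versions_match() {
    client="$(docker version --format '{{.Client.Version}}')" || return 2
    server="$(docker version --format '{{.Server.Version}}')" || return 2
    echo "client=$client server=$server"
    [ "$client" = "$server" ]
}
```

Usage: docker_versions_match || echo "version mismatch or docker unreachable"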

2. Check the /var/log/messages log, which shows the following error

The maximum number of pending replies per connection has been reached
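
To see how often the limit is being hit, the message can be counted in the log. This is a minimal sketch: the function name count_pending_reply_errors is made up for illustration, and the default log path is an assumption about this host.

```shell
# Count the lines in a log file that contain the dbus limit error.
# Hypothetical helper; the default path assumes a RHEL/CentOS-style syslog.
count_pending_reply_errors() {
    log_file="${1:-/var/log/messages}"
    # -F: treat the message as a fixed string; -c: print the number of matching lines.
    grep -F -c 'The maximum number of pending replies per connection has been reached' "$log_file"
}
```

Usage: count_pending_reply_errors, or count_pending_reply_errors /path/to/another/log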

3. Troubleshooting process

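A web search suggests that this is a limit the system imposes to keep a program from consuming too many system resources and causing a denial of service. Check which package owns /etc/dbus-1/session.conf and which files that package contains: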
rpm -qf /etc/dbus-1/session.conf
rpm -ql dbus-1.10.24-12.el7.x86_64

Near the end of /usr/share/dbus-1/session.conf there is a max_replies_per_connection parameter whose name resembles the error message; its default value is 50000
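
The configured value can be read straight from the file. The sketch below assumes the limit is written in the usual dbus form, <limit name="...">value</limit>, on a single line. The function name dbus_limit is made up for illustration.

```shell
# Print the value of a <limit name="..."> element from a dbus config file.
# Hypothetical helper; assumes the whole element sits on one line.
dbus_limit() {
    limit_name="$1"
    conf_file="${2:-/usr/share/dbus-1/session.conf}"
    # -n with the p flag prints only the lines where the substitution matched.
    sed -n "s|.*<limit name=\"${limit_name}\">\([^<]*\)</limit>.*|\1|p" "$conf_file"
}
```

Usage: dbus_limit max_replies_per_connection. A different config file can be given as the second argument.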

In my case, restarting dbus resolved the problem
systemctl restart dbus

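In the end I did not pin down exactly which parameter caused the problem, though it is clearly related. I will keep tracing it the next time the problem reproduces.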

Appendix: tracing a process's system calls with strace (may come in handy)

strace docker run --rm image:tag

Failure case: it hangs at the following point

clone(child_stack=0x7ff631ffafb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ff631ffb9d0, tls=0x7ff631ffb700, child_tidptr=0x7ff631ffb9d0) = 25759
 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
 futex(0x55c38622c0a8, FUTEX_WAIT, 0, NULL) = 0
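
The trace above follows only the main thread, which waits in futex(FUTEX_WAIT) right after clone(). The call that actually blocks may be in another thread. The sketch below assumes strace is rerun with -f (follow threads) and -o (write to a file), so that every log line begins with a PID. The helper last_call_per_pid and the log path are made up for illustration.

```shell
# Rerun the hanging command with thread following and timestamps, e.g.:
#   strace -f -tt -o /tmp/docker-run.strace docker run --rm image:tag
# (/tmp/docker-run.strace is an arbitrary path chosen for this sketch.)

# Print the last line logged for each PID in a "strace -f -o" log,
# which is typically the call each thread was in when tracing stopped.
last_call_per_pid() {
    awk '{ last[$1] = $0 } END { for (pid in last) print last[pid] }' "$1" | sort -n
}
```

Usage: last_call_per_pid /tmp/docker-run.strace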

Normal case

clone(child_stack=0x7fa86cddffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fa86cde09d0, tls=0x7fa86cde0700, child_tidptr=0x7fa86cde09d0) = 15449
 rt_sigprocmask(SIG_S…
 …
 write(2, "\33[K", 3) = 3
 …
 write(2, "8c5a7da1afbc: ", 14) = 14
 ioctl(1, TIOCGWINSZ, {ws_row=32, ws_col=138, ws_xpixel=0, ws_ypixel=0}) = 0
 write(2, "Pulling fs layer \r", 18) = 18
 write(2, "\33[1B", 4) = 4
 …
 …
 +++ exited with 0 +++

busctl tree


busctl tree prints the object tree (the exposed object paths) of every service on the bus