近来解决了一个可以使薪资提升10K的bug,做如下总结:

硬件平台:MTK6572

软件平台:Android4.4.2

具体现象:

        系统启动完成之后,高频率按powerkey会导致系统卡死,等待大概3~5分钟之后,进入开机动画,然后重启。在这3~5分钟之内,底层的事件都有,但是上层没有反应,而且dumpsys挂了,啥都出不来。

"*** APP TRANSITION TIMEOUT"

这一行是WindowManagerService.java打印出来的,出现的原因可能是由于window切换的时候,发生了超时,无响应之类。但考虑到无响应系统应该有anr弹窗出来,而这种状态下没有,因此一开始没有倾向于anr的方向考虑。

后来,抱着看一看的想法,查看了data/anr目录的traces日志,发现其中大有文章啊,下面截取当中的片段:

"main" prio=5 tid=1 MONITOR
 | group="main" sCount=1 dsCount=0 obj=0x41858de0 self=0x4176e8e8
 | sysTid=591 nice=-2 sched=0/0 cgrp=apps handle=1073918340
 | state=S schedstat=( 6320338488 10332730389 16603 ) utm=442 stm=190 core=1
 at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:~14759)
 - waiting to lock <0x41ac0000> (a com.android.server.am.ActivityManagerService) held by tid=68 (Binder_9)
 at android.app.ActivityManagerNative.broadcastStickyIntent(ActivityManagerNative.java:97)
 at com.android.server.BatteryService$8.run(BatteryService.java:573)
 at android.os.Handler.handleCallback(Handler.java:808)
 at android.os.Handler.dispatchMessage(Handler.java:103)
 at android.os.Looper.loop(Looper.java:193)
 at com.android.server.ServerThread.initAndLoop(SystemServer.java:1456)
 at com.android.server.SystemServer.main(SystemServer.java:1551)
 at java.lang.reflect.Method.invokeNative(Native Method)
 at java.lang.reflect.Method.invoke(Method.java:515)
 at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:825)
 at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:641)
 at dalvik.system.NativeStart.main(Native Method)
 ......
"Binder_9" prio=5 tid=68 MONITOR
 | group="main" sCount=1 dsCount=0 obj=0x42065ef8 self=0x5f1e8758
 | sysTid=1055 nice=0 sched=0/0 cgrp=apps handle=1629852832
 | state=S schedstat=( 657378771 1141816455 2278 ) utm=38 stm=27 core=0
 at com.android.server.power.PowerManagerService.acquireWakeLockInternal(PowerManagerService.java:~923)
 - waiting to lock <0x41aaef28> (a java.lang.Object) held by tid=28 (InputReader)
 at com.android.server.power.PowerManagerService.acquireWakeLock(PowerManagerService.java:915)
 at android.os.PowerManager$WakeLock.acquireLocked(PowerManager.java:872)
 at android.os.PowerManager$WakeLock.acquire(PowerManager.java:841)
 at com.android.server.am.ActivityStackSupervisor.acquireLaunchWakelock(ActivityStackSupervisor.java:1975)
 at com.android.server.am.ActivityStack.startPausingLocked(ActivityStack.java:812)
 at com.android.server.am.ActivityStack.resumeTopActivityLocked(ActivityStack.java:1508)
 at com.android.server.am.ActivityStackSupervisor.resumeTopActivitiesLocked(ActivityStackSupervisor.java:2226)
 at com.android.server.am.ActivityStackSupervisor.resumeTopActivitiesLocked(ActivityStackSupervisor.java:2213)
 at com.android.server.am.ActivityStack.startActivityLocked(ActivityStack.java:1978)
 at com.android.server.am.ActivityStackSupervisor.startActivityUncheckedLocked(ActivityStackSupervisor.java:1960)
 at com.android.server.am.ActivityStackSupervisor.startActivityLocked(ActivityStackSupervisor.java:1336)
 at com.android.server.am.ActivityStackSupervisor.startActivityMayWait(ActivityStackSupervisor.java:807)
 at com.android.server.am.ActivityManagerService.startActivityAsUser(ActivityManagerService.java:3417)
 at com.android.server.am.ActivityManagerService.startActivity(ActivityManagerService.java:3403)
 at android.app.ActivityManagerNative.onTransact(ActivityManagerNative.java:136)
 at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2270)
 at android.os.Binder.execTransact(Binder.java:404)
 at dalvik.system.NativeStart.run(Native Method)...
...
"InputReader" prio=10 tid=28 MONITOR
 | group="main" sCount=1 dsCount=0 obj=0x41c5bba8 self=0x5d3d3f78
 | sysTid=734 nice=-8 sched=0/0 cgrp=apps handle=1564294448
 | state=S schedstat=( 168765462 258998536 602 ) utm=10 stm=6 core=1
 at com.android.server.am.ActivityManagerService.getTasks(ActivityManagerService.java:~7488)
 - waiting to lock <0x41ac0000> (a com.android.server.am.ActivityManagerService) held by tid=68 (Binder_9)
 at android.app.ActivityManager.getRunningTasks(ActivityManager.java:776)
 at android.app.ActivityManager.getRunningTasks(ActivityManager.java:812)
 at com.android.server.power.PowerManagerService.ifSendNoSleepBroadcast(PowerManagerService.java:1767)
 at com.android.server.power.PowerManagerService.goToSleepNoUpdateLocked(PowerManagerService.java:1707)
 at com.android.server.power.PowerManagerService.goToSleepInternal(PowerManagerService.java:1657)
 at com.android.server.power.PowerManagerService.goToSleepFromNative(PowerManagerService.java:1651)
 at dalvik.system.NativeStart.run(Native Method)


        里边还有好多关于等待锁的日志,一看到锁,敝人菊花紧了好几下,难道是传说中的死锁???!!!

细分析我发的三个线程的trace日志:

上述三个线程都处在MONITOR状态,一般认为是在等待什么系统资源。

"main"标识的线程为系统主线程,system_server进程,它此时在等待Binder_9的同步锁;

而“Binder_9”在等待InputReader线程的同步锁;

InputReader在干嘛呢?它也没有闲着,它也在等待Binder_9的同步锁!!!

因此,死锁忙等待出现了,系统卡死了,为啥会重启呢?因为系统的watchdog会定时尝试获取系统级的service内部的同步锁,来确定是否有service处于死锁状态,当watchdog发现获取不了某一service的同步锁之后,一直等待,则会做出系统死锁的结论,然后重启system_server,就会看到重启的现象喽。


         那么问题来了我们看上边3个线程的trace日志,基本判断出是PowerManagerService和ActivityManagerService死锁了,真是特么八竿子打不着啊,Google的人又不脑残,如果没人改这俩服务,怎么可能死锁呢?

原来我们改过PowerManagerService的实现逻辑,具体参看敝人的另一篇博文:

改动后,每有一次短按powerkey事件,都会发送一个broadcast给我司的app端,但是这为啥会导致死锁呢?

        原来app端在接收到PMS发送的广播后,会finish掉top的一个activity,而activity在finish的过程中,ActivityStack.java会防止系统进入休眠态从而会申请wake lock锁,而wake lock锁需要PMS分配,于是频按power key就出现了PMS要依靠AMS发送广播给app,而app端在finish activity的时候又需要PMS的wake lock锁分配,这样,两者就处于互相等待的状态了,由于我司关于PMS的改动逻辑不可废弃,临时又没有其他的办法,


因此,我大胆的决定,先把activity切换时候的wakelock锁的申请操作去掉。

if (!mService.isSleepingOrShuttingDown()) {
 //Disable this for screen off sleep system down.
  //mStackSupervisor.acquireLaunchWakelock();
  }

        这样临时就可以畅快的疯点powerkey了,未经过长期测试,先暂时这样改,如果有同仁知道这样改对系统的不利影响,欢迎恢复讨论~~~