Why Android Designed the ANR Mechanism

Reposted

mob64ca13fa2f9e 2024-07-07 20:15:14


1 Overview

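When an input event is processed too slowly, an ANR (Application Not Responding) is triggered. So how does ANR work internally, and which scenarios produce one? As the saying goes, "a craftsman must first sharpen his tools": to understand how input ANRs work, the previous articles in this series walked through the processing flow of the entire input framework, all as groundwork for this one. Before analyzing the ANR trigger mechanism and trigger scenarios, let's first review the input flow.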

1.1 InputReader

[Figure: InputReader processing flow]

InputReader's work consists mainly of the following steps:

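1. Call EventHub's getEvents() to read input events from the /dev/input/eventX nodes and convert each input_event struct (a raw kernel event) into a RawEvent struct. Each RawEvent is then converted by the matching InputMapper into the corresponding EventEntry: KeyEntry for key events, MotionEntry for touch events.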

Conversion result: input_event -> EventEntry

2. Append the event to the tail of InputDispatcher's mInboundQueue. Before it is queued, the event passes through two filters:

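IMS.interceptKeyBeforeQueueing: lets custom logic run before the event is dispatched
IMS.filterInputEvent: can intercept events; any event for which it returns false is intercepted here, never enters mInboundQueue and is not dispatched further. Otherwise the event proceeds to the next step
enqueueInboundEventLocked: appends the input event to the tail of mInboundQueue
mLooper->wake: wakes the InputDispatcher thread when needed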

3. In KeyboardInputMapper.processKey(), the time of the key-down event is recorded.
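The filter-then-enqueue part of step 2 can be pictured with a minimal C++ sketch. All names below (EventEntry, InboundQueueSketch, wakeCount) are hypothetical simplifications rather than AOSP code: the filter callback plays the role of IMS.filterInputEvent, and, as an assumption, the dispatcher is "woken" only when the inbound queue was previously empty.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical stand-ins for the framework types; not AOSP code.
enum class EntryType { Key, Motion };

struct EventEntry {
    EntryType type;
    int64_t eventTime;  // nanoseconds
};

class InboundQueueSketch {
public:
    // Plays the role of IMS.filterInputEvent: returning false drops the event.
    std::function<bool(const EventEntry&)> filter = [](const EventEntry&) { return true; };

    // Returns true if the event was queued for the dispatcher.
    bool notify(const EventEntry& entry) {
        if (!filter(entry)) {
            return false;  // intercepted: never reaches the inbound queue
        }
        bool needWake = inboundQueue.empty();  // assumption: wake only if it was idle
        inboundQueue.push_back(entry);         // enqueueInboundEventLocked: append to tail
        if (needWake) {
            ++wakeCount;                       // stands in for mLooper->wake()
        }
        return true;
    }

    std::deque<EventEntry> inboundQueue;
    int wakeCount = 0;
};
```

With a filter that only lets key events through, a motion event is rejected before it ever reaches the queue, and two queued keys cause a single wake-up.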

1.2 InputDispatcher

[Figure: InputDispatcher processing flow]

1. dispatchOnceInnerLocked(): takes an EventEntry from InputDispatcher's mInboundQueue. The time at which this method starts running (currentTime) becomes the delivery time (deliveryTime) of the resulting dispatchEntry.

2. dispatchKeyLocked(): when certain conditions are met, posts the command doInterceptKeyBeforeDispatchingLockedInterruptible.

3. enqueueDispatchEntryLocked(): creates a DispatchEntry and appends it to the connection's outboundQueue.

4. startDispatchCycleLocked(): takes DispatchEntry items out of outboundQueue and moves them into the connection's waitQueue.

5. runCommandsLockedInterruptible(): loops over mCommandQueue and processes every command in order. Commands are added to mCommandQueue through postCommandLocked(). The ANR callback command is executed at this point.

6. handleTargetsNotReadyLocked(): checks whether the wait has exceeded 5 s to decide whether to call onANRLocked().

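In step 15 of the flow diagram, sendMessage delivers the input event to the app side; once the app has processed the event, it sends back a finishInputEvent() message. The flow then returns to pollOnce().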
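How an event moves between a connection's two queues can be sketched as follows. This is a hypothetical model, not the real Connection class: it assumes every send succeeds (the real startDispatchCycleLocked stops when the socket is full), and it assumes entries are matched to the app's finish reply by a sequence number. An entry still sitting in waitQueue is exactly the "unfinished" event that the ANR checks later look at.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of a connection's two queues; not AOSP code.
struct DispatchEntry {
    uint32_t seq;          // sequence number used to match the app's finish reply
    int64_t deliveryTime;  // ns; set when dispatch of the entry starts
};

struct ConnectionSketch {
    std::deque<DispatchEntry> outboundQueue;  // queued, not yet sent
    std::deque<DispatchEntry> waitQueue;      // sent, waiting for the app to finish

    // enqueueDispatchEntryLocked: append a new entry to the outbound queue.
    void enqueue(uint32_t seq, int64_t deliveryTime) {
        outboundQueue.push_back({seq, deliveryTime});
    }

    // startDispatchCycleLocked: "send" every outbound entry and move it to waitQueue.
    void startDispatchCycle() {
        while (!outboundQueue.empty()) {
            waitQueue.push_back(outboundQueue.front());  // the socket send would go here
            outboundQueue.pop_front();
        }
    }

    // The app replied finishInputEvent(seq): remove the matching entry.
    bool finish(uint32_t seq) {
        for (auto it = waitQueue.begin(); it != waitQueue.end(); ++it) {
            if (it->seq == seq) {
                waitQueue.erase(it);
                return true;
            }
        }
        return false;  // unknown or already finished
    }
};
```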

1.3 UI Thread

[Figure: UI thread input processing flow]

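The InputDispatcher thread listens on the server end of the socket; when a message arrives, it calls back InputDispatcher.handleReceiveCallback()
The UI main thread listens on the client end of the socket; when a message arrives, it calls back NativeInputEventReceiver.handleEvent()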

ANRs are triggered mainly in the InputDispatcher stage, so next we walk through the trigger process from the ANR point of view.

2 ANR Handling Flow

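The ANR time window is the interval from the call to findFocusedWindowTargetsLocked() during the current event dispatch until the next call to resetANRTimeoutsLocked(). The following 5 functions perform the reset; all of them are in InputDispatcher.cpp: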

dispatchOnceInnerLocked
setInputDispatchMode
setFocusedApplication
releasePendingEventLocked
resetAndDropEverythingLocked

In short, resetANRTimeoutsLocked gets a chance to run in the following scenarios:

Unfreezing the screen, and the moments the system boots or shuts down (thawInputDispatchingLw, setEventDispatchingLw, which finally call setInputDispatchMode)
A change of the app focused by WMS (WMS.setFocusedApp, IMS.setFocusedApplication, setFocusedApplication)
Setting an input filter (IMS.setInputFilter, which in turn calls resetAndDropEverythingLocked)
Dispatching the next event (dispatchOnceInnerLocked)
The end of a dispatch (when done is true at the end of dispatchOnceInnerLocked, which finally calls releasePendingEventLocked)

When the InputDispatcher thread reaches handleTargetsNotReadyLocked from findFocusedWindowTargetsLocked() and the 5 s timeout has been exceeded, it calls onANRLocked().
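The timing rule above can be modeled as a small state machine. This is a hypothetical sketch, not AOSP code: it starts the 5 s window the first time targets are found not ready, reports a timeout once the current time reaches the deadline, and forgets the wait when reset() (standing in for resetANRTimeoutsLocked) is called.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the "wait for targets" ANR window; not AOSP code.
// All times are in nanoseconds.
class AnrWindowSketch {
public:
    static constexpr int64_t kDefaultTimeoutNs = 5000000000LL;  // 5 s

    // Called when targets are not ready (handleTargetsNotReadyLocked): starts the
    // window on first use and reports whether the deadline has been reached.
    bool targetsNotReady(int64_t now) {
        if (!waiting_) {
            waiting_ = true;
            waitStartTime_ = now;
            timeoutTime_ = now + kDefaultTimeoutNs;
        }
        return now >= timeoutTime_;  // true: onANRLocked() would be called
    }

    // resetANRTimeoutsLocked(): forget any pending wait.
    void reset() { waiting_ = false; }

    int64_t waitStartTime() const { return waitStartTime_; }

private:
    bool waiting_ = false;
    int64_t waitStartTime_ = 0;
    int64_t timeoutTime_ = 0;
};
```

Because the deadline is only set when a new wait begins, repeated "not ready" checks during the same wait do not push it back; only a reset does.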

2.1 onANRLocked

void InputDispatcher::onANRLocked(nsecs_t currentTime,
    const sp<InputApplicationHandle>& applicationHandle,
    const sp<InputWindowHandle>& windowHandle,
    nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
    
    float dispatchLatency = (currentTime - eventTime) * 0.000001f;
    float waitDuration = (currentTime - waitStartTime) * 0.000001f;

    ALOGI("Application is not responding: %s. "
"It has been %0.1fms since event, %0.1fms since wait started. Reason: %s",
getApplicationWindowLabelLocked(applicationHandle, windowHandle).string(),
dispatchLatency, waitDuration, reason);

    // Capture a snapshot of the state at the time of the ANR
    time_t t = time(NULL);
    struct tm tm;
    localtime_r(&t, &tm);
    char timestr[64];
    strftime(timestr, sizeof(timestr), "%F %T", &tm);
    mLastANRState.clear();
    mLastANRState.append(INDENT "ANR:\n");
    mLastANRState.appendFormat(INDENT2 "Time: %s\n", timestr);
    mLastANRState.appendFormat(INDENT2 "Window: %s\n",
    getApplicationWindowLabelLocked(applicationHandle, windowHandle).string());
    mLastANRState.appendFormat(INDENT2 "DispatchLatency: %0.1fms\n", dispatchLatency);
    mLastANRState.appendFormat(INDENT2 "WaitDuration: %0.1fms\n", waitDuration);
    mLastANRState.appendFormat(INDENT2 "Reason: %s\n", reason);
    dumpDispatchStateLocked(mLastANRState);

    // Post the ANR command to mCommandQueue
    CommandEntry* commandEntry = postCommandLocked(
            & InputDispatcher::doNotifyANRLockedInterruptible);
    commandEntry->inputApplicationHandle = applicationHandle;
    commandEntry->inputWindowHandle = windowHandle;
    commandEntry->reason = reason;
}

onANRLocked() collects the ANR information, then builds a CommandEntry whose callback is doNotifyANRLockedInterruptible and adds it to the mCommandQueue queue.

Then, in the next iteration of InputDispatcher.dispatchOnce, runCommandsLockedInterruptible() runs first, taking every command out of mCommandQueue and executing them one by one. This is how the ANR handler, doNotifyANRLockedInterruptible, ends up being executed.
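The post-then-run pattern can be sketched with std::function and std::mutex. The class and its members are hypothetical, not AOSP code. What it illustrates is that each command receives the held lock and may release it while calling out (as doNotifyANRLockedInterruptible does around mPolicy->notifyANR()), and that the runner reports whether any command ran.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch of the command-queue pattern; not AOSP code.
class CommandQueueSketch {
public:
    // A command gets the held lock so it can unlock/relock around slow calls.
    using Command = std::function<void(std::unique_lock<std::mutex>&)>;

    // postCommandLocked(): the caller must already hold mLock.
    void postCommandLocked(Command cmd) { commands_.push_back(std::move(cmd)); }

    // runCommandsLockedInterruptible(): run every queued command in FIFO order.
    // Returns true if at least one command ran.
    bool runCommandsLockedInterruptible(std::unique_lock<std::mutex>& lock) {
        if (commands_.empty()) {
            return false;
        }
        while (!commands_.empty()) {
            Command cmd = std::move(commands_.front());
            commands_.pop_front();
            cmd(lock);  // the command must hand the lock back locked
        }
        return true;
    }

    std::mutex mLock;

private:
    std::deque<Command> commands_;
};
```

In the usage below, the posted command drops the lock before "calling the policy" and re-acquires it afterwards, mirroring the unlock/lock pair in doNotifyANRLockedInterruptible.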

2.2 doNotifyANRLockedInterruptible

InputDispatcher.cpp

void InputDispatcher::doNotifyANRLockedInterruptible(
        CommandEntry* commandEntry) {
    mLock.unlock();
    
    nsecs_t newTimeout = mPolicy->notifyANR(
        commandEntry->inputApplicationHandle, commandEntry->inputWindowHandle,
        commandEntry->reason);

    mLock.lock();
    // newTimeout = 5s
    resumeAfterTargetsNotReadyTimeoutLocked(newTimeout,
            commandEntry->inputWindowHandle != NULL
            ? commandEntry->inputWindowHandle->getInputChannel() : NULL);
}

As established earlier, mPolicy here is the NativeInputManager.

2.3 NativeInputManager.notifyANR

com_android_server_input_InputManagerService.cpp

nsecs_t NativeInputManager::notifyANR(
    const sp<InputApplicationHandle>& inputApplicationHandle,
    const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
    ......
    JNIEnv* env = jniEnv();
    ScopedLocalFrame localFrame(env);

    jobject tokenObj = javaObjectForIBinder(env, token);
    jstring reasonObj = env->NewStringUTF(reason.c_str());

    // Call the Java method
    jlong newTimeout = env->CallLongMethod(mServiceObj,
                gServiceClassInfo.notifyANR, tokenObj,
                reasonObj);
    if (checkAndClearExceptionFromCallback(env, "notifyANR")) {
        newTimeout = 0; // An exception was thrown: clear it and reset the timeout
    } else {
        assert(newTimeout >= 0);
    }
    return newTimeout;
}

First, look at register_android_server_InputManager:

int register_android_server_InputManager(JNIEnv* env) {
    int res = jniRegisterNativeMethods(env,
    "com/android/server/input/InputManagerService",
    gInputManagerMethods, NELEM(gInputManagerMethods));

    jclass clazz;
    FIND_CLASS(clazz, "com/android/server/input/InputManagerService");
    ......
    GET_METHOD_ID(gServiceClassInfo.notifyANR, clazz,
            "notifyANR",
            "(Landroid/os/IBinder;Ljava/lang/String;)J");
    ......
}

This shows that gServiceClassInfo.notifyANR refers to IMS.notifyANR.

2.4 IMS.notifyANR

private long notifyANR(IBinder token, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            token, reason);
}

Here mWindowManagerCallbacks is an InputManagerCallback object.

2.5 InputManagerCallback.notifyANR

InputManagerCallback.java

public long notifyANR(IBinder token, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    boolean aboveSystem = false;
    synchronized (mService.mGlobalLock) {
        if (token != null) {
            windowState = mService.windowForClientLocked(null, token, false);
            if (windowState != null) {
                appWindowToken = windowState.mAppToken;
            }
        }
        // Log the input dispatch timeout
        if (windowState != null) {
            Slog.i(TAG_WM, "Input event dispatching timed out "
                    + "sending to " + windowState.mAttrs.getTitle()
                    + ".  Reason: " + reason);
            // Figure out whether this window is layered above system windows.
            // We need to do this here to help the activity manager know how to
            // layer its ANR dialog.
            int systemAlertLayer = mService.mPolicy.getWindowLayerFromTypeLw(
                    TYPE_APPLICATION_OVERLAY,
                    windowState.mOwnerCanAddInternalSystemWindow);
            aboveSystem = windowState.mBaseLayer > systemAlertLayer;
        } else if (appWindowToken != null) {
            Slog.i(TAG_WM, "Input event dispatching timed out "
                    + "sending to application " + appWindowToken.stringName
                    + ".  Reason: " + reason);
        } else {
            Slog.i(TAG_WM, "Input event dispatching timed out "
                    + ".  Reason: " + reason);
        }
        mService.saveANRStateLocked(appWindowToken, windowState, reason);
    }

    // All the calls below need to happen without the WM
    // lock held since they call into AM.
    mService.mAtmInternal.saveANRState(reason);
        
    if (appWindowToken != null && appWindowToken.appToken != null) {
        final boolean abort = appWindowToken.keyDispatchingTimedOut(reason,
                (windowState != null) ? windowState.mSession.mPid : -1);
        if (! abort) {
            return appWindowToken.inputDispatchingTimeoutNanos; //5s
        }
    } else if (windowState != null) {
        long timeout = mService.mAmInternal.inputDispatchingTimedOut(
                windowState.mSession.mPid, aboveSystem, reason);
        if (timeout >= 0) {
            return timeout * 1000000L; //5s
        }
    }
    return 0;
}

AppWindowToken.java
boolean keyDispatchingTimedOut(String reason, int windowPid) {
    return mActivityRecord != null
            && mActivityRecord.keyDispatchingTimedOut(reason, windowPid);
}

When an input-related ANR occurs, the ANR information is printed to the system log with the tag WindowManager. There are three main kinds of log message:

Input event dispatching timed out sending to [windowState.mAttrs.getTitle()].  Reason: [reason]
Input event dispatching timed out sending to application [appWindowToken.stringName].  Reason: [reason]
Input event dispatching timed out .  Reason: [reason]
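The three message shapes follow directly from the branches in notifyANR. The helper below is a hypothetical C++ restatement (not framework code), handy for example when writing a log filter; the odd spacing around "Reason" is copied from the Java string literals above.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper reproducing the three message shapes; not AOSP code.
// windowTitle is set when a WindowState was found; appTokenName when only an
// AppWindowToken was found.
std::string timeoutLogMessage(const std::optional<std::string>& windowTitle,
                              const std::optional<std::string>& appTokenName,
                              const std::string& reason) {
    std::string msg = "Input event dispatching timed out ";
    if (windowTitle) {
        msg += "sending to " + *windowTitle;
    } else if (appTokenName) {
        msg += "sending to application " + *appTokenName;
    }
    return msg + ".  Reason: " + reason;
}
```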

2.6 DispatchingTimedOut

2.6.1 ActivityRecord.keyDispatchingTimedOut

final class ActivityRecord extends ConfigurationContainer {
    ......
    public boolean keyDispatchingTimedOut(String reason, int windowPid) {
        ActivityRecord anrActivity;
        WindowProcessController anrApp;
        boolean windowFromSameProcessAsActivity;
        synchronized (mAtmService.mGlobalLock) {
            anrActivity = getWaitingHistoryRecordLocked();
            anrApp = app;
            windowFromSameProcessAsActivity =
                    !hasProcess() || app.getPid() == windowPid || windowPid == -1;
        }

        if (windowFromSameProcessAsActivity) {
            return mAtmService.mAmInternal.inputDispatchingTimedOut(
            anrApp.mOwner, anrActivity.shortComponentName,
            anrActivity.appInfo, shortComponentName, app, false, reason);
        } else {
            // In this case another process added windows using
            // this activity token. So, we call the
            // generic service input dispatch timed out
            // method so that the right process is blamed.
            return mAtmService.mAmInternal.inputDispatchingTimedOut(
                    windowPid, false /* aboveSystem */, reason) < 0;
        }
    }
}
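The blame decision in keyDispatchingTimedOut boils down to a single boolean expression. The hypothetical function below restates it in C++; a windowPid of -1 means the window's pid was unknown, which is what notifyANR passes when windowState is null.

```cpp
#include <cassert>

// Hypothetical restatement of the "which process to blame" check; not AOSP code.
enum class Blame { ActivityProcess, WindowProcess };

Blame whomToBlame(bool activityHasProcess, int activityPid, int windowPid) {
    // Same expression as windowFromSameProcessAsActivity in ActivityRecord.
    bool windowFromSameProcessAsActivity =
            !activityHasProcess || activityPid == windowPid || windowPid == -1;
    return windowFromSameProcessAsActivity ? Blame::ActivityProcess
                                           : Blame::WindowProcess;
}
```

The WindowProcess case corresponds to another process having added a window with this activity's token, so the pid-based inputDispatchingTimedOut is used instead.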

2.6.2 AMS.inputDispatchingTimedOut

long inputDispatchingTimedOut(int pid, final boolean aboveSystem, String reason) {
    if (checkCallingPermission(FILTER_EVENTS) != PackageManager.PERMISSION_GRANTED) {
        throw new SecurityException("Requires permission " + FILTER_EVENTS);
    }
    ProcessRecord proc;
    long timeout;
    synchronized (this) {
        synchronized (mPidsSelfLocked) {
            proc = mPidsSelfLocked.get(pid); // look up the process record by pid
        }
        // The timeout is KEY_DISPATCHING_TIMEOUT_MS, i.e. timeout = 5s
        timeout = proc != null
                ? proc.getInputDispatchingTimeout() : KEY_DISPATCHING_TIMEOUT_MS;
    }

    if (inputDispatchingTimedOut(proc, null, null, null, null, aboveSystem, reason)) {
        return -1;
    }
    return timeout;
}


boolean inputDispatchingTimedOut(ProcessRecord proc,
    String activityShortComponentName, ApplicationInfo aInfo,
    String parentShortComponentName, WindowProcessController parentProcess,
    boolean aboveSystem, String reason) {
        if (checkCallingPermission(FILTER_EVENTS) !=
            PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }

        if (proc != null) {
            synchronized (this) {
                if (proc.isDebugging()) {
                    return false;
                }

                if (proc.getActiveInstrumentation() != null) {
                    Bundle info = new Bundle();
                    info.putString("shortMsg", "keyDispatchingTimedOut");
                    info.putString("longMsg", annotation);
                    finishInstrumentationLocked(
                    proc, Activity.RESULT_CANCELED, info);
                    return true;
                }
            }
            proc.appNotResponding(activityShortComponentName, aInfo,
                    parentShortComponentName, parentProcess,
                    aboveSystem, annotation);
        }
        return true;
}

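appNotResponding dumps the traces and other key information of the relevant processes at the moment of the ANR. Back in section 2.2, once the ANR has been handled, resumeAfterTargetsNotReadyTimeoutLocked is called.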
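Taken together, sections 2.5 and 2.6.2 form a small return-value protocol: AMS returns -1 when it has handled the ANR (or there is no process record), or the timeout in milliseconds when the dispatcher should keep waiting, and the windowState branch of InputManagerCallback.notifyANR converts that into nanoseconds, or 0 meaning "give up". The sketch below models this with hypothetical types (ProcSketch, AnrAction); it summarizes the Java code above rather than replacing it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified types; not AOSP code.
struct ProcSketch {
    bool debugging = false;
    bool hasActiveInstrumentation = false;
};

enum class AnrAction { None, KeepWaiting, FinishInstrumentation, AppNotResponding };

// Restates inputDispatchingTimedOut(ProcessRecord, ...): true means "ANR handled".
bool inputDispatchingTimedOutForProc(const ProcSketch* proc, AnrAction& action) {
    if (proc == nullptr) {
        action = AnrAction::None;  // no process record: nothing to report
        return true;
    }
    if (proc->debugging) {
        action = AnrAction::KeepWaiting;  // a process being debugged is not ANR'd
        return false;
    }
    if (proc->hasActiveInstrumentation) {
        action = AnrAction::FinishInstrumentation;  // end the instrumentation run
        return true;
    }
    action = AnrAction::AppNotResponding;  // proc.appNotResponding(...) dumps traces
    return true;
}

// Restates inputDispatchingTimedOut(int pid, ...): -1 if handled, else timeout in ms.
int64_t inputDispatchingTimedOutByPid(const ProcSketch* proc, int64_t timeoutMs,
                                      AnrAction& action) {
    return inputDispatchingTimedOutForProc(proc, action) ? -1 : timeoutMs;
}

// Restates the windowState branch of notifyANR: ms -> ns, or 0 to give up.
int64_t notifyAnrResultNs(int64_t timeoutMs) {
    return timeoutMs >= 0 ? timeoutMs * 1000000LL : 0;
}
```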

2.7 resumeAfterTargetsNotReadyTimeoutLocked

InputDispatcher.cpp

void InputDispatcher::resumeAfterTargetsNotReadyTimeoutLocked(
    nsecs_t newTimeout, const sp<InputChannel>& inputChannel) {
    if (newTimeout > 0) {
        // Extend the wait deadline by newTimeout (5s)
        mInputTargetWaitTimeoutTime = now() + newTimeout;
    } else {
        // Give up.
        mInputTargetWaitTimeoutExpired = true;

        // Input state will not be realistic.  Mark it out of sync.
        if (inputChannel.get()) {
            ssize_t connectionIndex =
            getConnectionIndexLocked(inputChannel);
            if (connectionIndex >= 0) {
                sp<Connection> connection =
                mConnectionsByFd.valueAt(connectionIndex);
                sp<IBinder> token = connection->inputChannel->getToken();

                if (token != nullptr) {
                    removeWindowByTokenLocked(token);
                }

                if (connection->status == Connection::STATUS_NORMAL) {
                    CancelationOptions options(
                            CancelationOptions::CANCEL_ALL_EVENTS,
                            "application not responding");
                    synthesizeCancelationEventsForConnectionLocked(connection, options);
                }
            }
        }
    }
}
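How the value returned by notifyANR() drives the dispatcher can be sketched as below. WaitState and resumeAfterTimeout are hypothetical names, and removing the window plus synthesizing cancel events is reduced to a single flag; the point is only the branch on newTimeout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of resumeAfterTargetsNotReadyTimeoutLocked; not AOSP code.
struct WaitState {
    int64_t timeoutTime = 0;          // mInputTargetWaitTimeoutTime (ns)
    bool timeoutExpired = false;      // mInputTargetWaitTimeoutExpired
    bool connectionCanceled = false;  // stands in for window removal + cancel events
};

void resumeAfterTimeout(WaitState& s, int64_t now, int64_t newTimeout, bool hasChannel) {
    if (newTimeout > 0) {
        s.timeoutTime = now + newTimeout;  // keep waiting for another newTimeout ns
    } else {
        s.timeoutExpired = true;  // give up on this wait
        if (hasChannel) {
            s.connectionCanceled = true;  // input state is out of sync: cancel events
        }
    }
}
```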

3 Input Deadlock Monitoring

3.1 IMS.start

InputManagerService.java

public void start() {
    ......
    Watchdog.getInstance().addMonitor(this);
    ......
}

InputManagerService implements the Watchdog.Monitor interface and, during startup, adds itself to the Watchdog thread's monitor list.

3.2 IMS.monitor

Watchdog then periodically calls IMS.monitor():

@Override
public void monitor() {
    synchronized (mInputFilterLock) { }
    nativeMonitor(mPtr);
}

Through JNI, nativeMonitor enters the following function:

static void nativeMonitor(JNIEnv* /* env */, jclass /* clazz */, jlong ptr) {
    NativeInputManager* im = reinterpret_cast<NativeInputManager*>(ptr);

    im->getInputManager()->getReader()->monitor();
    im->getInputManager()->getDispatcher()->monitor();
}

3.3 InputReader::monitor

InputReader.cpp

void InputReader::monitor() {
    // Acquire mLock and wait for the reader loop to signal, to make sure the reader is not deadlocked
    mLock.lock();
    mEventHub->wake();
    mReaderIsAliveCondition.wait(mLock);
    mLock.unlock();

    // Check EventHub
    mEventHub->monitor();
}

After acquiring mLock, it waits on the Condition's wait() until it is woken by the broadcast() in the InputReader thread's loopOnce().

void InputReader::loopOnce() {
    size_t count = mEventHub->getEvents(timeoutMillis, mEventBuffer, EVENT_BUFFER_SIZE);
    ......
    {
        AutoMutex _l(mLock);
        mReaderIsAliveCondition.broadcast();
        if (count) {
            processEventsLocked(mEventBuffer, count);
        }
    }
    ......
    mQueuedListener->flush();
}

3.3.1 EventHub::monitor

EventHub.cpp

void EventHub::monitor() {
    // Acquire and release mLock once to make sure EventHub is not deadlocked
    mLock.lock();
    mLock.unlock();
}

3.4 InputDispatcher::monitor

InputDispatcher.cpp

void InputDispatcher::monitor() {
    std::unique_lock _l(mLock);
    mLooper->wake();
    mDispatcherIsAlive.wait(_l);
}

After acquiring mLock, it waits on the condition variable until it is woken by the notify_all() in the InputDispatcher thread's dispatchOnce().

void InputDispatcher::dispatchOnce() {
    nsecs_t nextWakeupTime = LONG_LONG_MAX;
    {
        std::scoped_lock _l(mLock);
        mDispatcherIsAlive.notify_all();
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime);
        }
        if (runCommandsLockedInterruptible()) {
            nextWakeupTime = LONG_LONG_MIN;
        }
    }

    nsecs_t currentTime = now();
    int timeoutMillis = toMillisecondTimeoutDelay(currentTime, nextWakeupTime);
    mLooper->pollOnce(timeoutMillis); // blocks in epoll_wait
}

3.5 Summary

By adding InputManagerService to Watchdog's monitor list, the system periodically checks for deadlocks.

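The monitoring covers deadlocks in EventHub, InputReader, InputDispatcher and InputManagerService. The principle is simple: try to acquire a lock and then release it; for InputReader and InputDispatcher, the monitor additionally waits for the worker thread's loop to signal that it is still alive.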
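The "acquire and release" idea can be approximated with std::timed_mutex. This is a swapped-in simplification, not how Watchdog works: the real monitor() methods simply block on the lock (and, for the reader and dispatcher, on the liveness signal), and the Watchdog thread decides that a monitor which has not returned in time is stuck. Here the wait is bounded locally with try_lock_for instead; the function name is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical, simplified lock probe; not AOSP or Watchdog code.
// Returns false if the lock could not be taken within `timeout`.
bool lockIsResponsive(std::timed_mutex& m, std::chrono::milliseconds timeout) {
    if (!m.try_lock_for(timeout)) {
        return false;  // someone has held the lock too long: possible deadlock
    }
    m.unlock();  // acquired and released once, like EventHub::monitor()
    return true;
}
```

Note that the probe must be called from a thread that does not already hold the mutex; re-locking a std::timed_mutex from its owning thread is undefined behavior.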

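Finally, you can run adb shell dumpsys input to view the current input state of the system. The output consists of three parts, EventHub.dump(), InputReader.dump() and InputDispatcher.dump(); in addition, if an input ANR has occurred, the state of the last ANR is printed as well.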

In this output, mPendingEvent is the input event currently being processed.

4 Conclusion

4.1 ANR Classification

This is handled by InputManagerCallback.notifyANR (section 2.5). When an ANR occurs, the following message appears in the system log with TAG = WindowManager:

Input event dispatching timed out xxx.  Reason: [reason], where xxx is one of:

Window: sending to windowState.mAttrs.getTitle()
Application: sending to application appWindowToken.stringName
Other: empty

The Reason falls mainly into the following types:

4.1.1 Reason Types

Produced by checkWindowReadyForMoreInputLocked, the ANR reasons fall mainly into these categories:

No focused window, but a focused application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
Window paused: Waiting because the [targetType] window is paused.
Window not connected: Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed.
Window connection dead: Waiting because the [targetType] window's input connection is [Connection.Status]. The window may be in the process of being removed.
Window connection full: Waiting because the [targetType] window's input channel is full. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length].
Key event, outbound queue or wait queue not empty: Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length].
Non-key event, wait queue not empty and its head event delivered more than 500 ms ago: Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length: [waitQueue length]. Wait queue head age: [wait duration].

where:

targetType: either "focused" or "touched"
Connection.Status: one of "NORMAL", "BROKEN", "ZOMBIE"
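The per-window checks above can be summarized as an ordered series of tests. The sketch below is hypothetical (not AOSP code) and returns an enum instead of the full reason string; the 500 ms constant comes from the non-key reason text, and the order of the checks is an assumption that follows the list above. The "no focused window, but a focused application" case is decided earlier, when the focused window is looked up, so it is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified view of one target window's connection; not AOSP code.
struct WindowStateSketch {
    bool paused = false;
    bool channelRegistered = true;
    std::string connectionStatus = "NORMAL";  // "NORMAL", "BROKEN" or "ZOMBIE"
    bool channelFull = false;
    std::size_t outboundQueueLength = 0;
    std::size_t waitQueueLength = 0;
    int64_t waitQueueHeadDeliveryTime = 0;  // ns
};

enum class NotReadyReason {
    None, Paused, ChannelNotRegistered, ConnectionDead,
    ChannelFull, KeyWaitingForPrevious, NonKeyHeadTooOld
};

NotReadyReason checkWindowReady(const WindowStateSketch& w, bool isKeyEvent, int64_t now) {
    const int64_t kNonKeyHeadTimeoutNs = 500000000LL;  // 500 ms
    if (w.paused) return NotReadyReason::Paused;
    if (!w.channelRegistered) return NotReadyReason::ChannelNotRegistered;
    if (w.connectionStatus != "NORMAL") return NotReadyReason::ConnectionDead;
    if (w.channelFull) return NotReadyReason::ChannelFull;
    if (isKeyEvent) {
        // Keys wait until every previously delivered event has been finished.
        if (w.outboundQueueLength > 0 || w.waitQueueLength > 0) {
            return NotReadyReason::KeyWaitingForPrevious;
        }
    } else if (w.waitQueueLength > 0 &&
               now >= w.waitQueueHeadDeliveryTime + kNonKeyHeadTimeoutNs) {
        // Non-key events may be sent ahead, but not once the oldest one is 500 ms old.
        return NotReadyReason::NonKeyHeadTooOld;
    }
    return NotReadyReason::None;
}
```

The asymmetry between key and non-key events is the design point: a key must wait for all earlier events, while touch events can stream ahead until the oldest unfinished one becomes too old.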

In addition, findFocusedWindowTargetsLocked and findTouchedWindowTargetsLocked both call updateDispatchStatistics(); implementing that function is one way to collect data for analyzing ANR problems.

4.2 Dropped Event Classification

dropInboundEventLocked logs the reason an event was dropped:

DROP_REASON_POLICY: "inbound event was dropped because the policy consumed it"
DROP_REASON_DISABLED: "inbound event was dropped because input dispatch is disabled"
DROP_REASON_APP_SWITCH: "inbound event was dropped because of pending overdue app switch"
DROP_REASON_BLOCKED: "inbound event was dropped because the current application is not responding and the user has started interacting with a different application"
DROP_REASON_STALE: "inbound event was dropped because it is stale"
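The drop reasons map one-to-one onto log messages, which a switch expresses directly. The enum below is a hypothetical mirror of the DROP_REASON_* constants, not the framework's own type; the strings are copied from the list above.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the DROP_REASON_* constants; not the framework's type.
enum class DropReason { Policy, Disabled, AppSwitch, Blocked, Stale };

// Returns the log text for each drop reason listed above.
const char* dropReasonMessage(DropReason reason) {
    switch (reason) {
        case DropReason::Policy:
            return "inbound event was dropped because the policy consumed it";
        case DropReason::Disabled:
            return "inbound event was dropped because input dispatch is disabled";
        case DropReason::AppSwitch:
            return "inbound event was dropped because of pending overdue app switch";
        case DropReason::Blocked:
            return "inbound event was dropped because the current application is not "
                   "responding and the user has started interacting with a different "
                   "application";
        case DropReason::Stale:
            return "inbound event was dropped because it is stale";
    }
    return "unknown drop reason";  // only reached for out-of-range enum values
}
```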

Other:

doDispatchCycleFinishedLockedInterruptible logs events whose processing took longer than 2 s
findFocusedWindowTargetsLocked can be used to collect statistics on waiting time

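This article is reposted; we respect the original author's copyright. If there are content errors or infringement issues, the original author is welcome to contact us to have the content corrected or the article removed.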
