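Background: today I upgraded to the Android 7.1 beta. After the upgrade, some Chinese apps started making a spectacle of themselves. Here is a screenshot from my phone.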

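Tsk. The notification bar, perfectly clean on Android 7.0 and below, is suddenly filled up by these few apps. As the saying goes, only when the tide goes out do you find out who has been swimming naked. Likewise, only after a system upgrade closes a loophole is the greed fully exposed.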

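On to the main topic:
I won't go over what startForeground does.
Most mainstream apps in China exploit a bug in Android's notification handling: after they call startForeground, no notification appears in the notification bar, yet the process still sits at a low oom_adj. The hole was not fixed until Android 7.1.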

First, how it is done. The code here is based on https://github.com/D-clock/AndroidDaemonService
There is a main service, and in its onStartCommand it starts a temporary GrayInnerService:

Intent innerIntent = new Intent(this, GrayInnerService.class);
startService(innerIntent);
startForeground(GRAY_SERVICE_ID, new Notification());

Then, in GrayInnerService's onStartCommand:

startForeground(GRAY_SERVICE_ID, new Notification());
//stopForeground(true);
stopSelf();
return super.onStartCommand(intent, flags, startId);

It looks very easy. The key point: two services in the same process call startForeground with the same ID, then one of them kills itself, and that is all.

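The principle is just as simple:
Android does not check that the ID passed to startForeground is unique, so the two services end up mapped to a single notification. When one of them kills itself, it takes that notification with it, so we no longer see any notification, but the other foreground service is still alive! As long as one foreground service exists, the process's oom_adj stays low and the process is hard to kill.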
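The bookkeeping described above can be sketched in plain Java without any Android dependency. This is only a toy model, not framework code: GrayServiceModel, ProcessModel, shownNotifications and the id 1001 are hypothetical names invented for illustration, and the model assumes notifications are tracked by id alone, which is what the analysis in this post suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the pre-7.1 bookkeeping; hypothetical, not real framework code. */
final class GrayServiceModel {
    /** Stand-in for ServiceRecord: only the fields the trick depends on. */
    static final class ServiceRecord {
        final String name;
        boolean isForeground;
        int foregroundId;
        ServiceRecord(String name) { this.name = name; }
    }

    /** Stand-in for one app process plus its notifications. */
    static final class ProcessModel {
        final List<ServiceRecord> services = new ArrayList<>();
        // Assumption: notifications are keyed by id only, so two services can share one.
        final Map<Integer, String> shownNotifications = new HashMap<>();

        ServiceRecord startService(String name) {
            ServiceRecord r = new ServiceRecord(name);
            services.add(r);
            return r;
        }

        void startForeground(ServiceRecord r, int id) {
            // No check that another service in the process already uses this id.
            r.foregroundId = id;
            r.isForeground = true;
            shownNotifications.put(id, "posted by " + r.name);
        }

        void stopSelf(ServiceRecord r) {
            // Cancel by id, clear the foreground state, then drop the record.
            shownNotifications.remove(r.foregroundId);
            r.isForeground = false;
            r.foregroundId = 0;
            services.remove(r);
        }

        /** True if any remaining service is still marked foreground. */
        boolean hasForegroundService() {
            for (ServiceRecord sr : services) {
                if (sr.isForeground) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        final int grayServiceId = 1001; // arbitrary id for the demo
        ProcessModel proc = new ProcessModel();
        ServiceRecord outer = proc.startService("GrayService");
        ServiceRecord inner = proc.startService("GrayInnerService");
        proc.startForeground(outer, grayServiceId);
        proc.startForeground(inner, grayServiceId);
        proc.stopSelf(inner); // takes the shared notification with it
        System.out.println("notifications shown: " + proc.shownNotifications.size());
        System.out.println("process still foreground: " + proc.hasForegroundService());
    }
}
```

Running main prints zero notifications while the process still counts as foreground, which is exactly the mismatch the trick relies on.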

The code analysis follows:

public final void startForeground(int id, Notification notification) {
        try {
            mActivityManager.setServiceForeground(
                    new ComponentName(this, mClassName), mToken, id,
                    notification, true);
        } catch (RemoteException ex) {
        }
    }

Following setServiceForeground:

    public void setServiceForegroundLocked(ComponentName className, IBinder token,
            int id, Notification notification, boolean removeNotification) {
        final int userId = UserHandle.getCallingUserId();
        final long origId = Binder.clearCallingIdentity();
        try {
            ServiceRecord r = findServiceLocked(className, token, userId); // look up the ServiceRecord for this service
            if (r != null) {
                if (id != 0) {
                    if (notification == null) {
                        throw new IllegalArgumentException("null notification");
                    }
                    if (r.foregroundId != id) { // no check that the id is unique within the process; it merely updates the id and notification
                        r.cancelNotification();
                        r.foregroundId = id;
                    }
                    notification.flags |= Notification.FLAG_FOREGROUND_SERVICE;
                    r.foregroundNoti = notification;
                    r.isForeground = true;
                    r.postNotification();
                    if (r.app != null) {
                        updateServiceForegroundLocked(r.app, true); // we end up here!
                    }
                    getServiceMap(r.userId).ensureNotStartingBackground(r);
.....
}

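Note that this code already contains the single most important point: Android simply records the id passed to startForeground in r.foregroundId and never checks whether another foreground service has already used that id.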

Next, updateServiceForegroundLocked is called:

    private void updateServiceForegroundLocked(ProcessRecord proc, boolean oomAdj) {
        boolean anyForeground = false;
        for (int i=proc.services.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.services.valueAt(i);
            if (sr.isForeground) {
                anyForeground = true;
                break;
            }
        }
        mAm.updateProcessForegroundLocked(proc, anyForeground, oomAdj);
    }

That is, it checks whether any service in the process is in the foreground state, then passes the result to updateProcessForegroundLocked, which adjusts the process accordingly:

    final void updateProcessForegroundLocked(ProcessRecord proc, boolean isForeground,
            boolean oomAdj) {
        if (isForeground != proc.foregroundServices) {
            proc.foregroundServices = isForeground; // update ProcessRecord.foregroundServices
            ArrayList<ProcessRecord> curProcs = mForegroundPackages.get(proc.info.packageName,
                    proc.info.uid);
            if (isForeground) {
                if (curProcs == null) {
                    curProcs = new ArrayList<ProcessRecord>();
                    mForegroundPackages.put(proc.info.packageName, proc.info.uid, curProcs);
                }
                if (!curProcs.contains(proc)) {
                    curProcs.add(proc);
... battery statistics code elided ...
            if (oomAdj) {
                updateOomAdjLocked(); // recompute and apply the process oom_adj
            }
        }
    }

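The analysis stops at updateOomAdjLocked. What follows is a long oom_adj computation whose result is applied to the process, which makes the process less likely to be killed. I may cover it in detail another time; it is not the focus here.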

Remember that we started two services with the same id, and that one of them then kills itself? Let's look at the suicide:

public final void stopSelf(int startId) {
        if (mActivityManager == null) {
            return;
        }
        try {
            mActivityManager.stopServiceToken(
                    new ComponentName(this, mClassName), mToken, startId);
        } catch (RemoteException ex) {
        }
    }

    private void stopServiceLocked(ServiceRecord service) {
        if (service.delayed) {
            // If service isn't actually running, but is being held in the
            // delayed list, then we need to keep it started but note that it
            // should be stopped once no longer delayed.
            if (DEBUG_DELAYED_STARTS) Slog.v(TAG_SERVICE, "Delaying stop of pending: " + service);
            service.delayedStop = true;
            return;
        }
        synchronized (service.stats.getBatteryStats()) {
            service.stats.stopRunningLocked();
        }
        service.startRequested = false;
        if (service.tracker != null) {
            service.tracker.setStarted(false, mAm.mProcessStats.getMemFactorLocked(),
                    SystemClock.uptimeMillis());
        }
        service.callStart = false;
        bringDownServiceIfNeededLocked(service, false, false); // the key call
    }

Following bringDownServiceIfNeededLocked, we arrive at bringDownServiceLocked:

private final void bringDownServiceLocked(ServiceRecord r) {
1691        //Slog.i(TAG, "Bring down service:");
1692        //r.dump("  ");
1693
1694        // Report to all of the connections that the service is no longer
1695        // available.
1696        for (int conni=r.connections.size()-1; conni>=0; conni--) {
1697            ArrayList<ConnectionRecord> c = r.connections.valueAt(conni);
1698            for (int i=0; i<c.size(); i++) {
1699                ConnectionRecord cr = c.get(i);
1700                // There is still a connection to the service that is
1701                // being brought down.  Mark it as dead.
1702                cr.serviceDead = true;
1703                try {
1704                    cr.conn.connected(r.name, null);
1705                } catch (Exception e) {
1706                    Slog.w(TAG, "Failure disconnecting service " + r.name +
1707                          " to connection " + c.get(i).conn.asBinder() +
1708                          " (in " + c.get(i).binding.client.processName + ")", e);
1709                }
1710            }
1711        }
.............
1755
1756        r.cancelNotification(); // cancel the foreground notification tied to this ServiceRecord!!
1757        r.isForeground = false; // revoke the ServiceRecord's foreground status
1758        r.foregroundId = 0;
1759        r.foregroundNoti = null;
1760
1761        // Clear start entries.
1762        r.clearDeliveredStartsLocked();
1763        r.pendingStarts.clear();
1764
1765        if (r.app != null) {
1766            synchronized (r.stats.getBatteryStats()) {
1767                r.stats.stopLaunchedLocked();
1768            }
1769            r.app.services.remove(r);
1770            if (r.app.thread != null) {
1771                updateServiceForegroundLocked(r.app, false); // key point: refresh the foreground state, back in the function analyzed earlier
1772                try {
1773                    bumpServiceExecutingLocked(r, false, "destroy");
1774                    mDestroyingServices.add(r);
1775                    r.destroying = true;
1776                    mAm.updateOomAdjLocked(r.app);
1777                    r.app.thread.scheduleStopService(r);
1778                } catch (Exception e) {
1779                    Slog.w(TAG, "Exception when destroying service "
1780                            + r.shortName, e);
1781                    serviceProcessGoneLocked(r);
1782                }
1783            } else {
1784                if (DEBUG_SERVICE) Slog.v(
1785                    TAG_SERVICE, "Removed service that has no process: " + r);
1786            }
1787        } else {
1788            if (DEBUG_SERVICE) Slog.v(
1789                TAG_SERVICE, "Removed service that is not running: " + r);
1790        }
1791
.....
1812    }

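As you can see, when a service kills itself, it first cancels the foreground notification tied to its ServiceRecord (line 1756) and marks itself as non-foreground (line 1757).
It then calls updateServiceForegroundLocked, the method already analyzed in the startForeground part, which updates the process's oom_adj based on whether any ServiceRecord is still a foreground service. We started two foreground services and only one has died, so one is still left! The process therefore keeps its high priority.
See the problem? The notification that represents the foreground state is gone, yet the process still keeps a low oom_adj!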

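The fix is actually quite simple.
Option 1: strictly require that each startForeground id maps to exactly one notification. That would mean changing the API documentation, so it is not a good choice.
Option 2: add a check when removing the notification. If the services mapped to the notification have not all died, the notification must not be removed, because services and notifications are in a many-to-one relationship.

I don't have the 7.1 source at hand, but presumably it went with option 2. As a result the Chinese apps can no longer remove their notifications and now run riot in the notification bar, which is painful to look at.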
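Option 2 can be sketched with a small plain-Java model. This is only an illustration of the idea, not the actual 7.1 change: GuardedCancelModel, startForegroundService (a helper of this model, unrelated to any Android API), shownNotifications and the id 1001 are hypothetical names, and the model again assumes notifications are tracked by id alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of option 2; hypothetical, not the real 7.1 implementation. */
final class GuardedCancelModel {
    /** Stand-in for ServiceRecord with only the relevant fields. */
    static final class ServiceRecord {
        final String name;
        boolean isForeground;
        int foregroundId;
        ServiceRecord(String name) { this.name = name; }
    }

    final List<ServiceRecord> services = new ArrayList<>();
    final Map<Integer, String> shownNotifications = new HashMap<>();

    /** Starts a service and immediately puts it in the foreground under the given id. */
    ServiceRecord startForegroundService(String name, int id) {
        ServiceRecord r = new ServiceRecord(name);
        r.isForeground = true;
        r.foregroundId = id;
        services.add(r);
        shownNotifications.put(id, "posted by " + name);
        return r;
    }

    /** Stops a service, keeping the notification while another foreground service shares its id. */
    void stopSelf(ServiceRecord r) {
        boolean idStillInUse = false;
        for (ServiceRecord other : services) {
            if (other != r && other.isForeground && other.foregroundId == r.foregroundId) {
                idStillInUse = true;
                break;
            }
        }
        if (!idStillInUse) {
            shownNotifications.remove(r.foregroundId);
        }
        r.isForeground = false;
        r.foregroundId = 0;
        services.remove(r);
    }

    public static void main(String[] args) {
        GuardedCancelModel proc = new GuardedCancelModel();
        proc.startForegroundService("GrayService", 1001);
        ServiceRecord inner = proc.startForegroundService("GrayInnerService", 1001);
        proc.stopSelf(inner); // the shared notification survives this time
        System.out.println("notifications shown: " + proc.shownNotifications.size());
    }
}
```

With the guard in place, the notification only disappears once the last foreground service using that id has stopped, so the two-service trick leaves a visible notification behind.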

A side note: while reading the code I came across this:

if (localForegroundNoti.getSmallIcon() == null) {
     // It is not correct for the caller to not supply a notification
     // icon, but this used to be able to slip through, so for
     // those dirty apps we will create a notification clearly
     // blaming the app.
     Slog.v(TAG, "Attempted to start a foreground service ("
             + name
             + ") with a broken notification (no icon: "
             + localForegroundNoti
             + ")");
     CharSequence appName = appInfo.loadLabel(
             ams.mContext.getPackageManager());
     if (appName == null) {
         appName = appInfo.packageName;
     }
     Context ctx = null;
     try {
         ctx = ams.mContext.createPackageContextAsUser(
                 appInfo.packageName, 0, new UserHandle(userId));
         Notification.Builder notiBuilder = new Notification.Builder(ctx);
         // it's ugly, but it clearly identifies the app
         notiBuilder.setSmallIcon(appInfo.icon);

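This is Android's fix for an older keep-alive trick. The exploit: if the notification passed to startForeground had no small icon set via setSmallIcon, it would not show up in the notification bar. Later versions bluntly take the app's own icon and fill it in.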
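The fallback idea can be reduced to a few lines of plain Java. This is a simplified sketch, not the framework logic: IconFallbackModel, ensureIcon and smallIconRes are hypothetical names, and the model assumes an icon resource id of 0 means that no icon was set.

```java
/** Toy model of the icon fallback; hypothetical, not real framework code. */
final class IconFallbackModel {
    /** Stand-in for a notification; assumption: 0 means no small icon was set. */
    static final class Notification {
        int smallIconRes;
    }

    /** Returns a notification that is guaranteed to carry an icon. */
    static Notification ensureIcon(Notification posted, int appIconRes) {
        if (posted.smallIconRes != 0) {
            return posted; // well-formed, show it unchanged
        }
        // Broken notification: build a replacement that uses the app's own icon.
        Notification replacement = new Notification();
        replacement.smallIconRes = appIconRes;
        return replacement;
    }

    public static void main(String[] args) {
        Notification broken = new Notification(); // no icon supplied
        Notification shown = ensureIcon(broken, 42);
        System.out.println("icon used: " + shown.smallIconRes);
    }
}
```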

Reference: www.jianshu.com/p/63aafe3c12af