System Upgrade Software Flow

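This chapter walks through the Recovery system-upgrade flow using the source code. The technical difficulties and finer details of the flow are covered in separate articles, with links at the relevant points.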

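The figure below shows the complete software flow. It starts when the app detects an OTA update package pushed by the server and ends when the device boots into the new system version.

The rest of the article explains each module shown in the figure in detail.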



Software Flow

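  • System Upgrade Software Flow
  • 1. The app downloads the update package and calls the RecoverySystem API
  • 2. Framework RecoverySystem triggers the upgrade
  • 2.1 Construct the Recovery upgrade command
  • 2.2 Write the upgrade command to the BCB through RECOVERY_SERVICE
  • 2.3 Trigger a device reboot through POWER_SERVICE
  • 3. The bootloader reads the BCB and boots into the Recovery system
  • 4. The kernel loads the ramdisk, starts init, and init launches the recovery process
  • 5. Enter the Recovery upgrade flow
  • 5.1 Read the upgrade command from the BCB in the misc partition
  • 5.2 mmap the update package into memory
  • 5.3 Verify the integrity and authenticity of the update package
  • 5.4 Fork an update-binary child process to upgrade the system
  • 5.5 Exit the update-binary child process, save logs, and erase the BCB in the misc partition
  • 5.6 Reboot back into the main system
  • 6. The bootloader boots the main system
  • 7. Init starts the flash_recovery service to update the recovery partition
  • 7.1 Why does flash_recovery exist?
  • 7.2 When is the flash_recovery service started?
  • 7.3 How does flash_recovery update the recovery partition?
  • 8. Boot to the launcher; the upgrade flow ends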


1. The app downloads the update package and calls the RecoverySystem API

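Each OEM implements its own logic for checking whether an OTA update has been pushed and for downloading the package from the server. The analysis below therefore starts at the point where the upgrade is triggered.
After downloading the package, the app calls the installPackage method of the framework RecoverySystem class and passes in the path of the downloaded package. At that point the app's job is done.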

// android.os.RecoverySystem
RecoverySystem.installPackage(Context context, File packageFile)

2. Framework RecoverySystem triggers the upgrade

installPackage() in Google's AOSP RecoverySystem:

// frameworks/base/core/java/android/os/RecoverySystem.java
public static void installPackage(Context context, File packageFile, boolean processed)
        throws IOException {
    synchronized (sRequestLock) {
        /* 1. Build the recovery upgrade command in its fixed format */
        LOG_FILE.delete();
        // Must delete the file in case it was created by system server.
        UNCRYPT_PACKAGE_FILE.delete();

        String filename = packageFile.getCanonicalPath();
        Log.w(TAG, "!!! REBOOTING TO INSTALL " + filename + " !!!");

        // If the package name ends with "_s.zip", it's a security update.
        boolean securityUpdate = filename.endsWith("_s.zip");

        // A package stored on the data partition needs special handling; the reason and mechanism are explained below.
        // If the package is on the /data partition, the package needs to
        // be processed (i.e. uncrypt'd). The caller specifies if that has
        // been done in 'processed' parameter.
        if (filename.startsWith("/data/")) {
            // If the package has already been processed, only check that the output file exists
            if (processed) {
                if (!BLOCK_MAP_FILE.exists()) {
                    Log.e(TAG, "Package claimed to have been processed but failed to find "
                            + "the block map file.");
                    throw new IOException("Failed to find block map file");
                }
            } else {
                // The uncryptd service preprocesses the package. Its input is the file UNCRYPT_PACKAGE_FILE
                // and its output is the file BLOCK_MAP_FILE; both are initialized here. See uncryptd below.
                FileWriter uncryptFile = new FileWriter(UNCRYPT_PACKAGE_FILE);
                try {
                    uncryptFile.write(filename + "\n");
                } finally {
                    uncryptFile.close();
                }
                // UNCRYPT_PACKAGE_FILE needs to be readable and writable
                // by system server.
                if (!UNCRYPT_PACKAGE_FILE.setReadable(true, false)
                        || !UNCRYPT_PACKAGE_FILE.setWritable(true, false)) {
                    Log.e(TAG, "Error setting permission for " + UNCRYPT_PACKAGE_FILE);
                }

                BLOCK_MAP_FILE.delete();
            }
			
            // For a preprocessed package the argument becomes "@" + BLOCK_MAP_FILE (/cache/recovery/block.map).
            // This is purely a convention; the mechanism is explained with uncryptd below.
            // If the package is on the /data partition, use the block map
            // file as the package name instead.
            filename = "@/cache/recovery/block.map";
        }

        final String filenameArg = "--update_package=" + filename + "\n";
        final String localeArg = "--locale=" + Locale.getDefault().toLanguageTag() + "\n";
        final String securityArg = "--security\n";

        String command = filenameArg + localeArg;
        if (securityUpdate) {
            command += securityArg;
        }

        /* 2. Write the upgrade command to the BCB (the head of the misc partition) through RECOVERY_SERVICE */
        RecoverySystem rs = (RecoverySystem) context.getSystemService(
                Context.RECOVERY_SERVICE);
        if (!rs.setupBcb(command)) {
            throw new IOException("Setup BCB failed");
        }

        /* 3. Trigger the reboot through POWER_SERVICE */
        // Having set up the BCB (bootloader control block), go ahead and reboot
        PowerManager pm = (PowerManager) context.getSystemService(Context.POWER_SERVICE);
        String reason = PowerManager.REBOOT_RECOVERY_UPDATE;

        // On TV, reboot quiescently if the screen is off
        if (context.getPackageManager().hasSystemFeature(PackageManager.FEATURE_LEANBACK)) {
            WindowManager wm = (WindowManager) context.getSystemService(Context.WINDOW_SERVICE);
            if (wm.getDefaultDisplay().getState() != Display.STATE_ON) {
                reason += ",quiescent";
            }
        }
        // Enter the shutdown-and-reboot flow
        pm.reboot(reason);

        throw new IOException("Reboot failed (no permissions?)");
    }
}

installPackage does three things:

2.1 Construct the Recovery upgrade command

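The absolute path of the update package tells installPackage whether the package is on the data partition or on another storage medium (USB drive, TF card, etc.). That decides whether the package must be preprocessed.
If preprocessing is needed, the value of the update_package parameter is replaced with the fixed value "@/cache/recovery/block.map". The parameters update_package, locale, security, etc. are then formatted into an upgrade command string with a fixed layout.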

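1) Why does uncryptd preprocess the update package?
Data on the Android data partition is encrypted (FDE/FBE), and AOSP recovery does not implement partition decryption. Recovery therefore cannot access data on the data partition and cannot load the update package directly through its file system. In recovery mode, a device using FDE cannot mount the data partition at all, and on a device using FBE the file contents on the data partition appear as unreadable ciphertext.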

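  • The package data is therefore decrypted before entering recovery, and the package's location on the storage medium is written to the fixed file /cache/recovery/block.map. After entering recovery, recovery parses the package location from block.map and can then load the package data.
  • If the package is stored on an unencrypted TF card or USB drive, no extra processing is needed: recovery can load the data directly from that storage device's file system.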

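2) How does uncryptd preprocess the update package?
Input: the file UNCRYPT_PACKAGE_FILE.
Output: the result is written to the file BLOCK_MAP_FILE (/cache/recovery/block.map).
UNCRYPT_PACKAGE_FILE: holds the actual file-system path of the update package;
BLOCK_MAP_FILE: describes how the package data is laid out in blocks on the storage device (mainly the block numbers).
installPackage only prepares the input and output files here. The preprocessing itself runs when the device reboots in step 2.3, described below.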

3) Each parameter in the upgrade command string has the form --key=value or --key, and parameters are separated by newlines.

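4) Purpose of the processed parameter
uncryptd preprocesses (decrypts) the package during shutdown, and a large package makes shutdown slow. To avoid this, the package can be preprocessed in advance and installPackage called with processed set to true. uncryptd is then not started at shutdown, and shutdown speed is unaffected.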
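To make the command format concrete, here is a minimal C++ re-expression of the command-building part of installPackage(); the original is Java. The helper name build_recovery_command is hypothetical, and the caller is assumed to supply the locale tag (for example the output of Locale.toLanguageTag()). Only three rules come from the code above: the "/data/" check, the "_s.zip" security suffix, and the "@/cache/recovery/block.map" substitution.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: C++ sketch of the command string that
// RecoverySystem.installPackage() (Java) builds before writing it to the BCB.
std::string build_recovery_command(std::string filename, const std::string& locale_tag) {
  // A package name ending in "_s.zip" marks a security update.
  // As in the Java code, this is checked on the real path, before any substitution.
  const std::string kSecuritySuffix = "_s.zip";
  bool security_update =
      filename.size() >= kSecuritySuffix.size() &&
      filename.compare(filename.size() - kSecuritySuffix.size(), kSecuritySuffix.size(),
                       kSecuritySuffix) == 0;

  // Packages on /data must be uncrypt'd; recovery then receives the block map
  // instead of the file path, marked by the "@" prefix.
  if (filename.rfind("/data/", 0) == 0) {
    filename = "@/cache/recovery/block.map";
  }

  // Every argument is "--key=value" or "--key", terminated by '\n'.
  std::string command = "--update_package=" + filename + "\n";
  command += "--locale=" + locale_tag + "\n";
  if (security_update) {
    command += "--security\n";
  }
  return command;
}
```

For a package on an external TF card, such as /storage/1234-ABCD/update.zip, the path is passed through unchanged. A package under /data/ is replaced by the block-map reference.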

2.2 Write the upgrade command to the BCB through RECOVERY_SERVICE

The formatted command string is written to the BCB (bootloader control block) area at the head of the misc partition.

RecoverySystem

// frameworks/base/core/java/android/os/RecoverySystem.java
// 1. installPackage() calls setupBcb() on the RecoverySystem instance
public static void installPackage(Context context, File packageFile, boolean processed) {
	...
	RecoverySystem rs = (RecoverySystem) context.getSystemService(Context.RECOVERY_SERVICE);
	rs.setupBcb(command);
	...
}

// 2. Forward the call to RecoverySystemService.setupBcb()
private boolean setupBcb(String command) {
	return mService.setupBcb(command);
}

RecoverySystemService

// frameworks/base/services/core/java/com/android/server/recoverysystem/RecoverySystemService.java
// 1. Call setupOrClearBcb to write the upgrade command string into the BCB
public boolean setupBcb(String command) {
    if (DEBUG) Slog.d(TAG, "setupBcb: [" + command + "]");
    return setupOrClearBcb(true, command);
}

// 2. setupOrClearBcb actually starts the native service setup-bcb and has it write the command string into the BCB
private boolean setupOrClearBcb(boolean isSetup, String command) {
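    // 2.1 Check whether any of the uncrypt/setup-bcb/clear-bcb services is running. If one is still
    // running after a bounded wait, an earlier request is in progress, so this one is aborted.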
    final boolean available = checkAndWaitForUncryptService();
    if (!available) {
        Slog.e(TAG, "uncrypt service is unavailable.");
        return false;
    }
	
    // 2.2 Depending on isSetup, start the service that writes or clears the BCB.
    // (uncrypt/setup-bcb/clear-bcb are really the same binary, started with
    // different arguments to perform different tasks; see below.)
    if (isSetup) {
        mInjector.systemPropertiesSet("ctl.start", "setup-bcb");
    } else {
        mInjector.systemPropertiesSet("ctl.start", "clear-bcb");
    }
	
    // 2.3 After starting setup-bcb or clear-bcb, communicate with it over a socket
    // Connect to the uncrypt service socket.
    UncryptSocket socket = mInjector.connectService();
    if (socket == null) {
        Slog.e(TAG, "Failed to connect to uncrypt socket");
        return false;
    }

    try {
        // When writing the BCB, send the upgrade command to setup-bcb over the socket
        // Send the BCB commands if it's to setup BCB.
        if (isSetup) {
            socket.sendCommand(command);
        }

        // Read the result of setup-bcb/clear-bcb from the socket
        // Read the status from the socket.
        int status = socket.getPercentageUncrypted();

        // Ack receipt of the status code. uncrypt waits for the ack so
        // the socket won't be destroyed before we receive the code.
        socket.sendAck();
		
        // setup-bcb/clear-bcb report 100 on success
        // (100 is simply the success value the service defines; it does not measure anything)
        if (status == 100) {
            Slog.i(TAG, "uncrypt " + (isSetup ? "setup" : "clear")
                    + " bcb successfully finished.");
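        } else {
            // Error in /system/bin/uncrypt.
            Slog.e(TAG, "uncrypt failed with status: " + status);
            return false;
        }
    } catch (IOException e) {
        Slog.e(TAG, "IOException when communicating with uncrypt:", e);
        return false;
    } finally {
        socket.close();
    }

    return true;
}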

How the native services setup-bcb/clear-bcb write and erase the BCB data is not covered here; see the separate article (to be continued).
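For reference, setup-bcb ultimately fills in struct bootloader_message from AOSP's bootloader_message.h. The C++ sketch below re-declares that layout, with field sizes as in AOSP; check the header in your own tree. It also shows, simplified from update_bootloader_message_in_struct(), how the command ends up in the BCB: the command field becomes "boot-recovery", and the recovery field becomes "recovery\n" followed by one argument per line. The function name fill_bcb is hypothetical. The sketch never touches the misc partition, and it rejects over-long input instead of truncating it.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Layout of the BCB at offset 0 of the misc partition (mirrors AOSP's
// bootloader_message/bootloader_message.h).
struct bootloader_message {
  char command[32];
  char status[32];
  char recovery[768];
  char stage[32];
  char reserved[1184];
};
static_assert(sizeof(bootloader_message) == 2048, "BCB is 2 KiB");

// Hypothetical helper: store the options so the bootloader boots into recovery
// and recovery finds its arguments after the "recovery" line.
bool fill_bcb(bootloader_message* boot, const std::vector<std::string>& options) {
  std::memset(boot->command, 0, sizeof(boot->command));
  std::memset(boot->recovery, 0, sizeof(boot->recovery));
  std::strncpy(boot->command, "boot-recovery", sizeof(boot->command) - 1);

  std::string recovery = "recovery\n";
  for (const auto& s : options) {
    recovery += s;
    if (s.empty() || s.back() != '\n') recovery += '\n';  // one argument per line
  }
  if (recovery.size() >= sizeof(boot->recovery)) return false;  // would not fit
  std::memcpy(boot->recovery, recovery.c_str(), recovery.size() + 1);
  return true;
}
```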

2.3 Trigger a device reboot through POWER_SERVICE

The key point of this step is that it starts uncryptd to preprocess the update package.

RecoverySystem:

// frameworks/base/core/java/android/os/RecoverySystem.java
public static void installPackage(Context context, File packageFile, boolean processed) {
	...
	PowerManager pm = (PowerManager) context.getSystemService(Context.POWER_SERVICE);
	String reason = PowerManager.REBOOT_RECOVERY_UPDATE;
	pm.reboot(reason);
}

PowerManager:

// frameworks/base/core/java/android/os/PowerManager.java
public void reboot(@Nullable String reason) {
	mService.reboot(false, reason, true);
 }

PowerManagerService:

// frameworks/base/services/core/java/com/android/server/power/PowerManagerService.java
public void reboot(boolean confirm, @Nullable String reason, boolean wait) {
	shutdownOrRebootInternal(HALT_MODE_REBOOT, confirm, reason, wait);
}

private void shutdownOrRebootInternal(final @HaltMode int haltMode, final boolean confirm,
        @Nullable final String reason, boolean wait) {
	...
    // Post a Runnable to the UI thread that starts the shutdown thread ShutdownThread
    Runnable runnable = new Runnable() {
        @Override
        public void run() {
	        if (haltMode == HALT_MODE_REBOOT) {
	            ShutdownThread.reboot(getUiContext(), reason, confirm);
	        }
        }
    };

    // ShutdownThread must run on a looper capable of displaying the UI.
    Message msg = Message.obtain(UiThread.getHandler(), runnable);
    msg.setAsynchronous(true);
    UiThread.getHandler().sendMessage(msg);

    // PowerManager.reboot() is documented not to return so just wait for the inevitable.
    if (wait) {
        synchronized (this) {
            while (true) {
                try {
                    this.wait();
                } catch (InterruptedException e) {
                }
            }
        }
    }
}

ShutdownThread:

// frameworks/base/services/core/java/com/android/server/power/ShutdownThread.java

public final class ShutdownThread extends Thread {

    private static final ShutdownThread sInstance = new ShutdownThread();

    // 1. Reboot the device
	public static void reboot(final Context context, String reason, boolean confirm) {
	    mReboot = true;
	    mRebootSafeMode = false;
	    mRebootHasProgressBar = false;
	    mReason = reason;
	    shutdownInner(context, confirm);
	}

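    // 2. Show the shutdown progress dialog (uncryptd can take a long time to process the package)
    private static void shutdownInner(final Context context, boolean confirm) {
        ...
        beginShutdownSequence(context);
    }

    private static void beginShutdownSequence(Context context) {
        ...
        sInstance.mProgressDialog = showShutdownDialog(context);
        ...
        sInstance.start();
    }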
	
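    // 3. As described above, if UNCRYPT_PACKAGE_FILE exists and BLOCK_MAP_FILE does not, uncryptd
    // must preprocess the package, so this reboot is marked as needing a progress dialog. The flag
    // mRebootHasProgressBar is also used later to decide whether to start uncryptd.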
	private static ProgressDialog showShutdownDialog(Context context) {
		// mReason could be "recovery-update" or "recovery-update,quiescent".
		if (mReason != null && mReason.startsWith(PowerManager.REBOOT_RECOVERY_UPDATE)) {
		   // We need the progress bar if uncrypt will be invoked during the
		   // reboot, which might be time-consuming.
		   mRebootHasProgressBar = RecoverySystem.UNCRYPT_PACKAGE_FILE.exists()
		           && !(RecoverySystem.BLOCK_MAP_FILE.exists());
		}
		...
	}

    // 4. The body of the ShutdownThread thread
    /**
     * Makes sure we handle the shutdown gracefully.
     * Shuts off power regardless of radio state if the allotted time has passed.
     */
    public void run() {
        // Record the reason for this reboot
        {
            String reason = (mReboot ? "1" : "0") + (mReason != null ? mReason : "");
            SystemProperties.set(SHUTDOWN_ACTION_PROPERTY, reason);
        }
        // Start uncryptd here to preprocess the update package
        if (mRebootHasProgressBar) {
            sInstance.setRebootProgress(MOUNT_SERVICE_STOP_PERCENT, null);

            // If it's to reboot to install an update and uncrypt hasn't been
            // done yet, trigger it now.
            uncrypt();
        }
        // Finally, shut down or reboot
        rebootOrShutdown(mContext, mReboot, mReason);
    }
    
    // 5. Perform the low-level reboot (falling back to shutdown if it fails)
	public static void rebootOrShutdown(final Context context, boolean reboot, String reason) {
	    if (reboot) {
	        Log.i(TAG, "Rebooting, reason: " + reason);
	        PowerManagerService.lowLevelReboot(reason);
	        Log.e(TAG, "Reboot failed, will attempt shutdown instead");
	        reason = null;
	    } 
	    ...
	}
}

Function uncrypt()
Starts uncryptd through RecoverySystem to preprocess the update package, listens for its progress, and updates the progress bar shown in the dialog.

// frameworks/base/services/core/java/com/android/server/power/ShutdownThread.java
private void uncrypt() {
     Log.i(TAG, "Calling uncrypt and monitoring the progress...");
	
     // Progress listener for uncryptd's preprocessing; it updates the shutdown progress bar
     final RecoverySystem.ProgressListener progressListener =
             new RecoverySystem.ProgressListener() {
         @Override
         public void onProgress(int status) {
             if (status >= 0 && status < 100) {
                 // Scale down to [MOUNT_SERVICE_STOP_PERCENT, 100).
                 status = (int)(status * (100.0 - MOUNT_SERVICE_STOP_PERCENT) / 100);
                 status += MOUNT_SERVICE_STOP_PERCENT;
                 CharSequence msg = mContext.getText(
                         com.android.internal.R.string.reboot_to_update_package);
                 sInstance.setRebootProgress(status, msg);
             } else if (status == 100) {
                 CharSequence msg = mContext.getText(
                         com.android.internal.R.string.reboot_to_update_reboot);
                 sInstance.setRebootProgress(status, msg);
             } else {
                 // Ignored
             }
         }
     };

     // Start uncryptd to preprocess the package through RecoverySystem.processPackage()
     final boolean[] done = new boolean[1];
     done[0] = false;
     Thread t = new Thread() {
         @Override
         public void run() {
             RecoverySystem rs = (RecoverySystem) mContext.getSystemService(
                     Context.RECOVERY_SERVICE);
             String filename = null;
             try {
                 filename = FileUtils.readTextFile(RecoverySystem.UNCRYPT_PACKAGE_FILE, 0, null);
                 
                 // Pass the package path from the UNCRYPT_PACKAGE_FILE prepared by
                 // RecoverySystem.installPackage, together with the progress listener, to processPackage().
                 // uncryptd uses it as input and reports progress through progressListener.
                 rs.processPackage(mContext, new File(filename), progressListener);
             } catch (IOException e) {
                 Log.e(TAG, "Error uncrypting file", e);
             }
             done[0] = true;
         }
     };
     t.start();

     try {
         t.join(MAX_UNCRYPT_WAIT_TIME);
     } catch (InterruptedException unused) {
     }
     if (!done[0]) {
         Log.w(TAG, "Timed out waiting for uncrypt.");
         final int uncryptTimeoutError = 100;
         String timeoutMessage = String.format("uncrypt_time: %d\n" + "uncrypt_error: %d\n",
                 MAX_UNCRYPT_WAIT_TIME / 1000, uncryptTimeoutError);
         try {
             FileUtils.stringToFile(RecoverySystem.UNCRYPT_STATUS_FILE, timeoutMessage);
         } catch (IOException e) {
             Log.e(TAG, "Failed to write timeout message to uncrypt status", e);
         }
     }
 }
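The progress mapping in onProgress() is plain integer arithmetic. The C++ sketch below re-expresses it so the numbers can be checked. MOUNT_SERVICE_STOP_PERCENT is assumed to be 20; check the constant in your own tree. The function name is hypothetical, and it returns -1 for the statuses that the Java listener ignores.

```cpp
#include <cassert>

// Assumed value of ShutdownThread.MOUNT_SERVICE_STOP_PERCENT.
constexpr int kMountServiceStopPercent = 20;

// Hypothetical helper mirroring the onProgress() logic in ShutdownThread.uncrypt():
// uncrypt's 0..99 percentage is squeezed into [20, 100) of the shutdown bar,
// 100 means "done, rebooting", and any other value is ignored (-1 here).
int scale_uncrypt_progress(int status) {
  if (status >= 0 && status < 100) {
    int scaled = static_cast<int>(status * (100.0 - kMountServiceStopPercent) / 100);
    return scaled + kMountServiceStopPercent;
  }
  if (status == 100) {
    return 100;
  }
  return -1;
}
```

With the assumed constant, uncrypt progress values of 0, 50 and 99 map to 20, 60 and 99 on the shutdown bar. The first 20% of the bar is left for the earlier shutdown stages.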

How uncryptd preprocesses the package is not covered here; see the separate article (to be continued).

3. The bootloader reads the BCB and boots into the Recovery system

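Bootloader code is not part of AOSP; the SoC vendor provides it. What follows is a rough outline of the bootloader flow on Qualcomm platforms during an upgrade. Other platforms (MTK, Samsung Exynos) implement it differently, but the flow is essentially the same.
Function LinuxLoaderEntry (…)
The bootloader's entry point for booting the kernel.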

// Bootloader entry point for loading the Linux kernel
LinuxLoaderEntry (IN EFI_HANDLE ImageHandle, IN EFI_SYSTEM_TABLE *SystemTable) {

    // 1. Use the boot reason to decide which mode to boot into.
    // This is where "adb reboot recovery" selects recovery mode;
    // for an upgrade, the decision is made in step 2 below.
	Status = GetRebootReason (&BootReason);

    // 2. Read the BCB from the misc partition to decide whether to enter Recovery
	Status = RecoveryInit (&BootIntoRecovery);
	
	if (!BootIntoFastboot) {
	  BootInfo Info = {0};
	  // 3. Set up the boot parameters:
	  // if BootIntoRecovery is true, boot into the Recovery system,
	  // otherwise boot into the main system.
	  Info.MultiSlotBoot = MultiSlotBoot;
	  Info.BootIntoRecovery = BootIntoRecovery;
	  Info.BootReasonAlarm = BootReasonAlarm;
	  // 4. Load the partition images and verify their signatures
	  Status = LoadImageAndAuth (&Info);
	  // 5. Load the kernel from storage into memory and jump to it
	  BootLinux (&Info);
	}
}

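Function RecoveryInit (…)
Purpose: decide from the BCB contents in the misc partition whether to boot into recovery mode.
Implementation: RecoveryInit fills a RecoveryMessage struct directly with the raw data at the head of the misc partition; RecoveryMessage is the in-memory representation of the on-disk BCB. It then checks whether the command field begins with the string "boot-recovery" to choose between the recovery system and the main system. As described earlier, the framework RecoverySystem class wrote this BCB data to the head of the misc partition through the setup-bcb service.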

#define RECOVERY_BOOT_RECOVERY "boot-recovery"

/* Recovery Message */
struct RecoveryMessage {
  CHAR8 command[32];
  CHAR8 status[32];
  CHAR8 recovery[1024];
};

EFI_STATUS
RecoveryInit (BOOLEAN *BootIntoRecovery)
{
  EFI_STATUS Status;
  struct RecoveryMessage *Msg = NULL;
  EFI_GUID Ptype = gEfiMiscPartitionGuid;
  MemCardType CardType = UNKNOWN;
  VOID *PartitionData = NULL;
  UINT32 PageSize;

  CardType = CheckRootDeviceType ();
  if (CardType == NAND) {
    Status = GetNandMiscPartiGuid (&Ptype);
    if (Status != EFI_SUCCESS) {
      return Status;
    }
  }

  GetPageSize (&PageSize);

  /* Get the first 2 pages of the misc partition.
   * If the device type is NAND then read the recovery message from page 1,
   * Else read from the page 0
   */
  Status = ReadFromPartition (&Ptype, (VOID **)&PartitionData, (PageSize * 2));
  if (Status != EFI_SUCCESS) {
    DEBUG ((EFI_D_ERROR, "Error Reading from misc partition: %r\n", Status));
    return Status;
  }

  if (!PartitionData) {
    DEBUG ((EFI_D_ERROR, "Error in loading Data from misc partition\n"));
    return EFI_INVALID_PARAMETER;
  }

  Msg = (CardType == NAND) ?
           (struct RecoveryMessage *) ((CHAR8 *) PartitionData + PageSize) :
           (struct RecoveryMessage *) PartitionData;

  // Ensure NULL termination
  Msg->command[sizeof (Msg->command) - 1] = '\0';
  if (Msg->command[0] != 0 && Msg->command[0] != 255)
    DEBUG ((EFI_D_VERBOSE, "Recovery command: %d %a\n", sizeof (Msg->command),
            Msg->command));

  if (!AsciiStrnCmp (Msg->command, RECOVERY_BOOT_RECOVERY,
                       AsciiStrLen (RECOVERY_BOOT_RECOVERY))) {
    *BootIntoRecovery = TRUE;
  }

  FreePool (PartitionData);
  PartitionData = NULL;
  Msg = NULL;

  return Status;
}
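The decision RecoveryInit() makes can be reduced to a small pure function. The C++ sketch below is not UEFI code, and the name should_boot_recovery and its parameters are hypothetical. It takes the first two pages read from misc and picks page 1 on NAND or page 0 otherwise. It then forces NUL termination of command and compares the prefix with "boot-recovery", just as the AsciiStrnCmp(..., AsciiStrLen(...)) call above does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Same layout as the Qualcomm RecoveryMessage struct above.
struct RecoveryMessage {
  char command[32];
  char status[32];
  char recovery[1024];
};

// Hypothetical re-expression of RecoveryInit()'s decision logic.
// misc_head holds the first 2 * page_size bytes of the misc partition.
bool should_boot_recovery(const std::vector<char>& misc_head, size_t page_size, bool is_nand) {
  size_t offset = is_nand ? page_size : 0;  // NAND keeps the message in page 1
  if (misc_head.size() < offset + sizeof(RecoveryMessage)) {
    return false;  // not enough data to hold a message
  }
  RecoveryMessage msg;
  std::memcpy(&msg, misc_head.data() + offset, sizeof(msg));
  msg.command[sizeof(msg.command) - 1] = '\0';  // ensure NUL termination
  static const char kBootRecovery[] = "boot-recovery";
  return std::strncmp(msg.command, kBootRecovery, std::strlen(kBootRecovery)) == 0;
}
```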

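Function BootLinux (…)
Loads the ramdisk and kernel, which are stored in partitions on disk, into fixed memory regions and sets up the cmdline passed to the kernel. It then jumps to the kernel through a function pointer to the kernel's start address in memory, and the boot flow enters the kernel stage.

As explained in the software architecture article, the recovery system and the main system load their kernel and ramdisk from different partitions. The recovery system's kernel and ramdisk come from the recovery partition, while the main system's come from the boot partition. The two ramdisks differ in the directory layout, configuration files, executables, and so on packed inside them. The kernel itself is normally identical, and only the runtime flow differs because the cmdline differs.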

EFI_STATUS
BootLinux (BootInfo *Info) {
	....
   LinuxKernel = (LINUX_KERNEL) (UINT64)BootParamlistPtr.KernelLoadAddr;
   LinuxKernel ((UINT64)BootParamlistPtr.DeviceTreeLoadAddr, 0, 0, 0);
}

4. The kernel loads the ramdisk, starts init, and init launches the recovery process

(To be continued)

5. Enter the Recovery upgrade flow

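Starting with Android Q (Android 10), Google added fastbootd to recovery mode for flashing partitions such as system and vendor on devices that use dynamic partitions. The main function therefore calls either StartFastboot or start_recovery to enter the corresponding sub-mode.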

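Function main()
Uses its arguments to decide whether to enter userspace fastboot mode (StartFastboot) or recovery mode (start_recovery). After leaving fastboot/recovery mode, it uses the return value to decide whether to reboot or power off.

fastbootd
A userspace service that opens a USB port and implements the bootloader's fastboot transfer protocol. In this mode the host-side fastboot tool (fastboot.exe on Windows) can flash device partition images. This article does not cover it in detail.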

// bootable/recovery/recovery_main.cpp
int main(int argc, char** argv) {
  // Initialize logging
  // We don't have logcat yet under recovery; so we'll print error on screen and log to stdout
  // (which is redirected to recovery.log) as we used to do.
  android::base::InitLogging(argv, &UiLogger);
  
  // Redirect the program's standard output to the temporary log file /tmp/recovery.log
  // redirect_stdio should be called only in non-sideload mode. Otherwise we may have two logger
  // instances with different timestamps.
  redirect_stdio(Paths::Get().temporary_log_file().c_str());
  // Load partition information from fstab
  load_volume_table();
  
  // Read the upgrade command stored in the BCB of the misc partition into the args vector
  std::vector<std::string> args = get_args(argc, argv, &stage);

  while (true) {
    // We start adbd in recovery for the device with userdebug build or a unlocked bootloader.
    std::string usb_config =
        fastboot ? "fastboot" : IsRoDebuggable() || IsDeviceUnlocked() ? "adb" : "none";
    std::string usb_state = android::base::GetProperty("sys.usb.state", "none");
    if (usb_config != usb_state) {
      if (!SetUsbConfig("none")) {
        LOG(ERROR) << "Failed to clear USB config";
      }
      if (!SetUsbConfig(usb_config)) {
        LOG(ERROR) << "Failed to set USB config to " << usb_config;
      }
    }
    // The arguments in args select recovery mode: enter start_recovery, passing in the
    // upgrade command vector read from the BCB in the misc partition.
    auto ret = fastboot ? StartFastboot(device, args) : start_recovery(device, args);
	
    // The upgrade has finished: power off, reboot, etc.
    switch (ret) {
      case Device::REBOOT:
        ui->Print("Rebooting...\n");
        Reboot("userrequested,recovery");
        break;
    }
  }

  // Should be unreachable.
  return EXIT_SUCCESS;
}

5.1 Read the upgrade command from the BCB in the misc partition

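Function get_args()
As the function's comment states, the upgrade command can come from three sources. get_args checks them in the order below, which is also their order of precedence. Once one source yields arguments, the later sources are not consulted:

  1. The process's command-line arguments
  2. The BCB in the misc partition
  3. COMMAND_FILE (/cache/recovery/command)

In the upgrade flow, the command is in practice read from item 2, the BCB in the misc partition.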

// bootable/recovery/recovery_main.cpp
// Parses the command line argument from various sources; and reads the stage field from BCB.
// command line args come from, in decreasing precedence:
//   - the actual command line
//   - the bootloader control block (one per line, after "recovery")
//   - the contents of COMMAND_FILE (one per line)
static std::vector<std::string> get_args(const int argc, char** const argv, std::string* stage) {
  CHECK_GT(argc, 0);

  bootloader_message boot = {};
  std::string err;
  // 1. Fill the bootloader_message struct boot with the BCB data at the head of the misc partition
  if (!read_bootloader_message(&boot, &err)) {
    LOG(ERROR) << err;
    // If fails, leave a zeroed bootloader_message.
    boot = {};
  }
  if (stage) {
    *stage = std::string(boot.stage);
  }

  std::string boot_command;
  if (boot.command[0] != 0) {
    if (memchr(boot.command, '\0', sizeof(boot.command))) {
      boot_command = std::string(boot.command);
    } else {
      boot_command = std::string(boot.command, sizeof(boot.command));
    }
    LOG(INFO) << "Boot command: " << boot_command;
  }

  if (boot.status[0] != 0) {
    std::string boot_status = std::string(boot.status, sizeof(boot.status));
    LOG(INFO) << "Boot status: " << boot_status;
  }
  
  // 2. Use the process's command-line arguments as the default upgrade arguments (usually only argv[0])
  std::vector<std::string> args(argv, argv + argc);

  // 3. If no command-line arguments were given, take the upgrade command from the "recovery" field of the BCB
  // --- if arguments weren't supplied, look in the bootloader control block
  if (args.size() == 1) {
    boot.recovery[sizeof(boot.recovery) - 1] = '\0';  // Ensure termination
    std::string boot_recovery(boot.recovery);
    std::vector<std::string> tokens = android::base::Split(boot_recovery, "\n");
    if (!tokens.empty() && tokens[0] == "recovery") {
      for (auto it = tokens.begin() + 1; it != tokens.end(); it++) {
        // Skip empty and '\0'-filled tokens.
        if (!it->empty() && (*it)[0] != '\0') args.push_back(std::move(*it));
      }
      LOG(INFO) << "Got " << args.size() << " arguments from boot message";
    } else if (boot.recovery[0] != 0) {
      LOG(ERROR) << "Bad boot message: \"" << boot_recovery << "\"";
    }
  }

  // 4. If neither source provided arguments, read them from COMMAND_FILE
  // --- if that doesn't work, try the command file (if we have /cache).
  if (args.size() == 1 && HasCache()) {
    std::string content;
    if (ensure_path_mounted(COMMAND_FILE) == 0 &&
        android::base::ReadFileToString(COMMAND_FILE, &content)) {
      std::vector<std::string> tokens = android::base::Split(content, "\n");
      // All the arguments in COMMAND_FILE are needed (unlike the BCB message,
      // COMMAND_FILE doesn't use filename as the first argument).
      for (auto it = tokens.begin(); it != tokens.end(); it++) {
        // Skip empty and '\0'-filled tokens.
        if (!it->empty() && (*it)[0] != '\0') args.push_back(std::move(*it));
      }
      LOG(INFO) << "Got " << args.size() << " arguments from " << COMMAND_FILE;
    }
  }

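  // 5. Write the arguments back to the BCB in the misc partition. This targets commands obtained from
  // the command line or from COMMAND_FILE: the command stays in the misc partition until recovery exits
  // normally. If the process is interrupted, the device returns to recovery and finishes the command,
  // and only at the very end is the misc partition explicitly erased.
  // *** This makes the upgrade more reliable ***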
  // Write the arguments (excluding the filename in args[0]) back into the
  // bootloader control block. So the device will always boot into recovery to
  // finish the pending work, until FinishRecovery() is called.
  std::vector<std::string> options(args.cbegin() + 1, args.cend());
  if (!update_bootloader_message(options, &err)) {
    LOG(ERROR) << "Failed to set BCB message: " << err;
  }

  // Finally, if no arguments were specified, check whether we should boot
  // into fastboot or rescue mode.
  if (args.size() == 1 && boot_command == "boot-fastboot") {
    args.emplace_back("--fastboot");
  } else if (args.size() == 1 && boot_command == "boot-rescue") {
    args.emplace_back("--rescue");
  }

  return args;
}
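The BCB branch of get_args() is plain string handling. This C++ sketch keeps only the precedence rule (command line first, then BCB) and the parsing of the BCB "recovery" field. It differs from get_args() in four ways: it takes that field as a std::string instead of reading misc, it skips the COMMAND_FILE fallback and the write-back, it uses std::getline in place of android::base::Split, and its function name is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified version of get_args(): real command-line arguments
// win; otherwise parse the BCB "recovery" field, whose first line must be "recovery".
std::vector<std::string> args_from_bcb(const std::vector<std::string>& argv,
                                       const std::string& bcb_recovery) {
  std::vector<std::string> args = argv;  // args[0] is the program name
  if (args.size() != 1) {
    return args;  // command-line arguments take precedence
  }
  std::vector<std::string> tokens;
  std::istringstream in(bcb_recovery);
  for (std::string line; std::getline(in, line, '\n');) {
    tokens.push_back(line);
  }
  if (!tokens.empty() && tokens[0] == "recovery") {
    for (size_t i = 1; i < tokens.size(); ++i) {
      // Skip empty and '\0'-filled tokens, as get_args() does.
      if (!tokens[i].empty() && tokens[i][0] != '\0') args.push_back(tokens[i]);
    }
  }
  return args;
}
```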

5.2 mmap the update package into memory

The flow now enters start_recovery(), which calls mmap to map the package data from storage into the process's address space.
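When the package is an ordinary file rather than the "@" block-map form, mapping it comes down to the POSIX open/fstat/mmap sequence. The C++ sketch below wraps that sequence in a small RAII class. MappedFile is a hypothetical name and is not AOSP's MemMapping or Package class.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper: maps a whole file read-only into memory.
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    int fd = open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd == -1) return;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
      void* addr = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ, MAP_PRIVATE, fd, 0);
      if (addr != MAP_FAILED) {
        addr_ = static_cast<const unsigned char*>(addr);
        length_ = static_cast<size_t>(st.st_size);
      }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
  }
  ~MappedFile() {
    if (addr_ != nullptr) munmap(const_cast<unsigned char*>(addr_), length_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  bool valid() const { return addr_ != nullptr; }
  const unsigned char* data() const { return addr_; }
  size_t size() const { return length_; }

 private:
  const unsigned char* addr_ = nullptr;
  size_t length_ = 0;
};
```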
Function start_recovery()

Device::BuiltinAction start_recovery(Device* device, const std::vector<std::string>& args) {
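  // 1. Get the package path from the "update_package" argument
  static constexpr struct option OPTIONS[] = {
    { "install_with_fuse", no_argument, nullptr, 0 },
    { "update_package", required_argument, nullptr, 0 },
    { nullptr, 0, nullptr, 0 },  // getopt_long() requires a zero-filled terminator
  };

  const char* update_package = nullptr;
  auto args_to_parse = StringVectorToNullTerminatedArray(args);

  int arg;
  int option_index;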

  // Parse everything before the last element (which must be a nullptr). getopt_long(3) expects a
  // null-terminated char* array, but without counting null as an arg (i.e. argv[argc] should be
  // nullptr).
  while ((arg = getopt_long(args_to_parse.size() - 1, args_to_parse.data(), "", OPTIONS,
                            &option_index)) != -1) {
    switch (arg) {
      ...
      case 0: {
        std::string option = OPTIONS[option_index].name;
        if (option == "install_with_fuse") {
          ...
        } else if (option == "update_package") {
          update_package = optarg;
        }
      }
    }
  }

  InstallResult status = INSTALL_SUCCESS;
  // next_action indicates the next target to reboot into upon finishing the install. It could be
  // overridden to a different reboot target per user request.
  Device::BuiltinAction next_action = shutdown_after ? Device::SHUTDOWN : Device::REBOOT;

  if (update_package != nullptr) {
    // It's not entirely true that we will modify the flash. But we want
    // to log the update attempt since update_package is non-NULL.
    save_current_log = true;

    if (int required_battery_level; retry_count == 0 && !IsBatteryOk(&required_battery_level)) {
      ui->Print("battery capacity is not enough for installing package: %d%% needed\n",
                required_battery_level);
      // Log the error code to last_install when installation skips due to low battery.
      log_failure_code(kLowBattery, update_package);
      status = INSTALL_SKIPPED;
    } else if (retry_count == 0 && bootreason_in_blacklist()) {
      // Skip update-on-reboot when bootreason is kernel_panic or similar
      ui->Print("bootreason is in the blacklist; skip OTA installation\n");
      log_failure_code(kBootreasonInBlacklist, update_package);
      status = INSTALL_SKIPPED;
    } else {
      // retry_count records whether the device rebooted during the upgrade
      // It's a fresh update. Initialize the retry_count in the BCB to 1; therefore we can later
      // identify the interrupted update due to unexpected reboots.
      if (retry_count == 0) {
        set_retry_bootloader_message(retry_count + 1, args);
      }
	
      if (update_package[0] == '@') {
        ensure_path_mounted(update_package + 1);
      } else {
        ensure_path_mounted(update_package);
      }

      // 2. As the name CreateMemoryPackage suggests, mmap the package into memory; the object
      // memory_package manages the package mapped into memory.
      if (install_with_fuse) {
		...
      } else if (auto memory_package = Package::CreateMemoryPackage(
                     update_package,
                     std::bind(&RecoveryUI::SetProgress, ui, std::placeholders::_1));
                 memory_package != nullptr) {
        // 3. InstallPackage: as the name suggests, start installing the update package
        status = InstallPackage(memory_package.get(), update_package, should_wipe_cache,
                                retry_count, ui);
      } else {
		...
      }
      if (status != INSTALL_SUCCESS) {
        ui->Print("Installation aborted.\n");

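        // 4. I/O errors sometimes occur during an upgrade and can stop it from continuing. Such errors
        // usually do not recur after a reboot when the data is written again, so Google designed a
        // mechanism for interrupting and resuming an upgrade. Here, when an I/O or similar error
        // occurs, the device reboots and tries the upgrade again.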
        // When I/O error or bspatch/imgpatch error happens, reboot and retry installation
        // RETRY_LIMIT times before we abandon this OTA update.
        static constexpr int RETRY_LIMIT = 4;
        if (status == INSTALL_RETRY && retry_count < RETRY_LIMIT) {
          copy_logs(save_current_log);
          // Increment retry_count. After the reboot, this tells recovery that the upgrade is a
          // retry of an interrupted one, so the resume mechanism takes effect.
          retry_count += 1;
          set_retry_bootloader_message(retry_count, args);
          // Print retry count on screen.
          ui->Print("Retry attempt %d\n", retry_count);

          // Reboot back into recovery to retry the update.
          Reboot("recovery");
        }
      }
    }
  }
  ...
}

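Package::CreateMemoryPackage
This method essentially calls mmap to map the package data into the process's memory. Recall, however, that the framework decrypted a package stored on the data partition, and the package path passed to recovery is "@/cache/recovery/block.map".
This is a clever technique that will be explained separately; see: (to be continued).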
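The "@" form works because block.map is a small text file that records where the package's blocks sit on the raw block device. In the AOSP uncrypt sources, the file is written as four parts: the block device path; a line "<file size> <block size>"; the number of ranges; and one "<start> <end>" line per range, with end exclusive. Treat that layout as an assumption and check uncrypt.cpp in your tree. The parser below is a C++ sketch of reading the format. BlockMap and parse_block_map are hypothetical names, and the validation is simplified compared with recovery's own parser.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of /cache/recovery/block.map.
struct BlockMap {
  std::string block_device;                           // raw device holding the package
  uint64_t file_size = 0;                             // package size in bytes
  uint32_t block_size = 0;                            // e.g. 4096
  std::vector<std::pair<uint64_t, uint64_t>> ranges;  // [start, end) block numbers
};

// Parses the assumed block.map text format; returns false on malformed input.
bool parse_block_map(const std::string& text, BlockMap* out) {
  std::istringstream in(text);
  size_t range_count = 0;
  if (!std::getline(in, out->block_device) || out->block_device.empty()) return false;
  if (!(in >> out->file_size >> out->block_size >> range_count)) return false;
  if (out->block_size == 0) return false;

  uint64_t total_blocks = 0;
  out->ranges.clear();
  for (size_t i = 0; i < range_count; ++i) {
    uint64_t start = 0, end = 0;
    if (!(in >> start >> end) || start >= end) return false;
    out->ranges.emplace_back(start, end);
    total_blocks += end - start;
  }
  // The ranges must cover the whole file: ceil(file_size / block_size) blocks.
  uint64_t needed = (out->file_size + out->block_size - 1) / out->block_size;
  return total_blocks == needed;
}
```

Reading the package then means reading those block ranges from block_device in order, for a total of file_size bytes. This bypasses the encrypted file system entirely.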

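INSTALL_RETRY
A special upgrade-failure error code. Google designed a mechanism for resuming interrupted upgrades, so an upgrade can continue after an interruption, deliberate or not, such as a device reboot or the process being killed. In this case, when a system I/O error occurs, recovery deliberately reboots the device and tries the upgrade again. For the interrupt-and-resume mechanism, see: (to be continued).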
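The retry rule in start_recovery() reduces to a small decision. Recovery retries by rebooting into recovery only while retry_count is below RETRY_LIMIT (4), and it writes the incremented count to the BCB so that the next boot knows it is a retry. The sketch below models that decision as a pure C++ function. The enum and function names are hypothetical. The BCB write and Reboot("recovery") appear only as the returned value and the updated counter.

```cpp
#include <cassert>

// Hypothetical subset of recovery's InstallResult values.
enum class InstallResult { kSuccess, kError, kCorrupt, kRetry };

// What recovery does after an install attempt (hypothetical names).
enum class NextStep { kFinish, kRebootAndRetry, kGiveUp };

constexpr int kRetryLimit = 4;  // RETRY_LIMIT in start_recovery()

// Models the decision; *retry_count is the value that would be written back to the BCB.
NextStep after_install(InstallResult status, int* retry_count) {
  if (status == InstallResult::kSuccess) return NextStep::kFinish;
  if (status == InstallResult::kRetry && *retry_count < kRetryLimit) {
    *retry_count += 1;  // real code: set_retry_bootloader_message(retry_count, args)
    return NextStep::kRebootAndRetry;
  }
  return NextStep::kGiveUp;
}
```

A package that keeps failing with INSTALL_RETRY therefore causes at most four extra reboots before recovery gives up on the OTA. Other errors, such as a corrupt package, are never retried.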

5.3 Verify the integrity and authenticity of the update package

InstallResult InstallPackage(Package* package, const std::string_view package_id,
                             bool should_wipe_cache, int retry_count, RecoveryUI* ui) {
    ...
    bool updater_wipe_cache = false;
    result = VerifyAndInstallPackage(package, &updater_wipe_cache, &log_buffer, retry_count,
                                     &max_temperature, ui);
    should_wipe_cache = should_wipe_cache || updater_wipe_cache;
    ...
}
static InstallResult VerifyAndInstallPackage(Package* package, bool* wipe_cache,
                                             std::vector<std::string>* log_buffer, int retry_count,
                                             int* max_temperature, RecoveryUI* ui) {
  // Verify package.
  if (!verify_package(package, ui)) {
    log_buffer->push_back(android::base::StringPrintf("error: %d", kZipVerificationFailure));
    return INSTALL_CORRUPT;
  }

  // Verify and install the contents of the package.
  ui->Print("Installing update...\n");
  if (retry_count > 0) {
    ui->Print("Retry attempt: %d\n", retry_count);
  }
  ui->SetEnableReboot(false);
  auto result = TryUpdateBinary(package, wipe_cache, log_buffer, retry_count, max_temperature, ui);
  ui->SetEnableReboot(true);
  ui->Print("\n");

  return result;
}
bool verify_package(Package* package, RecoveryUI* ui) {
  static constexpr const char* CERTIFICATE_ZIP_FILE = "/system/etc/security/otacerts.zip";
  std::vector<Certificate> loaded_keys = LoadKeysFromZipfile(CERTIFICATE_ZIP_FILE);
  if (loaded_keys.empty()) {
    return false;
  }

  int err = verify_file(package, loaded_keys);

  if (err != VERIFY_SUCCESS) {
    return false;
  }
  return true;
}

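verify_package checks whether the package signature is valid. In essence, it performs an RSA (or EC) signature verification on the package:

  1. The build server signs the package with its private key and appends the signature block (PKCS#7, which also carries the certificate) to the end of the package, inside the ZIP archive comment;
  2. During verification, recovery reads the footer at the end of the package to locate the signature block and extracts the signature;
  3. It loads the trusted public keys stored on the device (/system/etc/security/otacerts.zip);
  4. It checks the signature against each trusted key in turn, and the package is accepted if any of them verifies it.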
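The first step of locating the signature can be sketched concretely. In AOSP's verifier, a package signed over the whole file ends with a 6-byte footer with three fields: a 2-byte little-endian signature start, counted back from the end of the file; the bytes 0xff 0xff; and the 2-byte little-endian ZIP comment size. The C++ sketch below parses only that footer and checks the end-of-central-directory (EOCD) magic. It omits the PKCS#7 parsing and the RSA/EC check, and the names SignatureFooter and parse_signature_footer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SignatureFooter {
  size_t comment_size;     // ZIP archive comment length
  size_t signature_start;  // offset of the signature block, counted back from EOF
  size_t signed_length;    // number of leading bytes covered by the signature
};

// Hypothetical sketch of the footer checks performed in AOSP verify_file().
bool parse_signature_footer(const std::vector<uint8_t>& pkg, SignatureFooter* out) {
  constexpr size_t kFooterSize = 6;
  constexpr size_t kEocdHeaderSize = 22;  // EOCD record without its comment
  if (pkg.size() < kFooterSize) return false;

  const uint8_t* footer = pkg.data() + pkg.size() - kFooterSize;
  if (footer[2] != 0xff || footer[3] != 0xff) return false;  // no whole-file signature

  size_t comment_size = footer[4] | (footer[5] << 8);
  size_t signature_start = footer[0] | (footer[1] << 8);
  if (signature_start > comment_size || signature_start <= kFooterSize) return false;

  size_t eocd_size = comment_size + kEocdHeaderSize;
  if (pkg.size() < eocd_size) return false;
  const uint8_t* eocd = pkg.data() + pkg.size() - eocd_size;
  // A genuine EOCD record starts with "PK\x05\x06".
  if (eocd[0] != 0x50 || eocd[1] != 0x4b || eocd[2] != 0x05 || eocd[3] != 0x06) return false;

  out->comment_size = comment_size;
  out->signature_start = signature_start;
  // Everything except the 2-byte comment-length field and the comment itself.
  out->signed_length = pkg.size() - eocd_size + kEocdHeaderSize - 2;
  return true;
}
```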

For the technical details of package signature verification, see: (to be continued)

5.4 Fork an update-binary child process to upgrade the system

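Google's update flow contains many flexible and clever design choices, such as the interrupted-update resume mechanism mentioned above. The update-binary described next is another. Throughout the update, the recovery process (/system/bin/recovery) only controls the flow. The actual work is done by update-binary, which is packaged inside the update package at META-INF/com/google/android/update-binary.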

update-binary runs as follows:

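  1. recovery maps the update package into memory with mmap (covered above);
  2. recovery calls TryUpdateBinary() -> SetUpNonAbUpdateCommands() to extract update-binary from the update package to /tmp/update-binary on the device;
  3. recovery forks a child process to run update-binary and sets up a pipe for inter-process communication with it;
  4. update-binary maps the update package into its own address space with mmap, then reads data from the package to rewrite the block devices of the relevant partitions, updating the system;
  5. update-binary sends progress and other data to its parent, recovery, through the pipe, and recovery updates the on-screen progress bar;
  6. recovery calls waitpid(pid, &status, 0) to wait for the update-binary child to finish, then checks the exit status to decide whether the update succeeded.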
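Steps 3, 5 and 6 follow the classic fork/pipe/waitpid pattern. Below is a minimal POSIX sketch of that pattern. It assumes a hypothetical `RunChild()` helper in which a callback stands in for the `execv()` of update-binary; it is not recovery's actual code.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

struct ChildResult {
  std::vector<std::string> lines;  // lines the child wrote into the pipe
  int status = -1;                 // raw status from waitpid(), -1 if never reaped
};

// Hypothetical helper: fork a child, let it report over a pipe, collect the
// lines in the parent, then reap the child with waitpid().
ChildResult RunChild(const std::function<void(int write_fd)>& child_main) {
  ChildResult result;
  int fds[2];
  if (pipe(fds) != 0) return result;

  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return result;
  }
  if (pid == 0) {
    close(fds[0]);       // the child only writes
    child_main(fds[1]);  // recovery would execv() update-binary here instead
    _exit(0);            // _exit: skip atexit handlers and inherited stdio flushing
  }

  close(fds[1]);  // the parent must drop its write end, or fgets() never sees EOF
  FILE* from_child = fdopen(fds[0], "r");
  if (from_child != nullptr) {
    char buf[1024];
    while (fgets(buf, sizeof(buf), from_child) != nullptr) {  // blocks until data or EOF
      std::string line(buf);
      if (!line.empty() && line.back() == '\n') line.pop_back();
      result.lines.push_back(line);
    }
    fclose(from_child);
  } else {
    close(fds[0]);
  }
  waitpid(pid, &result.status, 0);
  return result;
}
```

The parent closes its own copy of the write end right after fork(). Without that, fgets() would never see EOF, even after the child has exited.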

Benefits of packaging update-binary inside the update package

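The Android system update process is fairly complex, especially block-based incremental updates that patch storage blocks, and bugs are hard to rule out. A serious bug could leave users' devices unable to update, which would have a large impact. Updates exist to fix system bugs, so it would be awkward if a bug in the update logic itself kept devices from updating.
Because update-binary is packaged inside the update package, it is extracted to /tmp at update time and then carries out the system update. Even a serious bug in update-binary can therefore be fixed by pushing a new update package to users, without blocking the upgrade to the new version.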

Function TryUpdateBinary()

// If the package contains an update binary, extract it and run it.
static InstallResult TryUpdateBinary(Package* package, bool* wipe_cache,
                                     std::vector<std::string>* log_buffer, int retry_count,
                                     int* max_temperature, RecoveryUI* ui) {
  std::map<std::string, std::string> metadata;
  auto zip = package->GetZipArchiveHandle();
  if (!ReadMetadataFromPackage(zip, &metadata)) {
    LOG(ERROR) << "Failed to parse metadata in the zip file";
    return INSTALL_CORRUPT;
  }

  bool is_ab = android::base::GetBoolProperty("ro.build.ab_update", false);
  if (is_ab) {
    CHECK(package->GetType() == PackageType::kFile);
  }

  // Verify against the metadata in the package first.
  if (is_ab && !CheckPackageMetadata(metadata, OtaType::AB)) {
    log_buffer->push_back(android::base::StringPrintf("error: %d", kUpdateBinaryCommandFailure));
    return INSTALL_ERROR;
  }

  ReadSourceTargetBuild(metadata, log_buffer);

  // The updater in child process writes to the pipe to communicate with recovery.
  android::base::unique_fd pipe_read, pipe_write;
  // Explicitly disable O_CLOEXEC using 0 as the flags (last) parameter to Pipe
  // so that the child updater process will receive a non-closed fd.
  if (!android::base::Pipe(&pipe_read, &pipe_write, 0)) {
    PLOG(ERROR) << "Failed to create pipe for updater-recovery communication";
    return INSTALL_CORRUPT;
  }

  // The updater-recovery communication protocol.
  //
  //   progress <frac> <secs>
  //       fill up the next <frac> part of the progress bar over <secs> seconds. If <secs> is
  //       zero, use `set_progress` commands to manually control the progress of this segment of the
  //       bar.
  //
  //   set_progress <frac>
  //       <frac> should be between 0.0 and 1.0; sets the progress bar within the segment defined by
  //       the most recent progress command.
  //
  //   ui_print <string>
  //       display <string> on the screen.
  //
  //   wipe_cache
  //       a wipe of cache will be performed following a successful installation.
  //
  //   clear_display
  //       turn off the text display.
  //
  //   enable_reboot
  //       packages can explicitly request that they want the user to be able to reboot during
  //       installation (useful for debugging packages that don't exit).
  //
  //   retry_update
  //       updater encounters some issue during the update. It requests a reboot to retry the same
  //       package automatically.
  //
  //   log <string>
  //       updater requests logging the string (e.g. cause of the failure).
  //

  std::string package_path = package->GetPath();

  std::vector<std::string> args;
  if (auto setup_result =
          is_ab ? SetUpAbUpdateCommands(package_path, zip, pipe_write.get(), &args)
                : SetUpNonAbUpdateCommands(package_path, zip, retry_count, pipe_write.get(), &args);
      !setup_result) {
    log_buffer->push_back(android::base::StringPrintf("error: %d", kUpdateBinaryCommandFailure));
    return INSTALL_CORRUPT;
  }

  pid_t pid = fork();
  if (pid == -1) {
    PLOG(ERROR) << "Failed to fork update binary";
    log_buffer->push_back(android::base::StringPrintf("error: %d", kForkUpdateBinaryFailure));
    return INSTALL_ERROR;
  }

  if (pid == 0) {
    umask(022);
    pipe_read.reset();

    // Convert the std::string vector to a NULL-terminated char* vector suitable for execv.
    auto chr_args = StringVectorToNullTerminatedArray(args);
    execv(chr_args[0], chr_args.data());
    // We shouldn't use LOG/PLOG in the forked process, since they may cause the child process to
    // hang. This deadlock results from an improperly copied mutex in the ui functions.
    // (Bug: 34769056)
    fprintf(stdout, "E:Can't run %s (%s)\n", chr_args[0], strerror(errno));
    _exit(EXIT_FAILURE);
  }
  pipe_write.reset();

  std::atomic<bool> logger_finished(false);
  std::thread temperature_logger(log_max_temperature, max_temperature, std::ref(logger_finished));

  *wipe_cache = false;
  bool retry_update = false;

  char buffer[1024];
  FILE* from_child = android::base::Fdopen(std::move(pipe_read), "r");
  while (fgets(buffer, sizeof(buffer), from_child) != nullptr) {
    std::string line(buffer);
    size_t space = line.find_first_of(" \n");
    std::string command(line.substr(0, space));
    if (command.empty()) continue;

    // Get rid of the leading and trailing space and/or newline.
    std::string args = space == std::string::npos ? "" : android::base::Trim(line.substr(space));

    if (command == "progress") {
      std::vector<std::string> tokens = android::base::Split(args, " ");
      double fraction;
      int seconds;
      if (tokens.size() == 2 && android::base::ParseDouble(tokens[0].c_str(), &fraction) &&
          android::base::ParseInt(tokens[1], &seconds)) {
        ui->ShowProgress(fraction * (1 - VERIFICATION_PROGRESS_FRACTION), seconds);
      } else {
        LOG(ERROR) << "invalid \"progress\" parameters: " << line;
      }
    } else if (command == "set_progress") {
      std::vector<std::string> tokens = android::base::Split(args, " ");
      double fraction;
      if (tokens.size() == 1 && android::base::ParseDouble(tokens[0].c_str(), &fraction)) {
        ui->SetProgress(fraction);
      } else {
        LOG(ERROR) << "invalid \"set_progress\" parameters: " << line;
      }
    } else if (command == "ui_print") {
      ui->PrintOnScreenOnly("%s\n", args.c_str());
      fflush(stdout);
    } else if (command == "wipe_cache") {
      *wipe_cache = true;
    } else if (command == "clear_display") {
      ui->SetBackground(RecoveryUI::NONE);
    } else if (command == "enable_reboot") {
      // packages can explicitly request that they want the user
      // to be able to reboot during installation (useful for
      // debugging packages that don't exit).
      ui->SetEnableReboot(true);
    } else if (command == "retry_update") {
      retry_update = true;
    } else if (command == "log") {
      if (!args.empty()) {
        // Save the logging request from updater and write to last_install later.
        log_buffer->push_back(args);
      } else {
        LOG(ERROR) << "invalid \"log\" parameters: " << line;
      }
    } else {
      LOG(ERROR) << "unknown command [" << command << "]";
    }
  }
  fclose(from_child);

  int status;
  waitpid(pid, &status, 0);

  logger_finished.store(true);
  finish_log_temperature.notify_one();
  temperature_logger.join();

  if (retry_update) {
    return INSTALL_RETRY;
  }
  if (WIFEXITED(status)) {
    if (WEXITSTATUS(status) != EXIT_SUCCESS) {
      LOG(ERROR) << "Error in " << package_path << " (status " << WEXITSTATUS(status) << ")";
      return INSTALL_ERROR;
    }
  } else if (WIFSIGNALED(status)) {
    LOG(ERROR) << "Error in " << package_path << " (killed by signal " << WTERMSIG(status) << ")";
    return INSTALL_ERROR;
  } else {
    LOG(FATAL) << "Invalid status code " << status;
  }

  return INSTALL_SUCCESS;
}
bool SetUpNonAbUpdateCommands(const std::string& package, ZipArchiveHandle zip, int retry_count,
                              int status_fd, std::vector<std::string>* cmd) {
  CHECK(cmd != nullptr);

  // In non-A/B updates we extract the update binary from the package.
  static constexpr const char* UPDATE_BINARY_NAME = "META-INF/com/google/android/update-binary";
  ZipEntry binary_entry;
  if (FindEntry(zip, UPDATE_BINARY_NAME, &binary_entry) != 0) {
    LOG(ERROR) << "Failed to find update binary " << UPDATE_BINARY_NAME;
    return false;
  }

  const std::string binary_path = Paths::Get().temporary_update_binary();
  unlink(binary_path.c_str());
  android::base::unique_fd fd(
      open(binary_path.c_str(), O_CREAT | O_WRONLY | O_TRUNC | O_CLOEXEC, 0755));
  if (fd == -1) {
    PLOG(ERROR) << "Failed to create " << binary_path;
    return false;
  }

  if (auto error = ExtractEntryToFile(zip, &binary_entry, fd); error != 0) {
    LOG(ERROR) << "Failed to extract " << UPDATE_BINARY_NAME << ": " << ErrorCodeString(error);
    return false;
  }

  // When executing the update binary contained in the package, the arguments passed are:
  //   - the version number for this interface
  //   - an FD to which the program can write in order to update the progress bar.
  //   - the name of the package zip file.
  //   - an optional argument "retry" if this update is a retry of a failed update attempt.
  *cmd = {
    binary_path,
    std::to_string(kRecoveryApiVersion),
    std::to_string(status_fd),
    package,
  };
  if (retry_count > 0) {
    cmd->push_back("retry");
  }
  return true;
}
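The updater-recovery protocol documented in TryUpdateBinary() above is line-oriented: a command word, optionally followed by arguments. The sketch below restates the split performed in its while loop using only the standard library. `ParsePipeLine`, `ParseProgressArgs` and `PipeCommand` are hypothetical names, and `std::istringstream` only approximates the stricter android::base::ParseDouble/ParseInt checks.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// One line received on the updater -> recovery pipe: "<command>[ <args>]\n".
struct PipeCommand {
  std::string command;
  std::string args;
};

// Strips leading/trailing spaces, tabs and newlines (stand-in for android::base::Trim).
static std::string Trim(const std::string& s) {
  const char* ws = " \t\r\n";
  size_t begin = s.find_first_not_of(ws);
  if (begin == std::string::npos) return "";
  size_t end = s.find_last_not_of(ws);
  return s.substr(begin, end - begin + 1);
}

// Splits at the first space or newline, as recovery does; empty commands are skipped.
std::optional<PipeCommand> ParsePipeLine(const std::string& line) {
  size_t space = line.find_first_of(" \n");
  std::string command = line.substr(0, space);
  if (command.empty()) return std::nullopt;
  std::string args = space == std::string::npos ? "" : Trim(line.substr(space));
  return PipeCommand{command, args};
}

// Parses "progress <frac> <secs>" arguments; rejects missing or extra tokens.
bool ParseProgressArgs(const std::string& args, double* fraction, int* seconds) {
  std::istringstream in(args);
  std::string extra;
  return static_cast<bool>(in >> *fraction >> *seconds) && !(in >> extra);
}
```

Keeping parsing separate from the UI calls (ShowProgress, SetProgress, and so on) is a design choice of this sketch. It makes the protocol testable without a display.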

update-binary
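Source path: bootable/recovery/updater/
As the actual executor of the update, update-binary has a fairly complex internal flow of its own. The source below shows the entry point. update-binary takes the package path from its command-line arguments and constructs an Updater object. It initializes the object with Updater.Init(), then calls Updater.RunUpdate() to carry out the update.
How update-binary actually completes the system update is covered in: (to be continued).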

// bootable/recovery/updater/updater_main.cpp
static void UpdaterLogger(android::base::LogId /* id */, android::base::LogSeverity /* severity */,
                          const char* /* tag */, const char* /* file */, unsigned int /* line */,
                          const char* message) {
  fprintf(stdout, "%s\n", message);
}

int main(int argc, char** argv) {
  // Various things log information to stdout or stderr more or less
  // at random (though we've tried to standardize on stdout).  The
  // log file makes more sense if buffering is turned off so things
  // appear in the right order.
  setbuf(stdout, nullptr);
  setbuf(stderr, nullptr);

  // We don't have logcat yet under recovery. Update logs will always be written to stdout
  // (which is redirected to recovery.log).
  android::base::InitLogging(argv, &UpdaterLogger);

  // Run the libcrypto KAT(known answer tests) based self tests.
  if (BORINGSSL_self_test() != 1) {
    LOG(ERROR) << "Failed to run the boringssl self tests";
    return EXIT_FAILURE;
  }

  if (argc != 4 && argc != 5) {
    LOG(ERROR) << "unexpected number of arguments: " << argc;
    return EXIT_FAILURE;
  }

  char* version = argv[1];
  if ((version[0] != '1' && version[0] != '2' && version[0] != '3') || version[1] != '\0') {
    // We support version 1, 2, or 3.
    LOG(ERROR) << "wrong updater binary API; expected 1, 2, or 3; got " << argv[1];
    return EXIT_FAILURE;
  }

  int fd;
  if (!android::base::ParseInt(argv[2], &fd)) {
    LOG(ERROR) << "Failed to parse fd in " << argv[2];
    return EXIT_FAILURE;
  }

  std::string package_name = argv[3];

  bool is_retry = false;
  if (argc == 5) {
    if (strcmp(argv[4], "retry") == 0) {
      is_retry = true;
    } else {
      LOG(ERROR) << "unexpected argument: " << argv[4];
      return EXIT_FAILURE;
    }
  }

  // Configure edify's functions.
  RegisterBuiltins();
  RegisterInstallFunctions();
  RegisterBlockImageFunctions();
  RegisterDynamicPartitionsFunctions();
  RegisterDeviceExtensions();

  auto sehandle = selinux_android_file_context_handle();
  selinux_android_set_sehandle(sehandle);

  Updater updater(std::make_unique<UpdaterRuntime>(sehandle));
  if (!updater.Init(fd, package_name, is_retry)) {
    return EXIT_FAILURE;
  }

  if (!updater.RunUpdate()) {
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}
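SetUpNonAbUpdateCommands() and main() above form a small contract. argv[1] is the API version, argv[2] is the pipe fd, argv[3] is the package path, and an optional argv[4] is the literal "retry". The hypothetical `ParseUpdaterArgs()` below restates those checks as a pure function so they can be exercised without a device. It uses `std::strtol` in place of android::base::ParseInt.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

struct UpdaterArgs {
  int api_version;
  int status_fd;
  std::string package;
  bool is_retry;
};

// Hypothetical restatement of the checks in updater_main.cpp's main():
//   argv = { binary, api_version, status_fd, package [, "retry"] }
std::optional<UpdaterArgs> ParseUpdaterArgs(const std::vector<std::string>& argv) {
  if (argv.size() != 4 && argv.size() != 5) return std::nullopt;

  // Only API versions 1, 2 and 3 are accepted, as a single character.
  const std::string& version = argv[1];
  if (version != "1" && version != "2" && version != "3") return std::nullopt;

  // The whole argument must be an int (strtol stands in for android::base::ParseInt).
  const std::string& fd_str = argv[2];
  char* end = nullptr;
  errno = 0;
  long fd = std::strtol(fd_str.c_str(), &end, 10);
  if (fd_str.empty() || *end != '\0' || errno == ERANGE || fd < INT_MIN || fd > INT_MAX) {
    return std::nullopt;
  }

  bool is_retry = false;
  if (argv.size() == 5) {
    if (argv[4] != "retry") return std::nullopt;  // any other 5th argument is an error
    is_retry = true;
  }
  return UpdaterArgs{version[0] - '0', static_cast<int>(fd), argv[3], is_retry};
}
```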

5.5 Exit the update-binary child process, save logs, and erase the BCB in the misc partition

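TryUpdateBinary() above shows that, after forking the update-binary child, the recovery process enters a while loop. The loop reads data sent by the child through the pipe, parses it into commands, and performs the corresponding operations.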

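recovery does not pass O_NONBLOCK when opening the read end of the pipe, so reads block. recovery has also already closed its own copy of the write end (pipe_write.reset()). As long as the child keeps its write end open, fgets() never reports EOF and the while loop does not exit. Until the child process exits, recovery is either blocked waiting for data or executing a command it has just read.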

FILE* from_child = android::base::Fdopen(std::move(pipe_read), "r");
while (fgets(buffer, sizeof(buffer), from_child) != nullptr) {
  std::string line(buffer);
  size_t space = line.find_first_of(" \n");
  std::string command(line.substr(0, space));
  if (command.empty()) continue;

  // Get rid of the leading and trailing space and/or newline.
  std::string args = space == std::string::npos ? "" : android::base::Trim(line.substr(space));

  if (command == "progress") {
    std::vector<std::string> tokens = android::base::Split(args, " ");
    double fraction;
    int seconds;
    if (tokens.size() == 2 && android::base::ParseDouble(tokens[0].c_str(), &fraction) &&
        android::base::ParseInt(tokens[1], &seconds)) {
      ui->ShowProgress(fraction * (1 - VERIFICATION_PROGRESS_FRACTION), seconds);
    } else {
      LOG(ERROR) << "invalid \"progress\" parameters: " << line;
    }
  }
  ...
}

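When the update-binary process finishes and exits, the write end of the pipe is closed. recovery then leaves the while loop that listens for messages from the child, and execution continues with: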

int status;
waitpid(pid, &status, 0);

logger_finished.store(true);
finish_log_temperature.notify_one();
temperature_logger.join();

if (retry_update) {
  return INSTALL_RETRY;
}
if (WIFEXITED(status)) {
  if (WEXITSTATUS(status) != EXIT_SUCCESS) {
    LOG(ERROR) << "Error in " << package_path << " (status " << WEXITSTATUS(status) << ")";
    return INSTALL_ERROR;
  }
} else if (WIFSIGNALED(status)) {
  LOG(ERROR) << "Error in " << package_path << " (killed by signal " << WTERMSIG(status) << ")";
  return INSTALL_ERROR;
} else {
  LOG(FATAL) << "Invalid status code " << status;
}

return INSTALL_SUCCESS;

recovery calls waitpid(pid, &status, 0) to obtain the child's exit status and uses it to decide whether the update succeeded. Control then returns from install/install.cpp to recovery.cpp:
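The decision logic after waitpid() can be isolated and exercised on real child processes. In the sketch below, `ClassifyChildStatus()` and `ForkAndWait()` are hypothetical helpers, and the returned strings stand in for recovery's INSTALL_* values. As in TryUpdateBinary(), a retry request received over the pipe takes precedence over the exit status.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Maps a raw waitpid() status to an outcome, mirroring the tail of TryUpdateBinary().
std::string ClassifyChildStatus(int status, bool retry_requested) {
  if (retry_requested) return "retry";  // "retry_update" from the pipe wins
  if (WIFEXITED(status)) {
    return WEXITSTATUS(status) == EXIT_SUCCESS
               ? "success"
               : "error: exit " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) return "error: signal " + std::to_string(WTERMSIG(status));
  return "invalid";  // recovery treats this case as fatal
}

// Forks a child that runs `body`, waits for it, and returns the raw status
// (-1 if fork() failed; good enough for a sketch).
int ForkAndWait(void (*body)()) {
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    body();
    _exit(0);
  }
  int status = 0;
  waitpid(pid, &status, 0);
  return status;
}
```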

Device::BuiltinAction start_recovery(Device* device, const std::vector<std::string>& args) {
   ...
  // Determine the next action.
  //  - If the state is INSTALL_REBOOT, device will reboot into the target as specified in
  //    `next_action`.
  //  - If the recovery menu is visible, prompt and wait for commands.
  //  - If the state is INSTALL_NONE, wait for commands (e.g. in user build, one manually boots
  //    into recovery to sideload a package or to wipe the device).
  //  - In all other cases, reboot the device. Therefore, normal users will observe the device
  //    rebooting a) immediately upon successful finish (INSTALL_SUCCESS); or b) an "error" screen
  //    for 5s followed by an automatic reboot.
  if (status != INSTALL_REBOOT) {
    if (status == INSTALL_NONE || ui->IsTextVisible()) {
      auto temp = PromptAndWait(device, status);
      if (temp != Device::NO_ACTION) {
        next_action = temp;
      }
    }
  }

  // Save logs and clean up before rebooting or shutting down.
  FinishRecovery(ui);

  return next_action;
}

InstallPackage() returns values such as INSTALL_SUCCESS, INSTALL_ERROR, INSTALL_CORRUPT and INSTALL_RETRY, which end up in the status variable in the source.

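  • If the update fails and the text UI is visible (or status is INSTALL_NONE), PromptAndWait() is called. The screen shows a prompt, and the next step waits for the user to respond (this step matters little and is not discussed further). Otherwise, as the comment in the code notes, the device shows an error screen for about 5 seconds and then reboots.
  • If the update succeeds, or once PromptAndWait() returns, FinishRecovery() is called to prepare for leaving recovery.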

Function FinishRecovery()

// Clear the recovery command and prepare to boot a (hopefully working) system,
// copy our log file to cache as well (for the system to read). This function is
// idempotent: call it as many times as you like.
static void FinishRecovery(RecoveryUI* ui) {
  std::string locale = ui->GetLocale();
  // Save the locale to cache, so if recovery is next started up without a '--locale' argument
  // (e.g., directly from the bootloader) it will use the last-known locale.
  if (!locale.empty() && HasCache()) {
    LOG(INFO) << "Saving locale \"" << locale << "\"";
    if (ensure_path_mounted(LOCALE_FILE) != 0) {
      LOG(ERROR) << "Failed to mount " << LOCALE_FILE;
    } else if (!android::base::WriteStringToFile(locale, LOCALE_FILE)) {
      PLOG(ERROR) << "Failed to save locale to " << LOCALE_FILE;
    }
  }

  copy_logs(save_current_log);

  // Reset to normal system boot so recovery won't cycle indefinitely.
  std::string err;
  if (!clear_bootloader_message(&err)) {
    LOG(ERROR) << "Failed to clear BCB message: " << err;
  }

  // Remove the command file, so recovery won't repeat indefinitely.
  if (HasCache()) {
    if (ensure_path_mounted(COMMAND_FILE) != 0 || (unlink(COMMAND_FILE) && errno != ENOENT)) {
      LOG(WARNING) << "Can't unlink " << COMMAND_FILE;
    }
    ensure_path_unmounted(CACHE_ROOT);
  }

  sync();  // For good measure.
}

FinishRecovery() performs two key operations:

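  1. Copy the in-memory log file /tmp/recovery.log to /cache/recovery.
    Why the log goes to /tmp/recovery.log first rather than straight to /cache/recovery: as the main function shows, recovery's output is redirected to the log file in real time. A log under /cache/recovery would conflict with a routine recovery action, wiping (formatting) the cache partition. During a cache wipe, the open log file would keep the partition busy, so it could not be unmounted and the wipe would fail.
  2. clear_bootloader_message() erases the BCB data in the misc partition.
    The BCB must be erased promptly once the update finishes. Otherwise, on the next reboot the BootLoader would find the BCB data and boot back into the recovery system. It must not be erased too early either, because it is a key part of the interrupted-update resume mechanism.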
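To make the BCB concrete, the struct below assumes the layout of AOSP's `struct bootloader_message` (command, status, recovery, stage, reserved; 2048 bytes in total), stored at the start of the misc partition. The helpers are hypothetical and operate on an ordinary file standing in for the misc block device. `ClearBcb()` mirrors the idea behind clear_bootloader_message(): overwrite the BCB with zeros so the bootloader no longer finds a "boot-recovery" command.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Assumed layout of the bootloader control block at offset 0 of misc.
struct BootloaderMessage {
  char command[32];    // e.g. "boot-recovery"
  char status[32];
  char recovery[768];  // recovery arguments, one per line
  char stage[32];
  char reserved[1184];
};
static_assert(sizeof(BootloaderMessage) == 2048, "BCB is assumed to be 2 KiB");

// Writes the BCB at offset 0 of an existing file (stand-in for the misc device).
bool WriteBcb(const std::string& misc_path, const BootloaderMessage& msg) {
  std::fstream f(misc_path, std::ios::in | std::ios::out | std::ios::binary);
  if (!f) return false;
  f.seekp(0);
  f.write(reinterpret_cast<const char*>(&msg), sizeof(msg));
  f.flush();
  return static_cast<bool>(f);
}

// Reads the BCB back from offset 0.
bool ReadBcb(const std::string& misc_path, BootloaderMessage* msg) {
  std::ifstream f(misc_path, std::ios::binary);
  if (!f) return false;
  f.read(reinterpret_cast<char*>(msg), sizeof(*msg));
  return f.gcount() == static_cast<std::streamsize>(sizeof(*msg));
}

// Zeroes the whole BCB, so no boot command remains for the bootloader.
bool ClearBcb(const std::string& misc_path) {
  BootloaderMessage empty{};
  return WriteBcb(misc_path, empty);
}
```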

5.6 Reboot the device back to the main system

After start_recovery() returns, control goes back to recovery_main.cpp. Based on the return value of start_recovery(), it reboots into the target system (normally the main system):

// recovery_main.cpp
    auto ret = fastboot ? StartFastboot(device, args) : start_recovery(device, args);

    if (ret == Device::KEY_INTERRUPTED) {
      ret = action.exchange(ret);
      if (ret == Device::NO_ACTION) {
        continue;
      }
    }
    switch (ret) {
      case Device::SHUTDOWN:
        ui->Print("Shutting down...\n");
        Shutdown("userrequested,recovery");
        break;

      case Device::SHUTDOWN_FROM_FASTBOOT:
        ui->Print("Shutting down...\n");
        Shutdown("userrequested,fastboot");
        break;

      case Device::REBOOT_BOOTLOADER:
        ui->Print("Rebooting to bootloader...\n");
        Reboot("bootloader");
        break;

      case Device::REBOOT_FASTBOOT:
        ui->Print("Rebooting to recovery/fastboot...\n");
        Reboot("fastboot");
        break;

      case Device::REBOOT_RECOVERY:
        ui->Print("Rebooting to recovery...\n");
        Reboot("recovery");
        break;

      case Device::REBOOT_RESCUE: {
        // Not using `Reboot("rescue")`, as it requires matching support in kernel and/or
        // bootloader.
        bootloader_message boot = {};
        strlcpy(boot.command, "boot-rescue", sizeof(boot.command));
        std::string err;
        if (!write_bootloader_message(boot, &err)) {
          LOG(ERROR) << "Failed to write bootloader message: " << err;
          // Stay under recovery on failure.
          continue;
        }
        ui->Print("Rebooting to recovery/rescue...\n");
        Reboot("recovery");
        break;
      }

      case Device::ENTER_FASTBOOT:
        if (android::fs_mgr::LogicalPartitionsMapped()) {
          ui->Print("Partitions may be mounted - rebooting to enter fastboot.");
          Reboot("fastboot");
        } else {
          LOG(INFO) << "Entering fastboot";
          fastboot = true;
        }
        break;

      case Device::ENTER_RECOVERY:
        LOG(INFO) << "Entering recovery";
        fastboot = false;
        break;

      case Device::REBOOT:
        ui->Print("Rebooting...\n");
        Reboot("userrequested,recovery");
        break;

      case Device::REBOOT_FROM_FASTBOOT:
        ui->Print("Rebooting...\n");
        Reboot("userrequested,fastboot");
        break;

      default:
        ui->Print("Rebooting...\n");
        Reboot("unknown" + std::to_string(ret));
        break;
    }

void Reboot(std::string_view target) {
  std::string cmd = "reboot," + std::string(target);
  // Honor the quiescent mode if applicable.
  if (target != "bootloader" && target != "fastboot" &&
      android::base::GetBoolProperty("ro.boot.quiescent", false)) {
    cmd += ",quiescent";
  }
  if (!android::base::SetProperty(ANDROID_RB_PROPERTY, cmd)) {
    LOG(FATAL) << "Reboot failed";
  }

  while (true) pause();
}

bool Shutdown(std::string_view target) {
  std::string cmd = "shutdown," + std::string(target);
  return android::base::SetProperty(ANDROID_RB_PROPERTY, cmd);
}
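Reboot() does not reboot anything itself. It only sets a property and waits for init to act on it. Assuming ANDROID_RB_PROPERTY is `sys.powerctl` (its value in AOSP's cutils headers), the string it writes can be sketched as a pure function. `BuildRebootCommand()` is hypothetical and takes the `ro.boot.quiescent` value as a parameter instead of reading the property.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Builds the value Reboot() would write to ANDROID_RB_PROPERTY.
std::string BuildRebootCommand(std::string_view target, bool quiescent) {
  std::string cmd = "reboot," + std::string(target);
  // Quiescent mode is honored for every target except bootloader and fastboot.
  if (target != "bootloader" && target != "fastboot" && quiescent) {
    cmd += ",quiescent";
  }
  return cmd;
}
```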

6. BootLoader boots the Main System

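This flow is much the same as in section 3 (BootLoader reads the BCB and boots into the Recovery System). The difference is that recovery already erased the misc partition BCB on exit, so the BootLoader boots the kernel in the boot partition and then the main system.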

7. Init starts the flash_recovery service to update the recovery partition

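The vendor_flash_recovery service is defined in an rc file, and its executable is /vendor/bin/install-recovery.sh. The definition of flash_recovery is not fixed, and some vendors define it under system instead. Either way, the task and the underlying principle are the same: bring the recovery partition up to the new version.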

In AOSP, for example, flash_recovery is defined in vendor and renamed vendor_flash_recovery:

# bootable/recovery/applypatch/vendor_flash_recovery.rc
service vendor_flash_recovery /vendor/bin/install-recovery.sh
    class main
    oneshot

7.1 Why does flash_recovery exist?

Why not put the recovery image into the update package and have recovery update itself while updating the system? There are two main reasons:

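  • Update stability
    Suppose the device reboots unexpectedly while the recovery partition is being written. Only part of the new data will have been written, and the recovery partition is then certainly corrupt. The interrupted-update resume mechanism described above can no longer work, because the device can no longer boot into the recovery system. The flash_recovery mechanism avoids this problem. recovery is rewritten from the main system only after the system update has completed, and because the check runs on every boot, an interrupted write is simply redone on the next boot.
  • System security
    Enthusiasts on forums often flash third-party recoveries to install third-party ROMs or extract data from phones, which is very insecure. This mechanism mitigates the problem to some extent. On every boot, flash_recovery checks whether the SHA1 of the recovery partition data matches the expected value, and restores the recovery partition if it does not.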

This raises the question of how flash_recovery updates or restores the recovery partition. See 7.3.

7.2 When is the flash_recovery service started?

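As vendor_flash_recovery.rc shows, vendor_flash_recovery belongs to class main, so it starts whenever the main-class services are started.
init.rc shows that if the userdata partition is not encrypted, the main class is started when the nonencrypted trigger fires. Otherwise, startup is controlled by the vold.decrypt property, which is set during the encryption/decryption flow.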

  • The nonencrypted trigger is queued by the function queue_fs_event() in builtins.cpp.
  • The property vold.decrypt is set in system/vold/cryptfs.cpp.

// system/core/init/builtins.cpp
static Result<void> queue_fs_event(int code, bool userdata_remount)

# system/core/rootdir/init.rc
on nonencrypted
    class_start main
    class_start late_start

on property:vold.decrypt=trigger_restart_min_framework
    # A/B update verifier that marks a successful boot.
    exec_start update_verifier
    class_start main

on property:vold.decrypt=trigger_restart_framework
    # A/B update verifier that marks a successful boot.
    exec_start update_verifier
    class_start_post_data hal
    class_start_post_data core
    class_start main
    class_start late_start
    setprop service.bootanim.exit 0
    start bootanim

on property:vold.decrypt=trigger_shutdown_framework
    class_reset late_start
    class_reset main
    class_reset_post_data core
    class_reset_post_data hal

7.3 How does flash_recovery update the recovery partition?

Starting the vendor_flash_recovery service runs the install-recovery.sh script, which updates the recovery partition.

# bootable/recovery/applypatch/vendor_flash_recovery.rc
service vendor_flash_recovery /vendor/bin/install-recovery.sh
    class main
    oneshot

The content of install-recovery.sh is as follows:

#!/system/bin/sh
if ! applypatch --check EMMC:/dev/block/bootdevice/by-name/recovery:100663296:5859245349a4196c1e30f0ff21d727016e740e25; then
  applypatch  \
          --patch /system/recovery-from-boot.p \
          --source EMMC:/dev/block/bootdevice/by-name/boot:100663296:a362e080d203e34fbdcce47278cda2bda566409a \
          --target EMMC:/dev/block/bootdevice/by-name/recovery:100663296:5859245349a4196c1e30f0ff21d727016e740e25 && \
      log -t recovery "Installing new recovery image: succeeded" || \
      log -t recovery "Installing new recovery image: failed"
else
  log -t recovery "Recovery image already installed"
fi

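The script updates the recovery partition in the following steps:
1) Compute the SHA1 of the recovery partition and check whether it matches the expected value;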

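The expected SHA1 value is hard-coded in the script and fixed at build time. When the build server generates the recovery partition image, it computes the SHA1 of the image contents and writes that value directly into the install-recovery.sh script it generates.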

2) If the SHA1 matches, the recovery partition has already been updated and the script is done;

applypatch --check EMMC:/dev/block/bootdevice/by-name/recovery:100663296:5859245349a4196c1e30f0ff21d727016e740e25

"--check": tells applypatch to verify the partition data. It is followed by the partition path, size, and expected SHA1.

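3) If the SHA1 does not match, the recovery partition either has not been updated yet or is corrupt, so the script updates it.
applypatch loads data from the boot partition, applies the patch (/system/recovery-from-boot.p), and writes the resulting data to the recovery partition.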

applypatch  \
          --patch /system/recovery-from-boot.p \
          --source EMMC:/dev/block/bootdevice/by-name/boot:100663296:a362e080d203e34fbdcce47278cda2bda566409a \
          --target EMMC:/dev/block/bootdevice/by-name/recovery:100663296:5859245349a4196c1e30f0ff21d727016e740e25

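"--patch": the patch file. At build time, the build server runs a diff tool on the raw data of the boot and recovery images to generate this binary patch;
"--source": the source. As the command shows, its argument includes the device path, size and SHA1, so the source data is also verified for integrity and authenticity before the patch is applied;
"--target": the target. The data produced by applying the patch to the source is written to the target. Its argument likewise includes the device path, size and SHA1, so the target data is checked after the update. Note that the "--target" argument is identical to the "--check" argument.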
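The "--check", "--source" and "--target" arguments share one colon-separated format: partition type, device path, size and expected SHA1. Below is a hypothetical parser for that format. It assumes exactly four fields and a 40-character lowercase hex SHA1, as in the script above; applypatch's own parser may accept more. Computing the SHA1 itself would need a crypto library and is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// EMMC:<device path>:<size in bytes>:<expected SHA1 hex>
struct PartitionSpec {
  std::string type;
  std::string device;
  uint64_t size;
  std::string sha1;
};

std::optional<PartitionSpec> ParsePartitionSpec(const std::string& spec) {
  // Split on ':' into fields.
  std::vector<std::string> fields;
  size_t start = 0;
  while (true) {
    size_t colon = spec.find(':', start);
    fields.push_back(spec.substr(start, colon == std::string::npos ? std::string::npos : colon - start));
    if (colon == std::string::npos) break;
    start = colon + 1;
  }
  if (fields.size() != 4 || fields[0] != "EMMC") return std::nullopt;

  // Size: decimal digits only; 19 digits always fit in uint64_t.
  const std::string& size_str = fields[2];
  if (size_str.empty() || size_str.size() > 19 ||
      size_str.find_first_not_of("0123456789") != std::string::npos) {
    return std::nullopt;
  }
  // SHA1: 20 bytes written as 40 lowercase hex characters.
  const std::string& sha1 = fields[3];
  if (sha1.size() != 40 || sha1.find_first_not_of("0123456789abcdef") != std::string::npos) {
    return std::nullopt;
  }
  return PartitionSpec{fields[0], fields[1], static_cast<uint64_t>(std::stoull(size_str)), sha1};
}
```

Given such a spec, the check amounts to reading `size` bytes from `device`, hashing them, and comparing the digest with `sha1`.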

Why is the source the boot partition? Where do its size, its SHA1 and the patch file come from?

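As described in the software architecture part, the boot and recovery partitions both essentially hold kernel+ramdisk (the recovery image sometimes also carries a dtb: kernel+ramdisk+dtb). Their kernels are basically identical, their ramdisks are broadly similar, and the dtb is small, so most of the data in the two partitions is the same. Once recovery has updated the system to the new version, there is therefore no need to store a complete recovery image in the system partition just to update the recovery partition. The boot partition data can be reused instead. The build stores a patch between the two images in the system partition. Updating recovery then only requires loading the boot partition data and applying the patch to reconstruct the recovery image, which also saves a good deal of space in the system partition.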

The core program here, applypatch, is also fairly complex and is not covered here; see: (to be continued)

8. Boot to the launcher; the update flow ends