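In the previous article we saw that an app is compiled into an executable Mach-O file. Before it runs, that file is just a static file on disk; to run it, the system has to load it and turn it into an independent process. This involves two main steps:

Loading

Loading is the process of mapping the executable file on disk into virtual memory.

Dynamic linking

The linking we discussed earlier, when producing the executable, is static linking; linking the runtime libraries when the app is loaded and run is dynamic linking. Runtime libraries are usually system libraries: collections of the basic functions a program needs in order to run, which means nearly every process uses them. If every executable linked them statically and bundled them into its own binary, a great deal of memory and disk space would be wasted. On a modern Linux system, for example, the static C library code used by an ordinary program takes up at least 1 MB, so 2,000 such programs would occupy nearly 2 GB of storage. To solve this, linking against the runtime libraries is deferred until run time, and a shared cache of dynamic libraries is used.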
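As a rough illustration of the loading step, the sketch below maps a file into the calling process's virtual address space with the POSIX mmap call. It is only an analogy for how the segments of a Mach-O file end up in memory, not the kernel's actual loader; read_via_mmap and write_temp_file are hypothetical helper names introduced for this example, and the temp-file path is an assumption.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdlib>
#include <string>

// Maps a whole file read-only into this process's address space and returns
// its bytes as a string (the copy is only there to keep the demo simple).
// Returns an empty string on any failure or for an empty file.
std::string read_via_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return "";
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return ""; }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return "";
    std::string contents(static_cast<const char*>(base), size);
    munmap(base, size);
    return contents;
}

// Hypothetical helper: writes `text` to a fresh temporary file and returns
// its path, or an empty string on failure.
std::string write_temp_file(const std::string& text) {
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return "";
    ssize_t written = write(fd, text.data(), text.size());
    close(fd);
    if (written != static_cast<ssize_t>(text.size())) return "";
    return std::string(path);
}
```

The point of the sketch is that no explicit read loop is needed: once the file is mapped, its bytes are simply addressable memory, and pages are brought in on demand.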

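Advantages of dynamic linking:

It avoids keeping multiple copies of the same shared object file, which wastes disk and memory space.

It reduces paging of physical pages in and out and improves the CPU cache hit rate, because data and instruction accesses from different processes are concentrated on the same shared module.

A system upgrade only needs to replace the affected shared module. The next time a program starts, the new version of the module is loaded and linked automatically, so the program picks up the new version transparently.

It makes plug-ins easier to build, giving programs better extensibility and compatibility.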

Dyld

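Dyld overview

dyld (the dynamic link editor) is the dynamic linker and an important part of iOS. Once the kernel has finished its setup work, it hands the rest over to dyld. The system first reads the app's executable (Mach-O) to obtain the path to dyld (recorded in the LC_LOAD_DYLINKER load command) and then loads dyld. dyld initializes the runtime environment, enables the shared cache, loads the libraries the program depends on and links them, and finally calls each dependent library's initializers; this is the step in which the Objective-C runtime is initialized. After all dependent libraries have been initialized, the app executable itself is initialized; at that point the runtime sets up the class structures for all classes in the project and then calls all +load methods. Finally dyld returns the address of main, main is called, and we arrive at the familiar program entry point.

On the device, the /usr/lib directory contains a Mach-O file named dyld; this is the program that loads dynamic libraries, as shown below: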

Snip20210115_4.png

Dyld source code analysis

Set a breakpoint in a class's +load method in the project:

Snip20210115_7.png

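Running the project shows that +load is called before the main entry function. To see why, we walk through the app launch and loading flow in detail below.

Run the project; when the breakpoint is hit, the disassembly looks like this: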

Snip20210115_8.png

Entering the bt command in the console prints the call stack:

Snip20210302_116.png

Open the dyld source project; the assembly file dyldStartup.s contains:

#if __arm64__ && !TARGET_OS_SIMULATOR
    .text
    .align 2
    .globl __dyld_start
__dyld_start:
    mov     x28, sp
    and     sp, x28, #~15       // force 16-byte alignment of stack
    mov     x0, #0
    mov     x1, #0
    stp     x1, x0, [sp, #-16]! // make aligned terminating frame
    mov     fp, sp              // set up fp to point to terminating frame
    sub     sp, sp, #16         // make room for local variables
#if __LP64__
    ldr     x0, [x28]           // get app's mh into x0
    ldr     x1, [x28, #8]       // get argc into x1
    add     x2, x28, #16        // get argv into x2
#else
    ldr     w0, [x28]           // get app's mh into x0
    ldr     w1, [x28, #4]       // get argc into x1
    add     w2, w28, #8         // get argv into x2
#endif
    adrp    x3,___dso_handle@page
    add     x3,x3,___dso_handle@pageoff // get dyld's mh into x3
    mov     x4,sp               // x4 has &startGlue
    // call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
    bl      __ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
    mov     x16,x0              // save entry point address in x16
#if __LP64__
    ldr     x1, [sp]
#else
    ldr     w1, [sp]
#endif
    cmp     x1, #0
    b.ne    Lnew
    // ... (excerpt; the rest of __dyld_start is omitted)

Searching the dyld source project for dyldbootstrap leads to dyldInitialization.cpp. Searching that file for start( finds the implementation of start:

uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
    // Emit kdebug tracepoint to indicate dyld bootstrap has started
    dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);
    // 1. Rebase dyld itself
    rebaseDyld(dyldsMachHeader);
    // kernel sets up env pointer to be just past end of argv array
    const char** envp = &argv[argc+1];
    // kernel sets up apple pointer to be just past end of envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;
    // 2. Set up stack-overflow protection (stack guard)
    __guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
    // 3. Run dyld's own initializers
    runDyldInitializers(argc, argv, envp, apple);
#endif
    _subsystem_init(apple);
    // 4. Enter dyld::_main(), the core of the launch
    uintptr_t appsSlide = appsMachHeader->getSlide();
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

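start mainly does the following:

Rebases dyld itself;

Sets up stack-overflow protection (the stack guard);

Runs dyld's own initializers;

Calls dyld::_main(). Its macho_header argument is the header of the Mach-O file, which shows that dyld works directly on Mach-O files.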
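The pointer arithmetic start() uses to locate envp and apple can be tried out on an ordinary array. The sketch below assumes the layout described by the comments in start(): argv, a NULL, the environment strings, a NULL, then the apple strings and a final NULL. find_envp, find_apple and lookup are hypothetical names for this example; lookup is only a simplified stand-in for dyld's _simple_getenv, not its real implementation.

```cpp
#include <cstddef>
#include <cstring>

// envp begins right after argv's terminating NULL, as in start().
const char** find_envp(int argc, const char** argv) {
    return &argv[argc + 1];
}

// apple[] begins right after envp's terminating NULL, as in start().
const char** find_apple(const char** envp) {
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }
    return ++apple;
}

// Simplified stand-in for _simple_getenv: scans NULL-terminated "key=value"
// strings and returns a pointer to the value, or nullptr if the key is absent.
const char* lookup(const char** vars, const char* key) {
    std::size_t keyLen = std::strlen(key);
    for (const char** p = vars; *p != nullptr; ++p) {
        if (std::strncmp(*p, key, keyLen) == 0 && (*p)[keyLen] == '=')
            return *p + keyLen + 1;
    }
    return nullptr;
}
```

With a block such as { "demo", "-v", NULL, "HOME=/var/root", NULL, "executable_path=/Demo.app/Demo", NULL } and argc = 2, find_envp points at the HOME entry and find_apple at the executable_path entry, which is the kind of key that dyld later reads from apple.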

Jumping to dyld::_main() takes us to dyld2.cpp. A simplified version of the source is shown below:

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
      int argc, const char* argv[], const char* envp[], const char* apple[],
      uintptr_t* startGlue)
{
    uint64_t launchTraceID = 0;
    if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
        launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
    }
    // Check and see if there are any kernel flags
    dyld3::BootArgs::setFlags(hexToUInt64(_simple_getenv(apple, "dyld_flags"), nullptr));
#if __has_feature(ptrauth_calls)
    if ( const char* disableStr = _simple_getenv(apple, "ptrauth_disabled") ) {
        if ( strcmp(disableStr, "1") == 0 )
            sKeysDisabled = true;
    }
    else {
        // needed until kernel passes ptrauth_disabled for arm64 main executables
        if ( (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_V8) || (mainExecutableMH->cpusubtype == CPU_SUBTYPE_ARM64_ALL) )
            sKeysDisabled = true;
    }
#endif
    // Step 1: configure environment variables
    uint8_t mainExecutableCDHashBuffer[20];
    const uint8_t* mainExecutableCDHash = nullptr;
    if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash")) {
        unsigned bufferLenUsed;
        if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed))
            // record the main executable's cdhash
            mainExecutableCDHash = mainExecutableCDHashBuffer;
    }
    // get host information
    getHostInfo(mainExecutableMH, mainExecutableSlide);
#if !TARGET_OS_SIMULATOR
    // Trace dyld's load
    notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
    // Trace the main executable's load
    notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif
    uintptr_t result = 0;
    // save the main executable's macho_header
    sMainExecutableMachHeader = mainExecutableMH;
    // save the main executable's slide
    sMainExecutableSlide = mainExecutableSlide;
    CRSetCrashLogMessage("dyld: launch started");
    // set up the context
    setContext(mainExecutableMH, argc, argv, envp, apple);
    // get the main executable's path
    sExecPath = _simple_getenv(apple, "executable_path");
    // process name (last path component)
    sExecShortName = ::strrchr(sExecPath, '/');
    if ( sExecShortName != NULL )
        ++sExecShortName;
    else
        sExecShortName = sExecPath;
    // configure process restrictions
    configureProcessRestrictions(mainExecutableMH, envp);
#if TARGET_OS_OSX
    if ( !gLinkContext.allowEnvVarsPrint && !gLinkContext.allowEnvVarsPath && !gLinkContext.allowEnvVarsSharedCache ) {
        pruneEnvironmentVariables(envp, &apple);
        setContext(mainExecutableMH, argc, argv, envp, apple);
    }
    else
#endif
    {
        // check environment variables
        checkEnvironmentVariables(envp);
        defaultUninitializedFallbackPaths(envp);
    }
    // Step 2: load the shared cache
    // check whether the shared region is disabled; on iOS it must be enabled
    checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
    if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
        if ( sSharedCacheOverrideDir)
            mapSharedCache(mainExecutableSlide);
#else
        // map the shared cache
        mapSharedCache(mainExecutableSlide);
#endif
    }
    try {
        // Step 3: instantiate the main executable
        sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
        gLinkContext.mainExecutable = sMainExecutable;
        gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
        // load any inserted libraries
        // Step 4: load inserted dynamic libraries
        if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
            for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
                loadInsertedDylib(*lib);
        }
        // record the number of inserted libraries
        sInsertedDylibCount = sAllImages.size()-1;
        // link main executable
        gLinkContext.linkingMainExecutable = true;
        // Step 5: link the main executable
        link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        sMainExecutable->setNeverUnloadRecursive();
        if ( sMainExecutable->forceFlat() ) {
            gLinkContext.bindFlat = true;
            gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
        }
        // Step 6: link inserted dynamic libraries
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
                image->setNeverUnloadRecursive();
            }
            if ( gLinkContext.allowInterposing ) {
                for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                    ImageLoader* image = sAllImages[i+1];
                    image->registerInterposing(gLinkContext);
                }
            }
        }
        // Step 7: weak-bind the main executable
        sMainExecutable->weakBind(gLinkContext);
        gLinkContext.linkingMainExecutable = false;
        sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);
        CRSetCrashLogMessage("dyld: launch, running initializers");
#if SUPPORT_OLD_CRT_INITIALIZATION
        // Old way is to run initializers via a callback from crt1.o
        if ( ! gRunInitializersOldWay )
            initializeMainExecutable();
#else
        // Step 8: run the initializers
        initializeMainExecutable();
#endif
        // Step 9: find the main executable's entry point (main) and return it
        notifyMonitoringDyldMain();
        // (abridged) the full source also handles LC_UNIXTHREAD and sets *startGlue
        result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
    }
    catch(const char* message) {
        syncAllImages();
        halt(message);
    }
    catch(...) {
        dyld::log("dyld: launch failed\n");
    }
    CRSetCrashLogMessage("dyld2 mode");
#if !TARGET_OS_SIMULATOR
    if (sLogClosureFailure) {
        // We failed to launch in dyld3, but dyld2 can handle it. synthesize a crash report for analytics
        dyld3::syntheticBacktrace("Could not generate launchClosure, falling back to dyld2", true);
    }
#endif
    if (sSkipMain) {
        notifyMonitoringDyldMain();
        if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
            dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
        }
        ARIADNEDBG_CODE(220, 1);
        result = (uintptr_t)&fake_main;
        *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
    }
    return result;
}

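The flow can be summarized in the following nine steps:

[Step 1: Configure environment variables] Check the environment variables that have been set and obtain host information for the running executable;

Snip20210302_126.png

[Step 2: Load the shared cache]

Snip20210302_117.png

[Step 3: Instantiate the main executable] Call instantiateFromLoadedImage, which returns an ImageLoaderMachO object;

Snip20210302_118.png

[Step 4: Load inserted dynamic libraries] Iterate over the DYLD_INSERT_LIBRARIES environment variable and call loadInsertedDylib to load each inserted library; sInsertedDylibCount records how many libraries were inserted.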

Snip20210302_120.png

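[Step 5: Link the main executable] The main executable, sMainExecutable, is an ImageLoader. By this point it is already the first entry in sAllImages (which is why sInsertedDylibCount is computed as sAllImages.size()-1); link() then loads and links the libraries it depends on;

Snip20210302_121.png

[Step 6: Link inserted dynamic libraries] The inserted libraries' ImageLoader objects follow the main executable in sAllImages (index i+1); each one is linked in turn, and interposing is registered if it is allowed;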

Snip20210302_122.png

[Step 7: Weak-bind the main executable]

Snip20210302_123.png

[Step 8: Run all initializers]

Snip20210302_124.png

[Step 9: Find the main executable's entry point, i.e. the main function]

Snip20210302_125.png
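The ordering in steps 3 to 6, and in particular why sInsertedDylibCount is sAllImages.size()-1 and the inserted libraries are read from index i+1, can be sketched with a small simulation. Everything here is hypothetical and heavily simplified: Image, LaunchResult, split_paths and simulate_launch are illustration-only names, "linking" is reduced to setting a flag, and dependent libraries are ignored. The only outside fact assumed is that DYLD_INSERT_LIBRARIES is a colon-separated list of paths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for dyld's ImageLoader.
struct Image {
    std::string path;
    bool linked;
};

struct LaunchResult {
    std::vector<Image> allImages;        // plays the role of sAllImages
    std::size_t insertedCount = 0;       // plays the role of sInsertedDylibCount
    std::vector<std::string> linkOrder;  // order in which images were "linked"
};

// Splits a colon-separated path list, skipping empty entries.
std::vector<std::string> split_paths(const std::string& value) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(':', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) out.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

LaunchResult simulate_launch(const std::string& execPath, const std::string& insertLibraries) {
    LaunchResult r;
    // Step 3: the main executable is instantiated and becomes allImages[0].
    r.allImages.push_back(Image{execPath, false});
    // Step 4: inserted libraries are loaded right after it.
    for (const std::string& lib : split_paths(insertLibraries))
        r.allImages.push_back(Image{lib, false});
    r.insertedCount = r.allImages.size() - 1;
    // Step 5: link the main executable first.
    r.allImages[0].linked = true;
    r.linkOrder.push_back(r.allImages[0].path);
    // Step 6: link each inserted library, found at index i + 1.
    for (std::size_t i = 0; i < r.insertedCount; ++i) {
        Image& image = r.allImages[i + 1];
        image.linked = true;
        r.linkOrder.push_back(image.path);
    }
    return r;
}
```

Calling simulate_launch("/Demo.app/Demo", "/tmp/libA.dylib:/tmp/libB.dylib") yields an insertedCount of 2 and a link order that starts with the main executable, matching the order of the steps above.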

We now focus on step 8, running all initializers, which calls initializeMainExecutable():

Snip20210302_127.png

Internally it calls runInitializers(), which is implemented as follows:

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}

The core step is processInitializers(), implemented as follows:

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                      InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}

It loops over the ImageLoader objects and calls recursiveInitialization on each one; the implementation is:

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }
            // record termination order
            if ( this->needsTermination() )
                context.terminationRecorder(this);
            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            // core step
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            // initialize this image
            bool hasInitializers = this->doInitialization(context);
            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            if ( hasInitializers ) {
                uint64_t t2 = mach_absolute_time();
                timingInfo.addTime(this->getShortName(), t2-t1);
            }
        }
        catch (const char* msg) {
            // this image is not initialized
            fState = oldState;
            recursiveSpinUnLock();
            throw;
        }
    }
    recursiveSpinUnLock();
}

There are two core steps:

notifySingle(): note that it is called twice, once before doInitialization() and once after it.

doInitialization()
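The "initialize lower level libraries first" behaviour of recursiveInitialization is essentially a depth-first walk of the dependency graph. The following is a minimal sketch of that idea only: Node, recursive_init and init_order are hypothetical names, a visited flag stands in for the "break cycles" state change, and upward dependencies, locking, and the fDepth check are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of an image; deps holds indices into the graph.
struct Node {
    std::string name;
    std::vector<int> deps;
    bool visited;
};

// Depth-first: initialize every dependency first, then the image itself.
// Marking the node before recursing is what keeps a dependency cycle from
// recursing forever.
void recursive_init(std::vector<Node>& graph, int index, std::vector<std::string>& order) {
    Node& node = graph[index];
    if (node.visited) return;
    node.visited = true;
    for (int dep : node.deps)
        recursive_init(graph, dep, order);
    // Stands in for notifySingle() + doInitialization() on this image.
    order.push_back(node.name);
}

// Returns the order in which images reachable from `root` get initialized.
std::vector<std::string> init_order(std::vector<Node> graph, int root) {
    std::vector<std::string> order;
    recursive_init(graph, root, order);
    return order;
}
```

For a made-up graph in which App depends on Foundation and libobjc, Foundation on libobjc and libSystem, and libobjc on libSystem, the result is libSystem, libobjc, Foundation, App: the root image is always initialized last.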

The core logic of notifySingle() is as follows:

Snip20210304_16.png

Key code: (*sNotifyObjCInit)(image->getRealPath(), image->machHeader())

The first time notifySingle() is called, sNotifyObjCInit is still NULL, because it is only assigned in registerObjCNotifiers, which is reached through doInitialization() --> _objc_init() --> _dyld_objc_notify_register --> registerObjCNotifiers.

A global search for sNotifyObjCInit finds:

Snip20210302_129.png

It is only ever assigned there. A global search for registerObjCNotifiers then leads to:

void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
                                _dyld_objc_notify_init init,
                                _dyld_objc_notify_unmapped unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

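This brings us to _dyld_objc_notify_register(). The function is not called from within dyld but from objc, so it is a point of communication between two low-level libraries.

objc calls _dyld_objc_notify_register() to register three callbacks, which dyld invokes when an image is mapped into memory, when an image is initialized, and when an image is unmapped, respectively.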

In the objc4-781 source project, a global search for _dyld_objc_notify_register( finds:

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

So objc calls _dyld_objc_notify_register in _objc_init, and the three arguments are the three callbacks; dyld stores them and calls them back at the appropriate times.
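The register-then-call-back relationship between the two libraries can be sketched in a few lines. This is a hypothetical miniature, not dyld's implementation: Notifier, ImageCallback and run_demo are illustration-only names, images are plain strings, and the sketch leaves out whatever else dyld does at registration time. It also shows why a notification sent before registration has no effect, as with the first notifySingle() call.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ImageCallback = std::function<void(const std::string&)>;

// Plays the role of the dyld side: it stores the callbacks and fires them later.
struct Notifier {
    ImageCallback mapped;
    ImageCallback init;
    ImageCallback unmapped;

    // Stand-in for _dyld_objc_notify_register / registerObjCNotifiers.
    void register_callbacks(ImageCallback m, ImageCallback i, ImageCallback u) {
        mapped = std::move(m);
        init = std::move(i);
        unmapped = std::move(u);
    }

    // Each notification is silently skipped while no callback is registered.
    void notify_mapped(const std::string& image) const { if (mapped) mapped(image); }
    void notify_init(const std::string& image) const { if (init) init(image); }
    void notify_unmapped(const std::string& image) const { if (unmapped) unmapped(image); }
};

// Plays the role of the objc side and returns what its callbacks observed.
std::vector<std::string> run_demo() {
    std::vector<std::string> log;
    Notifier dyldSide;
    dyldSide.notify_init("libSystem.B.dylib");  // nothing registered yet: ignored
    dyldSide.register_callbacks(
        [&log](const std::string& image) { log.push_back("map_images " + image); },
        [&log](const std::string& image) { log.push_back("load_images " + image); },
        [&log](const std::string& image) { log.push_back("unmap_image " + image); });
    dyldSide.notify_mapped("Demo");
    dyldSide.notify_init("Demo");
    dyldSide.notify_unmapped("Plugin");
    return log;
}
```

The design choice worth noting is the direction of the dependency: the lower-level library exposes only a registration function and never needs to know anything about the higher-level one.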

load_images in objc is implemented as follows:

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
        didInitialAttachCategories = true;
        loadAllCategories();
    }
    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;
    recursive_mutex_locker_t lock(loadMethodLock);
    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }
    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

The core logic is in call_load_methods(), implemented as follows:

Snip20210302_130.png

call_class_loads is implemented as follows:

Snip20210302_131.png

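At this point we can see why a class's +load method is called before the main entry function.

It also confirms the call stack printed in the console earlier.

Summary: the call stack for a class's +load method: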

_dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> dyld::notifySingle --> sNotifyObjCInit --> load_images(libobjc.A.dylib)

Add a symbolic breakpoint on _objc_init in Xcode:

Snip20210118_11.png

When execution stops at the _objc_init symbolic breakpoint, enter bt in the console to print the call stack:

Snip20210302_1.png

The stack shows that the path goes through doInitialization() inside recursiveInitialization(), which is the second of the two core steps described above. doInitialization() is implemented as follows:

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());
    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context);
    CRSetCrashLogMessage2(NULL);
    return (fHasDashInit || fHasInitializers);
}

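It also contains two core steps:

doImageInit(): image initialization (runs the image's -init routine)

doModInitFunctions(): runs the image's static initializers

The implementations of doImageInit() and doModInitFunctions() are shown below: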

Snip20210304_7.png

Snip20210304_8.png
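The "static initializers" that doModInitFunctions() runs include C++ global constructors and functions marked __attribute__((constructor)). The sketch below is a portable way to observe that such code runs before main; it assumes GCC or Clang for the attribute, and event_log, before_main, Tracer and ran_before_main are hypothetical names for this example. It illustrates the general mechanism only and does not involve Objective-C's +load, which takes the load_images route described earlier.

```cpp
#include <string>
#include <vector>

// Shared log. A function-local static is used so the log is guaranteed to be
// constructed before anything writes to it.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Runs while the image is being initialized, before main() is entered.
__attribute__((constructor))
static void before_main() {
    event_log().push_back("constructor");
}

// The constructor of a global C++ object is run in the same phase.
struct Tracer {
    Tracer() { event_log().push_back("static initializer"); }
};
static Tracer g_tracer;

// Returns true if `what` was logged, that is, if it ran before the caller.
bool ran_before_main(const std::string& what) {
    for (const std::string& entry : event_log())
        if (entry == what) return true;
    return false;
}
```

Calling ran_before_main("constructor") or ran_before_main("static initializer") as the first statement of main returns true in both cases. The relative order of the two entries is deliberately not relied upon here.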

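Neither of these two implementations contains a call to _objc_init();

A comment there says that libSystem must be initialized first, so a reasonable guess is that _objc_init() is called during libSystem's initialization;

Download the Libsystem-1292.60.1 source project and search it for libSystem_initializer; its implementation is shown below: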

Snip20210304_10.png

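It calls dyld's initializer, _dyld_initializer(), followed by libdispatch_init(). From the _objc_init() call stack we know that libdispatch_init() lives in the open-source libdispatch library. Download libdispatch-1173.0.3, open the project, and search for libdispatch_init: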

Snip20210304_12.png

A global search for _os_object_init( finds its implementation:

Snip20210304_13.png

Here we finally see the call to _objc_init().

Summary: the call stack for _objc_init:

_dyld_start --> dyldbootstrap::start --> dyld::_main --> dyld::initializeMainExecutable --> ImageLoader::runInitializers --> ImageLoader::processInitializers --> ImageLoader::recursiveInitialization --> doInitialization --> doModInitFunctions --> libSystem_initializer(libSystem.B.dylib) --> libdispatch_init(libdispatch.dylib) --> _os_object_init(libdispatch.dylib) --> _objc_init(libobjc.A.dylib)