Spark Source Code Walkthrough

I. Where the program starts

spark-submit --class com.sjh.example.SparkPi --master local[*] …/demo.jar

  • On Windows this invokes spark-submit.cmd
  • On Linux it invokes the spark-submit shell script
    The scripts call each other in the following order:

1. spark-submit

2. spark-submit2.cmd

spark-class2.cmd org.apache.spark.deploy.SparkSubmit /* arguments */

3. spark-class2.cmd

%SPARK_CMD%

What is SPARK_CMD?

java org.apache.spark.deploy.SparkSubmit (simplified)

So the scripts ultimately start a Java process that runs the methods of SparkSubmit.

II. What does SparkSubmit do?

1. The main method of the SparkSubmit object (line 985)

val submit = new SparkSubmit()... submit.doSubmit(args) // perform the submission

2. The doSubmit method

1) Parse the arguments: parseArguments(args) // new SparkSubmitArguments(args)

1. Initializing this class populates a large number of fields.

2. Line 108: parse(args.asJava)

A regular expression splits each argument into a name and a value,

and handle() pattern-matches on the name to assign the matching field (a minimal sketch of this pattern follows this list).

2) appArgs.action match … // match on the action; it defaults to SUBMIT (set in loadEnvironmentArguments)
3) submit(appArgs, uninitLog), line 90
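For intuition, here is a minimal, self-contained sketch of the "a regex splits the option name, pattern matching assigns the field" idea described above. MiniArgParser is purely illustrative and is not Spark's actual parser (that is SparkSubmitOptionParser plus handle()):

object MiniArgParser {
  private val OptName = "--([A-Za-z-]+)".r

  def main(args: Array[String]): Unit = {
    var master: String = null
    var mainClass: String = null
    // consume "--name value" pairs, the way handle() fills fields of SparkSubmitArguments
    args.grouped(2).foreach {
      case Array(OptName(name), value) => name match {
        case "master" => master = value
        case "class"  => mainClass = value
        case other    => println(s"ignoring unknown option --$other")
      }
      case other => println(s"unparsed token(s): ${other.mkString(" ")}")
    }
    println(s"master=$master, mainClass=$mainClass")
  }
}

Running it with --class com.sjh.example.SparkPi --master local[*] prints both fields.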
3. The submit method

1) Check whether this is a standalone-cluster submission

2) If not => doRunMain => runMain

4. The runMain method

1) prepareSubmitEnvironment(args), line 871 (important)

Around line 714, in yarn-cluster mode, childMainClass is set to org.apache.spark.deploy.yarn.YarnClusterApplication

2) Obtain the class via the ClassLoader

3) If the class extends SparkApplication, cast it to that type; otherwise wrap it with new JavaMainApplication(mainClass)

4) app.start, line 928 (a standalone illustration of this dispatch follows below)
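The check in steps 3) and 4) boils down to the reflective dispatch sketched below. This is a standalone illustration with hypothetical types (AppEntry, TraitApp and Launcher stand in for Spark's private SparkApplication / JavaMainApplication), not Spark's actual code:

trait AppEntry { def start(args: Array[String]): Unit }

class TraitApp extends AppEntry {
  override def start(args: Array[String]): Unit = println(s"trait-based start: ${args.mkString(",")}")
}

object Launcher {
  def launch(className: String, args: Array[String]): Unit = {
    val clazz = Class.forName(className)
    if (classOf[AppEntry].isAssignableFrom(clazz)) {
      // the class implements the entry-point trait: instantiate it and call start()
      clazz.getConstructor().newInstance().asInstanceOf[AppEntry].start(args)
    } else {
      // otherwise invoke its static main() reflectively, as JavaMainApplication does
      clazz.getMethod("main", classOf[Array[String]]).invoke(null, args)
    }
  }

  def main(cmdLine: Array[String]): Unit = launch("TraitApp", Array("demo"))
}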

5. The YarnClusterApplication class

Note: the spark-yarn_2.12 dependency must be added for this class to be found.

new Client(new ClientArguments(args), conf, null).run()
1) ClientArguments parses the arguments
2) Client
An important field: private val yarnClient = YarnClient.createYarnClient => new YarnClientImpl()

*YarnClient

protected ApplicationClientProtocol rmClient;

3) The run method

this.appId = submitApplication() // the appId is globally unique within YARN; the application can later be looked up by this id

6. The submitApplication method
  1. yarnClient.start() // start the YarnClient and establish the connection to the ResourceManager
  2. val newApp = yarnClient.createApplication() // create an application through the YarnClient
  3. val newAppResponse = newApp.getNewApplicationResponse() // carries the appId
  4. val containerContext = createContainerLaunchContext(newAppResponse) // build the container launch context
  5. val appContext = createApplicationSubmissionContext(newApp, containerContext) // build the submission context
  6. yarnClient.submitApplication(appContext) // submit the application (a hedged end-to-end sketch follows this list)
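For orientation, the same sequence written directly against the Hadoop YarnClient API looks roughly like the hedged sketch below (SketchSubmit and the echo command are made up; Spark's Client additionally uploads resources and builds the real AM command shown further down):

import org.apache.hadoop.yarn.api.records.{ContainerLaunchContext, Resource}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.Records

import scala.collection.JavaConverters._

object SketchSubmit {
  def main(args: Array[String]): Unit = {
    val conf = new YarnConfiguration()
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(conf)
    yarnClient.start()                                         // connect to the ResourceManager

    val newApp = yarnClient.createApplication()                // ask the RM for a new application
    val appId  = newApp.getNewApplicationResponse.getApplicationId
    println(s"got application id: $appId")

    // container launch context: the command the NodeManager will run for the AM
    val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
    amContainer.setCommands(List("echo hello-from-am").asJava) // stand-in for the java ... ApplicationMaster command

    val appContext = newApp.getApplicationSubmissionContext    // submission context
    appContext.setApplicationName("sketch")
    appContext.setAMContainerSpec(amContainer)
    appContext.setResource(Resource.newInstance(1024, 1))      // 1 GB and 1 vcore for the AM

    yarnClient.submitApplication(appContext)                   // submit
  }
}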
1) The createContainerLaunchContext method
  1. Set up the configuration parameters

val amClass =  if (isClusterMode) {
    Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName  
} else {
    Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName  
}
  2. Assemble the launch command: java <amClass> … , put it into the container launch context and return it; the context is then handed to the ResourceManager together with the submission context.

Log of the assembled AM launch context:

===============================================================================
YARN AM launch context:
    user class: com.sjh.Test
    env:
        CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
        SPARK_YARN_STAGING_DIR -> file:/Users/jinghu/.sparkStaging/application_1629708346779_0043
        SPARK_USER -> jinghu
    resources:
        __app__.jar -> resource { scheme: "file" port: -1 file: "/Users/jinghu/.sparkStaging/application_1629708346779_0043/monthly-PV.jar" } size: 485814 timestamp: 1630206978724 type: FILE visibility: PRIVATE
        __spark_libs__ -> resource { scheme: "file" port: -1 file: "/Users/jinghu/.sparkStaging/application_1629708346779_0043/__spark_libs__3773172268469152497.zip" } size: 242063445 timestamp: 1630206978507 type: ARCHIVE visibility: PRIVATE
        __spark_conf__ -> resource { scheme: "file" port: -1 file: "/Users/jinghu/.sparkStaging/application_1629708346779_0043/__spark_conf__.zip" } size: 295698 timestamp: 1630206979149 type: ARCHIVE visibility: PRIVATE
    command:
       {{JAVA_HOME}}/bin/java -server -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.sjh.Test' --jar file:/Users/jinghu/Documents/project/IdeaProjects/Spark/sparkdemo/target/monthly-PV.jar --properties-file {{PWD}}/__spark_conf__/__spark_conf__.properties --dist-cache-conf {{PWD}}/__spark_conf__/__spark_dist_cache__.properties 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================

So the class that runs next is ApplicationMaster.

7. ApplicationMaster
  1) master = new ApplicationMaster(amArgs, sparkConf, yarnConf) => private val client = new YarnRMClient() => private var amClient: AMRMClient[ContainerRequest] = _ // the client the AM uses to talk to the ResourceManager

  2) run => line 264, runDriver()


8. The runDriver method

1) userClassThread = startUserApplication()

1. Load the class and get its main method: val mainMethod = userClassLoader.loadClass(args.userClass).getMethod("main", classOf[Array[String]]) // the class passed with --class

2. Run that main method: userThread.setName("Driver"); userThread.start() // the user code runs on a dedicated thread named "Driver" (see the sketch after this list)

2) val sc = ThreadUtils.awaitResult(sparkContextPromise.future, … // wait until the SparkContext has been created

3) Register the AM: registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId) // connect to YARN and apply for resources

4) line 512, createAllocator => line 479, allocator.allocateResources()

Get the list of allocated containers: allocateResponse.getAllocatedContainers()

Process the allocated containers: handleAllocatedContainers(allocatedContainers.asScala) // placement policy

Run the allocated containers: runAllocatedContainers(containersToUse)
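A self-contained approximation of startUserApplication (UserJob and DriverThreadSketch are hypothetical names; the real method also sets the class loader, reports exceptions, and completes the SparkContext promise awaited in step 2):

object UserJob {
  def main(args: Array[String]): Unit = println(s"user main ran with ${args.length} args")
}

object DriverThreadSketch {
  def startUserApplication(userClass: String, userArgs: Array[String]): Thread = {
    // load the --class by reflection and grab its main method
    val mainMethod = Class.forName(userClass).getMethod("main", classOf[Array[String]])
    val userThread = new Thread(() => mainMethod.invoke(null, userArgs))
    userThread.setName("Driver")  // the user code, i.e. the driver, runs on this thread
    userThread.start()
    userThread
  }

  def main(args: Array[String]): Unit =
    startUserApplication("UserJob", Array("a", "b")).join()
}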

9. Starting the containers

1) The runAllocatedContainers method

A thread pool is used to launch each container: run =>

def run(): Unit = {
    logDebug("Starting Executor Container")
    nmClient = NMClient.createNMClient()
    nmClient.init(conf)
    nmClient.start()
    startContainer()
  }
2) startContainer()

val commands = prepareCommand() // build the launch command

nmClient.startContainer(container.get, ctx) // use the NMClient to start the given container, passing the launch context (which carries the command)

3) prepareCommand()

java org.apache.spark.executor.YarnCoarseGrainedExecutorBackend // the executor's communication backend process

10. YarnCoarseGrainedExecutorBackend and CoarseGrainedExecutorBackend

1) The main method

line 81: run

2) The run method

SparkEnv.createExecutorEnv

env.rpcEnv.setupEndpoint("Executor", backendCreateFn(env.rpcEnv, arguments, env, cfg.resourceProfile)) // register the newly created backend as an RPC endpoint

3) setupEndpoint — step into the implementation class NettyRpcEnv

dispatcher.registerRpcEndpoint(name, endpoint) => new DedicatedMessageLoop(name, e, this) // an inbox plus a thread pool

Once the inbox is set up, onStart in CoarseGrainedExecutorBackend gets executed.

11. The closing handshake
  • Inbox, line 78: when the inbox is constructed it posts a message to itself — OnStart;
    when the endpoint receives it, its onStart method runs
  • onStart: sends a message to the driver to register an executor
  • In SparkContext: private var _schedulerBackend: SchedulerBackend = _
  • Look at the SchedulerBackend implementation CoarseGrainedSchedulerBackend:
    its receiveAndReply method performs the registration and replies success, line 227: context.reply(true)
  • On receiving the success reply, Spark 2.4 simply ignores it, while 3.0 has the backend send itself a "registered" message
  • receive method: executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    which starts the executor (a toy model of this exchange is sketched below)
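To make the message flow concrete, here is a self-contained toy model of the exchange above. It uses plain blocking queues instead of Spark's Dispatcher/Inbox, and the message names only mirror Spark's; the code is purely illustrative:

import java.util.concurrent.LinkedBlockingQueue

sealed trait Msg
case object OnStart extends Msg
case class RegisterExecutor(id: String) extends Msg
case object RegisteredExecutor extends Msg

object HandshakeSketch {
  val driverInbox   = new LinkedBlockingQueue[Msg]()
  val executorInbox = new LinkedBlockingQueue[Msg]()

  def main(args: Array[String]): Unit = {
    executorInbox.put(OnStart)                       // the Inbox posts OnStart to itself when the endpoint registers
    executorInbox.take() match {
      case OnStart => driverInbox.put(RegisterExecutor("exec-1")) // onStart registers the executor with the driver
      case _ =>
    }
    driverInbox.take() match {
      case RegisterExecutor(id) =>                   // CoarseGrainedSchedulerBackend.receiveAndReply
        println(s"driver registered $id and replies true")
        executorInbox.put(RegisteredExecutor)        // toy stand-in for: the backend receives "true" and sends itself RegisteredExecutor
      case _ =>
    }
    executorInbox.take() match {
      case RegisteredExecutor => println("executor = new Executor(...)") // receive creates the Executor
      case _ =>
    }
  }
}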

After that, the driver code continues to run inside the ApplicationMaster. (End of the startup flow.)

Additional notes:

  • SparkContext => _taskScheduler.postStartHook()
  • In the implementation YarnClusterScheduler, ApplicationMaster.sparkContextInitialized(sc) lets the AM proceed;
    postStartHook => waitBackendReady => a while loop that waits

Two threads of work: get the driver and the executors ready with resources,
then run the business logic.

The communication framework

Where does Spark's communication environment live?

In SparkContext:

private[spark] def env: SparkEnv = _env

The class initializer sets up env:

_env = createSparkEnv(_conf, isLocal, listenerBus) // =>createDriverEnv =>  create

create =>

val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port.getOrElse(-1), conf,

create =>

new NettyRpcEnvFactory().create(config)

create =>

val nettyEnv = new NettyRpcEnv(sparkConf, javaSerializerInstance, config.advertiseAddress,
  config.securityManager, config.numUsableCores)
Utils.startServiceOnPort(config.port, startNettyRpcEnv, sparkConf, config.name)._1

startServiceOnPort=>

val (service, port) = startService(tryPort)

Note that startService is a function passed in; going back one step, it is the startNettyRpcEnv argument:

val startNettyRpcEnv: Int => (NettyRpcEnv, Int) = { actualPort =>
  nettyEnv.startServer(config.bindAddress, actualPort)
  (nettyEnv, nettyEnv.address.port)  // return the env together with the actually bound port
}

startServer =>

server = transportContext.createServer(bindAddress, port, bootstraps)

createServer =>

return new TransportServer(this, host, port, rpcHandler, bootstraps);

The TransportServer constructor:

init(hostToBind, portToBind);

init =>

bootstrap = new ServerBootstrap()
  .group(bossGroup, workerGroup)
  .channel(NettyUtils.getServerChannelClass(ioMode))
  .option(ChannelOption.ALLOCATOR, pooledAllocator)
  .option(ChannelOption.SO_REUSEADDR, !SystemUtils.IS_OS_WINDOWS)
  .childOption(ChannelOption.ALLOCATOR, pooledAllocator);

getServerChannelClass =>

public static Class<? extends ServerChannel> getServerChannelClass(IOMode mode) {
  switch (mode) {
    case NIO:
      return NioServerSocketChannel.class;
    case EPOLL:
      return EpollServerSocketChannel.class;   // on Linux an epoll-based transport can be used
    default:
      throw new IllegalArgumentException("Unknown io mode: " + mode);
  }
}

Back to NettyRpcEnv.startServer:

def startServer(bindAddress: String, port: Int): Unit = {
    val bootstraps: java.util.List[TransportServerBootstrap] =
      if (securityManager.isAuthenticationEnabled()) {
        java.util.Arrays.asList(new AuthServerBootstrap(transportConf, securityManager))
      } else {
        java.util.Collections.emptyList()
      }
    server = transportContext.createServer(bindAddress, port, bootstraps)
    dispatcher.registerRpcEndpoint(
      RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
  }

=> registerRpcEndpoint => RpcEndpoint (the formal parameter) =>

receive* // used to receive messages

=> RpcEndpointRef // a "reference" to an endpoint; its ask* methods are used to send messages

Quick summary

  • Communication in Spark is owned by SparkContext, which holds an env object; that env is in fact backed by NettyRpcEnv
  • NettyRpcEnv holds a dispatcher, and our endpoints (RpcEndpoint) are registered with it
  • Each endpoint has an inbox, which is maintained inside the Dispatcher

Communication diagram

![Communication diagram](spark笔记.assets/1614071924204.png)

Execution of the application

  1. RDD dependencies
  2. Stage division
  3. Task division
  4. Task scheduling
  5. Task execution

![Application execution overview](spark笔记.assets/wps1.jpg)

SparkContext

Key components

  • SparkConf — configuration object: basic environment settings
  • SparkEnv — environment object: the communication environment
  • SchedulerBackend — communication backend: mainly used to talk to the executors
  • TaskScheduler — task scheduler: mainly responsible for scheduling tasks
  • DAGScheduler — stage scheduler: mainly responsible for dividing stages and tasks

Dependencies between RDDs

Take the map operator, for example:

def map[U: ClassTag](f: T => U): RDD[U] = withScope {
    val cleanF = sc.clean(f)
    new MapPartitionsRDD[U, T](this, (_, _, iter) => iter.map(cleanF)) // note how the operator "wraps" this RDD in a MapPartitionsRDD
  }

MapPartitionsRDD (constructor)

def this(@transient oneParent: RDD[_]) =
  this(oneParent.context, List(new OneToOneDependency(oneParent))) // a one-to-one dependency appears

Following this() further:

abstract class RDD[T: ClassTag](
    @transient private var _sc: SparkContext,
    @transient private var deps: Seq[Dependency[_]] // the dependencies are passed in here
  ) extends Serializable with Logging

protected def getDependencies: Seq[Dependency[_]] = deps // and can be read back later

The groupBy operator

groupBy => groupBy => groupByKey => combineByKeyWithClassTag => ShuffledRDD

class ShuffledRDD[K: ClassTag, V: ClassTag, C: ClassTag](
    @transient var prev: RDD[_ <: Product2[K, V]],
    part: Partitioner)
  extends RDD[(K, C)](prev.context, Nil) // note that Nil is passed for deps: Seq[Dependency[_]]

Does ShuffledRDD have no dependency, then? It does:

ShuffledRDD overrides getDependencies to always return a new ShuffleDependency, so there is no need to pass the dependency through the constructor:

List(new ShuffleDependency(prev, part, serializer, keyOrdering, aggregator, mapSideCombine)) // prev is the parent RDD passed in

So the dependency chain emerges (it can also be checked from user code, see the example below):

![RDD dependency chain](spark笔记.assets/1614075032549.png)
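Assuming a local Spark dependency on the classpath, dependencies shows a OneToOneDependency after map and a ShuffleDependency after groupBy:

import org.apache.spark.{SparkConf, SparkContext}

object DependencyDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("deps"))
    val base    = sc.parallelize(1 to 10)
    val mapped  = base.map(_ * 2)          // MapPartitionsRDD
    val grouped = mapped.groupBy(_ % 3)    // ShuffledRDD
    println(mapped.dependencies)           // List(OneToOneDependency) — narrow
    println(grouped.dependencies)          // List(ShuffleDependency)  — wide, from ShuffledRDD.getDependencies
    sc.stop()
  }
}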

Stage division

What happens when an action operator (collect) is called?

collect => sc.runJob => runJob => dagScheduler.runJob

=> runJob => submitJob

=> eventProcessLoop.post(JobSubmitted(…))

=> post => eventQueue.put(event) => onReceive(event)

=> in the implementation class DAGScheduler, doOnReceive => pattern-matches the message: JobSubmitted => handleJobSubmitted

Stage division happens inside handleJobSubmitted

=> createResultStage =>

1) getOrCreateParentStages

getShuffleDependencies(rdd).map { shuffleDep =>
      getOrCreateShuffleMapStage(shuffleDep, firstJobId)
    }.toList

1. => getShuffleDependencies =>

toVisit.dependencies.foreach {
          case shuffleDep: ShuffleDependency[_, _, _] =>
            parents += shuffleDep // collect the shuffle dependency
          case dependency =>
            waitingForVisit.prepend(dependency.rdd)
        } // check whether each dependency of the RDD is a shuffle dependency

2. => getOrCreateShuffleMapStage // a shuffle map stage => createShuffleMapStage => getOrCreateParentStages, new ShuffleMapStage

val stage = new ShuffleMapStage(id, rdd, numTasks, parents, jobId, rdd.creationSite, shuffleDep, mapOutputTracker)

2) Back in createResultStage, the final stage is then built:

val stage = new ResultStage(id, rdd, func, partitions, parents, jobId, callSite) // the current rdd is passed in here; it is the last RDD and carries the whole dependency chain

Number of stages in Spark = number of shuffle dependencies + 1
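A quick sanity check of that formula: one reduceByKey introduces one ShuffleDependency, so the collect job below runs as two stages (a ShuffleMapStage and a ResultStage); toDebugString shows the shuffle boundary through its indentation:

import org.apache.spark.{SparkConf, SparkContext}

object StageCountExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("stages"))
    val result = sc.parallelize(1 to 100)
      .map(x => (x % 10, x))
      .reduceByKey(_ + _)        // one ShuffleDependency => ShuffleMapStage + ResultStage
    println(result.toDebugString) // the indentation marks the shuffle boundary
    result.collect()
    sc.stop()
  }
}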

Task division

In DAGScheduler, handleJobSubmitted =>

val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
// ...

submitStage(finalStage) // submit the final stage, which transitively references all of its parent stages

=> submitStage =>

val missing = getMissingParentStages(stage).sortBy(_.id) // get the parent stages that still need to run
        logDebug("missing: " + missing)
        if (missing.isEmpty) {
          logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
          submitMissingTasks(stage, jobId.get) // submit the tasks of this stage
        } else {
          for (parent <- missing) {
            submitStage(parent) // submit the parent stages first
          }
          waitingStages += stage
        }

=> submitMissingTasks

val tasks: Seq[Task[_]] = try {
      val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
      stage match { // pattern match: which kind of stage is this?
        case stage: ShuffleMapStage =>
          stage.pendingPartitions.clear()
          partitionsToCompute.map { id =>
              // ...
            stage.pendingPartitions += id
            new ShuffleMapTask // ... one task object per partition to compute
          }

        case stage: ResultStage =>
          // ...
          }
      }
    }

=> What is partitionsToCompute? => val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()

=> Go to the implementation class ShuffleMapStage (how do we know? the pattern match above), findMissingPartitions

=> .getOrElse(0 until numPartitions) // by default, when nothing has been computed yet, this returns the range 0 until numPartitions; that range is partitionsToCompute, and each number in it becomes one task (see the example below)
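In other words, the number of tasks in a stage equals the number of partitions it has to compute, which is easy to confirm from user code:

import org.apache.spark.{SparkConf, SparkContext}

object TaskCountExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("tasks"))
    val pairs = sc.parallelize(1 to 100, 4).map(x => (x % 2, x))
    println(pairs.getNumPartitions)                    // 4 => 4 ShuffleMapTasks for the map stage
    println(pairs.reduceByKey(_ + _).getNumPartitions) // partitions of the result stage => that many ResultTasks
    sc.stop()
  }
}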

Task scheduling

In DAGScheduler, submitMissingTasks =>

Further down, if tasks is non-empty =>

taskScheduler.submitTasks(new TaskSet(
        tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))

=> submitTasks (abstract) — go to the implementation, TaskSchedulerImpl

1) => createTaskSetManager => new TaskSetManager, the manager of the task set

2) addTaskSetManager (abstract) — go to the implementation FIFOSchedulableBuilder (the builder initialized by default) => rootPool.addSchedulable(manager) // put it into the scheduling pool

3) backend.reviveOffers() // take it back out of the scheduling pool

=> Go to the implementation CoarseGrainedSchedulerBackend => driverEndpoint.send(ReviveOffers) => receive (in the same class) => makeOffers() =>

1. scheduler.resourceOffers(workOffers) => val sortedTaskSets = rootPool.getSortedTaskSetQueue.filterNot(_.isZombie) => getSortedTaskSetQueue => taskSetSchedulingAlgorithm // the scheduling algorithm (FIFO by default, FAIR optional)

2. // … a series of rules is applied, such as the locality preferences

3. launchTasks(taskDescs) // launch the tasks

=> launchTasks checks that the serialized task does not exceed the RPC size limit, then executorData.executorEndpoint.send(...) sends the launch-task message to the executor endpoint

Task execution

Picking up from scheduling: once the message is sent, the executor endpoint receives it.

Go to CoarseGrainedExecutorBackend, receive =>

=> pattern match on LaunchTask

=> if the task data is non-empty, decode the task and call launchTask

=>

def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
    val tr = new TaskRunner(context, taskDescription)
    runningTasks.put(taskDescription.taskId, tr)
    threadPool.execute(tr) // run it on the executor's thread pool
  }

=> new TaskRunner

=> run

The Shuffle mechanism

The shuffle flow

ShuffleMapTask runTask (why here? the map side of a shuffle runs inside ShuffleMapTask) =>

write =>

getWriter, write =>

implementation class SortShuffleWriter, write =>

mapOutputWriter.commitAllPartitions()

1) mapOutputWriter.commitAllPartitions() =>

destructiveSortedWritablePartitionedIterator

  1. commitAllPartitions =>

LocalDiskShuffleMapOutputWriter's commitAllPartitions =>

writeIndexFileAndCommit => produces the data file and the index file

ResultTask runTask =>

rdd.iterator(partition, context) =>

getOrCompute(split, context) =>

computeOrReadCheckpoint =>

compute (implementation class ShuffledRDD) =>

SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context, metrics)
  .read()
  .asInstanceOf[Iterator[(K, C)]]
The shuffle write flow
  • shuffleWriterProcessor — the write processor
  • shuffleManager — the shuffle manager: Hash (early versions) & Sort

ShuffleMapTask runTask =>

dep.shuffleWriterProcessor => ShuffleWriteProcessor, write =>

1. SparkEnv.get.shuffleManager => ShuffleManager

2. (implementation class) getWriter(dep.shuffleHandle, mapId, context, createMetricsReporter(context))

1) shuffleHandle => val shuffleHandle: ShuffleHandle = _rdd.context.env.shuffleManager.registerShuffle(shuffleId, this) =>

go to (SortShuffleManager) registerShuffle

2) getWriter => different handles yield different writer objects:

| Handle | Writer | Conditions |
| --- | --- | --- |
| SerializedShuffleHandle | UnsafeShuffleWriter | 1. the serializer supports relocation of serialized objects (Java serialization does not, Kryo does); 2. no map-side aggregation; 3. number of downstream partitions <= 16777216 |
| BypassMergeSortShuffleHandle | BypassMergeSortShuffleWriter | 1. no map-side aggregation; 2. number of downstream partitions <= 200 (configurable) |
| BaseShuffleHandle | SortShuffleWriter | all other cases |
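The conditions above map onto ordinary configuration. A hedged example (the values are only illustrative; the keys are standard Spark settings): Kryo supports relocation of serialized data, which the serialized (Unsafe) path requires, and spark.shuffle.sort.bypassMergeThreshold (default 200) caps the partition count for the bypass writer:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")   // allows the relocatable, serialized shuffle path
  .set("spark.shuffle.sort.bypassMergeThreshold", "400")                   // illustrative value; raises the bypass limit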

3. write

1) sorter = // … create the sorter

2) sorter.insertAll(records)

val shouldCombine = aggregator.isDefined // do we need map-side aggregation?

  1. Aggregation needed => map: new PartitionedAppendOnlyMap[K, C]
if (shouldCombine) {
  // Combine values in-memory first using our AppendOnlyMap
  val mergeValue = aggregator.get.mergeValue
  val createCombiner = aggregator.get.createCombiner
  var kv: Product2[K, V] = null
  val update = (hadValue: Boolean, oldValue: C) => {
    if (hadValue) mergeValue(oldValue, kv._2) else createCombiner(kv._2)
  }
  while (records.hasNext) {
    addElementsRead()
    kv = records.next()
    map.changeValue((getPartition(kv._1), kv._1), update) // update performs the map-side combine
    maybeSpillCollection(usingMap = true)                 // possibly spill to disk
  }
}
  2. No aggregation needed => buffer: new PartitionedPairBuffer[K, C]

4. maybeSpillCollection => maybeSpill =>

if (shouldSpill) { // if a spill is needed
    _spillCount += 1
    logSpillage(currentMemory)
    spill(collection)                // spill to disk
    _elementsRead = 0
    _memoryBytesSpilled += currentMemory
    releaseMemory()                  // release the execution memory held by the in-memory collection
}

=> spill => go to the implementation class ExternalSorter, spill

override protected[this] def spill(collection: WritablePartitionedPairCollection[K, C]): Unit = {
  val inMemoryIterator = collection.destructiveSortedWritablePartitionedIterator(comparator) // sort
  val spillFile = spillMemoryIteratorToDisk(inMemoryIterator)                                // write the in-memory data out to disk
  spills += spillFile
}

=> spillMemoryIteratorToDisk

val (blockId, file) = diskBlockManager.createTempShuffleBlock() // create a temporary file
// ...
val writer: DiskBlockObjectWriter =
      blockManager.getDiskWriter(blockId, file, serInstance, fileBufferSize, spillMetrics)
// ...

5. Back in SortShuffleWriter:

sorter.writePartitionedMapOutput(dep.shuffleId, mapId, mapOutputWriter) // …

val partitionLengths = mapOutputWriter.commitAllPartitions()

The shuffle read flow

ShuffledRDD compute =>

SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context, metrics)
  .read()
  .asInstanceOf[Iterator[(K, C)]]

getReader => SortShuffleManager getReader // omitted here

Spark memory management

Memory layout

![Memory layout](spark笔记.assets/1614840777402.png)

SparkEnv

val memoryManager: MemoryManager = UnifiedMemoryManager(conf, numUsableCores) // unified memory management

=> UnifiedMemoryManager.apply

val maxMemory = getMaxMemory(conf) // the maximum usable memory
new UnifiedMemoryManager(
    conf,
    maxHeapMemory = maxMemory,
    onHeapStorageRegionSize = (maxMemory * conf.get(config.MEMORY_STORAGE_FRACTION)).toLong,
    numCores = numCores)
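A back-of-the-envelope reading of the code above, assuming the Spark 3.x defaults spark.memory.fraction = 0.6 and spark.memory.storageFraction = 0.5 plus the fixed 300 MB of reserved memory (the 4 GB heap is just an example):

object MemorySketch {
  def main(args: Array[String]): Unit = {
    val systemMemory  = 4L * 1024 * 1024 * 1024                  // e.g. a 4 GB executor heap
    val reserved      = 300L * 1024 * 1024                       // RESERVED_SYSTEM_MEMORY_BYTES
    val unified       = ((systemMemory - reserved) * 0.6).toLong // getMaxMemory: execution + storage
    val storageRegion = (unified * 0.5).toLong                   // onHeapStorageRegionSize
    println(s"unified = ${unified >> 20} MB, storage region = ${storageRegion >> 20} MB")
    // => roughly 2277 MB unified, 1138 MB storage; the boundary is soft and the two regions can borrow from each other
  }
}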