Spark Source Code Walkthrough
I. The Starting Point of the Program
spark-submit --class com.sjh.example.SparkPi --master local[*] …/demo.jar
- On Windows, this invokes spark-submit.cmd
- On Linux, this invokes the spark-submit shell script
Script call chain:
1. spark-submit
2. spark-submit2.cmd
spark-class2.cmd org.apache.spark.deploy.SparkSubmit /* arguments */
3. spark-class2.cmd
%SPARK_CMD%
What is SPARK_CMD?
java org.apache.spark.deploy.SparkSubmit (simplified)
So the scripts ultimately launch a Java process that runs the methods of SparkSubmit.
II. What does SparkSubmit do?
1. The main method of the SparkSubmit object (around line 985)
val submit = new SparkSubmit()... submit.doSubmit(args) // perform the submission
2. The doSubmit method
1) Parse the arguments: parseArguments(args) // new SparkSubmitArguments(args)
1. The class is initialized; it holds a large number of fields.
2. Around line 108: parse(args.asJava)
Each argument is parsed with a regular expression into a name/value pair
handle then pattern-matches on the name and assigns the value to the corresponding field
2) appArgs.action match … // matches on the action; it defaults to SUBMIT (set in loadEnvironmentArguments)
3) submit(appArgs, uninitLog), around line 90
3. The submit method
1) Checks whether this is a standalone-cluster submission
2) If not => doRunMain => runMain
4. The runMain method
1) prepareSubmitEnvironment(args), around line 871 (important)
Around line 714, in yarn cluster mode the child main class is set to org.apache.spark.deploy.yarn.YarnClusterApplication
2) Obtains the class via the class loader
3) If the class extends SparkApplication it is instantiated as such; otherwise a new JavaMainApplication(mainClass) is created (see the sketch below)
4) app.start, around line 928
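Below is a minimal, self-contained sketch of this dispatch pattern. MyApplication and CustomApp are hypothetical stand-ins for SparkApplication and a user class; this is not Spark's code, only the same Class.isAssignableFrom check:

// Hypothetical analogue of runMain's dispatch; MyApplication stands in for SparkApplication.
trait MyApplication { def start(args: Array[String]): Unit }

class CustomApp extends MyApplication {
  override def start(args: Array[String]): Unit = println(s"custom app started with ${args.length} args")
}

object DispatchSketch {
  def main(args: Array[String]): Unit = {
    val mainClass: Class[_] = Class.forName("CustomApp")          // in Spark: the class named by --class, or YarnClusterApplication
    val app: MyApplication =
      if (classOf[MyApplication].isAssignableFrom(mainClass)) {   // does it extend the application trait?
        mainClass.getConstructor().newInstance().asInstanceOf[MyApplication]
      } else {
        // in Spark this branch wraps the class in JavaMainApplication, which reflectively calls its static main()
        new MyApplication { override def start(args: Array[String]): Unit = println("would reflectively invoke main() here") }
      }
    app.start(args)                                               // corresponds to app.start around line 928
  }
}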
5. The YarnClusterApplication class
Note: the spark-yarn_2.12 dependency must be added for this class to be found
new Client(new ClientArguments(args), conf, null).run()
1) ClientArguments parses the arguments
2) Client
An important field: private val yarnClient = YarnClient.createYarnClient => new YarnClientImpl()
*YarnClient
protected ApplicationClientProtocol rmClient;
3) The run method
this.appId = submitApplication() // this appId is globally unique within YARN; the application can be looked up by it later
6. The submitApplication method
- yarnClient.start() // start the yarnClient and establish the connection to the ResourceManager
- val newApp = yarnClient.createApplication() // create an application through the yarnClient
- val newAppResponse = newApp.getNewApplicationResponse() // returns the appId
- val containerContext = createContainerLaunchContext(newAppResponse) // build the container launch context
- val appContext = createApplicationSubmissionContext(newApp, containerContext) // build the submission context
- yarnClient.submitApplication(appContext) // submit the application
1) The createContainerLaunchContext method
- Sets up the parameters
val amClass = if (isClusterMode) {
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else {
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
- Wraps the launch command: java amClass … is put into the container launch context, which is returned and sent to the ResourceManager together with the submission context
Log output once the container context has been assembled:
===============================================================================
YARN AM launch context:
user class: com.sjh.Test
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> file:/Users/jinghu/.sparkStaging/application_1629708346779_0043
SPARK_USER -> jinghu
resources:
__app__.jar -> resource { scheme: "file" port: -1 file: "/Users/jinghu/.sparkStaging/application_1629708346779_0043/monthly-PV.jar" } size: 485814 timestamp: 1630206978724 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "file" port: -1 file: "/Users/jinghu/.sparkStaging/application_1629708346779_0043/__spark_libs__3773172268469152497.zip" } size: 242063445 timestamp: 1630206978507 type: ARCHIVE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "file" port: -1 file: "/Users/jinghu/.sparkStaging/application_1629708346779_0043/__spark_conf__.zip" } size: 295698 timestamp: 1630206979149 type: ARCHIVE visibility: PRIVATE
command:
{{JAVA_HOME}}/bin/java -server -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.sjh.Test' --jar file:/Users/jinghu/Documents/project/IdeaProjects/Spark/sparkdemo/target/monthly-PV.jar --properties-file {{PWD}}/__spark_conf__/__spark_conf__.properties --dist-cache-conf {{PWD}}/__spark_conf__/__spark_dist_cache__.properties 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
So the next thing to run is the ApplicationMaster class.
7. ApplicationMaster
- master = new ApplicationMaster(amArgs, sparkConf, yarnConf) => private val client = new YarnRMClient() => private var amClient: AMRMClient[ContainerRequest] = _ // the client the AM uses to talk to the ResourceManager
2) run => around line 264, runDriver()
8. The runDriver method
1) userClassThread = startUserApplication()
1. Load the class: val mainMethod = userClassLoader.loadClass(args.userClass) // this is the class passed via --class
2. Run that class's main method: userThread.setName("Driver"); userThread.start() // this runs in a separate thread named "Driver"
2) val sc = ThreadUtils.awaitResult(sparkContextPromise.future, … // wait for the SparkContext to be ready
3) Register the AM: registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId) // connect to YARN and request resources
4) Around line 512: createAllocator => around line 479: allocator.allocateResources()
Get the list of allocated containers: allocateResponse.getAllocatedContainers()
Process the allocated containers: handleAllocatedContainers(allocatedContainers.asScala) // placement strategy
Run the allocated containers: runAllocatedContainers(containersToUse)
9. Starting the containers
1) The runAllocatedContainers method
A thread pool is used to launch each container; run =>
def run(): Unit = {
logDebug("Starting Executor Container")
nmClient = NMClient.createNMClient()
nmClient.init(conf)
nmClient.start()
startContainer()
}
2) startContainer()
val commands = prepareCommand() // build the launch command
nmClient.startContainer(container.get, ctx) // use the nmClient (NodeManager client) to start the container, passing the context (which includes the command)
3) prepareCommand()
java org.apache.spark.executor.YarnCoarseGrainedExecutorBackend // the executor's communication backend process
10. YarnCoarseGrainedExecutorBackend and CoarseGrainedExecutorBackend
1) The main method
Around line 81: run
2) The run method
SparkEnv.createExecutorEnv
env.rpcEnv.setupEndpoint("Executor", backendCreateFn(env.rpcEnv, arguments, env, cfg.resourceProfile)) // register the created object as an RPC endpoint
3) setupEndpoint; note that the implementation to follow is NettyRpcEnv
dispatcher.registerRpcEndpoint(name, endpoint) => new DedicatedMessageLoop(name, e, this) // inbox + thread pool
Once the inbox is set up, onStart in CoarseGrainedExecutorBackend gets executed
11. Final round of communication
- Inbox, around line 78: runs when the class is initialized and sends itself an OnStart message (see the sketch after this list)
When the endpoint receives that message it runs onStart; onStart sends a message to the driver to register an executor
- In SparkContext: private var _schedulerBackend: SchedulerBackend = _
- Look at the SchedulerBackend implementation CoarseGrainedSchedulerBackend
The receiveAndReply method handles the registration and replies with success, around line 227: context.reply(true). On receiving the success reply, version 2.4 simply ignores it, whereas 3.0 sends itself a success message
- The receive method: executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
which starts the executor
Execution of the driver code then continues in the ApplicationMaster (end of this part)
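A tiny self-contained sketch of that inbox idea (ToyInbox is hypothetical, not Spark's Inbox; the point is that the box enqueues OnStart to itself on construction, so onStart handling always runs first):

import java.util.concurrent.LinkedBlockingQueue

sealed trait Message
case object OnStart extends Message
final case class Text(body: String) extends Message

// Simplified inbox: on construction it posts OnStart to itself, mirroring what Inbox does around line 78.
class ToyInbox(name: String) {
  private val messages = new LinkedBlockingQueue[Message]()
  messages.put(OnStart)                                 // first message every endpoint processes

  def post(m: Message): Unit = messages.put(m)

  def process(): Unit = messages.take() match {
    case OnStart    => println(s"$name: onStart -> would register the executor with the driver here")
    case Text(body) => println(s"$name: got '$body'")
  }
}

object ToyInboxDemo extends App {
  val inbox = new ToyInbox("Executor")
  inbox.post(Text("RegisteredExecutor"))
  inbox.process()   // handles OnStart first
  inbox.process()   // then the registration reply
}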
Supplementary notes:
- SparkContext => _taskScheduler.postStartHook()
- In the implementation YarnClusterScheduler, ApplicationMaster.sparkContextInitialized(sc) lets the program continue
postStartHook => waitBackendReady => a while loop that waits
Two threads of work: get the driver and executors ready with resources,
then run the business logic
The communication framework
Where does Spark's communication environment live?
In SparkContext:
private[spark] def env: SparkEnv = _env
The class body initializes env:
_env = createSparkEnv(_conf, isLocal, listenerBus) // => createDriverEnv => create
create =>
val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port.getOrElse(-1), conf,
create =>
new NettyRpcEnvFactory().create(config)
create =>
val nettyEnv = new NettyRpcEnv(sparkConf, javaSerializerInstance, config.advertiseAddress, config.securityManager, config.numUsableCores)
Utils.startServiceOnPort(config.port, startNettyRpcEnv, sparkConf, config.name)._1
startServiceOnPort=>
val (service, port) = startService(tryPort)
Focus on startService: this function is passed in as a parameter, so go back one step to the startNettyRpcEnv argument
val startNettyRpcEnv: Int => (NettyRpcEnv, Int) = { actualPort =>
nettyEnv.startServer(config.bindAddress, actualPort)
startServer =>
server = transportContext.createServer(bindAddress, port, bootstraps)
createServer =>
return new TransportServer(this, host, port, rpcHandler, bootstraps);
The TransportServer class body:
init(hostToBind, portToBind);
init =>
bootstrap = new ServerBootstrap()
  .group(bossGroup, workerGroup)
  .channel(NettyUtils.getServerChannelClass(ioMode))
  .option(ChannelOption.ALLOCATOR, pooledAllocator)
  .option(ChannelOption.SO_REUSEADDR, !SystemUtils.IS_OS_WINDOWS)
  .childOption(ChannelOption.ALLOCATOR, pooledAllocator);
getServerChannelClass =>
public static Class<? extends ServerChannel> getServerChannelClass(IOMode mode) {
switch (mode) {
case NIO: return NioServerSocketChannel.class;
case EPOLL: return EpollServerSocketChannel.class; // on Linux, the epoll-based transport (still event-driven I/O, not true AIO)
default: throw new IllegalArgumentException("Unknown io mode: " + mode); }}
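As an aside, here is a minimal standalone Netty server built with the same ServerBootstrap pattern as the init snippet above (plain NIO transport, a LoggingHandler in place of Spark's RPC handler; an illustrative sketch, not Spark code):

import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.ChannelInitializer
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioServerSocketChannel
import io.netty.handler.logging.LoggingHandler

object MiniNettyServer extends App {
  val bossGroup = new NioEventLoopGroup(1)    // accepts incoming connections
  val workerGroup = new NioEventLoopGroup()   // handles I/O on accepted channels
  try {
    val bootstrap = new ServerBootstrap()
      .group(bossGroup, workerGroup)
      .channel(classOf[NioServerSocketChannel])         // NIO transport; the epoll variant would be used on Linux
      .childHandler(new ChannelInitializer[SocketChannel] {
        override def initChannel(ch: SocketChannel): Unit =
          ch.pipeline().addLast(new LoggingHandler())   // Spark installs its own rpc channel handler here instead
      })
    val channel = bootstrap.bind(9999).sync().channel()
    channel.closeFuture().sync()                        // block until the server socket is closed
  } finally {
    bossGroup.shutdownGracefully()
    workerGroup.shutdownGracefully()
  }
}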
Back to NettyRpcEnv.startServer:
def startServer(bindAddress: String, port: Int): Unit = {
val bootstraps: java.util.List[TransportServerBootstrap] =
if (securityManager.isAuthenticationEnabled()) {
java.util.Arrays.asList(new AuthServerBootstrap(transportConf, securityManager))
} else {
java.util.Collections.emptyList()
}
server = transportContext.createServer(bindAddress, port, bootstraps)
dispatcher.registerRpcEndpoint(
RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
}
=> registerRpcEndpoint => RpcEndpoint (the parameter) =>
receive* // used to receive messages
=> RpcEndpointRef // "Ref" as in reference; ask*/send are used to send messages
Quick summary:
- Communication in Spark is wired up through SparkContext, which holds an env object (SparkEnv); the RPC environment inside it is actually implemented by NettyRpcEnv
- NettyRpcEnv maintains a dispatcher field, and our endpoints (RpcEndpoint) are registered with it
- Each endpoint has an inbox, which is maintained inside the Dispatcher
Communication diagram
(missing image: communication diagram, spark笔记.assets/1614071924204.png)
Application execution
- RDD dependencies
- Stage division
- Task division
- Task scheduling
- Task execution
(missing image: spark笔记.assets/wps1.jpg)
SparkContext
Key components
- SparkConf (configuration object): basic environment configuration
- SparkEnv (environment object): the communication environment
- SchedulerBackend (communication backend): mainly used to communicate with the executors
- TaskScheduler (task scheduler): mainly responsible for scheduling tasks
- DAGScheduler (stage scheduler): mainly responsible for dividing stages and tasks
Dependencies between RDDs
Take the map operator as an example (flatMap is analogous):
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
val cleanF = sc.clean(f)
new MapPartitionsRDD[U, T](this, (_, _, iter) => iter.map(cleanF)) // note that the operator "wraps" this RDD in a MapPartitionsRDD
}
MapPartitionsRDD (auxiliary constructor):
def this(@transient oneParent: RDD[_]) =
this(oneParent.context, List(new OneToOneDependency(oneParent))) // a one-to-one dependency appears here
Following this further:
abstract class RDD[T: ClassTag](
@transient private var _sc: SparkContext,
@transient private var deps: Seq[Dependency[_]] // the dependencies are passed in here
) extends Serializable with Logging
protected def getDependencies: Seq[Dependency[_]] = deps // the dependencies can be retrieved later through this method
The groupBy operator
groupBy => groupBy => groupByKey => combineByKeyWithClassTag => ShuffledRDD
class ShuffledRDD[K: ClassTag, V: ClassTag, C: ClassTag](
@transient var prev: RDD[_ <: Product2[K, V]],
part: Partitioner)
extends RDD[(K, C)](prev.context, Nil) // note that Nil is passed for deps: Seq[Dependency[_]]
Does ShuffledRDD have no dependencies? Not quite.
In ShuffledRDD the getDependencies method always returns a new ShuffleDependency, so there is no need to pass the dependency through the constructor:
List(new ShuffleDependency(prev, part, serializer, keyOrdering, aggregator, mapSideCombine)) // prev is the parent RDD passed in
This is where the dependency chain comes from:
(missing image: RDD dependency diagram, spark笔记.assets/1614075032549.png)
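A small runnable check of this dependency chain (local mode; assumes a Spark 3.x dependency on the classpath):

import org.apache.spark.{SparkConf, SparkContext}

object DependencyDemo extends App {
  val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("deps"))

  val source  = sc.parallelize(Seq("a b", "b c", "c d"), 2)
  val words   = source.flatMap(_.split(" "))   // MapPartitionsRDD with a OneToOneDependency on source
  val grouped = words.groupBy(identity)        // ends in a ShuffledRDD, whose dependency is a ShuffleDependency

  println(words.dependencies)    // e.g. List(org.apache.spark.OneToOneDependency@...)
  println(grouped.dependencies)  // e.g. List(org.apache.spark.ShuffleDependency@...)
  println(grouped.toDebugString) // the indentation marks the shuffle boundary between the two stages

  sc.stop()
}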
Stage division
What happens when an action operator (collect) is called?
collect => sc.runJob => runJob => dagScheduler.runJob
=> runJob => submitJob
=> eventProcessLoop.post(JobSubmitted(…))
=> post => eventQueue.put(event) => onReceive(event)
=> doOnReceive in the DAGScheduler's event loop => a pattern match on the message type: JobSubmitted => handleJobSubmitted
Stage division happens inside the handleJobSubmitted method
=> createResultStage =>
1) getOrCreateParentStages
getShuffleDependencies(rdd).map { shuffleDep =>
getOrCreateShuffleMapStage(shuffleDep, firstJobId)
}.toList
1. => getShuffleDependencies =>
toVisit.dependencies.foreach {
case shuffleDep: ShuffleDependency[_, _, _] =>
parents += shuffleDep // add a shuffle dependency to parents
case dependency =>
waitingForVisit.prepend(dependency.rdd)
} // check whether each dependency of the RDD is a shuffle dependency
- => getOrCreateShuffleMapStage // the shuffle map stage => createShuffleMapStage => getOrCreateParentStages, new ShuffleMapStage
val stage = new ShuffleMapStage( id, rdd, numTasks, parents, jobId, rdd.creationSite, shuffleDep, mapOutputTracker)
2)
val stage = new ResultStage(id, rdd, func, partitions, parents, jobId, callSite) // the current RDD is passed in here; it is the final RDD and carries the whole dependency chain
Number of stages in Spark = number of shuffle dependencies + 1
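For example, a word count such as sc.textFile(path).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect() introduces exactly one ShuffleDependency (from reduceByKey), so the job has 1 + 1 = 2 stages: one ShuffleMapStage plus the final ResultStage. Adding a second wide transformation after it would give 2 + 1 = 3 stages.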
Task division
In DAGScheduler, handleJobSubmitted =>
val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
// ...
submitStage(finalStage) // submit the final stage, which transitively covers all stages
=> submitStage =>
val missing = getMissingParentStages(stage).sortBy(_.id) // get the parent stages
logDebug("missing: " + missing)
if (missing.isEmpty) {
logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
submitMissingTasks(stage, jobId.get) // submit the tasks
} else {
for (parent <- missing) {
submitStage(parent) // submit the parent stage first
}
waitingStages += stage
}
=> submitMissingTasks
val tasks: Seq[Task[_]] = try {
val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
stage match { // pattern match: which kind of stage is this?
case stage: ShuffleMapStage =>
stage.pendingPartitions.clear()
partitionsToCompute.map { id =>
// ...
stage.pendingPartitions += id
new ShuffleMapTask //... one object is created per task
}
case stage: ResultStage =>
// ...
}
}
}
=> What is partitionsToCompute? => val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()
=> go to the implementation ShuffleMapStage (how do we know which one? via the pattern match above), findMissingPartitions
=> .getOrElse(0 until numPartitions) // by default, when nothing has been computed yet, this returns the range 0 until numPartitions; that range is partitionsToCompute, and each number in it becomes one task
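In other words, on the first attempt a stage gets one task per partition of its last RDD. A quick illustration (a sketch, assuming sc is an existing local-mode SparkContext):

val rdd = sc.parallelize(1 to 100, 8)             // 8 partitions
rdd.map(_ * 2).collect()                          // a single ResultStage -> 8 tasks
rdd.map((_, 1)).reduceByKey(_ + _, 4).collect()   // ShuffleMapStage with 8 tasks + ResultStage with 4 tasks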
Task scheduling
In DAGScheduler, submitMissingTasks =>
continuing down, if tasks is not empty =>
taskScheduler.submitTasks(new TaskSet(
tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
=> submitTasks (abstract) => go to the implementation TaskSchedulerImpl
1) => createTaskSetManager => new TaskSetManager, the manager for a task set
2) addTaskSetManager (abstract) => go to the implementation FIFOSchedulableBuilder (the default scheduling builder) => rootPool.addSchedulable(manager) // put it into the scheduling pool
3) backend.reviveOffers() // triggers taking task sets back out of the scheduling pool
=> go to the implementation CoarseGrainedSchedulerBackend => driverEndpoint.send(ReviveOffers) => receive (same class) => makeOffers() =>
1. scheduler.resourceOffers(workOffers) => val sortedTaskSets = rootPool.getSortedTaskSetQueue.filterNot(_.isZombie) => getSortedTaskSetQueue => taskSetSchedulingAlgorithm // the scheduling algorithm (FIFO or FAIR)
2. // … a series of rules is applied, such as the locality preferences
3. launchTasks(taskDescs) // launch the tasks
=> launchTasks checks that the serialized task size is within the limit, then executorData.executorEndpoint.send(...) sends the launch-task message to the executor
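The "series of rules" above is largely the locality-preference mechanism: the scheduler prefers more local placements (process-local, then node-local, then rack-local, then any), waiting spark.locality.wait (3s by default) before downgrading a level. A hedged configuration sketch:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait", "3s")        // how long to wait before falling back to a less local level
  .set("spark.locality.wait.node", "3s")   // the wait can also be tuned per level (process / node / rack)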
Task execution
Picking up from scheduling: once the message is sent, the endpoint should have received it
Go to CoarseGrainedExecutorBackend, receive =>
=> pattern match on LaunchTask
=> if the task data is not empty, decode the task and call launchTask
=>
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
val tr = new TaskRunner(context, taskDescription)
runningTasks.put(taskDescription.taskId, tr)
threadPool.execute(tr) // execute on the thread pool
}
=> new TaskRunner
=>run
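A self-contained analogue of this launchTask pattern (ToyExecutor and TaskDescription here are simplified stand-ins, not Spark's classes): the executor tracks running tasks in a concurrent map and hands each one to a thread pool.

import java.util.concurrent.{ConcurrentHashMap, Executors}

object ToyExecutor extends App {
  final case class TaskDescription(taskId: Long, body: () => Unit)

  private val runningTasks = new ConcurrentHashMap[Long, Runnable]()
  private val threadPool   = Executors.newCachedThreadPool()   // Spark uses a similar cached pool of "task launch worker" threads

  def launchTask(desc: TaskDescription): Unit = {
    val tr: Runnable = () => {
      try desc.body()                              // in Spark: deserialize the Task and run it
      finally runningTasks.remove(desc.taskId)     // drop it from the running-task map when done
    }
    runningTasks.put(desc.taskId, tr)
    threadPool.execute(tr)
  }

  launchTask(TaskDescription(0L, () => println("running task 0")))
  threadPool.shutdown()
}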
The shuffle mechanism
Shuffle implementation flow
Writing
ShuffleMapTask runTask (why here? because, as seen above, a ShuffleMapStage creates ShuffleMapTasks, and the write happens in their runTask) =>
write =>
getWriter, write =>
the implementation SortShuffleWriter, write =>
mapOutputWriter.commitAllPartitions()
1) mapOutputWriter.commitAllPartitions() =>
destructiveSortedWritablePartitionedIterator
- commitAllPartitions =>
commitAllPartitions in LocalDiskShuffleMapOutputWriter =>
writeIndexFileAndCommit => produces the data file and the index file
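Roughly speaking, the index file is what lets a reducer find its slice of the single data file: it stores cumulative byte offsets, a leading 0 plus one entry per partition. For example, if a map task writes three partitions of 100, 250 and 50 bytes, the data file is 400 bytes long and the index file holds the offsets 0, 100, 350, 400; reducer 1 then reads bytes [100, 350).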
Reading
ResultTask runTask =>
rdd.iterator(partition, context) =>
getOrCompute(split, context) =>
computeOrReadCheckpoint =>
compute (implementation: ShuffledRDD) =>
SparkEnv.get.shuffleManager.getReader( dep.shuffleHandle, split.index, split.index + 1, context, metrics) .read() .asInstanceOf[Iterator[(K, C)]]
Shuffle write flow
- shuffleWriterProcessor: the write processor
- shuffleManager: the shuffle manager; hash-based (early versions) vs. sort-based
ShuffleMapTask runTask =>
dep.shuffleWriterProcessor => ShuffleWriteProcessor write =>
1. SparkEnv.get.shuffleManager => ShuffleManager
2. (in the implementation) getWriter(dep.shuffleHandle, mapId, context, createMetricsReporter(context))
1) shuffleHandle => val shuffleHandle: ShuffleHandle = _rdd.context.env.shuffleManager.registerShuffle( shuffleId, this) =>
go to registerShuffle in SortShuffleManager
2) getWriter => different handles map to different writers:
Handle | Writer | Conditions
--- | --- | ---
SerializedShuffleHandle | UnsafeShuffleWriter | 1. the serializer supports relocation of serialized data (Java serialization does not, Kryo does); 2. no map-side aggregation; 3. the number of downstream partitions is <= 16777216
BypassMergeSortShuffleHandle | BypassMergeSortShuffleWriter | 1. no map-side aggregation; 2. the number of downstream partitions is <= 200 (configurable)
BaseShuffleHandle | SortShuffleWriter | all other cases
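The 200-partition cutoff in the table comes from spark.shuffle.sort.bypassMergeThreshold (default 200). A configuration sketch:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.sort.bypassMergeThreshold", "400") // allow the bypass writer for up to 400 reduce partitions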
3. write
sorter = //… obtain the sorter
2) sorter.insertAll(records)
val shouldCombine = aggregator.isDefined // is map-side aggregation (combining) needed?
- aggregation needed => map: new PartitionedAppendOnlyMap[K, C]
if (shouldCombine) {
// Combine values in-memory first using our AppendOnlyMap
val mergeValue = aggregator.get.mergeValue
val createCombiner = aggregator.get.createCombiner
var kv: Product2[K, V] = null
val update = (hadValue: Boolean, oldValue: C) => { if (hadValue) mergeValue(oldValue, kv._2) else createCombiner(kv._2) }
while (records.hasNext) {
addElementsRead()
kv = records.next()
map.changeValue((getPartition(kv._1), kv._1), update) // update performs the map-side pre-aggregation
maybeSpillCollection(usingMap = true) // possibly spill to disk
}
}
- no aggregation => buffer: new PartitionedPairBuffer[K, C]
4. maybeSpillCollection => maybeSpill =>
if (shouldSpill) { // if a spill is required
_spillCount += 1
logSpillage(currentMemory)
spill(collection) // spill to disk
_elementsRead = 0
_memoryBytesSpilled += currentMemory
releaseMemory() // release the execution memory held by the collection
}
=> spill => go to the implementation ExternalSorter, spill
override protected[this] def spill(collection: WritablePartitionedPairCollection[K, C]): Unit = {
val inMemoryIterator = collection.destructiveSortedWritablePartitionedIterator(comparator) // sort
val spillFile = spillMemoryIteratorToDisk(inMemoryIterator) // spill the in-memory data to disk
spills += spillFile}
=> spillMemoryIteratorToDisk
val (blockId, file) = diskBlockManager.createTempShuffleBlock() // create a temporary file
// ...
val writer: DiskBlockObjectWriter =
blockManager.getDiskWriter(blockId, file, serInstance, fileBufferSize, spillMetrics)
// ...
5. Back in SortShuffleWriter
sorter.writePartitionedMapOutput(dep.shuffleId, mapId, mapOutputWriter) // …
val partitionLengths = mapOutputWriter.commitAllPartitions()
Shuffle read flow
ShuffledRDD compute =>
SparkEnv.get.shuffleManager.getReader( dep.shuffleHandle, split.index, split.index + 1, context, metrics) .read() .asInstanceOf[Iterator[(K, C)]]
getReader => SortShuffleManager getReader // omitted here
Spark memory management
Memory layout
(missing image: memory layout diagram, spark笔记.assets/1614840777402.png)
SparkEnv
val memoryManager: MemoryManager = UnifiedMemoryManager(conf, numUsableCores) // unified memory management
=> UnifiedMemoryManager.apply
val maxMemory = getMaxMemory(conf) // compute the maximum usable memory
new UnifiedMemoryManager(
conf,
maxHeapMemory = maxMemory,
onHeapStorageRegionSize = (maxMemory * conf.get(config.MEMORY_STORAGE_FRACTION)).toLong,
numCores = numCores)
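A worked example of what this yields, assuming the usual defaults (300 MB reserved memory, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and roughly a 1 GB executor heap; the exact constants live in getMaxMemory and UnifiedMemoryManager:

object UnifiedMemorySketch extends App {
  val systemMemory   = 1024L * 1024 * 1024            // -Xmx1g, roughly what Runtime.maxMemory reports
  val reservedMemory = 300L * 1024 * 1024             // the reserved system memory
  val usableMemory   = systemMemory - reservedMemory  // ~724 MB
  val maxMemory      = (usableMemory * 0.6).toLong    // unified storage + execution region, ~434 MB
  val storageRegion  = (maxMemory * 0.5).toLong       // initial storage half, ~217 MB (borrowable by execution)
  println(s"unified: ${maxMemory / 1024 / 1024} MB, storage region: ${storageRegion / 1024 / 1024} MB")
}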