Table of Contents
Spark Internals: Resource Scheduling and Task Scheduling
Source Code Analysis of Worker Registration
The worker registration flow:
About deduplication
Conclusions
Source Code Analysis of Driver Registration
Source Code Analysis of Application Registration
Questions:
The schedule() Method
Spark Internals: Resource Scheduling and Task Scheduling
Spark resource scheduling involves three collections on the Master:

Collection | Type
--- | ---
workers | `val workers = new HashSet[WorkerInfo]`
waitingDrivers | `val waitingDrivers = new ArrayBuffer[DriverInfo]`
waitingApps | `val waitingApps = new ArrayBuffer[ApplicationInfo]`
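As a quick reference, these declarations can be sketched as below. This is a sketch only: `DriverInfo` and `ApplicationInfo` are stubbed with a single field, and the `MasterState` object name is our own; only the collection names and types mirror the Spark source discussed in this article.

```scala
import scala.collection.mutable.{ArrayBuffer, HashSet}

// Stub classes standing in for Spark's WorkerInfo / DriverInfo / ApplicationInfo.
class WorkerInfo(val id: String)
class DriverInfo(val id: String)
class ApplicationInfo(val id: String)

object MasterState {
  // Set semantics: the same WorkerInfo object can appear at most once.
  val workers = new HashSet[WorkerInfo]
  // Buffers preserve submission order, which FIFO scheduling relies on.
  val waitingDrivers = new ArrayBuffer[DriverInfo]
  val waitingApps = new ArrayBuffer[ApplicationInfo]
}
```

The choice of `ArrayBuffer` for the two waiting queues matters: unlike a set, it keeps drivers and apps in submission order, which is the order `schedule()` serves them in.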
Source Code Analysis of Worker Registration
When a Worker registers with the Master, the Master's overridden receiveAndReply method is invoked. This method is fairly central: it is the method that receives messages and sends back replies.
The worker registration flow:
- Match the incoming context to see whether it is a worker-registration request.
- If so, check whether the Master's state is active (i.e. not STANDBY).
- Check whether this worker is already registered.
- Create a new WorkerInfo object that encapsulates the registration information.
- Call the registerWorker() method, which checks again for duplicates; if there is no duplicate: workers += worker.
- Call the schedule() method; we will analyze it later.
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  // 1. Pattern match: is this request a worker registration?
  case RegisterWorker(
      id, workerHost, workerPort, workerRef, cores, memory, workerUiPort, publicAddress) => {
    logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
      workerHost, workerPort, cores, Utils.megabytesToString(memory)))
    // 2. Check whether the master is alive
    if (state == RecoveryState.STANDBY) {
      context.reply(MasterInStandby)
    // 3. Check whether the worker ID already exists
    } else if (idToWorker.contains(id)) {
      context.reply(RegisterWorkerFailed("Duplicate worker ID"))
    } else {
      // 4. Wrap the worker's information in a WorkerInfo object
      val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
        workerRef, workerUiPort, publicAddress)
      // 5. Call registerWorker() to register the worker
      if (registerWorker(worker)) {
        persistenceEngine.addWorker(worker)
        context.reply(RegisteredWorker(self, masterWebUiUrl))
        // 6. On successful registration, call schedule()
        schedule()
      } else {
        val workerAddress = worker.endpoint.address
        logWarning("Worker registration failed. Attempted to re-register worker at same " +
          "address: " + workerAddress)
        context.reply(RegisterWorkerFailed("Attempted to re-register worker at same address: "
          + workerAddress))
      }
    }
  }
  // ... other cases elided ...
}
The registerWorker() method checks again for duplicates before updating the Master's bookkeeping collections:

private def registerWorker(worker: WorkerInfo): Boolean = {
  // There may be one or more refs to dead workers on this same node (w/ different ID's),
  // remove them.
  workers.filter { w =>
    (w.host == worker.host && w.port == worker.port) && (w.state == WorkerState.DEAD)
  }.foreach { w =>
    workers -= w
  }
  val workerAddress = worker.endpoint.address
  if (addressToWorker.contains(workerAddress)) {
    val oldWorker = addressToWorker(workerAddress)
    if (oldWorker.state == WorkerState.UNKNOWN) {
      // A worker registering from UNKNOWN implies that the worker was restarted during recovery.
      // The old worker must thus be dead, so we will remove it and accept the new worker.
      removeWorker(oldWorker)
    } else {
      logInfo("Attempted to re-register worker at same address: " + workerAddress)
      return false
    }
  }
  workers += worker
  idToWorker(worker.id) = worker
  addressToWorker(workerAddress) = worker
  true
}
About deduplication
Each worker's node information is wrapped in a WorkerInfo object and stored in the workers collection, whose type is HashSet. Choosing a HashSet prevents the collection from containing the same WorkerInfo object twice.
val workers = new HashSet[WorkerInfo]
However, reading the Spark source shows that every registering worker is wrapped in a freshly constructed object, so identical objects will never actually appear in this HashSet.
Deduplication is instead done manually in the source: see if (addressToWorker.contains(workerAddress)) above. If a worker with the same address exists and is in the UNKNOWN state (an UNKNOWN worker is one restarted during the recovery phase), it is removed and the new worker is accepted; otherwise the method logs an attempt to re-register an existing worker and fails. The motivation for choosing HashSet here is still somewhat unclear, but it is undoubtedly safer.
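To make this concrete, here is a minimal sketch (our own stub class and register function, not Spark's) showing that a HashSet cannot dedup two freshly constructed objects with identical fields, and that the address map is what actually rejects the duplicate:

```scala
import scala.collection.mutable.{HashMap, HashSet}

// Like Spark's WorkerInfo, this plain class does not override equals/hashCode,
// so HashSet falls back to reference equality.
class WorkerStub(val host: String, val port: Int)

val workers = new HashSet[WorkerStub]
val addressToWorker = new HashMap[String, WorkerStub]

// Manual dedup by address, mirroring the check in registerWorker().
def register(w: WorkerStub): Boolean = {
  val addr = s"${w.host}:${w.port}"
  if (addressToWorker.contains(addr)) {
    false
  } else {
    workers += w
    addressToWorker(addr) = w
    true
  }
}
```

Two `WorkerStub("node1", 7077)` instances added directly both land in the set, while `register` rejects the second one by address.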
Below is the WorkerInfo data structure, which encapsulates location, port, and resource information:
private[spark] class WorkerInfo(
val id: String,
val host: String,
val port: Int,
val cores: Int,
val memory: Int,
val endpoint: RpcEndpointRef,
val webUiPort: Int,
val publicAddress: String)
extends Serializable {
...
}
Conclusions
From the source analysis above we can draw the following conclusions:
- Whenever a new worker registers with the master, schedule() is called after a successful registration.
- To avoid duplicate elements in the workers collection, a HashSet was chosen as the data structure, and additional manual checks are performed layer by layer on insertion.
Source Code Analysis of Driver Registration
Driver registration is not as strict as worker registration: it only requires the master node to be in the ALIVE state. waitingDrivers is of type ArrayBuffer. Likewise, driver registration also ends with a call to schedule().
case RequestSubmitDriver(description) => {
  if (state != RecoveryState.ALIVE) {
    val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
      "Can only accept driver submissions in ALIVE state."
    context.reply(SubmitDriverResponse(self, false, None, msg))
  } else {
    logInfo("Driver submitted " + description.command.mainClass)
    val driver = createDriver(description)
    persistenceEngine.addDriver(driver)
    waitingDrivers += driver
    drivers.add(driver)
    schedule()
    // TODO: It might be good to instead have the submission client poll the master to determine
    // the current status of the driver. For now it's simply "fire and forget".
    context.reply(SubmitDriverResponse(self, true, Some(driver.id),
      s"Driver successfully submitted as ${driver.id}"))
  }
}
Source Code Analysis of Application Registration
Application registration happens in a different method: receive. Like waitingDrivers, waitingApps is also an ArrayBuffer, and registration again ends with a call to schedule().
case RegisterApplication(description, driver) => {
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
}
The registerApplication() method adds the app to the bookkeeping collections, skipping duplicates from the same driver address:

private def registerApplication(app: ApplicationInfo): Unit = {
  val appAddress = app.driver.address
  if (addressToApp.contains(appAddress)) {
    logInfo("Attempted to re-register application at same address: " + appAddress)
    return
  }
  applicationMetricsSystem.registerSource(app.appSource)
  apps += app
  idToApp(app.id) = app
  endpointToApp(app.driver) = app
  addressToApp(appAddress) = app
  waitingApps += app
}
Questions:
- When a user submits an application, which is registered first, the driver or the app?
- Why is schedule() called every time? What does this method do?
The schedule() Method
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  startExecutorsOnWorkers()
}
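The round-robin loop above can be exercised in isolation. Below is a sketch with stubbed workers and drivers; the stub class names and the returned placement map are our additions for testability (Spark itself launches the driver via launchDriver rather than returning a map):

```scala
import scala.collection.mutable.ArrayBuffer

case class WorkerStub(id: String, var coresFree: Int, var memoryFree: Int)
case class DriverStub(id: String, cores: Int, mem: Int)

// Assign each waiting driver to the next alive worker (round-robin) with
// enough free resources, mirroring the loop in schedule().
def assignDrivers(workers: IndexedSeq[WorkerStub],
                  waiting: ArrayBuffer[DriverStub]): Map[String, String] = {
  val placement = scala.collection.mutable.Map[String, String]()
  var curPos = 0
  val numWorkersAlive = workers.size
  for (driver <- waiting.toList) { // iterate over a copy, as in schedule()
    var launched = false
    var visited = 0
    while (visited < numWorkersAlive && !launched) {
      val w = workers(curPos)
      visited += 1
      if (w.memoryFree >= driver.mem && w.coresFree >= driver.cores) {
        w.coresFree -= driver.cores
        w.memoryFree -= driver.mem
        placement(driver.id) = w.id
        waiting -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  placement.toMap
}
```

Drivers that do not fit on any worker simply stay in the waiting buffer and will be retried the next time schedule() runs, which is exactly why schedule() is invoked after every registration.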
To be continued...