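Chapter 1 Introduction
The previous article ended with the TaskManager starting up. This article describes how the TaskManager registers its slots with the ResourceManager and then offers them to the JobManager.
Chapter 2 Step-by-Step Walkthrough
2.1 Starting the TaskExecutor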
org.apache.flink.runtime.taskexecutor.TaskExecutor#startTaskExecutorServices
private void startTaskExecutorServices() throws Exception {
try {
// start by connecting to the ResourceManager
// TODO The TaskManager initiates the connection to the ResourceManager
resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
// tell the task slot table who's responsible for the task slot actions
taskSlotTable.start(new SlotActionsImpl(), getMainThreadExecutor());
// start the job leader service
jobLeaderService.start(getAddress(), getRpcService(), haServices, new JobLeaderListenerImpl());
fileCache = new FileCache(taskManagerConfiguration.getTmpDirectories(), blobCacheService.getPermanentBlobService());
} catch (Exception e) {
handleStartTaskExecutorServicesException(e);
}
}
2.2 Establishing the connection between the TM and the RM
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalService#start
We look at the ZooKeeperLeaderRetrievalService implementation:
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService#start
@Override
public void start(LeaderRetrievalListener listener) throws Exception {
Preconditions.checkNotNull(listener, "Listener must not be null.");
Preconditions.checkState(leaderListener == null, "ZooKeeperLeaderRetrievalService can " +
"only be started once.");
LOG.info("Starting ZooKeeperLeaderRetrievalService {}.", retrievalPath);
synchronized (lock) {
leaderListener = listener;
// TODO Add the listeners
client.getUnhandledErrorListenable().addListener(this);
cache.getListenable().addListener(this);
cache.start();
client.getConnectionStateListenable().addListener(connectionStateListener);
running = true;
}
}
Eventually org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService#retrieveLeaderInformationFromZooKeeper is executed:
private void retrieveLeaderInformationFromZooKeeper() {
synchronized (lock) {
if (running) {
try {
// ...
// TODO Notify the listener of the leader address
notifyIfNewLeaderAddress(leaderAddress, leaderSessionID);
} catch (Exception e) {
leaderListener.handleError(new Exception("Could not handle node changed event.", e));
ExceptionUtils.checkInterrupted(e);
}
} else {
LOG.debug("Ignoring node change notification since the service has already been stopped.");
}
}
}
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService#notifyIfNewLeaderAddress
@GuardedBy("lock")
private void notifyIfNewLeaderAddress(String newLeaderAddress, UUID newLeaderSessionID) {
if (!(Objects.equals(newLeaderAddress, lastLeaderAddress) &&
Objects.equals(newLeaderSessionID, lastLeaderSessionID))) {
// ...
// TODO Notify the listener of the leader's address
leaderListener.notifyLeaderAddress(newLeaderAddress, newLeaderSessionID);
}
}
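The change check in notifyIfNewLeaderAddress can be illustrated in isolation. The following is a minimal sketch, not Flink code: the class LeaderChangeNotifierSketch and its notifications list are hypothetical, and it assumes that the elided part of the method stores the new address and session ID before notifying the listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical stand-in for a leader retrieval service; not a Flink class. */
class LeaderChangeNotifierSketch {
    private final Object lock = new Object();
    private String lastLeaderAddress;
    private UUID lastLeaderSessionId;

    /** Records every notification that was actually forwarded to the listener. */
    final List<String> notifications = new ArrayList<>();

    /** Forwards the leader only if the address or the session ID changed. */
    boolean notifyIfNewLeader(String newAddress, UUID newSessionId) {
        synchronized (lock) {
            if (Objects.equals(newAddress, lastLeaderAddress)
                    && Objects.equals(newSessionId, lastLeaderSessionId)) {
                return false; // same leader as before, nothing to report
            }
            // Assumption: the last seen values are updated before notifying.
            lastLeaderAddress = newAddress;
            lastLeaderSessionId = newSessionId;
            notifications.add(newAddress + "@" + newSessionId);
            return true;
        }
    }
}
```

Because both fields are compared, a leader that keeps its address but gets a new session ID still triggers a notification, which is what lets the TaskExecutor reconnect after a re-election on the same host.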
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalListener#notifyLeaderAddress
The implementation is ultimately back in TaskExecutor:
org.apache.flink.runtime.taskexecutor.TaskExecutor.ResourceManagerLeaderListener#notifyLeaderAddress
@Override
public void notifyLeaderAddress(final String leaderAddress, final UUID leaderSessionID) {
// TODO Obtain the new RM address
runAsync(
() -> notifyOfNewResourceManagerLeader(
leaderAddress,
ResourceManagerId.fromUuidOrNull(leaderSessionID)));
}
org.apache.flink.runtime.taskexecutor.TaskExecutor#notifyOfNewResourceManagerLeader
private void notifyOfNewResourceManagerLeader(String newLeaderAddress, ResourceManagerId newResourceManagerId) {
resourceManagerAddress = createResourceManagerAddress(newLeaderAddress, newResourceManagerId);
// TODO Connect to the RM
reconnectToResourceManager(new FlinkException(String.format("ResourceManager leader changed to new address %s", resourceManagerAddress)));
}
org.apache.flink.runtime.taskexecutor.TaskExecutor#reconnectToResourceManager
private void reconnectToResourceManager(Exception cause) {
closeResourceManagerConnection(cause);
startRegistrationTimeout();
// TODO Try to connect to the RM
tryConnectToResourceManager();
}
org.apache.flink.runtime.taskexecutor.TaskExecutor#tryConnectToResourceManager
private void tryConnectToResourceManager() {
if (resourceManagerAddress != null) {
// TODO Connect to the RM
connectToResourceManager();
}
}
org.apache.flink.runtime.taskexecutor.TaskExecutor#connectToResourceManager
private void connectToResourceManager() {
assert(resourceManagerAddress != null);
assert(establishedResourceManagerConnection == null);
assert(resourceManagerConnection == null);
log.info("Connecting to ResourceManager {}.", resourceManagerAddress);
final TaskExecutorRegistration taskExecutorRegistration = new TaskExecutorRegistration(
getAddress(),
getResourceID(),
unresolvedTaskManagerLocation.getDataPort(),
JMXService.getPort().orElse(-1),
hardwareDescription,
memoryConfiguration,
taskManagerConfiguration.getDefaultSlotResourceProfile(),
taskManagerConfiguration.getTotalResourceProfile()
);
// TODO Note: after a successful registration, the onRegistrationSuccess callback in TaskExecutorToResourceManagerConnection is invoked
resourceManagerConnection =
new TaskExecutorToResourceManagerConnection(
log,
getRpcService(),
taskManagerConfiguration.getRetryingRegistrationConfiguration(),
resourceManagerAddress.getAddress(),
resourceManagerAddress.getResourceManagerId(),
getMainThreadExecutor(),
new ResourceManagerRegistrationListener(),
taskExecutorRegistration);
// TODO Start connecting
resourceManagerConnection.start();
}
start() kicks off the registration over RPC. Once the registration succeeds, the onRegistrationSuccess callback in TaskExecutorToResourceManagerConnection is invoked:
org.apache.flink.runtime.taskexecutor.TaskExecutorToResourceManagerConnection#onRegistrationSuccess
@Override
protected void onRegistrationSuccess(TaskExecutorRegistrationSuccess success) {
log.info("Successful registration at resource manager {} under registration id {}.",
getTargetAddress(), success.getRegistrationId());
// TODO Runs after a successful registration
registrationListener.onRegistrationSuccess(this, success);
}
The implementation of org.apache.flink.runtime.registration.RegistrationConnectionListener#onRegistrationSuccess, ResourceManagerRegistrationListener, is in fact an inner class of TaskExecutor.
org.apache.flink.runtime.taskexecutor.TaskExecutor.ResourceManagerRegistrationListener#onRegistrationSuccess
@Override
public void onRegistrationSuccess(TaskExecutorToResourceManagerConnection connection, TaskExecutorRegistrationSuccess success) {
final ResourceID resourceManagerId = success.getResourceManagerId();
final InstanceID taskExecutorRegistrationId = success.getRegistrationId();
final ClusterInformation clusterInformation = success.getClusterInformation();
final ResourceManagerGateway resourceManagerGateway = connection.getTargetGateway();
runAsync(
() -> {
// filter out outdated connections
//noinspection ObjectEquality
if (resourceManagerConnection == connection) {
try {
// TODO The TM establishes its connection to the RM
establishResourceManagerConnection(
resourceManagerGateway,
resourceManagerId,
taskExecutorRegistrationId,
clusterInformation);
} catch (Throwable t) {
log.error("Establishing Resource Manager connection in Task Executor failed", t);
}
}
});
}
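The identity check `resourceManagerConnection == connection` above guards against callbacks from registration attempts that have since been replaced. A minimal sketch of that idea follows; StaleConnectionFilterSketch, Connection and the established list are hypothetical names used only for illustration and are not part of Flink.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of filtering out callbacks from outdated connections. */
class StaleConnectionFilterSketch {
    /** Connection handle; object identity distinguishes one attempt from another. */
    static final class Connection {
        final String target;

        Connection(String target) {
            this.target = target;
        }
    }

    private Connection currentConnection;

    /** Targets for which a connection was actually established. */
    final List<String> established = new ArrayList<>();

    /** Starts a new attempt and makes it the current one, replacing any older attempt. */
    Connection connect(String target) {
        currentConnection = new Connection(target);
        return currentConnection;
    }

    /** Invoked when an attempt succeeds, possibly after a newer attempt was started. */
    boolean onRegistrationSuccess(Connection connection) {
        if (currentConnection != connection) {
            return false; // outdated attempt, ignore its late callback
        }
        established.add(connection.target);
        return true;
    }
}
```

Comparing by reference rather than with equals() is deliberate here: two attempts against the same address are still different attempts, and only the most recent one should be allowed to complete.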
2.3 Registering slots with the RM
org.apache.flink.runtime.taskexecutor.TaskExecutor#establishResourceManagerConnection
private void establishResourceManagerConnection(
ResourceManagerGateway resourceManagerGateway,
ResourceID resourceManagerResourceId,
InstanceID taskExecutorRegistrationId,
ClusterInformation clusterInformation) {
// TODO Send the slot report to the RM
final CompletableFuture<Acknowledge> slotReportResponseFuture = resourceManagerGateway.sendSlotReport(
getResourceID(),
taskExecutorRegistrationId,
taskSlotTable.createSlotReport(getResourceID()),
taskManagerConfiguration.getTimeout());
// ...
}
Implementation of org.apache.flink.runtime.resourcemanager.ResourceManagerGateway#sendSlotReport:
org.apache.flink.runtime.resourcemanager.ResourceManager#sendSlotReport
@Override
public CompletableFuture<Acknowledge> sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, Time timeout) {
final WorkerRegistration<WorkerType> workerTypeWorkerRegistration = taskExecutors.get(taskManagerResourceId);
if (workerTypeWorkerRegistration.getInstanceID().equals(taskManagerRegistrationId)) {
// TODO The SlotManager in the RM registers the TM
if (slotManager.registerTaskManager(workerTypeWorkerRegistration, slotReport)) {
onWorkerRegistered(workerTypeWorkerRegistration.getWorker());
}
return CompletableFuture.completedFuture(Acknowledge.get());
} else {
return FutureUtils.completedExceptionally(new ResourceManagerException(String.format("Unknown TaskManager registration id %s.", taskManagerRegistrationId)));
}
}
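The check in sendSlotReport accepts a report only if the registration ID matches the one the RM stored for that TaskManager. The sketch below models this with plain strings; SlotReportCheckSketch, register and the "ACK" return value are hypothetical simplifications, and the explicit null check for an unknown TaskManager is an addition of this sketch rather than something shown in the quoted method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of validating a slot report against the stored registration ID. */
class SlotReportCheckSketch {
    /** Maps a TaskManager resource ID to the registration ID issued to it. */
    private final Map<String, String> registrationIds = new HashMap<>();

    void register(String resourceId, String registrationId) {
        registrationIds.put(resourceId, registrationId);
    }

    /** Acknowledges the report only when the registration ID is the current one. */
    CompletableFuture<String> sendSlotReport(String resourceId, String registrationId) {
        String known = registrationIds.get(resourceId);
        if (known != null && known.equals(registrationId)) {
            return CompletableFuture.completedFuture("ACK");
        }
        // Stale or unknown registration: fail the future instead of throwing.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new IllegalStateException("Unknown TaskManager registration id " + registrationId));
        return failed;
    }
}
```

Returning an exceptionally completed future keeps the error on the asynchronous path, so the caller handles it the same way as any other failed RPC.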
Implementation of org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager#registerTaskManager:
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl#registerTaskManager
@Override
public boolean registerTaskManager(final TaskExecutorConnection taskExecutorConnection, SlotReport initialSlotReport) {
checkInit();
LOG.debug("Registering TaskManager {} under {} at the SlotManager.", taskExecutorConnection.getResourceID().getStringWithMetadata(), taskExecutorConnection.getInstanceID());
// we identify task managers by their instance id
if (taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) {
reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport);
return false;
} else {
// ...
// next register the new slots
for (SlotStatus slotStatus : initialSlotReport) {
// TODO Register the slot
registerSlot(
slotStatus.getSlotID(),
slotStatus.getAllocationID(),
slotStatus.getJobID(),
slotStatus.getResourceProfile(),
taskExecutorConnection);
}
return true;
}
}
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl#registerSlot
private void registerSlot(
SlotID slotId,
AllocationID allocationId,
JobID jobId,
ResourceProfile resourceProfile,
TaskExecutorConnection taskManagerConnection) {
// TODO If the slot already exists in slots, remove the old slot by its slotId first
if (slots.containsKey(slotId)) {
// remove the old slot first
removeSlot(
slotId,
new SlotManagerException(
String.format(
"Re-registration of slot %s. This indicates that the TaskExecutor has re-connected.",
slotId)));
}
// TODO Create and register the new slot
final TaskManagerSlot slot = createAndRegisterTaskManagerSlot(slotId, resourceProfile, taskManagerConnection);
final PendingTaskManagerSlot pendingTaskManagerSlot;
if (allocationId == null) {
// TODO Look for a pending slot whose resource profile matches exactly
pendingTaskManagerSlot = findExactlyMatchingPendingTaskManagerSlot(resourceProfile);
} else {
pendingTaskManagerSlot = null;
}
if (pendingTaskManagerSlot == null) {
// TODO Update the slot
updateSlot(slotId, allocationId, jobId);
} else {
pendingSlots.remove(pendingTaskManagerSlot.getTaskManagerSlotId());
final PendingSlotRequest assignedPendingSlotRequest = pendingTaskManagerSlot.getAssignedPendingSlotRequest();
// TODO No pending slot request is assigned to the pending slot
if (assignedPendingSlotRequest == null) {
// TODO Treat it as a free slot
handleFreeSlot(slot);
} else {
// TODO Unassign the pending TM slot from the request
assignedPendingSlotRequest.unassignPendingTaskManagerSlot();
// TODO Allocate the slot
allocateSlot(slot, assignedPendingSlotRequest);
}
}
}
At this point the TM has finished registering its slots with the RM.
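The branching at the end of registerSlot can be summarized as a small decision function. This is only a sketch of the control flow shown above: RegisterSlotDecisionSketch, the Action enum and the three boolean parameters are hypothetical and stand in for "allocationId != null", "a matching pending slot was found" and "that pending slot has an assigned request".

```java
/** Hypothetical summary of the branches taken at the end of registerSlot. */
class RegisterSlotDecisionSketch {
    enum Action { UPDATE_SLOT, HANDLE_FREE_SLOT, ALLOCATE_TO_PENDING_REQUEST }

    static Action decide(
            boolean reportedAsAllocated,
            boolean matchingPendingSlotExists,
            boolean pendingSlotHasRequest) {
        // A slot that is reported as already allocated never consumes a pending slot.
        boolean usesPendingSlot = !reportedAsAllocated && matchingPendingSlotExists;
        if (!usesPendingSlot) {
            return Action.UPDATE_SLOT;
        }
        // The pending slot is replaced by the real one; what happens next
        // depends on whether a request was waiting for it.
        return pendingSlotHasRequest
                ? Action.ALLOCATE_TO_PENDING_REQUEST
                : Action.HANDLE_FREE_SLOT;
    }
}
```

Only the last branch leads to allocateSlot, which is where the RM contacts the TM again.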
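2.4 The RM notifies the TM
After registering a slot, the RM has to report back to the TM. When the slot is assigned to a pending slot request, allocateSlot does this by sending a requestSlot RPC to the TM.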
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl#allocateSlot
private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
// ...
// TODO Look up the current instance among all currently registered TMs
TaskManagerRegistration taskManagerRegistration = taskManagerRegistrations.get(instanceID);
if (taskManagerRegistration == null) {
throw new IllegalStateException("Could not find a registered task manager for instance id " +
instanceID + '.');
}
// TODO Mark it as used
taskManagerRegistration.markUsed();
// RPC call to the task manager
// TODO Notify the TM to offer the slot to the JM so that the job can run
CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
slotId,
pendingSlotRequest.getJobId(),
allocationId,
pendingSlotRequest.getResourceProfile(),
pendingSlotRequest.getTargetAddress(),
resourceManagerId,
taskManagerRequestTimeout);
// ...
}
Implementation of org.apache.flink.runtime.taskexecutor.TaskExecutorGateway#requestSlot:
org.apache.flink.runtime.taskexecutor.TaskExecutor#requestSlot
@Override
public CompletableFuture<Acknowledge> requestSlot(
final SlotID slotId,
final JobID jobId,
final AllocationID allocationId,
final ResourceProfile resourceProfile,
final String targetAddress,
final ResourceManagerId resourceManagerId,
final Time timeout) {
// ...
try {
// TODO Allocate the local slot as instructed by the RM after its allocation succeeded
allocateSlot(
slotId,
jobId,
allocationId,
resourceProfile);
} catch (SlotAllocationException sae) {
return FutureUtils.completedExceptionally(sae);
}
// ...
if (job.isConnected()) {
// TODO Offer the slot to the JobManager
offerSlotsToJobManager(jobId);
}
return CompletableFuture.completedFuture(Acknowledge.get());
}
After receiving the request from the RM, the TM first updates its own internal slot bookkeeping accordingly and then offers the slot to the JM.
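The two-step behavior of requestSlot (allocate locally, then offer only if the job's JobManager is already connected) can be sketched as follows. This is a simplified model, not Flink's TaskExecutor: TaskExecutorSlotSketch, markJobConnected and the string-based offers list are hypothetical, and how a job becomes connected is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of "allocate locally, then offer to the JobManager". */
class TaskExecutorSlotSketch {
    /** Slot index to owning job; stands in for the task slot table. */
    private final Map<Integer, String> slotToJob = new HashMap<>();
    private final Set<String> connectedJobs = new HashSet<>();

    /** Every offer sent so far, recorded as "jobId:slotIndex". */
    final List<String> offers = new ArrayList<>();

    void markJobConnected(String jobId) {
        connectedJobs.add(jobId);
    }

    /** Returns false if the slot is already held by a different job. */
    boolean requestSlot(int slotIndex, String jobId) {
        String owner = slotToJob.get(slotIndex);
        if (owner != null && !owner.equals(jobId)) {
            return false; // local allocation failed, nothing is offered
        }
        slotToJob.put(slotIndex, jobId);
        if (connectedJobs.contains(jobId)) {
            offerSlotsToJobManager(jobId);
        }
        return true;
    }

    /** Offers every slot currently allocated to the job, not just the newest one. */
    private void offerSlotsToJobManager(String jobId) {
        for (Map.Entry<Integer, String> entry : slotToJob.entrySet()) {
            if (entry.getValue().equals(jobId)) {
                offers.add(jobId + ":" + entry.getKey());
            }
        }
    }
}
```

In this model a slot requested before the job is connected is allocated but not offered; it goes out with the next offer once the job is marked as connected.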
2.5 The TM offers slots to the JM
org.apache.flink.runtime.taskexecutor.TaskExecutor#offerSlotsToJobManager
private void offerSlotsToJobManager(final JobID jobId) {
jobTable
.getConnection(jobId)
.ifPresent(this::internalOfferSlotsToJobManager);
}
org.apache.flink.runtime.taskexecutor.TaskExecutor#internalOfferSlotsToJobManager
private void internalOfferSlotsToJobManager(JobTable.Connection jobManagerConnection) {
final JobID jobId = jobManagerConnection.getJobId();
if (taskSlotTable.hasAllocatedSlots(jobId)) {
// ...
// TODO Contact the JobMaster (JobManager) and offer the slots
CompletableFuture<Collection<SlotOffer>> acceptedSlotsFuture = jobMasterGateway.offerSlots(
getResourceID(),
reservedSlots,
taskManagerConfiguration.getTimeout());
acceptedSlotsFuture.whenCompleteAsync(
handleAcceptedSlotOffers(jobId, jobMasterGateway, jobMasterId, reservedSlots),
getMainThreadExecutor());
} else {
log.debug("There are no unassigned slots for the job {}.", jobId);
}
}
From here the TM calls the JM over RPC to offer it the slots.
Implementation of org.apache.flink.runtime.jobmaster.JobMasterGateway#offerSlots:
org.apache.flink.runtime.jobmaster.JobMaster#offerSlots
@Override
public CompletableFuture<Collection<SlotOffer>> offerSlots(
final ResourceID taskManagerId,
final Collection<SlotOffer> slots,
final Time timeout) {
// ...
// TODO The slot pool in the JobManager receives the offered slots
return CompletableFuture.completedFuture(
slotPool.offerSlots(
taskManagerLocation,
rpcTaskManagerGateway,
slots));
}
The slots are offered to the slot pool inside the JM.
Implementation of org.apache.flink.runtime.jobmaster.slotpool.SlotPool#offerSlots:
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl#offerSlots
@Override
public Collection<SlotOffer> offerSlots(
TaskManagerLocation taskManagerLocation,
TaskManagerGateway taskManagerGateway,
Collection<SlotOffer> offers) {
ArrayList<SlotOffer> result = new ArrayList<>(offers.size());
// TODO Offer each slot
for (SlotOffer offer : offers) {
if (offerSlot(
taskManagerLocation,
taskManagerGateway,
offer)) {
result.add(offer);
}
}
return result;
}
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl#offerSlot
boolean offerSlot(
final TaskManagerLocation taskManagerLocation,
final TaskManagerGateway taskManagerGateway,
final SlotOffer slotOffer) {
// ...
// TODO Create the allocated slot
final AllocatedSlot allocatedSlot = new AllocatedSlot(
allocationID,
taskManagerLocation,
slotOffer.getSlotIndex(),
slotOffer.getResourceProfile(),
taskManagerGateway);
// use the slot to fulfill pending request, in requested order
// TODO Use the slot to fulfill a pending request
tryFulfillSlotRequestOrMakeAvailable(allocatedSlot);
// we accepted the request in any case. slot will be released after it idled for
// too long and timed out
return true;
}
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl#tryFulfillSlotRequestOrMakeAvailable
private void tryFulfillSlotRequestOrMakeAvailable(AllocatedSlot allocatedSlot) {
Preconditions.checkState(!allocatedSlot.isUsed(), "Provided slot is still in use.");
// TODO Find a matching pending request
final PendingRequest pendingRequest = findMatchingPendingRequest(allocatedSlot);
if (pendingRequest != null) {
log.debug("Fulfilling pending slot request [{}] with slot [{}]",
pendingRequest.getSlotRequestId(), allocatedSlot.getAllocationId());
// Remove the pending request
removePendingRequest(pendingRequest.getSlotRequestId());
// TODO Record the allocated slot under the pending request's slot request ID
allocatedSlots.add(pendingRequest.getSlotRequestId(), allocatedSlot);
pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot);
// this allocation may become orphan once its corresponding request is removed
// TODO Get the AllocationID. It is generated in the JM, registered with the RM, and then passed by the RM to the TM to tell allocations apart
final Optional<AllocationID> allocationIdOfRequest = pendingRequest.getAllocationId();
// the allocation id can be null if the request was fulfilled by a slot directly offered
// by a reconnected TaskExecutor before the ResourceManager is connected
if (allocationIdOfRequest.isPresent()) {
maybeRemapOrphanedAllocation(allocationIdOfRequest.get(), allocatedSlot.getAllocationId());
}
} else {
log.debug("Adding slot [{}] to available slots", allocatedSlot.getAllocationId());
availableSlots.add(allocatedSlot, clock.relativeTimeMillis());
}
}
The JM validates and records the slots offered by the TM.
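The fulfill-or-make-available logic of the slot pool can be sketched with plain futures. This is a deliberately simplified model: SlotPoolSketch is hypothetical, slots are plain strings, and pending requests are served in FIFO order, whereas the code above selects one through findMatchingPendingRequest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of a pool that fulfills pending requests or stores the slot. */
class SlotPoolSketch {
    private final Deque<CompletableFuture<String>> pendingRequests = new ArrayDeque<>();

    /** Slots that were offered while no request was waiting. */
    final List<String> availableSlots = new ArrayList<>();

    /** Returns a completed future if a slot is available, otherwise a pending one. */
    CompletableFuture<String> requestSlot() {
        if (!availableSlots.isEmpty()) {
            return CompletableFuture.completedFuture(availableSlots.remove(0));
        }
        CompletableFuture<String> pending = new CompletableFuture<>();
        pendingRequests.add(pending);
        return pending;
    }

    /** Hands the slot to the oldest pending request, or keeps it as available. */
    void offerSlot(String slotId) {
        CompletableFuture<String> pending = pendingRequests.poll();
        if (pending != null) {
            pending.complete(slotId);
        } else {
            availableSlots.add(slotId);
        }
    }
}
```

Completing the future is what wakes up the requester, in the same way that pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot) does in the method above.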
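This completes the whole process of registering and offering slots. With the slots registered, the next step is for the JM to submit the job to the TM for execution, which the next article covers.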