Kafka producer request to broker fails

Reposted

gjnet 2024-10-08 11:14:25

If the Acceptor is what handles inbound connections, the Processor is where connections are actually set up and requests are dispatched.

The important fields of Processor are:

private[kafka] class Processor(val id: Int,
                               time: Time,
                               maxRequestSize: Int,
                               requestChannel: RequestChannel, /** queue that carries request data between Processor and Handler threads */
                               connectionQuotas: ConnectionQuotas,
                               connectionsMaxIdleMs: Long,
                               failedAuthenticationDelayMs: Int,
                               listenerName: ListenerName,
                               securityProtocol: SecurityProtocol,
                               config: KafkaConfig,
                               metrics: Metrics,
                               credentialProvider: CredentialProvider,
                               memoryPool: MemoryPool,
                               logContext: LogContext,
                               connectionQueueSize: Int = ConnectionQueueSize) extends AbstractServerThread(connectionQuotas) with KafkaMetricsGroup {
  // Newly accepted connections assigned to this Processor, held as SocketChannel objects.
  // Whenever the Acceptor thread hands a new connection to this Processor, the SocketChannel
  // is placed in this queue; the Processor registers it with its selector later.
  private val newConnections = new ArrayBlockingQueue[SocketChannel](connectionQueueSize)
  // Temporary holding area for Responses, keyed by connection id. When the Processor hands a
  // Response to the selector for sending, it also stores the Response here.
  // Some Response callbacks can only run after the Response has actually been sent, so it is kept until then.
  private val inflightResponses = mutable.Map[String, RequestChannel.Response]()
  // Each Processor thread maintains its own Response queue
  private val responseQueue = new LinkedBlockingDeque[RequestChannel.Response]()
}
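
To make the roles of these three fields concrete, here is a minimal, self-contained sketch. MiniProcessor, Response, and accept are hypothetical names used only for illustration, not Kafka's classes, and connections are modelled as plain strings instead of SocketChannel objects.

```scala
import java.util.concurrent.{ArrayBlockingQueue, LinkedBlockingDeque}
import scala.collection.mutable

// Hypothetical stand-in for RequestChannel.Response
final case class Response(connectionId: String, body: String)

// Simplified model of the three per-Processor structures (not Kafka's real class)
class MiniProcessor(connectionQueueSize: Int) {
  // Bounded hand-off queue: filled by the acceptor side, drained by the processor thread
  val newConnections = new ArrayBlockingQueue[String](connectionQueueSize)
  // Responses waiting to be written, queued by handler threads
  val responseQueue = new LinkedBlockingDeque[Response]()
  // Responses handed to the selector but not yet confirmed as sent, keyed by connection id
  val inflightResponses = mutable.Map[String, Response]()

  // Non-blocking hand-off; returns false when the bounded queue is full
  def accept(connectionId: String): Boolean = newConnections.offer(connectionId)
}

object MiniProcessorDemo {
  // Offers three connections to a queue with room for two
  def run(): (Boolean, Boolean, Boolean) = {
    val p = new MiniProcessor(connectionQueueSize = 2)
    (p.accept("conn-1"), p.accept("conn-2"), p.accept("conn-3"))
  }
}
```

The bounded queue is the point of the sketch: once it is full, the side that accepts connections has to wait or pick another processor, which keeps a slow processor from accumulating an unbounded backlog.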

Its run method is as follows:

override def run(): Unit = {
    // Mark this thread as started
    startupComplete()
    try {
      while (isRunning) {
        try {
          // setup any new connections that have been queued up
          // 1. Set up new connections:
          // drain the SocketChannel objects assigned to this Processor and register each for OP_READ
          configureNewConnections()
          // register any new responses for writing
          // 2. Walk this Processor's response queue and handle each response by type;
          // a SendResponse is queued for sending and recorded in inflightResponses
          processNewResponses()
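          // 3. Run the NIO poll to pick up the I/O operations that are ready on each SocketChannel.
          // The responses queued above are only written out here: poll is where I/O is actually performed.
          // Its core is a single line: selector.poll(pollTimeout).
          // Underneath, Kafka's Selector calls the Java NIO Selector's select method to find the ready channels,
          // then performs the reads (incoming Requests) and writes (outgoing Responses) on them.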
          poll()
          // 4. Walk the receives that poll placed in the Selector's completedReceives collection,
          // wrap each one in a Request object and add it to the request queue for a Handler thread,
          // and mute the channel so that no further requests are read from it for now
          processCompletedReceives()
          // 5. Walk the sends that poll placed in the Selector's completedSends collection,
          // remove each Response from inflightResponses, run its callback,
          // and unmute the channel so that reading can resume
          processCompletedSends()
          // 6. Walk the connections that poll placed in the Selector's disconnected collection,
          // remove their responses from inflightResponses, and decrement the connection count for the corresponding IP
          processDisconnected()

          // Close connections that exceed the connection quota
          closeExcessConnections()
        } catch {
          // We catch all the throwables here to prevent the processor thread from exiting. We do this because
          // letting a processor exit might cause a bigger impact on the broker. This behavior might need to be
          // reviewed if we see an exception that needs the entire broker to stop. Usually the exceptions thrown would
          // be either associated with a specific socket channel or a bad request. These exceptions are caught and
          // processed by the individual methods above which close the failing channel and continue processing other
          // channels. So this catch block should only ever see ControlThrowables.
          case e: Throwable => processException("Processor got uncaught exception.", e)
        }
      }
    } finally {
      // Release the underlying resources
      debug(s"Closing selector - processor $id")
      CoreUtils.swallow(closeAll(), this, Level.ERROR)
      shutdownComplete()
    }
  }
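
The shape of this loop (fixed step order, with a failure in one iteration not ending the thread) can be sketched in isolation. LoopSketch and runLoop are hypothetical names, the steps are placeholders, and the sketch catches NonFatal where the Kafka code above catches every Throwable.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object LoopSketch {
  // Runs the steps in order for each iteration. A failure aborts the rest of
  // that iteration, is recorded, and the loop carries on with the next one.
  def runLoop(iterations: Int, steps: Seq[(String, Int => Unit)]): Vector[String] = {
    val log = ArrayBuffer[String]()
    var i = 0
    while (i < iterations) {
      try {
        steps.foreach { case (name, step) =>
          step(i)
          log += s"$i:$name"
        }
      } catch {
        case NonFatal(e) => log += s"$i:error(${e.getMessage})"
      }
      i += 1
    }
    log.toVector
  }

  // Two iterations; the "poll" placeholder fails in the first one only
  def demo(): Vector[String] = runLoop(2, Seq(
    "configureNewConnections" -> ((_: Int) => ()),
    "poll" -> ((i: Int) => if (i == 0) throw new IllegalStateException("boom")),
    "processCompletedReceives" -> ((_: Int) => ())
  ))
}
```

In the demo the first iteration stops at the failing step, and the second iteration runs all three steps again from the top.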

Let's look at how each of these steps is implemented.

configureNewConnections

configureNewConnections handles the new connections assigned to this Processor. Its most important step is calling the selector's register method to register the SocketChannel.

private def configureNewConnections(): Unit = {
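    var connectionsProcessed = 0 // number of connections configured so far in this pass
    // While fewer than connectionQueueSize connections have been processed and new connections are still waiting
    while (connectionsProcessed < connectionQueueSize && !newConnections.isEmpty) {
      // Take the next pending SocketChannel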
      val channel = newConnections.poll()
      try {
        debug(s"Processor $id listening to new connection from ${channel.socket.getRemoteSocketAddress}")
        // Register the channel with this Processor's Selector.
        // Underneath, this calls Java NIO's SocketChannel.register(selector, SelectionKey.OP_READ)
        selector.register(connectionId(channel.socket), channel)
        // Update the counter
        connectionsProcessed += 1
      } catch {
        // We explicitly catch all exceptions and close the socket to avoid a socket leak.
        // Every exception is caught here and the corresponding channel is closed
        case e: Throwable =>
          val remoteAddress = channel.socket.getRemoteSocketAddress
          // need to close the channel here to avoid a socket leak.
          close(listenerName, channel)
          processException(s"Processor $id closed connection from $remoteAddress", e)
      }
    }
  }
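
The pattern in this method, a bounded drain where a failing item is cleaned up without stopping the pass, can be sketched without any networking. DrainSketch, drain, register, and onFailure are hypothetical names; strings stand in for channels.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object DrainSketch {
  // Drains queued items until `maxPerPass` have been registered successfully
  // or the queue is empty. A failing item is reported and the pass continues.
  def drain[A](queue: ArrayBlockingQueue[A], maxPerPass: Int)
              (register: A => Unit)(onFailure: (A, Throwable) => Unit): Int = {
    var processed = 0
    while (processed < maxPerPass && !queue.isEmpty) {
      val item = queue.poll()
      try {
        register(item)
        processed += 1 // counted only on success, as in the method above
      } catch {
        case NonFatal(e) => onFailure(item, e)
      }
    }
    processed
  }

  def demo(): (Int, List[String], List[String], Int) = {
    val queue = new ArrayBlockingQueue[String](4)
    List("a", "bad", "b", "c").foreach(x => queue.add(x))
    val registered = ArrayBuffer[String]()
    val closed = ArrayBuffer[String]()
    val n = drain(queue, maxPerPass = 2) { item =>
      if (item == "bad") throw new IllegalStateException("cannot register")
      registered += item
    } { (item, _) => closed += item }
    (n, registered.toList, closed.toList, queue.size)
  }
}
```

Capping the work per pass keeps one burst of new connections from starving the other steps of the loop; whatever is left in the queue is picked up on the next iteration.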

processNewResponses

processNewResponses sends Responses back to the Request senders and records each one in inflightResponses, the temporary Response collection.

private def processNewResponses(): Unit = {
    var currentResponse: RequestChannel.Response = null
    // Drain this Processor's response queue
    while ({currentResponse = dequeueResponse(); currentResponse != null}) { // while the Response queue still has pending Responses
      // Get the connection (channel) id
      val channelId = currentResponse.request.context.connectionId
      try {
        currentResponse match {
          case response: NoOpResponse => // no Response needs to be sent
            // There is no response to send to the client, we need to read more pipelined requests
            // that are sitting in the server's socket buffer
            updateRequestMetrics(response)
            trace(s"Socket server received empty response to send, registering for read: $response")
            // Try unmuting the channel. If there was no quota violation and the channel has not been throttled,
            // it will be unmuted immediately. If the channel has been throttled, it will be unmuted only if the
            // throttling delay has already passed by now.
            handleChannelMuteEvent(channelId, ChannelMuteEvent.RESPONSE_SENT)
            tryUnmuteChannel(channelId)

          case response: SendResponse => // queue the Response for sending and record it in inflightResponses
            sendResponse(response, response.responseSend)
          // Close the corresponding connection
          case response: CloseConnectionResponse =>
            updateRequestMetrics(response)
            trace("Closing socket connection actively according to the response code.")
            close(channelId)
          case _: StartThrottlingResponse =>
            handleChannelMuteEvent(channelId, ChannelMuteEvent.THROTTLE_STARTED)
          case _: EndThrottlingResponse =>
            // Try unmuting the channel. The channel will be unmuted only if the response has already been sent out to
            // the client.
            handleChannelMuteEvent(channelId, ChannelMuteEvent.THROTTLE_ENDED)
            tryUnmuteChannel(channelId)
          case _ =>
            throw new IllegalArgumentException(s"Unknown response type: ${currentResponse.getClass}")
        }
      } catch {
        case e: Throwable =>
          processChannelException(channelId, s"Exception while processing response for $channelId", e)
      }
    }
  }

The key part here is the sendResponse method called in the SendResponse branch.

protected[network] def sendResponse(response: RequestChannel.Response, responseSend: Send): Unit = {
    val connectionId = response.request.context.connectionId
    trace(s"Socket server received response to send to $connectionId, registering for write and sending data: $response")
    // `channel` can be None if the connection was closed remotely or if selector closed it for being idle for too long
    if (channel(connectionId).isEmpty) {
      warn(s"Attempting to send response via channel for which there is no open connection, connection id $connectionId")
      response.request.updateRequestMetrics(0L, response)
    }
    // Invoke send for closingChannel as well so that the send is failed and the channel closed properly and
    // removed from the Selector after discarding any pending staged receives.
    // `openOrClosingChannel` can be None if the selector closed the connection because it was idle for too long
    if (openOrClosingChannel(connectionId).isDefined) { // the connection is open or still closing
      selector.send(responseSend) // queue the Response for sending
      inflightResponses += (connectionId -> response) // record the Response in inflightResponses
    }
  }
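
The dispatch-by-type logic of processNewResponses and the open-connection check in sendResponse can be sketched with a small sealed hierarchy. The types below are hypothetical simplifications (throttling responses are left out), and dispatch only returns a description of the action instead of performing it.

```scala
object ResponseDispatchSketch {
  // Hypothetical, reduced version of the response type hierarchy
  sealed trait Response { def connectionId: String }
  final case class NoOp(connectionId: String) extends Response
  final case class Send(connectionId: String, payload: String) extends Response
  final case class CloseConnection(connectionId: String) extends Response

  // Describes the action taken for each response type
  def dispatch(r: Response, openConnections: Set[String]): String = r match {
    case NoOp(id) =>
      s"unmute $id"
    case Send(id, payload) if openConnections.contains(id) =>
      s"send '$payload' to $id and track as in-flight"
    case Send(id, _) =>
      s"drop response for closed connection $id"
    case CloseConnection(id) =>
      s"close $id"
  }
}
```

Because the trait is sealed, the compiler can warn when a response type is left unhandled, which a runtime fallback such as the `case _ =>` branch above can only report once it happens.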

poll
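Strictly speaking, none of the send logic described so far performs the actual send. The method that really performs I/O is poll.
The core of poll is a single line: selector.poll(pollTimeout). The timeout is 300 ms when no new connections are waiting and 0 otherwise, so pending connections are never delayed by a blocking poll.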

private def poll(): Unit = {
    val pollTimeout = if (newConnections.isEmpty) 300 else 0
    try selector.poll(pollTimeout)
    catch {
      case e @ (_: IllegalStateException | _: IOException) =>
        // The exception is not re-thrown and any completed sends/receives/connections/disconnections
        // from this poll will be processed.
        error(s"Processor $id poll failed", e)
    }
  }
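
For readers unfamiliar with Java NIO, the following sketch shows the select-then-read pattern that a poll of this kind is built on. It uses only the JDK: a Pipe stands in for a client socket, and NioPollSketch is a hypothetical name. It is an illustration of the NIO mechanism under those assumptions, not Kafka's Selector implementation.

```scala
import java.nio.ByteBuffer
import java.nio.channels.{Pipe, SelectionKey, Selector}
import java.nio.charset.StandardCharsets

object NioPollSketch {
  def demo(): String = {
    val selector = Selector.open()
    val pipe = Pipe.open()
    try {
      // A channel must be non-blocking before it can be registered
      pipe.source().configureBlocking(false)
      pipe.source().register(selector, SelectionKey.OP_READ)

      // Make the source readable by writing to the other end of the pipe
      pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)))

      // Wait up to 300 ms; select only reports which keys are ready
      val ready = selector.select(300L)
      val buf = ByteBuffer.allocate(16)
      if (ready > 0 && selector.selectedKeys().iterator().next().isReadable) {
        // The read itself is a separate step performed on the ready channel
        pipe.source().read(buf)
      }
      buf.flip()
      StandardCharsets.UTF_8.decode(buf).toString
    } finally {
      pipe.sink().close()
      pipe.source().close()
      selector.close()
    }
  }
}
```

The split matters for reading the Processor code: readiness detection and the actual read or write are distinct steps, and both happen inside the poll call.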

processCompletedReceives
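processCompletedReceives is the logic that receives and processes Requests: the Processor takes the network receives that have been read from the underlying socket channels, converts each one into a Request instance, and puts it into the Request queue.
The core is a single line: requestChannel.sendRequest(req), which puts the Request into the Request queue.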

private def processCompletedReceives(): Unit = {
    // Iterate over all completed receives
    selector.completedReceives.forEach { receive =>
      try {
        // Make sure the corresponding channel still exists (open or closing)
        openOrClosingChannel(receive.source) match {
          case Some(channel) =>
            val header = RequestHeader.parse(receive.payload)
            if (header.apiKey == ApiKeys.SASL_HANDSHAKE && channel.maybeBeginServerReauthentication(receive,
              () => time.nanoseconds()))
              trace(s"Begin re-authentication: $channel")
            else {
              val nowNanos = time.nanoseconds()
              // If the authenticated session has expired, close the connection
              if (channel.serverAuthenticationSessionExpired(nowNanos)) {
                // be sure to decrease connection count and drop any in-flight responses
                debug(s"Disconnecting expired channel: $channel : $header")
                close(channel.id)
                expiredConnectionsKilledCount.record(null, 1, 0)
              } else {
                val connectionId = receive.source
                val context = new RequestContext(header, connectionId, channel.socketAddress,
                  channel.principal, listenerName, securityProtocol,
                  channel.channelMetadataRegistry.clientInformation)
                // Build a Request object from the Receive obtained from the channel
                val req = new RequestChannel.Request(processor = id, context = context,
                  startTimeNanos = nowNanos, memoryPool, receive.payload, requestChannel.metrics)
                // KIP-511: ApiVersionsRequest is intercepted here to catch the client software name
                // and version. It is done here to avoid wiring things up to the api layer.
                if (header.apiKey == ApiKeys.API_VERSIONS) {
                  val apiVersionsRequest = req.body[ApiVersionsRequest]
                  if (apiVersionsRequest.isValid) {
                    channel.channelMetadataRegistry.registerClientInformation(new ClientInformation(
                      apiVersionsRequest.data.clientSoftwareName,
                      apiVersionsRequest.data.clientSoftwareVersion))
                  }
                }
                // Core step: add the Request to the Request queue
                requestChannel.sendRequest(req)
                // Mute the channel (stop watching OP_READ) so that no new request is read from it while this one is being processed
                selector.mute(connectionId)
                handleChannelMuteEvent(connectionId, ChannelMuteEvent.REQUEST_RECEIVED)
              }
            }
          case None =>
            // This should never happen since completed receives are processed immediately after `poll()`
            throw new IllegalStateException(s"Channel ${receive.source} removed from selector before processing completed receive")
        }
      } catch {
        // note that even though we got an exception, we can assume that receive.source is valid.
        // Issues with constructing a valid receive object were handled earlier
        case e: Throwable =>
          processChannelException(receive.source, s"Exception while processing request from ${receive.source}", e)
      }
    }
    selector.clearCompletedReceives()
  }
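
The effect of muting a channel after each receive is that a connection has at most one request being processed at a time, so its requests are handled in order. A minimal model of that behaviour, with hypothetical names (MuteSketch, Connection, receive, responseSent) and strings standing in for requests:

```scala
import scala.collection.mutable

object MuteSketch {
  final class Connection(val id: String) {
    // Requests the client has already sent but the server has not read yet
    val socketBuffer = mutable.Queue[String]()
    var muted = false
  }

  // One receive pass: read at most one request per unmuted connection
  def receive(conns: Seq[Connection], requestQueue: mutable.Queue[(String, String)]): Unit =
    conns.foreach { c =>
      if (!c.muted && c.socketBuffer.nonEmpty) {
        requestQueue.enqueue(c.id -> c.socketBuffer.dequeue())
        c.muted = true // stop reading from this connection until its response is sent
      }
    }

  // The response went out, so reading from the connection may resume
  def responseSent(c: Connection): Unit = { c.muted = false }

  def demo(): List[(String, String)] = {
    val c = new Connection("c1")
    c.socketBuffer ++= List("req-1", "req-2")
    val q = mutable.Queue[(String, String)]()
    receive(Seq(c), q) // takes req-1 and mutes c1
    receive(Seq(c), q) // no-op: c1 is muted
    responseSent(c)
    receive(Seq(c), q) // takes req-2
    q.toList
  }
}
```

The second request stays in the socket buffer until the first response has been sent, which is the pipelining case the NoOpResponse comment earlier refers to.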

processCompletedSends
processCompletedSends handles the Response callback logic. It runs the callback by invoking the onComplete function stored on each Response object.

private def processCompletedSends(): Unit = {
    // Iterate over the Responses that have been fully written to the underlying SocketChannel
    selector.completedSends.forEach { send =>
      try {
        // Take the matching Response out of inflightResponses.
        // It has been sent successfully and needs no acknowledgement from the client, so it is removed
        val response = inflightResponses.remove(send.destination).getOrElse {
          throw new IllegalStateException(s"Send for ${send.destination} completed, but not in `inflightResponses`")
        }
        updateRequestMetrics(response) // update request metrics
        // Run the callback logic
        // Invoke send completion callback
        response.onComplete.foreach(onComplete => onComplete(send))

        // Try unmuting the channel. If there was no quota violation and the channel has not been throttled,
        // it will be unmuted immediately. If the channel has been throttled, it will unmuted only if the throttling
        // delay has already passed by now.
        // Re-enable OP_READ so that request data can be read from the channel again
        handleChannelMuteEvent(send.destination, ChannelMuteEvent.RESPONSE_SENT)
        tryUnmuteChannel(send.destination)
      } catch {
        case e: Throwable => processChannelException(send.destination,
          s"Exception while processing completed send to ${send.destination}", e)
      }
    }
    selector.clearCompletedSends()
  }
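
The life cycle of an in-flight response (recorded when queued for sending, removed and called back once the send completes) can be sketched as follows. InflightSketch, Tracker, and the simplified Response type are hypothetical; the real callback receives a Send object, whereas the sketch passes the connection id.

```scala
import scala.collection.mutable

object InflightSketch {
  // Hypothetical response carrying an optional completion callback
  final case class Response(connectionId: String, onComplete: Option[String => Unit])

  final class Tracker {
    private val inflight = mutable.Map[String, Response]()

    // Called when a response is handed to the selector for sending
    def send(r: Response): Unit = { inflight += (r.connectionId -> r) }

    // Called when the selector reports that the send has completed
    def completeSend(connectionId: String): Unit = {
      val r = inflight.remove(connectionId).getOrElse {
        throw new IllegalStateException(s"Send for $connectionId completed, but nothing was in flight")
      }
      r.onComplete.foreach(callback => callback(connectionId))
    }

    def pending: Int = inflight.size
  }

  def demo(): (List[String], Int) = {
    val notified = mutable.ListBuffer[String]()
    val callback: String => Unit = id => notified += id
    val tracker = new Tracker
    tracker.send(Response("c1", Some(callback)))
    tracker.completeSend("c1")
    (notified.toList, tracker.pending)
  }
}
```

Keying the map by connection id works because a muted connection has at most one response in flight at a time.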

processDisconnected
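processDisconnected handles connections that have been closed. The key steps are to get the disconnected connections from the underlying Selector, remove them from inflightResponses, and update their quota data.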

private def processDisconnected(): Unit = {
    // Iterate over the connections that the underlying Selector reports as disconnected
    selector.disconnected.keySet.forEach { connectionId =>
      try {
        // Extract the remote host of the disconnected connection
        val remoteHost = ConnectionId.fromString(connectionId).getOrElse {
          throw new IllegalStateException(s"connectionId has unexpected format: $connectionId")
        }.remoteHost
        // Remove the connection from inflightResponses and update the request metrics
        inflightResponses.remove(connectionId).foreach(updateRequestMetrics)
        // the channel has been closed by the selector but the quotas still need to be updated
        // Update the quota data:
        // the channel has already been closed, so decrement the connection count for that IP
        connectionQuotas.dec(listenerName, InetAddress.getByName(remoteHost))
      } catch {
        case e: Throwable => processException(s"Exception while processing disconnection of $connectionId", e)
      }
    }
  }
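
The bookkeeping done here, parsing the remote host out of the connection id and decrementing a per-IP counter, can be sketched as below. ConnectionCounts and remoteHost are hypothetical names, not Kafka's ConnectionQuotas or ConnectionId, and the id layout "localHost:localPort-remoteHost:remotePort-index" is an assumption made for the example.

```scala
import java.net.InetAddress
import scala.collection.mutable

object QuotaSketch {
  // Simplified per-IP connection counter (hypothetical)
  final class ConnectionCounts {
    private val counts = mutable.Map[InetAddress, Int]()
    def inc(addr: InetAddress): Unit = { counts(addr) = counts.getOrElse(addr, 0) + 1 }
    def dec(addr: InetAddress): Unit = counts.get(addr) match {
      case Some(n) if n > 1 => counts(addr) = n - 1
      case Some(_)          => counts.remove(addr) // last connection from this IP
      case None             => throw new IllegalArgumentException(s"No connections tracked for $addr")
    }
    def count(addr: InetAddress): Int = counts.getOrElse(addr, 0)
  }

  // Assumes ids shaped like "localHost:localPort-remoteHost:remotePort-index"
  def remoteHost(connectionId: String): Option[String] =
    connectionId.split("-") match {
      case Array(_, remote, _) if remote.contains(":") =>
        Some(remote.substring(0, remote.lastIndexOf(':')))
      case _ => None
    }

  def demo(): (Option[String], Int, Int) = {
    val counts = new ConnectionCounts
    val addr = InetAddress.getByName("127.0.0.1") // literal IP, no DNS lookup
    counts.inc(addr); counts.inc(addr)
    val host = remoteHost("127.0.0.1:9092-127.0.0.1:51000-0")
    val before = counts.count(addr)
    host.foreach(h => counts.dec(InetAddress.getByName(h)))
    (host, before, counts.count(addr))
  }
}
```

The decrement has to happen in the Processor because the selector closes the channel on its own and knows nothing about quotas.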

closeExcessConnections
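closeExcessConnections closes connections in excess of the quota. The connection chosen to be closed first is the least recently used one among the TCP connections, where "unused" means that no Request has reached the Processor thread over that connection for some time.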

private def closeExcessConnections(): Unit = {
    // If the connection quota has been exceeded
    if (connectionQuotas.maxConnectionsExceeded(listenerName)) {
      // Pick the connection to close first:
      // the least recently used one among the TCP connections,
      // i.e. one over which no Request has reached the Processor thread recently.
      val channel = selector.lowestPriorityChannel()
      if (channel != null)
        close(channel.id) // close that connection
    }
  }
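
One common way to find the least recently used entry is a java.util.LinkedHashMap in access order. The sketch below shows only that selection rule; LruConnections, touch, and lowestPriority are hypothetical names, and whatever other criteria the real selector applies are not modelled.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical tracker of connection activity
class LruConnections {
  // accessOrder = true: iteration runs from least to most recently accessed
  private val lru = new JLinkedHashMap[String, java.lang.Long](16, 0.75f, true)

  // Record activity on a connection
  def touch(id: String, nowNanos: Long): Unit = { lru.put(id, nowNanos) }

  // The least recently used connection, if any
  def lowestPriority: Option[String] = {
    val it = lru.keySet().iterator()
    if (it.hasNext) Some(it.next()) else None
  }

  def remove(id: String): Unit = { lru.remove(id) }
}

object LruDemo {
  def run(): (Option[String], Option[String]) = {
    val conns = new LruConnections
    conns.touch("a", 1L)
    conns.touch("b", 2L)
    conns.touch("c", 3L)
    conns.touch("a", 4L) // "a" becomes the most recently used
    val first = conns.lowestPriority // "b" is now the oldest
    conns.remove("b")
    (first, conns.lowestPriority)
  }
}
```

With access order enabled, every put of an existing key moves it to the end, so the head of the map is always the connection that has been idle the longest.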
