BRPC Client Source Code Analysis


Table of Contents

  • BRPC Client Source Code Analysis
  • Client usage
  • google::protobuf::Service, RpcChannel, RpcController
  • brpc::Channel::Init
  • Initializing options: InitChannelOptions
  • On to LoadBalancerWithNaming::Init
  • RunNamingService: fetching servers from the naming service
  • ResetServers: saving the servers fetched from the naming service
  • How the stub's Echo makes the remote call
  • The core: CallMethod
  • IssueRPC
  • StartWrite
  • Summary



Hi everyone, I'm dandyhuang. brpc is one of the more capable RPC frameworks in the C++ world, and this post walks through the client-side source code. Before diving in, set up a working build environment; being able to debug the code makes it much easier to follow. We use echo_c++ from brpc's example directory, and keep the intermediate files echo.pb.cc and echo.pb.h generated from the protobuf definition, which also helps with understanding the code.

Client usage

int main(int argc, char* argv[]) {
    brpc::Channel channel;
    // Initialize the channel, NULL means using default options.
    brpc::ChannelOptions options;
    options.protocol = "baidu_std";
    options.timeout_ms = 100 /*milliseconds*/;
    if (channel.Init(FLAGS_server.c_str(), FLAGS_load_balancer.c_str(), &options) != 0) {
        LOG(ERROR) << "Fail to initialize channel";
        return -1;
    }
    example::EchoService_Stub stub(&channel);
    int log_id = 0;
    while (!brpc::IsAskedToQuit()) {
        example::EchoRequest request;
        example::EchoResponse response;
        brpc::Controller cntl;
        request.set_message("hello world");
        cntl.set_log_id(log_id ++);  // set by user
        stub.Echo(&cntl, &request, &response, NULL);
        if (!cntl.Failed()) {
            LOG(INFO) << "Received response from " << cntl.remote_side()
                << ": " << response.message() << " (attached="
                << cntl.response_attachment() << ")"
                << " latency=" << cntl.latency_us() << "us";
        } else {
            LOG(WARNING) << cntl.ErrorText();
        }
        usleep(FLAGS_interval_ms * 1000L);
    }
    return 0;
}

google::protobuf::Service, RpcChannel, RpcController

The client side mainly subclasses and wraps these three google::protobuf classes:

  • For Service, the relevant piece is the Echo method generated in EchoService_Stub; all of this comes from the protoc scaffolding
void EchoService_Stub::Echo(::google::protobuf::RpcController* controller,
                              const ::example::EchoRequest* request,
                              ::example::EchoResponse* response,
                              ::google::protobuf::Closure* done) {
  channel_->CallMethod(descriptor()->method(0),
                       controller, request, response, done);
}
  • For RpcChannel, the wrapping is mainly around CallMethod
virtual void CallMethod(const MethodDescriptor* method,
                          RpcController* controller, const Message* request,
                          Message* response, Closure* done) = 0;
  • RpcController is subclassed and wrapped by brpc::Controller
  • google::protobuf::Closure is used for asynchronous calls; we will analyze it later

We could also wrap these classes ourselves, or build the RPC interaction directly on top of protobuf; a minimal sketch of such a hand-rolled channel follows.
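
As a rough illustration (my own sketch, not brpc code), a hand-rolled channel only needs to override CallMethod; everything the generated stub does funnels into that single virtual method:

#include <google/protobuf/service.h>

// A minimal sketch, not brpc code: any class overriding CallMethod can be passed
// to the generated EchoService_Stub, because the stub only ever calls
// channel_->CallMethod(...).
class MyChannel : public google::protobuf::RpcChannel {
public:
    void CallMethod(const google::protobuf::MethodDescriptor* method,
                    google::protobuf::RpcController* controller,
                    const google::protobuf::Message* request,
                    google::protobuf::Message* response,
                    google::protobuf::Closure* done) override {
        // Serialize `request`, send it over your own transport, fill in
        // `response`, then run `done` (for async) or return (for sync).
    }
};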

brpc::Channel::Init

We take consul as the service discovery, round robin ("rr") as the load balancer, and baidu_std as the protocol for calling the downstream:

brpc::Channel channel;
brpc::ChannelOptions options;
options.protocol = "baidu_std";
options.timeout_ms = 100 /*milliseconds*/;
if (channel.Init("consul://hello_call", "rr", &options) != 0) {
  LOG(ERROR) << "Fail to initialize channel";
  return -1;
}

int Channel::Init(const char* ns_url,
                  const char* lb_name,
                  const ChannelOptions* options) {
    // Check whether a load balancer was specified
    if (lb_name == NULL || *lb_name == '\0') {
        // Treat ns_url as server_addr_and_port
        return Init(ns_url, options);
    }
    // Global initialization, done only once; worth a separate analysis later
    GlobalInitializeOrDie();
    // Initialize channel options: protocol, connection type, etc.
    if (InitChannelOptions(options) != 0) {
        return -1;
    }
    // HTTP: parse sni_name out of the URL if it is not set
    if (_options.protocol == brpc::PROTOCOL_HTTP &&
        ::strncmp(ns_url, "https://", 8) == 0) {
        if (_options.mutable_ssl_options()->sni_name.empty()) {
            ParseURL(ns_url,
                     NULL, &_options.mutable_ssl_options()->sni_name, NULL);
        }
    }
    
    LoadBalancerWithNaming* lb = new (std::nothrow) LoadBalancerWithNaming;
    if (NULL == lb) {
        LOG(FATAL) << "Fail to new LoadBalancerWithNaming";
        return -1;        
    }
    GetNamingServiceThreadOptions ns_opt;
    ns_opt.succeed_without_server = _options.succeed_without_server;
    ns_opt.log_succeed_without_server = _options.log_succeed_without_server;
    // Guarantees a unique signature per channel; used later in SocketMapKey
    ns_opt.channel_signature = ComputeChannelSignature(_options);
    // Set up SSL if enabled
    if (CreateSocketSSLContext(_options, &ns_opt.ssl_ctx) != 0) {
        return -1;
    }
    // Initialize the load balancer
    if (lb->Init(ns_url, lb_name, _options.ns_filter, &ns_opt) != 0) {
        LOG(ERROR) << "Fail to initialize LoadBalancerWithNaming";
        delete lb;
        return -1;
    }
    _lb.reset(lb);
    return 0;
}
  • Initialize the channel options: mainly the protocol, the serialize/pack callbacks and the connection type
  • Initialize the load balancer; if lb_name is empty, the single-server path shown below is taken instead
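
For comparison, the first branch of Channel::Init (empty lb_name) treats ns_url as a plain server address and skips the load balancer entirely; a minimal example (the address is illustrative):

brpc::Channel channel;
brpc::ChannelOptions options;
options.protocol = "baidu_std";
// Empty lb_name: ns_url is treated as "ip:port" of a single server; no naming
// service or load balancer is created, and CallMethod later takes the
// SingleServer() path.
if (channel.Init("127.0.0.1:8000", "", &options) != 0) {
    LOG(ERROR) << "Fail to initialize channel";
    return -1;
}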

Initializing options: InitChannelOptions

int Channel::InitChannelOptions(const ChannelOptions* options) {
    if (options) {  // Override default options if user provided one.
        _options = *options;
    }
    // Protocol info (e.g. for baidu_std) was registered in GlobalInitializeOrDie
    const Protocol* protocol = FindProtocol(_options.protocol);
    if (NULL == protocol || !protocol->support_client()) {
        LOG(ERROR) << "Channel does not support the protocol";
        return -1;
    }
    // Request serialization callback
    _serialize_request = protocol->serialize_request;
    // Request packing callback
    _pack_request = protocol->pack_request;
    // Callback that returns the RPC method name
    _get_method_name = protocol->get_method_name;

    // Check connection_type
    if (_options.connection_type == CONNECTION_TYPE_UNKNOWN) {
        // Save has_error which will be overriden in later assignments to
        // connection_type.
        const bool has_error = _options.connection_type.has_error();
        
        if (protocol->supported_connection_type & CONNECTION_TYPE_SINGLE) {
            _options.connection_type = CONNECTION_TYPE_SINGLE;
        } else if (protocol->supported_connection_type & CONNECTION_TYPE_POOLED) {
            _options.connection_type = CONNECTION_TYPE_POOLED;
        } else {
            _options.connection_type = CONNECTION_TYPE_SHORT;
        }
        if (has_error) {
            LOG(ERROR) << "Channel=" << this << " chose connection_type="
                       << _options.connection_type.name() << " for protocol="
                       << _options.protocol.name();
        }
    } else {
        if (!(_options.connection_type & protocol->supported_connection_type)) {
            LOG(ERROR) << protocol->name << " does not support connection_type="
                       << ConnectionTypeToString(_options.connection_type);
            return -1;
        }
    }
    // The client knows which protocol it speaks, so responses are later parsed
    // directly via this index instead of trying every protocol one by one the way
    // the server must (the server cannot know the protocol of an incoming packet).
    _preferred_index = get_client_side_messenger()->FindProtocolIndex(_options.protocol);
    if (_preferred_index < 0) {
        LOG(ERROR) << "Fail to get index for protocol="
                   << _options.protocol.name();
        return -1;
    }
    // Client side only
    if (_options.protocol == PROTOCOL_ESP) {
        if (_options.auth == NULL) {
            _options.auth = policy::global_esp_authenticator();
        }
    }

    // Normalize connection_group (not used in this example)
    std::string& cg = _options.connection_group;
    if (!cg.empty() && (::isspace(cg.front()) || ::isspace(cg.back()))) {
        butil::TrimWhitespace(cg, butil::TRIM_ALL, &cg);
    }
    return 0;
}
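
connection_type can also be set explicitly in ChannelOptions instead of letting InitChannelOptions pick the protocol's default; a short example (per the brpc docs, the string form is accepted):

brpc::ChannelOptions options;
options.protocol = "baidu_std";
// "single": all RPCs share one connection; "pooled": connections are taken from
// a pool so concurrent RPCs don't share one; "short": a new connection per RPC.
// If left unset, the branch above picks the first type the protocol supports.
options.connection_type = "pooled";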

On to LoadBalancerWithNaming::Init

int LoadBalancerWithNaming::Init(const char* ns_url, const char* lb_name,
                                 const NamingServiceFilter* filter,
                                 const GetNamingServiceThreadOptions* options) {
    // Resolve which load balancer to use; here "rr" maps to RoundRobinLoadBalancer
    if (SharedLoadBalancer::Init(lb_name) != 0) {
        return -1;
    }
    // Start the naming-service thread (which runs NamingServiceThread::Run)
    if (GetNamingServiceThread(&_nsthread_ptr, ns_url, options) != 0) {
        LOG(FATAL) << "Fail to get NamingServiceThread";
        return -1;
    }
    if (_nsthread_ptr->AddWatcher(this, filter) != 0) {
        LOG(FATAL) << "Fail to add watcher into _server_list";
        return -1;
    }
    return 0;
}

void NamingServiceThread::Run() {
    // Run the naming service; we take consul as the example
    int rc = _ns->RunNamingService(_service_name.c_str(), &_actions);
    if (rc != 0) {
        LOG(WARNING) << "Fail to run naming service: " << berror(rc);
        if (rc == ENODATA) {
            LOG(ERROR) << "RunNamingService should not return ENODATA, "
                "change it to ESTOP";
            rc = ESTOP;
        }
        _actions.EndWait(rc);
    }
}
  • GetNamingServiceThread starts a bthread that runs RunNamingService to fetch server information; consul is our example. Besides consul://, other ns_url formats are shown in the snippet below.
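
The ns_url prefix decides which NamingService implementation GetNamingServiceThread runs. Besides consul://, a few other formats documented by brpc:

// Fixed list of addresses, no external naming service:
channel.Init("list://10.0.0.1:8000,10.0.0.2:8000", "rr", &options);
// Addresses read (and periodically re-read) from a local file:
channel.Init("file://server_list", "rr", &options);
// Consul service discovery, as used throughout this post:
channel.Init("consul://hello_call", "rr", &options);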

RunNamingService: fetching servers from the naming service

int ConsulNamingService::RunNamingService(const char* service_name,
                                          NamingServiceActions* actions) {
    std::vector<ServerNode> servers;
    bool ever_reset = false;
    for (;;) {
        servers.clear();
        // Query the consul agent for the current server list
        const int rc = GetServers(service_name, &servers);
        if (bthread_stopped(bthread_self())) {
            RPC_VLOG << "Quit NamingServiceThread=" << bthread_self();
            return 0;
        }
        if (rc == 0) {
            ever_reset = true;
            // Success: hand the ServerNode list over via ResetServers
            actions->ResetServers(servers);
        } else {
            if (!ever_reset) {
                // ResetServers must be called at first time even if GetServers
                // failed, to wake up callers to `WaitForFirstBatchOfServers'
                ever_reset = true;
                servers.clear();
                actions->ResetServers(servers);
            }
            if (bthread_usleep(std::max(FLAGS_consul_retry_interval_ms, 1) * butil::Time::kMicrosecondsPerMillisecond) < 0) {
                if (errno == ESTOP) {
                    RPC_VLOG << "Quit NamingServiceThread=" << bthread_self();
                    return 0;
                }
                PLOG(FATAL) << "Fail to sleep";
                return -1;
            }
        }
    }
    CHECK(false);
    return -1;
}
  • GetServers calls the consul HTTP API: http://127.0.0.1:8500/v1/health/service/hello_world?stale&passing&index=4464071119&wait=60s. The index parameter makes this a blocking (incremental) query: consul holds the request until the service changes on the consul server, or until the wait time elapses. A rough sketch of such a query is shown after this list.
  • ServerNode holds the tag plus the ip and port of each instance
  • ResetServers saves the server list and pushes the changes into the load balancer
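
To make the blocking query concrete, here is a rough sketch, not the actual ConsulNamingService::GetServers code, of issuing one such request with a plain brpc HTTP channel (the JSON parsing of the returned node list is elided):

#include <string>
#include <brpc/channel.h>
#include <butil/logging.h>

// A rough sketch, not the real ConsulNamingService::GetServers.
static int GetServersSketch(const std::string& service_name, long index) {
    brpc::Channel consul_channel;
    brpc::ChannelOptions http_opt;
    http_opt.protocol = brpc::PROTOCOL_HTTP;
    if (consul_channel.Init("127.0.0.1:8500", &http_opt) != 0) {
        LOG(ERROR) << "Fail to init channel to the consul agent";
        return -1;
    }
    brpc::Controller cntl;
    cntl.http_request().uri() = "/v1/health/service/" + service_name +
        "?stale&passing&index=" + std::to_string(index) + "&wait=60s";
    // Synchronous HTTP GET; consul holds the request until a change or timeout.
    consul_channel.CallMethod(NULL, &cntl, NULL, NULL, NULL);
    if (cntl.Failed()) {
        LOG(WARNING) << "Fail to query consul: " << cntl.ErrorText();
        return -1;
    }
    // cntl.response_attachment() now holds the JSON body (ip/port/tags per node);
    // the "X-Consul-Index" response header is fed back as `index` next time.
    return 0;
}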

ResetServers: saving the servers fetched from the naming service

void NamingServiceThread::Actions::ResetServers(
        const std::vector<ServerNode>& servers) {
    _servers.assign(servers.begin(), servers.end());
    std::sort(_servers.begin(), _servers.end());
    const size_t dedup_size = std::unique(_servers.begin(), _servers.end())
        - _servers.begin();
    if (dedup_size != _servers.size()) {
        LOG(WARNING) << "Removed " << _servers.size() - dedup_size
                     << " duplicated servers";
        _servers.resize(dedup_size);
    }
    _added.resize(_servers.size());
    // Set difference: servers added relative to the last list
    std::vector<ServerNode>::iterator _added_end =
        std::set_difference(_servers.begin(), _servers.end(),
                            _last_servers.begin(), _last_servers.end(),
                            _added.begin());
    _added.resize(_added_end - _added.begin());
    // Set difference: servers removed from the last list
    _removed.resize(_last_servers.size());
    std::vector<ServerNode>::iterator _removed_end =
        std::set_difference(_last_servers.begin(), _last_servers.end(),
                            _servers.begin(), _servers.end(),
                            _removed.begin());
    _removed.resize(_removed_end - _removed.begin());

    _added_sockets.clear();
    for (size_t i = 0; i < _added.size(); ++i) {
        ServerNodeWithId tagged_id;
        tagged_id.node = _added[i];
        // TODO: For each unique SocketMapKey (i.e. SSL settings), insert a new
        //       Socket. SocketMapKey may be passed through AddWatcher. Make sure
        //       to pick those Sockets with the right settings during OnAddedServers
        const SocketMapKey key(_added[i], _owner->_options.channel_signature);
        // Create and register the socket; channel_signature keeps the key unique
        CHECK_EQ(0, SocketMapInsert(key, &tagged_id.id, _owner->_options.ssl_ctx,
                                    _owner->_options.use_rdma));
        _added_sockets.push_back(tagged_id);
    }

    _removed_sockets.clear();
    for (size_t i = 0; i < _removed.size(); ++i) {
        ServerNodeWithId tagged_id;
        tagged_id.node = _removed[i];
        const SocketMapKey key(_removed[i], _owner->_options.channel_signature);
        CHECK_EQ(0, SocketMapFind(key, &tagged_id.id));
        _removed_sockets.push_back(tagged_id);
    }

    // Refresh sockets
    if (_removed_sockets.empty()) {
        // Nothing removed: start from last time's sockets (added ones are merged below)
        _sockets = _owner->_last_sockets;
    } else {
        std::sort(_removed_sockets.begin(), _removed_sockets.end());
        _sockets.resize(_owner->_last_sockets.size());
        // Set difference: last time's sockets minus the removed ones
        std::vector<ServerNodeWithId>::iterator _sockets_end =
            std::set_difference(
                _owner->_last_sockets.begin(), _owner->_last_sockets.end(),
                _removed_sockets.begin(), _removed_sockets.end(),
                _sockets.begin());
        _sockets.resize(_sockets_end - _sockets.begin());
    }
    // Merge in the newly added sockets
    if (!_added_sockets.empty()) {
        const size_t before_added = _sockets.size();
        std::sort(_added_sockets.begin(), _added_sockets.end());
        _sockets.insert(_sockets.end(),
                       _added_sockets.begin(), _added_sockets.end());
        std::inplace_merge(_sockets.begin(), _sockets.begin() + before_added,
                           _sockets.end());
    }
    std::vector<ServerId> removed_ids;
    ServerNodeWithId2ServerId(_removed_sockets, &removed_ids, NULL);

    {
        BAIDU_SCOPED_LOCK(_owner->_mutex);
        _last_servers.swap(_servers);
        _owner->_last_sockets.swap(_sockets);
        for (std::map<NamingServiceWatcher*,
                      const NamingServiceFilter*>::iterator
                 it = _owner->_watchers.begin();
             it != _owner->_watchers.end(); ++it) {
            if (!_removed_sockets.empty()) {
                // Tell the load balancer to remove these nodes
                it->first->OnRemovedServers(removed_ids);
            }

            std::vector<ServerId> added_ids;
            ServerNodeWithId2ServerId(_added_sockets, &added_ids, it->second);
            if (!_added_sockets.empty()) {
                // Tell the load balancer to add these nodes
                it->first->OnAddedServers(added_ids);
            }
        }
    }

    for (size_t i = 0; i < _removed.size(); ++i) {
        // TODO: Remove all Sockets that have the same address in SocketMapKey.peer
        //       We may need another data structure to avoid linear cost
        const SocketMapKey key(_removed[i], _owner->_options.channel_signature);
        // Remove from the global map: the socket and its health check are torn down
        SocketMapRemove(key);
    }

    if (!_removed.empty() || !_added.empty()) {
        std::ostringstream info;
        info << butil::class_name_str(*_owner->_ns) << "(\""
             << _owner->_service_name << "\"):";
        if (!_added.empty()) {
            info << " added "<< _added.size();
        }
        if (!_removed.empty()) {
            info << " removed " << _removed.size();
        }
        LOG(INFO) << info.str();
    }

    EndWait(servers.empty() ? ENODATA : 0);
}
  • With all the preparation done, we can move on to the remote call itself
  • SocketMapInsert creates the socket connections; the added/removed bookkeeping above is distilled into the standalone snippet below
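
To make the added/removed bookkeeping concrete, here is a small standalone example (plain STL, not brpc code) of the same std::set_difference pattern on two sorted server lists:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Both inputs must be sorted, just like _servers/_last_servers above.
    std::vector<std::string> last = {"10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"};
    std::vector<std::string> now  = {"10.0.0.2:80", "10.0.0.3:80", "10.0.0.4:80"};

    std::vector<std::string> added(now.size());
    added.resize(std::set_difference(now.begin(), now.end(),
                                     last.begin(), last.end(),
                                     added.begin()) - added.begin());
    std::vector<std::string> removed(last.size());
    removed.resize(std::set_difference(last.begin(), last.end(),
                                       now.begin(), now.end(),
                                       removed.begin()) - removed.begin());

    std::cout << "added: "   << added[0]   << "\n";  // 10.0.0.4:80
    std::cout << "removed: " << removed[0] << "\n";  // 10.0.0.1:80
    return 0;
}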

How the stub's Echo makes the remote call

EchoService_Stub::EchoService_Stub(::google::protobuf::RpcChannel* channel)
  : channel_(channel), owns_channel_(false) {}

void EchoService_Stub::Echo(::google::protobuf::RpcController* controller,
                              const ::example::EchoRequest* request,
                              ::example::EchoResponse* response,
                              ::google::protobuf::Closure* done) {
  channel_->CallMethod(descriptor()->method(0),
                       controller, request, response, done);
}
  • The channel is stored when EchoService_Stub is constructed
  • echo.pb.cc simply forwards to CallMethod, so all the real work happens in CallMethod; the asynchronous flavor of this call is sketched below
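
The client code at the top is synchronous: stub.Echo blocks until the RPC finishes. For reference, the asynchronous flavor passes a Closure as done, following the pattern of brpc's asynchronous echo example (a sketch; error handling trimmed):

#include <memory>
#include <brpc/channel.h>
#include <butil/logging.h>
#include "echo.pb.h"

// Runs in a brpc worker thread when the RPC finishes; cntl/response were
// allocated on the heap so they outlive the calling scope.
static void HandleEchoResponse(brpc::Controller* cntl,
                               example::EchoResponse* response) {
    std::unique_ptr<brpc::Controller> cntl_guard(cntl);
    std::unique_ptr<example::EchoResponse> response_guard(response);
    if (!cntl->Failed()) {
        LOG(INFO) << "Received " << response->message()
                  << " from " << cntl->remote_side();
    } else {
        LOG(WARNING) << cntl->ErrorText();
    }
}

static void AsyncEcho(example::EchoService_Stub& stub) {
    example::EchoRequest request;
    request.set_message("hello world");
    brpc::Controller* cntl = new brpc::Controller();
    example::EchoResponse* response = new example::EchoResponse();
    // With done != NULL, CallMethod returns right after IssueRPC instead of
    // Join()-ing on the correlation id; the callback fires when the RPC ends.
    stub.Echo(cntl, &request, response,
              brpc::NewCallback(HandleEchoResponse, cntl, response));
}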

The core: CallMethod

void Channel::CallMethod(const google::protobuf::MethodDescriptor* method,
                         google::protobuf::RpcController* controller_base,
                         const google::protobuf::Message* request,
                         google::protobuf::Message* response,
                         google::protobuf::Closure* done) {
    const int64_t start_send_real_us = butil::gettimeofday_us();
    Controller* cntl = static_cast<Controller*>(controller_base);
    cntl->OnRPCBegin(start_send_real_us);
    // Override max_retry first to reset the range of correlation_id
    if (cntl->max_retry() == UNSET_MAGIC_NUM) {
        // Defaults to 3
        cntl->set_max_retry(_options.max_retry);
    }
    if (cntl->max_retry() < 0) {
        // this is important because #max_retry decides #versions allocated
        // in correlation_id. negative max_retry causes undefined behavior.
        cntl->set_max_retry(0);
    }
    // options.protocol = "baidu_std";
    cntl->_request_protocol = _options.protocol;
    // Empty by default
    if (_options.protocol.has_param()) {
        CHECK(cntl->protocol_param().empty());
        cntl->protocol_param() = _options.protocol.param();
    }
    // HTTP protocol handling
    if (_options.protocol == brpc::PROTOCOL_HTTP && (_scheme == "https" || _scheme == "http")) {
        URI& uri = cntl->http_request().uri();
        if (uri.host().empty() && !_service_name.empty()) {
            uri.SetHostAndPort(_service_name);
        }
    }
    // Makes parsing the response protocol faster
    cntl->_preferred_index = _preferred_index;
    cntl->_retry_policy = _options.retry_policy;
    // Circuit breaker / isolation; to be analyzed separately later
    if (_options.enable_circuit_breaker) {
        cntl->add_flag(Controller::FLAGS_ENABLED_CIRCUIT_BREAKER);
    }
    // Error handling (retries, backup requests) is all driven by this call id
    const CallId correlation_id = cntl->call_id();
    const int rc = bthread_id_lock_and_reset_range(
                    correlation_id, NULL, 2 + cntl->max_retry());
    // The controller is being used by another RPC
    if (rc != 0) {
        CHECK_EQ(EINVAL, rc);
        if (!cntl->FailedInline()) {
            cntl->SetFailed(EINVAL, "Fail to lock call_id=%" PRId64,
                            correlation_id.value);
        }
        LOG_IF(ERROR, cntl->is_used_by_rpc())
            << "Controller=" << cntl << " was used by another RPC before. "
            "Did you forget to Reset() it before reuse?";
        // Have to run done in-place. If the done runs in another thread,
        // Join() on this RPC is no-op and probably ends earlier than running
        // the callback and releases resources used in the callback.
        // Since this branch is only entered by wrongly-used RPC, the
        // potentially introduced deadlock(caused by locking RPC and done with
        // the same non-recursive lock) is acceptable and removable by fixing
        // user's code.
        if (done) {
            done->Run();
        }
        return;
    }
    cntl->set_used_by_rpc();

    if (cntl->_sender == NULL && IsTraceable(Span::tls_parent())) {
        const int64_t start_send_us = butil::cpuwide_time_us();
        const std::string* method_name = NULL;
        if (_get_method_name) {
            method_name = &_get_method_name(method, cntl);
        } else if (method) {
            method_name = &method->full_name();
        } else {
            const static std::string NULL_METHOD_STR = "null-method";
            method_name = &NULL_METHOD_STR;
        }
        Span* span = Span::CreateClientSpan(
            *method_name, start_send_real_us - start_send_us);
        span->set_log_id(cntl->log_id());
        span->set_base_cid(correlation_id);
        span->set_protocol(_options.protocol);
        span->set_start_send_us(start_send_us);
        cntl->_span = span;
    }
    // Timeout setup
    if (cntl->timeout_ms() == UNSET_MAGIC_NUM) {
        cntl->set_timeout_ms(_options.timeout_ms);
    }
    // Since connection is shared extensively amongst channels and RPC,
    // overriding connect_timeout_ms does not make sense, just use the
    // one in ChannelOptions
    cntl->_connect_timeout_ms = _options.connect_timeout_ms;
    if (cntl->backup_request_ms() == UNSET_MAGIC_NUM) {
        cntl->set_backup_request_ms(_options.backup_request_ms);
    }
    //  "single", "pooled", "short"
    if (cntl->connection_type() == CONNECTION_TYPE_UNKNOWN) {
        cntl->set_connection_type(_options.connection_type);
    }
    cntl->_response = response;
    // Used for the async callback; NULL for synchronous calls
    cntl->_done = done;
    cntl->_pack_request = _pack_request;
    cntl->_method = method;
    cntl->_auth = _options.auth;

    if (SingleServer()) {
        cntl->_single_server_id = _server_id;
        cntl->_remote_side = _server_address;
    }

    // Share the lb with controller.
    cntl->_lb = _lb;

    // Ensure that serialize_request is done before pack_request in all
    // possible executions, including:
    //   HandleSendFailed => OnVersionedRPCReturned => IssueRPC(pack_request)
    // Serialize the request
    _serialize_request(&cntl->_request_buf, cntl, request);
    if (cntl->FailedInline()) {
        // Handle failures caused by serialize_request, and these error_codes
        // should be excluded from the retry_policy.
        return cntl->HandleSendFailed();
    }
    if (FLAGS_usercode_in_pthread &&
        done != NULL &&
        TooManyUserCode()) {
        cntl->SetFailed(ELIMIT, "Too many user code to run when "
                        "-usercode_in_pthread is on");
        return cntl->HandleSendFailed();
    }

    if (cntl->_request_stream != INVALID_STREAM_ID) {
        // Currently we cannot handle retry and backup request correctly
        cntl->set_max_retry(0);
        cntl->set_backup_request_ms(-1);
    }
    // Backup (hedged) request
    if (cntl->backup_request_ms() >= 0 &&
        (cntl->backup_request_ms() < cntl->timeout_ms() ||
         cntl->timeout_ms() < 0)) {
        // Setup timer for backup request. When it occurs, we'll setup a
        // timer of timeout_ms before sending backup request.

        // _deadline_us is for truncating _connect_timeout_ms and resetting
        // timer when EBACKUPREQUEST occurs.
        if (cntl->timeout_ms() < 0) {
            cntl->_deadline_us = -1;
        } else {
            cntl->_deadline_us = cntl->timeout_ms() * 1000L + start_send_real_us;
        }
        // When the backup-request timer fires, HandleBackupRequest goes through
        // the call_id and ends up in HandleSocketFailed
        const int rc = bthread_timer_add(
            &cntl->_timeout_id,
            butil::microseconds_to_timespec(
                cntl->backup_request_ms() * 1000L + start_send_real_us),
            HandleBackupRequest, (void*)correlation_id.value);
        if (BAIDU_UNLIKELY(rc != 0)) {
            cntl->SetFailed(rc, "Fail to add timer for backup request");
            return cntl->HandleSendFailed();
        }
    } else if (cntl->timeout_ms() >= 0) {
        // Setup timer for the RPC timeout
        cntl->_deadline_us = cntl->timeout_ms() * 1000L + start_send_real_us;
        const int rc = bthread_timer_add(
            &cntl->_timeout_id,
            butil::microseconds_to_timespec(cntl->_deadline_us),
            HandleTimeout, (void*)correlation_id.value);
        if (BAIDU_UNLIKELY(rc != 0)) {
            cntl->SetFailed(rc, "Fail to add timer for timeout");
            return cntl->HandleSendFailed();
        }
    } else {
        cntl->_deadline_us = -1;
    }
    // Send the request
    cntl->IssueRPC(start_send_real_us);
    if (done == NULL) {
        // MUST wait for response when sending synchronous RPC. It will
        // be woken up by callback when RPC finishes (succeeds or still
        // fails after retry)
        Join(correlation_id);
        if (cntl->_span) {
            cntl->SubmitSpan();
        }
        cntl->OnRPCEnd(butil::gettimeofday_us());
    }
}
  • A call_id is created; all later retries and error handling are keyed off it
  • _serialize_request serializes the request data, then IssueRPC is called to send it; the ChannelOptions that feed the timers above are summarized below
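
The timers set up above are driven by ChannelOptions; a typical configuration looks like this (values are illustrative):

brpc::ChannelOptions options;
options.protocol = "baidu_std";
options.timeout_ms = 100;        // per-RPC deadline; HandleTimeout fires past it
options.max_retry = 3;           // decides the correlation_id range (2 + max_retry)
options.backup_request_ms = 20;  // if smaller than timeout_ms, HandleBackupRequest
                                 // sends a hedged request after 20ms of silence
// Individual calls can still override these through the Controller,
// e.g. cntl.set_timeout_ms(50) or cntl.set_max_retry(0).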

IssueRPC

void Controller::IssueRPC(int64_t start_realtime_us) {
    _current_call.begin_time_us = start_realtime_us;
    // If has retry/backup request,we will recalculate the timeout,
    if (_real_timeout_ms > 0) {
        _real_timeout_ms -= (start_realtime_us - _begin_time_us) / 1000;
    }

    // Clear last error, Don't clear _error_text because we append to it.
    _error_code = 0;

    // Make versioned correlation_id.
    // call_id         : unversioned, mainly for ECANCELED and ERPCTIMEDOUT
    // call_id + 1     : first try.
    // call_id + 2     : retry 1
    // ...
    // call_id + N + 1 : retry N
    // All ids except call_id are versioned. Say if we've sent retry 1 and
    // a failed response of first try comes back, it will be ignored.
    // current_id() returns the CallId of this attempt; as noted above, the first try uses version 1 + 0 + 1 = 2
    const CallId cid = current_id();

    // Intercept IssueRPC when _sender is set. Currently _sender is only set
    // by SelectiveChannel.
    // NULL by default
    if (_sender) {
        if (_sender->IssueRPC(start_realtime_us) != 0) {
            return HandleSendFailed();
        }
        CHECK_EQ(0, bthread_id_unlock(cid));
        return;
    }

    // Pick a target server for sending RPC
    _current_call.need_feedback = false;
    _current_call.enable_circuit_breaker = has_enabled_circuit_breaker();
    SocketUniquePtr tmp_sock;
    if (SingleServer()) {
        ...
    } else {
        LoadBalancer::SelectIn sel_in =
            { start_realtime_us, true,
              has_request_code(), _request_code, _accessed };
        LoadBalancer::SelectOut sel_out(&tmp_sock);
        // Select a server; the chosen socket is stored in tmp_sock
        const int rc = _lb->SelectServer(sel_in, &sel_out);
        if (rc != 0) {
            std::ostringstream os;
            DescribeOptions opt;
            opt.verbose = false;
            _lb->Describe(os, opt);
            SetFailed(rc, "Fail to select server from %s", os.str().c_str());
            return HandleSendFailed();
        }
        _current_call.need_feedback = sel_out.need_feedback;
        _current_call.peer_id = tmp_sock->id();
        // NOTE: _remote_side must be set here because _pack_request below
        // may need it (e.g. http may set "Host" to _remote_side)
        // Don't set _local_side here because tmp_sock may be not connected
        // here.
        _remote_side = tmp_sock->remote_side();
    }
    if (_stream_creator) {
  		...
    }
    // Handle connection type
    if (_connection_type == CONNECTION_TYPE_SINGLE ||
        _stream_creator != NULL) { // let user decides the sending_sock
        // in the callback(according to connection_type) directly
        _current_call.sending_sock.reset(tmp_sock.release());
        // TODO(gejun): Setting preferred index of single-connected socket
        // has two issues:
        //   1. race conditions. If a set perferred_index is overwritten by
        //      another thread, the response back has to check protocols one
        //      by one. This is a performance issue, correctness is unaffected.
        //   2. thrashing between different protocols. Also a performance issue.
        _current_call.sending_sock->set_preferred_index(_preferred_index);
    } else {
        int rc = 0;
        // Pooled connection
        if (_connection_type == CONNECTION_TYPE_POOLED) {
            rc = tmp_sock->GetPooledSocket(&_current_call.sending_sock);
        } else if (_connection_type == CONNECTION_TYPE_SHORT) {
            rc = tmp_sock->GetShortSocket(&_current_call.sending_sock);
        } else {
            tmp_sock.reset();
            SetFailed(EINVAL, "Invalid connection_type=%d", (int)_connection_type);
            return HandleSendFailed();
        }
        if (rc) {
            tmp_sock.reset();
            SetFailed(rc, "Fail to get %s connection",
                      ConnectionTypeToString(_connection_type));
            return HandleSendFailed();
        }
        // Remember the preferred protocol for non-single connection. When
        // the response comes back, InputMessenger calls the right handler
        // w/o trying other protocols. This is a must for (many) protocols that
        // can't be distinguished from other protocols w/o ambiguity.
        // Avoids trying every protocol when parsing the response
        _current_call.sending_sock->set_preferred_index(_preferred_index);
        // Set preferred_index of main_socket as well to make it easier to
        // debug and observe from /connections.
        if (tmp_sock->preferred_index() < 0) {
            tmp_sock->set_preferred_index(_preferred_index);
        }
        tmp_sock.reset();
    }
    if (_tos > 0) {
        _current_call.sending_sock->set_type_of_service(_tos);
    }
    if (is_response_read_progressively()) {
        // Tag the socket so that when the response comes back, the parser will
        // stop before reading all body.
        _current_call.sending_sock->read_will_be_progressive(_connection_type);
    }
    // Make request
    butil::IOBuf packet;
    SocketMessage* user_packet = NULL;
    // Pack the request according to the protocol
    _pack_request(&packet, &user_packet, cid.value, _method, this,
                  _request_buf, using_auth);
    // TODO: PackRequest may accept SocketMessagePtr<>?
    SocketMessagePtr<> user_packet_guard(user_packet);
    if (FailedInline()) {
        // controller should already be SetFailed.
        if (using_auth) {
            // Don't forget to signal waiters on authentication
            _current_call.sending_sock->SetAuthentication(ErrorCode());
        }
        return HandleSendFailed();
    }

    timespec connect_abstime;
    timespec* pabstime = NULL;
    if (_connect_timeout_ms > 0) {
        if (_deadline_us >= 0) {
            connect_abstime = butil::microseconds_to_timespec(
                std::min(_connect_timeout_ms * 1000L + start_realtime_us,
                         _deadline_us));
        } else {
            connect_abstime = butil::microseconds_to_timespec(
                _connect_timeout_ms * 1000L + start_realtime_us);
        }
        pabstime = &connect_abstime;
    }
    Socket::WriteOptions wopt;
    wopt.id_wait = cid;
    wopt.abstime = pabstime;
    wopt.pipelined_count = _pipelined_count;
    wopt.auth_flags = _auth_flags;
    wopt.ignore_eovercrowded = has_flag(FLAGS_IGNORE_EOVERCROWDED);
    int rc;
    size_t packet_size = 0;
    if (user_packet_guard) {
        if (span) {
            packet_size = user_packet_guard->EstimatedByteSize();
        }
        rc = _current_call.sending_sock->Write(user_packet_guard, &wopt);
    } else {
        packet_size = packet.size();
        // Send the data
        rc = _current_call.sending_sock->Write(&packet, &wopt);
    }
    if (span) {
        if (_current_call.nretry == 0) {
            span->set_sent_us(butil::cpuwide_time_us());
            span->set_request_size(packet_size);
        } else {
            span->Annotate("Requested(%lld) [%d]",
                           (long long)packet_size, _current_call.nretry + 1);
        }
    }
    if (using_auth) {
        // For performance concern, we set authentication to immediately
        // after the first `Write' returns instead of waiting for server
        // to confirm the credential data
        _current_call.sending_sock->SetAuthentication(rc);
    }
    CHECK_EQ(0, bthread_id_unlock(cid));
}
  • The target socket is obtained from the load balancer
  • _pack_request assembles the protocol-specific binary packet into an IOBuf for sending; a sketch of the baidu_std framing follows this list
  • Write sends the data
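
As documented for the baidu_std protocol, the packet produced by _pack_request is framed by a 12-byte header: the magic "PRPC", then body_size and meta_size in network byte order, where the body is the serialized RpcMeta followed by the request payload. A minimal sketch of that framing (my own illustration, not brpc's PackRpcRequest):

#include <arpa/inet.h>   // htonl
#include <cstdint>
#include <cstring>
#include <butil/iobuf.h>

// A minimal sketch of the baidu_std framing (not brpc's actual PackRpcRequest):
// 12-byte header = "PRPC" + body_size + meta_size (big endian), and the body is
// the serialized RpcMeta followed by the serialized request.
static void PackBaiduStdSketch(butil::IOBuf* out,
                               const butil::IOBuf& meta,        // serialized RpcMeta
                               const butil::IOBuf& payload) {   // serialized request
    char header[12];
    memcpy(header, "PRPC", 4);
    const uint32_t body_size = htonl(static_cast<uint32_t>(meta.size() + payload.size()));
    const uint32_t meta_size = htonl(static_cast<uint32_t>(meta.size()));
    memcpy(header + 4, &body_size, 4);
    memcpy(header + 8, &meta_size, 4);
    out->append(header, 12);
    out->append(meta);
    out->append(payload);
}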

StartWrite

int Socket::StartWrite(WriteRequest* req, const WriteOptions& opt) {
    // Release fence makes sure the thread getting request sees *req
    // Atomically exchange req with the current _write_head (initially NULL).
    // If this is the first thread writing this fd, exchange returns NULL and
    // _write_head now points to this request; otherwise exchange returns non-NULL
    // and _write_head points to the newest pending request.
    WriteRequest* const prev_head =
        _write_head.exchange(req, butil::memory_order_release);
    if (prev_head != NULL) {
        // Someone is writing to the fd. The KeepWrite thread may spin
        // until req->next to be non-UNCONNECTED. This process is not
        // lock-free, but the duration is so short(1~2 instructions,
        // depending on compiler) that the spin rarely occurs in practice
        // (I've not seen any spin in highly contended tests).
        // Not the first writer of this fd: link the request into the list and return.
        req->next = prev_head;
        return 0;
    }

    int saved_errno = 0;
    bthread_t th;
    SocketUniquePtr ptr_for_keep_write;
    ssize_t nw = 0;

    // We've got the right to write.
    // req is the first pending write, i.e. the tail of the list headed by _write_head, so next must be NULL.
    req->next = NULL;

    // Connect to remote_side() if not.
    // If the TCP connection is not established yet, ConnectIfNot performs a
    // non-blocking connect and suspends this bthread until epoll reports that
    // the connection is ready.
    int ret = ConnectIfNot(opt.abstime, req);
    if (ret < 0) {
        saved_errno = errno;
        SetFailed(errno, "Fail to connect %s directly: %m", description().c_str());
        goto FAIL_TO_WRITE;
    } else if (ret == 1) {
        // We are doing connection. Callback `KeepWriteIfConnected'
        // will be called with `req' at any moment after
        // Connection in progress: return; KeepWriteIfConnected continues once connected.
        return 0;
    }

    // NOTE: Setup() MUST be called after Connect which may call app_connect,
    // which is assumed to run before any SocketMessage.AppendAndDestroySelf()
    // in some protocols(namely RTMP).
    req->Setup(this);

    if (ssl_state() != SSL_OFF) {
        // Writing into SSL may block the current bthread, always write
        // in the background.
        goto KEEPWRITE_IN_BACKGROUND;
    }

    // Write once in the calling thread. If the write is not complete,
    // continue it in KeepWrite thread.
    // Write req's data into the fd; data linked into _write_head by other bthreads
    // is ignored here. The write may be partial: only part of req's data may go out.
    if (_conn) {
        butil::IOBuf* data_arr[1] = { &req->data };
        nw = _conn->CutMessageIntoFileDescriptor(fd(), data_arr, 1);
    } else {
#if BRPC_WITH_RDMA
        if (_rdma_ep && _rdma_state != RDMA_OFF) {
            butil::IOBuf* data_arr[1] = { &req->data };
            nw = _rdma_ep->CutFromIOBufList(data_arr, 1);
        } else {
#else
        {
#endif
            // Write the data
            nw = req->data.cut_into_file_descriptor(fd());
        }
    }
    if (nw < 0) {
        // RTMP may return EOVERCROWDED
        if (errno != EAGAIN && errno != EOVERCROWDED) {
            saved_errno = errno;
            // EPIPE is common in pooled connections + backup requests.
            PLOG_IF(WARNING, errno != EPIPE) << "Fail to write into " << *this;
            SetFailed(saved_errno, "Fail to write into %s: %s",
                      description().c_str(), berror(saved_errno));
            goto FAIL_TO_WRITE;
        }
    } else {
        AddOutputBytes(nw);
    }
    // Check whether req's data has been fully written. IsWriteComplete returns
    // true only if req's data is completely written AND req is the only pending
    // request at this moment.
    if (IsWriteComplete(req, true, NULL)) {
        // Return req's heap memory to the object pool; this bthread's job is done.
        ReturnSuccessfulWriteRequest(req);
        return 0;
    }

KEEPWRITE_IN_BACKGROUND:
    ReAddress(&ptr_for_keep_write);
    req->socket = ptr_for_keep_write.release();
    // req's data is not fully written. To keep the pthread wait-free, start a
    // KeepWrite bthread and return immediately. KeepWrite handles both the rest
    // of req and any requests other bthreads have linked into the list. It does
    // not need the highest priority, so bthread_start_background is used, which
    // appends its tid to the end of the run queue.
    if (bthread_start_background(&th, &BTHREAD_ATTR_NORMAL,
                                 KeepWrite, req) != 0) {
        LOG(FATAL) << "Fail to start KeepWrite";
        KeepWrite(req);
    }
    return 0;

FAIL_TO_WRITE:
    // `SetFailed' before `ReturnFailedWriteRequest' (which will calls
    // `on_reset' callback inside the id object) so that we immediately
    // know this socket has failed inside the `on_reset' callback
    ReleaseAllFailedWriteRequests(req);
    errno = saved_errno;
    return -1;
}
  • Multiple bthreads may write to the same fd concurrently, so pending writes are kept in a linked list
  • Whatever is left over is handed to a new bthread (KeepWrite); data arriving afterwards is simply linked into the list
  • That bthread keeps writing until everything in the list has been flushed, then exits; the standalone snippet below distills the exchange-head trick
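
The heart of this wait-free hand-off is the atomic exchange on _write_head. A stripped-down standalone version of the same pattern (plain C++11 atomics, not brpc code):

#include <atomic>
#include <cstdio>

struct WriteReq {
    const char* data;
    WriteReq* next = nullptr;
};

std::atomic<WriteReq*> g_write_head{nullptr};

// Returns true if the caller became the (single) writer, false if the request
// was merely linked into the list for the current writer to pick up later.
bool SubmitWrite(WriteReq* req) {
    WriteReq* prev_head = g_write_head.exchange(req, std::memory_order_release);
    if (prev_head != nullptr) {
        req->next = prev_head;   // someone else is writing; just enqueue
        return false;
    }
    req->next = nullptr;         // we own the fd now: write req, then drain the list
    return true;
}

int main() {
    WriteReq a{"first"}, b{"second"};
    std::printf("a is writer: %d\n", SubmitWrite(&a));  // 1
    std::printf("b is writer: %d\n", SubmitWrite(&b));  // 0, linked behind the head
    return 0;
}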

Summary

That is the essence of the brpc client call path; if you have read this far carefully, you should have a solid picture of how the client works. We covered the brpc server side in an earlier post; next up are brpc's bthreads (coroutines) and socket resource management.