Thrift's Layered Implementation

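As an RPC implementation, Thrift hides the internal processing details, so invoking a remote service looks as concise as calling a local function. Thrift can be viewed as a layered implementation. Unlike the layered TCP/IP stack, however, Thrift still requires you to configure each layer when you use it, that is, to choose a concrete implementation for each layer. Thrift's layered network stack is shown in the figure below: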

[Figure: Thrift's layered network stack]


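Transport is the lowest layer. It abstracts access to the underlying network; an implementation may be based on a TCP socket or on a higher-level network protocol such as HTTP. Although the same Thrift code runs on every machine taking part in an RPC, in terms of the business model the caller can be regarded as the client and the callee as the server, which provides a service to the caller. As a result, the transport layer executes different internal logic in the client role than in the server role.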

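Above the transport layer is the protocol layer, which encodes the business-layer data structures and Thrift's data types. When data is written to the underlying transport, the protocol layer serializes it; when data is read from the transport, it deserializes it. The protocol layer can be implemented with encodings such as JSON, XML, or binary.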

Above the protocol layer is the processor layer. This business-processing layer has the following characteristics:

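  • It implements and encapsulates the flow of reading from and writing to the streams.
  • Its implementation details depend on the data structures and services defined in the specific .thrift file. On the server side, the processor layer is responsible for invoking the services provided by the service handler.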

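Above the processor layer is the server layer. This layer normally exists only on the server side, where it handles clients' RPC calls using one of several thread-scheduling models.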
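
To make the layering concrete, here is a minimal sketch of how the four layers delegate to one another. It does not use the real Thrift library: MemoryTransport, TextProtocol, AddHandler, AddProcessor, and callAddOnce are hypothetical stand-ins, and the "number followed by a semicolon" encoding is an assumption chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Transport layer: moves raw bytes. An in-memory buffer stands in for a socket.
class MemoryTransport {
 public:
  void write(const std::string& bytes) { buf_ += bytes; }
  bool readByte(char& out) {
    if (pos_ >= buf_.size()) return false;
    out = buf_[pos_++];
    return true;
  }
 private:
  std::string buf_;
  std::size_t pos_ = 0;
};

// Protocol layer: turns typed values into bytes and back.
// This toy encoding writes an i32 as decimal text terminated by ';'.
class TextProtocol {
 public:
  explicit TextProtocol(MemoryTransport& t) : trans_(t) {}
  void writeI32(int32_t v) { trans_.write(std::to_string(v) + ";"); }
  int32_t readI32() {
    std::string digits;
    char c;
    while (trans_.readByte(c) && c != ';') digits += c;
    assert(!digits.empty());  // a value must have been written first
    return static_cast<int32_t>(std::stol(digits));
  }
 private:
  MemoryTransport& trans_;
};

// Handler: the user-written business logic that the processor dispatches to.
struct AddHandler {
  int32_t add(int32_t a, int32_t b) { return a + b; }
};

// Processor layer: reads the arguments through the input protocol,
// calls the handler, and writes the result through the output protocol.
class AddProcessor {
 public:
  explicit AddProcessor(AddHandler& h) : handler_(h) {}
  void process(TextProtocol& in, TextProtocol& out) {
    int32_t a = in.readI32();
    int32_t b = in.readI32();
    out.writeI32(handler_.add(a, b));
  }
 private:
  AddHandler& handler_;
};

// Server layer: decides when and on which thread the processor runs.
// This toy "server" handles exactly one request on the calling thread.
int32_t callAddOnce(int32_t a, int32_t b) {
  MemoryTransport request, response;

  // Client side: serialize the arguments.
  TextProtocol clientOut(request);
  clientOut.writeI32(a);
  clientOut.writeI32(b);

  // Server side: one processor invocation.
  AddHandler handler;
  AddProcessor processor(handler);
  TextProtocol serverIn(request), serverOut(response);
  processor.process(serverIn, serverOut);

  // Client side: deserialize the reply.
  TextProtocol clientIn(response);
  return clientIn.readI32();
}
```

Calling callAddOnce(2, 3) walks the whole stack once. Swapping TextProtocol for a different encoding would not require changes to the handler or the transport, which is the point of choosing an implementation per layer.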

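Design of Thrift's JSON Protocol
JSON is a subset of JavaScript syntax. It is highly readable and widely supported as a data interchange format, and many libraries on different language platforms (such as Gson for Java) make it easy to convert between JSON and class objects.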

Compared with more compact encodings, the drawback of Thrift's JSON protocol is efficiency, which shows up in two ways:

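  • When converting business data to JSON, Thrift must encode and decode it and handle special escape characters, for example by encoding binary data as Base64.
  • Once the business data is encoded as JSON, it becomes larger. Base64 encoding expands data to about 4/3 of its original size, and the JSON format also has to write its own syntax characters, such as { } [ ] , : and ". Thrift's JSON protocol therefore increases the payload sent over the network.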
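
The 4/3 figure follows from Base64 mapping every 3 input bytes to 4 output characters. The sketch below computes the encoded length with and without '=' padding; the function names are hypothetical, and the surrounding JSON quotes and other syntax characters are not counted.

```cpp
#include <cassert>
#include <cstddef>

// Length of padded Base64 text for n input bytes:
// every group of up to 3 bytes becomes exactly 4 characters.
constexpr std::size_t base64PaddedLength(std::size_t n) {
  return 4 * ((n + 2) / 3);
}

// Length without '=' padding: ceil(8n / 6) characters.
constexpr std::size_t base64UnpaddedLength(std::size_t n) {
  return (4 * n + 2) / 3;
}

// Ratio of padded encoded size to raw size; it approaches 4/3 as n grows.
double base64Expansion(std::size_t n) {
  assert(n > 0);  // the ratio is undefined for empty input
  return static_cast<double>(base64PaddedLength(n)) / static_cast<double>(n);
}
```

For a 300-byte binary field the padded encoding is 400 characters, so the ratio is exactly 4/3; for very small inputs the relative overhead is higher, because even a single byte needs 4 padded characters.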

Thrift's JSON protocol encodes Thrift types as JSON according to the following rules (TJSONProtocol.h):

/**
 * JSON protocol for Thrift.
 *
 * Implements a protocol which uses JSON as the wire-format.
 *
 * Thrift types are represented as described below:
 *
 * 1. Every Thrift integer type is represented as a JSON number.
 *
 * 2. Thrift doubles are represented as JSON numbers. Some special values are
 *    represented as strings:
 *    a. "NaN" for not-a-number values
 *    b. "Infinity" for positive infinity
 *    c. "-Infinity" for negative infinity
 *
 * 3. Thrift string values are emitted as JSON strings, with appropriate
 *    escaping.
 *
 * 4. Thrift binary values are encoded into Base64 and emitted as JSON strings.
 *    The readBinary() method is written such that it will properly skip if
 *    called on a Thrift string (although it will decode garbage data).
 *
 *    NOTE: Base64 padding is optional for Thrift binary value encoding. So
 *    the readBinary() method needs to decode both input strings with padding
 *    and those without one.
 *
 * 5. Thrift structs are represented as JSON objects, with the field ID as the
 *    key, and the field value represented as a JSON object with a single
 *    key-value pair. The key is a short string identifier for that type,
 *    followed by the value. The valid type identifiers are: "tf" for bool,
 *    "i8" for byte, "i16" for 16-bit integer, "i32" for 32-bit integer, "i64"
 *    for 64-bit integer, "dbl" for double-precision floating point, "str" for
 *    string (including binary), "rec" for struct ("records"), "map" for map,
 *    "lst" for list, "set" for set.
 *
 * 6. Thrift lists and sets are represented as JSON arrays, with the first
 *    element of the JSON array being the string identifier for the Thrift
 *    element type and the second element of the JSON array being the count of
 *    the Thrift elements. The Thrift elements then follow.
 *
 * 7. Thrift maps are represented as JSON arrays, with the first two elements
 *    of the JSON array being the string identifiers for the Thrift key type
 *    and value type, followed by the count of the Thrift pairs, followed by a
 *    JSON object containing the key-value pairs. Note that JSON keys can only
 *    be strings, which means that the key type of the Thrift map should be
 *    restricted to numeric or string types -- in the case of numerics, they
 *    are serialized as strings.
 *
 * 8. Thrift messages are represented as JSON arrays, with the protocol
 *    version #, the message name, the message type, and the sequence ID as
 *    the first 4 elements.
 *
 * More discussion of the double handling is probably warranted. The aim of
 * the current implementation is to match as closely as possible the behavior
 * of Java's Double.toString(), which has no precision loss.  Implementors in
 * other languages should strive to achieve that where possible. I have not
 * yet verified whether std::istringstream::operator>>, which is doing that
 * work for me in C++, loses any precision, but I am leaving this as a future
 * improvement. I may try to provide a C component for this, so that other
 * languages could bind to the same underlying implementation for maximum
 * consistency.
 *
 */
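
As an illustration of rules 3, 5, 6, and 7 above, the following sketch builds the JSON text for a small struct, a list, and a map by plain string concatenation. It is not Thrift's implementation: every function name is hypothetical, and quote() escapes only the double quote and the backslash, which is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Rule 3 (simplified): quote a string, escaping only '"' and '\'.
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out + "\"";
}

// Rule 5: one struct field is "<field id>":{"<type tag>":<encoded value>}.
std::string encodeField(int16_t fieldId, const std::string& typeTag,
                        const std::string& encodedValue) {
  return quote(std::to_string(fieldId)) + ":{" + quote(typeTag) + ":" +
         encodedValue + "}";
}

// Rule 6: a list<i32> is ["i32", count, elem, elem, ...].
std::string encodeI32List(const std::vector<int32_t>& elems) {
  std::string out = "[" + quote("i32") + "," + std::to_string(elems.size());
  for (int32_t e : elems) out += "," + std::to_string(e);
  return out + "]";
}

// Rule 7: a map<string,i32> is ["str","i32", count, {key:value, ...}].
std::string encodeStrI32Map(const std::map<std::string, int32_t>& m) {
  std::string out = "[" + quote("str") + "," + quote("i32") + "," +
                    std::to_string(m.size()) + ",{";
  bool first = true;
  for (const auto& kv : m) {
    if (!first) out += ",";
    first = false;
    out += quote(kv.first) + ":" + std::to_string(kv.second);
  }
  return out + "}]";
}

// A struct with field 1 (i32) = 42 and field 2 (string) = "hi".
std::string encodeExampleStruct() {
  return "{" + encodeField(1, "i32", "42") + "," +
         encodeField(2, "str", quote("hi")) + "}";
}

// Expected encodings for the three examples, checked with assert.
void checkExamples() {
  const std::string expectedStruct = R"({"1":{"i32":42},"2":{"str":"hi"}})";
  const std::string expectedList = R"(["i32",3,1,2,3])";
  const std::string expectedMap = R"(["str","i32",2,{"a":1,"b":2}])";
  assert(encodeExampleStruct() == expectedStruct);
  assert(encodeI32List({1, 2, 3}) == expectedList);
  assert(encodeStrI32Map({{"a", 1}, {"b", 2}}) == expectedMap);
}
```

The field IDs and type tags appear in the output but the field names do not, so both sides need the same .thrift definition to interpret the data.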

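When encoding its data as JSON, Thrift needs to know which part of the JSON document it is currently processing, for example whether it is inside an array or inside the key/value pairs of an object, so that it can write the corresponding JSON syntax characters. For this purpose Thrift implements the TJSONContext base class and several subclasses, such as JSONPairContext and JSONListContext. Each subclass handles the JSON syntax characters for its own kind of context.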
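
The idea of a context object that decides which separator to emit can be sketched as follows. Context, ListContext, PairContext, and joinTokens are simplified, hypothetical stand-ins for TJSONContext and its subclasses; they append to a std::string rather than writing to a transport, and their member names are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Base context: emits no separators (suitable for the top level).
class Context {
 public:
  virtual ~Context() {}
  // Appends whatever separator must precede the next value.
  virtual void write(std::string& out) { (void)out; }
};

// Inside a JSON array: ',' before every element except the first.
class ListContext : public Context {
 public:
  void write(std::string& out) override {
    if (first_) {
      first_ = false;
    } else {
      out += ',';
    }
  }
 private:
  bool first_ = true;
};

// Inside a JSON object: alternate ':' (after a key) and ',' (after a value).
class PairContext : public Context {
 public:
  void write(std::string& out) override {
    if (first_) {
      first_ = false;
      colon_ = true;
    } else {
      out += colon_ ? ':' : ',';
      colon_ = !colon_;
    }
  }
 private:
  bool first_ = true;
  bool colon_ = true;
};

// Emits each token, letting the context choose the separator that precedes it.
std::string joinTokens(Context& ctx, const std::vector<std::string>& tokens) {
  std::string out;
  for (const auto& t : tokens) {
    ctx.write(out);
    out += t;
  }
  assert(tokens.empty() || !out.empty() || tokens.front().empty());
  return out;
}
```

With this split, the code that writes a value never needs to know whether it is producing an array element, an object key, or an object value; it asks the current context to write the separator and then writes the value itself.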

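Because JSON is a nested data format, several different TJSONContext objects may exist at the same time. The typical data structure for managing such nested TJSONContext objects is a stack. The TJSONProtocol class therefore maintains a member variable holding a stack of TJSONContext objects, together with a member variable for the current TJSONContext. The class is defined as follows (TJSONProtocol.h):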

class TJSONProtocol : public TProtocol {
 public:

  TJSONProtocol(boost::shared_ptr<TTransport> ptrans);

  ~TJSONProtocol();

 private:

  void pushContext(boost::shared_ptr<TJSONContext> c);

  void popContext();

  uint32_t writeJSONEscapeChar(uint8_t ch);

  uint32_t writeJSONChar(uint8_t ch);

  uint32_t writeJSONString(const std::string &str);

  uint32_t writeJSONBase64(const std::string &str);

  template <typename NumberType>
  uint32_t writeJSONInteger(NumberType num);

  uint32_t writeJSONDouble(double num);

  uint32_t writeJSONObjectStart() ;

  uint32_t writeJSONObjectEnd();

  uint32_t writeJSONArrayStart();

  uint32_t writeJSONArrayEnd();

  uint32_t readJSONSyntaxChar(uint8_t ch);

  uint32_t readJSONEscapeChar(uint8_t *out);

  uint32_t readJSONString(std::string &str, bool skipContext = false);

  uint32_t readJSONBase64(std::string &str);

  uint32_t readJSONNumericChars(std::string &str);

  template <typename NumberType>
  uint32_t readJSONInteger(NumberType &num);

  uint32_t readJSONDouble(double &num);

  uint32_t readJSONObjectStart();

  uint32_t readJSONObjectEnd();

  uint32_t readJSONArrayStart();

  uint32_t readJSONArrayEnd();

 public:

  /**
   * Writing functions.
   */

  uint32_t writeMessageBegin(const std::string& name,
                             const TMessageType messageType,
                             const int32_t seqid);

  uint32_t writeMessageEnd();

  uint32_t writeStructBegin(const char* name);

  uint32_t writeStructEnd();

  uint32_t writeFieldBegin(const char* name,
                           const TType fieldType,
                           const int16_t fieldId);

  uint32_t writeFieldEnd();

  uint32_t writeFieldStop();

  uint32_t writeMapBegin(const TType keyType,
                         const TType valType,
                         const uint32_t size);

  uint32_t writeMapEnd();

  uint32_t writeListBegin(const TType elemType,
                          const uint32_t size);

  uint32_t writeListEnd();

  uint32_t writeSetBegin(const TType elemType,
                         const uint32_t size);

  uint32_t writeSetEnd();

  uint32_t writeBool(const bool value);

  uint32_t writeByte(const int8_t byte);

  uint32_t writeI16(const int16_t i16);

  uint32_t writeI32(const int32_t i32);

  uint32_t writeI64(const int64_t i64);

  uint32_t writeDouble(const double dub);

  uint32_t writeString(const std::string& str);

  uint32_t writeBinary(const std::string& str);

  /**
   * Reading functions
   */

  uint32_t readMessageBegin(std::string& name,
                            TMessageType& messageType,
                            int32_t& seqid);

  uint32_t readMessageEnd();

  uint32_t readStructBegin(std::string& name);

  uint32_t readStructEnd();

  uint32_t readFieldBegin(std::string& name,
                          TType& fieldType,
                          int16_t& fieldId);

  uint32_t readFieldEnd();

  uint32_t readMapBegin(TType& keyType,
                        TType& valType,
                        uint32_t& size);

  uint32_t readMapEnd();

  uint32_t readListBegin(TType& elemType,
                         uint32_t& size);

  uint32_t readListEnd();

  uint32_t readSetBegin(TType& elemType,
                        uint32_t& size);

  uint32_t readSetEnd();

  uint32_t readBool(bool& value);

  uint32_t readByte(int8_t& byte);

  uint32_t readI16(int16_t& i16);

  uint32_t readI32(int32_t& i32);

  uint32_t readI64(int64_t& i64);

  uint32_t readDouble(double& dub);

  uint32_t readString(std::string& str);

  uint32_t readBinary(std::string& str);

  class LookaheadReader {

   public:

    LookaheadReader(TTransport &trans) :
      trans_(&trans),
      hasData_(false) {
    }

    uint8_t read() {
      if (hasData_) {
        hasData_ = false;
      }
      else {
        trans_->readAll(&data_, 1);
      }
      return data_;
    }

    uint8_t peek() {
      if (!hasData_) {
        trans_->readAll(&data_, 1);
      }
      hasData_ = true;
      return data_;
    }

   private:
    TTransport *trans_;
    bool hasData_;
    uint8_t data_;
  };

 private:

  std::stack<boost::shared_ptr<TJSONContext> > contexts_;
  boost::shared_ptr<TJSONContext> context_;
  LookaheadReader reader_;
};

Here, context_ is the current context, and contexts_ is the stack that holds the enclosing contexts that have been pushed.

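During JSON encoding, when Thrift needs to enter a nested context, it pushes the current context onto the contexts_ stack, allocates a new TJSONContext object of the type being entered, and assigns it to the context_ member variable. When the current context has been fully processed, it pops the contexts_ stack and assigns the context object taken from the top of the stack to context_.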
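
The push and pop sequence described above can be sketched with a small writer that supports only nested arrays of integers. Ctx, ArrayWriter, and nestedArrayDemo are hypothetical names, and std::shared_ptr is used in place of boost::shared_ptr; only the pushContext/popContext pattern and the contexts_ and context_ member names mirror the class shown earlier.

```cpp
#include <cassert>
#include <memory>
#include <stack>
#include <string>
#include <utility>

// Minimal context: inside a list it emits ',' between elements,
// at the top level it emits nothing.
struct Ctx {
  explicit Ctx(bool list) : inList(list) {}
  void write(std::string& out) {
    if (!inList) return;
    if (first) {
      first = false;
    } else {
      out += ',';
    }
  }
  bool inList;
  bool first = true;
};

class ArrayWriter {
 public:
  ArrayWriter() : context_(std::make_shared<Ctx>(false)) {}

  void arrayStart() {
    context_->write(out_);  // separator required by the enclosing context
    out_ += '[';
    pushContext(std::make_shared<Ctx>(true));  // enter the nested context
  }

  void arrayEnd() {
    popContext();  // return to the enclosing context
    out_ += ']';
  }

  void integer(int v) {
    context_->write(out_);
    out_ += std::to_string(v);
  }

  const std::string& str() const { return out_; }

 private:
  // Save the current context on the stack and make c current.
  void pushContext(std::shared_ptr<Ctx> c) {
    contexts_.push(context_);
    context_ = std::move(c);
  }

  // Restore the most recently saved context.
  void popContext() {
    assert(!contexts_.empty());  // every pop must match an earlier push
    context_ = contexts_.top();
    contexts_.pop();
  }

  std::stack<std::shared_ptr<Ctx>> contexts_;
  std::shared_ptr<Ctx> context_;
  std::string out_;
};

// Writes the nested array [1,[2,3],4].
std::string nestedArrayDemo() {
  ArrayWriter w;
  w.arrayStart();
  w.integer(1);
  w.arrayStart();
  w.integer(2);
  w.integer(3);
  w.arrayEnd();
  w.integer(4);
  w.arrayEnd();
  return w.str();
}
```

After the inner array ends, the outer context is restored with its own state intact, which is why the comma before 4 is still emitted correctly.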