Protobuf Java Reflection: How Protobuf Reflection Works

Reposted

梦里忧郁 2023-09-05 11:17:16

Tags: protobuf Java reflection, C++, protobuf, reflection fields. Category: Java, backend development

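Contents

Preface

Related use cases

I. Overview of How ProtoBuf Reflection Works

1. Getting the attributes and methods of a message or service

1.1 Generating .h and .cc files from the proto file with protoc
1.2 Using only the proto file, without compiling it with protoc
1.3 Non-.proto sources converted to .proto

2. Using the attributes and methods of a message

2.1 Creating an instance by reflection from its type name

3. Reading and writing pb fields through an instance's Reflection interface

Getting the Reflection interface and writing a value

II. Reflection Examples

1. serialize_message
2. parse_message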

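Preface

Reflection is the ability to discover, at run time, all the attributes and methods of any class, and to invoke any method or access any attribute of any object. This capability of obtaining type information dynamically and invoking object methods dynamically is called a reflection mechanism.
In one sentence, the key to implementing reflection is obtaining the system's meta-information.

Meta-information is the system's self-describing information, used to describe the system itself. For example: which classes does the system have? Which fields and methods does each class have? What type is each field, and what parameters and return value does each method have?

Unlike Java, Python and similar languages, C++ has no built-in reflection. With protobuf, however, the messages and services generated from a proto file come with a reflection mechanism: at run time, the attributes and methods of any message or service can be looked up through the proto definitions, and then invoked, modified, and set.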

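I. Overview of How ProtoBuf Reflection Works

Get the FieldDescriptor of a single field from the Message (through its Descriptor)
Get the Reflection of the Message
Use the Reflection together with the FieldDescriptor to read or modify that field dynamically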

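1. Getting the attributes and methods of a message or service

protobuf exposes the attributes and methods of any message or service through descriptors. The main descriptor types are:

Descriptor	Purpose
FileDescriptor	Gets the Descriptors and ServiceDescriptors in a proto file
Descriptor	Gets the attributes and methods of a message, including its FieldDescriptors and EnumDescriptors
FieldDescriptor	Gets the type, label, name, etc. of each field in a message
EnumDescriptor	Gets the names, values, etc. of the entries in an enum
ServiceDescriptor	Gets the MethodDescriptors in a service
MethodDescriptor	Gets the request, response, name, etc. of each RPC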

Functions of class Descriptor that return information about the message itself:

const std::string & name() const; // the name of the message
int field_count() const; // the number of fields in the message
const FileDescriptor* file() const; // The .proto file in which this message type was defined. Never nullptr.

Class Descriptor returns a FieldDescriptor through the following functions:

const FieldDescriptor* field(int index) const; // by index in declaration order, from 0 to field_count() - 1
const FieldDescriptor* FindFieldByNumber(int number) const; // by the field number declared in the message (in "optional string name = 3", the number is 3)
const FieldDescriptor* FindFieldByName(const string& name) const; // by field name

In other words, once the FileDescriptor of a proto file is available, everything in that proto file is available. So how is the FileDescriptor of a proto file obtained? protobuf offers several ways.

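1.1 Generating .h and .cc files from the proto file with protoc

This approach gets the FileDescriptor directly from the generated classes. For example, given a test.proto file whose generated code is linked into the program, its FileDescriptor is available from DescriptorPool::generated_pool(), where file is the name of the proto file:

const FileDescriptor* fileDescriptor = DescriptorPool::generated_pool()->FindFileByName(file);

For any message or service, the corresponding Descriptor or ServiceDescriptor can likewise be obtained from the DescriptorPool by name.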

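1.2 Using only the proto file, without compiling it with protoc

In this case the proto file has to be parsed manually before its FileDescriptor can be obtained. protobuf provides a parser for this in its compiler namespace; with compiler::Importer the FileDescriptor of a proto file is easy to get. The returned FileDescriptor is owned by the Importer, so the Importer must outlive every use of it; the version below keeps the objects static for that reason.

const FileDescriptor* GetFileDescriptorFromProtoFile(const std::string &protoRoot, const std::string &protoFile){
    // FileErrorCollector is assumed to be a user-defined subclass of compiler::MultiFileErrorCollector.
    static compiler::DiskSourceTree sourceTree;
    static FileErrorCollector errorCollector;
    static compiler::Importer importer(&sourceTree, &errorCollector);
    sourceTree.MapPath("", protoRoot);
    return importer.Import(protoFile); // nullptr if the file cannot be imported
}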

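1.3 Non-.proto sources converted to .proto

The definition can be read from a remote source, for example by protobuf-encoding the data together with its meta-information and transmitting both:

message Req {
  optional string proto_file = 1;
  optional string data = 2;
}

It can also be converted from JSON or another data format.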

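Wherever the .proto file comes from, it needs further processing and registration: it is parsed into in-memory objects, a mapping from those objects to instances is built, and the memory offset of each field is computed. The steps can be summarized as follows:

Provide the .proto (here meaning, broadly, meta-information described in ProtoBuf message syntax)
Parse the .proto and build FileDescriptor, FieldDescriptor, etc., that is, the in-memory model (objects) of the .proto
Each time an instance is created, store it in the corresponding instance pool of the instance factory
Maintain the mapping between Descriptor and instance in a table for later lookup
The Descriptor leads to the corresponding instance, and because the field types of the instance are known (FieldDescriptor), the memory offset of each field is known, so the field's value can be read or modified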
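
The steps above can be sketched with a deliberately simplified toy model in plain C++. This is not protobuf's actual implementation: the names Person, ToyType, Registry, SetInt32Field and GetInt32Field are invented for this illustration, and only int32 fields are handled. It shows the idea of a registry from type name to metadata, a factory for instances, and field access through byte offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a generated message class (hypothetical).
struct Person {
    std::int32_t id = 0;
    std::int32_t age = 0;
};

// Toy "Descriptor": type name, field name -> byte offset, and an instance factory.
struct ToyType {
    std::string name;
    std::map<std::string, std::size_t> field_offsets;
    std::function<std::shared_ptr<void>()> create;
};

// Registry from type name to metadata (steps 2 to 4 above).
std::map<std::string, ToyType>& Registry() {
    static std::map<std::string, ToyType> registry = [] {
        ToyType person;
        person.name = "Person";
        person.field_offsets["id"] = offsetof(Person, id);
        person.field_offsets["age"] = offsetof(Person, age);
        person.create = [] { return std::shared_ptr<void>(std::make_shared<Person>()); };
        std::map<std::string, ToyType> result;
        result["Person"] = person;
        return result;
    }();
    return registry;
}

// Writes an int32 field by name, using only the metadata (step 5 above).
bool SetInt32Field(const ToyType& type, void* instance,
                   const std::string& field, std::int32_t value) {
    auto it = type.field_offsets.find(field);
    if (it == type.field_offsets.end()) return false;
    std::memcpy(static_cast<char*>(instance) + it->second, &value, sizeof(value));
    return true;
}

// Reads an int32 field by name, using only the metadata.
bool GetInt32Field(const ToyType& type, const void* instance,
                   const std::string& field, std::int32_t* value) {
    auto it = type.field_offsets.find(field);
    if (it == type.field_offsets.end()) return false;
    std::memcpy(value, static_cast<const char*>(instance) + it->second, sizeof(*value));
    return true;
}
```

A caller would look up Registry().at("Person"), call create() to get an instance, and then read or write fields by name without ever naming the Person type.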

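2. Using the attributes and methods of a message

The flow for using the attributes and methods of a message, starting from a type_name string, is as follows.
Person is a user-defined pb type that derives from Message. MessageLite, the base class of Message, is more lightweight.
A Descriptor gives access to a Message* pointer for the required message. The Message class defines the virtual function New(), which returns a new instance of the same type. The steps are:

Use FindMessageTypeByName of DescriptorPool to get the meta-information, a Descriptor.
Use the MessageFactory instance factory and the Descriptor to get the Message* pointer to the default instance of the message
Call New() to construct a usable message object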

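2.1 Creating an instance by reflection from its type name

Create a new Person message object from the string "Person". This is thread-safe.

// First get the Descriptor of the type (the name must be fully qualified, including any package).
auto descriptor = google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName("Person");
// Use the Descriptor to get the instance registered for the type. It is immutable.
auto prototype = google::protobuf::MessageFactory::generated_factory()->GetPrototype(descriptor);
// Construct a usable message.
auto instance = prototype->New(); // creates a new Person message object, owned by the caller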

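In the first step, FindMessageTypeByName of DescriptorPool returns the meta-information, a Descriptor.

DescriptorPool is the pool of meta-information. It exposes lookups such as FindServiceByName and FindMessageTypeByName so that callers can query the meta-information of a Service or Message. When the DescriptorPool does not contain the requested meta-information, it falls back to a DescriptorDatabase. The DescriptorDatabase can find the content of the .proto file with the given name in hard-coded data or on disk, parse it, and return the requested meta-information.

DescriptorPool and DescriptorDatabase thus improve reflection performance through caching. Having a DescriptorDatabase read .proto content from disk and parse it into Descriptors is not the common case. In practice, the xxx.pb.cc and xxx.pb.h files generated by protoc contain not only the accessors for reading and writing data but also the content of the .proto file. In any xxx.pb.cc you can find code similar to the following: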

static void AddDescriptorsImpl() {
  InitDefaults();

  // .proto content
  static const char descriptor[] GOOGLE_PROTOBUF_ATTRIBUTE_SECTION_VARIABLE(protodesc_cold) = {
      "\n\022single_int32.proto\"\035\n\010Example1\022\021\n\010int3"
      "2Val\030\232\005 \001(\005\" \n\010Example2\022\024\n\010int32Val\030\377\377\377\377"
      "\001 \003(\005b\006proto3"
  };

  // register the descriptor
  ::google::protobuf::DescriptorPool::InternalAddGeneratedFile(
      descriptor, 93);

  // register the instance
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedFile(
    "single_int32.proto", &protobuf_RegisterTypes);
}

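The descriptor array holds the .proto content. It is not the raw text: it is the file's FileDescriptorProto serialized with SerializeToString, and the result is hard-coded in xxx.pb.cc, which makes good use of protobuf's own compact encoding.

The hard-coded .proto meta-information is loaded and parsed lazily (only when it is first needed) by the DescriptorDatabase and then cached in the DescriptorPool.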

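3. Reading and writing pb fields through an instance's Reflection interface

Reflection provides the interfaces for reading and writing the fields of a message object dynamically; automatic reads and writes of message objects go through this class.

For each data type, Reflection provides a separate pair of interfaces (Get and Set) for reading and writing the value of a field.
For example, the read operations for int32 and int64 take a Message and a FieldDescriptor* as input:

virtual int32  GetInt32 (const Message& message, const FieldDescriptor* field) const = 0;
virtual int64  GetInt64 (const Message& message, const FieldDescriptor* field) const = 0;

Enums and nested messages are special cases:

virtual const EnumValueDescriptor* GetEnum( const Message& message, const FieldDescriptor* field) const = 0;
virtual const Message& GetMessage(const Message& message,
                                    const FieldDescriptor* field,
                                    MessageFactory* factory = NULL) const = 0;

Write operations have similar interfaces, for example SetInt32, SetInt64 and SetEnum.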

The functions for writing a single field are:

void SetInt32(Message * message, const FieldDescriptor * field, int32 value) const

void SetString(Message * message, const FieldDescriptor * field, std::string value) const

The functions for reading repeated fields are:

int32 GetRepeatedInt32(const Message & message, const FieldDescriptor * field, int index) const

std::string GetRepeatedString(const Message & message, const FieldDescriptor * field, int index) const

const Message & GetRepeatedMessage(const Message & message, const FieldDescriptor * field, int index) const

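The following code gets the Reflection interface and writes a value. Getting the Reflection object is thread-safe, but a single message must not be written from several threads at once.

auto reflecter = instance->GetReflection(); // instance is the Message* created from the string "Person" in 2.1
auto field = descriptor->FindFieldByName("name"); // the Person message has a name field
reflecter->SetString(instance, field, "小明"); // set the name field through reflection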

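II. Reflection Examples

1. serialize_message

serialize_message walks all fields of a message together with their values and serializes them into a string. The main idea:

The Descriptor yields the FieldDescriptor of each field: the field name and the field's C++ type.
The GetXXX interfaces of Reflection return the corresponding value.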

About class FieldDescriptor
Class FieldDescriptor describes a single field of a Message, including the field name, the field's attributes, and the original field definition.

Its functions that return information about the field itself:

const std::string & name() const; // Name of this field within the message.
const std::string & lowercase_name() const; // Same as name() except converted to lower-case.
const std::string & camelcase_name() const; // Same as name() except converted to camel-case.
CppType cpp_type() const; // C++ type of this field.

cpp_type() returns the C++ type category of the field as a FieldDescriptor::CppType (CPPTYPE_INT32, CPPTYPE_STRING, CPPTYPE_MESSAGE, and so on); this is what the code below switches on. The type declared in the .proto file is returned by type() as a FieldDescriptor::Type, whose values are:

enum FieldDescriptor::Type {
  TYPE_DOUBLE = 1,
  TYPE_FLOAT = 2,
  TYPE_INT64 = 3,
  TYPE_UINT64 = 4,
  TYPE_INT32 = 5,
  TYPE_FIXED64 = 6,
  TYPE_FIXED32 = 7,
  TYPE_BOOL = 8,
  TYPE_STRING = 9,
  TYPE_GROUP = 10,
  TYPE_MESSAGE = 11,
  TYPE_BYTES = 12,
  TYPE_UINT32 = 13,
  TYPE_ENUM = 14,
  TYPE_SFIXED32 = 15,
  TYPE_SFIXED64 = 16,
  TYPE_SINT32 = 17,
  TYPE_SINT64 = 18,
  MAX_TYPE = 18
}

The code of serialize_message, which walks all fields of a message together with their values, is as follows:

#include <cassert>
#include <string>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

void serialize_message(const google::protobuf::Message& message, std::string* serialized_string) {
    // Get the Descriptor* and the Reflection* of the message
    const google::protobuf::Descriptor* descriptor = message.GetDescriptor();
    const google::protobuf::Reflection* reflection = message.GetReflection();
     
    // Iterate over all fields of the message
    for (int i = 0; i < descriptor->field_count(); ++i) {
        // Get the FieldDescriptor* of a single field
        const google::protobuf::FieldDescriptor* field = descriptor->field(i);
        bool has_field = reflection->HasField(message, field);
        
        if (has_field) {
            //arrays not supported
            assert(!field->is_repeated());
            switch (field->cpp_type()) {
            // The macro CASE_FIELD_TYPE(cpptype, method, valuetype) generates one case
#define CASE_FIELD_TYPE(cpptype, method, valuetype)\
                case google::protobuf::FieldDescriptor::CPPTYPE_##cpptype:{\
                    valuetype value = reflection->Get##method(message, field);\
                    int wsize = field->name().size();\
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));\
                    serialized_string->append(field->name().c_str(), field->name().size());\
                    wsize = sizeof(value);\
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));\
                    serialized_string->append(reinterpret_cast<char*>(&value), sizeof(value));\
                    break;\
                }

                // Use the CASE_FIELD_TYPE macro to generate the cases for all simple data types
                CASE_FIELD_TYPE(INT32, Int32, int);
                CASE_FIELD_TYPE(UINT32, UInt32, uint32_t);
                CASE_FIELD_TYPE(FLOAT, Float, float);
                CASE_FIELD_TYPE(DOUBLE, Double, double);
                CASE_FIELD_TYPE(BOOL, Bool, bool);
                CASE_FIELD_TYPE(INT64, Int64, int64_t);
                CASE_FIELD_TYPE(UINT64, UInt64, uint64_t);
#undef CASE_FIELD_TYPE
                case google::protobuf::FieldDescriptor::CPPTYPE_ENUM: {
                    int value = reflection->GetEnum(message, field)->number();
                    int wsize = field->name().size();
                    // write the size of the name in bytes
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));
                    // write the name
                    serialized_string->append(field->name().c_str(), field->name().size());
                    wsize = sizeof(value);
                    // write the size of the value in bytes
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));
                    // write the value
                    serialized_string->append(reinterpret_cast<char*>(&value), sizeof(value));
                    break;
                }
                case google::protobuf::FieldDescriptor::CPPTYPE_STRING: {
                    std::string value = reflection->GetString(message, field);
                    int wsize = field->name().size();
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));
                    serialized_string->append(field->name().c_str(), field->name().size());
                    wsize = value.size();
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));
                    serialized_string->append(value.c_str(), value.size());
                    break;
                }
                // Serialize a nested Message recursively
                case google::protobuf::FieldDescriptor::CPPTYPE_MESSAGE: {
                    std::string value;
                    int wsize = field->name().size();
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));
                    serialized_string->append(field->name().c_str(), field->name().size());
                    const google::protobuf::Message& submessage = reflection->GetMessage(message, field);
                    serialize_message(submessage, &value);
                    wsize = value.size();
                    serialized_string->append(reinterpret_cast<char*>(&wsize), sizeof(wsize));
                    serialized_string->append(value.c_str(), value.size());
                    break;
                }
            }
        }
    }
}
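
Every case above writes the same record layout: a 4-byte name size, the name, a 4-byte value size, and the value. That layout can be isolated in a few plain C++ helpers, sketched below. The names AppendLengthPrefixed, AppendRecord and RawBytes are invented for this illustration and are not part of protobuf. Like the code above, the sketch writes sizes and scalars in host byte order, so the output is only readable on machines with the same endianness and type sizes.

```cpp
#include <cstdint>
#include <string>

// Appends a 4-byte length (host byte order) followed by the raw bytes.
void AppendLengthPrefixed(const std::string& bytes, std::string* out) {
    const std::int32_t size = static_cast<std::int32_t>(bytes.size());
    out->append(reinterpret_cast<const char*>(&size), sizeof(size));
    out->append(bytes);
}

// Appends one [name_size][name][value_size][value] record.
void AppendRecord(const std::string& name, const std::string& value, std::string* out) {
    AppendLengthPrefixed(name, out);
    AppendLengthPrefixed(value, out);
}

// Encodes a fixed-size scalar as its raw in-memory bytes, which is what the
// CASE_FIELD_TYPE macro does for numeric fields.
template <typename T>
std::string RawBytes(const T& value) {
    return std::string(reinterpret_cast<const char*>(&value), sizeof(value));
}
```

With these helpers, a numeric case reduces to AppendRecord(field_name, RawBytes(value), serialized_string), and the string and nested-message cases pass their bytes directly.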

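2. parse_message

parse_message reads the field/value pairs and rebuilds the message object.
The main idea is much like serialize_message: the Descriptor yields the FieldDescriptor of each field, and the SetXXX interfaces of Reflection fill in the message.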

#include <cassert>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <map>
#include <string>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

void parse_message(const std::string& serialized_string, google::protobuf::Message* message) {
    // Get the Descriptor* and the Reflection* from the Message* passed in
    const google::protobuf::Descriptor* descriptor = message->GetDescriptor();
    const google::protobuf::Reflection* reflection = message->GetReflection();
    // Map from field name to FieldDescriptor*
    std::map<std::string, const google::protobuf::FieldDescriptor*> field_map;
    
    for (int i = 0; i < descriptor->field_count(); ++i) {
        const google::protobuf::FieldDescriptor* field = descriptor->field(i);
        field_map[field->name()] = field;
    }
    const google::protobuf::FieldDescriptor* field = NULL;
    size_t pos = 0;
    // Parse the string and fill in the field values.
    // The input is trusted: sizes are not checked against the remaining length.
    while (pos < serialized_string.size()) {
        int name_size = 0;
        std::memcpy(&name_size, serialized_string.data() + pos, sizeof(int)); // memcpy avoids an unaligned read
        pos += sizeof(int);
        
        std::string name = serialized_string.substr(pos, name_size);
        pos += name_size;
        
        int value_size = 0;
        std::memcpy(&value_size, serialized_string.data() + pos, sizeof(int));
        pos += sizeof(int);
   
        std::string value = serialized_string.substr(pos, value_size);
        pos += value_size;
        // Look up the FieldDescriptor* in the map by name
        std::map<std::string, const google::protobuf::FieldDescriptor*>::iterator iter =
            field_map.find(name);
        if (iter == field_map.end()) {
            fprintf(stderr, "no field found.\n");
            continue;
        } else {
            field = iter->second;
        }
        assert(!field->is_repeated());
        switch (field->cpp_type()) {
        // Use the CASE_FIELD_TYPE macro to generate the cases for all simple data types
#define CASE_FIELD_TYPE(cpptype, method, valuetype)\
            case google::protobuf::FieldDescriptor::CPPTYPE_##cpptype: {\
                reflection->Set##method(\
                        message,\
                        field,\
                        *(reinterpret_cast<const valuetype*>(value.c_str())));\
                std::cout << field->name() << "\t" << *(reinterpret_cast<const valuetype*>(value.c_str())) << std::endl;\
                break;\
            }
            CASE_FIELD_TYPE(INT32, Int32, int);
            CASE_FIELD_TYPE(UINT32, UInt32, uint32_t);
            CASE_FIELD_TYPE(FLOAT, Float, float);
            CASE_FIELD_TYPE(DOUBLE, Double, double);
            CASE_FIELD_TYPE(BOOL, Bool, bool);
            CASE_FIELD_TYPE(INT64, Int64, int64_t);
            CASE_FIELD_TYPE(UINT64, UInt64, uint64_t);
#undef CASE_FIELD_TYPE

            case google::protobuf::FieldDescriptor::CPPTYPE_ENUM: {
                const google::protobuf::EnumValueDescriptor* enum_value_descriptor =
                    field->enum_type()->FindValueByNumber(*(reinterpret_cast<const int*>(value.c_str())));
                reflection->SetEnum(message, field, enum_value_descriptor);
                std::cout << field->name() << "\t" << *(reinterpret_cast<const int*>(value.c_str())) << std::endl;
                break;
            }
            
            case google::protobuf::FieldDescriptor::CPPTYPE_STRING: {
                reflection->SetString(message, field, value);
                std::cout << field->name() << "\t" << value << std::endl;
                break;
            }
            // Parse the string of a nested message recursively
            case google::protobuf::FieldDescriptor::CPPTYPE_MESSAGE: {
                google::protobuf::Message* submessage = reflection->MutableMessage(message, field);
                parse_message(value, submessage);
                break;
            }
            default: {
                break;
            }
        }
    }
}
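
parse_message trusts its input: a truncated or corrupted string makes it read past the end of the buffer. If the data may come from an untrusted source, the record reading can be factored into a bounds-checked helper. The sketch below is plain C++ and assumes the same [name_size][name][value_size][value] layout with 4-byte sizes in host byte order; the names ReadLengthPrefixed and ReadRecord are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Reads a 4-byte length-prefixed chunk starting at *pos. On success stores the
// chunk, advances *pos and returns true. Returns false and leaves *pos
// unchanged if the input is truncated or the length is negative.
bool ReadLengthPrefixed(const std::string& input, std::size_t* pos, std::string* chunk) {
    std::int32_t size = 0;
    if (*pos > input.size() || input.size() - *pos < sizeof(size)) return false;
    std::memcpy(&size, input.data() + *pos, sizeof(size));  // no alignment assumptions
    const std::size_t start = *pos + sizeof(size);
    if (size < 0 || input.size() - start < static_cast<std::size_t>(size)) return false;
    chunk->assign(input, start, static_cast<std::size_t>(size));
    *pos = start + static_cast<std::size_t>(size);
    return true;
}

// Reads one [name_size][name][value_size][value] record. *pos only advances
// when the whole record was read successfully.
bool ReadRecord(const std::string& input, std::size_t* pos,
                std::string* name, std::string* value) {
    std::size_t cursor = *pos;
    if (!ReadLengthPrefixed(input, &cursor, name)) return false;
    if (!ReadLengthPrefixed(input, &cursor, value)) return false;
    *pos = cursor;
    return true;
}
```

A parsing loop built on this would call ReadRecord until pos reaches the end of the input and stop with an error as soon as it returns false. For fixed-size scalars it would also be worth checking that value.size() matches the size of the target type before copying the bytes out.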
