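0. How should one-to-many and many-to-many data be stored and implemented in ES 6.X?
The motivating question:

In a headline-news app, news articles and their comments are in a one-to-many relationship.

How should this data be stored in ES 6.X, and how can it be searched and aggregated efficiently?

This article works through the answer.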

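1. Background of the new Join type in ES 6.X
In MySQL, multi-table relations are expressed with LEFT JOIN, JOIN, and similar constructs.

In ES 5.X, multi-table relations were implemented with parent-child documents, which gave join-like functionality. That approach relied on ES 5.X supporting multiple types under a single index.

In ES 6.X, each index supports only a single type, so how to implement a join in ES 6.X became a common concern.

Fortunately, ES 6.X introduced the Join type, which mainly addresses multi-table relations like those in MySQL.

2. Introduction to the ES 6.X Join type
The Join type still works within a single index and still relies on a parent-child relation to achieve something similar to multi-table relations in MySQL.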

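3. The ES 6.X Join type in practice
3.1 Mapping definition for the ES 6.X Join type
The mapping for a Join field looks like this.

Key points:

1) "my_join_field" is the name of the join field.

2) "question": "answer" means that question is the parent of answer.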

PUT my_join_index
 {
   "mappings": {
     "_doc": {
       "properties": {
         "my_join_field": { 
           "type": "join",
           "relations": {
             "question": "answer" 
           }
         }
       }
     }
   }
 }

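3.2 Indexing parent documents with the ES 6.X join type
The simplified shorthand below is the easiest form to follow.

The two requests below index two parent documents.
Both belong to the parent relation "question".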

PUT my_join_index/_doc/1?refresh
 {
   "text": "This is a question",
   "my_join_field": "question" 
 }

PUT my_join_index/_doc/2?refresh
 {
   "text": "This is another question",
   "my_join_field": "question"
 }


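3.3 Indexing child documents with the ES 6.X join type
The routing value is mandatory, because parent and child documents must be indexed on the same shard.
"answer" is the join name (relation) of this child document.
"parent" gives the ID of this child's parent document: 1.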

PUT my_join_index/_doc/3?routing=1&refresh 
 {
   "text": "This is an answer",
   "my_join_field": {
     "name": "answer", 
     "parent": "1" 
   }
 }

PUT my_join_index/_doc/4?routing=1&refresh
 {
   "text": "This is another answer",
   "my_join_field": {
     "name": "answer",
     "parent": "1"
   }
 }
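
As a rough illustration of the routing rule above, the following Python sketch assembles the method, path, and body of a child-document request without sending anything to a cluster. The helper name child_index_request is hypothetical and not part of any Elasticsearch client; it also assumes the join field is called my_join_field, as in this article, and that the routing value defaults to the parent ID, which only holds for direct children of a top-level parent.

```python
import json
from urllib.parse import urlencode


def child_index_request(index, doc_id, parent_id, relation, source, routing=None):
    """Return (method, path, body) for indexing one child document.

    Hypothetical helper for illustration only. routing defaults to
    parent_id, which is correct for a direct child of a top-level parent;
    deeper levels must pass the ID used to route the top-level ancestor.
    """
    params = urlencode({"routing": routing or parent_id, "refresh": "true"})
    body = dict(source)  # copy so the caller's dict is not modified
    body["my_join_field"] = {"name": relation, "parent": parent_id}
    return "PUT", f"/{index}/_doc/{doc_id}?{params}", body


method, path, body = child_index_request(
    "my_join_index", "3", "1", "answer", {"text": "This is an answer"}
)
print(method, path)
print(json.dumps(body, indent=2))
```

The printed path and body correspond to the first child request above, with refresh written out as refresh=true.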

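4. Constraints of the ES 6.X Join type
- Only one Join field mapping is allowed per index.
- Parent and child documents must be indexed on the same shard. This means the same routing value must be supplied when getting, deleting, or updating a child document.
- An element (a relation name) can have multiple children but only one parent.
- New relations can be added to an existing Join field.
- A child can be added to an existing element, but only if that element is already a parent.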
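
To see why the same routing value matters, here is a minimal Python sketch of shard selection. It is a simplification under stated assumptions: Elasticsearch uses its own Murmur3-based routing hash, whereas this sketch substitutes zlib.crc32 purely for illustration, and the function name shard_for_doc is hypothetical. The shard count of 5 is taken from the "_shards" totals in the responses shown in this article.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # matches "_shards.total" in this article's responses


def shard_for_doc(doc_id, routing=None, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard from the routing value, falling back to the document ID.

    zlib.crc32 is a stand-in hash for illustration; Elasticsearch uses a
    different (Murmur3-based) hash internally.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_shards


parent_shard = shard_for_doc("1")              # parent routed by its own ID
child_shard = shard_for_doc("3", routing="1")  # child routed by the parent ID
print(parent_shard == child_shard)             # same routing key, same shard
```

Because the shard depends only on the routing key, a get, update, or delete of child 3 that omits routing=1 would be computed from the key "3" instead and may look on a different shard.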
5. Searching and aggregating with the ES 6.X Join type
5.1 ES 6.X Join: retrieving all documents

GET my_join_index/_search
 {
   "query": {
     "match_all": {}
   },
   "sort": ["_id"]
 }

Response:

 {
   "took": 1,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "skipped": 0,
     "failed": 0
   },
   "hits": {
     "total": 4,
     "max_score": null,
     "hits": [
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "1",
         "_score": null,
         "_source": {
           "text": "This is a question",
           "my_join_field": "question"
         },
         "sort": [
           "1"
         ]
       },
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "2",
         "_score": null,
         "_source": {
           "text": "This is another question",
           "my_join_field": "question"
         },
         "sort": [
           "2"
         ]
       },
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "3",
         "_score": null,
         "_routing": "1",
         "_source": {
           "text": "This is an answer",
           "my_join_field": {
             "name": "answer",
             "parent": "1"
           }
         },
         "sort": [
           "3"
         ]
       },
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "4",
         "_score": null,
         "_routing": "1",
         "_source": {
           "text": "This is another answer",
           "my_join_field": {
             "name": "answer",
             "parent": "1"
           }
         },
         "sort": [
           "4"
         ]
       }
     ]
   }
 }



5.2 ES 6.X: finding child documents by their parent

GET my_join_index/_search
 {
     "query": {
         "has_parent" : {
             "parent_type" : "question",
             "query" : {
                 "match" : {
                     "text" : "This is"
                 }
             }
         }
     }
 }



Response:

{
   "took": 0,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "skipped": 0,
     "failed": 0
   },
   "hits": {
     "total": 2,
     "max_score": 1,
     "hits": [
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "3",
         "_score": 1,
         "_routing": "1",
         "_source": {
           "text": "This is an answer",
           "my_join_field": {
             "name": "answer",
             "parent": "1"
           }
         }
       },
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "4",
         "_score": 1,
         "_routing": "1",
         "_source": {
           "text": "This is another answer",
           "my_join_field": {
             "name": "answer",
             "parent": "1"
           }
         }
       }
     ]
   }
 }





5.3 ES 6.X: finding parent documents by their children

GET my_join_index/_search
 {
 "query": {
         "has_child" : {
             "type" : "answer",
             "query" : {
                 "match" : {
                     "text" : "This is question"
                 }
             }
         }
     }
 }


Response:

{
   "took": 0,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "skipped": 0,
     "failed": 0
   },
   "hits": {
     "total": 1,
     "max_score": 1,
     "hits": [
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "1",
         "_score": 1,
         "_source": {
           "text": "This is a question",
           "my_join_field": "question"
         }
       }
     ]
   }
 }
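
The semantics of the two queries in 5.2 and 5.3 can be mimicked in memory. The Python sketch below is only a toy model under stated assumptions: the functions has_parent, has_child, and matches are hypothetical stand-ins, and matches approximates a match query with its default OR behaviour by checking whether any whitespace-separated query term appears in the text. It does no scoring and no real analysis.

```python
# The four documents indexed in sections 3.2 and 3.3.
DOCS = {
    "1": {"text": "This is a question", "join": {"name": "question"}},
    "2": {"text": "This is another question", "join": {"name": "question"}},
    "3": {"text": "This is an answer", "join": {"name": "answer", "parent": "1"}},
    "4": {"text": "This is another answer", "join": {"name": "answer", "parent": "1"}},
}


def matches(text, query):
    """Crude match query: true if any query term occurs in the text."""
    terms = set(text.lower().split())
    return any(term in terms for term in query.lower().split())


def has_parent(parent_type, query):
    """IDs of children whose parent has the given type and matches the query."""
    hits = []
    for doc_id, doc in DOCS.items():
        parent = DOCS.get(doc["join"].get("parent"))
        if parent and parent["join"]["name"] == parent_type and matches(parent["text"], query):
            hits.append(doc_id)
    return hits


def has_child(child_type, query):
    """IDs of parents with at least one matching child of the given type."""
    hits = set()
    for doc in DOCS.values():
        join = doc["join"]
        if join["name"] == child_type and "parent" in join and matches(doc["text"], query):
            hits.add(join["parent"])
    return sorted(hits)


print(has_parent("question", "This is"))        # ['3', '4']
print(has_child("answer", "This is question"))  # ['1']
```

Document 2 never appears in the has_child result because no child points to it, which is also why the response in 5.3 contains only document 1.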




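5.4 ES 6.X Join aggregation in practice
The request for this step does three things:

1) parent_id is a dedicated query; here it retrieves the child documents of type answer whose parent document has id=1.
2) A terms aggregation named parents groups the hits by the parent ID stored for the parent relation question.
3) A script field named parent returns that parent ID for each hit.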
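
The request body itself is not reproduced in this text. The Python sketch below is a reconstruction, not the author's original request: it follows the pattern in the official join datatype documentation, in which the parent ID of a relation is exposed through a field named after the join field and the parent relation (assumed here to be my_join_field#question). The names parents and parent are taken from the response shown below.

```python
import json

JOIN_FIELD = "my_join_field"
# Assumption: the parent ID is stored in "<join field>#<parent relation>".
PARENT_ID_FIELD = f"{JOIN_FIELD}#question"

search_body = {
    # 1) children of type "answer" whose parent has ID 1
    "query": {"parent_id": {"type": "answer", "id": "1"}},
    # 2) bucket the hits by parent ID
    "aggs": {"parents": {"terms": {"field": PARENT_ID_FIELD, "size": 10}}},
    # 3) return the parent ID with each hit
    "script_fields": {
        "parent": {"script": {"source": f"doc['{PARENT_ID_FIELD}']"}}
    },
}

print("GET my_join_index/_search")
print(json.dumps(search_body, indent=2))
```

The printed output can be pasted into Kibana Console as a single request.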
 

Response:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
     "total": 5,
     "successful": 5,
     "skipped": 0,
     "failed": 0
   },
   "hits": {
     "total": 2,
     "max_score": 0.13353139,
     "hits": [
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "3",
         "_score": 0.13353139,
         "_routing": "1",
         "fields": {
           "parent": [
             "1"
           ]
         }
       },
       {
         "_index": "my_join_index",
         "_type": "_doc",
         "_id": "4",
         "_score": 0.13353139,
         "_routing": "1",
         "fields": {
           "parent": [
             "1"
           ]
         }
       }
     ]
   },
   "aggregations": {
     "parents": {
       "doc_count_error_upper_bound": 0,
       "sum_other_doc_count": 0,
       "buckets": [
         {
           "key": "1",
           "doc_count": 2
         }
       ]
     }
   }
 }




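6. ES 6.X Join: one-to-many in practice
6.1 Defining a one-to-many relation
The mapping below defines one parent relation, question, with two child relations, answer and comment.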

PUT join_ext_index
 {
   "mappings": {
     "_doc": {
       "properties": {
         "my_join_field": {
           "type": "join",
           "relations": {
             "question": ["answer", "comment"]  
           }
         }
       }
     }
   }
 }



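6.2 Defining a multi-level (one-to-many-to-many) relation
The mapping below defines the three-generation relation shown in this diagram.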

    question
     /    \
    /      \
 comment  answer
            |
            |
           vote

 PUT join_multi_index
 {
   "mappings": {
     "_doc": {
       "properties": {
         "my_join_field": {
           "type": "join",
           "relations": {
             "question": ["answer", "comment"],  
             "answer": "vote" 
           }
         }
       }
     }
   }
 }
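
One way to check a relations block like the one above against the rule that an element may have many children but only one parent is to invert it. The Python sketch below is illustrative only; the function names parent_of and ancestors are hypothetical, and the check runs locally on a plain dict rather than against Elasticsearch.

```python
def parent_of(relations):
    """Invert a join "relations" mapping into {child: parent}.

    Raises ValueError if a child is listed under more than one parent.
    """
    result = {}
    for parent, children in relations.items():
        if isinstance(children, str):  # "answer": "vote" shorthand
            children = [children]
        for child in children:
            if child in result:
                raise ValueError(f"{child!r} has more than one parent")
            result[child] = parent
    return result


def ancestors(name, relations):
    """Relation names from the direct parent up to the root."""
    parents = parent_of(relations)
    chain = []
    while name in parents:
        name = parents[name]
        chain.append(name)
    return chain


relations = {"question": ["answer", "comment"], "answer": "vote"}
print(parent_of(relations))
print(ancestors("vote", relations))  # ['answer', 'question']
```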



A grandchild document is indexed as follows:

PUT join_multi_index/_doc/3?routing=1&refresh 
 {
   "text": "This is a vote",
   "my_join_field": {
     "name": "vote",
     "parent": "2" 
   }
 }



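Notes:

- The grandchild document must live on the same shard as its parent and grandparent, which is why it is routed with routing=1.
- The parent value of the grandchild must point to its direct parent, an answer document (ID 2 here), not to the grandparent.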
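
The two notes can be combined into a small rule: "parent" names the direct parent, while the routing value comes from the top of the chain. The Python sketch below illustrates that rule under assumptions that this article does not spell out: document 1 is taken to be a question and document 2 an answer whose parent is 1, and the top-level document is assumed to be routed by its own ID. The function name routing_for is hypothetical.

```python
def routing_for(doc_id, parent_ids):
    """Follow parent links to the top-level ancestor and return its ID.

    parent_ids maps a document ID to the ID of its direct parent.
    Assumes the top-level document was indexed with default routing,
    that is, routed by its own ID.
    """
    seen = set()
    current = doc_id
    while current in parent_ids:
        if current in seen:
            raise ValueError("cycle in parent links")
        seen.add(current)
        current = parent_ids[current]
    return current


# Assumed layout: answer 2 -> question 1, vote 3 -> answer 2.
parent_ids = {"2": "1", "3": "2"}

print(routing_for("3", parent_ids))  # 1
print(parent_ids["3"])               # 2, the value for "parent"
```

For the vote document this yields routing=1 and "parent": "2", matching the request above.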

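7. Summary
The official ES documentation is already quite detailed; see:
http://t.cn/RnBBLgp

Even so, typing the examples out and translating them by hand really does refresh and deepen your understanding.

Let's dig into the ELK Stack together!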