Overview of the join type

Background

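Motivating question: in a headline-news app, news articles and their comments have a one-to-many relationship. How should this data be stored in ES 6.x, and how can it be searched and aggregated efficiently?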

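1. Background of the new join type in ES 6.x

  • In MySQL, multi-table relations are queried with LEFT JOIN, JOIN, and similar clauses.
  • ES 5.x implemented multi-table relations with parent-child documents, similar to a database join; this relied on ES 5.x allowing multiple types under a single index.
  • ES 6.x allows only a single type per index, so that approach no longer works.
  • How to implement a join in ES 6.x therefore became the open question.

ES 6.x provides the join field type mainly to solve this kind of multi-table relation problem, as found in MySQL.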

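2. Introduction to the join type

The join type still works within a single index and uses a parent-child relationship to achieve something similar to multi-table relations in MySQL.

3. Mapping definition for the join type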

PUT my_index
{
  "mappings": {
    "docs": {
      "properties": {
          "id": {
            "type": "long"
          },
          "my_join_field": { <1>
            "type": "join",
            "eager_global_ordinals": true,
            "relations": {
              "question": "answer" <2>
            }
          },
          "text": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
    }
  }
}

<1> The name of the join field.

<2> Declares question as the parent of answer.
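
The mapping above can also be generated programmatically. The following is a minimal sketch, assuming a hypothetical helper named build_join_mapping (not part of any Elasticsearch client); it only builds the request body as a Python dict and sends nothing to a cluster.

```python
import json


def build_join_mapping(doc_type, join_field, relations):
    """Build an ES 6.x index body containing a single join field.

    relations maps each parent name to a child name (or a list of child names).
    Hypothetical helper for illustration only.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    join_field: {
                        "type": "join",
                        "eager_global_ordinals": True,
                        "relations": dict(relations),
                    }
                }
            }
        }
    }


# Same type name, field name, and relation as in the mapping above.
body = build_join_mapping("docs", "my_join_field", {"question": "answer"})
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of the PUT my_index request; the id and text fields from the original mapping would need to be added to "properties" separately.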


4. Inserting parent documents

PUT my_index/docs/1?refresh
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

PUT my_index/docs/2?refresh
{
  "text": "This is a another question",
  "my_join_field": {
    "name": "question"
  }
}

PUT my_index/docs/_bulk?refresh
{"index": {"_id": 3}}
{"id":3, "text": "question 3333", "my_join_field": {"name": "question"}}
{"index": {"_id": 4}}
{"id":4, "text": "question 4444", "my_join_field": {"name": "question"}}

All of these documents use the parent relation name "question".

5. Inserting child documents

PUT my_index/docs/5?routing=1&refresh <1>
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", <2>
    "parent": "1" <3>
  }
}

PUT my_index/docs/6?routing=1&refresh
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

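<1> The routing value is mandatory, because parent and child documents must be indexed on the same shard.

<2> "answer" is the join name of this child document; it marks the document as a child.

<3> The ID of this child document's parent: 1.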
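
Why the routing value matters can be illustrated with a small model of shard selection. This is only a sketch under stated assumptions: the shard is taken as hash(routing) modulo the number of primary shards, the routing value defaults to the document id, and zlib.crc32 stands in for the hash that Elasticsearch actually uses. The function name shard_for is hypothetical.

```python
import zlib

NUM_SHARDS = 5  # same as "_shards.total" in the sample responses


def shard_for(doc_id, routing=None, num_shards=NUM_SHARDS):
    """Pick a shard from the routing value, falling back to the document id."""
    key = routing if routing is not None else doc_id
    # Stand-in hash (assumption): crc32 keeps the sketch dependency-free;
    # Elasticsearch itself hashes the routing value differently.
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Parent 1 is routed by its own id; children 5 and 6 are sent with routing=1.
parent_shard = shard_for("1")
child_shards = [shard_for(child_id, routing="1") for child_id in ("5", "6")]
print(parent_shard, child_shards)
```

Because both children are hashed on the parent's id rather than their own, they always land on the parent's shard, whatever hash function is used. Without the routing parameter, a child would be hashed on its own id and could end up on a different shard.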


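6. Other constraints when using the join type

  • Only one join field mapping is allowed per index.
  • Parent and child documents must be indexed on the same shard; this means the same routing value must be provided when getting, deleting, or updating a child document.
  • A document can have multiple children but only one parent.
  • New relations can be added to an existing join field.
  • Once a document is already a parent, child documents can still be added to it.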
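
Some of these constraints can be checked on the client before a document is sent. The sketch below is a hypothetical pre-flight check, not an Elasticsearch API: RELATIONS, parent_name_of, and check_join_value are names made up for illustration, and the rules encoded are only the ones stated in this article (a child needs a parent id and a routing value).

```python
# Parent name -> child names, mirroring the "relations" entry of the mapping.
RELATIONS = {"question": ["answer"]}


def parent_name_of(name, relations=RELATIONS):
    """Return the parent relation of name, or None if name is a top-level parent."""
    for parent, children in relations.items():
        if name in children:
            return parent
    return None


def check_join_value(join_value, routing=None, relations=RELATIONS):
    """Return a list of problems for one join field value (empty list if OK)."""
    known = set(relations) | {c for cs in relations.values() for c in cs}
    name = join_value.get("name")
    if name not in known:
        return ["unknown relation name: %r" % (name,)]
    problems = []
    if parent_name_of(name, relations) is not None:
        # Child documents need both a parent id and a routing value.
        if "parent" not in join_value:
            problems.append("child document is missing 'parent'")
        if routing is None:
            problems.append("child document is missing a routing value")
    return problems


print(check_join_value({"name": "question"}))
print(check_join_value({"name": "answer", "parent": "1"}, routing="1"))
print(check_join_value({"name": "answer"}))
```

The first two calls correspond to the parent and child documents indexed earlier and report no problems; the third shows a child that lacks both its parent id and its routing value.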

7. Searching and aggregating with the join type

7.1 Search all documents

GET my_index/docs/_search

The result is:

{
  "took": 145,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "4",
        "_score": 1,
        "_source": {
          "id": 4,
          "text": "question 4444",
          "my_join_field": {
            "name": "question"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "This is a another question",
          "my_join_field": {
            "name": "question"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a question",
          "my_join_field": {
            "name": "question"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "5",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "6",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "3",
        "_score": 1,
        "_source": {
          "id": 3,
          "text": "question 3333",
          "my_join_field": {
            "name": "question"
          }
        }
      }
    ]
  }
}

7.2 Finding child documents by their parent

GET my_index/docs/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match": {
          "text": "this is"
        }
      }
    }
  }
}

Returned result set:

{
  "took": 161,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "5",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "6",
        "_score": 1,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      }
    ]
  }
}

7.3 Finding parent documents by their children

GET my_index/docs/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "text": "this is"
        }
      }
    }    
  }
}

Returned result set:

{
  "took": 286,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a question",
          "my_join_field": {
            "name": "question"
          }
        }
      }
    ]
  }
}

7.4 Finding the child documents of a given parent ID

GET /my_index/docs/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "1"
    }
  }
}

Result set:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.13353139,
    "hits": [
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "5",
        "_score": 0.13353139,
        "_routing": "1",
        "_source": {
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      },
      {
        "_index": "my_index",
        "_type": "docs",
        "_id": "6",
        "_score": 0.13353139,
        "_routing": "1",
        "_source": {
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        }
      }
    ]
  }
}
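
The semantics of the three queries in 7.2 to 7.4 can be modeled in memory over the six sample documents. This is a rough sketch, not how Elasticsearch evaluates them: the matches function is a crude stand-in for a match query (any query word present in the text), scoring is ignored, and all function names are hypothetical.

```python
# The six sample documents, keyed by _id ("join" stands for my_join_field).
DOCS = {
    "1": {"text": "This is a question", "join": {"name": "question"}},
    "2": {"text": "This is a another question", "join": {"name": "question"}},
    "3": {"text": "question 3333", "join": {"name": "question"}},
    "4": {"text": "question 4444", "join": {"name": "question"}},
    "5": {"text": "This is an answer", "join": {"name": "answer", "parent": "1"}},
    "6": {"text": "This is another answer", "join": {"name": "answer", "parent": "1"}},
}


def matches(doc, words):
    """Crude stand-in for a match query: any query word appears in the text."""
    tokens = doc["text"].lower().split()
    return any(word in tokens for word in words.lower().split())


def has_parent(parent_type, words):
    """Ids of children whose parent has name parent_type and matches words."""
    return sorted(
        doc_id for doc_id, doc in DOCS.items()
        if "parent" in doc["join"]
        and DOCS[doc["join"]["parent"]]["join"]["name"] == parent_type
        and matches(DOCS[doc["join"]["parent"]], words)
    )


def has_child(child_type, words):
    """Ids of parents that have at least one matching child of child_type."""
    return sorted({
        doc["join"]["parent"] for doc in DOCS.values()
        if doc["join"]["name"] == child_type and matches(doc, words)
    })


def parent_id(child_type, pid):
    """Ids of children of child_type whose parent id is pid."""
    return sorted(
        doc_id for doc_id, doc in DOCS.items()
        if doc["join"]["name"] == child_type and doc["join"].get("parent") == pid
    )


print(has_parent("question", "this is"))  # ['5', '6']
print(has_child("answer", "this is"))     # ['1']
print(parent_id("answer", "1"))           # ['5', '6']
```

The three printed lists contain the same document ids as the hits in the responses above. Question 2 also matches "this is" but has no children, which is why has_parent returns only the children of question 1.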

7.5 Aggregations

Aggregations are not covered in detail here; see the later chapter on aggregations for detailed usage.