Elasticsearch 7.0 Relational Queries: Parent-Child Documents

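Elasticsearch 7 drops the type level (roughly the equivalent of a table in a relational database or a collection in MongoDB), so all documents sit side by side in the same index. For one-to-many relationships, ES7 offers two approaches:

  • Parent-child documents: all documents are stored at the same level, and a special field type, join, expresses the hierarchy
  • Nested documents: similar to a nested array in JSON; the field type must be declared as nested

This article covers parent-child documents.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
Project repository: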
https://gitee.com/xiiiao/es-learning.git

Creating the index

PUT my-index-join_family
{
	"mappings":{
		"properties":{
			"my_id":{
				"type":"keyword"
			},
			"name":{
			  "type":"keyword"
			},
			"level":{
			  "type":"keyword"
			},
			"join_filed":{  // name of the join field; any name works (this article uses join_filed throughout)
				"type":"join", // the type must be join
				"relations":{ // defines the hierarchy: grand_parent -> parent -> child
					"grand_parent":"parent",  
					"parent":"child"
				}
			}
		}
	}
}

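The mapping above defines a grandparent -> parent -> child relationship. A parent can have more than one child relation; in that case the children are declared as an array, for example "parent": ["child", "other_child"] (other_child is a hypothetical relation name).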

Inserting the top-level (grandparent) node

Rest API
PUT my-index-join_family/_doc/1?refresh
{
  "my_id": "1",
  "name": "grandPa",
  "join_filed": { // marks this document as belonging to the grand_parent level
    "name": "grand_parent" 
  }
}
RestHighLevelClient implementation
public void addGrandPa( String name) {
        String id = UUID.randomUUID().toString();

        JoinFamily member = new JoinFamily();
        member.setName(name);
        member.setLevel("1");
        member.setMy_id(id);
        JoinField joinField = new JoinField();
        joinField.setName("grand_parent");
        member.setJoin_filed(joinField);
     
        String source = JSON.toJSONString(member);
        log.info("source: " + source);
        IndexRequest indexRequest = new IndexRequest("my-index-join_family").id(id).source(source, XContentType.JSON)
                .setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
     
        try {
            IndexResponse out = client.index(indexRequest, RequestOptions.DEFAULT);
            log.info(out.getId());
        } catch (IOException e) {
            log.error("", e);
        }

    }

Inserting a parent node

Rest API
PUT my-index-join_family/_doc/2?routing=1&refresh
{
  "my_id": "2",
  "name": "parent",
  "join_filed": { // marks this document as belonging to the parent level
    "name": "parent" ,
    "parent":"1"  // ID of the parent document; the key name "parent" is fixed
  }
}

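A child document must live on the same shard as its parent, so the routing parameter is required. Here it is set to the ID of the parent document (1).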
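To see why the routing value matters, here is a minimal sketch of routing-based shard selection. Elasticsearch derives the target shard from a hash of the routing value, which defaults to the document ID. The sketch substitutes Java's String.hashCode() for the real hash function, and the class name, method name and shard count are assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of routing-based shard selection; not Elasticsearch's real hash function. */
final class RoutingSketch {

    /** Picks a shard from the routing value. String.hashCode() stands in for the real hash. */
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5; // assumed shard count for the illustration

        // document id -> routing value used when indexing
        Map<String, String> routingByDocId = new LinkedHashMap<>();
        routingByDocId.put("1", "1"); // grand_parent: routing defaults to its own id
        routingByDocId.put("2", "1"); // parent indexed with ?routing=1
        routingByDocId.put("3", "1"); // child indexed with ?routing=1 as well

        routingByDocId.forEach((id, routing) ->
                System.out.println("doc " + id + " -> shard " + shardFor(routing, shards)));

        // Without an explicit routing value, doc 2 would be placed by its own id
        // and could end up on a different shard than its parent.
        System.out.println("doc 2 without routing -> shard " + shardFor("2", shards));
    }
}
```

Because every member of the family shares the routing value 1, they all map to the same shard, which is what the join field relies on.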

RestHighLevelClient implementation
public void addParent(String parentId, String name) {
        String id = UUID.randomUUID().toString();

        JoinFamily member = new JoinFamily();
        member.setName(name);
        member.setLevel("2");
        member.setMy_id(id);
        JoinField joinField = new JoinField();
        joinField.setName("parent");
        joinField.setParent(parentId);
        member.setJoin_filed(joinField);
     
        String source = JSON.toJSONString(member);
        log.info("source: " + source);
        IndexRequest indexRequest = new IndexRequest("my-index-join_family").id(id).source(source, XContentType.JSON)
                .setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE)
                .routing(parentId);
     
        try {
            IndexResponse out = client.index(indexRequest, RequestOptions.DEFAULT);
            log.info(out.getId());
        } catch (IOException e) {
            log.error("", e);
        }

    }

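A child node is inserted the same way as a parent-level node, so it is not repeated here. One caveat: the whole family has to stay on one shard, so a child must be routed with the same value as its parent (in this example the grand_parent's ID), while join_filed.parent holds the ID of its direct parent.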
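As a sketch of what a child request could look like, the helper below assembles the request line and JSON body for a join document by hand. The class and method names are hypothetical, and the IDs (3 for the child, 2 for its parent, 1 for the grandparent) are assumed for illustration. The point it shows is that routing carries the grandparent's ID while join_filed.parent carries the direct parent's ID.

```java
/** Builds request lines and bodies for join documents; names and ids are illustrative. */
final class JoinDocSketch {

    /** Request line: every document in one family tree is routed by the root (grand_parent) id. */
    static String requestLine(String index, String id, String rootId) {
        return "PUT " + index + "/_doc/" + id + "?routing=" + rootId + "&refresh";
    }

    /**
     * JSON body; parentId may be null for the top-level document.
     * Values are assumed not to need JSON escaping.
     */
    static String body(String id, String name, String relation, String parentId) {
        StringBuilder json = new StringBuilder();
        json.append("{\"my_id\":\"").append(id).append("\",");
        json.append("\"name\":\"").append(name).append("\",");
        json.append("\"join_filed\":{\"name\":\"").append(relation).append("\"");
        if (parentId != null) {
            json.append(",\"parent\":\"").append(parentId).append("\"");
        }
        json.append("}}");
        return json.toString();
    }

    public static void main(String[] args) {
        // child "3" belongs to parent "2", whose own parent is grand_parent "1"
        System.out.println(requestLine("my-index-join_family", "3", "1"));
        System.out.println(body("3", "child", "child", "2"));
    }
}
```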

Query APIs

Parent-Id-Query

As the name suggests, this query returns the child documents of a given parent ID.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-parent-id-query.html#query-dsl-parent-id-query

Rest API
GET /my-index-join_family/_search
{
  "query": {
      "parent_id": {
          "type": "parent", // relation name of the child documents
          "id": "1" // ID of the parent document
      }
  }
}
RestHighLevelClient implementation
@Test
    public void testParentId() throws IOException {
        SearchRequest search= new SearchRequest("my-index-join_family");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        ParentIdQueryBuilder build = JoinQueryBuilders.parentId("parent","1");
        searchSourceBuilder.query(build);
        search.source(searchSourceBuilder);
        SearchResponse response=client.search(search, RequestOptions.DEFAULT);
        response.getHits().forEach(hi ->{
            System.out.println(hi.getSourceAsString());
        });

    }

Has-Parent

Returns the child documents whose parent matches the given query.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-has-parent-query.html

Rest API
GET /my-index-join_family/_search
{
  "query": {
    "has_parent": {
      "parent_type": "grand_parent",
      "query": {
        "match": {
          "name": "grandPa"
        }
      }
    }
  }
}
RestHighLevelClient implementation
public void testHasParent() throws IOException {
        SearchRequest search= new SearchRequest("my-index-join_family");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        HasParentQueryBuilder build= JoinQueryBuilders.hasParentQuery("grand_parent", QueryBuilders.matchQuery("name", "grandPa"), false);
        searchSourceBuilder.query(build);
        search.source(searchSourceBuilder);
        SearchResponse response=client.search(search, RequestOptions.DEFAULT);
        response.getHits().forEach(hi ->{
            System.out.println(hi.getSourceAsString());
        });

    }

As you can see, the parent_id query is effectively a special case of has_parent.
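The relationship between the two queries can be illustrated with a toy in-memory model. This is a sketch of the query semantics only, not of how Elasticsearch executes them, and the Doc class and all sample data are invented for illustration. parent_id selects children by their parent's ID, which is what has_parent does when its inner query matches exactly one parent by ID.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy in-memory model of join documents; illustrates query semantics only. */
final class JoinQuerySketch {

    static final class Doc {
        final String id, name, relation, parentId;
        Doc(String id, String name, String relation, String parentId) {
            this.id = id; this.name = name; this.relation = relation; this.parentId = parentId;
        }
    }

    /** parent_id: documents of relation childType whose parent has the given id. */
    static List<String> parentId(List<Doc> docs, String childType, String id) {
        return docs.stream()
                .filter(d -> d.relation.equals(childType) && id.equals(d.parentId))
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    /** has_parent: documents whose parent has relation parentType and matches the inner query. */
    static List<String> hasParent(List<Doc> docs, String parentType, Predicate<Doc> query) {
        List<String> matchingParents = docs.stream()
                .filter(d -> d.relation.equals(parentType) && query.test(d))
                .map(d -> d.id)
                .collect(Collectors.toList());
        return docs.stream()
                .filter(d -> d.parentId != null && matchingParents.contains(d.parentId))
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    /** Invented sample data: two families. */
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("1", "grandPa", "grand_parent", null),
                new Doc("2", "parent", "parent", "1"),
                new Doc("3", "parent2", "parent", "1"),
                new Doc("4", "child", "child", "2"),
                new Doc("5", "grandMa", "grand_parent", null),
                new Doc("6", "parent3", "parent", "5"));
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        // Both calls select the same documents: the parents under grand_parent "1".
        System.out.println(parentId(docs, "parent", "1"));
        System.out.println(hasParent(docs, "grand_parent", d -> d.id.equals("1")));
        // has_parent is more general: the inner query can match on any field.
        System.out.println(hasParent(docs, "grand_parent", d -> d.name.equals("grandPa")));
    }
}
```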

Has-Child

Returns the parent documents that have child documents matching the given query.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-has-child-query.html

Rest API
GET /my-index-join_family/_search
{
  "query": {
    "has_child": {
      "type": "parent",
      "query": {
        "match_all": {}
      }
      
    }
  }
}
RestHighLevelClient implementation
public void testHasChild() throws IOException {
        SearchRequest search= new SearchRequest("my-index-join_family");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        HasChildQueryBuilder build=JoinQueryBuilders.hasChildQuery("parent", QueryBuilders.matchQuery("name", "parent5"), ScoreMode.None);
        searchSourceBuilder.query(build);
        search.source(searchSourceBuilder);
        SearchResponse response=client.search(search, RequestOptions.DEFAULT);
        response.getHits().forEach(hi ->{
            System.out.println(hi.getSourceAsString());
        });

    }

Querying a specific level

Besides the queries above, the documents at one specific level can be fetched like this:

GET my-index-join_family/_search
{
  "query": {
    "match": {
      "join_filed": "parent" // parent here is the relation name defined under relations
    }
  }
}

Aggregations

Aggregating related documents is a very common requirement. For parent-child documents this requires a dedicated aggregation type, children.

Rest API
GET my-index-join_family/_search
{
  "size": 0, 
  "query": {
    "match": {
      "join_filed": "parent"
    }
  },
  "aggs": {
    "parent_agg": { // level 1: one bucket per parent document
      "terms": {
        "field": "my_id",
        "size": 10
      },
      "aggs": {
        "child_agg": {
          "children": { // level 2: the child documents of each parent
            "type": "child"
          }
        }
      
      }
    }
    
  }
}

The first level buckets by parent document and the second level aggregates the children of each parent. The result looks like this:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "parent_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "0088519a-797f-4155-868c-cabb5e9a8a9e",
          "doc_count" : 1,
          "child_agg" : {
            "doc_count" : 0
          }
        },
        {
          "key" : "0893e4fd-daf6-48ca-8f96-ee5242aaea42",
          "doc_count" : 1,
          "child_agg" : {
            "doc_count" : 2
          }
        },
        {
          "key" : "136fa236-d3f2-4a19-98d8-a8460de531c3",
          "doc_count" : 1,
          "child_agg" : {
            "doc_count" : 1
          }
        }
      ]
    }
  }
}
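To make the two levels concrete, the sketch below reproduces the same bucketing on a handful of in-memory records: one bucket per parent (keyed by my_id), then a count of the child documents that point at that parent. It only mimics the shape of the result above; the class name, the data layout and the sample IDs (p1, c1 and so on) are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy version of a terms aggregation with a nested children count. */
final class ChildrenAggSketch {

    /** One bucket per parent id, each holding the number of children that reference it. */
    static Map<String, Long> childCountPerParent(List<String> parentIds,
                                                 Map<String, String> childToParent) {
        Map<String, Long> buckets = new LinkedHashMap<>();
        for (String parentId : parentIds) {
            long count = childToParent.values().stream()
                    .filter(parentId::equals)
                    .count();
            buckets.put(parentId, count);
        }
        return buckets;
    }

    /** Runs the aggregation on invented sample data. */
    static Map<String, Long> demo() {
        List<String> parents = Arrays.asList("p1", "p2", "p3");

        Map<String, String> childToParent = new LinkedHashMap<>();
        childToParent.put("c1", "p2");
        childToParent.put("c2", "p2");
        childToParent.put("c3", "p3");

        return childCountPerParent(parents, childToParent);
    }

    public static void main(String[] args) {
        demo().forEach((parentId, children) ->
                System.out.println(parentId + " child_agg.doc_count=" + children));
    }
}
```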

For further processing, more aggregations can be nested inside:

GET my-index-join_family/_search
{
  "size": 0, 
  "query": {
    "match": {
      "join_filed": "parent"
    }
  },
  "aggs": {
    "parent_agg": {
      "terms": {
        "field": "my_id",
        "size": 10
      },
      "aggs": {
        "child_agg": {
          "children": {
            "type": "child"
          },
          "aggs": {
            "child_name": {
              "scripted_metric": {
                "init_script": "state.transactions = ''",
                "map_script": "state.transactions=state.transactions+' '+doc.name",
                "combine_script": " return state.transactions",
                "reduce_script": "String profit = ''; for (a in states) { profit += a } return profit"
              }
            }
          }
        }
      
      }
    }
    
  }
}
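A scripted_metric aggregation runs in four phases: init_script and map_script run on each shard (map once per collected document), combine_script turns each shard's state into a partial result, and reduce_script merges the partial results on the coordinating node. The following sketch walks through those phases in plain Java, with invented child names split across two imaginary shards. It mirrors the string concatenation used in the request above, with plain strings standing in for doc.name, and is not how Elasticsearch actually executes scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy walk-through of the scripted_metric phases; class and data are illustrative. */
final class ScriptedMetricSketch {

    /** init + map + combine for one shard: concatenates the names seen on that shard. */
    static String runShard(List<String> namesOnShard) {
        String transactions = "";                      // init_script: state.transactions = ''
        for (String name : namesOnShard) {
            transactions = transactions + " " + name;  // map_script: runs once per document
        }
        return transactions;                           // combine_script: return state.transactions
    }

    /** reduce_script: merges the per-shard results on the coordinating node. */
    static String reduce(List<String> states) {
        String profit = "";
        for (String a : states) {
            profit += a;
        }
        return profit;
    }

    /** Runs all phases over two imaginary shards with invented child names. */
    static String demo() {
        List<String> states = new ArrayList<>();
        states.add(runShard(Arrays.asList("child1", "child2")));
        states.add(runShard(Arrays.asList("child3")));
        return reduce(states);
    }

    public static void main(String[] args) {
        System.out.println("child_name:" + demo());
    }
}
```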
RestHighLevelClient implementation
@Test
    public void testAggChild() throws IOException {
        SearchRequest search = new SearchRequest("my-index-join_family");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders.matchQuery("join_filed", "parent"));
        AggregationBuilder build = AggregationBuilders.terms("parent_agg").field("my_id")
                .subAggregation(JoinAggregationBuilders.children("child_agg", "child"));
        searchSourceBuilder.aggregation(build);
        search.source(searchSourceBuilder);
        SearchResponse response = client.search(search, RequestOptions.DEFAULT);
        Map<String, Aggregation> map = response.getAggregations().getAsMap();
        Terms terms = (Terms) map.get("parent_agg");
        terms.getBuckets().forEach(bucket -> {
            System.out.println(bucket.getKeyAsString() + " " + bucket.getDocCount());
            Map<String, Aggregation> subMap = bucket.getAggregations().getAsMap();
            Children children = (Children) subMap.get("child_agg");
            System.out.println("childCount " + children.getDocCount());
        });

    }

    @Test
    public void testScriptedMetric() throws IOException {
        SearchRequest search = new SearchRequest("my-index-join_family");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders.matchQuery("join_filed", "parent"));
        AggregationBuilder build = AggregationBuilders.terms("parent_agg").field("my_id")
                .subAggregation(JoinAggregationBuilders.children("child_agg", "child")
                        .subAggregation(AggregationBuilders.scriptedMetric("child_name")
                                .initScript(new Script("state.transactions = []"))
                                .mapScript(new Script("state.transactions.add(doc.name)"))
                                .combineScript(new Script("String profit =''; for (t in state.transactions) { profit += t } return profit"))
                                .reduceScript(new Script("String profit = ''; for (a in states) { profit += a } return profit")))
                );
        searchSourceBuilder.aggregation(build);
        search.source(searchSourceBuilder);
        SearchResponse response = client.search(search, RequestOptions.DEFAULT);
        Map<String, Aggregation> map = response.getAggregations().getAsMap();
        Terms terms = (Terms) map.get("parent_agg");
        terms.getBuckets().forEach(bucket -> {
            System.out.println(bucket.getKeyAsString() + " " + bucket.getDocCount());
            Map<String, Aggregation> subMap = bucket.getAggregations().getAsMap();
            Children children = (Children) subMap.get("child_agg");
            System.out.println("childCount " + children.getDocCount());
            Map<String, Aggregation> subSubMap = children.getAggregations().getAsMap();
            ScriptedMetric metric = (ScriptedMetric) subSubMap.get("child_name");
            System.out.println("childName " + metric.aggregation());
        });

    }