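Continuing from the previous post
[ES] Joins in ES, approach 1 (Nested type, Java implementation based on version 6.3):
The previous post said that ES offers two ways to implement a join and covered the Nested type. This post covers the other one: a join field that links separate documents, where the parent and child relations you declare define the parent-child relationship.
Wait, why do we need a parent-child relationship at all?
A business scenario makes it clear.
I have two tables:
Table A: article content
Table B: basic information about the users who publish articles
Requirement: find the users who have mentioned "NBA" (the NBA has been very "hot" lately...), return their basic information, and rank them by how strongly each one is related to the NBA.
With a database this is simply: run like '%NBA%' over all articles, group the matching articles by user ID, then join the user information table to get the data we want (for scoring, we can just count how many of a user's articles mention NBA).
A quick SQL sketch: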
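-- The larger cnt is, the higher the rank, because that user has more articles mentioning NBA
select b.* from
(select
user_id, count(1) as cnt
from article_content
where content like '%NBA%'
group by user_id) a
inner join
user_base_info b
on a.user_id = b.user_id
order by a.cnt desc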
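Note that what comes back is the users' basic information, that is, table B.
I don't need to know which articles a user published or what they say (later steps might need that, but this requirement does not).
Translated into ES terms:
the parent document is the user's basic information, the child documents are the articles that user published, and we query the child documents but return the parent documents (parent to child is one-to-many).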
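To make "query the children, return the parents" concrete, here is a minimal in-memory sketch in plain Java. It is not Elasticsearch code: the class `HasChildSketch`, the `Article` type, the method `usersMentioning`, and the user IDs are all hypothetical names made up for illustration, and a substring test stands in for full-text matching. Ranking by the number of matching articles mirrors the SQL count above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration only; this is not Elasticsearch code. */
final class HasChildSketch {

    /** A child document: an article that points at its parent user. */
    static final class Article {
        final String userId;
        final String content;

        Article(String userId, String content) {
            this.userId = userId;
            this.content = content;
        }
    }

    /**
     * Returns the IDs of users with at least one article containing the keyword,
     * ordered by the number of matching articles (highest first).
     */
    static List<String> usersMentioning(List<Article> articles, String keyword) {
        Map<String, Integer> matchesPerUser = new LinkedHashMap<>();
        for (Article a : articles) {
            if (a.content.contains(keyword)) {
                matchesPerUser.merge(a.userId, 1, Integer::sum);
            }
        }
        List<String> userIds = new ArrayList<>(matchesPerUser.keySet());
        userIds.sort(Comparator.comparing((String id) -> matchesPerUser.get(id)).reversed());
        return userIds;
    }

    public static void main(String[] args) {
        List<Article> articles = new ArrayList<>();
        articles.add(new Article("u1", "NBA finals recap"));
        articles.add(new Article("u2", "NBA draft notes"));
        articles.add(new Article("u2", "More NBA trades"));
        articles.add(new Article("u3", "Cooking at home"));
        System.out.println(usersMentioning(articles, "NBA")); // prints [u2, u1]
    }
}
```

Only user IDs come back, never the articles themselves, which is the shape of result the requirement asks for.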
The hierarchy used in the example below:
user_base
|
|
article
|
|
vote
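A requirement like this can be handled with has_child:
Note: a parent and its children must be indexed with the same routing value so that they end up on the same shard; otherwise the join may fail to find them, or the results may not be unique.
So the usual choice is routing=<ID of the join key>, which places all related documents under the same routing.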
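Why a shared routing value keeps a family together can be sketched in a few lines of Java. Elasticsearch picks the shard as a hash of the routing value modulo the number of primary shards; the sketch below assumes `String.hashCode()` as a stand-in for the real hash function, and the class `RoutingSketch`, the method `shardFor`, and the shard count of 5 are hypothetical.

```java
/** Illustration of routing-based shard selection; not Elasticsearch's actual hash. */
final class RoutingSketch {

    /**
     * Picks a shard from a routing value. String.hashCode() is only a stand-in:
     * Elasticsearch uses its own hash function internally.
     */
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5; // hypothetical number of primary shards

        // A user, one of its articles, and a vote on that article all indexed with routing=1.
        int parent = shardFor("1", shards);
        int child = shardFor("1", shards);
        int grandchild = shardFor("1", shards);
        System.out.println(parent == child && child == grandchild); // prints true

        // Routing each document by its own ID instead may scatter the family across shards.
        System.out.println(shardFor("1", shards) + " " + shardFor("3", shards) + " " + shardFor("5", shards));
    }
}
```

The same routing value always maps to the same shard, which is why grandchildren should also be indexed with the top-level parent's routing value rather than their direct parent's ID.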
On routing: (author: 天戈朱)
## Create an index with a 3-level hierarchy
PUT three_tree_index
{
"mappings": {
"_doc": {
"properties": {
"user_name": {
"type": "text"
},
"age": {
"type": "keyword"
},
"my_join_field": {
"type": "join",
"relations": {
"user_base": "article",
"article": "vote"
}
},
"stars": {
"type": "short"
},
"article_desc": {
"type": "text"
}
}
}
}
}
## Insert 6 documents
PUT three_tree_index/_doc/1?routing=1&refresh
{
"user_name":"xiaoming",
"age":29,
"my_join_field":"user_base"
}
PUT three_tree_index/_doc/2?routing=2&refresh
{
"user_name":"xiaohong",
"age":32,
"my_join_field":"user_base"
}
PUT three_tree_index/_doc/3?routing=1&refresh
{
"article_desc":"xiaoming,article_desc_1",
"my_join_field":{
"name":"article",
"parent":"1"
}
}
PUT three_tree_index/_doc/4?routing=2&refresh
{
"article_desc":"xiaohong,article_desc_1",
"my_join_field":{
"name":"article",
"parent":"2"
}
}
PUT three_tree_index/_doc/5?routing=1&refresh
{
"stars":5,
"my_join_field":{
"name":"vote",
"parent":"3"
}
}
PUT three_tree_index/_doc/6?routing=2&refresh
{
"stars":3,
"my_join_field":{
"name":"vote",
"parent":"4"
}
}
1. Find the users whose articles contain "xiaoming"
## Find the users whose article content contains "xiaoming"
GET three_tree_index/_search
{
"query": {
"has_child": {
"type": "article",
"query": {
"match": {
"article_desc": "xiaoming"
}
}
}
}
}
## Adding "inner_hits":{} returns the matching child documents together with each parent
GET three_tree_index/_search
{
"query": {
"has_child": {
"type": "article",
"query": {
"match": {
"article_desc": "xiaoming"
}
},
"inner_hits": {}
}
}
}
## Find the users who have an article with a five-star vote (two nested has_child queries)
GET three_tree_index/_search
{
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "article",
"query": {
"has_child": {
"type": "vote",
"query": {
"bool": {
"should": [
{
"term": {
"stars": 5
}
}
]
}
}
}
}
}
}
]
}
}
}
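2. has_parent does exactly the opposite: it queries the parent and returns the children. For example, to get the average star rating of all articles containing "xiaoming" (the query returns the vote documents whose parent article matches, and the aggregation averages their stars):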
GET three_tree_index/_search
{
"query": {
"has_parent": {
"parent_type": "article",
"query": {
"bool": {
"should": [{
"match": {
"article_desc": "xiaoming"
}
}]
}
}
}
},
"aggs": {
"avg_star": {
"avg": {
"field": "stars"
}
}
}
}
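The direction of has_parent can be sketched in plain Java using the sample data from this post: filter on the parent (article), keep the children (votes), then average a child field. This is an in-memory illustration, not Elasticsearch code; the class `HasParentSketch`, the `Vote` type, and the method `averageStars` are hypothetical names, and a substring test stands in for the match query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration only; this is not Elasticsearch code. */
final class HasParentSketch {

    /** A child document: a vote that points at its parent article. */
    static final class Vote {
        final String articleId;
        final int stars;

        Vote(String articleId, int stars) {
            this.articleId = articleId;
            this.stars = stars;
        }
    }

    /** Averages the stars of votes whose parent article contains the keyword (NaN if none). */
    static double averageStars(Map<String, String> articleDescById, List<Vote> votes, String keyword) {
        return votes.stream()
                .filter(v -> articleDescById.getOrDefault(v.articleId, "").contains(keyword))
                .mapToInt(v -> v.stars)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Articles 3 and 4 and votes 5 and 6 from the sample data above.
        Map<String, String> articles = new HashMap<>();
        articles.put("3", "xiaoming,article_desc_1");
        articles.put("4", "xiaohong,article_desc_1");
        List<Vote> votes = new ArrayList<>();
        votes.add(new Vote("3", 5));
        votes.add(new Vote("4", 3));
        System.out.println(averageStars(articles, votes, "xiaoming")); // prints 5.0
    }
}
```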
3. Java API:
See the official docs: https://www.elastic.co/guide/en/elasticsearch/client/java-api/6.3/java-joining-queries.html
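import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.join.query.HasChildQueryBuilder;
import org.elasticsearch.search.sort.SortOrder;

/*
 * I only wrote part of the Java code, just to get you started; leave me a comment if you get stuck.
 * The Java API is fairly easy to write; the hard part is finding material about it.
 * Note: printSearchResponse is a helper that is not shown in this post.
 */

// Users who have an article whose article_desc matches "xiaoming".
public static void testChildQuery(TransportClient client) {
    HasChildQueryBuilder hasChildQueryBuilder = new HasChildQueryBuilder(
            "article", QueryBuilders.matchQuery("article_desc", "xiaoming"), ScoreMode.Avg);
    SearchResponse searchResponse = client.prepareSearch("three_tree_index")
            .setTypes("_doc")
            .addSort("_score", SortOrder.DESC)
            .setQuery(hasChildQueryBuilder)
            .setFrom(0).setSize(50).execute().actionGet();
    printSearchResponse(searchResponse);
}

// Users who have an article with a five-star vote (two nested has_child queries).
public static void testMultiChildQuery(TransportClient client) {
    HasChildQueryBuilder hasSecondChildQueryBuilder = new HasChildQueryBuilder(
            "vote", QueryBuilders.boolQuery().should(QueryBuilders.termQuery("stars", 5)), ScoreMode.None);
    HasChildQueryBuilder hasFirstChildQueryBuilder =
            new HasChildQueryBuilder("article", hasSecondChildQueryBuilder, ScoreMode.None);
    SearchResponse searchResponse = client.prepareSearch("three_tree_index")
            .setTypes("_doc")
            .addSort("_score", SortOrder.DESC)
            .setQuery(hasFirstChildQueryBuilder)
            .setFrom(0).setSize(50).execute().actionGet();
    printSearchResponse(searchResponse);
}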
4. The parent-child hierarchy can be more complex than this, for example:
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/parent-join.html
   question
    /    \
   /      \
comment  answer
           |
           |
          vote
## Define it like this; this is the example from the official docs
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"question": ["answer", "comment"],
"answer": "vote"
}
}
}
}
}
}
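To see how this relations mapping encodes the tree, here is a small plain-Java sketch that stores it as a map and walks from a relation up to the root. It is an illustration only: the class `JoinRelationsSketch` and the methods `relations`, `parentOf`, and `depth` are hypothetical names, not part of any Elasticsearch API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Models the "relations" mapping as a plain map; illustration only. */
final class JoinRelationsSketch {

    /** Parent relation name -> child relation names, mirroring the mapping above. */
    static Map<String, List<String>> relations() {
        Map<String, List<String>> r = new LinkedHashMap<>();
        r.put("question", Arrays.asList("answer", "comment"));
        r.put("answer", Arrays.asList("vote"));
        return r;
    }

    /** Returns the parent relation of the given name, or null if it is the root. */
    static String parentOf(String name) {
        for (Map.Entry<String, List<String>> e : relations().entrySet()) {
            if (e.getValue().contains(name)) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Number of levels above the given relation (0 for the root). */
    static int depth(String name) {
        int levels = 0;
        for (String p = parentOf(name); p != null; p = parentOf(p)) {
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(parentOf("vote"));    // prints answer
        System.out.println(parentOf("comment")); // prints question
        System.out.println(depth("vote"));       // prints 2
    }
}
```

A vote sits two levels below a question, so a vote document would be indexed with the routing value of its question, keeping the whole tree on one shard.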
5. Differences between the two kinds of join (I borrowed someone else's diagram):
Source: