In the previous post we covered how to synchronize MySQL data to Elasticsearch in real time. In this post, we look at how to run complex queries against Elasticsearch with the Java API once the data has been synchronized. In practice I have not yet had to translate a truly complex SQL statement into Elasticsearch. One way to handle such a case is to store the database query result in a dedicated table, build a corresponding Elasticsearch index, and query that index instead; the more direct alternative is to implement the complex query logic in Java, but I have not run into a case that required it.

We have already discussed how to map a single MySQL table to an Elasticsearch index. Mapping multiple tables to the same index is not recommended, and Elasticsearch itself no longer supports multiple types within one index, so one table per index is now the standard approach. During development, however, the index backing a table often needs repeated adjustment: different queries filter on different fields, and each field's mapping has to match how it will be queried. Below is a real SQL-to-Elasticsearch conversion I worked through.

At work I ran into a query that groups by two fields and sums a third. The SQL statement is as follows:

select t.DAINFOATTR15, sum(t.DAINFOATTR19), t.DAINFOATTR17
from t_da_info t
group by t.DAINFOATTR15, t.DAINFOATTR17;

This is a very simple SQL statement, but running the equivalent query in Elasticsearch surfaces several problems. First, the DAINFOATTR15 field in our table contains only Chinese text, and when grouping on it through the Java API, Elasticsearch groups on every individual token: because the field is analyzed, each Chinese character becomes its own group. This is baffling the first time you hit it. After some searching, the solution is to add a keyword sub-field to the field holding Chinese text, like so:

"DAINFOATTR15": {
  "type": "text",
  "fielddata": true,
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}
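
With this mapping, the aggregation must target the keyword sub-field rather than the analyzed text field itself; grouping on the two behaves very differently. A minimal sketch of the contrast (field and alias names taken from the Java code further below):

// Groups on whole values: one bucket per department name.
AggregationBuilders.terms("dept_name").field("DAINFOATTR15.keyword");
// Groups on analyzed tokens via fielddata: one bucket per Chinese character.
AggregationBuilders.terms("dept_name").field("DAINFOATTR15");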

Here is the complete index-creation statement. Every field that carries the keyword sub-field shown above is one that a later query needs to group by:

PUT daac0118_da_info
{
  "mappings": {
    "da_info": {
      "properties": {
        "DAINFOCODE": {
          "type": "text"
        },
        "DAINFONAME": {
          "type": "text"
        },
        "DADIRCODE": {
          "type": "text"
        },
        "DAINFOTYPE": {
          "type": "text"
        },
        "DAINFOFLAG": {
          "type": "text"
        },
        "DACOMPID": {
          "type": "text"
        },
        "DAVERSION": {
          "type": "text"
        },
        "DAINFOATTR1": {
          "type": "text"
        },
        "DAINFOATTR2": {
          "type": "text"
        },
        "DAINFOATTR3": {
          "type": "text"
        },
        "DAINFOATTR4": {
          "type": "text"
        },
        "DAINFOATTR5": {
          "type": "text"
        },
        "DAINFOATTR6": {
          "type": "text"
        },
        "DAINFOATTR7": {
          "type": "text"
        },
        "DAINFOATTR8": {
          "type": "text"
        },
        "DAINFOATTR9": {
          "type": "text"
        },
        "DAINFOATTR10": {
          "type": "text"
        },
        "DAINFOATTR11": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "DAINFOATTR12": {
          "type": "text"
        },
        "DAINFOATTR13": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "DAINFOATTR14": {
          "type": "text"
        },
        "DAINFOATTR15": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "DAINFOATTR16": {
          "type": "text"
        },
        "DAINFOATTR17": {
          "type": "text",
          "fielddata": true
        },
        "DAINFOATTR18": {
          "type": "text"
        },
        "DAINFOATTR19": {
          "type": "integer"
        },
        "DAINFOATTR20": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "DAINFOATTR21": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "DAINFOATTR22": {
          "type": "text"
        },
        "DAINFOATTR23": {
          "type": "text"
        },
        "DAINFOATTR24": {
          "type": "text"
        },
        "DAINFOATTR25": {
          "type": "text"
        },
        "DAINFOATTR26": {
          "type": "text"
        },
        "DAINFOATTR27": {
          "type": "text"
        },
        "DAINFOATTR28": {
          "type": "text"
        },
        "DAINFOATTR29": {
          "type": "text"
        },
        "DAINFOATTR30": {
          "type": "text"
        },
        "DAINFOATTR31": {
          "type": "text"
        },
        "DAINFOATTR32": {
          "type": "text"
        },
        "DAMEMO1": {
          "type": "text"
        },
        "DAMEMO2": {
          "type": "text"
        },
        "DAMEMO3": {
          "type": "text"
        },
        "DAMEMO4": {
          "type": "text"
        },
        "DAMEMO5": {
          "type": "text"
        },
        "DAMEMO6": {
          "type": "text"
        },
        "DAMEMO7": {
          "type": "text"
        },
        "DAMEMO8": {
          "type": "text"
        },
        "DAMEMO9": {
          "type": "text"
        },
        "DAMEMO10": {
          "type": "text"
        },
        "DAMEMO11": {
          "type": "text"
        },
        "DAMEMO12": {
          "type": "text"
        },
        "DAMEMO13": {
          "type": "text"
        },
        "DAMEMO14": {
          "type": "text"
        },
        "DAMEMO15": {
          "type": "text"
        },
        "DAMEMO16": {
          "type": "text"
        },
        "DAMEMO17": {
          "type": "text"
        },
        "DAMEMO18": {
          "type": "text"
        },
        "DAMEMO19": {
          "type": "text"
        },
        "DAMEMO20": {
          "type": "text"
        },
        "DAMEMO21": {
          "type": "text"
        },
        "DAMEMO22": {
          "type": "text"
        },
        "DAMEMO23": {
          "type": "text"
        },
        "DAMEMO24": {
          "type": "text"
        },
        "DAMEMO25": {
          "type": "text"
        },
        "CREATETIME": {
          "type": "date"
        },
        "UPDATETIME": {
          "type": "date"
        }
      }
    }
  }
}

This index-creation statement also appeared in the previous post; compare the two to see what changed.

Now for the Java code. It is worth mentioning a technique that lets you quickly verify the query your Java code builds by executing it directly in Elasticsearch; it is highly recommended during development.

// Imports for the Elasticsearch 7.x high-level REST client; the package
// paths of the sum aggregation classes differ slightly in 6.x.
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedStringTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.ParsedSum;
import org.elasticsearch.search.aggregations.metrics.SumAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

/**
 * Group by multiple fields and sum over each group.
 * @param index the index to query
 * @return the grouped rows plus the list of group-by keys
 */
public Map<Object, Object> searchByIndexAndGroupBy(String index) throws IOException {
    List<Map<String, Object>> result = new ArrayList<Map<String, Object>>();
    List<String> nameList = new ArrayList<String>();

    HashMap<Object, Object> resultMap = new HashMap<>();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    // The two group-by fields
    TermsAggregationBuilder builder = AggregationBuilders.terms("dept_name").field("DAINFOATTR15.keyword");
    TermsAggregationBuilder builder2 = AggregationBuilders.terms("dept_id").field("DAINFOATTR17");

    // The field to sum
    SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("heat_num").field("DAINFOATTR19");
    // Assemble the builders: both sub-aggregations hang off the outer terms aggregation
    searchSourceBuilder.aggregation(builder.subAggregation(builder2).subAggregation(sumAggregationBuilder)).size(0);

    //--------------------------------------
    // Process the response
    SearchRequest searchRequest = new SearchRequest(index).source(searchSourceBuilder);
    // Execute the search (restHighLevelClient is an injected RestHighLevelClient field)
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    Map<String, Aggregation> aggMap = response.getAggregations().getAsMap();

    // Fetch the aggregation named dept_name and cast it to ParsedStringTerms
    ParsedStringTerms deptName = (ParsedStringTerms) aggMap.get("dept_name");

    // Iterate over each bucket
    for (Terms.Bucket deptNameBucket : deptName.getBuckets()) {
        Map<String, Object> map = new HashMap<String, Object>();
        // The bucket key is the group-by value
        String deptNames = (String) deptNameBucket.getKey();

        map.put("deptNames", deptNames);

        nameList.add(deptNames);

        // Get this bucket's sub-aggregations as a map and look them up by name
        Map<String, Aggregation> subAggMap = deptNameBucket.getAggregations().getAsMap();
        ParsedSum sum = (ParsedSum) subAggMap.get("heat_num");

        map.put("sum", (int) sum.getValue());

        ParsedStringTerms deptId = (ParsedStringTerms) subAggMap.get("dept_id");
        for (Terms.Bucket posBucket : deptId.getBuckets()) {
            String deptIds = (String) posBucket.getKey();
            map.put("deptIds", deptIds);
        }
        result.add(map);
    }
    resultMap.put("result", result);
    resultMap.put("nameList", nameList);
    return resultMap;
}
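
For completeness, a minimal, hypothetical way to call this method; the host and port are assumptions, the index name is the one created above, and restHighLevelClient is the same client field the method relies on:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

// Hypothetical caller: build a client against an assumed local node and run the aggregation.
RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
        RestClient.builder(new HttpHost("localhost", 9200, "http")));
Map<Object, Object> resultMap = searchByIndexAndGroupBy("daac0118_da_info");
System.out.println(resultMap.get("result"));   // the grouped rows
System.out.println(resultMap.get("nameList")); // the DAINFOATTR15 group keys
restHighLevelClient.close();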

Reading the Java code on its own can be a bit bewildering, so before fetching the result we print the contents of searchSourceBuilder and execute it directly in Elasticsearch to verify that it is correct. Add the following line above the dashed-line comment (the code below the dashed line can be commented out for now):

System.out.println(searchSourceBuilder.toString());

Then call the method, check the console output, copy the printed JSON string, and execute it in Elasticsearch to verify the result:

GET daac0118_da_info/da_info/_search
{
  "size": 0, --表示不查看具体的查询结果
  "aggregations": {
    "dept_name": { --起别名
      "terms": {
        "field": "DAINFOATTR15.keyword", --group by的字段
        "size": 10,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": { --聚合的设置
        "dept_id": { --起的别名
          "terms": {
            "field": "DAINFOATTR17", --group by的字段
            "size": 10,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ]
          }
        },
        "heat_num": { --起的别名
          "sum": {
            "field": "DAINFOATTR19" --sum求和的字段
          }
        }
      }
    }
  }
}
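
One thing worth noticing in the printed query: the "size": 10 inside each terms block is Elasticsearch's default bucket count, so only the top 10 groups per level are returned, whereas SQL's GROUP BY returns every group. If a group-by field has more than 10 distinct values, set a larger size on the builder; a one-line sketch (the value 1000 is an arbitrary assumption):

// Return up to 1000 buckets instead of the default 10.
AggregationBuilders.terms("dept_name").field("DAINFOATTR15.keyword").size(1000);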

Execute it directly in Kibana's Dev Tools and inspect the query result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 285,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "dept_name": { --聚合条件一
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "集团理财部",
          "doc_count": 156,
          "heat_num": {
            "value": 294
          },
          "dept_id": { --聚合条件二
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "33333333",
                "doc_count": 156
              }
            ]
          }
        },
        {
          "key": "集团数据治理部",
          "doc_count": 78,
          "heat_num": {
            "value": 156
          },
          "dept_id": { --聚合条件二
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "11111111",
                "doc_count": 78
              }
            ]
          }
        },
        {
          "key": "集团法务部",
          "doc_count": 45,
          "heat_num": {
            "value": 112
          },
          "dept_id": { --聚合条件二
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "22222222",
                "doc_count": 45
              }
            ]
          }
        },
        {
          "key": "",
          "doc_count": 6,
          "heat_num": {
            "value": 6
          },
          "dept_id": { --聚合条件二
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "00000000",
                "doc_count": 6
              }
            ]
          }
        }
      ]
    }
  }
}

With this result in front of us, processing the response becomes straightforward. Every group an aggregation produces is placed in its buckets array, which we iterate to pull out the values we need; if there are nested aggregations, we descend into each bucket's sub-aggregation and iterate its buckets array in turn, i.e. a double for loop. By extension, the more group-by fields there are, the more levels of looping are needed (see the sketch below). That completes a real-world conversion of a SQL statement into an Elasticsearch query. The component versions I used are all described in the previous post on data synchronization, so I will not repeat them here; interested readers can try this themselves.
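
To make that "one loop per group-by field" pattern concrete, the nested traversal can be generalized with recursion. The helper below is a hypothetical sketch, not part of the original code; it assumes every level is a terms aggregation and that aggNames lists the aggregation names from outermost to innermost (e.g. dept_name, then dept_id):

import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedStringTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// Hypothetical helper: walks N levels of nested terms aggregations and
// collects the chain of bucket keys for each innermost bucket.
private void collectKeys(Aggregations aggs, List<String> aggNames, int level,
                         Deque<String> path, List<List<String>> out) {
    if (level == aggNames.size()) {
        out.add(new ArrayList<>(path)); // one row per innermost bucket
        return;
    }
    ParsedStringTerms terms = aggs.get(aggNames.get(level));
    for (Terms.Bucket bucket : terms.getBuckets()) {
        path.addLast(bucket.getKeyAsString());
        collectKeys(bucket.getAggregations(), aggNames, level + 1, path, out);
        path.removeLast();
    }
}

Called with response.getAggregations() and the list ["dept_name", "dept_id"], it yields one (department name, department id) pair per leaf bucket, matching the double loop in the method above.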