Using Query DSL with Spring Boot
- Query DSL
- Data preparation
- match_all
- Term-level queries
- Term Query
- Terms Query
- Exists Query
- Ids Query
- Range Query
- Prefix Query
- Wildcard Query
- Fuzzy Query
Query DSL
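Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON for defining queries.
Because Query DSL talks to ES by sending a JSON request body over the REST API, it is easy to integrate into a Spring Boot application. This article shows how to run each kind of DSL query from Spring Boot.
Elasticsearch official website
Data preparation
Create an index named dsl_index and insert some data. In this article, every Elasticsearch operation made from Spring Boot goes through RestHighLevelClient.
Mapping of the index (dsl_index):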
GET dsl_index/_mappings
{
"dsl_index" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"description" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"analyzer" : "ik_max_word",
"search_analyzer" : "ik_smart"
}
}
}
}
}
Data in the index (dsl_index):
POST /dsl_index/_bulk
{"index":{"_id":1}}
{"name":"张三","age":11,"description":"南京市 羽毛球爱好者"}
{"index":{"_id":2}}
{"name":"王五","age":15,"description":"北京市 篮球两年半"}
{"index":{"_id":3}}
{"name":"李四","age":18,"description":"山东省 游泳健身"}
{"index":{"_id":4}}
{"name":"富贵","age":22,"description":"天津市 游泳打球"}
{"index":{"_id":5}}
{"name":"来福","age":8,"description":"安徽合肥 职业代练"}
{"index":{"_id":6}}
{"name":"憨憨","age":27,"description":"北京市 健身打球"}
{"index":{"_id":7}}
{"name":"小七","age":31,"description":"北京市 游泳"}
match_all
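match_all matches every document in the specified index, but only 10 documents are returned by default. The reason is that _search is paginated by default, with from=0 and size=10. To return more documents, set size explicitly.
DSL: query all documents in the index (the first ten by default)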
GET dsl_index/_search
{
"query": {
"match_all": {}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "张三",
"age" : 11,
"description" : "南京市 羽毛球爱好者"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "王五",
"age" : 15,
"description" : "北京市 篮球两年半"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "李四",
"age" : 18,
"description" : "山东省 游泳健身"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "富贵",
"age" : 22,
"description" : "天津市 游泳打球"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "来福",
"age" : 8,
"description" : "安徽合肥 职业代练"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"name" : "憨憨",
"age" : 27,
"description" : "北京市 健身打球"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"name" : "小七",
"age" : 31,
"description" : "北京市 游泳"
}
}
]
}
}
Spring Boot implementation:
Code:
private static final String INDEX_NAME = "dsl_index"; // used by every example below
@Resource
private RestHighLevelClient client; // used by every example below
@RequestMapping(value = "/matchAll", method = RequestMethod.GET)
@ApiOperation(value = "DSL - match_all")
public void matchAll() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Match all documents
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Result:
{name=张三, description=南京市 羽毛球爱好者, age=11}
{name=王五, description=北京市 篮球两年半, age=15}
{name=李四, description=山东省 游泳健身, age=18}
{name=富贵, description=天津市 游泳打球, age=22}
{name=来福, description=安徽合肥 职业代练, age=8}
{name=憨憨, description=北京市 健身打球, age=27}
{name=小七, description=北京市 游泳, age=31}
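As a rough illustration of the from/size paging described for match_all, the sketch below models the window of hits that _search hands back. It is plain Java with no Elasticsearch dependency; the class PageWindow and its page method are hypothetical names introduced here, not part of any client API.

```java
import java.util.ArrayList;
import java.util.List;

/** Models how from/size select one page out of the full list of hits. */
final class PageWindow {

    /** Returns at most size items starting at offset from; empty when from is past the end. */
    static <T> List<T> page(List<T> hits, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        int start = Math.min(from, hits.size());
        // Use long arithmetic so a very large size cannot overflow.
        int end = (int) Math.min((long) start + size, hits.size());
        return new ArrayList<>(hits.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            ids.add(i);
        }
        System.out.println(page(ids, 0, 10));  // default window: the first ten ids
        System.out.println(page(ids, 20, 10)); // third page: only five ids are left
    }
}
```

With the client used in this article, the corresponding settings are the from(int) and size(int) methods of SearchSourceBuilder.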
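Term-level queries
Term-level queries match the search value against the indexed terms directly, without running it through text analysis. This is similar to a SQL query in a database, and the target is usually a non-text field of the index.
Term Query
A term query returns the documents that contain the exact search value. It is commonly used on keyword fields (and on numeric fields, as here) and behaves like "=" in SQL. Avoid running term queries against text fields: a text field is split into tokens at index time, so the query is rarely meaningful and will often return nothing.
DSL: query documents in the index where age=31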
GET dsl_index/_search
{
"query": {
"term": {
"age": {
"value": "31"
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"name" : "小七",
"age" : 31,
"description" : "北京市 游泳"
}
}
]
}
}
Spring Boot implementation:
Code:
@RequestMapping(value = "/term", method = RequestMethod.GET)
@ApiOperation(value = "DSL - term")
public void term() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Exact match on age = 31
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.termQuery("age", 31)));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Result:
{name=小七, description=北京市 游泳, age=31}
Terms Query
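A terms query matches several terms against one field. It returns the documents whose field contains exactly any one of the given terms.
DSL: query documents in the index whose age is 31 or 15, similar to age in ('15','31') in MySQL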
GET dsl_index/_search
{
"query": {
"terms": {
"age": ["31","15"]
}
}
}
Response:
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "王五",
"age" : 15,
"description" : "北京市 篮球两年半"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"name" : "小七",
"age" : 31,
"description" : "北京市 游泳"
}
}
]
}
}
Spring Boot implementation:
@RequestMapping(value = "/terms", method = RequestMethod.GET)
@ApiOperation(value = "DSL - terms")
public void terms() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Match documents whose age is 15 or 31
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.termsQuery("age", new String[]{"15", "31"})));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Result:
{name=王五, description=北京市 篮球两年半, age=15}
{name=小七, description=北京市 游泳, age=31}
Exists Query
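In Elasticsearch, an exists query returns the documents that contain an indexed value for the given field.
DSL: check whether any document in the index has a sex field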
GET dsl_index/_search
{
"query": {
"exists": {
"field": "sex"
}
}
}
Response: hits is clearly empty, so no document in the index has a sex field
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Spring Boot implementation:
@RequestMapping(value = "/exists", method = RequestMethod.GET)
@ApiOperation(value = "DSL - exists")
public void exists() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Match documents that have a value for the sex field
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.existsQuery("sex")));
    // Print the number of hits
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = searchResponse.getHits();
    System.out.println("Length of the returned hits array: " + hits.getHits().length);
}
Result:
Length of the returned hits array: 0
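To make the semantics of exists concrete, here is a minimal sketch in plain Java that filters in-memory documents the same way: a document matches only if it holds a non-null value for the field. The ExistsFilter class, its methods, and the third sample document are hypothetical and only serve the illustration; nothing here calls Elasticsearch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of an exists query over a list of documents. */
final class ExistsFilter {

    /** Keeps the documents that have a non-null value for the given field. */
    static List<Map<String, Object>> exists(List<Map<String, Object>> docs, String field) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (doc.get(field) != null) {
                matches.add(doc);
            }
        }
        return matches;
    }

    /** Builds a sample document; age may be null to model a missing value. */
    static Map<String, Object> doc(String name, Integer age) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("name", name);
        d.put("age", age);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("张三", 11));
        docs.add(doc("王五", 15));
        docs.add(doc("无名", null)); // hypothetical document with no age value
        System.out.println(exists(docs, "sex").size()); // 0: no document has a sex field
        System.out.println(exists(docs, "age").size()); // 2: the null age does not count
    }
}
```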
Ids Query
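ids query: takes a list of document _id values (the default primary key of each document) and returns the matching documents.
DSL: query documents whose _id is 1 or 2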
GET dsl_index/_search
{
"query": {
"ids": {
"values": [1,2]
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "张三",
"age" : 11,
"description" : "南京市 羽毛球爱好者"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "王五",
"age" : 15,
"description" : "北京市 篮球两年半"
}
}
]
}
}
Spring Boot implementation:
@RequestMapping(value = "/ids", method = RequestMethod.GET)
@ApiOperation(value = "DSL - ids")
public void ids() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Option 1: a terms query on _id
    // searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.termsQuery("_id","1","2")));
    // Option 2: an ids query
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.idsQuery().addIds(new String[]{"1","2"})));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Result:
{name=张三, description=南京市 羽毛球爱好者, age=11}
{name=王五, description=北京市 篮球两年半, age=15}
Range Query
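Range query:
- range: the range keyword
- gte: greater than or equal to
- lte: less than or equal to
- gt: greater than
- lt: less than
- now: the current time (for date fields)
DSL: query documents in the index where age>=20 and age<=30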
GET dsl_index/_search
{
"query": {
"range": {
"age": {
"gte": 20,
"lte": 30
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "富贵",
"age" : 22,
"description" : "天津市 游泳打球"
}
},
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"name" : "憨憨",
"age" : 27,
"description" : "北京市 健身打球"
}
}
]
}
}
Spring Boot implementation:
@RequestMapping(value = "/range", method = RequestMethod.GET)
@ApiOperation(value = "DSL - range")
public void range() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Match documents with 20 <= age <= 30
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.rangeQuery("age").gte(20).lte(30)));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Result:
{name=富贵, description=天津市 游泳打球, age=22}
{name=憨憨, description=北京市 健身打球, age=27}
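The four bounds of the range query (gte, lte, gt, lt) can be read as simple comparisons. The sketch below, a hypothetical NumericRange class in plain Java with no Elasticsearch dependency, applies them to the ages stored in dsl_index and reproduces the result above. It only models numeric bounds and leaves out date math such as now.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model of the gte/gt/lte/lt bounds of a range query. */
final class NumericRange {
    // A null bound means "not set".
    private Long gte;
    private Long gt;
    private Long lte;
    private Long lt;

    NumericRange gte(long v) { gte = v; return this; }
    NumericRange gt(long v) { gt = v; return this; }
    NumericRange lte(long v) { lte = v; return this; }
    NumericRange lt(long v) { lt = v; return this; }

    /** True when the value satisfies every bound that has been set. */
    boolean matches(long value) {
        if (gte != null && value < gte) return false;
        if (gt != null && value <= gt) return false;
        if (lte != null && value > lte) return false;
        if (lt != null && value >= lt) return false;
        return true;
    }

    /** Returns the values that fall inside the range, in their original order. */
    List<Long> filter(List<Long> values) {
        List<Long> result = new ArrayList<>();
        for (long v : values) {
            if (matches(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ages = Arrays.asList(11L, 15L, 18L, 22L, 8L, 27L, 31L);
        System.out.println(new NumericRange().gte(20).lte(30).filter(ages)); // [22, 27]
    }
}
```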
Prefix Query
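Prefix query:
- It does not analyze the search string; the prefix you pass in is exactly the prefix that is looked up.
- How prefix works: it walks the terms of the inverted index and checks whether each term starts with the given prefix.
- By default a prefix query does not compute relevance scores. It simply returns every matching document and gives each a score of 1, so it behaves more like a filter than a query. The practical difference between the two is that a filter can be cached, whereas a prefix query cannot.
The index currently contains the following data: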
{name=张三, description=南京市 羽毛球爱好者, age=11}
{name=王五, description=北京市 篮球两年半, age=15}
{name=李四, description=山东省 游泳健身, age=18}
{name=富贵, description=天津市 游泳打球, age=22}
{name=来福, description=安徽合肥 职业代练, age=8}
{name=憨憨, description=北京市 健身打球, age=27}
{name=小七, description=北京市 游泳, age=31}
DSL: query documents whose description starts with "南"
GET dsl_index/_search
{
"query": {
"prefix": {
"description": {
"value": "南"
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "张三",
"age" : 11,
"description" : "南京市 羽毛球爱好者"
}
}
]
}
}
Spring Boot implementation:
@RequestMapping(value = "/prefix", method = RequestMethod.GET)
@ApiOperation(value = "DSL - prefix")
public void prefix() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Match documents with a description term starting with "南"
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.prefixQuery("description","南")));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Response:
{name=张三, description=南京市 羽毛球爱好者, age=11}
Question: a prefix query for "南" on description returns data, but a prefix query for "南京" returns nothing
DSL:
GET dsl_index/_search
{
"query": {
"prefix": {
"description": {
"value": "南京"
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
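Reason: as mentioned above, prefix does not analyze the search value; it compares that value directly with the terms in the inverted index.
When the index was created, the description field got the default "standard" analyzer, which splits Chinese text into single characters: "南京市" becomes "南", "京", "市". So the single character "南" matches a term, while "南京" matches none.
How the "standard" analyzer handles "南京市":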
POST _analyze
{
"analyzer": "standard",
"text": ["南京市"]
}
Result:
{
"tokens" : [
{
"token" : "南",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "京",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "市",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
}
]
}
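So how can we make a search for "南京" return data? Specify the "ik_max_word" analyzer for the field when creating the index. It tokenizes at a fine granularity; depending on the use case, another analyzer or a custom one works too. Here is how "ik_max_word" analyzes "南京市":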
POST _analyze
{
"analyzer": "ik_max_word",
"text": ["南京市"]
}
Result:
{
"tokens" : [
{
"token" : "南京市",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "南京",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "市",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 2
}
]
}
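The two analysis results explain the behaviour of the prefix query, and the effect can be reproduced with a small plain-Java sketch. It assumes a simplified model: the indexed terms are just a list of strings, the "standard" analyzer is approximated by one term per non-whitespace character, and the ik_max_word terms are copied from the result above. PrefixScan and its methods are hypothetical names, not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of a prefix query scanning the terms of one field. */
final class PrefixScan {

    /** Assumed tokenization: one term per character, whitespace dropped. */
    static List<String> singleCharTerms(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    /** Returns the indexed terms that start with the prefix; the prefix is used as typed. */
    static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> standardTerms = singleCharTerms("南京市 羽毛球爱好者");
        List<String> ikTerms = Arrays.asList("南京市", "南京", "市");

        System.out.println(prefixMatches(standardTerms, "南"));   // [南]
        System.out.println(prefixMatches(standardTerms, "南京")); // []
        System.out.println(prefixMatches(ikTerms, "南京"));       // [南京市, 南京]
    }
}
```

No single-character term starts with the two-character prefix "南京", which is why the query finds nothing until the field is indexed with longer terms.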
Wildcard Query
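Wildcard query: works on the same principle as prefix, except that it is not limited to the start of a term and supports more complex matching patterns.
Note: both prefix and wildcard queries operate on the terms of the inverted index.
DSL: query documents in the index whose description field contains "篮"
GET dsl_index/_search
{
"query": {
"wildcard": {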
"description": {
"value": "篮"
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "王五",
"age" : 15,
"description" : "北京市 篮球两年半"
}
}
]
}
}
Spring Boot implementation:
@RequestMapping(value = "/wildcard", method = RequestMethod.GET)
@ApiOperation(value = "DSL - wildcard")
public void wildcard() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Match documents with a description term equal to "篮" (the pattern has no wildcard characters)
    searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.wildcardQuery("description","篮")));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Result:
{name=王五, description=北京市 篮球两年半, age=15}
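The example above uses a pattern without wildcard characters, so it only matches the term "篮" exactly. In a wildcard pattern, * stands for any run of characters (including none) and ? for exactly one character. The sketch below shows how such patterns behave against a list of terms. It is a plain-Java approximation built on java.util.regex, with the hypothetical class WildcardMatch, and it assumes single-character terms for the analyzed field and the whole string as the only term for description.keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified model of wildcard matching against indexed terms. */
final class WildcardMatch {

    /** Translates a wildcard pattern into a regex: * becomes ".*", ? becomes ".", the rest is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns the terms that match the whole wildcard pattern. */
    static List<String> matches(List<String> terms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed terms of the analyzed description field for document 2.
        List<String> analyzed = Arrays.asList("北", "京", "市", "篮", "球", "两", "年", "半");
        // Assumed single term of description.keyword for the same document.
        List<String> keyword = Arrays.asList("北京市 篮球两年半");

        System.out.println(matches(analyzed, "篮"));     // [篮]
        System.out.println(matches(analyzed, "篮球"));   // []
        System.out.println(matches(keyword, "*篮球*"));  // [北京市 篮球两年半]
    }
}
```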
Fuzzy Query
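Fuzzy query:
In real searches we sometimes make typos and then find nothing. In Elasticsearch, the fuzziness setting enables fuzzy matching so that a search still succeeds when the input contains a wrong character.
A fuzzy query relies on two important parameters, fuzziness and prefix_length.
- fuzziness: the number of edits allowed to turn the input keyword into the value stored in the target field in ES
  - An edit is inserting one character, deleting one character, or changing one character; each edit counts as an edit distance of 1.
  - For example, the edit distance from 中文集团 to 中威集团 is 1, since only one character changes. With fuzziness set to 2, 东东集团, at an edit distance of 2, would be returned as well.
  - A value of 0 turns fuzzy matching off. The maximum edit distance must be between 0 and 2; when fuzziness is omitted, Elasticsearch uses AUTO.
- prefix_length: the number of leading characters of the input keyword that must match the field content exactly, with no typos allowed
  - If it is 1 here, the first character must match; otherwise the document is not returned.
  - The default is 0.
  - Increasing prefix_length improves both efficiency and precision.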
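The two parameters can be illustrated with a short plain-Java sketch. It computes the classic Levenshtein distance with exactly the three edits listed above and then applies fuzziness and prefix_length as simple checks. This is an assumption-laden model, not Elasticsearch's implementation: FuzzyMatch and its methods are hypothetical names, characters are compared as UTF-16 units, and the sketch leaves out the fact that Elasticsearch can also count a swap of two adjacent characters as a single edit.

```java
/** Simplified model of fuzziness and prefix_length in a fuzzy query. */
final class FuzzyMatch {

    /** Levenshtein distance: inserting, deleting, or changing one character costs 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the first prefixLength characters are identical and the distance is within fuzziness. */
    static boolean matches(String input, String term, int fuzziness, int prefixLength) {
        if (input.length() < prefixLength || term.length() < prefixLength
                || !input.regionMatches(0, term, 0, prefixLength)) {
            return false;
        }
        return editDistance(input, term) <= fuzziness;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("中文集团", "中威集团")); // 1
        System.out.println(editDistance("中文集团", "东东集团")); // 2
        System.out.println(matches("中文集团", "东东集团", 2, 0)); // true
        System.out.println(matches("中文集团", "东东集团", 2, 1)); // false: first character differs
    }
}
```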
DSL: the index currently contains this document: {name=王五, description=北京市 篮球两年半, age=15}
We search description for "北京市 足球两年半", with one character deliberately wrong
1. With fuzziness=0:
GET dsl_index/_search
{
"query": {
"fuzzy": {
"description.keyword": {
"value": "北京市 足球两年半",
"fuzziness": 0
}
}
}
}
No data is returned:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
2. With fuzziness=1:
GET dsl_index/_search
{
"query": {
"fuzzy": {
"description.keyword": {
"value": "北京市 足球两年半",
"fuzziness": 1
}
}
}
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.4879789,
"hits" : [
{
"_index" : "dsl_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.4879789,
"_source" : {
"name" : "王五",
"age" : 15,
"description" : "北京市 篮球两年半"
}
}
]
}
}
Spring Boot implementation:
Search description for "北京市 足球两年半", allowing one wrong character:
@RequestMapping(value = "/fuzzy", method = RequestMethod.GET)
@ApiOperation(value = "DSL - fuzzy")
public void fuzzy() throws Exception {
    // Build the search request
    SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
    // Fuzzy match on the keyword sub-field with an edit distance of at most 1
    searchRequest.source(new SearchSourceBuilder().query(
            QueryBuilders.fuzzyQuery("description.keyword","北京市 足球两年半").fuzziness(Fuzziness.ONE)));
    // Print the response
    printLog(client.search(searchRequest, RequestOptions.DEFAULT));
}
Response:
{name=王五, description=北京市 篮球两年半, age=15}
That covers Query DSL term-level queries and how to use them from Spring Boot. A later article will cover full-text queries with Spring Boot.