Java ES query conditions: querying Elasticsearch by condition

Reposted

mob64ca13f83523 2023-10-18 21:15:30


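Leaf query clauses

Leaf query clauses (also called sub-condition queries) look for a specified value in a specific field. They run in one of two contexts: Query Context and Filter Context.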

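Query Context

In Query Context, besides deciding whether a document satisfies the query, Elasticsearch also computes a "_score" that indicates how closely the document matches the query.

In Query Context, text fields can be searched with full-text queries, while structured data such as numbers and dates can be searched with field-level queries.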

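Full-text queries

Loose matching (match)

A match query finds documents whose field contains certain keywords. For example, to find documents in the book index whose title contains the keywords "Python精通", send the JSON request below. Setting the "profile" property lets you see how Elasticsearch executes the query.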

{
	"profile" : "true",
	"query" : {
		"match" : {
			"title" : "Python精通"
		}
	}
}

The hits in the response contain three documents, as shown below.

"hits": {
        "total": 3,
        "max_score": 0.94566005,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "5",
                "_score": 0.94566005,
                "_source": {
                    "author": "瓦力",
                    "title": "Python蒸汽机的时代",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "2",
                "_score": 0.5773649,
                "_source": {
                    "author": "王五",
                    "title": "elasticsearch精通",
                    "word_count": "100000",
                    "publish_date": "2018-09-23"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "4",
                "_score": 0.40913987,
                "_source": {
                    "author": "赵六",
                    "title": "SpringBoot从入门到精通",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            }
        ]
    }

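As the result shows, not every returned title contains the exact keyword "Python精通"; each one contains "Python" or "精通". The profile section of the response shows how Elasticsearch executed the query; an excerpt follows.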

"query": [
    {
        "type": "BooleanQuery",
        "description": "title:python title:精通",
        "time_in_nanos": 18647768,
        "breakdown": {
            "score": 11757,
            "build_scorer_count": 6,
            "match_count": 0,
            "create_weight": 483255,
            "next_doc": 10709,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 6,
            "score_count": 3,
            "build_scorer": 18142031,
            "advance": 0,
            "advance_count": 0
        }
    }

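This is part of the query section; the field to look at is description. When Elasticsearch ran the query, "Python精通" was split into the two terms "python" and "精通", which explains why the returned titles contain either "Python" or "精通". In other words, a match query first analyzes the query text into terms and then returns every document that contains any of those terms.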
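The behaviour described above can be illustrated with a small in-memory sketch. It is only a toy model: the token lists are hand-written assumptions about how the titles might be analyzed, and `match_any` is a hypothetical helper, not an Elasticsearch API.

```python
# Toy model of how a match query selects documents.
# The token lists are hand-written assumptions, not real analyzer output.
TITLE_TOKENS = {
    "5": ["python", "蒸汽机", "的", "时代"],
    "2": ["elasticsearch", "精通"],
    "4": ["springboot", "从", "入门", "到", "精通"],
    "3": ["elasticsearch", "入门"],
}


def match_any(query_terms, index):
    """Return ids of documents whose field contains at least one query term."""
    wanted = set(query_terms)
    return sorted(
        doc_id for doc_id, tokens in index.items() if wanted & set(tokens)
    )


# "Python精通" analyzed into two terms, as the profile description shows.
matched = match_any(["python", "精通"], TITLE_TOKENS)
print(matched)
```

Under these assumptions the sketch selects documents 2, 4 and 5, the same three ids as the response above; it does not attempt to reproduce the "_score" values.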

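Phrase matching (match_phrase)

A match_phrase query also finds documents whose field contains certain keywords. For example, to search the book index again for documents containing "Python精通", send the JSON request below. Setting the "profile" property lets you see how Elasticsearch executes the query.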

{
	"profile" : "true",
	"query" : {
		"match_phrase" : {
			"title" : "Python精通"
		}
	}
}

This time the hits contain no documents. Why? Again, the profile shows how Elasticsearch executed the query; an excerpt follows.

"query": [
    {
        "type": "PhraseQuery",
        "description": "title:\"python 精通\"",
        "time_in_nanos": 5573645,
        "breakdown": {
            "score": 0,
            "build_scorer_count": 3,
            "match_count": 0,
            "create_weight": 1994068,
            "next_doc": 0,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 0,
            "score_count": 0,
            "build_scorer": 3579573,
            "advance": 0,
            "advance_count": 0
        }
    }

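Unlike match, match_phrase produces a PhraseQuery. The description "python 精通" shows that the query text is still analyzed into the two terms "python" and "精通", but the terms are no longer searched independently: a document matches only if it contains both terms next to each other and in the same order. No title in the index has "Python" immediately followed by "精通", so no document is returned.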
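The difference from match can be sketched as follows. This is a simplified model under stated assumptions: the token lists are hand-written guesses at the analyzer output, and `contains_phrase` and `match_phrase` are hypothetical helpers that only check for adjacent, in-order terms.

```python
# Toy model of phrase matching: all terms must appear adjacent and in order.
# The token lists are hand-written assumptions, not real analyzer output.
TITLE_TOKENS = {
    "5": ["python", "蒸汽机", "的", "时代"],
    "2": ["elasticsearch", "精通"],
    "4": ["springboot", "从", "入门", "到", "精通"],
    "3": ["elasticsearch", "入门"],
}


def contains_phrase(tokens, phrase_terms):
    """True if phrase_terms occurs as a contiguous run inside tokens."""
    n = len(phrase_terms)
    return any(
        tokens[i:i + n] == phrase_terms for i in range(len(tokens) - n + 1)
    )


def match_phrase(phrase_terms, index):
    """Return ids of documents that contain the whole phrase."""
    return sorted(
        doc_id
        for doc_id, tokens in index.items()
        if contains_phrase(tokens, phrase_terms)
    )


no_hits = match_phrase(["python", "精通"], TITLE_TOKENS)
one_hit = match_phrase(["elasticsearch", "精通"], TITLE_TOKENS)
print(no_hits, one_hit)
```

With these token lists, the phrase "python 精通" matches nothing, while "elasticsearch 精通" matches document 2 because the two terms are adjacent in its title.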

Multi-field matching (multi_match)

A multi_match query runs a match across several fields. The JSON request below searches the title and author fields of the book index for "Python精通李四".

{
	"profile" : "true",
	"query" : {
		"multi_match" : {
			"query" : "Python精通李四",
			"fields" : ["title", "author"]
		}
	}
}

The hits in the response contain four documents, as shown below.

"hits": {
        "total": 4,
        "max_score": 0.94566005,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "5",
                "_score": 0.94566005,
                "_source": {
                    "author": "瓦力",
                    "title": "Python蒸汽机的时代",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "3",
                "_score": 0.6931472,
                "_source": {
                    "author": "李四",
                    "title": "elasticsearch入门",
                    "word_count": "40000",
                    "publish_date": "2018-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "2",
                "_score": 0.5773649,
                "_source": {
                    "author": "王五",
                    "title": "elasticsearch精通",
                    "word_count": "100000",
                    "publish_date": "2018-09-23"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "4",
                "_score": 0.40913987,
                "_source": {
                    "author": "赵六",
                    "title": "SpringBoot从入门到精通",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            }
        ]
    }

Compared with the match query, the result now also includes the document titled "elasticsearch入门". The profile shows how Elasticsearch executed the query.

"query": [
    {
        "type": "DisjunctionMaxQuery",
        "description": "((title:python title:精通 title:李四) | (author:python author:精通 author:李四))",
        "time_in_nanos": 408258,
        "breakdown": {
            "score": 12277,
            "build_scorer_count": 6,
            "match_count": 0,
            "create_weight": 140825,
            "next_doc": 13252,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 6,
            "score_count": 3,
            "build_scorer": 241888,
            "advance": 0,
            "advance_count": 0
        }
    }

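The description shows that "Python精通李四" was analyzed into terms and that those terms were applied to both the title and the author fields. The author of the "elasticsearch入门" document is "李四", which is why that document now appears in the result.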
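The DisjunctionMaxQuery in the profile suggests the following mental model: each field is scored on its own and the best field wins. The sketch below is a toy version of that idea. The token lists and the "number of matching terms" score are simplifying assumptions, and `field_score` and `multi_match` are hypothetical helpers; Elasticsearch uses its own analyzer and relevance scoring.

```python
# Toy model of multi_match: score each field separately, keep the best field.
# Token lists and the scoring rule are assumptions made for illustration.
DOCS = {
    "5": {"title": ["python", "蒸汽机", "的", "时代"], "author": ["瓦力"]},
    "3": {"title": ["elasticsearch", "入门"], "author": ["李四"]},
    "2": {"title": ["elasticsearch", "精通"], "author": ["王五"]},
    "4": {"title": ["springboot", "从", "入门", "到", "精通"], "author": ["赵六"]},
}


def field_score(tokens, query_terms):
    """Toy score: how many query terms occur in this field."""
    return sum(1 for term in query_terms if term in tokens)


def multi_match(query_terms, fields, docs):
    """Return {doc_id: best field score} for documents matching any field."""
    results = {}
    for doc_id, doc in docs.items():
        best = max(field_score(doc.get(f, []), query_terms) for f in fields)
        if best > 0:
            results[doc_id] = best
    return results


terms = ["python", "精通", "李四"]
both_fields = multi_match(terms, ["title", "author"], DOCS)
title_only = multi_match(terms, ["title"], DOCS)
print(sorted(both_fields), sorted(title_only))
```

Searching only title leaves document 3 out; adding author brings it in through the "李四" term, which mirrors the extra hit in the response above.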

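Query String syntax queries

A query_string query parses its input according to a query syntax and is often used in Kibana. Its query field supports wildcards, range queries, boolean operators, regular expressions, and more. The JSON request body below again searches the book index.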

{
	"profile" : "true",
	"query" : {
		"query_string" : {
			"query" : "(elastic and SpringBoot) or 入门",
			"fields" : ["title", "author"]
		}
	}
}

The response contains two documents, as shown below.

"hits": {
        "total": 2,
        "max_score": 1.7076308,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "4",
                "_score": 1.7076308,
                "_source": {
                    "author": "赵六",
                    "title": "SpringBoot从入门到精通",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "3",
                "_score": 0.7549128,
                "_source": {
                    "author": "李四",
                    "title": "elasticsearch入门",
                    "word_count": "40000",
                    "publish_date": "2018-02-10"
                }
            }
        ]
    }

The description in the profile shows how Elasticsearch executed the query.

"query": [
    {
        "type": "BooleanQuery",
        "description": "((title:elastic title:springboot) | (author:elastic author:springboot)) (title:入门 | author:入门)",
        "time_in_nanos": 345098,
        "breakdown": {
            "score": 8146,
            "build_scorer_count": 4,
            "match_count": 0,
            "create_weight": 129655,
            "next_doc": 8180,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 2,
            "score_count": 1,
            "build_scorer": 199109,
            "advance": 0,
            "advance_count": 0
        }
    }
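One detail worth noting: the description above contains no required clauses, only optional terms. The query_string syntax recognizes boolean operators only in uppercase (AND, OR, NOT), so the lowercase "and" and "or" in the request were not applied as operators. As a minimal sketch, assuming uppercase operators were the intent, the request body could be built like this; `build_query_string_body` is a hypothetical helper, and the result of running this variant is not shown in the original article.

```python
import json


def build_query_string_body(query, fields, profile=True):
    """Build a query_string request body as a plain dict (hypothetical helper)."""
    return {
        "profile": profile,
        "query": {
            "query_string": {
                "query": query,
                "fields": list(fields),
            }
        },
    }


# Uppercase AND / OR so that they are parsed as boolean operators.
body = build_query_string_body(
    "(elastic AND SpringBoot) OR 入门", ["title", "author"]
)
# ensure_ascii=False keeps the Chinese term readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be sent as the search request body in the same way as the other examples in this article.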

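Field-level queries

Field-level queries include range queries, which match documents whose field value falls within a given range. For example, to find the documents in the book index whose word count is between 10000 and 50000, use the JSON request body below.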

{
	"profile" : "true",
	"query" :{
		"range" : {
			"word_count" : {
				"gte" : 10000,
				"lte" : 50000
			}
		}
	}
}

The response contains one document, as shown below.

"hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "author": "李四",
                    "title": "elasticsearch入门",
                    "word_count": "40000",
                    "publish_date": "2018-02-10"
                }
            }
        ]
    }

The profile shows how Elasticsearch executed the query.

"query": [
    {
        "type": "IndexOrDocValuesQuery",
        "description": "word_count:[10000 TO 50000]",
        "time_in_nanos": 6170642,
        "breakdown": {
            "score": 0,
            "build_scorer_count": 6,
            "match_count": 0,
            "create_weight": 1329740,
            "next_doc": 1440,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 3,
            "score_count": 0,
            "build_scorer": 4839452,
            "advance": 0,
            "advance_count": 0
        }
    }

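As the description shows, Elasticsearch rewrites the range query into the range expression word_count:[10000 TO 50000]. Because the request used gte and lte, both bounds are inclusive.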
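The inclusive range check can be sketched in a few lines. The word counts below are taken from the four documents visible in the responses in this article (the index may contain others), and `range_query` is a hypothetical helper rather than an Elasticsearch API.

```python
# Word counts of the four documents that appear in the responses above.
WORD_COUNTS = {
    "5": 320000,
    "2": 100000,
    "4": 320000,
    "3": 40000,
}


def range_query(values, gte=None, lte=None):
    """Return ids whose value lies in [gte, lte]; a missing bound is open."""
    hits = []
    for doc_id, value in values.items():
        if gte is not None and value < gte:
            continue
        if lte is not None and value > lte:
            continue
        hits.append(doc_id)
    return sorted(hits)


in_range = range_query(WORD_COUNTS, gte=10000, lte=50000)
print(in_range)
```

Only document 3, with 40000 words, falls inside the range, which agrees with the single hit returned above.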