Leaf Query Clauses

Leaf query clauses look for a specified value in a specific field. They can run in one of two contexts: Query Context and Filter Context.

Query Context

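In Query Context, elasticsearch not only decides whether a document satisfies the query, it also computes a "_score" for each matching document. The score expresses relevance, that is, how well the document matches the query.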

In Query Context, full-text queries are used for text fields, while field-level (term-level) queries are used for structured data such as numbers and dates.

Full-text queries
  • Loose matching (match)

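Loose matching uses match to find documents whose field contains certain keywords. For example, to find documents in the book index whose title contains the keywords "Python精通", send the JSON request below. The "profile" property is set so that we can inspect how es executes the query.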

{
	"profile" : "true",
	"query" : {
		"match" : {
			"title" : "Python精通"
		}
	}
}

The hits in the response contain three documents, as shown below.

"hits": {
        "total": 3,
        "max_score": 0.94566005,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "5",
                "_score": 0.94566005,
                "_source": {
                    "author": "瓦力",
                    "title": "Python蒸汽机的时代",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "2",
                "_score": 0.5773649,
                "_source": {
                    "author": "王五",
                    "title": "elasticsearch精通",
                    "word_count": "100000",
                    "publish_date": "2018-09-23"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "4",
                "_score": 0.40913987,
                "_source": {
                    "author": "赵六",
                    "title": "SpringBoot从入门到精通",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            }
        ]
    }

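The result shows that not every returned title matches "Python精通" exactly. Instead, the returned documents contain either "Python" or "精通". The profile section of the response shows how es executed the query; an excerpt follows.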

"query": [
    {
        "type": "BooleanQuery",
        "description": "title:python title:精通",
        "time_in_nanos": 18647768,
        "breakdown": {
            "score": 11757,
            "build_scorer_count": 6,
            "match_count": 0,
            "create_weight": 483255,
            "next_doc": 10709,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 6,
            "score_count": 3,
            "build_scorer": 18142031,
            "advance": 0,
            "advance_count": 0
        }
    }

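This is only part of the query section; the description is the part to focus on. It shows that es split "Python精通" into the two terms "python" and "精通", which explains why the returned titles contain "Python" or "精通". In other words, a match query first analyzes (tokenizes) the query text and then returns every document that contains at least one of the resulting terms.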
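The behavior described above can be pictured with a small toy model. This is not Elasticsearch code: the token lists are hand-written assumptions about how the analyzer might split each title, and match_any is a hypothetical helper that only mimics the "any term may match" rule visible in the profile description.

```python
# Toy model of how a match query selects documents (not Elasticsearch code).
# The token lists are assumptions; the real tokens depend on the analyzer
# configured in the index mapping.
TITLE_TOKENS = {
    "2": ["elasticsearch", "精通"],
    "3": ["elasticsearch", "入门"],
    "4": ["springboot", "从", "入门", "到", "精通"],
    "5": ["python", "蒸汽机", "的", "时代"],
}


def match_any(query_terms, docs):
    """Return ids of docs whose token list contains at least one query term."""
    wanted = set(query_terms)
    return sorted(doc_id for doc_id, tokens in docs.items() if wanted & set(tokens))


# "Python精通" is analyzed into two terms, as the profile description shows.
print(match_any(["python", "精通"], TITLE_TOKENS))  # ['2', '4', '5']
```

If every term must be present, the match query also accepts an operator parameter (set to "and") when the field value is written as an object with a query key.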

  • Phrase matching (match_phrase)

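Phrase matching uses match_phrase to find documents whose field contains a given phrase. Staying with the book index, we again search for documents whose title contains "Python精通", using the JSON request below. The "profile" property is set so that we can inspect how es executes the query.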

{
	"profile" : "true",
	"query" : {
		"match_phrase" : {
			"title" : "Python精通"
		}
	}
}

This time the hits contain no documents at all. Why? Once more the profile shows how es executed the query; an excerpt follows.

"query": [
    {
        "type": "PhraseQuery",
        "description": "title:\"python 精通\"",
        "time_in_nanos": 5573645,
        "breakdown": {
            "score": 0,
            "build_scorer_count": 3,
            "match_count": 0,
            "create_weight": 1994068,
            "next_doc": 0,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 0,
            "score_count": 0,
            "build_scorer": 3579573,
            "advance": 0,
            "advance_count": 0
        }
    }

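With match_phrase the query type becomes PhraseQuery and the description reads title:"python 精通". The query text is still analyzed into the terms "python" and "精通", but unlike match, the terms are not searched independently: they must appear in the field next to each other and in the same order. No title in the index contains "python" immediately followed by "精通", so nothing is returned. The conclusion from this experiment: match_phrase treats the analyzed query text as one phrase rather than as a set of independent terms.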
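A minimal sketch of the phrase rule, under the same caveats as any toy model: the token lists are assumed, and contains_phrase is a hypothetical helper that only checks for consecutive, in-order terms (the real PhraseQuery works on positions stored in the index).

```python
# Toy model of phrase matching (not Elasticsearch code).
# Token lists are assumptions about the analyzer output for each title.
TITLE_TOKENS = {
    "2": ["elasticsearch", "精通"],
    "3": ["elasticsearch", "入门"],
    "4": ["springboot", "从", "入门", "到", "精通"],
    "5": ["python", "蒸汽机", "的", "时代"],
}


def contains_phrase(tokens, phrase_terms):
    """True if phrase_terms occur in tokens consecutively and in order."""
    n = len(phrase_terms)
    return any(tokens[i:i + n] == phrase_terms for i in range(len(tokens) - n + 1))


def match_phrase(phrase_terms, docs):
    """Return ids of docs whose token list contains the whole phrase."""
    return sorted(d for d, tokens in docs.items() if contains_phrase(tokens, phrase_terms))


print(match_phrase(["python", "精通"], TITLE_TOKENS))         # []
print(match_phrase(["elasticsearch", "精通"], TITLE_TOKENS))  # ['2']
```

Both terms exist somewhere in the index, yet the first call returns nothing because no single title has them side by side.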

  • Multi-field matching (multi_match)

Multi-field matching uses multi_match. In the book index, we search for documents whose title or author contains "Python精通李四", using the JSON request below.

{
	"profile" : "true",
	"query" : {
		"multi_match" : {
			"query" : "Python精通李四",
			"fields" : ["title", "author"]
		}
	}
}

The hits in the response contain four documents, as shown below.

"hits": {
        "total": 4,
        "max_score": 0.94566005,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "5",
                "_score": 0.94566005,
                "_source": {
                    "author": "瓦力",
                    "title": "Python蒸汽机的时代",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "3",
                "_score": 0.6931472,
                "_source": {
                    "author": "李四",
                    "title": "elasticsearch入门",
                    "word_count": "40000",
                    "publish_date": "2018-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "2",
                "_score": 0.5773649,
                "_source": {
                    "author": "王五",
                    "title": "elasticsearch精通",
                    "word_count": "100000",
                    "publish_date": "2018-09-23"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "4",
                "_score": 0.40913987,
                "_source": {
                    "author": "赵六",
                    "title": "SpringBoot从入门到精通",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            }
        ]
    }

Compared with the match query above, the result additionally contains the document titled "elasticsearch入门". The profile shows how es executed the query.

"query": [
    {
        "type": "DisjunctionMaxQuery",
        "description": "((title:python title:精通 title:李四) | (author:python author:精通 author:李四))",
        "time_in_nanos": 408258,
        "breakdown": {
            "score": 12277,
            "build_scorer_count": 6,
            "match_count": 0,
            "create_weight": 140825,
            "next_doc": 13252,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 6,
            "score_count": 3,
            "build_scorer": 241888,
            "advance": 0,
            "advance_count": 0
        }
    }

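The description shows that "Python精通李四" was analyzed into terms and that those terms were applied to both the title and the author field. The document titled "elasticsearch入门" has the author "李四", which explains why it now appears in the result.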
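The DisjunctionMaxQuery in the profile can be sketched as "score each field separately, keep the best one". The model below is a rough illustration only: the token lists are assumptions, and counting matching terms is a stand-in for the real relevance score, so only the take-the-maximum structure is meant to carry over.

```python
# Toy model of multi_match across two fields (not Elasticsearch code).
# Token lists are assumptions; term counting replaces real relevance scoring.
DOCS = {
    "2": {"title": ["elasticsearch", "精通"], "author": ["王五"]},
    "3": {"title": ["elasticsearch", "入门"], "author": ["李四"]},
    "4": {"title": ["springboot", "从", "入门", "到", "精通"], "author": ["赵六"]},
    "5": {"title": ["python", "蒸汽机", "的", "时代"], "author": ["瓦力"]},
}


def field_hits(tokens, terms):
    """Number of query terms found in one field."""
    return sum(1 for term in terms if term in tokens)


def dis_max(doc, fields, terms):
    """Best per-field result, mirroring the 'max over fields' idea."""
    return max(field_hits(doc[field], terms) for field in fields)


terms = ["python", "精通", "李四"]
matched = sorted(d for d, doc in DOCS.items() if dis_max(doc, ["title", "author"], terms) > 0)
print(matched)  # ['2', '3', '4', '5']
```

Document 3 is included only because of its author field, which matches the observation above.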

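  • Query String syntax (query_string)

A query_string query lets you search using a query syntax, and is often used in kibana. Its query field supports wildcards, range queries, boolean operators, regular expressions and more. We again search the book index, with the JSON request body below.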

{
	"profile" : "true",
	"query" : {
		"query_string" : {
			"query" : "(elastic and SpringBoot) or 入门",
			"fields" : ["title", "author"]
		}
	}
}

The response contains two documents, as shown below.

"hits": {
        "total": 2,
        "max_score": 1.7076308,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "4",
                "_score": 1.7076308,
                "_source": {
                    "author": "赵六",
                    "title": "SpringBoot从入门到精通",
                    "word_count": "320000",
                    "publish_date": "2015-02-10"
                }
            },
            {
                "_index": "book",
                "_type": "novel",
                "_id": "3",
                "_score": 0.7549128,
                "_source": {
                    "author": "李四",
                    "title": "elasticsearch入门",
                    "word_count": "40000",
                    "publish_date": "2018-02-10"
                }
            }
        ]
    }

The description in the profile shows how es executed the query.

"query": [
    {
        "type": "BooleanQuery",
        "description": "((title:elastic title:springboot) | (author:elastic author:springboot)) (title:入门 | author:入门)",
        "time_in_nanos": 345098,
        "breakdown": {
            "score": 8146,
            "build_scorer_count": 4,
            "match_count": 0,
            "create_weight": 129655,
            "next_doc": 8180,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 2,
            "score_count": 1,
            "build_scorer": 199109,
            "advance": 0,
            "advance_count": 0
        }
    }
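
One detail in this description is worth noting: none of the clauses is marked as required, so the query ran as "elastic or springboot or 入门" across both fields. That is consistent with the query_string syntax, where boolean operators are written in uppercase (AND, OR, NOT); the lowercase "and" and "or" in the request are not applied as operators. The sketch below compares the two readings on title tokens only. It is a toy model, the token lists are assumptions, and document "x" is a hypothetical document that is not in the book index.

```python
# Toy comparison of two readings of the query text (not Elasticsearch code).
# Token lists are assumptions about the analyzer output.
TITLE_TOKENS = {
    "2": ["elasticsearch", "精通"],
    "3": ["elasticsearch", "入门"],
    "4": ["springboot", "从", "入门", "到", "精通"],
    "5": ["python", "蒸汽机", "的", "时代"],
    # Hypothetical extra document, NOT in the book index, added for contrast.
    "x": ["springboot", "实战"],
}


def all_optional(tokens):
    """elastic OR springboot OR 入门: what the profile description shows."""
    return any(term in tokens for term in ("elastic", "springboot", "入门"))


def grouped(tokens):
    """(elastic AND springboot) OR 入门: what the query text seems to intend."""
    return ("elastic" in tokens and "springboot" in tokens) or "入门" in tokens


def select(predicate):
    """Ids of all documents whose title tokens satisfy the predicate."""
    return sorted(doc_id for doc_id, tokens in TITLE_TOKENS.items() if predicate(tokens))


print(select(all_optional))  # ['3', '4', 'x']
print(select(grouped))       # ['3', '4']
```

For the four real documents both readings return documents 3 and 4, which is why the difference is easy to miss in this example.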

Field-level queries

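Field-level queries include the range query, which matches documents whose field value lies within a given range. For example, to find the books in the book index whose word count is between 10000 and 50000, use the JSON request body below.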

{
	"profile" : "true",
	"query" :{
		"range" : {
			"word_count" : {
				"gte" : 10000,
				"lte" : 50000
			}
		}
	}
}

The response contains one document, as shown below.

"hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "book",
                "_type": "novel",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "author": "李四",
                    "title": "elasticsearch入门",
                    "word_count": "40000",
                    "publish_date": "2018-02-10"
                }
            }
        ]
    }

The profile shows how es executed the query.

"query": [
    {
        "type": "IndexOrDocValuesQuery",
        "description": "word_count:[10000 TO 50000]",
        "time_in_nanos": 6170642,
        "breakdown": {
            "score": 0,
            "build_scorer_count": 6,
            "match_count": 0,
            "create_weight": 1329740,
            "next_doc": 1440,
            "match": 0,
            "create_weight_count": 1,
            "next_doc_count": 3,
            "score_count": 0,
            "build_scorer": 4839452,
            "advance": 0,
            "advance_count": 0
        }
    }

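As the description shows, es translated the range query into the interval expression word_count:[10000 TO 50000]. The square brackets denote inclusive bounds, which correspond to gte and lte in the request.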
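The bound semantics can be sketched in a few lines. The word counts are taken from the four documents shown in the results above; in_range is a hypothetical helper, and the gt and lt parameters are included because the range query also supports exclusive bounds under those names.

```python
# Toy model of range bounds (not Elasticsearch code).
# Word counts come from the documents shown in the article's results.
WORD_COUNTS = {"2": 100000, "3": 40000, "4": 320000, "5": 320000}


def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Check a value against optional inclusive (gte/lte) and exclusive (gt/lt) bounds."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


hits = sorted(d for d, wc in WORD_COUNTS.items() if in_range(wc, gte=10000, lte=50000))
print(hits)  # ['3']
```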

Filter Context

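In Filter Context, es only checks whether a document satisfies the condition. The answer is simply yes or no, and no relevance score is computed. es can also cache filter results, so a filter is usually faster than the same clause in Query Context. Filters are typically used together with a bool query.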
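As a sketch of what such a combination might look like, the snippet below builds a request body in which a match clause runs in query context (under must) and a range clause runs in filter context (under filter). The helper name filtered_search and the chosen values are illustrative assumptions; the printed JSON would be sent to the index's search endpoint like the other request bodies in this article.

```python
import json


def filtered_search(text_field, text, range_field, gte, lte):
    """Build a bool query: scored match clause plus an unscored range filter."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": [{"match": {text_field: text}}],
                # Filter context: yes/no only, no scoring.
                "filter": [{"range": {range_field: {"gte": gte, "lte": lte}}}],
            }
        }
    }


body = filtered_search("title", "入门", "word_count", 10000, 50000)
print(json.dumps(body, ensure_ascii=False, indent=2))
```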

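Summary

This article gave a brief, example-based introduction to leaf query clauses in elasticsearch. In practice, when a query returns something other than what you expected, add the profile property to see how es actually executed it. This makes it easier to correct your query conditions during development.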