Elasticsearch search_after
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-after
1. When to use search_after
Results can be paginated with from and size, but the cost becomes prohibitive for deep pagination. index.max_result_window, which defaults to 10,000, is a safeguard: a search request takes heap memory and time proportional to from + size. The Scroll API is recommended for efficient deep scrolling, but scroll contexts are costly and should not be used for real-time user requests. The search_after parameter circumvents this problem by providing a live cursor: the results of the previous page are used to help retrieve the next page.
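To make the "proportional to from + size" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified cost model in which every shard hands its top from + size sort entries to the coordinating node, whereas with search_after every shard only hands over size entries past the cursor. The function names are made up for illustration and are not part of any Elasticsearch client.

```python
def from_size_entries(from_: int, size: int, shards: int) -> int:
    """Sort entries merged by the coordinating node for one from/size page.

    Simplified model: each shard must produce its own top (from + size) entries.
    """
    return shards * (from_ + size)


def search_after_entries(size: int, shards: int) -> int:
    """Sort entries merged for one search_after page.

    Simplified model: each shard produces only `size` entries after the cursor.
    """
    return shards * size


if __name__ == "__main__":
    # 5 shards and a page size of 5, asking for the page that starts at hit 9,995.
    print(from_size_entries(from_=9995, size=5, shards=5))  # 50000
    print(search_after_entries(size=5, shards=5))           # 25
```

Under this model the from/size request grows with the page depth, while the search_after request stays constant no matter how far the walk has progressed.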
2. Usage
Suppose the query that retrieves the first page looks like this:
2.1 Request
GET /bank/account/_search
{
"size": 5,
"query": {
"match_all": {}
},
"sort": [
{ "_id": "desc" },
{ "account_number": "asc" }
]
}
Response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "994",
"_score" : null,
"_source" : {
"account_number" : 994,
"balance" : 33298,
"firstname" : "Madge",
"lastname" : "Holcomb",
"age" : 31,
"gender" : "M",
"address" : "612 Hawthorne Street",
"employer" : "Escenta",
"email" : "madgeholcomb@escenta.com",
"city" : "Alafaya",
"state" : "OR"
},
"sort" : [
"994",
994
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "993",
"_score" : null,
"_source" : {
"account_number" : 993,
"balance" : 26487,
"firstname" : "Campos",
"lastname" : "Olsen",
"age" : 37,
"gender" : "M",
"address" : "873 Covert Street",
"employer" : "Isbol",
"email" : "camposolsen@isbol.com",
"city" : "Glendale",
"state" : "AK"
},
"sort" : [
"993",
993
]
}
]
}
}
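2.2 Follow-up request
The response to the request above includes an array of sort values for each document. These sort values can be passed in the search_after parameter to start returning results after any document in the result list. For example, the sort values of the last document of the previous page can be passed to search_after to retrieve the next page: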
GET /bank/account/_search
{
"size": 2,
"query": {
"match_all": {}
},
"search_after": ["995", 995],
"sort": [
{ "_id": "desc" },
{ "account_number": "asc" }
]
}
Response
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "994",
"_score" : null,
"_source" : {
"account_number" : 994,
"balance" : 33298,
"firstname" : "Madge",
"lastname" : "Holcomb",
"age" : 31,
"gender" : "M",
"address" : "612 Hawthorne Street",
"employer" : "Escenta",
"email" : "madgeholcomb@escenta.com",
"city" : "Alafaya",
"state" : "OR"
},
"sort" : [
"994",
994
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "993",
"_score" : null,
"_source" : {
"account_number" : 993,
"balance" : 26487,
"firstname" : "Campos",
"lastname" : "Olsen",
"age" : 37,
"gender" : "M",
"address" : "873 Covert Street",
"employer" : "Isbol",
"email" : "camposolsen@isbol.com",
"city" : "Glendale",
"state" : "AK"
},
"sort" : [
"993",
993
]
}
]
}
}
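The request/response cycle in this section can be wrapped in a loop that keeps feeding the last hit's sort values back in as search_after. The Python sketch below is only an illustration: iter_hits and make_fake_search are hypothetical names, and the search argument is an assumed callable that takes a request body and returns a response shaped like the ones shown here. An in-memory stand-in is used so the loop can be run without a cluster; in real use the callable would wrap whatever client sends the body to /bank/account/_search.

```python
from typing import Any, Callable, Dict, Iterator, List

Hit = Dict[str, Any]
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits(search: SearchFn, page_size: int = 2) -> Iterator[Hit]:
    """Yield every hit, one page at a time, using search_after as the cursor."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        "sort": [{"_id": "desc"}, {"account_number": "asc"}],
    }
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means the walk is finished
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        body = {**body, "search_after": hits[-1]["sort"]}


def make_fake_search(account_numbers: List[int]) -> SearchFn:
    """In-memory stand-in for a search call (an assumption for this demo only)."""
    rows = [[str(n), n] for n in account_numbers]
    rows.sort(key=lambda r: r[1])                # secondary key: account_number asc
    rows.sort(key=lambda r: r[0], reverse=True)  # primary key: _id desc (string order)

    def comes_after(values: List[Any], cursor: List[Any]) -> bool:
        if values[0] != cursor[0]:
            return values[0] < cursor[0]  # _id is sorted descending
        return values[1] > cursor[1]      # account_number is sorted ascending

    def search(body: Dict[str, Any]) -> Dict[str, Any]:
        cursor = body.get("search_after")
        matching = [r for r in rows if cursor is None or comes_after(r, cursor)]
        page = matching[: body["size"]]
        return {"hits": {"hits": [{"_id": r[0], "sort": list(r)} for r in page]}}

    return search


if __name__ == "__main__":
    fake = make_fake_search([1, 2, 3, 4, 5])
    print([hit["_id"] for hit in iter_hits(fake)])  # ['5', '4', '3', '2', '1']
```

The loop issues one extra request at the end to discover the empty page; stopping as soon as a page comes back shorter than page_size would save that round trip.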
2.3 Notes
The from parameter must be set to 0 (or -1) when search_after is used.
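One way to avoid tripping over this rule is to check the request body on the client before it is sent. The helper below is a minimal Python sketch; with_search_after is a hypothetical name, not a function from any Elasticsearch client, and it only covers the from restriction stated above.

```python
from typing import Any, Dict, List


def with_search_after(body: Dict[str, Any], cursor: List[Any]) -> Dict[str, Any]:
    """Return a copy of `body` with search_after set, rejecting a non-zero from."""
    from_ = body.get("from", 0)
    if from_ not in (0, -1):
        raise ValueError(f"from must be 0 (or -1) with search_after, got {from_}")
    # Copy so the caller's original body is left untouched.
    return {**body, "search_after": list(cursor)}


if __name__ == "__main__":
    page = with_search_after({"size": 2, "query": {"match_all": {}}}, ["995", 995])
    print(page["search_after"])  # ['995', 995]
```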
2.4 Differences from Scroll
search_after is not a solution for jumping freely to an arbitrary page; it is meant for scrolling many queries in parallel. It is very similar to the Scroll API, but unlike scroll, the search_after parameter is stateless: it is always resolved against the latest version of the searcher. For this reason, the sort order may change during a walk, depending on updates and deletes to the index.
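The effect of statelessness can be illustrated with a toy in-memory model in Python. This is not how Elasticsearch is implemented: the "index" here is just a set of account numbers sorted ascending, a scroll is approximated by a frozen copy taken when the walk starts, and page_after and demo are hypothetical names.

```python
from typing import List, Optional, Set, Tuple


def page_after(index: Set[int], cursor: Optional[int], size: int) -> List[int]:
    """Return the next `size` account numbers after `cursor`, in ascending order."""
    remaining = sorted(n for n in index if cursor is None or n > cursor)
    return remaining[:size]


def demo() -> Tuple[List[int], List[int], List[int]]:
    live = {1, 2, 3, 4, 5, 6}
    frozen = set(live)  # scroll-like view: fixed at the start of the walk

    first = page_after(live, None, 2)

    # A document is deleted between the first and the second page.
    live.discard(3)

    second_live = page_after(live, first[-1], 2)      # resolved against current data
    second_frozen = page_after(frozen, first[-1], 2)  # still sees the deleted document
    return first, second_live, second_frozen


if __name__ == "__main__":
    print(demo())  # ([1, 2], [4, 5], [3, 4])
```

The cursor-based walk simply continues from the last sort value against whatever the index contains now, which is why no server-side context has to be kept alive, and also why a walk can skip or pick up documents that change while it is in progress.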