Scroll API: Scrolling Search
- First query
- Scroll requests
- Clearing the scroll
- Summary
- References
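Preface
Elasticsearch (ES) queries are generally fast, but from+size pagination can only reach the first 10,000 hits by default (the index.max_result_window setting), and each page gets slower the deeper you go.
Scroll search can retrieve large result sets and keeps the results stable while you page through them, because it reads from a snapshot of the index taken at the time of the first request. That makes it very useful for backend batch jobs.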
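Query
First query
The first request is almost identical to an ordinary _search request: just append ?scroll=1m to the URL. Here 1m means one minute, the time the search context is kept alive, written in Elasticsearch time units (for example 30s, 1m, 1h).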
GET bbs/_search?scroll=1m
{
"size": 200
}
Apart from one additional field, _scroll_id, the response is essentially the same as a normal search response:
{
"_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAFwRFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABcEhZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAXBMWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAFwUFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABcFRZ6b2pWamw2RFRBT0FqbUtMMmY2M013",
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5001,
"max_score" : 1.0,
"hits" : [
{
(remainder omitted ...)
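As a rough illustration of the first request, here is a minimal Python sketch that builds (but does not send) the same call using only the standard library. The ES_URL value and the helper names build_first_scroll_request and extract_scroll_id are assumptions made for this example, not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated Elasticsearch node.
ES_URL = "http://localhost:9200"


def build_first_scroll_request(index, size=200, keep_alive="1m"):
    """Build (but do not send) the initial scroll search request."""
    body = json.dumps({"size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search?scroll={keep_alive}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST as well as GET with a body
    )


def extract_scroll_id(response_text):
    """Pull the _scroll_id out of a raw JSON response string."""
    return json.loads(response_text)["_scroll_id"]
```

Sending the request with urllib.request.urlopen(build_first_scroll_request("bbs")) and passing the decoded body to extract_scroll_id should yield the ID needed for the next step, assuming the node is reachable.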
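Scroll requests
Once you have the _scroll_id from the first request, you can use it to fetch subsequent pages for as long as the keep-alive time has not expired. Each scroll request resets the keep-alive to the value of its scroll parameter.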
GET /_search/scroll
{
"scroll":"10m",
"scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAFsTFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABbFxZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAWxQWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAFsVFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABbFhZ6b2pWamw2RFRBT0FqbUtMMmY2M013"
}
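A scroll response has the same structure as the first response. The returned _scroll_id is often unchanged, but it can change between requests, so always pass the most recently returned value.
Once all pages have been consumed, the response still has the same structure, but the hits array is empty. You can use this to detect that all data has been read.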
{
"_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAF17FnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABdeRZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAXXwWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAF16FnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABdfRZ6b2pWamw2RFRBT0FqbUtMMmY2M013",
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5001,
"max_score" : 1.0,
"hits" : [ ]
}
}
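The empty-hits stop condition can be sketched as a small Python generator. This is only an outline under stated assumptions: send(method, path, body) is a hypothetical transport callable that performs the HTTP request and returns the parsed JSON response as a dict, and scroll_all is a name chosen for this example.

```python
def scroll_all(send, index, size=200, keep_alive="1m"):
    """Yield every hit, page by page, until a page comes back empty.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the HTTP request and returns the parsed JSON response as a dict.
    """
    # First request: an ordinary search with the scroll parameter appended.
    page = send("POST", f"/{index}/_search?scroll={keep_alive}", {"size": size})
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Follow-up request: pass the _scroll_id from the latest response.
        page = send(
            "POST",
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
```

Injecting the transport keeps the loop independent of any particular HTTP library, so the same logic could be driven by urllib, requests, or a test double.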
If the _scroll_id has expired, the request fails with an HTTP 404 status and a search_context_missing_exception error:
{
"error" : {
"root_cause" : [
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24047]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24051]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24048]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24049]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24050]"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24047]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24051]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24048]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24049]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24050]"
}
}
],
"caused_by" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24050]"
}
},
"status" : 404
}
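A client may want to tell this expiry error apart from other failures, for example to restart the batch job from the beginning. The following Python sketch checks for the status and exception type shown in the response above; is_expired_scroll is a hypothetical helper name, and the check assumes the error body has the shape shown here.

```python
import json


def is_expired_scroll(status, body_text):
    """Return True if an error response looks like a lost scroll context."""
    if status != 404:
        return False
    try:
        payload = json.loads(body_text)
    except ValueError:
        # The body was not valid JSON, so it is not the error shown above.
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    causes = error.get("root_cause", [])
    return any(
        isinstance(cause, dict)
        and cause.get("type") == "search_context_missing_exception"
        for cause in causes
    )
```

Because a scroll cannot be resumed once its context is gone, a caller that sees True here would typically have to start again with a fresh first query.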
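Clearing the scroll
Search contexts are removed automatically when the keep-alive expires, but you can also clear a scroll explicitly as soon as you are done with it to free resources on the ES cluster.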
DELETE /_search/scroll
{
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
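One way to make sure the scroll is always cleared, even when processing fails midway, is to issue the DELETE in a finally block. The Python sketch below assumes a hypothetical send(method, path, body) transport callable that returns the parsed JSON response, and a hypothetical handle_page callback; neither is part of an Elasticsearch client.

```python
def scroll_with_cleanup(send, index, handle_page, size=200, keep_alive="1m"):
    """Scroll through `index`, then always clear the scroll context.

    `send(method, path, body)` is a hypothetical transport callable that
    returns the parsed JSON response; `handle_page(hits)` is a hypothetical
    callback that processes one page of hits.
    """
    scroll_id = None
    try:
        page = send("POST", f"/{index}/_search?scroll={keep_alive}", {"size": size})
        scroll_id = page.get("_scroll_id")
        while page["hits"]["hits"]:
            handle_page(page["hits"]["hits"])
            page = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # Keep track of the latest ID so the cleanup targets it.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Runs on normal completion and on errors raised while processing.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Clearing in finally means a crash in handle_page does not leave the search context open until the keep-alive runs out.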
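Summary
Advantages
- Can retrieve large volumes of data
- Pagination is stable, with no duplicated results
- Not subject to the 10,000-hit pagination limit
Disadvantages
- Cannot jump to an arbitrary page; pages must be read in order
- A page cannot be requested again; once fetched, the scroll has moved past it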
References
- https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html#scroll-api
- https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-results
- https://www.elastic.co/guide/en/elasticsearch/reference/current/clear-scroll-api.html