Scroll API: Scrolling Queries



Introduction

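Elasticsearch (ES) queries are generally fast, but from+size pagination can only reach the first 10,000 hits by default (the index.max_result_window setting), and each request gets slower the deeper you page.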
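To make the cost of deep paging concrete, here is a rough Python sketch. It assumes the default index.max_result_window of 10,000 and the usual behavior where every shard collects its own top from + size hits for the coordinating node to merge. The function names are made up for illustration and are not part of any client library.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # assumed default of index.max_result_window


def within_result_window(from_: int, size: int,
                         max_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Return True if a from+size page fits inside the result window."""
    return from_ + size <= max_window


def docs_merged_by_coordinator(from_: int, size: int, shards: int) -> int:
    """Upper bound on the hits the coordinating node has to sort for one page.

    Each shard returns its own top (from + size) hits, so the work grows
    with the page offset, not just with the page size.
    """
    return shards * (from_ + size)
```

With 5 shards and a page size of 200, the first page merges at most 1,000 hits, while the page starting at offset 9,800 merges up to 50,000 for the same 200 results.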

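A scroll query can retrieve large volumes of data while keeping the result set stable: the results reflect the state of the index at the time of the first request. This makes it very useful for back-office batch jobs.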

Querying

The first request

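The first request is almost identical to a normal _search request. You only need to append ?scroll=1m, where 1m means one minute. The value uses the standard Elasticsearch time units: d (days), h (hours), m (minutes), s (seconds), ms (milliseconds), micros (microseconds), and nanos (nanoseconds).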

GET bbs/_search?scroll=1m
{
"size": 200
}
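As a minimal sketch, the same first request could be assembled in Python as shown here. The base URL http://localhost:9200 and the helper name build_initial_scroll_request are assumptions for illustration. The function only builds the request and leaves sending it to whatever HTTP client you use.

```python
import json
from urllib.parse import quote, urlencode


def build_initial_scroll_request(base_url, index, keep_alive="1m", size=200):
    """Return (method, url, body) for the first request of a scroll.

    base_url is an assumed cluster address, e.g. "http://localhost:9200".
    """
    query = urlencode({"scroll": keep_alive})
    url = f"{base_url.rstrip('/')}/{quote(index)}/_search?{query}"
    body = json.dumps({"size": size})
    return "GET", url, body
```

Keeping request construction separate from the transport makes it easy to check the URL and body without a running cluster.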

The response looks like a normal search response, with one additional field: _scroll_id

{
"_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAFwRFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABcEhZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAXBMWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAFwUFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABcFRZ6b2pWamw2RFRBT0FqbUtMMmY2M013",
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5001,
"max_score" : 1.0,
"hits" : [
{
(omitted ...)
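A small Python sketch of pulling the useful parts out of such a response follows. The helper name parse_scroll_page is hypothetical. The response above reports hits.total as a plain number; the sketch also assumes that newer Elasticsearch versions (7.x and later) wrap the count in an object with a value field, and handles both shapes.

```python
def parse_scroll_page(response: dict):
    """Return (scroll_id, hits, total) from a decoded scroll response."""
    hits = response["hits"]
    total = hits["total"]
    # Assumption: newer versions return {"value": N, "relation": "eq"} here
    # instead of a bare integer.
    if isinstance(total, dict):
        total = total["value"]
    return response["_scroll_id"], hits["hits"], total
```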

Scroll requests

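Once you have the _scroll_id from the first request, you can use it to fetch the following pages for as long as the scroll is still alive. Each scroll request passes its own scroll value, which renews the keep-alive window.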

GET /_search/scroll
{
"scroll":"10m",
"scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAFsTFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABbFxZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAWxQWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAFsVFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABbFhZ6b2pWamw2RFRBT0FqbUtMMmY2M013"
}

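The response to a scroll request has the same structure as the response to the first request. In this example the returned _scroll_id is also the same, but it can change between requests, so always use the most recently returned one.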

Once every page has been read, the response still has the same structure, but hits is empty. You can use this to detect that all the data has been consumed.

{
"_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAF17FnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABdeRZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAXXwWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAF16FnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABdfRZ6b2pWamw2RFRBT0FqbUtMMmY2M013",
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5001,
"max_score" : 1.0,
"hits" : [ ]
}
}
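The stop condition above (an empty hits array) leads to a simple loop. The following Python sketch assumes a transport callback send(method, path, body) that performs the HTTP call and returns the decoded JSON. That callback and the name scroll_all are hypothetical, so the loop can be read and tested without committing to a particular HTTP client.

```python
from typing import Callable, Iterator


def scroll_all(send: Callable[[str, str, dict], dict],
               index: str, keep_alive: str = "1m",
               size: int = 200) -> Iterator[dict]:
    """Yield every hit of a scroll, fetching one page at a time.

    `send(method, path, body)` is a hypothetical callback that performs
    the request and returns the decoded JSON response.
    """
    page = send("GET", f"/{index}/_search?scroll={keep_alive}", {"size": size})
    while page["hits"]["hits"]:  # an empty page means the scroll is exhausted
        yield from page["hits"]["hits"]
        # Always pass the most recently returned scroll ID.
        page = send("GET", "/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})
```

Because the function is a generator, a caller can process hits as they arrive instead of holding the whole result set in memory.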

If the _scroll_id has already expired, the request fails with a 404 error:

{
"error" : {
"root_cause" : [
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24047]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24051]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24048]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24049]"
},
{
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24050]"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24047]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24051]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24048]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24049]"
}
},
{
"shard" : -1,
"index" : null,
"reason" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24050]"
}
}
],
"caused_by" : {
"type" : "search_context_missing_exception",
"reason" : "No search context found for id [24050]"
}
},
"status" : 404
}
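A client can recognize this situation from the status code and the exception type shown above. The helper below is a sketch with a hypothetical name, is_scroll_expired, and it only inspects the fields that appear in this error response.

```python
def is_scroll_expired(status: int, body: dict) -> bool:
    """Return True if an error response reports a missing search context."""
    if status != 404:
        return False
    causes = body.get("error", {}).get("root_cause", [])
    return any(cause.get("type") == "search_context_missing_exception"
               for cause in causes)
```

When this returns True, the scroll cannot be resumed; the only option is to start a new scroll from the first request.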

Clearing the scroll

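The search context is released automatically when the scroll times out, but you can also clear it explicitly as soon as you are done, which frees resources on the ES cluster earlier.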

DELETE /_search/scroll
{
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
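One way to make sure the scroll is always cleared is to put the DELETE request in a finally block. The Python sketch below again assumes a hypothetical transport callback send(method, path, body) that returns decoded JSON; the name drain_and_clear is also made up for illustration.

```python
def drain_and_clear(send, index, keep_alive="1m", size=200):
    """Collect all hits of a scroll, then clear it even if a request fails.

    `send(method, path, body)` is a hypothetical callback that performs
    the HTTP request and returns the decoded JSON response.
    """
    hits = []
    scroll_id = None
    try:
        page = send("GET", f"/{index}/_search?scroll={keep_alive}", {"size": size})
        scroll_id = page["_scroll_id"]
        while page["hits"]["hits"]:
            hits.extend(page["hits"]["hits"])
            page = send("GET", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page["_scroll_id"]  # keep track of the latest ID
    finally:
        if scroll_id is not None:
            # Release the search context instead of waiting for the timeout.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
    return hits
```

This variant holds every hit in memory, so it only suits result sets of modest size; for larger exports the same try/finally shape can wrap per-page processing.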

Summary

Advantages


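  • Can retrieve large volumes of data
  • Paging is stable, so no document is returned twice
  • Is not bound by the 10,000-hit pagination limit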

Disadvantages


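  • Cannot jump to an arbitrary page; pages must be read in order
  • Does not support retrying a request; each scroll request advances to the next page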
