How ES returns a request for 1 million records: ES GET requests

Reposted

mob6454cc79ab13 2024-05-05 22:19:08

Tags: how ES returns a request for 1 million records, ElasticSearch, fast requests, fields, metadata. Category: Architecture, Backend development

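Bundling multiple requests into a single request avoids the network overhead of handling each request separately.

(There is nothing especially fancy about this technique; most systems support it.)

If you know you need to retrieve multiple documents, fetching them all in one request is faster than fetching them one after another.

The mget API expects a docs array in which each element specifies the _index, _type, and _id metadata of a document to retrieve.

You can also specify a _source parameter to filter the returned fields.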

GET /_mget
{
  "docs" : [
    {
      "_index" : "website",
      "_type" : "blog",
      "_id" : 2
    },
    {
      "_index" : "website",
      "_type" : "pageviews",
      "_id" : 1,
      "_source" : "views"
    }
  ]
}
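As a rough illustration of how such a request body might be assembled in client code, here is a minimal Python sketch. The function name build_mget_body and the tuple layout are assumptions made for this example; they are not part of Elasticsearch or any client library. Only the standard json module is used.

```python
import json


def build_mget_body(targets):
    """Build an mget request body.

    `targets` is a list of (index, type, id, source_fields) tuples.
    `source_fields` is None to return the whole _source, or a field
    name (or list of names) to filter the returned fields.
    This helper is hypothetical and only mirrors the JSON shown above.
    """
    docs = []
    for index, doc_type, doc_id, source_fields in targets:
        entry = {"_index": index, "_type": doc_type, "_id": doc_id}
        if source_fields is not None:
            entry["_source"] = source_fields
        docs.append(entry)
    return {"docs": docs}


mget_body = build_mget_body([
    ("website", "blog", 2, None),
    ("website", "pageviews", 1, "views"),
])
print(json.dumps(mget_body, indent=2))
```

The resulting dictionary would be sent as the JSON body of GET /_mget with whatever HTTP client you use.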

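The response body also contains a docs array, with one entry per requested document, returned in the same order as in the request.

Each entry has the same shape as the response to an individual get request: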

{
  "docs" : [
    {
      "_index" : "website",
      "_id" : "2",
      "_type" : "blog",
      "found" : true,
      "_source" : {
        "text" : "This is a piece of cake...",
        "title" : "My first external blog entry"
      },
      "_version" : 10
    },
    {
      "_index" : "website",
      "_id" : "1",
      "_type" : "pageviews",
      "found" : true,
      "_version" : 2,
      "_source" : {
        "views" : 2
      }
    }
  ]
}

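If the documents you are retrieving are all in the same index (or even of the same type), you can specify a default /_index or /_index/_type in the URL.

You can still override these defaults in the individual entries: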

GET /website/blog/_mget
{
  "docs" : [
    { "_id" : 2 },
    { "_type" : "pageviews", "_id" : 1 }
  ]
}

In fact, if all the documents share the same _index and _type, you can pass just an array of ids instead of the full docs array:

GET /website/blog/_mget
{
  "ids" : [ "2", "1" ]
}
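The choice between the full docs form and the ids shorthand can be sketched in a few lines of Python. This is only an illustration: compact_mget is a made-up helper name, and it simply picks the shorthand when every target shares one index and one type.

```python
def compact_mget(targets):
    """Return (url_path, body) for a list of (index, type, id) tuples.

    Uses the short {"ids": [...]} form when all targets share the same
    index and type; otherwise falls back to the full "docs" form.
    Hypothetical helper, written only to mirror the requests above.
    """
    indices = {index for index, _, _ in targets}
    types = {doc_type for _, doc_type, _ in targets}
    if len(indices) == 1 and len(types) == 1:
        index, doc_type, _ = targets[0]
        ids = [str(doc_id) for _, _, doc_id in targets]
        return f"/{index}/{doc_type}/_mget", {"ids": ids}
    docs = [
        {"_index": index, "_type": doc_type, "_id": str(doc_id)}
        for index, doc_type, doc_id in targets
    ]
    return "/_mget", {"docs": docs}


same_type = compact_mget([("website", "blog", 2), ("website", "blog", 1)])
mixed = compact_mget([("website", "blog", 2), ("website", "pageviews", 1)])
print(same_type)
print(mixed[0])
```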

Note that the second document requested here does not exist.

A document that does not exist is reported like this:

{
  "docs" : [
    {
      "_index" : "website",
      "_type" : "blog",
      "_id" : "2",
      "_version" : 10,
      "found" : true,
      "_source" : {
        "title" : "My first external blog entry",
        "text" : "This is a piece of cake..."
      }
    },
    {
      "_index" : "website",
      "_type" : "blog",
      "_id" : "1",
      "found" : false
    }
  ]
}

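The second entry has found set to false: that document was not found.

The missing second document does not affect retrieval of the first. Each document is retrieved and reported independently.

The HTTP status code of the response is 200, even though one document was not found.

In fact, it would still be 200 if none of the documents were found,

because the mget request itself completed successfully.

To find out whether a particular document was retrieved, check its found field.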
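Because the status code says nothing about individual documents, client code usually has to walk the docs array itself. The following Python sketch shows one way to do that; split_mget_response is a hypothetical helper, and the sample response is copied from the example above.

```python
def split_mget_response(response):
    """Separate an mget response into found sources and missing keys.

    Returns (found, missing), where `found` maps
    (index, type, id) -> _source and `missing` lists the keys whose
    "found" flag was false. Hypothetical helper for illustration.
    """
    found, missing = {}, []
    for doc in response.get("docs", []):
        key = (doc["_index"], doc["_type"], doc["_id"])
        if doc.get("found"):
            found[key] = doc.get("_source")
        else:
            missing.append(key)
    return found, missing


sample_mget_response = {
    "docs": [
        {
            "_index": "website", "_type": "blog", "_id": "2",
            "_version": 10, "found": True,
            "_source": {
                "title": "My first external blog entry",
                "text": "This is a piece of cake...",
            },
        },
        {"_index": "website", "_type": "blog", "_id": "1", "found": False},
    ]
}

found_docs, missing_docs = split_mget_response(sample_mget_response)
print(sorted(found_docs))
print(missing_docs)
```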

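mget lets us retrieve multiple documents in one request.

In the same way, the bulk API lets us make multiple create, index, update, and delete requests in a single call.

This is very useful for writing many documents at once.

The bulk request body has the following format: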

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...

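Two points to note:

1. Every line must end with a newline character ('\n'), including the last line.

2. Lines must not contain unescaped newline characters, because these would break parsing.

The action/metadata line specifies what to do to which document.

The action must be one of index, create, update, or delete.

The metadata specifies the _index, _type, and _id of the document to be indexed, created, updated, or deleted.

For example, a delete request looks like this: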

{ "delete" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" }}

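The request body line consists of the document _source itself: the fields and values that the document contains.

It is required for index and create actions, since you must supply the document to be indexed.

It is also required for update actions, where it carries the update parameters such as doc, upsert, or script.

A delete action takes no request body line.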

{ "create" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" }}
{ "title" : "My first blog post" }

If no _id is specified, an ID is generated automatically:

{ "index" : { "_index" : "website" , "_type" : "blog" }}
{ "title" : "My second blog post" }
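The framing rules described above (one JSON object per line, a trailing newline on every line, no body line for delete) can be sketched as a small serializer. This is a minimal Python sketch under stated assumptions: to_bulk_body and its (action, metadata, body) tuple format are invented for this example. It relies on json.dumps producing single-line output and escaping any newline inside a string value.

```python
import json

VALID_ACTIONS = ("index", "create", "update", "delete")


def to_bulk_body(operations):
    """Serialize (action, metadata, body) tuples into a bulk request body.

    `body` must be None for delete and a dict for every other action.
    Every emitted line, including the last one, ends with "\n".
    Hypothetical helper for illustration only.
    """
    lines = []
    for action, metadata, body in operations:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown bulk action: {action!r}")
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue  # delete has no request body line
        if body is None:
            raise ValueError(f"{action} requires a request body line")
        # json.dumps escapes embedded newlines, so each body stays on one line
        lines.append(json.dumps(body))
    return "".join(line + "\n" for line in lines)


bulk_body = to_bulk_body([
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "Two\nlines"}),
])
print(bulk_body, end="")
```

The string returned here is what would be sent as the raw body of POST /_bulk.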

Let's look at a complete example:

POST /_bulk
{ "delete" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" }}
{ "create" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" }}
{ "title" : "My first blog post" }
{ "index" : { "_index" : "website" , "_type" : "blog" }}
{ "title" : "My second blog post" }
{ "update" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" , "_retry_on_conflict" : 3 } }
{ "doc" : { "title" : "My updated blog post" } }


The response looks like this:

{
  "took" : 4,
  "errors" : false,
  "items" : [
    { "delete" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "status" : 200,
        "found" : true
    }},
    { "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "status" : 201
    }},
    { "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "EiwfApScQiiy7TIKFxRCTw",
        "_version" : 1,
        "status" : 201
    }},
    { "update" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 4,
        "status" : 200
    }}
  ]
}

~~~~~~~~~~~~~~~~~~

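Each subrequest is executed independently, so the failure of one does not affect the others.

If any subrequest fails, the top-level errors flag is set to true,

and the error details are reported under the corresponding item.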

POST /_bulk
{ "create" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" }}
{ "title" : "Cannot create - it already exists" }
{ "index" : { "_index" : "website" , "_type" : "blog" , "_id" : "123" }}
{ "title" : "But we can update it" }

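In the response we can see that creating document 123 failed because it already exists, but the subsequent index request on the same document succeeded: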

{
  "took" : 3,
  "errors" : true,
  "items" : [
    { "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "status" : 409,
        "error" : "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"
    }},
    { "index" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 5,
        "status" : 200
    }}
  ]
}
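Since a bulk response can mix successes and failures, client code typically checks the top-level errors flag and then inspects each item. The sketch below shows one possible way to collect the failed items in Python. failed_items is a hypothetical helper, and treating any status of 300 or above as a failure is an assumption of this example; the sample data is the response shown above.

```python
def failed_items(bulk_response):
    """Return (position, action, doc_id, status, error) for failed items.

    Skips the scan entirely when the top-level "errors" flag is false.
    An item counts as failed when it carries an "error" field or a
    status of 300 or above (an assumption made for this sketch).
    """
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict: {action_name: result}
        action, result = next(iter(item.items()))
        if "error" in result or result.get("status", 0) >= 300:
            failures.append((
                position,
                action,
                result.get("_id"),
                result.get("status"),
                result.get("error"),
            ))
    return failures


sample_bulk_response = {
    "took": 3,
    "errors": True,
    "items": [
        {"create": {
            "_index": "website", "_type": "blog", "_id": "123",
            "status": 409,
            "error": "DocumentAlreadyExistsException[[website][4] "
                     "[blog][123]: document already exists]",
        }},
        {"index": {
            "_index": "website", "_type": "blog", "_id": "123",
            "_version": 5, "status": 200,
        }},
    ]
}

bulk_failures = failed_items(sample_bulk_response)
for failure in bulk_failures:
    print(failure)
```

The position in the tuple matches the order of the actions in the request, which makes it possible to retry or log only the operations that failed.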

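This also means that bulk requests are not atomic and cannot be used to implement transactions.

Each request is processed separately.

Don't Repeat Yourself

Perhaps you are bulk indexing log data into the same index, with the same type.

Specifying the same metadata for every document is wasteful. Just as with the mget API,

the bulk request accepts a default /_index or /_index/_type in the URL: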

POST /website/_bulk
{ "index" : { "_type" : "log" }}
{ "event" : "User logged in" }

You can still override _index and _type in the metadata line,

but by default the values from the URL are used:

POST /website/log/_bulk
{ "index" : {}}
{ "event" : "User logged in" }
{ "index" : { "_type" : "blog" }}
{ "title" : "Overriding the default type" }
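The defaulting rule above (metadata wins, the URL fills in whatever is missing) can be written down in a couple of lines. The Python sketch below is purely illustrative of how the rule resolves; effective_target is a made-up name and does not correspond to anything in Elasticsearch itself.

```python
def effective_target(url_index, url_type, metadata):
    """Resolve the (index, type) a bulk action applies to.

    Values in the action's metadata line take precedence; otherwise
    the defaults from the URL are used. Hypothetical helper that only
    illustrates the precedence rule described in the text.
    """
    return (
        metadata.get("_index", url_index),
        metadata.get("_type", url_type),
    )


# Mirrors POST /website/log/_bulk from the example above
default_target = effective_target("website", "log", {})
overridden_target = effective_target("website", "log", {"_type": "blog"})
print(default_target, overridden_target)
```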

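How Big Is Too Big?

The entire bulk request has to be held in memory by the node that receives it, so the larger the request,

the less memory is left for other requests. There is an optimal bulk size;

beyond it, performance stops improving and may even drop.

That optimal size is not a fixed number, however. It depends on your hardware, the size and complexity of your documents,

and your indexing and search load. Fortunately, this sweet spot is easy to find.

Try indexing typical documents in batches of increasing size. When performance starts to drop, the batch is too big.

A batch of between 1,000 and 5,000 documents is a reasonable starting point.

Also keep an eye on the physical size of your requests: 1,000 documents of 1 KB each are very different from 1,000 documents of 1 MB each.

A good bulk size to aim for is roughly 5 to 15 MB.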
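To respect both limits at once (document count and physical size), a client can cut its input into batches before sending each one as a bulk request. Here is a minimal Python sketch of that idea. chunk_documents and its default limits of 1,000 documents and 5 MB are assumptions chosen from the ranges mentioned above, and the size estimate counts only the serialized document line, not the action/metadata line.

```python
import json


def chunk_documents(documents, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents bounded by count and serialized size.

    A batch is closed when adding the next document would exceed
    `max_docs` or `max_bytes`. A single document larger than
    `max_bytes` still gets a batch of its own.
    Hypothetical helper; the limits are starting points to tune.
    """
    batch, batch_bytes = [], 0
    for doc in documents:
        # +1 accounts for the newline that terminates the line
        doc_bytes = len(json.dumps(doc).encode("utf-8")) + 1
        too_many = len(batch) >= max_docs
        too_large = batch_bytes + doc_bytes > max_bytes
        if batch and (too_many or too_large):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


log_events = [{"event": "User logged in", "n": i} for i in range(2500)]
batch_sizes = [len(b) for b in chunk_documents(log_events, max_docs=1000)]
print(batch_sizes)
```

Starting from values like these, the batch size can be raised step by step while measuring throughput, as the text suggests.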

