【README】
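1. This article introduces bulk, the Elasticsearch API for batch document operations;
2. bulk API: lets a single API call perform many index/delete operations, which can greatly increase indexing speed;
3. For reference, see Bulk API | Elasticsearch Guide [7.2] | Elastic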
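【1】Introduction to the bulk API
1) Syntax
Each operation is an action line, followed (for every action except delete) by a request-body line; every line ends with a newline character:
{action:{metadata}}\n
{request body}\n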
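The two-line format above can be generated programmatically. The following is a minimal sketch rather than an official client: `build_bulk_body` and its `(action, metadata, source)` tuple layout are hypothetical names chosen for illustration, and only the standard `json` module is used.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a bulk request body.

    `source` is None for actions that carry no request-body line (delete).
    """
    lines = []
    for action, metadata, source in operations:
        # Action line: {"<action>": {<metadata>}}
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            # Request-body line, on its own line right after the action line
            lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, is terminated by "\n"
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("create", {"_index": "website", "_type": "blog", "_id": "3"},
     {"title": "zhangsan03_bulk", "body": "成都欢迎你03"}),
    ("delete", {"_index": "website", "_type": "blog", "_id": "4"}, None),
])
print(body, end="")
```

Building the body with `json.dumps` keeps each JSON object on a single line, which the format requires, because a pretty-printed object would span several lines.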
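2) bulk API actions:
- create: saves the document if it does not exist; returns an error if it already exists;
- *index: saves the document if it does not exist; if it exists, overwrites it (upsert) without comparing old and new data;
- update: updates a document; returns an error if it does not exist (old and new data are compared, and if they are identical nothing is written and noop is returned);
- delete: deletes a document; if the given id does not exist, the item is reported as not found (status 404);
3) When saving documents through the bulk API, bulk index is usually the recommended action, because it is an upsert: it updates the document if it exists and creates it otherwise;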
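The differences between the four actions can be sketched with a toy in-memory model. This is not how Elasticsearch is implemented: `apply_action` and the dict-based `store` are hypothetical, the update merge is shallow, and the `not_found`/404 result for delete is an assumption about the response rather than something shown in this article. The other result names and status codes follow the responses shown in the sections below.

```python
def apply_action(store, action, doc_id, payload=None):
    """Toy model of the four bulk actions; returns (result, status)."""
    exists = doc_id in store
    if action == "create":
        if exists:
            return ("version_conflict", 409)  # document already exists
        store[doc_id] = dict(payload)
        return ("created", 201)
    if action == "index":
        store[doc_id] = dict(payload)  # full overwrite, no comparison
        return ("updated", 200) if exists else ("created", 201)
    if action == "update":
        if not exists:
            return ("document_missing", 404)
        merged = {**store[doc_id], **payload}  # shallow partial update
        if merged == store[doc_id]:
            return ("noop", 200)  # nothing changed, nothing written
        store[doc_id] = merged
        return ("updated", 200)
    if action == "delete":
        if not exists:
            return ("not_found", 404)  # assumed result for a missing id
        del store[doc_id]
        return ("deleted", 200)
    raise ValueError(f"unknown action: {action}")

store = {}
print(apply_action(store, "create", "3", {"title": "a"}))  # ('created', 201)
print(apply_action(store, "create", "3", {"title": "a"}))  # ('version_conflict', 409)
print(apply_action(store, "index", "3", {"title": "b"}))   # ('updated', 200)
print(apply_action(store, "update", "3", {"title": "b"}))  # ('noop', 200)
print(apply_action(store, "update", "9", {"title": "c"}))  # ('document_missing', 404)
```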
【1.1】bulk create: saving documents in bulk
1) create: saves the document if it does not exist; returns an error if it already exists;
POST localhost:9200/_bulk
{"create":{"_index":"website","_type":"blog","_id":"3"}}
{"title":"zhangsan03_bulk", "body":"成都欢迎你03"}
{"create":{"_index":"website","_type":"blog","_id":"4"}}
{"title":"zhangsan04_bulk", "body":"成都欢迎你04"}
// an empty line is required here (the body must end with a newline), otherwise the request fails
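When the request is sent from a script instead of an HTTP client tool, the trailing newline is easy to forget. Below is a minimal sketch that builds the request with the standard library only; `make_bulk_request` is a hypothetical helper, the URL assumes the local node used in this article, and the request is only constructed here, not sent.

```python
import urllib.request

def make_bulk_request(body, url="http://localhost:9200/_bulk"):
    """Build (but do not send) a POST request for the _bulk endpoint."""
    if not body.endswith("\n"):
        body += "\n"  # the final line must also end with a newline
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

request = make_bulk_request(
    '{"create":{"_index":"website","_type":"blog","_id":"3"}}\n'
    '{"title":"zhangsan03_bulk", "body":"成都欢迎你03"}'
)
print(request.get_method(), request.full_url)

# Sending it requires a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```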
2) Sending the same request again fails because the documents already exist:
{
"took": 1,
"errors": true,
"items": [
{
"create": {
"_index": "website",
"_type": "blog",
"_id": "3",
"status": 409,
"error": {
"type": "version_conflict_engine_exception",
"reason": "[3]: version conflict, document already exists (current version [1])", // error: the document already exists
"index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
"shard": "0",
"index": "website"
}
}
},
......
]
}
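Because a bulk call can partially succeed, the response has to be checked item by item. The sketch below assumes the response has already been parsed into a dict; `failed_items` is a hypothetical helper name, and the sample response is a trimmed copy of the one above.

```python
import json

def failed_items(response):
    """Return (action, _id, status, error type) for every failed item."""
    failures = []
    if not response.get("errors"):
        return failures  # the top-level flag says nothing failed
    for item in response.get("items", []):
        # Each item is a one-key dict: {"create": {...}}, {"index": {...}}, ...
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (action, result["_id"], result["status"], result["error"]["type"])
                )
    return failures

response = json.loads("""
{"took": 1, "errors": true, "items": [
  {"create": {"_index": "website", "_type": "blog", "_id": "3", "status": 409,
              "error": {"type": "version_conflict_engine_exception",
                        "reason": "[3]: version conflict, document already exists (current version [1])"}}}
]}
""")
print(failed_items(response))
# [('create', '3', 409, 'version_conflict_engine_exception')]
```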
【1.2】bulk delete: deleting documents in bulk
1) delete: deletes a document; if the given id does not exist, the item is reported as not found (status 404)
POST localhost:9200/_bulk
{"delete":{"_index":"website","_type":"blog","_id":"3"}}
{"delete":{"_index":"website","_type":"blog","_id":"4"}}
// an empty line is required here
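【1.3】bulk index: saving or updating documents in bulk (old and new data are not compared)
1) bulk index: saves the document if it does not exist; if it exists, overwrites it (upsert);
POST localhost:9200/_bulk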
{"index":{"_index":"website","_type":"blog", "_id":"25"}}
{"title":"zhangsan25_bulk", "body":"成都欢迎你25" }
{"index":{"_index":"website","_type":"blog", "_id":"26"}}
{"title":"zhangsan26_bulk", "body":"成都欢迎你26"}
// an empty line is required here
{
"took": 235,
"errors": false,
"items": [
{
"index": {
"_index": "website",
"_type": "blog",
"_id": "25",
"_version": 1,
"result": "created", // the document was created
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 18,
"_primary_term": 1,
"status": 201
}
......
}
2) Sending the same request again produces an update instead;
{
"took": 157,
"errors": false,
"items": [
{
"index": {
"_index": "website",
"_type": "blog",
"_id": "25",
"_version": 2,
"result": "updated", // the document was updated
......
}
【1.4】bulk update: updating documents in bulk (old and new data are compared)
1) update: updates a document; returns an error if the document does not exist;
POST localhost:9200/_bulk
{"update":{"_index":"website","_type":"blog","_id":"25"}}
{"doc":{"title":"zhangsan25_bulk_update01"} }
{"update":{"_index":"website","_type":"blog","_id":"26"}}
{"doc":{"title":"zhangsan26_bulk_update02"} }
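// an empty line is required here
2) Repeating the update of documents 25 and 26 with the same request body returns noop;
- bulk update: compares old and new data; if they are identical, nothing is written and noop is returned;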
{
"took": 2,
"errors": false,
"items": [
{
"update": {
"_index": "website",
"_type": "blog",
"_id": "25",
"_version": 3,
"result": "noop", // no operation was performed
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 200
}
},
......
]
}
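To see at a glance how many items were created, updated, or skipped as noop, the `result` field of each item can be tallied. This is a minimal sketch: `summarize_results` is a hypothetical helper, and the sample response is a shortened, made-up combination of the shapes shown in this article.

```python
from collections import Counter

def summarize_results(response):
    """Count items per `result` value; failed items are counted as 'error'."""
    counts = Counter()
    for item in response["items"]:
        for result in item.values():
            # Failed items carry an "error" object instead of a "result" field
            counts[result.get("result", "error")] += 1
    return counts

response = {
    "took": 2,
    "errors": False,
    "items": [
        {"update": {"_id": "25", "_version": 3, "result": "noop", "status": 200}},
        {"update": {"_id": "26", "_version": 3, "result": "noop", "status": 200}},
        {"index": {"_id": "27", "_version": 1, "result": "created", "status": 201}},
    ],
}
print(dict(summarize_results(response)))
# {'noop': 2, 'created': 1}
```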
3) Updating a document that does not exist (here the documents with id=35 and id=36 do not exist) fails with a "document missing" error:
POST localhost:9200/_bulk
{"update":{"_index":"website","_type":"blog","_id":"35"}}
{"doc":{"title":"zhangsan25_bulk_update01"} }
{"update":{"_index":"website","_type":"blog","_id":"36"}}
{"doc":{"title":"zhangsan26_bulk_update02"} }
{
"took": 0,
"errors": true,
"items": [
{
"update": {
"_index": "website",
"_type": "blog",
"_id": "35",
"status": 404,
"error": {
"type": "document_missing_exception", // error: the document does not exist
"reason": "[blog][35]: document missing",
"index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
"shard": "0",
"index": "website"
}
}
},
......
]
}
【2】Bulk-importing sample data
POST localhost:9200/bank/account/_bulk
sample data
Sample data source: https://github.com/linuxacademy/content-elasticsearch-deep-dive/blob/master/sample_data/accounts.json
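A large sample file does not have to go out in one request. The sketch below splits bulk lines into smaller bodies. It assumes the file is already in bulk format with only index or create actions, so that every document takes exactly two lines (action line, then source line). `split_bulk_lines` is a hypothetical helper and the three sample documents are made up for illustration.

```python
def split_bulk_lines(lines, docs_per_batch):
    """Yield bulk bodies holding at most `docs_per_batch` action/source pairs."""
    # Drop blank lines and trailing newlines so pairs stay aligned
    lines = [line.strip() for line in lines if line.strip()]
    step = 2 * docs_per_batch  # each document takes two lines
    for start in range(0, len(lines), step):
        # Each body ends with the mandatory trailing newline
        yield "\n".join(lines[start:start + step]) + "\n"

sample_lines = [
    '{"index":{"_id":"1"}}\n',
    '{"account_number":1,"balance":100}\n',
    '{"index":{"_id":"2"}}\n',
    '{"account_number":2,"balance":200}\n',
    '{"index":{"_id":"3"}}\n',
    '{"account_number":3,"balance":300}\n',
    '\n',
]
batches = list(split_bulk_lines(sample_lines, docs_per_batch=2))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch.count(chr(10)) // 2} document(s)")
```

With a real file, `lines` could come from `open(path, encoding="utf-8")`, and each yielded body would be posted to the `_bulk` endpoint in turn.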