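【README】

1. This article introduces bulk, the Elasticsearch API for batch document operations;

2. bulk api: executes multiple index/create/update/delete operations in a single API call, which avoids per-request overhead and can greatly increase indexing speed;

3. For reference, see: Bulk API | Elasticsearch Guide [7.2] | Elastic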


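【1】Introduction to the bulk api

1)Syntax

Syntax:
{action:{metadata}}\n
{request body}\n

Each action line is followed by its request body on the next line; delete is the exception and takes no request body line.

2)bulk api actions:

  • create  saves the document if it does not exist; returns an error if it already exists;
  • index   saves the document if it does not exist; if it exists, replaces it with the new content (upsert), without comparing old and new data;
  • update  partially updates a document; returns an error if the document does not exist (old and new data are compared; if they are identical, nothing is written and noop is returned);
  • delete  deletes a document; if the id does not exist, the item is reported as not_found (status 404);

3)When saving documents through the bulk api, bulk index is usually the most convenient choice, because it is an upsert: it updates the document if present and creates it otherwise.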
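
The line-oriented format from 1) can also be produced programmatically. The following is a minimal sketch rather than an official client: `build_bulk_body` is a hypothetical helper name, and the (action, metadata, source) tuple layout is an assumption made for this example.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    `source` is None for delete, which has no request-body line.
    """
    lines = []
    for action, metadata, source in operations:
        # Action line: {action: {metadata}}
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            # Request-body line for create / index / update
            lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, is terminated by "\n".
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_index": "website", "_type": "blog", "_id": "25"},
     {"title": "zhangsan25_bulk", "body": "成都欢迎你25"}),
    ("delete", {"_index": "website", "_type": "blog", "_id": "3"}, None),
])
print(body, end="")
```

Joining with "\n" and appending one more "\n" keeps each JSON object on its own line and guarantees the trailing newline that the bulk endpoint expects.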


【1.1】bulk create: batch-save documents

1)create saves the document if it does not exist; if the document already exists, an error is returned;

POST localhost:9200/_bulk
{"create":{"_index":"website","_type":"blog","_id":"3"}} 
{"title":"zhangsan03_bulk", "body":"成都欢迎你03"} 
{"create":{"_index":"website","_type":"blog","_id":"4"}} 
{"title":"zhangsan04_bulk", "body":"成都欢迎你04"} 
// The body must end with a newline (an empty last line in the request editor); otherwise the request is rejected

2)Run the same request again; it now fails because the documents already exist:

{
    "took": 1,
    "errors": true,
    "items": [
        {
            "create": {
                "_index": "website",
                "_type": "blog",
                "_id": "3",
                "status": 409,
                "error": {
                    "type": "version_conflict_engine_exception",
                    "reason": "[3]: version conflict, document already exists (current version [1])", // error: document already exists
                    "index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
                    "shard": "0",
                    "index": "website"
                }
            }
        },
        ......
    ]
}
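
Because a bulk call can partly succeed, the top-level errors flag only says that at least one item failed; the details sit in each item. A minimal sketch of pulling out the failed items, assuming the response has already been parsed into a Python dict (`failed_items` is a hypothetical helper, and the sample dict is a trimmed copy of the response above):

```python
def failed_items(bulk_response):
    """Return (action, _id, status, error type) for every item that carries an error."""
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response["items"]:
        # Each item is a single-key dict: {"create": {...}}, {"index": {...}}, ...
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (action, result["_id"], result["status"], result["error"]["type"])
                )
    return failures

response = {
    "took": 1,
    "errors": True,
    "items": [
        {"create": {
            "_index": "website", "_type": "blog", "_id": "3", "status": 409,
            "error": {
                "type": "version_conflict_engine_exception",
                "reason": "[3]: version conflict, document already exists (current version [1])",
            },
        }},
    ],
}
print(failed_items(response))
# [('create', '3', 409, 'version_conflict_engine_exception')]
```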

【1.2】bulk delete: batch-delete documents

1)delete deletes a document; if the document id does not exist, the item is reported as not_found (status 404)

POST  localhost:9200/_bulk 
{"delete":{"_index":"website","_type":"blog","_id":"3"}}
{"delete":{"_index":"website","_type":"blog","_id":"4"}}
// The body must end with a newline (an empty last line in the request editor)
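
When checking a bulk delete response, it helps to look at each item's result field rather than only the top-level errors flag. A minimal sketch, assuming (from Elasticsearch 7.x behavior, not from output shown in this article) that a missing id comes back with result not_found and status 404; `missing_ids` is a hypothetical helper and the sample response is hand-written:

```python
def missing_ids(bulk_response):
    """Return the ids of delete items whose document was not found."""
    missing = []
    for item in bulk_response["items"]:
        result = item.get("delete")
        if result and result.get("result") == "not_found":
            missing.append(result["_id"])
    return missing

# Hand-written, trimmed sample (assumed shape): id 3 existed, id 99 did not.
sample = {
    "took": 3,
    "errors": False,
    "items": [
        {"delete": {"_index": "website", "_type": "blog", "_id": "3",
                    "result": "deleted", "status": 200}},
        {"delete": {"_index": "website", "_type": "blog", "_id": "99",
                    "result": "not_found", "status": 404}},
    ],
}
print(missing_ids(sample))  # ['99']
```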

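【1.3】bulk index: batch-save or update documents (old and new data are not compared)

1)bulk index: saves the document if it does not exist; if it exists, replaces it with the new content (upsert);

POST  localhost:9200/_bulk 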
{"index":{"_index":"website","_type":"blog", "_id":"25"}}
{"title":"zhangsan25_bulk", "body":"成都欢迎你25" }
{"index":{"_index":"website","_type":"blog", "_id":"26"}}
{"title":"zhangsan26_bulk", "body":"成都欢迎你26"}
// The body must end with a newline (an empty last line in the request editor)

{
    "took": 235,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "website",
                "_type": "blog",
                "_id": "25",
                "_version": 1,
                "result": "created", // document created
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 18,
                "_primary_term": 1,
                "status": 201
            }
......
}

2)Run the same request again; this time the documents are updated

{
    "took": 157,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "website",
                "_type": "blog",
                "_id": "25",
                "_version": 2,
                "result": "updated", // document updated
......
}
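
To see at a glance whether a bulk index run created or updated documents, the per-item result values can be tallied. A minimal sketch, assuming the response is already a Python dict; `tally_results` is a hypothetical helper, and the two sample responses are trimmed, hand-written stand-ins for the first and second runs above:

```python
from collections import Counter

def tally_results(bulk_response):
    """Count items per result value ("created", "updated", ...)."""
    counts = Counter()
    for item in bulk_response["items"]:
        for result in item.values():
            # Failed items carry "error" instead of "result".
            counts[result.get("result", "error")] += 1
    return dict(counts)

first_run = {"errors": False, "items": [
    {"index": {"_id": "25", "_version": 1, "result": "created", "status": 201}},
    {"index": {"_id": "26", "_version": 1, "result": "created", "status": 201}},
]}
second_run = {"errors": False, "items": [
    {"index": {"_id": "25", "_version": 2, "result": "updated", "status": 200}},
    {"index": {"_id": "26", "_version": 2, "result": "updated", "status": 200}},
]}
print(tally_results(first_run))   # {'created': 2}
print(tally_results(second_run))  # {'updated': 2}
```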

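【1.4】bulk update: batch-update documents (old and new data are compared)

1)update partially updates a document; if the document does not exist, an error is returned;

POST localhost:9200/_bulk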
{"update":{"_index":"website","_type":"blog","_id":"25"}} 
{"doc":{"title":"zhangsan25_bulk_update01"} }
{"update":{"_index":"website","_type":"blog","_id":"26"}} 
{"doc":{"title":"zhangsan26_bulk_update02"} }
// The body must end with a newline

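2)Updating documents 25 and 26 repeatedly with the same request body returns noop;

  • bulk update: compares old and new data; if they are identical, nothing is written and noop is returned;
{
    "took": 2,
    "errors": false,
    "items": [
        {
            "update": {
                "_index": "website",
                "_type": "blog",
                "_id": "25",
                "_version": 3,
                "result": "noop", // no operation performed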
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "status": 200
            }
        },
        ...... 
    ]
}
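
The noop behavior can be pictured as "merge the partial doc into the stored document, then compare". The following is only a conceptual model in plain Python, not Elasticsearch's actual implementation; `merge` and `partial_update` are hypothetical names, and the starting document is taken from the bulk index example in 【1.3】:

```python
import copy

def merge(target, patch):
    """Recursively merge `patch` into `target` (nested objects are merged, other values replaced)."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value
    return target

def partial_update(existing, doc):
    """Return (result, document): "noop" if the merge changes nothing, else "updated"."""
    merged = merge(copy.deepcopy(existing), doc)
    if merged == existing:
        return "noop", existing
    return "updated", merged

stored = {"title": "zhangsan25_bulk", "body": "成都欢迎你25"}
patch = {"title": "zhangsan25_bulk_update01"}

result, stored = partial_update(stored, patch)
print(result)  # updated
result, stored = partial_update(stored, patch)
print(result)  # noop
```

Fields that the partial doc does not mention (here body) survive the update, which is the practical difference from index, where the whole document is replaced.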

3)Updating a document that does not exist (documents with id=35 and id=36 do not exist) fails with a document missing error:

POST localhost:9200/_bulk 
{"update":{"_index":"website","_type":"blog","_id":"35"}} 
{"doc":{"title":"zhangsan25_bulk_update01"} }
{"update":{"_index":"website","_type":"blog","_id":"36"}} 
{"doc":{"title":"zhangsan26_bulk_update02"} } 


{
    "took": 0,
    "errors": true,
    "items": [
        {
            "update": {
                "_index": "website",
                "_type": "blog",
                "_id": "35",
                "status": 404,
                "error": {
                    "type": "document_missing_exception", // error: document does not exist
                    "reason": "[blog][35]: document missing",
                    "index_uuid": "rAlhUmExQvCXb1pGZJ1tog",
                    "shard": "0",
                    "index": "website"
                }
            }
        },
        ...... 
    ]
}
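
If an update should create the document when it is missing instead of failing, the update request body accepts a doc_as_upsert flag. A minimal sketch of generating such lines; `update_lines` is a hypothetical helper and its parameters are assumptions made for this example:

```python
import json

def update_lines(metadata, doc, upsert_if_missing=False):
    """Build the two bulk lines (action + request body) for one update."""
    body = {"doc": doc}
    if upsert_if_missing:
        # Use the partial doc as the full document when the id does not exist yet.
        body["doc_as_upsert"] = True
    return json.dumps({"update": metadata}) + "\n" + json.dumps(body) + "\n"

payload = update_lines(
    {"_index": "website", "_type": "blog", "_id": "35"},
    {"title": "zhangsan35_bulk_update01"},
    upsert_if_missing=True,
)
print(payload, end="")
# {"update": {"_index": "website", "_type": "blog", "_id": "35"}}
# {"doc": {"title": "zhangsan35_bulk_update01"}, "doc_as_upsert": true}
```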

【2】Bulk-importing sample data with bulk

POST  localhost:9200/bank/account/_bulk
(request body: the sample data)

Sample data taken from:  https://github.com/linuxacademy/content-elasticsearch-deep-dive/blob/master/sample_data/accounts.json
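
Outside a GUI client, the same import can be scripted. A minimal sketch using only the Python standard library: `bulk_request` is a hypothetical helper, the two-line payload is a made-up stand-in for the real accounts.json content, and the request is only built here, not sent, since sending needs a running cluster on localhost:9200.

```python
import urllib.request

def bulk_request(url, ndjson_bytes):
    """Build a POST request for a bulk endpoint from newline-delimited JSON bytes."""
    if not ndjson_bytes.endswith(b"\n"):
        # The bulk body must be terminated by a newline.
        ndjson_bytes += b"\n"
    return urllib.request.Request(
        url,
        data=ndjson_bytes,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )

# Made-up two-line payload; in practice, read the bytes of accounts.json instead.
payload = b'{"index":{"_id":"1"}}\n{"account_number":1}'
req = bulk_request("http://localhost:9200/bank/account/_bulk", payload)
print(req.get_method(), req.full_url)

# To actually send it (requires a running Elasticsearch node):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```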