Deleting Elasticsearch indices to free disk space: commands for deleting indices in bulk

Reposted

mob64ca14048514 2024-08-15 15:09:16


Common Elasticsearch administration operations

1 Check cluster health

GET _cat/health?v

epoch      timestamp cluster       status node.total node.data shards
1531290005 14:20:05  elasticsearch green           1         1      2

pri relo init unassign pending_tasks
  2    0    0        0             0

max_task_wait_time active_shards_percent
                 -                100.0%

status: green, yellow or red

green: every primary shard and every replica shard of every index is active.

yellow: every primary shard of every index is active, but some replica shards are not.

red: at least one primary shard of some index is not active.
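The three rules above can be sketched as a small function. This is an illustration only, not how Elasticsearch computes health internally; the name `cluster_status` and the `(kind, active)` input format are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def cluster_status(shards: Iterable[Tuple[str, bool]]) -> str:
    """Derive green/yellow/red from (kind, active) pairs.

    kind is "p" for a primary shard and "r" for a replica shard
    (a hypothetical input format chosen for this sketch).
    """
    shards = list(shards)
    # red: at least one primary shard is not active
    if any(kind == "p" and not active for kind, active in shards):
        return "red"
    # yellow: all primaries are active, but some replica is not
    if any(kind == "r" and not active for kind, active in shards):
        return "yellow"
    # green: every primary and every replica is active
    return "green"


print(cluster_status([("p", True), ("r", False)]))  # yellow
```

A single-node cluster with replicas configured typically ends up in the yellow case, because a replica cannot be placed on the node that already holds its primary.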

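2 Create an index

Syntax: PUT index_name {index settings}

An index name must be lowercase and must not start with an underscore '_', a hyphen '-' or a plus sign '+'.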
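A minimal sketch of the two naming rules just stated, useful for checking names before sending a request. The helper name `is_valid_index_name` is hypothetical, and Elasticsearch enforces further restrictions (for example on certain special characters) that this sketch does not check.

```python
def is_valid_index_name(name: str) -> bool:
    """Check only the two rules stated above (hypothetical helper)."""
    if not name:
        return False
    # Rule 1: the name must be lowercase.
    if name != name.lower():
        return False
    # Rule 2: the name must not start with '_', '-' or '+'.
    return not name.startswith(("_", "-", "+"))


for candidate in ["test_index1", "Test_Index", "_hidden", "+plus"]:
    print(candidate, is_valid_index_name(candidate))
```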

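Before version 7.0, Elasticsearch by default creates an index with 5 primary shards and one replica shard per primary shard; from 7.0 on, the default is 1 primary shard. Shard allocation also depends on disk usage. With the default disk watermarks, a node whose disk is more than 85% full (less than 15% free) receives no new shards, apart from the primary shards of newly created indices, so replicas stay unassigned. Above 90% Elasticsearch starts relocating shards away from the node, and above 95% (less than 5% free) every index with a shard on that node is made read-only. Elasticsearch also has rules for how shards are distributed: it tries to spread primary shards evenly across the nodes, and a replica shard is never allocated on the same node as the primary shard it copies.

Create an index with the default settings

PUT test_index1

Specify the shard counts when creating an index.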

PUT test_index2
{
	"settings":{
		"number_of_shards" : 2,
		"number_of_replicas" : 1
	}
}

3 Modify an index

Syntax: PUT index_name/_settings {index settings}

Note: once an index has been created, its number of primary shards cannot be changed; the number of replica shards can.

PUT test_index2/_settings
{
   "number_of_replicas" : 2
}
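The primary shard count is fixed because, in its simplest form, the shard that holds a document is derived from a hash of its routing value (by default the document id) modulo the number of primary shards. The sketch below illustrates the idea under stated assumptions: `zlib.crc32` is only a stand-in, since Elasticsearch uses its own hash function, and `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Map a document id to a primary shard number (illustrative only)."""
    # crc32 is a stand-in hash; Elasticsearch uses a different function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


ids = [str(i) for i in range(1, 101)]
# Ids whose shard would change if the index went from 2 to 3 primaries.
moved = [i for i in ids if shard_for(i, 2) != shard_for(i, 3)]
print(f"{len(moved)} of {len(ids)} ids would be looked up on a different shard")
```

Documents already stored under the old shard count would no longer be found where the new formula looks for them. Replicas are plain copies of a primary, so their number can change freely.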

4 Delete an index

Syntax: DELETE index_name1[,index_name2 ...]

DELETE test_index1
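Deleting an index removes all of its data at once, so deleting several old indices in one request is the direct way to free disk space. The following sketch only builds the console line for such a request and sends nothing; `delete_indices_request` is a hypothetical helper, and rejecting wildcards and `_all` is a safety choice made for this sketch, not an Elasticsearch rule.

```python
from typing import Sequence


def delete_indices_request(names: Sequence[str]) -> str:
    """Build the console line that deletes several indices at once."""
    if not names:
        raise ValueError("at least one index name is required")
    for name in names:
        # Design choice of this sketch: refuse wildcards and _all so that a
        # typo cannot expand into "delete everything".
        if name == "_all" or "*" in name:
            raise ValueError(f"refusing broad pattern: {name!r}")
    # Index names are joined with commas and no spaces.
    return "DELETE " + ",".join(names)


print(delete_indices_request(["test_index1", "test_index2"]))
# DELETE test_index1,test_index2
```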

5 View index information

GET _cat/indices?v

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test_index 2PJFQBtzTwOUhcy-QjfYmQ   5   1          0            0       460b           460b

6 Check shard information

View the shard information of the indices.

GET _cat/shards?v

index       shard prirep state      docs store ip             node
test_index2 1     p      STARTED    0    261b  192.168.89.142 mN_pylT
test_index2 1     r      UNASSIGNED
test_index2 1     r      UNASSIGNED
test_index2 0     p      STARTED    0    261b  192.168.89.142 mN_pylT
test_index2 0     r      UNASSIGNED
test_index2 0     r      UNASSIGNED

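7 Add a Document

Add a document to an index.

Elasticsearch creates missing pieces automatically. If the index of the new document does not exist, the index is created; if the index exists but the type does not, the type is created; if both exist, they are used as they are. (The examples here use mapping types, which were deprecated in Elasticsearch 7 and removed in 8; on those versions the type segment of the URL is replaced by _doc.)

7.1 PUT syntax

This form adds a Document with a manually specified id.

Syntax: PUT index_name/type_name/id {field name: field value}

For example: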

PUT test_index/my_type/1
{
	"name":"test_doc_01",
	"remark":"first test elastic search",
	"order_no":1
}
PUT test_index/my_type/2
{
	"name":"test_doc_02",
	"remark":"second test elastic search",
	"order_no":2
}
PUT test_index/my_type/3
{
	"name":"test_doc_03",
	"remark":"third test elastic search",
	"order_no":3
}

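Result:

{
	"_index": "test_index", the index the new document was written to
	"_type": "my_type", the type within that index the document was written to
	"_id": "1", the id that was specified
	"_version": 1, the document version; it starts at 1 and is incremented by every write
	"result": "created", the outcome of this operation: created, updated or deleted
	"_shards": { shard information
		"total": 2, number of shard copies (primary plus replicas) the write should be executed on
		"successful": 1, number of shard copies on which the write succeeded
		"failed": 0
	},
	"_seq_no": 0, sequence number assigned to this operation
	"_primary_term": 1 primary term of the shard at the time of the operation; it increases each time a new primary is promoted
}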

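Running PUT several times with the same Document id overwrites the Document each time. To have Elasticsearch check whether the Document already exists, use the forced-create syntax: if the id already exists in Elasticsearch, the request fails with an error (version conflict, document already exists).

Syntax:

PUT index_name/type_name/id/_create {field name: field value}

or

PUT index_name/type_name/id?op_type=create {field name: field value}

For example: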

PUT test_index/my_type/1/_create
{
	"name":"new_test_doc_01",
	"remark":"first test elastic search",
	"order_no":1
}
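The difference between a plain PUT and a forced create can be modelled with a dictionary. This is a toy model under stated assumptions, not Elasticsearch code: `put_document` and `VersionConflict` are hypothetical names, and the store is an in-memory dict.

```python
class VersionConflict(Exception):
    """Raised when op_type=create hits an existing id (name is hypothetical)."""


def put_document(store: dict, doc_id: str, doc: dict, op_type: str = "index") -> str:
    """Toy model of PUT: 'index' overwrites, 'create' refuses existing ids."""
    exists = doc_id in store
    if op_type == "create" and exists:
        raise VersionConflict(f"[{doc_id}]: version conflict, document already exists")
    store[doc_id] = doc
    return "updated" if exists else "created"


store = {}
print(put_document(store, "1", {"name": "test_doc_01"}))      # created
print(put_document(store, "1", {"name": "new_test_doc_01"}))  # updated
try:
    put_document(store, "1", {"name": "another"}, op_type="create")
except VersionConflict as error:
    print("rejected:", error)
```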

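7.2 POST syntax

This form lets Elasticsearch generate the id of the new Document. It differs from adding data with PUT in only one respect: the id is generated automatically. Everything else is identical.

Syntax: POST index_name/type_name {field name: field value}

For example: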

POST test_index/my_type
{
	"name":"test_doc_04",
	"remark":"fourth test elastic search",
	"order_no":4
}

8 Query Documents

8.1 GET by id (single document)

Syntax: GET index_name/type_name/id

For example:

GET test_index/my_type/1

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 1,
	"found": true,
	"_source": { the content of the document that was found
		"name": "test_doc_01",
		"remark": "first test elastic search",
		"order_no":1
	}
}

8.2 GET _mget (batch query)

A batch query is more efficient than issuing several single-document queries and is the recommended approach.

The syntax has three forms, depending on how much is given in the URL:

GET _mget
{
    "docs" : [
        {
            "_index" : "index_name",
            "_type" : "type_name",
            "_id" : "id value"
        }, {}, {}
    ]
}

GET index_name/_mget
{
    "docs" : [
        {
            "_type" : "type_name",
            "_id" : "id value"
        }, {}, {}
    ]
}

GET index_name/type_name/_mget
{
    "docs" : [
        {
            "_id" : "id value"
        },
        {
            "_id" : "id value"
        }
    ]
}
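When the ids come from application code, the request body for any of the three forms can be generated rather than typed. A minimal sketch, assuming a hypothetical helper `mget_body`: pass `index` and `doc_type` only when they are not already part of the URL.

```python
import json
from typing import Iterable, Optional


def mget_body(ids: Iterable[str], index: Optional[str] = None,
              doc_type: Optional[str] = None) -> dict:
    """Build the "docs" array of an _mget request (hypothetical helper)."""
    docs = []
    for doc_id in ids:
        entry = {}
        # Only include what the URL does not already specify.
        if index is not None:
            entry["_index"] = index
        if doc_type is not None:
            entry["_type"] = doc_type
        entry["_id"] = doc_id
        docs.append(entry)
    return {"docs": docs}


# Body for "GET _mget": index and type go into every entry.
print(json.dumps(mget_body(["1", "2"], index="test_index", doc_type="my_type"), indent=2))
```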

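9 Modify a Document

9.1 Replace a Document (full replacement)

The syntax is the same as the PUT|POST syntax for adding a Document.

PUT|POST index_name/type_name/id {field name: field value}

This operation overwrites the Document. During a full replacement Elasticsearch does not modify the stored Document in place. It marks the existing Document as deleted and creates a new Document to hold the data. Documents marked as deleted are physically removed later, when Elasticsearch merges segments in the background.

For example: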

PUT test_index/my_type/1
{
	"name":"new_test_doc_01",
	"remark":"first test elastic search",
	"order_no":1
}

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 2,
	"result": "updated",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 1,
	"_primary_term": 1
}

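9.2 Update a Document (partial update)

Syntax: POST index_name/type_name/id/_update {doc: {field name: field value}}

Updates only some of the fields of a Document. This form also marks the existing data as deleted and creates a new Document composed of the new fields and the unchanged original fields. Compared with full replacement it is only more convenient to use; the underlying execution is almost the same.

For example: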

POST test_index/my_type/1/_update
{
	"doc":{
		"name":" test_doc_01_for_update"
	}
}

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 5,
	"result": "updated",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 2,
	"_primary_term": 1
}

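10 Delete a Document

When a delete is executed, Elasticsearch first marks the Document as deleted instead of physically removing it right away. The physical removal happens later, when segments are merged in the background. Documents marked as deleted are not returned by queries or searches.

Syntax: DELETE index_name/type_name/id

For example: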

DELETE test_index/my_type/1

Result:

{
	"_index": "test_index",
	"_type": "my_type",
	"_id": "1",
	"_version": 6,
	"result": "deleted",
	"_shards": {
		"total": 2,
		"successful": 1,
		"failed": 0
	},
	"_seq_no": 5,
	"_primary_term": 1
}
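The mark-first, remove-later behaviour described in sections 9 and 10 can be modelled with a toy append-only store. This is a sketch under stated assumptions, not Elasticsearch's implementation: `ToyShard` and all of its method names are hypothetical, and `merge` stands in for the later physical cleanup.

```python
class ToyShard:
    """Toy model of append-only storage with delete markers (not real ES code)."""

    def __init__(self):
        self.entries = []    # each entry: {"id", "doc", "deleted"}
        self.versions = {}   # id -> current version number

    def _mark_deleted(self, doc_id):
        for entry in self.entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                entry["deleted"] = True

    def index(self, doc_id, doc):
        """Full replacement: mark the old copy deleted, append a new one."""
        self._mark_deleted(doc_id)
        self.entries.append({"id": doc_id, "doc": doc, "deleted": False})
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1

    def update(self, doc_id, partial):
        """Partial update: merge the fields, then do the same as index()."""
        current = self.get(doc_id)
        if current is None:
            raise KeyError(doc_id)
        self.index(doc_id, {**current, **partial})

    def delete(self, doc_id):
        """Only sets the marker; the entry still occupies space."""
        self._mark_deleted(doc_id)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1

    def get(self, doc_id):
        """Entries marked as deleted are invisible to reads."""
        for entry in self.entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                return entry["doc"]
        return None

    def merge(self):
        """Physically drop the entries marked as deleted."""
        self.entries = [e for e in self.entries if not e["deleted"]]


shard = ToyShard()
shard.index("1", {"name": "test_doc_01", "order_no": 1})
shard.update("1", {"name": "test_doc_01_for_update"})
shard.delete("1")
print(len(shard.entries), shard.versions["1"])  # 2 3
shard.merge()
print(len(shard.entries))  # 0
```

The model also shows why deleting documents does not free disk space immediately, whereas deleting a whole index does.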

11 bulk: batch create, update and delete

Use the bulk syntax to create, update and delete in batches. The format is:

POST _bulk
{ "action_type" : { "metadata_name" : "metadata_value" } }
{ document data | action data }

The possible values of action_type are:

create: forced creation, equivalent to PUT index_name/type_name/id/_create

index: a normal PUT, which creates a Document or fully replaces it

update: partial update, equivalent to POST index_name/type_name/id/_update

delete: deletion

Examples:

Create new data:

POST _bulk
{ "create" : { "_index" : "test_index" , "_type" : "my_type", "_id" : "1" } }
{ "field_name" : "field value" }

Create or fully replace (PUT semantics):

POST _bulk
{ "index" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "2" } }
{ "field_name" : "field value 2" }

Partial update (POST semantics):

POST _bulk
{ "update" : { "_index" : "test_index", "_type" : "my_type" , "_id" : 2, "_retry_on_conflict" : 3 } }
{ "doc" : { "field_name" : "partial update field value" } }

Delete data:

POST _bulk
{ "delete" : { "_index" : "test_index", "_type" : "my_type", "_id" : "2" } }

Several write operations in one request:

POST _bulk
{ "create" : { "_index" : "test_index" , "_type" : "my_type", "_id" : "10" } }
{ "field_name" : "field value" }
{ "index" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "20" } }
{ "field_name" : "field value 2" }
{ "update" : { "_index" : "test_index", "_type" : "my_type" , "_id" : 20, "_retry_on_conflict" : 3 } }
{ "doc" : { "field_name" : "partial update field value" } }
{ "delete" : { "_index" : "test_index", "_type" : "my_type", "_id" : "2" } }

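Note: in the bulk syntax, a single JSON object must not contain line breaks, and separate JSON objects must be separated by line breaks. If one of the operations fails, the others are not affected; the failure is only flagged in the bulk response. The whole bulk request is loaded into memory at once, so an overly large request reduces performance because memory pressure becomes too high. The best bulk request size has to be found by repeated testing. A common starting point is 1000 to 5000 documents per request, increased gradually; measured in bytes, a request size of 5 to 15 MB usually works well.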
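The format and size rules in this note can be sketched as two helpers: one serializes each JSON object onto its own line, the other groups actions into batches under a byte limit. The names `to_ndjson` and `chunk_actions` are hypothetical, the byte limit is an assumption to be tuned by testing, and nothing is sent to a cluster.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple

# One bulk action: the action/metadata object plus an optional body
# (delete actions have no body).
Action = Tuple[dict, Optional[dict]]


def to_ndjson(actions: Iterable[Action]) -> str:
    """Serialize actions as newline-delimited JSON, one object per line."""
    lines = []
    for meta, body in actions:
        # json.dumps without indent never emits a raw line break.
        lines.append(json.dumps(meta))
        if body is not None:
            lines.append(json.dumps(body))
    # The request body must end with a newline.
    return "\n".join(lines) + "\n"


def chunk_actions(actions: Iterable[Action], max_bytes: int) -> Iterator[List[Action]]:
    """Group actions into batches whose serialized size stays within max_bytes."""
    batch, size = [], 0
    for action in actions:
        n = len(to_ndjson([action]).encode("utf-8"))
        if batch and size + n > max_bytes:
            yield batch
            batch, size = [], 0
        # An action larger than max_bytes still gets a batch of its own.
        batch.append(action)
        size += n
    if batch:
        yield batch


actions = [
    ({"index": {"_index": "test_index", "_type": "my_type", "_id": "20"}},
     {"field_name": "field value 2"}),
    ({"delete": {"_index": "test_index", "_type": "my_type", "_id": "2"}}, None),
]
print(to_ndjson(actions), end="")
```

In practice `max_bytes` would start somewhere in the 5 to 15 MB range mentioned above and be adjusted based on measured throughput.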

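The bulk syntax constrains the JSON format to make memory management easier and to keep memory pressure as low as possible. If the format were unrestricted, Elasticsearch would have to parse arbitrarily formatted JSON and convert the whole bulk request into JSON objects or JSON arrays, which would at least double the memory used. With large request volumes the memory pressure would rise sharply, and the JVM garbage collector would have to reclaim garbage frequently, which hurts Elasticsearch performance.

In production the bulk API is used a lot, usually from Java code that builds the requests in a loop. A single bulk request normally performs one kind of operation, for example inserting 10000 documents.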
