Elasticsearch bulk writes: setting the bulk write size

Reposted

mob64ca13fe1aa6 2024-04-15 18:54:31

Tags: ES bulk writes, nosql, thread pool data tuning. Category: architecture, back-end development

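ES tuning

Write tuning
Bulk operations

Send a few tens of MB of data per batch
Disable replicas before writing, re-enable them afterwards
Disable automatic segment merging and automatic refresh while writing, re-enable them afterwards
Choose an appropriate number of shards
Tune thread pool sizes
Routing uses the document ID by default

JVM settings
Configuration tuning

Common settings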

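ES tuning

Write tuning

Bulk operations

Send a few tens of MB of data per batch

Use the Bulk API. Keep each request on the order of tens of MB at most, and benchmark with smaller batches first, since oversized requests put memory pressure on the cluster.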
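
As a minimal sketch of size-bounded batching, the helper below groups documents into NDJSON bodies suitable for POST /_bulk. The function name bulk_payloads, the index name, and the 10 MB default are illustrative assumptions, not values from the original article; tune the limit against your own cluster.

```python
import json
from typing import Dict, Iterable, Iterator


def bulk_payloads(docs: Iterable[Dict], index: str,
                  max_bytes: int = 10 * 1024 * 1024) -> Iterator[bytes]:
    """Yield NDJSON bodies for POST /_bulk.

    Each body stays within max_bytes, except that a single document
    larger than max_bytes is still sent alone in its own body.
    """
    # Action line without "_id", so Elasticsearch generates the ID.
    action = json.dumps({"index": {"_index": index}}).encode("utf-8") + b"\n"
    buf, size = [], 0
    for doc in docs:
        source = json.dumps(doc, ensure_ascii=False).encode("utf-8") + b"\n"
        entry = action + source
        # Flush the current batch before this entry would push it over the limit.
        if buf and size + len(entry) > max_bytes:
            yield b"".join(buf)
            buf, size = [], 0
        buf.append(entry)
        size += len(entry)
    if buf:
        yield b"".join(buf)
```

Each yielded body can be sent to the _bulk endpoint with the Content-Type header set to application/x-ndjson.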

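Disable replicas before writing, re-enable them afterwards

Disable automatic segment merging and automatic refresh while writing, re-enable them afterwards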

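Disable automatic refresh
"index": {
  "refresh_interval": "-1" // disable automatic refresh
}
Disable merge throttling (on older Elasticsearch releases that still support indices.store.throttle.*, this setting turns off I/O throttling of merges; it does not stop merging itself)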

PUT /_cluster/settings
{
   "transient" : {
       "indices.store.throttle.type" : "none" 
   }
}
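
The replica and refresh steps above can be sketched as a pair of request bodies for PUT /<index>/_settings: one applied before the bulk load and one applied afterwards. The helper name bulk_load_settings and the restore values (1 replica, 1s refresh) are assumptions for illustration; restore whatever your index used before the load.

```python
import json
from typing import Dict, Tuple


def bulk_load_settings(replicas_after: int = 1,
                       refresh_after: str = "1s") -> Tuple[Dict, Dict]:
    """Return (before, after) bodies for PUT /<index>/_settings."""
    # Before the load: no replicas, no automatic refresh.
    before = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}
    # After the load: restore the values the index should run with.
    after = {"index": {"number_of_replicas": replicas_after,
                       "refresh_interval": refresh_after}}
    return before, after


before, after = bulk_load_settings(replicas_after=1, refresh_after="1s")
print(json.dumps(before))
print(json.dumps(after))
```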

Choose an appropriate number of shards

Aim for a shard size of about 50 GB, and allow roughly 20-25 shards per 1 GB of heap.
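
The two rules of thumb above reduce to simple arithmetic. The sketch below is only an estimate under those rules; the function name shard_plan and the example figures (1200 GB of data, 30 GB of heap) are hypothetical, and it uses the lower bound of 20 shards per GB of heap.

```python
import math


def shard_plan(data_gb: float, heap_gb_per_node: float,
               target_shard_gb: float = 50.0,
               shards_per_heap_gb: int = 20) -> dict:
    """Estimate the primary shard count and the per-node shard ceiling."""
    # Enough primaries that each holds about target_shard_gb of data.
    primaries = max(1, math.ceil(data_gb / target_shard_gb))
    # Upper bound on shards a node should host, derived from its heap.
    max_shards_per_node = int(heap_gb_per_node * shards_per_heap_gb)
    return {"primary_shards": primaries,
            "max_shards_per_node": max_shards_per_node}


print(shard_plan(data_gb=1200, heap_gb_per_node=30))
# {'primary_shards': 24, 'max_shards_per_node': 600}
```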

Tune thread pool sizes

threadpool.index.size: 64  # thread pool size (2-3x the number of CPU cores is suggested)
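
A small sketch of the 2-3x rule, for deriving a value from the core count of the machine it runs on. The helper name suggested_pool_size and the default factor of 2 are assumptions.

```python
import os


def suggested_pool_size(cpu_count: int, factor: float = 2.0) -> int:
    """Return factor x CPU count, with the factor kept inside the 2-3x range."""
    if not 2.0 <= factor <= 3.0:
        raise ValueError("factor should be between 2 and 3")
    return int(cpu_count * factor)


cpus = os.cpu_count() or 1  # os.cpu_count() may return None
print(f"threadpool.index.size: {suggested_pool_size(cpus)}")
```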

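Routing uses the document ID by default

Supplying your own document ID forces Elasticsearch to check on every write whether that document already exists. Letting Elasticsearch generate the ID avoids this lookup.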
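
To make the difference concrete, the sketch below builds the action line of a bulk request with and without an explicit "_id". The helper name index_action, the index name, and the sample ID are hypothetical.

```python
import json
from typing import Optional


def index_action(index: str, doc_id: Optional[str] = None) -> str:
    """Build a bulk action line; omit "_id" to let Elasticsearch generate it."""
    meta = {"_index": index}
    if doc_id is not None:
        meta["_id"] = doc_id  # explicit ID: the write must check for an existing document
    return json.dumps({"index": meta})


print(index_action("logs"))              # auto-generated ID
print(index_action("logs", "order-42"))  # caller-supplied ID
```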

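JVM settings

Do not give the heap more than 32 GB (in practice, stay slightly below it): past that point the 64-bit JVM can no longer use compressed object pointers.
Lucene uses a large amount of off-heap memory, so leave about half of the machine's memory for it.
Avoid swapping: bootstrap.mlockall: true (renamed bootstrap.memory_lock in Elasticsearch 5.x)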
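
The two heap rules above combine into one calculation: half of physical memory, capped below the 32 GB threshold. This is a sketch only; the helper name suggested_heap_mb and the 31 GB ceiling are assumptions chosen to stay safely under the limit.

```python
def suggested_heap_mb(ram_gb: float, ceiling_gb: float = 31.0) -> int:
    """Half of physical RAM, capped below the compressed-pointer threshold."""
    return int(min(ram_gb / 2.0, ceiling_gb) * 1024)


for ram in (16, 64, 128):
    heap = suggested_heap_mb(ram)
    # Setting -Xms and -Xmx to the same value avoids heap resizing at runtime.
    print(f"{ram} GB RAM -> -Xms{heap}m -Xmx{heap}m")
```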

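Configuration tuning

index.merge.scheduler.max_thread_count: 1    # maximum number of merge threads per index
indices.memory.index_buffer_size: 30%        # share of heap used for the indexing buffer
index.translog.durability: async             # fsync the translog asynchronously; faster writes, but recent operations can be lost on a crash
index.translog.sync_interval: 120s           # interval between translog fsyncs
discovery.zen.ping_timeout: 120s             # ping timeout during discovery
discovery.zen.fd.ping_interval: 120s         # interval between fault-detection pings
discovery.zen.fd.ping_timeout: 120s          # timeout for a fault-detection ping
discovery.zen.fd.ping_retries: 6             # number of ping retries
thread_pool.bulk.size: 20                    # number of write threads; our query threads are fixed in application code, so only the write thread count is tuned here
thread_pool.bulk.queue_size: 1000            # queue size of the write thread pool
index.refresh_interval: 300s                 # index refresh interval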

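Common settings

cluster.name: estest  # cluster name
node.name: "testanya"  # node name

node.master: false  # whether the node is master-eligible
node.data: true  # whether the node stores data

index.store.type: niofs  # file access method
index.cache.field.type: soft  # cache type

bootstrap.mlockall: true  # lock memory to prevent swapping

gateway.type: local  # local storage

gateway.recover_after_nodes: 3  # recovery may start once 3 data nodes have joined

gateway.recover_after_time: 5m  # wait 5 minutes before starting recovery

gateway.expected_nodes: 4  # start recovery as soon as 4 ES nodes have joined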
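
How the three gateway settings interact can be sketched as a small decision function: recovery starts immediately once the expected node count is reached, and otherwise only after both the minimum node count and the wait time are satisfied. This models the documented behaviour in simplified form; the function name and parameters are illustrative, with defaults taken from the values above.

```python
def should_start_recovery(nodes_joined: int, seconds_waited: float,
                          recover_after_nodes: int = 3,
                          recover_after_time_s: float = 300.0,
                          expected_nodes: int = 4) -> bool:
    """Simplified model of the gateway recovery decision."""
    # All expected nodes are present: no reason to wait any longer.
    if nodes_joined >= expected_nodes:
        return True
    # Otherwise require the minimum node count and the full wait time.
    return (nodes_joined >= recover_after_nodes
            and seconds_waited >= recover_after_time_s)


print(should_start_recovery(4, 0))    # True
print(should_start_recovery(3, 10))   # False
print(should_start_recovery(3, 300))  # True
```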

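cluster.routing.allocation.node_initial_primaries_recoveries: 8  # concurrent initial primary shard recoveries per node
cluster.routing.allocation.node_concurrent_recoveries: 2  # concurrent recoveries per node

indices.recovery.max_bytes_per_sec: 250mb  # maximum bandwidth for transferring data between nodes
indices.recovery.concurrent_streams: 8  # concurrent file streams read during recovery

discovery.zen.ping.multicast.enabled: false  # disable multicast discovery
discovery.zen.ping.unicast.hosts: ["192.168.169.11:9300", "192.168.169.12:9300"]

discovery.zen.fd.ping_interval: 10s  # interval between node liveness checks
discovery.zen.fd.ping_timeout: 120s  # liveness check timeout
discovery.zen.fd.ping_retries: 6  # retries after a liveness check times out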
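
These three fault-detection values together determine how long a dead node can go unnoticed. As a rough estimate only, assuming every retry runs to its full timeout after one ping interval has elapsed, the worst case is the interval plus retries times timeout. The function name and this simplified model are assumptions, not the exact Elasticsearch algorithm.

```python
def worst_case_detection_s(ping_interval_s: float, ping_timeout_s: float,
                           ping_retries: int) -> float:
    """Rough upper bound: one interval, then every retry timing out in full."""
    return ping_interval_s + ping_retries * ping_timeout_s


# Values from the settings above: 10s interval, 120s timeout, 6 retries.
seconds = worst_case_detection_s(10, 120, 6)
print(f"up to about {seconds / 60:.1f} minutes")
```

Long timeouts make the cluster tolerant of GC pauses and slow networks during heavy writes, at the cost of reacting slowly to a node that has really failed.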

http.cors.enabled: true  # enable CORS, used by monitoring tools

index.analysis.analyzer.ik.type: "ik"  # IK analyzer

Thread pool settings

threadpool.index.type: fixed  # index thread pool type
threadpool.index.size: 64  # thread pool size (2-3x the number of CPU cores is suggested)
threadpool.index.queue_size: 1000  # queue size

threadpool.search.size: 64  # search thread pool size
threadpool.search.type: fixed  # search thread pool type
threadpool.search.queue_size: 1000  # queue size

threadpool.get.type: fixed  # get thread pool type
threadpool.get.size: 32  # get thread pool size
threadpool.get.queue_size: 1000  # queue size

threadpool.bulk.type: fixed  # bulk request thread pool type
threadpool.bulk.size: 32  # bulk request thread pool size
threadpool.bulk.queue_size: 1000  # queue size

threadpool.flush.type: fixed  # flush thread pool type
threadpool.flush.size: 32  # flush thread pool size
threadpool.flush.queue_size: 1000  # queue size
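
To illustrate what size and queue_size mean for a fixed pool, here is a toy model: requests run while a thread is free, wait while the queue has room, and are rejected after that. It is a conceptual sketch, not Elasticsearch code; the class name FixedPoolModel is hypothetical and the model never completes work, so it only shows the admission limits.

```python
class FixedPoolModel:
    """Toy model of a fixed thread pool: `size` workers plus a bounded queue."""

    def __init__(self, size: int, queue_size: int) -> None:
        self.size = size
        self.queue_size = queue_size
        self.active = 0   # requests currently occupying a thread
        self.queued = 0   # requests waiting for a thread

    def submit(self) -> str:
        if self.active < self.size:
            self.active += 1
            return "running"
        if self.queued < self.queue_size:
            self.queued += 1
            return "queued"
        return "rejected"  # queue full: the request is refused


pool = FixedPoolModel(size=32, queue_size=1000)
results = [pool.submit() for _ in range(1040)]
print(results.count("running"), results.count("queued"), results.count("rejected"))
# 32 1000 8
```

A larger queue absorbs longer bursts before clients see rejections, but queued requests still hold memory while they wait.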

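indices.store.throttle.type: merge  # throttle merge I/O only; use either this line or the next, not both
indices.store.throttle.type: none  # store throttle type (none disables throttling)
indices.store.throttle.max_bytes_per_sec: 500mb  # maximum disk write bandwidth

index.merge.scheduler.max_thread_count: 8  # maximum number of merge threads per index
index.translog.flush_threshold_size: 600MB  # translog size that triggers a flush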

Use the bulk API to speed up ingestion
For the initial index load, set the replica count to 0
Increase threadpool.index.queue_size to 1000
Increase indices.memory.index_buffer_size to 20%
index.translog.durability: async (writes to disk asynchronously, which speeds up writes)
Increase index.translog.flush_threshold_size to 600MB
Increase index.translog.flush_threshold_ops to 500000
