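Introduction to etcd
System requirements
Because etcd writes its data to disk, its performance depends heavily on disk performance, so SSDs are strongly recommended. One way to find out whether a disk is fast enough for etcd is to run a disk benchmarking tool such as fio. To prevent performance degradation or accidentally overloading the key-value store, etcd enforces a configurable storage size quota, which defaults to 2 GB. To avoid swapping or running out of memory, the machine should have at least as much RAM as the quota. 8 GB is the suggested maximum quota for normal environments, and etcd warns at startup if the configured value exceeds it. At CoreOS, etcd clusters are usually deployed on dedicated CoreOS Container Linux machines with dual-core processors, 2 GB of RAM, and at least 80 GB of SSD storage. Keep in mind that performance depends heavily on the workload, so test before deploying to production.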
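The quota is configured in bytes through the quota-backend-bytes setting. The sketch below converts a quota given in GB into bytes and flags anything above the suggested 8 GB maximum. It assumes GB here means GiB (1024^3 bytes); quota_bytes and check_quota are hypothetical helper names, not part of etcd.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert a quota in GiB to bytes for quota-backend-bytes
# and flag values above the suggested 8 GiB maximum.

quota_bytes() {
  local gib=$1
  echo $(( gib * 1024 * 1024 * 1024 ))
}

check_quota() {
  local gib=$1 bytes
  bytes=$(quota_bytes "$gib")
  if (( gib > 8 )); then
    # etcd logs a warning at startup for quotas above the suggested maximum
    echo "quota-backend-bytes: $bytes  # above the suggested 8 GB maximum"
  else
    echo "quota-backend-bytes: $bytes"
  fi
}

check_quota 2    # the default quota
check_quota 16   # above the suggested maximum
```

With the default 2 GB quota this prints quota-backend-bytes: 2147483648, a line in the same YAML style as the config files used later for the cluster.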
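Why an odd number of cluster members?
An etcd cluster needs a majority of its members, a quorum, to agree on any update to the cluster state. For a cluster with n members the quorum is (n/2)+1, using integer division. For any odd-sized cluster, adding one member always increases the number of members needed for quorum. Adding a member to an odd-sized cluster may look like an improvement because there are more machines, but fault tolerance actually gets worse: exactly the same number of members can fail without losing quorum, while there are now more members that can fail. If the cluster cannot tolerate any more failures, adding a member before removing one is dangerous, because if the new member fails to register with the cluster (for example, because its address is misconfigured), quorum is permanently lost.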
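To make the (n/2)+1 rule concrete, the following sketch tabulates the quorum and the number of tolerated failures for clusters of 1 to 7 members, using integer division. quorum and tolerance are illustrative helper names.

```shell
#!/usr/bin/env bash
# quorum = floor(n/2) + 1; tolerated failures = n - quorum.

quorum()    { echo $(( $1 / 2 + 1 )); }
tolerance() { echo $(( $1 - $(quorum "$1") )); }

printf '%-8s %-7s %s\n' "members" "quorum" "tolerated failures"
for n in 1 2 3 4 5 6 7; do
  printf '%-8s %-7s %s\n' "$n" "$(quorum "$n")" "$(tolerance "$n")"
done
```

The output shows the point made above: going from 3 to 4 members raises the quorum from 2 to 3, yet the cluster still tolerates only one failure.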
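What is the maximum cluster size?
In theory there is no hard limit. In practice, an etcd cluster should probably not have more than seven members. Google's Chubby lock service, which is similar to etcd and has been widely deployed inside Google for many years, recommends running five members. A five-member etcd cluster tolerates two member failures, which is enough in most cases. Larger clusters do tolerate more failures, but write performance suffers because data has to be replicated to more machines.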
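Should I add a new member before removing an unhealthy one?
No. When replacing an etcd member, remove the old member first and then add its replacement. Adding first grows, for example, a three-member cluster to four members and raises the quorum from 2 to 3 while the failed member still counts toward the size; if the new member then fails to join, only 2 of the 4 members are working and the cluster loses quorum.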
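A sketch of the remove-then-add order using etcdctl's member remove and member add subcommands. The member ID, name and peer URL are placeholders (the example values come from the member list later in this article), ENDPOINTS falls back to a single local endpoint if unset, and DRY_RUN is a hypothetical switch that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: replace a failed member by removing it first, then adding its replacement.
# member_id, new_name and new_peer_url are placeholders; DRY_RUN=1 (hypothetical
# switch) prints the commands instead of running them.

run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "+ $*"; else "$@"; fi
}

replace_member() {
  local member_id=$1 new_name=$2 new_peer_url=$3
  local endpoints="${ENDPOINTS:-127.0.0.1:2379}"
  # 1. Remove the failed member first, so it no longer counts toward the cluster size.
  run etcdctl --endpoints="$endpoints" member remove "$member_id" || return 1
  # 2. Register the replacement; it must then be started with an empty data dir
  #    and initial-cluster-state: existing.
  run etcdctl --endpoints="$endpoints" member add "$new_name" --peer-urls="$new_peer_url"
}
```

For example, DRY_RUN=1 replace_member 72ab37cc61e2023b etcd-2 http://127.0.0.1:2480 prints the two commands in the order they would run.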
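Why does etcd lose its leader during disk latency spikes?
This is intentional: disk latency is part of leader liveness. Suppose the cluster leader needs a minute to fsync a raft log update to disk, but the cluster's election timeout is one second. Even if the leader can still handle network messages (for example, send heartbeats) within the election interval, it is effectively unavailable because it cannot commit any new proposals while it waits on the slow disk. If the cluster frequently loses its leader because of disk latency, try tuning the disk settings or etcd's time parameters (heartbeat-interval and election-timeout).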
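The time parameters are heartbeat-interval and election-timeout, both in milliseconds (defaults 100 and 1000). The sketch below checks a candidate pair against three rules: election-timeout of at least 5 times heartbeat-interval (assumed here to match etcd's startup validation), at most 50000 ms, and at least 10 times the measured round-trip time between members, as etcd's tuning guide advises. check_timing is a hypothetical helper. Note that a longer election timeout only hides disk stalls shorter than itself; it does not make the disk faster.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check heartbeat-interval / election-timeout (milliseconds) against
# a measured round-trip time before writing them into the etcd config file.
# check_timing is a hypothetical helper; the rules are assumptions taken from
# etcd's tuning guide and its startup validation.

check_timing() {
  local heartbeat=$1 election=$2 rtt=$3
  if (( election < 5 * heartbeat )); then
    echo "invalid: election-timeout should be at least 5x heartbeat-interval"
    return 1
  fi
  if (( election > 50000 )); then
    echo "invalid: election-timeout above the 50000 ms upper limit"
    return 1
  fi
  if (( election < 10 * rtt )); then
    echo "risky: election-timeout below 10x the round-trip time"
    return 1
  fi
  echo "ok: heartbeat-interval: $heartbeat, election-timeout: $election"
}

check_timing 100 1000 5   # etcd's defaults with a 5 ms round trip
```

A pair such as check_timing 100 400 5 is rejected because 400 ms is less than five heartbeats.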
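Setting up an etcd cluster
Environment: a single physical machine running a three-member etcd cluster, with each member on its own ports. This layout is only suitable for testing, since one host failure takes down every member. Use an odd number of members: the majority quorum already prevents split-brain, and an odd size gives the most fault tolerance per member.
Step 1: download the etcd release archive: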
wget -c https://github.com/etcd-io/etcd/releases/download/v3.5.2/etcd-v3.5.2-linux-amd64.tar.gz
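Step 2: extract the archive (tar -xzf etcd-v3.5.2-linux-amd64.tar.gz), give each member its own directory with a copy of the binaries (/root/etcd1, /root/etcd2, /root/etcd3), and create a config file for each member. etcd1.conf: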
name: etcd-1
data-dir: /root/etcd1/data
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://127.0.0.1:2379
listen-peer-urls: http://0.0.0.0:2380
initial-advertise-peer-urls: http://127.0.0.1:2380
initial-cluster: etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new
etcd2.conf:
name: etcd-2
data-dir: /root/etcd2/data
listen-client-urls: http://0.0.0.0:2479
advertise-client-urls: http://127.0.0.1:2479
listen-peer-urls: http://0.0.0.0:2480
initial-advertise-peer-urls: http://127.0.0.1:2480
initial-cluster: etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new
etcd3.conf:
name: etcd-3
data-dir: /root/etcd3/data
listen-client-urls: http://0.0.0.0:2579
advertise-client-urls: http://127.0.0.1:2579
listen-peer-urls: http://0.0.0.0:2580
initial-advertise-peer-urls: http://127.0.0.1:2580
initial-cluster: etcd-1=http://127.0.0.1:2380,etcd-2=http://127.0.0.1:2480,etcd-3=http://127.0.0.1:2580
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new
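The three files differ only in the member index and ports, so they can be generated. A sketch, assuming the directory layout implied by the data-dir paths (BASE/etcdN/etcdN.conf, with BASE=/root here); gen_configs is a hypothetical helper, and the port rule 2279+100*N for clients and 2280+100*N for peers simply reproduces the ports above.

```shell
#!/usr/bin/env bash
# Sketch: generate etcd1.conf .. etcd3.conf for a three-member cluster on one host.
# Member N uses client port 2279+100*N and peer port 2280+100*N
# (2379/2380, 2479/2480, 2579/2580), matching the listings above.

gen_configs() {
  local base=$1 i
  local cluster=""
  # Build the shared initial-cluster list: etcd-1=...,etcd-2=...,etcd-3=...
  for i in 1 2 3; do
    cluster+="${cluster:+,}etcd-$i=http://127.0.0.1:$((2280 + 100 * i))"
  done
  for i in 1 2 3; do
    mkdir -p "$base/etcd$i"
    cat > "$base/etcd$i/etcd$i.conf" <<EOF
name: etcd-$i
data-dir: $base/etcd$i/data
listen-client-urls: http://0.0.0.0:$((2279 + 100 * i))
advertise-client-urls: http://127.0.0.1:$((2279 + 100 * i))
listen-peer-urls: http://0.0.0.0:$((2280 + 100 * i))
initial-advertise-peer-urls: http://127.0.0.1:$((2280 + 100 * i))
initial-cluster: $cluster
initial-cluster-token: etcd-cluster-my
initial-cluster-state: new
EOF
  done
}
```

Running gen_configs /root writes /root/etcd1/etcd1.conf through /root/etcd3/etcd3.conf with the same contents as the listings above.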
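Write a startup script and run it from the directory that contains etcd1/, etcd2/ and etcd3/ (here /root):
#!/bin/bash
# Start the three members in the background, one per directory.
CRTDIR=$(pwd)
servers=("etcd1" "etcd2" "etcd3")
for server in "${servers[@]}"
do
    cd "${CRTDIR}/${server}" || exit 1
    # Each directory holds the etcd binary and that member's config file (etcd1.conf, ...).
    nohup ./etcd --config-file="${server}.conf" > "${server}.log" 2>&1 &
    # $! is the PID of the member just started; $? after & would always be 0.
    echo "${server} started, pid $!"
done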
After running the script, check the etcd processes and their listening ports.
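A port check that needs no extra tools: the sketch below uses bash's /dev/tcp redirection to see whether each client and peer port from the config files accepts connections. port_open and check_ports are hypothetical helpers, and a 2-second timeout guards against hangs.

```shell
#!/usr/bin/env bash
# Sketch: report whether each etcd client/peer port accepts TCP connections.
# Relies on bash's /dev/tcp redirection, so ss or netstat are not needed.

port_open() {
  local host=$1 port=$2
  # Opening /dev/tcp/HOST/PORT fails when nothing is listening; timeout avoids hangs.
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

check_ports() {
  local port
  for port in "$@"; do
    if port_open 127.0.0.1 "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: not listening"
    fi
  done
}

# Client and peer ports of etcd-1, etcd-2 and etcd-3.
check_ports 2379 2380 2479 2480 2579 2580
```

When all three members are up, every line should read listening.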
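Check the cluster status. The health check below reads the endpoint list from the ENDPOINTS shell variable, so set it first:
export ENDPOINTS=127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579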
[root@VM-0-15-centos etcd1]# ./etcdctl --write-out=table --endpoints=127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579 endpoint status
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 47a42fb96a975854 |   3.5.2 |   20 kB |     false |      false |         4 |         31 |                 31 |        |
| 127.0.0.1:2479 | 72ab37cc61e2023b |   3.5.2 |   20 kB |     false |      false |         4 |         31 |                 31 |        |
| 127.0.0.1:2579 | 470f778210a711ed |   3.5.2 |   20 kB |      true |      false |         4 |         31 |                 31 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@VM-0-15-centos etcd1]# ./etcdctl --endpoints=$ENDPOINTS endpoint health
127.0.0.1:2579 is healthy: successfully committed proposal: took = 5.033785ms
127.0.0.1:2479 is healthy: successfully committed proposal: took = 5.003254ms
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.990036ms
[root@VM-0-15-centos etcd1]# ./etcdctl -w table member list
+------------------+---------+--------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |  NAME  |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+-----------------------+-----------------------+------------+
| 470f778210a711ed | started | etcd-3 | http://127.0.0.1:2580 | http://127.0.0.1:2579 |      false |
| 47a42fb96a975854 | started | etcd-1 | http://127.0.0.1:2380 | http://127.0.0.1:2379 |      false |
| 72ab37cc61e2023b | started | etcd-2 | http://127.0.0.1:2480 | http://127.0.0.1:2479 |      false |
+------------------+---------+--------+-----------------------+-----------------------+------------+
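When scripting against the endpoint status output, the leader can be picked out of the table. A sketch, assuming the column order shown above (ENDPOINT first, IS LEADER fifth); leader_from_table is a hypothetical helper. For anything beyond quick checks, --write-out=json is easier to parse reliably.

```shell
#!/usr/bin/env bash
# Sketch: print the endpoint whose IS LEADER column is "true" in the output of
#   ./etcdctl --write-out=table --endpoints="$ENDPOINTS" endpoint status
# Splitting a row on "|" makes ENDPOINT field 2 and IS LEADER field 6;
# border lines contain no "|" and the header row never says "true".

leader_from_table() {
  awk -F'|' '
    NF > 6 {
      ep = $2; leader = $6
      gsub(/ /, "", ep); gsub(/ /, "", leader)
      if (leader == "true") print ep
    }'
}

# Usage against the running cluster:
#   ./etcdctl --write-out=table --endpoints="$ENDPOINTS" endpoint status | leader_from_table
```

Fed the status table above, it prints 127.0.0.1:2579.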
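Day-to-day operations
Where the commands below reference $ENDPOINTS, it is set to "127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579". The put and del examples omit --endpoints, so they send their requests only to the default endpoint, 127.0.0.1:2379.
# Add a key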
[root@VM-0-15-centos etcd1]# ./etcdctl put /etc/password 123456
# Delete a key
[root@VM-0-15-centos etcd1]# ./etcdctl del /etc/password
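A small wrapper that passes the full member list to every etcdctl call, so the client can use any of the three members, plus a read-back with get. ectl, ETCDCTL_BIN and DRY_RUN are hypothetical names introduced for this sketch; put, get (with --prefix and --keys-only) and del are standard etcdctl subcommands and flags.

```shell
#!/usr/bin/env bash
# Sketch: route every etcdctl call through all three members.
# ectl, ETCDCTL_BIN and DRY_RUN are hypothetical names for this example;
# etcdctl is assumed to sit in the current directory, as in the transcript above.

ENDPOINTS="${ENDPOINTS:-127.0.0.1:2379,127.0.0.1:2479,127.0.0.1:2579}"

ectl() {
  local bin="${ETCDCTL_BIN:-./etcdctl}"
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "+ $bin --endpoints=$ENDPOINTS $*"
  else
    "$bin" --endpoints="$ENDPOINTS" "$@"
  fi
}

# Typical round trip: write, read back, list keys under /etc, delete.
#   ectl put /etc/password 123456
#   ectl get /etc/password
#   ectl get /etc --prefix --keys-only
#   ectl del /etc/password
```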
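Defragmentation
To defragment a member's data directory offline (the member must be stopped), use etcdutl with --data-dir:
./etcdutl defrag --data-dir /root/etcd1/data
To defragment running members, use etcdctl defrag. --endpoints selects the members; an unreachable endpoint such as badendpoint below is reported as failed while the reachable ones are still defragmented:
./etcdctl --endpoints=localhost:2379,badendpoint:2379 defrag
--cluster defragments every member in the cluster membership:
./etcdctl defrag --cluster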
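Defragmenting a live member blocks its reads and writes while it rebuilds its backend. An explicit loop gives some control the one-liners above do not: it stops at the first member that fails and pauses between members so each can catch up before the next is blocked. A sketch over the comma-separated ENDPOINTS list; defrag_one_by_one, PAUSE_SECONDS and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: defragment members one at a time, stopping on the first failure.
# defrag_one_by_one, PAUSE_SECONDS and DRY_RUN are hypothetical names.

defrag_one_by_one() {
  local endpoints=$1 ep
  local -a list
  IFS=',' read -r -a list <<< "$endpoints"
  for ep in "${list[@]}"; do
    if [[ -n "${DRY_RUN:-}" ]]; then
      echo "+ etcdctl --endpoints=$ep defrag"
      continue
    fi
    # Stop here instead of moving on to the next member.
    etcdctl --endpoints="$ep" defrag || return 1
    # Give the member time to catch up before blocking the next one.
    sleep "${PAUSE_SECONDS:-5}"
  done
}
```

For this cluster the call would be defrag_one_by_one "$ENDPOINTS".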
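Migrating from etcd v2 to etcd v3
In the commands below, ENDPOINT is a single member address (for example 127.0.0.1:2379) and ENDPOINTS is the full member list. The migrate step works offline on a stopped member's data and WAL directories. It also depends on the etcdctl version: migrate belongs to older etcdctl releases and may be missing from newer ones such as the v3.5.2 binaries above, so check ./etcdctl --help first.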
# write key in etcd version 2 store
export ETCDCTL_API=2
etcdctl --endpoints=http://$ENDPOINT set foo bar
# read key in etcd v2
etcdctl --endpoints=$ENDPOINTS --output="json" get foo
# stop etcd node to migrate, one by one
# migrate v2 data
export ETCDCTL_API=3
etcdctl --endpoints=$ENDPOINT migrate --data-dir="default.etcd" --wal-dir="default.etcd/member/wal"
# restart etcd node after migrate, one by one
# confirm that the key got migrated
etcdctl --endpoints=$ENDPOINTS get /foo
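Applied to the three members of this article, the one-by-one stop, migrate and restart sequence looks like the sketch below. It assumes the data directories from etcd1.conf to etcd3.conf (/root/etcdN/data, with the WAL in member/wal under it) instead of default.etcd, and an etcdctl build that still has migrate. Stopping and restarting are left as comments because they depend on how the members were started; migrate_all and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: per-member v2 -> v3 migration for the three local members.
# ASSUMPTIONS: data dirs are /root/etcdN/data; each member is stopped before
# its migrate step; the etcdctl in PATH still provides "migrate".
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.

run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "+ $*"; else "$@"; fi
}

migrate_all() {
  local n data_dir
  for n in 1 2 3; do
    data_dir="/root/etcd$n/data"
    echo "# stop etcd-$n before migrating"
    run env ETCDCTL_API=3 etcdctl migrate --data-dir="$data_dir" --wal-dir="$data_dir/member/wal" || return 1
    echo "# restart etcd-$n and wait until endpoint health reports it healthy"
  done
}
```

After the last member is back, the get /foo check above confirms the key was carried over.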