1. Background
Orchestrator (orch) is a MySQL high-availability and replication topology management tool written in Go. It supports reshaping replication topologies, automatic failover, and manual master/slave switchover. It stores its metadata in a MySQL or SQLite backend database and provides a web interface that displays the MySQL replication topology and its status; through the web UI you can change the replication relationships and some configuration of the managed instances. It also offers a command-line client and an HTTP API for day-to-day operations. Compared with MHA, its most important advantage is that it removes the single point of failure of the management node: orchestrator uses the raft protocol to make itself highly available.
An orchestrator process uses a dedicated backend MySQL server, and several orchestrator processes can share that same backend server. In other words, one orchestrator backend database serves multiple orchestrator processes, but only one of those processes is active (in charge of management) at any given time.
Orchestrator only needs a single account with the SUPER, PROCESS, REPLICATION SLAVE, RELOAD and SELECT privileges to connect to the managed database servers; every managed MySQL server must therefore have this dedicated account so that orchestrator can inspect and operate on it.
1. Orchestrator is written in Go.
2. It needs its own backend database to store the metadata describing the managed replication topologies.
3. At least one orchestrator daemon is required, but it is recommended to run several daemons on different hosts. They share a single backend database, and only one of them is active at any time (you can see which node is active on the Status page of the web UI, or by querying the active_node table in the backend database, as sketched below).
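For example, a quick way to check the active node is to query the backend database or hit the HTTP status endpoint. This is only a sketch: the host, port and credentials below are the ones used later in this article, and the exact columns of active_node and the /api/status output may vary between orchestrator versions.
# Ask the backend database which orchestrator node currently holds the active role
mysql -h192.168.221.142 -P3306 -uorchestrator -p123456 orchestrator -e "SELECT * FROM active_node;"
# Or ask any orchestrator node over HTTP (StatusEndpoint defaults to /api/status)
curl -s http://192.168.221.142:3000/api/status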
Its main features are:
① Automatically discovers MySQL replication topologies and displays them in the web UI.
② Refactors replication: you can change replication relationships by dragging instances around in the web UI.
③ Detects master failures and recovers them automatically or manually; custom scripts can be plugged in via hooks.
④ Supports managing replication from both the command line and the web UI (a CLI sketch follows).
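As a sketch of the command-line interface (the config path and instance addresses are the ones set up later in this article), an instance can be discovered and its replication topology printed without the web UI:
# Discover an instance and print the topology of the cluster it belongs to
orchestrator -c discover -i db140:3308 --config=/etc/orchestrator.conf.json
orchestrator -c topology -i db140:3308 --config=/etc/orchestrator.conf.json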
2. Installing Orchestrator
1) Download
https://github.com/outbrain/orchestrator/releases
2) Install
[iddbs@db142 ~]# yum -y install orchestrator-1.5.7-1.x86_64.rpm
[iddbs@db142 ~]# ll /usr/local/orchestrator/
total 15460
-rwxrwxr-x. 1 iddbs iddbs 15821016 Oct 28 2016 orchestrator
-rw-rw-r--. 1 iddbs iddbs 5305 Oct 28 2016 orchestrator-sample.conf.json
drwxr-xr-x. 6 iddbs iddbs 63 Jun 7 03:52 resources
## orchestrator: the application binary
## *.json: the default configuration template
## resources: supporting files for orchestrator: client, web UI, pseudo-GTID and related files
3) Configuration file parameters
{
"Debug": false, --设置debug模式
"EnableSyslog": false, -- 是否把日志输出到系统日志里
"ListenAddress": ":3000", -- web http tpc 监听端口
"MySQLTopologyUser": "orc_client_user",
"MySQLTopologyPassword": "orc_client_password",
"MySQLTopologyCredentialsConfigFile": "",
"MySQLTopologySSLPrivateKeyFile": "",
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "",
"MySQLTopologySSLSkipVerify": true,
"MySQLTopologyUseMutualTLS": false, --是否启用TLS身份验证
"MySQLTopologyMaxPoolConnections": 3,
"DatabaselessMode__experimental": false,
"MySQLOrchestratorHost": "127.0.0.1",
"MySQLOrchestratorPort": 3306, --后端数据库端口
"MySQLOrchestratorDatabase": "orchestrator",
"MySQLOrchestratorUser": "orc_server_user",
"MySQLOrchestratorPassword": "orc_server_password",
"MySQLOrchestratorCredentialsConfigFile": "",
"MySQLOrchestratorSSLPrivateKeyFile": "",
"MySQLOrchestratorSSLCertFile": "",
"MySQLOrchestratorSSLCAFile": "",
"MySQLOrchestratorSSLSkipVerify": true,
"MySQLOrchestratorUseMutualTLS": false, --是否为Orchestrator MySQL实例启用TLS身份验证
"MySQLConnectTimeoutSeconds": 1, --数据库连接超时时间,秒
"DefaultInstancePort": 3306, -数据库默认端口
"SlaveStartPostWaitMilliseconds": 1000,
"DiscoverByShowSlaveHosts": true,
"InstancePollSeconds": 5, --实例之间读取间隔
"ReadLongRunningQueries": true,
"UnseenInstanceForgetHours": 240,
"SnapshotTopologiesIntervalHours": 0,
"InstanceBulkOperationsWaitTimeoutSeconds": 10,
"ActiveNodeExpireSeconds": 5,
"HostnameResolveMethod": "default",
"MySQLHostnameResolveMethod": "@@hostname",
"SkipBinlogServerUnresolveCheck": true, --跳过检查未解析的主机名是否解析为binlog服务器的相同主机名
"ExpiryHostnameResolvesMinutes": 60, --主机名解析到期之前的分钟数
"RejectHostnameResolvePattern": "", --不接受解析主机名的正则表达式。 这样做是为了避免因网络故障而存储错误
"ReasonableReplicationLagSeconds": 10, --复制延迟高于该值表示异常
"ProblemIgnoreHostnameFilters": [], --将与给定的regexp过滤器匹配的主机名最小化问题
"VerifyReplicationFilters": false, --在拓扑重构之前检查复制筛选器
"MaintenanceOwner": "orchestrator",
"ReasonableMaintenanceReplicationLagSeconds": 20,
"MaintenanceExpireMinutes": 10,
"MaintenancePurgeDays": 365,
"CandidateInstanceExpireMinutes": 60,
"AuditLogFile": "", --审计操作的日志文件名。 空的时候禁用
"AuditToSyslog": false, --审计日志是否写入到系统日志
"AuditPageSize": 20,
"AuditPurgeDays": 365,
"RemoveTextFromHostnameDisplay": ".mydomain.com:3306", --去除群集/群集页面上的主机名的文本
"ReadOnly": false,
"AuthenticationMethod": "", --身份验证类型。可选值有:
"" for none, "basic" for BasicAuth,
"multi" for advanced BasicAuth,
"proxy" for forwarded credentials via reverse proxy, 通过反向代理转发凭证
"token" for token based access
"HTTPAuthUser": "", --HTTP基本身份验证的用户名,空表示禁用身份验证
"HTTPAuthPassword": "", --HTTP基本身份验证的密码,空表示禁用密码
"AuthUserHeader": "", "X-Forwarded-User",--当AuthenticationMethod为“proxy”时,HTTP标头指示auth用户
"PowerAuthUsers": [
"*"
], --when AuthenticationMethod == "proxy", the list of users allowed to make changes; all others are read-only
"ClusterNameToAlias": {
"127.0.0.1": "test suite"
},
"SlaveLagQuery": "",
"DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
"DetectClusterDomainQuery": "", --可选查询(在拓扑实例上执行),返回此集群主服务器的VIP / CNAME /别名/任何域名
"DetectInstanceAliasQuery": "", --可选查询(在拓扑实例上执行),返回实例的别名
"DetectPromotionRuleQuery": "", --可选查询(在拓扑实例上执行),返回实例的提升规则
"DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com", --一个组的正则表达式模式,从主机名中提取数据中心名称
"PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com", --一个组的正则表达式模式,从主机名中提取物理环境信息
"PromotionIgnoreHostnameFilters": [],
"DetectSemiSyncEnforcedQuery": "", --可选查询(在拓扑实例上执行)以确定是否对主写入完全强制执行半同步
"ServeAgentsHttp": false, --产生另一个专用于orchestrator-agent的HTTP接口
"AgentsServerPort": ":3001", --回调接口
"AgentsUseSSL": false,
"AgentsUseMutualTLS": false,
"AgentSSLSkipVerify": false,
"AgentSSLPrivateKeyFile": "",
"AgentSSLCertFile": "",
"AgentSSLCAFile": "",
"AgentSSLValidOUs": [],
"UseSSL": false,
"UseMutualTLS": false,
"SSLSkipVerify": false,
"SSLPrivateKeyFile": "",
"SSLCertFile": "",
"SSLCAFile": "",
"SSLValidOUs": [],
"URLPrefix": "",
"StatusEndpoint": "/api/status", --状态查看,默认为'/api/status'
"StatusSimpleHealth": true,
"StatusOUVerify": false, --如果为true,请尝试在Mutual TLS打开时验证OU。 默认为false
"HttpTimeoutSeconds": 60,
"AgentPollMinutes": 60,
"AgentAutoDiscover": false,
"UnseenAgentForgetHours": 6,
"StaleSeedFailMinutes": 60,
"SeedAcceptableBytesDiff": 8192,
"PseudoGTIDPattern": "",
"PseudoGTIDPatternIsFixedSubstring": false,
"PseudoGTIDMonotonicHint": "asc:",
"DetectPseudoGTIDQuery": "",
"PseudoGTIDCoordinatesHistoryHeuristicMinutes": 2,
"BinlogEventsChunkSize": 10000,
"BufferBinlogEvents": true,
"SkipBinlogEventsContaining": [],
"ReduceReplicationAnalysisCount": true,
"FailureDetectionPeriodBlockMinutes": 60,
"RecoveryPollSeconds": 10,
"RecoveryPeriodBlockSeconds": 3600,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": [
"_master_pattern_"
],
"RecoverIntermediateMasterClusterFilters": [
"_intermediate_master_pattern_"
],
"OnFailureDetectionProcesses": [
"echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [
"echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [
"echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostUnsuccessfulFailoverProcesses": [],
"PostMasterFailoverProcesses": [
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostIntermediateMasterFailoverProcesses": [
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"CoMasterRecoveryMustPromoteOtherCoMaster": true,
"DetachLostSlavesAfterMasterFailover": true,
"ApplyMySQLPromotionAfterMasterFailover": false,
"MasterFailoverDetachSlaveMasterHost": false,
"MasterFailoverLostInstancesDowntimeMinutes": 0,
"PostponeSlaveRecoveryOnLagMinutes": 0,
"OSCIgnoreHostnameFilters": [],
"GraphiteAddr": "",
"GraphitePath": "",
"GraphiteConvertHostnameDotsToUnderscores": true
}
3. Deployment
1) Environment
Hosts:
db140 192.168.221.140
db141 192.168.221.141
db142 192.168.221.142
Backend MySQL port: 3306
Managed (test) MySQL port: 3308
Two MySQL instances are installed on each of the three test machines: the backend MySQL used by orch (3306) and the MySQL managed by orch (3308). Following the configuration template above, first create the account on the backend database instance:
mysql> CREATE USER 'orchestrator'@'%' IDENTIFIED BY '123456';
mysql> GRANT ALL ON orchestrator.* TO 'orchestrator'@'%';
Then create the account on the managed MySQL (3308) instances.
## First configure the 3308 instances on 141 and 142 as replicas of 140:3308, i.e.
master:192.168.221.140:3308
slave: 192.168.221.141:3308
slave:192.168.221.142:3308
orchestrator:192.168.221.142
mysql> change master to master_user='rep',master_host='192.168.221.140',master_port=3308,master_password='rep',master_auto_position=1;
mysql> change master to master_host='db140';
mysql> start slave;
## Execute on 140:3308:
CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'Aa123456';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';
GRANT SELECT ON meta.* TO 'orchestrator'@'%';
The meta schema is there for custom queries of your own, e.g. tables such as cluster and pseudo_gtid_status; a sketch is given below and more details follow later.
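As an illustration only (the table layout and query below are assumptions for this article, not something shipped with orchestrator), the meta schema could hold a cluster table that a custom DetectClusterAliasQuery reads:
# Hypothetical meta.cluster table for a custom DetectClusterAliasQuery
mysql -h192.168.221.140 -P3308 -uroot -p -e "
  CREATE DATABASE IF NOT EXISTS meta;
  CREATE TABLE IF NOT EXISTS meta.cluster (
    anchor         TINYINT NOT NULL PRIMARY KEY,
    cluster_name   VARCHAR(128) NOT NULL DEFAULT '',
    cluster_domain VARCHAR(128) NOT NULL DEFAULT ''
  );"
# The matching config entry could then look like:
#   "DetectClusterAliasQuery": "SELECT cluster_name FROM meta.cluster WHERE anchor=1"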
Finally, because the configuration file uses hostnames, /etc/hosts on the managed MySQL hosts needs to be updated:
[iddbs@db140 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.221.140 db140
192.168.221.141 db141
192.168.221.142 db142
2) Orchestrator configuration file
Note: orchestrator is installed on 192.168.221.142; we start with a single-node test.
[iddbs@db142 orchestrator]# vim /etc/orchestrator.conf.json
{
"Debug": true,
"ListenAddress": ":3000", #http开放端口
"MySQLTopologyUser": "orchestrator", #mysql管理账号,所有被管理的MySQL集群都需要有该账号
"MySQLTopologyPassword": "Aa123456", #mysql管理账号密码
"MySQLOrchestratorHost": "192.168.221.142", #后台mysql数据库地址,orchestrator依赖MySQL或者SQLite存储管理数据
"MySQLOrchestratorPort": 3306, #后台mysql数据库端口
"MySQLOrchestratorDatabase": "orchestrator", #后台mysql数据库名
"MySQLOrchestratorUser": "orchestrator", #后台mysql数据库账号
"MySQLOrchestratorPassword": "123456", #后台mysql数据库密码
"DiscoverByShowSlaveHosts": false,
"AuthenticationMethod": "", #取消页面验证
"HTTPAuthUser":"",
"HTTPAuthPassword":"",
"AuthUserHeader": "",
"PowerAuthUsers": [
"*"
],
"RecoverMasterClusterFilters": ["*"],
"RecoverIntermediateMasterClusterFilters": ["*"],
}
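Orchestrator expects this file to be valid JSON: the inline # annotations above are for explanation only, and a trailing comma before the closing brace would also break parsing. A minimal sketch of a syntax check, assuming Python is available on the host:
# Validate the actual config file (without the explanatory comments) before starting orchestrator
python -m json.tool /etc/orchestrator.conf.json > /dev/null && echo "config OK"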
Note: when DiscoverByShowSlaveHosts is true, report_host must be configured on the managed instances.
vim /iddbs/data3306/my3306.cnf
report_host=ip
Note: without report_host, SHOW SLAVE HOSTS does not display the host, which makes the program report errors. report_host is a read-only parameter, so the instance must be restarted for it to take effect (a sketch follows).
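A minimal sketch of the change and its verification; the config entries and addresses follow this article's environment and are assumptions:
# In the [mysqld] section of each managed 3308 instance, then restart the instance:
#   report_host = 192.168.221.141        # the instance's own address
#   report_port = 3308
# Verify on the master that the replicas now show up with a host:
mysql -h192.168.221.140 -P3308 -uorchestrator -pAa123456 -e "SHOW SLAVE HOSTS;"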
3) Create a symlink for the command
ln -s /usr/local/orchestrator/orchestrator /usr/bin/
4) Start
orchestrator --debug --config=/etc/orchestrator.conf.json http
It fails with the following error:
2021-06-07 06:00:36 INFO starting orchestrator
2021-06-07 06:00:36 INFO Read config: /etc/orchestrator.conf.json
2021-06-07 06:00:36 DEBUG Initializing orchestrator
2021-06-07 06:00:36 DEBUG Migrating database schema
2021-06-07 06:00:36 FATAL this authentication plugin is not supported
Solution:
On the 142:3306 instance, execute: mysql> alter USER 'orchestrator'@'%' IDENTIFIED with mysql_native_password by '123456';
On the 140:3308 instance, execute: mysql> alter USER 'orchestrator'@'%' IDENTIFIED with mysql_native_password BY 'Aa123456';
Running it again still fails with the same error.
Set default_authentication_plugin=mysql_native_password, restart the instances, and run orchestrator again; it now starts normally (a sketch of the change follows).
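A minimal sketch of that server-side change (the verification credentials follow this article's environment):
# Add to the [mysqld] section of both the backend (3306) and the managed (3308) instances, then restart:
#   default_authentication_plugin = mysql_native_password
# Verify after the restart:
mysql -h192.168.221.142 -P3306 -uorchestrator -p123456 -e "SELECT @@default_authentication_plugin;"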
5) Accessing 192.168.221.142:3000 in a browser returns an error.
6) Change into the orchestrator installation directory /usr/local/orchestrator (presumably because the web files under resources/ are looked up relative to the working directory) and run:
./orchestrator --debug --config=/etc/orchestrator.conf.json http
Open 192.168.221.142:3000. On the first visit no MySQL cluster name is shown; you need to click Discover to discover an instance, as shown:
Click Clusters again, and the cluster alias and its instances appear.
Under Home -> Status you can see the currently healthy/active Orchestrator node:
The detailed replication topology shows one master with two replicas.
If you want db142 to become a replica of db141, i.e. turn the topology into cascaded replication, simply drag db142 onto db141.
To restore the one-master-two-replicas layout, drag db142 back onto db140 (the equivalent CLI commands are sketched below).
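The same refactoring can be done from the command line with the relocate command; a sketch using this article's instance addresses:
# Make db142 replicate from db141 (cascaded topology)
orchestrator -c relocate -i db142:3308 -d db141:3308 --config=/etc/orchestrator.conf.json
# Move db142 back under db140 to restore one master with two replicas
orchestrator -c relocate -i db142:3308 -d db140:3308 --config=/etc/orchestrator.conf.json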
4. Orchestrator High Availability
Orchestrator can be deployed on multiple nodes and uses the raft consensus protocol for its own high availability. For example, deploy Orchestrator nodes on the following three machines:
192.168.221.140
192.168.221.141
192.168.221.142
Modify the configuration file /etc/orchestrator.conf.json on each node
and add the following settings:
"RaftEnabled": true,
"RaftDataDir": "/usr/local/orchestrator",
"RaftBind": "192.168.221.142",
"DefaultRaftPort": 10008,
"RaftNodes": [ "192.168.221.140", "192.168.221.141", "192.168.221.142" ],
Set RaftBind to the IP of the current node, then start orchestrator on each node and check Home -> Status in the web UI, as shown below:
Shut down the orchestrator service on 192.168.221.142 and the leader automatically switches to 192.168.221.141 or 192.168.221.140. When 192.168.221.142 is started again and rejoins the cluster, it comes back as a follower (a quick check of the raft role is sketched below).
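The raft role of each node can also be checked over the HTTP API; a sketch (the endpoints below exist in recent orchestrator releases, but verify against the version in use):
# Returns the raft state of this node (Leader / Follower)
curl -s http://192.168.221.142:3000/api/raft-state
# Returns HTTP 200 only on the raft leader; handy for load-balancer health checks
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.221.142:3000/api/leader-check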