Installing and deploying orchestrator

I. Orchestrator overview

1. What is orchestrator

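Orchestrator is an open-source tool, written in Go, for high availability and visual topology management of MySQL replication. It actively discovers the current topology and replication status, supports rearranging the MySQL replication topology, and provides automatic failover of a failed MySQL master as well as manual switchover.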

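Orchestrator stores its metadata in a MySQL or SQLite backend. It provides a web interface that shows the topology and instance status of MySQL clusters and lets you change some settings of MySQL instances. It also offers a command line and an API for more flexible automated operations. Master failover in Orchestrator is either automatic or manual. Manual failover comes in four forms: recover, force-master-failover, force-master-takeover, and graceful-master-takeover.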
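As a rough illustration of the command-line interface, the sketch below wraps two of the manual operations listed above with orchestrator-client. It assumes orchestrator-client is installed and that ORCHESTRATOR_API points at a running orchestrator. The ORC_CLIENT variable, the function names, and the cluster alias mycluster are placeholders introduced for this example, not values from this setup.

```shell
#!/usr/bin/env bash
# ORC_CLIENT is a hypothetical override so the wrappers can be dry-run with "echo".
ORC_CLIENT=${ORC_CLIENT:-orchestrator-client}

# Planned switchover: promote the designated replica ($2) in cluster alias ($1).
graceful_takeover() {
  "$ORC_CLIENT" -c graceful-master-takeover -alias "$1" -d "$2"
}

# Forced failover of cluster alias ($1); orchestrator chooses the replica to promote.
force_failover() {
  "$ORC_CLIENT" -c force-master-failover -alias "$1"
}
```

For a dry run that only prints the arguments without contacting orchestrator, call it as ORC_CLIENT=echo graceful_takeover mycluster 192.168.220.100:3306.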

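Compared with MHA, Orchestrator focuses more on managing the replication topology: it can rearrange any MySQL replication topology and builds MySQL high availability on top of that. Orchestrator itself can also be deployed on multiple nodes, which use the raft distributed consensus protocol to keep Orchestrator highly available.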

2. Features and functions of orchestrator

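1. Discovery and inspection of replication clusters: it actively discovers the current topology and replication status, and reads basic MySQL information such as replication state and configuration.
2. Safe topology refactoring: orchestrator understands replication rules and can work with binlog file:position, GTID, Pseudo-GTID, and binlog servers, so a replica can be safely moved under another server. It offers clean topology visualization, visualizes replication problems, and lets you change the topology by simple drag and drop.
3. Failure recovery: based on the topology information it can recognize various failure scenarios and, depending on configuration, perform automatic recovery.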

3. Architecture diagram

[Figure: orchestrator architecture diagram]


 

II. Orchestrator test environment plan

IP address        Hostname   Operating system   MySQL version   MySQL port
192.168.220.107   orche-1    CentOS 7.9         5.7.40          3306
192.168.220.10    orche-2    CentOS 7.9         5.7.40          3306
192.168.220.11    orche-3    CentOS 7.9         5.7.40          3306
192.168.220.110   master     CentOS 7.9         5.7.40          3306
192.168.220.100   slave-1    CentOS 7.9         5.7.40          3306
192.168.220.200   slave-2    CentOS 7.9         5.7.40          3306

III. MySQL deployment

1. Install MySQL on all six servers (three for master-replica replication, three for the orchestrator high-availability cluster)

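2. Set up MySQL replication on three of the servers (one master and two replicas; the setup steps are omitted here and were covered in an earlier blog post)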

IV. Orchestrator cluster deployment (perform the same steps on all three nodes)

1. Install the jq tool

# Install the EPEL repository
[root@orche ~]# yum install epel-release -y

# Install jq
[root@orche ~]# yum install jq -y

2. Install orchestrator

Download: https://github.com/openark/orchestrator/releases

[root@orche ~]# cd /orchestrator/
[root@orche orchestrator]# ls
orchestrator-3.2.6-1.x86_64.rpm  orchestrator-client-3.2.6-1.x86_64.rpm
[root@orche orchestrator]# rpm -ivh orchestrator-3.2.6-1.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:orchestrator-1:3.2.6-1           ################################# [100%]
[root@orche orchestrator]# rpm -ivh orchestrator-client-3.2.6-1.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:orchestrator-client-1:3.2.6-1    ################################# [100%]   

# Installation complete; verify
[root@orche orchestrator]# rpm -qa | grep orchestrator
orchestrator-client-3.2.6-1.x86_64
orchestrator-3.2.6-1.x86_64

3. Create the metadata database and user in orchestrator's MySQL

Create them on the orchestrator server (192.168.220.107), and do the same on the other two orchestrator servers

# Log in to MySQL
[root@orche ~]# mysql -uroot -p

# Create the metadata database and user
root@(none) 21:01  mysql>create database orchestrator;
root@(none) 21:01  mysql>create user 'orchestrator'@'%' identified by 'mysql';
root@(none) 21:01  mysql>grant all on orchestrator.* to 'orchestrator'@'%';

4. Create a user on the MySQL master

Create it on the master server (192.168.220.110); the statement is replicated to the two replicas

root@(none) 21:02  mysql>GRANT SELECT, RELOAD, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'orchestrator'@'%' identified by 'mysql';

5. Edit the configuration file

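Work on 192.168.220.107: copy the sample file and edit the copy, then do the same on the other two orchestrator servers. The # annotations in the listing below are explanatory only; JSON does not allow comments, so leave them out of the real file.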

[root@orche ~]# cd /usr/local/orchestrator/
[root@orche orchestrator]# cp orchestrator-sample.conf.json orchestrator.conf.json
[root@orche orchestrator]# cat orchestrator.conf.json 
{
  "Debug": true,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "mysql",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "MySQLOrchestratorHost": "192.168.220.107",	# Use this orchestrator server's own IP address
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "mysql",
  "MySQLOrchestratorCredentialsConfigFile": "",
  "MySQLOrchestratorSSLPrivateKeyFile": "",
  "MySQLOrchestratorSSLCertFile": "",
  "MySQLOrchestratorSSLCAFile": "",
  "MySQLOrchestratorSSLSkipVerify": true,
  "MySQLOrchestratorUseMutualTLS": false,
  "MySQLConnectTimeoutSeconds": 1,
  
  # The sample file has no raft settings by default; add them yourself
  "RaftEnabled": true,
  "RaftDataDir": "/data/orchestrator",
  "RaftBind": "192.168.220.107",			    # Use this orchestrator server's own IP address
  "DefaultRaftPort": 10008,
  "RaftNodes": [
    "192.168.220.10",
    "192.168.220.11",
    "192.168.220.107"
  ], 
  
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "DiscoveryIgnoreReplicaHostnameFilters": [
    "a_host_i_want_to_ignore[.]example[.]com",
    ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
    "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
  ],
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
  "ReplicationLagQuery": "",
  "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "FailMasterPromotionOnLagMinutes": 0,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  # The next two settings must be "*"; otherwise automatic failover will not happen
  "RecoverMasterClusterFilters": [
    "*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "*"
  ],
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo 'Planned takeover complete' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "PreventCrossRegionMasterFailover": false,
  "MasterFailoverDetachReplicaMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeReplicaRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,
  "ConsulAddress": "",
  "ConsulAclToken": "",
  "ConsulKVStoreProvider": "consul"
}
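
Because only the host-specific values differ between the three nodes, the per-node file could also be generated instead of edited by hand. Below is a minimal sketch using jq (installed in step 1). The function name make_node_config is made up for this example, and the template passed in must be valid JSON, that is, without the # annotations shown above.

```shell
#!/usr/bin/env bash
# make_node_config <template.json> <node-ip>
# Prints a copy of the template with the node-specific and raft keys set.
# The raft values mirror the listing above; adjust them for other environments.
make_node_config() {
  local template=$1 ip=$2
  jq --arg ip "$ip" '
    .MySQLOrchestratorHost = $ip
    | .RaftEnabled = true
    | .RaftDataDir = "/data/orchestrator"
    | .RaftBind = $ip
    | .DefaultRaftPort = 10008
    | .RaftNodes = ["192.168.220.10", "192.168.220.11", "192.168.220.107"]
  ' "$template"
}
```

For example, make_node_config orchestrator-sample.conf.json 192.168.220.10 > orchestrator.conf.json would produce the file for the second node. The sketch only sets the keys shown; credentials such as MySQLTopologyPassword and the recovery filters still need to be set as in the listing.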

6. Start orchestrator (run it on all three orchestrator nodes)

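# With a three-node raft group, losing one node does not affect operation and failover still works; if two nodes are down, orchestrator is unavailable and cannot perform failover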

[root@orche ~]# cd /usr/local/orchestrator/

# Run it in the background
[root@orche orchestrator]# nohup ./orchestrator http &

# Check the process
[root@orche orchestrator]# ps aux | grep orchestrator
root     116243  0.5  1.5 916220 29452 ?        Sl   10:12   3:55 ./orchestrator http
root     126975  0.0  0.0 112808   968 pts/0    S+   21:17   0:00 grep --color=auto orchestrato
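
The note at the start of this step about losing one or two nodes follows from raft requiring a strict majority of nodes to be alive. Here is a small sketch of that arithmetic; the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Minimum number of nodes that must be alive for a raft group of size $1.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that may fail without losing quorum in a group of size $1.
raft_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_tolerated_failures "$n")"
done
# prints:
# nodes=3 quorum=2 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```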

Open the orchestrator web page

Visit http://192.168.220.107:3000/, or http://192.168.220.10:3000/ or http://192.168.220.11:3000/
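
Besides opening the pages in a browser, the three nodes can be probed from a shell. The sketch below assumes curl is available. The status endpoint is the StatusEndpoint from the configuration, while the leader-check endpoint and its behaviour (only the raft leader answering 200) are assumptions to verify against the orchestrator raft documentation. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Node IPs are taken from the environment table; port 3000 is ListenAddress.
ORC_NODES="192.168.220.107 192.168.220.10 192.168.220.11"

# Build the URL of an API endpoint ($2) on one node ($1).
api_url() {
  echo "http://$1:3000/api/$2"
}

# Print the HTTP status code each node returns for the given endpoint.
# curl reports 000 when a node cannot be reached.
probe_nodes() {
  local endpoint=$1 ip code
  for ip in $ORC_NODES; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$(api_url "$ip" "$endpoint")")
    echo "$ip $endpoint $code"
  done
}

# Usage (not run automatically):
#   probe_nodes status         # health endpoint configured as StatusEndpoint
#   probe_nodes leader-check   # assumed: only the raft leader answers 200
```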

The MySQL cluster needs to be imported first

[Figure: screenshot of the orchestrator web UI]

Import succeeded

[Figure: screenshot of the orchestrator web UI]
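
The import can also be done from the command line with orchestrator-client, where it is called discovery. The following is a sketch under the assumption that orchestrator-client is installed on the machine running it. The function names are made up for this example, and the space-separated ORCHESTRATOR_API list relies on orchestrator-client locating the raft leader by itself.

```shell
#!/usr/bin/env bash
# With raft, orchestrator-client can be given every node's API endpoint
# (space-separated) and is expected to find the leader on its own.
export ORCHESTRATOR_API="http://192.168.220.107:3000/api http://192.168.220.10:3000/api http://192.168.220.11:3000/api"

# Ask orchestrator to discover the topology starting from one instance ($1).
discover_cluster() {
  orchestrator-client -c discover -i "$1"
}

# Print the discovered topology of the cluster that contains instance $1.
show_topology() {
  orchestrator-client -c topology -i "$1"
}

# Usage (not run automatically):
#   discover_cluster 192.168.220.110:3306
#   show_topology 192.168.220.110:3306
```

Starting discovery from the master (192.168.220.110:3306) should be enough, since orchestrator follows the replication links to find the two replicas.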