MySQL High-Availability Replication Management Tool - Orchestrator

Test Notes

Environment

Server environment:

Three servers
1. MySQL instances (3306 is orchestrator's backend database; 3307 is the MySQL master-slave setup, with GTID enabled):
Master: 192.168.163.131:3307
Slave: 192.168.163.132:3307
Slave: 192.168.163.133:3307

2. hosts (/etc/hosts):
192.168.163.131 test1
192.168.163.132 test2
192.168.163.133 test3

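Note that orchestrator's detection of a dead master relies on the replicas' IO threads: when orchestrator itself cannot reach the master, it also checks through the replicas whether the master has really failed. With the defaults of a plain CHANGE MASTER, the replicas take too long to notice that the master is gone, so adjust the setup slightly:

change master to master_host='192.168.163.131',master_port=3307,master_user='rep',master_password='rep',master_auto_position=1,MASTER_HEARTBEAT_PERIOD=2,MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400;

set global slave_net_timeout=8;

slave_net_timeout (global variable): since MySQL 5.7.7 the default is 60 seconds. It defines how many seconds the slave waits for data from the master; once that time has passed, the slave stops reading, drops the connection, and tries to reconnect.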

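master_heartbeat_period: the replication heartbeat interval, which defaults to half of slave_net_timeout. When the master has no data to send, it sends a heartbeat every master_heartbeat_period seconds, so the slave can tell whether the master is still alive.

slave_net_timeout sets how long the slave may go without receiving data before it treats the network as timed out, after which the slave's IO thread reconnects to the master. Together, these two settings keep network problems from stalling replication. master_heartbeat_period is in seconds and may be fractional, e.g. 10.5, with a resolution of 1 millisecond.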

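The retry strategy is as follows.

If the slave has received no data from the master for slave-net-timeout seconds, it makes its first reconnection attempt. After that it tries again every master-connect-retry seconds, and gives up only after master-retry-count attempts. If it reconnects to the master along the way, it considers the master healthy and starts waiting slave-net-timeout seconds again.

The default slave-net-timeout is 60 seconds, master-connect-retry defaults to 60 seconds, and master-retry-count defaults to 86400. In other words, with the defaults the slave tries to reconnect only after the master has sent nothing at all for a full minute.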
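To make the timing concrete, the sketch below only does the arithmetic of the retry schedule just described: the first reconnection attempt comes after slave_net_timeout seconds without data, then one attempt every master_connect_retry seconds. The function name `retry_schedule` is hypothetical; it talks to no server and ignores network latency.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the times (seconds after the last data received
# from the master) at which the replica's IO thread tries to reconnect,
# following the retry strategy described above.
# Usage: retry_schedule <slave_net_timeout> <master_connect_retry> <attempts>
retry_schedule() {
  local net_timeout=$1 connect_retry=$2 attempts=$3
  local i t
  local -a out=()
  for (( i = 0; i < attempts; i++ )); do
    # first attempt at net_timeout, then one every connect_retry seconds
    t=$(( net_timeout + i * connect_retry ))
    out+=("$t")
  done
  echo "${out[*]}"
}

retry_schedule 60 60 3   # MySQL defaults      -> prints: 60 120 180
retry_schedule 8 1 3     # settings used here  -> prints: 8 9 10
```

With the defaults the first reconnection attempt happens a full minute after the master falls silent; with slave_net_timeout=8 and MASTER_CONNECT_RETRY=1 the replicas start retrying at 8 seconds and then every second. Because MASTER_HEARTBEAT_PERIOD=2 is well below slave_net_timeout, an idle but healthy master keeps sending heartbeats, so the short timeout should not fire on a merely quiet master.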

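With these settings, the replicas notice a failed master in about 8 to 10 seconds, and Orchestrator then starts the failover. Also note that orchestrator manages instances by hostname by default, so add the report_host and report_port parameters to the MySQL configuration file.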

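Database environment:

Orchestrator backend database: when Orchestrator starts, it automatically creates the orchestrator database in the backend database to store its own data.

Databases managed by Orchestrator: some query parameters in the configuration file need a meta database on every managed instance to hold metadata (similar to a CMDB), e.g. pt-heartbeat for measuring replication lag, and a cluster table holding the alias, data center, and so on.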

Here is the cluster table in the test environment:

> CREATE TABLE cluster (
  anchor tinyint(4) NOT NULL,
  cluster_name varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '',
  cluster_domain varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '',
  data_center varchar(128) NOT NULL,
  PRIMARY KEY (anchor)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

> select * from cluster;
+--------+--------------+----------------+-------------+
| anchor | cluster_name | cluster_domain | data_center |
+--------+--------------+----------------+-------------+
|      1 | test         | CaoCao         | BJ          |
+--------+--------------+----------------+-------------+

Testing

Start the Orchestrator process:

./orchestrator --config=/etc/orchestrator.conf.json http

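Open the IP and port of any of the three hosts in a browser (http://192.168.163.131) to reach the web UI, then enter the details of any managed MySQL instance under Clusters > Discover. Once the instance has been added, the web UI looks like this:

Management tasks can be performed in the web UI; the relevant buttons are described below: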

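1. Some modifiable parameters (click any icon of the instance you want to change in the web UI):

Description

Instance Alias: instance alias
Last seen: time of the last check
Self coordinates: the instance's own binlog coordinates
Num replicas: number of replicas
Server ID: MySQL server_id
Server UUID: MySQL UUID
Version: MySQL version
Read only: whether the instance is read-only
Has binary logs: whether binlog is enabled
Binlog format: binlog format
Logs slave updates: whether log_slave_updates is enabled
GTID supported: whether GTID is supported
GTID based replication: whether replication is GTID-based
GTID mode: whether GTID is enabled for replication
Executed GTID set: the GTID set executed by replication
Uptime: time since startup
Allow TLS: whether TLS is enabled
Cluster: cluster alias
Audit: audit the instance
Agent: agent instance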

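Note: in the figure above, every field followed by a button can be changed from the web UI, e.g. read-only and GTID-based replication. Begin Downtime marks the instance as downtimed; if a failover happens while it is downtimed, that instance does not take part.

2. Change the replication topology freely: drag instances in the graph to change replication, and the topology is rebuilt automatically:

3. Automatic failover when the master goes down, for example: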

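As the figure shows, once the master dies it is removed from the topology, the most suitable slave is promoted to master, and the replication topology is repaired. During failover you can watch /tmp/recovery.log (the path is fixed in the configuration file), which records the external scripts run by hooks during failover, similar to MHA's master_ip_failover_script parameter. External scripts can handle things such as VIP switching and proxy, DNS, middleware, or LVS changes; write them to suit your own environment.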
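A minimal sketch of such an external script, assuming a VIP-based setup. Everything here is hypothetical: the script is meant to be registered in PostMasterFailoverProcesses with orchestrator's {failedHost} and {successorHost} placeholders as arguments, and the VIP `192.168.163.200/24`, the interface `eth0`, root SSH access, and the iputils `arping` tool are assumptions. With DRY_RUN=1 (the default) it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical VIP-switch hook. Possible registration (assumption):
#   "PostMasterFailoverProcesses": [
#     "/usr/local/bin/vip_switch.sh {failedHost} {successorHost}"
#   ]
VIP="${VIP:-192.168.163.200/24}"   # assumed virtual IP with prefix length
IFACE="${IFACE:-eth0}"             # assumed network interface
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands

run_on() {
  # run_on <host> <command...>: run remotely over SSH, or just print in dry-run mode
  local host=$1; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh $host $*"
  else
    ssh "$host" "$@"
  fi
}

vip_switch() {
  local failed_host=$1 new_master=$2
  # The old master may be unreachable, so a failure to remove the VIP is ignored.
  run_on "$failed_host" ip addr del "$VIP" dev "$IFACE" || true
  run_on "$new_master" ip addr add "$VIP" dev "$IFACE"
  # Unsolicited ARP so neighbours learn the VIP's new location (${VIP%/*} drops the /24).
  run_on "$new_master" arping -q -c 3 -U -I "$IFACE" "${VIP%/*}"
}

# Only act when executed directly with both hosts, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -eq 2 ]; then
  vip_switch "$1" "$2"
fi
```

Running it once with DRY_RUN=1, e.g. `./vip_switch.sh test1 test2`, is a cheap way to review the exact commands before letting a real failover execute them.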

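4. Orchestrator high availability. Three nodes were deployed from the start, and they communicate through the Raft settings in the configuration file. As long as 2 Orchestrator nodes are healthy, service is unaffected; if 2 nodes fail, failover fails. The figure with 2 failed nodes:

All nodes in the figure are shown in gray: Raft has lost its quorum, so orchestrator can no longer fail over. Compared with MHA, whose Manager is a single point of failure, Orchestrator uses Raft to make itself highly available and to cope with network partitions, especially network failures across data centers. A brief note on Raft: through a consensus algorithm,

orchestrator nodes elect a leader by quorum. With 3 orchestrator nodes, one of them becomes the leader (the quorum size is 2 for 3 nodes and 3 for 5 nodes). Only the leader is allowed to make changes. Each MySQL topology server is probed independently by the three orchestrator nodes; normally the three nodes see more or less the same topology, but each analyzes it independently and writes to its own dedicated backend database server: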

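① All changes must go through the leader.

② The orchestrator command-line client must not be used when raft mode is enabled.

③ In raft mode, use orchestrator-client instead; orchestrator-client can be installed on servers that do not run orchestrator.

④ The failure of a single orchestrator node does not affect orchestrator's availability. In a 3-node setup at most one server may fail; in a 5-node setup, 2 nodes may fail.

⑤ An orchestrator node that shuts down abnormally and then restarts rejoins the Raft group and receives any events it missed, as long as enough Raft log entries are retained.

⑥ To add an orchestrator node that has been away longer than log retention allows, or a node whose backend database is completely empty, clone the backend DB from another active node.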
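The quorum sizes and failure limits quoted above follow from simple majority arithmetic. A tiny sketch; the function names are illustrative only:

```shell
#!/usr/bin/env bash
# Majority arithmetic for a raft group of n orchestrator nodes:
# a quorum is floor(n/2) + 1 nodes, so up to n - quorum nodes may fail.
raft_quorum()    { echo $(( $1 / 2 + 1 )); }
raft_tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_tolerated "$n")"
done
# nodes=3 quorum=2 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

For n=4 the same arithmetic gives a quorum of 3 and still only one tolerated failure, which is why raft groups are normally sized with an odd number of nodes.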

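For more about Raft, see: https://github.com/github/orchestrator/blob/master/docs/raft.md

Orchestrator can be made highly available in 2 ways: through Raft as described above (recommended), or through replication of the backend database. The documentation compares the two deployment approaches in detail. Diagrams of both:

At this point the basic Orchestrator features are working: automatic failover, topology changes, and visual operations in the web UI.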

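5. What the buttons in the web UI do

①: Home > Status: shows orchestrator's status, including uptime, version, backend database, and the state of each Raft node.

②: Cluster > Dashboard: shows all MySQL instances managed by orchestrator.

③: Cluster > Failure analysis: shows failure analysis, including the list of recorded failure types.

④: Cluster > Discover: discovers MySQL instances to be managed.

⑤: Audit > Failure detection: failure detection information, including history.

⑥: Audit > Recovery: recovery information and recovery acknowledgement.

⑦: Audit > Agent: an agent is a service that runs on the MySQL host and talks to orchestrator; it can report operating system, file system, and LVM information to orchestrator and invoke certain commands and scripts.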

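⑧: The icon in the navigation bar that corresponds to these icons in the left sidebar:

Row 1: view and edit the cluster alias.

Row 2: pools.

Row 3: Compact display.

Row 4: Pool indicator.

Row 5: Colorize DC: each data center is shown in a different color.

Row 6: Anonymize: anonymize the hostnames in the cluster.

Note: the icon in the left sidebar shows an instance summary: instance name, alias, failure detection and recovery information, etc.

⑨: The icon in the navigation bar toggles global recovery. When recovery is disabled, no failover takes place.

⑩: The icon in the navigation bar toggles automatic page refresh (every 60 seconds by default).

⑪: The icon in the navigation bar selects the MySQL instance relocation mode:

Smart mode: orchestrator chooses the relocation mode by itself.

Classic mode: classic relocation based on binlog file and position.

GTID mode: GTID-based relocation.

Pseudo GTID mode: Pseudo-GTID-based relocation.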

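That covers the basic Orchestrator tests and the web UI. Compared with MHA the experience is much better: you can change some settings and the replication topology from the web UI, and, most importantly, the MHA Manager single point of failure is gone. What reason is left not to replace MHA? 😃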

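How It Works

Orchestrator performs automatic failover; here is roughly how that process works.

1. Detection

① orchestrator uses the replication topology: it checks the master itself and also observes its slaves.

② If orchestrator cannot connect to the master but can connect to its slaves, it checks through the slaves. If the slaves cannot see the master either (IO thread), the master is judged dead; these are the "two checks".

This detection method is sound: when none of the slaves can reach the master, replication is definitely broken, so a failover is justified. That makes it very reliable in production.

A detected failure does not always lead to automatic recovery, for example when global recovery is disabled, the instance has been downtimed, the previous recovery happened within RecoveryPeriodBlockSeconds, or the failure type is not considered worth recovering. Detection is independent of recovery and is always enabled; the OnFailureDetectionProcesses hooks run on every detection.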
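The conditions above can be summarized as a simple gate. The sketch below is a hypothetical model for illustration only, not orchestrator's actual code: it takes plain 0/1 flags and second counts as arguments and checks the listed conditions in an arbitrary order.

```shell
#!/usr/bin/env bash
# Hypothetical model of the checks that can stop a detected failure from
# being recovered automatically. It mirrors the conditions named in the text,
# nothing more.
# Usage: should_auto_recover <global_recovery_disabled 0|1> <downtimed 0|1> \
#          <seconds_since_last_recovery> <RecoveryPeriodBlockSeconds> <actionable 0|1>
should_auto_recover() {
  local disabled=$1 downtimed=$2 since_last=$3 block=$4 actionable=$5
  if [ "$disabled" = "1" ]; then echo "skip: global recoveries disabled"; return 1; fi
  if [ "$downtimed" = "1" ]; then echo "skip: instance is downtimed"; return 1; fi
  if [ "$since_last" -lt "$block" ]; then echo "skip: within RecoveryPeriodBlockSeconds"; return 1; fi
  if [ "$actionable" != "1" ]; then echo "skip: failure type not worth recovering"; return 1; fi
  echo "recover"
}

should_auto_recover 0 0 7200 3600 1   # prints: recover
should_auto_recover 0 0 600 3600 1    # prints: skip: within RecoveryPeriodBlockSeconds
```

In every "skip" case detection still happens and the detection hooks still run; only the recovery step is withheld.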

Failure detection configuration:

{
  "FailureDetectionPeriodBlockMinutes": 60
}

Hook-related parameters:

{
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countReplicas}' >> /tmp/recovery.log"
  ]
}
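Instead of an inline echo, a hook can call an external script and pass orchestrator's placeholders as arguments. A minimal sketch; the script path `/usr/local/bin/orch_log_hook.sh` and the function name `log_hook` are hypothetical, and the log path defaults to the /tmp/recovery.log used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical hook script. Possible registration (assumption):
#   "OnFailureDetectionProcesses": [
#     "/usr/local/bin/orch_log_hook.sh detection {failureType} {failureCluster} {countReplicas}"
#   ]
LOG_FILE="${LOG_FILE:-/tmp/recovery.log}"

log_hook() {
  # log_hook <phase> <failureType> <failureCluster> <countReplicas>
  local phase=$1 failure_type=$2 cluster=$3 replicas=$4
  printf '%s phase=%s type=%s cluster=%s replicas=%s\n' \
    "$(date '+%F %T')" "$phase" "$failure_type" "$cluster" "$replicas" >> "$LOG_FILE"
}

# Only act when executed directly with arguments, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -ge 4 ]; then
  log_hook "$@"
fi
```

Keeping the logic in a script rather than in the JSON makes it easier to add alerting or timestamps later without touching the orchestrator configuration.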
Related MySQL replication settings:

slave_net_timeout

MASTER_CONNECT_RETRY

2. Recovery

Recovery requires instances that support GTID or Pseudo-GTID and have binlog enabled. The recovery configuration:

{
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [""],
  "RecoverIntermediateMasterClusterFilters": [
    ""
  ]
}

To restrict master recovery to particular clusters, list their names instead, e.g. "RecoverMasterClusterFilters": ["thiscluster", "thatcluster"].

{
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "FailMasterPromotionIfSQLThreadNotUpToDate": true,
  "MasterFailoverLostInstancesDowntimeMinutes": 10,
  "DetachLostReplicasAfterMasterFailover": true
}
Hooks:

{
  "PreGracefulTakeoverProcesses": [
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [],
  "PostGracefulTakeoverProcesses": [
    "echo 'Planned takeover complete' >> /tmp/recovery.log"
  ]
}

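For the meaning of each parameter, see the companion article "MySQL High-Availability Replication Management Tool: Introduction to Orchestrator". Both failure detection and recovery can run custom external scripts (hooks) to coordinate with VIPs, proxies, DNS, and so on.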

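Both intermediate masters (DeadIntermediateMaster) and masters can be recovered:

Intermediate master: recovery moves its replicas under a sibling node. Candidates are matched on whether they have log-slave-updates, whether they lag, whether they use replication filters, which MySQL version they run, and so on.

Master: recovery can be steered toward a particular slave with a promotion rule (register-candidate). The promoted slave is not necessarily the most up-to-date one but the most suitable one. A promotion rule remains valid for 1 hour after it is set.

The promotion rules are:

prefer      -- preferred
neutral     -- neutral (default)
prefer_not  -- preferably not
must_not    -- never promote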
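As a rough illustration of how the four rules rank candidates, here is a hypothetical helper that orders replicas by promotion rule alone. This is a deliberate simplification: orchestrator's real selection also weighs replication position, lag, binlog settings, MySQL versions and data centers, so treat the result as a preference order, not as the replica orchestrator will actually pick.

```shell
#!/usr/bin/env bash
# Hypothetical simplification: rank candidates by promotion rule only.
rule_rank() {
  case $1 in
    prefer)     echo 0 ;;
    neutral)    echo 1 ;;
    prefer_not) echo 2 ;;
    must_not)   echo 99 ;;   # never eligible
    *)          echo 1 ;;    # no rule registered: treat as the default, neutral
  esac
}

# pick_candidate host:port=rule ...  -> best host:port (first one wins on ties),
# prints nothing and returns 1 if every candidate is must_not
pick_candidate() {
  local entry host rule rank best="" best_rank=98
  for entry in "$@"; do
    host=${entry%=*}
    rule=${entry##*=}
    rank=$(rule_rank "$rule")
    if [ "$rank" -lt "$best_rank" ]; then
      best=$host
      best_rank=$rank
    fi
  done
  [ -n "$best" ] && echo "$best"
}

pick_candidate test1:3307=neutral test3:3307=prefer   # prints: test3:3307
```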

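The supported recovery types are automatic recovery, graceful takeover, manual recovery, and forced manual recovery, and the corresponding hooks can run during each of them. See the recovery documentation for the detailed flow and the official documentation for recovery configuration.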

**Addendum:** apart from the automatic failover itself, every recovery needs custom hook scripts to handle external operations such as VIP, DNS, or proxy changes. So how should all these hook parameters be set? Which ones are needed, which are not, and in what order do the hooks run? The article touches on this, but for clarity, here is the order in which hooks run in each recovery scenario. The # comments are annotations only and must be removed in a real JSON configuration file.

"OnFailureDetectionProcesses": [   # runs when a failure is detected
  "echo '② Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreGracefulTakeoverProcesses": [   # runs immediately before the master is made read-only
  "echo '① Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [   # runs immediately before recovery starts
  "echo '③ Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostMasterFailoverProcesses": [   # runs at the end of a successful master recovery
  "echo '④ Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [   # runs at the end of any successful recovery
  "echo '⑤ (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostUnsuccessfulFailoverProcesses": [   # runs at the end of any unsuccessful recovery
  "echo '⑧' >> /tmp/recovery.log"
],
"PostIntermediateMasterFailoverProcesses": [   # runs at the end of a successful intermediate-master recovery
  "echo '⑥ Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostGracefulTakeoverProcesses": [   # runs after the old master has been placed under the newly promoted master
  "echo '⑦ Planned takeover complete' >> /tmp/recovery.log"
],

Master crash, automatic failover:
② Detected UnreachableMaster on test1:3307. Affected replicas: 2
② Detected DeadMaster on test1:3307. Affected replicas: 2
③ Will recover from DeadMaster on test1:3307
④ Recovered from DeadMaster on test1:3307. Failed: test1:3307; Promoted: test2:3307
⑤ (for all types) Recovered from DeadMaster on test1:3307. Failed: test1:3307; Successor: test2:3307

Graceful master takeover: test2:3307 gracefully hands the master role to test1:3307; afterwards, start slave has to be run manually on the demoted master:
orchestrator-client -c graceful-master-takeover -a test2:3307 -d test1:3307
① Planned takeover about to take place on test2:3307. Master will switch to read_only
② Detected DeadMaster on test2:3307. Affected replicas: 1
③ Will recover from DeadMaster on test2:3307
④ Recovered from DeadMaster on test2:3307. Failed: test2:3307; Promoted: test1:3307
⑤ (for all types) Recovered from DeadMaster on test2:3307. Failed: test2:3307; Successor: test1:3307
⑦ Planned takeover complete

Manual recovery: when the slaves are in downtime or maintenance mode and the master then fails, no automatic failover happens; run the recovery manually, naming the dead master:
orchestrator-client -c recover -i test1:3307
② Detected UnreachableMaster on test1:3307. Affected replicas: 2
② Detected DeadMaster on test1:3307. Affected replicas: 2
③ Will recover from DeadMaster on test1:3307
④ Recovered from DeadMaster on test1:3307. Failed: test1:3307; Promoted: test2:3307
⑤ (for all types) Recovered from DeadMaster on test1:3307. Failed: test1:3307; Successor: test2:3307

Forced manual recovery, which recovers regardless of the circumstances:
orchestrator-client -c force-master-failover -i test2:3307
② Detected DeadMaster on test2:3307. Affected replicas: 2
③ Will recover from DeadMaster on test2:3307
② Detected AllMasterSlavesNotReplicating on test2:3307. Affected replicas: 2
④ Recovered from DeadMaster on test2:3307. Failed: test2:3307; Promoted: test1:3307
⑤ (for all types) Recovered from DeadMaster on test2:3307. Failed: test2:3307; Successor: test1:3307

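In the scenarios above, ⑥ and ⑧ never ran. ⑥ runs only when an intermediate master is recovered, so it can be left unset when there is no intermediate master (cascading replication). ⑧ runs only when a recovery fails, which did not happen above; it is a good place to raise alerts.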
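The hook order observed in these logs can be captured as a small lookup, which helps when deciding which hook arrays a scenario actually needs. The scenario labels (`dead-master`, `graceful-takeover`, and so on) are made up for this sketch, and repeated detections such as UnreachableMaster followed by DeadMaster are collapsed into a single OnFailureDetectionProcesses entry.

```shell
#!/usr/bin/env bash
# Hook order per scenario, transcribed from the /tmp/recovery.log output above.
# Scenario names are hypothetical labels, not orchestrator commands.
hook_order() {
  case $1 in
    dead-master|manual-recover|force-failover)
      echo "OnFailureDetectionProcesses PreFailoverProcesses PostMasterFailoverProcesses PostFailoverProcesses" ;;
    graceful-takeover)
      echo "PreGracefulTakeoverProcesses OnFailureDetectionProcesses PreFailoverProcesses PostMasterFailoverProcesses PostFailoverProcesses PostGracefulTakeoverProcesses" ;;
    *)
      echo "unknown scenario: $1" >&2
      return 1 ;;
  esac
}

hook_order graceful-takeover
```

Hooks that never appear in any of these lists (PostIntermediateMasterFailoverProcesses, PostUnsuccessfulFailoverProcesses) only matter for cascading topologies and failed recoveries respectively.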

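Production Deployment

For deploying Orchestrator in production, refer to the documentation.

1. First decide how Orchestrator's own backend database is made highly available: a single MySQL, MySQL replication, or orchestrator's own Raft.

2. Run discovery (web UI or orchestrator-client):

orchestrator-client -c discover -i this.hostname.com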

3. Define promotion rules (some servers are better suited for promotion than others):

orchestrator -c register-candidate -i ${::fqdn} --promotion-rule ${promotion_rule}

4. A server with problems shows up in the Problems drop-down of the web UI. A downtimed server is not listed there and is not recovered; it is effectively in maintenance mode.

orchestrator -c begin-downtime -i ${::fqdn} --duration=5m --owner=cron --reason=continuous_downtime

Or use the API: curl -s "http://my.orchestrator.service:80/api/begin-downtime/my.hostname/3306/wallace/experimenting+failover/45m"
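When scripting maintenance windows, the API URL can be assembled from its parts. A minimal sketch, assuming the path layout visible in the example URL (host, port, owner, reason, duration); `downtime_url` is a hypothetical helper that only encodes spaces as `+`, so reasons containing other special characters would need proper URL encoding.

```shell
#!/usr/bin/env bash
# Hypothetical helper that builds the begin-downtime API URL shown above:
#   <base>/api/begin-downtime/<host>/<port>/<owner>/<reason>/<duration>
downtime_url() {
  local base=$1 host=$2 port=$3 owner=$4 reason=$5 duration=$6
  # ${base%/} drops a trailing slash; ${reason// /+} turns every space into '+'
  echo "${base%/}/api/begin-downtime/${host}/${port}/${owner}/${reason// /+}/${duration}"
}

url=$(downtime_url "http://my.orchestrator.service:80" my.hostname 3306 wallace "experimenting failover" 45m)
echo "$url"
# http://my.orchestrator.service:80/api/begin-downtime/my.hostname/3306/wallace/experimenting+failover/45m
# To actually place the downtime: curl -s "$url"
```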

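5. Pseudo-GTID: if GTID is not enabled in MySQL, Pseudo-GTID can be enabled to get GTID-like functionality.

6. Store metadata. Most metadata is collected through the query parameters, e.g. the cluster alias (DetectClusterAliasQuery), data center (DetectDataCenterQuery), and domain (DetectClusterDomainQuery) read from your own cluster table, plus replication lag (pt-heartbeat) and whether semi-sync is enforced (DetectSemiSyncEnforcedQuery). Some of it can also be derived by regular-expression matching: DataCenterPattern, PhysicalEnvironmentPattern, etc.

7. Instances can be tagged.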

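Using the Command Line and API

Besides the web UI for viewing and management, Orchestrator can run many more management commands through the command line (orchestrator-client) and the API (curl). Some of the most common ones are described below.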

Use help to see the available commands: ./orchestrator-client --help. See the manual for a description of each command.

Usage: orchestrator-client -c <command> [flags...]
Example: orchestrator-client -c which-master -i some.replica
Options:
  -h, --help  print this help
  -c <command>, --command <command>  indicate the operation to perform (see listing below)
  -a <alias>, --alias <alias>  cluster alias
  -o <owner>, --owner <owner>  name of owner for downtime/maintenance commands
  -r <reason>, --reason <reason>  reason for downtime/maintenance operation
  -u <duration>, --duration <duration>  duration for downtime/maintenance operations
  -R <promotion rule>, --promotion-rule <promotion rule>  rule for 'register-candidate' command
  -U <orchestrator_api>, --api <orchestrator_api>  override $orchestrator_api environemtn variable, indicate where the client should connect to.
  -P <api path>, --path <api path>  With '-c api', indicate the specific API path you wish to call
  -b <username:password>, --auth <username:password>  Specify when orchestrator uses basic HTTP auth.
  -q <query>, --query <query>  Indicate query for 'restart-replica-statements' command
  -l <pool name>, --pool <pool name>  pool name for pool related commands
  -H <hostname> -h <hostname>  indicate host for resolve and raft operations

  help  Show available commands
  which-api  Output the HTTP API to be used
  api  Invoke any API request; provide --path argument
  async-discover  Lookup an instance, investigate it asynchronously. Useful for bulk loads
  discover  Lookup an instance, investigate it
  forget  Forget about an instance's existence
  forget-cluster  Forget about a cluster
  topology  Show an ascii-graph of a replication topology, given a member of that topology
  topology-tabulated  Show an ascii-graph of a replication topology, given a member of that topology, in tabulated format
  clusters  List all clusters known to orchestrator
  clusters-alias  List all clusters known to orchestrator
  search  Search for instances matching given substring
  instance|which-instance  Output the fully-qualified hostname:port representation of the given instance, or error if unknown
  which-master  Output the fully-qualified hostname:port representation of a given instance's master
  which-replicas  Output the fully-qualified hostname:port list of replicas of a given instance
  which-broken-replicas  Output the fully-qualified hostname:port list of broken replicas of a given instance
  which-cluster-instances  Output the list of instances participating in same cluster as given instance
  which-cluster  Output the name of the cluster an instance belongs to, or error if unknown to orchestrator
  which-cluster-master  Output the name of a writable master in given cluster
  all-clusters-masters  List of writeable masters, one per cluster
  all-instances  The complete list of known instances
  which-cluster-osc-replicas  Output a list of replicas in a cluster, that could serve as a pt-online-schema-change operation control replicas
  which-cluster-osc-running-replicas  Output a list of healthy, replicating replicas in a cluster, that could serve as a pt-online-schema-change operation control replicas
  downtimed  List all downtimed instances
  dominant-dc  Name the data center where most masters are found
  submit-masters-to-kv-stores  Submit a cluster's master, or all clusters' masters to KV stores
  relocate  Relocate a replica beneath another instance
  relocate-replicas  Relocates all or part of the replicas of a given instance under another instance
  match  Matches a replica beneath another (destination) instance using Pseudo-GTID
  match-up  Transport the replica one level up the hierarchy, making it child of its grandparent, using Pseudo-GTID
  match-up-replicas  Matches replicas of the given instance one level up the topology, making them siblings of given instance, using Pseudo-GTID
  move-up  Move a replica one level up the topology
  move-below  Moves a replica beneath its sibling. Both replicas must be actively replicating from same master.
  move-equivalent  Moves a replica beneath another server, based on previously recorded "equivalence coordinates"
  move-up-replicas  Moves replicas of the given instance one level up the topology
  make-co-master  Create a master-master replication. Given instance is a replica which replicates directly from a master.
  take-master  Turn an instance into a master of its own master; essentially switch the two.
  move-gtid  Move a replica beneath another instance via GTID
  move-replicas-gtid  Moves all replicas of a given instance under another (destination) instance using GTID
  repoint  Make the given instance replicate from another instance without changing the binglog coordinates. Use with care
  repoint-replicas  Repoint all replicas of given instance to replicate back from the instance. Use with care
  take-siblings  Turn all siblings of a replica into its sub-replicas.
  tags  List tags for a given instance
  tag-value  List tags for a given instance
  tag  Add a tag to a given instance. Tag in "tagname" or "tagname=tagvalue" format
  untag  Remove a tag from an instance
  untag-all  Remove a tag from all matching instances
  tagged  List instances tagged by tag-string. Format: "tagname" or "tagname=tagvalue" or comma separated "tag0,tag1=val1,tag2" for intersection of all.
  submit-pool-instances  Submit a pool name with a list of instances in that pool
  which-heuristic-cluster-pool-instances  List instances of a given cluster which are in either any pool or in a specific pool
  begin-downtime  Mark an instance as downtimed
  end-downtime  Indicate an instance is no longer downtimed
  begin-maintenance  Request a maintenance lock on an instance
  end-maintenance  Remove maintenance lock from an instance
  register-candidate  Indicate the promotion rule for a given instance
  register-hostname-unresolve  Assigns the given instance a virtual (aka "unresolved") name
  deregister-hostname-unresolve  Explicitly deregister/dosassociate a hostname with an "unresolved" name
  stop-replica  Issue a STOP SLAVE on an instance
  stop-replica-nice  Issue a STOP SLAVE on an instance, make effort to stop such that SQL thread is in sync with IO thread (ie all relay logs consumed)
  start-replica  Issue a START SLAVE on an instance
  restart-replica  Issue STOP and START SLAVE on an instance
  reset-replica  Issues a RESET SLAVE command; use with care
  detach-replica  Stops replication and modifies binlog position into an impossible yet reversible value.
  reattach-replica  Undo a detach-replica operation
  detach-replica-master-host  Stops replication and modifies Master_Host into an impossible yet reversible value.
  reattach-replica-master-host  Undo a detach-replica-master-host operation
  skip-query  Skip a single statement on a replica; either when running with GTID or without
  gtid-errant-reset-master  Remove errant GTID transactions by way of RESET MASTER
  gtid-errant-inject-empty  Apply errant GTID as empty transactions on cluster's master
  enable-semi-sync-master  Enable semi-sync (master-side)
  disable-semi-sync-master  Disable semi-sync (master-side)
  enable-semi-sync-replica  Enable semi-sync (replica-side)
  disable-semi-sync-replica  Disable semi-sync (replica-side)
  restart-replica-statements  Given `-q "<query>"` that requires replication restart to apply, wrap query with stop/start slave statements as required to restore instance to same replication state. Print out set of statements
  can-replicate-from  Check if an instance can potentially replicate from another, according to replication rules
  can-replicate-from-gtid  Check if an instance can potentially replicate from another, according to replication rules and assuming Oracle GTID
  is-replicating  Check if an instance is replicating at this time (both SQL and IO threads running)
  is-replication-stopped  Check if both SQL and IO threads state are both strictly stopped.
  set-read-only  Turn an instance read-only, via SET GLOBAL read_only := 1
  set-writeable  Turn an instance writeable, via SET GLOBAL read_only := 0
  flush-binary-logs  Flush binary logs on an instance
  last-pseudo-gtid  Dump last injected Pseudo-GTID entry on a server
  recover  Do auto-recovery given a dead instance, assuming orchestrator agrees there's a problem. Override blocking.
  graceful-master-takeover  Gracefully promote a new master. Either indicate identity of new master via '-d designated.instance.com' or setup replication tree to have a single direct replica to the master.
  force-master-failover  Forcibly discard master and initiate a failover, even if orchestrator doesn't see a problem. This command lets orchestrator choose the replacement master
  force-master-takeover  Forcibly discard master and promote another (direct child) instance instead, even if everything is running well
  ack-cluster-recoveries  Acknowledge recoveries for a given cluster; this unblocks pending future recoveries
  ack-all-recoveries  Acknowledge all recoveries
  disable-global-recoveries  Disallow orchestrator from performing recoveries globally
  enable-global-recoveries  Allow orchestrator to perform recoveries globally
  check-global-recoveries  Show the global recovery configuration
  replication-analysis  Request an analysis of potential crash incidents in all known topologies
  raft-leader  Get identify of raft leader, assuming raft setup
  raft-health  Whether node is part of a healthy raft group
  raft-leader-hostname  Get hostname of raft leader, assuming raft setup
  raft-elect-leader  Request raft re-elections, provide hint for new leader's identity

orchestrator-client does not need to run alongside the Orchestrator service and does not need access to the backend database; it can be used from any machine.

**Note:** because Raft is configured with multiple Orchestrator nodes, set the ORCHESTRATOR_API environment variable; orchestrator-client then picks the leader automatically. For example:

export ORCHESTRATOR_API="test1:3000/api test2:3000/api test3:3000/api"

List all clusters: clusters

Default:

# orchestrator-client -c clusters

test2:3307

Include the cluster aliases in the output: clusters-alias

# orchestrator-client -c clusters-alias

test2:3307,test

Discover a given instance: discover/async-discover

Synchronous discovery:

# orchestrator-client -c discover -i test1:3307

test1:3307

Asynchronous discovery, suited to bulk loads:

# orchestrator-client -c async-discover -i test1:3307

:null

Forget an object: forget/forget-cluster

Forget an instance:

# orchestrator-client -c forget -i test1:3307

Forget a cluster:

# orchestrator-client -c forget-cluster -i test

Print a cluster's topology: topology/topology-tabulated

Plain output:

# orchestrator-client -c topology -i test1:3307

test2:3307   [0s,ok,5.7.25-0ubuntu0.16.04.2-log,rw,ROW,>>,GTID]

test1:3307 [0s,ok,5.7.25-0ubuntu0.16.04.2-log,ro,ROW,>>,GTID]
test3:3307 [0s,ok,5.7.25-log,ro,ROW,>>,GTID]

Tabulated output:

# orchestrator-client -c topology-tabulated -i test1:3307

test2:3307  |0s|ok|5.7.25-0ubuntu0.16.04.2-log|rw|ROW|>>,GTID
test1:3307|0s|ok|5.7.25-0ubuntu0.16.04.2-log|ro|ROW|>>,GTID
test3:3307|0s|ok|5.7.25-log                 |ro|ROW|>>,GTID
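The tabulated format is convenient for scripting. A minimal sketch of a hypothetical helper that reads this output on stdin and prints the writable instance(s); it assumes the `|`-separated column order shown above (instance, lag, status, version, ro/rw, binlog format, replication flags) and strips any indentation or padding around the values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print instances whose read-only column is "rw"
# from topology-tabulated output read on stdin.
writable_instances() {
  awk -F'|' '{
    inst = $1; mode = $5
    gsub(/^[ +]+|[ ]+$/, "", inst)   # strip indentation and padding around the name
    gsub(/ /, "", mode)
    if (mode == "rw") print inst
  }'
}

# Typical use:
#   orchestrator-client -c topology-tabulated -i test1:3307 | writable_instances
```

More than one line of output would mean more than one writable instance in the topology, which is usually worth an alert.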

Show which API endpoint is used (the client selects the leader itself): which-api

# orchestrator-client -c which-api

test3:3000/api

This can also be checked via http://192.168.163.133/api/leader-check.
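Scripts that call the API directly with curl can find the leader the same way, by probing leader-check on each endpoint in ORCHESTRATOR_API. A minimal sketch, assuming that /api/leader-check answers HTTP 200 only on the raft leader; orchestrator-client already does this selection for its own calls, and the function names here are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find the raft leader among the endpoints in ORCHESTRATOR_API.
is_leader() {
  # $1 is an API base such as "test1:3000/api"; curl assumes http:// without a scheme.
  # An unreachable node yields http_code 000 and therefore counts as "not leader".
  [ "$(curl -s -o /dev/null -m 2 -w '%{http_code}' "$1/leader-check")" = "200" ]
}

pick_leader() {
  local api
  for api in $ORCHESTRATOR_API; do   # unquoted on purpose: space-separated list
    if is_leader "$api"; then
      echo "$api"
      return 0
    fi
  done
  echo "no raft leader found" >&2
  return 1
}

# Example: curl -s "http://$(pick_leader)/clusters"
```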

Invoke an API request, together with the -path argument: api ... -path

# orchestrator-client -c api -path clusters
[ "test2:3307" ]
# orchestrator-client -c api -path leader-check
"OK"
# orchestrator-client -c api -path status
{ "Code": "OK", "Message": "Application node is healthy"...}

Search for instances: search

# orchestrator-client -c search -i test

test2:3307

test1:3307

test3:3307

Print an instance's master: which-master

# orchestrator-client -c which-master -i test1:3307
test2:3307
# orchestrator-client -c which-master -i test3:3307
test2:3307
# orchestrator-client -c which-master -i test2:3307   # test2 is itself the master
:0

Print an instance's replicas: which-replicas

# orchestrator-client -c which-replicas -i test2:3307

test1:3307

test3:3307

Print an instance's fully qualified name: which-instance

# orchestrator-client -c instance -i test1:3307

test1:3307

Print a master's broken replicas: which-broken-replicas. Here test3's replication was broken on purpose:

# orchestrator-client -c which-broken-replicas -i test2:3307

test3:3307

Given an instance or a cluster alias, print all instances in that cluster: which-cluster-instances

# orchestrator-client -c which-cluster-instances -i test

test1:3307

test2:3307

test3:3307

# orchestrator-client -c which-cluster-instances -i test1:3307

test1:3307

test2:3307

test3:3307

Given an instance, print the name of its cluster (by default the master's hostname:port): which-cluster

# orchestrator-client -c which-cluster -i test1:3307

test2:3307

# orchestrator-client -c which-cluster -i test2:3307

test2:3307

# orchestrator-client -c which-cluster -i test3:3307

test2:3307

Print the writable master of the cluster that a given instance or cluster name belongs to, or of all clusters: which-cluster-master

For a given instance or cluster: which-cluster-master

# orchestrator-client -c which-cluster-master -i test2:3307

test2:3307

# orchestrator-client -c which-cluster-master -i test

test2:3307

For all clusters: all-clusters-masters, which returns one master per cluster

# orchestrator-client -c all-clusters-masters

test1:3307

Print all instances: all-instances

# orchestrator-client -c all-instances

test2:3307

test1:3307

test3:3307

Print the replicas in a cluster that could serve as control replicas for a pt-online-schema-change operation: which-cluster-osc-replicas

# orchestrator-client -c which-cluster-osc-replicas -i test

test1:3307

test3:3307

# orchestrator-client -c which-cluster-osc-replicas -i test2:3307

test1:3307

test3:3307

Print the healthy, replicating replicas in a cluster that could serve as pt-online-schema-change control replicas: which-cluster-osc-running-replicas

# orchestrator-client -c which-cluster-osc-running-replicas -i test

test1:3307

test3:3307

# orchestrator-client -c which-cluster-osc-running-replicas -i test1:3307

test1:3307

test3:3307

Print all downtimed instances: downtimed

# orchestrator-client -c downtimed

test1:3307

test3:3307

Print the data center where most masters are found: dominant-dc

# orchestrator-client -c dominant-dc

BJ

Submit the cluster masters to the KV stores: submit-masters-to-kv-stores

# orchestrator-client -c submit-masters-to-kv-stores

mysql/master/test:test2:3307

mysql/master/test/hostname:test2

mysql/master/test/port:3307

mysql/master/test/ipv4:192.168.163.132

mysql/master/test/ipv6:
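A proxy or DNS hook might need just one field of a cluster's master. A minimal sketch of a hypothetical helper that reads the `key:value` lines shown above on stdin and prints one field; it assumes keys of the form mysql/master/<cluster>/<field> as in this output, and only the key prefix is stripped, so values containing `:` (such as IPv6 addresses) are preserved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract one field of a cluster's master from the
# submit-masters-to-kv-stores output read on stdin.
# Usage: kv_master_field <cluster> <field>
kv_master_field() {
  local cluster=$1 field=$2
  sed -n "s|^mysql/master/${cluster}/${field}:||p"
}

# Example:
#   orchestrator-client -c submit-masters-to-kv-stores | kv_master_field test ipv4
```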

Relocate a replica under another instance: relocate

# orchestrator-client -c relocate -i test3:3307 -d test1:3307   # make test3:3307 a replica of test1:3307

test3:3307<test1:3307

Check the result:

# orchestrator-client -c topology -i test2:3307
test2:3307     [0s,ok,5.7.25-0ubuntu0.16.04.2-log,rw,ROW,>>,GTID]

test1:3307   [0s,ok,5.7.25-0ubuntu0.16.04.2-log,ro,ROW,>>,GTID]

test3:3307 [0s,ok,5.7.25-log,ro,ROW,>>,GTID]

Relocate all replicas of an instance under another instance: relocate-replicas

# orchestrator-client -c relocate-replicas -i test1:3307 -d test2:3307   # move every replica of test1:3307 under test2:3307 and list the moved replicas

test3:3307

Move a slave one level up the topology (in the web UI, this is dragging in Classic mode): move-up

# orchestrator-client -c move-up -i test3:3307 -d test2:3307

test3:3307<test2:3307

The topology changes from test2:3307 -> test1:3307 -> test3:3307 to test2:3307 -> test1:3307 plus test2:3307 -> test3:3307.

Move a slave one level down, beneath a sibling (in the web UI, this is dragging in Classic mode): move-below

# orchestrator-client -c move-below -i test3:3307 -d test1:3307

test3:3307<test1:3307

The topology changes from test2:3307 -> test1:3307 plus test2:3307 -> test3:3307 to test2:3307 -> test1:3307 -> test3:3307.

Move all replicas of a given instance one level up the topology, in Classic mode: move-up-replicas

# orchestrator-client -c move-up-replicas -i test1:3307

test3:3307

The topology changes from test2:3307 -> test1:3307 -> test3:3307 to test2:3307 -> test1:3307 plus test2:3307 -> test3:3307.

Create master-master replication between the given instance and its current master: make-co-master

# orchestrator-client -c make-co-master -i test1:3307

test1:3307<test2:3307

Turn an instance into the master of its own master, swapping the two: take-master

# orchestrator-client -c take-master -i test3:3307

test3:3307<test2:3307

The topology changes from test2:3307 -> test1:3307 -> test3:3307 to test2:3307 -> test3:3307 -> test1:3307.

Move a replica via GTID: move-gtid

Running it through orchestrator-client fails:

# orchestrator-client -c move-gtid -i test3:3307 -d test1:3307

parse error: Invalid numeric literal at line 1, column 9

parse error: Invalid numeric literal at line 1, column 9

parse error: Invalid numeric literal at line 1, column 9

Running it through the orchestrator binary works, but requires the --ignore-raft-setup flag:

# orchestrator -c move-gtid -i test3:3307 -d test2:3307 --ignore-raft-setup

test3:3307<test2:3307

Move all slaves of a given instance under another instance via GTID: move-replicas-gtid

Running it through orchestrator-client fails:

# orchestrator-client -c move-replicas-gtid -i test3:3307 -d test1:3307

jq: error (at <stdin>:1): Cannot index string with string "Key"

Running it through the orchestrator binary works, but requires the --ignore-raft-setup flag:

# ./orchestrator -c move-replicas-gtid -i test2:3307 -d test1:3307 --ignore-raft-setup

test3:3307

Turn the given instance's sibling slaves into its own slaves: take-siblings

# orchestrator-client -c take-siblings -i test3:3307

test3:3307<test1:3307

The topology changes from test1:3307 -> test2:3307 plus test1:3307 -> test3:3307 to test1:3307 -> test3:3307 -> test2:3307.

Tag an instance: tag

# orchestrator-client -c tag -i test1:3307 --tag 'name=AAA'

test1:3307

List an instance's tags: tags

# orchestrator-client -c tags -i test1:3307

name=AAA

Print the value of a tag on an instance: tag-value

# orchestrator-client -c tag-value -i test1:3307 --tag "name"

AAA

Remove a tag from an instance: untag

# orchestrator-client -c untag -i test1:3307 --tag "name=AAA"

test1:3307

List the instances that carry a given tag: tagged

# orchestrator-client -c tagged -t name

test3:3307

test1:3307

test2:3307

Put an instance into downtime, with a duration, owner, and reason: begin-downtime

# orchestrator-client -c begin-downtime -i test1:3307 -duration=10m -owner=zjy -reason 'test'

test1:3307

End downtime for an instance: end-downtime

# orchestrator-client -c end-downtime -i test1:3307

test1:3307

请求指定实例上的维护锁：拓扑更改需要将锁放在最小受影响的实例上，以避免在同一个实例上发生两个不协调的操作，begin-maintenance ：


# orchestrator-client -c begin-maintenance -i test1:3307 --reason "XXX"

test1:3307

By default the lock expires after 10 minutes; this is controlled by the MaintenanceExpireMinutes parameter.

Release the maintenance lock on a given instance: end-maintenance


# orchestrator-client -c end-maintenance -i test1:3307

test1:3307
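Every begin-maintenance should be paired with an end-maintenance, so a small wrapper can make sure the lock is released even when the wrapped step fails. A minimal bash sketch; with_maintenance and ORCH_CLIENT are hypothetical names. If the script dies before releasing the lock, the lock still expires after MaintenanceExpireMinutes.

```shell
# Hypothetical helper, not part of orchestrator-client.
ORCH_CLIENT="${ORCH_CLIENT:-orchestrator-client}"

# with_maintenance INSTANCE REASON COMMAND [ARGS...]
# Takes the maintenance lock, runs COMMAND, always releases the lock, and
# returns COMMAND's exit status.
with_maintenance() {
  local instance="$1" reason="$2"
  shift 2
  "$ORCH_CLIENT" -c begin-maintenance -i "$instance" --reason "$reason" >/dev/null || return 1
  local rc=0
  "$@" || rc=$?
  "$ORCH_CLIENT" -c end-maintenance -i "$instance" >/dev/null
  return "$rc"
}
```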

Set a promotion rule so that a specific instance can be promoted during recovery: register-candidate (must be used together with --promotion-rule)


# orchestrator-client -c register-candidate -i test3:3307 --promotion-rule prefer

test3:3307

This raises the priority of test3:3307, so it is preferred as the new master if a failover takes place.

Stop replication on a given instance:

Plain STOP SLAVE: stop-replica


# orchestrator-client -c stop-replica -i test2:3307

test2:3307

Apply the remaining relay log first, then STOP SLAVE: stop-replica-nice


# orchestrator-client -c stop-replica-nice -i test2:3307

test2:3307

Start replication on a given instance: start-replica


# orchestrator-client -c start-replica -i test2:3307

test2:3307

Restart replication on a given instance: restart-replica


# orchestrator-client -c restart-replica -i test2:3307

test2:3307

Reset replication on a given instance: reset-replica


# orchestrator-client -c reset-replica -i test2:3307

test2:3307

Detach a replica by changing its binlog position (non-GTID): detach-replica


# orchestrator-client -c detach-replica -i test2:3307

Reattach the replica: reattach-replica


# orchestrator-client -c reattach-replica  -i test2:3307

Detach a replica by commenting out its master host (Master_Host becomes //test1, for example): detach-replica-master-host


# orchestrator-client -c detach-replica-master-host -i test2:3307

test2:3307

Reattach the replica: reattach-replica-master-host


# orchestrator-client -c reattach-replica-master-host -i test2:3307

test2:3307

Skip the query the SQL thread is stuck on (for example a duplicate-key error); works with both GTID and non-GTID replication: skip-query


# orchestrator-client -c skip-query -i test2:3307

test2:3307

Inject the errant GTID transactions as empty transactions on the replica's master: gtid-errant-inject-empty (the "fix" action in the web UI)


# orchestrator-client -c gtid-errant-inject-empty  -i test2:3307

test2:3307

Remove errant GTID transactions via RESET MASTER: gtid-errant-reset-master


# orchestrator-client -c gtid-errant-reset-master  -i test2:3307

test2:3307

Semi-synchronous replication settings:


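orchestrator-client -c $variable -i test1:3307

where $variable is one of:

enable-semi-sync-master       enable semi-sync on the master

disable-semi-sync-master      disable semi-sync on the master

enable-semi-sync-replica      enable semi-sync on a replica

disable-semi-sync-replica     disable semi-sync on a replica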

Generate the statements for running SQL that has to be wrapped in stop/start slave: restart-replica-statements


# orchestrator-client -c restart-replica-statements -i test3:3307 -query "change master to master_auto_position=1" | jq -r '.[]'

stop slave io_thread;

stop slave sql_thread;

change master to master_auto_position=1;

start slave sql_thread;

start slave io_thread;



To execute the generated statements directly, pipe them into the mysql client:

orchestrator-client -c restart-replica-statements -i test3:3307 -query "change master to master_auto_position=1" | jq -r '.[]' | mysql -urep -p -htest3 -P3307
Enter password:
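The order of the generated statements is the interesting part: the IO thread is stopped before the SQL thread and started again after it. The bash sketch below (hypothetical function wrap_with_replication_restart) reproduces the output shown above for a single statement, which can help when reviewing what is about to be piped into mysql. It is not a replacement for the command, which may also take the replica's current replication state into account.

```shell
# Hypothetical helper that mirrors the restart-replica-statements output above.
# wrap_with_replication_restart STATEMENT
wrap_with_replication_restart() {
  local statement="${1%;}"   # drop one trailing semicolon if present
  printf '%s\n' \
    "stop slave io_thread;" \
    "stop slave sql_thread;" \
    "$statement;" \
    "start slave sql_thread;" \
    "start slave io_thread;"
}
```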

Check, according to the replication rules, whether an instance can replicate from another instance (GTID and non-GTID):

Non-GTID: can-replicate-from


# orchestrator-client -c can-replicate-from -i test3:3307 -d test1:3307

test1:3307

GTID: can-replicate-from-gtid


# orchestrator-client -c can-replicate-from-gtid -i test3:3307 -d test1:3307

test1:3307
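The check can guard a topology change. The bash sketch below (hypothetical function checked_move_gtid, same hypothetical ORCH_CLIENT convention) only runs move-gtid when can-replicate-from-gtid echoes the destination back, as it did above; treating any other output as a refusal is an assumption, not documented orchestrator behavior. Note that in this test environment move-gtid itself only worked through the orchestrator binary with --ignore-raft-setup.

```shell
# Hypothetical helper, not part of orchestrator-client.
ORCH_CLIENT="${ORCH_CLIENT:-orchestrator-client}"

# checked_move_gtid INSTANCE DEST
# Assumption: a successful check prints DEST; anything else means "no".
checked_move_gtid() {
  local instance="$1" dest="$2" verdict
  verdict="$("$ORCH_CLIENT" -c can-replicate-from-gtid -i "$instance" -d "$dest" 2>/dev/null)"
  if [[ "$verdict" != "$dest" ]]; then
    echo "refusing: $instance cannot replicate from $dest" >&2
    return 1
  fi
  "$ORCH_CLIENT" -c move-gtid -i "$instance" -d "$dest"
}
```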

Check whether a given instance is replicating: is-replicating


# Output returned: the instance is replicating
orchestrator-client -c is-replicating -i test2:3307
test2:3307
# No output: the instance is not replicating
orchestrator-client -c is-replicating -i test1:3307
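Since the answer is encoded as "output or no output", scripts usually want an exit status instead. A minimal bash sketch with a hypothetical is_replicating function (and the same hypothetical ORCH_CLIENT variable):

```shell
# Hypothetical helper, not part of orchestrator-client.
ORCH_CLIENT="${ORCH_CLIENT:-orchestrator-client}"

# is_replicating INSTANCE
# Exit status 0 if is-replicating printed anything, 1 otherwise.
is_replicating() {
  [[ -n "$("$ORCH_CLIENT" -c is-replicating -i "$1" 2>/dev/null)" ]]
}

# Example:
#   if is_replicating test2:3307; then echo "test2:3307 is replicating"; fi
```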

Check whether both the IO and SQL threads of a given instance are stopped:


# orchestrator-client -c is-replication-stopped -i test2:3307

Make a given instance read-only via SET GLOBAL read_only=1: set-read-only


# orchestrator-client -c set-read-only -i test2:3307

test2:3307

Make a given instance writable via SET GLOBAL read_only=0: set-writeable


# orchestrator-client -c set-writeable -i test2:3307

test2:3307

Rotate the binary logs of a given instance: flush-binary-logs


# orchestrator-client -c flush-binary-logs -i test1:3307

test1:3307

Run a recovery manually, specifying a failed instance: recover


# orchestrator-client -c recover -i test2:3307

test3:3307

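In testing, this command forces a recovery even for instances that are downtimed or under maintenance. Topology:

test1:3307 -> test2:3307 -> test3:3307 (downtimed)

When test2:3307 dies, test3:3307 is downtimed, so no failover takes place. After running recover, the topology becomes:

test1:3307 -> test2:3307
test1:3307 -> test3:3307 (test3:3307 now replicates directly from test1:3307)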

Gracefully switch the master with a designated replica: graceful-master-takeover


# orchestrator-client -c graceful-master-takeover -a test1:3307 -d test2:3307

test2:3307

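The topology changes from test1:3307 -> test2:3307 to test2:3307 -> test1:3307. The new master is automatically made writable and the new replica read-only, but replication on the new replica still has to be started manually (start slave).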
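Since the demoted master is left with replication stopped, the follow-up step can be scripted together with the takeover. A minimal bash sketch, assuming (as in the run above) that graceful-master-takeover prints the promoted instance on success; switchover and ORCH_CLIENT are hypothetical names, and start-replica is the command introduced earlier in this article.

```shell
# Hypothetical helper, not part of orchestrator-client.
ORCH_CLIENT="${ORCH_CLIENT:-orchestrator-client}"

# switchover CLUSTER_ALIAS NEW_MASTER OLD_MASTER
# Runs graceful-master-takeover, checks that NEW_MASTER was promoted, then
# starts replication on the demoted OLD_MASTER.
switchover() {
  local cluster_alias="$1" new_master="$2" old_master="$3" promoted
  promoted="$("$ORCH_CLIENT" -c graceful-master-takeover -a "$cluster_alias" -d "$new_master")" || return 1
  if [[ "$promoted" != "$new_master" ]]; then
    echo "unexpected promotion result: '$promoted'" >&2
    return 1
  fi
  "$ORCH_CLIENT" -c start-replica -i "$old_master"
}
```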

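Note that graceful-master-takeover also requires ReplicationCredentialsQuery to be configured, so that orchestrator can look up the replication user and password in the meta table: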


"ReplicationCredentialsQuery":"SELECT repl_user, repl_pass from meta.cluster where anchor=1"

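Force a master failover manually, even if orchestrator has not detected a problem: force-master-failover. After the failover the old master is left standalone and has to be added back to the cluster by hand.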


# orchestrator-client -c force-master-failover -i test1:3307

test3:3307

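Forcibly abandon the master and promote a designated instance: force-master-takeover. The old master (test1) is left standalone and the designated replica (test2) is promoted to master.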


# orchestrator-client -c force-master-takeover -i test1:3307 -d test2:3307

test2:3307

Acknowledge cluster recoveries; in the web UI this is the Acknowledged button under Audit->Recovery (/ack-all-recoveries):

Acknowledge a given cluster: ack-cluster-recoveries


# orchestrator-client -c ack-cluster-recoveries  -i test2:3307 -reason=''

test1:3307

Acknowledge all clusters: ack-all-recoveries


# orchestrator-client -c ack-all-recoveries  -reason='OOOPPP'

eason=XYZ

Check, disable and enable global recoveries in orchestrator:

Check: check-global-recoveries


# orchestrator-client -c check-global-recoveries

enabled

Disable: disable-global-recoveries


# orchestrator-client -c disable-global-recoveries

disabled

Enable: enable-global-recoveries


# orchestrator-client -c enable-global-recoveries

enabled
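For a planned maintenance window, the three commands combine naturally: record the current state, disable recoveries, do the work, then restore the previous state. A minimal bash sketch relying on check-global-recoveries printing enabled or disabled as shown above; without_global_recoveries and ORCH_CLIENT are hypothetical names.

```shell
# Hypothetical helper, not part of orchestrator-client.
ORCH_CLIENT="${ORCH_CLIENT:-orchestrator-client}"

# without_global_recoveries COMMAND [ARGS...]
# Runs COMMAND with global recoveries disabled and re-enables them afterwards
# only if they were enabled before. Returns COMMAND's exit status.
without_global_recoveries() {
  local before rc=0
  before="$("$ORCH_CLIENT" -c check-global-recoveries)" || return 1
  if [[ "$before" == enabled ]]; then
    "$ORCH_CLIENT" -c disable-global-recoveries >/dev/null || return 1
  fi
  "$@" || rc=$?
  if [[ "$before" == enabled ]]; then
    "$ORCH_CLIENT" -c enable-global-recoveries >/dev/null
  fi
  return "$rc"
}
```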

Analyze problems in the replication topology: replication-analysis


# orchestrator-client -c replication-analysis

test1:3307 (cluster test1:3307): ErrantGTIDStructureWarning
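For alerting it helps to split each analysis line into its parts. The bash sketch below (hypothetical function parse_replication_analysis) assumes the line format observed above, "instance (cluster name): analysis", and skips anything that does not match; it would be used as orchestrator-client -c replication-analysis | parse_replication_analysis.

```shell
# Hypothetical helper: turn replication-analysis lines into
# tab-separated "instance<TAB>cluster<TAB>analysis" records.
parse_replication_analysis() {
  local line re='^([^ ]+) \(cluster ([^)]+)\): (.+)$'
  while IFS= read -r line; do
    if [[ "$line" =~ $re ]]; then
      printf '%s\t%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    fi
  done
}
```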

Raft checks: show the leader, run a health check, and transfer leadership:


# Show the leader node
orchestrator-client -c raft-leader
192.168.163.131:10008

# Health check
orchestrator-client -c raft-health
healthy

# Hostname of the leader
orchestrator-client -c raft-leader-hostname
test1

# Elect a specific host as leader
orchestrator-client -c raft-elect-leader -hostname test3
test3
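One use of raft-leader-hostname is to let a job that is deployed on every orchestrator node run only on the current leader. A minimal bash sketch; on_raft_leader, ORCH_CLIENT and the LOCAL_HOSTNAME override are hypothetical, and it assumes hostname -s matches the names orchestrator reports (test1, test2, test3 in this setup).

```shell
# Hypothetical helper, not part of orchestrator-client.
ORCH_CLIENT="${ORCH_CLIENT:-orchestrator-client}"

# on_raft_leader: exit status 0 only when this host is the raft leader.
# LOCAL_HOSTNAME (hypothetical) overrides the local name, e.g. for tests.
on_raft_leader() {
  local leader
  leader="$("$ORCH_CLIENT" -c raft-leader-hostname 2>/dev/null)" || return 1
  [[ -n "$leader" && "$leader" == "${LOCAL_HOSTNAME:-$(hostname -s)}" ]]
}

# Example (in cron on every node):
#   on_raft_leader && orchestrator-client -c replication-analysis
```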

Pseudo-GTID related commands:


match      #Move a replica below another (target) instance, using Pseudo-GTID

match-up #Transport the replica one level up the hierarchy, making it child of its grandparent, using Pseudo-GTID

match-up-replicas  #Matches replicas of the given instance one level up the topology, making them siblings of given instance, using Pseudo-GTID

last-pseudo-gtid #Dump last injected Pseudo-GTID entry on a server

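This concludes the overview of Orchestrator usage and its command line. The Web API is described in the Orchestrator API documentation; the command line and the API together make it much easier to build automation.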

Summary:

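Orchestrator is an open-source MySQL replication topology management tool written in Go. It supports adjusting MySQL replication topologies, automatic failover when the master fails, manual master/replica switchover and more. It provides a web interface that shows the topology and status of MySQL clusters and can change part of the configuration of MySQL instances, and it also offers a command line and an API. Compared with MHA, Orchestrator itself can be deployed as multiple nodes and uses the Raft distributed consensus protocol to keep itself highly available.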
