MySQL Master High Availability (MHA) in Practice

Test environment

All tests below were run against the following replication topology:

host_1(host_1:3306)(current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

I. Installing MHA

MHA Node must be installed on every server node.

MHA Manager only needs to be installed on the manager node.

  • MHA NODE
1. yum install perl-DBD-MySQL
2. rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
  • MHA Manager
1. yum install perl-DBD-MySQL
2. yum install perl-Config-Tiny
3. yum install perl-Log-Dispatch
4. yum install perl-Parallel-ForkManager
5. rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
  • Files installed by the MHA RPMs
* Node 
Node> rpm -qpl mha4mysql-node-0.56-0.el6.noarch.rpm

/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm

* Manager
Manager> rpm -qpl mha4mysql-manager-0.56-0.el6.noarch.rpm

/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
/usr/share/man/man1/masterha_check_repl.1.gz
/usr/share/man/man1/masterha_check_ssh.1.gz
/usr/share/man/man1/masterha_check_status.1.gz
/usr/share/man/man1/masterha_conf_host.1.gz
/usr/share/man/man1/masterha_manager.1.gz
/usr/share/man/man1/masterha_master_monitor.1.gz
/usr/share/man/man1/masterha_master_switch.1.gz
/usr/share/man/man1/masterha_secondary_check.1.gz
/usr/share/man/man1/masterha_stop.1.gz
/usr/share/perl5/vendor_perl/MHA/Config.pm
/usr/share/perl5/vendor_perl/MHA/DBHelper.pm
/usr/share/perl5/vendor_perl/MHA/FileStatus.pm
/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
/usr/share/perl5/vendor_perl/MHA/Server.pm
/usr/share/perl5/vendor_perl/MHA/ServerManager.pm

II. MHA configuration files

  • global scope
* /etc/masterha_default.cnf

# Everything under [server default] in this file is global and applies to every application config (app*.cnf)
[server default]
user=dba
password=dba
ssh_user=root
master_binlog_dir=/data/mysql.bin
secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
master_ip_failover_script=/script/masterha/master_ip_failover
shutdown_script=/script/masterha/power_manager
report_script=/script/masterha/send_master_failover_mail
  • application scope
* /etc/app1.cnf

[server default]
remote_workdir=/var/log/masterha/app1
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log

Settings under [server default] in this file apply to all N servers defined below it.
  • local scope
* /etc/app1.cnf

[server1]
hostname=192.168.1.10
candidate_master=1

[server2]
hostname=192.168.1.11
candidate_master=1

[server3]
hostname=192.168.1.12
no_master=1

Settings under each [serverN] block apply to that server only.

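III. Prerequisites (all mandatory)

3.1 SSH public key authentication

In practice, the MHA Manager, every MHA Node, and the hosts used for the secondary check all need passwordless SSH trust with one another.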

# masterha_check_ssh --conf=/etc/app1.cnf

Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
Sat May 14 14:42:19 2011 - [debug] Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
Sat May 14 14:42:20 2011 - [debug] ok.
Sat May 14 14:42:20 2011 - [debug] Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
Sat May 14 14:42:20 2011 - [debug] ok.
Sat May 14 14:42:21 2011 - [debug] Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
Sat May 14 14:42:21 2011 - [debug] ok.
Sat May 14 14:42:21 2011 - [debug] Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
Sat May 14 14:42:21 2011 - [debug] ok.
Sat May 14 14:42:22 2011 - [debug] Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
Sat May 14 14:42:22 2011 - [debug] ok.
Sat May 14 14:42:22 2011 - [debug] Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
Sat May 14 14:42:22 2011 - [debug] ok.
Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.
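One way to set up the mutual trust that masterha_check_ssh verifies is to push each host's public key to all the others. The function below is only a sketch under assumptions: the name distribute_keys and the override variables SSH_USER, SSH_KEYGEN and SSH_COPY_ID are hypothetical, the key path is the OpenSSH default, and root is used because ssh_user=root appears in the configuration above.

```shell
#!/bin/sh
# Sketch: push this host's public key to every host given as an argument.
# SSH_USER, SSH_KEYGEN and SSH_COPY_ID are hypothetical override variables;
# setting SSH_COPY_ID=echo turns the loop into a dry run.
distribute_keys() {
    key="$HOME/.ssh/id_rsa"
    # Create a passphrase-less key pair only if none exists yet.
    [ -f "$key" ] || ${SSH_KEYGEN:-ssh-keygen} -t rsa -N "" -f "$key"
    for host in "$@"; do
        # ssh-copy-id appends the public key to the remote authorized_keys.
        ${SSH_COPY_ID:-ssh-copy-id} "${SSH_USER:-root}@$host"
    done
}
```

Running `distribute_keys host_1 host_2 host_3` on every node (manager included) and then re-running masterha_check_ssh should show whether any pair is still missing.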

If there are many slaves or many instances, consider raising MaxStartups in /etc/ssh/sshd_config (the default is 10).

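3.2 Operating system

MHA has only been tested on Linux.

3.3 A single writable master with multiple slaves, or read-only masters

MHA was designed from the start to solve data consistency between slaves, so it is most useful with multiple slaves.

With only one slave you never run into inconsistency between slaves, so MHA is not needed.

With a single slave, semi-synchronous replication can also solve the problem.

MHA has supported multi-master replication topologies since 0.52. Points to note in a multi-master setup:

  • Multiple masters are allowed, but only one of them may accept writes
  • By default, only two-tier replication is supported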

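3.4 Three-tier or deeper replication

By default, MHA does not support replication with three or more tiers (Master1 -> Master2 -> Slave3).

MHA can recover Master2 but not Slave3, because Master2 and Slave3 have different masters.

To make MHA work with such a topology, use one of the following configurations:

  • List only the top two tiers (master1 and master2) in the configuration file
  • Set "multi_tier_slave=1" and list all hosts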
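As a sketch of the second option, the fragment below sets multi_tier_slave and lists every host of the three-tier chain. The hostnames are illustrative assumptions taken from the Master1 -> Master2 -> Slave3 example, not values from the tests in this article.

```ini
[server default]
# Do not abort when a three-tier topology is detected
multi_tier_slave=1

[server1]
hostname=master1

[server2]
hostname=master2

[server3]
# third-tier slave, replicating from master2
hostname=slave3
```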

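3.5 MySQL 5.0 or later

  • MHA requires MySQL 5.0 or later
  • Use as recent a MySQL version as possible

3.6 Use mysqlbinlog 5.1+ with MySQL 5.1+

  • MHA uses mysqlbinlog to apply binary log events to the target slaves
  • If the master uses row-based binary logging, MySQL must be 5.1 or later, because 5.0 does not support the row format
  • Check the mysqlbinlog version like this:
[app@slave_host1]$ mysqlbinlog --version
mysqlbinlog Ver 3.3 for unknown-linux-gnu at x86_64
  • With MySQL 5.1, mysqlbinlog must be version 3.3 or later
  • If mysqlbinlog is 3.2 while MySQL is 5.1, MHA Manager reports an error and stops monitoring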
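The version rule above can be checked before starting the manager. This is a minimal sketch, assuming GNU sort (for the -V option) and the "Ver X.Y" output format shown above; the function names are hypothetical helpers, not part of MHA.

```shell
#!/bin/sh
# Extract the version number from "mysqlbinlog --version" output read on stdin.
parse_mysqlbinlog_version() {
    sed -n 's/.*Ver \([0-9][0-9.]*\).*/\1/p'
}

# Succeed (exit status 0) when version $1 is greater than or equal to $2.
# Relies on GNU sort -V to order version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a slave, something like `v=$(mysqlbinlog --version | parse_mysqlbinlog_version); version_ge "$v" 3.3 || echo "mysqlbinlog $v is too old"` would flag the 3.2 case before MHA does.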

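3.7 log-bin must be enabled on candidate masters

  • A slave without log-bin obviously cannot be promoted to the new master
  • If no server has log-bin enabled, MHA reports an error and aborts the failover

3.8 Binlog and relay-log settings must be identical on the master and all slaves

  • Replication filtering rules (binlog-do-db, replicate-ignore-db, and so on) must be identical on every server

3.9 The replication user must exist on candidate masters

  • After the switch, every slave runs CHANGE MASTER against the new master, so the replication user (with the REPLICATION SLAVE privilege) must exist on the new master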

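3.10 Purge relay logs periodically with purge_relay_logs

By default, relay logs are deleted automatically once the SQL thread has executed them.

Those relay logs may still be needed to recover other slaves, however, so disable automatic relay log purging (relay_log_purge=0) and delete them periodically yourself.

When purging them yourself, take replication lag into account.

Stagger the purge times across slaves: if every slave purged at the same moment and a recovery were needed right then, no slave would still have the relay logs.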
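The staggering advice can be turned into a small generator for cron entries. This is only a sketch: purge_cron_line is a hypothetical helper, the 05:00 base time, the 20-minute offset, and the log file path are assumptions, and the dba/dba credentials are simply the ones from the global configuration shown earlier. The --user, --password and --disable_relay_log_purge options belong to purge_relay_logs itself.

```shell
#!/bin/sh
# Print an /etc/cron.d entry for slave number $1 (0-based).
# Each slave is shifted by 20 minutes so that no two slaves purge together.
purge_cron_line() {
    idx=$1
    minute=$(( (idx * 20) % 60 ))
    hour=$(( (5 + (idx * 20) / 60) % 24 ))
    printf '%d %d * * * root /usr/bin/purge_relay_logs --user=dba --password=dba --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1\n' "$minute" "$hour"
}
```

For example, `purge_cron_line 1 > /etc/cron.d/purge_relay_logs` on the second slave schedules its purge 20 minutes after the first slave's.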

3.11 Do not use LOAD DATA INFILE with statement-based replication (SBR)

Whether you use SBR or RBR, it is best to avoid LOAD DATA.

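IV. Hands-on tests

  1. [Functional test] Scenario 1: manually triggered, manually confirmed failover
  2. [Functional test] Scenario 2: manually triggered, automatically completed failover
  3. [Functional test] Scenario 3: manually triggered, automatically completed online master switch
  4. [Functional test] Scenario 4: automatic monitoring, automatically triggered and completed failover
  5. [Case test] The MySQL service on the master goes down: does automatic failover succeed?
  6. [Case test] The master hits too many connections, permission errors, or slow responses: does automatic failover trigger?
  7. [Case test] The master server goes down and the candidate master lags the most: does automatic failover trigger, and are the missing logs applied successfully?
  8. [Case test] The MySQL service on a slave goes down: does automatic failover trigger?
  9. [Case test] One or more slaves lag far behind: does automatic failover trigger?
  10. [Case test] A slave's IO/SQL thread is stopped: does automatic failover trigger?
  11. [Case test] A slave's IO/SQL thread hits an error: does automatic failover trigger?
  12. [Case test] The master has a large transaction that has been running for more than 100 s: can an online master switch be performed?
  13. [Case test] The master's network goes down: does automatic failover trigger?
  14. [Case test] The master's network drops briefly (1 to 30 seconds): does automatic failover trigger?
  15. [Case test] The networks of both the master and the candidate master go down: does automatic failover trigger?
  16. [Case test] The network between MHA Manager and the master goes down while the network between master and slaves is fine: does automatic failover trigger?
  17. [Case test] The network between MHA Manager and both the master and the slaves goes down: does automatic failover trigger?
  18. [Case test] In GTID mode, are relay logs still needed? Can the missing logs be applied successfully?
  19. [Case test] With multi-threaded replication, do failover and online master switch run into problems?
  20. [Case test] In a replication group where MHA was not running from the start, how do you apply the missing logs and then run CHANGE MASTER?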

4.0 A simple failover walkthrough

  • step1: Install MHA Node on all nodes

  • step2: Install MHA Manager

  • step3: Create the configuration file

manager_host$ cat /etc/app1.cnf

 [server default]
# mysql user and password
user=root
password=mysqlpass
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

 [server1]
hostname=host1

 [server2]
hostname=host2

 [server3]
hostname=host3
  • step4: Check SSH trust
# masterha_check_ssh --conf=/etc/app1.cnf
  • step5: Check the replication configuration
manager_host$ masterha_check_repl --conf=/etc/app1.cnf
  • step6: Start the manager
manager_host$ masterha_manager --conf=/etc/app1.cnf
  • step7: Check the manager status
manager_host$ masterha_check_status --conf=/etc/app1.cnf
  • step8: Test stopping the manager
manager_host$ masterha_stop --conf=/etc/app1.cnf
  • step9: Test master failover
host1$ killall -9 mysqld mysqld_safe
  • Remaining steps (advanced)
* Secondary check: verify that the master is really down, to avoid split brain (secondary_check_script)

* master_ip_failover: empty by default, so nothing happens. A real script typically handles:
 IP takeover
 enabling writes on the new master
 log retention
 alerting
 and so on
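To make the master_ip_failover item more concrete, here is a skeleton of the argument handling such a script needs. It is a sketch, not the script used in the tests below: the VIP value is hypothetical, the actions are only echoed rather than executed, and only the options that appear in the failover logs of this article (--command, --orig_master_host, --new_master_host) are parsed. The --command values stop, stopssh, start and status are the ones MHA passes.

```shell
#!/bin/sh
# Hypothetical virtual IP that follows the writable master.
VIP="192.168.1.100/24"

master_ip_failover() {
    command="" orig_master_host="" new_master_host=""
    for arg in "$@"; do
        case "$arg" in
            --command=*)          command=${arg#--command=} ;;
            --orig_master_host=*) orig_master_host=${arg#--orig_master_host=} ;;
            --new_master_host=*)  new_master_host=${arg#--new_master_host=} ;;
            *) ;;  # other options (ports, ssh_user, ...) are ignored in this sketch
        esac
    done
    case "$command" in
        # Called before recovery: take the VIP away from the dead master.
        stop|stopssh) echo "would remove $VIP from $orig_master_host" ;;
        # Called after the new master is ready: bring the VIP up there.
        start)        echo "would add $VIP to $new_master_host" ;;
        # Called by the manager at startup to validate the script.
        status)       echo "would check $VIP on $orig_master_host" ;;
        *)            echo "unknown command: $command" >&2; return 1 ;;
    esac
}
```

A real script would replace the echo lines with the actual IP takeover, read_only changes, and alerting, and end with `master_ip_failover "$@"` so that MHA can invoke it directly.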

4.2 Scenario 1: manually triggered, manually confirmed failover

  • step1: Check the MHA Manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed for every slave that does not have relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check the MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &

* Stop MHA monitoring
 masterha_stop --conf=/etc/app1.cnf
  • step2: Kill MySQL on the master by hand
killall -9 mysqld mysqld_safe
  • step3: Run an interactive manual failover
masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=1 --ignore_last_failover


* The failover log:

--dead_master_ip= is not set. Using host_1.
--dead_master_port= is not set. Using 3306.
Thu Jul 28 16:37:04 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:37:04 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:37:04 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 16:37:04 2016 - [info] MHA::MasterFailover version 0.56.
Thu Jul 28 16:37:04 2016 - [info] Starting master failover.
Thu Jul 28 16:37:04 2016 - [info]
Thu Jul 28 16:37:04 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 16:37:04 2016 - [info]
Thu Jul 28 16:37:04 2016 - [info] GTID failover mode = 0
Thu Jul 28 16:37:04 2016 - [info] Dead Servers:
Thu Jul 28 16:37:04 2016 - [info] host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 28 16:37:04 2016 - [info] ok.
Thu Jul 28 16:37:04 2016 - [info] Alive Servers:
Thu Jul 28 16:37:04 2016 - [info] host_2(host_2:3306)
Thu Jul 28 16:37:04 2016 - [info] host_3(host_3:3306)
Thu Jul 28 16:37:04 2016 - [info] Alive Slaves:
Thu Jul 28 16:37:04 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:04 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:04 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:04 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Not candidate for the new Master (no_master is set)
Master host_1(host_1:3306) is dead. Proceed? (yes/NO): yes
Thu Jul 28 16:37:06 2016 - [info] Starting Non-GTID based failover.
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 16:37:07 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 16:37:07 2016 - [info] Executing master IP deactivation script:
Thu Jul 28 16:37:07 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Jul 28 16:37:07 2016 - [info] done.
Thu Jul 28 16:37:07 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 16:37:07 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 16:37:07 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 16:37:07 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 16:37:07 2016 - [info] Oldest slaves:
Thu Jul 28 16:37:07 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] Fetching dead master's binary logs..
Thu Jul 28 16:37:07 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000004 --start_pos=154 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
 Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from host_1_name.000004 pos 154 to host_1_name.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog ..
Binlog Checksum enabled
 Dumping binlog format description event, from position 0 to 154.. ok.
No need to dump effective binlog data from /data/mysql.bin/host_1_name.000004 (pos starts 154, filesize 154). Skipping.
Binlog Checksum enabled
 /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog has no effective data events.
Event not exists.
Thu Jul 28 16:37:07 2016 - [info] Additional events were not found from the orig master. No need to save.
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 16:37:07 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 16:37:07 2016 - [info] Searching new master from slaves..
Thu Jul 28 16:37:07 2016 - [info] Candidate masters from the configuration file:
Thu Jul 28 16:37:07 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info] Non-candidate masters:
Thu Jul 28 16:37:07 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 16:37:07 2016 - [info] New master is host_2(host_2:3306)
Thu Jul 28 16:37:07 2016 - [info] Starting master failover..
Thu Jul 28 16:37:07 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)

Starting master switch from host_1(host_1:3306) to host_2(host_2:3306)? (yes/NO): yes
Thu Jul 28 16:37:09 2016 - [info] New master decided manually is host_2(host_2:3306)
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 16:37:09 2016 - [info] Starting recovery on host_2(host_2:3306)..
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 16:37:09 2016 - [info] done.
Thu Jul 28 16:37:09 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 16:37:09 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 16:37:09 2016 - [info] host_2_name.000002:294
Thu Jul 28 16:37:09 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000002', MASTER_LOG_POS=294, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 16:37:09 2016 - [info] Executing master IP activate script:
Thu Jul 28 16:37:09 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Thu Jul 28 16:37:09 2016 - [info] OK.
Thu Jul 28 16:37:09 2016 - [info] ** Finished master recovery successfully.
Thu Jul 28 16:37:09 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 6211. Check tmp log /var/log/masterha/app1/host_3_3306_20160728163704.log if it takes time..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 16:37:09 2016 - [info] End of log messages from host_3.
Thu Jul 28 16:37:09 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Thu Jul 28 16:37:09 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 6216. Check tmp log /var/log/masterha/app1/host_3_3306_20160728163704.log if it takes time..
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] Starting recovery on host_3(host_3:3306)..
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 16:37:09 2016 - [info] done.
Thu Jul 28 16:37:09 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 16:37:09 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Jul 28 16:37:09 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 16:37:10 2016 - [info] Slave started.
Thu Jul 28 16:37:10 2016 - [info] End of log messages from host_3.
Thu Jul 28 16:37:10 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Jul 28 16:37:10 2016 - [info] All new slave servers recovered successfully.
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] Resetting slave info on the new master..
Thu Jul 28 16:37:10 2016 - [info] host_2: Resetting slave info succeeded.
Thu Jul 28 16:37:10 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Jul 28 16:37:10 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name for details.

Started manual(interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.3 [Functional test] Scenario 2: manually triggered, automatically completed failover

  • step1: Check the MHA Manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed for every slave that does not have relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check the MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &

* Stop MHA monitoring
 masterha_stop --conf=/etc/app1.cnf
  • step2: Kill MySQL on the master by hand
killall -9 mysqld mysqld_safe
  • step3: Run a non-interactive manual failover
masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=0 --ignore_last_failover


* The failover log:

--dead_master_ip= is not set. Using host_1.
--dead_master_port= is not set. Using 3306.
Thu Jul 28 16:33:21 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:33:21 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:33:21 2016 - [info] Reading server configuration from /etc/app1.cnf..

4.4 [Functional test] Scenario 3: manually triggered, automatically completed online master switch

  • step1: Check the MHA Manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed for every slave that does not have relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check the MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Kill MySQL on the master by hand
killall -9 mysqld mysqld_safe
  • step3: Run the online master switch non-interactively
masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --orig_master_is_new_slave --interactive=0

* If MHA Manager is running during an online master switch, the switch fails:

Thu Jul 28 16:13:57 2016 - [info] Checking MHA is not monitoring or doing failover..
Thu Jul 28 16:13:57 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Thu Jul 28 16:13:57 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
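A small wrapper can avoid the advisory-lock error above by stopping the manager before switching. This is a sketch under one assumption worth verifying on your version: that masterha_check_status exits with status 0 while the manager is monitoring and non-zero otherwise. The function name online_switch and the CONF variable are hypothetical.

```shell
#!/bin/sh
CONF=/etc/app1.cnf

# Stop MHA monitoring if it is running, then perform the online switch.
online_switch() {
    # Assumed: exit status 0 means the manager is currently monitoring.
    if masterha_check_status --conf="$CONF" >/dev/null 2>&1; then
        masterha_stop --conf="$CONF" || return 1
    fi
    masterha_master_switch --master_state=alive --conf="$CONF" \
        --orig_master_is_new_slave --interactive=0
}
```

After a successful switch, monitoring has to be started again by hand with masterha_manager, since the wrapper deliberately leaves it stopped.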

* If --orig_master_is_new_slave is set during an online master switch but repl_password is not configured, the switch fails:

Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln784] Slave could not be started on host_1(host_1:3306)! Check slave status.
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln862] Starting slave IO/SQL thread on host_1(host_1:3306) failed!
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln573] Failed!
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln602] Switching master to host_2(host_2:3306) done, but switching slaves partially failed.
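The repl_password error above can be avoided by declaring the replication credentials in the configuration. The fragment below is a sketch: repl_user and repl_password are MHA parameters, the values mirror the MASTER_USER='repl' and masked MASTER_PASSWORD='xxx' seen in the CHANGE MASTER statements of the logs, and placing them in the global file is an assumption (the application file would work as well).

```ini
# /etc/masterha_default.cnf (or /etc/app1.cnf)
[server default]
# Credentials used for CHANGE MASTER when the old master rejoins as a slave
repl_user=repl
repl_password=xxx
```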


* Log of a successful switch:

Thu Jul 28 16:23:25 2016 - [info] MHA::MasterRotate version 0.56.
Thu Jul 28 16:23:25 2016 - [info] Starting online master switch..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:23:25 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:23:25 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 16:23:25 2016 - [info] GTID failover mode = 0
Thu Jul 28 16:23:25 2016 - [info] Current Alive Master: host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Alive Slaves:
Thu Jul 28 16:23:25 2016 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:23:25 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:23:25 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Checking MHA is not monitoring or doing failover..
Thu Jul 28 16:23:25 2016 - [info] Checking replication health on host_1..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Checking replication health on host_3..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Searching new master from slaves..
Thu Jul 28 16:23:25 2016 - [info] Candidate masters from the configuration file:
Thu Jul 28 16:23:25 2016 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306) Version=5.7.13-log log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Non-candidate masters:
Thu Jul 28 16:23:25 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:23:25 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 16:23:25 2016 - [info]
From:
host_2(host_2:3306) (current master)
 +--host_1(host_1:3306)
 +--host_3(host_3:3306)

To:
host_1(host_1:3306) (new master)
 +--host_3(host_3:3306)
 +--host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Checking whether host_1(host_1:3306) is ok for the new master..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306): Resetting slave pointing to the dummy host.
Thu Jul 28 16:23:25 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] * Phase 2: Rejecting updates Phase..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Thu Jul 28 16:23:25 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Thu Jul 28 16:23:25 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Orig master binlog:pos is host_2_name.000002:294.
Thu Jul 28 16:23:25 2016 - [info] Waiting to execute all relay logs on host_1(host_1:3306)..
Thu Jul 28 16:23:25 2016 - [info] master_pos_wait(host_2_name.000002:294) completed on host_1(host_1:3306). Executed 0 events.
Thu Jul 28 16:23:25 2016 - [info] done.
Thu Jul 28 16:23:25 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 16:23:25 2016 - [info] host_1_name.000004:154
Thu Jul 28 16:23:25 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1_name.000004', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 16:23:25 2016 - [info] Setting read_only=0 on host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info] ok.
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] * Switching slaves in parallel..
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] -- Slave switch on host host_3(host_3:3306) started, pid: 32120
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] Waiting to execute all relay logs on host_3(host_3:3306)..
Thu Jul 28 16:23:26 2016 - [info] master_pos_wait(host_2_name.000002:294) completed on host_3(host_3:3306). Executed 0 events.
Thu Jul 28 16:23:26 2016 - [info] done.
Thu Jul 28 16:23:26 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 16:23:26 2016 - [info] Slave started.
Thu Jul 28 16:23:26 2016 - [info] End of log messages from host_3 ...
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] -- Slave switch on host host_3(host_3:3306) succeeded.
Thu Jul 28 16:23:26 2016 - [info] Unlocking all tables on the orig master:
Thu Jul 28 16:23:26 2016 - [info] Executing UNLOCK TABLES..
Thu Jul 28 16:23:26 2016 - [info] ok.
Thu Jul 28 16:23:26 2016 - [info] Starting orig master as a new slave..
Thu Jul 28 16:23:26 2016 - [info] Resetting slave host_2(host_2:3306) and starting replication from the new master host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 16:23:26 2016 - [info] Slave started.
Thu Jul 28 16:23:26 2016 - [info] All new slave servers switched successfully.
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] host_1: Resetting slave info succeeded.
Thu Jul 28 16:23:26 2016 - [info] Switching master to host_1(host_1:3306) completed successfully.

4.5 测试场景四、自动监控,自动操作,自动完成,failover

  • step1: 检查各种MHA manager的各种环境
* 检查SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 2814:03:332016- [warning] relay_log_purge=0isnotsetonslave host_2(host_2:3306).
 Thu Jul 2814:03:332016- [warning] relay_log_purge=0isnotsetonslave host_3(host_3:3306).

* 检查mha状态
 masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: 手工制造master mysql挂了
killall-9mysqld mysqld_safe
  • step3: 自动监控,自动操作,自动完成,自动failover
Thu Jul 28 17:10:25 2016 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 28 17:10:26 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:10:26 2016 - [info] Dead Servers:
Thu Jul 28 17:10:26 2016 - [info] Alive Servers:
Thu Jul 28 17:10:26 2016 - [info] host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] host_2(host_2:3306)
Thu Jul 28 17:10:26 2016 - [info] host_3(host_3:3306)
Thu Jul 28 17:10:26 2016 - [info] Alive Slaves:
Thu Jul 28 17:10:26 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:10:26 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:10:26 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:10:26 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:10:26 2016 - [info] Current Alive Master: host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Checking slave configurations..
Thu Jul 28 17:10:26 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Thu Jul 28 17:10:26 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Thu Jul 28 17:10:26 2016 - [info] Checking replication filtering settings..
Thu Jul 28 17:10:26 2016 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 28 17:10:26 2016 - [info] Replication filtering check ok.
Thu Jul 28 17:10:26 2016 - [info] GTID (with auto-pos) is not supported
Thu Jul 28 17:10:26 2016 - [info] Starting SSH connection tests..
Thu Jul 28 17:10:27 2016 - [info] All SSH connection tests passed successfully.
Thu Jul 28 17:10:27 2016 - [info] Checking MHA Node version..
Thu Jul 28 17:10:28 2016 - [info] Version check ok.
Thu Jul 28 17:10:28 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 28 17:10:28 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 17:10:29 2016 - [info] Master MHA Node version is 0.56.
Thu Jul 28 17:10:29 2016 - [info] Checking recovery script configurations on host_1(host_1:3306)..
Thu Jul 28 17:10:29 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=host_1_name.000004
Thu Jul 28 17:10:29 2016 - [info] Connecting to root@host_1(host_1:22)..
 Creating /var/log/masterha/app1 if not exists.. ok.
 Checking output directory is accessible or not..
 ok.
 Binlog found at /data/mysql.bin, up to host_1_name.000004
Thu Jul 28 17:10:29 2016 - [info] Binlog setting check done.
Thu Jul 28 17:10:29 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jul 28 17:10:29 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2_name-relay-bin.000001 --slave_pass=xxx
Thu Jul 28 17:10:29 2016 - [info] Connecting to root@host_2(host_2:22)..
 Checking slave recovery environment settings..
 Relay log found at /data/mysql_data, up to host_2_name-relay-bin.000002
 Temporary relay log file is /data/mysql_data/host_2_name-relay-bin.000002
 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
 Testing mysqlbinlog output.. done.
 Cleaning up test file(s).. done.
Thu Jul 28 17:10:29 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000001 --slave_pass=xxx
Thu Jul 28 17:10:29 2016 - [info] Connecting to root@host_3(host_3:22)..
 Checking slave recovery environment settings..
 Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000002
 Temporary relay log file is /data/mysql_data/host_3_name-relay-bin.000002
 Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
 Testing mysqlbinlog output.. done.
 Cleaning up test file(s).. done.
Thu Jul 28 17:10:29 2016 - [info] Slaves settings check done.
Thu Jul 28 17:10:29 2016 - [info]
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

Thu Jul 28 17:10:29 2016 - [info] Checking master_ip_failover_script status:
Thu Jul 28 17:10:29 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306
Thu Jul 28 17:10:30 2016 - [info] OK.
Thu Jul 28 17:10:30 2016 - [warning] shutdown_script is not defined.
Thu Jul 28 17:10:30 2016 - [info] Set master ping interval 3 seconds.
Thu Jul 28 17:10:30 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Jul 28 17:10:30 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Thu Jul 28 17:10:30 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Jul 28 17:11:21 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Jul 28 17:11:21 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Thu Jul 28 17:11:21 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 17:11:24 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:24 2016 - [warning] Connection failed 2 time(s)..
Thu Jul 28 17:11:27 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:27 2016 - [warning] Connection failed 3 time(s)..
Thu Jul 28 17:11:30 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:30 2016 - [warning] Connection failed 4 time(s)..
Thu Jul 28 17:11:30 2016 - [warning] Master is not reachable from health checker!
Thu Jul 28 17:11:30 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Thu Jul 28 17:11:30 2016 - [warning] SSH is reachable.
Thu Jul 28 17:11:30 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Thu Jul 28 17:11:30 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 17:11:30 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 17:11:30 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 17:11:30 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:11:30 2016 - [info] Dead Servers:
Thu Jul 28 17:11:30 2016 - [info] host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Servers:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Slaves:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] Checking slave configurations..
Thu Jul 28 17:11:30 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Thu Jul 28 17:11:30 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Thu Jul 28 17:11:30 2016 - [info] Checking replication filtering settings..
Thu Jul 28 17:11:30 2016 - [info] Replication filtering check ok.
Thu Jul 28 17:11:30 2016 - [info] Master is down!
Thu Jul 28 17:11:30 2016 - [info] Terminating monitoring script.
Thu Jul 28 17:11:30 2016 - [info] Got exit code 20 (Master dead).
Thu Jul 28 17:11:30 2016 - [info] MHA::MasterFailover version 0.56.
Thu Jul 28 17:11:30 2016 - [info] Starting master failover.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:11:30 2016 - [info] Dead Servers:
Thu Jul 28 17:11:30 2016 - [info] host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 28 17:11:30 2016 - [info] ok.
Thu Jul 28 17:11:30 2016 - [info] Alive Servers:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Slaves:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] Starting Non-GTID based failover.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 17:11:30 2016 - [info] Executing master IP deactivation script:
Thu Jul 28 17:11:30 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Jul 28 17:11:30 2016 - [info] done.
Thu Jul 28 17:11:30 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 17:11:30 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 17:11:30 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 17:11:30 2016 - [info] Oldest slaves:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] Fetching dead master's binary logs..
Thu Jul 28 17:11:30 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000004 --start_pos=154 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
 Creating /var/log/masterha/app1 if not exists.. ok.
 Concat binary/relay logs from host_1_name.000004 pos 154 to host_1_name.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog ..
 Binlog Checksum enabled
 Dumping binlog format description event, from position 0 to 154.. ok.
 No need to dump effective binlog data from /data/mysql.bin/host_1_name.000004 (pos starts 154, filesize 154). Skipping.
 Binlog Checksum enabled
 /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog has no effective data events.
Event not exists.
Thu Jul 28 17:11:31 2016 - [info] Additional events were not found from the orig master. No need to save.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 17:11:31 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 17:11:31 2016 - [info] Searching new master from slaves..
Thu Jul 28 17:11:31 2016 - [info] Candidate masters from the configuration file:
Thu Jul 28 17:11:31 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:31 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:31 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:31 2016 - [info] Non-candidate masters:
Thu Jul 28 17:11:31 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:31 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:31 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:31 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 17:11:31 2016 - [info] New master is host_2(host_2:3306)
Thu Jul 28 17:11:31 2016 - [info] Starting master failover..
Thu Jul 28 17:11:31 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 17:11:31 2016 - [info] Starting recovery on host_2(host_2:3306)..
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 17:11:31 2016 - [info] done.
Thu Jul 28 17:11:31 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 17:11:31 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 17:11:31 2016 - [info] host_2_name.000002:294
Thu Jul 28 17:11:31 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000002', MASTER_LOG_POS=294, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 17:11:31 2016 - [info] Executing master IP activate script:
Thu Jul 28 17:11:31 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Thu Jul 28 17:11:31 2016 - [info] OK.
Thu Jul 28 17:11:31 2016 - [info] ** Finished master recovery successfully.
Thu Jul 28 17:11:31 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 22427. Check tmp log /var/log/masterha/app1/host_3_3306_20160728171130.log if it takes time..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Log messages from host_3 ...
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 17:11:31 2016 - [info] End of log messages from host_3.
Thu Jul 28 17:11:31 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Thu Jul 28 17:11:31 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22429. Check tmp log /var/log/masterha/app1/host_3_3306_20160728171130.log if it takes time..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Log messages from host_3 ...
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Starting recovery on host_3(host_3:3306)..
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 17:11:31 2016 - [info] done.
Thu Jul 28 17:11:31 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 17:11:31 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Jul 28 17:11:31 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 17:11:31 2016 - [info] Slave started.
Thu Jul 28 17:11:31 2016 - [info] End of log messages from host_3.
Thu Jul 28 17:11:31 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Jul 28 17:11:31 2016 - [info] All new slave servers recovered successfully.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Resetting slave info on the new master..
Thu Jul 28 17:11:31 2016 - [info] host_2: Resetting slave info succeeded.
Thu Jul 28 17:11:31 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Jul 28 17:11:31 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
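
To read how long an automatic failover took without scanning the log by eye, the timestamps can be compared programmatically. The following is a minimal sketch, assuming manager log lines in the `Thu Jul 28 16:23:26 2016 - [info] message` layout seen in these excerpts; `parse_line` and `failover_seconds` are hypothetical helper names, not part of MHA.

```python
import re
from datetime import datetime

# Layout assumed from the log excerpts: "Thu Jul 28 16:23:26 2016 - [info] message".
LOG_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) - \[(?P<level>\w+)\] ?(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None for lines without a timestamp prefix."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # Collapse repeated spaces so single-digit days parse the same way.
    stamp = " ".join(match.group("ts").split())
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return when, match.group("level"), match.group("msg")


def failover_seconds(lines):
    """Seconds from the first failed MySQL ping to the 'completed successfully' line."""
    first_error = finished = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, level, msg = parsed
        if first_error is None and level == "warning" and msg.startswith("Got error on MySQL"):
            first_error = when
        if msg.startswith("Master failover to") and msg.endswith("completed successfully."):
            finished = when
    if first_error is None or finished is None:
        return None
    return (finished - first_error).total_seconds()


# Lines taken from the scenario above; the report separator is skipped by the parser.
sample = [
    "Thu Jul 28 17:11:21 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)",
    "Thu Jul 28 17:11:30 2016 - [warning] Connection failed 4 time(s)..",
    "----- Failover Report -----",
    "Thu Jul 28 17:11:31 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.",
]
print(failover_seconds(sample))
```

For this scenario the gap between the first failed ping (17:11:21) and the completion message (17:11:31) is about ten seconds, most of which is spent on the repeated connection attempts at the 3-second ping interval.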

4.6 [Test case] MySQL master: too many connections, no privileges, slow response

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
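  • step2: Manually trigger "Too many connections" on the master
mysqlslap --concurrency=4000 --iterations=10 --query='insert into tb(id_2) select (1);' --debug-info -hhost_1 -uxx -pxx --create-schema=lc

mysqlslap: Error when connecting to server: 1040 Too many connections
mysqlslap: Error when connecting to server: 1040 Too many connections
mysqlslap: Error when connecting to server: 1040 Too many connections
  • step3: Check whether a failover occurred
* 1. Opening another session by hand fails with "Too many connections", yet no failover occurred: MHA still considers the master alive
* 2. Inspecting MHA's health check mechanism shows that it uses a persistent (long-lived) connection, so its ping statement keeps succeeding, as the statement digest on the master shows:

DIGEST_TEXT: SELECT ? AS VALUE
COUNT_STAR: 46
FIRST_SEEN: 2016-08-02 11:28:00
LAST_SEEN: 2016-08-02 11:30:37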
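
Why a persistent connection hides the "Too many connections" condition can be shown with a toy model. This is only an illustrative sketch: `ToyServer`, `ToyConnection`, and both probe functions are hypothetical names, and the model ignores everything about MySQL except the connection limit.

```python
class TooManyConnections(Exception):
    """Stands in for MySQL error 1040 in this toy model."""


class ToyConnection:
    def ping(self):
        # An established session needs no new connection slot.
        return True


class ToyServer:
    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self):
        if self.open_connections >= self.max_connections:
            raise TooManyConnections("1040 Too many connections")
        self.open_connections += 1
        return ToyConnection()


def probe_persistent(conn):
    """Health check that reuses one long-lived connection."""
    return conn.ping()


def probe_reconnect(server):
    """Health check that opens a fresh connection for every probe."""
    try:
        server.connect()
    except TooManyConnections:
        return False
    return True


server = ToyServer(max_connections=3)
monitor = server.connect()                      # the monitor connects before the load starts
clients = [server.connect(), server.connect()]  # the load test fills the remaining slots

persistent_ok = probe_persistent(monitor)
reconnect_ok = probe_reconnect(server)
print(persistent_ok, reconnect_ok)
```

A monitor that holds its session open keeps seeing a healthy master, while any client that needs a new connection is rejected. That matches what this test case observed: new sessions failed with error 1040, but no failover was triggered.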

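4.7 [Test case] The MySQL master server goes down while the candidate master lags the furthest behind: does failover happen automatically, and can the missing logs be applied successfully?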

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually make the candidate master fall behind
candidate master> stop slave; sleep 600s; start slave;
  • step3: Manually take the master server down
master> reboot
  • step4: Check the MHA log
Tue Aug 2 14:33:34 2016 - [warning] Slave host_2(host_2:3306) SQL Thread delays too much. Latest log file:host_1_name.000002:576864, Current log file:host_1_name.000001:39731438. This server is not selected as a new master because recovery will take long time.
Tue Aug 2 14:33:34 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln970] None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too far behind.
Tue Aug 2 14:33:34 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
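
The "SQL Thread delays too much" warning comes from MHA's replication delay check. The sketch below illustrates the idea under stated assumptions: MHA's documentation describes skipping a slave that is more than about 100MB of relay logs behind unless check_repl_delay=0 is set, but the file-name comparison and exact threshold used here are assumptions for illustration, not a copy of MHA's code. `BinlogPos` and `delays_too_much` are hypothetical names.

```python
from typing import NamedTuple


class BinlogPos(NamedTuple):
    file: str  # master binlog file name, e.g. "host_1_name.000002"
    pos: int   # byte offset inside that file


# Assumed threshold, based on the ~100MB figure in MHA's documentation.
DELAY_LIMIT_BYTES = 100_000_000


def delays_too_much(latest, current, check_repl_delay=True):
    """Return True when the slave's applied position lags too far behind the latest one."""
    if not check_repl_delay:
        return False  # check_repl_delay=0 disables the check entirely
    if latest.file > current.file:
        return True   # still applying events from an older binlog file
    return latest.pos > current.pos + DELAY_LIMIT_BYTES


# Positions taken from the warning in the log above.
latest = BinlogPos("host_1_name.000002", 576864)
current = BinlogPos("host_1_name.000001", 39731438)

print(delays_too_much(latest, current))
print(delays_too_much(latest, current, check_repl_delay=False))
```

With the check enabled, host_2 is rejected because it is still applying an older binlog file; with the check disabled it stays eligible, and MHA then has to fill the gap from the latest slave's relay logs.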
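  • step5: Fix the configuration file

A specific slave is designated as the candidate master, but that slave is far behind the master, so MHA refuses to select it.

check_repl_delay=0 must therefore be added to every server section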

vi /etc/app1.cnf

[server default]
remote_workdir=/var/log/masterha/app1
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log

[server1]
hostname=host_1
candidate_master=1
check_repl_delay=0

[server2]
hostname=host_2
candidate_master=1
check_repl_delay=0

[server3]
hostname=host_3
no_master=1
check_repl_delay=0
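
Since the setting has to appear in every server section, a quick check of the file can catch a section that was missed. This is a minimal sketch that treats the MHA configuration as plain ini text readable by Python's `configparser`; `servers_missing_delay_override` is a hypothetical helper, and the sample text is a shortened variant of the file above with the setting deliberately left out of one section.

```python
import configparser


def servers_missing_delay_override(cfg_text):
    """List the server sections that do not set check_repl_delay=0."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    missing = []
    for section in parser.sections():
        if section == "server default":
            continue  # global defaults, not an individual server
        if parser.get(section, "check_repl_delay", fallback="1") != "0":
            missing.append(section)
    return missing


sample_cfg = """
[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=host_1
candidate_master=1
check_repl_delay=0

[server2]
hostname=host_2
candidate_master=1

[server3]
hostname=host_3
no_master=1
check_repl_delay=0
"""

print(servers_missing_delay_override(sample_cfg))
```

An empty list means every server section carries the override; here `server2` is reported because the sample omits it.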
  • step6: Simulate the failure again with a manual failover
masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=1 --ignore_last_failover --new_master_host=host_2
  • step7: Check the MHA log
Tue Aug 2 14:38:55 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue Aug 2 14:38:55 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Aug 2 14:38:55 2016 - [info] Reading server configuration from /etc/app1.cnf..
Tue Aug 2 14:38:55 2016 - [info] MHA::MasterFailover version 0.56.
Tue Aug 2 14:38:55 2016 - [info] Starting master failover.
Tue Aug 2 14:38:55 2016 - [info]
Tue Aug 2 14:38:55 2016 - [info] * Phase 1: Configuration Check Phase..
Tue Aug 2 14:38:55 2016 - [info]
Tue Aug 2 14:38:55 2016 - [info] GTID failover mode = 0
Tue Aug 2 14:38:55 2016 - [info] Dead Servers:
Tue Aug 2 14:38:55 2016 - [info] host_1(host_1:3306)
Tue Aug 2 14:38:55 2016 - [info] Checking master reachability via MySQL(double check)...
Tue Aug 2 14:38:55 2016 - [info] ok.
Tue Aug 2 14:38:55 2016 - [info] Alive Servers:
Tue Aug 2 14:38:55 2016 - [info] host_2(host_2:3306)
Tue Aug 2 14:38:55 2016 - [info] host_3(host_3:3306)
Tue Aug 2 14:38:55 2016 - [info] Alive Slaves:
Tue Aug 2 14:38:55 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:55 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:55 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 2 14:38:55 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:55 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:55 2016 - [info] Not candidate for the new Master (no_master is set)
Master host_1(host_1:3306) is dead. Proceed? (yes/NO): yes
Tue Aug 2 14:38:57 2016 - [info] Starting Non-GTID based failover.
Tue Aug 2 14:38:57 2016 - [info]
Tue Aug 2 14:38:57 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Aug 2 14:38:57 2016 - [info]
Tue Aug 2 14:38:57 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Aug 2 14:38:57 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Tue Aug 2 14:38:58 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Aug 2 14:38:58 2016 - [info] Executing master IP deactivation script:
Tue Aug 2 14:38:58 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Tue Aug 2 14:38:58 2016 - [info] done.
Tue Aug 2 14:38:58 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Aug 2 14:38:58 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] * Phase 3: Master Recovery Phase..
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000002:576864
Tue Aug 2 14:38:58 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Aug 2 14:38:58 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:58 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:58 2016 - [info] Not candidate for the new Master (no_master is set)
Tue Aug 2 14:38:58 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:39731619
Tue Aug 2 14:38:58 2016 - [info] Oldest slaves:
Tue Aug 2 14:38:58 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:58 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:58 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] Fetching dead master's binary logs..
Tue Aug 2 14:38:58 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000002 --start_pos=576864 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
 Creating /var/log/masterha/app1 if not exists.. ok.
 Concat binary/relay logs from host_1_name.000002 pos 576864 to host_1_name.000002 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog ..
 Binlog Checksum enabled
 Dumping binlog format description event, from position 0 to 154.. ok.
 Dumping effective binlog data from /data/mysql.bin/host_1_name.000002 position 576864 to tail(577435).. ok.
 Binlog Checksum enabled
 Concat succeeded.
saved_master_binlog_from_host_1_3306_20160802143855.binlog 100% 725 0.7KB/s 00:00
Tue Aug 2 14:38:59 2016 - [info] scp from root@host_1:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug 2 14:38:59 2016 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Aug 2 14:39:00 2016 - [info] HealthCheck: SSH to host_3 is reachable.
Tue Aug 2 14:39:00 2016 - [info]
Tue Aug 2 14:39:00 2016 - [info] * Phase 3.3: Determining New Master Phase..
Tue Aug 2 14:39:00 2016 - [info]
Tue Aug 2 14:39:00 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Aug 2 14:39:00 2016 - [info] Checking whether host_3 has relay logs from the oldest position..
Tue Aug 2 14:39:00 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1_name.000002 --latest_rmlp=576864 --target_mlf=host_1_name.000001 --target_rmlp=39731619 --server_id=12616606 --workdir=/var/log/masterha/app1 --timestamp=20160802143855 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000004 :
 Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000004
 Fast relay log position search failed. Reading relay logs to find..
Reading host_3_name-relay-bin.000004
 Binlog Checksum enabled
 Master Version is 5.7.13-log
 Binlog Checksum enabled
 host_3_name-relay-bin.000004 contains master host_1_name.000002 from position 4
Reading host_3_name-relay-bin.000003
 Binlog Checksum enabled
 host_3_name-relay-bin.000003 contains master host_1_name.000001 from position 218246073
Reading host_3_name-relay-bin.000002
 Binlog Checksum enabled
 host_3_name-relay-bin.000002 contains master host_1_name.000001 from position 154
Target relay log FOUND!
Tue Aug 2 14:39:00 2016 - [info] OK. host_3 has all relay logs.
Tue Aug 2 14:39:00 2016 - [info] Searching new master from slaves..
Tue Aug 2 14:39:00 2016 - [info] Candidate masters from the configuration file:
Tue Aug 2 14:39:00 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:39:00 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:39:00 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 2 14:39:00 2016 - [info] Non-candidate masters:
Tue Aug 2 14:39:00 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:39:00 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:39:00 2016 - [info] Not candidate for the new Master (no_master is set)
Tue Aug 2 14:39:00 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Aug 2 14:39:00 2016 - [info] Not found.
Tue Aug 2 14:39:00 2016 - [info] Searching from all candidate_master slaves..
Tue Aug 2 14:39:00 2016 - [info] New master is host_2(host_2:3306)
Tue Aug 2 14:39:00 2016 - [info] Starting master failover..
Tue Aug 2 14:39:00 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)

Starting master switch from host_1(host_1:3306) to host_2(host_2:3306)? (yes/NO): yes
Tue Aug 2 14:39:05 2016 - [info] New master decided manually is host_2(host_2:3306)
Tue Aug 2 14:39:05 2016 - [info]
Tue Aug 2 14:39:05 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Aug 2 14:39:05 2016 - [info]
Tue Aug 2 14:39:05 2016 - [info] Server host_2 received relay logs up to: host_1_name.000001:39731619
Tue Aug 2 14:39:05 2016 - [info] Need to get diffs from the latest slave(host_3) up to: host_1_name.000002:576864 (using the latest slave's relay logs)
Tue Aug 2 14:39:05 2016 - [info] Connecting to the latest slave host host_3, generating diff relay log files..
Tue Aug 2 14:39:05 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_2 --latest_mlf=host_1_name.000002 --latest_rmlp=576864 --target_mlf=host_1_name.000001 --target_rmlp=39731619 --server_id=12616606 --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000004
Tue Aug 2 14:39:11 2016 - [info]
 Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000004
 Fast relay log position search failed. Reading relay logs to find..
Reading host_3_name-relay-bin.000004
 Binlog Checksum enabled
 Master Version is 5.7.13-log
 Binlog Checksum enabled
 host_3_name-relay-bin.000004 contains master host_1_name.000002 from position 4
Reading host_3_name-relay-bin.000003
 Binlog Checksum enabled
 host_3_name-relay-bin.000003 contains master host_1_name.000001 from position 218246073
Reading host_3_name-relay-bin.000002
 Binlog Checksum enabled
 host_3_name-relay-bin.000002 contains master host_1_name.000001 from position 154
 Target relay log file/position found. start_file:host_3_name-relay-bin.000002, start_pos:39731788.
 Concat binary/relay logs from host_3_name-relay-bin.000002 pos 39731788 to host_3_name-relay-bin.000004 EOF into /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
 Dumping binlog format description event, from position 0 to 323.. ok.
 Dumping effective binlog data from /data/mysql_data/host_3_name-relay-bin.000002 position 39731788 to tail(218246252).. ok.
 Dumping binlog head events (rotate events), skipping format description events from /data/mysql_data/host_3_name-relay-bin.000003.. Binlog Checksum enabled
dumped up to pos 264. ok.
 No need to dump effective binlog data from /data/mysql_data/host_3_name-relay-bin.000003 (pos starts 264, filesize 264). Skipping.
 Dumping binlog head events (rotate events), skipping format description events from /data/mysql_data/host_3_name-relay-bin.000004.. Binlog Checksum enabled
 Binlog Checksum enabled
dumped up to pos 373. ok.
 Dumping effective binlog data from /data/mysql_data/host_3_name-relay-bin.000004 position 373 to tail(577083).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog .
 scp host_3_name.58os.org:/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog to root@host_2(22) succeeded.
Tue Aug 2 14:39:11 2016 - [info] Generating diff files succeeded.
Tue Aug 2 14:39:11 2016 - [info] Sending binlog..
saved_master_binlog_from_host_1_3306_20160802143855.binlog 100% 725 0.7KB/s 00:00
Tue Aug 2 14:39:12 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to root@host_2:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug 2 14:39:12 2016 - [info]
Tue Aug 2 14:39:12 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Aug 2 14:39:12 2016 - [info]
Tue Aug 2 14:39:12 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Aug 2 14:39:12 2016 - [info] Starting recovery on host_2(host_2:3306)..
Tue Aug 2 14:39:12 2016 - [info] Generating diffs succeeded.
Tue Aug 2 14:39:12 2016 - [info] Waiting until all relay logs are applied.
Tue Aug 2 14:39:12 2016 - [info] done.
Tue Aug 2 14:39:12 2016 - [info] Getting slave status..
Tue Aug 2 14:39:12 2016 - [info] This slave(host_2)'s Exec_Master_Log_Pos(host_1_name.000001:39731438) does not equal to Read_Master_Log_Pos(host_1_name.000001:39731619). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Tue Aug 2 14:39:12 2016 - [info] Saving local relay logs from exec pos to read pos on host_2: from host_2_name-relay-bin.000003:39726380 to the end of the relay log..
Tue Aug 2 14:39:12 2016 - [info] Executing command : save_binary_logs --command=save --start_file=host_2_name-relay-bin.000003 --start_pos=39726380 --output_file=/var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --binlog_dir=/data/mysql_data
Tue Aug 2 14:39:12 2016 - [info]
 Creating /var/log/masterha/app1 if not exists.. ok.
 Concat binary/relay logs from host_2_name-relay-bin.000003 pos 39726380 to host_2_name-relay-bin.000003 EOF into /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
 Dumping binlog format description event, from position 0 to 323.. ok.
 Dumping effective binlog data from /data/mysql_data/host_2_name-relay-bin.000003 position 39726380 to tail(39726561).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
Tue Aug 2 14:39:12 2016 - [info] Connecting to the target slave host host_2, running recover script..
Tue Aug 2 14:39:12 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --apply_files=/var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Aug 2 14:42:11 2016 - [info]
 Concat all apply files to /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog ..
 Copying the first binlog file /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog to /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog.. ok.
 Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog.. Binlog Checksum enabled
 Binlog Checksum enabled
dumped up to pos 323. ok.
 /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog has effective binlog events from pos 323.
 Dumping effective binlog data from /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog position 323 to tail(179091707).. ok.
 Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog.. Binlog Checksum enabled
dumped up to pos 154. ok.
 /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog has effective binlog events from pos 154.
 Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog position 154 to tail(725).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog .
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog on host_2:3306. This may take long time...
Applying log files succeeded.
Tue Aug 2 14:42:11 2016 - [info] All relay logs were successfully applied.
Tue Aug 2 14:42:11 2016 - [info] Getting new master's binlog name and position..
Tue Aug 2 14:42:11 2016 - [info] host_2_name.000001:217647730
Tue Aug 2 14:42:11 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000001', MASTER_LOG_POS=217647730, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Aug 2 14:42:11 2016 - [info] Executing master IP activate script:
Tue Aug 2 14:42:11 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Tue Aug 2 14:42:11 2016 - [info] OK.
Tue Aug 2 14:42:11 2016 - [info] ** Finished master recovery successfully.
Tue Aug 2 14:42:11 2016 - [info] * Phase 3: Master Recovery Phase completed.
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] * Phase 4: Slaves Recovery Phase..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 22541. Check tmp log /var/log/masterha/app1/host_3_3306_20160802143855.log if it takes time..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] Log messages from host_3 ...
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Aug 2 14:42:11 2016 - [info] End of log messages from host_3.
Tue Aug 2 14:42:11 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Tue Aug 2 14:42:11 2016 - [info] Generating relay diff files from the latest slave succeeded.
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22543. Check tmp log /var/log/masterha/app1/host_3_3306_20160802143855.log if it takes time..
saved_master_binlog_from_host_1_3306_20160802143855.binlog 100% 725 0.7KB/s 00:00
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:12 2016 - [info] Log messages from host_3 ...
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] Sending binlog..
Tue Aug 2 14:42:12 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to root@host_3:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug 2 14:42:12 2016 - [info] Starting recovery on host_3(host_3:3306)..
Tue Aug 2 14:42:12 2016 - [info] Generating diffs succeeded.
Tue Aug 2 14:42:12 2016 - [info] Waiting until all relay logs are applied.
Tue Aug 2 14:42:12 2016 - [info] done.
Tue Aug 2 14:42:12 2016 - [info] Getting slave status..
Tue Aug 2 14:42:12 2016 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1_name.000002:576864). No need to recover from Exec_Master_Log_Pos.
Tue Aug 2 14:42:12 2016 - [info] Connecting to the target slave host host_3, running recover script..
Tue Aug 2 14:42:12 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Aug 2 14:42:12 2016 - [info]
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog on host_3:3306. This may take long time...
Applying log files succeeded.
Tue Aug 2 14:42:12 2016 - [info] All relay logs were successfully applied.
Tue Aug 2 14:42:12 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Tue Aug 2 14:42:12 2016 - [info] Executed CHANGE MASTER.
Tue Aug 2 14:42:12 2016 - [info] Slave started.
Tue Aug 2 14:42:12 2016 - [info] End of log messages from host_3.
Tue Aug 2 14:42:12 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Tue Aug 2 14:42:12 2016 - [info] All new slave servers recovered successfully.
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:12 2016 - [info] * Phase 5: New master cleanup phase..
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:12 2016 - [info] Resetting slave info on the new master..
Tue Aug 2 14:42:12 2016 - [info] host_2: Resetting slave info succeeded.
Tue Aug 2 14:42:12 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Tue Aug 2 14:42:12 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name for details.

Started manual(interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_3(host_3:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
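
In the recovery log above, host_2 is given three files to apply (the exec-to-read relay diff, the read-to-latest relay diff, and the saved master binlog), while host_3 is given only the saved master binlog. The following is a minimal Python sketch of that decision, fed with the coordinates printed in the log. The function and the step names are illustrative and are not MHA internals; the sketch also assumes that binlog file names end in a numeric sequence suffix.

```python
from typing import List, NamedTuple


class BinlogPos(NamedTuple):
    """A master binlog coordinate as printed in the MHA log (file:offset)."""
    file: str
    pos: int


def sort_key(p: BinlogPos):
    # Assumption: the file name ends in ".NNNNNN", so the numeric suffix orders files.
    return (int(p.file.rsplit(".", 1)[1]), p.pos)


def recovery_steps(exec_pos: BinlogPos, read_pos: BinlogPos,
                   latest_pos: BinlogPos, master_binlog_saved: bool) -> List[str]:
    """Return the (illustratively named) diff files a server needs, in apply order."""
    steps = []
    if sort_key(exec_pos) < sort_key(read_pos):
        # Relay log events received locally but not yet executed.
        steps.append("relay_from_exec_to_read")
    if sort_key(read_pos) < sort_key(latest_pos):
        # Events that only the latest slave has received.
        steps.append("relay_from_read_to_latest")
    if master_binlog_saved:
        # Events copied from the dead master's binlog.
        steps.append("saved_master_binlog")
    return steps


latest = BinlogPos("host_1_name.000002", 576864)  # host_3, the latest slave

host_2_steps = recovery_steps(
    BinlogPos("host_1_name.000001", 39731438),  # Exec_Master_Log_Pos of host_2
    BinlogPos("host_1_name.000001", 39731619),  # Read_Master_Log_Pos of host_2
    latest, master_binlog_saved=True)
host_3_steps = recovery_steps(latest, latest, latest, master_binlog_saved=True)

print(host_2_steps)
print(host_3_steps)
```

The two results line up with the `--apply_files` arguments of the two `apply_diff_relay_logs --command=apply` invocations in the log.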

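4.8 [Test case] The master is up, but the MySQL service on a slave goes down: does MHA fail over automatically?

  • When a MySQL slave goes down, MHA does not fail over automatically
* MHA monitors only the state of the master; automatic failover happens only when the master goes down

[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
  • As long as the master is up, errors on the slaves never trigger an automatic failover
> 10. [Test case] The IO/SQL thread on a MySQL slave is stopped: does MHA fail over automatically?
> 11. [Test case] The IO/SQL thread on a MySQL slave reports an error: does MHA fail over automatically?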

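4.9 [Test case] The master goes down while a MySQL slave has problems: does MHA fail over automatically?

The IO/SQL thread on a MySQL slave is stopped

The IO/SQL thread on a MySQL slave is lagging far behind

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed if relay_log_purge=0 is not set on a slave)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually run stop slave on one of the slaves, then bring the master down
slave> stop slave;
master> shutdown;
  • step3: Watch the log and observe what happens
* If a slave has stopped replicating and the master then goes down, MHA still fails over automatically and reports no error
* If a slave is lagging badly and the master then goes down, MHA still fails over automatically and reports no error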

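4.10 [Test case] The MySQL master is running a large transaction that takes more than 100 s: does the online master switch proceed?

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed if relay_log_purge=0 is not set on a slave)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf
  • step2: Manually create a large transaction on the master
master> select id, sleep(20) from tb for update;
  • step3: Situations in which MHA reports an error
* When a slave has problems
 The IO or SQL thread has a problem
 A slave is down
 A slave is lagging too far behind
 Replication filter rules are inconsistent
 and so on

Any of the situations above causes the switch to fail
  • step4: The log is as follows

    If there is a large transaction, MHA keeps waiting until the transaction finishes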

Wed Aug 3 15:33:04 2016 - [info] MHA::MasterRotate version 0.56.
Wed Aug 3 15:33:04 2016 - [info] Starting online master switch..
Wed Aug 3 15:33:04 2016 - [info]
Wed Aug 3 15:33:04 2016 - [info] * Phase 1: Configuration Check Phase..
Wed Aug 3 15:33:04 2016 - [info]
Wed Aug 3 15:33:04 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Aug 3 15:33:04 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Aug 3 15:33:04 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed Aug 3 15:33:04 2016 - [info] GTID failover mode = 0
Wed Aug 3 15:33:04 2016 - [info] Current Alive Master: host_1(host_1:3306)
Wed Aug 3 15:33:04 2016 - [info] Alive Slaves:
Wed Aug 3 15:33:04 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 15:33:04 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 15:33:04 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 3 15:33:04 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 15:33:04 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 15:33:04 2016 - [info] Not candidate for the new Master (no_master is set)
Wed Aug 3 15:33:04 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

... keeps waiting ...
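
The slave-side conditions listed in step3 can be written down as a pre-flight check to run before attempting an online switch. This is a minimal Python sketch under stated assumptions: the `SlaveStatus` fields mirror columns of `SHOW SLAVE STATUS`, but the class, the `switch_blockers` function, the `max_lag_seconds` threshold and the way filters are represented are all hypothetical and are not part of MHA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveStatus:
    """Hypothetical snapshot of one slave, filled from SHOW SLAVE STATUS."""
    host: str
    reachable: bool = True
    io_running: bool = True            # Slave_IO_Running == "Yes"
    sql_running: bool = True           # Slave_SQL_Running == "Yes"
    seconds_behind: Optional[int] = 0  # Seconds_Behind_Master, None if unknown
    filters: str = ""                  # e.g. the Replicate_Do_DB value


def switch_blockers(slaves: List[SlaveStatus], max_lag_seconds: int = 1) -> List[str]:
    """Return human-readable reasons why an online switch would likely fail."""
    reasons = []
    for s in slaves:
        if not s.reachable:
            reasons.append(f"{s.host}: slave is down")
            continue
        if not (s.io_running and s.sql_running):
            reasons.append(f"{s.host}: IO or SQL thread is not running")
        if s.seconds_behind is None or s.seconds_behind > max_lag_seconds:
            reasons.append(f"{s.host}: replication lag is too large")
    # All reachable slaves are expected to share the same filter rules.
    if len({s.filters for s in slaves if s.reachable}) > 1:
        reasons.append("replication filter rules differ between slaves")
    return reasons


healthy_blockers = switch_blockers([SlaveStatus("host_2"), SlaveStatus("host_3")])
broken_blockers = switch_blockers([
    SlaveStatus("host_2", sql_running=False),
    SlaveStatus("host_3", seconds_behind=120),
])
print(healthy_blockers)
print(broken_blockers)
```

An empty list means none of the listed conditions were detected; it says nothing about a long-running transaction on the master, which, as the log above shows, makes the switch wait rather than fail.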

4.11 [Test case] The MySQL master's network is cut off

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed if relay_log_purge=0 is not set on a slave)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually make the master unreachable over the network
* Cut off the MASTER's network, allowing only port 22 from host_monitor (so that the network can be restored)
master> iptables -A INPUT -p tcp -s host_monitor --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

* Restore the MASTER's network
master> service iptables restart
  • step3: With the master's network down, watch the MHA log
Wed Aug 3 18:32:26 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Aug 3 18:32:26 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Wed Aug 3 18:32:29 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:32:29 2016 - [warning] Connection failed 2 time(s)..
Wed Aug 3 18:32:31 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Aug 3 18:32:32 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:32:32 2016 - [warning] Connection failed 3 time(s)..
Wed Aug 3 18:32:35 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:32:35 2016 - [warning] Connection failed 4 time(s)..
Wed Aug 3 18:32:35 2016 - [warning] Master is not reachable from health checker!
Wed Aug 3 18:32:35 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Wed Aug 3 18:32:35 2016 - [warning] SSH is NOT reachable.
Wed Aug 3 18:32:35 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Wed Aug 3 18:32:35 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Aug 3 18:32:35 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Aug 3 18:32:35 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed Aug 3 18:32:35 2016 - [info] GTID failover mode = 0
Wed Aug 3 18:32:35 2016 - [info] Dead Servers:
Wed Aug 3 18:32:35 2016 - [info] host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Alive Servers:
Wed Aug 3 18:32:35 2016 - [info] host_2(host_2:3306)
Wed Aug 3 18:32:35 2016 - [info] host_3(host_3:3306)
Wed Aug 3 18:32:35 2016 - [info] Alive Slaves:
Wed Aug 3 18:32:35 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:35 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 3 18:32:35 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:35 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Not candidate for the new Master (no_master is set)
Wed Aug 3 18:32:35 2016 - [info] Checking slave configurations..
Wed Aug 3 18:32:35 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Wed Aug 3 18:32:35 2016 - [info] read_only=1 is not set on slave host_3(host_3:3306).
Wed Aug 3 18:32:35 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Wed Aug 3 18:32:35 2016 - [info] Checking replication filtering settings..
Wed Aug 3 18:32:35 2016 - [info] Replication filtering check ok.
Wed Aug 3 18:32:35 2016 - [info] Master is down!
Wed Aug 3 18:32:35 2016 - [info] Terminating monitoring script.
Wed Aug 3 18:32:35 2016 - [info] Got exit code 20 (Master dead).
Wed Aug 3 18:32:35 2016 - [info] MHA::MasterFailover version 0.56.
Wed Aug 3 18:32:35 2016 - [info] Starting master failover.
Wed Aug 3 18:32:35 2016 - [info]
Wed Aug 3 18:32:35 2016 - [info] * Phase 1: Configuration Check Phase..
Wed Aug 3 18:32:35 2016 - [info]
Wed Aug 3 18:32:35 2016 - [info] GTID failover mode = 0
Wed Aug 3 18:32:35 2016 - [info] Dead Servers:
Wed Aug 3 18:32:35 2016 - [info] host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Checking master reachability via MySQL(double check)...
Wed Aug 3 18:32:36 2016 - [info] ok.
Wed Aug 3 18:32:36 2016 - [info] Alive Servers:
Wed Aug 3 18:32:36 2016 - [info] host_2(host_2:3306)
Wed Aug 3 18:32:36 2016 - [info] host_3(host_3:3306)
Wed Aug 3 18:32:36 2016 - [info] Alive Slaves:
Wed Aug 3 18:32:36 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:36 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:36 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 3 18:32:36 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:36 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:36 2016 - [info] Not candidate for the new Master (no_master is set)
Wed Aug 3 18:32:36 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln309] Last failover was done at 2016/08/03 16:07:43. Current time is too early to do failover again. If you want to do failover, manually remove /var/log/masterha/app1/app1.failover.complete and run this script again.
Wed Aug 3 18:32:36 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65

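* Several switchovers had been performed shortly before this test, so the failover was aborted as being too soon after the previous one. The log still shows that MHA detected the network problem and started the failover procedure.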
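
The "too early to do failover again" guard can be sketched in a few lines of Python, using the two timestamps from the error message above. Two assumptions are made here: the 480-minute default reflects my understanding of the documented default of `last_failover_minute` and should be checked against your MHA version, and the exact boundary comparison (`>=`) is a guess. The function itself is illustrative, not MHA code; MHA bases the real check on the `app1.failover.complete` file named in the log.

```python
from datetime import datetime, timedelta
from typing import Optional


def failover_allowed(last_failover: Optional[datetime], now: datetime,
                     last_failover_minute: int = 480) -> bool:
    """Return True if enough time has passed since the previous failover.

    last_failover is None when no previous failover has been recorded.
    """
    if last_failover is None:
        return True
    return now - last_failover >= timedelta(minutes=last_failover_minute)


last = datetime(2016, 8, 3, 16, 7, 43)   # "Last failover was done at 2016/08/03 16:07:43"
now = datetime(2016, 8, 3, 18, 32, 36)   # timestamp of the [error] line

blocked_by_default = not failover_allowed(last, now)
allowed_with_one_minute = failover_allowed(last, now, last_failover_minute=1)
print(blocked_by_default, allowed_with_one_minute)
```

The gap between the two timestamps is roughly two and a half hours, which is why the run is rejected under the assumed default and would pass with `--last_failover_minute=1`, the option used in a later test in this article.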

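4.12 [Test case] The MySQL master's network drops briefly (1~30 seconds): does MHA fail over automatically?

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed if relay_log_purge=0 is not set on a slave)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually make the master unreachable over the network for about 1~10 s
* Cut off the MASTER's network, allowing only port 22 from host_monitor (so that the network can be restored)
master> iptables -A INPUT -p tcp -s host_monitor --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP


* After about 1~10 seconds, restore the MASTER's network
master> service iptables restart
  • step3: Watch the MHA log

Conclusion: if the outage is very short, such as a 1~2 second network blip, no failover happens

Conclusion: if the outage lasts longer, for example more than 5~10 seconds, and MHA has already decided "Master is not reachable from health checker!", it enters the failover phase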

Wed Aug 3 18:48:35 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..



Wed Aug 3 18:48:50 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Aug 3 18:48:50 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Wed Aug 3 18:48:53 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:48:53 2016 - [warning] Connection failed 2 time(s)..
Wed Aug 3 18:48:55 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Aug 3 18:48:56 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:48:56 2016 - [warning] Connection failed 3 time(s)..
Wed Aug 3 18:48:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:48:59 2016 - [warning] Connection failed 4 time(s)..
Wed Aug 3 18:48:59 2016 - [warning] Master is not reachable from health checker!
Wed Aug 3 18:48:59 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Wed Aug 3 18:48:59 2016 - [warning] SSH is NOT reachable.

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
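
The health-check log in this test shows a fixed pattern: a first timeout, then "Connection failed 2/3/4 time(s)" about 3 seconds apart, and only after the fourth failure "Master is not reachable from health checker!". A simplified Python model of that pattern is sketched below. The 3-second interval and the threshold of 4 are read off the log rather than taken from MHA's source, and the model ignores connect timeouts, so real detection takes somewhat longer than the model suggests.

```python
from typing import List


def declared_dead(ping_ok: List[bool], max_failures: int = 4) -> bool:
    """True if the sequence contains max_failures consecutive failed pings."""
    consecutive = 0
    for ok in ping_ok:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= max_failures:
            return True
    return False


def simulate_outage(outage_start: int, outage_seconds: int,
                    ping_interval: int = 3, total_seconds: int = 60) -> List[bool]:
    """Ping at t = 0, ping_interval, 2 * ping_interval, ...

    A ping fails if it falls inside [outage_start, outage_start + outage_seconds).
    """
    outage_end = outage_start + outage_seconds
    return [not (outage_start <= t < outage_end)
            for t in range(0, total_seconds, ping_interval)]


short_blip = declared_dead(simulate_outage(outage_start=10, outage_seconds=2))
long_outage = declared_dead(simulate_outage(outage_start=10, outage_seconds=15))
print(short_blip, long_outage)
```

In this model a 2-second blip can fall entirely between two pings, while a 15-second outage covers enough consecutive pings to cross the threshold, which is consistent with the two conclusions stated in step3.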

4.14 [Test case] Multi-segment network test 0 (no secondary check)

Manager Master

Manager S1 master

Manager S2 master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed if relay_log_purge=0 is not set on a slave)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network fault described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Check the log

A failover has taken place

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.14 [Test case] Multi-segment network test 1 (with secondary check)

Manager Master

Manager S1 master

Manager S2 master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is printed if relay_log_purge=0 is not set on a slave)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf --last_failover_minute=1 &

* Secondary check
 secondary_check_script= masterha_secondary_check -s host_2 -s host_3
  • step2: Manually create the network fault described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Watch the log

Conclusion: because the secondary check succeeds, MHA considers the master alive and does not fail over, but it keeps checking in a loop

Conclusion: with the secondary check enabled, as long as at least one other server can reach the master, MHA considers the master alive and does not fail over

Fri Aug 5 10:16:24 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 10:16:24 2016 - [info] HealthCheck: SSH to host_1 is reachable.













Fri Aug 5 10:17:06 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:17:06 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:17:06 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Master is reachable from host_2!
Fri Aug 5 10:17:07 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:17:09 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:09 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:17:11 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 10:17:12 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:12 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:17:15 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:15 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:17:15 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:17:18 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:18 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:17:19 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:17:19 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Master is reachable from host_2!
Fri Aug 5 10:17:19 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:17:22 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:22 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:17:24 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 10:17:25 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:25 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:17:28 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:28 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:17:28 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:17:31 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:31 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:17:31 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:17:31 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Master is reachable from host_2!
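
The difference between test 0 (no secondary check, failover happens) and this test (secondary check configured, failover suppressed) comes down to a simple veto rule. Below is a minimal Python sketch of that rule as it appears from the logs. The function is illustrative and not MHA code, and it deliberately ignores the exit codes of the real `masterha_secondary_check` script and the case where a monitoring host itself cannot be reached.

```python
from typing import Dict


def should_failover(manager_reaches_master: bool,
                    secondary_reaches_master: Dict[str, bool]) -> bool:
    """Decide whether failover should start.

    secondary_reaches_master maps each host passed with -s to whether the
    master is reachable from it; an empty dict means no secondary check.
    """
    if manager_reaches_master:
        return False
    # Any monitoring host that still reaches the master vetoes the failover.
    return not any(secondary_reaches_master.values())


no_secondary = should_failover(False, {})
via_host_2 = should_failover(False, {"host_2": True})
only_host_3 = should_failover(False, {"host_2": False, "host_3": True})
all_blocked = should_failover(False, {"host_2": False, "host_3": False})
print(no_secondary, via_host_2, only_host_3, all_blocked)
```

Failover starts only when the manager and every configured monitoring host have all lost the master; a single successful route is enough to keep MHA in its check loop.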

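4.14 [Test case] Multi-segment network test 2 (with the secondary check mechanism)

Network paths (diagram not preserved): Manager - Master; Manager - S1 - master; Manager - S2 - master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is raised for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network failure described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Observe the log

Conclusion: because the secondary check still reaches the master (from host_3), MHA considers the master alive and does not fail over, but it keeps looping through the health check.

Conclusion: with the secondary check enabled, as long as at least one other server can reach the master, MHA considers the master alive and does not fail over.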
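
The loop described in these conclusions can be sketched roughly as follows. This is only an illustrative Python sketch, not MHA's own code (MHA is written in Perl). The function name `monitor_master`, its callable parameters, and the threshold of 4 failed pings are assumptions read off the log excerpts in this article, where the counter reaches "Connection failed 4 time(s)" before the secondary check verdict is applied and then restarts at 1.

```python
from typing import Callable

# Assumption: taken from the logs, which count up to "Connection failed 4 time(s)".
MAX_FAILED_PINGS = 4


def monitor_master(ping_master: Callable[[], bool],
                   master_reachable_elsewhere: Callable[[], bool],
                   max_rounds: int) -> str:
    """Return "failover" or "keep monitoring" after at most max_rounds rounds.

    ping_master: hypothetical stand-in for the manager's own health check.
    master_reachable_elsewhere: hypothetical stand-in for the secondary check.
    """
    for _ in range(max_rounds):
        failed = 0
        while failed < MAX_FAILED_PINGS:
            if ping_master():
                failed = 0
                break          # master answered, nothing to do this round
            failed += 1
        if failed < MAX_FAILED_PINGS:
            continue           # healthy round, keep monitoring
        if master_reachable_elsewhere():
            continue           # secondary check vetoes failover, counter restarts
        return "failover"      # nobody can reach the master
    return "keep monitoring"
```

With a master that never answers the manager but is still visible from another server, the sketch never returns "failover", which mirrors the endless retry cycle seen in the log of this test.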

Fri Aug 5 10:32:27 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug 5 10:32:27 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Fri Aug 5 10:33:30 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:33:30 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:33:30 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:33:33 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:33 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:33:35 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug 5 10:33:35 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:33:36 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:36 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:33:39 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:39 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:33:39 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:33:42 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:42 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:33:42 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:33:42 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:33:45 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:45 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:33:47 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug 5 10:33:47 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:33:48 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:48 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:33:51 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:51 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:33:51 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:33:54 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:54 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:33:54 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:33:54 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:33:57 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:57 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:33:59 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug 5 10:34:00 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:34:00 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:34:00 2016 - [warning] Connection failed 3 time(s)..

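4.14 [Test case] Multi-segment network test 3 (with the secondary check mechanism)

Network paths (diagram not preserved): Manager - Master; Manager - S1 - master; Manager - S2 - master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is raised for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network failure described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Observe the log

Because the master is unreachable from the manager and from every other monitoring server, MHA fails over.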

Fri Aug 5 10:27:56 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:27:56 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:27:56 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:27:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:27:59 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:28:01 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Fri Aug 5 10:28:02 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:28:02 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:28:05 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:28:05 2016 - [warning] Connection failed 4 time(s)..
Monitoring server host_3 is reachable, Master is not reachable from host_3. OK.
Fri Aug 5 10:28:06 2016 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Aug 5 10:28:06 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 10:28:06 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 10:28:06 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 10:28:06 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 10:28:06 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 10:28:06 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 10:28:06 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 10:28:06 2016 - [info] GTID failover mode = 0
Fri Aug 5 10:28:06 2016 - [info] Dead Servers:
Fri Aug 5 10:28:06 2016 - [info] host_1(host_1:3306)
Fri Aug 5 10:28:06 2016 - [info] Alive Servers:
Fri Aug 5 10:28:06 2016 - [info] host_2(host_2:3306)
Fri Aug 5 10:28:06 2016 - [info] host_3(host_3:3306)
Fri Aug 5 10:28:06 2016 - [info] Alive Slaves:
Fri Aug 5 10:28:06 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:06 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:06 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:06 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:06 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:06 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:06 2016 - [info] Checking slave configurations..
Fri Aug 5 10:28:06 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Fri Aug 5 10:28:06 2016 - [info] read_only=1 is not set on slave host_3(host_3:3306).
Fri Aug 5 10:28:06 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Fri Aug 5 10:28:06 2016 - [info] Checking replication filtering settings..
Fri Aug 5 10:28:06 2016 - [info] Replication filtering check ok.
Fri Aug 5 10:28:06 2016 - [info] Master is down!
Fri Aug 5 10:28:06 2016 - [info] Terminating monitoring script.
Fri Aug 5 10:28:06 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 5 10:28:06 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug 5 10:28:06 2016 - [info] Starting master failover.
Fri Aug 5 10:28:06 2016 - [info]
Fri Aug 5 10:28:06 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 5 10:28:06 2016 - [info]
Fri Aug 5 10:28:07 2016 - [info] GTID failover mode = 0
Fri Aug 5 10:28:07 2016 - [info] Dead Servers:
Fri Aug 5 10:28:07 2016 - [info] host_1(host_1:3306)
Fri Aug 5 10:28:07 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 5 10:28:08 2016 - [info] ok.
Fri Aug 5 10:28:08 2016 - [info] Alive Servers:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306)
Fri Aug 5 10:28:08 2016 - [info] Alive Slaves:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info] Starting Non-GTID based failover.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 5 10:28:08 2016 - [info] Executing master IP deactivation script:
Fri Aug 5 10:28:08 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug 5 10:28:08 2016 - [info] done.
Fri Aug 5 10:28:08 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 5 10:28:08 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000001:154
Fri Aug 5 10:28:08 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:154
Fri Aug 5 10:28:08 2016 - [info] Oldest slaves:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 5 10:28:08 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Aug 5 10:28:08 2016 - [info] Searching new master from slaves..
Fri Aug 5 10:28:08 2016 - [info] Candidate masters from the configuration file:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] Non-candidate masters:
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 5 10:28:08 2016 - [info] New master is host_2(host_2:3306)
Fri Aug 5 10:28:08 2016 - [info] Starting master failover..
Fri Aug 5 10:28:08 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Aug 5 10:28:08 2016 - [info] Starting recovery on host_2(host_2:3306)..
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Fri Aug 5 10:28:08 2016 - [info] done.
Fri Aug 5 10:28:08 2016 - [info] All relay logs were successfully applied.
Fri Aug 5 10:28:08 2016 - [info] Getting new master's binlog name and position..
Fri Aug 5 10:28:08 2016 - [info] host_2_name.000001:154
Fri Aug 5 10:28:08 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 5 10:28:08 2016 - [info] Executing master IP activate script:
Fri Aug 5 10:28:08 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Fri Aug 5 10:28:08 2016 - [info] OK.
Fri Aug 5 10:28:08 2016 - [info] ** Finished master recovery successfully.
Fri Aug 5 10:28:08 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 3964. Check tmp log /var/log/masterha/app1/host_3_3306_20160805102806.log if it takes time..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Log messages from host_3 ...
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 5 10:28:08 2016 - [info] End of log messages from host_3.
Fri Aug 5 10:28:08 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Fri Aug 5 10:28:08 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 3966. Check tmp log /var/log/masterha/app1/host_3_3306_20160805102806.log if it takes time..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Log messages from host_3 ...
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Starting recovery on host_3(host_3:3306)..
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Fri Aug 5 10:28:08 2016 - [info] done.
Fri Aug 5 10:28:08 2016 - [info] All relay logs were successfully applied.
Fri Aug 5 10:28:08 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Fri Aug 5 10:28:08 2016 - [info] Executed CHANGE MASTER.
Fri Aug 5 10:28:08 2016 - [info] Slave started.
Fri Aug 5 10:28:08 2016 - [info] End of log messages from host_3.
Fri Aug 5 10:28:08 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Fri Aug 5 10:28:08 2016 - [info] All new slave servers recovered successfully.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Resetting slave info on the new master..
Fri Aug 5 10:28:08 2016 - [info] host_2: Resetting slave info succeeded.
Fri Aug 5 10:28:08 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Fri Aug 5 10:28:08 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.14 [Test case] Multi-segment network test 4 (without the secondary check mechanism)

Network paths (diagram not preserved): Manager - Master; Manager - S1 - master; Manager - S2 - master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is raised for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network failure described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
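
Because every scenario in this series uses the same pattern (accept TCP from a few named sources, then drop everything else), the rule lists can be generated instead of typed by hand. The helper below is a small illustrative sketch. The name `partition_rules` and the dictionary layout are assumptions made for this example, and it only prints commands in the notation used in this article rather than executing anything.

```python
from typing import Dict, List


def partition_rules(allowed: Dict[str, List[str]]) -> List[str]:
    """Build the iptables commands for each host.

    allowed maps a host label (as used in the prompts above) to the sources
    whose TCP traffic it should still accept. The final DROP rule must come
    last, because iptables evaluates appended rules in order.
    """
    commands = []
    for host, sources in allowed.items():
        for src in sources:
            commands.append(f"{host}> iptables -A INPUT -p tcp -s {src} -j ACCEPT")
        commands.append(f"{host}> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP")
    return commands
```

Calling `partition_rules({"master": ["host_monitor", "host_2", "host_3"], "s1": ["host_monitor", "host_1"], "s2": ["host_monitor", "host_1"]})` yields the ten commands of step2 in this test case, in the same order.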
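  • step3: Observe the log

Conclusion: because the master is unreachable and the other slaves are unreachable as well, MHA finds no alive slave and therefore does not fail over.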

Fri Aug 5 10:39:39 2016 - [info] Checking master_ip_failover_script status:
Fri Aug 5 10:39:39 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306
Fri Aug 5 10:39:39 2016 - [info] OK.
Fri Aug 5 10:39:39 2016 - [warning] shutdown_script is not defined.
Fri Aug 5 10:39:39 2016 - [info] Set master ping interval 3 seconds.
Fri Aug 5 10:39:39 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug 5 10:39:39 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug 5 10:39:39 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 10:41:51 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:41:51 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:41:54 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:41:54 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:41:56 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 10:41:57 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:41:57 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:42:00 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:42:00 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:42:00 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 10:42:00 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 10:42:00 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 10:42:00 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 10:42:00 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 10:42:00 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 10:42:00 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 10:42:08 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Aug 5 10:42:08 2016 - [warning] Got Error: at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 558
Fri Aug 5 10:42:08 2016 - [info] Got exit code 1 (Not master dead).
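
The last line of a monitoring run is a quick way to tell the outcomes apart: this run ends with exit code 1 (Not master dead), whereas the run that failed over in test 3 reports exit code 20 (Master dead). A small log scanner can pull that verdict out. The sketch below is a hypothetical helper written for this article, not part of MHA, and it assumes the log line format shown here.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "... - [info] Got exit code 20 (Master dead)."
EXIT_RE = re.compile(r"Got exit code (\d+) \((.+?)\)\.")


def monitor_exit(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (code, reason) from the last "Got exit code" line, or None."""
    matches = EXIT_RE.findall(log_text)
    if not matches:
        return None
    code, reason = matches[-1]
    return int(code), reason
```

Feeding it the final line of this log returns `(1, "Not master dead")`, so a wrapper script could alert on any run that stops without a failover.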

4.14 [Test case] Multi-segment network test 5 (with the secondary check mechanism)

Network paths (diagram not preserved): Manager - Master; Manager - S1 - master; Manager - S2 - master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is raised for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network failure described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Observe the log

Monitoring server host_2 is NOT reachable!

As soon as even one monitoring server is unreachable, MHA does not fail over; the MHA manager treats the situation as a network problem.
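
Taken together, tests 2, 3 and 5 show three possible verdicts from the secondary check. The Python sketch below restates them as a decision function. It is an illustration of the behaviour observed in these logs, not the source of `masterha_secondary_check`; the names `Verdict` and `secondary_check` are made up for this example, and probing the monitoring hosts in order and stopping at the first decisive answer is an assumption.

```python
from enum import Enum
from typing import Dict, Optional


class Verdict(Enum):
    MASTER_DEAD = "failover should start"
    MASTER_ALIVE = "master reachable from another server, no failover"
    NETWORK_PROBLEM = "a monitoring server is unreachable, no failover"


def secondary_check(results: Dict[str, Optional[bool]]) -> Verdict:
    """Decide from per-host probe results (hypothetical input format).

    results maps each monitoring host to:
      None  -> the manager could not reach that monitoring host at all,
      True  -> the host could reach the master,
      False -> the host could not reach the master.
    """
    for master_seen in results.values():   # dicts keep insertion order
        if master_seen is None:
            return Verdict.NETWORK_PROBLEM   # test 5
        if master_seen:
            return Verdict.MASTER_ALIVE      # test 2
    return Verdict.MASTER_DEAD               # test 3
```

Only the last case, where every monitoring host is reachable and none of them can see the master, leads to a failover.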

Fri Aug 5 10:47:41 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Fri Aug 5 10:55:50 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:55:50 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:55:50 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:55:53 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:55:53 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:55:55 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
ssh: connect to host host_2 port 22: Connection timed out
Monitoring server host_2 is NOT reachable!
Fri Aug 5 10:55:55 2016 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Aug 5 10:55:56 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:55:56 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:55:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:55:59 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:55:59 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:56:02 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:56:02 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:56:02 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:56:02 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:56:05 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:56:05 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:56:07 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
ssh: connect to host host_2 port 22: Connection timed out
Monitoring server host_2 is NOT reachable!
Fri Aug 5 10:56:07 2016 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.

4.14 [Test case] Multi-segment network test 6 (without the secondary check mechanism)

Network paths (diagram not preserved): Manager - Master; Manager - S1 - master; Manager - S2 - master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is raised for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network failure described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Observe the log

Because none of the servers is reachable, no failover happens.

4.14 [Test case] Multi-segment network test 7 (with the secondary check mechanism)

Network paths (diagram not preserved): Manager - Master; Manager - S1 - master; Manager - S2 - master

  • step1: Check the MHA manager environment
* Check SSH
 masterha_check_ssh --conf=/etc/app1.cnf

* Check replication (in non-GTID mode, a warning is raised for every slave that does not set relay_log_purge=0)
 masterha_check_repl --conf=/etc/app1.cnf
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
 Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).

* Check MHA status
 masterha_check_status --conf=/etc/app1.cnf

* Start MHA monitoring
 nohup masterha_manager --conf=/etc/app1.cnf &
  • step2: Manually create the network failure described above
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
  • step3: Observe the log

Because none of the servers is reachable, no failover happens.

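4.15 [Test case] In non-GTID mode, are relay logs required? Can the missing log events be filled in successfully?

If the relay logs are insufficient, recovery fails with an error.

While generating the differential relay logs, MHA finds that the required logs are missing, so it reports an error and does not fail over.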
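
Since recovery in non-GTID mode depends on the relay logs still being on disk, it can be useful to list which slaves triggered the relay_log_purge=0 warning before relying on failover. The following is a minimal sketch that scans `masterha_check_repl` or manager log text for that warning. The function name is hypothetical, and the regular expression assumes the warning wording quoted in this article.

```python
import re
from typing import List

# Matches "relay_log_purge=0 is not set on slave host_2(host_2:3306)."
WARN_RE = re.compile(r"relay_log_purge=0 is not set on slave (\S+?)\((\S+?):(\d+)\)")


def slaves_missing_relay_log_purge(log_text: str) -> List[str]:
    """Return the distinct "host:port" addresses named in the warning."""
    hosts: List[str] = []
    for match in WARN_RE.finditer(log_text):
        addr = f"{match.group(2)}:{match.group(3)}"
        if addr not in hosts:      # the same warning repeats on every check
            hosts.append(addr)
    return hosts
```

An empty result means no slave was flagged in the scanned text; it does not by itself prove that enough relay logs are present.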

Fri Aug 5 11:09:51 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 11:10:42 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 11:10:42 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 11:10:45 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 11:10:45 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 11:10:47 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 11:10:48 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 11:10:48 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 11:10:51 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 11:10:51 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 11:10:51 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 11:10:51 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 11:10:51 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 11:10:51 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 11:10:51 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 11:10:51 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 11:10:51 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 11:10:51 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 11:10:51 2016 - [info] GTID failover mode = 0
Fri Aug 5 11:10:51 2016 - [info] Dead Servers:
Fri Aug 5 11:10:51 2016 - [info] host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Alive Servers:
Fri Aug 5 11:10:51 2016 - [info] host_2(host_2:3306)
Fri Aug 5 11:10:51 2016 - [info] host_3(host_3:3306)
Fri Aug 5 11:10:51 2016 - [info] Alive Slaves:
Fri Aug 5 11:10:51 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:51 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 11:10:51 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:51 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 11:10:51 2016 - [info] Checking slave configurations..
Fri Aug 5 11:10:51 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Fri Aug 5 11:10:51 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Fri Aug 5 11:10:51 2016 - [info] read_only=1 is not set on slave host_3(host_3:3306).
Fri Aug 5 11:10:51 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Fri Aug 5 11:10:51 2016 - [info] Checking replication filtering settings..
Fri Aug 5 11:10:51 2016 - [info] Replication filtering check ok.
Fri Aug 5 11:10:51 2016 - [info] Master is down!
Fri Aug 5 11:10:51 2016 - [info] Terminating monitoring script.
Fri Aug 5 11:10:51 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 5 11:10:51 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug 5 11:10:51 2016 - [info] Starting master failover.
Fri Aug 5 11:10:51 2016 - [info]
Fri Aug 5 11:10:51 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 5 11:10:51 2016 - [info]
Fri Aug 5 11:10:51 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 11:10:51 2016 - [info] GTID failover mode = 0
Fri Aug 5 11:10:51 2016 - [info] Dead Servers:
Fri Aug 5 11:10:51 2016 - [info] host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 5 11:10:52 2016 - [info] ok.
Fri Aug 5 11:10:52 2016 - [info] Alive Servers:
Fri Aug 5 11:10:52 2016 - [info] host_2(host_2:3306)
Fri Aug 5 11:10:52 2016 - [info] host_3(host_3:3306)
Fri Aug 5 11:10:52 2016 - [info] Alive Slaves:
Fri Aug 5 11:10:52 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:52 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:52 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 11:10:52 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:52 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:52 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 11:10:52 2016 - [info] Starting SQL thread on host_3(host_3:3306) ..
Fri Aug 5 11:10:52 2016 - [info] done.
Fri Aug 5 11:10:52 2016 - [info] Starting Non-GTID based failover.
Fri Aug 5 11:10:52 2016 - [info]
Fri Aug 5 11:10:52 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 5 11:10:52 2016 - [info]
Fri Aug 5 11:10:52 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 5 11:10:52 2016 - [info]
Fri Aug 5 11:10:52 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 5 11:10:52 2016 - [info] Executing master IP deactivation script:
Fri Aug 5 11:10:52 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug 5 11:10:53 2016 - [info] done.
Fri Aug 5 11:10:53 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 5 11:10:53 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000001:32925754
Fri Aug 5 11:10:53 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 5 11:10:53 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:53 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:53 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 11:10:53 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:2093512
Fri Aug 5 11:10:53 2016 - [info] Oldest slaves:
Fri Aug 5 11:10:53 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:53 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:53 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 5 11:10:53 2016 - [info] HealthCheck: SSH to host_2 is reachable.
Fri Aug 5 11:10:53 2016 - [info] Checking whether host_2 has relay logs from the oldest position..
Fri Aug 5 11:10:53 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1_name.000001 --latest_rmlp=32925754 --target_mlf=host_1_name.000001 --target_rmlp=2093512 --server_id=12616506 --workdir=/var/log/masterha/app1 --timestamp=20160805111051 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2_name-relay-bin.000008 :
 Relay log found at /data/mysql_data, up to host_2_name-relay-bin.000008
 Fast relay log position search failed. Reading relay logs to find..
Reading host_2_name-relay-bin.000008
 Binlog Checksum enabled
 Master Version is 5.7.13-log
 Binlog Checksum enabled
 host_2_name-relay-bin.000008 contains master host_1_name.000001 from position 26175199
Reading host_2_name-relay-bin.000007
 Binlog Checksum enabled
 host_2_name-relay-bin.000007 contains master host_1_name.000001 from position 12184106
Reading host_2_name-relay-bin.000006
No such file or directory:/data/mysql_data/host_2_name-relay-bin.000006 at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line 102
Fri Aug 5 11:10:53 2016 - [warning] host_2 doesn't have all relay logs. Maybe some logs were purged.
Fri Aug 5 11:10:53 2016 - [warning] None of latest servers have enough relay logs from oldest position. We can't recover oldest slaves.
Fri Aug 5 11:10:53 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery.
Fri Aug 5 11:10:53 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65
Fri Aug 5 11:10:53 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306)

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
None of the latest slaves has enough relay logs for recovery.
Got Error so couldn't continue failover from here.
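
The backward search through the relay logs shown in the log above can be modelled in a few lines. This is only a simplified sketch, not MHA's actual implementation (apply_diff_relay_logs is a Perl tool and also handles multiple master binlog files); the function name and the dictionary layout are assumptions made for illustration. The positions are the ones printed in the log.

```python
from typing import Dict


def has_enough_relay_logs(relay_start_pos: Dict[int, int],
                          current_seq: int,
                          target_pos: int) -> bool:
    """Walk relay logs from newest to oldest, as the log above does.

    relay_start_pos maps a relay log sequence number (8 for
    relay-bin.000008) to the master binlog position at which that
    file starts. Purged files are simply absent from the mapping.
    Only a single master binlog file is modelled (an assumption).
    """
    seq = current_seq
    while seq >= 1:
        if seq not in relay_start_pos:
            # Same situation as "No such file or directory" in the log.
            return False
        if relay_start_pos[seq] <= target_pos:
            # This file starts at or before the oldest slave's position,
            # so the differential events could be extracted from here on.
            return True
        seq -= 1
    return False


# Values taken from the failover log above.
on_disk = {8: 26175199, 7: 12184106}  # relay-bin.000006 was purged
oldest_slave_pos = 2093512            # --target_rmlp
enough = has_enough_relay_logs(on_disk, 8, oldest_slave_pos)
print("enough relay logs:", enough)
```

With relay-bin.000006 missing, the search never reaches a file that starts at or before position 2093512, which matches the "None of the latest slaves has enough relay logs for recovery" error.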

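4.16 [Test case] In GTID mode, are relay logs required? Can the missing logs be filled in successfully?

In GTID mode, relay logs are not needed, because recovery goes through the new GTID-based replication protocol.

However, the binary log and log_slave_updates must be enabled.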
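
A quick way to verify these prerequisites is to compare each slave's variables against the expected values. The sketch below assumes the values have already been fetched (for example from SHOW GLOBAL VARIABLES) into a plain dictionary; the function name is hypothetical, and gtid_mode=ON is added as an assumption implied by "GTID mode" rather than something stated above.

```python
from typing import Dict, List

# log_bin and log_slave_updates come from the text above;
# gtid_mode is an assumption implied by running in GTID mode.
REQUIRED = {"log_bin": "ON", "log_slave_updates": "ON", "gtid_mode": "ON"}


def missing_gtid_prerequisites(variables: Dict[str, str]) -> List[str]:
    """Return a human-readable list of settings that differ from REQUIRED."""
    problems = []
    for name, wanted in REQUIRED.items():
        actual = variables.get(name, "<unset>")
        if actual.upper() != wanted:
            problems.append(f"{name}={actual} (expected {wanted})")
    return problems


# Hypothetical values for one slave.
slave_vars = {"log_bin": "ON", "log_slave_updates": "OFF", "gtid_mode": "ON"}
problems = missing_gtid_prerequisites(slave_vars)
for line in problems:
    print(line)
```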

Fri Aug 5 13:32:25 2016 - [info] OK.
Fri Aug 5 13:32:25 2016 - [warning] shutdown_script is not defined.
Fri Aug 5 13:32:25 2016 - [info] Set master ping interval 3 seconds.
Fri Aug 5 13:32:25 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug 5 13:32:25 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug 5 13:32:28 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:28 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 13:32:28 2016 - [info] Executing SSH check script: exit 0
Fri Aug 5 13:32:31 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:31 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 13:32:33 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 13:32:34 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:34 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 13:32:37 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:37 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 13:32:37 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 13:32:37 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 13:32:37 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 13:32:37 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 13:32:37 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 13:32:37 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 13:32:37 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 13:32:37 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 13:32:37 2016 - [info] GTID failover mode = 1
Fri Aug 5 13:32:37 2016 - [info] Dead Servers:
Fri Aug 5 13:32:37 2016 - [info] host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Alive Servers:
Fri Aug 5 13:32:37 2016 - [info] host_2(host_2:3306)
Fri Aug 5 13:32:37 2016 - [info] host_3(host_3:3306)
Fri Aug 5 13:32:37 2016 - [info] Alive Slaves:
Fri Aug 5 13:32:37 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:37 2016 - [info] GTID ON
Fri Aug 5 13:32:37 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:37 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:37 2016 - [info] GTID ON
Fri Aug 5 13:32:37 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:37 2016 - [info] Checking slave configurations..
Fri Aug 5 13:32:37 2016 - [info] Checking replication filtering settings..
Fri Aug 5 13:32:37 2016 - [info] Replication filtering check ok.
Fri Aug 5 13:32:37 2016 - [info] Master is down!
Fri Aug 5 13:32:37 2016 - [info] Terminating monitoring script.
Fri Aug 5 13:32:37 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 5 13:32:37 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug 5 13:32:37 2016 - [info] Starting master failover.
Fri Aug 5 13:32:37 2016 - [info]
Fri Aug 5 13:32:37 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 5 13:32:37 2016 - [info]
Fri Aug 5 13:32:37 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 13:32:37 2016 - [info] GTID failover mode = 1
Fri Aug 5 13:32:37 2016 - [info] Dead Servers:
Fri Aug 5 13:32:37 2016 - [info] host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 5 13:32:38 2016 - [info] ok.
Fri Aug 5 13:32:38 2016 - [info] Alive Servers:
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306)
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306)
Fri Aug 5 13:32:38 2016 - [info] Alive Slaves:
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:38 2016 - [info] Starting SQL thread on host_3(host_3:3306) ..
Fri Aug 5 13:32:38 2016 - [info] done.
Fri Aug 5 13:32:38 2016 - [info] Starting GTID based failover.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 5 13:32:38 2016 - [info] Executing master IP deactivation script:
Fri Aug 5 13:32:38 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug 5 13:32:38 2016 - [info] done.
Fri Aug 5 13:32:38 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 5 13:32:38 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000002:56745425
Fri Aug 5 13:32:38 2016 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949
Fri Aug 5 13:32:38 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:38 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000002:517969
Fri Aug 5 13:32:38 2016 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-1925
Fri Aug 5 13:32:38 2016 - [info] Oldest slaves:
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Searching new master from slaves..
Fri Aug 5 13:32:38 2016 - [info] Candidate masters from the configuration file:
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:38 2016 - [info] Non-candidate masters:
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:38 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 5 13:32:38 2016 - [info] New master is host_2(host_2:3306)
Fri Aug 5 13:32:38 2016 - [info] Starting master failover..
Fri Aug 5 13:32:38 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Waiting all logs to be applied..
Fri Aug 5 13:32:38 2016 - [info] done.
Fri Aug 5 13:32:38 2016 - [info] Getting new master's binlog name and position..
Fri Aug 5 13:32:38 2016 - [info] host_2_name.000008:10198806
Fri Aug 5 13:32:38 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 5 13:32:38 2016 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2_name.000008, 10198806, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949
Fri Aug 5 13:32:38 2016 - [info] Executing master IP activate script:
Fri Aug 5 13:32:38 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Fri Aug 5 13:32:38 2016 - [info] OK.
Fri Aug 5 13:32:38 2016 - [info] Setting read_only=0 on host_2(host_2:3306)..
Fri Aug 5 13:32:38 2016 - [info] ok.
Fri Aug 5 13:32:38 2016 - [info] ** Finished master recovery successfully.
Fri Aug 5 13:32:38 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22029. Check tmp log /var/log/masterha/app1/host_3_3306_20160805133237.log if it takes time..
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:33:38 2016 - [info] Log messages from host_3 ...
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Fri Aug 5 13:32:38 2016 - [info] Executed CHANGE MASTER.
Fri Aug 5 13:32:39 2016 - [info] Slave started.
Fri Aug 5 13:33:38 2016 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949) completed on host_3(host_3:3306). Executed 206170 events.
Fri Aug 5 13:33:38 2016 - [info] End of log messages from host_3.
Fri Aug 5 13:33:38 2016 - [info] -- Slave on host host_3(host_3:3306) started.
Fri Aug 5 13:33:38 2016 - [info] All new slave servers recovered successfully.
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:33:38 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:33:38 2016 - [info] Resetting slave info on the new master..
Fri Aug 5 13:33:38 2016 - [info] host_2: Resetting slave info succeeded.
Fri Aug 5 13:33:38 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Fri Aug 5 13:33:38 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): OK: Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
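
To see what the oldest slave still has to fetch from the new master through auto-positioning, the two "Retrieved Gtid Set" values from Phase 3.1 can be compared. The sketch below is a simplification that only handles single-interval sets sharing the same source UUID and the same start, as in this log; real GTID sets can contain several UUIDs and intervals. The function names are illustrative.

```python
from typing import Optional, Tuple


def parse_gtid_range(gtid_set: str) -> Tuple[str, int, int]:
    """Parse a single-interval GTID set such as 'uuid:1-210949'."""
    uuid, interval = gtid_set.rsplit(":", 1)
    first, _, last = interval.partition("-")
    # A set like 'uuid:5' has no upper bound, so first == last.
    return uuid, int(first), int(last or first)


def missing_transactions(latest: str, oldest: str) -> Optional[Tuple[int, int]]:
    """Return the transaction range the oldest slave lacks, or None."""
    l_uuid, l_first, l_last = parse_gtid_range(latest)
    o_uuid, o_first, o_last = parse_gtid_range(oldest)
    if l_uuid != o_uuid or l_first != o_first:
        raise ValueError("sketch only compares sets with the same UUID and start")
    if o_last >= l_last:
        return None
    return (o_last + 1, l_last)


# Retrieved Gtid Set values copied from the log above.
latest_set = "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949"  # host_2
oldest_set = "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-1925"    # host_3
gap = missing_transactions(latest_set, oldest_set)
print("host_3 still needs transactions:", gap)
```

Because the new master host_2 has these transactions in its own binary log (hence the binlog and log_slave_updates requirement), host_3 can pull them after CHANGE MASTER with MASTER_AUTO_POSITION=1, and no relay log files need to be copied between slaves.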

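4.18 [Test case] In a replication group where MHA was not enabled from the start, the master goes down. How can the logs be compensated before running CHANGE MASTER?

* If the master goes down and no HA is in place, how do you compensate the logs?

1. It is actually simple, as long as relay-log-purge=0 was already set on every slave beforehand.

2. Then deploy the manager and node yourself, following the normal procedure.

3. Finally, run a manual, automated (non-interactive) failover.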
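
Step 3 could look roughly like the following, assuming that "manual, automated failover" means starting masterha_master_switch by hand in non-interactive mode. The helper name and the default port are illustrative; /etc/app1.cnf and host_1 are the values that appear in the logs above. The sketch only builds the command line and does not execute it.

```python
from typing import List


def build_dead_master_switch(conf: str,
                             dead_host: str,
                             dead_port: int = 3306,
                             interactive: bool = False) -> List[str]:
    """Build the argument list for a manual failover of a dead master."""
    return [
        "masterha_master_switch",
        "--master_state=dead",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={dead_port}",
        # interactive=0 skips the confirmation prompts.
        f"--interactive={1 if interactive else 0}",
    ]


argv = build_dead_master_switch("/etc/app1.cnf", "host_1")
print(" ".join(argv))
```

Run the resulting command on the manager node once the node package is installed on every server and SSH between the hosts works. Whether the differential logs can actually be recovered still depends on the relay logs having been kept, as item 1 points out.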
