What happens if two Redis instances SLAVEOF each other?
Posted by: site user
Posted: 2022-04-19 09:50
1 answer
Helpful user
Answered: 2022-04-11 15:34
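Today I was setting up Redis Sentinel to monitor some Redis servers, and along the way I wondered what would happen if two Redis instances each ran SLAVEOF against the other. Here is the experiment:
Two Redis instances: redis1 is configured as the master, and redis2 is configured as its slave with slaveof redis1.
Start redis1 and redis2.
Once both are up and redis2 has successfully become a slave of redis1, connect to redis1 with redis-cli and run the following command to make redis1 a slave of redis2: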
slaveof [redis2 IP] [redis2 port]
The result: both instances keep logging that SYNC failed, over and over. Clearly, two Redis instances cannot be each other's slave.
Log from redis1 after running slaveof:
[14793] 06 Sep 17:36:20.426 * SLAVE OF 10.18.129.49:9778 enabled (user request)
[14793] 06 Sep 17:36:20.636 - Accepted 10.18.129.49:44277
[14793] 06 Sep 17:36:20.637 - Client closed connection
[14793] 06 Sep 17:36:20.804 * Connecting to MASTER...
[14793] 06 Sep 17:36:20.804 * MASTER <-> SLAVE sync started
[14793] 06 Sep 17:36:20.804 * Non blocking connect for SYNC fired the event.
[14793] 06 Sep 17:36:20.804 * Master replied to PING, replication can continue...
[14793] 06 Sep 17:36:20.804 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14793] 06 Sep 17:36:21.636 - Accepted 10.18.129.49:44279
[14793] 06 Sep 17:36:21.637 - Client closed connection
[14793] 06 Sep 17:36:21.804 * Connecting to MASTER...
[14793] 06 Sep 17:36:21.804 * MASTER <-> SLAVE sync started
[14793] 06 Sep 17:36:21.804 * Non blocking connect for SYNC fired the event.
[14793] 06 Sep 17:36:21.804 * Master replied to PING, replication can continue...
[14793] 06 Sep 17:36:21.804 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14793] 06 Sep 17:36:22.636 - Accepted 10.18.129.49:44281
[14793] 06 Sep 17:36:22.637 - Client closed connection
[14793] 06 Sep 17:36:22.804 * Connecting to MASTER...
[14793] 06 Sep 17:36:22.804 * MASTER <-> SLAVE sync started
[14793] 06 Sep 17:36:22.804 * Non blocking connect for SYNC fired the event.
[14793] 06 Sep 17:36:22.804 * Master replied to PING, replication can continue..
Log from redis2:
[14796] 06 Sep 17:36:20.426 - Client closed connection
[14796] 06 Sep 17:36:20.636 * Connecting to MASTER...
[14796] 06 Sep 17:36:20.636 * MASTER <-> SLAVE sync started
[14796] 06 Sep 17:36:20.636 * Non blocking connect for SYNC fired the event.
[14796] 06 Sep 17:36:20.636 * Master replied to PING, replication can continue...
[14796] 06 Sep 17:36:20.636 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14796] 06 Sep 17:36:20.804 - Accepted 10.18.129.49:51034
[14796] 06 Sep 17:36:20.805 - Client closed connection
[14796] 06 Sep 17:36:21.636 * Connecting to MASTER...
[14796] 06 Sep 17:36:21.636 * MASTER <-> SLAVE sync started
[14796] 06 Sep 17:36:21.636 * Non blocking connect for SYNC fired the event.
[14796] 06 Sep 17:36:21.636 * Master replied to PING, replication can continue...
[14796] 06 Sep 17:36:21.637 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14796] 06 Sep 17:36:21.804 - Accepted 10.18.129.49:51036
[14796] 06 Sep 17:36:21.805 - Client closed connection
[14796] 06 Sep 17:36:22.636 - DB 0: 20 keys (0 volatile) in 32 slots HT.
[14796] 06 Sep 17:36:22.636 - 0 clients connected (0 slaves), 801176 bytes in use
[14796] 06 Sep 17:36:22.636 * Connecting to MASTER...
[14796] 06 Sep 17:36:22.636 * MASTER <-> SLAVE sync started
[14796] 06 Sep 17:36:22.636 * Non blocking connect for SYNC fired the event.
[14796] 06 Sep 17:36:22.636 * Master replied to PING, replication can continue..
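Both instances are now stuck in an endless loop of failed SYNC attempts.
My first question was: why does redis2, the original slave, issue SYNC again?
The first line of the redis2 log above shows that the original master-slave connection was closed.
I read replication.c, the source file that implements replication setup. Below is the code redis1 runs for the slaveof command. Partway through, it calls disconnectSlaves(), which closes the existing master-slave connection: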
void slaveofCommand(redisClient *c) {
    if (!strcasecmp(c->argv[1]->ptr,"no") &&
        !strcasecmp(c->argv[2]->ptr,"one")) {
        /* omitted */
    } else {
        /* omitted */
        /* There was no previous master or the user specified a different one,
         * we can continue. */
        sdsfree(server.masterhost);
        server.masterhost = sdsdup(c->argv[1]->ptr);
        server.masterport = port;
        if (server.master) freeClient(server.master);
        disconnectSlaves(); /* Force our slaves to resync with us as well. */
        cancelReplicationHandshake();
        server.repl_state = REDIS_REPL_CONNECT;
        redisLog(REDIS_NOTICE,"SLAVE OF %s:%d enabled (user request)",
            server.masterhost, server.masterport);
    }
    addReply(c,shared.ok);
}
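The comment next to disconnectSlaves() reads: Force our slaves to resync with us as well. In other words, redis1 first drops its slaves (redis2), and they are expected to resync from redis1 once redis1 has finished syncing from its own master. That is why the link between redis2 and redis1 was closed. Because redis2 started out as a slave, it keeps trying to reconnect to its master and issue SYNC until it succeeds.
That explains why redis2 also issues SYNC. The second question is why SYNC keeps failing on both instances, and the cause is closely related. Both logs show the same error: ERR Can't SYNC while not connected with my master. In the code, this message comes from: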
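The effect of that call can be illustrated with a small toy model. This is only a sketch and not Redis code: the names toy_node, toy_state, toy_disconnect_slaves and toy_slaveof are hypothetical, and the model keeps nothing but a master pointer and a link state per node.

```c
#include <assert.h>
#include <stddef.h>

/* Toy link states, loosely named after the REDIS_REPL_* constants.
 * Hypothetical names; this is not the Redis implementation. */
enum toy_state { TOY_REPL_NONE, TOY_REPL_CONNECT, TOY_REPL_CONNECTED };

struct toy_node {
    struct toy_node *master; /* NULL: this node is not a slave */
    enum toy_state state;    /* state of the link to `master` */
};

/* Stand-in for disconnectSlaves(): every node replicating from `n`
 * loses its link and has to start the handshake again. */
void toy_disconnect_slaves(struct toy_node *n,
                           struct toy_node **all, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (all[i]->master == n) all[i]->state = TOY_REPL_CONNECT;
    }
}

/* Stand-in for the else-branch of slaveofCommand(): switch master,
 * drop our own slaves, then mark our own link as not yet established. */
void toy_slaveof(struct toy_node *n, struct toy_node *new_master,
                 struct toy_node **all, size_t count) {
    assert(n != NULL && new_master != NULL && n != new_master);
    n->master = new_master;
    toy_disconnect_slaves(n, all, count);
    n->state = TOY_REPL_CONNECT;
}
```

If redis2 is modeled as a connected slave of redis1 and toy_slaveof is then called on redis1 with redis2 as the new master, both nodes end up in TOY_REPL_CONNECT. That matches the logs, where both instances start reconnecting at the same moment.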
void syncCommand(redisClient *c) {
    /* ignore SYNC if already slave or in monitor mode */
    if (c->flags & REDIS_SLAVE) return;

    /* Refuse SYNC requests if we are a slave but the link with our master
     * is not ok... */
    if (server.masterhost && server.repl_state != REDIS_REPL_CONNECTED) {
        addReplyError(c,"Can't SYNC while not connected with my master");
        return;
    }

    /* SYNC can't be issued when the server has pending data to send to
     * the client about already issued commands. We need a fresh reply
     * buffer registering the differences between the BGSAVE and the current
     * dataset, so that we can copy to other slaves if needed. */
    if (listLength(c->reply) != 0) {
        addReplyError(c,"SYNC is invalid with pending input");
        return;
    }
    /* omitted */
}
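syncCommand is the function a Redis instance runs, in its role as master, when a slave sends it SYNC. Note the comment "Refuse SYNC requests if we are a slave but the link with our master is not ok...". When redis1 receives SYNC from redis2, it runs syncCommand and evaluates if (server.masterhost && server.repl_state != REDIS_REPL_CONNECTED). At that point redis1 has just been made a slave of another master (redis2) but has not finished syncing. redis1 needs its own SYNC to redis2 to succeed, while redis2 will only serve that SYNC once its own SYNC to redis1 has succeeded, so the two wait on each other in a deadlock. The condition is therefore true, and redis1 replies with the error Can't SYNC while not connected with my master. The same holds for redis2, so both sides stay stuck in that state.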
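The circular wait can be reproduced in a few lines of C. The following is a simplified sketch, not the Redis implementation: toy_node, toy_refuses_sync, toy_try_sync and toy_rounds_until_synced are hypothetical names, and the only thing taken from the source is the condition of the guard in syncCommand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy link states, loosely named after the REDIS_REPL_* constants.
 * Hypothetical names; this is not the Redis implementation. */
enum toy_state { TOY_REPL_NONE, TOY_REPL_CONNECT, TOY_REPL_CONNECTED };

struct toy_node {
    struct toy_node *master; /* NULL: this node is not a slave */
    enum toy_state state;    /* state of the link to `master` */
};

/* Same condition as the guard in syncCommand(): a node that is itself
 * a slave refuses SYNC until its own link to its master is established. */
bool toy_refuses_sync(const struct toy_node *n) {
    return n->master != NULL && n->state != TOY_REPL_CONNECTED;
}

/* One reconnect attempt by a slave. The link becomes established only
 * if the master accepts the SYNC request. */
bool toy_try_sync(struct toy_node *slave) {
    assert(slave != NULL && slave->master != NULL);
    if (slave->state == TOY_REPL_CONNECTED) return true;
    if (toy_refuses_sync(slave->master)) return false;
    slave->state = TOY_REPL_CONNECTED;
    return true;
}

/* Both slaves retry once per round, `a` first. Returns the round in which
 * both links are established, or -1 if that never happens within
 * `max_rounds`. */
int toy_rounds_until_synced(struct toy_node *a, struct toy_node *b,
                            int max_rounds) {
    for (int round = 1; round <= max_rounds; round++) {
        bool a_ok = toy_try_sync(a);
        bool b_ok = toy_try_sync(b);
        if (a_ok && b_ok) return round;
    }
    return -1;
}
```

With two nodes that name each other as master and both start in TOY_REPL_CONNECT, toy_rounds_until_synced returns -1 no matter how many rounds are allowed, because each node's refusal depends on the other's state. For comparison, in a chain where s1 replicates from a plain master and s2 replicates from s1, the same loop converges: s1 connects first, and s2 connects on the following attempt.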