What happens when two Redis instances SLAVEOF each other

Posted by a user on 2022-04-19 09:50

1 answer

Answered by a helpful user on 2022-04-11 15:34

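Today, while configuring Redis Sentinel to monitor my Redis servers, I wondered what would happen if two Redis instances were each made a slave of the other. Here is the experiment:
Take two Redis instances: redis1 configured as the master, and redis2 configured as a slave with slaveof redis1.
Start redis1 and redis2.
Once both are up and redis2 has successfully become a slave of redis1, connect to redis1 with redis-cli and run the following command to make redis1 a slave of redis2:
slaveof [redis2 IP] [redis2 port]
The result: both instances keep logging that the SYNC command failed, so two Redis instances clearly cannot be slaves of each other.
Log of redis1 after running slaveof: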

[14793] 06 Sep 17:36:20.426 * SLAVE OF 10.18.129.49:9778 enabled (user request)
[14793] 06 Sep 17:36:20.636 - Accepted 10.18.129.49:44277
[14793] 06 Sep 17:36:20.637 - Client closed connection
[14793] 06 Sep 17:36:20.804 * Connecting to MASTER...
[14793] 06 Sep 17:36:20.804 * MASTER <-> SLAVE sync started
[14793] 06 Sep 17:36:20.804 * Non blocking connect for SYNC fired the event.
[14793] 06 Sep 17:36:20.804 * Master replied to PING, replication can continue...
[14793] 06 Sep 17:36:20.804 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14793] 06 Sep 17:36:21.636 - Accepted 10.18.129.49:44279
[14793] 06 Sep 17:36:21.637 - Client closed connection
[14793] 06 Sep 17:36:21.804 * Connecting to MASTER...
[14793] 06 Sep 17:36:21.804 * MASTER <-> SLAVE sync started
[14793] 06 Sep 17:36:21.804 * Non blocking connect for SYNC fired the event.
[14793] 06 Sep 17:36:21.804 * Master replied to PING, replication can continue...
[14793] 06 Sep 17:36:21.804 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14793] 06 Sep 17:36:22.636 - Accepted 10.18.129.49:44281
[14793] 06 Sep 17:36:22.637 - Client closed connection
[14793] 06 Sep 17:36:22.804 * Connecting to MASTER...
[14793] 06 Sep 17:36:22.804 * MASTER <-> SLAVE sync started
[14793] 06 Sep 17:36:22.804 * Non blocking connect for SYNC fired the event.
[14793] 06 Sep 17:36:22.804 * Master replied to PING, replication can continue..

Log of redis2:

[14796] 06 Sep 17:36:20.426 - Client closed connection
[14796] 06 Sep 17:36:20.636 * Connecting to MASTER...
[14796] 06 Sep 17:36:20.636 * MASTER <-> SLAVE sync started
[14796] 06 Sep 17:36:20.636 * Non blocking connect for SYNC fired the event.
[14796] 06 Sep 17:36:20.636 * Master replied to PING, replication can continue...
[14796] 06 Sep 17:36:20.636 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14796] 06 Sep 17:36:20.804 - Accepted 10.18.129.49:51034
[14796] 06 Sep 17:36:20.805 - Client closed connection
[14796] 06 Sep 17:36:21.636 * Connecting to MASTER...
[14796] 06 Sep 17:36:21.636 * MASTER <-> SLAVE sync started
[14796] 06 Sep 17:36:21.636 * Non blocking connect for SYNC fired the event.
[14796] 06 Sep 17:36:21.636 * Master replied to PING, replication can continue...
[14796] 06 Sep 17:36:21.637 # MASTER aborted replication with an error: ERR Can't SYNC while not connected with my master
[14796] 06 Sep 17:36:21.804 - Accepted 10.18.129.49:51036
[14796] 06 Sep 17:36:21.805 - Client closed connection
[14796] 06 Sep 17:36:22.636 - DB 0: 20 keys (0 volatile) in 32 slots HT.
[14796] 06 Sep 17:36:22.636 - 0 clients connected (0 slaves), 801176 bytes in use
[14796] 06 Sep 17:36:22.636 * Connecting to MASTER...
[14796] 06 Sep 17:36:22.636 * MASTER <-> SLAVE sync started
[14796] 06 Sep 17:36:22.636 * Non blocking connect for SYNC fired the event.
[14796] 06 Sep 17:36:22.636 * Master replied to PING, replication can continue..

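Both instances thus end up stuck in an endless loop of failed SYNC attempts.
My first question was: why does redis2, which was already a slave, issue SYNC again?
The first line of redis2's log above ("Client closed connection") shows that the original master-slave connection was dropped.
I read replication.c, the source file that implements master-slave setup. Below is the code redis1 runs for the slaveof command; partway through, it calls disconnectSlaves(), which drops the existing master-slave connection: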

void slaveofCommand(redisClient *c) {
    if (!strcasecmp(c->argv[1]->ptr,"no") &&
        !strcasecmp(c->argv[2]->ptr,"one")) {
        /* omitted */
    } else {
        /* omitted */
        /* There was no previous master or the user specified a different one,
         * we can continue. */
        sdsfree(server.masterhost);
        server.masterhost = sdsdup(c->argv[1]->ptr);
        server.masterport = port;
        if (server.master) freeClient(server.master);
        disconnectSlaves(); /* Force our slaves to resync with us as well. */
        cancelReplicationHandshake();
        server.repl_state = REDIS_REPL_CONNECT;
        redisLog(REDIS_NOTICE,"SLAVE OF %s:%d enabled (user request)",
            server.masterhost, server.masterport);
    }
    addReply(c,shared.ok);
}

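The comment next to disconnectSlaves() reads: Force our slaves to resync with us as well. In other words: you (redis2) get disconnected first, and once I (redis1) have finished syncing from my own master, you can come back and sync from me. This breaks the connection between redis2 and redis1. Because redis2 was configured as a slave from the start, it keeps trying to reconnect to its master and issue SYNC until it succeeds.
That explains why redis2 issues SYNC as well. The second question is why SYNC keeps failing on both instances, and the cause is closely related to the first. Both logs show the same error: ERR Can't SYNC while not connected with my master. In the code, this message comes from: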

void syncCommand(redisClient *c) {
    /* ignore SYNC if already slave or in monitor mode */
    if (c->flags & REDIS_SLAVE) return;

    /* Refuse SYNC requests if we are a slave but the link with our master
     * is not ok... */
    if (server.masterhost && server.repl_state != REDIS_REPL_CONNECTED) {
        addReplyError(c,"Can't SYNC while not connected with my master");
        return;
    }

    /* SYNC can't be issued when the server has pending data to send to
     * the client about already issued commands. We need a fresh reply
     * buffer registering the differences between the BGSAVE and the current
     * dataset, so that we can copy to other slaves if needed. */
    if (listLength(c->reply) != 0) {
        addReplyError(c,"SYNC is invalid with pending input");
        return;
    }
    /* omitted */
}

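syncCommand is what a Redis master runs when it receives the SYNC command from a slave. Note the comment in the code above: "Refuse SYNC requests if we are a slave but the link with our master is not ok...". When redis1, acting as master, receives SYNC from its slave, it runs syncCommand and reaches the check if (server.masterhost && server.repl_state != REDIS_REPL_CONNECTED). At that moment redis1 has just been made a slave of another master (redis2) but has not finished syncing with it. To finish, redis1 must send SYNC to redis2 and get a successful reply, yet redis2 can only serve that request after redis1 has served redis2's own SYNC. This circular dependency is a deadlock. The condition is therefore true, and redis1 replies directly with the error Can't SYNC while not connected with my master. The same holds for redis2, so both sides stay stuck in the Can't SYNC while not connected with my master state.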
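
The circular refusal described above can be reproduced in a small toy model. This is only a sketch under stated assumptions: struct node, accepts_sync, try_sync, slave_of and successful_syncs are hypothetical names invented for illustration and are not Redis internals. The model keeps just two things from the real code: the guard from syncCommand, and the fact that SLAVEOF pushes existing slaves back into the reconnecting state, as disconnectSlaves() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a Redis internal. */
enum repl_state { REPL_NONE, REPL_CONNECT, REPL_CONNECTED };

struct node {
    struct node *master;   /* NULL for a plain master (stands in for server.masterhost) */
    enum repl_state state; /* stands in for server.repl_state */
};

/* Same shape as the guard in syncCommand: a node that is itself a slave
 * and whose own link is not up refuses SYNC from its slaves. */
bool accepts_sync(const struct node *n) {
    return !(n->master != NULL && n->state != REPL_CONNECTED);
}

/* One reconnect attempt by a slave; returns true if its link is up afterwards. */
bool try_sync(struct node *slave) {
    if (slave->master == NULL) return false;          /* not a slave at all */
    if (slave->state == REPL_CONNECTED) return true;  /* already in sync */
    if (!accepts_sync(slave->master)) return false;   /* "Can't SYNC while not connected..." */
    slave->state = REPL_CONNECTED;
    return true;
}

/* Models SLAVEOF on n: point it at new_master and, like disconnectSlaves(),
 * push every node replicating from n back to the reconnecting state. */
void slave_of(struct node *all, size_t count,
              struct node *n, struct node *new_master) {
    assert(n != new_master); /* toy-model precondition: no self-replication */
    n->master = new_master;
    n->state = REPL_CONNECT;
    for (size_t i = 0; i < count; i++) {
        if (all[i].master == n) all[i].state = REPL_CONNECT;
    }
}

/* Lets both nodes retry for a number of rounds and counts successful SYNCs. */
int successful_syncs(struct node *a, struct node *b, int rounds) {
    int ok = 0;
    for (int i = 0; i < rounds; i++) {
        if (try_sync(a)) ok++;
        if (try_sync(b)) ok++;
    }
    return ok;
}
```

In this model, once redis2 replicates from redis1 and slave_of is then called to make redis1 a slave of redis2, successful_syncs returns 0 for any number of rounds, which matches the repeating error in both logs. Resetting one node's master to NULL (the model's counterpart of SLAVEOF NO ONE) breaks the cycle, and the other node's next try_sync succeeds.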
