Redis Sentinel Communication Flow

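Preparation:

1. Configure the master and slave nodes, then edit the Sentinel configuration as follows:

sentinel.conf
sentinel monitor <Master1> <host> <port> <quorum>
sentinel monitor <Master2> <host> <port> <quorum>

2. Start the master / slave nodes first, then start the sentinel nodes.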
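To make the shape of the monitor directive concrete, here is a minimal Python sketch that reads directives of the form `sentinel monitor <name> <host> <port> <quorum>` out of configuration text. The `MonitorDirective` type, the `parse_monitor_lines` helper, and the sample addresses are made up for illustration; Sentinel itself parses its configuration in C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorDirective:
    master_name: str
    host: str
    port: int
    quorum: int


def parse_monitor_lines(conf_text: str) -> list:
    """Collect every `sentinel monitor` directive from sentinel.conf-style text."""
    directives = []
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Skip blank lines, comments, and any other directive.
        if len(parts) != 6:
            continue
        if parts[0].lower() != "sentinel" or parts[1].lower() != "monitor":
            continue
        name, host, port, quorum = parts[2:]
        directives.append(MonitorDirective(name, host, int(port), int(quorum)))
    return directives


# Hypothetical addresses and quorum values, for illustration only.
SAMPLE_CONF = """
# two monitored masters
sentinel monitor Master1 127.0.0.1 6379 2
sentinel monitor Master2 127.0.0.1 6380 2
"""

masters = parse_monitor_lines(SAMPLE_CONF)
for m in masters:
    print(m.master_name, m.host, m.port, m.quorum)
```

Only the masters appear here. Slaves and other Sentinels are discovered at runtime, as the rest of the post describes.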

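Communication flow:

A timer walks the sentinel structure level by level and repeatedly runs the main flow below. At first each Sentinel knows only its configured masters. As information spreads, the structure fills in, and the nodes end up forming a fairly complex, densely connected network. Each command below has its own period: if that period has not yet elapsed the command is skipped, and the loop moves on to the next command.

Node structure: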

sentinel
	masters
		Master1
			sentinels
				sentinel1
				sentinel2
			slaves
				slave1
				slave2
		Master2
			sentinels
				...
			slaves
				...
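The tree above can be modeled with a few nested dictionaries. The following Python sketch is only a mental model: the class names (`Instance`, `MasterEntry`, `SentinelState`), the dictionary keys, and the addresses are assumptions for illustration and do not mirror the actual C structs in Redis.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Instance:
    """One node (master, slave, or peer sentinel) as seen by this sentinel."""
    name: str
    host: str
    port: int
    run_id: str = ""


@dataclass
class MasterEntry(Instance):
    # Peers and replicas are tracked per master, as in the tree above.
    sentinels: Dict[str, Instance] = field(default_factory=dict)
    slaves: Dict[str, Instance] = field(default_factory=dict)


@dataclass
class SentinelState:
    masters: Dict[str, MasterEntry] = field(default_factory=dict)

    def walk(self):
        """Yield every known instance: each master, then its slaves and sentinels."""
        for master in self.masters.values():
            yield master
            yield from master.slaves.values()
            yield from master.sentinels.values()


state = SentinelState()
# At startup only the configured master is known.
state.masters["Master1"] = MasterEntry("Master1", "127.0.0.1", 6379)
# Slaves and peer sentinels are added later, as they are discovered.
state.masters["Master1"].slaves["127.0.0.1:6380"] = Instance("slave1", "127.0.0.1", 6380)
state.masters["Master1"].sentinels["127.0.0.1:26379"] = Instance("sentinel1", "127.0.0.1", 26379)

names = [inst.name for inst in state.walk()]
print(names)
```

The `walk` method corresponds to the level-by-level traversal that the timer performs on every cycle.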

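Main flow:

1. sentinelReconnectInstance: establish connections
connect - Commands connection (a regular connection, used afterwards to send PING, INFO, PUBLISH, and similar commands)
connect - Pub/Sub connection (a subscription connection, used to keep receiving messages sent by other nodes)
subscribe (send the SUBSCRIBE command, then wait for messages)

2. sentinelSendPeriodicCommands: send periodic commands
send info - (send INFO to Master/Slave to learn the master-slave topology and server health); normally runs every 10s
send ping - sentinelSendPing (send PING to Sentinel/Master/Slave as a heartbeat; whether the reply arrives in time decides whether the node is considered healthy, which drives failure detection); normally runs every 1s
send hello - sentinelSendHello (send PUBLISH to Sentinel/Master/Slave; because all Sentinels of a master subscribe to the same channel on it, they can discover and connect to each other); normally runs every 2s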
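The period checks in step 2 can be sketched as a small scheduler. This is a simplified model driven by a simulated clock, not the Redis source: the `PeriodicCommands` class and its `tick` method are hypothetical, and the periods are just the normal-case values quoted above.

```python
INFO_PERIOD = 10.0   # seconds, INFO goes to masters and slaves only
PING_PERIOD = 1.0    # seconds, PING goes to every kind of instance
HELLO_PERIOD = 2.0   # seconds, PUBLISH (hello) goes to every kind of instance


class PeriodicCommands:
    """Tracks, for one link, when each periodic command was last sent."""

    def __init__(self, accepts_info: bool):
        self.accepts_info = accepts_info  # False for links to peer sentinels
        self.last_sent = {"INFO": None, "PING": None, "PUBLISH": None}

    def _due(self, command: str, period: float, now: float) -> bool:
        last = self.last_sent[command]
        return last is None or now - last >= period

    def tick(self, now: float) -> list:
        """Return the commands to send at time `now` (in seconds)."""
        sent = []
        checks = [("INFO", INFO_PERIOD), ("PING", PING_PERIOD), ("PUBLISH", HELLO_PERIOD)]
        for command, period in checks:
            if command == "INFO" and not self.accepts_info:
                continue
            # A command whose period has not elapsed is skipped,
            # and the loop moves on to the next one.
            if self._due(command, period, now):
                self.last_sent[command] = now
                sent.append(command)
        return sent


master_link = PeriodicCommands(accepts_info=True)
timeline = {t: master_link.tick(float(t)) for t in range(4)}
for t, commands in timeline.items():
    print(t, commands)
```

On the first tick everything is due. After that, PING fires every second, PUBLISH every other second, and INFO not again until ten seconds have passed.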

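Detailed flow: (the ping process is omitted; the focus here is only on how information propagates and spreads between nodes)

1. slaveof: the master-slave link is established.
2. The sentinel sends SUBSCRIBE to the Master to subscribe to its Hello Channel (SUBSCRIBE __sentinel__:hello), then waits for messages.
3. The sentinel sends INFO to the Master to get server information and discover the master-slave topology (this is how it learns about the Slaves).
4. The sentinel sends SUBSCRIBE to the Master/Slave nodes, as in step 2 (this time the Slave nodes are included). Connections that are already healthy are not re-established, so in the normal case this step only connects to newly added Slave nodes.
5. The sentinel sends INFO to the Master/Slave nodes, as in step 3.
6. The sentinel sends PUBLISH to the Master/Slave nodes to spread its message (information about the current Sentinel and its Master). The sentinel also receives the messages on the Hello Channel of the Master/Slave nodes, learns about the other Sentinels from them, and stores them in the sentinels list under the corresponding Master. At this point the node data in the sentinel structure is essentially complete.
7. The sentinel sends PUBLISH to the Master/Slave/Sentinel nodes to spread its message.

After several rounds of message spreading, the sentinels list under each master gradually grows. Eventually every sentinel holds information about all Sentinel/Master/Slave nodes, and keeps running heartbeat checks and synchronizing information.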
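The discovery part of this flow (steps 2 and 6) can be imitated in memory. In the sketch below, `FakeRedisNode` and `FakeSentinel` are hypothetical stand-ins, and the hello message is reduced to two fields for brevity. It shows only that publishing to a shared channel on a common master is enough for Sentinels to learn about each other.

```python
from collections import defaultdict

HELLO_CHANNEL = "__sentinel__:hello"


class FakeRedisNode:
    """In-memory stand-in for a master or slave: it only relays Pub/Sub messages."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


class FakeSentinel:
    def __init__(self, name, master_name):
        self.name = name
        self.master_name = master_name
        self.known_sentinels = defaultdict(set)  # master name -> peer names

    def on_hello(self, message):
        # Simplified two-field payload: "<sentinel name>,<master name>".
        sender, master_name = message.split(",")
        if sender != self.name:  # ignore our own hello
            self.known_sentinels[master_name].add(sender)

    def attach(self, node):
        node.subscribe(HELLO_CHANNEL, self.on_hello)

    def say_hello(self, node):
        node.publish(HELLO_CHANNEL, f"{self.name},{self.master_name}")


master = FakeRedisNode()
sentinels = [FakeSentinel(f"sentinel{i}", "Master1") for i in (1, 2, 3)]
for s in sentinels:
    s.attach(master)      # step 2: subscribe to the master's hello channel
for s in sentinels:
    s.say_hello(master)   # step 6: publish our own hello through the master

for s in sentinels:
    print(s.name, sorted(s.known_sentinels["Master1"]))
```

None of the fake sentinels is configured with another sentinel's address, yet after one round each of them knows the other two.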

The seven steps above are shown in the figure below:

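Summary

1. The cluster has three node roles: Sentinel, Master, and Slave. Master and Slave are ordinary Redis servers. Sentinel is different: clients cannot run data commands such as GET, SET, INCR, HSET, or HLEN against it. Its job is failover: when a Master fails, it picks one of that Master's Slaves and promotes it to Master. Clients learn about the event by subscribing to the Sentinel's change notifications and then connect to the new Master's IP, so that service is restored as quickly as possible and availability improves.

2. In inter-node communication, only Sentinels actively send RPC requests (to Sentinels, Masters, and Slaves), so the traffic is effectively one-way. Masters and Slaves remain ordinary Redis servers and do nothing extra. If a Sentinel sends PING, they reply PONG; if it sends INFO, they reply with the server information; if it sends SUBSCRIBE/PUBLISH, they process the command as usual. They treat a Sentinel as just another client and give it no special treatment.

3. Sentinel -> other Sentinel uses 3 kinds of RPC, and Sentinel -> Master/Slave uses 4. The Sentinel -> Sentinel RPCs are Ping/Publish/Vote. The Sentinel -> Master/Slave RPCs are Ping/Info/Subscribe/Publish.
Ping, also called the Heartbeat RPC, checks each node's liveness and latency based on how long the PONG reply takes. If no reply arrives within the configured timeout, the Sentinel flags the node as SRI_S_DOWN (subjectively down).
Info RPC is sent only to Master/Slave nodes and retrieves the following:
a) run_id: the node ID. It changes when the node restarts, in which case the value recorded in the Sentinel is updated to the new one.
b) Master/Slave information: the node's current role (Master or Slave) and, for each case, the IP/PORT of the associated Slave or Master nodes. This lets the Sentinel discover Slaves from a Master and the Master from a Slave, and also notice Master/Slave role switches. Thanks to this step, the Sentinel configuration file only needs the Master IP/PORT; the Slave nodes are found automatically. The relevant fields are: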

run_id:<40 hex chars>
master_link_down_since_seconds:<seconds> 
role:<role>

Master Node:
slave0:<ip>,<port>,<state> /* old versions*/
slave0:ip=127.0.0.1,port=9999,...  /* new versions*/

Slave Node:
master_host:<host>
master_port:<port>
master_link_status:<status>
slave_priority:<priority>
slave_repl_offset:<offset>
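As an illustration of how these fields could be consumed, here is a minimal Python parser for INFO-style text. The `parse_info` helper and the sample reply are assumptions for illustration. It handles only the new-style `slaveN:ip=...,port=...` entries and ignores the old comma-only format.

```python
def parse_info(text: str) -> dict:
    """Extract the replication-related fields listed above from INFO output."""
    result = {"slaves": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.startswith("slave") and key[5:].isdigit():
            # New-style entry, e.g. slave0:ip=127.0.0.1,port=9999,...
            fields = dict(item.split("=", 1) for item in value.split(",") if "=" in item)
            if "ip" in fields and "port" in fields:
                result["slaves"].append((fields["ip"], int(fields["port"])))
        elif key in ("run_id", "role", "master_host", "master_link_status"):
            result[key] = value
        elif key in ("master_port", "slave_priority", "slave_repl_offset",
                     "master_link_down_since_seconds"):
            result[key] = int(value)
    return result


# Hypothetical INFO reply from a master with one slave.
SAMPLE_MASTER_INFO = """\
# Server
run_id:0123456789abcdef0123456789abcdef01234567
# Replication
role:master
slave0:ip=127.0.0.1,port=9999,state=online
"""

info = parse_info(SAMPLE_MASTER_INFO)
print(info["role"], info["slaves"])
```

A sentinel that knew only the master's address can now add 127.0.0.1:9999 to that master's slaves list.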

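Publish RPC sends information about the current Sentinel and the Master it monitors. It serves two purposes: (1) it advertises the current Sentinel and the Master it is connected to, so that other Sentinels discover them; (2) when the information about the current Sentinel or its Master changes, it notifies the other Sentinels, for example after a failover, when the old Master has become a Slave and one of the Slaves has become the Master.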

Message:
<ip>,<port>,<runid>,<current_epoch>
<master_name>,<master_ip>,<master_port>,<master_config_epoch>
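A small Python sketch of decoding this message follows. It assumes the two lines above are sent as a single comma-separated payload of eight fields, which is my reading of the format. The `Hello` type, the `parse_hello` helper, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    ip: str
    port: int
    runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split one hello payload into sentinel fields and master fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 comma-separated fields, got {len(parts)}")
    ip, port, runid, epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch), m_name, m_ip, int(m_port), int(m_epoch))


# Hypothetical payload; a real runid is 40 hex characters.
hello = parse_hello("127.0.0.1,26379,abc123,0,Master1,127.0.0.1,6379,0")
print(hello.runid, hello.master_name, hello.master_port)
```

The first four fields identify the sending Sentinel, and the last four describe the Master as that Sentinel currently sees it.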

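Subscribe RPC is how a Sentinel subscribes to the Hello Channel (__sentinel__:hello) of a Master/Slave node; the Publish messages from the previous step arrive through this channel. It applies only to Master/Slave nodes. Between Sentinels, the receiving Sentinel processes the PUBLISH command directly, without going through any channel.
Vote RPC is used once a Sentinel considers a Master failed, i.e. subjectively down (SRI_S_DOWN): it asks all other Sentinels whether the Master has really failed. If enough Sentinels agree to reach the configured quorum, the current Sentinel marks the Master as objectively down (SRI_O_DOWN). This is done with the SENTINEL command below, which is sent to Sentinel nodes.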

SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current-epoch> <runid>

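The command above has two uses. First, with runid set to *: while a Master is subjectively down (SRI_S_DOWN), the current Sentinel periodically sends this command to the other Sentinels and counts how many of them report SRI_MASTER_DOWN; once the count reaches the configured quorum, it marks the Master as objectively down (SRI_O_DOWN). Second, with runid set to the current Sentinel's own ID: once the Master is objectively down, the command acts as a vote request whose goal is to elect a leader to carry out the failover.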
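The first use, moving from subjectively down to objectively down, comes down to a vote count. The sketch below is a simplified model with a hypothetical `is_objectively_down` helper. It assumes the local Sentinel counts its own opinion and that the threshold is "at least quorum", which is how I read the behavior.

```python
def is_objectively_down(self_sdown: bool, peer_replies: dict, quorum: int) -> bool:
    """Decide O_DOWN from the local S_DOWN flag and the peers' replies.

    peer_replies maps a peer sentinel name to True if that peer
    also reports the master as down.
    """
    if not self_sdown:
        # Peers are only asked once the local sentinel already sees S_DOWN.
        return False
    votes = 1 + sum(1 for down in peer_replies.values() if down)  # count ourselves
    return votes >= quorum


# Hypothetical replies from two peers.
replies = {"sentinel2": True, "sentinel3": False}
print(is_objectively_down(True, replies, quorum=2))
print(is_objectively_down(True, replies, quorum=3))
```

With a quorum of 2, one agreeing peer is enough; with a quorum of 3, the same replies leave the master only subjectively down.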

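4. Each RPC has its own period and timeout. Some run one after another in the same cycle, and others wait for a later cycle, so their exact ordering is hard to state precisely.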

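Question:

1. Sentinels can send commands (PUBLISH) to each other directly, so why communicate through the Master/Slave channel at all?
When a Sentinel starts, its configuration file names no other Sentinels, so at that point each one is isolated. They cannot form a cluster and therefore cannot vote or do anything that follows. Only by all subscribing to their shared Master/Slave can the Sentinels discover each other.

If anything here is incorrect, please point it out. Thanks!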

Mar 2018

AUTHOR WiFeng

CATEGORY Redis

Comments:

sdfs

2020/04/27 16:03

Hello, when a sentinel sends messages to Master/Slave nodes, does it send the same message to every master and slave, or is sending to only some of the nodes enough?

小武

2018/07/30 15:52

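Hi, my sentinel does publish to the __sentinel__:hello channel, but it does not seem to pick up any information about the other sentinels. Does this need extra configuration? The VM running my sentinel client can connect to the sentinel servers on the other VMs, and I have not changed any other key settings. Do you know what the cause might be?
1. WiFeng
  
  2018/10/08 18:26
  
  Sorry for taking so long to see your comment. Has the problem been solved?
  Adding the monitor directive to each sentinel node's configuration file is all that is needed, as described in the preparation steps above. For easier management, the ip / port and other parameters after the monitor directive are usually identical across all sentinels.
  In your case, could it be that the sentinels are pointing at different master nodes?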
