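Redis Cluster is the official distributed solution for Redis. Through data sharding, failover, and automatic reconfiguration, it provides high availability and horizontal scalability. This article walks through the core mechanisms of Redis Cluster and then introduces the most commonly used operations commands in practical scenarios.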
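1 Redis Cluster Core Mechanisms
1.1 Data Sharding (Slots)
Redis Cluster distributes data across multiple nodes by means of hash slots. The cluster has a total of 16384 hash slots, and each master node is responsible for a subset of them.
Hash slot assignment rules:
- Each key is hashed with the CRC16 algorithm, and the result modulo 16384 gives the slot number
- Each master node in the cluster is responsible for a subset of the slots, for example:
  - Node A: slots 0-5460
  - Node B: slots 5461-10922
  - Node C: slots 10923-16383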
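The slot calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than Redis's own code: it assumes the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0) that Redis Cluster uses, it ignores hash tags, and the `NODE_RANGES` table simply mirrors the example layout above. The function names are hypothetical.

```python
# Illustrative sketch: map a key to a hash slot and to the example nodes above.
SLOT_COUNT = 16384

# Example layout from the text; a real cluster may assign slots differently.
NODE_RANGES = {
    "A": (0, 5460),
    "B": (5461, 10922),
    "C": (10923, 16383),
}


def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot = CRC16(key) mod 16384 (hash tags are not handled here)."""
    return crc16_xmodem(key.encode("utf-8")) % SLOT_COUNT


def node_for_slot(slot: int) -> str:
    """Look up which example node owns a slot."""
    for name, (first, last) in NODE_RANGES.items():
        if first <= slot <= last:
            return name
    raise ValueError(f"slot {slot} is out of range")
```

For example, `key_slot("foo")` evaluates to 12182, which falls in Node C's range. On a live cluster, `CLUSTER KEYSLOT <key>` reports the slot that Redis itself computes, which is a handy cross-check.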
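2 Failover
Redis Cluster achieves high availability through automatic failure detection and replica election.
Automatic failure detection:
- Each node periodically sends ping messages to the other nodes to check whether they are online
- If a master node does not respond within the configured timeout (cluster-node-timeout), the node that pinged it marks it as "possibly failed" (PFAIL)
- Once a majority of master nodes agree that the node is down, it is marked as failed (FAIL) and the cluster triggers a failover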
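The detection flow above can be modelled as two small functions. This is a minimal sketch under stated assumptions: `PFAIL` and `FAIL` are the flag names Redis uses for "possibly failed" and "failed", `node_timeout_ms` stands in for the `cluster-node-timeout` setting, and the function and parameter names are hypothetical. Gossip propagation and the expiry of old failure reports are not modelled.

```python
from typing import Dict


def local_state(last_pong_ms: int, now_ms: int, node_timeout_ms: int) -> str:
    """What a single observer thinks of a peer, based on its last reply."""
    return "PFAIL" if now_ms - last_pong_ms > node_timeout_ms else "OK"


def cluster_state(reports: Dict[str, str], master_count: int) -> str:
    """Promote PFAIL to FAIL once a majority of masters reports the peer as down.

    `reports` maps the ID of each reporting master to its local view of the peer.
    `master_count` is the total number of masters in the cluster.
    """
    down_votes = sum(1 for state in reports.values() if state == "PFAIL")
    majority = master_count // 2 + 1
    if down_votes >= majority:
        return "FAIL"
    return "PFAIL" if down_votes > 0 else "OK"
```

In a three-master cluster the majority is two, so both surviving masters must flag the third one before it is considered failed.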
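Replica election:
- When a master node goes down, the cluster elects a new master from among its replicas
- Election rules:
1) Replicas are ranked by replication offset: the replica with the largest offset holds the most up-to-date data and starts its election first
2) A candidate requests votes from the master nodes and is promoted once a majority of them grant their vote (the replica-priority setting, formerly slave-priority, is used by Redis Sentinel and does not take part in Redis Cluster elections)
- After the election, the cluster updates its configuration and the new master's information is broadcast to all nodes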
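To illustrate how the replication offset influences which replica is promoted: according to the Redis Cluster specification, each replica waits 500 ms, plus a random 0-500 ms, plus 1000 ms times its rank before requesting votes, where rank 0 belongs to the replica with the largest offset. The sketch below computes that ordering. The `Replica` type and function name are hypothetical, and the random jitter is passed in as a parameter so the result is reproducible.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    node_id: str
    repl_offset: int


def election_delays(replicas: List[Replica], jitter_ms: int = 0) -> List[Tuple[str, int]]:
    """Return (node_id, delay_ms) pairs, earliest candidate first.

    delay = 500 ms fixed + jitter (0-500 ms in practice) + rank * 1000 ms,
    where rank counts how many sibling replicas have a larger offset.
    """
    delays = []
    for replica in replicas:
        rank = sum(1 for other in replicas if other.repl_offset > replica.repl_offset)
        delays.append((replica.node_id, 500 + jitter_ms + rank * 1000))
    # sorted() is stable, so replicas with equal delays keep their input order.
    return sorted(delays, key=lambda pair: pair[1])
```

The delay only decides who asks first; a candidate still has to collect votes from a majority of masters before it takes over.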
3 Common Redis Cluster Operations Commands
3.1 cluster nodes
# Example
[root@node4 bin]# /usr/local/redis/bin/redis-cli -p 6379 -a lahmy1c@ cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
034005a84845645fac6834900a11254db009ccec 192.168.10.34:6380@16380 slave bc972a8cec9c52e5c4b5da01b7674edd8f9891a5 0 1742308064559 1 connected
4297823a72a4677329764bdfbdd6fdb0f1b25182 192.168.10.34:6379@16379 master - 0 1742308063743 3 connected 5461-10922
f4aebe01a277c7384356d10dd9f14381aadba730 192.168.10.33:6380@16380 slave 72a68c3bbe6f8228640afabbd0d6100f681bfb87 0 1742308065271 5 connected
5eb12144975cb46c0cee39d63407488f7bcc99c6 192.168.10.35:6380@16380 slave 4297823a72a4677329764bdfbdd6fdb0f1b25182 0 1742308064252 3 connected
72a68c3bbe6f8228640afabbd0d6100f681bfb87 192.168.10.35:6379@16379 master - 0 1742308064764 5 connected 10923-16383
bc972a8cec9c52e5c4b5da01b7674edd8f9891a5 192.168.10.33:6379@16379 myself,master - 0 1742308064000 1 connected 0-5460 [1000->-f4aebe01a277c7384356d10dd9f14381aadba730]
[root@node4 bin]#
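- Output fields: each line lists the node ID, IP:port@cluster-bus-port, flags (including the master or slave role), the node ID of its master ("-" for a master), the ping-sent and pong-received timestamps, the config epoch, the link state, and, for masters, the assigned slots; an entry such as [1000->-<node ID>] marks slot 1000 as migrating to that node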
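For scripting, the `cluster nodes` output can be turned into structured records. The following is a minimal sketch that assumes the space-separated field order shown in the sample above; the `ClusterNode` type and `parse_cluster_nodes` function are hypothetical names, and lines that do not start with a 40-character hexadecimal node ID (warnings, shell prompts) are skipped.

```python
import re
from dataclasses import dataclass, field
from typing import List

NODE_ID_RE = re.compile(r"[0-9a-f]{40}")


@dataclass
class ClusterNode:
    node_id: str
    address: str        # ip:port@cluster-bus-port
    flags: List[str]    # e.g. ["myself", "master"]
    master_id: str      # "-" for masters
    config_epoch: int
    link_state: str     # "connected" or "disconnected"
    slots: List[str] = field(default_factory=list)


def parse_cluster_nodes(text: str) -> List[ClusterNode]:
    """Parse the text printed by `redis-cli cluster nodes`."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a node line (warnings, prompts, blanks).
        if len(parts) < 8 or not NODE_ID_RE.fullmatch(parts[0]):
            continue
        nodes.append(ClusterNode(
            node_id=parts[0],
            address=parts[1],
            flags=parts[2].split(","),
            master_id=parts[3],
            config_epoch=int(parts[6]),
            link_state=parts[7],
            slots=parts[8:],   # slot ranges and migration markers, if any
        ))
    return nodes
```

A typical use is filtering for masters, for example `[n for n in nodes if "master" in n.flags]`, or flagging any node whose `link_state` is not `connected`.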
3.2 cluster info
# Example
[root@node4 bin]# /usr/local/redis/bin/redis-cli -p 6379 -a lahmy1c@ cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:20169
cluster_stats_messages_pong_sent:19908
cluster_stats_messages_fail_sent:10
cluster_stats_messages_sent:40087
cluster_stats_messages_ping_received:19903
cluster_stats_messages_pong_received:20158
cluster_stats_messages_meet_received:5
cluster_stats_messages_fail_received:11
cluster_stats_messages_auth-req_received:3
cluster_stats_messages_received:40080
[root@node4 bin]#
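- Output fields:
  - cluster_state: cluster state (ok means the cluster is healthy)
  - cluster_slots_assigned: number of slots that have been assigned
  - cluster_slots_ok: number of slots in a healthy state
  - cluster_known_nodes: number of nodes known to the cluster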
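The fields above lend themselves to a simple automated health check. The sketch below parses the `key:value` lines printed by `cluster info` and reports anything that looks wrong. The function names and the choice of checks are assumptions made for illustration, not an official health definition.

```python
from typing import Dict, List


def parse_cluster_info(text: str) -> Dict[str, str]:
    """Collect the cluster_* key:value pairs, ignoring warnings and prompts."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.startswith("cluster_"):
            info[key] = value
    return info


def health_problems(info: Dict[str, str], total_slots: int = 16384) -> List[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if info.get("cluster_state") != "ok":
        problems.append("cluster_state is not ok")
    if int(info.get("cluster_slots_assigned", 0)) != total_slots:
        problems.append("not all slots are assigned")
    if int(info.get("cluster_slots_pfail", 0)) or int(info.get("cluster_slots_fail", 0)):
        problems.append("some slots are in PFAIL or FAIL state")
    return problems
```

Fed with the sample output above, `health_problems` returns an empty list, since the state is ok and all 16384 slots are assigned with none failing.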
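3.3 cluster failover
- Purpose: manually promote a replica to master
- Use cases:
  - Triggering a failover by hand when the master needs maintenance
  - Testing the failover process
# Example
/usr/local/redis/bin/redis-cli -p 6379 -a lahmy1c@ cluster failover
- Notes:
  - The command must be run on a replica; in the sample topology above the replicas listen on port 6380, so adjust -p accordingly
  - After it runs, the replica attempts to take over as master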
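3.4 cluster addslots
# Example
/usr/local/redis/bin/redis-cli -p 6379 -a lahmy1c@ cluster addslots 0 1 2 3
- Notes:
  - The slots are assigned to the node that receives the command, which should be a master; the command fails if any of the listed slots is already assigned
  - cluster addslots only records slot ownership and does not move existing keys; moving data between nodes is a separate resharding step (for example with redis-cli --cluster reshard or --cluster rebalance)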
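When assigning slots by hand with `cluster addslots`, it helps to work out the ranges first. The sketch below splits the 16384 slots into contiguous, near-equal ranges; for three masters it happens to reproduce the 0-5460 / 5461-10922 / 10923-16383 layout used in this article. The function names are hypothetical, and the rounding scheme is one reasonable choice, not necessarily what any particular tool does.

```python
from typing import List, Tuple

SLOT_COUNT = 16384


def even_slot_ranges(master_count: int) -> List[Tuple[int, int]]:
    """Split slots 0..16383 into `master_count` contiguous (first, last) ranges."""
    if not 1 <= master_count <= SLOT_COUNT:
        raise ValueError("master_count must be between 1 and 16384")
    per_node = SLOT_COUNT / master_count
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(master_count):
        last = round(cursor + per_node - 1)
        # The final range always ends at the last slot so nothing is left over.
        if index == master_count - 1 or last > SLOT_COUNT - 1:
            last = SLOT_COUNT - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def addslots_args(first: int, last: int) -> str:
    """Build the space-separated slot list that `cluster addslots` expects."""
    return " ".join(str(slot) for slot in range(first, last + 1))
```

For instance, `addslots_args(0, 3)` yields `0 1 2 3`, matching the argument list in the example command above.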
3.5 redis-cli --cluster check
# Example (the first attempt fails authentication because the space between the password and --cluster is missing)
[root@node4 bin]# /usr/local/redis/bin/redis-cli -p 6379 -a lahmy1c@--cluster check 192.168.10.33:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
AUTH failed: WRONGPASS invalid username-password pair or user is disabled.
[root@node4 bin]# /usr/local/redis/bin/redis-cli -p 6379 -a lahmy1c@ --cluster check 192.168.10.33:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.10.33:6379 (bc972a8c...) -> 1 keys | 5461 slots | 1 slaves.
192.168.10.34:6379 (4297823a...) -> 0 keys | 5462 slots | 1 slaves.
192.168.10.35:6379 (72a68c3b...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.10.33:6379)
M: bc972a8cec9c52e5c4b5da01b7674edd8f9891a5 192.168.10.33:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 034005a84845645fac6834900a11254db009ccec 192.168.10.34:6380
   slots: (0 slots) slave
   replicates bc972a8cec9c52e5c4b5da01b7674edd8f9891a5
M: 4297823a72a4677329764bdfbdd6fdb0f1b25182 192.168.10.34:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: f4aebe01a277c7384356d10dd9f14381aadba730 192.168.10.33:6380
   slots: (0 slots) slave
   replicates 72a68c3bbe6f8228640afabbd0d6100f681bfb87
S: 5eb12144975cb46c0cee39d63407488f7bcc99c6 192.168.10.35:6380
   slots: (0 slots) slave
   replicates 4297823a72a4677329764bdfbdd6fdb0f1b25182
M: 72a68c3bbe6f8228640afabbd0d6100f681bfb87 192.168.10.35:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
[WARNING] Node 192.168.10.33:6379 has slots in migrating state 1000.
[WARNING] The following slots are open: 1000.
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@node4 bin]#
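- What the output shows:
  - Whether slot assignment is complete
  - Whether the master/replica configuration is correct
  - Whether the nodes agree on the configuration
  - Whether any slots are open; in this sample, slot 1000 on 192.168.10.33:6379 is still in migrating state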
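The per-master summary lines and the `[WARNING]` lines in the check output are easy to pick out programmatically. The sketch below assumes the exact line format shown in the sample above (including the word "slaves"; other redis-cli versions may word it differently), and the function names are hypothetical.

```python
import re
from typing import Dict, List, Union

# Matches lines such as:
# 192.168.10.33:6379 (bc972a8c...) -> 1 keys | 5461 slots | 1 slaves.
SUMMARY_RE = re.compile(
    r"^(?P<addr>\S+) \((?P<id>[0-9a-f]+)\.\.\.\) -> "
    r"(?P<keys>\d+) keys \| (?P<slots>\d+) slots \| (?P<replicas>\d+) slaves\.$"
)


def parse_check_summary(text: str) -> List[Dict[str, Union[str, int]]]:
    """Extract one record per master from the summary at the top of the output."""
    rows = []
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            rows.append({
                "addr": match["addr"],
                "keys": int(match["keys"]),
                "slots": int(match["slots"]),
                "replicas": int(match["replicas"]),
            })
    return rows


def check_warnings(text: str) -> List[str]:
    """Return the text of every [WARNING] line."""
    prefix = "[WARNING]"
    return [line.strip()[len(prefix):].strip()
            for line in text.splitlines()
            if line.strip().startswith(prefix)]
```

Summing the `slots` values of the three records from the sample gives 5461 + 5462 + 5461 = 16384, which agrees with the "All 16384 slots covered" line, while `check_warnings` surfaces the two messages about slot 1000.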
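4 Summary
Redis Cluster provides high availability and horizontal scalability through data sharding and failover. Understanding its core mechanisms and the common operations commands makes a Redis cluster easier to manage and maintain. In day-to-day operations, check the cluster state regularly, handle failures promptly, and adjust the cluster size to match business needs.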