Redis: Persistence

- Persistence
- RDB
  - dump.rdb
  - Pros and cons
- AOF
  - File synchronization
  - Rewrite mechanism
- Hybrid persistence

Persistence

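Although Redis is an in-memory database, it can also persist data. If the system crashes, the Redis process is forcibly terminated and everything in memory is lost. To be able to restore the data on the next startup, Redis keeps a backup copy on disk. In other words, the purpose of Redis persistence is not to store more data, but to recover data.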

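Redis provides two persistence mechanisms, RDB and AOF. Roughly speaking, they differ as follows:

RDB: periodically syncs the in-memory data to disk
AOF: whenever data in memory is modified, the change is synced to disk right away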

This post explains Redis persistence through these two mechanisms.

RDB

RDB stands for Redis DataBase. It periodically saves a copy of all the data in Redis; this copy is called a snapshot and is stored on disk.

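Besides waiting for a configured interval so that snapshots are generated automatically, RDB also lets you generate a snapshot manually with a command: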

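save: Redis generates a snapshot immediately, but because Redis uses a single-threaded model, all other operations are blocked until it finishes
bgsave: Redis generates a snapshot immediately, but in the background, so other operations are not affected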

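In practice, bgsave is generally used instead of save. This raises a question: if Redis is single-threaded, how can it generate a snapshot in the background without affecting other operations?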

The answer is that Redis uses multiple processes here. The flow is as follows:

[Figure: bgsave flow]

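When a user runs bgsave, the parent process calls fork to create a child process. The child then generates the snapshot while the parent keeps responding to other commands. When the child finishes persisting, it notifies the parent with a signal. If the parent receives another bgsave command before that signal arrives, it does nothing and returns immediately, because a child is already persisting the data. Multiple bgsave commands therefore result in only one persistence run.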
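The "only one bgsave at a time" rule can be sketched as a tiny state machine. This is a minimal Python model, not Redis's C code: the class BgsaveState, its start_child callback and the returned strings are hypothetical names chosen for illustration, and the real fork is replaced by a stub that just returns a fake pid.

```python
class BgsaveState:
    """Hypothetical model of tracking a background save (not Redis internals)."""

    def __init__(self, start_child):
        # start_child: callable that would fork the snapshot child and return its pid
        self.start_child = start_child
        self.child_pid = None  # pid of the running snapshot child, if any

    def bgsave(self):
        if self.child_pid is not None:
            # A child is already writing the snapshot: do nothing and return.
            return "Background save already in progress"
        self.child_pid = self.start_child()
        return "Background saving started"

    def on_child_done(self, pid):
        # Called when the parent learns that the snapshot child has finished.
        if pid == self.child_pid:
            self.child_pid = None


fake_pids = iter([101, 102])  # stand-in for real fork() results
state = BgsaveState(lambda: next(fake_pids))
print(state.bgsave())  # Background saving started
print(state.bgsave())  # Background save already in progress
state.on_child_done(101)
print(state.bgsave())  # Background saving started
```

Keeping a single child pid is what makes repeated bgsave requests collapse into one persistence run.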

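fork here is a system call on Linux (and other Unix-like systems). Redis targets Linux and other Unix-like systems and has no official Windows support, so it can use the system call directly. fork creates an identical copy of the parent process: its address space, page tables, PCB and so on are all duplicated. This includes the data in memory, so the child can see all of Redis's existing data and generate a snapshot from it.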

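But wouldn't copying every variable waste a lot of memory? In fact, fork uses copy-on-write: right after the fork, parent and child share the same physical memory, and a page is copied only when one of them writes to it. Since the child only reads the data to build the snapshot, relatively little memory is actually copied, yet the child still sees the parent's data as of the moment of the fork and can generate the snapshot from it.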
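The snapshot semantics that copy-on-write gives the child can be observed from Python with os.fork, which wraps the same fork system call (Unix only). In this hypothetical sketch the child serializes a dict to a JSON file standing in for dump.rdb; because the child's memory is separated from the parent's at the moment of the fork, the parent's later write does not show up in the snapshot. The function name fork_snapshot and the JSON format are illustrative assumptions.

```python
import json
import os
import tempfile


def fork_snapshot(data, path):
    """Fork a child that writes `data` to `path` as JSON; return the child's pid."""
    pid = os.fork()
    if pid == 0:  # in the child
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)  # sees `data` as it was at fork time
            status = 0
        finally:
            os._exit(status)  # leave immediately, skipping the parent's cleanup
    return pid  # in the parent: return at once and keep serving


data = {"key1": "111"}
path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
pid = fork_snapshot(data, path)
data["key1"] = "222"  # the parent writes after the fork
os.waitpid(pid, 0)    # this demo blocks; a server would keep handling requests
with open(path) as f:
    print(json.load(f))  # {'key1': '111'}
print(data)              # {'key1': '222'}
```

One caveat: in CPython even reading an object updates its reference count, so more pages get copied than in a C program like Redis; the visible semantics are the same.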

dump.rdb

So where is the RDB file?

Open the Redis configuration file, /etc/redis/redis.conf, and you will find the following:

# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
#
# The Append Only File will also be created inside this directory.
#
# Note that you must specify a directory here, not a file name.
dir /var/lib/redis

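The path after dir is where the RDB file is stored. In that directory there is a file named dump.rdb, which is the generated snapshot. The file is binary, so viewing it directly shows gibberish.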

Moreover, the data is not simply dumped as raw binary; it is also compressed to improve storage efficiency.
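Even though the file is binary, its first bytes are readable: an RDB file starts with the magic string REDIS followed by a four-digit format version number. A small sketch that checks this header is below; the helper name rdb_version is hypothetical, and the demo writes a fake header instead of reading the real /var/lib/redis/dump.rdb, which usually requires the redis user's permissions.

```python
import os
import tempfile


def rdb_version(path):
    """Return the RDB format version from the file header, e.g. b'REDIS0011' -> 11."""
    with open(path, "rb") as f:
        header = f.read(9)  # 5-byte magic + 4 ASCII digits
    if len(header) != 9 or header[:5] != b"REDIS" or not header[5:].isdigit():
        raise ValueError("not an RDB file")
    return int(header[5:])


demo = os.path.join(tempfile.mkdtemp(), "dump.rdb")
with open(demo, "wb") as f:
    f.write(b"REDIS0011" + b"\xfa\x09")  # header followed by arbitrary binary bytes
print(rdb_version(demo))  # 11
```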

How often is the RDB file updated? This is also set in the configuration file:

# Save the DB to disk.
#
# save <seconds> <changes> [<seconds> <changes> ...]
#
# Redis will save the DB if the given number of seconds elapsed and it
# surpassed the given number of write operations against the DB.
#
# Snapshotting can be completely disabled with a single empty string argument
# as in following example:
#
# save ""
#
# Unless specified otherwise, by default Redis will save the DB:
#   * After 3600 seconds (an hour) if at least 1 change was performed
#   * After 300 seconds (5 minutes) if at least 100 changes were performed
#   * After 60 seconds if at least 10000 changes were performed
#
# You can set these explicitly by uncommenting the following line.
save 3600 1 300 100 60 10000

Look for the "Save the DB to disk." section: this is RDB's automatic save mechanism. The directive has the format:

save <seconds> <changes> [<seconds> <changes> ...]

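Whenever at least seconds seconds have passed since the last save and at least changes modifications have been made, the file is updated. For example, save 3600 1 300 100 60 10000 means:

after 3600 seconds, update dump.rdb if at least 1 change was made
after 300 seconds, update dump.rdb if at least 100 changes were made
after 60 seconds, update dump.rdb if at least 10000 changes were made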

In addition, configuring save "" (an empty string) disables this automatic saving mechanism.
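The save rule can be modeled as "a snapshot is due if any rule has both its time and its change threshold met". This is a simplified Python sketch with hypothetical helper names (parse_save, should_snapshot); it ignores details such as retry delays after a failed save and the exact boundary comparison Redis uses.

```python
def parse_save(line):
    """Parse a 'save' directive into (seconds, changes) pairs; 'save ""' gives []."""
    args = line.split()[1:]
    if args in ([], ['""']):
        return []  # snapshotting disabled
    if len(args) % 2 != 0:
        raise ValueError("save expects <seconds> <changes> pairs")
    nums = [int(a) for a in args]
    return list(zip(nums[0::2], nums[1::2]))


def should_snapshot(rules, seconds_since_last_save, changes_since_last_save):
    # Due if ANY rule has both enough elapsed time and enough changes.
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


rules = parse_save("save 3600 1 300 100 60 10000")
print(rules)                              # [(3600, 1), (300, 100), (60, 10000)]
print(should_snapshot(rules, 61, 10000))  # True: the 60-second rule fires
print(should_snapshot(rules, 120, 500))   # False: no rule has both thresholds met
print(should_snapshot(rules, 301, 500))   # True: the 300-second rule fires
print(should_snapshot(parse_save('save ""'), 99999, 99999))  # False: disabled
```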

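In summary, Redis generates snapshots automatically in the following situations:

periodically, according to the save setting in the configuration file
when Redis shuts down normally
during master-replica replication
when the flushall command is executed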

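Pros and cons

Pros:

Because RDB compresses the data, it is storage-efficient and uses little disk space
Restoring from RDB is much faster than restoring from AOF

Cons:

Every RDB generation has to fork an extra child process, which consumes additional resources to run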

AOF

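AOF stands for Append Only File. It records every individual write operation Redis executes, similar to the way MySQL logs changes in its binlog. AOF mode is disabled by default; the configuration file contains the following: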

############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check https://redis.io/topics/persistence for more information.

appendonly no

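To enable this mode, just change the value after appendonly to yes. Once it is enabled, Redis loads the AOF instead of the RDB file at startup, because the AOF offers better durability guarantees (RDB snapshots can still be generated alongside it).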

After modifying the configuration file, restart the Redis server:

service redis-server restart

Right after this setting come the AOF file name and the directory where the AOF files are stored:

appendfilename "appendonly.aof"

# For convenience, Redis stores all persistent append-only files in a dedicated
# directory. The name of the directory is determined by the appenddirname
# configuration parameter.

appenddirname "appendonlydir"


Opening the .aof file, you see content in the following format:

*2^M
$6^M
SELECT^M
$1^M
0^M
*8^M
$4^M
XADD^M
$58^M
pcp:values:series:2756fc65948d7070a2a980759d7e4267d49ccef4^M
$6^M
MAXLEN^M

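You can recognize familiar commands such as SELECT and XADD (MAXLEN is an option of XADD). The file stores each command in the Redis protocol (RESP) format: *N gives the number of arguments, $len gives the byte length of the next argument, and every line ends with \r\n, which the editor displays as ^M.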
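To see where the ^M characters come from, here is a small sketch that encodes a command as a RESP array of bulk strings, the format visible in the file above. The function name encode_resp_command is hypothetical; arguments are converted to strings and encoded as UTF-8.

```python
def encode_resp_command(*args):
    """Encode a command as a RESP array of bulk strings, as stored in the AOF."""
    out = [b"*%d\r\n" % len(args)]  # number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))  # byte length, then payload
    return b"".join(out)


print(encode_resp_command("SELECT", 0))
# b'*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n'
```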

In AOF mode every operation has to be saved to disk, and such IO is very slow. Wouldn't that greatly reduce Redis's performance?

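In fact, Redis uses a buffering strategy here. Each write command is first appended to the aof_buf buffer, which is a region of memory, and reading and writing memory is very fast. The buffered commands are then written to the file in batches rather than one at a time, which reduces the number of IO operations and keeps Redis efficient.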
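A toy Python model of the buffering idea: commands are appended to an in-memory buffer and reach the file in a single batched write. AofBuffer and its methods are hypothetical names; this illustrates the principle only, not the actual logic in Redis's aof.c. Note that os.write only hands the data to the operating system's page cache; forcing it onto the physical disk is what fsync does, which is what the file synchronization settings control.

```python
import os
import tempfile


class AofBuffer:
    """Hypothetical model of aof_buf: buffer commands in memory, write in batches."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.buf = bytearray()
        self.batches = 0  # how many batched writes reached the file

    def feed(self, command_bytes):
        self.buf += command_bytes  # memory only: cheap

    def flush(self):
        if not self.buf:
            return
        data = memoryview(bytes(self.buf))
        self.buf.clear()
        while len(data) > 0:  # os.write may write fewer bytes than requested
            written = os.write(self.fd, data)
            data = data[written:]
        self.batches += 1
        # The data is now in the OS page cache; os.fsync(self.fd) would force it to disk.

    def close(self):
        self.flush()
        os.close(self.fd)


path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
aof = AofBuffer(path)
for value in (b"111", b"222", b"333"):
    # RESP encoding of: SET key1 <value>
    aof.feed(b"*3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$3\r\n" + value + b"\r\n")
print(os.path.getsize(path))               # 0: everything is still in the buffer
aof.flush()
print(os.path.getsize(path), aof.batches)  # 96 1: three commands, one write
aof.close()
```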

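File synchronization

Writing the data in aof_buf to the file on disk is called file synchronization, and there are several strategies for it. If syncing happens too often, Redis becomes slower; if it happens too rarely, more data is lost when the process crashes, because the buffer lives in memory. Redis therefore offers several levels and lets users choose the trade-off themselves.

The sync frequency can be set in the configuration file: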

# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log. Slow, Safest.
# everysec: fsync only one time every second. Compromise.
#
# The default is "everysec", as that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# More details please check the following article:
# http://antirez.com/post/redis-persistence-demystified.html
#
# If unsure, use "everysec".

# appendfsync always
appendfsync everysec
# appendfsync no

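appendfsync sets the sync frequency; ordered from most to least frequent: always > everysec > no. The default policy is everysec, which syncs once per second.

always: sync after every write
everysec: sync once per second
no: leave syncing to the operating system's own flushing policy, which Redis cannot control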
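The trade-off between the three policies can be illustrated by counting how many fsync calls each one would issue for the same stream of writes. This is a simplified sketch with hypothetical names (fsync_due, count_fsyncs); Redis actually performs the everysec fsync in a background thread, a detail omitted here.

```python
def fsync_due(policy, now, last_fsync):
    """Return True if the given appendfsync policy would fsync after a write at `now`."""
    if policy == "always":
        return True
    if policy == "everysec":
        return now - last_fsync >= 1.0
    if policy == "no":
        return False  # the OS decides when to flush
    raise ValueError(f"unknown appendfsync policy: {policy!r}")


def count_fsyncs(policy, write_times):
    last_fsync, count = 0.0, 0
    for t in write_times:
        if fsync_due(policy, t, last_fsync):
            count += 1
            last_fsync = t
    return count


times = [i * 0.25 for i in range(1, 13)]  # 12 writes spread over 3 seconds
for policy in ("always", "everysec", "no"):
    print(policy, count_fsyncs(policy, times))
# always 12
# everysec 3
# no 0
```

More fsync calls mean less data at risk in a crash but more disk work; everysec bounds the loss to roughly one second of writes.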

Rewrite mechanism

As more data is written, the .aof file grows large and accumulates redundant entries. Consider the following operations:

set key1 111
set key1 222
set key1 333
set key1 444

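Since the fourth write overwrites the values of the first three, those three operations are useless. If Redis really kept every one of them, this example would waste 3/4 of the space.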

To deal with this, Redis rewrites the .aof file, removing redundant operations and merging others, which reduces the size of the file.

Rewriting can be triggered manually or automatically. To trigger it manually, run the bgrewriteaof command.

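Automatic triggering depends on two parameters:

auto-aof-rewrite-min-size: the minimum size for a rewrite; a rewrite is triggered only when the .aof file is larger than this. The default is 64MB
auto-aof-rewrite-percentage: how much the current .aof size has grown, as a percentage, relative to its size after the last rewrite; a rewrite is triggered once the growth reaches this value (default 100)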
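The automatic trigger can be sketched as follows, assuming the default values (auto-aof-rewrite-min-size 64mb, auto-aof-rewrite-percentage 100). should_rewrite is a hypothetical helper; base_size stands for the AOF size right after the last rewrite (or at startup), and a percentage of 0 disables automatic rewrites.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, min_size=64 * MB, percentage=100):
    """Simplified model of the automatic AOF rewrite check."""
    if percentage == 0 or current_size <= min_size:
        return False  # disabled, or the file is still too small
    base = base_size or 1  # avoid dividing by zero
    growth = current_size * 100 // base - 100  # growth in percent since last rewrite
    return growth >= percentage


print(should_rewrite(130 * MB, 64 * MB))  # True: grew about 103%
print(should_rewrite(100 * MB, 64 * MB))  # False: grew about 56%
print(should_rewrite(60 * MB, 10 * MB))   # False: below the 64MB minimum
```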

The rewrite process is as follows:

[Figure: AOF rewrite flow]

Similar to bgsave, rewriting is done by a child process. The child writes all the data currently in memory into a new .aof file, which then simply replaces the old one.

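In other words, rewriting does not use a complicated algorithm to detect which entries are duplicated or can be merged. It simply looks at the current final state and writes out whatever is there.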
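The "write out the final state" idea in a minimal sketch that handles only SET and DEL on string values (a real rewrite covers every data type, expirations and so on). apply and rewrite are hypothetical helper names: apply is what happens on every write, while rewrite ignores the old log entirely and dumps the current dataset.

```python
def apply(db, command):
    # What the server does on every write: update the in-memory dataset.
    op, key, *args = command
    if op == "SET":
        db[key] = args[0]
    elif op == "DEL":
        db.pop(key, None)
    else:
        raise ValueError(f"unsupported command in this sketch: {op}")


def rewrite(db):
    # What the rewrite child does: ignore the old log, dump the current state.
    return [("SET", key, value) for key, value in db.items()]


db, log = {}, []
for value in ("111", "222", "333", "444"):
    command = ("SET", "key1", value)
    apply(db, command)
    log.append(command)  # the old AOF keeps every single command
print(len(log), rewrite(db))  # 4 [('SET', 'key1', '444')]
```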

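Note step 3.1: while the child is writing, the parent keeps receiving new writes and appends them to the old .aof file. This guards against a crash during the rewrite. If both parent and child crash, data can only be recovered from the old .aof, so if the old file were not kept up to date during the rewrite, the writes made in that period would be lost.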

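Also, the child only holds the parent's memory state as of the instant it was created, so it cannot see data the parent receives afterwards. The parent therefore also writes this data to an intermediate buffer, aof_rewrite_buf, in step 3.2, so that the new data is preserved.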

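When the child finishes writing (step 5.1), it sends a signal to the parent, which then knows the new file is ready. In step 5.2 the parent appends the data received during the rewrite to the new .aof file, and finally, in step 5.3, the new file replaces the old one.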
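The interplay of steps 3.1, 3.2 and 5.1 to 5.3 can be simulated sequentially, with files modeled as Python lists. This is a hypothetical sketch of the classic flow described above (rewrite_with_buffer and apply are illustrative names); in reality the child and the parent run concurrently. Redis 7.0 reworked this mechanism into a multi-part AOF (hence the appenddirname directory in the configuration), where writes made during a rewrite go to a new incremental file rather than to aof_rewrite_buf.

```python
def apply(db, command):
    # Update the in-memory dataset (only SET is modeled in this sketch).
    op, key, value = command
    if op != "SET":
        raise ValueError(f"unsupported command in this sketch: {op}")
    db[key] = value


def rewrite_with_buffer(db, old_aof, writes_during_rewrite):
    snapshot = dict(db)  # the child's fork-time view of the data
    rewrite_buf = []     # models aof_rewrite_buf
    # While the child works, the parent keeps serving writes:
    for command in writes_during_rewrite:
        apply(db, command)
        old_aof.append(command)      # 3.1: keep the old AOF complete
        rewrite_buf.append(command)  # 3.2: remember what the child cannot see
    # The child writes the new AOF from its snapshot only.
    new_aof = [("SET", key, value) for key, value in snapshot.items()]
    # 5.1: child signals completion; 5.2: parent appends the buffered writes.
    new_aof.extend(rewrite_buf)
    # 5.3: the new file replaces the old one.
    return new_aof


db = {"key1": "444"}
old_aof = [("SET", "key1", v) for v in ("111", "222", "333", "444")]
new_aof = rewrite_with_buffer(db, old_aof, [("SET", "key2", "1"), ("SET", "key1", "555")])
print(new_aof)       # [('SET', 'key1', '444'), ('SET', 'key2', '1'), ('SET', 'key1', '555')]
print(len(old_aof))  # 6
```

Replaying the new file reproduces the current dataset, which is exactly why step 5.2 is needed before the swap.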