A closer look at how the Linux kernel influences Redis memory management
Recently, I was talking to a long-time friend, former university colleague and former boss, who mentioned that Redis was failing to persist data to disk in low-memory conditions. For that reason, he advised never letting a Redis in-memory dataset grow beyond 50% of the system memory. Given how wasteful that practice would be, it's worth understanding why this happens and looking for alternatives that let Redis use as much memory as is available to it, without sacrificing durability.
The problem can be reproduced with a short Python script:
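import random
import string
import redis

MEM_GB = 2 * 1024**3  # total RAM of the test VM, in bytes
KEY_SIZE = 1024**2  # 1 MB per value
TOTAL_KEYS = int((MEM_GB * 0.5) / KEY_SIZE)
def random_value():  # helper name assumed; the original definition line is missing
    return ''.join([random.choice(string.ascii_letters + string.digits) for x in range(1024)]) * 1024
r = redis.StrictRedis()
for i in range(TOTAL_KEYS):
    r.set('key%d' % i, random_value())  # key naming assumed; the original loop body is missing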
The script generates random key/value pairs with values of 1 MB each, using up to half of the total memory. Since it was run on a virtual machine with 2 GB of RAM, it creates a dataset of about 1 GB. Adding the memory used by the OS and all other processes, we can be sure that more than 50% of the total system memory is now in use. From this point on, calling BGSAVE results in an error, and the following message appears in /var/log/redis/redis-server.log (on an Ubuntu 18.04 LTS system):
10202:M 13 Sep 11:34:16.535 # Can't save in background: fork: Cannot allocate memory
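To check the claim that more than half of the memory is in use before calling BGSAVE, the figures in /proc/meminfo can be read directly. The following is a minimal sketch, not part of the original experiment: the helper names parse_meminfo and used_fraction are made up for illustration, and it assumes a kernel recent enough to report MemAvailable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_fraction(fields):
    """Fraction of MemTotal that is not reported as MemAvailable."""
    total = fields["MemTotal"]
    return (total - fields["MemAvailable"]) / total


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"memory in use: {used_fraction(info):.0%}")
    except (OSError, KeyError):
        # Not Linux, or a kernel that does not report MemAvailable
        print("/proc/meminfo figures not available on this system")
```

Running it right after the loading script finishes should report a value above 50% on a machine set up like the one described here.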
Looking at the source code for this operation, this message is shown when the fork() system call returns -1. Its man page tells us that this return code only means that the call failed and no child process was created. Based on that information and the error message, one might conclude that the call failed because it was duplicating the entire dataset in memory, something that can't be done with less than half of the memory available.
Digging through a bit of Unix history, we find that the first generation of Unix OSes did indeed duplicate the whole parent address space when fork() was called. On modern kernels like Linux, this no longer happens, and the NOTES section of the same man page explains it:
Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.
A copy-on-write approach is much more efficient than actually copying data from one place to another. The child process shares the same memory pages as its parent and only needs enough memory for its own page tables, which point to the actual data. A memory page is copied if, and only if, one of the processes tries to write to it, hence the name copy-on-write (CoW). Dumping the data to disk is a read-only operation for the child, so as long as the parent modifies few pages during the save, it results in virtually no increase in memory usage.
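The sharing described above can be observed from Python with os.fork(). This is only an illustrative sketch, not what Redis does internally: the function name sum_in_child is made up, and CPython itself writes to some shared pages (reference counts, for instance), so a few pages are still copied behind the scenes.

```python
import os


def sum_in_child(data):
    """Fork and let the child read `data`; return the low byte of its sum."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the parent's pages without an upfront copy.
        try:
            os.close(read_fd)
            result = sum(data) % 256  # read-only pass over the shared buffer
            os.write(write_fd, bytes([result]))
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup
    # Parent process: collect the one-byte result and reap the child.
    os.close(write_fd)
    result = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return result[0]


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB standing in for a dataset
    print(sum_in_child(payload))
```

The child never receives the buffer through the pipe, only the one-byte result. It can read the data because its page tables point at the same physical pages as the parent's.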
The question now is: if nowhere near double the amount of memory is needed, why does it still fail? The answer is that the Linux kernel cannot safely commit to letting a child process point to that amount of data, as there's no guarantee that it won't be modified. If the kernel allowed that, the total system memory might not be enough to hold everything allocated by both the parent and the child processes. The good news is that there's a way around this, presented as a tip in the Redis log file:
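The reasoning above can be put into numbers with a toy model. It is a deliberate simplification of the argument in this article, not the kernel's actual accounting code, which is more involved and differs between kernel versions. The function name fork_admitted and the figures are illustrative assumptions, not measurements from the test machine, and only modes 0 and 1 are modelled.

```python
GIB = 1024**3


def fork_admitted(parent_writable, free_estimate, overcommit_mode):
    """Toy model: would a fork of this parent be admitted?

    parent_writable: bytes the child could end up copying in the worst case
    free_estimate:   bytes the kernel estimates it could still hand out
    overcommit_mode: value of vm.overcommit_memory (only 0 and 1 modelled)
    """
    if overcommit_mode == 1:
        return True  # mode 1: always pretend there is enough memory
    if overcommit_mode == 0:
        # Simplified heuristic: refuse when the worst case cannot fit
        return parent_writable <= free_estimate
    raise ValueError("only modes 0 and 1 are modelled in this sketch")


if __name__ == "__main__":
    redis_bytes = int(1.1 * GIB)  # assumed: dataset plus overhead
    free_bytes = int(0.8 * GIB)   # assumed: what is left on a 2 GiB VM
    for mode in (0, 1):
        print(mode, fork_admitted(redis_bytes, free_bytes, mode))
```

With these assumed figures, the worst case of the child copying every page does not fit in the remaining memory, so the request is refused in mode 0 even though the save itself would barely allocate anything.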
10202:M 13 Sep 11:33:09.943 # WARNING overcommit_memory is set to 0! Background
save may fail under low memory condition. To fix this issue add
'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the
command 'sysctl vm.overcommit_memory=1' for this to take effect.
The message is a bit misleading, as a system that is using a bit more than 50% of its memory isn't exactly in a "low memory condition," but it is still consistent with what we know about the problem so far. Rather than trying a command or configuration without knowing exactly what it does, let's look at what option 1 means in the overcommit_memory section of the proc file system man page:
In mode 1, the kernel pretends there is always enough memory, until memory actually runs out. One use case for this mode is scientific computing applications that employ large sparse arrays. In Linux kernel versions before 2.6.0, any nonzero value implies mode 1.
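Before changing the setting, it can help to see which mode the system is currently in. The sketch below reads the value from /proc/sys/vm/overcommit_memory. The helper names are made up for illustration, and the short descriptions paraphrase the three modes listed in the same man page section.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit, never check",
    2: "always check, never overcommit",
}


def describe_overcommit(raw):
    """Turn the raw file content (e.g. '0\\n') into (mode, description)."""
    mode = int(raw.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read and describe the current setting (Linux only)."""
    with open(path) as f:
        return describe_overcommit(f.read())


if __name__ == "__main__":
    try:
        mode, description = current_overcommit()
        print(f"vm.overcommit_memory = {mode}: {description}")
    except OSError:
        print("overcommit setting not readable on this system")
```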
With the setting applied at runtime, calling BGSAVE again succeeds:
$ sudo sysctl vm.overcommit_memory=1
vm.overcommit_memory = 1
$ redis-cli BGSAVE
Background saving started
After that, the Redis log shows the save completing successfully, with no measurable copy-on-write overhead:
10202:M 13 Sep 11:47:04.663 * Background saving started by pid 10337
10337:C 13 Sep 11:47:05.833 * DB saved on disk
10337:C 13 Sep 11:47:05.839 * RDB: 0 MB of memory used by copy-on-write
10202:M 13 Sep 11:47:05.885 * Background saving terminated with success
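The copy-on-write figure in the log is a useful number to watch over time, since it grows when the dataset receives many writes during a save. A minimal sketch for extracting it from log text follows. The function name cow_megabytes is made up, and the pattern assumes the exact wording shown in the log above, which other Redis versions may phrase differently.

```python
import re

# Matches the line format shown in the log excerpt above
COW_RE = re.compile(r"RDB: (\d+) MB of memory used by copy-on-write")


def cow_megabytes(log_text):
    """Return every copy-on-write figure (in MB) found in Redis log text."""
    return [int(value) for value in COW_RE.findall(log_text)]


if __name__ == "__main__":
    line = "10337:C 13 Sep 11:47:05.839 * RDB: 0 MB of memory used by copy-on-write"
    print(cow_megabytes(line))
```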