The story
Boxes running various kernel versions have strange free-memory problems. After examining memory usage, it seems that the memory used by processes doesn’t add up to the memory that is actually in use.
Taking a look at /proc/meminfo we see something like this:
MemTotal:      8161544 kB
MemFree:        115676 kB
Buffers:          3900 kB
Cached:         200520 kB
SwapCached:      42336 kB
Active:         546824 kB
Inactive:       138336 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8161544 kB
LowFree:        115676 kB
SwapTotal:     2096472 kB
SwapFree:       547480 kB
Dirty:            1020 kB
Writeback:           0 kB
AnonPages:      453480 kB
Mapped:          66928 kB
Slab:          7250176 kB
PageTables:      75408 kB
...
Notice that Slab is about 6.9 GiB (7250176 kB), almost 90% of the total memory of this 8GB box (!).
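To make that comparison repeatable, here is a minimal sketch that prints Slab as a share of MemTotal. The function name slab_share is made up for this example. It assumes the "Name: value kB" layout shown above and reads /proc/meminfo unless another file is given.

```shell
# slab_share [FILE]: print Slab as a percentage of MemTotal.
# FILE defaults to /proc/meminfo; pass a saved copy to analyze another box.
# (slab_share is a hypothetical helper name, not a standard tool.)
slab_share() {
    awk '
        $1 == "MemTotal:" { total = $2 }   # value in kB
        $1 == "Slab:"     { slab = $2 }    # value in kB
        END {
            if (total > 0)
                printf "Slab: %.0f kB of %.0f kB (%.1f%%)\n", slab, total, 100 * slab / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Running slab_share with no arguments on the affected box, or slab_share saved-meminfo.txt on a saved copy, gives a single line that is easy to log or compare over time.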
Slab is memory used by the kernel's slab allocator for its own object caches, and we can see where it is allocated by examining /proc/slabinfo. Here’s an excerpt:
# name           <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfs_direct_cache       0       0   136  28   1 : tunables 120  60   8 : slabdata       0       0     0
nfs_write_data        62      63   832   9   2 : tunables  54  27   8 : slabdata       7       7     0
nfs_read_data        215     297   832   9   2 : tunables  54  27   8 : slabdata      33      33    54
nfs_inode_cache  5384386 5399040  1032   3   1 : tunables  24  12   8 : slabdata 1799680 1799680    40
nfs_page             534     750   128  30   1 : tunables 120  60   8 : slabdata      25      25   264
rpc_buffers            8       8  2048   2   1 : tunables  24  12   8 : slabdata       4       4     0
...
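Notice nfs_inode_cache: about 5.4M objects of 1032 bytes each, roughly 5.6GB of objects. Only three objects fit in each one-page slab, so assuming 4 kB pages the 1799680 slabs hold about 7198720 kB (6.9 GiB), which accounts for nearly all of the Slab figure.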
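Picking out the biggest cache by eye gets tedious, so here is a rough sketch that ranks caches by the memory their slabs hold (num_slabs × pagesperslab × page size). The name slab_top is invented for this example. It assumes the slabinfo 2.x column layout shown above and a 4096-byte page unless told otherwise.

```shell
# slab_top [FILE] [PAGE_BYTES]: list the five caches holding the most memory.
# Output is "<kB> <cache name>", largest first.
# (slab_top is a hypothetical helper; the page size defaults to 4096 bytes.)
slab_top() {
    awk -v page="${2:-4096}" '
        /^#/ || /^slabinfo/ { next }   # skip the version and column header lines
        {
            # $1 = name, $6 = pagesperslab, $15 = num_slabs
            printf "%.0f %s\n", $15 * $6 * page / 1024, $1
        }
    ' "${1:-/proc/slabinfo}" | sort -rn | head -n 5
}
```

Reading /proc/slabinfo normally requires root. On a box with a different page size, something like slab_top /proc/slabinfo "$(getconf PAGESIZE)" should give the right figures.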
The workaround
Searching the internet suggests that this is most probably a bug. Fortunately there are two workarounds, a slow one and a fast one:
Slow workaround: Log in to that box and run “sync”. Then leave it alone while the nfs_inode_cache memory gradually goes down. It may take a couple of minutes before it starts going down, and there may be pauses in the process. It can take more than an hour to free the memory.
Fast workaround: Log in to that box and run:
# sync
# echo 2 > /proc/sys/vm/drop_caches
I’m not sure why the slow workaround works, but it looks like sync triggers a chain reaction that frees the memory.
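If you want to see how much the fast workaround actually frees, a small wrapper can record Slab before and after. This is only a sketch: slab_kb and drop_slab_caches are names made up here, and the wrapper has to run as root. Writing 2 to drop_caches asks the kernel to free reclaimable slab objects (dentries and inodes) and leaves the page cache alone.

```shell
# slab_kb [FILE]: print the Slab value (in kB) from a meminfo-style file.
slab_kb() {
    awk '$1 == "Slab:" { print $2 }' "${1:-/proc/meminfo}"
}

# drop_slab_caches: run the fast workaround and report the change in Slab.
# Must be run as root. (Both function names are hypothetical helpers.)
drop_slab_caches() {
    before=$(slab_kb)
    sync                                # flush dirty data first
    echo 2 > /proc/sys/vm/drop_caches   # free reclaimable dentries and inodes
    after=$(slab_kb)
    echo "Slab: ${before} kB -> ${after} kB (freed $((before - after)) kB)"
}
```

For the slow workaround, running slab_kb every minute or so after sync is enough to watch the number fall.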
Thank you, thank you, thank you. This post just saved us from having to reboot a box.
Details for future Googlers: We had several pdflush processes running wild and almost all RAM was slab memory. Running sync cleared a few GB of slab memory and made pdflush quiet down, but several GB of slab memory remained. We then dropped caches as above and that cleared out the rest of the slab waste.