Linux – How to reclaim page cache?

What is page cache?
To optimize disk performance, Linux keeps the data of recently read or written files in RAM. This part of RAM is called the page cache.

Why would you need to reclaim page cache?
By default, Linux reclaims page cache using an LRU algorithm. While LRU suffices for general-purpose workloads, it may not suit the specific requirements of I/O-intensive software. In that case, the software should implement its own logic to reclaim page cache.

How to advise the kernel to reclaim the page cache specific to a file?
posix_fadvise is the system call used to advise the kernel about the page cache behavior for a particular file.

If you want to open a file for reading without it continuing to occupy page cache, do it the following way:

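#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char *argv[]) {
    int fd, ret;
    if (argc < 2)
        return 1;                     /* usage: prog <file> */
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* ... read the file here ... */
    /* offset 0, len 0: the advice applies to the whole file */
    ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (ret != 0)                     /* returns an error number, does not set errno */
        fprintf(stderr, "posix_fadvise failed: error %d\n", ret);
    close(fd);
    return ret != 0;
}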

If you want to reclaim the page cache of a file that is being written, it needs to be handled differently. Linux must keep the written data in the page cache until it has flushed the contents to disk. Until that point it cannot reclaim those pages. So before calling posix_fadvise, you will have to call fdatasync, or wait long enough (more than 30 seconds) for the kernel to write the data back on its own.
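The write case described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name write_and_drop_cache and its return convention are made up for this example; only open, write, fdatasync, posix_fadvise and close are real system calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose posix_fadvise and fdatasync */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): writes len bytes to
 * path, flushes them to disk, then advises the kernel that the cached
 * pages are no longer needed. Returns 0 on success, -1 on failure. */
int write_and_drop_cache(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* write() may write fewer bytes than requested, so loop. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            perror("write");
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Dirty pages cannot be dropped, so force them to disk first. */
    if (fdatasync(fd) != 0) {
        perror("fdatasync");
        close(fd);
        return -1;
    }

    /* offset 0, len 0 covers the whole file. posix_fadvise returns an
     * error number directly instead of setting errno. */
    int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (ret != 0)
        fprintf(stderr, "posix_fadvise failed: error %d\n", ret);

    close(fd);
    return ret == 0 ? 0 : -1;
}
```

Calling fdatasync first is what makes the advice effective here: without it, the pages would most likely still be dirty when posix_fadvise runs, and the kernel would keep them.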

Why 30 seconds?
Because after at most 30 seconds, Linux treats dirty data in the page cache as stale and starts flushing it to disk using the pdflush/bdflush thread (replaced by per-device flusher threads in kernels since 2.6.32). The related setting can be found in the file:
/proc/sys/vm/dirty_expire_centisecs
(default 3000): In hundredths of a second, how long data can be in the page cache before it’s considered expired and must be written at the next opportunity. Note that this default is very long: a full 30 seconds. That means that under normal circumstances, unless you write enough to trigger one of the other writeback thresholds, Linux won’t actually commit anything you write until about 30 seconds later.
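To check the value on a particular machine, the setting can be read like any other text file. The sketch below is only an illustration; the helper names read_centisecs and centisecs_to_seconds are invented for this example. On Linux the file is named /proc/sys/vm/dirty_expire_centisecs.

```c
#include <stdio.h>

/* Hypothetical helper: reads one integer (in centiseconds) from a
 * /proc/sys/vm file. Returns the value, or -1 if it cannot be read. */
long read_centisecs(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Converts hundredths of a second to whole seconds (3000 -> 30). */
long centisecs_to_seconds(long centisecs)
{
    return centisecs / 100;
}
```

On a system that still uses the default, read_centisecs("/proc/sys/vm/dirty_expire_centisecs") should return 3000, which centisecs_to_seconds turns into 30.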

More information can be found at:
http://insights.oetiker.ch/linux/fadvise.html

Linux Memory Management – How to read top output?

Q. How to understand memory usage from the output of the top command in Linux? What are buffers and cached memory?

Ans:
The output of the top command looks like this:

top - 10:25:14 up 5 days, 21:47, 1 user, load average: 5.98, 6.40, 6.52
Tasks: 308 total, 2 running, 306 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.0%us, 10.8%sy, 0.0%ni, 75.1%id, 6.7%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 65967440k total, 64164816k used, 1802624k free, 15331976k buffers
Swap: 31999996k total, 85556k used, 31914440k free, 15929768k cached


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
38 root 20 0 0 0 0 S 0 0.0 3:58.90 rcuos/3
45 root 20 0 0 0 0 S 0 0.0 1:39.88 rcuos/10
47 root 20 0 0 0 0 S 0 0.0 3:38.73 rcuos/12
154 root 20 0 0 0 0 S 0 0.0 3:38.41 kworker/4:1
160 root 20 0 0 0 0 S 0 0.0 2:02.53 kworker/10:1

How to read the top output?

1. Mem: 65967440k total
The total RAM available on the machine.

2. 64164816k used
The memory in use by Linux processes and by disk I/O operations (it includes the buffers and cached figures below).

3. 1802624k free
The memory that is completely unused.

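4. 15331976k buffers
The memory used to buffer raw disk blocks, such as filesystem metadata. (Cached inode and directory entries themselves live in the kernel slab, which is not shown in this summary.)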

5. 15929768k cached
The memory used as page cache to store the data of recently read or written files.

6. Swap: 31999996k total
The total swap space configured on the machine.

7. 85556k used
The swap space currently in use.

8. 31914440k free
The swap space still available.
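The figures in this sample output are related by simple arithmetic: used equals total minus free (65967440 - 1802624 = 64164816), so used already includes buffers and cached. A small sketch of that calculation follows; the struct and function names are made up for illustration, and the formula is only what this particular output shows, since other versions of top may compute used differently.

```c
/* Figures in kB, as printed by top in the sample output above. */
struct mem_summary {
    unsigned long total;
    unsigned long free_kb;
    unsigned long buffers;
    unsigned long cached;
};

/* "used" as this top output reports it: everything that is not free. */
unsigned long used_kb(const struct mem_summary *m)
{
    return m->total - m->free_kb;
}

/* What remains of "used" after subtracting buffers and page cache. */
unsigned long used_excluding_cache_kb(const struct mem_summary *m)
{
    return used_kb(m) - m->buffers - m->cached;
}
```

With the sample values, used_excluding_cache_kb gives 32903072 kB (about 31 GiB). That remainder is what processes and the kernel itself hold beyond buffers and page cache.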

I was really confused to find that (buffers + cached + total memory used by all processes) did not match the used memory on the system. I could not account for the missing memory until I came across:
http://www.logicmonitor.com/2014/10/09/more-linux-memory-free-memory-that-is-not-free-nor-cache/