Saturday, May 30, 2015

Constrain Memory And CPU Via Cgroups On Linux

Limiting memory and CPU is all but required on public-service Linux machines such as gateways, since there are times when clients eat up all the resources on these nodes, leaving administrators barely able to ssh into the problematic machine once it is stuck.

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. The way to limit a group of users to a specific upper bound on both memory and CPU is as follows.

First, we have to install cgroups in case it is not already on your Linux server. In my scenario the environment is CentOS 6.4, and the following command completes the installation (you need to enable the EPEL repository for yum, and CentOS has to be version 6.x):
yum install -y libcgroup

Make it persistent by turning the cgconfig service 'on' in the runlevels you are using (refer to chkconfig and Run Level on Linux):
chkconfig --list | grep cgconfig
chkconfig --level 35 cgconfig on

After that, cgroups is already installed successfully.


Before we move on to configuring limits via cgroups, we should first put all of our client users in the same group so they can be managed as a whole. The related commands are as below:
/usr/sbin/groupadd groupname    # add a new group named groupname
usermod -a -G group1,group2 username    # apply username to group1 and group2
groups username    # check a user's group memberships


Now it's time to configure the memory and CPU limits in '/etc/cgconfig.conf' by appending the following settings:
group gateway_limit {
    memory {
        memory.limit_in_bytes = 8589934592;
    }

    cpu {
        cpu.cfs_quota_us = 3000000;
        cpu.cfs_period_us = 1000000;
    }
}
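As a quick sanity check on that memory figure (an illustrative calculation, not part of the setup), the limit works out to exactly 8 GiB:

```python
# memory.limit_in_bytes as set in the cgconfig.conf snippet above
limit_in_bytes = 8589934592

# bytes -> GiB (1 GiB = 1024**3 bytes)
limit_in_gib = limit_in_bytes / (1024 ** 3)
print(limit_in_gib)  # -> 8.0
```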

The memory configuration is self-explanatory, and that for CPU is defined in this document. In essence, if tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 1000000. Note that the quota and period parameters operate on a per-CPU basis: to allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 100000. If only a relative share of CPU time is required, setting 'cpu.shares' could be applied instead. But according to this thread, 'cpu.shares' is work-conserving, i.e., a task would not be stopped from using the CPU if there is no competition. If you want to put a hard limit on the amount of CPU a task can use, set 'cpu.cfs_quota_us' and 'cpu.cfs_period_us'.
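To make the quota arithmetic concrete, here is a small illustrative helper (the function name is my own, not part of any cgroups tooling) that derives cpu.cfs_quota_us from a desired number of CPUs and a period:

```python
def cfs_quota_us(cpus, period_us):
    """Quota granting `cpus` worth of CPU time per scheduling period."""
    return int(cpus * period_us)

# 0.2 of one CPU with a 1 s period, as in the example above
print(cfs_quota_us(0.2, 1000000))  # -> 200000

# the cgconfig.conf settings above: quota 3000000 over period 1000000
# give the group up to 3 CPUs' worth of time per second
print(cfs_quota_us(3, 1000000))    # -> 3000000
```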

After that, we should connect our user/group with the above cgroups limitations in '/etc/cgrules.conf':
@gatewayer      cpu,memory      gateway_limit/

This denotes that for all users in group 'gatewayer', CPU and memory are constrained by the 'gateway_limit' group configured above.

Notice that if we configure two separate lines for the same user/group, the second line's setting will fail to apply, so don't do this:
@gatewayer      memory      gateway_limit/
@gatewayer      cpu      gateway_limit/

Now we have to restart the related services. Note that, at least in my environment, we have to switch to '/etc' in order to execute the following commands without failure, because 'cgconfig.conf' and 'cgrules.conf' reside in that path.
service cgconfig restart
service cgred restart  #cgred stands for CGroup Rules Engine Daemon

Finally, we should verify that all the settings have been hooked up to the specific users as expected.

The first way to do that is to log in as a user in group 'gatewayer' and invoke `pidof bash` to get the PID of the current bash session. Then `cat /cgroup/cpu/gateway_limit/cgroup.procs` as well as `cat /cgroup/memory/gateway_limit/cgroup.procs` to see whether that PID is listed. If so, memory and CPU are being monitored by cgroups for the current user.

The second way is a little simpler, for you only have to log in as the user and execute `cat /proc/self/cgroup`. If gateway_limit is applied to both memory and cpu as follows, then it is well configured.
246:blkio:/
245:net_cls:/
244:freezer:/
243:devices:/
242:memory:/gateway_limit
241:cpuacct:/
240:cpu:/gateway_limit
239:cpuset:/
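This check can also be scripted. The sketch below hard-codes a sample of that output for illustration; it parses each `hierarchy-id:controller:path` line into a controller-to-path map and confirms that both cpu and memory point at gateway_limit:

```python
# sample /proc/self/cgroup content (abridged, hard-coded for illustration)
sample = """242:memory:/gateway_limit
241:cpuacct:/
240:cpu:/gateway_limit
239:cpuset:/"""

# each line has the form hierarchy-id:controller:path
cgroup_of = {}
for line in sample.splitlines():
    _, controller, path = line.split(":", 2)
    cgroup_of[controller] = path

ok = all(cgroup_of.get(c) == "/gateway_limit" for c in ("cpu", "memory"))
print(ok)  # -> True
```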

As for testing, here's a memory-hungry Python 2 script that could be used to exercise the memory limit. Note the use of xrange instead of range, so the loop counter itself doesn't allocate a huge list up front; each iteration then pins a new ~512 MB string in the dict:
if __name__ == '__main__':
    d = {}
    # xrange avoids materializing a 900-million-element list (Python 2)
    for i in xrange(0, 900000000):
        # each value is a distinct ~512 MB string, kept alive by the dict
        d[i] = ' ' * 512000000
        if i % 10000 == 0:
            print i

When monitoring via `ps aux | grep this_script_name`, the process is killed once its memory consumption exceeds the upper bound set in cgroups. FYI, swappiness is currently set to 0 here, thus no swap space will be used when physical memory is exhausted and the process will be killed by the Linux kernel. If we intend to make our Linux environment more elastic, we could raise swappiness to a higher level (the default is 60).
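Given the 8589934592-byte limit configured above, a rough back-of-the-envelope calculation (illustrative only) says the script should be killed around its 17th iteration, since each loop pins another 512,000,000-byte string:

```python
limit = 8589934592        # memory.limit_in_bytes from cgconfig.conf above
per_iter = 512000000      # bytes held alive per loop iteration

# whole iterations that fit under the limit before the OOM kill
print(limit // per_iter)  # -> 16
```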

For a CPU-intensive test, following this thread, the command below is executed as the specific user to create multiple processes consuming CPU time. In another terminal window, we can execute `top -b -n 1 | grep current_user_name_or_dd | awk '{s+=$9} END {print s}'` to sum up the CPU time of all the dd processes.
fulload() { dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null & }; fulload; read; killall dd
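The awk one-liner simply sums the ninth whitespace-separated field of each matched line, which is the %CPU column in top's default batch output. A Python equivalent, run here on made-up sample lines for illustration, would be:

```python
# two fabricated top -b lines: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
sample = """\
3501 user1 20 0 103256 608 512 R 99.5 0.0 1:23.45 dd
3502 user1 20 0 103256 608 512 R 98.5 0.0 1:22.01 dd"""

# field 9 (index 8) is %CPU; sum it across all matched processes
total = sum(float(line.split()[8]) for line in sample.splitlines())
print(total)  # -> 198.0
```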



Reference:
1. cgroups documentation
2. cgroups on CentOS 6
3. cgrules.conf(5) - Linux man page
4. cgconfig.conf(5) - Linux man page
5. How to create a user with limited RAM usage - Stack Exchange
6. How can I configure cgroups to fairly share resources between users - Stack Exchange
7. How can I produce high CPU load on a Linux server - Super User
8. Shell command to sum integers, one per line - Stack Overflow
9. Why does Linux swap out pages when I have many pages cached and vm.swappiness is set to 0 - Quora