2. What do we have?
● cpuset - whole cores and cpu mapping
● cpuacct - cpu cycle accounting
● cpu - less than core granularity
● memory - limits and accounting
● blkio - limits and accounting
● net_cls - network classification
● net_prio - network priority
● Freezer + checkpoint/restore - migration
3. General structure
● tasks
– attach a task (thread) and show the list of threads
● cgroup.procs
– show list of processes
● cgroup.event_control
– an interface for eventfd()
# mount -t cgroup none /cgroups
# mount -t cgroup -o cpuset cpuset /cg/cpuset
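A minimal sketch of how the tasks file is typically used once a hierarchy is mounted: a child group is created with mkdir, and a task is attached by writing its PID into the group's tasks file. The helper name cg_attach and its arguments are hypothetical, and writing into a real hierarchy needs root.

```shell
# Hypothetical helper: create a child group and attach one task to it.
# $1 = hierarchy mount point, $2 = group name, $3 = PID (thread ID) to attach
cg_attach() {
    mkdir -p "$1/$2" || return 1
    # On a real cgroup filesystem the kernel creates "tasks" itself;
    # each write attaches exactly one task.
    echo "$3" > "$1/$2/tasks"
}
```

For example, `cg_attach /cgroups demo $$` would move the current shell into the group "demo", assuming the hierarchy is mounted at /cgroups as above.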
4. cpuset
● Physical CPU & Memory limits
– cpuset.cpus - a list of allowed CPUs
– cpuset.mems - a list of allowed memory nodes
– cpuset.cpu_exclusive - 0/1, are the CPUs exclusive to this
group (no other group, apart from ancestors and descendants, can use them)
– cpuset.mem_exclusive or cpuset.mem_hardwall - 0/1, are
the memory nodes exclusive to this group (no other group, apart
from ancestors and descendants, can use them)
– cpuset.sched_load_balance - 0/1, should the kernel balance the
tasks between the CPUs in the current cpuset
– cpuset.sched_relax_domain_level
Documentation/cgroups/cpusets.txt
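A minimal sketch of assigning resources to a cpuset group. A newly created cpuset starts with empty cpuset.cpus and cpuset.mems, and both have to be filled in before tasks can be attached. The helper name cpuset_assign and the example values are assumptions for illustration.

```shell
# Hypothetical helper: give a cpuset group its CPUs and memory nodes.
# $1 = cpuset group directory
# $2 = CPU list, e.g. 0-3,8
# $3 = memory node list, e.g. 0
cpuset_assign() {
    echo "$2" > "$1/cpuset.cpus" &&
    echo "$3" > "$1/cpuset.mems"
}
```

For example, `cpuset_assign /cg/cpuset/demo 0-1 0` would restrict a group named "demo" (hypothetical) to CPUs 0 and 1 and memory node 0.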
5. cpuset
● Physical CPU & Memory limits
– cpuset.sched_relax_domain_level
-1 : no request. use system default or follow request of others.
0 : no search.
1 : search siblings (hyperthreads in a core).
2 : search cores in a package.
3 : search CPUs in a node (= system wide on a non-NUMA system)
On NUMA systems only:
4 : search nodes in a chunk of nodes
5 : search system wide
Documentation/cgroups/cpusets.txt
6. CPU accounting
● cpu usage combined for all cpus (in nanoseconds)
● cpu usage per-cpu (in nanoseconds)
● per CPU and user/system (in USER_HZ)
● Documentation/cgroups/cpuacct.txt
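A small sketch of reading the combined counter, which the cpuacct controller exposes as cpuacct.usage in nanoseconds, and converting it to milliseconds. The helper name cpuacct_usage_ms is hypothetical, and integer division truncates the result.

```shell
# Hypothetical helper: print a group's total CPU time in milliseconds.
# $1 = cpuacct group directory
cpuacct_usage_ms() {
    ns=$(cat "$1/cpuacct.usage") || return 1
    # cpuacct.usage is in nanoseconds; 1 ms = 1000000 ns
    echo $(( ns / 1000000 ))
}
```

Sampling this value twice and subtracting gives the CPU time consumed by the group over the interval.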
7. CPU
● CPU scheduler limits (CONFIG_CGROUP_SCHED)
– cpu.shares: the relative share of CPU time available to the group
– cpu.cfs_quota_us: the total available run-time within a period (in
microseconds) (-1 no limit)
– cpu.cfs_period_us: the length of a period (in microseconds) (default
100ms)
– cpu.stat: exports throttling statistics
nr_periods: Number of enforcement intervals that have elapsed.
nr_throttled: Number of times the group has been throttled/limited.
throttled_time: The total time duration (in nanoseconds) for which
entities of the group have been throttled.
● Documentation/scheduler/sched-bwc.txt
8. CPU examples
1. Limit a group to 1 CPU worth of runtime. If the period is 250ms and the quota is also
250ms, the group will get 1 CPU worth of runtime every 250ms.
# echo 250000 > cpu.cfs_quota_us   # quota = 250ms
# echo 250000 > cpu.cfs_period_us  # period = 250ms
2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine. With a 500ms
period and a 1000ms quota, the group can get 2 CPUs worth of runtime every 500ms.
# echo 1000000 > cpu.cfs_quota_us  # quota = 1000ms
# echo 500000 > cpu.cfs_period_us  # period = 500ms
The larger period here allows for increased burst capacity.
3. Limit a group to 20% of 1 CPU. With a 50ms period, a 10ms quota is equivalent to
20% of 1 CPU.
# echo 10000 > cpu.cfs_quota_us    # quota = 10ms
# echo 50000 > cpu.cfs_period_us   # period = 50ms
Using a small period here ensures a consistent latency response at the
expense of burst capacity.
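All three examples follow the same arithmetic: quota = period × (percent of one CPU) / 100. A minimal sketch of that calculation, with hypothetical helper names cfs_quota_us and cfs_limit:

```shell
# Hypothetical helper: quota in microseconds for a given period and
# a CPU allowance expressed as a percentage of ONE CPU (200 = 2 CPUs).
# $1 = period in microseconds, $2 = percent of one CPU
cfs_quota_us() {
    echo $(( $1 * $2 / 100 ))
}

# Hypothetical helper: write period and matching quota into a cpu group.
# $1 = cpu group directory, $2 = period in microseconds, $3 = percent of one CPU
cfs_limit() {
    echo "$2" > "$1/cpu.cfs_period_us" &&
    cfs_quota_us "$2" "$3" > "$1/cpu.cfs_quota_us"
}
```

With these, `cfs_quota_us 250000 100`, `cfs_quota_us 500000 200` and `cfs_quota_us 50000 20` reproduce the quotas of examples 1 to 3.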
9. memory
Only Memory
● memory.usage_in_bytes - show current res_counter usage for memory
● memory.limit_in_bytes - set/show limit of memory usage
● memory.failcnt - show the number of times memory usage hit the limit
● memory.max_usage_in_bytes - show max memory usage recorded
Memory + Swap
● memory.memsw.usage_in_bytes - show current res_counter usage for memory+swap
● memory.memsw.limit_in_bytes - set/show limit of memory+swap usage
● memory.memsw.failcnt - show the number of times memory+swap usage hit the limit
● memory.memsw.max_usage_in_bytes - show max memory+swap usage recorded
● memory.soft_limit_in_bytes - set/show soft limit of memory usage
● memory.stat - show various statistics
● memory.use_hierarchy - set/show whether hierarchical accounting is enabled
● memory.force_empty - trigger forced move of charges to the parent
● memory.pressure_level - set memory pressure notifications
● memory.swappiness - set/show swappiness parameter of vmscan
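A minimal sketch of setting the hard limit through memory.limit_in_bytes. The file takes a byte count, so the helper converts from MiB; the name mem_limit_mib is hypothetical.

```shell
# Hypothetical helper: set a group's memory limit, given in MiB.
# $1 = memory group directory, $2 = limit in MiB
mem_limit_mib() {
    # 1 MiB = 1024 * 1024 bytes
    echo $(( $2 * 1024 * 1024 )) > "$1/memory.limit_in_bytes"
}
```

After running a workload in the group, memory.failcnt and memory.max_usage_in_bytes show whether the limit was hit and how close usage came to it.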
10. memory
● memory.move_charge_at_immigrate - set/show controls of moving charges
● memory.oom_control - set/show OOM controls
● memory.numa_stat - show memory usage per NUMA node
Kernel Memory limits
● memory.kmem.limit_in_bytes - set/show hard limit for kernel memory
● memory.kmem.usage_in_bytes - show current kernel memory allocation
● memory.kmem.failcnt - show the number of times kernel memory usage hit the limit
● memory.kmem.max_usage_in_bytes - show max kernel memory usage recorded
● memory.kmem.tcp.limit_in_bytes - set/show hard limit for TCP buffer memory
● memory.kmem.tcp.usage_in_bytes - show current TCP buffer memory allocation
● memory.kmem.tcp.failcnt - show the number of times TCP buffer memory usage hit the limit
● memory.kmem.tcp.max_usage_in_bytes - show max TCP buffer memory usage recorded
11. blkio statistics
● blkio.io_wait_time
● blkio.io_merged
● blkio.io_queued
● blkio.avg_queue_size
● blkio.group_wait_time
● blkio.throttle.io_serviced
● blkio.throttle.io_service_bytes
● blkio.sectors
● blkio.io_service_bytes
● blkio.io_serviced
● blkio.io_service_time
● blkio.*_recursive
● blkio.reset_stats
– write an int to it
12. blkio limiting
● blkio.weight - allowed range 10 - 1000
● blkio.weight_device - weight per device
● blkio.leaf_weight[_device] - when competing with
child cgroups
● blkio.time - disk time allocated in milliseconds
● blkio.throttle.read_bps_device
● blkio.throttle.write_bps_device
● blkio.throttle.read_iops_device
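The throttle files take one rule per write in the form "major:minor rate", where major:minor identifies the block device. A minimal sketch of setting a read bandwidth cap; the helper name blkio_read_bps and the device numbers in the usage note are assumptions.

```shell
# Hypothetical helper: cap read bandwidth of a group on one device.
# $1 = blkio group directory
# $2 = device as major:minor, e.g. 8:16
# $3 = limit in bytes per second
blkio_read_bps() {
    echo "$2 $3" > "$1/blkio.throttle.read_bps_device"
}
```

For example, `blkio_read_bps /cgroups/demo 8:16 1048576` would limit reads from device 8:16 to 1 MiB/s for a group named "demo" (hypothetical path).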
13. Network
● Adding network class to each cgroup so you can
later limit it with tc
– Documentation/cgroups/net_cls.txt
● Prioritizing network traffic on interface
– Documentation/cgroups/net_prio.txt
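The class that tc later matches on is written to net_cls.classid as a hex value of the form 0xAAAABBBB, where AAAA is the major and BBBB the minor handle number. A minimal sketch of building that value from a tc handle; the helper name net_cls_classid is hypothetical.

```shell
# Hypothetical helper: build a net_cls.classid value from a tc handle.
# $1 = major handle (hex digits, as written in tc, e.g. 10)
# $2 = minor handle (hex digits, e.g. 1)
net_cls_classid() {
    # Each half is zero-padded to four hex digits: 0xAAAABBBB
    printf '0x%04x%04x\n' "0x$1" "0x$2"
}
```

For the tc handle 10:1, `net_cls_classid 10 1` gives the value to write into the group's net_cls.classid file.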
14. Freezer + CRIU
● freezer.state
– THAWED
– FREEZING
– FROZEN
● freezer.self_freezing
– 0 (thawed)/ 1 (frozen)
● freezer.parent_freezing
– 0 if no ancestor is frozen / 1 otherwise
● CRIU - Checkpoint/Restore In Userspace
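A minimal sketch of driving freezer.state. Only FROZEN and THAWED can be written; FREEZING is a transitional state that is only reported on read, so the file may show FREEZING for a while after FROZEN is requested. The helper name freezer_set is hypothetical.

```shell
# Hypothetical helper: request a freezer state for a group.
# $1 = freezer group directory, $2 = FROZEN or THAWED
freezer_set() {
    case "$2" in
        FROZEN|THAWED)
            echo "$2" > "$1/freezer.state"
            ;;
        *)
            # FREEZING is read-only and anything else is invalid
            echo "freezer_set: only FROZEN or THAWED can be written" >&2
            return 1
            ;;
    esac
}
```

A checkpoint tool would typically request FROZEN, re-read freezer.state until it reports FROZEN, do its work, and then request THAWED.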