systemd and cgroups

During normal operation, systemd keeps track of which processes on the system belong to which unit. This is described in the systemd(1) man page (man systemd):

       Processes systemd spawns are placed in individual Linux control groups named after the unit
which they belong to in the private systemd hierarchy. (see cgroups.txt[1] for more information
about control groups, or short "cgroups"). systemd uses this to effectively keep track of
processes. Control group information is maintained in the kernel, and is accessible via the
file system hierarchy (beneath /sys/fs/cgroup/systemd/), or in tools such as ps(1) (ps xawf -eo
pid,user,cgroup,args is particularly useful to list all processes and the systemd units they
belong to.).
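
The same mapping can be read for a single process from /proc/<pid>/cgroup. The following is a minimal sketch, assuming the cgroup v1 layout used throughout this article, where the relevant line looks like 1:name=systemd:/system.slice/httpd.service. The function names are hypothetical and not part of systemd.

```shell
#!/bin/sh
# Hypothetical helpers: map a process to its systemd cgroup path and unit.
# Assumes cgroup v1, where /proc/<pid>/cgroup has a "name=systemd" line.

# systemd_cgroup_of_file FILE: print the name=systemd cgroup path in FILE.
systemd_cgroup_of_file() {
    # Lines look like "hierarchy-ID:controllers:path"; strip the first two
    # fields so a path containing ':' survives. Fail if no such line exists.
    awk -F: '$2 == "name=systemd" { sub(/^[^:]*:[^:]*:/, ""); print; found = 1; exit }
             END { if (!found) exit 1 }' "$1"
}

# systemd_cgroup_of_pid PID: same, for a running process.
systemd_cgroup_of_pid() {
    systemd_cgroup_of_file "/proc/$1/cgroup"
}

# unit_of_cgroup_path PATH: the last path component is the unit name,
# e.g. /system.slice/httpd.service -> httpd.service.
unit_of_cgroup_path() {
    p=${1%/}
    printf '%s\n' "${p##*/}"
}
```

On the system used in this article, systemd_cgroup_of_pid 1253 would be expected to print /system.slice/httpd.service, which is the same information ps shows in its cgroup column.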

When a process forks, the child inherits the cgroup of its parent. As a result, all of the processes associated with a given unit can be listed by reading the unit's cgroup.procs file, for example:

# cat /sys/fs/cgroup/systemd/system.slice/httpd.service/cgroup.procs 
1253
1254
1255
1256
1257
1258
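
To see what each of those PIDs actually is, the list can be joined with /proc/<pid>/cmdline. A minimal sketch; the function name is hypothetical, and its argument is assumed to be a unit's cgroup directory such as /sys/fs/cgroup/systemd/system.slice/httpd.service from the example above.

```shell
#!/bin/sh
# list_unit_processes CGROUP_DIR   (hypothetical helper)
# Print "PID COMMAND-LINE" for every PID listed in CGROUP_DIR/cgroup.procs.
list_unit_processes() {
    local dir="$1" pid cmd
    while read -r pid; do
        [ -n "$pid" ] || continue
        # /proc/<pid>/cmdline is NUL-separated. A process may have exited
        # since cgroup.procs was read; such PIDs are skipped silently.
        cmd=$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline") || continue
        printf '%s %s\n' "$pid" "${cmd% }"
    done < "$dir/cgroup.procs"
}
```

Run against the httpd.service directory above, this would be expected to print one line per PID, such as 1253 /usr/sbin/httpd -DFOREGROUND.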

This output matches the CGroup information reported by systemctl status <unit>:

# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-06-13 13:45:47 EDT; 899ms ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 1253 (httpd)
   Status: "Processing requests..."
   CGroup: /system.slice/httpd.service
           ├─1253 /usr/sbin/httpd -DFOREGROUND
           ├─1254 /usr/sbin/httpd -DFOREGROUND
           ├─1255 /usr/sbin/httpd -DFOREGROUND
           ├─1256 /usr/sbin/httpd -DFOREGROUND
           ├─1257 /usr/sbin/httpd -DFOREGROUND
           └─1258 /usr/sbin/httpd -DFOREGROUND

<DATE> <TIME> host.example.com systemd[1]: Starting The Apache HTTP Server...
<DATE> <TIME> host.example.com systemd[1]: Started The Apache HTTP Server.

To view these process groupings system-wide, use the systemd-cgls utility:

# systemd-cgls | head -17
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
├─user.slice
│ └─user-0.slice
│   └─session-2.scope
│     ├─1206 sshd: root@pts/0
│     ├─1209 -bash
│     ├─1332 systemd-cgls
│     └─1333 head -17
└─system.slice
  ├─httpd.service
  │ ├─1253 /usr/sbin/httpd -DFOREGROUND
  │ ├─1254 /usr/sbin/httpd -DFOREGROUND
  │ ├─1255 /usr/sbin/httpd -DFOREGROUND
  │ ├─1256 /usr/sbin/httpd -DFOREGROUND
  │ ├─1257 /usr/sbin/httpd -DFOREGROUND
  │ └─1258 /usr/sbin/httpd -DFOREGROUND
  ├─atd.service

Issue example

In the above example output, the six httpd processes belong to the httpd.service unit. As a result, the directives in that unit's file determine how those processes are stopped, including during system shutdown.

Specifically, consider the following configuration from a Red Hat Enterprise Linux 7.3 system:

# systemctl cat httpd.service
# /usr/lib/systemd/system/httpd.service
[Unit]
Description=The Apache HTTP Server
After=network.target remote-fs.target nss-lookup.target
Documentation=man:httpd(8)
Documentation=man:apachectl(8)

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/httpd
ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
ExecStop=/bin/kill -WINCH ${MAINPID}
# We want systemd to give httpd some time to finish gracefully, but still want
# it to kill httpd after TimeoutStopSec if something went wrong during the
# graceful stop. Normally, Systemd sends SIGTERM signal right after the
# ExecStop, which would kill httpd. We are sending useless SIGCONT here to give
# httpd time to finish.
KillSignal=SIGCONT
PrivateTmp=true

[Install]
WantedBy=multi-user.target

It is imperative that the service be started and stopped through systemd in order to maintain the correct process-to-unit grouping. If the service is started by any other means, systemd does not create the unit's cgroup and place the processes in it. The processes instead stay in the cgroup of whatever started them, simply because systemd has no way of knowing that they belong to the service.
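
One way to catch an accidental manual start is for a service's own start script to check where it is running before it launches the daemon. The sketch below is hypothetical and is not part of httpd or systemd: it succeeds only if the current shell is in the cgroup of the expected .service unit, assuming the cgroup v1 /proc/<pid>/cgroup format described above.

```shell
#!/bin/sh
# require_unit_cgroup UNIT [CGROUP_FILE]   (hypothetical start-script guard)
# Succeed only if the process described by CGROUP_FILE (default: this
# shell's /proc/$$/cgroup) is in the cgroup of UNIT, e.g. httpd.service.
require_unit_cgroup() {
    local unit="$1" file="${2:-/proc/$$/cgroup}" path
    # Path of the cgroup v1 name=systemd hierarchy ("ID:controllers:path").
    path=$(awk -F: '$2 == "name=systemd" { sub(/^[^:]*:[^:]*:/, ""); print; exit }' "$file")
    if [ "${path##*/}" != "$unit" ]; then
        echo "refusing to start: running in '${path:-unknown}', not in $unit;" \
             "start it with 'systemctl start $unit'" >&2
        return 1
    fi
    return 0
}
```

A start script could then open with require_unit_cgroup httpd.service || exit 1, so that invoking it from a login shell fails loudly instead of silently placing the daemon in the session scope.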

For example, note how the processes are grouped when the httpd processes above are stopped and then started again from a local shell:

# killall httpd             # Stop the currently running httpd processes
# /usr/sbin/httpd           # Start a new instance that will daemonize itself

# systemd-cgls | head -17
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
├─user.slice
│ └─user-0.slice
│   └─session-2.scope
│     ├─1206 sshd: root@pts/0
│     ├─1209 -bash
│     ├─1407 /usr/sbin/httpd
│     ├─1408 /usr/sbin/httpd
│     ├─1409 /usr/sbin/httpd
│     ├─1410 /usr/sbin/httpd
│     ├─1411 /usr/sbin/httpd
│     ├─1412 /usr/sbin/httpd
│     ├─1413 systemd-cgls
│     └─1414 head -17
└─system.slice
  ├─atd.service
  │ └─1143 /usr/sbin/atd -f

The httpd processes have been started; however, they are now associated with the session-2.scope unit. Session scope units are created at user login through the interaction between systemd-logind and the pam_systemd module.
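
Processes that have ended up outside their unit in this way can be spotted by comparing each process's systemd cgroup with the unit it is supposed to belong to. A minimal sketch under the same cgroup v1 assumption; the function name and the optional proc-root argument (there only so it can be exercised against a fake /proc tree) are hypothetical.

```shell
#!/bin/sh
# find_processes_outside_unit NAME UNIT [PROC_ROOT]   (hypothetical helper)
# Print "PID CGROUP" for each process called NAME whose name=systemd cgroup
# is not UNIT's, e.g. httpd running in session-2.scope rather than
# httpd.service. PROC_ROOT defaults to /proc.
find_processes_outside_unit() {
    local name="$1" unit="$2" root="${3:-/proc}" dir comm path
    for dir in "$root"/[0-9]*; do
        # comm holds the executable name (truncated by the kernel to 15 chars).
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        [ "$comm" = "$name" ] || continue
        path=$(awk -F: '$2 == "name=systemd" { sub(/^[^:]*:[^:]*:/, ""); print; exit }' \
            "$dir/cgroup" 2>/dev/null)
        [ -n "$path" ] || continue
        if [ "${path##*/}" != "$unit" ]; then
            printf '%s %s\n' "${dir##*/}" "$path"
        fi
    done
}
```

On the system shown above, find_processes_outside_unit httpd httpd.service would be expected to report PIDs 1407 through 1412 in /user.slice/user-0.slice/session-2.scope.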

From the pam_systemd(8) man page:

       pam_systemd registers user sessions with the systemd login manager systemd-logind.service(8),
and hence the systemd control group hierarchy.

On login, this module ensures the following:

1. If it does not exist yet, the user runtime directory /run/user/$USER is created and its
ownership changed to the user that is logging in.

2. The $XDG_SESSION_ID environment variable is initialized. If auditing is available and
pam_loginuid.so was run before this module (which is highly recommended), the variable is
initialized from the auditing session id (/proc/self/sessionid). Otherwise, an independent
session counter is used.

3. A new systemd scope unit is created for the session. If this is the first concurrent
session of the user, an implicit slice below user.slice is automatically created and the
scope placed into it.

When a service's processes are started outside of systemd, they generally end up subject to the stop behaviour defined for the user's session scope instead. Session scopes are ephemeral by nature: they are created at login and are expected to last only for the duration of that particular session:

# systemctl cat session-2.scope
# /run/systemd/system/session-2.scope
# Transient stub

# /run/systemd/system/session-2.scope.d/50-After-systemd-logind\x2eservice.conf
[Unit]
After=systemd-logind.service
# /run/systemd/system/session-2.scope.d/50-After-systemd-user-sessions\x2eservice.conf
[Unit]
After=systemd-user-sessions.service
# /run/systemd/system/session-2.scope.d/50-Description.conf
[Unit]
Description=Session 2 of user root
# /run/systemd/system/session-2.scope.d/50-SendSIGHUP.conf
[Scope]
SendSIGHUP=yes
# /run/systemd/system/session-2.scope.d/50-Slice.conf
[Scope]
Slice=user-0.slice

# systemctl show session-2.scope | grep Kill
KillMode=control-group
KillSignal=15

The above configuration means that, when the session scope is stopped (for example, during system shutdown), every process in the scope's cgroup is sent SIGTERM (KillSignal=15), followed by SIGHUP (SendSIGHUP=yes). httpd treats SIGTERM as a request to stop immediately, terminating any requests in progress. Had the processes been started through httpd.service instead, they would have been stopped gracefully: ExecStop sends SIGWINCH to the main process, which httpd handles as a graceful stop, and KillSignal=SIGCONT keeps systemd from immediately following it with a SIGTERM.
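
In shell terms, stopping the session scope roughly amounts to the sequence below. This is a simplified, hypothetical illustration of KillMode=control-group with KillSignal=15 and SendSIGHUP=yes, not systemd's implementation: systemd also follows the kill signal with SIGCONT and, if processes remain after the stop timeout, sends SIGKILL; both steps are omitted here.

```shell
#!/bin/sh
# signal_like_scope_stop CGROUP_DIR   (hypothetical illustration)
# Every PID in CGROUP_DIR/cgroup.procs receives SIGTERM, then every PID
# receives SIGHUP. No SIGCONT, no TimeoutStopSec wait, no SIGKILL fallback.
signal_like_scope_stop() {
    local dir="$1" pids pid
    # Snapshot the member PIDs once, before any process reacts.
    pids=$(cat "$dir/cgroup.procs") || return 1
    for pid in $pids; do kill -TERM "$pid" 2>/dev/null; done
    for pid in $pids; do kill -HUP "$pid" 2>/dev/null; done
    return 0
}
```

Applied to the session-2.scope cgroup in the example, the httpd processes would receive SIGTERM and SIGHUP rather than the SIGWINCH their unit file arranges for.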

How to avoid non-graceful service shutdowns

To avoid this behaviour, use one of the following two strategies.

1 - Manage the service only through systemd-provided interfaces, such as systemctl start/stop/restart <service> or the underlying D-Bus API.

2 - Change the application's signal handling so that its processes shut down gracefully when they receive the SIGTERM and SIGHUP signals described above.
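
For strategy 2, the application has to treat SIGTERM, and the SIGHUP that follows it, as a request to finish cleanly. As a sketch only, here is what that looks like for a hypothetical long-running shell worker; a real daemon would do the equivalent with its own language's signal API. Note that many daemons, httpd included, normally use SIGHUP to restart, so handling it as a stop request is a design decision for the application's developers.

```shell
#!/bin/sh
# graceful_worker STATE_FILE   (hypothetical long-running worker)
# SIGTERM and SIGHUP only set a flag; the loop then stops after the current
# iteration and runs its shutdown code instead of dying mid-task.
graceful_worker() {
    state_file=$1
    stop_requested=0
    # Set the trap here: traps set in a parent shell are reset in subshells,
    # including a function started in the background with '&'.
    trap 'stop_requested=1' TERM HUP
    while [ "$stop_requested" -eq 0 ]; do
        # ... one unit of real work would go here ...
        # Sleep in the background and 'wait' for it: a trapped signal makes
        # 'wait' return immediately, whereas a foreground sleep would delay
        # the trap until the sleep finished.
        sleep 1 &
        wait "$!"
    done
    kill "$!" 2>/dev/null   # clean up a sleep that may still be running
    # Graceful shutdown: flush state, close connections, and so on.
    echo "stopped cleanly" > "$state_file"
}
```

Started with graceful_worker /tmp/worker.state &, sending it SIGTERM makes it exit with status 0 after writing its state file, instead of being killed by the default signal action.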

If the application exhibiting this behaviour is provided by a third party, it is recommended that an issue be raised with that vendor's support organization. This allows them to verify that the application is compatible with, and behaves as intended on, an operating system that uses systemd as its init system.

Additional details

Please see the following documentation:
