Rootless Containers

Rootless containers are containers that can be created, run, and otherwise managed by unprivileged users (as opposed to the root user). To be considered fully rootless, both the container runtime and the container must be running without root privileges.

An advantage to using rootless containers is that they can mitigate the risk of container-breakout vulnerabilities. The best way to prevent privilege-escalation attacks from within a container is to configure your container’s applications to run as unprivileged users. However, running in rootless mode introduces some additional complexity, especially concerning networking. For more details, see Container Networking.

User Namespaces

Rootless containers make use of a feature of the Linux kernel called user namespaces. User namespaces isolate security-related identifiers and attributes, in particular, user IDs (UIDs) and group IDs (GIDs), credentials, the root directory, keys, and capabilities. A process's user and group IDs can be different inside and outside a user namespace. With user namespaces, a range of user and group IDs in the process's user namespace is mapped to a set of user and group IDs in the parent namespace; this mapping is specified in the /etc/subuid and /etc/subgid files.

The /etc/subuid file authorizes a user ID to map ranges of user IDs from its namespace into child namespaces; the /etc/subgid file provides the same functionality for group IDs. Each line in /etc/subuid and /etc/subgid contains either a user name or group name, respectively, and a range of subordinate IDs that processes in the child namespace are allowed to use. Each entry has three colon-delimited fields:

  • user/group name or ID
  • first numerical subordinate ID in the range
  • numerical subordinate ID count

For example, this entry shows that for the user maria, the subordinate IDs start at 10001 and the range contains 65536 IDs:

maria:10001:65536
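As a minimal sketch of how the three fields fit together, the following shell function splits an entry on its colons and prints the first and last subordinate IDs that the entry authorizes. The function name subid_range is hypothetical and is used here only for illustration.

```shell
# subid_range is a hypothetical helper, not part of Podman or shadow-utils.
# It splits one /etc/subuid or /etc/subgid entry and prints the name, the
# first subordinate ID, and the last subordinate ID in the range.
subid_range() {
    local name start count
    IFS=: read -r name start count <<EOF
$1
EOF
    echo "$name $start $((start + count - 1))"
}

subid_range "maria:10001:65536"   # prints: maria 10001 75536
```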

User and Group ID Translation

Rootless mode executes the processes for the container runtime and containers inside a user namespace. A user ID inside the container maps to a user ID on the host as follows:

<start of subuid range> + <UID inside the container> - 1

Similarly, the group ID of the container user maps to:

<start of subgid range> + <GID inside the container> - 1

The exception to this is that the root user (UID=0) maps to the UID that owns the user namespace.
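The translation rule and its exception can be sketched as a small shell function. This is an illustration under stated assumptions, not a Podman utility: the name host_id is hypothetical, and the function does not check whether the ID falls within the subordinate range.

```shell
# host_id is a hypothetical helper that applies the translation described
# above. Arguments: <start of subordinate range> <ID that owns the user
# namespace> <ID inside the container>.
host_id() {
    local range_start=$1 owner_id=$2 container_id=$3
    if [ "$container_id" -eq 0 ]; then
        # Root inside the namespace maps to the ID that owns the namespace.
        echo "$owner_id"
    else
        echo $((range_start + container_id - 1))
    fi
}

host_id 10001 1600 0      # prints: 1600
host_id 10001 1600 1000   # prints: 11000
```

The sample values correspond to a user with UID 1600 and the entry maria:10001:65536.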

You must carefully consider this mapping when you are setting the ownership of the data volume directories. Podman includes an unshare utility that makes setting the directory ownership less confusing (and does not require the user running the commands to use sudo). For more information, see Managing Storage for Container Images.

For example, suppose Solly has a UID of 1000. His account has a mapping in /etc/subuid of solly:120000:65536. This means that user IDs inside Solly's user namespace can be mapped to a range of 65536 IDs on the host, starting with UID 120000:

$ cat /etc/subuid
solly:120000:65536

Solly is running Podman with this UID mapping. When he launches Bash inside his Podman container, and then examines the current user, he sees the following:

$ podman run --rm -it ubuntu bash
root@bfda7167e840:/# id
uid=0(root) gid=0(root) groups=0(root)

This shows that Bash is running as the root user (0) inside the user namespace.

He then examines the uid_map file:

root@bfda7167e840:/# cat /proc/self/uid_map
         0       1000          1
         1     120000      65536

This shows that:

  • the root user (0) in the user namespace is mapped to Solly's UID (1000) on the host
  • user ID 1 in the namespace is mapped to UID 120000 on the host
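Each uid_map line has three columns: the first ID inside the namespace, the first ID on the host, and the number of IDs in the range. As a rough sketch, the lookup can be reproduced with awk. The function name ns_to_host_uid and the variable sample_map are hypothetical; inside a real container you could feed the function /proc/self/uid_map instead.

```shell
# ns_to_host_uid is a hypothetical helper. It reads uid_map-formatted lines
# (<first ID in namespace> <first ID on host> <count>) on standard input and
# prints the host UID for the namespace UID given as the first argument.
# It exits with a non-zero status if no line covers that UID.
ns_to_host_uid() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# The sample map matches the uid_map output shown above.
sample_map='0 1000 1
1 120000 65536'

printf '%s\n' "$sample_map" | ns_to_host_uid 0   # prints: 1000
printf '%s\n' "$sample_map" | ns_to_host_uid 5   # prints: 120004
```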

The following diagram shows another example where the container runtime is launched by Maria, who has user ID 1600. The /etc/subuid file contains the following mapping:

maria:10001:65536

This means that processes within Maria's user namespace map to host UIDs starting at 10001. Because the container runtime launches the containers, the container processes also belong to Maria's user namespace.

In this example:

  • the root user (0) in the user namespace is mapped to Maria's UID (1600) on the host
  • user ID 1000 in the user namespace is mapped to UID 11000 on the host

The default user ID in a Solace container is 1000001. That is, if you do not specify the -u parameter, the container runs with UID=1000001. In a rootless environment, the container will not run because UID 1000001 falls outside the range of IDs that the OS maps into the user namespace (65536 subordinate IDs in the examples above). To resolve this, specify a -u parameter with a smaller non-zero user ID when you start the container.
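The following sketch shows one way to check whether a container UID can exist in a rootless user namespace. It assumes the layout from the uid_map example, where UID 0 maps to the namespace owner and UIDs 1 through the subordinate ID count map to the subordinate range. The function name uid_is_mapped is hypothetical.

```shell
# uid_is_mapped is a hypothetical helper. It succeeds when a container UID
# falls inside a user namespace that maps UID 0 plus <count> subordinate IDs
# (namespace UIDs 1 through <count>).
uid_is_mapped() {
    local container_uid=$1 subid_count=$2
    [ "$container_uid" -ge 0 ] && [ "$container_uid" -le "$subid_count" ]
}

uid_is_mapped 1000001 65536 || echo "UID 1000001 is not mapped; pass -u with a smaller UID"
uid_is_mapped 5 65536 && echo "UID 5 is mapped"
```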

Resource Limit Configuration

The resource constraints that an unprivileged user can impose on a container are limited by the constraints assigned to the user. Any changes in the limits assigned to the unprivileged user must be made by a privileged user.

For instance, on the WSL2 Ubuntu 20.04 LTS distribution, the default hard limit for the maximum number of concurrent open files for an unprivileged user (and therefore for a container created by an unprivileged user) is 4096. Using this default value means that the number of PubSub+ client connections is constrained to less than the configured maximum (because 4096 is less than the recommended limit for the Maximum Number of Client Connections).

To allow an unprivileged user to create a container with --ulimit nofile=2448:42192, the root user must modify the nofile hard limit configuration for the user in the /etc/security/limits.conf file.
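As a hedged sketch, an unprivileged user could check the current hard limit before starting the container. The function name nofile_fits is hypothetical, and the limits.conf line in the comment uses solly only as an example user name; the exact entry depends on your system.

```shell
# nofile_fits is a hypothetical helper. It succeeds when the requested nofile
# hard limit does not exceed the user's current hard limit.
nofile_fits() {
    local requested=$1 current=$2
    [ "$current" = "unlimited" ] || [ "$requested" -le "$current" ]
}

# ulimit -Hn reports the current hard limit on open files for this shell.
if nofile_fits 42192 "$(ulimit -Hn)"; then
    echo "hard limit allows --ulimit nofile=2448:42192"
else
    # A privileged user could add a line such as the following to
    # /etc/security/limits.conf (the user name solly is only an example):
    #   solly  hard  nofile  42192
    echo "hard limit too low; ask a privileged user to raise it"
fi
```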

Directory and File Ownership

A PubSub+ container is designed to work with directories and files that are owned by the container user and the root group. As described in Managing Software Event Broker Storage, the software broker makes use of a storage-group to maintain state information. We recommend that this storage-group be kept in external storage, and mounted to the container as a volume (preferred) or bind mount.

To ensure that the software broker container has the required permissions to access the storage-group, you must modify the ownership of the persistent storage using the podman unshare command on the host:

podman unshare chown <container user's uid>:0 -R <directory>

The podman unshare command lets you run a command (chown in this case) in the same user namespace as your containers. Because all rootless containers that are run by a given user run inside the same user namespace, you only need to run podman unshare chown once to allow all of a user's containers to access a directory.

For example, for a container user with UID 5 and a storage-group mounted as a volume, run chown with podman unshare to change the directory owner within the container's user namespace:

$ podman unshare chown 5:0 -R /home/ec2-user/.local/share/containers/storage/volumes/solace/_data

Now run ls with podman unshare to view the directory owner within the container's user namespace:

$ podman unshare ls -laZ /home/ec2-user/.local/share/containers/storage/volumes/solace/_data
drwxrwxrwx. 9 5 root system_u:object_r:container_file_t:s0 165 Feb 24 14:40 _data

Run ls again, this time without podman unshare, to view the directory owner from the perspective of the host namespace:

$ ls -laZ /home/ec2-user/.local/share/containers/storage/volumes/solace/_data
drwxrwxrwx. 9 100004 ec2-user system_u:object_r:container_file_t:s0 165 Feb 24 14:40 _data

Podman resets the ownership of directories and files in volume mounts when it starts the rootless container for the first time. Therefore, run the podman unshare command after you start the container for the first time. Alternatively, create an empty directory and bind mount it; in this case, the correct directory and file ownership is assigned automatically.

Rootful Versus Rootless Containers

There are four possible variants for running containers with a combination of root and non-root users, as shown in the table below:

  • The container runtime is executed as root (left two scenarios) versus non-root (right two scenarios)
  • The user inside the container is root (top two scenarios) versus non-root (bottom two scenarios)

The most secure solution is the bottom-right scenario, where the container runtime is run as non-root, and the user inside the container is also non-root.

The examples in the table use Podman to illustrate the four scenarios:

Container Runtime Executed as Root

Processes in Container Executed as Root

Launch Podman as root, and specify the root user (-u 0) within the container:

brian@ubuntu:/$ sudo bash
root@ubuntu:/# whoami
> root
root@ubuntu:/# podman run -u 0 ... solace-container

Run bash inside the container:

root@ubuntu:/# podman exec -it solace-container bash

Processes run as root inside the container:

whoami
> root
ps auxf
> # Notice processes are running as root

The same processes run as root outside the container from the perspective of the host:

root@ubuntu:/# ps auxf
> # Notice all container processes are running as root

Container Runtime Executed as Non-Root

Processes in Container Executed as Root

Launch Podman as an unprivileged user, and specify the root user (-u 0) within the container:

brian@ubuntu:/$ whoami
> brian (UID 1000)

brian@ubuntu:/$ podman run -u 0 ... solace-container

Run bash inside the container:

brian@ubuntu:/$ podman exec -it solace-container bash

Processes run as root inside the container:

whoami
> root
ps auxf
> # Notice processes are running as root

The same processes run as UID 1000 (brian) outside the container from the perspective of the host (because of user namespace mapping):

brian@ubuntu:/$ ps auxf
> # Notice all container processes are running as user 1000 (brian)

Container Runtime Executed as Root

Processes in Container Executed as Non-Root

Launch Podman as root, and do not specify a user within the container:

brian@ubuntu:/$ sudo bash
root@ubuntu:/# whoami
> root
root@ubuntu:/# podman run ... solace-container

Run bash inside the container:

root@ubuntu:/# podman exec -it solace-container bash

When you don't specify a user, container processes run as user 1000001 (the default User ID in a Solace container is 1000001):

whoami
> 1000001
ps auxf
> # Notice processes are running as user 1000001

The same processes run as user 1000001 outside the container from the perspective of the host (no user namespace mapping because the container runtime runs as root):

root@ubuntu:/# ps auxf
> # Notice all container processes are running as user 1000001

Container Runtime Executed as Non-Root

Processes in Container Executed as Non-Root

Launch Podman as a non-root user, and specify a non-root user within the container:

brian@ubuntu:/$ whoami
> brian (UID 1000)

brian@ubuntu:/$ podman run -u 5 ... solace-container

Run bash inside the container:

brian@ubuntu:/$ podman exec -it solace-container bash

Processes run as user 5 inside the container:

whoami
> 5
ps auxf
> # Notice processes are running as user 5

The same processes run as UID 100004 outside the container from the perspective of the host (because of user namespace mapping):

brian@ubuntu:/$ ps auxf
> # Notice all container processes are running as user 100004

Prerequisites for Rootless Containers

Rootless mode relies on the resource isolation mechanisms of the host operating system. Therefore, certain configurations must be in place before you attempt to launch a container in rootless mode. For details about the requirements for Podman, see Basic Setup and Use of Podman in a Rootless environment. Other container engines have similar requirements.

In RHEL 8 and other popular Linux distributions, the packages that include Podman also install all of the prerequisites. For example, if you have installed the RHEL 8 container-tools package, you do not need to install any additional packages.

Next Steps

For more information about working with rootless containers, see the following: