Polyinstantiation of directories in an SE Linux system

Russell Coker[HREF1], Coker Software russell@coker.com.au



This paper describes the problems related to shared directories such as /tmp and /var/tmp as well as problems related to having multiple SE Linux security contexts used for accessing a single home directory. It then provides detailed information on the solution to this problem that has been implemented with polyinstantiated directories by using the pam_namespace module.


It is a long-standing Unix tradition that the directories /tmp and /var/tmp are used for temporary storage by all programs and on behalf of all users. This used to not be considered a problem, however in recent times it has been recognised that the use of such a shared directory is vulnerable to race-condition attacks with symbolic links.

Another problem is that in some situations a file name may convey secret information. If the file in question is in a public directory such as /tmp or /var/tmp (which may be an unintended result of a command by the user) then this will represent an information leak if there are any less privileged processes running on the machine.

Past attempts to deal with these problems have included restrictions on creating sym-links and hiding file names, which have both been inadequate. The solution chosen for use with SE Linux (which is also designed to work without SE Linux) is to have polyinstantiated directories based on Unix account name and/or SE Linux context. This means that every user will see a different version of the directory in question based on their context.

In the past this feature has been implemented as part of Multi-Level Security (Dr. Rick Smith [HREF2]) systems under the name multi-level directories. I believe that the multi-level directory variant of this solution was based on file system support, while the Linux support for this type of operation that I will describe is based in the VFS layer and thus does not require modification to any of the file systems that may be used.

Summary of Attacks that can be Prevented by Poly-Instantiated Directories

In this paper I am considering the following attack scenarios:
  1. Attack by user on user (including the case of a non-PI user as attacker or victim)
  2. Attack by user on daemon (including the case of a non-PI user as attacker)
  3. Attack by non-root daemon on user
  4. Attack by root daemon on user (will always succeed without SE Linux)

Each of the above four attack scenarios may occur with one of the following three attacks:

  1. Race-condition attacks on the integrity of processes and data (sym-link attacks, race conditions on renaming objects, or pre-creating a file to take ownership of data)
  2. Leaks of confidential data via secrets in file names
  3. Denial Of Service (DOS) attacks based on race conditions and pre-allocating file/directory names

Other Solutions

One attempt at solving this problem that has been implemented in some Linux security systems is to hide file names. This can work as long as it is not possible to guess any of the file names in question. If the file name can be guessed then the hostile party can attempt to create a new file of the same name, failure to create the file in question indicates existence.
But this only solves the problem of secret data in file names.

Another partial attempt at dealing with this problem is controlling the ability to create hard-links and/or sym-links to try and prevent race conditions. A well-known implementation of this is in the OpenWall kernel patch [HREF3] which prevents the user from creating hard-links to files to which they have no write access and from creating sym-links in a +t directory (a directory such as /tmp or /var/tmp) which point to a file that they don't own. It also prevents writing to named pipes in +t directories which are owned by a different user. This deals with some of the issues related to race-condition attacks but there are potential issues that it does not address, such as a hostile user creating sym-links to their own files to divert output or creating a file with no write permissions as a denial of service against a program that uses a fixed file name.
But this only deals with the case of race conditions used to attack system integrity. It does not prevent DOS attacks or protect secret data when it is used in a file name.

SE Linux Requirements for Shared Directories

SE Linux does not attempt to hide file names in a directory, if the name of a file contains secret data then this can be a security problem on shared directories such as /tmp and /var/tmp, this is an issue that has to be solved outside of the core SE Linux code base.

The SE Linux strict and mls policies provide good protection against most race condition attacks. Most domains are not permitted to create hard links to privileged files (types such as etc_t). Daemons are all protected from sym-link attacks by each other due to being denied access to sym-links created by other daemons and by users, and users are given similar protection against attack by daemons (both root and non-root). The main benefit for PI directories in strict and mls SE Linux systems is for protection against users attacking other users, in most cases large numbers of users will have the same SE Linux domain and therefore there will not be any effective protection against such attacks in the domain-type model (the integrity protection part of SE Linux).

When a Unix account is associated with more than one SE Linux context it is necessary to have multiple instances of the home directory to match the SE Linux context. If there is only one instance of the home directory and different SE Linux contexts are used for user logins then one of the contexts may be denied access to shared files such as .bashrc and .bash_history, or they may serve as information leaks. This use creates a requirement for PI home directories in SE Linux that does not exist for non-SE systems.

The problem of multiple logins with different contexts can occur in the older version of SE Linux (known as the example policy) that was used in Red Hat Enterprise Linux 4 and Fedora Core versions 2 to 4 when running the strict policy that permits multiple roles to be allocated to a user. But this is more of an issue with the newer versions of SE Linux policy that have functional support for MLS labels and the new MCS policy that permits different sets of categories to be assigned to a user session.

Non-SE Linux Requirements for Shared Directories

Polyinstantiation of shared directories also provides benefits for non-SE Linux systems, in fact there are probably more benefits to be gained from using this on non-SE systems. The SE Linux strict policy provides protection against sym-link race condition attacks launched by users against users in different roles, attacks by users against daemons, and attacks by daemons against users. The SE Linux MLS policy provides these benefits and also protects against attack from programs running at different levels, for example a process running at sensitivity level s2 could not be tricked into leaking data to a program running at level s1, even if the two programs ran in the same domain and with the same UID. Also SE Linux prevents unprivileged processes from creating hard links to files that are important to system integrity or data confidentiality (which is almost a complete solution to hard-link based attacks).

A non-SE system has none of the above protections and only has the Unix UID to protect both system integrity and confidentiality of data.

Linux Kernel Support for Poly-Instantiated Directories

In recent versions of Linux the current list of mounted file systems is available from the /proc/mounts file which is a sym-link to /proc/self/mounts, this permits displaying the name-space which applies to the current process. If /etc/mtab is a sym-link to /proc/mounts then programs such as df will display information on the mount points that are associated with the name-space for the process.

The initial support for PI directories was via the CLONE_NEWNS flag to the clone() system call. This flag causes the child process to be allocated a separate name space. That process and each child process that it launched would have a separate name space to the process which called clone(), and to any process that resulted from another call to clone() with the CLONE_NEWNS flag. The problem with this was the requirement that applications be modified to use clone() with this flag instead of using fork().

To solve this problem a new system call sys_unshare [HREF4] was added to the Linux kernel. The unshare system call can create a separate name-space for mounted file systems among other things (the set of kernel datra structures that can be unshared has been steadily increased since the introduction of unshare).

The unshare system call requires the SYS_ADMIN capability but does not require a fork, exec, or other operation. So it can be called from a PAM module and thus work with unmodified login programs. Also it is possible for multiple PAM modules to unshare different kernel data structures.

Shared Subtrees

One obvious problem with the functionality described in the previous section is the situation where the administrator wants to mount file systems and have all users see them, or have daemons mount file systems (such as autofs).

The solution to this is a development known as Shared Subtrees [HREF5]. This gives the option of specifying that certain subtrees will not be shared. For example if the directories /tmp and /var/tmp are being instantiated then the following commands could be run from a system boot script to cause all other mount operations to propagate to all users:

mount --make-shared /
mount --bind /tmp /tmp
mount --make-private /tmp
mount --bind /var/tmp /var/tmp
mount --make-private /var/tmp
The above commands make the root of the name-space shared and then make /tmp and /var/tmp private. Note that the --make-private option to the mount command only applies to mount points. As on my test system both /tmp and /var/tmp are on the root file system I have to bind mount them to themselves to have a mount point that can be made private. Be aware that if you don't correctly exclude the PI directories from the shared name space then each user who logs in may get PI directories under another user's directories, and things generally won't work.

Design Overview of PI Directories in Linux

The initial design for PI directories was based on having them only created for user sessions at login time by PAM [HREF6] or similar mechanisms. To implement this the PAM module will create a directory under the directory that is being instantiated, create an unshared name space, and then bind mount the new directory over the PI directory. For example if /tmp is to be PI for user rjc then the directory /tmp/tmp.inst-rjc-rjc would be created as the instance of /tmp for the user rjc. After the directory is created an unshared name space would be created via the unshare system call. Finally in the new name space a bind mount would be used to replace /tmp with /tmp/tmp.inst-rjc-rjc, the bind mount operation would be equivalent to the command:
mount --bind /tmp/tmp.inst-rjc-rjc /tmp
The directory that was created was given the Unix permission mode 1777 (all users can create files and directories, but it is only permitted to remove files or directories that you own). This solved many of the problems related to users attacking users and users attacking daemons. But it does not solve the problem of a daemon attacking a user as the daemon has access to the parent of the PI directories. Also there is a configuration option to have a user excluded from the PI directory system, a user who is granted such access (either deliberately or accidentally) would also be able to attack other users. As all directories were created under /tmp with mode 1777 there was no protection of secret file names from daemons and users who were outside the PI system (for most systems I expect that there will be some users who will be excluded from the PI configuration).

Another problem with the initial implementation was that the directories were all created at login time, therefore a hostile process could guess the names and pre-allocate directories to allow taking over ownership and potentially allowing other race condition attacks. For example any privileged process which relies on files not being unlinked or renamed for correct operation would operate incorrectly (and possibly be subject to attack) if run in a situation where the /tmp directory did not have the mode 1777 to prevent such rename and unlink operations.

Finally the initial implementation did not have a fall-back case for when the desired name for a PI directory had been taken by a file and would cause the login process to abort, this could be used as a DOS attack against user login sessions.

I have identified two possible solutions to the problem of DOS attacks against the pam_namespace module. One solution is to have it check whether the PI directory already exists, if it exists but has the wrong permissions (either Unix or SE Linux) or if there is an object other than a directory using that name then it would try creating the directory under a different name (maybe the original name with ".1" appended) and keep trying different names until it finds one that is available. This solution does not solve the problem of protecting secret file names.

The other solution I have identified solves the problems of DOS attacks and race conditions as well as the leaks of secret data in file names. This requires that a directory be pre-allocated on the system to contain all PI directories. So instead of a PI directory having the name /tmp/tmp.inst-rjc-rjc it might have the name /tmp/.inst/tmp.inst-rjc-rjc. The /tmp/.inst directory would be created and/or verified at system boot time and would have Unix permission mode 000 (the capability dac_override which every login program posesses would be requred to access it) and would also have a SE Linux context that permits only very restrictive access. Therefore non-root daemons will be denied access to /tmp/.inst and therefore would not be able to launch attacks on users via the /tmp directory. On SE Linux systems root daemons will also be denied such access. If a user session is launched with a shared system name space (through misconfiguration or unusual requirements) then they would also be denied access to the instance of /tmp used by other users.

In the initial design of PI directories the aim was to confine users to prevent them from attacking the rest of the system, and such a confined user was still vulnerable to attack from outside. The second of the two solutions that I propose above is the one that I believe to be the best, it will protect users who have a PI version of a shared directory from attack by non-root daemons on a non-SE system and from attack by root daemons as well on a SE Linux system. It would be a viable option for the sys-admin to give a single user a PI version of /tmp to protect the files for that user while allowing all other users access to the system shared name space.

At the time of writing we have agreement on the concept of using a naming system somewhat like /tmp/.inst/tmp.inst-something where the directory /tmp/.inst will have Unix mode 000 and restrictive SE Linux access controls. This will prevent daemons and users that are not included in the PI configuration from attacking daemons and users that have it enabled. This makes PI protect the user who has such a PI directory as well as protecting the rest of the system from that user. Note that at the time of writing there was no final agreement on the directory names, while the concept of a two-level directory is agreed the actual name of the directory in the default configuration is still to be resolved.

A feature that has been discussed and agreed in concept is to have the pam_namespace.so module check the permissions of the /tmp/.inst directory and abort the login process if the directory does not have Unix permission mode 000, root ownership, and a suitable SE Linux label (if SE Linux is enabled). There will be a configuration option to disable this functionality as not all systems will need this level of protection (and not all administrators will want a system to fail-closed on such a minor security issue).

Currently Released Code

As of the time of writing Fedora Rawhide has a shared object named pam_namespace.so that implements the basic functionality. To use it the PAM configuration files in the /etc/pam.d directory must be modified to have the following line at the end:
session    required     pam_namespace.so
The system will work if the pam_namespace object is not the last in the list, but the creation of the namespace may interfere with some other PAM modules (for example if a PAM module wanted to access files in the /tmp directory) and in general it is safest to have it last. The only situation in which you might not want to have pam_namespace as the last session module is if you are using pam_mkhomedir and also using pam_namespace to provide PI home directories. But currently pam_mkhomedir does not work correctly in situations where PI home directories are desired so this should not be an issue.

The most noteworthy parameter for the pam_namespace module is the optional parameter unmnt_remnt. This is used by programs that run from an unshared namespace and need to create another unshared namespace. The primary example of this is su, all other programs that perform actions which are similar in concept (IE they are run from a user session and launch a new session on behalf of another user) will have the same requirement.

The pam_namespace module uses the configuration file /etc/security/namespace.conf. This file currently has four parameters, the first gives a directory that should be instantiated (there is an option of $HOME for instantiating the user home directory). The second is the name of the real directory to be used for the instance which has variables $USER and $HOME to represent the user-name and the home directory of the user. The third parameter may have as it's value user, context, or both to indicate whether the instantiation should be based on user-name, SE Linux context, or both. The final parameter is a list of comma-separated user-names for accounts that are exempt from poly-instantiation of the directory in question. I believe that it will be standard practice to include root in this list of accounts (usually there will be no need for other users to be excluded).

Preventing Daemons From Attacking Each Other

In the currently released code there is no protection against daemons attacking each other. I believe that to take advantage of the full benefits offered by PI directories most daemons that run as non-root need the same protection so that they can not attack each other.

In Fedora there is a new program called runuser that will start a daemon as a user other than root. It is linked against PAM and can be configured to call the pam_namespace module. When I finish the debugging then every time it launches a daemon as non-root it will be able to create a new unshared namespace. Non-root daemons that require the system shared name space will need to have their user-names specified in the namespace.conf file.

In Debian daemons are started via a program named start-stop-daemon. I plan to modify this program to have the necessary name-space functionality.

Interesting Features

There is no requirement that the PI directory be a sub-directory of the directory it replaces. In fact it can be on a different filesystem. If you have a separate filesystem for /tmp and don't want to have a separate filesystem for /var/tmp too then you could just configure namespace.conf such that /var/tmp is instantiated under /tmp. The following is one sample configuration:
/tmp     /tmp/.inst/tmp.inst-$USER-       both      rjc,root
/var/tmp /tmp/.inst/var-tmp.inst-$USER-   both      rjc,root


With a two-level directory configuration (such as /tmp/.inst/whatever) we protect against all the attack scenarios that I consider (including an attack launched by a root-owned daemon on users when running SE Linux). The protection provided by the PI shared directory works in two ways, it protects the process with the unshared namespace and it also protects all other processes on the system against attack from that process.

Most operations that are described in this paper are usable in Fedora Rawhide as of the 31st of May 2006. The only operation that is not usable at this time is PI support for daemons. I hope to have PI working in Debian and have PI support for daemons in both Debian and Fedora Rawhide by the time this paper is published, I will describe my success in these efforts when I present this paper.

Hypertext References

HREF1 http://www.coker.com.au/selinux/
HREF2 http://www.cs.stthomas.edu/faculty/resmith/r/mls/index.html
HREF3 http://www.openwall.com/linux/README.shtml
HREF4 http://marc2.theaimsgroup.com/?l=linux-kernel&m=112350785026703&w=2
HREF5 http://lwn.net/Articles/159092/
HREF6 http://www.kernel.org/pub/linux/libs/pam/


The System Administrators Guild of Australia© 2006. The authors assign to The System Administrator's Guild of Australia and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to The System Administrators Guild of Australia to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.