Design principles of sysmask

The design of sysmask is based on the following "obvious" observations.

OBSERVATION 1. To err is human. The more complicated a program is, the more errors it may contain. It is unrealistic to demand that every program and every library be free of vulnerabilities.

OBSERVATION 2. A running system is composed of active processes. In order to compromise a system, the attacker must first compromise a vulnerable process it has access to, then use it to compromise the sensitive data of the system as a whole.

OBSERVATION 3. Under a multi-tasking operating system, each process runs in a separate memory space. If the memory management and scheduling code of the CPU and the operating system is error-free (we may assume this without taking too much risk), and if no direct input/output or inter-process communication channel is open to the process (this is a controllable situation), the only way for the process to communicate with the outside world is through one of the system calls provided by the operating system. For example, if the process is denied access to the system calls, it has no means of doing either good or harm, except wasting some system resources by its presence.

OBSERVATION 4. It is extremely costly, if not impossible, to make sure that general processes are bug-free. These processes usually start from highly sophisticated programs and depend on various dynamic libraries, each of them developed by a different team.

OBSERVATION 5. In general, the less frequently a piece of code is used, the higher the risk that it contains bugs.

OBSERVATION 6. Access restrictions for a process that can be worked around in one way or another are not a good security measure. Both in theory and in practice, one has to assume that everything a process can do "legitimately", the process compromised by a vulnerability will also be able to do. This includes things like executing a suid binary or clearing a soft capability restriction.
________________________________

In view of these observations, the basic design principle of sysmask is to install security checkpoints in such a way that:

1. The implementation is as simple as possible, for the more complicated it is, the higher the risk of bugs in the security package itself. Complexity also brings about human errors in configuration, which is a non-negligible factor.

2. The integrity of sensitive system data can be preserved even under the assumption that general processes will do every bad thing they are allowed to. That is, the system security is made insensitive to vulnerabilities in processes.

3. The system integrity can also be preserved against bugs in large parts of the kernel itself, especially in code that is rarely used.

----------------------------------

The most efficient place to install such a checkpoint is the interrupt handling routine leading to the processing of system calls. This is where sysmask installs its main piece: a facility for selectively denying access to system calls according to a set of mask bits (the first mask set) that is set up individually for each process. The mask set is designed to be not too fine-grained (it has only 32 bits), while allowing the denial of as many legitimately unneeded syscalls as possible.

The security meaning of this design is twofold. Firstly, the outright denial of services like link(), fork(), execve(), socketcall() or mount() whenever possible is the best way to ensure that the process cannot do harm in the corresponding way. Secondly, by closing rarely used syscalls, one neutralizes a major source of kernel vulnerabilities. Vulnerabilities are more often discovered behind rarely used syscalls, and precisely because these syscalls are rarely used, they can more often be closed to processes without disrupting their normal work.

To make things work better, a secondary mask set is added.
It provides miscellaneous restrictions covering access to device drivers (dev), access to the proc file system (procfs), the right to leave child processes behind when the parent dies (orphan), the right to execute suid binaries as suid (suid), etc. A special mask is also reserved for a future extension allowing the prohibition of executing writable memory areas.

Moreover, in line with Observation 6, no possibility is provided for the process to regain any denied access by any action of its own, including executing a suid binary or changing its uid (when not masked). For processes that legitimately need to change access rights, a special daemon process must be called. The latter will consult its configuration, which defines who can do what and under which circumstances.

The secondary mask set can also be set to make the daemon check the file and network access rights of the process. Unlike some other security packages, the file access check is text based: it is performed on the pathname submitted by the process and BEFORE the kernel path_lookup() takes place. The basic reason behind this choice is that path_lookup() dives deeply into the file system management code. This code makes up a large chunk, and as long as one cannot prove that there are no bugs or even trojan horses in it, there is the risk that some specially crafted filename may trigger a bug to do unexpected things. In particular, filenames containing special characters such as '*', '?', quotes, commas etc. are rarely used, so the robustness of the kernel against such filenames might not be sufficiently tested. Even apparently innocent syscalls like stat() are exposed to such a risk. The design of sysmask provides the possibility of preventing such filenames from leaking into path_lookup(). The most exposed processes can even be configured to accept only an explicitly defined list of filenames, thus totally eliminating the above risk for them.
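The text-based check described above can be illustrated by a minimal sketch. This is not the sysmask implementation or its configuration syntax: the function name pathname_allowed, the parameter allowed_names and the exact character set are assumptions made for illustration. The point is only that the decision is taken on the text of the pathname, before any kernel lookup would happen.

```python
# Hypothetical sketch: none of these names come from sysmask itself.
# The character set is an assumption based on the examples in the text
# ('*', '?', quotes, comma).
SUSPICIOUS_CHARS = set("*?'\",")


def pathname_allowed(pathname, allowed_names=None):
    """Decide from the pathname text alone whether it may reach the lookup."""
    # Strict mode for the most exposed processes: only an explicit list passes.
    if allowed_names is not None:
        return pathname in allowed_names
    # Otherwise refuse any name containing a rarely used special character.
    return not any(ch in SUSPICIOUS_CHARS for ch in pathname)
```

With this sketch, pathname_allowed("/tmp/a*b") is refused, while a process restricted to the list {"/var/www/index.html"} can name that file and nothing else.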
This text-based design also has the advantage that the access right configuration is kept centrally, which is obviously a plus for security. The configuration can be made very precise, with conditions combining wildcarded pathnames, process uid, type of the operation (for example open() versus chdir()), and user-definable process status. The text-based configuration also makes runtime reconfiguration with immediate effect possible.

The system is also designed to work in a completely coercive way, that is, without requiring cooperation from processes. This makes it possible to set up a fully protected system, both against system risks (intrusions) and user risks (viruses, trojans), with only one system file modified, /etc/inittab, and no modification at all to existing software.

To do so, one has only to tell /etc/inittab to attach a "token" to each process created by init. The token is a kind of "uid for sysmask" that can be changed by the daemon smkd under predefined circumstances, but is otherwise inherited by child processes. For example, the first child process of init, the rc daemon dispatcher, will have a token "init". When it launches sshd, the configuration may tell smkd to change the token to "sshd". Under the new token, each time a user passes the authentication, a user token will be assigned to the user's login shell. In this way, different users may have different access rights according to the wishes of the sysadmin. Moreover, /etc/inittab puts a token "tty" on the console manager of the system. Under this token, a user who logs in from the console may get a different token than from sshd, conceivably with more access rights.

All the access restrictions imposed by sysmask are put above (and usually before) any other access policies otherwise defined for the system, including the file attribute checks, capabilities, or even other security packages. The principle is that sysmask only restricts access rights but never gives more.
There are only two exceptions to this principle.

1. When a tokened process executes another binary, the sysmask configuration may let it change its euid, in the same way as a suid binary, but here the binary need not be suid, nor need the given euid be equal to the owner of the file. The objective is to reduce the number of suid binaries in the system (which are always a source of trouble for security). The approach is also more versatile and more secure.

2. When the "orphan" mask is set, a process can kill its child processes even if the latter are running under a different uid. This is obviously necessary for the orphan mask to work, and this mask is a very important measure against potential DoS attacks.

--------------------------------------------------------

It must be noted that filename based access control requires careful control of symbolic links, hard links and mounts in order to avoid pitfalls. Sysmask incorporates several facilities for managing this control: besides hard masks for link, mount and chroot, there is the "follow" option, and the refusal options "hardlink", "symlink", "backup" etc. Special uids can also be used to impose auxiliary access restrictions.

On the other hand, the possibility for linked files to trigger different actions of the kernel based on the called name is a very useful feature. Sysadmins can use this feature to establish custom authentication methods that are very hard for an attacker to guess.

--------------------------------------------------------

Difficult choices have been made in the classification of system calls in order to define the action of each mask. Our design principle is to keep the number of masks as small as possible, not only for performance, but above all to make the configuration as simple as possible, because complexity in the configuration is an enemy of security: it tends to generate human errors by sysadmins. Therefore, we have limited ourselves to fewer than 40 hard masks for more than 270 system calls.
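The grouping of system calls under a small number of per-process mask bits can be sketched as follows. This is an illustration only: the bit positions, the table SYSCALL_MASK and the function names are assumptions, not the layout used by the sysmask kernel patch. Only the mask names (link, mount, chroot, nonstd) and the placement of sendfile under nonstd are taken from this document.

```python
# Illustrative sketch only: bit positions and function names are assumptions.
MASK_BITS = {"link": 0, "mount": 1, "chroot": 2, "nonstd": 3}

# Many system calls may share one mask; this small table is hypothetical.
SYSCALL_MASK = {
    "link": "link",
    "mount": "mount",
    "chroot": "chroot",
    "sendfile": "nonstd",
}


def mask_value(*names):
    """Build a mask word with the bits for the given mask names set."""
    value = 0
    for name in names:
        value |= 1 << MASK_BITS[name]
    return value


def syscall_denied(process_mask, syscall):
    """Checkpoint at syscall entry: deny when the governing mask bit is set."""
    mask_name = SYSCALL_MASK.get(syscall)
    if mask_name is None:
        return False  # syscall not covered by any mask in this toy table
    return bool(process_mask & (1 << MASK_BITS[mask_name]))


def add_restrictions(process_mask, extra):
    """Masks only accumulate: bits are OR-ed in and never cleared."""
    return process_mask | extra
```

A process started with mask_value("mount", "chroot") is refused mount and chroot but can still call link. Because add_restrictions only ever sets bits, the process has no way to regain a denied call through this path.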
Inevitably, many system calls are sometimes grouped under one mask. The usual drawback is that when one of these system calls is used by a process, the mask has to be unset and the process is exposed to potential vulnerabilities in the whole set of system calls. It is for this reason that the possibility of declaring excepted system calls is added to the design. If only one or two system calls are used within a large group, one can still mask the group, while at the same time declaring the used system calls as exceptions.

One example is Apache, which requires exactly one call (sendfile) within the biggest mask group, nonstd (Linux-only system calls). By declaring sendfile as an exception, the rest of the group can be masked. This is important because the system calls behind nonstd are generally not frequently used, and therefore carry a relatively high risk of privilege elevation.

Only 4 excepted system calls are available in the setup. This is enough under most circumstances, and the small number is chosen in order not to encourage people to write overly complicated definitions. In the rare cases where a process needs more exceptions, token-based flexible exceptions can be installed with no limit on their number.

---------------------------------------------------------

Resource consumption control.

Unix-like systems have a traditional resource consumption control mechanism via the rlimit variables. However, this mechanism is not designed to fight against deliberate attacks, and it does not meet the requirement of sysmask that modification of the limitations be centrally controlled. Therefore, a new set of resource limitations is incorporated into sysmask. This set of limitations is independent of the existing mechanism.

In order for the limitations to be effective, the concept of token session is introduced.
All processes forked by a tokened process without token switching are grouped within the same token session, and resource limitation is computed over all processes within the session. This allows an effective defense against forking attacks.

Several new limitations are introduced. For example, time2live controls how long a token session can stay alive, preventing memory consumption attacks by processes that sleep indefinitely. Forks can be limited by their number, depth and frequency.

On the other hand, sysmask does not control core dumps through a size limit, but through a mask (core) that prohibits core dumps.

---------------------------------------------------------

Details about some design choices.

* Pathname check under chroot. I have chosen to check the pathname relative to the root of the process, not that of the daemon smkd. There are several reasons. Firstly, it is theoretically not always possible to do it the other way. Secondly, the other way may create confusion and complication when the chrooted root of the process is dynamic. And in practice, it would complicate the definitions and slow down performance in cases where the chroot path is long. The current choice creates a slight loophole when used carelessly. However, in the great majority of cases the chroot or realroot refusal option can be used to close the loophole if the chroot mask cannot be set. In any case, the chroot method as a security measure is now obsoleted by sysmask, as the latter is much more solid.

* The choice of smkd versus direct kernel processing. My choice of a separate user space process as daemon has an obvious performance drawback. On the other hand, for testing purposes this is very convenient because the daemon can be killed and reinstalled without interrupting anything else. This choice is also more secure.

* kmod call_usermodehelper() execution. A process can call kmod to launch this helper (see kernel/kmod.c), which will be run as full root, even without sysmask restriction.
As there is no checking of the file to be executed by the helper, there is the potential risk that a trojan horse hidden in a device driver, or a kernel exploit, calls it to get anything executed with full root privilege. Note that a trojan need not call the function by name in order to access it! So the risk must be controlled. Therefore a mask (kmod) is added to block this function. Moreover, when the exec mask is set, sysmask name checking will be performed on the filename submitted to the helper. The file executed is usually /sbin/modprobe.

* Markup in the kernel patch. The kernel patch marks every sysmask-related modification with an #ifdef directive. The primary reason for this is to ensure that the kernel can be compiled without any trace of sysmask. This point is important for security. Even when the use of sysmask becomes widespread, the kernel must be able to compile and run normally with sysmask completely removed. Testing things on a kernel without sysmask allows people to verify that no hidden malicious code in the kernel is trying to reach sysmask data once the latter is compiled in.

* Time for log file. The lines of the log file contain date information that is only in a quasi-readable format. This is because the daemon is not compiled with timezone information. I prefer a more compact program with less potential security risk over slightly more convenience in its use.