Design principles of sysmask

The design of sysmask is based on the following "obvious" observations.

OBSERVATION 1. To err is human. The more complicated a program is, the more
errors it may contain. It is unrealistic to demand that every program and
every library be free of vulnerabilities.

OBSERVATION 2. A running system is composed of active processes. In order to
compromise a system, the attacker must first compromise a vulnerable process
it has access to, then use it to compromise the sensitive data of the
system as a whole.

OBSERVATION 3. Under a multi-tasking operating system, each process runs in
a separate memory space. If the memory management and scheduling code of the
CPU and the operating system is error-free (we may assume this without
taking too much risk), and if no direct input/output or inter-process
communication channel is open to the process (this is a controllable
situation), the only way for the process to communicate with the outside
world is through one of the system calls provided by the operating system.

For example, if the process is denied access to the system calls, it has no
means to do either good or harm, except wasting some system resources by its
presence.

OBSERVATION 4. It is extremely costly, if not impossible, to make sure that
general processes are bug-free. These processes usually start from highly
sophisticated programs and depend on various dynamic libraries, each of them
developed by a different team.

OBSERVATION 5. In general, the less frequently a piece of code is used, the
higher the risk that it contains bugs.

OBSERVATION 6. Access restrictions on a process that can be worked around in
one way or another are not a good security measure. Both in theory and in
practice, one has to assume that everything a process can do "legitimately",
the same process compromised by a vulnerability will also be able to do.
This includes things like executing a suid binary or clearing a soft
capability restriction.

----------------------------------

In view of these observations, the basic design principle of sysmask is to
install security checkpoints in a way such that:

1. The implementation is as simple as possible, for the more complicated it
is, the higher the risk of bugs in the security package itself. Complexity
also brings about human errors in configuration, which is a non-negligible
factor.

2. The integrity of sensitive system data can be preserved even under the
assumption that general processes will do every bad thing they are allowed
to. That is, the system security is made insensitive to vulnerabilities in
processes.

3. The system integrity can also be preserved against bugs in large parts of
the kernel itself, especially in code that is rarely used.

----------------------------------

The most efficient place to install such a checkpoint is the interrupt
handling routine leading to the processing of system calls. This is where
sysmask installs its centerpiece: a facility for selectively denying access
to system calls according to a set of mask bits (the first mask set) that is
individually set up for each process.

The mask set is designed to be not too fine-grained (it has only 32 bits),
while allowing the denial of as many legitimately unneeded syscalls as
possible.

The security meaning of this design is twofold. Firstly, the outright denial
of services like link(), fork(), execve(), socketcall() or mount() whenever
possible is the best way to ensure that the process cannot do harm in the
corresponding way.

Secondly, by closing rarely used syscalls, one neutralizes a major source of
kernel vulnerabilities. Vulnerabilities are more often discovered behind
rarely used syscalls, but as the latter are rarely used, they can more
often be closed to processes without disrupting their normal work.
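
The per-process mask check described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not the kernel code: the
bit names, bit positions and the grouping of syscalls under each bit are
assumptions made for the example.

```python
# Hypothetical mask bits; the real sysmask bit assignments are not shown here.
MASK_LINK = 1 << 0
MASK_FORK = 1 << 1
MASK_EXEC = 1 << 2
MASK_SOCKET = 1 << 3
MASK_MOUNT = 1 << 4

# Assumed mapping from syscall name to the mask bit that guards it.
SYSCALL_GROUP = {
    "link": MASK_LINK,
    "fork": MASK_FORK,
    "execve": MASK_EXEC,
    "socketcall": MASK_SOCKET,
    "mount": MASK_MOUNT,
}


def syscall_allowed(process_mask: int, syscall: str) -> bool:
    """Return False when the bit guarding this syscall is set in the mask."""
    bit = SYSCALL_GROUP.get(syscall, 0)  # syscalls without a guard pass
    return (process_mask & bit) == 0


# A process that never needs to link, exec or mount anything.
worker_mask = MASK_LINK | MASK_EXEC | MASK_MOUNT
print(syscall_allowed(worker_mask, "execve"))      # False
print(syscall_allowed(worker_mask, "socketcall"))  # True
```

The check is a single AND against a per-process word, which is why it is
cheap enough to sit on the syscall entry path.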

To make things work better, a secondary mask set is added. It provides
miscellaneous restrictions such as denying access to device drivers (dev)
or to the proc file system (procfs), withdrawing the right to leave child
processes behind when the parent dies (orphan) or to execute suid binaries
as suid (suid), etc. A special mask is also reserved for a future extension
allowing the prohibition of executing writable memory areas.

Moreover, following Observation 6, no possibility is provided for the
process to regain any denied access by any action of its own, including by
executing a suid binary or changing its uid (when not masked). For processes
that legitimately need to change access rights, a special daemon process
must be called. The latter will consult its configuration, which defines who
can do what and under which circumstances.
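
The one-way nature of the masks can be illustrated with a small Python
model. The class and method names are hypothetical, and the idea that a
process may add restrictions to itself is an assumption of this sketch; the
point shown is only that clearing a bit requires the daemon.

```python
class ProcessMask:
    """Mask bits of one process: easy to set, clearable only by the daemon."""

    def __init__(self, bits: int = 0):
        self._bits = bits

    @property
    def bits(self) -> int:
        return self._bits

    def restrict(self, extra_bits: int) -> None:
        # Giving up rights is always permitted (assumption of this sketch).
        self._bits |= extra_bits

    def relax(self, bits_to_clear: int, caller_is_daemon: bool = False) -> None:
        # Only the daemon, after consulting its configuration, clears bits.
        if not caller_is_daemon:
            raise PermissionError("a process cannot regain denied access")
        self._bits &= ~bits_to_clear


mask = ProcessMask(0b01)
mask.restrict(0b10)
print(bin(mask.bits))  # 0b11
```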

The secondary mask set can also be set to make the daemon check file and
network access rights of the process. Unlike some other security packages,
the file access check is text based, on the pathname submitted by the
process and BEFORE the kernel path_lookup takes place.

The basic reason behind this choice is that path_lookup() dives deeply into
file system management code. This code makes up a large chunk, and as long
as one cannot prove that there are no bugs or even trojan horses in it,
there is the risk that some specially crafted filename may trigger a bug to
do unexpected things. In particular, filenames containing special characters
such as '*', '?', quotes, comma etc. are rarely used, therefore the
robustness of the kernel against such filenames might not be sufficiently
tested. Even apparently innocent syscalls like stat() are exposed to such a
risk.

The design of sysmask provides the possibility to prevent such filenames
from leaking into path_lookup(). The most exposed processes can even be
configured to accept only an explicitly defined list of filenames, thus
totally eliminating the above risk for them.
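
As a rough illustration of a purely textual check made before any lookup,
consider the following Python sketch. The function name, the exact set of
rejected characters and the way the allow-list is passed are assumptions of
the example, not the sysmask configuration syntax.

```python
# Characters named in the text above; the real set is a configuration
# matter (assumption).
RARE_CHARS = set("*?'\",")


def pathname_acceptable(pathname: str, allowed=None) -> bool:
    """Decide on the text of the pathname alone, touching no file system code."""
    if allowed is not None:
        # Most exposed processes: only an explicit list of names passes.
        return pathname in allowed
    return not any(ch in RARE_CHARS for ch in pathname)


print(pathname_acceptable("/etc/hosts"))         # True
print(pathname_acceptable("/tmp/evil*name"))     # False
print(pathname_acceptable("/etc/hosts", allowed={"/var/www/index.html"}))  # False
```

Because the decision uses only the submitted string, a crafted filename is
rejected before it can reach the lookup code at all.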

This design also has the advantage that the access right configuration is
kept centrally, which is obviously a plus for security. Great precision can
be built into the configuration, with conditions combining wildcarded
pathnames, process uid, type of the operation (for example open() versus
chdir()), and user-definable process status. The text based configuration
also makes runtime reconfiguration with immediate effect possible.
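
A rule combining these conditions might be modelled as below. This is a
sketch under stated assumptions: the Rule fields, the first-match-wins order
and the default denial are choices made for the example, and the real
configuration format is not reproduced here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    pattern: str           # wildcarded pathname
    operation: str         # e.g. "open" or "chdir"; "*" matches any
    uid: Optional[int]     # None matches any uid
    status: Optional[str]  # user-definable process status; None matches any
    allow: bool


def decide(rules, pathname, operation, uid, status=None) -> bool:
    """First matching rule wins; no match means denial (assumed policy)."""
    for rule in rules:
        if not fnmatchcase(pathname, rule.pattern):
            continue
        if rule.operation not in ("*", operation):
            continue
        if rule.uid is not None and rule.uid != uid:
            continue
        if rule.status is not None and rule.status != status:
            continue
        return rule.allow
    return False


RULES = [
    Rule("/home/alice/*", "open", 1000, None, True),
    Rule("/home/*", "chdir", None, None, True),
    Rule("/etc/shadow", "*", None, None, False),
]

print(decide(RULES, "/home/alice/notes.txt", "open", 1000))  # True
print(decide(RULES, "/home/alice/notes.txt", "open", 1001))  # False
```

Since the rules are plain data held by the daemon, replacing the list takes
effect on the very next request.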

The system is also designed to work in a completely coercive way, that is,
without requiring cooperation from processes. This makes it possible to set
up a fully protected system, both against system risks (intrusions) and user
risks (viruses, trojans), by modifying only one system file, /etc/inittab,
and with no modification at all of existing software.

To do so, one has only to tell /etc/inittab to attach a "token" to each
process created by init. The token is a kind of "uid for sysmask", that can
be changed by the daemon smkd under predefined circumstances, but will
otherwise be inherited by child processes.

For example, the first child process of init, the rc daemon dispatcher, will
have a token "init". When it launches sshd, the configuration may tell smkd
to change the token to "sshd". Under the new token, each time a user passes
the authentication, a user token will be assigned to the user login shell.
In this way, different users may have different access rights according to
the wishes of the sysadmin.

Moreover, /etc/inittab puts a token "tty" on the console manager of the
system. Under this token, a user who logs in from the console may get a
different token than when logging in through sshd, conceivably with more
access rights.
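
The token flow in this example can be modelled with two lookup tables. The
sketch below is hypothetical throughout: the table layout, the binary path
and the user token names are invented for illustration and do not reflect
the smkd configuration format.

```python
# Assumed switching table: (current token, executed binary) -> new token.
EXEC_SWITCH = {
    ("init", "/usr/sbin/sshd"): "sshd",
}

# Assumed login table: (token of the login service, user) -> user token.
LOGIN_SWITCH = {
    ("sshd", "alice"): "alice-remote",
    ("tty", "alice"): "alice-console",
}


def token_after_exec(token: str, binary: str) -> str:
    """Children inherit the token unless the configuration switches it."""
    return EXEC_SWITCH.get((token, binary), token)


def token_after_login(token: str, user: str) -> str:
    """Assign a user token that depends on where the login came from."""
    return LOGIN_SWITCH.get((token, user), token)


t = token_after_exec("init", "/usr/sbin/sshd")
print(t)                               # sshd
print(token_after_login(t, "alice"))   # alice-remote
print(token_after_login("tty", "alice"))  # alice-console
```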

All the access restrictions imposed by sysmask are put above (and usually
before) any other access policies otherwise defined for the system,
including the file attribute checks, capabilities, or even other security
packages. The principle is that sysmask only restricts access rights but
never gives more, with only two exceptions.

1. When a tokened process executes another binary, the sysmask configuration
may let it change its euid, in the same way as a suid binary, but here the
binary need not be suid, nor need the given euid be equal to the owner of
the file. Here the objective is to reduce the number of suid binaries in the
system (which are always a trouble for security). The approach is also more
versatile and more secure.

2. When the "orphan" mask is set, a process can kill its child processes
even if the latter are running under a different uid. This is obviously
necessary for the orphan mask to work, and this mask is a very important
measure against potential DoS attacks.

--------------------------------------------------------

It must be noted that filename based access control requires careful control
of symbolic links, hard links and mounts in order to avoid pitfalls. Sysmask
incorporates several facilities for managing such control: besides hard
masks for link, mount and chroot, there is the "follow" option, and refusal
options "hardlink", "symlink", "backup" etc. Special uids can also be used
to put auxiliary access restrictions.

On the other hand, the possibility for linked files to trigger different
actions of the kernel based on the called name is a very useful feature.
Sysadmins can use this feature to establish custom authentication methods
that are very hard to guess by the attacker.

--------------------------------------------------------

Difficult choices have been made in the classification of system calls in
order to define the action of each mask. Our design principle is to keep the
number of masks as small as possible, not only for performance, but above
all to make the configuration as simple as possible, because complexity in
the configuration is an enemy of security: it tends to generate human errors
by sysadmins.

Therefore, we have limited ourselves to fewer than 40 hard masks for more
than 270 system calls. Inevitably, sometimes many system calls are grouped
under one mask. The usual drawback is that when one of the system calls is
used by a process, the mask must be unset and the process is exposed to
potential vulnerabilities in the whole set of system calls.

It is for this reason that the possibility of declaring excepted system
calls is added to the design. If only one or two system calls are used
within a large group, one can still mask the group, while at the same time
declaring the used system calls as exceptions.

One example is Apache, which requires exactly one call (sendfile) within the
biggest mask group nonstd (Linux-only system calls). By declaring sendfile
as an exception, the rest of the group can be masked. This is important
because system calls behind nonstd are generally not frequently used, and
therefore carry a relatively high risk of privilege elevation.

Only 4 system call exceptions are made available in the setup. This is
enough under most circumstances, and the small number is chosen in order not
to encourage people to make overly complicated definitions. In the rare
cases where a process needs more exceptions, token-based flexible exceptions
can be installed with no limit on the number.
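
The interplay between a masked group and its exceptions can be sketched as
follows. The group table is an assumption made for the example: only the
placement of sendfile under nonstd comes from the text above, while the
other entries and all class and method names are hypothetical.

```python
MAX_EXCEPTIONS = 4  # the limit stated in the text

# Assumed syscall-to-group table; only "sendfile" is taken from the text.
GROUP_OF = {
    "sendfile": "nonstd",
    "vm86": "nonstd",   # assumed member, for illustration only
    "mount": "mount",
}


class MaskedProcess:
    def __init__(self, masked_groups, exceptions=()):
        if len(exceptions) > MAX_EXCEPTIONS:
            raise ValueError("at most 4 excepted system calls per process")
        self.masked_groups = set(masked_groups)
        self.exceptions = set(exceptions)

    def allowed(self, syscall: str) -> bool:
        if syscall in self.exceptions:
            return True  # excepted calls pass even if their group is masked
        return GROUP_OF.get(syscall) not in self.masked_groups


apache = MaskedProcess({"nonstd", "mount"}, exceptions=["sendfile"])
print(apache.allowed("sendfile"))  # True
print(apache.allowed("vm86"))      # False
```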

---------------------------------------------------------

Resource consumption control.

Unix-like systems have a traditional resource consumption control mechanism
via the rlimit variables. However, this mechanism is not designed to fight
against deliberate attacks, and it does not meet the requirement of sysmask
that modification of the limitations be centrally controlled.

Therefore, a new set of resource limitations is incorporated into sysmask.
This set of limitations is independent of the existing mechanism.

In order for the limitations to be effective, the concept of token session
is introduced. All processes forked out by a tokened process without token
switching are grouped within the same token session, and resource
limitation is computed over all processes within the session. This allows
an effective defense against forking attacks.

Several new limitations are introduced. For example, time2live controls how
long a token session can stay alive, avoiding memory consumption attacks by
infinite sleeping. Forks can be limited by their number, depth and
frequency.
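
Session-wide accounting of this kind can be modelled as below. This is a
simplified sketch: the class and parameter names other than time2live are
hypothetical, time is passed in explicitly to keep the example
deterministic, and the frequency limit is left out.

```python
class TokenSession:
    """Accounting shared by all processes forked under one unswitched token."""

    def __init__(self, max_forks, max_depth, time2live, started_at=0.0):
        self.max_forks = max_forks
        self.max_depth = max_depth
        self.time2live = time2live
        self.started_at = started_at
        self.forks = 0  # counted over the whole session, not per process

    def expired(self, now: float) -> bool:
        return now - self.started_at > self.time2live

    def may_fork(self, parent_depth: int, now: float) -> bool:
        """Charge one fork to the session; return False if a limit is hit."""
        if self.expired(now):
            return False
        if self.forks >= self.max_forks:
            return False
        if parent_depth + 1 > self.max_depth:
            return False
        self.forks += 1
        return True


session = TokenSession(max_forks=2, max_depth=3, time2live=60)
print(session.may_fork(parent_depth=0, now=1))  # True
print(session.may_fork(parent_depth=1, now=2))  # True
print(session.may_fork(parent_depth=0, now=3))  # False
```

Because the counter belongs to the session rather than to a single process,
a fork bomb cannot escape the limit by spreading the forks over its
children.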

On the other hand, the method of sysmask to control core dumps is not by a
size limit. It is a mask (core) that prohibits core dump.

---------------------------------------------------------

Details about some design choices.

* Pathname check under chroot.

I have chosen to check the pathname relative to the root of the process, and
not relative to that of the daemon smkd. There are several reasons.

Firstly, it is theoretically not always possible to do it the other way.
Secondly, the other way may create confusion and complication when the
chrooted root of the process is dynamic. And in practice, it would
complicate the definition and degrade performance in cases where the chroot
path is long.

The current choice creates a slight loophole when used carelessly. However,
in the great majority of cases the chroot or realroot refusal option can be
used to close the loophole, should it be impossible to set the chroot mask.

In any case, the chroot method as a security measure is now obsoleted by
sysmask, as the latter is much more solid.

* The choice of smkd versus direct kernel processing.

My choice of a separate user space process as daemon has an obvious
performance drawback. On the other hand, for testing purposes this is very
convenient because the daemon can be killed and reinstalled without
interrupting anything else. This choice is also more secure.

* kmod call_usermodehelper() execution.

A process can call kmod to launch this helper (see kernel/kmod.c), which
will be run as full root, even without sysmask restriction. As there is no
checking of the file to be executed by the helper, there is the potential
risk that a trojan horse hidden in a device driver or a kernel exploit calls
it to get execution of anything with full root privilege. Note that a trojan
need not call the function by name in order to access it! So the risk must
be controlled.

Therefore a mask (kmod) is added to block this function. Moreover, when the
exec mask is set, sysmask name checking will be performed on the filename
submitted to the helper. The file executed is usually /sbin/modprobe.

* Markup in the kernel patch.

The kernel patch marks every sysmask-related modification by an #ifdef
directive. The primary reason for this is to ensure that the kernel can be
compiled without any trace of sysmask.

This point is important for security. Even when the use of sysmask becomes
widespread, the kernel must be able to compile and run normally with sysmask
completely removed. Testing things on a kernel without sysmask allows people
to verify that no hidden malicious code in the kernel is trying to reach
sysmask data, once the latter is compiled in.

* Time format in the log file.

The lines of the log file contain date information that is only in a
quasi-readable format. This is because the daemon is not compiled with
timezone information. I prefer a more compact program with less potential
security risk to slightly more convenience in its use.