oss-sec mailing list archives

Re: Linux: Disabling network namespaces


From: John Johansen <john.johansen () canonical com>
Date: Mon, 29 Apr 2024 11:58:22 -0700

On 4/21/24 13:06, Solar Designer wrote:
On Sat, Apr 20, 2024 at 09:33:07PM +0000, Jordan Glover wrote:
bubblewrap has a --disable-userns option which prevents creation of nested namespaces (from the manpage):

        --disable-userns
Prevent the process in the sandbox from creating further user namespaces, so that it cannot rearrange the filesystem 
namespace or do other more complex namespace modification. This is currently implemented by setting the 
user.max_user_namespaces sysctl to 1, and then entering a nested user namespace which is unable to raise that limit in the 
outer namespace. This option requires --unshare-user, and doesn't work in the setuid version of bubblewrap.
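The limit mechanism described in the manpage excerpt can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not bubblewrap or kernel code: the struct and function names are hypothetical, per-UID accounting is ignored, and a new namespace is assumed to start with an effectively unlimited limit of its own. It relies on the behavior documented in namespaces(7) that a new namespace is accounted against every ancestor user namespace.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one user namespace and its namespace limit. */
struct model_ns {
    struct model_ns *parent; /* NULL for the outermost namespace */
    long max_user_ns;        /* models this namespace's user.max_user_namespaces */
    long user_ns_count;      /* user namespaces currently charged to this namespace */
};

/*
 * Model of creating a user namespace as a child of `parent`.
 * The new namespace is charged to `parent` and to every ancestor;
 * creation fails if any of them is already at its limit.
 */
bool model_create_userns(struct model_ns *parent, struct model_ns *child)
{
    struct model_ns *it;

    assert(parent != NULL && child != NULL);

    /* First pass: check the limit at every level. */
    for (it = parent; it != NULL; it = it->parent)
        if (it->user_ns_count >= it->max_user_ns)
            return false;

    /* Second pass: charge every level. */
    for (it = parent; it != NULL; it = it->parent)
        it->user_ns_count++;

    child->parent = parent;
    child->max_user_ns = INT_MAX; /* assumption: new namespaces start unlimited */
    child->user_ns_count = 0;
    return true;
}
```

In this model, setting max_user_ns to 1 in the sandbox namespace and then creating one nested namespace uses up the only slot. Any later attempt from inside the nested namespace fails at the sandbox level, and the nested namespace has no way to change the outer limit.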

Flatpak uses this (or a seccomp filter) to block nested namespaces, as they can bypass its security design. For this reason
Firefox's own sandbox doesn't use namespaces in Flatpak; see https://bugzilla.mozilla.org/show_bug.cgi?id=1756236

Thanks, I didn't expect it was this advanced already.

In what exact way would nested namespaces bypass the security design of
Flatpak?  Is this about the kernel's attack surface exposed by
capabilities in a namespace or something else?  I guess capabilities are
also dropped in the nested namespace?

After reviewing some kernel code, I have doubts as to how effective the
dropping of capabilities in a namespace actually is.

security/commoncap.c: cap_capable() includes this:

                 /*
                  * The owner of the user namespace in the parent of the
                  * user namespace has all caps.
                  */
                 if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
                         return 0;

This check is only reached when cap_capable() is called for a target
namespace other than one the credentials are from.  However, such uses
do exist, e.g. via Netlink, which would expose e.g. Netfilter:

net/netlink/af_netlink.c:

/**
  * netlink_net_capable - Netlink network namespace message capability test
  * @skb: socket buffer holding a netlink command from userspace
  * @cap: The capability to use
  *
  * Test to see if the opener of the socket we received the message
  * from had when the netlink socket was created and the sender of the
  * message has the capability @cap over the network namespace of
  * the socket we received the message from.
  */
bool netlink_net_capable(const struct sk_buff *skb, int cap)
{
         return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
}

So I worry whether even with all namespaces in a sandbox having dropped
capabilities, an attack can still be arranged (with a pair of namespaces
one nested in the other) where a task effectively "has all caps" for a
dangerous operation like configuring Netfilter due to it hitting code
paths like this, which bypass capability bit checks.
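To make the concern concrete, here is a simplified user-space model of the namespace walk in cap_capable(). It is a sketch, not the kernel code: the struct and function names are hypothetical, UIDs are plain integers, and only the fields needed for the walk are kept. The owner check is the one quoted above from security/commoncap.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_NET_ADMIN 12 /* value of CAP_NET_ADMIN in linux/capability.h */

/* Toy stand-in for a user namespace. */
struct model_userns {
    const struct model_userns *parent; /* NULL for the initial namespace */
    int level;                         /* 0 for the initial namespace */
    unsigned int owner;                /* UID that created the namespace */
};

/* Toy stand-in for task credentials. */
struct model_cred {
    const struct model_userns *user_ns; /* namespace the credentials belong to */
    unsigned int euid;
    uint64_t cap_effective;             /* bit N set means capability N is raised */
};

/* Returns 0 for "capable" and -EPERM otherwise, as the kernel function does. */
int model_cap_capable(const struct model_cred *cred,
                      const struct model_userns *target, int cap)
{
    const struct model_userns *ns = target;

    assert(cap >= 0 && cap < 64);

    for (;;) {
        /* Target is the credentials' own namespace: the capability bits decide. */
        if (ns == cred->user_ns)
            return (cred->cap_effective & (UINT64_C(1) << cap)) ? 0 : -EPERM;

        /* Target is not below the credentials' namespace: stop searching. */
        if (ns->level <= cred->user_ns->level)
            return -EPERM;

        /* Owner of a direct child namespace: no capability bits are consulted. */
        if (ns->parent == cred->user_ns && ns->owner == cred->euid)
            return 0;

        ns = ns->parent;
    }
}
```

In this model, a task with an empty cap_effective is refused MODEL_CAP_NET_ADMIN over its own namespace, yet is granted it over a child namespace it owns, which is the pair-of-namespaces arrangement described above.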

The above finding may be a reason for us to prefer making capabilities
in a namespace ineffective vs. dropping capabilities.  In context of my
idea/proposal for a new sysctl, it could be better for it to work as I
had described, overriding security_capable() return, instead of e.g.
hooking return of create_user_ns() and dropping new cred's capabilities.
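The difference between the two approaches can be sketched in a toy user-space model. Everything here is hypothetical: the names are made up, the walk is a compact copy of the cap_capable() logic quoted earlier, and the override is assumed to mean "refuse every capability check whose target is a non-initial user namespace", which may not match the exact semantics of the proposed sysctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_userns {
    const struct toy_userns *parent; /* NULL for the initial namespace */
    int level;                       /* 0 for the initial namespace */
    unsigned int owner;              /* UID that created the namespace */
};

struct toy_cred {
    const struct toy_userns *user_ns;
    unsigned int euid;
    uint64_t cap_effective;
};

/* Compact copy of the cap_capable() walk, including the owner shortcut. */
static int toy_cap_capable(const struct toy_cred *cred,
                           const struct toy_userns *ns, int cap)
{
    assert(cap >= 0 && cap < 64);
    for (;;) {
        if (ns == cred->user_ns)
            return (cred->cap_effective & (UINT64_C(1) << cap)) ? 0 : -EPERM;
        if (ns->level <= cred->user_ns->level)
            return -EPERM;
        if (ns->parent == cred->user_ns && ns->owner == cred->euid)
            return 0;
        ns = ns->parent;
    }
}

/* Approach A: drop the capability bits from the credentials. */
void toy_drop_all_caps(struct toy_cred *cred)
{
    cred->cap_effective = 0;
}

/*
 * Approach B: override the result of the check itself.
 * `userns_caps_ineffective` stands in for the hypothetical sysctl.
 */
int toy_security_capable(const struct toy_cred *cred,
                         const struct toy_userns *target, int cap,
                         bool userns_caps_ineffective)
{
    if (userns_caps_ineffective && target->level > 0)
        return -EPERM;
    return toy_cap_capable(cred, target, cap);
}
```

In this model, dropping the bits alone still lets the owner of a nested namespace pass the check for that namespace, while the override refuses it regardless of which branch of the walk would have granted it.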

I hope the Ubuntu/AppArmor solution is also safe in this respect, as it
sounds like it similarly makes capabilities ineffective instead of
dropping them.

The AppArmor solution is flexible, allowing the policy author to decide
what is done. The namespace creation can be allowed, denied or the profile
can be transitioned on namespace creation. So the behavior can be tuned
selectively per application, and based on whether it is in a user namespace
or not.

The 24.04 Ubuntu behavior is for "unconfined" applications to transition
to a profile that denies further creation of user namespaces and denies
capabilities within the user namespace.

There are profiles for known applications allowing them to use user
namespaces. Most of these just allow the user namespace and maybe a
specific capability, currently without transitioning the user namespace
to tighter confinement. Ideally the policy would do more, and there are
plans to improve the policy for this set of applications.

Bubblewrap and unshare have additional behaviors around restricting what
the applications can do as they also take advantage of the exec barrier.

Applications that embed bubblewrap to set up their sandbox, e.g.
Steam's pressure-vessel, can have their own profiles that control
bubblewrap separately from the system bubblewrap policy.

It's still early days, and the rollout policy has mostly been to set
a default of allowing user namespaces but with no capabilities, then
provide a very open default policy for applications that have been found
to need them, with plans to tighten that policy on a per-application
basis in the future.

AppImages and containers that users expect to be able to run from their
home or other user-writable locations are the big issue at the moment.
They get the default behavior of being allowed to create user namespaces
without any capabilities, but if they require more, we require privileged
user intervention to individually enable running these applications.

We have found application behavior around restricting user namespaces
to be very inconsistent. E.g. qtwebkit will crash if you deny creation
of the user namespace, but will gracefully fall back to not using
user namespaces in its sandbox if it's denied capabilities within the
user namespace during sandbox setup. Firefox, on the other hand, crashes
when user namespaces or capabilities are denied.


