From c8d383cc49dbb2325958e805a8b0118b0d6c4f9a Mon Sep 17 00:00:00 2001 From: Paul Morie Date: Wed, 20 Apr 2016 19:51:46 -0400 Subject: [PATCH] Seccomp Proposal --- docs/design/seccomp.md | 295 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 295 insertions(+) create mode 100644 docs/design/seccomp.md diff --git a/docs/design/seccomp.md b/docs/design/seccomp.md new file mode 100644 index 00000000000..7d65611edfd --- /dev/null +++ b/docs/design/seccomp.md @@ -0,0 +1,295 @@ + + + + +WARNING +WARNING +WARNING +WARNING +WARNING + +

PLEASE NOTE: This document applies to the HEAD of the source tree

+ +If you are using a released version of Kubernetes, you should +refer to the docs that go with that version. + +Documentation for other releases can be found at +[releases.k8s.io](http://releases.k8s.io). + +-- + + + + + +## Abstract + +A proposal for adding **alpha** support for +[seccomp](https://github.com/seccomp/libseccomp) to Kubernetes. Seccomp is a +system call filtering facility in the Linux kernel which lets applications +define limits on system calls they may make, and what should happen when +system calls are made. Seccomp is used to reduce the attack surface available +to applications. + +## Motivation + +Applications use seccomp to restrict the set of system calls they can make. +Recently, container runtimes have begun adding features to allow the runtime +to interact with seccomp on behalf of the application, which eliminates the +need for applications to link against libseccomp directly. Adding support in +the Kubernetes API for describing seccomp profiles will allow administrators +greater control over the security of workloads running in Kubernetes. + +Goals of this design: + +1. Describe how to reference seccomp profiles in containers that use them + +## Constraints and Assumptions + +This design should: + +* build upon previous security context work +* be container-runtime agnostic +* allow use of custom profiles +* facilitate containerized applications that link directly to libseccomp + +## Use Cases + +1. As an administrator, I want to be able to grant access to a seccomp profile + to a class of users +2. As a user, I want to run an application with a seccomp profile similar to + the default one provided by my container runtime +3. As a user, I want to run an application which is already libseccomp-aware + in a container, and for my application to manage interacting with seccomp + unmediated by Kubernetes +4. As a user, I want to be able to use a custom seccomp profile and use + it with my containers + +### Use Case: Administrator access control + +Controlling access to seccomp profiles is a cluster administrator +concern. It should be possible for an administrator to control which users +have access to which profiles. + +The [pod security policy](https://github.com/kubernetes/kubernetes/pull/7893) +API extension governs the ability of users to make requests that affect pod +and container security contexts. The proposed design should deal with +required changes to control access to new functionality. + +### Use Case: Seccomp profiles similar to container runtime defaults + +Many users will want to use images that make assumptions about running in the +context of their chosen container runtime. Such images are likely to +frequently assume that they are running in the context of the container +runtime's default seccomp settings. Therefore, it should be possible to +express a seccomp profile similar to a container runtime's defaults. + +As an example, all dockerhub 'official' images are compatible with the Docker +default seccomp profile. So, any user who wanted to run one of these images +with seccomp would want the default profile to be accessible. + +### Use Case: Applications that link to libseccomp + +Some applications already link to libseccomp and control seccomp directly. It +should be possible to run these applications unmodified in Kubernetes; this +implies there should be a way to disable seccomp control in Kubernetes for +certain containers, or to run with a "no-op" or "unconfined" profile. + +Sometimes, applications that link to seccomp can use the default profile for a +container runtime, and restrict further on top of that. It is important to +note here that in this case, applications can only place _further_ +restrictions on themselves. It is not possible to re-grant the ability of a +process to make a system call once it has been removed with seccomp. + +As an example, elasticsearch manages its own seccomp filters in its code. +Currently, elasticsearch is capable of running in the context of the default +Docker profile, but if in the future, elasticsearch needed to be able to call +`ioperm` or `iopr` (both of which are disallowed in the default profile), it +should be possible to run elasticsearch by delegating the seccomp controls to +the pod. + +### Use Case: Custom profiles + +Different applications have different requirements for seccomp profiles; it +should be possible to specify an arbitrary seccomp profile and use it in a +container. This is more of a concern for applications which need a higher +level of privilege than what is granted by the default profile for a cluster, +since applications that want to restrict privileges further can always make +additional calls in their own code. + +An example of an application that requires the use of a syscall disallowed in +the Docker default profile is Chrome, which needs `clone` to create a new user +namespace. Another example would be a program which uses `ptrace` to +implement a sandbox for user-provided code, such as +[eval.in](https://eval.in/). + +## Community Work + +### Container runtime support for seccomp + +#### Docker / opencontainers + +Docker supports the open container initiative's API for +seccomp, which is very close to the libseccomp API. It allows full +specification of seccomp filters, with arguments, operators, and actions. + +Docker allows the specification of a single seccomp filter. There are +community requests for: + +Issues: + +* [docker/22109](https://github.com/docker/docker/issues/22109): composable + seccomp filters +* [docker/21105](https://github.com/docker/docker/issues/22105): custom + seccomp filters for builds + +#### rkt / appcontainers + +The `rkt` runtime delegates to systemd for seccomp support; there is an open +issue to add support once `appc` supports it. The `appc` project has an open +issue to be able to describe seccomp as an isolator in an appc pod. + +The systemd seccomp facility is based on a whitelist of system calls that can +be made, rather than a full filter specification. + +Issues: + +* [appc/529](https://github.com/appc/spec/issues/529) +* [rkt/1614](https://github.com/coreos/rkt/issues/1614) + +#### HyperContainer + +[HyperContainer](https://hypercontainer.io) does not support seccomp. + +### Other platforms and seccomp-like capabilities + +FreeBSD has a seccomp/capability-like facility called +[Capsicum](https://www.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4). + +#### lxd + +[`lxd`](http://www.ubuntu.com/cloud/lxd) constrains containers using a default profile. + +Issues: + +* [lxd/1084](https://github.com/lxc/lxd/issues/1084): add knobs for seccomp + +## Proposed Design + +### Seccomp API Resource? + +An earlier draft of this proposal described a new global API resource that +could be used to describe seccomp profiles. After some discussion, it was +determined that without a feedback signal from users indicating a need to +describe new profiles in the Kubernetes API, it is not possible to know +whether a new API resource is warranted. + +That being the case, we will not propose a new API resource at this time. If +there is strong community desire for such a resource, we may consider it in +the future. + +Instead of implementing a new API resource, we propose that pods be able to +reference seccomp profiles by name. Since this is an alpha feature, we will +use annotations instead of extending the API with new fields. + +### API changes? + +In the alpha version of this feature we will use annotations to store the +names of seccomp profiles. The keys will be: + +`security.alpha.kubernetes.io/seccomp/container/` + +which will be used to set the seccomp profile of a container, and: + +`security.alpha.kubernetes.io/seccomp/pod` + +which will set the seccomp profile for the containers of an entire pod. If a +pod-level annotation is present, and a container-level annotation present for +a container, then the container-level profile takes precedence. + +The value of these keys should be container-runtime agnostic. We will +establish a format that expresses the conventions for distinguishing between +an unconfined profile, the container runtime's default, or a custom profile. +Since format of profile is likely to be runtime dependent, we will consider +profiles to be opaque to kubernetes for now. + +The following format is scoped as follows: + +1. `runtime/default` - the default profile for the container runtime +2. `unconfined` - unconfined profile, ie, no seccomp sandboxing +3. `localhost/` - the profile installed to the node's local seccomp profile root + +Since seccomp profile schemes may vary between container runtimes, we will +treat the contents of profiles as opaque for now and avoid attempting to find +a common way to describe them. It is up to the container runtime to be +sensitive to the annotations proposed here and to interpret instructions about +local profiles. + +A new area on disk (which we will call the seccomp profile root) must be +established to hold seccomp profiles. A field will be added to the Kubelet +for the seccomp profile root and a knob (`--seccomp-profile-root`) exposed to +allow admins to set it. If unset, it should default to the `seccomp` +subdirectory of the kubelet root directory. + +### Pod Security Policy annotation + +The `PodSecurityPolicy` type should be annotated with the allowed seccomp +profiles using the key +`security.alpha.kubernetes.io/allowedSeccompProfileNames`. The value of this +key should be a comma delimited list. + +## Examples + +### Unconfined profile + +Here's an example of a pod that uses the unconfined profile: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: trustworthy-pod + annotations: + security.alpha.kubernetes.io/seccomp/pod: unconfined +spec: + containers: + - name: trustworthy-container + image: sotrustworthy:latest +``` + +### Custom profile + +Here's an example of a pod that uses a profile called `example-explorer- +profile` using the container-level annotation: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: explorer + annotations: + security.alpha.kubernetes.io/seccomp/container/explorer: localhost/example-explorer-profile +spec: + containers: + - name: explorer + image: gcr.io/google_containers/explorer:1.0 + args: ["-port=8080"] + ports: + - containerPort: 8080 + protocol: TCP + volumeMounts: + - mountPath: "/mount/test-volume" + name: test-volume + volumes: + - name: test-volume + emptyDir: {} +``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/seccomp.md?pixel)]() +