mirror of
				https://github.com/k3s-io/kubernetes.git
				synced 2025-10-25 18:09:10 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			375 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			375 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ## Abstract
 | |
| 
 | |
| A proposal for refactoring `SecurityContext` to have pod-level and container-level attributes in
 | |
| order to correctly model pod- and container-level security concerns.
 | |
| 
 | |
| ## Motivation
 | |
| 
 | |
| Currently, containers have a `SecurityContext` attribute which contains information about the
 | |
| security settings the container uses.  In practice, many of these attributes are uniform across all
 | |
| containers in a pod.  Simultaneously, there is also a need to apply the security context pattern
 | |
| at the pod level to correctly model security attributes that apply only at a pod level.
 | |
| 
 | |
| Users should be able to:
 | |
| 
 | |
| 1.  Express security settings that are applicable to the entire pod
 | |
| 2.  Express base security settings that apply to all containers
 | |
| 3.  Override only the settings that need to be differentiated from the base in individual
 | |
|     containers
 | |
| 
 | |
| This proposal is a dependency for other changes related to security context:
 | |
| 
 | |
| 1.  [Volume ownership management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/12944)
 | |
| 2.  [Generic SELinux label management in the Kubelet](https://github.com/kubernetes/kubernetes/pull/14192)
 | |
| 
 | |
| Goals of this design:
 | |
| 
 | |
| 1.  Describe the use cases for which a pod-level security context is necessary
 | |
| 2.  Thoroughly describe the API backward compatibility issues that arise from the introduction of
 | |
|     a pod-level security context
 | |
| 3.  Describe all implementation changes necessary for the feature
 | |
| 
 | |
| ## Constraints and assumptions
 | |
| 
 | |
| 1.  We will not design for intra-pod security; we are not currently concerned about isolating
 | |
|     containers in the same pod from one another
 | |
| 1.  We will design for backward compatibility with the current V1 API
 | |
| 
 | |
| ## Use Cases
 | |
| 
 | |
| 1.  As a developer, I want to correctly model security attributes which belong to an entire pod
 | |
| 2.  As a user, I want to be able to specify container attributes that apply to all containers
 | |
|     without repeating myself
 | |
| 3.  As an existing user, I want to be able to use the existing container-level security API
 | |
| 
 | |
| ### Use Case: Pod level security attributes
 | |
| 
 | |
| Some security attributes make sense only to model at the pod level.  For example, it is a
 | |
| fundamental property of pods that all containers in a pod share the same network namespace.
 | |
| Therefore, using the host namespace makes sense to model at the pod level only, and indeed, today
 | |
| it is part of the `PodSpec`.  Other host namespace support is currently being added and these will
 | |
| also be pod-level settings; it makes sense to model them as a pod-level collection of security
 | |
| attributes.
 | |
| 
 | |
| ## Use Case: Override pod security context for container
 | |
| 
 | |
| Some use cases require the containers in a pod to run with different security settings.  As an
 | |
| example, a user may want to have a pod with two containers, one of which runs as root with the
 | |
| privileged setting, and one that runs as a non-root UID.  To support use cases like this, it should
 | |
| be possible to override appropriate (i.e., not intrinsically pod-level) security settings for
 | |
| individual containers.
 | |
| 
 | |
| ## Proposed Design
 | |
| 
 | |
| ### SecurityContext
 | |
| 
 | |
| For posterity and ease of reading, note the current state of `SecurityContext`:
 | |
| 
 | |
| ```go
 | |
| package api
 | |
| 
 | |
| type Container struct {
 | |
|     // Other fields omitted
 | |
| 
 | |
|     // Optional: SecurityContext defines the security options the pod should be run with
 | |
|     SecurityContext *SecurityContext `json:"securityContext,omitempty"`
 | |
| }
 | |
| 
 | |
| type SecurityContext struct {
 | |
|     // Capabilities are the capabilities to add/drop when running the container
 | |
|     Capabilities *Capabilities `json:"capabilities,omitempty"`
 | |
| 
 | |
|     // Run the container in privileged mode
 | |
|     Privileged *bool `json:"privileged,omitempty"`
 | |
| 
 | |
|     // SELinuxOptions are the labels to be applied to the container
 | |
|     // and volumes
 | |
|     SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`
 | |
| 
 | |
|     // RunAsUser is the UID to run the entrypoint of the container process.
 | |
|     RunAsUser *int64 `json:"runAsUser,omitempty"`
 | |
| 
 | |
|     // RunAsNonRoot indicates that the container should be run as a non-root user.  If the RunAsUser
 | |
|     // field is not explicitly set then the kubelet may check the image for a specified user or
 | |
|     // perform defaulting to specify a user.
 | |
|     RunAsNonRoot bool `json:"runAsNonRoot,omitempty"`
 | |
| }
 | |
| 
 | |
| // SELinuxOptions contains the fields that make up the SELinux context of a container.
 | |
| type SELinuxOptions struct {
 | |
|     // SELinux user label
 | |
|     User string `json:"user,omitempty"`
 | |
| 
 | |
|     // SELinux role label
 | |
|     Role string `json:"role,omitempty"`
 | |
| 
 | |
|     // SELinux type label
 | |
|     Type string `json:"type,omitempty"`
 | |
| 
 | |
|     // SELinux level label.
 | |
|     Level string `json:"level,omitempty"`
 | |
| }
 | |
| ```
 | |
| 
 | |
| ### PodSecurityContext
 | |
| 
 | |
| `PodSecurityContext` specifies two types of security attributes:
 | |
| 
 | |
| 1.  Attributes that apply to the pod itself
 | |
| 2.  Attributes that apply to the containers of the pod
 | |
| 
 | |
| In the internal API, fields of the `PodSpec` controlling the use of the host PID, IPC, and network
 | |
| namespaces are relocated to this type:
 | |
| 
 | |
| ```go
 | |
| package api
 | |
| 
 | |
| type PodSpec struct {
 | |
|     // Other fields omitted
 | |
| 
 | |
|     // Optional: SecurityContext specifies pod-level attributes and container security attributes
 | |
|     // that apply to all containers.
 | |
|     SecurityContext *PodSecurityContext `json:"securityContext,omitempty"`
 | |
| }
 | |
| 
 | |
| // PodSecurityContext specifies security attributes of the pod and container attributes that apply
 | |
| // to all containers of the pod.
 | |
| type PodSecurityContext struct {
 | |
|     // Use the host's network namespace. If this option is set, the ports that will be
 | |
|     // used must be specified.
 | |
|     // Optional: Default to false.
 | |
|     HostNetwork bool
 | |
|     // Use the host's IPC namespace
 | |
|     HostIPC bool
 | |
| 
 | |
|     // Use the host's PID namespace
 | |
|     HostPID bool
 | |
| 
 | |
|     // Capabilities are the capabilities to add/drop when running containers
 | |
|     Capabilities *Capabilities `json:"capabilities,omitempty"`
 | |
| 
 | |
|     // Run the container in privileged mode
 | |
|     Privileged *bool `json:"privileged,omitempty"`
 | |
| 
 | |
|     // SELinuxOptions are the labels to be applied to the container
 | |
|     // and volumes
 | |
|     SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty"`
 | |
| 
 | |
|     // RunAsUser is the UID to run the entrypoint of the container process.
 | |
|     RunAsUser *int64 `json:"runAsUser,omitempty"`
 | |
| 
 | |
|     // RunAsNonRoot indicates that the container should be run as a non-root user.  If the RunAsUser
 | |
|     // field is not explicitly set then the kubelet may check the image for a specified user or
 | |
|     // perform defaulting to specify a user.
 | |
|     RunAsNonRoot bool
 | |
| }
 | |
| 
 | |
| // Comments and generated docs will change for the container.SecurityContext field to indicate
 | |
| // the precedence of these fields over the pod-level ones.
 | |
| 
 | |
| type Container struct {
 | |
|     // Other fields omitted
 | |
| 
 | |
|     // Optional: SecurityContext defines the security options the pod should be run with.
 | |
|     // Settings specified in this field take precedence over the settings defined in
 | |
|     // pod.Spec.SecurityContext.
 | |
|     SecurityContext *SecurityContext `json:"securityContext,omitempty"`
 | |
| }
 | |
| ```
 | |
| 
 | |
| In the V1 API, the pod-level security attributes which are currently fields of the `PodSpec` are
 | |
| retained on the `PodSpec` for backward compatibility purposes:
 | |
| 
 | |
| ```go
 | |
| package v1
 | |
| 
 | |
| type PodSpec struct {
 | |
|     // Other fields omitted
 | |
| 
 | |
|     // Use the host's network namespace. If this option is set, the ports that will be
 | |
|     // used must be specified.
 | |
|     // Optional: Default to false.
 | |
|     HostNetwork bool `json:"hostNetwork,omitempty"`
 | |
|     // Use the host's pid namespace.
 | |
|     // Optional: Default to false.
 | |
|     HostPID bool `json:"hostPID,omitempty"`
 | |
|     // Use the host's ipc namespace.
 | |
|     // Optional: Default to false.
 | |
|     HostIPC bool `json:"hostIPC,omitempty"`
 | |
| 
 | |
|     // Optional: SecurityContext specifies pod-level attributes and container security attributes
 | |
|     // that apply to all containers.
 | |
|     SecurityContext *PodSecurityContext `json:"securityContext,omitempty"`
 | |
| }
 | |
| ```
 | |
| 
 | |
| The `pod.Spec.SecurityContext` specifies the security context of all containers in the pod.
 | |
| The containers' `securityContext` field is overlaid on the base security context to determine the
 | |
| effective security context for the container.
 | |
| 
 | |
| The new V1 API should be backward compatible with the existing API.  Backward compatibility is
 | |
| defined as:
 | |
| 
 | |
| > 1.  Any API call (e.g. a structure POSTed to a REST endpoint) that worked before your change must
 | |
| >     work the same after your change.
 | |
| > 2.  Any API call that uses your change must not cause problems (e.g. crash or degrade behavior) when
 | |
| >     issued against servers that do not include your change.
 | |
| > 3.  It must be possible to round-trip your change (convert to different API versions and back) with
 | |
| >     no loss of information.
 | |
| 
 | |
| Previous versions of this proposal attempted to deal with backward compatibility by defining
 | |
| the affect of setting the pod-level fields on the container-level fields.  While trying to find
 | |
| consensus on this design, it became apparent that this approach was going to be extremely complex
 | |
| to implement, explain, and support.  Instead, we will approach backward compatibility as follows:
 | |
| 
 | |
| 1.  Pod-level and container-level settings will not affect one another
 | |
| 2.  Old clients will be able to use container-level settings in the exact same way
 | |
| 3.  Container level settings always override pod-level settings if they are set
 | |
| 
 | |
| #### Examples
 | |
| 
 | |
| 1.  Old client using `pod.Spec.Containers[x].SecurityContext`
 | |
| 
 | |
|     An old client creates a pod:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       containers:
 | |
|       - name: a
 | |
|         securityContext:
 | |
|           runAsUser: 1001
 | |
|       - name: b
 | |
|         securityContext:
 | |
|           runAsUser: 1002
 | |
|     ```
 | |
| 
 | |
|     looks to old clients like:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       containers:
 | |
|       - name: a
 | |
|         securityContext:
 | |
|           runAsUser: 1001
 | |
|       - name: b
 | |
|         securityContext:
 | |
|           runAsUser: 1002
 | |
|     ```
 | |
| 
 | |
|     looks to new clients like:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       containers:
 | |
|       - name: a
 | |
|         securityContext:
 | |
|           runAsUser: 1001
 | |
|       - name: b
 | |
|         securityContext:
 | |
|           runAsUser: 1002
 | |
|     ```
 | |
| 
 | |
| 2.  New client using `pod.Spec.SecurityContext`
 | |
| 
 | |
|     A new client creates a pod using a field of `pod.Spec.SecurityContext`:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       securityContext:
 | |
|         runAsUser: 1001
 | |
|       containers:
 | |
|       - name: a
 | |
|       - name: b
 | |
|     ```
 | |
| 
 | |
|     appears to new clients as:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       securityContext:
 | |
|         runAsUser: 1001
 | |
|       containers:
 | |
|       - name: a
 | |
|       - name: b
 | |
|     ```
 | |
| 
 | |
|     old clients will see:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       containers:
 | |
|       - name: a
 | |
|       - name: b
 | |
|     ```
 | |
| 
 | |
| 3.  Pods created using `pod.Spec.SecurityContext` and `pod.Spec.Containers[x].SecurityContext`
 | |
| 
 | |
|     If a field is set in both `pod.Spec.SecurityContext` and
 | |
|     `pod.Spec.Containers[x].SecurityContext`, the value in `pod.Spec.Containers[x].SecurityContext`
 | |
|     wins.  In the following pod:
 | |
| 
 | |
|     ```yaml
 | |
|     apiVersion: v1
 | |
|     kind: Pod
 | |
|     metadata:
 | |
|       name: test-pod
 | |
|     spec:
 | |
|       securityContext:
 | |
|         runAsUser: 1001
 | |
|       containers:
 | |
|       - name: a
 | |
|         securityContext:
 | |
|           runAsUser: 1002
 | |
|       - name: b
 | |
|     ```
 | |
| 
 | |
|     The effective setting for `runAsUser` for container A is `1002`.
 | |
| 
 | |
| #### Testing
 | |
| 
 | |
| A backward compatibility test suite will be established for the v1 API.  The test suite will
 | |
| verify compatibility by converting objects into the internal API and back to the version API and
 | |
| examining the results.
 | |
| 
 | |
| All of the examples here will be used as test-cases.  As more test cases are added, the proposal will
 | |
| be updated.
 | |
| 
 | |
| An example of a test like this can be found in the
 | |
| [OpenShift API package](https://github.com/openshift/origin/blob/master/pkg/api/compatibility_test.go)
 | |
| 
 | |
| E2E test cases will be added to test the correct determination of the security context for containers.
 | |
| 
 | |
| ### Kubelet changes
 | |
| 
 | |
| 1.  The Kubelet will use the new fields on the `PodSecurityContext` for host namespace control
 | |
| 2.  The Kubelet will be modified to correctly implement the backward compatibility and effective
 | |
|     security context determination defined here
 | |
| 
 | |
| <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
 | |
| []()
 | |
| <!-- END MUNGE: GENERATED_ANALYTICS -->
 |