<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/kubernetes/img/warning.png" alt="WARNING"
     width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<!-- TAG RELEASE_LINK, added by the munger automatically -->
<strong>
The latest release of this document can be found
[here](http://releases.k8s.io/release-1.3/docs/design/daemon.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# DaemonSet in Kubernetes

**Author**: Ananya Kumar (@AnanyaKumar)

**Status**: Implemented.

This document presents the design of the Kubernetes DaemonSet, describes use
cases, and gives an overview of the code.

## Motivation

Many users have requested a way to run a daemon on every node in a Kubernetes
cluster, or on a certain set of nodes in a cluster. This is essential for use
cases such as building a sharded datastore or running a logger on every node.
Enter the DaemonSet: a convenient way to create and manage daemon-like
workloads in Kubernetes.

## Use Cases

The DaemonSet can be used for user-specified system services, cluster-level
applications with strong node ties, and Kubernetes node services. Below are
example use cases in each category.

### User-Specified System Services

Logging: Some users want a way to collect statistics and logs from the nodes
in a cluster and ship them to an external database. For example, system
administrators might want to know if their machines are performing as
expected, if they need to add more machines to the cluster, or if they should
switch cloud providers. The DaemonSet can be used to run a data collection
service (for example, fluentd) on every node and send the data to a service
like Elasticsearch for analysis.
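
A minimal sketch of what such a logging DaemonSet might look like. The name
and image here are hypothetical, and the manifest follows the
`extensions/v1beta1` schema used later in this document:

```yaml
# Hypothetical logging DaemonSet: runs one fluentd pod on every node.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-logger        # hypothetical name
spec:
  template:
    metadata:
      labels:
        app: fluentd-logger
    spec:
      restartPolicy: Always   # DaemonSet pod templates must use Always
      containers:
        - name: fluentd
          image: fluentd      # hypothetical image
```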

### Cluster-Level Applications

Datastore: Users might want to implement a sharded datastore in their cluster.
A few nodes in the cluster, labeled `app=datastore-node`, might be responsible
for storing data shards, and pods running on these nodes might serve the data.
This architecture requires a way to bind pods to specific nodes, so it cannot
be achieved using a Replication Controller. A DaemonSet is a convenient way to
implement such a datastore.

For other uses, see the related [feature request](https://issues.k8s.io/1518).

## Functionality

The DaemonSet supports standard API features:
  - create
  - The spec for DaemonSets has a pod template field.
  - Using the pod’s nodeSelector field, DaemonSets can be restricted to operate
over nodes that have a certain label. For example, suppose that in a cluster
some nodes are labeled `app=datastore-node`. You can use a DaemonSet to launch
a datastore pod on exactly those nodes.
  - Using the pod’s nodeName field, DaemonSets can be restricted to operate on
a specified node (see the sketch after this list).
  - The PodTemplateSpec used by the DaemonSet is the same as the
PodTemplateSpec used by the Replication Controller.
  - The initial implementation will not guarantee that DaemonSet pods are
created on nodes before other pods.
  - The initial implementation of DaemonSet does not guarantee that DaemonSet
pods show up on nodes (for example, because of resource limitations of the
node), but makes a best effort to launch DaemonSet pods (as Replication
Controllers do with pods). Subsequent revisions might ensure that DaemonSet
pods show up on nodes, preempting other pods if necessary.
  - The DaemonSet controller adds the annotation
`kubernetes.io/created-by: <json API object reference>` to the pods it creates.
  - YAML example:

    ```yaml
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      labels:
        app: datastore
      name: datastore
    spec:
      template:
        metadata:
          labels:
            app: datastore-shard
        spec:
          nodeSelector:
            app: datastore-node
          containers:
            - name: datastore-shard
              image: kubernetes/sharded
              ports:
                - containerPort: 9042
                  name: main
    ```

  - Commands that get info:
    - get (e.g. `kubectl get daemonsets`)
    - describe
  - Modifiers:
    - delete (if `--cascade=true`, the client first turns down all the pods
controlled by the DaemonSet by setting its nodeSelector to a uuid pair that is
unlikely to be set on any node; it then deletes the DaemonSet, and finally
deletes the pods)
    - label
    - annotate
    - update operations like patch and replace (only allowed on the selector
and on the nodeSelector and nodeName of the pod template)
    - DaemonSets have labels, so you could, for example, list all DaemonSets
with certain labels (the same way you would for a Replication Controller).
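
The nodeName restriction mentioned in the list above can be sketched as
follows; the DaemonSet then operates only on the named node. The DaemonSet and
node names here are hypothetical:

```yaml
# Hypothetical DaemonSet pinned to a single node via the pod template's
# nodeName field.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: single-node-daemon    # hypothetical name
spec:
  template:
    metadata:
      labels:
        app: single-node-daemon
    spec:
      nodeName: worker-1      # hypothetical node; the only node this DaemonSet manages
      restartPolicy: Always
      containers:
        - name: daemon
          image: kubernetes/sharded   # image borrowed from the example above
```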

In general, for all the supported features like get, describe, update, etc.,
the DaemonSet works in a similar way to the Replication Controller. However,
note that the DaemonSet and the Replication Controller are different
constructs.

### Persisting Pods

  - Ordinary liveness probes specified in the pod template work to keep pods
created by a DaemonSet running.
  - If a daemon pod is killed or stopped, the DaemonSet will create a new
replica of the daemon pod on the node.
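
For instance, the pod template from the YAML example above could carry an
ordinary liveness probe, as in this fragment of its containers section (the
`/healthz` endpoint and the timings are hypothetical):

```yaml
# Fragment of a DaemonSet pod template's containers section with a
# liveness probe; the kubelet restarts the container when the probe fails.
containers:
  - name: datastore-shard
    image: kubernetes/sharded
    livenessProbe:
      httpGet:
        path: /healthz          # hypothetical health endpoint
        port: 9042
      initialDelaySeconds: 15   # hypothetical timings
      timeoutSeconds: 1
```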

### Cluster Mutations

  - When a new node is added to the cluster, the DaemonSet controller starts
daemon pods on the node for DaemonSets whose pod template nodeSelectors match
the node’s labels.
  - Suppose the user launches a DaemonSet that runs a logging daemon on all
nodes labeled `logger=fluentd`. If the user then adds the `logger=fluentd`
label to a node that did not initially have it, the logging daemon will launch
on that node. Additionally, if a user removes the label from a node, the
logging daemon on that node will be killed.
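
For example, with a hypothetical node named worker-1, adding or removing the
label on the Node object is what starts or kills the daemon pod on it:

```yaml
# Hypothetical node carrying the label matched by the logging DaemonSet's
# nodeSelector (logger: fluentd).
apiVersion: v1
kind: Node
metadata:
  name: worker-1        # hypothetical node name
  labels:
    logger: fluentd     # removing this label kills the daemon pod on this node
```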

## Alternatives Considered

We considered several alternatives that were deemed inferior to the approach
of creating a new DaemonSet abstraction.

One alternative is to include the daemon in the machine image. In this case it
would run outside of Kubernetes proper, and thus not be monitored,
health-checked, usable as a service endpoint, easily upgradable, etc.

A related alternative is to package daemons as static pods. This would address
most of the problems described above, but static pods would still not be
easily upgradable and, more generally, could not be managed through the API
server interface.

A third alternative is to generalize the Replication Controller. We would do
something like: if you set the `replicas` field of the
ReplicationControllerSpec to -1, then it means "run exactly one replica on
every node matching the nodeSelector in the pod template." The
ReplicationController would pretend `replicas` had been set to some large
number -- larger than the largest number of nodes ever expected in the
cluster -- and would use some anti-affinity mechanism to ensure that no more
than one Pod from the ReplicationController runs on any given node. There are
two downsides to this approach. First, there would always be a large number of
Pending pods in the scheduler (these would be scheduled onto new machines when
they are added to the cluster). The second downside is more philosophical:
DaemonSet and the Replication Controller are very different concepts. We
believe that having small, targeted controllers for distinct purposes makes
Kubernetes easier to understand and use, compared to having larger
multi-functional controllers (see
["Convert ReplicationController to a plugin"](http://issues.k8s.io/3058) for
some discussion of this topic).
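
To make the rejected generalization concrete, it might have looked something
like the sketch below. This is purely illustrative: `replicas: -1` was never a
valid value in any Kubernetes release.

```yaml
# Sketch of the REJECTED alternative: overloading ReplicationController
# to mean "one replica per node matching the nodeSelector".
apiVersion: v1
kind: ReplicationController
metadata:
  name: datastore               # hypothetical name
spec:
  replicas: -1                  # hypothetical sentinel, never implemented
  template:
    metadata:
      labels:
        app: datastore-shard
    spec:
      nodeSelector:
        app: datastore-node
      containers:
        - name: datastore-shard
          image: kubernetes/sharded
```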

## Design

#### Client

- Add support for DaemonSet commands to kubectl and the client. Client code
was added to `pkg/client/unversioned`. The main kubectl files modified are
`pkg/kubectl/describe.go` and `pkg/kubectl/stop.go`, since for other calls
like Get, Create, and Update, the client simply forwards the request to the
backend via the REST API.


#### Apiserver

- Accept, parse, and validate client commands.
- REST API calls are handled in `pkg/registry/daemonset`.
  - In particular, the API server adds the object to etcd.
  - The daemon manager listens for updates to etcd (using
`framework.NewInformer`).
- API objects for DaemonSet were created in `expapi/v1/types.go` and
`expapi/v1/register.go`.
- Validation code is in `expapi/validation`.

#### Daemon Manager

- Creates new DaemonSets when requested. Launches the corresponding daemon pod
on all nodes with labels matching the new DaemonSet’s selector.
- Listens for the addition of new nodes to the cluster by setting up a
`framework.NewInformer` that watches for the creation of Node API objects.
When a new node is added, the daemon manager loops through each DaemonSet. If
the labels of the node match the selector of a DaemonSet, the daemon manager
creates the corresponding daemon pod on the new node.
- The daemon manager creates a pod on a node by sending a command to the API
server, requesting that a pod be bound to the node (the node is specified via
its hostname).
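
Conceptually, the pod that the daemon manager asks the API server to create is
the DaemonSet's pod template with the target node filled in. A rough sketch,
with hypothetical names:

```yaml
# Sketch of a pod as the daemon manager would create it: a copy of the
# DaemonSet's pod template with nodeName set, so the pod is bound to one
# node and bypasses the scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: datastore-shard-worker-1    # hypothetical generated name
  labels:
    app: datastore-shard
  annotations:
    kubernetes.io/created-by: "<json API object reference>"   # placeholder; see Functionality
spec:
  nodeName: worker-1                # the target node's hostname
  restartPolicy: Always             # required for DaemonSet pod templates
  containers:
    - name: datastore-shard
      image: kubernetes/sharded
```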

#### Kubelet

- Does not need to be modified. Health checking works for the daemon pods, and
the kubelet will revive them if they are killed, because the pod restartPolicy
is set to Always. We reject DaemonSet objects whose pod templates do not have
restartPolicy set to Always.

## Open Issues

- Should work similarly to [Deployment](http://issues.k8s.io/1743).