mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-07-23 11:50:44 +00:00
Job controller proposal
This commit is contained in:
parent
f21a6e9a93
commit
688f3da839
docs/proposals/job.md
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/proposals/job.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Job Controller

## Abstract

A proposal for implementing a new controller, the Job controller, which will be responsible
for managing pods that must run once to completion, even if the machine the pod is
running on fails. This is in contrast to what ReplicationController currently offers.

Several issues and PRs have already been opened on this subject:

* Job Controller [#1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624)
* New Job resource [#7380](https://github.com/GoogleCloudPlatform/kubernetes/pull/7380)

## Use Cases

1. Be able to start one or several pods tracked as a single entity.
1. Be able to run batch-oriented workloads on Kubernetes.
1. Be able to get the job status.
1. Be able to specify the number of instances performing a job at any one time.
1. Be able to specify the number of successfully finished instances required to finish a job.

## Motivation

Jobs are needed for running multi-pod computations to completion; a good example
is any kind of batch-oriented task.

## Implementation

The Job controller is similar to the replication controller in that both manage pods.
This implies that it will follow the same controller framework that replication
controllers already define. The biggest difference between a `Job` and a
`ReplicationController` object is their purpose: a `ReplicationController`
ensures that a specified number of pods are running at any one time, whereas a
`Job` is responsible for running the desired number of pods until a task
completes. This difference will be represented by the `RestartPolicy` of the pod
template, which is required to always take the value `RestartPolicyNever` or
`RestartPolicyOnFailure`.

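As a minimal sketch, the restart-policy requirement could be enforced by a validation check like the one below. The `RestartPolicy` type and its three values mirror the pod API; `validateJobRestartPolicy` is a hypothetical helper, not part of any existing package. It rejects `Always` because a pod that is always restarted never reaches completion.

```go
package main

import "fmt"

// RestartPolicy mirrors the pod-level restart policy values of the pod API.
type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// validateJobRestartPolicy is a hypothetical check: a Job's pod template may
// only use a restart policy that lets pods terminate, so "Always" is rejected.
func validateJobRestartPolicy(p RestartPolicy) error {
	switch p {
	case RestartPolicyNever, RestartPolicyOnFailure:
		return nil
	default:
		return fmt.Errorf("unsupported restart policy %q for a Job: must be %q or %q",
			p, RestartPolicyNever, RestartPolicyOnFailure)
	}
}

func main() {
	for _, p := range []RestartPolicy{RestartPolicyNever, RestartPolicyOnFailure, RestartPolicyAlways} {
		if err := validateJobRestartPolicy(p); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", p)
		}
	}
}
```

Placing this check in validation means an invalid Job is refused at creation time, before the controller ever tries to run its pods.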
The new `Job` object will have the following content:

```go
// Job represents the configuration of a single job.
type Job struct {
	TypeMeta
	ObjectMeta

	// Spec is a structure defining the expected behavior of a job.
	Spec JobSpec

	// Status is a structure describing current status of a job.
	Status JobStatus
}

// JobList is a collection of jobs.
type JobList struct {
	TypeMeta
	ListMeta

	Items []Job
}
```

The `JobSpec` structure contains all the information describing how the job will be
executed.

```go
// JobSpec describes how the job will be executed.
type JobSpec struct {
	// Parallelism specifies the maximum desired number of pods the job should
	// run at any given time. The actual number of pods running in steady state will
	// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
	// i.e. when the work left to do is less than max parallelism.
	Parallelism *int

	// Completions specifies the desired number of successfully finished pods the
	// job should be run with. Defaults to 1.
	Completions *int

	// Selector is a label query over pods running a job.
	Selector map[string]string

	// Template is the object that describes the pod that will be created when
	// executing a job.
	Template *PodTemplateSpec
}
```

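The `Parallelism` comment above implies that the steady-state number of running pods is the smaller of `parallelism` and the remaining work, `completions - successful`. A minimal sketch of that arithmetic follows. The `Completions` default of 1 comes from the spec; treating a nil `Parallelism` as 1 is an assumption of this sketch, since the proposal does not state a default. The helper names are hypothetical.

```go
package main

import "fmt"

// resolveCompletions applies the documented default: nil Completions means 1.
func resolveCompletions(completions *int) int {
	if completions == nil {
		return 1
	}
	return *completions
}

// resolveParallelism treats nil Parallelism as 1. This default is an
// assumption of the sketch; the proposal does not specify one.
func resolveParallelism(parallelism *int) int {
	if parallelism == nil {
		return 1
	}
	return *parallelism
}

// desiredActive returns how many pods should be running in steady state:
// never more than parallelism, and never more than the remaining work
// (completions - successful). It returns 0 once enough pods have succeeded.
func desiredActive(parallelism, completions *int, successful int) int {
	remaining := resolveCompletions(completions) - successful
	if remaining <= 0 {
		return 0
	}
	p := resolveParallelism(parallelism)
	if remaining < p {
		return remaining
	}
	return p
}

func main() {
	parallelism, completions := 3, 5
	for _, successful := range []int{0, 2, 3, 5} {
		fmt.Printf("successful=%d -> run %d pods\n", successful,
			desiredActive(&parallelism, &completions, successful))
	}
}
```

With `parallelism=3` and `completions=5`, this prints 3, 3, 2 and 0 pods for 0, 2, 3 and 5 successful pods: the job runs at full parallelism until fewer than three completions remain.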
The `JobStatus` structure holds information about the pods executing the
specified job, including those currently running it.

```go
// JobStatus represents the current state of a Job.
type JobStatus struct {
	Conditions []JobCondition

	// CreationTime represents the time when the job was created.
	CreationTime util.Time

	// StartTime represents the time when the job was started.
	StartTime util.Time

	// CompletionTime represents the time when the job was completed.
	CompletionTime util.Time

	// Active is the number of actively running pods.
	Active int

	// Successful is the number of pods that successfully completed their job.
	Successful int

	// Unsuccessful is the number of pod failures. This applies only to jobs
	// created with RestartPolicyNever; otherwise this value will always be 0.
	Unsuccessful int
}

type JobConditionType string

// These are valid conditions of a job.
const (
	// JobSucceeded means the job has successfully completed its execution.
	JobSucceeded JobConditionType = "Complete"
)

// JobCondition describes the current state of a job.
type JobCondition struct {
	Type               JobConditionType
	Status             ConditionStatus
	LastHeartbeatTime  util.Time
	LastTransitionTime util.Time
	Reason             string
	Message            string
}
```

## Events

The Job controller will emit the following events:

* JobStart
* JobFinish

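The proposal names the two events but not their exact triggers. A minimal sketch, assuming `JobStart` fires the first time a job is observed as started and `JobFinish` the first time it is observed as complete, compares two successive observations of the job. `EventRecorder`, `printRecorder` and `jobSnapshot` are hypothetical stand-ins, not the real Kubernetes event-recording API.

```go
package main

import "fmt"

// EventRecorder is a hypothetical, minimal event sink for this sketch.
type EventRecorder interface {
	Event(object, reason, message string)
}

// printRecorder writes events to stdout.
type printRecorder struct{}

func (printRecorder) Event(object, reason, message string) {
	fmt.Printf("%s %s: %s\n", object, reason, message)
}

// jobSnapshot holds just the fields this sketch compares between syncs.
type jobSnapshot struct {
	Started  bool
	Complete bool
}

// emitTransitionEvents records an event only on a false -> true transition,
// so repeated syncs of an unchanged job emit nothing. It returns the
// reasons it emitted.
func emitTransitionEvents(r EventRecorder, name string, prev, cur jobSnapshot) []string {
	var emitted []string
	if !prev.Started && cur.Started {
		r.Event(name, "JobStart", "job started")
		emitted = append(emitted, "JobStart")
	}
	if !prev.Complete && cur.Complete {
		r.Event(name, "JobFinish", "job finished")
		emitted = append(emitted, "JobFinish")
	}
	return emitted
}

func main() {
	r := printRecorder{}
	emitTransitionEvents(r, "pi", jobSnapshot{}, jobSnapshot{Started: true})
	emitTransitionEvents(r, "pi", jobSnapshot{Started: true}, jobSnapshot{Started: true})
	emitTransitionEvents(r, "pi", jobSnapshot{Started: true}, jobSnapshot{Started: true, Complete: true})
}
```

Emitting only on transitions keeps the event stream to one `JobStart` and one `JobFinish` per job, even though a controller typically re-syncs the same job many times.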
## Future evolution

Below are possible future extensions to the Job controller:

* Be able to limit the execution time for a job, similarly to ActiveDeadlineSeconds for Pods.
* Be able to create a chain of jobs that depend on one another.
* Be able to specify the work each of the workers should execute (see type 1 from
  [this comment](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624#issuecomment-97622142)).
* Be able to inspect Pods running a Job, especially after a Job has finished, e.g.
  by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)).

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
<!-- END MUNGE: GENERATED_ANALYTICS -->