diff --git a/docs/proposals/scheduledjob.md b/docs/proposals/scheduledjob.md index 0625f09d836..5a474b0a888 100644 --- a/docs/proposals/scheduledjob.md +++ b/docs/proposals/scheduledjob.md @@ -62,14 +62,27 @@ report generation and the like. Each of these tasks should be allowed to run repeatedly (once a day/month, etc.) or once at a given point in time. -## Implementation +## Design Overview + +Users create a ScheduledJob object. One ScheduledJob object +is like one line of a crontab file. It has a schedule of when to run, +in [Cron](https://en.wikipedia.org/wiki/Cron) format. + + +The ScheduledJob controller creates a Job object [Job](job.md) +about once per execution time of the scheduled (e.g. once per +day for a daily schedule.) We say "about" because there are certain +circumstances where two jobs might be created, or no job might be +created. We attempt to make these rare, but do not completely prevent +them. Therefore, Jobs should be idempotent. + +The Job object is responsible for any retrying of Pods, and any parallelism +among pods it creates, and determining the success or failure of the set of +pods. The ScheduledJob does not examine pods at all. + ### ScheduledJob resource -The ScheduledJob controller relies heavily on the [Job API](job.md) -for running actual jobs, on top of which it adds information regarding the date -and time part according to [Cron](https://en.wikipedia.org/wiki/Cron) format. - The new `ScheduledJob` object will have the following contents: ```go @@ -173,31 +186,12 @@ type ScheduledJobStatus struct { } ``` -### Modifications to Job resource +Users must use a generated selector for the job. -In order to distinguish Job runs, we need to add `UniqueLabelKey` field to `JobSpec`. -This field will be used for creating unique label selectors. +## Modifications to Job resource -```go -type JobSpec { - - //... - - // Key of the selector that is added to prevent concurrently running Jobs - // selecting their pods. - // Users can set this to an empty string to indicate that the system should - // not add any selector and label. If unspecified, system uses - // "scheduledjob.kubernetes.io/podTemplateHash". - // Value of this key is hash of ScheduledJobSpec.PodTemplateSpec. - // No label is added if this is set to an empty string. - UniqueLabelKey *string -} -``` - -Although at Job level empty string is perfectly valid, `ScheduledJob` cannot have -empty selector, it needs to be defined, either by user or generated automatically. -For this to happen, validation will be tightened at ScheduledJob level for this -field to be either nil or non-empty string. +TODO for beta: forbid manual selector since that could cause confusing between +subsequent jobs. ### Running ScheduledJobs using kubectl @@ -216,6 +210,151 @@ In the above example: July 7th, 2pm. This value will be validated according to the same rules which apply to `.spec.schedule`. +## Fields Added to Job Template + +When the controller creates a Job from the JobTemplateSpec in the ScheduledJob, it +adds the following fields to the Job: + +- a name, based on the ScheduledJob's name, but with a suffix to distinguish + multiple executions, which may overlap. +- the standard created-by annotation on the Job, pointing to the SJ that created it + The standard key is `kubernetes.io/created-by`. The value is a serialized JSON object, like + `{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ScheduledJob","namespace":"default",` + `"name":"nightly-earnings-report","uid":"5ef034e0-1890-11e6-8935-42010af0003e","apiVersion":...` + This serialization contains the UID of the parent. This is used to match the Job to the SJ that created + it. + +## Updates to ScheduledJobs + +If the schedule is updated on a ScheduledJob, it will: +- continue to use the Status.Active list of jobs to detect conflicts. +- try to fulfill all recently-passed times for the new schedule, by starting + new jobs. But it will not try to fulfill times prior to the + Status.LastScheduledTime. + - Example: If you have a schedule to run every 30 minutes, and change that to hourly, then the previously started + top-of-the-hour run, in Status.Active, will be seen and no new job started. + - Example: If you have a schedule to run every hour, change that to 30-minutely, at 31 minutes past the hour, + one run will be started immediately for the starting time that has just passed. + +If the job template of a ScheduledJob is updated, then future executions use the new template +but old ones still satisfy the schedule and are not re-run just because the template changed. + +If you delete and replace a ScheduledJob with one of the same name, it will: +- not use any old Status.Active, and not consider any existing running or terminated jobs from the previous + ScheduledJob (with a different UID) at all when determining coflicts, what needs to be started, etc. +- If there is an existing Job with the same time-based hash in its name (see below), then + new instances of that job will not be able to be created. So, delete it if you want to re-run. +with the same name as conflicts. +- not "re-run" jobs for "start times" before the creation time of the new ScheduledJobJob object. +- not consider executions from the previous UID when making decisions about what executions to + start, or status, etc. +- lose the history of the old SJ. + +To preserve status, you can suspend the old one, and make one with a new name, or make a note of the old status. + + +## Fault-Tolerance + +### Starting Jobs in the face of controller failures + +If the process with the scheduledJob controller in it fails, +and takes a while to restart, the scheduledJob controller +may miss the time window and it is too late to start a job. + +With a single scheduledJob controller process, we cannot give +very strong assurances about not missing starting jobs. + +With a suggested HA configuration, there are multiple controller +processes, and they use master election to determine which one +is active at any time. + +If the Job's StartingDeadlineSeconds is long enough, and the +lease for the master lock is short enough, and other controller +processes are running, then a Job will be started. + +TODO: consider hard-coding the minimum StartingDeadlineSeconds +at say 1 minute. Then we can offer a clearer guarantee, +assuming we know what the setting of the lock lease duration is. + +### Ensuring jobs are run at most once + +There are three problems here: + +- ensure at most one Job created per "start time" of a schedule. +- ensure that at most one Pod is created per Job +- ensure at most one container start occurs per Pod + +#### Ensuring one Job + +Multiple jobs might be created in the following sequence: + +1. scheduled job controller sends request to start Job J1 to fulfill start time T. +1. the create request is accepted by the apiserver and enqueued but not yet written to etcd. +1. scheduled job controller crashes +1. new scheduled job controller starts, and lists the existing jobs, and does not see one created. +1. it creates a new one. +1. the first one eventually gets written to etcd. +1. there are now two jobs for the same start time. + +We can solve this in several ways: + +1. with three-phase protocol, e.g.: + 1. controller creates a "suspended" job. + 1. controller writes writes an annotation in the SJ saying that it created a job for this time. + 1. controller unsuspends that job. +1. by picking a deterministic name, so that at most one object create can succeed. + +#### Ensuring one Pod + +Job object does not currently have a way to ask for this. +Even if it did, controller is not written to support it. +Same problem as above. + +#### Ensuring one container invocation per Pod + +Kubelet is not written to ensure at-most-one-container-start per pod. + +#### Decision + +This is too hard to do for the alpha version. We will await user +feedback to see if the "at most once" property is needed in the beta version. + +This is awkward but possible for a containerized application ensure on it own, as it needs +to know what ScheduledJob name and Start Time it is from, and then record the attempt +in a shared storage system. We should ensure it could extract this data from its annotations +using the downward API. + +## Name of Jobs + +A ScheduledJob creates one Job at each time when a Job should run. +Since there may be concurrent jobs, and since we might want to keep failed +non-overlapping Jobs around as a debugging record, each Job created by the same ScheduledJob +needs a distinct name. + +To make the Jobs from the same ScheduledJob distinct, we could use a random string, +in the way that pods have a `generateName`. For example, a scheduledJob named `nightly-earnings-report` +in namespace `ns1` might create a job `nightly-earnings-report-3m4d3`, and later create +a job called `nightly-earnings-report-6k7ts`. This is consistent with pods, but +does not give the user much information. + +Alternatively, we can use time as a uniqifier. For example, the same scheduledJob could +create a job called `nightly-earnings-report-2016-May-19`. +However, for Jobs that run more than once per day, we would need to represent +time as well as date. Standard date formats (e.g. RFC 3339) use colons for time. +Kubernetes names cannot include time. Using a non-standard date format without colons +will annoy some users. + +Also, date strings are much longer than random suffixes, which means that +the pods will also have long names, and that we are more likely to exceed the +253 character name limit when combining the scheduled-job name, +the time suffix, and pod random suffix. + +One option would be to compute a hash of the nominal start time of the job, +and use that as a suffix. This would not provide the user with an indication +of the start time, but it would prevent creation of the same execution +by two instances (replicated or restarting) of the controller process. + +We chose to use the hashed-date suffix approach. ## Future evolution