ScheduledJob: proposal updates

2025-07-31 15:25:57 +00:00 · 2016-05-13 14:56:05 -07:00 · 2016-05-13 14:56:05 -07:00 · 87cba77e34
commit 87cba77e34
parent d6854cbb6b
1 changed files with 167 additions and 28 deletions
--- a/docs/proposals/scheduledjob.md
+++ b/docs/proposals/scheduledjob.md
@@ -62,14 +62,27 @@ report generation and the like.  Each of these tasks should be allowed to run
 repeatedly (once a day/month, etc.) or once at a given point in time.
-## Implementation
+## Design Overview
 Users create a ScheduledJob object.  One ScheduledJob object
 is like one line of a crontab file.  It has a schedule of when to run,
 in [Cron](https://en.wikipedia.org/wiki/Cron) format.
 The ScheduledJob controller creates a [Job](job.md) object
 about once per execution time of the schedule (e.g. once per
 day for a daily schedule.)  We say "about" because there are certain
 circumstances where two jobs might be created, or no job might be
 created.  We attempt to make these rare, but do not completely prevent
 them.  Therefore, Jobs should be idempotent.
 The Job object is responsible for any retrying of Pods, and any parallelism
 among pods it creates, and determining the success or failure of the set of
 pods.  The ScheduledJob does not examine pods at all.
 ### ScheduledJob resource
 The ScheduledJob controller relies heavily on the [Job API](job.md)
 for running actual jobs, on top of which it adds information regarding the date
 and time part according to [Cron](https://en.wikipedia.org/wiki/Cron) format.
 The new `ScheduledJob` object will have the following contents:
 ```go
@@ -173,31 +186,12 @@ type ScheduledJobStatus struct {
 }
 ```
-### Modifications to Job resource
-In order to distinguish Job runs, we need to add `UniqueLabelKey` field to `JobSpec`.
-This field will be used for creating unique label selectors.
-```go
-type JobSpec {
-   //...
-   // Key of the selector that is added to prevent concurrently running Jobs
-   // selecting their pods.
-   // Users can set this to an empty string to indicate that the system should
-   // not add any selector and label. If unspecified, system uses
-   // "scheduledjob.kubernetes.io/podTemplateHash".
-   // Value of this key is hash of ScheduledJobSpec.PodTemplateSpec.
-   // No label is added if this is set to an empty string.
-   UniqueLabelKey *string
-}
-```
-Although at Job level empty string is perfectly valid, `ScheduledJob` cannot have
-empty selector, it needs to be defined, either by user or generated automatically.
-For this to happen, validation will be tightened at ScheduledJob level for this
-field to be either nil or non-empty string.
+Users must use a generated selector for the job.
+## Modifications to Job resource
+TODO for beta: forbid manual selector since that could cause confusion between
+subsequent jobs.
 ### Running ScheduledJobs using kubectl
@@ -216,6 +210,151 @@ In the above example:
  July 7th, 2pm.  This value will be validated according to the same rules which
  apply to `.spec.schedule`.
 ## Fields Added to Job Template
 When the controller creates a Job from the JobTemplateSpec in the ScheduledJob, it
 adds the following fields to the Job:
 - a name, based on the ScheduledJob's name, but with a suffix to distinguish
  multiple executions, which may overlap.
 - the standard created-by annotation on the Job, pointing to the SJ that created it
  The standard key is `kubernetes.io/created-by`.  The value is a serialized JSON object, like
  `{ "kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ScheduledJob","namespace":"default",`
  `"name":"nightly-earnings-report","uid":"5ef034e0-1890-11e6-8935-42010af0003e","apiVersion":...`
  This serialization contains the UID of the parent.  This is used to match the Job to the SJ that created
  it.
 ## Updates to ScheduledJobs
 If the schedule is updated on a ScheduledJob, it will:
 - continue to use the Status.Active list of jobs to detect conflicts.
 - try to fulfill all recently-passed times for the new schedule, by starting
  new jobs.  But it will not try to fulfill times prior to the
  Status.LastScheduledTime.
  - Example:   If you have a schedule to run every 30 minutes, and change that to hourly, then the previously started
    top-of-the-hour run, in Status.Active, will be seen and no new job started.
  - Example:   If you have a schedule to run every hour, change that to 30-minutely, at 31 minutes past the hour,
    one run will be started immediately for the starting time that has just passed.
 If the job template of a ScheduledJob is updated, then future executions use the new template
 but old ones still satisfy the schedule and are not re-run just because the template changed.
 If you delete and replace a ScheduledJob with one of the same name, it will:
 - not use any old Status.Active, and not consider any existing running or terminated jobs from the previous
  ScheduledJob (with a different UID) at all when determining conflicts, what needs to be started, etc.
 - If there is an existing Job with the same time-based hash in its name (see below), then
  a new Job for that start time cannot be created, because the names conflict.
  So, delete the existing Job if you want to re-run.
 - not "re-run" jobs for "start times" before the creation time of the new ScheduledJob object.
 - not consider executions from the previous UID when making decisions about what executions to
 start, or status, etc.
 - lose the history of the old SJ.
 To preserve status, you can suspend the old one, and make one with a new name, or make a note of the old status.
 ## Fault-Tolerance
 ### Starting Jobs in the face of controller failures
 If the process with the scheduledJob controller in it fails,
 and takes a while to restart, the scheduledJob controller
 may miss the time window and it is too late to start a job.
 With a single scheduledJob controller process, we cannot give
 very strong assurances about not missing starting jobs.
 With a suggested HA configuration, there are multiple controller
 processes, and they use master election to determine which one
 is active at any time.
 If the Job's StartingDeadlineSeconds is long enough, and the
 lease for the master lock is short enough, and other controller
 processes are running, then a Job will be started.
 TODO: consider hard-coding the minimum StartingDeadlineSeconds
 at say 1 minute.  Then we can offer a clearer guarantee,
 assuming we know what the setting of the lock lease duration is.
 ### Ensuring jobs are run at most once
 There are three problems here:
 - ensure at most one Job created per "start time" of a schedule.
 - ensure that at most one Pod is created per Job
 - ensure at most one container start occurs per Pod
 #### Ensuring one Job
 Multiple jobs might be created in the following sequence:
 1. scheduled job controller sends request to start Job J1 to fulfill start time T.
 1. the create request is accepted by the apiserver and enqueued but not yet written to etcd.
 1. scheduled job controller crashes
 1. new scheduled job controller starts, and lists the existing jobs, and does not see one created.
 1. it creates a new one.
 1. the first one eventually gets written to etcd.
 1. there are now two jobs for the same start time.
 We can solve this in several ways:
 1. with three-phase protocol, e.g.:
  1. controller creates a "suspended" job.
  1. controller writes an annotation in the SJ saying that it created a job for this time.
  1. controller unsuspends that job.
 1. by picking a deterministic name, so that at most one object create can succeed.
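To illustrate the deterministic-name option, here is a small sketch. The `store` type is a hypothetical in-memory stand-in for the apiserver, assumed only to reject a create when the name already exists; the job name suffix is made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAlreadyExists = errors.New("already exists")

// store is a hypothetical stand-in for the apiserver: object names are unique.
type store struct {
	mu   sync.Mutex
	jobs map[string]string // job name -> creator that won the create
}

func newStore() *store { return &store{jobs: map[string]string{}} }

// create succeeds only for the first caller that uses a given name.
func (s *store) create(name, creator string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.jobs[name]; ok {
		return errAlreadyExists
	}
	s.jobs[name] = creator
	return nil
}

func main() {
	s := newStore()
	// Both controller instances derive the same name for start time T
	// (the suffix here is illustrative only).
	name := "nightly-earnings-report-t1"

	fmt.Println(s.create(name, "old-controller")) // first create wins
	fmt.Println(s.create(name, "new-controller")) // duplicate is rejected
}
```

Whichever create is written first wins, so a restarted controller that retries the same start time cannot produce a second Job.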
 #### Ensuring one Pod
 Job object does not currently have a way to ask for this.
 Even if it did, controller is not written to support it.
 Same problem as above.
 #### Ensuring one container invocation per Pod
 Kubelet is not written to ensure at-most-one-container-start per pod.
 #### Decision
 This is too hard to do for the alpha version.  We will await user
 feedback to see if the "at most once" property is needed in the beta version.
 It is awkward but possible for a containerized application to ensure this on its own: it needs
 to know which ScheduledJob name and Start Time it belongs to, and then record the attempt
 in a shared storage system.   We should ensure it can extract this data from its annotations
 using the downward API.
 ## Name of Jobs
 A ScheduledJob creates one Job at each time when a Job should run.
 Since there may be concurrent jobs, and since we might want to keep failed
 non-overlapping Jobs around as a debugging record, each Job created by the same ScheduledJob
 needs a distinct name.
 To make the Jobs from the same ScheduledJob distinct, we could use a random string,
 in the way that pods have a `generateName`.  For example, a scheduledJob named `nightly-earnings-report`
 in namespace `ns1` might create a job `nightly-earnings-report-3m4d3`, and later create
 a job called `nightly-earnings-report-6k7ts`.  This is consistent with pods, but
 does not give the user much information.
 Alternatively, we can use time as a uniquifier.  For example, the same scheduledJob could
 create a job called `nightly-earnings-report-2016-May-19`.
 However, for Jobs that run more than once per day, we would need to represent
 time as well as date.  Standard date formats (e.g. RFC 3339) use colons for time.
 Kubernetes names cannot include colons.  Using a non-standard date format without colons
 will annoy some users.
 Also, date strings are much longer than random suffixes, which means that
 the pods will also have long names, and that we are more likely to exceed the
 253 character name limit when combining the scheduled-job name,
 the time suffix, and pod random suffix.
 One option would be to compute a hash of the nominal start time of the job,
 and use that as a suffix.  This would not provide the user with an indication
 of the start time, but it would prevent creation of the same execution
 by two instances (replicated or restarting) of the controller process.
 We chose to use the hashed-date suffix approach.
 ## Future evolution