Commit Graph

58 Commits

Author SHA1 Message Date
Sathyanarayanan Saravanamuthu
c84c8add70
Decouple batch/job back-off logic from workqueues (#114768)
* batch/job: decouple backoff from workqueue

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>

* Resolving review comments

* Resolving more review comments

* Resolving review comments

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>

* Computing finish time to now when FinishedAt is unix epoch

* Addressing review comments

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>

---------

Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>
2023-03-16 10:15:21 -07:00
Kubernetes Prow Robot
cb00077cd3
Merge pull request #113471 from ncdc/gc-contextual-logging
garbagecollector: use contextual logging
2023-03-10 04:34:39 -08:00
Andy Goldstein
26e3dab78b garbagecollector: use contextual logging
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
2023-03-08 08:37:56 -05:00
ahg-g
2ecd24011a Graduate JobMutableNodeSchedulingDirectives feature to GA 2023-02-28 15:47:13 +00:00
Yuan Chen
a24aef6510 Replace a function closure
Replace more closures with pointer conversion

Replace deprecated Int32Ptr to Int32
2023-02-27 09:13:36 -08:00
Daniel Vega-Myhre
c63f448451 change test names and address other comments 2023-02-23 03:25:17 +00:00
Daniel Vega-Myhre
b0b0959b92 address comments 2023-02-23 03:25:16 +00:00
Daniel Vega-Myhre
d41302312e update validation logic so completions is mutable iff completions is modified in tandem with parallelsim so completions == parallelism 2023-02-23 03:25:16 +00:00
kannon92
6dfaeff33c Remove Legacy Job Tracking 2023-01-10 14:52:54 +00:00
Aldo Culquicondor
61fe6114b3
Reduce load of Job integration test
Change-Id: If99856aa6640375a8a9feff13fa213d4f974a99a
2022-12-02 12:58:28 -05:00
Michal Wozniak
c803892bd8 Enable the feature into beta 2022-11-09 09:02:40 +01:00
Aldo Culquicondor
4948918155
Graduate JobTrackingWithFinalizers to stable
Change-Id: Ifc749a85b1270c0155ac511b91d4681d53236820
2022-11-04 17:05:53 -04:00
Aldo Culquicondor
5e03865f65
Add benchmark for large indexed job
Change-Id: I556f0cce5842699c98654cfb5a66e7c8d63b2e2e
2022-11-02 11:56:26 -04:00
Michał Woźniak
3628532311
Extend metrics with the new labels (#113324)
* Extend job metrics

* Refactor TestMetrics to extract its checks into dedicated tests per feature
2022-10-31 08:50:45 -07:00
Aldo Culquicondor
12d308f5c4 Add metric for terminated pods with tracking finalizer
Change-Id: I26f3169588c30ed82250cb7baff8e277f8d13bb7
2022-10-20 11:35:20 -04:00
Aldo Culquicondor
b8bd168180 Simplify tests for job metrics by resetting them
Change-Id: I20a0acbbb179bf895953b9d7af72625a2191b8eb
2022-10-19 13:52:00 -04:00
Kubernetes Prow Robot
bf14677914
Merge pull request #112546 from oscr/the-the
grammar: replace all occurrences of "the the" with "the"
2022-10-19 10:03:02 -07:00
Oscar Utbult
e4f776f230 grammar: replace all occurrences of "the the" with "the" 2022-10-14 09:03:14 +02:00
Michal Wozniak
b64e5b2d15 Fix the occasional double-counting job_finished_total metric
The reason for the issue is that the metrics were bumped before the
final job status update. In case the update failed the path was
repeated by the next syncJob leading to double-counting of the metrics.

The solution is to delay recording metrics and broadcasting events
after the job status update succeeds.
2022-10-13 17:23:03 +02:00
Michal Wozniak
bf9ce70de3 Support handling of pod failures with respect to the specified rules 2022-08-04 18:39:08 +02:00
Aldo Culquicondor
ca8cebe5ba Fix JobTrackingWithFinalizers when a pod succeeds after the job fails
Change-Id: I3be351fb3b53216948a37b1d58224f8fbbf22b47
2022-08-02 19:33:06 -04:00
Wojciech Tyczyński
5b042f0bf4 Remove RunAnAPIServer from integration tests 2022-07-25 17:52:31 +02:00
Wojciech Tyczyński
aee829abf4 Clean shutdown of job integration tests 2022-05-28 21:14:09 +02:00
Lukasz Szaszkiewicz
59a5c1a6ea hardens integration job tests
the job controller used by the tests must wait for the caches to sync
since the tests don't check /readyz there is no way
the tests can tell it is safe to call the server and requests won't be rejected
2022-05-24 13:47:38 +02:00
Wojciech Tyczyński
deef9e40de Simplify Create/Delete-TestingNamespace functions 2022-05-15 23:06:26 +02:00
Kubernetes Prow Robot
63a618a815
Merge pull request #109486 from alculquicondor/job-backofflimit
Fix job tracking leaving pods with finalizers
2022-05-04 01:28:14 -07:00
Aldo Culquicondor
12568860cb Test Foreground deletion in job integration
Change-Id: Ia6e86da5e66422fdb653c1ee60864a1c79233ea6
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
f2c8030845 Integration test for backoff limit and finalizers
Change-Id: Ic231ce9a5504d3aae4191901d7eb5fe69bf017ac
2022-04-20 16:39:09 -04:00
Aldo Culquicondor
3b18613be8 Disable JobTrackingWithFinalizers due to unresolved bug
Change-Id: Ieeeab689ae51dfe0dc06bdca88519d0ecf66d636
2022-04-14 15:08:14 -04:00
Aldo Culquicondor
8c00f510ef Graduate JobReadyPods to beta
Set podUpdateBatchPeriod to 1s

Change-Id: I8a10fd8f8559adad9df179b664b8c82851607855
2022-03-29 10:07:41 -04:00
Aldo Culquicondor
cd9fd12960 Reduce number of pods in Job+GC tests
To reduce the load of the integration tests. This change reduces the runtime of each test in half.

Change-Id: I71bcaadf3809643c63bb0f6b73c28778d37d8967
2022-03-25 13:01:50 -04:00
Aldo Culquicondor
1d9e3766d2 Add test for Background delete propagation
Change-Id: I033e6fb04933c64cfe6490d1019333745d58c423
2022-03-24 11:57:51 -04:00
Aldo Culquicondor
f72173e4b4 Add integration test for orphan pods when there is GC
Change-Id: I04cd70725fd1830be8daf2dca53f67bc10a379b7
2022-03-24 11:57:49 -04:00
Aldo Culquicondor
2c5d0a273c Graduate IndexedJob to stable
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance

Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
2022-03-15 13:41:06 -04:00
Abdullah Gharaibeh
b2d2ec9e76 Graduate SuspendJob to GA 2022-02-15 10:46:13 -05:00
Mike Dame
80c01707e0
Wire contexts to Batch controllers (#105491)
* Wire contexts to Batch controllers

* (hold) feedback + updates that overlap with Apps controllers

* fixup errors
2021-11-10 14:56:46 -08:00
Kubernetes Prow Robot
6edcb60d9f
Merge pull request #104915 from alculquicondor/job-ready
Track ready pods in Job status
2021-10-28 09:20:26 -07:00
Abdullah Gharaibeh
74e1b07a5e Fixes TestNodeSelectorUpdate flaky test 2021-10-25 10:33:50 -04:00
Aldo Culquicondor
68f2c892e5 Add integration tests for tracking ready Pods
Change-Id: I1f20657f4f9cd4daad73149f969bad52a33698fa
2021-10-19 15:18:37 -04:00
Aldo Culquicondor
2c1b3fdb5b Graduate JobTrackingWithFinalizers to beta
Enable feature by default.

Update integration tests for other features to assume that finalizers are present.

Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
2021-10-15 10:29:40 -04:00
Abdullah Gharaibeh
335817cbce Allow updating node affinity, selector and tolerations for suspended jobs that never started 2021-10-14 10:04:47 -04:00
Aldo Culquicondor
95c2a8024c Parallelize pod updates in job test
To potentially reduce the number of job controller syncs.

Also reduce the maximum number of pods to sync in tests.
2021-10-01 09:55:53 -04:00
Aldo Culquicondor
47a957d163 Revert "Revert "Limit number of Pods counted in a single Job sync""
This reverts commit 8bcb780808.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
8bcb780808 Revert "Limit number of Pods counted in a single Job sync"
This reverts commit 7d9cb88fed.
2021-09-21 15:16:50 -04:00
Aldo Culquicondor
7d9cb88fed Limit number of Pods counted in a single Job sync
This prevents big Jobs from starving smaller ones.
2021-09-10 10:32:04 -04:00
Aldo Culquicondor
82728b5f71 Add integration tests for updating Job parallelism 2021-07-14 14:26:15 -04:00
Aldo Culquicondor
2dd2622188 Track Job Pods completion in status
Through Job.status.uncountedPodUIDs and a Pod finalizer

An annotation marks if a job should be tracked with new behavior

A separate work queue is used to remove finalizers from orphan pods.

Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
2021-07-08 17:48:05 +00:00
Mengjiao Liu
6871b2b3c7 Rename masterConfig to controlPlaneConfig 2021-06-04 20:55:08 +08:00
Mengjiao Liu
77b5ad2fb0 Part of master to controlplane in test/integration(1.22) 2021-06-03 18:29:05 +08:00
Mengjiao Liu
387154f1a9 Part3: master to controlplane in test/integration
Rename RunAMaster to RunAControlPlane
2021-06-03 11:06:19 +08:00