Sathyanarayanan Saravanamuthu
c84c8add70
Decouple batch/job back-off logic from workqueues (#114768)
...
* batch/job: decouple backoff from workqueue
Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>
* Resolving review comments
* Resolving more review comments
* Resolving review comments
Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>
* Computing finish time as now when FinishedAt is unix epoch
* Addressing review comments
Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>
---------
Signed-off-by: Sathyanarayanan Saravanamuthu <sathyanarays@vmware.com>
2023-03-16 10:15:21 -07:00
Kubernetes Prow Robot
cb00077cd3
Merge pull request #113471 from ncdc/gc-contextual-logging
...
garbagecollector: use contextual logging
2023-03-10 04:34:39 -08:00
Andy Goldstein
26e3dab78b
garbagecollector: use contextual logging
...
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
2023-03-08 08:37:56 -05:00
ahg-g
2ecd24011a
Graduate JobMutableNodeSchedulingDirectives feature to GA
2023-02-28 15:47:13 +00:00
Yuan Chen
a24aef6510
Replace a function closure
...
Replace more closures with pointer conversion
Replace deprecated Int32Ptr to Int32
2023-02-27 09:13:36 -08:00
Daniel Vega-Myhre
c63f448451
change test names and address other comments
2023-02-23 03:25:17 +00:00
Daniel Vega-Myhre
b0b0959b92
address comments
2023-02-23 03:25:16 +00:00
Daniel Vega-Myhre
d41302312e
update validation logic so completions is mutable iff completions is modified in tandem with parallelism so that completions == parallelism
2023-02-23 03:25:16 +00:00
kannon92
6dfaeff33c
Remove Legacy Job Tracking
2023-01-10 14:52:54 +00:00
Aldo Culquicondor
61fe6114b3
Reduce load of Job integration test
...
Change-Id: If99856aa6640375a8a9feff13fa213d4f974a99a
2022-12-02 12:58:28 -05:00
Michal Wozniak
c803892bd8
Enable the feature into beta
2022-11-09 09:02:40 +01:00
Aldo Culquicondor
4948918155
Graduate JobTrackingWithFinalizers to stable
...
Change-Id: Ifc749a85b1270c0155ac511b91d4681d53236820
2022-11-04 17:05:53 -04:00
Aldo Culquicondor
5e03865f65
Add benchmark for large indexed job
...
Change-Id: I556f0cce5842699c98654cfb5a66e7c8d63b2e2e
2022-11-02 11:56:26 -04:00
Michał Woźniak
3628532311
Extend metrics with the new labels (#113324)
...
* Extend job metrics
* Refactor TestMetrics to extract its checks into dedicated tests per feature
2022-10-31 08:50:45 -07:00
Aldo Culquicondor
12d308f5c4
Add metric for terminated pods with tracking finalizer
...
Change-Id: I26f3169588c30ed82250cb7baff8e277f8d13bb7
2022-10-20 11:35:20 -04:00
Aldo Culquicondor
b8bd168180
Simplify tests for job metrics by resetting them
...
Change-Id: I20a0acbbb179bf895953b9d7af72625a2191b8eb
2022-10-19 13:52:00 -04:00
Kubernetes Prow Robot
bf14677914
Merge pull request #112546 from oscr/the-the
...
grammar: replace all occurrences of "the the" with "the"
2022-10-19 10:03:02 -07:00
Oscar Utbult
e4f776f230
grammar: replace all occurrences of "the the" with "the"
2022-10-14 09:03:14 +02:00
Michal Wozniak
b64e5b2d15
Fix the occasional double-counting job_finished_total metric
...
The reason for the issue is that the metrics were bumped before the
final job status update. In case the update failed the path was
repeated by the next syncJob leading to double-counting of the metrics.
The solution is to delay recording metrics and broadcasting events
until after the job status update succeeds.
2022-10-13 17:23:03 +02:00
Michal Wozniak
bf9ce70de3
Support handling of pod failures with respect to the specified rules
2022-08-04 18:39:08 +02:00
Aldo Culquicondor
ca8cebe5ba
Fix JobTrackingWithFinalizers when a pod succeeds after the job fails
...
Change-Id: I3be351fb3b53216948a37b1d58224f8fbbf22b47
2022-08-02 19:33:06 -04:00
Wojciech Tyczyński
5b042f0bf4
Remove RunAnAPIServer from integration tests
2022-07-25 17:52:31 +02:00
Wojciech Tyczyński
aee829abf4
Clean shutdown of job integration tests
2022-05-28 21:14:09 +02:00
Lukasz Szaszkiewicz
59a5c1a6ea
hardens integration job tests
...
the job controller used by the tests must wait for the caches to sync.
since the tests don't check /readyz, there is no way
the tests can tell it is safe to call the server and that requests won't be rejected
2022-05-24 13:47:38 +02:00
Wojciech Tyczyński
deef9e40de
Simplify Create/Delete-TestingNamespace functions
2022-05-15 23:06:26 +02:00
Kubernetes Prow Robot
63a618a815
Merge pull request #109486 from alculquicondor/job-backofflimit
...
Fix job tracking leaving pods with finalizers
2022-05-04 01:28:14 -07:00
Aldo Culquicondor
12568860cb
Test Foreground deletion in job integration
...
Change-Id: Ia6e86da5e66422fdb653c1ee60864a1c79233ea6
2022-04-20 16:39:10 -04:00
Aldo Culquicondor
f2c8030845
Integration test for backoff limit and finalizers
...
Change-Id: Ic231ce9a5504d3aae4191901d7eb5fe69bf017ac
2022-04-20 16:39:09 -04:00
Aldo Culquicondor
3b18613be8
Disable JobTrackingWithFinalizers due to unresolved bug
...
Change-Id: Ieeeab689ae51dfe0dc06bdca88519d0ecf66d636
2022-04-14 15:08:14 -04:00
Aldo Culquicondor
8c00f510ef
Graduate JobReadyPods to beta
...
Set podUpdateBatchPeriod to 1s
Change-Id: I8a10fd8f8559adad9df179b664b8c82851607855
2022-03-29 10:07:41 -04:00
Aldo Culquicondor
cd9fd12960
Reduce number of pods in Job+GC tests
...
Reduces the load of the integration tests. This change cuts the runtime of each test in half.
Change-Id: I71bcaadf3809643c63bb0f6b73c28778d37d8967
2022-03-25 13:01:50 -04:00
Aldo Culquicondor
1d9e3766d2
Add test for Background delete propagation
...
Change-Id: I033e6fb04933c64cfe6490d1019333745d58c423
2022-03-24 11:57:51 -04:00
Aldo Culquicondor
f72173e4b4
Add integration test for orphan pods when there is GC
...
Change-Id: I04cd70725fd1830be8daf2dca53f67bc10a379b7
2022-03-24 11:57:49 -04:00
Aldo Culquicondor
2c5d0a273c
Graduate IndexedJob to stable
...
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
2022-03-15 13:41:06 -04:00
Abdullah Gharaibeh
b2d2ec9e76
Graduate SuspendJob to GA
2022-02-15 10:46:13 -05:00
Mike Dame
80c01707e0
Wire contexts to Batch controllers (#105491)
...
* Wire contexts to Batch controllers
* (hold) feedback + updates that overlap with Apps controllers
* fixup errors
2021-11-10 14:56:46 -08:00
Kubernetes Prow Robot
6edcb60d9f
Merge pull request #104915 from alculquicondor/job-ready
...
Track ready pods in Job status
2021-10-28 09:20:26 -07:00
Abdullah Gharaibeh
74e1b07a5e
Fixes TestNodeSelectorUpdate flaky test
2021-10-25 10:33:50 -04:00
Aldo Culquicondor
68f2c892e5
Add integration tests for tracking ready Pods
...
Change-Id: I1f20657f4f9cd4daad73149f969bad52a33698fa
2021-10-19 15:18:37 -04:00
Aldo Culquicondor
2c1b3fdb5b
Graduate JobTrackingWithFinalizers to beta
...
Enable feature by default.
Update integration tests for other features to assume that finalizers are present.
Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
2021-10-15 10:29:40 -04:00
Abdullah Gharaibeh
335817cbce
Allow updating node affinity, selector and tolerations for suspended jobs that never started
2021-10-14 10:04:47 -04:00
Aldo Culquicondor
95c2a8024c
Parallelize pod updates in job test
...
To potentially reduce the number of job controller syncs.
Also reduce the maximum number of pods to sync in tests.
2021-10-01 09:55:53 -04:00
Aldo Culquicondor
47a957d163
Revert "Revert "Limit number of Pods counted in a single Job sync""
...
This reverts commit 8bcb780808.
2021-09-23 12:56:29 -04:00
Aldo Culquicondor
8bcb780808
Revert "Limit number of Pods counted in a single Job sync"
...
This reverts commit 7d9cb88fed.
2021-09-21 15:16:50 -04:00
Aldo Culquicondor
7d9cb88fed
Limit number of Pods counted in a single Job sync
...
This prevents big Jobs from starving smaller ones.
2021-09-10 10:32:04 -04:00
Aldo Culquicondor
82728b5f71
Add integration tests for updating Job parallelism
2021-07-14 14:26:15 -04:00
Aldo Culquicondor
2dd2622188
Track Job Pods completion in status
...
Through Job.status.uncountedPodUIDs and a Pod finalizer
An annotation marks if a job should be tracked with new behavior
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
2021-07-08 17:48:05 +00:00
Mengjiao Liu
6871b2b3c7
Rename masterConfig to controlPlaneConfig
2021-06-04 20:55:08 +08:00
Mengjiao Liu
77b5ad2fb0
Part of master to controlplane in test/integration(1.22)
2021-06-03 18:29:05 +08:00
Mengjiao Liu
387154f1a9
Part3: master to controlplane in test/integration
...
Rename RunAMaster to RunAControlPlane
2021-06-03 11:06:19 +08:00