Commit Graph

314 Commits

Author SHA1 Message Date
vinay kulkarni
01b96e7704 Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources 2023-03-10 14:49:26 +00:00
Todd Neal
4096c9209c dedupe pod resource request calculation 2023-03-09 17:15:53 -06:00
Chen Wang
7db339dba2 This commit contains the following:
1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.

Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
2023-02-24 18:21:21 +00:00
Vinay Kulkarni
f2bd94a0de In-place Pod Vertical Scaling - core implementation
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources

Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
2023-02-24 18:21:21 +00:00
SataQiu
4c60ee00aa remove GA featuregates: CSIInlineVolume, CSIMigration, DaemonSetUpdateSurge, EphemeralContainers, IdentifyPodOS, LocalStorageCapacityIsolation, NetworkPolicyEndPort, StatefulSetMinReadySeconds 2022-12-11 19:27:41 +08:00
Michal Wozniak
c803892bd8 Enable the feature into beta 2022-11-09 09:02:40 +01:00
ZhangKe10140699
62177fd36d make eviction message more clear 2022-11-08 10:07:02 +08:00
Michal Wozniak
52cd6755eb Add pod disruption conditions for kubelet initiated failures 2022-11-07 11:23:22 +01:00
David Ashpole
64af1adace
Second attempt: Plumb context to Kubelet CRI calls (#113591)
* plumb context from CRI calls through kubelet

* clean up extra timeouts

* try fixing incorrectly cancelled context
2022-11-05 06:02:13 -07:00
Kubernetes Prow Robot
98742f9d77
Merge pull request #110747 from harshanarayana/cleanup/GIT-110737/logging-improvements
structured-logging: replace KObjs with KObjSlice for logging
2022-11-03 00:49:34 -07:00
Antonio Ojea
9c2b333925 Revert "plumb context from CRI calls through kubelet"
This reverts commit f43b4f1b95.
2022-11-02 13:37:23 +00:00
David Ashpole
f43b4f1b95
plumb context from CRI calls through kubelet 2022-10-28 02:55:28 +00:00
Claudiu Belu
6f2eeed2e8 unittests: Fixes unit tests for Windows
Currently, there are some unit tests that are failing on Windows due to
various reasons:

- config options not supported on Windows.
- files not closed, which means that they cannot be removed / renamed.
- paths not properly joined (filepath.Join should be used).
- time.Now() is not as precise on Windows, which means that 2
  consecutive calls may return the same timestamp.
- different error messages on Windows.
- files have \r\n line endings on Windows.
- /tmp directory being used, which might not exist on Windows. Instead,
  the OS-specific Temp directory should be used.
- the default value for Kubelet's EvictionHard field was containing
  OS-specific fields. This is now moved, the field is now set during
  Kubelet's initialization, after the config file is read.
2022-10-25 23:46:56 +03:00
jinxu
0064010cdd Promote Local storage capacity isolation feature to GA
This change is to promote local storage capacity isolation feature to GA

At the same time, to allow rootless system disable this feature due to
unable to get root fs, this change introduced a new kubelet config
"localStorageCapacityIsolation". By default it is set to true. For
rootless systems, they can set this configuration to false to disable
the feature. Once it is set, user cannot set ephemeral-storage
request/limit because capacity and allocatable will not be set.

Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
2022-08-02 23:45:48 -07:00
Davanum Srinivas
a9593d634c
Generate and format files
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-07-26 13:14:05 -04:00
Harsha Narayana
c3cbc443ef
structured-logging: replace KObjs with KObjSlice for logging 2022-07-01 09:52:07 +05:30
ZhangKe10140699
08235a5835 fix evictionManager debugLog wrong 2022-06-22 16:08:43 +08:00
Kir Kolyshkin
4513de06a8 Regen mocks using go 1.18
Generated by ./hack/update-mocks.sh using go 1.18

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-23 10:19:38 -07:00
Kubernetes Prow Robot
56062f7f4f
Merge pull request #108010 from endocrimes/dani/eviction-flake
eviction: Deflake TestStart
2022-03-17 12:22:54 -07:00
Kubernetes Prow Robot
5d6a793221
Merge pull request #96828 from panjf2000/opt-epoll-eventfd
kubelet/eviction: eliminate redundant allocations when handling eventfd
2022-03-01 13:59:54 -08:00
Danielle Lancashire
3630328fd9 eviction: Deflake TestStart
TestStart was previously flaky. In approx 100_000 local runs, it failed
about 70% of the time, and has been mentioned as a flaky unit test in
the past.

This flake was due to a race condition with the logic as written and the
go scheduler. UpdateThreshold calls `notifier.Start(events)` in a new Go
Routine, which is not guarunteed to be called immediately.

This meant that if `m.Start()` was called before `notifier.Start()`, the
test would fail, as the notifier would not have been started before the
4 events were processed and lock released.

Here, we update the test to more closely match the intended application
behaviour, and have events passed to the channel when `Start` is called
on the notifier.

This ensures that -Start gets called and additionally validates
that the correct channel is provided to the notifier.

Stop was never called previously, as it only gets called on a subsequent
call to UpdateThreshold. `AnyTimes()` hid that this did not occur.
2022-02-08 17:03:44 +01:00
Patrick Ohly
9eaa2dc554 avoid klog Info calls without verbosity
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:

   if klog.V(5).Enabled() {
       klog.Info("hello world")
   }

Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.

Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
2022-01-12 07:48:36 +01:00
Kubernetes Prow Robot
7eb5046064
Merge pull request #106470 from qmloong/qmloong/fix
fix: some typos and syncPod outdated workflow annotation
2022-01-11 10:48:38 -08:00
menglong.qi
b886b9b108 fix: typo 2021-11-17 09:22:57 +08:00
David Porter
f5140d3145 kubelet: cgroupv2 disable memcg notifications
The current memory notifier on cgroupv2 relies on reading
`cgroup.event_control` which is unsupported on cgroupv2. For now, let's
disable the feature on cgroupv2.
2021-11-10 15:40:59 -08:00
Andy Pan
3033a64135 kubelet/eviction: eliminate redundant allocations when handling eventfd 2021-11-04 15:41:46 +08:00
yuzhiquanlong
be9e1fda5e remove format pods func, instead with klog.Kobjs 2021-10-15 18:26:02 +08:00
Kubernetes Prow Robot
cab54856f1
Merge pull request #104933 from vikramcse/automate_mockery
conversion of tests from mockery to mockgen
2021-09-30 18:33:21 -07:00
vikram Jadhav
0de4397490 mockery to mockgen conversion 2021-09-25 16:15:08 +00:00
wojtekt
53ce79a18a Migrate to k8s.io/utils/clock in pkg/kubelet 2021-09-10 12:20:09 +02:00
Stephen Augustus
481cf6fbe7
generated: Run hack/update-gofmt.sh
Signed-off-by: Stephen Augustus <foo@auggie.dev>
2021-08-24 15:47:49 -04:00
Clayton Coleman
3eadd1a9ea
Keep pod worker running until pod is truly complete
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.

Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).

Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.

Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.

See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
2021-07-06 15:55:22 -04:00
刁浩 10284789
be48f1d272 Add test cases to the addAllocatableThresholds function in pkg/kubelet/eviction/helpers.go
Signed-off-by: 刁浩 10284789 <diao.hao@zte.com.cn>
2021-06-15 11:32:44 +00:00
Kubernetes Prow Robot
8b057cdfa4
Merge pull request #99095 from maxlaverse/fix_kubelet_stuck_in_diskpressure
Prevent Kubelet from getting stuck in DiskPressure when imagefs minReclaim is set
2021-04-23 18:23:14 -07:00
Maxime Lagresle
63cba062eb
more sensible comment 2021-04-12 21:20:20 +02:00
Niekvdplas
fec272a7b2 Fixed several spelling mistakes 2021-03-30 23:02:09 +02:00
Kubernetes Prow Robot
a91fdfbeb3
Merge pull request #97493 from Pingan2017/allocatemem-1224
add operator for allocateMemory.available signal
2021-03-07 20:09:44 -08:00
Shintaro Murakami
fe7a862c2d Unify determination of whether a volume is ephemeral 2021-03-05 14:49:09 +09:00
Kubernetes Prow Robot
14abaa23c3
Merge pull request #99032 from yangjunmyfm192085/run-test18
Structured Logging migration: modify eviction  part logs of kubelet.
2021-03-04 13:38:57 -08:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Maxime Lagresle
848dc9e65f
Check that minReclaim is taken into account in reclaimNodeLevelResources() 2021-02-22 11:58:36 +01:00
JunYang
af0b4c9031 Structured Logging migration: modify eviction part logs of kubelet.
Signed-off-by: JunYang <yang.jun22@zte.com.cn>
2021-02-21 08:36:14 +08:00
Maxime Lagresle
7e97601023
Prevent Kubelet stuck in DiskPressure when imagefs minReclaim is set
Fixes https://github.com/kubernetes/kubernetes/issues/99094
2021-02-15 14:31:06 +01:00
Kubernetes Prow Robot
3ad19200ba
Merge pull request #97321 from shawnhanx/master
Change the upper limit of evictionthreshold from 10000% to 100%
2021-02-09 10:25:13 -08:00
shawnhanx
fa8d07d3e1
Apply suggestions from code review
Co-authored-by: bl-ue <54780737+bl-ue@users.noreply.github.com>
2021-02-07 09:23:07 +08:00
shawnhanx
0bee739a2f Change the upper limit of threshold from 10000% to 100% 2021-02-03 21:55:49 +08:00
Mike Dame
578ff3ec34 Move Taint/Toleration helpers to component-helpers repo
This is part of the goal for scheduling to remove dependencies on internal
packages for the scheduling framework. It also provides these functions in an
external location for other components and projects to import.
2021-02-01 11:06:03 -05:00
wzshiming
29808eaf24 Fix panic 2021-01-21 19:47:28 +08:00
Pingan2017
2f76666ff4 add operator for allocateMemory.available signal 2020-12-24 10:04:09 +08:00
Joel Smith
29ff2fe528 Remove now-unused eviction helpers, fix unit test TestCRIListPodStats 2020-12-03 13:04:25 -07:00