Commit Graph

114247 Commits

Clayton Coleman
6b9a381185
kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static) then the housekeeping loop will not
deliver updates to the pod worker which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase but no further attempts to terminate or cleanup will occur
until the kubelet is restarted.

The pod worker now maintains a store of each pod's state that it is
attempting to reconcile and uses that to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.

The pod worker's store tracks desired updates and the last applied
update in podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.
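
A minimal sketch of that bookkeeping, assuming simplified placeholder
types (the real podSyncStatuses entries carry far more state):

```go
package podworkers

import "sync"

// Illustrative placeholders; the real kubelet types are richer.
type UID string
type UpdatePodOptions struct{} // pod, update type, termination options...

type podSyncStatus struct {
	pendingUpdate *UpdatePodOptions // desired update, not yet picked up
	activeUpdate  *UpdatePodOptions // last update the goroutine began syncing
}

type podWorkers struct {
	podLock         sync.Mutex
	podSyncStatuses map[UID]*podSyncStatus
}

// startWork moves the pending update to the stored last update under the
// lock, so third parties reading worker state never observe an update
// before the pod worker begins synchronizing it.
func (p *podWorkers) startWork(uid UID) (*UpdatePodOptions, bool) {
	p.podLock.Lock()
	defer p.podLock.Unlock()
	s, ok := p.podSyncStatuses[uid]
	if !ok || s.pendingUpdate == nil {
		return nil, false
	}
	s.activeUpdate, s.pendingUpdate = s.pendingUpdate, nil
	return s.activeUpdate, true
}
```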

As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.
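
The notifier pattern itself is tiny; a minimal sketch, assuming the
update payload lives in shared state guarded by the worker lock:

```go
package podworkers

// notify nudges a per-pod goroutine without carrying data: the channel is
// a pure signal, so SyncKnownPods can inject a synthetic pending update
// into shared state and wake the worker the same way any source does.
func notify(podUpdates chan struct{}) {
	select {
	case podUpdates <- struct{}{}: // wake the goroutine
	default: // a wakeup is already queued; state holds the latest update
	}
}
```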

In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) and config-
sourced pods. Update the pod worker to avoid using ToAPIPod()
and instead use update.Options.Pod or update.Options.RunningPod
directly in the appropriate methods. Add a new
SyncTerminatingRuntimePod to prevent accidental invocations with
runtime-only pod data.

Finally, fix SyncKnownPods to replay the last valid update for
undesired pods, which drives the pod state machine towards
termination, and alter HandlePodCleanups to:

- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker

Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (excluding those that are
already terminal or were previously rejected by admission). We also
force a refresh of the runtime cache to ensure we don't act on an
older version of the state.
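
A hedged sketch of that cleanup pass; every type and helper below is an
illustrative stand-in, not the kubelet's actual API:

```go
package kubeletsketch

// Hypothetical minimal types; the kubelet's real interfaces are richer.
type UID string
type RunningPod struct{ UID UID } // runtime-sourced, no spec info
type Pod struct{ UID UID }        // config-sourced

type podWorkers interface {
	IsKnown(uid UID) bool
	TerminateRuntimePod(p RunningPod) // drive a runtime-only pod to termination
	StartPod(p Pod)                   // admit and start a desired pod
}

type runtimeCache interface {
	ForceUpdateAndList() []RunningPod // refreshed so we don't act on stale state
}

// handlePodCleanups sketches the two rules above.
func handlePodCleanups(rc runtimeCache, pw podWorkers, activePods []Pod) {
	for _, running := range rc.ForceUpdateAndList() {
		if !pw.IsKnown(running.UID) {
			pw.TerminateRuntimePod(running) // orphaned runtime pod
		}
	}
	// activePods must already exclude terminal and admission-rejected pods.
	for _, desired := range activePods {
		if !pw.IsKnown(desired.UID) {
			pw.StartPod(desired) // usually a pod recreated with the same UID
		}
	}
}
```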

Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.

Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly describe the number of pods
in use. To improve reporting, add the following metrics:

  kubelet_desired_pods: Pods the pod manager sees
  kubelet_active_pods: "Admitted" pods that are considered when
    admitting new pods
  kubelet_mirror_pods: Mirror pods the kubelet is tracking
  kubelet_working_pods: Breakdown of pods from the last sync in
    each phase, orphaned state, and static or not
  kubelet_restarted_pods_total: A counter for pods that saw a
    CREATE before the previous pod with the same UID was finished
  kubelet_orphaned_runtime_pods_total: A counter for pods detected
    at runtime that were not known to the kubelet. Will be
    populated at Kubelet startup and should never be incremented
    after.
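
For illustration only, a sketch of how gauges and counters like these
could be declared with raw client_golang; the kubelet actually registers
metrics through its component-base framework, and the label sets here
are assumptions:

```go
package metricsketch

import "github.com/prometheus/client_golang/prometheus"

// Sketch only: the kubelet wires these through k8s.io/component-base/metrics,
// not raw client_golang, and the label sets here are assumptions.
var (
	desiredPods = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "kubelet_desired_pods",
		Help: "Pods the pod manager sees.",
	})
	workingPods = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kubelet_working_pods",
		Help: "Pods from the last sync, by phase, orphan state, and staticness.",
	}, []string{"lifecycle", "orphan", "static"})
	restartedPodsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "kubelet_restarted_pods_total",
		Help: "Pods that saw a CREATE before the prior pod with the same UID finished.",
	}, []string{"static"})
)

func init() {
	prometheus.MustRegister(desiredPods, workingPods, restartedPodsTotal)
}
```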

Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.

Adds 23 series to the kubelet /metrics endpoint.
2023-03-08 22:03:51 -06:00
David Porter
c5a1f0188b
test: Add node e2e test to verify static pod termination
Add node e2e test to verify that static pods can be started after a
previous static pod with the same config temporarily failed termination.

The scenario is:

1. Static pod is started
2. Static pod is deleted
3. Static pod termination fails (internally `syncTerminatedPod` fails)
4. At later time, pod termination should succeed
5. New static pod with the same config is (re)-added
6. New static pod is expected to start successfully

To reproduce this scenario, set up a pod using an NFS mount. The NFS
server is stopped, which causes the volume to fail to unmount and
`syncTerminatedPod` to fail. The NFS server is later started, allowing
the volume to unmount successfully.

xref:

1. https://github.com/kubernetes/kubernetes/pull/113145#issuecomment-1289587988
2. https://github.com/kubernetes/kubernetes/pull/113065
3. https://github.com/kubernetes/kubernetes/pull/113093

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
David Porter
1c75c2cda8
test: Add e2e to verify static pod termination
Add a node e2e to verify that if a static pod is terminated while the
container runtime or CRI returns an error, the pod is eventually
terminated successfully.

This test serves as a regression test for k8s.io/issue/113145 which
fixes an issue where force deleted pods may not be terminated if the
container runtime fails during a `syncTerminatingPod`.

To test this behavior, start a static pod, stop the container runtime,
and later start the container runtime. The static pod is expected to
eventually terminate successfully.

To start and stop the container runtime, we need to find the container
runtime's systemd unit name. Introduce a util function
`findContainerRuntimeServiceName` which finds the unit name by getting
the pid of the container runtime from the existing
`ContainerRuntimeProcessName` flag passed into node e2e, then using the
systemd dbus `GetUnitNameByPID` function to convert that pid to a unit
name. Using the unit name, introduce helper functions to start and stop
the container runtime.
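
A sketch of that lookup against go-systemd v22.4.0;
`findContainerRuntimeServiceName` is the helper named above, while the
surrounding wiring here is illustrative:

```go
package e2esketch

import (
	"context"
	"fmt"

	"github.com/coreos/go-systemd/v22/dbus"
)

// findContainerRuntimeServiceName resolves the container runtime's systemd
// unit name from its pid, so tests can stop and restart the runtime.
func findContainerRuntimeServiceName(ctx context.Context, runtimePid int) (string, error) {
	conn, err := dbus.NewWithContext(ctx) // connect to the systemd bus
	if err != nil {
		return "", fmt.Errorf("connecting to systemd dbus: %w", err)
	}
	defer conn.Close()
	// GetUnitNameByPID is new in go-systemd v22.4.0.
	return conn.GetUnitNameByPID(ctx, uint32(runtimePid))
}
```

With the unit name in hand, `conn.StopUnitContext` and
`conn.StartUnitContext` can stop and restart the runtime.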

Signed-off-by: David Porter <david@porter.me>
2023-03-03 10:00:48 -06:00
Kubernetes Prow Robot
d446bebca8
Merge pull request #116171 from daman1807/conntrack-sync
Syncing IPVS conntrack cleaning with IPTables.
2023-03-03 06:18:57 -08:00
Kubernetes Prow Robot
6fd488a4e6
Merge pull request #115861 from JayKayy/inform-unsupported-pdb
Add a warning event when pdb has found an unmanaged pod
2023-03-03 03:16:58 -08:00
Kubernetes Prow Robot
165829587a
Merge pull request #116202 from ritazh/kmsv2-testcoverage
kmsv2: improve test coverage
2023-03-03 01:26:57 -08:00
Kubernetes Prow Robot
a6c775333c
Merge pull request #116237 from seans3/openapi3-add-error
Add custom error struct for Group/Version not found
2023-03-02 22:56:57 -08:00
Kubernetes Release Robot
3c7e7be3b6 CHANGELOG: Update directory for v1.27.0-alpha.3 release 2023-03-03 05:20:56 +00:00
Kubernetes Prow Robot
152d973d8b
Merge pull request #116242 from bobbypage/bump-gosystemd
deps: Update github.com/coreos/go-systemd/v22 to v22.4.0
2023-03-02 20:38:57 -08:00
David Porter
28e9775fd5 deps: Update github.com/coreos/go-systemd/v22 to v22.4.0
Update github.com/coreos/go-systemd/v22 to v22.4.0 which introduces
`GetUnitNameByPID`. This function will be used in node e2e to get the
container runtime systemd unit name.

Performed by:

$ hack/pin-dependency.sh github.com/coreos/go-systemd/v22  v22.4.0
$ hack/update-vendor.sh

Signed-off-by: David Porter <david@porter.me>
2023-03-02 19:33:55 -08:00
Kubernetes Prow Robot
3835c7aecd
Merge pull request #115882 from binacs/binacs/controller-use-issuperset
cleanup(controller): use IsSuperset to avoid interim slice
2023-03-02 17:00:57 -08:00
Rita Zhang
51db940dcc
kmsv2: improve test coverage
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2023-03-02 16:36:07 -08:00
Sean Sullivan
dc83af0b44 Add custom error struct for Group/Version not found 2023-03-03 00:01:22 +00:00
Kubernetes Prow Robot
a80b423351
Merge pull request #116222 from ameukam/remove-retention-policy
remove retention policy for staging buckets
2023-03-02 14:52:04 -08:00
Kubernetes Prow Robot
e0ca10118e
Merge pull request #115865 from seans3/discovery-client-cleanup
Updates old 403 and 404 discovery response tolerations
2023-03-02 14:51:57 -08:00
Kubernetes Prow Robot
2898a044d9
Merge pull request #116215 from kannon92/comment-manual-selector
update documentation on generateSelector for manual selector case
2023-03-02 12:48:56 -08:00
kannon92
cd71486cfb update documentation on generateSelector for manual selector case 2023-03-02 19:47:58 +00:00
Kubernetes Prow Robot
ff52646d93
Merge pull request #116221 from enj/enj/i/kms_lru_size
kmsv2: retain more key ID metrics
2023-03-02 11:37:07 -08:00
Kubernetes Prow Robot
74f0819069
Merge pull request #116152 from torredil/fix-windows-e2e-test
Add windows nodeSelector to provisioning functions
2023-03-02 11:36:56 -08:00
Kubernetes Prow Robot
ab002db788
Merge pull request #116223 from logicalhan/metric-docs
include beta metrics in documentation and update docs for metrics
2023-03-02 10:31:04 -08:00
Kubernetes Prow Robot
57fd02ca29
Merge pull request #116218 from pohly/test-lease-controller-leak
update lease controller
2023-03-02 10:30:56 -08:00
Arnaud Meukam
471985557a
remove retention policy for staging buckets
Follow-up of:
  - https://github.com/kubernetes/kubernetes/pull/115634

The current retention policy prevents creation or update of new objects
until the existing ones are deleted, based on the retention period.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
2023-03-02 19:15:29 +01:00
Kubernetes Prow Robot
efe20f6c9b
Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537
node: cpumgr: stricter pre-check for the policy option full-pcpus-only
2023-03-02 09:04:56 -08:00
Monis Khan
539f734bfd
kmsv2: retain more key ID metrics
This change helps users understand the state of their encryption
config if storage migration is not consistently run with key ID
rotation.

Signed-off-by: Monis Khan <mok@microsoft.com>
2023-03-02 12:02:34 -05:00
Daman
42a91c29e5 proxier: track metrics before conntrack cleaning 2023-03-02 20:56:05 +05:30
Daman
b23cb97704 proxier: syncing ipvs conntrack cleaning with iptables. 2023-03-02 20:54:34 +05:30
Francesco Romani
0e9b92090c node: cpumgr: stricter precheck for full-pcpus-only
In order to implement the `full-pcpus-only` cpumanager policy option,
we leverage the implementation of the algorithm which picks CPUs.
By design, CPUs are taken from the biggest chunk available (socket
or NUMA zone) to physical cores, down to single cores.

Leveraging this, if the requested CPU count is a multiple of the SMT
level (commonly 2), we're guaranteed that only full physical cores
will be taken.

The hidden assumption here is that this holds true by construction iff
the user reserved CPUs (if any) as full physical cores.
IOW, if the user intentionally or mistakenly reserved single threads
without their core siblings[1], then the simple check we implemented
is not sufficient.

An easy example can outline this better. With this setup:

cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings).
SMT level: 2 (each tuple is 2 elements)
Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`)

A container then requests 6 CPUs. The full-pcpus-only check passes:
6 % 2 == 0. The CPU allocator will first take the full cores (2,6) and
(3,8), and will then pick the remaining single CPUs. The allocation
will succeed, but it's incorrect.

We can fix this case with a stricter precheck.
We need to additionally consider all the core siblings of the reserved
CPUs as unavailable when computing the free CPUs, before starting the
actual allocation. Doing so, we fall back to the intended behavior, and
by construction all possible CPU allocations whose size is a multiple
of the SMT level are correct again.
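
A minimal sketch of the stricter precheck, assuming a simplified CPU
topology representation (the real cpumanager works on cpuset types):

```go
package cpusketch

// allocatableForFullPCPUs treats every core sibling of a reserved CPU as
// unavailable, so the allocator can only ever hand out full physical cores.
// siblings maps a CPU id to all thread siblings of its core, itself included.
func allocatableForFullPCPUs(all, reserved map[int]bool, siblings map[int][]int) map[int]bool {
	free := map[int]bool{}
	for cpu := range all {
		free[cpu] = true
	}
	for cpu := range reserved {
		for _, sib := range siblings[cpu] {
			delete(free, sib)
		}
	}
	return free
}

// fullPCPUsOK is the stricter precheck: the request must be a multiple of
// the SMT level and must fit within the sibling-pruned free set.
func fullPCPUsOK(request, smtLevel int, free map[int]bool) bool {
	return request%smtLevel == 0 && request <= len(free)
}
```

With the setup above (reserved CPUs 0,1), pruning removes 0,4,1,5 and
leaves 4 free CPUs, so the 6-CPU request is rejected instead of
succeeding incorrectly.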

+++

[1] or thread siblings in Linux parlance; in any case:
hyperthread siblings of the same physical core

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-03-02 16:00:58 +01:00
Patrick Ohly
dad95e1be6 update lease controller
Passing in a context instead of a stop channel has several advantages:
- ensures that client-go calls return as soon as the controller is asked to stop
- contextual logging can be used

By passing that context down to its own functions and checking it while
waiting, the lease controller also doesn't get stuck in backoffEnsureLease
anymore (https://github.com/kubernetes/kubernetes/issues/116196).
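
A minimal sketch of the pattern, with illustrative backoff numbers:

```go
package leasesketch

import (
	"context"
	"time"
)

// ensureWithBackoff retries ensure() with capped exponential backoff, but
// waits on ctx.Done() as well as the timer, so the loop exits promptly
// when the controller is asked to stop instead of sleeping through it.
func ensureWithBackoff(ctx context.Context, ensure func(context.Context) error) error {
	delay := 200 * time.Millisecond
	for {
		if err := ensure(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // stop promptly on cancellation
		case <-time.After(delay):
		}
		if delay *= 2; delay > 7*time.Second {
			delay = 7 * time.Second
		}
	}
}
```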
2023-03-02 15:06:00 +01:00
Kubernetes Prow Robot
b6d102d634
Merge pull request #116071 from yuanchen8911/symlink
Add symlink data verification to statefulset e2e
2023-03-02 05:43:07 -08:00
Kubernetes Prow Robot
78e5db0931
Merge pull request #115107 from swatisehgal/handle-device-mgr-recovery-sample-dp-changes
node: device-mgr: sample device plugin: Add support to control registration process
2023-03-02 05:42:55 -08:00
Kubernetes Prow Robot
096e67d30e
Merge pull request #116179 from justinsb/visiteduids_deprecation
cleanup: replace deprecated sets.String
2023-03-02 04:04:56 -08:00
Kubernetes Prow Robot
0ad676fca8
Merge pull request #110960 from p0lyn0mial/upstream-cacher-sends-stream
cacher consistent streaming support
2023-03-02 03:00:56 -08:00
Lukasz Szaszkiewicz
52ce41a293 cacher_watcher: Add support for consistent streaming
design details https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/3157-watch-list#design-details
2023-03-02 10:59:48 +01:00
Lukasz Szaszkiewicz
7c7e773305 cacher: Add support for consistent streaming
design details https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/3157-watch-list#design-details
2023-03-02 10:59:47 +01:00
Kubernetes Prow Robot
762fa12686
Merge pull request #115402 from p0lyn0mial/upstream-sendinitialevents-take-2
Add API for watch list
2023-03-02 01:58:55 -08:00
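
For context, a hedged sketch of how a client might consume the
watch-list API from KEP-3157; the ListOptions fields come from the API
change above, while the client wiring here is illustrative:

```go
package watchsketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/pointer"
)

// watchWithInitialEvents asks the server to stream the initial state as
// synthetic ADDED events, followed by a bookmark that marks the end of
// the initial list, then continue with live updates on the same watch.
func watchWithInitialEvents(ctx context.Context, cs kubernetes.Interface) error {
	w, err := cs.CoreV1().Pods("default").Watch(ctx, metav1.ListOptions{
		SendInitialEvents:    pointer.Bool(true),
		ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
		AllowWatchBookmarks:  true,
	})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		_ = ev // initial ADDED events, a bookmark, then live updates
	}
	return nil
}
```
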
Kubernetes Prow Robot
af9f7a4d90
Merge pull request #115220 from ruiwen-zhao/limit
Add MaxParallelImagePulls support
2023-03-01 23:32:55 -08:00
Kubernetes Prow Robot
949bee0118
Merge pull request #116189 from marosset/windows-hyperv-basic-e2e-test
Adding e2e test to verify hyperv container is running inside a VM on Windows
2023-03-01 22:27:07 -08:00
aimuz
571adf6e84
Improved FormatMap: Improves performance by about 4x, or nearly 2x in the worst case (#112661)
* Improved FormatMap

Improves performance by about 4x, or nearly 2x in the worst case

old FormatMap
BenchmarkFormatMap-12             873046                1238 ns/op             384 B/op         13 allocs/op
new FormatMap
BenchmarkFormatMap-12            3665762               327.0 ns/op             152 B/op          3 allocs/op

Signed-off-by: aimuz <mr.imuz@gmail.com>

* fixed

Signed-off-by: aimuz <mr.imuz@gmail.com>

* fixed

Signed-off-by: aimuz <mr.imuz@gmail.com>

* test: fix test

Signed-off-by: aimuz <mr.imuz@gmail.com>

---------

Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-03-01 22:26:55 -08:00
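
The FormatMap speedup above is the classic move from per-entry
fmt.Sprintf concatenation to a sorted, size-hinted strings.Builder; a
hedged sketch of the pattern (not the actual patch):

```go
package formatsketch

import (
	"sort"
	"strconv"
	"strings"
)

// formatMap sorts the keys once and writes into a preallocated Builder,
// cutting allocations compared to building the string with repeated
// fmt.Sprintf concatenation.
func formatMap(m map[string]string) string {
	keys := make([]string, 0, len(m))
	size := 0
	for k, v := range m {
		keys = append(keys, k)
		size += len(k) + len(v) + 4 // k, '=', quotes, '\n' (rough hint)
	}
	sort.Strings(keys) // deterministic output
	var b strings.Builder
	b.Grow(size)
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(strconv.Quote(m[k]))
		b.WriteByte('\n')
	}
	return b.String()
}
```
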
Kubernetes Prow Robot
d788d436c9
Merge pull request #115893 from mgoltzsche/go-jose-update-2.6
bump go-jose to v2.6.0
2023-03-01 20:23:06 -08:00
Kubernetes Prow Robot
2b50e09f78
Merge pull request #115816 from ivelichkovich/celrefactor
refactor validatingadmissionpolicy cel validator and compiler to be reusable
2023-03-01 20:22:54 -08:00
ruiwen-zhao
572e6e0ffb Add MaxParallelImagePulls support
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
2023-03-02 03:57:59 +00:00
Kubernetes Prow Robot
b4b2345f9a
Merge pull request #116106 from alexzielenski/revert-116062-revert-field-manager
Revert "Revert "Merge pull request #115324 from alexzielenski/apiserver/smd/use-openapiv3"
2023-03-01 19:09:07 -08:00
Kubernetes Prow Robot
59a7e34052
Merge pull request #115442 from bobbypage/unknown_pods_test
test: Add e2e node test to check for unknown pods
2023-03-01 19:08:55 -08:00
Max Goltzsche
fa5e6587f1
handle new error where SA JWT is issued in the future
Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
2023-03-02 03:15:13 +01:00
Max Goltzsche
031075d149
check jwt timestamp for zero value
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
2023-03-02 03:09:49 +01:00
Kubernetes Prow Robot
ddb0d06744
Merge pull request #116052 from kannon92/validation-coverage-and-cleanup
remove ValidateJobTemplate and add more test cases to batch validation
2023-03-01 18:05:07 -08:00
Kubernetes Prow Robot
53f3583c7f
Merge pull request #114785 from TommyStarK/kubelet/replace-deprecated-pointer-function
kubelet: Replace deprecated pointer function
2023-03-01 18:04:55 -08:00
Max Goltzsche
df8fa2eab5
bump go-jose to v2.6.0
Update go-jose from v2.2.2 to v2.6.0.
This is to make the Kubernetes code compatible with newer go-jose
versions that have a small breaking change (`jwt.NewNumericDate()`
returns a pointer).
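
The breaking change is small; a hedged sketch of code that compiles
against v2.6.0 (claim values illustrative):

```go
package josesketch

import (
	"time"

	"gopkg.in/square/go-jose.v2/jwt"
)

// claimsFor compiles against go-jose v2.6.0, where jwt.NewNumericDate
// returns *jwt.NumericDate (previously a value), matching the pointer
// fields on jwt.Claims.
func claimsFor(now time.Time) jwt.Claims {
	return jwt.Claims{
		IssuedAt: jwt.NewNumericDate(now),
		Expiry:   jwt.NewNumericDate(now.Add(time.Hour)),
	}
}
```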

Signed-off-by: Max Goltzsche <max.goltzsche@gmail.com>
2023-03-02 02:53:17 +01:00
John Kwiatkoski
1f42ebc013 Add a warning event when pdb has found an unmanaged pod 2023-03-01 20:14:10 -05:00
Kubernetes Prow Robot
bb8e9f3afb
Merge pull request #116195 from seans3/openapi3-root-fix
Fixes bug with Root not handling Group without Version
2023-03-01 16:55:30 -08:00