Commit Graph

132154 Commits

Author SHA1 Message Date
Erik Wilson
2faebd06de Add Vagrantfile 2026-02-10 19:18:48 -03:00
Darren Shepherd
d00de11a39 Add tag.sh script
Add DefaultKubeBinaryVersion to the tag script

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2026-02-10 19:18:48 -03:00
Darren Shepherd
31cccb5015 Fix CSI initialization conflict
CSI is used by both the kubelet and kube-controller-manager.  Both
components will initialize the csiPlugin with different VolumeHost
objects.  The csiPlugin will then assign a global variable for
the node info manager.  It is then possible that the kubelet gets
the credentials of the kube-controller-manager and that will cause
CSI to fail.
2026-02-10 19:18:48 -03:00
Darren Shepherd
ae68621b06 Allow override of "kubernetes" endpoint port 2026-02-10 19:18:48 -03:00
Darren Shepherd
3ae336d147 Notify startup to grab a hold of handler and authenticator
Fix to the completed options config

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2026-02-10 19:18:48 -03:00
Darren Shepherd
7b3a3466a8 Add stopCh to apiserver & context to kubelet commands
Remove SetupSignalContext call from the apiserver

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2026-02-10 19:18:48 -03:00
Erik Wilson
1197debe45 Update kubernetes service on start for port changes 2026-02-10 19:18:48 -03:00
Darren Shepherd
1ea5fa6587 Don't ever select the flannel bridge or cni bridge 2026-02-10 19:18:48 -03:00
Darren Shepherd
7e25ab755d Cache loopback cert in the certs dir if set 2026-02-10 19:18:48 -03:00
Darren Shepherd
42c396c3fc Add ability to disable proxy hostname check 2026-02-10 19:18:48 -03:00
Darren Shepherd
39a967ff02 Hide deprecated warnings 2026-02-10 19:18:48 -03:00
Darren Shepherd
5aacef5316 Set all sources so node+agent in the same process doesn't get restricted 2026-02-10 19:18:48 -03:00
Darren Shepherd
01ad2ae0c4 Don't check for cpuset cgroup, not always required? 2026-02-10 19:18:48 -03:00
Darren Shepherd
8e80ea30bc Wait for kube-apiserver for 2 minutes for slow (ARM) systems 2026-02-10 19:18:48 -03:00
Darren Shepherd
87a85076c5 Make kubelet.sock path changeable 2026-02-10 19:18:48 -03:00
Darren Shepherd
e8b0bfcbff only use the resolved name if port was zero 2026-02-10 19:18:48 -03:00
Darren Shepherd
547d883088 If you can't set hashsize on nf_conntrack don't fail 2026-02-10 19:18:48 -03:00
Darren Shepherd
0d57fd771e Drop credential providers 2026-02-10 19:18:48 -03:00
Darren Shepherd
efd701efb9 Drop storage plugins 2026-02-10 19:18:48 -03:00
Darren Shepherd
586124c087 Drop client-go cloud auth 2026-02-10 19:18:47 -03:00
Kubernetes Release Robot
14507e2fb3 Release commit for Kubernetes v1.34.4 2026-02-10 12:53:36 +00:00
Kubernetes Prow Robot
cc6bd01063 Merge pull request #136490 from AutuSnow/automated-cherry-pick-of-#136325-upstream-release-1.34
Automated cherry pick of #136325: fix(expansion):Resolve the issue of UTF-8 characters being truncated
2026-02-06 08:28:30 +05:30
Kubernetes Prow Robot
e71810983a Merge pull request #136480 from rogowski-piotr/automated-cherry-pick-of-#135919-upstream-release-1.34
Automated cherry pick of #135919: kubelet(dra): fix handling of multiple ResourceClaims when one is already prepared
2026-02-06 05:38:30 +05:30
Kubernetes Prow Robot
36b7bc8fb5 Merge pull request #136364 from dlipovetsky/automated-cherry-pick-of-#136014-upstream-release-1.34
Automated cherry pick of #136014: kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join'
2026-02-06 03:28:29 +05:30
Kubernetes Prow Robot
62627f5552 Merge pull request #136594 from RomanBednar/automated-cherry-pick-of-#136202-upstream-release-1.34
Automated cherry pick of #136202: csi: raise kubelet CSI init backoff to cover ~140s DNS delays
2026-02-05 20:00:42 +05:30
Kubernetes Prow Robot
2a5c6bb779 Merge pull request #136142 from shwetha-s-poojary/automated-cherry-pick-of-#135666-upstream-release-1.34
Automated cherry pick of #135666: Fixes the flaky `TestWebhookConverterWithWatchCache` test
2026-02-05 20:00:34 +05:30
Kubernetes Prow Robot
976ffc74a9 Merge pull request #136566 from pohly/automated-cherry-pick-of-#136269-origin-release-1.34
Automated cherry pick of #136269: DRA scheduler: double allocation fixes
2026-02-05 17:58:47 +05:30
Kubernetes Prow Robot
8e110c6ce1 Merge pull request #136433 from thc1006/cherry-pick-136028-release-1.34
[release-1.34] fix(kubelet): convert V().Error() to V().Info() for verbosity-aware logging
2026-02-05 17:58:40 +05:30
Kubernetes Prow Robot
5caaeadc3f Merge pull request #136374 from princepereira/automated-cherry-pick-of-#136241-upstream-release-1.34
Automated cherry pick of #136241: Fix for preferred dualstack and required dualstack in winkernel proxier.
2026-02-05 17:58:32 +05:30
Kubernetes Prow Robot
7dc238b2eb Merge pull request #136467 from cpanato/update-go-1256-rel134
[release-1.34] [go] Bump dependencies, images and versions used to Go 1.24.12 and distroless iptables
2026-01-31 13:22:25 +05:30
Carlos Panato
86f29bf874 Bump dependencies, images and versions used to Go 1.24.12 and distroless iptables
Signed-off-by: Carlos Panato <ctadeu@gmail.com>
2026-01-30 11:24:34 +01:00
Kubernetes Prow Robot
0cfcccf6c0 Merge pull request #136635 from dims/automated-cherry-pick-of-#136529-#136554-upstream-release-1.34
Automated cherry pick of #136529: test: Read /proc/net/nf_conntrack instead of using conntrack binary
#136554: test: Fix KubeProxy CLOSE_WAIT test for IPv6 environments (and where /proc/net/nf_conntrack may be missing)
2026-01-30 04:09:48 +05:30
Davanum Srinivas
4c5332710c Apparently some EC2 images we use do not have /proc/net/nf_conntrack
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-29 15:02:00 -05:00
Davanum Srinivas
77976dc0f7 test: cleanup from review
- Use netutils.IsIPv6(ip) instead of manual nil/To4 check
- Remove unnecessary ip.To16() call since IPv6 is already 16 bytes
- Remove ipFamily from grep pattern since IP format ensures correctness

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-29 15:02:00 -05:00
Davanum Srinivas
5d4cdfdee8 test: Fix KubeProxy CLOSE_WAIT test for IPv6 environments
The /proc/net/nf_conntrack file uses fully expanded IPv6 addresses
with leading zeros in each 16-bit group. For example:
  fc00:f853:ccd:e793::3 -> fc00:f853:0ccd:e793:0000:0000:0000:0003

Add expandIPv6ForConntrack() helper function to expand IPv6 addresses
to the format used by /proc/net/nf_conntrack before using them in
the grep pattern.
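
The expansion described above can be sketched roughly as follows. This is a minimal illustration, not the helper from the commit: the name expandIPv6Sketch and its error handling are assumptions, and only the standard library is used.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// expandIPv6Sketch is a hypothetical stand-in for the helper named in the
// commit message. It writes every 16-bit group as four hex digits, which is
// the fully expanded form the commit says /proc/net/nf_conntrack uses.
func expandIPv6Sketch(s string) (string, error) {
	ip := net.ParseIP(s)
	if ip == nil || ip.To4() != nil {
		return "", fmt.Errorf("not an IPv6 address: %q", s)
	}
	ip = ip.To16()
	groups := make([]string, 0, 8)
	for i := 0; i < net.IPv6len; i += 2 {
		// Two bytes per group, zero-padded to four hex digits.
		groups = append(groups, fmt.Sprintf("%02x%02x", ip[i], ip[i+1]))
	}
	return strings.Join(groups, ":"), nil
}

func main() {
	expanded, err := expandIPv6Sketch("fc00:f853:ccd:e793::3")
	if err != nil {
		panic(err)
	}
	fmt.Println(expanded) // fc00:f853:0ccd:e793:0000:0000:0000:0003
}
```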

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-29 15:01:59 -05:00
Davanum Srinivas
404bf4b04d test: Read /proc/net/nf_conntrack instead of using conntrack binary
The distroless-iptables image no longer includes the conntrack binary
as of v0.8.7 (removed in kubernetes/release#4223 since kube-proxy no
longer needs it after kubernetes#126847).

Update the KubeProxy CLOSE_WAIT timeout test to read /proc/net/nf_conntrack
directly instead of using the conntrack command. The file contains the
same connection tracking data and is accessible from the privileged
host-network pod.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2026-01-29 15:01:57 -05:00
Kubernetes Prow Robot
766a195b2a Merge pull request #136440 from BenTheElder/134-go1.24.12
release-1.34: upgrade go to 1.24.12 and decouple from kube-cross
2026-01-29 04:11:47 +05:30
Roman Bednar
d71e0dc506 csi: raise kubelet CSI init backoff to cover ~140s DNS delays
- bump init backoff to Duration=30ms, Factor=8 (Steps=6) to yield ~140s total
- prevent kubelet restarts when DNS is blackholed and NSS must fall back to myhostname
- keep CSI/CSINode initialization alive long enough to complete in ARO DNS-failure scenarios
2026-01-28 11:48:44 +01:00
Patrick Ohly
e4b5d3c8b4 DRA scheduler: fix another root cause of double device allocation
GatherAllocatedState and ListAllAllocatedDevices need to collect information
from different sources (allocated devices, in-flight claims), potentially even
multiple times (GatherAllocatedState first gets allocated devices, then the
capacities).

The underlying assumption that nothing bad happens in parallel is not always
true. The following log snippet shows how an update of the assume
cache (feeding the allocated devices tracker) and in-flight claims lands such
that GatherAllocatedState doesn't see the device in that claim as allocated:

    dra_manager.go:263: I0115 15:11:04.407714      18778] scheduler: Starting GatherAllocatedState
    ...
    allocateddevices.go:189: I0115 15:11:04.407945      18066] scheduler: Observed device allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-094" claim="testdra-all-usesallresources-hvs5d/claim-0553"
    dynamicresources.go:1150: I0115 15:11:04.407981      89109] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680"
    dra_manager.go:201: I0115 15:11:04.408008      89109] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 version="1211"
    dynamicresources.go:1157: I0115 15:11:04.408044      89109] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680" allocation=<
        	{
        	  "devices": {
        	    "results": [
        	      {
        	        "request": "req-1",
        	        "driver": "testdra-all-usesallresources-hvs5d.driver",
        	        "pool": "worker-5",
        	        "device": "worker-5-device-094"
        	      }
        	    ]
        	  },
        	  "nodeSelector": {
        	    "nodeSelectorTerms": [
        	      {
        	        "matchFields": [
        	          {
        	            "key": "metadata.name",
        	            "operator": "In",
        	            "values": [
        	              "worker-5"
        	            ]
        	          }
        	        ]
        	      }
        	    ]
        	  },
        	  "allocationTimestamp": "2026-01-15T14:11:04Z"
        	}
         >
    dra_manager.go:280: I0115 15:11:04.408085      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-095" claim="testdra-all-usesallresources-hvs5d/claim-0086"
    dra_manager.go:280: I0115 15:11:04.408137      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-096" claim="testdra-all-usesallresources-hvs5d/claim-0165"
    default_binder.go:69: I0115 15:11:04.408175      89109] scheduler: Attempting to bind pod to node pod="testdra-all-usesallresources-hvs5d/my-pod-0553" node="worker-5"
    dra_manager.go:265: I0115 15:11:04.408264      18778] scheduler: Finished GatherAllocatedState allocatedDevices=<map[string]interface {} | len:2>: {

Initial state: "worker-5-device-094" is in-flight, not in cache
- goroutine #1: starts GatherAllocatedState, copies cache
- goroutine #2: adds to assume cache, removes from in-flight
- goroutine #1: checks in-flight

=> device never seen as allocated
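
The interleaving above can be replayed deterministically in a few lines. This is only a sketch of the problem, with plain maps standing in for the allocated-devices cache and the in-flight claims; the function name replayInterleaving is hypothetical and none of this is scheduler code.

```go
package main

import "fmt"

const device = "worker-5-device-094"

// replayInterleaving runs the three steps from the commit message in order,
// on one goroutine, and reports whether the reader saw the device as allocated.
func replayInterleaving() bool {
	allocated := map[string]bool{}            // stand-in for the cache
	inFlight := map[string]bool{device: true} // initial state: in-flight only

	// goroutine #1: starts gathering state and copies the cache.
	seen := make(map[string]bool, len(allocated))
	for d := range allocated {
		seen[d] = true
	}

	// goroutine #2: adds the device to the cache, removes it from in-flight.
	allocated[device] = true
	delete(inFlight, device)

	// goroutine #1: only now checks the in-flight set, which is empty.
	for d := range inFlight {
		seen[d] = true
	}

	return seen[device]
}

func main() {
	fmt.Println("device seen as allocated:", replayInterleaving()) // false
}
```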

This is the second reason for double allocation of the same device in two
different claims. The other was timing in the assume cache. Both were
tracked down with an integration test (separate commit). It did not fail
all the time, but enough that regressions should show up as flakes.
2026-01-27 14:52:15 +01:00
Patrick Ohly
af0e9bb033 DRA scheduler: fix one root cause of double device allocation
DRA depends on the assume cache having invoked all event handlers before
Assume() returns, because DRA maintains state that is relevant for scheduling
through those event handlers.

This log snippet shows how this went wrong during PreBind:

    dynamicresources.go:1150: I0115 10:35:29.264437] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636"
    dra_manager.go:198: I0115 10:35:29.264448] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 version="287"
    dynamicresources.go:1157: I0115 10:35:29.264463] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636" allocation=<
    ...
    allocateddevices.go:189: I0115 10:35:29.267315] scheduler: Observed device allocation device="testdra-all-usesallresources-kqjpj.driver/worker-1/worker-1-device-096" claim="testdra-all-usesallresources-kqjpj/claim-0091"

- goroutine #1: UpdateStatus result delivered via informer.
  AssumeCache updates cache, pushes event A, emitEvents pulls event A from queue.
  *Not* done with delivering it yet!
- goroutine #2: AssumeCache.Assume called. Updates cache, pushes event B, emits it.
  Old and new claim have allocation, so no "Observed device allocation".
- goroutine #3: Schedules next pod, without considering device as allocated (not in the log snippet).
- goroutine #1: Finally delivers event A: "Observed device allocation", but too late.

Also, events are delivered out-of-order.

The fix is to let emitEvents when called by Assume wait for a potentially
running emitEvents in some other goroutine, thus ensuring that an event pulled
out of the queue by that other goroutine got delivered before Assume itself
checks the queue one more time and then returns.

The time window where things go wrong is small. An E2E test covering this only
flaked rarely, and only in the CI. An integration test (separate commit) with
a higher number of pods finally made it possible to reproduce locally. It also
uncovered a second race (fix in separate commit).

The unit test fails without the fix:

    === RUN   TestAssumeConcurrency
        assume_cache_test.go:311: FATAL ERROR:
            	Assume should have blocked and didn't.
    --- FAIL: TestAssumeConcurrency (0.00s)
2026-01-27 14:52:15 +01:00
thc1006
d9da8e2033 fix(kubelet): convert V().Error() to V().Info() for verbosity-aware logging
This PR fixes incorrect usage of `logger.V(N).Error()` across the kubelet
package. The go-logr package design causes `Error()` calls to bypass
verbosity level checks entirely, meaning these logs are always printed
regardless of the configured verbosity level.

Note: This cherry-pick only fixes `logger.V().Error()` (contextual logging)
calls. Files using `klog.V().ErrorS()` are NOT changed because klog's
structured logging properly respects verbosity levels.

Files modified (6 files, 11 instances):
- pkg/kubelet/cm/dra/plugin/dra_plugin_manager.go
- pkg/kubelet/kuberuntime/kuberuntime_container_linux.go
- pkg/kubelet/lifecycle/handlers.go
- pkg/kubelet/pleg/generic.go
- pkg/kubelet/volumemanager/cache/desired_state_of_world.go
- pkg/kubelet/volumemanager/reconciler/reconstruct.go

Conversion pattern applied:
- V(N).Error(err, "msg", ...) → V(N).Info("msg", ..., "err", err)
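
The effect of the conversion can be illustrated with a toy model. The toyLogger type below is hypothetical and is not go-logr; it only mimics the behavior described in this message, where Error() ignores the verbosity level and Info() respects it.

```go
package main

import (
	"errors"
	"fmt"
)

// toyLogger is a simplified stand-in for a leveled logger (NOT go-logr).
type toyLogger struct {
	level     int       // verbosity accumulated through V()
	threshold int       // configured maximum verbosity
	lines     *[]string // captured output
}

// V returns a copy of the logger with its verbosity raised by n.
func (l toyLogger) V(n int) toyLogger {
	l.level += n
	return l
}

// Info is dropped when the logger's level exceeds the threshold.
func (l toyLogger) Info(msg string, kv ...any) {
	if l.level > l.threshold {
		return
	}
	*l.lines = append(*l.lines, fmt.Sprintf("INFO %s %v", msg, kv))
}

// Error is always emitted, modelling the bypass described in the message.
func (l toyLogger) Error(err error, msg string, kv ...any) {
	*l.lines = append(*l.lines, fmt.Sprintf("ERROR %s err=%v %v", msg, err, kv))
}

// demo logs the same event both ways with the threshold set to 2.
func demo() []string {
	var lines []string
	log := toyLogger{threshold: 2, lines: &lines}
	err := errors.New("boom")
	log.V(4).Error(err, "sync failed", "pod", "p1")       // printed despite V(4)
	log.V(4).Info("sync failed", "pod", "p1", "err", err) // suppressed
	return lines
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```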

Fixes: https://github.com/kubernetes/kubernetes/issues/136027
Cherry-pick of: https://github.com/kubernetes/kubernetes/pull/136028

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
2026-01-26 23:56:56 +00:00
Piotr Rogowski
dacc7257f6 test(ut/dra): add unit test for pod requesting prepared and new claims 2026-01-26 12:04:47 +00:00
qiuxue
c356771af0 fix(expansion):Resolve the issue of UTF-8 characters being truncated, resulting in invalid UTF-8 2026-01-24 11:07:45 +08:00
Piotr Rogowski
6af7de8d1c test(e2e/dra): add test for pod requesting allocated and new claims 2026-01-23 19:40:18 +00:00
rogowski-piotr
ddfb1c1e53 kubelet(dra): fix multiple claims handling 2026-01-23 19:40:18 +00:00
Benjamin Elder
94938168a8 remove blank line between comments and entry 2026-01-22 13:05:53 -08:00
Benjamin Elder
a0f2539961 reorder kube-cross to be under go version and dedupe it from the go version 2026-01-22 13:05:38 -08:00
Benjamin Elder
cf76cd86a3 bump go to 1.24.12 2026-01-22 13:03:45 -08:00
Prince Pereira
eb8f93348d Fix for preferred dualstack and required dualstack in winkernel proxier. 2026-01-21 05:40:42 +00:00
SataQiu
d77a7d6bb4 kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join' 2026-01-20 16:00:51 -08:00