When race detection is enabled, merely running 25 e2e.test instances
was too much: the memory overhead of the race detector caused the OOM
killer to shut down the Prow test pod. A CI job can control the
parallelism via GINKGO_PARALLEL_NODES, but we should also have saner
defaults which take this into account.
For a few releases now, Go has supported `go build -race`, which
produces binaries that do data race detection when invoked. Some
changes are needed to enable using this in a kind cluster:
- `-race` must be passed when building dynamically linked binaries.
Only those support `-race`, because it requires CGO.
To avoid adding yet another env variable, the existing KUBE_RACE
gets used to convey the intent.
- KUBE_RACE must be passed into the dockerized build.
- Logging the base image of a release image makes it easier
to figure out whether the binary has a chance to run.
The base image is important because dynamically linked binaries need a base
image with libc. By default, control plane components are linked statically,
so users need to explicitly override the defaults:
KUBE_RACE=-race KUBE_CGO_OVERRIDES="kube-apiserver kube-controller-manager kube-scheduler" KUBE_GORUNNER_IMAGE=gcr.io/k8s-staging-test-infra/kubekins-e2e:v20250815-171060767f-master kind build node-image ...
KUBE_GORUNNER_IMAGE changes the base image for kube-apiserver,
kube-controller-manager and kube-scheduler. The kubekins image was picked for
this example because a Prow job definition already uses it. Reusing
it in a job avoids the need to maintain another image definition.
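As a minimal illustration of what the instrumented binaries detect (a
toy program, not code from the Kubernetes tree), an unsynchronized
write racing with a concurrent read of the same variable is enough to
trigger a report:

package main

import "fmt"

func main() {
	data := 0
	done := make(chan struct{})
	go func() {
		data = 42 // unsynchronized write ...
		close(done)
	}()
	fmt.Println(data) // ... racing with this read
	<-done
}

Built with CGO_ENABLED=1 go build -race, running this prints a
WARNING: DATA RACE report much like the one shown below.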
Running conformance tests against such a cluster with alpha+beta features
enabled revealed one new data race:
$ kubectl logs -n kube-system kube-controller-manager-kind-control-plane
...
WARNING: DATA RACE
Write at 0x00c00019a730 by goroutine 216:
k8s.io/client-go/tools/leaderelection.(*LeaderElector).setObservedRecord()
k8s.io/client-go/tools/leaderelection/leaderelection.go:529 +0x179
k8s.io/client-go/tools/leaderelection.(*LeaderElector).tryCoordinatedRenew()
k8s.io/client-go/tools/leaderelection/leaderelection.go:367 +0x5ca
...
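The usual remedy for this class of report is to funnel all access to
the shared record through a lock. Schematically (a sketch of the
guarded-access pattern with illustrative names, not the actual
client-go change):

package sketch

import "sync"

type record struct{ holderIdentity string }

type elector struct {
	mu             sync.Mutex // guards observedRecord
	observedRecord record
}

func (e *elector) setObservedRecord(r record) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.observedRecord = r
}

func (e *elector) getObservedRecord() record {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.observedRecord
}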
Clean up the available scripts to remove unused code paths after all
gogo references have been migrated to native protobuf.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
mockery has introduced breaking changes and switched to a v3 branch;
this migrates to v3, mostly using the built-in migration tool. Mocks
are now generated as a single file per package, except in packages
containing mocks for interfaces from multiple packages (in
pkg/kubelet/container/testing).
Signed-off-by: Stephen Kitt <skitt@redhat.com>
This brings a few fixes, drops github.com/pkg/errors (as a direct
dependency), and bumps many transitive dependencies. The
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp bump to
v0.61.0 breaks "k8s.io/kubernetes/test/integration/apiserver: tracing"
consistently, so it's held back for now.
github.com/containerd/containerd/api pulls in gopkg.in/yaml.v3, so
that needs to be added to the exceptions in
unwanted-dependencies.json.
Signed-off-by: Stephen Kitt <skitt@redhat.com>
set -x/+x now wraps the setting of relevant configuration variables,
to make it easier to debug how and where they get set. The final
gotestsum invocation gets logged in addition to being run.
Disable QF1008 in pull-kubernetes-linter-hints, based on discussion in
#k8s-code-organization. We might want a check that enforces a
consistent style within a single file, but there is no reason to
prefer a global choice between `obj.ObjectMeta.Name` and `obj.Name`.
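For reference, a self-contained sketch of the two spellings QF1008
distinguishes (local types for illustration, not the real API
objects):

package sketch

// ObjectMeta stands in for the embedded metadata struct.
type ObjectMeta struct {
	Name string
}

type Pod struct {
	ObjectMeta // embedded, so its fields are promoted
}

func names(p *Pod) (string, string) {
	// QF1008 flags the explicit selector and suggests the promoted
	// form; both refer to the same field.
	return p.ObjectMeta.Name, p.Name
}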
Change-Id: If2e7a3a464cdf1ac83fdaaa18b065df1f19e8568
This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.
The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
the last known health of each device and handle plugin disconnections
(see the sketch after this list).
- An asynchronous update mechanism that triggers a pod sync when a
device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
expose the device health information to users via the Pod API.
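A rough sketch of the health-cache idea (types and names are
illustrative, not the actual Kubelet code):

package sketch

import (
	"sync"
	"time"
)

// deviceKey identifies a device by driver, pool, and device name.
type deviceKey struct {
	driver, pool, device string
}

type health struct {
	status      string // e.g. "Healthy", "Unhealthy", "Unknown"
	lastUpdated time.Time
}

// healthCache tracks the last health reported by each DRA plugin.
type healthCache struct {
	mu      sync.Mutex
	devices map[deviceKey]health
}

func newHealthCache() *healthCache {
	return &healthCache{devices: map[deviceKey]health{}}
}

// update records a health report and returns true when the status
// changed, signalling that pods using the device should be synced.
func (c *healthCache) update(k deviceKey, status string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	prev, ok := c.devices[k]
	c.devices[k] = health{status: status, lastUpdated: time.Now()}
	return !ok || prev.status != status
}

// markUnknown downgrades all of a driver's devices to "Unknown",
// e.g. after the plugin's watch stream disconnects.
func (c *healthCache) markUnknown(driver string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, h := range c.devices {
		if k.driver == driver {
			h.status = "Unknown"
			c.devices[k] = h
		}
	}
}

Keeping only the last known state bounds the cache by the number of
allocated devices, and the changed-status return value is what drives
the asynchronous pod sync.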
Update vendor
KEP-4680: Fix lint, boilerplate, and codegen issues
Add another e2e test, add TODO for KEP-4680, and update test infra helpers
Add Feature Gate e2e test
Fix presubmits
Fix var names, feature gating, and nits
Fix DRA Health gRPC API according to review feedback