The PatchNodeOnce function has historically exited early
in scenarios where we Get a Node object, but the next Patch
API call on the same Node object fails. This can happen
in setups that are under a lot of resource pressure
or in various network timeout scenarios.
Instead of exiting early and allow-listing certain errors,
always retry on any Patch error. This aligns with the
general idea that kubeadm retries *all* API calls.
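A minimal sketch of the new behavior, assuming a simplified patchNode
helper built on client-go (function name, intervals and the lastError
bookkeeping are illustrative, not kubeadm's exact code):

    import (
        "context"
        "encoding/json"
        "time"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/apimachinery/pkg/util/strategicpatch"
        "k8s.io/apimachinery/pkg/util/wait"
        clientset "k8s.io/client-go/kubernetes"
    )

    // patchNode gets the Node, applies fn to a copy, and issues a strategic
    // merge Patch. Any Get or Patch error is remembered and the poll loop
    // simply retries; no Patch error is treated as fatal anymore.
    func patchNode(ctx context.Context, c clientset.Interface, name string, fn func(*v1.Node)) error {
        var lastError error
        err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 30*time.Second, true,
            func(ctx context.Context) (bool, error) {
                node, err := c.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
                if err != nil {
                    lastError = err
                    return false, nil // retry
                }
                oldData, err := json.Marshal(node)
                if err != nil {
                    return false, err
                }
                updated := node.DeepCopy()
                fn(updated)
                newData, err := json.Marshal(updated)
                if err != nil {
                    return false, err
                }
                patch, err := strategicpatch.CreateTwoWayMergePatch(oldData, newData, v1.Node{})
                if err != nil {
                    return false, err
                }
                if _, err := c.CoreV1().Nodes().Patch(ctx, name, types.StrategicMergePatchType,
                    patch, metav1.PatchOptions{}); err != nil {
                    lastError = err
                    return false, nil // previously only some errors were retried; now all are
                }
                return true, nil
            })
        if err != nil {
            if lastError != nil {
                return lastError
            }
            return err
        }
        return nil
    }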
If the user has provided extraArgs in an order that is significant
(e.g. --service-account-issuer for kube-apiserver),
kubeadm will correctly override any base args, but will end up
sorting the entire resulting list, which is not desired.
Instead, only sort the base arguments and preserve the order
of overrides provided by the user.
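A rough sketch of the intended ordering, assuming a simple Arg{Name, Value}
pair similar to kubeadm's v1beta4 argument type (the helper name and types
are illustrative):

    import "sort"

    // Arg mirrors the name/value pairs kubeadm uses for component arguments.
    type Arg struct {
        Name  string
        Value string
    }

    // mergeArgs sorts only the base (default) arguments and then appends the
    // user-provided overrides in the exact order they were given, so that
    // order-sensitive flags such as --service-account-issuer keep their order.
    func mergeArgs(base, overrides []Arg) []Arg {
        overridden := map[string]bool{}
        for _, o := range overrides {
            overridden[o.Name] = true
        }
        var result []Arg
        for _, b := range base {
            if !overridden[b.Name] {
                result = append(result, b)
            }
        }
        // sort the base arguments only
        sort.Slice(result, func(i, j int) bool { return result[i].Name < result[j].Name })
        // preserve the user-provided order of the overrides
        return append(result, overrides...)
    }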
When retrieving the cluster-info CM, ensure the cluster pointed
to by the current context in the kubeconfig is validated.
Add a unit test for the above.
Make the function GetClusterFromKubeConfig() return various
errors. Handle the errors at call sites. Add unit tests
for the update.
The above changes prevent panics when the user has manually
edited and malformed the kubeconfig in the cluster-info CM.
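A sketch of the error-returning variant, assuming the function operates on
a clientcmdapi.Config (the exact signature in kubeadm may differ):

    import (
        "fmt"

        clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )

    // getClusterFromKubeConfig returns the name and the cluster that the
    // current context points to, or an error if the kubeconfig is malformed,
    // instead of assuming the entries exist.
    func getClusterFromKubeConfig(config *clientcmdapi.Config) (string, *clientcmdapi.Cluster, error) {
        if config == nil {
            return "", nil, fmt.Errorf("kubeconfig is nil")
        }
        ctx, ok := config.Contexts[config.CurrentContext]
        if !ok || ctx == nil {
            return "", nil, fmt.Errorf("no context named %q found in kubeconfig", config.CurrentContext)
        }
        cluster, ok := config.Clusters[ctx.Cluster]
        if !ok || cluster == nil {
            return "", nil, fmt.Errorf("no cluster named %q found in kubeconfig", ctx.Cluster)
        }
        return ctx.Cluster, cluster, nil
    }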
The newControlPlane flag has been historically problematic, since
it implies that the function FetchInitConfigurationFromCluster
cannot handle cases where a node is a worker node but
we still want to fetch its NodeRegistrationOptions conditionally,
in cases such as "upgrade node" for workers.
To fix this issue, replace the newControlPlane flag with
two new flags: getNodeRegistration and getAPIEndpoint.
If getNodeRegistration is true, we fetch the NodeRegistrationOptions,
and if getAPIEndpoint is true, we fetch the API endpoint for
that node.
Additionally, rename skipComponentConfigs to getComponentConfigs
for consistency and flip its value accordingly everywhere.
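A simplified sketch of the conditional fetch logic; the three helpers
(getNodeRegistrationOptions, getAPIEndpointFromNode,
getComponentConfigsFromCluster) are hypothetical stand-ins for kubeadm's
internal code, and the signature is illustrative:

    import (
        clientset "k8s.io/client-go/kubernetes"
        kubeadmapi "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm"
    )

    // fetchInitConfiguration only fetches the pieces the caller asked for,
    // so "upgrade node" on a worker can request NodeRegistrationOptions
    // without also requiring a local API endpoint.
    func fetchInitConfiguration(client clientset.Interface, nodeName string,
        getNodeRegistration, getAPIEndpoint, getComponentConfigs bool) (*kubeadmapi.InitConfiguration, error) {

        initcfg := &kubeadmapi.InitConfiguration{}
        if getNodeRegistration {
            if err := getNodeRegistrationOptions(client, nodeName, &initcfg.NodeRegistration); err != nil {
                return nil, err
            }
        }
        if getAPIEndpoint {
            if err := getAPIEndpointFromNode(client, nodeName, &initcfg.LocalAPIEndpoint); err != nil {
                return nil, err
            }
        }
        if getComponentConfigs {
            if err := getComponentConfigsFromCluster(client, &initcfg.ClusterConfiguration); err != nil {
                return nil, err
            }
        }
        return initcfg, nil
    }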
When waiting for the kube-apiserver to report 'ok'
in the 'wait-control-plane' phase of 'init' and 'join', a client
constructed from the 'admin.conf' is used. In the case of the
kube-apiserver, the discovery client is used so that
anonymous-auth works. But if 'admin.conf' is used as is,
it would point to the ControlPlaneEndpoint (CPE) and not the
LocalAPIEndpoint (LAE).
Implement a new method WaitControlPlaneClient() for both
init.go and join.go that patches the 'Server' field in the
loaded v1.Config to point to the LAE, before constructing
a client set and using it in the kube-apiserver waiter.
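A minimal sketch of the idea, operating on the internal clientcmdapi.Config
rather than the on-disk v1.Config; the function name matches the commit,
everything else is illustrative:

    import (
        clientset "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
        clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )

    // waitControlPlaneClient rewrites the Server field of every cluster entry
    // in the loaded admin.conf to the local API endpoint (LAE) before building
    // the client set used by the kube-apiserver waiter.
    func waitControlPlaneClient(config *clientcmdapi.Config, localAPIEndpoint string) (clientset.Interface, error) {
        cfg := config.DeepCopy()
        for _, cluster := range cfg.Clusters {
            cluster.Server = localAPIEndpoint // e.g. "https://10.0.0.2:6443"
        }
        restCfg, err := clientcmd.NewDefaultClientConfig(*cfg, &clientcmd.ConfigOverrides{}).ClientConfig()
        if err != nil {
            return nil, err
        }
        return clientset.NewForConfig(restCfg)
    }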
Moving Scheduler interfaces to staging: move the PodInfo and NodeInfo
interfaces (together with related types) to the staging repo, leaving
the internal implementation in kubernetes/kubernetes/pkg/scheduler.
It hasn't been on by default before, therefore it does not get locked to the
new default of 'on' yet. This has some impact on the scheduler configuration
because the plugin is now enabled by default.
Because the feature is now GA, it doesn't need to be a label on E2E tests,
which wouldn't be possible anyway once it gets removed entirely.
Some tests do version emulation and need the DRA feature. In that combination
the --runtime-config-emulation-forward-compatible option is needed to allow
enabling the v1 API although it's only available in 1.34.
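For illustration, such a test might start the kube-apiserver with a flag
combination along these lines (the version numbers are only examples):

    kube-apiserver \
      --emulated-version=1.33 \
      --runtime-config=resource.k8s.io/v1=true \
      --runtime-config-emulation-forward-compatible=true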
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
ProxyHealthServer now consumes NodeManager to get the latest
node object when determining node eligibility.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
Co-authored-by: Dan Winship <danwinship@redhat.com>
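A minimal sketch of the eligibility check described above, assuming a
hypothetical NodeManager.Node() accessor that returns the latest cached
*v1.Node; the taint key is the one kube-proxy's health check already uses:

    import v1 "k8s.io/api/core/v1"

    // toBeDeletedTaint marks a node that the cluster autoscaler is draining;
    // such a node is not eligible to serve traffic.
    const toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

    // nodeManager is a stand-in for the new NodeManager dependency.
    type nodeManager interface {
        Node() *v1.Node
    }

    // nodeEligible consults the latest node object from the NodeManager
    // instead of a locally tracked copy.
    func nodeEligible(nm nodeManager) bool {
        node := nm.Node()
        if node == nil {
            return false
        }
        for _, taint := range node.Spec.Taints {
            if taint.Key == toBeDeletedTaint {
                return false
            }
        }
        return true
    }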
NodeManager, if configured to watch PodCIDRs, watches
for changes in the node's PodCIDRs and crashes kube-proxy if a
change is detected.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
Co-authored-by: Dan Winship <danwinship@redhat.com>
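A sketch of that PodCIDR check (names are illustrative); on a detected
change the process exits so kube-proxy restarts with the new PodCIDRs:

    import (
        "os"
        "reflect"

        v1 "k8s.io/api/core/v1"
        "k8s.io/klog/v2"
    )

    // onNodeUpdate compares the PodCIDRs of the old and new node objects and
    // terminates the process when they differ.
    func onNodeUpdate(oldNode, newNode *v1.Node) {
        if reflect.DeepEqual(oldNode.Spec.PodCIDRs, newNode.Spec.PodCIDRs) {
            return
        }
        klog.ErrorS(nil, "PodCIDRs changed, restarting kube-proxy",
            "oldPodCIDRs", oldNode.Spec.PodCIDRs, "newPodCIDRs", newNode.Spec.PodCIDRs)
        os.Exit(1)
    }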
NodeManager initializes the node informer, waits for the cache to sync, and
polls for the node object to retrieve NodeIPs; it handles node events and
crashes kube-proxy when a change in NodeIPs is detected.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
Co-authored-by: Dan Winship <danwinship@redhat.com>
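A rough sketch of that startup sequence using client-go informers; the
function and the address filtering are illustrative only:

    import (
        "context"
        "net"
        "time"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/fields"
        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/client-go/informers"
        clientset "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    // waitForNodeIPs starts a single-node informer, waits for its cache to
    // sync, then polls the cached node object until usable NodeIPs appear.
    func waitForNodeIPs(ctx context.Context, client clientset.Interface, nodeName string) ([]net.IP, error) {
        factory := informers.NewSharedInformerFactoryWithOptions(client, 0,
            informers.WithTweakListOptions(func(o *metav1.ListOptions) {
                o.FieldSelector = fields.OneTermEqualSelector("metadata.name", nodeName).String()
            }))
        nodeInformer := factory.Core().V1().Nodes()
        informer := nodeInformer.Informer() // instantiate before Start so it gets started
        factory.Start(ctx.Done())
        cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)

        var nodeIPs []net.IP
        err := wait.PollUntilContextTimeout(ctx, time.Second, 30*time.Second, true,
            func(ctx context.Context) (bool, error) {
                node, err := nodeInformer.Lister().Get(nodeName)
                if err != nil {
                    return false, nil // not in cache yet, keep polling
                }
                nodeIPs = nil
                for _, addr := range node.Status.Addresses {
                    if addr.Type != v1.NodeInternalIP && addr.Type != v1.NodeExternalIP {
                        continue
                    }
                    if ip := net.ParseIP(addr.Address); ip != nil {
                        nodeIPs = append(nodeIPs, ip)
                    }
                }
                return len(nodeIPs) > 0, nil
            })
        return nodeIPs, err
    }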
This simplifies how the proxier receives updates for changes in node
labels. Instead of passing the complete Node object, we pass only
the proxy-relevant topology labels extracted from the complete list
of labels, and the downstream event handlers are only notified
when the topology labels change.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
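A sketch of the extraction and comparison described above; which labels
count as "topology labels" here is illustrative:

    import (
        "maps"

        v1 "k8s.io/api/core/v1"
    )

    // topologyLabels lists the node labels the proxier actually cares about.
    var topologyLabels = []string{
        v1.LabelHostname,
        v1.LabelTopologyZone,
        v1.LabelTopologyRegion,
    }

    // extractTopologyLabels picks only the proxy-relevant labels out of the
    // node's full label set.
    func extractTopologyLabels(node *v1.Node) map[string]string {
        out := make(map[string]string, len(topologyLabels))
        for _, key := range topologyLabels {
            if value, ok := node.Labels[key]; ok {
                out[key] = value
            }
        }
        return out
    }

    // topologyLabelsChanged tells the event handlers whether a notification
    // is needed at all.
    func topologyLabelsChanged(oldLabels, newLabels map[string]string) bool {
        return !maps.Equal(oldLabels, newLabels)
    }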