The PatchNodeOnce function has historically exited early
in scenarios where we Get a Node object, but the next Patch
API call on the same Node object fails. This can happen
in setups that are under heavy resource pressure
or in various network timeout scenarios.
Instead of exiting early and allowlisting certain errors,
always retry on any Patch error. This aligns with the
general idea that kubeadm retries *all* API calls.
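
A minimal sketch of the resulting retry loop, assuming a client-go
clientset; the buildPatch helper and the intervals are illustrative,
not kubeadm's actual code:

    import (
        "context"
        "time"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/apimachinery/pkg/util/wait"
        clientset "k8s.io/client-go/kubernetes"
    )

    func patchNodeWithRetry(ctx context.Context, client clientset.Interface,
        nodeName string, buildPatch func(*v1.Node) ([]byte, error)) error {
        return wait.PollUntilContextTimeout(ctx, 2*time.Second, 2*time.Minute, true,
            func(ctx context.Context) (bool, error) {
                node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
                if err != nil {
                    return false, nil // retry the Get as well
                }
                patch, err := buildPatch(node)
                if err != nil {
                    return false, err // a malformed patch will not fix itself
                }
                _, err = client.CoreV1().Nodes().Patch(ctx, nodeName,
                    types.StrategicMergePatchType, patch, metav1.PatchOptions{})
                if err != nil {
                    return false, nil // retry on *any* Patch error
                }
                return true, nil
            })
    }
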
If the user has provided extraArgs whose order is significant
(e.g. --service-account-issuer for kube-apiserver),
kubeadm will correctly override any base args, but will end up
sorting the entire resulting list, which is not desired.
Instead, only sort the base arguments and preserve the order
of overrides provided by the user.
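
A self-contained sketch of the intended merge semantics; the Arg type
and mergeArgs are illustrative stand-ins for kubeadm's internals:

    package main

    import (
        "fmt"
        "sort"
    )

    // Arg is a simplified stand-in for kubeadm's argument type.
    type Arg struct{ Name, Value string }

    // mergeArgs sorts only the base args; user overrides remove the
    // matching base entries and are appended in the exact order given.
    func mergeArgs(base, overrides []Arg) []Arg {
        overridden := map[string]bool{}
        for _, o := range overrides {
            overridden[o.Name] = true
        }
        merged := []Arg{}
        for _, b := range base {
            if !overridden[b.Name] {
                merged = append(merged, b)
            }
        }
        sort.Slice(merged, func(i, j int) bool { return merged[i].Name < merged[j].Name })
        // Order-significant, repeatable flags such as
        // --service-account-issuer keep the user-provided order.
        return append(merged, overrides...)
    }

    func main() {
        base := []Arg{{"v", "2"}, {"service-account-issuer", "https://default"}}
        overrides := []Arg{
            {"service-account-issuer", "https://new"},
            {"service-account-issuer", "https://old"},
        }
        fmt.Println(mergeArgs(base, overrides))
    }
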
When retrieving the cluster-info CM, ensure the cluster pointed
to by the current context in the kubeconfig is validated.
Add unit test for the above.
Make the function GetClusterFromKubeConfig() return errors for
the various failure cases. Handle the errors at call sites.
Add unit tests for the update.
The above changes prevent panics when the user has manually
edited and malformed the kubeconfig in the cluster-info CM.
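
A minimal sketch of the hardened lookup, assuming clientcmdapi types;
the real kubeadm helper's name and signature may differ:

    import (
        "fmt"

        clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )

    // getClusterFromKubeConfig returns the cluster of the current
    // context, erroring out instead of panicking on malformed input.
    func getClusterFromKubeConfig(config *clientcmdapi.Config) (*clientcmdapi.Cluster, error) {
        kubeCtx, ok := config.Contexts[config.CurrentContext]
        if !ok || kubeCtx == nil {
            return nil, fmt.Errorf("no context found for current context %q", config.CurrentContext)
        }
        cluster, ok := config.Clusters[kubeCtx.Cluster]
        if !ok || cluster == nil {
            return nil, fmt.Errorf("no cluster found for context %q", config.CurrentContext)
        }
        return cluster, nil
    }
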
The newControlPlane flag has been historically problematic, since
it implies that the function FetchInitConfigurationFromCluster
cannot handle the cases where a node is a worker node but
we still want to fetch its NodeRegistrationOptions conditionally,
in cases such as "upgrade node" for workers.
To fix this issue, replace the flag newControlPlane with
two new flags getNodeRegistration and getAPIEndpoint.
If getNodeRegistration is true, we fetch the NRO, and if
getAPIEndpoint is true, we fetch the API endpoint for
that node.
Additionally, rename skipComponentConfigs to getComponentConfigs
for consistency and flip its value accordingly everywhere.
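
A toy sketch of the decoupled control flow; the type and the fetched
values are illustrative, not kubeadm's actual plumbing:

    package main

    import "fmt"

    type nodeConfig struct {
        NodeRegistration string
        APIEndpoint      string
    }

    // fetchNodeInfo replaces the single coupled newControlPlane decision
    // with two independent ones.
    func fetchNodeInfo(getNodeRegistration, getAPIEndpoint bool) nodeConfig {
        var cfg nodeConfig
        if getNodeRegistration {
            cfg.NodeRegistration = "NodeRegistrationOptions for this node"
        }
        if getAPIEndpoint {
            cfg.APIEndpoint = "APIEndpoint for this node"
        }
        return cfg
    }

    func main() {
        // "upgrade node" on a worker: fetch the NRO, but there is no
        // API endpoint to fetch.
        fmt.Printf("%+v\n", fetchNodeInfo(true, false))
    }
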
When waiting for the kube-apiserver to report 'ok'
in the 'init' and 'join' phase 'wait-control-plane', a client
constructed from the 'admin.conf' is used. In the case of the
kube-apiserver, the discovery client is used so that
anonymous-auth works. But if 'admin.conf' is used as is,
it would point to the ControlPlaneEndpoint (CPE) and not the
LocalAPIEndpoint (LAE).
Implement a new method WaitControlPlaneClient() for both
init.go and join.go that patches the 'Server' field in the
loaded v1.Config to point to the LAE, before constructing
a client set and using it in the kube-apiserver waiter.
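
A minimal sketch of that method, assuming clientcmd types; the real
name, receiver, and endpoint derivation in kubeadm may differ:

    import (
        "fmt"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
        clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
    )

    // waitControlPlaneClient points every cluster's Server field at the
    // local API endpoint (LAE) before building a client set.
    func waitControlPlaneClient(config *clientcmdapi.Config, localAPIEndpoint string) (*kubernetes.Clientset, error) {
        config = config.DeepCopy()
        for _, cluster := range config.Clusters {
            cluster.Server = fmt.Sprintf("https://%s", localAPIEndpoint)
        }
        restConfig, err := clientcmd.NewDefaultClientConfig(*config, &clientcmd.ConfigOverrides{}).ClientConfig()
        if err != nil {
            return nil, err
        }
        return kubernetes.NewForConfig(restConfig)
    }
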
With StreamingCollectionEncodingToJSON and
StreamingCollectionEncodingToProtobuf, the WatchList feature must
re-justify its necessity. To prevent the ecosystem from building
around a feature that
may not be promoted, we will stop serving list-via-watch until
performance numbers can justify its inclusion.
This also stops the kube-controller-manager from using the
list-via-watch by default. The fallback is a regular list, so during
version skew during an upgrade the "right" thing will happen and the new
StreamingCollectionEncoding will be used.
When the kube-apiserver has --anonymous-auth=false,
the plain http.Client.Get() call that WaitForAllControlPlaneComponents
performs will not work.
Always use the discovery client when checking the health status
of the kube-apiserver.
Do a minor rework of struct fields and unit tests.
Replace nil client in cmd/phases/join/waitcontrolplane.go.
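
A minimal sketch of such a health check through the discovery client's
REST client, which sends the admin credentials and therefore works with
--anonymous-auth=false:

    import (
        "context"
        "fmt"

        "k8s.io/client-go/kubernetes"
    )

    func apiServerHealthy(ctx context.Context, client kubernetes.Interface) error {
        body, err := client.Discovery().RESTClient().
            Get().AbsPath("/healthz").Do(ctx).Raw()
        if err != nil {
            return err
        }
        if string(body) != "ok" {
            return fmt.Errorf("kube-apiserver /healthz returned %q, expected %q", string(body), "ok")
        }
        return nil
    }
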
There was one error path that led to a "controller has shut down" log
message. Other errors caused different log entries or are so unlikely (event
handler registration failure!) that they weren't checked at all.
It's clearer to let Run return an error in all cases and then log the
"controller has shut down" error at the call site. This also enables tests to
mark themselves as failed, should that ever happen.
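
A minimal sketch of the pattern; the controller type and its fields are
illustrative:

    import (
        "context"
        "errors"

        "k8s.io/client-go/tools/cache"
        "k8s.io/klog/v2"
    )

    type controller struct {
        hasSynced cache.InformerSynced
    }

    // Run returns an error on every failure path instead of logging
    // only some of them.
    func (c *controller) Run(ctx context.Context) error {
        if !cache.WaitForCacheSync(ctx.Done(), c.hasSynced) {
            return errors.New("failed to wait for caches to sync")
        }
        <-ctx.Done()
        return nil
    }

    // The call site owns the log message, and a test can fail on a
    // non-nil error instead.
    func runController(ctx context.Context, c *controller) {
        if err := c.Run(ctx); err != nil {
            klog.ErrorS(err, "controller has shut down")
        }
    }
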
This change introduces improvements to the component compatibility registry:
- Modify the kube-scheduler test server to create a separate ComponentGlobalsRegistry
- Update the compatibility registry to handle multiple flag configurations
- Enhance test cases to support emulation version mapping between components
Various parts of kube-proxy passed around a "hostname", but it is
actually the name of the *node* kube-proxy is running on, which is not
100% guaranteed to be exactly the same as the hostname. Rename it
everywhere to make it clearer that (a) it is definitely safe to use
that name to refer to the Node, (b) it is not necessarily safe to use
that name with DNS, etc.
The controller is derived from the node taint eviction controller.
In contrast to that controller, it tracks pod UIDs to prevent
deleting the wrong pod after it has been replaced.
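
One way to enforce such a UID check is a delete precondition; a minimal
sketch assuming a client-go clientset (whether the controller uses
exactly this mechanism is an assumption):

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/kubernetes"
    )

    // deletePodWithUID refuses to delete a pod whose UID no longer
    // matches, i.e. a replacement that reused the same namespace/name.
    func deletePodWithUID(ctx context.Context, client kubernetes.Interface,
        namespace, name string, uid types.UID) error {
        return client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
            Preconditions: &metav1.Preconditions{UID: &uid},
        })
    }
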
The tests and comments have also been updated because while
VolumeCapacityPriority preferred a node with the least allocatable
capacity, StorageCapacityScoring prefers a node with the maximum
allocatable capacity.
Basically all callers want dual-stack-if-possible, so simplify that.
Also, tweak the startup-time checking in kubelet to treat "no iptables
support" as interesting but not an error.
It was there so you could mock the results via a FakeExec, but these
days any unit tests outside of pkg/util/iptables that want to mock
iptables results use a FakeIPTables instead of a real
utiliptables.Interface with a FakeExec.
Historically it took an exec argument so you could pass a FakeExec to
mock its behavior in unit tests, but it has a fake implementation now
that is much more useful for unit tests than trying to use the real
implementation with a fake exec. (The unit tests still use fake execs,
but they don't need to use a public constructor.) So remove the exec
args from the public constructors.
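
An illustrative before/after of a call site under that change (the
exact post-change signature is an assumption):

    import (
        utiliptables "k8s.io/kubernetes/pkg/util/iptables"
    )

    func newIPv4IPTables() utiliptables.Interface {
        // Before: utiliptables.New(utilexec.New(), utiliptables.ProtocolIPv4)
        return utiliptables.New(utiliptables.ProtocolIPv4)
    }
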
Remove the utilexec.Interface args from the iptables/ipvs constructors
(which have been unused since the conntrack cleanup code was ported to
netlink).
Remove the EventRecorder fields from the iptables/ipvs Proxiers, which
have been unused since we removed the port-opener code in 2022.
Remove the strictARP field from the ipvs Proxier, which has apparently
always been unused (strictARP is only consulted at construction time).