The newControlPlane flag has been historically problematic, since
it implies that the function FetchInitConfigurationFromCluster
cannot handle the cases where a node is worker node but
we still want to fetch its NodeRegistrationOptions conditionally,
in cases such as "upgrade node" for workers.
To fix this issue, replace the flag newControlPlaneNode with
two new flags getNodeRegistration and getAPIEndpoint.
If getNodeRegistration is true, we fetch the NRO, and if
getAPIEndpoint is true, we fetch the API endpoint for
that node.
Additionally, rename skipComponentConfigs to getComponentConfigs
for consistency and flip its value accordingly everywhere.
When waiting for the kube-apiserver to report 'ok'
in the 'init' and 'join' phase 'wait-control-plane', a client
constructed from the 'admin.conf' is used. In the case of the
kube-apiserver, the discovery client is used so that
anonymous-auth works. But if 'admin.conf' is used as is,
it would point to the CPE and not the LAE.
Implement a new method WaitControlPlaneClient() for both
init.go and join.go that patches the 'Server' field in the
loaded v1.Config to point to the LAE, before constructing
a client set and using it in the kube-apiserver waiter.
When gRPC notifies the kubelet that a connection ended, the kubelet tries to
reconnect because it needs to know when a DRA driver comes back. The same code
gets called when a connection goes idle, by default after 30 minutes. In that
and only that case the conn.Connect call deadlocks while calling into the gRPC
idle manager.
This can be reproduced with a new unit test which artificially shortens the
idle timeout. This fix is to move the Connect call into a goroutine because
then both HandleConn and Connect can proceed. It's sufficient that Connect
finishes at some point, it doesn't need to be immediately.
DRA also calls Register at pkg/kubelet/cm/container_manager_linux.go NewContainerManager(), causing volume stats collector being ignored.
Fix this by moving it out of `sync.Once()`, allowing multiple calls to `Register()` func.
Listing all keys from etcd turned out to be too expensive, negativly
impacting events POST latency. Events resource is the only resource that
by default has watch cache disabled and which includes very
large number of small objects making it very costly to list keys.
Expected impact:
* No apiserver_resource_size_estimate_bytes metric for events.
* APF overestimating LIST request cost to events. Fallback assumes
object size of 1.5MB, meaning LIST events will always get maxSeats
The output format is now used by the `Complete()` function, so it must
be set before invoking said function.
The commit also adds a unit tests for this scenario.
Signed-off-by: Marc Khouzam <marc.khouzam@gmail.com>
The comparison of SELinux labels in KCM tolerates missing fields - the
operating system is going to default them from its defaults, but in KCM we
don't know what the defaults are.
But the OS won't default the last component, "level", which includes also
categories. Make sure that labels with a level set conflicts with level "",
that's what will conflict on the OS too.