Today, the health check response that Kube-proxy returns to load balancers
querying the status of ETP:Local services does not include Kube-proxy's own
healthz state. As a result, Kube-proxy might tell load balancers to forward
traffic to a node simply because an endpoint is running on it. This
overlooks the fact that Kube-proxy might be unhealthy and might not have
successfully written the rules that allow traffic to reach the endpoint.
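The intended behavior can be sketched roughly as follows. This is a minimal
illustration, not Kube-proxy's actual code: the `serviceHealth` type, its
JSON field names, and the `statusCode` and `healthHandler` helpers are all
hypothetical. The only point it makes is that a node should report healthy
for an ETP:Local service when it has local endpoints and the proxy itself
is healthy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// serviceHealth is a hypothetical snapshot of what the health check
// server knows about one ETP:Local service on this node.
type serviceHealth struct {
	LocalEndpoints int  `json:"localEndpoints"`
	ProxyHealthy   bool `json:"serviceProxyHealthy"`
}

// statusCode returns 200 only when the node has local endpoints AND the
// proxy is healthy; otherwise the load balancer should skip this node.
func statusCode(h serviceHealth) int {
	if h.LocalEndpoints > 0 && h.ProxyHealthy {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// healthHandler serves the combined state to the load balancer.
func healthHandler(get func() serviceHealth) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := get()
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(statusCode(h))
		_ = json.NewEncoder(w).Encode(h)
	})
}

func main() {
	// Endpoints exist locally, but the proxy has not synced its rules.
	h := healthHandler(func() serviceHealth {
		return serviceHealth{LocalEndpoints: 2, ProxyHealthy: false}
	})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	fmt.Println(rec.Code) // 503
}
```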
- set higher severity and log level when unmanaged pods are found, and improve testing
- do not mention unsupported controller when triggering event for
unmanaged pods (this is covered by CalculateExpectedPodCountFailed
event)
- test unsupported controller
- make testing for events non-blocking when an event is not found
Add node e2e test to verify that static pods can be started after a
previous static pod with the same config temporarily failed termination.
The scenario is:
1. Static pod is started
2. Static pod is deleted
3. Static pod termination fails (internally `syncTerminatedPod` fails)
4. At later time, pod termination should succeed
5. New static pod with the same config is (re)-added
6. New static pod is expected to start successfully
To reproduce this scenario, set up a pod using an NFS mount. The NFS
server is stopped, which causes the volumes to fail to unmount and
`syncTerminatedPod` to fail. The NFS server is later started, allowing
the volume to unmount successfully.
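The sequence can be modeled with a small, self-contained sketch. It does
not use the kubelet or the e2e framework: `fakeNode`, `addStaticPod`, and
`retry` are hypothetical names. It also assumes, for illustration only,
that a new static pod with the same config cannot start until the old one
has finished terminating.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errUnmount = errors.New("unmount failed: NFS server unreachable")

// fakeNode models only the state the scenario needs.
type fakeNode struct {
	nfsUp      bool // whether the NFS server is reachable
	terminated bool // whether the old static pod finished terminating
}

// syncTerminatedPod stands in for the kubelet step of the same name:
// it fails while the volume cannot be unmounted.
func (n *fakeNode) syncTerminatedPod() error {
	if !n.nfsUp {
		return errUnmount
	}
	n.terminated = true
	return nil
}

// addStaticPod (hypothetical) refuses to start the new pod until the
// previous pod with the same config has terminated.
func (n *fakeNode) addStaticPod() error {
	if !n.terminated {
		return errors.New("previous static pod is still terminating")
	}
	return nil
}

// retry calls fn up to attempts times, sleeping interval between tries.
func retry(attempts int, interval time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	n := &fakeNode{} // steps 1-2: pod started, then deleted; NFS is down
	fmt.Println("terminate with NFS down:", n.syncTerminatedPod()) // step 3
	n.nfsUp = true                                                 // NFS server restarted
	if err := retry(5, 10*time.Millisecond, n.syncTerminatedPod); err != nil {
		fmt.Println("termination never succeeded:", err)
		return
	}
	fmt.Println("re-add after termination:", n.addStaticPod()) // steps 5-6
}
```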
xref:
1. https://github.com/kubernetes/kubernetes/pull/113145#issuecomment-1289587988
2. https://github.com/kubernetes/kubernetes/pull/113065
3. https://github.com/kubernetes/kubernetes/pull/113093
Signed-off-by: David Porter <david@porter.me>
Add a node e2e to verify that if a static pod is terminated while the
container runtime or CRI returns an error, the pod is eventually
terminated successfully.
This test serves as a regression test for k8s.io/issue/113145, which
fixes an issue where force-deleted pods may not be terminated if the
container runtime fails during `syncTerminatingPod`.
To test this behavior, start a static pod, stop the container runtime,
and later start the container runtime. The static pod is expected to
eventually terminate successfully.
To start and stop the container runtime, we need to find the container
runtime's systemd unit name. Introduce a util function,
`findContainerRuntimeServiceName`, which gets the pid of the container
runtime from the existing `ContainerRuntimeProcessName` flag passed into
node e2e and uses the systemd dbus `GetUnitNameByPID` function to convert
that pid to a unit name. Using the unit name, introduce helper functions
to start and stop the container runtime.
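A rough sketch of the shape of these helpers, under stated assumptions:
the `systemd` interface below is hypothetical and only mirrors the
`GetUnitNameByPID` call named above. Its `StartUnit`/`StopUnit` methods
and their signatures are simplified placeholders rather than a real
client API, and pid discovery from the process name is omitted. A fake
implementation keeps the example runnable without a systemd host.

```go
package main

import (
	"errors"
	"fmt"
)

// systemd is a hypothetical, minimal view of a systemd D-Bus client.
type systemd interface {
	GetUnitNameByPID(pid uint32) (string, error)
	StartUnit(name string) error
	StopUnit(name string) error
}

// findContainerRuntimeServiceName resolves the systemd unit that owns
// the container runtime process with the given pid.
func findContainerRuntimeServiceName(sd systemd, pid uint32) (string, error) {
	unit, err := sd.GetUnitNameByPID(pid)
	if err != nil {
		return "", fmt.Errorf("resolving unit for pid %d: %w", pid, err)
	}
	return unit, nil
}

// stopContainerRuntime and startContainerRuntime are thin helpers
// keyed by the unit name found above.
func stopContainerRuntime(sd systemd, unit string) error  { return sd.StopUnit(unit) }
func startContainerRuntime(sd systemd, unit string) error { return sd.StartUnit(unit) }

// fakeSystemd is an in-memory stand-in used only for this example.
type fakeSystemd struct {
	units   map[uint32]string
	running map[string]bool
}

func (f *fakeSystemd) GetUnitNameByPID(pid uint32) (string, error) {
	u, ok := f.units[pid]
	if !ok {
		return "", errors.New("no unit owns this pid")
	}
	return u, nil
}
func (f *fakeSystemd) StartUnit(name string) error { f.running[name] = true; return nil }
func (f *fakeSystemd) StopUnit(name string) error  { f.running[name] = false; return nil }

func main() {
	sd := &fakeSystemd{
		units:   map[uint32]string{4242: "containerd.service"},
		running: map[string]bool{"containerd.service": true},
	}
	unit, err := findContainerRuntimeServiceName(sd, 4242)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	_ = stopContainerRuntime(sd, unit)
	fmt.Println(unit, "running:", sd.running[unit])
	_ = startContainerRuntime(sd, unit)
	fmt.Println(unit, "running:", sd.running[unit])
}
```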
Signed-off-by: David Porter <david@porter.me>
Currently, when a user applies a List type resource, validation makes an
unnecessary request to the customresourcedefinitions endpoint. This happens
because validation cannot find the List type in the OpenAPI schema and
therefore treats it as a CRD.
This PR makes validation return unsupported when a List type is applied.
In terms of functionality, there is no bug, because `schemaValidation`
eventually detects the List and executes the `validateList` function.
The only issue is the unnecessary CRD endpoint request.
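The effect of the change can be illustrated with a simplified model. This
is not the kubectl validation code: the `validator` type, `lookupSchema`,
the error values, and the check `kind == "List"` are assumptions made for
the sketch. It only shows how returning unsupported early avoids the CRD
lookup while list items are still validated.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errUnsupported = errors.New("unsupported type")
	errNotInSchema = errors.New("kind not found in OpenAPI schema or CRDs")
)

// validator is a hypothetical stand-in for schema-based validation.
type validator struct {
	openAPIKinds map[string]bool // kinds present in the OpenAPI schema
	crdRequests  int             // simulated CRD endpoint requests
}

// lookupSchema returns errUnsupported for List before any CRD fallback,
// so no request is made for a kind that can never be a CRD here.
func (v *validator) lookupSchema(kind string) error {
	if kind == "List" {
		return errUnsupported
	}
	if v.openAPIKinds[kind] {
		return nil
	}
	v.crdRequests++ // fallback: ask the customresourcedefinitions endpoint
	return errNotInSchema
}

// validateList validates each item of a List individually.
func (v *validator) validateList(itemKinds []string) error {
	for _, k := range itemKinds {
		if err := v.lookupSchema(k); err != nil {
			return fmt.Errorf("item of kind %q: %w", k, err)
		}
	}
	return nil
}

// validate dispatches Lists to validateList instead of failing on them.
func (v *validator) validate(kind string, itemKinds []string) error {
	err := v.lookupSchema(kind)
	if errors.Is(err, errUnsupported) {
		return v.validateList(itemKinds)
	}
	return err
}

func main() {
	v := &validator{openAPIKinds: map[string]bool{"Pod": true, "ConfigMap": true}}
	err := v.validate("List", []string{"Pod", "ConfigMap"})
	fmt.Println("error:", err, "CRD requests:", v.crdRequests)
}
```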