Fixes an issue that occurs when multiple ClusterCIDR objects have the
same nodeSelector values: the order of the requirements in the
nodeSelector is not preserved when the nodeSelector is marshalled and
converted to a string.
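The ordering problem described above can be illustrated with a minimal sketch. It is not the code from this change: `Requirement` and `selectorKey` are hypothetical stand-ins, and sorting before building the string is just one possible way to get a key that does not depend on requirement order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Requirement is a hypothetical, simplified stand-in for a node selector
// requirement (key, operator, values). It is not the Kubernetes API type.
type Requirement struct {
	Key      string
	Operator string
	Values   []string
}

// selectorKey returns a string that is identical for any ordering of the
// requirements and of the values inside each requirement.
func selectorKey(reqs []Requirement) string {
	parts := make([]string, 0, len(reqs))
	for _, r := range reqs {
		// Copy before sorting so the caller's slice is left untouched.
		values := append([]string(nil), r.Values...)
		sort.Strings(values)
		parts = append(parts, fmt.Sprintf("%s %s [%s]", r.Key, r.Operator, strings.Join(values, ",")))
	}
	sort.Strings(parts)
	return strings.Join(parts, ";")
}

func main() {
	a := []Requirement{
		{Key: "zone", Operator: "In", Values: []string{"b", "a"}},
		{Key: "arch", Operator: "In", Values: []string{"amd64"}},
	}
	// Same requirements, listed in the opposite order.
	b := []Requirement{a[1], a[0]}
	fmt.Println(selectorKey(a) == selectorKey(b))
}
```

With a key built this way, two objects whose selectors differ only in requirement order map to the same string.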
Adds integration tests for the following scenarios with
MultiCIDRRangeAllocator enabled:
- ClusterCIDR is released when an associated node is deleted.
- ClusterCIDR is deleted while a node is associated: validate the
finalizer behavior and make sure that the deleted ClusterCIDR is cleaned
up after the associated node is deleted.
- ClusterCIDR marked as terminating due to deletion must not be used for
allocating PodCIDRs to new nodes.
- Tie break behavior when multiple ClusterCIDRs are eligible to
allocate PodCIDRs to a node.
The device plugin test expects that no other pods are running prior to
the test starting. However, it has been observed that in some cases
some resources may still be around from previous tests. This is because
the deletion of resources from other tests is handled by deleting that
test's framework's namespace which is done asynchronously without
waiting for the other test's namespace to be deleted.
As a result, when the node e2e device plugin test starts, there may still
be other pods in the process of terminating. To work around this, add a retry
to the device plugin test to account for the time it takes to delete the
resources from the prior test.
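The retry described above could look roughly like the following sketch. It is an illustration only: `waitForNoPods` and the `listPods` callback are hypothetical names, and the real test uses the e2e framework's own polling helpers rather than this loop.

```go
package main

import (
	"fmt"
	"time"
)

// waitForNoPods polls listPods until it reports zero pods or the timeout
// expires. listPods is a hypothetical callback standing in for a real
// "list pods on this node" call.
func waitForNoPods(listPods func() (int, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		n, err := listPods()
		if err != nil {
			return fmt.Errorf("listing pods: %w", err)
		}
		if n == 0 {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out with %d pods still present", n)
		}
		time.Sleep(interval)
	}
}

func main() {
	remaining := 3 // pretend three pods from a previous test are terminating
	listPods := func() (int, error) {
		n := remaining
		if remaining > 0 {
			remaining--
		}
		return n, nil
	}
	err := waitForNoPods(listPods, 10*time.Millisecond, time.Second)
	fmt.Println("error:", err)
}
```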
Signed-off-by: David Porter <david@porter.me>
In the `NodeSupportsPreconfiguredRuntimeClassHandler`, update the check
for the runtime handler to return a failure if the
`/etc/containerd/config.toml` or `/etc/crio/crio.conf` config files do
not exist. If an error is returned, then the underlying test will be
skipped.
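As a rough sketch of the intended behavior (not the actual helper, which runs its check on the node itself), the logic can be expressed as a function that fails when no config file is readable. `checkHandler`, `errNoConfig`, the injectable `readFile` parameter, and the `example-handler` name are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// errNoConfig reports that none of the candidate config files could be read.
var errNoConfig = errors.New("no container runtime config file found")

// checkHandler returns nil only if one of the config files can be read and
// mentions the handler. readFile is injectable so the check can be tried
// without touching a real node; os.ReadFile works for local files.
func checkHandler(readFile func(string) ([]byte, error), paths []string, handler string) error {
	found := false
	for _, p := range paths {
		data, err := readFile(p)
		if err != nil {
			continue // missing or unreadable: try the next candidate
		}
		found = true
		if strings.Contains(string(data), handler) {
			return nil
		}
	}
	if !found {
		return errNoConfig
	}
	return fmt.Errorf("handler %q not configured", handler)
}

func main() {
	paths := []string{"/etc/containerd/config.toml", "/etc/crio/crio.conf"}
	// "example-handler" is a placeholder name, not the real handler.
	if err := checkHandler(os.ReadFile, paths, "example-handler"); err != nil {
		fmt.Println("skipping test:", err)
	}
}
```

A caller would treat any non-nil error as a reason to skip the test rather than fail it.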
Tested manually by starting a kind cluster, moving the containerd config
file, and verifying that the test is skipped:
```
$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
```
```
$ make WHAT="test/e2e/e2e.test"
$ ./_output/bin/e2e.test -kubeconfig /tmp/kubeconfig_kind -ginkgo.focus=".*should run a Pod requesting a RuntimeClass with a configured handler.*" --num-nodes=1 2>&1 -ginkgo.v=1 | tee -i "/tmp/build-log.txt"
[sig-node] RuntimeClass [It] should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler]
test/e2e/common/node/runtimeclass.go:85
[SKIPPED] Skipping test as node does not have E2E runtime class handler preconfigured in container runtime config: command terminated with exit code 1
```
Signed-off-by: David Porter <david@porter.me>
The recently introduced failure handling in ExpectNoError depends on error
wrapping: if an error prefix gets added with `fmt.Errorf("foo: %v", err)`, then
ExpectNoError cannot detect that the root cause is an assertion failure. It
then adds another useless "unexpected error" prefix and does not dump the
additional failure information (currently the backtrace inside the E2E
framework).
Instead of manually deciding on a case-by-case basis where %w is needed, all
error wrapping was updated automatically with:
    sed -i "s/fmt.Errorf\(.*\): '*\(%s\|%v\)'*\",\(.* err)\)/fmt.Errorf\1: %w\",\3/" $(git grep -l 'fmt.Errorf' test/e2e*)
This may be unnecessary in some cases, but it's not wrong.
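The difference between `%v` and `%w` can be seen in a small standalone example. `errAssertion` is a hypothetical sentinel standing in for an assertion failure; the actual detection in ExpectNoError may work differently, but it relies on the same unwrapping mechanism.

```go
package main

import (
	"errors"
	"fmt"
)

// errAssertion is a hypothetical sentinel standing in for an assertion failure.
var errAssertion = errors.New("assertion failed")

// wrapV adds a prefix with %v, flattening the cause into plain text.
func wrapV(err error) error { return fmt.Errorf("foo: %v", err) }

// wrapW adds the same prefix with %w, keeping the cause in the error chain.
func wrapW(err error) error { return fmt.Errorf("foo: %w", err) }

// isAssertion reports whether the root cause can still be detected.
func isAssertion(err error) bool { return errors.Is(err, errAssertion) }

func main() {
	fmt.Println(isAssertion(wrapV(errAssertion))) // false: the chain is lost
	fmt.Println(isAssertion(wrapW(errAssertion))) // true: the chain is kept
}
```

Both variants produce the same message text, which is why switching to `%w` is harmless even where nobody unwraps the error.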
Instead of pod responses being printed to the log each time polling fails, we
get a consolidated failure message with all unexpected pod responses if (and
only if) the check times out or a progress report gets produced.
This renames PodsResponding to WaitForPodsResponding for the sake of
consistency and adds a timeout parameter. That is necessary because some other
users of NewProxyResponseChecker used a much lower timeout (2min vs. 15min).
Besides simplifying some code, it also makes it easier to rewrite
ProxyResponseChecker because it only gets used in WaitForPodsResponding.
WaitForPodToDisappear was always called such that it listed all pods, which
made it less efficient than trying to get just the one pod it was checking for.
Being able to customize the poll interval wasn't useful in practice, so
it can be replaced with WaitForPodNotFoundInNamespace.
Currently, some unit tests fail on Windows for various reasons:
- On Windows, consecutive time.Now() calls may return the same timestamp, which would cause
the TestFreeSpaceRemoveByLeastRecentlyUsed test to flake.
- Tests in kuberuntime_container_windows_test.go fail on Nodes that have fewer than 3 CPUs,
expecting the CPU max set to be more than 100% of available CPUs, which is not possible.
- Calls in summary_windows_test.go are missing context.
- filterTerminatedContainerInfoAndAssembleByPodCgroupKey will filter and group container
information by the Pod cgroup key, if it exists. However, we don't have cgroups on Windows,
thus we can't make the same assertions.
The test validates the following endpoints:
- deleteApiregistrationV1CollectionAPIService
- patchApiregistrationV1APIServiceStatus
- replaceApiregistrationV1APIService
- replaceApiregistrationV1APIServiceStatus
WaitForPods is now a generic function which lists pods and then checks the pods
that it found against some provided condition. A parameter determines how many
pods must be found and must match the condition for the check to succeed.
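The list-then-check pattern can be sketched as follows. This is a simplified illustration, not the framework's implementation: `Pod`, `checkPods`, and the single `minPods` parameter are assumptions, and the polling and timeout handling of the real function are left out.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for a pod object.
type Pod struct {
	Name  string
	Ready bool
}

// checkPods lists pods and succeeds only if at least minPods were found
// and at least minPods of them satisfy cond.
func checkPods(list func() ([]Pod, error), cond func(Pod) bool, minPods int) error {
	pods, err := list()
	if err != nil {
		return fmt.Errorf("listing pods: %w", err)
	}
	if len(pods) < minPods {
		return fmt.Errorf("found %d pods, need at least %d", len(pods), minPods)
	}
	matching := 0
	for _, p := range pods {
		if cond(p) {
			matching++
		}
	}
	if matching < minPods {
		return fmt.Errorf("%d of %d pods match, need at least %d", matching, len(pods), minPods)
	}
	return nil
}

func main() {
	list := func() ([]Pod, error) {
		return []Pod{{Name: "a", Ready: true}, {Name: "b", Ready: false}}, nil
	}
	ready := func(p Pod) bool { return p.Ready }
	fmt.Println(checkPods(list, ready, 1))
	fmt.Println(checkPods(list, ready, 2))
}
```

A waiting variant would call such a check repeatedly until it returns nil or a timeout expires.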