The node-controller has two reconciliation methods:
- a workqueue with workers, used during bootstrap, which processes
nodes until the cloud provider taint is removed
- a periodic loop that polls the cloud provider for instance metadata
to update the node addresses, since nodes can update their addresses
at any time during their lifecycle.
This follows up on the parallelization of the node-controller, which
previously increased the number of workers that handle the bootstrap.
It parallelizes the periodic loop based on the configured number of
workers, and also uses the informer lister instead of doing a new
List against the apiserver.
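For illustration, a minimal sketch of this fan-out using client-go's
workqueue helper; the nodeLister, workers, and updateNodeAddress names
are assumptions here, not the controller's actual code:
```go
// Sketch of the parallelized periodic sync, assuming a client-go
// informer lister and a hypothetical updateNodeAddress helper.
package nodecontroller

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	listersv1 "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/util/workqueue"
)

func updateNodeAddresses(ctx context.Context, nodeLister listersv1.NodeLister, workers int) error {
	// List from the informer cache instead of issuing a new List
	// against the apiserver.
	nodes, err := nodeLister.List(labels.Everything())
	if err != nil {
		return err
	}
	// Fan the per-node work out over the configured number of workers.
	workqueue.ParallelizeUntil(ctx, workers, len(nodes), func(i int) {
		updateNodeAddress(ctx, nodes[i])
	})
	return nil
}

func updateNodeAddress(ctx context.Context, node *v1.Node) {
	// Poll the cloud provider for instance metadata and update the
	// node's addresses (elided in this sketch).
}
```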
Added a unit test that can be used to evaluate the performance
improvement with different worker values:
```
=== RUN TestUpdateNodeStatus/single_thread
node_controller_test.go:2537: 1 workers: processed 100 nodes in 1.055595262s
=== RUN TestUpdateNodeStatus/5_workers
node_controller_test.go:2537: 5 workers: processed 100 nodes in 216.990972ms
=== RUN TestUpdateNodeStatus/10_workers
node_controller_test.go:2537: 10 workers: processed 100 nodes in 112.422435ms
=== RUN TestUpdateNodeStatus/30_workers
node_controller_test.go:2537: 30 workers: processed 100 nodes in 46.243204ms
```
Change-Id: I38870993431d38fc81a2dc6a713321cfa2e40d85
Image versions are already explicitly set in our manifests
configuration, so tests should not set these values; this ensures
we're using the same versions across the board.
Verb is the commonly used label when referring to HTTP verbs.
rest_client_requests_total is the only metric in the rest package using
`method` instead of `verb`, which makes it inconsistent and confusing.
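For illustration, a sketch of the metric with the renamed label, shown
with the plain Prometheus client (the real definition lives behind
Kubernetes' component-base metrics wrappers, so the options here are
assumptions):
```go
// Illustrative sketch only: rest_client_requests_total partitioned by
// "verb" rather than "method", using the plain Prometheus client.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var requestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "rest_client_requests_total",
		Help: "Number of HTTP requests, partitioned by status code, verb, and host.",
	},
	// "verb" replaces the former "method" label, matching the other
	// metrics in the rest package.
	[]string{"code", "verb", "host"},
)

func init() {
	prometheus.MustRegister(requestsTotal)
}
```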
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* Make policy decision object public
Signed-off-by: Max Smythe <smythe@google.com>
* Separate version conversion from validation
Signed-off-by: Max Smythe <smythe@google.com>
* Address review comments
Signed-off-by: Max Smythe <smythe@google.com>
* Fix variable name
Signed-off-by: Max Smythe <smythe@google.com>
---------
Signed-off-by: Max Smythe <smythe@google.com>
Fixes the issue caused when multiple ClusterCIDR objects have the same
nodeSelector values: the order of the requirements in the nodeSelector
is not preserved when the nodeSelector is marshalled and converted to a
string.
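One way to make the key stable is to sort the requirements before
marshalling, as in this sketch (selectorKey is a hypothetical helper,
not necessarily the allocator's exact fix):
```go
// Sketch: derive a stable string key from a NodeSelector by sorting
// its requirements before marshalling. Hypothetical helper, not the
// exact code used by the MultiCIDRRangeAllocator.
package allocator

import (
	"encoding/json"
	"sort"

	v1 "k8s.io/api/core/v1"
)

func selectorKey(sel *v1.NodeSelector) (string, error) {
	s := sel.DeepCopy()
	for _, term := range s.NodeSelectorTerms {
		// Sort requirements by key so the marshalled form does not
		// depend on the order in which they were written.
		sort.Slice(term.MatchExpressions, func(i, j int) bool {
			return term.MatchExpressions[i].Key < term.MatchExpressions[j].Key
		})
	}
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```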
Adds integration tests for the following scenarios with
MultiCIDRRangeAllocator enabled:
- ClusterCIDR is released when an associated node is deleted.
- ClusterCIDR deletion while a node is still associated: validate the
finalizer behavior and make sure that the deleted ClusterCIDR is
cleaned up after the associated node is deleted.
- ClusterCIDR marked as terminating due to deletion must not be used for
allocating PodCIDRs to new nodes.
- Tie-break behavior when multiple ClusterCIDRs are eligible to
allocate PodCIDRs to a node.
The device plugin test expects that no other pods are running prior to
the test starting. However, it has been observed that in some cases
some resources may still be around from previous tests. This is because
the deletion of resources from other tests is handled by deleting that
test's framework's namespace which is done asynchronously without
waiting for the other test's namespace to be deleted.
As a result, when the node e2e device plugin test starts, there may
still be other pods in the process of terminating. To work around
this, add a retry to the device plugin test to account for the time it
takes to delete the resources from the prior test.
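The retry can be pictured with apimachinery's wait helpers; this is a
sketch, and the namespace argument, interval, and timeout are
assumptions rather than the test's actual values:
```go
// Sketch: wait for pods left over from earlier tests to finish
// terminating before the device plugin test proceeds. The interval,
// timeout, and namespace handling are illustrative assumptions.
package e2enode

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func waitForLeftoverPods(ctx context.Context, c kubernetes.Interface, ns string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
			if err != nil {
				return false, err
			}
			// Keep retrying until no pods from prior tests remain.
			return len(pods.Items) == 0, nil
		})
}
```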
Signed-off-by: David Porter <david@porter.me>
In the `NodeSupportsPreconfiguredRuntimeClassHandler`, update the check
for the runtime handler to return a failure if the
`/etc/containerd/config.toml` or `/etc/crio/crio.conf` config files do
not exist. If an error is returned, then the underlying test will be
skipped.
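A minimal sketch of such a check, assuming a hypothetical
nodeHasRuntimeConfig helper running where the config files are visible
(the real check execs into the node, so the details differ):
```go
// Sketch: fail the runtime handler check when neither container
// runtime config file exists; the returned error lets the caller
// skip the test. Hypothetical helper, not the exact e2e code.
package e2enode

import (
	"fmt"
	"os"
)

func nodeHasRuntimeConfig() error {
	for _, path := range []string{
		"/etc/containerd/config.toml",
		"/etc/crio/crio.conf",
	} {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
	}
	return fmt.Errorf("no containerd or CRI-O config file found on node")
}
```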
Tested manually by starting a kind cluster, moving the containerd
config file, and verifying that the test is skipped:
```
$ docker exec -it kind-worker /bin/bash
root@kind-worker:/# mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
```
```
$ make WHAT="test/e2e/e2e.test"
$ ./_output/bin/e2e.test -kubeconfig /tmp/kubeconfig_kind -ginkgo.focus=".*should run a Pod requesting a RuntimeClass with a configured handler.*" --num-nodes=1 2>&1 -ginkgo.v=1 | tee -i "/tmp/build-log.txt"
[sig-node] RuntimeClass [It] should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler]
test/e2e/common/node/runtimeclass.go:85
[SKIPPED] Skipping test as node does not have E2E runtime class handler preconfigured in container runtime config: command terminated with exit code 1
```
Signed-off-by: David Porter <david@porter.me>