Automatic merge from submit-queue (batch tested with PRs 46926, 48468)
Fix typo in cluster size autoscaling tests selector
This caused tests not to be run automatically
Automatic merge from submit-queue (batch tested with PRs 48168, 48199)
Fix some flakes in autoscaler e2e on gke
This PR should fix some of the flakes we found in e2e runs, while testing for 1.7 release:
- if one of the nodes is unschedulable in set up (causing set up to fail) we used to wait for wrong number of nodes in clean-up, adding unnecessary 20 minute wait to failing test
- we did not check for errors when creating RC in test, leading to tests failing later in hard to debug way (added retry loop and explicit test failure)
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)
Add e2e tests for CA scale up when pending pod requests volume
Test verifying pending pods with PVC don't interfere with scale up, issue: kubernetes/autoscaler#22
Automatic merge from submit-queue
Further reduce cluster-autoscaler e2e flakiness
Ref: https://github.com/kubernetes/autoscaler/issues/89
Add pdbs for additional kube-system pod, move adding
pdbs to separate function, as it will need to be reused
in new tests we're working on (ex. scale to 0).
Automatic merge from submit-queue
Add new e2e test for cluster size autoscaler (evicting system pods)
This test verifies that cluster autoscaler drains nodes with system pods running if they have a PDB.
Automatic merge from submit-queue (batch tested with PRs 43844, 44284)
Add a retry to cluster-autoscaler e2e
This should fix https://github.com/kubernetes/kubernetes/issues/44268.
The flake was caused by following sequence of events:
1. Cluster was at minimum size (3), some node was unneeded for a while.
2. Setup for some test (scale-down, failure) would increase node group size (to 5) and wait for new nodes to come up.
3. As soon as new node come up (cluster size 4) CA would scale-down the old unneeded node (setting node group size to 4).
4. Node group would not reach size 5 (as the target was now 4) and the test would timeout and fail.
This PR makes the setup monitor re-set the target node group size if the above scenario happens.