Automatic merge from submit-queue
New e2e node test suite with memcg turned on
The flag --experimental-kernal-memcg-notification was initially added to allow disabling an eviction feature which used memcg notifications to make memory evictions more reactive.
As documented in #37853, memcg notifications increased the likelihood of encountering soft lockups, especially on CVM.
This feature would valuable to turn on, at least for GCI, since soft lockup issues were less prevalent on GCI and appeared (at the time) to be unrelated to memcg notifications.
In the interest of caution, I would like to monitor serial tests on GCI with --experimental-kernal-memcg-notification=true.
cc @vishh @Random-Liu @dchen1107 @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42664, 42687)
[Fix Flaky Tests] E2e Node Flaky test suite runs serially
The [e2e Node Flaky Test Suite](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e&width=20) has been failing with strange errors.
This is because the tests in that suite are meant to be run serially, but are running in parallel, since that was left out of the config. This PR fixes this by changing the Flaky test suite to serial
cc @Random-Liu
The design of DaemonSet requires a relist before each phase (manage,
update, status) because it does not short-circuit and requeue for each
action triggered.
The DaemonSet Listers still use selectors, because this is the
behavior expected by callers. This clarifies the meaning of the
returned list. Some callers may need to switch to using
GetControllerOf() instead, but that is a separate, case-by-case issue.
Automatic merge from submit-queue
[Bug Fix] Garbage Collect Node e2e Failing
This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased time for deletion. See that PR for analysis on that.
There may be other problems with this test, but those are difficult to pick apart from hitting this low timeout.
This PR changes the Garbage Collector test to use the default timeout. This should allow us to discern if there are any actual bugs to fix.
cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
Kubernetes initiates "graceful shutdown" by sending SIGTERM to pid 1.
The way the existing startup scripts worked, this signal arrived at
the shell wrapper, not elasticsearch, and the shell wrapper exited,
killing the container immediately.
Before this change:
1 ? Ss 0:00 /bin/sh -c /run.sh
6 ? S 0:00 /bin/bash /run.sh
13 ? S 0:00 \_ /bin/su -c /elasticsearch/bin/elasticsearch elasticsearch
14 ? Ss 0:00 \_ sh -c /elasticsearch/bin/elasticsearch
15 ? Sl 19:18 \_ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
After this change:
1 ? Ssl 0:29 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
Automatic merge from submit-queue
Create "framework" per upgrade test
There were already a few tests just using the default framework
namespace instead of creating a new one. Also there are several
testing libraries that use the default framework's default namespace
as well. It's just easier this way.
Automatic merge from submit-queue
Adding note saying client-go examples only work with the code in the same branch
Adding this note because the problem has confused many users.
It's doc change and only affects client-go examples, so adding the milestone.
Automatic merge from submit-queue (batch tested with PRs 42637, 42648)
Support multiple --feature-gates flags in the command line
Fixes the issue in https://github.com/kubernetes/kubernetes/pull/42647.
Before this change the whole set of gates was replaced with new values. Now values are overridden one by one.
Automatic merge from submit-queue (batch tested with PRs 42637, 42648)
Update examples with storage.k8s.io/v1
This should squeeze into 1.6. All changes are in `examples` directory and reflect our new API.
Release note:
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue
kubeadm: Make kube-apiserver's liveness probe match its bindport.
The `kube-apiserver` liveness probe port had previously been hardcoded, so if you used `--apiserver-bind-port` to override the default port (6443), then the health check for the pod would quickly fail and kubelet would continuously kill the apiserver.
**Which issue this PR fixes**: fixes https://github.com/kubernetes/kubeadm/issues/196
**Release note**:
```release-note
kubeadm: fix kube-apiserver liveness probe port when --apiserver-bind-port given
```
* Properly return ImageNotFoundError
* Support inject "Images" or "ImageInspects" and keep both in sync.
* Remove the FakeDockerPuller and let FakeDockerClient subsumes its
functinality. This reduces the overhead to maintain both objects.
* Various small fixes and refactoring of the testing utils.
Automatic merge from submit-queue
deflake TestPatch by waiting for cache
Fixes#39471
Rather than retry on conflicts for an unknown number of times, we have the resource version after the previous patch and we can use that to wait for a GET response that is at least as current as that resourceVersion.
@kubernetes/sig-api-machinery-pr-reviews @liggitt @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).
#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.
We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).
cc @timothysc
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
make all controllers obey the disable flags
Fixes https://github.com/kubernetes/kubernetes/issues/42592
Some controllers weren't disable-able. This fixes them so they obey our flags.
@ncdc
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Remove everything that is not new from batch/v2alpha1
Fixes#37166.
@lavalamp you've asked for it
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal
@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal