Automatic merge from submit-queue (batch tested with PRs 42664, 42687)
[Fix Flaky Tests] E2e Node Flaky test suite runs serially
The [e2e Node Flaky Test Suite](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e&width=20) has been failing with strange errors.
This is because the tests in that suite are meant to be run serially, but are running in parallel, since that was left out of the config. This PR fixes this by changing the Flaky test suite to serial
cc @Random-Liu
Automatic merge from submit-queue
[Bug Fix] Garbage Collect Node e2e Failing
This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased time for deletion. See that PR for analysis on that.
There may be other problems with this test, but those are difficult to pick apart from hitting this low timeout.
This PR changes the Garbage Collector test to use the default timeout. This should allow us to discern if there are any actual bugs to fix.
cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
Automatic merge from submit-queue
Create "framework" per upgrade test
There were already a few tests just using the default framework
namespace instead of creating a new one. Also there are several
testing libraries that use the default framework's default namespace
as well. It's just easier this way.
Automatic merge from submit-queue
Adding note saying client-go examples only work with the code in the same branch
Adding this note because the problem has confused many users.
It's doc change and only affects client-go examples, so adding the milestone.
Automatic merge from submit-queue (batch tested with PRs 42637, 42648)
Support multiple --feature-gates flags in the command line
Fixes the issue in https://github.com/kubernetes/kubernetes/pull/42647.
Before this change the whole set of gates was replaced with new values. Now values are overridden one by one.
Automatic merge from submit-queue (batch tested with PRs 42637, 42648)
Update examples with storage.k8s.io/v1
This should squeeze into 1.6. All changes are in `examples` directory and reflect our new API.
Release note:
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue
kubeadm: Make kube-apiserver's liveness probe match its bindport.
The `kube-apiserver` liveness probe port had previously been hardcoded, so if you used `--apiserver-bind-port` to override the default port (6443), then the health check for the pod would quickly fail and kubelet would continuously kill the apiserver.
**Which issue this PR fixes**: fixes https://github.com/kubernetes/kubeadm/issues/196
**Release note**:
```release-note
kubeadm: fix kube-apiserver liveness probe port when --apiserver-bind-port given
```
* Properly return ImageNotFoundError
* Support inject "Images" or "ImageInspects" and keep both in sync.
* Remove the FakeDockerPuller and let FakeDockerClient subsumes its
functinality. This reduces the overhead to maintain both objects.
* Various small fixes and refactoring of the testing utils.
Automatic merge from submit-queue
deflake TestPatch by waiting for cache
Fixes#39471
Rather than retry on conflicts for an unknown number of times, we have the resource version after the previous patch and we can use that to wait for a GET response that is at least as current as that resourceVersion.
@kubernetes/sig-api-machinery-pr-reviews @liggitt @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).
#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.
We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).
cc @timothysc
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
make all controllers obey the disable flags
Fixes https://github.com/kubernetes/kubernetes/issues/42592
Some controllers weren't disable-able. This fixes them so they obey our flags.
@ncdc
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)
Remove everything that is not new from batch/v2alpha1
Fixes#37166.
@lavalamp you've asked for it
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal
@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)
[Federation][e2e] Add wait for namespaces to appear in federated cluster
Some federation e2e tests are failing because the namespace (created for every test case) does not seem to appear in one (or more) of federated clusters.
This may be a timing issue, so introducing a wait until the namespace is created in federated clusters.
cc @madhusudancs, @nikhiljindal, @kubernetes/sig-federation-bugs
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)
RC/RS: Fix ignoring inactive Pods.
**What this PR does / why we need it**:
Fix typo that broke ignoring of inactive Pods in RC, and add unit test for that case.
**Which issue this PR fixes**:
Fixes#37479
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)
provide active pods to cgroup cleanup
**What this PR does / why we need it**:
This PR provides more information for when a pod cgroup is considered orphaned. The running pods cache is based on the runtime's view of the world. we create pod cgroups before containers so we should just be looking at activePods.
**Which issue this PR fixes**
Fixes https://github.com/kubernetes/kubernetes/issues/42431
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)
Preserve custom etcd prefix compatibility for etcd3
Fixes#42505
```release-note
restored normalization of custom `--etcd-prefix` when `--storage-backend` is set to etcd3
```
Automatic merge from submit-queue
cgroup names created by kubelet should be lowercased
**What this PR does / why we need it**:
This PR modifies the kubelet to create cgroupfs names that are lowercased. This better aligns us with the naming convention for cgroups v2 and other cgroup managers in ecosystem (docker, systemd, etc.)
See: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
"2-6-2. Avoid Name Collisions"
**Special notes for your reviewer**:
none
**Release note**:
```release-note
kubelet created cgroups follow lowercase naming conventions
```
It had previously been hardcoded, so if you used --apiserver-bind-port
to override the default port (6443), then the health check for the pod
would quickly fail and kubelet would continuously kill the apiserver.
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Fix resource cleanup in ingress_utils.go within e2e/framework
**What this PR does / why we need it**:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources.
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.
**Which issue this PR fixes**:
Helps #38569 but does not completely close this flake
**Special notes for your reviewer**:
Resources were not being deleted because resource names were being truncated and then their ability to be deleted was determined by the entire cluster id existing in the name. Truncated names also have an extra '0' append to the end of their name (unknown origin). This PR tries to match on a common prefix.
Minor changes were made to improve log readability.
**Testing this PR**:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run. This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete the forwarding-rule but not anymore L7 resources. This second forwarding rule's name was nearly identical to the first forwarding rule so that the cleanup code will find it.
As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.
```log
...
Mar 5 18:35:53.112: INFO: Monitoring glbc's cleanup of gce resources:
k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420 (forwarding rule)
k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (target-https-proxy)
k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260 (url-map)
k8s-be-32331--5f38ac0e2426f796 (backend-service)
k8s-be-32613--5f38ac0e2426f796 (backend-service)
k8s-be-32331--5f38ac0e2426f796 (http-health-check)
k8s-be-32613--5f38ac0e2426f796 (http-health-check)
k8s-ig--5f38ac0e2426f796 (instance-group)
k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (ssl-certificate)
STEP: Performing final delete of any remaining resources
Mar 5 18:35:54.055: INFO: Deleting forwarding-rules: k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:06.945: INFO: Deleting target-https-proxies: k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
Mar 5 18:36:14.301: INFO: Deleting url-map: k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260
Mar 5 18:36:18.309: INFO: Deleting backed-service: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:22.112: INFO: Deleting backed-service: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:26.192: INFO: Deleting http-health-check: k8s-be-32331--5f38ac0e2426f796
Mar 5 18:36:29.846: INFO: Deleting http-health-check: k8s-be-32613--5f38ac0e2426f796
Mar 5 18:36:33.722: INFO: Deleting instance-group: k8s-ig--5f38ac0e2426f796
Mar 5 18:36:37.762: INFO: Deleting ssl-certificate: k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
STEP: No resources leaked.
Mar 5 18:36:46.441: INFO: Deleting addresses: e2e-tests-ingress-upgrade-0px85-static-ip
Mar 5 18:36:53.902: INFO: L7 controller failed to delete all cloud resources on time. timed out waiting for the condition
...
```
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available."
Reverts kubernetes/kubernetes#41870 for stopping bleeding edge: #42597
cc/ @ConnorDoyle @kubernetes/release-team
Connor if there is a pending pr to fix the issue, please point it out to me. We can close this one, otherwise, I would like to revert the pr first. You can resubmit the fix. Thanks!
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
Support whitespace in command path for gcp auth plugin
```
External command option on gcp client auth plugin supports whitespace in command path.
```
Splitting on whitespace to get cmd+args breaks when the path the executable contains spaces. Resolve by adding a new "cmd-args" field to config to allow the full string of "cmd-path" to be interpreted as path to executable.
This change is backwards compatible with existing behavior.
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)
StatefulSet: Respect ControllerRef
**What this PR does / why we need it**:
This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.
**Which issue this PR fixes**:
Fixes#36859
**Special notes for your reviewer**:
**Release note**:
```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)
Fixed too long name in HPA e2e upgrade test.
Fixed too long name in HPA e2e upgrade test.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)
Add stubDomains and upstreamNameservers configuration to kube-dns
```release-note
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
```