Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Add a timeout to allow replacement pod to become ready
Hopefully fixes https://github.com/kubernetes/kubernetes/issues/37259
```
I0314 04:26:02.562] Mar 14 04:26:02.562: INFO: Pod my-hostname-net-1bgrj still exists
I0314 04:26:22.491] Mar 14 04:26:22.491: INFO: Waiting for pod my-hostname-net-1bgrj to disappear
I0314 04:26:22.496] Mar 14 04:26:22.495: INFO: Pod my-hostname-net-1bgrj no longer exists
I0314 04:26:22.496] STEP: verifying whether the pod from the unreachable node is recreated
I0314 04:26:22.498] Mar 14 04:26:22.498: INFO: Pod name my-hostname-net: Found 3 pods out of 3
I0314 04:26:22.499] STEP: ensuring each pod is running
I0314 04:26:22.499] STEP: trying to dial each unique pod
I0314 04:26:22.579] Mar 14 04:26:22.579: INFO: Controller my-hostname-net: Got expected result from replica 1 [my-hostname-net-5jrdb]: "my-hostname-net-5jrdb", 1 of 3 required successes so far
I0314 04:26:22.642] Mar 14 04:26:22.642: INFO: Controller my-hostname-net: Got expected result from replica 2 [my-hostname-net-mjf3c]: "my-hostname-net-mjf3c", 2 of 3 required successes so far
I0314 04:31:22.645] Mar 14 04:31:22.644: INFO: Controller my-hostname-net: Failed to Get from replica 3 [my-hostname-net-rf46s]: Get https://35.184.87.178/api/v1/namespaces/e2e-tests-network-partition-s5gqt/pods/my-hostname-net-rf46s/proxy/: context deadline exceeded
```
The issue appears to be a race between the pod being "running + ready" and the pod being reachable via the apiserver proxy.
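A minimal sketch of the bounded wait this adds, using the apimachinery `wait` helpers; `dialPod` is a hypothetical stand-in for the test's proxy request, not the e2e framework's actual API:

```go
package e2esketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForPodReachable retries the proxy request until the replacement pod
// answers or the timeout expires, instead of assuming that "running + ready"
// implies the pod is already reachable through the apiserver proxy.
func waitForPodReachable(dialPod func() error, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		if err := dialPod(); err != nil {
			return false, nil // not reachable yet; keep polling
		}
		return true, nil
	})
}
```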
cc @kow3ns @bowei @davidopp
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Move e2e sched event predicates to new file.
**What this PR does / why we need it**:
Small e2e test refactor for scheduler. Moves scheduler event predicates out of opaque_resource.go for reuse elsewhere.
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-scheduling-pr-reviews @timothysc @bsalamat
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)
Update ScaleIO volume plugin default readOnly value
This commit updates the code to set the default value of the `readOnly` attribute to `false`.
**What this PR does / why we need it**:
This PR is a minor fix that updates the default value of the `readOnly` attribute to `false`.
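A tiny sketch of the intended defaulting behavior (illustrative, not the plugin's actual code):

```go
package main

import "fmt"

// readOnlyOrDefault returns the effective readOnly value: when the field is
// left unset (nil), the volume defaults to read-write, i.e. false.
func readOnlyOrDefault(readOnly *bool) bool {
	if readOnly == nil {
		return false
	}
	return *readOnly
}

func main() {
	fmt.Println(readOnlyOrDefault(nil)) // false
}
```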
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43018, 42713, 42819)
Update startup scripts for kube-dns ConfigMap and ServiceAccount
Follow-up PR of #42757. This PR changes all existing startup scripts to support the default kube-dns ConfigMap and ServiceAccount.
@bowei
cc @liggitt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43018, 42713)
kubeadm: Don't drain and remove the current node on kubeadm reset
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
In v1.5, `kubeadm reset` would drain your node and remove it from the cluster if you asked it to, but now in v1.6 we can't do that due to the RBAC rules we have set up.
After conversations with @liggitt, I also agree this functionality was somewhat misplaced (though still very convenient to use), so we're removing it for v1.6.
It's the system administrator's duty to drain and remove nodes from the cluster, not the node's own responsibility.
The current behavior is therefore a bug that needs to be fixed in v1.6.
**Release note**:
```release-note
kubeadm: `kubeadm reset` won't drain and remove the current node anymore
```
@liggitt @deads2k @jbeda @dmmcquay @pires @errordeveloper
Automatic merge from submit-queue (batch tested with PRs 43018, 42713)
Log instead of fail on the GLBC's tendency to leak resources
**What this PR does / why we need it**:
Stops upgrade tests from flaking when the GLBC does not clean up all resources due to a race condition.
**Which issue this PR fixes**: fixes #38569
**Special notes for your reviewer**:
To be reviewed by @mml
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Add e2e test for Deployment controllerRef orphaning and adoption
Follow up #42908
@enisoc @kubernetes/sig-apps-bugs @kargakis
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Initial breakout of scheduling e2es to help assist in assignment and refactoring
**What this PR does / why we need it**:
This PR segregates the scheduling specific e2es to isolate the library which will assist both in refactoring but also auto-assignment of issues.
**Which issue this PR fixes**
xref: https://github.com/kubernetes/kubernetes/issues/42691#issuecomment-285563265
**Special notes for your reviewer**:
All this change does is shuffle code around and quarantine it. Behavioral and other cleanup changes will come in follow-on PRs. As of today, the e2es are a monolith with massive symbol pollution; this first step allows us to segregate the e2es and tease apart the dependency mess.
**Release note**:
```release-note
NONE
```
/cc @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-testing-pr-reviews @marun @skriss
/cc @gmarek - same trick for load + density, etc.
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
Remove 'beta' from default storage class annotation
I forgot to update the default storage class annotation in my storage.k8s.io/v1beta1 -> v1 PRs. Let's fix it before 1.6 is released.
I consider it a bugfix; in #40088 I already updated the release notes to include the non-beta annotation `storageclass.kubernetes.io/is-default-class`.
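For reference, the two annotation names involved, as Go constants (the identifiers are illustrative; the string values are the actual annotations):

```go
package storageutil

// Default-class annotations on a StorageClass: the GA name replaces the
// beta-prefixed one.
const (
	IsDefaultStorageClassAnnotation     = "storageclass.kubernetes.io/is-default-class"
	BetaIsDefaultStorageClassAnnotation = "storageclass.beta.kubernetes.io/is-default-class"
)
```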
```release-note
NONE
```
@kubernetes/sig-storage-pr-reviews
@msau42, please help with merging.
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)
`kubectl taint` update for `NoExecute`.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
Fixes #42774
**Release note**:
```release-note
```
Automatic merge from submit-queue
Add DaemonSet templateGeneration validation and tests, and fix a bunch of validation test errors
For DaemonSet update:
1. Validate that templateGeneration is increased when and only when the template is changed (see the sketch after this list)
1. Validate that templateGeneration is never decreased
1. Added validation tests for templateGeneration
1. Fix a bunch of errors in validate tests
- bogus failures: almost all of the validation error cases were failing on "missing resource version", "name changes", "missing update strategy", or "selector/template labels mismatch", not on the real validation we wanted to test
- some error cases should actually have been success cases
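A sketch of what the new checks enforce, under illustrative names (not the actual validation code):

```go
package validation

import (
	"fmt"

	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// validateTemplateGeneration sketches the new rules: the counter never
// decreases, and it is bumped exactly when the pod template changes.
func validateTemplateGeneration(oldGen, newGen int64, oldTemplate, newTemplate interface{}) error {
	templateChanged := !apiequality.Semantic.DeepEqual(oldTemplate, newTemplate)
	switch {
	case newGen < oldGen:
		return fmt.Errorf("templateGeneration must never be decreased")
	case templateChanged && newGen == oldGen:
		return fmt.Errorf("templateGeneration must be incremented when the template changes")
	case !templateChanged && newGen != oldGen:
		return fmt.Errorf("templateGeneration must not change when the template is unchanged")
	}
	return nil
}
```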
@kargakis @lukaszo @kubernetes/sig-apps-bugs
*I've verified locally that all DaemonSet e2e tests pass with this change.*
Automatic merge from submit-queue (batch tested with PRs 43034, 43066)
Allow StatefulSet controller to PATCH Pods.
**What this PR does / why we need it**:
StatefulSet now needs the PATCH permission on Pods since it calls into ControllerRefManager to adopt and release Pods. This adds the permission, along with the missing e2e test that should have caught this.
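Roughly the shape of the missing permission, sketched with the rbac/v1 types rather than the actual bootstrap-policy helpers:

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

func main() {
	// The controller's pod rule; "patch" is the verb that ControllerRef
	// adoption and release require, and the one that was missing.
	podRule := rbacv1.PolicyRule{
		APIGroups: []string{""}, // core API group
		Resources: []string{"pods"},
		Verbs:     []string{"get", "list", "watch", "create", "delete", "patch"},
	}
	fmt.Printf("%+v\n", podRule)
}
```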
**Which issue this PR fixes**:
**Special notes for your reviewer**:
This is based on #42925.
**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
This commit updates the code to set the default value of the readOnly attribute to false.
It also updates the example docs to add the full list of supported plugin attributes and documentation.
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)
pkg/util/flock: Fix the flock so it actually locks.
With this PR, the second call to `Acquire()` will block until the lock is released (i.e. the holding process exits).
Also removed the in-memory mutex from the previous code: since we don't need `Release()` here, there is no need to save and protect the local fd.
Fixes #42929.
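A minimal sketch of the blocking acquire, assuming a POSIX flock via golang.org/x/sys/unix (details of the real package may differ):

```go
package flock

import "golang.org/x/sys/unix"

// Acquire opens (creating if needed) the lock file and takes an exclusive
// flock on it. Without LOCK_NB this blocks until any previous holder exits;
// the fd is deliberately kept open, and never released, for the lifetime of
// the process, so no mutex around it is needed.
func Acquire(path string) error {
	fd, err := unix.Open(path, unix.O_CREAT|unix.O_RDWR, 0600)
	if err != nil {
		return err
	}
	return unix.Flock(fd, unix.LOCK_EX)
}
```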
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)
[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs
Fixes #42412
**Background**
Support for multiple GPUs is an experimental feature in v1.6.
Container restarts were handled incorrectly, which resulted in GPUs being stranded.
The kubelet was incorrectly using the runtime cache to track running pods, which can result in race conditions (as it did in other parts of the kubelet). This can result in the same GPU being assigned to multiple pods.
**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.
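A sketch of the allocation idea with illustrative types (not kubelet's actual GPU manager): key assignments by pod UID and container name, so a restarted container gets its existing devices back instead of stranding them:

```go
package gpusketch

import (
	"fmt"
	"sync"
)

type gpuManager struct {
	mu        sync.Mutex
	free      map[string]bool     // device path -> available
	allocated map[string][]string // "podUID/container" -> devices already handed out
}

// AllocateGPU is idempotent per container: on a restart it returns the
// pre-allocated devices rather than (incorrectly) allocating new ones.
func (m *gpuManager) AllocateGPU(podUID, container string, n int) ([]string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	key := podUID + "/" + container
	if devs, ok := m.allocated[key]; ok {
		return devs, nil // container restart: reuse existing assignment
	}
	devs := make([]string, 0, n)
	for d := range m.free {
		if len(devs) == n {
			break
		}
		devs = append(devs, d)
	}
	if len(devs) < n {
		return nil, fmt.Errorf("requested %d GPUs, only %d free", n, len(devs))
	}
	for _, d := range devs {
		delete(m.free, d)
	}
	m.allocated[key] = devs
	return devs, nil
}
```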
**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.
**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.
cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews
Replaces #42560
Automatic merge from submit-queue
Remove dgoodwin and add dmmcquay to kubeadm reviewers
@dgoodwin says he needs to work on other stuff right now. @dmmcquay says he wants to help with reviews.
Automatic merge from submit-queue (batch tested with PRs 43022, 43078)
Use constant time compare for bootstrap tokens
This fixes a subtle security issue in a new feature (bootstrap tokens) and should go in for 1.6.
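The gist, as a sketch: compare tokens with crypto/subtle so response timing doesn't reveal how many bytes of a guess matched.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// tokensMatch compares the stored and presented tokens in constant time.
// A naive == or bytes.Equal can return early on the first mismatched byte,
// letting an attacker recover the token byte by byte from latency.
func tokensMatch(stored, presented string) bool {
	return subtle.ConstantTimeCompare([]byte(stored), []byte(presented)) == 1
}

func main() {
	fmt.Println(tokensMatch("abcdef.0123456789abcdef", "wrong")) // false
}
```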
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43022, 43078)
Dumb typo in kubeadm instructions
I typo'd chown as chmod in kubeadm instructions. Ugh.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42802, 42927, 42669, 42988, 43012)
[Federation] Fix flakey ingress unit test
The unit test for the ingress controller was previously adding a cluster twice, which resulted in a cluster being deleted and added back. The deletion was racing the controller shutdown to close
informer channels, sometimes resulting in closing an already-closed channel. This change ensures that the federated informer clears its map of informers when `Stop()` is called to guard against a double close, and fixes the test to no longer add the cluster twice.
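A sketch of the guard with illustrative names (not the federated informer's real structure):

```go
package fedsketch

import "sync"

type informerSet struct {
	mu    sync.Mutex
	stops map[string]chan struct{} // cluster name -> informer stop channel
}

// Stop closes each informer's stop channel exactly once; deleting the map
// entry under the lock means a racing cluster deletion finds nothing left
// to close a second time.
func (s *informerSet) Stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for name, ch := range s.stops {
		close(ch)
		delete(s.stops, name)
	}
}
```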
Targets #43009
cc: @csbell @kubernetes/sig-federation-bugs
Automatic merge from submit-queue (batch tested with PRs 42802, 42927, 42669, 42988, 43012)
Update Cluster Autoscaler entrypoint
**What this PR does / why we need it**:
Update the Cluster Autoscaler manifest file to use the new shell wrapper instead of directly calling the CA binary (the wrapper is already included in the current CA image).
Add params to improve logging.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 42802, 42927, 42669, 42988, 43012)
update to latest version of coreos/go-oidc
Includes updates that enable OIDC with OKTA as an IDP.
**What this PR does / why we need it**:
Updates to the latest version of coreos/go-oidc
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # TBD
**Special notes for your reviewer**:
Updates the coreos/go-oidc module to include fixes for https://github.com/coreos/go-oidc/issues/137, which prevented OKTA from being used as an IDP.
**Release note**:
```release-note
NONE
```
cc: @ericchiang
Automatic merge from submit-queue
Allow DaemonSet controller to PATCH pods, and add more steps and logs in DaemonSet pods adoption e2e test
DaemonSet pods adoption failed because the DaemonSet controller isn't allowed to patch pods when claiming them.
[Edit] This PR fixes #42908 by modifying RBAC to allow the DaemonSet controller to patch pods, as well as adding more logs and steps to the original e2e test to make debugging easier.
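For context, claiming a pod is a PATCH that writes the controller ownerReference onto the Pod, roughly as in this sketch (field values illustrative):

```go
package main

import "fmt"

// adoptPatch builds the kind of ownerReferences patch the controller sends
// when it claims a pod; this is why the RBAC rule must include "patch".
// The uid precondition ensures we don't adopt a recreated pod by mistake.
func adoptPatch(dsName, dsUID, podUID string) []byte {
	return []byte(fmt.Sprintf(
		`{"metadata":{"ownerReferences":[{"apiVersion":"extensions/v1beta1","kind":"DaemonSet","name":"%s","uid":"%s","controller":true}],"uid":"%s"}}`,
		dsName, dsUID, podUID))
}

func main() {
	fmt.Printf("%s\n", adoptPatch("my-ds", "ds-uid", "pod-uid"))
}
```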
Tested locally with a local cluster and GCE cluster.
@kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)
Enable RollingUpdates for the fluentd daemonset addon
In anticipation of needing to rev fluentd-gcp image versions in patch releases, we should enable rolling update so the new versions get rolled out in a timely manner.
/cc @ixdy
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)
Improve kubeadm init message
Now that we are locking down the insecure port, we should give clearer instructions on how to copy out the root-owned admin.conf file, chmod it, and use it.
Signed-off-by: Joe Beda <joe.github@bedafamily.com>
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42940, 42906, 42970, 42848)
Move node and event observer helpers to e2e/common
**What this PR does / why we need it**:
Moves existing test helper functions in OIR e2e tests to `test/e2e/common`. These functions wrap informers to help test writers observe events instead of long-polling for status updates.
For usage examples, see `test/e2e/opaque_resource.go`.
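The pattern, sketched against a plain watch rather than the helpers' real signatures: block until an event matching a predicate arrives, instead of polling object status.

```go
package observe

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

// awaitEvent consumes a watch until an Event satisfying pred is seen or the
// timeout elapses. Unlike polling, this cannot miss a short-lived state that
// appears and disappears between two poll intervals.
func awaitEvent(w watch.Interface, pred func(*v1.Event) bool, timeout time.Duration) error {
	deadline := time.After(timeout)
	for {
		select {
		case got, ok := <-w.ResultChan():
			if !ok {
				return fmt.Errorf("watch closed before a matching event was observed")
			}
			if e, ok := got.Object.(*v1.Event); ok && pred(e) {
				return nil
			}
		case <-deadline:
			return fmt.Errorf("timed out after %v waiting for event", timeout)
		}
	}
}
```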
cc @kubernetes/sig-scheduling-misc
**Release note**:
```release-note
NONE
```
The unit test for the ingress controller was previously adding
a cluster twice, which resulted in a cluster being deleted and added
back. The deletion was racing the controller shutdown to close
informer channels. This change ensures that the informer clears its
map of informers when Stop() is called to prevent a double close, and
that the test no longer adds the cluster twice.