Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Prevent garbage collector from attempting to sync with 0 resources
**What this PR does / why we need it**:
As of #55259 we enabled garbagecollector.GetDeletableResources to return partial discovery results (including an empty set of discovery results).
This had the unintended consequence of allowing the garbage collector to enter a blocked state that can only be fixed by restarting.
From [this comment](https://github.com/kubernetes/kubernetes/issues/60037#issuecomment-372801088):
> 1. The Sync function periodically calls GetDeletableResources
>
> 2. According to the comment above GetDeletableResources, All discovery errors are considered temporary. Upon encountering any error, GetDeletableResources will log and return any discovered resources it was able to process (which may be none)., an error in discovery causes the discovery client to no longer discover resources in the cluster, but instead of failing and returning an error, it simply logs the error as garbagecollector.go:601] failed to discover preferred resources: %vthe server was unable to return a response in the time allotted, but may still be processing the request and returns an empty list of resources
>
> 3. The Sync function, upon recieving an empty resource list from discovery, detects that the resources have changed, and calls resyncMonitors, which calls dependencyGraphBuilder.syncMonitors with map[] as the argument as shown in the log as garbagecollector.go:189] syncing garbage collector with updated resources from discovery: map[], which sets the list of monitors to an empty list because it thinks there are no resources to monitor.
>
> 4. Lastly the Sync function calls controller.WaitForCacheSync, which calls cache.WaitForCacheSync, which will continually retry the garbagecollector.IsSynced function until it returns true, but it will always return false because len(gb.monitors) is 0.
This PR prevents that specific race condition from arising.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60037
**Release note**:
```release-note
Fix bug allowing garbage collector to enter a broken state that could only be fixed by restarting the controller-manager.
```
Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add AUTOSCALER_ENV_VARS to kube-env to hotfix cluster autoscaler
This provides a temporary way for the cluster autoscaler to get at
values that were removed from kube-env in #60020. Ideally this
information will eventually be available via e.g. the Cluster API,
because kube-env is an internal interface that carries no stability
guarantees.
This is the first half of the fix; the other half is that cluster autoscaler
needs to be modified to read from AUTOSCALER_ENV_VARS, if it is
available.
Since cluster autoscaler was also reading KUBELET_TEST_ARGS for the
kube-reserved flag, and we don't want to resurrect KUBELET_TEST_ARGS in kube-env,
we opted to create AUTOSCALER_ENV_VARS instead of just adding back
the old env vars. This also makes it clear that we have an ugly dependency
on kube-env.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix creation of subpath with SUID/SGID directories.
SafeMakeDir() should apply SUID/SGID/sticky bits to the directory it creates.
Fixes#61283
**Release note**:
```release-note
NONE
```
This provides a temporary way for the cluster autoscaler to get at
values that were removed from kube-env in #60020. Ideally this
information will eventually be available via e.g. the Cluster API,
because kube-env is an internal interface that carries no stability
guarantees.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added unschedulable taint
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #59194; fixes#61050
**Release note**:
```release-note
When `TaintNodesByCondition` enabled, added `node.kubernetes.io/unschedulable:NoSchedule`
taint to the node if `spec.Unschedulable` is true.
When `ScheduleDaemonSetPods` enabled, `node.kubernetes.io/unschedulable:NoSchedule`
toleration is added automatically to DaemonSet Pods; so the `unschedulable` field of
a node is not respected by the DaemonSet controller.
```
Automatic merge from submit-queue (batch tested with PRs 60722, 61269). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump fluentd-gcp-scaler version
**What this PR does / why we need it**:
This version fixes a bug in which scaler was setting resources for all containers in the pod, not only fluentd-gcp one.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60763
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove mapping to /host/lib from fluentd-gcp container.
**What this PR does / why we need it**:
This mapping is no longer needed since fluentd-gcp v2.0.16, in which it started using a container image based on Debian Stretch, in which the systemd libraries already include support for all the supported
compression algorithms.
The `/run.sh` in the image no longer accesses `/host/lib` anyways, so let's stop mapping it here.
Related changes:
- fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101
- fluentd-es on GoogleCloudPlatform/google-fluentd#80
/assign @timstclair
/cc @crassirostris @bmoyles0117
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes 'Zone is empty' errors in PD upgrade tests; skips pd tests with inline volume in multizone clusters
**What this PR does / why we need it**: Fixes regional cluster upgrade test failures.
PV upgrade tests were failing because a "" zone is passed to the GCE PD create disk call. In a multizone setting the test must select from a managed zone.
PD tests were failing because it uses inline GCE PD volumes, which should not be used in multizone clusters.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61242
/release-note-none
/assign @saad-ali
/cc @wojtek-t
/sig storage
/sig gcp
Automatic merge from submit-queue (batch tested with PRs 60978, 60985). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Backoff only when failed pod shows up
**What this PR does / why we need it**:
Upon introducing the backoff policy we started to delay sync runs for the job when it failed several times before. This leads to failed jobs not reporting status right away in cases that are not related to failed pods, eg. a successful run. This PR ensures the backoff is applied only when `updatePod` receives a failed pod.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59918#59527
/assign @janetkuo @kow3ns
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 60978, 60985). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix use of "-w" flag to iptables-restore
iptables accepts "-w5" but iptables-restore requires "-w 5", so kube-proxy is currently broken for people with an iptables-restore new enough that kube-proxy tries to use the new flags.
Fixes#58956
**Release note**:
```release-note
Fixed kube-proxy to work correctly with iptables 1.6.2 and later.
```
Automatic merge from submit-queue (batch tested with PRs 61203, 61071). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix issue with race condition during pod deletion
This PR fixes two issues
1. When desired_state_populator removes podvolume state, it should check
whether the actual state already has the volume before deleting it to
make sure actual state has a chance to add the volume into the state
2. When checking podVolume still exists, it not only checks the actual
state, but also the volume disk directory because actual state might not
reflect the real world when kubelet starts.
fixes issue #60645
Automatic merge from submit-queue (batch tested with PRs 61203, 61071). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix deprecated gcloud compute networks --mode switches.
"create --mode" becomes "create --subnet-mode", and switch-mode has been
folded into "update".
Create --mode was deprecated in October and will be removed in the next
gcloud release. It is already failing in staging tests.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** Fixes#54238
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This PR fixes two issues
1. When desired_state_populator removes podvolume state, it should check
whether the actual state already has the volume before deleting it to
make sure actual state has a chance to add the volume into the state
2. When checking podVolume still exists, it not only checks the actual
state, but also the volume disk directory because actual state might not
reflect the real world when kubelet starts.
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump fluentd-gcp-scaler version
**What this PR does / why we need it**:
This version verifies on its own whether resources should be updated or not, instead of relying on `kubectl set resources`.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61190
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc @shyamjvs
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Mark reconstructed volumes as reported InUse
When a newly started kubelet finds a directory where a volume should be,
it can be fairly confident that the volume was mounted by previous kubelet
and therefore the volume must have been in node.status.volumesInUse.
Therefore we can mark reconstructed volumes as already reported so
subsequent reconcile() can fix the directory and put the mounted volume
into actual state of world.
Fixes: #60645
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @gnufied @jingxu97
"create --mode" becomes "create --subnet-mode", and switch-mode has been
folded into "update".
Create --mode was deprecated in October and will be removed in the next
gcloud release. It is already failing in staging tests.
Automatic merge from submit-queue (batch tested with PRs 61118, 60579). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase loging verbosity for deleting stateful set pods
We should always log reasons for deleting StatefulSet Pods.
@jdumars - what's the current process for putting such changes into the release? It's literally 0-risk change that helps with debugging.
cc @ttz21
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61118, 60579). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase apiserver mem-threshold in density test
Ref: https://github.com/kubernetes/kubernetes/issues/60500#issuecomment-372682659 (fixes part of that issue)
/sig scalability
/kind bug
/priority important-soon
/cc @wojtek-t
/cc @crassirostris (for the release-note)
```release-note
Audit logging with buffering enabled can increase apiserver memory usage (e.g. up to 200MB in 100-node cluster). The increase is bounded by the buffer size (configurable). Ref: issue #60500
```
Automatic merge from submit-queue (batch tested with PRs 61129, 60359). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cleanup old upgrading code that is v1.8->v1.9-specific
**What this PR does / why we need it**:
Cleanup old upgrading code that is v1.8->v1.9-specific
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubeadm/issues/622
This will finish the task in the issue.
**Special notes for your reviewer**:
/cc @luxas @vbmade2000
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use pod UID as cache key instead of namespace/name
UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60966
**Special notes for your reviewer**: @bsalamat
**Release note**:
```release-note
Fix a bug in scheduler cache by using Pod UID as the cache key instead of namespace/name
```
Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix subpath e2e tests on multizone cluster.
Use dynamically provisioned PV to run GCE PD tests. This will make sure that the pod is scheduled to the right zone and GCE PD can be attached to a node.
**Which issue(s) this PR fixes**:
Fixes#61101
**Release note**:
```release-note
NONE
```
/sig storage
@msau42 @verult
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Find most recent modified date for fluentd buffers recursively.
Fixes#60762
**What this PR does / why we need it**:
Due to updates in Fluent v0.14, the buffers directory modified date is no
longer updated when files inside the directory are changed. Therefore we
must find the most recent modified date recursively to fix liveness probe.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix broken gke regional logging test.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60882
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Detect backsteps correctly in base path detection
Avoids false positives with atomic writer `..<timestamp>` directories
Fixes#61076
/assign @msau42 @jsafrane
```release-note
Fix a regression that prevented using `subPath` volume mounts with secret, configMap, projected, and downwardAPI volumes
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix default auditing options.
- Log backend defaults to blocking mode (backwards compatability)
- Webhook backend defaults to throttled
- Fix webhook validation
- Add options test
**Which issue(s) this PR fixes**:
Fixes#60719
**Special notes for your reviewer**:
This PR is an alternative fix to https://github.com/kubernetes/kubernetes/pull/60727. If the rollback goes in first, I'll rebase this on a roll-forward.
**Release note**:
-->
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update Kubelet command option description for IPv6
**What this PR does / why we need it**:
The restriction for a /66 cidr was removed in PR #60089.
Removing this reference from the command options description.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60734
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
UID uniquely identifies pods across lifecycles, while namespace/name
could be 2 different pods across lifecycles. This could result in
tricky scheduler bugs.
Fixes#60966
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump to etcd 3.1.12 to pick up critical fix
etcd [3.1.12](https://github.com/coreos/etcd/releases/tag/v3.1.12) (as well as 3.2.17 and 3.3.2) was released yesterday to fix a bug critical to kubernetes:
Fix [mvcc "unsynced" watcher restore operation](https://github.com/coreos/etcd/pull/9297).
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
This will be backported to 1.9 as well.
Release note:
```release-note
Upgrade the default etcd server version to 3.1.12 to pick up critical etcd "mvcc "unsynced" watcher restore operation" fix.
```
cc @gyuho @wojtek-t @shyamjvs @timothysc @jdumars
Automatic merge from submit-queue (batch tested with PRs 61004, 60981). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use grpc to improve the CPU utilization of the logging agent.
Fixes#60762
**What this PR does / why we need it**:
Using gRPC improves the CPU utilization of the logging agent be reducing
serialization overhead and reusing TCP connections.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix option --audit-webhook-initial-backoff
Before this change, --audit-webhook-initial-backoff has no effect
@crassirostris @sttts
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Use dynamically provisioned PV to run GCE PD tests. This will make sure
that the pod is scheduled to the right zone and GCE PD can be attached
to a node.