Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add AUTOSCALER_ENV_VARS to kube-env to hotfix cluster autoscaler
This provides a temporary way for the cluster autoscaler to get at
values that were removed from kube-env in #60020. Ideally this
information will eventually be available via e.g. the Cluster API,
because kube-env is an internal interface that carries no stability
guarantees.
This is the first half of the fix; the other half is that cluster autoscaler
needs to be modified to read from AUTOSCALER_ENV_VARS, if it is
available.
Since cluster autoscaler was also reading KUBELET_TEST_ARGS for the
kube-reserved flag, and we don't want to resurrect KUBELET_TEST_ARGS in kube-env,
we opted to create AUTOSCALER_ENV_VARS instead of just adding back
the old env vars. This also makes it clear that we have an ugly dependency
on kube-env.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix creation of subpath with SUID/SGID directories.
SafeMakeDir() should apply SUID/SGID/sticky bits to the directory it creates.
Fixes #61283
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added unschedulable taint
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #59194; fixes #61050
**Release note**:
```release-note
When `TaintNodesByCondition` is enabled, the `node.kubernetes.io/unschedulable:NoSchedule`
taint is added to the node if `spec.Unschedulable` is true.
When `ScheduleDaemonSetPods` is enabled, a `node.kubernetes.io/unschedulable:NoSchedule`
toleration is added automatically to DaemonSet Pods, so the `unschedulable` field of
a node is not respected by the DaemonSet controller.
```
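The taint rule in the release note above can be sketched as follows. `Node` and `Taint` here are simplified stand-ins, not the real Kubernetes API types, and removing the taint once the node becomes schedulable again is an assumption of this sketch rather than something the release note states.

```go
package main

// Taint and Node are simplified, hypothetical stand-ins for the API types.
type Taint struct {
	Key    string
	Effect string
}

type Node struct {
	Unschedulable bool
	Taints        []Taint
}

const (
	unschedulableKey = "node.kubernetes.io/unschedulable"
	noSchedule       = "NoSchedule"
)

// syncUnschedulableTaint returns the node's taints with the unschedulable
// taint present exactly when node.Unschedulable is true. Dropping the taint
// when the flag is false is an assumption made for this sketch.
func syncUnschedulableTaint(n Node) []Taint {
	out := make([]Taint, 0, len(n.Taints)+1)
	for _, t := range n.Taints {
		if t.Key == unschedulableKey && t.Effect == noSchedule {
			continue // re-added below if still needed
		}
		out = append(out, t)
	}
	if n.Unschedulable {
		out = append(out, Taint{Key: unschedulableKey, Effect: noSchedule})
	}
	return out
}
```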
Automatic merge from submit-queue (batch tested with PRs 60722, 61269). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump fluentd-gcp-scaler version
**What this PR does / why we need it**:
This version fixes a bug in which the scaler was setting resources for all containers in the pod, not only the fluentd-gcp one.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60763
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove mapping to /host/lib from fluentd-gcp container.
**What this PR does / why we need it**:
This mapping has not been needed since fluentd-gcp v2.0.16, which switched to a container image based on Debian Stretch whose systemd libraries already include support for all supported compression algorithms.
The `/run.sh` in the image no longer accesses `/host/lib` anyway, so let's stop mapping it here.
Related changes:
- fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101
- fluentd-es on GoogleCloudPlatform/google-fluentd#80
/assign @timstclair
/cc @crassirostris @bmoyles0117
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes 'Zone is empty' errors in PD upgrade tests; skips pd tests with inline volume in multizone clusters
**What this PR does / why we need it**: Fixes regional cluster upgrade test failures.
PV upgrade tests were failing because an empty ("") zone was passed to the GCE PD create disk call. In a multizone setting the test must select a zone from the managed zones.
PD tests were failing because they use inline GCE PD volumes, which should not be used in multizone clusters.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61242
/release-note-none
/assign @saad-ali
/cc @wojtek-t
/sig storage
/sig gcp
Automatic merge from submit-queue (batch tested with PRs 60978, 60985). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Backoff only when failed pod shows up
**What this PR does / why we need it**:
After the backoff policy was introduced, sync runs for a job are delayed whenever the job has failed several times before. As a result, failed jobs do not report status right away in cases that are not related to failed pods, e.g. a successful run. This PR ensures the backoff is applied only when `updatePod` receives a failed pod.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59918, #59527
/assign @janetkuo @kow3ns
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 60978, 60985). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix use of "-w" flag to iptables-restore
iptables accepts "-w5" but iptables-restore requires "-w 5", so kube-proxy is currently broken for people with an iptables-restore new enough that kube-proxy tries to use the new flags.
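The difference is easy to miss when building an argument vector. A small sketch, with hypothetical helper names that are not taken from kube-proxy:

```go
package main

import "strconv"

// iptablesWaitArgs builds the wait flag in the fused form that the source
// says iptables accepts, e.g. "-w5". Hypothetical helper.
func iptablesWaitArgs(seconds int) []string {
	return []string{"-w" + strconv.Itoa(seconds)}
}

// iptablesRestoreWaitArgs builds the wait flag as two separate arguments,
// e.g. "-w" "5", the form the source says iptables-restore requires.
// Hypothetical helper.
func iptablesRestoreWaitArgs(seconds int) []string {
	return []string{"-w", strconv.Itoa(seconds)}
}
```

Keeping one helper per binary makes it harder to reuse the iptables form for iptables-restore by accident.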
Fixes #58956
**Release note**:
```release-note
Fixed kube-proxy to work correctly with iptables 1.6.2 and later.
```
Automatic merge from submit-queue (batch tested with PRs 61203, 61071). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix issue with race condition during pod deletion
This PR fixes two issues:
1. When desired_state_populator removes podvolume state, it should first check
whether the actual state already has the volume, so that the actual state
has a chance to add the volume before it is deleted.
2. When checking whether a podVolume still exists, it should check not only the
actual state but also the volume directory on disk, because the actual state
might not reflect the real world when kubelet starts.
fixes issue #60645
Automatic merge from submit-queue (batch tested with PRs 61203, 61071). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix deprecated gcloud compute networks --mode switches.
"create --mode" becomes "create --subnet-mode", and switch-mode has been
folded into "update".
Create --mode was deprecated in October and will be removed in the next
gcloud release. It is already failing in staging tests.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** Fixes #54238
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump fluentd-gcp-scaler version
**What this PR does / why we need it**:
This version verifies on its own whether resources should be updated or not, instead of relying on `kubectl set resources`.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61190
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc @shyamjvs
Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Mark reconstructed volumes as reported InUse
When a newly started kubelet finds a directory where a volume should be,
it can be fairly confident that the volume was mounted by the previous kubelet
and therefore the volume must have been in node.status.volumesInUse.
Therefore we can mark reconstructed volumes as already reported so that a
subsequent reconcile() can fix the directory and put the mounted volume
into the actual state of world.
Fixes: #60645
**Release note**:
```release-note
NONE
```
/sig storage
/sig node
cc: @gnufied @jingxu97
The scheduler cache contains both the scheduled pods and those still waiting
in the scheduling queue. This commit takes the second group of
pods into consideration when comparing the cache.
Automatic merge from submit-queue (batch tested with PRs 61118, 60579). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase logging verbosity for deleting stateful set pods
We should always log reasons for deleting StatefulSet Pods.
@jdumars - what's the current process for putting such changes into the release? It's literally a 0-risk change that helps with debugging.
cc @ttz21
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61118, 60579). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase apiserver mem-threshold in density test
Ref: https://github.com/kubernetes/kubernetes/issues/60500#issuecomment-372682659 (fixes part of that issue)
/sig scalability
/kind bug
/priority important-soon
/cc @wojtek-t
/cc @crassirostris (for the release-note)
```release-note
Audit logging with buffering enabled can increase apiserver memory usage (e.g. up to 200MB in a 100-node cluster). The increase is bounded by the buffer size (configurable). Ref: issue #60500
```
Automatic merge from submit-queue (batch tested with PRs 61129, 60359). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Clean up old upgrading code that is v1.8->v1.9-specific
**What this PR does / why we need it**:
Clean up old upgrading code that is v1.8->v1.9-specific
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubeadm/issues/622
This will finish the task in the issue.
**Special notes for your reviewer**:
/cc @luxas @vbmade2000
**Release note**:
```release-note
NONE
```
Similar to the change we made for `GetObjectMetricReplicas` in the
previous commit, ensure that `GetExternalMetricReplicas` does not
include unready pods when it is determining how many replicas it desires.
Including unready pods can lead to over-scaling.
We did not change the behavior of `GetExternalPerPodMetricReplicas`, as
it is slightly less clear what the desired behavior is. We did make some
small naming refactorings to this method, which will make it easier to
ignore unready pods if we decide we want to.
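To see why counting unready pods over-scales, here is a rough sketch that assumes the desired count is the ceiling of the usage ratio times the number of pods considered. The `Pod` type and `desiredReplicas` function are hypothetical and do not reproduce the actual `GetExternalMetricReplicas` code.

```go
package main

import "math"

// Pod is a simplified stand-in carrying only readiness.
type Pod struct {
	Ready bool
}

// desiredReplicas scales the number of READY pods by metricValue/targetValue.
// The formula is an assumption of this sketch, not the autoscaler's own code.
func desiredReplicas(pods []Pod, metricValue, targetValue float64) int {
	ready := 0
	for _, p := range pods {
		if p.Ready {
			ready++
		}
	}
	if targetValue <= 0 {
		// No meaningful target: keep the current ready count (assumed fallback).
		return ready
	}
	ratio := metricValue / targetValue
	return int(math.Ceil(ratio * float64(ready)))
}
```

With four pods of which two are ready, a metric of 300 against a target of 100 gives 6 replicas; multiplying by all four pods instead would give 12.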
Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use pod UID as cache key instead of namespace/name
UID uniquely identifies pods across lifecycles, while the same namespace/name
can refer to two different pods across lifecycles. Keying on namespace/name
could result in tricky scheduler bugs.
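A small illustration of the collision, using a hypothetical `Pod` struct and map-based cache rather than the scheduler's real cache types:

```go
package main

// Pod is a simplified stand-in with only the identifying fields.
type Pod struct {
	Namespace string
	Name      string
	UID       string
}

// nameKey is the old-style key: identical for a deleted pod and a new pod
// that is recreated with the same namespace and name.
func nameKey(p Pod) string {
	return p.Namespace + "/" + p.Name
}

// podCache is a hypothetical cache keyed by pod UID.
type podCache map[string]Pod

// add stores the pod under its UID, so two incarnations never collide.
func (c podCache) add(p Pod) {
	c[p.UID] = p
}
```

Two pods named `default/web` with UIDs `uid-1` and `uid-2` share one `nameKey` but occupy two separate entries in `podCache`.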
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60966
**Special notes for your reviewer**: @bsalamat
**Release note**:
```release-note
Fix a bug in scheduler cache by using Pod UID as the cache key instead of namespace/name
```
Automatic merge from submit-queue (batch tested with PRs 61111, 61069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix subpath e2e tests on multizone cluster.
Use dynamically provisioned PV to run GCE PD tests. This will make sure that the pod is scheduled to the right zone and GCE PD can be attached to a node.
**Which issue(s) this PR fixes**:
Fixes #61101
**Release note**:
```release-note
NONE
```
/sig storage
@msau42 @verult
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Find most recent modified date for fluentd buffers recursively.
Fixes #60762
**What this PR does / why we need it**:
Due to updates in Fluentd v0.14, the buffer directory's modification date is no
longer updated when files inside the directory are changed. Therefore we
must find the most recent modification date recursively to fix the liveness probe.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix broken gke regional logging test.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #60882
```release-note
NONE
```