Automatic merge from submit-queue (batch tested with PRs 51833, 51936)
Changed volume IO e2e test to verify file hash instead of content.
**What this PR does / why we need it**: The existing way of verifying file content takes too much memory, causing processes to be OOM killed.
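For reference, the core idea is to stream the file through a hash instead of reading it into memory, so memory usage stays constant regardless of file size. A minimal stand-alone sketch (not the actual e2e test code, which may compute the digest inside the pod via a shell utility instead):
```go
// Sketch: compute a file's SHA-256 without loading the whole file into
// memory, so two large files can be compared by digest instead of by content.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	// io.Copy streams the file through the hash in small chunks.
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	sum, err := fileSHA256("/tmp/testdata")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(sum)
}
```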
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/51717
**Release note**:
```release-note
NONE
```
/sig storage
/release-note-none
/assign @jeffvance @rootfs
/cc @msau42
Automatic merge from submit-queue
Add support for multi-zone GCE PDs
**What this PR does / why we need it**:
Adds alpha support in k8s for multi-zone (aka Regional) GCE PDs.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/51232
**Special notes for your reviewer**:
**Release note**:
Modifies the VolumeZonePredicate to handle a PV that belongs to more than one zone or region. This is indicated by the zone or region label value containing a comma-separated list.
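To illustrate the predicate change, here is a tiny stand-alone sketch (a hypothetical helper, not the actual VolumeZonePredicate code) of matching a node's zone against a PV zone label that may hold a comma-separated list:
```go
// Sketch only: a PV zone label may now contain "zone-a,zone-b"; a node
// matches if its own zone appears anywhere in that list.
package main

import (
	"fmt"
	"strings"
)

func zoneMatches(pvZoneLabel, nodeZone string) bool {
	for _, z := range strings.Split(pvZoneLabel, ",") {
		if strings.TrimSpace(z) == nodeZone {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(zoneMatches("us-central1-a,us-central1-b", "us-central1-b")) // true
	fmt.Println(zoneMatches("us-central1-a", "us-central1-c"))               // false
}
```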
Automatic merge from submit-queue (batch tested with PRs 51180, 51893)
Clear alpha MountPropagation fields.
This is leftover from #50924, mount propagation introduced a new field that needs to be cleared.
**Which issue this PR fixes**
fixes #51738
**Release note**:
```release-note
NONE
```
@k8s-mirror-api-machinery-pr-reviews
/assign @liggitt
Automatic merge from submit-queue (batch tested with PRs 51180, 51893)
CPU manager static policy
Blocker for CPU manager #49186 (5 of 6)
* Previous PR in this series: #51357
* Next PR in this series: #51041
cc @derekwaynecarr @sjenning @flyingcougar @balajismaniam
Attempting to be fairly accurate with main authorship at least at a file level -- please let me know if anyone has a better idea on how to improve this.
For posterity, here are the Kubelet flags to run the static policy (assuming `/kube-reserved` is a cgroup that exists for all required controllers):
`--feature-gates=CPUManager=true --cpu-manager-policy=static --cpu-manager-reconcile-period=5s --enforce-node-allocatable=pods,kube-reserved --kube-reserved-cgroup=/kube-reserved --kube-reserved=cpu=500m`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Add client side event spam filtering
**What this PR does / why we need it**:
Add client side event spam filtering to stop excessive traffic to api-server from internal cluster components.
This PR defines a per-source+object event budget of 25 burst with a refill rate of one token every 5 minutes.
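A rough sketch of the budget idea, using a token bucket per source+object key (an illustration built on golang.org/x/time/rate, not the actual client-go spam-filter implementation, whose keying and bookkeeping may differ):
```go
// Sketch of the budget idea: one token bucket per event source+object key.
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

type spamFilter struct {
	mu      sync.Mutex
	buckets map[string]*rate.Limiter
}

func newSpamFilter() *spamFilter {
	return &spamFilter{buckets: map[string]*rate.Limiter{}}
}

// Allow returns false once a source+object key has spent its burst of 25
// events; the bucket refills at one token every 5 minutes.
func (f *spamFilter) Allow(key string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	l, ok := f.buckets[key]
	if !ok {
		l = rate.NewLimiter(rate.Every(5*time.Minute), 25)
		f.buckets[key] = l
	}
	return l.Allow()
}

func main() {
	f := newSpamFilter()
	allowed := 0
	for i := 0; i < 30; i++ {
		if f.Allow("kubelet/node-1|Pod/default/bad-abc12") { // illustrative key
			allowed++
		}
	}
	fmt.Println("allowed:", allowed) // 25 of the 30 events pass initially
}
```
With a burst of 25 and a refill of one token every 5 minutes, a crash-looping pod can emit at most 25 events quickly and then roughly 12 per hour afterwards.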
I tested this PR in the following scenarios:
**Scenario 1: Node with 50 crash-looping pods**
```
# create 50 crash-looping pods on a single node
$ kubectl run bad --image=busybox --replicas=50 --command -- derekisbad
```
Before:
* POST events with peak of 1.7 per second, long-tail: 0.2 per second
* PATCH events with peak of 5 per second, long-tail: 5 per second
After:
* POST events with peak of 1.7 per second, long-tail: 0.2 per second
* PATCH events with peak of 3.6 per second, long-tail: 0.2 per second
Observation:
* https://github.com/kubernetes/kubernetes/pull/47462 capped the total number of events in the long-tail as expected, but did nothing to reduce the total event spam against the master.
**Scenario 2: replication controller limited by quota**
```
$ kubectl create quota my-quota --hard=pods=1
$ kubectl run nginx --image=nginx --replicas=50
```
Before:
* POST events not relevant as aggregation worked well here.
* PATCH events with peak and long-tail of 13.6 per second
After:
* POST events not relevant as aggregation worked well here.
* PATCH events with peak of 0.35 per second, long-tail: 0 per second
**Which issue this PR fixes**
fixes https://github.com/kubernetes/kubernetes/issues/47366
**Special notes for your reviewer**:
This was a significant problem in a Kubernetes 1.5 cluster we are running where events are co-located in a single etcd. It was normal for this cluster to have a large number of unhealthy pods as well as denials by quota.
**Release note**:
```release-note
add support for client-side spam filtering of events
```
Automatic merge from submit-queue
Add liggitt to registry approvers
~50 commits to this subtree, and changes to pkg/api, apimachinery, and apiserver (where I am already in the approvers list) usually involve corresponding changes here.
/assign @smarterclayton
/assign @lavalamp
/assign @wojtek-t
Automatic merge from submit-queue
Fix Stackdriver Logging tests for large clusters
Fixes https://github.com/kubernetes/kubernetes/issues/51700
Due to the limit on the length of the filter, it is not possible to filter out all nodes in the cluster. Removing the filter shouldn't affect the tests, since the checks are based on node IDs in the cluster, which are unique anyway.
Automatic merge from submit-queue
Bump gce metadata-proxy from 0.1.2 to 0.1.3
**What this PR does / why we need it**: Bump metadata-proxy from 0.1.2 to 0.1.3 to incorporate fix for CVE 2016-9063, xref https://github.com/kubernetes/contrib/pull/2720
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove DynamicVolumeProvisioning from feature gate
**What this PR does / why we need it**:
Remove `DynamicVolumeProvisioning` from feature gate.
**Which issue this PR fixes**: fixes #51120
**Special notes for your reviewer**:
N/A
**Release note**:
No
Automatic merge from submit-queue
Provide a way to omit Event stages in audit policy
This provides a way to omit some stages for each audit policy rule.
For example:
```
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles"]
  omitStages:
  - "RequestReceived"
```
With the config above, the RequestReceived stage will not be emitted to the audit backends.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49727, 51792)
Introducing metrics-server
ref https://github.com/kubernetes/features/issues/271
There is still some work blocked on problems with repo synchronization:
- migrate to `v1beta1` introduced in #51653
- bump deps to HEAD
Will do it in follow-up PRs once the issue is resolved.
```release-note
Introduced Metrics Server
```
Automatic merge from submit-queue (batch tested with PRs 49727, 51792)
Implement Controller for growing persistent volumes
This PR implements API and controller plane changes necessary for doing controller side resize.
xref : https://github.com/kubernetes/community/pull/657
Also xref https://github.com/kubernetes/features/issues/284
```release-note
Add alpha support for allowing users to grow persistent volumes. Currently we only support volume types that just require control plane resize (such as glusterfs) and don't need separate file system resize.
```
Updates https://github.com/kubernetes/kubernetes/issues/48561
Automatic merge from submit-queue
Fixes grace period in delete
**What this PR does / why we need it**: Fixes `kubectl delete` ignoring `--grace-period`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/openshift/origin/issues/15060 found in OpenShift.
**Release note**:
```release-note
NONE
```
* Introduce feature gate for expanding PVs
* Add a field to SC
* Add new Conditions and feature tag for PVC update
* Add tests for size update via feature gate
* Register the resize admission plugin
* Fix golint failures
Automatic merge from submit-queue (batch tested with PRs 51845, 51868, 51864)
Update sys spec to support docker 1.11-1.13 and overlay2.
Fixes https://github.com/kubernetes/kubernetes/issues/32536.
Update docker spec to:
1) Support overlay2;
2) Support docker version 1.11-1.13.
@dchen1107 @yguo0905 @luxas
/cc @kubernetes/sig-node-pr-reviews
```release-note
Kubernetes 1.8 supports docker versions 1.11.x, 1.12.x, and 1.13.x, as well as the overlay2 storage driver.
```
Automatic merge from submit-queue
Enable batch/v1beta1.CronJobs by default
This PR re-applies the cronjobs-to-beta promotion (https://github.com/kubernetes/kubernetes/pull/51720) with the fix from @shyamjvs.
Fixes #51692
@apelisse @dchen1107 @smarterclayton ptal
@janetkuo @erictune fyi
Automatic merge from submit-queue
Remove deprecated init-container in annotations
fixes #50655, fixes #51816, closes #41004
Builds on #50654 and drops the initContainer annotations on conversion to prevent bypassing API server validation/security and targeting version-skewed kubelets that still honor the annotations
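A simplified, stand-alone sketch of the conversion-time drop (the annotation key list below is illustrative and may not exactly match the set handled by the real conversion code):
```go
// Sketch only: remove the deprecated init-container annotation keys during
// conversion so they can no longer bypass apiserver validation.
package main

import "fmt"

// Deprecated annotation keys that used to carry init-container data
// (illustrative list; the alpha/beta key names shown are assumptions).
var initContainerAnnotations = []string{
	"pod.beta.kubernetes.io/init-containers",
	"pod.alpha.kubernetes.io/init-containers",
	"pod.beta.kubernetes.io/init-container-statuses",
	"pod.alpha.kubernetes.io/init-container-statuses",
}

func dropInitContainerAnnotations(annotations map[string]string) {
	for _, key := range initContainerAnnotations {
		delete(annotations, key)
	}
}

func main() {
	ann := map[string]string{
		"pod.beta.kubernetes.io/init-containers": `[{"name":"init","image":"busybox"}]`,
		"unrelated": "kept",
	}
	dropInitContainerAnnotations(ann)
	fmt.Println(ann) // map[unrelated:kept]
}
```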
```release-note
The deprecated alpha and beta initContainer annotations are no longer supported. Init containers must be specified using the initContainers field in the pod spec.
```
Automatic merge from submit-queue
Workloads deprecation 1.8
**What this PR does / why we need it**: This PR deprecates the Deployment, ReplicaSet, and DaemonSet kinds in the extensions/v1beta1 group version and the StatefulSet, Deployment, and ControllerRevision kinds in the apps/v1beta1 group version. The Deployment, ReplicaSet, DaemonSet, StatefulSet, and ControllerRevision kinds in the apps/v1beta2 group version are now the current version.
xref kubernetes/features#353
```release-note
The Deployment, DaemonSet, and ReplicaSet kinds in the extensions/v1beta1 group version are now deprecated, as are the Deployment, StatefulSet, and ControllerRevision kinds in apps/v1beta1. As they will not be removed until after a GA version becomes available, you may continue to use these kinds in existing code. However, all new code should be developed against the apps/v1beta2 group version.
```
Automatic merge from submit-queue (batch tested with PRs 51682, 51546, 51369, 50924, 51827)
Clear values for disabled alpha fields
Fixes #51831
Before persisting new or updated resources, alpha fields that are disabled by feature gate must be removed from the incoming objects.
This adds a helper for clearing these values for pod specs and calls it from the strategies of all in-tree resources containing pod specs.
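A stand-alone sketch of the pattern, with hypothetical stand-in types (the real helper lives next to the pod spec API types and consults the feature gate machinery):
```go
// Sketch of the pattern with stand-in types; the real code uses the API
// PodSpec and a feature-gate check rather than a package-level bool.
package main

import "fmt"

type VolumeMount struct {
	Name             string
	MountPropagation *string // alpha field
}

type PodSpec struct {
	VolumeMounts []VolumeMount
}

var mountPropagationEnabled = false // stand-in for a feature-gate lookup

// dropDisabledAlphaFields clears alpha fields whose gates are off so that
// new or updated objects never persist values users could not have set.
func dropDisabledAlphaFields(spec *PodSpec) {
	if !mountPropagationEnabled {
		for i := range spec.VolumeMounts {
			spec.VolumeMounts[i].MountPropagation = nil
		}
	}
}

func main() {
	mode := "Bidirectional"
	spec := PodSpec{VolumeMounts: []VolumeMount{{Name: "data", MountPropagation: &mode}}}
	dropDisabledAlphaFields(&spec)
	fmt.Println(spec.VolumeMounts[0].MountPropagation == nil) // true
}
```
Calling a helper like this from every strategy that embeds a pod spec keeps disabled alpha data from being persisted via Deployments, DaemonSets, Jobs, and so on.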
Addresses https://github.com/kubernetes/community/pull/869
Automatic merge from submit-queue (batch tested with PRs 51682, 51546, 51369, 50924, 51827)
kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy
**What this PR does / why we need it**:
In order to improve the UX when the kubelet is unhealthy or stopped, kubeadm now polls the kubelet's API after 40 and 60 seconds, and then performs an exponential backoff for a total wait of 155 seconds.
If the kubelet endpoint is not returning `ok` by then, kubeadm gives up and exits.
This will mitigate at least 60% of our "[apiclient] Created API client, waiting for control plane to come up" issues in the kubeadm issue tracker 🎉, as kubeadm now informs the user what's wrong and also doesn't deadlock like before.
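The shape of the check, as a stand-alone sketch (not kubeadm's actual code; kubeadm uses its own wait/backoff helpers and timings):
```go
// Rough sketch: poll the kubelet healthz endpoint with a growing delay and
// give up after a bounded total wait instead of blocking forever.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func kubeletHealthy(url string) bool {
	resp, err := http.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	url := "http://localhost:10255/healthz" // kubelet read-only healthz port
	deadline := time.Now().Add(155 * time.Second)
	delay := 5 * time.Second
	for time.Now().Before(deadline) {
		if kubeletHealthy(url) {
			fmt.Println("kubelet is healthy")
			return
		}
		time.Sleep(delay)
		if delay < 40*time.Second {
			delay *= 2 // exponential backoff between probes
		}
	}
	fmt.Fprintln(os.Stderr, "timed out waiting for the kubelet to become healthy")
	os.Exit(1)
}
```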
Demo:
```
lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 40.502199 seconds
[markmaster] Will mark node thegopher as master by adding a label and a taint
[markmaster] Master thegopher tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 5776d5.91e7ed14f9e274df
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run (as a regular user):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
http://kubernetes.io/docs/admin/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join --token 5776d5.91e7ed14f9e274df 192.168.1.115:6443 --discovery-token-ca-cert-hash sha256:6f301ce8c3f5f6558090b2c3599d26d6fc94ffa3c3565ffac952f4f0c7a9b2a9
lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
lucas@THEGOPHER:~/luxas/kubernetes$ sudo systemctl stop kubelet
lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by that:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection; so the kubelet can't pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.7.4
- gcr.io/google_containers/kube-controller-manager-amd64:v1.7.4
- gcr.io/google_containers/kube-scheduler-amd64:v1.7.4
You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
```
In this demo, I'm first starting kubeadm normally and everything works as usual.
In the second case, I'm explicitly stopping the kubelet so it doesn't run, and skipping preflight checks, so that kubeadm doesn't even try to run `systemctl start kubelet` as it usually does.
That obviously results in a non-working system, but now kubeadm tells the user what the problem is instead of waiting forever.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubeadm/issues/377
**Special notes for your reviewer**:
**Release note**:
```release-note
kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy
```
@kubernetes/sig-cluster-lifecycle-pr-reviews @pipejakob
cc @justinsb @kris-nova @lukemarsden as well as you wanted this feature :)
Automatic merge from submit-queue (batch tested with PRs 51682, 51546, 51369, 50924, 51827)
Remove duplicate fake and unused openapi
**What this PR does / why we need it**:
Follow-up on PR #50404
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
rsync IPVS proxier to the HEAD of iptables
**What this PR does / why we need it**:
There was a significant performance improvement made to the iptables proxier. Since the IPVS proxier makes use of iptables in some use cases, we should sync the IPVS proxier with the HEAD of the iptables proxier.
**Which issue this PR fixes** :
xref #51679
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51819, 51706, 51761, 51818, 51500)
fix kube-proxy panic because of nil sessionAffinityConfig
**What this PR does / why we need it**:
Fix a kube-proxy panic caused by a nil SessionAffinityConfig.
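For illustration, the class of fix is a nil guard before dereferencing the optional config. A stand-alone sketch with stand-in structs mirroring the Service API shape (not the actual kube-proxy code):
```go
// Illustrative nil-guard only: the panic came from dereferencing
// SessionAffinityConfig fields when they were unset on the Service.
package main

import "fmt"

type ClientIPConfig struct{ TimeoutSeconds *int32 }
type SessionAffinityConfig struct{ ClientIP *ClientIPConfig }
type ServiceSpec struct{ SessionAffinityConfig *SessionAffinityConfig }

// Stand-in default; the API defaults ClientIP affinity timeout to 3 hours.
const defaultTimeoutSeconds int32 = 10800

func affinityTimeout(spec ServiceSpec) int32 {
	if spec.SessionAffinityConfig != nil &&
		spec.SessionAffinityConfig.ClientIP != nil &&
		spec.SessionAffinityConfig.ClientIP.TimeoutSeconds != nil {
		return *spec.SessionAffinityConfig.ClientIP.TimeoutSeconds
	}
	return defaultTimeoutSeconds
}

func main() {
	fmt.Println(affinityTimeout(ServiceSpec{})) // 10800, no panic on nil config
}
```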
**Which issue this PR fixes**: closes #51499
**Special notes for your reviewer**:
I apologize that this bug was introduced by #49850 :(
@thockin @smarterclayton @gnufied
**Release note**:
```release-note
NONE
```