mirror of https://github.com/k3s-io/kubernetes.git synced 2025-08-22 01:56:16 +00:00

Production-Grade Container Scheduling and Management

Go to file

Kubernetes Submit Queue e528a6e785 Merge pull request #51369 from luxas/kubeadm_poll_kubelet Automatic merge from submit-queue (batch tested with PRs 51682, 51546, 51369, 50924, 51827) kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy What this PR does / why we need it: In order to improve the UX when the kubelet is unhealthy or stopped, or whatever, kubeadm now polls the kubelet's API after 40 and 60 seconds, and then performs an exponential backoff for a total of 155 seconds. If the kubelet endpoint is not returning `ok` by then, kubeadm gives up and exits. This will miligate at least 60% of our "[apiclient] Created API client, waiting for control plane to come up" issues in the kubeadm issue tracker 🎉, as kubeadm now informs the user what's wrong and also doesn't deadlock like before. Demo: ``` lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks [kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters. [init] Using Kubernetes version: v1.7.4 [init] Using Authorization modes: [Node RBAC] [preflight] Skipping pre-flight checks [kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0) [certificates] Generated ca certificate and key. [certificates] Generated apiserver certificate and key. [certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115] [certificates] Generated apiserver-kubelet-client certificate and key. [certificates] Generated sa key and public key. [certificates] Generated front-proxy-ca certificate and key. [certificates] Generated front-proxy-client certificate and key. [certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki" [kubeconfig] Wrote KubeConfig file to disk: "admin.conf" [kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf" [kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf" [kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf" [controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml" [controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml" [controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml" [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml" [init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" [init] This often takes around a minute; or longer if the control plane images have to be pulled. [apiclient] All control plane components are healthy after 40.502199 seconds [markmaster] Will mark node thegopher as master by adding a label and a taint [markmaster] Master thegopher tainted and labelled with key/value: node-role.kubernetes.io/master="" [bootstraptoken] Using token: 5776d5.91e7ed14f9e274df [bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials [bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token [bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace [uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace [addons] Applied essential addon: kube-dns [addons] Applied essential addon: kube-proxy Your Kubernetes master has initialized successfully! To start using your cluster, you need to run (as a regular user): mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: http://kubernetes.io/docs/admin/addons/ You can now join any number of machines by running the following on each node as root: kubeadm join --token 5776d5.91e7ed14f9e274df 192.168.1.115:6443 --discovery-token-ca-cert-hash sha256:6f301ce8c3f5f6558090b2c3599d26d6fc94ffa3c3565ffac952f4f0c7a9b2a9 lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm reset [preflight] Running pre-flight checks [reset] Stopping the kubelet service [reset] Unmounting mounted directories in "/var/lib/kubelet" [reset] Removing kubernetes-managed containers [reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd] [reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki] [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf] lucas@THEGOPHER:~/luxas/kubernetes$ sudo systemctl stop kubelet lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks [kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters. [init] Using Kubernetes version: v1.7.4 [init] Using Authorization modes: [Node RBAC] [preflight] Skipping pre-flight checks [kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0) [certificates] Generated ca certificate and key. [certificates] Generated apiserver certificate and key. [certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115] [certificates] Generated apiserver-kubelet-client certificate and key. [certificates] Generated sa key and public key. [certificates] Generated front-proxy-ca certificate and key. [certificates] Generated front-proxy-client certificate and key. [certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki" [kubeconfig] Wrote KubeConfig file to disk: "admin.conf" [kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf" [kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf" [kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf" [controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml" [controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml" [controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml" [etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml" [init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" [init] This often takes around a minute; or longer if the control plane images have to be pulled. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused. [kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused. Unfortunately, an error has occurred: timed out waiting for the condition This error is likely caused by that: - The kubelet is not running - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled) - There is no internet connection; so the kubelet can't pull the following control plane images: - gcr.io/google_containers/kube-apiserver-amd64:v1.7.4 - gcr.io/google_containers/kube-controller-manager-amd64:v1.7.4 - gcr.io/google_containers/kube-scheduler-amd64:v1.7.4 You can troubleshoot this for example with the following commands if you're on a systemd-powered system: - 'systemctl status kubelet' - 'journalctl -xeu kubelet' couldn't initialize a Kubernetes cluster ``` In this demo, I'm first starting kubeadm normally and everything works as usual. In the second case, I'm explicitely stopping the kubelet so it doesn't run, and skipping preflight checks, so that kubeadm doesn't even try to exec `systemctl start kubelet` like it does usually. That obviously results in a non-working system, but now kubeadm tells the user what's the problem instead of waiting forever. Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes # Fixes: https://github.com/kubernetes/kubeadm/issues/377 Special notes for your reviewer: Release note: ```release-note kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy ``` @kubernetes/sig-cluster-lifecycle-pr-reviews @pipejakob cc @justinsb @kris-nova @lukemarsden as well as you wanted this feature :)		2017-09-03 15:54:19 -07:00
.github	Merge pull request #46714 from castrojo/new-issue-template	2017-06-22 16:43:47 -07:00
api	update bazel	2017-09-03 02:18:14 -07:00
build	Merge pull request #50597 from dixudx/qemu_upgrade_2.9.1	2017-09-03 03:24:53 -07:00
cluster	Merge pull request #51237 from gunjan5/calico-2.5-rbac	2017-09-03 14:01:04 -07:00
cmd	Merge pull request #51369 from luxas/kubeadm_poll_kubelet	2017-09-03 15:54:19 -07:00
docs	Merge pull request #48921 from smarterclayton/paging_prototype	2017-09-02 19:26:29 -07:00
examples	Replicate the persistent volume label admission plugin in a controller in	2017-08-28 03:12:18 -04:00
federation	Merge pull request #51638 from mfojtik/client-gen-custom-methods	2017-09-03 11:10:09 -07:00
Godeps	Merge pull request #51819 from cblecker/godep-license-darwin	2017-09-03 15:00:00 -07:00
hack	Merge pull request #51819 from cblecker/godep-license-darwin	2017-09-03 15:00:00 -07:00
logo
pkg	Merge pull request #51546 from apelisse/remove-duplicate-fake-openapi	2017-09-03 15:54:16 -07:00
plugin	Merge pull request #51818 from liggitt/controller-roles	2017-09-03 15:00:11 -07:00
staging	Merge pull request #51500 from m1093782566/fix-kube-proxy-panic	2017-09-03 15:00:15 -07:00
test	Merge pull request #51153 from clamoriniere1A/feature/job_failure_policy_controller	2017-09-03 13:13:17 -07:00
third_party	Use buildozer to remove deprecated automanaged tags	2017-08-11 09:31:50 -07:00
translations	Italian translation	2017-08-23 08:45:26 +02:00
vendor	Update Godep for kube-openapi	2017-09-03 02:16:08 -07:00
.bazelrc
.generated_files
.gitattributes
.gitignore
.kazelcfg.json	Switch from gazel to kazel, and move kazelcfg into build/root	2017-07-18 12:48:51 -07:00
BUILD.bazel
CHANGELOG.md	Fix changelog to add discovery/controller-manager fixes.	2017-08-31 10:31:19 -07:00
code-of-conduct.md
CONTRIBUTING.md
labels.yaml	Update the label manifest with new do-not-merge labels	2017-08-30 16:23:54 -07:00
LICENSE
Makefile
Makefile.generated_files
OWNERS	Fix my incorrect username in #46649	2017-08-10 11:59:54 -07:00
OWNERS_ALIASES	add sig leads to owners-aliases	2017-08-23 14:35:20 -07:00
README.md	update submit-queue URL in README.md	2017-08-01 20:36:17 -07:00
SUPPORT.md	Add a SUPPORT.md file for github	2017-08-11 14:42:36 -04:00
Vagrantfile
WORKSPACE

README.md

Kubernetes

Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.

To start using Kubernetes

See our documentation on kubernetes.io.

Try our interactive tutorial.

Take a free course on Scalable Microservices with Kubernetes.

To start developing Kubernetes

The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.

If you want to build Kubernetes right away there are two options:

You have a working Go environment.

$ go get -d k8s.io/kubernetes
$ cd $GOPATH/src/k8s.io/kubernetes
$ make

You have a working Docker environment.

$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make quick-release

If you are less impatient, head over to the developer's documentation.

Support

If you need support, start with the troubleshooting guide and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another.