Merge pull request #11317 from JanetKuo/docs-troubleshooting
Organize troubleshooting sections
commit
b3835665ee
@ -43,6 +43,8 @@ certainly want the docs that go with that version.</h1>
|
|||||||
* There are example files and walkthroughs in the [examples](../examples/)
|
* There are example files and walkthroughs in the [examples](../examples/)
|
||||||
folder.
|
folder.
|
||||||
|
|
||||||
|
* If something went wrong, see the [troubleshooting](troubleshooting.md) document for how to debug.
|
||||||
|
|
||||||
|
|
||||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||||
[]()
|
[]()
|
||||||
|
@ -22,7 +22,7 @@ certainly want the docs that go with that version.</h1>
|
|||||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||||
# Cluster Troubleshooting
|
# Cluster Troubleshooting
|
||||||
Most of the time, if you encounter problems, it is your application that is having problems. For application
|
Most of the time, if you encounter problems, it is your application that is having problems. For application
|
||||||
problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md).
|
problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md). You may also visit the [troubleshooting document](../troubleshooting.md) for more information.
|
||||||
|
|
||||||
## Listing your cluster
|
## Listing your cluster
|
||||||
The first thing to debug in your cluster is if your nodes are all registered correctly.
|
The first thing to debug in your cluster is if your nodes are all registered correctly.
|
||||||
|
@ -20,7 +20,7 @@ certainly want the docs that go with that version.</h1>
|
|||||||
<!-- END STRIP_FOR_RELEASE -->
|
<!-- END STRIP_FOR_RELEASE -->
|
||||||
|
|
||||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||||
# Application Troubleshooting.
|
# Application Troubleshooting
|
||||||
|
|
||||||
This guide is to help users debug applications that are deployed into Kubernetes and not behaving correctly.
|
This guide is to help users debug applications that are deployed into Kubernetes and not behaving correctly.
|
||||||
This is *not* a guide for people who want to debug their cluster. For that you should check out
|
This is *not* a guide for people who want to debug their cluster. For that you should check out
|
||||||
@ -28,10 +28,18 @@ This is *not* a guide for people who want to debug their cluster. For that you
|
|||||||
|
|
||||||
**Table of Contents**
|
**Table of Contents**
|
||||||
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||||
- [Application Troubleshooting.](#application-troubleshooting.)
|
- [Application Troubleshooting](#application-troubleshooting)
|
||||||
- [FAQ](#faq)
|
- [FAQ](#faq)
|
||||||
- [Diagnosing the problem](#diagnosing-the-problem)
|
- [Diagnosing the problem](#diagnosing-the-problem)
|
||||||
- [Debugging Pods](#debugging-pods)
|
- [Debugging Pods](#debugging-pods)
|
||||||
|
- [My pod stays pending](#my-pod-stays-pending)
|
||||||
|
- [My pod stays waiting](#my-pod-stays-waiting)
|
||||||
|
- [My pod is crashing or otherwise unhealthy](#my-pod-is-crashing-or-otherwise-unhealthy)
|
||||||
|
- [Debugging Replication Controllers](#debugging-replication-controllers)
|
||||||
|
- [Debugging Services](#debugging-services)
|
||||||
|
- [My service is missing endpoints](#my-service-is-missing-endpoints)
|
||||||
|
- [Network traffic is not forwarded](#network-traffic-is-not-forwarded)
|
||||||
|
- [More information](#more-information)
|
||||||
|
|
||||||
<!-- END MUNGE: GENERATED_TOC -->
|
<!-- END MUNGE: GENERATED_TOC -->
|
||||||
|
|
||||||
@ -46,46 +54,40 @@ your Service?
|
|||||||
* [Debugging Services](#debugging-services)
|
* [Debugging Services](#debugging-services)
|
||||||
|
|
||||||
### Debugging Pods
|
### Debugging Pods
|
||||||
The first step in debugging a Pod is taking a look at it. For the purposes of example, imagine we have a pod
|
The first step in debugging a Pod is taking a look at it. Check the current state of the Pod and recent events with the following command:
|
||||||
```my-pod``` which holds two containers ```container-1``` and ```container-2```
|
|
||||||
|
|
||||||
First, describe the pod. This will show the current state of the Pod and recent events.
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
export POD_NAME=my-pod
|
|
||||||
kubectl describe pods ${POD_NAME}
|
kubectl describe pods ${POD_NAME}
|
||||||
```
|
```
|
||||||
|
|
||||||
Look at the state of the containers in the pod. Are they all ```Running```? Have there been recent restarts?
|
Look at the state of the containers in the pod. Are they all ```Running```? Have there been recent restarts?
|
||||||
|
|
||||||
Depending on the state of the pod, you may want to:
|
Continue debugging depending on the state of the pods.
|
||||||
* [Debug a pending pod](#debugging-pending-pods)
|
|
||||||
* [Debug a waiting pod](#debugging-waiting-pods)
|
|
||||||
* [Debug a crashing pod](#debugging-crashing-pods-or-otherwise-unhealthy-pods)
|
|
||||||
|
|
||||||
#### Debuging Pending Pods
|
#### My pod stays pending
|
||||||
If a Pod is stuck in ```Pending``` it means that it can not be scheduled onto a node. Generally this is because
|
If a Pod is stuck in ```Pending``` it means that it can not be scheduled onto a node. Generally this is because
|
||||||
there are insufficient resources of one type or another that prevent scheduling. Look at the output of the
|
there are insufficient resources of one type or another that prevent scheduling. Look at the output of the
|
||||||
```kubectl describe ...``` command above. There should be messages from the scheduler about why it can not schedule
|
```kubectl describe ...``` command above. There should be messages from the scheduler about why it can not schedule
|
||||||
your pod. Reasons include:
|
your pod. Reasons include:
|
||||||
|
|
||||||
You don't have enough resources. You may have exhausted the supply of CPU or Memory in your cluster, in this case
|
* **You don't have enough resources**: You may have exhausted the supply of CPU or Memory in your cluster, in this case
|
||||||
you need to delete Pods, adjust resource requests, or add new nodes to your cluster.
|
you need to delete Pods, adjust resource requests, or add new nodes to your cluster. See [Compute Resources document](compute-resources.md#my-pods-are-pending-with-event-message-failedscheduling) for more information.
|
||||||
|
|
||||||
You are using ```hostPort```. When you bind a Pod to a ```hostPort``` there are a limited number of places that pod can be
|
* **You are using ```hostPort```**: When you bind a Pod to a ```hostPort``` there are a limited number of places that pod can be
|
||||||
scheduled. In most cases, ```hostPort``` is unnecessary, try using a Service object to expose your Pod. If you do require
|
scheduled. In most cases, ```hostPort``` is unnecessary, try using a Service object to expose your Pod. If you do require
|
||||||
```hostPort``` then you can only schedule as many Pods as there are nodes in your Kubernetes cluster.
|
```hostPort``` then you can only schedule as many Pods as there are nodes in your Kubernetes cluster.
|
||||||
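The two checks above (scheduler messages for the pod, and whether CPU or memory is exhausted on the nodes) can be gathered in one place. This is a minimal sketch, not part of the original guide: the function name `pending_pod_report` is hypothetical, and it only wraps `kubectl describe`.

```sh
# Sketch only: collect the information needed to explain a Pending pod.
# `pending_pod_report` is a hypothetical helper name, not a kubectl feature.
pending_pod_report() {
  # $1 is the pod name; scheduler messages appear in the events section.
  kubectl describe pods "$1"
  # Per-node capacity and the pods already placed on each node.
  kubectl describe nodes
}
```

Calling `pending_pod_report ${POD_NAME}` prints the pod description first, followed by the node descriptions.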
|
|
||||||
|
|
||||||
#### Debugging Waiting Pods
|
#### My pod stays waiting
|
||||||
If a Pod is stuck in the ```Waiting``` state, then it has been scheduled to a worker node, but it can't run on that machine.
|
If a Pod is stuck in the ```Waiting``` state, then it has been scheduled to a worker node, but it can't run on that machine.
|
||||||
Again, the information from ```kubectl describe ...``` should be informative. The most common cause of ```Waiting``` pods
|
Again, the information from ```kubectl describe ...``` should be informative. The most common cause of ```Waiting``` pods is a failure to pull the image. There are three things to check:
|
||||||
is a failure to pull the image. Make sure that you have the name of the image correct. Have you pushed it to the repository?
|
* Make sure that you have the name of the image correct
|
||||||
Does it work if you run a manual ```docker pull <image>``` on your machine?
|
* Have you pushed the image to the repository?
|
||||||
|
* Run a manual ```docker pull <image>``` on your machine to see if the image can be pulled.
|
||||||
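The image checks above could be scripted roughly as follows. This is a sketch under assumptions: it presumes a kubectl version that supports `-o jsonpath` and a local `docker` client, and the helper names `pod_images` and `try_pull_images` are hypothetical.

```sh
# Sketch only: list the images a pod references, then try pulling each one.
# Assumes kubectl supports `-o jsonpath` and docker is installed locally.
pod_images() {
  # $1 is the pod name; prints every container image, space-separated.
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}

try_pull_images() {
  # Attempts a manual pull of each image and reports success or failure.
  for image in $(pod_images "$1"); do
    if docker pull "$image" >/dev/null 2>&1; then
      echo "ok: $image"
    else
      echo "FAILED: $image"
    fi
  done
}
```

Running `try_pull_images ${POD_NAME}` prints one line per image. A `FAILED` line suggests a wrong image name or an image that was never pushed.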
|
|
||||||
#### Debugging Crashing or otherwise unhealthy pods
|
#### My pod is crashing or otherwise unhealthy
|
||||||
|
|
||||||
Let's suppose that ```container-2``` has been crash looping and you don't know why, you can take a look at the logs of
|
First, take a look at the logs of
|
||||||
the current container:
|
the current container:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
@ -112,12 +114,12 @@ kubectl exec cassandra -- cat /var/log/cassandra/system.log
|
|||||||
|
|
||||||
|
|
||||||
If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host,
|
If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host,
|
||||||
but this should generally not be necessary given tools in the Kubernetes API. Indeed if you find yourself needing to ssh into a machine, please file a
|
but this should generally not be necessary given tools in the Kubernetes API. Therefore, if you find yourself needing to ssh into a machine, please file a
|
||||||
feature request on GitHub describing your use case and why these tools are insufficient.
|
feature request on GitHub describing your use case and why these tools are insufficient.
|
||||||
|
|
||||||
### Debugging Replication Controllers
|
### Debugging Replication Controllers
|
||||||
Replication controllers are fairly straightforward. They can either create Pods or they can't. If they can't
|
Replication controllers are fairly straightforward. They can either create Pods or they can't. If they can't
|
||||||
create pods, then please refer to the [instructions above](#debugging-pods)
|
create pods, then please refer to the [instructions above](#debugging-pods) to debug your pods.
|
||||||
|
|
||||||
You can also use ```kubectl describe rc ${CONTROLLER_NAME}``` to introspect events related to the replication
|
You can also use ```kubectl describe rc ${CONTROLLER_NAME}``` to introspect events related to the replication
|
||||||
controller.
|
controller.
|
||||||
@ -126,8 +128,7 @@ controller.
|
|||||||
Services provide load balancing across a set of pods. There are several common problems that can make Services
|
Services provide load balancing across a set of pods. There are several common problems that can make Services
|
||||||
not work properly. The following instructions should help debug Service problems.
|
not work properly. The following instructions should help debug Service problems.
|
||||||
|
|
||||||
#### Verify that there are endpoints for the service
|
First, verify that there are endpoints for the service. For every Service object, the apiserver makes an `endpoints` resource available.
|
||||||
For every Service object, the apiserver makes an ```endpoints`` resource available.
|
|
||||||
|
|
||||||
You can view this resource with:
|
You can view this resource with:
|
||||||
|
|
||||||
@ -139,7 +140,7 @@ Make sure that the endpoints match up with the number of containers that you exp
|
|||||||
For example, if your Service is for an nginx container with 3 replicas, you would expect to see three different
|
For example, if your Service is for an nginx container with 3 replicas, you would expect to see three different
|
||||||
IP addresses in the Service's endpoints.
|
IP addresses in the Service's endpoints.
|
||||||
|
|
||||||
#### Missing endpoints
|
#### My service is missing endpoints
|
||||||
If you are missing endpoints, try listing pods using the labels that Service uses. Imagine that you have
|
If you are missing endpoints, try listing pods using the labels that Service uses. Imagine that you have
|
||||||
a Service where the labels are:
|
a Service where the labels are:
|
||||||
```yaml
|
```yaml
|
||||||
@ -163,7 +164,7 @@ selected don't have that port listed, then they won't be added to the endpoints
|
|||||||
|
|
||||||
Verify that the pod's ```containerPort``` matches up with the Service's ```targetPort```
|
Verify that the pod's ```containerPort``` matches up with the Service's ```targetPort```
|
||||||
|
|
||||||
#### Network traffic isn't forwarded
|
#### Network traffic is not forwarded
|
||||||
If you can connect to the service, but the connection is immediately dropped, and there are endpoints
|
If you can connect to the service, but the connection is immediately dropped, and there are endpoints
|
||||||
in the endpoints list, it's likely that the proxy can't contact your pods.
|
in the endpoints list, it's likely that the proxy can't contact your pods.
|
||||||
|
|
||||||
@ -173,6 +174,11 @@ check:
|
|||||||
* Can you connect to your pods directly? Get the IP address for the Pod, and try to connect directly to that IP
|
* Can you connect to your pods directly? Get the IP address for the Pod, and try to connect directly to that IP
|
||||||
* Is your application serving on the port that you configured? Kubernetes doesn't do port remapping, so if your application serves on 8080, the ```containerPort``` field needs to be 8080.
|
* Is your application serving on the port that you configured? Kubernetes doesn't do port remapping, so if your application serves on 8080, the ```containerPort``` field needs to be 8080.
|
||||||
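A direct connection test along the lines of the checks above might look like this. It is a sketch under several assumptions: the application speaks HTTP, kubectl supports `-o jsonpath`, and a helper pod named `busybox-sleep` (as created in the Debugging Services document) is running in the cluster. The function names `pod_ip` and `probe_pod` are hypothetical.

```sh
# Sketch only: connect to a pod's IP directly, bypassing the Service.
# Assumes an HTTP application and a running helper pod named busybox-sleep.
pod_ip() {
  # $1 is the pod name; prints the IP address assigned to the pod.
  kubectl get pod "$1" -o jsonpath='{.status.podIP}'
}

probe_pod() {
  # $1 is the target pod, $2 is the port the application serves on.
  ip=$(pod_ip "$1")
  # Fetch from inside the cluster so the pod network is actually exercised.
  kubectl exec busybox-sleep -- wget -qO- "http://${ip}:$2"
}
```

For example, `probe_pod ${POD_NAME} 8080` should print the application's response if it is serving on 8080. If this works but the Service does not, the problem is more likely in the Service or proxy than in the pod.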
|
|
||||||
|
#### More information
|
||||||
|
If none of the above solves your problem, follow the instructions in [Debugging Service document](debugging-services.md) to make sure that your `Service` is running, has `Endpoints`, and your `Pods` are actually serving; you have DNS working, iptables rules installed, and kube-proxy does not seem to be misbehaving.
|
||||||
|
|
||||||
|
You may also visit [troubleshooting document](../troubleshooting.md) for more information.
|
||||||
|
|
||||||
|
|
||||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||||
[]()
|
[]()
|
||||||
|
@ -23,15 +23,19 @@ certainly want the docs that go with that version.</h1>
|
|||||||
# Compute Resources
|
# Compute Resources
|
||||||
|
|
||||||
**Table of Contents**
|
**Table of Contents**
|
||||||
- Compute Resources
|
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||||
|
- [Compute Resources](#compute-resources)
|
||||||
- [Container and Pod Resource Limits](#container-and-pod-resource-limits)
|
- [Container and Pod Resource Limits](#container-and-pod-resource-limits)
|
||||||
- [How Pods with Resource Limits are Scheduled](#how-pods-with-resource-limits-are-scheduled)
|
- [How Pods with Resource Limits are Scheduled](#how-pods-with-resource-limits-are-scheduled)
|
||||||
- [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
|
- [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
|
||||||
- [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
|
- [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
|
||||||
- [Troubleshooting](#troubleshooting)
|
- [Troubleshooting](#troubleshooting)
|
||||||
- [Detecting Resource Starved Containers](#detecting-resource-starved-containers)
|
- [My pods are pending with event message failedScheduling](#my-pods-are-pending-with-event-message-failedscheduling)
|
||||||
|
- [My container is terminated](#my-container-is-terminated)
|
||||||
- [Planned Improvements](#planned-improvements)
|
- [Planned Improvements](#planned-improvements)
|
||||||
|
|
||||||
|
<!-- END MUNGE: GENERATED_TOC -->
|
||||||
|
|
||||||
When specifying a [pod](pods.md), you can optionally specify how much CPU and memory (RAM) each
|
When specifying a [pod](pods.md), you can optionally specify how much CPU and memory (RAM) each
|
||||||
container needs. When containers have resource limits, the scheduler is able to make better
|
container needs. When containers have resource limits, the scheduler is able to make better
|
||||||
decisions about which nodes to place pods on, and contention for resources can be handled in a
|
decisions about which nodes to place pods on, and contention for resources can be handled in a
|
||||||
@ -134,6 +138,7 @@ then pod resource usage can be retrieved from the monitoring system.
|
|||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
|
### My pods are pending with event message failedScheduling
|
||||||
If the scheduler cannot find any node where a pod can fit, then the pod will remain unscheduled
|
If the scheduler cannot find any node where a pod can fit, then the pod will remain unscheduled
|
||||||
until a place can be found. An event will be produced each time the scheduler fails to find a
|
until a place can be found. An event will be produced each time the scheduler fails to find a
|
||||||
place for the pod, like this:
|
place for the pod, like this:
|
||||||
@ -159,8 +164,8 @@ The [resource quota](../admin/resource-quota.md) feature can be configured
|
|||||||
to limit the total amount of resources that can be consumed. If used in conjunction
|
to limit the total amount of resources that can be consumed. If used in conjunction
|
||||||
with namespaces, it can prevent one team from hogging all the resources.
|
with namespaces, it can prevent one team from hogging all the resources.
|
||||||
|
|
||||||
### Detecting Resource Starved Containers
|
### My container is terminated
|
||||||
To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
|
Your container may be terminated because it's resource-starved. To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
|
||||||
on the pod you are interested in:
|
on the pod you are interested in:
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -20,13 +20,35 @@ certainly want the docs that go with that version.</h1>
|
|||||||
<!-- END STRIP_FOR_RELEASE -->
|
<!-- END STRIP_FOR_RELEASE -->
|
||||||
|
|
||||||
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
||||||
# My Service isn't working - how to debug
|
# My Service is not working - how to debug
|
||||||
|
|
||||||
An issue that comes up rather frequently for new installations of Kubernetes is
|
An issue that comes up rather frequently for new installations of Kubernetes is
|
||||||
that `Services` are not working properly. You've run all your `Pod`s and
|
that `Services` are not working properly. You've run all your `Pod`s and
|
||||||
`ReplicationController`s, but you get no response when you try to access them.
|
`ReplicationController`s, but you get no response when you try to access them.
|
||||||
This document will hopefully help you to figure out what's going wrong.
|
This document will hopefully help you to figure out what's going wrong.
|
||||||
|
|
||||||
|
**Table of Contents**
|
||||||
|
<!-- BEGIN MUNGE: GENERATED_TOC -->
|
||||||
|
- [My Service is not working - how to debug](#my-service-is-not-working---how-to-debug)
|
||||||
|
- [Conventions](#conventions)
|
||||||
|
- [Running commands in a Pod](#running-commands-in-a-pod)
|
||||||
|
- [Setup](#setup)
|
||||||
|
- [Does the Service exist?](#does-the-service-exist)
|
||||||
|
- [Does the Service work by DNS?](#does-the-service-work-by-dns)
|
||||||
|
- [Does any Service exist in DNS?](#does-any-service-exist-in-dns)
|
||||||
|
- [Does the Service work by IP?](#does-the-service-work-by-ip)
|
||||||
|
- [Is the Service correct?](#is-the-service-correct)
|
||||||
|
- [Does the Service have any Endpoints?](#does-the-service-have-any-endpoints)
|
||||||
|
- [Are the Pods working?](#are-the-pods-working)
|
||||||
|
- [Is the kube-proxy working?](#is-the-kube-proxy-working)
|
||||||
|
- [Is kube-proxy running?](#is-kube-proxy-running)
|
||||||
|
- [Is kube-proxy writing iptables rules?](#is-kube-proxy-writing-iptables-rules)
|
||||||
|
- [Is kube-proxy proxying?](#is-kube-proxy-proxying)
|
||||||
|
- [Seek help](#seek-help)
|
||||||
|
- [More information](#more-information)
|
||||||
|
|
||||||
|
<!-- END MUNGE: GENERATED_TOC -->
|
||||||
|
|
||||||
## Conventions
|
## Conventions
|
||||||
|
|
||||||
Throughout this doc you will see various commands that you can run. Some
|
Throughout this doc you will see various commands that you can run. Some
|
||||||
@ -57,33 +79,32 @@ OUTPUT
|
|||||||
|
|
||||||
## Running commands in a Pod
|
## Running commands in a Pod
|
||||||
|
|
||||||
For many steps here will will want to see what a `Pod` running in the cluster
|
For many steps here you will want to see what a `Pod` running in the cluster
|
||||||
sees. Kubernetes does not directly support interactive `Pod`s (yet), but you can
|
sees. Kubernetes does not directly support interactive `Pod`s (yet), but you can
|
||||||
approximate it:
|
approximate it:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
$ cat <<EOF | kubectl create -f -
|
$ cat <<EOF | kubectl create -f -
|
||||||
> apiVersion: v1
|
apiVersion: v1
|
||||||
> kind: Pod
|
kind: Pod
|
||||||
> metadata:
|
metadata:
|
||||||
> name: busybox-sleep
|
name: busybox-sleep
|
||||||
> spec:
|
spec:
|
||||||
> containers:
|
containers:
|
||||||
> - name: busybox
|
- name: busybox
|
||||||
> image: busybox
|
image: busybox
|
||||||
> args:
|
args:
|
||||||
> - sleep
|
- sleep
|
||||||
> - "1000000"
|
- "1000000"
|
||||||
> EOF
|
EOF
|
||||||
pods/busybox-sleep
|
pods/busybox-sleep
|
||||||
```
|
```
|
||||||
|
|
||||||
Now, when you need to run a command (even an interactive shell) in a `Pod`-like
|
Now, when you need to run a command (even an interactive shell) in a `Pod`-like
|
||||||
context:
|
context, use:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
$ kubectl exec busybox-sleep hostname
|
$ kubectl exec busybox-sleep -- <COMMAND>
|
||||||
busybox-sleep
|
|
||||||
```
|
```
|
||||||
|
|
||||||
or
|
or
|
||||||
@ -281,7 +302,7 @@ debugging your own `Service`, debug DNS.
|
|||||||
|
|
||||||
## Does the Service work by IP?
|
## Does the Service work by IP?
|
||||||
|
|
||||||
The next thing to test is whether your `Service` worksat all. From a
|
The next thing to test is whether your `Service` works at all. From a
|
||||||
`Node` in your cluster, access the `Service`'s IP (from `kubectl get` above).
|
`Node` in your cluster, access the `Service`'s IP (from `kubectl get` above).
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
@ -510,6 +531,9 @@ Contact us on
|
|||||||
[email](https://groups.google.com/forum/#!forum/google-containers) or
|
[email](https://groups.google.com/forum/#!forum/google-containers) or
|
||||||
[GitHub](https://github.com/GoogleCloudPlatform/kubernetes).
|
[GitHub](https://github.com/GoogleCloudPlatform/kubernetes).
|
||||||
|
|
||||||
|
## More information
|
||||||
|
Visit [troubleshooting document](../troubleshooting.md) for more information.
|
||||||
|
|
||||||
|
|
||||||
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
||||||
[]()
|
[]()
|
||||||
|