Merge pull request #11317 from JanetKuo/docs-troubleshooting

Organize troubleshooting sections
Eric Tune 2015-07-15 15:01:02 -07:00
commit b3835665ee
5 changed files with 87 additions and 50 deletions

@@ -43,6 +43,8 @@ certainly want the docs that go with that version.</h1>
 * There are example files and walkthroughs in the [examples](../examples/)
 folder.
+* If something went wrong, see the [troubleshooting](troubleshooting.md) document for how to debug.
 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
 [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/README.md?pixel)]()

@@ -22,7 +22,7 @@ certainly want the docs that go with that version.</h1>
 <!-- END MUNGE: UNVERSIONED_WARNING -->
 # Cluster Troubleshooting
 Most of the time, if you encounter problems, it is your application that is having problems. For application
-problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md).
+problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md). You may also visit [troubleshooting document](../troubleshooting.md) for more information.
 ## Listing your cluster
 The first thing to debug in your cluster is if your nodes are all registered correctly.

@@ -20,7 +20,7 @@ certainly want the docs that go with that version.</h1>
 <!-- END STRIP_FOR_RELEASE -->
 <!-- END MUNGE: UNVERSIONED_WARNING -->
-# Application Troubleshooting.
+# Application Troubleshooting
 This guide is to help users debug applications that are deployed into Kubernetes and not behaving correctly.
 This is *not* a guide for people who want to debug their cluster. For that you should check out
@@ -28,10 +28,18 @@ This is *not* a guide for people who want to debug their cluster. For that you
 **Table of Contents**
 <!-- BEGIN MUNGE: GENERATED_TOC -->
-- [Application Troubleshooting.](#application-troubleshooting.)
+- [Application Troubleshooting](#application-troubleshooting)
 - [FAQ](#faq)
 - [Diagnosing the problem](#diagnosing-the-problem)
 - [Debugging Pods](#debugging-pods)
+- [My pod stays pending](#my-pod-stays-pending)
+- [My pod stays waiting](#my-pod-stays-waiting)
+- [My pod is crashing or otherwise unhealthy](#my-pod-is-crashing-or-otherwise-unhealthy)
+- [Debugging Replication Controllers](#debugging-replication-controllers)
+- [Debugging Services](#debugging-services)
+- [My service is missing endpoints](#my-service-is-missing-endpoints)
+- [Network traffic is not forwarded](#network-traffic-is-not-forwarded)
+- [More information](#more-information)
 <!-- END MUNGE: GENERATED_TOC -->
@@ -46,46 +54,40 @@ your Service?
 * [Debugging Services](#debugging-services)
 ### Debugging Pods
-The first step in debugging a Pod is taking a look at it. For the purposes of example, imagine we have a pod
+The first step in debugging a Pod is taking a look at it. Check the current state of the Pod and recent events with the following command:
-```my-pod``` which holds two containers ```container-1``` and ```container-2```
-First, describe the pod. This will show the current state of the Pod and recent events.
 ```sh
-export POD_NAME=my-pod
 kubectl describe pods ${POD_NAME}
 ```
 Look at the state of the containers in the pod. Are they all ```Running```? Have there been recent restarts?
-Depending on the state of the pod, you may want to:
+Continue debugging depending on the state of the pods.
-* [Debug a pending pod](#debugging-pending-pods)
-* [Debug a waiting pod](#debugging-waiting-pods)
-* [Debug a crashing pod](#debugging-crashing-pods-or-otherwise-unhealthy-pods)
-#### Debuging Pending Pods
+#### My pod stays pending
 If a Pod is stuck in ```Pending``` it means that it can not be scheduled onto a node. Generally this is because
 there are insufficient resources of one type or another that prevent scheduling. Look at the output of the
 ```kubectl describe ...``` command above. There should be messages from the scheduler about why it can not schedule
 your pod. Reasons include:
-You don't have enough resources. You may have exhausted the supply of CPU or Memory in your cluster, in this case
+* **You don't have enough resources**: You may have exhausted the supply of CPU or Memory in your cluster, in this case
-you need to delete Pods, adjust resource requests, or add new nodes to your cluster.
+you need to delete Pods, adjust resource requests, or add new nodes to your cluster. See [Compute Resources document](compute-resources.md#my-pods-are-pending-with-event-message-failedscheduling) for more information.
-You are using ```hostPort```. When you bind a Pod to a ```hostPort``` there are a limited number of places that pod can be
+* **You are using ```hostPort```**: When you bind a Pod to a ```hostPort``` there are a limited number of places that pod can be
 scheduled. In most cases, ```hostPort``` is unnecessary, try using a Service object to expose your Pod. If you do require
 ```hostPort``` then you can only schedule as many Pods as there are nodes in your Kubernetes cluster.
-#### Debugging Waiting Pods
+#### My pod stays waiting
 If a Pod is stuck in the ```Waiting``` state, then it has been scheduled to a worker node, but it can't run on that machine.
-Again, the information from ```kubectl describe ...``` should be informative. The most common cause of ```Waiting``` pods
+Again, the information from ```kubectl describe ...``` should be informative. The most common cause of ```Waiting``` pods is a failure to pull the image. There are three things to check:
-is a failure to pull the image. Make sure that you have the name of the image correct. Have you pushed it to the repository?
+* Make sure that you have the name of the image correct
-Does it work if you run a manual ```docker pull <image>``` on your machine?
+* Have you pushed the image to the repository?
+* Run a manual ```docker pull <image>``` on your machine to see if the image can be pulled.
-#### Debugging Crashing or otherwise unhealthy pods
+#### My pod is crashing or otherwise unhealthy
-Let's suppose that ```container-2``` has been crash looping and you don't know why, you can take a look at the logs of
+First, take a look at the logs of
 the current container:
 ```sh
@@ -112,12 +114,12 @@ kubectl exec cassandra -- cat /var/log/cassandra/system.log
 If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host,
-but this should generally not be necessary given tools in the Kubernetes API. Indeed if you find yourself needing to ssh into a machine, please file a
+but this should generally not be necessary given tools in the Kubernetes API. Therefore, if you find yourself needing to ssh into a machine, please file a
 feature request on GitHub describing your use case and why these tools are insufficient.
 ### Debugging Replication Controllers
 Replication controllers are fairly straightforward. They can either create Pods or they can't. If they can't
-create pods, then please refer to the [instructions above](#debugging-pods)
+create pods, then please refer to the [instructions above](#debugging-pods) to debug your pods.
 You can also use ```kubectl describe rc ${CONTROLLER_NAME}``` to introspect events related to the replication
 controller.
@@ -126,8 +128,7 @@ controller.
 Services provide load balancing across a set of pods. There are several common problems that can make Services
 not work properly. The following instructions should help debug Service problems.
-#### Verify that there are endpoints for the service
-For every Service object, the apiserver makes an ```endpoints`` resource available.
+First, verify that there are endpoints for the service. For every Service object, the apiserver makes an `endpoints` resource available.
 You can view this resource with:
@@ -139,7 +140,7 @@ Make sure that the endpoints match up with the number of containers that you exp
 For example, if your Service is for an nginx container with 3 replicas, you would expect to see three different
 IP addresses in the Service's endpoints.
-#### Missing endpoints
+#### My service is missing endpoints
 If you are missing endpoints, try listing pods using the labels that Service uses. Imagine that you have
 a Service where the labels are:
 ```yaml
@@ -163,7 +164,7 @@ selected don't have that port listed, then they won't be added to the endpoints
 Verify that the pod's ```containerPort``` matches up with the Service's ```containerPort```
-#### Network traffic isn't forwarded
+#### Network traffic is not forwarded
 If you can connect to the service, but the connection is immediately dropped, and there are endpoints
 in the endpoints list, it's likely that the proxy can't contact your pods.
@@ -173,6 +174,11 @@ check:
 * Can you connect to your pods directly? Get the IP address for the Pod, and try to connect directly to that IP
 * Is your application serving on the port that you configured? Kubernetes doesn't do port remapping, so if your application serves on 8080, the ```containerPort``` field needs to be 8080.
+#### More information
+If none of the above solves your problem, follow the instructions in [Debugging Service document](debugging-services.md) to make sure that your `Service` is running, has `Endpoints`, and your `Pods` are actually serving; you have DNS working, iptables rules installed, and kube-proxy does not seem to be misbehaving.
+You may also visit [troubleshooting document](../troubleshooting.md) for more information.
 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
 [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/user-guide/application-troubleshooting.md?pixel)]()

@@ -23,15 +23,19 @@ certainly want the docs that go with that version.</h1>
 # Compute Resources
 ** Table of Contents**
-- Compute Resources
+<!-- BEGIN MUNGE: GENERATED_TOC -->
+- [Compute Resources](#compute-resources)
 - [Container and Pod Resource Limits](#container-and-pod-resource-limits)
 - [How Pods with Resource Limits are Scheduled](#how-pods-with-resource-limits-are-scheduled)
 - [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
 - [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
 - [Troubleshooting](#troubleshooting)
-- [Detecting Resource Starved Containers](#detecting-resource-starved-containers)
+- [My pods are pending with event message failedScheduling](#my-pods-are-pending-with-event-message-failedscheduling)
+- [My container is terminated](#my-container-is-terminated)
 - [Planned Improvements](#planned-improvements)
+<!-- END MUNGE: GENERATED_TOC -->
 When specifying a [pod](pods.md), you can optionally specify how much CPU and memory (RAM) each
 container needs. When containers have resource limits, the scheduler is able to make better
 decisions about which nodes to place pods on, and contention for resources can be handled in a
@@ -134,6 +138,7 @@ then pod resource usage can be retrieved from the monitoring system.
 ## Troubleshooting
+### My pods are pending with event message failedScheduling
 If the scheduler cannot find any node where a pod can fit, then the pod will remain unscheduled
 until a place can be found. An event will be produced each time the scheduler fails to find a
 place for the pod, like this:
@@ -159,8 +164,8 @@ The [resource quota](../admin/resource-quota.md) feature can be configured
 to limit the total amount of resources that can be consumed. If used in conjunction
 with namespaces, it can prevent one team from hogging all the resources.
-### Detecting Resource Starved Containers
+### My container is terminated
-To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
+Your container may be terminated because it's resource-starved. To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
 on the pod you are interested in:
 ```

@@ -20,13 +20,35 @@ certainly want the docs that go with that version.</h1>
 <!-- END STRIP_FOR_RELEASE -->
 <!-- END MUNGE: UNVERSIONED_WARNING -->
-# My Service isn't working - how to debug
+# My Service is not working - how to debug
 An issue that comes up rather frequently for new installations of Kubernetes is
 that `Services` are not working properly. You've run all your `Pod`s and
 `ReplicationController`s, but you get no response when you try to access them.
 This document will hopefully help you to figure out what's going wrong.
+**Table of Contents**
+<!-- BEGIN MUNGE: GENERATED_TOC -->
+- [My Service is not working - how to debug](#my-service-is-not-working---how-to-debug)
+- [Conventions](#conventions)
+- [Running commands in a Pod](#running-commands-in-a-pod)
+- [Setup](#setup)
+- [Does the Service exist?](#does-the-service-exist)
+- [Does the Service work by DNS?](#does-the-service-work-by-dns)
+- [Does any Service exist in DNS?](#does-any-service-exist-in-dns)
+- [Does the Service work by IP?](#does-the-service-work-by-ip)
+- [Is the Service correct?](#is-the-service-correct)
+- [Does the Service have any Endpoints?](#does-the-service-have-any-endpoints)
+- [Are the Pods working?](#are-the-pods-working)
+- [Is the kube-proxy working?](#is-the-kube-proxy-working)
+- [Is kube-proxy running?](#is-kube-proxy-running)
+- [Is kube-proxy writing iptables rules?](#is-kube-proxy-writing-iptables-rules)
+- [Is kube-proxy proxying?](#is-kube-proxy-proxying)
+- [Seek help](#seek-help)
+- [More information](#more-information)
+<!-- END MUNGE: GENERATED_TOC -->
 ## Conventions
 Throughout this doc you will see various commands that you can run. Some
@@ -57,33 +79,32 @@ OUTPUT
 ## Running commands in a Pod
-For many steps here will will want to see what a `Pod` running in the cluster
+For many steps here you will want to see what a `Pod` running in the cluster
 sees. Kubernetes does not directly support interactive `Pod`s (yet), but you can
 approximate it:
 ```sh
 $ cat <<EOF | kubectl create -f -
-> apiVersion: v1
-> kind: Pod
-> metadata:
->   name: busybox-sleep
-> spec:
->   containers:
->   - name: busybox
->     image: busybox
->     args:
->     - sleep
->     - "1000000"
-> EOF
+apiVersion: v1
+kind: Pod
+metadata:
+  name: busybox-sleep
+spec:
+  containers:
+  - name: busybox
+    image: busybox
+    args:
+    - sleep
+    - "1000000"
+EOF
 pods/busybox-sleep
 ```
 Now, when you need to run a command (even an interactive shell) in a `Pod`-like
-context:
+context, use:
 ```sh
-$ kubectl exec busybox-sleep hostname
+$ kubectl exec busybox-sleep -- <COMMAND>
-busybox-sleep
 ```
 or
@@ -281,7 +302,7 @@ debugging your own `Service`, debug DNS.
 ## Does the Service work by IP?
-The next thing to test is whether your `Service` worksat all. From a
+The next thing to test is whether your `Service` works at all. From a
 `Node` in your cluster, access the `Service`'s IP (from `kubectl get` above).
 ```sh
@@ -510,6 +531,9 @@ Contact us on
 [email](https://groups.google.com/forum/#!forum/google-containers) or
 [GitHub](https://github.com/GoogleCloudPlatform/kubernetes).
+## More information
+Visit [troubleshooting document](../troubleshooting.md) for more information.
 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
 [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/user-guide/debugging-services.md?pixel)]()