Merge pull request #11317 from JanetKuo/docs-troubleshooting
Organize troubleshooting sections
Commit b3835665ee
@@ -43,6 +43,8 @@ certainly want the docs that go with that version.</h1>
* There are example files and walkthroughs in the [examples](../examples/)
folder.

* If something went wrong, see the [troubleshooting](troubleshooting.md) document for how to debug.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
@@ -22,7 +22,7 @@ certainly want the docs that go with that version.</h1>
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Cluster Troubleshooting
Most of the time, if you encounter problems, it is your application that is having problems. For application
problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md).
problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md). You may also visit [troubleshooting document](../troubleshooting.md) for more information.

## Listing your cluster
The first thing to debug in your cluster is if your nodes are all registered correctly.
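
For example, a quick check of node registration and health is:

```sh
# List every node registered with the cluster and its current status.
kubectl get nodes
```
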
@@ -20,7 +20,7 @@ certainly want the docs that go with that version.</h1>
<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
# Application Troubleshooting.
# Application Troubleshooting

This guide is to help users debug applications that are deployed into Kubernetes and not behaving correctly.
This is *not* a guide for people who want to debug their cluster. For that you should check out
@@ -28,10 +28,18 @@ This is *not* a guide for people who want to debug their cluster. For that you

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Application Troubleshooting.](#application-troubleshooting.)
- [Application Troubleshooting](#application-troubleshooting)
  - [FAQ](#faq)
  - [Diagnosing the problem](#diagnosing-the-problem)
    - [Debugging Pods](#debugging-pods)
      - [My pod stays pending](#my-pod-stays-pending)
      - [My pod stays waiting](#my-pod-stays-waiting)
      - [My pod is crashing or otherwise unhealthy](#my-pod-is-crashing-or-otherwise-unhealthy)
    - [Debugging Replication Controllers](#debugging-replication-controllers)
    - [Debugging Services](#debugging-services)
      - [My service is missing endpoints](#my-service-is-missing-endpoints)
      - [Network traffic is not forwarded](#network-traffic-is-not-forwarded)
      - [More information](#more-information)

<!-- END MUNGE: GENERATED_TOC -->

@@ -46,46 +54,40 @@ your Service?
* [Debugging Services](#debugging-services)

### Debugging Pods
The first step in debugging a Pod is taking a look at it. For the purposes of example, imagine we have a pod
```my-pod``` which holds two containers ```container-1``` and ```container-2```

First, describe the pod. This will show the current state of the Pod and recent events.
The first step in debugging a Pod is taking a look at it. Check the current state of the Pod and recent events with the following command:

```sh
export POD_NAME=my-pod
kubectl describe pods ${POD_NAME}
```

Look at the state of the containers in the pod. Are they all ```Running```? Have there been recent restarts?

Depending on the state of the pod, you may want to:
* [Debug a pending pod](#debugging-pending-pods)
* [Debug a waiting pod](#debugging-waiting-pods)
* [Debug a crashing pod](#debugging-crashing-pods-or-otherwise-unhealthy-pods)
Continue debugging depending on the state of the pods.

#### Debuging Pending Pods
#### My pod stays pending
If a Pod is stuck in ```Pending``` it means that it can not be scheduled onto a node. Generally this is because
there are insufficient resources of one type or another that prevent scheduling. Look at the output of the
```kubectl describe ...``` command above. There should be messages from the scheduler about why it can not schedule
your pod. Reasons include:

You don't have enough resources. You may have exhausted the supply of CPU or Memory in your cluster, in this case
you need to delete Pods, adjust resource requests, or add new nodes to your cluster.
* **You don't have enough resources**: You may have exhausted the supply of CPU or Memory in your cluster, in this case
you need to delete Pods, adjust resource requests, or add new nodes to your cluster. See [Compute Resources document](compute-resources.md#my-pods-are-pending-with-event-message-failedscheduling) for more information.

You are using ```hostPort```. When you bind a Pod to a ```hostPort``` there are a limited number of places that pod can be
* **You are using ```hostPort```**: When you bind a Pod to a ```hostPort``` there are a limited number of places that pod can be
scheduled. In most cases, ```hostPort``` is unnecessary, try using a Service object to expose your Pod. If you do require
```hostPort``` then you can only schedule as many Pods as there are nodes in your Kubernetes cluster.
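
As a rough sketch of both checks (the pod and service names below are hypothetical):

```sh
# See how much CPU and memory each node offers and how much is already requested.
kubectl describe nodes

# Instead of a hostPort, expose the pod through a Service (hypothetical pod and service names).
kubectl expose pod my-pod --port=8080 --name=my-pod-svc
```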


#### Debugging Waiting Pods
#### My pod stays waiting
If a Pod is stuck in the ```Waiting``` state, then it has been scheduled to a worker node, but it can't run on that machine.
Again, the information from ```kubectl describe ...``` should be informative. The most common cause of ```Waiting``` pods
is a failure to pull the image. Make sure that you have the name of the image correct. Have you pushed it to the repository?
Does it work if you run a manual ```docker pull <image>``` on your machine?
Again, the information from ```kubectl describe ...``` should be informative. The most common cause of ```Waiting``` pods is a failure to pull the image. There are three things to check:
* Make sure that you have the name of the image correct
* Have you pushed the image to the repository?
* Run a manual ```docker pull <image>``` on your machine to see if the image can be pulled.
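
For example (the image name below is a hypothetical placeholder; substitute your own):

```sh
# Inspect the pod's events for image pull errors.
kubectl describe pods ${POD_NAME}

# Try pulling the image by hand on a node to rule out naming, registry, or credential problems.
docker pull my-registry.example.com/my-image:v1
```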

#### Debugging Crashing or otherwise unhealthy pods
#### My pod is crashing or otherwise unhealthy

Let's suppose that ```container-2``` has been crash looping and you don't know why, you can take a look at the logs of
First, take a look at the logs of
the current container:

```sh
@@ -112,12 +114,12 @@ kubectl exec cassandra -- cat /var/log/cassandra/system.log


If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host,
but this should generally not be necessary given tools in the Kubernetes API. Indeed if you find yourself needing to ssh into a machine, please file a
but this should generally not be necessary given tools in the Kubernetes API. Therefore, if you find yourself needing to ssh into a machine, please file a
feature request on GitHub describing your use case and why these tools are insufficient.

### Debugging Replication Controllers
Replication controllers are fairly straightforward. They can either create Pods or they can't. If they can't
create pods, then please refer to the [instructions above](#debugging-pods)
create pods, then please refer to the [instructions above](#debugging-pods) to debug your pods.

You can also use ```kubectl describe rc ${CONTROLLER_NAME}``` to introspect events related to the replication
controller.
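
For example:

```sh
# Hypothetical controller name; substitute your own.
export CONTROLLER_NAME=my-nginx
# Show the replication controller's status and recent events, such as pod creation failures.
kubectl describe rc ${CONTROLLER_NAME}
```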

@@ -126,8 +128,7 @@ controller.
Services provide load balancing across a set of pods. There are several common problems that can make Services
not work properly. The following instructions should help debug Service problems.

#### Verify that there are endpoints for the service
For every Service object, the apiserver makes an ```endpoints`` resource available.
First, verify that there are endpoints for the service. For every Service object, the apiserver makes an `endpoints` resource available.

You can view this resource with:
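
A minimal sketch, assuming a Service named `my-service`:

```sh
# List the endpoints the apiserver maintains for this Service.
kubectl get endpoints my-service
```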

@@ -139,7 +140,7 @@ Make sure that the endpoints match up with the number of containers that you exp
For example, if your Service is for an nginx container with 3 replicas, you would expect to see three different
IP addresses in the Service's endpoints.

#### Missing endpoints
#### My service is missing endpoints
If you are missing endpoints, try listing pods using the labels that Service uses. Imagine that you have
a Service where the labels are:
```yaml
@@ -163,7 +164,7 @@ selected don't have that port listed, then they won't be added to the endpoints

Verify that the pod's ```containerPort``` matches up with the Service's ```containerPort```
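
As a sketch (hypothetical pod and Service names), you can compare the two directly:

```sh
# Print the port the container actually exposes.
kubectl get pods my-pod -o yaml | grep containerPort
# Print the ports the Service is configured with.
kubectl get service my-service -o yaml | grep -i port
```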

#### Network traffic isn't forwarded
#### Network traffic is not forwarded
If you can connect to the service, but the connection is immediately dropped, and there are endpoints
in the endpoints list, it's likely that the proxy can't contact your pods.

@@ -173,6 +174,11 @@ check:
* Can you connect to your pods directly? Get the IP address for the Pod, and try to connect directly to that IP
* Is your application serving on the port that you configured? Kubernetes doesn't do port remapping, so if your application serves on 8080, the ```containerPort``` field needs to be 8080.
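
A rough sketch of the first check (the pod name and port are hypothetical):

```sh
# Find the pod's IP address.
kubectl describe pods my-pod | grep IP
# From a node in the cluster, try to reach the pod directly on its serving port,
# substituting the IP printed above.
curl http://<POD_IP>:8080/
```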

#### More information
If none of the above solves your problem, follow the instructions in [Debugging Service document](debugging-services.md) to make sure that your `Service` is running, has `Endpoints`, and your `Pods` are actually serving; you have DNS working, iptables rules installed, and kube-proxy does not seem to be misbehaving.

You may also visit [troubleshooting document](../troubleshooting.md) for more information.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
@@ -23,15 +23,19 @@ certainly want the docs that go with that version.</h1>
# Compute Resources

** Table of Contents**
- Compute Resources
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Compute Resources](#compute-resources)
  - [Container and Pod Resource Limits](#container-and-pod-resource-limits)
  - [How Pods with Resource Limits are Scheduled](#how-pods-with-resource-limits-are-scheduled)
  - [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
  - [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
  - [Troubleshooting](#troubleshooting)
    - [Detecting Resource Starved Containers](#detecting-resource-starved-containers)
    - [My pods are pending with event message failedScheduling](#my-pods-are-pending-with-event-message-failedscheduling)
    - [My container is terminated](#my-container-is-terminated)
  - [Planned Improvements](#planned-improvements)

<!-- END MUNGE: GENERATED_TOC -->

When specifying a [pod](pods.md), you can optionally specify how much CPU and memory (RAM) each
container needs. When containers have resource limits, the scheduler is able to make better
decisions about which nodes to place pods on, and contention for resources can be handled in a
@@ -134,6 +138,7 @@ then pod resource usage can be retrieved from the monitoring system.

## Troubleshooting

### My pods are pending with event message failedScheduling
If the scheduler cannot find any node where a pod can fit, then the pod will remain unscheduled
until a place can be found. An event will be produced each time the scheduler fails to find a
place for the pod, like this:
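
A sketch of how to surface these events (the pod name is hypothetical):

```sh
# Show recent cluster events, including FailedScheduling messages from the scheduler.
kubectl get events
# Or look at the Events section reported for a specific pod.
kubectl describe pods my-pod
```
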
@@ -159,8 +164,8 @@ The [resource quota](../admin/resource-quota.md) feature can be configured
to limit the total amount of resources that can be consumed. If used in conjunction
with namespaces, it can prevent one team from hogging all the resources.
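
For instance, assuming a quota object has already been created in a namespace called `my-team`:

```sh
# Show the quota limits and how much of each resource the namespace has used so far.
kubectl describe quota --namespace=my-team
```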

### Detecting Resource Starved Containers
To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
### My container is terminated
Your container may be terminated because it's resource-starved. To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
on the pod you are interested in:

```
@@ -20,13 +20,35 @@ certainly want the docs that go with that version.</h1>
<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
# My Service isn't working - how to debug
# My Service is not working - how to debug

An issue that comes up rather frequently for new installations of Kubernetes is
that `Services` are not working properly. You've run all your `Pod`s and
`ReplicationController`s, but you get no response when you try to access them.
This document will hopefully help you to figure out what's going wrong.

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [My Service is not working - how to debug](#my-service-is-not-working---how-to-debug)
  - [Conventions](#conventions)
  - [Running commands in a Pod](#running-commands-in-a-pod)
  - [Setup](#setup)
  - [Does the Service exist?](#does-the-service-exist)
  - [Does the Service work by DNS?](#does-the-service-work-by-dns)
    - [Does any Service exist in DNS?](#does-any-service-exist-in-dns)
  - [Does the Service work by IP?](#does-the-service-work-by-ip)
  - [Is the Service correct?](#is-the-service-correct)
  - [Does the Service have any Endpoints?](#does-the-service-have-any-endpoints)
  - [Are the Pods working?](#are-the-pods-working)
  - [Is the kube-proxy working?](#is-the-kube-proxy-working)
    - [Is kube-proxy running?](#is-kube-proxy-running)
    - [Is kube-proxy writing iptables rules?](#is-kube-proxy-writing-iptables-rules)
    - [Is kube-proxy proxying?](#is-kube-proxy-proxying)
  - [Seek help](#seek-help)
  - [More information](#more-information)

<!-- END MUNGE: GENERATED_TOC -->

## Conventions

Throughout this doc you will see various commands that you can run. Some
@@ -57,33 +79,32 @@ OUTPUT

## Running commands in a Pod

For many steps here will will want to see what a `Pod` running in the cluster
For many steps here you will want to see what a `Pod` running in the cluster
sees. Kubernetes does not directly support interactive `Pod`s (yet), but you can
approximate it:

```sh
$ cat <<EOF | kubectl create -f -
> apiVersion: v1
> kind: Pod
> metadata:
>   name: busybox-sleep
> spec:
>   containers:
>   - name: busybox
>     image: busybox
>     args:
>     - sleep
>     - "1000000"
> EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
EOF
pods/busybox-sleep
```

Now, when you need to run a command (even an interactive shell) in a `Pod`-like
context:
context, use:

```sh
$ kubectl exec busybox-sleep hostname
busybox-sleep
$ kubectl exec busybox-sleep -- <COMMAND>
```

or
@@ -281,7 +302,7 @@ debugging your own `Service`, debug DNS.

## Does the Service work by IP?

The next thing to test is whether your `Service` worksat all. From a
The next thing to test is whether your `Service` works at all. From a
`Node` in your cluster, access the `Service`'s IP (from `kubectl get` above).

```sh
@@ -510,6 +531,9 @@ Contact us on
[email](https://groups.google.com/forum/#!forum/google-containers) or
[GitHub](https://github.com/GoogleCloudPlatform/kubernetes).

## More information
Visit [troubleshooting document](../troubleshooting.md) for more information.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()