Various minor edits/clarifications to docs/admin/ docs.
Deleted docs/admin/namespaces.md as it was content-free and the topic is already covered well in docs/user-guide/namespaces.md
@@ -31,7 +31,7 @@ Documentation for other releases can be found at
 
 <!-- END MUNGE: UNVERSIONED_WARNING -->
 
 # Cluster Troubleshooting
 
-Most of the time, if you encounter problems, it is your application that is having problems. For application
+Most of the time, if you encounter problems, it is your application that is the root cause. For application
 problems please see the [application troubleshooting guide](../user-guide/application-troubleshooting.md). You may also visit [troubleshooting document](../troubleshooting.md) for more information.
 
 ## Listing your cluster
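The hunk above ends at the "Listing your cluster" section, whose context line asks you to verify that every node you expect is present and healthy. As a minimal sketch of that check (the node name is a placeholder):

```sh
# List the nodes the apiserver knows about; every expected node should show Ready.
kubectl get nodes

# Dig into a node that is missing or NotReady (node name is an example).
kubectl describe node my-node-1
```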
@@ -46,7 +46,7 @@ And verify that all of the nodes you expect to see are present and that they are
 
 ## Looking at logs
 
 For now, digging deeper into the cluster requires logging into the relevant machines. Here are the locations
-of the relevant log files. (note that on systemd based systems, you may need to use ```journalctl``` instead)
+of the relevant log files. (note that on systemd-based systems, you may need to use ```journalctl``` instead)
 
 ### Master
 
 * /var/log/kube-apiserver.log - API Server, responsible for serving the API
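A hedged example of reading these logs after logging into the master; the systemd unit name (kube-apiserver) is an assumption and depends on how the cluster was provisioned:

```sh
# Plain log file, at the path listed above:
tail -n 100 /var/log/kube-apiserver.log

# On systemd-based systems, query the journal instead; the unit name is an assumption.
journalctl -u kube-apiserver --since "1 hour ago"
```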
@@ -59,7 +59,7 @@ of the relevant log files. (note that on systemd based systems, you may need to
 
 ## A general overview of cluster failure modes
 
-This is an incomplete list of things that could go wrong, and how to deal with them.
+This is an incomplete list of things that could go wrong, and how to adjust your cluster setup to mitigate the problems.
 
 Root causes:
 - VM(s) shutdown
@@ -102,18 +102,18 @@ Specific scenarios:
 - etc.
 
 Mitigations:
-- Action: Use IaaS providers automatic VM restarting feature for IaaS VMs
+- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs
   - Mitigates: Apiserver VM shutdown or apiserver crashing
   - Mitigates: Supporting services VM shutdown or crashes
 
 - Action use IaaS providers reliable storage (e.g GCE PD or AWS EBS volume) for VMs with apiserver+etcd
   - Mitigates: Apiserver backing storage lost
 
-- Action: Use [replicated APIserver](high-availability.md) feature
-  - Mitigates: Apiserver VM shutdown or apiserver crashing
-    - Will tolerate one or more simultaneous apiserver failures
-  - Mitigates: Apiserver backing storage lost
-    - Each apiserver has independent storage. Etcd will recover from loss of one member. Risk of total data loss greatly reduced.
+- Action: Use (experimental) [high-availability](high-availability.md) configuration
+  - Mitigates: Master VM shutdown or master components (scheduler, API server, controller-manager) crashing
+    - Will tolerate one or more simultaneous node or component failures
+  - Mitigates: Apiserver backing storage (i.e., etcd's data directory) lost
+    - Assuming you used clustered etcd.
 
 - Action: Snapshot apiserver PDs/EBS-volumes periodically
   - Mitigates: Apiserver backing storage lost
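For the "Snapshot apiserver PDs/EBS-volumes periodically" action, a rough sketch of what a one-off snapshot might look like; disk name, zone, volume ID, and snapshot name are placeholders, and flags should be checked against your provider's CLI documentation:

```sh
# GCE: snapshot the persistent disk backing the apiserver/etcd VM (names are placeholders).
gcloud compute disks snapshot my-master-pd --zone us-central1-b --snapshot-names my-master-pd-backup

# AWS: snapshot the corresponding EBS volume (volume ID is a placeholder).
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "apiserver backing storage backup"
```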