kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2026-01-18 15:54:57 +00:00

Files

Kubernetes Submit Queue 84aa5f695f Merge pull request #35038 from sjenning/nfs-nonblock-reader2

Automatic merge from submit-queue

kubelet: storage: don't hang kubelet on unresponsive nfs

Fixes #31272 

Currently, due to the nature of nfs, an unresponsive nfs volume in a pod can wedge the kubelet such that additional pods can not be run.

The discussion thus far surrounding this issue was to wrap the `lstat`, the syscall that ends up hanging in uninterruptible sleep, in a goroutine and limiting the number of goroutines that hang to one per-pod per-volume.

However, in my investigation, I found that the callsites that request a listing of the volumes from a particular volume plugin directory don't care anything about the properties provided by the `lstat` call.  They only care about whether or not a directory exists.

Given that constraint, this PR just avoids the `lstat` call by using `Readdirnames()` instead of `ReadDir()` or `ReadDirNoExit()`

### More detail for reviewers
Consider the pod mounted nfs volume at `/var/lib/kubelet/pods/881341b5-9551-11e6-af4c-fa163e815edd/volumes/kubernetes.io~nfs/myvol`.  The kubelet wedges because when we do a `ReadDir()` or `ReadDirNoExit()` it calls `syscall.Lstat` on `myvol` which requires communication with the nfs server.  If the nfs server is unreachable, this call hangs forever.

However, for our code, we only care what about the names of files/directory contained in `kubernetes.io~nfs` directory, not any of the more detailed information the `Lstat` call provides.  Getting the names can be done with `Readdirnames()`, which doesn't need to involve the nfs server.

@pmorie @eparis @ncdc @derekwaynecarr @saad-ali @thockin @vishh @kubernetes/rh-cluster-infra

2016-10-18 12:37:31 -07:00