Production-Grade Container Scheduling and Management
Kubernetes Submit Queue f499606bfe Merge pull request #45346 from codablock/fix_double_attach
Automatic merge from submit-queue

Don't try to attach volumes which are already attached to other nodes

This PR replaces https://github.com/kubernetes/kubernetes/pull/40148. I could not push fixes and rebases to the original branch because I no longer have access to the GitHub organization.

CC @saad-ali You probably have to update the PR link in [Q2 2017 (v1.7)](https://docs.google.com/spreadsheets/d/1t4z5DYKjX2ZDlkTpCnp18icRAQqOE85C1T1r2gqJVck/edit#gid=14624465)

I assume the PR will need a new "ok to test".

**ORIGINAL PR DESCRIPTION**

This PR fixes an issue with the attach/detach volume controller. There are cases where the `desiredStateOfWorld` contains the same volume for multiple nodes, which causes the attach/detach controller to try attaching this volume to multiple nodes. This fails for volumes that do not support multi-attach, such as AWS EBS and Azure Disks.

I observed this situation on Azure when using Azure Disks and replication controllers (RCs) that start to reschedule pods. When you delete a pod that belongs to an RC, the RC immediately creates a replacement pod, which may be scheduled on another node. For a short time (at most a few seconds) there are two pods that try to attach/mount the same volume on different nodes. Because the old pod is still alive, the attach/detach controller does not try to detach the volume, yet it immediately starts attaching the volume to the new pod's node.
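The overlap described above can be illustrated with a minimal sketch. It is not the controller's actual data structure: `desiredState`, `NodeName`, `VolumeName`, and `volumesDesiredOnMultipleNodes` are hypothetical names chosen for illustration, and the model assumes each node lists a volume at most once.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeName and VolumeName are hypothetical stand-ins, not the controller's real types.
type NodeName string
type VolumeName string

// desiredState is a simplified, hypothetical model of the desired state of
// the world: for each node, the volumes that pods scheduled there want attached.
type desiredState map[NodeName][]VolumeName

// volumesDesiredOnMultipleNodes returns, for every volume wanted by more than
// one node, the sorted list of nodes that want it.
func volumesDesiredOnMultipleNodes(dsw desiredState) map[VolumeName][]NodeName {
	nodesByVolume := make(map[VolumeName][]NodeName)
	for node, volumes := range dsw {
		for _, v := range volumes {
			nodesByVolume[v] = append(nodesByVolume[v], node)
		}
	}
	conflicts := make(map[VolumeName][]NodeName)
	for v, nodes := range nodesByVolume {
		if len(nodes) > 1 {
			sort.Slice(nodes, func(i, j int) bool { return nodes[i] < nodes[j] })
			conflicts[v] = nodes
		}
	}
	return conflicts
}

func main() {
	dsw := desiredState{
		"node-a": {"disk-1"},           // old pod, deleted but still alive
		"node-b": {"disk-1", "disk-2"}, // replacement pod created by the RC
	}
	for v, nodes := range volumesDesiredOnMultipleNodes(dsw) {
		fmt.Printf("%s is desired on %v\n", v, nodes)
	}
}
```

In this toy model, `disk-1` is reported because both the old and the replacement pod want it, while `disk-2` is not.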

This behavior probably went unnoticed on other clouds because the bogus attach attempt fails quickly there. Once the overlap of the two pods ends after a few seconds, a detach for the old pod is initiated and the new pod can attach successfully.

On Azure, however, attaching and detaching take quite long, so the first bogus attach attempt already eats up a lot of time.
When attaching fails on Azure and reports that the volume is already attached somewhere else, the cloud provider immediately issues a detach call for the same volume+node it tried to attach to. This is done to make sure the failed attach request is aborted immediately. You can find this here: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure_storage.go#L74

The complete attach->fail->abort flow eats up valuable time, and the attach/detach controller cannot proceed with other work while it is in progress. This means that if the old pod disappears in the meantime, the controller can't even start the detach for the volume, which delays the whole process of rescheduling and reattaching.

Other people and I have also observed very strange behavior where disks ended up being "attached" to multiple VMs at the same time, as reported by the Azure Portal. When this happens, the controller fails to reattach the disk forever. It's hard to figure out why and when this happens, and there is no known reproducer yet. I can imagine, however, that it is related to the behavior described above.

I was not sure whether there are cases where it is perfectly fine to have a volume mounted to multiple pods/nodes. At least technically, this should be possible with network-based volumes, e.g. NFS. Can someone with more knowledge about volumes help me here? I may need to add a check before skipping attaching in `reconcile`.
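A check of the kind discussed here might look roughly like the sketch below. It is not the controller's actual `reconcile` code: the types `actualState` and `desiredVolume`, the `multiAttach` flag, and the helper `attachedElsewhere` are hypothetical and only illustrate the idea of skipping an attach when a volume without multi-attach support is still attached to another node.

```go
package main

import "fmt"

// actualState is a simplified, hypothetical model of what is currently
// attached: volume name -> set of nodes the volume is attached to.
type actualState map[string]map[string]bool

// desiredVolume is one desired attachment. multiAttach is a hypothetical flag
// marking volumes that may be attached to several nodes at once (e.g. NFS).
type desiredVolume struct {
	name        string
	node        string
	multiAttach bool
}

// attachedElsewhere reports whether volume is attached to any node other than node.
func attachedElsewhere(asw actualState, volume, node string) bool {
	for n := range asw[volume] {
		if n != node {
			return true
		}
	}
	return false
}

// reconcile returns the attachments that should be started now. A volume
// without multi-attach support that is still attached to another node is
// skipped; a later pass can attach it once the detach has completed.
func reconcile(desired []desiredVolume, asw actualState) []desiredVolume {
	var toAttach []desiredVolume
	for _, d := range desired {
		if asw[d.name][d.node] {
			continue // already attached where it is wanted
		}
		if !d.multiAttach && attachedElsewhere(asw, d.name, d.node) {
			continue // wait until the volume is detached from the other node
		}
		toAttach = append(toAttach, d)
	}
	return toAttach
}

func main() {
	asw := actualState{
		"disk-1":  {"node-a": true},
		"share-1": {"node-a": true},
	}
	desired := []desiredVolume{
		{name: "disk-1", node: "node-b"},                     // single-attach disk
		{name: "share-1", node: "node-b", multiAttach: true}, // shared volume
	}
	for _, d := range reconcile(desired, asw) {
		fmt.Printf("attach %s to %s\n", d.name, d.node)
	}
}
```

With this approach the controller never starts the attach->fail->abort cycle for `disk-1`; it simply retries on a later pass, after the old attachment is gone.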

CC @colemickens @rootfs

```release-note
Don't try to attach a volume to a new node if it is already attached to another node and the volume does not support multi-attach.
```
2017-05-19 21:54:42 -07:00
.github Redirect kubeadm issues to kubeadm repo 2017-05-19 12:18:55 +02:00
api generated: api changes 2017-05-18 10:07:47 -04:00
build Merge pull request #45855 from mikedanese/out-of-root 2017-05-18 00:56:47 -07:00
cluster Merge pull request #38169 from caseydavenport/calico-daemonset 2017-05-19 19:38:59 -07:00
cmd add "admission" API group 2017-05-19 10:17:37 -06:00
docs generated: api changes 2017-05-18 10:07:47 -04:00
examples Merge pull request #45678 from a-robinson/1.0 2017-05-17 18:40:59 -07:00
federation Merge pull request #46063 from madhusudancs/fed-kubefed-logv4 2017-05-18 21:48:39 -07:00
Godeps Update k8s.io/gengo/... dependency 2017-05-17 00:09:38 -07:00
hack Merge pull request #45564 from whitlockjc/admission-api-group 2017-05-19 18:57:38 -07:00
hooks
logo
pkg Merge pull request #45346 from codablock/fix_double_attach 2017-05-19 21:54:42 -07:00
plugin Merge pull request #46104 from liggitt/node-admission 2017-05-19 10:58:07 -07:00
staging Merge pull request #46059 from nikhita/test-int-preserve 2017-05-19 08:35:08 -07:00
test Merge pull request #45979 from bowei/owners 2017-05-19 19:39:05 -07:00
third_party autogenerated 2017-04-14 10:40:57 -07:00
translations Extract a bunch more strings from kubectl 2017-04-06 20:12:50 -07:00
vendor Update k8s.io/gengo/... dependency 2017-05-17 00:09:38 -07:00
.bazelrc move build related files out of the root directory 2017-05-15 15:53:54 -07:00
.gazelcfg.json Add go_genrule for zz_generated.openapi.go. 2017-04-25 17:51:36 -07:00
.generated_files
.gitattributes
.gitignore Remove verify_gen_openapi make rule. 2017-04-25 17:41:33 -07:00
BUILD.bazel move build related files out of the root directory 2017-05-15 15:53:54 -07:00
CHANGELOG.md Update CHANGELOG.md for v1.6.4. 2017-05-19 12:30:25 -07:00
code-of-conduct.md
CONTRIBUTING.md
labels.yaml Update labels.yaml with sig labels 2017-04-28 14:27:32 -07:00
LICENSE
Makefile move build related files out of the root directory 2017-05-15 15:53:54 -07:00
Makefile.generated_files move build related files out of the root directory 2017-05-15 15:53:54 -07:00
OWNERS
OWNERS_ALIASES Merge pull request #42953 from kargakis/rm-myself 2017-04-03 01:50:58 -07:00
README.md Adjust the link to the right troubleshooting doc page 2017-04-13 08:20:39 +00:00
Vagrantfile
WORKSPACE move build related files out of the root directory 2017-05-15 15:53:54 -07:00

Kubernetes



Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.


To start using Kubernetes

See our documentation on kubernetes.io.

Try our interactive tutorial.

Take a free course on Scalable Microservices with Kubernetes.

To start developing Kubernetes

The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.

If you want to build Kubernetes right away there are two options:

You have a working Go environment.

    $ go get -d k8s.io/kubernetes
    $ cd $GOPATH/src/k8s.io/kubernetes
    $ make

You have a working Docker environment.

    $ git clone https://github.com/kubernetes/kubernetes
    $ cd kubernetes
    $ make quick-release

If you are less impatient, head over to the developer's documentation.

Support

If you need support, start with the troubleshooting guide and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another.
