Merge pull request #52575 from vmware/vSphereInstanceNotFoundOnPowerOff

Automatic merge from submit-queue (batch tested with PRs 51311, 52575, 53169). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Unable to detach the vSphere volume from Powered off node

With the existing implementation when a vSphere node is powered off, the node is not deleted by the node controller and is in "NotReady" state. Following the approach similar to GCE as mentioned here - https://github.com/kubernetes/kubernetes/issues/46442.

I observe the following issues:
- The pods on the powered off node are not **instantaneously** created on the other available node. Only after 5 minutes timeout, the pods will be created on other available nodes with the volume attached to it. This means an application downtime of around 5 minutes which is not good at all.
- The volume on the powered off node are not detached at all when the pod with the volume is already moved to other available node. Hence any attempt to restart the powered off node will fail as the same volume is attached to other node which is present on this powered off node. (Please note that the volumes are not automatically detached from powered off in vSphere as opposed to GCE, AWS where volume is automatically detached from when node is powered off).

So inorder to resolve this problem, we have decided to back with the approach where the powered off node will be removed by the Node controller. So the above 2 problems will be resolved as follows:
- Since the node is deleted, the pod on the powered off node becomes instantaneously available on other available nodes with the volume attached to the new nodes. Hence there is no application downtime at all.
- After a period of 6 minutes (timeout period), the volumes are automatically detached from the powered off node. Hence any restarts after 6 minutes on the powered off node would work and not cause any problems as volumes are already detached.

For now, we would want to go ahead with deleting the node from node controller when a node is powered off in vCenter until we have a better approach. I think the best possible solution would be to introduce power handler in volume controller to see if the node is powered off before we can take any appropriate for attach/detach operations.

```release-note
None
```

@jingxu97 @saad-ali @divyenpatel @luomiao @rohitjogvmw
This commit is contained in:
Kubernetes Submit Queue 2017-09-28 23:18:19 -07:00 committed by GitHub
commit 00ee67bdc8

View File

@ -412,8 +412,8 @@ func (vs *VSphere) InstanceID(nodeName k8stypes.NodeName) (string, error) {
if isActive {
return "/" + vm.InventoryPath, nil
}
return "", fmt.Errorf("The node %q is not active", nodeNameToVMName(nodeName))
glog.Warningf("The VM: %s is not in %s state", nodeNameToVMName(nodeName), vclib.ActivePowerState)
return "", cloudprovider.InstanceNotFound
}
// InstanceTypeByProviderID returns the cloudprovider instance type of the node with the specified unique providerID