Merge pull request #55884 from mpolednik/dpi-race-fix

Automatic merge from submit-queue (batch tested with PRs 55839, 54495, 55884, 55983, 56069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

deviceplugin: fix race when multiple plugins are registered

**What this PR does / why we need it**:
When registering multiple device plugins to Kubelet concurrently, there exists a race that crashes the Kubelet.

Consider two plugins: D1 and D2. The call order method is roughly

D1 -> manager.go:register -> endpoint.go:listAndWatch -> device_plugin_handler.go:(*D1).callback
D2 -> manager.go:register -> endpoint.go:listAndWatch -> device_plugin_handler.go:(*D2).callback

The callback function accesses HandlerImpl's allDevices map that maps (resourceName -> DeviceID). If both plugins reach these accesses at the same time, Kubelet crashes with "fatal error: concurrent map read and map write".

This can be solved by making sure handler is locked when allDevices are being updated. The functionality is needed to avoid Kubelet crashes when multiple device plugins are trying to register with Kubelet at the same moment. Occurs frequently when single binary tries to register itself as multiple plugins.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
This commit is contained in:
Kubernetes Submit Queue 2017-11-20 13:08:09 -08:00 committed by GitHub
commit 0b1d023aa7
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -92,6 +92,10 @@ func NewHandlerImpl(updateCapacityFunc func(v1.ResourceList)) (*HandlerImpl, err
deviceManagerMonitorCallback := func(resourceName string, added, updated, deleted []pluginapi.Device) {
var capacity = v1.ResourceList{}
kept := append(updated, added...)
handler.Lock()
defer handler.Unlock()
if _, ok := handler.allDevices[resourceName]; !ok {
handler.allDevices[resourceName] = sets.NewString()
}