mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-07-29 22:46:12 +00:00
kubelet: document seamless upgrade support and guidance
This tries to capture the current state of affairs and a potential plan for supporting seamless upgrades better.
This commit is contained in:
parent
5a6ace2aa0
commit
0490b9f0b7
@ -16,6 +16,86 @@ there.
|
||||
|
||||
This socket filename should not start with a '.' as it will be ignored.
|
||||
|
||||
To avoid conflicts between different plugins, the recommendation is to use
|
||||
`<plugin name>[-<some optional string>].sock` as filename. `<plugin name>`
|
||||
should end with a DNS domain that is unique for the plugin. Each time a plugin
|
||||
starts, it has to delete old sockets if they exist and listen anew under the
|
||||
same filename.
|
||||
|
||||
## Seamless Upgrade
|
||||
|
||||
To avoid downtime of a plugin on a node, it would be nice to support running an
|
||||
old plugin in parallel to the new plugin. When deploying with a DaemonSet,
|
||||
setting `maxSurge` to a value larger than zero enables such a seamless upgrade.
|
||||
|
||||
**Warning**: Such a seamless upgrade **is not** supported at the moment. This
|
||||
section merely describes what would have to be changed to make it work.
|
||||
|
||||
### In a plugin
|
||||
|
||||
To support seamless upgrades, each plugin instance must use a unique
|
||||
socket filename. Otherwise the following could happen:
|
||||
- The old instance is registered with `plugin.example.com-reg.sock`.
|
||||
- The new instance starts, unlinks that file, and starts listening on it again.
|
||||
- In parallel, the kubelet notices the removal and unregisters the plugin
|
||||
before probing the new instance, thus breaking the seamless upgrade.
|
||||
|
||||
Even if the timing is more favorable and unregistration is avoided, using the
|
||||
same socket is problematic: if the new instance fails, the kubelet cannot fall
|
||||
back to the old instance because that old instance is not listening to the
|
||||
socket that is available under `plugin.example.com-reg.sock`.
|
||||
|
||||
This can be achieved in a DaemonSet by passing the UID of the pod into the pod
|
||||
through the downward API. New instances may try to clean up stale sockets of
|
||||
older instances, but have to be absolutely sure that those sockets really
|
||||
aren't in use anymore. Each instance should catch termination signals and clean
|
||||
up after itself. Then sockets only leak during abnormal events (power loss,
|
||||
killing with SIGKILL).
|
||||
|
||||
Last but not least, both plugin instances must be usable in parallel. It is not
|
||||
predictable which instance the kubelet will use for which request.
|
||||
|
||||
### In the kubelet
|
||||
|
||||
For such a seamless upgrade with different sockets per plugin to work reliably,
|
||||
the handler for the plugin type must track all registered instances. Then if
|
||||
one of them fails and gets unregistered, it can fall back to some
|
||||
other. Picking the most recently registered instance is a good heuristic. This
|
||||
isn't perfect because after a kubelet restart, plugin instances get registered
|
||||
in a random order. Restarting the kubelet in the middle of an upgrade should be
|
||||
rare.
|
||||
|
||||
At the moment, none of the existing handlers support such seamless upgrades:
|
||||
|
||||
- The device plugin handler suffers from temporarily removing the extended
|
||||
resources during an upgrade. A proposed fix is pending in
|
||||
https://github.com/kubernetes/kubernetes/pull/127821.
|
||||
|
||||
- The CSI handler [tries to determine which instance is newer](https://github.com/kubernetes/kubernetes/blob/7140b4910c6c1179c9778a7f3bb8037356febd58/pkg/volume/csi/csi_plugin.go#L115-L125) based on the supported version(s) and
|
||||
only remembers that one. If that newest instance fails, there is no fallback.
|
||||
|
||||
In practice, most CSI drivers probably all pass [the hard-coded "1.0.0"](https://github.com/kubernetes-csi/node-driver-registrar/blob/27700e2962cd35b9f2336a156146181e5c75399e/cmd/csi-node-driver-registrar/main.go#L72)
|
||||
from the csi-node-registrar as supported version, so this version
|
||||
selection mechanism isn't used at all.
|
||||
|
||||
- The DRA handler only remembers the most recently registered instance.
|
||||
|
||||
### Deployment
|
||||
|
||||
Deploying a plugin with support for seamless upgrades and per-instance socket
|
||||
filenames is *not* compatible with a kubelet version that does not have support
|
||||
for seamless upgrades yet. It breaks like this:
|
||||
|
||||
- New instance starts, gets registered and replaces the old one.
|
||||
- Old instance stops, removing its socket.
|
||||
- The kubelet notices that, unregisters the plugin.
|
||||
- The plugin handler removes *the new* instance because it ignores the socket path -> no instance left.
|
||||
|
||||
Plugin authors either have to assume that the cluster has a recent enough
|
||||
kubelet or rely on labeling nodes with support. Then the plugin can use one
|
||||
simple DaemonSet for nodes without support and another, more complex one where
|
||||
`maxSurge` is increased to enable seamless upgrades on nodes which support it.
|
||||
No such label is specified at the moment.
|
||||
|
||||
## gRPC Service Lifecycle
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user