mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-08-03 09:22:44 +00:00
kubelet: document seamless upgrade support and guidance
This tries to capture the current state of affairs and a potential plan for supporting seamless upgrades better.
parent 5a6ace2aa0
commit 0490b9f0b7
@@ -16,6 +16,86 @@ there.
This socket filename should not start with a '.' as it will be ignored.

To avoid conflicts between different plugins, the recommendation is to use
`<plugin name>[-<some optional string>].sock` as filename. `<plugin name>`
should end with a DNS domain that is unique for the plugin. Each time a plugin
starts, it has to delete old sockets if they exist and listen anew under the
same filename.

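The naming and restart rules above can be sketched in Go as follows. This is a minimal illustration, not code from the kubelet or any plugin: the helper names `socketFilename` and `listenAnew` are hypothetical, and the example listens in a temporary directory instead of the real plugin registration directory.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
)

// socketFilename builds "<plugin name>[-<suffix>].sock".
// It rejects names starting with '.', because such sockets are ignored.
func socketFilename(pluginName, suffix string) (string, error) {
	if pluginName == "" {
		return "", errors.New("plugin name must not be empty")
	}
	if strings.HasPrefix(pluginName, ".") {
		return "", fmt.Errorf("socket filename must not start with '.': %q", pluginName)
	}
	name := pluginName
	if suffix != "" {
		name += "-" + suffix
	}
	return name + ".sock", nil
}

// listenAnew deletes an old socket, if one exists, and listens again
// under the same filename.
func listenAnew(dir, filename string) (net.Listener, error) {
	path := filepath.Join(dir, filename)
	if err := os.Remove(path); err != nil && !errors.Is(err, os.ErrNotExist) {
		return nil, fmt.Errorf("remove old socket: %w", err)
	}
	return net.Listen("unix", path)
}

func main() {
	name, err := socketFilename("plugin.example.com", "reg")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Assumption: a temporary directory stands in for the plugin directory.
	dir, err := os.MkdirTemp("", "plugins")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	listener, err := listenAnew(dir, name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer listener.Close()
	fmt.Println("listening on", listener.Addr())
}
```
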
## Seamless Upgrade

To avoid downtime of a plugin on a node, it is desirable to run the old plugin
instance in parallel with the new one during an upgrade. When deploying with a
DaemonSet, setting `maxSurge` to a value larger than zero enables such a
seamless upgrade.

**Warning**: Such a seamless upgrade **is not** supported at the moment. This
section merely describes what would have to be changed to make it work.

### In a plugin

To support seamless upgrades, each plugin instance must use a unique
socket filename. Otherwise the following could happen:
- The old instance is registered with `plugin.example.com-reg.sock`.
- The new instance starts, unlinks that file, and starts listening on it again.
- In parallel, the kubelet notices the removal and unregisters the plugin
  before probing the new instance, thus breaking the seamless upgrade.

Even if the timing is more favorable and unregistration is avoided, using the
same socket is problematic: if the new instance fails, the kubelet cannot fall
back to the old instance because that old instance is no longer listening on
the socket that is available under `plugin.example.com-reg.sock`.

Unique socket filenames can be achieved in a DaemonSet by passing the UID of
the pod into the pod through the downward API. New instances may try to clean
up stale sockets of older instances, but have to be absolutely sure that those
sockets really aren't in use anymore. Each instance should catch termination
signals and clean up after itself. Then sockets only leak during abnormal
events (power loss, killing with SIGKILL).

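A per-instance socket with cleanup on termination might look like the following Go sketch. Everything specific in it is an assumption made for illustration: the environment variable names `POD_UID` (expected to be populated from `metadata.uid` through the downward API) and `PLUGIN_DIR`, the helper names `instanceSocket` and `serve`, and the filename layout. A real plugin would start its gRPC server where the comment indicates.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
)

// instanceSocket returns a socket filename that is unique per pod.
func instanceSocket(pluginName, podUID string) string {
	return fmt.Sprintf("%s-%s.sock", pluginName, podUID)
}

// serve listens on socketPath until ctx is cancelled, then removes the socket.
func serve(ctx context.Context, socketPath string) error {
	listener, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	defer func() {
		listener.Close()
		// Explicit removal in case closing the listener did not unlink the file.
		if err := os.Remove(socketPath); err != nil && !errors.Is(err, os.ErrNotExist) {
			fmt.Fprintln(os.Stderr, "cleanup failed:", err)
		}
	}()

	// A real plugin would serve its gRPC services on listener here.
	<-ctx.Done()
	return nil
}

func main() {
	// Assumption: POD_UID is injected via the downward API (metadata.uid).
	podUID := os.Getenv("POD_UID")
	if podUID == "" {
		fmt.Println("POD_UID is not set, nothing to do in this sketch")
		return
	}
	// Assumption: PLUGIN_DIR points at the directory watched by the kubelet.
	dir := os.Getenv("PLUGIN_DIR")
	if dir == "" {
		dir = os.TempDir()
	}

	// Cancel the context on SIGTERM or SIGINT so that serve can clean up.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	path := filepath.Join(dir, instanceSocket("plugin.example.com", podUID))
	if err := serve(ctx, path); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

SIGKILL cannot be caught, so a socket created this way still leaks in that case, as described above.
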
Last but not least, both plugin instances must be usable in parallel. It is not
predictable which instance the kubelet will use for which request.

### In the kubelet

For such a seamless upgrade with different sockets per plugin to work reliably,
the handler for the plugin type must track all registered instances. Then if
one of them fails and gets unregistered, it can fall back to another
one. Picking the most recently registered instance is a good heuristic. This
isn't perfect because after a kubelet restart, plugin instances get registered
in a random order. Restarting the kubelet in the middle of an upgrade should be
rare.

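The bookkeeping described above could be sketched in Go like this. The `Handler` type and its methods are hypothetical and do not mirror any existing kubelet handler; the sketch only shows tracking every instance per plugin, preferring the most recently registered one, and falling back when that one is unregistered.

```go
package main

import (
	"fmt"
	"sync"
)

// Handler tracks all registered instances of each plugin.
// Instances are kept in registration order, oldest first.
type Handler struct {
	mu        sync.Mutex
	instances map[string][]string // plugin name -> socket paths
}

func NewHandler() *Handler {
	return &Handler{instances: make(map[string][]string)}
}

// remove returns sockets without the given socket path.
func remove(sockets []string, socket string) []string {
	kept := make([]string, 0, len(sockets))
	for _, s := range sockets {
		if s != socket {
			kept = append(kept, s)
		}
	}
	return kept
}

// Register records an instance as the most recently registered one.
func (h *Handler) Register(plugin, socket string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.instances[plugin] = append(remove(h.instances[plugin], socket), socket)
}

// Unregister drops exactly the instance that listened on socket.
func (h *Handler) Unregister(plugin, socket string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	kept := remove(h.instances[plugin], socket)
	if len(kept) == 0 {
		delete(h.instances, plugin)
		return
	}
	h.instances[plugin] = kept
}

// Current returns the most recently registered instance, if any.
func (h *Handler) Current(plugin string) (string, bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	sockets := h.instances[plugin]
	if len(sockets) == 0 {
		return "", false
	}
	return sockets[len(sockets)-1], true
}

func main() {
	h := NewHandler()
	h.Register("plugin.example.com", "plugin.example.com-old.sock")
	h.Register("plugin.example.com", "plugin.example.com-new.sock")

	// The new instance fails and gets unregistered: fall back to the old one.
	h.Unregister("plugin.example.com", "plugin.example.com-new.sock")
	socket, ok := h.Current("plugin.example.com")
	fmt.Println(socket, ok)
}
```

Keying unregistration on the socket path is what makes the fallback possible: removing one instance leaves the others in place.
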
At the moment, none of the existing handlers support such seamless upgrades:

- The device plugin handler suffers from temporarily removing the extended
  resources during an upgrade. A proposed fix is pending in
  https://github.com/kubernetes/kubernetes/pull/127821.

- The CSI handler [tries to determine which instance is newer](https://github.com/kubernetes/kubernetes/blob/7140b4910c6c1179c9778a7f3bb8037356febd58/pkg/volume/csi/csi_plugin.go#L115-L125) based on the supported version(s) and
  only remembers that one. If that newest instance fails, there is no fallback.

  In practice, most CSI drivers probably pass [the hard-coded "1.0.0"](https://github.com/kubernetes-csi/node-driver-registrar/blob/27700e2962cd35b9f2336a156146181e5c75399e/cmd/csi-node-driver-registrar/main.go#L72)
  from the csi-node-driver-registrar as supported version, so this version
  selection mechanism isn't used at all.

- The DRA handler only remembers the most recently registered instance.

### Deployment

Deploying a plugin with support for seamless upgrades and per-instance socket
filenames is *not* compatible with a kubelet version that does not have support
for seamless upgrades yet. It breaks like this:

- New instance starts, gets registered and replaces the old one.
- Old instance stops, removing its socket.
- The kubelet notices that, unregisters the plugin.
- The plugin handler removes *the new* instance because it ignores the socket path -> no instance left.

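The failure sequence above can be reproduced with a toy model in Go. The `legacyHandler` type is purely hypothetical: it stands in for a handler that remembers one instance per plugin name and ignores the socket path when unregistering, and it is not taken from the kubelet sources.

```go
package main

import "fmt"

// legacyHandler remembers a single instance per plugin name.
type legacyHandler struct {
	current map[string]string // plugin name -> socket path
}

func newLegacyHandler() *legacyHandler {
	return &legacyHandler{current: make(map[string]string)}
}

// register replaces whatever instance was known before.
func (h *legacyHandler) register(plugin, socket string) {
	h.current[plugin] = socket
}

// unregister ignores the socket path and drops the plugin entirely.
func (h *legacyHandler) unregister(plugin, socket string) {
	delete(h.current, plugin)
}

func main() {
	h := newLegacyHandler()
	h.register("plugin.example.com", "plugin.example.com-old.sock")

	// New instance starts, gets registered and replaces the old one.
	h.register("plugin.example.com", "plugin.example.com-new.sock")

	// Old instance stops and removes its socket; the kubelet unregisters it.
	h.unregister("plugin.example.com", "plugin.example.com-old.sock")

	// The handler dropped the new instance, so nothing is left.
	_, ok := h.current["plugin.example.com"]
	fmt.Println("instance available:", ok)
}
```
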
Plugin authors either have to assume that the cluster has a recent enough
kubelet or rely on labeling nodes with support. Then the plugin can use one
simple DaemonSet for nodes without support and another, more complex one where
`maxSurge` is increased to enable seamless upgrades on nodes which support it.
No such label is specified at the moment.

## gRPC Service Lifecycle
