If you set `trafficDistribution: PreferClose` on a service in a
cluster with no defined zones, then it would add

    hints:
      forZones:
      - name: ""

to each endpoint. This ended up working anyway since kube-proxy would
likewise end up looking for an endpoint for the "" zone, but it's
unnecessary, since you'd get exactly the same behavior by just leaving
all of the endpoints unhinted. (Of course there's no point in using
PreferClose traffic distribution in this case, but this will make
PreferSameNode cleaner.)
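For illustration, a minimal Go sketch of the intended behavior, assuming the
discovery/v1 API types (this is not the actual EndpointSlice reconciler code):

```go
package sketch

import discoveryv1 "k8s.io/api/discovery/v1"

// setZoneHint only attaches a forZones hint when the endpoint has a
// non-empty zone. Leaving Hints nil in a cluster without zones gives
// kube-proxy exactly the same behavior as hinting for the "" zone.
func setZoneHint(ep *discoveryv1.Endpoint) {
	if ep.Zone == nil || *ep.Zone == "" {
		ep.Hints = nil
		return
	}
	ep.Hints = &discoveryv1.EndpointHints{
		ForZones: []discoveryv1.ForZone{{Name: *ep.Zone}},
	}
}
```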
Merge TestReconcileHints_trafficDistribution_is_PreferClose and
TestReconcileHints_trafficDistribution_is_nil_or_empty together.
Change the `trafficDistribution: ""` test to `trafficDistribution:
Unknown`, since `""` is not actually a possible value (but we should
still test that unknown values are ignored, to prevent weird skew
bugs).
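The merged test could take roughly this table-driven shape; the test name,
fields, and cases below are illustrative, not the real code:

```go
package endpointslice_test

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/utils/ptr"
)

// Only PreferClose should ever produce hints; nil and unrecognized values
// must behave exactly like "no traffic distribution set".
func TestReconcileHints_trafficDistribution(t *testing.T) {
	testCases := []struct {
		name                string
		trafficDistribution *string
		wantHints           bool
	}{
		{name: "nil", trafficDistribution: nil, wantHints: false},
		{name: "unknown value is ignored", trafficDistribution: ptr.To("Unknown"), wantHints: false},
		{name: "PreferClose", trafficDistribution: ptr.To(corev1.ServiceTrafficDistributionPreferClose), wantHints: true},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			_ = tc // run the reconciler here and assert on the resulting hints
		})
	}
}
```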
Fill in the NodeName field in the endpoints. It's not needed yet but
it will be.
Don't use sets for validating port name and zone hint uniqueness,
since constructing a new set each time is likely to be less efficient
than just doing a linear search.
Keep the sets for supportedAddressTypes and supportedPortProtocols
(since they're only constructed once) but switch to the generic set
API.
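Roughly, with illustrative names rather than the real validation code:

```go
package validation

import (
	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	"k8s.io/apimachinery/pkg/util/sets"
)

// Constructed once, so the generic set API is fine here.
var (
	supportedAddressTypes = sets.New(
		discoveryv1.AddressTypeIPv4,
		discoveryv1.AddressTypeIPv6,
		discoveryv1.AddressTypeFQDN,
	)
	supportedPortProtocols = sets.New(
		corev1.ProtocolTCP,
		corev1.ProtocolUDP,
		corev1.ProtocolSCTP,
	)
)

// hasDuplicatePortName does a linear scan instead of building a set for
// every EndpointSlice being validated; for the handful of ports a slice
// can have, this avoids a per-call allocation.
func hasDuplicatePortName(ports []discoveryv1.EndpointPort) bool {
	for i := range ports {
		for j := 0; j < i; j++ {
			a, b := ports[i].Name, ports[j].Name
			if (a == nil) != (b == nil) {
				continue
			}
			if a == nil || *a == *b {
				return true
			}
		}
	}
	return false
}
```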
Changing the encryption key doesn't work with the watch cache, because it
doesn't break decoding of newly written objects: a new object is written with
the new key and then decoded with that same key.
If there was an unexpected status, the code extracting the expected error
message crashed with a panic. This has happened once so far; the reason is
unknown because the panic meant the unexpected status never got logged.
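The shape of the defensive fix is roughly the following; the helper name and
the use of metav1.Status are assumptions for illustration, not the actual
test code:

```go
package sketch

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// expectedCauseMessage checks the fields it is about to index into and
// returns the unexpected status in the error, so it gets logged instead of
// being lost in a panic.
func expectedCauseMessage(status *metav1.Status) (string, error) {
	if status == nil || status.Details == nil || len(status.Details.Causes) == 0 {
		return "", fmt.Errorf("unexpected status, cannot extract error message: %+v", status)
	}
	return status.Details.Causes[0].Message, nil
}
```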
When the new RollingUpdate option is used, the DRA driver gets deployed such
that it uses unique socket paths and file locking to serialize gRPC calls.
This enables the kubelet to pick arbitrarily between two concurrently running
instances. The handover is seamless (no downtime, no removal of ResourceSlices
by the kubelet).
For file locking, the fileutils package from etcd is used because that was
already a Kubernetes dependency. Unfortunately that package brings in some
additional indirect dependencies for DRA drivers (zap, multierr), but those
seem acceptable.
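The locking part can be sketched like this (the helper and the lock path
handling are illustrative, not the actual DRA driver code):

```go
package sketch

import (
	"os"

	"go.etcd.io/etcd/client/pkg/v3/fileutil"
)

// withFileLock takes a file lock shared by both driver instances before
// handling a gRPC call, so only one instance serves a call at any given
// time during the rolling update.
func withFileLock(lockPath string, handle func() error) error {
	// fileutil.LockFile blocks until the lock is available; closing the
	// returned file releases it again.
	lock, err := fileutil.LockFile(lockPath, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer lock.Close()
	return handle()
}
```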
The key difference is that the kubelet must remember all plugin instances
because it could always happen that the new instance dies and leaves only the
old one running.
The endpoints of each instance must be different. Registering a plugin with the
same endpoint as some other instance is not supported and triggers an error,
which should get reported as "not registered" to the plugin. This should only
happen when the kubelet missed some unregistration event and re-registers the
same instance again. The recovery in this case is for the plugin to shut down
and remove its socket, which the kubelet should observe, and then to try again
after a restart.
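A hypothetical sketch of that bookkeeping (types and names invented here for
illustration): every instance of a driver is tracked by its unique endpoint,
and registering an already-known endpoint is rejected.

```go
package sketch

import (
	"fmt"
	"sync"
)

// pluginInstance stands in for whatever is tracked per registered instance
// (connection, registration time, ...); the field is a placeholder.
type pluginInstance struct {
	endpoint string
}

// registry remembers every instance of every driver, keyed by driver name
// and then by socket endpoint, so the old instance is still known if the
// new one dies.
type registry struct {
	mu      sync.Mutex
	drivers map[string]map[string]*pluginInstance
}

func (r *registry) add(driver, endpoint string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.drivers[driver][endpoint]; ok {
		// Same endpoint twice: reject so that the plugin shuts down,
		// removes its socket, and registers cleanly after a restart.
		return fmt.Errorf("endpoint %s already registered for driver %s", endpoint, driver)
	}
	if r.drivers == nil {
		r.drivers = map[string]map[string]*pluginInstance{}
	}
	if r.drivers[driver] == nil {
		r.drivers[driver] = map[string]*pluginInstance{}
	}
	r.drivers[driver][endpoint] = &pluginInstance{endpoint: endpoint}
	return nil
}
```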
When doing an update of a DaemonSet, first the old pod gets stopped and
then the new one is started. This used to cause the kubelet to remove all
ResourceSlices as soon as the old pod was removed, forcing the new pod to
recreate all of them.
Now the kubelet waits 30 seconds before it deletes ResourceSlices. If a new
driver registers during that period, nothing is done at all. The new driver
finds the existing ResourceSlices and only needs to update them if something
changed.
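A minimal sketch of that grace period; the constant and function here are
illustrative, not the kubelet's actual implementation:

```go
package sketch

import (
	"context"
	"time"
)

const wipingDelay = 30 * time.Second

// scheduleWipe delays removal of a driver's ResourceSlices. Calling the
// returned cancel func (e.g. because a new instance of the driver just
// registered) stops the wipe, so the existing slices stay in place and the
// new driver only has to update them.
func scheduleWipe(ctx context.Context, wipe func()) (cancel func()) {
	ctx, cancel = context.WithCancel(ctx)
	go func() {
		select {
		case <-time.After(wipingDelay):
			wipe()
		case <-ctx.Done():
			// Canceled in time: keep the ResourceSlices.
		}
	}()
	return cancel
}
```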
The downside is that if the driver gets removed permanently, this creates a
delay during which pods might still get scheduled to the node even though the
driver is no longer going to run there, leaving those pods stuck.