Services which fail to sync as a consequence of a triggered node event are
currently never retried. The previous commit gave an example of how such
services used to be retried, but that approach was neither efficient nor fully
correct. Moving to a workqueue for node events is a more modern approach to
syncing nodes, and placing any service keys that fail to sync back on the
service workqueue fixes the re-sync problem.
Also, now that we use a node workqueue and a single go-routine to service items
from that queue, we no longer need the `nodeSyncLock`, so remove it from the
controller as well.
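
A minimal sketch of that shape, assuming a RateLimitingInterface-based
workqueue; the field and helper names here (`nodeQueue`, `serviceQueue`,
`listServices`, `syncService`) are illustrative, not the controller's actual
API:

```go
package servicecontroller

import (
	"k8s.io/client-go/util/workqueue"
)

// controller is a minimal stand-in for the service controller: one workqueue
// for node events and one for service keys.
type controller struct {
	nodeQueue    workqueue.RateLimitingInterface
	serviceQueue workqueue.RateLimitingInterface
	listServices func() []string        // hypothetical: returns all service keys
	syncService  func(key string) error // hypothetical: syncs a single service
}

// nodeWorker is the single goroutine draining node events; being the only
// consumer of nodeQueue, it needs no extra lock such as nodeSyncLock.
func (c *controller) nodeWorker() {
	for {
		item, quit := c.nodeQueue.Get()
		if quit {
			return
		}
		// Any node event means the node set changed, so re-sync every service.
		for _, key := range c.listServices() {
			c.serviceQueue.Add(key)
		}
		c.nodeQueue.Done(item)
	}
}

// serviceWorker retries failed syncs by putting the key back on the service
// workqueue, so the normal service sync path handles the retry.
func (c *controller) serviceWorker() {
	for {
		item, quit := c.serviceQueue.Get()
		if quit {
			return
		}
		key := item.(string)
		if err := c.syncService(key); err != nil {
			c.serviceQueue.AddRateLimited(key)
		} else {
			c.serviceQueue.Forget(key)
		}
		c.serviceQueue.Done(item)
	}
}
```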
It dawned on me that `needsFullSync` can never be false. `needsFullSync` was
used to compare the set of nodes that existed the last time the node event
handler was triggered with the current set of nodes for this run. However, if
`triggerNodeSync` gets called it is always because the set of nodes has
changed, either due to a condition changing on one node or a node being
added/removed. If `needsFullSync` can never be false, then a lot of things in
the service sync path were just spurious, for example `servicesToRetry` and
`knownHosts`. Essentially: if we ever need to `triggerNodeSync`, then the set
of nodes has somehow changed and we always need to re-sync all services.
Before this patch series there was a possibility for `needsFullSync` to be
false: `shouldSyncNode` and the predicates used to list nodes were not aligned,
specifically for Unschedulable nodes. That means we could be triggered by a
change in schedulable state without actually computing any diff between the old
and new node sets. In other words, whenever the schedulable state changed we
would only retry the service updates that had failed during the previous sync.
I believe this was an overlooked coincidence rather than intended behavior.
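
For illustration, a sketch of the pattern being removed (only `knownHosts` and
`servicesToRetry` come from the controller; the helpers are made up): once
`shouldSyncNode` and the node list predicates agree, the diff below can never
come back empty, so the partial re-sync branch is dead code.

```go
package servicecontroller

import "sort"

type oldController struct {
	knownHosts      []string // node names seen at the previous trigger
	servicesToRetry []string // services whose last sync failed
}

// triggerNodeSync diffs the previously seen node set against the current one,
// but it only fires when that set (or a relevant condition on one of its
// nodes) has changed, so needsFullSync is effectively always true.
func (c *oldController) triggerNodeSync(currentHosts []string) {
	needsFullSync := !equalStringSets(c.knownHosts, currentHosts)
	c.knownHosts = currentHosts

	if needsFullSync { // effectively always true
		c.resyncAllServices()
		return
	}
	c.resyncServices(c.servicesToRetry) // never reached
}

func equalStringSets(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	as, bs := append([]string(nil), a...), append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}

func (c *oldController) resyncAllServices()               {} // enqueue every service key
func (c *oldController) resyncServices(services []string) {} // enqueue only the given keys
```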
Update the predicate to check for a length >= 2 to avoid
an index-out-of-bounds panic.
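
As a purely hypothetical illustration of the shape of that guard (the function
and its inputs are not the actual predicate):

```go
package predicates

// predicateSafe guards the slice access with a length check so a value with
// fewer than two segments cannot trigger an index-out-of-bounds panic.
func predicateSafe(parts []string) bool {
	if len(parts) < 2 {
		return false // too short to index parts[1]
	}
	return parts[0] != "" && parts[1] != ""
}
```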
Signed-off-by: Edwin Xie <exie@vmware.com>
Co-authored-by: Tyler Schultz <tschultz@vmware.com>
github.com/opencontainers/selinux/go-selinux needs an OS that supports SELinux,
with SELinux enabled, to return useful data. Therefore add an interface in
front of it, so we can mock its behavior in unit tests.
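
A minimal sketch of that indirection; the interface and type names are
illustrative, and only selinux.GetEnabled() is the real library call:

```go
package selinuxutil

import (
	selinux "github.com/opencontainers/selinux/go-selinux"
)

// seLinuxChecker abstracts the part of go-selinux the volume code needs.
type seLinuxChecker interface {
	// SELinuxEnabled reports whether the host OS has SELinux enabled.
	SELinuxEnabled() bool
}

// realSELinux delegates to the library; only useful on SELinux-enabled hosts.
type realSELinux struct{}

func (realSELinux) SELinuxEnabled() bool { return selinux.GetEnabled() }

// fakeSELinux is what unit tests would inject instead of the real library.
type fakeSELinux struct{ enabled bool }

func (f fakeSELinux) SELinuxEnabled() bool { return f.enabled }
```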
In theory the check is not necessary, but for the sake of robustness and
completeness, check the SELinuxMountReadWriteOncePod feature gate before
assuming anything about SELinux labels.
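
A sketch of what such a guard can look like, assuming the standard feature-gate
helpers; the surrounding function is hypothetical:

```go
package volumeutil

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// seLinuxLabelFor is an illustrative helper: it checks the
// SELinuxMountReadWriteOncePod feature gate before computing any SELinux
// label, and returns an empty label when the gate is off.
func seLinuxLabelFor(computeLabel func() (string, error)) (string, error) {
	if !utilfeature.DefaultFeatureGate.Enabled(features.SELinuxMountReadWriteOncePod) {
		return "", nil // feature disabled: assume nothing about SELinux labels
	}
	return computeLabel()
}
```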
Add a new call to the VolumePlugin interface and update all its
implementations.
Kubelet's VolumeManager will be interested in whether a volume supports
mounting with -o context=XYZ or not, to handle SetUp() / MountDevice()
accordingly.
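
A sketch of the kind of method this describes; the exact name and signature
here are assumptions, not necessarily what the interface ends up with:

```go
package volume

// Spec stands in for the existing volume spec type (PV, volume source,
// read-only flag, etc.).
type Spec struct{}

type VolumePlugin interface {
	// ... existing methods ...

	// SupportsSELinuxContextMount reports whether the volume described by spec
	// can be mounted with an SELinux mount option ("-o context=...") instead
	// of being recursively relabeled by the container runtime.
	SupportsSELinuxContextMount(spec *Spec) (bool, error)
}
```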
Let volume plugins decide if they want to mount volumes with "-o
context=XYZ" or let the container runtime relabel the volume on container
startup.
Using NewMounter, as it's the call where a volume plugin gets the other MountOptions.
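
For illustration, a rough sketch of how a plugin could fold the label into its
mount options; the types and field names are assumptions, not the kubelet's
actual API:

```go
package volumeplugin

import "fmt"

// mounterOptions is a hypothetical bundle of what a plugin receives: the mount
// options from the PV/StorageClass plus an SELinux label. When the label is
// empty, the container runtime relabels the volume on container startup
// instead of the plugin mounting with "-o context=...".
type mounterOptions struct {
	MountOptions []string // options coming from the PV / StorageClass
	SELinuxLabel string   // empty when the runtime should relabel
}

// buildMountOptions appends a context= option only when a label was provided.
func buildMountOptions(o mounterOptions) []string {
	opts := append([]string(nil), o.MountOptions...)
	if o.SELinuxLabel != "" {
		opts = append(opts, fmt.Sprintf("context=%q", o.SELinuxLabel))
	}
	return opts
}
```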