Some test cases can make nodes not ready and use DeferCleanup to bring
nodes back online. Checking if all nodes are online would fail
in such cases as AfterEach runs before DeferCleanup.
Scheduling nodes readines check to DeferCleanup should solve this
issue as nodes would be brought back to a `Ready` state before the
check.
Add retry mechanism to handle cases where after kubelet restarts, the device
plugin unix socket(s) were created but not ready to serve yet.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Add kubeletSocket file to fsnotify instead of polling and waiting for deletion
of device plugin unix socket as a way of detecting kubelet restart. We need to
ensure that the device plugin re-registers itself after kubelet restart depending
on the configured registration mode (auto-registration or controller registration).
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
If the user specifies the intent to control registration process, we rely on
registration triggers (deletion of control file) to prompt registration.
This behvaiour is expected to be consistent across kubelet restarts and therefore
across the watch calls where we watch for changes to the unix socket so we make
this part of Stub object instead of a parameter.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the
registration is triggered by deletion of the control file. This is
applicable both when the registration happens for the first time and
subsequent ones because of kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In issue: 115107 we added an environment variable to control the registration of sample
device plugin to kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register to kubelet (in case no environment
variable is specified).
In addition to that, we want to ensure that the plugin registers itself not just once.
It should re-register itself to kubelet in case of node reboot or kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Fixes https://github.com/kubernetes/kubernetes/issues/120941
GetNewerThan() call isn't blocking until the pod status/cache is updated and returning the empty pod status.
Hence, whenever the `SyncLoop ADD/UPDATE/RECONCILE` functions are called multiple times in a very less time interval,
Kubelet calls multiple `CreateContainer` CRI api that results in the creation of duplicate containers within a given pod.
The initially created conainer keeps `Running` and the later container keeps `Exiting` and hence resulting the pod in `CrashLoopBackOff` state forever
Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
This was created as alpha (off by default) but should probably have used
the "Deprecated" setting. Changing it now makes it on by default and
still disable-able.