clh: isClhRunning waits for full timeout when clh exits

isClhRunning uses signal 0 to test whether the process is still alive or not. This doesn't work because the process is a direct child of the shim. Once it is dead the process becomes zombie. Since no one waits for it the process lingers until its parent dies and init reaps it. Hence sending signal 0 in isClhRunning will always return success whether the process is dead or not. This patch calls wait to reap the process, if it succeeds that means it is our child process, if not we send the signal. Fixes: #9431 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2025-09-20 16:27:52 +00:00 · 2024-04-08 15:44:46 +03:00
parent 6b2d655857
commit 54923164b5
1 changed files with 6 additions and 1 deletions
--- a/src/runtime/virtcontainers/clh.go
+++ b/src/runtime/virtcontainers/clh.go
@@ -1467,7 +1467,12 @@ func (clh *cloudHypervisor) isClhRunning(timeout uint) (bool, error) {
 	timeStart := time.Now()
 	cl := clh.client()
 	for {
-		err := syscall.Kill(pid, syscall.Signal(0))
+		waitedPid, err := syscall.Wait4(pid, nil, syscall.WNOHANG, nil)
+		if waitedPid == pid && err == nil {
+			return false, nil
+		}
+
+		err = syscall.Kill(pid, syscall.Signal(0))
 		if err != nil {
 			return false, nil
 		}