runtime-rs: Refine OOM watcher error reporting for sandbox teardown

This commit refines error handling in the OOM watcher so that it
distinguishes genuine failures from errors that occur as a natural
consequence of sandbox shutdown, using the helper
is_normal_oom_shutdown_error from common::error. Previously, every error
returned while fetching OOM events, including connection errors raised
during teardown, was logged as a warning, which cluttered the logs.

The watcher now treats the two classes of error differently. Normal
shutdown errors (e.g. "Connection reset by peer" or "Broken pipe") are
logged at info level and end the watcher loop. Any other error is still
logged as a warning, but the loop now continues waiting for OOM events
instead of exiting.
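The body of is_normal_oom_shutdown_error is not part of this diff. As a rough, std-only sketch of the kind of check described above, assuming the error exposes its causes through std::error::Error::source() (an anyhow::Error offers the same walk via its chain() method), one could look for io::ErrorKind::ConnectionReset or BrokenPipe anywhere in the chain and fall back to matching the message text. The names looks_like_shutdown_error and Wrapped are hypothetical; this is not the implementation in common::error.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for a shutdown-error classifier (not the real
/// `common::error::is_normal_oom_shutdown_error`). Returns true if any
/// error in the `source()` chain looks like the agent connection closing.
fn looks_like_shutdown_error(err: &(dyn Error + 'static)) -> bool {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        // Typed check: an io::Error with a connection-teardown kind.
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if matches!(
                io_err.kind(),
                io::ErrorKind::ConnectionReset | io::ErrorKind::BrokenPipe
            ) {
                return true;
            }
        }
        // Assumed fallback for errors that only carry text, e.g. an RPC
        // error that re-wrapped the OS message (case-sensitive match).
        let msg = e.to_string();
        if msg.contains("Connection reset by peer") || msg.contains("Broken pipe") {
            return true;
        }
        current = e.source();
    }
    false
}

/// Hypothetical wrapper used only to build a two-level error chain.
#[derive(Debug)]
struct Wrapped(&'static str, io::Error);

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.1)
    }
}

fn main() {
    let shutdown = Wrapped("get oom event", io::Error::from(io::ErrorKind::BrokenPipe));
    let genuine = Wrapped("get oom event", io::Error::new(io::ErrorKind::Other, "decode failed"));
    println!("{}", looks_like_shutdown_error(&shutdown)); // true
    println!("{}", looks_like_shutdown_error(&genuine)); // false
}
```

Checking the typed ErrorKind first keeps the string fallback as a last resort, since message text can vary between platforms and error wrappers.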

As a result, the OOM watcher no longer emits warnings during normal
sandbox termination, and the warnings that remain point to real failures.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Author: Alex Lyn
Date:   2026-01-06 19:57:03 +08:00
Parent: 3095bd379b
Commit: 44dd2b1f34

@@ -12,6 +12,7 @@ use agent::{
 };
 use anyhow::{anyhow, Context, Result};
 use async_trait::async_trait;
+use common::error::is_normal_oom_shutdown_error;
 use common::types::utils::option_system_time_into;
 use common::types::ContainerProcess;
 use common::{
@@ -925,8 +926,14 @@ impl Sandbox for VirtSandbox {
                     }
                 }
                 Err(err) => {
-                    warn!(sl!(), "failed to get oom event error {:?}", err);
-                    break;
+                    // Handle errors by type
+                    if is_normal_oom_shutdown_error(&err) {
+                        info!(sl!(), "oom watcher exit on sandbox shutdown: {:?}", err);
+                        break;
+                    } else {
+                        warn!(sl!(), "failed to get oom event error {:?}", err);
+                        continue;
+                    }
                 }
             }
         }
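To see what the break/continue change means in practice, the patched Err(err) arm can be exercised against a scripted event source. Apart from the two log messages, everything here is a hypothetical illustration: MockAgent, next_oom_event, is_shutdown (which only inspects the top-level io::ErrorKind) and the container-ID payload are assumptions, since the real agent call and event type are not shown in this diff.

```rust
use std::collections::VecDeque;
use std::io::{Error, ErrorKind};

/// Hypothetical event source standing in for the agent's OOM-event call.
struct MockAgent {
    replies: VecDeque<Result<String, Error>>,
}

impl MockAgent {
    fn next_oom_event(&mut self) -> Result<String, Error> {
        // Once the scripted replies run out, behave like a closed connection.
        self.replies
            .pop_front()
            .unwrap_or_else(|| Err(Error::from(ErrorKind::BrokenPipe)))
    }
}

/// Simplified classifier: only looks at the top-level io::ErrorKind.
fn is_shutdown(err: &Error) -> bool {
    matches!(err.kind(), ErrorKind::ConnectionReset | ErrorKind::BrokenPipe)
}

/// Mirrors the control flow of the patched watcher loop and returns the
/// (hypothetical) container IDs received before the loop exited.
fn watch_oom(agent: &mut MockAgent) -> Vec<String> {
    let mut seen = Vec::new();
    loop {
        match agent.next_oom_event() {
            Ok(container_id) => seen.push(container_id),
            Err(err) => {
                if is_shutdown(&err) {
                    println!("info: oom watcher exit on sandbox shutdown: {:?}", err);
                    break;
                } else {
                    println!("warn: failed to get oom event error {:?}", err);
                    continue;
                }
            }
        }
    }
    seen
}

fn main() {
    let mut agent = MockAgent {
        replies: VecDeque::from(vec![
            Ok("c1".to_string()),
            Err(Error::new(ErrorKind::Other, "transient")),
            Ok("c2".to_string()),
            Err(Error::from(ErrorKind::ConnectionReset)),
            Ok("never reached".to_string()),
        ]),
    };
    let seen = watch_oom(&mut agent);
    assert_eq!(seen, vec!["c1".to_string(), "c2".to_string()]);
}
```

The transient error between the two events no longer ends the loop, so c2 is still collected; only the connection reset stops it. Because non-shutdown errors now continue immediately, an error that recurred on every call would be retried without delay; whether that can happen depends on the agent call, which this diff does not show.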