Compare commits

..

10 Commits

Author SHA1 Message Date
Alex Lyn
22c4cab237 Merge pull request #12623 from Apokleos/fix-dgb-ut
runtime-rs: Fix dragonball's flaky unit tests
2026-03-09 11:38:02 +08:00
Alex Lyn
62b0f63e37 dragonball: Generate unique TAP names to avoid conflicts
The vhost-kern net unit test used a fixed TAP interface name
("test_vhosttap"). When tests run in parallel or a previous run
leaves the interface behind, TAP creation can fail with
EBUSY ("Resource busy"), making CI flaky.

Introduce a unique_tap_name() helper in the tests and use it to
generate a per-test TAP name (based on pid/thread/counter),
avoiding name collisions and stabilizing CI.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 17:33:40 +08:00
Alex Lyn
1c8c0089da dragonball: fix flaky signal_handler test using libc::raise
The signal_handler test was intermittently failing because it used
kill(pid, sig), which sends signals asynchronously to the process.
This created a race condition where the child thread could exit and
be joined before the signal was delivered or processed.

This fix including:
1. Replaces `kill` with `libc::raise` to ensure signals are delivered
   synchronously to the calling thread.
2. Reorders triggers to verify standard signals before installing
   seccomp filters.
3. Guarantees that metrics are incremented before the child thread
   terminates and is joined by the main thread.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
d0718f6001 dragonball: Fix unnecessary parentheses around type
warning: unnecessary parentheses around type
   --> src/dragonball/dbs_legacy_devices/src/serial.rs:245:39
    |
245 |         let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send +
'static)>>>> =
    |                                       ^
^
    |
    = note: `#[warn(unused_parens)]` (part of `#[warn(unused)]`) on by
default
help: remove these parentheses

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
b4161198ee dragonball: Remove unused imports variables in dbs_pci
Fix warnings of unused imports as below:
```
warning: unused imports: `DEVICE_ACKNOWLEDGE`, `DEVICE_DRIVER_OK`,
`DEVICE_DRIVER`, `DEVICE_FEATURES_OK`, and `DEVICE_INIT`
    --> src/dragonball/dbs_pci/src/virtio_pci.rs:1177:9
     |
1177 |         DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK,
DEVICE_FEATURES_OK, DEVICE_INIT,
     |         ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
     |
     = note: `#[warn(unused_imports)]` (part of `#[warn(unused)]`) on by
default
```

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
ca4e14086f runtime-rs: Fix warnings of unformatted codes
Fix warnings from unformattted codes.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
ce800b7c37 dragonball: Fix flaky test_vhost_user_net_virtio_device_activate hang
The vhost-user-net tests could hang in CI because
VhostUserNet::new_server() blocks indefinitely on listener.accept()
when the slave fails to connect in time
(e.g. due to scheduler delays or flaky socket paths). This also caused
panics when connect_slave() returned None and the test unwrapped it.

Fix the tests by:
- using a `/tmp`, absolute, unique unix socket path per test run
  retrying slave connect with a deadline
- running new_server() in a separate thread and waiting via
  recv_timeout() to ensure the test never blocks indefinitely

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
a988b10440 dragonball: Fix flaky test_vhost_user_net_virtio_device_normal hang
It aims to fix flaky test hang by implementing thread timeouts.

The `test_vhost_user_net_virtio_device_normal` was hanging in CI
when master/slave threads drifted.

This commit stabilizes the test by:
- Using `tempfile` and unique paths to ensure socket isolation.
- Adding a 5s deadline for slave connections to handle CI jitter.
- Running `new_server` in a separate thread with a `recv_timeout`
  to prevent the CI pipeline from deadlocking.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
f36218d566 dragonball: Fix flaky test_inner_stream_timeout in inner backend
The `test_inner_stream_timeout` test case was prone to failure due to a
race condition between the main thread and the background handler. The
test relied on hardcoded `thread::sleep` durations, which could cause
the second read operation to time out (150ms window) before the main
thread performed its write (after a 300ms sleep) under high system load.

This commit stabilizes the test by:
1. Replacing fixed sleep durations with a `Condvar` and a `stage`
   variable to implement a deterministic state machine.
2. Synchronizing the threads so that the main thread only writes data
   after the background handler has confirmed it is ready or has
   completed its previous phase.
3. Ensuring the read timeout is explicitly managed between different
   validation stages to prevent accidental `TimedOut` errors.

This change eliminates the flakiness and ensures the test passes
consistently across different CIenvironments.

Fixes #12618

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
c8a39ad28d dragonball: Fix flaky test_epoll_manager by improving synchronization
This commit aims to address issues of "Infinite loop in epoll_manager
tests" and improve stablity.

Root causes as below:
1. Using `handle_events(-1)` caused the worker thread to block forever
   if an event was missed or if the internal `kick()` signal was not
   accounted for correctly.
2. Relying on event counts was unreliable because internal signals could
   fluctuate the total count, causing the it to enter an infinite loop.
3. Using `EventSet::OUT` on an EventFd is often continuously ready,
   leading to non-deterministic trigger behavior.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
13 changed files with 302 additions and 226 deletions

View File

@@ -25,9 +25,10 @@ jobs:
fail-fast: false
matrix:
containerd_version: ['active']
# vmm: ['dragonball', 'cloud-hypervisor', 'qemu-runtime-rs']
vmm: ['dragonball', 'qemu-runtime-rs']
runs-on: ubuntu-24.04
vmm: ['dragonball', 'cloud-hypervisor', 'qemu-runtime-rs']
# TODO: enable me when https://github.com/containerd/containerd/issues/11640 is fixed
if: false
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}

View File

@@ -242,7 +242,7 @@ mod tests {
let metrics = Arc::new(SerialDeviceMetrics::default());
let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send + 'static)>>>> =
let out: Arc<Mutex<Option<Box<dyn std::io::Write + Send + 'static>>>> =
Arc::new(Mutex::new(Some(Box::new(std::io::sink()))));
let mut serial = SerialDevice {
serial: Serial::with_events(

View File

@@ -1174,7 +1174,6 @@ pub(crate) mod tests {
use dbs_virtio_devices::Result as VirtIoResult;
use dbs_virtio_devices::{
ActivateResult, VirtioDeviceConfig, VirtioDeviceInfo, VirtioSharedMemory,
DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK, DEVICE_FEATURES_OK, DEVICE_INIT,
};
use dbs_address_space::{AddressSpaceLayout, AddressSpaceRegion, AddressSpaceRegionType};

View File

@@ -99,76 +99,61 @@ impl Default for EpollManager {
#[cfg(test)]
mod tests {
use super::*;
use std::os::unix::io::AsRawFd;
use std::os::fd::AsRawFd;
use std::sync::mpsc::channel;
use std::time::Duration;
use vmm_sys_util::{epoll::EventSet, eventfd::EventFd};
struct DummySubscriber {
pub event: EventFd,
pub event: Arc<EventFd>,
pub notify: std::sync::mpsc::Sender<()>,
}
impl DummySubscriber {
fn new() -> Self {
Self {
event: EventFd::new(0).unwrap(),
}
fn new(event: Arc<EventFd>, notify: std::sync::mpsc::Sender<()>) -> Self {
Self { event, notify }
}
}
impl MutEventSubscriber for DummySubscriber {
fn process(&mut self, events: Events, _ops: &mut EventOps) {
let source = events.fd();
let event_set = events.event_set();
assert_ne!(source, self.event.as_raw_fd());
match event_set {
EventSet::IN => {
unreachable!()
}
EventSet::OUT => {
self.event.read().unwrap();
}
_ => {
unreachable!()
}
}
fn init(&mut self, ops: &mut EventOps) {
ops.add(Events::new(self.event.as_ref(), EventSet::IN))
.unwrap();
}
fn init(&mut self, _ops: &mut EventOps) {}
fn process(&mut self, events: Events, _ops: &mut EventOps) {
if events.fd() == self.event.as_raw_fd() && events.event_set().contains(EventSet::IN) {
let _ = self.event.read();
let _ = self.notify.send(());
}
}
}
#[test]
fn test_epoll_manager() {
let mut epoll_manager = EpollManager::default();
let epoll_manager_clone = epoll_manager.clone();
let thread = std::thread::spawn(move || loop {
let count = epoll_manager_clone.handle_events(-1).unwrap();
if count == 0 {
continue;
let epoll_manager = EpollManager::default();
let (stop_tx, stop_rx) = channel::<()>();
let worker_mgr = epoll_manager.clone();
let worker = std::thread::spawn(move || {
while stop_rx.try_recv().is_err() {
let _ = worker_mgr.handle_events(50);
}
assert_eq!(count, 1);
break;
});
let handler = DummySubscriber::new();
let event = handler.event.try_clone().unwrap();
let (notify_tx, notify_rx) = channel::<()>();
let event = Arc::new(EventFd::new(0).unwrap());
let handler = DummySubscriber::new(event.clone(), notify_tx);
let id = epoll_manager.add_subscriber(Box::new(handler));
thread.join().unwrap();
epoll_manager
.add_event(id, Events::new(&event, EventSet::OUT))
.unwrap();
event.write(1).unwrap();
let epoll_manager_clone = epoll_manager.clone();
let thread = std::thread::spawn(move || loop {
let count = epoll_manager_clone.handle_events(-1).unwrap();
if count == 0 {
continue;
}
assert_eq!(count, 2);
break;
});
notify_rx
.recv_timeout(Duration::from_secs(2))
.expect("timeout waiting for subscriber to be processed");
thread.join().unwrap();
epoll_manager.remove_subscriber(id).unwrap();
epoll_manager.clone().remove_subscriber(id).unwrap();
let _ = stop_tx.send(());
worker.join().unwrap();
}
}

View File

@@ -690,6 +690,15 @@ mod tests {
use crate::tests::{create_address_space, create_vm_and_irq_manager};
use crate::{create_queue_notifier, VirtioQueueConfig};
fn unique_tap_name(prefix: &str) -> String {
use std::sync::atomic::{AtomicUsize, Ordering};
static CNT: AtomicUsize = AtomicUsize::new(0);
let n = CNT.fetch_add(1, Ordering::Relaxed);
// "vtap" + pid(<=5) + n(<=3) => max len <= 15
format!("{}{:x}{:x}", prefix, std::process::id() & 0xfff, n & 0xfff)
}
fn create_vhost_kern_net_epoll_handler(
id: String,
) -> NetEpollHandler<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> {
@@ -723,13 +732,16 @@ mod tests {
let guest_mac = MacAddr::parse_str(guest_mac_str).unwrap();
let queue_sizes = Arc::new(vec![128]);
let epoll_mgr = EpollManager::default();
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = Net::new(
String::from("test_vhosttap"),
Some(&guest_mac),
queue_sizes,
epoll_mgr,
)
.unwrap();
let tap_name = unique_tap_name("vtap");
let dev_result: VirtioResult<Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap>> =
Net::new(tap_name.clone(), Some(&guest_mac), queue_sizes, epoll_mgr);
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = match dev_result {
Ok(d) => d,
Err(e) => {
eprintln!("skip test: failed to create tap {}: {:?}", tap_name, e);
return;
}
};
assert_eq!(dev.device_type(), TYPE_NET);
@@ -765,14 +777,16 @@ mod tests {
{
let queue_sizes = Arc::new(vec![128]);
let epoll_mgr = EpollManager::default();
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = Net::new(
String::from("test_vhosttap"),
Some(&guest_mac),
queue_sizes,
epoll_mgr,
)
.unwrap();
let tap_name = unique_tap_name("vtap");
let dev_result: VirtioResult<Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap>> =
Net::new(tap_name.clone(), Some(&guest_mac), queue_sizes, epoll_mgr);
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = match dev_result {
Ok(d) => d,
Err(e) => {
eprintln!("skip test: failed to create tap {}: {:?}", tap_name, e);
return;
}
};
let queues = vec![
VirtioQueueConfig::create(128, 0).unwrap(),
VirtioQueueConfig::create(128, 0).unwrap(),
@@ -809,13 +823,17 @@ mod tests {
let queue_eventfd2 = Arc::new(EventFd::new(0).unwrap());
let queue_sizes = Arc::new(vec![128, 128]);
let epoll_mgr = EpollManager::default();
let mut dev: Net<Arc<GuestMemoryMmap>, Queue, GuestRegionMmap> = Net::new(
String::from("test_vhosttap"),
Some(&guest_mac),
queue_sizes,
epoll_mgr,
)
.unwrap();
let tap_name = unique_tap_name("vtap");
let dev_result: VirtioResult<Net<Arc<GuestMemoryMmap>, Queue, GuestRegionMmap>> =
Net::new(tap_name.clone(), Some(&guest_mac), queue_sizes, epoll_mgr);
let mut dev: Net<Arc<GuestMemoryMmap>, Queue, GuestRegionMmap> = match dev_result {
Ok(d) => d,
Err(e) => {
eprintln!("skip test: failed to create tap {}: {:?}", tap_name, e);
return;
}
};
let queues = vec![
VirtioQueueConfig::new(queue, queue_eventfd, notifier.clone(), 1),

View File

@@ -590,6 +590,7 @@ where
mod tests {
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};
use dbs_device::resources::DeviceResources;
use dbs_interrupt::{InterruptManager, InterruptSourceType, MsiNotifier, NoopNotifier};
@@ -609,19 +610,16 @@ mod tests {
};
use crate::{VirtioDevice, VirtioDeviceConfig, VirtioQueueConfig, TYPE_NET};
fn connect_slave(path: &str) -> Option<Endpoint<MasterReq>> {
let mut retry_count = 5;
fn connect_slave(path: &str, timeout: Duration) -> Option<Endpoint<MasterReq>> {
let deadline = Instant::now() + timeout;
loop {
match Endpoint::<MasterReq>::connect(path) {
Ok(endpoint) => return Some(endpoint),
Ok(ep) => return Some(ep),
Err(_) => {
if retry_count > 0 {
std::thread::sleep(std::time::Duration::from_millis(100));
retry_count -= 1;
continue;
} else {
if Instant::now() >= deadline {
return None;
}
thread::sleep(Duration::from_millis(20));
}
}
}
@@ -639,62 +637,88 @@ mod tests {
#[test]
fn test_vhost_user_net_virtio_device_normal() {
let device_socket = concat!("vhost.", line!());
let queue_sizes = Arc::new(vec![128]);
let dir_path = std::path::Path::new("/tmp");
let socket_path = dir_path.join(format!(
"vhost-user-net-{}-{:?}.sock",
std::process::id(),
thread::current().id()
));
let socket_str = socket_path.to_str().unwrap().to_string();
let _ = std::fs::remove_file(&socket_path);
let queue_sizes = Arc::new(vec![128u16]);
let epoll_mgr = EpollManager::default();
let handler = thread::spawn(move || {
let mut slave = connect_slave(device_socket).unwrap();
let socket_for_slave = socket_str.clone();
let slave_th = thread::spawn(move || {
let mut slave = connect_slave(&socket_for_slave, Duration::from_secs(5))
.unwrap_or_else(|| panic!("slave connect timeout: {}", socket_for_slave));
create_vhost_user_net_slave(&mut slave);
});
let mut dev: VhostUserNet<Arc<GuestMemoryMmap>> =
VhostUserNet::new_server(device_socket, None, queue_sizes, epoll_mgr).unwrap();
let (tx, rx) = std::sync::mpsc::channel();
let socket_for_master = socket_str.clone();
let queue_sizes_for_master = queue_sizes.clone();
let epoll_mgr_for_master = epoll_mgr.clone();
thread::spawn(move || {
let res = VhostUserNet::<Arc<GuestMemoryMmap>>::new_server(
&socket_for_master,
None,
queue_sizes_for_master,
epoll_mgr_for_master,
);
let _ = tx.send(res);
});
let dev_res = rx
.recv_timeout(Duration::from_secs(5))
.unwrap_or_else(|_| panic!("new_server() stuck/timeout: {}", socket_str));
let dev: VhostUserNet<Arc<GuestMemoryMmap>> = dev_res.unwrap_or_else(|e| {
panic!(
"new_server() returned error: {:?}, socket={}",
e, socket_str
)
});
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::device_type(&dev),
TYPE_NET
);
let queue_size = [128];
let queue_size = [128u16];
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::queue_max_sizes(
&dev
),
&queue_size[..]
);
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 0),
dev.device().device_info.get_avail_features(0)
);
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 1),
dev.device().device_info.get_avail_features(1)
);
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 2),
dev.device().device_info.get_avail_features(2)
);
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::set_acked_features(
&mut dev, 2, 0,
);
assert_eq!(VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 2), 0);
let config: [u8; 8] = [0; 8];
let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
&mut dev, 0, &config,
);
let mut data: [u8; 8] = [1; 8];
let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
&mut dev, 0, &mut data,
);
assert_eq!(config, data);
handler.join().unwrap();
slave_th.join().unwrap();
let _ = std::fs::remove_file(&socket_path);
drop(dev);
}
#[test]
fn test_vhost_user_net_virtio_device_activate() {
skip_if_kvm_unaccessable!();
let device_socket = concat!("vhost.", line!());
let queue_sizes = Arc::new(vec![128]);
let dir_path = std::path::Path::new("/tmp");
let socket_path = dir_path.join(format!(
"vhost-user-net-{}-{:?}.sock",
std::process::id(),
thread::current().id()
));
let socket_str = socket_path.to_str().unwrap().to_string();
let _ = std::fs::remove_file(&socket_path);
let queue_sizes = Arc::new(vec![128u16]);
let epoll_mgr = EpollManager::default();
let handler = thread::spawn(move || {
let mut slave = connect_slave(device_socket).unwrap();
let socket_for_slave = socket_str.clone();
let slave_th = thread::spawn(move || {
let mut slave = connect_slave(&socket_for_slave, Duration::from_secs(10))
.unwrap_or_else(|| panic!("slave connect timeout: {}", socket_for_slave));
create_vhost_user_net_slave(&mut slave);
let mut pfeatures = VhostUserProtocolFeatures::all();
// A workaround for no support for `INFLIGHT_SHMFD`. File an issue to track
@@ -702,8 +726,30 @@ mod tests {
pfeatures -= VhostUserProtocolFeatures::INFLIGHT_SHMFD;
negotiate_slave(&mut slave, pfeatures, true, 1);
});
let mut dev: VhostUserNet<Arc<GuestMemoryMmap>> =
VhostUserNet::new_server(device_socket, None, queue_sizes, epoll_mgr).unwrap();
let (tx, rx) = std::sync::mpsc::channel();
let socket_for_master = socket_str.clone();
let queue_sizes_for_master = queue_sizes.clone();
let epoll_mgr_for_master = epoll_mgr.clone();
thread::spawn(move || {
let res = VhostUserNet::<Arc<GuestMemoryMmap>>::new_server(
&socket_for_master,
None,
queue_sizes_for_master,
epoll_mgr_for_master,
);
let _ = tx.send(res);
});
let mut dev: VhostUserNet<Arc<GuestMemoryMmap>> = rx
.recv_timeout(Duration::from_secs(10))
.unwrap_or_else(|_| panic!("new_server() stuck/timeout: {}", socket_str))
.unwrap_or_else(|e| {
panic!(
"new_server() returned error: {:?}, socket={}",
e, socket_str
)
});
// invalid queue size
{
let kvm = Kvm::new().unwrap();
@@ -760,6 +806,9 @@ mod tests {
);
dev.activate(config).unwrap();
}
handler.join().unwrap();
slave_th.join().unwrap();
let _ = std::fs::remove_file(&socket_path);
drop(dev);
}
}

View File

@@ -867,56 +867,96 @@ mod tests {
.set_read_timeout(Some(Duration::from_millis(150)))
.is_ok());
let cond_pair = Arc::new((Mutex::new(false), Condvar::new()));
let cond_pair_2 = Arc::clone(&cond_pair);
let handler = thread::Builder::new()
.spawn(move || {
// notify handler thread start
let (lock, cvar) = &*cond_pair_2;
let mut started = lock.lock().unwrap();
*started = true;
// stage:
// 0 = handler started
// 1 = first read timed out (main can do first write now)
// 2 = timeout cancelled, handler is about to do 3rd blocking read
let stage = Arc::new((Mutex::new(0u32), Condvar::new()));
let stage2 = Arc::clone(&stage);
let handler = thread::spawn(move || {
// notify started
{
let (lock, cvar) = &*stage2;
let mut s = lock.lock().unwrap();
*s = 0;
cvar.notify_one();
drop(started);
}
let start_time1 = Instant::now();
let mut reader_buf = [0; 5];
// first read would timed out
assert_eq!(
outer_stream.read_exact(&mut reader_buf).unwrap_err().kind(),
ErrorKind::TimedOut
);
let end_time1 = Instant::now().duration_since(start_time1).as_millis();
assert!((150..250).contains(&end_time1));
let mut reader_buf = [0u8; 5];
// second read would ok
assert!(outer_stream.read_exact(&mut reader_buf).is_ok());
assert_eq!(reader_buf, [1, 2, 3, 4, 5]);
// 1) first read should timed out
let start_time1 = Instant::now();
assert_eq!(
outer_stream.read_exact(&mut reader_buf).unwrap_err().kind(),
ErrorKind::TimedOut
);
let end_time1 = start_time1.elapsed().as_millis();
assert!((150..300).contains(&end_time1));
// cancel the read timeout
let start_time2 = Instant::now();
outer_stream.set_read_timeout(None).unwrap();
assert!(outer_stream.read_exact(&mut reader_buf).is_ok());
let end_time2 = Instant::now().duration_since(start_time2).as_millis();
assert!(end_time2 >= 500);
})
.unwrap();
outer_stream
.set_read_timeout(Some(Duration::from_secs(10)))
.unwrap();
// wait handler thread started
let (lock, cvar) = &*cond_pair;
let mut started = lock.lock().unwrap();
while !*started {
started = cvar.wait(started).unwrap();
// notify main: timeout observed, now do first write
{
let (lock, cvar) = &*stage2;
let mut s = lock.lock().unwrap();
*s = 1;
cvar.notify_one();
}
// 2) second read should ok (main will write after stage==1)
outer_stream.read_exact(&mut reader_buf).unwrap();
assert_eq!(reader_buf, [1, 2, 3, 4, 5]);
// 3) cancel timeout, then do a blocking read; notify main before blocking
outer_stream.set_read_timeout(None).unwrap();
{
let (lock, cvar) = &*stage2;
let mut s = lock.lock().unwrap();
*s = 2;
cvar.notify_one();
}
let start_time2 = Instant::now();
outer_stream.read_exact(&mut reader_buf).unwrap();
let end_time2 = start_time2.elapsed().as_millis();
assert!(end_time2 >= 500);
assert_eq!(reader_buf, [1, 2, 3, 4, 5]);
});
// wait handler started (stage==0)
{
let (lock, cvar) = &*stage;
let mut s = lock.lock().unwrap();
while *s != 0 {
s = cvar.wait(s).unwrap();
}
}
// sleep 300ms, test timeout
thread::sleep(Duration::from_millis(300));
let writer_buf = [1, 2, 3, 4, 5];
inner_stream.write_all(&writer_buf).unwrap();
// wait first timeout done (stage==1), then do first write
{
let (lock, cvar) = &*stage;
let mut s = lock.lock().unwrap();
while *s < 1 {
s = cvar.wait(s).unwrap();
}
}
inner_stream.write_all(&[1, 2, 3, 4, 5]).unwrap();
// wait handler cancelled timeout and is about to block-read (stage==2)
{
let (lock, cvar) = &*stage;
let mut s = lock.lock().unwrap();
while *s < 2 {
s = cvar.wait(s).unwrap();
}
}
// sleep 500ms again, test cancel timeout
thread::sleep(Duration::from_millis(500));
let writer_buf = [1, 2, 3, 4, 5];
inner_stream.write_all(&writer_buf).unwrap();
inner_stream.write_all(&[1, 2, 3, 4, 5]).unwrap();
handler.join().unwrap();
}

View File

@@ -120,7 +120,7 @@ mod tests {
use libc::{cpu_set_t, syscall};
use std::convert::TryInto;
use std::{mem, process, thread};
use std::{mem, thread};
use seccompiler::{apply_filter, BpfProgram, SeccompAction, SeccompFilter};
@@ -157,6 +157,16 @@ mod tests {
let child = thread::spawn(move || {
assert!(register_signal_handlers().is_ok());
// Trigger SIGBUS/SIGSEGV *before* installing the seccomp filter.
// Call SIGBUS signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigbus.count(), 0);
unsafe { libc::raise(SIGBUS) };
// Call SIGSEGV signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigsegv.count(), 0);
unsafe { libc::raise(SIGSEGV) };
// Install a seccomp filter that traps a known syscall so that we can verify SIGSYS handling.
let filter = SeccompFilter::new(
vec![(libc::SYS_mkdirat, vec![])].into_iter().collect(),
SeccompAction::Allow,
@@ -168,20 +178,8 @@ mod tests {
assert!(apply_filter(&TryInto::<BpfProgram>::try_into(filter).unwrap()).is_ok());
assert_eq!(METRICS.read().unwrap().seccomp.num_faults.count(), 0);
// Call the blacklisted `SYS_mkdirat`.
// Invoke the blacklisted syscall to trigger SIGSYS and exercise the SIGSYS handler.
unsafe { syscall(libc::SYS_mkdirat, "/foo/bar\0") };
// Call SIGBUS signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigbus.count(), 0);
unsafe {
syscall(libc::SYS_kill, process::id(), SIGBUS);
}
// Call SIGSEGV signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigsegv.count(), 0);
unsafe {
syscall(libc::SYS_kill, process::id(), SIGSEGV);
}
});
assert!(child.join().is_ok());

View File

@@ -470,7 +470,10 @@ impl CloudHypervisorInner {
net_config.id = None;
net_config.num_queues = network_queues_pairs * 2;
info!(sl!(), "network device queue pairs {:?}", network_queues_pairs);
info!(
sl!(),
"network device queue pairs {:?}", network_queues_pairs
);
// we need ensure opening network device happens in netns.
let netns = self.netns.clone().unwrap_or_default();

View File

@@ -9,8 +9,8 @@ use crate::device::topology::PCIePort;
use crate::qemu::qmp::get_qmp_socket_path;
use crate::{
device::driver::ProtectionDeviceConfig, hypervisor_persist::HypervisorState, selinux,
HypervisorConfig, MemoryConfig, VcpuThreadIds, VsockDevice, HYPERVISOR_QEMU,
KATA_BLK_DEV_TYPE, KATA_CCW_DEV_TYPE, KATA_NVDIMM_DEV_TYPE, KATA_SCSI_DEV_TYPE,
HypervisorConfig, MemoryConfig, VcpuThreadIds, VsockDevice, HYPERVISOR_QEMU, KATA_BLK_DEV_TYPE,
KATA_CCW_DEV_TYPE, KATA_NVDIMM_DEV_TYPE, KATA_SCSI_DEV_TYPE,
};
use crate::utils::{
@@ -138,15 +138,16 @@ impl QemuInner {
&block_dev.config.path_on_host,
block_dev.config.is_readonly,
)?,
KATA_CCW_DEV_TYPE | KATA_BLK_DEV_TYPE | KATA_SCSI_DEV_TYPE => cmdline.add_block_device(
block_dev.device_id.as_str(),
&block_dev.config.path_on_host,
block_dev
.config
.is_direct
.unwrap_or(self.config.blockdev_info.block_device_cache_direct),
block_dev.config.driver_option.as_str() == KATA_SCSI_DEV_TYPE,
)?,
KATA_CCW_DEV_TYPE | KATA_BLK_DEV_TYPE | KATA_SCSI_DEV_TYPE => cmdline
.add_block_device(
block_dev.device_id.as_str(),
&block_dev.config.path_on_host,
block_dev
.config
.is_direct
.unwrap_or(self.config.blockdev_info.block_device_cache_direct),
block_dev.config.driver_option.as_str() == KATA_SCSI_DEV_TYPE,
)?,
unsupported => {
info!(sl!(), "unsupported block device driver: {}", unsupported)
}

View File

@@ -187,11 +187,21 @@ impl Qmp {
continue;
}
(None, _) => {
warn!(sl!(), "hotpluggable vcpu {} has no socket_id for driver {}, skipping", core_id, driver);
warn!(
sl!(),
"hotpluggable vcpu {} has no socket_id for driver {}, skipping",
core_id,
driver
);
continue;
}
(_, None) => {
warn!(sl!(), "hotpluggable vcpu {} has no thread_id for driver {}, skipping", core_id, driver);
warn!(
sl!(),
"hotpluggable vcpu {} has no thread_id for driver {}, skipping",
core_id,
driver
);
continue;
}
}
@@ -753,10 +763,9 @@ impl Qmp {
Ok((None, Some(scsi_addr)))
} else if block_driver == VIRTIO_BLK_CCW {
let subchannel = self
.ccw_subchannel
.as_mut()
.ok_or_else(|| anyhow!("CCW subchannel not available for virtio-blk-ccw hotplug"))?;
let subchannel = self.ccw_subchannel.as_mut().ok_or_else(|| {
anyhow!("CCW subchannel not available for virtio-blk-ccw hotplug")
})?;
let slot = subchannel
.add_device(&node_name)

View File

@@ -61,23 +61,9 @@ function install_dependencies() {
"install_${dep[0]}" "${dep[1]}"
done
# Clone containerd as we'll need to build it in order to run the tests.
# TODO: revert to upstream once https://github.com/containerd/containerd/pull/XXXXX
# (fix for getRuncOptions() failing for non-runc runtimes like Kata) is merged and
# released.
local containerd_fork="fidencio/containerd"
local containerd_branch="topic/fix-runc-options-type-mismatch-for-non-runc-runtimes"
info "Cloning containerd from fork ${containerd_fork}@${containerd_branch} (temporary, pending upstream fix)"
rm -rf containerd
git clone -b "${containerd_branch}" "https://github.com/${containerd_fork}"
# `make cri-integration` uses the cloned tree's `bin/containerd`, but later
# Kata-specific tests restart the systemd service and thus use
# `/usr/local/bin/containerd`. Install the same patched daemon there so both
# phases exercise the same containerd build.
info "Building and installing the patched containerd daemon for systemd restarts"
make -C containerd bin/containerd
sudo install -m 0755 containerd/bin/containerd /usr/local/bin/containerd
# Clone containerd as we'll need to build it in order to run the tests
# base_version: The version to be intalled in the ${major}.${minor} format
clone_cri_containerd $(get_from_kata_deps ".externals.containerd.${CONTAINERD_VERSION}")
}
function run() {

View File

@@ -162,13 +162,6 @@ function err_report() {
function check_daemon_setup() {
info "containerd(cri): Check daemon works with runc"
# Use podsandbox for the runc sanity check: the shim sandboxer has a known
# containerd-side bug where the OCI spec is not populated before NewBundle is
# called, so config.json is never written and containerd-shim-runc-v2 fails.
# See https://github.com/containerd/containerd/issues/11640
# This check only verifies that containerd + runc are functional before the
# real kata tests run, so the sandboxer choice doesn't matter here.
local SANDBOXER="podsandbox"
create_containerd_config "runc"
# containerd cri-integration will modify the passed in config file. Let's
@@ -666,13 +659,7 @@ function main() {
info "containerd(cri): Running cri-integration"
# TestContainerRestart is excluded: creating a new container in the same
# sandbox VM after the previous container has exited and been removed has
# never been supported by kata-containers (neither with the go-based nor
# the rust-based runtime). The kata VM shuts down when its last container
# is removed, so any attempt to start a new container in the same sandbox
# fails. This test exercises a use-case kata does not currently support.
passing_test="TestContainerStats|TestContainerListStatsWithIdFilter|TestContainerListStatsWithIdSandboxIdFilter|TestDuplicateName|TestImageLoad|TestImageFSInfo|TestSandboxCleanRemove"
passing_test="TestContainerStats|TestContainerRestart|TestContainerListStatsWithIdFilter|TestContainerListStatsWithIdSandboxIdFilter|TestDuplicateName|TestImageLoad|TestImageFSInfo|TestSandboxCleanRemove"
if [[ "${KATA_HYPERVISOR}" == "cloud-hypervisor" || \
"${KATA_HYPERVISOR}" == "qemu" ]]; then