Compare commits

28 Commits

Author SHA1 Message Date
Fupan Li
c376ab5de7 runtime-rs: fix the issue of hot-unplug memory smaller
Hot-unplugging memory to a size smaller than the statically
plugged memory size should do nothing instead of returning
an error.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-10 14:17:44 +08:00
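
The behaviour described in this commit can be sketched in shell as a simple clamp. This is only an illustration under stated assumptions: `resize_target_mb` is a hypothetical helper, not part of runtime-rs, and the real code additionally deals with guest memory block sizes, which are ignored here.

```bash
# Hypothetical helper: print the memory size (in MiB) the sandbox should end
# up with. A request below the cold-plugged size is a no-op (warning only),
# not an error.
resize_target_mb() {
    local coldplugged_mb="$1" requested_mb="$2"
    if [ "${requested_mb}" -lt "${coldplugged_mb}" ]; then
        echo "WARN: asked to resize to ${requested_mb} M but that is less than cold-plugged memory size (${coldplugged_mb}), nothing to do" >&2
        echo "${coldplugged_mb}"
        return 0
    fi
    echo "${requested_mb}"
}
```

For instance, `resize_target_mb 2048 1024` keeps the sandbox at the cold-plugged 2048 MiB, while `resize_target_mb 2048 4096` accepts the requested size.
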
Fupan Li
200abc815c tests: fix the issue of missing teardown pods
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-10 10:30:17 +08:00
Fupan Li
333d12dc47 runtime: fix the issue of update interface error
Since the network device hotplug is an asynchronous operation,
it's possible that the hotplug operation has returned but
the network device isn't ready in the guest yet, so it's better to
retry this operation until the device is ready in the guest.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-07 11:29:57 +08:00
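
The retry policy this commit describes (retry only while the guest reports "Link not found", give up immediately on any other error) can be sketched in shell. Everything below is an assumption made for illustration: `update_interface` is a stand-in that simulates a device appearing on the third call, and `retry_on_link_not_found` is a hypothetical helper. The actual change uses the Go `retry` package with 20 attempts and a 20 ms delay.

```bash
# Stand-in for the agent request (hypothetical): fails with "Link not found"
# until the simulated device shows up on the third call.
ATTEMPT_FILE=$(mktemp)
echo 0 > "${ATTEMPT_FILE}"

update_interface() {
    local n
    n=$(( $(cat "${ATTEMPT_FILE}") + 1 ))
    echo "${n}" > "${ATTEMPT_FILE}"
    if [ "${n}" -lt 3 ]; then
        echo "Link not found" >&2
        return 1
    fi
    echo "interface updated after ${n} attempts"
}

# Run a command up to max_attempts times, sleeping `delay` seconds between
# tries. Only errors containing "Link not found" are retried; any other
# error is treated as unrecoverable and returned right away.
retry_on_link_not_found() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1 out rc err_file
    err_file=$(mktemp)
    while :; do
        out=$("$@" 2>"${err_file}") && rc=0 || rc=$?
        if [ "${rc}" -eq 0 ]; then
            rm -f "${err_file}"
            echo "${out}"
            return 0
        fi
        # Give up on unrelated errors or once the attempt budget is spent.
        if ! grep -q "Link not found" "${err_file}" || [ "${attempt}" -ge "${max_attempts}" ]; then
            cat "${err_file}" >&2
            rm -f "${err_file}"
            return "${rc}"
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}

# Fractional sleep values assume a sleep(1) that accepts them (e.g. GNU).
result=$(retry_on_link_not_found 20 0.02 update_interface)
echo "${result}"
```

Bounding both the number of attempts and the delay keeps the worst-case wait small while still covering the window in which the device is not yet visible in the guest.
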
Fabiano Fidêncio
a591cda466 gatekeeper: Adjust the nvidia gpu test name
With the change made to the matrix when the CC GPU runner was added,
there was a change in the job name (@sprt saw that coming, but I
didn't).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-06 16:28:33 +01:00
Manuel Huber
c6dc176a03 tests: nvidia: cc: Enable NIMs tests
Same deal as the previous commit, just enabling the tests here, with the
same list of improvements that we will need to go through in order to
get it working in a perfect way.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-06 16:28:33 +01:00
Manuel Huber
8ca77f2655 tests: nvidia: cc: Run CUDA vectorAdd tests on CC mode
While the primary goal of this change is to detect regressions to the
NVIDIA SNP GPU scenario, various improvements to reflect a more
realistic CC setting are planned in subsequent changes, such as:

* moving away from the overlayfs snapshotter
* disabling filesystem sharing
* applying a pod security policy
* activating the GPUs only after attestation
* using a refined approach for GPU cold-plugging without requiring
  annotations
* revisiting pod timeout and overhead parameters (the podOverhead value
  was increased due to CUDA vectorAdd requiring about 6Gi of
  podOverhead, as well as the inference and embedqa requiring at least
  12Gi, respectively, 14Gi of podOverhead to run without invoking the
  host's oom-killer. We will revisit this aspect after addressing the
  first two points above.)

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-06 16:28:33 +01:00
Manuel Huber
25ce0afd52 kata-deploy: Allow the CDI annotation for CC GPU cases
For nvidia-gpu-snp and nvidia-gpu-tdx we must configure containerd to
allow the CDI annotation to be passed down.

This solution may become obsolete soon enough, but the cleanest way to
have it properly working is by adding it here (even if we remove it
before the next release).

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-06 16:28:33 +01:00
Manuel Huber
c91edf884b runtimeclasses: nvidia: Bump TEE podOverhead
It's been noticed that as more RAM is needed to run the CC tests, we
also need to update the podOverhead of the NVIDIA CC runtime classes to
avoid getting OOM Killed.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-06 16:28:33 +01:00
Fupan Li
aac2a37ff5 runtime-rs: enable pselect6 syscall for dragonball seccomp
Since nerdctl's network hook calls the pselect6 syscall
via xtables-nft-multi, add it to the seccomp
whitelist.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-06 11:17:57 +01:00
Hyounggyu Choi
ff429072b6 Merge pull request #11924 from BbolroC/fix-static-checks-actionspz
ci: Fix failing static checks to enable IBM actionspz - Z specific
2025-11-06 09:04:04 +01:00
Zvonko Kaiser
fce6a75899 Merge pull request #12027 from fidencio/topic/kata-deploy-make-ALLOWED_HYPERVISOR_ANNOTATIONS-per-arch
kata-deploy: Add per arch ALLOWED_HYPERVISOR_ANNOTATIONS
2025-11-05 18:20:14 -05:00
Manuel Huber
d8953f67c5 ci: Onboard another NVIDIA machine
Let's add a new NVIDIA machine, which later on will be used for CC
related tests.

For now the current tests are skipped in the CC capable machine.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 23:23:08 +01:00
Fabiano Fidêncio
b2ee64a2d6 kata-deploy: scripts: Ensure we don't add duplicated values
Let's now make sure that we don't add duplicated values to any of our
entries, making the script as sane as possible for sequential runs.

Vibed with Cursor's help!

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 19:48:24 +01:00
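
To illustrate what "no duplicated values on sequential runs" means in practice, here is a minimal sketch of an idempotent append to a space-separated, double-quoted config field. `append_param_once` is a hypothetical name, not a function from the kata-deploy script. The sketch assumes GNU-style `sed -i` and a value free of `|`, `&` and backslashes.

```bash
# Hypothetical helper: append `value` to a field of the form
#   field = "param1 param2"
# only when it is not already present as a complete parameter.
append_param_once() {
    local config_file="$1" field="$2" value="$3"
    # Already there (delimited by whitespace or quotes)? Then do nothing.
    if grep -qE "^${field}[^=]*=.*[[:space:]\"]${value}([[:space:]\"]|\$)" "${config_file}"; then
        return 0
    fi
    # Otherwise insert the value just before the closing quote.
    sed -i -e "s|^\(${field} = \".*\)\"|\1 ${value}\"|" "${config_file}"
}
```

Calling `append_param_once "${config}" kernel_params agent.log=debug` twice leaves a single `agent.log=debug` entry in the file, which is the property the commit is after.
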
Fabiano Fidêncio
78ae79d153 kata-deploy: scripts: Add helper functions to avoid duplicated items
Let's add some helper functions, not yet used, to avoid adding
duplicated items.

This idea is an expansion of Choi's idea to avoid setting duplicated
items, and it'll help make the whole script idempotent on
sequential runs.

Vibed with Cursor's help!

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 19:48:24 +01:00
Fabiano Fidêncio
f773368d93 kata-deploy: Add per arch ALLOWED_HYPERVISOR_ANNOTATIONS
I know, this is not simplifying things much for now, but it has a good
intent in the background and will serve as a base for making the
kata-deploy helm chart more user friendly.

With that said, let's add ALLOWED_HYPERVISOR_ANNOTATIONS per arch, while
adding support to set something like "qemu:foo,bar clh:bar foobar
barfoo". Why? Because in the future we'll have a better way to set this
per shim (and the shim is per arch ...).

More details of what we'll do in the future are being discussed here:
https://github.com/kata-containers/kata-containers/issues/12024

Anyways, the variables are **DELIBERATELY** not exposed to the chart for
now, as those will be later on when addressing the issue mentioned
above.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 19:45:34 +01:00
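
One possible reading of the `"qemu:foo,bar clh:bar foobar barfoo"` format is sketched below. It assumes that a `shim:` prefix scopes a comma-separated list to that shim and that bare entries apply to every shim. That interpretation, and the `annotations_for_shim` helper itself, are illustrative assumptions and are not taken from the script.

```bash
# Hypothetical helper: print the comma-separated annotations that apply to
# `shim`, given a whitespace-separated spec of "shim:a,b" and bare entries.
annotations_for_shim() {
    local shim="$1" spec="$2" result="" entry values
    # Word splitting on ${spec} is intentional; entries must not contain globs.
    for entry in ${spec}; do
        case "${entry}" in
            "${shim}":*) values="${entry#*:}" ;;  # scoped to this shim
            *:*) continue ;;                      # scoped to another shim
            *) values="${entry}" ;;               # bare entry: applies to all
        esac
        result="${result:+${result},}${values}"
    done
    echo "${result}"
}

spec="qemu:foo,bar clh:bar foobar barfoo"
annotations_for_shim qemu "${spec}"   # foo,bar,foobar,barfoo
annotations_for_shim clh "${spec}"    # bar,foobar,barfoo
```
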
Fabiano Fidêncio
66e133e096 kata-deploy: Add missing runtimeClasses
When the runtimeClasses were added, as part of 7cfa826804, the
firecracker runtimeClass ended up missing from the dictionary.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 19:07:28 +01:00
Anton Ippolitov
23c46b8a00 docs: Update devmapper containerd plugin name
The Firecracker installation docs had an outdated containerd configuration for the devmapper plugin.
This commit updates the instructions so that they are compatible with more recent versions of containerd.

Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
2025-11-05 18:42:29 +01:00
Fabiano Fidêncio
ace9cf942d tests: guest-pull: Fix names
When adding these entries, I mistakenly used the wrong test-type name, which is now
fixed and should be enough to trigger the tests correctly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 18:21:48 +01:00
Hyounggyu Choi
4ee2037974 GHA: Run runtime tests on self-hosted runners for P/Z
On IBM actionspz P/Z runners, the following error was observed during
runtime tests:

```
host system doesn't support vsock: stat /dev/vhost-vsock: no such file or directory
```

Since loading the vsock module on the fly is not permitted, this commit
moves the runtime tests back to self-hosted runners for P/Z.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-05 16:35:04 +00:00
Hyounggyu Choi
32da38273a agent/tests: Skip if kernel module is not found
On IBM actionspz Z runners, the following error occurs when running
`modprobe`:

```
modprobe: FATAL: Module bridge not found in directory /lib/modules/6.8.0-85-generic
```

Additionally, there are no files under `/lib/modules`, for example:

```
total 0
drwxr-xr-x 1 root root    0 Aug  5 13:09 .
drwxr-xr-x 1 root root 2.0K Oct  1 22:59 ..
```

This commit skips the `test_load_kernel_module` test if the module is
not found or if running `modprobe` is not permitted.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-05 16:35:04 +00:00
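
The skip decision described here amounts to classifying the error text. A rough shell sketch follows; `classify_module_error` is a hypothetical name, and the matched substrings are the ones this changeset checks for in the Rust test.

```bash
# Hypothetical helper: decide whether a module-loading error should skip the
# test (restricted environment, missing module) or fail it.
classify_module_error() {
    case "$1" in
        *"Operation not permitted"*|*EPERM*|*"Permission denied"*)
            echo "skip: loading kernel modules is not permitted" ;;
        *"not found"*)
            echo "skip: kernel module is not found" ;;
        *)
            echo "fail" ;;
    esac
}

classify_module_error "modprobe: FATAL: Module bridge not found in directory /lib/modules/6.8.0-85-generic"
```

Matching on message text is fragile, so treating any unrecognised error as a failure keeps the test strict everywhere except the known restricted environments.
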
Hyounggyu Choi
075de4dc62 agent/tests: Skip test if error is EACCES (permission denied)
On IBM actionspz Z runners, write operations on network interfaces
are not allowed, even for the root user.
This commit skips the `add_update_addresses` test if the operation
fails with EACCES (-13, permission denied).

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-05 16:35:04 +00:00
Hyounggyu Choi
3f84b623a3 agent/tests: Skip RNG reseeding test on restricted environments
On IBM actionspz Z runners, the ioctl system call is not allowed even
for the root user. There is likely an additional security mechanism
(such as AppArmor or seccomp) in place on Ubuntu runners.
This commit introduces a new helper, `is_permission_error()`,
which skips the test if ioctl operations in `reseed_rng()` are not
permitted.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-05 16:35:04 +00:00
Hyounggyu Choi
c2abc4da34 agent/tests: Use detected filesystem for baremounted points
The IBM actionspz Z runners mount /dev as tmpfs, while other systems
use devtmpfs. This difference causes an assertion failure for
test_already_baremounted.
This commit sets the detected filesystem for bare-mounted points
as the expected value.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-05 16:35:04 +00:00
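
Detecting the filesystem type of a mount point, with a fallback when it cannot be determined, could look like this in shell. `mount_fs_type` is a hypothetical counterpart to the `get_mount_fs_type` helper used in the Rust test. It assumes the usual `/proc/mounts` column layout (device, mount point, type, ...).

```bash
# Hypothetical helper: print the filesystem type mounted at `mount_point`,
# or `fallback` if the mount point is not listed. The last matching entry
# wins, since later mounts shadow earlier ones.
mount_fs_type() {
    local mount_point="$1" fallback="$2" mounts_file="${3:-/proc/mounts}" fs_type
    fs_type=$(awk -v mp="${mount_point}" '$2 == mp { t = $3 } END { print t }' "${mounts_file}")
    echo "${fs_type:-${fallback}}"
}
```

With this, `mount_fs_type /dev devtmpfs` would report `tmpfs` on a runner that mounts `/dev` as tmpfs and fall back to `devtmpfs` only when `/dev` is not listed at all.
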
Hyounggyu Choi
faa048893d agent/tests: Handle error messages differently based on root filesystem
The root filesystem for IBM actionspz Z runners is `btrfs` instead of `ext4`.
The error message differs when an unprivileged user tries to perform a bind mount.
This commit adjusts the handling of error messages based on the detected root
filesystem type.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-05 16:35:04 +00:00
Fupan Li
0df6c795d8 runtime-rs: disable the default static resource management
Since qemu and cloud-hypervisor now support CPU and memory
hotplug, disable static resource management
for qemu and cloud-hypervisor by default.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-05 16:59:13 +01:00
Fupan Li
02ecab40e4 tests: disable the cpu hotplug test for coco dev runtime
Since qemu-coco-dev-runtime-rs and qemu-coco-dev disable
CPU and memory hotplug by enabling static_sandbox_resource_mgmt,
we should disable the CPU hotplug test for those two runtimes.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-05 16:59:13 +01:00
Fupan Li
1fc05491a2 tests: enable the cpu hotplug test for dragonball etc
Since qemu, cloud-hypervisor and dragonball now support
CPU hotplug on runtime-rs, enable the CPU hotplug test in CI.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-05 16:59:13 +01:00
Fabiano Fidêncio
0a0de4e6e3 Revert "tests: Do not enable NFD on s390x"
This reverts commit c75a46d17f, as NFD now
publishes an s390x image (and also a ppc64le one).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 16:06:33 +01:00
27 changed files with 390 additions and 78 deletions

View File

@@ -8,6 +8,7 @@ self-hosted-runner:
# Labels of self-hosted runner that linter should ignore
labels:
- amd64-nvidia-a100
- amd64-nvidia-h100-snp
- arm64-k8s
- containerd-v1.7-overlayfs
- containerd-v2.0-overlayfs

View File

@@ -12,7 +12,7 @@ name: Build checks
jobs:
check:
name: check
runs-on: ${{ inputs.instance }}
runs-on: ${{ matrix.component.name == 'runtime' && inputs.instance == 'ubuntu-24.04-s390x' && 's390x' || matrix.component.name == 'runtime' && inputs.instance == 'ubuntu-24.04-ppc64le' && 'ppc64le' || inputs.instance }}
strategy:
fail-fast: false
matrix:

View File

@@ -14,10 +14,10 @@ jobs:
matrix:
environment: [
{ test-type: multi-snapshotter, containerd: v2.2 },
{ test-type: multi-experimental-force-guest-pull, containerd: v1.7 },
{ test-type: multi-experimental-force-guest-pull, containerd: v2.0 },
{ test-type: multi-experimental-force-guest-pull, containerd: v2.1 },
{ test-type: multi-experimental-force-guest-pull, containerd: v2.2 },
{ test-type: force-guest-pull, containerd: v1.7 },
{ test-type: force-guest-pull, containerd: v2.0 },
{ test-type: force-guest-pull, containerd: v2.1 },
{ test-type: force-guest-pull, containerd: v2.2 },
]
env:
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here.
@@ -41,7 +41,7 @@ jobs:
KUBERNETES: vanilla
SNAPSHOTTER: ${{ matrix.environment.test-type == 'multi-snapshotter' && 'nydus' || '' }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ matrix.environment.test-type == 'multi-snapshotter' }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.environment.test-type == 'experimental-force-guest-pull' && 'qemu-coco-dev' || '' }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.environment.test-type == 'force-guest-pull' && 'qemu-coco-dev' || '' }}
# This is needed as we may hit the createContainerTimeout
- name: Adjust Kata Containers' create_container_timeout

View File

@@ -29,22 +29,22 @@ permissions: {}
jobs:
run-nvidia-gpu-tests-on-amd64:
name: run-nvidia-gpu-tests-on-amd64
name: run-${{ matrix.environment.name }}-tests-on-amd64
strategy:
fail-fast: false
matrix:
vmm:
- qemu-nvidia-gpu
k8s:
- kubeadm
runs-on: amd64-nvidia-a100
environment: [
{ name: nvidia-gpu, vmm: qemu-nvidia-gpu, runner: amd64-nvidia-a100 },
{ name: nvidia-gpu-snp, vmm: qemu-nvidia-gpu-snp, runner: amd64-nvidia-h100-snp },
]
runs-on: ${{ matrix.environment.runner }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
KUBERNETES: kubeadm
K8S_TEST_HOST_TYPE: all
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -66,20 +66,20 @@ jobs:
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Run tests
- name: Run tests ${{ matrix.environment.vmm }}
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-nv-tests
env:
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
- name: Collect artifacts ${{ matrix.vmm }}
- name: Collect artifacts ${{ matrix.environment.vmm }}
if: always()
run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts
continue-on-error: true
- name: Archive artifacts ${{ matrix.vmm }}
- name: Archive artifacts ${{ matrix.environment.vmm }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: k8s-tests-${{ matrix.vmm }}-${{ matrix.k8s }}-${{ inputs.tag }}
name: k8s-tests-${{ matrix.environment.vmm }}-kubeadm-${{ inputs.tag }}
path: /tmp/artifacts
retention-days: 1

View File

@@ -104,12 +104,20 @@ LOW_WATER_MARK=32768
sudo dmsetup create "${POOL_NAME}" \
--table "0 ${LENGTH_IN_SECTORS} thin-pool ${META_DEV} ${DATA_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
# Determine plugin name based on containerd config version
CONFIG_VERSION=$(containerd config dump | awk '/^version/ {print $3}')
if [ "$CONFIG_VERSION" -ge 2 ]; then
PLUGIN="io.containerd.snapshotter.v1.devmapper"
else
PLUGIN="devmapper"
fi
cat << EOF
#
# Add this to your config.toml configuration file and restart containerd daemon
#
[plugins]
[plugins.devmapper]
[plugins."${PLUGIN}"]
pool_name = "${POOL_NAME}"
root_path = "${DATA_DIR}"
base_image_size = "10GB"

View File

@@ -336,11 +336,17 @@ mod tests {
let plain = slog_term::PlainSyncDecorator::new(std::io::stdout());
let logger = Logger::root(slog_term::FullFormat::new(plain).build().fuse(), o!());
// Detect actual filesystem types mounted in this environment
// Z runners mount /dev as tmpfs, while normal systems use devtmpfs
let dev_fs_type = get_mount_fs_type("/dev").unwrap_or_else(|_| String::from("devtmpfs"));
let proc_fs_type = get_mount_fs_type("/proc").unwrap_or_else(|_| String::from("proc"));
let sys_fs_type = get_mount_fs_type("/sys").unwrap_or_else(|_| String::from("sysfs"));
let test_cases = [
("dev", "/dev", "devtmpfs"),
("udev", "/dev", "devtmpfs"),
("proc", "/proc", "proc"),
("sysfs", "/sys", "sysfs"),
("dev", "/dev", dev_fs_type.as_str()),
("udev", "/dev", dev_fs_type.as_str()),
("proc", "/proc", proc_fs_type.as_str()),
("sysfs", "/sys", sys_fs_type.as_str()),
];
for &(source, destination, fs_type) in &test_cases {
@@ -381,6 +387,22 @@ mod tests {
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
// Detect filesystem type of root directory
let tmp_fs_type = get_mount_fs_type("/").unwrap_or_else(|_| String::from("unknown"));
// Error messages that vary based on filesystem type
const DEFAULT_ERROR_EPERM: &str = "Operation not permitted";
const BTRFS_ERROR_ENODEV: &str = "No such device";
// Helper to select error message based on filesystem type (e.g. btrfs for s390x runners)
let get_error_msg = |default: &'static str, btrfs_specific: &'static str| -> &'static str {
if tmp_fs_type == "btrfs" && !btrfs_specific.is_empty() {
btrfs_specific
} else {
default
}
};
let tests = &[
TestData {
test_user: TestUserType::Any,
@@ -416,7 +438,7 @@ mod tests {
fs_type: "bind",
flags: MsFlags::empty(),
options: "bind",
error_contains: "Operation not permitted",
error_contains: get_error_msg(DEFAULT_ERROR_EPERM, BTRFS_ERROR_ENODEV),
},
TestData {
test_user: TestUserType::NonRootOnly,
@@ -496,7 +518,14 @@ mod tests {
let err = result.unwrap_err();
let error_msg = format!("{}", err);
assert!(error_msg.contains(d.error_contains), "{}", msg);
assert!(
error_msg.contains(d.error_contains),
"{}: expected error containing '{}', got '{}'",
msg,
d.error_contains,
error_msg
);
}
}

View File

@@ -922,6 +922,18 @@ mod tests {
const TEST_DUMMY_INTERFACE: &str = "dummy_for_arp";
const TEST_ARP_IP: &str = "192.0.2.127";
/// Helper function to check if the result is a netlink EACCES error
fn is_netlink_permission_error<T>(result: &Result<T>) -> bool {
if let Err(e) = result {
let error_string = format!("{:?}", e);
if error_string.contains("code: Some(-13)") {
println!("INFO: skipping test - netlink operations are restricted in this environment (EACCES)");
return true;
}
}
false
}
#[tokio::test]
async fn find_link_by_name() {
let message = Handle::new()
@@ -1045,10 +1057,14 @@ mod tests {
let lo = handle.find_link(LinkFilter::Name("lo")).await.unwrap();
for network in list {
handle
.add_addresses(lo.index(), iter::once(network))
.await
.expect("Failed to add IP");
let result = handle.add_addresses(lo.index(), iter::once(network)).await;
// Skip test if netlink operations are restricted (EACCES = -13)
if is_netlink_permission_error(&result) {
return;
}
result.expect("Failed to add IP");
// Make sure the address is there
let result = handle
@@ -1063,10 +1079,14 @@ mod tests {
assert!(result.is_some());
// Update it
handle
.add_addresses(lo.index(), iter::once(network))
.await
.expect("Failed to delete address");
let result = handle.add_addresses(lo.index(), iter::once(network)).await;
// Skip test if netlink operations are restricted (EACCES = -13)
if is_netlink_permission_error(&result) {
return;
}
result.expect("Failed to delete address");
}
}

View File

@@ -59,10 +59,26 @@ pub fn reseed_rng(data: &[u8]) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use nix::errno::Errno;
use std::fs::File;
use std::io::prelude::*;
use test_utils::skip_if_not_root;
/// Helper function to check if the result is an EPERM error
fn is_permission_error(result: &Result<()>) -> bool {
if let Err(e) = result {
if let Some(errno) = e.downcast_ref::<Errno>() {
if *errno == Errno::EPERM {
println!(
"EPERM: skipping test - reseeding RNG is not permitted in this environment"
);
return true;
}
}
}
false
}
#[test]
fn test_reseed_rng() {
skip_if_not_root!();
@@ -73,6 +89,9 @@ mod tests {
// Ensure the buffer was filled.
assert!(n == POOL_SIZE);
let ret = reseed_rng(&seed);
if is_permission_error(&ret) {
return;
}
assert!(ret.is_ok());
}
@@ -85,6 +104,9 @@ mod tests {
// Ensure the buffer was filled.
assert!(n == POOL_SIZE);
let ret = reseed_rng(&seed);
if is_permission_error(&ret) {
return;
}
if nix::unistd::Uid::effective().is_root() {
assert!(ret.is_ok());
} else {

View File

@@ -2481,6 +2481,26 @@ mod tests {
// normally this module should exist...
m.name = "bridge".to_string();
let result = load_kernel_module(&m);
// Skip test if loading kernel modules is not permitted
// or kernel module is not found
if let Err(e) = &result {
let error_string = format!("{:?}", e);
// Let's print out the error message first
println!("DEBUG: error: {}", error_string);
if error_string.contains("Operation not permitted")
|| error_string.contains("EPERM")
|| error_string.contains("Permission denied")
{
println!("INFO: skipping test - loading kernel modules is not permitted in this environment");
return;
}
if error_string.contains("not found") {
println!("INFO: skipping test - kernel module is not found in this environment");
return;
}
}
assert!(result.is_ok(), "loading the module should succeed");
}

View File

@@ -219,5 +219,6 @@ pub fn get_process_seccomp_rules() -> Vec<(i64, Vec<seccompiler::SeccompRule>)>
(libc::SYS_chmod, vec![]),
#[cfg(target_arch = "x86_64")]
(libc::SYS_fchmodat2, vec![]),
(libc::SYS_pselect6, vec![]),
]
}

View File

@@ -603,15 +603,15 @@ impl QemuInner {
}
};
let coldplugged_mem = megs_to_bytes(self.config.memory_info.default_memory);
let coldplugged_mem_mb = self.config.memory_info.default_memory;
let coldplugged_mem = megs_to_bytes(coldplugged_mem_mb);
let new_total_mem = megs_to_bytes(new_total_mem_mb);
if new_total_mem < coldplugged_mem {
return Err(anyhow!(
"asked to resize to {} M but that is less than cold-plugged memory size ({})",
new_total_mem_mb,
bytes_to_megs(coldplugged_mem)
));
warn!(
sl!(),
"asked to resize to {} M but that is less than cold-plugged memory size ({}), nothing to do",
new_total_mem_mb,
bytes_to_megs(coldplugged_mem)
);
return Ok((coldplugged_mem_mb, MemoryConfig::default()));
}
let guest_mem_block_size = qmp.guest_memory_block_size();

View File

@@ -473,6 +473,8 @@ ifneq (,$(QEMUCMD))
KERNELTDXPARAMS_NV += "authorize_allow_devs=pci:ALL"
KERNELSNPPARAMS_NV = $(KERNELPARAMS_NV)
#TODO: temporary until the attestation agent activates the device after successful attestation
KERNELSNPPARAMS_NV += "nvrc.smi.srs=1"
# Setting this to false can lead to cgroup leakages in the host
# Best practice for production is to set this to true

View File

@@ -36,6 +36,7 @@ import (
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/rootless"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils/retry"
ctrAnnotations "github.com/containerd/containerd/pkg/cri/annotations"
crioAnnotations "github.com/cri-o/cri-o/pkg/annotations"
@@ -597,7 +598,31 @@ func (k *kataAgent) updateInterface(ctx context.Context, ifc *pbTypes.Interface)
ifcReq := &grpc.UpdateInterfaceRequest{
Interface: ifc,
}
resultingInterface, err := k.sendReq(ctx, ifcReq)
// Since the network device hotplug is an asynchronous operation,
// the hotplug operation may have returned before the network device
// is ready in the guest, so retry this operation until the device
// is ready in the guest.
var resultingInterface interface{}
err := retry.Do(func() error {
if resInterface, nerr := k.sendReq(ctx, ifcReq); nerr != nil {
errMsg := nerr.Error()
if !strings.Contains(errMsg, "Link not found") {
return retry.Unrecoverable(nerr)
}
return nerr
} else {
resultingInterface = resInterface
return nil
}
},
retry.Attempts(20),
retry.LastErrorOnly(true),
retry.Delay(20*time.Millisecond))
if err != nil {
k.Logger().WithFields(logrus.Fields{
"interface-requested": fmt.Sprintf("%+v", ifc),

View File

@@ -509,8 +509,7 @@ function helm_helper() {
yq -i ".node-feature-discovery.enabled = true" "${values_yaml}"
# Do not enable on cbl-mariner yet, as the deployment is failing on those
# Do not enable on s390x yet, as the uninstall is failing on those
if [[ "${HELM_HOST_OS}" == "cbl-mariner" ]] || [[ "$(uname -m)" == "s390x" ]]; then
if [[ "${HELM_HOST_OS}" == "cbl-mariner" ]]; then
yq -i ".node-feature-discovery.enabled = false" "${values_yaml}"
fi

View File

@@ -82,6 +82,9 @@ setup() {
[ "$total_cpus_container" -eq "$total_cpus" ] && break
sleep 1
done
info "total_cpus_container = $total_cpus_container"
[ "$total_cpus_container" -eq "$total_cpus" ]
# Check the total of requests
@@ -117,10 +120,8 @@ setup() {
teardown() {
[ "${KATA_HYPERVISOR}" == "firecracker" ] && skip "test not working see: ${fc_limitations}"
[ "${KATA_HYPERVISOR}" == "fc" ] && skip "test not working see: ${fc_limitations}"
[ "${KATA_HYPERVISOR}" == "dragonball" ] && skip "test not working see: ${dragonball_limitations}"
[ "${KATA_HYPERVISOR}" == "qemu-runtime-rs" ] && skip "Requires CPU hotplug which isn't supported on ${KATA_HYPERVISOR} yet"
[ "${KATA_HYPERVISOR}" == "qemu-se-runtime-rs" ] && skip "Requires CPU hotplug which isn't supported on ${KATA_HYPERVISOR} yet"
[ "${KATA_HYPERVISOR}" == "cloud-hypervisor" ] && skip "https://github.com/kata-containers/kata-containers/issues/9039"
[[ "${KATA_HYPERVISOR}" == qemu-coco-dev* ]] && skip "test not working see: ${fc_limitations}"
( [ "${KATA_HYPERVISOR}" == "qemu-tdx" ] || [ "${KATA_HYPERVISOR}" == "qemu-snp" ] || \
[ "${KATA_HYPERVISOR}" == "qemu-se" ] ) \
&& skip "TEEs do not support memory / CPU hotplug"

View File

@@ -15,6 +15,9 @@ export RUNTIME_CLASS_NAME
POD_NAME_CUDA="cuda-vectoradd-kata"
export POD_NAME_CUDA
POD_WAIT_TIMEOUT=${POD_WAIT_TIMEOUT:-300s}
export POD_WAIT_TIMEOUT
setup() {
setup_common
get_pod_config_dir
@@ -33,7 +36,7 @@ setup() {
kubectl apply -f "${pod_yaml}"
# Wait for pod to complete successfully
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded --timeout=300s pod "${pod_name}"
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded --timeout="${POD_WAIT_TIMEOUT}" pod "${pod_name}"
# Get and verify the output contains expected CUDA success message
output=$(kubectl logs "${pod_name}")

View File

@@ -22,6 +22,15 @@ export LOCAL_NIM_CACHE="/opt/nim/.cache"
SKIP_MULTI_GPU_TESTS=${SKIP_MULTI_GPU_TESTS:-false}
export SKIP_MULTI_GPU_TESTS
if [[ "${RUNTIME_CLASS_NAME}" == "kata-qemu-nvidia-gpu-snp" ]]; then
POD_READY_TIMEOUT_INSTRUCT=${POD_READY_TIMEOUT_INSTRUCT:-1000s}
else
POD_READY_TIMEOUT_INSTRUCT=${POD_READY_TIMEOUT_INSTRUCT:-500s}
fi
POD_READY_TIMEOUT_EMBEDQA=${POD_READY_TIMEOUT_EMBEDQA:-500s}
export POD_READY_TIMEOUT_INSTRUCT
export POD_READY_TIMEOUT_EMBEDQA
DOCKER_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"username\":\"\$oauthtoken\",\"password\":\"${NGC_API_KEY}\",\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
@@ -49,7 +58,7 @@ create_inference_pod() {
export POD_INSTRUCT_YAML="${pod_instruct_yaml}"
kubectl apply -f "${POD_INSTRUCT_YAML}"
kubectl wait --for=condition=Ready --timeout=500s pod "${POD_NAME_INSTRUCT}"
kubectl wait --for=condition=Ready --timeout="${POD_READY_TIMEOUT_INSTRUCT}" pod "${POD_NAME_INSTRUCT}"
# shellcheck disable=SC2030 # Variable is shared via file between BATS tests
POD_IP_INSTRUCT=$(kubectl get pod "${POD_NAME_INSTRUCT}" -o jsonpath='{.status.podIP}')
@@ -68,7 +77,7 @@ create_embedqa_pod() {
export POD_EMBEDQA_YAML="${pod_embedqa_yaml}"
kubectl apply -f "${POD_EMBEDQA_YAML}"
kubectl wait --for=condition=Ready --timeout=500s pod "${POD_NAME_EMBEDQA}"
kubectl wait --for=condition=Ready --timeout="${POD_READY_TIMEOUT_EMBEDQA}" pod "${POD_NAME_EMBEDQA}"
# shellcheck disable=SC2030 # Variable is shared via file between BATS tests
POD_IP_EMBEDQA=$(kubectl get pod "${POD_NAME_EMBEDQA}" -o jsonpath='{.status.podIP}')

View File

@@ -13,9 +13,16 @@ source "${kubernetes_dir}/../../common.bash"
# Enable NVRC trace logging for NVIDIA GPU runtime
enable_nvrc_trace() {
if [[ ${RUNTIME_CLASS_NAME:-kata-qemu-nvidia-gpu} == "kata-qemu-nvidia-gpu" ]]; then
local config_file=""
if [[ ${RUNTIME_CLASS_NAME} == "kata-qemu-nvidia-gpu" ]]; then
config_file="/opt/kata/share/defaults/kata-containers/configuration-qemu-nvidia-gpu.toml"
elif [[ ${RUNTIME_CLASS_NAME} == "kata-qemu-nvidia-gpu-snp" ]]; then
config_file="/opt/kata/share/defaults/kata-containers/configuration-qemu-nvidia-gpu-snp.toml"
# Let's temporarily hijack this function to set needed changes for the test
sudo sed -i -e 's/^shared_fs = "none"/shared_fs = "virtio-9p"/' "${config_file}"
fi
if ! grep -q "nvrc.log=trace" "${config_file}"; then
sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 nvrc.log=trace"/g' "${config_file}"
fi
@@ -40,6 +47,12 @@ else
"k8s-nvidia-nim.bats")
fi
# KATA_HYPERVISOR is set in the CI workflow yaml file, and can be set by the user executing CI locally
if [ -n "${KATA_HYPERVISOR:-}" ]; then
export RUNTIME_CLASS_NAME="kata-${KATA_HYPERVISOR}"
info "Set RUNTIME_CLASS_NAME=${RUNTIME_CLASS_NAME} from KATA_HYPERVISOR=${KATA_HYPERVISOR}"
fi
ensure_yq
if [[ "${ENABLE_NVRC_TRACE:-true}" == "true" ]]; then

View File

@@ -9,6 +9,8 @@ metadata:
name: ${POD_NAME_CUDA}
labels:
app: ${POD_NAME_CUDA}
annotations:
cdi.k8s.io/gpu: "nvidia.com/pgpu=0"
spec:
runtimeClassName: ${RUNTIME_CLASS_NAME}
restartPolicy: Never

View File

@@ -17,6 +17,8 @@ metadata:
name: ${POD_NAME_INSTRUCT}
labels:
app: ${POD_NAME_INSTRUCT}
annotations:
cdi.k8s.io/gpu: "nvidia.com/pgpu=0"
spec:
restartPolicy: Never
runtimeClassName: "${RUNTIME_CLASS_NAME}"

View File

@@ -17,6 +17,8 @@ metadata:
name: nvidia-nim-llama-3-2-nv-embedqa-1b-v2
labels:
app: nvidia-nim-llama-3-2-nv-embedqa-1b-v2
annotations:
cdi.k8s.io/gpu: "nvidia.com/pgpu=1"
spec:
restartPolicy: Always
runtimeClassName: "${RUNTIME_CLASS_NAME}"

View File

@@ -13,6 +13,7 @@
"cloud-hypervisor" (dict "memory" "130Mi" "cpu" "250m")
"dragonball" (dict "memory" "130Mi" "cpu" "250m")
"fc" (dict "memory" "130Mi" "cpu" "250m")
"firecracker" (dict "memory" "130Mi" "cpu" "250m")
"qemu" (dict "memory" "160Mi" "cpu" "250m")
"qemu-coco-dev" (dict "memory" "160Mi" "cpu" "250m")
"qemu-runtime-rs" (dict "memory" "160Mi" "cpu" "250m")
@@ -21,8 +22,8 @@
"qemu-snp" (dict "memory" "2048Mi" "cpu" "1.0")
"qemu-tdx" (dict "memory" "2048Mi" "cpu" "1.0")
"qemu-nvidia-gpu" (dict "memory" "4096Mi" "cpu" "1.0")
"qemu-nvidia-gpu-snp" (dict "memory" "4096Mi" "cpu" "1.0")
"qemu-nvidia-gpu-tdx" (dict "memory" "4096Mi" "cpu" "1.0")
"qemu-nvidia-gpu-snp" (dict "memory" "16384Mi" "cpu" "1.0")
"qemu-nvidia-gpu-tdx" (dict "memory" "16384Mi" "cpu" "1.0")
"qemu-cca" (dict "memory" "2048Mi" "cpu" "1.0")
"stratovirt" (dict "memory" "130Mi" "cpu" "250m")
"remote" (dict "memory" "120Mi" "cpu" "250m")

View File

@@ -6,7 +6,7 @@ metadata:
handler: kata-qemu-nvidia-gpu-snp
overhead:
podFixed:
memory: "4096Mi"
memory: "16384Mi"
cpu: "1"
scheduling:
nodeSelector:

View File

@@ -6,7 +6,7 @@ metadata:
handler: kata-qemu-nvidia-gpu-tdx
overhead:
podFixed:
memory: "4096Mi"
memory: "16384Mi"
cpu: "1"
scheduling:
nodeSelector:

View File

@@ -84,7 +84,7 @@ metadata:
handler: kata-qemu-nvidia-gpu-snp
overhead:
podFixed:
memory: "4096Mi"
memory: "16384Mi"
cpu: "1"
scheduling:
nodeSelector:
@@ -97,7 +97,7 @@ metadata:
handler: kata-qemu-nvidia-gpu-tdx
overhead:
podFixed:
memory: "4096Mi"
memory: "16384Mi"
cpu: "1"
scheduling:
nodeSelector:

View File

@@ -33,6 +33,40 @@ info() {
echo "INFO: $msg" >&2
}
# Check if a value exists within a specific field in the config file
# * field_contains_value "${config}" "kernel_params" "agent.log=debug"
field_contains_value() {
local config_file="$1"
local field="$2"
local value="$3"
# Match the value only when it is delimited by whitespace or double quotes,
# so that complete parameters match rather than substrings.
# This handles space-separated values like kernel_params = "param1 param2 param3"
grep -qE "^${field}[^=]*=.*[[:space:]\"](${value})([[:space:]\"]|$)" "${config_file}"
}
# Get existing values from a TOML array field and return them as a comma-separated string
# * get_field_array_values "${config}" "enable_annotations"
get_field_array_values() {
local config_file="$1"
local field="$2"
# Extract values from field = ["val1", "val2", ...] format
grep "^${field} = " "${config_file}" | sed "s/^${field} = \[\(.*\)\]/\1/" | sed 's/"//g' | sed 's/, /,/g'
}
# Check if a boolean config is already set to true
config_is_true() {
local config_file="$1"
local key="$2"
grep -qE "^${key}\s*=\s*true" "${config_file}"
}
# Check if a string value already exists anywhere in the file (literal match)
string_exists_in_file() {
local file_path="$1"
local string="$2"
grep -qF "${string}" "${file_path}"
}
DEBUG="${DEBUG:-"false"}"
ARCH=$(uname -m)
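Note: the helpers added in this hunk can be exercised in isolation. The sketch below copies three of them and runs them against a made-up sample file; the file contents and the echoed labels are assumptions for illustration, not part of the change. It also shows that, with this pattern, a bare key such as agent.log does not match a token of the form agent.log=debug, which matters for callers that pass only a key name.

```shell
#!/usr/bin/env bash
# Helpers copied from the hunk above; the sample file below is made up.
field_contains_value() {
    local config_file="$1" field="$2" value="$3"
    grep -qE "^${field}[^=]*=.*[[:space:]\"](${value})([[:space:]\"]|$)" "${config_file}"
}

config_is_true() {
    local config_file="$1" key="$2"
    grep -qE "^${key}\s*=\s*true" "${config_file}"
}

get_field_array_values() {
    local config_file="$1" field="$2"
    grep "^${field} = " "${config_file}" | sed "s/^${field} = \[\(.*\)\]/\1/" | sed 's/"//g' | sed 's/, /,/g'
}

# Hypothetical sample configuration, for illustration only.
sample_config="$(mktemp)"
cat > "${sample_config}" <<'EOF'
kernel_params = "console=hvc0 agent.log=debug"
enable_annotations = ["enable_iommu", "kernel_params"]
#enable_debug = true
EOF

# A complete token matches; a bare key followed by "=value" does not.
field_contains_value "${sample_config}" "kernel_params" "agent.log=debug" && echo "token: present"
field_contains_value "${sample_config}" "kernel_params" "agent.log" || echo "bare key: absent"

# A commented-out key does not count as already set.
config_is_true "${sample_config}" "enable_debug" || echo "enable_debug: not set"

# Array values come back comma-separated and unquoted.
get_field_array_values "${sample_config}" "enable_annotations"

rm -f "${sample_config}"
```

The guard-then-edit pattern only makes a rerun a no-op if the guard matches exactly what the edit writes, so it may be worth checking each caller's value argument against the text its sed expression appends.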
@@ -55,6 +89,12 @@ SNAPSHOTTER_HANDLER_MAPPING_AARCH64="${SNAPSHOTTER_HANDLER_MAPPING_AARCH64:-${SN
SNAPSHOTTER_HANDLER_MAPPING_S390X="${SNAPSHOTTER_HANDLER_MAPPING_S390X:-${SNAPSHOTTER_HANDLER_MAPPING}}"
SNAPSHOTTER_HANDLER_MAPPING_PPC64LE="${SNAPSHOTTER_HANDLER_MAPPING_PPC64LE:-${SNAPSHOTTER_HANDLER_MAPPING}}"
+ALLOWED_HYPERVISOR_ANNOTATIONS="${ALLOWED_HYPERVISOR_ANNOTATIONS:-}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_X86_64="${ALLOWED_HYPERVISOR_ANNOTATIONS_X86_64:-${ALLOWED_HYPERVISOR_ANNOTATIONS}}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_AARCH64="${ALLOWED_HYPERVISOR_ANNOTATIONS_AARCH64:-${ALLOWED_HYPERVISOR_ANNOTATIONS}}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_S390X="${ALLOWED_HYPERVISOR_ANNOTATIONS_S390X:-${ALLOWED_HYPERVISOR_ANNOTATIONS}}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_PPC64LE="${ALLOWED_HYPERVISOR_ANNOTATIONS_PPC64LE:-${ALLOWED_HYPERVISOR_ANNOTATIONS}}"
PULL_TYPE_MAPPING="${PULL_TYPE_MAPPING:-}"
PULL_TYPE_MAPPING_X86_64="${PULL_TYPE_MAPPING_X86_64:-${PULL_TYPE_MAPPING}}"
PULL_TYPE_MAPPING_AARCH64="${PULL_TYPE_MAPPING_AARCH64:-${PULL_TYPE_MAPPING}}"
@@ -70,6 +110,7 @@ EXPERIMENTAL_FORCE_GUEST_PULL_PPC64LE="${EXPERIMENTAL_FORCE_GUEST_PULL_PPC64LE:-
SHIMS_FOR_ARCH=""
DEFAULT_SHIM_FOR_ARCH=""
SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH=""
+ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH=""
PULL_TYPE_MAPPING_FOR_ARCH=""
EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH=""
case ${ARCH} in
@@ -77,6 +118,7 @@ case ${ARCH} in
SHIMS_FOR_ARCH="${SHIMS_X86_64}"
DEFAULT_SHIM_FOR_ARCH="${DEFAULT_SHIM_X86_64}"
SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH="${SNAPSHOTTER_HANDLER_MAPPING_X86_64}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH="${ALLOWED_HYPERVISOR_ANNOTATIONS_X86_64}"
PULL_TYPE_MAPPING_FOR_ARCH="${PULL_TYPE_MAPPING_X86_64}"
EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH="${EXPERIMENTAL_FORCE_GUEST_PULL_X86_64}"
;;
@@ -84,6 +126,7 @@ case ${ARCH} in
SHIMS_FOR_ARCH="${SHIMS_AARCH64}"
DEFAULT_SHIM_FOR_ARCH="${DEFAULT_SHIM_AARCH64}"
SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH="${SNAPSHOTTER_HANDLER_MAPPING_AARCH64}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH="${ALLOWED_HYPERVISOR_ANNOTATIONS_AARCH64}"
PULL_TYPE_MAPPING_FOR_ARCH="${PULL_TYPE_MAPPING_AARCH64}"
EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH="${EXPERIMENTAL_FORCE_GUEST_PULL_AARCH64}"
;;
@@ -91,6 +134,7 @@ case ${ARCH} in
SHIMS_FOR_ARCH="${SHIMS_S390X}"
DEFAULT_SHIM_FOR_ARCH="${DEFAULT_SHIM_S390X}"
SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH="${SNAPSHOTTER_HANDLER_MAPPING_S390X}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH="${ALLOWED_HYPERVISOR_ANNOTATIONS_S390X}"
PULL_TYPE_MAPPING_FOR_ARCH="${PULL_TYPE_MAPPING_S390X}"
EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH="${EXPERIMENTAL_FORCE_GUEST_PULL_S390X}"
;;
@@ -98,6 +142,7 @@ case ${ARCH} in
SHIMS_FOR_ARCH="${SHIMS_PPC64LE}"
DEFAULT_SHIM_FOR_ARCH="${DEFAULT_SHIM_PPC64LE}"
SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH="${SNAPSHOTTER_HANDLER_MAPPING_PPC64LE}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH="${ALLOWED_HYPERVISOR_ANNOTATIONS_PPC64LE}"
PULL_TYPE_MAPPING_FOR_ARCH="${PULL_TYPE_MAPPING_PPC64LE}"
EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH="${EXPERIMENTAL_FORCE_GUEST_PULL_PPC64LE}"
;;
@@ -105,6 +150,7 @@ case ${ARCH} in
SHIMS_FOR_ARCH="${SHIMS}"
DEFAULT_SHIM_FOR_ARCH="${DEFAULT_SHIM}"
SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH="${SNAPSHOTTER_HANDLER_MAPPING}"
+ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH="${ALLOWED_HYPERVISOR_ANNOTATIONS}"
PULL_TYPE_MAPPING_FOR_ARCH="${PULL_TYPE_MAPPING}"
EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH="${EXPERIMENTAL_FORCE_GUEST_PULL}"
;;
@@ -116,6 +162,8 @@ default_shim="${DEFAULT_SHIM_FOR_ARCH}"
IFS=',' read -a snapshotters <<< "${SNAPSHOTTER_HANDLER_MAPPING_FOR_ARCH}"
snapshotters_delimiter=':'
+IFS=' ' read -a hypervisor_annotations <<< "${ALLOWED_HYPERVISOR_ANNOTATIONS_FOR_ARCH}"
IFS=',' read -a pull_types <<< "${PULL_TYPE_MAPPING_FOR_ARCH}"
IFS="," read -a experimental_force_guest_pull <<< "${EXPERIMENTAL_FORCE_GUEST_PULL_FOR_ARCH}"
@@ -123,15 +171,6 @@ IFS="," read -a experimental_force_guest_pull <<< "${EXPERIMENTAL_FORCE_GUEST_PU
CREATE_RUNTIMECLASSES="${CREATE_RUNTIMECLASSES:-"false"}"
CREATE_DEFAULT_RUNTIMECLASS="${CREATE_DEFAULT_RUNTIMECLASS:-"false"}"
-ALLOWED_HYPERVISOR_ANNOTATIONS="${ALLOWED_HYPERVISOR_ANNOTATIONS:-}"
-IFS=' ' read -a non_formatted_allowed_hypervisor_annotations <<< "$ALLOWED_HYPERVISOR_ANNOTATIONS"
-allowed_hypervisor_annotations=""
-for allowed_hypervisor_annotation in "${non_formatted_allowed_hypervisor_annotations[@]}"; do
-allowed_hypervisor_annotations+="\"$allowed_hypervisor_annotation\", "
-done
-allowed_hypervisor_annotations=$(echo $allowed_hypervisor_annotations | sed 's/,$//')
AGENT_HTTPS_PROXY="${AGENT_HTTPS_PROXY:-}"
AGENT_NO_PROXY="${AGENT_NO_PROXY:-}"
@@ -484,7 +523,9 @@ EOF
chmod +x ${qemu_binary_script_host_path}
fi
-sed -i -e "s|${qemu_binary}|${qemu_binary_script}|" ${config_path}
+if ! string_exists_in_file "${config_path}" "${qemu_binary_script}"; then
+sed -i -e "s|${qemu_binary}|${qemu_binary_script}|" ${config_path}
+fi
}
function install_artifacts() {
@@ -505,26 +546,119 @@ function install_artifacts() {
local kata_config_file="${config_path}/configuration-${shim}.toml"
# Properly set https_proxy and no_proxy for Kata Containers
if [ -n "${AGENT_HTTPS_PROXY}" ]; then
-sed -i -e 's|^kernel_params = "\(.*\)"|kernel_params = "\1 agent.https_proxy='${AGENT_HTTPS_PROXY}'"|g' "${kata_config_file}"
+if ! field_contains_value "${kata_config_file}" "kernel_params" "agent.https_proxy"; then
+sed -i -e 's|^kernel_params = "\(.*\)"|kernel_params = "\1 agent.https_proxy='${AGENT_HTTPS_PROXY}'"|g' "${kata_config_file}"
+fi
fi
if [ -n "${AGENT_NO_PROXY}" ]; then
-sed -i -e 's|^kernel_params = "\(.*\)"|kernel_params = "\1 agent.no_proxy='${AGENT_NO_PROXY}'"|g' "${kata_config_file}"
+if ! field_contains_value "${kata_config_file}" "kernel_params" "agent.no_proxy"; then
+sed -i -e 's|^kernel_params = "\(.*\)"|kernel_params = "\1 agent.no_proxy='${AGENT_NO_PROXY}'"|g' "${kata_config_file}"
+fi
fi
# Allow enabling debug for Kata Containers
if [[ "${DEBUG}" == "true" ]]; then
-sed -i -e 's/^#\(enable_debug\).*=.*$/\1 = true/g' "${kata_config_file}"
-sed -i -e 's/^#\(debug_console_enabled\).*=.*$/\1 = true/g' "${kata_config_file}"
-sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.log=debug initcall_debug"/g' "${kata_config_file}"
+if ! config_is_true "${kata_config_file}" "enable_debug"; then
+sed -i -e 's/^#\{0,1\}\(enable_debug\).*=.*$/\1 = true/g' "${kata_config_file}"
+fi
+if ! config_is_true "${kata_config_file}" "debug_console_enabled"; then
+sed -i -e 's/^#\{0,1\}\(debug_console_enabled\).*=.*$/\1 = true/g' "${kata_config_file}"
+fi
+local debug_params=""
+if ! field_contains_value "${kata_config_file}" "kernel_params" "agent.log=debug"; then
+debug_params+=" agent.log=debug"
+fi
+if ! field_contains_value "${kata_config_file}" "kernel_params" "initcall_debug"; then
+debug_params+=" initcall_debug"
+fi
+if [[ -n "${debug_params}" ]]; then
+sed -i -e "s/^kernel_params = \"\(.*\)\"/kernel_params = \"\1${debug_params}\"/g" "${kata_config_file}"
+fi
fi
-if [ -n "${allowed_hypervisor_annotations}" ]; then
-sed -i -e "s/^enable_annotations = \[\(.*\)\]/enable_annotations = [\1, $allowed_hypervisor_annotations]/" "${kata_config_file}"
+# Apply allowed_hypervisor_annotations:
+# Here we need to support both cases of:
+# * A list of annotations, which will be blindly applied to all shims
+# * A per-shim list of annotations, which will only be applied to the specific shim
+if [[ ${#hypervisor_annotations[@]} -gt 0 ]]; then
+local shim_specific_annotations=""
+local global_annotations=""
+for m in "${hypervisor_annotations[@]}"; do
+# Check if this mapping has a colon (shim-specific) or not
+if [[ "${m}" == *:* ]]; then
+# Shim-specific mapping like "qemu:foo,bar"
+local key="${m%:*}"
+local value="${m#*:}"
+if [[ "${key}" != "${shim}" ]]; then
+continue
+fi
+if [[ -n "${shim_specific_annotations}" ]]; then
+shim_specific_annotations+=","
+fi
+shim_specific_annotations+="${value}"
+else
+# All shims annotations like "foo bar"
+if [[ -n "${global_annotations}" ]]; then
+global_annotations+=","
+fi
+global_annotations+="$(echo "${m}" | sed 's/ /,/g')"
+fi
+done
+# Combine shim-specific and non-shim-specific annotations
+local all_annotations="${global_annotations}"
+if [[ -n "${shim_specific_annotations}" ]]; then
+if [[ -n "${all_annotations}" ]]; then
+all_annotations+=","
+fi
+all_annotations+="${shim_specific_annotations}"
+fi
+if [[ -n "${all_annotations}" ]]; then
+local existing_annotations=$(get_field_array_values "${kata_config_file}" "enable_annotations")
+# Combine existing and new annotations
+local combined_annotations="${existing_annotations}"
+if [[ -n "${combined_annotations}" ]] && [[ -n "${all_annotations}" ]]; then
+combined_annotations+=",${all_annotations}"
+elif [[ -n "${all_annotations}" ]]; then
+combined_annotations="${all_annotations}"
+fi
+# Deduplicate all annotations
+IFS=',' read -a annotations <<< "${combined_annotations}"
+local -A seen_annotations
+local unique_annotations=()
+for annotation in "${annotations[@]}"; do
+# Trim leading and trailing whitespace
+annotation=$(echo "${annotation}" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+if [[ -n "${annotation}" ]] && [[ -z "${seen_annotations[${annotation}]+_}" ]]; then
+seen_annotations["${annotation}"]=1
+unique_annotations+=("${annotation}")
+fi
+done
+if [[ ${#unique_annotations[@]} -gt 0 ]]; then
+local formatted_annotations=()
+for ann in "${unique_annotations[@]}"; do
+formatted_annotations+=("\"${ann}\"")
+done
+local final_annotations=$(IFS=', '; echo "${formatted_annotations[*]}")
+sed -i -e "s/^enable_annotations = \[.*\]/enable_annotations = [${final_annotations}]/" "${kata_config_file}"
+fi
+fi
fi
if printf '%s\n' "${experimental_force_guest_pull[@]}" | grep -Fxq "${shim}"; then
-sed -i -e 's/^\(experimental_force_guest_pull\).*=.*$/\1 = true/g' "${kata_config_file}"
+if ! config_is_true "${kata_config_file}" "experimental_force_guest_pull"; then
+sed -i -e 's/^#\{0,1\}\(experimental_force_guest_pull\).*=.*$/\1 = true/g' "${kata_config_file}"
+fi
fi
if grep -q "tdx" <<< "$shim"; then
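Note: the annotation handling in this hunk accepts both plain entries (applied to every shim) and shim:list entries (applied only to the named shim), merges them with what is already in enable_annotations, and drops duplicates. The sketch below is a condensed illustration of that idea, not the code from the change: merge_annotations is a hypothetical helper name, the input values are made up, and unlike the hunk it keeps entries in the order given rather than placing global entries first.

```shell
#!/usr/bin/env bash
# merge_annotations is a hypothetical helper, not part of the script above.
# Usage: merge_annotations <shim> <existing,comma,list> <space separated spec>
merge_annotations() {
    local shim="$1" existing="$2" spec="$3"
    local -a entries=() names=() unique=()
    local -A seen=()
    local entry name combined="${existing}"

    IFS=' ' read -r -a entries <<< "${spec}"
    for entry in "${entries[@]}"; do
        if [[ "${entry}" == *:* ]]; then
            # Shim-specific entry such as "qemu-snp:foo,bar".
            [[ "${entry%%:*}" == "${shim}" ]] || continue
            entry="${entry#*:}"
        fi
        # Append with a comma only when something is already collected.
        combined+="${combined:+,}${entry}"
    done

    # Keep the first occurrence of every name, preserving order.
    IFS=',' read -r -a names <<< "${combined}"
    for name in "${names[@]}"; do
        [[ -n "${name}" && -z "${seen[${name}]+_}" ]] || continue
        seen["${name}"]=1
        unique+=("\"${name}\"")
    done

    # Join the quoted names with commas inside TOML-style brackets.
    local IFS=','
    echo "[${unique[*]}]"
}

# Made-up inputs: one global entry, one entry for this shim, one for another shim.
merge_annotations "qemu-snp" "enable_iommu" \
    "kernel_params qemu-snp:enable_iommu,default_memory qemu-tdx:default_vcpus"
```

Because the result is rebuilt from the existing list plus the requested entries, running the merge a second time with the same inputs yields the same list, which is the property the hunk relies on when the script is rerun.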
@@ -584,11 +718,20 @@ function install_artifacts() {
# Allow Mariner to use custom configuration.
if [ "${HOST_OS:-}" == "cbl-mariner" ]; then
config_path="${host_install_dir}/share/defaults/kata-containers/configuration-clh.toml"
-sed -i -E "s|(static_sandbox_resource_mgmt)=false|\1=true|" "${config_path}"
+if ! config_is_true "${config_path}" "static_sandbox_resource_mgmt"; then
+sed -i -E "s|(static_sandbox_resource_mgmt)\s*=\s*false|\1=true|" "${config_path}"
+fi
clh_path="${dest_dir}/bin/cloud-hypervisor-glibc"
-sed -i -E "s|(valid_hypervisor_paths) = .+|\1 = [\"${clh_path}\"]|" "${config_path}"
-sed -i -E "s|(path) = \".+/cloud-hypervisor\"|\1 = \"${clh_path}\"|" "${config_path}"
+if ! field_contains_value "${config_path}" "valid_hypervisor_paths" "${clh_path}"; then
+sed -i -E "s|(valid_hypervisor_paths) = .+|\1 = [\"${clh_path}\"]|" "${config_path}"
+fi
+if ! field_contains_value "${config_path}" "path" "${clh_path}"; then
+sed -i -E "s|(path) = \".+/cloud-hypervisor\"|\1 = \"${clh_path}\"|" "${config_path}"
+fi
fi
local expand_runtime_classes_for_nfd=false
@@ -766,7 +909,12 @@ function configure_containerd_runtime() {
tomlq -i -t $(printf '%s.runtime_type=%s' ${runtime_table} ${runtime_type}) ${configuration_file}
tomlq -i -t $(printf '%s.runtime_path=%s' ${runtime_table} ${runtime_path}) ${configuration_file}
tomlq -i -t $(printf '%s.privileged_without_host_devices=true' ${runtime_table}) ${configuration_file}
-tomlq -i -t $(printf '%s.pod_annotations=["io.katacontainers.*"]' ${runtime_table}) ${configuration_file}
+if [[ "${shim}" == *"nvidia-gpu-"* ]]; then
+tomlq -i -t $(printf '%s.pod_annotations=["io.katacontainers.*","cdi.k8s.io/*"]' ${runtime_table}) ${configuration_file}
+else
+tomlq -i -t $(printf '%s.pod_annotations=["io.katacontainers.*"]' ${runtime_table}) ${configuration_file}
+fi
tomlq -i -t $(printf '%s.ConfigPath=%s' ${runtime_options_table} ${runtime_config_path}) ${configuration_file}
if [ "${DEBUG}" == "true" ]; then
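Note: the glob in this hunk, *"nvidia-gpu-"*, requires a trailing hyphen, so it selects the qemu-nvidia-gpu-snp and qemu-nvidia-gpu-tdx shims but not plain qemu-nvidia-gpu. The sketch below isolates that selection; pod_annotations_for_shim is a hypothetical helper name used only for illustration, and the shim names are the ones that appear elsewhere in this diff.

```shell
#!/usr/bin/env bash
# pod_annotations_for_shim is a hypothetical helper, not part of the script above.
pod_annotations_for_shim() {
    local shim="$1"
    if [[ "${shim}" == *"nvidia-gpu-"* ]]; then
        # Shims with a suffix after "nvidia-gpu-" also pass CDI annotations.
        echo '["io.katacontainers.*","cdi.k8s.io/*"]'
    else
        echo '["io.katacontainers.*"]'
    fi
}

for shim in qemu qemu-nvidia-gpu qemu-nvidia-gpu-snp qemu-nvidia-gpu-tdx; do
    printf '%s -> %s\n' "${shim}" "$(pod_annotations_for_shim "${shim}")"
done
```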
@@ -1122,6 +1270,10 @@ function main() {
echo "* CREATE_RUNTIMECLASSES: ${CREATE_RUNTIMECLASSES}"
echo "* CREATE_DEFAULT_RUNTIMECLASS: ${CREATE_DEFAULT_RUNTIMECLASS}"
echo "* ALLOWED_HYPERVISOR_ANNOTATIONS: ${ALLOWED_HYPERVISOR_ANNOTATIONS}"
+echo " * x86_64: ${ALLOWED_HYPERVISOR_ANNOTATIONS_X86_64}"
+echo " * aarch64: ${ALLOWED_HYPERVISOR_ANNOTATIONS_AARCH64}"
+echo " * s390x: ${ALLOWED_HYPERVISOR_ANNOTATIONS_S390X}"
+echo " * ppc64le: ${ALLOWED_HYPERVISOR_ANNOTATIONS_PPC64LE}"
echo "* SNAPSHOTTER_HANDLER_MAPPING: ${SNAPSHOTTER_HANDLER_MAPPING}"
echo " * x86_64: ${SNAPSHOTTER_HANDLER_MAPPING_X86_64}"
echo " * aarch64: ${SNAPSHOTTER_HANDLER_MAPPING_AARCH64}"

@@ -102,7 +102,7 @@ mapping:
- Kata Containers CI / kata-containers-ci-on-push / run-kata-deploy-tests / run-kata-deploy-tests (qemu, microk8s)
- Kata Containers CI / kata-containers-ci-on-push / run-kata-deploy-tests / run-kata-deploy-tests (qemu, rke2)
- Kata Containers CI / kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, crio)
-- Kata Containers CI / kata-containers-ci-on-push / run-k8s-tests-on-nvidia-gpu / run-nvidia-gpu-tests-on-amd64 (qemu-nvidia-gpu, kubeadm)
+- Kata Containers CI / kata-containers-ci-on-push / run-k8s-tests-on-nvidia-gpu / run-nvidia-gpu-tests-on-amd64
required-labels:
- ok-to-test
build: