kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-10-22 04:18:53 +00:00

Author	SHA1	Message	Date
Manuel Huber	4ad8c31b5a	gpu: build nv rootfs with guest pull support While the local-build folder's Makefile dependencies for the confidential nvidia rootfs targets already declare the pause image and coco-guest-components dependencies, the actual rootfs composition does not contain the pause image bundle and relevant certificates for guest pull. This change ensures the rootfs gets composed with the relevant files. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-16 09:20:49 -07:00
Aurélien Bombo	edbb4b633c	Merge pull request #11890 from microsoft/saulparedes/optional_initdata genpolicy: take path to initdata from command line if provided	2025-10-16 11:04:57 -05:00
Markus Rudy	d5cb9764fd	kata-types: use pretty TOML encoder for initdata TOML was chosen for initdata particularly for the ability to include policy docs and other configuration files without mangling them. The default TOML encoding renders string values as single-line, double-quoted strings, effectively depriving us of this feature. This commit changes the encoding to use `to_string_pretty`, and includes a test that verifies the desirable aspect of encoding: newlines are kept verbatim. Fixes: #11943 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-16 12:08:18 +02:00
Fabiano Fidêncio	aa7e46b5ed	tests: Check the multi-snapshotter situation on containerd One problem that we've been having for a reasonable amount of time, is containerd not behaving very well when we have multiple snapshotters. Although I'm adding this test with my "CoCo" hat in mind, the issue can happen easily with any other case that requires a different snapshotter (such as, for instance, firecracker + devmapper). With this in mind, let's do some stability tests, checking every hour a simple case of running a few pre-defined containers with runc, and then running the same containers with kata. This should be enough to put us in the situation where containerd gets confused about which snapshotter owns the image layers, and break on us (or not break and show us that this has been solved ...). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-15 13:35:43 +02:00
Manuel Huber	8221361915	gpu: Use variable to differentiate rootfs variants With this change we namespace the stage one rootfs tarball name and use the same name across all uses. This will help overcome several subtle local build problems. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-15 12:39:44 +02:00
Hyounggyu Choi	88c333f2a6	agent: Fix race in tests calling LinuxContainer::new() We fix the following error: ``` thread 'sandbox::tests::add_and_get_container' panicked at src/sandbox.rs:901:10: called `Result::unwrap()` on an `Err` value: Create cgroupfs manager Caused by: 0: fs error caused by: Os { code: 17, kind: AlreadyExists, message: "File exists" } 1: File exists (os error 17) note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` by ensuring that the cgroup path is unique for tests run in the same millisecond. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-15 11:32:22 +02:00
Hyounggyu Choi	8412af919d	agent/netlink: Attempt to fix ARP and routes tests test_add_one_arp_neighbor ========================= We attempt to fix the following error: ``` thread 'netlink::tests::test_add_one_arp_neighbor' panicked at src/netlink.rs:1163:9: assertion `left == right` failed left: "" right: "192.0.2.127 lladdr 6a:92:3a:59:70:aa PERMANENT" ``` by adding a sleep to prepare_env_for_test_add_one_arp_neighbor() to wait for the kernel interfaces to settle. list_routes =========== We attempt to fix the following error (notice that the available devices contain "dummy_for_arp"): ``` thread 'netlink::tests::list_routes' panicked at src/netlink.rs:986:14: Failed to list routes: available devices: [Interface { device: "", name: "lo", IPAddresses: [IPAddress { family: v6, address: "127.0.0.1", mask: "8", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v6, address: "169.254.1.1", mask: "31", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "2001:db8:85a3::8a2e:370:7334", mask: "128", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "::1", mask: "128", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 65536, hwAddr: "00:00:00:00:00:00", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, Interface { device: "", name: "enc0", IPAddresses: [IPAddress { family: v6, address: "10.249.65.4", mask: "24", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "fe80::4ff:fe57:b3e4", mask: "64", 
special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 1500, hwAddr: "02:00:04:57:B3:E4", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, Interface { device: "", name: "docker0", IPAddresses: [IPAddress { family: v6, address: "172.17.0.1", mask: "16", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "fe80::42:56ff:fe5c:d9f9", mask: "64", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 1500, hwAddr: "02:42:56:5C:D9:F9", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, Interface { device: "", name: "dummy_for_arp", IPAddresses: [IPAddress { family: v6, address: "192.0.2.2", mask: "24", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "fe80::f4f2:64ff:fe46:2b01", mask: "64", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 1500, hwAddr: "4A:73:DE:A3:07:64", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }] Caused by: 0: error looking up device 19888 1: Received a netlink error message No such device (os error 19) ``` by calling clean_env_for_test_add_one_arp_neighbor() at the start of the test. However this fix is uncertain: the original assumption for the fix was that the "dummy_for_arp" interface left over from test_add_one_arp_neighbor was the cause of the error. 
But (3) below shows that running list_routes in isolation while that interface is present is NOT enough to repro the error: 1. Running all tests + no clean_env in list_routes => list_routes FAILS (before this PR) 2. Running all tests + clean_env in list_routes => list_routes PASSES (after this PR) 3. Running only list_routes + dummy_for_arp present => list_routes PASSES (manual test, see below) ``` $ ip a l 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet 169.254.1.1/31 brd 169.254.1.1 scope global lo valid_lft forever preferred_lft forever inet6 2001:db8:85a3::8a2e:370:7334/128 scope global valid_lft forever preferred_lft forever inet6 ::1/128 scope host noprefixroute valid_lft forever preferred_lft forever 2: enc0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 02:00:01:02:e2:47 brd ff:ff:ff:ff:ff:ff inet 10.240.64.4/24 metric 100 brd 10.240.64.255 scope global dynamic enc0 valid_lft 159sec preferred_lft 159sec inet6 fe80::1ff:fe02:e247/64 scope link valid_lft forever preferred_lft forever 311: dummy_for_arp: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether ee:79:66:3a:dc:bc brd ff:ff:ff:ff:ff:ff inet 192.0.2.2/24 scope global dummy_for_arp valid_lft forever preferred_lft forever inet6 fe80::4c2e:83ff:fe7d:ef00/64 scope link valid_lft forever preferred_lft forever $ sudo -E PATH=$PATH make test ../../utils.mk:162: "WARNING: s390x-unknown-linux-musl target is unavailable" Finished `test` profile [unoptimized + debuginfo] target(s) in 0.25s Running unittests src/main.rs (target/s390x-unknown-linux-gnu/debug/deps/kata_agent-b2b5b200deca712e) running 1 test test netlink::tests::list_routes ... ok test result: ok. 
1 passed; 0 failed; 0 ignored; 0 measured; 224 filtered out; finished in 0.00s ``` Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-15 11:32:22 +02:00
Paul Meyer	06ed957a45	virtcontainers: fix nydus cleanup on rootfs unmount This was discovered by @sprt in https://github.com/kata-containers/kata-containers/pull/10243#discussion_r2373709407. Checking for state.Fstype makes no sense as we know it is empty. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-10-15 09:22:51 +02:00
Zvonko Kaiser	10f8ec0c20	cdi: Add Crate remove Github Hash Use CDI exclusively from crates.io and not from a GH repository. Cargo can easily check if a new version is available and we can bump it far more easily if needed. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-15 09:22:20 +02:00
Greg Kurz	3507b2038e	Merge pull request #11936 from ldoktor/ocp-helm ci.ocp: Use helm to install kata	2025-10-14 18:22:28 +02:00
Lukáš Doktor	bdb0afc4e0	ci.ocp: Fix incorrectly quoted argument With the shellcheck fixes we accidentally quoted the "-n NAMESPACE" argument where we should have used an array instead, which led to oc treating it as a pod name and returning an error. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-10-14 17:59:33 +02:00
Lukáš Doktor	f891f340bc	ci.ocp: Use helm to install kata which is the current supported way to deploy kata-containers directly. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-10-14 17:59:33 +02:00
Aurélien Bombo	0c6fcde198	Merge pull request #11918 from fidencio/topic/builds-qemu-use-liburing-newer-than-2.2 builds: qemu: Use a liburing newer than 2.2	2025-10-14 10:17:16 -05:00
Steve Horsman	363701d767	Merge pull request #11915 from stevenhorsman/ibm-runner-followups-part-i ci: Add protobuf-compiler dependencies	2025-10-14 13:28:45 +01:00
Fabiano Fidêncio	2ad81c4797	build: qemu: Fix cache logic We need to ensure that any change on the Dockerfile (and its dir) leads to the build being retriggered, rather than using the cached version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-14 12:17:43 +02:00
Fabiano Fidêncio	2f73e34e33	builds: qemu: Use a liburing newer than 2.2 Due to a potential regression introduced by: `984a32f17e (565f3835aaed6321caab4f7c4f8560a687f6000b_379_386)` Reported-by: Aurélien Bombo <abombo@microsoft.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-14 12:17:28 +02:00
stevenhorsman	8ce714cf97	ci: Add protobuf-compiler dependencies We are seeing more protoc related failures on the new runners, so try adding the protobuf-compiler dependency to these steps to see if it helps. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-14 10:58:58 +01:00
Fabiano Fidêncio	b0b0038689	versions: Bump QEMU to 10.1.1 QEMU 10.1.1 was released on October 8th, 2025, let's bump it on our side. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-13 23:52:01 +02:00
Fabiano Fidêncio	d46474cfc0	tests: Run apt-get update before installing a package Otherwise it'll just break. :-) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-13 23:33:46 +02:00
Saul Paredes	ba7a5953c8	tests: k8s-policy-pod.bats: test unspecified initdata path use auto_generate_policy_no_added_flags, so we don't pass --initdata-path to genpolicy Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-13 10:47:53 -07:00
Saul Paredes	395f237fc2	tests: k8s: use default-initdata.toml when auto-generating policy - copy default-initdata.toml in create_tmp_policy_settings_dir, so it can be modified by other tests if needed - make auto_generate_policy use default-initdata.toml by default - add auto_generate_policy_no_added_flags, so it may be used by tests that don't want to use default-initdata.toml by default Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-13 10:47:53 -07:00
Saul Paredes	dfd269eb87	genpolicy: take path to initdata from command line if provided Otherwise use default initdata. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-13 10:47:53 -07:00
Fabiano Fidêncio	fb43d3419f	build: Fix nvidia kernel breakage On commit `9602ba6ccc`, from February this year, we've introduced a check to ensure that the files needed for signing the kernel build are present. However, we've noticed last week that there were a reasonable amount of wrong assumptions with the workflow. :-) Zvonko fixed the majority of those, but this bit was left and it'd cause breakages when using kernel that was cached ... although passing when building new kernels. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-13 19:28:40 +02:00
Fupan Li	8b06f3d95d	Merge pull request #11905 from Apokleos/coldplug-scsidev runtime-rs: Support virtio-scsi for initdata within non-TEE	2025-10-11 16:11:39 +08:00
Xuewei Niu	5acb6d8e13	Merge pull request #11863 from lifupan/fupan_blk_remove runtime-rs: ad the block device hot unplug for clh	2025-10-11 10:31:48 +08:00
Aurélien Bombo	ff973a95c8	Merge pull request #11916 from zvonkok/fix-kernel-module-signing gpu: Fix kernel module signing	2025-10-10 17:17:08 -05:00
Zvonko Kaiser	b00013c717	kernel: Add KBUILD_SIGN_PIN pass through This is needed so the kernel setup picks up the correct config values from our fragments directories. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-10 15:45:34 -04:00
Zvonko Kaiser	37bd5e3c9d	gpu: Add kernel CONFIG check We need to make sure that the kernel we're using has the correct configs set, otherwise the module signing will not work. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-10 15:45:34 -04:00
Fabiano Fidêncio	e782d1ad50	ci: k8s: Test experimental_force_guest_pull Now that we have added the ability to deploy kata-containers with experimental_force_guest_pull configured, let's make sure we test it to avoid any kind of regressions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 20:08:10 +02:00
Fabiano Fidêncio	1bc89d09ae	tests: Consider SNAPSHOTTER in the cluster name Otherwise we have no way to differentiate running tests on qemu-coco-dev with different snapshotters. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 20:08:10 +02:00
Fabiano Fidêncio	496e255ea2	build: Fix KBUILD_SIGN_PIN usage What was done in the past, trying to set the env var on the same step it'd be used, simply does not work. Instead, we need to properly set it through the `env` set up, as done now. We're also bumping the kata_config_version to ensure we retrigger the kernel builds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 15:25:10 +02:00
Paul Meyer	5ae891ab46	versions: bump opa 1.6.0 -> 1.9.0 Bumping opa to latest release. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-10-10 10:58:51 +02:00
Steve Horsman	a570fdc0fd	Merge pull request #11909 from kata-containers/ibm-runners-test ci: Enable new ibm runners	2025-10-10 09:42:53 +01:00
stevenhorsman	8dcd91cf5f	ci: Enable new ibm runners We have some scalable s390x and ppc runners, so start to use them for build and test, to improve the throughput of our CI Signed-off-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-10 09:42:06 +01:00
Fabiano Fidêncio	06a3bbdd44	ci: k8s: coco: Add "Report tests" step For some reason we didn't have the "Report tests" step as part of the TEE jobs. This step immensely helps to check which tests are failing and why, so let's add it while touching the workflow. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 09:51:59 +02:00
Fabiano Fidêncio	a1f90fe350	tests: k8s: Unify k8s TEE tests There's no reason to have the code duplication between the SNP / TDX tests for CoCo, as those are basically using the same configuration nowadays. Note that for the TEEs case, as the nydus-snapshotter is deployed by the admin, once, instead of deploying it on every run ... I'm actually removing the nydus-snapshotter steps so we make it clear that those steps are not performed by the CI. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 09:51:59 +02:00
Alex Lyn	4c386b51d9	runtime-rs: Add support for handling virtio-scsi devices As virtio-scsi has been set as the default block device driver, the runtime also needs to handle the virtio-scsi info correctly, especially the SCSI address required by the kata-agent handling logic. Getting the scsi_addr and assigning it to the kata agent device id is enough, and this commit does just that. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-10 11:31:04 +08:00
Fupan Li	4002a91452	runtime-rs: ad the block device hot unplug for clh Since runtime-rs supports block device hotplug when creating new containers, and the device should also be removed when the container stops, add block device unplug for clh. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-10-10 10:02:12 +08:00
Zvonko Kaiser	afbec780a9	Merge pull request #11903 from zvonkok/ppcie gpu: PPCIE support DGX like systems	2025-10-09 21:06:41 -04:00
Aurélien Bombo	a3a45429f6	Merge pull request #11865 from microsoft/danmihai1/nested-configmap-secret tests: k8s-nested-configmap-secret policy	2025-10-09 11:33:50 -05:00
Alex Lyn	b42ef09ffb	Merge pull request #11888 from spuzirev/main runtime: fix "num-queues expects uint64" error with virtio-blk	2025-10-09 20:21:32 +08:00
Xuewei Niu	2a43bf37ed	Merge pull request #11894 from M-Phansa/main runtime: fix device typo	2025-10-09 16:53:40 +08:00
Alex Lyn	a54d95966b	runtime-rs: Support virtio-scsi for initdata within non-TEE This commit introduces support for selecting `virtio-scsi` as the block device driver for QEMU during initial setup. The primary goal is to resolve a conflict in non-TEE environments: 1. The global block device configuration defaults to `virtio-scsi`. 2. The `initdata` device driver was previously designed and hardcoded to `virtio-blk-pci`. 3. This conflict prevented unified block device usage. By allowing `virtio-scsi` to be configured at cold boot, the `initdata` device can now correctly adhere to the global setting, eliminating the need for a hardcoded driver and ensuring consistent block device configuration across all supported devices (excluding rootfs). Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-09 15:52:33 +08:00
Xuewei Niu	5208ee4ec0	Merge pull request #11674 from was-saw/dragonball_seccomp runtime-rs: add seccomp support for dragonball	2025-10-09 15:01:15 +08:00
wangxinge	8e1b33cc14	docs: add document for seccomp This commit adds a document to use seccomp in runtime-rs Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-10-09 13:25:17 +08:00
wangxinge	2abf6965ff	dragonball: add seccomp support for dragonball This commit modifies seccomp framework to support different restrictions for different threads. Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-10-09 13:25:17 +08:00
wangxinge	bb6fb8ff39	runtime-rs: add seccomp support for dragonball The implementation of the seccomp feature in Dragonball currently has a basic framework, but the actual restriction rules are empty. This pull request includes the following changes: - Modify configuration files to relevant configuration files. - Modify seccomp framework to support different restrictions for different threads. - Add new seccomp rules for the modified framework. This commit primarily implements the changes 1 and 3 for runtime-rs. Fixes: #11673 Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-10-09 13:25:17 +08:00
Zvonko Kaiser	91739d4425	gpu: PPCIE support DGX like systems For DGX like systems we need additional binaries and libraries, enable the Kata AND CoCo use-case. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Update tools/osbuilder/rootfs-builder/nvidia/nvidia_rootfs.sh Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-09 00:00:12 +00:00
Dan Mihai	364d3cded0	tests: k8s-nested-configmap-secret policy Add auto-generated agent policy in k8s-nested-configmap-secret.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-08 23:37:54 +00:00
Sergei Puzyrev	62b12953c7	runtime: fix "num-queues expects uint64" error with virtio-blk Unneeded type-conversion was removed. Fixes #11887 Signed-off-by: Sergei Puzyrev <spuzirev@gmail.com>	2025-10-08 17:09:22 -05:00

16995 Commits