The AMD maintainers switched to containerd 2.3.0-beta.0, due to the
nydus fixes that landed there.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that kata-deploy deploys and manages nydus-for-kata-tee on all
platforms, the separate standalone nydus-snapshotter DaemonSet deployment
is no longer needed.
- Short-circuit deploy_nydus_snapshotter and cleanup_nydus_snapshotter
to no-ops with an explanatory message.
- Add qemu-snp to the workaround case so AMD SEV-SNP baremetal runners
also get USE_EXPERIMENTAL_SETUP_SNAPSHOTTER=true and kata-deploy picks
up the snapshotter setup on every run.
- Drop the x86_64 arch guard and the hypervisor sub-case from the
  EXPERIMENTAL_SETUP_SNAPSHOTTER block, allowing any architecture and
  hypervisor to use the kata-deploy-managed path when the flag is set;
  a sketch of these changes follows.
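The shape of these changes, as a rough sketch (only the function and
variable names mentioned above are real; the hypervisor variable and
the case layout are assumptions about how the CI scripts are
structured):

deploy_nydus_snapshotter() {
    # kata-deploy now owns the snapshotter lifecycle on every platform
    echo "nydus-snapshotter is deployed and managed by kata-deploy; skipping"
}

cleanup_nydus_snapshotter() {
    echo "nydus-snapshotter is managed by kata-deploy; nothing to clean up"
}

case "${KATA_HYPERVISOR}" in
    qemu-tdx | qemu-snp)  # qemu-snp added for AMD SEV-SNP baremetal runners
        export USE_EXPERIMENTAL_SETUP_SNAPSHOTTER=true
        ;;
esac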
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Rename all host-visible names of the nydus-snapshotter instance managed
by kata-deploy from the generic "nydus-snapshotter" to "nydus-for-kata-tee".
This covers the systemd service name, the containerd proxy plugin key,
the runtime class snapshotter field, the data directory
(/var/lib/nydus-for-kata-tee), the socket path (/run/nydus-for-kata-tee/),
and the host install subdirectory.
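For illustration, the containerd side of the rename could be wired up
along these lines (the plugin key, socket directory, and data directory
come from this commit; the socket file name and the config file path
are assumptions):

cat <<'EOF' >> /etc/containerd/config.toml
[proxy_plugins.nydus-for-kata-tee]
  type = "snapshot"
  address = "/run/nydus-for-kata-tee/containerd-nydus-grpc.sock"
EOF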
The rename makes it immediately clear that this nydus-snapshotter instance
is the one deployed and managed by kata-deploy specifically for Kata TEE
use cases, rather than any general-purpose nydus-snapshotter that might
be present on the host.
Because the old code operated under a completely separate set of paths
(nydus-snapshotter.*), any previously deployed installation continues
to run without interference during the transition to this new naming.
CI pipelines and operators can upgrade kata-deploy on their own schedule
without having to coordinate an atomic cutover: the old service keeps
serving its existing workloads until it is explicitly replaced, and the
new deployment lands cleanly alongside it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The enablement of the trusted ephemeral storage for IBM SEL was
missed in #10559. Set the emptydir_mode properly for the TEE.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Fixes: #10002
Since #11537 resolves the issue, remove the skip conditions for
the k8s e2e tests involving emptyDir volume mounts.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Enable VFIO device pass-through at VM creation time on Cloud Hypervisor,
in addition to the existing hot-plug path.
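A hypothetical usage example (Kata's QEMU configuration already exposes
a cold_plug_vfio key; that Cloud Hypervisor accepts the same key and
the "root-port" value is an assumption here):

sed -i -e 's|^#\?cold_plug_vfio =.*|cold_plug_vfio = "root-port"|' \
    /opt/kata/share/defaults/kata-containers/configuration-clh.toml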
Signed-off-by: Roaa Sakr <romoh@microsoft.com>
SSIA, the NIM tests are breaking due to authentication issues, and those
issues are blocking other PRs.
Let's unrequire the test for now, and mark it as required again once
we've fixed the auth issues.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Removing /var/lib/nydus-snapshotter during install or uninstall creates
a split-brain state: the nydus backend starts empty while containerd's
BoltDB (meta.db) still holds snapshot records from the previous run.
Any subsequent image pull then fails with:
"unable to prepare extraction snapshot:
target snapshot \"sha256:...\": already exists"
An earlier attempt cleaned up containerd's BoltDB via `ctr snapshots rm`
before wiping the directory, but that cleanup is inherently fragile:
- It requires the nydus gRPC service to be reachable at cleanup time.
If the service is stopped, crashed, or not yet running, every `ctr`
call silently fails and the stale records remain.
- Any workload still actively using a snapshot blocks the entire
cleanup, making it impossible to guarantee a clean state.
The correct invariant is that meta.db and the nydus backend always
agree. Preserving the data directory unconditionally guarantees this:
- Fresh install: data directory does not exist, nydus starts empty.
- Reinstall: existing snapshots and nydus.db are preserved, meta.db
and backend remain in sync, new binary starts cleanly.
- After uninstall: containerd is reconfigured without the nydus
proxy_plugins entry and restarted, so the snapshot records in
meta.db are completely dormant — nothing will use them. If nydus
is reinstalled later, the data directory is still present and both
sides remain in sync, so no split-brain can occur.
Any stale snapshots from previous workloads are garbage-collected by
containerd once the images referencing them are removed.
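Sketched in shell (illustrative only; the real logic lives in the
kata-deploy snapshotter handling, and the service name is an
assumption), the new behaviour boils down to never touching the data
directory:

uninstall_nydus_snapshotter() {
    systemctl disable --now nydus-snapshotter.service
    # reconfigure containerd without the nydus proxy_plugins entry,
    # then restart it; /var/lib/nydus-snapshotter is deliberately left
    # in place so meta.db and the nydus backend can never disagree
    systemctl restart containerd
}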
This also removes the cleanup_containerd_nydus_snapshots,
cleanup_nydus_snapshots, and cleanup_nydus_containers helpers that
were introduced by the earlier (fragile) attempt.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Now that containerd 2.3.0-beta.0 has been released, bringing the
multi-snapshotter fixes, we can test the baremetal machines in the same
way we test the non-baremetal ones.
Let's start the switch with TDX, as the timezone is friendlier for
coordinating with Mikko.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
`cargo check` was introduced in 3f1533a to check that Cargo.lock is in sync
with Cargo.toml. However, if there are uncommitted changes in the working
tree, the current invocation will immediately fail because of the `git
diff` call, which is frustrating for local development.
As it turns out, `cargo clippy` is a superset of `cargo check`, so we can
simply pass `--locked` to `cargo clippy` to detect Cargo.lock issues.
This is tested with the following change:
diff --git a/src/agent/Cargo.lock b/src/agent/Cargo.lock
index 96b6c676d..e1963af00 100644
--- a/src/agent/Cargo.lock
+++ b/src/agent/Cargo.lock
@@ -4305,6 +4305,7 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
name = "test-utils"
version = "0.1.0"
dependencies = [
- "libc",
"nix 0.26.4",
]
which results in the following output:
$ make -C src/agent check
make: Entering directory '/kata-containers/src/agent'
standard rust check...
cargo fmt -- --check
cargo clippy --all-targets --all-features --release --locked \
-- \
-D warnings
error: the lock file /kata-containers/src/agent/Cargo.lock needs to be updated but --locked was passed to prevent this
If you want to try to generate the lock file without accessing the network, remove the --locked flag and use --offline instead.
make: *** [../../utils.mk:184: standard_rust_check] Error 101
make: Leaving directory '/kata-containers/src/agent'
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
When /var/lib/nydus-snapshotter is removed, containerd's BoltDB
(meta.db at /var/lib/containerd/) still holds snapshot records for
the nydus snapshotter. On the next install these stale records cause
image pulls to fail with:
"unable to prepare extraction snapshot:
target snapshot \"sha256:...\": already exists"
The failure path in core/unpack/unpacker.go:
1. sn.Prepare() → metadata layer finds the target chainID in BoltDB
→ returns AlreadyExists without touching the nydus backend.
2. sn.Stat() → metadata layer finds the BoltDB record, then calls
s.Snapshotter.Stat(bkey) on the nydus gRPC backend → NotFound
(backend was wiped).
3. The unpacker treats NotFound as a transient key-collision race and
retries 3 times; all 3 attempts hit the same dead end, and the
pull is aborted.
The commit message of 62ad0814c ("nydus: Always start from a clean
state") assumed "containerd will re-pull/re-unpack when it finds non-
existent snapshots", but that is not what happens: the metadata layer
intercepts the Prepare call in BoltDB before the backend is ever
consulted.
Fix: call cleanup_containerd_nydus_snapshots() before stopping the
nydus service (and thus before wiping its data directory) in both
install_nydus_snapshotter and uninstall_nydus_snapshotter.
The cleanup must run while the service is still up because `ctr
snapshots rm` goes through the metadata layer, which calls the nydus
gRPC backend to physically remove the snapshot; if the service is
already stopped, the backend call fails and the BoltDB record remains.
The cleanup:
- Discovers all containerd namespaces via `ctr namespaces ls -q`
(falls back to k8s.io if that fails).
- Removes containers whose Snapshotter field matches the nydus plugin
name; these become dangling references once snapshots are gone and
can confuse container reconciliation after an aborted CI run.
- Removes snapshots round by round (leaf-first) until either the list
is empty or no progress can be made (see below).
Note: containerd's GC cannot substitute for this explicit cleanup.
The image record (a GC root) references content blobs which reference
the snapshots via gc.ref labels, keeping the entire chain alive in
the GC graph even after the nydus backend is wiped.
Snapshot removal rounds
-----------------------
Snapshot chains are linear: an image with N layers produces a chain
of N snapshots, each parented on the previous. Only the current leaf
can be removed each round, so N layers require exactly N rounds.
There is no fixed round cap — the loop terminates when either the
list reaches zero (success) or a round removes nothing at all
(all remaining snapshots are actively in use by running workloads).
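In shell terms, one namespace's rounds could look like this (a sketch:
the ctr invocations follow the description above, while the function
name and the snapshotter plugin name are illustrative):

cleanup_ns_snapshots() {
    local ns="$1"
    while true; do
        local snaps removed=0
        # the first line of `ctr snapshots ls` is a header; keep only the keys
        snaps=$(ctr -n "$ns" snapshots --snapshotter nydus ls | awk 'NR > 1 { print $1 }')
        [ -z "$snaps" ] && return 0       # list empty: success
        for key in $snaps; do
            # only leaves succeed; parents keep failing until their children are gone
            ctr -n "$ns" snapshots --snapshotter nydus rm "$key" 2>/dev/null && removed=1
        done
        [ "$removed" -eq 0 ] && return 1  # no progress: remaining snapshots are in use
    done
}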
Active workload safety
----------------------
If active workloads still hold nydus snapshots (e.g. during a live
upgrade), no progress is made in a round and cleanup_nydus_snapshots
returns false. Both install_nydus_snapshotter and
uninstall_nydus_snapshotter gate the fs::remove_dir_all on that
return value:
- true → proceed as before: stop service, wipe data dir.
- false → stop service, skip data dir removal, log a warning.
The new nydus instance starts on the existing backend
state; running containers are left intact.
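Expressed as a shell sketch (the actual gating sits around the
fs::remove_dir_all call in the Rust code; the service name is an
assumption, the rest follows the commit message):

if cleanup_nydus_snapshots; then
    systemctl stop nydus-snapshotter.service
    rm -rf /var/lib/nydus-snapshotter   # safe: the backend is fully drained
else
    systemctl stop nydus-snapshotter.service
    echo "WARN: active workloads still hold nydus snapshots;" \
         "leaving /var/lib/nydus-snapshotter in place"
fi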
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Use the container data storage feature for the k8s-nvidia-nim.bats
test pod manifests. This reduces the pods' memory requirements.
For this, enable the block-encrypted emptydir_mode for the NVIDIA
GPU TEE handlers.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The micro_http crate was just pointing at the main branch and hadn't
been updated for around 3 years, so pin it to the latest commit for
stability and update to remediate RUSTSEC-2024-0002.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We need to explicitly pass `-O index.html`, as busybox's wget behaves
differently from GNU wget.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
In case a wget call fails for one reason or another, it'll leave behind
an 'index.html' file. Let's make sure we allow overwriting that file so
the retry loop doesn't fail for no reason.
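For illustration, the retry loop we end up wanting is roughly this (the
URL variable and the retry count are placeholders):

for _ in $(seq 1 5); do
    # -O index.html: fixes busybox wget's output naming and overwrites
    # any leftover file from a previous failed attempt
    wget -O index.html "http://${server_url}" && break
    sleep 1
done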
Fixes: #12670
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>