kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-10 22:12:35 +00:00

Author	SHA1	Message	Date
Alex Lyn	fb2af538ee	docs: Add how-to guide for using fsmerged EROFS rootfs with Kata Document the end-to-end workflow for using the containerd EROFS snapshotter with Kata Containers runtime-rs, covering containerd configuration, Kata QEMU settings, and pod deployment examples via crictl/ctr/Kubernetes. Include prerequisites (containerd >= 2.2, runtime-rs main branch), QEMU VMDK format verification command, architecture diagram, VMDK descriptor format reference, and troubleshooting guide. Note that Cloud Hypervisor, Firecracker, and Dragonball do not support VMDK block devices and are currently unsupported for fsmerged EROFS rootfs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-10 10:54:20 +08:00
Hyounggyu Choi	f15f7f49f1	Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt runtime: qemu: Enable static sandbox resource management on ARM & s390x	2026-04-09 09:18:11 +02:00
Amanda Liem	79f844d057	runtime: SNP img-based rootfs with dm-verity Follow-on to kata-containers/kata-containers#12396 Switch SNP config from initrd-based to image-based rootfs with dm-verity. The runtime assembles the dm-mod.create kernel cmdline from kernel_verity_params, and with kernel-hashes=on the root hash is included in the SNP launch measurement. Also add qemu-snp to the measured rootfs integration test. Signed-off-by: Amanda Liem <aliem@amd.com>	2026-04-08 16:46:32 +00:00
Fabiano Fidêncio	e93bfbe01a	tests: Remove qemu-coco-dev* skip from sandbox vCPU allocation test With static_sandbox_resource_mgmt calculation fixed for runtime-rs, the VM is correctly pre-sized at creation time. The vCPU allocation test no longer depends on CPU hotplug, so the qemu-coco-dev* skip is no longer needed. Fixes: #10928 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	6bc2452664	tests: Remove aarch64 skip from sandbox vCPU allocation test With static_sandbox_resource_mgmt now enabled for ARM on runtime-rs, the VM is correctly pre-sized at creation time. The vCPU allocation test no longer depends on CPU hotplug, so the aarch64 skip (issue #10928) is no longer needed. Fixes: #10928 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	b3ae6ef99c	Merge pull request #12760 from fitzthum/bump-nvat Bump trustee and guest-components to add nvswitch / ppcie support	2026-04-07 19:07:50 +02:00
Aurélien Bombo	79fab93041	Merge pull request #12779 from rophy/fix/strip-cr-from-tty-exec tests: strip \r from kubectl exec output for TTY containers	2026-04-07 10:19:21 -05:00
Tobin Feldman-Fitzthum	e40abcf72d	nvidia: add nvrc.smi.srs=1 to default nvidia kernel params The attestation-agent no longer sets nvidia devices to ready automatically. Instead, we should use nvrc for this. Since this is required for all nvidia workloads, add it to the default nv kernel params. With bounce buffers, the timing of attesting a device versus setting it to ready is not so important. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-07 14:28:50 +00:00
Tobin Feldman-Fitzthum	7385938c57	tests: fix default KBS Policy path We recently moved the default policy in the Trustee repo. Now it's in the same place as all the other policies. Update the test code to match. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-07 05:46:27 +00:00
Rophy Tsai	f7d9024249	tests: strip \r from kubectl exec output for TTY containers The busybox-pod.yaml test fixture sets tty: true on the second container. When a container has a TTY, kubectl exec may return \r\n line endings. The invisible \r causes string comparisons to fail: container_name=$(kubectl exec ... -- env \| grep CONTAINER_NAME) [ "$container_name" == "CONTAINER_NAME=second-test-container" ] This comparison fails because $container_name contains a trailing \r character. Fix by piping through tr -d '\r' after grep. This is harmless when \r is absent and fixes the mismatch when present. Fixes: #9136 Signed-off-by: Rophy Tsai <rophy@users.noreply.github.com>	2026-04-07 01:35:10 +00:00
Dan Mihai	9b770793ba	Merge pull request #12728 from manuelh-dev/mahuber/empty-dir-fsgrou-policy genpolicy: adjust GID after passwd GID handling and set fs_group for encrypted emptyDir volumes	2026-04-06 10:22:34 -07:00
Fabiano Fidêncio	1300145f7a	tests: add k3s/rke2 to OCI 1.3.0 drop-in overlay condition k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0 drop-in overlay. Move them from the separate OCI 1.2.1 branch into the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx, and custom container engine versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-06 18:50:20 +02:00
llink5	f7878cc385	runtime: fix Docker 26+ networking by rescanning after Start Docker 26+ configures container networking (veth pair, IP addresses, routes) after task creation rather than before. Kata's endpoint scan runs during CreateSandbox, before the interfaces exist, resulting in VMs starting without network connectivity (no -netdev passed to QEMU). Add RescanNetwork() which runs asynchronously after the Start RPC. It polls the network namespace until Docker's interfaces appear, then hotplugs them to QEMU and informs the guest agent to configure them inside the VM. Additional fixes: - mountinfo parser: find fs type dynamically instead of hardcoded field index, fixing parsing with optional mount tags (shared:, master:) - IsDockerContainer: check CreateRuntime hooks for Docker 26+ - DockerNetnsPath: extract netns path from libnetwork-setkey hook args with path traversal protection - detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline to guard against PID recycling - startVM guard: rescan when len(endpoints)==0 after VM start Fixes: #9340 Signed-off-by: llink5 <llink5@users.noreply.github.com>	2026-04-02 21:23:16 +02:00
Manuel Huber	dd868dee6d	tests: nvidia: onboard NIM service test Onboard a test case for deploying a NIM service using the NIM operator. We install the operator helm chart on the fly as this is a fast operation, spinning up a single operand. Once a NIM service is scheduled, the operator creates a deployment with a single pod. For now, the TEE-based flow uses an allow-all policy. In future work, we strive to support generating pod security policies for the scenario where NIM services are deployed and the pod manifest is being generated on the fly. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-02 16:58:54 +02:00
Manuel Huber	57e42b10f1	tests: nvidia: Do not use elevated privileges Do not run the NIM containers with elevated privileges. Note that, using hostPath requires proper host folder permissions, and that using emptyDir requires a proper fsGroup ID. Once issue 11162 is resolved, we can further refine the securityContext fields for the TEE manifests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:23:26 -07:00
Manuel Huber	a762b136de	tests: generate policy for pod-empty-dir-fsgroup The logic in the k8s-empty-dirs.bats file missed to add a security policy for the pod-empty-dir-fsgroup.yaml manifest. With this change, we add the policy annotation. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:23:26 -07:00
Fabiano Fidêncio	2131147360	tests: add kata-deploy lifecycle tests for restart resilience and cleanup Add functional tests that cover two previously untested kata-deploy behaviors: 1. Restart resilience (regression test for #12761): deploys a long-running kata pod, triggers a kata-deploy DaemonSet restart via rollout restart, and verifies the kata pod survives with the same UID and zero additional container restarts. 2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses are removed, the kata-runtime node label is cleared, /opt/kata is gone from the host filesystem, and containerd remains healthy. 3. Artifact presence: after install, verifies /opt/kata and the shim binary exist on the host, RuntimeClasses are created, and the node is labeled. Host filesystem checks use a short-lived privileged pod with a hostPath mount to inspect the node directly. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 15:20:53 +02:00
Fabiano Fidêncio	8b9ce3b6cb	tests: remove k3s/rke2 V3 containerd template workaround Remove the workaround that wrote a synthetic containerd V3 config template for k3s/rke2 in CI. This was added to test kata-deploy's drop-in support before the upstream k3s/rke2 patch shipped. Now that k3s and rke2 include the drop-in imports in their default template, the workaround is no longer needed and breaks newer versions. Removed: - tests/containerd-config-v3.tmpl (synthetic Go template) - _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers - Calls from deploy_k3s() and deploy_rke2() This reverts the test infrastructure part of `a2216ec05`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 14:24:55 +02:00
Manuel Huber	177f5c308e	tests: gpu: use container image layer storage Use the container image layer storage feature for the k8s-nvidia-nim.bats test pod manifests. This reduces the pods' memory requirements. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:22:26 +02:00
Manuel Huber	b6cf00a374	tests: parametrize storage parameters - trusted-storage.yaml.in: use $PV_STORAGE_CAPACITY and $PVC_STORAGE_REQUEST so that PV/PVC size can vary per test. - confidential_common.sh: add optional size (MB) argument to create_loop_device. - k8s-guest-pull-image.bats: pass PV_STORAGE_CAPACITY and PVC_STORAGE_REQUEST when generating storage config. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:22:26 +02:00
Hyounggyu Choi	11cd5f2808	tests: Configure devmapper properly regardless of containerd version The follow differences are observed between container 1.x and 2.x: ``` [plugins.'io.containerd.snapshotter.v1.devmapper'] snapshotter = 'overlayfs' ``` and ``` [plugins."io.containerd.snapshotter.v1.devmapper"] snapshotter = "overlayfs" ``` The current devmapper configuration only works with double quotes. Make it work with both single and double quotes via tomlq. In the default configuration for containerd 2.x, the following configuration block is missing: ``` [[plugins.'io.containerd.transfer.v1.local'.unpack_config]] platform = "linux/s390x" # system architecture snapshotter = "devmapper" ``` Ensure the configuration block is added for containerd 2.x. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-04-01 07:14:52 +02:00
Alex Lyn	119a145923	docs: Upgrade architecture documentation from 3.0 to 4.0 Replace Kata 3.0 architecture docs with Kata 4.0 (Rust Runtime) documentation. Key changes: - Remove deprecated architecture 3.0 documentation - Add comprehensive Kata 4.0 architecture guide covering: - Unified single-binary architecture - Built-in Dragonball VMM integration - Async I/O model with Tokio - Layered architecture design - Modular resource manager - Extensible framework for multiple container types The new documentation reflects the production-ready Rust runtime with improved performance and reduced resource consumption. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-29 19:17:03 +02:00
Alex Lyn	004333ed71	docs: Update containerd-kata.md with clear settings In this commit: (1) Update containerd config with kata configurations (2) Add more comments to guide how to use containerd/kata with default setting and customized configure setting; (3) Update the usage of containerd cmd tool ctr with explicitly specified runtime-config-path options to make it work. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-29 19:17:03 +02:00
Alex Lyn	a923bb2917	docs: Add document for how-to-use passthroughfd-IO within runtime-rs This document describes the Passthrough-FD (pass-fd) technology implemented in Kata Containers to optimize IO performance. By bypassing the intermediate proxy layers, this technology significantly reduces latency and CPU overhead for container IO streams. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-03-29 19:17:03 +02:00
Hyounggyu Choi	8cebcf0113	Merge pull request #12742 from BbolroC/remove-skipped-emptydir-tests-for-ibm-sel tests: Remove skip condition for emptyDir-related tests on IBM SEL	2026-03-27 14:35:48 +01:00
Fabiano Fidêncio	f0ad9f1709	tests: snp: policy: Adjust to containerd 2.3.0 As the AMD maintainers switched to the 2.3.0-beta.0 containerd (due to the nydus fixes that landed there). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-27 11:14:54 +01:00
Fabiano Fidêncio	1b8189731a	tests: hand nydus snapshotter setup over to kata-deploy Now that kata-deploy deploys and manages nydus-for-kata-tee on all platforms, the separate standalone nydus-snapshotter DaemonSet deployment is no longer needed. - Short-circuit deploy_nydus_snapshotter and cleanup_nydus_snapshotter to no-ops with an explanatory message. - Add qemu-snp to the workaround case so AMD SEV-SNP baremetal runners also get USE_EXPERIMENTAL_SETUP_SNAPSHOTTER=true and kata-deploy picks up the snapshotter setup on every run. - Drop the x86_64 arch guard and the hypervisor sub-case from the EXPERIMENTAL_SETUP_SNAPSHOTTER block, allowing any architecture and hypervisor to use the kata-deploy-managed path when the flag is set. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-27 11:14:54 +01:00
Hyounggyu Choi	de3afd3076	tests: Remove skip condition for s390x in trusted ephemeral storage test Remove the skip condition for s390x in k8s-trusted-ephemeral-data-storage.bats. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-03-26 18:58:13 +01:00
Hyounggyu Choi	911aee5ad7	tests: Remove skip condition for emptyDir-related tests on IBM SEL Fixes: #10002 Since #11537 resolves the issue, remove the skip conditions for the k8s e2e tests involving emptyDir volume mounts. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-03-26 15:39:33 +01:00
Fabiano Fidêncio	814ae53d77	tests: Use the helm chart to setup nydus for TDX Now that containerd 2.3.0-beta.0 has been released, it brings fixes for multi-snapshotters that allows us to test the baremetal machines in the same way we test the non-baremetal ones. Let's start doing the switch for TDX as timezone is friendlier with Mikko. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-24 19:13:59 +01:00
Manuel Huber	79efe3e041	tests: gpu: use container data storage feature Use the container data storage feature for the k8s-nvidia-nim.bats test pod manifests. This reduces the pods' memory requirements. For this, enable the block-encrypted emptydir_mode for the NVIDIA GPU TEE handlers. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-03-23 11:43:11 -07:00
Steve Horsman	2728b493d5	Merge pull request #12681 from manuelh-dev/mahuber/ci-pip-py-venv tests: cc: setup function for python venv	2026-03-23 14:33:30 +00:00
Fabiano Fidêncio	fe817bb47b	Merge pull request #12705 from fidencio/topic/tests-nginx-connectibity-2nd-try tests: nginx-connectivity: Use `-O index.html` to override the downloaded file	2026-03-23 13:08:51 +01:00
Fabiano Fidêncio	514a2b1a7c	Merge pull request #12264 from fidencio/topic/nvidia-gpu-cc-use-nydus-snapshotter nvidia: cc: Use nydus-snapshotter	2026-03-23 12:50:15 +01:00
Fabiano Fidêncio	83f37f4beb	tests: nginx-connectivity: Override index.html (2nd try) We need to explicitly pass `-O index.html` as the busybox' wget has a different behaviour than GNU's wget. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-23 11:11:44 +01:00
Fabiano Fidêncio	e44dfccf7a	Revert "tests: nginx-connectivity: Allow overriding the downloded file" This reverts commit `4403289123`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-23 11:06:23 +01:00
Hyounggyu Choi	1035504492	Merge pull request #12701 from fidencio/topic/tests-arm-nginx-connectivity tests: nginx-connectivity: Allow overriding the downloded file	2026-03-23 10:37:25 +01:00
Fabiano Fidêncio	642b5661ff	Merge pull request #12651 from manuelh-dev/mahuber/doc-update-nvidia-gpu-op docs: Update NVIDIA GPU passthrough QEMU scenario	2026-03-23 09:01:02 +01:00
Fabiano Fidêncio	4403289123	tests: nginx-connectivity: Allow overriding the downloded file In case a wget fails for one reason or another, it'll leave behind an 'index.html' file. Let's make sure we allow overriding that file so the retry loop doesn't fail for no reason. Fixes: #12670 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-23 04:08:24 +01:00
Alex Lyn	d2c2ec6e23	Merge pull request #12633 from LandonTClipp/docs_materialx docs: Move to mkdocs-material, port Helm to docs site	2026-03-23 09:29:25 +08:00
Fabiano Fidêncio	740d380b8e	tests: nvidia: cc: Use nydus-snapshotter So we can test what we just changed in the config files. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-22 10:10:34 +01:00
Agam Dua	f6319da73d	tests: Add eBPF and dwarves to spell check dictionary Add missing terms to the spell check dictionary to fix CI failures for kernel debug documentation: - eBPF - dwarves: Linux package with DWARF/BTF tools (pahole) required for CONFIG_DEBUG_INFO_BTF kernel option Also fix the casing of "ebpf" to "eBPF" in the kernel README to match the official naming convention. Signed-off-by: Agam Dua <agam_dua@apple.com>	2026-03-20 15:04:08 -07:00
LandonTClipp	5333e45313	docs: Fix static-checks.sh when running locally This fixes the test_dir variable in static-checks.sh so that when a --repo-path is provided, the test_dir variable uses that for the location instead of the GOPATH location. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-03-20 14:51:45 -05:00
Manuel Huber	476f550977	docs: Update NVIDIA GPU passthrough QEMU scenario With the upcoming GPU operator 26.3 relase and recent changes to kata-containers, we adapt this documentation with notes on multi GPU passthrough, support for TDX, changed deployment instructions, and with various other minor improvements. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-03-20 10:53:14 -07:00
stevenhorsman	e62df07b6a	static-checks: Delete kata-spell-check The old hunspell based spell-check was causing contributors challenges and proving a barrier to doc updates. We've replaced it with a cspell based-solution, so clean up the old approach. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:22:54 +00:00
stevenhorsman	d06dadd8ef	docs: Spelling updates Either fixing typos, or including program/repo name in backticks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:22:54 +00:00
stevenhorsman	829a32ee67	spellcheck: Add cspell files Add cspell config and initial dictionary Assisted-by: Bob (dictionary ordering and catergorisation) Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-19 10:22:54 +00:00
Manuel Huber	5765bc97b4	tests: cc: setup function for python venv We recently had a failure on a new CI runner where ${HOME}/.cicd/venv/bin/activate was not present. The relevant call originated from ensure_sev_snp_measure. Thus, add a function ensure_cicd_python_venv before callers to pip install. Currently, the NVIDIA NIM test and the confidential attestation tests use pip to install dependencies. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-03-18 17:07:47 -07:00
Aurélien Bombo	352b4cdad2	Merge pull request #12660 from LandonTClipp/ci_docs ci: Don't run CI builds on doc PRs	2026-03-17 12:19:11 -05:00
Aurélien Bombo	f8e234c6f9	Merge pull request #12650 from kata-containers/sprt/remove-csi ci: Stop building/deploying CSI driver	2026-03-16 16:53:02 -05:00

1972 Commits