kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-05-14 11:03:31 +00:00

Author	SHA1	Message	Date
Konstantin Khlebnikov	5d99a141d9	runtime: add hypervisor options for NUMA topology With enable_numa=true the hypervisor will expose the host NUMA topology as is: map VM NUMA nodes to host nodes 1:1 and bind vCPUs to the related CPUs. Option "numa_mapping" allows redefining the NUMA node mapping: - map each VM node to a particular host node or to several NUMA nodes - emulate NUMA on a host without NUMA (useful for tests) Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 20:09:25 +01:00
Fabiano Fidêncio	ab515712d4	kernel: Unify kernel and kernel-confidential Build a single kernel for both kernel and kernel-confidential on x86_64 and s390x. The kernel is built with TEE support (-x) on those arches only. This helps to simplify and to maintain the code, and having a single kernel was the original plan since forever. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Fabiano Fidêncio	c5b5433866	kernel: Unify nvidia-gpu and nvidia-gpu-confidential Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential, simplifying and reducing code maintenance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Steve Horsman	f02fa79758	Merge pull request #12470 from jirimoravcik/docs/add-os-version docs: add `OS_VERSION` to rootfs script	2026-02-09 15:06:14 +00:00
Alex Lyn	3fda59e27d	tests: rename pod_exec_with_retries to pod_exec and update callers This commit does the following: (1) Rename pod_exec_with_retries() to pod_exec(). (2) Update implementation to call container_exec(). (3) Replace all usages of pod_exec_with_retries across tests with pod_exec. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	861d39305c	tests: drop kubectl exec retries in container_exec This commit aims to drop retries when kubectl exec a container: (1) Rename container_exec_with_retries() to container_exec(). (2) Remove the retry loop and sleep backoff around kubectl exec. Keep the same logging and container-selection logic and return kubectl exec exit status directly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	41e8acbc5e	runtime: Map empty ReadStdout/ReadStderr response to io.EOF After the kata-agent "drain-after-exit" change, stdout/stderr EOF is signaled by a successful ReadStdout/ReadStderr reply with empty Data (len==0), instead of an RPC error. However, runtime-go currently returns (0, nil) to io.CopyBuffer() when resp.Data is empty, which violates Go io.Reader semantics and can cause `kubectl exec` to hang after the command output is already printed. To avoid exec hang: In readProcessStream(), map an empty response (len(resp.Data)==0) into (0, io.EOF). This allows the stdout/stderr copy goroutines to terminate, closes exitIOch, and unblocks the wait path so exec can complete normally. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	ffb8a6a9c3	agent: fix misleading tokio::select! biased comment in do_read_stream The previous comment incorrectly implied that `biased` prevents data loss and that the exit notifier would never be polled before all buffered data is read. Details are in the documentation: https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67 Tokio's `biased` only makes polling order deterministic (top-to-bottom) when multiple branches are ready in the same poll, and it makes fairness the caller's responsibility. Output can still be truncated if the exit notification becomes ready while `read_stream` is pending. This change updates the comment to reflect the actual semantics and caveats. No functional behavior change. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	1080f6d87e	agent: Introduce drain after exit mechanism to address truncation race Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode occasionally lose the last segments of their output. The root cause is a race condition where the `term_exit_notifier` triggers before the pipe buffers are fully drained. In the previous implementation, once the exit notification was received, the agent immediately returned an EOF, causing the runtime's `run_io_copy` to terminate and drop any residual data in the pipe. This patch introduces a "drain after exit" mechanism: - Upon receiving an exit notification, the agent enters a 500ms window for polling `read_stream` to flush remaining data from the buffer. - A true EOF is only returned if the stream is confirmed empty or the timeout is reached. This ensures reliable output delivery for transient exec tasks under high concurrency. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	700bddeecc	agent: treat EOF as normal for read_stdout/stderr stream Legacy IO uses shim polling via read_stdout/read_stderr. The agent previously mapped pipe EOF (read() == 0) and term_exit_notifier to errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures. This caused runtime IO copy to abort early, leading to lost stdout/stderr for short-lived exec (e.g. "echo") and spurious failures. Normalize EOF semantics: read_stream now returns Ok(empty) on EOF instead of Err("read meet eof"). This makes legacy IO behave like a proper stream: data until EOF, no INTERNAL errors for normal termination. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
stevenhorsman	b909c41128	runtime: Bump x/net to v0.49.0 Bump x/net to resolve CVEs: - GO-2026-4441 - GO-2026-4440 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
stevenhorsman	b29312289f	versions: Bump go to 1.24.13 Bump go to 1.24.13 to fix CVE GO-2026-4337 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
Zvonko Kaiser	7af306de13	agent: Update aarch64 create_pci_root_bus_path aarch64 is also a supported architecture for NUMA. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 10:19:41 +01:00
Zvonko Kaiser	8185c015ad	gpu: Add Agent NUMA Support 1 of N We're introducing a root_complex to assign each and every device to a NUMA node or to the default root_complex="00" aka pcie.0. This patch introduces the proper handling of the current qom path being bus/device == "00/02"; with NUMA we need to extend it to root_complex/bus/device == "10/00/02". We're defaulting to root_complex="00". Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 10:19:41 +01:00
Alex Lyn	16a7ed6e14	Merge pull request #12464 from mythi/runtime-rs-tdvf runtime-rs: use FIRMWARETDVFPATH like Go runtime	2026-02-09 09:12:52 +08:00
Mikko Ylinen	4088881662	runtime-rs: use FIRMWARETDVFPATH like Go runtime Use OVMF path configuration for Intel TDX consistently: $ git grep FIRMWARETD src/runtime-rs/Makefile:FIRMWARETDXPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd src/runtime-rs/Makefile:USER_VARS += FIRMWARETDXPATH src/runtime-rs/config/configuration-qemu-tdx-runtime-rs.toml.in:firmware = "@FIRMWARETDXPATH@" src/runtime/Makefile:FIRMWARETDVFPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd The Go runtime has used TDVF, so make runtime-rs follow. This keeps the behavior consistent when downstreams switch from Go runtime to runtime-rs. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-08 21:38:06 +01:00
Jiri Moravcik	d5840149d2	docs: add `OS_VERSION` to rootfs script The OS_VERSION is required when trying to build RootFS with ubuntu distro. Fixes #12469 Signed-off-by: Jiri Moravcik <jiri.moravcik@gmail.com>	2026-02-08 21:21:59 +01:00
Manuel Huber	d9d1073cf1	gpu: Install packages for devkit Introduce a new function to install additional packages into the devkit flavor. With modprobe, we avoid errors on pod startup related to loading nvidia kernel modules in the NVRC phase. Note, the production flavor gets modprobe from busybox, see its configuration file containing CONFIG_MODPROBE=y. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-06 09:58:32 +01:00
Manuel Huber	a786582d0b	rootfs: deprecate initramfs dm-verity mode Remove the initramfs folder, its build steps, and use the kernel based dm-verity enforcement for the handlers which used the initramfs mode. Also, remove the initramfs verity mode capability from the shims and their configs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	cf7f340b39	tests: Read and overwrite kernel_verity_parameters Read the kernel_verity_parameters from the shim config and adjust the root hash for the negative test. Further, improve some of the test logic by using shared functions. This especially ensures we don't read the full journalctl logs on a node but only the portion of the logs we are actually supposed to look at. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	7958be8634	runtime: Make kernel_verity_params overwritable Similar to the kernel_params annotation, add a kernel_verity_params annotation and add logic to make these parameters overwritable. For instance, this can be used in test logic to provide bogus dm-verity hashes for negative tests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	7700095ea8	runtime-rs: Make kernel_verity_params overwritable Similar to the kernel_params annotation, add a kernel_verity_params annotation and add logic to make these parameters overwritable. For instance, this can be used in test logic to provide bogus dm-verity hashes for negative tests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	472b50fa42	runtime-rs: Enable kernelinit dm-verity variant This change introduces the kernel_verity_parameters knob to the rust based shim, picking up dm-verity information in a new config field (the corresponding build variable is already produced by the shim build). The change extends the shim to parse dm-verity information from this parameter and to construct the kernel command line appropriately, based on the indicated initramfs or kernelinit build variant. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f639c3fa17	runtime: Enable kernelinit dm-verity variant This change introduces the kernel_verity_parameters knob to the Go based shim, picking up dm-verity information in a new config field (the corresponding build variable is already produced by the shim build). The change extends the shim to parse dm-verity information from this parameter and to construct the kernel command line appropriately, based on the indicated initramfs or kernelinit build variant. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	e120dd4cc6	tests: cc: Remove quotes from kernel command line With dm-mod.create parameters using quotes, we remove the backslashes used to escape these quotes from the output we retrieve. This will enable attestation tests to work with the kernelinit dm-verity mode. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	976df22119	rootfs: Change condition for cryptsetup-bin Measured rootfs mode and CDH secure storage feature require the cryptsetup-bin and e2fsprogs components in the guest. This change makes this more explicit - confidential guests are users of the CDH secure container image layer storage feature. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	a3c4e0b64f	rootfs: Introduce kernelinit dm-verity mode This change introduces the kernelinit dm-verity mode, allowing initramfs-less dm-verity enforcement against the rootfs image. For this, the change introduces a new variable with dm-verity information. This variable will be picked up by shim configurations in subsequent commits. This will allow the shims to build the kernel command line with dm-verity information based on the existing kernel_parameters configuration knob and a new kernel_verity_params configuration knob. The latter specifically provides the relevant dm-verity information. This new configuration knob avoids merging the verity parameters into the kernel_params field. Avoiding this, no cumbersome escape logic is required as we do not need to pass the dm-mod.create="..." parameter directly in the kernel_parameters, but only relevant dm-verity parameters in semi-structured manner (see above). The only place where the final command line is assembled is in the shims. Further, this is a line easy to comment out for developers to disable dm-verity enforcement (or for CI tasks). This change produces the new kernelinit dm-verity parameters for the NVIDIA runtime handlers, and modifies the format of how these parameters are prepared for all handlers. With this, the parameters are currently no longer provided to the kernel_params configuration knob for any runtime handler. This change alone should thus not be used as dm-verity information will no longer be picked up by the shims. systemd-analyze on the coco-dev handler shows that using the kernelinit mode on a local machine, less time is spent in the kernel phase, slightly speeding up pod start-up. On that machine, the average of 172.5ms was reduced to 141ms (4 measurements, each with a basic pod manifest), i.e., the kernel phase duration is improved by about 18 percent. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	83a0bd1360	gpu: use dm-verity for the non-TEE GPU handler Use a dm-verity protected rootfs image for the non-TEE NVIDIA GPU handler as well. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	02ed4c99bc	rootfs: Use maxdepth=1 to search for kata tarballs These tarballs are in the top layer of the build directory, no need to traverse all sub-directories. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	d37db5f068	rootfs: Restore "gpu: Handle root_hash.txt ..." This reverts commit `923f97bc66` in order to re-instantiate the logic from commit `e4a13b9a4a`. The latter commit was previously reverted due to the NVIDIA GPU TEE handler using an initrd, not an image. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f1ca547d66	initramfs: introduce log function Log to /dev/kmsg, this way logs will show up and not get lost. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	6d0bb49716	runtime: nvidia: Use img and sanitize whitespaces Shift NVIDIA shim configurations to use an image instead of an initrd, and remove trailing whitespaces from the configs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	282014000f	tests: cc: support initrd, image for attestation Allow using an image instead of an initrd. For confidential guests using images, the assumption is that the guest kernel uses dm-verity protection, implicitly measuring the rootfs image via the kernel command line's dm-verity information. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Greg Kurz	e430b2641c	Merge pull request #12435 from bpradipt/crio-annotation shim: Add CRI-O annotation support for device cold plug	2026-02-05 09:29:19 +01:00
Alex Lyn	e257430976	Merge pull request #12433 from manuelh-dev/mahuber/cfg-sanitize-whitespaces runtimes: Sanitize trailing whitespaces	2026-02-05 09:31:21 +08:00
Fabiano Fidêncio	dda1b30c34	tests: nvidia-nim: Use sealed secrets for NGC_API_KEY Convert the NGC_API_KEY from a regular Kubernetes secret to a sealed secret for the CC GPU tests. This ensures the API key is only accessible within the confidential enclave after successful attestation. The sealed secret uses the "vault" type which points to a resource stored in the Key Broker Service (KBS). The Confidential Data Hub (CDH) inside the guest will unseal this secret by fetching it from KBS after attestation. The initdata file is created AFTER create_tmp_policy_settings_dir() copies the empty default file, and BEFORE auto_generate_policy() runs. This allows genpolicy to add the generated policy.rego to our custom CDH configuration. The sealed secret format follows the CoCo specification: sealed.<JWS header>.<JWS payload>.<signature> Where the payload contains: - version: "0.1.0" - type: "vault" (pointer to KBS resource) - provider: "kbs" - resource_uri: KBS path to the actual secret Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:34:44 +01:00
Fabiano Fidêncio	c9061f9e36	tests: kata-deploy: Increase post-deployment wait time Increase the sleep time after kata-deploy deployment from 10s to 60s to give more time for runtimes to be configured. This helps avoid race conditions on slower K8s distributions like k3s where the RuntimeClass may not be immediately available after the DaemonSet rollout completes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	0fb2c500fd	tests: kata-deploy: Merge E2E tests to avoid timing issues Merge the two E2E tests ("Custom RuntimeClass exists with correct properties" and "Custom runtime can run a pod") into a single test, as the two are very much dependent on each other. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	fef93f1e08	tests: kata-deploy: Use die() instead of fail() for error handling Replace fail() calls with die() which is already provided by common.bash. The fail() function doesn't exist in the test infrastructure, causing "command not found" errors when tests fail. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	f90c12d4df	kata-deploy: Avoid text file busy error with nydus-snapshotter We cannot overwrite a binary that's currently in use, which is why elsewhere we remove / unlink the binary (the running process keeps its file descriptor, so we're good doing that) and only then copy the binary. However, we missed doing this for the nydus-snapshotter deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 10:24:49 +01:00
Manuel Huber	30c7325e75	runtimes: Sanitize trailing whitespaces Clean up trailing whitespaces, making life easier for those who have configured their IDE to clean these up. Suggest to not add new code with trailing whitespaces etc. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-03 11:46:30 -08:00
Steve Horsman	30494abe48	Merge pull request #12426 from kata-containers/dependabot/github_actions/zizmorcore/zizmor-action-0.4.1 build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1	2026-02-03 14:38:54 +00:00
Pradipta Banerjee	8a449d358f	shim: Add CRI-O annotation support for device cold plug Add support for CRI-O annotations when fetching pod identifiers for device cold plug. The code now checks containerd CRI annotations first, then falls back to CRI-O annotations if they are empty. This enables device cold plug to work with both containerd and CRI-O container runtimes. Annotations supported: - containerd: io.kubernetes.cri.sandbox-name, io.kubernetes.cri.sandbox-namespace - CRI-O: io.kubernetes.cri-o.KubeName, io.kubernetes.cri-o.Namespace Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2026-02-03 04:51:15 +00:00
Steve Horsman	6bb77a2f13	Merge pull request #12390 from mythi/tdx-updates-2026-2 runtime: tdx QEMU configuration changes	2026-02-02 16:58:44 +00:00
Zvonko Kaiser	6702b48858	Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state kata-deploy: nydus: Always start from a clean state	2026-02-02 11:21:26 -05:00
Steve Horsman	0530a3494f	Merge pull request #12415 from nlle/make-helm-updatestrategy-configurable kata-deploy: Make update strategy configurable for kata-deploy DaemonSet	2026-02-02 10:29:01 +00:00
Steve Horsman	93dcaee965	Merge pull request #12423 from manuelh-dev/mahuber/pause-build-fix packaging: Delete pause_bundle dir before unpack	2026-02-02 10:26:30 +00:00
Fabiano Fidêncio	62ad0814c5	kata-deploy: nydus: Always start from a clean state Clean up existing nydus-snapshotter state to ensure fresh start with new version. This is safe across all K8s distributions (k3s, rke2, k0s, microk8s, etc.) because we only touch the nydus data directory, not containerd's internals. When containerd tries to use non-existent snapshots, it will re-pull/re-unpack. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-02 11:06:37 +01:00
Mikko Ylinen	870630c421	kata-deploy: drop custom TDX installation steps As we have moved to use QEMU (and OVMF already earlier) from kata-deploy, the custom tdx configurations and distro checks are no longer needed. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-02 11:11:26 +02:00
Mikko Ylinen	927be7b8ad	runtime: tdx: move to use QEMU from kata-deploy Currently, a working TDX setup expects users to install special TDX support builds from Canonical/CentOS virt-sig for TDX to work. kata-deploy configured TDX runtime handler to use QEMU from the distro's paths. With TDX support now being available in upstream Linux and Ubuntu 24.04 having an install candidate (linux-image-generic-6.17) for a new enough kernel, move TDX configuration to use QEMU from kata-deploy. While this is the new default, going back to the original setup is possible by making manual changes to TDX runtime handlers. Note: runtime-rs is already using QEMUPATH for TDX. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-02 11:10:52 +02:00
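
Commit 41e8acbc5e above describes mapping an empty ReadStdout/ReadStderr reply to io.EOF so that the copy goroutines can terminate. The sketch below illustrates that io.Reader contract only; it is not the runtime's readProcessStream() code. The names readFn, streamReader, fakeAgent and copyAll are hypothetical and exist purely for this example, with fakeAgent standing in for the agent RPC.

```go
package main

import (
	"fmt"
	"io"
)

// readFn is a hypothetical stand-in for the agent's ReadStdout/ReadStderr
// RPC: it returns the next chunk, and an empty chunk once the stream ended.
type readFn func() ([]byte, error)

// streamReader adapts readFn to io.Reader (hypothetical type).
type streamReader struct {
	read    readFn
	pending []byte // bytes received but not yet handed to the caller
}

func (s *streamReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	if len(s.pending) == 0 {
		data, err := s.read()
		if err != nil {
			return 0, err
		}
		if len(data) == 0 {
			// An empty, successful reply means end of stream. Returning
			// (0, nil) here would leave the copy loop spinning or hanging.
			return 0, io.EOF
		}
		s.pending = data
	}
	n := copy(p, s.pending)
	s.pending = s.pending[n:]
	return n, nil
}

// fakeAgent yields the given chunks, then empty replies forever.
func fakeAgent(chunks ...string) readFn {
	i := 0
	return func() ([]byte, error) {
		if i >= len(chunks) {
			return nil, nil
		}
		c := chunks[i]
		i++
		return []byte(c), nil
	}
}

// copyAll drains r until EOF, as a copy goroutine would.
func copyAll(r io.Reader) (string, error) {
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	out, err := copyAll(&streamReader{read: fakeAgent("hello ", "world\n")})
	fmt.Printf("%q %v\n", out, err)
}
```

With the EOF mapping in place, copyAll returns once the fake agent starts sending empty replies; without it, the reader would keep reporting zero bytes and no error, which the Go io.Reader documentation discourages.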


17827 Commits
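
Commit 8a449d358f above describes checking the containerd CRI annotations first and falling back to the CRI-O annotations when they are empty. The following is a minimal sketch of that lookup order, assuming the annotations arrive as a plain map. The annotation keys are taken from the commit message; the function names firstNonEmpty and podIdentifiers are hypothetical and are not the shim's actual API.

```go
package main

import "fmt"

// Annotation keys as listed in the commit message.
const (
	containerdSandboxName      = "io.kubernetes.cri.sandbox-name"
	containerdSandboxNamespace = "io.kubernetes.cri.sandbox-namespace"
	crioKubeName               = "io.kubernetes.cri-o.KubeName"
	crioNamespace              = "io.kubernetes.cri-o.Namespace"
)

// firstNonEmpty returns the value of the first key that is set to a
// non-empty string, or "" if none is.
func firstNonEmpty(annotations map[string]string, keys ...string) string {
	for _, k := range keys {
		if v := annotations[k]; v != "" {
			return v
		}
	}
	return ""
}

// podIdentifiers prefers the containerd keys and falls back to CRI-O.
func podIdentifiers(annotations map[string]string) (name, namespace string) {
	name = firstNonEmpty(annotations, containerdSandboxName, crioKubeName)
	namespace = firstNonEmpty(annotations, containerdSandboxNamespace, crioNamespace)
	return name, namespace
}

func main() {
	crio := map[string]string{
		crioKubeName:  "gpu-pod",
		crioNamespace: "default",
	}
	name, ns := podIdentifiers(crio)
	fmt.Println(name, ns)
}
```

Resolving name and namespace independently keeps the helper small; whether the real code falls back per field or for both fields together is not stated in the commit message.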