kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-12 23:04:33 +00:00

Author	SHA1	Message	Date
Manuel Huber	1b04a936d3	tests: Align CDH mount options Test to see if there is improvements in integrity errors. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-08 10:45:47 -07:00
Manuel Huber	ff6088050e	tests: Remove lazy_journal_init for CDH mount Remove lazy_journal_init for the CDH secure_mount option. Re-enable concurrent image-layer pulls for the NIM test. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-07 16:06:47 -07:00
Manuel Huber	eee6049ee9	tests: Use multiple CPUs for image layer storage Use multiple CPUs for the image layer storage test. The purpose is to ensure guest-pull using the container image layer storage functionality with integrity-protected encryption works (writes to /dev/trusted_store will be multi-threaded). Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-07 10:26:45 -07:00
Dan Mihai	9b770793ba	Merge pull request #12728 from manuelh-dev/mahuber/empty-dir-fsgrou-policy genpolicy: adjust GID after passwd GID handling and set fs_group for encrypted emptyDir volumes	2026-04-06 10:22:34 -07:00
Fabiano Fidêncio	47770daa3b	helm: Align values.yaml with try-kata-nvidia-gpu.values.yaml We've switched to nydus there, but never did for the values.yaml. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-06 18:51:54 +02:00
Fabiano Fidêncio	1300145f7a	tests: add k3s/rke2 to OCI 1.3.0 drop-in overlay condition k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0 drop-in overlay. Move them from the separate OCI 1.2.1 branch into the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx, and custom container engine versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-06 18:50:20 +02:00
Fabiano Fidêncio	0a739b3b55	Merge pull request #12755 from katexochen/runtime-rs-config-cleanup runtime-rs: cleanup config	2026-04-06 13:14:58 +02:00
Fabiano Fidêncio	9a2825a429	runtime: config: Use OVMF for the qemu-nvidia-gpu 2ba0cb0d4a7 did the ground work for using OVMF even for the qemu-nvidia-gpu, but missed actually setting the OVMF path to be used, which we'e fixing now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-06 03:54:56 +02:00
Fabiano Fidêncio	e1fae11509	Merge pull request #12392 from Apokleos/enhance-tdx runtime-rs: Enhance TDX in qemu	2026-04-05 20:54:43 +02:00
Alex Lyn	35cafe8715	runtime-rs: configure TDX machine options with kernel_irqchip=split When TDX confidential guest support is enabled, set `kernel_irqchip=split` for TDX CVM: ... -machine \ q35,accel=kvm,kernel_irqchip=split,confidential-guest-support=tdx \ ... Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-05 10:18:47 +02:00
Fabiano Fidêncio	f074ceec6d	Merge pull request #12682 from PiotrProkop/fix-direct-io-kata runtime-rs: fix setting directio via config file	2026-04-03 16:11:57 +02:00
Fabiano Fidêncio	945aa5b43f	Merge pull request #12774 from zvonkok/bump-nvrc nvrc: Bump to the latest Release	2026-04-03 15:39:01 +02:00
Fabiano Fidêncio	ccfdf5e11b	Merge pull request #12754 from llink5/fix/docker26-networking-9340 runtime: fix Docker 26+ networking by rescanning after Start	2026-04-03 13:15:38 +02:00
RuoqingHe	26bd5ad754	Merge pull request #12762 from YutingNie/fix-runtime-rs-shared-fs-typo runtime-rs: Fix typo in share_fs error message	2026-04-03 15:24:33 +08:00
Yuting Nie	517882f93d	runtime-rs: Fix typo in share_fs error message There's a typo in the error message which gets prompted when an unsupported share_fs was configured. Fixed shred -> shared. Signed-off-by: Yuting Nie <yuting.nie@spacemit.com>	2026-04-03 05:23:46 +00:00
Alex Lyn	4a1c2b6620	Merge pull request #12309 from kata-containers/stale-issues-by-date workflows: Create workflow to stale issues based on date	2026-04-03 09:31:34 +08:00
Zvonko Kaiser	3e23ee9998	nvrc: Bump to the latest Release v0.1.4 has a bugfix for nvrc.log=trace which is now optional. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-04-02 17:40:47 -04:00
llink5	f7878cc385	runtime: fix Docker 26+ networking by rescanning after Start Docker 26+ configures container networking (veth pair, IP addresses, routes) after task creation rather than before. Kata's endpoint scan runs during CreateSandbox, before the interfaces exist, resulting in VMs starting without network connectivity (no -netdev passed to QEMU). Add RescanNetwork() which runs asynchronously after the Start RPC. It polls the network namespace until Docker's interfaces appear, then hotplugs them to QEMU and informs the guest agent to configure them inside the VM. Additional fixes: - mountinfo parser: find fs type dynamically instead of hardcoded field index, fixing parsing with optional mount tags (shared:, master:) - IsDockerContainer: check CreateRuntime hooks for Docker 26+ - DockerNetnsPath: extract netns path from libnetwork-setkey hook args with path traversal protection - detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline to guard against PID recycling - startVM guard: rescan when len(endpoints)==0 after VM start Fixes: #9340 Signed-off-by: llink5 <llink5@users.noreply.github.com>	2026-04-02 21:23:16 +02:00
Fabiano Fidêncio	09194d71bb	Merge pull request #12767 from nubificus/fix/fc-rs runtime-rs: Fix FC API fields	2026-04-02 18:24:35 +02:00
Manuel Huber	dd868dee6d	tests: nvidia: onboard NIM service test Onboard a test case for deploying a NIM service using the NIM operator. We install the operator helm chart on the fly as this is a fast operation, spinning up a single operand. Once a NIM service is scheduled, the operator creates a deployment with a single pod. For now, the TEE-based flow uses an allow-all policy. In future work, we strive to support generating pod security policies for the scenario where NIM services are deployed and the pod manifest is being generated on the fly. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-02 16:58:54 +02:00
Manuel Huber	57e42b10f1	tests: nvidia: Do not use elevated privileges Do not run the NIM containers with elevated privileges. Note that, using hostPath requires proper host folder permissions, and that using emptyDir requires a proper fsGroup ID. Once issue 11162 is resolved, we can further refine the securityContext fields for the TEE manifests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:23:26 -07:00
Manuel Huber	a762b136de	tests: generate policy for pod-empty-dir-fsgroup The logic in the k8s-empty-dirs.bats file missed to add a security policy for the pod-empty-dir-fsgroup.yaml manifest. With this change, we add the policy annotation. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:23:26 -07:00
Manuel Huber	43489f6d56	genpolicy: fs_group for encrypted emptyDir volumes The shim uses Storage.fs_group on block/scsi encrypted emptyDir while genpolicy used fsgid= in options and null fs_group, leading to denying CreateContainerRequest when using block-encrypted emptyDir in combination with fsGroup. Thus, emit fs_group in that scenario and keep fsgid= for the existing shared-fs/local emptyDir behavior. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:23:26 -07:00
Manuel Huber	9923f251f5	genpolicy: adjust GID after passwd GID handling After pod runAsUser triggers passwd-based GID resolution, genpolicy clears AdditionalGids and inserts only the primary GID. PodSecurityContext fsGroup and supplementalGroups get cleared, so policy enforcement would deny CreateContainer when the runtime includes those when specified. This change applies fsGroup/supplementalGroups once in get_container_process via apply_pod_fs_group_and_supplemental_groups. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:23:25 -07:00
Steve Horsman	58101a2166	Merge pull request #12656 from stevenhorsman/actions/checkout-bump workflows: Update actions/checkout version	2026-04-01 17:34:39 +01:00
Fabiano Fidêncio	75df4c0bd3	Merge pull request #12766 from fidencio/topic/kata-deploy-avoid-kata-pods-to-crash-after-containerd-restart kata-deploy: Fix kata-deploy pods crashing if containerd restarts	2026-04-01 18:28:16 +02:00
Steve Horsman	2830c4f080	Merge pull request #12746 from ldoktor/ci-helm2 ci.ocp: Use helm deployment for peer-pods	2026-04-01 17:13:21 +01:00
Lukáš Doktor	55a3772032	ci.ocp: Add note about external tests to README.md to run all the tests that are running in CI we need to enable external tests. This can be a bit tricky so add it into our documentation. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-04-01 16:59:33 +01:00
Lukáš Doktor	3bc460fd82	ci.ocp: Use helm deployment for peer-pods replace the deprecated CAA deployment with helm one. Note that this also installs the CAA mutating webhook, which wasn't installed before. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-04-01 16:59:33 +01:00
PiotrProkop	67af63a540	runtime-rs: fix setting directio via config file This fix applies the config file value as a fallback when block_device_cache_direct annotation is not explicitly set on the pod. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-04-01 16:59:04 +02:00
Anastassios Nanos	02c82b174a	runtime-rs: Fix FC API fields A FC update caused bad requests for the runtime-rs runtime when specifying the vcpu count and block rate limiter fields. Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2026-04-01 14:50:51 +00:00
Fabiano Fidêncio	2131147360	tests: add kata-deploy lifecycle tests for restart resilience and cleanup Add functional tests that cover two previously untested kata-deploy behaviors: 1. Restart resilience (regression test for #12761): deploys a long-running kata pod, triggers a kata-deploy DaemonSet restart via rollout restart, and verifies the kata pod survives with the same UID and zero additional container restarts. 2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses are removed, the kata-runtime node label is cleared, /opt/kata is gone from the host filesystem, and containerd remains healthy. 3. Artifact presence: after install, verifies /opt/kata and the shim binary exist on the host, RuntimeClasses are created, and the node is labeled. Host filesystem checks use a short-lived privileged pod with a hostPath mount to inspect the node directly. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 15:20:53 +02:00
Fabiano Fidêncio	b4b62417ed	kata-deploy: skip cleanup on pod restart to avoid crashing kata pods When a kata-deploy DaemonSet pod is restarted (e.g. due to a label change or rolling update), the SIGTERM handler runs cleanup which unconditionally removes kata artifacts and restarts containerd. This causes containerd to lose the kata shim binary, crashing all running kata pods on the node. Fix this by implementing a three-stage cleanup decision: 1. If this pod's owning DaemonSet still exists (exact name match via DAEMONSET_NAME env var), this is a pod restart — skip all cleanup. The replacement pod will re-run install, which is idempotent. 2. If this DaemonSet is gone but other kata-deploy DaemonSets still exist (multi-install scenario), perform instance-specific cleanup only (snapshotters, CRI config, artifacts) but skip shared resources (node label removal, CRI restart) to avoid disrupting the other instances. 3. If no kata-deploy DaemonSets remain, perform full cleanup including node label removal and CRI restart. The Helm chart injects a DAEMONSET_NAME environment variable with the exact DaemonSet name (including any multi-install suffix), ensuring instance-aware lookup rather than broadly matching any DaemonSet containing "kata-deploy". Fixes: #12761 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 15:20:52 +02:00
Fabiano Fidêncio	28414a614e	kata-deploy: detect k3s/rke2 via systemd services instead of version string Newer k3s releases (v1.34+) no longer include "k3s" in the containerd version string at all (e.g. "containerd://2.2.2-bd1.34" instead of "containerd://2.1.5-k3s1"). This caused kata-deploy to fall through to the default "containerd" runtime, configuring and restarting the system containerd service instead of k3s's embedded containerd — leaving the kata runtime invisible to k3s. Fix by detecting k3s/rke2 via their systemd service names (k3s, k3s-agent, rke2-server, rke2-agent) rather than parsing the containerd version string. This is more robust and works regardless of how k3s formats its containerd version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 14:24:55 +02:00
Fabiano Fidêncio	8b9ce3b6cb	tests: remove k3s/rke2 V3 containerd template workaround Remove the workaround that wrote a synthetic containerd V3 config template for k3s/rke2 in CI. This was added to test kata-deploy's drop-in support before the upstream k3s/rke2 patch shipped. Now that k3s and rke2 include the drop-in imports in their default template, the workaround is no longer needed and breaks newer versions. Removed: - tests/containerd-config-v3.tmpl (synthetic Go template) - _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers - Calls from deploy_k3s() and deploy_rke2() This reverts the test infrastructure part of `a2216ec05`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 14:24:55 +02:00
Steve Horsman	9a3f6b075e	Merge pull request #12753 from stevenhorsman/remove-agent-Cargo- agent: Remove Cargo.lock	2026-04-01 13:22:57 +01:00
Steve Horsman	0d38d88b07	Merge pull request #12484 from Amulyam24/runtime-rs-ppc64le runtime-rs: add QEMU support for ppc64le	2026-04-01 12:54:40 +01:00
Fabiano Fidêncio	6555350625	Merge pull request #12765 from fidencio/topic/kata-deploy-nydus-fix-systemd-unit kata-deploy: Make nydus a soft dep of containerd	2026-04-01 13:40:42 +02:00
Fabiano Fidêncio	b823184cf7	Merge pull request #12580 from manuelh-dev/mahuber/gpu-ci-storage tests: gpu: use the container image layer storage feature	2026-04-01 12:23:56 +02:00
Fabiano Fidêncio	fe1f804543	kata-deploy: Restart nydus-snapshotter in case of failure Let's ensure that in case nydus-snapshotter crashes for one reason or another, the service is restarted. This follows containerd approach, and avoids manual intervention in the node. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 11:00:21 +02:00
Fabiano Fidêncio	789abe6fdf	kata-deploy: Make nydus a soft dep of containerd Let's relax our RequiredBy and use a WantedBy in the nydus systemd unit file as, in case of a nydus crash, containerd would also be put down, causing the node to become NotReady. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-01 10:52:29 +02:00
Amulyam24	bf74f683d7	runtime-rs: align memory size with desired block size on ppc64le couldn't initialise QMP: Connection reset by peer (os error 104) Caused by: Connection reset by peer (os error 104) qemu stderr: "qemu-system-ppc64: Maximum memory size 0x80000000 is not aligned to 256 MiB” When the default max memory was assigned according to the available host memory, it failed with the above error Align the memory values with the block size of 256 MB on ppc64le. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-04-01 09:36:45 +01:00
Amulyam24	dcb7d025c7	runtime-rs: Use libc::TUNSETIFF instead of wrapper TUNSETIFF() While attaching the tap device, it fails on ppc64le with EBADF "cannot create tap device. File descriptor in bad state (os error 77)\"): unknown” Refactor the ioctl call to use the standard libc::TUNSETIFF constant. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-04-01 09:36:45 +01:00
Amulyam24	8d25ff2c36	runtime-rs: implement set_capabilities for qemu After the qemu VM is booted, while storing the guest details, it fails to set capabilities as it is not yet implemented for QEMU, this change adds a default implementation for it. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-04-01 09:36:45 +01:00
Amulyam24	778524467b	runtime-rs: enable building runtime-rs on ppc64le Adds changes in Makefile to build runtime-rs on ppc64le with QEMU support. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-04-01 09:36:45 +01:00
Manuel Huber	177f5c308e	tests: gpu: use container image layer storage Use the container image layer storage feature for the k8s-nvidia-nim.bats test pod manifests. This reduces the pods' memory requirements. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:22:26 +02:00
Manuel Huber	b6cf00a374	tests: parametrize storage parameters - trusted-storage.yaml.in: use $PV_STORAGE_CAPACITY and $PVC_STORAGE_REQUEST so that PV/PVC size can vary per test. - confidential_common.sh: add optional size (MB) argument to create_loop_device. - k8s-guest-pull-image.bats: pass PV_STORAGE_CAPACITY and PVC_STORAGE_REQUEST when generating storage config. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-01 10:22:26 +02:00
Fabiano Fidêncio	f756966a8e	Merge pull request #12757 from BbolroC/fix-parsing-imcompatibility-containerd-config tests: Configure devmapper properly regardless of containerd version	2026-04-01 10:21:50 +02:00
stevenhorsman	5390e470d3	agent: Remove Cargo.lock Following on from #12690, the agent is part of the repo workspace, so no longer needs a lock file. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-01 09:11:28 +01:00
Hyounggyu Choi	11cd5f2808	tests: Configure devmapper properly regardless of containerd version The follow differences are observed between container 1.x and 2.x: ``` [plugins.'io.containerd.snapshotter.v1.devmapper'] snapshotter = 'overlayfs' ``` and ``` [plugins."io.containerd.snapshotter.v1.devmapper"] snapshotter = "overlayfs" ``` The current devmapper configuration only works with double quotes. Make it work with both single and double quotes via tomlq. In the default configuration for containerd 2.x, the following configuration block is missing: ``` [[plugins.'io.containerd.transfer.v1.local'.unpack_config]] platform = "linux/s390x" # system architecture snapshotter = "devmapper" ``` Ensure the configuration block is added for containerd 2.x. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-04-01 07:14:52 +02:00

1 2 3 4 5 ...

18368 Commits