kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-03-18 10:44:10 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	d40afe592c	genpolicy: add settings drop-in directory and RFC 6902 JSON Patch support Allow genpolicy -j to accept a directory instead of a single file. When given a directory, genpolicy loads genpolicy-settings.json from it and applies all genpolicy-settings.d/*.json files (sorted by name) as RFC 6902 JSON Patches. This gives precise control over settings with explicit operations (add, remove, replace, move, copy, test), including array index manipulation and assertions. Ship composable drop-in examples in drop-in-examples/: - 10-* files set platform base settings (non-CoCo, AKS, CBL-Mariner) - 20-* files overlay specific adjustments (OCI version, guest pull) Users copy the combination they need into genpolicy-settings.d/. Replace the old adapt_common_policy_settings_* jq-patching functions in tests_common.sh with install_genpolicy_drop_ins(), which copies the right combination of 10-* and 20-* drop-ins for the CI scenario. Tests still generate 99-test-overrides.json on the fly for per-test request/exec overrides. Packaging installs 10-* and 20-* drop-ins from drop-in-examples/ into the tarball; the default genpolicy-settings.d/ is left empty. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 20:13:21 +01:00
Steve Horsman	a4a4683ec7	Merge pull request #12626 from kata-containers/topic/kata-deploy-k3s-rke2-use-imports kata-deploy: a bunch of fixes regarding uninstall, rke2 and k3s tests	2026-03-04 14:01:09 +00:00
Steve Horsman	2687ad75c1	Merge pull request #12617 from BbolroC/skip-cgroup-device-check-for-remote runtime: Skip to call sandboxDevices() for remote hypervisor	2026-03-04 14:00:23 +00:00
Steve Horsman	8e11bb2526	Merge pull request #12611 from mythi/coco-kernel-v6.18.15 versions: bump to Linux v6.18.15 (LTS)	2026-03-04 14:00:00 +00:00
Steve Horsman	94f850979f	Merge pull request #12613 from stevenhorsman/tooling-bump-x/net-to-v0.51.0 Tooling bump x/net to v0.51.0	2026-03-04 13:44:22 +00:00
stevenhorsman	8640f27516	ci: Remove SNP tests from required The SNP tests have been unstable on nightlies, but even when these it seems to be manually cleaned up or something as PR tests are consistently failing, so we should skip this from the required list until it is reliable. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-04 14:41:09 +01:00
Fabiano Fidêncio	56c3618c1d	tests: kata-deploy: wait for API recovery after uninstall kata-deploy's SIGTERM cleanup restarts the CRI runtime, which on k3s/rke2 takes down the API server temporarily. The helm uninstall may complete with errors, and the next test suite would start with a dead API. Add a wait loop after uninstall to ensure the API is available before proceeding. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	966d710df5	tests: increase kata-deploy wait timeout to 15 minutes kata-deploy restarts the CRI runtime during install, which can cause the kata-deploy pod to be killed and recreated by the DaemonSet controller. On k3s and rke2 in particular, the restart can take several minutes. Increase the default timeout from 600s (10m) to 900s (15m) to accommodate this. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	ebe75cc3e3	kata-deploy: make verification job resilient to CRI runtime restarts kata-deploy restarts the CRI runtime (k3s/containerd) during install, which can kill the verification job pod or cause transient API server errors. Bump backoffLimit from 0 to 3 so the job can retry after being killed, and add a retry loop around kubectl rollout status to handle transient connection failures. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	7a08ef2f8d	kata-deploy: run cleanup on SIGTERM instead of preStop hook Move the cleanup logic from a preStop lifecycle hook (separate exec) into the main process's SIGTERM handler. This simplifies the architecture: the install process now handles its own teardown when the pod is terminated. The SIGTERM handler is registered before install begins, and tokio::select! races install against SIGTERM so cleanup always runs even if SIGTERM arrives mid-install (e.g. helm uninstall while the container is restarting after a failed install attempt). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	4e024bfb43	Revert "tests: Skip testing k3s/rke2 with nydus snapshotter" This reverts commit `ab25592533`, as now we're deploying k3s/rke2 in a way that we properly test them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	a2216ec05a	tests: set up full K3s/RKE2 V3 containerd template when needed If the rendered config-v3.toml does not import the drop-in dir, write the full k3s ContainerdConfigTemplateV3 (with hardcoded import path) so kata-deploy can use drop-in. This allows us to test with K3s/RKE2 before my patch there gets released. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:26:31 +01:00
Fabiano Fidêncio	01895bf87e	kata-deploy: use k3s/rke2 drop-in Check the rendered containerd config for the versioned drop-in dir import (config.toml.d or config-v3.toml.d) and bail with a clear error if it is missing. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-04 11:08:26 +01:00
Aurélien Bombo	d821d4e572	Merge pull request #12619 from sprt/require-editorconfig gatekeeper: Add EditorConfig checker to required tests	2026-03-03 21:36:32 -06:00
Fabiano Fidêncio	b0345d50e8	build: kernel: Do not expect a modules tarball for vanilla kernel When I added this I had in mind the period when we still relied on the SEV module being generated, which we haven't done for quite a long time. This wrong assumption caused the cache to ALWAYS fail, increasing our build time considerably for no reason. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-03 20:14:42 +01:00
Aurélien Bombo	911742e26e	gatekeeper: Add EditorConfig checker to required tests Now that it's stable and fully configured. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-03 11:34:06 -06:00
Hyounggyu Choi	347ce5e3bc	runtime: Skip to call sandboxDevices() for remote hypervisor The remote hypervisor delegates VM creation to a remote service. The VM runs on cloud infrastructure, not the local host kernel. So requiring a KVM/MSHV device is semantically wrong and would cause a hard failure on any host where these devices are absent (e.g., a VM that doesn't expose nested virtualization). Skip sandboxDevices() entirely when the configured hypervisor type is remoteHypervisor{}. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-03-03 13:44:12 +01:00
Fabiano Fidêncio	ab25592533	tests: Skip testing k3s/rke2 with nydus snapshotter We depend on a k3s commit so we can properly test it, or we need to change our CI quite a bit to deploy a full template with that imports in. For now, let's just skip the testing in k3s/rke2 and we'll address it in a different PR. ref: `b51167a996` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-03 12:55:10 +01:00
Fabiano Fidêncio	fa3c3eb2ce	ci: Add autogenerated policy tests on k0s, k3s, rke2 and microk8s These tests run only on nightly and when triggering the dev CI manually. They cover both nydus snapshotter with guest-pull and experimental-force-guest-pull, using qemu-coco-dev and qemu-coco-dev-runtime-rs, and are included in the run-kata-coco-tests workflow behind the extensive-matrix-autogenerated-policy flag. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-03 12:55:10 +01:00
Fabiano Fidêncio	3e807300ac	tests: k0s: Ensure --logging=containerd=debug is passed As the default is `info` and that actually overrides whatever is set in the drop-in file used by k0s. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-03 12:55:10 +01:00
Fabiano Fidêncio	876c6c832d	tests: set runtime-request-timeout to 600s for k0s, k3s, rke2, microk8s Align with kubeadm and bare metal by setting the kubelet CRI runtime-request-timeout to 600s in deploy functions for k0s (worker profile), k3s (--kubelet-arg), rke2 (config.yaml), and microk8s (args/kubelet + restart). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-03 12:55:10 +01:00
Fabiano Fidêncio	9725df658f	tests: k8s: policy: set OCI bundle 1.2.1 for k3s/rke2 k3s and rke2 use containerd that expects OCI bundle 1.2.1; otherwise autogenerated policy tests fail. Add adapt_common_policy_settings_for_k3s_rke2 and call it from adapt_common_policy_settings when KUBERNETES is k3s or rke2. Tested with k3s v1.34.4+k3s1, rke2 v1.34.4+rke2r1. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-03-03 12:55:10 +01:00
Steve Horsman	7ca8db1e61	Merge pull request #12616 from Amulyam24/go-arch-fix gha: pass the arch for setup-go on ppc64le	2026-03-03 11:34:30 +00:00
Amulyam24	0754a17fed	gha: pass the arch for setup-go on ppc64le By default, setup-go installs ppc64 binary instead of ppc64le, resulting in an exec format error. Pass the arch explicitly to fix this. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-03-03 16:41:10 +05:30
Mikko Ylinen	2cf9018e35	versions: bump to Linux v6.18.15 (LTS) Bump to the latest LTS kernel to get a fix for TDX: efi: Fix reservation of unaccepted memory table See details in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0862438c90487e79822d5647f854977d50381505 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-03-03 07:56:24 +02:00
Mikko Ylinen	0b2af07b02	build: kernel: fix checksum checks for RC kernels get_kernel() thinks it knows when it needs to skip sha256sum validation for RC kernels since sha256sums.asc is not available: INFO: Config version: 176 INFO: Kernel version: 6.18-rc5 INFO: kernel path does not exist, will download kernel INFO: Release candidate kernels are not part of the official sha256sums.asc -- skipping sha256sum validation But continues to check it anyway since ${rc} matches with -n. sha256sum should only be checked when ${rc} is NOT set. Fixes a problem where downloaded RC kernels are always removed and downloaded again. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-03-03 07:56:24 +02:00
Dan Mihai	3ea23528a5	docs: require user/group/fsGroup/supplementalGroups Add a nydus guest-pull limitation explaining that specifying runAsUser, runAsGroup, fsGroup, and supplementalGroups are required. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Dan Mihai <dmihai@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-02 23:48:36 +01:00
stevenhorsman	642aa12889	csi-kata-directvolume: Bump x/net to v0.51 Remediates CVE GO-2026-4559 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-02 16:40:58 +00:00
stevenhorsman	24fe232e56	ci/openshift-ci: Bump x/net to v0.51 Remediate CVE GO-2026-4559 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-02 16:40:03 +00:00
Steve Horsman	e50324ba5b	Merge pull request #12609 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.40.0 build(deps): bump go.opentelemetry.io/otel/sdk from 1.35.0 to 1.40.0 in /src/runtime	2026-03-02 16:32:40 +00:00
stevenhorsman	993a4846c8	versions: Bump go to 1.25.7 Now that go 1.26 is out, 1.24 is not supported, so bump to 1.25 as per our policy. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-03-02 16:33:47 +01:00
dependabot[bot]	d95d1796b2	build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.35.0 to 1.40.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.35.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.40.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-03-02 12:59:21 +00:00
Steve Horsman	501d8d1916	Merge pull request #12596 from kata-containers/remove-install_go workflow | tests: Remove install go	2026-03-02 12:36:58 +00:00
Steve Horsman	964c91f8fc	Merge pull request #12608 from kata-containers/sprt/fix-hostpath-dev-docs docs: Use more accurate wording for /dev hostPath behavior	2026-03-02 11:50:15 +00:00
Aurélien Bombo	68e67d7f8a	docs: Use more accurate wording for /dev hostPath behavior I got lazy when I first added this section in `5c21b1f`, so updating the language to specify that any non-regular host file (under /dev) qualifies, not just devices. This matches the actual code, see: `330bfff4be/src/runtime/virtcontainers/mount.go (L57-L83)` Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-03-02 11:32:01 +00:00
Steve Horsman	b147cb1319	Merge pull request #12587 from fidencio/topic/runtime-add-configurable-kubelet-root-dir runtimes: add configurable kubelet root dir	2026-02-28 19:06:14 +00:00
Xuewei Niu	8a4ae090e6	Merge pull request #12513 from lifupan/event_publish send the task create/start/delete event to containerd	2026-02-28 14:41:46 +08:00
Zvonko Kaiser	afe09803a1	gpu: Ignore OVMF and use the Kernel for proper PCI setup Sometimes OVMF provides incorrect values to the kernel, so we override it by telling the kernel to handle the PCI space setup, such as allocating the proper window sizes and assigning the proper busses to each device. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-27 22:54:31 +01:00
Manuel Huber	88f746dea8	runtime: nvidia: Use OVMF for NV GPU handler Shift to using OVMF instead of using SeaBios. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Update src/runtime/Makefile Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-27 22:54:31 +01:00
Zvonko Kaiser	eec397ac08	qemu: Remove PCIe root port BAR reserve sizing Stop computing and setting mem-reserve and pref64-reserve on PCIe root ports and switch ports. Remove getBARsMaxAddressableMemory() which scanned host GPU BARs to pre-calculate these values. The previous approach only considered GPU devices (IsGPU(), class 0x0302) when scanning for BAR sizes, so devices like NVSwitches (class 0x0680) with their 32MB non-prefetchable BAR0 were not accounted for and received the 4MB default. Additionally, GetTotalAddressableMemory() classifies BARs by 32/64-bit address width rather than by the prefetchable flag that QEMU's mem-reserve vs pref64-reserve maps to. Modern QEMU introspects VFIO device BARs when they are attached to root ports and sizes the MMIO windows accordingly. Modern OVMF (edk2-stable202502+) automatically calculates the 64-bit PCI MMIO aperture based on the BARs of actually present devices during PCI enumeration. Omitting the reserve parameters lets QEMU and OVMF handle MMIO window sizing correctly for all device types including GPUs, NVSwitches, and NICs without requiring host-side BAR scanning. This also removes the nvpci dependency from qemu_arch_base.go. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-27 22:54:31 +01:00
Zvonko Kaiser	bb7fd335f3	qemu: Remove OVMF X-PciMmio64Mb fw_cfg hint Modern OVMF (edk2-stable202502 and later) automatically sizes the 64-bit PCI MMIO aperture based on the BARs of actually attached devices during PCI enumeration. The opt/ovmf/X-PciMmio64Mb fw_cfg hint is no longer needed to ensure large-BAR devices like NVIDIA GPUs receive adequate MMIO space. The previous approach was fragile: the runtime scanned host PCI devices to estimate the required aperture size, but only considered GPU devices (class 0x0302), missing NVSwitches and other devices with large BARs. Removing this code avoids confusion about MMIO sizing responsibility. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-27 22:54:31 +01:00
Fabiano Fidêncio	330bfff4be	kata-deploy: Fix nydus snapshotter config (on v3 config version) On containerd v3 config, disable_snapshot_annotations must be set under the images plugin, not the runtime plugin. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-27 18:20:30 +01:00
Fabiano Fidêncio	0a73638744	runtime: add configurable kubelet root dir Different kubernetes distributions, such as k0s, use a different kubelet root dir location instead of the default /var/lib/kubelet, so ConfigMap and Secret volume propagation were failing. This adds a kubelet_root_dir config option that the go runtime uses when matching volume paths and kata-deploy now sets it automatically for k0s via a drop-in file. runtime-rs does not need this option: it identifies ConfigMap/Secret, projected, and downward-api volumes by volume-type path segment (kubernetes.io~configmap, etc.), not by kubelet root prefix. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-27 14:10:57 +01:00
Steve Horsman	2695007ef8	Merge pull request #12584 from stevenhorsman/switch-actionlint-workflow workflow: Update actionlint workflows	2026-02-27 13:03:58 +00:00
stevenhorsman	66e58d6490	tests: Delete install_go.sh Having a script to install go is legacy from Jenkins, so delete it, so there is less code in our repo. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-27 12:42:43 +00:00
stevenhorsman	b71bb47e21	workflow: Use setup-go to install go Rather than having our own script, just use the github action to install go when needed. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-27 12:42:43 +00:00
Steve Horsman	3442fc7d07	Merge pull request #12477 from kata-containers/workflow-improvements workflow: Recommended improvements	2026-02-27 11:57:22 +00:00
Markus Rudy	d9d886b419	agent-policy: read bundle-id from OCI spec rootfs The host path of bundles is not portable and could be literally anything depending on containerd configuration, so we can't rely on a specific prefix when deriving the bundle-id. Instead, we derive the bundle-id from the target root path in the guest. NOTE: fixes https://github.com/kata-containers/kata-containers/issues/10065 Signed-off-by: Markus Rudy <mr@edgeless.systems> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-27 10:24:38 +01:00
Hyounggyu Choi	be5ae7d1e1	Merge pull request #12573 from BbolroC/support-memory-hotplug-go-runtime-s390x runtime: Support memory hotplug via virtio-mem on s390x	2026-02-27 09:59:40 +01:00
Steve Horsman	c6014ddfe4	Merge pull request #12574 from sathieu/kata-deploy-kubectl-image kata-deploy: allow to configure kubectl image	2026-02-27 08:42:06 +00:00
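
Commit d40afe592c above describes loading genpolicy-settings.json from a directory and then applying every genpolicy-settings.d/*.json file, sorted by name, as an RFC 6902 JSON Patch. The following is a minimal Python sketch of that layering idea only. It is not genpolicy's implementation: it covers just the add, remove, replace and test operations, and the settings keys ("example", "oci_version", "features") and the drop-in file names are hypothetical placeholders chosen for illustration.

```python
import copy
import json
import tempfile
from pathlib import Path


def _parse_pointer(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _resolve_parent(doc, tokens):
    """Walk to the container (dict or list) that holds the final token."""
    node = doc
    for tok in tokens[:-1]:
        node = node[int(tok)] if isinstance(node, list) else node[tok]
    return node


def apply_patch(doc, patch):
    """Apply a subset of RFC 6902 (add, remove, replace, test) to a copy of doc."""
    doc = copy.deepcopy(doc)
    for op in patch:
        tokens = _parse_pointer(op["path"])
        if not tokens:
            raise ValueError("whole-document operations are outside this sketch")
        parent = _resolve_parent(doc, tokens)
        key, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            if kind == "add":
                # "-" means "append"; otherwise insert before the given index.
                index = len(parent) if key == "-" else int(key)
                if index > len(parent):
                    raise IndexError(f"index out of range: {op['path']}")
                parent.insert(index, op["value"])
                continue
            key = int(key)
        if kind == "add":
            parent[key] = op["value"]
        elif kind == "replace":
            parent[key]  # raises KeyError/IndexError if the target is missing
            parent[key] = op["value"]
        elif kind == "remove":
            del parent[key]
        elif kind == "test":
            if parent[key] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        else:
            raise NotImplementedError(f"operation not covered here: {kind}")
    return doc


def load_settings(settings_dir):
    """Load the base settings, then apply drop-ins in file-name order."""
    root = Path(settings_dir)
    settings = json.loads((root / "genpolicy-settings.json").read_text())
    for drop_in in sorted((root / "genpolicy-settings.d").glob("*.json")):
        settings = apply_patch(settings, json.loads(drop_in.read_text()))
    return settings


def demo():
    """Build a throwaway settings directory with hypothetical content."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "genpolicy-settings.d").mkdir()
        base = {"example": {"oci_version": "1.1.0", "features": ["a"]}}
        (root / "genpolicy-settings.json").write_text(json.dumps(base))
        # Hypothetical 10-* drop-in: platform base setting.
        (root / "genpolicy-settings.d" / "10-base.json").write_text(json.dumps([
            {"op": "replace", "path": "/example/oci_version", "value": "1.2.0"},
        ]))
        # Hypothetical 20-* drop-in: asserts the 10-* result, then overlays it.
        (root / "genpolicy-settings.d" / "20-overlay.json").write_text(json.dumps([
            {"op": "test", "path": "/example/oci_version", "value": "1.2.0"},
            {"op": "replace", "path": "/example/oci_version", "value": "1.2.1"},
            {"op": "add", "path": "/example/features/-", "value": "b"},
        ]))
        return load_settings(root)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Because the 20-* patch begins with a test operation, it only succeeds if the 10-* patch ran first, which is the property that name-sorted application is meant to provide. The move and copy operations named in the commit message are left out of this sketch.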

18096 Commits