kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-21 22:34:29 +00:00

Author	SHA1	Message	Date
Aurélien Bombo	336b922d4f	tests/cbl-mariner: Stop disabling NVDIMM explicitly This is not needed anymore since now disable_image_nvdimm=true for Cloud Hypervisor. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:52:51 -06:00
Aurélien Bombo	48aa077e8c	runtime{,-rs}/qemu/arm64: Disable DAX Enabling full-featured QEMU NVDIMM support on ARM with DAX enabled causes a kernel panic in caches_clean_inval_pou (see below, different issue from `33b1f07`), so we disable DAX in that environment. [ 1.222529] EXT4-fs (pmem0p1): mounted filesystem e5a4892c-dac8-42ee-ba55-27d4ff2f38c3 ro with ordered data mode. Quota mode: disabled. [ 1.222695] VFS: Mounted root (ext4 filesystem) readonly on device 259:1. [ 1.224890] devtmpfs: mounted [ 1.225175] Freeing unused kernel memory: 1920K [ 1.226102] Run /sbin/init as init process [ 1.226164] with arguments: [ 1.226204] /sbin/init [ 1.226235] with environment: [ 1.226268] HOME=/ [ 1.226295] TERM=linux [ 1.230974] Internal error: synchronous external abort: 0000000096000010 [#1] SMP [ 1.231963] CPU: 0 UID: 0 PID: 1 Comm: init Tainted: G M 6.18.5 #1 NONE [ 1.232965] Tainted: [M]=MACHINE_CHECK [ 1.233428] pstate: 43400005 (nZcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 1.234273] pc : caches_clean_inval_pou+0x68/0x84 [ 1.234862] lr : sync_icache_aliases+0x30/0x38 [ 1.235412] sp : ffff80008000b9a0 [ 1.235842] x29: ffff80008000b9a0 x28: 0000000000000000 x27: 00000000019a00e1 [ 1.236912] x26: ffff80008000bc08 x25: ffff80008000baf0 x24: fffffdffc0000000 [ 1.238064] x23: ffff000001671ab0 x22: ffff000001663480 x21: fffffdffc23401c0 [ 1.239356] x20: fffffdffc23401c0 x19: fffffdffc23401c0 x18: 0000000000000000 [ 1.240626] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 1.241762] x14: ffffaae8f021b3b0 x13: 0000000000000000 x12: ffffaae8f021b3b0 [ 1.242874] x11: ffffffffffffffff x10: 0000000000000000 x9 : 0000ffffbb53c000 [ 1.244022] x8 : 0000000000000000 x7 : 0000000000000012 x6 : ffff55178f5e5000 [ 1.245157] x5 : ffff80008000b970 x4 : ffff00007fa4f680 x3 : ffff00008d007000 [ 1.246257] x2 : 0000000000000040 x1 : ffff00008d008000 x0 : ffff00008d007000 [ 1.247387] Call trace: [ 1.248056] caches_clean_inval_pou+0x68/0x84 (P) [ 1.248923] 
__sync_icache_dcache+0x7c/0x9c [ 1.249578] insert_page_into_pte_locked+0x1e4/0x284 [ 1.250432] insert_page+0xa8/0xc0 [ 1.251080] vmf_insert_page_mkwrite+0x40/0x7c [ 1.251832] dax_iomap_pte_fault+0x598/0x804 [ 1.252646] dax_iomap_fault+0x28/0x30 [ 1.253293] ext4_dax_huge_fault+0x80/0x2dc [ 1.253988] ext4_dax_fault+0x10/0x3c [ 1.254679] __do_fault+0x38/0x12c [ 1.255293] __handle_mm_fault+0x530/0xcf0 [ 1.255990] handle_mm_fault+0xe4/0x230 [ 1.256697] do_page_fault+0x17c/0x4dc [ 1.257487] do_translation_fault+0x30/0x38 [ 1.258184] do_mem_abort+0x40/0x8c [ 1.258895] el0_ia+0x4c/0x170 [ 1.259420] el0t_64_sync_handler+0xd8/0xdc [ 1.260154] el0t_64_sync+0x168/0x16c [ 1.260795] Code: d2800082 9ac32042 d1000443 8a230003 (d50b7523) [ 1.261756] ---[ end trace 0000000000000000 ]--- Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:52:43 -06:00
Aurélien Bombo	c727332b0e	runtime/qemu/arm64: Align NVDIMM usage on amd64 Nowadays on arm64 we use a modern QEMU version which supports the features we require for NVDIMM, so we remove the arm64-specific code and use the generic implementation. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:47:53 -06:00
Aurélien Bombo	e17f96251d	runtime{,-rs}/clh: Disable virtio-pmem This disables virtio-pmem support for Cloud Hypervisor by changing Kata config defaults and removing the relevant code paths. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:47:53 -06:00
Zvonko Kaiser	1d09e70233	Merge pull request #12538 from fidencio/topic/kata-deploy-fix-regression-on-hardcopying-symlinks kata-deploy: preserve symlinks when installing artifacts	2026-02-18 12:44:46 -05:00
Mikko Ylinen	5622ab644b	versions: bump QEMU to v10.2.1 v10.2.1 is the latest patch release in v10.2 series. Changes: https://github.com/qemu/qemu/compare/v10.2.0...v10.2.1 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-18 18:18:52 +01:00
Mikko Ylinen	d68adc54da	versions: bump to Linux v6.18.12 (LTS) Latest changelog in https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.18.12 Also other changes for 6..11 updates are available. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-18 18:18:52 +01:00
Fabiano Fidêncio	34336f87c7	kata-deploy: convert install.rs get_hypervisor_name tests to rstest Use rstest parameterized tests for QEMU variants, other hypervisors, and unknown/empty shim cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:41:55 +01:00
Fabiano Fidêncio	bb11bf0403	kata-deploy: preserve symlinks when installing artifacts When copying artifacts from the container to the host, detect source entries that are symlinks and recreate them as symlinks at the destination instead of copying the target file. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:29:14 +01:00
Dan Mihai	eee25095b5	tests: mariner annotations for k8s-openvpn This test uses YAML files from a different directory than the other k8s CI tests, so annotations have to be added into these separate files. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-02-18 07:17:04 +01:00
Markus Rudy	8365afa336	qemu: log exit code after failure When qemu exits prematurely, we usually see a message like msg="Cannot start VM" error="exiting QMP loop, command cancelled" This is an indirect hint, caused by the QMP server shutting down. It takes experience to understand what it even means, and it still does not show what's actually the problem. With this commit, we're taking the error return from the qemu subprocess and surface it in the logs, if it's not nil. This means we automatically capture any non-zero exit codes in the logs. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-02-17 21:03:13 +01:00
Fabiano Fidêncio	f0a0425617	kata-deploy: convert a few toml.rs tests to rstest Turn test_toml_value_types into a parameterized test with one case per type (string, bool, int). Merge the two invalid-TOML tests (get and set) into one rstest with two cases, and the two "not an array" tests into one rstest with two cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	899005859c	kata-deploy: avoid leading/blank lines in written TOML config When writing containerd drop-in or other TOML (e.g. initially empty file), the serialized document could start with many newlines. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cfa8188cad	kata-deploy: convert containerd version support tests to rstest Replace multiple #[test] functions for snapshotter and erofs version checks with parameterized #[rstest] #[case] tests for consistency and easier extension. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cadac7a960	kata-deploy: runtime_platform -> runtime_platforms Fix runtime_platforms typo. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Hyounggyu Choi	8bc60a0761	Merge pull request #12521 from fidencio/topic/kata-deploy-auto-add-nfd-tee-labels-to-the-runtime-class kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected	2026-02-16 18:06:18 +01:00
Jacek Tomasiak	8025fa0457	agent: Don't pass empty options to mount With some older kernels some fs implementations don't handle empty options strings well. This leads to failures in "setup rootfs" step. E.g. `cgroup: cgroup2: unknown option ""`. This is fixed by mapping empty string to `None` before passing to `nix::mount`. Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com> Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>	2026-02-16 14:55:59 +01:00
Fabiano Fidêncio	a04df4f4cb	kata-deploy: disable provenance/SBOM for quay.io compatibility Disable provenance and SBOM when building per-arch kata-deploy images so each tag is a single image manifest. quay.io rejects pushing multi-arch manifest lists that include attestation manifests (400 manifest invalid). Add a note in the release script documenting this. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:32:25 +01:00
Fabiano Fidêncio	0e8e30d6b5	kata-deploy: fix default RuntimeClass + nodeSelectors The default RuntimeClass (e.g. kata) is meant to point at the default shim handler (e.g. kata-qemu-$tee). We were building it in a separate block and only sometimes adding the same TEE nodeSelectors as the shim-specific RuntimeClass, leading to kata ending up without the SE/SNP/TDX nodeSelector while kata-qemu-$tee had it. The fix is to stop duplicating the RuntimeClass definition, having a single template that renders one RuntimeClass (name, handler, overhead, nodeSelectors). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:09:03 +01:00
Fabiano Fidêncio	80a175d09b	kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected When NFD is detected (deployed by the chart or existing in the cluster), apply shim-specific nodeSelectors only for TEE runtime classes (snp, tdx, and se). Non-TEE shims keep existing behavior (e.g. runtimeClass.nodeSelector for nvidia GPU from `f3bba0885` is unchanged). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 12:07:51 +01:00
Fabiano Fidêncio	d000acfe08	infra: fix multi-arch manifest publish Per-arch images were failing publish-multiarch-manifest with 'X is a manifest list' because Buildx now enables attestations by default, so each arch tag became an image index. Use 'docker buildx imagetools create' instead of 'docker manifest create' so we can merge those indexes into the final multi-arch manifest while keeping provenance and SBOM on per-arch images. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-14 19:49:00 +01:00
Fabiano Fidêncio	02c9a4b23c	kata-deploy: Temporarily comment GPU specific labels We depend on GPU Operator v26.3 release, which is not out yet. Although we have been testing with it, it's not yet publicly available, which would break anyone actually trying to use the GPU runtime classes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 09:25:14 +01:00
Fabiano Fidêncio	5106e7b341	build: Add gnupg to the agent's builder container Otherwise we'll fail to check gperf's GPG signing key when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	79b5022a5a	kata-ctl: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	30ebc4241e	genpolicy: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	87d1979c84	agent-ctl: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	90dbd3f562	agent: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	7f77948658	versions: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
Aurélien Bombo	981f693a88	Merge pull request #11140 from balintTobik/hyperv_warning runtime: refactor hypervisor devices cgroup creation	2026-02-13 15:16:09 -06:00
Fabiano Fidêncio	d8acc403c8	kata-deploy: set CRI images runtime_platform snapshotter for containerd v3 In containerd config v3 the CRI plugin is split into runtime and images, and setting the snapshotter only on the runtime plugin is not enough for image pull/prepare. The images plugin must have runtime_platform.<runtime>.snapshotter so it uses the correct snapshotter per runtime (e.g. nydus, erofs). A PR on the containerd side is open so we can rely on the runtime plugin snapshotter alone: https://github.com/containerd/containerd/pull/12836 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 22:15:02 +01:00
Fabiano Fidêncio	2930c68c0b	ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats This is a follow-up for `25962e9325` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 20:56:08 +01:00
Fabiano Fidêncio	f6e0a7c33c	scripts: use temporary GPG home when verifying cached gperf tarball In CI the default GPG keyring is often read-only or missing, so 'gpg --import' of the cached keyring fails and verification cannot succeed. Use a temporary GNUPGHOME for import and verify so cached gperf can be verified without writing to the system keyring. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 19:39:55 +01:00
stevenhorsman	55a89f6836	runtime: doc: Remove usage of golang.org/x/net/context This package is deprecated and we aren't using it any more Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	06246ea18b	csi-kata-directvolume: Remove usage of golang.org/x/net/context This package is deprecated, so use the standard library context package instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	f2fae93785	csi-kata-directvolume: Bump x/net to v0.50 Remediates CVEs: - GO-2026-4440 - GO-2026-4441 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	74d4469dab	ci/openshift-ci: Bump x/net to v0.50 Remediates CVEs: - GO-2026-4440 - GO-2026-4441 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
Steve Horsman	bb867149bb	Merge pull request #12514 from fidencio/topic/nvidia-try-to-improve-genpolicy-failures tests: nvidia: Fix genpolicy error when pulling nvcr.io images	2026-02-13 16:34:00 +00:00
Joji Mekkattuparamban	f3bba08851	kata-deploy: add node selector to nvidia runtime classes The CC runtime classes kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx are mutually exclusive with kata-qemu-nvidia-gpu, as dictated by the gpu cc mode setting. In order to properly support a cluster that has both CC and non-CC nodes, we use a node selector so the scheduling is consistent with the GPU mode. The GPU operator sets a label nvidia.com/cc.ready.state=[true, false] to indicate the gpu mode setting Fixes #12431 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-02-13 15:58:06 +01:00
Fabiano Fidêncio	8cb7d0be9d	tests: nvidia: Fix genpolicy error when pulling nvcr.io images genpolicy pulls image manifests from nvcr.io to generate policy and was failing with 'UnauthorizedError' because it had no registry credentials. Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential() in registry.rs, which reads from DOCKER_CONFIG/config.json. Add setup_genpolicy_registry_auth() to create a Docker config with nvcr.io auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it can authenticate when pulling manifests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 13:12:55 +01:00
Fabiano Fidêncio	f4dcb66a3c	ci: add workflow to push ORAS tarball cache Add push-oras-tarball-cache workflow that runs on push to main when versions.yaml changes (and on workflow_dispatch). It populates the ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml. Remove the push_to_cache call from download-with-oras-cache.sh since it was never triggered in CI. Cache population is now done solely by the new workflow and by populate-oras-tarball-cache.sh when run manually. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 12:57:48 +01:00
Balint Tobik	295a6a81d0	runtime: refactor hypervisor devices cgroup creation Add hypervisor devices to the cgroup separately to avoid irrelevant warnings, and fail if none of them is available. Also fix a test case so that it reloads removed kernel modules for later test cases, and skip some tests on ARM because of the lack of virtualization support Fixes #6656 Signed-off-by: Balint Tobik <btobik@redhat.com>	2026-02-13 09:23:08 +01:00
Aurélien Bombo	14be9504e7	Merge pull request #12506 from kata-containers/sprt/gperf-mirror versions: Switch gperf mirror again	2026-02-12 17:00:17 -06:00
Fabiano Fidêncio	a01e95b988	kata-deploy: test k3s/rke2 template handling / version checks Add tests for the split_non_toml_header helper that strips Go template directives before TOML parsing, and for every TOML operation (set, get, append, remove, set_array) on files that start with {{ template "base" . }}. Also converts the containerd version detection tests in manager.rs from individual #[test] functions with helper wrappers to parametrized #[rstest] cases, which is more readable and easier to extend. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Fabiano Fidêncio	2e7633674f	kata-deploy: use k3s/rke2 base template K3s docs (https://docs.k3s.io/advanced#configuring-containerd) say that the right way to customize containerd is to extend the base template with {{ template "base" . }} and append your own TOML blocks, rather than copying a prerendered config.toml into the template file. We were copying config.toml into config.toml.tmpl / config-v3.toml.tmpl, which meant we were replacing the K3s defaults with a snapshot that gets stale as soon as K3s is upgraded. Now we create the template files with just the base directive and let our regular set_toml_value code path append the Kata runtime configuration on top. To make that work, the TOML utils learned to handle files that start with a Go template line ({{ ... }}): strip it before parsing, put it back when writing. This keeps the K3s/RKE2 path identical to every other runtime -- no special append logic needed. refs: * k3s: https://docs.k3s.io/advanced#configuring-containerd * rke2: https://docs.rke2.io/advanced?_highlight=conyainerd#configuring-containerd Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Aurélien Bombo	199e1ab16c	versions: Switch gperf mirror again The mirror introduced by #11178 still breaks quite often so apply this as a quick fix. A proper solution would probably be to load balance like in #12453. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-12 13:41:19 -06:00
Fabiano Fidêncio	6a3bbb1856	tests: Retry k8s deployment We've seen a lot of spurious issues when deploying the infra needed for the tests. Let's give it a few tries before actually failing. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-12 20:13:59 +01:00
Manuel Huber	ed7de905b5	build: Tighten upstream download path for ORAS The gperf-3.3 tarball frequently fails to download on my end with cryptic error messages such as: "tar: This does not look like a tar archive". This change tightens the download logic a bit: We fail at the point in time when we're supposed to fail. This way we detect rate limiting issues right away, and this way, the actual hashsum and signature checks are effective, not only printouts. This change also updates the key reference and allows for an array, for instance, when a different signer was used for a cache vs upstream version. The change also makes it clear, that signature verification is only implemented for the gperf tarball. Improvements can be made in a subsequent change. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-12 19:20:35 +01:00
Fabiano Fidêncio	9fc5be47d0	kata-deploy: fix custom runtime config path for runtime-rs shims Custom runtimes whose base config lives under runtime-rs/ (e.g. dragonball, cloud-hypervisor) were not found because the path was always built under share/defaults/kata-containers/. Use get_kata_containers_original_config_path for the handler so rust shim configs are read from .../runtime-rs/. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 18:08:47 +01:00
Fabiano Fidêncio	50923b6d62	kata-deploy: run cleanup on uninstall via DaemonSet preStop On helm uninstall let's rely on a preStop hook to run kata-deploy cleanup so each pod cleans its node before exiting. We must keep RBAC (resource-policy: keep) so pods retain API access during termination, and then can properly delete the NodeFeatureRules and remove the labels from the nodes. The post-delete hook Job, which runs on a single node, now is only responsible for cleaning the kept RBAC (cluster-wide resource) after uninstall, not leaving any resource or artefact behind. The changes in this commit lead to a "resources were kept" message when running `helm uninstall`, which we document as being normal, as the post-delete job will remove those. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	6e0cbc28a3	kata-deploy: fix node label removal When removing a node label, JSON merge patch semantics require setting the key to null; omitting the key leaves it unchanged. Fix label_node to send a patch with the label key set to null so the API server actually removes katacontainers.io/kata-runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
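
Note on 6e0cbc28a3: the merge patch behaviour that commit relies on can be illustrated with a small sketch. This is not the kata-deploy code (which is Rust and talks to the API server); it is a plain-Python rendering of the JSON merge patch rules, and the `merge_patch` helper, the sample node object and the label value "true" are assumptions made up for illustration. Only the label key comes from the commit message.

```python
import json


def merge_patch(target, patch):
    """Return the result of applying a JSON merge patch to target.

    Hypothetical helper for illustration only: a null (None) value
    removes the key, a nested object is merged recursively, and any
    other value replaces what was there.
    """
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


LABEL = "katacontainers.io/kata-runtime"

# Assumed node shape and label value, not taken from a real cluster.
node = {"metadata": {"labels": {LABEL: "true", "kubernetes.io/os": "linux"}}}

# Omitting the key leaves the label untouched.
omit_patch = {"metadata": {"labels": {}}}
# Setting the key to null removes the label.
null_patch = {"metadata": {"labels": {LABEL: None}}}

after_omit = merge_patch(node, omit_patch)
after_null = merge_patch(node, null_patch)

print(json.dumps(null_patch))
print(sorted(after_omit["metadata"]["labels"]))
print(sorted(after_null["metadata"]["labels"]))
```

In the sketch, the patch that omits the key returns the labels unchanged, while the patch that sets the key to null drops only that label and keeps the others.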


17928 Commits