kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-03-18 18:58:36 +00:00

Author	SHA1	Message	Date
Zvonko Kaiser	451dcb289a	kernel: bump kata_config_version We have kernel build changes bump the config version Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:55:20 +01:00
Zvonko Kaiser	34cde2637d	gpu: build_image.sh use versions.yaml We've done some bad file based driver determination, now with versions.yaml there is a single source of truth. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:55:20 +01:00
Zvonko Kaiser	664a3af02b	gpu: nvidia_chroot.sh update decoupling Remove all the driver build instructions, since those are now done in the kernel target. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:55:20 +01:00
Zvonko Kaiser	e9bb43ef01	gpu: deploy modules for kernel build We need to package the built modules so that the rootfs can consume them. We package the whole /lib/modules/$(uname -r) directory strip=2. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:55:20 +01:00
Zvonko Kaiser	d4962bafac	gpu: Add NVIDIA modules to build-kernel.sh Checkout and build the kernel modules along with the kernel to avoid the kernel rootfs dependency. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:30:31 +01:00
Zvonko Kaiser	c42f7501fd	gpu: Remove building of Headers Since we build along the kernel we do not need to carry over the headers to the rootfs build. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:30:31 +01:00
Zvonko Kaiser	1f6cfb11b0	kernel: bugfix install yq We never actually installed yq in the kernel build; some paths use yq but were never hit. For the GPU use-case we need to read values from versions.yaml. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 14:30:31 +01:00
Fabiano Fidêncio	2acb94ef2d	arm64: Do not use DAX with the rootfs image Kernel 6.18.x has an issue with DAX, which is not yet fixed upstream: ``` [ 0.737679] EXT4-fs (pmem0p1): mounted filesystem 79676804-7c8b-491a-b2a6-9bae3c72af70 ro with ordered data mode. Quota mode: disabled. [ 0.737891] VFS: Mounted root (ext4 filesystem) readonly on device 259:1. [ 0.739119] devtmpfs: mounted [ 0.739476] Freeing unused kernel memory: 1920K [ 0.740156] Run /sbin/init as init process [ 0.740229] with arguments: [ 0.740286] /sbin/init [ 0.740321] with environment: [ 0.740369] HOME=/ [ 0.740400] TERM=linux [ 0.743162] Unable to handle kernel paging request at virtual address fffffdffbf000008 [ 0.743285] Mem abort info: [ 0.743316] ESR = 0x0000000096000006 [ 0.743371] EC = 0x25: DABT (current EL), IL = 32 bits [ 0.743444] SET = 0, FnV = 0 [ 0.743489] EA = 0, S1PTW = 0 [ 0.743545] FSC = 0x06: level 2 translation fault [ 0.743610] Data abort info: [ 0.743656] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 0.743720] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 0.743785] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 0.743848] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b9d17000 [ 0.743931] [fffffdffbf000008] pgd=10000000bfa3d403, p4d=10000000bfa3d403, pud=1000000040bfe403, pmd=0000000000000000 [ 0.744070] Internal error: Oops: 0000000096000006 [#1] SMP [ 0.748888] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.18.4 #1 NONE [ 0.749421] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.749969] pc : dax_disassociate_entry.constprop.0+0x20/0x50 [ 0.750444] lr : dax_insert_entry+0xcc/0x408 [ 0.750802] sp : ffff80008000b9e0 [ 0.751083] x29: ffff80008000b9e0 x28: 0000000000000000 x27: 0000000000000000 [ 0.751682] x26: 0000000001963d01 x25: ffff0000004f7d90 x24: 0000000000000000 [ 0.752264] x23: 0000000000000000 x22: ffff80008000bcc8 x21: 0000000000000011 [ 0.752836] x20: ffff80008000ba90 x19: 0000000001963d01 x18: 0000000000000000 [ 0.753407] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 0.753970] x14: ffffbf3154b9ae70 x13: 0000000000000000 x12: ffffbf3154b9ae70 [ 0.754548] x11: ffffffffffffffff x10: 0000000000000000 x9 : 0000000000000000 [ 0.755122] x8 : 000000000000000d x7 : 000000000000001f x6 : 0000000000000000 [ 0.755707] x5 : 0000000000000000 x4 : 0000000000000000 x3 : fffffdffc0000000 [ 0.756287] x2 : 0000000000000008 x1 : 0000000040000000 x0 : fffffdffbf000000 [ 0.756871] Call trace: [ 0.757107] dax_disassociate_entry.constprop.0+0x20/0x50 (P) [ 0.757592] dax_iomap_pte_fault+0x4fc/0x808 [ 0.757951] dax_iomap_fault+0x28/0x30 [ 0.758258] ext4_dax_huge_fault+0x80/0x2dc [ 0.758594] ext4_dax_fault+0x10/0x3c [ 0.758892] __do_fault+0x38/0x12c [ 0.759175] __handle_mm_fault+0x530/0xcf0 [ 0.759518] handle_mm_fault+0xe4/0x230 [ 0.759833] do_page_fault+0x17c/0x4dc [ 0.760144] do_translation_fault+0x30/0x38 [ 0.760483] do_mem_abort+0x40/0x8c [ 0.760771] el0_ia+0x4c/0x170 [ 0.761032] el0t_64_sync_handler+0xd8/0xdc [ 0.761371] el0t_64_sync+0x168/0x16c [ 0.761677] Code: f9453021 f2dfbfe3 cb813080 8b001860 (f9400401) [ 0.762168] ---[ end trace 0000000000000000 ]--- [ 0.762550] note: init[1] exited with irqs disabled [ 0.762631] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ``` For now, we limit the rootfs that we ship to ARM64 to not use DAX, in the future we'll re-enable it as soon as the patch lands on mainstream kernel. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 11:46:40 +01:00
Fabiano Fidêncio	3ef99f4ee3	versions: Add specific nvidia kernel version This is needed as the 580 driver doesn't build against 6.18.x, and the 590 driver is not yet fully working for our case, thus we stick to the previous version that worked before. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 11:46:40 +01:00
Fabiano Fidêncio	cce5d4abf6	kernel: bump to v6.18.x (LTS) Bump both the kernel and kernel-confidential versions from v6.12.x and v6.16.x to v6.18.4, aligning with the new LTS release. Kernel 6.18 introduced several configuration changes that required updates to our kernel config fragments: * CRYPTO_FIPS dependencies changed: - In 6.12: depended on !CRYPTO_MANAGER_DISABLE_TESTS - In 6.18: now depends on CRYPTO_SELFTESTS (which requires EXPERT) Added CONFIG_EXPERT=y and CONFIG_CRYPTO_SELFTESTS=y to crypto.conf to satisfy the new dependency chain. * CONFIG_EXPERT is a naughty one, as it disables / enables a bunch of things behind one's back, probably just to prove a point that it is for experts ;-) ... regardless, a reasonable amount of options had to be re-added in order to make sure nothing ends up broken. * Legacy iptables support: Kernel 6.18 requires explicit legacy xtables/iptables configs for IP_NF_* options. Added CONFIG_NETFILTER_XTABLES_LEGACY, CONFIG_IP_NF_IPTABLES_LEGACY, and CONFIG_IP6_NF_IPTABLES_LEGACY to netfilter.conf. * Module signing dependencies: Added CONFIG_MODULES=y and other required dependencies to module_signing.conf to ensure MODULE_SIG can be properly enabled. * Whitelist updates: - Added CONFIG_NF_CT_PROTO_DCCP (removed in 6.18+) - Added CONFIG_CRYPTO_SELFTESTS, CONFIG_NETFILTER_XTABLES_LEGACY, CONFIG_IP_NF_IPTABLES_LEGACY, CONFIG_IP6_NF_IPTABLES_LEGACY (added in 6.18+, not present in older kernels like 6.12) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 11:46:40 +01:00
Fabiano Fidêncio	26dfcb627b	tools: Build kubectl image This image will be used by our helm charts to verify that a kata-containers deployment is correct. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-12 15:48:44 +01:00
stevenhorsman	a0d96256f5	packaging: Fix tools permissions issue In some builds we are seeing: ``` error: could not create temp file /opt/rustup/tmp/r2xu46kwuyc7k2kr_file: Permission denied (os error 13) ``` in the agent-ctl build, so try and port a fix from #12313 to the tools build to try and resolve this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-09 21:45:26 +01:00
Federico A. Corazza	787768fe9b	kata-deploy: Fix extraction of the containerd major version Fixes deploying kata-containers using k3s. The deploy script fails with /opt/kata-artifacts/scripts/kata-deploy.sh: line 397: [: too many arguments Signed-off-by: Federico A. Corazza <git@facorazza.com>	2026-01-09 19:52:18 +01:00
Hyounggyu Choi	2962e14c10	virtiofsd: fix RUSTUP_HOME and CARGO_HOME permissions for non-root builds The following error was observed during virtiofsd static build: ``` error: could not create temp file /opt/rustup/tmp/p44enysfaxwdbvw4_file: Permission denied (os error 13) ``` This occurs because RUSTUP_HOME and CARGO_HOME were initialized by the root user during `docker build`, but `cargo build` is executed as a non-root user via 'docker run --user'. Ensure these directories are writable by adjusting the permission after the toolchain installation is complete. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-01-09 14:01:20 +01:00
Fabiano Fidêncio	f8318c0542	kata-deploy: Remove unused dependency We're depending solely on toml_edit, thus we can safely remove the toml dependency. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-08 18:58:11 +01:00
Fupan Li	b3546f3a68	Merge pull request #12282 from kata-containers/set-required-ci Set several tests as required ci	2026-01-08 20:34:39 +08:00
Mikko Ylinen	e02e226431	packaging: build OVMF for Intel TDX again OVMF build for Intel TDX (aka "TDVF") was disabled in favor of Ubuntu/ CentOS pre-upstream releases of Intel TDX. See `4292c4c3b1`. It's time to re-enable the build and move runtime configurations to use it (the latter will be done in a later commit). This is a partial revert of `4292c4c3b` with the following changes: - Stop calling OVMF for Intel TDX "TDVF" and follow the naming distros use for TDX enabled build: OVMF.inteltdx.fd. - Single binary OVMF.inteltdx.fd is supported using -bios QEMU param. - Secure Boot infrastructure is disabled since Kata does not support it. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-01-08 10:21:47 +01:00
Alex Lyn	e4451baa84	tests: Set run-nerdctl-tests with qemu-runtime-rs required run-nerdctl-tests (qemu-runtime-rs) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 09:56:50 +08:00
Alex Lyn	56a21c33a3	tests: Set stability tests with qemu-runtime-rs required run-containerd-stability (active, qemu-runtime-rs) run-containerd-stability (lts, qemu-runtime-rs) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 09:56:50 +08:00
Alex Lyn	679e31d884	tests: Set run-nydus CIs as required run-basic-amd64-tests / run-nydus Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 09:56:50 +08:00
Fabiano Fidêncio	c4194538e2	versions: Bump QEMU to v10.2.0 QEMU v10.2.0 was released on December 24th, 2025. The experimental GPU SNP / TDX are also pointing to v10.2.0 release with their gpu-{snp,tdx}-20260107 branch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-07 12:30:55 +01:00
Mikko Ylinen	99bc0f49cc	use-cases: drop Intel QuickAssist instructions While the use-case of Intel QuickAssist (QAT) accelerated crypto and/or compression with k8s and Kata Containers is still valid, the setup instructions are outdated: Starting with Intel Xeon Gen4 (Sapphire Rapids), QAT driver stack moved to in-tree drivers without a separate SR-IOV VF driver. Drop all the setup instructions but keep the use-cases doc for reference. Users wanting to enable the use-case should consult with Intel QAT Device plugins or Intel QAT DRA driver authors. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-01-02 12:14:04 +02:00
Alex Lyn	0b1a5c6e93	tests: Make the tests coco-dev job with coco-dev-runtime-rs required The nontee job (run-k8s-tests-coco-nontee) for qemu-coco-dev-runtime-rs is running well and it's time to make it required when the CI runs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-23 09:54:52 +08:00
Fabiano Fidêncio	51e9b7e9d1	nydus-snapshotter: Bump to v0.15.10 As it brings a fix that most likely can work around the containerd / nydus-snapshotter databases desynchronization. Reference: https://github.com/containerd/nydus-snapshotter/pull/700 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 18:41:09 +01:00
Fabiano Fidêncio	03297edd3a	kata-deploy: rust: Add list verb for runtimeclasses RBAC The Rust kata-deploy binary calls list_runtimeclasses() during NFD setup, but the ClusterRole only granted get and patch permissions. Add the list verb to the runtimeclasses resource permissions to fix the RBAC error: runtimeclasses.node.k8s.io is forbidden: User \"system:serviceaccount:kube-system:kata-deploy-sa\" cannot list resource \"runtimeclasses\" in API group \"node.k8s.io\" at the cluster scope Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 18:31:52 +01:00
Fabiano Fidêncio	0e534fa7fe	versions: Update virtiofsd to v1.13.3 Update virtiofsd to its latest release. Here we also need to update the alpine version used by the builder as we need a version of musl-dev new enough to have wrappers for pread2 and pwrite2. As bumping, bump to the latest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	320f1ce2a3	versions: Bump experimental {tdx,snp} QEMU Let's bump experimental {tdx,snp} QEMU to the tags created today in the Confidential Containers repo, which match with QEMU 10.2.0-rc3. This bump is mostly for early testing what will become 10.2.0, which will be bumped everywhere then. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 17:42:04 +01:00
Hyounggyu Choi	f1b4327dba	Merge pull request #12247 from fidencio/topic/ci-store-the-tarballs-we-rely-on-on-gchr-follow-up build: Fix GPG key for gperf & Pass PUSH_TO_REGISTRY and GH_TOKEN to Docker builds	2025-12-17 13:53:58 +01:00
Fabiano Fidêncio	98c5276546	helm: runtimeclasses: Match the kata-deploy rust deployment There we ensure labels are added to better deal with ownership of the runtimeclasses. It's not strictly needed here as helm does take care of the ownership, but also doesn't hurt to follow what seems to be a common practice. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	6130d7330f	ci: Run a nightly job using the kata-deploy rust Let's shamelessly duplicate the nightly job to have at least nightly runs using the rust implementation of kata-deploy. The reason for doing that is to be pragmatic, as pragmatic as possible, and avoid switching away from the scripts before the 3.24.0 release, while still testing both ways till the switch happens. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	fbc29f3f5e	kata-deploy: helm: Adapt to the rust binary Differently than the scripts, which are called as `bash -c ...`, the kata-deploy rust binary must be invoked directly, as we do not even have a shell in its container. For now, the image used for the rust version has the "-rust" suffix, which will help us to have both ways being used / tested for a little while. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	9d88c6b1d7	kata-deploy: Oxidize the script kata-deploy shell script is not THAT bad and, to be honest, it's quite handy for quick hacks and quick changes. However, it's been increasingly becoming harder to maintain as it's grown its scope from a testing tool to the proper project's front door, lacking unit tests, and with an abundance of complex regular expressions and bashisms to be able to properly parse the environment variables it consumes. Moreover, the fact it is a Frankenstein's monster glued together using python packages, golang binaries, and a distro dependent container makes the situation VERY HARD to use it from a distroless container (thus, avoiding security issues), preventing further integration with components that require a higher standard of security than we've been requiring. With everything said, with the help of Cursor (mostly on generating the test cases), here comes the oxidized version of the script, which runs from a distroless container image. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	c9cd79655d	build: Pass PUSH_TO_REGISTRY and GH_TOKEN to Docker builds The ORAS cache helper needs PUSH_TO_REGISTRY to be set to 'yes' to push new artifacts to the cache. However, this environment variable was not being passed to the Docker container during agent, tools, and busybox builds. Moreover, for ghcr.io authentication, add support for using GH_TOKEN and GITHUB_ACTOR as fallbacks when explicit credentials (ARTEFACT_REGISTRY_USERNAME/PASSWORD) are not provided. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 21:58:16 +01:00
Fabiano Fidêncio	b11cea3113	build: Fix GPG key for gperf The GPG key used for gperf was incorrectly set to the busybox maintainer's key (Denis Vlasenko) instead of the gperf maintainer's key (Marcel Schaible). Wrong key (busybox): C9E9416F76E610DBD09D040F47B70C55ACC9965B Denis Vlasenko <vda.linux@googlemail.com> Correct key (gperf): EDEB87A500CC0A211677FBFD93C08C88471097CD Marcel Schaible <marcel.schaible@studium.fernuni-hagen.de> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 21:58:16 +01:00
Fabiano Fidêncio	6e01ee6d47	helm: Provide kata-remote runtime class kata-remote is a runtime class that cloud-api-adaptor relies on to work. kata-remote by itself does nothing, and that's the reason it's disabled by default. We're only adding it here so cloud-api-adaptor charts can simply do something like `--set shims.remote.enabled=true`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 21:57:49 +01:00
Fabiano Fidêncio	0a0fcbae4a	gatekeeper: Adjust to kata-tools A few jobs have been renamed as part of the kata-tools split. Let's add them all here. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 18:22:40 +01:00
Fabiano Fidêncio	a2534e7bc8	kata-tools: Release as its own tarball We're only releasing those for amd64 as that's the only architecture we've been building the packages for. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 12:55:07 +01:00
Fabiano Fidêncio	6d2f393be4	build: Split tools build from the other artefacts build Let's ensure we can create a specific "tools" tarball, which will help those who only need to pull those either for testing or production usage. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 12:55:07 +01:00
Ruoqing He	1872af7c5a	ci: Install cmake before building runtime-rs cmake is required for libz-sys to compile (which is required by nydus). Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Fabiano Fidêncio	1388a3acda	packaging: Add ORAS cache for gperf and busybox tarballs To protect against upstream download failures for gperf and busybox, implement ORAS-based caching to GHCR. This adds: - download-with-oras-cache.sh: Core helper for downloading with cache - populate-oras-tarball-cache.sh: Script to manually populate cache - warn() function to lib.sh for consistency Modified build scripts to: - Try ORAS cache first (from ghcr.io/kata-containers/kata-containers) - Fall back to upstream download on cache miss - Automatically push to cache when PUSH_TO_REGISTRY=yes The cache is automatically populated during CI builds, and parallel architecture builds check for existing versions before pushing to avoid race conditions. Forks benefit from upstream cache but can override with their own: ARTEFACT_REPOSITORY=myorg/kata make agent-tarball Generated-By: Cursor IDE with Claude Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-15 22:04:21 +01:00
Fabiano Fidêncio	a25a53c860	kata-deploy: sa: Fix permissions for patching nodefeaturerules I've seen this happening with the GPU SNP CI every now and then, but I don't really understand how this was not caught by the TDX / SNP CI themselves before. In any case, the error seen is: ``` Error from server (Forbidden): error when applying patch: {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"nfd.k8s-sigs.io/v1alpha1\",\"kind\":\"NodeFeatureRule\",\"metadata\":{\"annotations\":{},\"name\":\"amd64-tee-keys\"},\"spec\":{\"rules\":[{\"extendedResources\":{\"sev-snp.amd.com/esids\":\"@cpu.security.sev.encrypted_state_ids\"},\"labels\":{\"amd.feature.node.kubernetes.io/snp\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"sev.snp.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"amd.sev-snp\"},{\"extendedResources\":{\"tdx.intel.com/keys\":\"@cpu.security.tdx.total_keys\"},\"labels\":{\"intel.feature.node.kubernetes.io/tdx\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"tdx.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"intel.tdx\"}]}}\n"}}} to: Resource: "nfd.k8s-sigs.io/v1alpha1, Resource=nodefeaturerules", GroupVersionKind: "nfd.k8s-sigs.io/v1alpha1, Kind=NodeFeatureRule" Name: "amd64-tee-keys", Namespace: "" for: "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": error when patching "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": nodefeaturerules.nfd.k8s-sigs.io "amd64-tee-keys" is forbidden: User "system:serviceaccount:kube-system:kata-deploy-sa" cannot patch resource "nodefeaturerules" in API group "nfd.k8s-sigs.io" at the cluster scope ``` And the fix is as simple as allowing patching and updating a nodefeaturerule in our service account RBAC. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-15 12:01:20 +01:00
Alex Lyn	f4f61d5666	Merge pull request #12229 from fidencio/topic/kata-deploy-do-deprecations kata-deploy: Remove deprecated features from 3.23.0	2025-12-15 19:00:07 +08:00
Hyounggyu Choi	b69da5f3ba	gatekeeper: Make s390x e2e tests required again Since the CI issue for s390x was resolved on Dec 5th, the nightly test result has gone green for 10 consecutive days. This commit puts the e2e tests for s390x again into the required job list. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-15 11:12:25 +01:00
Fabiano Fidêncio	ded6d1636f	kata-deploy: Remove deprecated features from 3.23.0 Let's remove the deprecated features that were marked for removal after Kata Containers 3.23.0: kata-deploy.sh: - Remove non-arch-specific variable fallbacks (SHIMS, DEFAULT_SHIM, SNAPSHOTTER_HANDLER_MAPPING, ALLOWED_HYPERVISOR_ANNOTATIONS, PULL_TYPE_MAPPING, EXPERIMENTAL_FORCE_GUEST_PULL). Each arch now has its own default value. - Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS variables and associated functions (create_runtimeclasses, delete_runtimeclasses, adjust_shim_for_nfd). RuntimeClasses are now managed by Helm chart, not the daemonset script. - Unsupported architectures now fail with an error instead of falling back to non-arch-specific defaults. Helm chart: - Remove all deprecated env values (createRuntimeClasses, createDefaultRuntimeClass, debug, shims, shims_, defaultShim, defaultShim_, allowedHypervisorAnnotations, snapshotterHandlerMapping, snapshotterHandlerMapping_, agentHttpsProxy, agentNoProxy, pullTypeMapping, pullTypeMapping_, _experimentalSetupSnapshotter, _experimentalForceGuestPull, _experimentalForceGuestPull_*). - Remove backward compatibility code from _helpers.tpl that checked for legacy env values. - Remove legacy env.shims check from runtimeclasses.yaml. - Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS env vars from kata-deploy.yaml and post-delete-job.yaml. - Update RBAC to only include runtimeclasses get/patch permissions (needed for NFD patching), removing create/delete/list/update/watch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-13 16:32:00 +01:00
Fabiano Fidêncio	c7d0c270ee	release: Bump version to 3.24.0 Bump VERSION and helm-chart versions Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-12 18:15:41 +01:00
Fabiano Fidêncio	5b6a2d25bc	podOverhead: Reduce memory overhead for GPU runtime classes Now that we've bumped to QEMU 10.2.0-rc1, we can take advantage of a fix that's present there, which fixes the double memory allocation for the cases where GPUs are being cold-plugged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-06 00:16:43 +01:00
Fabiano Fidêncio	aaa67df4dd	versions: Bump experimental {tdx,snp} QEMU Let's bump experimental {tdx,snp} QEMU to the tags created today in the Confidential Containers repo, which match with QEMU 10.2.0-rc1. This bump is especially beneficial for us, as we can get rid of QEMU's double memory allocation when cold plugging a GPU. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-05 18:58:35 +01:00
Fabiano Fidêncio	923f97bc66	rootfs: Temporarily revert "gpu: Handle root_hash.txt correctly" This reverts commit `e4a13b9a4a`, as it caused some issues with the GPU workflows. Reverting it is better, as it unblocks other PRs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-05 11:47:37 +01:00
Steve Horsman	d27af53902	Merge pull request #12185 from stevenhorsman/runtime-rs-required-checks ci: Add qemu-runtime-rs AKS tests to required	2025-12-05 10:43:25 +00:00
stevenhorsman	403de2161f	version: Update golang to 1.24.11 Needed to fix: ``` Vulnerability #1: GO-2025-4155 Excessive resource consumption when printing error string for host certificate validation in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-4155 Standard library Found in: crypto/x509@go1.24.9 Fixed in: crypto/x509@go1.24.11 Vulnerable symbols found: #1: x509.HostnameError.Error ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-04 22:50:07 +01:00
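
The cache-then-fallback flow described in commit 1388a3acda (try the ORAS cache first, fall back to the upstream download on a miss, push to the cache only when PUSH_TO_REGISTRY=yes) can be sketched roughly as follows. This is an illustrative Python sketch, not the contents of download-with-oras-cache.sh: the function name `download_with_cache` and its callable parameters are hypothetical stand-ins for the real ORAS and upstream download steps, and the check for existing versions that the commit mentions for parallel builds is left out.

```python
from typing import Callable, Optional


def download_with_cache(
    name: str,
    fetch_from_cache: Callable[[str], Optional[bytes]],   # hypothetical: returns None on a miss
    fetch_from_upstream: Callable[[str], bytes],          # hypothetical: upstream download
    push_to_cache: Callable[[str, bytes], None],          # hypothetical: cache upload
    push_to_registry: str = "no",                         # mirrors PUSH_TO_REGISTRY
) -> bytes:
    """Return the tarball bytes, preferring the cache over upstream."""
    cached = fetch_from_cache(name)
    if cached is not None:
        # Cache hit: upstream is never contacted.
        return cached

    # Cache miss: fall back to the upstream download.
    data = fetch_from_upstream(name)

    # Only populate the cache when explicitly asked to.
    if push_to_registry == "yes":
        push_to_cache(name, data)
    return data


# In-memory stand-ins for the registry and the upstream server.
cache = {}
upstream = {"gperf.tar.gz": b"gperf-bytes"}
upstream_calls = []


def fake_upstream(name: str) -> bytes:
    upstream_calls.append(name)
    return upstream[name]


# First call misses the cache, downloads from upstream, and pushes.
first = download_with_cache(
    "gperf.tar.gz", cache.get, fake_upstream, cache.__setitem__, push_to_registry="yes"
)
# Second call is served from the cache.
second = download_with_cache(
    "gperf.tar.gz", cache.get, fake_upstream, cache.__setitem__, push_to_registry="yes"
)
```

Passing the fetch and push steps in as callables keeps the ordering logic separate from any particular registry client, which is why the sketch can be exercised with plain dictionaries.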


1986 Commits