kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-21 14:22:24 +00:00

Author	SHA1	Message	Date
Jacek Tomasiak	8025fa0457	agent: Don't pass empty options to mount With some older kernels some fs implementations don't handle empty options strings well. This leads to failures in "setup rootfs" step. E.g. `cgroup: cgroup2: unknown option ""`. This is fixed by mapping empty string to `None` before passing to `nix::mount`. Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com> Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>	2026-02-16 14:55:59 +01:00
Fabiano Fidêncio	a04df4f4cb	kata-deploy: disable provenance/SBOM for quay.io compatibility Disable provenance and SBOM when building per-arch kata-deploy images so each tag is a single image manifest. quay.io rejects pushing multi-arch manifest lists that include attestation manifests (400 manifest invalid). Add a note in the release script documenting this. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:32:25 +01:00
Fabiano Fidêncio	d000acfe08	infra: fix multi-arch manifest publish Per-arch images were failing publish-multiarch-manifest with 'X is a manifest list' because Buildx now enables attestations by default, so each arch tag became an image index. Use 'docker buildx imagetools create' instead of 'docker manifest create' so we can merge those indexes into the final multi-arch manifest while keeping provenance and SBOM on per-arch images. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-14 19:49:00 +01:00
Fabiano Fidêncio	02c9a4b23c	kata-deploy: Temporarily comment GPU specific labels We depend on GPU Operator v26.3 release, which is not out yet. Although we have been testing with it, it's not yet publicly available, which would break anyone actually trying to use the GPU runtime classes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 09:25:14 +01:00
Fabiano Fidêncio	5106e7b341	build: Add gnupg to the agent's builder container Otherwise we'll fail to check gperf's GPG signing key when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	79b5022a5a	kata-ctl: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	30ebc4241e	genpolicy: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	87d1979c84	agent-ctl: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	90dbd3f562	agent: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	7f77948658	versions: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
Aurélien Bombo	981f693a88	Merge pull request #11140 from balintTobik/hyperv_warning runtime: refactor hypervisor devices cgroup creation	2026-02-13 15:16:09 -06:00
Fabiano Fidêncio	d8acc403c8	kata-deploy: set CRI images runtime_platform snapshotter for containerd v3 In containerd config v3 the CRI plugin is split into runtime and images, and setting the snapshotter only on the runtime plugin is not enough for image pull/prepare. The images plugin must have runtime_platform.<runtime>.snapshotter so it uses the correct snapshotter per runtime (e.g. nydus, erofs). A PR on the containerd side is open so we can rely on the runtime plugin snapshotter alone: https://github.com/containerd/containerd/pull/12836 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 22:15:02 +01:00
Fabiano Fidêncio	2930c68c0b	ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats This is a follow-up for `25962e9325` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 20:56:08 +01:00
Fabiano Fidêncio	f6e0a7c33c	scripts: use temporary GPG home when verifying cached gperf tarball In CI the default GPG keyring is often read-only or missing, so 'gpg --import' of the cached keyring fails and verification cannot succeed. Use a temporary GNUPGHOME for import and verify so cached gperf can be verified without writing to the system keyring. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 19:39:55 +01:00
stevenhorsman	55a89f6836	runtime: doc: Remove usage of golang.org/x/net/context This package is deprecated and we aren't using it any more Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	06246ea18b	csi-kata-directvolume: Remove usage of golang.org/x/net/context This package is deprecated, so use the standard library context package instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	f2fae93785	csi-kata-directvolume: Bump x/net to v0.50 Remediates CVEs: - GO-2026-4440 - GO-2026-4441 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	74d4469dab	ci/openshift-ci: Bump x/net to v0.50 Remediates CVEs: - GO-2026-4440 - GO-2026-4441 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
Steve Horsman	bb867149bb	Merge pull request #12514 from fidencio/topic/nvidia-try-to-improve-genpolicy-failures tests: nvidia: Fix genpolicy error when pulling nvcr.io images	2026-02-13 16:34:00 +00:00
Joji Mekkattuparamban	f3bba08851	kata-deploy: add node selector to nvidia runtime classes The CC runtime classes kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx are mutually exclusive with kata-qemu-nvidia-gpu, as dictated by the gpu cc mode setting. In order to properly support a cluster that has both CC and non-CC nodes, we use a node selector so the scheduling is consistent with the GPU mode. The GPU operator sets a label nvidia.com/cc.ready.state=[true, false] to indicate the gpu mode setting Fixes #12431 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-02-13 15:58:06 +01:00
Fabiano Fidêncio	8cb7d0be9d	tests: nvidia: Fix genpolicy error when pulling nvcr.io images genpolicy pulls image manifests from nvcr.io to generate policy and was failing with 'UnauthorizedError' because it had no registry credentials. Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential() in registry.rs, which reads from DOCKER_CONFIG/config.json. Add setup_genpolicy_registry_auth() to create a Docker config with nvcr.io auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it can authenticate when pulling manifests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 13:12:55 +01:00
Fabiano Fidêncio	f4dcb66a3c	ci: add workflow to push ORAS tarball cache Add push-oras-tarball-cache workflow that runs on push to main when versions.yaml changes (and on workflow_dispatch). It populates the ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml. Remove the push_to_cache call from download-with-oras-cache.sh since it was never triggered in CI. Cache population is now done solely by the new workflow and by populate-oras-tarball-cache.sh when run manually. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 12:57:48 +01:00
Balint Tobik	295a6a81d0	runtime: refactor hypervisor devices cgroup creation Add hypervisor devices to the cgroup separately to omit irrelevant warnings, and fail if none of them are available. Also fix a testcase to reload removed kernel modules for later testcases, and skip some tests on ARM because of the lack of virtualization support Fixes #6656 Signed-off-by: Balint Tobik <btobik@redhat.com>	2026-02-13 09:23:08 +01:00
Aurélien Bombo	14be9504e7	Merge pull request #12506 from kata-containers/sprt/gperf-mirror versions: Switch gperf mirror again	2026-02-12 17:00:17 -06:00
Fabiano Fidêncio	a01e95b988	kata-deploy: test k3s/rke2 template handling / version checks Add tests for the split_non_toml_header helper that strips Go template directives before TOML parsing, and for every TOML operation (set, get, append, remove, set_array) on files that start with {{ template "base" . }}. Also converts the containerd version detection tests in manager.rs from individual #[test] functions with helper wrappers to parametrized #[rstest] cases, which is more readable and easier to extend. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Fabiano Fidêncio	2e7633674f	kata-deploy: use k3s/rke2 base template K3s docs (https://docs.k3s.io/advanced#configuring-containerd) say that the right way to customize containerd is to extend the base template with {{ template "base" . }} and append your own TOML blocks, rather than copying a prerendered config.toml into the template file. We were copying config.toml into config.toml.tmpl / config-v3.toml.tmpl, which meant we were replacing the K3s defaults with a snapshot that gets stale as soon as K3s is upgraded. Now we create the template files with just the base directive and let our regular set_toml_value code path append the Kata runtime configuration on top. To make that work, the TOML utils learned to handle files that start with a Go template line ({{ ... }}): strip it before parsing, put it back when writing. This keeps the K3s/RKE2 path identical to every other runtime -- no special append logic needed. refs: * k3s:: https://docs.k3s.io/advanced#configuring-containerd * rke2: https://docs.rke2.io/advanced?_highlight=conyainerd#configuring-containerd Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Aurélien Bombo	199e1ab16c	versions: Switch gperf mirror again The mirror introduced by #11178 still breaks quite often so apply this as a quick fix. A proper solution would probably be to load balance like in #12453. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-12 13:41:19 -06:00
Fabiano Fidêncio	6a3bbb1856	tests: Retry k8s deployment We've seen a lot of spurious issues when deploying the infra needed for the tests. Let's give it a few tries before actually failing. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-12 20:13:59 +01:00
Manuel Huber	ed7de905b5	build: Tighten upstream download path for ORAS The gperf-3.3 tarball frequently fails to download on my end with cryptic error messages such as: "tar: This does not look like a tar archive". This change tightens the download logic a bit: We fail at the point in time when we're supposed to fail. This way we detect rate limiting issues right away, and this way, the actual hashsum and signature checks are effective, not only printouts. This change also updates the key reference and allows for an array, for instance, when a different signer was used for a cache vs upstream version. The change also makes it clear, that signature verification is only implemented for the gperf tarball. Improvements can be made in a subsequent change. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-12 19:20:35 +01:00
Fabiano Fidêncio	9fc5be47d0	kata-deploy: fix custom runtime config path for runtime-rs shims Custom runtimes whose base config lives under runtime-rs/ (e.g. dragonball, cloud-hypervisor) were not found because the path was always built under share/defaults/kata-containers/. Use get_kata_containers_original_config_path for the handler so rust shim configs are read from .../runtime-rs/. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 18:08:47 +01:00
Fabiano Fidêncio	50923b6d62	kata-deploy: run cleanup on uninstall via DaemonSet preStop On helm uninstall let's rely on a preStop hook to run kata-deploy cleanup so each pod cleans its node before exiting. We must keep RBAC (resource-policy: keep) so pods retain API access during termination, and then can properly delete the NodeFeatureRules and remove the labels from the nodes. The post-delete hook Job, which runs on a single node, is now only responsible for cleaning the kept RBAC (cluster-wide resource) after uninstall, not leaving any resource or artefact behind. The changes in this commit lead to a "resources were kept" message when running `helm uninstall`, which we document as being normal, as the post-delete job will remove those. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	6e0cbc28a3	kata-deploy: fix node label removal When removing a node label, JSON merge patch semantics require setting the key to null; omitting the key leaves it unchanged. Fix label_node to send a patch with the label key set to null so the API server actually removes katacontainers.io/kata-runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	510d2a69ae	kata-deploy: exit with 0 on SIGTERM in install mode Wait for SIGTERM after install and exit(0) so the container terminates cleanly. If registering the SIGTERM handler fails, log a warning and sleep forever instead of exiting with an error (fallback to the old behaviour). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Mikko Ylinen	25962e9325	tests/coco: disable k8s-sandbox-vcpus-allocation.bats for TDX After the move to Linux 6.17 and QEMU 10.2 from Kata, k8s-sandbox-vcpus-allocation.bats started failing on TDX. 2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created 2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created 2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created 2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met 2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits 2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits 2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating A manual test without agent policies added it seems to work OK but disable the test for now to get CI stable. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-11 22:02:59 +01:00
stevenhorsman	006a5d5141	versions: Tidy up versions file - We don't use containerd.latest as the comment on it suggests - We also don't have any references to `sriov-network-device` so remove that and the plugins section. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-11 20:49:53 +01:00
Dan Mihai	9d763e9d5a	Merge pull request #12439 from sespiros/genpolicy-suppress-yaml-stdout genpolicy: suppress YAML output when --{base64/raw}-out are used	2026-02-11 10:27:01 -08:00
Spyros Seimenis	282bfc9f14	genpolicy: suppress YAML output when --{base64/raw}-out are used this will suppress yaml output only if the input is passed via stdin. If {base64/raw}-out is passed in alongside a yaml file, the encoded annotation or the policy data respectively will be printed to stdout as before. Fixes #12438 Signed-off-by: Spyros Seimenis <sse@edgeless.systems>	2026-02-11 14:08:30 +02:00
Hyounggyu Choi	c84e37f6ac	Merge pull request #12486 from BbolroC/cpu-hotplug-s390x-runtime-rs runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P	2026-02-11 09:40:21 +01:00
Hyounggyu Choi	67f54bdcb5	tests: Remove skip condition for runtime-rs on s390x in k8s-cpu-ns This commit removes the skip condition for qemu-runtime-rs on s390x in k8s-cpu-ns.bats. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-02-11 05:52:13 +01:00
Hyounggyu Choi	eab77a26ab	runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P As s390x and ppc64 use a flat CPU topology without sockets and threads, this commit skips the socket_id and thread_id properties for vCPU hotplug on these architectures instead of aborting the operation. This change is in line with those in the Go runtime: - isSocketIDSupported() - isThreadIDSupported() Fixes: #12155 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-02-11 05:52:13 +01:00
Alex Lyn	c53910eb1b	Merge pull request #12408 from Apokleos/netdev-multiq runtime-rs: Add support configurable network_queues via configuration and annotation	2026-02-11 09:34:58 +08:00
stevenhorsman	a115d6d858	ci: Add copyright and license to shellcheckrc Make the static-checks happy Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
stevenhorsman	15d6a681ed	doc: Fix spelling issues Put things in backticks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
stevenhorsman	e84d234721	doc: Update broken/slow URLs Update the URLs to better/existing links Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	5c0269881e	tests: Make editorconfig-checker happy - Trim trailing whitespace and ensure final newline in non-vendor files - Add .editorconfig-checker.json excluding vendor dirs, .patch, .img, .dtb, .drawio, *.svg, and pkg/cloud-hypervisor/client so CI only checks project code - Leave generated and binary assets unchanged (excluded from checker) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	34199b09eb	runtime-rs: Properly parse containerd runtime options to extract ConfigPath The runtime-rs shim was failing to load its configuration when deployed via kata-deploy because it couldn't correctly parse the ConfigPath passed by containerd. The previous implementation naively skipped the first 2 bytes of the options and interpreted the rest as a UTF-8 string, which doesn't work since containerd passes a properly serialized protobuf message of type runtimeoptions.v1.Options. This change adds the runtimeoptions.proto definition to the protocols crate and updates the load_config function to correctly deserialize the protobuf message and extract the config_path field, matching how the Go runtime handles this via typeurl.UnmarshalAny. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	cb652e0da1	tests: Update NVRC trace to use drop-in config mechanism Update the enable_nvrc_trace() function to use the new drop-in configuration mechanism instead of directly modifying the base configuration file. The function now creates a 90-nvrc-trace.toml drop-in file that properly combines existing kernel parameters with the nvrc.log=trace setting. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	4cb2aea9dd	kata-deploy: Document drop-in configuration and add warning to config files When kata-deploy installs Kata Containers, the base configuration files should not be modified directly. This change adds documentation explaining how to use drop-in configuration files for customization, and prepends a warning comment to all deployed configuration files reminding users to use drop-in files instead. The warning is added to both standard shim configurations and custom runtime configurations. It includes a brief explanation of how drop-in files work and points users to the documentation for more details. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	d5d561abe5	kata-deploy: Add detailed logging for drop-in configuration Add clear INFO-level messages when creating drop-in configuration files, making it easy to understand what kata-deploy is doing during installation: - "Setting up runtime directory for shim: X" - "Generating drop-in configuration files for shim: X" - "Created drop-in file: <path>" When DEBUG mode is enabled (via DEBUG=true environment variable), also log the full content of each drop-in file to aid troubleshooting. The log level is now automatically set to Debug when the DEBUG environment variable is set, ensuring debug messages are visible. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	eddd1b507e	kata-deploy: Extract common drop-in generation into shared helper Deduplicate the drop-in file generation logic between configure_shim_config and install_custom_runtime_configs by extracting it into a shared write_common_drop_ins helper function. This ensures both standard and custom runtimes use the same code path for generating drop-in configuration files. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
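
Commit 2e7633674f describes the TOML utils learning to handle files that start with a Go template line: strip it before parsing, put it back when writing. The following is only a minimal sketch of that idea, not the kata-deploy implementation. The function names `split_template_header` and `join_template_header` are hypothetical (the log mentions a `split_non_toml_header` helper, but its signature is not shown here), and the sketch assumes the template directive occupies exactly the first line of the file.

```rust
/// Hypothetical helper: separate a leading Go template line such as
/// `{{ template "base" . }}` from the TOML body that follows it.
/// Returns `(None, content)` unchanged when the first line is not a
/// `{{ ... }}` directive.
fn split_template_header(content: &str) -> (Option<&str>, &str) {
    // Split off the first line; a file with no newline is a single line.
    let (first, rest) = match content.split_once('\n') {
        Some((first, rest)) => (first, rest),
        None => (content, ""),
    };
    let trimmed = first.trim();
    if trimmed.starts_with("{{") && trimmed.ends_with("}}") {
        (Some(trimmed), rest)
    } else {
        (None, content)
    }
}

/// Hypothetical helper: put the template line back in front of the
/// (possibly edited) TOML body before writing the file out.
fn join_template_header(header: Option<&str>, body: &str) -> String {
    match header {
        Some(h) => format!("{}\n{}", h, body),
        None => body.to_string(),
    }
}
```

In this sketch, the `body` returned by `split_template_header` would be handed to an ordinary TOML parser or editor, and the result passed through `join_template_header` so the written file still begins with the template directive.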


17910 Commits