kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-03-18 18:58:36 +00:00

Author	SHA1	Message	Date
stevenhorsman	aa11441c1a	workflows: Create workflow to stale issues based on date The standard stale/action is intended to be run regularly with a date offset, but we want to have one we can run against a specific date in order to run the stale bot against issues created since a particular release milestone, so calculate the offset in one step and use it in the next. At the moment we want to run this to stale issues before 9th October 2022 when Kata 3.0 was released, so default to this. Note the stale action only processes a few issues at a time to avoid rate limiting, which is why we want a cron job so it can get through the backlog, but also to stale/unstale issues that are commented on.	2026-01-22 11:32:01 +00:00
Steve Horsman	2cd76796bd	Merge pull request #12305 from stevenhorsman/fix-stalebot-permissions ci: Fix stalebot permissions	2026-01-22 10:02:43 +00:00
Hyounggyu Choi	bc131a84b9	GHA: Set timeout for kata-deploy and kbs cleanup It was observed that some kata-deploy cleanup steps could hang, causing the workflow to never finish properly. In these cases, a QEMU process was not cleaned up and kept printing debug logs to the journal. Over time, this maxed out the runner’s disk usage and caused the runner service to stop. Set timeouts for the relevant cleanup steps to avoid this. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-01-22 10:32:24 +01:00
Fabiano Fidêncio	dacb14619d	kata-deploy: Make verification ConfigMap a regular resource The verification job mounts a ConfigMap containing the pod spec for the Kata runtime test. Previously, both the ConfigMap and the Job were Helm hooks with different weights (-5 and 0 respectively). On k3s, a race condition was observed where the Job pod would be scheduled before the kubelet's informer cache had registered the ConfigMap, causing a FailedMount error: MountVolume.SetUp failed for volume "pod-spec": object "kube-system"/"kata-deploy-verification-spec" not registered This happened because k3s's lightweight architecture schedules pods very quickly, and the hook weight difference only controls Helm's ordering, not actual timing between resource creation and cache sync. By making the ConfigMap a regular chart resource (removing hook annotations), it is created during the main chart installation phase, well before any post-install hooks run. This guarantees the ConfigMap is fully propagated to all kubelets before the verification Job starts. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	89e287c3b2	kata-deploy: Add more permissions to verification job's RBAC The verification job needs to list nodes to check for the katacontainers.io/kata-runtime label and list events to detect FailedCreatePodSandBox errors during pod creation. This was discovered when testing with k0s, where the service account lacked the required cluster-scope permissions to list nodes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	869dd5ac65	kata-deploy: Enable dynamic drop-in support for k0s Remove k0s-worker and k0s-controller from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT and always return true for k0s in is_containerd_capable_of_using_drop_in_files since k0s auto-loads from containerd.d/ directory regardless of containerd version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	d4ea02e339	kata-deploy: Add microk8s support with dynamic version detection Add microk8s case to get_containerd_paths() method and remove microk8s from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT to enable dynamic containerd version checking. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	69dd9679c2	kata-deploy: Centralize containerd path management Introduce ContainerdPaths struct and get_containerd_paths() method to centralize the complex logic for determining containerd configuration file paths across different Kubernetes distributions. The new ContainerdPaths struct includes: - config_file: File to read containerd version from and write to - backup_file: Backup file path before modification - imports_file: File to add/remove drop-in imports from (Option<String>) - drop_in_file: Path to the drop-in configuration file - use_drop_in: Whether drop-in files can be used Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	606c12df6d	kata-deploy: fix JSONPath parsing for labels with dots The JSONPath parser was incorrectly splitting on escaped dots (\.) causing microk8s detection to fail. Labels like "microk8s.io/cluster" were being split into ["microk8s\", "io/cluster"] instead of being treated as a single key. This adds a split_jsonpath() helper that properly handles escaped dots, allowing the automatic microk8s detection via the node label to work correctly. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	ec18dd79ba	tests: Simplify kata-deploy test to use helm directly The kata-deploy test was using helm_helper which made it hard to debug failures (die() calls would cause "Executed 0 tests" errors) and added unnecessary complexity. The test now calls helm directly like a user would, making it simpler and more representative of real-world usage. The verification job status is explicitly checked with proper failure detection instead of relying on helm --wait. Timeouts are configurable via environment variables to account for different network speeds and image sizes: - KATA_DEPLOY_TIMEOUT (default: 600s) - KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s) - KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s) Documentation has been added to explain what each timeout controls and how to customize them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	86e0b08b13	kata-deploy: Improve verification job timing and failure detection The verification job now supports configurable timeouts to accommodate different environments and network conditions. The daemonset timeout defaults to 1200 seconds (20 minutes) to allow for large image downloads, while the verification pod timeout defaults to 180 seconds. The job now waits for the DaemonSet to exist, pods to be scheduled, rollout to complete, and nodes to be labeled before creating the verification pod. A 15-second delay is added after node labeling to allow kubelet time to refresh runtime information. Retry logic with 3 attempts and a 10-second delay handles transient FailedCreatePodSandBox errors that can occur during runtime initialization. The job only fails on pod errors after a 30-second grace period to avoid false positives from timing issues. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	2369cf585d	tests: Fix retry loop bugs in helm_helper The retry loop in helm_helper had two bugs: 1. Counter initialized to 10 instead of 0, causing immediate failure 2. Exit condition used -eq instead of -ge, incorrect for loop logic These bugs would cause helm_helper to fail immediately on the first retry attempt instead of properly retrying up to max_tries times. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
stevenhorsman	19efeae12e	workflow: Fix stalebot permissions When looking into stale bot more for issues, I realised that our existing stale job would need permissions to work. Unfortunately the behaviour of the actions without these permissions is to log, but still finish as successful. This means it was hard to spot we had an issue. Add the required permissions to get this working again and improve the message Also add concurrency rule to make zizmor happy Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 17:28:59 +00:00
Steve Horsman	70f6543333	Merge pull request #12371 from stevenhorsman/cargo-check build: Add cargo check	2026-01-21 14:50:07 +00:00
Steve Horsman	4eb50d7b59	Merge pull request #12334 from stevenhorsman/rust-linting-improvements Rust linting improvements	2026-01-21 14:01:37 +00:00
Steve Horsman	ba47bb6583	Merge pull request #11421 from kata-containers/dependabot/go_modules/src/runtime/github.com/urfave/cli-1.22.17 build(deps): bump github.com/urfave/cli from 1.22.14 to 1.22.17 in /src/runtime	2026-01-21 11:46:02 +00:00
stevenhorsman	62847e1efb	kata-ctl: Remove unnecessary unwrap Switch `is_err()` and then `unwrap_err()` for `if let` which is "more idiomatic" Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:53:40 +00:00
stevenhorsman	78824e0181	agent: Remove unnecessary unwrap Switch `is_some()` and then `unwrap()` for `if let` which is "more idiomatic" Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:53:40 +00:00
stevenhorsman	d135a186e1	libs: Remove unnecessary unwrap Switch `is_err()` and then `unwrap_err()` for `if let` which is "more idiomatic" Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	949e0c2ca0	libs: Remove unused imports Tidy up the imports Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	83b0c44986	dragonball: Remove unused imports Clean up the imports Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	7a02c54b6c	kata-ctl: Allow unused assigned in clap parsing command isn't ever read, but leave it in for now, so we don't disrupt the parsing option Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	bf1539b802	libs: Replace manual default HugePageType has a manual default that can be derived more concisely Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:47 +00:00
stevenhorsman	0fd9eebf0f	kata-ctl: Update Cargo.lock The cargo check identified that the lock file is out of date, so bump this to fix the issue Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-20 16:07:34 +00:00
stevenhorsman	3f1533ae8a	build: Add cargo check We've had a couple of occasions that Cargo.lock has been out of sync with Cargo.toml, so try and extend our rust check to pick this up in the CI. There is probably a more elegant way than doing `cargo check` and checking for changes, but I'll start with this approach Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-20 16:07:34 +00:00
Greg Kurz	cf3441bd2c	agent: Refresh `Cargo.lock` Downstream builders at Red Hat complain that `Cargo.lock` doesn't match `Cargo.toml`. Run `cargo check` to refresh `Cargo.lock`. `git bisect` shows that `7cfb97d41b` is the first commit where `cargo check` has an effect in `src/agent`. Signed-off-by: Greg Kurz <groug@kaod.org>	2026-01-20 14:44:47 +01:00
Fabiano Fidêncio	e0158869b1	tests: Add common bats test runner function Add run_bats_tests() function to common.bash that provides consistent test execution and reporting across all test suites (k8s, nvidia, kata-deploy). This removes duplicated test runner code from run_kubernetes_tests.sh, run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-20 12:31:55 +01:00
Fabiano Fidêncio	5aff81198f	helm-chart: Fix warnings on README nydus -> `nydus` erofs -> `erofs` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 22:41:50 +01:00
Fabiano Fidêncio	b5a986eacf	kata-deploy: Add runtime-rs TDX / SNP runtimeclasses https://github.com/kata-containers/kata-containers/pull/11534 has been merged and it added all the needed bits to deploy the QEMU SNP / TDX runtime-rs variants, apart from the kata-deploy additions, which is done by this PR. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 22:41:50 +01:00
Fabiano Fidêncio	c7570427d2	tests: Add report generation to NVIDIA tests The NVIDIA GPU test runner script was not generating test reports, causing the report_tests() function in gha-run.sh to have nothing to display. This aligns the script with run_kubernetes_tests.sh by: - Adding set -o pipefail for proper pipeline error handling - Creating a reports directory with timestamped subdirectory - Capturing test output to files with ok-/not_ok- prefixes - Adding --timing flag to bats for timing information Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 18:21:43 +01:00
Fabiano Fidêncio	c1216598e8	static-checks: Fix kata-deploy reference Let's just point to the official documentation rather than explaining exactly how to deploy (and the current text was very outdated). Removing fluentd / minikube examples is out of context of this commit. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 15:09:20 +01:00
Fabiano Fidêncio	96e1fb4ca6	tools: Remove runk The runk tool hasn't been supported for a few years, with no maintainers since ManaSugi stopped being involved in the project and the CI was disabled in 2024. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:43:53 +01:00
Fabiano Fidêncio	f68c25de6a	kata-deploy: Switch to the rust version Let's remove the script and rely only on the rust version from now on. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:07:49 +01:00
Fabiano Fidêncio	d7aa793dde	Revert "ci: Run a nightly job using the kata-deploy rust" This reverts commit `6130d7330f`, as we're officially switching to the rust version of kata-deploy. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:07:49 +01:00
Fabiano Fidêncio	17472f3f10	release: scripts: Accept KATA_TOOLS_STATIC_TARBALL env var `a2534e7bc8` introduced the logic to also release a kata-tools tarball, but it missed allowing KATA_TOOLS_STATIC_TARBALL env var to be passed to the release script, leading to the following error during the release process: ``` ERROR: Invalid environment variable "KATA_TOOLS_STATIC_TARBALL" ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> 3.25.0	2026-01-19 13:03:23 +01:00
Fabiano Fidêncio	882862d711	release: Bump version to 3.25.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 11:33:45 +01:00
XanderC	93beb58c5d	runtime: fix network initialization for non-hotplug VMMs In startVM(), for VMMs without hotplug support (e.g., Firecracker or QEMU microvm), the runtime runs prestart hooks but misses rescanning the network namespace. This causes VMs to boot with uninitialized network configs, as updates from CNI plugins are not captured. This patch adds a network rescan via AddEndpoints after prestart hooks for the non-hotplug path, ensuring correct network info is passed to the VMM configuration before the VM starts. Fixes #11500 Signed-off-by: XanderC <xanderc@qq.com>	2026-01-17 23:56:59 +01:00
Zvonko Kaiser	428cc5d586	gpu: Chroot Cleanup With the newest NVRC we do not need the supported GPUs anymore. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-17 19:27:24 +01:00
Fabiano Fidêncio	1c154b4c15	kernel: Add DAX fix for arm64 The patch has been provided upstream by Seunguk Shin and is already approved. We'll drop it once it becomes available in the LTS tree. Reference: https://lore.kernel.org/all/18af3213-6c46-4611-ba75-da5be5a1c9b0@arm.coum Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-17 19:15:53 +01:00
Fabiano Fidêncio	33b1f0786e	Revert "arm64: Do not use DAX with the rootfs image" This reverts commit `2acb94ef2d`, as we have a kernel patch approved fixing the issue. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-17 19:15:53 +01:00
Alex Lyn	fe15f2fa47	runtime-rs: Remove deprecated virtio-9p virtio-9p has not been supported for a long time, especially within runtime-rs, and we have no plan to support it. Removing the related items is reasonable. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	b7cfc6fd72	runtime-rs: Remove mem-agent section from TDX/SNP configurations As the Memory Agent feature is not used in CoCo (TDX/SNP) scenarios, it is better to just remove the related sections. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	634ec2b56d	runtime-rs: Add configurable SNP items in Makefile when make build Introduce the related items in the Makefile to enable AMD SEV-SNP settings in the configuration when running make build, and make it possible to generate the rendered qemu-snp-runtime-rs configuration from the *.in template. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	0abdb8e016	runtime-rs: Introduce a qemu-runtime-rs/SEV-SNP dedicated configuration To make it work well on the SEV-SNP platforms for qemu-runtime-rs with coco, a dedicated SEV-SNP configuration should be introduced to help prepare related CVM resources. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	b0a82f7bb8	runtime-rs: Enable measured rootfs within configuration when make build Enable measured rootfs within configuration when make build. And add some other important items to make the configuration work well. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	3799855040	runtime-rs: Add configurable TDX items in Makefile when make build Introduce the related items in the Makefile to enable Intel TDX settings in the configuration when running make build, and make it possible to generate the rendered qemu-tdx-runtime-rs configuration from the *.in template. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	4d55e2c8c8	runtime-rs: Introduce a dedicated configuration for qemu-runtime-rs/TDX To make it work well on the TDX platforms for qemu-runtime-rs with coco, a dedicated TDX configuration should be introduced to help prepare related CVM resources. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Manuel Huber	956f43c6c6	runtime: skip MoveTo for systemd cgroups Systemd-managed cgroups use the slice:prefix:name format, which is not a filesystem path. Calling MoveTo() on such paths fails with "invalid group path" and can abort cleanup before Delete() runs. In some cases, this causes pod teardown delays. Skip MoveTo for systemd-formatted sandbox/overhead cgroup paths when sandbox_cgroup_only is true; systemd moves tasks on unit deletion. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-16 16:41:38 +01:00
Manuel Huber	6b70923e55	docs: Update NVIDIA GPU passthrough QEMU scenario With cold-plug becoming by design the only supported mode with the update of NVRC to v0.1.1, resolving references to hot-plug. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-16 13:50:10 +01:00
Steve Horsman	610a8bdfd5	Merge pull request #12346 from Amulyam24/ppc64le-payload ci: move the job publish kata payload after push to an alternate runner for ppc64le	2026-01-16 11:41:53 +00:00
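
Commit `606c12df6d` in the log above describes a `split_jsonpath()` helper that keeps escaped dots (`\.`) inside a single key. The log does not show the helper itself, so the following is only a minimal sketch of that idea: the signature, the return type, and the example path are assumptions, not the actual kata-deploy code.

```rust
/// Split a JSONPath-like string on unescaped dots, treating `\.` as a
/// literal dot inside a key. Hypothetical sketch; the real helper's
/// signature and edge-case handling are not shown in the commit log.
fn split_jsonpath(path: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // A backslash followed by a dot is an escaped dot: keep the
            // dot as part of the current key and drop the backslash.
            '\\' if chars.peek() == Some(&'.') => {
                chars.next();
                current.push('.');
            }
            // An unescaped dot ends the current segment.
            '.' => parts.push(std::mem::take(&mut current)),
            other => current.push(other),
        }
    }
    parts.push(current);
    parts
}

fn main() {
    // Illustrative path only; the label name comes from the commit message.
    let parts = split_jsonpath(r"metadata.labels.microk8s\.io/cluster");
    println!("{:?}", parts);
}
```

With this approach the label `microk8s.io/cluster` stays one segment instead of being split at its dot. A leading dot in the input would yield an empty first segment, which a real implementation might want to filter out.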

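Commit `69dd9679c2` lists the fields of the new `ContainerdPaths` struct. A rough sketch of how such a struct might look is given below. Only the field names and the `Option<String>` type of `imports_file` come from the commit message; the other field types, the `imports_target()` helper, and the example paths are assumptions made for illustration.

```rust
/// Sketch of the struct described in the commit message. Apart from
/// `imports_file: Option<String>`, the field types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct ContainerdPaths {
    /// File to read the containerd version from and write to.
    config_file: String,
    /// Backup file path used before modification.
    backup_file: String,
    /// File to add/remove drop-in imports from, if any.
    imports_file: Option<String>,
    /// Path to the drop-in configuration file.
    drop_in_file: String,
    /// Whether drop-in files can be used.
    use_drop_in: bool,
}

impl ContainerdPaths {
    /// Hypothetical helper (not from the commit): pick the file whose
    /// imports should be edited, falling back to `config_file` when no
    /// separate imports file is set.
    fn imports_target(&self) -> &str {
        self.imports_file
            .as_deref()
            .unwrap_or(self.config_file.as_str())
    }
}

fn main() {
    // Placeholder paths for illustration only.
    let paths = ContainerdPaths {
        config_file: "/etc/containerd/config.toml".to_string(),
        backup_file: "/etc/containerd/config.toml.bak".to_string(),
        imports_file: None,
        drop_in_file: "/etc/containerd/conf.d/kata.toml".to_string(),
        use_drop_in: true,
    };
    println!("{:?} -> edit imports in {}", paths, paths.imports_target());
}
```

Grouping the paths in one value means each distribution-specific case only has to fill in a struct, rather than callers recomputing individual paths.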

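Commit `2369cf585d` fixes two retry-loop bugs in the shell `helm_helper`: a counter that started at 10 instead of 0, and an exit test using `-eq` where `-ge` was needed. The same logic is sketched here in Rust rather than shell, purely as an illustration; the `retry` function and its signature are hypothetical and do not appear in the repository.

```rust
/// Call `attempt` until it succeeds or `max_tries` attempts have failed.
/// Returns the number of attempts used on success, or the last error.
/// Hypothetical Rust analogue of the shell loop described in the commit.
fn retry<F>(max_tries: u32, mut attempt: F) -> Result<u32, String>
where
    F: FnMut(u32) -> Result<(), String>,
{
    // Start counting at 0; a non-zero start would eat into the budget.
    let mut count = 0;
    loop {
        match attempt(count) {
            Ok(()) => return Ok(count + 1),
            Err(e) => {
                count += 1;
                // Use >= rather than ==, so the loop still stops if the
                // counter ever overshoots `max_tries`.
                if count >= max_tries {
                    return Err(e);
                }
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Fails twice, then succeeds on the third attempt.
    let result = retry(5, |_| {
        calls += 1;
        if calls < 3 {
            Err("not ready".to_string())
        } else {
            Ok(())
        }
    });
    println!("{:?}", result);
}
```

A real retry helper would normally sleep between attempts; that is left out here to keep the sketch short.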
17730 Commits
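
Commits `62847e1efb`, `78824e0181` and `d135a186e1` replace an `is_err()` / `is_some()` check followed by an unwrap with `if let`. The functions below are made-up examples, not code from the repository; they only show the shape of that change on a `Result`.

```rust
/// Pattern being replaced: check `is_err()`, then unwrap the error.
fn describe_before(result: &Result<u32, String>) -> String {
    if result.is_err() {
        let err = result.as_ref().unwrap_err();
        return format!("failed: {}", err);
    }
    "ok".to_string()
}

/// Same behaviour with `if let`, which binds the error in one step and
/// leaves no unwrap that could panic if the check above it changed.
fn describe_after(result: &Result<u32, String>) -> String {
    if let Err(err) = result {
        return format!("failed: {}", err);
    }
    "ok".to_string()
}

fn main() {
    let bad: Result<u32, String> = Err("boom".to_string());
    let good: Result<u32, String> = Ok(1);
    println!("{} / {}", describe_before(&bad), describe_after(&bad));
    println!("{} / {}", describe_before(&good), describe_after(&good));
}
```

Both functions return the same strings for the same input; the second form is the one the commit messages call "more idiomatic".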