kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-26 10:32:28 +00:00

Author	SHA1	Message	Date
LandonTClipp	fd896e4e76	ci: Add kata-dictionary.txt to required_tests.yaml This makes it so that changes to the kata-dictionary.txt file only trigger the static checks to run. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-04-15 14:48:01 +01:00
LandonTClipp	56cdfa831f	docs: Add annotation config to doc site Adding the pod annotation config to the doc site. A symlink is created at docs/pod-annotations.md that points to how-to/how-to-set-sandbox-config-kata.md so that the URL for this file will be created at `/pod-annotations`. Also adding brief contributing guidelines and how-to's for running the documentation site locally for local previews. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-04-15 14:48:01 +01:00
Fabiano Fidêncio	57898de1fe	Merge pull request #12845 from fidencio/topic/fix-signed-image-tests tests: Update images used for signed tests	2026-04-15 14:47:58 +02:00
Fabiano Fidêncio	ba9a02897e	genpolicy: make allowed cgroup v2 mount extras configurable Newer kernels and containerd versions (>= 2.2.3) may add extra mount options to /sys/fs/cgroup that genpolicy does not embed in the policy (e.g. nsdelegate, memory_recursiveprot). This causes the Kata agent to reject CreateContainerRequest with PERMISSION_DENIED because the check_mount rules require an exact match. Rather than hard-coding the allowed extras in Rego, make them configurable via genpolicy-settings.json under cluster_config.cgroup_mount_extras_allowed. The corresponding Rego rule (check_mount 4) reads the list from policy_data.cluster_config and allows only those named options beyond the policy-embedded set. To support this, cluster_config is now included in PolicyData so that it gets serialized into the Rego policy_data object at generation time. This follows the established pattern of keeping site- and version-specific tunables in genpolicy-settings.json so they can be overridden via JSON-Patch drop-ins without touching the Rego source. A policy test case is added to verify that the default allowed extras (nsdelegate, memory_recursiveprot) are accepted and that unknown extras are rejected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-15 13:24:21 +02:00
Fabiano Fidêncio	d29b77e953	tests: Update images used for signed tests I've updated the images on the Confidential Containers side, in order to add arm64 support, but I didn't realize it'd break tests not using those. Apologies! Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-15 12:11:37 +02:00
Fabiano Fidêncio	2d57b89857	Merge pull request #12805 from stevenhorsman/stale-bot-improvements Stale bot improvements	2026-04-14 23:20:41 +02:00
Fabiano Fidêncio	672d3f2b0f	workflows: Use docker buildx to build and push auth test image skopeo copy with --override-arch fails with "authentication required" during blob existence checks at the destination, regardless of how credentials are provided (--dest-creds, --authfile, REGISTRY_AUTH_FILE). This is a known issue with skopeo 1.13.x when copying from manifest list sources. Replace the skopeo/buildah approach with docker/build-push-action, which is already proven in this repo (build-kubectl-image.yaml) and handles multi-arch builds and Quay pushes reliably. The workflow now builds a trivial FROM busybox image using buildx with QEMU emulation. Fixes: `b0abe5999` ("workflows: Add workflow to create auth registry test image") Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-14 22:44:30 +02:00
Fabiano Fidêncio	09ef32eaf1	Merge pull request #12827 from fidencio/topic/kata-deploy-custom-containerd-config kata-deploy: Allow overriding containerd config path and file name	2026-04-14 22:23:33 +02:00
stevenhorsman	5ea30b33ae	workflows: stale-issue: Increase operations-per-run At a rate of default 30 per run, with over 1.5k issues, it will take us over 50 days to do a pass of the issues we have, so increase operations-per-run as suggested in the workflow by github to reduce this. Based on the stats of the latest run, we are not too close to hitting the API rate limit: ``` Github API rate used: 32 Github API rate remaining: 3693; reset at: Thu Apr 09 2026 10:23:31 GMT+0000 (Coordinated Universal Time) ``` so I think this should be okay. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-14 16:25:35 +01:00
stevenhorsman	a0359326e9	workflow: Bump stale action version v9 is based on Node.js 20 which is deprecated, so update to the latest to pick up a Node.js 24 version before Github removes Node 20 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-14 16:25:35 +01:00
Fabiano Fidêncio	0713b2d5d3	Merge pull request #12828 from kata-containers/dependabot/pip/docs/pillow-12.2.0 build(deps): bump pillow from 12.1.1 to 12.2.0 in /docs	2026-04-14 17:23:07 +02:00
Fabiano Fidêncio	661cfd7efa	Merge pull request #12800 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.43.0 build(deps): bump go.opentelemetry.io/otel/sdk from 1.40.0 to 1.43.0 in /src/runtime	2026-04-14 17:22:47 +02:00
dependabot[bot]	b54f02aa6c	build(deps): bump pillow from 12.1.1 to 12.2.0 in /docs Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.1.1 to 12.2.0. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/12.1.1...12.2.0) --- updated-dependencies: - dependency-name: pillow dependency-version: 12.2.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-14 14:40:14 +00:00
Steve Horsman	8289aaf0c7	Merge pull request #12831 from kata-containers/topic/ci-move-out-of-nodejs-20 ci: Update GitHub Actions to Node.js 24 compatible versions	2026-04-14 14:59:03 +01:00
Fabiano Fidêncio	c087eb92ec	ci: Update GitHub Actions to Node.js 24 compatible versions Node.js 20 is deprecated on GitHub Actions runners and will be forced to Node.js 24 starting June 2nd, 2026. Update all affected actions to versions that natively support Node.js 24: - actions/upload-artifact: v4.6.2 -> v6.0.0 - actions/download-artifact: v4.3.0 -> v7.0.0 - docker/build-push-action: v5.4.0 -> v7.0.0 - docker/login-action: v3.4.0 -> v4.1.0 - docker/setup-buildx-action: v3.10.0 -> v4.0.0 - docker/setup-qemu-action: v3.6.0 -> v4.0.0 - geekyeggo/delete-artifact: v5.1.0 -> v6.0.0 - azure/login: v2.3.0 -> v3.0.0 - azure/setup-kubectl: v4.0.1 -> v5.0.0 - nick-fields/retry: v3.0.2 -> v4.0.0 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-14 15:48:45 +02:00
Fabiano Fidêncio	7e464f13a5	Merge pull request #12830 from fidencio/topic/workflows-create-auth-registry-image workflows: Add workflow to create auth registry test image	2026-04-14 11:28:23 +02:00
Fabiano Fidêncio	b0abe59993	workflows: Add workflow to create auth registry test image Add a manually-triggered workflow that builds and pushes a multi-arch busybox-based image to quay.io/kata-containers/confidential-containers-auth for use as an authenticated container image in CI tests. The workflow uses skopeo to copy per-arch images and buildah to create and push the multi-arch manifest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-14 10:59:12 +02:00
Fabiano Fidêncio	b0a87880e7	Merge pull request #12826 from fidencio/topic/fix-concurrent-map-access-in-wait runtime: Fix concurrent map read/write panic in Wait()	2026-04-14 08:48:52 +02:00
Fabiano Fidêncio	df1d02d3cf	kata-deploy: Allow overriding containerd config path and file name Add two new Helm values under `containerd`: - `configDir`: overrides the host directory where the containerd config lives, taking precedence over the k8sDistribution-based auto-detection. - `configFileName`: overrides the containerd config file name, propagated to the kata-deploy binary via the new CONTAINERD_CONFIG_FILE_NAME environment variable. These are useful for non-standard containerd setups that don't match any of the built-in k8sDistribution presets (k8s, k3s, rke2, k0s, microk8s). The config file name override only affects the default runtime branch in get_containerd_paths(). The k0s/microk8s/k3s/rke2 branches are left untouched since those runtimes have mandatory file naming conventions. Also fixes a spurious leading space in the k3s containerdConfPath branch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-13 22:31:55 +02:00
Fabiano Fidêncio	b17dd2a902	runtime: Fix concurrent map read/write panic in Wait() Wait() was releasing s.mu immediately after getContainer(), then calling getExec() — which reads c.execs — without holding any lock. Concurrent Exec() or Delete() calls that write to c.execs under s.mu triggered a "concurrent map read and map write" fatal panic. Add a dedicated sync.RWMutex to the container struct that protects the execs map. getExec() now acquires a read lock internally, and all writes go through new setExec()/deleteExec() helpers that acquire the write lock. This keeps the locking concern local to the map and avoids complicating the s.mu usage in Wait(). Add a regression test (TestConcurrentExecAccess) that exercises concurrent getExec reads against setExec/deleteExec writes; this reliably reproduces the panic under the race detector without the fix. Fixes: #12825 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-13 21:14:28 +02:00
Fabiano Fidêncio	4c567a9c05	ci: Reduce TEE test scope for PR runs TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full test suite on every PR consumes these resources unnecessarily, since most tests exercise what is already exercised by the -coco-dev CIs. Introduce a `tee-test-scope` workflow input (small/full) and a new `baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests that are TEE-relevant: attestation tests (encrypted/authenticated/ signed image pull, confidential attestation) plus policy and trusted ephemeral data storage tests. PR runs default to "small" (12 tests), nightly runs use "full" (59 tests), and manual dispatch offers a dropdown to choose. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-13 20:26:46 +02:00
dependabot[bot]	b303600283	build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.43.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.43.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.43.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-13 10:36:44 +00:00
Fabiano Fidêncio	bd6377a038	Merge pull request #12614 from manuelh-dev/mahuber/image-signing-nim tests: nvidia: Enforce image signing for NIM test	2026-04-11 14:48:04 +02:00
Fabiano Fidêncio	5eb7844183	Merge pull request #12430 from stevenhorsman/cargo-deny-static-checks static-checks: Rework cargo deny check	2026-04-11 12:05:53 +02:00
stevenhorsman	8be3a24112	ci: Update cargo-deny in gatekeeper Update the name and move it to the static checks as we don't need to ensure it's running for non-code changes. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	9448988783	workflow: Update cargo deny check The cargo deny generated action doesn't seem to work and seems unnecessarily complex, so try using EmbarkStudios/cargo-deny-action instead Fixes: #11218 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	a0410e0d5c	static-checks: Update cargo deny config The previous config is not valid, so update it based on information from https://embarkstudios.github.io/cargo-deny/checks/cfg.html Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	a32c6fd9ff	mem-agent: Add package metadata Make the authors, edition and license be inherited from the workspace Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	5bcc006447	runtime-rs: Add missing license The ch-config crate was missing a license Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
Manuel Huber	7daeb78b67	tests: nvidia: Enforce image signing for NIM test Validate container image signatures for the NIM test using NVIDIA's public signing key. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-11 09:22:50 +02:00
Fabiano Fidêncio	3ce3644c3c	Merge pull request #12807 from PiotrProkop/blk-sector-rust runtime-rs: allow specifying logical/physical sector size for block devices	2026-04-11 00:42:45 +02:00
Fabiano Fidêncio	6f3c11aec4	Merge pull request #12808 from fidencio/topic/agent-allow-configuring-launch-process-timeout agent: Make launch_process_timeout configurable	2026-04-11 00:36:01 +02:00
Fabiano Fidêncio	d4a042a155	Merge pull request #12813 from fitzthum/bump-gc-ma-sigs Bump guest components to pickup additional signature support	2026-04-10 23:57:19 +02:00
Fabiano Fidêncio	78fa4c88e2	Merge pull request #12814 from fidencio/topic/nvidia-always-do-vcpu-pinning runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs	2026-04-10 23:47:44 +02:00
Fabiano Fidêncio	7244389ad4	runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs So we can have a better performance by default. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 16:41:34 +02:00
Fabiano Fidêncio	1d77c4e60f	Merge pull request #12752 from LizZhang315/add-overheadEnabled helm: add overheadEnabled switch for runtimeclass	2026-04-10 16:40:42 +02:00
Tobin Feldman-Fitzthum	ff26a6b876	versions: update image-rs to pick up signature fixes The new version of image-rs supports more types of signed images. First, we added support for a few more key types. Second, we added support for multi-arch images where the manifest digest is signed but the individual arch manifest is not. These images are relatively common, so let's pick up the fix asap. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-10 06:54:58 -07:00
Tobin Feldman-Fitzthum	2588a0e5a5	agent-ctl: bump image-rs version I don't think agent-ctl will benefit from the new image-rs features, but let's update it to be complete. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-10 06:52:53 -07:00
Fabiano Fidêncio	e8f34a2b26	agent: Update protocol This is not related to this PR, but rather to #12734, which ended up not running the `make src/agent generate-protocols`. While here, let's also fix it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 14:47:01 +02:00
Fabiano Fidêncio	36a2d8e7f2	agent: Make launch_process_timeout configurable The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata agent is insufficient for environments with NVIDIA GPUs and NVSwitches, where the attestation-agent needs significantly more time to collect evidence during initialization (e.g. ~2 seconds per NVSwitch). When the timeout expires, the agent (PID 1) exits with an error, causing the guest kernel to perform an orderly shutdown before the attestation-agent has finished starting. Make this timeout configurable via the kernel parameter agent.launch_process_timeout (in seconds), preserving the 6-second default for backward compatibility. The Go runtime is wired up to pass this value from the TOML config's [agent.kata] section through to the kernel command line. The NVIDIA GPU configs set the new default to 15 seconds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-10 14:47:01 +02:00
PiotrProkop	82de35c720	runtime-rs: allow specifying logical/physical sector size for block devices Add two new configuration knobs that control the logical and physical sector sizes advertised by virtio-blk devices to the guest: block_device_logical_sector_size (config file) block_device_physical_sector_size (config file) io.katacontainers.config.hypervisor.blk_logical_sector_size (annotation) io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation) The annotation names are abbreviated relative to the config file keys because Kubernetes enforces a 63-character limit on annotation name segments, and the full names would exceed it. Both settings default to 0 (let QEMU decide). When set, they are passed as logical_block_size and physical_block_size in the QMP device_add command during block device hotplug. Setting logical_sector_size smaller than the container filesystem block size will cause EINVAL on mount. The physical_sector_size can always be set independently. Values must be 0 or a power of 2 in the range [512, 65536]; other values are rejected with an error at sandbox creation time. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-04-10 11:14:51 +02:00
Fabiano Fidêncio	fd6375d8d5	Merge pull request #12806 from kata-containers/topic/ci-run-runtime-rs-on-SNP ci: Run qemu-snp-runtime-rs tests in the CI	2026-04-10 11:01:20 +02:00
LizZhang315	2312f67c9b	helm: add overheadEnabled switch for runtimeclass Add a global and per-shim configurable switch to enable/disable the overhead section in generated RuntimeClasses. This allows users to omit overhead when it's not needed or managed externally. Priority: per-shim > global > default(true). Signed-off-by: LizZhang315 <123134987@qq.com>	2026-04-10 10:26:11 +02:00
Fabiano Fidêncio	218077506b	Merge pull request #12769 from RuoqingHe/runtime-rs-allow-install-on-riscv runtime-rs: Allow installation on RISC-V platforms	2026-04-10 10:24:40 +02:00
Fabiano Fidêncio	dca89485f0	Merge pull request #12802 from stevenhorsman/bump-golang-1.25.9 versions: bump golang to 1.25.9	2026-04-10 06:50:35 +02:00
Fabiano Fidêncio	72fb41d33b	kata-deploy: Symlink original config to per-shim runtime copy Users were confused about which configuration file to edit because kata-deploy copied the base config into a per-shim runtime directory (runtimes/<shim>/) for config.d support, leaving the original file in place untouched. This made it look like the original was the authoritative config, when in reality the runtime was loading the copy from the per-shim directory. Replace the original config file with a symlink pointing to the per-shim runtime copy after the copy is made. The runtime's ResolvePath / EvalSymlinks follows the symlink and lands in the per-shim directory, where it naturally finds config.d/ with all drop-in fragments. This makes it immediately obvious that the real configuration lives in the per-shim directory and removes the ambiguity about which file to inspect or modify. During cleanup, the symlink at the original location is explicitly removed before the runtime directory is deleted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 17:16:40 +02:00
Steve Horsman	9e8069569e	Merge pull request #12734 from Apokleos/rm-v9p-rs runtime-rs: Remove virtio-9p Shared Filesystem Support	2026-04-09 16:15:55 +01:00
Fabiano Fidêncio	5e1ab0aa7d	tests: Support runtime-rs QEMU cmdline format in attestation test The k8s-confidential-attestation test extracts the QEMU command line from journal logs to compute the SNP launch measurement. It only matched the Go runtime's log format ("launching <path> with: [<args>]"), but runtime-rs logs differently ("qemu args: <args>"). Handle both formats so the test works with qemu-snp-runtime-rs. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 16:35:08 +02:00
Fabiano Fidêncio	3b155ab0b1	ci: Run runtime-rs tests for SNP As we're in the process to stabilise runtime-rs for the coming 4.0.0 release, we better start running as many tests as possible with that. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 16:35:08 +02:00
stevenhorsman	31f9a5461b	versions: bump golang to 1.25.9 Bump the go version to resolve CVEs: - GO-2026-4947 - GO-2026-4946 - GO-2026-4870 - GO-2026-4869 - GO-2026-4865 - GO-2026-4864 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-09 08:59:40 +01:00
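
The locking pattern described in commit b17dd2a902 ("runtime: Fix concurrent map read/write panic in Wait()") can be illustrated with a minimal Go sketch. It is not the runtime's actual code: the `exec` type, the `execsMu` field name, the `newContainer` constructor, and the error text are assumptions made for illustration. Only the `execs` map, the `sync.RWMutex`, and the helper names `getExec`, `setExec`, and `deleteExec` come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// exec is a hypothetical stand-in for the runtime's per-exec state.
type exec struct {
	id string
}

// container is a simplified, hypothetical container struct: only the execs
// map and the dedicated lock that guards it are modeled here.
type container struct {
	execsMu sync.RWMutex // assumed field name; protects execs
	execs   map[string]*exec
}

func newContainer() *container {
	return &container{execs: make(map[string]*exec)}
}

// getExec takes the read lock, so concurrent readers do not block each other.
func (c *container) getExec(id string) (*exec, error) {
	c.execsMu.RLock()
	defer c.execsMu.RUnlock()
	e, ok := c.execs[id]
	if !ok {
		return nil, fmt.Errorf("exec %q not found", id)
	}
	return e, nil
}

// setExec takes the write lock before inserting into the map.
func (c *container) setExec(id string, e *exec) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	c.execs[id] = e
}

// deleteExec takes the write lock before removing from the map.
func (c *container) deleteExec(id string) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	delete(c.execs, id)
}

func main() {
	c := newContainer()
	var wg sync.WaitGroup
	// Mix writers and readers; without the lock this pattern can trigger
	// Go's "concurrent map read and map write" fatal error.
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func(n int) {
			defer wg.Done()
			id := fmt.Sprintf("exec-%d", n)
			c.setExec(id, &exec{id: id})
			c.deleteExec(id)
		}(i)
		go func(n int) {
			defer wg.Done()
			_, _ = c.getExec(fmt.Sprintf("exec-%d", n))
		}(i)
	}
	wg.Wait()
	fmt.Println("finished without a concurrent map access panic")
}
```

Running a sketch like this with `go run -race` is one way to check that every map access goes through the lock. Keeping the mutex next to the map, rather than reusing a wider service-level lock, is the design choice the commit message describes.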

18471 Commits