kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-27 11:03:40 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	eda3bc6190	runtime-rs: wire GetDiagnosticData for termination logs Add runtime-rs support for the GetDiagnosticData RPC. This extends the Agent trait, types, and protocol translation layer with the new request/response types. During container stop, when shared_fs is "none" and the terminationMessagePolicy annotation is "File", the runtime copies the termination log from the guest via GetDiagnosticData. The call is best-effort to avoid blocking container teardown. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-17 13:16:25 +02:00
Fabiano Fidêncio	411f8cf583	genpolicy: policy-gate GetDiagnosticDataRequest Add policy rules for the new GetDiagnosticDataRequest RPC. The request is denied by default in genpolicy-generated policies, ensuring CoCo workloads do not expose diagnostic data unless explicitly opted in via policy_data.request_defaults. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2026-04-17 13:16:25 +02:00
Fabiano Fidêncio	64c139208f	agent: add GetDiagnosticData RPC with termination log support Add a new extensible GetDiagnosticData RPC that retrieves diagnostic information from the guest VM. The request carries a log_type string field to specify what kind of data is requested, and a container_id field to identify the target container. The first supported log_type is "termination_log", which reads the Kubernetes termination message file from inside the guest. This is needed for shared_fs=none configurations where the host cannot directly access the guest filesystem. On the Go runtime side, the container stop() path now calls GetDiagnosticData to copy the termination message to the host when running with NoSharedFS and the terminationMessagePolicy annotation is set to "File". The call is best-effort: failures are logged as warnings rather than blocking container teardown. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2026-04-17 13:01:13 +02:00
Steve Horsman	1db12f8ccf	Merge pull request #12812 from stevenhorsman/tee-test-refactor ci: Refactor confidential TEE support	2026-04-17 11:12:13 +01:00
Steve Horsman	e4b3ba56dd	Merge pull request #12855 from stevenhorsman/increase-stale-issues-frequency ci: increase stale issues workflow frequency	2026-04-17 08:37:20 +01:00
stevenhorsman	1dc57c6cef	ci: increase stale issues workflow frequency Update the stale issues workflow to run more frequently: - Weekdays: Every 4 hours (6x per day) at 00:00, 06:00, 12:00, 18:00 UTC - Weekends: Every hour (24x per day) Previously ran once daily at midnight UTC. This change reduces the time it will take for us to get through our backlog, particularly increasing the runs at the weekend, when we should have less other CI running, which it could impact due to GH API rate limiting. Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-16 20:50:38 +01:00
Fabiano Fidêncio	d9128a58d9	Merge pull request #11611 from Xynnn007/docs-typo docs: fix nerdctl guest image command	2026-04-16 15:36:37 +02:00
Fabiano Fidêncio	57ce3a1347	Merge pull request #11364 from kata-containers/dependabot/github_actions/tim-actions/wip-check-1.1.0 build(deps): bump tim-actions/w.i.p.-check from 1.0.0 to 1.1.0	2026-04-16 14:11:12 +02:00
Fabiano Fidêncio	78a8133112	Merge pull request #12242 from stevenhorsman/msrv-current-thoughts doc: Add MSRV comments to toolchain guidance	2026-04-16 14:09:30 +02:00
Fabiano Fidêncio	88ce64819d	Merge pull request #12726 from LandonTClipp/doc_annotations docs: Add annotation config to doc site	2026-04-16 13:07:53 +02:00
stevenhorsman	05430d5690	doc: Add MSRV comments to toolchain guidance Add some extra clarification about our current position on MSRV. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-16 12:06:46 +01:00
Fabiano Fidêncio	beb06573fa	Merge pull request #12790 from kata-containers/dependabot/cargo/src/tools/kata-ctl/tracing-0d2b5df27c build(deps): bump tracing from 0.1.41 to 0.1.44 in /src/tools/kata-ctl in the tracing group across 1 directory	2026-04-16 12:52:05 +02:00
dependabot[bot]	c044403409	build(deps): bump tim-actions/wip-check from 1.0.0 to 1.1.0 Bumps [tim-actions/wip-check](https://github.com/tim-actions/wip-check) from 1.0.0 to 1.1.0. - [Release notes](https://github.com/tim-actions/wip-check/releases) - [Commits](`1c2a1ca6c1...8c84f59872`) --- updated-dependencies: - dependency-name: tim-actions/wip-check dependency-version: 1.1.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-16 10:48:41 +00:00
Xynnn007	1d806e0cfa	docs: fix nerdctl guest image command The image name is delivered via annotation rather than label in nerdctl >= 2.0. See the release note https://github.com/containerd/nerdctl/releases/tag/v2.0.0 and PR https://github.com/containerd/nerdctl/pull/2906 With an older version of nerdctl (< 2.0), --label still works. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2026-04-16 11:34:03 +02:00
stevenhorsman	ff246f9538	ci: Remove deploy_snapshotter Snapshotter deployment is a no-op now that kata-deploy handles this, so clean up this code. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-16 09:21:04 +01:00
stevenhorsman	fce6415865	tests: Use hypervisor helpers Utilise the new hypervisor helpers in our CI and test code to help add clarity and reduce duplication Note: `kubernetes_dir` is declared as readonly in tests/integration/kubernetes/setup.sh which is sourced by tests_common.sh, so we update it to only be set if unset Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-16 09:21:04 +01:00
stevenhorsman	2f3fec9727	tests: Add new hypervisor helper script Add a pure shell script which the CI and integration tests can use to check for different categories of runtime Assisted-by: IBM Bob Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-16 09:21:04 +01:00
Alex Lyn	c546b3c585	Merge pull request #12843 from microsoft/saul/build-opt runtime-rs: add build optimization flags	2026-04-16 09:05:20 +08:00
Dan Mihai	c967b45996	Merge pull request #12838 from kata-containers/sprt/new-az-region ci: Change Azure region to eastus2	2026-04-15 16:08:21 -07:00
Aurélien Bombo	1602e04b2d	ci: Change Azure region to eastus2 I'm doing some bookkeeping in the Azure subscription that requires we move from eastus to eastus2. This should have no user-facing impact. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-04-15 14:37:13 -05:00
Fabiano Fidêncio	19441e5515	Merge pull request #12844 from Apokleos/fix-warning runtime-rs: Fix unformatted code in runtime-rs	2026-04-15 17:35:03 +02:00
Fabiano Fidêncio	d2fb22edbe	Merge pull request #12847 from fidencio/topic/ci-adjust-timeout-for-k8s-tests ci: k8s: Adjust timeout on free runners	2026-04-15 17:30:51 +02:00
Fabiano Fidêncio	8d6f1d6f34	ci: k8s: Adjust timeout on free runners I've seen several cases of the CLH tests just being killed due to the 60 minutes timeout. Let's bump it to 75 and see how it goes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-15 17:09:30 +02:00
dependabot[bot]	bbb037e025	build(deps): bump the tracing group across 1 directory with 1 update Bumps the tracing group with 1 update in the /src/tools/kata-ctl directory: [tracing](https://github.com/tokio-rs/tracing). Updates `tracing` from 0.1.41 to 0.1.44 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.41...tracing-0.1.44) Updates `tracing` from 0.1.41 to 0.1.44 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.41...tracing-0.1.44) --- updated-dependencies: - dependency-name: tracing dependency-version: 0.1.44 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.44 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-15 15:06:48 +00:00
LandonTClipp	fd896e4e76	ci: Add kata-dictionary.txt to required_tests.yaml This makes it so that changes to the kata-dictionary.txt file only trigger the static checks to run. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-04-15 14:48:01 +01:00
LandonTClipp	56cdfa831f	docs: Add annotation config to doc site Adding the pod annotation config to the doc site. A symlink is created at docs/pod-annotations.md that points to how-to/how-to-set-sandbox-config-kata.md so that the URL for this file will be created at `/pod-annotations`. Also adding brief contributing guidelines and how-to's for running the documentation site locally for local previews. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-04-15 14:48:01 +01:00
Alex Lyn	2f6319f130	runtime-rs: Fix unformatted code in runtime-rs When building runtime-rs, one unformatted code block comes up, as below: ``` - config - .hypervisor - .entry("qemu".to_owned()) - .and_modify(|hv| { - hv.cpu_info.default_vcpus = default_vcpus; - hv.cpu_info.default_maxvcpus = default_maxvcpus; - hv.memory_info.default_memory = default_memory; - hv.memory_info.default_maxmemory = default_maxmemory; - }); + config.hypervisor.entry("qemu".to_owned()).and_modify(|hv| { + hv.cpu_info.default_vcpus = default_vcpus; + hv.cpu_info.default_maxvcpus = default_maxvcpus; + hv.memory_info.default_memory = default_memory; + hv.memory_info.default_maxmemory = default_maxmemory; + }); ``` Let's format it now. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-04-15 14:48:23 +02:00
Fabiano Fidêncio	57898de1fe	Merge pull request #12845 from fidencio/topic/fix-signed-image-tests tests: Update images used for signed tests	2026-04-15 14:47:58 +02:00
Fabiano Fidêncio	ba9a02897e	genpolicy: make allowed cgroup v2 mount extras configurable Newer kernels and containerd versions (>= 2.2.3) may add extra mount options to /sys/fs/cgroup that genpolicy does not embed in the policy (e.g. nsdelegate, memory_recursiveprot). This causes the Kata agent to reject CreateContainerRequest with PERMISSION_DENIED because the check_mount rules require an exact match. Rather than hard-coding the allowed extras in Rego, make them configurable via genpolicy-settings.json under cluster_config.cgroup_mount_extras_allowed. The corresponding Rego rule (check_mount 4) reads the list from policy_data.cluster_config and allows only those named options beyond the policy-embedded set. To support this, cluster_config is now included in PolicyData so that it gets serialized into the Rego policy_data object at generation time. This follows the established pattern of keeping site- and version-specific tunables in genpolicy-settings.json so they can be overridden via JSON-Patch drop-ins without touching the Rego source. A policy test case is added to verify that the default allowed extras (nsdelegate, memory_recursiveprot) are accepted and that unknown extras are rejected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-15 13:24:21 +02:00
Fabiano Fidêncio	d29b77e953	tests: Update images used for signed tests I've updated the images on the Confidential Containers side, in order to add arm64 support, but I didn't realize it'd break tests not using those. Apologies! Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-15 12:11:37 +02:00
Saul Paredes	9404104aba	runtime-rs: add build optimization flags Enable the following optimizations when building runtime-rs in release mode: - lto: true - codegen-units=1: Setting these reduces the binary size and improves performance at the cost of longer build times. Without these flags: - build time: 4m 55s - binary size: 51 MB With these flags: - build time: 7m 21s - binary size: 38 MB Per https://github.com/kata-containers/kata-containers/issues/1125 and local experiments, a smaller binary size leads to a smaller shim memory footprint. - https://nnethercote.github.io/perf-book/build-configuration.html#codegen-units - https://nnethercote.github.io/perf-book/build-configuration.html#link-time-optimization Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2026-04-14 15:52:38 -07:00
Fabiano Fidêncio	2d57b89857	Merge pull request #12805 from stevenhorsman/stale-bot-improvements Stale bot improvements	2026-04-14 23:20:41 +02:00
Fabiano Fidêncio	672d3f2b0f	workflows: Use docker buildx to build and push auth test image skopeo copy with --override-arch fails with "authentication required" during blob existence checks at the destination, regardless of how credentials are provided (--dest-creds, --authfile, REGISTRY_AUTH_FILE). This is a known issue with skopeo 1.13.x when copying from manifest list sources. Replace the skopeo/buildah approach with docker/build-push-action, which is already proven in this repo (build-kubectl-image.yaml) and handles multi-arch builds and Quay pushes reliably. The workflow now builds a trivial FROM busybox image using buildx with QEMU emulation. Fixes: `b0abe5999` ("workflows: Add workflow to create auth registry test image") Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-14 22:44:30 +02:00
Fabiano Fidêncio	09ef32eaf1	Merge pull request #12827 from fidencio/topic/kata-deploy-custom-containerd-config kata-deploy: Allow overriding containerd config path and file name	2026-04-14 22:23:33 +02:00
stevenhorsman	5ea30b33ae	workflows: stale-issue: Increase operations-per-run At a rate of default 30 per run, with over 1.5k issues, it will take us over 50 days to do a pass of the issues we have, so increase operations-per-run as suggested in the workflow by github to reduce this. Based on the stats of the latest run, we are not too close to hitting the API rate limit: ``` Github API rate used: 32 Github API rate remaining: 3693; reset at: Thu Apr 09 2026 10:23:31 GMT+0000 (Coordinated Universal Time) ``` so I think this should be okay. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-14 16:25:35 +01:00
stevenhorsman	a0359326e9	workflow: Bump stale action version v9 is based on Node.js 20 which is deprecated, so update to the latest to pick up a Node.js 24 version before Github removes Node 20 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-14 16:25:35 +01:00
Fabiano Fidêncio	0713b2d5d3	Merge pull request #12828 from kata-containers/dependabot/pip/docs/pillow-12.2.0 build(deps): bump pillow from 12.1.1 to 12.2.0 in /docs	2026-04-14 17:23:07 +02:00
Fabiano Fidêncio	661cfd7efa	Merge pull request #12800 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.43.0 build(deps): bump go.opentelemetry.io/otel/sdk from 1.40.0 to 1.43.0 in /src/runtime	2026-04-14 17:22:47 +02:00
dependabot[bot]	b54f02aa6c	build(deps): bump pillow from 12.1.1 to 12.2.0 in /docs Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.1.1 to 12.2.0. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/12.1.1...12.2.0) --- updated-dependencies: - dependency-name: pillow dependency-version: 12.2.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-14 14:40:14 +00:00
Steve Horsman	8289aaf0c7	Merge pull request #12831 from kata-containers/topic/ci-move-out-of-nodejs-20 ci: Update GitHub Actions to Node.js 24 compatible versions	2026-04-14 14:59:03 +01:00
Fabiano Fidêncio	c087eb92ec	ci: Update GitHub Actions to Node.js 24 compatible versions Node.js 20 is deprecated on GitHub Actions runners and will be forced to Node.js 24 starting June 2nd, 2026. Update all affected actions to versions that natively support Node.js 24: - actions/upload-artifact: v4.6.2 -> v6.0.0 - actions/download-artifact: v4.3.0 -> v7.0.0 - docker/build-push-action: v5.4.0 -> v7.0.0 - docker/login-action: v3.4.0 -> v4.1.0 - docker/setup-buildx-action: v3.10.0 -> v4.0.0 - docker/setup-qemu-action: v3.6.0 -> v4.0.0 - geekyeggo/delete-artifact: v5.1.0 -> v6.0.0 - azure/login: v2.3.0 -> v3.0.0 - azure/setup-kubectl: v4.0.1 -> v5.0.0 - nick-fields/retry: v3.0.2 -> v4.0.0 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-14 15:48:45 +02:00
Fabiano Fidêncio	7e464f13a5	Merge pull request #12830 from fidencio/topic/workflows-create-auth-registry-image workflows: Add workflow to create auth registry test image	2026-04-14 11:28:23 +02:00
Fabiano Fidêncio	b0abe59993	workflows: Add workflow to create auth registry test image Add a manually-triggered workflow that builds and pushes a multi-arch busybox-based image to quay.io/kata-containers/confidential-containers-auth for use as an authenticated container image in CI tests. The workflow uses skopeo to copy per-arch images and buildah to create and push the multi-arch manifest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-14 10:59:12 +02:00
Fabiano Fidêncio	b0a87880e7	Merge pull request #12826 from fidencio/topic/fix-concurrent-map-access-in-wait runtime: Fix concurrent map read/write panic in Wait()	2026-04-14 08:48:52 +02:00
Fabiano Fidêncio	df1d02d3cf	kata-deploy: Allow overriding containerd config path and file name Add two new Helm values under `containerd`: - `configDir`: overrides the host directory where the containerd config lives, taking precedence over the k8sDistribution-based auto-detection. - `configFileName`: overrides the containerd config file name, propagated to the kata-deploy binary via the new CONTAINERD_CONFIG_FILE_NAME environment variable. These are useful for non-standard containerd setups that don't match any of the built-in k8sDistribution presets (k8s, k3s, rke2, k0s, microk8s). The config file name override only affects the default runtime branch in get_containerd_paths(). The k0s/microk8s/k3s/rke2 branches are left untouched since those runtimes have mandatory file naming conventions. Also fixes a spurious leading space in the k3s containerdConfPath branch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-13 22:31:55 +02:00
Fabiano Fidêncio	b17dd2a902	runtime: Fix concurrent map read/write panic in Wait() Wait() was releasing s.mu immediately after getContainer(), then calling getExec() — which reads c.execs — without holding any lock. Concurrent Exec() or Delete() calls that write to c.execs under s.mu triggered a "concurrent map read and map write" fatal panic. Add a dedicated sync.RWMutex to the container struct that protects the execs map. getExec() now acquires a read lock internally, and all writes go through new setExec()/deleteExec() helpers that acquire the write lock. This keeps the locking concern local to the map and avoids complicating the s.mu usage in Wait(). Add a regression test (TestConcurrentExecAccess) that exercises concurrent getExec reads against setExec/deleteExec writes; this reliably reproduces the panic under the race detector without the fix. Fixes: #12825 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-13 21:14:28 +02:00
Fabiano Fidêncio	4c567a9c05	ci: Reduce TEE test scope for PR runs TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full test suite on every PR consumes these resources unnecessarily, since most tests exercise what is already exercised by the -coco-dev CIs. Introduce a `tee-test-scope` workflow input (small/full) and a new `baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests that are TEE-relevant: attestation tests (encrypted/authenticated/ signed image pull, confidential attestation) plus policy and trusted ephemeral data storage tests. PR runs default to "small" (12 tests), nightly runs use "full" (59 tests), and manual dispatch offers a dropdown to choose. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-13 20:26:46 +02:00
dependabot[bot]	b303600283	build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.43.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.43.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.43.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-13 10:36:44 +00:00
Fabiano Fidêncio	bd6377a038	Merge pull request #12614 from manuelh-dev/mahuber/image-signing-nim tests: nvidia: Enforce image signing for NIM test	2026-04-11 14:48:04 +02:00
Fabiano Fidêncio	5eb7844183	Merge pull request #12430 from stevenhorsman/cargo-deny-static-checks static-checks: Rework cargo deny check	2026-04-11 12:05:53 +02:00
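
The locking change described in commit b17dd2a902 (a dedicated sync.RWMutex guarding the execs map, with getExec() taking a read lock and setExec()/deleteExec() taking the write lock) can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the `exec` and `container` types, the `execsMu` field name, and the helper signatures are assumptions made for the example; only the helper names and the use of sync.RWMutex come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// exec is a hypothetical stand-in for the runtime's exec process record.
type exec struct {
	id string
}

// container is a simplified, hypothetical version of the shim's container
// struct: the execs map gets its own lock instead of relying on a
// service-wide mutex.
type container struct {
	execsMu sync.RWMutex // assumed field name; protects execs
	execs   map[string]*exec
}

func newContainer() *container {
	return &container{execs: make(map[string]*exec)}
}

// getExec takes the read lock, so many readers can proceed concurrently.
func (c *container) getExec(id string) (*exec, bool) {
	c.execsMu.RLock()
	defer c.execsMu.RUnlock()
	e, ok := c.execs[id]
	return e, ok
}

// setExec takes the write lock before mutating the map.
func (c *container) setExec(id string, e *exec) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	c.execs[id] = e
}

// deleteExec takes the write lock before removing an entry.
func (c *container) deleteExec(id string) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	delete(c.execs, id)
}

func main() {
	c := newContainer()
	var wg sync.WaitGroup

	// Interleave writers and readers on the same keys. Without the lock,
	// this pattern can abort with "concurrent map read and map write".
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func(n int) {
			defer wg.Done()
			id := fmt.Sprintf("exec-%d", n)
			c.setExec(id, &exec{id: id})
			c.deleteExec(id)
		}(i)
		go func(n int) {
			defer wg.Done()
			c.getExec(fmt.Sprintf("exec-%d", n))
		}(i)
	}
	wg.Wait()

	c.setExec("final", &exec{id: "final"})
	_, ok := c.getExec("final")
	fmt.Println("final exec present:", ok)
}
```

Running a sketch like this with `go run -race` is one way to check that the map is no longer accessed without synchronization. Keeping the lock next to the map, as the commit message notes, leaves the wider s.mu usage in Wait() unchanged.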

18497 Commits
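
The mount check described in commit ba9a02897e (accept options beyond the policy-embedded set only if they appear in cluster_config.cgroup_mount_extras_allowed) is implemented in Rego, but the idea can be sketched in Go. This is an illustration under assumptions: the function name `mountOptionsAllowed`, the exact matching semantics, and the embedded option list are hypothetical; only the two default extras (nsdelegate, memory_recursiveprot) come from the commit message.

```go
package main

import "fmt"

// mountOptionsAllowed is a hypothetical helper. It reports whether the
// requested mount options contain every policy-embedded option, and whether
// any additional option is named in the allowed-extras list.
func mountOptionsAllowed(embedded, allowedExtras, requested []string) bool {
	have := make(map[string]bool, len(requested))
	for _, o := range requested {
		have[o] = true
	}
	// Every option embedded in the policy must be present in the request.
	for _, o := range embedded {
		if !have[o] {
			return false
		}
	}

	permitted := make(map[string]bool, len(embedded)+len(allowedExtras))
	for _, o := range embedded {
		permitted[o] = true
	}
	for _, o := range allowedExtras {
		permitted[o] = true
	}
	// Anything beyond the embedded set must be an explicitly allowed extra.
	for _, o := range requested {
		if !permitted[o] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed example of policy-embedded options for /sys/fs/cgroup.
	embedded := []string{"nosuid", "noexec", "nodev", "relatime", "ro"}
	// Default allowed extras named in the commit message.
	extras := []string{"nsdelegate", "memory_recursiveprot"}

	withKnownExtra := append(append([]string{}, embedded...), "nsdelegate")
	withUnknownExtra := append(append([]string{}, embedded...), "some_unknown_option")

	fmt.Println("known extra accepted:", mountOptionsAllowed(embedded, extras, withKnownExtra))
	fmt.Println("unknown extra accepted:", mountOptionsAllowed(embedded, extras, withUnknownExtra))
}
```

Keeping the extras in a data list rather than in the rule itself mirrors the commit's stated goal: the list can be overridden per site without touching the rule logic.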