Commit Graph

18461 Commits

Author SHA1 Message Date
stevenhorsman
a0359326e9 workflow: Bump stale action version
v9 is based on Node.js 20 which is deprecated, so update to the
latest to pick up a Node.js 24 version before Github removes Node 20

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-14 16:25:35 +01:00
Fabiano Fidêncio
0713b2d5d3 Merge pull request #12828 from kata-containers/dependabot/pip/docs/pillow-12.2.0
build(deps): bump pillow from 12.1.1 to 12.2.0 in /docs
2026-04-14 17:23:07 +02:00
Fabiano Fidêncio
661cfd7efa Merge pull request #12800 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.43.0
build(deps): bump go.opentelemetry.io/otel/sdk from 1.40.0 to 1.43.0 in /src/runtime
2026-04-14 17:22:47 +02:00
dependabot[bot]
b54f02aa6c build(deps): bump pillow from 12.1.1 to 12.2.0 in /docs
Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.1.1 to 12.2.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/12.1.1...12.2.0)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.2.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 14:40:14 +00:00
Steve Horsman
8289aaf0c7 Merge pull request #12831 from kata-containers/topic/ci-move-out-of-nodejs-20
ci: Update GitHub Actions to Node.js 24 compatible versions
2026-04-14 14:59:03 +01:00
Fabiano Fidêncio
c087eb92ec ci: Update GitHub Actions to Node.js 24 compatible versions
Node.js 20 is deprecated on GitHub Actions runners and will be
forced to Node.js 24 starting June 2nd, 2026. Update all affected
actions to versions that natively support Node.js 24:

- actions/upload-artifact: v4.6.2 -> v6.0.0
- actions/download-artifact: v4.3.0 -> v7.0.0
- docker/build-push-action: v5.4.0 -> v7.0.0
- docker/login-action: v3.4.0 -> v4.1.0
- docker/setup-buildx-action: v3.10.0 -> v4.0.0
- docker/setup-qemu-action: v3.6.0 -> v4.0.0
- geekyeggo/delete-artifact: v5.1.0 -> v6.0.0
- azure/login: v2.3.0 -> v3.0.0
- azure/setup-kubectl: v4.0.1 -> v5.0.0
- nick-fields/retry: v3.0.2 -> v4.0.0

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-14 15:48:45 +02:00
Fabiano Fidêncio
7e464f13a5 Merge pull request #12830 from fidencio/topic/workflows-create-auth-registry-image
workflows: Add workflow to create auth registry test image
2026-04-14 11:28:23 +02:00
Fabiano Fidêncio
b0abe59993 workflows: Add workflow to create auth registry test image
Add a manually-triggered workflow that builds and pushes a multi-arch
busybox-based image to quay.io/kata-containers/confidential-containers-auth
for use as an authenticated container image in CI tests.

The workflow uses skopeo to copy per-arch images and buildah to create
and push the multi-arch manifest.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-14 10:59:12 +02:00
Fabiano Fidêncio
b0a87880e7 Merge pull request #12826 from fidencio/topic/fix-concurrent-map-access-in-wait
runtime: Fix concurrent map read/write panic in Wait()
2026-04-14 08:48:52 +02:00
Fabiano Fidêncio
b17dd2a902 runtime: Fix concurrent map read/write panic in Wait()
Wait() was releasing s.mu immediately after getContainer(), then
calling getExec() — which reads c.execs — without holding any lock.
Concurrent Exec() or Delete() calls that write to c.execs under s.mu
triggered a "concurrent map read and map write" fatal panic.

Add a dedicated sync.RWMutex to the container struct that protects the
execs map. getExec() now acquires a read lock internally, and all
writes go through new setExec()/deleteExec() helpers that acquire the
write lock. This keeps the locking concern local to the map and avoids
complicating the s.mu usage in Wait().

Add a regression test (TestConcurrentExecAccess) that exercises
concurrent getExec reads against setExec/deleteExec writes; this
reliably reproduces the panic under the race detector without the fix.

Fixes: #12825

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-13 21:14:28 +02:00
Fabiano Fidêncio
4c567a9c05 ci: Reduce TEE test scope for PR runs
TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full
test suite on every PR consumes these resources unnecessarily, since
most tests exercises what is already exercised by the -coco-dev CIs.

Introduce a `tee-test-scope` workflow input (small/full) and a new
`baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests
that are TEE-relevant: attestation tests (encrypted/authenticated/
signed image pull, confidential attestation) plus policy and trusted
ephemeral data storage tests.

PR runs default to "small" (12 tests), nightly runs use "full" (59
tests), and manual dispatch offers a dropdown to choose.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-13 20:26:46 +02:00
dependabot[bot]
b303600283 build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.43.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.43.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.43.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-13 10:36:44 +00:00
Fabiano Fidêncio
bd6377a038 Merge pull request #12614 from manuelh-dev/mahuber/image-signing-nim
tests: nvidia: Enforce image signing for NIM test
2026-04-11 14:48:04 +02:00
Fabiano Fidêncio
5eb7844183 Merge pull request #12430 from stevenhorsman/cargo-deny-static-checks
static-checks: Rework cargo deny check
2026-04-11 12:05:53 +02:00
stevenhorsman
8be3a24112 ci: Update cargo-deny in gatekeeper
Update the name and move it to the static checks as we don't
need to ensure it's running for none code changes.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
9448988783 workflow: Update cargo deny check
The cargo deny generated action doesn't seem to work
and seems unnecessarily complex, so try using
EmbarkStudios/cargo-deny-action instead

Fixes: #11218
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
a0410e0d5c static-checks: Update cargo deny config
The previous config is not valid, so update it based on information
from https://embarkstudios.github.io/cargo-deny/checks/cfg.html

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
a32c6fd9ff mem-agent: Add package metadata
Make the authors, edition and license be inherited from the
workspace

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
stevenhorsman
5bcc006447 runtime-rs: Add missing license
The ch-config crate was missing a license

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-11 08:46:32 +01:00
Manuel Huber
7daeb78b67 tests: nvidia: Enforce image signing for NIM test
Validate container image signatures for the NIM test using NVIDIA's
public signing key.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-11 09:22:50 +02:00
Fabiano Fidêncio
3ce3644c3c Merge pull request #12807 from PiotrProkop/blk-sector-rust
runtime-rs: allow specifying logical/physical sector size for block devices
2026-04-11 00:42:45 +02:00
Fabiano Fidêncio
6f3c11aec4 Merge pull request #12808 from fidencio/topic/agent-allow-configuring-launch-process-timeout
agent: Make launch_process_timeout configurable
2026-04-11 00:36:01 +02:00
Fabiano Fidêncio
d4a042a155 Merge pull request #12813 from fitzthum/bump-gc-ma-sigs
Bump guest components to pickup additional signature support
2026-04-10 23:57:19 +02:00
Fabiano Fidêncio
78fa4c88e2 Merge pull request #12814 from fidencio/topic/nvidia-always-do-vcpu-pinning
runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs
2026-04-10 23:47:44 +02:00
Fabiano Fidêncio
7244389ad4 runtime: Set enable_vcpus_pinning = true for NVIDIA configs
So we can have a better performance by default.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-10 16:41:34 +02:00
Fabiano Fidêncio
1d77c4e60f Merge pull request #12752 from LizZhang315/add-overheadEnabled
helm: add overheadEnabled switch for runtimeclass
2026-04-10 16:40:42 +02:00
Tobin Feldman-Fitzthum
ff26a6b876 versions: update image-rs to pickup signature fixes
The new version of image-rs supports more types of signed images. First,
we added supported for a few more key types. Second, we added support
for multi-arch images where the manifest digest is signed but the
individual arch manifest is not. These images are relatively common, so
let's pickup the fix asap.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-10 06:54:58 -07:00
Tobin Feldman-Fitzthum
2588a0e5a5 agent-ctl: bump image-rs version
I don't think agent-ctl will benefit from the new image-rs features, but
let's update it to be complete.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-10 06:52:53 -07:00
Fabiano Fidêncio
e8f34a2b26 agent: Update protocol
This is not related to this PR, but rather to #12734, which ended up not
running the `make src/agent generate-protocols`.

While here, let's also fix it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-10 14:47:01 +02:00
Fabiano Fidêncio
36a2d8e7f2 agent: Make launch_process_timeout configurable
The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata
agent is insufficient for environments with NVIDIA GPUs and NVSwitches,
where the attestation-agent needs significantly more time to collect
evidence during initialization (e.g. ~2 seconds per NVSwitch).

When the timeout expires, the agent (PID 1) exits with an error, causing
the guest kernel to perform an orderly shutdown before the
attestation-agent has finished starting.

Make this timeout configurable via the kernel parameter
agent.launch_process_timeout (in seconds), preserving the 6-second
default for backward compatibility. The Go runtime is wired up to pass
this value from the TOML config's [agent.kata] section through to the
kernel command line.

The NVIDIA GPU configs set the new default to 15 seconds.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-10 14:47:01 +02:00
PiotrProkop
82de35c720 runtime-rs: allow specifying logical/physical sector size for block devices
Add two new configuration knobs that control the logical and physical
sector sizes advertised by virtio-blk devices to the guest:

  block_device_logical_sector_size  (config file)
  block_device_physical_sector_size (config file)

  io.katacontainers.config.hypervisor.blk_logical_sector_size  (annotation)
  io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation)

The annotation names are abbreviated relative to the config file keys
because Kubernetes enforces a 63-character limit on annotation name
segments, and the full names would exceed it.

Both settings default to 0 (let QEMU decide). When set, they are passed
as logical_block_size and physical_block_size in the QMP device_add
command during block device hotplug.

Setting logical_sector_size smaller then container filesystem
block size will cause EINVAL on mount. The physical_sector_size can
always be set independently.

Values must be 0 or a power of 2 in the range [512, 65536]; other
values are rejected with an error at sandbox creation time.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2026-04-10 11:14:51 +02:00
Fabiano Fidêncio
fd6375d8d5 Merge pull request #12806 from kata-containers/topic/ci-run-runtime-rs-on-SNP
ci: Run qemu-snp-runtime-rs tests in the CI
2026-04-10 11:01:20 +02:00
LizZhang315
2312f67c9b helm: add overheadEnabled switch for runtimeclass
Add a global and per-shim configurable switch to enable/disable
the overhead section in generated RuntimeClasses. This allows users
to omit overhead when it's not needed or managed externally.

Priority: per-shim > global > default(true).

Signed-off-by: LizZhang315 <123134987@qq.com>
2026-04-10 10:26:11 +02:00
Fabiano Fidêncio
218077506b Merge pull request #12769 from RuoqingHe/runtime-rs-allow-install-on-riscv
runtime-rs: Allow installation on RISC-V platforms
2026-04-10 10:24:40 +02:00
Fabiano Fidêncio
dca89485f0 Merge pull request #12802 from stevenhorsman/bump-golang-1.25.9
versions: bump golang to 1.25.9
2026-04-10 06:50:35 +02:00
Fabiano Fidêncio
72fb41d33b kata-deploy: Symlink original config to per-shim runtime copy
Users were confused about which configuration file to edit because
kata-deploy copied the base config into a per-shim runtime directory
(runtimes/<shim>/) for config.d support, leaving the original file
in place untouched.  This made it look like the original was the
authoritative config, when in reality the runtime was loading the
copy from the per-shim directory.

Replace the original config file with a symlink pointing to the
per-shim runtime copy after the copy is made.  The runtime's
ResolvePath / EvalSymlinks follows the symlink and lands in the
per-shim directory, where it naturally finds config.d/ with all
drop-in fragments.  This makes it immediately obvious that the
real configuration lives in the per-shim directory and removes the
ambiguity about which file to inspect or modify.

During cleanup, the symlink at the original location is explicitly
removed before the runtime directory is deleted.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 17:16:40 +02:00
Steve Horsman
9e8069569e Merge pull request #12734 from Apokleos/rm-v9p-rs
runtime-rs: Remove virtio-9p Shared Filesystem Support
2026-04-09 16:15:55 +01:00
Fabiano Fidêncio
5e1ab0aa7d tests: Support runtime-rs QEMU cmdline format in attestation test
The k8s-confidential-attestation test extracts the QEMU command line
from journal logs to compute the SNP launch measurement. It only
matched the Go runtime's log format ("launching <path> with: [<args>]"),
but runtime-rs logs differently ("qemu args: <args>").

Handle both formats so the test works with qemu-snp-runtime-rs.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 16:35:08 +02:00
Fabiano Fidêncio
3b155ab0b1 ci: Run runtime-rs tests for SNP
As we're in the process to stabilise runtime-rs for the coming 4.0.0
release, we better start running as many tests as possible with that.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-09 16:35:08 +02:00
stevenhorsman
31f9a5461b versions: bump golang to 1.25.9
Bump the go version to resolve CVEs:
- GO-2026-4947
- GO-2026-4946
- GO-2026-4870
- GO-2026-4869
- GO-2026-4865
- GO-2026-4864

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-09 08:59:40 +01:00
Hyounggyu Choi
f15f7f49f1 Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt
runtime: qemu: Enable static sandbox resource management on ARM & s390x
2026-04-09 09:18:11 +02:00
Ruoqing He
98ee385220 runtime-rs: Consolidate unsupported arch
Consolidate arch we don't support at the moment, and avoid hard coding
error messages per arch.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-09 04:18:50 +00:00
Ruoqing He
26ffe1223b runtime-rs: Allow install on riscv64 platform
runtime-rs works with QEMU on RISC-V platforms, let's enable
installation on RISC-V.

Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>
2026-04-09 04:18:50 +00:00
Fabiano Fidêncio
80b0ed273f Merge pull request #12784 from hgowda-amd/sev-snp-tests-required
Add sev-snp, qemu-snp CIs as required
2026-04-09 00:22:49 +02:00
Harshitha Gowda
bb1165b23f tests: Set sev-snp, qemu-snp CIs as required
run-k8s-tests-on-tee (sev-snp, qemu-snp)

Signed-off-by: Harshitha Gowda <hgowda@amd.com>
2026-04-08 22:36:58 +02:00
Fabiano Fidêncio
2148afe243 Merge pull request #12796 from fidencio/topic/kata-deploy-run-cargo-fmt-and-cargo-check
kata-deploy: Run cargo clippy during build
2026-04-08 22:32:31 +02:00
Fabiano Fidêncio
8ff630059a Merge pull request #12778 from amd-aliem/enable-img-rootfs-snp
runtime: SNP img-based rootfs with dm-verity
2026-04-08 22:06:31 +02:00
Fabiano Fidêncio
4561ae3e29 Merge pull request #12799 from fitzthum/fixup-nv-doc-1
docs: update flow for setting nvidia devices to ready
2026-04-08 21:32:55 +02:00
Tobin Feldman-Fitzthum
9119b4982c docs: update flow for setting nvidia devices to ready
Now, we include the nvrc.smi.srs=1 flag in the default kernel cmdline.
Thus, we can remove the guidance for people to add it themselves when
not using attestation. In fact, users don't really need to know about
this flag at all.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-08 18:59:51 +00:00
Fabiano Fidêncio
21466eb4e5 kata-deploy: Fix clippy warnings across crate
Fix all clippy warnings triggered by -D warnings:

- install.rs: remove useless .into() conversions on PathBuf values
  and replace vec! with an array literal where a Vec is not needed
- utils/toml.rs: replace while-let-on-iterator with a for loop and
  drop the now-unnecessary mut on the iterator binding
- main.rs: replace match-with-single-pattern with if-let in two
  places dealing with experimental_setup_snapshotter
- utils/yaml.rs: extract repeated serde_yaml::Value::String key into
  a local variable, removing needless borrows on temporary values

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-08 20:47:59 +02:00