18866 Commits

Author SHA1 Message Date
Alex Tibbles
8d7246e29a kernel: bump kernel versions other than dragonball
Applies fix for CVE-2026-31431 for non-dragonball configurations on current LTS 6.18.

Signed-Off-By: Alex Tibbles <alex@bleg.org>
2026-05-05 09:30:46 +02:00
Fabiano Fidêncio
27c3dfbb8c Merge pull request #12943 from fidencio/topic/kata-deploy-add-http-health-probes
kata-deploy: add HTTP health probes (healthz/readyz)
2026-05-05 09:30:17 +02:00
Fabiano Fidêncio
03f36e391e Merge pull request #12980 from microsoft/danmihai1/mariner-oci-version
ci: mariner: use OCI version 1.2.1
2026-05-05 08:10:03 +02:00
Dan Mihai
0a6dc2fae0 ci: mariner: use OCI version 1.2.1
Mariner moved from version 1.2.0 to version 1.2.1.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-05-05 02:23:30 +00:00
Fabiano Fidêncio
76d815dc67 Merge pull request #12968 from rajatchopra/pgpudocfix
doc update for Nvidia use case
2026-05-04 21:29:31 +02:00
Rajat Chopra
4a19262efb docs: fix nvidia config for device plugin
The GPU Operator config for the NVIDIA Kata Containers device plugin
needs to be revised. The older one applies to the vGPU/KubeVirt use case.

Signed-off-by: Rajat Chopra <rajatc@nvidia.com>
2026-05-04 11:03:58 -07:00
Fabiano Fidêncio
6033f25e0e Merge pull request #12965 from kata-containers/dependabot/github_actions/actions/checkout-6.0.2
build(deps): bump actions/checkout from 4.2.2 to 6.0.2
2026-05-04 19:37:54 +02:00
Fabiano Fidêncio
a3d6829ed4 Merge pull request #12964 from kata-containers/dependabot/github_actions/editorconfig-checker/action-editorconfig-checker-2.2.0
build(deps): bump editorconfig-checker/action-editorconfig-checker from 2.1.0 to 2.2.0
2026-05-04 19:37:42 +02:00
Fabiano Fidêncio
7c61c55011 Merge pull request #12966 from kata-containers/dependabot/github_actions/streetsidesoftware/cspell-action-8.4.0
build(deps): bump streetsidesoftware/cspell-action from 8.3.0 to 8.4.0
2026-05-04 19:37:28 +02:00
dependabot[bot]
b2931c6759 build(deps): bump actions/checkout from 4.2.2 to 6.0.2
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2 to 6.0.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.2.2...de0fac2e4500dabe0009e67214ff5f5447ce83dd)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-04 17:05:59 +00:00
Fabiano Fidêncio
3d43259463 Merge pull request #12974 from fidencio/topic/ci-tdx-nightly-run-with-runtime-rs
ci: tdx: Remove ITA key usage and run qemu-tdx-runtime-rs on nightly
2026-05-04 19:04:03 +02:00
Fabiano Fidêncio
b195dcca65 Merge pull request #12975 from kata-containers/topic/ci-nvidia-run-nightly-without-trace-log-level
ci: nvidia: Disable NVRC trace logging on nightly runs
2026-05-04 19:02:14 +02:00
Fabiano Fidêncio
d9722ba4be Merge pull request #12960 from microsoft/saul/update_mariner_test_configs
kata-deploy: configure_mariner: update test configs
2026-05-04 18:26:41 +02:00
Fabiano Fidêncio
9e3bd6b576 tests: fix kata-deploy lifecycle test reliability
Fix two issues in kata-deploy-lifecycle.bats that caused failures on
k3s, k0s and rke2:

  run_on_host():
  - `kubectl run --rm -i` causes k3s/rke2 to inject session-recording
    banners into stdout, polluting command output and breaking string
    assertions. Replace with a create/wait/logs/delete sequence so only
    the container's actual stdout is captured.

  "Artifacts are fully cleaned up after uninstall":
  - After a CRI restart the kubelet may briefly report "Unknown" for the
    container runtime version. Retry for up to 60s before asserting.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 22:09:08 +02:00
Fabiano Fidêncio
ed4f6ebc9e tests: use readiness probes to wait for kata-deploy install
Now that kata-deploy has a proper readiness probe (/readyz returns 200
only after install completes), replace the ad-hoc wait strategies with
kubectl wait --for=condition=Ready on the kata-deploy pods.

Note: helm --wait is ineffective for single-node clusters with
maxUnavailable=1 (the DaemonSet is considered ready with 0 ready pods),
so the CI uses kubectl wait on the pod readiness condition directly.

  gha-run-k8s-common.sh:
  - Drop the waitForProcess polling loop for Running pods
  - Drop the `sleep 60s` with its FIXME comment
  - Add kubectl wait --for=condition=Ready instead

  helm-deploy.bash:
  - Drop the extra `kubectl rollout status` after helm
  - Drop the `sleep 60`
  - The existing --wait on the helm command now suffices

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 22:09:08 +02:00
Fabiano Fidêncio
49396b7991 kata-deploy: add HTTP health probes (healthz/readyz)
The kata-deploy DaemonSet pod had no Kubernetes health probes, so the
kubelet could not distinguish between "still installing" and "crashed",
and rolling updates would proceed to the next node before install
actually finished.

Add a lightweight HTTP health server (built on raw tokio TcpListener,
no new crate dependencies) that starts immediately in the install path:

  /healthz — liveness: returns 200 as soon as the server binds
  /readyz  — readiness: returns 503 while installing, 200 after
             install completes (artifacts extracted, CRI restarted,
             node labeled)

Wire the Helm chart with startup, liveness, and readiness probes
(all individually toggleable). The startup probe allows up to 10
minutes for install to complete before the liveness probe takes over.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 22:09:08 +02:00
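
The /healthz and /readyz behaviour described in commit 49396b7991 can be sketched roughly as follows. This is a minimal, std-only illustration and not the kata-deploy implementation (which the commit says is built on a tokio TcpListener); the names `probe_response`, `render`, and `install_done`, the response bodies, and the 404 fallback are assumptions made for the example.

```rust
/// Map a request path and the current install state to an HTTP status and body.
/// `install_done` is a hypothetical flag that would be set once artifacts are
/// extracted, the CRI is restarted, and the node is labeled.
fn probe_response(path: &str, install_done: bool) -> (u16, &'static str) {
    match path {
        // Liveness: healthy as soon as the server is able to answer at all.
        "/healthz" => (200, "ok"),
        // Readiness: only ready after the install has completed.
        "/readyz" if install_done => (200, "ready"),
        "/readyz" => (503, "installing"),
        // Assumption: anything else is treated as not found.
        _ => (404, "not found"),
    }
}

/// Render a minimal HTTP/1.1 response for the given status and body.
fn render(status: u16, body: &str) -> String {
    let reason = match status {
        200 => "OK",
        503 => "Service Unavailable",
        _ => "Not Found",
    };
    format!(
        "HTTP/1.1 {} {}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status,
        reason,
        body.len(),
        body
    )
}

fn main() {
    for (path, done) in [("/healthz", false), ("/readyz", false), ("/readyz", true)] {
        let (status, body) = probe_response(path, done);
        let response = render(status, body);
        // Print only the status line of each rendered response.
        println!("{} install_done={} -> {}", path, done, response.lines().next().unwrap_or(""));
    }
}
```

Keeping the routing decision in a pure function makes the "503 while installing, 200 afterwards" rule easy to check without opening a socket.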
Fabiano Fidêncio
cd51003f3f Merge pull request #12947 from fidencio/topic/runtime-rs-s390x-docker
runtime-rs: qemu: add CCW network hotplug & retry update_interface
2026-05-03 22:06:00 +02:00
Fabiano Fidêncio
746d182c1a runtime-rs: qemu: add CCW network hotplug & retry update_interface
On s390x, QEMU uses the CCW bus instead of PCI.  The network device
hotplug path was hardcoded to find a PCI slot, which fails with
"no free slots on PCI bridges" on s390x.

Add CCW support to `hotplug_network_device`: when running on a
native CCW bus, allocate a CCW subchannel address and use `devno`
instead of PCI `bus`/`addr`/`vectors`.

Additionally, after hotplugging a network device, the guest kernel
needs time to probe the CCW device before the network interface
appears.  Add a retry loop (up to 10 attempts, 100ms apart) to
`handle_interfaces` so that `update_interface` succeeds once the
guest has created the link.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-05-03 19:26:39 +02:00
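
The bounded retry described in commit 746d182c1a (up to 10 attempts, 100ms apart) might look something like the sketch below. It is not the runtime-rs code: the `retry` helper and the closure standing in for `update_interface` are hypothetical, and the real `handle_interfaces` is presumably async rather than blocking.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping `delay` after each failed attempt that will be retried.
/// `op` receives the 1-based attempt number. It is always called at least once.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Hypothetical stand-in for `update_interface`: the guest link only
    // "appears" on the third attempt.
    let result = retry(10, Duration::from_millis(100), |attempt| {
        if attempt >= 3 {
            Ok(attempt)
        } else {
            Err("link not found")
        }
    });
    println!("{:?}", result); // prints Ok(3)
}
```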
Fabiano Fidêncio
8655d87892 ci: nvidia: Disable NVRC trace logging on nightly runs
On nightly CI, run the NVIDIA GPU tests without setting nvrc.log=trace.
This gives us end-to-end test coverage that more closely matches how
users would actually run Kata Containers with NVIDIA GPUs, since trace
logging is not enabled by default in production.

NVRC trace logging remains enabled for PR runs, where the extra
verbosity is useful for debugging failures.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 18:13:07 +02:00
Fabiano Fidêncio
51d5f2ea7b ci: Run runtime-rs tests for TDX on nightly
As we're in the process of stabilising runtime-rs for the coming 4.0.0
release, we should start running as many tests as possible with it.

The TDX runtime-rs job is gated to nightly runs only (pr-number ==
"nightly") since we only have a single TDX machine and cannot afford
to run both qemu-tdx and qemu-tdx-runtime-rs on every PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 18:05:58 +02:00
Fabiano Fidêncio
8c3c7aa871 ci: Drop ITA_KEY usage from CI workflows
The ITA_KEY secret was conditionally passed to TDX jobs for Intel
Trust Authority attestation, but it is no longer needed. Remove it
from all workflow files and the test helper export.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 18:05:51 +02:00
Steve Horsman
86e5975ad6 Merge pull request #12973 from stevenhorsman/release-concurrency-fix
release: fix release workflow concurrency deadlock
3.30.0
2026-05-02 20:16:29 +01:00
stevenhorsman
9715a7cca2 release: fix release workflow concurrency deadlock
Architecture-specific release workflows were using the same concurrency
group when called from release.yaml, causing GitHub Actions to detect
a deadlock and cancel the builds.

Fix by appending architecture suffix to each workflow's concurrency
group, allowing parallel execution without conflicts.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-02 20:13:17 +01:00
Fabiano Fidêncio
5540f50198 Merge pull request #12972 from stevenhorsman/release/3.30.0
release: Bump version to 3.30.0
2026-05-02 20:54:54 +02:00
Steve Horsman
fd2b85f8ad Merge pull request #12969 from burgerdev/require-codegen
gatekeeper: require codegen
2026-05-02 18:38:53 +01:00
stevenhorsman
a1a6a9a150 release: Bump version to 3.30.0
Bump VERSION and helm-charts versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-02 17:57:39 +01:00
Steve Horsman
3ae3a0437b Merge pull request #12963 from zvonkok/copyfail
kernel: Bump Kernel Version
2026-05-02 16:58:53 +01:00
Markus Rudy
22598a34b2 gatekeeper: require codegen
The codegen check ensures that generated files are up-to-date and
correspond to the tool versions used in CI. Requiring this check
prevents us from accidentally merging, e.g., proto changes without the
corresponding Rust/Go updates.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-05-02 12:28:58 +02:00
dependabot[bot]
7a1fa7842d build(deps): bump streetsidesoftware/cspell-action from 8.3.0 to 8.4.0
Bumps [streetsidesoftware/cspell-action](https://github.com/streetsidesoftware/cspell-action) from 8.3.0 to 8.4.0.
- [Release notes](https://github.com/streetsidesoftware/cspell-action/releases)
- [Changelog](https://github.com/streetsidesoftware/cspell-action/blob/main/CHANGELOG.md)
- [Commits](9cd41bb518...de2a73e963)

---
updated-dependencies:
- dependency-name: streetsidesoftware/cspell-action
  dependency-version: 8.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-01 18:06:22 +00:00
dependabot[bot]
883edd798f build(deps): bump editorconfig-checker/action-editorconfig-checker
Bumps [editorconfig-checker/action-editorconfig-checker](https://github.com/editorconfig-checker/action-editorconfig-checker) from 2.1.0 to 2.2.0.
- [Release notes](https://github.com/editorconfig-checker/action-editorconfig-checker/releases)
- [Commits](4b6cd6190d...840e866d93)

---
updated-dependencies:
- dependency-name: editorconfig-checker/action-editorconfig-checker
  dependency-version: 2.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-01 18:05:35 +00:00
Saul Paredes
cbb06545f7 kata-deploy: configure_mariner: also apply test config to runtime-rs
Apply the same test configs we use in the runtime-go config to the runtime-rs config.

These are:
- runtime.static_sandbox_resource_mgmt = true
- hypervisor.clh.valid_hypervisor_paths includes cloud-hypervisor-glibc
- hypervisor.clh.path = cloud-hypervisor-glibc

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-05-01 08:15:52 -07:00
Saul Paredes
564d381b79 kata-deploy: configure_mariner: correctly set static_sandbox_resource_mgmt
static_sandbox_resource_mgmt is under the runtime config, not the hypervisor one.

See
31f7438ecd/src/runtime/config/configuration-clh.toml.in (L439)

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-05-01 08:15:52 -07:00
Zvonko Kaiser
803531dd9c kernel: Bump Kernel Version
"Copy Fail" (CVE-2026-31431) is a high-severity local privilege escalation (LPE)
vulnerability found in the Linux kernel in April 2026. It affects all major
Linux distributions released since 2017, including those using Long Term Support (LTS) kernels.
The bug allows an unprivileged user to gain root access, escape containers,
and reliably modify the in-memory page cache using a tiny 732-byte script.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-01 14:21:49 +00:00
Steve Horsman
62b847fd6c Merge pull request #12850 from burgerdev/remove-standard-oci-runtime
agent: remove standard-oci-runtime feature
2026-05-01 12:44:10 +01:00
Fabiano Fidêncio
79ba4e2dd0 Merge pull request #12937 from fidencio/topic/kata-deploy-support-containerd-config-version-4
kata-deploy: support containerd config version 4
2026-05-01 07:46:36 +02:00
Fabiano Fidêncio
96b68e77a7 kata-deploy: support containerd config schema version 4 and newer
Containerd 2.3.0 introduces config schema version 4 (see upstream
RELEASES.md and the version-4 server-plugin documentation). The default file
still uses the same split-CRI layout as version 3 (plugins under
io.containerd.cri.v1.runtime and io.containerd.cri.v1.images). Schema v4
mainly moves gRPC, TTRPC, debug, and metrics listener settings under
io.containerd.server.v1.*; kata-deploy does not edit those server tables except
for containerd log verbosity when DEBUG=true.

Fixes: #12936

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-30 16:23:43 +02:00
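
Since commit 96b68e77a7 notes that schema version 4 keeps the same split-CRI layout as version 3, choosing the runtime plugin table from the config version could be sketched as below. This is an illustration only, not kata-deploy's code: the function name is hypothetical, and treating every version below 3 as the version-2 layout (`io.containerd.grpc.v1.cri`) is a simplifying assumption.

```rust
/// Return the containerd plugin table that holds CRI runtime settings for a
/// given config schema version.
/// Assumption: versions below 3 use the single CRI plugin of schema version 2,
/// while version 3 and anything newer use the split-CRI runtime plugin.
fn cri_runtime_plugin(config_version: u32) -> &'static str {
    if config_version >= 3 {
        "io.containerd.cri.v1.runtime"
    } else {
        "io.containerd.grpc.v1.cri"
    }
}

fn main() {
    for version in [2, 3, 4] {
        println!("version {} -> plugins.\"{}\"", version, cri_runtime_plugin(version));
    }
}
```

Comparing with `>= 3` rather than matching exact versions is what lets a future schema version fall through to the newest known layout instead of being rejected.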
Steve Horsman
31f7438ecd Merge pull request #12949 from stevenhorsman/kata-ctl/move-into-root-workspace
kata-ctl: Move into root workspace
2026-04-30 11:45:50 +01:00
stevenhorsman
b61b3d2f20 kata-deploy: Update default tool binary location
Now that all of the tools except agent-ctl (still WIP) are
in the root workspace, switch the default to that location and add
an exception for agent-ctl, as it's the odd one out.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-30 08:46:22 +01:00
stevenhorsman
f8cf47d17c kata-ctl: fix clippy to_string_in_format_args warnings
With the workspace unification we've bumped anyhow
from 1.0.31 to 1.0.102, so update the code to reflect that
the error type now implements `Display` in the newer version.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-30 08:45:27 +01:00
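
The clippy `to_string_in_format_args` lint addressed in commit f8cf47d17c fires when `.to_string()` is called on a formatting argument that already implements `Display`. A minimal sketch of the pattern follows; it uses a hypothetical local `ProbeError` type rather than anyhow's error type so that it stays std-only, and it is not the kata-ctl code.

```rust
use std::fmt;

/// Hypothetical error type used only for this illustration.
#[derive(Debug)]
struct ProbeError {
    code: i32,
}

impl fmt::Display for ProbeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "probe failed with code {}", self.code)
    }
}

fn describe(err: &ProbeError) -> String {
    // Flagged form: format!("error: {}", err.to_string())
    // The extra String allocation is redundant because `{}` already
    // formats the value through its `Display` implementation.
    format!("error: {}", err)
}

fn main() {
    let err = ProbeError { code: 7 };
    println!("{}", describe(&err)); // prints "error: probe failed with code 7"
}
```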
stevenhorsman
efe62c9280 kata-ctl: Move into root workspace
Make kata-ctl a workspace member to simplify
dependency management.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-30 08:45:27 +01:00
Fabiano Fidêncio
1e6c54cbcf Merge pull request #12856 from harshitgupta1337/cbl-mariner-config-return-0
rootfs: Suppress condition check failure errors in cbl-mariner/config.sh
2026-04-30 08:35:06 +02:00
Fabiano Fidêncio
3b978c77ed Merge pull request #12950 from stevenhorsman/trace-forwarder/move-to-root-workspace
trace-forwarder: Move into root workspace
2026-04-29 23:54:43 +02:00
Harshit Gupta
3b796c6579 rootfs: mariner: suppress condition check failure errors
Avoid returning failure from sourced scripts when a condition check evaluates
to false.

Signed-off-by: Harshit Gupta <guptaharshit@microsoft.com>
2026-04-29 14:11:32 -04:00
Fabiano Fidêncio
5f59e20032 Merge pull request #12944 from fidencio/topic/run-arm64-ci-on-PR-again
Revert "ci: Only run arm64 k8s tests on nightly builds"
2026-04-29 15:30:22 +02:00
stevenhorsman
9cae783f14 kata-deploy: fix binary location for trace-forwarder
Moving the trace-forwarder into the root workspace moves the target
directory, so update this target.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-29 13:27:09 +01:00
stevenhorsman
7664ebda7e trace-forwarder: Move into root workspace
Make trace-forwarder a workspace member to simplify
dependency management.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-29 12:11:04 +01:00
Fabiano Fidêncio
1a22c3adec Merge pull request #12942 from stevenhorsman/fix-cri-containerd-test-names
ci: Fix cri-containerd-test names
2026-04-29 09:56:43 +02:00
Fabiano Fidêncio
ef15324b04 Revert "ci: Only run arm64 k8s tests on nightly builds"
This reverts commit c5b159c556, as we now
have three runners plugged into the CI.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-29 07:38:12 +02:00
Steve Horsman
2435970fe8 Merge pull request #12933 from fidencio/topic/runtime-rs-decouple-dragonball-from-non-x86-checks
runtime-rs: drop misleading unsupported arches gating
2026-04-28 18:36:16 +01:00
stevenhorsman
4d4dee3af2 ci: Fix cri-containerd-test names
During the zizmor refactoring I changed the names of two jobs
to make all the architectures match. I forgot to update required_tests
and, because it was a workflow-only change, the PR didn't check this, so
update them now.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 18:30:53 +01:00