Commit Graph

18081 Commits

Author SHA1 Message Date
Fabiano Fidêncio
4e024bfb43 Revert "tests: Skip testing k3s/rke2 with nydus snapshotter"
This reverts commit ab25592533, as now
we're deploying k3s/rke2 in a way that we properly test them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
a2216ec05a tests: set up full K3s/RKE2 V3 containerd template when needed
If the rendered config-v3.toml does not import the drop-in dir, write
the full k3s ContainerdConfigTemplateV3 (with hardcoded import path) so
kata-deploy can use drop-in.

This allows us to test with K3s/RKE2 before my patch there gets
released.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
01895bf87e kata-deploy: use k3s/rke2 drop-in
Check the rendered containerd config for the versioned drop-in dir import
(config.toml.d or config-v3.toml.d) and bail with a clear error if it is
missing.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:08:26 +01:00
Aurélien Bombo
d821d4e572 Merge pull request #12619 from sprt/require-editorconfig
gatekeeper: Add EditorConfig checker to required tests
2026-03-03 21:36:32 -06:00
Fabiano Fidêncio
b0345d50e8 build: kernel: Do not expect a modules tarball for vanilla kernel
When I added this I had in mind the period that we still relied on the
SEV module being generated, which we don't do for quite a long time.

This wrong assumption caused the cache to **ALWAYS** fail, increasing
our build time considerably for no reason.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 20:14:42 +01:00
Aurélien Bombo
911742e26e gatekeeper: Add EditorConfig checker to required tests
Now that it's stable and fully configured.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-03 11:34:06 -06:00
Fabiano Fidêncio
ab25592533 tests: Skip testing k3s/rke2 with nydus snapshotter
We depend on a k3s commit so we can properly test it, or we need to
change our CI quite a bit to deploy a full template with that imports
in. For now, let's just skip the testing in k3s/rke2 and we'll address
it in a different PR.

ref:
b51167a996

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
fa3c3eb2ce ci: Add autogenerated policy tests on k0s, k3s, rke2 and microk8s
These tests run only on nightly and when triggering the dev CI manually.
They cover both nydus snapshotter with guest-pull and experimental-force-guest-pull,
using qemu-coco-dev and qemu-coco-dev-runtime-rs, and are included in the
run-kata-coco-tests workflow behind the extensive-matrix-autogenerated-policy flag.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
3e807300ac tests: k0s: Ensure --logging=containerd=debug is passed
As the default is `info` and that actually overrides whatever is set in
the drop-in file used by k0s.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
876c6c832d tests: set runtime-request-timeout to 600s for k0s, k3s, rke2, microk8s
Align with kubeadm and bare metal by setting the kubelet CRI
runtime-request-timeout to 600s in deploy functions for k0s (worker
profile), k3s (--kubelet-arg), rke2 (config.yaml), and microk8s
(args/kubelet + restart).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
9725df658f tests: k8s: policy: set OCI bundle 1.2.1 for k3s/rke2
k3s and rke2 use containerd that expects OCI bundle 1.2.1; otherwise
autogenerated policy tests fail. Add adapt_common_policy_settings_for_k3s_rke2
and call it from adapt_common_policy_settings when KUBERNETES is k3s or rke2.

Tested with k3s v1.34.4+k3s1, rke2 v1.34.4+rke2r1.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Steve Horsman
7ca8db1e61 Merge pull request #12616 from Amulyam24/go-arch-fix
gha: pass the arch for setup-go on ppc64le
2026-03-03 11:34:30 +00:00
Amulyam24
0754a17fed gha: pass the arch for setup-go on ppc64le
By default, setup-go installs ppc64 binary instead of ppc64le,
resulting in an exec format error. Pass the arch explicitly to fix this.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-03-03 16:41:10 +05:30
Dan Mihai
3ea23528a5 docs: require user/group/fsGroup/supplementalGroups
Add a nydus guest-pull limitation explaining that specifying runAsUser,
runAsGroup, fsGroup, and supplementalGroups are required.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-02 23:48:36 +01:00
Steve Horsman
e50324ba5b Merge pull request #12609 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.40.0
build(deps): bump go.opentelemetry.io/otel/sdk from 1.35.0 to 1.40.0 in /src/runtime
2026-03-02 16:32:40 +00:00
stevenhorsman
993a4846c8 versions: Bump go to 1.25.7
Now that go 1.26 is out, 1.24 is not supported, so bump to
1.25 as per our policy.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-02 16:33:47 +01:00
dependabot[bot]
d95d1796b2 build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.35.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.35.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-02 12:59:21 +00:00
Steve Horsman
501d8d1916 Merge pull request #12596 from kata-containers/remove-install_go
workflow | tests: Remove install go
2026-03-02 12:36:58 +00:00
Steve Horsman
964c91f8fc Merge pull request #12608 from kata-containers/sprt/fix-hostpath-dev-docs
docs: Use more accurate wording for /dev hostPath behavior
2026-03-02 11:50:15 +00:00
Aurélien Bombo
68e67d7f8a docs: Use more accurate wording for /dev hostPath behavior
I got lazy when I first added this section in 5c21b1f, so updating the
language to specify that any non-regular host file (under /dev) qualifies,
not just devices.

This matches the actual code, see:

330bfff4be/src/runtime/virtcontainers/mount.go (L57-L83)

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-02 11:32:01 +00:00
Steve Horsman
b147cb1319 Merge pull request #12587 from fidencio/topic/runtime-add-configurable-kubelet-root-dir
runtimes: add configurable kubelet root dir
2026-02-28 19:06:14 +00:00
Xuewei Niu
8a4ae090e6 Merge pull request #12513 from lifupan/event_publish
send the task create/start/delete event to containerd
2026-02-28 14:41:46 +08:00
Zvonko Kaiser
afe09803a1 gpu: Ignore OVMF and use the Kernel for proper PCI setup
Sometimes OVFM provides incorrect values to the kernel
we override it by telling the kernel to handle the PCI space setup
like allocating the proper window sizes and assigning the proper busses
to each device.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Manuel Huber
88f746dea8 runtime: nvidia: Use OVMF for NV GPU handler
Shift to using OVMF instead of using SeaBios.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>

Update src/runtime/Makefile

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Zvonko Kaiser
eec397ac08 qemu: Remove PCIe root port BAR reserve sizing
Stop computing and setting mem-reserve and pref64-reserve on PCIe root
ports and switch ports. Remove getBARsMaxAddressableMemory() which
scanned host GPU BARs to pre-calculate these values.

The previous approach only considered GPU devices (IsGPU(), class
0x0302) when scanning for BAR sizes, so devices like NVSwitches (class
0x0680) with their 32MB non-prefetchable BAR0 were not accounted for
and received the 4MB default. Additionally, GetTotalAddressableMemory()
classifies BARs by 32/64-bit address width rather than by the
prefetchable flag that QEMU's mem-reserve vs pref64-reserve maps to.

Modern QEMU introspects VFIO device BARs when they are attached to
root ports and sizes the MMIO windows accordingly. Modern OVMF
(edk2-stable202502+) automatically calculates the 64-bit PCI MMIO
aperture based on the BARs of actually present devices during PCI
enumeration. Omitting the reserve parameters lets QEMU and OVMF
handle MMIO window sizing correctly for all device types including
GPUs, NVSwitches, and NICs without requiring host-side BAR scanning.

This also removes the nvpci dependency from qemu_arch_base.go.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Zvonko Kaiser
bb7fd335f3 qemu: Remove OVMF X-PciMmio64Mb fw_cfg hint
Modern OVMF (edk2-stable202502 and later) automatically sizes the
64-bit PCI MMIO aperture based on the BARs of actually attached
devices during PCI enumeration. The opt/ovmf/X-PciMmio64Mb fw_cfg
hint is no longer needed to ensure large-BAR devices like NVIDIA
GPUs receive adequate MMIO space.

The previous approach was fragile: the runtime scanned host PCI
devices to estimate the required aperture size, but only considered
GPU devices (class 0x0302), missing NVSwitches and other devices
with large BARs. Removing this code avoids confusion about MMIO
sizing responsibility.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Fabiano Fidêncio
330bfff4be kata-deploy: Fix nydus snapshotter config (on v3 config version)
On containerd v3 config, disable_snapshot_annotations must be set under the
images plugin, not the runtime plugin.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-27 18:20:30 +01:00
Fabiano Fidêncio
0a73638744 runtime: add configurable kubelet root dir
Different kubernetes distributions, such as k0s, use a different kubelet
root dir location instead of the default /var/lib/kubelet, so ConfigMap
and Secret volume propagation were failing.

This adds a kubelet_root_dir config option that the go runtime uses when
matching volume paths and kata-deploy now sets it automatically for k0s
via a drop-in file.

runtime-rs does not need this option: it identifies ConfigMap/Secret,
projected, and downward-api volumes by volume-type path segment
(kubernetes.io~configmap, etc.), not by kubelet root prefix.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-27 14:10:57 +01:00
Steve Horsman
2695007ef8 Merge pull request #12584 from stevenhorsman/switch-actionlint-workflow
workflow: Update actionlint workflows
2026-02-27 13:03:58 +00:00
stevenhorsman
66e58d6490 tests: Delete install_go.sh
Having a script to install go is legacy from Jenkins, so
delete it, so there is less code in our repo.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-27 12:42:43 +00:00
stevenhorsman
b71bb47e21 workflow: Use setup-go to install go
Rather than having our own script, just use the github action
to install go when needed.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-27 12:42:43 +00:00
Steve Horsman
3442fc7d07 Merge pull request #12477 from kata-containers/workflow-improvements
workflow: Recommended improvements
2026-02-27 11:57:22 +00:00
Markus Rudy
d9d886b419 agent-policy: read bundle-id from OCI spec rootfs
The host path of bundles is not portable and could be literally anything
depending on containerd configuration, so we can't rely on a specific
prefix when deriving the bundle-id. Instead, we derive the bundle-id
from the target root path in the guest.

NOTE: fixes https://github.com/kata-containers/kata-containers/issues/10065

Signed-off-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-27 10:24:38 +01:00
Hyounggyu Choi
be5ae7d1e1 Merge pull request #12573 from BbolroC/support-memory-hotplug-go-runtime-s390x
runtime: Support memory hotplug via virtio-mem on s390x
2026-02-27 09:59:40 +01:00
Steve Horsman
c6014ddfe4 Merge pull request #12574 from sathieu/kata-deploy-kubectl-image
kata-deploy: allow to configure kubectl image
2026-02-27 08:42:06 +00:00
Steve Horsman
1048132eb1 Merge pull request #12564 from stevenhorsman/remove-unused-dependencies
Try and remove unused crates
2026-02-26 13:53:44 +00:00
Aurélien Bombo
2a13f33d50 Merge pull request #12565 from microsoft/danmihai1/clh-51.1
versions: update cloud hypervisor to v51.1
2026-02-26 07:52:57 -06:00
Hyounggyu Choi
b1847f9598 tests: Run TestContainerMemoryUpdate() on s390x only with virtio-mem
Let's run `TestContainerMemoryUpdate` on s390x
only when virtio-mem is enabled.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-26 14:21:34 +01:00
Hyounggyu Choi
b9f3d5aa67 runtime: Support memory hotplug with virtio-mem on s390x
This commit adds logic to properly handle memory hotplug
for QemuCCWVirtio in the ExecMemdevAdd() path.

The new logic is triggered only when virtio-mem is enabled.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-26 14:21:34 +01:00
Hyounggyu Choi
19771671c2 runtime: Handle virtio-mem resize in hotplugAddMemory()
ResizeMemory() already contains the virtio-mem resize logic.
However, hotplugAddMemory(), which is invoked via a different
path, lacked this handling and always fell back to the pc-dimm
path, even when virtio-mem was configured.

This commit adds virtio-mem resize handling to hotplugAddMemory().
It also adds corresponding unit tests.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-26 14:21:34 +01:00
Fabiano Fidêncio
8c91e7889c helm-chart: support digest pinning for images
When image.reference or kubectlImage.reference already contains a digest
(e.g. quay.io/...@sha256:...), use the reference as-is instead of
appending :tag. This avoids invalid image strings like 'image@sha256🔤'
when tag is empty and allows users to pin by digest.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-26 13:39:51 +01:00
Mathieu Parent
b61d169472 kata-deploy: allow to configure kubectl image
This can be used to:

- pin tag (current is 20260112)
- pin digest
- use another image

Signed-off-by: Mathieu Parent <mathieu.parent@insee.fr>
2026-02-26 13:12:03 +01:00
stevenhorsman
308442e887 workflow: Update actionlint workflows
The actionlint gh extension is outdated and the wrapping seems
unnecessary when there is a github action that seems to be maintained,
so let's update to use that

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 11:52:19 +00:00
stevenhorsman
82c27181d8 kata-deploy: Remove unused crates
cargo machete has identified `serde` and `thiserror` as being unused,
so remove them from Cargo.toml

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
bdbfe9915b kata-ctl: Remove unused crates
cargo machete has identified the follow crates as unused:
- containerd-shim-protos
- safe-path
- strum
- ttrpc

strum is neded (and maybe isn't picked up due to it being
used by macros?), so add it to the ignore list and remove
the rest

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
b4365bdcaa genpolicy: Remove unused crates
`cargo machete` has identified `openssl` and `serde-transcode`
as being un-used. openssl is required, so add it to the ignore
list and just remove serde-transcode

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
382c6d2a2f agent-ctl: Remove unused crates
`log` and `rustjail` are flagged by cargo machete as unused,
so lets remove them to reduce the footprint of crates in this tool

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
e43a17c2ba runtime-rs: Remove unused crates
- Remove unused crates to reduce our size and the work needed
to do updates
- Also update package.metadata.cargo-machete with some crates
that are incorrectly coming up as unused

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:46 +00:00
stevenhorsman
8177a440ca libs: Remove unused crates
Remove unused crates to reduce our size and the work needed
to do updates

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:46 +00:00
stevenhorsman
ed7ef68510 dragonball: Remove unused crates
Remove the crates that cargo machete has assessed as being unused

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:15 +00:00