18964 Commits

Author SHA1 Message Date
Fabiano Fidêncio
77e558deb0 runtime: Fix shellcheck issues in git_push.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
4c490579d5 runtime: Fix shellcheck issues in update-generated-runtime-proto.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
71e5e67b07 runtime: Fix shellcheck issues in update-generated-hypervisor-proto.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
01fb3bdd1f runtime: Fix shellcheck issues in tree_status.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
5ef09c222b runtime: Fix shellcheck issues in go-test.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
6e7215317c protocols: Fix shellcheck issues in update-generated-proto.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
e1ab24d320 csi-kata-directvolume: Fix shellcheck issues in directvol-deploy.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
10f81ae534 csi-kata-directvolume: Fix shellcheck issues in rbac-deploy.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
b6c693ae8c csi-kata-directvolume: Fix shellcheck issues in deploy.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
b9e1f74417 csi-kata-directvolume: Fix shellcheck issues in pod-apply.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
0e9a14f7ec csi-kata-directvolume: Fix shellcheck issues in pod-delete.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:07 +02:00
Fabiano Fidêncio
c5c0076859 Merge pull request #12908 from microsoft/saul/disable_nested
runtime-rs: ch: disable nested vCPUs on MSHV
2026-04-24 07:56:18 +02:00
Xiaofan Xxf
fd39117a21 dragonball: Implement userspace IOAPIC to enable split irqchip
From Linux 6.14, creating a TDX VM requires split irqchip to be
enabled. In that mode, the device IOAPIC is managed in userspace
instead of in KVM, so a manager is needed to handle MMIO reads and
writes to the emulated IOAPIC registers.
Also, with split irqchip, irqfd can no longer trigger an interrupt
after device IO completes. Instead, KVM_SIGNAL_MSI is used to
trigger interrupts.

Note that only legacy IRQs with edge-triggered interrupts are
implemented here, and the split irqchip feature is only enabled
when the confidential VM type is set to TDX.

Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
2026-04-24 10:33:05 +08:00
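Illustration for fd39117a21: the sketch below models, in Python, the indirect register access that a userspace IOAPIC has to emulate, plus the translation of an unmasked edge-triggered redirection entry into an MSI address/data pair of the kind a caller could pass to KVM_SIGNAL_MSI. It is not Dragonball's code. The class name `IoApicSketch`, the 24-pin count, and the version value 0x11 are assumptions modelled on the classic 82093AA layout, and read-only status bits are ignored for brevity.

```python
from typing import Optional, Tuple

IOREGSEL = 0x00   # MMIO offset: selects the indirect register index
IOWIN = 0x10      # MMIO offset: window onto the selected register
NUM_PINS = 24     # assumption: 24 input pins, as on the 82093AA
REG_ID, REG_VER, REDIR_BASE = 0x00, 0x01, 0x10
MASK_BIT = 1 << 16
U32 = 0xFFFFFFFF


class IoApicSketch:
    """Hypothetical, simplified userspace IOAPIC register file."""

    def __init__(self) -> None:
        self.ioregsel = 0
        self.apic_id = 0
        # Every 64-bit redirection entry starts out masked.
        self.redir = [MASK_BIT] * NUM_PINS

    def _pin_and_half(self) -> Tuple[int, int]:
        # Entry n occupies indices 0x10 + 2n (low dword) and 0x11 + 2n (high).
        return divmod(self.ioregsel - REDIR_BASE, 2)

    def mmio_read(self, offset: int) -> int:
        if offset == IOREGSEL:
            return self.ioregsel
        if offset != IOWIN:
            return 0
        if self.ioregsel == REG_ID:
            return self.apic_id << 24
        if self.ioregsel == REG_VER:
            return ((NUM_PINS - 1) << 16) | 0x11
        pin, high = self._pin_and_half()
        if 0 <= pin < NUM_PINS:
            return (self.redir[pin] >> (32 * high)) & U32
        return 0

    def mmio_write(self, offset: int, value: int) -> None:
        value &= U32
        if offset == IOREGSEL:
            self.ioregsel = value & 0xFF
            return
        if offset != IOWIN:
            return
        if self.ioregsel == REG_ID:
            self.apic_id = (value >> 24) & 0xF
            return
        pin, high = self._pin_and_half()
        if 0 <= pin < NUM_PINS:
            old = self.redir[pin]
            if high:
                self.redir[pin] = (old & U32) | (value << 32)
            else:
                self.redir[pin] = (old & (U32 << 32)) | value

    def msi_for_pin(self, pin: int) -> Optional[Tuple[int, int]]:
        """Return (address, data) for an unmasked pin, or None if masked."""
        entry = self.redir[pin]
        if entry & MASK_BIT:
            return None
        vector = entry & 0xFF
        delivery_mode = (entry >> 8) & 0x7
        dest_mode = (entry >> 11) & 0x1
        dest_id = (entry >> 56) & 0xFF
        address = 0xFEE00000 | (dest_id << 12) | (dest_mode << 2)
        data = vector | (delivery_mode << 8)  # bit 15 stays 0: edge-triggered
        return address, data
```

A guest programs a pin by writing the register index to IOREGSEL and the value to IOWIN, once for each 32-bit half of the entry. When the device raises that pin, the manager would look up the entry and inject the resulting message instead of relying on irqfd.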
Saul Paredes
ed44b363ba runtime-rs: ch: disable nested vCPUs on MSHV
This is a runtime-rs port of 7973e4e2a8.

The recently-added nested property is true by default, but is not
supported yet on MSHV.

See https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7408 for additional information.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-23 21:04:53 -05:00
Cameron Baird
9da088f06e ci: Introduce smb server test
Add k8s-smb-volume.bats, which stands up an SMB server and an SMB client
(in a kata pod).

Verifies that a CIFS SMB volume can be mounted in the kata VM.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2026-04-23 21:04:46 -05:00
Cameron Baird
b0d52311ed kernel: add required configs for CIFS support
Enable CONFIG_CIFS and related features in the kernel. This allows
SMB file shares to be mounted and used within the guest UVM.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2026-04-23 21:04:46 -05:00
Saul Paredes
75bf21b142 Merge pull request #12910 from microsoft/saul/skip_failing_aks
test: temp skip failing tests on AKS
2026-04-23 18:03:32 -07:00
Saul Paredes
90e94ab305 test: temp skip failing tests on AKS
"kubectl describe" output has been recently updated in AKS,
and this change in behaviour no longer allows us to assess these tests correctly.

failing tests: https://github.com/kata-containers/kata-containers/actions/runs/24809935437/job/72613854358#step:13:609

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-23 14:36:57 -07:00
Aurélien Bombo
75f5ceee2a Merge pull request #12904 from fidencio/topic/ci-survive-k8s-release-day
tests: Fallback to previous k8s minor version on broken pkgs
2026-04-23 09:55:51 -05:00
Fabiano Fidêncio
8f4bf1c1c3 Merge pull request #12906 from fidencio/topic/remove-arm-from-required-tests
gatekeeper: Make arm64 CI unrequired
2026-04-23 16:46:29 +02:00
Fabiano Fidêncio
0959f02b76 gatekeeper: Make arm64 CI unrequired
We have only one machine up and running the CIs, thus no capacity to
keep it as required for now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 16:12:06 +02:00
Fabiano Fidêncio
a8d81acb0a tests: Fallback to previous k8s minor version on broken pkgs
The latest stable Kubernetes version advertised by dl.k8s.io may
temporarily have unresolvable package dependencies (e.g. missing
cri-tools or kubernetes-cni for the newest minor). This causes CI
failures during k8s deployment.

Refactor do_deploy_k8s to resolve the version once, perform a dry-run
apt-get install check, and if it fails, automatically fall back to the
previous minor version (e.g. v1.36 -> v1.35) before retrying.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 14:04:10 +02:00
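Illustration for a8d81acb0a: a minimal Python sketch of the fallback logic described above, not the actual do_deploy_k8s shell code. The function names are hypothetical, and the `installable` callback is an assumption standing in for the dry-run apt-get install check.

```python
import re
from typing import Callable


def to_minor(version: str) -> str:
    """Reduce 'v1.36' or 'v1.36.2' to its minor release line, 'v1.36'."""
    match = re.fullmatch(r"v(\d+)\.(\d+)(?:\.\d+)?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return f"v{match.group(1)}.{match.group(2)}"


def previous_minor(minor_line: str) -> str:
    """Return the minor release line before the given one, e.g. v1.36 -> v1.35."""
    major, minor = (int(part) for part in minor_line[1:].split("."))
    if minor == 0:
        raise ValueError("no previous minor version to fall back to")
    return f"v{major}.{minor - 1}"


def pick_installable_minor(latest: str, installable: Callable[[str], bool]) -> str:
    """Resolve the version once, probe it, and fall back one minor if needed."""
    candidate = to_minor(latest)
    if installable(candidate):
        return candidate
    fallback = previous_minor(candidate)
    if installable(fallback):
        return fallback
    raise RuntimeError(f"neither {candidate} nor {fallback} is installable")
```

For example, `pick_installable_minor("v1.36.0", probe)` returns `"v1.35"` when `probe` rejects v1.36 but accepts v1.35. The sketch falls back only once, matching the single-step fallback the commit message describes.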
Fupan Li
18378145d2 Merge pull request #12821 from fidencio/topic/runtime-rs-cpu-pinning
runtime-rs: Add vCPU thread pinning support
2026-04-23 16:49:18 +08:00
Fabiano Fidêncio
f092210342 Merge pull request #12892 from kata-containers/topic/remove-non-running-tests
ci: Remove non-running tests
2026-04-23 09:41:38 +02:00
Fabiano Fidêncio
68cc7f8e70 ci: remove unmaintained CoCo stability test workflows
The ci-coco-stability.yaml workflow has its weekly schedule
commented out with a note that the workload is not maintained.
Remove the entire chain: ci-coco-stability.yaml, ci-weekly.yaml,
run-kata-coco-stability-tests.yaml, and the kubernetes stability
test scripts that were only used through this path.

The local containerd stability tests (tests/stability/gha-run.sh)
remain as they are actively used by basic-ci workflows.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
fccfd4dec7 tests: remove orphan vfio.yaml k8s workload manifest
This manifest is not referenced by any .bats test file and
is effectively dead code.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
c380c4c1d2 tests: remove unreferenced stdio integration tests
The tests/integration/stdio/ directory has a gha-run.sh script
but no workflow in .github/workflows/ references it, so these
tests never run in CI.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
e0d98fafe3 ci: remove disabled run-cri-containerd-tests-arm64 job
This job in ci.yaml has been unconditionally disabled (if: false)
with no tracking issue or path to re-enablement.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
c7e3f95883 tests: remove disabled tracing tests and CI job
The run-tracing job in basic-ci-amd64.yaml has been disabled
(if: false) due to issue #9763, with no path to re-enablement.
Remove the job definition and the backing
tests/functional/tracing/ directory.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
8a93cf8f17 tests: remove disabled VFIO tests and CI job
The run-vfio job in basic-ci-amd64.yaml has been disabled
(if: false) due to issues #9764, #9851, and #9940, with no
path to re-enablement. Remove the job definition and the
backing tests/functional/vfio/ directory.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
8e685f22c6 ci: remove orphan run-kata-deploy-tests-on-aks.yaml workflow
This reusable workflow (workflow_call) has no caller anywhere in
the repository, making it dead code.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
b74f2c0a9c tests: remove metrics tests and workflow
The run-metrics.yaml workflow is a reusable workflow_call with no
caller in the repository, making it effectively dead code. Remove
the workflow, the entire tests/metrics/ directory (~586 files
including vendored Go for checkmetrics), and the "metrics"
self-hosted runner label from actionlint.yaml.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Aurélien Bombo
87a3318151 Merge pull request #12695 from microsoft/saulparedes/test_mariner_runtime-rs
ci: k8s-tests: test mariner and runtime-rs
2026-04-22 16:01:08 -05:00
Fabiano Fidêncio
8dccf4cf37 Merge pull request #12896 from fidencio/release/3.29.0
release: Bump version to 3.29.0
3.29.0
2026-04-22 20:45:50 +02:00
Fabiano Fidêncio
1b9e49eb27 Merge commit from fork
genpolicy: restrict symlinks in CopyFile
2026-04-22 20:05:03 +02:00
Fabiano Fidêncio
ed3f8b4efe release: Bump version to 3.29.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-22 15:57:39 +02:00
Markus Rudy
639ff3578d genpolicy: restrict symlinks in CopyFile
Allowing arbitrary symlinks in the shared directory is unsafe for
confidential VM use cases. In order to make CopyFile safe both for the
VM and for the consuming containers, we implement the following
rules for symlinks (in addition to the existing rules for other files):

1. Symlinks may not be placed directly into the shared directory.
2. Symlinks must not point 'upwards', i.e. contain `..` as a path
   element.
3. Symlinks must be relative.

These rules ensure that all writes initiated by CopyFile are restricted
to the shared directory (protecting the VM), and that symlinks can't
point outside their mount points (protecting the container).

These new restrictions mean that we can't support arbitrary mount
sources (which might not follow these rules), but the usual k8s suspects
(ConfigMap, Secret, ServiceAccountToken) should still pass.

In order to aid writing the policy, we convert the CopyFileRequest to a
structure that does not contain binary data, but well-defined strings
and types.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
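Illustration for 639ff3578d: the three symlink rules can be sketched as a small Python predicate. This is not the genpolicy implementation, which expresses the rules as policy. The function name, the `/shared` directory, and the extra check that the link lives under the shared directory (assumed here to be one of the pre-existing rules) are assumptions for illustration.

```python
from pathlib import PurePosixPath


def symlink_allowed(shared_dir: str, link_path: str, target: str) -> bool:
    """Check a symlink request against the three rules from the commit message."""
    shared = PurePosixPath(shared_dir)
    link = PurePosixPath(link_path)
    tgt = PurePosixPath(target)

    # Assumed pre-existing rule: the link itself stays inside the shared
    # directory and its own path has no '..' element.
    if shared not in link.parents or ".." in link.parts:
        return False
    # Rule 1: symlinks may not be placed directly into the shared directory.
    if link.parent == shared:
        return False
    # Rule 2: the target must not contain '..' as a path element.
    if ".." in tgt.parts:
        return False
    # Rule 3: the target must be relative.
    if tgt.is_absolute():
        return False
    return True
```

Rule 2 compares whole path elements, so a target such as `..2026_04_22` (a name that merely starts with two dots, as in the timestamped directories Kubernetes uses for ConfigMap volumes) is accepted, while `../other` is rejected.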
Markus Rudy
d6bd666b3f agent: fix naming for symlinks in CopyFile
The agent referred to the `data` field of an incoming CopyFileRequest
as the 'src'. This is misleading, because 'source' is not mentioned
in the specification (where links are just a path with attached
bytes), and because the documentation for the `ln` utility calls the
path LINK_NAME and the data TARGET. This commit fixes the glitch and
calls the first argument to `symlinkat` the target.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
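Illustration for d6bd666b3f: a short Python sketch of the naming the commit settles on, where the request's path is the link name and its data bytes are the target. The `CopyFileRequestSketch` class is a hypothetical, simplified stand-in for the real CopyFileRequest, and Python's `os.symlink` is used here in place of the agent's `symlinkat` call.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class CopyFileRequestSketch:
    """Hypothetical, simplified request: a path with attached bytes."""
    path: str    # where the link is created (LINK_NAME in `ln` terms)
    data: bytes  # what the link points at (TARGET in `ln` terms)


def create_symlink(request: CopyFileRequestSketch) -> None:
    target = os.fsdecode(request.data)
    link_name = request.path
    # Same argument order as `ln -s TARGET LINK_NAME`: target first.
    os.symlink(target, link_name)


with tempfile.TemporaryDirectory() as tmp:
    req = CopyFileRequestSketch(path=os.path.join(tmp, "link"), data=b"data/file.txt")
    create_symlink(req)
    stored_target = os.readlink(req.path)
```

After the call, `stored_target` holds the bytes from `data` decoded as a path, which is why calling that field the "source" was misleading: nothing is copied from it, it is only recorded as the link's target.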
Markus Rudy
5c362adcff agent: add required features for standalone build
Building the kata-agent-policy crate only succeeded when its parents
(agent and genpolicy) pulled in the required features. This commit adds
the required features to the crate itself, such that it can be built
standalone and IDEs don't show errors while browsing it.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
Fabiano Fidêncio
47dea24409 Merge pull request #12895 from fidencio/topic/kata-deploy-avoid-shipping-what-we-do-not-test
kata-deploy: Remove arm64 and qemu-cca shim support
2026-04-22 15:42:43 +02:00
Fabiano Fidêncio
726992cde3 Merge pull request #12702 from Apokleos/update-docs2
docs: Update docs of kata-containers
2026-04-22 12:04:48 +02:00
Fabiano Fidêncio
9b62021049 kata-deploy: Remove untested arm64 and qemu-cca shim support
We should not ship configurations that we do not actively test.

This commit drops the following from the kata-deploy helm chart:

values.yaml:
- arm64 from supportedArches for the clh shim
- arm64 from supportedArches for the cloud-hypervisor shim
- arm64 from supportedArches for the dragonball shim
- arm64 from supportedArches for the fc shim
- arm64 from supportedArches for the qemu-nvidia-gpu shim
- the entire qemu-cca shim definition

try-kata-tee.values.yaml:
- CCA from the file description comment
- qemu-cca from the TEE shims list comment
- the entire qemu-cca shim definition
- arm64: qemu-cca from the defaultShim mapping, replaced with
  arm64: qemu-coco-dev-runtime-rs (which is tested)

try-kata-nvidia-gpu.values.yaml:
- arm64 from supportedArches for the qemu-nvidia-gpu shim
- arm64: qemu-nvidia-gpu from the defaultShim mapping

Once arm64 and qemu-cca support are properly tested, they can be
re-added.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-22 10:55:29 +02:00
Alex Lyn
978f40d631 docs: Remove obsolete and update documentation index
This commit prunes the documentation tree by removing files
that are either no longer relevant to the current architecture
or have been superseded by newer guides.

Specifically, it removes Intel-Discrete-GPU-passthrough-and-Kata.md
and updates the using-Intel-QAT-and-kata.md index entry in nav.yaml.

Refining the documentation helps ensure that new contributors
find accurate and up-to-date information.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Alex Lyn
59609463e0 docs: Update kernel modules loading document
- Restructure document with clearer sections and better readability
- Add configuration format examples for both runtimes
- Add technical details including data flow and implementation references
- Add debugging section for troubleshooting

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Alex Lyn
d6308ffb8c docs: Update SPDK vhost-user guide with CSI driver
- Add support for runtime-rs with Dragonball
- Add CSI driver integration method for Kubernetes
- Add kata-ctl direct-volume method for manual setup
- Preserve SPDK vhost-user Target Overview principles
- Fix minor typo (can exposes -> can expose)

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Saul Paredes
cafdd278ba tests: k8s: policy: improve settings selection for runtime-rs hypervisors
"cloud-hypervisor" is also a runtime-rs hypervisor, so we need to include it in the settings selection logic.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-21 14:08:27 -07:00
Saul Paredes
baf0f16804 ci: k8s-tests: test mariner and runtime-rs
Disable policy tests when using mariner and runtime-rs. These are not supported yet.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-21 14:08:21 -07:00
Fabiano Fidêncio
0c80372cf5 Merge pull request #12881 from stevenhorsman/bump-web-pki-to-0.103.12
Bump web pki to 0.103.12
2026-04-21 18:11:26 +02:00
Aurélien Bombo
206c1d3be8 Merge pull request #12889 from fidencio/topic/ch-config
hypervisor: Enable cloud-hypervisor feature by default
2026-04-21 11:04:31 -05:00
Fabiano Fidêncio
48669a894e runtime-rs: Add vCPU thread pinning support
Port the Go runtime's enable_vcpus_pinning feature to runtime-rs.

The Go runtime already lets users pin each vCPU thread to a specific
host CPU when the vCPU count matches the sandbox cpuset size, using
sched_setaffinity. This is useful for latency-sensitive workloads that
benefit from eliminating cross-CPU migration of vCPU threads.

The approach mirrors the Go implementation:

After VM start and on every container add/update/delete, we fetch the
vCPU thread IDs (via QMP query-cpus-fast for QEMU), compute the union of
all containers' OCI cpusets, and if the two counts match, pin vCPU i to
cpuset[i]. If they diverge (hotplug, container removal, etc.) we reset
all threads back to the full cpuset so nothing gets stuck on a single
core.

The pinning check lives in CgroupsResourceInner::update_sandbox_cgroups,
which already runs at exactly the right points in the lifecycle. The
enable_vcpus_pinning flag flows from the TOML config through
CgroupConfig into the cgroup resource layer, and can also be overridden
per-pod via the io.katacontainers.config.runtime.enable_vcpus_pinning
annotation.

The QEMU config templates default to false. The NV GPU configs will get
their own default (true) in a follow-up once those templates are added.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-21 12:45:56 +02:00
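Illustration for 48669a894e: the pin-or-reset decision described above can be sketched in Python as a pure planning function plus a thin apply step. This is not the runtime-rs code. The function names are hypothetical, and two details are assumptions: that cpuset[i] refers to the union sorted in ascending order, and that the "full cpuset" used on reset is that same union.

```python
import os
from typing import Dict, Iterable, List, Set


def plan_vcpu_affinity(
    vcpu_tids: List[int], container_cpusets: Iterable[Set[int]]
) -> Dict[int, Set[int]]:
    """Map each vCPU thread ID to the set of host CPUs it may run on."""
    # Union of all containers' OCI cpusets, in ascending order (assumption).
    sandbox_cpuset = sorted(set().union(*container_cpusets))
    if len(vcpu_tids) == len(sandbox_cpuset):
        # Counts match: pin vCPU i to cpuset[i].
        return {tid: {cpu} for tid, cpu in zip(vcpu_tids, sandbox_cpuset)}
    # Counts diverge (hotplug, container removal, ...): let every thread
    # roam over the full cpuset so none stays stuck on a single core.
    return {tid: set(sandbox_cpuset) for tid in vcpu_tids}


def apply_affinity(plan: Dict[int, Set[int]]) -> None:
    """Apply the plan with sched_setaffinity (Linux only, needs privileges)."""
    for tid, cpus in plan.items():
        os.sched_setaffinity(tid, cpus)
```

Keeping the decision separate from the `sched_setaffinity` call makes the matching logic easy to check without touching real threads. With thread IDs `[101, 102]` and container cpusets `{2}` and `{3}`, the plan pins 101 to CPU 2 and 102 to CPU 3. With a single cpuset `{2, 3, 4}`, both threads get the full set.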