Commit Graph

17406 Commits

Author SHA1 Message Date
Steve Horsman
d8405cb7fb Merge pull request #11983 from stevenhorsman/toolchain-guidance
doc: Document our Toolchain policy
2025-12-02 15:47:54 +00:00
stevenhorsman
b9cb667687 doc: Document our Toolchain policy
Create an initial version of our toolchain policy as agreed in
Architecture Committee meetings and the PTG

Fixes: #9841
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 14:28:29 +00:00
stevenhorsman
5c618dc8e2 tests: Switch nginx images to use version.yaml details
- Swap out the hard-coded nginx registry and verisons for reading
the test image details for version.yaml
which can also ensure that the quay.io mirror is used
rather than the docker hub versions which can hit pull limits
- Try setting imagePullPoliycy Always to fix issues with the arm CI

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 10:04:09 +01:00
Manuel Huber
4355af7972 kata-deploy: Fix binary find install_tools_helper
Using make tarball targets for tools locally, binaries may exist
for both debug and release builds. In this case, cryptic errors
are shown as we try to install multiple binaries.
This change require exactly one binary to be found and errors  out
in other cases.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-01 09:29:24 -08:00
Manuel Huber
5a5c43429e ci: nvidia: remove kubectl_retry calls
When tests regress, the CI wait time can increase significantly
with the current kubectly_retry attempt logic. Thus, align with
other tests and remove kubectl_retry invocations. Instead, rely on
proper timeouts.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-28 19:00:57 +01:00
Fabiano Fidêncio
e3646adedf gatekeeper: Drop SEV-SNP from required
SEV-SNP machine is failing due to nydus not being deployed in the
machine.

We cannot easily contact the maintainers due to the US Holidays, and I
think this should become a criteria for a machine not be added as
required again (different regions coverage).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-28 12:46:07 +01:00
Steve Horsman
8534afb9e8 Merge pull request #12150 from stevenhorsman/add-gatekeeper-triggers
ci: Add two extra gatekeeper triggers
2025-11-28 09:34:41 +00:00
Zvonko Kaiser
9dfa6df2cb agent: Bump CDI-rs to latest
Latest version of container-device-interface is v0.1.1

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 22:57:50 +01:00
Fabiano Fidêncio
776e08dbba build: Add nvidia image rootfs builds
So far we've only been building the initrd for the nvidia rootfs.
However, we're also interested on having the image beind used for a few
use-cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-27 22:46:07 +01:00
stevenhorsman
531311090c ci: Add two extra gatekeeper triggers
We hit a case that gatekeeper was failing due to thinking the WIP check
had failed, but since it ran the PR had been edited to remove that from
the title. We should listen to edits and unlabels of the PR to ensure that
gatekeeper doesn't get outdated in situations like this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-27 16:45:04 +00:00
Zvonko Kaiser
bfc9e446e1 kernel: Add NUMA config
Add per arch specific NUMA enablement kernel settings

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 12:45:27 +01:00
Steve Horsman
c5ae8c4ba0 Merge pull request #12144 from BbolroC/use-runs-on-to-choose-runners
GHA: Use `runs-on` only for choosing proper runners
2025-11-27 09:54:39 +00:00
Fabiano Fidêncio
2e1ca580a6 runtime-rs: Only QEMU supports templating
We can remove the checks and default values attribution from all other
shims.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-27 10:31:28 +01:00
Alex Lyn
df8315c865 Merge pull request #12130 from Apokleos/stability-rs
tests: Enable stability tests for runtime-rs
2025-11-27 14:27:58 +08:00
Fupan Li
50dce0cc89 Merge pull request #12141 from Apokleos/fix-nydus-sn
tests: Properly handle containerd config based on version
2025-11-27 11:59:59 +08:00
Fabiano Fidêncio
fa42641692 kata-deploy: Cover all flavours of QEMU shims with multiInstallSuffix
We were missing all the runtime-rs variants.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-26 17:44:16 +01:00
Fabiano Fidêncio
96d1e0fe97 kata-deploy: Fix multiInstallSuffix for NV shims
When using the multiInstallSuffix we must be cautelous on using the shim
name, as qemu-nvidia-gpu* doesn't actually have a matching QEMU itself,
but should rather be mapped to:
qemu-nvidia-gpu -> qemu
qemu-nvidia-gpu-snp -> qemu-snp-experimental
qemu-nvidia-gpu-tdx -> qemu-tdx-experimental

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-26 17:44:16 +01:00
Markus Rudy
d8f347d397 Merge pull request #12112 from shwetha-s-poojary/fix_list_routes
agent: fix the list_routes failure
2025-11-26 17:32:10 +01:00
Steve Horsman
3573408f6b Merge pull request #11586 from zvonkok/numa-qemu
qemu: Enable NUMA
2025-11-26 16:28:16 +00:00
Steve Horsman
aae483bf1d Merge pull request #12096 from Amulyam24/enable-ibm-runners
ci: re-enable IBM runners for ppc64le and s390x
2025-11-26 13:51:21 +00:00
Steve Horsman
5c09849fe6 Merge pull request #12143 from kata-containers/topic/add-report-tests-to-workflows
workflows: Add Report tests to all workflows
2025-11-26 13:18:21 +00:00
Steve Horsman
ed7108e61a Merge pull request #12138 from arvindskumar99/SNPrequired
CI: readding SNP as required
2025-11-26 11:33:07 +00:00
Amulyam24
43a004444a ci: re-enable IBM runners for ppc64le and s390x
This PR re-enables the IBM runners for ppc64le/s390x build jobs and s390x static checks.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2025-11-26 16:20:01 +05:30
Hyounggyu Choi
6f761149a7 GHA: Use runs-on only for choosing proper runners
Fixes: #12123

`include` in #12069, introduced to choose a different runner
based on component, leads to another set of redundant jobs
where `matrix.command` is empty.
This commit gets back to the `runs-on` solution, but makes
the condition human-readable.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-26 11:35:30 +01:00
Alex Lyn
4e450691f4 tests: Unify nydus configuration to containerd v3 schema
Containerd configuration syntax (`config.toml`) varies across versions,
requiring per-version logic for fields like `runtime`.

However, testing confirms that containerd LTS (1.7.x) and newer
versions fully support the v3 schema for the nydus remote snapshotter.

This commit changes the previous containerd v1 settings in `config.toml`.
Instead, it introduces a unified v3-style configuration for nydus, which
can be vailid for lts and active containerds.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-26 17:58:16 +08:00
stevenhorsman
4c59cf1a5d workflows: Add Report tests to all workflows
In the CoCo tests jobs @wainersm create a report tests step
that summarises the jobs, so they are easier to understand and
get results for. This is very useful, so let's roll it out to all the bats
tests.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-26 09:28:36 +00:00
shwetha-s-poojary
4510e6b49e agent: fix the list_routes failure
relax list_routes tests so not every route requires a device

Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2025-11-25 20:25:46 -08:00
Xuewei Niu
04e1cf06ed Merge pull request #12137 from Apokleos/fix-netdev-mq
runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean
2025-11-26 11:49:33 +08:00
Alex Lyn
ebe084e093 Merge pull request #12122 from fidencio/topic/configs-do-no-have-commented-out-options
runtimes: config: Do NOT have commented fields
2025-11-26 10:33:32 +08:00
Alex Lyn
e9f50f6e71 Merge pull request #12116 from manuelh-dev/mahuber/ci-openvpn-policy-v2
policy: ci: enable security policy for openvpn test case
2025-11-26 09:35:43 +08:00
Fabiano Fidêncio
e859537c74 runtimes: config: Do NOT have commented fields
In order to have a better way to set things up using a toml editor, we
should take the containerd approach and actually have everything
uncommnted.  This will help us to unify how we deal with such values in
the future from the kata-deploy POV.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-25 19:26:56 +01:00
Arvind Kumar
c085011a0a CI: readding SNP as required
Reenabling the SNP CI node as a required test.

Signed-off-by: Arvind Kumar <arvinkum@amd.com>
2025-11-25 17:05:01 +00:00
Fabiano Fidêncio
5ca4f2b9ff runtimes: annotations: Fix kernel param handling
We need to ensure that we do not blindly append nor blindly override the
kernel parameters set by default, but rather modify the values in case
they exist, and append in case they do not.

Now we're actually making golang and rust runtime behave the same, as so
far they were behaving differently, each version wrong in its own way.
:-p.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-25 16:04:52 +01:00
Zvonko Kaiser
45cce49b72 shellcheckk: Fix [] [[]] SC2166
This file is a beast so doing one shellcheck fix after the other.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:46:16 +01:00
Zvonko Kaiser
b2c9439314 qemu: Update tools/packaging/static-build/qemu/build-qemu.sh
This nit was introduced by 227e717 during the v3.1.0 era. The + sign from the bash substitution ${CI:+...} was copied by mistake.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-25 15:46:09 +01:00
Zvonko Kaiser
2f3d42c0e4 shellcheck: build-qemu.sh is clean
Make shellcheck happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:46:07 +01:00
Zvonko Kaiser
f55de74ac5 shellcheck: build-base-qemu.sh is clean
Make shellcheck happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:45:49 +01:00
Zvonko Kaiser
040f920de1 qemu: Enable NUMA support
Enable NUMA support with QEMU.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:45:00 +01:00
Alex Lyn
de9308419b Merge pull request #12135 from microsoft/danmihai1/init-data
agent: allow disabling detect_initdata_device
2025-11-25 21:07:57 +08:00
Alex Lyn
34d3bd18bc Merge pull request #12132 from fidencio/topic/runtime-classes-fix-nvidia-gpu-podOverhead
runtimeclasses: Fix nvidia-gpu podOverhead
2025-11-25 20:23:07 +08:00
Alex Lyn
7f4d856e38 tests: Enable nydus tests for qemu-runtime-rs
We need enable nydus tests for qemu-runtime-rs, and this commit
aims to do it.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 17:45:57 +08:00
Alex Lyn
98df3e760c runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean
QEMU netdev_add QMP command requires the 'mq' (multi-queue) argument
to be of boolean type (`true` / `false`). In runtime-rs the virtio-net
device hotplug logic currently passes a string value (e.g. "on"/"off"),
which causes QEMU to reject the command:
```
    Invalid parameter type for 'mq', expected: boolean
```
This patch modifies `hotplug_network_device` to insert 'mq' as a proper
boolean value of `true . This fixes sandbox startup failures when
multi-queue is enabled.

Fixes #12136

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 17:34:36 +08:00
Alex Lyn
23393d47f6 tests: Enable stability tests for qemu-runtime-rs on nontee
Enable the stability tests for qemu-runtime-rs CoCo on non-TEE
environments

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:18:37 +08:00
Alex Lyn
f1d971040d tests: Enable run-nerdctl-tests for qemu-runtime-rs
Enable nerdctl tests for qemu-runtime-rs

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:14:50 +08:00
Alex Lyn
c7842aed16 tests: Enable stability tests for runtime-rs
As previous set without qemu-runtime-rs, we enable it in this commit.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:12:12 +08:00
Alex Lyn
aadf1d6f71 Merge pull request #11932 from Apokleos/enhance-blk-params
runtime-rs: Allow configuration of virtio block queue parameters
2025-11-25 15:24:12 +08:00
Dan Mihai
22d60a36c0 agent: allow disabling detect_initdata_device
Allow users to build the Kata Agent using INIT_DATA=no to disable the
detect_initdata_device() code loop and associated debug log output.

Future additional improvements related to Init Data are tracked by #11532.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-25 02:44:28 +00:00
Fabiano Fidêncio
bb56a2e4d9 runtimeclasses: Fix nvidia-gpu podOverhead
On 69c4fc4e76, I've mistakenly changed the
nvidia-gpu podOverhead while I should only have changed the TEE
nvidia-gpu ones.

Let's move it back to its original value.

Reported-by: Joji Mekkattuparamban <jojim@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-24 21:43:29 +01:00
Zvonko Kaiser
55489818d6 gpu: TDX kernel param cleanup
This settings is not needed anymore with Ubuntu 25.10
and the newest QEMU releases for TDX by Ubuntu.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-24 15:49:16 +01:00
Steve Horsman
e1e370091c Merge pull request #12128 from fidencio/topic/kata-deploy-nfd-adjust-runtime-classe
kata-deploy: nfd: Patch TEE runtimeclasses when needed
2025-11-24 14:05:43 +00:00