Commit Graph

15496 Commits

Author SHA1 Message Date
Fabiano Fidêncio
545780a83a shellcheck: tests: k8s: Fix gha-run.sh warnings
As we'll touch this file during this series, let's already make sure we
solve all the needed warnings.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
Fabiano Fidêncio
50f765b19c shellcheck: tests: Fix gha-run-k8s-common.sh warnings
Let's fix all the warnings caught in this file, as we're already
touching it.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
Fabiano Fidêncio
219db60071 tests: kata-deploy: microk8s: Re-work installation
So we can ensure that the user has enough permissions to access
microk8s.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
Fabiano Fidêncio
c337a21a4e shellcheck: kata-deploy: Fix warnings
He were fixing the few warnings we found in the files present in the
functional tests for kata-deploy.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
Fabiano Fidêncio
fd832d0feb tests: kata-deploy: Run installation with only one VMM
It doesn't make much sense to test different VMMs as that wouldn't
trigger a different code path.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
Fabiano Fidêncio
14bf653c35 tests: kata-deploy: Re-add tests, now using github runners
As GitHub runners now support nested virt, we're don't depend on garm
for those anymore.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
Zvonko Kaiser
a5629f9bfa
Merge pull request #10971 from zvonkok/host-guest-mapping
agent: Enable VFIO and initContainers
2025-03-05 08:58:45 -05:00
Fabiano Fidêncio
504d9e2b66
Merge pull request #10976 from zvonkok/fix-dev-permissions
agent: Fix default linux device permissions
2025-03-05 13:54:06 +01:00
Hyounggyu Choi
65021caca6
Merge pull request #10963 from RuoqingHe/remove-arch-predicates-in-runtime-rs
runtime-rs: Enable Dragonball only for x86_64 & aarch64
2025-03-05 09:10:33 +01:00
Zvonko Kaiser
c73ff7518e agent: Fix default linux device permissions
We had the default permissions set to 0o000 if the file_mode was not
present, for most container devices this is the wrong default. Since
those devices are meant also to be accessed by users and others add a
sane default of 0o666 to devices that do not have any permissions set.

Otherwise only root can acess those and we cannot run containers as a
user.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-05 02:22:24 +00:00
Zvonko Kaiser
4bb0eb4590
Merge pull request #10954 from kata-containers/topic/metrics-kata-deploy
Rework and fix metrics issues
2025-03-04 20:22:53 -05:00
Dan Mihai
edf6af2a43
Merge pull request #10955 from microsoft/cameronbaird/hyp-loglevel-default-upstream
runtime: Properly set default hyp loglevel to 1
2025-03-04 16:44:08 -08:00
Cameron Baird
d48116114e runtime: Properly set default hyp loglevel to 1
Tweak default HypervisorLoglevel config option for clh to 1.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-03-04 20:36:40 +00:00
Zvonko Kaiser
248d04c20c agent: Enable VFIO and initContainers
We had a static mapping of host guest PCI addresses, which prevented to
use VFIO devices in initContainers. We're tracking now the host-guest
mapping per container and removing this mapping if a container is
removed.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-04 19:53:52 +00:00
Fabiano Fidêncio
874129a11f
Merge pull request #10958 from stevenhorsman/shell-check-errors-fix
Shell check errors fix
2025-03-04 17:37:36 +01:00
stevenhorsman
02a2f6a9c1 tests: Sanitize K8S_TEST_ENTRY
Now we've added the double quotes around
`${K8S_TEST_UNION[@]}`, so platforms are
failing with:
```
Error: Test file "/home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/tests/integration/kubernetes/k8s-nginx-connectivity.bats
" does not exist
```
due to the line continuation, so sanitise the value
to try and fix this.

Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
e33ad56cf4 kernel: bump kata_config_version
Bump kernel version as the build-kernel script
was updated (even if there was no functional change).

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
2df3e5937a ci/openshift-ci: Fix script error
The space was missing before `]`, so fix this and also
swtich to double square brackets and variable braces

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
9a9e88a38d test: vfio: Attempt to fix logic
This was checking that a literal string was non-zero.
I'm assume it instead wanted to check if the file exists

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
b220cca253 shellcheck: Fix shellcheck SC2066
> Since you double-quoted this, it will not word split, and the loop will only run once.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
b8cfdd06fb shellcheck: Fix shellcheck SC2071
> > is for string comparisons. Use -gt instead.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
eb90b93e3f shellcheck: Fix shellcheck SC2104
> In functions, use return instead of break.
> rationale: break or continue are used to abort or
continue a loop, and are not the right way to exit
a function. Use return instead.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:10 +00:00
stevenhorsman
67bfd4793e shellcheck: Fix shellcheck SC2242
> Can only exit with status 0-255. Other data should be written to stdout/stderr.

Switch exit -1 to exit 1

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:39:01 +00:00
stevenhorsman
ed8347c868 shellcheck: Fix shellcheck SC2070
> -n doesn't work with unquoted arguments. Quote or use [[ ]]

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
dbba6b056b shellcheck: Fix shellcheck SC2148
> Tips depend on target shell and yours is unknown. Add a shebang.

Add
```
#!/usr/bin/env bash
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
c5ff513e0b shellcheck: Fix shellcheck SC2068
> Double quote array expansions to avoid re-splitting elements

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
58672068ff shellcheck: Fix shellcheck SC2145
> Argument mixes string and array. Use * or separate argument.

- Swap echos for printfs and improve formatting
- Replace $@ with $*
- Split arrays into separate arguments

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
bc2d7d9e1e osbuilder: Skip shellcheck on test_images.sh
I'm not sure if we use test_images anywhere, so before
we invest the time to fix the 120 shellcheck errors and warnings
we should decide if we want to keep it. See #10957

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
fb1d4b571f workflows: Add required shellcheck workflow
Start with a required smaller set of shellchecks
to try and prevent regressions whilst we fix
the current problems

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
stevenhorsman
b3972df3ca workflows: Shellcheck - ignore vendor
Ignore the vendor directories in our shellcheck
workflow as we can't fix them. If there is a way to
set this in shellcheckrc that would be better, but
it doesn't seem to be implemented yet.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
Zvonko Kaiser
4df406f03c
Merge pull request #10965 from zvonkok/fix-init
gpu: fix init symlinks
2025-03-03 14:46:41 -05:00
Zvonko Kaiser
eb2f75ee61 gpu: fix init symlinks
With the recent changes we need to make sure NVRC is symlinked
for init and sbin/init

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-03 17:21:59 +00:00
Greg Kurz
545022f295
Merge pull request #10817 from Jakob-Naucke/virtio-net-ccw
Fix virtio-net-ccw
2025-03-03 17:37:46 +01:00
Ruoqing He
2ecb2fe519 runtime-rs: Enable Dragonball for x86_64 & aarch64
`USE_BUILDIN_DB` is turned on by default for architectures do not
support `Dragonball`, which leads `s390x` is building `runtime-rs` with
`--features dragonball` presents.

Let's restrict `USE_BUILDIN_DB` to be enable only for architectures
supported by `Dragonball` (namely x86_64 and aarch64 as of now).

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-03-03 12:10:58 +08:00
stevenhorsman
c69509be1c metrics: Reduce repeats for boot time tests on qemu
On qemu the run seems to error after ~4-7 runs, so try
a cut down version of repetitions to see if this helps us
get results in a stable way.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:42:00 +00:00
stevenhorsman
0962cd95bc metrics: Increase minpercent range for qemu iperf test
We have a new metrics machine and environment
and the iperf jitter result failed as it finished too quickly,
so increase the minpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:32:26 +00:00
stevenhorsman
ef0e8669fb metrics: Increase minpercent range for clh tests
We have a new metrics machine and environment
and the fio write.bw and iperf3 parallel.Results
tests failed for clh, as below
the minimum range, so increase the
minpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-02 08:32:26 +00:00
stevenhorsman
f81c85e73d metrics: Increase maxpercent range for clh boot times
We have a new metrics machine and environment
and the boot time test failed for clh, so increase the
maxpercent to try and get it stable

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
435ee86fdd metrics: Update iperf affinity
The iperf deployment is quite a lot out of date
and uses `master` for it's affinity and toleration,
so update this to control-plane, so it can run on
newer Kubernetes clusters

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
85bbc0e969 metrics: Increase wait time
The new metrics runner seems slower, so we are
seeing errors like:
The iperf3 tests are failing with:
```
pod rejected: RuntimeClass "kata" not found
```
so give more time for it to succeed

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
4ce94c2d1b Revert "metrics: Add init_env function to latency test"
This reverts commit 9ac29b8d38.
to remove the duplicate `init_env` call

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
658a5e032b metrics: Increase containerd start timeout
- Move `kill_kata_components` from common.bash
into the metrics code base as the only user of it
- Increase the timeout on the start of containerd as
the last 10 nightlies metric tests have failed with:
```
223478 Killed                  sudo timeout -s SIGKILL "${TIMEOUT}" systemctl start containerd
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
3fab7944a3 workflows: Improve metrics jobs
- As the metrics tests are largely independent
then allow subsequent tests to run even if previous
ones failed. The results might not be perfect if
clean-up is required, but we can work on that later.
- Move the test results check out of the latency
test that seems arbitrary and into it's own job step
- Add timeouts to steps that might fail/hang if there
are containerd/K8s issues

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
stevenhorsman
6f918d71f5 workflows: Update metrics jobs
Currently the run-metrics job runs a manual install
and does this in a separate job before the metrics
tests run. This doesn't make sense as if we have multiple
CI runs in parallel (like we often do), there is a high chance
that the setup for another PR runs between the metrics
setup and the runs, meaning it's not testing the correct
version of code. We want to remove this from happening,
so install (and delete to cleanup) kata as part of the metrics
test jobs.

Also switch to kata-deploy rather than manual install for
simplicity and in order to test what we recommend to users.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-01 17:50:05 +00:00
Zvonko Kaiser
3f13023f5f
Merge pull request #10870 from zvonkok/module-signing
gpu: add module signing
2025-03-01 09:51:24 -05:00
Zvonko Kaiser
d971e13446 gpu: Update rootfs.sh
Only source NV scripts if variant starts with "nvidia-gpu"

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-01 02:08:29 +00:00
Fabiano Fidêncio
4018079b55
Merge pull request #10960 from fidencio/topic/kata-deploy-fix-k0s-deployment
kata-deploy: k0s: Fix drop-in path
2025-02-28 18:49:46 +01:00
Zvonko Kaiser
94579517d4 shellcheck: Update nvidia_rootfs.sh
With the new rules we need more updates.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-28 16:36:05 +00:00
Zvonko Kaiser
af1d6c2407 shecllcheck: Update nvidia_chroot.sh
Make shellcheck happy with the new rules new updates
needed

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-28 16:27:51 +00:00
Fabiano Fidêncio
c95f9885ea kata-deploy: k0s: Fix drop-in path
The drop-in path should be /etc/containerd (from the containers'
perspective), which mounts to the host path /etc/k0s/containerd.d.

With what we had we ended up dropping the file under the
/etc/k0s/containerd.d/containerd.d/, which is wrong.

This is a regression introduce by: 94b3348d3c

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-02-28 16:32:00 +01:00