Commit Graph

15384 Commits

Author SHA1 Message Date
Zvonko Kaiser
91c6d524f8 gpu: Fix arm64 kernel build
CONFIG_IOASID (not configurable) in newer kernels.
Removing it.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-22 18:15:57 +00:00
Fabiano Fidêncio
6baa60d77d
Merge pull request #10775 from fidencio/topic/update-ttrpc-crate
agent: Update ttrpc to include the fix for connectivity issues
2025-01-22 17:45:38 +01:00
stevenhorsman
ab27e11d31 workflows: Switch to github-hosted arm runner
Now that gituhb have hosted arm runners
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/
we should try and switch our arm64 builder jobs to
run on these.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-22 16:27:17 +00:00
Greg Kurz
90b6d5725b
Merge pull request #10773 from RuoqingHe/retry-on-aks-throttle
ci: Retry on failure of Create AKS cluster
2025-01-22 15:30:57 +01:00
Ruoqing He
373a388844 ci: Retry on failure of Create AKS cluster
The `Create AKS cluster` step in `run-k8s-tests-on-aks.yaml` is likely
to fail fail since we are trying to issue `PUT` to `aks` in a relatively
high frequency, while the `aks` end has it's limit on `bucket-size` and
`refill-rate`, documented here [1].

Use `nick-fields/retry@v3` to retry in 10 seconds after request fail,
based on observations that AKS were request 7, or 8 second delays
before retry as part of their 429 response

[1] https://learn.microsoft.com/en-us/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apis

Fixes: #10772

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-22 13:24:51 +00:00
Fabiano Fidêncio
a8678a7794 deps: Update ttrpc to v0.8.4
Update the ttrpc crate to include the fix from Moritz Sanft, which
solves the connectivity issues with 6.12.x kernels*

*: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.12.9&id=3257813a3ae7462ac5cde04e120806f0c0776850

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-01-22 13:05:43 +01:00
Fabiano Fidêncio
e71bc1f068
Merge pull request #10770 from zvonkok/gpu_kernel_dep
gpu: Add kernel dep for the non coco use-case
2025-01-22 12:53:39 +01:00
Greg Kurz
17d053f4bb
Merge pull request #10711 from teawater/balloon
Add reclaim_guest_freed_memory config to qemu and cloud-hypervisor
2025-01-22 10:57:13 +01:00
Hui Zhu
c148b70da7 libs/kata-types: Remove config enable_swap
Remove config enable_swap because there is no code use it.

Fixes: #10761

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2025-01-22 11:08:45 +08:00
Aurélien Bombo
4e9d1363b3
Merge pull request #10754 from sprt/sprt/ci-gh-pr-number-coco
ci: Unify on `$GH_PR_NUMBER` environment variable
2025-01-21 15:07:24 -06:00
Zvonko Kaiser
4621f53e4a gpu: Add kernel dep for the non coco use-case
Add the kernel dependency to the non coco use-case
so that a rootfs build can be executed via GHA.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-21 16:18:14 +00:00
Zvonko Kaiser
61c282c725
Merge pull request #10769 from kata-containers/revert-10764-gpu_ci_cd
Revert "gpu: Add rootfs target amd64/arm64"
2025-01-21 11:09:52 -05:00
Zvonko Kaiser
9fd430e46b Revert "gpu: Add rootfs target amd64/arm64"
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-21 16:08:30 +00:00
Zvonko Kaiser
ef1639b6bf
Merge pull request #10764 from zvonkok/gpu_ci_cd
gpu: Add rootfs target amd64/arm64
2025-01-21 09:51:20 -05:00
Ruoqing He
7e76ef587a virtiofsd: Enable build for RISC-V
With this change, `virtiofsd` (gnu target) could be built and then to be
used with other components.

Depends: #10741
Fixes: #10739

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-01-21 18:05:37 +08:00
Hui Zhu
185b94b7fa runtime-rs: Add reclaim_guest_freed_memory cloud-hypervisor support
Add reclaim_guest_freed_memory config to cloud-hypervisor in runtime-rs.

Fixes: #10710

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2025-01-21 10:34:21 +08:00
Hui Zhu
487171d992 runtime-rs: Add reclaim_guest_freed_memory qemu support
Add reclaim_guest_freed_memory config to qemu in runtime-rs.

Fixes: #10710

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2025-01-21 10:34:18 +08:00
Hui Zhu
8f550de88a runtime-rs: db: Change config enable_balloon_f_reporting
Change config enable_balloon_f_reporting of db to
reclaim_guest_freed_memory.

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2025-01-21 10:34:08 +08:00
Hui Zhu
42f5ef9ff1 kernel: config: Add CONFIG_VIRTIO_BALLOON to virtio.conf
Add CONFIG_VIRTIO_BALLOON to virtio.conf to open virtio-balloon.

Fixes: #10710

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2025-01-21 10:34:04 +08:00
Zvonko Kaiser
8b097244e7 gpu: Add rootfs initrd build for arm64
We need the arm64 builds as well for GH and GB systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-20 19:03:52 +00:00
Zvonko Kaiser
f525631522 gpu: Add rootfs target amd64
Adding the initrd build first to get the rootfs on amd64.
With that we can start to add tests.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-20 19:01:42 +00:00
Zvonko Kaiser
d7059e9024
Merge pull request #10736 from zvonkok/gpu-rootfs-fix
gpu: Fix rootfs build
2025-01-17 14:44:41 -05:00
Aurélien Bombo
0d70dc31c1 ci: Unify on $GH_PR_NUMBER environment variable
While working on #10559, I realized that some parts of the codebase use
$GH_PR_NUMBER, while other parts use $PR_NUMBER.

Notably, in that PR, since I used $GH_PR_NUMBER for CoCo non-TEE tests
without realizing that TEE tests use $PR_NUMBER, the tests on that PR
fail on TEEs:

https://github.com/kata-containers/kata-containers/actions/runs/12818127344/job/35744760351?pr=10559#step:10:45

  ...
  44      error: error parsing STDIN: error converting YAML to JSON: yaml: line 90: mapping values are not allowed in this context
  ...
  135               image: ghcr.io/kata-containers/csi-kata-directvolume:
  ...

So let's unify on $GH_PR_NUMBER so that this issue doesn't repro in the
future: I replaced all instances of PR_NUMBER with GH_PR_NUMBER.

Note that since some test scripts also refer to that variable, the CI
for this PR will fail (would have also happened with the converse
substitution), hence I'm not adding the ok-to-test label and we should
force-merge this after review.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-01-17 10:53:08 -06:00
Fabiano Fidêncio
c018a1cc61
Merge pull request #10741 from RuoqingHe/update-virtiofsd-build-image
virtiofsd: Update ubuntu to 22.04 for gnu target
2025-01-16 20:51:10 +01:00
Zvonko Kaiser
2777b13db7
Merge pull request #10742 from zvonkok/3.13.0-release
release: Bump version to 3.13.0
2025-01-16 10:05:48 -05:00
Ruoqing He
c70195d629 virtiofsd: Update ubuntu to 22.04 for gnu target
With ubuntu 20.04 image, virtiofsd gnu target couldn't be built due to
"unsupported ISA subset z" reported by "cc".

Updating to ubuntu 22.04 image addresses this problem.

Relates: #10739

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-01-16 17:27:38 +08:00
Zvonko Kaiser
e82fdee20f runtime: Add proper IOMMUFD parsing
With newer kernels we have a new backend for VFIO
called IOMMUFD this is a departure from VFIO IOMMU Groups
since it has only one device associated with an IOMMUFD entry.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-15 23:39:33 +00:00
Zvonko Kaiser
f0bd83b073 gpu: Fix rootfs build
The pyinstaller is located per default under /usr/local/bin
some prior versions were installing it to ${HOME}.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-15 20:37:51 +00:00
Aurélien Bombo
0d93f59f5b
Merge pull request #10738 from microsoft/danmihai1/empty-pty-lines
runtime: skip empty Guest console output lines
2025-01-15 10:33:24 -06:00
Zvonko Kaiser
0b04f43ac6 release: Bump version to 3.13.0
Bump VERSION and helm-chart versions

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-15 16:13:22 +00:00
Zvonko Kaiser
365def9b4a
Merge pull request #10735 from BbolroC/kubectl-create-retry-trusted-storage
tests: Introduce retry_kubectl_apply() for trusted storage
2025-01-14 21:59:45 -05:00
Dan Mihai
2e21f51375 runtime: skip empty Guest console output lines
Skip logging empty lines of text from the Guest console output, if
there are any such lines.

Without this change, the Guest console log from CLH + /dev/pts/0 has
twice as many lines of text. Half of these lines are empty.

Fixes: #10737

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-01-15 00:28:26 +00:00
Hyounggyu Choi
f7816e9206 tests: Introduce retry_kubectl_apply() for trusted storage
On s390x, some tests for trusted storage occasionally failed due to:

```bash
etcdserver: request timed out
```

or

```bash
Internal error occurred: resource quota evaluation timed out
```

These timeouts were not observed previously on k3s but occur
sporadically on kubeadm. Importantly, they appear to be temporary
and transient, which means they can be ignored in most cases.

To address this, we introduced a new wrapper function, `retry_kubectl_apply()`,
for `kubectl create`. This function retries applying a given manifest up to 5
times if it fails due to a timeout. However, it will still catch and handle
any other errors during pod creation.

Fixes: #10651

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-01-14 21:15:44 +01:00
Fabiano Fidêncio
121ac0c5c0
Merge pull request #10727 from microsoft/danmihai1/mariner3-guest
image: bump mariner guest version to 3.0
2025-01-14 19:06:28 +01:00
Fabiano Fidêncio
3658ea2320
Merge pull request #10731 from microsoft/danmihai1/quiet-rootfs-build
rootfs: reduced console output by default
2025-01-14 19:02:42 +01:00
Chengyu Zhu
7d34ca4420
Merge pull request #10674 from bpradipt/fix-10398
agent: alternative implementation for sealed_secret as volume
2025-01-14 18:55:45 +08:00
Fabiano Fidêncio
4578969c5d
Merge pull request #10730 from BbolroC/bump-coco-trustee
versions: Bump trustee to latest
2025-01-14 08:56:11 +01:00
Dan Mihai
c4da296326 rootfs: delete links to deleted files
Delete symbolic links to files being deleted.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-01-13 21:28:44 +00:00
Dan Mihai
5b8471ffce rootfs: print the path to files being deleted
Show the list of files being deleted.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-01-13 21:28:34 +00:00
Dan Mihai
a49d0fb343 rootfs: delete systemd units/files from rootfs.sh
Move the deletion of unnecessary systemd units and files from
image_builder.sh into rootfs.sh.

The files being deleted can be applicable to other image file formats
too, not just to the rootfs-image format created by image_builder.sh.

Also, image_builder.sh was deleting these files *after* it calculated
the size of the rootfs files, thus missing out on the opportunity to
possibly create a smaller image file.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-01-13 21:28:23 +00:00
Dan Mihai
0f522c09d9 rootfs: reduced console output by default
Use "set -x" only when the user specified DEBUG=1.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-01-13 19:34:05 +00:00
Pradipta Banerjee
36580bb642 tests: Update sealed secret CI value to base64url
The existing encoding was base64 and it fails due to
874948638a

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2025-01-13 09:37:05 -05:00
Hyounggyu Choi
2cdb549a75 versions: Bump trustee to latest
This update addresses an issue with token verification for SE and SNP
introduced in the last update by #10541.
Bumping the project to the latest commit resolves the issue.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-01-13 15:07:33 +01:00
Pradipta Banerjee
5218345e34 agent: alternative implementation for sealed_secret as volume
The earlier implementation relied on using a specific mount-path prefix - `/sealed`
to determine that the referenced secret is a sealed secret.
However that was restrictive for certain use cases as it forced
the user to always use a specific mountpath naming convention.

This commit introduces an alternative implementation to relax the
restriction. A sealed secret can be mounted in any mount-path.
However it comes with a potential performance penality. The
implementation loops through all volume mounts and reads the file
to determine if it's a sealed secret or not.

Fixes: #10398

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2025-01-11 12:36:44 -05:00
Dan Mihai
4707883b40 image: bump mariner guest version to 3.0
Use Mariner 3.0 (a.k.a., Azure Linux 3.0) as the Guest CI image.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-01-11 17:36:19 +00:00
Fabiano Fidêncio
2d9baf899a
Merge pull request #10719 from msanft/msanft/runtime/fix-boolean-opts
runtime: use actual booleans for QMP `device_add` boolean options
2025-01-11 16:38:06 +01:00
Zvonko Kaiser
f08a9eac11
Merge pull request #10721 from stevenhorsman/more-metrics-latency-minimum-range-fixes
metrics: Increase latency test range
2025-01-10 21:59:39 -05:00
Moritz Sanft
e5735b221c
runtime: use actual booleans for QMP device_add boolean options
Since
be93fd5372,
which is included in QEMU since version 9.2.0, the options for the
`device_add` QMP command need to be typed correctly.

This makes it so that instead of `"on"`, the value is set to `true`,
matching QEMU's expectations.

This has been tested on QEMU 9.2.0 and QEMU 9.1.2, so before and after
the change.

The compatibility with incorrectly typed options  for the `device_add`
command is deprecated since version 6.2.0 [^1].

[^1]:  https://qemu-project.gitlab.io/qemu/about/deprecated.html#incorrectly-typed-device-add-arguments-since-6-2

Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2025-01-10 11:53:56 +01:00
Wainer Moschetta
5fae2a9f91
Merge pull request #9871 from wainersm/fix-print_cluster_name
tests/gha-run-k8s-common: shorten AKS cluster name
2025-01-09 14:35:02 -03:00
stevenhorsman
aaae5b6d0f metrics: clh: Increase network-iperf3 range
We hit a failure with:
```
time="2025-01-09T09:55:58Z" level=warning msg="Failed Minval (0.017600 > 0.015000) for [network-iperf3]"
```
The range is very big, but in the last 3 test runs I reviewed we have got a minimum value of 0.015s
and a max value of 0.052, so there is a ~350% difference possible
so I think we need to have a wide range to make this stable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-01-09 11:25:57 +00:00