Commit Graph

15709 Commits

Author SHA1 Message Date
Zvonko Kaiser
ca4d227562 gpu: Add qemu-tdx-experimental build
We need to introduce again the qemu-tdx build for the GPU

Depends-on: #10867

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-19 14:48:56 +00:00
Hyounggyu Choi
a8363c28ca tests: Support systemd unit files in /usr/lib as well as /lib
On Ubuntu 24.04, due to the /usr merge, system-provided unit files
now reside in `/usr/lib/systemd/system/` instead of `/lib/systemd/system/`.
For example, the command below now returns a different path:

```
$ systemctl show containerd.service -p FragmentPath
/usr/lib/systemd/system/containerd.service
```

Previously, on Ubuntu 22.04 and earlier, it returned:

```
/lib/systemd/system/containerd.service
```

The current pattern `if [[ $unit_file == /lib* ]]` fails to match the new path.
To ensure compatibility across versions, we update the pattern to match both
`/lib` and `/usr/lib` like:

```
if [[ $unit_file =~ ^/(usr/)?lib/ ]]
```

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-02-19 14:34:59 +01:00
Zvonko Kaiser
0d786577c6
Merge pull request #10867 from zvonkok/qemu-snp-tdx-experimental
gpu: QEMU SNP+TDX experimental updates
2025-02-19 08:26:37 -05:00
Ruoqing He
a8a096b20c dragonball: Centralize RustVMM crates
Centralize all RustVMM crates to workspace.dependencies to prevent
having multiple versions of each RustVMM crate, which is error-prone and
inconsistent. With this setup, updates on RustVMM crates would be much
easier.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-19 21:20:30 +08:00
Ruoqing He
b129972e12 dragonball: Setup workspace
Setup workspace in dragonball, move `dbs` crates one level up to be
managed as members of dragonball workspace.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-19 21:20:30 +08:00
Ruoqing He
a174e2be03 dragonball: Appease clippy introduced by 1.80.0
New clippy warnings show up after Rust Tool Chain bumped from 1.75.0 to
1.80.0, fix accrodingly.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-19 21:20:30 +08:00
Ruoqing He
6bb193bbc0 spell: Update dictionary for dbs crates
Add entries for dbs_* crates' README.md to pass `kata-spell-check.sh`
spell checking.

Changed British terms to American terms in README of `dbs_pci` to pass
`hunspell` check.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-19 21:20:30 +08:00
Zvonko Kaiser
73b7a3478c
Merge pull request #10893 from RuoqingHe/fix-static-check
ci: Fix spell_check and improve header_check
2025-02-19 08:08:40 -05:00
Mikko Ylinen
926119040c packaging: make install_oras.sh to run curl without sudo
sudo hides the environment variables that are sometimes
useful with the builds (for example: proxy settings).

While install_oras.sh could run completely without sudo in
the container it's COPY'd to, make minimal changes to it
to keep it functional outside the container too while still
addressing the problem of 'sudo curl' not working with proxy
env variables.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-02-19 09:34:13 +02:00
Mikko Ylinen
0d8242aee4 agent: rename cargo config
To mitigate:

warning: `.../kata-containers/src/agent/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-02-19 09:34:13 +02:00
Fabiano Fidêncio
c8db24468c
Merge pull request #10894 from BbolroC/use-multi-arch-for-qemu-sample
example: Use multi-arch image for test-deploy-kata-qemu.yaml
2025-02-18 23:43:52 +01:00
Dan Mihai
672462e6b8
Merge pull request #10895 from katexochen/p/agent-deps
agent: make policy feature optional again
2025-02-18 13:27:23 -08:00
Dan Mihai
6b389fdd4f
Merge pull request #10896 from katexochen/p/oci-client-genplicy
genpolicy: bump oci-distribution to v0.12.0
2025-02-18 12:42:23 -08:00
Markus Rudy
67fbad5f37 genpolicy: bump oci-distribution to v0.12.0
This picks up a security fix for confidential pulling of unsigned
images.

The crate moved permanently to oci-client, which required a few import
changes.

Co-authored-by: Paul Meyer <katexochen0@gmail.com>
Signed-off-by: Markus Rudy <mr@edgeless.systems>
2025-02-18 16:32:00 +01:00
Ruoqing He
d23284a0dc header_check: Check header for changed text files
We are running `header_check` for non-text files like binary files,
symbolic link files, image files (pictures) and etc., which does not
make sense.

Filter out non-text files and run `header_check` only for text files
changed.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-18 22:39:53 +08:00
Paul Meyer
80af09aae9 agent: make policy feature optional again
This was messed up a little when factoring out the policy crate.
Removing the dependencies no longer used by the agent and making the
import of kata-agent-policy optional again.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-02-18 15:28:06 +01:00
Hyounggyu Choi
4646058c0c example: Use multi-arch image for test-deploy-kata-qemu.yaml
An image `registry.k8s.io/hpa-example` only supports amd64.
Let's use a multi-arch image `quay.io/prometheus/prometheus`
for the QEMU example instead.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-02-18 14:23:09 +01:00
Ruoqing He
7e49e83779 spell: Add missing entries for kata-spell-check
`kata-dictionary.dic` changes after running `kata-spell-check.sh
make-dict`. This is due to someone forgot to first update entries in
data and run `make-dict`, but directly updated `kata-dictionary.dic`
instead.

Add mssing entries to data and re-run `make-dict` to generate correct
`kata-dictionary.dic`.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-18 19:06:34 +08:00
Lukáš Doktor
d0ef78d3a4
ci: Change the way we modify runtimeclass in webhook
previously we used to deploy the webhook and then modified the cm from
our ci/openshift-ci/ script to the desired value, but sometimes it
happens that the webhook pod starts before we modify the cm and keeps
using the default value.

Let's change the approach and modify the deployments in-place. The only
cons is it leaves the git dirty, but since this script is only supposed
to be used in ci it should be safe.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2025-02-18 11:39:22 +01:00
Anastassios Nanos
1e6cea24c8
Merge pull request #10890 from zvonkok/arm64-fix-release
release: Remove artifacts for release
2025-02-17 22:29:23 +02:00
Zvonko Kaiser
1d9915147d release: Remove artifacts for release
We need to make sure the release does not have any residual binaries
left for the release payload

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-17 20:16:48 +00:00
Anastassios Nanos
ae1be28ddd
Merge pull request #10880 from nubificus/3.14.0-release
release: Bump version to 3.14.0
2025-02-17 20:25:30 +02:00
Zvonko Kaiser
72833cb00b
Merge pull request #10878 from zvonkok/agent_cdi_timeout
gpu: agent cdi timeout
2025-02-17 12:49:51 -05:00
Zvonko Kaiser
fda095a4c9
Merge pull request #10786 from zvonkok/gpu-config-update
gpu: Update config files
2025-02-17 12:45:54 -05:00
Anastassios Nanos
c7347cb76d release: Bump version to 3.14.0
Bump VERSION and helm-chart versions

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2025-02-17 16:47:24 +00:00
Fabiano Fidêncio
639bc84329
Merge pull request #10787 from fidencio/topic/bump-kernel-to-6.12.11
version: Bump kernel to 6.12.13
2025-02-17 17:39:14 +01:00
Fabiano Fidêncio
7ae5fa463e versions: Bump coco-guest-components
So attestation-agent and others have a version including the ttrpc bump
to v0.8.4, allowing us to use the latest LTS kernel.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-02-17 15:16:54 +01:00
Fabiano Fidêncio
1381cab6f0 build: Fix rootfs cache logic
We've been appending to the wrong variable for quite some time, it
seems, leading to not actually regenerating the rootfs when needed.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-02-17 13:55:36 +01:00
Fabiano Fidêncio
7fc7328bbc versions: Bump kernel to 6.12.13
Let's try to keep up with the LTS patch releases.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-02-17 13:47:35 +01:00
Simon Kaegi
f5edbfd696 kernel: support loop device in v6.8+ kernels
Set CONFIG_BLK_DEV_WRITE_MOUNTED=y to restore previous kernel behaviour.

Kernel v6.8+ will by default block buffer writes to block devices mounted by filesystems.
This unfortunately is what we need to use mounted loop devices needed by some teams
to build OSIs and as an overlay backing store.

More info on this config item [here](https://cateee.net/lkddb/web-lkddb/BLK_DEV_WRITE_MOUNTED.html)

Fixes: #10808

Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
2025-02-17 13:47:35 +01:00
Fabiano Fidêncio
d96e8375c4
Merge pull request #10885 from stevenhorsman/bump-agent-crates-to-resolve-CVEs
agent: Bump agent crates to resolve CVEs
2025-02-17 12:11:43 +01:00
stevenhorsman
e5a284474d deps: Update cookie-store & publicsuffix
Run:
```
cargo update -p cookie-store
cargo update -p publicsuffix
```
to update the version of idna and resolve CVE-2024-12224

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-14 17:30:03 +00:00
stevenhorsman
5656fc6139 deps: Bump reqwest
Bump reqwest to 0.12.12 to pick up fixes

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-14 17:30:03 +00:00
stevenhorsman
3a3849efff deps: Update quinn-proto
Update quin-proto to fix CVE-2024-45311

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-14 17:30:03 +00:00
Fabiano Fidêncio
64ceb0832a
Merge pull request #10851 from fidencio/topic/bump-image-rs-to-bring-in-ttrpc-0.8.4
agent: Bump image-rs to 514c561d93
2025-02-14 18:21:56 +01:00
Fabiano Fidêncio
d5878437a4
Merge pull request #10845 from DataDog/dind-subcgroup-fix
Add process to init subcgroup when we're using dind with cgroups v2
2025-02-14 18:12:24 +01:00
Steve Horsman
469c651fc0
Merge pull request #10879 from nubificus/fix_version
packaging(release): Properly handle version tag for the release bundle
2025-02-14 14:40:37 +00:00
Zvonko Kaiser
908aacfa78 gpu: Update the logging around CDI
Removed a rogue printf and updated the logging to say
that we're waiting for CDI spec(s) to be generated rather
than saying there is an error, it's not we have a timeout
after that it is an error.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:32:00 +00:00
Zvonko Kaiser
4bda16565b gpu: Update timeouts
With the create_container_timeout the dial_timeout is lest important.
Add the custom timeout for GPUs in create_container_timeout

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:29:18 +00:00
Zvonko Kaiser
66ccc25724 tdx: Update GPU config for the latest TDX stack
We need extra kernel_params for TDX

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:29:18 +00:00
Zvonko Kaiser
d4dd87a974 gpu: Update config files
With the recent changed to cgroupsv1 and AGENT_INIT=no we
need update to the config files.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:29:18 +00:00
Anastassios Nanos
b13db29aaa packaging(release): Properly handle version tag for the release bundle
The tags created automatically for published Github releases
are probably not annotated, so by simply running `git describe` we are
not getting the correct tag. Use a `git describe --tags` to allow git
to look at all tags, not just annotated ones.

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2025-02-14 12:41:08 +00:00
Zvonko Kaiser
2499d013bd gpu: Update handle_cdi_devices
AgentConfig now has the cdi_timeout from the kernel
cmdline, update the proper function signature and use
it in the for loop.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-13 20:11:48 +00:00
Zvonko Kaiser
d28410ed75
Merge pull request #10877 from AdithyaKrishnan/main
CI: Deprecate SEV
2025-02-13 14:55:11 -05:00
Zvonko Kaiser
95aa21f018 gpu: Add CDI timeout via kernel config
Some systems like a DGX where we have 8 H100 or 8 H800 GPUs
need some extended time to be initialized. We need to make
sure we can configure CDI timeout, to enable even systems with 16 GPUs.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-13 19:23:19 +00:00
Adithya Krishnan Kannan
6cc5b79507 CI: Deprecate SEV
Phase 1 of Issue #10840
AMD has deprecated SEV support on
Kata Containers, and going forward,
SNP will be the only AMD feature
supported. As a first step in this
deprecation process, we are removing
the SEV CI workflow from the test suite
to unblock the CI.

Will be adding future commits to
remove redundant SEV code paths.

Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
2025-02-13 12:20:21 -06:00
Steve Horsman
0a39f59a9b
Merge pull request #10874 from stevenhorsman/skip-consistently-failing-block-volume-test
tests: Skip block volume test on fc, stratovirt
2025-02-13 15:39:45 +00:00
Zvonko Kaiser
a0766986e7
Merge pull request #10832 from RuoqingHe/update-yq
ci: Update yq to v4.44.5 to support riscv64
2025-02-13 08:33:02 -05:00
stevenhorsman
56fb2a9482 tests: Skip block volume test on fc, stratovirt
The block volume test has failed on 10/10 nightlies
and all the PRs I've seen, so skip it until it can be assessed.

See #10873

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-13 11:50:35 +00:00
stevenhorsman
2d266df846 test: Update expected error in signed image tests
We are seeing a different error in the new version of image-rs,
so update our tests to match.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-13 11:44:51 +00:00