Compare commits


186 Commits

Author SHA1 Message Date
Dan Mihai
1901c9b841 DO NOT MERGE: CI test
Test of the ci-devel pipeline
2025-12-19 09:56:15 +00:00
Alex Lyn
b85084f046 Merge pull request #12266 from BbolroC/fix-selective-skip-for-empty-dir-test
tests: remove re-declared local variable in k8s-empty-dirs.bats
2025-12-19 17:30:07 +08:00
Hyounggyu Choi
3fa1d93f85 tests: remove re-declared local variable in k8s-empty-dirs.bats
Since #12204 was merged, the following error has been observed:

```
bats warning: Executed 1 instead of expected 2 tests
[run_kubernetes_tests.sh:162] ERROR: Tests FAILED from suites: k8s-empty-dirs.bats
```

The cause is that `pod_logs_file` is re-declared as a local variable
in the second test before skipping, which makes it inaccessible
in `teardown()` and leads to an error.

This commit removes the re-declaration of the variable.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-18 18:57:16 +01:00
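The failure mode described in that commit is plain bash scoping, reproducible outside bats. A minimal sketch (a simplified stand-in, not the actual test file):

```shell
#!/usr/bin/env bash
# Sketch of the bug: re-declaring a variable as `local` inside a test
# function hides the assignment from the teardown function that runs later.

pod_logs_file=""

buggy_test() {
    local pod_logs_file          # re-declaration: shadows the global
    pod_logs_file="/tmp/pod.log" # assigns only to the function-local copy
}

fixed_test() {
    pod_logs_file="/tmp/pod.log" # assigns to the global as intended
}

teardown() {
    # Cleanup only works if the global still holds the path.
    echo "teardown sees: '${pod_logs_file}'"
}

buggy_test
teardown    # prints: teardown sees: ''
fixed_test
teardown    # prints: teardown sees: '/tmp/pod.log'
```

Removing the `local` re-declaration, as the commit does, is the whole fix: the assignment then lands on the global that `teardown()` reads.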
Fabiano Fidêncio
51e9b7e9d1 nydus-snapshotter: Bump to v0.15.10
It brings a fix that can most likely work around the containerd /
nydus-snapshotter database desynchronization.

Reference: https://github.com/containerd/nydus-snapshotter/pull/700

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 18:41:09 +01:00
Fabiano Fidêncio
03297edd3a kata-deploy: rust: Add list verb for runtimeclasses RBAC
The Rust kata-deploy binary calls list_runtimeclasses() during NFD
setup, but the ClusterRole only granted get and patch permissions.

Add the list verb to the runtimeclasses resource permissions to fix
the RBAC error:
  runtimeclasses.node.k8s.io is forbidden: User
  \"system:serviceaccount:kube-system:kata-deploy-sa\" cannot list
  resource \"runtimeclasses\" in API group \"node.k8s.io\" at the
  cluster scope

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 18:31:52 +01:00
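The fix boils down to adding `list` to the ClusterRole rule for runtimeclasses. A hedged sketch of the relevant rule (the API group and resource come from the quoted error message; the metadata name is illustrative):

```yaml
# Illustrative ClusterRole fragment; only the verbs line is the actual fix.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kata-deploy  # illustrative name
rules:
  - apiGroups: ["node.k8s.io"]
    resources: ["runtimeclasses"]
    verbs: ["get", "patch", "list"]  # "list" added for list_runtimeclasses()
```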
Manuel Huber
78c41b61f4 tests: nvidia: Update images, probes and timeouts
Changes in NIM/RAG samples:
- update image references
- update memory requirements, timeouts, model name
- sanitize some of the probes and print-out

Further refinements can be made in the future.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-18 10:57:14 +01:00
Manuel Huber
0373428de4 tests: nvidia: Use secret for NGC API key
This is a slight change in the manifest to at least use a secret
for the environment variable.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-18 10:57:14 +01:00
Hyounggyu Choi
56ec8d7788 Merge pull request #12204 from kata-containers/runtime-rs-stability-debug
CI: Upgrade log details for improved error analysis
2025-12-18 10:54:54 +01:00
Alex Lyn
c7dfdf71f5 Merge pull request #11935 from burgerdev/fsgroup
genpolicy: support fsGroup setting in pod security context
2025-12-18 16:47:48 +08:00
Xuewei Niu
a65c2b06b8 Merge pull request #12169 from zhangls-0524/new-fix-issue-11996
runtime-rs: Block Device Rootfs Mount Options Lost During Storage Object Creation
2025-12-18 10:09:38 +08:00
Fabiano Fidêncio
0e534fa7fe versions: Update virtiofsd to v1.13.3
Update virtiofsd to its latest release.

Here we also need to update the alpine version used by the builder, as we
need a version of musl-dev new enough to have wrappers for preadv2 and
pwritev2. While bumping, bump to the latest release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Fabiano Fidêncio
1d2e19b07c versions: Update pause image to 3.10.1
Update pause image to its latest release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Fabiano Fidêncio
6211c10904 versions: Update libseccomp to 2.6.0
Update libseccomp to its latest release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Fabiano Fidêncio
0e0a92533c versions: update lvm2 to v2_03_38
Update lvm2 to its latest release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Fabiano Fidêncio
142c7d6522 versions: Update gperf to 3.3
Update gperf to its latest release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Fabiano Fidêncio
e757485853 versions: Update cryptsetup to v2.8.1
Update cryptsetup to its latest release

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Fabiano Fidêncio
35cd5fb1d4 versions: Update helm to v4.0.4
Update helm to its latest release

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-18 00:51:08 +01:00
Tobin Feldman-Fitzthum
decc09e975 tests: cc: add test with SNP reference values
Add two attestation tests. The first one sets a resource policy that
requires CPU0 to have an affirming trust level. This is a negative test
which can run on any platform. Setting this policy without setting any
reference values should result in an attestation failure.

Next, a second test will set the same policy, but this time it will use
the journal log to find the QEMU command line from the previous test and
calculate the expected reference values. Currently this is only
supported on SNP using the sev-snp-measure tool, but the same flow
should work on other platforms.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2025-12-18 00:12:11 +01:00
Ruoqing He
8b0d650081 dragonball: Use unique name for vhost path
The five tests are set to the same vhost socket path, which could lead
to racing with one another. Use unique names to avoid this.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-17 22:25:55 +01:00
Fabiano Fidêncio
320f1ce2a3 versions: Bump experimental {tdx,snp} QEMU
Let's bump the experimental {tdx,snp} QEMU to the tags created today in
the Confidential Containers repo, which match QEMU 10.2.0-rc3.

This bump is mostly for early testing of what will become 10.2.0, which
will then be bumped everywhere.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 17:42:04 +01:00
Alex Lyn
3696d9143a tests: Correct the teardown_common in cpu-ns.bats
This addresses the issue:
"# bats warning: Executed 0 instead of expected 1 tests"

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
a28f24ef8c tests: move the get_pod_config_dir into setup_common
As each test case needs the get_pod_config_dir preparation, a better
approach is to move it directly into the setup_common method.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
5778b0a001 tests: Introduce measure_node_time to get test case end time
To measure the journal duration, we need to clearly print the journal
start time and end time for each case, which helps ensure the journal
log covers the specified period for the case.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
648f0913ca tests: Load lib.rs in bats to ensure related function available
The lib.rs should be loaded first, before executing the related function calls.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
0929c84480 runtime-rs: Reduce output log and increase log level
For failure cases within CI, we need to dump the kata log to help
address issues, but currently the large volume of log messages means
we can only see a partial log.

We remove the initdata log output and increase the log level to
reduce log output.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
bbec15d695 tests: delete policy_settings_dir only for first test case
Currently policy_settings_dir is created only when
BATS_TEST_NUMBER == "1",
but delete_tmp_policy_settings_dir "${policy_settings_dir}" is
called in teardown() for every test. This means that for tests
after the first one teardown() may attempt to delete a directory
that was already removed by a previous test, or rely on a value
that does not belong to the current test execution.

Adjust teardown logic so that policy_settings_dir is only deleted
for the first test case (BATS_TEST_NUMBER == "1") and ignored for
subsequent tests. This keeps the original optimization of running
genpolicy only once, while avoiding unnecessary or confusing cleanup
attempts in later test cases.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
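The adjusted teardown logic described above can be sketched as a small guard on the bats test number; the helper names below are illustrative stand-ins, not the actual test library API:

```shell
#!/usr/bin/env bash
# Sketch: only the first test case (BATS_TEST_NUMBER == "1") deletes the
# shared policy settings dir; later test cases skip the cleanup entirely,
# so they never touch a directory a previous test already removed.

delete_tmp_policy_settings_dir() {
    rm -rf "$1"
}

teardown_policy_dir() {
    local test_number="$1" dir="$2"
    if [[ "${test_number}" == "1" ]]; then
        delete_tmp_policy_settings_dir "${dir}"
        echo "deleted"
    else
        echo "skipped"
    fi
}
```

In the real suite the first argument would be `${BATS_TEST_NUMBER}`; the point is simply that cleanup is gated on the same condition as creation.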
Alex Lyn
24e68b246f tests: Add missing bin env at the head of bats
Add the missing shebang line (`#!/usr/bin/env bats`) at the head of the bats files.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
93ba6a8e76 tests: Make pod_name a global variable
The previous pod_name was set as local, which cannot be captured
within the teardown() function, causing failures.
This commit removes the `local pod_name` declaration to make it a global
variable.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Alex Lyn
89dce4eff6 tests: Enhance debug log output
Introduce setup_common in setup() and teardown_common() in teardown()
to collect enough logs to help debugging.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-17 16:14:10 +00:00
Fabiano Fidêncio
88cdfab604 runtime: nvidia: Align static_sandbox_resource_mgmt
Let's ensure we have those aligned for both the CC and non-CC use cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 17:04:51 +01:00
Fabiano Fidêncio
995770dbeb runtime: nvidia: Use cold-plug by default
Now that we have the way to do cold-plug, let's ensure we also use it
for the non-CC use case.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 17:04:51 +01:00
Hyounggyu Choi
7f72acc266 Merge pull request #12180 from BbolroC/enable-vfio-ap-passthrough-runtime-rs
runtime-rs: Enable VFIO-AP passthrough (hotplug only) on s390x
2025-12-17 15:50:10 +01:00
Hyounggyu Choi
f1b4327dba Merge pull request #12247 from fidencio/topic/ci-store-the-tarballs-we-rely-on-on-gchr-follow-up
build: Fix GPG key for gperf & Pass PUSH_TO_REGISTRY and GH_TOKEN to Docker builds
2025-12-17 13:53:58 +01:00
Fabiano Fidêncio
5415cf4e0f workflows: payload: Remove unneeded stuff from the runner
Otherwise we may hit a `no space left on device` when building the rust
kata-deploy binary.

This happens mostly because of the multi-stage build used to generate a
distroless final container.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 09:57:02 +01:00
Fabiano Fidêncio
98c5276546 helm: runtimeclasses: Match the kata-deploy rust deployment
There we ensure labels are added to better deal with ownership of the
runtimeclasses.  It's not strictly needed here as helm does take care of
the ownership, but also doesn't hurt to follow what seems to be a common
practice.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 09:57:02 +01:00
Fabiano Fidêncio
6130d7330f ci: Run a nightly job using the kata-deploy rust
Let's shamelessly duplicate the nightly job to have at least nightly
runs using the rust implementation of kata-deploy.

The reason for doing that is to be pragmatic, as pragmatic as possible,
and avoid switching away from the scripts before the 3.24.0 release, while
still testing both ways till the switch happens.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 09:57:02 +01:00
Fabiano Fidêncio
fbc29f3f5e kata-deploy: helm: Adapt to the rust binary
Unlike the scripts, which are called as `bash -c ...`, the
kata-deploy rust binary must be invoked directly, as we do not even have a
shell in its container.

For now, the image used for the rust version has the "-rust"
suffix, which will help us to have both ways being used / tested for a
little while.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 09:57:02 +01:00
Fabiano Fidêncio
9d88c6b1d7 kata-deploy: Oxidize the script
The kata-deploy shell script is not THAT bad and, to be honest, it's quite
handy for quick hacks and quick changes.  However, it's been
increasingly harder to maintain as it's grown in scope from a
testing tool to the project's proper front door, lacking unit tests, and
with an abundance of complex regular expressions and bashisms needed to
properly parse the environment variables it consumes.

Moreover, the fact it is a Frankenstein's monster glued together using
python packages, golang binaries, and a distro-dependent container makes
it VERY HARD to use from a distroless container (thus,
avoiding security issues), preventing further integration with
components that require a higher standard of security than we've been
requiring.

With everything said, with the help of Cursor (mostly on generating the
tests cases), here comes the oxidized version of the script, which runs
from a distroless container image.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 09:57:02 +01:00
Fabiano Fidêncio
c9cd79655d build: Pass PUSH_TO_REGISTRY and GH_TOKEN to Docker builds
The ORAS cache helper needs PUSH_TO_REGISTRY to be set to 'yes' to
push new artifacts to the cache. However, this environment variable
was not being passed to the Docker container during agent, tools, and
busybox builds.

Moreover, for ghcr.io authentication, add support for using GH_TOKEN and
GITHUB_ACTOR as fallbacks when explicit credentials
(ARTEFACT_REGISTRY_USERNAME/PASSWORD) are not provided.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 21:58:16 +01:00
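The fallback chain described above maps naturally onto shell default-value expansions. A minimal sketch using the variable names from the commit message (the wrapper function name is illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the ghcr.io credential fallback: prefer explicit registry
# credentials, fall back to the GitHub Actions identity and token.

resolve_registry_credentials() {
    REGISTRY_USERNAME="${ARTEFACT_REGISTRY_USERNAME:-${GITHUB_ACTOR:-}}"
    REGISTRY_PASSWORD="${ARTEFACT_REGISTRY_PASSWORD:-${GH_TOKEN:-}}"
}
```

The `${VAR:-fallback}` form treats both unset and empty explicit credentials as "not provided", which is usually what CI environments want.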
Fabiano Fidêncio
b11cea3113 build: Fix GPG key for gperf
The GPG key used for gperf was incorrectly set to the busybox
maintainer's key (Denis Vlasenko) instead of the gperf maintainer's
key (Marcel Schaible).

Wrong key (busybox): C9E9416F76E610DBD09D040F47B70C55ACC9965B
                     Denis Vlasenko <vda.linux@googlemail.com>

Correct key (gperf): EDEB87A500CC0A211677FBFD93C08C88471097CD
                     Marcel Schaible <marcel.schaible@studium.fernuni-hagen.de>

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 21:58:16 +01:00
Fabiano Fidêncio
6e01ee6d47 helm: Provide kata-remote runtime class
kata-remote is a runtime class that cloud-api-adaptor relies on to work.

kata-remote by itself does nothing, and that's the reason it's disabled
by default. We're only adding it here so cloud-api-adaptor charts can
simply do something like `--set shims.remote.enabled=true`.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 21:57:49 +01:00
Fabiano Fidêncio
0a0fcbae4a gatekeeper: Adjust to kata-tools
A few jobs have been renamed as part of the kata-tools split.
Let's add them all here.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 18:22:40 +01:00
Fabiano Fidêncio
fb326b53df agent: Ensure MS_REMOUNT is respected
When updating ephemeral storages, MS_REMOUNT is explicitly passed as,
for instance, `/dev/shm` should be remounted after memory is hotplugged.

Until now Kata Containers has been explicitly ignoring such updates,
leading to the containers' `/dev/shm` keeping the size of "half of the
memory allocated at startup time", which goes against the
expected behaviour.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-12-16 15:11:34 +01:00
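For reference, MS_REMOUNT is a single bit in the mount(2) flags word, and the fix amounts to honouring storage updates that carry this bit instead of dropping them. A minimal sketch of the flag test (flag value from linux/mount.h; the helper name is illustrative):

```shell
#!/usr/bin/env bash
# MS_REMOUNT is 0x20 in <linux/mount.h>. An update request whose flags
# word has this bit set asks for an existing mount (e.g. /dev/shm) to be
# remounted with new options, such as a larger size after memory hotplug.

MS_REMOUNT=$((0x20))

is_remount() {
    local flags="$1"
    (( flags & MS_REMOUNT ))
}
```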
Fabiano Fidêncio
830d15d4c8 tests: Adapt to using kata-tools
Instead of relying on the fully bloated kata tarball.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 12:55:07 +01:00
Fabiano Fidêncio
a2534e7bc8 kata-tools: Release as its own tarball
We're only releasing those for amd64 as that's the only architecture
we've been building the packages for.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 12:55:07 +01:00
Fabiano Fidêncio
6d2f393be4 build: Split tools build from the other artefacts build
Let's ensure we can create a specific "tools" tarball, which will help
those who only need to pull those either for testing or production
usage.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-16 12:55:07 +01:00
Ruoqing He
6d2c66c7eb runtime-rs: Refactor feature propagation
After the runtime-rs workspace was merged into the root workspace, the
features passed when building runtime-rs need to be refactored to be
correctly propagated. Taking dragonball as an example: runtime-rs requires
runtimes to depend on the virt_containers feature, and virt_containers
needs to handle hypervisor features specifically.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
1872af7c5a ci: Install cmake before building runtime-rs
cmake is required for libz-sys to compile (which is required by nydus).

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
9551f97e87 runtime-rs: Change TARGET_PATH to root workspace
After the workspace integration of runtime-rs, its output is now under
the repo root instead of src/runtime-rs. Change TARGET_PATH accordingly
to tell the Makefile where to look up the output.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
c7c02ac513 dragonball: Skip tests needing KVM under non-root
Some cases in the dragonball crates require interaction with the KVM
module, which requires root privileges. Skip those tests when running as
a non-root user.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
889c3b6012 dragonball: Fix incorrect use statement on aarch64
gic::create_gic is actually gated behind the dbs_arch crate, instead of
arch::aarch64.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
1c1f3a2416 dragonball: Allow missing_docs for dummy MMIODeviceInfo
MMIODeviceInfo inside the test module of dbs_boot on aarch64 is used for
testing purposes, but its `pub` attribute requires it to have
documentation. Since it is used only for testing, let's allow
missing_docs for it.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
6d0cb18c07 dragonball: Add missing test module attribute
The test set of dbs_utils's tap module is missing the test module
attribute, which makes dev-dependencies unusable. Mark the tap tests as a
test module.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
15fe7ecda1 runtime-rs: Remove lockfile
Remove Cargo.lock since runtime-rs now shares the workspace-wide lockfile.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
beb0cac0d1 build: Move runtime-rs to root workspace
This is a follow-up of 3fbe693.

Remove runtime-rs from the exclude list, and make it a member of the root
workspace.

Specify shim and shim-ctl as the binaries of the runtime-rs package,
bringing runtime-rs and all its members into the root workspace.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
Ruoqing He
ae4b3e9ac0 runtime-rs: Make runtime-rs a package
Making runtime-rs a package produces shim and shim-ctl as its binary
products, which enables the Makefile to work after it's incorporated into
the root workspace.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-16 11:26:07 +01:00
shezhang.lau
9744e9f26d runtime-rs: Block Rootfs Mount Options During Storage Object Creation
Initialize the storage options with the original rootfs options.
In addition, for XFS, append nouuid to the mount options if it is not
already present.

Signed-off-by: shezhang.lau <shezhang.lau@antgroup.com>
2025-12-16 13:57:02 +08:00
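The XFS-specific part of that change (append `nouuid` only when absent) can be sketched as a small shell helper; the function name is illustrative, and the real change lives in the Rust runtime:

```shell
#!/usr/bin/env bash
# Sketch: for XFS rootfs mounts, append "nouuid" to the comma-separated
# mount options unless it is already present. nouuid lets multiple
# snapshots of the same XFS filesystem (sharing a UUID) be mounted.

ensure_nouuid() {
    local fstype="$1" opts="$2"
    # Wrap in commas so "nouuid" matches only as a whole option.
    if [[ "${fstype}" == "xfs" && ",${opts}," != *",nouuid,"* ]]; then
        opts="${opts:+${opts},}nouuid"
    fi
    echo "${opts}"
}
```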
Xuewei Niu
c8b5f8efad Merge pull request #12167 from M-Phansa/main
runtime-rs: handle container missing during kill_process gracefully
2025-12-16 10:31:50 +08:00
Fabiano Fidêncio
1388a3acda packaging: Add ORAS cache for gperf and busybox tarballs
To protect against upstream download failures for gperf and busybox,
implement ORAS-based caching to GHCR.

This adds:
- download-with-oras-cache.sh: Core helper for downloading with cache
- populate-oras-tarball-cache.sh: Script to manually populate cache
- warn() function to lib.sh for consistency

Modified build scripts to:
- Try ORAS cache first (from ghcr.io/kata-containers/kata-containers)
- Fall back to upstream download on cache miss
- Automatically push to cache when PUSH_TO_REGISTRY=yes

The cache is automatically populated during CI builds, and parallel
architecture builds check for existing versions before pushing to avoid
race conditions.

Forks benefit from upstream cache but can override with their own:
ARTEFACT_REPOSITORY=myorg/kata make agent-tarball

Generated-By: Cursor IDE with Claude
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-15 22:04:21 +01:00
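The cache-then-upstream flow described above can be sketched with stub fetchers; `fetch_from_cache`, `fetch_upstream`, and `push_to_cache` are stand-ins for the actual oras and curl invocations, not the helper's real API:

```shell
#!/usr/bin/env bash
# Sketch of the download flow: try the ORAS cache first, fall back to
# the upstream URL on a cache miss, and push the freshly downloaded
# artefact to the cache only when PUSH_TO_REGISTRY=yes.

download_with_cache() {
    local artefact="$1"
    if fetch_from_cache "${artefact}"; then
        echo "cache hit: ${artefact}"
        return 0
    fi
    fetch_upstream "${artefact}" || return 1
    if [[ "${PUSH_TO_REGISTRY:-no}" == "yes" ]]; then
        push_to_cache "${artefact}"
    fi
    echo "cache miss: ${artefact}"
}
```

Gating the push on `PUSH_TO_REGISTRY=yes` is what lets forks consume the upstream cache read-only while CI with credentials keeps it populated.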
Markus Rudy
661e851445 genpolicy: support fsGroup setting in pod security context
The runtime handles the fsGroup field of the pod security context by
adding a mount option to the generated storage object [1]. This commit
changes genpolicy to expect this option.

Instead of passing another side input to
yaml::get_container_mounts_and_storages, we pass the entire PodSpec.
This reduces the necessary changes in the pod-generating resources and
allows for possible future use of other PodSpec fields.

[1]: https://github.com/kata-containers/kata-containers/blob/0c6fcde1/src/runtime/virtcontainers/kata_agent.go#L1620-L1625

Fixes: #11934

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2025-12-15 15:22:33 +01:00
Fabiano Fidêncio
a25a53c860 kata-deploy: sa: Fix permissions for patching nodefeaturerules
I've seen this happening with the GPU SNP CI every now and then, but I
don't really understand how this was not caught by the TDX / SNP CI
themselves before.

In any case, the error seen is:
```
  Error from server (Forbidden): error when applying patch:
  {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"nfd.k8s-sigs.io/v1alpha1\",\"kind\":\"NodeFeatureRule\",\"metadata\":{\"annotations\":{},\"name\":\"amd64-tee-keys\"},\"spec\":{\"rules\":[{\"extendedResources\":{\"sev-snp.amd.com/esids\":\"@cpu.security.sev.encrypted_state_ids\"},\"labels\":{\"amd.feature.node.kubernetes.io/snp\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"sev.snp.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"amd.sev-snp\"},{\"extendedResources\":{\"tdx.intel.com/keys\":\"@cpu.security.tdx.total_keys\"},\"labels\":{\"intel.feature.node.kubernetes.io/tdx\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"tdx.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"intel.tdx\"}]}}\n"}}}
  to:
  Resource: "nfd.k8s-sigs.io/v1alpha1, Resource=nodefeaturerules", GroupVersionKind: "nfd.k8s-sigs.io/v1alpha1, Kind=NodeFeatureRule"
  Name: "amd64-tee-keys", Namespace: ""
  for: "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": error when patching "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": nodefeaturerules.nfd.k8s-sigs.io "amd64-tee-keys" is forbidden: User "system:serviceaccount:kube-system:kata-deploy-sa" cannot patch resource "nodefeaturerules" in API group "nfd.k8s-sigs.io" at the cluster scope
```

And the fix is as simple as allowing patching and updating a
nodefeaturerule in our service account RBAC.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-15 12:01:20 +01:00
Alex Lyn
f4f61d5666 Merge pull request #12229 from fidencio/topic/kata-deploy-do-deprecations
kata-deploy: Remove deprecated features from 3.23.0
2025-12-15 19:00:07 +08:00
Hyounggyu Choi
b69da5f3ba gatekeeper: Make s390x e2e tests required again
Since the CI issue for s390x was resolved on Dec 5th,
the nightly test result has gone green for 10 consecutive days.
This commit puts the e2e tests for s390x again into the required job list.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-15 11:12:25 +01:00
Fabiano Fidêncio
ded6d1636f kata-deploy: Remove deprecated features from 3.23.0
Let's remove the deprecated features that were marked for removal
after Kata Containers 3.23.0:

kata-deploy.sh:
- Remove non-arch-specific variable fallbacks (SHIMS, DEFAULT_SHIM,
  SNAPSHOTTER_HANDLER_MAPPING, ALLOWED_HYPERVISOR_ANNOTATIONS,
  PULL_TYPE_MAPPING, EXPERIMENTAL_FORCE_GUEST_PULL). Each arch now
  has its own default value.
- Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS
  variables and associated functions (create_runtimeclasses,
  delete_runtimeclasses, adjust_shim_for_nfd). RuntimeClasses are
  now managed by Helm chart, not the daemonset script.
- Unsupported architectures now fail with an error instead of
  falling back to non-arch-specific defaults.

Helm chart:
- Remove all deprecated env values (createRuntimeClasses,
  createDefaultRuntimeClass, debug, shims, shims_*, defaultShim,
  defaultShim_*, allowedHypervisorAnnotations, snapshotterHandlerMapping,
  snapshotterHandlerMapping_*, agentHttpsProxy, agentNoProxy,
  pullTypeMapping, pullTypeMapping_*, _experimentalSetupSnapshotter,
  _experimentalForceGuestPull, _experimentalForceGuestPull_*).
- Remove backward compatibility code from _helpers.tpl that checked
  for legacy env values.
- Remove legacy env.shims check from runtimeclasses.yaml.
- Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS env
  vars from kata-deploy.yaml and post-delete-job.yaml.
- Update RBAC to only include runtimeclasses get/patch permissions
  (needed for NFD patching), removing create/delete/list/update/watch.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-13 16:32:00 +01:00
Adeet Phanse
db09912808 agent: add SandboxError enum for typed error handling
- Replace generic errors in sandbox operations with typed SandboxError variants (InvalidContainerId, InitProcessNotFound, InvalidExecId).
- This enables the kata shim to handle specific failure cases differently.

Fixes #12120

Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>
2025-12-12 12:33:18 -05:00
Adeet Phanse
5b7e1cdaad runtime-rs: handle container missing during kill_process gracefully
Add better error handling to runtime-rs for when the sandbox itself is killed and recreated.
- Update the kill_process function to skip sending a signal when the process is stopped.
- Always set ProcessStatus::Stopped even when wait_process fails.
- In state_process, return a synthetic state for the sandbox container when using the Sandbox API.

Fixes #12120
Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>
2025-12-12 12:33:17 -05:00
Fabiano Fidêncio
c7d0c270ee release: Bump version to 3.24.0
Bump VERSION and helm-chart versions

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-12 18:15:41 +01:00
Fabiano Fidêncio
50b853eb93 tests: nvidia: Always rely on the "kata" default runtime class
This is a pattern already followed by all the other tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
ff2396aeec tests: nvidia: Declare KATA_HYPERVISOR variable
Align with other test logic - declare the KATA_HYPERVISOR in the
run bash script, then declare the RUNTIME_CLASS_NAME variable in
the bats files.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
6e31cf2156 tests: nvidia: cc: USE is_confidential_gpu_hw
This function has recently been introduced, so we align patterns.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
cd1f55b41c tests: nvidia: cc: Set GPU0 policy for NIM tests
Now that we have a more restrictive resource policy for KBS, let
us start adopting it across all NVIDIA test cases. This policy was
previously introduced by the NVIDIA attestation test.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
edbac264cb tests: nvidia: cc: Remove KBS variable
The variable is now set in the CI YAML file, thus removing the
assignment.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
9665b74653 tests: nvidia: cc: address shellcheck warnings
Address shellcheck warnings for run_kubernetes_nv_tests.sh

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
5f9e7a03a8 tests: nvidia: do not use teardown_common
Clean up in each NVIDIA bats file according to our needs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Alex Lyn
c3fd4c1621 version: Bump rtnetlink and netlink-packet-route
This commit upgrades the `rtnetlink` dependency (and corresponding
libraries like `netlink-packet-route`) to mitigate excessive and
unnecessary netlink-related logging during sandbox startup.

Problem:
The previously used `rtnetlink v0.16` (depending on `netlink-proto
v0.11.3`) generates a high volume of DEBUG/INFO level netlink messages
during sandbox initialization. This noise:
1.  Overloads the logging system, often leading to warnings like
"slog-async: logger dropped messages due to channel overflow."
2.  Interferes with effective troubleshooting by distracting developers
from legitimate Kata errors.

Solution:
We upgrade to `rtnetlink v0.19` (and `netlink-proto v0.12`), as testing
confirms that the latest versions have correctly elevated the verbosity
of these netlink internal events to the TRACE level.

This change significantly enhances the log analysis experience by
suppressing unnecessary network-related logs during startup.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-12 14:27:33 +01:00
Manuel Huber
1781fb8b06 tests: nvidia: cc: Use CUDA image from NVCR
Pull from nvcr.io to avoid hitting unauthenticated pull rate
limits.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
f63f95f315 tests: nvidia: cc: generate pod security policies
With these changes, we create pod security policies when running
against NVIDIA TEE GPU handlers where AUTO_GENERATE_POLICY is set.
For the non-TEE GPU tests, the added functions bail out by design.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
bf26ad9532 nvidia: tests: remove outer CDI annotations
With the new device plugin being used by CI runners, these
annotations are no longer necessary.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
37b4f6ae8b tests: Adapt NVIDIA common policy settings
Following existing patterns, we adapt the common policy settings
for NVIDIA GPU CI platforms. For instance, for our CI runners, we
use containerd 2.x.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
f4c0c8546e tests: Enable AUTO_GENERATE_POLICY for NVIDIA TEEs
Enable auto-generate policy for qemu-nvidia-gpu-* if the user
didn't specify an AUTO_GENERATE_POLICY value.

Setting this in run_kubernetes_nv_tests.sh is too late as
gha-run.sh calls into run_tests, setup.sh, and then into
create_common_genpolicy_settings() where the rules.rego and
genpolicy-settings file are being copied to the right locations.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
b9774e44b6 genpolicy: tests: Add VFIO passthrough test cases
Add one valid test case with 2 GPUs with proper VFIO device
entries and CDI annotations.
Add seven test cases with invalid combinations of VFIO device
entries and CDI annotations.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
d3e6936820 genpolicy: validation of vfio passthrough GPUs
Add rules for vfio passthrough GPUs. When creating the security
policy document, parse GPU resource limits and derive CDI
annotation patterns and VFIO device entries.
With various values for CDI annotations and device paths being
runtime-dependent, use regular expressions.
For now, this enables passthrough of NVIDIA GPUs, but the changes
are designed to allow for other VFIO device types.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Alex Lyn
82e8e9fbe0 doc: add block device's settings to the doc page
Add the block device specific annotations, dedicated within runtime-rs,
for num_queues and queue_size to the document to help users set the two
parameters.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-11 21:10:22 +01:00
Alex Lyn
a8a458664d kata-types: Allow dynamic queue config via Pod annotations
This commit introduces the capability to dynamically configure
`queue_size` and `num_queues` parameters via Pod annotations.

Currently, `kata-runtime` allows for static configuration of
`queue_size` and `num_queues` for block devices through its config
file. However, a critical issue arises when a Pod is allocated fewer
CPU cores than the statically configured `num_queues` value. In such
scenarios, the Pod fails to start, leading to operational instability
and limiting flexibility in resource allocation.

To address this, this feature enables users to override the default
queue_size and num_queues parameters by specifying them in Pod
annotations. This allows for fine-grained control and dynamic adjustment
of these parameters based on the specific resource allocation of a Pod.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-11 21:10:22 +01:00
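The annotation-override logic described in the commit above can be sketched as follows. This is a minimal illustration in Python, not the actual kata-types code, and the annotation key prefix `io.kata.example/` is a hypothetical placeholder:

```python
# Illustrative sketch: override static block-device queue defaults with
# values supplied via Pod annotations, when present.
DEFAULTS = {"num_queues": 1, "queue_size": 128}

def effective_queue_config(annotations: dict) -> dict:
    """Return queue config, preferring Pod annotations over static defaults."""
    cfg = dict(DEFAULTS)
    for key in ("num_queues", "queue_size"):
        raw = annotations.get(f"io.kata.example/{key}")  # hypothetical key
        if raw is not None:
            cfg[key] = int(raw)  # annotation values arrive as strings
    return cfg

print(effective_queue_config({"io.kata.example/num_queues": "4"}))
# → {'num_queues': 4, 'queue_size': 128}
```

A pod allocated fewer CPU cores than the static `num_queues` can then simply annotate a smaller value instead of failing to start.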
Steve Horsman
51459b9b15 Merge pull request #12220 from fidencio/topic/ci-arm64-temporarily-disable-arm64-non-k8s-tests
ci: arm64-non-k8s: temporarily skip the tests
2025-12-11 11:35:39 +00:00
Fabiano Fidêncio
46c7d6c9f8 ci: arm64-non-k8s: temporarily skip the tests
The runner is down for a few weeks. I may end up bringing in my personal
runner, but I'm not confident I can easily do this before the holidays,
thus I'm skipping the tests for now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-11 12:14:32 +01:00
Manuel Huber
560f6f6c74 tests: nvidia: cc: Affirming attestation policy
Set the attestation policy for GPU0 to affirming. This requires
the GPU, for instance, to have production properties, such as
properly signed VBIOS firmware.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-11 10:16:58 +01:00
Alex Lyn
751b6875f9 tests: Temporarily skip the cpu-ns test for the s390x platform
Because this CI test has been failing continuously, we'd like to
temporarily skip it for the s390x platform. It will be re-enabled
once the related issues are addressed.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
d495b77135 runtime-rs: Align the default annotations with runtime-go
As the default enable_annotations in runtime-rs differs from
runtime-go, we should align it with the runtime-go configuration.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
c8dd5fbacf runtime-rs: Migrate vCPU tracking to fractional float
This commit refactors the vCPU resource management within runtime's
`CpuResource` structure and related calculation logic to use
floating-point numbers (`f32`) instead of integers (`u32`).

This migration is necessary to fully support the fractional vCPU
allocation introduced in the `kata-types` library, ensuring better
precision in:
1. Allocation Tracking: `current_vcpu` now tracks the precise
fractional value (e.g., 1.5 vCPUs).
2. Resource Calculation: `calc_cpu_resources` now returns a precise
`f32` sum of container vCPU requests, including normalization logic
based on the maximum period, removing the previous integer rounding
steps in the calculation.
3. Hypervisor Interaction: The integer vCPU requirement for the
hypervisor remains, so `ceil()` is now explicitly applied only when
interacting with the hypervisor or agent APIs
(`do_update_cpu_resources`, `current_vcpu`, `online_cpu_mem`).

And key changes as below:
1. `CpuResource::current_vcpu` updated from `u32` to `f32`.
2. `calc_cpu_resources` return type changed from `u32` to `f32`.
3. CPU hotplug logic now uses `f32` for the target vCPU count and
applies `ceil()` before calling `hypervisor.resize_vcpu()`.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
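The fractional-tracking scheme described above can be sketched briefly. This is an illustrative Python model of the idea (track `quota / period` as a float, round up only at the hypervisor boundary), not the runtime-rs implementation:

```python
import math

def calc_cpu_resources(requests):
    """Sum fractional vCPU requests, where each request is (quota, period)."""
    return sum(quota / period for quota, period in requests)

def vcpus_for_hypervisor(fractional_vcpus: float) -> int:
    """The hypervisor needs whole vCPUs, so round up only at this boundary."""
    return math.ceil(fractional_vcpus)

# Two containers: 0.5 vCPU and 1.0 vCPU
total = calc_cpu_resources([(50_000, 100_000), (100_000, 100_000)])
print(total, vcpus_for_hypervisor(total))  # → 1.5 2
```

Keeping the sum as 1.5 instead of rounding each container up to an integer avoids over-allocating a vCPU per container.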
Alex Lyn
84fd33c3bc kata-types: Use fractional float for vCPU resource tracking
Refactors `LinuxContainerCpuResources` and `LinuxSandboxCpuResources`
to track calculated vCPU allocation using `f64` (fractional float)
instead of `u64` (milliseconds).

This ensures more precise resource calculation (`quota / period`) and
aggregation by avoiding rounding errors inherent in millisecond-based
integer tracking.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
0f04363ea8 tests: Disable CPU elasticity tests for nontee scenarios
This commit updates the non-TEE tests to disable two specific test
cases: `k8s-number-cpus.bats` and `k8s-sandbox-vcpus-allocation.bats`.

These tests are designed to cover CPU elasticity/dynamic scaling
capabilities. In the non-TEE scenario, we are enforcing the disabling of
this capability by setting the default configuration to
`static_sandbox_resource_mgmt=true`.

Although the tests currently pass, allowing them to run is logically
inconsistent with the intended non-TEE configuration. Therefore, we are
disabling them for all non-TEE runtimes, specifically targeting:
- `qemu-coco-dev`
- `qemu-coco-dev-runtime-rs`

This change ensures that our non-TEE CI accurately reflects the static
resource management policy and prevents misleading test results.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
beaf44dd2e tests: disable block volume test for s390 arch
As runtime-rs doesn't support block device hotplug on the s390
architecture, we simply disable/skip the test when running on
s390.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
535ba589f4 runtime-rs: Enable elastic resource feature
To support this feature, the corresponding item in the Makefile
must be enabled; it can be set at build time like this:
`DEFSTATICRESOURCEMGMT_QEMU := false`
Users who don't want this feature can set it to true via
configuration.toml.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
28371dbec5 tests: Enable cloud-hypervisor and qemu-runtime-rs within the CI
Enable the cpu hotplug tests within the k8s-number-cpus.bats for both
cloud-hypervisor and qemu-runtime-rs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
82a72b4564 tests: Enable cpu hotplug for dragonball and clh in vcpus allocation
We have support cpu hotplug features within dragonball and clh, this
commit is to enable the test within the CI.

Fixes: #8660

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
6196d3d646 tests: Enable cpu hotplug tests in k8s-cpu-ns.bats
This case was previously skipped because of a failure; now that
CPU hotplug has been fixed, it's time to re-enable it.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
96bd13e85d tests: Add support for qemu-runtime-rs
The virtio-scsi driver is now supported, so the CI should be
enabled.

Fixes: #10373

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
dependabot[bot]
2137b1fa3a build(deps): bump github.com/containernetworking/plugins in /src/runtime
Bumps [github.com/containernetworking/plugins](https://github.com/containernetworking/plugins) from 1.7.1 to 1.9.0.
- [Release notes](https://github.com/containernetworking/plugins/releases)
- [Commits](https://github.com/containernetworking/plugins/compare/v1.7.1...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/containernetworking/plugins
  dependency-version: 1.9.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-10 16:10:24 +01:00
LandonTClipp
b50a73912d runtime: Config test extension for IOMMUFDID
Add additional cases for the IOMMUFDID method to check its
behavior when non-IOMMUFD paths are passed. The method should do
the right thing.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
d5e4cf6b4d runtime: Add test for ExecuteVFIODeviceAdd
Copilot made a good point that we should have a test for this.
Thus, this commit.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
137866f793 runtime: Allow QMP commands to be logged in debug level
Logging the QMP commands gives us a lot of flexibility to
troubleshoot issues with what is being sent to QEMU.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
a3b5764f67 runtime: Fix import cycle and add unit test for IOMMUFDID()
An import cycle was introduced because of a mutual need
for the constant that describes the prefix of IOMMUFD files.
We need to extract this out into a higher-level package.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
09438fd54f runtime: Add IOMMUFD Object Creation for QEMU QMP Commands
The QMP commands sent to QEMU did not properly set up
IOMMUFD objects in the codepath that handles VFIO device
hot-plugging. This is mainly relevant in the Kubernetes
use-case where the VFIO devices are not available when
QEMU is first launched.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
Manuel Huber
cb8fd2e3b1 runtime: gpu: Skip CDI annos for pause container
The pause container does not need CDI annotations; these are only
intended for workload containers.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-10 13:26:04 +01:00
Fabiano Fidêncio
69a0ac979c tests: Adjust install_bats()
The function assumes that the runner is an Ubuntu machine, which so far
has been true as part of our CI.

However, the new ARM runner is running on Debian, and those mirror
additions would simply break.

With this in mind, for any distro that's not Ubuntu, let's just make
sure to inform the owner of the system that bats must already be
installed as part of the environment provided.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-10 12:05:04 +01:00
Fabiano Fidêncio
406f6b1d15 Revert "tests: Add workaround to override CDI files"
This reverts commit 5a81b010f2, as we now
have all the infrastructure properly set up as part of our CI node.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-09 23:18:11 +01:00
Fabiano Fidêncio
3db7b88eff tests: remove containerd guest pull stability tests
Remove the existing containerd guest pull stability tests workflow
as we're going to rebuild all the VMs used for testing and introduce
new, more focused stability tests for nydus-snapshotter.

The new tests will be added soon, as part of another PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-08 16:29:11 +01:00
Fabiano Fidêncio
5b6a2d25bc podOverhead: Reduce memory overhead for GPU runtime classes
Now that we've bumped to QEMU 10.2.0-rc1, we can take advantage of a fix
that's present there, which fixes the double memory allocation for the
cases where GPUs are being cold-plugged.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-06 00:16:43 +01:00
Fabiano Fidêncio
71f78cc87e tests: cc: gpu: Lower the amount of memory required by the pods
We've made the pods require a ridiculous amount of memory, just for the
sake of getting them running.

Now that those are running, tests are passing, and CI is required, let's
work on lowering the amount of memory needed, as everything else is
working as expected.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-06 00:16:43 +01:00
Dan Mihai
965ad10cf2 tests: k8s: tests_common.sh local modification
Clean-up shellcheck warnings:

SC2030 (info): Modification of cmd_out is local (to subshell caused by (..) group).
SC2031 (info): cmd_out was modified in a subshell. That change might be lost.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-12-06 00:16:23 +01:00
Dan Mihai
8199171cc4 tests: k8s: tests_common.sh braces around variables
Clean-up shellcheck warnings:

SC2250 (style): Prefer putting braces around variable references even
when not strictly required.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-12-06 00:16:23 +01:00
Fabiano Fidêncio
5a81b010f2 tests: Add workaround to override CDI files
Let's add a simple backup and restore logic for the CDI configuration
file nvidia.com-pgpu.yaml in the k8s-nvidia-*.bats and
k8s-confidential-attestation.bats test files.

Although not optimal, this is a temporary workaround needed until
NVIDIA releases what's needed for the GPU Operator to properly deal with
cold plugged devices for the Confidential Containers cases, which is
work in progress right now.

After that's released, we can revert/drop this patch.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-05 18:58:35 +01:00
Fabiano Fidêncio
aaa67df4dd versions: Bump experimental {tdx,snp} QEMU
Let's bump the experimental {tdx,snp} QEMU to the tags created today in
the Confidential Containers repo, which match QEMU 10.2.0-rc1.

This bump is especially beneficial for us, as we can get rid of QEMU's
double memory allocation when **cold plugging** a GPU.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-05 18:58:35 +01:00
Zvonko Kaiser
f8ad17499d gpu: VFIO handling container vs sandbox
If the sandbox has cold-plugged an IOMMUFD device but the
device plugin sends us a /dev/vfio/<NUM> device, we need to
check whether the IOMMUFD device and the VFIO device are the same.
We have the sibling.BDF; we now need to extract the BDF of the
devPath, which is either /dev/vfio/<NUM> or /dev/vfio/devices/vfio<NUM>.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-05 16:53:31 +01:00
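The first step of the comparison described above is recognizing both path forms and pulling out the device/group number, which can then be resolved to a BDF via sysfs (lookup not shown). A minimal sketch in Python, purely illustrative of the path handling:

```python
import re

# Matches both the legacy group path /dev/vfio/<NUM> and the IOMMUFD
# cdev path /dev/vfio/devices/vfio<NUM>.
VFIO_PATH = re.compile(r"^/dev/vfio/(?:devices/vfio)?(\d+)$")

def vfio_number(dev_path: str):
    """Extract the numeric identifier from either VFIO device path form."""
    m = VFIO_PATH.match(dev_path)
    return int(m.group(1)) if m else None

print(vfio_number("/dev/vfio/17"))             # → 17 (legacy group path)
print(vfio_number("/dev/vfio/devices/vfio3"))  # → 3  (IOMMUFD cdev path)
```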
Zvonko Kaiser
147e9f188e Merge pull request #12080 from manuelh-dev/mahuber/cc-gpu-ci-attestation
tests: nvidia: cc: Add attestation test
2025-12-05 09:31:57 -05:00
Steve Horsman
2f1b98c232 Merge pull request #12197 from stevenhorsman/logrus-1.9.3-bump
version: Bump sirupsen/logrus
2025-12-05 14:18:50 +00:00
Manuel Huber
e5861cde20 tests: use Authorization when GH_TOKEN is set
Same as for other uses of GH_TOKEN, use it when set in order to
avoid rate limiting issues.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 14:08:43 +01:00
stevenhorsman
9eba559bd6 version: Bump sirupsen/logrus
Bump the github.com/sirupsen/logrus version to 1.9.3
across our components where it is back-level to bring us
up-to-date and resolve high severity CVE-2025-65637

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-05 11:12:04 +00:00
Manuel Huber
34efa83afc tests: nvidia: cc: Add attestation test
Add the attestation bats test case to the NVIDIA CI and provide a
second pod manifest for the attestation test with a GPU. This will
enable composite attestation in a subsequent step.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
e31d592a0c versions: Bump coco-trustee
Bump to pull in a fix for composite attestation with GPUs. The new
commit ID corresponds to the fix (change for default GPU policy),
currently being the top commit of the main branch.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
73dfa9b9d5 versions: Bump coco-guest-components
Bump to pull in a fix for NVIDIA CC GPU attestation.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
116a72ad0d tests: cc: Fix command evaluation
This brings two fixes:
- use the test_key variable to check against the aatest value.
- properly check the run command invocation: `run` without bash does
  not seem to like the pipe, which leads to the status result ALWAYS
  evaluating to 1. Because of this, the deny-all test would ALWAYS
  succeed regardless of whether aatest was actually returned.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
23675c784b tests: cc: Reset default policy
When running these tests repeatedly locally, the default policy is not
reset after the test completes, so subsequent runs fail.
Similar to k8s-sealed-secrets.bats, we set the default policy in an if
condition.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
f70c3adaf1 tests: cc: Add kbs_set_gpu0_resource_policy
This allows setting a GPU0 resource policy, enabling GPU
attestation tests to not use the default resource policy.
For now, the policy requires the attestation's EAR status to
not be contraindicated. In a future change we will require
this to be affirming once our CI runners' vBIOS version is
properly configured.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
c2d1e2dcc9 tests: cc: Add is_confidential_gpu_hardware
This enables attestation tests to figure out whether composite
attestation with a GPU can be executed.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
53e94df203 tests: nvidia: cc: add SUPPORTED_TEE_HYPERVISORS
Add the NVIDIA TEE hypervisors. With this, attestation tests can be run
against the NVIDIA handlers, for instance.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Fabiano Fidêncio
923f97bc66 rootfs: Temporarily revert "gpu: Handle root_hash.txt correctly"
This reverts commit e4a13b9a4a, as it
caused some issues with the GPU workflows.

Reverting it is better, as it unblocks other PRs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-05 11:47:37 +01:00
Steve Horsman
d27af53902 Merge pull request #12185 from stevenhorsman/runtime-rs-required-checks
ci: Add qemu-runtime-rs AKS tests to required
2025-12-05 10:43:25 +00:00
stevenhorsman
403de2161f version: Update golang to 1.24.11
Needed to fix:
```
Vulnerability #1: GO-2025-4155
    Excessive resource consumption when printing error string for host
    certificate validation in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4155
  Standard library
    Found in: crypto/x509@go1.24.9
    Fixed in: crypto/x509@go1.24.11
    Vulnerable symbols found:
      #1: x509.HostnameError.Error
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-04 22:50:07 +01:00
Steve Horsman
425f4ffc8d Merge pull request #12124 from zvonkok/nvidia-measured-rootfs
gpu: Measured rootfs
2025-12-04 14:54:11 +00:00
Hyounggyu Choi
1dd3426adc tests: Extend vfio-ap test for runtime-rs
vfio-ap passthrough has been introduced for runtime-rs,
requiring that the existing test verify this new functionality.
This commit adds:

- containerd config specific to runtime-rs
- extensions to the existing test functions to cover vfio-ap

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-04 15:05:23 +01:00
Hyounggyu Choi
aa326fb9b8 tests: Remove usage of crictl for vfio-ap
`crictl` is not used any more after #10767.
Let's clean up all places where the tool is used.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-04 15:05:23 +01:00
Hyounggyu Choi
41d61f4b16 runtime-rs: Enable VFIO-AP passthrough
The following have been made for the enablement:

1. Make `MediatedPci` and `MediatedAp` in `VfioDeviceType`
2. Make HostDevice without BDF for `MediatedAp`
3. Add `CCW` to VFioBusMode and set it to VfioConfig as `bus_type`
4. Return `vfio-ap` driver type for `CCW` bus type
5. Set `bus_mode` for `VfioDevice` based on `bus_type`
6. Set `vfio-ap` to the agent device's `field_type`
7. Prepare a different argument for `vfio-ap` for QMP command
8. Set None to all PCI relevant fields

Please keep in mind that `vfio-ap` does not belong to any type of
port topology like PCI (e.g., root or switch), because devices on
s390x are controlled by CCW.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-04 15:05:23 +01:00
Hyounggyu Choi
cb5b1384ca runtime-rs: Introduce uses_native_ccw_bus()
Until now, we relied on `VMROOTFSDRIVER` to determine
whether a system uses a native CCW bus.
However, this method is not canonical and can be error-prone
depending on the configuration.

This commit introduces a new function that checks
for the presence of CCW bus infrastructure in sysfs
and verifies that native mainframe drivers are available.
It replaces all previous uses of the old detection method.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-04 15:05:23 +01:00
Steve Horsman
f673f33e72 Merge pull request #12172 from fidencio/topic/gatekeeper-mark-nvidia-jobs-as-required
gatekeeper: Mark NVIDIA CC GPU test as required
2025-12-04 12:48:57 +00:00
stevenhorsman
112810c796 ci: Add qemu-runtime-rs AKS tests to required
Add the small and normal variants of the qemu-runtime-rs
tests to the required-tests list now that they are stable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-04 11:15:43 +00:00
Fabiano Fidêncio
c505afb67c gatekeeper: Mark NVIDIA CC GPU test as required
It's been stable for the past 10 nightlies, no retries.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-04 11:14:25 +00:00
Steve Horsman
635f7892d5 Merge pull request #12190 from BbolroC/mark-s390x-jobs-as-nonrequired
gatekeeper: Drop all s390x e2e tests temporarily
2025-12-04 11:10:46 +00:00
Steve Horsman
2a6ebc556f Merge pull request #12175 from kata-containers/mahuber/gpu-ci-genpolicy
ci: nvidia: Install kata-artifacts
2025-12-04 09:23:32 +00:00
Hyounggyu Choi
b6ef7eb9c3 gatekeeper: Drop all s390x e2e tests temporarily
This commit marks three s390x CI jobs as non-required.
Please check out the details at #12189.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-04 08:05:14 +01:00
Steve Horsman
10b0717cae Merge pull request #12179 from stevenhorsman/nginx-test-image-by-digest
tests: Switch nginx test image ref to digest
2025-12-03 13:39:07 +00:00
Hyounggyu Choi
22778547b2 runtime-rs: Fix panic when OCI spec annotations are missing
An oci-spec can be passed to the runtime without annotations
(e.g., `ctr run`). In this case, the runtime panics with:

```
src/runtime-rs/crates/runtimes/src/manager.rs:391: called `Option::unwrap()` on a `None` value
```

This commit checks if the annotation is None, and instantiates
the hashmap as an empty map if it is missing. It also adds a None
check for `netns`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-03 13:07:39 +01:00
Hyounggyu Choi
ba78fb46fb runtime-rs: Configure protection devices when confidential_guest is set
Currently, the protection device configuration is constructed
automatically even if `confidential_guest` is not set.
This commit puts a condition to check the flag and allows the
construction accordingly.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-03 13:07:39 +01:00
Zvonko Kaiser
e4a13b9a4a gpu: Handle root_hash.txt correctly
Updates to the shim-v2 build and the binaries.sh script,
making sure that both variants, "confidential" AND
"nvidia-gpu-confidential", are handled.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-02 19:56:19 +01:00
Steve Horsman
d8405cb7fb Merge pull request #11983 from stevenhorsman/toolchain-guidance
doc: Document our Toolchain policy
2025-12-02 15:47:54 +00:00
stevenhorsman
b9cb667687 doc: Document our Toolchain policy
Create an initial version of our toolchain policy as agreed in
Architecture Committee meetings and the PTG

Fixes: #9841
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 14:28:29 +00:00
stevenhorsman
79a75b63bf tests: Switch nginx test image ref to digest
As tags are mutable and digests are not, let's pin our image
by digest to give our CI a better chance of stability.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 13:02:50 +00:00
stevenhorsman
5c618dc8e2 tests: Switch nginx images to use version.yaml details
- Swap out the hard-coded nginx registry and versions for reading
the test image details from version.yaml,
which also ensures that the quay.io mirror is used
rather than the Docker Hub versions, which can hit pull limits
- Try setting imagePullPolicy Always to fix issues with the arm CI

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 10:04:09 +01:00
Manuel Huber
3427b5c00e ci: nvidia: Install kata-artifacts
In preparation for Kata agent security policy testing, installing
Kata tools to provide genpolicy.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-01 17:59:19 +00:00
Manuel Huber
4355af7972 kata-deploy: Fix binary find install_tools_helper
Using make tarball targets for tools locally, binaries may exist
for both debug and release builds. In this case, cryptic errors
are shown as we try to install multiple binaries.
This change requires exactly one binary to be found and errors out
in other cases.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-01 09:29:24 -08:00
Manuel Huber
5a5c43429e ci: nvidia: remove kubectl_retry calls
When tests regress, the CI wait time can increase significantly
with the current kubectl_retry attempt logic. Thus, align with
other tests and remove kubectl_retry invocations. Instead, rely on
proper timeouts.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-28 19:00:57 +01:00
Fabiano Fidêncio
e3646adedf gatekeeper: Drop SEV-SNP from required
SEV-SNP machine is failing due to nydus not being deployed in the
machine.

We cannot easily contact the maintainers due to the US holidays, and I
think this should become a criterion for a machine not to be added as
required again (coverage across different regions).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-28 12:46:07 +01:00
Steve Horsman
8534afb9e8 Merge pull request #12150 from stevenhorsman/add-gatekeeper-triggers
ci: Add two extra gatekeeper triggers
2025-11-28 09:34:41 +00:00
Zvonko Kaiser
9dfa6df2cb agent: Bump CDI-rs to latest
Latest version of container-device-interface is v0.1.1

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 22:57:50 +01:00
Fabiano Fidêncio
776e08dbba build: Add nvidia image rootfs builds
So far we've only been building the initrd for the nvidia rootfs.
However, we're also interested in having the image be used for a few
use-cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-27 22:46:07 +01:00
stevenhorsman
531311090c ci: Add two extra gatekeeper triggers
We hit a case where gatekeeper was failing because it thought the WIP
check had failed, but by the time it ran the PR had been edited to remove
that from the title. We should listen to edits and unlabels of the PR to
ensure that gatekeeper doesn't get outdated in situations like this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-27 16:45:04 +00:00
Zvonko Kaiser
bfc9e446e1 kernel: Add NUMA config
Add per arch specific NUMA enablement kernel settings

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 12:45:27 +01:00
Steve Horsman
c5ae8c4ba0 Merge pull request #12144 from BbolroC/use-runs-on-to-choose-runners
GHA: Use `runs-on` only for choosing proper runners
2025-11-27 09:54:39 +00:00
Fabiano Fidêncio
2e1ca580a6 runtime-rs: Only QEMU supports templating
We can remove the checks and default values attribution from all other
shims.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-27 10:31:28 +01:00
Alex Lyn
df8315c865 Merge pull request #12130 from Apokleos/stability-rs
tests: Enable stability tests for runtime-rs
2025-11-27 14:27:58 +08:00
Fupan Li
50dce0cc89 Merge pull request #12141 from Apokleos/fix-nydus-sn
tests: Properly handle containerd config based on version
2025-11-27 11:59:59 +08:00
Fabiano Fidêncio
fa42641692 kata-deploy: Cover all flavours of QEMU shims with multiInstallSuffix
We were missing all the runtime-rs variants.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-26 17:44:16 +01:00
Fabiano Fidêncio
96d1e0fe97 kata-deploy: Fix multiInstallSuffix for NV shims
When using the multiInstallSuffix we must be cautious when using the shim
name, as qemu-nvidia-gpu* doesn't actually have a matching QEMU itself,
but should rather be mapped to:
qemu-nvidia-gpu -> qemu
qemu-nvidia-gpu-snp -> qemu-snp-experimental
qemu-nvidia-gpu-tdx -> qemu-tdx-experimental

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-26 17:44:16 +01:00
Markus Rudy
d8f347d397 Merge pull request #12112 from shwetha-s-poojary/fix_list_routes
agent: fix the list_routes failure
2025-11-26 17:32:10 +01:00
Steve Horsman
3573408f6b Merge pull request #11586 from zvonkok/numa-qemu
qemu: Enable NUMA
2025-11-26 16:28:16 +00:00
Steve Horsman
aae483bf1d Merge pull request #12096 from Amulyam24/enable-ibm-runners
ci: re-enable IBM runners for ppc64le and s390x
2025-11-26 13:51:21 +00:00
Steve Horsman
5c09849fe6 Merge pull request #12143 from kata-containers/topic/add-report-tests-to-workflows
workflows: Add Report tests to all workflows
2025-11-26 13:18:21 +00:00
Steve Horsman
ed7108e61a Merge pull request #12138 from arvindskumar99/SNPrequired
CI: readding SNP as required
2025-11-26 11:33:07 +00:00
Amulyam24
43a004444a ci: re-enable IBM runners for ppc64le and s390x
This PR re-enables the IBM runners for ppc64le/s390x build jobs and s390x static checks.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2025-11-26 16:20:01 +05:30
Hyounggyu Choi
6f761149a7 GHA: Use runs-on only for choosing proper runners
Fixes: #12123

`include` in #12069, introduced to choose a different runner
based on component, leads to another set of redundant jobs
where `matrix.command` is empty.
This commit goes back to the `runs-on` solution, but makes
the condition human-readable.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-26 11:35:30 +01:00
Alex Lyn
4e450691f4 tests: Unify nydus configuration to containerd v3 schema
Containerd configuration syntax (`config.toml`) varies across versions,
requiring per-version logic for fields like `runtime`.

However, testing confirms that containerd LTS (1.7.x) and newer
versions fully support the v3 schema for the nydus remote snapshotter.

This commit changes the previous containerd v1 settings in `config.toml`.
Instead, it introduces a unified v3-style configuration for nydus, which
is valid for both LTS and active containerd releases.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-26 17:58:16 +08:00
stevenhorsman
4c59cf1a5d workflows: Add Report tests to all workflows
In the CoCo tests jobs @wainersm created a report tests step
that summarises the jobs, so they are easier to understand and
get results from. This is very useful, so let's roll it out to all the
bats tests.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-26 09:28:36 +00:00
shwetha-s-poojary
4510e6b49e agent: fix the list_routes failure
Relax the list_routes tests so that not every route requires a device.

Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2025-11-25 20:25:46 -08:00
Xuewei Niu
04e1cf06ed Merge pull request #12137 from Apokleos/fix-netdev-mq
runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean
2025-11-26 11:49:33 +08:00
Arvind Kumar
c085011a0a CI: readding SNP as required
Reenabling the SNP CI node as a required test.

Signed-off-by: Arvind Kumar <arvinkum@amd.com>
2025-11-25 17:05:01 +00:00
Zvonko Kaiser
45cce49b72 shellcheck: Fix [] [[]] SC2166
This file is a beast, so we're doing one shellcheck fix after the other.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:46:16 +01:00
Zvonko Kaiser
b2c9439314 qemu: Update tools/packaging/static-build/qemu/build-qemu.sh
This nit was introduced by 227e717 during the v3.1.0 era. The + sign from the bash substitution ${CI:+...} was copied by mistake.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-25 15:46:09 +01:00
Zvonko Kaiser
2f3d42c0e4 shellcheck: build-qemu.sh is clean
Make shellcheck happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:46:07 +01:00
Zvonko Kaiser
f55de74ac5 shellcheck: build-base-qemu.sh is clean
Make shellcheck happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:45:49 +01:00
Zvonko Kaiser
040f920de1 qemu: Enable NUMA support
Enable NUMA support with QEMU.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:45:00 +01:00
Alex Lyn
7f4d856e38 tests: Enable nydus tests for qemu-runtime-rs
We need to enable nydus tests for qemu-runtime-rs, and this commit
does exactly that.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 17:45:57 +08:00
Alex Lyn
98df3e760c runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean
QEMU netdev_add QMP command requires the 'mq' (multi-queue) argument
to be of boolean type (`true` / `false`). In runtime-rs the virtio-net
device hotplug logic currently passes a string value (e.g. "on"/"off"),
which causes QEMU to reject the command:
```
    Invalid parameter type for 'mq', expected: boolean
```
This patch modifies `hotplug_network_device` to insert 'mq' as a proper
boolean value of `true`. This fixes sandbox startup failures when
multi-queue is enabled.

Fixes #12136

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 17:34:36 +08:00
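The actual fix lives in runtime-rs Rust code, but the resulting QMP wire format can be sketched in shell. The payload shape below is assumed from the commit message; fields other than `mq` are illustrative:

```shell
#!/usr/bin/env bash
# QMP's netdev_add requires 'mq' as a JSON boolean, not a string.

# Rejected by QEMU with:
#   Invalid parameter type for 'mq', expected: boolean
bad='{"execute":"netdev_add","arguments":{"type":"tap","id":"net0","mq":"on"}}'

# Accepted: 'mq' carried as an unquoted JSON boolean
good='{"execute":"netdev_add","arguments":{"type":"tap","id":"net0","mq":true}}'

case "${good}" in
    *'"mq":true'*) echo "mq is a boolean" ;;
    *)             echo "mq is not a boolean" ;;
esac
```

The distinction matters because QMP is strictly typed JSON: `"on"`/`"off"` are QEMU command-line conventions and are not coerced to booleans on the QMP socket.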
Alex Lyn
23393d47f6 tests: Enable stability tests for qemu-runtime-rs on nontee
Enable the stability tests for qemu-runtime-rs CoCo on non-TEE
environments

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:18:37 +08:00
Alex Lyn
f1d971040d tests: Enable run-nerdctl-tests for qemu-runtime-rs
Enable nerdctl tests for qemu-runtime-rs

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:14:50 +08:00
Alex Lyn
c7842aed16 tests: Enable stability tests for runtime-rs
The previous change enabled the stability tests without qemu-runtime-rs; this commit enables them for it as well.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:12:12 +08:00
482 changed files with 22024 additions and 9853 deletions

View File

@@ -10,11 +10,6 @@ self-hosted-runner:
- amd64-nvidia-a100
- amd64-nvidia-h100-snp
- arm64-k8s
- containerd-v1.7-overlayfs
- containerd-v2.0-overlayfs
- containerd-v2.1-overlayfs
- containerd-v2.2
- containerd-v2.2-overlayfs
- garm-ubuntu-2004
- garm-ubuntu-2004-smaller
- garm-ubuntu-2204
@@ -25,6 +20,7 @@ self-hosted-runner:
- ppc64le-k8s
- ppc64le-small
- ubuntu-24.04-ppc64le
- ubuntu-24.04-s390x
- metrics
- riscv-builder
- sev-snp

View File

@@ -147,9 +147,18 @@ jobs:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata
run: bash tests/integration/nydus/gha-run.sh install-kata kata-artifacts
- name: Install kata-tools
run: bash tests/integration/nydus/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Run nydus tests
timeout-minutes: 10
run: bash tests/integration/nydus/gha-run.sh run
@@ -367,8 +376,16 @@ jobs:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/functional/kata-agent-apis/gha-run.sh install-kata kata-artifacts
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata & kata-tools
run: |
bash tests/functional/kata-agent-apis/gha-run.sh install-kata kata-artifacts
bash tests/functional/kata-agent-apis/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Run kata agent api tests with agent-ctl
run: bash tests/functional/kata-agent-apis/gha-run.sh run

View File

@@ -12,7 +12,12 @@ name: Build checks
jobs:
check:
name: check
runs-on: ${{ matrix.runner || inputs.instance }}
runs-on: >-
${{
( contains(inputs.instance, 's390x') && matrix.component.name == 'runtime' ) && 's390x' ||
( contains(inputs.instance, 'ppc64le') && (matrix.component.name == 'runtime' || matrix.component.name == 'agent') ) && 'ppc64le' ||
inputs.instance
}}
strategy:
fail-fast: false
matrix:
@@ -70,36 +75,6 @@ jobs:
- protobuf-compiler
instance:
- ${{ inputs.instance }}
include:
- component:
name: runtime
path: src/runtime
needs:
- golang
- XDG_RUNTIME_DIR
instance: ubuntu-24.04-s390x
runner: s390x
- component:
name: runtime
path: src/runtime
needs:
- golang
- XDG_RUNTIME_DIR
instance: ubuntu-24.04-ppc64le
runner: ppc64le
- component:
name: agent
path: src/agent
needs:
- rust
- libdevmapper
- libseccomp
- protobuf-compiler
- clang
instance: ubuntu-24.04-ppc64le
runner: ppc64le
steps:
- name: Adjust a permission for repo

View File

@@ -41,16 +41,11 @@ jobs:
matrix:
asset:
- agent
- agent-ctl
- busybox
- cloud-hypervisor
- cloud-hypervisor-glibc
- coco-guest-components
- csi-kata-directvolume
- firecracker
- genpolicy
- kata-ctl
- kata-manager
- kernel
- kernel-confidential
- kernel-dragonball-experimental
@@ -63,7 +58,6 @@ jobs:
- qemu
- qemu-snp-experimental
- qemu-tdx-experimental
- trace-forwarder
- virtiofsd
stage:
- ${{ inputs.stage }}
@@ -171,6 +165,8 @@ jobs:
- rootfs-image
- rootfs-image-confidential
- rootfs-image-mariner
- rootfs-image-nvidia-gpu
- rootfs-image-nvidia-gpu-confidential
- rootfs-initrd
- rootfs-initrd-confidential
- rootfs-initrd-nvidia-gpu
@@ -362,3 +358,104 @@ jobs:
path: kata-static.tar.zst
retention-days: 15
if-no-files-found: error
build-tools-asset:
name: build-tools-asset
runs-on: ubuntu-22.04
permissions:
contents: read
packages: write
strategy:
matrix:
asset:
- agent-ctl
- csi-kata-directvolume
- genpolicy
- kata-ctl
- kata-manager
- trace-forwarder
stage:
- ${{ inputs.stage }}
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: quay.io
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
id: build
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
mkdir -p kata-tools-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-tools-build/.
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-tools-artifacts-amd64-${{ matrix.asset }}${{ inputs.tarball-suffix }}
path: kata-tools-build/kata-static-${{ matrix.asset }}.tar.zst
retention-days: 15
if-no-files-found: error
create-kata-tools-tarball:
name: create-kata-tools-tarball
runs-on: ubuntu-22.04
needs: [build-tools-asset]
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
fetch-tags: true
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
pattern: kata-tools-artifacts-amd64-*${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
merge-multiple: true
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-tools-artifacts versions.yaml kata-tools-static.tar.zst
env:
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: store-artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-static.tar.zst
retention-days: 15
if-no-files-found: error

View File

@@ -150,6 +150,7 @@ jobs:
matrix:
asset:
- rootfs-image
- rootfs-image-nvidia-gpu
- rootfs-initrd
- rootfs-initrd-nvidia-gpu
steps:

View File

@@ -32,7 +32,7 @@ jobs:
permissions:
contents: read
packages: write
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
strategy:
matrix:
asset:
@@ -89,7 +89,7 @@ jobs:
build-asset-rootfs:
name: build-asset-rootfs
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
needs: build-asset
permissions:
contents: read
@@ -170,7 +170,7 @@ jobs:
build-asset-shim-v2:
name: build-asset-shim-v2
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts]
permissions:
contents: read
@@ -230,7 +230,7 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]
permissions:
contents: read

View File

@@ -32,7 +32,7 @@ permissions: {}
jobs:
build-asset:
name: build-asset
runs-on: s390x
runs-on: ubuntu-24.04-s390x
permissions:
contents: read
packages: write
@@ -257,7 +257,7 @@ jobs:
build-asset-shim-v2:
name: build-asset-shim-v2
runs-on: s390x
runs-on: ubuntu-24.04-s390x
needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts]
permissions:
contents: read
@@ -319,7 +319,7 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: s390x
runs-on: ubuntu-24.04-s390x
needs:
- build-asset
- build-asset-rootfs

36
.github/workflows/ci-nightly-rust.yaml vendored Normal file
View File

@@ -0,0 +1,36 @@
name: Kata Containers Nightly CI (Rust)
on:
schedule:
- cron: '0 1 * * *' # Run at 1 AM UTC (1 hour after script-based nightly)
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
kata-containers-ci-on-push-rust:
permissions:
contents: read
packages: write
id-token: write
attestations: write
uses: ./.github/workflows/ci.yaml
with:
commit-hash: ${{ github.sha }}
pr-number: "nightly-rust"
tag: ${{ github.sha }}-nightly-rust
target-branch: ${{ github.ref_name }}
build-type: "rust" # Use Rust-based build
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
ITA_KEY: ${{ secrets.ITA_KEY }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

View File

@@ -19,6 +19,11 @@ on:
required: false
type: string
default: no
build-type:
description: The build type for kata-deploy. Use 'rust' for Rust-based build, empty or omit for script-based (default).
required: false
type: string
default: ""
secrets:
AUTHENTICATED_IMAGE_PASSWORD:
required: true
@@ -72,6 +77,7 @@ jobs:
target-branch: ${{ inputs.target-branch }}
runner: ubuntu-22.04
arch: amd64
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -104,6 +110,7 @@ jobs:
target-branch: ${{ inputs.target-branch }}
runner: ubuntu-24.04-arm
arch: arm64
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -147,8 +154,9 @@ jobs:
tag: ${{ inputs.tag }}-s390x
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
runner: s390x
runner: ubuntu-24.04-s390x
arch: s390x
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -165,8 +173,9 @@ jobs:
tag: ${{ inputs.tag }}-ppc64le
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
runner: ppc64le-small
runner: ubuntu-24.04-ppc64le
arch: ppc64le
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -233,14 +242,14 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64-${{ inputs.tag }}
path: kata-artifacts
name: kata-tools-static-tarball-amd64-${{ inputs.tag }}
path: kata-tools-artifacts
- name: Install tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Copy binary into Docker context
run: |
@@ -288,7 +297,7 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -304,7 +313,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-arm64
tag: ${{ inputs.tag }}-arm64${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -314,9 +323,10 @@ jobs:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-nvidia-gpu.yaml
with:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -338,7 +348,7 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -356,7 +366,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-s390x
tag: ${{ inputs.tag }}-s390x${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -370,7 +380,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-ppc64le
tag: ${{ inputs.tag }}-ppc64le${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -382,7 +392,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -473,7 +483,7 @@ jobs:
vmm: ${{ matrix.params.vmm }}
run-cri-containerd-tests-arm64:
if: ${{ inputs.skip-test != 'yes' }}
if: false
needs: build-kata-static-tarball-arm64
strategy:
fail-fast: false

View File

@@ -10,7 +10,9 @@ on:
- opened
- synchronize
- reopened
- edited
- labeled
- unlabeled
permissions: {}

View File

@@ -1,43 +0,0 @@
name: kata-runtime-classes-sync
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
kata-deploy-runtime-classes-check:
name: kata-deploy-runtime-classes-check
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: Ensure the split out runtime classes match the all-in-one file
run: |
pushd tools/packaging/kata-deploy/runtimeclasses/
echo "::group::Combine runtime classes"
for runtimeClass in $(find . -type f \( -name "*.yaml" -and -not -name "kata-runtimeClasses.yaml" \) | sort); do
echo "Adding ${runtimeClass} to the resultingRuntimeClasses.yaml"
cat "${runtimeClass}" >> resultingRuntimeClasses.yaml;
done
echo "::endgroup::"
echo "::group::Displaying the content of resultingRuntimeClasses.yaml"
cat resultingRuntimeClasses.yaml
echo "::endgroup::"
echo ""
echo "::group::Displaying the content of kata-runtimeClasses.yaml"
cat kata-runtimeClasses.yaml
echo "::endgroup::"
echo ""
diff resultingRuntimeClasses.yaml kata-runtimeClasses.yaml

View File

@@ -82,6 +82,7 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: ubuntu-22.04
arch: amd64
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -99,6 +100,7 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: ubuntu-24.04-arm
arch: arm64
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -116,6 +118,7 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: s390x
arch: s390x
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -133,6 +136,7 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: ppc64le-small
arch: ppc64le
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

View File

@@ -30,6 +30,11 @@ on:
description: The arch of the tarball.
required: true
type: string
build-type:
description: The build type for kata-deploy. Use 'rust' for Rust-based build, empty or omit for script-based (default).
required: false
type: string
default: ""
secrets:
QUAY_DEPLOYER_PASSWORD:
required: true
@@ -50,6 +55,25 @@ jobs:
fetch-depth: 0
persist-credentials: false
- name: Remove unnecessary directories to free up space
run: |
sudo rm -rf /usr/local/.ghcup
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell
sudo rm -rf /usr/local/julia*
sudo rm -rf /opt/az
sudo rm -rf /usr/local/share/chromium
sudo rm -rf /opt/microsoft
sudo rm -rf /opt/google
sudo rm -rf /usr/lib/firefox
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
@@ -83,8 +107,10 @@ jobs:
REGISTRY: ${{ inputs.registry }}
REPO: ${{ inputs.repo }}
TAG: ${{ inputs.tag }}
BUILD_TYPE: ${{ inputs.build-type }}
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
"$(pwd)/kata-static.tar.zst" \
"${REGISTRY}/${REPO}" \
"${TAG}"
"${TAG}" \
"${BUILD_TYPE}"

View File

@@ -31,7 +31,7 @@ jobs:
permissions:
contents: read
packages: write
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
steps:
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

View File

@@ -35,7 +35,7 @@ jobs:
permissions:
contents: read
packages: write
runs-on: s390x
runs-on: ubuntu-24.04-s390x
steps:
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

View File

@@ -181,6 +181,23 @@ jobs:
GH_TOKEN: ${{ github.token }}
ARCHITECTURE: ppc64le
- name: Set KATA_TOOLS_STATIC_TARBALL env var
run: |
tarball=$(pwd)/kata-tools-static.tar.zst
echo "KATA_TOOLS_STATIC_TARBALL=${tarball}" >> "$GITHUB_ENV"
- name: Download amd64 tools artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64
- name: Upload amd64 static tarball tools to GitHub
run: |
./tools/packaging/release/release.sh upload-kata-tools-static-tarball
env:
GH_TOKEN: ${{ github.token }}
ARCHITECTURE: amd64
upload-versions-yaml:
name: upload-versions-yaml
needs: release

View File

@@ -1,167 +0,0 @@
name: CI | Run containerd guest pull stability tests
on:
schedule:
- cron: "0 */1 * * *" #run every hour
permissions: {}
# This job relies on k8s pre-installed using kubeadm
jobs:
run-containerd-guest-pull-stability-tests:
name: run-containerd-guest-pull-stability-tests-${{ matrix.environment.test-type }}-${{ matrix.environment.containerd }}
strategy:
fail-fast: false
matrix:
environment: [
{ test-type: multi-snapshotter, containerd: v2.2 },
{ test-type: force-guest-pull, containerd: v1.7 },
{ test-type: force-guest-pull, containerd: v2.0 },
{ test-type: force-guest-pull, containerd: v2.1 },
{ test-type: force-guest-pull, containerd: v2.2 },
]
env:
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here.
IMAGES_LIST: quay.io/mongodb/mongodb-community-server@sha256:8b73733842da21b6bbb6df4d7b2449229bb3135d2ec8c6880314d88205772a11 ghcr.io/edgelesssys/redis@sha256:ecb0a964c259a166a1eb62f0eb19621d42bd1cce0bc9bb0c71c828911d4ba93d
runs-on: containerd-${{ matrix.environment.test-type }}-${{ matrix.environment.containerd }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: Rotate the journal
run: sudo journalctl --rotate --vacuum-time 1s
- name: Pull the kata-deploy image to be used
run: sudo ctr -n k8s.io image pull quay.io/kata-containers/kata-deploy-ci:kata-containers-latest
- name: Deploy Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: ${{ matrix.environment.test-type == 'multi-snapshotter' && 'nydus' || '' }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ matrix.environment.test-type == 'multi-snapshotter' }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.environment.test-type == 'force-guest-pull' && 'qemu-coco-dev' || '' }}
# This is needed as we may hit the createContainerTimeout
- name: Adjust Kata Containers' create_container_timeout
run: |
sudo sed -i -e 's/^\(create_container_timeout\).*=.*$/\1 = 600/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "create_container_timeout.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
# This is needed in order to have enough tmpfs space inside the guest to pull the image
- name: Adjust Kata Containers' default_memory
run: |
sudo sed -i -e 's/^\(default_memory\).*=.*$/\1 = 4096/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "default_memory.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
- name: Run a few containers using overlayfs
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "overlayfs | Using on image: ${img}"
pod="$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image="${img}" \
--image-pull-policy=Always \
--pod-running-timeout=10m \
-- uname -r
done
- name: Run the same few containers using a different snapshotter
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "nydus | Using on image: ${img}"
pod="kata-$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image="${img}" \
--image-pull-policy=Always \
--pod-running-timeout=10m \
--overrides='{
"spec": {
"runtimeClassName": "kata-qemu-coco-dev"
}
}' \
-- uname -r
done
- name: Uninstall Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh cleanup
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: nydus
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: true
- name: Run a few containers using overlayfs
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "overlayfs | Using on image: ${img}"
pod="$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image=${img} \
--image-pull-policy=Always \
--pod-running-timeout=10m \
-- uname -r
done
- name: Deploy Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: nydus
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: true
# This is needed as we may hit the createContainerTimeout
- name: Adjust Kata Containers' create_container_timeout
run: |
sudo sed -i -e 's/^\(create_container_timeout\).*=.*$/\1 = 600/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "create_container_timeout.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
# This is needed in order to have enough tmpfs space inside the guest to pull the image
- name: Adjust Kata Containers' default_memory
run: |
sudo sed -i -e 's/^\(default_memory\).*=.*$/\1 = 4096/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "default_memory.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
- name: Run the same few containers using a different snapshotter
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "nydus | Using on image: ${img}"
pod="kata-$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image="${img}" \
--image-pull-policy=Always \
--pod-running-timeout=10m \
--overrides='{
"spec": {
"runtimeClassName": "kata-qemu-coco-dev"
}
}' \
-- uname -r
done
- name: Uninstall Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh cleanup || true
if: always()
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: nydus
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: true

View File

@@ -93,14 +93,14 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Download Azure CLI
uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1
@@ -142,6 +142,10 @@ jobs:
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

View File

@@ -68,6 +68,10 @@ jobs:
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Collect artifacts ${{ matrix.vmm }}
if: always()
run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

View File

@@ -2,6 +2,9 @@ name: CI | Run NVIDIA GPU kubernetes tests on amd64
on:
workflow_call:
inputs:
tarball-suffix:
required: true
type: string
registry:
required: true
type: string
@@ -45,6 +48,7 @@ jobs:
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
KUBERNETES: kubeadm
KBS: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
K8S_TEST_HOST_TYPE: baremetal
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -59,6 +63,15 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Uninstall previous `kbs-client`
if: matrix.environment.name != 'nvidia-gpu'
timeout-minutes: 10
@@ -89,6 +102,11 @@ jobs:
run: bash tests/integration/kubernetes/gha-run.sh run-nv-tests
env:
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Collect artifacts ${{ matrix.environment.vmm }}
if: always()
run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

View File

@@ -75,3 +75,7 @@ jobs:
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests

View File

@@ -131,6 +131,10 @@ jobs:
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-zvsi

View File

@@ -84,14 +84,14 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Log into the Azure account
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0
@@ -140,6 +140,10 @@ jobs:
timeout-minutes: 300
run: bash tests/stability/gha-stability-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

View File

@@ -79,6 +79,15 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Deploy Kata
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
@@ -178,14 +187,14 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Log into the Azure account
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0
@@ -301,6 +310,15 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Remove unnecessary directories to free up space
run: |
sudo rm -rf /usr/local/.ghcup

View File

@@ -102,6 +102,10 @@ jobs:
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

View File

@@ -85,3 +85,7 @@ jobs:
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests

View File

@@ -29,7 +29,7 @@ jobs:
matrix:
instance:
- "ubuntu-24.04-arm"
- "s390x"
- "ubuntu-24.04-s390x"
- "ubuntu-24.04-ppc64le"
uses: ./.github/workflows/build-checks.yaml
with:

1
.gitignore vendored
View File

@@ -18,3 +18,4 @@ src/tools/log-parser/kata-log-parser
tools/packaging/static-build/agent/install_libseccomp.sh
.envrc
.direnv
**/.DS_Store

3702
Cargo.lock generated

File diff suppressed because it is too large

View File

@@ -6,35 +6,49 @@ rust-version = "1.85.1"
[workspace]
members = [
# Dragonball
"src/dragonball",
"src/dragonball/dbs_acpi",
"src/dragonball/dbs_address_space",
"src/dragonball/dbs_allocator",
"src/dragonball/dbs_arch",
"src/dragonball/dbs_boot",
"src/dragonball/dbs_device",
"src/dragonball/dbs_interrupt",
"src/dragonball/dbs_legacy_devices",
"src/dragonball/dbs_pci",
"src/dragonball/dbs_tdx",
"src/dragonball/dbs_upcall",
"src/dragonball/dbs_utils",
"src/dragonball/dbs_virtio_devices",
# Dragonball
"src/dragonball",
"src/dragonball/dbs_acpi",
"src/dragonball/dbs_address_space",
"src/dragonball/dbs_allocator",
"src/dragonball/dbs_arch",
"src/dragonball/dbs_boot",
"src/dragonball/dbs_device",
"src/dragonball/dbs_interrupt",
"src/dragonball/dbs_legacy_devices",
"src/dragonball/dbs_pci",
"src/dragonball/dbs_tdx",
"src/dragonball/dbs_upcall",
"src/dragonball/dbs_utils",
"src/dragonball/dbs_virtio_devices",
# runtime-rs
"src/runtime-rs",
"src/runtime-rs/crates/agent",
"src/runtime-rs/crates/hypervisor",
"src/runtime-rs/crates/persist",
"src/runtime-rs/crates/resource",
"src/runtime-rs/crates/runtimes",
"src/runtime-rs/crates/service",
"src/runtime-rs/crates/shim",
"src/runtime-rs/crates/shim-ctl",
"src/runtime-rs/tests/utils",
]
resolver = "2"
# TODO: Add all excluded crates to root workspace
exclude = [
"src/agent",
"src/tools",
"src/libs",
"src/runtime-rs",
"src/agent",
"src/tools",
"src/libs",
# We are cloning and building rust packages under
# "tools/packaging/kata-deploy/local-build/build" folder, which may mislead
# those packages to think they are part of the kata root workspace
"tools/packaging/kata-deploy/local-build/build",
# kata-deploy binary is standalone and has its own Cargo.toml for now
"tools/packaging/kata-deploy/binary",
# We are cloning and building rust packages under
# "tools/packaging/kata-deploy/local-build/build" folder, which may mislead
# those packages to think they are part of the kata root workspace
"tools/packaging/kata-deploy/local-build/build",
]
[workspace.dependencies]
@@ -54,6 +68,7 @@ vm-superio = "0.5.0"
vmm-sys-util = "0.11.0"
# Local dependencies from Dragonball Sandbox crates
dragonball = { path = "src/dragonball" }
dbs-acpi = { path = "src/dragonball/dbs_acpi" }
dbs-address-space = { path = "src/dragonball/dbs_address_space" }
dbs-allocator = { path = "src/dragonball/dbs_allocator" }
@@ -68,5 +83,57 @@ dbs-upcall = { path = "src/dragonball/dbs_upcall" }
dbs-utils = { path = "src/dragonball/dbs_utils" }
dbs-virtio-devices = { path = "src/dragonball/dbs_virtio_devices" }
# Local dependencies from runtime-rs
agent = { path = "src/runtime-rs/crates/agent" }
hypervisor = { path = "src/runtime-rs/crates/hypervisor" }
persist = { path = "src/runtime-rs/crates/persist" }
resource = { path = "src/runtime-rs/crates/resource" }
runtimes = { path = "src/runtime-rs/crates/runtimes" }
service = { path = "src/runtime-rs/crates/service" }
tests_utils = { path = "src/runtime-rs/tests/utils" }
ch-config = { path = "src/runtime-rs/crates/hypervisor/ch-config" }
common = { path = "src/runtime-rs/crates/runtimes/common" }
linux_container = { path = "src/runtime-rs/crates/runtimes/linux_container" }
virt_container = { path = "src/runtime-rs/crates/runtimes/virt_container" }
wasm_container = { path = "src/runtime-rs/crates/runtimes/wasm_container" }
# Local dependencies from `src/lib`
kata-sys-util = { path = "src/libs/kata-sys-util" }
kata-types = { path = "src/libs/kata-types", features = ["safe-path"] }
logging = { path = "src/libs/logging" }
protocols = { path = "src/libs/protocols", features = ["async"] }
runtime-spec = { path = "src/libs/runtime-spec" }
safe-path = { path = "src/libs/safe-path" }
shim-interface = { path = "src/libs/shim-interface" }
test-utils = { path = "src/libs/test-utils" }
# Outside dependencies
actix-rt = "2.7.0"
anyhow = "1.0"
async-trait = "0.1.48"
containerd-shim = { version = "0.10.0", features = ["async"] }
containerd-shim-protos = { version = "0.10.0", features = ["async"] }
go-flag = "0.1.0"
hyper = "0.14.20"
hyperlocal = "0.8.0"
lazy_static = "1.4"
libc = "0.2"
log = "0.4.14"
netns-rs = "0.1.0"
# Note: nix needs to stay sync'd with libs versions
nix = "0.26.4"
oci-spec = { version = "0.8.1", features = ["runtime"] }
protobuf = "3.7.2"
rand = "0.8.4"
serde = { version = "1.0.145", features = ["derive"] }
serde_json = "1.0.91"
slog = "2.5.2"
slog-scope = "4.4.0"
strum = { version = "0.24.0", features = ["derive"] }
tempfile = "3.19.1"
thiserror = "1.0"
tokio = "1.46.1"
tracing = "0.1.41"
tracing-opentelemetry = "0.18.0"
ttrpc = "0.8.4"
url = "2.5.4"

View File

@@ -1 +1 @@
3.23.0
3.24.0

View File

@@ -11,6 +11,10 @@ script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${script_dir}/../tests/common.bash"
# Path to the ORAS cache helper for downloading tarballs (sourced when needed)
# Use ORAS_CACHE_HELPER env var (set by build.sh in Docker) or fallback to repo path
oras_cache_helper="${ORAS_CACHE_HELPER:-${script_dir}/../tools/packaging/scripts/download-with-oras-cache.sh}"
# The following variables if set on the environment will change the behavior
# of gperf and libseccomp configure scripts, that may lead this script to
# fail. So let's ensure they are unset here.
@@ -44,6 +48,9 @@ fi
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
# Use ORAS cache for gperf downloads (gperf upstream can be unreliable)
USE_ORAS_CACHE="${USE_ORAS_CACHE:-yes}"
# We need to build the libseccomp library from sources to create a static
# library for the musl libc.
# However, ppc64le, riscv64 and s390x have no musl targets in Rust. Hence, we do
@@ -68,7 +75,23 @@ trap finish EXIT
build_and_install_gperf() {
echo "Build and install gperf version ${gperf_version}"
mkdir -p "${gperf_install_dir}"
curl -sLO "${gperf_tarball_url}"
# Use ORAS cache if available and enabled
if [[ "${USE_ORAS_CACHE}" == "yes" ]] && [[ -f "${oras_cache_helper}" ]]; then
echo "Using ORAS cache for gperf download"
source "${oras_cache_helper}"
local cached_tarball
cached_tarball=$(download_component gperf "$(pwd)")
if [[ -f "${cached_tarball}" ]]; then
gperf_tarball="${cached_tarball}"
else
echo "ORAS cache download failed, falling back to direct download"
curl -sLO "${gperf_tarball_url}"
fi
else
curl -sLO "${gperf_tarball_url}"
fi
tar -xf "${gperf_tarball}"
pushd "gperf-${gperf_version}"
# Unset $CC for configure, we will always use native for gperf

View File
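The build-script change above implements a cache-first download with a fallback to the upstream URL: try the ORAS cache helper when enabled, and fall back to `curl` on a miss. The same control flow, sketched in Rust rather than shell (function and path names here are illustrative, not the script's actual API):

```rust
// Cache-first fetch: try the cache helper, fall back to a direct download.
// Mirrors the USE_ORAS_CACHE / download_component / curl flow in the script.
fn fetch_tarball(
    try_cache: impl Fn() -> Option<String>,
    direct_download: impl Fn() -> String,
    use_cache: bool,
) -> String {
    if use_cache {
        if let Some(path) = try_cache() {
            return path; // cache hit: use the cached tarball
        }
        // cache miss: fall through to the direct download below
    }
    direct_download()
}

fn main() {
    let hit = fetch_tarball(|| Some("cache/gperf.tar.gz".into()), || "direct".into(), true);
    assert_eq!(hit, "cache/gperf.tar.gz");
    let miss = fetch_tarball(|| None, || "gperf-3.1.tar.gz".into(), true);
    assert_eq!(miss, "gperf-3.1.tar.gz");
    let disabled = fetch_tarball(|| Some("cache".into()), || "direct".into(), false);
    assert_eq!(disabled, "direct");
    println!("ok");
}
```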

@@ -83,3 +83,7 @@ Documents that help to understand and contribute to Kata Containers.
If you have a suggestion for how we can improve the
[website](https://katacontainers.io), please raise an issue (or a PR) on
[the repository that holds the source for the website](https://github.com/OpenStackweb/kata-netlify-refresh).
### Toolchain Guidance
* [Toolchain Guidance](./Toochain-Guidance.md)

39
docs/Toochain-Guidance.md Normal file
View File

@@ -0,0 +1,39 @@
# Toolchains
As a community we want to strike a balance between having up-to-date toolchains, so that we
receive the latest security fixes and can benefit from new features and packages, whilst not
being so bleeding edge that we disrupt downstream and other consumers. As a result we are trying
out the following guidelines (note: guidelines, not hard rules) for our Go and Rust toolchains:
## Go toolchain
Go is released [every six months](https://go.dev/wiki/Go-Release-Cycle) with support for the
[last two major release versions](https://go.dev/doc/devel/release#policy). We always want to
be on a supported version so that we receive security fixes. To make things easier for some of
our users, we aim to use the older of the two supported major versions, unless there is a
compelling reason to adopt the newer one.
In practice this means that we bump the major version of our Go toolchain every six months, to
version 1.x-1 in response to a new version 1.x coming out (which makes our then-current version,
1.x-2, no longer supported). We bump the minor version whenever required to satisfy dependency
updates or security fixes.
Our Go toolchain version is recorded in [`versions.yaml`](../versions.yaml) under
`.languages.golang.version` and should match the version in our `go.mod` files.
## Rust toolchain
Rust has a [six-week](https://doc.rust-lang.org/book/appendix-05-editions.html#:~:text=The%20Rust%20language%20and%20compiler,these%20tiny%20changes%20add%20up.)
release cycle and only the latest stable release is supported, so to remain on a supported
release we would have to build with the latest stable and bump every six weeks. However,
feedback from our community has indicated that this is a challenge: downstream consumers often
want to get Rust from their distro, or from a downstream fork, and these struggle to keep up
with the six-week release schedule. As a result the community has agreed to try out a
"stable-2" policy, where we aim to build with a Rust version that is two versions behind the
latest stable version.
In practice this means that we bump our Rust toolchain every six weeks, to version 1.x-2 when
1.x is released as stable, picking up the latest point release of that version, if any.
The Rust toolchain that we are using is recorded in [`rust-toolchain.toml`](../rust-toolchain.toml).
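The "stable-2" arithmetic above can be stated precisely: given the latest stable minor version, the target toolchain is two minors behind it. A toy helper (illustrative only, not part of the repo):

```rust
// "stable-2" policy: target Rust minor version = latest stable minor - 2.
fn target_rust_minor(latest_stable_minor: u32) -> u32 {
    latest_stable_minor.saturating_sub(2)
}

fn main() {
    // If 1.85 is the latest stable release, we build with 1.83.
    assert_eq!(target_rust_minor(85), 83);
    println!("ok");
}
```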

View File

@@ -97,6 +97,8 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.use_legacy_serial` | `boolean` | uses legacy serial device for guest's console (QEMU) |
| `io.katacontainers.config.hypervisor.default_gpus` | uint32 | the minimum number of GPUs required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.default_gpu_model` | string | the GPU model required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.block_device_num_queues` | `usize` | The number of queues to use for block devices (runtime-rs only) |
| `io.katacontainers.config.hypervisor.block_device_queue_size` | uint32 | The size of the queue to use for block devices (runtime-rs only) |
## Container Options
| Key | Value Type | Comments |

View File

@@ -186,7 +186,7 @@ base64 = "0.22"
sha2 = "0.10.8"
async-compression = { version = "0.4.22", features = ["tokio", "gzip"] }
container-device-interface = "0.1.0"
container-device-interface = "0.1.1"
[target.'cfg(target_arch = "s390x")'.dependencies]
pv_core = { git = "https://github.com/ibm-s390-linux/s390-tools", rev = "4942504a9a2977d49989a5e5b7c1c8e07dc0fa41", package = "s390_pv_core" }

View File

@@ -88,7 +88,7 @@ pub fn baremount(
let destination_str = destination.to_string_lossy();
if let Ok(m) = get_linux_mount_info(destination_str.deref()) {
if m.fs_type == fs_type {
if m.fs_type == fs_type && !flags.contains(MsFlags::MS_REMOUNT) {
slog_info!(logger, "{source:?} is already mounted at {destination:?}");
return Ok(());
}

View File
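The guard in the diff above previously returned early whenever the target was already mounted with the matching filesystem type, which silently dropped `MS_REMOUNT` requests; the fix only short-circuits for non-remount calls. A standalone sketch of that decision, using plain values in place of `nix::mount::MsFlags`:

```rust
// Decide whether baremount() may skip the mount syscall: only when the target
// is already mounted with the same fs type AND this is not a remount request.
fn can_skip_mount(existing_fs_type: Option<&str>, fs_type: &str, is_remount: bool) -> bool {
    matches!(existing_fs_type, Some(t) if t == fs_type) && !is_remount
}

fn main() {
    assert!(can_skip_mount(Some("ext4"), "ext4", false)); // already mounted: skip
    assert!(!can_skip_mount(Some("ext4"), "ext4", true)); // remount must proceed
    assert!(!can_skip_mount(None, "ext4", false));        // not mounted: proceed
    println!("ok");
}
```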

@@ -401,11 +401,10 @@ impl Handle {
}
if let RouteAttribute::Oif(index) = attribute {
route.device = self
.find_link(LinkFilter::Index(*index))
.await
.context(format!("error looking up device {index}"))?
.name();
route.device = match self.find_link(LinkFilter::Index(*index)).await {
Ok(link) => link.name(),
Err(_) => String::new(),
};
}
}
@@ -1005,10 +1004,6 @@ mod tests {
.expect("Failed to list routes");
assert_ne!(all.len(), 0);
for r in &all {
assert_ne!(r.device.len(), 0);
}
}
#[tokio::test]

View File
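The netlink change above stops treating a failed interface lookup as fatal: a route whose output interface cannot be resolved now gets an empty device name instead of aborting the whole enumeration (and the test asserting a non-empty name is dropped accordingly). The pattern in miniature, with a hypothetical lookup standing in for `find_link`:

```rust
// Hypothetical lookup: only index 1 resolves to a link name.
fn find_link(index: u32) -> Result<String, String> {
    if index == 1 {
        Ok("eth0".to_string())
    } else {
        Err(format!("no link with index {index}"))
    }
}

// Before: `find_link(index)?` propagated the error and dropped the route list.
// After: fall back to an empty name so enumeration continues.
fn route_device(index: u32) -> String {
    match find_link(index) {
        Ok(name) => name,
        Err(_) => String::new(),
    }
}

fn main() {
    assert_eq!(route_device(1), "eth0");
    assert_eq!(route_device(7), ""); // unresolvable link no longer aborts
    println!("ok");
}
```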

@@ -72,7 +72,7 @@ use crate::network::setup_guest_dns;
use crate::passfd_io;
use crate::pci;
use crate::random;
use crate::sandbox::Sandbox;
use crate::sandbox::{Sandbox, SandboxError};
use crate::storage::{add_storages, update_ephemeral_mounts, STORAGE_HANDLERS};
use crate::util;
use crate::version::{AGENT_VERSION, API_VERSION};
@@ -141,6 +141,16 @@ pub fn ttrpc_error(code: ttrpc::Code, err: impl Debug) -> ttrpc::Error {
get_rpc_status(code, format!("{:?}", err))
}
/// Convert SandboxError to ttrpc error with appropriate code.
/// Process not found errors map to NOT_FOUND, others to INVALID_ARGUMENT.
fn sandbox_err_to_ttrpc(err: SandboxError) -> ttrpc::Error {
let code = match &err {
SandboxError::InitProcessNotFound | SandboxError::InvalidExecId => ttrpc::Code::NOT_FOUND,
SandboxError::InvalidContainerId => ttrpc::Code::INVALID_ARGUMENT,
};
ttrpc_error(code, err)
}
#[cfg(not(feature = "agent-policy"))]
async fn is_allowed(_req: &impl serde::Serialize) -> ttrpc::Result<()> {
Ok(())
@@ -460,7 +470,9 @@ impl AgentService {
let mut sig: libc::c_int = req.signal as libc::c_int;
{
let mut sandbox = self.sandbox.lock().await;
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
let p = sandbox
.find_container_process(cid.as_str(), eid.as_str())
.map_err(sandbox_err_to_ttrpc)?;
// For container initProcess, if it hasn't installed handler for "SIGTERM" signal,
// it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal
// instead of "SIGTERM" to terminate it.
@@ -568,7 +580,9 @@ impl AgentService {
let (exit_send, mut exit_recv) = tokio::sync::mpsc::channel(100);
let exit_rx = {
let mut sandbox = self.sandbox.lock().await;
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
let p = sandbox
.find_container_process(cid.as_str(), eid.as_str())
.map_err(sandbox_err_to_ttrpc)?;
p.exit_watchers.push(exit_send);
pid = p.pid;
@@ -665,7 +679,9 @@ impl AgentService {
let term_exit_notifier;
let reader = {
let mut sandbox = self.sandbox.lock().await;
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
let p = sandbox
.find_container_process(cid.as_str(), eid.as_str())
.map_err(sandbox_err_to_ttrpc)?;
term_exit_notifier = p.term_exit_notifier.clone();
@@ -947,12 +963,7 @@ impl agent_ttrpc::AgentService for AgentService {
let p = sandbox
.find_container_process(cid.as_str(), eid.as_str())
.map_err(|e| {
ttrpc_error(
ttrpc::Code::INVALID_ARGUMENT,
format!("invalid argument: {:?}", e),
)
})?;
.map_err(sandbox_err_to_ttrpc)?;
p.close_stdin().await;
@@ -970,12 +981,7 @@ impl agent_ttrpc::AgentService for AgentService {
let mut sandbox = self.sandbox.lock().await;
let p = sandbox
.find_container_process(req.container_id(), req.exec_id())
.map_err(|e| {
ttrpc_error(
ttrpc::Code::UNAVAILABLE,
format!("invalid argument: {:?}", e),
)
})?;
.map_err(sandbox_err_to_ttrpc)?;
let fd = p
.term_master
@@ -2629,12 +2635,12 @@ mod tests {
},
TestData {
create_container: false,
result: Err(anyhow!(crate::sandbox::ERR_INVALID_CONTAINER_ID)),
result: Err(anyhow!(crate::sandbox::SandboxError::InvalidContainerId)),
..Default::default()
},
TestData {
container_id: "8181",
result: Err(anyhow!(crate::sandbox::ERR_INVALID_CONTAINER_ID)),
result: Err(anyhow!(crate::sandbox::SandboxError::InvalidContainerId)),
..Default::default()
},
TestData {

View File

@@ -32,6 +32,7 @@ use rustjail::container::BaseContainer;
use rustjail::container::LinuxContainer;
use rustjail::process::Process;
use slog::Logger;
use thiserror::Error;
use tokio::sync::mpsc::{channel, Receiver, Sender};
use tokio::sync::oneshot;
use tokio::sync::Mutex;
@@ -47,7 +48,16 @@ use crate::storage::StorageDeviceGeneric;
use crate::uevent::{Uevent, UeventMatcher};
use crate::watcher::BindWatcher;
pub const ERR_INVALID_CONTAINER_ID: &str = "Invalid container id";
/// Errors that can occur when looking up processes in the sandbox.
#[derive(Debug, Error)]
pub enum SandboxError {
#[error("Invalid container id")]
InvalidContainerId,
#[error("Process not found: init process missing")]
InitProcessNotFound,
#[error("Process not found: invalid exec id")]
InvalidExecId,
}
type UeventWatcher = (Box<dyn UeventMatcher>, oneshot::Sender<Uevent>);
@@ -282,10 +292,14 @@ impl Sandbox {
None
}
pub fn find_container_process(&mut self, cid: &str, eid: &str) -> Result<&mut Process> {
pub fn find_container_process(
&mut self,
cid: &str,
eid: &str,
) -> Result<&mut Process, SandboxError> {
let ctr = self
.get_container(cid)
.ok_or_else(|| anyhow!(ERR_INVALID_CONTAINER_ID))?;
.ok_or(SandboxError::InvalidContainerId)?;
if eid.is_empty() {
let init_pid = ctr.init_process_pid;
@@ -293,10 +307,11 @@ impl Sandbox {
.processes
.values_mut()
.find(|p| p.pid == init_pid)
.ok_or_else(|| anyhow!("cannot find init process!"));
.ok_or(SandboxError::InitProcessNotFound);
}
ctr.get_process(eid).map_err(|_| anyhow!("Invalid exec id"))
ctr.get_process(eid)
.map_err(|_| SandboxError::InvalidExecId)
}
#[instrument]

View File
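The refactor in the two diffs above replaces the stringly-typed `ERR_INVALID_CONTAINER_ID` errors with a dedicated `SandboxError` enum, so each variant can be mapped to an appropriate gRPC status code in one place (`sandbox_err_to_ttrpc`). A minimal, dependency-free sketch of the same pattern — the real code derives `thiserror::Error` and maps to `ttrpc::Code`; `Code` here is a stand-in:

```rust
use std::fmt;

// Stand-in for ttrpc::Code in this sketch.
#[derive(Debug, PartialEq)]
enum Code {
    NotFound,
    InvalidArgument,
}

// Mirrors SandboxError; the real enum derives thiserror::Error.
#[derive(Debug)]
enum SandboxError {
    InvalidContainerId,
    InitProcessNotFound,
    InvalidExecId,
}

impl fmt::Display for SandboxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            SandboxError::InvalidContainerId => "Invalid container id",
            SandboxError::InitProcessNotFound => "Process not found: init process missing",
            SandboxError::InvalidExecId => "Process not found: invalid exec id",
        })
    }
}

// Same shape as sandbox_err_to_ttrpc(): map each variant to a status code.
fn to_code(err: &SandboxError) -> Code {
    match err {
        SandboxError::InitProcessNotFound | SandboxError::InvalidExecId => Code::NotFound,
        SandboxError::InvalidContainerId => Code::InvalidArgument,
    }
}

fn main() {
    assert_eq!(to_code(&SandboxError::InvalidExecId), Code::NotFound);
    assert_eq!(to_code(&SandboxError::InvalidContainerId), Code::InvalidArgument);
    println!("{}", SandboxError::InvalidContainerId);
}
```

This lets callers write `.map_err(sandbox_err_to_ttrpc)?` once instead of hand-rolling a (sometimes inconsistent) code at each call site, which is exactly the duplication the diff removes.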

@@ -21,6 +21,8 @@ libc = ">=0.2.39"
[dev-dependencies]
vm-memory = { workspace = true, features = ["backend-mmap"] }
test-utils = { workspace = true }
nix = { workspace = true }
[package.metadata.docs.rs]
all-features = true

View File

@@ -205,12 +205,12 @@ pub fn create_gic(vm: &VmFd, vcpu_count: u64) -> Result<Box<dyn GICDevice>> {
#[cfg(test)]
mod tests {
use super::*;
use kvm_ioctls::Kvm;
#[test]
fn test_create_gic() {
test_utils::skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
assert!(create_gic(&vm, 1).is_ok());

View File
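This and the many similar diffs below add `skip_if_not_root!()` to KVM-backed unit tests, so they bail out early instead of failing when run unprivileged. A self-contained sketch of how such a macro can work — the real `test-utils` crate uses `nix` to check the effective UID; this version parses `/proc/self/status` (Linux-only) purely for illustration:

```rust
use std::fs;

// Linux-only sketch: read the real UID from /proc/self/status.
// The actual test-utils crate checks nix::unistd::Uid::effective().is_root().
fn is_root() -> bool {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|s| {
            s.lines()
                .find(|l| l.starts_with("Uid:"))
                .and_then(|l| l.split_whitespace().nth(1).map(str::to_string))
        })
        .map(|uid| uid == "0")
        .unwrap_or(false)
}

// Early-return from the enclosing test function when not running as root.
macro_rules! skip_if_not_root {
    () => {
        if !is_root() {
            println!("skipping test: requires root");
            return;
        }
    };
}

fn test_needs_kvm() {
    skip_if_not_root!();
    // ... a real test would open /dev/kvm and create a VM here ...
}

fn main() {
    test_needs_kvm(); // passes whether or not we are root
    println!("done");
}
```

The macro expands to an early `return` inside the test body, which is why it must be the first statement in each `#[test]` function, as in the diffs.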

@@ -150,6 +150,7 @@ mod tests {
#[test]
fn test_create_pmu() {
test_utils::skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();

View File

@@ -166,9 +166,11 @@ pub fn read_mpidr(vcpu: &VcpuFd) -> Result<u64> {
mod tests {
use super::*;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
#[test]
fn test_setup_regs() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
@@ -185,6 +187,7 @@ mod tests {
#[test]
fn test_read_mpidr() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();

View File

@@ -78,6 +78,7 @@ pub fn set_lint(vcpu: &VcpuFd) -> Result<()> {
mod tests {
use super::*;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
const KVM_APIC_REG_SIZE: usize = 0x400;
@@ -100,6 +101,7 @@ mod tests {
#[test]
fn test_setlint() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
assert!(kvm.check_extension(kvm_ioctls::Cap::Irqchip));
let vm = kvm.create_vm().unwrap();
@@ -126,6 +128,7 @@ mod tests {
#[test]
fn test_setlint_fails() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();

View File

@@ -271,6 +271,7 @@ mod tests {
use super::*;
use crate::x86_64::gdt::gdt_entry;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use vm_memory::{Bytes, GuestAddress, GuestMemoryMmap};
const BOOT_GDT_OFFSET: u64 = 0x500;
@@ -334,6 +335,7 @@ mod tests {
#[test]
fn test_setup_fpu() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
@@ -356,6 +358,7 @@ mod tests {
#[test]
#[allow(clippy::cast_ptr_alignment)]
fn test_setup_msrs() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();
@@ -384,6 +387,7 @@ mod tests {
#[test]
fn test_setup_regs() {
skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();

View File

@@ -24,3 +24,5 @@ vm-fdt = {workspace= true}
vm-memory = { workspace = true, features = ["backend-mmap"] }
device_tree = ">=1.1.0"
dbs-device = { workspace = true }
test-utils = { workspace = true }
nix = { workspace = true }

View File

@@ -399,6 +399,7 @@ mod tests {
use device_tree::DeviceTree;
use kvm_bindings::{kvm_vcpu_init, KVM_ARM_VCPU_PMU_V3, KVM_ARM_VCPU_PSCI_0_2};
use kvm_ioctls::{Kvm, VcpuFd, VmFd};
use test_utils::skip_if_not_root;
use vm_memory::GuestMemoryMmap;
use super::super::tests::MMIODeviceInfo;
@@ -460,6 +461,7 @@ mod tests {
#[test]
fn test_create_fdt_with_devices() {
skip_if_not_root!();
let regions = arch_memory_regions(FDT_MAX_SIZE + 0x1000);
let mem = GuestMemoryMmap::<()>::from_ranges(&regions).expect("Cannot initialize memory");
let dev_info: HashMap<(DeviceType, String), MMIODeviceInfo> = [
@@ -498,6 +500,7 @@ mod tests {
#[test]
fn test_create_fdt() {
skip_if_not_root!();
let regions = arch_memory_regions(FDT_MAX_SIZE + 0x1000);
let mem = GuestMemoryMmap::<()>::from_ranges(&regions).expect("Cannot initialize memory");
let kvm = Kvm::new().unwrap();
@@ -532,6 +535,7 @@ mod tests {
#[test]
fn test_create_fdt_with_initrd() {
skip_if_not_root!();
let regions = arch_memory_regions(FDT_MAX_SIZE + 0x1000);
let mem = GuestMemoryMmap::<()>::from_ranges(&regions).expect("Cannot initialize memory");
let kvm = Kvm::new().unwrap();
@@ -570,6 +574,7 @@ mod tests {
#[test]
fn test_create_fdt_with_pmu() {
skip_if_not_root!();
let regions = arch_memory_regions(FDT_MAX_SIZE + 0x1000);
let mem = GuestMemoryMmap::<()>::from_ranges(&regions).expect("Cannot initialize memory");
let kvm = Kvm::new().unwrap();

View File

@@ -304,6 +304,7 @@ mod tests {
#[test]
fn test_fdtutils_fdt_device_info() {
test_utils::skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let gic = create_gic(&vm, 0).unwrap();

View File

@@ -68,6 +68,7 @@ pub fn initrd_load_addr<M: GuestMemory>(guest_mem: &M, initrd_size: u64) -> supe
}
}
#[allow(missing_docs)]
#[cfg(test)]
pub mod tests {
use dbs_arch::{DeviceInfoForFDT, Error as ArchError};

View File

@@ -258,6 +258,7 @@ mod tests {
#[test]
fn test_setup_page_tables() {
test_utils::skip_if_not_root!();
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vcpu = vm.create_vcpu(0).unwrap();

View File

@@ -18,6 +18,10 @@ kvm-ioctls = { workspace = true, optional = true }
libc = "0.2"
vmm-sys-util = {workspace = true}
[dev-dependencies]
test-utils = { workspace = true }
nix = { workspace = true }
[features]
default = ["legacy-irq", "msi-irq"]

View File

@@ -220,6 +220,7 @@ impl InterruptSourceGroup for LegacyIrq {
mod test {
use super::*;
use crate::manager::tests::create_vm_fd;
use test_utils::skip_if_not_root;
const MASTER_PIC: usize = 7;
const SLAVE_PIC: usize = 8;
@@ -228,6 +229,7 @@ mod test {
#[test]
#[allow(unreachable_patterns)]
fn test_legacy_interrupt_group() {
skip_if_not_root!();
let vmfd = Arc::new(create_vm_fd());
let rounting = Arc::new(KvmIrqRouting::new(vmfd.clone()));
let base = 0;
@@ -263,6 +265,7 @@ mod test {
#[test]
fn test_irq_routing_initialize_legacy() {
skip_if_not_root!();
let vmfd = Arc::new(create_vm_fd());
let routing = KvmIrqRouting::new(vmfd.clone());
@@ -278,6 +281,7 @@ mod test {
#[test]
fn test_routing_opt() {
skip_if_not_root!();
let vmfd = Arc::new(create_vm_fd());
let routing = KvmIrqRouting::new(vmfd.clone());
@@ -309,6 +313,7 @@ mod test {
#[test]
fn test_routing_set_routing() {
skip_if_not_root!();
let vmfd = Arc::new(create_vm_fd());
let routing = KvmIrqRouting::new(vmfd.clone());

View File

@@ -271,6 +271,7 @@ pub fn from_sys_util_errno(e: vmm_sys_util::errno::Error) -> std::io::Error {
pub(crate) mod tests {
use super::*;
use crate::manager::tests::create_vm_fd;
use test_utils::skip_if_not_root;
fn create_irq_group(
manager: Arc<KvmIrqManager>,
@@ -306,11 +307,13 @@ pub(crate) mod tests {
#[test]
fn test_create_kvm_irq_manager() {
skip_if_not_root!();
let _ = create_kvm_irq_manager();
}
#[test]
fn test_kvm_irq_manager_opt() {
skip_if_not_root!();
let vmfd = Arc::new(create_vm_fd());
vmfd.create_irq_chip().unwrap();
let manager = Arc::new(KvmIrqManager::new(vmfd.clone()));

View File

@@ -202,10 +202,12 @@ impl InterruptSourceGroup for MsiIrq {
mod test {
use super::*;
use crate::manager::tests::create_vm_fd;
use test_utils::skip_if_not_root;
#[test]
#[allow(unreachable_patterns)]
fn test_msi_interrupt_group() {
skip_if_not_root!();
let vmfd = Arc::new(create_vm_fd());
vmfd.create_irq_chip().unwrap();

View File

@@ -451,6 +451,7 @@ pub(crate) mod tests {
use dbs_device::resources::{DeviceResources, MsiIrqType, Resource};
use kvm_ioctls::{Kvm, VmFd};
use test_utils::skip_if_not_root;
use super::*;
use crate::KvmIrqManager;
@@ -502,6 +503,7 @@ pub(crate) mod tests {
#[test]
fn test_create_device_interrupt_manager() {
skip_if_not_root!();
let mut mgr = create_interrupt_manager();
assert_eq!(mgr.mode, DeviceInterruptMode::Disabled);
@@ -537,6 +539,7 @@ pub(crate) mod tests {
#[test]
fn test_device_interrupt_manager_switch_mode() {
skip_if_not_root!();
let mut mgr = create_interrupt_manager();
// Can't switch working mode in enabled state.
@@ -621,6 +624,7 @@ pub(crate) mod tests {
#[test]
fn test_msi_config() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
assert!(interrupt_manager.set_msi_data(512, 0).is_err());
@@ -638,6 +642,7 @@ pub(crate) mod tests {
#[test]
fn test_set_working_mode_after_activated() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager.activated = true;
assert!(interrupt_manager
@@ -659,6 +664,7 @@ pub(crate) mod tests {
#[test]
fn test_disable2legacy() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager.activated = false;
interrupt_manager.mode = DeviceInterruptMode::Disabled;
@@ -669,6 +675,7 @@ pub(crate) mod tests {
#[test]
fn test_disable2nonlegacy() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager.activated = false;
interrupt_manager.mode = DeviceInterruptMode::Disabled;
@@ -679,6 +686,7 @@ pub(crate) mod tests {
#[test]
fn test_legacy2nonlegacy() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager.activated = false;
interrupt_manager.mode = DeviceInterruptMode::Disabled;
@@ -692,6 +700,7 @@ pub(crate) mod tests {
#[test]
fn test_nonlegacy2legacy() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager.activated = false;
interrupt_manager.mode = DeviceInterruptMode::Disabled;
@@ -705,6 +714,7 @@ pub(crate) mod tests {
#[test]
fn test_update() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager
.set_working_mode(DeviceInterruptMode::GenericMsiIrq)
@@ -721,6 +731,7 @@ pub(crate) mod tests {
#[test]
fn test_get_configs() {
skip_if_not_root!();
// legacy irq config
{
let interrupt_manager = create_interrupt_manager();
@@ -762,6 +773,7 @@ pub(crate) mod tests {
#[test]
fn test_reset_configs() {
skip_if_not_root!();
let mut interrupt_manager = create_interrupt_manager();
interrupt_manager.reset_configs(DeviceInterruptMode::LegacyIrq);

View File

@@ -235,6 +235,7 @@ mod tests {
use super::*;
use crate::{InterruptManager, InterruptSourceType};
use test_utils::skip_if_not_root;
const VIRTIO_INTR_VRING: u32 = 0x01;
const VIRTIO_INTR_CONFIG: u32 = 0x02;
@@ -250,6 +251,7 @@ mod tests {
#[cfg(feature = "kvm-legacy-irq")]
#[test]
fn test_create_legacy_notifier() {
skip_if_not_root!();
let (_vmfd, irq_manager) = crate::kvm::tests::create_kvm_irq_manager();
let group = irq_manager
.create_group(InterruptSourceType::LegacyIrq, 0, 1)
@@ -280,6 +282,7 @@ mod tests {
#[cfg(feature = "kvm-msi-irq")]
#[test]
fn test_virtio_msi_notifier() {
skip_if_not_root!();
let (_vmfd, irq_manager) = crate::kvm::tests::create_kvm_irq_manager();
let group = irq_manager
.create_group(InterruptSourceType::MsiIrq, 0, 3)

View File

@@ -41,6 +41,8 @@ dbs-utils = {workspace = true}
[dev-dependencies]
dbs-arch = { workspace = true }
kvm-ioctls = {workspace = true}
test-utils = { workspace = true }
nix = { workspace = true }
[lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = [

View File

@@ -654,6 +654,7 @@ mod tests {
use dbs_device::resources::{DeviceResources, MsiIrqType, Resource};
use dbs_interrupt::KvmIrqManager;
use kvm_ioctls::{Kvm, VmFd};
use test_utils::skip_if_not_root;
use super::*;
@@ -735,6 +736,7 @@ mod tests {
#[test]
fn test_msi_state_struct() {
skip_if_not_root!();
let flags = MSI_CTL_ENABLE | MSI_CTL_64_BITS | MSI_CTL_PER_VECTOR | 0x6 | 0x20;
let mut cap = MsiCap::new(0xa5, flags);

View File

@@ -361,6 +361,7 @@ mod tests {
use dbs_device::resources::{DeviceResources, MsiIrqType, Resource};
use dbs_interrupt::KvmIrqManager;
use kvm_ioctls::{Kvm, VmFd};
use test_utils::skip_if_not_root;
use super::*;
@@ -422,6 +423,7 @@ mod tests {
#[test]
fn test_set_msg_ctl() {
skip_if_not_root!();
let mut config = MsixState::new(0x10);
let mut intr_mgr = create_interrupt_manager();
@@ -452,6 +454,7 @@ mod tests {
#[test]
fn test_read_write_table() {
skip_if_not_root!();
let mut intr_mgr = create_interrupt_manager();
let mut config = MsixState::new(0x10);

View File

@@ -1159,11 +1159,12 @@ impl<
#[cfg(test)]
pub(crate) mod tests {
#[cfg(target_arch = "aarch64")]
use arch::aarch64::gic::create_gic;
use dbs_arch::gic::create_gic;
use dbs_device::resources::MsiIrqType;
use dbs_interrupt::kvm::KvmIrqManager;
use dbs_utils::epoll_manager::EpollManager;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use virtio_queue::QueueSync;
use vm_memory::{GuestMemoryMmap, GuestRegionMmap, GuestUsize, MmapRegion};
@@ -1496,6 +1497,7 @@ pub(crate) mod tests {
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[test]
fn test_virtio_pci_device_activate() {
skip_if_not_root!();
let mut d: VirtioPciDevice<_, _, _> = get_pci_device();
assert_eq!(d.state().queues.len(), 2);
assert!(!d.state().check_queues_valid());
@@ -1554,6 +1556,7 @@ pub(crate) mod tests {
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[test]
fn test_bus_device_reset() {
skip_if_not_root!();
let mut d: VirtioPciDevice<_, _, _> = get_pci_device();
assert_eq!(d.state().queues.len(), 2);
@@ -1578,6 +1581,7 @@ pub(crate) mod tests {
#[test]
fn test_virtio_pci_device_resources() {
skip_if_not_root!();
let d: VirtioPciDevice<_, _, _> = get_pci_device();
let resources = d.get_assigned_resources();
@@ -1595,6 +1599,7 @@ pub(crate) mod tests {
#[test]
fn test_virtio_pci_register_ioevent() {
skip_if_not_root!();
let d: VirtioPciDevice<_, _, _> = get_pci_device();
d.register_ioevent().unwrap();
assert!(d.ioevent_registered.load(Ordering::SeqCst));
@@ -1616,6 +1621,7 @@ pub(crate) mod tests {
#[test]
fn test_read_bar() {
skip_if_not_root!();
let d: VirtioPciDevice<_, _, _> = get_pci_device();
let origin_data = vec![1u8];
// driver status

View File

@@ -22,3 +22,5 @@ vmm-sys-util = {workspace = true}
[dev-dependencies]
serde_json = "1.0.9"
test-utils = { workspace = true }
nix = { workspace = true }

View File

@@ -278,6 +278,7 @@ impl AsRawFd for Tap {
}
}
#[cfg(test)]
mod tests {
#![allow(dead_code)]
@@ -285,6 +286,7 @@ mod tests {
use std::net::Ipv4Addr;
use std::str;
use std::sync::atomic::{AtomicUsize, Ordering};
use test_utils::skip_if_not_root;
use super::*;
@@ -388,6 +390,7 @@ mod tests {
#[test]
fn test_tap_name() {
skip_if_not_root!();
// Sanity check that the assumed max iface name length is correct.
assert_eq!(
IFACE_NAME_MAX_LEN,
@@ -414,11 +417,13 @@ mod tests {
#[test]
fn test_tap_partial_eq() {
skip_if_not_root!();
assert_ne!(Tap::new().unwrap(), Tap::new().unwrap());
}
#[test]
fn test_tap_configure() {
skip_if_not_root!();
// `fetch_add` adds to the current value, returning the previous value.
let next_ip = NEXT_IP.fetch_add(1, Ordering::SeqCst);
@@ -451,6 +456,7 @@ mod tests {
#[test]
fn test_tap_enable() {
skip_if_not_root!();
let tap = Tap::new().unwrap();
let ret = tap.enable();
assert!(ret.is_ok());
@@ -458,6 +464,7 @@ mod tests {
#[test]
fn test_tap_get_ifreq() {
skip_if_not_root!();
let tap = Tap::new().unwrap();
let ret = tap.get_ifreq();
assert_eq!(
@@ -468,6 +475,7 @@ mod tests {
#[test]
fn test_raw_fd() {
skip_if_not_root!();
let tap = Tap::new().unwrap();
assert_eq!(tap.as_raw_fd(), tap.tap_file.as_raw_fd());
}

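Many of the hunks above gate device tests behind `skip_if_not_root!`, since TAP and KVM setup needs root privileges. Below is a hypothetical stand-in for such a guard, with the UID injected as a parameter so the example runs unprivileged (the real `test-utils` macro reads the effective UID itself and returns from the test directly):

```rust
// Hypothetical sketch of a skip-if-not-root guard: bail out of the enclosing
// function when the (injected) effective UID is not root.
macro_rules! skip_if_not_root {
    ($euid:expr) => {
        if $euid != 0 {
            eprintln!("skipping: test requires root");
            return false; // signal "skipped" to the caller in this sketch
        }
    };
}

// A test body that would need root (e.g. creating a TAP device).
fn privileged_test(euid: u32) -> bool {
    skip_if_not_root!(euid);
    true // the privileged body ran
}

fn main() {
    assert!(!privileged_test(1000)); // non-root: skipped
    assert!(privileged_test(0)); // root: body runs
    println!("ok");
}
```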

@@ -50,6 +50,7 @@ vm-memory = { workspace = true, features = [
"backend-mmap",
"backend-atomic",
] }
test-utils = { workspace = true }
[features]
virtio-mmio = []


@@ -748,6 +748,7 @@ pub(crate) mod tests {
use dbs_device::resources::DeviceResources;
use dbs_utils::epoll_manager::SubscriberOps;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use vm_memory::GuestMemoryMmap;
use vmm_sys_util::eventfd::EventFd;
@@ -803,6 +804,7 @@ pub(crate) mod tests {
#[test]
fn test_balloon_virtio_device_normal() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
let config = BalloonConfig {
f_deflate_on_oom: true,
@@ -857,6 +859,7 @@ pub(crate) mod tests {
#[test]
fn test_balloon_virtio_device_active() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
// check queue sizes error
@@ -923,6 +926,7 @@ pub(crate) mod tests {
#[test]
fn test_balloon_set_size() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
let config = BalloonConfig {
f_deflate_on_oom: true,
@@ -936,6 +940,7 @@ pub(crate) mod tests {
#[test]
fn test_balloon_epoll_handler_handle_event() {
skip_if_not_root!();
let handler = create_balloon_epoll_handler();
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();
@@ -968,6 +973,7 @@ pub(crate) mod tests {
#[test]
fn test_balloon_epoll_handler_process_report_queue() {
skip_if_not_root!();
let mut handler = create_balloon_epoll_handler();
let m = &handler.config.vm_as.clone();
@@ -997,6 +1003,7 @@ pub(crate) mod tests {
#[test]
fn test_balloon_epoll_handler_process_queue() {
skip_if_not_root!();
let mut handler = create_balloon_epoll_handler();
let m = &handler.config.vm_as.clone();
// invalid idx


@@ -376,6 +376,7 @@ mod tests {
use dbs_interrupt::NoopNotifier;
use dbs_utils::rate_limiter::{TokenBucket, TokenType};
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use virtio_queue::QueueSync;
use vm_memory::{Bytes, GuestAddress, GuestMemoryMmap, GuestRegionMmap};
use vmm_sys_util::eventfd::EventFd;
@@ -909,6 +910,7 @@ mod tests {
#[test]
fn test_block_virtio_device_active() {
skip_if_not_root!();
let device_id = "dummy_device_id";
let epoll_mgr = EpollManager::default();


@@ -579,6 +579,7 @@ pub(crate) mod tests {
};
use dbs_utils::epoll_manager::{EventOps, Events, MutEventSubscriber};
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use virtio_queue::QueueSync;
use vm_memory::{GuestMemoryAtomic, GuestMemoryMmap, GuestMemoryRegion, MmapRegion};
@@ -629,6 +630,7 @@ pub(crate) mod tests {
#[test]
fn test_create_virtio_queue_config() {
skip_if_not_root!();
let (_vmfd, irq_manager) = crate::tests::create_vm_and_irq_manager();
let group = irq_manager
.create_group(InterruptSourceType::LegacyIrq, 0, 1)
@@ -660,6 +662,7 @@ pub(crate) mod tests {
#[test]
fn test_clone_virtio_queue_config() {
skip_if_not_root!();
let (_vmfd, irq_manager) = crate::tests::create_vm_and_irq_manager();
let group = irq_manager
.create_group(InterruptSourceType::LegacyIrq, 0, 1)
@@ -698,6 +701,7 @@ pub(crate) mod tests {
#[test]
fn test_create_virtio_device_config() {
skip_if_not_root!();
let mut device_config = create_virtio_device_config();
device_config.notify_device_changes().unwrap();
@@ -783,6 +787,7 @@ pub(crate) mod tests {
#[test]
fn test_virtio_device() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
let avail_features = 0x1234 << 32 | 0x4567;


@@ -962,6 +962,7 @@ pub mod tests {
use std::io::Write;
use std::path::PathBuf;
use std::sync::Arc;
use test_utils::skip_if_not_root;
use dbs_device::resources::DeviceResources;
use dbs_interrupt::NoopNotifier;
@@ -1187,6 +1188,7 @@ pub mod tests {
#[test]
fn test_virtio_fs_device_active() {
skip_if_not_root!();
let epoll_manager = EpollManager::default();
{
// config queue size is not 2
@@ -1675,6 +1677,7 @@ pub mod tests {
#[test]
fn test_register_mmap_region() {
skip_if_not_root!();
let epoll_manager = EpollManager::default();
let rate_limiter = RateLimiter::new(100, 0, 300, 10, 0, 300).unwrap();
let mut fs: VirtioFs<Arc<GuestMemoryMmap>> = VirtioFs::new(
@@ -1717,6 +1720,7 @@ pub mod tests {
#[test]
fn test_get_resource_requirements() {
skip_if_not_root!();
let epoll_manager = EpollManager::default();
let rate_limiter = RateLimiter::new(100, 0, 300, 10, 0, 300).unwrap();
let dax_on = 0x4000;
@@ -1761,6 +1765,7 @@ pub mod tests {
#[test]
fn test_set_resource() {
skip_if_not_root!();
let epoll_manager = EpollManager::default();
let rate_limiter = RateLimiter::new(100, 0, 300, 10, 0, 300).unwrap();
let mut fs: VirtioFs<Arc<GuestMemoryMmap>> = VirtioFs::new(


@@ -503,6 +503,7 @@ pub mod tests {
use dbs_utils::epoll_manager::EpollManager;
use dbs_utils::epoll_manager::SubscriberOps;
use dbs_utils::rate_limiter::TokenBucket;
use test_utils::skip_if_not_root;
use vm_memory::{GuestAddress, GuestMemoryMmap};
use vmm_sys_util::tempfile::TempFile;
@@ -636,6 +637,7 @@ pub mod tests {
#[test]
fn test_fs_get_patch_rate_limiters() {
skip_if_not_root!();
let mut handler = create_fs_epoll_handler(String::from("1"));
let tokenbucket = TokenBucket::new(1, 1, 4);
@@ -705,6 +707,7 @@ pub mod tests {
#[test]
fn test_fs_epoll_handler_handle_event() {
skip_if_not_root!();
let handler = create_fs_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();
@@ -740,6 +743,7 @@ pub mod tests {
#[test]
fn test_fs_epoll_handler_handle_unknown_event() {
skip_if_not_root!();
let handler = create_fs_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();
@@ -756,6 +760,7 @@ pub mod tests {
#[test]
fn test_fs_epoll_handler_process_queue() {
skip_if_not_root!();
{
let mut handler = create_fs_epoll_handler("test_1".to_string());


@@ -1345,6 +1345,7 @@ pub(crate) mod tests {
use std::ffi::CString;
use std::fs::File;
use std::os::unix::io::FromRawFd;
use test_utils::skip_if_not_root;
use dbs_device::resources::DeviceResources;
use dbs_interrupt::NoopNotifier;
@@ -1797,6 +1798,7 @@ pub(crate) mod tests {
#[test]
fn test_mem_virtio_device_set_resource() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
let id = "mem0".to_string();
let factory = Arc::new(Mutex::new(DummyMemRegionFactory {}));
@@ -1874,6 +1876,7 @@ pub(crate) mod tests {
#[test]
fn test_mem_virtio_device_activate() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
let id = "mem0".to_string();
let factory = Arc::new(Mutex::new(DummyMemRegionFactory {}));
@@ -1976,6 +1979,7 @@ pub(crate) mod tests {
#[test]
fn test_mem_virtio_device_remove() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
let id = "mem0".to_string();
let factory = Arc::new(Mutex::new(DummyMemRegionFactory {}));
@@ -2011,6 +2015,7 @@ pub(crate) mod tests {
#[test]
fn test_mem_epoll_handler_handle_event() {
skip_if_not_root!();
let handler = create_mem_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();
@@ -2032,6 +2037,7 @@ pub(crate) mod tests {
#[test]
fn test_mem_epoll_handler_process_queue() {
skip_if_not_root!();
let mut handler = create_mem_epoll_handler("test_1".to_string());
let m = &handler.config.vm_as.clone();
// fail to parse available descriptor chain


@@ -609,6 +609,7 @@ where
#[cfg(test)]
pub(crate) mod tests {
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use virtio_queue::QueueSync;
use vm_memory::{GuestAddress, GuestMemoryMmap, GuestRegionMmap};
@@ -652,6 +653,7 @@ pub(crate) mod tests {
#[test]
fn test_virtio_mmio_state_new() {
skip_if_not_root!();
let mut state = get_mmio_state(false, false, 1);
assert_eq!(state.queues.len(), 3);


@@ -494,6 +494,7 @@ where
pub(crate) mod tests {
use std::any::Any;
use std::sync::Mutex;
use test_utils::skip_if_not_root;
use byteorder::{ByteOrder, LittleEndian};
use dbs_device::resources::{MsiIrqType, Resource, ResourceConstraint};
@@ -708,6 +709,7 @@ pub(crate) mod tests {
#[test]
fn test_virtio_mmio_v2_device_new() {
skip_if_not_root!();
// test create error.
let resources = DeviceResources::new();
let mem = Arc::new(GuestMemoryMmap::from_ranges(&[(GuestAddress(0), 0x1000)]).unwrap());
@@ -769,6 +771,7 @@ pub(crate) mod tests {
#[test]
fn test_bus_device_read() {
skip_if_not_root!();
let mut d = get_mmio_device();
let mut buf = vec![0xff, 0, 0xfe, 0];
@@ -894,6 +897,7 @@ pub(crate) mod tests {
#[test]
fn test_bus_device_write() {
skip_if_not_root!();
let mut d = get_mmio_device();
let mut buf = vec![0; 5];
@@ -1023,6 +1027,7 @@ pub(crate) mod tests {
#[test]
fn test_bus_device_activate() {
skip_if_not_root!();
// invalid state transition should failed
let mut d = get_mmio_device();
@@ -1140,6 +1145,7 @@ pub(crate) mod tests {
#[test]
fn test_bus_device_reset() {
skip_if_not_root!();
let resources = get_device_resource(false, false);
let mut d = get_mmio_device_inner(true, 0, resources);
let mut buf = vec![0; 4];
@@ -1169,6 +1175,7 @@ pub(crate) mod tests {
#[test]
fn test_mmiov2_device_resources() {
skip_if_not_root!();
let d = get_mmio_device();
let resources = d.get_assigned_resources();
@@ -1185,6 +1192,7 @@ pub(crate) mod tests {
#[test]
fn test_mmio_v2_device_msi() {
skip_if_not_root!();
let resources = get_device_resource(true, false);
let mut d = get_mmio_device_inner(true, 0, resources);
@@ -1227,6 +1235,7 @@ pub(crate) mod tests {
#[test]
fn test_mmio_shared_memory() {
skip_if_not_root!();
let resources = get_device_resource(true, true);
let d = get_mmio_device_inner(true, 0, resources);


@@ -848,6 +848,7 @@ mod tests {
use dbs_utils::epoll_manager::SubscriberOps;
use dbs_utils::rate_limiter::TokenBucket;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use vm_memory::{GuestAddress, GuestMemoryMmap};
use super::*;
@@ -900,6 +901,7 @@ mod tests {
#[test]
fn test_net_virtio_device_normal() {
skip_if_not_root!();
let next_ip = NEXT_IP.fetch_add(1, Ordering::SeqCst);
let tap = Tap::open_named(&format!("tap{next_ip}"), false).unwrap();
let epoll_mgr = EpollManager::default();
@@ -963,6 +965,7 @@ mod tests {
#[test]
fn test_net_virtio_device_active() {
skip_if_not_root!();
let epoll_mgr = EpollManager::default();
{
// config queue size is not 2
@@ -1112,6 +1115,7 @@ mod tests {
#[test]
fn test_net_set_patch_rate_limiters() {
skip_if_not_root!();
let next_ip = NEXT_IP.fetch_add(1, Ordering::SeqCst);
let tap = Tap::open_named(&format!("tap{next_ip}"), false).unwrap();
let epoll_mgr = EpollManager::default();
@@ -1150,6 +1154,7 @@ mod tests {
#[test]
fn test_net_get_patch_rate_limiters() {
skip_if_not_root!();
let mut handler = create_net_epoll_handler("test_1".to_string());
let tokenbucket = TokenBucket::new(1, 1, 4);
@@ -1174,6 +1179,7 @@ mod tests {
#[test]
fn test_net_epoll_handler_handle_event() {
skip_if_not_root!();
let handler = create_net_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();
@@ -1212,6 +1218,7 @@ mod tests {
#[test]
fn test_net_epoll_handler_handle_unknown_event() {
skip_if_not_root!();
let handler = create_net_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();
@@ -1228,6 +1235,7 @@ mod tests {
#[test]
fn test_net_epoll_handler_process_queue() {
skip_if_not_root!();
{
let mut handler = create_net_epoll_handler("test_1".to_string());
@@ -1253,6 +1261,7 @@ mod tests {
#[test]
fn test_net_bandwidth_rate_limiter() {
skip_if_not_root!();
let handler = create_net_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
@@ -1330,6 +1339,7 @@ mod tests {
#[test]
fn test_net_ops_rate_limiter() {
skip_if_not_root!();
let handler = create_net_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();


@@ -44,9 +44,11 @@ pub fn create_queue_notifier(
mod tests {
use super::*;
use dbs_interrupt::InterruptManager;
use test_utils::skip_if_not_root;
#[test]
fn test_create_virtio_legacy_notifier() {
skip_if_not_root!();
let (_vmfd, irq_manager) = crate::tests::create_vm_and_irq_manager();
let group = irq_manager
.create_group(InterruptSourceType::LegacyIrq, 0, 1)
@@ -68,6 +70,7 @@ mod tests {
#[test]
fn test_create_virtio_msi_notifier() {
skip_if_not_root!();
let (_vmfd, irq_manager) = crate::tests::create_vm_and_irq_manager();
let group = irq_manager
.create_group(InterruptSourceType::MsiIrq, 0, 3)


@@ -682,6 +682,7 @@ mod tests {
};
use dbs_utils::epoll_manager::SubscriberOps;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use virtio_queue::{Queue, QueueSync};
use vm_memory::{GuestAddress, GuestMemoryMmap, GuestRegionMmap};
use vmm_sys_util::eventfd::EventFd;
@@ -718,6 +719,7 @@ mod tests {
#[test]
fn test_vhost_kern_net_virtio_normal() {
skip_if_not_root!();
let guest_mac_str = "11:22:33:44:55:66";
let guest_mac = MacAddr::parse_str(guest_mac_str).unwrap();
let queue_sizes = Arc::new(vec![128]);
@@ -757,6 +759,7 @@ mod tests {
#[test]
fn test_vhost_kern_net_virtio_activate() {
skip_if_not_root!();
let guest_mac_str = "11:22:33:44:55:66";
let guest_mac = MacAddr::parse_str(guest_mac_str).unwrap();
// Invalid queue sizes
@@ -841,6 +844,7 @@ mod tests {
#[test]
fn test_vhost_kern_net_epoll_handler_handle_event() {
skip_if_not_root!();
let handler = create_vhost_kern_net_epoll_handler("test_1".to_string());
let event_fd = EventFd::new(0).unwrap();
let mgr = EpollManager::default();


@@ -631,7 +631,7 @@ mod tests {
#[test]
fn test_vhost_user_block_virtio_device_spdk() {
let socket_path = "/tmp/vhost.1";
let socket_path = concat!("vhost.", line!());
let handler = thread::spawn(move || {
let listener = Listener::new(socket_path, true).unwrap();
@@ -692,7 +692,7 @@ mod tests {
#[test]
fn test_vhost_user_block_virtio_device_activate_spdk() {
let socket_path = "/tmp/vhost.2";
let socket_path = concat!("vhost.", line!());
let handler = thread::spawn(move || {
// create vhost user block device

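The vhost-user test hunks above replace fixed `/tmp/vhost.1` socket paths with `concat!("vhost.", line!())`. Because `line!()` expands to the call site's line number at compile time, each test gets a distinct socket name and tests no longer race on a shared path. A small demonstration of the mechanism:

```rust
// Each call site bakes its own line number into a &'static str at compile
// time, because concat! eagerly expands the built-in line! macro.
fn socket_names() -> (&'static str, &'static str) {
    let a = concat!("vhost.", line!());
    let b = concat!("vhost.", line!());
    (a, b)
}

fn main() {
    let (a, b) = socket_names();
    assert!(a.starts_with("vhost."));
    assert_ne!(a, b); // different lines => different socket names
    println!("{a} {b}");
}
```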

@@ -810,7 +810,7 @@ mod tests {
#[test]
fn test_vhost_user_fs_virtio_device_normal() {
let device_socket = "/tmp/vhost.1";
let device_socket = concat!("vhost.", line!());
let tag = "test_fs";
let handler = thread::spawn(move || {
@@ -879,7 +879,7 @@ mod tests {
#[test]
fn test_vhost_user_fs_virtio_device_activate() {
let device_socket = "/tmp/vhost.1";
let device_socket = concat!("vhost.", line!());
let tag = "test_fs";
let handler = thread::spawn(move || {


@@ -604,6 +604,7 @@ mod tests {
use dbs_interrupt::{InterruptManager, InterruptSourceType, MsiNotifier, NoopNotifier};
use dbs_utils::epoll_manager::EpollManager;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use vhost_rs::vhost_user::message::VhostUserU64;
use vhost_rs::vhost_user::{VhostUserProtocolFeatures, VhostUserVirtioFeatures};
use virtio_queue::QueueSync;
@@ -647,7 +648,7 @@ mod tests {
#[test]
fn test_vhost_user_net_virtio_device_normal() {
let device_socket = "/tmp/vhost.1";
let device_socket = concat!("vhost.", line!());
let queue_sizes = Arc::new(vec![128]);
let epoll_mgr = EpollManager::default();
let handler = thread::spawn(move || {
@@ -697,7 +698,8 @@ mod tests {
#[test]
fn test_vhost_user_net_virtio_device_activate() {
let device_socket = "/tmp/vhost.1";
skip_if_not_root!();
let device_socket = concat!("vhost.", line!());
let queue_sizes = Arc::new(vec![128]);
let epoll_mgr = EpollManager::default();
let handler = thread::spawn(move || {


@@ -208,6 +208,7 @@ mod tests {
use dbs_device::resources::DeviceResources;
use dbs_interrupt::NoopNotifier;
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use virtio_queue::QueueSync;
use vm_memory::{GuestAddress, GuestMemoryMmap, GuestRegionMmap};
@@ -243,6 +244,7 @@ mod tests {
#[test]
fn test_virtio_device() {
skip_if_not_root!();
let mut ctx = TestContext::new();
let device_features = VSOCK_AVAIL_FEATURES;
let driver_features: u64 = VSOCK_AVAIL_FEATURES | 1 | (1 << 32);


@@ -310,6 +310,7 @@ where
#[cfg(test)]
mod tests {
use test_utils::skip_if_not_root;
use vm_memory::{Bytes, GuestAddress, GuestMemoryMmap};
use vmm_sys_util::epoll::EventSet;
@@ -320,6 +321,7 @@ mod tests {
#[test]
fn test_irq() {
skip_if_not_root!();
let test_ctx = TestContext::new();
let mut ctx = test_ctx.create_event_handler_context();
ctx.arti_activate(&test_ctx.mem);
@@ -329,6 +331,7 @@ mod tests {
#[test]
fn test_txq_event() {
skip_if_not_root!();
// Test case:
// - the driver has something to send (there's data in the TX queue);
// and
@@ -411,6 +414,7 @@ mod tests {
#[test]
fn test_rxq_event() {
skip_if_not_root!();
// Test case:
// - there is pending RX data in the backend; and
// - the driver makes RX buffers available; and
@@ -468,6 +472,7 @@ mod tests {
#[test]
fn test_backend_event() {
skip_if_not_root!();
// Test case:
// - a backend event is received; and
// - the backend has pending RX data.
@@ -567,6 +572,7 @@ mod tests {
#[test]
fn test_vsock_bof() {
skip_if_not_root!();
const GAP_SIZE: usize = 768 << 20;
const FIRST_AFTER_GAP: usize = 1 << 32;
const GAP_START_ADDR: usize = FIRST_AFTER_GAP - GAP_SIZE;


@@ -298,6 +298,7 @@ mod tests {
use super::*;
use crate::device_manager::tests::create_address_space;
use crate::test_utils::tests::create_vm_for_test;
use test_utils::skip_if_not_root;
impl Default for BalloonDeviceConfigInfo {
fn default() -> Self {
@@ -330,6 +331,7 @@ mod tests {
#[test]
fn test_balloon_insert_or_update_device() {
skip_if_not_root!();
//Init vm for test.
let mut vm = create_vm_for_test();
@@ -354,6 +356,7 @@ mod tests {
#[test]
fn test_balloon_attach_device() {
skip_if_not_root!();
//Init vm and insert balloon config for test.
let mut vm = create_vm_for_test();
let device_op_ctx = DeviceOpContext::new(
@@ -393,6 +396,7 @@ mod tests {
#[test]
fn test_balloon_update_device() {
skip_if_not_root!();
//Init vm for test.
let mut vm = create_vm_for_test();
let device_op_ctx = DeviceOpContext::new(


@@ -618,6 +618,7 @@ impl MemRegionFactory for MemoryRegionFactory {
#[cfg(test)]
mod tests {
use test_utils::skip_if_not_root;
use vm_memory::GuestMemoryRegion;
use super::*;
@@ -656,6 +657,7 @@ mod tests {
#[test]
fn test_mem_insert_or_update_device() {
skip_if_not_root!();
// Init vm for test.
let mut vm = create_vm_for_test();
@@ -681,6 +683,7 @@ mod tests {
#[test]
fn test_mem_attach_device() {
skip_if_not_root!();
// Init vm and insert mem config for test.
let mut vm = create_vm_for_test();
let dummy_mem_device = MemDeviceConfigInfo::default();
@@ -710,6 +713,7 @@ mod tests {
#[test]
fn test_mem_create_region() {
skip_if_not_root!();
let vm = create_vm_for_test();
let ctx = DeviceOpContext::new(
Some(vm.epoll_manager().clone()),


@@ -277,6 +277,7 @@ impl Default for VhostNetDeviceMgr {
mod tests {
use dbs_utils::net::MacAddr;
use dbs_virtio_devices::Error as VirtioError;
use test_utils::skip_if_not_root;
use crate::{
device_manager::{
@@ -289,6 +290,7 @@ mod tests {
#[test]
fn test_create_vhost_net_device() {
skip_if_not_root!();
let vm = create_vm_for_test();
let mgr = DeviceManager::new_test_mgr();
let id_1 = String::from("id_1");
@@ -321,6 +323,7 @@ mod tests {
#[test]
fn test_attach_vhost_net_device() {
skip_if_not_root!();
// Init vm for test.
let mut vm = create_vm_for_test();
let device_op_ctx = DeviceOpContext::new(
@@ -373,6 +376,7 @@ mod tests {
#[test]
fn test_insert_vhost_net_device() {
skip_if_not_root!();
let vm = create_vm_for_test();
let mut mgr = DeviceManager::new_test_mgr();
@@ -437,6 +441,7 @@ mod tests {
#[test]
fn test_vhost_net_insert_error_cases() {
skip_if_not_root!();
let vm = create_vm_for_test();
let mut mgr = DeviceManager::new_test_mgr();


@@ -219,9 +219,11 @@ impl Default for VhostUserNetDeviceMgr {
mod tests {
use super::*;
use crate::test_utils::tests::create_vm_for_test;
use test_utils::skip_if_not_root;
#[test]
fn test_create_vhost_user_net_device() {
skip_if_not_root!();
let vm = create_vm_for_test();
let mgr = DeviceManager::new_test_mgr();
let sock_1 = String::from("id_1");
@@ -249,6 +251,7 @@ mod tests {
#[test]
fn test_insert_vhost_user_net_device() {
skip_if_not_root!();
let vm = create_vm_for_test();
let mut mgr = DeviceManager::new_test_mgr();
let sock_1 = String::from("id_1");
@@ -277,6 +280,7 @@ mod tests {
#[test]
fn test_vhost_user_net_insert_error_cases() {
skip_if_not_root!();
let vm = create_vm_for_test();
let mut mgr = DeviceManager::new_test_mgr();
let sock_1 = String::from("id_1");


@@ -283,6 +283,13 @@ pub const KATA_ANNO_CFG_HYPERVISOR_DEFAULT_GPUS: &str =
pub const KATA_ANNO_CFG_HYPERVISOR_DEFAULT_GPU_MODEL: &str =
"io.katacontainers.config.hypervisor.default_gpu_model";
/// Block device specific annotation for num_queues
pub const KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_NUM_QUEUES: &str =
"io.katacontainers.config.hypervisor.block_device_num_queues";
/// Block device specific annotation for queue_size
pub const KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_QUEUE_SIZE: &str =
"io.katacontainers.config.hypervisor.block_device_queue_size";
// Runtime related annotations
/// Prefix for Runtime configurations.
pub const KATA_ANNO_CFG_RUNTIME_PREFIX: &str = "io.katacontainers.config.runtime.";
@@ -503,6 +510,7 @@ impl Annotation {
let u32_err = io::Error::new(io::ErrorKind::InvalidData, "parse u32 error".to_string());
let u64_err = io::Error::new(io::ErrorKind::InvalidData, "parse u64 error".to_string());
let i32_err = io::Error::new(io::ErrorKind::InvalidData, "parse i32 error".to_string());
let usize_err = io::Error::new(io::ErrorKind::InvalidData, "parse usize error".to_string());
let hv = config.hypervisor.get_mut(hypervisor_name).ok_or_else(|| {
io::Error::new(
io::ErrorKind::InvalidData,
@@ -960,7 +968,26 @@ impl Annotation {
return Err(u32_err);
}
},
KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_NUM_QUEUES => {
match self.get_value::<usize>(key) {
Ok(v) => {
hv.blockdev_info.num_queues = v.unwrap_or_default();
}
Err(_e) => {
return Err(usize_err);
}
}
}
KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_QUEUE_SIZE => {
match self.get_value::<u32>(key) {
Ok(v) => {
hv.blockdev_info.queue_size = v.unwrap_or_default();
}
Err(_e) => {
return Err(u32_err);
}
}
}
_ => {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,

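The new `block_device_num_queues` and `block_device_queue_size` annotations above follow the file's existing pattern: match on the key, parse the value into the typed config field, and surface a parse error otherwise. A simplified, self-contained sketch of that flow (`BlockDevInfo` and `apply` are illustrative stand-ins, not the real kata-types API):

```rust
use std::collections::HashMap;

const NUM_QUEUES_KEY: &str = "io.katacontainers.config.hypervisor.block_device_num_queues";
const QUEUE_SIZE_KEY: &str = "io.katacontainers.config.hypervisor.block_device_queue_size";

// Illustrative stand-in for the hypervisor blockdev_info config section.
#[derive(Default, Debug, PartialEq)]
struct BlockDevInfo {
    num_queues: usize,
    queue_size: u32,
}

// Apply known annotations to the config, rejecting unparseable values.
fn apply(annotations: &HashMap<String, String>, dev: &mut BlockDevInfo) -> Result<(), String> {
    for (key, value) in annotations {
        match key.as_str() {
            NUM_QUEUES_KEY => {
                dev.num_queues = value.parse().map_err(|_| "parse usize error".to_string())?;
            }
            QUEUE_SIZE_KEY => {
                dev.queue_size = value.parse().map_err(|_| "parse u32 error".to_string())?;
            }
            _ => {} // unrelated annotations are ignored in this sketch
        }
    }
    Ok(())
}

fn main() {
    let mut annotations = HashMap::new();
    annotations.insert(NUM_QUEUES_KEY.to_string(), "4".to_string());
    annotations.insert(QUEUE_SIZE_KEY.to_string(), "256".to_string());
    let mut dev = BlockDevInfo::default();
    apply(&annotations, &mut dev).unwrap();
    assert_eq!(dev, BlockDevInfo { num_queues: 4, queue_size: 256 });

    annotations.insert(QUEUE_SIZE_KEY.to_string(), "not-a-number".to_string());
    assert!(apply(&annotations, &mut dev).is_err());
    println!("ok");
}
```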

@@ -85,11 +85,6 @@ impl ConfigPlugin for CloudHypervisorConfig {
if ch.memory_info.memory_slots == 0 {
ch.memory_info.memory_slots = default::DEFAULT_CH_MEMORY_SLOTS;
}
// Apply factory defaults
if ch.factory.template_path.is_empty() {
ch.factory.template_path = default::DEFAULT_TEMPLATE_PATH.to_string();
}
}
Ok(())


@@ -79,11 +79,6 @@ impl ConfigPlugin for DragonballConfig {
if db.memory_info.memory_slots == 0 {
db.memory_info.memory_slots = default::DEFAULT_DRAGONBALL_MEMORY_SLOTS;
}
// Apply factory defaults
if db.factory.template_path.is_empty() {
db.factory.template_path = default::DEFAULT_TEMPLATE_PATH.to_string();
}
}
Ok(())
}


@@ -69,11 +69,6 @@ impl ConfigPlugin for FirecrackerConfig {
firecracker.memory_info.default_memory =
default::DEFAULT_FIRECRACKER_MEMORY_SIZE_MB;
}
// Apply factory defaults
if firecracker.factory.template_path.is_empty() {
firecracker.factory.template_path = default::DEFAULT_TEMPLATE_PATH.to_string();
}
}
Ok(())


@@ -92,7 +92,6 @@ impl ConfigPlugin for QemuConfig {
qemu.memory_info.memory_slots = default::DEFAULT_QEMU_MEMORY_SLOTS;
}
// Apply factory defaults
if qemu.factory.template_path.is_empty() {
qemu.factory.template_path = default::DEFAULT_TEMPLATE_PATH.to_string();
}


@@ -25,6 +25,7 @@ pub enum Error {
}
/// Assigned CPU resources for a Linux container.
/// Stores fractional vCPU allocation for more precise resource tracking.
#[derive(Clone, Default, Debug)]
pub struct LinuxContainerCpuResources {
shares: u64,
@@ -32,7 +33,8 @@ pub struct LinuxContainerCpuResources {
quota: i64,
cpuset: CpuSet,
nodeset: NumaNodeSet,
calculated_vcpu_time_ms: Option<u64>,
/// Calculated fractional vCPU allocation, e.g., 0.25 means 1/4 of a CPU.
calculated_vcpu: Option<f64>,
}
impl LinuxContainerCpuResources {
@@ -61,10 +63,10 @@ impl LinuxContainerCpuResources {
&self.nodeset
}
/// Get number of vCPUs to fulfill the CPU resource request, `None` means unconstrained.
pub fn get_vcpus(&self) -> Option<u64> {
self.calculated_vcpu_time_ms
.map(|v| v.saturating_add(999) / 1000)
/// Get the number of vCPUs assigned to the container as a fractional value.
/// Returns `None` if unconstrained (no limit).
pub fn get_vcpus(&self) -> Option<f64> {
self.calculated_vcpu
}
}
@@ -75,15 +77,18 @@ impl TryFrom<&oci::LinuxCpu> for LinuxContainerCpuResources {
fn try_from(value: &oci::LinuxCpu) -> Result<Self, Self::Error> {
let period = value.period().unwrap_or(0);
let quota = value.quota().unwrap_or(-1);
let value_cpus = value.cpus().as_ref().map_or("", |cpus| cpus);
let value_cpus = value.cpus().as_deref().unwrap_or("");
let cpuset = CpuSet::from_str(value_cpus).map_err(Error::InvalidCpuSet)?;
let value_mems = value.mems().as_ref().map_or("", |mems| mems);
let value_mems = value.mems().as_deref().unwrap_or("");
let nodeset = NumaNodeSet::from_str(value_mems).map_err(Error::InvalidNodeSet)?;
// If quota is -1, it means the CPU resource request is unconstrained. In that case,
// we don't currently assign additional CPUs.
let milli_sec = if quota >= 0 && period != 0 {
Some((quota as u64).saturating_mul(1000) / period)
// Calculate fractional vCPUs:
// If quota >= 0 and period > 0, vCPUs = quota / period.
// Otherwise, if cpuset is non-empty, derive from cpuset length.
let vcpu_fraction = if quota >= 0 && period > 0 {
Some(quota as f64 / period as f64)
} else if !cpuset.is_empty() {
Some(cpuset.len() as f64)
} else {
None
};
@@ -94,16 +99,18 @@ impl TryFrom<&oci::LinuxCpu> for LinuxContainerCpuResources {
quota,
cpuset,
nodeset,
calculated_vcpu_time_ms: milli_sec,
calculated_vcpu: vcpu_fraction,
})
}
}
/// Assigned CPU resources for a Linux sandbox/pod.
/// Aggregated CPU resources for a Linux sandbox/pod.
/// Tracks cumulative fractional vCPU allocation across all containers in the pod.
#[derive(Default, Debug)]
pub struct LinuxSandboxCpuResources {
shares: u64,
calculated_vcpu_time_ms: u64,
/// Total fractional vCPU allocation for the sandbox.
calculated_vcpu: f64,
cpuset: CpuSet,
nodeset: NumaNodeSet,
}
@@ -122,9 +129,9 @@ impl LinuxSandboxCpuResources {
self.shares
}
/// Get assigned vCPU time in ms.
pub fn calculated_vcpu_time_ms(&self) -> u64 {
self.calculated_vcpu_time_ms
/// Return the cumulative fractional vCPU allocation for the sandbox.
pub fn calculated_vcpu(&self) -> f64 {
self.calculated_vcpu
}
/// Get the CPU set.
@@ -137,19 +144,23 @@ impl LinuxSandboxCpuResources {
&self.nodeset
}
/// Get number of vCPUs to fulfill the CPU resource request.
pub fn get_vcpus(&self) -> u64 {
if self.calculated_vcpu_time_ms == 0 && !self.cpuset.is_empty() {
self.cpuset.len() as u64
} else {
self.calculated_vcpu_time_ms.saturating_add(999) / 1000
/// Get the number of vCPUs for the sandbox as a fractional value.
/// If no quota and cpuset is defined, return cpuset length as float.
pub fn get_vcpus(&self) -> f64 {
if self.calculated_vcpu == 0.0 {
if !self.cpuset.is_empty() {
return self.cpuset.len() as f64;
}
return 0.0;
}
self.calculated_vcpu
}
/// Merge resources assigned to a container into the sandbox/pod resources.
/// Merge container CPU resources into this sandbox CPU resource object.
/// Aggregates fractional vCPU allocation and extends cpuset/nodeset.
pub fn merge(&mut self, container_resource: &LinuxContainerCpuResources) -> &mut Self {
if let Some(v) = container_resource.calculated_vcpu_time_ms.as_ref() {
self.calculated_vcpu_time_ms += v;
if let Some(v) = container_resource.calculated_vcpu {
self.calculated_vcpu += v;
}
self.cpuset.extend(&container_resource.cpuset);
self.nodeset.extend(&container_resource.nodeset);
@@ -160,16 +171,16 @@ impl LinuxSandboxCpuResources {
#[cfg(test)]
mod tests {
use super::*;
const EPSILON: f64 = 0.0001;
#[test]
fn test_linux_container_cpu_resources() {
let resources = LinuxContainerCpuResources::default();
assert_eq!(resources.shares(), 0);
assert_eq!(resources.calculated_vcpu_time_ms, None);
assert!(resources.cpuset.is_empty());
assert!(resources.nodeset.is_empty());
assert!(resources.calculated_vcpu_time_ms.is_none());
assert!(resources.get_vcpus().is_none());
let mut linux_cpu = oci::LinuxCpu::default();
linux_cpu.set_shares(Some(2048));
@@ -182,11 +193,20 @@ mod tests {
assert_eq!(resources.shares(), 2048);
assert_eq!(resources.period(), 100);
assert_eq!(resources.quota(), 1001);
assert_eq!(resources.calculated_vcpu_time_ms, Some(10010));
assert_eq!(resources.get_vcpus().unwrap(), 11);
// Expected fractional vCPUs = quota / period
let expected_vcpus = 1001.0 / 100.0;
assert!(
(resources.get_vcpus().unwrap() - expected_vcpus).abs() < EPSILON,
"got {}, expect {}",
resources.get_vcpus().unwrap(),
expected_vcpus
);
assert_eq!(resources.cpuset().len(), 3);
assert_eq!(resources.nodeset().len(), 1);
// Test cpuset-only path (no quota)
let mut linux_cpu = oci::LinuxCpu::default();
linux_cpu.set_shares(Some(2048));
linux_cpu.set_cpus(Some("1".to_string()));
@@ -196,8 +216,10 @@ mod tests {
assert_eq!(resources.shares(), 2048);
assert_eq!(resources.period(), 0);
assert_eq!(resources.quota(), -1);
assert_eq!(resources.calculated_vcpu_time_ms, None);
assert!(resources.get_vcpus().is_none());
assert!(
(resources.get_vcpus().unwrap() - 1.0).abs() < EPSILON,
"cpuset size vCPU mismatch"
);
assert_eq!(resources.cpuset().len(), 1);
assert_eq!(resources.nodeset().len(), 2);
}
@@ -207,8 +229,7 @@ mod tests {
let mut sandbox = LinuxSandboxCpuResources::new(1024);
assert_eq!(sandbox.shares(), 1024);
assert_eq!(sandbox.get_vcpus(), 0);
assert_eq!(sandbox.calculated_vcpu_time_ms(), 0);
assert_eq!(sandbox.get_vcpus(), 0.0);
assert!(sandbox.cpuset().is_empty());
assert!(sandbox.nodeset().is_empty());
@@ -222,11 +243,20 @@ mod tests {
let resources = LinuxContainerCpuResources::try_from(&linux_cpu).unwrap();
sandbox.merge(&resources);
assert_eq!(sandbox.shares(), 1024);
assert_eq!(sandbox.get_vcpus(), 11);
assert_eq!(sandbox.calculated_vcpu_time_ms(), 10010);
// vCPUs after merge = quota / period
let expected_vcpus = 1001.0 / 100.0;
assert!(
(sandbox.get_vcpus() - expected_vcpus).abs() < EPSILON,
"sandbox vCPU mismatch: got {}, expect {}",
sandbox.get_vcpus(),
expected_vcpus
);
assert_eq!(sandbox.cpuset().len(), 3);
assert_eq!(sandbox.nodeset().len(), 1);
// Merge cpuset-only container
let mut linux_cpu = oci::LinuxCpu::default();
linux_cpu.set_shares(Some(2048));
linux_cpu.set_cpus(Some("1,4".to_string()));
@@ -236,8 +266,15 @@ mod tests {
sandbox.merge(&resources);
assert_eq!(sandbox.shares(), 1024);
assert_eq!(sandbox.get_vcpus(), 11);
assert_eq!(sandbox.calculated_vcpu_time_ms(), 10010);
// Expect quota-based + cpuset len (since cpuset is treated as allocation)
let expected_after_merge = expected_vcpus + resources.get_vcpus().unwrap();
assert!(
(sandbox.get_vcpus() - expected_after_merge).abs() < EPSILON,
"sandbox vCPU mismatch after cpuset merge: got {}, expect {}",
sandbox.get_vcpus(),
expected_after_merge
);
assert_eq!(sandbox.cpuset().len(), 4);
assert_eq!(sandbox.nodeset().len(), 2);
}
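The core of the `calculated_vcpu_time_ms` to `calculated_vcpu` change above is that vCPU demand is now kept as a fraction (`quota / period`) rather than rounded-up millisecond buckets, with the cpuset size as a fallback when no quota is set. The arithmetic, extracted into a runnable sketch:

```rust
// Fractional vCPU demand as in the diff above: quota/period when the cgroup
// quota is set, cpuset size otherwise, None when fully unconstrained.
fn vcpu_fraction(quota: i64, period: u64, cpuset_len: usize) -> Option<f64> {
    if quota >= 0 && period > 0 {
        Some(quota as f64 / period as f64)
    } else if cpuset_len > 0 {
        Some(cpuset_len as f64)
    } else {
        None
    }
}

fn main() {
    // quota=1001, period=100 => 10.01 vCPUs (previously rounded up to 11).
    assert!((vcpu_fraction(1001, 100, 3).unwrap() - 10.01).abs() < 1e-9);
    // no quota, cpuset "1" => 1.0 vCPU
    assert!((vcpu_fraction(-1, 0, 1).unwrap() - 1.0).abs() < 1e-9);
    // fully unconstrained
    assert!(vcpu_fraction(-1, 0, 0).is_none());
    println!("ok");
}
```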

src/runtime-rs/Cargo.lock (generated): 6185 lines changed; diff suppressed because it is too large.

@@ -1,77 +1,31 @@
[workspace]
members = [
"crates/agent",
"crates/hypervisor",
"crates/persist",
"crates/resource",
"crates/runtimes",
"crates/service",
"crates/shim",
"crates/shim-ctl",
[package]
name = "runtime-rs"
version = "0.1.0"
authors = { workspace = true }
description = "Containerd shim runtime for Kata Containers"
keywords = ["kata-containers", "shim"]
repository = "https://github.com/kata-containers/kata-containers.git"
license = { workspace = true }
edition = { workspace = true }
"tests/utils",
]
[[bin]]
name = "containerd-shim-kata-v2"
path = "crates/shim/src/bin/main.rs"
[workspace.package]
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
[[bin]]
name = "shim-ctl"
path = "crates/shim-ctl/src/main.rs"
[workspace.dependencies]
agent = { path = "crates/agent" }
hypervisor = { path = "crates/hypervisor" }
persist = { path = "crates/persist" }
resource = { path = "crates/resource" }
runtimes = { path = "crates/runtimes" }
service = { path = "crates/service" }
tests_utils = { path = "tests/utils" }
[features]
dragonball = ["runtimes/dragonball"]
cloud-hypervisor = ["runtimes/cloud-hypervisor"]
ch-config = { path = "crates/hypervisor/ch-config" }
common = { path = "crates/runtimes/common" }
linux_container = { path = "crates/runtimes/linux_container" }
virt_container = { path = "crates/runtimes/virt_container" }
wasm_container = { path = "crates/runtimes/wasm_container" }
# Local dependencies from `src/libs`
kata-sys-util = { path = "../libs/kata-sys-util" }
kata-types = { path = "../libs/kata-types", features = ["safe-path"] }
logging = { path = "../libs/logging" }
protocols = { path = "../libs/protocols", features = ["async"] }
runtime-spec = { path = "../libs/runtime-spec" }
safe-path = { path = "../libs/safe-path" }
shim-interface = { path = "../libs/shim-interface" }
test-utils = { path = "../libs/test-utils" }
# Local dependencies from `src/dragonball`
dragonball = { path = "../dragonball" }
dbs-utils = { path = "../dragonball/dbs_utils" }
actix-rt = "2.7.0"
anyhow = "1.0"
async-trait = "0.1.48"
containerd-shim = { version = "0.10.0", features = ["async"] }
containerd-shim-protos = { version = "0.10.0", features = ["async"] }
go-flag = "0.1.0"
hyper = "0.14.20"
hyperlocal = "0.8.0"
lazy_static = "1.4"
libc = "0.2"
log = "0.4.14"
netns-rs = "0.1.0"
# Note: nix needs to stay sync'd with libs versions
nix = "0.26.4"
oci-spec = { version = "0.8.1", features = ["runtime"] }
protobuf = "3.7.2"
rand = "0.8.4"
serde = { version = "1.0.145", features = ["derive"] }
serde_json = "1.0.91"
slog = "2.5.2"
slog-scope = "4.4.0"
strum = { version = "0.24.0", features = ["derive"] }
tempfile = "3.19.1"
thiserror = "1.0"
tokio = "1.46.1"
tracing = "0.1.41"
tracing-opentelemetry = "0.18.0"
ttrpc = "0.8.4"
url = "2.5.4"
[dependencies]
anyhow = { workspace = true }
go-flag = { workspace = true }
nix = { workspace = true }
tokio = { workspace = true, features = ["rt", "rt-multi-thread"] }
shim = { path = "crates/shim" }
common = { workspace = true }
logging = { workspace = true }
runtimes = { workspace = true }


@@ -150,8 +150,8 @@ DEFMEMSLOTS := 10
DEFMAXMEMSZ := 0
##VAR DEFBRIDGES=<number> Default number of bridges
DEFBRIDGES := 0
DEFENABLEANNOTATIONS := [\"kernel_params\"]
DEFENABLEANNOTATIONS_COCO := [\"kernel_params\",\"cc_init_data\"]
DEFENABLEANNOTATIONS := [\"enable_iommu\", \"virtio_fs_extra_args\", \"kernel_params\", \"default_vcpus\", \"default_memory\"]
DEFENABLEANNOTATIONS_COCO := [\"enable_iommu\", \"virtio_fs_extra_args\", \"kernel_params\", \"default_vcpus\", \"default_memory\", \"cc_init_data\"]
DEFDISABLEGUESTSECCOMP := true
DEFDISABLEGUESTEMPTYDIR := false
##VAR DEFAULTEXPFEATURES=[features] Default experimental features enabled
@@ -328,7 +328,7 @@ ifneq (,$(QEMUCMD))
KERNELPATH_COCO = $(KERNELDIR)/$(KERNEL_NAME_COCO)
# overriding options
DEFSTATICRESOURCEMGMT_QEMU := true
DEFSTATICRESOURCEMGMT_QEMU := false
# qemu-specific options
DEFSANDBOXCGROUPONLY_QEMU := false
@@ -583,7 +583,7 @@ ifneq ($(EXTRA_RUSTFEATURES),)
endif
TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
TARGET_PATH = ../../target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
##VAR DESTDIR=<path> is a directory prepended to each installed target file
DESTDIR ?= /


@@ -263,20 +263,6 @@ tx_rate_limiter_max_rate = 0
# disable applying SELinux on the VMM process (default false)
disable_selinux = @DEFDISABLESELINUX@
[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
enable_template = false
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)


@@ -33,6 +33,7 @@ pub const SYS_KERN_IOMMU_GROUPS: &str = "/sys/kernel/iommu_groups";
pub const VFIO_PCI_DRIVER: &str = "vfio-pci";
pub const DRIVER_MMIO_BLK_TYPE: &str = "mmioblk";
pub const DRIVER_VFIO_PCI_TYPE: &str = "vfio-pci";
pub const DRIVER_VFIO_AP_TYPE: &str = "vfio-ap";
pub const MAX_DEV_ID_SIZE: usize = 31;
const VFIO_PCI_DRIVER_NEW_ID: &str = "/sys/bus/pci/drivers/vfio-pci/new_id";
@@ -75,6 +76,7 @@ pub enum VfioBusMode {
#[default]
MMIO,
PCI,
CCW,
}
impl VfioBusMode {
@@ -94,8 +96,12 @@ impl VfioBusMode {
// driver_type used for kata-agent
// (1) vfio-pci for add device handler,
// (2) mmioblk for add storage handler,
pub fn driver_type(mode: &str) -> &str {
// (2) vfio-ap for add ccw device handler,
// (3) mmioblk for add storage handler,
pub fn driver_type(bus_type: &str, mode: &str) -> &'static str {
if bus_type == "ccw" {
return DRIVER_VFIO_AP_TYPE;
}
match mode {
"b" => DRIVER_MMIO_BLK_TYPE,
_ => DRIVER_VFIO_PCI_TYPE,
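The bus-aware driver selection above reduces to a small pure function. This standalone sketch mirrors the diff's mapping, with the string constants inlined for illustration:

```rust
// Mirrors the mapping in VfioBusMode::driver_type: a ccw bus always
// selects the vfio-ap driver; otherwise a block ("b") device maps to
// mmioblk and anything else to vfio-pci. Constants inlined here.
fn driver_type(bus_type: &str, mode: &str) -> &'static str {
    if bus_type == "ccw" {
        return "vfio-ap";
    }
    match mode {
        "b" => "mmioblk",
        _ => "vfio-pci",
    }
}
```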
@@ -103,7 +109,7 @@ impl VfioBusMode {
}
}
#[derive(Clone, Debug, Default)]
#[derive(Clone, Debug, Default, PartialEq)]
pub enum VfioDeviceType {
/// error type of VFIO device
Error,
@@ -112,8 +118,11 @@ pub enum VfioDeviceType {
#[default]
Normal,
/// mediated VFIO device type
Mediated,
/// mediated VFIO-PCI device type
MediatedPci,
/// mediated VFIO-AP device type
MediatedAp,
}
// DeviceVendorClass represents a PCI device's deviceID, vendorID and classID
@@ -195,6 +204,9 @@ pub struct VfioConfig {
/// device as block or char
pub dev_type: String,
/// bus type: pci or ccw
pub bus_type: String,
/// hostdev_prefix for devices, such as:
/// (1) physical endpoint: "physical_nic_"
/// (2) vfio mdev: "vfio_mdev_"
@@ -247,12 +259,19 @@ impl VfioDevice {
// get bus mode and driver type based on the device type
let dev_type = dev_info.dev_type.as_str();
let driver_type = VfioBusMode::driver_type(dev_type).to_owned();
let bus_type = dev_info.bus_type.as_str();
let driver_type = VfioBusMode::driver_type(bus_type, dev_type).to_owned();
let bus_mode = if bus_type == "ccw" {
VfioBusMode::CCW
} else {
VfioBusMode::PCI
};
let mut vfio_device = Self {
device_id,
attach_count: 0,
bus_mode: VfioBusMode::PCI,
bus_mode,
driver_type,
config: dev_info.clone(),
devices,
@@ -278,14 +297,22 @@ impl VfioDevice {
// normal VFIO BDF: 0000:04:00.0
// mediated VFIO BDF: 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
fn get_vfio_device_type(&self, device_sys_path: String) -> Result<VfioDeviceType> {
fn get_vfio_device_type(
&self,
device_sys_path: String,
iommu_dev_path: PathBuf,
) -> Result<VfioDeviceType> {
let mut tokens: Vec<&str> = device_sys_path.as_str().split(':').collect();
let vfio_type = match tokens.len() {
3 => VfioDeviceType::Normal,
_ => {
tokens = device_sys_path.split('-').collect();
if tokens.len() == 5 {
VfioDeviceType::Mediated
if iommu_dev_path.to_string_lossy().contains("vfio_ap") {
VfioDeviceType::MediatedAp
} else {
VfioDeviceType::MediatedPci
}
} else {
VfioDeviceType::Error
}
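The token-count heuristic in `get_vfio_device_type` can be sketched on its own: a normal device name is a BDF with three `:`-separated tokens, a mediated device is a UUID with five `-`-separated tokens, and the IOMMU path disambiguates AP from PCI. Names below are illustrative:

```rust
// Illustrative reduction of get_vfio_device_type's classification:
// "0000:04:00.0" -> normal PCI device (3 colon-separated tokens)
// "83b8f4f2-..." -> mediated device (5 dash-separated tokens); the
//                   IOMMU device path decides AP vs PCI.
fn classify_vfio(device_name: &str, iommu_dev_path: &str) -> &'static str {
    if device_name.split(':').count() == 3 {
        return "normal";
    }
    if device_name.split('-').count() == 5 {
        if iommu_dev_path.contains("vfio_ap") {
            "mediated-ap"
        } else {
            "mediated-pci"
        }
    } else {
        "error"
    }
}
```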
@@ -329,20 +356,24 @@ impl VfioDevice {
dev_file_name: String,
iommu_dev_path: PathBuf,
) -> Result<(Option<String>, String, VfioDeviceType)> {
let vfio_type = self.get_vfio_device_type(dev_file_name.clone())?;
let vfio_type = self.get_vfio_device_type(dev_file_name.clone(), iommu_dev_path.clone())?;
match vfio_type {
VfioDeviceType::Normal => {
let dev_bdf = get_device_bdf(dev_file_name.clone());
let dev_sys = [SYS_BUS_PCI_DEVICES, dev_file_name.as_str()].join("/");
Ok((dev_bdf, dev_sys, vfio_type))
}
VfioDeviceType::Mediated => {
VfioDeviceType::MediatedPci | VfioDeviceType::MediatedAp => {
// sysfsdev eg. /sys/devices/pci0000:00/0000:00:02.0/f79944e4-5a3d-11e8-99ce-479cbab002e4
let sysfs_dev = Path::new(&iommu_dev_path).join(dev_file_name);
let dev_sys = self
.get_sysfs_device(sysfs_dev)
.context("get sysfs device failed")?;
if vfio_type == VfioDeviceType::MediatedAp {
return Ok((None, dev_sys, vfio_type));
}
let dev_bdf = if let Some(dev_s) = get_mediated_device_bdf(dev_sys.clone()) {
get_device_bdf(dev_s)
} else {
@@ -376,25 +407,31 @@ impl VfioDevice {
.get_vfio_device_details(device_name.to_owned(), iommu_devs_path)
.context("get vfio device details failed")?;
// It's safe as BDF really exists.
let dev_bdf = vfio_dev_details.0.unwrap();
let dev_vendor_class = self
.get_vfio_device_vendor_class(device_name)
.context("get property device and vendor failed")?;
// BDF exists only for PCI devices
// For AP devices, the BDF is not available.
if let Some(bdf) = vfio_dev_details.0 {
let dev_vendor_class = self
.get_vfio_device_vendor_class(device_name)
.context("get property device and vendor failed")?;
let parts: Vec<&str> = device_name.splitn(2, ':').collect();
let domain_part = parts.first().context("missing domain segment")?;
let vfio_dev = HostDevice {
domain: domain_part.to_string(),
bus_slot_func: dev_bdf.clone(),
device_vendor_class: Some(dev_vendor_class),
sysfs_path: vfio_dev_details.1,
vfio_type: vfio_dev_details.2,
..Default::default()
};
Ok(vfio_dev)
let parts: Vec<&str> = device_name.splitn(2, ':').collect();
let domain_part = parts.first().context("missing domain segment")?;
let vfio_dev = HostDevice {
domain: domain_part.to_string(),
bus_slot_func: bdf.clone(),
device_vendor_class: Some(dev_vendor_class),
sysfs_path: vfio_dev_details.1,
vfio_type: vfio_dev_details.2,
..Default::default()
};
Ok(vfio_dev)
} else {
Ok(HostDevice {
sysfs_path: vfio_dev_details.1,
vfio_type: vfio_dev_details.2,
..Default::default()
})
}
}
// filter Host or PCI Bridges that are in the same IOMMU group as the
@@ -507,6 +544,7 @@ impl Device for VfioDevice {
Ok(())
}
Err(e) => {
error!(sl!(), "failed to attach vfio device: {:?}", e);
self.decrease_attach_count().await?;
unregister_pcie_device!(self, pcie_topo)?;
return Err(e);


@@ -6,7 +6,8 @@
use crate::device::topology::{PCIePortBusPrefix, TopologyPortDevice, DEFAULT_PCIE_ROOT_BUS};
use crate::qemu::qmp::get_qmp_socket_path;
use crate::utils::{
chown_to_parent, clear_cloexec, create_vhost_net_fds, open_named_tuntap, SocketAddress,
chown_to_parent, clear_cloexec, create_vhost_net_fds, open_named_tuntap, uses_native_ccw_bus,
SocketAddress,
};
use crate::{kernel_param::KernelParams, Address, HypervisorConfig};
@@ -79,8 +80,8 @@ impl Display for VirtioBusType {
}
}
fn bus_type(config: &HypervisorConfig) -> VirtioBusType {
if config.machine_info.machine_type.contains("-ccw-") {
fn bus_type() -> VirtioBusType {
if uses_native_ccw_bus() {
VirtioBusType::Ccw
} else {
VirtioBusType::Pci
@@ -2210,7 +2211,7 @@ pub struct QemuCmdLine<'a> {
impl<'a> QemuCmdLine<'a> {
pub fn new(id: &str, config: &'a HypervisorConfig) -> Result<QemuCmdLine<'a>> {
let ccw_subchannel = match bus_type(config) {
let ccw_subchannel = match bus_type() {
VirtioBusType::Ccw => Some(CcwSubChannel::new()),
_ => None,
};
@@ -2242,11 +2243,11 @@ impl<'a> QemuCmdLine<'a> {
qemu_cmd_line.add_template();
}
if bus_type(config) != VirtioBusType::Ccw {
if bus_type() != VirtioBusType::Ccw {
qemu_cmd_line.add_rng();
}
if bus_type(config) != VirtioBusType::Ccw && config.device_info.default_bridges > 0 {
if bus_type() != VirtioBusType::Ccw && config.device_info.default_bridges > 0 {
qemu_cmd_line.add_bridges(config.device_info.default_bridges);
}
@@ -2314,16 +2315,10 @@ impl<'a> QemuCmdLine<'a> {
fn add_scsi_controller(&mut self) {
let devno = get_devno_ccw(&mut self.ccw_subchannel, "scsi0");
let mut virtio_scsi = DeviceVirtioScsi::new(
"scsi0",
should_disable_modern(),
bus_type(self.config),
devno,
);
let mut virtio_scsi =
DeviceVirtioScsi::new("scsi0", should_disable_modern(), bus_type(), devno);
if self.config.device_info.enable_iommu_platform
&& bus_type(self.config) == VirtioBusType::Ccw
{
if self.config.device_info.enable_iommu_platform && bus_type() == VirtioBusType::Ccw {
virtio_scsi.set_iommu_platform(true);
}
@@ -2352,7 +2347,7 @@ impl<'a> QemuCmdLine<'a> {
self.devices.push(Box::new(virtiofsd_socket_chardev));
let bus_type = bus_type(self.config);
let bus_type = bus_type();
let devno = get_devno_ccw(&mut self.ccw_subchannel, chardev_name);
let mut virtiofs_device = DeviceVhostUserFs::new(chardev_name, mount_tag, bus_type, devno);
virtiofs_device.set_queue_size(queue_size);
@@ -2389,15 +2384,13 @@ impl<'a> QemuCmdLine<'a> {
clear_cloexec(vhostfd.as_raw_fd()).context("clearing O_CLOEXEC failed on vsock fd")?;
let devno = get_devno_ccw(&mut self.ccw_subchannel, "vsock-0");
let mut vhost_vsock_pci = VhostVsock::new(vhostfd, guest_cid, bus_type(self.config), devno);
let mut vhost_vsock_pci = VhostVsock::new(vhostfd, guest_cid, bus_type(), devno);
if !self.config.disable_nesting_checks && should_disable_modern() {
vhost_vsock_pci.set_disable_modern(true);
}
if self.config.device_info.enable_iommu_platform
&& bus_type(self.config) == VirtioBusType::Ccw
{
if self.config.device_info.enable_iommu_platform && bus_type() == VirtioBusType::Ccw {
vhost_vsock_pci.set_iommu_platform(true);
}
@@ -2449,11 +2442,8 @@ impl<'a> QemuCmdLine<'a> {
self.devices
.push(Box::new(DeviceScsiHd::new(device_id, "scsi0.0", devno)));
} else {
self.devices.push(Box::new(DeviceVirtioBlk::new(
device_id,
bus_type(self.config),
devno,
)));
self.devices
.push(Box::new(DeviceVirtioBlk::new(device_id, bus_type(), devno)));
}
Ok(())
@@ -2484,10 +2474,8 @@ impl<'a> QemuCmdLine<'a> {
pub fn add_console(&mut self, console_socket_path: &str) {
let devno = get_devno_ccw(&mut self.ccw_subchannel, "serial0");
let mut serial_dev = DeviceVirtioSerial::new("serial0", bus_type(self.config), devno);
if self.config.device_info.enable_iommu_platform
&& bus_type(self.config) == VirtioBusType::Ccw
{
let mut serial_dev = DeviceVirtioSerial::new("serial0", bus_type(), devno);
if self.config.device_info.enable_iommu_platform && bus_type() == VirtioBusType::Ccw {
serial_dev.set_iommu_platform(true);
}
self.devices.push(Box::new(serial_dev));
@@ -2777,13 +2765,12 @@ pub fn get_network_device(
}
let devno = get_devno_ccw(ccw_subchannel, &netdev.id);
let mut virtio_net_device =
DeviceVirtioNet::new(&netdev.id, guest_mac, bus_type(config), devno);
let mut virtio_net_device = DeviceVirtioNet::new(&netdev.id, guest_mac, bus_type(), devno);
if should_disable_modern() {
virtio_net_device.set_disable_modern(true);
}
if config.device_info.enable_iommu_platform && bus_type(config) == VirtioBusType::Ccw {
if config.device_info.enable_iommu_platform && bus_type() == VirtioBusType::Ccw {
virtio_net_device.set_iommu_platform(true);
}
if config.network_info.network_queues > 1 {


@@ -822,6 +822,7 @@ impl QemuInner {
primary_device.guest_pci_path = qmp.hotplug_vfio_device(
&primary_device.hostdev_id,
&primary_device.sysfs_path,
&primary_device.bus_slot_func,
&vfiodev.driver_type,
&vfiodev.bus,


@@ -719,25 +719,41 @@ impl Qmp {
pub fn hotplug_vfio_device(
&mut self,
hostdev_id: &str,
sysfs_path: &str,
bus_slot_func: &str,
driver: &str,
bus: &str,
) -> Result<Option<PciPath>> {
let mut vfio_args = Dictionary::new();
let bdf = if !bus_slot_func.starts_with("0000") {
format!("0000:{}", bus_slot_func)
} else {
bus_slot_func.to_owned()
};
vfio_args.insert("addr".to_owned(), "0x0".into());
vfio_args.insert("host".to_owned(), bdf.into());
vfio_args.insert("multifunction".to_owned(), "off".into());
let vfio_device_add = qmp::device_add {
driver: driver.to_string(),
bus: Some(bus.to_string()),
id: Some(hostdev_id.to_string()),
arguments: vfio_args,
let (vfio_device_add, early_return) = match driver {
"vfio-ap" => {
vfio_args.insert("sysfsdev".to_owned(), sysfs_path.to_string().into());
let device_add = qmp::device_add {
driver: driver.to_string(),
bus: None,
id: Some(hostdev_id.to_string()),
arguments: vfio_args,
};
(device_add, Some(Ok(None)))
}
_ => {
let bdf = if !bus_slot_func.starts_with("0000") {
format!("0000:{}", bus_slot_func)
} else {
bus_slot_func.to_owned()
};
vfio_args.insert("addr".to_owned(), "0x0".into());
vfio_args.insert("host".to_owned(), bdf.into());
vfio_args.insert("multifunction".to_owned(), "off".into());
let device_add = qmp::device_add {
driver: driver.to_string(),
bus: Some(bus.to_string()),
id: Some(hostdev_id.to_string()),
arguments: vfio_args,
};
(device_add, None)
}
};
info!(sl!(), "vfio_device_add: {:?}", vfio_device_add.clone());
@@ -769,6 +785,11 @@ impl Qmp {
.set_read_timeout(Some(Duration::from_millis(DEFAULT_QMP_READ_TIMEOUT)))?;
}
// For AP devices, we don't need to get the PCI path as it's not available.
if let Some(result) = early_return {
return result;
}
let pci_path = self
.get_device_by_qdev_id(hostdev_id)
.context("get device by qdev_id failed")?;
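The domain normalization buried in the `_` arm is worth isolating: QEMU's `host=` property expects a full `DDDD:BB:SS.F` address, and the code assumes domain `0000` when the caller passes only `BB:SS.F`. A standalone sketch (helper name is hypothetical):

```rust
// Sketch of the BDF normalization in hotplug_vfio_device: prepend the
// default PCI domain when the caller omitted it.
fn normalize_bdf(bus_slot_func: &str) -> String {
    if !bus_slot_func.starts_with("0000") {
        format!("0000:{}", bus_slot_func)
    } else {
        bus_slot_func.to_owned()
    }
}
```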


@@ -21,6 +21,7 @@ use kata_types::{
build_path,
config::{Hypervisor, KATA_PATH},
};
use lazy_static::lazy_static;
use nix::{
fcntl,
sched::{setns, CloneFlags},
@@ -369,6 +370,51 @@ pub fn get_cmd_output(cmd: &str, args: &[&str]) -> Result<String> {
Ok(String::from_utf8(result.stdout)?)
}
// The presence of this sysfs directory is the fundamental architectural proof.
const CCW_BUS_PATH: &str = "/sys/bus/ccw/devices";
// These drivers are specific to traditional mainframe I/O and prove
// native CCW support, even in virtualized environments.
const NATIVE_CCW_DRIVERS: [&str; 3] = [
"3270", // IBM 3270 Terminal Driver
"dasd-eckd", // Mainframe DASD (Disk) Driver
"zfcp", // Fibre Channel Protocol Driver (FICON)
];
lazy_static! {
static ref NATIVE_CCW_BUS_CACHE: bool = {
if !Path::new(CCW_BUS_PATH).exists() {
false
} else {
let drivers_path = PathBuf::from("/sys/bus/ccw/drivers");
let mut native_driver_found = false;
for driver_name in NATIVE_CCW_DRIVERS.iter() {
let driver_path = drivers_path.join(driver_name);
if driver_path.exists() {
native_driver_found = true;
break;
}
}
native_driver_found
}
};
}
/// Detects if the system uses native CCW (Channel Command Word) bus.
/// This function checks for the presence of CCW bus infrastructure in sysfs
/// and verifies that native mainframe drivers are available.
///
/// The result is cached after the first call to avoid repeated IO operations.
///
/// # Returns
/// `true` if native CCW bus is detected, `false` otherwise.
pub fn uses_native_ccw_bus() -> bool {
*NATIVE_CCW_BUS_CACHE
}
#[cfg(test)]
mod tests {
use std::fs;
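The cached detection above can also be sketched with std's `OnceLock` instead of `lazy_static`, with the probe split out so it is testable against arbitrary paths. All names here are hypothetical, not the crate's:

```rust
use std::path::Path;
use std::sync::OnceLock;

// Hypothetical sketch of the cached CCW probe: the bus directory must
// exist AND at least one native mainframe driver must be registered.
fn probe_native_ccw(bus: &Path, drivers: &Path, native: &[&str]) -> bool {
    bus.exists() && native.iter().any(|d| drivers.join(d).exists())
}

static CCW_CACHE: OnceLock<bool> = OnceLock::new();

// First call performs the sysfs IO; later calls return the cached result.
fn uses_native_ccw_bus_sketch() -> bool {
    *CCW_CACHE.get_or_init(|| {
        probe_native_ccw(
            Path::new("/sys/bus/ccw/devices"),
            Path::new("/sys/bus/ccw/drivers"),
            &["3270", "dasd-eckd", "zfcp"],
        )
    })
}
```

Splitting the probe from the cache keeps the sysfs layout assumptions in one place while the caching stays a one-liner.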


@@ -40,9 +40,9 @@ tempfile = "3.19.1"
hex = "0.4"
## Dependencies from `rust-netlink`
netlink-packet-route = "0.22"
netlink-packet-route = "0.26"
netlink-sys = "0.8"
rtnetlink = "0.16"
rtnetlink = "0.19"
# Local dependencies
agent = { workspace = true }

Some files were not shown because too many files have changed in this diff.