Compare commits

...

359 Commits

Author SHA1 Message Date
Fabiano Fidêncio
1eabd6c729 tests: k8s: coco: Use a var for AUTHENTICATED_IMAGE_PASSWORD
It's a bot password that only has read permissions for that image, so
there's no need to use a secret here.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-15 23:12:02 +01:00
Fabiano Fidêncio
b1ec7d0c02 build: ci: remove KBUILD_SIGN_PIN entirely
Drop kernel build signing (KBUILD_SIGN_PIN) from CI and from all
scripts that referenced it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-15 18:52:18 +01:00
Fabiano Fidêncio
83dce477d0 build: ci: remove CI_HKD_PATH and s390x boot-image-se build
Drop the CI_HKD_PATH secret and the build-asset-boot-image-se job from
the s390x tarball workflow; the artefact that depended on the host key
was never released anyway.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-15 16:24:30 +01:00
Fabiano Fidêncio
d000acfe08 infra: fix multi-arch manifest publish
Per-arch images were failing publish-multiarch-manifest with 'X is a manifest
list' because Buildx now enables attestations by default, so each arch tag
became an image index. Use 'docker buildx imagetools create' instead of
'docker manifest create' so we can merge those indexes into the final
multi-arch manifest while keeping provenance and SBOM on per-arch images.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-14 19:49:00 +01:00
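
A minimal sketch of the new publish step, assuming a Rust helper that
shells out to the Docker CLI (the function and tag names are illustrative):

    use std::process::Command;

    // `docker buildx imagetools create` accepts image *indexes* as sources,
    // so per-arch tags that carry provenance/SBOM attestations can be merged
    // into one multi-arch manifest; `docker manifest create` cannot do this.
    fn publish_multiarch(tag: &str, per_arch_tags: &[&str]) -> std::io::Result<bool> {
        let status = Command::new("docker")
            .args(["buildx", "imagetools", "create", "-t", tag])
            .args(per_arch_tags)
            .status()?;
        Ok(status.success())
    }
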
Fabiano Fidêncio
02c9a4b23c kata-deploy: Temporarily comment GPU specific labels
We depend on the GPU Operator v26.3 release, which is not out yet.
Although we have been testing with it, it is not publicly available,
so the GPU runtime classes would break for anyone actually trying to
use them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-14 09:25:14 +01:00
Fabiano Fidêncio
5106e7b341 build: Add gnupg to the agent's builder container
Otherwise we'll fail to check gperf's GPG signing key when needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
79b5022a5a kata-ctl: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
30ebc4241e genpolicy: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
87d1979c84 agent-ctl: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
90dbd3f562 agent: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
7f77948658 versions: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
Aurélien Bombo
981f693a88 Merge pull request #11140 from balintTobik/hyperv_warning
runtime: refactor hypervisor devices cgroup creation
2026-02-13 15:16:09 -06:00
Fabiano Fidêncio
d8acc403c8 kata-deploy: set CRI images runtime_platform snapshotter for containerd v3
In containerd config v3 the CRI plugin is split into runtime and images,
and setting the snapshotter only on the runtime plugin is not enough for image
pull/prepare.

The images plugin must have runtime_platform.<runtime>.snapshotter so it
uses the correct snapshotter per runtime (e.g. nydus, erofs).

A PR on the containerd side is open so we can rely on the runtime plugin
snapshotter alone: https://github.com/containerd/containerd/pull/12836

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 22:15:02 +01:00
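
For illustration, a fragment of the kind kata-deploy now has to emit for
the images plugin might look like this (a sketch; the exact plugin name
and runtime class are assumptions):

    // Drop-in TOML fragment for containerd config v3: besides the runtime
    // plugin, the images plugin needs its own
    // runtime_platform.<runtime>.snapshotter entry.
    const IMAGES_SNAPSHOTTER_FRAGMENT: &str = r#"
    [plugins."io.containerd.cri.v1.images".runtime_platform.kata-qemu]
      snapshotter = "nydus"
    "#;
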
Fabiano Fidêncio
2930c68c0b ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats
This is a follow-up for 25962e9325

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 20:56:08 +01:00
Fabiano Fidêncio
f6e0a7c33c scripts: use temporary GPG home when verifying cached gperf tarball
In CI the default GPG keyring is often read-only or missing, so
'gpg --import' of the cached keyring fails and verification cannot
succeed. Use a temporary GNUPGHOME for import and verify so cached
gperf can be verified without writing to the system keyring.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 19:39:55 +01:00
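
A sketch of the verification flow with a throwaway GNUPGHOME (Rust for
illustration, assuming the tempfile crate; the actual change lives in a
shell script, and the paths are placeholders):

    use std::process::Command;

    // Import the cached keyring and verify the signature against a private,
    // temporary GPG home, so nothing touches the (possibly read-only)
    // system keyring.
    fn verify_with_temp_keyring(keyring: &str, sig: &str, tarball: &str) -> std::io::Result<bool> {
        let tmp = tempfile::tempdir()?; // removed on drop
        let run = |args: &[&str]| -> std::io::Result<bool> {
            Ok(Command::new("gpg")
                .env("GNUPGHOME", tmp.path())
                .args(args)
                .status()?
                .success())
        };
        Ok(run(&["--batch", "--import", keyring])?
            && run(&["--batch", "--verify", sig, tarball])?)
    }
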
stevenhorsman
55a89f6836 runtime: doc: Remove usage of golang.org/x/net/context
This package is deprecated and we aren't using it any more

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
stevenhorsman
06246ea18b csi-kata-directvolume: Remove usage of golang.org/x/net/context
This package is deprecated, so use the standard library context
package instead

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
stevenhorsman
f2fae93785 csi-kata-directvolume: Bump x/net to v0.50
Remediates CVEs:
- GO-2026-4440
- GO-2026-4441

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
stevenhorsman
74d4469dab ci/openshift-ci: Bump x/net to v0.50
Remediates CVEs:
- GO-2026-4440
- GO-2026-4441

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
Steve Horsman
bb867149bb Merge pull request #12514 from fidencio/topic/nvidia-try-to-improve-genpolicy-failures
tests: nvidia: Fix genpolicy error when pulling nvcr.io images
2026-02-13 16:34:00 +00:00
Joji Mekkattuparamban
f3bba08851 kata-deploy: add node selector to nvidia runtime classes
The CC runtime classes kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx
are mutually exclusive with kata-qemu-nvidia-gpu, as dictated by the GPU
CC mode setting. In order to properly support a cluster that has both CC and
non-CC nodes, we use a node selector so that scheduling is consistent with the
GPU mode. The GPU Operator sets the label nvidia.com/cc.ready.state=[true, false]
to indicate the GPU mode setting.

Fixes #12431

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-02-13 15:58:06 +01:00
Fabiano Fidêncio
8cb7d0be9d tests: nvidia: Fix genpolicy error when pulling nvcr.io images
genpolicy pulls image manifests from nvcr.io to generate policy and was
failing with 'UnauthorizedError' because it had no registry credentials.

Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential()
in registry.rs, which reads from DOCKER_CONFIG/config.json. Add
setup_genpolicy_registry_auth() to create a Docker config with nvcr.io
auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it
can authenticate when pulling manifests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 13:12:55 +01:00
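
A sketch of the Docker config that docker_credential::get_credential()
can pick up ("$oauthtoken" is the conventional nvcr.io user name; the
helper name and paths are illustrative):

    use base64::Engine;
    use serde_json::json;

    // Write a minimal DOCKER_CONFIG/config.json with an nvcr.io auth entry,
    // so genpolicy can authenticate when pulling manifests.
    fn write_nvcr_docker_config(dir: &std::path::Path, ngc_api_key: &str) -> std::io::Result<()> {
        let auth = base64::engine::general_purpose::STANDARD
            .encode(format!("$oauthtoken:{ngc_api_key}"));
        let config = json!({ "auths": { "nvcr.io": { "auth": auth } } });
        std::fs::write(dir.join("config.json"), config.to_string())
        // ...and export DOCKER_CONFIG=<dir> before invoking genpolicy.
    }
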
Fabiano Fidêncio
f4dcb66a3c ci: add workflow to push ORAS tarball cache
Add push-oras-tarball-cache workflow that runs on push to main when
versions.yaml changes (and on workflow_dispatch). It populates the
ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml.

Remove the push_to_cache call from download-with-oras-cache.sh since
it was never triggered in CI. Cache population is now done solely by the
new workflow and by populate-oras-tarball-cache.sh when run manually.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 12:57:48 +01:00
Balint Tobik
295a6a81d0 runtime: refactor hypervisor devices cgroup creation
Add hypervisor devices to the cgroup separately, so that irrelevant
warnings are omitted, and fail if none of them are available.
Also fix a test case to reload removed kernel modules for later test
cases, and skip some tests on ARM due to lack of virtualization support.
Fixes #6656

Signed-off-by: Balint Tobik <btobik@redhat.com>
2026-02-13 09:23:08 +01:00
Aurélien Bombo
14be9504e7 Merge pull request #12506 from kata-containers/sprt/gperf-mirror
versions: Switch gperf mirror again
2026-02-12 17:00:17 -06:00
Fabiano Fidêncio
a01e95b988 kata-deploy: test k3s/rke2 template handling / version checks
Add tests for the split_non_toml_header helper that strips Go template
directives before TOML parsing, and for every TOML operation (set, get,
append, remove, set_array) on files that start with {{ template "base" . }}.

Also convert the containerd version detection tests in manager.rs from
individual #[test] functions with helper wrappers to parametrized #[rstest]
cases, which makes them more readable and easier to extend.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-12 22:30:08 +01:00
Fabiano Fidêncio
2e7633674f kata-deploy: use k3s/rke2 base template
K3s docs (https://docs.k3s.io/advanced#configuring-containerd) say that the
right way to customize containerd is to extend the base template with
{{ template "base" . }} and append your own TOML blocks, rather than copying a
prerendered config.toml into the template file.

We were copying config.toml into config.toml.tmpl / config-v3.toml.tmpl, which
meant we were replacing the K3s defaults with a snapshot that gets stale as
soon as K3s is upgraded.

Now we create the template files with just the base directive and let our
regular set_toml_value code path append the Kata runtime configuration on top.

To make that work, the TOML utils learned to handle files that start with a
Go template line ({{ ... }}): strip it before parsing, put it back when writing.
This keeps the K3s/RKE2 path identical to every other runtime -- no special
append logic needed.

refs:
* k3s: https://docs.k3s.io/advanced#configuring-containerd
* rke2: https://docs.rke2.io/advanced?_highlight=conyainerd#configuring-containerd

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-12 22:30:08 +01:00
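
A sketch of the header-stripping idea the TOML utils learned (the
helper's exact shape is an assumption):

    // Peel off a leading Go template line such as `{{ template "base" . }}`
    // so the remainder parses as TOML; the caller prepends the header again
    // when writing the file back.
    fn split_non_toml_header(content: &str) -> (Option<&str>, &str) {
        match content.split_once('\n') {
            Some((first, rest)) if first.trim_start().starts_with("{{") => (Some(first), rest),
            _ => (None, content),
        }
    }
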
Aurélien Bombo
199e1ab16c versions: Switch gperf mirror again
The mirror introduced by #11178 still breaks quite often so apply this as a
quick fix.

A proper solution would probably be to load balance like in #12453.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-12 13:41:19 -06:00
Fabiano Fidêncio
6a3bbb1856 tests: Retry k8s deployment
We've seen a lot of spurious issues when deploying the infra needed for
the tests.  Let's give it a few tries before actually failing.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-12 20:13:59 +01:00
Manuel Huber
ed7de905b5 build: Tighten upstream download path for ORAS
The gperf-3.3 tarball frequently fails to download on my end with
cryptic error messages such as: "tar: This does not look like a tar
archive". This change tightens the download logic a bit: we now fail
at the point in time when we're supposed to fail. This way we detect
rate-limiting issues right away, and the actual hashsum and signature
checks become effective instead of mere printouts.

This change also updates the key reference and allows for an array,
for instance when a different signer was used for a cached vs.
upstream version.
The change also makes it clear that signature verification is only
implemented for the gperf tarball. Improvements can be made in a
subsequent change.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-12 19:20:35 +01:00
Fabiano Fidêncio
9fc5be47d0 kata-deploy: fix custom runtime config path for runtime-rs shims
Custom runtimes whose base config lives under runtime-rs/ (e.g. dragonball,
cloud-hypervisor) were not found because the path was always built under
share/defaults/kata-containers/. Use get_kata_containers_original_config_path
for the handler so rust shim configs are read from .../runtime-rs/.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-12 18:08:47 +01:00
Fabiano Fidêncio
50923b6d62 kata-deploy: run cleanup on uninstall via DaemonSet preStop
On helm uninstall let's rely on a preStop hook to run kata-deploy
cleanup so each pod cleans its node before exiting.

We **must** keep RBAC (resource-policy: keep) so pods retain API access
during termination, and then can properly delete the NodeFeatureRules
and remove the labels from the nodes.

The post-delete hook Job, which runs on a single node, is now only
responsible for cleaning up the kept RBAC (a cluster-wide resource) after
uninstall, leaving no resource or artefact behind.

The changes in this commit lead to a "resources were kept" message when
running `helm uninstall`, which we document as being normal, as the
post-delete Job will remove those resources.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-11 22:05:10 +01:00
Fabiano Fidêncio
6e0cbc28a3 kata-deploy: fix node label removal
When removing a node label, JSON merge patch semantics require setting
the key to null; omitting the key leaves it unchanged.

Fix label_node to send a patch with the label key set to null so the API
server actually removes katacontainers.io/kata-runtime.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-11 22:05:10 +01:00
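
The patch shape this implies, sketched with serde_json (JSON merge patch
semantics per RFC 7386):

    use serde_json::{json, Map, Value};

    // To delete a label via JSON merge patch, the key must be present and
    // set to null; a patch that merely omits the key is a no-op for it.
    fn remove_label_patch(label: &str) -> Value {
        let mut labels = Map::new();
        labels.insert(label.to_string(), Value::Null);
        json!({ "metadata": { "labels": labels } })
    }
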
Fabiano Fidêncio
510d2a69ae kata-deploy: exit with 0 on SIGTERM in install mode
Wait for SIGTERM after install and exit(0) so the container terminates
cleanly. If registering the SIGTERM handler fails, log a warning and
sleep forever instead of exiting with an error (fallback to the old
behaviour).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-11 22:05:10 +01:00
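
A sketch of that shutdown path, assuming a tokio-based signal handler
(the helper name is illustrative):

    use std::time::Duration;
    use tokio::signal::unix::{signal, SignalKind};

    // Block until SIGTERM and exit 0 so the pod terminates cleanly; if the
    // handler cannot be registered, warn and sleep forever (old behaviour).
    async fn wait_for_sigterm_then_exit() {
        match signal(SignalKind::terminate()) {
            Ok(mut sigterm) => {
                sigterm.recv().await;
                std::process::exit(0);
            }
            Err(e) => {
                eprintln!("warning: failed to register SIGTERM handler: {e}");
                loop {
                    tokio::time::sleep(Duration::from_secs(3600)).await;
                }
            }
        }
    }
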
Mikko Ylinen
25962e9325 tests/coco: disable k8s-sandbox-vcpus-allocation.bats for TDX
After the move to Linux 6.17 and QEMU 10.2 from Kata,
k8s-sandbox-vcpus-allocation.bats started failing on TDX.

2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created
2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created
2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created
2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met
2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits
2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits
2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating

A manual test without agent policies added seems to work OK, but disable
the test for now to get CI stable.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-11 22:02:59 +01:00
stevenhorsman
006a5d5141 versions: Tidy up versions file
- We don't use containerd.latest as the comment on it suggests
- We also don't have any references to `sriov-network-device`
so remove that and the plugins section.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-11 20:49:53 +01:00
Dan Mihai
9d763e9d5a Merge pull request #12439 from sespiros/genpolicy-suppress-yaml-stdout
genpolicy: suppress YAML output when --{base64/raw}-out are used
2026-02-11 10:27:01 -08:00
Spyros Seimenis
282bfc9f14 genpolicy: suppress YAML output when --{base64/raw}-out are used
This will suppress YAML output only if the input is passed via
stdin. If --{base64/raw}-out is passed alongside a YAML file, the
encoded annotation or the policy data, respectively, will be printed
to stdout as before.

Fixes #12438

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>
2026-02-11 14:08:30 +02:00
Hyounggyu Choi
c84e37f6ac Merge pull request #12486 from BbolroC/cpu-hotplug-s390x-runtime-rs
runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P
2026-02-11 09:40:21 +01:00
Hyounggyu Choi
67f54bdcb5 tests: Remove skip condition for runtime-rs on s390x in k8s-cpu-ns
This commit removes the skip condition for qemu-runtime-rs on s390x
in k8s-cpu-ns.bats.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-11 05:52:13 +01:00
Hyounggyu Choi
eab77a26ab runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P
As s390x and ppc64 use a flat CPU topology without sockets and threads,
this commit skips the socket_id and thread_id properties for vCPU hotplug
on these architectures instead of aborting the operation.
This change is in line with those from the Go runtime:
- isSocketIDSupported()
- isThreadIDSupported()

Fixes: #12155

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-11 05:52:13 +01:00
Alex Lyn
c53910eb1b Merge pull request #12408 from Apokleos/netdev-multiq
runtime-rs: Add support configurable network_queues via configuration and annotation
2026-02-11 09:34:58 +08:00
stevenhorsman
a115d6d858 ci: Add copyright and license to shellcheckrc
Make the static-checks happy

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
stevenhorsman
15d6a681ed doc: Fix spelling issues
Put things in backticks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
stevenhorsman
e84d234721 doc: Update broken/slow URLs
Update the URLs to better/existing links

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
5c0269881e tests: Make editorconfig-checker happy
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
  *.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
  checks project code
- Leave generated and binary assets unchanged (excluded from checker)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
34199b09eb runtime-rs: Properly parse containerd runtime options to extract ConfigPath
The runtime-rs shim was failing to load its configuration when deployed
via kata-deploy because it couldn't correctly parse the ConfigPath passed
by containerd. The previous implementation naively skipped the first 2
bytes of the options and interpreted the rest as a UTF-8 string, which
doesn't work since containerd passes a properly serialized protobuf
message of type runtimeoptions.v1.Options.

This change adds the runtimeoptions.proto definition to the protocols
crate and updates the load_config function to correctly deserialize the
protobuf message and extract the config_path field, matching how the Go
runtime handles this via typeurl.UnmarshalAny.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
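
The decode step, sketched here with prost for illustration (the actual
code uses the repo's protocols crate; field numbers are assumed from
containerd's runtimeoptions.v1 definition):

    use prost::Message;

    // Mirror of runtimeoptions.v1.Options, reduced to the fields we need.
    #[derive(Clone, PartialEq, prost::Message)]
    pub struct Options {
        #[prost(string, tag = "1")]
        pub type_url: String,
        #[prost(string, tag = "2")]
        pub config_path: String,
    }

    // Decode the whole protobuf message instead of skipping two bytes and
    // hoping the rest is a UTF-8 string.
    fn config_path_from_options(raw: &[u8]) -> Result<String, prost::DecodeError> {
        Ok(Options::decode(raw)?.config_path)
    }
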
Fabiano Fidêncio
cb652e0da1 tests: Update NVRC trace to use drop-in config mechanism
Update the enable_nvrc_trace() function to use the new drop-in
configuration mechanism instead of directly modifying the base
configuration file. The function now creates a 90-nvrc-trace.toml
drop-in file that properly combines existing kernel parameters
with the nvrc.log=trace setting.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
4cb2aea9dd kata-deploy: Document drop-in configuration and add warning to config files
When kata-deploy installs Kata Containers, the base configuration files
should not be modified directly. This change adds documentation explaining
how to use drop-in configuration files for customization, and prepends a
warning comment to all deployed configuration files reminding users to use
drop-in files instead.

The warning is added to both standard shim configurations and custom
runtime configurations. It includes a brief explanation of how drop-in
files work and points users to the documentation for more details.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
d5d561abe5 kata-deploy: Add detailed logging for drop-in configuration
Add clear INFO-level messages when creating drop-in configuration
files, making it easy to understand what kata-deploy is doing during
installation:

- "Setting up runtime directory for shim: X"
- "Generating drop-in configuration files for shim: X"
- "Created drop-in file: <path>"

When DEBUG mode is enabled (via DEBUG=true environment variable),
also log the full content of each drop-in file to aid troubleshooting.

The log level is now automatically set to Debug when the DEBUG
environment variable is set, ensuring debug messages are visible.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
eddd1b507e kata-deploy: Extract common drop-in generation into shared helper
Deduplicate the drop-in file generation logic between configure_shim_config
and install_custom_runtime_configs by extracting it into a shared
write_common_drop_ins helper function.

This ensures both standard and custom runtimes use the same code path
for generating drop-in configuration files.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
577aa6b319 kata-deploy: Propagate drop-in configs to custom runtime classes
Ensure custom runtime classes receive the same drop-in configuration
files as standard runtimes:
- 10-installation-prefix.toml (if custom dest_dir)
- 20-debug.toml (if debug enabled)
- 30-kernel-params.toml (proxy + debug kernel params)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
8c60a88bda kata-deploy: Add combined kernel_params drop-in
Add a combined drop-in file (30-kernel-params.toml) that handles all
kernel_params modifications. This approach reads the base kernel_params
from the original untouched config file and combines them with:
- Proxy settings (agent.https_proxy, agent.no_proxy)
- Debug settings (agent.log=debug, initcall_debug)

Using a single drop-in file for kernel_params avoids the TOML merge
behavior where scalar values are replaced rather than appended.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
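
A sketch of how such a combined drop-in could be assembled (the section
header and helper name are illustrative):

    // Read the base kernel_params from the untouched original config and
    // append proxy/debug params, emitting one drop-in so the TOML merge
    // never replaces the scalar value out from under us.
    fn kernel_params_drop_in(base_params: &str, debug: bool, https_proxy: Option<&str>) -> String {
        let mut params = base_params.trim().to_string();
        if let Some(proxy) = https_proxy {
            params.push_str(&format!(" agent.https_proxy={proxy}"));
        }
        if debug {
            params.push_str(" agent.log=debug initcall_debug");
        }
        format!("[hypervisor.qemu]\nkernel_params = \"{params}\"\n")
    }
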
Fabiano Fidêncio
fae96f1f82 kata-deploy: Add drop-in file for debug configuration
When debug mode is enabled, generate a drop-in configuration file
(20-debug.toml) with the boolean debug flags for hypervisor, runtime,
and agent sections.

Note: kernel_params for debug (agent.log=debug, initcall_debug) will
be handled by a separate combined kernel_params drop-in file to avoid
the TOML merge replacement behavior.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
bb65e516e5 kata-deploy: Add drop-in file for installation prefix
When the installation prefix differs from the default /opt/kata,
generate a drop-in configuration file (10-installation-prefix.toml)
with the adjusted paths instead of modifying the original config file.

This removes the need for adjust_installation_prefix and
adjust_qemu_cmdline functions which are now deleted along with
their tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
cd76d61a3d kata-deploy: Add infrastructure for per-shim drop-in configuration
Instead of modifying original config files directly, set up a per-shim
directory structure that uses symlinks to the original configs and
config.d/ directories for drop-in overrides.

This enables cleaner configuration management where the original files
remain untouched and all kata-deploy customizations are in separate
drop-in files that can be easily inspected and removed.

Directory structure:
  {config_path}/runtimes/{shim}/
  {config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink
  {config_path}/runtimes/{shim}/config.d/

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Paul Meyer
c5ad3f9b26 Merge pull request #12472 from katexochen/p/disable-nvdimm-cc
runtime: disable nvdimm for confidential guest
2026-02-10 14:54:40 +01:00
Steve Horsman
44c86f881b Merge pull request #12466 from ldoktor/gk-pagination
tools.gatekeeper: Add support to paginate workflows
2026-02-10 11:59:57 +00:00
Steve Horsman
a8debc9841 Merge pull request #12476 from stevenhorsman/bump-rust-to-1.91
versions: Bump rust to 1.91
2026-02-10 10:03:01 +00:00
Paul Meyer
76525b97a6 runtime-rs: disable nvdimm for confidential guest
nvdimm isn't supported by confidential guests, so disable it in
the configuration.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2026-02-10 08:38:41 +01:00
Paul Meyer
a5f554922c runtime: disable nvdimm for confidential guest
There is code to disable this at runtime when confidential_guest
is enabled anyway[^1], but it will emit a warning every time. All
the touched configuration files set confidential_guest to true,
so we already know nvdimm isn't supported.

[^1]: 16a7ed6e14/src/runtime/virtcontainers/qemu_amd64.go (L144-L148)

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2026-02-10 08:38:18 +01:00
Lukáš Doktor
f7baa394d4 tools.gatekeeper: Add support to paginate workflows
The number of workflows increased to over 30, so we need to paginate them as
well as jobs. This commit extracts the existing pagination from jobs and
uses it for both jobs and workflows.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-02-10 06:53:47 +00:00
stevenhorsman
120fde28e1 versions: Bump rust to 1.91
Following the agreed toolchain policy, bump Rust to two releases
behind current (1.93), i.e. 1.91.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 06:52:42 +00:00
Alex Lyn
362a4c5714 runtime-rs: Fix multiqueue config propagation and tap initialization
The previous implementation failed to correctly propagate the network
multiqueue configuration, causing the effective queue number to remain 0.
It also mixed up "queue pairs" with "queue number", so tap devices were
opened without proper multiqueue initialization, which caused Cloud
Hypervisor netconfig validation to fail.

This commit fixes the configuration mapping and initializes tap devices
with the correct multiqueue semantics, ensuring Cloud Hypervisor
receives a valid netconfig.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
79f81dae50 runtime-rs: Add network_queues for setting network device multiqueues
To make network_queues configurable, a new option is introduced via the
configuration TOML.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
6723ff5c46 runtime-rs: Add configurable DEFNETQUEUES in Makefile
To make the number of network queues configurable at build time, a
dedicated DEFNETQUEUES variable is added.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
cfc479ef1d kata-types: Add Network device specific annotation for network queues
This commit introduces a new annotation for users to easily set network
queues via "io.katacontainers.config.hypervisor.network_queues".
And the annotation will be mapped into `NetworkInfo.network_queues`
within the configuration.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
61e7875267 kata-types: Adjust the network_queues when loaded from configuration
Adjusts the network queues after loading from a configuration file.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Manuel Huber
a6ca5c6628 ci: add editorconfig checker
This adds a basic configuration for editorconfig checker. The
supplied configuration checks against trailing whitespaces and
issues with newlines.
Example:
| tools/packaging/kernel/configs/fragments/x86_64/numa.conf:
|       Wrong line endings or no final newline
| tools/packaging/release/generate_vendor.sh:
|       44: Trailing whitespace

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-09 15:03:26 -08:00
stevenhorsman
e6d291cf0a trace-forwarder: Bump time to 0.3.47
Bump time to remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
79dc892e18 kata-ctl: Bump time to 0.3.47
Bump time to remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
9e1ddcdde9 agent-ctl: Bump time to 0.3.47
Bump time to remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
f840f9ad54 rust: Bump time to 0.3.47
To remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
ffcb10b6a3 agent: Bump time crate to 0.3.47
Update time to resolve CVE-2026-25727.
Note: this involved bumping the versions of slog-term and slog-json
and bumping the MSRV to 1.88.0 which time 0.3.47 requires.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
33d494b07e kata-deploy: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
2ea29df99a genpolicy: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
fa3b419965 kata-ctl: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
e49a61eea2 agent: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
bc45788356 versions: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
51d35f9261 agent-ctl: Bump bytes to 1.11.1
Remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
Park.Jiyeon
082e25b297 genpolicy: skip serializing VFIO generation-only settings
Skip serializing anno/value regexes and the NVIDIA VFIO device type since they
are generation-time only.

Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
2026-02-09 11:36:34 -08:00
Park.Jiyeon
9231144b99 genpolicy: refactor VFIO settings and support multiple NVIDIA GPU keys
- Moved VFIO-related config from "device_annotations" to a new "devices" section.
- Introduced structured "nvidia" subfield for NVIDIA-specific VFIO settings.
- Replaced hardcoded "nvidia.com/pgpu" with configurable "pgpu_resource_keys".
- Adjusted Rego rules and code to match new config schema.

Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
2026-02-09 11:36:34 -08:00
Park.Jiyeon
5fa5d1934b fix(genpolicy): make NVIDIA GPU resource keys configurable
Allow specifying multiple NVIDIA GPU resource keys via an explicit allowlist.

Keys are now configured under `device_annotations.vfio.nvidia_pgpu_resource_keys`
in genpolicy-settings.json. This removes the previous hardcoded reliance on
`nvidia.com/pgpu` and supports model-specific resource names.

Fixes #12322

Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
2026-02-09 11:36:34 -08:00
Manuel Huber
525192832f tests: Clean up superfluous GPU annotation
This annotation was required for GPU cold-plug before using a
newer device plugin and before querying the pod resources API.
As this annotation is no longer required, clean it up.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-09 11:28:24 -08:00
Konstantin Khlebnikov
5d99a141d9 runtime: add hypervisor options for NUMA topology
With enable_numa=true the hypervisor will expose the host NUMA topology
as is: map VM NUMA nodes to host nodes 1:1 and bind vCPUs to the related
host CPUs.

The "numa_mapping" option allows redefining the NUMA node mapping:
- map each VM node to a particular host node, or to several NUMA nodes
- emulate NUMA on a host without NUMA (useful for tests)

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 20:09:25 +01:00
Fabiano Fidêncio
ab515712d4 kernel: Unify kernel and kernel-confidential
Build a single kernel for both kernel and kernel-confidential on x86_64
and s390x. The kernel is built with TEE support (-x) on those arches only.

This helps to simplify and to maintain the code, and having a single
kernel was the original plan all along.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-09 18:28:23 +01:00
Fabiano Fidêncio
c5b5433866 kernel: Unify nvidia-gpu and nvidia-gpu-confidential
Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential,
simplifying and reducing code maintenance.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-09 18:28:23 +01:00
Steve Horsman
f02fa79758 Merge pull request #12470 from jirimoravcik/docs/add-os-version
docs: add `OS_VERSION` to rootfs script
2026-02-09 15:06:14 +00:00
Alex Lyn
3fda59e27d tests: rename pod_exec_with_retries to pod_exec and update callers
This commit does the following:

(1) Rename pod_exec_with_retries() to pod_exec().
(2) Update implementation to call container_exec().
(3) Replace all usages of pod_exec_with_retries across tests
    with pod_exec.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
861d39305c tests: drop kubectl exec retries in container_exec
This commit drops the retries around kubectl exec into a container:

(1) Rename container_exec_with_retries() to container_exec().
(2) Remove the retry loop and sleep backoff around kubectl exec.

Keep the same logging and container-selection logic and return
kubectl exec exit status directly.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
41e8acbc5e runtime: Map empty ReadStdout/ReadStderr response to io.EOF
After the kata-agent "drain-after-exit" change, stdout/stderr EOF is
signaled by a successful ReadStdout/ReadStderr reply with empty Data
(len==0), instead of an RPC error. However, runtime-go currently
returns (0, nil) to io.CopyBuffer() when resp.Data is empty, which
violates Go io.Reader semantics and can cause `kubectl exec` to
hang after the command output is already printed.

To avoid exec hang:
In readProcessStream(), map an empty response (len(resp.Data)==0)
into (0, io.EOF). This allows the stdout/stderr copy goroutines to
terminate, closes exitIOch, and unblocks the wait path so exec can
complete normally.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
ffb8a6a9c3 agent: fix misleading tokio::select! biased comment in do_read_stream
The previous comment incorrectly implied that `biased` prevents data
loss and that the exit notifier would never be polled before all buffered
data is read. The details can be found in the documentation:
https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67

Tokio's `biased` only makes the polling order deterministic (top-to-bottom)
when multiple branches are ready in the same poll, and it makes fairness
the caller's responsibility. Output can still be truncated if the exit
notification becomes ready while `read_stream` is pending.

This change updates the comment to reflect the actual semantics and
caveats. No functional behavior change.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
1080f6d87e agent: Introduce drain after exit mechanism to address truncation race
Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode
occasionally lose the last segments of their output.

The root cause is a race condition where the `term_exit_notifier`
triggers before the pipe buffers are fully drained. In the previous
implementation, once the exit notification was received, the agent
immediately returned an EOF, causing the runtime's `run_io_copy` to
terminate and drop any residual data in the pipe.

This patch introduces a "drain after exit" mechanism:
- Upon receiving an exit notification, the agent enters a 500ms window
  for polling `read_stream` to flush remaining data from the buffer.
- A true EOF is only returned if the stream is confirmed empty or the
  timeout is reached.

This ensures reliable output delivery for transient exec tasks under
high concurrency.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
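
The mechanism, sketched with tokio primitives (the names and exact
wiring are assumptions; the real logic lives in do_read_stream):

    use std::time::Duration;
    use tokio::io::{AsyncRead, AsyncReadExt};
    use tokio::sync::Notify;
    use tokio::time::timeout;

    // Prefer reading; once the exit notification fires, allow one more
    // bounded read so residual pipe data is drained before reporting EOF.
    async fn read_with_drain<R: AsyncRead + Unpin>(
        reader: &mut R,
        exit: &Notify,
        buf: &mut [u8],
    ) -> std::io::Result<usize> {
        tokio::select! {
            biased;
            n = reader.read(buf) => return n,
            _ = exit.notified() => {}
        }
        // Exit observed: drain whatever is still buffered within a 500ms
        // window; a true EOF is only reported once the stream is confirmed
        // empty or the window elapses.
        match timeout(Duration::from_millis(500), reader.read(buf)).await {
            Ok(res) => res,  // residual data, or Ok(0) == genuine EOF
            Err(_) => Ok(0), // drain window elapsed: treat as EOF
        }
    }
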
Alex Lyn
700bddeecc agent: treat EOF as normal for read_stdout/stderr stream
Legacy IO uses shim polling via read_stdout/read_stderr. The agent
previously mapped pipe EOF (read() == 0) and term_exit_notifier to
errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures.
This caused runtime IO copy to abort early, leading to lost
stdout/stderr for short-lived exec (e.g."echo") and spurious failures.

Normalize EOF semantics: read_stream now returns Ok(empty) on EOF
instead of Err("read meet eof").

This makes legacy IO behave like a proper stream: data until EOF, no
INTERNAL errors for normal termination.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
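
Sketched, the normalized read path has this shape (an assumption, not
the agent's literal code):

    use tokio::io::{AsyncRead, AsyncReadExt};

    // EOF (read() == 0) is a normal end of stream: return Ok with an empty
    // buffer instead of Err("read meet eof"), so ttrpc does not surface
    // ordinary termination as an INTERNAL error.
    async fn read_stream<R: AsyncRead + Unpin>(
        reader: &mut R,
        max: usize,
    ) -> std::io::Result<Vec<u8>> {
        let mut data = vec![0u8; max];
        let n = reader.read(&mut data).await?;
        data.truncate(n); // n == 0 => empty Vec, i.e. a clean EOF signal
        Ok(data)
    }
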
stevenhorsman
b909c41128 runtime: Bump x/net to v0.49.0
Bump x/net to resolve CVEs:
- GO-2026-4441
- GO-2026-4440

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 14:49:31 +01:00
stevenhorsman
b29312289f versions: Bump go to 1.24.13
Bump go to 1.24.13 to fix CVE GO-2026-4337

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 14:49:31 +01:00
Zvonko Kaiser
7af306de13 agent: Update aarch64 create_pci_root_bus_path
aarch64 is also a supported architecture for NUMA.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 10:19:41 +01:00
Zvonko Kaiser
8185c015ad gpu: Add Agent NUMA Support 1 of N
We're introducing a root_complex to assign each
and every device to a NUMA node, or to the default
root_complex="00", aka pcie.0. This patch introduces
proper handling of the QOM path: currently it is
bus/device == "00/02"; with NUMA we need to extend it
to root_complex/bus/device == "10/00/02".

We're defaulting to root_complex="00".

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 10:19:41 +01:00
Alex Lyn
16a7ed6e14 Merge pull request #12464 from mythi/runtime-rs-tdvf
runtime-rs: use FIRMWARETDVFPATH like Go runtime
2026-02-09 09:12:52 +08:00
Mikko Ylinen
4088881662 runtime-rs: use FIRMWARETDVFPATH like Go runtime
Use OVMF path configuration for Intel TDX consistently:

$ git grep FIRMWARETD
src/runtime-rs/Makefile:FIRMWARETDXPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd
src/runtime-rs/Makefile:USER_VARS += FIRMWARETDXPATH
src/runtime-rs/config/configuration-qemu-tdx-runtime-rs.toml.in:firmware = "@FIRMWARETDXPATH@"
src/runtime/Makefile:FIRMWARETDVFPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd

The Go runtime has used *TDVF*, so just make runtime-rs follow. This
keeps the behavior consistent when downstreams switch from the Go runtime
to runtime-rs.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-08 21:38:06 +01:00
Jiri Moravcik
d5840149d2 docs: add OS_VERSION to rootfs script
OS_VERSION is required when trying to build a rootfs with the Ubuntu distro.

Fixes #12469

Signed-off-by: Jiri Moravcik <jiri.moravcik@gmail.com>
2026-02-08 21:21:59 +01:00
Manuel Huber
d9d1073cf1 gpu: Install packages for devkit
Introduce a new function to install additional packages into the
devkit flavor. With modprobe, we avoid errors on pod startup
related to loading nvidia kernel modules in the NVRC phase.
Note, the production flavor gets modprobe from busybox, see its
configuration file containing CONFIG_MODPROBE=y.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-06 09:58:32 +01:00
Manuel Huber
a786582d0b rootfs: deprecate initramfs dm-verity mode
Remove the initramfs folder and its build steps, and use the
kernel-based dm-verity enforcement for the handlers which used the
initramfs mode. Also, remove the initramfs verity mode capability
from the shims and their configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
cf7f340b39 tests: Read and overwrite kernel_verity_parameters
Read the kernel_verity_parameters from the shim config and adjust
the root hash for the negative test.
Further, improve some of the test logic by using shared
functions. This especially ensures we don't read the full
journalctl logs on a node but only the portion of the logs we are
actually supposed to look at.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
7958be8634 runtime: Make kernel_verity_params overwritable
Similar to the kernel_params annotation, add a
kernel_verity_params annotation and add logic to make these
parameters overwritable. For instance, this can be used in test
logic to provide bogus dm-verity hashes for negative tests.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
7700095ea8 runtime-rs: Make kernel_verity_params overwritable
Similar to the kernel_params annotation, add a
kernel_verity_params annotation and add logic to make these
parameters overwritable. For instance, this can be used in test
logic to provide bogus dm-verity hashes for negative tests.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
472b50fa42 runtime-rs: Enable kernelinit dm-verity variant
This change introduces the kernel_verity_parameters knob to the
rust based shim, picking up dm-verity information in a new config
field (the corresponding build variable is already produced by
the shim build). The change extends the shim to parse dm-verity
information from this parameter and to construct the kernel command
line appropriately, based on the indicated initramfs or kernelinit
build variant.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
f639c3fa17 runtime: Enable kernelinit dm-verity variant
This change introduces the kernel_verity_parameters knob to the
Go based shim, picking up dm-verity information in a new config
field (the corresponding build variable is already produced by
the shim build). The change extends the shim to parse dm-verity
information from this parameter and to construct the kernel command
line appropriately, based on the indicated initramfs or kernelinit
build variant.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
e120dd4cc6 tests: cc: Remove quotes from kernel command line
With dm-mod.create parameters using quotes, we remove the
backslashes used to escape these quotes from the output we
retrieve. This will enable attestation tests to work with the
kernelinit dm-verity mode.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
976df22119 rootfs: Change condition for cryptsetup-bin
Measured rootfs mode and CDH secure storage feature require the
cryptsetup-bin and e2fsprogs components in the guest.
This change makes this more explicit - confidential guests are
users of the CDH secure container image layer storage feature.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
a3c4e0b64f rootfs: Introduce kernelinit dm-verity mode
This change introduces the kernelinit dm-verity mode, allowing
initramfs-less dm-verity enforcement against the rootfs image.
For this, the change introduces a new variable with dm-verity
information. This variable will be picked up by shim
configurations in subsequent commits.
This will allow the shims to build the kernel command line
with dm-verity information based on the existing
kernel_parameters configuration knob and a new
kernel_verity_params configuration knob. The latter
specifically provides the relevant dm-verity information.
This new configuration knob avoids merging the verity
parameters into the kernel_params field. This way, no
cumbersome escape logic is required, as we do not need to pass the
dm-mod.create="..." parameter directly in the kernel_parameters,
but only the relevant dm-verity parameters in a semi-structured manner
(see above). The only place where the final command line is
assembled is in the shims. Further, this line is easy to comment
out for developers to disable dm-verity enforcement (or for CI
tasks).

This change produces the new kernelinit dm-verity parameters for
the NVIDIA runtime handlers, and modifies the format of how
these parameters are prepared for all handlers. With this, the
parameters are currently no longer provided to the
kernel_params configuration knob for any runtime handler.
This change should thus not be used alone, as dm-verity
information would no longer be picked up by the shims.

systemd-analyze on the coco-dev handler shows that, using the
kernelinit mode on a local machine, less time is spent in the
kernel phase, slightly speeding up pod start-up. On that machine,
the average of 172.5ms was reduced to 141ms (4 measurements, each
with a basic pod manifest), i.e., the kernel phase duration is
improved by about 18 percent.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
83a0bd1360 gpu: use dm-verity for the non-TEE GPU handler
Use a dm-verity protected rootfs image for the non-TEE NVIDIA
GPU handler as well.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
02ed4c99bc rootfs: Use maxdepth=1 to search for kata tarballs
These tarballs are in the top layer of the build directory,
no need to traverse all sub-directories.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
d37db5f068 rootfs: Restore "gpu: Handle root_hash.txt ..."
This reverts commit 923f97bc66 in
order to reinstate the logic from commit
e4a13b9a4a.

The latter commit was previously reverted due to the NVIDIA GPU TEE
handler using an initrd, not an image.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
f1ca547d66 initramfs: introduce log function
Log to /dev/kmsg, this way logs will show up and not get lost.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
6d0bb49716 runtime: nvidia: Use img and sanitize whitespaces
Shift NVIDIA shim configurations to use an image instead of an initrd,
and remove trailing whitespaces from the configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
282014000f tests: cc: support initrd, image for attestation
Allow using an image instead of an initrd. For confidential
guests using images, the assumption is that the guest kernel uses
dm-verity protection, implicitly measuring the rootfs image via
the kernel command line's dm-verity information.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Greg Kurz
e430b2641c Merge pull request #12435 from bpradipt/crio-annotation
shim: Add CRI-O annotation support for device cold plug
2026-02-05 09:29:19 +01:00
Alex Lyn
e257430976 Merge pull request #12433 from manuelh-dev/mahuber/cfg-sanitize-whitespaces
runtimes: Sanitize trailing whitespaces
2026-02-05 09:31:21 +08:00
Fabiano Fidêncio
dda1b30c34 tests: nvidia-nim: Use sealed secrets for NGC_API_KEY
Convert the NGC_API_KEY from a regular Kubernetes secret to a sealed
secret for the CC GPU tests. This ensures the API key is only accessible
within the confidential enclave after successful attestation.

The sealed secret uses the "vault" type which points to a resource stored
in the Key Broker Service (KBS). The Confidential Data Hub (CDH) inside
the guest will unseal this secret by fetching it from KBS after
attestation.

The initdata file is created AFTER create_tmp_policy_settings_dir()
copies the empty default file, and BEFORE auto_generate_policy() runs.
This allows genpolicy to add the generated policy.rego to our custom
CDH configuration.

The sealed secret format follows the CoCo specification:
sealed.<JWS header>.<JWS payload>.<signature>

Where the payload contains:
- version: "0.1.0"
- type: "vault" (pointer to KBS resource)
- provider: "kbs"
- resource_uri: KBS path to the actual secret

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:34:44 +01:00
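
A sketch of the payload portion of such a sealed secret (the resource
path is a placeholder; fields follow the list above):

    use serde_json::json;

    // Payload of a CoCo sealed secret ("sealed.<JWS header>.<JWS payload>.
    // <signature>"). The "vault" type only points at a KBS resource; CDH
    // fetches the real secret from KBS after attestation.
    fn sealed_secret_payload(resource_uri: &str) -> serde_json::Value {
        json!({
            "version": "0.1.0",
            "type": "vault",
            "provider": "kbs",
            // e.g. "kbs:///default/ngc/api-key" (placeholder)
            "resource_uri": resource_uri,
        })
    }
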
Fabiano Fidêncio
c9061f9e36 tests: kata-deploy: Increase post-deployment wait time
Increase the sleep time after kata-deploy deployment from 10s to 60s
to give more time for runtimes to be configured. This helps avoid
race conditions on slower K8s distributions like k3s where the
RuntimeClass may not be immediately available after the DaemonSet
rollout completes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
0fb2c500fd tests: kata-deploy: Merge E2E tests to avoid timing issues
Merge the two E2E tests ("Custom RuntimeClass exists with correct
properties" and "Custom runtime can run a pod") into a single test, as
the two are very much dependent on each other.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
fef93f1e08 tests: kata-deploy: Use die() instead of fail() for error handling
Replace fail() calls with die() which is already provided by
common.bash. The fail() function doesn't exist in the test
infrastructure, causing "command not found" errors when tests fail.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
f90c12d4df kata-deploy: Avoid text file busy error with nydus-snapshotter
We cannot overwrite a binary that's currently in use, and that's the
reason that elsewhere we remove / unlink the binary (the running process
keeps its file descriptor, so we're good doing that) and only then
copy the binary.  However, we missed doing this for the
nydus-snapshotter deployment.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 10:24:49 +01:00
Manuel Huber
30c7325e75 runtimes: Sanitize trailing whitespaces
Clean up trailing whitespaces, making life easier for those who
have configured their IDE to clean these up.
We suggest not adding new code with trailing whitespace, etc.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-03 11:46:30 -08:00
Steve Horsman
30494abe48 Merge pull request #12426 from kata-containers/dependabot/github_actions/zizmorcore/zizmor-action-0.4.1
build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1
2026-02-03 14:38:54 +00:00
Pradipta Banerjee
8a449d358f shim: Add CRI-O annotation support for device cold plug
Add support for CRI-O annotations when fetching pod identifiers for
device cold plug. The code now checks containerd CRI annotations first,
then falls back to CRI-O annotations if they are empty.

This enables device cold plug to work with both containerd and CRI-O
container runtimes.

Annotations supported:
- containerd: io.kubernetes.cri.sandbox-name, io.kubernetes.cri.sandbox-namespace
- CRI-O: io.kubernetes.cri-o.KubeName, io.kubernetes.cri-o.Namespace

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2026-02-03 04:51:15 +00:00
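
The lookup order, sketched in Rust for consistency with the other
sketches here (the shim itself is Go):

    use std::collections::HashMap;

    // Prefer the containerd CRI annotation; fall back to the CRI-O one
    // when it is absent or empty.
    fn sandbox_name(annotations: &HashMap<String, String>) -> Option<&str> {
        annotations
            .get("io.kubernetes.cri.sandbox-name")
            .filter(|v| !v.is_empty())
            .or_else(|| annotations.get("io.kubernetes.cri-o.KubeName"))
            .map(String::as_str)
    }
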
Steve Horsman
6bb77a2f13 Merge pull request #12390 from mythi/tdx-updates-2026-2
runtime: tdx QEMU configuration changes
2026-02-02 16:58:44 +00:00
Zvonko Kaiser
6702b48858 Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state
kata-deploy: nydus: Always start from a clean state
2026-02-02 11:21:26 -05:00
Steve Horsman
0530a3494f Merge pull request #12415 from nlle/make-helm-updatestrategy-configurable
kata-deploy: Make update strategy configurable for kata-deploy DaemonSet
2026-02-02 10:29:01 +00:00
Steve Horsman
93dcaee965 Merge pull request #12423 from manuelh-dev/mahuber/pause-build-fix
packaging: Delete pause_bundle dir before unpack
2026-02-02 10:26:30 +00:00
Fabiano Fidêncio
62ad0814c5 kata-deploy: nydus: Always start from a clean state
Clean up existing nydus-snapshotter state to ensure fresh start with new
version.

This is safe across all K8s distributions (k3s, rke2, k0s, microk8s,
etc.) because we only touch the nydus data directory, not containerd's
internals.

When containerd tries to use non-existent snapshots, it will
re-pull/re-unpack.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-02 11:06:37 +01:00
Mikko Ylinen
870630c421 kata-deploy: drop custom TDX installation steps
As we have moved to use QEMU (and OVMF already earlier) from
kata-deploy, the custom tdx configurations and distro checks
are no longer needed.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-02 11:11:26 +02:00
Mikko Ylinen
927be7b8ad runtime: tdx: move to use QEMU from kata-deploy
Currently, a working TDX setup expects users to install special
TDX support builds from Canonical/CentOS virt-sig for TDX to
work. kata-deploy configured the TDX runtime handler to use QEMU
from the distro's paths.

With TDX support now being available in upstream Linux and
Ubuntu 24.04 having an install candidate (linux-image-generic-6.17)
for a new enough kernel, move TDX configuration to use QEMU from
kata-deploy.

While this is the new default, going back to the original
setup is possible by making manual changes to TDX runtime handlers.

Note: runtime-rs is already using QEMUPATH for TDX.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-02 11:10:52 +02:00
Nikolaj Lindberg Lerche
6e98df2bac kata-deploy: Make update strategy configurable for kata-deploy DaemonSet
This allows the updateStrategy to be configured for the kata-deploy Helm
chart, enabling administrators to control the aggressiveness of
updates. For a less aggressive approach, the strategy can be set to
`OnDelete`. Alternatively, the update process can be made more
aggressive by adjusting the `maxUnavailable` parameter.

Signed-off-by: Nikolaj Lindberg Lerche <nlle@ambu.com>
2026-02-01 20:14:29 +01:00
Dan Mihai
d7ff54769c tests: policy: remove the need for using sudo
Modify the copy of root user's settings file, instead of modifying the
original file.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-01 20:09:50 +01:00
Dan Mihai
4d860dcaf5 tests: policy: avoid redundant debug output
Avoid redundant and confusing teardown_common() debug output for
k8s-policy-pod.bats and k8s-policy-pvc.bats.

The Policy tests skip the Message field when printing information about
their pods, because unfortunately that field might contain a truncated
Policy log - for the test cases that intentionally cause Policy
failures. The non-truncated Policy log is already available from other
"kubectl describe" fields.

So, avoid the redundant pod information from teardown_common(), that
also included the confusing Message field.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-01 20:09:50 +01:00
dependabot[bot]
dc8d9e056d build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1
Bumps [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) from 0.2.0 to 0.4.1.
- [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
- [Commits](e673c3917a...135698455d)

---
updated-dependencies:
- dependency-name: zizmorcore/zizmor-action
  dependency-version: 0.4.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-02-01 15:08:10 +00:00
Manuel Huber
8b0c199f43 packaging: Delete pause_bundle dir before unpack
Delete the pause_bundle directory before running the umoci unpack
operation. This makes builds idempotent, so they no longer fail with
errors like "create runtime bundle: config.json already exists in
.../build/pause-image/destdir/pause_bundle". This will make life
better when building locally.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-31 19:43:11 +01:00
Steve Horsman
4d1095e653 Merge pull request #12350 from manuelh-dev/mahuber/term-grace-period
tests: Remove terminationGracePeriod in manifests
2026-01-29 15:17:17 +00:00
Fabiano Fidêncio
b85393e70b release: Bump version to 3.26.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-29 00:23:26 +01:00
Fabiano Fidêncio
500146bfee versions: Bump Go to 1.24.12
Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities
in the standard library:

- GO-2026-4342: Excessive CPU consumption in archive/zip
- GO-2026-4341: Memory exhaustion in net/url query parsing
- GO-2026-4340: TLS handshake encryption level issue in crypto/tls

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-29 00:23:26 +01:00
Dan Mihai
20ca4d2d79 runtime: DEFDISABLEBLOCK := true
1. Add disable_block_device_use to CLH settings file, for parity with
   the already existing QEMU settings.

2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After
   this change, Kata Guests will use by default virtio-fs to access
   container rootfs directories from their Hosts. Hosts that were
   designed to use Host block devices attached to the Guests can
   re-enable these rootfs block devices by changing the value of
   disable_block_device_use back to false in their settings files.

3. Add test using container image without any rootfs layers. Depending
   on the container runtime and image snapshotter being used, the empty
   container rootfs image might get stored on a host block device that
   cannot be safely hotplugged to a guest VM, because the host is using
   the same block device.

4. Add block device hotplug safety warning into the Kata Shim
   configuration files.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Cameron McDermott <cameron@northflank.com>
2026-01-28 19:47:49 +01:00
Manuel Huber
5e60d384a2 kata-deploy: Update for mariner in all target
Remove the initrd function and add the image function to align
with the actually existing functions in this file.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-28 08:58:45 -08:00
Greg Kurz
ea627166b9 Merge pull request #12389 from ldoktor/ci-helm
ci.ocp: Use 0.0.0-dev tagged helm chart
2026-01-28 17:20:07 +01:00
Manuel Huber
0d8fbdef07 kernel: Readjust kernel version after decrement
Readjust the kata_config_version counter after it was
accidentally decremented in commit c7f5ff4.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-28 10:48:12 +01:00
Joji Mekkattuparamban
1440dd7468 shim: enforce iommufd for confidential guest vfio
Confidential guests cannot use traditional IOMMU Group based VFIO.
Instead, they need to use IOMMUFD. This is mainly because the group
abstraction is incompatible with a confidential device model.
If traditional VFIO is specified for a confidential guest, detect
the error and bail out early.
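
A minimal Rust sketch of that early check, assuming hypothetical names
(VfioMode, validate_vfio_mode) rather than the actual shim code:

```rust
// Illustrative only: names and error text are assumptions, not the shim API.
#[derive(PartialEq)]
enum VfioMode {
    Group,   // traditional IOMMU-group based VFIO
    Iommufd, // /dev/iommu (iommufd) based VFIO
}

fn validate_vfio_mode(is_confidential: bool, mode: VfioMode) -> Result<(), String> {
    if is_confidential && mode == VfioMode::Group {
        // Bail out early instead of failing later during device attach.
        return Err("confidential guests require IOMMUFD-backed VFIO".to_string());
    }
    Ok(())
}

fn main() {
    assert!(validate_vfio_mode(true, VfioMode::Group).is_err());
    assert!(validate_vfio_mode(true, VfioMode::Iommufd).is_ok());
}
```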

Fixes #12393

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-01-28 00:11:38 +01:00
stevenhorsman
c7bc428e59 versions: Bump guest-components
Bump guest-components to 9aae2eae
to pick up the latest security fixes and toolchain bump

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-28 00:05:58 +01:00
Aurélien Bombo
932920cb86 Merge pull request #11959 from houstar/main
agent: remove redundant func comment
2026-01-27 12:01:04 -06:00
Lukáš Doktor
5250d4bacd ci.ocp: Use 0.0.0-dev tagged helm chart
In CI we are testing the latest kata-deploy, which requires the latest
helm chart. The previous query doesn't work anymore, but these days we
should be able to rely on the "0.0.0-dev" tag and on helm to print the
to-be-installed version to the console.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-01-27 14:58:46 +01:00
Steve Horsman
eb3d204ff3 Merge pull request #12274 from ldoktor/pp-images
ci.ocp: Two little fixes regarding the openshift-ci
2026-01-27 11:31:51 +00:00
Lukáš Doktor
971b096a1f ci.ocp: Update cleanup.sh to cope with helm deployment
Replace the old kata-deploy removal and use "helm uninstall" instead.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-01-27 07:59:13 +01:00
Lukáš Doktor
272ff9c568 ci.ocp: Add notes about where to get other podvm images
I keep struggling to find the debug images, so let's include them in the
peer-pods-azure.sh script so people can find them more easily.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-01-27 07:59:12 +01:00
Qingyuan Hou
ca43a8cbb8 agent: remove redundant func comment
This comment was first introduced in e111093 with secure_join()
but then we forgot to remove it when we switched to the safe-path
lib in c0ceaf6.

Signed-off-by: Qingyuan Hou <lenohou@gmail.com>
2026-01-27 03:07:57 +00:00
Alex Lyn
6c0ae4eb04 Merge pull request #11585 from Apokleos/enhance-qmp
runtime-rs: Make QMP init robust by retrying handshake with deadline
2026-01-27 09:11:19 +08:00
Zvonko Kaiser
a59f791bf5 gpu: Move CUDA repo selection to versions.yaml
We want to enable local and remote CUDA repository builds.
Move the cuda and tools repos to versions.yaml with a
unified build for both types.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-26 22:19:40 +01:00
Fabiano Fidêncio
d0fe60e784 tests: Fix empty string handling for helm
Fix empty string handling in format conversion

When HELM_ALLOWED_HYPERVISOR_ANNOTATIONS, HELM_AGENT_HTTPS_PROXY, or
HELM_AGENT_NO_PROXY are empty, the pattern matching condition
`!= *:*` or `!= *=*` evaluates to true, causing the conversion loop
to create invalid entries like "qemu-tdx: qemu-snp:".

Add -n checks to ensure conversion only runs when variables are
non-empty.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
4b2d4e96ae tests: Add qemu-{tdx,snp}-runtime-rs to the list of tee shims
We missed doing this as part of
b5a986eacf.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
26c534d610 tests: Use shims.disableAll in test helpers
Update the CI and functional test helpers to use the new
shims.disableAll option instead of iterating over every shim
to disable them individually.

Also adds helm repo for node-feature-discovery before building
dependencies to fix CI failures on some distributions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
04f45a379c kata-deploy: docs: Document shims.disableAll option
Update the Helm chart README to document the new shims.disableAll
option and simplify the examples that previously required listing
every shim to disable.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
c9e9a682ab kata-deploy: Use disableAll in example values files
Simplify the example values files by using the new shims.disableAll
option instead of listing every shim to disable.

Before (try-kata-nvidia-gpu.values.yaml):
  shims:
    clh:
      enabled: false
    cloud-hypervisor:
      enabled: false
    # ... 15 more lines ...

After:
  shims:
    disableAll: true

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
cfe9bcbaf1 kata-deploy: Add shims.disableAll option to Helm chart
Add a new `shims.disableAll` option that disables all standard shims
at once. This is useful when:
- Enabling only specific shims without listing every other shim
- Using custom runtimes only mode (no standard Kata shims)

Usage:
  shims:
    disableAll: true
    qemu:
      enabled: true  # Only qemu is enabled

All helper templates are updated to check for this flag before
iterating over shims.

One thing that's super important to note here is that helm recursively
merges user values with chart defaults, making a simple
`disableAll` flag problematic: if defaults have `enabled: true`, user's
`disableAll: true` gets merged with those defaults, resulting in all
shims still being enabled.

The workaround found is to use null (`~`) as the default for `enabled`
field. The template logic interprets null differently based on
disableAll:

| enabled value | disableAll: false | disableAll: true |
|---------------|-------------------|------------------|
| ~ (null)      | Enabled           | Disabled         |
| true          | Enabled           | Enabled          |
| false         | Disabled          | Disabled         |

This is backward compatible:
- Default behavior unchanged: all shims enabled when disableAll: false
- Users can set `disableAll: true` to disable all, then explicitly
  enable specific shims with `enabled: true`
- Explicit `enabled: false` always disables, regardless of disableAll
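
The tri-state logic from the table above can be sketched in Rust, with
Option<bool> standing in for the YAML ~/true/false values (names are
illustrative, not the chart's template code):

```rust
// Sketch of the decision table: `enabled` is tri-state in YAML (~/true/false).
fn shim_enabled(enabled: Option<bool>, disable_all: bool) -> bool {
    match enabled {
        Some(explicit) => explicit, // explicit true/false always wins
        None => !disable_all,       // null (~) follows the disableAll flag
    }
}

fn main() {
    assert!(shim_enabled(None, false));         // ~ with disableAll: false -> enabled
    assert!(!shim_enabled(None, true));         // ~ with disableAll: true  -> disabled
    assert!(shim_enabled(Some(true), true));    // explicit true always enables
    assert!(!shim_enabled(Some(false), false)); // explicit false always disables
}
```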

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
d8a3272f85 kata-deploy: Add tests for custom runtimes Helm templates
Add Bats tests to verify the custom runtimes Helm template rendering,
and that we can start a pod with the custom runtime.

Tests were written with Cursor's help.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
3be57bb501 kata-deploy: Add Helm chart support for custom runtimes
Add Helm chart configuration for defining custom RuntimeClasses with
base configuration and drop-in overrides.

Usage:
  helm install kata-deploy ./kata-deploy \
    -f custom-runtimes.values.yaml

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
a76cdb5814 kata-deploy: Add custom runtime config installation/removal
Add functions to install and remove custom runtime configuration files.
Each custom runtime gets an isolated directory structure:

  custom-runtimes/{handler}/
    configuration-{baseConfig}.toml  # Copied from base config
    config.d/
      50-overrides.toml              # User's drop-in overrides

The base config is copied AFTER kata-deploy has applied its modifications
(debug settings, proxy configuration, annotations), so custom runtimes
inherit these settings.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
4c3989c3e4 kata-deploy: Add custom runtime configuration for containerd/CRI-O
Add functions to configure custom runtimes in containerd and CRI-O.
Custom runtimes use an isolated config directory under:
  custom-runtimes/{handler}/

Custom runtimes automatically derive the shim binary path from the
baseConfig field using the existing is_rust_shim() logic.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
678b560e6d kata-deploy: Add CustomRuntime struct and parsing
Add support for parsing custom runtime configurations from a mounted
ConfigMap. This allows users to define their own RuntimeClasses with
custom Kata configurations.

The ConfigMap format uses a custom-runtimes.list file with entries:
  handler:baseConfig:containerd_snapshotter:crio_pulltype

Drop-in files are read from dropin-{handler}.toml, if present.
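
An illustrative Rust parser for that colon-separated entry format; the
CustomRuntime field names are assumptions, not the exact struct:

```rust
struct CustomRuntime {
    handler: String,
    base_config: String,
    containerd_snapshotter: String,
    crio_pulltype: String,
}

// Parse one "handler:baseConfig:containerd_snapshotter:crio_pulltype" entry.
fn parse_entry(line: &str) -> Option<CustomRuntime> {
    let mut parts = line.trim().splitn(4, ':');
    Some(CustomRuntime {
        handler: parts.next()?.to_string(),
        base_config: parts.next()?.to_string(),
        containerd_snapshotter: parts.next()?.to_string(),
        crio_pulltype: parts.next()?.to_string(),
    })
}

fn main() {
    let rt = parse_entry("kata-custom:qemu:nydus:guest-pull").unwrap();
    println!("{} {} {} {}", rt.handler, rt.base_config,
             rt.containerd_snapshotter, rt.crio_pulltype);
}
```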

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
609a25e643 kata-deploy: Refactor runtime configuration with helper functions
Let's extract the common logic from configure_containerd_runtime and
configure_crio_runtime into reusable helper functions. This reduces
code duplication and prepares for adding custom runtime support.

For containerd:
- Add ContainerdRuntimeParams struct to encapsulate common parameters
- Add get_containerd_pluginid() to extract version detection logic
- Add get_containerd_output_path() to extract file path resolution
- Add write_containerd_runtime_config() to write common TOML values

For CRI-O:
- Add CrioRuntimeParams struct to encapsulate common parameters
- Add write_crio_runtime_config() to write common configuration

While here, let's also simplify pod_annotations to always use
"[\"io.katacontainers.*\"]" for all runtimes, as the NVIDIA specific
case has been removed from the shell script, but we forgot to do so
here.

No functional changes intended.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Steve Horsman
aa94038355 Merge pull request #12388 from Apokleos/fix-shimio
runtime-rs: Use File instead of UnixStream for FIFO to fix ENOTSOCK
2026-01-26 13:22:57 +00:00
tak-ka3
5471fa133c runtime-rs: Add -info flag support for containerd v2.0+
Add -info flag handling to containerd-shim-kata-v2 (Rust version).
This outputs RuntimeInfo protobuf (name, version, revision) to stdout,
providing compatibility with containerd v2.0+ which queries runtime
information via this flag.

This is the runtime-rs counterpart to the Go implementation.
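
A rough sketch of the flag handling; RuntimeInfo and its fields stand in
for the containerd protobuf message and are not the real generated type
(the actual shim serializes protobuf rather than plain text):

```rust
use std::io::Write;

// Stand-in for the containerd RuntimeInfo protobuf message.
struct RuntimeInfo {
    name: &'static str,
    version: &'static str,
    revision: &'static str,
}

fn main() -> std::io::Result<()> {
    if std::env::args().any(|a| a == "-info") {
        let info = RuntimeInfo {
            name: "io.containerd.kata.v2",
            version: "x.y.z",
            revision: "unknown",
        };
        // Plain text keeps the sketch self-contained; the real output is protobuf.
        writeln!(std::io::stdout(), "{} {} {}", info.name, info.version, info.revision)?;
        return Ok(());
    }
    // ... normal shim start-up would continue here ...
    Ok(())
}
```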

Fixes #12133

Signed-off-by: tak-ka3 <takumi.hiraoka@acompany-ac.com>
2026-01-26 13:38:07 +01:00
Alex Lyn
68d671af0f runtime-rs: Make QMP init robust by retrying handshake with deadline
It aims to make QMP initialize robust by retrying QMP handshake with
global deadline to handle slow QEMU bring-up.

Qmp::new() used DEFAULT_QMP_READ_TIMEOUT as the effective deadline
for the QMP handshake read. When QEMU initialization is slow (e.g.
heavy host load, large memory/device init, slow storage, confidential
guests, etc.), the QMP greeting may not become readable within a small
per-read timeout (e.g. 250ms).  This caused QMP init to fail with
"Resource temporarily unavailable (os error 11)" and spam
"couldn't initialise QMP", while subsequent retries might eventually
succeed once QEMU became ready.

To address this issue, keep a short per-read timeout to avoid
indefinite blocking, but add a global "wait for QMP ready" deadline
that retries the handshake with a small backoff. This improves startup
reliability under load and avoids unnecessary reconnect failures.
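
The shape of that retry-with-deadline loop, as a minimal sketch
(try_handshake is a placeholder for the real QMP greeting read):

```rust
use std::time::{Duration, Instant};

fn wait_for_qmp_ready(deadline: Duration) -> Result<(), String> {
    let start = Instant::now();
    let mut backoff = Duration::from_millis(50);
    loop {
        match try_handshake() {
            Ok(()) => return Ok(()),
            Err(_) if start.elapsed() < deadline => {
                // Short per-read timeouts inside try_handshake() avoid
                // blocking forever; here we just back off and retry.
                std::thread::sleep(backoff);
                backoff = (backoff * 2).min(Duration::from_millis(500));
            }
            Err(e) => return Err(format!("QMP not ready within deadline: {e}")),
        }
    }
}

// Placeholder: the real implementation reads the QMP greeting with a
// small per-read timeout.
fn try_handshake() -> Result<(), String> {
    Err("Resource temporarily unavailable".into())
}

fn main() {
    let _ = wait_for_qmp_ready(Duration::from_millis(200));
}
```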

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-26 16:47:32 +08:00
Bo Liu
c7f5ff45a2 arm64: Update ptp.conf to correct time sync
Given the patch has been merged in linux upstream, it's safe to enable
these two options.

Signed-off-by: Bo Liu <152475812+liubocflt@users.noreply.github.com>
2026-01-24 21:08:21 +01:00
Hui Zhu
37a0c81b6a libs: Change kv of get_agent_kernel_params to BTreeMap
HashMap cannot guarantee iteration order, so the generated command line
kept changing between runs. This commit changes the kv type of
get_agent_kernel_params to BTreeMap to make sure the command line stays
stable.
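
Why BTreeMap helps, in a self-contained sketch (the parameter names are
made up): BTreeMap iterates in sorted key order, so the rendered command
line is stable across runs, unlike HashMap's randomized order:

```rust
use std::collections::BTreeMap;

fn main() {
    let mut params = BTreeMap::new();
    params.insert("agent.log", "debug");
    params.insert("agent.debug_console", "");
    // Iteration is in key order, so the joined string never changes.
    let cmdline: Vec<String> = params
        .iter()
        .map(|(k, v)| if v.is_empty() { k.to_string() } else { format!("{k}={v}") })
        .collect();
    assert_eq!(cmdline.join(" "), "agent.debug_console agent.log=debug");
}
```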

Fixes: #10977

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2026-01-24 21:07:41 +01:00
Alex Lyn
e7b8b302ac runtime-rs: Use File instead of UnixStream for FIFO to fix ENOTSOCK
It aims to address the issue:
"run_io_copy[Stdout]: failed to copy stream: Not a socket (os error 88)"

The `Not a socket (os error 88)` error was caused by incorrectly wrapping
a FIFO file descriptor in a `UnixStream`. The following changes:
(1) Refactor `open_fifo_write` to return `tokio::fs::File` (or a generic
  async reader/writer) instead of `AsyncUnixStream`.
(2) Ensure IO copying logic treats stdout/stderr streams as file-like
  objects rather than sockets.

This fix eliminates the "failed to copy stream" errors in the IO loop
and ensures reliable log forwarding for legacy-io.
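
A minimal sketch of the fix, assuming tokio is available: open the FIFO
write end as a tokio File instead of wrapping the raw fd in a
UnixStream (FIFOs are not sockets, so socket calls fail with ENOTSOCK):

```rust
use tokio::fs::{File, OpenOptions};

// Open a FIFO write end as a plain async file, not a socket.
async fn open_fifo_write(path: &str) -> std::io::Result<File> {
    OpenOptions::new().write(true).open(path).await
}
```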

Fixes: #12387

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-24 10:41:27 +00:00
Alex Lyn
8a0fad4b95 runtime-rs: Move the set_flag_with_blocking out as a public method
Move the private closure out and make it a public method which is
responsible for clear O_NONBLOCK for an fd and turn it into blocking
mode.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-24 10:41:27 +00:00
Manuel Huber
6438fe7f2d tests: Remove terminationGracePeriod in manifests
Do not kill containers immediately, instead use Kubernetes'
default termination grace period.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-23 16:18:44 -08:00
Manuel Huber
0d35b36652 Revert "ci: Ensure the KBS resources are created"
This reverts commit c0d7222194.

Soon, guest components will switch to using a DB instead of
storing resources in the filesystem. Further, I don't see any
more indicators why kbs-client would struggle to set simple
resources.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-23 16:18:10 -08:00
Fabiano Fidêncio
5b82b160e2 runtime-rs: Add arm64 QEMU support
Add the necessary configuration and code changes to support QEMU
on arm64 architecture in runtime-rs.

Changes:
- Set MACHINETYPE to "virt" for arm64
- Add machine accelerators "usb=off,gic-version=host" required for
  proper arm64 virtualization
- Add arm64-specific kernel parameter "iommu.passthrough=0"
- Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported

These changes align runtime-rs with the Go runtime's arm64 QEMU support.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
2026-01-23 19:48:31 +01:00
tak-ka3
29e7dd27f1 runtime: Add -info flag support for containerd v2.0+
Add support for the -info flag that containerd v2.0+ passes to shims.
The flag outputs RuntimeInfo protobuf to stdout containing the shim
name and version information.

Fixes #12133

Signed-off-by: tak-ka3 <takumi.hiraoka@acompany-ac.com>
2026-01-22 19:26:44 +01:00
Steve Horsman
d0bfb27857 Merge pull request #12384 from Apokleos/fix-full-debug
doc: update enabling full debug method
2026-01-22 14:25:11 +00:00
Fabiano Fidêncio
ac8436e326 kata-deploy: Update debian in the container image to 13 (trixie)
Just a bump to the latest version, as requested by Mikko.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-22 12:32:59 +01:00
Steve Horsman
2cd76796bd Merge pull request #12305 from stevenhorsman/fix-stalebot-permissions
ci: Fix stalebot permissions
2026-01-22 10:02:43 +00:00
Alex Lyn
fb7390ce3c doc: update enabling full debug method
The enable_debug parameter was explicitly set to false rather than
being commented out (e.g., # enable_debug = true). As the previous
enabling method failed to account for this explicit setting, it was
rendered invalid. This commit updates the matching logic to correctly
handle and toggle the explicit false value.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-22 17:44:57 +08:00
Hyounggyu Choi
bc131a84b9 GHA: Set timeout for kata-deploy and kbs cleanup
It was observed that some kata-deploy cleanup steps could hang,
causing the workflow to never finish properly. In these cases,
a QEMU process was not cleaned up and kept printing debug logs
to the journal. Over time, this maxed out the runner’s disk
usage and caused the runner service to stop.

Set timeouts for the relevant cleanup steps to avoid this.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-01-22 10:32:24 +01:00
Fabiano Fidêncio
dacb14619d kata-deploy: Make verification ConfigMap a regular resource
The verification job mounts a ConfigMap containing the pod spec for
the Kata runtime test. Previously, both the ConfigMap and the Job were
Helm hooks with different weights (-5 and 0 respectively).

On k3s, a race condition was observed where the Job pod would be
scheduled before the kubelet's informer cache had registered the
ConfigMap, causing a FailedMount error:

  MountVolume.SetUp failed for volume "pod-spec": object
  "kube-system"/"kata-deploy-verification-spec" not registered

This happened because k3s's lightweight architecture schedules pods
very quickly, and the hook weight difference only controls Helm's
ordering, not actual timing between resource creation and cache sync.

By making the ConfigMap a regular chart resource (removing hook
annotations), it is created during the main chart installation phase,
well before any post-install hooks run. This guarantees the ConfigMap
is fully propagated to all kubelets before the verification Job starts.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
89e287c3b2 kata-deploy: Add more permissions to verification job's RBAC
The verification job needs to list nodes to check for the
katacontainers.io/kata-runtime label and list events to detect
FailedCreatePodSandBox errors during pod creation.

This was discovered when testing with k0s, where the service account
lacked the required cluster-scope permissions to list nodes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
869dd5ac65 kata-deploy: Enable dynamic drop-in support for k0s
Remove k0s-worker and k0s-controller from
RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT and always return true for
k0s in is_containerd_capable_of_using_drop_in_files since k0s auto-loads
from containerd.d/ directory regardless of containerd version.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
d4ea02e339 kata-deploy: Add microk8s support with dynamic version detection
Add microk8s case to get_containerd_paths() method and remove microk8s
from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT to enable dynamic
containerd version checking.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
69dd9679c2 kata-deploy: Centralize containerd path management
Introduce ContainerdPaths struct and get_containerd_paths() method to
centralize the complex logic for determining containerd configuration
file paths across different Kubernetes distributions.

The new ContainerdPaths struct includes:
- config_file: File to read containerd version from and write to
- backup_file: Backup file path before modification
- imports_file: File to add/remove drop-in imports from (Option<String>)
- drop_in_file: Path to the drop-in configuration file
- use_drop_in: Whether drop-in files can be used

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
606c12df6d kata-deploy: fix JSONPath parsing for labels with dots
The JSONPath parser was incorrectly splitting on escaped dots (\.)
causing microk8s detection to fail. Labels like "microk8s.io/cluster"
were being split into ["microk8s\", "io/cluster"] instead of being
treated as a single key.

This adds a split_jsonpath() helper that properly handles escaped dots,
allowing the automatic microk8s detection via the node label to work
correctly.
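
A sketch of a split that honors escaped dots; the helper name matches
the one described above, but the body is illustrative:

```rust
fn split_jsonpath(path: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // "\." is a literal dot inside a key, not a separator.
            '\\' if chars.peek() == Some(&'.') => {
                current.push('.');
                chars.next();
            }
            '.' => parts.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    parts.push(current);
    parts
}

fn main() {
    assert_eq!(
        split_jsonpath(r"labels.microk8s\.io/cluster"),
        vec!["labels", "microk8s.io/cluster"]
    );
}
```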

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
ec18dd79ba tests: Simplify kata-deploy test to use helm directly
The kata-deploy test was using helm_helper which made it hard to debug
failures (die() calls would cause "Executed 0 tests" errors) and added
unnecessary complexity.

The test now calls helm directly like a user would, making it simpler
and more representative of real-world usage. The verification job status
is explicitly checked with proper failure detection instead of relying
on helm --wait.

Timeouts are configurable via environment variables to account for
different network speeds and image sizes:
- KATA_DEPLOY_TIMEOUT (default: 600s)
- KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s)
- KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s)

Documentation has been added to explain what each timeout controls and
how to customize them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
86e0b08b13 kata-deploy: Improve verification job timing and failure detection
The verification job now supports configurable timeouts to accommodate
different environments and network conditions. The daemonset timeout
defaults to 1200 seconds (20 minutes) to allow for large image downloads,
while the verification pod timeout defaults to 180 seconds.

The job now waits for the DaemonSet to exist, pods to be scheduled,
rollout to complete, and nodes to be labeled before creating the
verification pod. A 15-second delay is added after node labeling to
allow kubelet time to refresh runtime information.

Retry logic with 3 attempts and a 10-second delay handles transient
FailedCreatePodSandBox errors that can occur during runtime
initialization. The job only fails on pod errors after a 30-second
grace period to avoid false positives from timing issues.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
2369cf585d tests: Fix retry loop bugs in helm_helper
The retry loop in helm_helper had two bugs:
1. Counter initialized to 10 instead of 0, causing immediate failure
2. Exit condition used -eq instead of -ge, incorrect for loop logic

These bugs would cause helm_helper to fail immediately on the first
retry attempt instead of properly retrying up to max_tries times.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
stevenhorsman
19efeae12e workflow: Fix stalebot permissions
When looking into stale bot more for issues, I realised that our existing
stale job would need permissions to work. Unfortunately the behaviour
of the actions without these permissions is to log, but still finish as successful.
This means it was hard to spot we had an issue.

Add the required permissions to get this working again and improve the message.
Also add a concurrency rule to make zizmor happy.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 17:28:59 +00:00
Steve Horsman
70f6543333 Merge pull request #12371 from stevenhorsman/cargo-check
build: Add cargo check
2026-01-21 14:50:07 +00:00
Steve Horsman
4eb50d7b59 Merge pull request #12334 from stevenhorsman/rust-linting-improvements
Rust linting improvements
2026-01-21 14:01:37 +00:00
Steve Horsman
ba47bb6583 Merge pull request #11421 from kata-containers/dependabot/go_modules/src/runtime/github.com/urfave/cli-1.22.17
build(deps): bump github.com/urfave/cli from 1.22.14 to 1.22.17 in /src/runtime
2026-01-21 11:46:02 +00:00
stevenhorsman
62847e1efb kata-ctl: Remove unnecessary unwrap
Switch `is_err()` and then `unwrap_err()` for `if let` which is
"more idiomatic"

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:53:40 +00:00
stevenhorsman
78824e0181 agent: Remove unnecessary unwrap
Switch `is_some()` and then `unwrap()` for `if let` which is
"more idiomatic"

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:53:40 +00:00
stevenhorsman
d135a186e1 libs: Remove unnecessary unwrap
Switch `is_err()` and then `unwrap_err()` for `if let` which is
"more idiomatic"

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:52:48 +00:00
stevenhorsman
949e0c2ca0 libs: Remove unused imports
Tidy up the imports

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:52:48 +00:00
stevenhorsman
83b0c44986 dragonball: Remove unused imports
Clean up the imports

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:52:48 +00:00
stevenhorsman
7a02c54b6c kata-ctl: Allow unused assignment in clap parsing
`command` isn't ever read, but leave it in for now so we don't disrupt
the parsing options.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:52:48 +00:00
stevenhorsman
bf1539b802 libs: Replace manual default
HugePageType has a manual default that can be derived
more concisely
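
The derived form, sketched on a stand-in enum (the real HugePageType
variants and default may differ):

```rust
#[derive(Default, Debug, PartialEq)]
enum HugePageType {
    #[default]
    ThpAlways,
    Hugetlbfs,
}

fn main() {
    // #[derive(Default)] plus #[default] replaces the manual impl.
    assert_eq!(HugePageType::default(), HugePageType::ThpAlways);
}
```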

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-21 08:52:47 +00:00
stevenhorsman
0fd9eebf0f kata-ctl: Update Cargo.lock
The cargo check identified that the lock file is out of date,
so bump this to fix the issue

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-20 16:07:34 +00:00
stevenhorsman
3f1533ae8a build: Add cargo check
We've had a couple of occasions where Cargo.lock has been out of sync
with Cargo.toml, so try and extend our rust check to pick this up in the CI.

There is probably a more elegant way than doing `cargo check` and
checking for changes, but I'll start with this approach

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-20 16:07:34 +00:00
Greg Kurz
cf3441bd2c agent: Refresh Cargo.lock
Downstream builders at Red Hat complain that `Cargo.lock` doesn't match
`Cargo.toml`.

Run `cargo check` to refresh `Cargo.lock`.

`git bisect` shows that 7cfb97d41b is the first commit where
`cargo check` has an effect in `src/agent`.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-01-20 14:44:47 +01:00
Fabiano Fidêncio
e0158869b1 tests: Add common bats test runner function
Add run_bats_tests() function to common.bash that provides consistent
test execution and reporting across all test suites (k8s, nvidia,
kata-deploy).

This removes duplicated test runner code from run_kubernetes_tests.sh,
run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-20 12:31:55 +01:00
Fabiano Fidêncio
5aff81198f helm-chart: Fix warnings on README
nydus -> `nydus`
erofs -> `erofs`

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 22:41:50 +01:00
Fabiano Fidêncio
b5a986eacf kata-deploy: Add runtime-rs TDX / SNP runtimeclasses
https://github.com/kata-containers/kata-containers/pull/11534 has been
merged and it added all the needed bits to deploy the QEMU SNP / TDX
runtime-rs variants, apart from the kata-deploy additions, which are done
by this PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 22:41:50 +01:00
Fabiano Fidêncio
c7570427d2 tests: Add report generation to NVIDIA tests
The NVIDIA GPU test runner script was not generating test reports,
causing the report_tests() function in gha-run.sh to have nothing
to display. This aligns the script with run_kubernetes_tests.sh by:

- Adding set -o pipefail for proper pipeline error handling
- Creating a reports directory with timestamped subdirectory
- Capturing test output to files with ok-/not_ok- prefixes
- Adding --timing flag to bats for timing information

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 18:21:43 +01:00
Fabiano Fidêncio
c1216598e8 static-checks: Fix kata-deploy reference
Let's just point to the official documentation rather than explaining
exactly how to deploy (and the current text was very outdated).

Removing the fluentd / minikube examples is outside the scope of this commit.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 15:09:20 +01:00
Fabiano Fidêncio
96e1fb4ca6 tools: Remove runk
The runk tool hasn't been supported for a few years, with no maintainers
since ManaSugi stopped being involved in the project and the CI was
disabled in 2024.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:43:53 +01:00
Fabiano Fidêncio
f68c25de6a kata-deploy: Switch to the rust version
Let's remove the script and rely only on the rust version from now on.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:07:49 +01:00
Fabiano Fidêncio
d7aa793dde Revert "ci: Run a nightly job using the kata-deploy rust"
This reverts commit 6130d7330f, as we're
officially switching to the rust version of kata-deploy.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 14:07:49 +01:00
Fabiano Fidêncio
17472f3f10 release: scripts: Accept KATA_TOOLS_STATIC_TARBALL env var
a2534e7bc8 introduced the logic to also
release a kata-tools tarball, but it missed allowing
KATA_TOOLS_STATIC_TARBALL env var to be passed to the release script,
leading to the following error during the release process:
```
ERROR: Invalid environment variable "KATA_TOOLS_STATIC_TARBALL"
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 13:03:23 +01:00
Fabiano Fidêncio
882862d711 release: Bump version to 3.25.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-19 11:33:45 +01:00
XanderC
93beb58c5d runtime: fix network initialization for non-hotplug VMMs
In startVM(), for VMMs without hotplug support (e.g., Firecracker or
QEMU microvm), the runtime runs prestart hooks but misses rescanning
the network namespace. This causes VMs to boot with uninitialized
network configs, as updates from CNI plugins are not captured.

This patch adds a network rescan via AddEndpoints after prestart hooks
for the non-hotplug path, ensuring correct network info is passed to
the VMM configuration before the VM starts.

Fixes #11500

Signed-off-by: XanderC <xanderc@qq.com>
2026-01-17 23:56:59 +01:00
Zvonko Kaiser
428cc5d586 gpu: Chroot Cleanup
With the newest NVRC we do not need the supported GPUs
anymore.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-17 19:27:24 +01:00
Fabiano Fidêncio
1c154b4c15 kernel: Add DAX fix for arm64
The patch has been provided upstream by Seunguk Shin and is already
approved.

We'll drop it once it becomes available in the LTS tree.

Reference:
https://lore.kernel.org/all/18af3213-6c46-4611-ba75-da5be5a1c9b0@arm.coum

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-17 19:15:53 +01:00
Fabiano Fidêncio
33b1f0786e Revert "arm64: Do not use DAX with the rootfs image"
This reverts commit 2acb94ef2d, as we have
a kernel patch approved fixing the issue.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-17 19:15:53 +01:00
Alex Lyn
fe15f2fa47 runtime-rs: Remove deprecated virtio-9p
virtio-9p has not been supported for a long time, especially within
runtime-rs, and we have no plan to support it. Removing the
related items is reasonable.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
b7cfc6fd72 runtime-rs: Remove mem-agent section from TDX/SNP configurations
As the Memory Agent feature is not used within CoCo (TDX/SNP) scenarios,
it's better to just remove the related sections.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
634ec2b56d runtime-rs: Add configurable SNP items in Makefile when make build
It aims to introduce some related items within the Makefile to enable
AMD SEV-SNP settings in the configuration at make build time, and make it
possible to generate the rendered qemu-snp-runtime-rs configuration
based on the *.in template.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
0abdb8e016 runtime-rs: Introduce a qemu-runtime-rs/SEV-SNP dedicated configuration
To make it work well on the SEV-SNP platforms for qemu-runtime-rs with
coco, a dedicated SEV-SNP configuration should be introduced to help
prepare related CVM resources.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
b0a82f7bb8 runtime-rs: Enable measured rootfs within configuration when make build
Enable measured rootfs within configuration when make build. And add
some other important items to make the configuration work well.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
3799855040 runtime-rs: Add configurable TDX items in Makefile when make build
It aims to introduce some related items within the Makefile to enable
Intel TDX settings in the configuration at make build time, and make it
possible to generate the rendered qemu-tdx-runtime-rs configuration
based on the *.in template.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Alex Lyn
4d55e2c8c8 runtime-rs: Introduce a dedicated configuration for qemu-runtime-rs/TDX
To make it work well on the TDX platforms for qemu-runtime-rs with
coco, a dedicated TDX configuration should be introduced to help
prepare related CVM resources.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-17 18:52:57 +01:00
Manuel Huber
956f43c6c6 runtime: skip MoveTo for systemd cgroups
Systemd-managed cgroups use the slice:prefix:name format, which is
not a filesystem path. Calling MoveTo() on such paths fails with
"invalid group path" and can abort cleanup before Delete() runs.
In some cases, this causes pod teardown delays.
Skip MoveTo for systemd-formatted sandbox/overhead cgroup paths when
sandbox_cgroup_only is true; systemd moves tasks on unit deletion.
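
An illustrative check for that format; the function name is an
assumption, not the runtime's actual helper:

```rust
// systemd cgroups use "slice:prefix:name" (e.g. "system.slice:kata:<id>"),
// which is not a filesystem path, so MoveTo() must be skipped for them.
fn is_systemd_cgroup(path: &str) -> bool {
    path.contains(':')
}

fn main() {
    assert!(is_systemd_cgroup("system.slice:kata:sandbox-id"));
    assert!(!is_systemd_cgroup("/kata_overhead/sandbox-id"));
}
```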

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-16 16:41:38 +01:00
Manuel Huber
6b70923e55 docs: Update NVIDIA GPU passthrough QEMU scenario
With cold-plug becoming by design the only supported mode with the
update of NVRC to v0.1.1, resolving references to hot-plug.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-16 13:50:10 +01:00
Steve Horsman
610a8bdfd5 Merge pull request #12346 from Amulyam24/ppc64le-payload
ci: move the job publish kata payload after push to an alternate runner for ppc64le
2026-01-16 11:41:53 +00:00
Fabiano Fidêncio
ea18f543b4 tests: kata-deploy: Enable verification during helm install
Enable post-install verification in kata-deploy CI tests. When
HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created
that runs with the Kata runtime to confirm deployment succeeded.

The verification pod prints kernel info and exits - success indicates
the Kata runtime is properly configured and functional.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-16 10:52:43 +01:00
Fabiano Fidêncio
a188f04d75 kata-deploy: helm: Add optional post-install verification
Add optional verification that runs after kata-deploy installation.
When a pod spec is provided via --set-file verification.pod=<file>,
a verification job runs after install/upgrade to validate deployment.

The user is fully responsible for the verification pod content:
- Pod name, runtimeClassName, annotations, and verification logic
- Pod must exit 0 on success, non-zero on failure

The verification job simply:
1. Waits for kata-deploy DaemonSet to be ready
2. Applies the user-provided pod spec
3. Waits for the pod to complete
4. Shows logs and cleans up

Usage:
  helm install kata-deploy ... \
    --set-file verification.pod=/path/to/your-pod.yaml

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-16 10:52:43 +01:00
Amulyam24
859313d904 ci: move the job payload after push to an alternate runner for ppc64le
To unlock the release, move the job that publishes the kata payload after push to an alternate runner (IBM-owned) for ppc64le.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-01-16 11:14:42 +05:30
Alex Lyn
c0cca81993 runtime-rs: Set default_bridges with 0 for dragonball vmm
As the Dragonball VMM does not support PCI hotplug options, it should
be set to 0.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-15 20:32:15 +01:00
Alex Lyn
1a76d44e16 kata-types: Change the default bridges to 1
It aims to align it with the Makefile and configuration's
setting.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-15 20:32:15 +01:00
Alex Lyn
6375b3881d runtime-rs: Set the default bridges to 1
As runtime-go uses a default of 1 bridge, it should be
kept as 1 to avoid alignment issues.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-15 20:32:15 +01:00
Alex Lyn
8728b262fb Merge pull request #12338 from zvonkok/nvrc-update
gpu: Bump NVRC Version
2026-01-15 19:36:07 +08:00
Zvonko Kaiser
adce41c432 gpu: Bump NVRC Version
The new NVRC version works for CC and non-CC use cases,
no --feature confidential needed anymore.

Bump versions.yaml and adjust deployment instructions.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-15 01:51:10 +00:00
Manuel Huber
6753c3ac08 runtime: nvidia: Disable NVDIMM
Disable NVDIMM. When using GPU passthrough, using NVDIMM would create
a r/o file-backed memory region. When using a GPU, QEMU tries to DMA-
map guest memory for the device, resulting in a mapping error:
memory listener initialization failed: Region mem0:
vfio_container_dma_map ... -22 (Invalid argument).
For the CC configs, NVDIMM is disabled by default in qemu_amd64.go
with a warning, but we also explicitly disable the setting in the
shim configuration file.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-14 22:51:07 +01:00
Fabiano Fidêncio
a9dda0e52b versions: nvidia: Bump kernel to the latest LTS
Now that we have the decoupled rootfs / kernel, doing the bump
becomes trivial.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 20:45:54 +01:00
Fabiano Fidêncio
4e99860fd2 workflows: nvidia: Adjust to kernel / rootfs build decoupling
We don't need to store the kernel headers anymore. We do need to store
the kernel modules, instead.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
02d2b6bdf2 kernel: bump kata_config_version
We have kernel build changes, so bump the config version.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
a075c3740a gpu: build_image.sh use versions.yaml
We've been doing some bad file-based driver determination;
now with versions.yaml there is a single source of truth.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
ffc8725164 gpu: rootfs update decoupling
Remove all the driver build instructions,
since those are now done in the kernel target.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
cca973772d gpu: deploy modules for kernel build
We need to package the built modules so the rootfs build
is able to consume them. We package the whole
/lib/modules/$(uname -r) directory with strip=2.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
13ed3cdff9 gpu: Add NVIDIA modules to build-kernel.sh
Check out and build the kernel modules along
with the kernel to avoid the kernel rootfs dependency.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
2a11910acb gpu: Remove building of Headers
Since we build the modules alongside the kernel, we do not need
to carry the headers over to the rootfs build.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
b1870fef07 gpu: versions.yaml nvidia driver pinning
We want to have deterministic behaviour, with only
one valid driver version accepted via versions.yaml.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Zvonko Kaiser
229481b348 kernel: bugfix install yq
We actually never installed yq for the kernel build;
there are some paths that use yq, but they were never hit.
For the GPU use case we need to read values from versions.yaml.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-01-14 20:45:54 +01:00
Steve Horsman
6db3a4cf8d Merge pull request #12333 from fitzthum/bump-v0180
Update Trustee and guest-components for upcoming releases
2026-01-14 19:44:55 +00:00
Tobin Feldman-Fitzthum
ca29e68acb agent-ctl: bump image-rs version
In preparation for coco v0.18.0, bump the version of image-rs we use in
agent-ctl to match what we have in versions.yaml.

Drop the snapshotter-overlayfs feature. This was dropped from image-rs
when we removed enclave-cc support.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-01-14 06:54:29 -08:00
Tobin Feldman-Fitzthum
25a08ef739 versions: bump Trustee and guest-components
Before cutting the Kata release that will be used with CoCo v0.18.0,
let's bump the versions of Trustee and guest-components to latest.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-01-14 06:43:30 -08:00
Steve Horsman
0f5f914a04 Merge pull request #12330 from LandonTClipp/docs_improvement
docs: Navigation improvements and bug fixes to Pages
2026-01-14 14:13:29 +00:00
stevenhorsman
70e3e2b0c9 genpolicy: Bump openssl-src
This is a vulnerability (CVE-2025-9230) in openssl, so move
to 3.5.4 which has a fix for this

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-14 14:05:48 +01:00
stevenhorsman
aace7a7336 versions: Bump openssl-src
This is a vulnerability (CVE-2025-9230) in openssl, so move
to 3.5.4 which has a fix for this

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-14 14:05:48 +01:00
Fabiano Fidêncio
2acb94ef2d arm64: Do not use DAX with the rootfs image
Kernel 6.18.x has an issue with DAX, which is not yet fixed upstream:
```
[    0.737679] EXT4-fs (pmem0p1): mounted filesystem 79676804-7c8b-491a-b2a6-9bae3c72af70 ro with ordered data mode. Quota mode: disabled.
[    0.737891] VFS: Mounted root (ext4 filesystem) readonly on device 259:1.
[    0.739119] devtmpfs: mounted
[    0.739476] Freeing unused kernel memory: 1920K
[    0.740156] Run /sbin/init as init process
[    0.740229]   with arguments:
[    0.740286]     /sbin/init
[    0.740321]   with environment:
[    0.740369]     HOME=/
[    0.740400]     TERM=linux
[    0.743162] Unable to handle kernel paging request at virtual address fffffdffbf000008
[    0.743285] Mem abort info:
[    0.743316]   ESR = 0x0000000096000006
[    0.743371]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.743444]   SET = 0, FnV = 0
[    0.743489]   EA = 0, S1PTW = 0
[    0.743545]   FSC = 0x06: level 2 translation fault
[    0.743610] Data abort info:
[    0.743656]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[    0.743720]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.743785]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.743848] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b9d17000
[    0.743931] [fffffdffbf000008] pgd=10000000bfa3d403, p4d=10000000bfa3d403, pud=1000000040bfe403, pmd=0000000000000000
[    0.744070] Internal error: Oops: 0000000096000006 [#1]  SMP
[    0.748888] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.18.4 #1 NONE
[    0.749421] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.749969] pc : dax_disassociate_entry.constprop.0+0x20/0x50
[    0.750444] lr : dax_insert_entry+0xcc/0x408
[    0.750802] sp : ffff80008000b9e0
[    0.751083] x29: ffff80008000b9e0 x28: 0000000000000000 x27: 0000000000000000
[    0.751682] x26: 0000000001963d01 x25: ffff0000004f7d90 x24: 0000000000000000
[    0.752264] x23: 0000000000000000 x22: ffff80008000bcc8 x21: 0000000000000011
[    0.752836] x20: ffff80008000ba90 x19: 0000000001963d01 x18: 0000000000000000
[    0.753407] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    0.753970] x14: ffffbf3154b9ae70 x13: 0000000000000000 x12: ffffbf3154b9ae70
[    0.754548] x11: ffffffffffffffff x10: 0000000000000000 x9 : 0000000000000000
[    0.755122] x8 : 000000000000000d x7 : 000000000000001f x6 : 0000000000000000
[    0.755707] x5 : 0000000000000000 x4 : 0000000000000000 x3 : fffffdffc0000000
[    0.756287] x2 : 0000000000000008 x1 : 0000000040000000 x0 : fffffdffbf000000
[    0.756871] Call trace:
[    0.757107]  dax_disassociate_entry.constprop.0+0x20/0x50 (P)
[    0.757592]  dax_iomap_pte_fault+0x4fc/0x808
[    0.757951]  dax_iomap_fault+0x28/0x30
[    0.758258]  ext4_dax_huge_fault+0x80/0x2dc
[    0.758594]  ext4_dax_fault+0x10/0x3c
[    0.758892]  __do_fault+0x38/0x12c
[    0.759175]  __handle_mm_fault+0x530/0xcf0
[    0.759518]  handle_mm_fault+0xe4/0x230
[    0.759833]  do_page_fault+0x17c/0x4dc
[    0.760144]  do_translation_fault+0x30/0x38
[    0.760483]  do_mem_abort+0x40/0x8c
[    0.760771]  el0_ia+0x4c/0x170
[    0.761032]  el0t_64_sync_handler+0xd8/0xdc
[    0.761371]  el0t_64_sync+0x168/0x16c
[    0.761677] Code: f9453021 f2dfbfe3 cb813080 8b001860 (f9400401)
[    0.762168] ---[ end trace 0000000000000000 ]---
[    0.762550] note: init[1] exited with irqs disabled
[    0.762631] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
```

For now, we limit the rootfs that we ship to ARM64 to not use DAX, in
the future we'll re-enable it as soon as the patch lands on mainstream
kernel.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 11:46:40 +01:00
Fabiano Fidêncio
3ef99f4ee3 versions: Add specific nvidia kernel version
This is needed as the 580 driver doesn't build against 6.18.x, and the
590 driver is not yet fully working for our case, thus we stick to the
previous version that worked before.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 11:46:40 +01:00
Fabiano Fidêncio
cce5d4abf6 kernel: bump to v6.18.x (LTS)
Bump both the kernel and kernel-confidential versions from v6.12.x and
v6.16.x to v6.18.4, aligning with the new LTS release.

Kernel 6.18 introduced several configuration changes that required
updates to our kernel config fragments:

* CRYPTO_FIPS dependencies changed:
  - In 6.12: depended on !CRYPTO_MANAGER_DISABLE_TESTS
  - In 6.18: now depends on CRYPTO_SELFTESTS (which requires EXPERT)
  Added CONFIG_EXPERT=y and CONFIG_CRYPTO_SELFTESTS=y to crypto.conf
  to satisfy the new dependency chain.
  * CONFIG_EXPERT is a naughty one, as it disables / enables a bunch
    of things behind one's back, probably just to prove a point that
    it is for experts ;-) ... regardless, a reasonable amount of
    options had to be re-added in order to make sure nothing ends
    up broken.

* Legacy iptables support:
  Kernel 6.18 requires explicit legacy xtables/iptables configs for
  IP_NF_* options. Added CONFIG_NETFILTER_XTABLES_LEGACY,
  CONFIG_IP_NF_IPTABLES_LEGACY, and CONFIG_IP6_NF_IPTABLES_LEGACY
  to netfilter.conf.

* Module signing dependencies:
  Added CONFIG_MODULES=y and other required dependencies to
  module_signing.conf to ensure MODULE_SIG can be properly enabled.

* Whitelist updates:
  - Added CONFIG_NF_CT_PROTO_DCCP (removed in 6.18+)
  - Added CONFIG_CRYPTO_SELFTESTS, CONFIG_NETFILTER_XTABLES_LEGACY,
    CONFIG_IP_NF_IPTABLES_LEGACY, CONFIG_IP6_NF_IPTABLES_LEGACY
    (added in 6.18+, not present in older kernels like 6.12)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-14 11:46:40 +01:00
LandonTClipp
197231456f docs: Navigation improvements and bug fixes to Pages
A few minor changes to the Zensical config that make navigation easier. Also
fixed a couple of bugs with local serving and added some quality-of-life
features to Zensical.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-01-13 11:17:58 -06:00
LandonTClipp
94fde1356c docs: Add Zensical Doc Site Generation
This commit adds a Github workflow for building a Github Pages site for the markdown
files in the docs/ directory. Zensical is a new markdown-based static site generation
framework built by the creators of Material for Mkdocs. https://zensical.org/

This commit does not clean the doc structure, so site navigation is initially going to
be messy.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-01-13 12:42:02 +01:00
dependabot[bot]
2edb161c53 build(deps): bump github.com/urfave/cli in /src/runtime
Bumps [github.com/urfave/cli](https://github.com/urfave/cli) from 1.22.14 to 1.22.17.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v1.22.14...v1.22.17)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli
  dependency-version: 1.22.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-13 09:04:41 +00:00
dependabot[bot]
3377d729ea build(deps): bump rsa from 0.9.6 to 0.9.9 in /src/tools/agent-ctl
Bumps [rsa](https://github.com/RustCrypto/RSA) from 0.9.6 to 0.9.9.
- [Changelog](https://github.com/RustCrypto/RSA/blob/v0.9.9/CHANGELOG.md)
- [Commits](https://github.com/RustCrypto/RSA/compare/v0.9.6...v0.9.9)

---
updated-dependencies:
- dependency-name: rsa
  dependency-version: 0.9.9
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-13 04:08:40 +01:00
Fupan Li
1f1a000608 Merge pull request #12291 from Apokleos/bump-qapi
runtime-rs: Bump qapi-rs from 0.14 to 0.15
2026-01-13 10:39:41 +08:00
Manuel Huber
9e30283952 runtime: nvidia: change kernel parameters
Remove the agent hotplug timeout parameter from the kernel
command line. Having shifted to VFIO cold-plug, this parameter is
no longer needed.
Remove the no longer required parameter for TDX and thus align the
SNP and TDX configurations.
Add a parameter to avoid the kernel to mount the /dev tmpfs. NVRC
and later on kata-agent attempt this. While kata-agent does not
panic when mounting /dev fails, NVRC makes mounting /dev a hard
requirement.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-12 16:11:28 -08:00
dependabot[bot]
bcadb9b231 build(deps): bump sequoia-openpgp in /src/tools/agent-ctl
Bumps [sequoia-openpgp](https://gitlab.com/sequoia-pgp/sequoia) from 2.0.0 to 2.1.0.
- [Commits](https://gitlab.com/sequoia-pgp/sequoia/compare/openpgp/v2.0.0...openpgp/v2.1.0)

---
updated-dependencies:
- dependency-name: sequoia-openpgp
  dependency-version: 2.1.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-12 22:16:51 +01:00
Alex Lyn
fba92880c9 tests: make set_container_command idempotent and add debug output
set_container_command() previously appended command arguments one-by-one
with '.command += [...]'. This made the helper non-idempotent and could
lead to unexpected command arrays when invoked multiple times.

Update the helper to set the full command array in a single yq v4
expression and print the target YAML path plus the command being
applied to simplify debugging when tests fail.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-12 17:56:28 +01:00
Alex Lyn
38296a41b2 tests: Generate pod config with stable .yaml suffix
The pod config file created by new_pod_config() was generated via
mktemp using the template "pod-config.yaml.in.XXX", which produces
filenames that do not end with ".yaml" (e.g. pod-config.yaml.in.ABC).

If the random suffix happens to produce a special extension such as
".Csv" or ".Xml", the following operations with yq will fail.

Some helpers and tooling assume the config path ends with ".yaml".
Switch the mktemp template to place the random suffix before the
extension so the returned path always ends with ".yaml".

Fixes: #12268, #12319

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-12 17:56:28 +01:00
Fabiano Fidêncio
9fec31f400 tools: kubectl: Add kubectl version as a tag
This is a suggestion from Choi, so we can easily test with a specific
kubectl version and also easily understand which kubectl version is
being used in case of failure.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-12 15:48:44 +01:00
Fabiano Fidêncio
26dfcb627b tools: Build kubectl image
This image will be used by our helm charts to verify that a
kata-containers deployment is correct.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-12 15:48:44 +01:00
Alex Lyn
d03eccf567 runtime-rs: Improve wait_for_migration to avoid fixed sleep
Enhance the wait_for_migration implementation to reliably wait for
QEMU migration completion and avoid the previous `sleep(280ms)`
delay.
(1) Add an initial fast-path query to return immediately if
migration is already completed/failed/cancelled.
(2) Use a hard deadline to enforce timeouts deterministically.
(3) Implement adaptive polling with backoff and a maximum interval
to reduce QMP load while keeping responsiveness.
(4) Unify migration status handling and return clear errors on
failed/cancelled states.
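
The shape of that loop as a minimal sketch; query_status stands in for
the QMP query-migrate call:

```rust
use std::time::{Duration, Instant};

enum MigrationStatus { Completed, Failed, Cancelled, Active }

fn wait_for_migration(timeout: Duration) -> Result<(), String> {
    let deadline = Instant::now() + timeout; // (2) hard deadline
    let mut interval = Duration::from_millis(10);
    loop {
        match query_status() { // first iteration doubles as the (1) fast path
            MigrationStatus::Completed => return Ok(()),
            MigrationStatus::Failed => return Err("migration failed".into()),
            MigrationStatus::Cancelled => return Err("migration cancelled".into()),
            MigrationStatus::Active => {
                if Instant::now() >= deadline {
                    return Err("migration timed out".into());
                }
                // (3) adaptive polling: back off up to a maximum interval.
                std::thread::sleep(interval);
                interval = (interval * 2).min(Duration::from_millis(250));
            }
        }
    }
}

// Placeholder for the real QMP query.
fn query_status() -> MigrationStatus { MigrationStatus::Completed }

fn main() {
    assert!(wait_for_migration(Duration::from_secs(1)).is_ok());
}
```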

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-12 20:06:55 +08:00
Alex Lyn
5026b33455 runtime-rs: Introduce a method to detect current migrate info
Return information about the current migration process. The input
and output are as below:
{ 'command': 'query-migrate', 'returns': 'MigrationInfo' }

But note that this QEMU API is only available with qapi-rs (v0.15+).

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-12 20:06:55 +08:00
Alex Lyn
c472b5db54 runtime-rs: Bump qapi-rs from 0.14 to 0.15
The detailed information about the updated versions is as below:
```
qapi = { version = "0.15", features = ["qmp", "async-tokio-all"] }
qapi-spec = "0.3.2"
qapi-qmp = "0.15.0"
```
and it will correct some corresponding structures.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-12 20:06:55 +08:00
Manuel Huber
183507beeb agent: change secure_storage_integrity default
Change the secure_storage_integrity option's default value to true.
With this, integrity protection for encrypted block device contents
will be requested from the confidential data hub by default, see the
agent's cdh_handler_trusted_storage function in rpc.rs.
This behavior can be disabled by explicitly setting the
agent.secure_storage_integrity parameter to 0 or false via kernel
command line parameters.

This will affect the trusted storage implementation for the guest-pull
mechanism, and it will affect future implementations using this code
path, such as implementations for ephemeral secure storage.
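
An illustrative reading of the kernel command line parameter under the
new default (the function name is made up; the real logic lives in the
agent's config parsing):

```rust
fn secure_storage_integrity(cmdline: &str) -> bool {
    for token in cmdline.split_whitespace() {
        if let Some(value) = token.strip_prefix("agent.secure_storage_integrity=") {
            // Only an explicit 0/false opts out.
            return !matches!(value, "0" | "false");
        }
    }
    true // new default: integrity protection is requested
}

fn main() {
    assert!(secure_storage_integrity("console=hvc0"));
    assert!(!secure_storage_integrity("agent.secure_storage_integrity=false"));
    assert!(secure_storage_integrity("agent.secure_storage_integrity=true"));
}
```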

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-10 16:54:03 +01:00
stevenhorsman
a0d96256f5 packaging: Fix tools permissions issue
In some builds we are seeing:
```
error: could not create temp file /opt/rustup/tmp/r2xu46kwuyc7k2kr_file: Permission denied (os error 13)
```
in the agent-ctl build, so try and port a fix from #12313 to the tools build
to try and resolve this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-09 21:45:26 +01:00
Federico A. Corazza
787768fe9b kata-deploy: Fix extraction of the containerd major version
Fixes deploying kata-containers using k3s. The deploy script fails with /opt/kata-artifacts/scripts/kata-deploy.sh: line 397: [: too many arguments

Signed-off-by: Federico A. Corazza <git@facorazza.com>
2026-01-09 19:52:18 +01:00
stevenhorsman
5067ed7d9a versions.yaml: Fix formatting errors
yamllint complains that there is only one space before the comment,
so add a second to prevent this annoying message showing up.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-09 19:36:31 +01:00
stevenhorsman
a850f66fc4 versions: Bump rust to 1.89
Following the agreed toolchain policy - bump rust to the current (1.91)-2
releases.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-09 19:36:31 +01:00
Manuel Huber
df2896c298 docs: Create NVIDIA GPU passthrough QEMU scenario
Create a new page for a reference implementation for Kubernetes
using QEMU, the go shim and an NVIDIA rootfs. The new page
contains information on:
- components involved in the NVIDIA (TEE) GPU scenario
- orchestration flow for GPU passthrough scenarios
- deployment guidance

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-09 19:02:56 +01:00
Manuel Huber
43627805f4 docs: Improve structure and flow of NVIDIA guide
- Apply a few structural/grouping changes and improve flow
- Group build sections together
- Move usage examples to last section

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-09 19:02:56 +01:00
Steve Horsman
489deaad17 Merge pull request #12297 from manuelh-dev/mahuber/fix-doc
docs: Fix trusted-image-storage reference
2026-01-09 15:22:25 +00:00
Hyounggyu Choi
2962e14c10 virtiofsd: fix RUSTUP_HOME and CARGO_HOME permissions for non-root builds
The following error was observed during virtiofsd static build:

```
error: could not create temp file /opt/rustup/tmp/p44enysfaxwdbvw4_file:
Permission denied (os error 13)
```

This occurs because RUSTUP_HOME and CARGO_HOME were initialized by the
root user during `docker build`, but `cargo build` is executed as a
non-root user via 'docker run --user'.

Ensure these directories are writable by adjusting the permission after
the toolchain installation is complete.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-01-09 14:01:20 +01:00
Manuel Huber
65aa99f291 docs: Fix trusted-image-storage reference
The sample uses a volume device name which does not exist,
hence fix.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-09 11:41:18 +00:00
Saul Paredes
02979a13e3 Merge pull request #12208 from romoh/patch-1
ci: Update AKS setup post Pod Sandboxing GA
2026-01-08 11:02:05 -08:00
Fabiano Fidêncio
f8318c0542 kata-deploy: Remove unused dependency
We're depending solely on toml_edit, thus we can safely remove the toml
dependency.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-08 18:58:11 +01:00
Fupan Li
b3546f3a68 Merge pull request #12282 from kata-containers/set-required-ci
Set several tests as required ci
2026-01-08 20:34:39 +08:00
Mikko Ylinen
cc6277b735 Revert "tdx: Update GPU config for the latest TDX stack"
Prefer the "full feature TDVF" instead of the generic OVMF build. See
Option-B in
https://github.com/tianocore/edk2/tree/master/OvmfPkg/IntelTdx#configurations-and-features
for the extra hardening supported.

FIRMWAREPATH_NV also seems to be TDX specific, contrary to what the
Makefile suggests. Therefore, it can be dropped completely.

This reverts commit 66ccc25724.
2026-01-08 10:21:47 +01:00
Mikko Ylinen
e02e226431 packaging: build OVMF for Intel TDX again
OVMF build for Intel TDX (aka "TDVF") was disabled in favor of Ubuntu/
CentOS pre-upstream releases of Intel TDX.

See 4292c4c3b1.

It's time to re-enable the build and move runtime configurations to
use it (the latter will be done in a later commit).

This is a partial revert of 4292c4c3b with the following changes:
- Stop calling OVMF for Intel TDX "TDVF" and follow the naming distros
use for the TDX-enabled build: OVMF.inteltdx.fd.
- The single binary OVMF.inteltdx.fd is supported using the -bios QEMU param.
- Secure Boot infrastructure is disabled since Kata does not support it.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-01-08 10:21:47 +01:00
Alex Lyn
f3d92a8b4a dragonball: Fix UT failed in test_fs_manipulate_backend_fs
Improve the checking logic for whether the source path exists.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 12:42:00 +08:00
Alex Lyn
7de968b416 dragonball: Fix warning of unused method
This method is actually called; just add the `#[allow(dead_code)]`
attribute to let the UTs pass. The warning looks like:
warning: method `send_message_with_payload` is never used
    |
224 | impl<R: Req> Endpoint<R> {
    | ------------------------ method in this implementation
...
522 |     pub fn send_message_with_payload<T: Sized, P: Sized>(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 11:01:34 +08:00
Alex Lyn
36d3d7c3bf dragonball: Fix warnings of result to be handled
warning: unused `std::result::Result` that must be used
   --> src/dragonball/dbs_virtio_devices/src/vhost/vhost_user/net.rs:679:9
    |
679 | /         VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
680 | |             &mut dev, 0, &config,
681 | |         );
    | |_________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: `#[warn(unused_must_use)]` on by default
help: use `let _ = ...` to ignore the resulting value
    |
679 |         let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
    |         +++++++

warning: unused `std::result::Result` that must be used
   --> src/dragonball/dbs_virtio_devices/src/vhost/vhost_user/net.rs:683:9
    |
683 | /         VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
684 | |             &mut dev, 0, &mut data,
685 | |         );
    | |_________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
help: use `let _ = ...` to ignore the resulting value
    |
683 |         let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
    |         +++++++

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 10:52:19 +08:00
Alex Lyn
6a1b25a4b0 dragonball: Fix warning of variable does not need to be mutable
The warning looks like:
...
warning: variable does not need to be mutable
   --> src/dragonball/dbs_virtio_devices/src/vsock/csm/txbuf.rs:217:13
    |
217 |         let mut tmp: Vec<u8> = vec![0; TxBuf::SIZE - 2];
    |             ----^^^
    |             |
    |             help: remove this `mut`
    |
    = note: `#[warn(unused_mut)]` on by default
...

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 10:44:25 +08:00
Alex Lyn
064271b9cb dragonball: Fix unexpected cfg condition of test-resources
Fix the warnings about the unexpected cfg of test-resources; the
detailed warning message is shown below:

...
warning: unexpected `cfg` condition value: `test-resources`
   --> src/dragonball/dbs_virtio_devices/src/fs/device.rs:973:11
    |
973 |     #[cfg(feature = "test-resources")]
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: expected values for `feature` are: `fuse-backend-rs`, `vhost`, `vhost-net`, `vhost-rs`, `vhost-user`, `vhost-user-blk`, `vhost-user-fs`, `vhost-user-net`, `virtio-balloon`, `virtio-blk`, `virtio-fs`, `virtio-fs-pro`, `virtio-mem`, `virtio-mmio`, `virtio-net`, and `virtio-vsock`
    = help: consider adding `test-resources` as a feature in `Cargo.toml`
...

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 10:39:33 +08:00
Alex Lyn
ef36c47ca4 runtime-rs: Fix deprecated method in UT
Remove into_path() and replace it with keep().

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 10:32:31 +08:00
Alex Lyn
e4451baa84 tests: Set run-nerdctl-tests with qemu-runtime-rs required
run-nerdctl-tests (qemu-runtime-rs)

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 09:56:50 +08:00
Alex Lyn
56a21c33a3 tests: Set stability tests with qemu-runtime-rs required
run-containerd-stability (active, qemu-runtime-rs)
run-containerd-stability (lts, qemu-runtime-rs)

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 09:56:50 +08:00
Alex Lyn
679e31d884 tests: Set run-nydus CIs as required
run-basic-amd64-tests / run-nydus

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-01-08 09:56:50 +08:00
Fabiano Fidêncio
6b3953dd51 tests: k8s: liveness-probes: Adjust events grep
Until k8s 1.34 we could grep for "Started containerd". From k8s 1.35
onwards the event message changed and we should, instead, grep for
"Container started".

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-07 23:01:59 +01:00
Fabiano Fidêncio
c4194538e2 versions: Bump QEMU to v10.2.0
QEMU v10.2.0 was released on December 24th, 2025.

The experimental GPU SNP / TDX are also pointing to v10.2.0 release with
their gpu-{snp,tdx}-20260107 branch.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-07 12:30:55 +01:00
Steve Horsman
93ad6fde75 Merge pull request #12294 from stevenhorsman/remediate-RUSTSEC-2021-0064
versions: Bump sha2 crate version
2026-01-07 09:53:26 +00:00
stevenhorsman
c456b84537 versions: Bump sha2 crate version
sha2 0.9.3 includes the use of cpuid-bool, which was renamed to cpufeatures
around 5 years ago. Try moving to a workspace dependency of sha2
and bumping to the latest version to remediate RUSTSEC-2021-0064

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-01-06 15:41:34 +00:00
Roaa Sakr
44c79cf14a ci: Update AKS setup post Pod Sandboxing GA
Update workload-runtime value to align with current AKS Pod Sandboxing documentation post GA.

Signed-off-by: Roaa Sakr <romoh@microsoft.com>
2026-01-05 13:47:33 -08:00
Steve Horsman
9463dd970e Merge pull request #12287 from mythi/drop-qat
use-cases: drop Intel QuickAssist instructions
2026-01-05 13:28:16 +00:00
Mikko Ylinen
99bc0f49cc use-cases: drop Intel QuickAssist instructions
While the use-case of Intel QuickAssist (QAT) accelerated crypto
and/or compression with k8s and Kata Containers is still valid,
the setup instructions are outdated:

Starting with Intel Xeon Gen4 (Sapphire Rapids), the QAT driver
stack moved to in-tree drivers without a separate SR-IOV VF
driver.

Drop all the setup instructions but keep the use-cases doc
for reference. Users wanting to enable the use-case should consult
the Intel QAT device plugin or Intel QAT DRA driver authors.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-01-02 12:14:04 +02:00
Fupan Li
b27a80b800 Merge pull request #12156 from Apokleos/required-coco-dev-rs
tests: Make the tests coco-dev job with coco-dev-runtime-rs required
2025-12-25 17:30:40 +08:00
Steve Horsman
bdc5f7d4be Merge pull request #12271 from stevenhorsman/bump-rust-to-1.88
Bump rust to 1.88
2025-12-23 21:38:42 +00:00
Alex Lyn
0b1a5c6e93 tests: Make the tests coco-dev job with coco-dev-runtime-rs required
The nontee job (run-k8s-tests-coco-nontee) for qemu-coco-dev-runtime-rs
is running well and it's time to make it required when the CI runs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-23 09:54:52 +08:00
stevenhorsman
b6108a7c4a dragonball: Fix manual implementation of .is_multiple_of
Use this new method to avoid the clippy warning and increase
readability
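
As an illustrative sketch (not the actual diff) of what this clippy
fix looks like:

```rust
fn main() {
    let len: u64 = 4096;
    // Before: manual modulo check flagged by clippy
    assert!(len % 512 == 0);
    // After: the intent is explicit (stable since Rust 1.87)
    assert!(len.is_multiple_of(512));
}
```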

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
55be31ef0f runtime-rs: Fix manual implementation of .is_multiple_of
Use this new method to avoid the clippy warning and increase
readability

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
1d139a7c92 versions: Bump rust to 1.88
In prep for the bump to rust 1.90, try bumping
to 1.88 first to see if the CI is successful here

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
c6053e976f dragonball: Improve vector initialisation
Directly initialise a zero-filled vector, rather than resizing it later
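
A hedged example of the pattern, with made-up sizes:

```rust
fn main() {
    // Before: allocate empty, then resize to zero-fill
    let mut before: Vec<u8> = Vec::new();
    before.resize(1024, 0);
    // After: directly initialise a zero-filled vector
    let after = vec![0u8; 1024];
    assert_eq!(before, after);
}
```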

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
18a51dad98 dragonball: Fix manual slice size calculation
Using the built-in size_of_val is easier to read and less error-prone
than doing this calculation manually
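
A small illustrative sketch of the two approaches:

```rust
fn main() {
    let queue = [0u64; 16];
    // Before: manual element-size * length arithmetic
    let manual = queue.len() * std::mem::size_of::<u64>();
    // After: the compiler computes the slice size for us
    assert_eq!(manual, std::mem::size_of_val(&queue));
}
```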

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
188c9e6eb7 dragonball: Prefer from over into
Implementing From gives Into for free, so prefer this method

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
c7daa12fe6 dragonball: Remove unnecessary cast
Don't cast usize to usize

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
6c19bd01c8 dragonball: Fix redundant pattern matching
Convert `matches!(desc, None)` to `desc.is_none()`, which is simpler
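
The transformation, illustratively:

```rust
fn main() {
    let desc: Option<u16> = None;
    // Before: redundant pattern matching
    assert!(matches!(desc, None));
    // After: simpler, and reads as intended
    assert!(desc.is_none());
}
```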

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
15c6ef5988 dragonball: Fix deprecated cargo-clippy cfg
#[cfg(feature = "cargo-clippy")] has been deprecated for years,
so should be replaced with `#[cfg(clippy)]`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
e0d09dd787 dragonball: Fix useless use of vec!
`vec![...]` is the same as `[...]`, so remove it to clean up code

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
4fb90d61aa dragonball: Temporarily skip kvm bindgen tests
There are many, many null pointer dereferences in the bindgen code
when moving between rust 1.85.1 and 1.86, and no docs of the source
it was generated from, so skip these tests from running until an
SME can look at them @lifupan

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:19 +00:00
stevenhorsman
04306c162b genpolicy: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
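
An illustrative example of the lint (not taken from genpolicy itself):

```rust
fn main() {
    let name = "genpolicy";
    // Before: positional argument, flagged by clippy::uninlined_format_args
    let before = format!("tool: {}", name);
    // After: the identifier is captured inline
    let after = format!("tool: {name}");
    assert_eq!(before, after);
}
```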

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:11 +00:00
stevenhorsman
b9ce0bbdf8 trace-forwarder: Fix uninlined_format_args in examples
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:11 +00:00
stevenhorsman
c5f0acef23 kata-ctl: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:02 +00:00
stevenhorsman
aff3524420 kata-ctl: Refresh runtime-rs crates
runtime-rs crates are pulled into kata-ctl and some of these have
bumped recently, so update these in kata-ctl as well

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:50:01 +00:00
stevenhorsman
2caa62f753 agent-ctl: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:52 +00:00
stevenhorsman
6006b8350d libs: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:45 +00:00
stevenhorsman
2fde31547a runtime-rs: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:36 +00:00
stevenhorsman
a299338b6c dragonball: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:27 +00:00
stevenhorsman
e44c4d901f doc: Fix uninlined_format_args in examples
Clippy is recommending that format args are inlined for
better clarity, so ensure our docs include this

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:27 +00:00
stevenhorsman
b07899f8dc agent: Fix uninlined_format_args
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-22 19:49:17 +00:00
stevenhorsman
2af88dbb48 agent: bump cdi-rs
In #12151 the version was bumped in Cargo.toml, but the update was not
done, so run `cargo update -p container-device-interface` to apply it

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-20 10:08:45 +00:00
Steve Horsman
97603608ac Merge pull request #12259 from RuoqingHe/filter-tests-requires-kvm
dragonball: Skip tests require kvm while kvm is absent
2025-12-19 16:05:33 +00:00
Steve Horsman
81d74346f3 Merge pull request #12255 from stevenhorsman/bump-to-rust-1.90-prep
Preparations for the rust 1.90 bump
2025-12-19 14:41:32 +00:00
Steve Horsman
b75cc16bad Merge pull request #12272 from shwetha-s-poojary/revert_cleanup
workflows: payload: do not remove AGENT_TOOLSDIRECTORY
2025-12-19 14:22:36 +00:00
shwetha-s-poojary
1929ca8879 workflows: payload: do not remove AGENT_TOOLSDIRECTORY
Remove the line that deletes $AGENT_TOOLSDIRECTORY

Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2025-12-19 05:24:36 -08:00
Ruoqing He
5fa663b1e3 dragonball: Skip tests that require KVM when KVM is absent
KVM is not available on our ARM runners, so skip those tests
accordingly, while the remaining test cases stay tested on machines
with KVM present and access to the KVM device.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-18 14:17:46 +00:00
Ruoqing He
7cfb97d41b libs: Introduce skip_if_kvm_unaccessable macro
There are test cases that require interaction with the KVM device;
introduce the skip_if_kvm_unaccessable macro to skip them.
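
A hypothetical sketch of what such a macro could look like; the real
implementation in libs may differ:

```rust
// Hedged sketch, not the actual libs code. It treats "accessible"
// as being able to open /dev/kvm read/write.
#[macro_export]
macro_rules! skip_if_kvm_unaccessable {
    () => {
        if std::fs::OpenOptions::new()
            .read(true)
            .write(true)
            .open("/dev/kvm")
            .is_err()
        {
            println!("/dev/kvm is not accessible, skipping test");
            return;
        }
    };
}
```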

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-12-18 12:43:20 +00:00
stevenhorsman
e5568e65a1 lib: Fix missing copyright and license
Add the copyright date from when the file was first submitted to GitHub

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
175c2c70b1 dragonball: Fix pointer equality check
Use `ptr::eq` to compare references by address rather than the
values that they point to
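
An illustrative sketch of the difference:

```rust
fn main() {
    let a = String::from("kata");
    let b = a.clone();
    let (r1, r2) = (&a, &a);
    // == compares the pointed-to values, so a clone also matches
    assert!(r1 == &b);
    // ptr::eq compares the addresses themselves
    assert!(std::ptr::eq(r1, r2));
    assert!(!std::ptr::eq(r1, &b));
}
```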

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
a221eaa81d dragonball: Fix length comparison to zero
Replace .len() == 0 with .is_empty() for more clarity

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
e73a7c3717 dragonball: Replace manual div_ceil
Use the clearer built-in method
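
For example (illustrative values, not from the dragonball code):

```rust
fn main() {
    let bytes: u64 = 4097;
    const PAGE: u64 = 4096;
    // Before: manual rounding-up division (can overflow near u64::MAX)
    let manual = (bytes + PAGE - 1) / PAGE;
    // After: the built-in expresses the intent directly
    assert_eq!(manual, bytes.div_ceil(PAGE));
}
```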

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
048000654c runtime-rs: Prevent doc test issue
cargo test was trying to evaluate the documentation comment and failing,
so make the comment explicitly text to avoid this

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
4384b6ad9f dragonball: Avoid manual implementation of ok
Refactor to use `.ok()` rather than implementing it ourselves
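
An illustrative before/after (not the actual dragonball code):

```rust
// Before: a hand-rolled version of Result::ok
fn manual(s: &str) -> Option<u32> {
    match s.parse::<u32>() {
        Ok(v) => Some(v),
        Err(_) => None,
    }
}

// After: the built-in does the same conversion
fn builtin(s: &str) -> Option<u32> {
    s.parse::<u32>().ok()
}

fn main() {
    assert_eq!(manual("7"), builtin("7"));
    assert_eq!(manual("x"), builtin("x"));
}
```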

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
f4dd69a835 dragonball: Remove unnecessary unwrap
Given that we call `is_some` earlier, we don't then need to unwrap,
so refactor to avoid this
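
For example (illustrative, with a made-up option):

```rust
fn main() {
    let maybe_path: Option<&str> = Some("/dev/kvm");
    // Before: is_some() check followed by unwrap()
    if maybe_path.is_some() {
        println!("{}", maybe_path.unwrap());
    }
    // After: if let binds the value, no unwrap needed
    if let Some(path) = maybe_path {
        println!("{path}");
    }
}
```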

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
20192f819f agent-ctl: Remove unnecessary unwrap
Given that we call `is_some` earlier, we don't then need to unwrap,
so refactor to avoid this

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
9bf5f113f9 genpolicy: Allow dead_code
A few structs in genpolicy are never constructed, so add
`#[allow(dead_code)]` to prevent this clippy warning

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
ca1c0c853f libs: Remove doc overindentation
The doc comment had one space too many in its list, so the format was wrong

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
501b41cf8f dragonball: Remove doc overindentation
The doc comment had one space too many in its list, so the format was wrong

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
6a45ee0874 runtime-rs: Improve map iteration
The key was never used, just the value, so iterate over `.values()` instead

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
2f49dffcd7 runtime-rs: Remove dead code
`VmmPingResponse` and `NetInterworkingModel` are
never constructed, so remove them

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
35557745b1 runtime-rs: Fix char_indices_as_byte_indices
In Unicode you can have multi-byte characters, so it's better to
use char_indices than to enumerate the bytes
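
An illustrative sketch of why the byte offsets matter:

```rust
fn main() {
    let s = "käta"; // 'ä' is two bytes in UTF-8
    // enumerate() yields char counts, not byte offsets, so using
    // them to slice a &str can panic past a multi-byte character.
    // char_indices() yields the correct byte offset for each char:
    for (byte_idx, ch) in s.char_indices() {
        let _slice = &s[byte_idx..byte_idx + ch.len_utf8()];
    }
}
```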

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
69ca6c0de0 runtime-rs: Fix manual_contains
Use contains to be more concise and efficient rather than manually
implementing this check

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
0027f6cae0 agent: Fix dead_code warning
VirtioBlkCcwDeviceHandler and VirtioBlkCcwHandler
are only constructed on s390x, so add #[cfg(target_arch = "s390x")]
to all the code

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:27 +00:00
stevenhorsman
3b2c83f9d2 trace-forwarder: Fix clippy::io_other_error issue
We can use the new `Error::other` rather than
`Error::new(ErrorKind::Other, ...)`
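
An illustrative sketch of the transformation (`Error::other` is stable
since Rust 1.74):

```rust
use std::io;

fn main() {
    let msg = "vsock connection failed";
    // Before: verbose wrapping flagged by clippy::io_other_error
    let before = io::Error::new(io::ErrorKind::Other, msg);
    // After: the dedicated constructor, same ErrorKind::Other
    let after = io::Error::other(msg);
    assert_eq!(before.kind(), after.kind());
}
```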

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
b1cfa98524 runtime-rs: Fix clippy::io_other_error issue
We can use the new `Error::other` rather than
`Error::new(ErrorKind::Other, ...)`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
dc8f628dd1 libs: Fix clippy::io_other_error issue
We can use the new `Error::other` rather than
`Error::new(ErrorKind::Other, ...)`, and drop our own macro that did this mapping

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
5f1d3481af dragonball: Fix clippy::io_other_error issue
We can use the new `Error::other` rather than
`Error::new(ErrorKind::Other, ...)`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
9ec7109712 agent: Fix clippy::io_other_error issue
We can use the new `Error::other` rather than
`Error::new(ErrorKind::Other, ...)`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
34d299ae44 vsock-exporter: Fix clippy::io_other_error issue
We can use the new `Error::other` rather than
`Error::new(ErrorKind::Other, ...)`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
b2f9f23504 dragonball: Fix mismatched_lifetime_syntaxes issue
Fix the `warning: hiding a lifetime that's elided elsewhere is confusing`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
stevenhorsman
8bbbc3a58b lib: Fix mismatched_lifetime_syntaxes issue
Fix the warning thrown up:
```
warning: hiding a lifetime that's elided elsewhere is confusing
  --> /root/go/src/github.com/kata-containers/kata-containers/src/libs/kata-types/src/utils/u32_set.rs:50:17
   |
50 |     pub fn iter(&self) -> Iter<u32> {
   |                 ^^^^^     --------- the same lifetime is hidden here
   |                 |
   |                 the lifetime is elided here
   |
   = help: the same lifetime is referred to in inconsistent ways, making the signature confusing
   = note: `#[warn(mismatched_lifetime_syntaxes)]` on by default
help: use `'_` for type paths
   |
50 |     pub fn iter(&self) -> Iter<'_, u32> {
   |                                +++
   ```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-18 07:45:26 +00:00
787 changed files with 15675 additions and 18560 deletions

7
.editorconfig Normal file
View File

@@ -0,0 +1,7 @@
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

View File

@@ -0,0 +1,30 @@
{
"Verbose": false,
"Debug": false,
"IgnoreDefaults": false,
"SpacesAfterTabs": false,
"NoColor": false,
"Exclude": [
"src/runtime/vendor",
"src/tools/log-parser/vendor",
"tests/metrics/cmd/checkmetrics/vendor",
"tests/vendor",
"src/runtime/virtcontainers/pkg/cloud-hypervisor/client",
"\\.img$",
"\\.dtb$",
"\\.drawio$",
"\\.svg$",
"\\.patch$"
],
"AllowedContentTypes": [],
"PassedFiles": [],
"Disable": {
"EndOfLine": false,
"Indentation": false,
"IndentSize": false,
"InsertFinalNewline": false,
"TrimTrailingWhitespace": false,
"MaxLineLength": false,
"Charset": false
}
}

View File

@@ -17,7 +17,7 @@ runs:
uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
toolchain: nightly
override: true
- name: Cache

View File

@@ -12,7 +12,6 @@ updates:
- "/src/tools/agent-ctl"
- "/src/tools/genpolicy"
- "/src/tools/kata-ctl"
- "/src/tools/runk"
- "/src/tools/trace-forwarder"
schedule:
interval: "daily"

View File

@@ -163,42 +163,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/nydus/gha-run.sh run
run-runk:
name: run-runk
# Skip runk tests as we have no maintainers. TODO: Decide when to remove altogether
if: false
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: lts
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/runk/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts
- name: Run runk tests
timeout-minutes: 10
run: bash tests/integration/runk/gha-run.sh run
run-tracing:
name: run-tracing
strategy:
@@ -383,7 +347,7 @@ jobs:
path: kata-tools-artifacts
- name: Install kata & kata-tools
run: |
run: |
bash tests/functional/kata-agent-apis/gha-run.sh install-kata kata-artifacts
bash tests/functional/kata-agent-apis/gha-run.sh install-kata-tools kata-tools-artifacts

View File

@@ -74,7 +74,7 @@ jobs:
- rust
- protobuf-compiler
instance:
- ${{ inputs.instance }}
- ${{ inputs.instance }}
steps:
- name: Adjust a permission for repo

View File

@@ -23,8 +23,6 @@ on:
secrets:
QUAY_DEPLOYER_PASSWORD:
required: false
KBUILD_SIGN_PIN:
required: true
permissions: {}
@@ -47,13 +45,12 @@ jobs:
- coco-guest-components
- firecracker
- kernel
- kernel-confidential
- kernel-dragonball-experimental
- kernel-nvidia-gpu
- kernel-nvidia-gpu-confidential
- nydus
- ovmf
- ovmf-sev
- ovmf-tdx
- pause-image
- qemu
- qemu-snp-experimental
@@ -103,7 +100,6 @@ jobs:
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}
- name: Parse OCI image name and digest
id: parse-oci-segments
@@ -144,11 +140,11 @@ jobs:
if-no-files-found: error
- name: store-extratarballs-artifact ${{ matrix.asset }}
if: ${{ startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
if: ${{ matrix.asset == 'kernel' || startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-amd64-${{ matrix.asset }}-headers${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-headers.tar.zst
name: kata-artifacts-amd64-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst
retention-days: 15
if-no-files-found: error
@@ -216,7 +212,6 @@ jobs:
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
@@ -236,8 +231,8 @@ jobs:
asset:
- busybox
- coco-guest-components
- kernel-nvidia-gpu-headers
- kernel-nvidia-gpu-confidential-headers
- kernel-modules
- kernel-nvidia-gpu-modules
- pause-image
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

View File

@@ -23,8 +23,6 @@ on:
secrets:
QUAY_DEPLOYER_PASSWORD:
required: false
KBUILD_SIGN_PIN:
required: true
permissions: {}
@@ -90,7 +88,6 @@ jobs:
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}
- name: Parse OCI image name and digest
id: parse-oci-segments
@@ -134,8 +131,8 @@ jobs:
if: ${{ startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-arm64-${{ matrix.asset }}-headers${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-headers.tar.zst
name: kata-artifacts-arm64-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst
retention-days: 15
if-no-files-found: error
@@ -197,7 +194,6 @@ jobs:
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
@@ -216,7 +212,7 @@ jobs:
matrix:
asset:
- busybox
- kernel-nvidia-gpu-headers
- kernel-nvidia-gpu-modules
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0
with:

View File

@@ -21,8 +21,6 @@ on:
type: string
default: ""
secrets:
CI_HKD_PATH:
required: true
QUAY_DEPLOYER_PASSWORD:
required: true
@@ -44,7 +42,6 @@ jobs:
- agent
- coco-guest-components
- kernel
- kernel-confidential
- pause-image
- qemu
- virtiofsd
@@ -121,6 +118,15 @@ jobs:
retention-days: 15
if-no-files-found: error
- name: store-extratarballs-artifact ${{ matrix.asset }}
if: ${{ matrix.asset == 'kernel' }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-s390x-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst
retention-days: 15
if-no-files-found: error
build-asset-rootfs:
name: build-asset-rootfs
runs-on: s390x
@@ -189,60 +195,11 @@ jobs:
retention-days: 15
if-no-files-found: error
build-asset-boot-image-se:
name: build-asset-boot-image-se
runs-on: s390x
needs: [build-asset, build-asset-rootfs]
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
pattern: kata-artifacts-s390x-*${{ inputs.tarball-suffix }}
path: kata-artifacts
merge-multiple: true
- name: Place a host key document
run: |
mkdir -p "host-key-document"
cp "${CI_HKD_PATH}" "host-key-document"
env:
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
- name: Build boot-image-se
run: |
./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "boot-image-se"
make boot-image-se-tarball
build_dir=$(readlink -f build)
sudo cp -r "${build_dir}" "kata-build"
sudo chown -R "$(id -u)":"$(id -g)" "kata-build"
env:
HKD_PATH: "host-key-document"
- name: store-artifact boot-image-se
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-s390x${{ inputs.tarball-suffix }}
path: kata-build/kata-static-boot-image-se.tar.zst
retention-days: 1
if-no-files-found: error
# We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs
remove-rootfs-binary-artifacts:
name: remove-rootfs-binary-artifacts
runs-on: ubuntu-22.04
needs: [build-asset-rootfs, build-asset-boot-image-se]
needs: [build-asset-rootfs]
strategy:
matrix:
asset:
@@ -323,7 +280,6 @@ jobs:
needs:
- build-asset
- build-asset-rootfs
- build-asset-boot-image-se
- build-asset-shim-v2
permissions:
contents: read

View File

@@ -0,0 +1,75 @@
name: Build kubectl multi-arch image
on:
schedule:
# Run every Sunday at 00:00 UTC
- cron: '0 0 * * 0'
workflow_dispatch:
# Allow manual triggering
push:
branches:
- main
paths:
- 'tools/packaging/kubectl/Dockerfile'
- '.github/workflows/build-kubectl-image.yaml'
permissions: {}
env:
REGISTRY: quay.io
IMAGE_NAME: kata-containers/kubectl
jobs:
build-and-push:
name: Build and push multi-arch image
runs-on: ubuntu-24.04
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: Set up QEMU
uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Login to Quay.io
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ${{ env.REGISTRY }}
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Get kubectl version
id: kubectl-version
run: |
KUBECTL_VERSION=$(curl -L -s https://dl.k8s.io/release/stable.txt)
echo "version=${KUBECTL_VERSION}" >> "$GITHUB_OUTPUT"
- name: Generate image metadata
id: meta
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 # v5.7.0
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=raw,value=latest
type=raw,value={{date 'YYYYMMDD'}}
type=raw,value=${{ steps.kubectl-version.outputs.version }}
type=sha,prefix=
- name: Build and push multi-arch image
uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
with:
context: tools/packaging/kubectl/
file: tools/packaging/kubectl/Dockerfile
platforms: linux/amd64,linux/arm64,linux/s390x,linux/ppc64le
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max

View File

@@ -25,9 +25,8 @@ jobs:
tag: ${{ github.sha }}-weekly
target-branch: ${{ github.ref_name }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

View File

@@ -19,15 +19,13 @@ jobs:
target-branch: ${{ github.ref_name }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
ITA_KEY: ${{ secrets.ITA_KEY }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
build-checks:
uses: ./.github/workflows/build-checks.yaml

View File

@@ -1,36 +0,0 @@
name: Kata Containers Nightly CI (Rust)
on:
schedule:
- cron: '0 1 * * *' # Run at 1 AM UTC (1 hour after script-based nightly)
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
kata-containers-ci-on-push-rust:
permissions:
contents: read
packages: write
id-token: write
attestations: write
uses: ./.github/workflows/ci.yaml
with:
commit-hash: ${{ github.sha }}
pr-number: "nightly-rust"
tag: ${{ github.sha }}-nightly-rust
target-branch: ${{ github.ref_name }}
build-type: "rust" # Use Rust-based build
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
ITA_KEY: ${{ secrets.ITA_KEY }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

View File

@@ -23,12 +23,10 @@ jobs:
tag: ${{ github.sha }}-nightly
target-branch: ${{ github.ref_name }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
ITA_KEY: ${{ secrets.ITA_KEY }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

View File

@@ -43,12 +43,10 @@ jobs:
target-branch: ${{ github.event.pull_request.base.ref }}
skip-test: ${{ needs.skipper.outputs.skip_test }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
ITA_KEY: ${{ secrets.ITA_KEY }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

View File

@@ -27,8 +27,6 @@ on:
required: true
QUAY_DEPLOYER_PASSWORD:
required: true
KBUILD_SIGN_PIN:
required: true
permissions: {}
@@ -44,8 +42,6 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
secrets:
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
publish-kata-deploy-payload-amd64:
needs: build-kata-static-tarball-amd64
@@ -119,7 +115,7 @@ jobs:
target-branch: ${{ inputs.target-branch }}
tarball-suffix: -${{ inputs.tag }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

View File

@@ -19,11 +19,6 @@ on:
required: false
type: string
default: no
build-type:
description: The build type for kata-deploy. Use 'rust' for Rust-based build, empty or omit for script-based (default).
required: false
type: string
default: ""
secrets:
AUTHENTICATED_IMAGE_PASSWORD:
required: true
@@ -34,16 +29,12 @@ on:
required: true
AZ_SUBSCRIPTION_ID:
required: true
CI_HKD_PATH:
required: true
ITA_KEY:
required: true
QUAY_DEPLOYER_PASSWORD:
required: true
NGC_API_KEY:
required: true
KBUILD_SIGN_PIN:
required: true
permissions: {}
@@ -59,8 +50,6 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
secrets:
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
publish-kata-deploy-payload-amd64:
needs: build-kata-static-tarball-amd64
@@ -77,7 +66,6 @@ jobs:
target-branch: ${{ inputs.target-branch }}
runner: ubuntu-22.04
arch: amd64
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -92,8 +80,6 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
secrets:
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
publish-kata-deploy-payload-arm64:
needs: build-kata-static-tarball-arm64
@@ -110,7 +96,6 @@ jobs:
target-branch: ${{ inputs.target-branch }}
runner: ubuntu-24.04-arm
arch: arm64
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -126,7 +111,6 @@ jobs:
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
secrets:
CI_HKD_PATH: ${{ secrets.ci_hkd_path }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
build-kata-static-tarball-ppc64le:
@@ -156,7 +140,6 @@ jobs:
target-branch: ${{ inputs.target-branch }}
runner: ubuntu-24.04-s390x
arch: s390x
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -175,7 +158,6 @@ jobs:
target-branch: ${{ inputs.target-branch }}
runner: ubuntu-24.04-ppc64le
arch: ppc64le
build-type: ${{ inputs.build-type }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -297,7 +279,7 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -313,7 +295,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-arm64${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-arm64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -326,7 +308,7 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -348,12 +330,12 @@ jobs:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
@@ -366,12 +348,12 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-s390x${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-s390x
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
run-k8s-tests-on-ppc64le:
if: ${{ inputs.skip-test != 'yes' }}
@@ -380,7 +362,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-ppc64le${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-ppc64le
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
@@ -392,7 +374,7 @@ jobs:
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64${{ inputs.build-type == 'rust' && '-rust' || '' }}
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}

32
.github/workflows/docs.yaml vendored Normal file
View File

@@ -0,0 +1,32 @@
name: Documentation
on:
push:
branches:
- main
permissions: {}
jobs:
deploy-docs:
name: deploy-docs
permissions:
contents: read
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
steps:
- uses: actions/configure-pages@v5
- uses: actions/checkout@v5
with:
persist-credentials: false
- uses: actions/setup-python@v5
with:
python-version: 3.x
- run: pip install zensical
- run: zensical build --clean
- uses: actions/upload-pages-artifact@v4
with:
path: site
- uses: actions/deploy-pages@v4
id: deployment

View File

@@ -0,0 +1,29 @@
name: EditorConfig checker
on:
pull_request:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
editorconfig-checker:
name: editorconfig-checker
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: false
- name: Set up editorconfig-checker
uses: editorconfig-checker/action-editorconfig-checker@4b6cd6190d435e7e084fb35e36a096e98506f7b9 # v2.1.0
with:
version: v3.6.1
- name: Run editorconfig-checker
run: editorconfig-checker

View File

@@ -24,7 +24,6 @@ jobs:
target-branch: ${{ github.ref_name }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
build-assets-arm64:
permissions:
@@ -39,7 +38,6 @@ jobs:
target-branch: ${{ github.ref_name }}
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
build-assets-s390x:
permissions:
@@ -53,7 +51,6 @@ jobs:
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets:
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
build-assets-ppc64le:
@@ -82,7 +79,6 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: ubuntu-22.04
arch: amd64
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -100,7 +96,6 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: ubuntu-24.04-arm
arch: arm64
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -118,7 +113,6 @@ jobs:
target-branch: ${{ github.ref_name }}
runner: s390x
arch: s390x
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -134,9 +128,8 @@ jobs:
repo: kata-containers/kata-deploy-ci
tag: kata-containers-latest-ppc64le
target-branch: ${{ github.ref_name }}
runner: ppc64le-small
runner: ubuntu-24.04-ppc64le
arch: ppc64le
build-type: "" # Use script-based build (default)
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

View File

@@ -30,11 +30,6 @@ on:
description: The arch of the tarball.
required: true
type: string
build-type:
description: The build type for kata-deploy. Use 'rust' for Rust-based build, empty or omit for script-based (default).
required: false
type: string
default: ""
secrets:
QUAY_DEPLOYER_PASSWORD:
required: true
@@ -63,7 +58,6 @@ jobs:
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell
@@ -107,10 +101,8 @@ jobs:
REGISTRY: ${{ inputs.registry }}
REPO: ${{ inputs.repo }}
TAG: ${{ inputs.tag }}
BUILD_TYPE: ${{ inputs.build-type }}
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
"$(pwd)/kata-static.tar.zst" \
"${REGISTRY}/${REPO}" \
"${TAG}" \
"${BUILD_TYPE}"
"${TAG}"

View File

@@ -0,0 +1,43 @@
# Push gperf and busybox tarballs to the ORAS cache (ghcr.io) so that
# download-with-oras-cache.sh can pull them instead of hitting upstream.
# Runs when versions.yaml changes on main (e.g. after a PR merge) or manually.
name: CI | Push ORAS tarball cache
on:
push:
branches:
- main
paths:
- 'versions.yaml'
workflow_dispatch:
permissions: {}
jobs:
push-oras-cache:
name: push-oras-cache
runs-on: ubuntu-22.04
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: false
- name: Install yq
run: ./ci/install_yq.sh
- name: Install ORAS
uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4
with:
version: "1.2.0"
- name: Populate ORAS tarball cache
run: ./tools/packaging/scripts/populate-oras-tarball-cache.sh all
env:
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REPOSITORY: kata-containers
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -8,8 +8,6 @@ on:
secrets:
QUAY_DEPLOYER_PASSWORD:
required: true
KBUILD_SIGN_PIN:
required: true
permissions: {}
@@ -21,7 +19,6 @@ jobs:
stage: release
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
permissions:
contents: read
packages: write

View File

@@ -8,8 +8,6 @@ on:
secrets:
QUAY_DEPLOYER_PASSWORD:
required: true
KBUILD_SIGN_PIN:
required: true
permissions: {}
@@ -21,7 +19,6 @@ jobs:
stage: release
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
permissions:
contents: read
packages: write

View File

@@ -6,8 +6,6 @@ on:
required: true
type: string
secrets:
CI_HKD_PATH:
required: true
QUAY_DEPLOYER_PASSWORD:
required: true
@@ -20,7 +18,6 @@ jobs:
push-to-registry: yes
stage: release
secrets:
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
permissions:
contents: read

View File

@@ -35,7 +35,6 @@ jobs:
target-arch: amd64
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
build-and-push-assets-arm64:
needs: release
@@ -49,7 +48,6 @@ jobs:
target-arch: arm64
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}
build-and-push-assets-s390x:
needs: release
@@ -62,7 +60,6 @@ jobs:
with:
target-arch: s390x
secrets:
CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
build-and-push-assets-ppc64le:

View File

@@ -32,6 +32,7 @@ jobs:
matrix:
vmm:
- qemu
- qemu-runtime-rs
k8s:
- kubeadm
runs-on: arm64-k8s

View File

@@ -126,5 +126,6 @@ jobs:
- name: Delete CoCo KBS
if: always() && matrix.environment.name != 'nvidia-gpu'
timeout-minutes: 10
run: |
bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs

View File

@@ -76,7 +76,7 @@ jobs:
SNAPSHOTTER: ${{ matrix.snapshotter }}
TARGET_ARCH: "s390x"
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
@@ -137,10 +137,12 @@ jobs:
- name: Delete kata-deploy
if: always()
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh cleanup-zvsi
- name: Delete CoCo KBS
if: always()
timeout-minutes: 10
run: |
if [ "${KBS}" == "true" ]; then
bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs

View File

@@ -69,7 +69,7 @@ jobs:
KUBERNETES: "vanilla"
PULL_TYPE: ${{ matrix.pull-type }}
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

View File

@@ -63,7 +63,7 @@ jobs:
SNAPSHOTTER: "nydus"
PULL_TYPE: "guest-pull"
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
GH_ITA_KEY: ${{ secrets.ITA_KEY }}
AUTO_GENERATE_POLICY: "yes"
steps:
@@ -120,10 +120,12 @@ jobs:
- name: Delete kata-deploy
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup
- name: Delete CoCo KBS
if: always()
timeout-minutes: 10
run: |
[[ "${KATA_HYPERVISOR}" == "qemu-tdx" ]] && echo "ITA_KEY=${GH_ITA_KEY}" >> "${GITHUB_ENV}"
bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
@@ -166,7 +168,7 @@ jobs:
KUBERNETES: "vanilla"
PULL_TYPE: ${{ matrix.pull-type }}
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ vars.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.pull-type == 'experimental-force-guest-pull' && matrix.vmm || '' }}
# Caution: current ingress controller used to expose the KBS service
@@ -327,7 +329,6 @@ jobs:
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell

View File

@@ -66,7 +66,6 @@ jobs:
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell
@@ -88,4 +87,4 @@ jobs:
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
run: bash tests/functional/kata-deploy/gha-run.sh report-tests

View File

@@ -1,54 +0,0 @@
name: CI | Run runk tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
permissions: {}
jobs:
run-runk:
name: run-runk
# Skip runk tests as we have no maintainers. TODO: Decide when to remove altogether
if: false
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: lts
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/runk/gha-run.sh install-dependencies
env:
GH_TOKEN: ${{ github.token }}
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts
- name: Run runk tests
run: bash tests/integration/runk/gha-run.sh run

View File

@@ -6,14 +6,21 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
stale:
name: stale
runs-on: ubuntu-22.04
permissions:
actions: write # Needed to manage caches for state persistence across runs
pull-requests: write # Needed to add/remove labels, post comments, or close PRs
steps:
- uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
with:
stale-pr-message: 'This PR has been opened without with no activity for 180 days. Comment on the issue otherwise it will be closed in 7 days'
stale-pr-message: 'This PR has been opened without activity for 180 days. Please comment on the issue or it will be closed in 7 days.'
days-before-pr-stale: 180
days-before-pr-close: 7
days-before-issue-stale: -1

View File

@@ -21,7 +21,7 @@ jobs:
persist-credentials: false
- name: Run zizmor
uses: zizmorcore/zizmor-action@e673c3917a1aef3c65c972347ed84ccd013ecda4 # v0.2.0
uses: zizmorcore/zizmor-action@135698455da5c3b3e55f73f4419e481ab68cdd95 # v0.4.1
with:
advanced-security: false
annotations: true

1
.gitignore vendored
View File

@@ -19,3 +19,4 @@ tools/packaging/static-build/agent/install_libseccomp.sh
.envrc
.direnv
**/.DS_Store
site/

68
Cargo.lock generated
View File

@@ -525,9 +525,9 @@ checksum = "14c189c53d098945499cdfa7ecc63567cf3886b3332b312a5b4585d8d3a6a610"
[[package]]
name = "bytes"
version = "1.5.0"
version = "1.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a2bd12c1caf447e69cd4528f47f94d203fd2582878ecb9e9465484c4148a8223"
checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33"
[[package]]
name = "caps"
@@ -770,12 +770,6 @@ dependencies = [
"libc",
]
[[package]]
name = "cpuid-bool"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8aebca1129a03dc6dc2b127edd729435bbc4a37e1d5f4d7513165089ceb02634"
[[package]]
name = "crc32fast"
version = "1.3.2"
@@ -1144,12 +1138,12 @@ dependencies = [
[[package]]
name = "deranged"
version = "0.3.11"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b42b6fa04a440b495c8b04d0e71b707c585f83cb9cb28cf8cd0d976c315e31b4"
checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
dependencies = [
"powerfmt",
"serde",
"serde_core",
]
[[package]]
@@ -2378,7 +2372,7 @@ dependencies = [
"nix 0.23.2",
"once_cell",
"serde",
"sha2 0.9.3",
"sha2 0.9.9",
"thiserror 1.0.48",
"uuid 0.8.2",
]
@@ -2790,9 +2784,9 @@ dependencies = [
[[package]]
name = "num-conv"
version = "0.1.0"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51d515d32fb182ee37cda2ccdcb92950d6a3c2893aa280e540671c2cd0f3b1d9"
checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
[[package]]
name = "num-traits"
@@ -2999,9 +2993,9 @@ checksum = "ff011a302c396a5197692431fc1948019154afc178baf7d8e37367442a4601cf"
[[package]]
name = "openssl-src"
version = "300.5.0+3.5.0"
version = "300.5.4+3.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8ce546f549326b0e6052b649198487d91320875da901e7bd11a06d1ee3f9c2f"
checksum = "a507b3792995dae9b0df8a1c1e3771e8418b7c2d9f0baeba32e6fe8b06c7cb72"
dependencies = [
"cc",
]
@@ -3627,9 +3621,9 @@ dependencies = [
[[package]]
name = "qapi"
version = "0.14.0"
version = "0.15.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c6412bdd014ebee03ddbbe79ac03a0b622cce4d80ba45254f6357c847f06fa38"
checksum = "7b047adab56acc4948d4b9b58693c1f33fd13efef2d6bb5f0f66a47436ceada8"
dependencies = [
"bytes",
"futures 0.3.28",
@@ -3664,9 +3658,9 @@ dependencies = [
[[package]]
name = "qapi-qmp"
version = "0.14.0"
version = "0.15.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8b944db7e544d2fa97595e9a000a6ba5c62c426fa185e7e00aabe4b5640b538"
checksum = "45303cac879d89361cad0287ae15f9ae1e7799b904b474152414aeece39b9875"
dependencies = [
"qapi-codegen",
"qapi-spec",
@@ -3951,9 +3945,9 @@ dependencies = [
[[package]]
name = "rkyv"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9008cd6385b9e161d8229e1f6549dd23c3d022f132a2ea37ac3a10ac4935779b"
checksum = "2297bf9c81a3f0dc96bc9521370b88f054168c29826a75e89c55ff196e7ed6a1"
dependencies = [
"bitvec",
"bytecheck",
@@ -3969,9 +3963,9 @@ dependencies = [
[[package]]
name = "rkyv_derive"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "503d1d27590a2b0a3a4ca4c94755aa2875657196ecbf401a42eff41d7de532c0"
checksum = "84d7b42d4b8d06048d3ac8db0eb31bcb942cbeb709f0b5f2b2ebde398d3038f5"
dependencies = [
"proc-macro2",
"quote",
@@ -4011,6 +4005,7 @@ version = "0.1.0"
dependencies = [
"anyhow",
"common",
"containerd-shim-protos",
"go-flag",
"logging",
"nix 0.26.4",
@@ -4052,6 +4047,8 @@ dependencies = [
"persist",
"procfs 0.12.0",
"prometheus",
"protobuf",
"protocols",
"resource",
"runtime-spec",
"serde_json",
@@ -4430,13 +4427,13 @@ dependencies = [
[[package]]
name = "sha2"
version = "0.9.3"
version = "0.9.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fa827a14b29ab7f44778d14a88d3cb76e949c45083f7dbfa507d0cb699dc12de"
checksum = "4d58a1e1bf39749807d89cf2d98ac2dfa0ff1cb3faa38fbb64dd88ac8013d800"
dependencies = [
"block-buffer 0.9.0",
"cfg-if 1.0.0",
"cpuid-bool",
"cpufeatures",
"digest 0.9.0",
"opaque-debug",
]
@@ -4482,7 +4479,7 @@ dependencies = [
"runtimes",
"serial_test 0.10.0",
"service",
"sha2 0.9.3",
"sha2 0.10.9",
"slog",
"slog-async",
"slog-scope",
@@ -4866,6 +4863,7 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
name = "test-utils"
version = "0.1.0"
dependencies = [
"libc",
"nix 0.26.4",
]
@@ -4952,30 +4950,30 @@ dependencies = [
[[package]]
name = "time"
version = "0.3.37"
version = "0.3.47"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "35e7868883861bd0e56d9ac6efcaaca0d6d5d82a2a7ec8209ff492c07cf37b21"
checksum = "743bd48c283afc0388f9b8827b976905fb217ad9e647fae3a379a9283c4def2c"
dependencies = [
"deranged",
"itoa",
"num-conv",
"powerfmt",
"serde",
"serde_core",
"time-core",
"time-macros",
]
[[package]]
name = "time-core"
version = "0.1.2"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ef927ca75afb808a4d64dd374f00a2adf8d0fcff8e7b184af886c3c87ec4a3f3"
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
[[package]]
name = "time-macros"
version = "0.2.19"
version = "0.2.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2834e6017e3e5e4b9834939793b282bc03b37a3336245fa820e35e233e2a85de"
checksum = "2e70e4c5a0e0a8a4823ad65dfe1a6930e4f4d756dcd9dd7939022b5e8c501215"
dependencies = [
"num-conv",
"time-core",


@@ -2,7 +2,7 @@
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
rust-version = "1.85.1"
rust-version = "1.88"
[workspace]
members = [
@@ -127,6 +127,7 @@ protobuf = "3.7.2"
rand = "0.8.4"
serde = { version = "1.0.145", features = ["derive"] }
serde_json = "1.0.91"
sha2 = "0.10.9"
slog = "2.5.2"
slog-scope = "4.4.0"
strum = { version = "0.24.0", features = ["derive"] }


@@ -18,7 +18,6 @@ TOOLS =
TOOLS += agent-ctl
TOOLS += kata-ctl
TOOLS += log-parser
-TOOLS += runk
TOOLS += trace-forwarder
STANDARD_TARGETS = build check clean install static-checks-build test vendor
@@ -48,7 +47,10 @@ docs-url-alive-check:
bash ci/docs-url-alive-check.sh
build-and-publish-kata-debug:
bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG}
+docs-serve:
+docker run --rm -p 8000:8000 -v ./docs:/docs:ro -v ${PWD}/zensical.toml:/zensical.toml:ro zensical/zensical serve --config-file /zensical.toml -a 0.0.0.0:8000
.PHONY: \
all \
@@ -56,4 +58,5 @@ build-and-publish-kata-debug:
install-tarball \
default \
static-checks \
-docs-url-alive-check
+docs-url-alive-check \
+docs-serve


@@ -139,7 +139,6 @@ The table below lists the remaining parts of the project:
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`kata-ctl`](src/tools/kata-ctl) | utility | Tool that provides advanced commands and debug facilities. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
-| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](.github/workflows) | CI | Continuous Integration configuration files and scripts. |
| [`ocp-ci`](ci/openshift-ci/README.md) | CI | Continuous Integration configuration for the OpenShift pipelines. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | website | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |


@@ -1 +1 @@
-3.24.0
+3.26.0


@@ -73,12 +73,12 @@ function install_yq() {
goarch=arm64
;;
"arm64")
# If we're on an apple silicon machine, just assign amd64.
# The version of yq we use doesn't have a darwin arm build,
# but Rosetta can come to the rescue here.
if [[ ${goos} == "Darwin" ]]; then
goarch=amd64
else
goarch=arm64
fi
;;


@@ -46,16 +46,12 @@ fi
[[ ${SELINUX_PERMISSIVE} == "yes" ]] && oc delete -f "${deployments_dir}/machineconfig_selinux.yaml.in"
# Delete kata-containers
pushd "${katacontainers_repo_dir}/tools/packaging/kata-deploy" || { echo "Failed to push to ${katacontainers_repo_dir}/tools/packaging/kata-deploy"; exit 125; }
-oc delete -f kata-deploy/base/kata-deploy.yaml
+helm uninstall kata-deploy --wait --namespace kube-system
+oc -n kube-system wait --timeout=10m --for=delete -l name=kata-deploy pod
-oc apply -f kata-cleanup/base/kata-cleanup.yaml
-echo "Wait for all related pods to be gone"
-( repeats=1; for _ in $(seq 1 600); do
-oc get pods -l name="kubelet-kata-cleanup" --no-headers=true -n kube-system 2>&1 | grep "No resources found" -q && ((repeats++)) || repeats=1
-[[ "${repeats}" -gt 5 ]] && echo kata-cleanup finished && break
-sleep 1
-done) || { echo "There are still some kata-cleanup related pods after 600 iterations"; oc get all -n kube-system; exit 1; }
-oc delete -f kata-cleanup/base/kata-cleanup.yaml
oc delete -f kata-rbac/base/kata-rbac.yaml
oc delete -f runtimeclasses/kata-runtimeClasses.yaml


@@ -51,13 +51,13 @@ apply_kata_deploy() {
oc label --overwrite ns kube-system pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
local version chart
-version=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
+version='0.0.0-dev'
chart="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"
# Ensure any potential leftover is cleaned up ... and this secret usually is not in case of previous failures
oc delete secret sh.helm.release.v1.kata-deploy.v1 -n kube-system || true
echo "Installing kata using helm ${chart} ${version}"
echo "Installing kata using helm ${chart} ${version} (sha printed in helm output)"
helm install kata-deploy --wait --namespace kube-system --set "image.reference=${KATA_DEPLOY_IMAGE%%:*},image.tag=${KATA_DEPLOY_IMAGE##*:}" "${chart}" --version "${version}"
}


@@ -157,6 +157,16 @@ if [[ -z "${CAA_IMAGE}" ]]; then
fi
# Get latest PP image
+#
+# You can list the CI images by:
+# az sig image-version list-community --location "eastus" --public-gallery-name "cocopodvm-d0e4f35f-5530-4b9c-8596-112487cdea85" --gallery-image-definition "podvm_image0" --output table
+# or the release images by:
+# az sig image-version list-community --location "eastus" --public-gallery-name "cococommunity-42d8482d-92cd-415b-b332-7648bd978eff" --gallery-image-definition "peerpod-podvm-fedora" --output table
+# or the release debug images by:
+# az sig image-version list-community --location "eastus" --public-gallery-name "cococommunity-42d8482d-92cd-415b-b332-7648bd978eff" --gallery-image-definition "peerpod-podvm-fedora-debug" --output table
+#
+# Note there are other flavours of the released images, you can list them by:
+# az sig image-definition list-community --location "eastus" --public-gallery-name "cococommunity-42d8482d-92cd-415b-b332-7648bd978eff" --output table
if [[ -z "${PP_IMAGE_ID}" ]]; then
SUCCESS_TIME=$(curl -s \
-H "Accept: application/vnd.github+json" \


@@ -83,4 +83,4 @@ files to the repository and create a pull request when you are ready.
If you have an idea for a blog post and would like to get feedback from the
community about it or have any questions about the process, please reach out
on one of the community's [communication channels](https://katacontainers.io/community/).


@@ -125,7 +125,7 @@ If you want to enable SELinux in Permissive mode, add `enforcing=0` to the kerne
Enable full debug as follows:
```bash
-$ sudo sed -i -e 's/^# *\(enable_debug\).*=.*$/\1 = true/g' /etc/kata-containers/configuration.toml
+$ sudo sed -i -E 's/^(\s*enable_debug\s*=\s*)false/\1true/' /etc/kata-containers/configuration.toml
$ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.log=debug initcall_debug"/g' /etc/kata-containers/configuration.toml
```
@@ -289,14 +289,14 @@ provided by your distribution.
As a prerequisite, you need to install Docker. Otherwise, you will not be
able to run the `rootfs.sh` script with `USE_DOCKER=true` as expected in
-the following example.
+the following example. Specifying the `OS_VERSION` is required when using `distro="ubuntu"`.
```bash
$ export distro="ubuntu" # example
$ export ROOTFS_DIR="$(realpath kata-containers/tools/osbuilder/rootfs-builder/rootfs)"
$ sudo rm -rf "${ROOTFS_DIR}"
$ pushd kata-containers/tools/osbuilder/rootfs-builder
-$ script -fec 'sudo -E USE_DOCKER=true ./rootfs.sh "${distro}"'
+$ script -fec 'sudo -E USE_DOCKER=true OS_VERSION=noble ./rootfs.sh "${distro}"'
$ popd
```


@@ -206,7 +206,7 @@ For security reasons, the following mounts are disallowed:
| `proc \|\| sysfs` | `*` | not a directory (e.g. symlink) | CVE-2019-19921 |
For bind mounts under /proc, these destinations are allowed:
* `/proc/cpuinfo`
* `/proc/diskstats`
* `/proc/meminfo`


@@ -198,7 +198,7 @@ fn join_params_with_dash(str: &str, num: i32) -> Result<String> {
return Err("number must be positive");
}
let result = format!("{}-{}", str, num);
let result = format!("{str}-{num}");
Ok(result)
}
@@ -253,13 +253,13 @@ mod tests {
// Run the tests
for (i, d) in tests.iter().enumerate() {
// Create a string containing details of the test
let msg = format!("test[{}]: {:?}", i, d);
let msg = format!("test[{i}]: {d:?}");
// Call the function under test
let result = join_params_with_dash(d.str, d.num);
// Update the test details string with the results of the call
let msg = format!("{}, result: {:?}", msg, result);
let msg = format!("{msg}, result: {result:?}");
// Perform the checks
if d.result.is_ok() {
@@ -267,8 +267,8 @@ mod tests {
continue;
}
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
let expected_error = format!("{d.result.as_ref().unwrap_err()}");
let actual_error = format!("{result.unwrap_err()}");
assert!(actual_error == expected_error, msg);
}
}
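The `expected_error`/`actual_error` lines above keep their positional arguments on purpose: Rust's inlined format arguments only capture bare identifiers, so field accesses and method calls still need a positional `{}` slot or a local binding first. A minimal sketch of that boundary (illustrative only):

```rust
fn main() {
    let name = "kata";
    let count = 3;

    // Bare identifiers can be captured inline.
    println!("{name}-{count}");

    // Expressions cannot; use positional arguments...
    let pair = ("kata", 3);
    println!("{}-{}", pair.0, pair.1);

    // ...or bind to a local identifier first.
    let first = pair.0;
    println!("{first}");
}
```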

docs/assets/favicon.svg (new file)

@@ -0,0 +1,9 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 32 32">
<!-- Dark background matching the site -->
<rect width="32" height="32" rx="4" fill="#1a1a2e"/>
<!-- Kata logo scaled and centered -->
<g transform="translate(-27, -2) scale(0.75)">
<path d="M70.925 25.22L58.572 37.523 46.27 25.22l2.192-2.192 10.11 10.11 10.11-10.11zm-6.575-.2l-3.188-3.188 3.188-3.188 3.188 3.188zm-4.93-2.54l3.736 3.736-3.736 3.736zm-1.694 7.422l-8.07-8.07 8.07-8.07zm1.694-16.14l3.686 3.686-3.686 3.686zm-13.15 4.682L58.572 6.143l12.353 12.303-2.192 2.192-10.16-10.11-10.11 10.11zm26.997 0L58.572 3.752 43.878 18.446l3.387 3.387-3.387 3.387 14.694 14.694L73.266 25.22l-3.337-3.387z" fill="#f15b3e"/>
</g>
</svg>



@@ -4,7 +4,7 @@ As we know, we can interact with cgroups in two ways, **`cgroupfs`** and **`syst
## usage
For systemd, kata agent configures cgroups according to the following `linux.cgroupsPath` format standard provided by `runc` (`[slice]:[prefix]:[name]`). If you don't provide a valid `linux.cgroupsPath`, kata agent will treat it as `"system.slice:kata_agent:<container-id>"`.
> Here slice is a systemd slice under which the container is placed. If empty, it defaults to system.slice, except when cgroup v2 is used and rootless container is created, in which case it defaults to user.slice.
>
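A sketch of how the `[slice]:[prefix]:[name]` convention and the documented default could be resolved (illustrative only; `resolve_cgroups_path` is an invented helper, not the agent's actual code):

```rust
/// Resolve a systemd cgroupsPath of the form "[slice]:[prefix]:[name]",
/// falling back to the documented "system.slice:kata_agent:<container-id>".
fn resolve_cgroups_path(path: &str, container_id: &str) -> (String, String, String) {
    let parts: Vec<&str> = path.split(':').collect();
    match parts.as_slice() {
        [slice, prefix, name] if !name.is_empty() => {
            // An empty slice defaults to system.slice, as described above.
            let slice = if slice.is_empty() { "system.slice" } else { *slice };
            (slice.to_string(), prefix.to_string(), name.to_string())
        }
        // Anything that is not a valid triple gets the documented default.
        _ => (
            "system.slice".to_string(),
            "kata_agent".to_string(),
            container_id.to_string(),
        ),
    }
}

fn main() {
    assert_eq!(
        resolve_cgroups_path("", "abc123"),
        ("system.slice".into(), "kata_agent".into(), "abc123".into())
    );
    assert_eq!(
        resolve_cgroups_path(":cri:abc123", "abc123"),
        ("system.slice".into(), "cri".into(), "abc123".into())
    );
}
```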
@@ -65,7 +65,7 @@ The kata agent will translate the parameters in the `linux.resources` of `config
## Systemd Interface
`session.rs` and `system.rs` in `src/agent/rustjail/src/cgroups/systemd/interface` are automatically generated by `zbus-xmlgen`, which is an accompanying tool provided by `zbus` to generate Rust code from `D-Bus XML interface descriptions`. The specific commands to generate these two files are as follows:
```shell
// system.rs


@@ -10,7 +10,7 @@ participant proxy
#Docker Exec
Docker->kata-runtime: exec
kata-runtime->virtcontainers: EnterContainer()
virtcontainers->agent: exec
agent->virtcontainers: Process started in the container
virtcontainers->shim: start shim
shim->agent: ReadStdout()


@@ -322,7 +322,7 @@ The runtime is responsible for starting the [hypervisor](#hypervisor)
and its VM, and communicating with the [agent](#agent) using a
[ttRPC based protocol](#agent-communications-protocol) over a VSOCK
socket that provides a communications link between the VM and the
host.
This protocol allows the runtime to send container management commands
to the agent. The protocol is also used to carry the standard I/O


@@ -4,15 +4,15 @@ Containers typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network.
In order to set up the network for a container, container engines call into a
networking plugin. The network plugin will usually create a virtual
ethernet (`veth`) pair adding one end of the `veth` pair into the container
networking namespace, while the other end of the `veth` pair is added to the
host networking namespace.
This is a very namespace-centric approach as many hypervisors or VM
Managers (VMMs) such as `virt-manager` cannot handle `veth`
interfaces. Typically, [`TAP`](https://www.kernel.org/doc/Documentation/networking/tuntap.txt)
interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
@@ -22,15 +22,15 @@ interfaces with `TAP` ones using [Traffic Control](https://man7.org/linux/man-pa
![Kata Containers networking](../arch-images/network.png)
With TC filter rules in place, a redirection is created between the container network
and the virtual machine. As an example, the network plugin may place a device,
`eth0`, in the container's network namespace, which is one end of a VETH device.
Kata Containers will create a tap device for the VM, `tap0_kata`,
and setup a TC redirection filter to redirect traffic from `eth0`'s ingress to `tap0_kata`'s egress,
and a second TC filter to redirect traffic from `tap0_kata`'s ingress to `eth0`'s egress.
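The two redirections map onto standard `mirred` TC filters; a hypothetical sketch driving them through the `tc` CLI (device names taken from the example above; this is not Kata's actual implementation):

```rust
use std::process::Command;

/// Redirect everything arriving on `src` to the egress side of `dst`.
fn redirect(src: &str, dst: &str) -> std::io::Result<()> {
    // Attach an ingress qdisc so filters can be placed on incoming traffic.
    Command::new("tc")
        .args(["qdisc", "add", "dev", src, "handle", "ffff:", "ingress"])
        .status()?;
    // Match all packets and mirror them out of `dst`.
    Command::new("tc")
        .args([
            "filter", "add", "dev", src, "parent", "ffff:", "protocol", "all",
            "u32", "match", "u32", "0", "0", "action", "mirred", "egress",
            "redirect", "dev", dst,
        ])
        .status()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    redirect("eth0", "tap0_kata")?; // eth0 ingress -> tap0_kata egress
    redirect("tap0_kata", "eth0")?; // tap0_kata ingress -> eth0 egress
    Ok(())
}
```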
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata.
With this method, Kata created a MACVTAP device to connect directly to the `eth0` device.
TC-filter is the default because it allows for simpler configuration, better CNI plugin
compatibility, and performance on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.


@@ -51,6 +51,7 @@ containers started after the VM has been launched.
Users can check to see if the container uses the `devicemapper` block
device as its rootfs by calling `mount(8)` within the container. If
the `devicemapper` block device is used, the root filesystem (`/`)
-will be mounted from `/dev/vda`. Users can disable direct mounting of
-the underlying block device through the runtime
-[configuration](README.md#configuration).
+will be mounted from `/dev/vda`. Users can enable direct mounting of
+the underlying block device by setting the runtime
+[configuration](README.md#configuration) flag `disable_block_device_use` to
+`false`.


@@ -111,20 +111,20 @@ In our case, there will be a variety of resources, and every resource has severa
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?
Yes. They are components in Kata 3.x runtime. And they will be packed into one binary.
1. Service is an interface, which is responsible for handling multiple services like the task service, image service, etc.
2. Message dispatcher: it is used to match multiple requests from the service module.
3. Runtime handler: it is used to deal with the operations for sandboxes and containers.
- What is the name of the Kata 3.x runtime binary?
Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcomed.
Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
Yes. It is designed to follow the functionality of go version kata. And it implements the `containerd shim v2` interface/protocol.
- How will users migrate to the Kata 3.x architecture?
The migration plan will be provided before Kata 3.x is merged into the main branch.
- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?
@@ -134,35 +134,35 @@ In our case, there will be a variety of resources, and every resource has severa
`runD` is the `containerd-shim-v2` counterpart of `runC` and can run a pod/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of `microvm`/VMM, we sometimes refer to it as secure sandbox.
- QEMU, Cloud Hypervisor and Firecracker support are planned, but how would that work? Are they working in a separate process?
Yes. They are unable to work as a built-in VMM, so they will run as separate processes.
- What is `upcall`?
The `upcall` is used to hotplug CPU/memory/MMIO devices, and it solves two issues.
1. avoid dependency on PCI/ACPI
2. avoid dependency on `udevd` within guest and get deterministic results for hotplug operations. So `upcall` is an alternative to ACPI based CPU/memory/device hotplug. And we may cooperate with the community to add support for ACPI based CPU/memory/device hotplug if needed.
`Dbs-upcall` is a `vsock-based` direct communication tool between the VMM and guests. The server side of the `upcall` is a driver in the guest kernel (kernel patches are needed for this feature) and it'll start to serve requests once the kernel has started. The client side is in the VMM; it'll be a thread that communicates with VSOCK through `uds`. We have accomplished device hotplug / hot-unplug directly through `upcall` in order to avoid virtualization of ACPI and minimize the virtual machine's overhead. And there could be many other usages for this direct communication channel. It's already open source.
https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
- The URL below says the kernel patches work with 4.19, but do they also work with 5.15+ ?
Forward compatibility should be achievable, we have ported it to 5.10 based kernel.
- Are these patches platform-specific or would they work for any architecture that supports VSOCK?
It's almost platform independent, but some messages related to CPU hotplug are platform dependent.
- Could the kernel driver be replaced with a userland daemon in the guest using loopback VSOCK?
We need to create device nodes for hot-added CPU/memory/devices, so it's not easy for userspace daemon to do these tasks.
- The fact that `upcall` allows communication between the VMM and the guest suggests that this architecture might be incompatible with https://github.com/confidential-containers where the VMM should have no knowledge of what happens inside the VM.
1. `TDX` doesn't support CPU/memory hotplug yet.
2. For ACPI based device hotplug, it depends on the ACPI `DSDT` table, and the guest kernel will execute `ASL` code while handling those hotplug events. It should be easier to audit VSOCK based communication than ACPI `ASL` methods.
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in the next stage.


@@ -1,62 +1,62 @@
# Motivation
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, it's cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, it's difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. It's also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
-* **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
+* **`NodePublishVolume`** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
-* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
-* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
+* **`NodeGetVolumeStats`** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
+* **`NodeExpandVolume`** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
-* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
+* **`NodeStageVolume/NodeUnStageVolume`** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
@@ -78,17 +78,17 @@ Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime,
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mountInfo.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
@@ -101,12 +101,12 @@ $ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volu
$ curl --unix-socket "$shim_socket_path" -I -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath"": [volumePath], "Size": "123123" }'
```
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
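Because the endpoint is plain HTTP over a Unix socket, the same kind of query can also be issued with a raw request; an illustrative sketch (the socket path and request line are placeholders patterned on the curl examples above):

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Placeholder path; the real socket is the per-sandbox shim socket.
    let mut stream = UnixStream::connect("/path/to/shim.sock")?;
    let request = "GET /direct-volume/stats HTTP/1.1\r\n\
                   Host: localhost\r\n\
                   Connection: close\r\n\r\n";
    stream.write_all(request.as_bytes())?;

    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    println!("{response}");
    Ok(())
}
```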
### Kata agent changes
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
Two new APIs and three new GRPC objects are added to GRPC protocol between the shim and agent for resizing and getting volume stats:
```protobuf
@@ -226,7 +226,7 @@ Lets assume that changes have been made in the `aws-ebs-csi-driver` node driv
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
-**Node unstage volume**
+**Node `unstage` volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
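The bookkeeping half of `kata-runtime direct-volume add` in this walkthrough amounts to persisting the JSON under the predefined location; a sketch under that reading (illustrative only, not the actual implementation):

```rust
use std::fs;
use std::path::Path;

/// Persist mount info under the predefined direct-volumes location,
/// mirroring steps 1-2 of the "Node publish volume" example above.
fn direct_volume_add(volume_path: &str, mount_info_json: &str) -> std::io::Result<()> {
    let base = Path::new("/run/kata-containers/shared/direct-volumes");
    let dir = base.join(volume_path.trim_start_matches('/'));
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("mountInfo.json"), mount_info_json)
}

fn main() -> std::io::Result<()> {
    direct_volume_add(
        "/kubelet/a/b/c/d/sdf",
        r#"{"Device": "/dev/sdf", "fstype": "ext4"}"#,
    )
}
```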


@@ -59,5 +59,5 @@ The table below summarized when and where those different hooks will be executed
+ `Hook Path` specifies where hook's path be resolved.
+ `Exec Place` specifies in which namespace those hooks can be executed.
+ For `CreateContainer` Hooks, OCI requires to run them inside the container namespace while the hook executable path is in the host runtime, which is a non-starter for VM-based containers. So we design to keep them running in the *host vmm namespace.*
+ `Exec Time` specifies at which time point those hooks can be executed.


@@ -118,7 +118,7 @@ all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>`
### Why create a kata-cgroup under the parent cgroup?
And why not directly adding the per sandbox shim to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?
The Kata Containers shim implementation creates a per-sandbox cgroup
@@ -219,13 +219,13 @@ the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overh
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod which sandbox
ID is `12345678`, create with `sandbox_cgroup_only` disabled, the 2 memory subsystems
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.
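Both hierarchies are purely mechanical given the subsystem and the sandbox ID; a small sketch reproducing the example paths (illustrative only):

```rust
/// Per-subsystem sandbox and overhead cgroup paths for cgroup v1 with
/// sandbox_cgroup_only disabled, following the example above.
fn cgroup_paths(subsystem: &str, sandbox_id: &str) -> (String, String) {
    let sandbox = format!("/sys/fs/cgroup/{subsystem}/kubepods/kata_{sandbox_id}");
    let overhead = format!("/sys/fs/cgroup/{subsystem}/kata_overhead/{sandbox_id}");
    (sandbox, overhead)
}

fn main() {
    let (sandbox, overhead) = cgroup_paths("memory", "12345678");
    assert_eq!(sandbox, "/sys/fs/cgroup/memory/kubepods/kata_12345678");
    assert_eq!(overhead, "/sys/fs/cgroup/memory/kata_overhead/12345678");
}
```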
Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.
With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate the actual container workload processes. For Kata, this maps
@@ -247,7 +247,7 @@ cgroup size and constraints accordingly.
# Supported cgroups
Kata Containers currently supports cgroups `v1` and `v2`.
In the following sections each cgroup is described briefly.


@@ -119,17 +119,17 @@ The metrics service also doesn't hold any metrics in memory.
*Metrics size*: response size of one Prometheus scrape request.
It's easy to estimate the size of one metrics fetch request issued by Prometheus.
The formula to calculate the expected size when no gzip compression is in place is:
9 + (144 - 9) * `number of kata sandboxes`
Prometheus supports `gzip compression`. When enabled, the response size of each request will be smaller:
2 + (10 - 2) * `number of kata sandboxes`
**Example**
We have 10 sandboxes running on a node. The expected size of one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
9 + (144 - 9) * 10 = **1.35M**
If `gzip compression` is enabled:
2 + (10 - 2) * 10 = **82K**
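Judging from the worked example (1359 ≈ 1.35M), both formulas appear to be expressed in KB; a sketch under that assumption:

```rust
// Estimated response size, in KB, of one Prometheus scrape request.
fn plain_size_kb(sandboxes: u64) -> u64 {
    9 + (144 - 9) * sandboxes
}

// Estimated response size, in KB, with gzip compression enabled.
fn gzip_size_kb(sandboxes: u64) -> u64 {
    2 + (10 - 2) * sandboxes
}

fn main() {
    assert_eq!(plain_size_kb(10), 1359); // ~1.35M
    assert_eq!(gzip_size_kb(10), 82);    // 82K
}
```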
#### Metrics delay ####


@@ -71,7 +71,7 @@ The Kata Containers runtime **MUST** support scalable I/O through the SRIOV tech
### Virtualization overhead reduction
A compelling aspect of containers is their minimal overhead compared to bare metal applications.
A container runtime should keep the overhead to a minimum in order to provide the expected user
experience.
The Kata Containers runtime implementation **SHOULD** be optimized for:
* Minimal workload boot and shutdown times


@@ -5,7 +5,7 @@ To safeguard the integrity of container images and prevent tampering from the ho
## Introduction to remote snapshot
Containerd 1.7 introduced the `remote snapshotter` feature, which is the foundation for pulling images in the guest for Confidential Containers.
While it's beyond the scope of this document to fully explain how the container rootfs is created to the point it can be executed, a fundamental grasp of the snapshot concept is essential. Putting it in a simple way, containerd fetches the image layers from an OCI registry into its local content storage. However, they cannot be mounted as is (e.g. the layer can be tar+gzip compressed) as well as they should be immutable so the content can be shared among containers. Thus containerd leverages snapshots of those layers to build the container's rootfs.
The role of `remote snapshotter` is to reuse snapshots that are stored in a remotely shared place, thus enabling containerd to prepare the container's rootfs in a manner similar to that of a local `snapshotter`. The key behavior that makes this the building block of Kata's guest image management for Confidential Containers is that containerd will not pull the image layers from the registry; instead it assumes that `remote snapshotter` and/or an external entity will perform that operation on its behalf.
@@ -48,7 +48,7 @@ Pull the container image directly from the guest VM using `nydus snapshotter` ba
#### Architecture
The following diagram provides an overview of the architecture for pulling images in the guest, with key components.
```mermaid
flowchart LR
Kubelet[kubelet]--> |1\. Pull image request & metadata|Containerd
@@ -129,7 +129,7 @@ Next the `handleImageGuestPullBlockVolume()` is called to build the Storage obje
Below is an example of storage information packaged in the message sent to the kata-agent:
```json
"driver": "image_guest_pull",
"driver": "image_guest_pull",
"driver_options": [
"image_guest_pull"{
"metadata":{
@@ -145,15 +145,15 @@ Below is an example of storage information packaged in the message sent to the k
"io.kubernetes.cri.sandbox-uid": "de7c6a0c-79c0-44dc-a099-69bb39f180af",
}
}
],
"source": "quay.io/kata-containers/confidential-containers:unsigned",
"fstype": "overlay",
"options": [],
"mount_point": "/run/kata-containers/cb0b47276ea66ee9f44cc53afa94d7980b57a52c3f306f68cb034e58d9fbd3c6/rootfs",
```
Next, the kata-agent's RPC module will handle the create container request which, among other things, involves adding storages to the sandbox. The storage module contains implementations of `StorageHandler` interface for various storage types, being the `ImagePullHandler` in charge of handling the storage object for the container image (the storage manager instantiates the handler based on the value of the "driver").
`ImagePullHandler` delegates the image pulling operation to `confidential_data_hub.pull_image()`, which creates the image's bundle directory on the guest filesystem and, in turn, invokes the `ImagePullService` of the Confidential Data Hub to fetch, uncompress and mount the image's rootfs.
> **Notes:**
> In this flow, `confidential_data_hub.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `pull_image()` RPC of Confidential Data Hub because the pause image is expected to already be inside the guest's filesystem, so instead `confidential_data_hub.unpack_pause_image()` is called.
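The driver-based dispatch described above can be sketched with an invented trait (illustrative only; these are not the kata-agent's actual types or signatures):

```rust
/// Invented stand-in for the agent's storage handler interface.
trait StorageHandler {
    fn create_device(&self, source: &str, mount_point: &str) -> Result<(), String>;
}

struct ImagePullHandler;

impl StorageHandler for ImagePullHandler {
    fn create_device(&self, source: &str, mount_point: &str) -> Result<(), String> {
        // The real handler hands off to the Confidential Data Hub to
        // fetch, uncompress and mount the image rootfs.
        println!("pulling {source} and mounting at {mount_point}");
        Ok(())
    }
}

/// Pick a handler from the storage object's "driver" field.
fn handler_for(driver: &str) -> Option<Box<dyn StorageHandler>> {
    match driver {
        "image_guest_pull" => Some(Box::new(ImagePullHandler)),
        _ => None,
    }
}

fn main() {
    let handler = handler_for("image_guest_pull").expect("unknown driver");
    handler
        .create_device(
            "quay.io/kata-containers/confidential-containers:unsigned",
            "/run/kata-containers/.../rootfs", // truncated placeholder
        )
        .unwrap();
}
```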


@@ -1,6 +1,6 @@
# Virtual machine vCPU sizing in Kata Containers 3.0
> Preview:
> [Kubernetes(since 1.23)][1] and [Containerd(since 1.6.0-beta4)][2] will help calculate `Sandbox Size` info and pass it to Kata Containers through annotations.
> In order to adapt to this beneficial change and be compatible with the past, we have implemented the new vCPUs handling way in `runtime-rs`, which is slightly different from the original `runtime-go`'s design.
@@ -20,7 +20,7 @@ Our understanding and priority of these resources are as follows, which will aff
* `default_vcpus`: default number of vCPUs when starting a VM.
* `default_maxvcpus`: maximum number of vCPUs.
* From `Annotation`:
* `InitialSize`: we call the size of the resource passed from the annotations as `InitialSize`. Kubernetes will calculate the sandbox size according to the Pod's statement, which is the `InitialSize` here. This size should be the size we want to prioritize.
* From `Container Spec`:
* The amount of CPU resources that the Container wants to use will be declared through the spec. Including the aforementioned annotations, we mainly consider `cpu quota` and `cpuset` when calculating the number of vCPUs.
* `cpu quota`: `cpu quota` is the most common way to declare the amount of CPU resources. The number of vCPUs introduced by `cpu quota` declared in a container's spec is: `vCPUs = ceiling( quota / period )`.
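The ceiling in `vCPUs = ceiling( quota / period )` is easy to get wrong with plain integer division; a minimal sketch (illustrative only):

```rust
/// vCPUs contributed by a container's CPU quota, per the rule above.
fn vcpus_from_quota(quota: i64, period: u64) -> u64 {
    if quota <= 0 || period == 0 {
        return 0; // no (or unlimited, i.e. -1) quota contributes no vCPUs
    }
    (quota as u64).div_ceil(period)
}

fn main() {
    // A 250ms quota per 100ms period -> ceiling(2.5) = 3 vCPUs.
    assert_eq!(vcpus_from_quota(250_000, 100_000), 3);
}
```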


@@ -1,9 +1,9 @@
-# Design Doc for Kata Containers' VCPUs Pinning Feature
+# Design Doc for Kata Containers VCPUs Pinning Feature
## Background
By now, vCPU threads of Kata Containers are scheduled randomly to CPUs. And each pod would request a specific set of CPUs, which we call the CPU set (just the CPU set meaning in Linux cgroups).
If the number of vCPU threads is equal to that of CPUs claimed in the CPU set, we can then pin each vCPU thread to one specified CPU, to reduce the cost of random scheduling.
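A sketch of that pin-or-fall-back condition (illustrative only; the document points at `checkVCPUsPinning`/`resetVCPUsPinning` in Sandbox.go for the real logic):

```rust
/// Return a 1:1 thread-to-CPU pinning plan only when the counts match;
/// otherwise fall back to the default random scheduling.
fn plan_pinning(vcpu_threads: &[u32], cpuset: &[u32]) -> Option<Vec<(u32, u32)>> {
    if vcpu_threads.len() != cpuset.len() {
        return None;
    }
    Some(
        vcpu_threads
            .iter()
            .copied()
            .zip(cpuset.iter().copied())
            .collect(),
    )
}

fn main() {
    // Two vCPU thread IDs and a two-CPU set -> pin 1:1.
    let plan = plan_pinning(&[12001, 12002], &[4, 5]);
    assert_eq!(plan, Some(vec![(12001, 4), (12002, 5)]));
}
```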
## Detailed Design
@@ -20,7 +20,7 @@ Two ways are provided to use this vCPU thread pinning feature: through `QEMU` co
### When is VCPUs Pinning Checked?
As shown in Section 1, when `num(vCPU threads) == num(CPUs in CPU set)`, we shall pin each vCPU thread to a specified CPU. And when this condition is broken, we should restore to the original random scheduling pattern.
So when may `num(CPUs in CPU set)` change? There are 5 possible scenes:
| Possible scenes | Related Code |
@@ -34,4 +34,4 @@ So when may `num(CPUs in CPU set)` change? There are 5 possible scenes:
### Core Pinning Logics
We can split the whole process into the following steps. Related methods are `checkVCPUsPinning` and `resetVCPUsPinning`, in file Sandbox.go.
![](arch-images/vcpus-pinning-process.png)


@@ -49,4 +49,4 @@
- [How to use the Kata Agent Policy](how-to-use-the-kata-agent-policy.md)
- [How to pull images in the guest](how-to-pull-images-in-guest-with-kata.md)
- [How to use mem-agent to decrease the memory usage of Kata container](how-to-use-memory-agent.md)
- [How to use seccomp with runtime-rs](how-to-use-seccomp-with-runtime-rs.md)


@@ -1,24 +1,24 @@
# How to use Kata Containers and Containerd
This document covers the installation and configuration of [containerd](https://containerd.io/)
and [Kata Containers](https://katacontainers.io). The containerd provides not only the `ctr`
command line tool, but also the [CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
interface for [Kubernetes](https://kubernetes.io) and other CRI clients.
This document is primarily written for Kata Containers v1.5.0-rc2 or above, and containerd v1.2.0 or above.
Previous versions are addressed here, but we suggest users upgrade to the newer versions for better support.
## Concepts
### Kubernetes `RuntimeClass`
[`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/) is a Kubernetes feature first
introduced in Kubernetes 1.12 as alpha. It is the feature for selecting the container runtime configuration to
use to run a pod's containers. This feature is supported in `containerd` since [v1.2.0](https://github.com/containerd/containerd/releases/tag/v1.2.0).
Before the `RuntimeClass` was introduced, Kubernetes was not aware of the difference of runtimes on the node. `kubelet`
creates Pod sandboxes and containers through CRI implementations, and treats all the Pods equally. However, there
are requirements to run trusted Pods (i.e. Kubernetes plugin) in a native container like runc, and to run untrusted
workloads with isolated sandboxes (i.e. Kata Containers).
As a result, the CRI implementations extended their semantics for the requirements:
@@ -32,17 +32,17 @@ As a result, the CRI implementations extended their semantics for the requiremen
```
- Similarly, CRI-O introduced the annotation `io.kubernetes.cri-o.TrustedSandbox` for untrusted Pods.
To eliminate the complexity of user configuration introduced by the non-standardized annotations and provide
extensibility, `RuntimeClass` was introduced. This gives users the ability to affect the runtime behavior
through `RuntimeClass` without the knowledge of the CRI daemons. We suggest that users with multiple runtimes
use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/core/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
![Kubernetes integration with shimv2](../design/arch-images/shimv2.svg)
@@ -87,7 +87,7 @@ $ popd
### Install `cri-tools`
> **Note:** `cri-tools` is a set of tools for CRI used for development and testing. Users who only want
> to use containerd with Kubernetes can skip the `cri-tools`.
You can install the `cri-tools` from source code:
@@ -104,7 +104,7 @@ $ popd
### Configure containerd to use Kata Containers
By default, the configuration of containerd is located at `/etc/containerd/config.toml`, and the
`cri` plugins are placed in the following section:
```toml
@@ -123,7 +123,7 @@ The following sections outline how to add Kata Containers to the configurations.
#### Kata Containers as a `RuntimeClass`
For
- Kata Containers v1.5.0 or above (including `1.5.0-rc`)
- Containerd v1.2.0 or above
- Kubernetes v1.12.0 or above
@@ -132,8 +132,8 @@ The `RuntimeClass` is suggested.
The following configuration includes two runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/core/runtime/v2))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/core/runtime/v2)).
```toml
@@ -168,9 +168,9 @@ This `ConfigPath` option is optional. If you do not specify it, shimv2 first tri
#### Kata Containers as the runtime for untrusted workload
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
for an untrusted workload. With the following configuration, you can run trusted workloads with a runtime such as `runc`
and then, run an untrusted workload with Kata Containers:
```toml
[plugins.cri.containerd]
@@ -201,9 +201,9 @@ If you want to set Kata Containers as the only runtime in the deployment, you ca
> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.
First, add the CNI configuration in the containerd configuration.
The following is the configuration if you installed CNI as the *[Install CNI plugins](#install-cni-plugins)* section outlined.
Put the CNI configuration as `/etc/cni/net.d/10-mynet.conf`:
@@ -324,7 +324,7 @@ $ sudo crictl start 1aab7585530e6
1aab7585530e6
```
In Kubernetes, you need to create a `RuntimeClass` resource and add the `RuntimeClass` field in the Pod Spec
(see this [document](https://kubernetes.io/docs/concepts/containers/runtime-class/) for more information).
If `RuntimeClass` is not supported, you can use the following annotation in a Kubernetes pod to identify as an untrusted workload:
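A minimal sketch of such a pod, reusing the `io.kubernetes.cri.untrusted-workload` annotation shown in the containerd section later in this document (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-demo
  annotations:
    # annotation key used by containerd's legacy untrusted-workload mechanism
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  containers:
  - name: app
    image: nginx
```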

View File

@@ -3358,4 +3358,4 @@
"title": "Kata containers",
"uid": "75pdqURGk",
"version": 1
}

View File

@@ -27,7 +27,7 @@ spec:
containers:
- name: kata-monitor
image: quay.io/kata-containers/kata-monitor:2.0.0
args:
- -log-level=debug
ports:
- containerPort: 8090

View File

@@ -79,7 +79,7 @@ metadata:
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:

View File

@@ -16,7 +16,7 @@ To pull images in the guest, we need to do the following steps:
### Delete images used for pulling in the guest
The `CRI Runtime Specific Snapshotter` is still an [experimental feature](https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features) in containerd, and containerd does not support managing the same image across different `snapshotters` (the default `snapshotter` in containerd is `overlayfs`). To avoid errors caused by this, it is recommended to delete any images (including the pause image) that will later be pulled in the guest from containerd before configuring `nydus snapshotter`.
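For instance, assuming containerd's default `k8s.io` namespace and an example image name, the cleanup could look like:

```bash
# list the images containerd currently manages for Kubernetes
$ sudo ctr -n k8s.io images ls -q
# remove each image that will later be pulled in the guest (pause image included)
$ sudo ctr -n k8s.io images rm registry.k8s.io/pause:3.9
```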
### Install `nydus snapshotter`
@@ -24,7 +24,7 @@ Though the `CRI Runtime Specific Snapshotter` is still an [experimental feature]
To use a DaemonSet to install `nydus snapshotter`, we need to ensure that `yq` exists on the host.
1. Download `nydus snapshotter` repo
```bash
$ nydus_snapshotter_install_dir="/tmp/nydus-snapshotter"
$ nydus_snapshotter_url=https://github.com/containerd/nydus-snapshotter
@@ -42,7 +42,7 @@ $ yq -i \
$ yq -i \
> 'data.ENABLE_CONFIG_FROM_VOLUME = "false"' -P \
> misc/snapshotter/base/nydus-snapshotter.yaml
# Enable to run snapshotter as a systemd service
# (skip if you want to run nydus snapshotter as a standalone process)
$ yq -i \
> 'data.ENABLE_SYSTEMD_SERVICE = "true"' -P \
@@ -79,7 +79,7 @@ Created symlink /etc/systemd/system/multi-user.target.wants/nydus-snapshotter.se
#### Install `nydus snapshotter` manually
1. Download `nydus snapshotter` binary from release
```bash
$ ARCH=$(uname -m)
$ golang_arch=$(case "$ARCH" in
@@ -111,7 +111,7 @@ level=info msg="Run daemons monitor..."
Configure `nydus snapshotter` to enable `CRI Runtime Specific Snapshotter` in containerd. This ensures Kata Containers run with `nydus snapshotter`. Below, the steps are illustrated using `kata-qemu` as an example.
```toml
# Modify containerd configuration to ensure that the following lines appear in the containerd configuration
# (Assume that the containerd config is located in /etc/containerd/config.toml)
[plugins."io.containerd.grpc.v1.cri".containerd]
@@ -124,7 +124,7 @@ Configure `nydus snapshotter` to enable `CRI Runtime Specific Snapshotter` in co
snapshotter = "nydus"
```
> **Notes:**
> The `CRI Runtime Specific Snapshotter` feature only works for containerd v1.7.0 and above. For containerd versions below v1.7.0, in addition to the above settings, we need to set the global `snapshotter` to `nydus` in the containerd config. For example:
```toml
@@ -256,7 +256,7 @@ spec:
values:
- NODE_NAME
volumes:
- name: trusted-image-storage
persistentVolumeClaim:
claimName: trusted-pvc
containers:
@@ -280,7 +280,7 @@ quay.io/confidential-containers/test-images largeimage
```bash
$ lsblk --fs
NAME FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda
└─encrypted_disk_GsLDt
178M 87% /run/kata-containers/image
@@ -309,4 +309,4 @@ $ free -m
total used free shared buff/cache available
Mem: 1989 52 43 0 1893 1904
Swap: 0 0 0
```

View File

@@ -10,19 +10,19 @@ Currently, there is no widely applicable and convenient method available for use
## Solution
According to the proposal, the `kata-ctl direct-volume` command is used to add a directly assigned block volume device to the Kata Containers runtime.
Then, with the help of the method [get_volume_mount_info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L95), information is read from the JSON file (`mountinfo.json`) and parsed into the structure [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70), which is used to save device-related information.
We only fill in the `mountinfo.json` fields, such as `device`, `volume_type`, `fs_type`, `metadata` and `options`, which correspond to the fields in [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70), to describe a device.
The JSON file `mountinfo.json` is placed in a sub-path `/kubelet/kata-test-vol-001/volume001` under the fixed path `/run/kata-containers/shared/direct-volumes/`.
The full path thus looks like `/run/kata-containers/shared/direct-volumes/kubelet/kata-test-vol-001/volume001`, but for security reasons the sub-path is
encoded as `/run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx`.
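The encoded directory name is the plain base64 encoding of the sub-path, which can be reproduced with:

```bash
$ echo -n "/kubelet/kata-test-vol-001/volume001" | base64
L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx
```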
Finally, when running a Kata Container with `ctr run --mount type=X,src=Y,dst=Z,options=rbind:rw`, the `type=X` should be a proprietary type specifically designed for the kind of volume in use.
Currently supported types:
- `directvol` for direct volume
- `vfiovol` for VFIO device based volume
@@ -46,10 +46,10 @@ $ sudo mkfs.ext4 /tmp/stor/rawdisk01.20g
```json
{
"device": "/tmp/stor/rawdisk01.20g",
"volume_type": "directvol",
"fs_type": "ext4",
"metadata":"{}",
"device": "/tmp/stor/rawdisk01.20g",
"volume_type": "directvol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -57,7 +57,7 @@ $ sudo mkfs.ext4 /tmp/stor/rawdisk01.20g
```bash
$ sudo kata-ctl direct-volume add /kubelet/kata-direct-vol-002/directvol002 "{\"device\": \"/tmp/stor/rawdisk01.20g\", \"volume_type\": \"directvol\", \"fs_type\": \"ext4\", \"metadata\":"{}", \"options\": []}"
$# /kubelet/kata-direct-vol-002/directvol002 <==> /run/kata-containers/shared/direct-volumes/W1lMa2F0ZXQva2F0YS10a2F0DAxvbC0wMDEvdm9sdW1lMDAx
$ cat W1lMa2F0ZXQva2F0YS10a2F0DAxvbC0wMDEvdm9sdW1lMDAx/mountInfo.json
{"volume_type":"directvol","device":"/tmp/stor/rawdisk01.20g","fs_type":"ext4","metadata":{},"options":[]}
```
@@ -79,8 +79,8 @@ In this scenario, the device's host kernel driver will be replaced by `vfio-pci`
And either device's BDF or its VFIO IOMMU group ID in `/dev/vfio/` is fine for "device" in `mountinfo.json`.
```bash
$ lspci -nn -k -s 45:00.1
45:00.1 SCSI storage controller
...
Kernel driver in use: vfio-pci
...
@@ -99,9 +99,9 @@ First, configure the `mountinfo.json`, as below:
```json
{
"device": "45:00.1",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -111,9 +111,9 @@ First, configure the `mountinfo.json`, as below:
```json
{
"device": "0000:45:00.1",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -122,10 +122,10 @@ First, configure the `mountinfo.json`, as below:
```json
{
"device": "/dev/vfio/110",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"device": "/dev/vfio/110",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -135,7 +135,7 @@ Second, run kata-containers with device(`/dev/vfio/110`) as an example:
```bash
$ sudo kata-ctl direct-volume add /kubelet/kata-vfio-vol-003/vfiovol003 "{\"device\": \"/dev/vfio/110\", \"volume_type\": \"vfiovol\", \"fs_type\": \"ext4\", \"metadata\":"{}", \"options\": []}"
$ # /kubelet/kata-vfio-vol-003/vfiovol003 <==> /run/kata-containers/shared/direct-volumes/F0va22F0ZvaS12F0YS10a2F0DAxvbC0F0ZXvdm9sdF0Z0YSx
$ cat F0va22F0ZvaS12F0YS10a2F0DAxvbC0F0ZXvdm9sdF0Z0YSx/mountInfo.json
{"volume_type":"vfiovol","device":"/dev/vfio/110","fs_type":"ext4","metadata":{},"options":[]}
```

View File

@@ -1,5 +1,5 @@
## Introduction
To improve security, Kata Containers supports running the VMM process (QEMU and cloud-hypervisor) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
@@ -20,8 +20,8 @@ By default, the VMM process still runs as the root user. There are two ways to e
2. Set the Kubernetes annotation `io.katacontainers.hypervisor.rootless` to `true`.
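For the annotation route, a minimal pod sketch (using the annotation name given above; pod name and image are placeholders) could look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rootless-vmm-demo
  annotations:
    # ask the Kata runtime to start the VMM as a non-root user
    io.katacontainers.hypervisor.rootless: "true"
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: nginx
```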
## Implementation details
When the `rootless` flag is enabled, upon a request to create a Pod, the Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs.sock`, `qmp.sock`) to a directory (under `/run/user/[uid]/`) that only the non-root hypervisor has access to.
## Limitations
@@ -30,4 +30,4 @@ Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs
2. Currently, this feature is only supported in QEMU and cloud-hypervisor. For firecracker, you can use jailer to run the VMM process with a non-root user.
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` devices will also not work because of permission denied errors.

View File

@@ -50,7 +50,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | float32| the default vCPUs assigned for a VM by the hypervisor |
| `io.katacontainers.config.hypervisor.disable_block_device_use` | `boolean` | disable hotplugging host block devices to guest VMs for container rootfs |
| `io.katacontainers.config.hypervisor.disable_image_nvdimm` | `boolean` | specify if a `nvdimm` device should be used as rootfs for the guest (QEMU) |
| `io.katacontainers.config.hypervisor.disable_vhost_net` | `boolean` | specify if `vhost-net` is not available on the host |
| `io.katacontainers.config.hypervisor.enable_hugepages` | `boolean` | if the memory should be `pre-allocated` from huge pages |
@@ -59,7 +59,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_vhost_user_store` | `boolean` | enable vhost-user storage device (QEMU) |
| `io.katacontainers.config.hypervisor.vhost_user_reconnect_timeout_sec` | `string` | the timeout for reconnecting vhost user socket (QEMU) |
| `io.katacontainers.config.hypervisor.enable_virtio_mem` | `boolean` | enable virtio-mem (QEMU) |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` (R) | string | file based memory backend root directory |

View File

@@ -19,7 +19,7 @@ The Kubernetes cluster will use the
## Install and configure containerd
First, follow the [How to use Kata Containers and Containerd](containerd-kata.md) to install and configure containerd.
Then, make sure the containerd works with the [examples in it](containerd-kata.md#run).
## Install and configure Kubernetes
@@ -44,7 +44,7 @@ In order to allow Kubelet to use containerd (using the CRI interface), configure
```bash
$ sudo mkdir -p /etc/systemd/system/kubelet.service.d/
$ cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/0-containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
EOF
```
@@ -182,7 +182,7 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI runs the pod with the
containers:
- name: nginx
image: nginx
EOF
```

View File

@@ -2,9 +2,9 @@
## Sysctls
In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
process file system.
The parameters include the following subsystems among others:
- `fs` (file systems)
@@ -17,9 +17,9 @@ To get a complete list of kernel parameters, run:
$ sudo sysctl -a
```
Kubernetes provides mechanisms for setting namespaced sysctls.
Namespaced sysctls can be set per pod in the case of Kubernetes.
The following sysctls are known to be namespaced and can be set with
Kubernetes:
- `kernel.shm*`
@@ -84,14 +84,14 @@ The recommendation is to set them directly on the host or use a privileged
container in the case of Kubernetes.
In the case of Kata, the approach of setting sysctls on the host does not
work since the host sysctls have no effect on a Kata Container running
inside a guest. Kata gives you the ability to set non-namespaced sysctls using a privileged container.
This has the advantage that the non-namespaced sysctls are set inside the guest
without having any effect on the `/proc/sys` values of any other pod or the
host itself.
The recommended approach to do this would be to set the sysctl value in a
privileged init container. In this way, the application containers do not need any elevated
privileges.
```
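# A minimal sketch (assumed example; a non-namespaced sysctl such as
# vm.overcommit_memory is set by a privileged init container):
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-init-demo
spec:
  runtimeClassName: kata
  initContainers:
  - name: set-sysctl
    image: busybox
    securityContext:
      privileged: true
    command: ["sh", "-c", "sysctl -w vm.overcommit_memory=1"]
  containers:
  - name: app
    image: nginx
```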

View File

@@ -116,4 +116,4 @@ time for I in $(seq 100); do
done
# Display the memory usage again after running the test
free -h

View File

@@ -1,6 +1,6 @@
# Kata Agent Policy
Agent Policy is a Kata Containers feature that enables the Guest VM to perform additional validation for each [ttRPC API](../../src/libs/protocols/protos/agent.proto) request.
The Policy is commonly used for implementing confidential containers, where the Kata Shim and the Kata Agent have different trust properties. The Policy can be used for non-confidential containers too - e.g., for a basic defense in depth step of blocking the Host from starting an application on the Guest. However, for non-confidential containers, the Host might be able to modify the Policy and/or replace the Agent and disable its Policy rules, so a Policy is more helpful for confidential containers.
@@ -35,7 +35,7 @@ Kubernetes users can encode in `base64` format their Policy documents, and add t
For example, the [`allow-all-except-exec-process.rego`](../../src/kata-opa/allow-all-except-exec-process.rego) sample policy file is different from the [default Policy](../../src/kata-opa/allow-all.rego) because it rejects any `ExecProcess` requests. To encode this policy file, you need to:
- Embed the policy inside an init data struct
- Compress
- Base64 encode
For example:
```bash

View File

@@ -22,7 +22,7 @@ mitigation does not affect a container's ability to mount *guest* devices.
## Containerd
Containerd allows configuring the privileged host devices behavior for each runtime in the containerd config. This is
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
devices into the guest, even when privileged is enabled.
Support for configuring privileged host devices behaviour was added in containerd `1.3.0`.
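A sketch of the corresponding containerd runtime entry (plugin section names vary across containerd versions; shown for the CRI v1 plugin):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  # do not pass host devices into privileged Kata containers
  privileged_without_host_devices = true
```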
@@ -49,7 +49,7 @@ See below example config:
## CRI-O
Similar to containerd, CRI-O allows configuring the privileged host devices
behavior for each runtime in the CRI config. This is done with the
`privileged_without_host_devices` option. Setting this to `true` will disable
hot plugging of the host devices into the guest, even when privileged is enabled.
@@ -74,4 +74,4 @@ See below example config:
```
- [Kata Containers with CRI-O](../how-to/run-kata-with-k8s.md#cri-o)

View File

@@ -30,7 +30,7 @@ POD ID CREATED STATE NAME
#### Create container in the pod sandbox with config file
```bash
$ sudo crictl create 16a62b035940f container_config.json sandbox_config.json
e6ca0e0f7f532686236b8b1f549e4878e4fe32ea6b599a5d684faf168b429202
```
@@ -66,7 +66,7 @@ $ sudo crictl exec -it e6ca0e0f7f532 sh
And run commands in it:
```
/ # hostname
busybox_host
/ # id
uid=0(root) gid=0(root)

View File

@@ -169,7 +169,7 @@ Add the following annotation for CRI-O
```yaml
io.kubernetes.cri-o.TrustedSandbox: "false"
```
The following is an example of what your YAML can look like:
```yaml
...
@@ -199,7 +199,7 @@ Add the following annotation for containerd
```yaml
io.kubernetes.cri.untrusted-workload: "true"
```
The following is an example of what your YAML can look like:
```yaml
...

View File

@@ -3,12 +3,12 @@
### What is VMCache
VMCache is a new function that creates VMs as caches before using them.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through a Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](../../src/runtime/protocols/cache/cache.proto).
The VMCache server will create some VMs and cache them by factory cache.
It will convert a VM to gRPC format and transport it when it gets
requested from clients.
Factory `grpccache` is the VMCache client. It will request gRPC format
VM and convert it back to a VM. If VMCache function is enabled,
`kata-runtime` will request VM from factory `grpccache` when it creates
@@ -16,8 +16,8 @@ a new sandbox.
### How is this different to VM templating
Both [VM templating](../how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
When VM templating is enabled, new VMs are created by cloning from a pre-created template VM, and they will share the same initramfs, kernel and agent memory in readonly mode. So it saves a lot of memory if there are many Kata Containers running on the same host.
VMCache is not vulnerable to [share memory CVE](../how-to/what-is-vm-templating-and-how-do-I-use-it.md#what-are-the-cons) because each VM doesn't share the memory.
### How to enable VMCache
@@ -25,9 +25,9 @@ VMCache is not vulnerable to [share memory CVE](../how-to/what-is-vm-templating-
VMCache can be enabled by changing your Kata Containers config file (`/usr/share/defaults/kata-containers/configuration.toml`,
overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
* `vm_cache_number` specifies the number of caches of VMCache:
* unspecified or == 0
VMCache is disabled
* `> 0`
will be set to the specified number
* `vm_cache_endpoint` specifies the address of the Unix socket.
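Put together, a minimal `configuration.toml` sketch (the socket path is illustrative):

```toml
[factory]
# number of cached VMs; 0 (or unset) disables VMCache
vm_cache_number = 3
# Unix socket the VMCache server listens on
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
```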

View File

@@ -10,8 +10,8 @@ much like a process fork done by the kernel but here we *fork* VMs.
### How is this different from VMCache
Both [VMCache](../how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
When VMCache is enabled, new VMs are created by the VMCache server. So it is not vulnerable to the shared memory CVE because each VM doesn't share memory.
VM templating saves a lot of memory if there are many Kata Containers running on the same host.
### What are the Pros

View File

@@ -1,11 +1,11 @@
# Kata Containers installation guides
The following is an overview of the different installation methods available.
## Prerequisites
Kata Containers requires nested virtualization or bare metal. Check
[hardware requirements](./../../README.md#hardware-requirements) to see if your system is capable of running Kata
Containers.
The Kata Deploy Helm chart is the preferred way to install all of the binaries and

View File

@@ -101,7 +101,7 @@
```
> **Note:**
>
> The containerd daemon needs to be able to find the
> `containerd-shim-kata-v2` binary to allow Kata Containers to be created.

View File

@@ -1,10 +1,10 @@
# Kata Containers 3.0 rust runtime installation
The following is an overview of the different installation methods available.
## Prerequisites
Kata Containers 3.0 rust runtime requires nested virtualization or bare metal. Check
[hardware requirements](/src/runtime/README.md#hardware-requirements) to see if your system is capable of running Kata
Containers.
### Platform support
@@ -25,10 +25,10 @@ architectures:
| Installation method | Description | Automatic updates | Use case | Availability
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|----------- |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give Kata Containers a try on an already up and running Kubernetes cluster. | Yes |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. | No |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. | No |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
### Kata Deploy Installation
@@ -57,7 +57,7 @@ Follow the [`kata-deploy`](../../tools/packaging/kata-deploy/helm-chart/README.m
```
* Musl support for fully static binary
Example for `x86_64`
```
$ rustup target add x86_64-unknown-linux-musl

View File

@@ -103,48 +103,8 @@ $ minikube ssh "grep -c -E 'vmx|svm' /proc/cpuinfo"
## Installing Kata Containers
You can now install the Kata Containers runtime components
[following the official instructions](../../tools/packaging/kata-deploy/helm-chart).
## Testing Kata Containers

View File

@@ -48,7 +48,7 @@ $ make test
- Run a test in the current package in verbose mode:
```bash
# Example
$ test="config::tests::test_get_log_level"
$ cargo test "$test" -vv -- --exact --nocapture
@@ -223,7 +223,7 @@ What's wrong with this function?
```rust
fn foo(config: &Config, path_prefix: String, container_id: String, pid: String) -> Result<()> {
let mut full_path = format!("{}/{}", path_prefix, container_id);
let mut full_path = format!("{path_prefix}/{container_id}");
let _ = remove_recursively(&mut full_path);

View File

@@ -80,8 +80,8 @@ In case of Kata, today the devices which we need in the guest are:
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM
resource management (for example: CPU, memory, device hotplug). This is
required when containers are resized, or more generally when containers are
added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify
the default settings provided when integrating Kata with the QEMU, Dragonball,
Firecracker and Cloud Hypervisor VMMs in the following sections.

View File

@@ -107,7 +107,7 @@ Containers agent:
By default, tracing is disabled for all components. To enable _any_ form of
tracing an `enable_tracing` option must be enabled for at least one component.
> **Note:**
>
> Enabling this option will only allow tracing for subsequently
> started containers.
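For example, enabling tracing for the runtime is a single setting in its `configuration.toml` (a sketch; the agent has its own `enable_tracing` option):

```toml
[runtime]
# emit trace spans for subsequently started containers
enable_tracing = true
```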
@@ -187,7 +187,7 @@ from improvements in the tracing infrastructure. Overall, the impact of
enabling runtime and agent tracing should be extremely low.
## Agent shutdown behaviour
In normal operation, the Kata runtime manages the VM shutdown and performs
certain optimisations to speed up this process. However, if agent tracing is
enabled, the agent itself is responsible for shutting down the VM. This is to

View File

@@ -83,10 +83,10 @@ If you have [s390-tools](https://github.com/ibm-s390-linux/s390-tools) available
```
[container]# lszcrypt -V
CARD.DOM TYPE MODE STATUS REQUESTS PENDING HWTYPE QDEPTH FUNCTIONS DRIVER SESTAT
--------------------------------------------------------------------------------------------------------
03 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4card -
03.0041 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4queue usable
```
---

View File

@@ -3,4 +3,4 @@
Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:
- [Intel Discrete GPUs](Intel-Discrete-GPU-passthrough-and-Kata.md)/[Intel Integrated GPUs](Intel-GPU-passthrough-and-Kata.md)
- [NVIDIA GPUs](NVIDIA-GPU-passthrough-and-Kata.md) and [Enabling NVIDIA GPU workloads using GPU passthrough with Kata Containers](NVIDIA-GPU-passthrough-and-Kata-QEMU.md)

View File

@@ -6,15 +6,15 @@ For integrated GPUs please refer to [Integrate-Intel-GPUs-with-Kata](Intel-GPU-p
> **Note:** These instructions are for a system that has an x86_64 CPU.
An Intel Discrete GPU can be passed to a Kata Container using GPU passthrough,
or SR-IOV passthrough.
In Intel GPU pass-through mode, an entire physical GPU is directly assigned to one VM.
In this mode of operation, the GPU is accessed exclusively by the Intel driver running in
the VM to which it is assigned. The GPU is not shared among VMs.
With SR-IOV mode, it is possible to pass a Virtual GPU instance to a virtual machine.
With this, multiple Virtual GPU instances can be carved out of a single physical GPU
and be passed to different VMs, allowing the GPU to be shared.
| Technology | Description |
@@ -28,13 +28,13 @@ Intel GPUs Recommended for Virtualization:
- Intel® Data Center GPU Max Series (`Ponte Vecchio`)
- Intel® Data Center GPU Flex Series (`Arctic Sound-M`)
- Intel® Data Center GPU Arc Series
The following steps outline the workflow for using an Intel Graphics device with Kata Containers.
## Host BIOS requirements
Hardware such as Intel Max and Flex series require larger PCI BARs.
For large BAR devices, MMIO mapping above the 4GB address space should be enabled in the PCI configuration of the BIOS.
@@ -89,7 +89,7 @@ CONFIG_VFIO_IOMMU_TYPE1
CONFIG_VFIO_PCI
```
## Host kernel command line
Your host kernel needs to be booted with `intel_iommu=on` and `i915.enable_iaf=0` on the kernel command
line.
@@ -112,8 +112,8 @@ $ sudo update-grub
For CentOS/RHEL:
```bash
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```
4. Reboot the system
```bash
@@ -234,7 +234,7 @@ Use the following steps to pass an Intel Graphics device in SR-IOV mode to a Kat
$ BDF="0000:3a:00.0"
$ cat "/sys/bus/pci/devices/$BDF/sriov_totalvfs"
63
```
Create SR-IOV interfaces for the GPU:
```sh
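# A sketch with an assumed VF count; writing to sriov_numvfs creates
# the virtual functions for the GPU at $BDF:
$ echo 2 | sudo tee "/sys/bus/pci/devices/$BDF/sriov_numvfs"
```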

View File

@@ -0,0 +1,569 @@
# Enabling NVIDIA GPU workloads using GPU passthrough with Kata Containers
This page provides:
1. A description of the components involved when running GPU workloads with
Kata Containers using the NVIDIA TEE and non-TEE GPU runtime classes.
1. An explanation of the orchestration flow on a Kubernetes node for this
scenario.
1. A deployment guide enabling you to utilize these runtime classes.
The goal is to educate readers familiar with Kubernetes and Kata Containers
on NVIDIA's reference implementation which is reflected in Kata CI's build
and test framework. With this, we aim to enable readers to leverage this
stack, or to use the principles behind this stack in order to run GPU
workloads on their variant of the Kata Containers stack.
We assume the reader is familiar with Kubernetes, Kata Containers, and
Confidential Containers.
> **Note:**
>
> The current supported mode for enabling GPU workloads in the TEE scenario
> is single GPU passthrough (one GPU per pod) on AMD64 platforms (AMD SEV-SNP
> being the only supported TEE scenario so far with support for Intel TDX being
> on the way).
## Component Overview
Before providing deployment guidance, we describe the components involved to
support running GPU workloads. We start from a top to bottom perspective
from the NVIDIA GPU operator via the Kata runtime to the components within
the NVIDIA GPU Utility Virtual Machine (UVM) root filesystem.
### NVIDIA GPU Operator
A central component is the
[NVIDIA GPU operator](https://github.com/NVIDIA/gpu-operator) which can be
deployed onto your cluster as a helm chart. Installing the GPU operator
delivers various operands on your nodes in the form of Kubernetes DaemonSets.
These operands are vital to support the flow of orchestrating pod manifests
using NVIDIA GPU runtime classes with GPU passthrough on your nodes. Without
getting into the details, the most important operands and their
responsibilities are:
- **nvidia-vfio-manager:** Binding discovered NVIDIA GPUs to the `vfio-pci`
driver for VFIO passthrough.
- **nvidia-cc-manager:** Transitioning GPUs into confidential computing (CC)
and non-CC mode (see the
[NVIDIA/k8s-cc-manager](https://github.com/NVIDIA/k8s-cc-manager)
repository).
- **nvidia-kata-manager:** Creating host-side CDI specifications for GPU
passthrough, resulting in the file `/var/run/cdi/nvidia.yaml`, containing
`kind: nvidia.com/pgpu` (see the
[NVIDIA/k8s-kata-manager](https://github.com/NVIDIA/k8s-kata-manager)
repository).
- **nvidia-sandbox-device-plugin** (see the
[NVIDIA/sandbox-device-plugin](https://github.com/NVIDIA/sandbox-device-plugin)
repository):
- Allocating GPUs during pod deployment.
- Discovering NVIDIA GPUs, their capabilities, and advertising these to
the Kubernetes control plane (allocatable resources as type
`nvidia.com/pgpu` resources will appear for the node and GPU Device IDs
will be registered with Kubelet). These GPUs can thus be allocated as
container resources in your pod manifests. See below GPU operator
deployment instructions for the use of the key `pgpu`, controlled via a
variable.
To summarize, the GPU operator manages the GPUs on each node, allowing for
simple orchestration of pod manifests using Kata Containers. Once the cluster
with GPU operator and Kata bits is up and running, the end user can schedule
Kata NVIDIA GPU workloads, using resource limits and the
`kata-qemu-nvidia-gpu` or `kata-qemu-nvidia-gpu-snp` runtime classes, for
example:
```yaml
apiVersion: v1
kind: Pod
...
spec:
...
runtimeClassName: kata-qemu-nvidia-gpu-snp
...
resources:
limits:
"nvidia.com/pgpu": 1
...
```
When this happens, the Kubelet calls into the sandbox device plugin to
allocate a GPU. The sandbox device plugin returns `DeviceSpec` entries to the
Kubelet for the allocated GPU. The Kubelet uses internal device IDs for
tracking of allocated GPUs and includes the device specifications in the CRI
request when scheduling the pod through containerd. Containerd processes the
device specifications and includes the device configuration in the OCI
runtime spec used to invoke the Kata runtime during the create container
request.
### Kata runtime
The Kata runtime for the NVIDIA GPU handlers is configured to cold-plug VFIO
devices (`cold_plug_vfio` is set to `root-port` while
`hot_plug_vfio` is set to `no-port`). Cold-plug is by design the only
supported mode for NVIDIA GPU passthrough of the NVIDIA reference stack.
With cold-plug, the Kata runtime attaches the GPU at VM launch time, when
creating the pod sandbox. This happens *before* the create container request,
i.e., before the Kata runtime receives the OCI spec including device
configurations from containerd. Thus, a mechanism to acquire the device
information is required. This is done by the runtime calling the
`coldPlugDevices()` function during sandbox creation. In this function,
the runtime queries Kubelet's Pod Resources API to discover allocated GPU
device IDs (e.g., `nvidia.com/pgpu = [vfio0]`). The runtime formats these as
CDI device identifiers and injects them into the OCI spec using
`config.InjectCDIDevices()`. The runtime then consults the host CDI
specifications and determines the device path the GPU is backed by
(e.g., `/dev/vfio/devices/vfio0`). Finally, the runtime resolves the device's
PCI BDF (e.g., `0000:21:00`) and cold-plugs the GPU by launching QEMU with
relevant parameters for device passthrough (e.g.,
`-device vfio-pci,host=0000:21:00.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x2321,bus=rp0,iommufd=iommufdvfio-faf829f2ea7aec330`).
The runtime also creates *inner runtime* CDI annotations
which map host VFIO devices to guest GPU devices. These are annotations
intended for the kata-agent, here referred to as the inner runtime (inside the
UVM), to properly handle GPU passthrough into containers. These annotations
serve as metadata providing the kata-agent with the information needed to
attach the passthrough devices to the correct container.
The annotations are key-value pairs consisting of `cdi.k8s.io/vfio<num>` keys
(derived from the host VFIO device path, e.g., `/dev/vfio/devices/vfio1`) and
`nvidia.com/gpu=<index>` values (referencing the corresponding device in the
guest CDI spec). These annotations are injected by the runtime during container
creation via the `annotateContainerWithVFIOMetadata` function (see
`container.go`).
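For a single passthrough GPU, the injected annotation pair could look like this (values match the example discussed below):

```yaml
annotations:
  cdi.k8s.io/vfio1: nvidia.com/gpu=0
```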
We continue describing the orchestration flow inside the UVM in the next
section.
### Kata NVIDIA GPU UVM
#### UVM composition
To better understand the orchestration flow inside the NVIDIA GPU UVM, we
first look at the components its root filesystem contains. Should you decide
to use your own root filesystem to enable NVIDIA GPU scenarios, this should
give you a good idea on what ingredients you need.
From a file system perspective, the UVM is composed of two files: a standard
Kata kernel image and the NVIDIA GPU rootfs in initrd or disk image format.
These two files are being utilized for the QEMU launch command when the UVM
is created.
The two most important pieces in Kata Container's build recipes for the
NVIDIA GPU root filesystem are the `nvidia_chroot.sh` and `nvidia_rootfs.sh`
files. The build follows a two-stage process. In the first stage, a
full-fledged Ubuntu-based root filesystem is composed within a chroot
environment. In this stage, NVIDIA kernel modules are built and signed
against the current Kata kernel and relevant NVIDIA packages are installed.
In the second stage, a chiseled build is performed: Only relevant contents
from the first stage are copied and compressed into a new distro-less root
filesystem folder. Kata's build infrastructure then turns this root
filesystem into the NVIDIA initrd and image files.
The resulting root filesystem contains the following software components:
- NVRC - the
[NVIDIA Runtime Container init system](https://github.com/NVIDIA/nvrc/tree/main)
- NVIDIA drivers (kernel modules)
- NVIDIA user space driver libraries
- NVIDIA user space tools
- kata-agent
- confidential computing guest components: the attestation agent,
confidential data hub and api-server-rest binaries
- CRI-O pause container (for the guest image-pull method)
- BusyBox utilities (provides a base set of libraries and binaries, and a
linker)
- some supporting files, such as a file containing a list of supported GPU
device IDs which NVRC reads
#### UVM orchestration flow
When the Kata runtime asks QEMU to launch the VM, the UVM's Linux kernel
boots and mounts the root filesystem. After this, NVRC starts as the initial
process.
NVRC scans for NVIDIA GPUs on the PCI bus, loads the
NVIDIA kernel modules, waits for driver initialization, creates the device nodes,
and initializes the GPU hardware (using the `nvidia-smi` binary). NVRC also
creates the guest-side CDI specification file (using the
`nvidia-ctk cdi generate` command). This file specifies devices of
`kind: nvidia.com/gpu`, i.e., GPUs appearing to be physical GPUs on regular
bare metal systems. The guest CDI specification also contains `containerEdits`
for each device, specifying device nodes (e.g., `/dev/nvidia0`,
`/dev/nvidiactl`), library mounts, and environment variables to be mounted
into the container which receives the passthrough GPU.
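A heavily trimmed sketch of such a guest CDI specification (illustrative values only, following the upstream CDI schema):

```yaml
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
- name: "0"
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/nvidiactl
```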
Then, NVRC forks the Kata agent while continuing to run as the
init system. This allows NVRC to handle ongoing GPU management tasks
while kata-agent focuses on container lifecycle management. See the
[NVRC sources](https://github.com/NVIDIA/nvrc/blob/main/src/main.rs) for an
overview on the steps carried out by NVRC.
When the Kata runtime sends the create container request, the Kata agent
parses the inner runtime CDI annotation. For example, for the inner runtime
annotation `"cdi.k8s.io/vfio1": "nvidia.com/gpu=0"`, the agent looks up device
`0` in the guest CDI specification with `kind: nvidia.com/gpu`.
The Kata agent also reads the guest CDI specification's `containerEdits`
section and injects relevant contents into the OCI spec of the respective
container. The kata agent then creates and starts a `rustjail` container
based on the final OCI spec. The container now has relevant device nodes,
binaries and low-level libraries available, and can start a user application
linked against the CUDA runtime API (e.g., `libcudart.so` and other
libraries). When used, the CUDA runtime API in turn calls the CUDA driver
API and kernel drivers, interacting with the pass-through GPU device.
An additional step is exercised in our CI samples: when using images from an
authenticated registry, the guest-pull mechanism triggers attestation using
trustee's Key Broker Service (KBS) for secure release of the NGC API
authentication key used to access the NVCR container registry. As part of
this, the attestation agent exercises composite attestation and transitions
the GPU into `Ready` state (without this, the GPU has to explicitly be
transitioned into `Ready` state by passing the `nvrc.smi.srs=1` kernel
parameter via the shim config, causing NVRC to transition the GPU into the
`Ready` state).
## Deployment Guidance
This guidance assumes you use bare-metal machines with proper support for
Kata's non-TEE and TEE GPU workload deployment scenarios for your Kubernetes
nodes. We provide guidance based on the upstream Kata CI procedures for the
NVIDIA GPU CI validation jobs. Note that this setup:
- uses the guest image pull method to pull container image layers
- uses the genpolicy tool to attach Kata agent security policies to the pod
manifest
- has dedicated (composite) attestation tests, a CUDA vectorAdd test, and a
NIM/RA test sample with secure API key release
A similar deployment guide and scenario description can be found in NVIDIA resources
under
[Early Access: NVIDIA GPU Operator with Confidential Containers based on Kata](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/confidential-containers.html).
### Requirements
The requirements for the TEE scenario are:
- Ubuntu 25.10 as host OS
- CPU with AMD SEV-SNP support with proper BIOS/UEFI version and settings
- CC-capable Hopper/Blackwell GPU with proper VBIOS version.
BIOS and VBIOS configuration is out of scope for this guide. Other resources,
such as the documentation found on the
[NVIDIA Trusted Computing Solutions](https://docs.nvidia.com/nvtrust/index.html)
page and the above linked NVIDIA documentation, provide guidance on
selecting proper hardware and on properly configuring its firmware and OS.
### Installation
#### Containerd and Kubernetes
First, set up your Kubernetes cluster. For instance, in Kata CI, our NVIDIA
jobs use a single-node vanilla Kubernetes cluster with a 2.x containerd
version and Kata's current supported Kubernetes version. We set this cluster
up using the `deploy_k8s` function from `tests/integration/kubernetes/gha-run.sh`
as follows:
```bash
$ export KUBERNETES="vanilla"
$ export CONTAINER_ENGINE="containerd"
$ export CONTAINER_ENGINE_VERSION="v2.1"
$ source tests/gha-run-k8s-common.sh
$ deploy_k8s
```
> **Note:**
>
> We recommend configuring your Kubelet with a `runtimeRequestTimeout`
> value higher than the two-minute default.
> Using the guest-pull mechanism, pulling large images may take a significant
> amount of time and may delay container start, possibly leading your Kubelet
> to de-allocate your pod before it transitions from the *container created*
> to the *container running* state.
> **Note:**
>
> The NVIDIA GPU runtime classes use VFIO cold-plug which, as
> described above, requires the Kata runtime to query Kubelet's Pod Resources
> API to discover allocated GPU devices during sandbox creation. For
> Kubernetes versions **older than 1.34**, you must explicitly enable the
> `KubeletPodResourcesGet` feature gate in your Kubelet configuration. For
> Kubernetes 1.34 and later, this feature is enabled by default.
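Both notes can be addressed in your `KubeletConfiguration`; a minimal sketch (values are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# give slow guest pulls more headroom than the two-minute default
runtimeRequestTimeout: "15m"
featureGates:
  # only required for Kubernetes versions older than 1.34
  KubeletPodResourcesGet: true
```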
#### GPU Operator
Assuming you have the helm tools installed, deploy the latest version of the
GPU Operator as a helm chart (minimum version: `v25.10.0`):
```bash
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set sandboxWorkloads.enabled=true \
--set sandboxWorkloads.defaultWorkload=vm-passthrough \
--set kataManager.enabled=true \
--set kataManager.config.runtimeClasses=null \
--set kataManager.repository=nvcr.io/nvidia/cloud-native \
--set kataManager.image=k8s-kata-manager \
--set kataManager.version=v0.2.4 \
--set ccManager.enabled=true \
--set ccManager.defaultMode=on \
--set ccManager.repository=nvcr.io/nvidia/cloud-native \
--set ccManager.image=k8s-cc-manager \
--set ccManager.version=v0.2.0 \
--set sandboxDevicePlugin.repository=nvcr.io/nvidia/cloud-native \
--set sandboxDevicePlugin.image=nvidia-sandbox-device-plugin \
--set sandboxDevicePlugin.version=v0.0.1 \
--set 'sandboxDevicePlugin.env[0].name=P_GPU_ALIAS' \
--set 'sandboxDevicePlugin.env[0].value=pgpu' \
--set nfd.enabled=true \
--set nfd.nodefeaturerules=true
```
> **Note:**
>
> For heterogeneous clusters with different GPU types, you can omit
> the `P_GPU_ALIAS` environment variable lines. This will cause the sandbox
> device plugin to create GPU model-specific resource types (e.g.,
> `nvidia.com/GH100_H100L_94GB`) instead of the generic `nvidia.com/pgpu`,
> which in turn can be used by pods through respective resource limits.
> For simplicity, this guide uses the generic alias.
> **Note:**
>
> Using `--set sandboxWorkloads.defaultWorkload=vm-passthrough` causes all
> your nodes to be labeled for GPU VM passthrough. Remove this parameter if
> you intend to only use selected nodes for this scenario, and label these
> nodes by hand, using:
> `kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough`.
#### Kata Containers
Install the latest Kata Containers helm chart, similar to
[existing documentation](https://github.com/kata-containers/kata-containers/blob/main/tools/packaging/kata-deploy/helm-chart/README.md)
(minimum version: `3.24.0`).
```bash
$ export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
$ export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"
$ helm install kata-deploy \
--namespace kata-system \
--create-namespace \
-f "https://raw.githubusercontent.com/kata-containers/kata-containers/refs/tags/${VERSION}/tools/packaging/kata-deploy/helm-chart/kata-deploy/try-kata-nvidia-gpu.values.yaml" \
--set nfd.enabled=false \
--set shims.qemu-nvidia-gpu-tdx.enabled=false \
--wait --timeout 10m --atomic \
"${CHART}" --version "${VERSION}"
```
#### Trustee's KBS for remote attestation
For our Kata CI runners we use Trustee's KBS for composite attestation for
secure key release, for instance, for test scenarios which use authenticated
container images. In such scenarios, the credentials to access the
authenticated container registry are only released to the confidential guest
after successful attestation. Please see the section below for more
information about this.
```bash
$ export NVIDIA_VERIFIER_MODE="remote"
$ export KBS_INGRESS="nodeport"
$ bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs
$ bash tests/integration/kubernetes/gha-run.sh install-kbs-client
```
Please note that Trustee can also be deployed via any other upstream
mechanism as documented by the
[confidential-containers repository](https://github.com/confidential-containers/trustee).
For our architecture it is important to set up KBS in the remote verifier
mode which requires entering a licensing agreement with NVIDIA, see the
[notes in confidential-containers repository](https://github.com/confidential-containers/trustee/blob/main/deps/verifier/src/nvidia/README.md).
### Cluster validation and preparation
If you did not use the `sandboxWorkloads.defaultWorkload=vm-passthrough`
parameter during GPU operator deployment, label your nodes for GPU VM
passthrough. For example, to use all nodes for GPU passthrough, run:
```bash
$ kubectl label nodes --all nvidia.com/gpu.workload.config=vm-passthrough --overwrite
```
Check if the `nvidia-cc-manager` pod is running if you intend to run GPU TEE
scenarios. If not, you need to manually label the node as CC capable. Current
GPU Operator node feature rules do not yet recognize all CC capable GPU PCI
IDs. Run the following command:
```bash
$ kubectl label nodes --all nvidia.com/cc.capable=true
```
After this, ensure the `nvidia-cc-manager` pod is running. With the suggested
parameters for GPU Operator deployment, the `nvidia-cc-manager` will
automatically transition the GPU into CC mode.
After deployment, you can transition your node(s) to the desired CC state,
using either the `on` or `off` value, depending on your scenario. For the
non-CC scenario, transition to the `off` state via:
`kubectl label nodes --all nvidia.com/cc.mode=off` and wait until all pods
are back running. When an actual change is exercised, various GPU operator
operands will be restarted.
Ensure all pods are running:
```bash
$ kubectl get pods -A
```
On your node(s), ensure correct driver binding. Your GPU device should be
bound to the VFIO driver, i.e., showing `Kernel driver in use: vfio-pci`
when running:
```bash
$ lspci -nnk -d 10de:
```
### Run the CUDA vectorAdd sample
Create the following file:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: cuda-vectoradd-kata
namespace: default
annotations:
io.katacontainers.config.hypervisor.kernel_params: "nvrc.smi.srs=1"
spec:
runtimeClassName: ${GPU_RUNTIME_CLASS_NAME}
restartPolicy: Never
containers:
- name: cuda-vectoradd
image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
resources:
limits:
nvidia.com/pgpu: "1"
memory: 16Gi
```
Depending on your scenario and on the CC state, select your desired runtime
class name by defining the environment variable:
```bash
$ export GPU_RUNTIME_CLASS_NAME="kata-qemu-nvidia-gpu-snp"
```
Then, deploy the sample Kubernetes pod manifest and observe the pod logs:
```bash
$ envsubst < ./cuda-vectoradd-kata.yaml.in | kubectl apply -f -
$ kubectl wait --for=condition=Ready pod/cuda-vectoradd-kata --timeout=60s
$ kubectl logs -n default cuda-vectoradd-kata
```
Expect the following output:
```
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
To stop the pod, run: `kubectl delete pod cuda-vectoradd-kata`.
### Next steps
#### Transition between CC and non-CC mode
Use the previously described node labeling approach to transition between
CC and non-CC mode. In the non-CC mode, you can use the
`kata-qemu-nvidia-gpu` value for the `GPU_RUNTIME_CLASS_NAME` variable in the
above CUDA vectorAdd sample. The `kata-qemu-nvidia-gpu-snp` runtime class
will **NOT** work in this mode - and vice versa.
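For example, for the non-CC scenario:
```bash
# Leave CC mode and select the non-TEE runtime class for the sample pod.
$ kubectl label nodes --all nvidia.com/cc.mode=off --overwrite
$ export GPU_RUNTIME_CLASS_NAME="kata-qemu-nvidia-gpu"
```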
#### Run Kata CI tests locally
Upstream Kata CI runs the CUDA vectorAdd test, a composite attestation test,
and a basic NIM/RAG deployment. Running CI tests for the TEE GPU scenario
requires KBS to be deployed (except for the CUDA vectorAdd test). The best
place to get started running these tests locally is to look into our
[NVIDIA CI workflow manifest](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/run-k8s-tests-on-nvidia-gpu.yaml)
and into the underlying
[run_kubernetes_nv_tests.sh](https://github.com/kata-containers/kata-containers/blob/main/tests/integration/kubernetes/run_kubernetes_nv_tests.sh)
script. For example, to run the CUDA vectorAdd scenario against the TEE GPU
runtime class use the following commands:
```bash
# create the kata runtime class the test framework uses
$ export KATA_HYPERVISOR=qemu-nvidia-gpu-snp
$ kubectl delete runtimeclass kata --ignore-not-found
$ kubectl get runtimeclass "kata-${KATA_HYPERVISOR}" -o json | \
jq '.metadata.name = "kata" | del(.metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp)' | \
kubectl apply -f -
$ cd tests/integration/kubernetes
$ K8S_TEST_NV="k8s-nvidia-cuda.bats" ./gha-run.sh run-nv-tests
```
> **Note:**
>
> The other scenarios require an NGC API key to run, i.e., you need to export
> the `NGC_API_KEY` variable with a valid NGC API key.
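For example, a sketch of running the NIM scenario (assuming KBS is already
deployed and a valid key is at hand):
```bash
# The key value is a placeholder; use your own NGC API key.
$ export NGC_API_KEY="<your-ngc-api-key>"
$ K8S_TEST_NV="k8s-nvidia-nim.bats" ./gha-run.sh run-nv-tests
```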
#### Deploy pods using attestation
Attestation is a fundamental piece of the confidential containers solution.
In our upstream CI we exercise attestation in two ways: through the
authenticated container image pull mechanism, where container images reside
in the authenticated NVCR registry (`k8s-nvidia-nim.bats`), and by requesting
secrets from KBS (`k8s-confidential-attestation.bats`). KBS will
release the image pull secret to a confidential guest. To get the
authentication credentials from inside the guest, KBS must already be
deployed and configured. In our CI samples, we configure KBS with the guest
image pull secret and a resource policy, and launch the pod with certain
kernel command line parameters:
`"agent.image_registry_auth=kbs:///default/credentials/nvcr agent.aa_kbc_params=cc_kbc::${CC_KBS_ADDR}"`.
The `agent.aa_kbc_params` option is a general configuration for attestation.
For your use case, you need to set the IP address and port under which KBS
is reachable through the `CC_KBS_ADDR` variable (see our CI sample). This
tells the guest how to reach KBS. Something like this must be set whenever
attestation is used, but on its own this parameter does not trigger
attestation. The `agent.image_registry_auth` option tells the guest to ask
for a resource from KBS and use it as the authentication configuration. When
this is set, the guest will request this resource at boot (and trigger
attestation) regardless of which image is being pulled.
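As a minimal sketch, assuming KBS is exposed at `10.0.0.10:8080` (a
placeholder address), the kernel parameters value can be composed like this:
```bash
# Placeholder address; point this at your own KBS deployment.
$ export CC_KBS_ADDR="http://10.0.0.10:8080"
$ echo "agent.image_registry_auth=kbs:///default/credentials/nvcr agent.aa_kbc_params=cc_kbc::${CC_KBS_ADDR}"
```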
To deploy your own pods using authenticated container images, or secure key
release for attestation, follow steps similar to the CI samples mentioned
above.
#### Deploy pods with Kata agent security policies
With GPU passthrough being supported by the
[genpolicy tool](https://github.com/kata-containers/kata-containers/tree/main/src/tools/genpolicy),
you can use the tool to create a Kata agent security policy. Our CI deploys
all sample pod manifests with a Kata agent security policy.
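A minimal sketch, assuming a locally built `genpolicy` binary is on your
`PATH`; it injects the generated policy into the manifest as the
`io.katacontainers.config.agent.policy` annotation:
```bash
# Add an agent security policy to the sample manifest, then deploy it.
$ genpolicy -y ./cuda-vectoradd-kata.yaml
$ kubectl apply -f ./cuda-vectoradd-kata.yaml
```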
#### Deploy pods using your own containers and manifests
You can author pod manifests leveraging your own containers, for instance,
containers built using the CUDA container toolkit. We recommend starting
from a CUDA base container.
The GPU is transitioned into the `Ready` state via attestation, for instance,
when pulling authenticated images. If your deployment scenario does not use
attestation, please refer back to the CUDA vectorAdd pod manifest. In this
manifest, we ensure that NVRC sets the GPU to the `Ready` state by adding the
following annotation:
`io.katacontainers.config.hypervisor.kernel_params: "nvrc.smi.srs=1"`
> **Notes:**
>
> - musl-based container images (e.g., using Alpine) or distro-less
> containers are not supported.
> - for the TEE scenario, only single-GPU passthrough per pod is supported,
> so your pod resource limit must be: `nvidia.com/pgpu: "1"` (on a system
> with multiple GPUs, you can thus pass through one GPU per pod).


@@ -1,10 +1,25 @@
# Using NVIDIA GPU device with Kata Containers
This page gives an overview of the different modes in which GPUs can be passed
to a Kata Containers container, provides host system requirements, explains how
Kata Containers guest components can be built to support the NVIDIA GPU
scenario, and gives practical usage examples using `ctr`.
Please see the guide
[Enabling NVIDIA GPU workloads using GPU passthrough with Kata Containers](NVIDIA-GPU-passthrough-and-Kata-QEMU.md)
for documentation of an end-to-end reference implementation of a Kata
Containers stack for GPU passthrough using QEMU, the go-based Kata Runtime,
and an NVIDIA-specific root filesystem. This reference implementation is built
and validated in Kata's CI, and it can be used to test GPU workloads with Kata
components and Kubernetes out of the box.
## Comparison between Passthrough and vGPU Modes
An NVIDIA GPU device can be passed to a Kata Containers container using GPU
passthrough (NVIDIA GPU passthrough mode) as well as GPU mediated passthrough
(NVIDIA `vGPU` mode).
In NVIDIA GPU passthrough mode, an entire physical GPU is directly assigned to one
VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
is accessed exclusively by the NVIDIA driver running in the VM to which it is
assigned. The GPU is not shared among VMs.
@@ -20,18 +35,20 @@ with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
| Technology | Description | Behavior | Detail |
| --- | --- | --- | --- |
| NVIDIA GPU passthrough mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
## Host Requirements
### Hardware
NVIDIA GPUs recommended for virtualization:
- NVIDIA Tesla (T4, M10, P6, V100 or newer)
- NVIDIA Quadro RTX 6000/8000
### Firmware
Some hardware requires a larger PCI BARs window, for example, NVIDIA Tesla P100,
K40m
@@ -55,9 +72,7 @@ Some hardware vendors use a different name in BIOS, such as:
If one is using a GPU based on the Ampere architecture or later, SR-IOV
additionally needs to be enabled for the `vGPU` use case.
The following steps outline the workflow for using an NVIDIA GPU with Kata.
### Kernel
The following configurations need to be enabled on your host kernel:
@@ -70,7 +85,13 @@ The following configurations need to be enabled on your host kernel:
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
line.
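A quick sanity check on the running host:
```sh
# The kernel command line should contain the IOMMU setting from above.
$ grep -o 'intel_iommu=on' /proc/cmdline
intel_iommu=on
```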
## Build the Kata Components
This section explains how to build an environment with Kata Containers bits
supporting the GPU scenario. We first deploy and configure the regular Kata
components, then describe how to build the guest kernel and root filesystem.
### Install and configure Kata Containers
To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
version 1.3.0 or above. Follow the [Kata Containers setup
@@ -101,7 +122,7 @@ hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
### Build guest kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU
support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
@@ -160,11 +181,11 @@ code, using `Dragonball VMM` for NVIDIA GPU `hot-plug/hot-unplug` requires apply
addition to the above kernel configuration items. Follow these steps to build for NVIDIA GPU `hot-[un]plug`
for `Dragonball`:
```sh
# Prepare .config to support both upcall and nvidia gpu
$ ./build-kernel.sh -v 5.10.25 -e -t dragonball -g nvidia -f setup
# Build guest kernel to support both upcall and nvidia gpu
$ ./build-kernel.sh -v 5.10.25 -e -t dragonball -g nvidia build
# Install guest kernel to support both upcall and nvidia gpu
@@ -196,303 +217,7 @@ Before using the new guest kernel, please update the `kernel` parameters in
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
### Build Guest OS with NVIDIA Driver and Toolkit
Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
rootfs base image for a distribution of your choice. This is going to be used as
@@ -583,9 +308,12 @@ Enable the `guest_hook_path` in Kata's `configuration.toml`
guest_hook_path = "/usr/share/oci/hooks"
```
With the NVIDIA rootfs and kernel built, we can now run any GPU container
without installing the drivers into the container. Check the NVIDIA device
status with `nvidia-smi`:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
@@ -611,8 +339,309 @@ Fri Mar 18 10:36:59 2022
+-----------------------------------------------------------------------------+
```
As the last step one can remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.
## Usage Examples with Kata Containers
The following sections give usage examples based on the different modes.
### NVIDIA GPU passthrough mode
Use the following steps to pass an NVIDIA GPU device in passthrough mode with Kata:
1. Find the Bus-Device-Function (BDF) for the GPU device on the host:
```sh
$ sudo lspci -nn -D | grep -i nvidia
0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
```
> PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
> `10de:20b9` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```sh
$ BDF="0000:d0:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
/sys/kernel/iommu_groups/192
```
The previous output shows that the GPU belongs to IOMMU group `192`. The next
step is to bind the GPU to the VFIO-PCI driver.
```sh
$ BDF="0000:d0:00.0"
$ DEV="/sys/bus/pci/devices/$BDF"
$ echo "vfio-pci" > $DEV/driver_override
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
# To return the device to the standard driver, we simply clear the
# driver_override and reprobe the device, e.g.:
$ echo > $DEV/driver_override
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
```
3. Check the IOMMU group number under `/dev/vfio`:
```sh
$ ls -l /dev/vfio
total 0
crw------- 1 zvonkok zvonkok 243, 0 Mar 18 03:06 192
crw-rw-rw- 1 root root 10, 196 Mar 18 02:27 vfio
```
4. Start a Kata container with the GPU device:
```sh
# You may need to `modprobe vhost-vsock` if you get
# host system doesn't support vsock: stat /dev/vhost-vsock
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the `lspci` output.
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
```
6. Additionally, you can check the PCI BARs space of the NVIDIA GPU device in the container:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
```
> **Note**: If the `lspci` output lists `Region` entries for the device, the
> BAR space of the NVIDIA GPU has been successfully allocated.
### NVIDIA vGPU mode
NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM. NVIDIA vGPU manager
needs to be installed on the host to configure GPUs in vGPU mode. See
[NVIDIA Virtual GPU Software Documentation v14.0 through 14.1](https://docs.nvidia.com/grid/14.0/)
for more details.
#### NVIDIA vGPU time-sliced
In the time-sliced mode, the GPU is not partitioned and the workload uses the
whole GPU and shares access to the GPU engines. Processes are scheduled in
series. The best-effort scheduler is the default and can be exchanged for
other scheduling policies; see the documentation above for how to do that.
Beware: if you had `MIG` enabled before, disable `MIG` on the GPU if you want
to use `time-sliced` `vGPU`.
```sh
$ sudo nvidia-smi -mig 0
```
Enable the virtual functions for the physical GPU in the `sysfs` file system.
```sh
$ sudo /usr/lib/nvidia/sriov-manage -e 0000:41:00.0
```
Get the `BDF`s of the available virtual functions on the GPU, and choose one
for the following steps.
```sh
$ cd /sys/bus/pci/devices/0000:41:00.0/
$ ls -l | grep virtfn
```
##### List all available vGPU instances
The following shell snippet will walk the `sysfs` and only print the
instances that are available, i.e., that can be created.
```sh
# The 00.0 is often the PF of the device. The VFs will have the function in the
# BDF incremented by some values so e.g. the very first VF is 0000:41:00.4
cd /sys/bus/pci/devices/0000:41:00.0/
for vf in $(ls -d virtfn*)
do
    BDF=$(basename $(readlink -f $vf))
    for md in $(ls -d $vf/mdev_supported_types/*)
    do
        AVAIL=$(cat $md/available_instances)
        NAME=$(cat $md/name)
        DIR=$(basename $md)
        if [ $AVAIL -gt 0 ]; then
            echo "| BDF | INSTANCES | NAME | DIR |"
            echo "+--------------+-----------+----------------+------------+"
            printf "| %12s |%10d |%15s | %10s |\n\n" "$BDF" "$AVAIL" "$NAME" "$DIR"
        fi
    done
done
```
If there are available instances, you get something like this (for the first
VF). Beware that the output is highly dependent on the GPU you have; if there
is no output, check again that `MIG` is really disabled.
```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-4C | nvidia-692 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-8C | nvidia-693 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-10C | nvidia-694 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-16C | nvidia-695 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-20C | nvidia-696 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-40C | nvidia-697 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-80C | nvidia-698 |
```
Change to the `mdev_supported_types` directory for the virtual function on which
you want to create the `vGPU`. Taking the first output as an example:
```sh
$ cd virtfn0/mdev_supported_types/nvidia-692
$ UUIDGEN=$(uuidgen)
$ sudo bash -c "echo $UUIDGEN > create"
```
Confirm that the `vGPU` was created. You should see the `UUID` pointing to a
subdirectory of the `sysfs` space.
```sh
$ ls -l /sys/bus/mdev/devices/
```
Get the `IOMMU` group number and verify there is a `VFIO` device created to use
with Kata.
```sh
$ ls -l /sys/bus/mdev/devices/*/
$ ls -l /dev/vfio
```
Use the `VFIO` device created in the same way as in the passthrough use-case.
Beware that the guest needs the NVIDIA guest drivers, so one would need to build
a new guest `OS` image.
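For instance, if the new `mdev` device ended up in IOMMU group 193 (an
assumed number; check `/dev/vfio` on your host), the invocation mirrors the
passthrough case:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/193 --rm -t "docker.io/library/archlinux:latest" arch uname -r
```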
#### NVIDIA vGPU MIG-backed
We're not going into detail about what `MIG` is, but briefly, it is a
technology to partition the hardware into independent instances with
guaranteed quality of service. For more details see
[NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
First enable `MIG` mode for a GPU. Depending on the platform you're running,
a reboot might be necessary; some platforms support a GPU reset.
```sh
$ sudo nvidia-smi -mig 1
```
If the platform supports a GPU reset, one can run the following command;
otherwise you will get a warning to reboot the server.
```sh
$ sudo nvidia-smi --gpu-reset
```
The driver by default provides a number of profiles that users can opt into
when configuring the MIG feature.
```sh
$ sudo nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles: |
| GPU Name ID Instances Memory P2P SM DEC ENC |
| Free/Total GiB CE JPEG OFA |
|=============================================================================|
| 0 MIG 1g.10gb 19 7/7 9.50 No 14 0 0 |
| 1 0 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 1g.10gb+me 20 1/1 9.50 No 14 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 0 MIG 2g.20gb 14 3/3 19.50 No 28 1 0 |
| 2 0 0 |
+-----------------------------------------------------------------------------+
...
```
Create the GPU instances that correspond to the `vGPU` types of the
`MIG-backed` `vGPUs` that you will create; see
[NVIDIA A100 PCIe 80GB Virtual GPU Types](https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-a100-pcie-80gb).
```sh
# MIG 1g.10gb --> vGPU A100D-1-10C
$ sudo nvidia-smi mig -cgi 19
```
List the GPU instances and get the GPU instance id to create the compute
instance.
```sh
$ sudo nvidia-smi mig -lgi # list the created GPU instances
$ sudo nvidia-smi mig -cci -gi 9 # each GPU instance can have several compute
# instances. Instance -> Workload
```
Verify that the compute instances were created within the GPU instance:
```sh
$ nvidia-smi
... snip ...
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 9 0 0 | 0MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 4095MiB | | |
+------------------+----------------------+-----------+-----------------------+
... snip ...
```
We can use the [snippet](#list-all-available-vgpu-instances) from before to list
the available `vGPU` instances, this time `MIG-backed`.
```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 |GRID A100D-1-10C | nvidia-699 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.5 | 1 |GRID A100D-1-10C | nvidia-699 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:01.6 | 1 |GRID A100D-1-10C | nvidia-699 |
... snip ...
```
Repeat the steps after the [snippet](#list-all-available-vgpu-instances) listing
to create the corresponding `mdev` device and use the guest `OS` created in the
previous section with `time-sliced` `vGPUs`.
## References


@@ -1,24 +1,20 @@
# Table of Contents
**Note:** This guide used to contain an end-to-end flow to build a
custom Kata Containers root filesystem with the QAT out-of-tree SR-IOV virtual
function driver and run QAT-enabled containers. That flow is no longer
necessary, so the instructions were dropped. If the use case is still of
interest, please file an issue in either of the QAT Kubernetes-specific repos
linked below.
# Introduction
Intel® QuickAssist Technology (QAT) provides hardware acceleration
for security (cryptography) and compression. Kata Containers can enable
these acceleration functions for containers using QAT SR-IOV with the
support from [Intel QAT Device Plugin for Kubernetes](https://github.com/intel/intel-device-plugins-for-kubernetes)
or [Intel QAT DRA Resource Driver for Kubernetes](https://github.com/intel/intel-resource-drivers-for-kubernetes).
## More Information
[Intel® QuickAssist Technology at `01.org`](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html)
@@ -26,554 +22,6 @@ kernel and custom Kata Containers rootfs.
[Intel Device Plugin for Kubernetes](https://github.com/intel/intel-device-plugins-for-kubernetes)
[Intel DRA Resource Driver for Kubernetes](https://github.com/intel/intel-resource-drivers-for-kubernetes)
[Intel® QuickAssist Technology for Crypto Poll Mode Driver](https://dpdk-docs.readthedocs.io/en/latest/cryptodevs/qat.html)
## Steps to enable Intel® QAT in Kata Containers
There are some steps to complete only once, some steps to complete with every
reboot, and some steps to complete when the host kernel changes.
## Script variables
The following list of variables must be set before running through the
scripts. These variables refer to locations to store modules and configuration
files on the host and links to the drivers to use. Modify these as
needed to point to updated drivers or different install locations.
### Set environment variables (Every Reboot)
Make sure to check [`01.org`](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html) for
the latest driver.
```bash
$ export QAT_DRIVER_VER=qat1.7.l.4.14.0-00031.tar.gz
$ export QAT_DRIVER_URL=https://downloadmirror.intel.com/30178/eng/${QAT_DRIVER_VER}
$ export QAT_CONF_LOCATION=~/QAT_conf
$ export QAT_DOCKERFILE=https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/main/demo/openssl-qat-engine/Dockerfile
$ export QAT_SRC=~/src/QAT
$ export GOPATH=~/src/go
$ export KATA_KERNEL_LOCATION=~/kata
$ export KATA_ROOTFS_LOCATION=~/kata
```
## Prepare the Ubuntu Host
The host could be a bare metal instance or a virtual machine. If using a
virtual machine, make sure that KVM nesting is enabled. The following
instructions reference an Intel® C62X chipset. Some of the instructions must be
modified if using a different Intel® QAT device. The Intel® QAT chipset can be
identified by executing the following.
### Identify which PCI Bus the Intel® QAT card is on
```bash
$ for i in 0434 0435 37c8 1f18 1f19; do lspci -d 8086:$i; done
```
### Install necessary packages for Ubuntu
These packages are necessary to compile the Kata kernel, the Intel® QAT
driver, and to prepare the rootfs for Kata.
[Docker](https://docs.docker.com/engine/install/ubuntu/)
also needs to be installed to be able to build the rootfs. To test that
everything works, a Kubernetes pod is started requesting Intel® QAT resources.
For passthrough of the virtual functions, the kernel boot parameters need to
include `intel_iommu=on`.
```bash
$ sudo apt update
$ sudo apt install -y golang-go build-essential python pkg-config zlib1g-dev libudev-dev bison libelf-dev flex libtool automake autotools-dev autoconf bc libpixman-1-dev coreutils libssl-dev
$ sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT=""/GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"/' /etc/default/grub
$ sudo update-grub
$ sudo reboot
```
### Download Intel® QAT drivers
This will download the [Intel® QAT drivers](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html).
Make sure to check the website for the latest version.
```bash
$ mkdir -p $QAT_SRC
$ cd $QAT_SRC
$ curl -L $QAT_DRIVER_URL | tar zx
```
### Copy Intel® QAT configuration files and enable virtual functions
Modify the instructions below as necessary if using a different Intel® QAT
hardware platform. You can learn more about customizing configuration files
at the
[Intel® QAT Engine repository](https://github.com/intel/QAT_Engine/#copy-the-correct-intel-quickassist-technology-driver-config-files).
This section starts from a base config file and changes the `SSL` section to
`SHIM` to support the OpenSSL engine. There are more tweaks that you can make
depending on the use case and how many Intel® QAT engines should be run. You
can find more information about how to customize in the
[Intel® QuickAssist Technology Software for Linux* - Programmer's Guide.](https://www.intel.com/content/www/us/en/content-details/709196/intel-quickassist-technology-api-programmer-s-guide.html)
> **Note: This section assumes that an Intel® QAT `c6xx` platform is used.**
```bash
$ mkdir -p $QAT_CONF_LOCATION
$ cp $QAT_SRC/quickassist/utilities/adf_ctl/conf_files/c6xxvf_dev0.conf.vm $QAT_CONF_LOCATION/c6xxvf_dev0.conf
$ sed -i 's/\[SSL\]/\[SHIM\]/g' $QAT_CONF_LOCATION/c6xxvf_dev0.conf
```
### Expose and Bind Intel® QAT virtual functions to VFIO-PCI (Every reboot)
To enable virtual functions, the host OS should have IOMMU groups enabled. In
the UEFI firmware, Intel® Virtualization Technology for Directed I/O
(Intel® VT-d) must be enabled. Also, the kernel boot parameters should include
`intel_iommu=on` or `intel_iommu=igfx_off`. This should have been set by
the instructions above; check the output of `/proc/cmdline` to confirm. The
following commands assume you installed an Intel® QAT card, IOMMU is on, and
VT-d is enabled. The vendor and device ID are added to the `VFIO-PCI` driver
so that each exposed virtual function can be bound to it. Once
complete, each virtual function is passed into a Kata Containers container
using the PCIe device passthrough feature. For Kubernetes, the
[Intel device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes)
for Kubernetes handles the binding of the driver, but the VFs still must be
enabled.
```bash
$ sudo modprobe vfio-pci
$ QAT_PCI_BUS_PF_NUMBERS=$( (lspci -d :435 && lspci -d :37c8 && lspci -d :19e2 && lspci -d :6f54) | cut -d ' ' -f 1)
$ QAT_PCI_BUS_PF_1=$(echo $QAT_PCI_BUS_PF_NUMBERS | cut -d ' ' -f 1)
$ echo 16 | sudo tee /sys/bus/pci/devices/0000:$QAT_PCI_BUS_PF_1/sriov_numvfs
$ QAT_PCI_ID_VF=$(cat /sys/bus/pci/devices/0000:${QAT_PCI_BUS_PF_1}/virtfn0/uevent | grep PCI_ID)
$ QAT_VENDOR_AND_ID_VF=$(echo ${QAT_PCI_ID_VF/PCI_ID=} | sed 's/:/ /')
$ echo $QAT_VENDOR_AND_ID_VF | sudo tee --append /sys/bus/pci/drivers/vfio-pci/new_id
```
Loop through all the virtual functions and bind them to the VFIO driver:
```bash
$ for f in /sys/bus/pci/devices/0000:$QAT_PCI_BUS_PF_1/virtfn*
do QAT_PCI_BUS_VF=$(basename $(readlink $f))
echo $QAT_PCI_BUS_VF | sudo tee --append /sys/bus/pci/drivers/c6xxvf/unbind
echo $QAT_PCI_BUS_VF | sudo tee --append /sys/bus/pci/drivers/vfio-pci/bind
done
```
### Check Intel® QAT virtual functions are enabled
If the following command returns empty, then the virtual functions are not
properly enabled. This command checks the enumerated device IDs for just the
virtual functions. Using the Intel® QAT as an example, the physical device ID
is `37c8` and the virtual function device ID is `37c9`. The following command
checks if VFs are enabled for any of the currently known Intel® QAT device
IDs. The subsequent `ls` command should show the 16 VFs bound to `VFIO-PCI`.
```bash
$ for i in 0442 0443 37c9 19e3; do lspci -d 8086:$i; done
```
Another way to check is to see which PCI devices `VFIO-PCI` is mapped to.
They should match the device IDs of the VFs.
```bash
$ ls -la /sys/bus/pci/drivers/vfio-pci
```
## Prepare Kata Containers
### Download Kata kernel Source
This example automatically uses the latest Kata kernel supported by Kata. It
follows the instructions from the
[packaging kernel repository](../../tools/packaging/kernel)
and uses the latest Kata kernel
[config](../../tools/packaging/kernel/configs).
There are some patches that must be installed as well, which the
`build-kernel.sh` script should automatically apply. If you are using a
different kernel version, then you might need to manually apply them. Since
the Kata Containers kernel has a minimal set of kernel flags set, you must
create an Intel® QAT kernel fragment with the necessary `CONFIG_CRYPTO_*` options set.
Update the config to set some of the `CRYPTO` flags to enabled. This might
change with different kernel versions. The following instructions were tested
with kernel `v5.4.0-64-generic`.
```bash
$ mkdir -p $GOPATH
$ cd $GOPATH
$ go get -v github.com/kata-containers/kata-containers
$ cat << EOF > $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/configs/fragments/common/qat.conf
CONFIG_PCIEAER=y
CONFIG_UIO=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_QAT_C62XVF=m
CONFIG_CRYPTO_CBC=y
CONFIG_MODULES=y
CONFIG_MODULE_SIG=y
CONFIG_CRYPTO_AUTHENC=y
CONFIG_CRYPTO_DH=y
EOF
$ $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/build-kernel.sh setup
```
### Build Kata kernel
```bash
$ cd $GOPATH
$ export LINUX_VER=$(ls -d kata-linux-*)
$ sed -i 's/EXTRAVERSION =/EXTRAVERSION = .qat.container/' $LINUX_VER/Makefile
$ $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/build-kernel.sh build
```
### Copy Kata kernel
```bash
$ export KATA_KERNEL_NAME=vmlinux-${LINUX_VER}_qat
$ mkdir -p $KATA_KERNEL_LOCATION
$ cp ${GOPATH}/${LINUX_VER}/vmlinux ${KATA_KERNEL_LOCATION}/${KATA_KERNEL_NAME}
```
### Prepare Kata root filesystem
These instructions build upon the OS builder instructions located in the
[Developer Guide](../Developer-Guide.md). At this point it is recommended that
[Docker](https://docs.docker.com/engine/install/ubuntu/) is installed first, and
then [Kata-deploy](../../tools/packaging/kata-deploy)
is used to install Kata. This will make sure that the correct `agent` version
is installed into the rootfs in the steps below.
The following instructions use Ubuntu as the root filesystem with systemd as
the init and will add in the `kmod` binary, which is not a standard binary in
a Kata rootfs image. The `kmod` binary is necessary to load the Intel® QAT
kernel modules when the virtual machine rootfs boots.
```bash
$ export OSBUILDER=$GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder
$ export ROOTFS_DIR=${OSBUILDER}/rootfs-builder/rootfs
$ export EXTRA_PKGS='kmod'
```
Make sure that the `kata-agent` version matches the installed `kata-runtime`
version. Also make sure the `kata-runtime` install location is in your `PATH`
variable. The following `AGENT_VERSION` can be set manually to match
the `kata-runtime` version if the following commands don't work.
```bash
$ export PATH=$PATH:/opt/kata/bin
$ cd $GOPATH
$ export AGENT_VERSION=$(kata-runtime version | head -n 1 | grep -o "[0-9.]\+")
$ cd ${OSBUILDER}/rootfs-builder
$ sudo rm -rf ${ROOTFS_DIR}
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh ubuntu'
```
### Compile Intel® QAT drivers for Kata Containers kernel and add to Kata Containers rootfs
After the Kata Containers kernel builds with the proper configuration flags,
you must build the Intel® QAT drivers against that Kata Containers kernel
version in a similar way they were previously built for the host OS. You must
set the `KERNEL_SOURCE_ROOT` variable to the Kata Containers kernel source
directory and build the Intel® QAT drivers again. The `make` command will
install the Intel® QAT modules into the Kata rootfs.
```bash
$ cd $GOPATH
$ export LINUX_VER=$(ls -d kata*)
$ export KERNEL_MAJOR_VERSION=$(awk '/^VERSION =/{print $NF}' $GOPATH/$LINUX_VER/Makefile)
$ export KERNEL_PATHLEVEL=$(awk '/^PATCHLEVEL =/{print $NF}' $GOPATH/$LINUX_VER/Makefile)
$ export KERNEL_SUBLEVEL=$(awk '/^SUBLEVEL =/{print $NF}' $GOPATH/$LINUX_VER/Makefile)
$ export KERNEL_EXTRAVERSION=$(awk '/^EXTRAVERSION =/{print $NF}' $GOPATH/$LINUX_VER/Makefile)
$ export KERNEL_ROOTFS_DIR=${KERNEL_MAJOR_VERSION}.${KERNEL_PATHLEVEL}.${KERNEL_SUBLEVEL}${KERNEL_EXTRAVERSION}
$ cd $QAT_SRC
$ KERNEL_SOURCE_ROOT=$GOPATH/$LINUX_VER ./configure --enable-icp-sriov=guest
$ sudo -E make all -j $(nproc)
$ sudo -E make INSTALL_MOD_PATH=$ROOTFS_DIR qat-driver-install -j $(nproc)
```
The `usdm_drv` module also needs to be copied into the rootfs modules path and
`depmod` should be run.
```bash
$ sudo cp $QAT_SRC/build/usdm_drv.ko $ROOTFS_DIR/lib/modules/${KERNEL_ROOTFS_DIR}/updates/drivers
$ sudo depmod -a -b ${ROOTFS_DIR} ${KERNEL_ROOTFS_DIR}
$ cd ${OSBUILDER}/image-builder
$ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
```
> **Note: Ignore any errors on modules.builtin and modules.order when running
> `depmod`.**
### Copy Kata rootfs
```bash
$ mkdir -p $KATA_ROOTFS_LOCATION
$ cp ${OSBUILDER}/image-builder/kata-containers.img $KATA_ROOTFS_LOCATION
```
## Verify Intel® QAT works in a container
The following instructions use an OpenSSL Dockerfile that builds the
Intel® QAT engine to allow OpenSSL to offload crypto functions. It is a
convenient way to test that VFIO device passthrough for the Intel® QAT VFs is
working properly with the Kata Containers VM.
### Build OpenSSL Intel® QAT engine container
Use the OpenSSL Intel® QAT [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/demo/openssl-qat-engine)
to build a container image with an optimized OpenSSL engine for
Intel® QAT. Using `docker build` with the Kata Containers runtime can sometimes
have issues. Therefore, make sure that `runc` is the default Docker container
runtime.
```bash
$ cd $QAT_SRC
$ curl -O $QAT_DOCKERFILE
$ sudo docker build -t openssl-qat-engine .
```
> **Note: The Intel® QAT driver version in this container might not match the
> Intel® QAT driver compiled and loaded on the host.**
### Test Intel® QAT with the ctr tool
The `ctr` tool can be used to interact with the containerd daemon. It may be
more convenient to use this tool to verify the kernel and image instead of
setting up a Kubernetes cluster. The correct Kata runtimes need to be added
to the containerd `config.toml`. Below is a sample snippet that can be added
to allow QEMU and Cloud Hypervisor (CLH) to work with `ctr`.
```
[plugins.cri.containerd.runtimes.kata-qemu]
runtime_type = "io.containerd.kata-qemu.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
[plugins.cri.containerd.runtimes.kata-qemu.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
[plugins.cri.containerd.runtimes.kata-clh]
runtime_type = "io.containerd.kata-clh.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
[plugins.cri.containerd.runtimes.kata-clh.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-clh.toml"
```
In addition, containerd expects the shim binary to be in `/usr/local/bin`, so
add the following small wrapper scripts so that either QEMU or Cloud
Hypervisor can be used with Kata.
```bash
$ echo '#!/usr/bin/env bash' | sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-qemu.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-qemu-v2
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo '#!/usr/bin/env bash' | sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-clh.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-clh-v2
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-clh-v2
```
After the OpenSSL image is built and imported into containerd, an Intel® QAT
virtual function exposed in the step above can be added to the `ctr` command.
Make sure to change the `/dev/vfio` number to one that actually exists on the
host system. When using the `ctr` tool, the `configuration.toml` for Kata needs
to point to the custom Kata kernel and rootfs built above, and the Intel® QAT
modules in the Kata rootfs need to load at boot. The following steps assume
that `kata-deploy` was used to install Kata and QEMU is being tested. If using
a different hypervisor, a different install method for Kata, or a different
Intel® QAT chipset, then the commands will need to be modified.
> **Note: The following was tested with
[containerd v1.4.6](https://github.com/containerd/containerd/releases/tag/v1.4.6).**
```bash
$ config_file="/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
$ sudo sed -i "/kernel =/c kernel = "\"${KATA_ROOTFS_LOCATION}/${KATA_KERNEL_NAME}\""" $config_file
$ sudo sed -i "/image =/c image = "\"${KATA_KERNEL_LOCATION}/kata-containers.img\""" $config_file
$ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 modules-load=usdm_drv,qat_c62xvf"/g' $config_file
$ sudo docker save -o openssl-qat-engine.tar openssl-qat-engine:latest
$ sudo ctr images import openssl-qat-engine.tar
$ sudo ctr run --runtime io.containerd.run.kata-qemu.v2 --privileged -t --rm --device=/dev/vfio/180 --mount type=bind,src=/dev,dst=/dev,options=rbind:rw --mount type=bind,src=${QAT_CONF_LOCATION}/c6xxvf_dev0.conf,dst=/etc/c6xxvf_dev0.conf,options=rbind:rw docker.io/library/openssl-qat-engine:latest bash
```
Below are some commands to run in the container image to verify Intel® QAT is
working:
```sh
root@67561dc2757a/ # cat /proc/modules
qat_c62xvf 16384 - - Live 0xffffffffc00d9000 (OE)
usdm_drv 86016 - - Live 0xffffffffc00e8000 (OE)
intel_qat 249856 - - Live 0xffffffffc009b000 (OE)
root@67561dc2757a/ # adf_ctl restart
Restarting all devices.
Processing /etc/c6xxvf_dev0.conf
root@67561dc2757a/ # adf_ctl status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c6xxvf, inst_id: 0, node_id: 0, bsf: 0000:01:01.0, #accel: 1 #engines: 1 state: up
root@67561dc2757a/ # openssl engine -c -t qat-hw
(qat-hw) Reference implementation of QAT crypto engine v0.6.1
[RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF, HKDF, X25519, X448]
[ available ]
```
### Test Intel® QAT in Kubernetes
Start a Kubernetes cluster with containerd as the CRI. The host should
already be set up with 16 virtual functions of the Intel® QAT card bound to
`VFIO-PCI`. Verify this by looking in `/dev/vfio` for a listing of devices.
You might need to disable Docker before initializing Kubernetes. Be aware
that the OpenSSL container image built above will need to be exported from
Docker and imported into containerd.
If Kata is installed through [`kata-deploy`](../../tools/packaging/kata-deploy/helm-chart/README.md)
there will be multiple `configuration.toml` files associated with different
hypervisors. Rather than add in the custom Kata kernel, Kata rootfs, and
kernel modules to each `configuration.toml` as the default, instead use
[annotations](../how-to/how-to-load-kernel-modules-with-kata.md)
in the Kubernetes YAML file to tell Kata which kernel and rootfs to use. The
easy way to do this is to use `kata-deploy` which will install the Kata binaries
to `/opt` and properly configure the `/etc/containerd/config.toml` with annotation
support. However, the `configuration.toml` needs to enable support for
annotations as well. The following configures both QEMU and Cloud Hypervisor
`configuration.toml` files that are currently available with Kata Containers
versions 2.0 and higher.
```bash
$ sudo sed -i 's/enable_annotations\s=\s\[\]/enable_annotations = [".*"]/' /opt/kata/share/defaults/kata-containers/configuration-qemu.toml
$ sudo sed -i 's/enable_annotations\s=\s\[\]/enable_annotations = [".*"]/' /opt/kata/share/defaults/kata-containers/configuration-clh.toml
```
Export the OpenSSL image from Docker and import into containerd.
```bash
$ sudo docker save -o openssl-qat-engine.tar openssl-qat-engine:latest
$ sudo ctr -n=k8s.io images import openssl-qat-engine.tar
```
The [Intel® QAT Plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/qat_plugin/README.md)
needs to be started so that the virtual functions can be discovered and
used by Kubernetes.
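A sketch of deploying the plugin with its kustomization (the `ref` value is
an assumption; pin it to a released version of the repository):
```bash
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/qat_plugin?ref=main'
$ kubectl get pods -A | grep qat-plugin
```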
The following YAML file can be used to start a Kata container with Intel® QAT
support. If Kata is installed with `kata-deploy`, then the containerd
`configuration.toml` should have all of the Kata runtime classes already
populated and annotations supported. To use an Intel® QAT virtual function, the
Intel® QAT plugin needs to be started after the VFs are bound to `VFIO-PCI` as
described [above](#expose-and-bind-intel-qat-virtual-functions-to-vfio-pci-every-reboot).
Edit the following to point to the correct Kata kernel and rootfs location
built with Intel® QAT support.
```bash
$ cat << EOF > kata-openssl-qat.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kata-openssl-qat
  labels:
    app: kata-openssl-qat
  annotations:
    io.katacontainers.config.hypervisor.kernel: "$KATA_KERNEL_LOCATION/$KATA_KERNEL_NAME"
    io.katacontainers.config.hypervisor.image: "$KATA_ROOTFS_LOCATION/kata-containers.img"
    io.katacontainers.config.hypervisor.kernel_params: "modules-load=usdm_drv,qat_c62xvf"
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: kata-openssl-qat
    image: docker.io/library/openssl-qat-engine:latest
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        qat.intel.com/generic: 1
        cpu: 1
    securityContext:
      capabilities:
        add: ["IPC_LOCK", "SYS_ADMIN"]
    volumeMounts:
    - mountPath: /etc/c6xxvf_dev0.conf
      name: etc-mount
    - mountPath: /dev
      name: dev-mount
  volumes:
  - name: dev-mount
    hostPath:
      path: /dev
  - name: etc-mount
    hostPath:
      path: $QAT_CONF_LOCATION/c6xxvf_dev0.conf
EOF
```
Use `kubectl` to start the pod. Verify that Intel® QAT card acceleration is
working with the Intel® QAT engine.
```bash
$ kubectl apply -f kata-openssl-qat.yaml
```
```sh
$ kubectl exec -it kata-openssl-qat -- adf_ctl restart
Restarting all devices.
Processing /etc/c6xxvf_dev0.conf
$ kubectl exec -it kata-openssl-qat -- adf_ctl status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c6xxvf, inst_id: 0, node_id: 0, bsf: 0000:01:01.0, #accel: 1 #engines: 1 state: up
$ kubectl exec -it kata-openssl-qat -- openssl engine -c -t qat-hw
(qat-hw) Reference implementation of QAT crypto engine v0.6.1
[RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF, HKDF, X25519, X448]
[ available ]
```
### Troubleshooting
* Check that `/dev/vfio` has VFs enabled.
```sh
$ ls /dev/vfio
57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 vfio
```
* Check that the modules load when inside the Kata Container.
```sh
bash-5.0# grep -E "qat|usdm_drv" /proc/modules
qat_c62xvf 16384 - - Live 0x0000000000000000 (O)
usdm_drv 86016 - - Live 0x0000000000000000 (O)
intel_qat 184320 - - Live 0x0000000000000000 (O)
```
* Verify that at least the first `c6xxvf_dev0.conf` file mounts inside the
container image in `/etc`. You will need one configuration file for each VF
passed into the container.
```sh
bash-5.0# ls /etc
c6xxvf_dev0.conf c6xxvf_dev11.conf c6xxvf_dev14.conf c6xxvf_dev3.conf c6xxvf_dev6.conf c6xxvf_dev9.conf resolv.conf
c6xxvf_dev1.conf c6xxvf_dev12.conf c6xxvf_dev15.conf c6xxvf_dev4.conf c6xxvf_dev7.conf hostname
c6xxvf_dev10.conf c6xxvf_dev13.conf c6xxvf_dev2.conf c6xxvf_dev5.conf c6xxvf_dev8.conf hosts
```
* Check `dmesg` inside the container to see if there are any issues with the
Intel® QAT driver.
* If there are issues building the OpenSSL Intel® QAT container image, then
check to make sure that `runc` is the default runtime for building containers.
```sh
$ cat /etc/systemd/system/docker.service.d/50-runtime.conf
[Service]
Environment="DOCKER_DEFAULT_RUNTIME=--default-runtime runc"
```
## Optional Scripts
### Verify Intel® QAT card counters are incremented
To check the built-in firmware counters, the Intel® QAT driver has to be
compiled and installed on the host; you can't rely on the built-in host
driver. The counters increase when the accelerator is actively being used. To
verify Intel® QAT is actively accelerating the containerized application, use
the following instructions to check if any of the counters increment. Make
sure to change the PCI device ID to match what's in the system.
```bash
$ for i in 0434 0435 37c8 1f18 1f19; do lspci -d 8086:$i; done
$ sudo watch cat /sys/kernel/debug/qat_c6xx_0000\:b1\:00.0/fw_counters
$ sudo watch cat /sys/kernel/debug/qat_c6xx_0000\:b3\:00.0/fw_counters
$ sudo watch cat /sys/kernel/debug/qat_c6xx_0000\:b5\:00.0/fw_counters
```


@@ -1,3 +1,3 @@
[toolchain]
# Keep in sync with versions.yaml
channel = "1.85.1"
channel = "1.91"


@@ -1,3 +1,8 @@
# Copyright (c) 2025 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Allow opening any 'source'd file, even if not specified as input
external-sources=true
@@ -9,7 +14,7 @@ enable=check-unassigned-uppercase
# Enforces braces around variable expansions to avoid ambiguity or confusion.
# e.g. ${filename} rather than $filename
enable=require-variable-braces
# Requires double-bracket syntax [[ expr ]] for safer, more consistent tests.
# NO: if [ "$var" = "value" ]

src/agent/Cargo.lock (generated)

@@ -625,9 +625,9 @@ dependencies = [
[[package]]
name = "bytes"
version = "1.10.1"
version = "1.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d71b6127be86fdcfddb610f7182ac57211d4b18a3e9c82eb2d17662f2227ad6a"
checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33"
[[package]]
name = "capctl"
@@ -780,9 +780,9 @@ dependencies = [
[[package]]
name = "container-device-interface"
version = "0.1.0"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "653849f0c250f73d9afab4b2a9a6b07adaee1f34c44ffa6f2d2c3f9392002c1a"
checksum = "2605001b0e8214dae8af146a43ccaa965d960403e330f174c21327154530df8b"
dependencies = [
"anyhow",
"clap",
@@ -987,9 +987,9 @@ dependencies = [
[[package]]
name = "deranged"
version = "0.4.0"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c9e6a11ca8224451684bc0d7d5a7adbf8f2fd6887261a1cfc3c0432f9d4068e"
checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
dependencies = [
"powerfmt",
]
@@ -1066,27 +1066,6 @@ dependencies = [
"crypto-common",
]
[[package]]
name = "dirs-next"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b98cf8ebf19c3d1b223e151f99a4f9f0690dca41414773390fc824184ac833e1"
dependencies = [
"cfg-if",
"dirs-sys-next",
]
[[package]]
name = "dirs-sys-next"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4ebda144c4fe02d1f7ea1a7d9641b6fc6b580adcfa024ae48797ecdeb6825b4d"
dependencies = [
"libc",
"redox_users",
"winapi",
]
[[package]]
name = "displaydoc"
version = "0.2.5"
@@ -1207,9 +1186,9 @@ dependencies = [
[[package]]
name = "fancy-regex"
version = "0.14.0"
version = "0.16.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6e24cb5a94bcae1e5408b0effca5cd7172ea3c5755049c5f3af4cd283a165298"
checksum = "998b056554fbe42e03ae0e152895cd1a7e1002aec800fdc6635d20270260c46f"
dependencies = [
"bit-set",
"regex-automata 0.4.9",
@@ -1590,7 +1569,7 @@ version = "1.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4a85d31aea989eead29a3aaf9e1115a180df8282431156e533de47660892565"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"fnv",
"itoa",
]
@@ -1601,7 +1580,7 @@ version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"http",
]
@@ -1611,7 +1590,7 @@ version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures-core",
"http",
"http-body",
@@ -1630,7 +1609,7 @@ version = "1.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cc2b571658e38e0c01b1fdca3bbbe93c00d3d71693ff2770043f8c29bc7d6f80"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures-channel",
"futures-util",
"http",
@@ -1649,7 +1628,7 @@ version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "497bbc33a26fdd4af9ed9c70d63f61cf56a938375fbb32df34db9b1cd6d643f2"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures-channel",
"futures-util",
"http",
@@ -1675,7 +1654,7 @@ dependencies = [
"js-sys",
"log",
"wasm-bindgen",
"windows-core 0.61.0",
"windows-core",
]
[[package]]
@@ -2007,9 +1986,9 @@ dependencies = [
[[package]]
name = "jsonschema"
version = "0.30.0"
version = "0.33.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1b46a0365a611fbf1d2143104dcf910aada96fafd295bab16c60b802bf6fa1d"
checksum = "d46662859bc5f60a145b75f4632fbadc84e829e45df6c5de74cfc8e05acb96b5"
dependencies = [
"ahash 0.8.12",
"base64 0.22.1",
@@ -2214,16 +2193,6 @@ version = "0.2.172"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d750af042f7ef4f724306de029d18836c26c1765a54a6a3f094cbd23a7267ffa"
[[package]]
name = "libredox"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c0ff37bd590ca25063e35af745c343cb7a0271906fb7b37e4813e8f79f00268d"
dependencies = [
"bitflags 2.9.0",
"libc",
]
[[package]]
name = "libseccomp"
version = "0.3.0"
@@ -2488,7 +2457,7 @@ version = "0.11.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72452e012c2f8d612410d89eea01e2d9b56205274abb35d53f60200b2ec41d60"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures",
"log",
"netlink-packet-core",
@@ -2514,7 +2483,7 @@ version = "0.8.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "16c903aa70590cb93691bf97a767c8d1d6122d2cc9070433deb3bbf36ce8bd23"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures",
"libc",
"log",
@@ -2678,9 +2647,9 @@ dependencies = [
[[package]]
name = "num-conv"
version = "0.1.0"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51d515d32fb182ee37cda2ccdcb92950d6a3c2893aa280e540671c2cd0f3b1d9"
checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
[[package]]
name = "num-integer"
@@ -3183,7 +3152,7 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "de5e2533f59d08fcf364fd374ebda0692a70bd6d7e66ef97f306f45c6c5d8020"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"prost-derive",
]
@@ -3193,7 +3162,7 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "355f634b43cdd80724ee7848f95770e7e70eefa6dcf14fea676216573b8fd603"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"heck 0.3.3",
"itertools",
"log",
@@ -3224,7 +3193,7 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "603bbd6394701d13f3f25aada59c7de9d35a6a5887cfc156181234a44002771b"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"prost",
]
@@ -3372,17 +3341,6 @@ dependencies = [
"bitflags 2.9.0",
]
[[package]]
name = "redox_users"
version = "0.4.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ba009ff324d1fc1b900bd1fdb31564febe58a8ccc8a6fdbb93b543d33b13ca43"
dependencies = [
"getrandom 0.2.16",
"libredox",
"thiserror 1.0.69",
]
[[package]]
name = "ref-cast"
version = "1.0.24"
@@ -3405,9 +3363,9 @@ dependencies = [
[[package]]
name = "referencing"
version = "0.30.0"
version = "0.33.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8eff4fa778b5c2a57e85c5f2fe3a709c52f0e60d23146e2151cbef5893f420e"
checksum = "9e9c261f7ce75418b3beadfb3f0eb1299fe8eb9640deba45ffa2cb783098697d"
dependencies = [
"ahash 0.8.12",
"fluent-uri 0.3.2",
@@ -3498,7 +3456,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d19c46a6fdd48bc4dab94b6103fccc55d34c67cc0ad04653aad4ea2a07cd7bbb"
dependencies = [
"base64 0.22.1",
"bytes 1.10.1",
"bytes 1.11.1",
"futures-channel",
"futures-core",
"futures-util",
@@ -3530,13 +3488,13 @@ dependencies = [
[[package]]
name = "rkyv"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9008cd6385b9e161d8229e1f6549dd23c3d022f132a2ea37ac3a10ac4935779b"
checksum = "2297bf9c81a3f0dc96bc9521370b88f054168c29826a75e89c55ff196e7ed6a1"
dependencies = [
"bitvec",
"bytecheck",
"bytes 1.10.1",
"bytes 1.11.1",
"hashbrown 0.12.3",
"ptr_meta",
"rend",
@@ -3548,9 +3506,9 @@ dependencies = [
[[package]]
name = "rkyv_derive"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "503d1d27590a2b0a3a4ca4c94755aa2875657196ecbf401a42eff41d7de532c0"
checksum = "84d7b42d4b8d06048d3ac8db0eb31bcb942cbeb709f0b5f2b2ebde398d3038f5"
dependencies = [
"proc-macro2",
"quote",
@@ -3631,7 +3589,7 @@ checksum = "faa7de2ba56ac291bd90c6b9bece784a52ae1411f9506544b3eae36dd2356d50"
dependencies = [
"arrayvec",
"borsh",
"bytes 1.10.1",
"bytes 1.11.1",
"num-traits",
"rand",
"rkyv",
@@ -3827,10 +3785,11 @@ checksum = "56e6fa9c48d24d85fb3de5ad847117517440f6beceb7798af16b4a87d616b8d0"
[[package]]
name = "serde"
version = "1.0.219"
version = "1.0.228"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5f0e2c6ed6606019b4e29e69dbaba95b11854410e5347d525002456dbbb786b6"
checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e"
dependencies = [
"serde_core",
"serde_derive",
]
@@ -3865,10 +3824,19 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "794e44574226fc701e3be5c651feb7939038fc67fb73f6f4dd5c4ba90fd3be70"
[[package]]
name = "serde_derive"
version = "1.0.219"
name = "serde_core"
version = "1.0.228"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b0276cf7f2c73365f7157c8123c21cd9a50fbbd844757af28ca1f5925fc2a00"
checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.228"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79"
dependencies = [
"proc-macro2",
"quote",
@@ -4095,10 +4063,11 @@ dependencies = [
[[package]]
name = "slog-term"
version = "2.9.1"
version = "2.9.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6e022d0b998abfe5c3782c1f03551a596269450ccd677ea51c56f8b214610e8"
checksum = "5cb1fc680b38eed6fad4c02b3871c09d2c81db8c96aa4e9c0a34904c830f09b5"
dependencies = [
"chrono",
"is-terminal",
"slog",
"term",
@@ -4286,13 +4255,11 @@ dependencies = [
[[package]]
name = "term"
version = "0.7.0"
version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c59df8ac95d96ff9bede18eb7300b0fda5e5d8d90960e76f8e14ae765eedbf1f"
checksum = "d8c27177b12a6399ffc08b98f76f7c9a1f4fe9fc967c784c5a071fa8d93cf7e1"
dependencies = [
"dirs-next",
"rustversion",
"winapi",
"windows-sys 0.60.2",
]
[[package]]
@@ -4305,6 +4272,7 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
name = "test-utils"
version = "0.1.0"
dependencies = [
"libc",
"nix 0.26.4",
]
@@ -4360,30 +4328,30 @@ dependencies = [
[[package]]
name = "time"
version = "0.3.41"
version = "0.3.47"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a7619e19bc266e0f9c5e6686659d394bc57973859340060a69221e57dbc0c40"
checksum = "743bd48c283afc0388f9b8827b976905fb217ad9e647fae3a379a9283c4def2c"
dependencies = [
"deranged",
"itoa",
"num-conv",
"powerfmt",
"serde",
"serde_core",
"time-core",
"time-macros",
]
[[package]]
name = "time-core"
version = "0.1.4"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c9e9a38711f559d9e3ce1cdb06dd7c5b8ea546bc90052da6d06bb76da74bb07c"
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
[[package]]
name = "time-macros"
version = "0.2.22"
version = "0.2.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3526739392ec93fd8b359c8e98514cb3e8e021beb4e5f597b00a0221f8ed8a49"
checksum = "2e70e4c5a0e0a8a4823ad65dfe1a6930e4f4d756dcd9dd7939022b5e8c501215"
dependencies = [
"num-conv",
"time-core",
@@ -4421,7 +4389,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0cc3a2344dafbe23a245241fe8b09735b521110d30fcefbbd5feb1797ca35d17"
dependencies = [
"backtrace",
"bytes 1.10.1",
"bytes 1.11.1",
"io-uring",
"libc",
"mio",
@@ -4475,7 +4443,7 @@ version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52a15c15b1bc91f90902347eff163b5b682643aff0c8e972912cca79bd9208dd"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures",
"libc",
"tokio",
@@ -5020,7 +4988,7 @@ version = "0.57.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "12342cb4d8e3b046f3d80effd474a7a02447231330ef77d71daa6fbc40681143"
dependencies = [
"windows-core 0.57.0",
"windows-core",
"windows-targets 0.52.6",
]
@@ -5030,25 +4998,12 @@ version = "0.57.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d2ed2439a290666cd67ecce2b0ffaad89c2a56b976b736e6ece670297897832d"
dependencies = [
"windows-implement 0.57.0",
"windows-interface 0.57.0",
"windows-implement",
"windows-interface",
"windows-result 0.1.2",
"windows-targets 0.52.6",
]
[[package]]
name = "windows-core"
version = "0.61.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4763c1de310c86d75a878046489e2e5ba02c649d185f21c67d4cf8a56d098980"
dependencies = [
"windows-implement 0.60.0",
"windows-interface 0.59.1",
"windows-link",
"windows-result 0.3.2",
"windows-strings 0.4.0",
]
[[package]]
name = "windows-implement"
version = "0.57.0"
@@ -5060,17 +5015,6 @@ dependencies = [
"syn 2.0.101",
]
[[package]]
name = "windows-implement"
version = "0.60.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a47fddd13af08290e67f4acabf4b459f647552718f683a7b415d290ac744a836"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.101",
]
[[package]]
name = "windows-interface"
version = "0.57.0"
@@ -5082,17 +5026,6 @@ dependencies = [
"syn 2.0.101",
]
[[package]]
name = "windows-interface"
version = "0.59.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bd9211b69f8dcdfa817bfd14bf1c97c9188afa36f4750130fcdf3f400eca9fa8"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.101",
]
[[package]]
name = "windows-link"
version = "0.1.1"
@@ -5106,7 +5039,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4286ad90ddb45071efd1a66dfa43eb02dd0dfbae1545ad6cc3c51cf34d7e8ba3"
dependencies = [
"windows-result 0.3.2",
"windows-strings 0.3.1",
"windows-strings",
"windows-targets 0.53.2",
]
@@ -5137,15 +5070,6 @@ dependencies = [
"windows-link",
]
[[package]]
name = "windows-strings"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a2ba9642430ee452d5a7aa78d72907ebe8cfda358e8cb7918a2050581322f97"
dependencies = [
"windows-link",
]
[[package]]
name = "windows-sys"
version = "0.48.0"


@@ -5,7 +5,7 @@ members = ["rustjail", "policy", "vsock-exporter"]
 authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
 edition = "2018"
 license = "Apache-2.0"
-rust-version = "1.85.1"
+rust-version = "1.88.0"
 [workspace.dependencies]
 oci-spec = { version = "0.8.1", features = ["runtime"] }


@@ -21,7 +21,7 @@ fn to_capshashset(cfd_log: RawFd, capabilities: &Option<HashSet<LinuxCapability>
     let binding: HashSet<LinuxCapability> = HashSet::new();
     let caps = capabilities.as_ref().unwrap_or(&binding);
     for cap in caps.iter() {
-        match Capability::from_str(&format!("CAP_{}", cap)) {
+        match Capability::from_str(&format!("CAP_{cap}")) {
             Err(_) => {
                 log_child!(cfd_log, "{} is not a cap", &cap.to_string());
                 continue;

Some files were not shown because too many files have changed in this diff.