a2534e7bc8 introduced the logic to also
release a kata-tools tarball, but it missed allowing
KATA_TOOLS_STATIC_TARBALL env var to be passed to the release script,
leading to the following error during the release process:
```
ERROR: Invalid environment variable "KATA_TOOLS_STATIC_TARBALL"
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
In startVM(), for VMMs without hotplug support (e.g., Firecracker or
QEMU microvm), the runtime runs prestart hooks but misses rescanning
the network namespace. This causes VMs to boot with uninitialized
network configs, as updates from CNI plugins are not captured.
This patch adds a network rescan via AddEndpoints after prestart hooks
for the non-hotplug path, ensuring correct network info is passed to
the VMM configuration before the VM starts.
Fixes: #11500
Signed-off-by: XanderC <xanderc@qq.com>
virtio-9p has not been supported for a long time, especially within
runtime-rs, and we have no plan to support it. Removing the
related items is reasonable.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As the Memory Agent feature is not used within CoCo (TDX/SNP) scenarios,
it's better to just remove the related sections.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
It aims to introduce the related items within the Makefile to enable
AMD SEV-SNP settings in the configuration when doing make build, and make
it possible to generate the rendered qemu-snp-runtime-rs configuration
based on the *.in template.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To make qemu-runtime-rs with CoCo work well on SEV-SNP platforms,
a dedicated SEV-SNP configuration should be introduced to help
prepare the related CVM resources.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Enable measured rootfs in the configuration when doing make build, and add
some other important items to make the configuration work well.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
It aims to introduce the related items within the Makefile to enable
Intel TDX settings in the configuration when doing make build, and make
it possible to generate the rendered qemu-tdx-runtime-rs configuration
based on the *.in template.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To make qemu-runtime-rs with CoCo work well on TDX platforms,
a dedicated TDX configuration should be introduced to help
prepare the related CVM resources.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Systemd-managed cgroups use the slice:prefix:name format, which is
not a filesystem path. Calling MoveTo() on such paths fails with
"invalid group path" and can abort cleanup before Delete() runs.
In some cases, this causes pod teardown delays.
Skip MoveTo for systemd-formatted sandbox/overhead cgroup paths when
sandbox_cgroup_only is true; systemd moves tasks on unit deletion.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
With the update of NVRC to v0.1.1, cold-plug becomes, by design, the only
supported mode, so resolve the references to hot-plug.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Enable post-install verification in kata-deploy CI tests. When
HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created
that runs with the Kata runtime to confirm deployment succeeded.
The verification pod prints kernel info and exits - success indicates
the Kata runtime is properly configured and functional.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add optional verification that runs after kata-deploy installation.
When a pod spec is provided via --set-file verification.pod=<file>,
a verification job runs after install/upgrade to validate deployment.
The user is fully responsible for the verification pod content:
- Pod name, runtimeClassName, annotations, and verification logic
- Pod must exit 0 on success, non-zero on failure
The verification job simply:
1. Waits for kata-deploy DaemonSet to be ready
2. Applies the user-provided pod spec
3. Waits for the pod to complete
4. Shows logs and cleans up
Usage:
helm install kata-deploy ... \
--set-file verification.pod=/path/to/your-pod.yaml
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
To unlock the release, move the job that publishes the kata payload after push to an alternate (IBM-owned) runner for ppc64le.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
The new NVRC version works for CC and non-CC use cases,
no --feature confidential needed anymore.
Bump versions.yaml and adjust deployment instructions.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Disable NVDIMM. When using GPU passthrough, using NVDIMM would create
a r/o file-backed memory region. When using a GPU, QEMU tries to DMA-
map guest memory for the device, resulting in a mapping error:
```
memory listener initialization failed: Region mem0:
vfio_container_dma_map ... -22 (Invalid argument).
```
For the CC configs, NVDIMM is disabled by default in qemu_amd64.go
with a warning, but we also explicitly disable the setting in the
shim configuration file.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
We don't need to store the kernel headers anymore. We do need to store
the kernel modules, instead.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've been doing some bad file-based driver determination;
now, with versions.yaml, there is a single source of truth.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to package the built modules so the rootfs
is able to consume them. We package the whole
/lib/modules/$(uname -r) directory with strip=2.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We want to have deterministic behaviour, with only
one valid driver version accepted via versions.yaml.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We actually never installed yq into the kernel build;
there are some paths that use yq but they were never hit.
For the GPU use-case we need to read values from versions.yaml.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In preparation for coco v0.18.0, bump the version of image-rs we use in
agent-ctl to match what we have in versions.yaml.
Drop the snapshotter-overlayfs feature. This was dropped from image-rs
when we removed enclave-cc support.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
Before cutting the Kata release that will be used with CoCo v0.18.0,
let's bump the versions of Trustee and guest-components to latest.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
This is needed as the 580 driver doesn't build against 6.18.x, and the
590 driver is not yet fully working for our case, thus we stick to the
previous version that worked before.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Bump both the kernel and kernel-confidential versions from v6.12.x and
v6.16.x to v6.18.4, aligning with the new LTS release.
Kernel 6.18 introduced several configuration changes that required
updates to our kernel config fragments:
* CRYPTO_FIPS dependencies changed:
- In 6.12: depended on !CRYPTO_MANAGER_DISABLE_TESTS
- In 6.18: now depends on CRYPTO_SELFTESTS (which requires EXPERT)
Added CONFIG_EXPERT=y and CONFIG_CRYPTO_SELFTESTS=y to crypto.conf
to satisfy the new dependency chain.
* CONFIG_EXPERT is a naughty one, as it disables / enables a bunch
of things behind one's back, probably just to prove a point that
it is for experts ;-) ... regardless, a reasonable number of
options had to be re-added in order to make sure nothing ends
up broken.
* Legacy iptables support:
Kernel 6.18 requires explicit legacy xtables/iptables configs for
IP_NF_* options. Added CONFIG_NETFILTER_XTABLES_LEGACY,
CONFIG_IP_NF_IPTABLES_LEGACY, and CONFIG_IP6_NF_IPTABLES_LEGACY
to netfilter.conf.
* Module signing dependencies:
Added CONFIG_MODULES=y and other required dependencies to
module_signing.conf to ensure MODULE_SIG can be properly enabled.
* Whitelist updates:
- Added CONFIG_NF_CT_PROTO_DCCP (removed in 6.18+)
- Added CONFIG_CRYPTO_SELFTESTS, CONFIG_NETFILTER_XTABLES_LEGACY,
CONFIG_IP_NF_IPTABLES_LEGACY, CONFIG_IP6_NF_IPTABLES_LEGACY
(added in 6.18+, not present in older kernels like 6.12)
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
A few minor changes to the Zensical config that make navigation easier. Also
fixed a couple of bugs with local serving and added some quality-of-life
features to Zensical.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
This commit adds a GitHub workflow for building a GitHub Pages site for the markdown
files in the docs/ directory. Zensical is a new markdown-based static site generation
framework built by the creators of Material for MkDocs. https://zensical.org/
This commit does not clean the doc structure, so site navigation is initially going to
be messy.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
Remove the agent hotplug timeout parameter from the kernel
command line. Having shifted to VFIO cold-plug, this parameter is
no longer needed.
Remove the no longer required parameter for TDX and thus align the
SNP and TDX configurations.
Add a parameter to prevent the kernel from mounting the /dev tmpfs. NVRC,
and later on kata-agent, attempt this themselves. While kata-agent does not
panic when mounting /dev fails, NVRC makes mounting /dev a hard
requirement.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
set_container_command() previously appended command arguments
one-by-one with
'.command += [...]'. This makes the helper non-idempotent and can
lead to unexpected command arrays when invoked multiple times.
Update the helper to set the full command array in a single yq v4
expression and print the target YAML path plus the command being
applied to simplify debugging when tests fail.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The pod config file created by new_pod_config() was generated via
mktemp using the template "pod-config.yaml.in.XXX", which produces
filenames that do not end with ".yaml" (e.g. pod-config.yaml.in.ABC).
If the random suffix happens to form an extension such as ".Csv" or
".Xml", the following operations with yq will fail.
Some helpers and tooling assume the config path ends with ".yaml".
Switch the mktemp template to place the random suffix before the
extension so the returned path always ends with ".yaml".
Fixes: #12268, #12319
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This is a suggestion from Choi, so we can easily test with a specific
kubectl version and also easily understand which kubectl version is
being used in case of failure.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This image will be used by our helm charts to verify that a
kata-containers deployment is correct.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Enhance the wait_for_migration implementation to reliably wait for
QEMU migration completion and avoid the previous `sleep(280ms)`
delay.
(1) Add an initial fast-path query to return immediately if
migration is already completed/failed/cancelled.
(2) Use a hard deadline to enforce timeouts deterministically.
(3) Implement adaptive polling with backoff and a maximum interval
to reduce QMP load while keeping responsiveness.
(4) Unify migration status handling and return clear errors on
failed/cancelled states.
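For illustration, a minimal sketch of this polling shape, assuming a
hypothetical `query_status()` helper and a tokio dependency (names are
illustrative, not the actual runtime-rs API):
```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum MigrationStatus { Active, Completed, Failed, Cancelled }

// Hypothetical status query; in runtime-rs this would be a QMP query-migrate call.
async fn query_status() -> MigrationStatus { MigrationStatus::Completed }

async fn wait_for_migration(timeout: Duration) -> Result<(), String> {
    // (1) Fast path: return immediately if migration already finished.
    match query_status().await {
        MigrationStatus::Completed => return Ok(()),
        MigrationStatus::Failed => return Err("migration failed".into()),
        MigrationStatus::Cancelled => return Err("migration cancelled".into()),
        MigrationStatus::Active => {}
    }

    // (2) Hard deadline to enforce the timeout deterministically.
    let deadline = Instant::now() + timeout;
    // (3) Adaptive polling: back off up to a maximum interval to reduce QMP load.
    let mut interval = Duration::from_millis(10);
    let max_interval = Duration::from_millis(250);

    while Instant::now() < deadline {
        tokio::time::sleep(interval).await;
        // (4) Unified status handling with clear errors on failure states.
        match query_status().await {
            MigrationStatus::Completed => return Ok(()),
            MigrationStatus::Failed => return Err("migration failed".into()),
            MigrationStatus::Cancelled => return Err("migration cancelled".into()),
            MigrationStatus::Active => interval = (interval * 2).min(max_interval),
        }
    }
    Err("timed out waiting for migration".into())
}

#[tokio::main]
async fn main() {
    println!("{:?}", wait_for_migration(Duration::from_secs(5)).await);
}
```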
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Return information about the current migration process. The input
and output are as below:
{ 'command': 'query-migrate', 'returns': 'MigrationInfo' }
Note that this QEMU API is available within qapi-rs (v0.15+)
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The detailed information about the updated versions is as below:
```
qapi = { version = "0.15", features = ["qmp", "async-tokio-all"] }
qapi-spec = "0.3.2"
qapi-qmp = "0.15.0"
```
and it corrects some corresponding structures.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Change the secure_storage_integrity option's default value to true.
With this, integrity protection for encrypted block device contents
will be requested from the confidential data hub by default, see the
agent's cdh_handler_trusted_storage function in rpc.rs.
This behavior can be disabled by explicitly setting the
agent.secure_storage_integrity parameter to 0 or false via kernel
command line parameters.
This will affect the trusted storage implementation for the guest-pull
mechanism, and it will affect future implementations using this code
path, such as implementations for ephemeral secure storage.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
In some builds we are seeing:
```
error: could not create temp file /opt/rustup/tmp/r2xu46kwuyc7k2kr_file: Permission denied (os error 13)
```
in the agent-ctl build, so port the fix from #12313 to the tools build
to try and resolve this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fixes deploying kata-containers using k3s. The deploy script fails with
`/opt/kata-artifacts/scripts/kata-deploy.sh: line 397: [: too many arguments`.
Signed-off-by: Federico A. Corazza <git@facorazza.com>
yamllint complains that there is only one space before the comment,
so add a second to prevent this annoying message showing up.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Create a new page for a reference implementation for Kubernetes
using QEMU, the go shim and an NVIDIA rootfs. The new page
contains information on:
- components involved in the NVIDIA (TEE) GPU scenario
- orchestration flow for GPU passthrough scenarios
- deployment guidance
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
- Apply a few structural/grouping changes and improve flow
- Group build sections together
- Move usage examples to last section
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The following error was observed during virtiofsd static build:
```
error: could not create temp file /opt/rustup/tmp/p44enysfaxwdbvw4_file:
Permission denied (os error 13)
```
This occurs because RUSTUP_HOME and CARGO_HOME were initialized by the
root user during `docker build`, but `cargo build` is executed as a
non-root user via 'docker run --user'.
Ensure these directories are writable by adjusting the permission after
the toolchain installation is complete.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
OVMF build for Intel TDX (aka "TDVF") was disabled in favor of Ubuntu/
CentOS pre-upstream releases of Intel TDX.
See 4292c4c3b1.
It's time to re-enable the build and move runtime configurations to
use it (the latter will be done in a later commit).
This is a partial revert of 4292c4c3b with the following changes:
- Stop calling OVMF for Intel TDX "TDVF" and follow the naming distros
use for TDX enabled build: OVMF.inteltdx.fd.
- Single binary OVMF.inteltdx.fd is supported using -bios QEMU param.
- Secure Boot infrastructure is disabled since Kata does not support it.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This method is actually called; just add the `#[allow(dead_code)]`
attribute to let the UT pass. The warning looks like:
warning: method `send_message_with_payload` is never used
|
224 | impl<R: Req> Endpoint<R> {
| ------------------------ method in this implementation
...
522 | pub fn send_message_with_payload<T: Sized, P: Sized>(
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
warning: unused `std::result::Result` that must be used
  --> src/dragonball/dbs_virtio_devices/src/vhost/vhost_user/net.rs:679:9
    |
679 | / VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
680 | |     &mut dev, 0, &config,
681 | | );
    | |_________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: `#[warn(unused_must_use)]` on by default
help: use `let _ = ...` to ignore the resulting value
    |
679 | let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
    |         +++++++
warning: unused `std::result::Result` that must be used
  --> src/dragonball/dbs_virtio_devices/src/vhost/vhost_user/net.rs:683:9
    |
683 | / VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
684 | |     &mut dev, 0, &mut data,
685 | | );
    | |_________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
help: use `let _ = ...` to ignore the resulting value
    |
683 | let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
    |         +++++++
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The warning looks like:
...
warning: variable does not need to be mutable
--> src/dragonball/dbs_virtio_devices/src/vsock/csm/txbuf.rs:217:13
|
217 | let mut tmp: Vec<u8> = vec![0; TxBuf::SIZE - 2];
| ----^^^
| |
| help: remove this `mut`
|
= note: `#[warn(unused_mut)]` on by default
...
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Till k8s 1.34 we could grep by "Started containerd". From k8s 1.35
onwards the event message changed and we should, instead, grep by
"Container started".
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
QEMU v10.2.0 was released on December 24th, 2025.
The experimental GPU SNP / TDX are also pointing to v10.2.0 release with
their gpu-{snp,tdx}-20260107 branch.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
sha2 0.9.3 includes the use of cpuid-bool, which was renamed to cpufeatures
around 5 years ago. Try moving to a workspace dependency of sha2
and bumping to the latest version to remediate RUSTSEC-2021-0064
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
While the use-case of Intel QuickAssist (QAT) accelerated crypto
and/or compression with k8s and Kata Containers is still valid,
the setup instructions are outdated:
Starting with Intel Xeon Gen4 (Sapphire Rapids), the QAT driver
stack moved to in-tree drivers without a separate SR-IOV VF
driver.
Drop all the setup instructions but keep the use-cases doc
for reference. Users wanting to enable the use-case should consult
the Intel QAT Device Plugins or Intel QAT DRA driver authors.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The nontee job (run-k8s-tests-coco-nontee) for qemu-coco-dev-runtime-rs
is running well and it's time to make it required when the CI runs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Using the built-in size_of_val is easier to read and less error-prone
than doing this calculation manually
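For illustration, a hedged before/after sketch of the kind of change this
refers to (the exact call sites differ):
```rust
use std::mem::{size_of, size_of_val};

fn main() {
    let queue: [u64; 8] = [0; 8];

    // Manual calculation: easy to get the element type or count wrong.
    let manual = size_of::<u64>() * queue.len();

    // Built-in helper: reads the size directly off the value.
    let built_in = size_of_val(&queue);

    assert_eq!(manual, built_in);
    println!("{built_in} bytes");
}
```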
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
#[cfg(feature = "cargo-clippy")] has been deprecated for years,
so should be replaced with `#[cfg(clippy)]`
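A small illustrative example of the replacement (not the actual call sites):
```rust
// Deprecated form: relied on the implicit "cargo-clippy" feature.
#[cfg(feature = "cargo-clippy")]
fn old_gate() {}

// Current form: clippy sets the `clippy` cfg flag itself.
#[cfg(clippy)]
fn new_gate() {}

fn main() {}
```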
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
There are many, many null pointer dereferences in the bindgen code
when moving between Rust 1.85.1 and 1.86, and no docs of the source
that it was generated from, so try and skip
these tests from running until an SME can look at them @lifupan
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
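For illustration, this is the shape of change the clippy
uninlined_format_args lint asks for (example code, not an actual call site):
```rust
fn main() {
    let name = "kata";
    let count = 3;

    // Before: arguments passed positionally.
    println!("runtime {} started {} containers", name, count);

    // After: arguments inlined into the format string.
    println!("runtime {name} started {count} containers");
}
```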
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
runtime-rs crates are pulled into kata-ctl and some of these have
bumped recently, so update these in kata-ctl as well
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so ensure our docs include this
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy is recommending that format args are inlined for
better clarity, so update our code to remove these warnings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In #12151 the version was bumped in Cargo.toml, but the update was not
done, so run `cargo update -p container-device-interface` to apply it
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since #12204 was merged, the following error has been observed:
```
bats warning: Executed 1 instead of expected 2 tests
[run_kubernetes_tests.sh:162] ERROR: Tests FAILED from suites: k8s-empty-dirs.bats
```
The cause is that `pod_logs_file` is re-declared as a local variable
in the second test before skipping, which makes it inaccessible
in `teardown()` and leads to an error.
This commit removes the re-declaration of the variable.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The Rust kata-deploy binary calls list_runtimeclasses() during NFD
setup, but the ClusterRole only granted get and patch permissions.
Add the list verb to the runtimeclasses resource permissions to fix
the RBAC error:
runtimeclasses.node.k8s.io is forbidden: User
\"system:serviceaccount:kube-system:kata-deploy-sa\" cannot list
resource \"runtimeclasses\" in API group \"node.k8s.io\" at the
cluster scope
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
KVM is not available in our ARM runners, so let's skip those tests
accordingly, while the rest of the test cases remain tested on machines
with KVM present and access to the KVM device.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
There are test cases that require interaction with the KVM device, so
introduce the skip_if_kvm_unaccessable macro to skip them.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Changes in NIM/RAG samples:
- update image references
- update memory requirements, timeouts, model name
- sanitize some of the probes and print-out
Further refinements can be made in the future.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
cargo test was trying to evaluate the documentation comment and failing,
so try and make the comment explicitly text to avoid this
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
A few structs in genpolicy are never constructed, so add
`#[allow(dead_code)]` to prevent this clippy warning
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In Unicode you can have multi-byte characters, so it's better to
use char_indices than to enumerate the bytes
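A small hedged example of why char_indices is safer than enumerating bytes
for multi-byte UTF-8 input (illustrative only):
```rust
fn main() {
    let s = "héllo";

    // Byte enumeration: indices count bytes, and 'é' spans two of them,
    // so an arbitrary byte index may not be a valid character boundary.
    for (i, b) in s.bytes().enumerate() {
        println!("byte {i}: {b:#x}");
    }

    // char_indices yields the byte offset of each complete character,
    // so every offset is a valid slice boundary.
    for (i, c) in s.char_indices() {
        println!("char at byte {i}: {c}");
    }
}
```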
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
VirtioBlkCcwDeviceHandler and VirtioBlkCcwHandler
are only constructed on s390x, so add #[cfg(target_arch = "s390x")]
to all the code
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We can use the new Error::other option rather than
`Error::new(ErrorKind::Other, ...)` and drop our own macro that did this mapping
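A minimal sketch of the substitution (Error::other is stable since Rust 1.74):
```rust
use std::io::{Error, ErrorKind};

fn old_style(msg: &str) -> Error {
    // Previous pattern, often hidden behind a local macro.
    Error::new(ErrorKind::Other, msg.to_string())
}

fn new_style(msg: &str) -> Error {
    // Shorter equivalent provided by the standard library.
    Error::other(msg.to_string())
}

fn main() {
    assert_eq!(old_style("boom").kind(), new_style("boom").kind());
}
```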
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fix the warning thrown up:
```
warning: hiding a lifetime that's elided elsewhere is confusing
--> /root/go/src/github.com/kata-containers/kata-containers/src/libs/kata-types/src/utils/u32_set.rs:50:17
|
50 | pub fn iter(&self) -> Iter<u32> {
| ^^^^^ --------- the same lifetime is hidden here
| |
| the lifetime is elided here
|
= help: the same lifetime is referred to in inconsistent ways, making the signature confusing
= note: `#[warn(mismatched_lifetime_syntaxes)]` on by default
help: use `'_` for type paths
|
50 | pub fn iter(&self) -> Iter<'_, u32> {
| +++
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update virtiofsd to its latest release.
Here we also need to update the alpine version used by the builder as we
need a version of musl-dev new enough to have wrappers for pread2 and
pwrite2. While bumping, bump to the latest.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add two attestation tests. The first one sets a resource policy that
requires CPU0 to have an affirming trust level. This is a negative test
which can run on any platform. Setting this policy without setting any
reference values should result in an attestation failure.
Next, a second test will set the same policy, but this time it will use
the journal log to find the QEMU command line from the previous test and
calculate the expected reference values. Currently this is only
supported on SNP using the sev-snp-measure tool, but the same flow
should work on other platforms.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
The five tests are set to the same vhost socket path, which could lead
to racing with one another. Use unique names to avoid this.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Let's bump the experimental {tdx,snp} QEMU to the tags created today in the
Confidential Containers repo, which match QEMU 10.2.0-rc3.
This bump is mostly for early testing of what will become 10.2.0, which
will be bumped everywhere then.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It will address the issue:
"# bats warning: Executed 0 instead of expected 1 tests"
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As each case needs the get_pod_config_dir preparation,
a better method is to move it directly into the setup_common method.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To measure the journal duration, we need to clearly print the journal
start time and end time for each case, which helps ensure the journal
log covers the specified period for the case.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
For failure cases within CI, we need to dump the kata log to help
address issues, but currently large log messages mean we can only
see a partial log.
We remove the initdata log output and increase the log level to reduce
the log output.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Currently policy_settings_dir is created only when
BATS_TEST_NUMBER == "1",
but delete_tmp_policy_settings_dir "${policy_settings_dir}" is
called in teardown() for every test. This means that for tests
after the first one teardown() may attempt to delete a directory
that was already removed by a previous test, or rely on a value
that does not belong to the current test execution.
Adjust teardown logic so that policy_settings_dir is only deleted
for the first test case (BATS_TEST_NUMBER == "1") and ignored for
subsequent tests. This keeps the original optimization of running
genpolicy only once, while avoiding unnecessary or confusing cleanup
attempts in later test cases.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The previous pod_name was set as local, which cannot be captured
within the teardown() function, causing a failure.
This commit just removes the `local pod_name` declaration to make it a
global variable.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Otherwise we may hit a `no space left on device` when building the rust
kata-deploy binary.
This happens mostly because of the multi-stage build used to generate a
distroless final container.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
There we ensure labels are added to better deal with ownership of the
runtimeclasses. It's not strictly needed here as helm does take care of
the ownership, but also doesn't hurt to follow what seems to be a common
practice.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's shamelessly duplicate the nightly job to have at least nightly
runs using the rust implementation of kata-deploy.
The reason for doing that is to be pragmatic, as pragmatic as possible,
and avoid switching away from the scripts before the 3.24.0 release, while
still testing both ways till the switch happens.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Unlike the scripts, which are called as `bash -c ...`, the
kata-deploy rust binary must be invoked directly, as we do not even have a
shell in its container.
For now, when the rust version is used, the image used has the "-rust"
suffix, which will help us to have both ways being used / tested for a
little while.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
kata-deploy shell script is not THAT bad and, to be honest, it's quite
handy for quick hacks and quick changes. However, it's been
increasingly becoming harder to maintain as it's grown its scope from a
testing tool to the proper project's front door, lacking unit tests, and
with an abundance of complex regular expressions and bashisms to be able
to properly parse the environment variables it consumes.
Moreover, the fact that it is a Frankenstein's monster glued together using
python packages, golang binaries, and a distro dependent container makes
the situation VERY HARD to use it from a distroless container (thus,
avoiding security issues), preventing further integration with
components that require a higher standard of security than we've been
requiring.
With everything said, with the help of Cursor (mostly on generating the
tests cases), here comes the oxidized version of the script, which runs
from a distroless container image.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The ORAS cache helper needs PUSH_TO_REGISTRY to be set to 'yes' to
push new artifacts to the cache. However, this environment variable
was not being passed to the Docker container during agent, tools, and
busybox builds.
Moreover, for ghcr.io authentication, add support for using GH_TOKEN and
GITHUB_ACTOR as fallbacks when explicit credentials
(ARTEFACT_REGISTRY_USERNAME/PASSWORD) are not provided.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The GPG key used for gperf was incorrectly set to the busybox
maintainer's key (Denis Vlasenko) instead of the gperf maintainer's
key (Marcel Schaible).
Wrong key (busybox): C9E9416F76E610DBD09D040F47B70C55ACC9965B
Denis Vlasenko <vda.linux@googlemail.com>
Correct key (gperf): EDEB87A500CC0A211677FBFD93C08C88471097CD
Marcel Schaible <marcel.schaible@studium.fernuni-hagen.de>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
kata-remote is a runtime class that cloud-api-adaptor relies on to work.
kata-remote by itself does nothing, and that's the reason it's disabled
by default. We're only adding it here so cloud-api-adaptor charts can
simply do something like `--set shims.remote.enabled=true`.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When updating ephemeral storages, MS_REMOUNT is explicitly passed as,
for instance, `/dev/shm` should be remounted after memory is hotplugged.
Till now Kata Containers has been explicitly ignoring such updates,
leading to the containers' `/dev/shm` having the size of "half of the
memory allocated, during the startup time", which goes against the
expected behaviour.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We're only releasing those for amd64 as that's the only architecture
we've been building the packages for.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure we can create a specific "tools" tarball, which will help
those who only need to pull those either for testing or production
usage.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
After the runtime-rs workspace was merged into the root workspace, features
passed when building runtime-rs need to be refactored to be correctly
propagated. Taking dragonball for example, runtime-rs requires runtimes
to depend on the virt_containers feature, and virt_containers needs to
handle hypervisor features specifically.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
After the workspace integration of runtime-rs, the output of
runtime-rs is now under the repo root, instead of src/runtime-rs. Change the
TARGET_PATH accordingly to tell the Makefile where to look up the output.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Some cases in the dragonball crates require interaction with the KVM module to
complete, which requires root privileges. Skip those tests when running as a
non-root user.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
MMIODeviceInfo inside the test module of dbs_boot on aarch64 is used for
testing purposes, but the `pub` attribute requires it to have documentation.
Since this is used only for testing purposes, let's allow missing_docs
for it.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The test set of dbs_utils's tap module is missing the test attribute, which
makes dev-dependencies unusable. Mark the tap tests as a test module.
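A minimal sketch of the pattern being applied (illustrative code, not the
actual tap tests): marking the module with #[cfg(test)] compiles it only for
tests, where dev-dependencies are available.
```rust
pub fn add(a: u32, b: u32) -> u32 {
    a + b
}

// Without this attribute the module is compiled into the regular build,
// where dev-dependencies are not available.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds() {
        assert_eq!(add(1, 2), 3);
    }
}
```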
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
This is a follow-up of 3fbe693.
Remove runtime-rs from the exclude list and make it a member of the root
workspace.
Specify shim and shim-ctl as the binaries of the runtime-rs package, making
runtime-rs and all its members part of the root workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Make runtime-rs a package that produces shim and shim-ctl as its binary
products, which enables the Makefile to work after it's incorporated into
the root workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Init the storage options with the original rootfs options.
In addition, for XFS, append nouuid to the mount options if it does not exist.
Signed-off-by: shezhang.lau <shezhang.lau@antgroup.com>
To protect against upstream download failures for gperf and busybox,
implement ORAS-based caching to GHCR.
This adds:
- download-with-oras-cache.sh: Core helper for downloading with cache
- populate-oras-tarball-cache.sh: Script to manually populate cache
- warn() function to lib.sh for consistency
Modified build scripts to:
- Try ORAS cache first (from ghcr.io/kata-containers/kata-containers)
- Fall back to upstream download on cache miss
- Automatically push to cache when PUSH_TO_REGISTRY=yes
The cache is automatically populated during CI builds, and parallel
architecture builds check for existing versions before pushing to avoid
race conditions.
Forks benefit from upstream cache but can override with their own:
ARTEFACT_REPOSITORY=myorg/kata make agent-tarball
Generated-By: Cursor IDE with Claude
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The runtime handles the fsGroup field of the pod security context by
adding a mount option to the generated storage object [1]. This commit
changes genpolicy to expect this option.
Instead of passing another side input to
yaml::get_container_mounts_and_storages, we pass the entire PodSpec.
This reduces the necessary changes in the pod-generating resources and
allows for possible future use of other PodSpec fields.
[1]: https://github.com/kata-containers/kata-containers/blob/0c6fcde1/src/runtime/virtcontainers/kata_agent.go#L1620-L1625
Fixes: #11934
Signed-off-by: Markus Rudy <mr@edgeless.systems>
I've seen this happening with the GPU SNP CI every now and then, but I
don't really understand how this was not caught by the TDX / SNP CI
themselves before.
In any case, the error seen is:
```
Error from server (Forbidden): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"nfd.k8s-sigs.io/v1alpha1\",\"kind\":\"NodeFeatureRule\",\"metadata\":{\"annotations\":{},\"name\":\"amd64-tee-keys\"},\"spec\":{\"rules\":[{\"extendedResources\":{\"sev-snp.amd.com/esids\":\"@cpu.security.sev.encrypted_state_ids\"},\"labels\":{\"amd.feature.node.kubernetes.io/snp\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"sev.snp.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"amd.sev-snp\"},{\"extendedResources\":{\"tdx.intel.com/keys\":\"@cpu.security.tdx.total_keys\"},\"labels\":{\"intel.feature.node.kubernetes.io/tdx\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"tdx.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"intel.tdx\"}]}}\n"}}}
to:
Resource: "nfd.k8s-sigs.io/v1alpha1, Resource=nodefeaturerules", GroupVersionKind: "nfd.k8s-sigs.io/v1alpha1, Kind=NodeFeatureRule"
Name: "amd64-tee-keys", Namespace: ""
for: "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": error when patching "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": nodefeaturerules.nfd.k8s-sigs.io "amd64-tee-keys" is forbidden: User "system:serviceaccount:kube-system:kata-deploy-sa" cannot patch resource "nodefeaturerules" in API group "nfd.k8s-sigs.io" at the cluster scope
```
And the fix is as simple as allowing patching and updating a
nodefeaturerule in our service account RBAC.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Since the CI issue for s390x was resolved on Dec 5th,
the nightly test result has gone green for 10 consecutive days.
This commit puts the e2e tests for s390x again into the required job list.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Let's remove the deprecated features that were marked for removal
after Kata Containers 3.23.0:
kata-deploy.sh:
- Remove non-arch-specific variable fallbacks (SHIMS, DEFAULT_SHIM,
SNAPSHOTTER_HANDLER_MAPPING, ALLOWED_HYPERVISOR_ANNOTATIONS,
PULL_TYPE_MAPPING, EXPERIMENTAL_FORCE_GUEST_PULL). Each arch now
has its own default value.
- Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS
variables and associated functions (create_runtimeclasses,
delete_runtimeclasses, adjust_shim_for_nfd). RuntimeClasses are
now managed by Helm chart, not the daemonset script.
- Unsupported architectures now fail with an error instead of
falling back to non-arch-specific defaults.
Helm chart:
- Remove all deprecated env values (createRuntimeClasses,
createDefaultRuntimeClass, debug, shims, shims_*, defaultShim,
defaultShim_*, allowedHypervisorAnnotations, snapshotterHandlerMapping,
snapshotterHandlerMapping_*, agentHttpsProxy, agentNoProxy,
pullTypeMapping, pullTypeMapping_*, _experimentalSetupSnapshotter,
_experimentalForceGuestPull, _experimentalForceGuestPull_*).
- Remove backward compatibility code from _helpers.tpl that checked
for legacy env values.
- Remove legacy env.shims check from runtimeclasses.yaml.
- Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS env
vars from kata-deploy.yaml and post-delete-job.yaml.
- Update RBAC to only include runtimeclasses get/patch permissions
(needed for NFD patching), removing create/delete/list/update/watch.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
- Replace generic errors in sandbox operations with typed SandboxError variants (InvalidContainerId, InitProcessNotFound, InvalidExecId).
- This enables the kata shim to handle specific failure cases differently.
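A hedged sketch of what such typed variants might look like (using thiserror
here for illustration; the actual definitions in runtime-rs may differ):
```rust
use thiserror::Error;

// Illustrative error type; variant names follow the commit description.
#[derive(Debug, Error)]
pub enum SandboxError {
    #[error("invalid container id: {0}")]
    InvalidContainerId(String),
    #[error("init process not found for container {0}")]
    InitProcessNotFound(String),
    #[error("invalid exec id: {0}")]
    InvalidExecId(String),
}

fn find_container(id: &str) -> Result<(), SandboxError> {
    if id.is_empty() {
        // A typed variant lets the shim match on the failure and react differently.
        return Err(SandboxError::InvalidContainerId(id.to_string()));
    }
    Ok(())
}

fn main() {
    if let Err(e) = find_container("") {
        eprintln!("{e}");
    }
}
```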
Fixes: #12120
Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>
Add better error handling to runtime-rs to handle when the sandbox itself is killed and recreated.
- Update the kill_process function to skip sending a signal when the process is stopped.
- Always set ProcessStatus::Stopped even when wait_process fails
- In state_process return synthetic state for sandbox container when using Sandbox API
Fixes: #12120
Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>
Align with other test logic - declare the KATA_HYPERVISOR in the
run bash script, then declare the RUNTIME_CLASS_NAME variable in
the bats files.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Now that we have a more restrictive resource policy for KBS, let
us start adopting it across all NVIDIA test cases. This policy was
previously introduced by the NVIDIA attestation test.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
It aims to upgrade rtnetlink to mitigate netlink log noise.
This commit upgrades the `rtnetlink` dependency (and corresponding
libraries like `netlink-packet-route`) to address excessive and
unnecessary netlink-related logging during sandbox startup.
Problem:
The previously used `rtnetlink v0.16` (depending on `netlink-proto
v0.11.3`) generates a high volume of DEBUG/INFO level netlink messages
during sandbox initialization. This noise:
1. Overloads the logging system, often leading to warnings like
"slog-async: logger dropped messages due to channel overflow."
2. Interferes with effective troubleshooting by distracting developers
from legitimate Kata errors.
Solution:
We upgrade to `rtnetlink v0.19` (and `netlink-proto v0.12`), as testing
confirms that the latest versions have correctly elevated the verbosity
of these netlink internal events to the TRACE level.
This change significantly enhances the log analysis experience by
suppressing unnecessary network-related logs during startup.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
With these changes, we create pod security policies when running
against NVIDIA TEE GPU handlers where AUTO_GENERATE_POLICY is set.
For the non-TEE GPU tests, the added functions bail out by design.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Following existing patterns, we adapt the common policy settings
for NVIDIA GPU CI platforms. For instance, for our CI runners, we
use containerd 2.x.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Enable auto-generate policy for qemu-nvidia-gpu-* if the user
didn't specify an AUTO_GENERATE_POLICY value.
Setting this in run_kubernetes_nv_tests.sh is too late as
gha-run.sh calls into run_tests, setup.sh, and then into
create_common_genpolicy_settings() where the rules.rego and
genpolicy-settings file are being copied to the right locations.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add one valid test case with 2 GPUs with proper VFIO device
entries and CDI annotations.
Add seven test cases with invalid combinations of VFIO device
entries and CDI annotations.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add rules for vfio passthrough GPUs. When creating the security
policy document, parse GPU resource limits and derive CDI
annotation patterns and VFIO device entries.
With various values for CDI annotations and device paths being
runtime-dependent, use regular expressions.
For now, this enables passthrough of NVIDIA GPUs, but the changes
are designed to allow for other VFIO device types.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add the block device specific annotations for num_queues and queue_size,
which are dedicated to runtime-rs, to the document to help
users set the two parameters.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit introduces the capability to dynamically configure
`queue_size` and `num_queues` parameters via Pod annotations.
Currently, `kata-runtime` allows for static configuration of
`queue_size` and `num_queues` for block devices through its config
file. However, a critical issue arises when a Pod is allocated fewer
CPU cores than the statically configured `num_queues` value. In such
scenarios, the Pod fails to start, leading to operational instability
and limiting flexibility in resource allocation.
To address this, this feature enables users to override the default
queue_size and num_queues parameters by specifying them in Pod
annotations. This allows for fine-grained control and dynamic adjustment
of these parameters based on the specific resource allocation of a Pod.
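A hedged sketch of the override logic, with hypothetical annotation keys
(the real runtime-rs annotation names may differ):
```rust
use std::collections::HashMap;

// Hypothetical annotation keys, for illustration only.
const NUM_QUEUES_KEY: &str = "io.katacontainers.config.hypervisor.block_device_num_queues";
const QUEUE_SIZE_KEY: &str = "io.katacontainers.config.hypervisor.block_device_queue_size";

/// Return (num_queues, queue_size), preferring pod annotations over the
/// static values from the runtime configuration file.
fn block_queue_params(
    annotations: &HashMap<String, String>,
    default_num_queues: u32,
    default_queue_size: u32,
) -> (u32, u32) {
    let num_queues = annotations
        .get(NUM_QUEUES_KEY)
        .and_then(|v| v.parse().ok())
        .unwrap_or(default_num_queues);
    let queue_size = annotations
        .get(QUEUE_SIZE_KEY)
        .and_then(|v| v.parse().ok())
        .unwrap_or(default_queue_size);
    (num_queues, queue_size)
}

fn main() {
    let mut annotations = HashMap::new();
    // A pod with a single CPU can lower num_queues below the static default.
    annotations.insert(NUM_QUEUES_KEY.to_string(), "1".to_string());
    assert_eq!(block_queue_params(&annotations, 4, 256), (1, 256));
}
```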
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The runner is down for a few weeks. I may end up bringing in my personal
runner, but I'm not confident I can easily do this before the holidays,
thus I'm skipping the tests for now.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Set the attestation policy for GPU0 to affirming. This requires
the GPU, for instance, to have production properties, such as
properly signed VBIOS firmware.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
As this CI is continuously failing for various reasons, we'd like to
temporarily skip it for the s390x platform. It will be re-enabled
when we have addressed the related issues.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As the default enable_annotations in runtime-rs is different from
runtime-go, we should make it align with the runtime-go configuration.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit refactors the vCPU resource management within runtime's
`CpuResource` structure and related calculation logic to use
floating-point numbers (`f32`) instead of integers (`u32`).
This migration is necessary to fully support the fractional vCPU
allocation introduced in the `kata-types` library, ensuring better
precision in:
1. Allocation Tracking: `current_vcpu` now tracks the precise
fractional value (e.g., 1.5 vCPUs).
2. Resource Calculation: `calc_cpu_resources` now returns a precise
`f32` sum of container vCPU requests, including normalization logic
based on the maximum period, removing the previous integer rounding
steps in the calculation.
3. Hypervisor Interaction: The integer vCPU requirement for the
hypervisor remains, so `ceil()` is now explicitly applied only when
interacting with the hypervisor or agent APIs
(`do_update_cpu_resources`, `current_vcpu`, `online_cpu_mem`).
And the key changes are as below:
1. `CpuResource::current_vcpu` updated from `u32` to `f32`.
2. `calc_cpu_resources` return type changed from `u32` to `f32`.
3. CPU hotplug logic now uses `f32` for the target vCPU count and applies
`ceil()` before calling `hypervisor.resize_vcpu()`.
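A minimal sketch of the fractional tracking plus ceil-at-the-boundary
behaviour described above (simplified, not the actual CpuResource code):
```rust
/// Sum the fractional vCPU requests of all containers (quota / period).
fn calc_cpu_resources(requests: &[(i64, u64)]) -> f32 {
    requests
        .iter()
        .filter(|(quota, period)| *quota > 0 && *period > 0)
        .map(|(quota, period)| *quota as f32 / *period as f32)
        .sum()
}

fn main() {
    // Two containers asking for 0.75 vCPU each.
    let requests = [(75_000_i64, 100_000_u64), (75_000, 100_000)];

    // The precise fractional value is tracked internally...
    let current_vcpu: f32 = calc_cpu_resources(&requests);
    assert!((current_vcpu - 1.5).abs() < f32::EPSILON);

    // ...and only rounded up when talking to the hypervisor/agent,
    // which needs an integer vCPU count.
    let hypervisor_vcpus = current_vcpu.ceil() as u32;
    assert_eq!(hypervisor_vcpus, 2);
}
```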
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Refactors `LinuxContainerCpuResources` and `LinuxSandboxCpuResources`
to track calculated vCPU allocation using `f64` (fractional float)
instead of `u64` (milliseconds).
This ensures more precise resource calculation (`quota / period`) and
aggregation by avoiding rounding errors inherent in millisecond-based
integer tracking.
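A small illustration of the precision difference between millicpu-style
integer tracking and fractional float tracking (simplified, assumed numbers):
```rust
fn main() {
    // cgroup CFS values, both in microseconds: 10ms quota over a 30ms period
    // means the container is entitled to 1/3 of a vCPU.
    let quota: i64 = 10_000;
    let period: u64 = 30_000;

    // Millicpu-style integer tracking rounds the value down...
    let millicpus: u64 = (quota as u64 * 1000) / period; // 333
    let from_millicpus = millicpus as f64 / 1000.0; // 0.333

    // ...while tracking the fraction directly keeps full precision.
    let fractional = quota as f64 / period as f64; // 0.3333...

    println!("millicpu-based: {from_millicpus}, fractional: {fractional}");
    assert!(fractional > from_millicpus);
}
```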
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit updates the non-TEE tests to disable two specific test
cases: `k8s-number-cpus.bats` and `k8s-sandbox-vcpus-allocation.bats`.
These tests are designed to cover CPU elasticity/dynamic scaling
capabilities. In the non-TEE scenario, we are enforcing the disabling of
this capability by setting the default configuration to
`static_sandbox_resource_mgmt=true`.
Although the tests currently pass, allowing them to run is logically
inconsistent with the intended non-TEE configuration. Therefore, we are
disabling them for all non-TEE runtimes, specifically targeting:
- `qemu-coco-dev`
- `qemu-coco-dev-runtime-rs`
This change ensures that our non-TEE CI accurately reflects the static
resource management policy and prevents misleading test results.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As runtime-rs doesn't support block device hotplug on the s390 arch,
we just disable or skip the test when running on
s390.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To support this feature, the item in the Makefile should be enabled,
and it can be set at make build time, just like this:
`DEFSTATICRESOURCEMGMT_QEMU := false`
When users don't want this feature, they can set it to true via
the configuration.toml.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Enable the cpu hotplug tests within the k8s-number-cpus.bats for both
cloud-hypervisor and qemu-runtime-rs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We have supported CPU hotplug features within dragonball and clh; this
commit enables the test within the CI.
Fixes: #8660
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Due to a previous failure within the case, we chose to skip it, but now
CPU hotplug has been corrected, and it's time to re-enable it.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add additional cases for the IOMMUFDID method to check what happens when
non-IOMMUFD paths are passed. The method should do the right
thing.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
Logging the QMP commands gives us a lot of flexibility to
troubleshoot issues with what is being sent to QEMU.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
An import cycle was introduced because of a mutual need
for the constant that describes the prefix of IOMMUFD files.
We need to extract this out into a higher-level package.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
The QMP commands sent to QEMU did not properly set up
IOMMUFD objects in the codepath that handles VFIO device
hot-plugging. This is mainly relevant in the Kubernetes
use-case where the VFIO devices are not available when
QEMU is first launched.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
The function assumes that the runner is a Ubuntu machine, which so far
has been true as part of our CI.
However, the new ARM runner is running on Debian, and those mirror
additions would simply break.
With this in mind, for any distro that's not ubuntu, let's just make
sure to inform the owner of the system to have bats already installed as
part of the environment provided.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This reverts commit 5a81b010f2, as we now
have all the infrastructure properly set up as part of our CI node.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove the existing containerd guest pull stability tests workflow
as we're going to rebuild all the VMs used for testing and introduce
new, more focused stability tests for nydus-snapshotter.
The new tests will be added soon, as part of another PR.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that we've bumped to QEMU 10.2.0-rc1, we can take advantage of a fix
that's present there, which fixes the double memory allocation for the
cases where GPUs are being cold-plugged.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've made the pods require a ridiculous amount of memory, just for the
sake of getting them running.
Now that those are running, tests are passing, CI is required, let's
work to lower the amount of memory needed as everything else is working
as expected.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Clean-up shellcheck warnings:
SC2030 (info): Modification of cmd_out is local (to subshell caused by (..) group).
SC2031 (info): cmd_out was modified in a subshell. That change might be lost.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Clean-up shellcheck warnings:
SC2250 (style): Prefer putting braces around variable references even
when not strictly required.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Let's add a simple backup and restore logic for the CDI configuration
file nvidia.com-pgpu.yaml in the k8s-nvidia-*.bats and
k8s-confidential-attestation.bats test files.
Although not optimal, this is a temporary workaround needed until
NVIDIA releases what's needed for the GPU Operator to properly deal with
cold plugged devices for the Confidential Containers cases, which is
work in progress right now.
After that's released, we can revert/drop this patch.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's bump the experimental {tdx,snp} QEMU to the tags created today in the
Confidential Containers repo, which match QEMU 10.2.0-rc1.
This bump is especially beneficial for us, as we can get rid of QEMU's
double memory allocation when **cold plugging** a GPU.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
If the sandbox has cold-plugged an IOMMUFD device but the
device-plugins send us a /dev/vfio/<NUM> device, we need to
check whether the IOMMUFD device and the VFIO device are the same.
We have the sibling.BDF; we now need to extract the BDF of the
devPath, which is either /dev/vfio/<NUM> or /dev/vfio/devices/vfio<NUM>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Bump the github.com/sirupsen/logrus version to 1.9.3
across our components where it is back-level to bring us
up-to-date and resolve high severity CVE-2025-65637
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add the attestation bats test case to the NVIDIA CI and provide a
second pod manifest for the attestation test with a GPU. This will
enable composite attestation in a subsequent step.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Bump to pull in a fix for composite attestation with GPUs. The new
commit ID corresponds to the fix (change for default GPU policy),
currently being the top commit of the main branch.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This brings two fixes:
- use the test_key variable to check against the aatest value.
- properly check the run command invocation (run w/o bash does not
seem to like the pipe, which leads to ALWAYS evaluating the
status result to 1). With this bug, the deny-all test would ALWAYS
succeed regardless of whether aatest was actually returned or not.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
When running these tests repeatedly locally, the default policy is not
being reset after the test completes, then subsequent runs fail.
Similar to k8s-sealed-secrets.bats, we set the default policy in an if
condition.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This allows setting a GPU0 resource policy, enabling GPU
attestation tests to not use the default resource policy.
For now, the policy requires attestation's ear status to
not be contraindicated. In a future change we will require
this to be affirming once our CI runners' vBIOS version is
properly configured.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This enables attestation tests to figure out whether composite
attestation with a GPU can be executed.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add the NVIDIA TEE hypervisors. With this, attestation tests can be run
against the NVIDIA handlers, for instance.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This reverts commit e4a13b9a4a, as it
caused some issues with the GPU workflows.
Reverting it is better, as it unblocks other PRs.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
vfio-ap passthrough has been introduced for runtime-rs,
requiring that the existing test verify this new functionality.
This commit adds:
- containerd config specific to runtime-rs
- extensions to the existing test functions to cover vfio-ap
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The following have been made for the enablement:
1. Make `MediatedPci` and `MediatedAp` in `VfioDeviceType`
2. Make HostDevice without BDF for `MediatedAp`
3. Add `CCW` to VFioBusMode and set it to VfioConfig as `bus_type`
4. Return `vfio-ap` driver type for `CCW` bus type
5. Set `bus_mode` for `VfioDevice` based on `bus_type`
6. Set `vfio-ap` to the agent device's `field_type`
7. Prepare a different argument for `vfio-ap` for QMP command
8. Set None to all PCI relevant fields
Please keep in mind that `vfio-ap` does not belong to any
type of port topology like PCI (e.g., root or switch)
because devices on s390x are controlled by CCW.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Until now, we relied on `VMROOTFSDRIVER` to determine
whether a system uses a native CCW bus.
However, this method is not canonical and can be error-prone
depending on the configuration.
This commit introduces a new function that checks
for the presence of CCW bus infrastructure in sysfs
and verifies that native mainframe drivers are available.
It replaces all previous uses of the old detection method.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add the small and normal variants of the qemu-runtime-rs
tests to the required-tests list now that they are stable.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
An oci-spec can be passed to the runtime without annotations
(e.g., `ctr run`). In this case, runtime panics with:
```
src/runtime-rs/crates/runtimes/src/manager.rs:391: called `Option::unwrap()` on a `None` value
```
This commit checks if the annotation is None, and instantiates
the hashmap as an empty map if it is missing. It also adds a None
check for `netns`.
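A minimal sketch of the defensive handling, assuming the annotations field is
an Option<HashMap<String, String>> as in the oci-spec crates (simplified,
illustrative only):
```rust
use std::collections::HashMap;

struct Spec {
    annotations: Option<HashMap<String, String>>,
}

fn sandbox_annotations(spec: &Spec) -> HashMap<String, String> {
    // Previously this was effectively `spec.annotations.unwrap()`, which
    // panics for specs created without annotations (e.g. `ctr run`).
    // Falling back to an empty map keeps the runtime from panicking.
    spec.annotations.clone().unwrap_or_default()
}

fn main() {
    let spec = Spec { annotations: None };
    assert!(sandbox_annotations(&spec).is_empty());
}
```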
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Currently, the protection device configuration is constructed
automatically even if `confidential_guest` is not set.
This commit puts a condition to check the flag and allows the
construction accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Updates to the shim-v2 build and the binaries.sh script.
Making sure that both variants "confidential" AND
"nvidia-gpu-confidential" are handled.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Create an initial version of our toolchain policy as agreed in
Architecture Committee meetings and the PTG
Fixes: #9841
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As tags are mutable and digests are not, let's pin our image
by digest to give our CI a better chance of stability
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Swap out the hard-coded nginx registry and versions for reading
the test image details from versions.yaml,
which can also ensure that the quay.io mirror is used
rather than the docker hub versions which can hit pull limits
- Try setting imagePullPolicy Always to fix issues with the arm CI
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Using make tarball targets for tools locally, binaries may exist
for both debug and release builds. In this case, cryptic errors
are shown as we try to install multiple binaries.
This change requires exactly one binary to be found and errors out
in other cases.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
When tests regress, the CI wait time can increase significantly
with the current kubectl_retry attempt logic. Thus, align with
other tests and remove kubectl_retry invocations. Instead, rely on
proper timeouts.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
SEV-SNP machine is failing due to nydus not being deployed in the
machine.
We cannot easily contact the maintainers due to the US Holidays, and I
think this should become a criterion for a machine not to be added as
required again (different regions coverage).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
So far we've only been building the initrd for the nvidia rootfs.
However, we're also interested in having the image being used for a few
use-cases.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We hit a case where gatekeeper was failing because it thought the WIP check
had failed, but the PR had been edited to remove that from the title after
the check ran. We should listen to edits and unlabels of the PR to ensure that
gatekeeper doesn't get outdated in situations like this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When using the multiInstallSuffix we must be cautious when using the shim
name, as qemu-nvidia-gpu* doesn't actually have a matching QEMU itself,
but should rather be mapped to:
qemu-nvidia-gpu -> qemu
qemu-nvidia-gpu-snp -> qemu-snp-experimental
qemu-nvidia-gpu-tdx -> qemu-tdx-experimental
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Fixes: #12123
`include` in #12069, introduced to choose a different runner
based on component, leads to another set of redundant jobs
where `matrix.command` is empty.
This commit gets back to the `runs-on` solution, but makes
the condition human-readable.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Containerd configuration syntax (`config.toml`) varies across versions,
requiring per-version logic for fields like `runtime`.
However, testing confirms that containerd LTS (1.7.x) and newer
versions fully support the v3 schema for the nydus remote snapshotter.
This commit changes the previous containerd v1 settings in `config.toml`.
Instead, it introduces a unified v3-style configuration for nydus, which
can be valid for LTS and active containerds.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In the CoCo tests jobs @wainersm created a report tests step
that summarises the jobs, so they are easier to understand and
get results for. This is very useful, so let's roll it out to all the bats
tests.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In order to have a better way to set things up using a toml editor, we
should take the containerd approach and actually have everything
uncommented. This will help us to unify how we deal with such values in
the future from the kata-deploy POV.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We need to ensure that we do not blindly append nor blindly override the
kernel parameters set by default, but rather modify the values in case
they exist, and append in case they do not.
Now we're actually making the golang and rust runtimes behave the same, as so
far they were behaving differently, each version wrong in its own way.
:-p.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
QEMU netdev_add QMP command requires the 'mq' (multi-queue) argument
to be of boolean type (`true` / `false`). In runtime-rs the virtio-net
device hotplug logic currently passes a string value (e.g. "on"/"off"),
which causes QEMU to reject the command:
```
Invalid parameter type for 'mq', expected: boolean
```
This patch modifies `hotplug_network_device` to insert 'mq' as a proper
boolean value of `true`. This fixes sandbox startup failures when
multi-queue is enabled.
Fixes #12136
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
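To illustrate the difference, a small sketch (assuming the serde_json crate; the field set and ids are illustrative, not the actual hotplug_network_device code):
```
// QMP netdev_add arguments: 'mq' must be a JSON boolean, not a string.
use serde_json::{json, Value};

fn netdev_add_args(multi_queue: bool) -> Value {
    json!({
        "type": "tap",
        "id": "net0",
        // Passing "on"/"off" here is what QEMU rejects with
        // "Invalid parameter type for 'mq', expected: boolean".
        "mq": multi_queue,
    })
}

fn main() {
    let args = netdev_add_args(true);
    assert!(args["mq"].is_boolean());
    println!("{args}");
}
```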
Allow users to build the Kata Agent using INIT_DATA=no to disable the
detect_initdata_device() code loop and associated debug log output.
Future additional improvements related to Init Data are tracked by #11532.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
On 69c4fc4e76, I've mistakenly changed the
nvidia-gpu podOverhead while I should only have changed the TEE
nvidia-gpu ones.
Let's move it back to its original value.
Reported-by: Joji Mekkattuparamban <jojim@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This setting is not needed anymore with Ubuntu 25.10
and the newest QEMU releases for TDX by Ubuntu.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
With issue 11777 being resolved, this commit enables openvpn
policy testing. The remaining work on the security policy
required to successfully run this test case was to enable UDP
ports for Service kinds and to use the mount path's last component
instead of the volume name to construct the expected storage
source path.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Use the mount path's last component instead of the volume name to
construct the expected storage source path. Example: Name of a
volumeMount is 'openvpn-config' and its mountPath is
'/etc/openvpn/'. Without this change, we use 'openvpn-config' to
calculate the expected storage source path. However, we need to
use 'openvpn', because the shim uses the basename of the
destination path as the source suffix and not the volume name.
For reference, see `fs_share_linux.go`'s 'ShareFile' function
where the filename variable uses 'filepath.Base(m.Destination)'.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
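The shim-side logic lives in Go (filepath.Base), but the rule itself is easy to illustrate with a small Rust sketch (the helper name is hypothetical):
```
// The expected storage source suffix comes from the last component of the
// mountPath, not from the volumeMount name.
use std::path::Path;

fn storage_source_suffix(mount_path: &str) -> Option<&str> {
    // A trailing slash ("/etc/openvpn/") is ignored, yielding "openvpn".
    Path::new(mount_path.trim_end_matches('/'))
        .file_name()
        .and_then(|n| n.to_str())
}

fn main() {
    assert_eq!(storage_source_suffix("/etc/openvpn/"), Some("openvpn"));
}
```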
Allow Service kinds to use the UDP protocol for ports. An example is
the openvpn-server-service.yaml file part of the openvpn CI test.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
We've added logic to properly do the bookkeeping of the TEE keys when
using NFD **AND** creating the runtime classes. However, we need to also
take into consideration the case where the runtimeclasses are being
created by the helm template, and in that case we just update what helm
has deployed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove the nvrc.smi.srs=1 parameter from the kernel command line.
In CC use cases, the attestation agent is expected to set the GPU
ready state. For the CUDA vectorAdd case where attestation agent
is not being used, we set the ready state by adding the kernel
command line parameter through an annotation.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add an allow-all policy for the CC GPU tests and ensure the init-data
device is being created (hypervisor annotations).
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The add_allow_all_policy_to_yaml in tests_common.sh needs some
improvements so that this function can support pod manifests with
different resource kinds. For now, moving the Secret definition
to the bottom so that we can create a default policy for the Pod.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The qemu-nvidia-gpu handlers should not cause is_aks_cluster to
return 1. Otherwise, CI logic will assume these hypervisors run on
AKS hosts, see the following message in CI w/o this change:
INFO: Adapting common policy settings for AKS Hosts
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
We currently start a pod that does a `wget` to the KBS address, and
fails after 5 seconds.
By the time it fails and reports back, we can see that KBS is actually
running, but the workflow failed as the checker failed. :-/
Let's give it more time for the KBS to show up, and the flakiness should
go away.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Utilize Kubelet's Pod Resource API to determine device allocations
for the Pod during sandbox creation. Use CDI files to translate the device
IDs to corresponding device paths and perform device injection.
Fixes #12009
Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
Use the pod name variable so that kubectl wait finds the pod. Currently,
kubectl waits for nvidia-nim-llama-3-2-nv-embedqa-1b-v2, not for
nvidia-nim-llama-3-2-nv-embedqa-1b-v2-tee
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Introduce a new devkit parameter which will produce a rootfs
without chiselling. This results in a larger rootfs with various
packages and binaries being included, for instance, enabling the
use of the debug console.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
There are rust packages being cloned and built inside
tools/packaging/kata-deploy/local-build/build folder, which may mislead
those packages to think they are part of the kata root workspace.
Exclude the directory to avoid that.
Reported-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The person who introduced the check, someone named Fabiano Fidêncio,
forgot a `$` in a variable assignment.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The snp CI has not been required for a while and has recently been
broken, so comment it out from the list of required jobs.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The run-nydus tests are not stable and blocking PRs, so make them
non-required temporarily until they can be looked at
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Enable auto-generate policy on cbl-mariner Hosts for
qemu-coco-dev-runtime-rs if the user didn't specify an
AUTO_GENERATE_POLICY value.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We will re-enable this one later on once the changes to properly cold
plug multi GPUs are merged.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's just move the podOverhead to a gigantic value, as we do need pod
sandboxes as big as that, and we've noticed QEMU being OOM killed with
smaller overheads.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Those need to pull the models inside the guest, and the guest has 50% of
its memory "allowed" to be used as tmpfs, so, we gotta usa the RAM that
we have.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Yes, we're dealing with a combination of large images and image-rs'
concurrent image layer handling not being optimal.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We cannot use the same format used for docker, as it includes username
and password, while what's expected when using Trustee does not.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that we've bumped Trustee to a version that supports the NVIDIA
remote verifier, let's re-enable the tests.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Right now we have only been passing the env var to the deployment
script, but we really need to pass it to the tests script as well.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Try and reduce the page limit of each job request to avoid the chances of
us tripping over github's 10s api limit.
All credit to @burgerdev for the investigation and suggestion!
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add the related block queue_size and num_queues for volumes based on
block devices. This is very important for IO performance.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Previously, CLH's disk queue_size and num_queues settings were
hardcoded; they should be configurable with user-defined values.
This commit addresses that issue by passing these settings through.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Usually, we pass the related block config via BlockConfig; to let users
set queue_size and num_queues in a user-friendly way, both fields
are introduced in BlockConfig.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add two fields of queue_size and num_queues in BlockDeviceInfo to allow
users to set the related items via configurations
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add related items for block device queue size and num queues in
configurations. And users can set the related items by configurations.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
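A rough sketch of how these pieces fit together; the field names and the fallback defaults are assumptions for the sketch, not the values shipped in the configuration:
```
// User-facing configuration carries the queue settings, which are copied
// into the per-device BlockConfig when a block-backed volume is created.
#[derive(Default)]
struct BlockDeviceInfo {
    block_device_queue_size: u32,
    block_device_num_queues: u32,
}

#[derive(Debug, Default, PartialEq)]
struct BlockConfig {
    queue_size: u32,
    num_queues: u32,
}

fn block_config_from(info: &BlockDeviceInfo) -> BlockConfig {
    BlockConfig {
        // Fall back to sketch defaults when the items are not configured.
        queue_size: if info.block_device_queue_size > 0 { info.block_device_queue_size } else { 1024 },
        num_queues: if info.block_device_num_queues > 0 { info.block_device_num_queues } else { 1 },
    }
}

fn main() {
    let cfg = block_config_from(&BlockDeviceInfo::default());
    assert_eq!(cfg, BlockConfig { queue_size: 1024, num_queues: 1 });
}
```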
The current implementation causes issues with the Agent Policy
nontee CI tests, as Kata-Agent does not allow any configuration
for `count(Linux.Resources.Devices) == 0`.
This commit ensures that Linux.Resources.Devices, including all its
values, is completely cleared from the OCI Runtime Specification before
being passed to the Kata-Agent.
This addresses the CI failure by enforcing the required empty state for
the Devices cgroup configuration.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Previously, CopyFile implementation attempted to reuse existing guest
paths for subsequent containers within the same Pod. This prevented
correct bind mounting of shared configurations (e.g., ConfigMaps,
Service Accounts) into the later containers within a multi-container
pod, as they lacked their own allocated guest path.
This commit modifies the logic to create a unique guest path for every
container that requires file propagation.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Crates with no workspace setup would think they are in the root
workspace, which is not yet ready for them. Exclude
them for now.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Add Cargo.toml at repo root, use this root workspace for as many Rust
components of Kata Containers as possible. This would enable us to
share a common Cargo.lock file, and reduce the noise from dependabot.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Similar to #12075, bump backtrace to 0.3.76 to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
As a side effect this brought in loads of other crate changes, which I think are due
to it bumping the local dependencies that this package builds on.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump flate2 and backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump flate2 and backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since the network device hotplug is an asynchronous operation,
it's possible that the hotplug operation has returned but
the network device isn't ready in the guest yet, so it's better to
retry this operation and wait until the device is ready in the guest.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This makes the user experience better, as the admin can deploy Kata
Containers without having to download / set up any additional file.
Of course, if the admin wants something more specific, examples are
provided.
Tests and documentation are updated to reflect this change.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The current format of genpolicy request logs looks a bit like JSON, but
it does not parse out of the box and needs post-processing with sed, for
example.
This commit changes the log format to jsonlines[1], which is basically
newline-delimited compact JSON values. Compared to standard JSON, this
allows streaming output. The resulting file can be converted and
processed programmatically, for example with `jq -s`.
The fields are also adjusted to match the field names of TestRequest, so
that the logged requests can be used immediately in tests.
[1]: https://jsonlines.org/
Signed-off-by: Markus Rudy <mr@edgeless.systems>
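A tiny sketch of the jsonlines idea (assuming the serde_json crate; the `ep` and `request` field names mirror the description above but are illustrative):
```
// Each record is one compact JSON value terminated by a newline, so the log
// can be streamed while it is written and later collected with `jq -s`.
use serde_json::json;
use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut out = std::io::stdout().lock();
    let records = [
        ("CreateContainerRequest", json!({"container_id": "c1"})),
        ("ExecProcessRequest", json!({"container_id": "c1"})),
    ];
    for (ep, request) in records {
        // One JSON object per line.
        writeln!(out, "{}", json!({"ep": ep, "request": request}))?;
    }
    Ok(())
}
```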
This should allow keeping future diffs minimal.
The files were formatted with `jq -S`, which should be used after future
updates to the test case files.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Storing the request type outside the request object has two benefits:
* The request JSON passed to the Rego engine matches more closely what
would be passed by the agent (no `type` field).
* If we want to update the requests, it's easier to insert them into a
dedicated field, rather than inserting them and amending the type
field.
This is a first step towards programmatic updates of testcase files.
This commit also adds the 'Request' suffix to the test case enum, such
that we can use the 'ep' input for allow_request directly.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Add comments documenting the "EnableIOThreads" flag as a switch
for the virtio-blk (based on IndepIOThreads) driver.
Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
Make a hotplugged virtio-blk device attach to independent IOThread 0 by
default when EnableIOThreads and IndepIOThreads are enabled.
Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
QEMU already supports device_add with iothread args.
Give Kata the ability to hotplug PCI devices with IOThreads.
Currently, only QEMU is supported as the hypervisor; not sure it
works for stratovirt.
Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
Move the original virtio-scsi iothread and the new independent
iothread into a dedicated method for handling the related logic.
Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
Introduce an independent IOThread framework for Kata Containers.
What indep_iothreads is:
This new feature introduces a way to pre-allocate IOThreads
for the QEMU hypervisor (other hypervisors may be able to support it too).
Independent IOThreads enable IO to be processed in a separate thread,
generally improving the performance of each module by avoiding
running it in the QEMU main loop.
Why indep_iothreads is needed:
In the Kata Containers implementation, many devices are based on the
hotplug mechanism. The real workload container may not share the same
lifecycle as the VM. It may be necessary to hotplug/unplug new disks
or other devices without destroying the VM. So we can keep the
IOThreads with the VM as an IOThread pool (some devices need multiple
iothreads for performance, like virtio-blk vq-mapping), and the hotplugged
devices can attach to/detach from an IOThread according to business needs.
At the same time, QEMU also supports "x-blockdev-set-iothread"
to change iothreads (but it needs to stop the VM for data safety).
Current QEMU has many devices that support iothreads: virtio-blk,
virtio-scsi, virtio-balloon, monitor, colo-compare... etc.
How it works:
Add a new item named "indep_iothreads" to the hypervisor struct in toml.
The default value is 0, and it reuses the original "enable_iothreads" as
the switch. If "indep_iothreads" != 0 and "enable_iothreads" = true,
it will add the iothread objects (indepIOThreadsPrefix_No) via QMP at VM startup.
The first user is virtio-blk, which will attach indep_iothread_0
by default when iothread is enabled for virtio-blk.
Thanks
Chen
Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
In commit 1f95d9401b
runtime-rs: change representation of default_vcpus from i32 to f32,
when the vCPU number is less than 1.0, directly converting the
floating-point number back to an integer truncates it to 0. Therefore,
it needs to be rounded up before converting it back to an integer.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
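The fix boils down to a one-line rounding rule, sketched here with an illustrative helper:
```
// A fractional vCPU count below 1.0 must be rounded up before converting
// back to an integer, otherwise the cast truncates it to 0.
fn vcpus_to_int(default_vcpus: f32) -> u32 {
    default_vcpus.ceil() as u32
}

fn main() {
    assert_eq!(vcpus_to_int(0.5), 1); // a plain cast would yield 0
    assert_eq!(vcpus_to_int(2.0), 2);
}
```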
Update the `cpath` variable in the policy template to support the
optional `/passthrough` subpath used by runtime-rs. This ensures
that mount source path validation works correctly for both runtime
implementations.
By changing `cpath` to include the `(?:/passthrough)?` regular
expression fragment, we make the `/passthrough` segment optional.
The updated `cpath`:
`/run/kata-containers/shared/containers(?:/passthrough)?`
This single regex pattern now correctly matches both:
1. `/run/kata-containers/shared/containers/<sandbox-id>/...`
(runtime-go)
2. `/run/kata-containers/shared/containers/passthrough/<sandbox-id>/...`
(runtime-rs)
This elegantly resolves the compatibility issue without needing to add
separate or conditional logic to the policy rules, making the policy
more robust and maintainable.
Fixes: #12063
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
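The policy itself is Rego, but the optional segment behaves the same in any regex engine; a quick check with the Rust regex crate (the anchoring and trailing slash are illustrative):
```
// The single pattern matches both the runtime-go and runtime-rs layouts.
use regex::Regex;

fn main() {
    let cpath = r"^/run/kata-containers/shared/containers(?:/passthrough)?/";
    let re = Regex::new(cpath).unwrap();
    assert!(re.is_match("/run/kata-containers/shared/containers/sandbox-1/rootfs"));
    assert!(re.is_match("/run/kata-containers/shared/containers/passthrough/sandbox-1/rootfs"));
}
```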
Add three example values files to make it easier for users to try out
different Kata Containers configurations:
- try-kata.values.yaml: Enables all available shims
- try-kata-tee.values.yaml: Enables only TEE/confidential computing shims
- try-kata-nvidia-gpu.values.yaml: Enables only NVIDIA GPU shims
These files use the new structured configuration format and serve as
ready-to-use examples for common deployment scenarios.
Also update the README.md to document these example files and how to use them.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the helm_helper function in gha-run-k8s-common.sh to use the
new structured configuration format instead of the legacy env.* format.
All possible settings have been migrated to the structured format:
- HELM_DEBUG now sets root-level 'debug' boolean
- HELM_SHIMS now enables shims in structured format with automatic
architecture detection based on shim name
- HELM_DEFAULT_SHIM now sets per-architecture defaultShim mapping
- HELM_EXPERIMENTAL_SETUP_SNAPSHOTTER now sets snapshotter.setup array
- HELM_ALLOWED_HYPERVISOR_ANNOTATIONS now sets per-shim allowedHypervisorAnnotations
- HELM_SNAPSHOTTER_HANDLER_MAPPING now sets per-shim containerd.snapshotter
- HELM_AGENT_HTTPS_PROXY and HELM_AGENT_NO_PROXY now set per-shim agent proxy settings
- HELM_PULL_TYPE_MAPPING now sets per-shim forceGuestPull/guestPull settings
- HELM_EXPERIMENTAL_FORCE_GUEST_PULL now sets per-shim forceGuestPull/guestPull
The test helper automatically determines supported architectures for
each shim (e.g., qemu-se supports s390x, qemu-cca supports arm64,
qemu-snp/qemu-tdx support amd64, etc.) and applies per-shim settings
to the appropriate shims based on HELM_SHIMS.
Only HELM_HOST_OS remains in legacy env.* format as it doesn't have
a structured equivalent yet.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add comprehensive documentation for the new structured configuration
format, including:
- Migration guide from legacy env.* format
- List of deprecated fields with removal timeline (2 releases)
- Examples of the new structured format
- Explanation of key benefits
- Backward compatibility notes
The documentation makes it clear that the legacy format is deprecated
but will continue to work during the transition period.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This commit adds backward compatibility support to ensure existing
configurations using the legacy env.* format continue to work.
The helper functions now check for legacy env.* values first, and
only fall back to the new structured format if legacy values are
not set. This allows for gradual migration without breaking
existing deployments.
Backward compatibility is maintained for:
- env.shims, env.shims_* (per architecture)
- env.defaultShim, env.defaultShim_* (per architecture)
- env.allowedHypervisorAnnotations
- env.snapshotterHandlerMapping_* (per architecture)
- env.pullTypeMapping_* (per architecture)
- env.agentHttpsProxy, env.agentNoProxy
- env._experimentalSetupSnapshotter
- env._experimentalForceGuestPull_* (per architecture)
- env.debug
Legacy env vars (SHIMS, DEFAULT_SHIM, etc.) are still set in the
DaemonSet when using the old format to maintain full compatibility.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This commit introduces a new structured configuration format for
configuring Kata Containers shims in the Helm chart. The new format
provides:
- Per-shim configuration with enabled/supportedArches
- Per-shim snapshotter, guest pull, and agent proxy settings
- Architecture-aware default shim configuration
- Root-level debug and snapshotter setup configuration
All shims are disabled by default and must be explicitly enabled.
This provides better type safety and clearer organization compared
to the legacy env.* string-based format.
The templates are updated to use the new structure exclusively.
Backward compatibility will be added in a follow-up commit.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As some of the global vars can be empty, we should actually check
their _FOR_ARCH version instead.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're making the values.yaml more user friendly, we actually have to
handle the https_proxy and no_proxy entries per shim, instead of having
this globally available, as this will only affect images being pulled
inside the guest (as in, when using TEE variations of the shims).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Adds a practical set of kernel configs used by docker-in-docker and kind
for network bridging and filtering. It also includes the matching IPv6
support to allow tools like kind that require IPv6 network policies to
work out of the box.
This support includes:
- nftables reject and filtering support for inet/ipv4/ipv6
- Bridge filtering for container-to-container traffic
- IPv6 NAT, filtering, and packet matching rules for network policies
- VXLAN and IPsec crypto support for network tunneling
- TMPFS POSIX ACL support for filesystem permissions
The configs are organized across fragment files:
- common/fs.conf: TMPFS ACL support
- common/crypto.conf: IPsec/VXLAN crypto algorithms
- common/network.conf: VXLAN, IPsec ESP, nftables bridge/ARP/netdev
- common/netfilter.conf: IPv6 netfilter stack and nftables advanced features
Fixes: #11886
Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
Re-enable AUTO_GENERATE_POLICY for coco-dev Hosts, unless PULL_TYPE is
"experimental-force-guest-pull", or the caller specified a different
value for AUTO_GENERATE_POLICY.
Auto-generated Policy has been disabled accidentally and recently for
these Hosts, by a GHA workflow change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Don't skip parsing the pause container image anymore when using the
recently updated AKS pause container handling - i.e. when
pause_container_id_policy == "v2".
This was the easiest CI fix for guest pull + new AKS given the *current*
tests. When adding *new* UID/GID/AdditionalGids tests in the future,
these workarounds might need additional updates.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The update removes the deprecated adler crate from our dependencies. In
addition, we're switching to the default backend (miniz_oxide), which is
a pure Rust implementation and thus much more portable. The performance
impact is negligible, because flate2 is only used for initdata
decompression, which is limited to a couple of MiB anyway.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The update removes the deprecated adler crate from our dependencies. In
addition, we're switching to the default backend (miniz_oxide), which is
a pure Rust implementation and thus much more portable. The performance
impact is acceptable for a developer tool.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The GitHub API documentation suggests that `Authorization: Bearer <YOUR-TOKEN>`
is the way to set the auth token, but it also mentions that `token`
should work, so it's unclear if this will help much, but it shouldn't hurt.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The formatting wasn't quite right, so the `qemu-coco-dev-runtime-rs`
hypervisor wasn't skipping this test
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Introduce a flag `DEFSTATICRESOURCEMGMT_COCO` for setting static sandbox
resource management, defaulting to true, and set it to the
`static_sandbox_resource_mgmt` item in the configuration.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When testing this branch, on several occasions the Delete
AKS cluster step has hung for multiple hours, so add a timeout
to prevent this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since it aligns with the create_container_timeout definition in
runtime-go, we need to set the value in configuration.toml in seconds,
not milliseconds. We must also convert it to milliseconds when the
configuration is loaded for request_timeout_ms.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
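The conversion itself is trivial; a sketch with illustrative names:
```
// The TOML item is specified in seconds; the internal request_timeout_ms
// field expects milliseconds, so convert when the configuration is loaded.
fn request_timeout_ms(create_container_timeout_secs: u64) -> u64 {
    create_container_timeout_secs * 1_000
}

fn main() {
    assert_eq!(request_timeout_ms(60), 60_000);
}
```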
Create non-tee runtime class for runtime-rs qemu CoCo development
without requiring TEE hardware. Based on the qemu-runtime-rs
config, but with updated guest image, kernel and shared_fs
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The new environment of Power runners for agent checks is causing two test case failures
w.r.t. selinux and inode, which need further understanding and are mostly an issue
due to the environment change and not to do with the agent.
Fall back to running agent checks on the original ppc64le self-hosted runners.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
As the arm 22.04 runner isn't working at the moment, let's test the
24.04 version to see if that is better.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The fact that we were not explicitly setting the VMM was leading to us
testing with the default runtime class (qemu). :-/
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
By doing this, the ones interested in RISC-V support can still have
good visibility of its state, without the extra noise in our CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We have had those tests broken for months. It's time to get rid of
those.
NOTE that we could easily revert this commit and re-add those tests as
soon as we find someone to maintain and be responsible for such
integration.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As stratovirt CI was removed in #12006 we should remove the
jobs from required.
Also the docker tests have been commented out for months, and
we are considering removing them, so clean this file up.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Sometimes it's hard to enumerate all blacklisted namespaces; let's add a
regular-expression-based filter to allow specifying only the namespaces
that should be mutated.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The previous setting of Mount.type to `bind` is wrong; for local
storage, the type of the Mount should be `local`.
This commit corrects the type to "local".
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The disable_guest_empty_dir argument order is wrong, which causes
the bool value to be incorrect and leads to a wrong result.
This commit corrects the parameter order.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This is a bump pre-release, which brings several fixes and some
improvements related to initData, and NVIDIA's remote verifier.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The test case designed to verify policy failures due to an "unexpected
capability" was misconfigured. It was using "CAP_SYS_CHROOT" as the
unexpected capability to be added.
This configuration was flawed for two main reasons:
1. Incorrect Syntax: Kubernetes Pod specs expect capability names without
the "CAP_" prefix (e.g., "SYS_CHROOT", not "CAP_SYS_CHROOT").
This made the test case's premise incorrect from a K8s API perspective.
2. Part of Default Set: "SYS_CHROOT" is already included in the
`default_caps` list for a standard container. Therefore, adding it would
not trigger a policy violation, defeating the purpose of the
"unexpected capability" test.
Furthermore, a related issue was observed where a malformed capability
like "CAP_CAP_SYS_CHROOT" was being generated, causing parsing failures
in the `oci-spec-rs` library. This was a symptom of incorrect string
manipulation when handling capabilities.
This commit corrects the test by selecting "SYS_NICE" as the unexpected
capability. "SYS_NICE" is a more suitable choice because:
- It is a valid Linux capability.
- It is relatively harmless.
- It is **not** part of the default capability set defined in
`genpolicy-settings.json`.
By using "SYS_NICE", the test now accurately simulates a scenario where
a Pod requests a legitimate but non-default capability, which the policy
(generated from a baseline Pod without this capability) should correctly
reject. This change fixes the test's logic and also resolves the
downstream `oci-spec-rs` parsing error by ensuring only valid capability
names are processed.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Detected a format mismatch in OCI Spec Capabilities fields between
`runtime-rs` (no `CAP_` prefix) and `runtime-go` (with `CAP_` prefix).
This introduces a normalization of caps in match_caps(p_caps, i_caps).
This ensures robust and consistent processing of Capabilities regardless
of whether the OCI Spec originates from `runtime-rs` or `runtime-go`.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
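The actual check lives in the policy; here is a Rust sketch of the same normalization idea (function names are illustrative):
```
// Strip the "CAP_" prefix on both sides before comparing, so specs from
// runtime-rs ("SYS_CHROOT") and runtime-go ("CAP_SYS_CHROOT") are treated
// identically.
fn normalize_cap(cap: &str) -> String {
    cap.trim_start_matches("CAP_").to_uppercase()
}

fn match_caps(p_caps: &[&str], i_caps: &[&str]) -> bool {
    let policy: Vec<String> = p_caps.iter().map(|c| normalize_cap(c)).collect();
    i_caps.iter().all(|c| policy.contains(&normalize_cap(c)))
}

fn main() {
    assert!(match_caps(&["CAP_SYS_CHROOT", "CAP_NET_RAW"], &["SYS_CHROOT"]));
}
```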
Currently, the initdata module only detects virtio-blk devices
(/dev/vd*) when searching for the initdata block device. However,
when using virtio-scsi, the devices appear as /dev/sd* in the
guest, causing the initdata detection to fail.
This commit extends the device detection logic to support both
device types:
- virtio-blk devices: /dev/vda, /dev/vdb, etc.
- virtio-scsi devices: /dev/sda, /dev/sdb, etc.
This commit aims to address the issue of the initdata device not being
found when using virtio-scsi.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
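A simple sketch of the extended detection rule (the helper name is hypothetical):
```
// Accept both virtio-blk (/dev/vdX) and virtio-scsi (/dev/sdX) names when
// scanning for the initdata block device.
fn is_initdata_candidate(dev: &str) -> bool {
    let name = dev.trim_start_matches("/dev/");
    (name.starts_with("vd") || name.starts_with("sd"))
        && name.len() > 2
        && name[2..].chars().all(|c| c.is_ascii_alphabetic())
}

fn main() {
    assert!(is_initdata_candidate("/dev/vda"));
    assert!(is_initdata_candidate("/dev/sdb"));
    assert!(!is_initdata_candidate("/dev/nvme0n1"));
}
```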
Thankfully there's only one piece that's still SNP specific (for the
supported TEEs). Let's adjust it so we can have an easy and smooth
execution when adding a TDX CI machine.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
There are several changes needed in order to get this test working with
CC, and yet we still are skipping it.
Basically, we need to:
* Pull an authenticated image inside the guest, which requires:
* Using Trustee to release the credential
* We still depend on a PR to be merged on Trustee side
* https://github.com/confidential-containers/trustee/pull/1035
* We still depend on a Trustee bump (including the PR above) on our
side
Apart from those changes, I ended up "duplicating" the tests by adding a
"-tee" version of those, which already have:
* The proper kbs annotations set up
* Dropped host mounts
* Increases the memory needed
Last but not least, as "bats" probably means "being a terrible script",
I had to re-arrange a few things otherwise the tests would not even run
due to bats-isms that I am sincerely not able to pin-point.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We added the tests using virtio-9p as we knew it'd require incremental
changes to be able to use any kind of guest-pull method.
Now, as in the coming commits we'll be actually ensuring that guest-pull
works and is in use, we can enforce the experimental_force_guest_pull
usage for the nvidia cases.
Note: We're using experimental_force_guest_pull instead of
nydus-snapshotter due to stability concerns with the snapshotter.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It was just missed when adding those configurations.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It takes either a shim name or "", but we were treating this (thankfully
only in this specific file) as a boolean.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Adjust output to the setup_file and teardown_file behavior.
With this, we will be able to observe relevant logging rather than
adding to the output variable.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Previous commit enabled getting the physical address reduction from
processor but just stored it for later use. This commit adds handling
of the value to ProtectionDevice and enables the QEMU driver to use it.
Signed-off-by: Pavel Mores <pmores@redhat.com>
An implementation of cbitpos acquisition is supplied that was missing
so far. We also get the physical address reduction value from the same
source (CPUID Fn8000_001f function). This has been hardcoded at 1 so far,
following the Go runtime example, but it's better to get it from the
processor.
Signed-off-by: Pavel Mores <pmores@redhat.com>
- version.rs gets generated from version.rs.in
- version.rs.in contains values read from VERSION
- so version.rs (and maybe other Agent files too) must be
re-generated when the VERSION file changes
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The new image reference has changed to mcr.microsoft.com/oss/v2/kubernetes/pause:3.6
from mcr.microsoft.com/oss/kubernetes/pause:3.6.
The new image uses UID=0, GID=0 by default, while the older image had
UID=65535, GID=65535.
There is a new pause_container_id_policy field in genpolicy-settings.json, informing
genpolicy about the way AdditionalGids gets updated - "v1" for the older behavior
and "v2" for the newer AKS version:
- When using v1, the default value of AdditionalGids is {65535}.
- When using v2, the default value of AdditionalGids is {}.
UID=65535 and GID=65535 are still hard-coded by default in genpolicy-settings.json.
We might be able to remove/ignore these fields in the future, if we'll stop relying
on policy::KataSpec::get_process_fields to use these fields.
A new CI function adapt_common_policy_settings_for_aks() changes the pause container
UID, GID, pause_container_id_policy, and image ref settings values when testing on
AKS Hosts - i.e., when testing coco-dev or mariner Hosts.
The genpolicy workarounds for the unexpected behavior with guest pull enabled have
been improved to use the current container's GID instead of hard-coding GID=0 as the
guest pull default. Also, AdditionalGids gets updated when the current container's GID is
changing, instead of always changing the AdditionalGids at the very end of
policy::AgentPolicy::get_container_process(), when the relevant evolution of the GID
value was no longer available.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Make it easier to understand the source of the UID/GID/AdditionalGids
values from the container in the auto-generated policy.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Parallelize busybox builds to build a bit faster and create the
build directory prior to Docker execution, which, in my
environment, helps with permission issues when building busybox
without the kata-containers/build directory existing beforehand.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Correct the hardcoded value of disable_guest_empty_dir, instead,
we use the real value of it which comes from the configuration.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
A sandbox annotation that determines if it should create Kubernetes
emptyDir mounts on the guest filesystem.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
It determines whether Kubernetes emptyDir mounts should be created on the
guest filesystem. If enabled, the runtime will not create Kubernetes
emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
be created on the host and shared via virtio-fs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Let's ensure Trustee is deployed as some of the tests rely on images that
live behind authentication. /o\
The approach taken here to deploy Trustee is exactly the same one taken
on the other CoCo tests, apart from an env var passed to ensure we're
using the NVIDIA remote verifier (which will come in handy very very
soon).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This isn't really related to remote hypervisor though it was useful for
its debugging. It's a small helper I've been using regularly during
development for quite some time that I think might be useful more broadly.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The remote hypervisor launches no VM, it just instructs the Cloud API
Adaptor to do so, therefore it has no need for an image or initrd to boot
from and should be exempt from the mandate for one or the other to be
specified.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The go runtime's .proto file - which is also used by the Cloud API
Adaptor - puts the Hypervisor service into the "hypervisor" package.
runtime-rs has to do the same to avoid an "unimplemented" error.
Signed-off-by: Pavel Mores <pmores@redhat.com>
With the change made to the matrix when the CC GPU runner was added,
there was a change in the job name (@sprt saw that coming, but I
didn't).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Same deal as the previous commit, just enabling the tests here, with the
same list of improvements that we will need to go through in order to
get it working in a perfect way.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
While the primary goal of this change is to detect regressions to the
NVIDIA SNP GPU scenario, various improvements to reflect a more
realistic CC setting are planned in subsequent changes, such as:
* moving away from the overlayfs snapshotter
* disabling filesystem sharing
* applying a pod security policy
* activating the GPUs only after attestation
* using a refined approach for GPU cold-plugging without requiring
annotations
* revisiting pod timeout and overhead parameters (the podOverhead value
was increased due to CUDA vectorAdd requiring about 6Gi of
podOverhead, as well as the inference and embedqa requiring at least
12Gi, respectively, 14Gi of podOverhead to run without invoking the
host's oom-killer. We will revisit this aspect after addressing
points 1. and 2.)
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
For the nvidia-gpu-snp and nvidia-gpu-tdx we must set containerd to
allow the CDI annotation to be passed down.
This solution may become obsolete soon enough, but the cleanest way to
have it properly working is by adding it here (even if we remove it
before the next release).
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It's been noticed that as more RAM is needed to run the CC tests, we
also need to update the podOverhead of the NVIDIA CC runtime classes to
avoid getting OOM Killed.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Since there's something wrong with the cpu hotplug
on qemu-runtime-rs, disable this test temporarily.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
It should do nothing instead of returning an error when
hot-unplugging memory to a size smaller than the statically
plugged memory size.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
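A sketch of the intent (names and units are illustrative, not the actual resize code):
```
// When the requested size falls below the statically plugged memory, treat
// the hot-unplug as a no-op instead of returning an error.
fn resize_memory(requested_mb: u64, static_mb: u64, current_mb: u64) -> Result<u64, String> {
    if requested_mb < static_mb {
        // Nothing can be unplugged below the static baseline; keep current.
        return Ok(current_mb);
    }
    // ... otherwise perform the actual hotplug/unplug to reach requested_mb.
    Ok(requested_mb)
}

fn main() {
    assert_eq!(resize_memory(512, 2048, 2048), Ok(2048));
    assert_eq!(resize_memory(4096, 2048, 2048), Ok(4096));
}
```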
Since nerdctl's network hook calls the pselect6 syscall
via xtables-nft-multi, we'd better add it to the seccomp
whitelist.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Let's add a new NVIDIA machine, which later on will be used for CC
related tests.
For now the current tests are skipped in the CC capable machine.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's now make sure that we don't add duplicated values to any of our
entries, making the script as sane as possible for sequential runs.
Vibed with Cursor's help!
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's add some helper functions, not yet used, to avoid adding
duplicated items.
This idea is an expansion of Choi's idea to avoid setting duplicated
items, and it'll help on making the whole script idempotent on
sequential runs.
Vibed with Cursor's help!
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
I know, this is not simplifying things much for now, but it has a good
intent in the background and will serve as base for making the
kata-deploy helm chart more user friendly.
With that said, let's add ALLOWED_HYPERVISOR_ANNOTATIONS per arch, while
adding support to set something like "qemu:foo,bar clh:bar foobar
barfoo". Why? Because in the future we'll have a better way to set this
per shim (and the shim is per arch ...).
More details of what we'll do in the future are being discussed here:
https://github.com/kata-containers/kata-containers/issues/12024
Anyways, the variables are **DELIBERATELY** not exposed to the chart for
now, as those will be later on when addressing the issue mentioned
above.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When the runtimeClasses were added, as part of 7cfa826804, the
firecracker runtimeClass ended up missing from the dictionary.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The Firecracker installation docs had an outdated containerd configuration for the devmapper plugin.
This commit updates the instructions so that they are compatible with more recent versions of containerd.
Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
When added, I've mistakenly used the wrong test-type name, which is now
fixed and should be enough to trigger the tests correctly.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
On IBM actionspz P/Z runners, the following error was observed during
runtime tests:
```
host system doesn't support vsock: stat /dev/vhost-vsock: no such file or directory
```
Since loading the vsock module on the fly is not permitted, this commit
moves the runtime tests back to self-hosted runners for P/Z.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
On IBM actionspz Z runners, the following error occurs when running
`modprobe`:
```
modprobe: FATAL: Module bridge not found in directory /lib/modules/6.8.0-85-generic
```
Additionally, there are no files under `/lib/modules`, for example:
```
total 0
drwxr-xr-x 1 root root 0 Aug 5 13:09 .
drwxr-xr-x 1 root root 2.0K Oct 1 22:59 ..
```
This commit skips the `test_load_kernel_module` test if the module is
not found or if running `modprobe` is not permitted.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
On IBM actionspz Z runners, write operations on network interfaces
are not allowed, even for the root user.
This commit skips the `add_update_addresses` test if the operation
fails with EACCES (-13, permission denied).
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
On IBM actionspz Z runners, the ioctl system call is not allowed even
for the root user. There is likely an additional security mechanism
(such as AppArmor or seccomp) in place on Ubuntu runners.
This commit introduces a new helper, `is_permission_error()`,
which skips the test if ioctl operations in `reseed_rng()` are not
permitted.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The IBM actionspz Z runners mount /dev as tmpfs, while other systems
use devtmpfs. This difference causes an assertion failure for
test_already_baremounted.
This commit sets the detected filesystem for bare-mounted points
as the expected value.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The root filesystem for IBM actionspz Z runners is `btrfs` instead of `ext4`.
The error message differs when an unprivileged user tries to perform a bind mount.
This commit adjusts the handling of error messages based on the detected root
filesystem type.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Since qemu & cloud-hypervisor support cpu & memory
hotplug now, disable the static resource management
for qemu and cloud-hypervisor by default.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since qemu-coco-dev-runtime-rs and qemu-coco-dev have disabled
cpu & memory hotplug by enabling static_sandbox_resource_mgmt,
we should disable the cpu hotplug test for those two runtimes.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since qemu, cloud-hypervisor and dragonball now support
cpu hotplug on runtime-rs, enable the cpu hotplug test in CI.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This commit introduces the configuration flag `disable_guest_empty_dir`
to control the placement of Kubernetes emptyDir volumes.
By default, the value is set to `false`, maintaining the current
behavior of creating emptyDirs within the guest VM
When set to `true`, emptyDirs will be created on the host filesystem.
This is essential for scenarios where users need to share data between
the host and the guest VM via an emptyDir.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When handling a memory-based emptyDir, the runtime creates a tmpfs
mount inside the guest VM. The previous implementation only supports
the "rbind" mount option, which does not explicitly guarantee
the desired mount propagation behavior.
This commit hardens the mounting process by explicitly adding the
`rprivate` and `rw` mount flags.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit introduces the 'local' volume, which is specifically
designed to create and manage Kubernetes emptyDir volumes directly
within the VM's sandbox directory.
The core functionality ensures that the local volume can be handled
correctly in the volume-handling procedure.
This capability is essential for allowing containers to leverage the
storage backend for shared volumes.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit implements the new 'local' storage type, enabling Kubernetes
emptyDir volumes to be created and managed directly inside the Kata VM
(in the sandbox directory).
The 'local' type instructs the kata-agent to provision the empty
directory within the VM.
This approach allows containers to share storage inside the VM, which is
especially useful within CoCo emptyDir scenarios.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Separated the checks for tmpfs and disk-based emptyDirs from an
`if-else if` block into two distinct `if` statements. This clarifies
the logic by treating each volume type detection as an independent task.
Additionally, updated the type for disk-based emptyDirs to the more
semantically accurate `KATA_K8S_LOCAL_STORAGE_TYPE`. This allows for
more specific handling downstream, distinguishing them from generic
host path mounts.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In fact, emptyDir is not usually found in the proc mounts with the
previous logic, so the previous implementation failed.
This change is based on the related implementation within runtime-go.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This introduces a new storage type: local. Local storage type will
tell kata-agent to create an empty directory with the LocalStorage handler
in the sandbox directory within the VM.
And it also makes it align with runtime-go `KataLocalDevType = "local"`.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Pod annotations from the outer runtime are being used for cold-plugging
CDI devices. We need to ensure that these annotations don't leak into
the inner runtime for which specific container (sibling) annotations
are being created. Without this change, the inner runtime receives both
annotations, leading to failing CDI injection as an outer runtime
annotation observed in the guest translates to an unresolvable CDI
device, for example, cdi.k8s.io/gpu: "nvidia.com/pgpu=0".
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Stratovirt has been failing for a considerable amount of time, with no
sign of anyone watching it and actively working on a fix.
With this we also stop building and shipping stratovirt as part of our
release as we cannot test it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
A few weeks ago we've tested nydus-snapshotter with this approach, and
we DID find issues with it.
Now, let's also test this with `experimental_force_guest_pull`.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It's just a follow-up on the previous commit where we move away from the
runtimeClass creation inside the script, and instead we do it using the
chart itself.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This reverts commit be05e1370c, which is
not a problem as we never released such option.
Conflicts:
tools/packaging/kata-deploy/helm-chart/README.md
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We had this logic inside the script when we didn't use the helm chart.
However, this only makes the shim script more convoluted for no reason.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
In order to fix:
```
=== Running govulncheck on containerd-shim-kata-v2 ===
Vulnerabilities found in containerd-shim-kata-v2:
=== Symbol Results ===
Vulnerability #1: GO-2025-4015
Excessive CPU consumption in Reader.ReadResponse in net/textproto
More info: https://pkg.go.dev/vuln/GO-2025-4015
Standard library
Found in: net/textproto@go1.24.6
Fixed in: net/textproto@go1.24.8
Vulnerable symbols found:
#1: textproto.Reader.ReadResponse
Vulnerability #2: GO-2025-4014
Unbounded allocation when parsing GNU sparse map in archive/tar
More info: https://pkg.go.dev/vuln/GO-2025-4014
Standard library
Found in: archive/tar@go1.24.6
Fixed in: archive/tar@go1.24.8
Vulnerable symbols found:
#1: tar.Reader.Next
Vulnerability #3: GO-2025-4013
Panic when validating certificates with DSA public keys in crypto/x509
More info: https://pkg.go.dev/vuln/GO-2025-4013
Standard library
Found in: crypto/x509@go1.24.6
Fixed in: crypto/x509@go1.24.8
Vulnerable symbols found:
#1: x509.Certificate.Verify
#2: x509.Certificate.Verify
Vulnerability #4: GO-2025-4012
Lack of limit when parsing cookies can cause memory exhaustion in net/http
More info: https://pkg.go.dev/vuln/GO-2025-4012
Standard library
Found in: net/http@go1.24.6
Fixed in: net/http@go1.24.8
Vulnerable symbols found:
#1: http.Client.Do
#2: http.Client.Get
#3: http.Client.Head
#4: http.Client.Post
#5: http.Client.PostForm
Use '-show traces' to see the other 9 found symbols
Vulnerability #5: GO-2025-4011
Parsing DER payload can cause memory exhaustion in encoding/asn1
More info: https://pkg.go.dev/vuln/GO-2025-4011
Standard library
Found in: encoding/asn1@go1.24.6
Fixed in: encoding/asn1@go1.24.8
Vulnerable symbols found:
#1: asn1.Unmarshal
#2: asn1.UnmarshalWithParams
Vulnerability #6: GO-2025-4010
Insufficient validation of bracketed IPv6 hostnames in net/url
More info: https://pkg.go.dev/vuln/GO-2025-4010
Standard library
Found in: net/url@go1.24.6
Fixed in: net/url@go1.24.8
Vulnerable symbols found:
#1: url.JoinPath
#2: url.Parse
#3: url.ParseRequestURI
#4: url.URL.Parse
#5: url.URL.UnmarshalBinary
Vulnerability #7: GO-2025-4009
Quadratic complexity when parsing some invalid inputs in encoding/pem
More info: https://pkg.go.dev/vuln/GO-2025-4009
Standard library
Found in: encoding/pem@go1.24.6
Fixed in: encoding/pem@go1.24.8
Vulnerable symbols found:
#1: pem.Decode
Vulnerability #8: GO-2025-4008
ALPN negotiation error contains attacker controlled information in
crypto/tls
More info: https://pkg.go.dev/vuln/GO-2025-4008
Standard library
Found in: crypto/tls@go1.24.6
Fixed in: crypto/tls@go1.24.8
Vulnerable symbols found:
#1: tls.Conn.Handshake
#2: tls.Conn.HandshakeContext
#3: tls.Conn.Read
#4: tls.Conn.Write
#5: tls.Dial
Use '-show traces' to see the other 4 found symbols
Vulnerability #9: GO-2025-4007
Quadratic complexity when checking name constraints in crypto/x509
More info: https://pkg.go.dev/vuln/GO-2025-4007
Standard library
Found in: crypto/x509@go1.24.6
Fixed in: crypto/x509@go1.24.9
Vulnerable symbols found:
#1: x509.CertPool.AppendCertsFromPEM
#2: x509.Certificate.CheckCRLSignature
#3: x509.Certificate.CheckSignature
#4: x509.Certificate.CheckSignatureFrom
#5: x509.Certificate.CreateCRL
Use '-show traces' to see the other 27 found symbols
Vulnerability #10: GO-2025-4006
Excessive CPU consumption in ParseAddress in net/mail
More info: https://pkg.go.dev/vuln/GO-2025-4006
Standard library
Found in: net/mail@go1.24.6
Fixed in: net/mail@go1.24.8
Vulnerable symbols found:
#1: mail.AddressParser.Parse
#2: mail.AddressParser.ParseList
#3: mail.Header.AddressList
#4: mail.ParseAddress
#5: mail.ParseAddressList
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Successful replication controller with auto-generated policy in 123335ms
ok 2 Policy failure: unexpected container command in 14601ms
ok 3 Policy failure: unexpected volume mountPath in 14443ms
ok 4 Policy failure: unexpected host device mapping in 14515ms
ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14485ms
ok 6 Policy failure: unexpected capability in 14382ms
ok 7 Policy failure: unexpected UID = 1000 in 14578ms
After this change:
not ok 1 Successful replication controller with auto-generated policy in 17108ms
ok 2 Policy failure: unexpected container command in 14427ms
ok 3 Policy failure: unexpected volume mountPath in 14636ms
ok 4 Policy failure: unexpected host device mapping in 14493ms
ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14554ms
ok 6 Policy failure: unexpected capability in 15087ms
ok 7 Policy failure: unexpected UID = 1000 in 14371ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Successful pod with auto-generated policy in 94852ms
ok 2 Policy failure: unexpected device mount in 17807ms
After this change:
not ok 1 Successful pod with auto-generated policy in 35194ms
ok 2 Policy failure: unexpected device mount in 21355ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Logs empty when ReadStreamRequest is blocked in 102257ms
After this change:
not ok 1 Logs empty when ReadStreamRequest is blocked in 17339ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Successful job with auto-generated policy in 107111ms
ok 2 Policy failure: unexpected environment variable in 7920ms
ok 3 Policy failure: unexpected command line argument in 7874ms
ok 4 Policy failure: unexpected emptyDir volume in 7823ms
ok 5 Policy failure: unexpected projected volume in 7812ms
ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7903ms
ok 7 Policy failure: unexpected UID = 222 in 7720ms
After this change:
not ok 1 Successful job with auto-generated policy in 10271ms
ok 2 Policy failure: unexpected environment variable in 8018ms
ok 3 Policy failure: unexpected command line argument in 7886ms
ok 4 Policy failure: unexpected emptyDir volume in 7621ms
ok 5 Policy failure: unexpected projected volume in 7843ms
ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7632ms
ok 7 Policy failure: unexpected UID = 222 in 7619ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
ok 1 Successful sc deployment with auto-generated policy and container image volumes in 14769ms
ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 8384ms
not ok 3 Successful sc deployment with security context choosing another valid user in 136149ms
ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 8862ms
ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7941ms
ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11612ms
After:
ok 1 Successful sc deployment with auto-generated policy and container image volumes in 15230ms
ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 9364ms
not ok 3 Successful sc deployment with security context choosing another valid user in 11060ms
ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 9124ms
ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7919ms
ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11666ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Successful pod with auto-generated policy in 110801ms
not ok 2 Able to read env variables sourced from configmap using envFrom in 94104ms
not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 95838ms
not ok 4 Successful pod with auto-generated policy and custom layers cache path in 110712ms
ok 5 Policy failure: unexpected container image in 8113ms
ok 6 Policy failure: unexpected privileged security context in 7943ms
ok 7 Policy failure: unexpected terminationMessagePath in 11530ms
ok 8 Policy failure: unexpected hostPath volume mount in 7970ms
ok 9 Policy failure: unexpected config map in 7933ms
not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 112677ms
ok 11 RuntimeClassName filter: no policy in 2302ms
not ok 12 ExecProcessRequest tests in 93946ms
not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 94003ms
ok 14 Policy failure: unexpected UID = 0 in 8016ms
ok 15 Policy failure: unexpected UID = 1234 in 7850ms
After:
not ok 1 Successful pod with auto-generated policy in 12182ms
not ok 2 Able to read env variables sourced from configmap using envFrom in 10121ms
not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 11738ms
not ok 4 Successful pod with auto-generated policy and custom layers cache path in 26592ms
ok 5 Policy failure: unexpected container image in 7742ms
ok 6 Policy failure: unexpected privileged security context in 7949ms
ok 7 Policy failure: unexpected terminationMessagePath in 7789ms
ok 8 Policy failure: unexpected hostPath volume mount in 7887ms
ok 9 Policy failure: unexpected config map in 7818ms
not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 9120ms
ok 11 RuntimeClassName filter: no policy in 2081ms
not ok 12 ExecProcessRequest tests in 9883ms
not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 9870ms
ok 14 Policy failure: unexpected UID = 0 in 11161ms
ok 15 Policy failure: unexpected UID = 1234 in 7814ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We've seen a few cases where we fail the test due to timeout and when we
print the pods we just see that they've been created.
With that in mind, let's just increase the timeout a little bit.
Example:
```
not ok 1 Parallel jobs in 6250ms
(in test file k8s-parallel.bats, line 41)
`kubectl wait --for=condition=Ready --timeout=$timeout pod -l jobgroup=${job_name}' failed
No resources found in kata-containers-k8s-tests namespace.
[bats-exec-test:71] INFO: k8s configured to use runtimeclass
job.batch/process-item-test1 created
job.batch/process-item-test2 created
job.batch/process-item-test3 created
NAME STATUS COMPLETIONS DURATION AGE
process-item-test1 Running 0/1 0s
process-item-test2 Running 0/1 0s
process-item-test3 Running 0/1 0s
error: no matching resources found
No resources found in kata-containers-k8s-tests namespace.
No resources found in kata-containers-k8s-tests namespace.
DEBUG: system logs of node 'aks-nodepool1-25989463-vmss000000' since test start time (2025-11-01 16:39:03)
-- No entries --
job.batch "process-item-test1" deleted
job.batch "process-item-test2" deleted
job.batch "process-item-test3" deleted
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Otherwise we'll face issues like:
```
Error: found in Chart.yaml, but missing in charts/ directory: node-feature-discovery
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're failing on the uninstall, which seems related to a bug in NFD
itself, and I don't have access to an s390x machine to debug it, let's skip
the enablement for now and enable it again once we've experimented with it
more on s390x.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're failing to install NFD on CBL Mariner, let's skip the
enablement there, and enable it once we've experimented with it more there.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we have the ability to deploy NFD as a sub-chart of our chart, let's
make sure we test it during our CI.
We had to increase the timeout values, where we had timeouts set, to
deploy / undeploy kata, as now NFD is also deployed / undeployed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that we add NFD as a weak dependency of the kata-deploy
helm chart.
What we're doing for now is leaving it up to the user / admin to enable
it, and if enabled then we do an explicit check for virtualization
support (x86_64 only for now).
In case NFD is already deployed, we fail the installation (in case it's
enabled on the kata-deploy helm chart) with a clear error message to the
user.
While I know that kata-remote **DOES NOT** require virtualization, I've
left this out (with a comment for when we add a peer-pods dependency on
kata-deploy) in order to simplify things for now, as kata-remote is not
a deployed shim by default.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As Kata Containers can be consumed by other helm-charts, hard coding the
default runtime class name to `kata` is not optimal.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
All the options that take a specific shim as an argument MUST have
specific per arch settings, as not all the shims are available for all
the arches, leading to issues when setting up multi-arch deployments.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that we consume NVRC releases straight from GitHub instead
of building the binaries ourselves.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We have here either /dev/vfio/<num> or /dev/vfio/devices/vfio<num>;
for the IOMMUFD format /dev/vfio/devices/vfio<num>, strip the "vfio" prefix:
/dev/vfio/123 -> basename "123" -> vfioNum = "123" -> cdi.k8s.io/vfio123
/dev/vfio/devices/vfio123 -> basename "vfio123" -> strip -> vfioNum = "123" -> cdi.k8s.io/vfio123
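A minimal sketch of that mapping in shell form (not the runtime's actual Go code, shown only to illustrate the rule above):
```bash
# Map a VFIO device path to its CDI annotation key, handling both the legacy
# and the IOMMUFD path layouts.
vfio_annotation() {
    local dev="$1"
    local base num
    base="$(basename "$dev")"   # "123" or "vfio123"
    num="${base#vfio}"          # strip the optional "vfio" prefix
    echo "cdi.k8s.io/vfio${num}"
}

vfio_annotation /dev/vfio/123                 # -> cdi.k8s.io/vfio123
vfio_annotation /dev/vfio/devices/vfio123     # -> cdi.k8s.io/vfio123
```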
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This allows us to test privileged containers when using the webhook.
We can do this because kata-deploy sets privileged_without_host_devices = true for kata runtime by default.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
We have 2 tests running on GitHub provided runners:
* devmapper
* CRI-O
- devmapper situation
For devmapper, we're currently testing devmapper with s390x as part of
one of its jobs.
More than that, this test has been failing here due to a lack of space
on the machine for quite some time, and no action was taken to bring it
back either via GARM or some other way.
With that said, let's rely on the s390x CI to test devmapper and avoid
one extra failure on our CI by removing this one.
- cri-o situation
CRI-O is being tested with a fixed version of kubernetes that's already
reached its EOL, and a CRI-O version that matches that k8s version.
There have been attempts to raise issues, and also to provide a PR that
does at least part of the work ... leaving the debugging part to the
maintainers of the CI. However, there was no action on those from the
maintainers.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
- Explained the concept and benefits of VM templating
- Provided step-by-step instructions for enabling VM templating
- Detailed the setup for using snapshotter in place of VirtioFS for template-based VM creation
- Added performance test results comparing template-based and direct VM creation
Signed-off-by: ssc <741026400@qq.com>
- init: initialize the VM template factory
- status: check the current factory status
- destroy: clean up and remove factory resources
These commands provide basic lifecycle management for VM templates.
Signed-off-by: ssc <741026400@qq.com>
Use `ioctl_with_mut_ref` instead of `ioctl_with_ref` in the
`create_device` method as it needs to write to the `kvm_create_device`
struct passed to it, which was released in v0.12.1.
Signed-off-by: Siyu Tao <taosiyu2024@163.com>
Fix the cargo fmt issues so that we can make the libs tests required
again and avoid this regression from happening in the future.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
kata-deploy helm chart is *THE* way to deploy kata-containers on
kubernetes environments, and kubernetes environments are basically the
only reliably tested deployment we have.
For now, let's just drop documentation that is outdated / incorrect, and
in the future let's ensure we update the linked docs, as we work on
update / upgrade for the helm chart.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The libs in question were added when moving to developer.nvidia.com,
but after switching back to Ubuntu-only based builds they are no longer needed.
Remove them to keep the rootfs as minimal as possible.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In the case of CC we need additional libraries in the rootfs.
Add them conditionally if type == confidential.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Add build_vm_from_template() that flips boot_from_template flag,
wires factory.template_path/{memory,state} into the hypervisor config,
and returns ready-to-use hypervisor & agent instances.
When factory.template is enabled, VirtContainer bypasses normal creation
and directly boots the VM by restoring the template through incoming migration,
completing the "create → save → clone" loop.
Fixes: #11413
Signed-off-by: ssc <741026400@qq.com>
Introduced factory::FactoryConfig with init/destroy/status commands to manage template pools.
Added template::Template to fetch, create and persist base VMs.
Introduced vm::{VM, VMConfig} exposing create, pause, save, resume, stop,
disconnect and migration helpers for sandbox integration.
Extended QemuInner to execute QMP incoming migration, pause/resume and status tracking.
Fixes: #11413
Signed-off-by: ssc <741026400@qq.com>
Added new fields in Hypervisor struct to support VM template creation,
template boot, memory and device state paths, shared path, and store
paths. Introduced a Factory struct in config to manage template path,
cache endpoint, cache number, and template enable flag. Integrated
Factory into TomlConfig for runtime configuration parsing.
Fixes: #11413
Signed-off-by: ssc <741026400@qq.com>
This change enables running the Cloud Hypervisor VMM as a non-root user
when the rootless flag is set to true in the configuration.
Fixes: #11414
Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>
Pass the file descriptors of the tuntap device to the Cloud Hypervisor VMM process
so that the process can open the device without cap_net_admin.
Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>
There's no reason to keep the env var / input as it's never been used
and now kata-deploy detects automatically whether NFD is deployed or
not.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When the NodeFeatureRule CRD is detected kata-deploy will:
* Create the specific NodeFeatureRules for the x86_64 TEEs
* Adapt the TEE runtime classes to take into account the number of keys
available in the system when spawning the pod sandbox.
Note, we still do not have NFD as a sub-dependency of the helm chart, and
I'm not even sure we will. However, it's important to integrate
better with scenarios where NFD is already present.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Change NIM bats file logic to allow skipping test cases which
require multiple GPUs. This can be helpful for test clusters where
there is only one node with a single GPU, or for local test
environments with a single-node cluster with a single GPU.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
As a community we want to strike a balance between having up-to-date toolchains, so that we receive
the latest security fixes and can benefit from new features and packages, whilst not being so
bleeding edge that we disrupt downstream and other consumers. As a result we have the following
guidelines (note: not hard rules) for our Go and Rust toolchains that we are trying out:
## Go toolchain
Go is released [every six months](https://go.dev/wiki/Go-Release-Cycle) with support for the
[last two major release versions](https://go.dev/doc/devel/release#policy). We always want to
ensure that we are on a supported version so we receive security fixes. To try and make
things easier for some of our users, we aim to be using the older of the two supported major
versions, unless there is a compelling reason to adopt the newer version.
In practice this means that we bump our major version of the go toolchain every six months to
version (1.x-1) in response to a new version (1.x) coming out, which makes our current version
(1.x-2) no longer supported. We will bump the minor version whenever required to satisfy
dependency updates, or security fixes.
Our go toolchain version is recorded in [`versions.yaml`](../versions.yaml) under
`.languages.golang.version` and should match with the version in our `go.mod` files.
## Rust toolchain
Rust has a [six week](https://doc.rust-lang.org/book/appendix-05-editions.html#:~:text=The%20Rust%20language%20and%20compiler,these%20tiny%20changes%20add%20up.)
release cycle and they only support the latest stable release, so if we wanted to remain on a
supported release we would only ever build with the latest stable and bump every six weeks.
However, feedback from our community has indicated that this is a challenge, as downstream consumers
often want to get Rust from their distro or a downstream fork, and these struggle to keep up with
the six-week release schedule. As a result the community has agreed to try out a policy of
"stable-2", where we aim to build with a Rust version that is two versions behind the latest stable
version.
In practice this should mean that we bump our Rust toolchain every six weeks, to version
1.x-2 when 1.x is released as stable, and we should pick up the latest point release
of that version, if there is one.
The rust-toolchain that we are using is recorded in [`rust-toolchain.toml`](../rust-toolchain.toml).
@@ -97,6 +97,8 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.use_legacy_serial` | `boolean` | uses legacy serial device for guest's console (QEMU) |
| `io.katacontainers.config.hypervisor.default_gpus` | uint32 | the minimum number of GPUs required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.default_gpu_model` | string | the GPU model required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.block_device_num_queues` | `usize` | The number of queues to use for block devices (runtime-rs only) |
| `io.katacontainers.config.hypervisor.block_device_queue_size` | uint32 | The size of the queue to use for block devices (runtime-rs only) |
VM templating is a Kata Containers feature that enables new VM creation using a cloning technique. When enabled, new VMs are created by cloning from a pre-created template VM, and they will share the same initramfs, kernel and agent memory in readonly mode. It is very much like a process fork done by the kernel but here we *fork* VMs.
For more details on VM templating, refer to the [What is VM templating and how do I use it](./what-is-vm-templating-and-how-do-I-use-it.md) article.
## How to Enable VM Templating
VM templating can be enabled by changing your Kata Containers config file (`/opt/kata/share/defaults/kata-containers/runtime-rs/configuration.toml`, overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
- `qemu` version `v4.1.0` or above is specified in `hypervisor.qemu`->`path` section
- `enable_template = true`
- `template_path = "/run/vc/vm/template"` (default value, can be customized as needed)
- `initrd =` is set
- `image =` option is commented out or removed
- `shared_fs =` option is commented out or removed
- `default_memory =` should be set to more than 256MB
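A minimal sketch of making the edits listed above, assuming you start from the default runtime-rs QEMU configuration and that the keys already exist (commented out or not) in the template; if a key is missing, add it by hand in the appropriate section:
```bash
sudo mkdir -p /etc/kata-containers
sudo cp /opt/kata/share/defaults/kata-containers/runtime-rs/configuration.toml \
        /etc/kata-containers/configuration.toml

# Enable templating and keep the default template directory.
sudo sed -i \
    -e 's|^#\?\s*enable_template\s*=.*|enable_template = true|' \
    -e 's|^#\?\s*template_path\s*=.*|template_path = "/run/vc/vm/template"|' \
    /etc/kata-containers/configuration.toml

# Comment out the image= and shared_fs= options; initrd= and default_memory=
# must still be set as described in the list above.
sudo sed -i \
    -e 's|^\(image\s*=.*\)|# \1|' \
    -e 's|^\(shared_fs\s*=.*\)|# \1|' \
    /etc/kata-containers/configuration.toml
```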
Then you can create a VM template for later usage by calling:
### Initialize and create the VM template
The `factory init` command creates a VM template by launching a new VM, initializing the Kata Agent, then pausing and saving its state (memory and device snapshots) to the template directory. This saved template is used to rapidly clone new VMs using QEMU's memory sharing capabilities.
```bash
sudo kata-ctl factory init
```
### Check the status of the VM template
The `factory status` command checks whether a VM template currently exists by verifying the presence of template files (memory snapshot and device state). It will output "VM factory is on" if the template exists, or "VM factory is off" otherwise.
```bash
sudo kata-ctl factory status
```
### Destroy and clean up the VM template
The `factory destroy` command removes the VM template by removing the `tmpfs` filesystem and deleting the template directory along with all its contents.
```bash
sudo kata-ctl factory destroy
```
## How to Create a New VM from VM Template
In the Go version of Kata Containers, the VM templating mechanism is implemented using virtio-9p (9pfs). However, 9pfs is not supported in runtime-rs due to its poor performance, limited cache coherence, and security risks. Instead, runtime-rs adopts `VirtioFS` as the default mechanism to provide rootfs for containers and VMs.
Yet, when enabling the VM template mechanism, `VirtioFS` introduces conflicts in memory sharing because its DAX-based shared memory mapping overlaps with the template's page-sharing design. To resolve these conflicts and ensure strict isolation between cloned VMs, runtime-rs replaces `VirtioFS` with the snapshotter approach — specifically, the `blockfile` snapshotter.
The `blockfile` snapshotter is used in runtime-rs because it provides each VM with an independent block-based root filesystem, ensuring strong isolation and full compatibility with the VM templating mechanism.
### Configure Snapshotter
#### Check if `Blockfile` Snapshotter is Available
```bash
ctr plugins ls | grep blockfile
```
If not available, continue with the following steps:
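The exact containerd changes are not reproduced here; as a hedged sketch (the plugin section and key names below are assumptions based on containerd 1.7+, so verify them against your containerd documentation), the `blockfile` snapshotter needs a pre-formatted scratch image and a plugin entry in `/etc/containerd/config.toml`:
```bash
# Create a scratch block image the blockfile snapshotter can copy per snapshot
# (size and filesystem below are illustrative).
sudo mkdir -p /var/lib/containerd-blockfile
sudo truncate -s 2G /var/lib/containerd-blockfile/scratch.img
sudo mkfs.ext4 -F /var/lib/containerd-blockfile/scratch.img

# Point containerd's blockfile snapshotter plugin at it (key names may differ
# across containerd versions; consult the containerd plugin documentation).
sudo tee -a /etc/containerd/config.toml <<'EOF'
[plugins."io.containerd.snapshotter.v1.blockfile"]
  root_path = "/var/lib/containerd-blockfile"
  scratch_file = "/var/lib/containerd-blockfile/scratch.img"
  fs_type = "ext4"
EOF
```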
After modifying the configuration, restart containerd to apply changes:
```bash
sudo systemctl restart containerd
```
### Run Container with `blockfile` Snapshotter
After the VM template is created, you can pull an image and run a container using the `blockfile` snapshotter:
```bash
ctr run --rm -t --snapshotter blockfile docker.io/library/busybox:latest template sh
```
We can verify whether a VM was launched from a template or started normally by checking the launch parameters — if the parameters contain `incoming`, it indicates that the VM was started from a template rather than created directly.
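One way to check this (a sketch; the exact process name and argument layout depend on your QEMU version and configuration) is to look for the `-incoming` option on the QEMU process backing the container:
```bash
ps -ef | grep -E 'qemu.*-incoming' | grep -v grep
```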
## Performance Test
The comparative experiment between **template-based VM** creation and **direct VM** creation showed that the template-based approach achieved a ≈ **73.2%** reduction in startup latency (average launch time of **0.6s** vs. **0.82s**) and a ≈ **79.8%** reduction in memory usage (average memory usage of **178.2 MiB** vs. **223.2 MiB**), demonstrating significant improvements in VM startup efficiency and resource utilization.
The test script is as follows:
```bash
# Clear the page cache, dentries, and inodes to free up memory
echo 3 | sudo tee /proc/sys/vm/drop_caches
# Display the current memory usage
free -h
# Create 100 VMs (template-based or normal, depending on the configuration), and track the time
time for I in $(seq 100); do
    echo -n " ${I}th"  # Display the iteration number
    ctr run -d --runtime io.containerd.kata.v2 --snapshotter blockfile docker.io/library/busybox:latest template${I}
done
# Display the memory usage again after running the test
free -h
```
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Using kata-deploy Helm chart](#kata-deploy-helm-chart) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to try out Kata Containers on an already up and running Kubernetes cluster. |
### Kata Deploy Helm Chart
The Kata Deploy Helm chart is the preferred way to install all of the binaries and
artifacts required to run Kata Containers on Kubernetes.
[Use Kata Deploy Helm Chart](/tools/packaging/kata-deploy/helm-chart/README.md) to install Kata Containers on a Kubernetes Cluster.
### Official packages
Kata packages are provided by official distribution repositories for:
| Distribution (link to installation guide) | Minimum versions |
Kata Containers on Amazon Web Services (AWS) makes use of [i3.metal](https://aws.amazon.com/ec2/instance-types/i3/) instances. Most of the installation procedure is identical to that for Kata on your preferred distribution, except that you have to run it on bare metal instances since AWS doesn't support nested virtualization yet. This guide walks you through creating an i3.metal instance.
region = <your-aws-region-for-your-i3-metal-instance>
EOF
```
For more information on how to get AWS credentials please refer to [this guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html). Alternatively, you can ask the administrator of your AWS account to issue one with the AWS CLI:
```sh
$ aws_username="myusername"
$ aws iam create-access-key --user-name="$aws_username"
```
More general AWS CLI guidelines can be found [here](https://docs.aws.amazon.com/cli/latest/userguide/installing.html).
Refer to [this guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-launch.html) for more details on how to launch instances with the AWS CLI.
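As a sketch (the AMI ID, key pair and security group below are placeholders to replace with your own), launching an i3.metal instance looks like this:
```bash
$ aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type i3.metal \
    --count 1 \
    --key-name MyKeyPair \
    --security-group-ids sg-xxxxxxxx
```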
SSH into the machine
```bash
$ ssh -i MyKeyPair.pem ubuntu@${IP}
```
Go onto the next step.
## Install Kata
The process for installing Kata itself on bare metal is identical to that of a virtualization-enabled VM.
For detailed information to install Kata on your distribution of choice, see the [Kata Containers installation user guides](../install/README.md).
# Install Kata Containers on Google Compute Engine
Kata Containers on Google Compute Engine (GCE) makes use of [nested virtualization](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances). Most of the installation procedure is identical to that for Kata on your preferred distribution, but enabling nested virtualization currently requires extra steps on GCE. This guide walks you through creating an image and instance with nested virtualization enabled. Note that `kata-runtime check` checks for nested virtualization, but does not fail if support is not found.
As a pre-requisite this guide assumes an installed and configured instance of the [Google Cloud SDK](https://cloud.google.com/sdk/downloads). For a zero-configuration option, all of the commands below have been tested under [Google Cloud Shell](https://cloud.google.com/shell/) (as of Jun 2018). Verify your `gcloud` installation and configuration:
```bash
$ gcloud info || { echo "ERROR: no Google Cloud SDK"; exit 1; }
```
## Create an Image with Nested Virtualization Enabled
VM images on GCE are grouped into families under projects. Officially supported images are automatically discoverable with `gcloud compute images list`. That command produces a list similar to the following (likely with different image names):
Each distribution has its own project, and each project can host images for multiple versions of the distribution, typically grouped into families. We recommend you select images by project and family, rather than by name. This ensures any scripts or other automation always works with a non-deprecated image, including security updates, updates to GCE-specific scripts, etc.
### Create the Image
The following example (substitute your preferred distribution project and image family) produces an image with nested virtualization enabled in your currently active GCE project:
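(The command below is a sketch based on the GCE nested-virtualization documentation; substitute your own source project, image family and image name.)
```bash
$ SOURCE_IMAGE_PROJECT=ubuntu-os-cloud
$ SOURCE_IMAGE_FAMILY=ubuntu-2204-lts
$ IMAGE_NAME=${SOURCE_IMAGE_FAMILY}-nested

$ gcloud compute images create \
    --source-image-project $SOURCE_IMAGE_PROJECT \
    --source-image-family $SOURCE_IMAGE_FAMILY \
    --licenses="https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx" \
    $IMAGE_NAME
```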
If successful, `gcloud` reports that the image was created. Verify that the image has the nested virtualization license with `gcloud compute images describe $IMAGE_NAME`. This produces output like the following (some fields have been removed for clarity and to redact personal info):
The primary criterion of interest here is the presence of the `enable-vmx` license. Without that license Kata does not work. The presence of that license instructs the Google Compute Engine hypervisor to enable Intel's VT-x instructions in virtual machines created from the image. Note that nested virtualization is only available in VMs running on Intel Haswell or later CPU micro-architectures.
### Verify VMX is Available
Assuming you created a nested-enabled image using the previous instructions, verify that VMs created from this image are VMX-enabled with the following:
1. Create a VM from the image created previously:
```bash
$ gcloud compute instances create \
--image $IMAGE_NAME \
--machine-type n1-standard-2 \
--min-cpu-platform "Intel Broadwell" \
kata-testing
```
> **NOTE**: In most zones the `--min-cpu-platform` argument can be omitted. It is only necessary in GCE Zones that include hosts based on Intel's Ivybridge platform.
2. Verify that the VMX CPUID flag is set:
```bash
$ gcloud compute ssh kata-testing
# While ssh'd into the VM:
$ [ -z "$(lscpu|grep GenuineIntel)" ] && { echo "ERROR: Need an Intel CPU"; exit 1; }
```
If this fails, ensure you created your instance from the correct image and that the previously listed `enable-vmx` license is included.
## Install Kata
The process for installing Kata itself on a virtualization-enabled VM is identical to that for bare metal.
For detailed information to install Kata on your distribution of choice, see the [Kata Containers installation user guides](../install/README.md).
## Create a Kata-enabled Image
Optionally, after installing Kata, create an image to preserve the fruits of your labor:
```bash
$ gcloud compute instances stop kata-testing
$ gcloud compute images create \
--source-disk kata-testing \
kata-base
```
The result is an image that includes any changes made to the `kata-testing` instance as well as the `enable-vmx` flag. Verify this with `gcloud compute images describe kata-base`. The result, which omits some fields for clarity, should be similar to the following:
- [NVIDIA GPUs](NVIDIA-GPU-passthrough-and-Kata.md) and [Enabling NVIDIA GPU workloads using GPU passthrough with Kata Containers](NVIDIA-GPU-passthrough-and-Kata-QEMU.md)
- confidential computing guest components: the attestation agent,
confidential data hub and api-server-rest binaries
- CRI-O pause container (for the guest image-pull method)
- BusyBox utilities (provides a base set of libraries and binaries, and a
linker)
- some supporting files, such as a file containing a list of supported GPU
device IDs, which NVRC reads
#### UVM orchestration flow
When the Kata runtime asks QEMU to launch the VM, the UVM's Linux kernel
boots and mounts the root filesystem. After this, NVRC starts as the initial
process.
NVRC scans for NVIDIA GPUs on the PCI bus, loads the
NVIDIA kernel modules, waits for driver initialization, creates the device nodes,
and initializes the GPU hardware (using the `nvidia-smi` binary). NVRC also
creates the guest-side CDI specification file (using the
`nvidia-ctk cdi generate` command). This file specifies devices of
`kind: nvidia.com/gpu`, i.e., GPUs appearing to be physical GPUs on regular
bare metal systems. The guest CDI specification also contains `containerEdits`
for each device, specifying device nodes (e.g., `/dev/nvidia0`,
`/dev/nvidiactl`), library mounts, and environment variables to be mounted
into the container which receives the passthrough GPU.
Then, NVRC forks the Kata agent while continuing to run as the
init system. This allows NVRC to handle ongoing GPU management tasks
while kata-agent focuses on container lifecycle management. See the
[NVRC sources](https://github.com/NVIDIA/nvrc/blob/main/src/main.rs) for an
overview on the steps carried out by NVRC.
When the Kata runtime sends the create container request, the Kata agent
parses the inner runtime CDI annotation. For example, for the inner runtime
annotation `"cdi.k8s.io/vfio1": "nvidia.com/gpu=0"`, the agent looks up device
`0` in the guest CDI specification with `kind: nvidia.com/gpu`.
The Kata agent also reads the guest CDI specification's `containerEdits`
section and injects relevant contents into the OCI spec of the respective
container. The Kata agent then creates and starts a `rustjail` container
based on the final OCI spec. The container now has relevant device nodes,
binaries and low-level libraries available, and can start a user application
linked against the CUDA runtime API (e.g., `libcudart.so` and other
libraries). When used, the CUDA runtime API in turn calls the CUDA driver
API and kernel drivers, interacting with the pass-through GPU device.
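For orientation, here is a trimmed, illustrative sketch of what such a guest-side CDI specification can look like (the file name, device node paths and `cdiVersion` value are placeholders; the real file is generated by `nvidia-ctk cdi generate` inside the guest):
```bash
# Illustrative only: write a trimmed example of a guest CDI specification to a
# temporary file so its shape can be inspected alongside the description above.
cat <<'EOF' > /tmp/nvidia-gpu-cdi-example.json
{
  "cdiVersion": "0.6.0",
  "kind": "nvidia.com/gpu",
  "containerEdits": {
    "deviceNodes": [
      { "path": "/dev/nvidiactl" },
      { "path": "/dev/nvidia-uvm" }
    ]
  },
  "devices": [
    {
      "name": "0",
      "containerEdits": {
        "deviceNodes": [ { "path": "/dev/nvidia0" } ]
      }
    }
  ]
}
EOF
```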
An additional step is exercised in our CI samples: when using images from an
authenticated registry, the guest-pull mechanism triggers attestation using
trustee's Key Broker Service (KBS) for secure release of the NGC API
authentication key used to access the NVCR container registry. As part of
this, the attestation agent exercises composite attestation and transitions
the GPU into `Ready` state (without this, the GPU has to explicitly be
transitioned into `Ready` state by passing the `nvrc.smi.srs=1` kernel
parameter via the shim config, causing NVRC to transition the GPU into the
`Ready` state).
## Deployment Guidance
This guidance assumes you use bare-metal machines with proper support for
Kata's non-TEE and TEE GPU workload deployment scenarios for your Kubernetes
nodes. We provide guidance based on the upstream Kata CI procedures for the
NVIDIA GPU CI validation jobs. Note that this setup:
- uses the guest image pull method to pull container image layers
- uses the genpolicy tool to attach Kata agent security policies to the pod
manifest
- has dedicated (composite) attestation tests, a CUDA vectorAdd test, and a
NIM/RA test sample with secure API key release
A similar deployment guide and scenario description can be found in NVIDIA resources
under
[Early Access: NVIDIA GPU Operator with Confidential Containers based on Kata](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/confidential-containers.html).
### Requirements
The requirements for the TEE scenario are:
- Ubuntu 25.10 as host OS
- CPU with AMD SEV-SNP support with proper BIOS/UEFI version and settings
- CC-capable Hopper/Blackwell GPU with proper VBIOS version.
BIOS and VBIOS configuration is out of scope for this guide. Other resources,
> **Note**: If you see a message similar to the above, the BAR space of the NVIDIA
> GPU has been successfully allocated.
## NVIDIA vGPU mode with Kata Containers
NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM. NVIDIA vGPU manager
needs to be installed on the host to configure GPUs in vGPU mode. See [NVIDIA Virtual GPU Software Documentation v14.0 through 14.1](https://docs.nvidia.com/grid/14.0/) for more details.
### NVIDIA vGPU time-sliced
In the time-sliced mode, the GPU is not partitioned and the workload uses the
whole GPU and shares access to the GPU engines. Processes are scheduled in
series. The best-effort scheduler is the default and can be exchanged for
other scheduling policies; see the documentation above for how to do that.
Beware: if you had `MIG` enabled before, disable `MIG` on the GPU if you want
to use `time-sliced` `vGPU`.
```sh
$ sudo nvidia-smi -mig 0
```
Enable the virtual functions for the physical GPU in the `sysfs` file system.
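A hedged sketch of this step, assuming the NVIDIA vGPU manager's `sriov-manage` helper is installed on the host (the PCI address is a placeholder; check `lspci | grep -i nvidia` for yours):
```sh
# Assumption: the NVIDIA vGPU manager ships this helper on the host.
$ sudo /usr/lib/nvidia/sriov-manage -e <gpu-pci-address>
# Verify that virtual functions appeared:
$ ls /sys/bus/pci/devices/<gpu-pci-address>/ | grep virtfn
```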
Create the GPU instances that correspond to the `vGPU` types of the `MIG-backed`
`vGPUs` that you will create [NVIDIA A100 PCIe 80GB Virtual GPU Types](https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-a100-pcie-80gb).
```sh
# MIG 1g.10gb --> vGPU A100D-1-10C
$ sudo nvidia-smi mig -cgi 19
```
List the GPU instances and get the GPU instance id to create the compute
instance.
```sh
$ sudo nvidia-smi mig -lgi # list the created GPU instances
$ sudo nvidia-smi mig -cci -gi 9 # each GPU instance can have several compute
# instances. Instance -> Workload
```
Verify that the compute instances were created within the GPU instance
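(a hedged sketch; `nvidia-smi mig -lci` lists the compute instances that were created):
```sh
$ sudo nvidia-smi mig -lci   # list the created compute instances
```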
Repeat the steps after the [snippet](#list-all-available-vgpu-instances) listing
to create the corresponding `mdev` device and use the guest `OS` created in the
previous section with `time-sliced` `vGPUs`.
## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
### Build Guest OS with NVIDIA Driver and Toolkit
Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
rootfs base image for a distribution of your choice. This is going to be used as
@@ -583,9 +308,12 @@ Enable the `guest_hook_path` in Kata's `configuration.toml`
guest_hook_path="/usr/share/oci/hooks"
```
As the last step one can remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.
Having built an NVIDIA rootfs and kernel, we can now run any GPU container
without installing the drivers into the container. Check the NVIDIA device status
with `nvidia-smi`:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
for security (cryptography) and compression. These instructions cover the
steps for the latest [Ubuntu LTS release](https://ubuntu.com/download/desktop)
which already include the QAT host driver. These instructions can be adapted to
any Linux distribution. These instructions guide the user on how to download
the kernel sources, compile kernel driver modules against those sources, and
load them onto the host as well as preparing a specially built Kata Containers
kernel and custom Kata Containers rootfs.
for security (cryptography) and compression. Kata Containers can enable
these acceleration functions for containers using QAT SR-IOV with the
support from [Intel QAT Device Plugin for Kubernetes](https://github.com/intel/intel-device-plugins-for-kubernetes)
or [Intel QAT DRA Resource Driver for Kubernetes](https://github.com/intel/intel-resource-drivers-for-kubernetes).
* Download kernel sources
* Compile Kata kernel
* Compile kernel driver modules against those sources
* Download rootfs
* Add driver modules to rootfs
* Build rootfs image
## Helpful Links before starting
## More Information
[Intel® QuickAssist Technology at `01.org`](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html)
@@ -26,554 +22,6 @@ kernel and custom Kata Containers rootfs.
[Intel Device Plugin for Kubernetes](https://github.com/intel/intel-device-plugins-for-kubernetes)
[Intel DRA Resource Driver for Kubernetes](https://github.com/intel/intel-resource-drivers-for-kubernetes)
[Intel® QuickAssist Technology for Crypto Poll Mode Driver](https://dpdk-docs.readthedocs.io/en/latest/cryptodevs/qat.html)
## Steps to enable Intel® QAT in Kata Containers
There are some steps to complete only once, some steps to complete with every
reboot, and some steps to complete when the host kernel changes.
## Script variables
The following list of variables must be set before running through the
scripts. These variables refer to locations to store modules and configuration
files on the host and links to the drivers to use. Modify these as
needed to point to updated drivers or different install locations.
### Set environment variables (Every Reboot)
Make sure to check [`01.org`](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html) for the latest driver version.
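The full variable list is not reproduced here; as placeholders (illustrative values only, following the variable names used by the snippets below), the later steps assume at least the following are exported:
```bash
$ export QAT_SRC=~/src/QAT
$ export QAT_DRIVER_URL=<link-to-the-latest-qat-driver-tarball>
```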
```bash
$ sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT=""/GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"/' /etc/default/grub
$ sudo update-grub
$ sudo reboot
```
### Download Intel® QAT drivers
This will download the [Intel® QAT drivers](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html).
Make sure to check the website for the latest version.
```bash
$ mkdir -p $QAT_SRC
$ cd $QAT_SRC
$ curl -L $QAT_DRIVER_URL | tar zx
```
### Copy Intel® QAT configuration files and enable virtual functions
Modify the instructions below as necessary if using a different Intel® QAT hardware
platform. You can learn more about customizing configuration files at the
In addition, containerd expects the binary to be in `/usr/local/bin`, so add
these small wrapper scripts so that either QEMU or Cloud Hypervisor can be
used with Kata.
```bash
$ echo '#!/usr/bin/env bash' | sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-qemu.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo '#!/usr/bin/env bash' | sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-clh.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-clh-v2
```
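The wrapper scripts also need to be executable (a small follow-up step included here for completeness):
```bash
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-qemu-v2 /usr/local/bin/containerd-shim-kata-clh-v2
```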