Commit Graph

164 Commits

Author SHA1 Message Date
Steve Horsman
b147cb1319 Merge pull request #12587 from fidencio/topic/runtime-add-configurable-kubelet-root-dir
runtimes: add configurable kubelet root dir
2026-02-28 19:06:14 +00:00
Manuel Huber
88f746dea8 runtime: nvidia: Use OVMF for NV GPU handler
Shift to using OVMF instead of using SeaBios.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>

Update src/runtime/Makefile

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Fabiano Fidêncio
0a73638744 runtime: add configurable kubelet root dir
Different kubernetes distributions, such as k0s, use a different kubelet
root dir location instead of the default /var/lib/kubelet, so ConfigMap
and Secret volume propagation were failing.

This adds a kubelet_root_dir config option that the go runtime uses when
matching volume paths and kata-deploy now sets it automatically for k0s
via a drop-in file.

runtime-rs does not need this option: it identifies ConfigMap/Secret,
projected, and downward-api volumes by volume-type path segment
(kubernetes.io~configmap, etc.), not by kubelet root prefix.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-27 14:10:57 +01:00
Hyounggyu Choi
b9f3d5aa67 runtime: Support memory hotplug with virtio-mem on s390x
This commit adds logic to properly handle memory hotplug
for QemuCCWVirtio in the ExecMemdevAdd() path.

The new logic is triggered only when virtio-mem is enabled.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-26 14:21:34 +01:00
Aurélien Bombo
e17f96251d runtime{,-rs}/clh: Disable virtio-pmem
This disables virtio-pmem support for Cloud Hypervisor by changing
Kata config defaults and removing the relevant code paths.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-18 11:47:53 -06:00
Paul Meyer
c5ad3f9b26 Merge pull request #12472 from katexochen/p/disable-nvdimm-cc
runtime: disable nvdimm for confidential guest
2026-02-10 14:54:40 +01:00
Paul Meyer
a5f554922c runtime: disable nvdimm for confidential guest
There is code to disable this at runtime  when confidential_guest
is enabled anyway[^1], but it will omit a warning every time. All
the touched configuration files set confidential_guest to true,
so we already know nvdimm isn't supported.

[^1]: 16a7ed6e14/src/runtime/virtcontainers/qemu_amd64.go (L144-L148)

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2026-02-10 08:38:18 +01:00
Konstantin Khlebnikov
5d99a141d9 runtime: add hypervisor options for NUMA topology
With enable_numa=true hypervisor will expose host NUMA topology as is:
map vm NUMA nodes to host 1:1 and bind vpus to relates CPUS.

Option "numa_mapping" allows to redefine NUMA nodes mapping:
- map each vm node to particular host node or several numa nodes
- emulate numa on host without numa (useful for tests)

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 20:09:25 +01:00
Fabiano Fidêncio
ab515712d4 kernel: Unify kernel and kernel-confidential
Build a single kernel for both kernel and kernel-confidential on x86_64
and s390x. The kernel is built with TEE support (-x) on those arches only.

This helps to simplilfy and to maintain the code, and having a single
kernel was the original plan since forever.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-09 18:28:23 +01:00
Manuel Huber
a786582d0b rootfs: deprecate initramfs dm-verity mode
Remove the initramfs folder, its build steps, and use the kernel
based dm-verity enforcement for the handlers which used the
initramfs mode. Also, remove the initramfs verity mode
capability from the shims and their configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
f639c3fa17 runtime: Enable kernelinit dm-verity variant
This change introduces the kernel_verity_parameters knob to the
Go based shim, picking up dm-verity information in a new config
field (the corresponding build variable is already produced by
the shim build). The change extends the shim to parse dm-verity
information from this parameter and to construct the kernel command
line appropriately, based on the indicated initramfs or kernelinit
build variant.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
83a0bd1360 gpu: use dm-verity for the non-TEE GPU handler
Use a dm-verity protected rootfs image for the non-TEE NVIDIA
GPU handler as well.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
6d0bb49716 runtime: nvidia: Use img and sanitize whitespaces
Shift NVIDIA shim configurations to use an image instead of an initrd,
and remove trailing whitespaces from the configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
30c7325e75 runtimes: Sanitize trailing whitespaces
Clean up trailing whitespaces, making life easier for those who
have configured their IDE to clean these up.
Suggest to not add new code with trailing whitespaces etc.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-03 11:46:30 -08:00
Mikko Ylinen
927be7b8ad runtime: tdx: move to use QEMU from kata-deploy
Currently, a working TDX setup expects users to install special
TDX support builds from Canonical/CentOS virt-sig for TDX to
work. kata-deploy configured TDX runtime handler to use QEMU
from the distro's paths.

With TDX support now being available in upstream Linux and
Ubuntu 24.04 having an install candidate (linux-image-generic-6.17)
for a new enough kernel, move TDX configuration to use QEMU from
kata-deploy.

While this is the new default, going back to the original
setup is possible by making manual changes to TDX runtime handlers.

Note: runtime-rs is already using QEMUPATH for TDX.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-02 11:10:52 +02:00
Dan Mihai
20ca4d2d79 runtime: DEFDISABLEBLOCK := true
1. Add disable_block_device_use to CLH settings file, for parity with
   the already existing QEMU settings.

2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After
   this change, Kata Guests will use by default virtio-fs to access
   container rootfs directories from their Hosts. Hosts that were
   designed to use Host block devices attached to the Guests can
   re-enable these rootfs block devices by changing the value of
   disable_block_device_use back to false in their settings files.

3. Add test using container image without any rootfs layers. Depending
   on the container runtime and image snapshotter being used, the empty
   container rootfs image might get stored on a host block device that
   cannot be safely hotplugged to a guest VM, because the host is using
   the same block device.

4. Add block device hotplug safety warning into the Kata Shim
   configuration files.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Cameron McDermott <cameron@northflank.com>
2026-01-28 19:47:49 +01:00
Manuel Huber
6753c3ac08 runtime: nvidia: Disable NVDIMM
Disable NVDIMM. When using GPU passthrough, using NVDIMM would create
a r/o file-backed memory region. When using a GPU, QEMU tries to DMA-
map guest memory for the device, resulting in a mapping error:
memory listener initialization failed: Region mem0:
vfio_container_dma_map ... -22 (Invalid argument).
For the CC configs, NVDIMM is disabled by default in qemu_amd64.go
with a warning, but we also explicitly disable the setting in the
shim configuration file.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-14 22:51:07 +01:00
Manuel Huber
9e30283952 runtime: nvidia: change kernel parameters
Remove the agent hotplug timeout parameter from the kernel
command line. Having shifted to VFIO cold-plug, this parameter is
no longer needed.
Remove the no longer required parameter for TDX and thus align the
SNP and TDX configurations.
Add a parameter to avoid the kernel to mount the /dev tmpfs. NVRC
and later on kata-agent attempt this. While kata-agent does not
panic when mounting /dev fails, NVRC makes mounting /dev a hard
requirement.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-12 16:11:28 -08:00
Mikko Ylinen
cc6277b735 Revert "tdx: Update GPU config for the latest TDX stack"
Prefer the "full feature TDVF" instead of the generic OVMF build. See
Option-B in
https://github.com/tianocore/edk2/tree/master/OvmfPkg/IntelTdx#configurations-and-features
for the extra hardening supported.

FIRMWAREPATH_NV also seems to be TDX specific unlike the Makefile
suggests. Therefore, it can be dropped completely.

This reverts commit 66ccc25724.
2026-01-08 10:21:47 +01:00
Fabiano Fidêncio
88cdfab604 runtime: nvidia: Align static_sandbox_resource_mgmt
Let's ensure we have those aligned for both CC and non-CC use-case.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 17:04:51 +01:00
Fabiano Fidêncio
995770dbeb runtime: nvidia: Use cold-plug by default
Now that we have the way to do cold-plug, let's ensure we also use it
for the non-CC use case.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-17 17:04:51 +01:00
Fabiano Fidêncio
e859537c74 runtimes: config: Do NOT have commented fields
In order to have a better way to set things up using a toml editor, we
should take the containerd approach and actually have everything
uncommnted.  This will help us to unify how we deal with such values in
the future from the kata-deploy POV.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-25 19:26:56 +01:00
Joji Mekkattuparamban
5aa184925a shim: Support device cold plug with Kubernetes
Utilize Kubelet's Pod Resource API to determine device allocations
for the Pod during sandbox creation. Use CDI files to translate the device
IDs to corresponding device paths and perform device injection.

Fixes #12009

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2025-11-20 10:58:55 +01:00
zhangchen.kidd
f9d4829e77 rumtime: qemu: Add indep_iothreads for QEMU hypervisor toml
Add indep_iothreads args for QEMU related configuration toml.
The default value is 0.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
Manuel Huber
a5cd7235cb runtime: Align nvidia TEEs enable_annotations with TEEs
It was just missed when adding those configurations.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-10 13:01:30 +01:00
Mikko Ylinen
1beda258b8 qemu: nvidia: tdx: add quote-generation-socket for attestation to work
Add TDX QGS quote-generation-socket TDX QEMU object params for
attestation to work in NVGPU+TDX environment.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-10-22 21:01:35 +02:00
Kevin Zhao
141070b388 Kata-deploy: Add kata-deploy set up for qemu-cca
Support launch qemu-cca in Kata-deploy.

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
2025-10-16 17:24:52 +08:00
Kevin Zhao
bfa7f2486d runtime: Add Arm64 CCA confidential Guest Support
This commit add the support for Arm CCA/RME support in golang runtime.
The guest kernel is support since Linux 6.13.

The host kernel which Kata is running is picked from: https://gitlab.arm.com/linux-arm/linux-cca
branch: cca-host/v8 which is currently very stable and reviewed for a while, and it is
expecting to merged this year.

The Qemu support is picked up from: https://git.codelinaro.org/linaro/dcap/qemu.git, branch: cca/2025-05-28,
The Qemu support will be merged to upstream after the CCA host support official support in linux kernel.

More info regarding the CCA software stack dev and test, please refer to link:
https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/Building+an+RME+stack+for+QEMU

Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
2025-10-16 17:23:54 +08:00
Dan Mihai
60beb5236d runtime: snp: enable CoCo annotations
Use @DEFENABLEANNOTATIONS_COCO@ in configuration-qemu-snp.toml,
for consistency with the tdx and coco-dev configuration files.

k8s-initdata.bats was failing during CI on SNP without this change,
because the cc_init_data annotation was disabled.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-09-12 15:38:33 +00:00
Fabiano Fidêncio
fd1b8ceed1 runtime: qemu: Add reclaim_guest_freed_memory [BACKPORT]
Similar to what we've done for Cloud Hypervisor in the commit
9f76467cb7, we're backporting a runtime-rs
feature that would be benificial to have as part of the go runtime.

This allows users to use virito-balloon for the hypervisor to reclaim
memory freed by the guest.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-08-22 23:56:47 +02:00
stevenhorsman
081823b388 runtime: Enable init_data annotation
In #11693 the cc_init_data annotation was changes to be hypervisor
scoped, so each hypervisor needs to explicitly allow it in order to
use it now, so add this to both the go and rust runtime's remote
configurations

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-08-21 19:26:10 +01:00
Hyounggyu Choi
208cec429a runtime-rs: Introduce CoCo-specific enable_annotations
We need to include `cc_init_data` in the enable_annotations
array to pass the data. Since initdata is a CoCo-specific
feature, this commit introduces a new array,
`DEFENABLEANNOTATIONS_COCO`, which contains the required
string and applies it to the relevant CoCo configuration.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-08-20 10:15:23 +02:00
Paul Meyer
5635410dd3 runtime: make SNP guest policy configurable
Dependening on the platform configuration, users might want to
set a more secure policy than the QEMU default.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-08-13 09:06:36 +02:00
Aurélien Bombo
6d96875d04 runtime: virtio-fs: Support "metadata" cache mode
The Rust virtiofsd supports a "metadata" cache mode [1] that wasn't
present in the C version [2], so this PR adds support for that.

 [1] https://gitlab.com/virtio-fs/virtiofsd
 [2] https://qemu.weilnetz.de/doc/5.1/tools/virtiofsd.html#cmdoption-virtiofsd-cache

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-08-07 21:24:40 +08:00
Fabiano Fidêncio
3143787f69 qemu: tdx: Fix binary path for non-gpu TDX
On commit 90bc749a19, we've changed the
QEMUTDXPATH in order to get it to work with GPUs, but the change broke
the non-GPU TDX use-case, which depends on the distro binary.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-07-18 15:26:27 +02:00
Hyounggyu Choi
09297b7955 Merge pull request #11537 from BbolroC/set-sharedfs-to-none-for-ibm-sel
runtime/runtime-rs: Set shared_fs to none for IBM SEL in config file
2025-07-09 18:30:08 +02:00
Hyounggyu Choi
bca31d5a4d runtime/runtime-rs: Set shared_fs to none for IBM SEL in config file
In line with configuration for other TEEs, shared_fs should
be set to none for IBM SEL. This commit updates the value for
runtime/runtime-rs.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-07-09 14:22:28 +02:00
Arvind Kumar
ecac3d2d28 runtime: Removing runtime logic for SEV
Removing runtime SEV functionality,
such as the kbs, ovmf, VMSA handling,
and SEV configs as part of deprecating
SEV from kata.

Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
2025-07-07 11:17:32 -05:00
Dan Mihai
1aeef52bae clh: runtime: add disable_image_nvdimm support
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to
set disable_image_nvdimm=true in configuration-clh.toml.

disable_image_nvdimm=false is the default config value.

Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in
configuration-clh.toml.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-06-10 02:00:52 +00:00
Dan Mihai
0dd9325264 qemu: runtime: build variable for disable_image_nvdimm=true
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they
want to set disable_image_nvdimm=true in configuration-qemu*.toml.

disable_image_nvdimm=false is the default configuration value.

Note that the value of disable_image_nvdimm gets ignored for
platforms using "confidential_guest = true".

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-06-10 01:57:42 +00:00
Dan Mihai
d51e0c9875 snp: gpu: comment out disable_image_nvdimm config
Comment out "disable_image_nvdimm = true" in:

- configuration-qemu-snp.toml
- configuration-qemu-nvidia-gpu-snp.toml

for consistency with the other configuration-qemu*.toml files.

Those two platforms are using "confidential_guest = true", and therefore
the value of disable_image_nvdimm gets ignored.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-06-10 01:44:51 +00:00
Shunsuke Kimura
5193cfedca runtime: remove hotplug_vfio_on_root_bus from toml
In this commit, hotplug_vfio_on_root_bus parameter is removed.
<dd422ccb69>

pcie_root_port parameter description
(`This value is valid when hotplug_vfio_on_root_bus is true and
machine_type is "q35"`) will have no value,
and not completely valid, since vrit or DB as also support for root-ports and CLH as well.
so removed.

Fixes: #11316

Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
2025-06-05 21:53:06 +09:00
Paul Meyer
c4815eb3ad runtime: add option to force guest pull
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.

Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-05-27 12:42:00 +02:00
Rong Tao
914730d948 config: Fix typos
devie should be device

Signed-off-by: Rong Tao <rongtao@cestc.cn>
2025-05-19 14:19:22 +08:00
Hyounggyu Choi
4fac1293bd runtime/config: Add VFIO config for IBM SEL
With #11076 merged, a VFIO configuration is needed in the runtime
when IBM SEL is involved (e.g., qemu-se or qemu-se-runtime-rs).

For the Go runtime, we already have a nightly test
(e.g., https://github.com/kata-containers/kata-containers/actions/runs/14964175872/job/42031097043)
in which this change has been applied.
For the Rust runtime, the feature has not yet been migrated.
Thus, this change serves as a placeholder and a reminder for future implementation.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-05-12 14:58:29 +02:00
Champ-Goblem
9f76467cb7 runtime: clh: Add reclaim_guest_freed_memory [BACKPORT]
We're bringing to *Cloud Hypervisor only* the reclaim_guest_freed_memory
option already present in the runtime-rs.

This allows us to use virtio-balloon for the hypervisor to reclaim
memory freed by the guest.

The reason we're not touching other hypervisors is because we're very
much aware of avoiding to clutter the go code at this point, so we'll
leave it for whoever really needs this on other hypervisor (and trust
me, we really do need it for Cloud Hypervisor right now ;-)).

Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-04-25 21:05:53 +02:00
Paul Meyer
a994f142d0 runtime: make SNP IDBlock configurable
For a use case, we want to set the SNP IDBlock, which allows
configuring the AMD ASP to enforce parameters like expected launch
digest at launch. The struct with the config that should be enforced
(IDBlock) is signed. The public key is placed in the auth block and
the signature is verified by the ASP before launch. The digest of the
public key is also part of the attestation report (ID_KEY_DIGESTS).

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-03-14 07:50:54 +01:00
arvindskumar99
c0a3ecb27b config: Disabling nesting check for SNP
Adding disable_nesting_checks to accomodate SNP on Azure

Signed-off-by: arvindskumar99 <arvinkum@amd.com>
2025-02-20 12:24:08 +01:00
Zvonko Kaiser
4bda16565b gpu: Update timeouts
With the create_container_timeout the dial_timeout is lest important.
Add the custom timeout for GPUs in create_container_timeout

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:29:18 +00:00
Zvonko Kaiser
66ccc25724 tdx: Update GPU config for the latest TDX stack
We need extra kernel_params for TDX

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:29:18 +00:00