kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-03-18 18:58:36 +00:00

Author	SHA1	Message	Date
Balint Tobik	295a6a81d0	runtime: refactor hypervisor devices cgroup creation Add hypervisor devices to the cgroup separately to omit irrelevant warnings, and fail if none of them are available. Also fix a test case so that it reloads removed kernel modules for later test cases, and skip some tests on ARM because of the lack of virtualization support. Fixes #6656 Signed-off-by: Balint Tobik <btobik@redhat.com>	2026-02-13 09:23:08 +01:00
Konstantin Khlebnikov	5d99a141d9	runtime: add hypervisor options for NUMA topology With enable_numa=true, the hypervisor will expose the host NUMA topology as is: map VM NUMA nodes to host nodes 1:1 and bind vCPUs to the related CPUs. Option "numa_mapping" allows redefining the NUMA node mapping: - map each VM node to a particular host node or to several host NUMA nodes - emulate NUMA on a host without NUMA (useful for tests) Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 20:09:25 +01:00
Manuel Huber	7958be8634	runtime: Make kernel_verity_params overwritable Similar to the kernel_params annotation, add a kernel_verity_params annotation and add logic to make these parameters overwritable. For instance, this can be used in test logic to provide bogus dm-verity hashes for negative tests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f639c3fa17	runtime: Enable kernelinit dm-verity variant This change introduces the kernel_verity_parameters knob to the Go based shim, picking up dm-verity information in a new config field (the corresponding build variable is already produced by the shim build). The change extends the shim to parse dm-verity information from this parameter and to construct the kernel command line appropriately, based on the indicated initramfs or kernelinit build variant. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Pradipta Banerjee	8a449d358f	shim: Add CRI-O annotation support for device cold plug Add support for CRI-O annotations when fetching pod identifiers for device cold plug. The code now checks containerd CRI annotations first, then falls back to CRI-O annotations if they are empty. This enables device cold plug to work with both containerd and CRI-O container runtimes. Annotations supported: - containerd: io.kubernetes.cri.sandbox-name, io.kubernetes.cri.sandbox-namespace - CRI-O: io.kubernetes.cri-o.KubeName, io.kubernetes.cri-o.Namespace Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2026-02-03 04:51:15 +00:00
LandonTClipp	b50a73912d	runtime: Config test extension for IOMMUFDID Add additional cases for the IOMMUFDID method to check its behavior when non-IOMMUFD paths are passed; the method should handle them correctly. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	d5e4cf6b4d	runtime: Add test for ExecuteVFIODeviceAdd Copilot made a good point that we should have a test for this. Thus, this commit. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	137866f793	runtime: Allow QMP commands to be logged in debug level Logging the QMP commands gives us a lot of flexibility to troubleshoot issues with what is being sent to QEMU. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	a3b5764f67	runtime: Fix import cycle and add unit test for IOMMUFDID() An import cycle was introduced because of a mutual need for the constant that describes the prefix of IOMMUFD files. We need to extract this out into a higher-level package. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	09438fd54f	runtime: Add IOMMUFD Object Creation for QEMU QMP Commands The QMP commands sent to QEMU did not properly set up IOMMUFD objects in the codepath that handles VFIO device hot-plugging. This is mainly relevant in the Kubernetes use-case where the VFIO devices are not available when QEMU is first launched. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
Zvonko Kaiser	f8ad17499d	gpu: VFIO handling container vs sandbox If the sandbox has cold-plugged an IOMMUFD device but the device plugin sends us a /dev/vfio/<NUM> device, we need to check whether the IOMMUFD device and the VFIO device are the same. We have the sibling.BDF; we now need to extract the BDF of the devPath, which is either /dev/vfio/<NUM> or /dev/vfio/devices/vfio<NUM> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-05 16:53:31 +01:00
Fabiano Fidêncio	5ca4f2b9ff	runtimes: annotations: Fix kernel param handling We need to ensure that we do not blindly append nor blindly override the kernel parameters set by default, but rather modify the values in case they exist, and append in case they do not. Now we're actually making golang and rust runtime behave the same, as so far they were behaving differently, each version wrong in its own way. :-p. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-25 16:04:52 +01:00
Joji Mekkattuparamban	5aa184925a	shim: Support device cold plug with Kubernetes Utilize Kubelet's Pod Resource API to determine device allocations for the Pod during sandbox creation. Use CDI files to translate the device IDs to corresponding device paths and perform device injection. Fixes #12009 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2025-11-20 10:58:55 +01:00
zhangchen.kidd	fea954df7a	runtime: qemu: qmp: Add iothread args for QMP ExecutePCIDeviceAdd QEMU already supports device_add with iothread args. Give Kata the ability to hotplug PCI devices with IOThreads. Currently only QEMU is supported as the hypervisor; it is not yet known whether this works for StratoVirt. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	c3d3684f81	runtime: Introduce independent IOThreads framework Introduce an independent IOThread framework for Kata Containers. What indep_iothreads is: This new feature introduces a way to pre-allocate IOThreads for the QEMU hypervisor (other hypervisors may be able to support it too). Independent IOThreads allow IO to be processed in a separate thread, which generally improves the performance of each module by keeping it out of the QEMU main loop. Why indep_iothreads is needed: In the Kata Containers implementation, many devices rely on the hotplug mechanism. The real workload container may not share the VM's lifecycle; it may need to hotplug/unplug new disks or other devices without destroying the VM. So we can keep the IOThreads with the VM as an IOThread pool (some devices need multiple IOThreads for performance, such as virtio-blk with vq-mapping), and hotplugged devices can attach to or detach from the IOThreads according to business needs. QEMU also supports "x-blockdev-set-iothread" to change IOThreads (but it requires stopping the VM for data safety). Many QEMU devices currently support IOThreads: virtio-blk, virtio-scsi, virtio-balloon, monitor, colo-compare, etc. How it works: Add a new item named "indep_iothreads" to the hypervisor struct in the TOML configuration. The default value is 0, and it reuses the original "enable_iothreads" as the switch. If "indep_iothreads" != 0 and "enable_iothreads" = true, a QMP object -iothread indepIOThreadsPrefix_No is added at VM startup. The first user is virtio-blk, which attaches indep_iothread_0 by default when IOThreads are enabled for virtio-blk. Thanks Chen Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:01 +08:00
Manuel Huber	1561d7fbba	runtime: Clear outer CDI annotations Pod annotations from the outer runtime are being used for cold-plugging CDI devices. We need to ensure that these annotations don't leak into the inner runtime for which specific container (sibling) annotations are being created. Without this change, the inner runtime receives both annotations, leading to failing CDI injection as an outer runtime annotation observed in the guest translates to an unresolvable CDI device, for example, cdi.k8s.io/gpu: "nvidia.com/pgpu=0". Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-04 23:18:00 +01:00
Kevin Zhao	af919686ab	Kata-deploy: Add CCA firmware build support runtime: pass firmware to CCA Realm Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-16 17:24:45 +08:00
Kevin Zhao	bfa7f2486d	runtime: Add Arm64 CCA confidential Guest Support This commit adds support for Arm CCA/RME in the Go runtime. The guest kernel has supported it since Linux 6.13. The host kernel on which Kata runs is picked from: https://gitlab.arm.com/linux-arm/linux-cca branch: cca-host/v8, which is currently very stable, has been under review for a while, and is expected to be merged this year. The QEMU support is picked up from: https://git.codelinaro.org/linaro/dcap/qemu.git, branch: cca/2025-05-28. The QEMU support will be merged upstream after CCA host support is officially merged into the Linux kernel. For more information regarding CCA software stack development and testing, please refer to: https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/Building+an+RME+stack+for+QEMU Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-16 17:23:54 +08:00
Sergei Puzyrev	62b12953c7	runtime: fix "num-queues expects uint64" error with virtio-blk Unneeded type-conversion was removed. Fixes #11887 Signed-off-by: Sergei Puzyrev <spuzirev@gmail.com>	2025-10-08 17:09:22 -05:00
Mikko Ylinen	6f45a7f937	runtime: config: allow TDX QGS port=0 `85f3391bc` added the support for TDX QGS port=0 but missed defaultQgsPort in the default config. defaultQgsPort overrides user provided tdx_quote_generation_service_socket_port=0. After this change, defaultQgsPort is not needed anymore since there's no default: any positive integer is OK and negative or unset value becomes a parse error. QEMUTDXQUOTEGENERATIONSERVICESOCKETPORT in the Makefile is used to provide a sane default when tdx_quote_generation_service_socket_port gets set in the configuration. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-30 09:47:05 +02:00
Aurélien Bombo	433e59de1f	gha: zizmor: fix "workflow or action definition without a name" error This fixes that error everywhere by adding a `name:` field to all jobs that were missing it. We keep the same name as the job ID to ensure no disturbance to the required job names. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-25 23:34:40 -05:00
Saul Paredes	252d4486f1	runtime: delete initdata annotation Delete annotation from OCI spec and sandbox config. This is done after the optional initdata annotation value has been read. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:34:26 -07:00
stevenhorsman	16dd1de0ab	kata-monitor: Update deprecated use of grpc functions In google.golang.org/grpc v1.72.0, `DialContext` is deprecated, so switch to use `NewClient` instead. `grpc.WithBlock()` is deprecated and not recommended, so remove it. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	b9ff5ffc21	kata-monitor: Replace use of deprecated expfmt.FmtText In `github.com/prometheus/common v0.62.0` expfmt.FmtText is deprecated, so replace with `expfmt.NewFormat(expfmt.TypeTextPlain)`. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	7f86b967d1	runtime: Replace use of deprecated expfmt.FmtText In `github.com/prometheus/common v0.62.0` expfmt.FmtText is deprecated, so replace with `expfmt.NewFormat(expfmt.TypeTextPlain)`. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
Fabiano Fidêncio	08d2ba1969	cgroups: Fix "." parent cgroup special case `ef642fe890` added a special case to avoid moving cgroups that are on the "default" slice in case of deletion. However, this special check should be done in the Parent() method instead, which ensures that the default resource controller ID is returned, instead of ".". Fixes: #11599 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-27 08:15:15 +02:00
Fabiano Fidêncio	fd1b8ceed1	runtime: qemu: Add reclaim_guest_freed_memory [BACKPORT] Similar to what we've done for Cloud Hypervisor in the commit `9f76467cb7`, we're backporting a runtime-rs feature that would be beneficial to have as part of the go runtime. This allows users to use virtio-balloon for the hypervisor to reclaim memory freed by the guest. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-22 23:56:47 +02:00
stevenhorsman	8cbb1a4357	runtime: Fix non constant Errorf formatting As part of the Go 1.24.6 bump, there are errors about the incorrect use of Errorf with a non-constant format string, so switch to the non-formatting version or add a format string as appropriate. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 10:44:15 +02:00
Hyounggyu Choi	93ec470928	runtime/tests: Update annotation for initdata Let's rename the runtime-rs initdata annotation from `io.katacontainers.config.runtime.cc_init_data` to `io.katacontainers.config.hypervisor.cc_init_data`. Rationale: - initdata itself is a hypervisor-specific feature - the new name aligns with the annotation handling logic: `c92bb1aa88/src/libs/kata-types/src/annotations/mod.rs (L514-L968)` This commit updates the annotation for go-runtime and tests accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-19 15:17:01 +02:00
Paul Meyer	5635410dd3	runtime: make SNP guest policy configurable Depending on the platform configuration, users might want to set a more secure policy than the QEMU default. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-08-13 09:06:36 +02:00
Christophe de Dinechin	ec480dc438	qemu: Respect the JSON schema for hot plug When hot-plugging CPUs on QEMU, we send a QMP command with JSON arguments. QEMU 9.2 recently became more strict[1] enforcing the JSON schema for QMP parameters. As a result, running Kata Containers with QEMU 9.2 results in a message complaining that the core-id parameter is expected to be an integer: ``` qmp hotplug cpu, cpuID=cpu-0 socketID=1, error: QMP command failed: Invalid parameter type for 'core-id', expected: integer ``` Fix that by changing the core-id, socket-id and thread-id to be integer values. [1]: `be93fd5372` Fixes: #11633 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2025-08-07 09:13:57 +02:00
Fabiano Fidêncio	17ce44083c	runtime: Remove reference to sev package Otherwise it'll just break static checks. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 12:49:54 +02:00
Arvind Kumar	ecac3d2d28	runtime: Removing runtime logic for SEV Removing runtime SEV functionality, such as the kbs, ovmf, VMSA handling, and SEV configs as part of deprecating SEV from kata. Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com> Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-07-07 11:17:32 -05:00
Gao Xiang	9079c8e598	runtime: improve EROFS snapshotter support To better support containerd 2.1 and later versions, remove the hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the real mount source (and `/sys/block/loopX/loop/backing_file` if needed). If the mount source doesn't end with `layer.erofs`, it should be marked as unsupported, as it may be a filesystem meta file generated by later containerd versions for the EROFS flattened filesystem feature. Also check whether the filesystem type is `overlay` or not, since the containerd mount manager [1] may change it after being introduced. [1] https://github.com/containerd/containerd/issues/11303 Fixes: `f63ec50ba3` ("runtime: Add EROFS snapshotter with block device support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-06-26 10:12:12 +08:00
Steve Horsman	64c95cb996	Merge pull request #11389 from kata-containers/checkout-persist-credentials-false workflows: Set persist-credentials: false on checkout	2025-06-16 09:58:22 +01:00
stevenhorsman	99e70100c7	workflows: Set persist-credentials: false on checkout By default the checkout action leaves the credentials in the checked-out repo's `.git/config`, which means they could get exposed. Use persist-credentials: false to prevent this. Note: static-checks.yaml does use git diff after the checkout, but the git docs state that git diff is purely local, so it doesn't need authentication. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-10 10:33:41 +01:00
Dan Mihai	1aeef52bae	clh: runtime: add disable_image_nvdimm support Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to set disable_image_nvdimm=true in configuration-clh.toml. disable_image_nvdimm=false is the default config value. Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in configuration-clh.toml. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-10 02:00:52 +00:00
Xynnn007	39aa481da1	runtime: fix initdata support for SNP The QEMU command line for SNP should start with `sev-snp-guest`, followed by the other parameters separated by ','. This patch fixes the parameter order. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-02 20:33:19 +08:00
Champ-Goblem	ef642fe890	runtime: fix cgroupv2 deletion when sandbox_cgroup_only=false Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled, the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:` and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`. This patch updates the deletion logic to take into account the sandbox_cgroup_only=false option and, in this case, use the cgroupfs delete. Fixes: #11036 Signed-off-by: Champ-Goblem <cameron@northflank.com>	2025-05-30 17:51:31 +02:00
stevenhorsman	088e97075c	workflow: Add top-level permissions Set: ``` permissions: contents: read ``` as the default top-level permissions explicitly to conform to recommended security practices e.g. https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions	2025-05-28 19:34:28 +01:00
Paul Meyer	c4815eb3ad	runtime: add option to force guest pull This enables guest pull via config, without the need of any external snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of relying on annotations to select the way to share the root FS, we always use guest pull. Co-authored-by: Markus Rudy <mr@edgeless.systems> Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-05-27 12:42:00 +02:00
stevenhorsman	4de79b9821	runtime: Ignoring deprecated warning. In the latest oci-spec, the prestart hook is deprecated. However, the docker & nerdctl tests failed when I switched to one of the newer hooks which don't run at quite the same time, so ignore the deprecation warnings for now to unblock the security fix Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 15:18:37 +01:00
stevenhorsman	3740ce6e7b	runtime: Update crio annotations We've been using the github.com/containers/podman/v4/pkg/annotations module to get cri-o annotations, which has some major CVEs in, but in v5 most of the annotations were moved into crio (from 1.30) (see https://github.com/cri-o/cri-o/pull/7867). Let's switch to use the cri-o annotations module instead and remediate CVE-2024-3056. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 15:18:37 +01:00
ChengyuZhu6	f63ec50ba3	runtime: Add EROFS snapshotter with block device support - Detection of EROFS options in container rootfs - Creation of necessary EROFS devices - Sharing of rootfs with EROFS via overlayfs Fixes: #11163 Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>	2025-05-05 23:51:13 +02:00
Champ-Goblem	9f76467cb7	runtime: clh: Add reclaim_guest_freed_memory [BACKPORT] We're bringing to Cloud Hypervisor only the reclaim_guest_freed_memory option already present in the runtime-rs. This allows us to use virtio-balloon for the hypervisor to reclaim memory freed by the guest. The reason we're not touching other hypervisors is because we're very much aware of avoiding to clutter the go code at this point, so we'll leave it for whoever really needs this on other hypervisor (and trust me, we really do need it for Cloud Hypervisor right now ;-)). Signed-off-by: Champ-Goblem <cameron@northflank.com> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-25 21:05:53 +02:00
Alex Lyn	8b49564c01	Merge pull request #10610 from Xynnn007/faet-initdata-rbd Feat | Implement initdata for bare-metal/qemu hypervisor	2025-04-24 09:59:14 +08:00
Zvonko Kaiser	6713db8990	gpu: Add CDI parsing for Sandbox as well Extend the CDI parsing for pod_sandbox as well, only single_container was covered properly. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-23 21:02:06 +00:00
Zvonko Kaiser	97f4bcb456	gpu: Remove CDI annotations for outer runtime After the outer runtime has processed the CDI annotation from the spec we can delete them since they were converted into Linux devices in the OCI spec. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-23 21:02:06 +00:00
Xynnn007	91bb6b7c34	runtime: add support for io.katacontainers.config.runtime.cc_init_data io.katacontainers.config.runtime.cc_init_data specifies the initdata used by the pod in base64(gzip(initdata toml)) format. The initdata will be encapsulated into an initdata image and mounted as a raw block device in the guest. The initdata image will be aligned to 512 bytes, chosen as a common sector size supported by different hypervisors like QEMU, CLH and Dragonball. Note that this patch only adds support for the QEMU hypervisor. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-04-15 16:35:59 +08:00
Dan Mihai	8779abd0a1	Merge pull request #11057 from mythi/tdx-qgs-uds runtime: qemu: add support to use TDX QGS via Unix Domain Sockets	2025-04-07 07:27:48 -07:00
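
Commit `295a6a81d0` adds hypervisor devices to the cgroup one at a time and fails only when none of them is available. A minimal sketch of that policy follows. The device list and the names `availableDevices` and `fileExists` are hypothetical and do not come from the runtime. The existence check is passed in as a function so the policy can be tested without touching `/dev`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// hypervisorDevices is an illustrative list of device nodes a hypervisor
// may need; it is not the runtime's actual list.
var hypervisorDevices = []string{"/dev/kvm", "/dev/vhost-net", "/dev/vhost-vsock"}

// availableDevices returns the paths for which exists reports true. A missing
// device is skipped silently; an error is returned only if none is present.
func availableDevices(paths []string, exists func(string) bool) ([]string, error) {
	var found []string
	for _, p := range paths {
		if exists(p) {
			found = append(found, p)
		}
	}
	if len(found) == 0 {
		return nil, errors.New("none of the hypervisor devices are available")
	}
	return found, nil
}

// fileExists reports whether path can be stat'ed on the host.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	devs, err := availableDevices(hypervisorDevices, fileExists)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("devices to allow in the cgroup:", devs)
}
```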
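
Commit `8a449d358f` reads containerd CRI annotations first and falls back to CRI-O annotations when they are empty. The sketch below uses the four annotation keys listed in that commit. The helper name `podIdentifiers` is hypothetical, and so is falling back separately for the name and for the namespace, since the commit does not say whether the fallback is per field.

```go
package main

import "fmt"

// Annotation keys listed in the commit message.
const (
	containerdSandboxName      = "io.kubernetes.cri.sandbox-name"
	containerdSandboxNamespace = "io.kubernetes.cri.sandbox-namespace"
	crioKubeName               = "io.kubernetes.cri-o.KubeName"
	crioNamespace              = "io.kubernetes.cri-o.Namespace"
)

// podIdentifiers returns the pod name and namespace, preferring the
// containerd annotations and using the CRI-O ones for any empty value.
func podIdentifiers(annotations map[string]string) (name, namespace string) {
	name = annotations[containerdSandboxName]
	namespace = annotations[containerdSandboxNamespace]
	if name == "" {
		name = annotations[crioKubeName]
	}
	if namespace == "" {
		namespace = annotations[crioNamespace]
	}
	return name, namespace
}

func main() {
	crio := map[string]string{crioKubeName: "my-pod", crioNamespace: "default"}
	name, ns := podIdentifiers(crio)
	fmt.Println(name, ns) // prints: my-pod default
}
```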
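
Commit `5ca4f2b9ff` describes a merge rule for kernel parameters: replace the value if the key already exists, otherwise append the parameter. Here is a minimal sketch of that rule. `mergeKernelParams` and `paramKey` are hypothetical names. The sketch assumes whitespace-separated `key=value` or bare-flag parameters and does not handle quoted values that contain spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeKernelParams applies overrides to a default kernel command line.
// An existing key has its value replaced in place; a new key is appended.
func mergeKernelParams(defaults, overrides string) string {
	params := strings.Fields(defaults)
	index := make(map[string]int, len(params))
	for i, p := range params {
		index[paramKey(p)] = i
	}
	for _, o := range strings.Fields(overrides) {
		k := paramKey(o)
		if i, ok := index[k]; ok {
			params[i] = o // modify the existing parameter
		} else {
			index[k] = len(params)
			params = append(params, o) // append a new parameter
		}
	}
	return strings.Join(params, " ")
}

// paramKey returns the part of a parameter before the first '='.
func paramKey(p string) string {
	k, _, _ := strings.Cut(p, "=")
	return k
}

func main() {
	fmt.Println(mergeKernelParams("console=hvc0 quiet", "console=ttyS0 debug"))
	// prints: console=ttyS0 quiet debug
}
```

Keeping an index from key to position preserves the original parameter order, so replacing a value does not move it to the end of the command line.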
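
Commits `1561d7fbba` and `97f4bcb456` remove outer-runtime CDI annotations so they do not reach the inner runtime. A minimal sketch of that cleanup follows. It assumes the outer annotations are identified by the `cdi.k8s.io/` key prefix, as in the `cdi.k8s.io/gpu` example. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cdiAnnotationPrefix is the key prefix of CDI device annotations,
// e.g. cdi.k8s.io/gpu: "nvidia.com/pgpu=0".
const cdiAnnotationPrefix = "cdi.k8s.io/"

// stripCDIAnnotations deletes every CDI annotation from the map in place.
// Deleting entries while ranging over a map is permitted in Go.
func stripCDIAnnotations(annotations map[string]string) {
	for k := range annotations {
		if strings.HasPrefix(k, cdiAnnotationPrefix) {
			delete(annotations, k)
		}
	}
}

func main() {
	annotations := map[string]string{
		"cdi.k8s.io/gpu":                 "nvidia.com/pgpu=0",
		"io.kubernetes.cri.sandbox-name": "my-pod",
	}
	stripCDIAnnotations(annotations)
	fmt.Println(annotations) // prints: map[io.kubernetes.cri.sandbox-name:my-pod]
}
```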
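
Commit `8cbb1a4357` addresses the `go vet` printf check that Go 1.24 extended. That check flags calls such as `fmt.Errorf(msg)` where `msg` is not a constant format string. The two fixes the commit mentions look like this. The function names are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// Passing a runtime string as the format, as in fmt.Errorf(msg), is flagged
// by go vet and misrenders any '%' contained in msg.

// wrapMessage uses the non-formatting constructor.
func wrapMessage(msg string) error {
	return errors.New(msg)
}

// wrapWithFormat keeps Errorf but supplies a constant format string.
func wrapWithFormat(msg string) error {
	return fmt.Errorf("%s", msg)
}

func main() {
	msg := "disk usage at 100%"
	fmt.Println(wrapMessage(msg))    // prints: disk usage at 100%
	fmt.Println(wrapWithFormat(msg)) // prints: disk usage at 100%
}
```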
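
Commit `9079c8e598` parses `/proc/mounts` to find the real mount source and resolves loop devices through `/sys/block/loopX/loop/backing_file`. It treats sources that do not end in `layer.erofs` as unsupported. A minimal sketch follows, with hypothetical helper names. Its filesystem-type check is a simplification that accepts only `erofs` mounts, so `overlay` mounts are rejected. The sketch does not decode octal escapes such as `\040` in mount paths.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// mountEntry holds the /proc/mounts fields used below.
type mountEntry struct {
	Source, Target, FSType string
}

// findMount scans /proc/mounts-formatted text for the entry mounted at target.
func findMount(mounts, target string) (mountEntry, bool) {
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && f[1] == target {
			return mountEntry{Source: f[0], Target: f[1], FSType: f[2]}, true
		}
	}
	return mountEntry{}, false
}

// resolveSource maps a loop device such as /dev/loop3 to its backing file,
// read from /sys/block/loop3/loop/backing_file; other sources are unchanged.
func resolveSource(source string) (string, error) {
	if !strings.HasPrefix(source, "/dev/loop") {
		return source, nil
	}
	b, err := os.ReadFile(filepath.Join("/sys/block", filepath.Base(source), "loop", "backing_file"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// isSupportedErofsLayer accepts only erofs mounts whose resolved source ends
// in "layer.erofs"; overlay mounts and other meta files are rejected.
func isSupportedErofsLayer(m mountEntry) bool {
	return m.FSType == "erofs" && strings.HasSuffix(m.Source, "layer.erofs")
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	m, ok := findMount(string(data), "/")
	if !ok {
		fmt.Println("no mount found for /")
		return
	}
	if src, err := resolveSource(m.Source); err == nil {
		m.Source = src
	}
	fmt.Printf("%+v supported=%v\n", m, isSupportedErofsLayer(m))
}
```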
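
Commit `39aa481da1` requires the SNP object argument to start with `sev-snp-guest`, followed by comma-separated properties. The sketch below builds the argument with the type pinned to the front, using an ordered list so the properties keep their order. The property values (`cbitpos`, `reduced-phys-bits`, `policy`) are illustrative assumptions and are not taken from the runtime's configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// snpObjectArg builds the value for QEMU's -object option: the object type
// comes first, then key=value properties in the order given.
func snpObjectArg(props [][2]string) string {
	parts := []string{"sev-snp-guest"}
	for _, kv := range props {
		parts = append(parts, kv[0]+"="+kv[1])
	}
	return strings.Join(parts, ",")
}

func main() {
	arg := snpObjectArg([][2]string{
		{"id", "snp"}, {"cbitpos", "51"}, {"reduced-phys-bits", "1"}, {"policy", "0x30000"},
	})
	fmt.Println("-object", arg)
	// prints: -object sev-snp-guest,id=snp,cbitpos=51,reduced-phys-bits=1,policy=0x30000
}
```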
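
Commit `ef642fe890` explains that a `:` in the path made deletion go through systemd, even when the cgroup had been created with cgroupfs because cgroup v2 was used with `sandbox_cgroup_only=false`. The sketch below shows the corrected decision. `deleteBackend` is a hypothetical name. The `slice:prefix:name` path form is the usual systemd cgroup convention and is assumed here.

```go
package main

import (
	"fmt"
	"strings"
)

// isSystemdCgroup applies the heuristic named in the commit: systemd cgroup
// paths look like "slice:prefix:name", so they contain ':'.
func isSystemdCgroup(path string) bool {
	return strings.Contains(path, ":")
}

// deleteBackend chooses how to delete a sandbox cgroup. With cgroup v2 and
// sandbox_cgroup_only=false the cgroup was created through cgroupfs, so it
// is deleted through cgroupfs whatever the path looks like.
func deleteBackend(path string, cgroupV2, sandboxCgroupOnly bool) string {
	if cgroupV2 && !sandboxCgroupOnly {
		return "cgroupfs"
	}
	if isSystemdCgroup(path) {
		return "systemd"
	}
	return "cgroupfs"
}

func main() {
	path := "kubepods-besteffort.slice:cri-containerd:abc123"
	fmt.Println(deleteBackend(path, true, false)) // prints: cgroupfs
	fmt.Println(deleteBackend(path, true, true))  // prints: systemd
}
```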
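
Commit `91bb6b7c34` specifies the `cc_init_data` annotation value as base64(gzip(initdata toml)) and pads the resulting image to 512 bytes. The sketch below decodes such a value and zero-pads it to a sector boundary. The helper names are hypothetical. The sketch covers only decoding and padding and does not reproduce the exact on-disk layout the runtime writes, which may include extra framing.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

const sectorSize = 512

// decodeInitdata reverses the base64(gzip(toml)) encoding of the annotation.
func decodeInitdata(annotation string) ([]byte, error) {
	compressed, err := base64.StdEncoding.DecodeString(annotation)
	if err != nil {
		return nil, fmt.Errorf("base64: %w", err)
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, fmt.Errorf("gzip: %w", err)
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// padToSector zero-pads data to a multiple of 512 bytes.
func padToSector(data []byte) []byte {
	if rem := len(data) % sectorSize; rem != 0 {
		data = append(data, make([]byte, sectorSize-rem)...)
	}
	return data
}

// encodeInitdata builds an annotation value; used here only for the demo.
func encodeInitdata(toml []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(toml); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	ann, err := encodeInitdata([]byte("version = \"0.1.0\"\n"))
	if err != nil {
		panic(err)
	}
	raw, err := decodeInitdata(ann)
	if err != nil {
		panic(err)
	}
	img := padToSector(raw)
	fmt.Printf("toml: %d bytes, image: %d bytes\n", len(raw), len(img))
}
```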


562 Commits