Commit Graph

534 Commits

Author SHA1 Message Date
Hyounggyu Choi
93ec470928 runtime/tests: Update annotation for initdata
Let's rename the runtime-rs initdata annotation from
`io.katacontainers.config.runtime.cc_init_data` to
`io.katacontainers.config.hypervisor.cc_init_data`.

Rationale:
- initdata itself is a hypervisor-specific feature
- the new name aligns with the annotation handling logic:
c92bb1aa88/src/libs/kata-types/src/annotations/mod.rs (L514-L968)

This commit updates the annotation for go-runtime and tests accordingly.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-08-19 15:17:01 +02:00
Paul Meyer
5635410dd3 runtime: make SNP guest policy configurable
Dependening on the platform configuration, users might want to
set a more secure policy than the QEMU default.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-08-13 09:06:36 +02:00
Christophe de Dinechin
ec480dc438 qemu: Respect the JSON schema for hot plug
When hot-plugging CPUs on QEMU, we send a QMP command with JSON
arguments. QEMU 9.2 recently became more strict[1] enforcing the
JSON schema for QMP parameters. As a result, running Kata Containers
with QEMU 9.2 results in a message complaining that the core-id
parameter is expected to be an integer:

```
qmp hotplug cpu, cpuID=cpu-0 socketID=1, error:
QMP command failed:
Invalid parameter type for 'core-id', expected: integer
```

Fix that by changing the core-id, socket-id and thread-id to be
integer values.

[1]: be93fd5372

Fixes: #11633

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2025-08-07 09:13:57 +02:00
Fabiano Fidêncio
17ce44083c runtime: Remove reference to sev package
Otherwise it'll just break static checks.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-07-18 12:49:54 +02:00
Arvind Kumar
ecac3d2d28 runtime: Removing runtime logic for SEV
Removing runtime SEV functionality,
such as the kbs, ovmf, VMSA handling,
and SEV configs as part of deprecating
SEV from kata.

Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
2025-07-07 11:17:32 -05:00
Gao Xiang
9079c8e598 runtime: improve EROFS snapshotter support
To better support containerd 2.1 and later versions, remove the
hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the
real mount source (and `/sys/block/loopX/loop/backing_file` if needed).

If the mount source doesn't end with `layer.erofs`, it should be marked
as unsupported, as it may be a filesystem meta file generated by later
containerd versions for the EROFS flattened filesystem feature.

Also check whether the filesystem type is `overlay` or not, since the
containerd mount manager [1] may change it after being introduced.

[1] https://github.com/containerd/containerd/issues/11303

Fixes: f63ec50ba3 ("runtime: Add EROFS snapshotter with block device support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-06-26 10:12:12 +08:00
Steve Horsman
64c95cb996
Merge pull request #11389 from kata-containers/checkout-persist-credentials-false
workflows: Set persist-credentials: false on checkout
2025-06-16 09:58:22 +01:00
stevenhorsman
99e70100c7 workflows: Set persist-credentials: false on checkout
By default the checkout action leave the credentials
in the checked-out repo's `.git/config`, which means
they could get exposed. Use persist-credentials: false
to prevent this happening.

Note: static-checks.yaml does use git diff after the checkout,
but the git docs state that git diff is just local, so doesn't
need authentication.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-06-10 10:33:41 +01:00
Dan Mihai
1aeef52bae clh: runtime: add disable_image_nvdimm support
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to
set disable_image_nvdimm=true in configuration-clh.toml.

disable_image_nvdimm=false is the default config value.

Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in
configuration-clh.toml.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-06-10 02:00:52 +00:00
Xynnn007
39aa481da1 runtime: fix initdata support for SNP
the qemu commandline of SNP should start with `sev-snp-guest`, and then
following other parameters separeted by ','. This patch fixes the
parameter order.

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2025-06-02 20:33:19 +08:00
Champ-Goblem
ef642fe890 runtime: fix cgroupv2 deletion when sandbox_cgroup_only=false
Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled,
the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:`
and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process
fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`.

This patch updates the deletion logic to take in to account the sandbox_cgroup_only=false option and in this case uses
the cgroupfs delete.

Fixes: #11036
Signed-off-by: Champ-Goblem <cameron@northflank.com>
2025-05-30 17:51:31 +02:00
stevenhorsman
088e97075c workflow: Add top-level permissions
Set:
```
permissions:
  contents: read
```
as the default top-level permissions explicitly
to conform to recommended security practices e.g.
https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions
2025-05-28 19:34:28 +01:00
Paul Meyer
c4815eb3ad runtime: add option to force guest pull
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.

Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-05-27 12:42:00 +02:00
stevenhorsman
4de79b9821 runtime: Ignoring deprecated warning.
In the latest oci-spec, the prestart hook is deprecated.
However, the docker & nerdctl tests failed when I switched
to one of the newer hooks which don't run at quite the same time,
so ignore the deprecation warnings for now to unblock the security fix

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-05-06 15:18:37 +01:00
stevenhorsman
3740ce6e7b runtime: Update crio annotations
We've been using the
github.com/containers/podman/v4/pkg/annotations module
to get cri-o annotations, which has some major CVEs in, but
in v5 most of the annotations were moved into crio (from 1.30)
(see https://github.com/cri-o/cri-o/pull/7867). Let's switch
to use the cri-o annotations module instead and remediate
CVE-2024-3056.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-05-06 15:18:37 +01:00
ChengyuZhu6
f63ec50ba3 runtime: Add EROFS snapshotter with block device support
- Detection of EROFS options in container rootfs
- Creation of necessary EROFS devices
- Sharing of rootfs with EROFS via overlayfs

Fixes: #11163

Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
2025-05-05 23:51:13 +02:00
Champ-Goblem
9f76467cb7 runtime: clh: Add reclaim_guest_freed_memory [BACKPORT]
We're bringing to *Cloud Hypervisor only* the reclaim_guest_freed_memory
option already present in the runtime-rs.

This allows us to use virtio-balloon for the hypervisor to reclaim
memory freed by the guest.

The reason we're not touching other hypervisors is because we're very
much aware of avoiding to clutter the go code at this point, so we'll
leave it for whoever really needs this on other hypervisor (and trust
me, we really do need it for Cloud Hypervisor right now ;-)).

Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-04-25 21:05:53 +02:00
Alex Lyn
8b49564c01
Merge pull request #10610 from Xynnn007/faet-initdata-rbd
Feat | Implement initdata for bare-metal/qemu hypervisor
2025-04-24 09:59:14 +08:00
Zvonko Kaiser
6713db8990 gpu: Add CDI parsing for Sandbox as well
Extend the CDI parsing for pod_sandbox as well, only single_container
was covered properly.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-04-23 21:02:06 +00:00
Zvonko Kaiser
97f4bcb456 gpu: Remove CDI annotations for outer runtime
After the outer runtime has processed the CDI annotation from the
spec we can delete them since they were converted into Linux
devices in the OCI spec.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-04-23 21:02:06 +00:00
Xynnn007
91bb6b7c34 runtime: add support for io.katacontainers.config.runtime.cc_init_data
io.katacontainers.config.runtime.cc_init_data specifies initdata used by
the pod in base64(gzip(initdata toml)) format. The initdata will be
encapsulated into an initdata image and mount it as a raw block device
to the guest.

The initdata image will be aligned with 512 bytes, which is chosen as a
usual sector size supported by different hypervisors like qemu, clh and
dragonball.

Note that this patch only adds support for qemu hypervisor.

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2025-04-15 16:35:59 +08:00
Dan Mihai
8779abd0a1
Merge pull request #11057 from mythi/tdx-qgs-uds
runtime: qemu: add support to use TDX QGS via Unix Domain Sockets
2025-04-07 07:27:48 -07:00
Ruoqing He
1662595146 runtime: Introduce riscv64 to govmm pkg
Define `vmm` for riscv64, set `MaxVCPUs` to 512 as QEMU RISC-V virt
Generic Virtual Platform [1] define.

[1] https://www.qemu.org/docs/master/system/riscv/virt.html

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-03-27 09:57:49 +08:00
Mikko Ylinen
85f3391bcf runtime: qemu: add support to use TDX QGS via Unix Domain Sockets
TDX Quote Generation Service (QGS) signs TDREPORT sent to it from
Qemu (GetQuote hypercall). Qemu needs quote-generation-socket
address configured for IPC.

Currently, Kata govmm only enables vsock based IPC for QGS but
QGS supports Unix Domain Sockets too which works well for host
process to process IPC (Qemu <-> QGS).

The QGS configuration to enable UDS is to run the service with "-port=0"
parameter. The same works well here too: setting
"tdx_quote_generation_service_socket_port=0" let's users to enable
UDS based IPC.

The socket path is fixed in QGS and cannot be configured: when "-port=0"
is used, the socket appears in /var/run/tdx-qgs/qgs.socket.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-03-25 10:18:40 +02:00
Greg Kurz
e19b81225c
Merge pull request #11045 from kata-containers/sprt/fix-gha-tag
security: ci: Pin third-party actions to commit hashes
2025-03-20 08:14:06 +01:00
Aurélien Bombo
a678046d13 gha: Pin third-party actions to commit hashes
A popular third-party action has recently been compromised [1][2] and
the attacker managed to point multiple git version tags to a malicious
commit containing code to exfiltrate secrets.

This PR follows GitHub's recommendation [3] to pin third-party actions
to a full-length commit hash, to mitigate such attacks.

Hopefully actionlint starts warning about this soon [4].

 [1] https://www.cve.org/CVERecord?id=CVE-2025-30066
 [2] https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised
 [3] https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions#using-third-party-actions
 [4] https://github.com/rhysd/actionlint/pull/436

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-03-19 13:52:49 -05:00
stevenhorsman
cb7c599180 runtime: Switch from deprecated tracer
`go.opentelemetry.io/otel/trace.NewNoopTracerProvider`
is deprectated now, so switch to
`go.opentelemetry.io/otel/trace/noop.NewTracerProvider`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-19 14:22:06 +00:00
Paul Meyer
a994f142d0 runtime: make SNP IDBlock configurable
For a use case, we want to set the SNP IDBlock, which allows
configuring the AMD ASP to enforce parameters like expected launch
digest at launch. The struct with the config that should be enforced
(IDBlock) is signed. The public key is placed in the auth block and
the signature is verified by the ASP before launch. The digest of the
public key is also part of the attestation report (ID_KEY_DIGESTS).

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-03-14 07:50:54 +01:00
Cameron Baird
d48116114e runtime: Properly set default hyp loglevel to 1
Tweak default HypervisorLoglevel config option for clh to 1.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-03-04 20:36:40 +00:00
Markus Rudy
1f6833bd0d runtime: add cause to CDI errors
Adding devices by CDI annotation can fail for a variety of reasons. If
that happens, it's helpful to know the root cause of the issue (CDI spec
missing, malformatted, requested device not present, etc.).

This commit adds the root cause of the CDI device addition to the errors
reported back to the caller. Since this error is bubbled up all the way
back to the shimv2 task.Create handler, it will be visible in Kubernetes
logs and enable fixing the root cause.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2025-02-26 08:36:15 +01:00
Zvonko Kaiser
804e5cd332 gpu: IOMMUFD provide proper ID
We need a proper ID otherwise QEMU sometimes fails with invalid ID.
Use the same pattern as with the old VFIO implementation.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-25 16:24:17 +00:00
Hyounggyu Choi
58647bb654
Merge pull request #10743 from zvonkok/iommufd-gpu-fix
IOMMUFD GPU enhancement
2025-02-20 23:43:00 +01:00
Zvonko Kaiser
7cca2c4925 gpu: Use a dedicated VFIO group vs iommufd entry
We do not want to abuse the sysfsentry lets use a dedicated
devfsentry.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-20 18:27:52 +00:00
Zvonko Kaiser
9add633258 qemu: Add command line for IOMMUFD
For each IOMMUFD device create an object and assign
it to the device, we need additional information that
is populated now correctly to decide if we run the old VFIO
or new VFIO backend.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-20 18:27:50 +00:00
Dan Mihai
3fc170788d
Merge pull request #10811 from microsoft/cameronbaird/hyp-loglevel-upstream
CLH: config: add hypervisor_loglevel
2025-02-04 11:59:21 -08:00
Cameron Baird
b6b0addd5e config: add hypervisor_loglevel
Implement HypervisorLoglevel config option for clh.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-01-31 18:37:03 +00:00
Ryan Savino
c1ca49a66c snp: set snp to use upstream qemu in config
use upstream qemu in snp and nvidia snp configs.
load ovmf with bios flag on qemu cmdline instead of file.

Fixes: #10750

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2025-01-28 18:09:40 -06:00
Ryan Savino
e87231edc7 snp: remove snp certs on qemu cmdline
snp standard attestation with the upstream kernel and qemu do not support extended attestation with certs.

Fixes: #10750

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2025-01-28 18:09:40 -06:00
Zvonko Kaiser
e82fdee20f runtime: Add proper IOMMUFD parsing
With newer kernels we have a new backend for VFIO
called IOMMUFD this is a departure from VFIO IOMMU Groups
since it has only one device associated with an IOMMUFD entry.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-01-15 23:39:33 +00:00
Moritz Sanft
e5735b221c
runtime: use actual booleans for QMP device_add boolean options
Since
be93fd5372,
which is included in QEMU since version 9.2.0, the options for the
`device_add` QMP command need to be typed correctly.

This makes it so that instead of `"on"`, the value is set to `true`,
matching QEMU's expectations.

This has been tested on QEMU 9.2.0 and QEMU 9.1.2, so before and after
the change.

The compatibility with incorrectly typed options  for the `device_add`
command is deprecated since version 6.2.0 [^1].

[^1]:  https://qemu-project.gitlab.io/qemu/about/deprecated.html#incorrectly-typed-device-add-arguments-since-6-2

Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2025-01-10 11:53:56 +01:00
stevenhorsman
f02d540799 workflows: Bump outdated action versions
Bump some actions that are significantly out-of-date
and out of sync with the versions used in other workflows

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-12-06 13:50:12 +00:00
Alexandru Matei
e83f8f8a04 runtime: Set maxvcpus equal to vcpus for the static resources case
Fixes: #9194

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-11-12 16:36:42 +02:00
Aurélien Bombo
19e972151f gha: Hardcode ubuntu-22.04 instead of latest
GHA is migrating ubuntu-latest to Ubuntu 24 so
let's hardcode the current 22.04 LTS.

https://github.blog/changelog/2024-11-05-notice-of-breaking-changes-for-github-actions/

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-11-08 11:00:15 -06:00
Pradipta Banerjee
6f1ba007ed runtime: Add GPU annotations for remote hypervisor
Add GPU annotations for remote hypervisor to help
with the right instance selection based on number of GPUs
and model

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2024-10-29 10:28:21 -04:00
Fabiano Fidêncio
fefcf7cfa4 acrn: Drop support
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-19 16:05:43 +02:00
Fabiano Fidêncio
3d5f48e02e
Merge pull request #10283 from alexman-stripe/alexman-stripe/fix-kata-shim-not-reporting-inactive-file-cgroup-v2
shim: Fix memory usage reporting for cgroup v2
2024-09-19 12:50:36 +02:00
Fabiano Fidêncio
eccdffebf7
Merge pull request #10243 from katexochen/nydus-overlayfs-path
virtcontainers: allow specifying nydus-overlayfs binary by path
2024-09-19 11:35:45 +02:00
Alex Man
27f8f69195 shim: Fix memory usage reporting for cgroup v2
kata-shim was not reporting `inactive_file` in memory stat.

This memory is deducted by containerd when calculating the size of container working set, as it can be paged out by the operating
system under memory pressure. Without reporting `inactive_file`, containerd will over report container memory usage.
[Here](https://github.com/containerd/containerd/blob/v1.7.22/pkg/cri/server/container_stats_list_linux.go#L117) is where containerd
deducts `inactive_file` from memory usage.

Note that kata-shim correctly reports `total_inactive_file` for cgroup v1, but this was not implemented for cgroup v2.

This commit:
- Adds code in kata-shim to report "inactive_file" memory for cgroup v2
- Implements reporting of all available cgroup v2 memory stats to containerd
- Uses defensive coding to avoid assuming existence of any memory.stat fields

The list of available cgroup v2 memory stats defined by containerd can be found
[here](https://pkg.go.dev/github.com/containerd/cgroups/v2/stats#MemoryStat).

Fixes #10280

Signed-off-by: Alex Man <alexman@stripe.com>
2024-09-18 14:04:24 -07:00
Fabiano Fidêncio
65a4562050 runtime: qemu: tdx: Add omitempty to QuoteGenerationSocket
I know right now we're always passing a value for that, but this doesn't
really have to be set unless attestation is used.  Thus, let's also omit
it in case it's empty.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-06 15:05:55 +02:00
Fabiano Fidêncio
7818484120 runtime: qemu: tdx: Support mrconfigid / mrowner/ mrownerconfig
This is a quick and simple pre-req for supporting initData, which will
take advantage of the mrconfigid in the TDX case.

While already adding mrconfigid, which is hardcoded empty right now,
let's do the same for mrowner and mrownerconfig, and leave it prepared
for future expansions.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-06 15:05:54 +02:00