Let's rename the runtime-rs initdata annotation from
`io.katacontainers.config.runtime.cc_init_data` to
`io.katacontainers.config.hypervisor.cc_init_data`.
Rationale:
- initdata itself is a hypervisor-specific feature
- the new name aligns with the annotation handling logic:
c92bb1aa88/src/libs/kata-types/src/annotations/mod.rs (L514-L968)
This commit updates the annotation for go-runtime and tests accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Dependening on the platform configuration, users might want to
set a more secure policy than the QEMU default.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
When hot-plugging CPUs on QEMU, we send a QMP command with JSON
arguments. QEMU 9.2 recently became more strict[1] enforcing the
JSON schema for QMP parameters. As a result, running Kata Containers
with QEMU 9.2 results in a message complaining that the core-id
parameter is expected to be an integer:
```
qmp hotplug cpu, cpuID=cpu-0 socketID=1, error:
QMP command failed:
Invalid parameter type for 'core-id', expected: integer
```
Fix that by changing the core-id, socket-id and thread-id to be
integer values.
[1]: be93fd5372Fixes: #11633
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Removing runtime SEV functionality,
such as the kbs, ovmf, VMSA handling,
and SEV configs as part of deprecating
SEV from kata.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
To better support containerd 2.1 and later versions, remove the
hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the
real mount source (and `/sys/block/loopX/loop/backing_file` if needed).
If the mount source doesn't end with `layer.erofs`, it should be marked
as unsupported, as it may be a filesystem meta file generated by later
containerd versions for the EROFS flattened filesystem feature.
Also check whether the filesystem type is `overlay` or not, since the
containerd mount manager [1] may change it after being introduced.
[1] https://github.com/containerd/containerd/issues/11303
Fixes: f63ec50ba3 ("runtime: Add EROFS snapshotter with block device support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
By default the checkout action leave the credentials
in the checked-out repo's `.git/config`, which means
they could get exposed. Use persist-credentials: false
to prevent this happening.
Note: static-checks.yaml does use git diff after the checkout,
but the git docs state that git diff is just local, so doesn't
need authentication.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to
set disable_image_nvdimm=true in configuration-clh.toml.
disable_image_nvdimm=false is the default config value.
Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in
configuration-clh.toml.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
the qemu commandline of SNP should start with `sev-snp-guest`, and then
following other parameters separeted by ','. This patch fixes the
parameter order.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled,
the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:`
and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process
fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`.
This patch updates the deletion logic to take in to account the sandbox_cgroup_only=false option and in this case uses
the cgroupfs delete.
Fixes: #11036
Signed-off-by: Champ-Goblem <cameron@northflank.com>
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.
Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
In the latest oci-spec, the prestart hook is deprecated.
However, the docker & nerdctl tests failed when I switched
to one of the newer hooks which don't run at quite the same time,
so ignore the deprecation warnings for now to unblock the security fix
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We've been using the
github.com/containers/podman/v4/pkg/annotations module
to get cri-o annotations, which has some major CVEs in, but
in v5 most of the annotations were moved into crio (from 1.30)
(see https://github.com/cri-o/cri-o/pull/7867). Let's switch
to use the cri-o annotations module instead and remediate
CVE-2024-3056.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Detection of EROFS options in container rootfs
- Creation of necessary EROFS devices
- Sharing of rootfs with EROFS via overlayfs
Fixes: #11163
Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
We're bringing to *Cloud Hypervisor only* the reclaim_guest_freed_memory
option already present in the runtime-rs.
This allows us to use virtio-balloon for the hypervisor to reclaim
memory freed by the guest.
The reason we're not touching other hypervisors is because we're very
much aware of avoiding to clutter the go code at this point, so we'll
leave it for whoever really needs this on other hypervisor (and trust
me, we really do need it for Cloud Hypervisor right now ;-)).
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
After the outer runtime has processed the CDI annotation from the
spec we can delete them since they were converted into Linux
devices in the OCI spec.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
io.katacontainers.config.runtime.cc_init_data specifies initdata used by
the pod in base64(gzip(initdata toml)) format. The initdata will be
encapsulated into an initdata image and mount it as a raw block device
to the guest.
The initdata image will be aligned with 512 bytes, which is chosen as a
usual sector size supported by different hypervisors like qemu, clh and
dragonball.
Note that this patch only adds support for qemu hypervisor.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
TDX Quote Generation Service (QGS) signs TDREPORT sent to it from
Qemu (GetQuote hypercall). Qemu needs quote-generation-socket
address configured for IPC.
Currently, Kata govmm only enables vsock based IPC for QGS but
QGS supports Unix Domain Sockets too which works well for host
process to process IPC (Qemu <-> QGS).
The QGS configuration to enable UDS is to run the service with "-port=0"
parameter. The same works well here too: setting
"tdx_quote_generation_service_socket_port=0" let's users to enable
UDS based IPC.
The socket path is fixed in QGS and cannot be configured: when "-port=0"
is used, the socket appears in /var/run/tdx-qgs/qgs.socket.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
`go.opentelemetry.io/otel/trace.NewNoopTracerProvider`
is deprectated now, so switch to
`go.opentelemetry.io/otel/trace/noop.NewTracerProvider`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For a use case, we want to set the SNP IDBlock, which allows
configuring the AMD ASP to enforce parameters like expected launch
digest at launch. The struct with the config that should be enforced
(IDBlock) is signed. The public key is placed in the auth block and
the signature is verified by the ASP before launch. The digest of the
public key is also part of the attestation report (ID_KEY_DIGESTS).
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Adding devices by CDI annotation can fail for a variety of reasons. If
that happens, it's helpful to know the root cause of the issue (CDI spec
missing, malformatted, requested device not present, etc.).
This commit adds the root cause of the CDI device addition to the errors
reported back to the caller. Since this error is bubbled up all the way
back to the shimv2 task.Create handler, it will be visible in Kubernetes
logs and enable fixing the root cause.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
We need a proper ID otherwise QEMU sometimes fails with invalid ID.
Use the same pattern as with the old VFIO implementation.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For each IOMMUFD device create an object and assign
it to the device, we need additional information that
is populated now correctly to decide if we run the old VFIO
or new VFIO backend.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
use upstream qemu in snp and nvidia snp configs.
load ovmf with bios flag on qemu cmdline instead of file.
Fixes: #10750
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
snp standard attestation with the upstream kernel and qemu do not support extended attestation with certs.
Fixes: #10750
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
With newer kernels we have a new backend for VFIO
called IOMMUFD this is a departure from VFIO IOMMU Groups
since it has only one device associated with an IOMMUFD entry.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Since
be93fd5372,
which is included in QEMU since version 9.2.0, the options for the
`device_add` QMP command need to be typed correctly.
This makes it so that instead of `"on"`, the value is set to `true`,
matching QEMU's expectations.
This has been tested on QEMU 9.2.0 and QEMU 9.1.2, so before and after
the change.
The compatibility with incorrectly typed options for the `device_add`
command is deprecated since version 6.2.0 [^1].
[^1]: https://qemu-project.gitlab.io/qemu/about/deprecated.html#incorrectly-typed-device-add-arguments-since-6-2
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Bump some actions that are significantly out-of-date
and out of sync with the versions used in other workflows
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add GPU annotations for remote hypervisor to help
with the right instance selection based on number of GPUs
and model
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
kata-shim was not reporting `inactive_file` in memory stat.
This memory is deducted by containerd when calculating the size of container working set, as it can be paged out by the operating
system under memory pressure. Without reporting `inactive_file`, containerd will over report container memory usage.
[Here](https://github.com/containerd/containerd/blob/v1.7.22/pkg/cri/server/container_stats_list_linux.go#L117) is where containerd
deducts `inactive_file` from memory usage.
Note that kata-shim correctly reports `total_inactive_file` for cgroup v1, but this was not implemented for cgroup v2.
This commit:
- Adds code in kata-shim to report "inactive_file" memory for cgroup v2
- Implements reporting of all available cgroup v2 memory stats to containerd
- Uses defensive coding to avoid assuming existence of any memory.stat fields
The list of available cgroup v2 memory stats defined by containerd can be found
[here](https://pkg.go.dev/github.com/containerd/cgroups/v2/stats#MemoryStat).
Fixes#10280
Signed-off-by: Alex Man <alexman@stripe.com>
I know right now we're always passing a value for that, but this doesn't
really have to be set unless attestation is used. Thus, let's also omit
it in case it's empty.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This is a quick and simple pre-req for supporting initData, which will
take advantage of the mrconfigid in the TDX case.
While already adding mrconfigid, which is hardcoded empty right now,
let's do the same for mrowner and mrownerconfig, and leave it prepared
for future expansions.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>