Add container_exec_with_retries(), useful for retrying if needed
commands similar to:
kubectl exec <pod_name> -c <container_name> -- <command>
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
On s390x, QEMU fails if maxmem is set to 0:
```
invalid value of maxmem: maximum memory size (0x0) must be at least the initial memory size
```
This commit sets maxmem to the initial memory size for s390x when hotplug is disabled,
resolving the error while still ensuring that memory hotplug remains off.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The setting '-m xM,slots=y,maxmem=zM' where maxmem is from
the host's memory capacity is failing with confidential VMs
on hosts having 1T+ of RAM.
slots/maxmem are necessary for setups where the container
memory is hotplugged to the VM during container creation based
on createContainer info.
This is not the case with CoCo since StaticResourceManagement
is enabled and memory hotplug flows have not been checked.
To avoid unexpeted errors with maxmem, disable slots/maxmem
in case ConfidentialGuest is requested.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Add the annotation of OCI bundle path to store its path.
As it'll be checked within agent policy, we need add them
to pass agent policy validations.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
With the help of `update_ocispec_annotations`, we'll add the contaienr
type key with "io.katacontainers.pkg.oci.container_type" and its
corresponding type "pod_sandbox" when it's pause container and
"pod_container" when it's an other containers.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
It'll updates OCI annotations by removing specified keys and adding
new ones. This function creates a new `HashMap` containing the updated
annotations, ensuring that the original map remains unchanged.
It is optimized for performance by pre-allocating the necessary capacity
and handling removals and additions efficiently.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To enable access to the constants `POD_CONTAINER` and `POD_SANDBOX` from
other crates, their visibility has been updated to public. This change
addresses the previous limitation of restricted access and ensures these
values can be utilized across the codebase.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add the annotation of nerdctl network namespace to let nerdctl know which namespace
to use when calling the selected CNI plugin with "nerdctl/network-namespace".
As it'll be checked within agent policy, we need add them to pass agent policy validations.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
On 63f6dcdeb9 we added the support to
download either a .xz or a .zst tarball file. However, we missed adding
the code to properly unpack a .zst tarball file.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
agent-ctl's make check has been failing with:
```
Checking kata-agent-ctl v0.0.1 (/home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/src/tools/agent-ctl)
error[E0432]: unresolved import `hypervisor::ch`
--> src/vm/vm_ops.rs:10:5
|
10 | ch::CloudHypervisor,
| ^^ could not find `ch` in `hypervisor`
|
note: found an item that was configured out
--> /home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/src/runtime-rs/crates/hypervisor/src/lib.rs:30:9
|
30 | pub mod ch;
| ^^
note: the item is gated here
--> /home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/src/runtime-rs/crates/hypervisor/src/lib.rs:26:1
|
26 | / #[cfg(all(
27 | | feature = "cloud-hypervisor",
28 | | any(target_arch = "x86_64", target_arch = "aarch64")
29 | | ))]
| |___^
```
Let's just make sure that we include ch conditionally as well.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Docker containers support specifying the shm size using the --shm-size
option and support sandbox-level shm volumes, so we've added support for
shm volumes. Since Kubernetes doesn't support specifying the shm size,
it typically uses a memory-based emptydir as the container's shm, and
its size can be specified.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the rate limiter would be shared by cloud-hypervisor
and firecracker etc, thus move it from clh's config to
hypervisor config crate which would be shared by other vmm.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Given that Rust-based VMMs like cloud-hypervisor, Firecracker, and
Dragonball naturally offer user-level block I/O rate limiting, I/O
throttling has been implemented to leverage this capability for these
VMMs. This PR specifically introduces support for cloud-hypervisor.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Support for the share-rw=true parameter has been added. While this
parameter is essential for maintaining data consistency across multiple
QEMU instances sharing a backend disk image, its implementation also
serves to standardize parameters with the block device hotplug
functionality in kata-runtime/qemu.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Docker tests have been broken for a while and should be removed if we
cannot maintain those.
For now, though, let's limit it to run only with one hypervisor and
avoid wasting resources for no reason.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
devmapper tests have been failing for a while. It's been breaking on the
kata-deploy deployment, which is most likely related to Disk Pressure.
Removing files was not enough to get the tests to run, so we'll just run
those with QEMU as a way to test fixes. Once we get the test working,
we can re-enable the other VMMs, but for now let's just not waste
resources for no reason.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
add_allow_all_policy_to_yaml now also sets the initdata annotation. So don't overwrite the
initdata annotation that was previously set by create_coco_pod_yaml_with_annotations.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Delete annotation from OCI spec and sandbox config. This is done after the optional initdata annotation value has been read.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Bump cri-o to 1.34.0 to try and remediate security advisories
CVE-2025-0750 and CVE-2025-4437.
Note: Running
```
go get github.com/cri-o/cri-o@v1.34.0
```
seems to bump a lot of other go modules, hence the size of the
vendor diff
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In google.golang.org/grpc v1.72.0, `DialContext`, is deprecated, so
switch to use `NewClient` instead.
`grpc.WithBlock()` is deprecated and not recommend, so remove this
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In `github.com/prometheus/common v0.62.0` expfmt.FmtText
is deprecated, so replace with `expfmt.NewFormat(expfmt.TypeTextPlain)`.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In `github.com/prometheus/common v0.62.0` expfmt.FmtText
is deprecated, so replace with `expfmt.NewFormat(expfmt.TypeTextPlain)`.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This is following Steve's suggestion, based on what's been done on
cloud-api-adaptor.
The reason we're doing it here is because we've seen pods being evicted
due to disk pressure.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
I've hit this when using a machine with slow internet connection, which
took ages to download the kata-cleanup image, and then helm timed out in
the middle of the cleanup, leading to the cleanup job being restarted
and then bailing with an error as the runtimeclasses that kata-deploy
tries to delete were already deleted.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Since the cloud hypervisor's resize vCPU is an asynchronous operation,
it's possible that the previous resize operation hasn't completed when
the request is sent, causing the current call to return an error.
Therefore, several retries can be performed to avoid this error.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>