Commit Graph

16328 Commits

Author SHA1 Message Date
Saul Paredes
e7b9eddced
Merge pull request #11248 from microsoft/archana1/storages
genpolicy: add validation for storages
2025-07-01 10:02:10 -07:00
Fabiano Fidêncio
07b41c88de
Merge pull request #11490 from Apokleos/fix-noise
runtime-rs: Fix noise with frequently appearing in unstaged changes
2025-07-01 17:43:41 +02:00
Archana Choudhary
6932beb01f policy: fix parse errors in rules.rego
This patch fixes the rules.rego file to ensure that the
policy is correctly parsed and applied by opa.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 12:43:41 +00:00
Archana Choudhary
abbe1be69f tests: enable confidential_guest setting for coco
This commit updates the `tests_common.sh` script
to enable the `confidential_guest`
setting for the coco tests in the Kubernetes
integration tests.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
9dd365fdb5 genpolicy: fix mount source check in rules.rego
This commit fixes the mount source check in rules.rego.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
1cbea890f1 genpolicy: tests: update testcases for execprocess
This patch removes storages from the testcases.json file for execprocess.
This is because input storage objects are invalid for two reasons:
1. "io.katacontainers.fs-opt.layer=" is missing option in annotations.
2. by default, we don't have host-tarfs-dm-verity enabled, so the storage
objects are not created in policy.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
---
2025-07-01 10:35:20 +00:00
Archana Choudhary
6adec0737c genpolicy: add rules for image_guest_pull storage
This patch introduces some basic checks for the
`image_guest_pull` storage type in the genpolicy tool.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
bd2dc1422e genpolicy: add test for container images having volumes
This patch adds a test case to genpolicy for container images that have volumes.
Examples of such container images include:
 - quay.io/opstree/redis
 - https://github.com/kubernetes/examples/blob/master/cassandra/image/Dockerfile

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
d7f998fbd5 genpolicy: tests: update test for emptydir volumes
This patch
  - updates testcases.json for emptydir volumes/storages

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
68c8c31718 genpolicy: tests: add test for config_map volumes
This patch adds test for config_map volumes.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
9ebbc08d70 genpolicy: enable storage checks
This patch
 - adds condition to add container image layers as storages
 - enable storage checks
 - fix CI policy test cases
 - update genpolicy-settings.json to enable storage checks
 - remove storage object addition in container image parsing

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Archana Choudhary
5b1459e623 genpolicy: test framework: enable config map usage
This patch improves the test framework for the
genpolicy tool by enabling the use of config maps.

Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2025-07-01 10:35:20 +00:00
Alex Lyn
8784cebb84
Merge pull request #10693 from Apokleos/guest-pullimage-timeout
runtime-rs: support setting create_container timeout with request_timeout_ms for image pulling in guest
2025-07-01 11:40:19 +08:00
alex.lyn
b7c1d04a47 runtime-rs: Fix noise with frequently appearing in unstaged changes
Fixes #11489

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-07-01 10:19:02 +08:00
alex.lyn
9839c17cad build: add Makefile variable for create_container_timeout
Add the definiation of variable DEFCREATECONTAINERTIMEOUT into
Makefile target with default timeout 30s.

Fixes: #485

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-30 20:04:56 +08:00
alex.lyn
1a06bd1f08 kata-types: Introduce annotation *_RUNTIME_CREATE_CONTAINTER_TIMEOUT
It's used to indicate timeout value set for image pulling in
guest during creating container.
This allows users to set this timeout with annotation according to the
size of image to be pulled.

Fixes #10692

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-30 20:04:56 +08:00
alex.lyn
f886e82f03 runtime-rs: support setting create_container_timeout
It allows users to set this create container timeout within
configuration.toml according to the size of image to be pulled
inside guest.

Fixes #10692

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-30 20:04:56 +08:00
alex.lyn
ce524a3958 kata-types: Give a more comprehensive definition of request_timeout_ms
To better understand the impact of different timeout values on system
behavior, this section provides a more comprehensive explanation of the
request_timeout_ms:

This timeout value is used to set the maximum duration for the agent to
process a CreateContainerRequest. It's also used to ensure that workloads,
especially those involving large image pulls within the guest, have sufficient
time to complete.
Based on explaination above, it's renamed with `create_container_timeout`,
Specially, exposed in 'configuration.toml'

Fixes #10692

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-30 20:04:56 +08:00
Steve Horsman
f04bb3f34c
Merge pull request #11479 from stevenhorsman/skip-weekly-coco-stability-tests
workflows: Skip weekly coco stability tests
2025-06-30 09:05:14 +01:00
Fabiano Fidêncio
b024d8737c
Merge pull request #11481 from fidencio/topic/fix-passing-image-size-alignment
build: Allow passing IMAGE_SIZE_ALIGNMENT_MB as an env var
2025-06-30 09:04:39 +02:00
Alex Lyn
69d2c078d1
Merge pull request #11484 from stevenhorsman/bump-nydus-snapshotter-0.15.2
version: Bump nydus-snapshotter
2025-06-30 14:44:01 +08:00
Alex Lyn
e66baf503b
Merge pull request #11474 from Apokleos/remote-annotation
runtime-rs: Add GPU annotations for remote hypervisor
2025-06-30 14:05:15 +08:00
Fabiano Fidêncio
8d4e3b47b1
Merge pull request #11470 from fidencio/topic/runtime-rs-fix-odd-memory-size-calculation
runtime-rs: Fix calculation of odd memory sizes
2025-06-30 07:26:30 +02:00
Champ-Goblem
91cadb7bfe runtime-rs: Fix calculation of odd memory sizes
An odd memory size leads to the runtime breaking during its startup, as
shown below:
```
Warning  FailedCreatePodSandBox  34s   kubelet            Failed to
create pod sandbox: rpc error: code = Unknown desc = failed to start
sandbox
"708c81910f4e67e53b4170b6615083339b220154cb9a0c521b3232cdb40d50f9":
failed to create containerd task: failed to create shim task:
Others("failed to handle message start sandbox in task handler\n\nCaused
by:\n    0: start vm\n    1: set vm base config\n    2: set vm
configuration\n    3: Failed to set vm configuration VmConfigInfo {
vcpu_count: 2, max_vcpu_count: 16, cpu_pm: \"on\", cpu_topology:
CpuTopology { threads_per_core: 1, cores_per_die: 1, dies_per_socket: 1,
sockets: 1 }, vpmu_feature: 0, mem_type: \"shmem\", mem_file_path: \"\",
mem_size_mib: 4513, serial_path:
Some(\"/run/kata/708c81910f4e67e53b4170b6615083339b220154cb9a0c521b3232cdb40d50f9/console.sock\"),
pci_hotplug_enabled: true }\n    4: vmm action error:
MachineConfig(InvalidMemorySize(4513))\n\nStack backtrace:\n   0:
anyhow::error::<impl anyhow::Error>::msg\n   1:
hypervisor::dragonball::vmm_instance::VmmInstance::handle_request\n   2:
hypervisor::dragonball::vmm_instance::VmmInstance::set_vm_configuration\n
3: hypervisor::dragonball::inner::DragonballInner::set_vm_base_config\n
4: <hypervisor::dragonball::Dragonball as
hypervisor::Hypervisor>::start_vm::{{closure}}::{{closure}}\n   5:
<hypervisor::dragonball::Dragonball as
hypervisor::Hypervisor>::start_vm::{{closure}}\n   6:
<virt_container::sandbox::VirtSandbox as
common::sandbox::Sandbox>::start::{{closure}}::{{closure}}\n   7:
<virt_container::sandbox::VirtSandbox as
common::sandbox::Sandbox>::start::{{closure}}\n   8:
runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}::{{closure}}\n
9:
runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}\n
10: <service::task_service::TaskService as
containerd_shim_protos::shim::shim_ttrpc_async::Task>::create::{{closure}}\n
11: <containerd_shim_protos::shim::shim_ttrpc_async::CreateMethod as
ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}\n  12:
<tokio::time::timeout::Timeout<T> as
core::future::future::Future>::poll\n  13:
ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}\n
14: <core::future::poll_fn::PollFn<F> as
core::future::future::Future>::poll\n  15:
<ttrpc::asynchronous::server::ServerReader as
ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}\n
16: tokio::runtime::task::core::Core<T,S>::poll\n  17:
tokio::runtime::task::harness::Harness<T,S>::poll\n  18:
tokio::runtime::scheduler::multi_thread::worker::Context::run_task\n
19: tokio::runtime::scheduler::multi_thread::worker::Context::run\n  20:
tokio::runtime::context::runtime::enter_runtime\n  21:
tokio::runtime::scheduler::multi_thread::worker::run\n  22:
<tokio::runtime::blocking::task::BlockingTask<T> as
core::future::future::Future>::poll\n  23:
tokio::runtime::task::core::Core<T,S>::poll\n  24:
tokio::runtime::task::harness::Harness<T,S>::poll\n  25:
tokio::runtime::blocking::pool::Inner::run\n  26:
std::sys::backtrace::__rust_begin_short_backtrace\n  27:
core::ops::function::FnOnce::call_once{{vtable.shim}}\n  28:
std::sys::pal::unix:🧵:Thread:🆕:thread_start")
```

As we cannot control what the users will set, let's just round it up to
the next acceptable value.

Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-06-28 14:29:18 +02:00
Fabiano Fidêncio
e2b93fff3f build: Allow passing IMAGE_SIZE_ALIGNMENT_MB as an env var
This helps considerably to avoid patching the code, and just adjusting
the build environment to use a smaller alignment than the default one.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-06-28 00:05:20 +02:00
stevenhorsman
fe5d43b4bd workflows: Skip weekly coco stability tests
These tests are not passing, or being maintained,
so as discussed on the AC meeting, we will skip them
from automatically running until they can be reviewed
and re-worked, so avoid wasting CI cycles.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-06-27 16:51:53 +01:00
stevenhorsman
61b12d4e1b version: Bump nydus-snapshotter
Bump to version v0.15.2 to pick up fix to mount source in
https://github.com/containerd/nydus-snapshotter/pull/636

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-06-27 14:04:00 +01:00
RuoqingHe
a43e06e0eb
Merge pull request #11461 from stevenhorsman/bump-guest-components-4cd62c3
versions: Bump guest-components
2025-06-27 10:45:06 +08:00
Saul Paredes
8c57beb943
Merge pull request #11471 from microsoft/saulparedes/fix_kata_monitor_dockerfile
tools: kata-monitor: update go version used to build in Dockerfile
2025-06-26 08:37:08 -07:00
Chao Wu
ac928218f3
Merge pull request #11434 from hsiangkao/erofs
runtime: improve EROFS snapshotter support
2025-06-26 22:40:48 +08:00
Cameron McDermott
b6cd6e6914
Merge pull request #11469 from fidencio/topic/dragonball-set-default_maxvcpus-to-zero
runtime-rs: Set default_maxvcpus to 0
2025-06-26 15:20:21 +01:00
Aurélien Bombo
a1aa3e79d4
Merge pull request #11392 from kata-containers/sprt/zizmor
ci: Run zizmor for GHA security analysis
2025-06-26 08:55:22 -05:00
Fupan Li
1ff54a95d2
Merge pull request #11422 from lifupan/memory_hotplug
runtime-rs: Add the memory and vcpu hotplug for cloud-hypervisor
2025-06-26 17:56:49 +08:00
Aurélien Bombo
34c8cd810d ci: Run zizmor for GHA security analysis
This runs the zizmor security lint [1] on our GH Actions.
The initial workflow uses [2] as a base.

[1] https://docs.zizmor.sh/
[2] https://docs.zizmor.sh/usage/#use-in-github-actions

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-06-26 10:52:28 +01:00
alex.lyn
e6e4cd91b8 runtime-rs: Enable GPU annotations in remote hypervisor configuration
Enable GPU annotations by adding `default_gpus` and `default_gpu_model`
into the list of valid annotations `enable_annotations`.

Fixes #10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-26 17:29:36 +08:00
alex.lyn
e5f44fae30 runtime-rs: Add GPU annotations during remote hypervisor preparation
Add GPU specific annotations used by remote hypervisor for instance
selection during `prepare_vm`.

Fixes #10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-26 17:27:41 +08:00
alex.lyn
866d3facba kata-types: Introduce two GPU annotations for remote hypervisor
Two annotations: `default_gpus and `default_gpu_model` as GPU annotations
are introduced for Kata VM configurations to improve instance selection on
remote hypervisors. By adding these annotations:
(1) `default_gpus`: Allows users to specify the minimum number of GPUs a VM
requires. This ensures that the remote hypervisor selects an instance
with at least that many GPUs, preventing resource under-provisioning.
(2) `default_gpu_model`: Lets users define the specific GPU model needed for
the VM. This is crucial for workloads that depend on particular GPU archs or
features, ensuring compatibility and optimal performance.

Fixes #10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-26 17:27:41 +08:00
alex.lyn
ed0c0b2367 kata-types: Introduce GPU related fields in RemoteInfo
To provide the remote hypervisor with the necessary intelligence
to select the most appropriate instance for a given GPU instance,
leading to better resource allocation, two fields `default_gpus`
and `default_gpu_model` are introduced in `RemoteInfo`.

Fixes #10484

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-26 17:27:28 +08:00
Alex Lyn
9a1d4fc5d6
Merge pull request #11468 from Apokleos/fix-sharefs-none
runtime-rs: Support shared fs with "none" on non-tee platforms
2025-06-26 15:37:44 +08:00
Gao Xiang
9079c8e598 runtime: improve EROFS snapshotter support
To better support containerd 2.1 and later versions, remove the
hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the
real mount source (and `/sys/block/loopX/loop/backing_file` if needed).

If the mount source doesn't end with `layer.erofs`, it should be marked
as unsupported, as it may be a filesystem meta file generated by later
containerd versions for the EROFS flattened filesystem feature.

Also check whether the filesystem type is `overlay` or not, since the
containerd mount manager [1] may change it after being introduced.

[1] https://github.com/containerd/containerd/issues/11303

Fixes: f63ec50ba3 ("runtime: Add EROFS snapshotter with block device support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-06-26 10:12:12 +08:00
Saul Paredes
d53c720ac1 tools: kata-monitor: update go version used to build in Dockerfile
Current Dockerfile fails when trying to build from the root of the repo
docker build -t kata-monitor -f tools/packaging/kata-monitor/Dockerfile .
with "invalid go version '1.23.0': must match format 1.23"

Using go 1.23 in the Dockerfile fixes the build error

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2025-06-25 15:32:41 -07:00
stevenhorsman
290fda9b97 agent-ctl: Bump image-rs version
I notices that agent-ctl is including a 9 month old version of
image-rs and the libs crates haven't been update for potentially
many years, so bump all of these.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-06-25 16:30:58 +01:00
stevenhorsman
c7da62dd1e versions: Bump guest-components
Bump to pick up the new guest-components
and matching trustee which use rust 1.85.1

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-06-25 15:05:07 +01:00
Fabiano Fidêncio
bebe377f0d runtime-rs: Set default_maxvcpus to 0
Otherwise we just cannot start a container that requests more than 1
vcpu.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-06-25 14:36:46 +02:00
Steve Horsman
9ff30c6aeb
Merge pull request #11462 from kata-containers/add-scorecard-action
ci: Add scorecard action
2025-06-25 12:48:11 +01:00
Fabiano Fidêncio
69c706b570
Merge pull request #11441 from stevenhorsman/protobuf-3.7.2-bump
versions: Bump protobuf to 3.7.2
2025-06-25 13:47:28 +02:00
alex.lyn
eae62ca9ac runtime-rs: Support shared fs with "none" on non-tee platforms
This commit introduces the ability to run Pods without shared fs
mechanism in Kata.

The default shared fs can lead to unnecessary resource consumption
and security risks for certain use cases. Specifically, scenarios
where files only need to be copied into the VM once at Pod creation
(e.g., non-tee envs) and don't require dynamic updates make the shared
fs redundant and inefficient.

By explicitly disabling shared fs functionality, we reduce resource
overhead and shrink the attack surface. Users will need to employ
alternative methods(e.g. guest-pull) to ensure container images are
shared into the guest VM for these specific scenarios.

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2025-06-25 17:36:57 +08:00
Fabiano Fidêncio
4719c08184
Merge pull request #11467 from lifupan/fixblockfile
runtime-rs: fix the issue return the wrong volume
2025-06-25 09:56:28 +02:00
Fupan Li
48c8e0f296 runtime-rs: fix the issue return the wrong volume
In the pre commit:74eccc54e7b31cc4c9abd8b6e4007c3a4c1d4dd4,
it missed return the right rootfs volume.

In the is_block_rootfs fn, if the rootfs is based on a
block device such as devicemapper, it should clear the
volume's source and let the device_manager to use the
dev_id to get the device's host path.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-06-25 10:02:52 +08:00
Alex Lyn
648fef4f52
Merge pull request #11466 from lifupan/blockfile
runtime-rs: add the blockfile based rootfs support
2025-06-25 09:46:54 +08:00