Commit Graph

12082 Commits

Author SHA1 Message Date
Fabiano Fidêncio
5e9cf75937 vc: utils: Rename CalculateMilliCPUs() to CalculateCPUsF()
With the change done in the last commit, instead of calculating milli
cpus, we're actually converting the CPUs to a fraction number, a float.

Let's update the function name (and associated vars) to represent that
change.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-10 18:26:01 +01:00
Fabiano Fidêncio
e477ed0e86 runtime: Improve vCPU allocation for the VMMs
First of all, this is a controversial piece, and I know that.

In this commit we're trying to make a less greedy approach regards the
amount of vCPUs we allocate for the VMM, which will be advantageous
mainly when using the `static_sandbox_resource_mgmt` feature, which is
used by the confidential guests.

The current approach we have basically does:
* Gets the amount of vCPUs set in the config (an integer)
* Gets the amount of vCPUs set as limit (an integer)
* Sum those up
* Starts / Updates the VMM to use that total amount of vCPUs

The fact we're dealing with integers is logical, as we cannot request
500m vCPUs to the VMMs.  However, it leads us to, in several cases, be
wasting one vCPU.

Let's take the example that we know the VMM requires 500m vCPUs to be
running, and the workload sets 250m vCPUs as a resource limit.

In that case, we'd do:
* Gets the amount of vCPUs set in the config: 1
* Gets the amount of vCPUs set as limit: ceil(0.25)
* 1 + ceil(0.25) = 1 + 1 = 2 vCPUs
* Starts / Updates the VMM to use 2 vCPUs

With the logic changed here, what we're doing is considering everything
as float till just before we start / update the VMM. So, the flow
describe above would be:
* Gets the amount of vCPUs set in the config: 0.5
* Gets the amount of vCPUs set as limit: 0.25
* ceil(0.5 + 0.25) = 1 vCPUs
* Starts / Updates the VMM to use 1 vCPUs

In the way I've written this patch we introduce zero regressions, as
the default values set are still the same, and those will only be
changed for the TEE use cases (although I can see firecracker, or any
other user of `static_sandbox_resource_mgmt=true` taking advantage of
this).

There's, though, an implicit assumption in this patch that we'd need to
make explicit, and that's that the default_vcpus / default_memory is the
amount of vcpus / memory required by the VMM, and absolutely nothing
else.  Also, the amount set there should be reflected in the
podOverhead for the specific runtime class.

One other possible approach, which I am not that much in favour of
taking as I think it's **less clear**, is that we could actually get the
podOverhead amount, subtract it from the default_vcpus (treating the
result as a float), then sum up what the user set as limit (as a float),
and finally ceil the result.  It could work, but IMHO this is **less
clear**, and **less explicit** on what we're actually doing, and how the
default_vcpus / default_memory should be used.

Fixes: #6909

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2023-11-10 18:25:57 +01:00
Fabiano Fidêncio
b0157ad73a runtime: confidential: Do not set the max_vcpu to cpu
We don't have to do this since we're relying on the
`static_sandbox_resource_mgmt` feature, which gives us the correct
amount of memory and CPUs to be allocated.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-10 12:58:20 +01:00
Steve Horsman
b23952c852
Merge pull request #8309 from gkurz/update-release-process-doc
Update release process documentation
2023-11-10 09:44:18 +00:00
Archana Shinde
21e45bebc8
Merge pull request #8376 from fidencio/topic/kata-manager-add-support-for-docker-installation
kata-manager: Add support for Docker CLI installation
2023-11-09 22:11:50 -08:00
Chao Wu
a62fb83c91
Merge pull request #8169 from openanolis/chao/fix_typo_shm
runtime-rs: fix a typo in shm
2023-11-10 14:00:11 +08:00
Chao Wu
820b578aa3
Merge pull request #8370 from gaohuatao-1/bugfix
agent: update AGENT_THREADS metrics value
2023-11-10 13:16:29 +08:00
gaohuatao
78df1bb851 agent: update AGENT_THREADS metrics value
Fixes: #8369

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2023-11-10 10:39:57 +08:00
Chao Wu
afb002c25c runtime-rs: fix a typo in shm
is_shim_volume should be is_shm_volume in shm_volume mod.

fixes: #8168
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-11-10 10:36:58 +08:00
Fabiano Fidêncio
2b937400fe
Merge pull request #8404 from fidencio/topic/kata-deploy-allow-users-to-enable-hypervisor-annotations
kata-deploy: Allow users to set hypervisor annotations
2023-11-09 17:44:52 +01:00
Fabiano Fidêncio
5d10aed9ba kata-manager: Make containerd_config a global var
As "/etc/containerd/config.toml" is used from more than one place, let's
just make it a global var.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 13:47:52 +01:00
Fabiano Fidêncio
66d1b2c173 kata-manager: Add support for docker installation
Add support for also installing the Docker CLI, giving users the chance
to try Kata Containers with docker in the same way we provide users the
chance to try Kata Containers with `ctr`.

Fixes: #8357

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 13:47:52 +01:00
Fabiano Fidêncio
1a81989d20 tests: k8s: Use the "ALLOWED_HYPERVISOR_ANNOTATIONS"
The current kata-deploy code has been doing a `sed` to add allowed
hypervisor annotations, so CBL mariner can be tested with their own
kernel and initrd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 13:42:31 +01:00
Fabiano Fidêncio
023c4a17cf kata-deploy: Allow users to set hypervisor annotations
Currently the only way one can specify allowed hypervisor annotations is
during build time, which is a big issue for users grabbing kata-deploy
as we provide.

Fixes: #8403

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 13:42:31 +01:00
Fabiano Fidêncio
0352f1e029 kata-manager: Allow passing a specific tool to test_installation
Right now we're only testing with `ctr` and there's no change in
behaviour with this commit.  However, allowing to pass a tool to run the
tests with gives us an easier time when expanding kata-manager to
support, for instance, docker and nerdctl.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 11:24:37 +01:00
Fabiano Fidêncio
50df1129ea
Merge pull request #8411 from fidencio/topic/fix-k3s-deployment
gha: Fix regex used to get kubectl version from the k3s version
2023-11-09 10:44:34 +01:00
Fabiano Fidêncio
455b7bf776 gha: k3s: Avoid unnecessary escape
There's no reason to escape the first + on the +k3s[0-9]\+ regex, as
shown here:
```sh
ubuntu@k3s:~$ /usr/local/bin/k3s kubectl version --short 2>/dev/null | \
	grep "Client Version" | \
	sed \
		-e 's/Client Version: //' \
		-e 's/+k3s[0-9]\+//'
v1.27.7
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 08:42:25 +01:00
Fabiano Fidêncio
e7890ee8f6 gha: Fix regex used to get kubectl version from the k3s version
It seems that with the new k3s release, they've bumped their kubectl
version from x.y.z+k3s1 to x.y.z+k3s2.

Let's ensure our regexp is more generic and future proof for such
changes.

Fixes: #8410

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-09 07:08:02 +01:00
Archana Shinde
1611723465
Merge pull request #8379 from likebreath/1103/clh_v36.0
Upgrade to Cloud Hypervisor v36.0
2023-11-08 21:10:41 -08:00
Archana Shinde
268d4d622f
Merge pull request #8389 from justxuewei/vm-capable-test
runtime: Fix TestCheckHostIsVMContainerCapable unstablity issue
2023-11-08 12:14:04 -08:00
Archana Shinde
92a517156c
Merge pull request #8367 from amshinde/add-nerdctl-ipvlan-test
network: Fix network hotplug for ipvlan and macvlan endpoints for qemu and add tests
2023-11-08 11:45:13 -08:00
Chelsea Mafrica
83e731328f
Merge pull request #8023 from cmaf/runtime-rs-ch-pause-resume
runtime-rs: Update status for pause and resume
2023-11-08 11:34:47 -08:00
Xuewei Niu
acd9057c7b runtime: Fix TestCheckHostIsVMContainerCapable unstablity issue
TestCheckHostIsVMContainerCapable removes sysModuleDir to simulate a
case that the kernel modules are not loaded. However,
checkKernelModules() executes modprobe <module> if a module not
found in that directory. Loading those modules is required to be denied
temporarily.

Fixes: #8390

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 22:40:08 +08:00
Fupan Li
100a73d2fd
Merge pull request #7531 from justxuewei/device-cgroup
agent: Restrict device access at upper node of container's cgroup
2023-11-08 22:01:48 +08:00
Chao Wu
4435c1efd7
Merge pull request #8386 from jodh-intel/runtime-rs-ch-tidy-up
runtime-rs: ch: Simplify VSOCK error handling
2023-11-08 17:31:40 +08:00
Xuewei Niu
023d8dc01e agent: Changes according to Pan's comments
- Disable device cgroup restriction while pod cgroup is not available.
- Remove balcklist-related names and change whitelist-related names to
  allowed_all.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:08 +08:00
Xuewei Niu
136fb76222 tests: Add a integrated test for device cgroup
`TestDeviceCgroup` is added to cri-containerd's integration tests. The test
launches two containers. Each container has a block device. It checks the
validity of device cgroup.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
b5f3a8cb39 agent: Fix container launching failure with systemd cgroup
FSManager of systemd cgroup manager is responsible for setting up cgroup
path. The container launching will be failed if the FSManager is in
read-only mode.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
6477825195 agent: Minor changes according to Zhou's comments
The changes include:

- Change to debug logging level for resources after processed.
- Remove a todo for pod cgroup cleanup.
- Add an anyhow context to `get_paths_and_mounts()`.
- Remove code which denys access to VMROOTFS since it won't take effect. If
  blackmode is in use, the VMROOTFS will be denyed as default. Otherwise,
  device cgroups won't be updated in whitelist mode.
- Add a unit test for `default_allowed_devices()`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
cec8044744 agent: Make devcg_info optional for LinuxContainer::new()
The runk is a standard OCI runtime that isnt' aware of concept of sandbox.
Therefore, the `devcg_info` argument of `LinuxContainer::new()` is
unneccessary to be provided.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
ef4c3844a3 agent: Restrict device access at upper node of container's cgroup
The target is to guarantee that containers couldn't escape to access extra
devices, like vm rootfs, etc.

Assume that there is a cgroup, such as `/A/B`. The `B` is container cgroup,
and the `A` is what we called pod cgroup. No matter what permissions are
set for the container (`B`), the `A`'s permission is always `a *:* rwm`. It
leads that containers could acquire permission to access to other devices
in VM that not belongs to themselves.

In order to set devices cgroup properly, the order of setting cgroups is
that the pod cgroup comes first and the container cgroup comes after.

The `Sandbox` has a new field, `devcg_info`, to save cgroup states. To
avoid setting container cgroup too early, an initialization should be done
carefully. `inited`, one of the states, is a boolean to indicate if the pod
cgroup is initialized. If no, the pod cgroup should be created firstly, and
set default permissions. After that, the pause container cgroup is created
and inherits the permissions from the pod cgroup.

If whitelist mode which allows containers to access all devices in VM is
enabled,  then device resources from OCI spec are ignored.

This feature not supports systemd cgroup and cgroup v2, since:

- Systemd cgroup implemented on Agent hasn't supported devices subsystem so
  far, see: https://github.com/kata-containers/kata-containers/issues/7506.
- Cgroup v2's device controller depends on eBPF programs, which is out of
  scope of cgroup.

Fixes: #7507

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Archana Shinde
c075fa6817 tests: Add test with nerdctl to verify macvlan support
Add test to verify kata supports macvlan networks.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-07 10:13:51 -08:00
Archana Shinde
07db673eb9 tests: Add test with nerdctl to verify ipvlan support
Add test to verify kata supports ipvlan networks.
This test can be bit tricky as it requires knowledge about host interfaces
to be used as a master for the ipvlan network.
However, with github actions, we can assume interface called eth0 to be
present on the host and functioning.

Fixes: #8366

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-07 10:13:51 -08:00
Archana Shinde
a6272733e7 network: Fix network hotplug for ipvlan and macvlan endpoints.
Since moving from network coldplug to hotplug, the only case verified
was veth endpoints. Support for network hotplug for ipvlan and macvlan was
broken/not added. Fix it.

Fixes: #8391

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-07 10:13:51 -08:00
James O. D. Hunt
59d0d4caff runtime-rs: ch: Simplify VSOCK error handling
Remove the redundant `VmConfigError::EmptyVsockSocketPath` error from
the Cloud Hypervisor config crate since this scenario is already handled
by the `VsockConfigError::NoVsockSocketPath` error.

Fixes: #8385.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-07 17:45:38 +00:00
James O. D. Hunt
bdb83f8282 runtime-rs: ch: Remove unused function
Remove the redundant `parse_mac()` function: this was never used and we
already have an implementation in `crates/resource/src/network/utils/mod.rs`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-07 17:45:38 +00:00
Wainer Moschetta
949ac4d810
Merge pull request #8217 from beraldoleal/issues/8216
tests: fixes permission denied when running test
2023-11-07 12:25:23 -03:00
Wainer Moschetta
7f5d70f48b
Merge pull request #8061 from beraldoleal/gogo-removal-v3
Updating containerd to a GogoProtobuf free version
2023-11-07 12:18:50 -03:00
Greg Kurz
b27b4ce104 doc: No longer release the test repository
Now that most of the test repository got migrated to the main Kata repository,
it is no longer needed to tag the test repository when doing a release.

Update the documentation accordingly by dropping all references to the test
repository and only mention *the* Kata repository.

Fixes #8302

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-11-07 10:28:43 +01:00
Greg Kurz
af2d897fb1 doc: Release now uses the official GitHub CLI
The hub tool is deprecated. Releases are now based on the official gh
CLI. A notable improvement : when properly setup (see [1]), gh allows
to directly use HTTPS with one's GitHub credentials, instead of having
to setup proper SSH access for pushes to the repo.

Adjust the documentation accordingly.

Fixes #8302

[1] https://docs.github.com/en/github-cli/github-cli/quickstart#prerequisites

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-11-07 10:22:54 +01:00
Greg Kurz
2af9419fa4 doc: No longer run kata-deploy test when releasing
This is already tested by CI for every PR. Drop this step from the release
process documentation.

Fixes #8302

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-11-07 10:19:32 +01:00
Beraldo Leal
dd530ba8ee tests: fixes AMD errors
TestCheckHostIsVMContainerCapable is failing on AMD machines.
kata-check_amd64_test.go:96 has no AMD modules, also getCPUType is
missing.

Fixes #8384.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
7641c19f74 runtime: bump containerd for gogo deprecation
This update includes necessary changes due to the version bump of
containerd and its dependencies. It's part of a broader initiative to
phase out gogo protobuf, which has been deprecated, and to align with
the current supported libraries.

Fixes #7420.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
16fa2c39e6 protocols: replace gogo/types.Empty and Any
by Google versions.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c61f4a8592 protocols: remove unused fieldpath option
The +fieldpath option, specific to gogoprotobuf, enabled dynamic field
access in protobuf messages, allowing nested fields to be accessed via
string paths.

This change is part of a larger effort to transition to the official Go
protobuf library for better maintainability and community support.
Upon review, no instances of dynamic field access were found in the
codebase, confirming that the feature is not in use.

By removing this unused feature, we simplify the build process and make
it easier to complete the transition away from gogoprotobuf.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c87bc60ea0 protocols: removing unused mappings
Those mappings are not used by our .proto files and there is no
difference between .pb.go files generated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c5d845b30a agent: updating Cargo.lock files
Probably previous changes missed updating Cargo.lock.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
5d88c78a6e protocols: generating agent.pb.go
a3b003c345 modified agent but agent.pb.go
was not updated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Archana Shinde
3b2fb6a604
Merge pull request #8284 from amshinde/runtime-rs-update-device-pci-info
runtime-rs: update device pci info for vfio and virtio-blk devices
2023-11-06 01:09:20 -08:00
Archana Shinde
036b7787dd runtime-rs: Use PCI path from hypervisor for vfio devices
Remove earlier functionality that tries to assign PCI path to vfio
devices from the host assuming pci slots to start from 1.
Get this from the hypervisor instead.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00