Commit Graph

14618 Commits

Author SHA1 Message Date
Hyounggyu Choi
2c2941122c tests: Fail fast in assert_pod_fail()
`assert_pod_fail()` currently calls `k8s_create_pod()` to ensure that a pod
does not become ready within the default 120s. However, this delays the test's
completion even if an error message is detected earlier in the journal.

This commit removes the use of `k8s_create_pod()` and modifies `assert_pod_fail()`
to fail as soon as the pod enters a failed state.

All failing pods end up in one of the following states:

- CrashLoopBackOff
- ImagePullBackOff

The function now polls the pod's state every 5 seconds to check for these conditions.
If the pod enters a failed state, the function immediately returns 0. If the pod
does not reach a failed state within 120 seconds, it returns 1.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-24 16:09:20 +02:00
Aurélien Bombo
e738054ddb
Merge pull request #10311 from pawelpros/pproskur/fixyq
ci: don't require sudo for yq if already installed
2024-09-23 08:57:11 -07:00
Alex Lyn
6b94cc47a8
Merge pull request #10146 from Apokleos/intro-cdi
Introduce cdi in runtime-rs
2024-09-23 21:45:42 +08:00
Alex Lyn
b8ba346e98 runtime-rs: Add test for container devices with CDI.
Fixes #10145

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-09-23 17:20:22 +08:00
Steve Horsman
0e0cb24387
Merge pull request #10329 from Bickor/webhook-check
tools.kata-webhook: Specify runtime class using configMap
2024-09-23 09:59:12 +01:00
Steve Horsman
6f0b3eb2f9
Merge pull request #10337 from stevenhorsman/update-release-process-post-3.9.0
doc: Update the release process
2024-09-23 09:55:57 +01:00
Hyounggyu Choi
8a893cd4ee
Merge pull request #10232 from BbolroC/fix-loop-device-for-exec_host
tests: Fix loop device handling for exec_host()
2024-09-23 08:15:03 +02:00
Fupan Li
f1f5bef9ef
Merge pull request #10339 from lifupan/main_fix
runtime-rs: fix the issue of using block_on
2024-09-23 09:28:40 +08:00
Fupan Li
fb27de3561 runtime-rs: fix the issue of using block_on
Since the block_on would block on the current thread
which would prevent other async tasks to be run on this
worker thread, thus change it to use the async task for
this task.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-09-22 14:40:44 +08:00
Aurélien Bombo
79a3b4e2e5
Merge pull request #10335 from kata-containers/sprt/fix-kata-deploy-docs
kata-deploy: clean up and fix docs for k0s
2024-09-20 13:33:14 -07:00
stevenhorsman
4f745f77cb doc: Update the release process
- Reflect the need to update the versions in the Helm Chart
- Add the lock branch instruction
- Add clarity about the permissions needed to complete tasks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-09-20 19:04:33 +01:00
Aurélien Bombo
78c63c7951 kata-deploy: clean up and fix docs for k0s
* Clarifies instructions for k0s.
* Adds kata-deploy step for each cluster type.
* Removes the old kata-deploy-stable step for vanilla k8s.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-09-20 11:59:40 -05:00
Hyounggyu Choi
2d6ac3d85d tests: Re-enable guest-pull-image tests for qemu-coco-dev
Now that the issue with handling loop devices has been resolved,
this commit re-enables the guest-pull-image tests for `qemu-coco-dev`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-20 14:37:43 +02:00
Hyounggyu Choi
c6b86e88e4 tests: Increase timeouts for qemu-coco-dev in trusted image storage tests
Timeouts occur (e.g. `create_container_timeout` and `wait_time`)
when using qemu-coco-dev.
This commit increases these timeouts for the trusted image storage
test cases

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-20 14:37:43 +02:00
Hyounggyu Choi
9cff9271bc tests: Run all commands in *_loop_device() using exec_host()
If the host running the tests is different from the host where the cluster is running,
the *_loop_device() functions do not work as expected because the device is created
on the test host, while the cluster expects the device to be local.

This commit ensures that all commands for the relevant functions are executed via exec_host()
so that a device should be handled on a cluster node.

Additionally, it modifies exec_host() to return the exit code of the last executed command
because the existing logic with `kubectl debug` sometimes includes unexpected characters
that are difficult to handle. `kubectl exec` appears to properly return the exit code for
a given command to it.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-20 14:37:43 +02:00
Hyounggyu Choi
374b8d2534 tests: Create and delete node debugger pod only once
Creating and deleting a node debugger pod for every `exec_host()`
call is inefficient.
This commit changes the test suite to create and delete the pod
only once, globally.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-20 14:37:43 +02:00
Hyounggyu Choi
aedf14b244 tests: Mimic node debugger with full privileges
This commit addresses an issue with handling loop devices
via a node debugger due to restricted privileges.
It runs a pod with full privileges, allowing it to mount
the host root to `/host`, similar to the node debugger.
This change enables us to run tests for trusted image storage
using the `qemu-coco-dev` runtime class.

Fixes: #10133

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-20 14:37:43 +02:00
Alex Lyn
63b25e8cb0 runtime-rs: Introduce cdi devices in container creation
Fixes #10145

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-09-20 09:28:51 +08:00
Alex Lyn
03735d78ec runtime-rs: add cdi devices definition and related methods
Add cdi devices including ContainerDevice definition and
annotation_container_device method to annotate vfio device
in OCI Spec annotations which is inserted into Guest with
its mapping of vendor-class and guest pci path.

Fixes #10145

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-09-20 09:28:51 +08:00
Alex Lyn
020e3da9b9 runtime-rs: extend DeviceVendor with device class
We need vfio device's properties device, vendor and
class, but we can only get property device and vendor.
just extend it with class is ok.

Fixes #10145

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-09-20 09:28:51 +08:00
Fabiano Fidêncio
77c844da12
Merge pull request #10239 from fidencio/topic/remove-acrn
acrn: Drop support
2024-09-19 23:10:29 +02:00
GabyCT
6eef58dc3e
Merge pull request #10336 from GabyCT/topic/extendtimeout
gha: Increase timeout to run k8s tests on TDX
2024-09-19 13:12:55 -06:00
Martin
b9d88f74ed tools.kata-webhook: Specify runtime class using configMap
The kata webhook requires a configmap to define what runtime class it
should set for the newly created pods. Additionally, the configmap
allows others to modify the default runtime class name we wish to set
(in case the handler is kata but the name of the runtimeclass is
different).

Finally, this PR changes the webhook-check to compare the runtime of the
newly created pod against the specific runtime class in the configmap,
if said confimap doesn't exist, then it will default to "kata".

Signed-off-by: Martin <mheberling@microsoft.com>
2024-09-19 11:51:38 -07:00
Fabiano Fidêncio
51dade3382 docs: Fix spell checker
tokio is not a valid word, it seeems, so let's use `tokio`.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-19 20:25:21 +02:00
Gabriela Cervantes
49b3a0faa3 gha: Increase timeout to run k8s tests on TDX
This PR increases the timeout to run k8s tests for Kata CoCo TDX
to avoid the random failures of timeout.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-19 17:15:47 +00:00
Fabiano Fidêncio
31438dba79 docs: Fix qemu link
Otherwise static checks will fail, as we woke up the dogs with changes
on the same file.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-19 16:05:43 +02:00
Fabiano Fidêncio
fefcf7cfa4 acrn: Drop support
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-19 16:05:43 +02:00
Fabiano Fidêncio
cdaaf708a1
Merge pull request #10334 from emanuellima1/bump-version
release: Bump version to 3.9.0
2024-09-19 15:27:50 +02:00
Emanuel Lima
a6ee15c5c7
release: Bump VERSION to 3.9.0
Starting the v3.9.0 release

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-09-19 10:14:55 -03:00
Fabiano Fidêncio
e9593b53a4
Merge pull request #10234 from pmores/add-support-for-disabled-guest-selinux
runtime-rs: add support for disabled guest selinux
2024-09-19 15:03:24 +02:00
Fabiano Fidêncio
4d11fecc2d
Merge pull request #10274 from ajaypvictor/remote_image-os_types
runtime: Enable Image annotation for remote hypervisor
2024-09-19 13:39:20 +02:00
Fabiano Fidêncio
3d5f48e02e
Merge pull request #10283 from alexman-stripe/alexman-stripe/fix-kata-shim-not-reporting-inactive-file-cgroup-v2
shim: Fix memory usage reporting for cgroup v2
2024-09-19 12:50:36 +02:00
Pavel Mores
5e5eb9759f runtime-rs: handle disabled guest selinux in virtiofsd
This is just a port of functionality existing in the golang runtime.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-09-19 12:47:10 +02:00
Pavel Mores
8c92f3bfec runtime-rs: enable/disable selinux in guest based on disable_guest_selinux
This change technically affects the path for enabled guest selinux as well,
however since this is not implemented in runtime-rs anyway nothing should
break.  When guest selinux support is added this change will come handy.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-09-19 12:47:10 +02:00
Pavel Mores
204ee21bc8 runtime-rs: handle disabled guest selinux in OCI spec
If guest selinux is off the runtime has to ensure that container OCI spec
contains no selinux labels for the container rootfs and process.  Failure
to do so causes kata agent to try and apply the labels which fails since
selinux is not enabled in guest, which in turn causes container launch
to fail.

This is largely inspired by golang runtime(*) with a slight deviation
in ordering of checks.  This change simply checks the disable_guest_selinux
config setting and if it's true it clears both rootfs and process label if
necessary.  Golang runtime, on the other hand, seems to first check if
process label is non-empty and only then it checks the config setting,
meaning that if process label is empty the rootfs label is not reset
even if it's non-empty.  Frankly, this looks like a potential bug though
probably unlikely to manifest since it can be assumed that the labels are
either both empty, or both non-empty.

(*) 4fd4b02f2e/src/runtime/virtcontainers/kata_agent.go (L1005)

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-09-19 12:47:10 +02:00
Pavel Mores
eb1227f47d runtime-rs: parse the disable_guest_selinux config key
In order to handle the setting we have to first parse it and make its
value available to the rest of the program.

The yes() function is added to comply with serde which seems to insist
on default values being returned from functions.  Long term, this is
surely not the best place for this function to live, however given that
this is currently the first and only place where it's used it seems
appropriate to put it near its use.  If it ends up being reused elsewhere
a better place will surely emerge.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-09-19 12:47:10 +02:00
Steve Horsman
8789551fe6
Merge pull request #10333 from fidencio/topic/ci-bump-ubuntu-20.04-runners-to-22.04
ci: Bump ubuntu 20.04 runners to 22.04
2024-09-19 11:44:33 +01:00
Fabiano Fidêncio
35c7f8d1ba ci: Bump ubuntu 20.04 runners to 22.04
Azure internal mirrors for Ubuntu 20.04 have gone awry, leading to a
situation where dependencies cannot be installed (such as
libdevmapper-dev), blocking then our CI.

Let's bump the runners to 22.04 regardless, even knowing it'll cause an
issue with the runk tests, as the agent check tests are considered more
crucial to the project at this point.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-19 12:29:20 +02:00
Fabiano Fidêncio
eccdffebf7
Merge pull request #10243 from katexochen/nydus-overlayfs-path
virtcontainers: allow specifying nydus-overlayfs binary by path
2024-09-19 11:35:45 +02:00
Ajay Victor
a19f2eacec runtime: Enable ImageName annotation for remote hypervisor
Enables ImageName to support multiple VM images in remote hypervisor scenario

Fixes https://github.com/kata-containers/kata-containers/issues/10240

Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>
2024-09-19 14:48:46 +05:30
Alex Man
27f8f69195 shim: Fix memory usage reporting for cgroup v2
kata-shim was not reporting `inactive_file` in memory stat.

This memory is deducted by containerd when calculating the size of container working set, as it can be paged out by the operating
system under memory pressure. Without reporting `inactive_file`, containerd will over report container memory usage.
[Here](https://github.com/containerd/containerd/blob/v1.7.22/pkg/cri/server/container_stats_list_linux.go#L117) is where containerd
deducts `inactive_file` from memory usage.

Note that kata-shim correctly reports `total_inactive_file` for cgroup v1, but this was not implemented for cgroup v2.

This commit:
- Adds code in kata-shim to report "inactive_file" memory for cgroup v2
- Implements reporting of all available cgroup v2 memory stats to containerd
- Uses defensive coding to avoid assuming existence of any memory.stat fields

The list of available cgroup v2 memory stats defined by containerd can be found
[here](https://pkg.go.dev/github.com/containerd/cgroups/v2/stats#MemoryStat).

Fixes #10280

Signed-off-by: Alex Man <alexman@stripe.com>
2024-09-18 14:04:24 -07:00
Fabiano Fidêncio
1597f8ba00
Merge pull request #10279 from alexman-stripe/alexman-stripe/fix-cgroup-v2-wrong-cpu-usage-unit
agent: Fix CPU usage reporting for cgroup v2 in kata-agent
2024-09-18 21:36:52 +02:00
Fabiano Fidêncio
593cbb8710
Merge pull request #10306 from microsoft/danmihai1/more-security-contexts
genpolicy: get UID from PodSecurityContext
2024-09-18 21:33:39 +02:00
Aurélien Bombo
5402f2c637
Merge pull request #10308 from Sumynwa/sumsharma/add_setpolicy_agent_ctl
agent-ctl: Add SetPolicy support
2024-09-18 10:09:07 -07:00
Pawel Proskurnicki
b63d49b34a ci: don't require sudo for yq if already installed
Yq installation shouldn't force to use sudo in case yq is already installed in correct version.

Signed-off-by: Pawel Proskurnicki <pawel.proskurnicki@intel.com>
2024-09-18 11:01:07 +02:00
Sumedh Alok Sharma
18c887f055 agent-ctl: Add SetPolicy support
This patch adds support to call kata agents SetPolicy
API. Also adds tests for SetPolicy API using agent-ctl.

Fixes #9711

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2024-09-18 10:53:49 +05:30
GabyCT
28d430ec42
Merge pull request #10324 from GabyCT/topic/fixinlib
ci: Fix indentation of install libseccomp script
2024-09-17 14:21:24 -06:00
Fabiano Fidêncio
da2377346d
Merge pull request #10323 from stevenhorsman/update-kubectl-release-url
kata-deploy: Switch Kubernetes URL
2024-09-17 20:47:17 +02:00
Gabriela Cervantes
096f32cc52 ci: Fix indentation of install libseccomp script
This PR fixes the indentation of the install libseccomp script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-17 16:38:53 +00:00
Aurélien Bombo
9d29ce460d
Merge pull request #10303 from Sumynwa/sumsharma/agent_policy_set_env
agent: add support to provide default agent policy via env
2024-09-17 09:04:11 -07:00