kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-04-10 22:12:35 +00:00

Author	SHA1	Message	Date
cncal	9caa7beb1f	runtime: make kata-runtime check error more understandable If device /dev/kvm does not exist, kata-runtime check would fail with an ambiguous error message 'no such file or directory'. I added a little more detail to make it understandable, and it will look like: ``` ERRO[0000] cannot open kvm device: no such file or directory arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime ERRO[0000] no such file or directory arch=arm64 name=kata-runtime pid=2849085 source=runtime no such file or directory ``` Signed-off-by: cncal <flycalvin@qq.com>	2024-05-03 08:29:08 +08:00
Dan Mihai	c26dad8fe5	Merge pull request #9294 from burgerdev/burgerdev/genpolicy-configurable-pause genpolicy: support insecure registries and custom pause containers	2024-04-16 09:39:33 -07:00
Greg Kurz	aca6a1bcb5	Merge pull request #9353 from pmores/pr-8866-follow-up runtime-rs: refactor qemu driver	2024-04-16 16:07:36 +02:00
Hyounggyu Choi	32f58abfde	Merge pull request #9403 from BbolroC/runtime-rs-ci-qemu CI: Enable GHA cri-containerd workflow for runtime-rs with QEMU	2024-04-15 09:31:25 +02:00
Hyounggyu Choi	606f8e1ab2	runtime-rs: Adjust configuration for qemu-runtime-rs To make `qemu-runtime-rs` work for CI, we have to rename a configuration template file and `CONFIG_FILE_QEMU` in the Makefile. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-12 12:25:53 +02:00
Alexandru Matei	9e01732f7a	agent: shutdown vm on exit when agent is used as init process The Linux kernel panics when the init process exits. The kernel is booted with panic=1, hence this leads to a VM reboot. When used as a service, the kata-agent service has an ExecStop option which does a full sync and shuts down the VM. This patch mimics this behavior when kata-agent is used as the init process. Fixes: #9429 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-04-12 11:32:31 +03:00
Markus Rudy	77540503f9	genpolicy: add support for insecure registries genpolicy is a handy tool to use in CI systems, to prepare workloads before applying them to the Kubernetes API server. However, many modern build systems like Bazel or Nix restrict network access, and rightfully so, so any registry interaction must take place on localhost. Configuring certificates for localhost is tricky at best, and since there are no privacy concerns for localhost traffic, genpolicy should allow contacting some registries insecurely. As this is a runtime environment detail, not a target environment detail, configuring insecure registries does not belong in the JSON settings, so it's implemented as command line flags. Fixes: #9008 Signed-off-by: Markus Rudy <webmaster@burgerdev.de>	2024-04-11 22:29:03 +02:00
Markus Rudy	bc2292bc27	genpolicy: make pause container image configurable CRIs don't always use a pause container, but even if they do, the concrete container choice is not specified. Even if the CRI config can be tweaked, it's not guaranteed that registries in the public internet can be reached. To be portable across CRI implementations and configurations, the genpolicy user needs to be able to configure the container the tool should append to the policy. Signed-off-by: Markus Rudy <webmaster@burgerdev.de>	2024-04-11 16:26:35 +02:00
Markus Rudy	8b30fa103f	genpolicy: parse json settings during config init Decouple initialization of the Settings struct from creating the AgentPolicy struct, so that the settings are available for evaluating, extending or overriding command line arguments. Signed-off-by: Markus Rudy <webmaster@burgerdev.de>	2024-04-11 16:17:33 +02:00
Xuewei Niu	50f78ec52c	agent: Fix the issue with the "test_new_fs_manager" test This patch introduces a one-time cpath to mitigate the cgroup residuals. It might break the device cgroup merging rules when the cgroup has children. Fixes: #9456 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-04-11 18:06:05 +08:00
Saul Paredes	51498ba99a	genpolicy: toggle containerd pull in tests - Add v1 image test case - Install protobuf-compiler in build check - Reset containerd config to default in kubernetes test if we are testing genpolicy - Update docker_credential crate - Add test that uses default pull method - Use GENPOLICY_PULL_METHOD in test Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-08 19:28:29 -07:00
Saul Paredes	c96ebf237c	genpolicy: add containerd pull method Add an optional toggle to use an existing containerd installation to pull and manage container images. This adds support for a wider set of images that are currently not supported by the standard pull method, such as those that use v1 manifests. Fixes: #9144 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-08 09:56:59 -07:00
Greg Kurz	8b996b9307	Merge pull request #9331 from egernst/foobar katautils: check number of cores on the system instead of go runtime	2024-04-08 18:38:49 +02:00
Wainer Moschetta	fba1d394d7	Merge pull request #9369 from ChengyuZhu6/sandbox-image agent:image: Support different pause image in the guest for guest pull	2024-04-08 11:06:21 -03:00
stevenhorsman	864e9c22ba	agent: doc: Add new config doc Document the new guest_components_rest_api config parameter Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-08 11:38:53 +01:00
Biao Lu	f0edec84f6	agent: Launch api-server-rest If 'rest_api' is configured, let's start the api-server-rest after the attestation-agent and the confidential-data-hub have been started. Fixes: #7555 Signed-off-by: Biao Lu <biao.lu@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com> Co-authored-by: Alex Carter <alex.carter@ibm.com> Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com> Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-04-08 11:38:53 +01:00
Biao lu	4d752e6350	agent: Add config for api-server-rest Add configuration for the 'rest api server'. The optional settings are: 'agent.rest_api=attestation' enables the attestation API; 'agent.rest_api=resource' enables the resource API; 'agent.rest_api=all' enables all (attestation and resource) APIs. Fixes: #7555 Signed-off-by: Biao Lu <biao.lu@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com> Co-authored-by: Alex Carter <alex.carter@ibm.com> Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com> Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-04-08 11:06:14 +01:00
Biao Lu	f476d671ed	agent: Launch the confidential data hub Let's introduce a new method to start the confidential data hub and the attestation agent. The former depends on the latter, and it needs to be started before the RPC server. Whether the attestation components are started depends on whether the confidential containers guest components binaries are found in the rootfs. Fixes: #7544 Signed-off-by: Biao Lu <biao.lu@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com> Co-authored-by: Alex Carter <alex.carter@ibm.com> Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com> Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-04-08 11:06:14 +01:00
Greg Kurz	be8f0cb520	Merge pull request #9402 from deagon/feat/debug-threads qemu: show the thread name when enable the hypervisor.debug option	2024-04-08 11:04:36 +02:00
ChengyuZhu6	8c897f822c	agent:image: Support different pause image in the guest for guest pull Support different pause images in the guest for guest-pull, such as k8s pause image (registry.k8s.io/pause) and openshift pause image (quay.io/bpradipt/okd-pause). Fixes: #9225 -- part III Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-04-07 09:00:10 +08:00
Fabiano Fidêncio	cdb8531302	hypervisor: Simplify TDX protection detection Let's rely on the kvm module 'tdx' parameter to do so. This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX adoption (https://github.com/intel/tdx-linux) stacks. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Fabiano Fidêncio	b7cccfa019	qemu: tdx: Adapt command line This commit is a mess, but I'm not exactly sure what's the best way to make it less messy, as we're getting QEMU TDX to work while partially reverting `1e34220c41`. With that said, let me cover the content of this commit. Firstly, we're reverting all the changes related to "memory-backend-memfd-private", as that's what was used with the previous host stack, but it seems it didn't fly upstream. Secondly, in order to get QEMU to properly work with TDX, we need to enforce the 'private=on' knob and use the "memory-backend-ram", and we're doing so, and also making sure to test the `private=on` newly added knob. I'm sorry for the confusion, I understand this is not optimal, I just don't see an easy path to do changes without leaving the code broken during those changes. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Fabiano Fidêncio	6b4cc5ea6a	Revert "qemu: tdx: Workaround SMP issue with TDX 1.5" This reverts commit `d1b54ede29`. Conflicts: src/runtime/virtcontainers/qemu.go This commit was a hack that was needed in order to get QEMU + TDX to work atop the stack our CI was running on. As we're moving to "the officially supported by distros" host OS, we need to get rid of this. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:52 +02:00
Fabiano Fidêncio	582b5b6b19	govmm: tdx: Expose the private=on|off knob The private=on|off knob is required in order to properly launch a TDX guest VM. This is a brand new property that is part of the still in-flight patches adding TDX support to QEMU. Please see: `3fdd8072da` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:52 +02:00
Alex Lyn	0e0a361f0e	Merge pull request #8782 from Apokleos/device-increate-count bugfix and refactor device increase count	2024-04-05 13:43:49 +08:00
Eric Ernst	da01bccd36	katautils: check number of cores on the system instead of go runtime We used to utilize the Go runtime's "NumCPU()", which gives the number of cores available to the Go runtime, which may be a subset of physical cores if the shim is started from within a cpuset. From the function's description: "NumCPU returns the number of logical CPUs usable by the current process." As an example, if containerd is run from within a smaller CPUset, the maximum size of a pod will be dictated by this CPUset, instead of what will be available on the rest of the system. Since the shim will be moved into its own cgroup that may have a different CPUset, let's stick with checking physical cores. This also aligns with what we have documented for maxVCPU handling. In the event we fail to read /proc/cpuinfo, let's fall back to the Go runtime's value. Fixes: #9327 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2024-04-03 16:09:16 -07:00
Alex Lyn	935a1a3b40	runtime-rs: refactor decrease_attach_count with do_decrease_count Reduce duplicated code in decrease_attach_count by using the new public function do_decrease_count. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:19:19 +08:00
Alex Lyn	4f0fab938d	runtime-rs: refactor increase_attach_count with do_increase_count Reduce duplicated code in increase_attach_count by using the new public function do_increase_count. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:19:19 +08:00
Alex Lyn	fff64f1c3e	runtime-rs: introduce dedicated function do_decrease_count Introduce a dedicated public function do_decrease_count to reduce duplicated code in drivers' decrease_attach_count. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:19:08 +08:00
Alex Lyn	5750faaf31	runtime-rs: introduce dedicated function do_increase_count Since there are many implementations of reference counting in the drivers, all of which have the same implementation, we should try to reduce such duplicated code as much as possible. Therefore, a new function is introduced to solve the problem of duplicated code. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:09:17 +08:00
Guoqiang Ding	cd0c31e185	qemu: show the thread name when enable the hypervisor.debug option Add debug-threads=on to the -name argument if debug is enabled. Fixes: #9400 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-04-03 10:36:52 +08:00
Alex Lyn	fa8049af6c	Merge pull request #9383 from Apokleos/unified-cgrp-cmdline kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy	2024-04-02 09:08:04 +08:00
Alex Lyn	07bfdf4a22	Merge pull request #9275 from Apokleos/swap-hooks-bindmnt kata-agent: Change order of guest hook and bind mount processing	2024-04-02 07:40:10 +08:00
Alex Lyn	c88014834b	kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy To have systemd mount cgroups-v2 by default during system boot, we must add the systemd.unified_cgroup_hierarchy=1 parameter to the kernel cmdline, which is passed via kernel_params in configuration.toml. To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1] to kernel_params. Fixes: #9336 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-01 18:45:12 +08:00
alex.lyn	548f252bc4	runtime-rs: bugfix incorrect use of refcount before vfio attach When a pod has multiple containers, a device's attach count may exceed 2. In that case the attach operation should not return Err, but just return Ok. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-01 11:28:57 +08:00
Alex Lyn	dfa8832406	Merge pull request #9345 from c3d/bug/9342-agent-test-errors agent: Fix errors in `make check`	2024-04-01 09:48:44 +08:00
Dan Mihai	600f9266f3	runtime: remove stream copy infinite loop This reverts commit `1c5693be86`. Avoid apparent infinite loop when ReadStreamRequest is blocked by policy - for some of the pods. When running the k8s-limit-range.bats test with Policy enabled, the Shim + VMM never get terminated on my cluster. Not sure why the sandbox clean-up works better for other tests, but the k8s-limit-range test pod gets stuck in an infinite loop: stdout io stream copy error happens: error = %wrpc error: code = PermissionDenied desc = \"ReadStreamRequest is blocked by policy ... policy check: ReadStreamRequest ... stdout io stream copy error happens: error = %wrpc error: code = PermissionDenied desc = \"ReadStreamRequest is blocked by policy ... policy check: ReadStreamRequest ... Fixes: #9380 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-03-28 22:43:28 +00:00
Dan Mihai	ebb26edf42	Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints genpolicy: reduce policy debug prints	2024-03-27 15:12:10 -07:00
Tobin Feldman-Fitzthum	9856fe5bea	runtime: remove ServiceOffload parameter Since we no longer use the service_offload configuration, remove the ServiceOffload field from the image struct. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum	a18c7ca307	runtime: remove unimplemented CoCo configurations These experimental options were added 2 years ago in anticipation of features that would be added in CoCo. These do not match the features that were eventually added and will soon be ported to main. Fixes: #8047 Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:06 -05:00
Chengyu Zhu	e66a5cb54d	Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout Support to set timeout to pull large image in guest	2024-03-28 00:34:08 +08:00
Christophe de Dinechin	82c4079fd0	agent: Remove useless loop This is the report from `make check`: ``` error: this loop never actually loops --> src/signal.rs:147:9 | 147 | / loop { 148 | | select! { 149 | | _ = handle => { 150 | | println!("INFO: task completed"); ... | 156 | | } 157 | | } | |_________^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop = note: `#[deny(clippy::never_loop)]` on by default ``` There is only one option: you get something or a timeout. You never retry, so the report is correct. Fixes: #9342 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2024-03-27 17:03:44 +01:00
Christophe de Dinechin	df5c88cdf0	agent: Remove lint error about `.flatten` running forever The lint report is the following: ``` error: `flatten()` will run forever if the iterator repeatedly produces an `Err` --> src/rpc.rs:1754:10 | 1754 | .flatten() | ^^^^^^^^^ help: replace with: `map_while(Result::ok)` | note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error --> src/rpc.rs:1752:5 | 1752 | / reader 1753 | | .lines() | |________________^ = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok = note: `-D clippy::lines-filter-map-ok` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]` ``` This commit simply applies the suggestion. Fixes: #9342 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2024-03-27 17:03:44 +01:00
Christophe de Dinechin	bfb55312be	agent: Fix `.enumerate` errors during `make check` Running `make check` in the `src/agent` directory gives: ``` error: you seem to use `.enumerate()` and immediately discard the index --> rustjail/src/mount.rs:572:27 | 572 | for (_index, line) in reader.lines().enumerate() { | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index = note: `-D clippy::unused-enumerate-index` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]` help: remove the `.enumerate()` call | 572 | for line in reader.lines() { | ~~~~ ~~~~~~~~~~~~~~ Checking tokio-native-tls v0.3.1 Checking hyper-tls v0.5.0 Checking reqwest v0.11.18 error: could not compile `rustjail` (lib) due to 1 previous error warning: build failed, waiting for other jobs to finish... make: *** [../../utils.mk:177: standard_rust_check] Error 101 ``` Fixes: #9342 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2024-03-27 17:03:44 +01:00
Greg Kurz	e1068da1a0	Merge pull request #9326 from gkurz/draft-release Only tag and publish the release when it is fully ready	2024-03-27 15:59:59 +01:00
ChengyuZhu6	c2dc13ebaa	runtime: support to configure CreateContainer Timeout in configurations Support configuring CreateContainerRequestTimeout in the configuration file, e.g.: [runtime] ... create_container_timeout = 300 Note: The effective timeout is the lesser of two values: runtime-request-timeout from the kubelet config (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout. In essence, the timeout used for guest pull is min(runtime-request-timeout, create_container_timeout). Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 21:58:41 +08:00
Greg Kurz	693c9487d4	docs: Adjust release documentation Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto deprecated by the re-design of the release process described in #9064. Remove this file and all its references in the repo. The `## Versioning` section has some useful information though. It is moved to `docs/Release-Process.md`. The documentation of the `PATCH` field is adapted to the new workflow. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>	2024-03-27 12:41:48 +01:00
Steve Horsman	45aba769c0	Merge pull request #9346 from cmaf/ci-remove-repo-docs Remove additional links to tests directory	2024-03-27 11:13:32 +00:00
ChengyuZhu6	2224f6d63f	runtime: support to configure CreateContainer timeout in annotation Support configuring CreateContainerRequestTimeout via annotations, e.g.: annotations: "io.katacontainers.config.runtime.create_container_timeout": "300" Note: The effective timeout is the lesser of two values: runtime-request-timeout from the kubelet config (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout. In essence, the timeout used for guest pull is min(runtime-request-timeout, create_container_timeout). Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 15:44:29 +08:00
ChengyuZhu6	39bd462431	runtime: support to set timeout for CreateContainerRequest When pulling images in the guest (#8484), it’s important to account for large images. Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout. However, this duration may prove insufficient for pulling larger images, such as those containing AI models. Consequently, we must devise a method to extend the timeout period for large image pulls. Fixes: #8141 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 15:44:29 +08:00
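Commit `9caa7beb1f` makes the `kata-runtime check` error name the device that failed to open. The sketch below shows the general technique of wrapping an OS error with context. It is written in Rust, while the runtime itself is Go, and `open_kvm_device` is a hypothetical helper rather than Kata's API. The sketch keeps the original `ErrorKind`, so callers can still test for `NotFound`.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: open the KVM device read/write and, on failure,
/// report which device could not be opened instead of surfacing only
/// "No such file or directory".
fn open_kvm_device(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .open(path)
        .map_err(|err| {
            // Preserve the error kind; prepend the device path as context.
            io::Error::new(
                err.kind(),
                format!("cannot open kvm device {}: {}", path.display(), err),
            )
        })
}
```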
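Commit `4d752e6350` maps three `agent.rest_api` values onto the attestation and resource APIs. Below is a minimal sketch of that mapping. It assumes the value arrives as a whitespace-separated kernel command-line parameter, as other `agent.*` options do. `RestApi` and `rest_api_from_cmdline` are hypothetical names, not the agent's actual types.

```rust
/// Which guest-components REST APIs to expose (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum RestApi {
    Attestation,
    Resource,
    All,
}

impl RestApi {
    /// Parse the value part of `agent.rest_api=<value>`.
    fn parse(value: &str) -> Option<RestApi> {
        match value {
            "attestation" => Some(RestApi::Attestation),
            "resource" => Some(RestApi::Resource),
            "all" => Some(RestApi::All),
            _ => None,
        }
    }

    /// `attestation` and `all` both enable the attestation API.
    fn attestation_enabled(self) -> bool {
        matches!(self, RestApi::Attestation | RestApi::All)
    }

    /// `resource` and `all` both enable the resource API.
    fn resource_enabled(self) -> bool {
        matches!(self, RestApi::Resource | RestApi::All)
    }
}

/// Find the first `agent.rest_api=...` parameter in a kernel command line.
fn rest_api_from_cmdline(cmdline: &str) -> Option<RestApi> {
    cmdline
        .split_whitespace()
        .find_map(|param| param.strip_prefix("agent.rest_api="))
        .and_then(RestApi::parse)
}
```

An unknown value yields `None`. The caller can then start no REST server at all, which is the conservative choice for a confidential guest.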
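Commit `cdb8531302` detects TDX support from the KVM module's `tdx` parameter. The sketch below makes two assumptions: the parameter lives at `/sys/module/kvm_intel/parameters/tdx`, and, like other boolean module parameters in sysfs, it reads as `Y` or `N`. Both the path and the helper names are assumptions, and Kata's own check is implemented in the Go runtime.

```rust
use std::fs;

/// Assumed sysfs location of the kvm_intel `tdx` module parameter.
const TDX_PARAM_PATH: &str = "/sys/module/kvm_intel/parameters/tdx";

/// Boolean module parameters are shown by sysfs as "Y" or "N".
fn tdx_param_enabled(contents: &str) -> bool {
    contents.trim() == "Y"
}

/// Hypothetical helper: TDX counts as available only if the parameter
/// file exists, is readable, and is set to "Y".
fn tdx_available() -> bool {
    fs::read_to_string(TDX_PARAM_PATH)
        .map(|contents| tdx_param_enabled(&contents))
        .unwrap_or(false)
}
```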
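Commit `da01bccd36` stops relying on Go's `runtime.NumCPU()`, which is limited to the CPUs the process may use. It reads `/proc/cpuinfo` instead and falls back to the process-local count on failure. The Rust sketch below mirrors that fallback, with `std::thread::available_parallelism` standing in for `runtime.NumCPU()`. It counts `processor` entries, i.e. every logical CPU the kernel reports host-wide. Whether Kata's Go code counts exactly these entries is an assumption, and the helper names are hypothetical.

```rust
use std::fs;

/// Count the `processor` entries in /proc/cpuinfo-formatted text.
/// This reflects every CPU the kernel reports, independent of the cpuset
/// the calling process happens to run in.
fn count_cpuinfo_processors(cpuinfo: &str) -> usize {
    cpuinfo
        .lines()
        .filter(|line| line.split(':').next().map(str::trim) == Some("processor"))
        .count()
}

/// Process-local view, roughly the analogue of Go's runtime.NumCPU():
/// it can be smaller than the host's CPU count when the process is confined.
fn process_cpus() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

/// Hypothetical helper: prefer the host-wide count and fall back to the
/// process-local one if /proc/cpuinfo cannot be read or yields no entries.
fn host_cpus() -> usize {
    match fs::read_to_string("/proc/cpuinfo") {
        Ok(text) => match count_cpuinfo_processors(&text) {
            0 => process_cpus(),
            n => n,
        },
        Err(_) => process_cpus(),
    }
}
```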
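Commits `5750faaf31` and `fff64f1c3e` move the drivers' reference counting into shared `do_increase_count` and `do_decrease_count` functions. Commit `548f252bc4` makes repeated attaches return `Ok` instead of `Err`. The function names below come from those commits, but the signatures and return convention are assumptions. In this minimal sketch, the returned `bool` tells the caller whether to skip the real hotplug or unplug.

```rust
/// Sketch: bump a device's attach refcount.
/// Ok(false): first user, so the caller performs the real attach.
/// Ok(true): already attached, so the caller skips the hotplug.
fn do_increase_count(ref_count: &mut u64) -> Result<bool, String> {
    match *ref_count {
        0 => {
            *ref_count = 1;
            Ok(false)
        }
        u64::MAX => Err("device attach count overflow".to_string()),
        _ => {
            *ref_count += 1;
            Ok(true)
        }
    }
}

/// Sketch: drop one reference.
/// Ok(true): other users remain, so the caller skips the real detach.
/// Ok(false): last user is gone, so the caller unplugs the device.
fn do_decrease_count(ref_count: &mut u64) -> Result<bool, String> {
    match *ref_count {
        0 => Err("detaching a device that is not attached".to_string()),
        1 => {
            *ref_count = 0;
            Ok(false)
        }
        _ => {
            *ref_count -= 1;
            Ok(true)
        }
    }
}
```

Attaching the same device a third time, as in the multi-container case from `548f252bc4`, simply returns `Ok(true)`.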
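Commit `cd0c31e185` appends `debug-threads=on` to QEMU's `-name` option when `hypervisor.debug` is enabled. QEMU accepts `-name [name=]name[,debug-threads=on|off]`. With the option on, QEMU gives its internal threads descriptive names, which helps when inspecting them per thread. Below is a sketch of the argument construction; `qemu_name_args` and the guest name are hypothetical.

```rust
/// Hypothetical helper: build QEMU's `-name` arguments, adding
/// `debug-threads=on` only when debugging is enabled.
fn qemu_name_args(guest_name: &str, debug: bool) -> Vec<String> {
    let mut value = guest_name.to_string();
    if debug {
        value.push_str(",debug-threads=on");
    }
    vec!["-name".to_string(), value]
}
```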
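The `lines_filter_map_ok` fix in `df5c88cdf0` can be illustrated with a small, self-contained sketch. This is not the agent's `rpc.rs` code: `read_non_empty_lines` and `AlwaysFails` are hypothetical names, and the reader that always errors stands in for a persistent read failure. With `flatten()`, each `Err` is silently skipped and the next line requested, so a reader like this would never terminate. `map_while(Result::ok)` stops at the first error instead.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper: collect the non-empty lines of a buffered reader.
/// `map_while(Result::ok)` ends iteration at the first I/O error instead of
/// skipping it, so a reader that keeps failing cannot cause an endless loop.
fn read_non_empty_lines<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .lines()
        .map_while(Result::ok)
        .filter(|line| !line.is_empty())
        .collect()
}

/// Hypothetical reader that fails on every read, simulating a broken source.
struct AlwaysFails;

impl Read for AlwaysFails {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "simulated read error"))
    }
}
```

Wrapping `AlwaysFails` in a `std::io::BufReader` and passing it to `read_non_empty_lines` returns an empty vector immediately. The same pipeline written with `.flatten()` would spin forever.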
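Commits `39bd462431`, `2224f6d63f` and `c2dc13ebaa` let users raise the 60-second `CreateContainerRequest` timeout. The kubelet's `runtime-request-timeout` still caps the result. The sketch below models that rule as a `min`. The helper names are hypothetical, the 60-second fallback when nothing is configured is taken from `39bd462431`, and the real logic lives in Kata's Go runtime. For example, with the kubelet's default `runtime-request-timeout` of 2 minutes, `create_container_timeout = 300` still yields an effective 120 seconds.

```rust
use std::time::Duration;

/// Default CreateContainerRequest timeout described in 39bd462431.
const DEFAULT_CREATE_CONTAINER_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical helper: parse a timeout given in seconds, e.g. the value of
/// the `io.katacontainers.config.runtime.create_container_timeout` annotation.
fn parse_timeout_secs(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Hypothetical helper: the kubelet gives up after its own
/// runtime-request-timeout, so the effective value is the smaller of the two.
fn effective_timeout(
    kubelet_runtime_request_timeout: Duration,
    create_container_timeout: Option<Duration>,
) -> Duration {
    let kata = create_container_timeout.unwrap_or(DEFAULT_CREATE_CONTAINER_TIMEOUT);
    kubelet_runtime_request_timeout.min(kata)
}
```

To actually get a 300-second guest pull, the kubelet's `runtime-request-timeout` has to be raised as well.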


4150 Commits