Commit Graph

4150 Commits

Author SHA1 Message Date
cncal
9caa7beb1f runtime: make kata-runtime check error more understandable
If device /dev/kvm does not exist, kata-runtime check would fail with
an ambiguous error messae 'no such file or directory'. I added a little
more details to make it understandable and it will belike:

```
ERRO[0000] cannot open kvm device: no such file or directory  arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime
ERRO[0000] no such file or directory                          arch=arm64 name=kata-runtime pid=2849085 source=runtime
no such file or directory
```

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:29:08 +08:00
Dan Mihai
c26dad8fe5 Merge pull request #9294 from burgerdev/burgerdev/genpolicy-configurable-pause
genpolicy: support insecure registries and custom pause containers
2024-04-16 09:39:33 -07:00
Greg Kurz
aca6a1bcb5 Merge pull request #9353 from pmores/pr-8866-follow-up
runtime-rs: refactor qemu driver
2024-04-16 16:07:36 +02:00
Hyounggyu Choi
32f58abfde Merge pull request #9403 from BbolroC/runtime-rs-ci-qemu
CI: Enable GHA cri-containerd workflow for runtime-rs with QEMU
2024-04-15 09:31:25 +02:00
Hyounggyu Choi
606f8e1ab2 runtime-rs: Adjust configuration for qemu-runtime-rs
To make `qemu-runtime-rs` working for CI, we have to rename a configuration
template file and `CONFIG_FILE_QEMU` in Makefile.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-04-12 12:25:53 +02:00
Alexandru Matei
9e01732f7a agent: shutdown vm on exit when agent is used as init process
Linux kernel generates a panic when the init process exits.
The kernel is booted with panic=1, hence this leads to a
vm reboot.
When used as a service the kata-agent service has an ExecStop
option which does a full sync and shuts down the vm.
This patch mimicks this behavior when kata-agent is used as
the init process.

Fixes: #9429

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-04-12 11:32:31 +03:00
Markus Rudy
77540503f9 genpolicy: add support for insecure registries
genpolicy is a handy tool to use in CI systems, to prepare workloads
before applying them to the Kubernetes API server. However, many modern
build systems like Bazel or Nix restrict network access, and rightfully
so, so any registry interaction must take place on localhost.
Configuring certificates for localhost is tricky at best, and since
there are no privacy concerns for localhost traffic, genpolicy should
allow to contact some registries insecurely. As this is a runtime
environment detail, not a target environment detail, configuring
insecure registries does not belong into the JSON settings, so it's
implemented as command line flags.

Fixes: #9008

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 22:29:03 +02:00
Markus Rudy
bc2292bc27 genpolicy: make pause container image configurable
CRIs don't always use a pause container, but even if they do the
concrete container choice is not specified. Even if the CRI config can
be tweaked, it's not guaranteed that registries in the public internet
can be reached. To be portable across CRI implementations and
configurations, the genpolicy user needs to be able to configure the
container the tool should append to the policy.

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 16:26:35 +02:00
Markus Rudy
8b30fa103f genpolicy: parse json settings during config init
Decouple initialization of the Settings struct from creating the
AgentPolicy struct, so that the settings are available for evaluating,
extending or overriding command line arguments.

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 16:17:33 +02:00
Xuewei Niu
50f78ec52c agent: Fix the issue with the "test_new_fs_manager" test
This patch introduces a one-time cpath to mitigate the cgroup residuals. It
might break the device cgroup merging rules when the cgroup has children.

Fixes: #9456

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-04-11 18:06:05 +08:00
Saul Paredes
51498ba99a genpolicy: toggle containerd pull in tests
- Add v1 image test case
- Install protobuf-compiler in build check
- Reset containerd config to default in kubernetes test if we are testing genpolicy
- Update docker_credential crate
- Add test that uses default pull method
- Use GENPOLICY_PULL_METHOD in test

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-08 19:28:29 -07:00
Saul Paredes
c96ebf237c genpolicy: add containerd pull method
Add optional toggle to use existing containerd installation to pull and manage container images.
This adds support to a wider set of images that are currently not supported by standard pull method,
such as those that use v1 manifest.

Fixes: #9144

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-08 09:56:59 -07:00
Greg Kurz
8b996b9307 Merge pull request #9331 from egernst/foobar
katautils: check number of cores on the system intead of go runtime
2024-04-08 18:38:49 +02:00
Wainer Moschetta
fba1d394d7 Merge pull request #9369 from ChengyuZhu6/sandbox-image
agent:image: Support different pause image in the guest for guest pull
2024-04-08 11:06:21 -03:00
stevenhorsman
864e9c22ba agent: doc: Add new config doc
Document the new guest_components_rest_api config parameter

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
Biao Lu
f0edec84f6 agent: Launch api-server-rest
If 'rest_api' is configured, let's start the  api-server-rest after
the attestation-agent and the confidential-data-hub have been started.

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:38:53 +01:00
Biao lu
4d752e6350 agent: Add config for api-server-rest
Add configuration for 'rest api server'.

Optional configurations are
  'agent.rest_api=attestation' will enable attestation api
  'agent.rest_api=resource' will enable resource api
  'agent.rest_api=all' will enable all (attestation and resource) api

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
Biao Lu
f476d671ed agent: Launch the confidential data hub
Let's introduce a new method to start the confidential data hub and the
attestation agent.  The former depends on the later, and it needs to be
started before the RPC server.

Starting the attestation components is based on whether the confidential
containers guest components binaries are found in the rootfs.

Fixes: #7544

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
Greg Kurz
be8f0cb520 Merge pull request #9402 from deagon/feat/debug-threads
qemu: show the thread name when enable the hypervisor.debug option
2024-04-08 11:04:36 +02:00
ChengyuZhu6
8c897f822c agent:image: Support different pause image in the guest for guest pull
Support different pause images in the guest for guest-pull, such as k8s
pause image (registry.k8s.io/pause) and openshift pause image (quay.io/bpradipt/okd-pause).

Fixes: #9225 -- part III

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-04-07 09:00:10 +08:00
Fabiano Fidêncio
cdb8531302 hypervisor: Simplify TDX protection detection
Let's rely on the kvm module 'tdx' parameter to do so.
This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX
adoption (https://github.com/intel/tdx-linux) stacks.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
b7cccfa019 qemu: tdx: Adapt command line
This commit is a mess, but I'm not exactly sure what's the best way to
make it less messy, as we're getting QEMU TDX to work while partially
reverting 1e34220c41.

With that said, let me cover the content of this commit.

Firstly, we're reverting all the changes related to
"memory-backend-memfd-private", as that's what was used with the
previous host stack, but it seems it
didn't fly upstream.

Secondly, in order to get QEMU to properly work with TDX, we need to
enforce the 'private=on' knob and use the "memory-backend-ram", and
we're doing so, and also making sure to test the `private=on` newly
added knob.

I'm sorry for the confusion, I understand this is not optimal, I just
don't see an easy path to do changes without leaving the code broken
during those changes.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
6b4cc5ea6a Revert "qemu: tdx: Workaround SMP issue with TDX 1.5"
This reverts commit d1b54ede29.

 Conflicts:
	src/runtime/virtcontainers/qemu.go

This commit was a hack that was needed in order to get QEMU + TDX to
work atop of the stack our CI was running on.  As we're moving to "the
officially supported by distros" host OS, we need to get rid of this.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Fabiano Fidêncio
582b5b6b19 govmm: tdx: Expose the private=on|off knob
The private=on|off knob is required in order to properly lauunch a TDX
guest VM.

This is a brand new property that is part of the still in-flight patches
adding TDX support on QEMU.

Please, see:
3fdd8072da

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Alex Lyn
0e0a361f0e Merge pull request #8782 from Apokleos/device-increate-count
bugfix and refactor device increate count
2024-04-05 13:43:49 +08:00
Eric Ernst
da01bccd36 katautils: check number of cores on the system intead of go runtime
We used to utilize go runtime's "NumCPUs()", which will give the number
of cores available to the Go runtime, which may be a subset of physical
cores if the shim is started from within a cpuset. From the function's
description:
"NumCPU returns the number of logical CPUs usable by the current
process."

As an example, if containerd is run from within a smaller CPUset, the
maximum size of a pod will be dictated by this CPUset, instead of what
will be available on the rest of the system.

Since the shim will be moved into its own cgroup that may have a
different CPUset, let's stick with checking physical cores. This also
aligns with what we have documented for maxVCPU handling.

In the event we fail to read /proc/cpuinfo, let's use the goruntime.

Fixes: #9327

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2024-04-03 16:09:16 -07:00
Alex Lyn
935a1a3b40 runtime-rs: refactor decrease_attach_count with do_decrease_count
Try to reduce duplicated code in decrease_attach_count with public
new function do_decrease_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
4f0fab938d runtime-rs: refactor increase_attach_count with do_increase_count
Try to reduce duplicated code in increase_attach_count with public
new function do_increase_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
fff64f1c3e runtime-rs: introduce dedicated function do_decrease_count
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:08 +08:00
Alex Lyn
5750faaf31 runtime-rs: introduce dedicated function do_increase_count
Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:09:17 +08:00
Guoqiang Ding
cd0c31e185 qemu: show the thread name when enable the hypervisor.debug option
Add debug-threads=on in the name argument if debug enabled.

Fixes: #9400
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-04-03 10:36:52 +08:00
Alex Lyn
fa8049af6c Merge pull request #9383 from Apokleos/unified-cgrp-cmdline
kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
2024-04-02 09:08:04 +08:00
Alex Lyn
07bfdf4a22 Merge pull request #9275 from Apokleos/swap-hooks-bindmnt
kata-agent: Change order of guest hook and bind mount processing
2024-04-02 07:40:10 +08:00
Alex Lyn
c88014834b kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
Configure the system to mount cgroups-v2 by default during system boot
by the systemd system, We must add systemd.unified_cgroup_hierarchy=1
parameter to kernel cmdline, which will be passed by kernel_params in
configuration.toml.
To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1]
to kernel_params.

Fixes: #9336

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 18:45:12 +08:00
alex.lyn
548f252bc4 runtime-rs: bugfix incorrect use of refcount before vfio attach
When there's a pod with multiple containers, there may be case that
attach point more than 2, we should not return Err in that case when
we are doing attach ops, but just return Ok.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 11:28:57 +08:00
Alex Lyn
dfa8832406 Merge pull request #9345 from c3d/bug/9342-agent-test-errors
agent: Fix errors in `make check`
2024-04-01 09:48:44 +08:00
Dan Mihai
600f9266f3 runtime: remove stream copy infinite loop
This reverts commit 1c5693be86.

Avoid apparent infinite loop when ReadStreamRequest is blocked by
policy - for some of the pods.

When running the k8s-limit-range.bats test with Policy enabled,
the Shim + VMM never get terminated on my cluster. Not sure why
the sandbox clean-up works better for other tests, but the
k8s-limit-range test pod gets stuck in an infinite loop:

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

Fixes: #9380

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-28 22:43:28 +00:00
Dan Mihai
ebb26edf42 Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints
genpolicy: reduce policy debug prints
2024-03-27 15:12:10 -07:00
Tobin Feldman-Fitzthum
9856fe5bea runtime: remove ServiceOffload parameter
Since we no longer use the service_offload configuration,
remove the ServiceOffload field from the image struct.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum
a18c7ca307 runtime: remove unimplemented CoCo configurations
These experimental options were added 2 years ago
in anticipation of features that would be added
in CoCo. These do not match the features that were
eventually added and will soon be ported to main.

Fixes: #8047

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:06 -05:00
Chengyu Zhu
e66a5cb54d Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout
Support to set timeout to pull large image in guest
2024-03-28 00:34:08 +08:00
Christophe de Dinechin
82c4079fd0 agent: Remove useless loop
This is the report from `make check`:

```
error: this loop never actually loops
   --> src/signal.rs:147:9
    |
147 | /         loop {
148 | |             select! {
149 | |                 _ = handle => {
150 | |                     println!("INFO: task completed");
...   |
156 | |             }
157 | |         }
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop
    = note: `#[deny(clippy::never_loop)]` on by default
```

There is only one option: you get something or a timeout. You never retry, so
the report is correct.

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
df5c88cdf0 agent: Remove lint error about .flatten running forever
The lint report is the following:

```
error: `flatten()` will run forever if the iterator repeatedly produces an `Err`
    --> src/rpc.rs:1754:10
     |
1754 |         .flatten()
     |          ^^^^^^^^^ help: replace with: `map_while(Result::ok)`
     |
note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error
    --> src/rpc.rs:1752:5
     |
1752 | /     reader
1753 | |         .lines()
     | |________________^
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok
     = note: `-D clippy::lines-filter-map-ok` implied by `-D warnings`
     = help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]`
```

This commit simply applies the suggestion.

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
bfb55312be agent: Fix .enumerate errors during make check
Running `make check` in the `src/agent` directory gives:

```
error: you seem to use `.enumerate()` and immediately discard the index
   --> rustjail/src/mount.rs:572:27
    |
572 |     for (_index, line) in reader.lines().enumerate() {
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index
    = note: `-D clippy::unused-enumerate-index` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]`
help: remove the `.enumerate()` call
    |
572 |     for line in reader.lines() {
    |         ~~~~    ~~~~~~~~~~~~~~

    Checking tokio-native-tls v0.3.1
    Checking hyper-tls v0.5.0
    Checking reqwest v0.11.18
error: could not compile `rustjail` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
make: *** [../../utils.mk:177: standard_rust_check] Error 101
```

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Greg Kurz
e1068da1a0 Merge pull request #9326 from gkurz/draft-release
Only tag and publish the release when it is fully ready
2024-03-27 15:59:59 +01:00
ChengyuZhu6
c2dc13ebaa runtime: support to configure CreateContainer Timeout in configurations
support to configure CreateContainerRequestTimeout in the
configurations.

e.g.:
[runtime]
...
create_container_timeout = 300

Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 21:58:41 +08:00
Greg Kurz
693c9487d4 docs: Adjust release documentation
Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto
deprecated by the re-design of the release process described in #9064.
Remove this file and all its references in the repo.

The `## Versioning` section has some useful information though. It is
moved to `docs/Release-Process.md`. The documentation of the `PATCH`
field is adapted according to new workflow.

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-27 12:41:48 +01:00
Steve Horsman
45aba769c0 Merge pull request #9346 from cmaf/ci-remove-repo-docs
Remove additional links to tests directory
2024-03-27 11:13:32 +00:00
ChengyuZhu6
2224f6d63f runtime: support to configure CreateContainer timeout in annotation
Support to configure CreateContainerRequestTimeout in the annotations.

e.g.:
annotations:
      "io.katacontainers.config.runtime.create_container_timeout": "300"

Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
ChengyuZhu6
39bd462431 runtime: support to set timeout for CreateContainerRequest
In the situation to pull images in the guest #8484, it’s important to account for pulling large images.
Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout.
However, this duration may prove insufficient for pulling larger images, such as those containing AI models.
Consequently, we must devise a method to extend the timeout period for large image pull.

Fixes: #8141

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00