Commit Graph

4120 Commits

Author SHA1 Message Date
Fabiano Fidêncio
cdb8531302 hypervisor: Simplify TDX protection detection
Let's rely on the kvm module 'tdx' parameter to do so.
This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX
adoption (https://github.com/intel/tdx-linux) stacks.
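Below is a minimal sketch of this kind of detection. It assumes the parameter is exposed through sysfs as a boolean file such as `/sys/module/kvm_intel/parameters/tdx`. The exact module name and path depend on the host kernel and are assumptions here, and the helper names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Interpret a kernel boolean module parameter ("Y"/"N" or "1"/"0").
/// Returns None for any other content.
pub fn parse_bool_param(raw: &str) -> Option<bool> {
    match raw.trim() {
        "Y" | "y" | "1" => Some(true),
        "N" | "n" | "0" => Some(false),
        _ => None,
    }
}

/// Report TDX as enabled only if the parameter file exists and reads as true.
/// A missing file (module not loaded, or no such parameter) means "not enabled";
/// other I/O errors are propagated to the caller.
pub fn tdx_enabled(param_path: &Path) -> io::Result<bool> {
    match fs::read_to_string(param_path) {
        Ok(raw) => Ok(parse_bool_param(&raw).unwrap_or(false)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

For example, on a host where the module or parameter is absent, `tdx_enabled(Path::new("/sys/module/kvm_intel/parameters/tdx"))` returns `false` rather than failing.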

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
b7cccfa019 qemu: tdx: Adapt command line
This commit is a mess, but I'm not exactly sure what the best way is to
make it less messy, as we're getting QEMU TDX to work while partially
reverting 1e34220c41.

With that said, let me cover the content of this commit.

Firstly, we're reverting all the changes related to
"memory-backend-memfd-private", as that's what was used with the
previous host stack, but it seems it didn't fly upstream.

Secondly, in order to get QEMU to work properly with TDX, we need to
enforce the 'private=on' knob and use "memory-backend-ram". This commit
does both, and also makes sure to test the newly added `private=on`
knob.

I'm sorry for the confusion. I understand this is not optimal, but I just
don't see an easy way to make these changes without leaving the code
broken in between.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
6b4cc5ea6a Revert "qemu: tdx: Workaround SMP issue with TDX 1.5"
This reverts commit d1b54ede29.

 Conflicts:
	src/runtime/virtcontainers/qemu.go

This commit was a hack that was needed in order to get QEMU + TDX to
work atop the stack our CI was running on. As we're moving to the host
OS officially supported by the distros, we need to get rid of it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Fabiano Fidêncio
582b5b6b19 govmm: tdx: Expose the private=on|off knob
The private=on|off knob is required in order to properly launch a TDX
guest VM.

This is a brand new property that is part of the still in-flight patches
adding TDX support to QEMU.

Please see:
3fdd8072da

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Alex Lyn
0e0a361f0e Merge pull request #8782 from Apokleos/device-increate-count
bugfix and refactor device increate count
2024-04-05 13:43:49 +08:00
Alex Lyn
935a1a3b40 runtime-rs: refactor decrease_attach_count with do_decrease_count
Reduce duplicated code in decrease_attach_count by using the new
public function do_decrease_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
4f0fab938d runtime-rs: refactor increase_attach_count with do_increase_count
Reduce duplicated code in increase_attach_count by using the new
public function do_increase_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
fff64f1c3e runtime-rs: introduce dedicated function do_decrease_count
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:08 +08:00
Alex Lyn
5750faaf31 runtime-rs: introduce dedicated function do_increase_count
Since many drivers implement reference counting, all in exactly the
same way, we should reduce such duplicated code as much as possible.
Therefore, a new function is introduced to factor out the shared
reference-counting code.
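Below is a hypothetical sketch of what such shared helpers could look like. This log does not show the real signatures of do_increase_count and do_decrease_count, so the `CountChange` type and the plain `u64` counter are assumptions.

```rust
/// Outcome of changing a device's attach count (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CountChange {
    /// First reference: the caller must actually attach the device.
    FirstUse,
    /// Device already in use; only the counter changed.
    Shared,
    /// Last reference dropped: the caller must actually detach the device.
    LastUse,
}

/// Shared increment logic that each driver would otherwise duplicate.
pub fn do_increase_count(count: &mut u64) -> Result<CountChange, String> {
    *count = count.checked_add(1).ok_or("attach count overflow")?;
    Ok(if *count == 1 { CountChange::FirstUse } else { CountChange::Shared })
}

/// Shared decrement logic; decrementing a zero count is reported as an error.
pub fn do_decrease_count(count: &mut u64) -> Result<CountChange, String> {
    if *count == 0 {
        return Err("detach of a device that is not attached".to_string());
    }
    *count -= 1;
    Ok(if *count == 0 { CountChange::LastUse } else { CountChange::Shared })
}
```

With helpers like these, a driver only acts on `FirstUse` (attach) and `LastUse` (detach) and treats `Shared` as a no-op.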

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:09:17 +08:00
Alex Lyn
fa8049af6c Merge pull request #9383 from Apokleos/unified-cgrp-cmdline
kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
2024-04-02 09:08:04 +08:00
Alex Lyn
07bfdf4a22 Merge pull request #9275 from Apokleos/swap-hooks-bindmnt
kata-agent: Change order of guest hook and bind mount processing
2024-04-02 07:40:10 +08:00
Alex Lyn
c88014834b kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
To make systemd mount cgroups-v2 by default during boot, the
systemd.unified_cgroup_hierarchy=1 parameter must be added to the kernel
cmdline, which is passed via kernel_params in configuration.toml.
To enable cgroups-v2, just add systemd.unified_cgroup_hierarchy=true[1]
to kernel_params.
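As a hedged illustration, here is a hypothetical helper that checks whether a kernel command line requests the unified hierarchy. It treats 1/yes/y/true/t/on (case-insensitively) as enabled, which only approximates systemd's own boolean parsing. It also handles only the `key=value` form.

```rust
/// Hypothetical helper: does this kernel command line ask systemd to mount
/// the unified (v2) cgroup hierarchy? The last occurrence wins.
pub fn unified_cgroup_requested(cmdline: &str) -> bool {
    let mut enabled = false;
    for param in cmdline.split_whitespace() {
        if let Some(value) = param.strip_prefix("systemd.unified_cgroup_hierarchy=") {
            // Approximation of systemd's boolean parsing.
            enabled = matches!(
                value.to_ascii_lowercase().as_str(),
                "1" | "yes" | "y" | "true" | "t" | "on"
            );
        }
    }
    enabled
}
```

With `kernel_params = "systemd.unified_cgroup_hierarchy=1"` in configuration.toml, the guest command line contains that parameter and the helper returns `true`.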

Fixes: #9336

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 18:45:12 +08:00
alex.lyn
548f252bc4 runtime-rs: bugfix incorrect use of refcount before vfio attach
When a pod has multiple containers, the attach count of a device may be
more than 2. In that case, the attach operation should not return Err,
but simply return Ok.
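Below is a minimal sketch of the intended behavior. It uses a hypothetical device type in place of the real runtime-rs VFIO driver. Only the first attach performs the hot-plug. Later attaches from other containers in the pod simply return `Ok(())`.

```rust
/// Hypothetical device with an attach reference count.
pub struct VfioDeviceSketch {
    pub attach_count: u64,
    /// Number of times the (expensive) hot-plug path actually ran.
    pub hotplugs: u64,
}

impl VfioDeviceSketch {
    /// Attach the device for one more container.
    pub fn attach(&mut self) -> Result<(), String> {
        self.attach_count += 1;
        if self.attach_count > 1 {
            // Already attached by another container: sharing is not an error.
            return Ok(());
        }
        self.hotplugs += 1; // stand-in for the real hot-plug call
        Ok(())
    }
}
```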

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 11:28:57 +08:00
Alex Lyn
dfa8832406 Merge pull request #9345 from c3d/bug/9342-agent-test-errors
agent: Fix errors in `make check`
2024-04-01 09:48:44 +08:00
Dan Mihai
600f9266f3 runtime: remove stream copy infinite loop
This reverts commit 1c5693be86.

Avoid an apparent infinite loop when ReadStreamRequest is blocked by
policy, for some of the pods.

When running the k8s-limit-range.bats test with Policy enabled,
the Shim + VMM never get terminated on my cluster. Not sure why
the sandbox clean-up works better for other tests, but the
k8s-limit-range test pod gets stuck in an infinite loop:

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...
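The runtime's stream copy is Go code. The following Rust sketch only illustrates the general idea, assuming the policy rejection surfaces as a `PermissionDenied` I/O error. A copy loop that returns persistent errors to its caller, instead of retrying them, cannot spin forever.

```rust
use std::io::{self, Read, Write};

/// Copy a stream until EOF. Only `Interrupted` is treated as transient;
/// any other error (e.g. a policy denial) is returned immediately.
pub fn copy_stream<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => return Ok(total),
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

This mirrors what `std::io::copy` already does. The design point is that only transient errors are retried.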

Fixes: #9380

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-28 22:43:28 +00:00
Dan Mihai
ebb26edf42 Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints
genpolicy: reduce policy debug prints
2024-03-27 15:12:10 -07:00
Tobin Feldman-Fitzthum
9856fe5bea runtime: remove ServiceOffload parameter
Since we no longer use the service_offload configuration,
remove the ServiceOffload field from the image struct.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum
a18c7ca307 runtime: remove unimplemented CoCo configurations
These experimental options were added 2 years ago
in anticipation of features that would be added
in CoCo. These do not match the features that were
eventually added and will soon be ported to main.

Fixes: #8047

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:06 -05:00
Chengyu Zhu
e66a5cb54d Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout
Support to set timeout to pull large image in guest
2024-03-28 00:34:08 +08:00
Christophe de Dinechin
82c4079fd0 agent: Remove useless loop
This is the report from `make check`:

```
error: this loop never actually loops
   --> src/signal.rs:147:9
    |
147 | /         loop {
148 | |             select! {
149 | |                 _ = handle => {
150 | |                     println!("INFO: task completed");
...   |
156 | |             }
157 | |         }
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop
    = note: `#[deny(clippy::never_loop)]` on by default
```

There is only one option: you get something or a timeout. You never retry, so
the report is correct.
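The same shape can be shown with the standard library. Here `std::sync::mpsc::Receiver::recv_timeout` stands in for the tokio `select!` in `src/signal.rs`, and the function and messages are hypothetical. A single `match` covers both "got something" and "timed out" without a `loop`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Wait once for a task to report completion, or give up after `timeout`.
/// No outcome retries, so no `loop` is needed around the `match`.
pub fn wait_for_task(done: &Receiver<()>, timeout: Duration) -> &'static str {
    match done.recv_timeout(timeout) {
        Ok(()) => "task completed",
        Err(RecvTimeoutError::Timeout) => "timeout",
        Err(RecvTimeoutError::Disconnected) => "task dropped its sender",
    }
}
```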

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
df5c88cdf0 agent: Remove lint error about .flatten running forever
The lint report is the following:

```
error: `flatten()` will run forever if the iterator repeatedly produces an `Err`
    --> src/rpc.rs:1754:10
     |
1754 |         .flatten()
     |          ^^^^^^^^^ help: replace with: `map_while(Result::ok)`
     |
note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error
    --> src/rpc.rs:1752:5
     |
1752 | /     reader
1753 | |         .lines()
     | |________________^
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok
     = note: `-D clippy::lines-filter-map-ok` implied by `-D warnings`
     = help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]`
```

This commit simply applies the suggestion.
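Below is a small sketch of the suggested pattern. The `read_lines` helper is hypothetical, not the code in `src/rpc.rs`. `map_while(Result::ok)` stops at the first read error, whereas `.flatten()` would skip the error and keep polling a reader that may fail forever.

```rust
use std::io::{BufRead, BufReader, Read};

/// Collect lines from a reader, stopping at the first read error.
pub fn read_lines<R: Read>(source: R) -> Vec<String> {
    BufReader::new(source)
        .lines()
        .map_while(Result::ok)
        .collect()
}
```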

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
bfb55312be agent: Fix .enumerate errors during make check
Running `make check` in the `src/agent` directory gives:

```
error: you seem to use `.enumerate()` and immediately discard the index
   --> rustjail/src/mount.rs:572:27
    |
572 |     for (_index, line) in reader.lines().enumerate() {
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index
    = note: `-D clippy::unused-enumerate-index` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]`
help: remove the `.enumerate()` call
    |
572 |     for line in reader.lines() {
    |         ~~~~    ~~~~~~~~~~~~~~

    Checking tokio-native-tls v0.3.1
    Checking hyper-tls v0.5.0
    Checking reqwest v0.11.18
error: could not compile `rustjail` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
make: *** [../../utils.mk:177: standard_rust_check] Error 101
```
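Below is a sketch of the fix clippy suggests. The body of the loop in `rustjail/src/mount.rs` is not shown here, so counting non-empty lines is a hypothetical stand-in.

```rust
use std::io::{BufRead, BufReader, Read};

/// Count non-empty lines. The index from `.enumerate()` was never used,
/// so the loop iterates over `reader.lines()` directly.
pub fn count_non_empty_lines<R: Read>(source: R) -> std::io::Result<usize> {
    let reader = BufReader::new(source);
    let mut count = 0;
    // Before: for (_index, line) in reader.lines().enumerate() { ... }
    for line in reader.lines() {
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```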

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Greg Kurz
e1068da1a0 Merge pull request #9326 from gkurz/draft-release
Only tag and publish the release when it is fully ready
2024-03-27 15:59:59 +01:00
ChengyuZhu6
c2dc13ebaa runtime: support to configure CreateContainer Timeout in configurations
Support configuring CreateContainerRequestTimeout in the
configuration file.

e.g.:
[runtime]
...
create_container_timeout = 300

Note: The effective timeout is the lesser of two values: runtime-request-timeout from the kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In other words, the timeout used for guest pull is min(runtime-request-timeout, create_container_timeout).
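The rule in the note above can be written as a one-line helper. The helper name is hypothetical, and both values are assumed to have already been converted to `Duration`.

```rust
use std::time::Duration;

/// Effective timeout for CreateContainerRequest (and thus for guest pull):
/// the lesser of kubelet's runtime-request-timeout and create_container_timeout.
pub fn effective_timeout(runtime_request_timeout: Duration, create_container_timeout: Duration) -> Duration {
    runtime_request_timeout.min(create_container_timeout)
}
```

For example, with create_container_timeout = 300 and a kubelet runtime-request-timeout of 120s, the guest pull gets only 120s. Raising runtime-request-timeout to 600s lets the full 300s apply.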

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 21:58:41 +08:00
Greg Kurz
693c9487d4 docs: Adjust release documentation
Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto
deprecated by the re-design of the release process described in #9064.
Remove this file and all its references in the repo.

The `## Versioning` section has some useful information though. It is
moved to `docs/Release-Process.md`. The documentation of the `PATCH`
field is adapted according to new workflow.

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-27 12:41:48 +01:00
Steve Horsman
45aba769c0 Merge pull request #9346 from cmaf/ci-remove-repo-docs
Remove additional links to tests directory
2024-03-27 11:13:32 +00:00
ChengyuZhu6
2224f6d63f runtime: support to configure CreateContainer timeout in annotation
Support configuring CreateContainerRequestTimeout via annotations.

e.g.:
annotations:
      "io.katacontainers.config.runtime.create_container_timeout": "300"

Note: The effective timeout is the lesser of two values: runtime-request-timeout from the kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In other words, the timeout used for guest pull is min(runtime-request-timeout, create_container_timeout).
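Below is a hedged sketch of reading this annotation. The key is taken from the example above. The helper name, the seconds unit, and the error handling are assumptions.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Annotation key from the example above.
pub const CREATE_CONTAINER_TIMEOUT_ANNOTATION: &str =
    "io.katacontainers.config.runtime.create_container_timeout";

/// Read the timeout (assumed to be in seconds) from pod annotations, falling
/// back to `default` when the annotation is absent. A malformed value is
/// reported as an error rather than silently ignored.
pub fn timeout_from_annotations(
    annotations: &HashMap<String, String>,
    default: Duration,
) -> Result<Duration, String> {
    match annotations.get(CREATE_CONTAINER_TIMEOUT_ANNOTATION) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map(Duration::from_secs)
            .map_err(|e| format!("invalid value {:?} for {}: {}", raw, CREATE_CONTAINER_TIMEOUT_ANNOTATION, e)),
    }
}
```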

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
ChengyuZhu6
39bd462431 runtime: support to set timeout for CreateContainerRequest
When pulling images in the guest (#8484), it's important to account for large images.
Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout.
However, this duration may prove insufficient for pulling larger images, such as those containing AI models.
Consequently, we need a way to extend the timeout for large image pulls.

Fixes: #8141

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
Chengyu Zhu
d16971e37e Merge pull request #9325 from ChengyuZhu6/image_service
agent:image: Refactor code to improve memory efficiency of image service
2024-03-26 10:38:37 +08:00
Dan Mihai
6c72c29535 genpolicy: reduce policy debug prints
Kata CI has full debug output enabled for the cbl-mariner k8s tests,
and the test AKS node is relatively slow. So debug prints from policy
are expensive during CI.

Fixes: #9296

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-26 02:21:26 +00:00
Alex Lyn
cec943fc26 Merge pull request #9244 from Apokleos/dgb-gpu
runtime-rs/dragonball: add support building kernel with upcall and GPU hotplug
2024-03-26 08:53:54 +08:00
Chelsea Mafrica
d69514766e src: Remove references to files in tests repo
Change scripts and source code that use files in the tests repo to use the
corresponding files in the current repo.

Fixes #9165

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-03-25 15:09:52 -07:00
Alex Lyn
5c54315a87 dragonball: fix CI failure due to poor UT adaptation.
Fixes: #9144

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 20:25:27 +08:00
ChengyuZhu6
f47408fdf4 agent:image: Refactor code to improve memory efficiency of image service
Currently, `.lock().await.clone()` duplicates `Option<ImageService>` in memory on each call to `singleton()`.
Consequently, if kata-agent receives numerous image pull requests simultaneously,
multiple `Option<ImageService>` instances are allocated in memory, consuming additional memory resources.

In image.rs, we introduce two public functions:
`merge_bundle_oci()` and `init_image_service()`. These functions will encapsulate
the operations on `IMAGE_SERVICE`, ensuring that its internal details remain
hidden from external modules such as `rpc.rs`.
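Below is a minimal sketch of the idea. It uses `std::sync::Mutex` instead of the async mutex the agent uses (`.lock().await`), along with a hypothetical `ImageService` and a hypothetical `with_image_service` helper. The body of `merge_bundle_oci()` is not shown in this log, so it is not modeled. Callers borrow the single instance under the lock instead of receiving a clone of `Option<ImageService>`.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the agent's image service state.
pub struct ImageService {
    pub pulled_images: Vec<String>,
}

/// Module-private global, so other modules cannot touch it directly.
static IMAGE_SERVICE: Mutex<Option<ImageService>> = Mutex::new(None);

/// Create the service once; later calls are no-ops.
pub fn init_image_service() {
    let mut guard = IMAGE_SERVICE.lock().unwrap();
    if guard.is_none() {
        *guard = Some(ImageService { pulled_images: Vec::new() });
    }
}

/// Run `f` against the single shared instance while holding the lock,
/// rather than handing each caller its own copy.
pub fn with_image_service<T>(f: impl FnOnce(&mut ImageService) -> T) -> Result<T, String> {
    let mut guard = IMAGE_SERVICE
        .lock()
        .map_err(|_| "image service lock poisoned".to_string())?;
    match &mut *guard {
        Some(service) => Ok(f(service)),
        None => Err("image service not initialized".to_string()),
    }
}
```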

Fixes: #9225 -- part II

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-25 07:46:50 +08:00
ChengyuZhu6
7a49ec1c80 agent:util: Refactor the unit tests to leverage rstest
Refactor the unit tests in util.rs to leverage rstest for parameterization.

Fixes: #9314

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:53 +08:00
ChengyuZhu6
2df2b4d30d agent:namespace: Refactor unit tests to leverage rstest
Refactor the unit tests in `namespace.rs` to leverage rstest for parameterization.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:48 +08:00
Hyounggyu Choi
d915a79e2d Merge pull request #9280 from BbolroC/enable-qemu-on-s390x
runtime-rs: Enable qemu on s390x
2024-03-22 23:58:42 +01:00
Hyounggyu Choi
81aaa34bd6 runtime-rs: Add DeviceVirtioSerial and DeviceVirtconsole
It is observed that virtiofsd exits immediately on s390x
if no console device is attached.
This commit resolves the issue by porting `appendConsole()`
from the runtime and calling it in `start_vm()`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
2cfe745efb runtime-rs: Enable memory backend option for Machine for s390x
On s390x, `-machine` requires an additional `memory-backend` option.
Otherwise, virtiofsd exits with HandleRequest(InvalidParam).

This commit adds a field `memory_backend` to `struct Machine`
and turns it on for s390x.
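Below is a simplified sketch of such a field. The struct layout and values are hypothetical. It renders QEMU's `-machine memory-backend=<id>` property and fills it in only when compiled for s390x.

```rust
/// Simplified, hypothetical `-machine` option builder.
pub struct Machine {
    pub machine_type: String,
    /// Id of a memory backend object used as guest RAM; only set on s390x.
    pub memory_backend: Option<String>,
}

impl Machine {
    pub fn new(machine_type: &str, memory_backend_id: &str) -> Self {
        Machine {
            machine_type: machine_type.to_string(),
            memory_backend: if cfg!(target_arch = "s390x") {
                Some(memory_backend_id.to_string())
            } else {
                None
            },
        }
    }

    /// Render the value that follows `-machine` on the QEMU command line.
    pub fn to_param(&self) -> String {
        let mut param = self.machine_type.clone();
        if let Some(id) = &self.memory_backend {
            param.push_str(",memory-backend=");
            param.push_str(id);
        }
        param
    }
}
```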

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
9bcfaad625 runtime-rs: Add ccw block device for rootfs
Like nvdimm on x86_64, a block device on s390x needs special
handling with `virtio-blk-ccw`.
This commit generates the QEMU command line parameters for a block
device using `-blockdev` and `-device` when `vm_rootfs_driver`
is set to `virtio-blk-ccw`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
David Esparza
3e40051634 Merge pull request #9255 from dborquez/thread_pid_function
runtime-rs: ch: Implement full thread/tid/pid handling
2024-03-22 10:05:02 -06:00
Chengyu Zhu
9a4cb96262 Merge pull request #9312 from ChengyuZhu6/show-feature
agent: Add guest-pull to the list of agent features in announce()
2024-03-21 23:35:29 +08:00
David Esparza
b498e140a1 runtime-rs: ch: Implement full thread/tid/pid handling
Add in the full details once cloud-hypervisor/cloud-hypervisor#6103
has been implemented, and the feature is available in a Cloud Hypervisor
release.

Fixes: #8799

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-03-21 08:24:53 -06:00
ChengyuZhu6
754399d909 agent: Add guest-pull to the list of agent features in announce()
Add guest-pull to the list of agent features in announce().

Fixes: #9225 -- part IV

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-21 20:01:52 +08:00
Xuewei Niu
9c4f9dcb35 Merge pull request #9311 from studychao/chao/fix_mtrr
Dragonball: introduce MTRR regs support
2024-03-21 17:24:27 +08:00
Hyounggyu Choi
9b2c08935b runtime-rs: Pass different device argument based on bus type
Currently, `*-pci` is always used as the device argument in the device config.
This is wrong when a different bus type is used; s390x, for example, uses `ccw`.
This commit generates the device argument based on the bus type.
The structures `DeviceVhostUserFsPci` and `VhostVsockPci` are renamed to
`DeviceVhostUserFs` and `VhostVsock`, because their names are no longer
bound to a specific bus type.
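Below is a minimal sketch of deriving the device argument from the bus type. The `BusType` enum and `device_driver` helper are hypothetical. The resulting names (`vhost-user-fs-pci`/`vhost-user-fs-ccw`, `vhost-vsock-pci`/`vhost-vsock-ccw`) follow QEMU's device naming.

```rust
/// Bus a virtio device is attached to (simplified).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BusType {
    Pci,
    Ccw,
}

impl BusType {
    /// Default bus for the architecture this code is compiled for.
    pub fn for_target() -> Self {
        if cfg!(target_arch = "s390x") { BusType::Ccw } else { BusType::Pci }
    }

    fn suffix(self) -> &'static str {
        match self {
            BusType::Pci => "pci",
            BusType::Ccw => "ccw",
        }
    }
}

/// Build the QEMU device driver name from a bus-agnostic base name,
/// instead of hard-coding the "-pci" suffix.
pub fn device_driver(base: &str, bus: BusType) -> String {
    format!("{}-{}", base, bus.suffix())
}
```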

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-21 09:25:37 +01:00
Hyounggyu Choi
7b3d1adb8c libs: Bump sysinfo to v0.30.5
It has been observed that the runtime stalls around
`sysinfo::total_memory()` while adjusting a config on s390x.
This commit updates the crate to the latest version, which happens
to resolve the issue. (There is no explicit release note for this.)

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-20 09:27:13 +01:00
Chao Wu
5a4b858ece Dragonball: introduce MTRR regs support
MTRRs (Memory Type Range Registers) are a group of x86 MSRs that control access to,
and cacheability of, physical memory regions.
During our tests with runtime-rs + Dragonball, we found that support for these registers is required
for a passthrough GPU running CUDA applications, since the GPU needs that information to use its memory properly.

fixes: #9310
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-20 14:18:16 +08:00
ChengyuZhu6
5bad18f9c9 agent: set https_proxy/no_proxy before initializing agent policy
When https_proxy/no_proxy are configured and agent-policy is enabled, pulling images in the guest hangs.
This issue may stem from `reqwest`'s HTTP client being instantiated during agent-policy initialization,
which can prevent the proxy settings from taking effect when pulling images in the guest.
Since both features use `reqwest`, https_proxy/no_proxy should be set before agent-policy is initialized.

Fixes: #9212

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
db9f18029c README: Add https_proxy and no_proxy to agent README
Add agent.https_proxy and agent.no_proxy to the table in the agent README.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00