Commit Graph

4749 Commits

Author SHA1 Message Date
Fabiano Fidêncio
eea4277fbf
runtime: Update runc to v1.1.12
Although we don't seem to be affected by
https://nvd.nist.gov/vuln/detail/CVE-2024-21626, we vendor and use the
runc package in a few different places of our code, so we'd better update
the package to its latest release.

Fixes: #9097

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-14 23:13:39 +01:00
Greg Kurz
d7afd31fd4
Merge pull request #8455 from BbolroC/runtime-rs-qemu-config
runtime-rs: Add a new config option for QEMU
2024-02-10 08:48:23 +01:00
Dan Mihai
a054462eb7
Merge pull request #9051 from microsoft/danmihai1/k8s-copy-file
tests: k8s: k8s-copy-file auto-generated policy
2024-02-09 12:30:49 -08:00
Hyounggyu Choi
05c4c8055c runtime-rs: Configure argument replacement for QEMU in Makefile
Last but not least, all placeholders for argument replacement
should be configured to generate a configuration file when `QEMUCMD`
is defined. This enriches those variables.

Additionally, this involves creating a symbolic link to `configuration-qemu.toml`
if QEMU is defined as the default hypervisor.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-09 19:31:20 +01:00
Hyounggyu Choi
27cb30d8ce runtime-rs: Adjust configuration template for runtime-rs
There are some variables newly introduced to runtime-rs, such as:

- runtime.name
- runtime.hypervisor_name
- runtime.agent_name
- vm_rootfs_driver

Additionally some of the placeholders for argument replacement are
made hypervisor-specific based on the changes made for dragonball.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-09 16:26:59 +01:00
Steve Horsman
b99f574522
Merge pull request #9037 from niteeshkd/nd_SevSnpGuest
runtime: fix creation of SEV confidential container on SNP enabled host.
2024-02-08 09:29:20 +00:00
Greg Kurz
6ead48ec06
Merge pull request #8986 from pmores/drop-shim-v2-address-value-validation
runtime-rs: fix interoperability issues between runtime-rs and cri-o
2024-02-08 09:44:12 +01:00
Dan Mihai
9a780aa98f genpolicy: improve logging from ExecProcessRequest
Additional logging from the ExecProcessRequest rules, for easier
debugging.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-08 02:21:58 +00:00
Dan Mihai
dab567bdfa genpolicy: add easy way to allow CloseStdinRequest
For example, Kata CI's k8s-copy-file.bats transfers files between the
Host and the Guest using "kubectl exec", and that results in
CloseStdinRequest being called from the Host.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-08 02:21:58 +00:00
Dan Mihai
8401adb113 genpolicy: update default values
1. Remove PullImageRequest because that is not used in the main
   branch. It was used in the CCv0 branch.

2. Add default false values for the remaining Kata Agent ttrpc
   requests.

These changes don't change the functionality of the auto generated
Policy, but they make the Policy text and the logging from the Rego
rules easier to understand.

Fixes: #9049

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-08 02:21:58 +00:00
Dan Mihai
535db6b29c
Merge pull request #9043 from ChengyuZhu6/assert
runtime-rs: fix assert error in `make check`
2024-02-07 18:19:18 -08:00
Dan Mihai
01745689e1
Merge pull request #9029 from microsoft/danmihai1/k8s-empty-dirs
genpolicy: mount source for non-confidential guest
2024-02-07 11:26:16 -08:00
Pavel Mores
6346e04cf7 runtime-rs: fix handling of TTRPC_ADDRESS
Since cri-o doesn't seem to use the address for event publishing, as mentioned
in the previous commit, it will not send it.  However, the exact way of
not sending it is unfortunately different from what is assumed by
runtime-rs.  Due to an implementation detail of cri-o which uses containerd
libraries for some low-level tasks, TTRPC_ADDRESS will not be missing from
environment as assumed, instead it will be present with an empty value.

This commit contains a small adjustment to account for that and use
LogForwarder even if TTRPC_ADDRESS is present, but with an empty value.
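
The adjustment can be illustrated with a minimal Python sketch (runtime-rs itself is written in Rust, so this is not the actual code). The function name `ttrpc_address_usable` is hypothetical; the only point shown is that a present-but-empty `TTRPC_ADDRESS` is treated the same as a missing one.

```python
import os


def ttrpc_address_usable(env=os.environ):
    """Return True only if TTRPC_ADDRESS is set to a non-empty value.

    Hypothetical helper: cri-o passes the variable with an empty value
    instead of omitting it, so a presence check alone is not enough.
    """
    return bool(env.get("TTRPC_ADDRESS", ""))
```

A caller would fall back to LogForwarder whenever this returns False, whether the variable is absent or empty.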

Fixes #8985

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-02-07 17:01:04 +01:00
ChengyuZhu6
34c47e08b2 runtime-rs: fix assert error in test in make check
Fix assert error:
error: used `assert_eq!` with a literal bool
   --> crates/hypervisor/src/ch/inner.rs:218:9
    |
218 |         assert_eq!(state.jailed, false);
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bool_assert_comparison
    = note: `-D clippy::bool-assert-comparison` implied by `-D warnings`

Fixes: #9042

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-02-07 19:31:10 +08:00
Archana Shinde
d9ce88ada3
Merge pull request #8704 from amshinde/runtime-rs-clh-implement-persist
runtime-rs: implement persist api for cloud-hypervisor
2024-02-07 02:29:33 -08:00
Niteesh Dubey
3e383674f8 runtime: fix creation of SEV confidential container on SNP enabled host.
This fixes a bug that prevented creating an SEV container
on an SNP enabled host. It is a regression that was introduced as
part of the following commit:
de39fb7d38

Fixes: #9036

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-02-06 19:01:30 +00:00
Hyounggyu Choi
462afcf829 runtime-rs: Copy configuration for QEMU from runtime
It makes sense to reuse a configuration template for runtime-golang
as a base. This is simply to copy it into the config directory.

Fixes: #8441

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-06 19:35:44 +01:00
Pavel Mores
f0256fded5 runtime-rs: remove validation of shim v2 -address value
It appears that under the shim v2 protocol, a shim has no use of its own
for the -address value, it just passes it back to container runtime's
(mostly containerd or cri-o) event-publishing binary.  Since the -address
value only flows through the shim, being passed to the shim by a container
runtime and then essentially passed back by shim to the container runtime,
it seems inappropriate for a shim to validate the value that is fully
owned and only used by the container runtime.

This commit removes such validation from runtime-rs.  Doing so, it solves
(part of) an interoperability problem between runtime-rs and cri-o.  cri-o
seems to intentionally choose not to implement the event-publishing part
of the shim v2 protocol and thus it has no value it could pass to
runtime-rs for -address.  As a result, it sends an empty string which has
been failing the excessive validation performed by runtime-rs so far.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-02-06 13:43:09 +01:00
Alex Lyn
1ab9a21492
Merge pull request #8552 from deagon/fix/missing-port-type
runtime: missing port type in the DeviceInfo
2024-02-06 10:56:46 +08:00
Dan Mihai
473efc2149 genpolicy: mount source for non-confidential guest
The emergent Kata CI tests for Policy use confidential_guest = false
in genpolicy-settings.json. That value is inconsistent with the
following mount settings:

        "emptyDir": {
            "mount_type": "local",
            "mount_source": "^$(cpath)/$(sandbox-id)/local/",
            "mount_point": "^$(cpath)/$(sandbox-id)/local/",
            "driver": "local",
            "source": "local",
            "fstype": "local",
            "options": [
                "mode=0777"
            ]
        },

We need to keep those settings for confidential_guest = true, and
change confidential_guest = false to use:

        "emptyDir": {
            "mount_type": "local",
            "mount_source": "^$(cpath)/$(sandbox-id)/rootfs/local/",
            "mount_point": "^$(cpath)/$(sandbox-id)/local/",
            "driver": "local",
            "source": "local",
            "fstype": "local",
            "options": [
                "mode=0777"
            ]
        },

The value of the mount_source field is different.
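
As a rough illustration of the difference, the selection could be sketched in Python as follows. The function `empty_dir_mount_source` is hypothetical and is not part of genpolicy (which is written in Rust); the two pattern strings are copied from the settings above.

```python
def empty_dir_mount_source(confidential_guest: bool) -> str:
    """Pick the emptyDir mount_source pattern for the given guest type.

    Hypothetical helper; the patterns are the ones quoted in this commit.
    """
    if confidential_guest:
        return "^$(cpath)/$(sandbox-id)/local/"
    # Non-confidential guests expect the extra "rootfs" path component.
    return "^$(cpath)/$(sandbox-id)/rootfs/local/"
```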

This change unblocks testing using Kata CI's pod-empty-dir.yaml:

genpolicy -u -y pod-empty-dir.yaml

kubectl apply -f pod-empty-dir.yaml

k get pod sharevol-kata
NAME            READY   STATUS    RESTARTS   AGE
sharevol-kata   1/1     Running   0          53s

Fixes: #8887

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-06 01:19:48 +00:00
Fabiano Fidêncio
1362918ff0
Merge pull request #9011 from fidencio/topic/switch-to-using-the-confidential-rootfs
runtime: Replace TEE specific initrd / image for the confidential one
2024-02-05 10:43:12 +01:00
Guoqiang Ding
6068faf40b runtime: failed to run in the case of ColdPlugVFIO
Add the missing port type in the DeviceInfo.

Fixes: #9014
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-02-05 17:30:11 +08:00
Archana Shinde
b3c74411f6 runtime-rs: Add tests for persist api for clh
Add tests to check clh struct is saved/restored correctly.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-04 22:03:57 -08:00
Archana Shinde
0b78296dca runtime-rs: Store additional field for hypervisor state
Implementing Persist API for cloud-hypervisor was done partially with
initial support for cloud-hypervisor. Store and retrieve additional
fields to/from the hypervisor state.

Fixes: #6202

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-04 22:03:57 -08:00
Archana Shinde
a5f0b92bca runtime-rs: Add guest protection to hypervisor state
Store guest-protection used while storing the state of the hypervisor.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-04 22:03:54 -08:00
Alex Lyn
cf74166d75
Merge pull request #9015 from Apokleos/bugfix-exec-uds
runtime: display accurate error msg to avoid misleading users.
2024-02-05 13:50:43 +08:00
Alex Lyn
c6830ceb89 runtime: display accurate error msg to avoid misleading users.
The original error handling does not meet user expectations.
When the ClientSocketAddress method stats the corresponding
path of runtime-rs and does not find it, we should return
an error message that includes the reason for the failure
(that is, an error indicating that neither the runtime-go
nor the runtime-rs path was found), instead of simply displaying the
corresponding path of runtime-rs as the final error message to
users.
It is also necessary to return the error promptly to the caller
for further error handling.

Fixes: #8999

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-02-04 16:45:59 +08:00
Guoqiang Ding
7bf1ebe16d kata-monitor: fix agentUrl from containerd shim
Fix the missing leading slash.

Fixes: #9013
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-02-04 16:24:13 +08:00
Fabiano Fidêncio
e4258d8694
runtime: Use confidential image / initrd instead of TEE specific ones
Now that we have a confidential image / initrd being built, instead of a
specific one for each TEE, let's use it everywhere possible.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-03 13:20:14 +01:00
Fabiano Fidêncio
3755c69165
runtime: makefile: remove SNP specific kernel references
As this is not used anymore, we can go ahead and just remove it

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:12:21 +01:00
Fabiano Fidêncio
57b132f94c
runtime: makefile: remove SEV specific kernel references
As this is not used anymore, we can go ahead and just remove it

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:12:21 +01:00
Fabiano Fidêncio
2562d23242
runtime: makefile: remove TDX specific kernel references
As this is not used anymore, we can go ahead and just remove it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:11:43 +01:00
Fabiano Fidêncio
f4e3c936d8
runtime: snp: config: Use the confidential kernel
As we're building a single confidential kernel, we should rely on it
rather than keep using the specific ones for TDX / SEV / SNP.

However, for debuggability's sake, let's do this change TEE by TEE.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:11:36 +01:00
Fabiano Fidêncio
8731366d7b
runtime: sev: config: Use the confidential kernel
As we're building a single confidential kernel, we should rely on it
rather than keep using the specific ones for TDX / SEV / SNP.

However, for debuggability's sake, let's do this change TEE by TEE.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:11:36 +01:00
Fabiano Fidêncio
6cbdba7268
runtime: tdx: config: Use the confidential kernel
As we're building a single confidential kernel, we should rely on it
rather than keep using the specific ones for TDX / SEV / SNP.

However, for debuggability's sake, let's do this change TEE by TEE.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 17:13:06 +01:00
Fabiano Fidêncio
a618461d3a
runtime: Add confidential kernel to the makefile
With this we can properly generate and use the `-confidential` kernel,
which supports SEV / SNP / TDX as part of our configuration files.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 17:13:05 +01:00
Wenyuan Liu
cb888516c1
Merge pull request #8760 from fadecoder/reduce_go_runtime_mounts
runtime: Reduce the mount points with namespace isolation
2024-02-02 16:54:44 +08:00
Greg Kurz
d1a26ead94
Merge pull request #8454 from BbolroC/compile-with-qemu-s390x
runtime-rs: make compilation for QEMU on s390x
2024-02-02 09:29:32 +01:00
Hyounggyu Choi
bb6f5073aa runtime-rs: Allow compilation for s390x
Until now, runtime-rs couldn't be compiled on s390x.
We need to lift those restrictions in Makefile first.

Fixes: #8446

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-01 23:48:15 +01:00
Dan Mihai
6f1062b5d6
Merge pull request #8966 from microsoft/danmihai1/k8s-sandbox-vcpus-allocation
genpolicy: ignore empty YAML as input
2024-02-01 13:51:02 -08:00
Dan Mihai
8f9c92c0ee
Merge pull request #8977 from microsoft/danmihai1/default-namespace
genpolicy: support non-default namespace name
2024-02-01 13:50:33 -08:00
Hyounggyu Choi
8fcee6e6ec runtime-rs: Use Persist::restore() of QEMU for VirtSandbox
It fails to compile virt_container because Dragonball is only
used in the implementation of the trait method Persist::restore().
As the hypervisor is not compiled on s390x and QEMU implements
the trait method, this commit is to let the method use QEMU's.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-01 18:02:10 +01:00
Hyounggyu Choi
56aef3741d runtime-rs: Exclude hypervisors plugins except QEMU for s390x
Dragonball and cloud-hypervisor are not supported on s390x.
We need to exclude the plugins for these hypervisors from compilation.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-01 18:02:10 +01:00
Zhigang Wang
9317e23df1 mount: Reduce the mount points with namespace isolation
This patch can reduce load on systemd process, and
increase the k8s deployment density when using go runtime.

Fixes: #8758

Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com>
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2024-02-01 18:34:24 +08:00
Xuewei Niu
2332552c8f
Merge pull request #7483 from frezcirno/passfd_io_feature
runtime-rs: improving io performance using dragonball's vsock fd passthrough
2024-02-01 14:53:53 +08:00
Alex Lyn
cf26c16017
Merge pull request #8931 from yaoyinnan/8930/feat/merge-ValidCgroupPath
runtime: merged ValidCgroupPath method
2024-02-01 12:53:55 +08:00
Alex Lyn
a157fc3b74
Merge pull request #8974 from yaoyinnan/5240/fix/cgroup-parallel
runtime: add SingleContainer when obtaining OCI Spec
2024-02-01 11:43:02 +08:00
Alex Lyn
1b8f3ce28a
Merge pull request #8929 from yaoyinnan/8838/fix/error-message
runtime-rs: report error on missing or empty fields in configuration
2024-02-01 11:02:30 +08:00
Dan Mihai
09ea0eed9d genpolicy: ignore empty YAML as input
Kata CI's pod-sandbox-vcpus-allocation.yaml ends with "---", so the
empty YAML document following that line should be ignored.

To test this fix:

genpolicy -u -y pod-sandbox-vcpus-allocation.yaml
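
The idea of skipping an empty trailing document can be sketched in Python. This is a simplified illustration and not genpolicy's actual parser: `split_yaml_documents` is a hypothetical helper that only recognizes a bare `---` separator line.

```python
def split_yaml_documents(text):
    """Split a multi-document YAML stream and drop empty documents.

    Simplified sketch: only a line consisting of '---' is treated as a
    separator, which is enough to show why a trailing '---' yields an
    empty document that should be ignored.
    """
    docs, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    # Keep only documents that contain something besides whitespace.
    return [doc for doc in docs if doc.strip()]
```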

Fixes: #8895

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-01 02:22:21 +00:00
Dan Mihai
befef119ff
Merge pull request #8941 from malt3/genpolicy-flags
genpolicy: allow separate paths for rules and settings files
2024-01-31 18:14:12 -08:00
Dan Mihai
21125baec3
Merge pull request #8962 from microsoft/danmihai1/config-map-optional2
genpolicy: ignore volume configMap optional field
2024-01-31 12:29:30 -08:00
Greg Kurz
8b1dc06971
Merge pull request #8938 from pmores/log-qemus-stderr-in-shim-log
runtime-rs: Log qemu's stderr in shim log
2024-01-31 18:04:28 +01:00
Dan Mihai
f0339a79a6 genpolicy: support non-default namespace name
Allow users to specify in genpolicy-settings.json a default cluster
namespace other than "default". For example, Kata CI uses as default
namespace: "kata-containers-k8s-tests".

Fixes: #8976

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-31 15:47:01 +00:00
Zixuan Tan
222de4f684 agent: Fix a race condition in passfd_io.rs
There is a race condition in agent HVSOCK_STREAMS hashmap, where a
stream may be taken before it is inserted into the hashmap. This patch
adds simple retry logic to the stream consumer to alleviate this issue.
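
A minimal Python sketch of such consumer-side retry logic is shown below. It is an analogy only: the agent is written in Rust, and the function name `take_stream`, the retry count, and the delay are all assumptions made for illustration.

```python
import time


def take_stream(streams, port, retries=5, delay=0.01):
    """Take the stream registered for `port`, retrying if it is not there yet.

    Hypothetical sketch: `streams` stands in for the HVSOCK_STREAMS map.
    The consumer may run before the producer has inserted the stream, so
    it retries a few times instead of failing on the first miss.
    """
    for attempt in range(retries):
        stream = streams.pop(port, None)
        if stream is not None:
            return stream
        if attempt + 1 < retries:
            time.sleep(delay)
    return None
```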

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
6e4d4c329a agent,runtime-rs: Add license header to passfd_io.rs
Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
1206de2c23 agent: Use pipes as stdout/stderr of container process
Linux forbids opening an existing socket through /proc/<pid>/fd/<fd>,
making some images relying on the special file /dev/stdout(stderr),
/proc/self/fd/1(2) fail to boot in passfd io mode, where the
stdout/stderr of a container process is a vsock socket.

For backward compatibility, a pipe is introduced between the process
and the socket, and its write end is set as stdout/stderr of the
container process instead of the socket. The agent will do the
forwarding between the pipe and the socket.
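
The pipe-plus-forwarding arrangement can be sketched in Python as follows. This is an illustration under assumptions, not the agent's Rust code: `run_with_pipe_forwarding` is a hypothetical name, and a Unix socket pair stands in for the vsock connection.

```python
import os
import socket
import subprocess
import sys


def run_with_pipe_forwarding(argv, sock):
    """Run argv with a pipe as stdout and forward its output to sock.

    The child only ever sees a pipe, so paths such as /dev/stdout keep
    working; the parent copies data from the pipe into the socket.
    """
    read_fd, write_fd = os.pipe()
    try:
        child = subprocess.Popen(argv, stdout=write_fd)
    finally:
        # The parent must drop its copy of the write end, otherwise the
        # read loop below would never see end-of-file.
        os.close(write_fd)
    try:
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:
                break
            sock.sendall(chunk)
    finally:
        os.close(read_fd)
    return child.wait()
```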

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
f6710610d1 agent,runtime-rs,runk: fix fmt and clippy warnings
Fix rustfmt and clippy warnings detected by CI.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
89be42a177 runtime-rs: open stdout and stderr fifos NONBLOCK
This patch adds the O_NONBLOCK flag when opening the stdout and stderr FIFOs
to avoid blocking.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
3eb4bed957 agent: use biased select to avoid data loss
This patch uses a biased select to avoid stdin data loss in case of
CloseStdinRequest.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
7874ef5fd2 agent: set stdout/err vsock stream as blocking before passing to child
In passfd io mode, when not using a terminal, the stdout/stderr vsock
streams are directly used as the stdout/stderr of the child process.
These streams are non-blocking by default.

The stdout/stderr of the process should be blocking, otherwise
the process may encounter EAGAIN error when writing to stdout/stderr.
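
In Python terms, the equivalent step looks roughly like the sketch below. The helper name `make_blocking` is hypothetical, and a Unix socket is assumed as a stand-in for the vsock stream.

```python
import os
import socket


def make_blocking(fd):
    """Clear O_NONBLOCK on fd before handing it to a child process.

    Returns the resulting blocking state so callers can verify it.
    """
    os.set_blocking(fd, True)
    return os.get_blocking(fd)
```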

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Fupan Li
cfb262d02f container: keep the io connection when pass fd to hybrid vsock
We want the io connection to stay connected when containerd closes
the io pipe, so that the io stream can be attached to again later.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-01-31 21:07:48 +08:00
Fupan Li
4a762fcfdd dbs: hybrid stream support keep the connection when local closed
Support the hybrid fd passthrough mode with a passed pipe fd,
which can specify that the connection is kept even when the pipe
peer is closed, so that the connection can be regained by re-opening
the pipe.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
5536743361 agent,runtime-rs: fix container io detach and attach
Partially fix some issues related to container io detach and attach.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
657b17a86f runtime-rs: open stdin fifo with RDWR|NONBLOCK when pass vsock streams
On Linux, when a FIFO is opened and there are no writers, the reader
will continuously receive the HUP event. This can be problematic
when creating containers in detached mode, as the stdin FIFO writer
is closed after the container is created, resulting in this situation.

In passfd io mode, open stdin fifo with O_RDWR|O_NONBLOCK to avoid the
HUP event.
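
The effect of the open mode can be observed with a short Python sketch (Linux only). The helper `sees_hup_after_writer_closes` is a hypothetical name; it creates a temporary FIFO, lets a writer open and close it, and then polls the reader.

```python
import os
import select
import tempfile


def sees_hup_after_writer_closes(reader_flags):
    """Report whether the reader gets POLLHUP once the only writer is gone.

    reader_flags is os.O_RDONLY or os.O_RDWR; O_NONBLOCK is always added.
    With O_RDWR the reader's own descriptor counts as a writer, so the
    FIFO never reaches the "no writers" state.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stdin")
        os.mkfifo(path)
        reader = os.open(path, reader_flags | os.O_NONBLOCK)
        try:
            # Simulate the stdin writer that goes away after creation.
            writer = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
            os.close(writer)
            poller = select.poll()
            poller.register(reader, select.POLLIN)
            events = poller.poll(0)
            return any(mask & select.POLLHUP for _, mask in events)
        finally:
            os.close(reader)
```

Calling it with `os.O_RDONLY` and then with `os.O_RDWR` shows the contrast that motivates the change.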

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
f1b33fd2e0 agent: clean up term master fd when container exits
When container exits, the agent should clean up the term master fd,
otherwise the fd will be leaked.

Fixes: kata-containers#6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
b8632b4034 dragonball: vsock: properly handle EPOLLHUP/EPOLLERR events
When one end of the connection closes, the epoll event will be triggered
forever. We should close and remove the connection.
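
The handling can be sketched with Python's `select.epoll` (Linux only). This is an analogy to the Dragonball change, not its code: `close_hung_up` is a hypothetical helper and a Unix socket pair stands in for the vsock backend.

```python
import select
import socket


def close_hung_up(ep, conns):
    """Close and forget every connection that reports EPOLLHUP or EPOLLERR.

    `conns` maps file descriptors to socket objects. Without this step a
    level-triggered epoll would keep reporting the same hang-up forever.
    """
    closed = []
    for fd, mask in ep.poll(0):
        if mask & (select.EPOLLHUP | select.EPOLLERR):
            ep.unregister(fd)
            conns.pop(fd).close()
            closed.append(fd)
    return closed
```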

Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
442df71fe5 agent,runtime-rs: refactor process io using vsock fd passthrough feature
Currently in Kata Containers, every io read/write operation requires
an RPC request from the runtime to the agent. This process involves
copying data into/from an RPC request/response, which has high overhead.

To solve this issue, this commit utilizes the vsock fd passthrough, a
newly introduced feature in the Dragonball hypervisor. This feature
allows other host programs to pass a file descriptor to the Dragonball
process, directly as the backend of an ordinary hybrid vsock connection.

The runtime-rs now utilizes this feature for container process io. It
opens the stdin/stdout/stderr fifos from containerd and passes them to
Dragonball, then no longer handles process io itself, eliminating
the need for an RPC for each io read/write operation.

In passfd io mode, the agent uses the vsock connections as the child
process's stdin/stdout/stderr, eliminating the need for a pipe
to pump data (in non-tty mode).

Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
eb6bb6fe0d config: add two options to control vsock passthrough io feature
Two toml options, `use_passfd_io` and `passfd_listener_port` are introduced
to enable and configure dragonball's vsock fd passthrough io feature.

This commit is a preparation for vsock fd passthrough io feature.

Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
973b5ad1f4 runtime-rs: make Container::new async
Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Malte Poll
531a11159f genpolicy: allow separate paths for rules and settings files
Using custom input paths with -i is counter-intuitive. Simplify path handling with explicit flags for rules.rego and genpolicy-settings.json.

Fixes: #8568

Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
2024-01-31 11:00:19 +01:00
yaoyinnan
9aa1ed805a runtime: add SingleContainer when obtaining OCI Spec
When creating a cgroup, add a SingleContainer when obtaining the OCI Spec to apply to ctr, podman, etc.

Fixes: #5240

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 15:24:07 +08:00
yaoyinnan
b0b8523cea runtime: modify ValidCgroupPath unit test
Modify ValidCgroupPath unit test.

Fixes: #8930

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 14:37:17 +08:00
yaoyinnan
feed5c8ff9 runtime: merged ValidCgroupPath method
Merged ValidCgroupPath method to handle cgroupv1 and cgroupv2.

Fixes: #8930

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 14:37:13 +08:00
yaoyinnan
864389c524 runtime-rs: report error on missing or empty fields in configuration
Removed the setting of default values for runtime fields. Added explicit checks for missing or empty fields, reporting errors with clear messages.
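
A minimal Python sketch of this kind of check is given below. It is not the runtime-rs implementation: the function `runtime_field_errors` and the error wording are hypothetical, and the field list is an assumption borrowed from the runtime fields named elsewhere in this log (`name`, `hypervisor_name`, `agent_name`).

```python
# Assumed set of required fields; the real list may differ.
REQUIRED_RUNTIME_FIELDS = ("name", "hypervisor_name", "agent_name")


def runtime_field_errors(runtime):
    """Return one clear error message per missing or empty runtime field.

    No defaults are filled in: a missing or blank value is reported
    instead of being silently replaced.
    """
    errors = []
    for field in REQUIRED_RUNTIME_FIELDS:
        value = runtime.get(field)
        if value is None:
            errors.append(f"runtime.{field} is missing")
        elif not str(value).strip():
            errors.append(f"runtime.{field} is empty")
    return errors
```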

Fixes: #8838

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 12:46:17 +08:00
Kvlil
3fd5628771 dragonball: fix noop-method-call warning
The `noop-method-call` is a rustc lint that has existed since v1.52.0.
This lint has been moved to the warn by default lint level since v1.73.0.
Therefore build is failing with this version and above.
This commit removes the unnecessary call to `<&T as Deref>::deref` on `T: !Deref`.

Fixes: #8586

Signed-off-by: Kvlil <kalil.pelissier@gmail.com>
2024-01-30 17:16:49 +00:00
Wainer Moschetta
bf54a02e16
Merge pull request #8924 from microsoft/danmihai1/pod-nested-configmap-secret
genpolicy: fix ConfigMap volume mount paths
2024-01-30 14:09:41 -03:00
Dan Mihai
d12875ee66 genpolicy: ignore volume configMap optional field
The auto-generated Policy already allows these volumes to be mounted,
regardless of whether they are:
- Present, or
- Missing and optional

Fixes: #8893

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-30 15:32:37 +00:00
Xuewei Niu
7e10000b6f
Merge pull request #8928 from yaoyinnan/8927/fix/unused-DriverInfo
runtime-rs: fix unused driverInfo error
2024-01-30 20:39:10 +08:00
Pavel Mores
d53edbd0a5 runtime-rs: collect qemu stderr and log it in shim log
Qemu stderr monitoring runs in its own asynchronous green thread.
For that, `stderr` is taken out of the Child representing the qemu child
process to avoid partial move and make it possible for the main thread
still to call functions on QemuInner::qemu_process (e.g. kill(), id()).
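
The same pattern can be sketched with Python's asyncio as an analogy to the tokio-based code. The names `forward_stderr` and `run_and_log_stderr` are hypothetical, and a plain list stands in for the shim log.

```python
import asyncio
import sys


async def forward_stderr(stream, sink):
    """Read the child's stderr line by line and append each line to sink."""
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.append(line.decode(errors="replace").rstrip())


async def run_and_log_stderr(argv):
    """Spawn argv and collect its stderr in a separate task.

    The stderr stream is handed to its own task, so the main coroutine
    keeps using the child handle (here: wait()) in the meantime.
    """
    child = await asyncio.create_subprocess_exec(
        *argv, stderr=asyncio.subprocess.PIPE
    )
    logged = []
    monitor = asyncio.create_task(forward_stderr(child.stderr, logged))
    returncode = await child.wait()
    await monitor
    return returncode, logged
```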

Fixes #8937

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-30 09:09:05 +01:00
Pavel Mores
684d740122 runtime-rs: switch qemu child process management from std to tokio
We'll want to capture qemu's stderr in parallel with normal runtime-rs
execution.  Tokio's primitives make this much easier than std's.  This
also makes child process management more consistent across runtime-rs
(i.e. virtiofsd child process is already launched and managed using tokio).

Some changes were necessary due to tokio functions being slightly different
from their std counterparts.  Child::kill() is now async and Child::id()
now returns an Option.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-30 09:07:14 +01:00
Dan Mihai
6a8f46f3b8
Merge pull request #8918 from microsoft/danmihai1/metadata
genpolicy: optional PodTemplateSpec metadata field
2024-01-29 12:36:30 -08:00
Dan Mihai
60ac3048e9 genpolicy: fix ConfigMap volume mount paths
Allow Kata CI's pod-nested-configmap-secret.yaml to work with
genpolicy and current cbl-mariner images:

1. Ignore the optional type field of Secret input YAML files.

   It's possible that CoCo will need a more sophisticated Policy
   for Secrets, but this change at least unblocks CI testing for
   already-existing genpolicy features.

2. Adapt the value of the settings field below to fit current CI
   images for testing on cbl-mariner Hosts:

    "kata_config": {
        "confidential_guest": false
    },

    Switching this value from true to false instructs genpolicy to
    expect ConfigMap volume mounts similar to:

        "configMap": {
            "mount_type": "bind",
            "mount_source": "$(sfprefix)",
            "mount_point": "^$(cpath)/watchable/$(bundle-id)-[a-z0-9]{16}-",
            "driver": "watchable-bind",
            "fstype": "bind",
            "options": [
                "rbind",
                "rprivate",
                "ro"
            ]
        },

    instead of:

        "confidential_configMap": {
            "mount_type": "bind",
            "mount_source": "$(sfprefix)",
            "mount_point": "$(sfprefix)",
            "driver": "local",
            "fstype": "bind",
            "options": [
                "rbind",
                "rprivate",
                "ro"
            ]
        }
    },

    This settings change unblocks CI testing for ConfigMaps.

Simple sanity testing for these changes:

genpolicy -u -y pod-nested-configmap-secret.yaml

kubectl apply -f pod-nested-configmap-secret.yaml

kubectl get pods | grep config
nested-configmap-secret-pod 1/1     Running   0          26s

Fixes: #8892

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-29 16:13:47 +00:00
Pavel Mores
b52a398469 runtime-rs: move creation of VM path from start_vm() to prepare_vm()
This fixes a flaw pointed out in review of PR #8185.  Creation of the
directory semantically fits better into VM preparation than VM launch.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-27 13:46:35 +01:00
Dan Mihai
076869aa39 genpolicy: ignore the nodeName field
Validating the node name is currently outside the scope of the CoCo
policy.

This change unblocks testing using Kata CI's test-pod-file-volume.yaml
and pv-pod.yaml.

Fixes: #8888

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-26 16:30:55 +00:00
yaoyinnan
9b7c5c69cf runtime-rs: fix unused driverInfo error
Remove the unused DriverInfo declaration or integrate it into the codebase where applicable.

Fixes: #8927
Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-26 19:59:52 +08:00
Dan Mihai
8ad5459beb genpolicy: optional PodTemplateSpec metadata field
Add metadata containing the Policy annotation if the user didn't
provide any metadata in the input yaml file.

For a simple sanity test using a Kata CI YAML file:

genpolicy -u -y job.yaml

kubectl apply -f job.yaml

kubectl get pods | grep job
job-pi-test-64dxs 0/1     Completed   0          14s

Fixes: #8891

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-25 19:06:59 +00:00
Dan Mihai
535cf04edb genpolicy: add shareProcessNamespace support
Validate the sandbox_pidns field value for CreateSandbox and
CreateContainer.

Fixes: #8868

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-25 16:48:57 +00:00
Kvlil
a4b208a712 runtime: remove SharedVersions field dead code
The SharedVersions field adds a versiontable property that isn't supported by upstream QEMU.
This is dead code since virtcontainers isn't setting SharedVersions to true.

Fixes: #7720

Signed-off-by: Kvlil <kalil.pelissier@gmail.com>
2024-01-22 12:18:42 +00:00
Fabiano Fidêncio
1e30fde8fa
Merge pull request #8862 from microsoft/danmihai1/genpolicy-dns
genpolicy: ignore pod DNS settings
2024-01-19 23:08:26 +01:00
Dan Mihai
ca03d47634 genpolicy: ignore pod DNS settings
Ignore pod DNS settings because policing the network traffic is
currently outside the scope of the Agent Policy.

Example from Kata CI: pod-custom-dns.yaml

Fixes: #8832

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-19 16:42:35 +00:00
Alex.Lyn
826c751bf3
Merge pull request #8185 from pmores/add-qemu-cmdline-generation-framework
Add qemu cmdline generation framework
2024-01-19 21:42:49 +08:00
Pavel Mores
25c8d5db5d runtime-rs: use qemu cmdline generation framework to launch VM
Deploy the framework added by the previous commit to generate qemu
command line and launch the VM.

We now properly store the child process object which allows us to
implement remaining Hypervisor functions necessary for a simple but
successful VM lifecycle, get_vmm_master_tid() and stop_vm().

Fixes #8184

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-19 11:42:23 +01:00
Amulyam24
f6fea5f2ca agent: fix failing unit tests on ppc64le
- test_volume_capacity_stats: verify the file block size against the fetched size via statfs()
- test_reseed_rng: Correct the request codes for RNDADDTOENTCNT and RNDRESEEDCRNG when platform is ppc64le
- test list_routes: Add the route only if destination is not empty
- test_new_fs_manager: skip the test if cgroups v2 is used by default
- skip test cases rpc::tests::test_do_write_stream, sandbox::tests::test_find_process, sandbox::tests::test_find_container_process and sandbox::tests::add_and_get_container on ppc64le as they are flaky

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:32:16 +01:00
Hyounggyu Choi
610f878894 dragonball: Fix compile error for aarch64
This is to fix a compile error raised for aarch64.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-01-18 16:32:15 +01:00
Amulyam24
376941cf69 kata-ctl: skip building kata-ctl on ppc64le
kata-ctl currently fails to build on ppc64le. Skip it when running static checks; the failures will be fixed and tracked in a separate issue.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
4ecd82a5df runk: skip the test_init_container_create_launcher if not root on ppc64le
This is to skip the test_init_container_create_launcher if not root on ppc64le.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
a4b5447924 tools: fix makefile spacing
This minor PR removes the extra space in the makefiles.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
394777291d runtime: fix failing unit tests on ppc64le
A few CPU related test cases were failing as the version was being verified against Power8 while the CI machine is Power9.

Fixes: #5531

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
486b8a0538 dragonball: skip running static-checks for ppc64le
Since dragonball is not currently supported on ppc64le, skip running the targets for static-checks.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Hyounggyu Choi
290ecf4c46 Static-check: Exclude s390x from dragonball and runtime-rs
At the moment, the `dragonball` and `runtime-rs` projects do not support
s390x. During the enablement, some errors due to misconfiguration
of the Makefile for `make check` and `make vendor` were identified.

This is to skip the build for the affected target of the projects.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-01-18 16:31:13 +01:00
Hyounggyu Choi
c0f57c9e0a Lint: Fix cargo clippy errors for s390x
Some linting errors were identified during the enablement of `make check`.
These have not been found by the Jenkins CI job because it only triggered
`make test`.

The errors for the `agent` occur in the s390x-specific tests, while
those for `kata-ctl` are in the architecture-specific code.

This commit is to fix those errors.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-01-18 16:31:13 +01:00
Jianyong Wu
ba74a624a8 runtime-rs: use pathBuf only for x86
PathBuf here is only used for x86.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2024-01-18 16:31:13 +01:00
Chelsea Mafrica
32ad465663
Merge pull request #8710 from jodh-intel/runtime-rs-ch-get-thread-ids
runtime-rs: ch: Implement minimal implementation for missing thread/pid APIs
2024-01-17 14:51:44 -08:00
Fabiano Fidêncio
147d5fd752
Merge pull request #8836 from microsoft/danmihai1/test-with-cbl-mariner
genpolicy: use root path from cbl-mariner Guest VM
2024-01-17 17:51:44 +01:00
Pavel Mores
f550d9a325 runtime-rs: add basic implementation of qemu command line generation
This current framework is enough to launch a VM with a simple container
in it (e.g. busybox).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-17 12:55:00 +01:00
Pavel Mores
e8e13044da runtime-rs: add simple impls to some of Qemu's Hypervisor functions
The idea of most of these is just to prevent running into todo!()s where
we can at the moment, while implementing the fundamental functionality of
VM launch.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-17 12:55:00 +01:00
Dan Mihai
69557e5ad6
Merge pull request #8814 from microsoft/danmihai1/genpolicy-kata-deploy
tools: genpolicy static checks
2024-01-16 07:33:42 -08:00
Dan Mihai
13f2398fe8
Merge pull request #8837 from microsoft/danmihai1/allow_storages
genpolicy: temporarily disable allow_storages()
2024-01-16 07:10:49 -08:00
alex.lyn
99717371c1 runtime-rs: bugfix for DirectVolume/rawblock when driver is blk
DirectVolume/Rawblock doesn't work well when device's block driver
is virtio-blk-pci and the storage handler is DRIVER_BLK_PCI_TYPE.

Fixes: #8707

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2024-01-16 10:35:08 +08:00
Dan Mihai
205dafd323 genpolicy: temporarily disable allow_storages()
Temporarily disable the allow_storages() rules, because they are based
on the tarfs snapshotter + container image integrity information that
are not available yet in the main branch - see #8833.

Fixes: #8834

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-15 23:55:27 +00:00
Dan Mihai
f4106a6107 genpolicy: use root path from cbl-mariner Guest VM
Adjust genpolicy-settings.json to match the container root path from
the main branch + cbl-mariner Guest VMs.

This configuration might have to be adjusted again when other types of
Guest VMs will be tested during CI using genpolicy, in the future.

Also, improve logging from allow_root_path() to make these issues
easier to debug in the future.

Fixes: #8835

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-15 23:33:28 +00:00
Dan Mihai
201eec628a tools: genpolicy static checks
Package genpolicy and enable static checks for it.

Fixes: #8813

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-15 16:49:58 +00:00
Fabiano Fidêncio
0dc00ae373
Merge pull request #8822 from microsoft/danmihai1/cargo-clippy
genpolicy: cargo clippy fixes
2024-01-15 14:59:04 +01:00
Xuewei Niu
923bd65dff
Merge pull request #8819 from justxuewei/rm-protocol-backend
dragonball: Remove unused definition
2024-01-15 10:09:46 +08:00
Dan Mihai
681cb1626a genpolicy: cargo clippy fixes
Clean up cargo clippy errors.

Fixes: #8818

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-14 01:23:46 +00:00
Xuewei Niu
f1fda3d6b0 dragonball: Remove unused definition
`EndpointProtocolFlags::ProtocolBackend` is removed due to no reference.

Fixes: #8745

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-13 13:25:11 +08:00
Dan Mihai
dcaae54cf6 genpolicy: "cargo fmt -- --check" clean-up
Also, update Cargo.lock

Fixes: #8816

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-13 01:57:00 +00:00
Fabiano Fidêncio
a606401722
Merge pull request #8803 from jodh-intel/issues-8784-runtime-rs-ch-rm-todo-to-unbreak
runtime-rs: ch: Unbreak CH driver
2024-01-11 19:37:13 -03:00
Fabiano Fidêncio
86a6d133e4
Merge pull request #8248 from microsoft/danmihai1/genpolicy-main
tools: add policy generation tool
2024-01-11 17:02:54 -03:00
James O. D. Hunt
29e0de4e4a runtime-rs: ch: Implement minimal memory hotplug APIs
Replace the `todo!()` calls with a minimal NOP implementation to return
the CH driver to working order since the `todo!()`'s forcibly crash the
driver at runtime. Full implementations for these APIs will be added on
issues #8800, #8801, and #8802.
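
The difference between a `todo!()` body and a NOP body can be sketched as follows. This is only an illustration: the trait, the struct and the method signature are simplified, hypothetical stand-ins, not the actual runtime-rs `Hypervisor` API.

```rust
/// Hypothetical, simplified slice of a hypervisor interface.
trait MemoryHotplug {
    fn resize_memory(&mut self, requested_mb: u32) -> Result<u32, String>;
}

/// Hypothetical driver state.
struct CloudHypervisor {
    current_mb: u32,
}

impl MemoryHotplug for CloudHypervisor {
    // NOP stand-in: ignore the request and report the current size.
    // A `todo!()` body here would panic as soon as the method is called.
    fn resize_memory(&mut self, _requested_mb: u32) -> Result<u32, String> {
        Ok(self.current_mb)
    }
}
```

With a body like this the caller gets a normal `Result` back and the driver keeps running until the full implementation lands.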

Fixes: #8784.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-01-11 14:11:31 +00:00
James O. D. Hunt
1c0df670af runtime-rs: ch: Add minimal implementation of hypervisor metrics method
Remove the `todo!()` macro which would cause a runtime crash and replace
it with an implementation that returns an error as a stop-gap until #8800 is
implemented.

Fixes: #8785.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-01-11 14:11:01 +00:00
Hyounggyu Choi
f62ec0a7f5
Merge pull request #8693 from BbolroC/ibm-se-config-validation-fix
runtime: Allow no initrd path for IBM Z Secure Execution
2024-01-11 09:53:51 +01:00
Xuewei Niu
70305fefc5
Merge pull request #8780 from justxuewei/containerd-events
runtime-rs: Forward events to containerd via ttrpc
2024-01-11 14:58:14 +08:00
Xuewei Niu
6fd49f7604 runtime-rs: Forward events to containerd via ttrpc
Forwarding events via the containerd CLI is a little heavy for runtime-rs
compared with the ttrpc way. Plus, for runtimes that lack this mechanism,
e.g. CRI-O, we can't get those events anywhere.

This patch introduces two types of forwarders:

- `ContainerdForwarder`: Acquire ttrpc address from environment variables
  and forward events via ttrpc connection.
- `LogForwarder`: Write event info into logs.
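
A minimal sketch of how the choice between the two forwarders might be made. The enum, the function name and the fallback rule (use the log forwarder when no address is available) are assumptions for illustration; the commit message does not spell out the selection logic.

```rust
/// Hypothetical model of the two forwarder kinds named above.
#[derive(Debug, PartialEq)]
enum Forwarder {
    /// Forward events over a ttrpc connection to this address.
    Containerd { address: String },
    /// Write event info into the logs.
    Log,
}

/// Pick a forwarder from an optional ttrpc address (assumed rule:
/// a missing or empty address falls back to logging).
fn select_forwarder(ttrpc_address: Option<String>) -> Forwarder {
    match ttrpc_address {
        Some(address) if !address.is_empty() => Forwarder::Containerd { address },
        _ => Forwarder::Log,
    }
}
```

A caller could pass `std::env::var("TTRPC_ADDRESS").ok()`; the variable name is an assumption here, since the message only says "environment variables".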

Fixes: #7881

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-11 10:32:50 +08:00
Alex.Lyn
695440a431
Merge pull request #8749 from Apokleos/fixup-dragonball-vfio
runtime-rs: fixup vfio device in runtime-rs/dragonball
2024-01-10 15:20:34 +08:00
Greg Kurz
e3611cf27d
Merge pull request #8326 from cheriL/8325/fix_method_param
agent: use method params instead of const params in functions
2024-01-09 07:35:19 +01:00
Pavel Mores
0cfb2d2570 runtime-rs: add simple Persist implementation for Qemu
This is not necessarily meant to work, just to stub out unimplemented
functionality while focusing on more fundamental things.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-08 13:12:39 +01:00
Pavel Mores
45862aeec0 runtime-rs: add default rootfs type for qemu
Make sure that rootfs type is known early on even if it's not set in
configuration.toml.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-08 13:12:39 +01:00
Xuewei Niu
192c6ee9c3
Merge pull request #8773 from justxuewei/dbs-k8s-fragile
2024-01-05 12:54:32 +08:00
Xuewei Niu
0e9d73fe30 agent: Fix an issue reporting OOM events by mistake
The agent registers an event fd in `memory.oom_control`. An OOM event is
forwarded to containerd when the event is emitted, regardless of the
content in that file.

I observed content indicating that events should not be forwarded, as shown
below. When `oom_kill` is set to 0, it means no OOM has occurred. Therefore,
it is important to check the content to avoid mistakenly forwarding OOM
events.

```
oom_kill_disable 0
under_oom 0
oom_kill 0
```
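
The content check described above could look roughly like this. It is a sketch that assumes the file has the `key value` layout shown in the sample; the function name is hypothetical and is not taken from the agent.

```rust
/// Returns true only when the `oom_kill` counter found in the content of
/// `memory.oom_control` is greater than zero.
fn oom_kill_occurred(oom_control: &str) -> bool {
    oom_control
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                // Exact key match, so `oom_kill_disable` is ignored.
                (Some("oom_kill"), Some(value)) => value.parse::<u64>().ok(),
                _ => None,
            }
        })
        .any(|count| count > 0)
}
```

An event would then be forwarded only when this returns `true` for the current file content.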

Fixes: #8715

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-05 11:06:37 +08:00
Dan Mihai
7d5336aca3 agent: hold lock while setting new policy
Don't release the lock between is_allowed and set_policy calls,
because the policy might change in between these calls.
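
The check-then-set pattern under a single lock can be sketched as below. The `Policy` type, its rule format and the use of `std::sync::Mutex` are assumptions made for the example; the agent's real types and lock may differ.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the agent's policy state.
struct Policy {
    rules: String,
}

impl Policy {
    /// Hypothetical check: a request is allowed unless the rules deny it.
    fn is_allowed(&self, request: &str) -> bool {
        !self.rules.contains(&format!("deny {request}"))
    }

    fn set_policy(&mut self, rules: &str) {
        self.rules = rules.to_string();
    }
}

/// Check and update under one guard, so no other thread can change the
/// policy between the two calls.
fn try_set_policy(policy: &Mutex<Policy>, new_rules: &str) -> bool {
    let mut guard = policy.lock().unwrap();
    if !guard.is_allowed("SetPolicyRequest") {
        return false;
    }
    guard.set_policy(new_rules);
    true
}
```

Dropping the guard after `is_allowed` and locking again for `set_policy` would reopen exactly the window this commit closes.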

Also, move more policy code into policy.rs.

Fixes: #8734

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-04 16:45:30 +00:00
Xuewei Niu
b5a6e74cdf
Merge pull request #8744 from justxuewei/vhu-net-compile
dragonball: Fix compilation issue without all net features
2024-01-04 19:02:55 +08:00
soup
7c176a62fe agent: use method params instead of const params in functions
Fixes: #8325

Signed-off-by: soup <lqh348659137@outlook.com>
2024-01-04 09:29:29 +01:00
Xuewei Niu
f97f16a44a agent-ctl: Bump ttrpc version
- `ttrpc` from `0.7.1` to `0.8`.

Fixes: #8757

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Xuewei Niu
bf59c7b3d4 runtime-rs: Bump ttrpc and containerd-shim-protos versions
- `ttrpc` from `0.7.1` to `0.8`.
- `containerd-shim-protos` from `0.3.0` to `0.6.0`.

Fixes: #8756

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Xuewei Niu
cf9a0e21a1 protocols: Bump ttrpc version
- `ttrpc` from `0.7.1` to `0.8`.

Fixes: #8756

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Xuewei Niu
91360e7ddb agent: Bump ttrpc version
- `ttrpc` from `0.7.1` to `0.8`.

Fixes: #8756

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Chao Wu
f1235ddba3 dbs_virtio_devices: add Cargo.lock
To prevent upstream rust-vmm changes from breaking Dragonball
compilation, we introduce Cargo.lock to the dbs crates.

fixes: #8770

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-01-04 11:23:30 +08:00
Chao Wu
02cd726bfc dbs-utils: add Cargo.lock
To prevent upstream rust-vmm changes from breaking Dragonball
compilation, we introduce Cargo.lock to the dbs crates.

fixes: #8770

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-01-04 11:17:45 +08:00
Chao Wu
97bdc1529b dbs-pci: introduce Cargo.lock
As reported in #8767, the root cause is that rust-vmm's vmm-sys-utils
introduced a new release, 0.12.1, and dbs-pci relies on rust-vmm's vfio-ioctls, which uses >=
to declare vmm-sys-utils, so vmm-sys-utils is automatically upgraded to 0.12.1.
That is how two different versions of vmm-sys-utils were introduced, and this breaks the compilation.

In order to fix this and also avoid future problems, we introduce a Cargo.lock file to the dbs crates.

fixes: #8770

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-01-04 11:11:56 +08:00
alex.lyn
d2080fd221 runtime-rs: refactor getting the vfio device guest pci path
Fixes: #8748

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2024-01-02 14:28:34 +08:00
alex.lyn
d795fcfc2f runtime-rs: bridge the vfio device between runtime-rs and dragonball
Previously, Dragonball did not support PCI device hot-plugging or
VFIO device passthrough. Therefore, the runtime-rs support for
Dragonball was incomplete. It is time to complete it so that users
can use Dragonball's PCI hot-plugging and VFIO passthrough capabilities.

Fixes: #8748

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2024-01-02 14:28:10 +08:00
Chao Wu
67b91c1eb3
Merge pull request #8740 from openanolis/upstream/pci-6-final
Dragonball: add pci vfio passthrough, hot(un)plug support
2023-12-29 01:58:32 +08:00
Chao Wu
71c322c293 runtime-rs: fix ci complains
The vfio commits introduce quite a lot of changes in runtime-rs; this commit
covers all the changes related to CI, including compilation errors and so on.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-28 23:34:41 +08:00
Chao Wu
a3f7601f5a dragonball: add pci hotplug / hot-unplug support
Introduce two new vmm actions to implement pci hotplug
and pci hot-unplug: PrepareRemoveHostDevice and RemoveHostDevice.

PrepareRemoveHostDevice uses an upcall to unregister the pci device
in the guest kernel.
RemoveHostDevice should be called after PrepareRemoveHostDevice; it is used
to clean up the PCI resources on the Dragonball side.
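
The required ordering of the two actions can be pictured as a small state machine. This is a sketch only: the state enum and both functions are hypothetical, and the commit message does not say whether Dragonball enforces the order this way.

```rust
/// Hypothetical lifecycle states of a passed-through host device.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HostDeviceState {
    Inserted,
    RemovalPrepared,
    Removed,
}

/// Step 1: the guest kernel unregisters the device.
fn prepare_remove_host_device(state: HostDeviceState) -> Result<HostDeviceState, String> {
    match state {
        HostDeviceState::Inserted => Ok(HostDeviceState::RemovalPrepared),
        other => Err(format!("cannot prepare removal from {other:?}")),
    }
}

/// Step 2: the VMM side releases the PCI resources. Only valid after step 1.
fn remove_host_device(state: HostDeviceState) -> Result<HostDeviceState, String> {
    match state {
        HostDeviceState::RemovalPrepared => Ok(HostDeviceState::Removed),
        other => Err(format!("cannot remove from {other:?}")),
    }
}
```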

fixes: #8741

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-28 16:08:31 +08:00
Chao Wu
0f402a14f9 dragonball: add InsertHostDevice vmm action
Introduce a new vmm action InsertHostDevice to pass through
host pci devices such as NIC or GPU devices into the guest so that
users can use those devices with high performance.

fixes: #8741

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-28 16:04:22 +08:00
Xuewei Niu
4c023e341c dragonball: Fix compilation issue without all net features
Combinations of network features were tested:

- None
- virtio-net
- vhost-net
- vhost-user-net
- virtio-net,vhost-net
- vhost-net,vhost-user-net
- virtio-net,vhost-user-net
- virtio-net,vhost-net,vhost-user-net

Fixes: #8742

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-28 11:37:26 +08:00
Alex.Lyn
990a3adf39
Merge pull request #8618 from Apokleos/csi-for-directvol
runtime-rs: Add dedicated CSI driver for DirectVolume support in Kata
2023-12-27 21:27:29 +08:00
alex.lyn
ea69c17008 runtime-rs: initialize pcie topology in Device Manager
Add a pcie_topology field to DeviceManager and initialize
pcie_topology when ResourceManager calls DeviceManager's new()
with TopologyConfigInfo.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:57:23 +08:00
alex.lyn
b42548b8e1 runtime-rs: do unregister device in Trait Device/detach
Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:53:18 +08:00
alex.lyn
0f0b6d13c9 runtime-rs: do register/update device in Trait Device/attach
Before calling the device driver to attach a device, register
the device to PCIe topology and allocate a PciPath for it.

However, for some hypervisors such as CLH, the allocation is invalid
when plugging devices into the VM, because they are able to return a
DeviceInfo containing the PciPath. For them, the PciPath in the PCIe
topology is updated with the returned pci path, to prevent the
inferred pcipath from differing from the actual value returned.

The update is not executed if the pcipath value doesn't change.
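
The register-then-update flow can be sketched as follows. The topology here is reduced to a map from device id to a path string, and the `updates` counter exists only to make the skipped write visible; none of these names come from runtime-rs.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified topology: device id -> PCI path string.
#[derive(Default)]
struct PcieTopology {
    paths: HashMap<String, String>,
    updates: usize,
}

impl PcieTopology {
    /// Record the path inferred before the device is attached.
    fn register(&mut self, device_id: &str, inferred_path: &str) {
        self.paths
            .insert(device_id.to_string(), inferred_path.to_string());
    }

    /// Overwrite the inferred path with the one the hypervisor reported,
    /// skipping the write when both are already equal.
    fn update(&mut self, device_id: &str, reported_path: &str) {
        if let Some(path) = self.paths.get_mut(device_id) {
            if path.as_str() != reported_path {
                *path = reported_path.to_string();
                self.updates += 1;
            }
        }
    }
}
```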

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:49:18 +08:00
alex.lyn
ce7d363695 runtime-rs: Introduce helper macros to simplify PCIe device ops
Introduce helper macros to simplify PCIe device register/unregister
and update, which provides a convenient way to handle devices in
topology.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:43:58 +08:00
alex.lyn
0d4992b24d runtime-rs: add one more argument in Device attach/detach
Add one more argument with type &mut Option<&mut PCIeTopology>
in attach and detach to introduce methods within PCIe Topology.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:40:01 +08:00
alex.lyn
b425de6105 runtime-rs: implement Trait PCIeDevice for pcie/pci device
Implement Trait PCIeDevice register/unregister for pcie/pci
devices, such as a vfio device, which needs to set/get the device's pci
path for the kata agent's device handler.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:33:08 +08:00
alex.lyn
87e39cd1f6 runtime-rs: introduce Trait PCIeDevice to do [un]register device
Introduce Trait PCIeDevice with register/unregister, which are
used to register or unregister pcie device within the PCIe topology.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:29:35 +08:00
alex.lyn
6ebc4884fa runtime-rs: introduce PCIe Topology framework for pcie/pci devices
Because different VMMs handle PCI devices in different ways,
we aim to provide a general PCIe topology processing framework
that is as compatible as possible with VMMs such as dragonball,
qemu and clh (though clh has its own management method, there is no conflict).

Currently, it's mainly developed for the kinds of PCIe/PCI devices in
dragonball/clh which are attached to the pci/pcie root bus directly.

More will be added when Qemu is ready in runtime-rs.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:29:25 +08:00
alex.lyn
88839026b9 runtime-rs: introduce TopologyConfigInfo to initialize pcie topology
TopologyConfigInfo is added to store device config info for PCIe/PCI
devices in the VM from Hypervisor DeviceInfo.

TopologyConfigInfo::new will be the entry point to initialize the PCIe
Topology for each VM.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:21:53 +08:00
Chao Wu
8895cb82df
Merge pull request #8724 from openanolis/chao/add_vfio
dragonball: introduce vfio support
2023-12-27 11:40:53 +08:00
Xuewei Niu
43a627c96f
Merge pull request #8632 from adamqqqplay/support-vhost-user-blk
dragonball: introduce vhost-user-blk device
2023-12-27 09:54:21 +08:00
Chao Wu
2f797a6eb7 pci: rename 2 parameters to follow rust naming convention
PciCapabilityID -> PciCapabilityId
PciBarRegionType::IORegion -> PciBarRegionType::IoRegion

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-26 23:28:47 +08:00
Chao Wu
9c13b2c990 dragonball: introduce vfio support
The vfio mod collects lots of information related to the vfio operations, including VfioMsi and VfioMsix capability & state,
vfio interrupt info, pci region info and vfio pci device info & state.

fixes: #8722

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Shifang Feng <fengshifang@linux.alibaba.com>
Signed-off-by: Yang Su <yang.su@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Xin Lin <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-26 23:28:43 +08:00
alex.lyn
ba5437382a runtime-rs: add examples about Kata pod with directvol by CSI.
Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 20:36:34 +08:00
alex.lyn
c6d2a32146 runtime-rs: add support for directvol csi deploy scripts.
Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 20:36:34 +08:00
alex.lyn
25d8e83e43 runtime-rs: Add dedicated CSI driver for DirectVolume support in Kata
Bridge the gap between user requirements for direct block device access
and the DirectVolume capabilities provided by Kata runtimes
(kata-runtime/runtime-rs), and facilitate seamless integration with CSI
to improve user experience.

It aims to integrate DirectVolume CSI support into Kata, enabling users
to benefit from its performance and flexibility advantages.

Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 20:36:22 +08:00
Qinqi Qu
81ab174c16 dragonball: support vhost-user-blk in device manager
This patch introduces a feature of supporting vhost-user-blk device.

Fixes: #8631

Signed-off-by: Qinqi Qu <quqinqi@linux.alibaba.com>
2023-12-26 20:02:38 +08:00
Qinqi Qu
ef8dc3b0ce dragonball: support vhost-user-blk
This patch introduces a feature of supporting vhost-user-blk device.

This device needs to be defined before the VM instance is started,
which can be done through the dbs-cli tool with --virblks option:
--virblks '{
	"drive_id": "8623",
	"device_type": "Spdk",
	"path_on_host": "spdk:///var/tmp/vhost.sock",
	"is_root_device": false,
	"is_read_only": false,
	"is_direct": false,
	"no_drop": false,
	"num_queues": 1,
	"queue_size": 256
}'

Fixes: #8631

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: fupan <fupan.lfp@antgroup.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Qinqi Qu <quqinqi@linux.alibaba.com>
2023-12-26 20:02:32 +08:00
alex.lyn
3b317e69e2 runtime-rs: add README and user guide to deploy directvol CSI Driver
Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 18:00:35 +08:00
Xuewei Niu
36a4cbccf6 runtime-rs: Expand all DeviceType in match arms
The compiler will complain if a developer forgets to add an arm for
a newly defined variant.
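
A short illustration of the idea, using a hypothetical and much simplified `DeviceType` (the real enum has different variants that carry data). Because the `match` has no wildcard `_` arm, adding a variant without handling it makes rustc reject the `match` as non-exhaustive.

```rust
/// Hypothetical, simplified device kinds.
#[derive(Debug)]
enum DeviceType {
    Block,
    Network,
    Vfio,
    Vsock,
}

/// Every variant is listed explicitly; there is deliberately no `_` arm.
fn describe(device: &DeviceType) -> &'static str {
    match device {
        DeviceType::Block => "block",
        DeviceType::Network => "network",
        DeviceType::Vfio => "vfio",
        DeviceType::Vsock => "vsock",
    }
}
```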

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
f2d08bc00f runtime-rs: Remove unused index from Endpoints
The affected `Endpoint`s are `VhostUserEndpoint` and `TapEndpoint`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
60a42351e2 runtime-rs: DAN supports vhost-user-net device
DAN reads vhost-user-net device from JSON config. It only supports VMM
running as server right now.

Fixes: #8625

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
693a0cfbfd dragonball: Make vhost-user-net ready for VhostUserEndpoint
The changes involve:

- Expose VhostUserConfig struct to runtime-rs.
- Set a default value when num_queues or queue_size is 0.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
54df832407 runtime-rs: Support VhostUserEndpoint
This commit introduces VhostUserEndpoint and adds device manager support
for vhost-user-net devices. For now, Dragonball is able to
attach vhost-user-net devices.

Fixes: #8625

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:50 +08:00
Xuewei Niu
374c2f01aa runtime-rs: Simplify VhostUserType enum
Remove unused string parameter from each item.

Fixes: #8625

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-25 16:15:57 +08:00
Xuewei Niu
38eb4077a6
Merge pull request #8503 from justxuewei/vhost-user-net
dragonball: Support vhost-user-net device
2023-12-25 13:47:51 +08:00
Xuewei Niu
4c5de72863 dragonball: Wrap config space into set_config_space
The config space of network devices is shared and accords with the virtio 1.1 spec.
It is a good idea to abstract the common part into one function.
`set_config_space()` implements this.

Plus, this patch removes `vq_pairs` from vhost-net devices, since there is
a possibility of data inconsistency. For example, some places read that
from `self.vq_pairs`, others read from `queue_sizes.len() / 2`.
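
One way to avoid the inconsistency is to keep a single source of truth and derive the pair count on demand, as in this sketch. The struct is a hypothetical reduction of a net device to its queue sizes.

```rust
/// Hypothetical net device that stores only the queue sizes.
struct NetDevice {
    queue_sizes: Vec<u16>,
}

impl NetDevice {
    /// Derived on demand, so it cannot drift from `queue_sizes`.
    fn vq_pairs(&self) -> usize {
        self.queue_sizes.len() / 2
    }
}
```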

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-25 10:47:34 +08:00
Alex.Lyn
3a3f39aa2d
Merge pull request #8668 from Apokleos/pci-path-refactor
runtime-rs: Refactor the code related to PCI paths and VFIO device driver initialize in DM.
2023-12-23 21:44:07 +08:00
Dan Mihai
080541a0f2 genpolicy: add SPDX license header
Add SPDX license header to rules.rego.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Saul Paredes
7f126be67e genpolicy: Update oci_distribution to 0.10.0
Also support alternative media type and update samples

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
9eb6fd4c24 docs: add agent policy and genpolicy docs
Add docs for the Agent Policy and for the genpolicy tool.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
57f93195ef genpolicy: add support for StatefulSet YAML input
Generate policy for K8s StatefulSet YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
35958ec9cc genpolicy: add support for ReplicationController
Generate policy for K8s ReplicationController YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
7da17099f2 genpolicy: add support for ReplicaSet YAML input
Generate policy for K8s ReplicaSet YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
d84300f1ee genpolicy: add support for List YAML input
Generate policy for K8s List YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
a03452637b genpolicy: add support for Job YAML input
Generate policy for K8s Job YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
2dbd01c80b genpolicy: add support for Deployment YAML input
Generate policy for K8s Deployment YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
a40a6003d0 genpolicy: add support for DaemonSet YAML input
Generate policy for K8s DaemonSet YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
48829120b6 policy: initial genpolicy commit
Add application that infers K8s user's intentions based on user's
K8s YAML file, and generates a Rego/OPA based policy for that YAML.

Just Pod YAML files are supported as input using this initial source
code. Support for other types of YAML files will come with upcoming
commits.

Fixes: #7673

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Chao Wu
8cf3bcefd8 dragonball: introduce pci msi/msix interrupt
introduce msi/msix mod to maintain information for PCI Message Signalled
Interrupt Extended Capability. It will be initialized when parsing pci
configuration space and used when getting interrupt capabilities.

fixes: #8661

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Shifang Feng <fengshifang@linux.alibaba.com>
Signed-off-by: Yang Su <yang.su@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Xin Lin <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-22 16:28:22 +08:00
Xuewei Niu
beadce54c5 dragonball: Support vhost-user-net devices
This PR introduces vhost-user-net devices to Dragonball. The devices are
allowed to run as server on the VMM side.

Fixes: #8502

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-22 14:53:18 +08:00
Xuewei Niu
1f21d3cb2c dragonball: Introduce address space for MmioV2DeviceState
Vhost-user-net has a dependency on address space from `MmioV2DeviceState`.
The addition of the address space is introduced in this patch. Plus, it
makes sure all unit tests have the corresponding parameter as well.

Fixes: #8502

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-22 14:53:18 +08:00
alex.lyn
94c83cea84 runtime-rs: Refactor vfio driver implementation
It's important to ensure that these tasks which setup vfio
devices are completed before add_device.

So move the vfio device setup code to a dedicated method called at device
building time, which does not affect the behavior of other code.

This change also makes it easier to understand the difference
between create and attach, and makes the boundaries
clearer.

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-22 10:37:40 +08:00
alex.lyn
82d3cfdeda runtime-rs: Make VhostUserConfig's field pci_path type more specific
Make VhostUserConfig pci_path's type more specific, change it
from Option<String> to Option<PciPath>.
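
The benefit of a dedicated type over a plain `String` is that malformed paths are rejected once, at parse time. The sketch below assumes a textual form of slash-separated two-digit hex slots such as `02/0a`; the struct layout and that format are assumptions for illustration, not the runtime-rs definition.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical PCI path: a chain of slot numbers from the root bus down.
#[derive(Debug, Clone, PartialEq)]
struct PciPath {
    slots: Vec<u8>,
}

impl FromStr for PciPath {
    type Err = String;

    /// Parse slash-separated hex slots, failing on any malformed part.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let slots = s
            .split('/')
            .map(|slot| {
                u8::from_str_radix(slot, 16).map_err(|e| format!("bad slot {slot:?}: {e}"))
            })
            .collect::<Result<Vec<u8>, String>>()?;
        Ok(PciPath { slots })
    }
}

impl fmt::Display for PciPath {
    /// Render the path back in the same slash-separated hex form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let parts: Vec<String> = self.slots.iter().map(|s| format!("{s:02x}")).collect();
        write!(f, "{}", parts.join("/"))
    }
}
```

A field typed `Option<PciPath>` can then only ever hold a value that parsed successfully.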

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-22 10:35:38 +08:00
alex.lyn
5cc2890a10 runtime-rs: refactor and re-implement pci path.
Do refactor and re-implement to make the pci path more "rusty".

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-22 10:34:41 +08:00
alex.lyn
1b5758c1f2 runtime-rs: Move the PciPath-related code to a dedicated file
Move the pciPath code to a new file pci_path.rs and update the
references.

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-21 11:35:18 +08:00
alex.lyn
275de453d5 runtime-rs: remove useless get_host_guest_map and its test case
Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-21 11:07:56 +08:00
James O. D. Hunt
7da6d0a845 runtime-rs: ch: Implement missing thread/pid APIs
Add implementations for the following `Hypervisor` trait methods which
simply return the same details as the `get_vmm_master_tid()` method:

- `get_thread_ids()`
- `get_pids()`
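
A minimal sketch of what "return the same details" could mean. The struct, the return types and the error type are simplified assumptions; the actual `Hypervisor` trait signatures in runtime-rs differ.

```rust
/// Hypothetical stand-in for the Cloud Hypervisor driver state.
struct ChDriver {
    vmm_tid: u32,
}

impl ChDriver {
    fn get_vmm_master_tid(&self) -> Result<u32, String> {
        Ok(self.vmm_tid)
    }

    /// Minimal version: report only the VMM master thread.
    fn get_thread_ids(&self) -> Result<Vec<u32>, String> {
        Ok(vec![self.get_vmm_master_tid()?])
    }

    /// Minimal version: report only the VMM master thread id as the pid.
    fn get_pids(&self) -> Result<Vec<u32>, String> {
        Ok(vec![self.get_vmm_master_tid()?])
    }
}
```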

Fixes: #6438.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-20 17:58:40 +00:00
Archana Shinde
7e5868a55f
Merge pull request #8588 from amshinde/runtime-rs-update-readme
runtime-rs: Update readme to indicate cloud-hypervisor support
2023-12-19 22:09:14 -08:00
Hyounggyu Choi
540a2a7fb1 runtime: Allow no initrd path for IBM Z Secure Execution
This is to reintroduce a configuration rule for IBM Z Secure Execution,
where no initrd path should be configured. For the TEE of interest,
only a kernel image should be specified with `confidential_guest=true`.

Fixes: #8692

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-12-19 11:21:16 +01:00
Xuewei Niu
ec30d5a9a8
Merge pull request #8700 from justxuewei/dbs-ut
dragonball: Trigger unit tests of dbs_* subcrates by `make test`
2023-12-19 17:51:20 +08:00
Xuewei Niu
039fe7f391 dragonball: Trigger unit tests of dbs_* subcrates by make test
`make SUPPORT_VIRTUALIZATION=1 test` iterates through all subcrates and
does test.

Plus, this patch fixes some issues about unit tests:

- Too many parameters were passed to `I8042Device::new()`.
- Virtqueue checks have been introduced since `virtio-queue v0.7.0`.
- GHA might have no access to `/var/tmp` dir on runner.

Fixes: #8690

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-19 16:22:37 +08:00
Hyounggyu Choi
ceea8882db
Merge pull request #8672 from BbolroC/introduce-vsock-device-init
runtime-rs: Separate init_config() from new() for struct VsockDevice
2023-12-18 22:04:37 +01:00
Hyounggyu Choi
3cd0cc1388 runtime-rs: Separate init_config() from new() for struct VsockDevice
As a follow-up for #8516, guest_cid and vhost_fd are not necessarily initialised
via new(). Instead, the fields should be initialised later when they are really
used to construct hypervisor's parameters.
This commit is to separate init_config() from new() to initialise guest_cid
and vhost_fd and leave only the assignment of id for the existing function.
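
The two-phase construction can be sketched like this. Field types and the `init_config()` parameters are assumptions; in particular, how the real function obtains `guest_cid` and `vhost_fd` is not described in the message, so the sketch simply takes them as arguments.

```rust
/// Hypothetical config that is unknown until the device is really used.
#[derive(Debug, Default, PartialEq)]
struct VsockConfig {
    guest_cid: Option<u32>,
    vhost_fd: Option<i32>,
}

#[derive(Debug)]
struct VsockDevice {
    id: String,
    config: VsockConfig,
}

impl VsockDevice {
    /// Only assigns the id; the config stays empty.
    fn new(id: &str) -> Self {
        VsockDevice {
            id: id.to_string(),
            config: VsockConfig::default(),
        }
    }

    /// Called later, when the hypervisor parameters are constructed.
    fn init_config(&mut self, guest_cid: u32, vhost_fd: i32) {
        self.config = VsockConfig {
            guest_cid: Some(guest_cid),
            vhost_fd: Some(vhost_fd),
        };
    }
}
```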

Fixes: #8671

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-12-18 16:36:09 +01:00
Greg Kurz
2987d3eeb5
Merge pull request #8341 from jongwu/fix_cpushares
agent: correct CPUShares and CPUWeight value
2023-12-18 15:40:04 +01:00
James O. D. Hunt
3c49120d2f
Merge pull request #8641 from jodh-intel/kata-ctl-add-cfg-file-cli-option
kata-ctl: Add option to dump config files
2023-12-18 11:54:19 +00:00
Zhongtao Hu
9a37e77f2a runtime-rs: check the update memory size
Check whether the updated memory size is greater than the default max memory size.

Fixes: #6875
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-12-15 11:25:34 +08:00
Zhongtao Hu
6039417104 runtime-rs: add default_maxmemory in config file
add default_maxmemory in config file

Fixes: #6875
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-12-15 10:25:20 +08:00
Zhongtao Hu
8d9fd9c067 runtime-rs: support memory resize
Fixes: #6875
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-12-15 10:25:13 +08:00
Zhongtao Hu
81e55c424a runtime-rs: add resize_memory trait for hypervisor
Fixes: #6875
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-12-15 10:25:03 +08:00
Zhongtao Hu
d428a3f9b9 runtime-rs: get guest memory details
Get the memory block size and the guest memory hotplug probe.

Fixes: #6356
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-12-15 10:22:37 +08:00
Jianyong Wu
58e88d9469 agent: correct CPUShares and CPUWeight value
If the cgroup driver is systemd, CPUShares (cgroup v1) should be at
least 2 [1] and CPUWeight (cgroup v2) should be at least 1 [2].
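
One way to enforce these minimums is to clamp the requested value before
handing it to systemd. The sketch below is only an illustration: the helper
names are hypothetical and are not taken from the agent code.

```rust
/// Minimum values accepted by systemd, as stated in the commit message.
const MIN_CPU_SHARES: u64 = 2; // cgroup v1 (CPUShares)
const MIN_CPU_WEIGHT: u64 = 1; // cgroup v2 (CPUWeight)

/// Hypothetical helper: raise a requested CPUShares value to the minimum.
fn clamp_cpu_shares(requested: u64) -> u64 {
    requested.max(MIN_CPU_SHARES)
}

/// Hypothetical helper: raise a requested CPUWeight value to the minimum.
fn clamp_cpu_weight(requested: u64) -> u64 {
    requested.max(MIN_CPU_WEIGHT)
}
```

Values already at or above the minimum pass through unchanged.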

Fixes: #8340
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>

[1] d19434fbf8/src/basic/cgroup-util.h (L122)
[2] d19434fbf8/src/basic/cgroup-util.h (L91)
2023-12-15 02:04:31 +08:00
Alex.Lyn
c7c7632203
Merge pull request #8620 from Apokleos/enhance-directv-using-csi
runtime-rs: Enhancement of DirectVolume when using a dedicated CSI
2023-12-14 22:59:09 +08:00
alex.lyn
aa42f0a03f runtime-rs: Enhancement of DirectVolume when using CSI.
We use a matching direct-volume path to determine whether an OCI mount
is a DirectVolume. However, we should handle the case where no match is
found appropriately.
In that case, the OCI mount is treated as a non-DirectVolume type
instead of causing a failure.

Fixes: #8619

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-14 18:19:03 +08:00
alex.lyn
80d631ee84 runtime-rs: Add attribute serde rename to each field of DirectVolume.
The DirectVolume structure in runtime-rs differs from the one in kata-runtime,
so there is no unified way to handle DirectVolumeMountInfo
and MountInfo.

We should align the two by simply adding the attribute #[serde(rename="x")]
to each field in DirectVolumeMountInfo.

Fixes: #8619

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-14 18:18:40 +08:00
Xuewei Niu
7f611dfe84
Merge pull request #8609 from justxuewei/runtime-rs-vhost-net
dragonball: Use vhost-net device by default
2023-12-14 16:33:29 +08:00
Xuewei Niu
82fde4431e dragonball: Set default queue config for vhost-net device
Dragonball sets a default queue config in the case of `None`. The
queue_size and num_queues of vhost-net are set to `Some(0)` by default.
Therefore, we might get an invalid queue config. This patch fixes this
issue.
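
A minimal sketch of the idea, assuming the fix treats `Some(0)` the same
way as `None`. The function names and default values are hypothetical and
are not Dragonball's real ones.

```rust
// Hypothetical defaults, chosen only for illustration.
const DEFAULT_NUM_QUEUES: usize = 2;
const DEFAULT_QUEUE_SIZE: u16 = 256;

/// Treat both `None` and `Some(0)` as "not set" and use the default.
fn effective_queue_size(requested: Option<u16>) -> u16 {
    match requested {
        Some(size) if size > 0 => size,
        _ => DEFAULT_QUEUE_SIZE,
    }
}

/// Same rule for the number of queues.
fn effective_num_queues(requested: Option<usize>) -> usize {
    match requested {
        Some(n) if n > 0 => n,
        _ => DEFAULT_NUM_QUEUES,
    }
}
```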

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-14 11:18:33 +08:00
Xuewei Niu
c11b066728 runtime-rs: Use vhost-net device by default
This patch sets vhost-net as the default networking backend. Users can
set `disable_vhost_net` to `true` to re-enable the virtio-net backend.
Which backend to use is a matter for the hypervisor, so runtime-rs no
longer needs to know about it.

Fixes: #8608

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-14 11:18:26 +08:00
Chao Wu
dfaf006fcc
Merge pull request #8564 from openanolis/chao/add_pci_root_bus_device
dragonball: add pci root bus and root device
2023-12-13 17:57:16 +08:00
James O. D. Hunt
d7c6219dfe
Merge pull request #8630 from jodh-intel/runtime-rs-ch-set-state-on-vm-stop
runtime-rs: ch: Change state when VM stopped
2023-12-13 09:26:30 +00:00
James O. D. Hunt
2a518f0898 runtime-rs: ch: Change state when VM stopped
Make the CH (Cloud Hypervisor) `stop_vm()` method check the VM state before
attempting to stop the VM, and update the state once the VM has stopped.

This avoids the method failing if called multiple times which will
happen if the workload exits before the container manager requests that
the container stop.

This change ensures the CH driver finishes cleanly.
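
The behaviour described above can be sketched as follows. The types are
simplified stand-ins, not the real CH driver, and the `stop_requests`
counter exists only to make the effect visible.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmState {
    Running,
    Stopped,
}

/// Simplified stand-in for the hypervisor driver state.
struct Vm {
    state: VmState,
    stop_requests: u32, // how often a stop was actually issued
}

impl Vm {
    fn new() -> Self {
        Vm { state: VmState::Running, stop_requests: 0 }
    }

    /// Stop the VM only if it is still running, then record the new state.
    fn stop_vm(&mut self) -> Result<(), String> {
        if self.state == VmState::Stopped {
            return Ok(()); // already stopped: nothing to do
        }
        // A real driver would ask the hypervisor to shut down here.
        self.stop_requests += 1;
        self.state = VmState::Stopped;
        Ok(())
    }
}
```

Calling `stop_vm()` a second time returns `Ok(())` without issuing another
stop request.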

Fixes: #8629.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-12 18:25:20 +00:00
James O. D. Hunt
1195692d3c runtime-rs: ch: Move state handling to top-level APIs
Move the state setting to the `Hypervisor` trait calls. This makes the
code clearer.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-12 15:25:27 +00:00
James O. D. Hunt
5637f11a8c kata-ctl: Add option to dump config files
Add a `--show-default-config-paths` command line option for parity with
`kata-runtime`.

Note that this requires the `KataCtlCli.command` to be optional so that
the user can run simply:

```bash
$ kata-ctl --show-default-config-paths
```

... without also specifying a (sub-)command.

Fixes: #8640.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-12 14:20:04 +00:00
Xuewei Niu
86918e91b3 dragonball: Disable packed virtqueue for vhost-user devices
The packed virtqueue layout isn't supported by `Endpoint::negotiate()`.
If the packed feature is not disabled, communication between device and
driver fails because the virtqueue cannot be parsed. This patch
fixes this issue.

Fixes: #8633

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-12 17:24:20 +08:00
Chao Wu
b079e1aabc dragonball: add pci root bus and root device
In order to follow up the PCI implementation in Dragonball, we need to
add PCI root device and root bus support.

The root device is a pseudo PCI root device that manages access to the PCI
configuration space.

The root bus mainly emulates the PCI root bridge and also creates the PCI
root bus, with the given bus ID, together with the PCI root bridge.

fixes: #8563

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Shifang Feng <fengshifang@linux.alibaba.com>
Signed-off-by: Yang Su <yang.su@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Xin Lin <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-12 11:43:14 +08:00
Chao Wu
52f7a40e4e dragonball: add --all for fmt ci
Right now, the cargo fmt check in Dragonball only runs with the default
features rather than all features, so some code is never checked
by the fmt tool.

This PR adds the --all option to the Dragonball CI and also fixes some code
that had not been formatted with cargo fmt --all.

fixes: #8598

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-11 20:54:25 +08:00
Chao Wu
df7f416cb8
Merge pull request #8566 from liubogithub/liubo/dev/panic_fix
runtime-rs: fix panic when hypervisor mismatches with configuration
2023-12-10 21:33:59 +08:00
Chelsea Mafrica
1c42d94550
Merge pull request #6826 from gabevenberg/log-parser-rs
kata-ctl: Moved log-parser-rs into kata-ctl
2023-12-08 11:33:09 -08:00
Liu Bo
bf97051f11 runtime-rs: fix panic when hypervisor mismatches with configuration
If a wrong configuration.toml file is used accidentally, the runtime-rs
binary could panic because of unwrap().

This fixes the panic by returning errors instead of unwrap().
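
As an illustration of the pattern (not the actual runtime-rs code), a
lookup that used to call unwrap() can return a `Result` instead. The
function and the map layout below are hypothetical.

```rust
use std::collections::HashMap;

/// Look up the section for the requested hypervisor. A missing entry becomes
/// an error the caller can report, instead of a panic from unwrap().
fn hypervisor_section<'a>(
    sections: &'a HashMap<String, String>,
    name: &str,
) -> Result<&'a String, String> {
    sections
        .get(name)
        .ok_or_else(|| format!("hypervisor {:?} not found in configuration", name))
}
```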

fixes: #8565

Signed-off-by: Liu Bo <liub.liubo@gmail.com>
2023-12-08 08:56:23 -08:00
Chao Wu
5054e59ccb
Merge pull request #8429 from adamqqqplay/support-vhost-user-fs
dragonball: introduce vhost-user-fs device
2023-12-08 17:20:52 +08:00
Hyounggyu Choi
588f639a69
Merge pull request #6755 from BbolroC/add-se-artifacts-to-main
packaging: Add IBM Z SE artifacts to main
2023-12-08 05:17:38 +01:00
Gabe Venberg
69fdd05ce5 kata-ctl: Moved log-parser-rs into kata-ctl
Log-parser-rs was always intended to become a sub-functionality of
kata-ctl, but it was useful to develop it and initially merge it as a
standalone program, and migrate it to a subcommand later.

Fixes #6797

Signed-off-by: Gabe Venberg <gabevenberg@gmail.com>
2023-12-07 21:35:28 -06:00
Archana Shinde
a5105b4227
Merge pull request #8582 from amshinde/runtime-rs-tryfrom-blkconfig
Implement and use try_from for DiskConfig
2023-12-07 15:02:00 -08:00
Archana Shinde
458e91b289 runtime-rs: Update readme to indicate cloud-hypervisor support
Since cloud-hypervisor is no longer built as an optional feature,
let's mention cloud-hypervisor in the list of hypervisors supported by
runtime-rs.

Fixes: #8587

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-12-07 14:59:43 -08:00
Huang Jianan
5629b7454f dragonball: support vhost-user-fs in device manager
This patch implements the virtio-fs device, which is used for filesystem
sharing and is heavily based on the vhost-user protocol.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
Signed-off-by: Qinqi Qu <quqinqi@linux.alibaba.com>
2023-12-07 11:59:07 +08:00
Archana Shinde
a661ac3a0e runtime-rs: Implement and use try_from for DiskConfig
Implement try_from trait function to convert runtime-rs BlockConfig
to cloud-hypervisor DiskConfig. This can allow for code reuse in the
future.
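
A sketch of what such a conversion can look like. Both structs are
trimmed-down, hypothetical stand-ins. The real `BlockConfig` and
`DiskConfig` have more fields, and the validation rule is an assumption.

```rust
use std::convert::TryFrom;
use std::path::PathBuf;

/// Hypothetical stand-in for the runtime-rs block device configuration.
struct BlockConfig {
    path_on_host: String,
    is_readonly: bool,
}

/// Hypothetical stand-in for the cloud-hypervisor disk configuration.
#[derive(Debug)]
struct DiskConfig {
    path: PathBuf,
    readonly: bool,
}

impl TryFrom<BlockConfig> for DiskConfig {
    type Error = String;

    /// Fail with an error instead of building a disk without a backing path.
    fn try_from(blk: BlockConfig) -> Result<Self, Self::Error> {
        if blk.path_on_host.is_empty() {
            return Err("block device has no host path".to_string());
        }
        Ok(DiskConfig {
            path: PathBuf::from(blk.path_on_host),
            readonly: blk.is_readonly,
        })
    }
}
```

Any caller that holds a `BlockConfig` can then write
`DiskConfig::try_from(cfg)?` instead of repeating the field mapping.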

Fixes: #8581

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-12-06 12:10:34 -08:00
Huang Jianan
2a1fc29e84 dragonball: add unit test for vhost-user-fs
Add some test cases for vhost-user-fs function.

Signed-off-by: Beiyue <beiyue@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
2023-12-06 10:43:24 +08:00
Huang Jianan
d6cfbe9436 dragonball: support vhost-user-fs
This patch implements the virtio-fs device, which is used for filesystem
sharing and is heavily based on the vhost-user protocol.

This vhost-user-fs device defines 5 parameters:
  - path: vhost-user socket path
  - tag: mount tag used from the guest to mount the filesystem
  - req_num_queues: number of request virtqueues
  - queue_size: depth of each virtqueue
  - cache_size: cache window size for dax

This device needs to be defined before the VM instance is started,
which can be done through the dbs-cli tool with --fs option:
--fs '{
    "sock_path":"/path/to/virtiofs.socket",
    "tag":"myfs",
    "num_queues":1,
    "queue_size":1024,
    "cache_size":0,
    "thread_pool_size":1,
    "cache_policy":"auto",
    "writeback_cache":true,
    "no_open":true,
    "xattr":true,
    "drop_sys_resource":false,
    "mode":"vhostuser",
    "fuse_killpriv_v2":true,
    "no_readdir":false
}'

Fixes: #8428

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
2023-12-06 10:43:17 +08:00
Archana Shinde
955dec06da runtime-rs: add network hotplug for clh
This is required for clh to work with nerdctl and docker.
This fixes the issues seen with nerdctl while starting a container.
However, container exit with docker is still broken due to an unrelated
issue.

Fixes: #8579

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-12-05 15:29:53 -08:00
Fabiano Fidêncio
d149b9f9ca
Merge pull request #7231 from wainersm/measured_rootfs-improvements
Build for measured rootfs improvements
2023-12-05 22:20:33 +01:00
Jeremi Piotrowski
e2c6b8ae6e
Merge pull request #4743 from yuchen0cc/main
mount: support checking multiple kinds of block device driver
2023-12-05 18:04:51 +01:00
James O. D. Hunt
d9daadf15c
Merge pull request #8558 from jodh-intel/load-config-improvement
runtime-rs: Show config files attempted on config load failure
2023-12-05 11:48:42 +00:00
Greg Kurz
1650d02b91
Merge pull request #8516 from Apokleos/vsock-dev
move vsock device into device manager
2023-12-05 11:28:37 +01:00
James O. D. Hunt
93c0fc2ad3
Merge pull request #8551 from amshinde/runtime-rs-setns-clh
runtime-rs: Launch cloud-hypervisor in given netns
2023-12-05 10:18:34 +00:00
James O. D. Hunt
d627893975 runtime-rs: Show config files attempted on config load failure
PR #8483 changed the location of the rust runtime config files to
`/etc/kata-containers/runtime-rs/`. However, if you haven't updated your
system to create that directory, attempting to create a container using
the rust runtime was giving the following cryptic message
(formatted for easier reading):

```
failed to handler message try init runtime instance

Caused by:
    0: load config
    1: load toml config
    2: entity not found
```

Now, the message is as follows (again, reformatted for easier reading):

```
failed to handle message try init runtime instance

Caused by:
    0: load config
    1: load TOML config failed (tried [
        \"/etc/kata-containers/runtime-rs/configuration.toml\",
        \"/usr/share/defaults/kata-containers/runtime-rs/configuration.toml\",
        \"/opt/kata/share/defaults/kata-containers/runtime-rs/configuration.toml\"
    ])
```
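
A minimal sketch of how a loader can produce such a message: it walks the
candidate paths in order and, if none exists, reports every path it tried.
The function name is hypothetical and this is not the runtime-rs
implementation.

```rust
use std::path::{Path, PathBuf};

/// Return the first candidate that exists as a file, or an error that lists
/// every path that was tried.
fn find_config(candidates: &[&str]) -> Result<PathBuf, String> {
    for candidate in candidates {
        let path = Path::new(candidate);
        if path.is_file() {
            return Ok(path.to_path_buf());
        }
    }
    Err(format!("load TOML config failed (tried {:?})", candidates))
}
```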

Fixes: #8557.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-05 09:10:18 +00:00
James O. D. Hunt
45c0364d4c runtime-rs: Fix typo in task service
"failed to handler message" -> "failed to handle message".

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-05 09:10:18 +00:00
Archana Shinde
2df8144cfe runtime-rs: Launch cloud-hypervisor in given netns
Launch cloud-hypervisor binary in the netns provided at the prepare_vm
stage.

Fixes: #6441

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-12-04 13:02:43 -08:00
Hyounggyu Choi
bb1d4adaa9 config: add SE configuration
This is to add SE configuration which is used by kata runtime.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-12-04 21:08:49 +01:00
James O. D. Hunt
e4aebb4560
Merge pull request #8549 from jodh-intel/tdx-no-root
libs: protection: x86_64: drop root requirement for querying
2023-12-04 13:03:10 +00:00
Chao Wu
1550ee6767
Merge pull request #8480 from openanolis/chao/add_dbs_pci
dragonball: init dbs-pci lib with pci bus & pci conf
2023-12-04 18:08:40 +08:00
Chao Wu
52fd57e49a
Merge pull request #8301 from Apokleos/do-direct-volume
runtime-rs: Enhancing DirectVolMount Handling with Patching Support
2023-12-04 16:49:46 +08:00
James O. D. Hunt
7beab11d9e
Merge pull request #8547 from jodh-intel/unbreak-logger
libs:logging: Fix logger
2023-12-04 08:38:03 +00:00
alex.lyn
0fabfa336d runtime-rs: bring support for legacy vsock device.
Bring support for legacy vsock and add Vsock to the ResourceConfig
enum type, and add the processing flow of the Vsock device to the
prepare_before_start_vm function.

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-04 15:54:51 +08:00
alex.lyn
6c08cf35d5 runtime-rs: Introduce prepare_vm_socket_config to VirtSandbox.
Introduce prepare_vm_socket_config to VirtSandbox for vm
socket config, including Vsock and Hybrid Vsock.
Use the capabilities() trait of the hypervisor to get the
vm socket supported in VMM.

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-04 15:54:50 +08:00
alex.lyn
60f88da5e1 runtime-rs: add Capability of HybridVsockSupport for Hypervisor.
Add the HybridVsockSupport capability for the CLH and Dragonball hypervisors,
which use hybrid-vsock. Qemu keeps the default, since it uses legacy vsock.

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-04 15:54:50 +08:00
alex.lyn
c5178dd258 runtime-rs: Introduce Capability of HybridVsockSupport.
Introduce the HybridVsock capability to determine which kind of vm socket
is supported by the Hypervisor.
Use `is_hybrid_vsock_supported` to tell whether a hypervisor supports
hybrid-vsock; if not, it supports legacy vsock.
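
The selection rule can be sketched as below. Only the name
`is_hybrid_vsock_supported` comes from the commit message. The
`Capabilities` struct, the `VmSocket` enum and `select_vm_socket` are
hypothetical simplifications.

```rust
#[derive(Debug, PartialEq, Eq)]
enum VmSocket {
    HybridVsock,
    LegacyVsock,
}

/// Simplified stand-in for the hypervisor capability set.
struct Capabilities {
    hybrid_vsock: bool,
}

impl Capabilities {
    fn is_hybrid_vsock_supported(&self) -> bool {
        self.hybrid_vsock
    }
}

/// Pick hybrid-vsock when the hypervisor supports it, legacy vsock otherwise.
fn select_vm_socket(caps: &Capabilities) -> VmSocket {
    if caps.is_hybrid_vsock_supported() {
        VmSocket::HybridVsock
    } else {
        VmSocket::LegacyVsock
    }
}
```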

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-04 15:54:29 +08:00
James O. D. Hunt
e1caca3e41 kata-ctl: Remove root requirement for "env"
Remove the redundant `kata-ctl` `root` check when running the `env`
command. This check duplicated the `GuestProtection` check, and that
check is now no longer necessary anyway.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-01 15:55:45 +00:00
James O. D. Hunt
f05ada592f libs: protection: x86_64: drop root requirement for querying
It is no longer necessary to be `root` to query the guest protection
(TDX) on `x86_64` systems, so drop the requirement.

> **Note:**
>
> This change drops the `nix` `Uid` import required for the `root` check.
> But at the same time it adds it for PPC64le since that implementation of
> `available_guest_protection()` needs it and it was previously missing.

Fixes: #8548.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-01 15:55:21 +00:00
Fabiano Fidêncio
852021e416
Merge pull request #8483 from fidencio/topic/move-rust-config-files-to-subdir-based-on-jodh-approach
build/kata-deploy: Move rust runtime config files to runtime-rs directory -- based on #8445
2023-12-01 16:22:51 +01:00
James O. D. Hunt
f9f1d3a071 libs:logging: Fix logger
PR #8311 inadvertently broke the logging since no log messages below the
`Info` level are logged now, regardless of the requested log level.

Resolve the issue by storing the requested log level in the
`RuntimeComponentLevelFilter` and using that level in the `log()`
function, rather than hard-coding `Info` as the default where no entry
is found in the `FILTER_RULE` hashmap.
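
The fallback logic can be sketched as follows, assuming a per-component
rule map plus a stored requested level. The types are simplified stand-ins
and do not reproduce the real `RuntimeComponentLevelFilter`.

```rust
use std::collections::HashMap;

/// Log levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Simplified stand-in for the component-level filter.
struct ComponentLevelFilter {
    requested: Level,              // level requested by the user
    rules: HashMap<String, Level>, // optional per-component overrides
}

impl ComponentLevelFilter {
    /// A record is logged if its level is no more verbose than the
    /// component's rule or, where no rule exists, the requested level
    /// (rather than a hard-coded `Info`).
    fn enabled(&self, component: &str, level: Level) -> bool {
        let max = self.rules.get(component).copied().unwrap_or(self.requested);
        level <= max
    }
}
```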

Fixes: #8546.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-01 12:21:20 +00:00
yuchen.cc
1cd1558a92 mount: support checking multiple kinds of block device driver
Device mapper is the only supported block device driver so far,
which seems limiting. Kata Containers can work well with other
block devices. It is necessary to enhance support for multiple
kinds of host block devices.

Fixes #4714

Signed-off-by: yuchen.cc <yuchen.cc@alibaba-inc.com>
2023-12-01 11:59:30 +08:00
Chelsea Mafrica
207a7fef90
Merge pull request #7815 from cmaf/runtime-rs-ch-vsock
runtime-rs: Add Hybrid VSOCK device handling for CH
2023-11-30 12:22:36 -08:00
Chao Wu
b3da71f21e dragonball: init dbs-pci lib with pci bus & pci conf
This commit initialises the dbs-pci lib for Dragonball to use.
It currently contains two implementations:

1. PCI configuration space
2. PCI bus

More information on the design and behavior of those two features can be found
in the README of dbs-pci.

fixes: #8479

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Shifang Feng <fengshifang@linux.alibaba.com>
Signed-off-by: Yang Su <yang.su@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Xin Lin <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-11-30 23:40:26 +08:00
Steve Horsman
c6110284d5
Merge pull request #8520 from stevenhorsman/hypervisor-ttrpc
runtime: Update hypervisor generated code
2023-11-30 10:01:56 +00:00
Fabiano Fidêncio
f15e16b692 Revert "runtime: confidential: Do not set the max_vcpu to cpu"
This reverts commit b0157ad73a.
```
commit b0157ad73a
Refs: 3.3.0-alpha0-124-gb0157ad73
Author:     Fabiano Fidêncio <fabiano.fidencio@intel.com>
AuthorDate: Fri Aug 11 14:55:11 2023 +0200
Commit:     Fabiano Fidêncio <fabiano.fidencio@intel.com>
CommitDate: Fri Nov 10 12:58:20 2023 +0100

    runtime: confidential: Do not set the max_vcpu to cpu

    We don't have to do this since we're relying on the
    `static_sandbox_resource_mgmt` feature, which gives us the correct
    amount of memory and CPUs to be allocated.

    Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
```

This commit was removing a requirement that was made previously, but due
to the SMP issue we're facing with the QEMU used for TDX (see commit
d1b54ede290e95762099fff4e0bcdad10f816126*), QEMU will fail to start due
to:
```
Invalid CPU topology: product of the hierarchy must match maxcpus:
sockets (1) * dies (1) * cores (1) * threads (1) != maxcpus (240)"
```
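
The rule stated in the error message is plain arithmetic: the product of
the topology levels must equal maxcpus. The sketch below is an illustrative
check with a hypothetical function name, not QEMU's code.

```rust
/// True when sockets * dies * cores * threads equals maxcpus.
/// u128 is used so the product of four u32 values cannot overflow.
fn topology_matches(sockets: u32, dies: u32, cores: u32, threads: u32, maxcpus: u32) -> bool {
    let product =
        u128::from(sockets) * u128::from(dies) * u128::from(cores) * u128::from(threads);
    product == u128::from(maxcpus)
}
```

With the values from the message, 1 * 1 * 1 * 1 = 1, which does not match
maxcpus (240), so the check fails.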

This has no effect on the SEV / SNP workflow and hopefully we'll be able
to re-revert this soon enough, when this gets solved on the QEMU side.

Last but not least, this is not a "clean" revert as we're using
conf.NumVCPUs() instead of conf.NumVCPUs, to ensure we're dealing with
uint32.

Fixes: #8532

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-30 00:41:27 +01:00
stevenhorsman
47b8c3181f runtime: remote hypervisor updates to ttrpc
- Update the remote hypervisor code to match the re-genned code for
the ttrpc Hypervisor Service

Fixes: #8519
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-11-29 18:04:40 +00:00
stevenhorsman
613c75ba8c runtime: Update hypervisor generated code
Update to use ttrpc_out instead of grpc_out

Fixes: #8519
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-11-29 18:04:40 +00:00
Wainer dos Santos Moschetta
a13eecf7f3 runtime(-rs): add clean-generated-files target
The new clean-generated-files make target allows for removing the
generated files (including the configuration.toml files).

The tools/packaging/static-build/shim-v2/build.sh script now uses that
target to always force the re-generation of those files.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-11-28 11:21:53 -03:00
Fabiano Fidêncio
80860478bf runtime-rs: Remove the golang config paths
As the configuration files are different, we can safely remove those as
any new installation of the binary should also bring in the new
configurations.

This makes things less error-prone in the future, as we're ensuring that
the rust runtime will only be reading the rust configuration files.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-28 15:16:53 +01:00
James O. D. Hunt
b86ab5aa21 runtime-rs: Update list of config paths to check
Update the `DEFAULT_RUNTIME_CONFIGURATIONS` list to include a number of
rust runtime specific paths to try to load before checking the
"traditional" (golang) runtime configuration paths.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-28 15:16:53 +01:00
James O. D. Hunt
89ef464b7c build: Install rust config files to runtime-rs directory
Install the rust runtime configuration files to a `runtime-rs/`
directory to distinguish them from the golang config files (which may
have a different syntax).

The default values mean that the rust config files are now installed to
`/opt/kata/share/defaults/kata-containers/runtime-rs/` rather than
`/opt/kata/share/defaults/kata-containers/`.

See: https://github.com/kata-containers/kata-containers/issues/6020

Fixes: #8444.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-28 15:16:53 +01:00
alex.lyn
fe68f25bea runtime-rs: enhancement of vfio volume.
Reimplement vfio volume into direct_volume and do alignment
of rawblock/spdk volume.

Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-28 10:08:05 +08:00
alex.lyn
e3fd403126 runtime-rs: enhancement of spdk volume.
(1) Add enum DirectVolumeType for direct volumes.
(2) Reimplement spdk volume into direct_volume and
do alignment of rawblock volume.

Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-28 10:08:05 +08:00
alex.lyn
f973729029 runtime-rs: Enhancing DirectVolMount Handling for current Infra.
The current infra (K8S, CSI, CRI, Containerd) for Kata containers is
unable to properly handle direct volumes, resulting in the need for
workarounds like searching/comparison and then patching up the volume type.

This commit reimplements the handling method to support raw block
volumes whose backend may be a raw disk or a file in another format.

Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-28 10:08:05 +08:00
alex.lyn
e3becea566 runtime-rs: add support kata/multi-containers sharing one vfio volume.
Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-28 10:07:23 +08:00
James O. D. Hunt
45cc417a4e
Merge pull request #8461 from jodh-intel/update-codeowners
CODEOWNERS: Expand scope
2023-11-27 15:38:39 +00:00
Fabiano Fidêncio
bb4c51a5e0
Merge pull request #8494 from ChengyuZhu6/kata_virtual_volume
runtime: Pass `KataVirtualVolume` to the guest as devices in go runtime
2023-11-27 16:02:28 +01:00
alex.lyn
6af0592274 runtime-rs: Add vsock device in device manager.
(1) Implement Device Trait for vsock device.
(2) add vsock device in device manager.

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-27 15:23:18 +08:00
alex.lyn
1a6b45d3b7 runtime-rs: Reintroduce Vsock and add it to the DeviceType enum
As the vsock device will be used in Qemu or other VMMs, Vsock
is reintroduced to the DeviceType enum.

Fixes: #8474

Signed-off-by: Pavel Mores <pmores@redhat.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-27 15:12:44 +08:00
alex.lyn
e31dbc94a5 runtime-rs: remove vhost_fd from VsockConfig and make it cloneable.
It is currently difficult to use the clone operation
on VsockConfig because the vhost fd is managed implicitly
within runtime-rs. This responsibility should be delegated to
the VMM (especially QEMU) child process, as it is not a core runtime-rs
responsibility. We remove the vhost_fd member from VsockConfig
and make VsockConfig/VsockDevice cloneable.

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-27 15:11:21 +08:00
alex.lyn
eb90962b27 runtime-rs: introduce a new function generate_vhost_vsock_cid.
Introduce a new function generate_vhost_vsock_cid to generate
a guest CID and set the guest CID for the vsock fd.
This commit introduces no functional change; the code is
just split out of the previous VsockDevice::new().

Fixes: #8474

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-27 15:06:58 +08:00
alex.lyn
b952c5c5ce runtime-rs: add support kata/multi-containers sharing one spdk volume.
Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-25 21:13:03 +08:00
alex.lyn
17d2d465d1 runtime-rs: re-organize the volumes with adding new direct_volumes.
Add a new directory, direct_volumes, containing the spdk, rawblock and vfio volumes.

Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-25 21:04:55 +08:00
alex.lyn
6731466b13 runtime-rs: set a standard NotFound when direct volume path not found.
Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-25 19:51:12 +08:00
alex.lyn
d23867273f runtime-rs: split the block volume into block and rawblock volume
(1) rawblock volume is directvol mount type.
(2) block volume is based on the bind mount type.

Fixes: #8300

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-24 23:30:30 +08:00
ChengyuZhu6
5318afe273 runtime: support to create VirtualVolume rootfs storages
1) Create storage for all `io.katacontainers.volume=` messages in rootFs.Options,
and then aggregate all storages into `containerStorages`.
2) Create storage for other data volumes and push them into `volumeStorages`.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-11-23 23:22:55 +08:00
ChengyuZhu6
0b4f7c2ee7 runtime: redefine and add functions to handle VirtualVolume to storage
1) Extract function `handleBlockVolume` to create Storage only.
2) Add functions to handle KataVirtualVolume device and construct
   corresponding storages.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-11-23 23:07:32 +08:00
ChengyuZhu6
bd099fbda9 runtime: extend SharedFile to support multiple storage devices
To enhance the construction and administration of `KataVirtualVolume` storages,
this commit expands the 'sharedFile' structure to manage both
rootfs storages (`containerStorages`), including `KataVirtualVolume`, and other data volume storages (`volumeStorages`).

NOTE: `volumeStorages` is intended for future extensions to support Kubernetes data volumes.
Currently, `KataVirtualVolume` is exclusively employed for container rootfs, hence only `containerStorages` is actively utilized.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-11-23 23:05:14 +08:00
ChengyuZhu6
e4f33ac141 runtime: add functions to create devices in KataVirtualVolume
The snapshotter places `KataVirtualVolume` information
into 'rootfs.options', prefixed with 'io.katacontainers.volume='.
This commit transforms the encapsulated KataVirtualVolume data into device information.

Fixes: #8495

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Feng Wang <feng.wang@databricks.com>
Co-authored-by: Samuel Ortiz <sameo@linux.intel.com>
Co-authored-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-11-23 23:05:13 +08:00
Dan Mihai
756022787c
Merge pull request #8239 from Sumynwa/sumsharma/fix_configmap_update_propagation
runtime: Fix configmap/secrets updates with FS sharing disabled
2023-11-23 06:50:53 -08:00
Chelsea Mafrica
98aa291c9e runtime-rs: Add Hybrid VSOCK device handling for CH
Update cloud hypervisor implementation to allow hybrid vsock device to
be handled.

Fixes #6692

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-11-22 14:42:09 -08:00
briwan01
231b9dfd9d runtime-rs/clh: Fix unable to boot container
In the case of Cloud Hypervisor running on arm64 architecture,
only arm AMBA UART (pl011) is supported as the TTY. Consequently,
when enabling Hypervisor debug mode, it's essential to configure
the console as "ttyAMA0" rather than "ttyS0".

Fixes: #8381

Signed-off-by: briwan01 <brian.wang@arm.com>
2023-11-22 17:52:11 +08:00
Chao Wu
6a6c3c53b5
Merge pull request #8450 from adamqqqplay/vhost-user-general
dragonball: add vhost-user connection management logic
2023-11-21 16:05:17 +08:00
Alex.Lyn
4fd2914a33
Merge pull request #7932 from Apokleos/wrap-virtiofs-in-dm
runtime-rs: bringing virtio-fs device in device-manager
2023-11-21 13:48:15 +08:00
Huang Jianan
a9571398a6 dragonball: add test utils for vhost-user
The test utils will be used by the upcoming feature tests: vhost-user-net,
vhost-user-blk and vhost-user-fs.

Signed-off-by: Beiyue <beiyue@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
2023-11-21 09:51:56 +08:00
Qinqi Qu
a6a399d5bc dragonball: add vhost-user connection management logic
The vhost-user connection management logic will be used by
the upcoming features: vhost-user-net, vhost-user-blk and
vhost-user-fs.

Fixes: #8448

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Qinqi Qu <quqinqi@linux.alibaba.com>
Signed-off-by: Huang Jianan <jnhuang@linux.alibaba.com>
2023-11-21 09:51:48 +08:00
Fabiano Fidêncio
9445a967b6
Merge pull request #8471 from ChengyuZhu6/kata-virtual-volume
runtime: Introduce `KataVirtualVolume` structure into go runtime
2023-11-20 21:58:27 +01:00
Wainer Moschetta
728565d1e4
Merge pull request #7046 from stevenhorsman/remote-hypervisor-cherry-picks
CC: Remote hypervisor merge to main
2023-11-20 15:22:37 -03:00
Chao Wu
5ee8829700
Merge pull request #8451 from openanolis/chao/pci
2023-11-21 00:29:22 +08:00
Fabiano Fidêncio
41f3f6f93e
Merge pull request #8465 from justxuewei/rename-virtio
dragonball: Uniform the spelling of Virtio
2023-11-20 16:31:33 +01:00
alex.lyn
fe62e656a7 runtime-rs: Name the ShareFs Mount Option type more accurately
Fixes: #7915

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-20 20:05:50 +08:00
alex.lyn
856315ff87 runtime-rs: bringing virtio-fs device in device-manager
It mainly focuses on two parts:
(1) redesign the ShareFsConfig with ShareFsMountConfig

The device mount operation depends on the sharefs device
existing, so the structure of ShareFsConfig is redesigned and
ShareFsMountConfig is moved into it as an Option type, which
describes the relation between ShareFsConfig and ShareFsMountConfig.

(2) move virtiofs into device manager
Currently, virtio-fs is still outside of the device manager.
To enhance the device manager, this brings the virtio-fs
device into the device manager for unified management.

Fixes: #7915

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-20 20:04:47 +08:00
Chao Wu
b3318e59eb
Merge pull request #8332 from Apokleos/bugfix-directvol-multicontainers
runtime-rs/bugfix: kata pod with multi-containers sharing one direct volume
2023-11-20 19:37:58 +08:00
Chao Wu
ee55897827 fmt: refactor in pci & balloon
1. merge hashmap get logic according to Xuewei's suggestion.

2. do cargo fmt

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-11-20 17:53:51 +08:00
Chao Wu
baf3db9e6e Dragonball: add PCI bus and PCI interrupt support in mptable Spec
In order to support PCI VFIO functionality in Dragonball, we should
first add PCI bus and PCI device Interrupt information in Dragonball
mptable setup process.

This patch adds:

1. pci_legacy_irqs passed to the setup_mptable function.
2. pci bus support in mptable mem
3. pci interrupt support in mptable mem

fixes: #8449

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-11-20 17:53:51 +08:00
Xuewei Niu
c305634b4e dragonball: Uniform the spelling of Virtio
The changes are:

- VirtIoError -> VirtioError
- VirtIoResult -> VirtioResult
- VirtIoDevice -> VirtioDevice

Fixes: #8464

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-20 17:00:58 +08:00
ChengyuZhu6
1353b14e6c runtime: Add KataVirtualVolume struct in runtime
Add the corresponding data structure in the runtime part according to
kata-containers/kata-containers/pull/7698.

Fixes: #8472

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-11-19 13:30:32 +08:00
Greg Kurz
110574353d
Merge pull request #8345 from beraldoleal/issues/8343
Fixes make check errors
2023-11-17 17:38:29 +01:00
Pradipta Banerjee
39e8c84269 runtime: Add support for key annotations to remote hyp
In order to support different pod VM instance types via
remote hypervisor implementation (cloud-api-adaptor),
we need to pass machine_type, default_vcpus
and default_memory annotations to cloud-api-adaptor.

The cloud-api-adaptor then uses these annotations to spin
up the appropriate cloud instance.
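
As a rough illustration of the forwarding described above, the sketch below
filters a pod's annotations down to the three that are passed on. The
annotation key prefix and the `forwarded_annotations` helper are assumptions
made for this example, not the runtime's actual code.

```python
# Assumed annotation prefix; only the three option names come from the commit message.
HYPERVISOR_PREFIX = "io.katacontainers.config.hypervisor."
FORWARDED_OPTIONS = ("machine_type", "default_vcpus", "default_memory")


def forwarded_annotations(pod_annotations: dict) -> dict:
    """Return only the annotations that would be passed to cloud-api-adaptor."""
    selected = {}
    for option in FORWARDED_OPTIONS:
        key = HYPERVISOR_PREFIX + option
        if key in pod_annotations:
            selected[key] = pod_annotations[key]
    return selected


if __name__ == "__main__":
    # "example.com/other" is dropped; the two hypervisor annotations are kept.
    print(forwarded_annotations({
        HYPERVISOR_PREFIX + "machine_type": "t3.large",
        HYPERVISOR_PREFIX + "default_vcpus": "2",
        "example.com/other": "ignored",
    }))
```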

Reference PR for cloud-api-adaptor
https://github.com/confidential-containers/cloud-api-adaptor/pull/1088

Fixes: #7140
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
(based on commit 004f07f076)
2023-11-17 13:33:27 +00:00
Yohei Ueda
2910e333a8 runtime: Use static resource in remote hypervisor
This patch updates the template configuration file for
the remote hypervisor to set static_sandbox_resource_mgmt
to be true.  The remote hypervisor uses the peer pod config
to determine the sandbox size, so requires this to be set to
true by default.

Fixes: #6616
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(based on commit 938447803b)
2023-11-17 13:33:27 +00:00
stevenhorsman
26d56678a9 config: Add initial remote hypervisor config
- Remote hypervisor template config
- Add annotation enablement for machine_type, default_memory and
default_vcpus for flexible instance types

Fixes: #6349
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
(based on commits 7c9a791d67
and 335a456425)
2023-11-17 13:33:24 +00:00
stevenhorsman
ad63439a3e runtime: Update the remote hypervisor config
Add the SELinux setting to ensure it is passed through to the remote
hypervisor

Fixes: #5936

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
(based on commit 3ef2fd1784)
2023-11-17 13:32:52 +00:00
Lei Li
50e0d43dad runtime: Support privileged containers in peer pod VM
This patch fixes the issue of running containers
with privileged as true.

See the discussion at this URL for the details.
https://github.com/confidential-containers/cloud-api-adaptor/issues/111

Signed-off-by: Lei Li <cdlleili@cn.ibm.com>
(based on commit c3e6b66051)
2023-11-17 13:32:52 +00:00
Yohei Ueda
57d4dd8e57 runtime: Support the remote hypervisor type
This patch adds the support of the remote hypervisor type.
Shim opens a Unix domain socket specified in the config file,
and sends ttRPC requests to an external process to control
sandbox VMs.

Fixes #4482

Co-authored-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(based on commit f9278f22c3)
2023-11-17 13:32:49 +00:00
Yohei Ueda
8ac9a22097 runtime: Add hypervisor proto to support peer pod VMs
This patch adds a protobuf definition of the remote hypervisor type.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
(based on commit 150e8aba6d)
2023-11-17 13:31:09 +00:00
Sumedh Alok Sharma
4aaf54bdad runtime: Fix configmap/secrets update propagation with FS sharing disabled
This PR fixes propagation of updates to Kubernetes configmaps, secrets, etc. when filesystem sharing is disabled.
The commit introduces the changes below, with some limitations:
- creates new timestamped directory in guest
- updates the '..data' symlink
- creates user visible symlinks to newly created secrets.
- Limitation: The older timestamped directory and stale user visible symlinks exist in guest
  due to missing DELETE api in agent.
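
The steps listed above resemble the timestamped-directory layout Kubernetes
uses for projected volumes. The following is a minimal sketch of that layout
on a local directory, not the agent's implementation; `publish_update` and the
`..data_tmp` name are hypothetical. Like the commit, it leaves older
timestamped directories in place.

```python
import os
import tempfile
import time


def publish_update(volume_dir: str, files: dict) -> str:
    """Write `files` (name -> bytes) into a new timestamped directory,
    repoint '..data' at it, and ensure user-visible symlinks exist."""
    # 1. New timestamped directory holding the updated payload.
    ts_name = f"..{time.time_ns()}"
    os.mkdir(os.path.join(volume_dir, ts_name))
    for name, payload in files.items():
        with open(os.path.join(volume_dir, ts_name, name), "wb") as f:
            f.write(payload)

    # 2. Update '..data': create a temporary symlink, then rename it over the old one.
    tmp_link = os.path.join(volume_dir, "..data_tmp")
    os.symlink(ts_name, tmp_link)
    os.replace(tmp_link, os.path.join(volume_dir, "..data"))

    # 3. User-visible symlinks resolve through '..data', so they follow the swap.
    for name in files:
        link = os.path.join(volume_dir, name)
        if not os.path.islink(link):
            os.symlink(os.path.join("..data", name), link)
    return ts_name


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    publish_update(root, {"password": b"old"})
    publish_update(root, {"password": b"new"})
    with open(os.path.join(root, "password"), "rb") as f:
        print(f.read())
```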

Fixes: #7398

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2023-11-17 13:01:23 +05:30
James O. D. Hunt
4a4fc9c648 CODEOWNERS: Expand scope
Improve the `CODEOWNERS` file by specifying more groups.

Since GitHub automatically checks the `CODEOWNERS` file when a PR is
created and adds all matching groups as reviewers for the PR, this may
help reduce the PR backlog since the right people will be alerted and
requested to review the PR. That should improve the quality of reviews
(and thus the quality of the landed code). It may also have a positive
effect on PR velocity.

> **Note:**
>
> This PR combines the other `CODEOWNERS` files so we have
> a single, visible, top-level file.

See: https://github.com/kata-containers/community/issues/253

Fixes: #3804.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-16 16:09:20 +00:00
Liu Wenyuan
c77e990c3e tests: Enable tests for StratoVirt hypervisor
This commit enables StratoVirt hypervisor to be tested in kata GHA,
including k8s, metrics, cri-containerd, nydus and so on.

It also adds some unit tests for StratoVirt to make sure it works.

Fixes: #7794

Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2023-11-16 20:47:26 +08:00
Liu Wenyuan
9542211e71 configuration: add configuration for StratoVirt hypervisor.
Add configuration-stratovirt.toml.in to generate the StratoVirt configuration,
and parser to deliver config to StratoVirt.

Fixes: #7794

Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2023-11-16 20:47:26 +08:00
Liu Wenyuan
561c85be54 build: Makefile for StratoVirt hypervisor
Add support for building StratoVirt hypervisor, including x86_64 and
arm64.

Fixes: #7794

Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2023-11-16 20:47:26 +08:00
Liu Wenyuan
26966c8469 virtcontainers: Add StratoVirt as a supported hypervisor
Initial support of the MicroVM machine type of StratoVirt
hypervisor for the kata go runtime.

Fixes: #7794

Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2023-11-16 20:47:24 +08:00
Xuewei Niu
f18794d880
Merge pull request #8426 from justxuewei/vhost-rm-virtio-net
dragonball: Remove vhost-net dependency on virtio-net
2023-11-15 10:39:27 +08:00
alex.lyn
ba632ba825 runtime-rs: kata with multi-containers sharing one direct volume
When multiple containers in a kata pod share one direct volume,
it's important to make sure that the corresponding block device
is only mounted once in the guest. This means that there should
be only one mount entry for the device in the mount information.

Fixes: #8328

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-15 10:37:01 +08:00
alex.lyn
d7594d830c runtime-rs: correct the path from cid to device_id.
When a direct volume is used by multiple containers in Kata,
generating a separate shared path per container ID (cid) causes
I/O errors, because the same direct volume is mounted more than once.
To correct this, use the device_id instead of the cid, which
ensures that the guest mounts the filesystem only once.
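
A small sketch of why keying the path on the device helps: two containers
that share one device collapse to a single guest mount entry. The base
directory and the `plan_guest_mounts` helper are hypothetical names used only
for illustration.

```python
# Hypothetical guest directory under which direct volumes are mounted.
GUEST_BASE = "/run/kata-containers/sandbox/storage"


def plan_guest_mounts(volumes):
    """Map each guest mount path to the containers using it.

    `volumes` is an iterable of (container_id, device_id) pairs. Because the
    path depends only on device_id, a shared device yields exactly one entry.
    """
    mounts = {}
    for container_id, device_id in volumes:
        path = f"{GUEST_BASE}/{device_id}"
        mounts.setdefault(path, []).append(container_id)
    return mounts


if __name__ == "__main__":
    # Containers c1 and c2 share device "dev-a"; c3 uses its own device.
    for path, users in plan_guest_mounts(
        [("c1", "dev-a"), ("c2", "dev-a"), ("c3", "dev-b")]
    ).items():
        print(path, users)
```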

Fixes: #8328

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-15 10:30:39 +08:00
Fabiano Fidêncio
fd9b6d6837
Merge pull request #7623 from fidencio/topic/runtime-improve-vcpu-allocation-on-host-side
runtime: Improve vCPU allocation for the VMMs
2023-11-14 14:10:54 +01:00
Xuewei Niu
49c2e6e23c dragonball: Remove vhost-net dependency on virtio-net
This patch removes the vhost-net dependency on virtio-net in the
dbs-virtio-devices crate. The vhost-net feature can then be enabled
without enabling the virtio-net device, its error types, etc.

Fixes: #8423

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-14 15:35:10 +08:00
alex.lyn
4d65c2e8a2 runtime-rs: introduce update_device in trait Hypervisor
Introduce `update_device` in the Hypervisor trait to enable
device updates for VMMs. It will initially be used
for virtiofs mount operations.

Fixes: #7915

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-11-14 11:56:36 +08:00
James O. D. Hunt
7f666f783d runtime-rs: ch: Fix TDX
PR #8311 inadvertently broke the runtime-rs / Cloud Hypervisor TDX
handling. It also introduced unrecoverable failure scenarios. Hence,
replace slow, fallible regex matching in logging fast path with single pass
non-failing multi-string log level matching.

Also add a unit test for `parse_ch_log_level()`.
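
To illustrate the idea of non-failing, regex-free level matching, here is a
minimal sketch. It is not the Rust `parse_ch_log_level()` implementation; the
log line layout (a level name followed by `:`) and the default level are
assumptions made for this example.

```python
LEVELS = frozenset({"TRACE", "DEBUG", "INFO", "WARN", "ERROR"})


def parse_log_level(line: str, default: str = "INFO") -> str:
    """Return the first level marker found in `line`, or `default`.

    The line is walked once, token by token; the function never raises,
    so an unexpected line format cannot break the logging path.
    """
    for token in line.split():
        # Assumed format: "LEVEL:path/to/file.rs:NN", so keep the part before ':'.
        candidate = token.split(":", 1)[0]
        if candidate in LEVELS:
            return candidate
    return default


if __name__ == "__main__":
    print(parse_log_level("<vmm> WARN:vmm/src/lib.rs:42 -- something happened"))
    print(parse_log_level("a line without any level marker"))
```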

Fixes: #8418.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-13 08:49:47 +00:00
Xuewei Niu
0a9125e629
Merge pull request #7675 from justxuewei/vhost-net
2023-11-12 20:38:18 +08:00
Xuewei Niu
d1deaf0538 dragonball: Minor changes for a comment from Bian
- Add feature control for InsertNetworkDevice.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:14:10 +08:00
Xuewei Niu
e4f83e27c4 dragonball: vhost-net set_offload with acked features
set_offload() for tap devices depends on acked features.

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:39 +08:00
Xuewei Niu
6cd572dbbb dragonball: Minor changes for Chao's comments
- Remove two panic statements from InsertNetworkDevice test.
- Rename `NUM_QUEUES` to `DEFAULT_NUM_QUEUES`, `QUEUE_SIZE` to
  `DEFAULT_QUEUE_SIZE` for vhost-net and virtio-net.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:39 +08:00
Xuewei Niu
dcdf3c6556 runtime-rs: Supply missing fields of NetworkConfig
`test_networkconfig_to_netconfig` from clh depends on `NetworkConfig` which
has some new fields in this PR. Therefore, this commit gives the test
missing fields.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:39 +08:00
Xuewei Niu
58e9709c1f dragonball: Changes for ZizhengBian's comments
- Dragonball's vhost-net feature no longer depends on the virtio-net feature.
- Remove `TapError` from dbs-virtio-devices's Error, and add two fields,
  `VirtioNet` and `VhostNet`.
- Downgrade visibility of two fields of `VhostNetDeviceMgr` from
  `pub(crate)`.
- File an issue to record a todo for network rate limiter.
- Print internal errors with `{0:?}`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-12 14:10:33 +08:00
Fabiano Fidêncio
5e9cf75937 vc: utils: Rename CalculateMilliCPUs() to CalculateCPUsF()
With the change done in the last commit, instead of calculating milli
cpus, we're actually converting the CPUs to a fraction number, a float.

Let's update the function name (and associated vars) to represent that
change.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-10 18:26:01 +01:00
Fabiano Fidêncio
e477ed0e86 runtime: Improve vCPU allocation for the VMMs
First of all, this is a controversial piece, and I know that.

In this commit we're trying to take a less greedy approach regarding the
amount of vCPUs we allocate for the VMM, which will be advantageous
mainly when using the `static_sandbox_resource_mgmt` feature, which is
used by the confidential guests.

The current approach we have basically does:
* Gets the amount of vCPUs set in the config (an integer)
* Gets the amount of vCPUs set as limit (an integer)
* Sum those up
* Starts / Updates the VMM to use that total amount of vCPUs

The fact we're dealing with integers is logical, as we cannot request
500m vCPUs to the VMMs.  However, it leads us to, in several cases, be
wasting one vCPU.

Let's take the example that we know the VMM requires 500m vCPUs to be
running, and the workload sets 250m vCPUs as a resource limit.

In that case, we'd do:
* Gets the amount of vCPUs set in the config: 1
* Gets the amount of vCPUs set as limit: ceil(0.25)
* 1 + ceil(0.25) = 1 + 1 = 2 vCPUs
* Starts / Updates the VMM to use 2 vCPUs

With the logic changed here, what we're doing is considering everything
as float till just before we start / update the VMM. So, the flow
described above would be:
* Gets the amount of vCPUs set in the config: 0.5
* Gets the amount of vCPUs set as limit: 0.25
* ceil(0.5 + 0.25) = 1 vCPU
* Starts / Updates the VMM to use 1 vCPU

In the way I've written this patch we introduce zero regressions, as
the default values set are still the same, and those will only be
changed for the TEE use cases (although I can see firecracker, or any
other user of `static_sandbox_resource_mgmt=true` taking advantage of
this).

There's, though, an implicit assumption in this patch that we'd need to
make explicit, and that's that the default_vcpus / default_memory is the
amount of vcpus / memory required by the VMM, and absolutely nothing
else.  Also, the amount set there should be reflected in the
podOverhead for the specific runtime class.

One other possible approach, which I am not that much in favour of
taking as I think it's **less clear**, is that we could actually get the
podOverhead amount, subtract it from the default_vcpus (treating the
result as a float), then sum up what the user set as limit (as a float),
and finally ceil the result.  It could work, but IMHO this is **less
clear**, and **less explicit** on what we're actually doing, and how the
default_vcpus / default_memory should be used.
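
The arithmetic in the two flows above can be checked with a short sketch.
The function names are hypothetical; only the numbers (0.5 for the VMM, 0.25
as the workload limit) come from this commit message.

```python
import math


def vcpus_round_then_sum(vmm_vcpus: float, limit_vcpus: float) -> int:
    """Old flow: each value becomes an integer before the sum."""
    return math.ceil(vmm_vcpus) + math.ceil(limit_vcpus)


def vcpus_sum_then_round(vmm_vcpus: float, limit_vcpus: float) -> int:
    """New flow: keep floats until just before the VMM is started / updated."""
    return math.ceil(vmm_vcpus + limit_vcpus)


if __name__ == "__main__":
    print(vcpus_round_then_sum(0.5, 0.25))  # 1 + 1 = 2 vCPUs
    print(vcpus_sum_then_round(0.5, 0.25))  # ceil(0.75) = 1 vCPU
```

With integer defaults (for example 1 vCPU for the VMM) and an integer limit,
both functions return the same value, which matches the claim that the default
configuration sees no regression.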

Fixes: #6909

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2023-11-10 18:25:57 +01:00
Fabiano Fidêncio
b0157ad73a runtime: confidential: Do not set the max_vcpu to cpu
We don't have to do this since we're relying on the
`static_sandbox_resource_mgmt` feature, which gives us the correct
amount of memory and CPUs to be allocated.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-11-10 12:58:20 +01:00
Chao Wu
a62fb83c91
Merge pull request #8169 from openanolis/chao/fix_typo_shm
runtime-rs: fix a typo in shm
2023-11-10 14:00:11 +08:00
gaohuatao
78df1bb851 agent: update AGENT_THREADS metrics value
Fixes: #8369

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2023-11-10 10:39:57 +08:00
Chao Wu
afb002c25c runtime-rs: fix a typo in shm
is_shim_volume should be is_shm_volume in shm_volume mod.

fixes: #8168
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-11-10 10:36:58 +08:00
Archana Shinde
1611723465
Merge pull request #8379 from likebreath/1103/clh_v36.0
Upgrade to Cloud Hypervisor v36.0
2023-11-08 21:10:41 -08:00
Archana Shinde
268d4d622f
Merge pull request #8389 from justxuewei/vm-capable-test
runtime: Fix TestCheckHostIsVMContainerCapable instability issue
2023-11-08 12:14:04 -08:00
Archana Shinde
92a517156c
Merge pull request #8367 from amshinde/add-nerdctl-ipvlan-test
network: Fix network hotplug for ipvlan and macvlan endpoints for qemu and add tests
2023-11-08 11:45:13 -08:00
Chelsea Mafrica
83e731328f
Merge pull request #8023 from cmaf/runtime-rs-ch-pause-resume
runtime-rs: Update status for pause and resume
2023-11-08 11:34:47 -08:00
Xuewei Niu
acd9057c7b runtime: Fix TestCheckHostIsVMContainerCapable instability issue
TestCheckHostIsVMContainerCapable removes sysModuleDir to simulate a
case where the kernel modules are not loaded. However,
checkKernelModules() executes modprobe <module> if a module is not
found in that directory. Loading those modules must therefore be denied
temporarily.

Fixes: #8390

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 22:40:08 +08:00
Fupan Li
100a73d2fd
Merge pull request #7531 from justxuewei/device-cgroup
agent: Restrict device access at upper node of container's cgroup
2023-11-08 22:01:48 +08:00
Xuewei Niu
023d8dc01e agent: Changes according to Pan's comments
- Disable device cgroup restriction while pod cgroup is not available.
- Remove blacklist-related names and change whitelist-related names to
  allowed_all.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:08 +08:00
Xuewei Niu
b5f3a8cb39 agent: Fix container launching failure with systemd cgroup
The FSManager of the systemd cgroup manager is responsible for setting up the
cgroup path. Container launch fails if the FSManager is in
read-only mode.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
6477825195 agent: Minor changes according to Zhou's comments
The changes include:

- Change to debug logging level for resources after processed.
- Remove a todo for pod cgroup cleanup.
- Add an anyhow context to `get_paths_and_mounts()`.
- Remove code which denies access to VMROOTFS since it won't take effect. If
  blacklist mode is in use, the VMROOTFS will be denied by default. Otherwise,
  device cgroups won't be updated in whitelist mode.
- Add a unit test for `default_allowed_devices()`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
cec8044744 agent: Make devcg_info optional for LinuxContainer::new()
runk is a standard OCI runtime that isn't aware of the concept of a sandbox.
Therefore, the `devcg_info` argument of `LinuxContainer::new()` does not
need to be provided.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Xuewei Niu
ef4c3844a3 agent: Restrict device access at upper node of container's cgroup
The goal is to guarantee that containers cannot escape to access extra
devices, like the VM rootfs, etc.

Assume that there is a cgroup, such as `/A/B`. `B` is the container cgroup,
and `A` is what we call the pod cgroup. No matter what permissions are
set for the container (`B`), `A`'s permission is always `a *:* rwm`. As a
result, containers could acquire permission to access other devices
in the VM that do not belong to them.

In order to set devices cgroup properly, the order of setting cgroups is
that the pod cgroup comes first and the container cgroup comes after.

The `Sandbox` has a new field, `devcg_info`, to save cgroup states. To
avoid setting the container cgroup too early, initialization must be done
carefully. `inited`, one of the states, is a boolean indicating whether the pod
cgroup is initialized. If not, the pod cgroup is created first and given
default permissions. After that, the pause container cgroup is created
and inherits the permissions from the pod cgroup.

If whitelist mode, which allows containers to access all devices in the VM, is
enabled, then device resources from the OCI spec are ignored.

This feature does not support systemd cgroup or cgroup v2, since:

- The systemd cgroup implementation in the Agent does not support the devices
  subsystem so far, see: https://github.com/kata-containers/kata-containers/issues/7506.
- Cgroup v2's device controller depends on eBPF programs, which is out of
  scope of cgroup.
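
The ordering described in this message (pod cgroup first, container cgroup
after, guarded by an `inited` flag) can be sketched as below. This is a
simplified model, not the agent's Rust code: the rule strings, the
`applied` log, and `setup_container` are hypothetical, and only `devcg_info`,
`inited` and `allowed_all` echo names from the commit series.

```python
from dataclasses import dataclass, field

# Hypothetical default pod rules: deny everything, then allow /dev/null (c 1:3).
DEFAULT_POD_RULES = ["deny a *:* rwm", "allow c 1:3 rwm"]
ALLOW_ALL_RULES = ["allow a *:* rwm"]


@dataclass
class DevCgInfo:
    allowed_all: bool = False  # "whitelist mode" in the commit message
    inited: bool = False       # has the pod cgroup been set up yet?
    applied: list = field(default_factory=list)  # ordered (cgroup, rules) log


def setup_container(info: DevCgInfo, pod_cgroup: str,
                    container_cgroup: str, oci_rules: list) -> None:
    """Apply device rules, always configuring the pod cgroup before any container."""
    if not info.inited:
        pod_rules = ALLOW_ALL_RULES if info.allowed_all else DEFAULT_POD_RULES
        info.applied.append((pod_cgroup, list(pod_rules)))
        info.inited = True
    # In allow-all mode the device resources from the OCI spec are ignored.
    rules = [] if info.allowed_all else list(oci_rules)
    info.applied.append((container_cgroup, rules))


if __name__ == "__main__":
    info = DevCgInfo()
    setup_container(info, "/A", "/A/B", ["allow c 1:5 rwm"])
    setup_container(info, "/A", "/A/C", [])
    for cgroup, rules in info.applied:
        print(cgroup, rules)
```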

Fixes: #7507

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-08 09:39:07 +08:00
Archana Shinde
a6272733e7 network: Fix network hotplug for ipvlan and macvlan endpoints.
Since moving from network coldplug to hotplug, the only case verified
was veth endpoints. Support for network hotplug for ipvlan and macvlan was
broken/not added. Fix it.

Fixes: #8391

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-07 10:13:51 -08:00
James O. D. Hunt
59d0d4caff runtime-rs: ch: Simplify VSOCK error handling
Remove the redundant `VmConfigError::EmptyVsockSocketPath` error from
the Cloud Hypervisor config crate since this scenario is already handled
by the `VsockConfigError::NoVsockSocketPath` error.

Fixes: #8385.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-07 17:45:38 +00:00
James O. D. Hunt
bdb83f8282 runtime-rs: ch: Remove unused function
Remove the redundant `parse_mac()` function: this was never used and we
already have an implementation in `crates/resource/src/network/utils/mod.rs`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-11-07 17:45:38 +00:00
Xuewei Niu
8ea87405ed runtime-rs: Remove virtio config from Backend
Virtio-net and vhost-net share a common virtio config, and vhost-user-net
uses another config, named `VhostUserConfig`. Thus, the virtio config could
be added into `NetworkConfig` instead of `Backend`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
ad66378bf5 runtime-rs: Move Dragonball stuff out of device drivers
Move Dragonball struct conversions out of device drivers to keep the drivers
neutral. The conversions include `NetworkBackend` to
`DragonballNetworkBackend` and `NetworkConfig` to
`DragonballNetworkConfig`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
3e0614cdf0 dragonball: Minor changes to comments
Changes include:

- Merge `VhostNetDeviceError` import item.
- Replace if with match in `add_vhost_net_device()`

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
a047331a34 runtime-rs: Network config distinguishes backends
Network backends determine the virtio dataplane implementations. Common
protocols include virtio-net, vhost-net and vhost-user-net, etc. Network
config has a new field named `backend` to specify which protocol to use.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Xuewei Niu
9203371833 dragonball: Introduce vhost-net device
PLEASE NOTE THAT this pull request just implements vhost-net support for
Dragonball, and adaptation for the Runtime-rs. And this pull request
DOESN'T provide an option to configure which backend to use. To sum up,
virtio-net, the default backend, is the only choice for the user so far.

This pull request introduces vhost-net device for the Dragonball. In
addition, this pull request includes changes of Runtime-rs to improve
network configuration abilities.

The Dragonball part implements a vhost-net device and a vhost-net device
manager, named `VhostNetDeviceMgr`, to manage vhost-net device.
`NetworkInterfaceConfig` is introduced as a high-level abstraction for network
config. Then, the Dragonball is able to distinguish network backends, e.g.
virtio-net, vhost-net, vhost-user-net(WIP), etc.

The Runtime-rs part adds support of multiple network backends as well.
`NetworkConfig` has a couple of new fields, like `backend`,
`use_shared_irq`, etc. And Dragonball's network config structs
implement the `From` trait, which allows them to be converted from Runtime-rs's
network config conveniently.

Fixes: #7674

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-11-07 19:35:02 +08:00
Beraldo Leal
dd530ba8ee tests: fixes AMD errors
TestCheckHostIsVMContainerCapable is failing on AMD machines.
kata-check_amd64_test.go:96 has no AMD modules, also getCPUType is
missing.

Fixes #8384.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
7641c19f74 runtime: bump containerd for gogo deprecation
This update includes necessary changes due to the version bump of
containerd and its dependencies. It's part of a broader initiative to
phase out gogo protobuf, which has been deprecated, and to align with
the current supported libraries.

Fixes #7420.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:59 +00:00
Beraldo Leal
16fa2c39e6 protocols: replace gogo/types.Empty and Any
by Google versions.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c61f4a8592 protocols: remove unused fieldpath option
The +fieldpath option, specific to gogoprotobuf, enabled dynamic field
access in protobuf messages, allowing nested fields to be accessed via
string paths.

This change is part of a larger effort to transition to the official Go
protobuf library for better maintainability and community support.
Upon review, no instances of dynamic field access were found in the
codebase, confirming that the feature is not in use.

By removing this unused feature, we simplify the build process and make
it easier to complete the transition away from gogoprotobuf.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c87bc60ea0 protocols: removing unused mappings
Those mappings are not used by our .proto files and there is no
difference between .pb.go files generated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
c5d845b30a agent: updating Cargo.lock files
Probably previous changes missed updating Cargo.lock.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Beraldo Leal
5d88c78a6e protocols: generating agent.pb.go
a3b003c345 modified agent but agent.pb.go
was not updated.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-11-06 16:49:58 +00:00
Archana Shinde
036b7787dd runtime-rs: Use PCI path from hypervisor for vfio devices
Remove earlier functionality that tries to assign PCI path to vfio
devices from the host assuming pci slots to start from 1.
Get this from the hypervisor instead.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00
Archana Shinde
c3ce6a1d15 runtime-rs: Provide PCI path to the agent for virtio-block
If the PCI path for a block device is not empty, use
that as the identifier for the agent instead of the virt path, which is valid only
for mmio devices.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00
Archana Shinde
a2bbbad711 runtime-rs: change hypervisor add_device trait to return device copy
Block(virtio-blk) and vfio devices are currently not handled correctly
by the agent as the agent is not provided with correct PCI paths for
these devices.

The PCI paths for these devices can be inferred from the PCI information
provided by the hypervisor when the device is added.
Hence changing the add_device trait function to return a device copy
with PCI info potentially provided by the hypervisor. This can then be
provided to the agent to correctly detect devices within the VM.

This commit includes implementation for PCI info update for
cloud-hypervisor for virtio-blk devices with stubs provided for other
hypervisors.

Removing Vsock from the DeviceType enum as Vsock currently does not
implement the Device Trait, it has no attach and detach trait functions
among others. Part of the reason is because these functions require Vsock
to implement Clone trait as these functions need cloned copies to be
passed down the hypervisor.

The change introduced for returning a device copy from the add_device
hypervisor trait explicitly requires a device to implement
Copy trait. Hence removing Vsock from the DeviceType enum for now, as
its implementation is incomplete and not currently used.

Note, one of the blockers for adding the Clone trait to Vsock is that it
currently includes a file handle which cannot be cloned. For Clone and
Device Traits to be implemented for Vsock, it requires an implementation
change in the future for it to be cloneable.

Fixes: #8283

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-11-05 21:59:44 -08:00
Bo Chen
071667f1ca runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8378

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-11-03 10:47:06 -07:00
Fabiano Fidêncio
40cc397218
Merge pull request #8255 from cmaf/migrate-checks-fixes-links
docs: Fix broken links
2023-11-01 14:46:30 +01:00
Beraldo Leal
afec54799e libs: fixes dereferenced reference
make check is giving us the following error:

error: this expression creates a reference which is immediately
dereferenced by the compiler.

Fixes #8344

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-10-31 15:55:32 -04:00
Beraldo Leal
c57df607ad libs: fixes comparison to empty slice
Make check gives us an "error: comparison to empty slice".

Fixes #8343

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-10-31 15:51:03 -04:00
Fabiano Fidêncio
53cda12a71
Merge pull request #8311 from TimePrinciple/log-system-enhancement
runtime-rs: Log system enhancement
2023-10-31 10:14:41 +01:00
Archana Shinde
148c565b2f
Merge pull request #8289 from BbolroC/skip-create-tmpfs-s390x
agent: Skip flaky create_tmpfs on s390x
2023-10-30 22:26:28 -07:00
Ruoqing He
4ad2cfe0c2 runtime-rs: Log system enhancement
Modify the RuntimeLevelFilter drain to improve logging control:
isolate the effect of logger changes between components, and
log clh messages according to the log levels
reported by cloud-hypervisor.

Fixes: #8310

Signed-off-by: Ruoqing He <linuxwatcher@outlook.com>
2023-10-31 04:57:46 +00:00
David Esparza
2a17d3889e
Merge pull request #8334 from amshinde/ipvlan-nerdctl-fix
network: Fix network attach for ipvlan and macvlan
2023-10-30 16:00:32 -06:00
Chao Wu
7d26604061
Merge pull request #7831 from lisongqian/feat/dragonball_trace
dragonball: add tracing feature for dragonball
2023-10-30 17:27:30 +08:00
James O. D. Hunt
d7e410ad2b
Merge pull request #8314 from jodh-intel/kata-ctl-show-confidential-guest
kata-runtime/kata-ctl: Add security details to output
2023-10-30 07:41:22 +00:00
Songqian Li
2f533c3003 dragonball: add tracing feature for dragonball
This PR adds the tracing capability for dragonball and it depends on the tracing::Subscriber of the upper layer.

Fixes: #7249

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-28 19:52:24 +08:00
Chao Wu
f1f4410537
Merge pull request #7695 from lisongqian/feat/legacy_metrics
dragonball: add metrics support for legacy device
2023-10-28 16:48:57 +08:00
Archana Shinde
f53f86884f network: Fix network attach for ipvlan and macvlan
We used the approach of cold-plugging network interface for pre-shimv2
support for docker. Since the hotplug approach was not required,
we never really got to implementing hotplug support for certain network
endpoints, ipvlan and macvlan being among them.

Since moving to shimv2 interface as the default for
runtime, we switched to hotplugging the network interface for supporting
docker and nerdctl. This was done for veth endpoints only.

Implement the hot-attach apis for ipvlan and macvlan as well to support
ipvlan and macvlan networks with docker and nerdctl.

Fixes: #8333

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-27 21:42:37 -07:00
Peng Tao
52a014d9cd
Merge pull request #8033 from h56983577/6715/shared-mount
agent: use open_tree()/move_mount() to set up bind mounts between containers directly.
2023-10-28 10:57:34 +08:00
Songqian Li
da77b19449 dragonball: output legacy device metrics to runtime
Legacy device manager adds device metrics to METRICS when a device is created and removes metrics when a device is dropped.

Fixes: #7248

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-27 14:09:42 +08:00
Songqian Li
65213e9fbe dragonball: unify the metric interface of legacy device
Fixes: #7248

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-27 14:09:42 +08:00
Archana Shinde
f5c17f89a3
Merge pull request #8250 from amshinde/runtime-rs-clh-config
runtime-rs: Add default configuration file for cloud-hypervisor
2023-10-26 14:54:47 -07:00
Chelsea Mafrica
0608e20a01 docs: Fix broken links
Update broken links so that static checks pass.

Fixes #8254

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-10-26 10:17:01 -07:00
HanZiyao
a3b003c345 agent: support bind mounts between containers
This feature supports creating bind mounts directly between containers through annotations.

Fixes: #6715

Signed-off-by: HanZiyao <h56983577@126.com>
2023-10-26 16:34:50 +08:00
James O. D. Hunt
d707fa2c0d kata-runtime/kata-ctl: Add security details to output
Add the hypervisor security details to the output of the `kata-runtime
env` and `kata-ctl env` commands so the user can see, amongst other
things, the value of `confidential_guest`.

Fixes: #8313.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-25 16:34:42 +01:00
Chao Wu
29d863350f
Merge pull request #7697 from lisongqian/feat/balloon_metrics
dragonball: add metrics support for balloon device
2023-10-25 02:42:14 -05:00
Fabiano Fidêncio
328ba0da99
Merge pull request #7647 from jongwu/use_pcie_virt
AArch64: runtime: use pcie root port to do pci/pcie device hotplug
2023-10-25 09:17:13 +02:00
Archana Shinde
f99de4d5a1 runtime-rs: Make default kernel params as empty
The default kernel params passed to any hypervisor except dragonball are
empty.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-24 15:50:12 -07:00
Archana Shinde
a813012785 runtime-rs: Add default configuration file for cloud-hypervisor
The config template file for clh is in the new format for runtime-rs.
It is a result of merging the new format file and options supported by
cloud-hypervisor.

Some config options from the golang runtime are missing as they may not
be currently supported by the rust runtime. Examples are the
selinux and rate limiting options, which are not currently
supported or verified with the rust runtime.

Fixes: #8249

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-24 15:17:24 -07:00
Songqian Li
dce365d5b4 dragonball: add conditional compilation for BalloonDeviceMetrics
Fixes: #7248

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-24 13:33:39 +08:00
Songqian Li
3819f0ee6f dragonball: output balloon device metrics to runtime
Balloon device manager adds balloon device metrics to METRICS when a device is created and removes metrics when a device is dropped.

Fixes: #7248

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-23 21:15:22 +08:00
Zizheng Bian
7d7c25c1d6 runtime-rs: fix a typo in device manager
Fixes: #8293
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
2023-10-23 20:33:47 +08:00
Hyounggyu Choi
a0746c8d7b agent: Skip flaky create_tmpfs on s390x
This is to skip a flaky test `create_tmpfs()` on s390x until a root cause is identified and fixed.

Fixes: #4248

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-10-23 11:22:14 +02:00
Dan Mihai
732fe163f3
Merge pull request #8229 from microsoft/danmihai1/no-config-toml-endpoints
agent: no endpoint blocking from agent-config.toml
2023-10-20 11:30:43 -07:00
Dan Mihai
52aaf10759 agent: no endpoint blocking from agent-config.toml
Remove the ability to block access to kata agent endpoints by using
agent-config.toml. That functionality is now implemented using the
Agent Policy feature (#7573).

The CCv0 branch relied on blocking endpoints using agent-config.toml
but will set-up an equivalent default policy file instead (#8219).

Fixes: #8228

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-20 02:26:54 +00:00
James O. D. Hunt
9b14dda147 libs: protection: Fix typo in TDX output
Add the missing closing bracket to the output of the TDX details,
so rather than:

```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0"
:                                                                    ^
:                                                           Missing ')' !
```

... we now have:

```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0)"
:                                                                    ^
:                                                                   Aha!
```

Added a unit test for this scenario.
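
A minimal sketch of the kind of formatting code and check involved, assuming a hypothetical `GuestProtection` enum and `Display` impl. The real type in the protection library may be shaped differently, and the `none` string is an assumption:

```rust
use std::fmt;

/// Hypothetical stand-in for the guest protection details shown by `kata-ctl env`.
pub enum GuestProtection {
    NoProtection,
    Tdx { major_version: u32, minor_version: u32 },
}

impl fmt::Display for GuestProtection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // "none" is an assumed spelling for the no-protection case.
            GuestProtection::NoProtection => write!(f, "none"),
            // Note the closing ')' at the end of the format string.
            GuestProtection::Tdx { major_version, minor_version } => write!(
                f,
                "tdx (major_version: {}, minor_version: {})",
                major_version, minor_version
            ),
        }
    }
}

fn main() {
    let tdx = GuestProtection::Tdx { major_version: 1, minor_version: 0 };
    println!("available_guest_protection = \"{}\"", tdx);
    println!("available_guest_protection = \"{}\"", GuestProtection::NoProtection);
}
```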

Fixes: #8257.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-19 16:06:08 +01:00
James O. D. Hunt
9336e2e492
Merge pull request #8155 from jodh-intel/runtime-rs-check-ch-tdx-build-feature
runtime-rs: ch: Add TDX CH features check
2023-10-19 14:13:08 +01:00
James O. D. Hunt
048cc70654
Merge pull request #8213 from jodh-intel/validate-hypervisor-cfg-name
runtime: Validate hypervisor section name in config file
2023-10-19 07:40:58 +01:00
James O. D. Hunt
0e0867f15d runtime-rs: ch: Add TDX CH features check
If you attempt to create a container (a TD) on a TDX system using a
custom build of Cloud Hypervisor (CH) that was not built with the `tdx`
CH feature, Kata will report the following, somewhat cryptic, CH error:

```
ApiError(VmBoot(InvalidPayload))
```

Newer versions of CH now report their build-time features in the ping
API response message so we now use that, if available, to detect this
scenario and generate a user-friendly error message instead.

This change improves the readability of `handle_guest_protection()` and
adds a couple of additional tests for that method.

Fixes: #8152.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-18 18:07:39 +01:00
James O. D. Hunt
409eadddb2 runtime-rs: ch: Improve readability of guest protection checks
Improve the way `handle_guest_protection()` is structured by inverting
the logic and checking the value of the `confidential_guest` setting
before checking the guest protection. This makes the code easier to
understand.
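
A rough sketch of the inverted control flow, checking the `confidential_guest` setting first and the available protection second. The enum, the `bool` return value and most error strings are assumptions made for illustration; only the TDX error text is quoted from the "Enable Intel TDX" commit elsewhere in this log:

```rust
/// Hypothetical protection kinds; only Intel TDX is described as tested.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GuestProtection {
    NoProtection,
    Tdx,
    /// Any protection the CH driver does not expect.
    Other,
}

/// Returns Ok(true) to boot a TD and Ok(false) to boot a plain VM.
pub fn handle_guest_protection(
    confidential_guest: bool,
    available: GuestProtection,
) -> Result<bool, String> {
    // Unexpected protections are rejected regardless of the setting.
    if available == GuestProtection::Other {
        return Err("unexpected guest protection".to_string());
    }

    if confidential_guest {
        match available {
            GuestProtection::Tdx => Ok(true),
            _ => Err("confidential_guest=true but no guest protection is available".to_string()),
        }
    } else {
        match available {
            GuestProtection::Tdx => Err(
                "TDX guest protection available and must be used with Cloud Hypervisor (set 'confidential_guest=true')"
                    .to_string(),
            ),
            _ => Ok(false),
        }
    }
}

fn main() {
    println!("{:?}", handle_guest_protection(true, GuestProtection::Tdx));
    println!("{:?}", handle_guest_protection(false, GuestProtection::NoProtection));
    println!("{:?}", handle_guest_protection(false, GuestProtection::Tdx));
}
```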

> **Notes:**
>
> - This change also unconditionally saves the available guest protection
>   (where previously it was only saved when `confidential_guest=true`).
>   This explains the minor unit test fix.
>
> - This change also errors if the CH driver finds an unexpected
>   protection (since only Intel TDX is currently tested).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-18 18:06:02 +01:00
Jianyong Wu
f9c9d8f645 runtime: QemuVirt: hotadd virtio-mem dev to pcie root port
Hotplug virtio-mem device to pcie root port for Qemu Virt.

Fixes: #7646
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Jianyong Wu
ef18c9550c runtime:qemuvirt: hotadd net dev to pcie root port
Hotplug network device to pcie root port as this is the only way on
QemuVirt.

Fixes: #7646
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Jianyong Wu
f1aec98f9d qemu/virt: use pcie_root_port to do device hotplug for virt
ACPI PCI device hotplug is not supported on qemu virt. The only way to
hotplug a PCI device is the PCIe native way, so we need to create a pcie
root port by default.

The number of pcie root ports depends on the following:
1. one reserved for a network device by default;
2. the virtio-mem dev;
3. enough ports for vhost user blk devs;
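
One possible reading of the three rules above, written as a small function. The function name, its parameters and the exact arithmetic are assumptions; the actual Go code may count ports differently:

```rust
/// Hypothetical helper: how many PCIe root ports to create for a QemuVirt guest.
pub fn pcie_root_port_count(
    network_endpoints: u32,
    has_virtio_mem: bool,
    vhost_user_blk_devs: u32,
) -> u32 {
    // Rule 1: always reserve at least one port for a network device.
    let net_ports = network_endpoints.max(1);
    // Rule 2: one more port if a virtio-mem device will be hot-added.
    let mem_ports = u32::from(has_virtio_mem);
    // Rule 3: one port per vhost-user-blk device.
    net_ports + mem_ports + vhost_user_blk_devs
}

fn main() {
    println!("{}", pcie_root_port_count(0, false, 0));
    println!("{}", pcie_root_port_count(2, true, 3));
}
```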

Fixes: #7646
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Jianyong Wu
28a41e1d16 runtime: add a new API for Network interface
Add GetEndpointsNum API for Network Interface to get the number of
network endpoints. This is used to calculate the number of pcie root
ports for QemuVirt.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-10-18 06:35:57 +00:00
Songqian Li
09d46450f1 dragonball: add metrics support for balloon device
Fixes: #7248

Signed-off-by: Songqian Li <mail@lisongqian.cn>
2023-10-18 14:02:56 +08:00
Fabiano Fidêncio
db37692f36
Merge pull request #8226 from microsoft/danmihai1/policy-typo
policy: allow access to ReseedRandomDev
2023-10-16 19:17:31 +02:00
Peng Tao
45e82b6581
Merge pull request #8192 from bergwolf/github/deps
runtime/kata-ctl: update dependencies
2023-10-16 16:39:17 +08:00
Chao Wu
408b59c02c runtime-rs: fix bugs to support Nydus v5
1. Enable virtio-fs-pro in Dragonball so that it can process the nydus backend registry.
2. Change passthrough for the rw layer's readonly config to false so that the layer is actually readable and writable.

Fixes: #8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Chao Wu
678fe3cd31 Dragonball: fix Nydus config serde problem
Since Nydus snapshotter has been updated in previous commits, the config
passed through to Dragonball during mount_rafs is RafsConfig instead of
ConfigV2, but Dragonball can only deserialize ConfigV2, so it panics.

We need to add support for RafsConfig.

Fixes: #8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Dan Mihai
b6ec621389 policy: allow access to ReseedRandomDev
Allow access to the ReseedRandomDev endpoint by default. Using false
for ReseedRandomDevRequest was unintended.

Fixes: #8225

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-13 21:18:27 +00:00
James O. D. Hunt
3e8cf6959c runtime: Validate hypervisor section name in config file
Previously, if you accidentally modified the name of the hypervisor
section in the config file, the default golang runtime gives a cryptic
error message ("`VM memory cannot be zero`"). This can be demonstrated
using the `kata-runtime` utility program which uses the same golang
config package as the actual runtime (`containerd-shim-kata-v2`):

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ kata-runtime env >/dev/null; echo $?
VM memory cannot be zero
1
```

The hypervisor name is now validated so that the behaviour becomes:

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ ./kata-runtime env >/dev/null; echo $?
/etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo"
1
```
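
The actual check lives in the golang config package and is not shown here. As an illustration only, the same idea can be sketched in Rust, assuming a hypothetical list of known hypervisor names and a naive line-based scan of the TOML text:

```rust
/// Illustrative list only; the set accepted by the real runtime may differ.
const KNOWN_HYPERVISORS: &[&str] = &["qemu", "clh", "firecracker", "dragonball"];

/// Rejects any `[hypervisor.NAME]` section whose NAME is not in the known list.
pub fn validate_hypervisor_sections(path: &str, toml_text: &str) -> Result<(), String> {
    for line in toml_text.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("[hypervisor.") {
            if let Some(name) = rest.strip_suffix(']') {
                if !KNOWN_HYPERVISORS.contains(&name) {
                    return Err(format!(
                        "{}: configuration file contains invalid hypervisor section: {:?}",
                        path, name
                    ));
                }
            }
        }
    }
    Ok(())
}

fn main() {
    let path = "/etc/kata-containers/configuration.toml";
    println!("{:?}", validate_hypervisor_sections(path, "[hypervisor.qemu]\n"));
    println!("{:?}", validate_hypervisor_sections(path, "[hypervisor.foo]\n"));
}
```

Failing early with the section name in the message avoids the misleading "`VM memory cannot be zero`" error that otherwise surfaces much later.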

Fixes: #8212.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-12 13:53:37 +01:00
James O. D. Hunt
45d28998d9
Merge pull request #8149 from jodh-intel/runtime-rs-ch-detect-tdx-version
runtime-rs: ch: Detect Intel TDX version
2023-10-12 10:09:42 +01:00
QuanweiZhou
f904e64155
Merge pull request #8179 from Apokleos/directvol-urlEncode
runtime-rs: use the same base64 as kata-runtime/direct-volume does
2023-10-12 09:04:11 +08:00
James O. D. Hunt
87b760f569 runtime-rs: ch: Detect Intel TDX version
Improve the `GuestProtection` handling to detect the version of
Intel TDX available.

The TDX version is now logged by the Cloud Hypervisor driver.

Fixes: #8147.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-11 09:38:00 +01:00
alex.lyn
73e81f5e39 runtime-rs: unify base64 encoding for direct-volume
Direct-volume needs to use the same base64 character set as
kata-runtime/direct-volume does.

Fixes: #8175

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-10-11 14:00:13 +08:00
Archana Shinde
8d6f7b9096 runtime-rs: Add support for handling vfio device for cloud-hypervisor
This change adds support for adding and removing vfio devices for
cloud-hypervisor.

Fixes: #6691

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-10 12:25:44 -07:00
lisongqian
dbfe6512fc dragonball: vcpu metrics change to be recorded per vcpu
In this commit, the vcpu metrics in Dragonball will be changed to record per-vcpu.

Fixes: #7248

Signed-off-by: lisongqian <mail@lisongqian.cn>
2023-10-10 16:22:40 +08:00
lisongqian
fa60fbe023 dragonball: METRICS is refactored to RwLock<DragonballMetrics>
In this commit, the METRICS is refactored to RwLock<DragonballMetrics>.

Fixes: #7248

Signed-off-by: lisongqian <mail@lisongqian.cn>
2023-10-10 16:22:40 +08:00
Peng Tao
500d1c5cee kata-ctl: update rustls-webpki/webpki dependency
The old ones have security issues.
ref: https://github.com/briansmith/webpki/issues/69

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
d7660d82a0 runtime: unify gopkg.in/yaml.v3 to v3.0.1
The older versions have Denial of Service issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
fc9a107e8e runtime: unify swag and testify dependency
So that we don't need to depend on that many versions of them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
79ebb959c5 runtime: update runc dependency to v1.1.9
To pick up security fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
7f3e8bd65e runtime: unify golang.org/x/text to v0.7.0
The older versions contain security issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
df325ae371 runtime: update golang.org/x/net to v0.7.0
To pick up fix for the following issue:

A maliciously crafted HTTP/2 stream could cause excessive CPU
consumption in the HPACK decoder, sufficient to cause a denial of
service from a small number of small requests.

Fixes: #8190
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:39 +00:00
James O. D. Hunt
b8a46a4b85 runtime-rs: ch: Enable feature
Enable the Cloud Hypervisor driver (the `cloud-hypervisor` build feature) for the rust runtime.

Fixes: #6264.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-05 17:58:39 +01:00
Fabiano Fidêncio
1727487eef agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
This will help to build the agent binary as part of the kata-deploy
localbuild, as we need to pass the DESTDIR to where the agent will be
installed, and also whether we're building the agent with policy support
enabled or not.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 14:18:45 +02:00
Zvonko Kaiser
7c934dc7da gpu: Fix cold-plug of VFIO devices
We need to do proper sandbox sizing when we're doing cold-plug. Introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass through annotations for accumulated CPU, memory and now CDI
devices. With that information sandbox sizing can be derived correctly.

Fixes: #7331

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-09-28 09:49:13 +00:00
Greg Kurz
defbb64ac8
Merge pull request #8036 from rye-stripe/bugfix/overhead-metrics
runtime: fix reading cgroup stats of sandboxes
2023-09-27 19:39:55 +02:00
Archana Shinde
95455e6fe8
Merge pull request #8058 from likebreath/0925/clh_v35.0
Upgrade to Cloud Hypervisor v35.0
2023-09-27 10:39:32 -07:00
Chelsea Mafrica
a49bc68374 runtime-rs: Update status for pause and resume
Pause and resume task do not currently update the status of the
container to paused or running, so fix this. This is specifically for
pausing the task and not the VM.

Fixes #6434

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-09-26 17:22:47 -07:00
James O. D. Hunt
b0a3293d53 runtime-rs: ch: Enable Intel TDX
Allow Cloud Hypervisor to create a confidential guest (a TD or
"Trust Domain") rather than a VM (Virtual Machine) on Intel systems
that provide TDX functionality.

> **Notes:**
>
> - At least currently, when built with the `tdx` feature, Cloud Hypervisor
>   cannot create a standard VM on a TDX capable system: it can only create
>   a TD. This implies that on TDX capable systems, the Kata Configuration
>   option `confidential_guest=` must be set to `true`. If it is not, Kata
>   will detect this and display the following error:
>
>   ```
>   TDX guest protection available and must be used with Cloud Hypervisor (set 'confidential_guest=true')
>   ```
>
> - This change expands the scope of the protection code, changing
>   Intel TDX specific booleans to more generic "available guest protection"
>   code that could be "none" or "TDX", or some other form of guest
>   protection.

Fixes: #6448.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 10:55:25 +01:00
James O. D. Hunt
523399c329 runtime-rs: ch: Add more consts
Introduce a few new constants (for PCI segment count and FS queues) and
move the disk queue constants to `convert.rs` to allow them to be used
there too.

> **Note:**
>
> This change gives the `ShareFs` code its own set of values rather
> than relying on the disk queue constants.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
dea8065811 runtime-rs: ch: Remove unused function
Delete the `handle_pending_devices_after_boot()` function which is no
longer required.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
995f2c015f runtime-rs: ch: Only handle particular pending device types
Modify the Cloud Hypervisor `add_device()` method to add `ShareFs` and
`Network` devices to the list of pending devices since only these two
device types need to be cached before VM startup. Full details in the
comments.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
b1b96a5c49 runtime-rs: ch: Remove erroneous "virtio-blk-mmio" check
Remove the `VIRTIO_BLK_MMIO` check which appears to have been added
erroneously in the first place.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
Bo Chen
dfd0c9fa9a runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8057

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-25 12:22:37 -07:00
Archana Shinde
9bb9a3e7a4
Merge pull request #7966 from amshinde/runtime-rs-network-clh
runtime-rs: Add network support for cloud-hypervisor
2023-09-22 13:08:09 -07:00
Chao Wu
6f98fbafde
Merge pull request #6706 from guixiongwei/feat/thp
feat(runtime-rs): introduce huge page mode to select VM RAM's backend
2023-09-22 15:27:06 +08:00
Peteris Rudzusiks
94e2ccc2d5 runtime: fix reading cgroup stats of sandboxes
The cgroup stats come from resourcecontrol package in the form of pointers
to structs. The sandbox Stat() method incorrectly was expecting structs.
This caused the cpu and memory stats to always be 0, which in turn caused
incorrect pod overhead metrics.

Fixes #8035

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-09-21 17:00:53 +02:00
Alexandru Matei
d507d189bb fc: Add support for noflush cache option
Firecracker supports noflush semantics via the Unsafe cache type.
There is no support for direct I/O, so remove it from the config file.

Fixes: #7823

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Alexandru Matei
2ca781518a clh: Direct IO support for block devices
Clh supports direct I/O for disks. It doesn't
offer any support for noflush, so stop passing
that option to the cloud-hypervisor internal config.

Fixes: #7798

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Wainer Moschetta
87e64a07ed
Merge pull request #7979 from beraldoleal/gogo-removal
protocol: remove gogoprotobuff tests
2023-09-20 22:38:10 -03:00
Beraldo Leal
730ef51693 deps: updating dependencies
Updating dependencies after make check, make test.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 16:54:35 -04:00
Dan Mihai
82ff2db460 runtime: support kernel params including spaces
Support quoted kernel command line parameters that include space
characters. Example:

dm-mod.create="dm-verity,,,ro,0 736328 verity 1
/dev/vda1 /dev/vda2 4096 4096 92041 0 sha256
f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4
065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038"
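
A minimal sketch of quote-aware splitting, assuming double quotes are the only quoting mechanism and that quotes are kept in the resulting parameter. The function name is hypothetical and the real Go implementation is not shown in this log:

```rust
/// Splits a kernel command line on whitespace, except inside double quotes.
pub fn split_kernel_params(cmdline: &str) -> Vec<String> {
    let mut params = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;

    for c in cmdline.chars() {
        match c {
            // A quote toggles quoted mode and stays part of the parameter.
            '"' => {
                in_quotes = !in_quotes;
                current.push(c);
            }
            // Unquoted whitespace ends the current parameter.
            c if c.is_whitespace() && !in_quotes => {
                if !current.is_empty() {
                    params.push(std::mem::take(&mut current));
                }
            }
            _ => current.push(c),
        }
    }
    if !current.is_empty() {
        params.push(current);
    }
    params
}

fn main() {
    let cmdline = "quiet dm-mod.create=\"dm-verity,,,ro,0 736328 verity 1\" panic=1";
    for p in split_kernel_params(cmdline) {
        println!("{}", p);
    }
}
```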

Fixes: #8003

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-19 20:26:38 +00:00
Beraldo Leal
604a9dd673 protocol: remove gogoprotobuff tests
This is part of a bigger effort to drop gogoprotobuff from our code
base. IIUC, those options are basically used by *pb_test.go, and since
we are dropping gogoprotobuff and those are auto generated tests, let's
just remove it.

Fixes #7978.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 12:55:42 -04:00
Fabiano Fidêncio
84c0d59d23
Merge pull request #7985 from fidencio/topic/clh-use-static_sandbox_resource_mgmt-as-default-on-arm
clh: arm: Use static_sandbox_resource_mgmt=true
2023-09-19 09:25:34 +02:00
Fabiano Fidêncio
c3ee913bf6
Merge pull request #7953 from gkurz/extra-monitor-socket
runtime/qemu: Rework QMP/HMP support
2023-09-18 19:04:14 +02:00
Fabiano Fidêncio
72599f1911 clh: arm: Use static_sandbox_resource_mgmt=true
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarch64.

With this patch, when building for x86_64, I can see that this is the
resulting config:
```
$ ARCH=amd64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false

```

And when building for aarch64:
```
$ ARCH=arm64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```

Fixes: #7941

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 14:14:10 +02:00
Jeremi Piotrowski
dfa6af54df
Merge pull request #7806 from jongwu/clh_serial
clh:arm64: use arm AMBA UART for hypervisor debug
2023-09-18 12:29:07 +02:00
Greg Kurz
1f16b6627b runtime/qemu: Rework QMP/HMP support
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.

The HMP monitor allows tampering with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.

The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.

A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine control on whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".

An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.

While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.
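
A sketch of the gating described above, assuming the setting is a string that is empty by default and that the accepted values are `hmp`, `qmp` and `qmp-pretty`. The value names, the function and the choice to silently ignore the setting when `enable_debug` is off are all assumptions:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MonitorProtocol {
    Hmp,
    Qmp,
    QmpPretty,
}

/// Decides whether an extra monitor socket should be created.
pub fn extra_monitor(
    enable_debug: bool,
    extra_monitor_socket: &str,
) -> Result<Option<MonitorProtocol>, String> {
    let proto = match extra_monitor_socket {
        // Default: no extra socket.
        "" => return Ok(None),
        "hmp" => MonitorProtocol::Hmp,
        "qmp" => MonitorProtocol::Qmp,
        "qmp-pretty" => MonitorProtocol::QmpPretty,
        other => return Err(format!("invalid extra_monitor_socket: {}", other)),
    };

    // The setting only takes effect together with enable_debug = true.
    if !enable_debug {
        return Ok(None);
    }

    // An extra monitor socket makes the QEMU instance untrusted, so warn.
    eprintln!("WARNING: extra monitor socket requested, QEMU instance is untrusted");
    Ok(Some(proto))
}

fn main() {
    println!("{:?}", extra_monitor(true, "qmp-pretty"));
    println!("{:?}", extra_monitor(false, "hmp"));
}
```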

This feature is only made visible in the base and GPU configurations
of QEMU for now.

Fixes #7952

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-18 12:13:01 +02:00
Fabiano Fidêncio
0e3bfac3b3
Merge pull request #7976 from fidencio/topic/ci-static-checks-rework-part-0
ci: Rework static checks
2023-09-18 11:01:18 +02:00
Peng Tao
6eedd9b0b9
Merge pull request #7738 from Xuanqing-Shi/7732/handle-non-empty-endpoints-in-RemoveEndpoints
runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
2023-09-18 10:58:28 +08:00
Fabiano Fidêncio
08f2e5ae0b runtime-rs: Ensure static-checks-build is a dep of make test
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `config`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:13 +02:00
Fabiano Fidêncio
2bc3a616ae kata-ctl: Use loop instead of kvm module in tests
This makes it possible to run the tests in the cost free runners, which
are not KVM capable.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:08 +02:00
Fabiano Fidêncio
46daddc500 kata-ctl: Ensure GENERATED_CODE is a dep of make test
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `version`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:01 +02:00
Fabiano Fidêncio
ec826f328f agent: Ensure GENERATED_CODE is a dep of make test
Otherwise `make test` will fail with:
```
error[E0583]: file not found for module `version`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:57 +02:00
Fabiano Fidêncio
473ec87806 kata-ctl: Add kata-types to the Cargo.lock file
Commit message covered everything. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:40 +02:00
Fabiano Fidêncio
ea19549a99 kata-ctl: Ensure GENERATED_CODE is a dep of make check
Otherwise `make check` would fail with:
```
Error writing files: failed to resolve mod `version`:
/home/runner/work/kata-containers/kata-containers/src/tools/kata-ctl/src/ops/version.rs
does not exist make: *** [../../../utils.mk:176: standard_rust_check] Error 1
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:36 +02:00
Archana Shinde
9c233bb9e0 test: Add test to verify try_from for clh Netconfig
Add tests to verify conversion from runtime NetworkConfig
to clh specific config.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-09-16 00:24:14 -07:00
Archana Shinde
9049d311df runtime-rs: Add network support for cloud-hypervisor
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.

Support for adding and removing network devices is not really added to
the resource manager, so supporting this for cloud-hypervisor is not
scoped in this PR.

This also changes "pending_devices" for clh implementation from an
Option of vector to simply a vector. This simplifies the structure a bit
as we can simply iterate over the pending devices instead of having to
check for a "Some" value as this is not really required.

Fixes: #6333

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-09-15 23:25:20 -07:00
Jianyong Wu
241c355e07 clh:arm64: use arm AMBA uart for hypervisor debug
Cloud Hypervisor on arm64 only supports the arm AMBA UART (pl011) as
tty. So, the console should be set to "ttyAMA0" instead of "ttyS0"
when hypervisor debug mode is enabled.
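
The selection can be sketched as a simple match on the target architecture. The function name and the architecture strings are assumptions for illustration; the runtime code itself is Go and is not shown here:

```rust
/// Picks the guest console device used for hypervisor debug output.
pub fn debug_console(arch: &str) -> &'static str {
    match arch {
        // arm64 guests only get a pl011 (AMBA) UART.
        "aarch64" | "arm64" => "ttyAMA0",
        // Everything else keeps the 16550-style serial console.
        _ => "ttyS0",
    }
}

fn main() {
    println!("console={}", debug_console("aarch64"));
    println!("console={}", debug_console("x86_64"));
}
```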

Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-09-15 01:44:23 +00:00
Jeremi Piotrowski
3a1db7a86b runtime: clh: Support enabling iommu
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.

Enabling the IOMMU on the default PCI segment requires enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.

CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
bfc93927fb runtime: Remove redundant check in checkPCIeConfig
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
7c4e73b609 runtime: Add test cases for checkPCIeConfig
These test cases show which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
fc51e4b9eb runtime: Check config for supported CLH (cold|hot)_plug_vfio values
The only supported options are hot_plug_vfio=root-port or no-port.
cold_plug_vfio is not supported yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
509771e6f5 runtime: clh: Add hot_plug_vfio entry to config
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Peng Tao
55ca7e8aec
Merge pull request #7907 from Xuanqing-Shi/7876/network-devices-naming-conflict
runtime: Naming conflict of network devices
2023-09-13 19:29:41 +08:00
shixuanqing
1636abbe1c runtime: issue with non-empty []Endpoint in RemoveEndpoints
In RemoveEndpoints(), when the endpoints parameter isn't empty,
using idx may result in wrong endpoint removals. Passing the endpoint
parameter directly instead helps locate the correct elements
within n.eps.
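
The Go code in `network_linux.go` is not reproduced in this log. As an illustration of matching by endpoint rather than by loop index, here is a sketch with a hypothetical `Endpoint` type; treating an empty list as "remove everything" is an assumption:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub name: String,
}

/// Removes the given endpoints from `attached`, matching by value so that
/// positions in `to_remove` are never confused with positions in `attached`.
pub fn remove_endpoints(attached: &mut Vec<Endpoint>, to_remove: &[Endpoint]) {
    if to_remove.is_empty() {
        // Assumed behaviour: no explicit list means remove all endpoints.
        attached.clear();
        return;
    }
    attached.retain(|ep| !to_remove.contains(ep));
}

fn main() {
    let ep = |n: &str| Endpoint { name: n.to_string() };
    let mut attached = vec![ep("eth0"), ep("eth1"), ep("eth2")];
    remove_endpoints(&mut attached, &[ep("eth1")]);
    println!("{:?}", attached);
}
```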

Fixes: #7732

Signed-off-by: shixuanqing <1356292400@qq.com>

Update src/runtime/virtcontainers/network_linux.go

Co-authored-by: Xuewei Niu <justxuewei@apache.org>
2023-09-13 09:47:18 +00:00
Peng Tao
9766f9090c
Merge pull request #7719 from beraldoleal/nullable
Remove gogoproto.nullable extension
2023-09-13 15:11:56 +08:00
James O. D. Hunt
7feb8de9dc
Merge pull request #7887 from jodh-intel/hypervisor-remove-debug-kernel-options
runtime-rs: hypervisor: Remove debug kernel options
2023-09-12 16:31:48 +01:00
stevenhorsman
a75fd5eb81 runk: Fix rust unnecessary mut error
- Fix `error: variable does not need to be mutable`
in rust 1.72

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
a31c145172 kata-ctl: useless-vec warning
- Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
c8419fc3bb kata-ctl: Resolve non-minimal-cfg warning
- In rust 1.72, clippy warned clippy::non-minimal-cfg
as the cfg has only one condition, so doesn't
need to be wrapped in the any combinator.

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
3eaf68d954 agent-ctl: Allow clippy lint
- Allow `clippy::redundant-closure-call`
which has issues with the guard function passed into
the `run_if_auto_values` macro

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
1d8b78959d runtime-rs: Fix useless-vec warning
Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
99f3d69e94 runtime-rs: Remove mut
Fix `error: variable does not need to be mutable`

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
16fbc27b09 dragonball: Allow ambiguous-glob-reexports
The bindgen generated code is triggering lots of
ambiguous-glob-reexports warnings in rust 1.70+

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
bbf1919516 dragonball: Resolve non-minimal-cfg warning
- In rust 1.72, clippy warned clippy::non-minimal-cfg
as the cfg has only one condition, so doesn't
need to be wrapped in the all combinator.

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
75cfdd5d59 agent: config: Allow clippy lint
- Allow `clippy::redundant-closure-call` in `from_cmdline`
which has issues with the guard function passed into
the `parse_cmdline_param` macro

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
f3a0fd5907 agent: config: Fix useless-vec warning
Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
9e423bd3d6 libs: Fix clippy unnecessary hashes error
- Fix error: unnecessary hashes around raw string literal

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
Yipeng Yin
a16b0962b5 chore(cargo): update cargo lock
Update cargo lock for runtime-rs, agent and kata-ctl.

Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
2023-09-12 15:27:38 +08:00
Chao Wu
c800d0739f
Merge pull request #7889 from UiPath/fix-dragonball-build
dragonball: fix for non-deterministic builds
2023-09-12 14:06:18 +08:00
shixuanqing
ca4b6b051d runtime: Naming conflict of network devices
When creating a new endpoint, we check existing endpoint names and automatically adjust the naming of the new endpoint to ensure uniqueness.
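
The real change lives in the Go runtime; the Rust sketch below only illustrates the idea of adjusting a name until it is unique. The function name and the `-N` suffix scheme are assumptions, not the naming rule used by the commit:

```rust
use std::collections::HashSet;

/// Returns `base` if it is unused, otherwise the first `base-N`
/// (N = 1, 2, ...) that is not already in `existing`.
pub fn unique_endpoint_name(base: &str, existing: &HashSet<String>) -> String {
    if !existing.contains(base) {
        return base.to_string();
    }
    let mut n = 1u32;
    loop {
        let candidate = format!("{base}-{n}");
        if !existing.contains(&candidate) {
            return candidate;
        }
        n += 1;
    }
}
```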

Fixes: #7876

Signed-off-by: shixuanqing <1356292400@qq.com>
2023-09-12 04:29:51 +00:00
Guixiong Wei
202049f35e feat(runtime-rs): introduce huge page type to select VM RAM's backend
This commit allows us to specify the huge page backend when enabling huge
pages. Currently, we support two backends, thp and hugetlbfs; the default
is hugetlbfs.

To ensure backward compatibility, we introduce another configuration item
"hugepage_type" to select the memory backend, which is available only when
"enable_hugepages" is true. Besides, we add an annotation
"io.katacontainers.config.hypervisor.hugepage_type" to configure huge page
type per pod.
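
A sketch of the selection rule described above, under stated assumptions: the type and function names are hypothetical, and only the two backend names, the hugetlbfs default, and the dependency on "enable_hugepages" come from the commit message.

```rust
use std::str::FromStr;

/// The two backends named in the commit message; hugetlbfs is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum HugePageType {
    #[default]
    Hugetlbfs,
    Thp,
}

impl FromStr for HugePageType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "hugetlbfs" => Ok(Self::Hugetlbfs),
            "thp" => Ok(Self::Thp),
            other => Err(format!("unknown hugepage_type: {other}")),
        }
    }
}

/// Returns the backend to use, or `None` when huge pages are disabled,
/// in which case `hugepage_type` is ignored.
pub fn select_backend(
    enable_hugepages: bool,
    hugepage_type: Option<&str>,
) -> Result<Option<HugePageType>, String> {
    if !enable_hugepages {
        return Ok(None);
    }
    match hugepage_type {
        Some(s) => s.parse::<HugePageType>().map(Some),
        None => Ok(Some(HugePageType::default())),
    }
}
```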

Fixes: #6703

Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
2023-09-12 11:28:27 +08:00
Zhongtao Hu
e1f54f96d0
Merge pull request #7766 from Apokleos/wrap-vsock-virtiofs
runtime-rs: bring hybrid vsock devices in manager.
2023-09-12 09:27:34 +08:00
Fabiano Fidêncio
d7f991d139
Merge pull request #7151 from Yuan-Zhuo/fix-systemd-cgroup
agent: optimize the code of systemd cgroup manager
2023-09-11 20:15:51 +02:00
James O. D. Hunt
c0f697fcc5 runtime: Allow kernel_params annotation
To support the removal of the `initcall_debug` and `earlyprintk=`
options from the default guest kernel cmdline, add `kernel_params` to the list
of enabled annotations to allow those kernel options (or others) to be
set using `kata-deploy` for either runtime.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 12:12:12 +01:00
Alexandru Matei
b03e49794e dragonball: fix for non-deterministic builds
Fixes: #7888

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-11 14:07:10 +03:00
James O. D. Hunt
976d10150c runtime-rs: hypervisor: Remove debug kernel options
Removed the following kernel command line options:

- `earlyprintk=ttyS0`
- `initcall_debug`

Both these options are only useful when debugging a guest kernel failure
which is not a common occurrence.

Further, the `earlyprintk=` option can have a large negative performance
impact (it can increase the VM boot time significantly).

If the user wishes to use either of these options, they can add them to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.

Fixes: #7886.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 09:43:39 +01:00
Fabiano Fidêncio
6cd5d83a37
Merge pull request #7865 from gkurz/fix-more-virtiofs-args
runtime: Fix more virtiofs args
2023-09-09 21:30:16 +02:00
Yuan-Zhuo
470d065415 agent: optimize the code of systemd cgroup manager
1. Directly support CgroupManager::freeze through systemd API.
2. Avoid always passing unit_name by storing it into DBusClient.
3. Realize CgroupManager::destroy more accurately by killing the systemd unit rather than stopping it.
4. Ignore no such unit error when destroying systemd unit.
5. Update zbus version and corresponding interface file.

Acknowledgement: error handling for no such systemd unit error refers to

Fixes: #7080, #7142, #7143, #7166

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
2023-09-09 13:56:43 +08:00
Greg Kurz
72c510d057 runtime/virtiofsd: Drop all references to "--cache=none"
This syntax belongs to the legacy C virtiofsd implementation that
we don't support anymore since kata-containers 3.1.3 because
of other API breaking changes.

People have been warned to switch from "none" to "never" since
kata-containers 2.5.2. Let's officially do that.

The compat code that would convert "none" to "never" isn't
needed anymore. Just drop it.

Fixes #7864

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-08 17:57:30 +02:00
Beraldo Leal
ead724bec1 protocol: removing gogo.nullable feature
gogo.nullable is the main gogo.protobuf feature used here. Since we are
trying to remove gogo.protobuf, the first reasonable step seems to be
to remove this feature. This is a core update, and it will change how the
structs are defined. I could spot only a few places using those structs,
based on make check/build.

Fixes #7723.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
d8e4bb9859 protocol: remove unused PROTO_FILE env
There is no reference to PROTO_FILE and this is not working. Also we are
not inside a Makefile, so it makes sense to adapt the usage to reflect the
script instead of a make command.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
5e1106a770 protocol: remove unused import_path
import_path is used as the default package when no input files specify
go_package. However, all the files we are currently building already
have a go_package definition, making this behavior both redundant and
error-prone.

Additionally, one of our files (types.pb.go) resides outside the grpc
directory, indicating that it's indeed ignored but also inconsistent.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
87accaaecb protocol: use workdir during build
Currently, the script searches for .proto files within $GOPATH/.
Consequently, modifications to a definition file in the current working
directory won't influence the output .pb.go if the directory is outside
of $GOPATH. For developers, it's more intuitive to alter the local
codebase than the version stored in $GOPATH.

With this modification, the generated .pb.go files will be relative to
the current working directory, removing the need to clone this project
under $GOPATH/src/github.com/kata-containers.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
711a7ed965 protocol: remove mapping definitions
The definitions are already specified in the .proto files using the
go_package option. Centralizing them in one location reduces the
potential for errors and simplifies the script.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
8db84c1bd2 protocol: force GOPATH to be set
Currently, if GOPATH is not set, errors are raised since protoc uses
GOPATH to find packages.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
68156d77ac protocol: breaking lines to improve readability
Just a small change to improve the readability of modules before the
actual changes.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Chao Wu
cd8c217ee1
Merge pull request #6879 from openanolis/chao/update_upstream_upcall_feature
Dragonball: optimize the placement of dbs-upcall features
2023-09-07 18:07:53 +08:00
Peng Tao
435e890cd9
Merge pull request #7703 from bergwolf/github/nerdctl-fc
runtime: run prestart hooks before starting VM for FC
2023-09-07 10:55:31 +08:00
Chao Wu
deed1b927d Dragonball: optimize the placement of dbs-upcall features
Currently, the dbs-upcall features have two problems that need to be
fixed:

There are redundant dbs-upcall features that need to be removed.
Some places should be controlled by dbs-upcall, but this is not implemented.

This commit will fix those two problems.

fixes: #6878

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-09-07 10:27:29 +08:00
Greg Kurz
81536f21af runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated
with the rust implementation.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-06 17:50:35 +02:00
Fabiano Fidêncio
b1dd09a4d3 runtime: Allow virtio_fs_extra_args annotation
Some use cases may just require passing extra arguments to virtiofsd,
and having this disabled by default makes it impossible to set when
using kata-deploy, as changes in the configuration file would be
overwritten by the daemon-set.

With this in mind, let's allow users to pass whatever they need (and
here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg.

Fixes: #7853

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 17:11:16 +02:00
Zhongtao Hu
aa85e0b3ec
Merge pull request #7714 from justxuewei/volumes-cleanup
runtime-rs: Fix volumes and rootfs cleanup issues
2023-09-06 10:13:55 +08:00
alex.lyn
7870b33a2d runtime-rs: bring hybridVsock devices in manager.
Currently, virtio_vsock devices are still outside of the device
manager. This causes some management issues, such as the
inability to unify PCI address management.

Just do some work for hybrid vsock.

Fixes: #7655

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-09-05 08:46:56 +08:00
Fabiano Fidêncio
27dab249a0
Merge pull request #7800 from jodh-intel/kata-sys-util-update-tdx-protection-checks
kata-sys-util: protection: Update TDX checks
2023-09-02 14:47:51 +02:00
Jiang Liu
57e7bf14a6 agent: refine StorageDeviceGeneric::cleanup()
Refine StorageDeviceGeneric::cleanup() to improve safety.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:22:21 +08:00
Jiang Liu
53edb19374 agent: implement StorageDeviceGeneric::cleanup()
Refactor cleanup_sandbox_storage as StorageDeviceGeneric::cleanup().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:00:26 +08:00
Jiang Liu
0c63453e28 types: make StorageDevice::cleanup() return possible error code
Make StorageDevice::cleanup() return possible error code.

Fixes: #7818

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:27:06 +08:00
Jiang Liu
3a3d77b3b5 agent: move StorageDeviceGeneric from kata-types into agent
Move StorageDeviceGeneric from kata-types into agent, so we can
refactor code later.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:12:17 +08:00
Jiang Liu
d848126b61
Merge pull request #7821 from jiangliu/storage-leak
agent: avoid possible leakage of storage device
2023-09-02 12:40:40 +08:00
Jiang Liu
9cd706d1c9 agent: avoid possible leakage of storage device
When a storage device is used by more than one container, the second
and subsequent instances leak the storage device's reference count,
and thus leak the storage device itself. The reason is:
add_storages() increases the reference count of an existing storage device,
but forgets to add the device to the `mount_list` array, thus leaking the
reference count.
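
The sketch below models the bookkeeping under stated assumptions: `Sandbox`, `remove_storages` and `live_devices` are hypothetical names, and only `add_storages()` and `mount_list` appear in the commit message. The point is that every reference taken must also be recorded in the caller's mount list, otherwise cleanup never releases it:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the agent's sandbox state.
#[derive(Default)]
pub struct Sandbox {
    /// Reference count per storage mount point.
    refs: HashMap<String, u32>,
}

impl Sandbox {
    /// Takes one reference per path and returns the container's mount list.
    pub fn add_storages(&mut self, paths: &[&str]) -> Vec<String> {
        let mut mount_list = Vec::new();
        for p in paths {
            *self.refs.entry(p.to_string()).or_insert(0) += 1;
            // The fix: record the path even when the device already existed,
            // so cleanup later releases the reference this container took.
            mount_list.push(p.to_string());
        }
        mount_list
    }

    /// Releases one reference per entry and drops devices that reach zero.
    pub fn remove_storages(&mut self, mount_list: &[String]) {
        for p in mount_list {
            let remaining = match self.refs.get_mut(p) {
                Some(count) => {
                    *count -= 1;
                    *count
                }
                None => continue,
            };
            if remaining == 0 {
                self.refs.remove(p);
            }
        }
    }

    /// Number of storage devices that still hold at least one reference.
    pub fn live_devices(&self) -> usize {
        self.refs.len()
    }
}
```

If the second container's list omitted the shared path, the device would keep a count of one after both containers were cleaned up.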

Fixes: #7820

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-01 22:52:42 +08:00
Dan Mihai
bf21411e90 tests: add policy to k8s tests
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX

Also, add an example of policy rejecting ExecProcessRequest.

Fixes: #7667

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Dan Mihai
d0e0610679 runtime: config: use the SEV initrd for SNP
Thanks Unmesh Deodhar!

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
67fed26f18 runtime: Use TDX image in the qemu-tdx config
Let's make sure we use the TDX image as part of the QEMU TDX
configuration, which will help us to have the policies tested here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 14:28:08 +00:00
Jeremi Piotrowski
bde06758b1
Merge pull request #7761 from jepio/iocopy-fix-race
runtime: Fix data race in ioCopy
2023-09-01 09:30:54 +02:00
James O. D. Hunt
c290eaed8c kata-sys-util: protection: Update TDX checks
Update the protection checking code to detect newer versions of Intel
TDX (whose userland interface has now stabilised).

> **Note:** that we don't need to retain the existing behaviour since:
>
> - We haven't yet landed the TDX feature (#6448).
> - Systems wishing to use TDX will need to use the latest available
>   system components (such as firmware and host kernel).

Also added an explicit TDX unit test.

Fixes: #7384.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-08-31 16:15:15 +01:00
Jeremi Piotrowski
c2ba29c15b runtime: Fix data race in ioCopy
IoCopy is a tricky function (I don't claim to fully understand its contract),
but here is what I see: The goroutine that runs it spawns 3 goroutines - one
for each stream to handle (stdin/stdout/stderr). The goroutine then waits for
the stream goroutines to exit. The idea is that when the process exits and is
closed, the stdout goroutine will be unblocked and close stdin - this should
unblock the stdin goroutine. The stderr goroutine will exit at the same time as
the stdout goroutine. The iocopy routine then closes all tty.io streams.

The problem is that the stdout goroutine decrements the WaitGroup before
closing the stdin stream, which causes the iocopy goroutine to race to close
the streams. Move the wg.Done() of the stdout routine past the close so that
*this* race becomes impossible. I can't guarantee that this doesn't affect some
unspecified behavior.
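
The fix itself is in Go and uses a WaitGroup. The Rust sketch below is only an analogue of the ordering it establishes, with a channel standing in for `wg.Done()`/`wg.Wait()` and an atomic flag standing in for "stdin is closed"; all names are hypothetical:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Returns whether stdin was already closed when the waiter observed "done".
pub fn stdin_closed_when_done() -> bool {
    let stdin_closed = Arc::new(AtomicBool::new(false));
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let closed = Arc::clone(&stdin_closed);
    let stdout_worker = thread::spawn(move || {
        // 1. Close stdin first (modelled here as setting a flag) ...
        closed.store(true, Ordering::SeqCst);
        // 2. ... and only then signal completion, the analogue of wg.Done().
        done_tx.send(()).expect("waiter is still listening");
    });

    // The waiter blocks until completion is signalled, like wg.Wait().
    done_rx.recv().expect("worker signals before exiting");
    let observed = stdin_closed.load(Ordering::SeqCst);
    stdout_worker.join().expect("worker does not panic");
    observed
}
```

Because the signal is sent after the close, the waiter can never start closing streams while the worker is still closing stdin.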

Fixes: #5031
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-31 10:17:38 +02:00
Peng Tao
2e4c874726 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
If we are running FC hypervisor, it is not started when prestart hooks
are executed. So we should just ignore such error and just go ahead and
run the hooks.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 03:06:11 +00:00
Peng Tao
21204caf20 runtime: fail early when starting docker container with FC
FC does not support network device hotplug. Let's add a check to fail
early when starting containers created by docker.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 02:52:01 +00:00
Peng Tao
32fd013716 runtime: run prestart hooks before starting VM for FC
Add a new hypervisor capability to tell if it supports device hotplug.
If not, we should run prestart hooks before starting new VMs as nerdctl
is using the prestart hooks to set up netns. To make nerdctl + FC
work, we need to run the prestart hooks before starting new VMs.

Fixes: #6384
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 02:52:01 +00:00
Beraldo Leal
00e7ffd988 tests: check vmx only on Intel machines
When running on AMD machines, those tests will fail because there is no
vmx flag. Following other tests that check for cpuType, let's adapt
them to restrict the vmx check to Intel machines.

Fixes #7788.
Related #5066

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 20:04:31 -04:00
Beraldo Leal
80146f2078 tests: Fixes cpuType check on AMD machines
cpuType is not initialized yet, so it gets 0 (Intel) by default, failing on
AMD machines.

Fixes #7785

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 17:04:07 -04:00
Chao Wu
e4fb20c74a
Merge pull request #7585 from lifupan/main
dragonball: vsock add fifo/pipe stream support for passed fd hybridSt…
2023-08-29 23:39:21 +08:00
Fabiano Fidêncio
d1b54ede29 qemu: tdx: Workaround SMP issue with TDX 1.5
`...,sockets=1,cores=numvcpus,threads=1,...` must be used.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Archana Shinde
1e34220c41 qemu: tdx: Adapt to the TDX 1.5 stack
QEMU for TDX 1.5 makes use of private memory map/unmap.
Make changes to govmm to support this. Support for private backing fd
for memory is added as a knob to the qemu config.

Userspace's map/unmap operations are done by fallocate() ioctl on the
backing store fd.
Reference:
https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/

Fixes: #7770

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Zhongtao Hu
f0440a9cfe
Merge pull request #7742 from frezcirno/fix-log-forwarder-loop
runtime-rs: check peer close in log_forwarder
2023-08-26 10:44:09 +08:00
Jiang Liu
91db888d83
Merge pull request #7602 from jiangliu/agent-storage
Refine storage device management for kata-agent
2023-08-25 22:20:18 +08:00
Zixuan Tan
dffc16e5b3 runtime-rs: check peer close in log_forwarder
The log_forwarder task does not check whether the peer has closed, so it
spins in a meaningless loop between “kata vm exit”, when the peer
closes, and “ShutdownContainer RPC received”, which aborts the log forwarder.

This patch fixes the problem.
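
A simplified, synchronous sketch of the check, assuming a blocking `Read` source rather than the async stream the real log forwarder uses; the function name and buffer size are hypothetical. A read that returns zero bytes means the peer has closed, so the loop exits instead of spinning:

```rust
use std::io::{self, Read};

/// Forwards bytes from `src` into `out` until the peer closes the stream.
/// Returns the number of bytes forwarded.
pub fn forward_logs<R: Read>(mut src: R, out: &mut Vec<u8>) -> io::Result<usize> {
    let mut buf = [0u8; 64];
    let mut total = 0;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            // A zero-length read means the peer closed: stop instead of looping.
            break;
        }
        out.extend_from_slice(&buf[..n]);
        total += n;
    }
    Ok(total)
}
```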

Fixes: #7741

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2023-08-25 19:00:07 +08:00
Jiang Liu
aaa5ab1264 agent: simplify storage device by removing StorageDeviceObject
Simplify storage device implementation by removing StorageDeviceObject.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-25 17:23:16 +08:00
Greg Kurz
9991772b26
Merge pull request #7718 from littlejawa/fix_filemode_when_zero
kata-agent: use default filemode for block device when it is set to 0
2023-08-24 11:40:28 +02:00
Jiang Liu
0e7248264d agent: move storage device related code into dedicated files
Move storage device related code into dedicated files.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:48:51 +08:00
Xuewei Niu
268e846558 runtime-rs: Fix volumes and rootfs cleanup issues
There are several processes for container exit:

- Non-detach mode: `Wait` request is sent by containerd, then
  `wait_process()` will be called eventually.
- Detach mode: `Wait` request is not sent, the `wait_process()` won’t be
  called.
    - Killed by ctr: For example, a container runs `tail -f /dev/null`, and
      is killed by `sudo ctr t kill -a -s SIGTERM <CID>`. Kill request is
      sent, then `kill_process()` will be called. User executes `sudo ctr c
      rm <CID>`, `Delete` request is sent, then `delete_process()` will be
      called.
    - Exited on its own: For example, a container runs `sleep 1s`. The
      container’s state goes to `Stopped` after 1 second. User executes
      the delete command as below.

Where do we do container cleanup things?

- `wait_process()`: No, because it won’t be called in detach mode.
- `delete_process()`: No, because it depends on when the user executes the
  delete command.
- `run_io_wait()`: Yes. A container is considered exited once its IO has ended,
  and this is always called once a container is launched.

Fixes: #7713

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-24 13:23:47 +08:00
Jiang Liu
8f49ee33b2 agent: refine storage related code a bit
Refine storage related code by:
- remove the STORAGE_HANDLER_LIST
- define type alias
- move code near to its caller

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:10 +08:00
Jiang Liu
60ca12ccb0 agent: switch to new storage subsystem
Switch to new storage subsystem to create a StorageDevice for each
storage object.

Fixes: #7614

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:09 +08:00
Jiang Liu
fcbda0b419 kata-types: introduce StorageDevice and StorageHandlerManager
Introduce StorageDevice and StorageHandlerManager, which will be used
to refine storage device management for kata-agent.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:08:55 +08:00
Jiang Liu
b03b1f6134 agent: simplify the way to manage storage object
Simplify the way to manage storage objects, and introduce
StorageStateCommon structures for coming extensions.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:58:24 +08:00
Jiang Liu
8392c71bf2 sys-util: support more mount flags in parse_mount_options()
Support more mount flags in parse_mount_options().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:39 +08:00
Jiang Liu
c00d8f3d48 agent: use create_mount_destination() from kata-sys-util
Use create_mount_destination() from kata-sys-util crate to reduce
redundant code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:38 +08:00
Jiang Liu
5e867f0538 types: add more mount related constants
Add more mount related constants.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:36 +08:00
Jiang Liu
880e6c9a76 agent: use function from kata-sys-utils to reduce code
Use function get_linux_mount_info() from kata-sys-util crate to share
common code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:34 +08:00
QuanweiZhou
a6921dd837
Merge pull request #7698 from jiangliu/virtual-volume
kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
2023-08-24 11:50:39 +08:00
Fabiano Fidêncio
7705c5962e
Merge pull request #7728 from ManaSugi/fix/typo-test-toml
libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
2023-08-23 23:55:41 +02:00
Peng Tao
18d42da21e runtime/fc: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Make sure annotations are preferred over config options in image and initrd
path handling.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
9fda7059a5 runtime/clh: fix image/initrd annotation handling
We should make sure annotations are preferred over
config options in image and initrd path handling.

Fixes: #7705
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
1a0092d631 runtime/qemu: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Add a helper function ImageOrInitrdAssetPath to make sure annotations
are preferred over config options in image and initrd path handling.
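
The helper named above is Go code in the runtime. The Rust sketch below only illustrates the precedence rule, and its types, names and exact tie-breaking order are assumptions: any annotation value wins over the configuration file, and only when no annotation is set does the configuration decide.

```rust
/// Which boot asset was selected, with its path.
#[derive(Debug, PartialEq)]
pub enum BootAsset {
    Image(String),
    Initrd(String),
}

/// Image/initrd paths from one source (annotations or the config file).
#[derive(Default)]
pub struct AssetPaths {
    pub image: Option<String>,
    pub initrd: Option<String>,
}

impl AssetPaths {
    /// Picks the image if set, otherwise the initrd, within one source.
    fn pick(&self) -> Option<BootAsset> {
        self.image
            .clone()
            .map(BootAsset::Image)
            .or_else(|| self.initrd.clone().map(BootAsset::Initrd))
    }
}

/// Annotations take precedence; the config file is only a fallback.
pub fn image_or_initrd(annotations: &AssetPaths, config: &AssetPaths) -> Option<BootAsset> {
    annotations.pick().or_else(|| config.pick())
}
```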

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:27 +00:00
Manabu Sugimoto
22d8f335d6 libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
Change `pdisable_guest_seccomp` to `disable_guest_seccomp`

Fixes: #7727

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-23 12:08:18 +09:00
Julien Ropé
40914b25d4 kata-agent: use default filemode for block device when it is set to 0
When the FileMode field for the device is unset (0), use a default value instead
to allow the use of the device from the container.
This behaviour is seen from cri-o typically.

Note: this is what runc is doing, which is why regular containers don't have an
issue. This change makes sure kata behaves the same as runc.
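
A minimal sketch of the fallback. The commit message does not state the default value, so the constant below (`0o666`) and both names are assumptions made for illustration:

```rust
/// Assumed default mode for illustration; the commit message does not give the value.
pub const DEFAULT_BLOCK_FILE_MODE: u32 = 0o666;

/// Returns the mode to apply to the device node: an unset (0) mode falls
/// back to the default, any other value is used as-is.
pub fn effective_file_mode(file_mode: u32) -> u32 {
    if file_mode == 0 {
        DEFAULT_BLOCK_FILE_MODE
    } else {
        file_mode
    }
}
```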

Fixes: #7717

Signed-off-by: Julien Ropé <jrope@redhat.com>
2023-08-22 16:08:14 +02:00
Jiang Liu
4aee3eade0 kata-types: implement serde methods for KataVirtualVolume
Implement serialization/deserialization methods for KataVirtualVolume.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:46:56 +08:00
Jiang Liu
b875e39323 kata-types: validate KataVirtualVolume object
Implement method validate() for KataVirtualVolume to validate message
format.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:42:07 +08:00
Jiang Liu
fa2fdc1057 kata-types: implement two conversion helpers for KataVirtualVolume
Enable conversions from NydusExtraOptions/DirectVolumeMountInfo to
KataVirtualVolume.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:35:26 +08:00
Jiang Liu
6326af20e3 kata-types: introduce KataVirtualVolume
Introduce structure KataVirtualVolume to encapsulate information
for extra mount options and direct volumes, so we could build a common
infrastructure to handle these cases.

Fixes: #7699

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:19:47 +08:00
Dan Mihai
cb056f8cb3 rootfs: agent: Policy support with AGENT_INIT=yes
When building with AGENT_POLICY=yes and AGENT_INIT=yes:
1. Include OPA and the Policy settings in rootfs.
2. Start OPA from the kata agent.

Before these changes, building with both AGENT_POLICY=yes and
AGENT_INIT=yes was unsupported.

Starting OPA from systemd (when AGENT_INIT=no) was already supported.

Fixes: #7615

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-17 22:37:58 +00:00
Wedson Almeida Filho
962378606e
Merge pull request #7627 from wedsonaf/error-conv
agent: simplify error handling
2023-08-16 21:02:38 -03:00
Fabiano Fidêncio
4adcf2192e
Merge pull request #7651 from ManaSugi/runk/containerd-test
runk: Modify kill command's error message for containerd tests
2023-08-16 15:37:48 +02:00
Zhongtao Hu
d90f7ac689 runtime-rs: add unit test for block driver
add unit test for block driver

Fixes: #7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:45:27 +08:00
Zhongtao Hu
e44919f0da runtime-rs: add load_test_config for unit test
add load_test_config for unit test

Fixes: #7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:32:56 +08:00
Zhongtao Hu
7f48a69379 runtime-rs: add driver option
add driver option when handle linux devices

Fixes: #7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:32:49 +08:00
Manabu Sugimoto
25d151bd1b runk: Modify kill command's error message for containerd tests
The error message when the kill command is executed with the container's
state == Stopped should be "container not running" because the containerd
tests expect that OCI runtimes return the error message and compare it.
If the error message is different from the expected one, the tests fail.

Fixes: #7650

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-16 00:39:50 +09:00
Wedson Almeida Filho
76dac8f22c agent: simplify error handling
We extend the `Result` and `Option` types with associated types that
allow converting a `Result<T, E>` and `Option<T>` into
`ttrpc::Result<T>`.

This allows the elimination of many `match` statements in favor of
calling the map function plus the `?` operator. This transformation
simplifies the code.
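
A self-contained sketch of the pattern, assuming extension traits are the mechanism. `RpcError` and `RpcResult` are stand-ins for the ttrpc types, and the trait and method names are hypothetical rather than the ones added by the commit:

```rust
use std::fmt::Display;

/// Stand-in for an RPC status error; not the real ttrpc type.
#[derive(Debug, PartialEq)]
pub struct RpcError {
    pub message: String,
}

pub type RpcResult<T> = Result<T, RpcError>;

/// Extension trait: convert any displayable error into an RPC error.
pub trait ResultToRpc<T> {
    fn map_rpc_err(self, context: &str) -> RpcResult<T>;
}

impl<T, E: Display> ResultToRpc<T> for Result<T, E> {
    fn map_rpc_err(self, context: &str) -> RpcResult<T> {
        self.map_err(|e| RpcError { message: format!("{context}: {e}") })
    }
}

/// Extension trait: convert a missing value into an RPC error.
pub trait OptionToRpc<T> {
    fn ok_or_rpc_err(self, context: &str) -> RpcResult<T>;
}

impl<T> OptionToRpc<T> for Option<T> {
    fn ok_or_rpc_err(self, context: &str) -> RpcResult<T> {
        self.ok_or_else(|| RpcError { message: context.to_string() })
    }
}

/// Handler-style code: two `?`-friendly conversions replace two `match` blocks.
pub fn parse_pid(raw: Option<&str>) -> RpcResult<u32> {
    let text = raw.ok_or_rpc_err("missing pid")?;
    text.parse::<u32>().map_rpc_err("invalid pid")
}
```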

Fixes: #7624

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-15 06:55:27 -03:00
Fabiano Fidêncio
e107d1d94e
Merge pull request #7574 from microsoft/danmihai1/policy
agent: runtime: add Agent Policy feature
2023-08-15 11:29:13 +02:00
Bin Liu
ea81eb6c2e
Merge pull request #7169 from chethanah/runk/support-no-pid-ns
runk: Support without pid ns
2023-08-15 13:00:40 +08:00
Chelsea Mafrica
22465d22f0
Merge pull request #7638 from ManaSugi/fix/virtcontainers-doc
docs: Remove installation step in virtcontainers doc
2023-08-14 10:21:57 -07:00
Dan Mihai
ab829d1038 agent: runtime: add the Agent Policy feature
Fixes: #7573

To enable this feature, build your rootfs using AGENT_POLICY=yes. The
default is AGENT_POLICY=no.

Building rootfs using AGENT_POLICY=yes has the following effects:

1. The kata-opa service gets included in the Guest image.

2. The agent gets built using AGENT_POLICY=yes.

After this patch, the shim calls SetPolicy if and only if a Policy
annotation is attached to the sandbox/pod. When creating a sandbox/pod
that doesn't have an attached Policy annotation:

1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses
   the default agent settings, that might include a default Policy too.

2. If the agent was built using AGENT_POLICY=no, the new sandbox is
   executed the same way as before this patch.

Any SetPolicy calls from the shim to the agent fail if the agent was
built using AGENT_POLICY=no.

If the agent was built using AGENT_POLICY=yes:

1. The agent reads the contents of a default policy file during sandbox
   start-up.

2. The agent then connects to the OPA service on localhost and sends
   the default policy to OPA.

3. If the shim calls SetPolicy:

   a. The agent checks if SetPolicy is allowed by the current
      policy (the current policy is typically the default policy
      mentioned above).

   b. If SetPolicy is allowed, the agent deletes the current policy
      from OPA and replaces it with the new policy it received from
      the shim.

   A typical new policy from the shim doesn't allow any future SetPolicy
   calls.

4. For every agent rpc API call, the agent asks OPA if that call
   should be allowed. OPA allows or rejects a call based on the current
   policy, the name of the agent API, and the API call's inputs. The
   agent rejects any calls that are rejected by OPA.

When building using AGENT_POLICY_DEBUG=yes, additional Policy logging
gets enabled in the agent. In particular, information about the inputs
for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM.
These inputs can be useful for investigating API calls that might have
been rejected by the Policy. Examples:

1. Load a failing policy file test1.rego on a different machine:

opa run --server --addr 127.0.0.1:8181 test1.rego

2. Collect the API inputs from Guest's /tmp/policy.txt and test on the
   machine where the failing policy has been loaded:

curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \
--data-binary @test1-inputs.json

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-14 17:07:35 +00:00
Manabu Sugimoto
416445e7eb docs: Remove installation step in virtcontainers doc
Remove the installation step in the virtcontainers doc
because the virtcontainers install/uninstall targets have
been removed by 86723b51ae
and they are not used anymore.

Fixes: #7637

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-14 15:15:24 +09:00
stevenhorsman
8815ed0665 runtime: Remove config warnings
Remove configuration file shared_fs = none warnings
now that there is a solution to updating configMaps, secrets etc

Fixes: #7210
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-08-11 16:31:08 +01:00
Yohei Ueda
afe1a6ac5a agent: support copying of directories and symlinks
This patch allows copying of directories and symlinks when
static file copying is used between host and guest. This change is
necessary to support recursive file copying between shim and agent.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(cherry picked from commit de232b8030)
2023-08-11 16:31:08 +01:00
Pradipta Banerjee
ab13ef87ee runtime: propagate configmap/secrets etc changes for remote-hyp
For remote hypervisor, the configmap, secrets, downward-api or project-volumes are
copied from host to guest. This patch watches for changes to the host files
and copies the changes to the guest.

Note that configmap updates take significantly longer than updates via downward-api.
This is similar across runc and Kata runtimes.

Fixes: #7210

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Julien Ropé <jrope@redhat.com>
(cherry picked from commit 3081cd5f8e)
(cherry picked from commit 68ec673bc4d9cd853eee51b21a0e91fcec149aad)
2023-08-11 16:31:08 +01:00
Yohei Ueda
c074ec4df1 runtime: Copy shared files recursively
This patch enables recursive file copying
when filesystem sharing is not used.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
(cherry picked from commit 5422a056f2)
(cherry picked from commit 16055ce040bbd724be2916bc518d89b69c9e0ca5)

Fixes: #7210
2023-08-11 16:16:52 +01:00
Peng Tao
a39fd6c066
Merge pull request #7611 from ManaSugi/fix/fc-version
versions: Update firecracker version to 1.4.0
2023-08-11 16:43:37 +08:00
Chao Wu
7031b5db07
Merge pull request #7535 from ManaSugi/fix/allow-redundant-clone
agent: Allow clippy::redundant_clone in the unit tests
2023-08-11 14:17:56 +08:00
Manabu Sugimoto
cc922be5ec versions: Update firecracker version to 1.4.0
This patch upgrades Firecracker version from v1.1.0 to v1.4.0.

* Generate swagger models for v1.4.0 (from `firecracker.yaml`)
  - The version of go-swagger used is v0.30.0
* The firecracker v1.4.0 includes the following changes.
  - Added
    * Added support for custom CPU templates allowing users to adjust vCPU features
    exposed to the guest via CPUID, MSRs and ARM registers.
    * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU
    as Neoverse N1.
    * Added support for the virtio-rng entropy device. The device is optional. A
    single device can be enabled per VM using the /entropy endpoint.
    * Added a cpu-template-helper tool for assisting with creating and managing
    custom CPU templates.
  - Changed
    * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit
    (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process.
  - Fixed
    * Fixed feature flags in T2S CPU template on Intel Ice Lake.
    * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host.
    * Fixed a performance regression in the jailer logic for closing open file
    descriptors.
    * A race condition that has been identified between the API thread and the VMM
    thread due to a misconfiguration of the api_event_fd.
    * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host.
    * Fixed passing through cache information from host in CPUID leaf 0x80000006.
    * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES
    MSR to 1 in accordance with an Intel microcode update.
    * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the
    IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode
    update.
    * Fixed passing through cache information from host in CPUID leaf 0x80000005.
    * Fixed the T2A CPU template to disable SVM (nested virtualization).
    * Fixed the T2A CPU template to set EferLmsleUnsupported bit
    (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:48:13 +09:00
Fupan Li
39e67b06e9 dragonball: vsock add fifo/pipe stream support for passed fd hybridStream
Since the fd passed through the unix socket may be any
stream fd, such as a pipe/fifo fd or another socket
fd, we should handle it as a normal hybrid
stream instead of a unix stream.

Fixes: #7584

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2023-08-10 11:07:10 +08:00
Wedson Almeida Filho
729b2dd611 agent: avoid creating new Vec instances when easily avoidable
There are many places where the code currently creates new `Vec`
instances when it's not really needed. The result is a perf hit because
it allocates memory, copies all elements, then frees the memory; in some
cases, copying elements also involves extra allocations (e.g., when
elements are strings, or structs containing strings).

This patch addresses a number of these cases.
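
As a minimal sketch of the pattern (the helper `total_len` is hypothetical
and not taken from the agent code), a function that only reads its input
can borrow a slice instead of taking an owned `Vec`, so callers do not
need to clone anything to make the call:

```rust
// Hypothetical helper, for illustration only.
// Borrowing `&[String]` means no Vec is allocated or copied for the call.
fn total_len(items: &[String]) -> usize {
    items.iter().map(|s| s.len()).sum()
}
```

A caller holding `let v: Vec<String>` passes `&v` and keeps ownership of
`v` afterwards.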

Fixes: #7203

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-09 02:38:36 -03:00
Jiang Liu
baabfa9f1f agent: refine implementation of mount related code
Refine implementation of mount by:
- log message with `path.display()` instead of `{:?}`
- add prefix "_" to unused variables
- pass by reference instead of by value to avoid creating redundant
  array
- exactly matching prefix "fsgid=" instead of "fsgid"
- avoid redundant clone() operations
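
The exact-prefix point can be sketched as follows. This is an
illustration under assumptions: `parse_fsgid` is a hypothetical name and
the agent's real parsing code may look different.

```rust
// Hypothetical helper, for illustration only.
// Matching "fsgid=" (with the '=') rejects options that merely start
// with the letters "fsgid", such as "fsgidx=1".
fn parse_fsgid(option: &str) -> Option<u32> {
    option.strip_prefix("fsgid=")?.parse().ok()
}
```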

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:03 +08:00
Jiang Liu
98ba211a34 agent: fix a bug in update_ephemeral_mounts()
There's a bug in function update_ephemeral_mounts() which only handles
the first storage object and ignores all other storage objects.

Fixes: #7551

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:02 +08:00
Jiang Liu
5333618d70 agent: make add_storage() take &[Storage] instead of Vec<Storage>
Simplify add_storage() by taking &[Storage] instead of Vec<Storage>.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:01 +08:00
Jiang Liu
37f34781d1 agent: simplify function online_cpu_memory()
Simplify function online_cpu_memory() by only calling update_cpuset_path()
for containers with cpuset configured.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:00 +08:00
Jiang Liu
d3c5422379 agent: refine style of code related to sandbox
Refine style of code related to sandbox by:
- remove unnecessary comments telling the caller to take a lock, since we
  already take `&mut self`.
- change "*count < 1 " to "*count == 0", since `count` is of type u32.
- make remove_sandbox_storage() take `&mut self` instead of `&self`.
- group related functions together
- avoid searching the map twice in function find_process()
- avoid unwrap() in function run_oom_event_monitor()
- avoid unwrap() in online_resources()

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:59 +08:00
Jiang Liu
71a9f67781 agent: avoid unwrap() in function do_remove_container()
Avoid unwrap() in function do_remove_container(), and also make the
implementation symmetric for both timeout and non-timeout cases.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:58 +08:00
Jiang Liu
84badd89d7 agent: avoid clone objects when possible
Optimize agent rpc implementation by:
- avoid cloning objects when possible
- avoid unwrap() when possible
- explicitly drop objects to ensure ordering

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:56 +08:00
Chao Wu
b098960442
Merge pull request #7581 from justxuewei/bump-versions
deps: Bump dependent crate versions
2023-08-08 15:16:57 +08:00
Chao Wu
24bf637835
Merge pull request #7500 from pmores/fix-queue-num-in-dragonball-share-fs
fix number of queues handling in dragonball share fs device
2023-08-08 12:07:25 +08:00
Xuewei Niu
b23c5ed155 deps: Bump dependent crate versions
This pull request is mainly for updating vm-memory and vmm-sys-util.

The affected crates include:

- vm-memory: from 0.9.0 to 0.10.0
- vmm-sys-util: from 0.10.0 to 0.11.0
- virtio-queue: from 0.6.0 to 0.7.0
- fuse-backend-rs: from 0.10.4 to 0.10.5
- linux-loader: from 0.6.0 to 0.8.0
- nydus-api: from 0.3.0 to 0.3.1
- nydus-rafs: from 0.3.1 to 0.3.2
- nydus-storage: from 0.6.3 to 0.6.4

Fixes: #0000

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-08 11:54:09 +08:00
Fupan Li
5a20d8dcaf
Merge pull request #7383 from justxuewei/dan
runtime-rs: Introduce directly attachable network
2023-08-08 09:54:28 +08:00
Wedson Almeida Filho
c36572418f agent: avoid unnecessary calls to Arc::clone
These calls cause two extra atomic instructions each time they're used,
one to increment and another one to decrement the refcount.

Since we don't need them because the referred value is guaranteed to
outlive the function, remove the calls.
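
A rough illustration of the idea (a sketch only; `Sandbox` and
`sandbox_id_len` are made-up names, not the agent's types): a callee that
only reads the value can borrow it, so the caller never touches the
refcount.

```rust
use std::sync::Arc;

// Hypothetical type, for illustration only.
struct Sandbox {
    id: String,
}

// Takes a plain reference: no Arc::clone, so no atomic increment on
// entry and no atomic decrement when the function returns.
fn sandbox_id_len(sandbox: &Sandbox) -> usize {
    sandbox.id.len()
}
```

A caller holding `let s: Arc<Sandbox>` can call `sandbox_id_len(&s)`;
the `&Arc<Sandbox>` is coerced to `&Sandbox` and `Arc::strong_count(&s)`
is unchanged.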

Fixes: #7190

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 20:53:05 -03:00
Wedson Almeida Filho
4fbe0a3a53 runtime: bind-mount mounted block device into container
When the mounted block device isn't a layer, we want to mount it into
containers, but since it's already mounted with the correct fs (e.g.,
tar, ext4, etc.) in the pod, we just bind-mount it into the container.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
7e1b1949d4 runtime: add support for kata overlays
When at least one `io.katacontainers.fs-opt.layer` option is added to
the rootfs, it gets inserted into the VM as a layer, and the file system
is mounted as an overlay of all layers using the overlayfs driver.

Additionally, if the `io.katacontainers.fs-opt.block_device=file` option
is present in a layer, it is mounted as a block device backed by a file
on the host.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6c867d9e86 agent: add io.katacontainers.fs-opt.overlay-rw option
This causes the overlay-fs driver to add the `upperdir` and `workdir`
options to an overlay-fs mount so that the mount becomes writable using
a discardable directory under the container id.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6163c35657 agent: skip mount options that start with "io.katacontainers."
This is so that file systems don't fail when we pass kata-specific
options from the snapshotter to kata.
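
A minimal sketch of such filtering, assuming the options arrive as a
list of strings (`filter_fs_options` and the constant name are
hypothetical, not the agent's actual identifiers):

```rust
// Prefix taken from the commit subject; the constant name is hypothetical.
const KATA_OPTION_PREFIX: &str = "io.katacontainers.";

// Returns only the options that should be handed to the file system,
// dropping every kata-specific option.
fn filter_fs_options<'a>(options: &[&'a str]) -> Vec<&'a str> {
    options
        .iter()
        .copied()
        .filter(|opt| !opt.starts_with(KATA_OPTION_PREFIX))
        .collect()
}
```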

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Fabiano Fidêncio
fa35afa982
Merge pull request #7542 from wedsonaf/ci-fix
Use version 0.10.4 of `fuse-backend-rs`
2023-08-03 22:50:11 +02:00
Wedson Almeida Filho
b2ff97aa01 dragonball: use version 0.10.4 of fuse-backend-rs
Version 0.10.5, which was just released, breaks `nydus-storage`.

This is a workaround to fix the CI which is blocking other PRs.

Fixes: #7541

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 14:15:17 -03:00
Manabu Sugimoto
845eeb4d7b agent: Allow clippy::redundant_clone in the unit tests
Allow `clippy::redundant_clone` in the agent's unit tests
because rustc>=1.70 reports false positives there.
These `clone()` calls are required because the code that follows
still refers to the variable, but clippy flags them by mistake
because of its conservative and limited analysis.
Ref. https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_clone

Fixes: #7534

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-03 19:07:40 +09:00
Xuewei Niu
3958a39d07 runtime-rs: Introduce directly attachable network
As VM-based containers, Kata Containers are allowed to run in the host
netns. That is, the network can be isolated at L2. Network
performance benefits from this architecture, which eliminates as many
hops as possible. We call it a Directly Attachable Network (DAN for
short).

The network devices are placed in the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in JSON format,
including device name, type, and network info. At this early stage,
DAN only supports host tap devices. More devices, like DPDK, will
be supported in later versions.

The file format looks like this:

```json
{
	"netns": "/path/to/netns",
	"devices": [{
		"name": "eth0",
		"guest_mac": "xx:xx:xx:xx:xx",
		"device": {
			"type": "vhost-user",
			"path": "/tmp/test",
			"queue_num": 1,
			"queue_size": 1
		},
		"network_info": {
			"interface": {
				"ip_addresses": ["192.168.0.1/24"],
				"mtu": 1500,
				"ntype": "tuntap",
				"flags": 0
			},
			"routes": [{
				"dest": "172.18.0.0/16",
				"source": "172.18.0.1",
				"gateway": "172.18.31.1",
				"scope": 0,
				"flags": 0
			}],
			"neighbors": [{
				"ip_address": "192.168.0.3/16",
				"device": "",
				"state": 0,
				"flags": 0,
				"hardware_addr": "xx:xx:xx:xx:xx"
			}]
		}
	}]
}
```

Fixes: #1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-03 15:33:34 +08:00
Zhongtao Hu
e719423262
Merge pull request #7127 from cmaf/runtime-rs-ch-blk-2
runtime-rs: Add block device handling for cloud hypervisor
2023-08-03 09:46:32 +08:00
Zvonko Kaiser
cf8899f260
Merge pull request #7494 from zvonkok/vfio-mode
vfio: Fix vfio device ordering
2023-08-02 19:45:22 +02:00
Chelsea Mafrica
a81ad3b587 runtime-rs: Add block device handling in cloud hypervisor
Add functions for adding a block device to a container for CH.

Fixes #6690

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-08-02 09:18:48 -07:00
Fupan Li
1a6b27bf6a
Merge pull request #5797 from Yuan-Zhuo/add-metrics-for-runtime-rs
runtime-rs: add support for gather metrics in runtime-rs
2023-08-02 13:40:22 +08:00
Fupan Li
a536d4a7bf
Merge pull request #6672 from Yuan-Zhuo/add-monitor-in-kata-ctl
kata-ctl: add monitor subcommand for runtime-rs
2023-08-02 13:39:02 +08:00
Pavel Mores
28e5e9c86e runtime-rs: fix number of queues handling in dragonball share fs device
Looks like a copy/paste error...

Fixes #7501

Signed-off-by: Pavel Mores <pmores@redhat.com>
2023-07-31 17:25:47 +02:00
Zvonko Kaiser
cddcde1d40 vfio: Fix vfio device ordering
If modeVFIO is enabled, we need to attach the VFIO control group
device /dev/vfio/vfio first and the actual device(s) afterwards. Sort the
devices so that device #1 is the VFIO control group device, followed by
the actual device(s)
/dev/vfio/<group>
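
The ordering can be sketched as below. The runtime itself is written in
Go; this Rust version is only an illustration, and `order_vfio_devices`
is a hypothetical name.

```rust
const VFIO_CONTROL_DEVICE: &str = "/dev/vfio/vfio";

// Moves the VFIO control device to the front. The key is `false` for the
// control device and `true` for everything else; `false` sorts first and
// the sort is stable, so the group devices keep their relative order.
fn order_vfio_devices(devices: &mut [String]) {
    devices.sort_by_key(|path| path.as_str() != VFIO_CONTROL_DEVICE);
}
```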

Fixes: #7493

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-31 11:26:27 +00:00
Jiang Liu
b3901c46d6 runtime-rs: ignore errors during clean up sandbox resources
Ignore errors during clean up sandbox resources as much as we can.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-31 13:07:43 +08:00
Jiang Liu
62e328ca5c runtime-rs: refine implementation of TaskService
Refine implementation of TaskService, making handler_message() a
method.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:33 +08:00
Jiang Liu
458e1bc712 runtime-rs: make send_message() as an method of ServiceManager
Simplify implementation by making send_message() a method of
ServiceManager.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:31 +08:00
Jiang Liu
1cc1c81c9a runtime-rs: fix possibe bug in ServiceManager::run()
Multiple instances of the task service may get registered by
ServiceManager::run(). Fix this by making the operation symmetric.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:30 +08:00
Jiang Liu
1a5f90dc3f runtime-rs: simplify implementation of service crate
Simplify implementation of service crate.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:28 +08:00
Yuan-Zhuo
731e7c763f kata-ctl: add monitor subcommand for runtime-rs
The previous kata-monitor in golang could not communicate with runtime-rs
to gather metrics due to different sandbox addresses.
This PR adds the subcommand monitor in kata-ctl to gather metrics from
runtime-rs and monitor itself.

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:30:08 +08:00
Yuan-Zhuo
d74639d8c6 kata-ctl: provide the global TIMEOUT for creating MgmtClient
Several functions in kata-ctl need to establish a connection with runtime-rs through MgmtClient.
This PR provides a global TIMEOUT to avoid multiple definitions.

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:23:37 +08:00
Yuan-Zhuo
02cc4fe9db runtime-rs: add support for gather metrics in runtime-rs
1. Implemented metrics collection for runtime-rs shim and dragonball hypervisor.
2. Described the currently supported metrics in runtime-rs (docs/design/kata-metrics-in-runtime-rs.md).

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:16:51 +08:00
Zhongtao Hu
61a8eabf8e
Merge pull request #7139 from openanolis/fix/devmanager
runtime-rs: change block index to 0
2023-07-28 14:04:19 +08:00
Chelsea Mafrica
e941b3a094
Merge pull request #7456 from alakesh/agent-fix-typo
agent: fix typo in constant
2023-07-27 09:31:24 -07:00
Zhongtao Hu
c8fcd29d9b runtime-rs: use device manager to handle virtio-pmem
use device manager to handle virtio-pmem device

Fixes: #7119
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-07-27 20:18:49 +08:00
Zhongtao Hu
901c192251 runtime-rs: support configure vm_rootfs_driver
support configure vm_rootfs_driver in toml config

Fixes: #7119
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-07-27 20:12:53 +08:00
Zhongtao Hu
5d6199f9bc runtime-rs: use device manager to handle vm rootfs
Use device manager to handle vm rootfs. After attaching the block device of
vm rootfs, we need to increase the index number.

Fixes: #7119
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-07-27 20:12:45 +08:00
James O. D. Hunt
20f1f62a2a runtime-rs: change block index to 0
Change block index in SharedInfo to 0 for vda.

Fixes #7119

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-07-27 20:11:44 +08:00
Fabiano Fidêncio
8a22b5f075
Merge pull request #7439 from ManaSugi/fix/remove-unused-mut
agent,libs: Remove unused 'mut' keywords
2023-07-26 21:25:41 +02:00
Fabiano Fidêncio
9792ac49fe
Merge pull request #7425 from jongwu/remove_mut
runtime-rs: remove unneeded 'mut' keywords
2023-07-26 21:24:40 +02:00
Alakesh Haloi
314aec73d4 agent: fix typo in constant
It fixes a constant name to have the right spelling

Fixes: #7457
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
2023-07-26 00:06:34 -05:00
Eric Ernst
5385ddc560
Merge pull request #7365 from alakesh/symlink-fix
agent: exclude symlinks from recursive ownership change
2023-07-25 11:27:48 -07:00
GabyCT
7a3b55ce67
Merge pull request #7432 from ManaSugi/runk/doc-docker
runk: Add Docker guide to README
2023-07-25 09:56:02 -06:00
Manabu Sugimoto
ff4cfcd8a2 runk: Add Docker guide to README
`runk` can launch containers using Docker, so add the guide
to its README.

```sh
$ sudo dockerd --experimental --add-runtime="runk=/usr/local/bin/runk"
$ sudo docker run -it --rm --runtime runk busybox echo hello runk
hello runk
```

Fixes: #7431

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-25 20:10:49 +09:00
Manabu Sugimoto
b9f100b391 agent,libs: Remove unused 'mut' keywords
Remove unused `mut` because the agent compilation fails
when the rust compiler is >= 1.71. This is related to #7425

Fixes: #7438

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-25 17:41:08 +09:00
Fabiano Fidêncio
5ce0b4743f
Merge pull request #7382 from zvonkok/vfio-ap-debug
s390x: Fixing device.Bus assignment
2023-07-25 08:26:25 +02:00
Jianyong Wu
2c8f83424d runtime-rs: remove unneeded 'mut' keywords
These unneeded 'mut' keywords block builds with Rust 1.71.0. Remove them.

Fixes: #7424
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-07-24 08:47:15 +00:00
Zvonko Kaiser
1fc715bc65 s390x: Add AP Attach/Detach test
Now that we have proper AP device support, add a
unit test for testing the correct Attach/Detach of AP devices.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-23 13:44:19 +00:00
Zvonko Kaiser
545de5042a vfio: Fix tests
Now with more elaborate checking of cold|hot plug ports
we needed to update some of the tests.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:44 +00:00
Zvonko Kaiser
62aa6750ec vfio: Added better handling of VFIO Control Devices
Depending on the vfio_mode we need to mount the
VFIO control device additionally into the container.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:42 +00:00
Zvonko Kaiser
dd422ccb69 vfio: Remove obsolete HotplugVFIOonRootBus
Remove HotplugVFIOonRootBus, which is obsolete with the latest PCI
topology changes. Users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:25:40 +00:00
Zvonko Kaiser
114542e2ba s390x: Fixing device.Bus assignment
The device.Bus was reset if a specific combination of
configuration parameters was not met. With the new
PCIe topology this should not happen anymore.

Fixes: #7381

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:24:26 +00:00
Alakesh Haloi
371a118ad0 agent: exclude symlinks from recursive ownership change
Currently, when fsGroup is used with direct-assign, the kata agent
recursively changes ownership and permissions for each file, including
symlinks. However, the problem with symlinks is that the permissions of
the symlink itself may not be the same as those of the underlying file.
So while doing recursive ownership and permission changes we should skip
symlinks.
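
A minimal sketch of a recursive walk that skips symlinks, using only the
standard library. `chown_targets` is a hypothetical helper that merely
collects the paths that would be changed; it is not the agent's
implementation and does not change any ownership itself.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Collects every entry under `root` that is not a symlink.
fn chown_targets(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        // symlink_metadata() inspects the link itself and does not follow it.
        let meta = fs::symlink_metadata(&path)?;
        if meta.file_type().is_symlink() {
            continue; // leave symlinks untouched
        }
        if meta.is_dir() {
            chown_targets(&path, out)?;
        }
        out.push(path);
    }
    Ok(())
}
```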

Fixes: #7364
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
2023-07-19 20:42:55 -07:00
Chao Wu
bbd3c1b6ab Dragonball: migrate dragonball-sandbox crates to Kata
In order to make it easier for developers to contribute to Dragonball,
we have decided to migrate all dragonball-sandbox crates to Kata.

fixes: #7262

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-07-19 19:41:57 +08:00
Chao Wu
935432c36d
Merge pull request #7352 from justxuewei/exec-hang
agent: Fix exec hang issues with a backgroud process
2023-07-18 23:02:18 +08:00
Fabiano Fidêncio
25d80fcec2
Merge pull request #6993 from zvonkok/kata-agent-init-mount
agent: Ignore already mounted dev/fs/pseudo-fs
2023-07-18 14:11:44 +02:00
Zhongtao Hu
d50f3888af
Merge pull request #7219 from Apokleos/network-refactor
runtime-rs: enhancement of Device Manager for network endpoints.
2023-07-17 14:13:51 +08:00
QuanweiZhou
ce14f26d82
Merge pull request #5450 from openanolis/trace_rs
feat(Tracing): tracing in Rust runtime
2023-07-17 09:27:13 +08:00
Manabu Sugimoto
f1d8de9be6 runk: Allow runk to launch a container without pid namespace
Allow runk to launch a container even when users don't specify the
pid namespace in `config.json`, because general container runtimes
such as runc can also launch a container without the namespace.
Kata Containers, on the other hand, doesn't allow it for security reasons,
so this feature should be enabled only in runk.

Fixes: #7168

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-16 23:31:14 +05:30
Zhongtao Hu
419f8a5db7
Merge pull request #7021 from cheriL/7020/ignore-unconfigured-netinterface
runtime-rs: ignore unconfigured network interfaces
2023-07-16 10:11:15 +08:00
Xuewei Niu
6c91af0a26 agent: Fix exec hang issues with a backgroud process
Issue #4747 and pull request #4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in #7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.

Fixes: #4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-07-16 08:32:45 +08:00
Chao Wu
9b3dc572ae
Merge pull request #7018 from nubificus/feat_bindmount_propagation
runtime-rs: add parameter for propagation of (u)mount events
2023-07-14 15:21:41 +08:00
Archana Shinde
b9b8ccca0c
Merge pull request #7236 from amshinde/move-guestprotection
kata-ctl: Move GuestProtection code to kata-sys-util
2023-07-13 23:50:17 -07:00
soup
150e54d02b runtime-rs: ignore unconfigured network interfaces
Fixes: #7020

Signed-off-by: soup <lqh348659137@outlook.com>
2023-07-14 14:16:03 +08:00
Anastassios Nanos
6787c63900 runtime-rs: add parameter for propagation of (u)mount events
Add an extra parameter in `bind_mount_unchecked` to specify
the propagation type: "shared" or "slave".

Fixes: #7017

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2023-07-13 15:58:22 +00:00
Archana Shinde
62080f83cb kata-sys-util: Fix compilation errors
Fix compilation errors for aarch64 and s390x

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:09:43 +05:30
Archana Shinde
02d99caf6d static-checks: Make cargo clippy pass.
Get rid of cargo clippy warnings.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
9824206820 agent: Make the static checks pass for agent
The static checks for the agent require Cargo.lock to be updated.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
61e4032b08 kata-ctl: Remove all utility functions to get platform protection
Since these have been added to kata-sys-util, remove these from
kata-ctl. Change all invocations to get platform protection to make use
of kata-sys-util.

Fixes: #7144

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
a24dbdc781 kata-sys-util: Move utilities to get platform protection
Add utilities to get platform protection to kata-sys-util

Fixes: #7144

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
dacdf7c282 kata-ctl: Remove cpu related functions from kata-ctl
Remove cpu related functions which have been moved to kata-sys-util.
Change invocations in kata-ctl to make use of functions now moved to
kata-sys-util.

Signed-off-by: Nathan Whyte <nathanwhyte35@gmail.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
f5d1957174 kata-sys-util: Move additional functionality to cpu.rs
Make certain imports architecture specific as these are not used on all
architectures.
Move additional constants and functionality to cpu.rs.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Nathan Whyte
304b9d9146 kata-sys-util: Move CPU info functions
Move get_single_cpu_info and get_cpu_flags into kata-sys-util.
Add new functions that get a list of flags and check if a flag
exists in that list.

Fixes #6383

Signed-off-by: Nathan Whyte <nathanwhyte35@gmail.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Zhongtao Hu
b69cdb5c21
Merge pull request #7286 from xuejun-xj/xuejun/up-fix
dragonball/agent: Add some optimization for Makefile and bugfixes of unit tests on aarch64
2023-07-13 09:39:23 +08:00
alex.lyn
283f809dda runtime-rs: Enhancing Device Manager for network endpoints.
Currently, network endpoints are separate from the device manager
and need to be included for proper management. In order to do so,
we need to refactor the implementation of the network endpoints.

The first step is to restructure the NetworkConfig and NetworkDevice
structures.
Next, we will implement the virtio-net driver and add the Network
device to the Device Manager.
Finally, we'll unify entries with do_handle_device for each endpoint.

Fixes: #7215

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-07-12 11:27:12 +08:00
xuejun-xj
a65291ad72 agent: rustjail: update test_mknod_dev
When running cargo test in container, test_mknod_dev may fail sometimes
because of "Operation not permitted". Change the device path to
"/dev/fifo-test" to avoid this case.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
46b81dd7d2 agent: clippy: fix cargo clippy warnings
Replace "if let Ok(_) = ..." with ".is_ok()" method.
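
For illustration, the rewrite looks like this on a made-up helper
(`parses_as_port` is hypothetical, not a function from the agent):

```rust
// Before (flagged by clippy's redundant_pattern_matching lint):
//     if let Ok(_) = s.parse::<u16>() { true } else { false }
// After: ask the Result directly whether it is Ok.
fn parses_as_port(s: &str) -> bool {
    s.parse::<u16>().is_ok()
}
```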

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
c4771d9e89 agent: Makefile: enable set SECCOMP dynamically
Change ":=" to "?=".

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
883b4db380 dragonball: fix cargo test on aarch64
1. Update memory end assert because address space layout differs between
x86 and arm.
2. Set guest_addr for aarch64 in test_handler_insert_region case.

Fixes: #7284
TODO: #7290

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:31 +08:00
Xuewei Niu
6822029c81 runtime-rs: Do not scan network if network model is "none"
Skip scanning the network from the netns if the network model is set to
"none".

Fixes: #7305

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-07-12 10:00:50 +08:00
xuejun-xj
aedc586e14 dragonball: Makefile: add coverage target
Add "coverage" target to compute code coverage for dragonball.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-11 14:36:25 +08:00
Yushuo
28c29b248d bugfix: plus default_memory when calculating mem size
We've noticed this caused regressions with the k8s-oom tests, and then
decided to take a step back and do this in the same way it was done
before 67972ec48a.

Moreover, this step back is also more reasonable in terms of the
controlling logic.

And by doing this we can re-enable the k8s-oom.bats tests, which is done
as part of this PR.

Fixes: #7271
Depends-on: github.com/kata-containers/tests#5705

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-07-10 15:53:04 +08:00
Ji-Xinyou
ed23b47c71 tracing: Add tracing to runtime-rs
Introduce tracing into runtime-rs. Only some functions are instrumented.

Fixes: #5239

Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-07-09 22:09:43 +08:00
Fabiano Fidêncio
96e9374d4b dragonball: Don't fail if a request asks for more CPUs than allowed
Let's take the same approach as the go runtime instead, and allocate
the maximum allowed number of vcpus.
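
In other words, the request is clamped rather than rejected. A minimal
sketch, with a hypothetical function name and parameters:

```rust
// Hypothetical helper, for illustration only.
// Instead of returning an error when `requested` exceeds the limit,
// fall back to the largest value that is allowed.
fn effective_vcpus(requested: u32, max_allowed: u32) -> u32 {
    requested.min(max_allowed)
}
```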

Fixes: #7270

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 15:50:23 +02:00
Fabiano Fidêncio
275c84e7b5 Revert "agent: fix the issue of exec hang with a backgroud process"
This reverts commit 25d2fb0fde.

We're reverting the commit to check whether
it's the cause of the regression in the devmapper tests.

Fixes: #7253
Depends-on: github.com/kata-containers/tests#5705

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 14:27:40 +02:00
Zvonko Kaiser
f72cb2fc12 agent: Remove shadowed function, add slog-term
Remove shadowed get_mounts(), added slog-term as a new crate,
slog can directly log to stdout and we can capture output
in the test-cases that are created in the function to be tested.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-07 11:28:14 +00:00
Zvonko Kaiser
07810bf71f agent: Ignore already mounted dev/fs/pseudo-fs
When using an initrd and setting KATA_INIT=yes, meaning the kata-agent
is the init process, we need to make sure that the agent does not segfault
if mounts have already happened. Some workloads need to configure several
things in the initrd before the kata-agent starts, which involves having
/proc or /sys already mounted.

Fixes: #6992

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-07 07:36:04 +00:00
Bin Liu
f214058b07
Merge pull request #7202 from wedsonaf/macros
Convert `is_allowed`, `ttrpc_error` and `sl` to functions
2023-07-04 14:23:08 +08:00
Peng Tao
581be92b25
Merge pull request #4492 from zvonkok/pcie-topology
runtime: fix PCIe topology for GPUDirect use-case
2023-07-03 09:17:12 +08:00
Fabiano Fidêncio
6a21e20c63 runtime: Add "none" as a shared_fs option
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.

This is *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.

However, there are still use-cases where users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.

For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.

Fixes: #7207

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-30 20:45:00 +02:00
Zvonko Kaiser
0f454d0c04 gpu: Fixing typos for PCIe topology changes
Some comments and functions had typos and wrong capitalization.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-30 08:42:55 +00:00
Fupan Li
4288b935e1
Merge pull request #7104 from openanolis/physical/endpoint
runtime-rs:  support physical endpoint using device manager
2023-06-29 14:43:44 +08:00
GabyCT
19890133e9
Merge pull request #7189 from Apokleos/direct-vol-bugfix
runtime-rs: bugfix for direct volume path's validation.
2023-06-28 12:26:22 -06:00
Wedson Almeida Filho
0504bd7254 agent: convert the sl macros to functions
There is nothing in them that requires them to be macros. Converting
them to functions allows for better error messages.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
0860fbd410 agent: convert the ttrpc_error macro to a function
There is nothing in it that requires it to be a macro. Converting it to
a function allows for better error messages.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
0e5d6ce6d7 agent: convert the is_allowed macro to a function
Having a function allows for better error messages from the type checker
and it makes it clearer to callers what can happen. For example:

is_allowed!(req);

Gives no indication that it may result in an early return, and no simple
way for callers to modify the behaviour. It also makes it look like
ownership of `req` is being transferred.

On the other hand,

is_allowed(&req)?;

Indicates that `req` is being borrowed (immutably) and may fail. The
question mark indicates that the caller wants an early return on
failure.
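
The difference can be sketched as follows. This is a minimal illustration,
not the agent's actual code: the `Request` type, the allow-list contents and
the `handle` caller are hypothetical names introduced only for this example.

```rust
// Hypothetical request type; the real agent request types differ.
#[derive(Debug)]
struct Request {
    endpoint: String,
}

/// Returns an error when the endpoint is not on the (made-up) allow-list.
/// `req` is borrowed immutably, so the caller keeps ownership.
fn is_allowed(req: &Request) -> Result<(), String> {
    const ALLOWED: [&str; 2] = ["CreateContainer", "StartContainer"];
    if ALLOWED.contains(&req.endpoint.as_str()) {
        Ok(())
    } else {
        Err(format!("{} is blocked", req.endpoint))
    }
}

/// The `?` makes the possible early return visible at the call site.
fn handle(req: Request) -> Result<String, String> {
    is_allowed(&req)?;
    // `req` is still usable here because it was only borrowed.
    Ok(format!("handled {}", req.endpoint))
}
```

Because the check is an ordinary function, a caller that does not want an
early return can match on the `Result` instead of using `?`.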

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
f680fc52be agent: change AGENT_CONFIG's lazy type to just AgentConfig
Since it is never modified, it doesn't really need a lock of any kind.
Removing the `RwLock` wrapper allows us to remove all `.read().await`
calls when accessing it.

Additionally, `AGENT_CONFIG` already has a static lifetime, so there is
no need to wrap it in a ref-counted heap allocation.
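
A rough sketch of the resulting shape, assuming `std::sync::OnceLock` as the
lazy mechanism (the agent may use a different one) and a trimmed-down,
hypothetical `AgentConfig` with only two fields:

```rust
use std::sync::OnceLock;

// Hypothetical, reduced config; the real AgentConfig has more fields.
#[derive(Debug)]
struct AgentConfig {
    debug_console: bool,
    log_level: String,
}

// Read-only after first initialization: no RwLock and no Arc are needed,
// because a `static` already lives for the whole program.
static AGENT_CONFIG: OnceLock<AgentConfig> = OnceLock::new();

fn agent_config() -> &'static AgentConfig {
    AGENT_CONFIG.get_or_init(|| AgentConfig {
        debug_console: false,
        log_level: "info".to_string(),
    })
}

fn log_level() -> &'static str {
    // A plain shared reference: no `.read().await` on the access path.
    &agent_config().log_level
}
```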

Fixes: #5409

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:27 -03:00
Jianyong Wu
1f3e837e4b runtime-rs: fix build error on AArch64
VFIO support introduced a build error on AArch64. Removing the
arch-related annotation avoids this error.

Fixes: #7187
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-06-28 07:10:43 +00:00
alex.lyn
6fd25968c6 runtime-rs: bugfix for direct volume path's validation.
The failure is mainly caused by the encoded volume path and
the mount/src. The src is validated with stat, but it is
encoded and not a full path, so the stat of the mount
source fails.

Fixes: #7186

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-28 10:07:07 +08:00
Zhongtao Hu
bff4672f7d runtime-rs: support physical endpoint using device manager
use device manager to attach physical endpoint

Fixes: #7103
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-06-27 10:25:51 +08:00
alex.lyn
0df2fc2702 runtime-rs: add support spdk/vhost-user based volume.
Unlike the previous usage which requires creating
/dev/xxx by mknod on the host, the new approach will
fully utilize the DirectVolume-related usage method,
and pass the spdk controller to vmm.

A user guide on using the SPDK volume when running
Kata Containers can be found in docs/how-to.

Fixes: #6526

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-25 16:23:19 +08:00
GabyCT
388b55175e
Merge pull request #7056 from FuuuOverclocking/fuu/fix-console_manager
dragonball: avoid obtaining lock twice in create_stdio_console
2023-06-23 16:47:00 -06:00
Zvonko Kaiser
8330fb8ee7 gpu: Update unit tests
Some tests are now failing due to the changes in how PCIe is
handled. Update the tests accordingly.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-23 11:16:25 +00:00
Fupan Li
469c678425
Merge pull request #7058 from Apokleos/vfio-dev
add support vfio device manager
2023-06-22 17:51:22 -06:00
Archana Shinde
2d329125fd
Merge pull request #6800 from amshinde/check-vm-capability
kata-ctl: Check for vm capability
2023-06-21 23:52:46 -07:00
Archana Shinde
610f7986e4 check: Relax the unrestricted_guest check when running in a VM
When running on a VM, the kernel parameter "unrestricted_guest" for
kernel module "kvm_intel" is not required. So, return success when running
on a VM without checking value of this kernel parameter.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-06-21 07:30:35 -07:00
Archana Shinde
1b406b9d0c kata-ctl:Implement functionality to check host is capable of running VM
Implement functionality to add to the env output if the host is capable
of running a VM.

Fixes: #6727

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-06-21 07:30:22 -07:00
soup
09720babc3 docs: fix spelling of "crate"
Fixes: #7153

Signed-off-by: soup <lqh348659137@outlook.com>
2023-06-21 16:10:54 +08:00
alex.lyn
59510cfee0 runtime-rs: add support vfio device based volume
Add a new option of using a vfio device based volume for kata-containers.
With the help of kata-ctl direct-volume, users are able to add a
specified device, identified by its BDF or IOMMU group ID.

To help users use it smoothly, a how-to doc has been added in
docs/how-to/how-to-run-kata-containers-with-kinds-of-Block-Volumes.

Fixes: #6525

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-18 14:07:05 +08:00
alex.lyn
1e3b372bbb runtime-rs: add support vfio device manager
Limitations:
As no rust vmm vfio manager is ready yet, runtime-rs only supports
part of vfio. The remaining part is to call the vmm
interfaces related to vfio add/remove.

So when the vmm/vfio manager is ready, a new PR will be pushed to
narrow the gap.

Fixes: #6525

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-18 14:05:59 +08:00
Greg Kurz
a43ea24dfc virtiofsd: Convert legacy -o sub-options to their -- replacement
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.

Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.

Fixes #7111

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:54 +02:00
Greg Kurz
8e00dc6944 virtiofsd: Drop -o no_posix_lock
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.

We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.

According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :

   The C implementation of the daemon has limited support for
   remote POSIX locks, restricted exclusively to non-blocking
   operations. We tried to implement the same level of
   functionality in #2, but we finally decided against it because,
   in practice most applications will fail if non-blocking
   operations aren't supported.

   Implementing support for non-blocking isn't trivial and will
   probably require extending the kernel interface before we can
   even start working on the daemon side.

There is thus no justification to pass `-o no_posix_lock` anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:39 +02:00
Greg Kurz
2a15ad9788 virtiofsd: Stop using deprecated -f option
The rust implementation of virtiofsd always runs foreground and
spits a deprecation warning when `-f` is passed.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 10:30:40 +02:00
Zvonko Kaiser
72f2cb84e6 gpu: Reset cold or hot plug after overriding
If we override cold or hot plug with an annotation,
we need to reset the other plugging mechanism to NoPort;
otherwise both will be enabled.
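
The rule can be sketched like this. The runtime itself is written in Go, so
this Rust fragment is only an illustration; `PciePort`, `HypervisorConfig`
and the two helper functions are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PciePort {
    NoPort,
    BridgePort,
    RootPort,
    SwitchPort,
}

// Hypothetical config holding both plugging mechanisms.
#[derive(Debug)]
struct HypervisorConfig {
    cold_plug_vfio: PciePort,
    hot_plug_vfio: PciePort,
}

/// Apply a cold-plug annotation and disable hot plug.
fn override_cold_plug(cfg: &mut HypervisorConfig, port: PciePort) {
    cfg.cold_plug_vfio = port;
    cfg.hot_plug_vfio = PciePort::NoPort;
}

/// Apply a hot-plug annotation and disable cold plug.
fn override_hot_plug(cfg: &mut HypervisorConfig, port: PciePort) {
    cfg.hot_plug_vfio = port;
    cfg.cold_plug_vfio = PciePort::NoPort;
}
```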

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:51:01 +00:00
Zvonko Kaiser
fbacc09646 gpu: PCIe topology, consider vhost-user-block in Virt
In Virt the vhost-user-block is a PCIe device so
we need to make sure to consider it as well. We're keeping
track of vhost-user-block devices and deduce the correct
number of PCIe root ports.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:39:55 +00:00
Zvonko Kaiser
b11246c3aa gpu: Various fixes for virt machine type
The PCI qom path was not deduced correctly. Added a regex for correct
path walking.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:33:57 +00:00
Zvonko Kaiser
40101ea7db vfio: Added annotation for hot(cold) plug
Now it is possible to configure the PCIe topology via annotations.
Also added a simple test, checking for Invalid and RootPort.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
8f0d4e2612 vfio: Cleanup of Cold and Hot Plug
Removed the configuration of PCIeRootPort and PCIeSwitchPort, those
values can be deduced in createPCIeTopology

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b5c4677e0e vfio: Rearrange the bus assignemnt
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b1aa8c8a24 gpu: Moved the PCIe configs to drivers
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
55a66eb7fb gpu: Add config to TOML
Update cold-plug and hot-plug setting to include bridge, root and
switch-port

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
da42801c38 gpu: Add config settings tests for hot-plug
Updated all references and config settings for hot-plug to match
cold-plug

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
de39fb7d38 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
Fixes: #4491

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
alex.lyn
347385b4ee runtime-rs: Enhance flexibility of virtio-fs config
Support more flexible options for inline virtiofs.

Fixes: #7091

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-13 15:12:47 +08:00
Zhongtao Hu
355a24e0e1
Merge pull request #6289 from openanolis/runtime_vcpu_resize
feat(runtime): vcpu resize capability
2023-06-13 10:54:11 +08:00
Yushuo
ae2cfa8263 doc: add vcpu handlint doc for runtime-rs
Kubernetes and Containerd will help calculate the Sandbox Size and pass it to
Kata Containers through annotations.

In order to accommodate this favorable change and be compatible with the past,
we have implemented the handling of the number of vCPUs in runtime-rs. This is
slightly different from the original runtime-go design.

This doc introduces how we handle vCPU size in runtime-rs.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 19:23:11 +08:00
Yushuo
7b1e67819c fix(clippy): fix clippy error
Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
67972ec48a feat(runtime-rs): calculate initial size
In this commit, we refactored the logic of static resource management.

We defined the sandbox size calculated from PodSandbox's annotation and
SingleContainer's spec as initial size, which will always be the sandbox
size when booting the VM.

The configuration static_sandbox_resource_mgmt controls whether we will
modify the sandbox size in the following container operations.

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
aaa96c749b feat(runtime-rs): modify onlineCpuMemRequest
Some vmms, such as dragonball, will actively help us
perform online cpu operations when doing cpu hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation.

So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be brought online. In the modified
semantics, nb_cpus is the number of CPUs that must be online
in the guest.
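
The old and new meanings of nb_cpus can be compared with a small sketch.
The function names are hypothetical and the agent's real handling is more
involved; only the arithmetic is shown.

```rust
/// Old semantics: `nb_cpus` is the number of newly added CPUs to online,
/// so the resulting online count is the current count plus `nb_cpus`.
fn target_online_old(currently_online: u32, nb_cpus: u32) -> u32 {
    currently_online + nb_cpus
}

/// New semantics: `nb_cpus` is the total number of CPUs that must be online.
/// Returns how many CPUs still need onlining; this is zero when the VMM
/// has already onlined them during hotplug.
fn cpus_to_online_new(currently_online: u32, nb_cpus: u32) -> u32 {
    nb_cpus.saturating_sub(currently_online)
}
```

With the new meaning, a request sent after the VMM has already onlined the
hotplugged CPUs becomes a no-op instead of asking for CPUs that do not exist.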

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
d66f7572dd feat(runtime-rs): clear cpuset in runtime side
The number of cpus declared in the cpuset can be greater
than the actual number of vcpus, which causes an error when
updating the cgroup in the guest.

This problem is difficult to solve, so we temporarily clear
the cpuset in the container spec before passing it to the agent.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
a0385e1383 feat(runtime-rs): update linux resource when stop_process
Update the resources when a container is deleted, which happens in
stop_process in runtime-rs.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
a39e1e6cd1 feat(runtime-rs): merge the update_cgroups in update_linux_resources
Updating the vCPU and memory resources of the sandbox and
updating cgroups on the host always happen together, and
they are all updated based on the linux resources declarations of
all the containers.

So we merge update_cgroups into update_linux_resources, so we
can better manage the resources allocated to one pod on the host.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Ji-Xinyou
fa6dff9f70 feat(runtime-rs): support vcpu resizing on runtime side
Support vcpu resizing on runtime side:
1. Calculate vcpu numbers in resource_manager using all the containers'
   linux_resources in the spec.
2. Call the hypervisor(vmm) to do the vcpu resize.
3. Call the agent to online vcpus.
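
Step 1 can be sketched as below, assuming the vcpu count is derived from each
container's CPU quota and period and rounded up. That formula, the `LinuxCpu`
struct and the function name are assumptions made for illustration, not taken
from resource_manager.

```rust
// Hypothetical subset of a container's linux CPU resources.
struct LinuxCpu {
    quota: Option<i64>,
    period: Option<u64>,
}

/// Sum quota/period over all containers and round up to whole vcpus.
/// Containers without a positive quota and period contribute nothing.
fn calculate_vcpus(containers: &[LinuxCpu]) -> u32 {
    let total: f64 = containers
        .iter()
        .filter_map(|c| match (c.quota, c.period) {
            (Some(q), Some(p)) if q > 0 && p > 0 => Some(q as f64 / p as f64),
            _ => None,
        })
        .sum();
    total.ceil() as u32
}
```

The result would then be handed to the hypervisor for the resize and to the
agent for onlining, which are steps 2 and 3 above.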

Fixes: #5030
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-06-12 17:53:16 +08:00
James O. D. Hunt
8cb4238b46 packaging: Remove snap package
Nobody has volunteered to maintain the (currently broken) snap build, so
remove it.

Fixes: #6769.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-06-12 09:24:09 +01:00
Chao Wu
2988553305
Merge pull request #6998 from HerlinCoder/herlincoder/vpa
Dragonball: support resize memory
2023-06-11 17:21:12 +08:00
Archana Shinde
56d2ea9b78 kata-ctl: Refactor kernel module check
Add vhost and vhost-net to the kernel modules. These do not require
any kernel module parameters to be checked. Currently, kernel params is
a required field; make it optional. It could be an Option,
but a slice is used instead, as a module could have multiple kernel
params. Split the function that checks kernel modules into
two: one specifically checks whether the module is loaded and the other
checks the module parameters.
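
A minimal sketch of that shape, under stated assumptions: the struct layout,
the function names and the `lookup` callback (standing in for reading the
parameter value from the host) are hypothetical, and the module names use the
underscore spelling.

```rust
// Hypothetical description of a required kernel module.
struct KernelModule<'a> {
    name: &'a str,
    // A slice rather than an Option: a module may need zero, one,
    // or several parameters checked.
    params: &'a [(&'a str, &'a str)],
}

const MODULES: &[KernelModule<'static>] = &[
    KernelModule { name: "kvm_intel", params: &[("unrestricted_guest", "Y")] },
    KernelModule { name: "vhost", params: &[] },
    KernelModule { name: "vhost_net", params: &[] },
];

/// First check: is the module in the list of loaded modules?
fn module_loaded(module: &KernelModule<'_>, loaded: &[&str]) -> bool {
    loaded.contains(&module.name)
}

/// Second check: does every listed parameter have the expected value?
/// An empty `params` slice passes trivially.
fn params_ok(
    module: &KernelModule<'_>,
    lookup: impl Fn(&str, &str) -> Option<String>,
) -> bool {
    module
        .params
        .iter()
        .all(|&(param, expected)| lookup(module.name, param).as_deref() == Some(expected))
}
```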

Refactor some of the tests to take into account these changes.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-06-09 14:10:31 -07:00
Fabiano Fidêncio
b50f62ce48
Merge pull request #6756 from arronwy/measured_rootfs
Port Measured rootfs feature from CCv0 branch to main
2023-06-09 12:35:05 +02:00
Helin Guo
8fb7ab7518 dragonball: introduce virtio-balloon device
We introduce virtio-balloon device to support memory resize.
virtio-balloon device could reclaim memory from guest to host.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-09 17:47:27 +08:00
Helin Guo
7ed9494973 dragonball: introduce virtio-mem device
We introduce virtio-mem device to support memory resize. virtio-mem
device could hot-plug more memory blocks to guest and could also
hot-unplug them from guest.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-09 17:47:21 +08:00
alex.lyn
776a15e092 runtime-rs: add support direct volume.
As block and direct volumes use similar steps for adding devices,
making full use of the block volume code is a better way to
handle direct volumes.

The only difference is that direct volume will use
DirectVolume and get_volume_mount_info to parse mountinfo.json
from the direct volume path. That is to say, direct volume needs
the help of `kata-ctl direct-volume ...`.

For details, see Advanced Topics:
[How to run Kata Containers with kinds of Block Volumes]
docs/how-to/how-to-run-kata-containers-with-kinds-of-Block-Volumes.md

Fixes: #5656

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-09 08:16:26 +08:00
Helin Guo
a8e0f51c52 dragonball: extend DeviceOpContext
In order to support virtio-mem and virtio-balloon devices, we need to
extend DeviceOpContext with VmConfigInfo and InstanceInfo.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-08 22:04:31 +08:00
alex.lyn
abae114046 runtime-rs: refactor device manager implementation
The key aspects of the DM implementation refactoring are as below:

1. reduce duplicated code
   Many scenarios have similar steps when adding devices, so to reduce
   duplicated code, we create a common method, do_handle_device, and
   use it in the various scenarios:
   (1) new_device with DeviceConfig, returning the device_id;
   (2) try_add_device with device_id, which actually adds the device;
   (3) return the device's info.

2. return full info from the Device trait's get_device_info
   replace the original type DeviceConfig with the full info DeviceType.

3. refactor the find_device method.
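
The do_handle_device flow from item 1 can be sketched as follows. This is a
simplified, synchronous stand-in: the enum variants, fields and id scheme are
hypothetical, and only the three-step structure follows the description above.

```rust
use std::collections::HashMap;

// Hypothetical input config with a single device kind.
#[derive(Debug, Clone, PartialEq)]
enum DeviceConfig {
    Block { path: String },
}

// Hypothetical "full info" type returned to callers.
#[derive(Debug, Clone, PartialEq)]
enum DeviceType {
    Block { id: String, path: String, attached: bool },
}

struct DeviceManager {
    devices: HashMap<String, DeviceType>,
    next_id: u32,
}

impl DeviceManager {
    fn new() -> Self {
        DeviceManager { devices: HashMap::new(), next_id: 0 }
    }

    // (1) register the device and return its id
    fn new_device(&mut self, cfg: &DeviceConfig) -> String {
        let id = format!("dev-{}", self.next_id);
        self.next_id += 1;
        let DeviceConfig::Block { path } = cfg;
        let dev = DeviceType::Block { id: id.clone(), path: path.clone(), attached: false };
        self.devices.insert(id.clone(), dev);
        id
    }

    // (2) actually add the device (here: just mark it attached)
    fn try_add_device(&mut self, id: &str) -> Result<(), String> {
        match self.devices.get_mut(id) {
            Some(DeviceType::Block { attached, .. }) => {
                *attached = true;
                Ok(())
            }
            None => Err(format!("unknown device {id}")),
        }
    }

    // (3) common entry point that returns the full device info
    fn do_handle_device(&mut self, cfg: &DeviceConfig) -> Result<DeviceType, String> {
        let id = self.new_device(cfg);
        self.try_add_device(&id)?;
        self.devices.get(&id).cloned().ok_or_else(|| format!("unknown device {id}"))
    }
}
```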

Fixes: #5656

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-08 08:47:08 +08:00
James O. D. Hunt
452f286552
Merge pull request #6764 from byron-marohn/fix_5401
kata-ctl: Switch to slog logging; add --log-level and --json-logging arguments
2023-06-07 16:08:53 +01:00
Fuu
210a15794c dragonball: avoid obtaining lock twice in create_stdio_console
Fixes #7055

Signed-off-by: Fuu <fuu-open@linux.alibaba.com>
2023-06-07 16:12:22 +08:00
GabyCT
5ad8aaf9df
Merge pull request #7035 from GabyCT/topic/logparserdoc
log-parser: Update log parser link at README
2023-06-06 12:02:25 -06:00
Wang, Arron
f62b2670c0 config: Add root hash value and measure config to kernel params
After we have a guest kernel with builtin initramfs which
provide the rootfs measurement capability and Kata rootfs
image with hash device, we need set related root hash value
and measure config to the kernel params in kata configuration file.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:34:13 +02:00
Fabiano Fidêncio
eb1bfa922b
Merge pull request #6980 from nubificus/feat_sharefs_files
runtime-rs: handle copy files when share_fs is not available
2023-06-06 12:26:55 +02:00
Gabriela Cervantes
980d084f47 log-parser: Update log parser link at README
This PR updates the link to the corresponding Developer Guide section on
enabling full containerd debug in the kata 2.0 documentation.

Fixes #7034

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-05 15:59:52 +00:00
Yushuo
410bc18143 agent-ctl: fix the compile error
When libc is upgraded to 0.2.145, older versions of getrandom cannot adapt
to the new API, which makes agent-ctl fail to compile.

We upgrade the version of `rand`, so the old version of getrandom is no longer
needed.

Fixes: #7032

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-06-05 21:48:36 +08:00
Jayant Singh
77519fd120 kata-ctl: Switch to slog logging; add --log-level, --json-logging args
Fixes: #5401, #6654

- Switch kata-ctl from eprintln!()/println!() to structured logging via
  the logging library which uses slog.
- Adds a new create_term_logger() library call which enables printing
  log messages to the terminal via a less verbose / more human readable
  terminal format with colors.
- Adds --log-level argument to select the minimum log level of printed messages.
- Adds --json-logging argument to switch to logging in JSON format.

Co-authored-by: Byron Marohn <byron.marohn@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
2023-06-02 20:13:22 +00:00
Fupan Li
465f5a5ced
Merge pull request #4748 from lifupan/main_fix
agent: fix the issue of exec hang with a backgroud process
2023-06-02 10:46:43 +08:00
Anastassios Nanos
ed37715e05 runtime-rs: handle copy files when share_fs is not available
In hypervisors that do not support virtiofs we have to copy files in
the VM sandbox to properly setup the network (resolv.conf, hosts, and hostname).

To do that, we construct the volume as before, with the addition of an extra
variable that designates the path where the file will reside in the sandbox.

In this case, we issue a `copy_file` agent request *and* we patch the spec
to account for this change.

Fixes: #6978

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: George Pyrros <gpyrros@nubificus.co.uk>
2023-06-01 21:40:56 +00:00
xuejun-xj
5f6fc3ed76 runtime-rs: bugfix: update Cargo.lock
When dragonball updated the dbs-boot crate in commit
64c764c147, the Cargo.lock in runtime-rs
should also have been updated.

Fixes: #6969

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-06-01 20:25:35 +08:00
xuejun-xj
560442e6ed dragonball: add vcpu_boot_onlined vector
This commit implements the vcpu_boot_onlined vector in get_fdt_vm_info.

"boot_enabled" means whether this vcpu should be onlined at first boot.
It will be used by fdt, which writes an attribute called boot_enabled,
and will be handled by the guest kernel to pass the correct cpu number to
function "bringup_nonboot_cpus".

Fixes: #6010

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
e31772cfea dragonball: add support resize_vcpu on aarch64
This commit adds support for resize_vcpu on aarch64. As kvm will check
whether vgic is initialized when calling KVM_CREATE_VCPU ioctl, all the
vcpu fds should be created before vm is booted.

To support resizing vcpu scenario, we use max_vcpu_count for
create_vcpus and setup_interrupt_controller interfaces. The
SetVmConfiguration API will ensure max_vcpu_count >= boot_vcpu_count.

Fixes: #6010

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
64c764c147 dragonball: update dbs-boot to v0.4.0
dbs-boot-v0.4.0 refactors the create_fdt interface. It simplifies the
parameters needed to be passed and abstracts them into three structs.

It also reserves some interfaces for future features: numa
passthrough and cache passthrough.

Fixes: #6969

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
fd9b414646 dragonball: update comment for init_microvm
Rewrite the comment of Vm::init_microvm method for aarch64.

Fixes cargo test warnings on aarch64.

Fixes: #6969

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
Zhongtao Hu
099b4b0d0e
Merge pull request #6598 from Apokleos/sandbox_bind_mounts
runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
2023-05-28 12:00:39 +08:00
Zhongtao Hu
cb962b0dc9
Merge pull request #6702 from Apokleos/directvol-common
runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
2023-05-28 12:00:12 +08:00
alex.lyn
5ddc4f94c5 runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
Move the get_volume_mount_info to kata-types/src/mount.rs.
If so, it becomes a common method of DirectVolumeMountInfo
and reduces duplicated code.

Fixes: #6701

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-26 11:18:29 +08:00
Fupan Li
25d2fb0fde agent: fix the issue of exec hang with a backgroud process
When running an exec process in the background without a tty, the
exec hangs and never terminates.

For example:

crictl exec -i <container id> sh -c 'nohup tail -f /dev/null &'

Fixes: #4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2023-05-26 10:56:46 +08:00
Tim Zhang
5231aff90f
Merge pull request #6860 from lifupan/main
netlink: Fix the issue of update_interface
2023-05-26 10:54:07 +08:00
Greg Kurz
837f7a2fe6
Merge pull request #6959 from beraldoleal/issues/6757
runtime: sending SIGKILL to qemu
2023-05-25 16:24:37 +02:00
alex.lyn
eee7aae71d runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
sandbox_bind_mounts supports several mount patterns, for example:

(1) "/path/to", default readonly mode.
(2) "/path/to:ro", same as (1).
(3) "/path/to:rw", readwrite mode.

Both configuration and annotation are supported:
(1)[runtime]
sandbox_bind_mounts=["/path/to", "/path/to:rw", "/mnt/to:ro"]
(2) annotation is also supported, restricted as below:
io.katacontainers.config.runtime.sandbox_bind_mounts
                         = "/path/to /path/to:rw /mnt/to:ro"
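
Parsing of these patterns could look roughly like this. It is a sketch of the
syntax described above, not the runtime-rs implementation; `MountMode`,
`parse_bind_mount` and `parse_annotation` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum MountMode {
    ReadOnly,
    ReadWrite,
}

/// Parse one entry such as "/path/to", "/path/to:ro" or "/path/to:rw".
/// A missing suffix defaults to read-only.
fn parse_bind_mount(entry: &str) -> Result<(String, MountMode), String> {
    match entry.rsplit_once(':') {
        None => Ok((entry.to_string(), MountMode::ReadOnly)),
        Some((path, "ro")) => Ok((path.to_string(), MountMode::ReadOnly)),
        Some((path, "rw")) => Ok((path.to_string(), MountMode::ReadWrite)),
        Some((_, other)) => Err(format!("unknown mount mode: {other}")),
    }
}

/// The annotation form is a single whitespace-separated string.
fn parse_annotation(value: &str) -> Result<Vec<(String, MountMode)>, String> {
    value.split_whitespace().map(parse_bind_mount).collect()
}
```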

Fixes: #6597

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-25 20:00:25 +08:00
Fupan Li
62b2838962
Merge pull request #6846 from ZhangShuaiyi/DeviceMgrMethod
dragonball: convert BlockDeviceMgr and VirtioNetDeviceMgr functions to methods
2023-05-25 18:11:44 +08:00
QuanweiZhou
377b7735f5
Merge pull request #6872 from justxuewei/rm-virtio-devices
dragonball: Remove virtio-net and vsock devices gracefully
2023-05-25 17:08:36 +08:00
Beraldo Leal
0e47cfc4c7 runtime: sending SIGKILL to qemu
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes #6757.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-05-24 11:31:28 -04:00
Fabiano Fidêncio
9aae333343
Merge pull request #6871 from kmjohansen/bugfix/ptmx
runtime: make debug console work with sandbox_cgroup_only
2023-05-23 22:24:51 +02:00
Fupan Li
170336517f
Merge pull request #5441 from openanolis/device_manager_dev
runtime-rs: device manager for runtime-rs
2023-05-23 16:50:07 +08:00
Zhongtao Hu
4719802c8d runtime-rs: add virtio-blk-mmio
add virtio-blk-mmio option for dragonball

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:58:10 +08:00
Zhongtao Hu
f9bded4484 runtime-rs: add devicetype enum
use device type to store the config information for different kinds of
devices

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:55:35 +08:00
Zhongtao Hu
6800d30fdb runtime-rs: remove device
Support removing a device after the container stops

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:54:22 +08:00
Zhongtao Hu
f16012a1eb runtime-rs: support linux device
support linux device in runtime-rs

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:54:13 +08:00
Zhongtao Hu
fe9ec67644 runtime-rs: block volume
support block volume in runtime-rs

Fixes: #5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:54:04 +08:00
Zhongtao Hu
a8bfac90b1 runtime-rs: support block rootfs
support devmapper for block rootfs

Fixes: #5375

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:30 +08:00
Zhongtao Hu
b076d46db3 agent: handle hotplug virtio-mmio device
As dragonball supports hotplugging virtio-mmio devices, we should handle it in the agent

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:22 +08:00
Zhongtao Hu
6e273d6ccc runtime-rs: implement trait for vhost-user device
add the trait implementation for vhost-user device

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-05-23 00:53:16 +08:00
Zhongtao Hu
cc9c915384 runtime-rs: implement trait for vfio device
add the trait implementation for vfio device

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:10 +08:00
Archana Shinde
2c9efbe04c
Merge pull request #6907 from likebreath/0519/clh_v32.0
Upgrade to Cloud Hypervisor v32.0
2023-05-22 09:53:05 -07:00
Zhongtao Hu
e4c5c74a75 runtime-rs: device manager
Support device manager for runtime-rs, add block device handler for
device manager

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:04 +08:00
GabyCT
6796af511b
Merge pull request #6890 from GabyCT/topic/fixurlvirt
docs: Update container network model url
2023-05-19 15:10:26 -06:00
Bo Chen
35c3d7b4bc runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v32.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #6632

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-05-19 12:49:45 -07:00
Fabiano Fidêncio
0364620844
Merge pull request #6819 from fidencio/topic/use-static-sandbox-resource-mgmt-for-TEEs
runtime: Use static_sandbox_resource_mgmt=true for TEEs
2023-05-18 22:38:31 +02:00
Fabiano Fidêncio
2ea8acaaa5
Merge pull request #6882 from bergwolf/github/tokio
update tokio dependency
2023-05-18 20:35:16 +02:00
Krister Johansen
eff6ed2d5f runtime: make debug console work with sandbox_cgroup_only
If a hypervisor debug console is enabled and sandbox_cgroup_only is set,
the hypervisor can fail to open /dev/ptmx, which prevents the sandbox
from launching.

This is caused by the absence of a device cgroup entry to allow access
to /dev/ptmx.  When sandbox_cgroup_only is not set, the hypervisor
inherits the default unrestricted device cgroup, but with it enabled it
runs into allow / deny list restrictions.

Fix by adding an allowlist entry for /dev/ptmx when debug is enabled,
sandbox_cgroup_only is true, and no /dev/ptmx is already in the list of
devices.
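
The condition can be written out as a short sketch. The runtime is Go code,
so this Rust version is illustrative only; `DeviceRule`, its `access` string
and `ensure_ptmx` are hypothetical.

```rust
// Hypothetical device cgroup allowlist entry.
#[derive(Debug, Clone, PartialEq)]
struct DeviceRule {
    path: String,
    access: String,
}

/// Add /dev/ptmx only when debug is on, sandbox_cgroup_only is set,
/// and the device is not already in the list.
fn ensure_ptmx(devices: &mut Vec<DeviceRule>, debug: bool, sandbox_cgroup_only: bool) {
    const PTMX: &str = "/dev/ptmx";
    let present = devices.iter().any(|d| d.path == PTMX);
    if debug && sandbox_cgroup_only && !present {
        devices.push(DeviceRule {
            path: PTMX.to_string(),
            access: "rwm".to_string(), // assumed access mode
        });
    }
}
```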

Fixes: #6870

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2023-05-18 10:36:24 -07:00
Gabriela Cervantes
11a34a72e2 docs: Update container network model url
This PR updates the container network model url that is part of the
virtcontainers documentation.

Fixes #6889

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-18 15:08:08 +00:00
Peng Tao
f6e1b1152c agent: update tokio dependency
To 1.28.1 to bring in the latest fixes.

Fixes: #6881
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 09:36:06 +00:00
Shuaiyi Zhang
c477ac551f dragonball: Convert VirtioNetDeviceMgr function to method
Convert VirtioNetDeviceMgr::insert_device and
VirtioNetDeviceMgr::update_device_ratelimiters to methods.

Fixes: #6880

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-05-18 16:57:01 +08:00
Shuaiyi Zhang
4659facb74 dragonball: Convert BlockDeviceMgr function to method
Convert BlockDeviceMgr::insert_device, BlockDeviceMgr::remove_device
and BlockDeviceMgr::update_device_ratelimiters to methods.

Fixes: #6880

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-05-18 16:56:49 +08:00
Peng Tao
4cb83dc219 kata-ctl: update tokio dependency
Update to 1.28.1 to pick up the latest fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 08:25:13 +00:00
Peng Tao
df615ff252 runk: update tokio dependency
Update to 1.28.1 to pick up latest fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 08:24:41 +00:00
Peng Tao
ca6892ddb1 runtime-rs: update tokio dependency
Unify it to the latest 1.28.1 version.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 08:18:22 +00:00
Fabiano Fidêncio
3a4b924226
Merge pull request #6833 from rye-stripe/bugfix/vcpu-pinning
resource-control: fix setting CPU affinities on Linux
2023-05-18 08:12:39 +02:00
Xuewei Niu
ee6deef09d dragonball: Remove virtio-net and vsock devices gracefully
This MR implements removing virtio-net and virtio-vsock devices gracefully when
shutting down VMM.

Fixes: #6684

Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-05-18 12:11:20 +08:00
Fabiano Fidêncio
e762f70920
Merge pull request #6838 from rye-stripe/bugfix/use-enable-vcpus-pinning-from-toml
runtime: use enable_vcpus_pinning from toml
2023-05-17 21:30:44 +02:00
Fabiano Fidêncio
ca1531fe9d runtime: Use static_sandbox_resource_mgmt=true for TEEs
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.

As TEEs do not support memory and CPU hotplug, this approach must be
used.

Fixes: #6818

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-17 19:21:52 +02:00
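As a rough illustration of sizing a sandbox before boot, as the static_sandbox_resource_mgmt commit above describes, the sketch below sums per-container limits on top of configured defaults. The commit message does not give the formula, so "defaults plus the sum of container limits" is an assumption, and every type and function name here is hypothetical.

```rust
// Hypothetical types for illustration only; not the runtime's real structures.
#[derive(Debug, PartialEq)]
struct ContainerLimits {
    vcpus: u32,
    memory_mib: u64,
}

#[derive(Debug, PartialEq)]
struct SandboxSize {
    vcpus: u32,
    memory_mib: u64,
}

// Compute the VM size up front, since CPU and memory cannot be hotplugged
// later in a TEE. Assumed policy: defaults plus the sum of all limits.
fn static_sandbox_size(
    default_vcpus: u32,
    default_memory_mib: u64,
    containers: &[ContainerLimits],
) -> SandboxSize {
    let vcpus = default_vcpus + containers.iter().map(|c| c.vcpus).sum::<u32>();
    let memory_mib =
        default_memory_mib + containers.iter().map(|c| c.memory_mib).sum::<u64>();
    SandboxSize { vcpus, memory_mib }
}
```

With defaults of 1 vCPU and 2048 MiB and two containers limited to (2 vCPUs, 512 MiB) and (1 vCPU, 256 MiB), this sketch yields 4 vCPUs and 2816 MiB.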
Fabiano Fidêncio
8ce14e709a
Merge pull request #6810 from fitzthum/snp-enable
gha: Enable SEV-SNP tests on main
2023-05-17 15:29:54 +02:00
Wainer Moschetta
259158f1c3
Merge pull request #6789 from dubek/add-sev-package
runtime: Port sev package to main
2023-05-17 10:02:19 -03:00
Tobin Feldman-Fitzthum
cbb9fe8b81 config: Use standard OVMF with SEV
The AmdSev firmware package should be used with
measured direct boot. If the expected hashes are not
injected into the firmware binary by the VMM, the
guest will not boot. This is required for security.

Currently the main branch does not have the extended
shim support for SEV, which tells the VMM to inject
the expected hashes.

We ship the standard OVMF package to use with SNP,
so let's switch SEV to that for now. This will need
to be changed back when shim support for SEV(-ES)
is added to main.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:36:04 +02:00
fupan
2bda92face netlink: Fix the issue of update_interface
When updating an interface, there may be an existing
interface whose name is the same as the requested new
name, so the update fails with an "interface name exists"
error. We should therefore rename the existing interface
to a temporary name first and, as the last step, give it the
updated interface's previous name.

Fixes: #6842

Signed-off-by: fupan <fupan.lfp@antgroup.com>
2023-05-17 16:45:49 +08:00
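The rename-and-swap approach described in the netlink commit above can be sketched on a toy model. Here a `HashMap` from interface name to index stands in for the kernel's link table; the real agent issues netlink requests instead. The function names and the `_tmp` suffix are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Toy rename: fails if the target name is already taken, mimicking the
// "interface name exists" error mentioned in the commit message.
fn rename(links: &mut HashMap<String, u32>, from: &str, to: &str) -> Result<(), String> {
    if links.contains_key(to) {
        return Err(format!("interface name {} exists", to));
    }
    let idx = links
        .remove(from)
        .ok_or_else(|| format!("no such interface {}", from))?;
    links.insert(to.to_string(), idx);
    Ok(())
}

// Rename `old_name` to `new_name`, swapping names with any interface that
// already holds `new_name`.
fn update_interface_name(
    links: &mut HashMap<String, u32>,
    old_name: &str,
    new_name: &str,
) -> Result<(), String> {
    if old_name == new_name {
        return Ok(());
    }
    if !links.contains_key(old_name) {
        return Err(format!("no such interface {}", old_name));
    }
    let clash = links.contains_key(new_name);
    let tmp = format!("{}_tmp", new_name); // hypothetical temporary name
    if clash {
        // Step 1: move the conflicting interface out of the way.
        rename(links, new_name, &tmp)?;
    }
    // Step 2: the requested name is now free.
    rename(links, old_name, new_name)?;
    if clash {
        // Step 3: give the displaced interface the previous name.
        rename(links, &tmp, old_name)?;
    }
    Ok(())
}
```

In this model, a direct `rename(&mut links, "eth1", "eth0")` fails while `eth0` exists, whereas `update_interface_name(&mut links, "eth1", "eth0")` succeeds and leaves the two interfaces with their names exchanged.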
Fabiano Fidêncio
9630c13ac0
Merge pull request #6845 from fidencio/topic/yet-more-nvidia-gpu-naming-fixes
gpu: Rename the last bits from `gpu` to `nvidia-gpu`
2023-05-17 09:05:12 +02:00
Amulya Meka
3ccc29030d
Merge pull request #6780 from Amulyam24/rust-virtfs
ppc64le: switch virtiofsd from C to rust version
2023-05-17 09:36:28 +05:30
Salvador Fuentes
b76058c979
Merge pull request #6721 from nedsouza/virtcontainers-qemu-go-coverage
virtcontainers/qemu_test.go: Improve coverage
2023-05-16 11:11:43 -06:00
Feng Wang
ebc8e8e2fd
Merge pull request #6773 from jepio/agent-config-error-context
agent: Add context to errors that may occur when AgentConfig file is …
2023-05-16 09:21:34 -07:00
James O. D. Hunt
a96fcfd5be
Merge pull request #6735 from nedsouza/258/tests-coverage-compatoci
virtcontainers/pkg/compatoci/: Improved coverage for Kata 2.0
2023-05-16 15:36:35 +01:00
Amulyam24
c5a59caca1 ppc64le: switch virtiofsd from C to rust version
We have been using the C version of virtiofsd on ppc64le. Now that the issue with
the Rust virtiofsd has been fixed, let's switch to it.

Fixes: #4259

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-05-16 14:46:19 +02:00
Dov Murik
dd7562522a runtime: pkg/sev: Add kbs utility package for SEV pre-attestation
Supports both online and offline modes of interaction with simple-kbs
for SEV/SEV-ES confidential guests.

Fixes: #6795

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
2023-05-16 15:27:32 +03:00
Dov Murik
05de7b2607 runtime: Add sev package
The sev package provides utilities for launching AMD SEV and SEV-ES
confidential guests.

Fixes: #6795

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
2023-05-16 15:27:32 +03:00
Fabiano Fidêncio
3a9d3c72aa gpu: Rename the last bits from gpu to nvidia-gpu
Let's specifically name the `gpu` runtime class as `nvidia-gpu`.  By
doing this we keep the door open and ease the life of the next vendor
adding GPU support for Kata Containers.

Fixes: #6553

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-16 13:47:52 +02:00
Bin Liu
47a02dcc7f
Merge pull request #6767 from ngpatel6/Issue-5403
kata-ctl: Add the option to install kata-ctl to a user specified directory
2023-05-16 10:43:40 +08:00
Bin Liu
2cd2d02d1f
Merge pull request #6812 from ZhangShuaiyi/dev/write_bootparams
Dragonball: use LinuxBootConfigurator::write_bootparams
2023-05-16 09:54:41 +08:00
Narendra Patel
593840e075 kata-ctl: Allow INSTALL_PATH= to be specified
Update the kata-ctl install rule to allow it to be installed to a given directory.

The Makefile was updated to use an INSTALL_PATH variable to track where the
kata-ctl binary should be installed.  If the user doesn't specify anything,
then it uses the default path that cargo uses.  Otherwise, it will install it
in the directory that the user specified.  The README.md file was also updated
to show how to use the new option.

Fixes #5403

Co-authored-by: Cesar Tamayo <cesar.tamayo@intel.com>
Co-authored-by: Kevin Mora Jimenez <kevin.mora.jimenez@intel.com>
Co-authored-by: Narendra Patel <narendra.g.patel@intel.com>
Co-authored-by: Ray Karrenbauer <ray.karrenbauer@intel.com>
Co-authored-by: Srinath Duraisamy <srinath.duraisamy@intel.com>
Signed-off-by: Narendra Patel <narendra.g.patel@intel.com>
2023-05-15 17:21:49 -04:00
Peteris Rudzusiks
bdb75fb21e runtime: use enable_vcpus_pinning from toml
Set the default value of the runtime's EnableVCPUsPinning to the value read from the .toml file.

Fixes: #6836

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-05-15 21:41:20 +02:00
Tamas K Lengyel
20cb875087 virtcontainers/qemu_test.go: Improve test coverage
Rework TestQemuCreateVM routine to be a table driven test with
various config variations passed to it. After CreateVM a handful
of additional functions are exercised to improve code-coverage.
Also add partial coverage for StartVM routine.

Currently improving from 19.7% to 35.7%

Credit PR to Hackathon Team3

Fixes: #267

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
2023-05-15 15:26:35 -04:00
Peteris Rudzusiks
3e85bf5b17 resource-control: fix setting CPU affinities on Linux
With this fix the vCPU pinning feature chooses the correct
physical cores to pin the vCPU threads on rather than always using core 0.

Fixes #6831

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-05-15 16:46:36 +02:00
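To make the failure mode in the vCPU pinning commit above concrete, here is a small sketch of mapping vCPU threads to distinct host cores instead of always core 0. The commit message does not describe the actual selection policy or the system calls involved, so the round-robin mapping and both helper names are assumptions; the actual fix lives in the Go runtime, and Rust is used here only for illustration.

```rust
// Hypothetical helper: assign each vCPU thread one host CPU from the allowed
// set, round-robin. Returns an empty vector if no CPUs are allowed.
fn assign_cores(vcpu_count: usize, allowed_cpus: &[usize]) -> Vec<usize> {
    if allowed_cpus.is_empty() {
        return Vec::new();
    }
    (0..vcpu_count)
        .map(|i| allowed_cpus[i % allowed_cpus.len()])
        .collect()
}

// Build a single-CPU affinity bitmask (bit N set means CPU N is allowed).
// A mask that is always 1 (bit 0 only) would pin every thread to core 0.
fn affinity_mask(cpu: usize) -> u64 {
    assert!(cpu < 64, "this sketch only handles CPU ids below 64");
    1u64 << cpu
}
```

For example, three vCPU threads with allowed host CPUs 4 and 5 map to cores 4, 5, and 4 under this sketch, and the mask for core 4 has only bit 4 set.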
LiuWeijie
50cc9c582f tests: Improve coverage for virtcontainers/pkg/compatoci/ for Kata 2.0
Add test cases for ParseConfigJson function and GetContainerSpec function

Fixes: #258

Signed-off-by: LiuWeijie <weijie.liu@intel.com>
2023-05-15 11:58:17 +08:00
Archana Shinde
32b39ee347
Merge pull request #6763 from nedsouza/266/tests_coverage_virtcontainers_fc
virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
2023-05-12 11:53:27 -07:00
Shuaiyi Zhang
197c336516 Dragonball: use LinuxBootConfigurator::write_bootparams to write
the boot parameters into guest memory.

Fixes: #6813

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-05-12 16:07:44 +08:00
Amulya Meka
76f975e5e6
Merge pull request #6742 from Amulyam24/agent-build
runtime: remove overriding ARCH value by default for ppc64le
2023-05-12 12:34:50 +05:30