Compare commits

...

492 Commits

Author SHA1 Message Date
Greg Kurz
ef49fa95f7 Merge pull request #5290 from gkurz/3.0.0-rc1-branch-bump
# Kata Containers 3.0.0-rc1
2022-09-30 08:43:06 +02:00
Greg Kurz
727f233e2a release: Kata Containers 3.0.0-rc1
- tools: release: fix bogus version check
- osbuilder: Export directory variables for libseccomp
- kata-deploy: support runtime-rs for kata deploy
- Last backport for 3.0-rc1
- stable-3.0: backport runtime/runtime-rs dependency updates

babab160bc tools: release: fix bogus version check
af22e71375 osbuilder: Export directory variables for libseccomp
b0c5f040f0 runtime-rs: set agent timeout to 0 for stream RPCs
d44e39e059 runtime-rs: fix incorrect comments
43b0e95800 runtime: store the user name in hypervisor config
81801888a2 runtime: make StopVM thread-safe
fba39ef32d runtime: add more debug logs for non-root user operation
63309514ca runtime-rs: drop dependency on rustc-serialize
e229a03cc8 runtime: update runc dependency
d663f110d7 kata-deploy: get the config path from cri options
c6b3dcb67d kata-deploy: support kata-deploy for runtime-rs
a394761a5c kata-deploy: add installation for runtime-rs

Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-29 17:21:11 +02:00
Greg Kurz
619d1b487f Merge pull request #5286 from gkurz/backport-3.0/5284-release-script
tools: release: fix bogus version check
2022-09-29 17:11:23 +02:00
Greg Kurz
babab160bc tools: release: fix bogus version check
The shell expands `*"rc"*` to the top-level `src` directory. This results
in comparing a version with a directory name, which makes no sense and
causes the script to choose the wrong branch of the `if`.

The intent of the check is actually to detect `rc` in the version.
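The glob-proof way to detect `rc` in a version string can be sketched as follows (a minimal, hypothetical reproduction; the real script's variable names and test construct may differ):

```shell
# If an unquoted *"rc"* pattern reaches a context that glob-expands it
# (run from a directory containing `src`, the pattern can expand to that
# directory name), the comparison is against the wrong string entirely.
# A `case` pattern is never glob-expanded against the filesystem:
version="3.0.0-rc1"

case "$version" in
    *rc*) kind="release candidate" ;;
    *)    kind="stable release" ;;
esac

echo "$kind"   # prints: release candidate
```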

Fixes: #5283
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 421729f991)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-29 14:56:52 +02:00
Archana Shinde
f168555569 Merge pull request #5273 from gkurz/backport-3.0/5233-osbuilder
osbuilder: Export directory variables for libseccomp
2022-09-28 17:22:51 -07:00
Gabriela Cervantes
af22e71375 osbuilder: Export directory variables for libseccomp
To avoid random failures when building the rootfs, where the build seems
unable to find the values of the libseccomp and gperf directories, this
PR exports these variables.
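The failure mode maps to a basic shell rule: a child process only inherits exported variables. A minimal sketch (the directory paths are made-up examples):

```shell
# Sub-shells and sub-makes spawned by the rootfs build only see variables
# that were exported in the parent.
LIBSECCOMP_DIR="/opt/libseccomp"   # plain assignment: NOT inherited
export GPERF_DIR="/opt/gperf"      # exported: inherited by children

sh -c 'echo "libseccomp=${LIBSECCOMP_DIR:-unset} gperf=${GPERF_DIR:-unset}"'
# prints: libseccomp=unset gperf=/opt/gperf
```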

Fixes #5232

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit a4a23457ca)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-28 13:16:09 +02:00
Greg Kurz
b9379521a0 Merge pull request #5263 from openanolis/origin/kata-deploy
kata-deploy: support runtime-rs for kata deploy
2022-09-28 09:41:12 +02:00
Peng Tao
5b3bbc62ba Merge pull request #5257 from gkurz/backport-3_0_rc1
Last backport for 3.0-rc1
2022-09-28 11:01:09 +08:00
Bin Liu
b0c5f040f0 runtime-rs: set agent timeout to 0 for stream RPCs
For stream RPCs:
- write_stdin
- read_stdout
- read_stderr

there should be no timeout (by setting it to 0).

Fixes: #5249

Signed-off-by: Bin Liu <bin@hyper.sh>
(cherry picked from commit 20bcaf0e36)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 16:01:17 +02:00
Bin Liu
d44e39e059 runtime-rs: fix incorrect comments
Some comments for types are incorrect in file
 src/libs/kata-types/src/config/hypervisor/mod.rs

Fixes: #5187

Signed-off-by: Bin Liu <bin@hyper.sh>
(cherry picked from commit 3f65ff2d07)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:27 +02:00
Feng Wang
43b0e95800 runtime: store the user name in hypervisor config
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit f914319874)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:26 +02:00
Feng Wang
81801888a2 runtime: make StopVM thread-safe
StopVM can be invoked by multiple threads and needs to be thread-safe

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit 5cafe21770)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:25 +02:00
Feng Wang
fba39ef32d runtime: add more debug logs for non-root user operation
Previously the logging was insufficient and made debugging difficult

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit c3015927a3)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:24 +02:00
Fupan Li
57261ec97a Merge pull request #5251 from bergwolf/github/backport-3.0
stable-3.0: backport runtime/runtime-rs dependency updates
2022-09-27 14:55:55 +08:00
Peng Tao
63309514ca runtime-rs: drop dependency on rustc-serialize
We are not using it, and it hasn't received any updates in more than five
years, leaving open CVEs unresolved.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-27 11:54:44 +08:00
Peng Tao
e229a03cc8 runtime: update runc dependency
To bring in the fix for CVE-2022-29162.

Fixes: #5217
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-27 11:54:37 +08:00
Zhongtao Hu
d663f110d7 kata-deploy: get the config path from cri options
get the config path for runtime-rs from cri options

Fixes: #5000
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-09-22 17:39:25 +08:00
Zhongtao Hu
c6b3dcb67d kata-deploy: support kata-deploy for runtime-rs
support kata-deploy for runtime-rs

Fixes: #5000
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-09-22 17:39:20 +08:00
Zhongtao Hu
a394761a5c kata-deploy: add installation for runtime-rs
Set up the compile environment and installation path for the Rust runtime

Fixes: #5000
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-09-22 15:59:44 +08:00
Tim Zhang
32a9d6d66d Merge pull request #5174 from bergwolf/3.0.0-rc0-branch-bump
# Kata Containers 3.0.0-rc0
2022-09-16 16:59:55 +08:00
Peng Tao
583591099d release: Kata Containers 3.0.0-rc0
- runtime-rs: delete some allow(dead_code) attributes
- kata-types: don't check virtio_fs_daemon for inline-virtio-fs
- kata-types: change return type of getting CPU period/quota function
- runtime-rs: fix host device check pattern
- runtime-rs: remove meaningless comment
- runtime-rs: update rust runtime roadmap
- runk: Enable seccomp support by default
- config: add "inline-virtio-fs" as a "shared_fs" type
- runtime-rs: add README.md
- runk: Refactor container builder
- kernel: fix kernel tarball name for SEV
- libs/kata-types: replace tabs by spaces in comments
- gperf: point URL to mirror site

be242a3c3 release: Adapt kata-deploy for 3.0.0-rc0
156e1c324 runtime-rs: delete some allow(dead_code) attributes
62cf6e6fc runtime-rs: remove meaningless comment
bcf6bf843 runk: Enable seccomp support by default
2b1d05857 runtime-rs: fix host device check pattern
85b49cee0 runtime-rs: add README.md
36d805fab config: add "inline-virtio-fs" as a "shared_fs" type
b948a8ffe kernel: fix kernel tarball name for SEV
50f912615 libs/kata-types: replace tabs by spaces in comments
96c8be715 libs/kata-types: change return type of getting CPU period/quota
fc9c6f87a kata-types: don't check virtio_fs_daemon for inline-virtio-fs
968c2f6e8 runk: Refactor container builder
84268f871 runtime-rs: update rust runtime roadmap
566656b08 gperf: point URL to mirror site

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-16 03:53:44 +00:00
Peng Tao
be242a3c3c release: Adapt kata-deploy for 3.0.0-rc0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-16 03:53:43 +00:00
Bin Liu
be22e8408d Merge pull request #5165 from liubin/fix/5164-remove-dead_code
runtime-rs: delete some allow(dead_code) attributes
2022-09-15 09:32:10 +08:00
Bin Liu
156e1c3247 runtime-rs: delete some allow(dead_code) attributes
Some #![allow(dead_code)] attributes, and the code they cover, are not actually needed.

Fixes: #5164

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-14 20:50:30 +08:00
Bin Liu
a58feba9bb Merge pull request #5105 from liubin/fix/5104-ignore-virtiofs-daemon-for-inline-mode
kata-types: don't check virtio_fs_daemon for inline-virtio-fs
2022-09-13 10:33:56 +08:00
Bin Liu
42d4da9b6c Merge pull request #5101 from liubin/fix/5100-cpu-period-quota-data-type
kata-types: change return type of getting CPU period/quota function
2022-09-13 10:33:29 +08:00
Tim Zhang
8ec4edcf4f Merge pull request #5146 from liubin/fix/5145-check-host-dev
runtime-rs: fix host device check pattern
2022-09-13 10:33:05 +08:00
Tim Zhang
447521c6da Merge pull request #5151 from liubin/fix/5150-remove-comment
runtime-rs: remove meaningless comment
2022-09-13 10:32:53 +08:00
Bin Liu
2f830c09a3 Merge pull request #5073 from openanolis/update
runtime-rs: update rust runtime roadmap
2022-09-13 10:32:25 +08:00
Bin Liu
62cf6e6fc3 runtime-rs: remove meaningless comment
The comment on the `generate_mount_path` function is a copy-paste mistake
and should be deleted.

Fixes: #5150

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-09 16:07:35 +08:00
Bin Liu
55f4f3a95b Merge pull request #4897 from ManaSugi/runk/enable-seccomp
runk: Enable seccomp support by default
2022-09-09 14:11:35 +08:00
Manabu Sugimoto
bcf6bf843c runk: Enable seccomp support by default
Enable seccomp support in `runk` by default.
As a result, `runk` is built with `gnu libc` by default, because building
`runk` statically linked against `libseccomp` and `musl` requires
additional configuration.
General container runtimes are also built with `gnu libc` as dynamically
linked binaries by default.
The user can disable seccomp with `make SECCOMP=no`.

Fixes: #4896

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-09-09 10:55:16 +09:00
GabyCT
be462baa7e Merge pull request #5103 from liubin/fix/5102-add-inline-virtiofs-config
config: add "inline-virtio-fs" as a "shared_fs" type
2022-09-08 10:33:20 -05:00
GabyCT
bcbce8317d Merge pull request #5061 from liubin/fix/5022-runtime-rs-readme
runtime-rs: add README.md
2022-09-08 10:32:08 -05:00
bin liu
2b1d058572 runtime-rs: fix host device check pattern
Host device paths should start with `/dev/`, not merely `/dev`.
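The prefix check can be sketched in shell (a hypothetical helper; the actual runtime-rs check is Rust pattern matching):

```shell
# Accept only paths beginning with "/dev/" followed by at least one
# character; the bare prefixes "/dev" and "/dev/" must fail.
is_host_device() {
    case "$1" in
        /dev/?*) return 0 ;;
        *)       return 1 ;;
    esac
}

is_host_device "/dev/vda" && echo "valid"     # prints: valid
is_host_device "/dev"     || echo "invalid"   # prints: invalid
```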

Fixes: #5145

Signed-off-by: bin liu <liubin0329@gmail.com>
2022-09-08 22:44:46 +08:00
Bin Liu
85b49cee02 runtime-rs: add README.md
Add README.md for runtime-rs.

Fixes: #5022

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-08 16:03:45 +08:00
Bin Liu
7cfc357c6e Merge pull request #5034 from ManaSugi/runk/refactor-container-builder
runk: Refactor container builder
2022-09-08 11:30:07 +08:00
Bin Liu
36d805fab9 config: add "inline-virtio-fs" as a "shared_fs" type
"inline-virtio-fs" is newly supported by Kata 3.0 as a "shared_fs" type,
so it should be described in the configuration file.

"inline-virtio-fs" is the same as "virtio-fs", but it runs in the same
process as the shim and does not need an external virtiofsd process.
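As a sketch, such an entry might look like this in the TOML configuration (the section name and daemon path below are illustrative assumptions, not taken from the actual shipped file):

```toml
[hypervisor.dragonball]
shared_fs = "inline-virtio-fs"
# With "inline-virtio-fs" the virtiofs service runs inside the shim
# process, so no external daemon path needs to be configured:
# virtio_fs_daemon = "/usr/libexec/virtiofsd"   # only for "virtio-fs"
```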

Fixes: #5102

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-08 11:05:01 +08:00
Fabiano Fidêncio
5793685a4b Merge pull request #5095 from ryansavino/sev-kernel-build-fix
kernel: fix kernel tarball name for SEV
2022-09-07 17:50:17 +02:00
Bin Liu
5df6ff991d Merge pull request #5116 from liubin/fix/5115-replace-tab-by-space
libs/kata-types: replace tabs by spaces in comments
2022-09-07 15:53:34 +08:00
Fabiano Fidêncio
e94d38c97b Merge pull request #5058 from ryansavino/gperf-url-fix
gperf: point URL to mirror site
2022-09-07 09:25:13 +02:00
Bin Liu
fe55f6afd7 Merge pull request #5124 from amshinde/revert-arp-neighbour-api
Revert arp neighbour api
2022-09-07 11:14:53 +08:00
Chelsea Mafrica
051dabb0fe Merge pull request #5099 from liubin/fix/5098-add-default-config-for-runtime-rs
runtime-rs: add default agent/runtime/hypervisor for configuration
2022-09-06 17:49:42 -07:00
Archana Shinde
d23779ec9b Revert "agent: fix unittests for arp neighbors"
This reverts commit 81fe51ab0b.
2022-09-06 15:41:42 -07:00
Archana Shinde
d340564d61 Revert "agent: use rtnetlink's neighbours API to add neighbors"
This reverts commit 845c1c03cf.

Fixes: #5126
2022-09-06 15:41:42 -07:00
Archana Shinde
188d37badc kata-deploy: Add debug statement
Adding this so that we can see the status of running pods in
case of failure.

Fixes: #5126

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-09-06 15:41:14 -07:00
Ryan Savino
b948a8ffe6 kernel: fix kernel tarball name for SEV
The 'linux-' prefix is needed in the tarball name for the SEV case. Output to the same file name.

Fixes: #5094

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-09-06 11:04:29 -05:00
Bin Liu
50f9126153 libs/kata-types: replace tabs by spaces in comments
Replace tabs by spaces in the comments of file
libs/kata-types/src/annotations/mod.rs.

Fixes: #5115

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-06 17:32:57 +08:00
Bin Liu
96c8be715b libs/kata-types: change return type of getting CPU period/quota
The period should have type u64 and the quota type i64; the functions
that read the CPU period and quota from annotations should use the same
data types as their return types.

Fixes: #5100

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-06 11:35:52 +08:00
Bin Liu
fc9c6f87a3 kata-types: don't check virtio_fs_daemon for inline-virtio-fs
If the shared_fs is set to "inline-virtio-fs", the "virtio_fs_daemon"
should be ignored.

Fixes: #5104

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-05 17:44:28 +08:00
James O. D. Hunt
662ce3d6f2 Merge pull request #5086 from Yuan-Zhuo/main
docs: fix unix socket address in agent-ctl doc
2022-09-05 09:24:28 +01:00
Bin Liu
e879270a0c runtime-rs: add default agent/runtime/hypervisor for configuration
Kata 3.0 introduced 3 new configuration entries under the runtime section:

name="virt_container"
hypervisor_name="dragonball"
agent_name="kata"

Blank values will cause startup to fail.

Adding default values will make it easy for users to migrate to Kata 3.0.
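Collected into the configuration file's own TOML format, the defaults amount to something like this (a sketch built from the keys quoted above):

```toml
[runtime]
name = "virt_container"
hypervisor_name = "dragonball"
agent_name = "kata"
```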

Fixes: #5098

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-05 15:55:28 +08:00
Bin Liu
e5437a7084 Merge pull request #5063 from liubin/fix/5062-split-amend-spec
runtime-rs: split amend_spec function
2022-09-05 15:00:31 +08:00
Manabu Sugimoto
968c2f6e8e runk: Refactor container builder
Refactor the container builder code (`InitContainer` and `ActivatedContainer`)
to make it easier to understand and to maintain.

The details:

1. Separate the existing `builder.rs` into an `init_builder.rs` and
`activated_builder.rs` to make them easy to read and maintain.

2. Move the `create_linux_container` function from `builder.rs` to
`container.rs` because it is shared by both files.

3. Move some validation functions such as `validate_spec` from
`builder.rs` to `utils.rs` because they will also be used by other
components as utilities in the future.

Fixes: #5033

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-09-05 14:36:30 +09:00
Bin Liu
ba013c5d0f Merge pull request #4744 from openanolis/runtime-rs-static_resource_mgmt
runtime-rs: support functionality of static resource management
2022-09-05 11:17:09 +08:00
Wainer Moschetta
e81a73b622 Merge pull request #4719 from bookinabox/cargo-deny
github-actions: Add cargo-deny
2022-09-02 17:24:50 -03:00
Fabiano Fidêncio
1ccd883103 Merge pull request #5090 from fidencio/topic/keep-passing-build-suffix-to-qemu
qemu: Keep passing BUILD_SUFFIX
2022-09-02 19:37:22 +02:00
Fabiano Fidêncio
373dac2dbb qemu: Keep passing BUILD_SUFFIX
In the commit 54d6d01754 we ended up
removing the BUILD_SUFFIX argument passed to QEMU as it only seemed to
be used to generate the HYPERVISOR_NAME and PKGVERSION, which were added
as arguments to the dockerfile.

However, it turns out BUILD_SUFFIX is used by the `qemu-build-post.sh`
script, so it can rename the QEMU binary accordingly.

Let's just bring it back.

Fixes: #5078

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-09-02 15:47:48 +02:00
Fabiano Fidêncio
9cf4eaac13 Merge pull request #5079 from ryansavino/tdx-qemu-tarball-path-fix
qemu: fix tdx qemu tarball directories
2022-09-02 14:04:50 +02:00
Yuan-Zhuo
5f4f5f2400 docs: fix unix socket address in agent-ctl doc
Following the instructions in the guidance doc results in ECONNREFUSED;
we need to keep the unix socket address consistent between the two commands.

Fixes: #5085

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2022-09-02 17:37:44 +08:00
Peng Tao
b5786361e9 Merge pull request #4862 from egernst/memory-hotplug-limitation
Address Memory hotplug limitation
2022-09-02 16:11:46 +08:00
Ryan Savino
59e3850bfd qemu: create no_patches.txt file for SPR-BKC-QEMU-v2.5
Patches fail without a no_patches.txt file for SPR-BKC-QEMU-v2.5.

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-09-01 21:07:30 -05:00
Bin Liu
6de4bfd860 Merge pull request #5076 from GabyCT/topic/updatedeveloperguide
docs: Update url in the Developer Guide
2022-09-02 10:01:02 +08:00
Ryan Savino
54d6d01754 qemu: fix tdx qemu tarball directories
A Dockerfile cannot decipher multiple conditional statements in the main RUN call,
and statements cannot be segregated with '{}' braces without wrapping the entire
statement in 'bash -c'. A Dockerfile also does not support setting variables from
bash commands, so HYPERVISOR_NAME and PKGVERSION must be set from the parent
script: build-base-qemu.sh
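The resulting pattern can be sketched from the parent script's side (the values below are made-up examples, not the real ones):

```shell
# Compute values in the parent script and hand them to the Dockerfile as
# build args, instead of deriving them inside a RUN instruction.
HYPERVISOR_NAME="qemu-tdx"        # assumed example value
PKGVERSION="kata-static-qemu"     # assumed example value

build_cmd="docker build --build-arg HYPERVISOR_NAME=${HYPERVISOR_NAME} --build-arg PKGVERSION=${PKGVERSION} ."
echo "$build_cmd"
```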

Fixes: #5078

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-09-01 20:36:28 -05:00
Archana Shinde
f79ef1ad90 Merge pull request #5048 from amshinde/3.0.0-alpha1-branch-bump
# Kata Containers 3.0.0-alpha1
2022-09-02 06:42:16 +05:30
Gabriela Cervantes
e83b821316 docs: Update url in the Developer Guide
This PR updates the url for containerd in the Developer Guide.

Fixes #5075

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-09-01 15:33:29 +00:00
Zhongtao Hu
84268f8716 runtime-rs: update rust runtime roadmap
Update the status and plan for the Rust runtime development

Fixes: #4884
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-09-01 22:53:30 +08:00
GabyCT
9bce2beebf Merge pull request #5040 from GabyCT/topic/updatecni
versions: Update cni plugins version
2022-09-01 09:31:06 -05:00
Bin Liu
69b82023a8 Merge pull request #5065 from liubin/fix/5064-specify-language-for-code-in-markdown
docs: Specify language in markdown for syntax highlight
2022-09-01 16:11:23 +08:00
Bin Liu
41ec71169f runtime-rs: split amend_spec function
amend_spec does two jobs:

- modify the spec
- check whether the pid namespace is enabled

This makes it confusing, so split it into two functions.

Fixes: #5062

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-01 14:44:54 +08:00
Bin Liu
749a6a2480 docs: Specify language in markdown for syntax highlight
Specify the language for code blocks in docs/Unit-Test-Advice.md
to enable syntax highlighting.

Fixes: #5064

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-01 13:54:31 +08:00
Eric Ernst
9997ab064a sandbox_test: Add test to verify memory hotplug behavior
Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.

We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
f390c122f0 sandbox: don't hotplug too much memory at once
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.

During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).

From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest (a factor of 48 results in using 75% of
provided memory for hotplug). Using the prior example of a guest with
256 MiB RAM, 256 MiB * 48 = 12 GiB; 12 GiB is the upper end of what we
should expect can be hotplugged successfully into the guest.
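The arithmetic in the message can be verified directly (one 64-byte memmap entry per 4 KiB page, per the explanation above):

```shell
# memmap overhead for hotplugging 12 GiB of memory
hotplug_bytes=$((12 * 1024 * 1024 * 1024))
pages=$((hotplug_bytes / 4096))
overhead_mib=$((pages * 64 / 1024 / 1024))
echo "memmap overhead: ${overhead_mib} MiB"     # 192 MiB

# factor-of-48 heuristic for a 256 MiB guest
echo "max hotplug: $((256 * 48 / 1024)) GiB"    # 12 GiB
```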

Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.

If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.

Fixes: #4847

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Ryan Savino
566656b085 gperf: point URL to mirror site
gperf downloads fail intermittently.
Changing to a mirror site will hopefully increase download reliability.

Fixes: #5057

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-31 10:02:53 -05:00
Fabiano Fidêncio
08d230c940 Merge pull request #5046 from fidencio/topic/fix-regression-on-building-tdx-kernel
kernel: Re-work get_tee_kernel()
2022-08-31 13:16:26 +02:00
Greg Kurz
380af44043 Merge pull request #5036 from jpecholt/whitelist-cleanup
kernel: Whitelist cleanup
2022-08-31 11:08:32 +02:00
Fabiano Fidêncio
a1fdc08275 kernel: Re-work get_tee_kernel()
00aadfe20a introduced a regression on
`make cc-tdx-kernel-tarball` as we stopped passing all the needed
information to the `build-kernel.sh` script, leading to requiring `yq`
installed in the container used to build the kernel.

This commit partially reverts the faulty one, rewriting it in a way that
brings the old behaviour back without changing the behaviour added by
the faulty commit.

Fixes: #5043

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-31 10:08:12 +02:00
Peng Tao
f1276180b1 Merge pull request #4996 from liubin/fix/4995-delete-socket-option-for-shim
runtime-rs: delete socket from shim command-line options
2022-08-31 14:16:56 +08:00
Bin Liu
515bdcb138 Merge pull request #4900 from wllenyj/dragonball-ut
Built-in Sandbox: add more unit tests for dragonball.
2022-08-31 14:00:07 +08:00
Eric Ernst
e0142db24f hypervisor: Add GetTotalMemoryMB to interface
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calculating
how much memory we can add at a time when utilizing ACPI hotplug.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-30 16:37:47 -07:00
Archana Shinde
0ab49b233e release: Kata Containers 3.0.0-alpha1
- Initrd fixes for ubuntu systemd
- kernel: Add CONFIG_CGROUP_HUGETLB=y as part of the cgroup fragments
- Fix kata-deploy to work on CI context
- github-actions: Auto-backporting
- runtime-rs: add support for core scheduling
- ci: Use versions.yaml for the libseccomp
- runk: Add cli message for init command
- agent: add some logs for mount operation
- Use iouring for qemu block devices
- logging: Replace nix::Error::EINVAL with more descriptive msgs
- kata-deploy: fix threading conflicts
- kernel: Ignore CONFIG_SPECULATION_MITIGATIONS for older kernels
- runtime-rs: support loading kernel modules in guest vm
- TDX: Get TDX working again with Cloud Hypervisor + a minor change on QEMU's code
- runk: Move delete logic to libcontainer
- runtime: cri-o annotations have been moved to podman
- Fix depbot reported rust crates dependency security issues
- UT: test_load_kernel_module needs root
- enable vmx for vm factory
- runk: add pause/resume commands
- kernel: upgrade guest kernel support to 5.19
- Drop-in cfg files support in runtime-rs
- agent: do some rollback work in case do_create_container fails
- network: Fix error message for setting hardware address on TAP interface
- Upgrade to Cloud Hypervisor v26.0
- runtime: tracing: End root span at end of trace
- ci: Update libseccomp version
- dep: update nix dependency
- Updated the link target of CRI-O
- libs/test-utils: share test code by create a new crate

dc32c4622 osbuilder: fix ubuntu initrd /dev/ttyS0 hang
cc5f91dac osbuilder: add systemd symlinks for kata-agent
c08a8631e agent: add some logs for mount operation
0a6f0174f kernel: Ignore CONFIG_SPECULATION_MITIGATIONS for older kernels
6cf16c4f7 agent-ctl: fix clippy error
4b57c04c3 runtime-rs: support loading kernel modules in guest vm
dc90eae17 qemu: Drop unnecessary `tdx_guest` kernel parameter
d4b67613f clh: Use HVC console with TDX
c0cb3cd4d clh: Avoid crashing when memory hotplug is not allowed
9f0a57c0e clh: Increase API and SandboxStop timeouts for TDX
b535bac9c runk: Add cli message for init command
c142fa254 clh: Lift the sharedFS restriction used with TDX
bdf8a57bd runk: Move delete logic to libcontainer
a06d819b2 runtime: cri-o annotations have been moved to podman
ffd1c1ff4 agent-ctl/trace-forwarder: update thread_local dependency
69080d76d agent/runk: update regex dependency
e0ec09039 runtime-rs: update async-std dependency
763ceeb7b logging: Replace nix::Error::EINVAL with more descriptive msgs
4ee2b99e1 kata-deploy: fix threading conflicts
731d39df4 kernel: Add CONFIG_CGROUP_HUGETLB=y as part of the cgroup fragments
96d903734 github-actions: Auto-backporting
a6fbaac1b runk: add pause/resume commands
8e201501e kernel: fix for set_kmem_limit error
00aadfe20 kernel: SEV guest kernel upgrade to 5.19.2
0d9d8d63e kernel: upgrade guest kernel support to 5.19.2
57bd3f42d runtime-rs: plug drop-in decoding into config-loading code
87b97b699 runtime-rs: add filesystem-related part of drop-in handling
cf785a1a2 runtime-rs: add core toml::Value tree merging
92f7d6bf8 ci: Use versions.yaml for the libseccomp
f508c2909 runtime: constify splitIrqChipMachineOptions
2b0587db9 runtime: VMX is migratable in vm factory case
fa09f0ec8 runtime: remove qemuPaths
326f1cc77 agent: enrich some error code path
4f53e010b agent: skip test_load_kernel_module if non-root
3a597c274 runtime: clh: Use the new 'payload' interface
16baecc5b runtime: clh: Re-generate the client code
50ea07183 versions: Upgrade to Cloud Hypervisor v26.0
f7d41e98c kata-deploy: export CI in the build container
4f90e3c87 kata-deploy: add dockerbuild/install_yq.sh to gitignore
8ff5c10ac network: Fix error message for setting hardware address on TAP interface
338c28295 dep: update nix dependency
78231a36e ci: Update libseccomp version
34746496b libs/test-utils: share test code by create a new crate
3829ab809 docs: Update CRI-O target link
fcc1e0c61 runtime: tracing: End root span at end of trace
c1e3b8f40 govmm: Refactor qmp functions for adding block device
598884f37 govmm: Refactor code to get rid of redundant code
00860a7e4 qmp: Pass aio backend while adding block device
e1b49d758 config: Add block aio as a supported annotation
ed0f1d0b3 config: Add "block_device_aio" as a config option for qemu
b6cd2348f govmm: Add io_uring as AIO type
81cdaf077 govmm: Correct documentation for Linux aio.
a355812e0 runtime-rs: fixed bug on core-sched error handling
591dfa4fe runtime-rs: add support for core scheduling
09672eb2d agent: do some rollback work in case do_create_container fails

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-30 12:59:10 -07:00
Derek Lee
52bbc3a4b0 cargo.lock: update crates to comply with checks
Updates the version of crossbeam-channel because 0.52.0 is a yanked package
(the creators marked the version as not for release except as a dependency
for another package).

Updates chrono to use >0.42.0 to avoid:
https://rustsec.org/advisories/RUSTSEC-2020-0159

Updates lz4-sys.

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-08-30 10:08:41 -07:00
Derek Lee
aa581f4b28 cargo.toml: Add oci to src/libs workplace
Adds oci under the src/libs workplace.

oci shares a Cargo.lock file with the rest of src/libs but was not
listed as a member of the workspace.

There is no clear reason why it is not included in the workspace, so it is
added here so that cargo-deny stops complaining.

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-08-30 09:30:03 -07:00
Derek Lee
7914da72c9 cargo.tomls: Added Apache 2.0 to cargo.tomls
One of the checks done by cargo-deny is ensuring all crates have a valid
license. As the rust programs import each other, cargo.toml files
without licenses trigger the check. While I could disable this check,
doing so would be bad practice.

This adds an Apache-2.0 license in the Cargo.toml files.

Some of these files already had a header comment saying it is an Apache
license. As the entire project itself is under an Apache-2.0 license, I
assumed all individual components would also be covered under that
license.

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-08-30 09:30:03 -07:00
Derek Lee
bed4aab7ee github-actions: Add cargo-deny
Adds cargo-deny to scan for vulnerabilities and license issues regarding
rust crates.

GitHub Actions does not have an obvious way to loop over each of the
Cargo.toml files. To avoid hardcoding it, I worked around the problem
using a composite action that first generates the cargo-deny action by
finding all Cargo.toml files before calling this new generated action in
the master workflow.

Uses recommended deny.toml from cargo-deny repo with the following
modifications:

 ignore = ["RUSTSEC-2020-0071"]
  because chrono is dependent on the version of time with the
  vulnerability and there is no simple workaround

 multiple-versions = "allow"
  Because of the above error and other packages, there are instances
  where some crates require different versions of a crate.

 unknown-git = "allow"
  I don't see a particular issue with allowing crates from other repos.
  An alternative would be to manually add each repo we want to an
  allow-git list, but I see that as more of a nuisance than it's worth.
  We could leave this as a warning (the default), but to avoid clutter
  I'm going to allow it.

If deny.toml needs to be edited in the future, here's the guide:
https://embarkstudios.github.io/cargo-deny/index.html
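Collected, the modifications above correspond to entries like these in deny.toml (section names follow cargo-deny's documented schema; this is a sketch, not the committed file):

```toml
[advisories]
# chrono depends on the affected `time` version; no simple workaround
ignore = ["RUSTSEC-2020-0071"]

[bans]
# some crates legitimately pull in different versions of a dependency
multiple-versions = "allow"

[sources]
# accept crates fetched from arbitrary git repositories
unknown-git = "allow"
```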

Fixes #3359

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-08-30 09:30:03 -07:00
Gabriela Cervantes
b1a8acad57 versions: Update cni plugins version
This PR updates the cni plugins version that is being used in the kata CI.

Fixes #5039
Depends-on: github.com/kata-containers/tests#5088

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-08-30 16:04:45 +00:00
Joana Pecholt
a6581734c2 kernel: Whitelist cleanup
This removes two options that are not needed (any longer). These
are not set for any kernel so they do not need to be ignored either.

Fixes #5035

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-08-30 13:24:12 +02:00
Fabiano Fidêncio
1b92a946d6 Merge pull request #4987 from ryansavino/initrd-fixes-for-ubuntu-systemd
Initrd fixes for ubuntu systemd
2022-08-30 09:16:43 +02:00
GabyCT
630eada0d3 Merge pull request #4956 from shippomx/main
kernel: Add CONFIG_CGROUP_HUGETLB=y as part of the cgroup fragments
2022-08-29 14:31:46 -05:00
GabyCT
3426da66df Merge pull request #4951 from wainersm/fix_kata-deploy-ci
Fix kata-deploy to work on CI context
2022-08-29 14:30:59 -05:00
Wainer Moschetta
cd5be6d55a Merge pull request #4775 from bookinabox/auto-backport
github-actions: Auto-backporting
2022-08-29 14:08:12 -03:00
Bin Liu
11383c2c0e Merge pull request #4797 from openanolis/runtime-rs-coresched
runtime-rs: add support for core scheduling
2022-08-29 14:28:30 +08:00
Bin Liu
25f54bb999 Merge pull request #4942 from ManaSugi/fix/use-versions-yaml-for-libseccomp
ci: Use versions.yaml for the libseccomp
2022-08-29 11:22:35 +08:00
Archana Shinde
c174eb809e Merge pull request #4983 from ManaSugi/runk/add-init-msg
runk: Add cli message for init command
2022-08-27 00:15:25 +05:30
Ryan Savino
dc32c4622f osbuilder: fix ubuntu initrd /dev/ttyS0 hang
The guest log shows a hang on systemd getty start.
Adding a symlink for /dev/ttyS0 resolves the issue.

Fixes: #4932

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-26 04:59:36 -05:00
Ryan Savino
cc5f91dac7 osbuilder: add systemd symlinks for kata-agent
With AGENT_INIT=no (systemd), add symlinks for the kata-agent service.

Fixes: #4932

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-26 04:59:36 -05:00
Fupan Li
63959b0be6 Merge pull request #5011 from liubin/fix/4962-add-logs
agent: add some logs for mount operation
2022-08-26 17:12:15 +08:00
Bin Liu
c08a8631e0 agent: add some logs for mount operation
Some mount code paths lack log info. Adding more details about the
storage, and logging on error, helps understand what happened.

Fixes: #4962

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-26 14:09:56 +08:00
Archana Shinde
7d52934ec1 Merge pull request #4798 from amshinde/use-iouring-qemu
Use iouring for qemu block devices
2022-08-26 04:00:24 +05:30
Wainer Moschetta
cbe5e324ae Merge pull request #4815 from bookinabox/improve-agent-errors
logging: Replace nix::Error::EINVAL with more descriptive msgs
2022-08-25 14:27:56 -03:00
Fabiano Fidêncio
1eea3d9920 Merge pull request #4965 from ryansavino/kata-deploy-threading-fix
kata-deploy: fix threading conflicts
2022-08-25 19:11:52 +02:00
Fabiano Fidêncio
70cd4f1320 Merge pull request #4999 from fidencio/topic/ignore-CONFIG_SPECULATION_MITIGATIONS-for-older-kernels
kernel: Ignore CONFIG_SPECULATION_MITIGATIONS for older kernels
2022-08-25 17:43:57 +02:00
Fabiano Fidêncio
0a6f0174f5 kernel: Ignore CONFIG_SPECULATION_MITIGATIONS for older kernels
TDX kernel is based on a kernel version which doesn't have the
CONFIG_SPECULATION_MITIGATIONS option.

Having this in the allow list for missing configs avoids a breakage in
the TDX CI.

Fixes: #4998

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-25 10:51:13 +02:00
Bin Liu
cce99c5c73 runtime-rs: delete socket from shim command-line options
The socket command-line option is not used to specify the socket
address; runtime-rs reads it from an environment variable instead.

Fixes: #4995

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-25 15:32:17 +08:00
Bin Liu
a7e64b1ca9 Merge pull request #4892 from openanolis/shuoyu/runtime-rs
runtime-rs: support loading kernel modules in guest vm
2022-08-25 15:01:23 +08:00
Fabiano Fidêncio
ddc94e00b0 Merge pull request #4982 from fidencio/topic/improve-cloud-hypervisor-plus-tdx-support
TDX: Get TDX working again with Cloud Hypervisor + a minor change on QEMU's code
2022-08-25 08:53:10 +02:00
Bin Liu
875d946fb4 Merge pull request #4976 from ManaSugi/runk/refactor-delete-func
runk: Move delete logic to libcontainer
2022-08-25 14:30:30 +08:00
Yushuo
6cf16c4f76 agent-ctl: fix clippy error
Fixes: #4988

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2022-08-25 11:00:49 +08:00
Yushuo
4b57c04c33 runtime-rs: support loading kernel modules in guest vm
Users can specify the kernel modules to be loaded through the agent
configuration in the kata configuration file or in the pod annotation file.

Information about those modules is sent to the kata agent when the
sandbox is created.

Fixes: #4894

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2022-08-25 10:38:04 +08:00
Peng Tao
aa6bcacb7d Merge pull request #4973 from bergwolf/github/go-depbot
runtime: cri-o annotations have been moved to podman
2022-08-25 10:12:06 +08:00
Peng Tao
78af76b72a Merge pull request #4969 from bergwolf/github/depbot
Fix depbot reported rust crates dependency security issues
2022-08-25 10:11:54 +08:00
Fabiano Fidêncio
dc90eae17b qemu: Drop unnecessary tdx_guest kernel parameter
With the current TDX kernel used with Kata Containers, `tdx_guest` is
not needed, as TDX_GUEST is now a kernel configuration.

With this in mind, let's just drop the kernel parameter.

Fixes: #4981

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:43 +02:00
Fabiano Fidêncio
d4b67613f0 clh: Use HVC console with TDX
As right now the TDX guest kernel doesn't support "serial" console,
let's switch to using HVC in this case.

Fixes: #4980

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:40 +02:00
Fabiano Fidêncio
c0cb3cd4d8 clh: Avoid crashing when memory hotplug is not allowed
The runtime will crash when trying to resize memory if memory hotplug
is not allowed.

This happens because we cannot simply set the hotplug amount to zero;
instead we don't set memory hotplug at all, and later we end up
dereferencing a nil pointer.

Fixes: #4979

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:22 +02:00
Fabiano Fidêncio
9f0a57c0eb clh: Increase API and SandboxStop timeouts for TDX
While doing tests using `ctr`, I've noticed that I've been hitting those
timeouts more frequently than expected.

Till we find the root cause of the issue (which is *not* in Kata
Containers), let's increase the timeouts when dealing with a
Confidential Guest.

Fixes: #4978

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:12 +02:00
Manabu Sugimoto
b535bac9c3 runk: Add cli message for init command
Add a CLI message for the init command telling the user
not to run this command directly.

Fixes: #4367

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-08-25 00:32:35 +09:00
Fabiano Fidêncio
c142fa2541 clh: Lift the sharedFS restriction used with TDX
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.

Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.

See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""

Fixes: #4977

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 17:14:05 +02:00
Manabu Sugimoto
bdf8a57bdb runk: Move delete logic to libcontainer
Move delete logic to `libcontainer` crate to make the code clean
like other commands.

Fixes: #4975

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-08-24 19:12:36 +09:00
Peng Tao
a06d819b24 runtime: cri-o annotations have been moved to podman
Let's switch to depending on podman, which also simplifies the indirect
dependency on kubernetes components. It helps us avoid cri-o
security issues like CVE-2022-1708 as well.

Fixes: #4972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 18:11:37 +08:00
Peng Tao
ffd1c1ff4f agent-ctl/trace-forwarder: update thread_local dependency
To bring in fix to CWE-362.

Fixes: #4968
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 17:10:49 +08:00
Peng Tao
69080d76da agent/runk: update regex dependency
To bring in fix to CVE-2022-24713.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 17:02:15 +08:00
Peng Tao
e0ec09039d runtime-rs: update async-std dependency
So that we bump several indirect dependencies like crossbeam-channel,
crossbeam-utils to bring in fixes to known security issues like CVE-2020-15254.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 16:56:29 +08:00
Bin Liu
2b5dc2ad39 Merge pull request #4705 from bergwolf/github/agent-ut-improve
UT: test_load_kernel_module needs root
2022-08-24 16:22:55 +08:00
Bin Liu
6551d4f25a Merge pull request #4051 from bergwolf/github/vmx-vm-factory
enable vmx for vm factory
2022-08-24 16:22:37 +08:00
Bin Liu
ad91801240 Merge pull request #4870 from cyyzero/runk-cgroup
runk: add pause/resume commands
2022-08-24 14:44:43 +08:00
Derek Lee
763ceeb7ba logging: Replace nix::Error::EINVAL with more descriptive msgs
Replaces instances of anyhow!(nix::Error::EINVAL) with other messages to
make it easier to debug.

Fixes #954

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-08-23 13:44:46 -07:00
Ryan Savino
4ee2b99e1e kata-deploy: fix threading conflicts
Fix threading conflicts when kata-deploy 'make kata-tarball' is called.
Force the creation of rootfs tarballs to happen serially instead of in parallel.

Fixes: #4787

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-23 12:35:23 -05:00
Miao Xia
731d39df45 kernel: Add CONFIG_CGROUP_HUGETLB=y as part of the cgroup fragments
The Kata guest OS cgroup does not work properly when the guest kernel
config option CONFIG_CGROUP_HUGETLB is not set, leading to:

root@clr-b08d402cc29d44719bb582392b7b3466 ls /sys/fs/cgroup/hugetlb/
ls: cannot access '/sys/fs/cgroup/hugetlb/': No such file or directory

Fixes: #4953

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
2022-08-23 12:31:13 +02:00
Derek Lee
96d9037347 github-actions: Auto-backporting
A semi-automated implementation of the backporting
process.

This implementation has two steps:
1. Check whether any associated issues are marked as bugs.

   If so, mark the PR with the `auto-backport` label.

2. On a successful merge, if there is an `auto-backport` label and there
   are any tags of `backport-to-BRANCHNAME`, it calls an action that
   cherry-picks the commits in the PR and automatically creates a PR to
   those branches.

This action uses https://github.com/sqren/backport-github-action

Fixes #3618

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-08-22 16:19:09 -07:00
Chen Yiyang
a6fbaac1bd runk: add pause/resume commands
To make cgroup v1 and v2 work well, I now use `cgroups::cgroup` in
`Container` to manage cgroups. `CgroupManager` in rustjail has some
drawbacks. First, methods in the Manager trait are not visible, so we
would need to modify rustjail to make them public. Second,
CgroupManager.cgroup is private too, and it can't be serialized, so we
can't load/save it in the status file. One solution is adding a
getter/setter in rustjail, then creating the `cgroup` and setting it
when loading status. To keep the modifications to rustjail to a
minimum, I use `cgroups::cgroup` directly. It now works on cgroup v1
and v2, since cgroup-rs handles this.

Fixes: #4364 #4821

Signed-off-by: Chen Yiyang <cyyzero@qq.com>
2022-08-22 23:11:50 +08:00
Fabiano Fidêncio
d797036b77 Merge pull request #4861 from ryansavino/upgrade-kernel-support-5.19
kernel: upgrade guest kernel support to 5.19
2022-08-22 14:57:00 +02:00
Bin Liu
8c8e97a495 Merge pull request #4772 from pmores/drop-in-cfg-files-support-rs
Drop-in cfg files support in runtime-rs
2022-08-22 13:41:56 +08:00
Bin Liu
eb91ee45be Merge pull request #4754 from liubin/fix/4749-rollback-when-creating-container-failed
agent: do some rollback works if case of do_create_container failed
2022-08-22 10:44:11 +08:00
Ryan Savino
8e201501ef kernel: fix for set_kmem_limit error
Fixes: #4390

Fix in cargo cgroups-rs crate - Updated crate version to 0.2.10

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-19 13:08:14 -05:00
Ryan Savino
00aadfe20a kernel: SEV guest kernel upgrade to 5.19.2
kernel: Update SEV guest kernel to 5.19.2

Kernel 5.19.2 has all the needed patches for running SEV, thus let's update it and stop using the version coming from confidential-containers.

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-19 13:08:14 -05:00
Ryan Savino
0d9d8d63ea kernel: upgrade guest kernel support to 5.19.2
kernel: Upgrade guest kernel support to 5.19.2

Let's update to the latest 5.19.x released kernel.

CONFIG modifications necessary:
fragments/common/dax.conf - CONFIG_DEV_PAGEMAP_OPS no longer configurable:
https://www.kernelconfig.io/CONFIG_DEV_PAGEMAP_OPS?q=CONFIG_DEV_PAGEMAP_OPS&kernelversion=5.19.2
fragments/common/dax.conf - CONFIG_ND_BLK no longer supported:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f8669f1d6a86a6b17104ceca9340ded280307ac1
fragments/x86_64/base.conf - CONFIG_SPECULATION_MITIGATIONS is a dependency for CONFIG_RETPOLINE:
https://www.kernelconfig.io/config_retpoline?q=&kernelversion=5.19.2
fragments/s390/network.conf - removed from kernel since 5.9.9:
https://www.kernelconfig.io/CONFIG_PACK_STACK?q=CONFIG_PACK_STACK&kernelversion=5.19.2

Updated vmlinux path in build-kernel.sh for arch s390

Fixes #4860

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-08-19 13:08:13 -05:00
Fabiano Fidêncio
9806ce8615 Merge pull request #4937 from chenhengqi/fix-error-msg
network: Fix error message for setting hardware address on TAP interface
2022-08-19 17:54:58 +02:00
Pavel Mores
57bd3f42d3 runtime-rs: plug drop-in decoding into config-loading code
To plug drop-in support into existing config-loading code in a robust
way, more specifically to create a single point where this needs to be
handled, load_from_file() and load_raw_from_file() were refactored.
Seeing as the original implementations of both functions were identical
apart from the adjust_config() calls in load_from_file(), load_from_file()
was reimplemented in terms of load_raw_from_file().

Fixes  #4771

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-08-19 11:01:29 +02:00
Pavel Mores
87b97b6994 runtime-rs: add filesystem-related part of drop-in handling
The central function being added here is load() which takes a path to a
base config file and uses it to load the base config file itself, find
the corresponding drop-in directory (get_dropin_dir_path()), iterate
through its contents (update_from_dropins()) and load each drop-in in
turn and merge its contents with the base file (update_from_dropin()).

Also added is a test of load() which mirrors the corresponding test in
the golang runtime (TestLoadDropInConfiguration() in config_test.go).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-08-19 11:01:29 +02:00
Pavel Mores
cf785a1a23 runtime-rs: add core toml::Value tree merging
This is the core functionality of merging config file fragments into the
base config file. Our TOML parser crate doesn't seem to allow working
at the level of TomlConfig instances like BurntSushi (used in the Golang
runtime) does, so we implement the required functionality at the level
of toml::Value trees.

Tests to verify basic requirements are included.  Values set by a base
config file and not touched by a subsequent drop-in should be preserved.
Drop-in config file fragments should be able to change values set by the
base config file and add settings not present in the base.  Conversion
of a merged tree into a mock TomlConfig-style structure is tested as
well.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-08-19 11:01:29 +02:00
Manabu Sugimoto
92f7d6bf8f ci: Use versions.yaml for the libseccomp
It would be nice to use `versions.yaml` for maintainability.
Previously, we specified the `libseccomp` and the `gperf` versions
directly in this script without using `versions.yaml`, because the
current snap workflow is incomplete and fails.
This is because the snap CI environment does not have the kata-containers
repository under ${GOPATH}. To avoid the failure, the `rootfs.sh` extracts
the libseccomp version and URL in advance and passes them to
`install_libseccomp.sh` as environment variables.

Fixes: #4941

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-08-19 09:05:08 +09:00
Fabiano Fidêncio
828383bc39 Merge pull request #4933 from likebreath/0816/prepare_clh_v26.0
Upgrade to Cloud Hypervisor v26.0
2022-08-18 18:36:53 +02:00
James O. D. Hunt
6d6edb0bb3 Merge pull request #4903 from cmaf/tracing-defer-rootSpan-end
runtime: tracing: End root span at end of trace
2022-08-18 08:51:41 +01:00
Peng Tao
f508c2909a runtime: constify splitIrqChipMachineOptions
A simple cleanup.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:09:20 +08:00
Peng Tao
2b0587db95 runtime: VMX is migratable in vm factory case
We are not spinning up any L2 guests in vm factory, so the L1 guest
migration is expected to work even with VMX.

See https://www.linux-kvm.org/page/Nested_Guests

Fixes: #4050
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:08:43 +08:00
Peng Tao
fa09f0ec84 runtime: remove qemuPaths
It is broken in that it doesn't list the QemuVirt machine type. In fact
we don't need it at all. Just drop it.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:06:10 +08:00
Peng Tao
326f1cc773 agent: enrich some error code path
So that it is easier to find out why some function fails.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:02:12 +08:00
Peng Tao
4f53e010b4 agent: skip test_load_kernel_module if non-root
We need root privilege to load a real kernel module.

Fixes: #4704
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:02:12 +08:00
Bin Liu
cc4b9ac7cd Merge pull request #4940 from ManaSugi/fix/update-libseccomp-version
ci: Update libseccomp version
2022-08-18 08:36:59 +08:00
Bin Liu
c7b7bb701a Merge pull request #4930 from bergwolf/github/depbot
dep: update nix dependency
2022-08-18 08:05:14 +08:00
Bo Chen
3a597c2742 runtime: clh: Use the new 'payload' interface
The new 'payload' interface now contains the 'kernel' and 'initramfs'
config.

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:43 -07:00
Bo Chen
16baecc5b1 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v26.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:12 -07:00
Bo Chen
50ea071834 versions: Upgrade to Cloud Hypervisor v26.0
Highlights from the Cloud Hypervisor release v26.0:

**SMBIOS Improvements via `--platform`**
`--platform` and the appropriate API structure has gained support for supplying
OEM strings (primarily used to communicate metadata to systemd in the guest)

**Unified Binary MSHV and KVM Support**
Support for both the MSHV and KVM hypervisors can be compiled into the same
binary with the detection of the hypervisor to use made at runtime.

**Notable Bug Fixes**
* The prefetchable flag is preserved on BARs for VFIO devices
* PCI Express capabilities for functionality we do not support are now
filtered out
* GDB breakpoint support is more reliable
* SIGINT and SIGTERM signals are now handled before the VM has booted
* Multiple API event loop handling bug fixes
* Incorrect assumptions in virtio queue numbering were addressed, allowing
the virtio-fs driver in OVMF to be used
* VHDX file format header fix
* The same VFIO device cannot be added twice
* SMBIOS tables were being incorrectly generated

**Deprecations**
Deprecated features will be removed in a subsequent release and users should
plan to use alternatives.

The top-level `kernel` and `initramfs` members on the `VmConfig` have been
moved inside a `PayloadConfig` as the `payload` member. The OpenAPI document
has been updated to reflect the change and the old API members continue to
function and are mapped to the new version. The expectation is that these old
versions will be removed in the v28.0 release.

**Removals**
The following functionality has been removed:

The unused poll_queue parameter has been removed from --disk and
equivalent. This was residual from the removal of the vhost-user-block
spawning feature.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v26.0

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:20:26 -07:00
wllenyj
c75970b816 dragonball: add more unit test for config manager
Added more unit tests for config manager.

Fixes: #4899

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-08-17 23:46:26 +08:00
Wainer dos Santos Moschetta
f7d41e98cb kata-deploy: export CI in the build container
The clone_tests_repo() in ci/lib.sh relies on the CI variable to decide
whether to check out the tests repository or not. So it is required to
pass that variable down to the kata-deploy build container, otherwise
it can fail in some scenarios.

Fixes #4949
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2022-08-17 10:42:49 -03:00
Wainer dos Santos Moschetta
4f90e3c87e kata-deploy: add dockerbuild/install_yq.sh to gitignore
The install_yq.sh is copied to tools/packaging/kata-deploy/local-build/dockerbuild
so that it is added in the kata-deploy build image. Let's tell git to
ignore that file.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2022-08-17 10:00:28 -03:00
Bin Liu
9d6d236003 Merge pull request #4869 from PrajwalBorkar/prajwal-patch
Updated the link target of CRI-O
2022-08-17 17:55:40 +08:00
Hengqi Chen
8ff5c10ac4 network: Fix error message for setting hardware address on TAP interface
Error out with the correct interface name and hardware address instead.

Fixes: #4944

Signed-off-by: Hengqi Chen <chenhengqi@outlook.com>
2022-08-17 16:42:07 +08:00
Peng Tao
338c282950 dep: update nix dependency
To fix CVE-2021-45707 that affects nix < 0.20.2.

Fixes: #4929
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-17 16:06:26 +08:00
James O. D. Hunt
82ad43f9bf Merge pull request #4928 from liubin/fix/4925-share-test-utils-for-rust
libs/test-utils: share test code by create a new crate
2022-08-17 08:31:11 +01:00
Manabu Sugimoto
78231a36e4 ci: Update libseccomp version
Updates the libseccomp version that is being used in the Kata CI.

Fixes: #4858, #4939

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-08-17 15:39:22 +09:00
Bin Liu
8cd1e50eb6 Merge pull request #4921 from liubin/fix/2920-delete-vergen
runtime-rs: delete vergen dependency
2022-08-17 10:09:12 +08:00
Bin Liu
34746496b7 libs/test-utils: share test code by create a new crate
More and more Rust code is being introduced; the test utils that
originated in the agent should be easy to share. Moving them into a new
crate makes them easy to share between different crates.

Fixes: #4925

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-17 00:12:44 +08:00
GabyCT
dd93d4ad5a Merge pull request #4922 from bergwolf/github/release
workflow: trigger release for 3.x releases
2022-08-16 10:20:33 -05:00
Peng Tao
6d6c068692 workflow: trigger release for 3.x releases
So that we can push 3.x artifacts to the release page.

Fixes: #4919
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-16 17:55:51 +08:00
Bin Liu
eab7c8f28f runtime-rs: delete vergen dependency
vergen is a build dependency, but it is not being used:
the version/commit hash is produced by the make command, not by vergen.

Fixes: #4920

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-16 15:31:24 +08:00
Bin Liu
828574d27c Merge pull request #4893 from openanolis/runtime-rs-main
Runtime-rs: support persist file
2022-08-16 14:42:22 +08:00
Bin Liu
334c7b3355 Merge pull request #4916 from GabyCT/topic/fixurl
docs: Update url in containerd documentation
2022-08-16 13:45:58 +08:00
Bin Liu
f9d3181533 Merge pull request #4911 from bergwolf/3.0.0-alpha0-branch-bump
# Kata Containers 3.0.0-alpha0
2022-08-16 13:44:49 +08:00
Gabriela Cervantes
3e9077f6ee docs: Update url in containerd documentation
This PR updates the url that we have in our kata containerd
documentation.

Fixes #4915

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-08-15 19:04:29 +00:00
Bin Liu
830fb266e6 Merge pull request #4854 from openanolis/runtime-rs-delete
runtime-rs: delete route model
2022-08-15 20:48:58 +08:00
Prajwal Borkar
3829ab809f docs: Update CRI-O target link
Fixes #4767

Signed-off-by: Prajwal Borkar <prajwalborkar5075@gmail.com>
2022-08-15 16:48:32 +05:30
Peng Tao
52133ef66e release: Kata Containers 3.0.0-alpha0
- runtime-rs: fix design doc's typo
- docs: use curl as default downloader for runtime-rs
- runtime-rs: update Cargo.lock
- Fix some GitHub actions workflow issues
- versions: Update libseccomp version
- runtime-rs:merge runtime rs to main
- nydus: wait nydusd API server ready before mounting share fs
- versions: Update TD-shim due to build breakage
- agent-ctl: Add an empty [workspace]
- packaging: Create no_patches.txt for the SPR-BKC-PC-v9.6.x
- docs: Improve SGX documentation
- runtime: explicitly mark the source of the log is from qemu.log
- runtime: add unlock before return in sendReq
- docs: add back host network limitation
- runk: add ps sub-command
- Depends-on:github.com/kata-containers/tests#4986
- runtime-rs:update rtnetlink version
- runtime-rs:skip the build process when the arch is s390x
- docs: Improve SGX documentation
- agent: Use rtnetlink's neighbours API to add neighbors
- Bump TDX dependencies (QEMU and Kernel)
- OVMF / td-shim: Adjust final tarball location
- libs: fix CI error for protocols
- runtime-rs: merge main to runtime-rs
- packaging: Add support for building TDVF
- versions: Track and add support for building TD-shim
- versions: Upgrade rust version
- Merge Main into runtime-rs branch
- agent: log RPC calls for debugging
- runtime-rs: fix stop failed in azure
- Add support AmdSev build of OVMF
- runtime: Support for host cgroupv2
- versions: Update runc version
- qemu: Add liburing to qemu build
- runtime-rs: fix set share sandbox pid namespace
- Docs: fix tables format error
- versions: Update Firecracker version to v1.1.0
- agent: Fix stream fd's double close
- container: kill all of the processes in a container when it terminated
- fix network failed for kata ci
- runtime-rs: handle default_vcpus greater than default_maxvcpus
- agent: fix fd-double-close problem in ut test_do_write_stream
- runtime-rs: add functionalities support for macvlan and vlan endpoints
- Docs: add rust environment setup for kata 3.0
- rustjail: check result to let it return early
- upgrade nydus version
- support disable_guest_seccomp
- cgroups: remove unnecessary get_paths()
- versions: Update firecracker version
- kata-monitor: fix can't monitor /run/vc/sbs
- runtime-rs: fix sandbox_cgroup_only=false panic
- runtime-rs: fix ctr exit failed
- docs: add installation guide for kata 3.0
- runtime-rs: support functionalities of ipvlan endpoint
- runtime-rs: remove the value of hypervisor path in DB config
- kata-sys-util: upgrade nix version
- runtime-rs: fix some bugs to make runtime-rs on aarch64
- runk: Support `exec` sub-command
- runtime-rs: hypervisor part
- clh: Don't crash if no network device is set by the upper layer
- packaging: Rework how ${BUILD_SUFFIX} is used with the QEMU builder scripts
- versions: Update Cloud Hypervisor to v25.0
- Runtime-rs merge main
- kernel: Deduplicate code used for building TEE kernels
- runtime-rs: Dragonball-sandbox - add virtio device feature support for aarch64
- packaging: Simplify config path handling
- build: save lines for repository_owner check
- kata 3.0 Architecture
- Fix clh tarball build
- runtime-rs: built-in Dragonball sandbox part III - virtio-blk, virtio-fs, virtio-net and VMM API support
- runtime: Fix DisableSelinux config
- docs: Update URL links for containerd documentation
- docs: delete CRI containerd plugin statement
- release: Revert kata-deploy changes after 2.5.0-rc0 release
- tools/snap: simplify nproc
- action: revert commit message limit to 150 bytes
- runtime-rs: Dragonball sandbox - add Vcpu::configure() function for aarch64
- runtime-rs: makefile for dragonball
- runtime-rs:refactor network model with netlink
- runtime-rs: Merge Main into runtime-rs branch
- runtime-rs: built-in Dragonball sandbox part II - vCPU manager
- runtime-rs: runtime-rs merge main
- runtime-rs: built-in Dragonball sandbox part I - resource and device managers

caada34f1 runtime-rs: fix design doc's typo
b61dda40b docs: use curl as default downloader for runtime-rs
ca9d16e5e runtime-rs: update Cargo.lock
99a7b4f3e workflow: Revert "static-checks: Allow Merge commit to be >75 chars"
d14e80e9f workflow: Revert "docs: modify move-issues-to-in-progress.yaml"
1f4b6e646 versions: Update libseccomp version
8a4e69008 versions: Update TD-shim due to build breakage
065305f4a agent-ctl: Add an empty [workspace]
1444d7ce4 packaging: Create no_patches.txt for the SPR-BKC-PC-v9.6.x
2ae807fd2 nydus: wait nydusd API server ready before mounting share fs
c8d4ea84e docs: Improve SGX documentation
d8ad16a34 runtime: add unlock before return in sendReq
8bbffc42c runtime-rs:update rtnetlink version
c5452faec docs: Improve SGX documentation
389ae9702  runtime-rs:skip the test when the arch is s390x
945e02227 runtime-rs:skip the build process when the arch is s390x
8d1cb1d51 td-shim: Adjust final tarball location
62f05d4b4 ovmf: Adjust final tarball location
9972487f6 versions: Bump Kernel TDX version
c9358155a kernel: Sort the TDX configs alphabetically
dd397ff1b versions: Bump QEMU TDX version
230a22905 runk: add ps sub-command
889557ecb docs: add back host network limitation
c9b5bde30 versions: Track and build TDVF
e6a5a5106 packaging: Generate a tarball as OVMF build result
42eaf19b4 packaging: Simplify OVMF repo clone
4d33b0541 packaging: Don't hardcode "edk2" as the cloned repo's dir.
7247575fa runtime-rs:fix cargo clippy
b06bc8228 versions: Track and add support for building TD-shim
86ac653ba libs: fix CI error for protocols
81fe51ab0 agent: fix unittests for arp neighbors
845c1c03c agent: use rtnetlink's neighbours API to add neighbors
9b1940e93 versions: update rust version
638c2c416 static-build: Add AmdSev option for OVMF builder Introduces new build of firmware needed for SEV
f0b58e38d static-build: Add build script for  OVMF
fa0b11fc5 runtime-rs: fix stdin hang in azure
5c3155f7e runtime: Support for host cgroup v2
4ab45e5c9 docs: Update support for host cgroupv2
326eb2f91 versions: Update runc version
f5aa6ae46 agent: Fix stream fd's double close problem
6e149b43f Docs: fix tables format error
85f4e7caf runtime: explicitly mark the source of the log is from qemu.log
56d49b507 versions: Update Firecracker version to v1.1.0
b3147411e runtime-rs:add unit test for set share pid ns
1ef3f8eac runtime-rs: set share sandbox pid namespace
57c556a80 runtime-rs: fix stop failed in azure
0e24f47a4 agent: log RPC calls for debugging
c825065b2 runtime-rs: fix tc filter setup failed
e0194dcb5 runtime-rs: update route destination with prefix
fa85fd584 docs: add rust environment setup for kata 3.0
896478c92 runtime-rs: add functionalities support for macvlan and vlan endpoints
df79c8fe1 versions: Update firecracker version
912641509 agent: fix fd-double-close problem in ut test_do_write_stream
43045be8d runtime-rs: handle default_vcpus greator than default_maxvcpu
0d7cb7eb1 agent: delete agent-type property in announce
eec9ac81e rustjail: check result to let it return early.
402bfa0ce nydus: upgrade nydus/nydus-snapshotter version
54f53d57e runtime-rs: support disable_guest_seccomp
4331ef80d Runtime-rs: add installation guide for rust-runtime
72dbd1fcb kata-monitor: fix can't monitor /run/vc/sbs.
e9988f0c6 runtime-rs: fix sandbox_cgroup_only=false panic
cebbebbe8 runtime-rs: fix ctr exit failed
62182db64 runtime-rs: add unit test for ipvlan endpoint
99654ce69 runtime-rs: update dbs-xxx dependencies
f4c3adf59 runtime-rs: Add compile option file
545ae3f0e runtime-rs: fix warning
19eca71cd runtime-rs: remove the value of hypervisor path in DB config
d8920b00c runtime-rs: support functionalities of ipvlan endpoint
2b01e9ba4 dragonball: fix warning
996a6b80b kata-sys-util: upgrade nix version
f690b0aad qemu: Add liburing to qemu build
d93e4b939 container: kill all of the processes in this container
3c989521b dragonball: update for review
274598ae5 kata-runtime: add dragonball config check support.
1befbe673 runtime-rs: Cargo lock for fix version problem
3d6156f6e runtime-rs: support dragonball and runtime-binary
3f6123b4d libs: update configuration and annotations
9ae2a45b3 cgroups: remove unnecessary get_paths()
be31207f6 clh: Don't crash if no network device is set by the upper layer
051181249 packaging: Add a "-" in the dir name if $BUILD_DIR is available
dc3b6f659 versions: Update Cloud Hypervisor to v25.0
201ff223f packaging: Use the $BUILD_SUFFIX when renaming the qemu binary
1a25afcdf kernel: Allow passing the URL to download the tarball
80c68b80a kernel: Deduplicate code used for building TEE kernels
d2584991e dragonball: fix dependency unused warning
458f6f42f dragonball: use const string for legacy device type
939959e72 docs: add Dragonball to hypervisors
f6f96b8fe dragonball: add legacy device support for aarch64
7a4183980 dragonball: add device info support for aarch64
f7ccf92dc kata-deploy: Rely on the configured config path
386a523a0 kata-deploy: Pass the config path to CRI-O
13df57c39 build: save lines for repository_owner check
57c2d8b74 docs: Update URL links for containerd documentation
e57a1c831 build: Mark git repos as safe for build
2551924bd docs: delete CRI containerd plugin statement
9cee52153 fmt: do cargo fmt and add a dependency for blk_dev
47a4142e0 fs: change vhostuser and virtio into const
e14e98bbe cpu_topo: add handle_cpu_topology function
5d3b53ee7 downtime: add downtime support
6a1fe85f1 vfio: add vfio as TODO
5ea35ddcd refactor: remove redundant by_id
b646d7cb3 config: remove ht_enabled
cb54ac6c6 memory: remove reserve_memory_bytes
bde6609b9 hotplug: add room for other hotplug solution
d88b1bf01 dragonball: update vsock dependency
dd003ebe0 Dragonball: change error name and fix compile error
38957fe00 UT: fix compile error in unit tests
11b3f9514 dragonball: add virtio-fs device support
948381bdb dragonball: add virtio-net device support
3d20387a2 dragonball: add virtio-blk device support
87d38ae49 Doc: add document for Dragonball API
2bb1eeaec docs: further questions related to upcall
026aaeecc docs: add FAQ to the report
fffcb8165 docs: update the content of the report
42ea854eb docs: kata 3.0 Architecture
efdb92366 build: Fix clh source build as normal user
0e40ecf38 tools/snap: simplify nproc
f59939a31 runk: Support `exec` sub-command
4d89476c9 runtime: Fix DisableSelinux config
090de2dae dragonball: fix the clippy errors.
a1593322b dragonball: add vsock api to api server
89b9ba860 dragonball: add set_vm_configuration api
95fa0c70c dragonball: add start microvm support
5c1ccc376 dragonball: add Vmm struct
4d234f574 dragonball: refactor code layout
cfd5dae47 dragonball: add vm struct
527b73a8e dragonball: remove unused feature in AddressSpaceMgr
3bafafec5 action: extend commit message line limit to 150 bytes
5010c643c release: Revert kata-deploy changes after 2.5.0-rc0 release
7120afe4e dragonball: add vcpu test function for aarch64
648d285a2 dragonball: add vcpu support for aarch64
7dad7c89f dragonball: update dbs-xxx dependency
07231b2f3 runtime-rs:refactor network model with netlink
c8a905206 build: format files
242992e3d build: put install methods in utils.mk
8a697268d build: makefile for dragonball config
9c526292e runtime-rs:refactor network model with netlink
71db2dd5b hotplug: add room for future acpi hotplug mechanism
8bb00a3dc dragonball: fix a bug when generating kernel boot args
2aedd4d12 doc: add document for vCPU, api and device
bec22ad01 dragonball: add api module
07f44c3e0 dragonball: add vcpu manager
78c971875 dragonball: add upcall support
7d1953b52 dragonball: add vcpu
468c73b3c dragonball: add kvm context
e89e6507a dragonball: add signal handler
b6cb2c4ae dragonball: add metrics system
e80e0c464 dragonball: add io manager wrapper
d5ee3fc85 safe-path: fix clippy warning
93c10dfd8 runtime-rs: add crosvm license in Dragonball
dfe6de771 dragonball: add dragonball into kata README
39ff85d61 dragonball: green ci
71f24d827 dragonball: add Makefile.
a1df6d096 Doc: Update Dragonball Readme and add document for device
8619f2b3d dragonball: add virtio vsock device manager.
52d42af63 dragonball: add device manager.
c1c1e5152 dragonball: add kernel config.
6850ef99a dragonball: add configuration manager.
0bcb422fc dragonball: add legacy devices manager
3c45c0715 dragonball: add console manager.
3d38bb300 dragonball: add address space manager.
aff604055 dragonball: add resource manager support.
8835db6b0 dragonball: initial commit
9cb15ab4c agent: add the FSGroup support
ff7874bc2 protobuf: upgrade the protobuf version to 2.27.0
06f398a34 runtime-rs: use withContext to evaluate lazily
fd4c26f9c runtime-rs: support network resource
4be7185aa runtime-rs: runtime part implement
10343b1f3 runtime-rs: enhance runtimes
9887272db libs: enhance kata-sys-util and kata-types
3ff0db05a runtime-rs: support rootfs volume for resource
234d7bca0 runtime-rs: support cgroup resource
75e282b4c runtime-rs: hypervisor base define
bdfee005f runtime-rs: service and runtime framework
4296e3069 runtime-rs: agent implements
d3da156ee runtime-rs: uint FsType for s390x
e705ee07c runtime-rs: update containerd-shim-protos to 0.2.0
8c0a60e19 runtime-rs: modify the review suggestion
278f843f9 runtime-rs: shim implements for runtime-rs
641b73610 libs: enhance kata-sys-util
69ba1ae9e trans: fix the issue of wrong swapness type
d2a9bc667 agent: agent-protocol support async
aee9633ce libs/sys-util: provide functions to execute hooks
8509de0ae libs/sys-util: add function to detect and update K8s emptyDir volume
6d59e8e19 libs/sys-util: introduce function to get device id
5300ea23a libs/sys-util: implement reflink_copy()
1d5c898d7 libs/sys-util: add utilities to parse NUMA information
87887026f libs/sys-util: add utilities to manipulate cgroup
ccd03e2ca libs/sys-util: add wrappers for mount and fs
45a00b4f0 libs/sys-util: add kata-sys-util crate under src/libs
48c201a1a libs/types: make the variable name easier to understand
b9b6d70aa libs/types: modify implementation details
05ad026fc libs/types: fix implementation details
d96716b4d libs/types:fix styles and implementation details
6cffd943b libs/types:return Result to handle parse error
6ae87d9d6 libs/types: use contains to make code more readable
45e5780e7 libs/types: fixed spelling and grammar errors
2599a06a5 libs/types:use include_str! in test file
8ffff40af libs/types:Option type to handle empty tomlconfig
626828696 libs/types: add license for test-config.rs
97d8c6c0f docs: modify move-issues-to-in-progress.yaml
8cdd70f6c libs/types: change method to update config by annotation
e19d04719 libs/types: implement KataConfig to wrap TomlConfig
387ffa914 libs/types: support load Kata agent configuration from file
69f10afb7 libs/types: support load Kata hypervisor configuration from file
21cc02d72 libs/types: support load Kata runtime configuration from file
5b89c1df2 libs/types: add kata-types crate under src/libs
4f62a7618 libs/logging: fix clippy warnings
6f8acb94c libs: refine Makefile rules
7cdee4980 libs/logging: introduce a wrapper writer for logging
426f38de9 libs/logging: implement rotator for log files
392f1ecdf libs: convert to a cargo workspace
575df4dc4 static-checks: Allow Merge commit to be >75 chars

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-15 07:23:13 +00:00
Ji-Xinyou
ff7c78e0e8 runtime-rs: static resource mgmt default to false
Static resource management should default to false. If it defaults to
true, later sandbox update operations, e.g. resize, will not work.

Fixes: #4742
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-08-15 14:42:38 +08:00
Ji-Xinyou
00f3a6de12 runtime-rs: make static resource mgmt idiomatic
Make the value-getting process (CPU and memory) more idiomatic.

Fixes: #4742
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-08-15 11:18:35 +08:00
Zhongtao Hu
4d7f3edbaf runtime-rs: support the functionality of cleanup
Cleanup sandbox resource

Fixes: #4891
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-13 15:56:38 +08:00
Zhongtao Hu
5aa83754e5 runtime-rs: support save to persist file and restore
Support the functionality of save and restore for sandbox state

Fixes:#4891
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-13 15:44:13 +08:00
Chelsea Mafrica
fcc1e0c617 runtime: tracing: End root span at end of trace
The root span should exist for the duration of the trace. Defer ending
the span until the end of the trace instead of the end of the function.
Add the span to the service struct to do so.

Fixes #4902

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-12 13:15:39 -07:00
GabyCT
97b7fe438a Merge pull request #4898 from openanolis/fixdoc
runtime-rs: fix design doc's typo
2022-08-12 10:06:44 -05:00
Bin Liu
2cd964ca79 Merge pull request #4881 from openanolis/runtime-rs-curl
docs: use curl as default downloader for runtime-rs
2022-08-12 19:46:39 +08:00
Bin Liu
6a8e8dfc8e Merge pull request #4876 from liubin/fix/4875-update-Cargo-lock
runtime-rs: update Cargo.lock
2022-08-12 19:41:02 +08:00
Ji-Xinyou
caada34f1d runtime-rs: fix design doc's typo
Fix docs/design/architecture_3.0's typo. Both source code and png.

Fixes: #4883
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-08-12 17:38:13 +08:00
Bin Liu
bfa86246f8 Merge pull request #4872 from liubin/fix/4871-github-actions-fix
Fix some GitHub actions workflow issues
2022-08-11 19:26:15 +08:00
Zhongtao Hu
c280d6965b runtime-rs: delete route model
The route model is only used for a specific internal scenario; it's not
a general requirement.

Fixes:#4838
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-11 15:56:43 +08:00
Zhongtao Hu
b61dda40b7 docs: use curl as default downloader for runtime-rs
use curl as default downloader for runtime-rs

Fixes: #4879
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-11 15:52:13 +08:00
Fabiano Fidêncio
881c87a25c Merge pull request #4859 from GabyCT/topic/updatelibse
versions: Update libseccomp version
2022-08-11 09:34:44 +02:00
Bin Liu
ca9d16e5ea runtime-rs: update Cargo.lock
Update Cargo.lock

Fixes: #4875

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-11 10:34:36 +08:00
Ji-Xinyou
4a54876dde runtime-rs: support static resource management functionality
Supports functionalities of static resource management, enabled by
default.

Fixes: #4742
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-08-11 09:46:44 +08:00
Bin Liu
99a7b4f3e1 workflow: Revert "static-checks: Allow Merge commit to be >75 chars"
This reverts commit 575df4dc4d.

Fixes: #4871

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-11 08:59:02 +08:00
Bin Liu
d14e80e9fd workflow: Revert "docs: modify move-issues-to-in-progress.yaml"
This reverts commit 97d8c6c0fa.

Fixes: #4871

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-11 08:58:43 +08:00
Bin Liu
cb7f9524be Merge pull request #4804 from openanolis/anolis/merge_runtime_rs_to_main
runtime-rs:merge runtime rs to main
2022-08-11 08:40:41 +08:00
Tim Zhang
4813a3cef9 Merge pull request #4711 from liubin/fix/4710-wait-nydusd-api-server-ready
nydus: wait nydusd API server ready before mounting share fs
2022-08-10 17:20:17 +08:00
Gabriela Cervantes
1f4b6e6460 versions: Update libseccomp version
This PR updates the libseccomp version at the versions.yaml that is
being used in the kata CI.

Fixes #4858

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-08-09 14:27:59 +00:00
GabyCT
4d07c86cf1 Merge pull request #4846 from fidencio/topic/update-td-shim-due-to-build-breakage
versions: Update TD-shim due to build breakage
2022-08-08 11:50:49 -05:00
Fabiano Fidêncio
b0fa44165e Merge pull request #4844 from fidencio/topic/agent-ctl-add-an-empty-workspace
agent-ctl: Add an empty [workspace]
2022-08-08 17:08:43 +02:00
Fabiano Fidêncio
a8176d0218 Merge pull request #4842 from fidencio/topic/packaging-create-no_patches.txt-for-the-SPR-BKC-PC-v9.6.x-kernel
packaging: Create no_patches.txt for the SPR-BKC-PC-v9.6.x
2022-08-08 17:05:26 +02:00
Fabiano Fidêncio
8a4e690089 versions: Update TD-shim due to build breakage
"We need a newer nightly 1.62 rust to deal with the change
rust-lang/libc@576f778 on crate libc which breaks the compilation."

This comes from a pull request raised on the TD-shim repo,
https://github.com/confidential-containers/td-shim/pull/354, which fixes
the issues with the commit being used with Kata Containers.

Let's bump to a newer commit of TD-shim and to a newer version of the
nightly toolchain as part of our versions file.

Fixes: #4840

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-08 15:53:57 +02:00
Fabiano Fidêncio
8854b4de2c Merge pull request #4836 from cmaf/sgx-update-docs-2
docs: Improve SGX documentation
2022-08-08 12:15:04 +02:00
Fabiano Fidêncio
065305f4a1 agent-ctl: Add an empty [workspace]
"An empty [workspace] can be used with a package to conveniently create a
workspace with the package and all of its path dependencies", according
to the https://doc.rust-lang.org/cargo/reference/workspaces.html

This also matches the suggestion provided by Cargo itself, due to the
errors faced with the Cloud Hypervisor CI:
```
10:46:23 this may be fixable by adding `go/src/github.com/kata-containers/kata-containers/src/tools/agent-ctl` to the `workspace.members` array of the manifest located at: /tmp/jenkins/workspace/kata-containers-2-clh-PR/Cargo.toml
10:46:23 Alternatively, to keep it out of the workspace, add the package to the `workspace.exclude` array, or add an empty `[workspace]` table to the package's manifest.
```

Fixes: #4843

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-08 11:24:39 +02:00
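The fix described above boils down to a one-line addition to the crate manifest. A minimal sketch — the `[package]` fields here are illustrative, not the actual agent-ctl manifest:

```toml
# Cargo.toml (illustrative excerpt)
[package]
name = "kata-agent-ctl"
version = "0.1.0"
edition = "2018"

# An empty [workspace] table makes this package the root of its own
# workspace, so Cargo stops walking parent directories looking for one
# and no longer complains about the package missing from a parent
# workspace's `members` array.
[workspace]
```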
Fabiano Fidêncio
1444d7ce42 packaging: Create no_patches.txt for the SPR-BKC-PC-v9.6.x
The file was added as part of the commit that tested these changes in
the CCv0 branch, but was forgotten when re-writing it for the `main`
branch.

Fixes: #4841

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-08 11:00:23 +02:00
liubin
2ae807fd29 nydus: wait nydusd API server ready before mounting share fs
If the API server is not ready, the mount call will fail, so before
mounting the share fs we should wait until nydusd has started and
its API server is ready.

Fixes: #4710

Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-08 16:18:38 +08:00
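The wait-for-readiness logic above can be sketched generically. This is a hedged illustration, not the actual runtime code (which is Go and probes nydusd's API socket over HTTP); here `probe` is a stand-in closure for that readiness check:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll a readiness probe until it succeeds or the timeout elapses.
/// `probe` stands in for a GET against nydusd's API server.
pub fn wait_until_ready<F>(
    mut probe: F,
    timeout: Duration,
    interval: Duration,
) -> Result<(), String>
where
    F: FnMut() -> bool,
{
    let start = Instant::now();
    loop {
        if probe() {
            return Ok(());
        }
        if start.elapsed() >= timeout {
            return Err("timed out waiting for nydusd API server".into());
        }
        sleep(interval);
    }
}
```

Only once this returns `Ok` would the share-fs mount proceed.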
Tim Zhang
8d4d98587f Merge pull request #4746 from liubin/fix/4745-add-log-field
runtime: explicitly mark the source of the log is from qemu.log
2022-08-08 15:21:01 +08:00
Bin Liu
9516286f6d Merge pull request #4829 from LetFu/fix/addUnlock
runtime: add unlock before return in sendReq
2022-08-08 14:42:44 +08:00
Archana Shinde
c1e3b8f40f govmm: Refactor qmp functions for adding block device
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
598884f374 govmm: Refactor code to get rid of redundant code
Get rid of redundant return values from function.
args and blockdevArgs used to return different values to maintain
compatibility between qemu versions. These are exactly the same now.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
00860a7e43 qmp: Pass aio backend while adding block device
Allow govmm to pass aio backend while adding block device.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
e1b49d7586 config: Add block aio as a supported annotation
Allow Block AIO to be passed as a per pod annotation.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
ed0f1d0b32 config: Add "block_device_aio" as a config option for qemu
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fall back to a different I/O mechanism when
running on kernels older than 5.1.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
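The resulting knob might look like this in the qemu section of the Kata configuration.toml. A sketch: the `io_uring` default comes from the commit above, while the fallback values are assumed from qemu's standard AIO backends:

```toml
[hypervisor.qemu]
# I/O backend used by qemu for block devices.
# "io_uring" is the default and needs a host kernel >= 5.1 (and qemu >= 5.0);
# on older kernels, fall back to "native" (Linux AIO) or "threads".
block_device_aio = "io_uring"
```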
Archana Shinde
83a919a5ea Merge pull request #4795 from liubin/fix/4794-update-limitation
docs: add back host network limitation
2022-08-05 23:00:47 +05:30
Chelsea Mafrica
c8d4ea84e3 docs: Improve SGX documentation
Remove line about annotations support in CRI-O and containerd since it
has been supported for a couple years.

Fixes #4819

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-05 09:57:50 -07:00
Fabiano Fidêncio
e2968b177d Merge pull request #4763 from cyyzero/runk-ps
runk: add ps sub-command
2022-08-05 16:28:38 +02:00
chmod100
d8ad16a34e runtime: add unlock before return in sendReq
Unlock is required before returning, so add the missing unlock.

Fixes: #4827

Signed-off-by: chmod100 <letfu@outlook.com>
2022-08-05 13:30:12 +00:00
Peng Tao
b828190158 Merge pull request #4823 from openanolis/runtime-rs-merge-main-runtime-rs
Depends-on:github.com/kata-containers/tests#4986
Runtime-rs:merge main runtime rs
2022-08-05 14:42:22 +08:00
Peng Tao
f791169efc Merge pull request #4826 from openanolis/runtime-rs-version
runtime-rs:update rtnetlink version
2022-08-05 14:28:46 +08:00
Zhongtao Hu
8bbffc42cf runtime-rs:update rtnetlink version
update rtnetlink version for runtime-rs

Fixes:#4824
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-05 11:18:09 +08:00
Zhongtao Hu
e403838131 runtime-rs: Merge remote-tracking branch 'origin/main' into runtime-rs
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.

Fixes:kata-containers#4822
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-05 10:49:33 +08:00
Bin Liu
931251105b Merge pull request #4817 from openanolis/runtime-rs-s390x-fail
runtime-rs:skip the build process when the arch is s390x
2022-08-05 08:23:13 +08:00
Salvador Fuentes
587c0c5e55 Merge pull request #4820 from cmaf/sgx-update-docs-1
docs: Improve SGX documentation
2022-08-04 15:59:33 -05:00
Chelsea Mafrica
c5452faec6 docs: Improve SGX documentation
Update documentation with details regarding
intel-device-plugins-for-kubernetes setup and dependencies.

Fixes #4819

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-04 12:49:01 -07:00
GabyCT
2764bd7522 Merge pull request #4770 from justxuewei/refactor/agent/netlink-neighbor
agent: Use rtnetlink's neighbours API to add neighbors
2022-08-04 12:09:30 -05:00
Zhongtao Hu
389ae97020 runtime-rs:skip the test when the arch is s390x
github.com/kata-containers/tests#4986. To avoid returning an error when
running the CI, we just skip the test if the arch is s390x.

Fixes: #4816
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-04 21:13:50 +08:00
Zhongtao Hu
945e02227c runtime-rs:skip the build process when the arch is s390x
github.com/kata-containers/tests#4986. To avoid returning an error when
running the CI, we just skip the build process if the arch is s390x.

Fixes: #4816
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-04 21:13:40 +08:00
Archana Shinde
b6cd2348f5 govmm: Add io_uring as AIO type
io_uring was introduced as a new kernel IO interface in kernel 5.1.
It is designed for higher performance than the older Linux AIO API.
This feature was added in qemu 5.0.

Fixes #4645

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-03 10:43:12 -07:00
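At the qemu command line, selecting the new AIO type amounts to passing `aio=io_uring` on the drive. An illustrative fragment — the file name and other options are placeholders, not taken from govmm:

```sh
qemu-system-x86_64 \
  -drive file=rootfs.img,format=raw,if=virtio,cache=none,aio=io_uring
```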
Archana Shinde
81cdaf0771 govmm: Correct documentation for Linux aio.
The comments for "native" aio are incorrect. Correct these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-03 10:41:50 -07:00
Fabiano Fidêncio
578121124e Merge pull request #4805 from fidencio/topic/bump-tdx-dependencies
Bump TDX dependencies (QEMU and Kernel)
2022-08-03 19:31:26 +02:00
Fabiano Fidêncio
869e408516 Merge pull request #4810 from fidencio/topic/adjust-final-tarball-location-for-tdvf-and-td-shim
OVMF / td-shim: Adjust final tarball location
2022-08-03 16:55:14 +02:00
Fabiano Fidêncio
8d1cb1d513 td-shim: Adjust final tarball location
Let's create the td-shim tarball in the directory where the script was
called from, instead of doing it in the $DESTDIR.

This aligns with the logic being used for creating / extracting the
tarball content, which is already in use by the kata-deploy local build
scripts.

Fixes: #4809

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-03 14:58:44 +02:00
Fabiano Fidêncio
62f05d4b48 ovmf: Adjust final tarball location
Let's create the OVMF tarball in the directory where the script was
called from, instead of doing it in the $DESTDIR.

This aligns with the logic being used for creating / extracting the
tarball content, which is already in use by the kata-deploy local build
scripts.

Fixes: #4808

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-03 14:58:29 +02:00
Fabiano Fidêncio
9972487f6e versions: Bump Kernel TDX version
The latest kernel with TDX support should be pulled from a different
repo (https://github.com/intel/linux-kernel-dcp, instead of
https://github.com/intel/tdx), and the latest version to be used is
SPR-BKC-PC-v9.6.

With the new version being used, let's make sure we enable the
INTEL_TDX_ATTESTATION config option, and all the dependencies needed to
do so.

Fixes: #4803

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-03 12:00:49 +02:00
Fabiano Fidêncio
c9358155a2 kernel: Sort the TDX configs alphabetically
Let's just re-order the TDX configs alphabetically. No new config has
been added or removed, thus no need to bump the kernel version.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-03 11:57:02 +02:00
Fabiano Fidêncio
dd397ff1bf versions: Bump QEMU TDX version
Let's use the latest tag provided in the
"https://github.com/intel/qemu-dcp" repo, "SPR-BKC-QEMU-v2.5".

Fixes: #4802

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-03 11:00:36 +02:00
Ji-Xinyou
a355812e05 runtime-rs: fixed bug on core-sched error handling
Kernel code returns -errno, so this should check for negative values.

Fixes: #4429
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-08-03 15:26:48 +08:00
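The corrected check can be sketched as follows: since the kernel reports failures as -errno, any negative return value is an error, not just -1. This is a hypothetical helper, not the actual runtime-rs code:

```rust
use std::io;

/// Interpret a raw kernel-style return value: failures come back
/// as -errno, so every negative value must be treated as an error.
pub fn check_syscall_ret(ret: i64) -> io::Result<i64> {
    if ret < 0 {
        // Recover the errno by negating the return value.
        Err(io::Error::from_raw_os_error(-ret as i32))
    } else {
        Ok(ret)
    }
}
```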
Bin Liu
8b0e1859cb Merge pull request #4784 from openanolis/fix-protocol-ci-err
libs: fix CI error for protocols
2022-08-03 11:03:02 +08:00
Bin Liu
b337390c28 Merge pull request #4791 from openanolis/runtime-rs-merge-main-1
runtime-rs: merge main to runtime-rs
2022-08-03 11:00:54 +08:00
Chelsea Mafrica
873e75b915 Merge pull request #4773 from fidencio/topic/build-tdvf
packaging: Add support for building TDVF
2022-08-02 09:14:13 -07:00
Chen Yiyang
230a229052 runk: add ps sub-command
The ps command supports two formats, `json` and `table`. The `json`
format just outputs the PIDs in the container. The `table` format uses
the `ps` utility on the host to search for and output all processes in
the container. Add a struct `container` to represent a spawned
container. Move the `kill` implementation from kill.rs into a method of
`container`.

Fixes: #4361

Signed-off-by: Chen Yiyang <cyyzero@qq.com>
2022-08-02 20:45:50 +08:00
Ji-Xinyou
591dfa4fe6 runtime-rs: add support for core scheduling
Linux 5.14 supports core scheduling to have better security control
for SMT siblings. This PR supports that.

Fixes: #4429
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-08-02 17:54:04 +08:00
Bin Liu
889557ecb1 docs: add back host network limitation
Kata Containers doesn't support the host network namespace;
it's a common issue for new users. The limitation was
deleted, and this commit adds it back.

Also, Docker now supports running containers using
Kata Containers, so delete Docker from the not-supported list.

This commit reverts parts of #3710

Fixes: #4794

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-02 15:58:16 +08:00
Fabiano Fidêncio
c9b5bde30b versions: Track and build TDVF
TDVF is the firmware used by QEMU to start TDX capable VMs.  Let's start
tracking it as it'll become part of the Confidential Containers sooner
or later.

TDVF lives in the public https://github.com/tianocore/edk2-staging repo,
and the versions we're using are tags that are consumed internally at
Intel.

Fixes: #4624

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-02 09:51:47 +02:00
Fabiano Fidêncio
e6a5a5106d packaging: Generate a tarball as OVMF build result
Instead of producing, as the build result, the directory where the OVMF
artefacts were installed, let's follow what we do with the other
components and produce a tarball as the result of the OVMF build.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-02 09:48:59 +02:00
Fabiano Fidêncio
42eaf19b43 packaging: Simplify OVMF repo clone
Instead of cloning the repo, and then switching to a specific branch,
let's take advantage of `--branch` and directly clone the specific
branch / tag.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-02 09:48:59 +02:00
Fabiano Fidêncio
4d33b0541d packaging: Don't hardcode "edk2" as the cloned repo's dir.
As TDVF comes from a different repo, the edk2-staging one, we cannot
simply hardcode the name. Instead, let's derive the directory name from
the name of the git repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-02 09:48:59 +02:00
Zhongtao Hu
7247575fa2 runtime-rs:fix cargo clippy
fix cargo clippy

Fixes: #4791
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-02 13:17:37 +08:00
Zhongtao Hu
9803393f2f runtime-rs: Merge branch 'main' into runtime-rs-merge-main-1
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.

Fixes: #4790
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-02 10:53:01 +08:00
Fabiano Fidêncio
7503bdab6e Merge pull request #4783 from fidencio/topic/build-td-shim
versions: Track and add support for building TD-shim
2022-08-01 20:50:58 +02:00
Fabiano Fidêncio
b06bc82284 versions: Track and add support for building TD-shim
TD-shim is a simplified TDX virtual firmware, used by Cloud Hypervisor,
in order to create a TDX capable VM.

TD-shim is heavily under development, and is hosted as part of the
Confidential Containers project:
https://github.com/confidential-containers/td-shim

The version chosen for this commit is one that's being tested inside
Intel, but we will most likely need to change it before it's officially
packaged as part of an official release.

Fixes: #4779

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-01 16:36:12 +02:00
Bin Liu
8d9135a7ce Merge pull request #4765 from ryansavino/ccv0-rust-upgrade
versions: Upgrade rust version
2022-08-01 17:15:05 +08:00
Quanwei Zhou
86ac653ba7 libs: fix CI error for protocols
Fix CI error for protocols.

Fixes: #4781
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-08-01 16:26:52 +08:00
Xuewei Niu
81fe51ab0b agent: fix unittests for arp neighbors
Set an ARP address explicitly before running
netlink::test_add_one_arp_neighbor().

Signed-off-by: Xuewei Niu <justxuewei@apache.org>
2022-08-01 16:19:25 +08:00
Xuewei Niu
845c1c03cf agent: use rtnetlink's neighbours API to add neighbors
Bump the rtnetlink version from 0.8.0 to 0.11.0. Use rtnetlink's API to
add neighbors and fix issues to adapt to the new version of rtnetlink.

Fixes: #4607

Signed-off-by: Xuewei Niu <justxuewei@apache.org>
2022-08-01 13:44:07 +08:00
Bin Liu
993ae24080 Merge pull request #4777 from openanolis/runtime-rs-merge
Merge Main into runtime-rs branch
2022-08-01 13:04:31 +08:00
Zhongtao Hu
adfad44efe Merge remote-tracking branch 'origin/main' into runtime-rs-merge-tmp
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.

Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-01 11:12:48 +08:00
Ryan Savino
9b1940e93e versions: update rust version
Fixes #4764

versions: update rust version to fix ccv0 attestation-agent build error
static-checks: kata tools, libs, and agent fixes

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2022-07-29 18:41:43 -05:00
Peng Tao
0aefab4d80 Merge pull request #4739 from liubin/fix/4738-trace-rpc-calls
agent: log RPC calls for debugging
2022-07-29 14:18:23 +08:00
Peng Tao
5457deb034 Merge pull request #4741 from openanolis/fix-stop-failed-in-azure
runtime-rs: fix stop failed in azure
2022-07-29 11:41:16 +08:00
Fabiano Fidêncio
54147db921 Merge pull request #4170 from Alex-Carter01/build-amdsev-ovmf
Add support AmdSev build of OVMF
2022-07-28 19:42:50 +02:00
Alex Carter
638c2c4164 static-build: Add AmdSev option for OVMF builder
Introduces new build of firmware needed for SEV

Fixes: kata-containers#4169

Signed-off-by: Alex Carter <alex.carter@ibm.com>
2022-07-28 09:56:06 -05:00
Alex Carter
f0b58e38d2 static-build: Add build script for OVMF
Introduces a build script for OVMF. Defaults to X86_64 build (x64 in OVMF)

Fixes: #4169

Signed-off-by: Alex Carter <alex.carter@ibm.com>
2022-07-28 09:07:49 -05:00
Quanwei Zhou
fa0b11fc52 runtime-rs: fix stdin hang in azure
Fix stdin hang in azure.

Fixes: #4740
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-28 16:16:37 +08:00
Bin Liu
a67402cc1f Merge pull request #4397 from yaoyinnan/3073/ftr/host-cgroupv2
runtime: Support for host cgroupv2
2022-07-28 14:30:03 +08:00
Tim Zhang
229ff29c0f Merge pull request #4758 from GabyCT/topic/updaterunc
versions: Update runc version
2022-07-28 14:12:58 +08:00
yaoyinnan
5c3155f7e2 runtime: Support for host cgroup v2
Support cgroup v2 on the host. Update vendor containerd/cgroups to add cgroup v2.

Fixes: #3073

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-07-28 10:30:45 +08:00
yaoyinnan
4ab45e5c93 docs: Update support for host cgroupv2
Currently cgroup v2 is supported. Remove the note that host cgroup v2 is not supported.

Fixes: #3073

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-07-28 10:30:44 +08:00
GabyCT
9dfd949f23 Merge pull request #4646 from amshinde/add-liburing-qemu
qemu: Add liburing to qemu build
2022-07-27 15:47:49 -05:00
Gabriela Cervantes
326eb2f910 versions: Update runc version
This PR updates the runc version to v1.1.0.

Fixes #4757

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-07-27 16:19:11 +00:00
Bin Liu
50b0b7cc15 Merge pull request #4681 from Tim-0731-Hzt/runtime-rs-sharepid
runtime-rs: fix set share sandbox pid namespace
2022-07-27 21:43:58 +08:00
Bin Liu
557229c39d Merge pull request #4724 from yahaa/fix-docs
Docs: fix tables format error
2022-07-27 21:13:29 +08:00
Bin Liu
09672eb2da agent: do some rollback work in case do_create_container fails
In some cases do_create_container may return an error, mostly due to
the `container.start(process)` call. This commit does some rollback
work if this function fails.

Fixes: #4749

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-07-27 10:23:46 +08:00
Archana Shinde
1b01ea53d9 Merge pull request #4735 from nubificus/feature-fc-v1.1
versions: Update Firecracker version to v1.1.0
2022-07-27 04:50:32 +05:30
Peng Tao
27c82018d1 Merge pull request #4753 from Tim-Zhang/agent-fix-stream-fd-double-close
agent: Fix stream fd's double close
2022-07-27 00:54:07 +08:00
Bin Liu
6fddf031df Merge pull request #4664 from lifupan/main
container: kill all of the processes in a container when it terminated
2022-07-26 23:12:11 +08:00
Tim Zhang
f5aa6ae467 agent: Fix stream fd's double close problem
The fd is closed when Pipestream is dropped, so we should
not close it again.

Fixes: #4752

Signed-off-by: Tim Zhang <tim@hyper.sh>
2022-07-26 20:05:06 +08:00
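The ownership pattern that prevents this class of bug can be sketched as below. This is a minimal, platform-independent illustration — not the actual PipeStream code; a real implementation would call libc::close where the comments indicate:

```rust
/// Owns a raw fd and guarantees it is released exactly once,
/// whether closed explicitly or dropped.
pub struct Fd {
    raw: Option<i32>,
}

impl Fd {
    pub fn new(raw: i32) -> Self {
        Fd { raw: Some(raw) }
    }

    /// Explicit close: take the fd out so Drop cannot see it again.
    pub fn close(&mut self) -> Option<i32> {
        self.raw.take() // real code would call libc::close here
    }
}

impl Drop for Fd {
    fn drop(&mut self) {
        // Runs only if close() was never called; the original double
        // close happened because both paths unconditionally closed.
        if let Some(_fd) = self.raw.take() {
            // real code: unsafe { libc::close(_fd) };
        }
    }
}
```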
yahaa
6e149b43f7 Docs: fix tables format error
Fixes: #4725

Signed-off-by: yahaa <1477765176@qq.com>
2022-07-26 19:05:09 +08:00
Bin Liu
85f4e7caf6 runtime: explicitly mark the source of the log is from qemu.log
In qemu.StopVM(), if debug is enabled, the shim will dump logs
from qemu.log, but users can't tell which logs come from qemu.log
and which from the shim itself. Adding some additional messages
helps users distinguish these logs.

Fixes: #4745

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-07-26 16:08:59 +08:00
Peng Tao
129335714b Merge pull request #4727 from openanolis/anolis-fix-network
fix network failed for kata ci
2022-07-26 15:10:55 +08:00
Peng Tao
71384b60f3 Merge pull request #4713 from openanolis/adjust_default_vcpu
runtime-rs: handle default_vcpus greater than default_maxvcpu
2022-07-26 15:02:34 +08:00
gntouts
56d49b5073 versions: Update Firecracker version to v1.1.0
This patch upgrades Firecracker version from v0.23.4 to v1.1.0

* Generate swagger models for v1.1.0 (from firecracker.yaml)
* Replace the ht_enabled param with smt (API change)
* Remove NUMA-related jailer param --node 0

Fixes: #4673
Depends-on: github.com/kata-containers/tests#4968

Signed-off-by: George Ntoutsos <gntouts@nubificus.co.uk>
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2022-07-26 07:01:26 +00:00
Zhongtao Hu
b3147411e3 runtime-rs: add unit test for set share pid ns
Fixes: #4680
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-26 14:42:00 +08:00
Zhongtao Hu
1ef3f8eac6 runtime-rs: set share sandbox pid namespace
Set the shared sandbox pid namespace from the spec

Fixes: #4680
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-26 14:41:59 +08:00
Quanwei Zhou
57c556a801 runtime-rs: fix stop failed in azure
Fix the stop failure in Azure.

Fixes: #4740
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-26 12:16:32 +08:00
liubin
0e24f47a43 agent: log RPC calls for debugging
We can log all RPC calls to the agent for debugging purposes
to check which RPC is called, which can help us to understand
the container lifespan.

Fixes: #4738

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-26 10:32:44 +08:00
Tim Zhang
e764a726ab Merge pull request #4715 from Tim-Zhang/fix-ut-test_do_write_stream
agent: fix fd-double-close problem in ut test_do_write_stream
2022-07-25 17:34:26 +08:00
Peng Tao
3f4dd92c2d Merge pull request #4702 from openanolis/runtime-rs-endpoint-dev
runtime-rs: add functionalities support for macvlan and vlan endpoints
2022-07-25 17:04:45 +08:00
Peng Tao
a3127a03f3 Merge pull request #4721 from openanolis/install-guide-2
Docs: add rust environment setup for kata 3.0
2022-07-25 16:50:20 +08:00
Tim Zhang
427b29454a Merge pull request #4709 from liubin/fix/4708-unwrap-error
rustjail: check result to let it return early
2022-07-25 15:05:20 +08:00
Tim Zhang
0337377838 Merge pull request #4695 from liubin/4694/upgrade-nydus-version
upgrade nydus version
2022-07-25 15:05:04 +08:00
Quanwei Zhou
c825065b27 runtime-rs: fix tc filter setup failed
Fix a bug in tc filter setup: the protocol field needs to use network byte order.

Fixes: #4726
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-25 11:16:33 +08:00
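The byte-order pitfall fixed above is easy to demonstrate: tc filter attributes expect the EtherType in network (big-endian) order, not host order. A minimal Rust sketch of the idea (the constant and helper names are illustrative, not the actual runtime-rs code):

```rust
/// EtherType for IPv4 (host-order value, as usually written).
const ETH_P_IP: u16 = 0x0800;

/// Convert a protocol number to network byte order before handing
/// it to a tc filter attribute (illustrative helper, not the real API).
fn to_network_order(proto: u16) -> u16 {
    proto.to_be()
}

fn main() {
    let wire = to_network_order(ETH_P_IP);
    // Round-tripping back to host order recovers the original value.
    assert_eq!(u16::from_be(wire), ETH_P_IP);
    println!("wire bytes: {:02x?}", wire.to_ne_bytes());
}
```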
Quanwei Zhou
e0194dcb5e runtime-rs: update route destination with prefix
Update route destination with prefix.

Fixes: #4726
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-25 11:16:22 +08:00
Bin Liu
534a4920b1 Merge pull request #4692 from openanolis/support_disable_guest_seccomp
support disable_guest_seccomp
2022-07-25 11:08:41 +08:00
Zhongtao Hu
fa85fd584e docs: add rust environment setup for kata 3.0
Add more details for the Rust setup in the kata 3.0 install guide.

Fixes: #4720
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-25 09:48:18 +08:00
Wainer Moschetta
0b4a91ec1a Merge pull request #4644 from bookinabox/optimize-get-paths
cgroups: remove unnecessary get_paths()
2022-07-22 17:01:01 -03:00
Ji-Xinyou
896478c92b runtime-rs: add functionalities support for macvlan and vlan endpoints
Add macvlan and vlan support to runtime-rs code and corresponding unit
tests.

Fixes: #4701
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-07-22 10:09:11 +08:00
GabyCT
68c265587c Merge pull request #4718 from GabyCT/topic/updatefirecrackerversion
versions: Update firecracker version
2022-07-21 14:26:57 -05:00
Gabriela Cervantes
df79c8fe1d versions: Update firecracker version
This PR updates the firecracker version that is being
used in kata CI.

Fixes #4717

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-07-21 16:10:29 +00:00
Tim Zhang
912641509e agent: fix fd-double-close problem in ut test_do_write_stream
The fd will be closed when struct Process is dropped, so don't
close it again manually.

Fixes: #4598

Signed-off-by: Tim Zhang <tim@hyper.sh>
2022-07-21 19:37:15 +08:00
Zhongtao Hu
43045be8d1 runtime-rs: handle default_vcpus greater than default_maxvcpus
when default_vcpus is greater than default_maxvcpus, the default
vcpu number should be set equal to default_maxvcpus.

Fixes: #4712
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-21 16:37:56 +08:00
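The clamping rule described in this commit can be sketched in a few lines of Rust (the struct and function names here are illustrative, not the actual runtime-rs configuration types):

```rust
/// Illustrative config subset; the real runtime-rs struct differs.
struct CpuInfo {
    default_vcpus: u32,
    default_maxvcpus: u32,
}

/// If default_vcpus exceeds default_maxvcpus, cap it at the maximum.
fn adjust_vcpus(cfg: &mut CpuInfo) {
    if cfg.default_vcpus > cfg.default_maxvcpus {
        cfg.default_vcpus = cfg.default_maxvcpus;
    }
}

fn main() {
    let mut cfg = CpuInfo { default_vcpus: 8, default_maxvcpus: 4 };
    adjust_vcpus(&mut cfg);
    // The vcpu count is clamped down to the configured maximum.
    assert_eq!(cfg.default_vcpus, 4);
}
```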
liubin
0d7cb7eb16 agent: delete agent-type property in announce
Since there is only one type of agent now, the
agent-type is not needed anymore.

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-21 14:53:01 +08:00
liubin
eec9ac81ef rustjail: check result to let it return early.
Check the result to let it return early if there are some errors.

Fixes: #4708

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-21 14:51:30 +08:00
liubin
402bfa0ce3 nydus: upgrade nydus/nydus-snapshotter version
Upgrade nydus/nydus-snapshotter to the latest version.

Fixes: #4694

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-21 14:39:14 +08:00
Quanwei Zhou
54f53d57ef runtime-rs: support disable_guest_seccomp
support disable_guest_seccomp

Fixes: #4691
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-21 07:46:28 +08:00
Peng Tao
6d56cdb9ac Merge pull request #4686 from xujunjie-cover/issue4685
kata-monitor: fix can't monitor /run/vc/sbs
2022-07-19 23:40:14 +08:00
Bin Liu
540303880e Merge pull request #4688 from quanweiZhou/fix_sandbox_cgroup_false
runtime-rs: fix sandbox_cgroup_only=false panic
2022-07-19 20:38:57 +08:00
Peng Tao
7c146a5d95 Merge pull request #4684 from quanweiZhou/fix-ctr-exit-error
runtime-rs: fix ctr exit failed
2022-07-19 16:02:20 +08:00
Peng Tao
08a6581673 Merge pull request #4662 from openanolis/runtime-rs-user-manaul
docs: add installation guide for kata 3.0
2022-07-19 15:58:55 +08:00
Zhongtao Hu
4331ef80d0 Runtime-rs: add installation guide for rust-runtime
add installation guide for rust-runtime

Fixes:#4661
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-19 13:12:13 +08:00
Peng Tao
4c3bd6b1d1 Merge pull request #4656 from openanolis/runtime-rs-ipvlan
runtime-rs: support functionalities of ipvlan endpoint
2022-07-19 11:15:31 +08:00
xujunjie-cover
72dbd1fcb4 kata-monitor: fix can't monitor /run/vc/sbs.
We need to bind the host dir /run/vc/sbs/ to kata-monitor.

Fixes: #4685

Signed-off-by: xujunjie-cover <xujunjielxx@163.com>
2022-07-19 09:52:54 +08:00
Bin Liu
960f2a7f70 Merge pull request #4678 from Tim-0731-Hzt/runtime-rs-makefile-2
runtime-rs: remove the value of hypervisor path in DB config
2022-07-19 09:34:45 +08:00
Quanwei Zhou
e9988f0c68 runtime-rs: fix sandbox_cgroup_only=false panic
When run with the configuration `sandbox_cgroup_only=false`, we call
`gen_overhead_path()` to build the overhead path. `cgroup-rs` will push
the path onto the subsystem prefix via `PathBuf::push()`. When the pushed
path has a "/" prefix it acts as a root path, such as
```
let mut path = PathBuf::from("/tmp");
path.push("/etc");
assert_eq!(path, PathBuf::from("/etc"));
```
So we should not set the overhead path with a "/" prefix.

Fixes: #4687
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-19 08:30:34 +08:00
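The `PathBuf::push` behaviour described above is standard Rust: pushing an absolute path replaces the current path entirely. A small sketch of the fix direction (stripping the leading "/" illustrates the idea; the exact runtime-rs patch may differ):

```rust
use std::path::PathBuf;

fn main() {
    // Pushing an absolute path replaces the whole path...
    let mut bad = PathBuf::from("/sys/fs/cgroup/cpu");
    bad.push("/kata_overhead");
    assert_eq!(bad, PathBuf::from("/kata_overhead"));

    // ...so push a relative component (no "/" prefix) instead.
    let mut good = PathBuf::from("/sys/fs/cgroup/cpu");
    good.push("kata_overhead");
    assert_eq!(good, PathBuf::from("/sys/fs/cgroup/cpu/kata_overhead"));
}
```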
Quanwei Zhou
cebbebbe8a runtime-rs: fix ctr exit failed
During use, there are cases where the container is already in the stopped
state and receives another stop. In this case, the second stop needs to be ignored.

Fixes: #4683
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-19 07:43:22 +08:00
Bin Liu
758cc47b32 Merge pull request #4671 from liubin/4670-upgrade-nix
kata-sys-util: upgrade nix version
2022-07-18 23:31:07 +08:00
Bin Liu
25be4d00fd Merge pull request #4676 from openanolis/xuejun/runtime-rs
runtime-rs: fix some bugs to make runtime-rs on aarch64
2022-07-18 23:29:32 +08:00
Ji-Xinyou
62182db645 runtime-rs: add unit test for ipvlan endpoint
Add unit test to check the integrity of IPVlanEndpoint::new(...)

Fixes: #4655
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-07-18 15:56:06 +08:00
xuejun-xj
99654ce694 runtime-rs: update dbs-xxx dependencies
Update dbs-xxx commit ID for aarch64 in runtime-rs/Cargo.toml file to add
dependencies for aarch64.

Fixes: #4676

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
2022-07-18 13:46:46 +08:00
xuejun-xj
f4c3adf596 runtime-rs: Add compile option file
Add file aarch64-options.mk for compiling on aarch64 architectures.

Fixes: #4676

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
2022-07-18 13:46:46 +08:00
xuejun-xj
545ae3f0ee runtime-rs: fix warning
The anyhow::anyhow module is only used on the x86_64 architecture in the
crates/hypervisor/src/device/vfio.rs file.

Fixes: #4676

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
2022-07-18 13:46:39 +08:00
Zhongtao Hu
19eca71cd9 runtime-rs: remove the value of hypervisor path in DB config
As Dragonball is a built-in VMM, the hypervisor path, jailer path and
ctlpath are not needed. So we don't generate those values in the Makefile.

Fixes: #4677
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-18 13:37:51 +08:00
Ji-Xinyou
d8920b00cd runtime-rs: support functionalities of ipvlan endpoint
Add support for ipvlan endpoint

Fixes: #4655
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-07-18 11:34:03 +08:00
xuejun-xj
2b01e9ba40 dragonball: fix warning
Add map_err for vcpu_manager.set_reset_event_fd() function.

Fixes: #4676

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
2022-07-18 09:52:13 +08:00
liubin
996a6b80bc kata-sys-util: upgrade nix version
The new nix version supports UMOUNT_NOFOLLOW; upgrade the nix
version to use this flag instead of the self-defined one.

Fixes: #4670

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-15 17:38:15 +08:00
Archana Shinde
f690b0aad0 qemu: Add liburing to qemu build
io_uring is a Linux API for asynchronous I/O; support for it was
introduced in qemu 5.0. It is designed to offer better performance
than the older aio API. We could leverage this in order to get better
storage performance.

We should be adding liburing-dev to the qemu build to leverage this
feature. However the liburing-dev package is not available in ubuntu
20.04; it is available in 22.04.

Upgrading the ubuntu version in the dockerfile to 22.04 is causing
issues in the static qemu build related to libpmem.

So instead we are building liburing from source until those build issues
are solved.

Fixes: #4645

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-07-14 19:21:47 -07:00
Fupan Li
d93e4b939d container: kill all of the processes in this container
When a container terminated, we should make sure there's no processes
left after destroying the container.

Before this commit, kata-agent depended on the kernel's pidns
to destroy all of the processes in a container after the first
process in the container exits. This is true for containers using a
separate pidns, but in the case of a pidns shared within the
sandbox, the container exiting wouldn't cause the pidns to be
terminated, and some daemon processes would be left in the container,
which wasn't expected.

Fixes: #4663

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2022-07-14 16:39:49 +08:00
Bin Liu
575b5eb5f5 Merge pull request #4506 from cyyzero/runk-exec
runk: Support `exec` sub-command
2022-07-14 14:22:24 +08:00
Bin Liu
9f49f7adca Merge pull request #4493 from openanolis/runtime-rs-dev
runtime-rs: hypervisor part
2022-07-14 13:49:34 +08:00
Quanwei Zhou
3c989521b1 dragonball: update for review
update for review

Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-07-14 10:43:59 +08:00
wllenyj
274598ae56 kata-runtime: add dragonball config check support.
add dragonball config check support.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-14 10:43:50 +08:00
Chao Wu
1befbe6738 runtime-rs: Cargo lock for fix version problem
Lock Cargo dependencies to fix a version problem.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-14 08:49:39 +08:00
Quanwei Zhou
3d6156f6ec runtime-rs: support dragonball and runtime-binary
Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-14 08:49:30 +08:00
Zhongtao Hu
3f6123b4dd libs: update configuration and annotations
1. support annotation for runtime.name, hypervisor_name, agent_name.
2. fix parsing memory from annotations

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-14 08:49:17 +08:00
Derek Lee
9ae2a45b38 cgroups: remove unnecessary get_paths()
Change get_mounts to get paths from a borrowed argument rather than
calling get_paths a second time.

Fixes #3768

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-07-13 09:17:14 -07:00
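The refactor in this commit is a common Rust pattern: compute a value once and pass a borrow instead of recomputing it. A generic sketch under that assumption (the names are illustrative, not the actual cgroups code):

```rust
use std::collections::HashMap;

// Illustrative stand-in for the cgroup paths lookup.
fn get_paths() -> HashMap<String, String> {
    HashMap::from([("cpu".to_string(), "/sys/fs/cgroup/cpu".to_string())])
}

// Before: get_mounts() called get_paths() again internally.
// After: it borrows the already-computed map.
fn get_mounts(paths: &HashMap<String, String>) -> Vec<String> {
    paths.values().cloned().collect()
}

fn main() {
    let paths = get_paths();          // computed once
    let mounts = get_mounts(&paths);  // reused via a borrow
    assert_eq!(mounts, vec!["/sys/fs/cgroup/cpu".to_string()]);
}
```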
Bin Liu
0cc20f014d Merge pull request #4647 from fidencio/topic/fix-clh-crash-when-booting-up-with-no-network-device
clh: Don't crash if no network device is set by the upper layer
2022-07-13 21:28:46 +08:00
Fabiano Fidêncio
418a03a128 Merge pull request #4639 from fidencio/topic/packaging-rework-qemu-build-suffix
packaging: Rework how ${BUILD_SUFFIX} is used with the QEMU builder scripts
2022-07-13 15:03:19 +02:00
Fabiano Fidêncio
be31207f6e clh: Don't crash if no network device is set by the upper layer
`ctr` doesn't set a network device when creating the sandbox, which
leads to Cloud Hypervisor's driver crashing, see the log below:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55641c23b248]
goroutine 32 [running]:
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.glob..func1(0xc000397900)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:163 +0x128
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).vmAddNetPut(...)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:1348
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).bootVM(0xc000397900, {0x55641c76dfc0, 0xc000454ae0})
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:1378 +0x5a2
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).StartVM(0xc000397900, {0x55641c76dff8, 0xc00044c240},
0x55641b8016fd)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:659 +0x7ee
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*Sandbox).startVM.func2()
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/sandbox.go:1219 +0x190
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*LinuxNetwork).Run.func1({0xc0004a8910, 0x3b})
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:319 +0x1b
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.doNetNS({0xc000048440, 0xc00044c240}, 0xc0005d5b38)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:1045 +0x163
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*LinuxNetwork).Run(0xc000150c80, {0x55641c76dff8, 0xc00044c240}, 0xc00014e4e0)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:318 +0x105
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*Sandbox).startVM(0xc000107d40, {0x55641c76dff8, 0xc0005529f0})
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/sandbox.go:1205 +0x65f
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.createSandboxFromConfig({_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, 0x0, ...}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/api.go:91 +0x346
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.CreateSandbox({_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, 0x0, ...}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/api.go:51 +0x150
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*VCImpl).CreateSandbox(_, {_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, ...}, ...})
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/implementation.go:35 +0x74
github.com/kata-containers/kata-containers/src/runtime/pkg/katautils.CreateSandbox({_, _}, {_, _}, {{0xc0004806c0, 0x9}, 0xc000140110, 0xc00000f7a0,
{0x0, 0x0}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/katautils/create.go:175 +0x8b6
github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.create({0x55641c76dff8, 0xc0004129f0}, 0xc00034a000, 0xc00036a000)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/create.go:147 +0xdea
github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.(*service).Create.func2()
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/service.go:401 +0x32
created by github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.(*service).Create
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/service.go:400 +0x534
```

This bug has been introduced as part of the
https://github.com/kata-containers/kata-containers/pull/4312 PR, which
changed how we add the network device.

In order to avoid the crash, let's simply check whether we have a device
to be added before iterating the list of network devices.

Fixes: #4618

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-13 10:40:21 +02:00
Peng Tao
39974fbacc Merge pull request #4642 from fidencio/topic/clh-bump-to-v25.0-release
versions: Update Cloud Hypervisor to v25.0
2022-07-13 16:08:01 +08:00
Fabiano Fidêncio
051181249c packaging: Add a "-" in the dir name if $BUILD_DIR is available
Currently $BUILD_DIR will be used to create a directory as:
/opt/kata/share/kata-qemu${BUILD_DIR}

It means that when passing a BUILD_DIR, like "foo", a name would be
built like /opt/kata/share/kata-qemufoo
We should, instead, be building it as /opt/kata/share/kata-qemu-foo.

Fixes: #4638

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-12 21:27:41 +02:00
Fabiano Fidêncio
dc3b6f6592 versions: Update Cloud Hypervisor to v25.0
Cloud Hypervisor v25.0 has been released on July 7th, 2022, and brings
the following changes:

**ch-remote Improvements**
The ch-remote command has gained support for creating the VM from a JSON
config and support for booting and deleting the VM from the VMM.

**VM "Coredump" Support**
Under the guest_debug feature flag it is now possible to extract the memory
of the guest for use in debugging with e.g. the crash utility.
(https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4012)

**Notable Bug Fixes**
* Always restore console mode on exit
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4249,
   https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4248)
* Restore vCPUs in numerical order which fixes aarch64 snapshot/restore
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4244)
* Don't try and configure IFF_RUNNING on TAP devices
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4279)
* Propagate configured queue size through to vhost-user backend
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4286)
* Always Program vCPU CPUID before running the vCPU to fix running on Linux
  5.16
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4156)
* Enable ACPI MADT "Online Capable" flag for hotpluggable vCPUs to fix newer
  Linux guest

**Removals**
The following functionality has been removed:

* The mergeable option from the virtio-pmem support has been removed
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3968)
* The dax option from the virtio-fs support has been removed
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3889)

Fixes: #4641

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-12 14:47:58 +00:00
Fabiano Fidêncio
201ff223f6 packaging: Use the $BUILD_SUFFIX when renaming the qemu binary
Instead of always naming the binary as "-experimental", let's take
advantage of the $BUILD_SUFFIX that's already passed and correctly name
the binary according to it.

Fixes: #4638

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-12 15:09:31 +02:00
Bin Liu
f3335c99ce Merge pull request #4614 from Tim-0731-Hzt/runtime-rs-merge-main
Runtime-rs merge main
2022-07-12 19:25:11 +08:00
Bin Liu
9f0e4bb775 Merge pull request #4628 from fidencio/topic/rework-tee-kernel-builds
kernel: Deduplicate code used for building TEE kernels
2022-07-12 17:25:04 +08:00
Bin Liu
b424cf3c90 Merge pull request #4544 from openanolis/anolis/virtio_device_aarch64
runtime-rs: Dragonball-sandbox - add virtio device feature support for aarch64
2022-07-12 12:39:31 +08:00
Fabiano Fidêncio
cda1919a0a Merge pull request #4609 from fidencio/topic/kata-deploy-simplify-config-path-handling
packaging: Simplify config path handling
2022-07-11 23:48:54 +02:00
Fabiano Fidêncio
1a25afcdf5 kernel: Allow passing the URL to download the tarball
Passing the URL to be used to download the kernel tarball is useful in
various scenarios, mainly when doing a downstream build, thus let's add
this new option.

This new option also works around a known issue of the Dockerfile used
to build the kernel not having `yq` installed.

Fixes: #4629

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-11 14:23:49 +02:00
snir911
0024b8d10a Merge pull request #4617 from Yuan-Zhuo/main
build: save lines for repository_owner check
2022-07-11 15:04:35 +03:00
Fabiano Fidêncio
80c68b80a8 kernel: Deduplicate code used for building TEE kernels
There's no need to have the entire function for building SEV / TDX
duplicated.

Let's remove those functions and create a `get_tee_kernel()` which takes
the TEE as the argument.

Fixes: #4627

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-11 13:25:17 +02:00
xuejun-xj
d2584991eb dragonball: fix dependency unused warning
Fix the warning "unused import: `dbs_arch::gic::Error as GICError`" and
"unused import: `dbs_arch::gic::GICDevice`" in file src/vm/mod.rs when
compiling.

Fixes: #4544

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-11 17:55:04 +08:00
xuejun-xj
458f6f42f6 dragonball: use const string for legacy device type
As string "com1", "com2" and "rtc" are used in two files
(device_manager/mod.rs and device_manager/legacy.rs), we use public
const variables COM1, COM2 and RTC to replace them respectively.

Fixes: #4544

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-11 17:46:10 +08:00
James O. D. Hunt
58b0fc4794 Merge pull request #4192 from Tim-0731-Hzt/runtime-rs
kata 3.0 Architecture
2022-07-11 09:34:17 +01:00
Zhongtao Hu
0826a2157d Merge remote-tracking branch 'origin/main' into runtime-rs-1
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-11 09:47:23 +08:00
Zhongtao Hu
939959e726 docs: add Dragonball to hypervisors
Fixes:#4193
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-11 09:38:17 +08:00
xuejun-xj
f6f96b8fee dragonball: add legacy device support for aarch64
Implement RTC device for aarch64.

Fixes: #4544

Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-10 17:35:30 +08:00
xuejun-xj
7a4183980e dragonball: add device info support for aarch64
Implement generate_virtio_device_info() and
get_virtio_mmio_device_info() functions to support the mmio_device_info
member, which is used by FDT.

Fixes: #4544

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-10 17:09:59 +08:00
Fabiano Fidêncio
46fd7ce025 Merge pull request #4595 from amshinde/fix-clh-tarball-build
Fix clh tarball build
2022-07-08 20:15:30 +02:00
Peng Tao
30da3fb954 Merge pull request #4515 from openanolis/anolis/dragonball-3
runtime-rs: built-in Dragonball sandbox part III - virtio-blk, virtio-fs, virtio-net and VMM API support
2022-07-08 23:14:01 +08:00
Fabiano Fidêncio
f7ccf92dc8 kata-deploy: Rely on the configured config path
Instead of passing a `KATA_CONF_FILE` environment variable, let's rely
on the configured (in the container engine) config path, as both
containerd and CRI-O support it, and we're using this for both of them.

Fixes: #4608

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-08 15:02:26 +02:00
Fabiano Fidêncio
33360f1710 Merge pull request #4600 from ManaSugi/fix/selinux-hypervisor-config
runtime: Fix DisableSelinux config
2022-07-08 13:05:25 +02:00
Fabiano Fidêncio
386a523a05 kata-deploy: Pass the config path to CRI-O
As we're already doing for containerd, let's also pass the configuration
path to CRI-O, as all the supported CRI-O versions do support this
configuration option.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-08 12:36:47 +02:00
Yuan-Zhuo
13df57c393 build: save lines for repository_owner check
The repository_owner check in docs-url-alive-check.yaml is currently specified for each step; it can be moved to the job level to save lines.

Fixes: #4611

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2022-07-08 10:40:30 +08:00
Bin Liu
f36bc8bc52 Merge pull request #4616 from GabyCT/topic/updatecontainerddoc
docs: Update URL links for containerd documentation
2022-07-08 08:49:06 +08:00
Gabriela Cervantes
57c2d8b749 docs: Update URL links for containerd documentation
This PR updates some url links related with containerd documentation.

Fixes #4615

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-07-07 21:48:18 +00:00
Archana Shinde
e57a1c831e build: Mark git repos as safe for build
This is not an issue when the build is run as a non-privileged user.
Marking these as safe covers the case where the build may be run as root
or some other user.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-07-07 12:11:00 -07:00
GabyCT
ee3f5558ae Merge pull request #4606 from liubin/fix/4605-delete-cri-containerd-plugin
docs: delete CRI containerd plugin statement
2022-07-07 09:35:36 -05:00
Fabiano Fidêncio
c09634dbc7 Merge pull request #4592 from fidencio/revert-kata-deploy-changes-after-2.5.0-rc0-release
release: Revert kata-deploy changes after 2.5.0-rc0 release
2022-07-07 08:59:43 +02:00
liubin
2551924bda docs: delete CRI containerd plugin statement
There is no independent CRI containerd plugin in new containerd
versions, so the related documentation should be updated too.

Fixes: #4605

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-07 12:06:25 +08:00
Bin Liu
bee7915932 Merge pull request #4533 from bookinabox/simplify-nproc
tools/snap: simplify nproc
2022-07-07 11:38:29 +08:00
Chao Wu
9cee52153b fmt: do cargo fmt and add a dependency for blk_dev
fmt: do cargo fmt and add a dependency for blk_dev

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
47a4142e0d fs: change vhostuser and virtio into const
change fs mode vhostuser and virtio into const.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
e14e98bbeb cpu_topo: add handle_cpu_topology function
add the handle_cpu_topology function to make it easier to understand the
set_vm_configuration function.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
5d3b53ee7b downtime: add downtime support
add downtime support in `resume_all_vcpus_with_downtime`

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
6a1fe85f10 vfio: add vfio as TODO
We add vfio as TODO in this commit and create a github issue for this.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
5ea35ddcdc refactor: remove redundant by_id
remove the redundant by_id in get_vm_by_id_mut and get_vm_by_id. They are
simplified to get_vm_mut and get_vm.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
b646d7cb37 config: remove ht_enabled
Since the cpu topology can tell whether hyper-threading is enabled, we
removed the ht_enabled config from VmConfigInfo.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
cb54ac6c6e memory: remove reserve_memory_bytes
This is currently an unsupported feature and we will remove it from the
current code.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
bde6609b93 hotplug: add room for other hotplug solutions
Add room in the code for other hotplug solutions without upcall

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
wllenyj
d88b1bf01c dragonball: update vsock dependency
1. fix vsock device init failed
2. fix VsockDeviceConfigInfo not found

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
dd003ebe0e Dragonball: change error name and fix compile error
Change error name from `StartMicrovm` to `StartMicroVm`,
`StartMicrovmError` to `StartMicroVmError`.

Besides, we fix a compile error in config_manager.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
38957fe00b UT: fix compile error in unit tests
fix compile error in unit tests for DummyConfigInfo.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
wllenyj
11b3f95140 dragonball: add virtio-fs device support
Virtio-fs devices are supported.

Fixes: #4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
wllenyj
948381bdbe dragonball: add virtio-net device support
Virtio-net devices are supported.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
wllenyj
3d20387a25 dragonball: add virtio-blk device support
Virtio-blk devices are supported.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-07 10:32:35 +08:00
Chao Wu
87d38ae49f Doc: add document for Dragonball API
add detailed explanation for Dragonball API

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 10:32:26 +08:00
Zhongtao Hu
2bb1eeaecc docs: further questions related to upcall
add questions and answers for upcall

Fixes:#4193
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-07-07 09:52:50 +08:00
Zhongtao Hu
026aaeeccc docs: add FAQ to the report
1. Provide answers for the questions that will be frequently asked

2. Format the document

Fixes:#4193
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-07 09:52:50 +08:00
Christophe de Dinechin
fffcb81652 docs: update the content of the report
1. Explain why the current situation is a problem.

2. We are beyond a simple introduction now, it's a real proposal.

3. Explain why you think it is solid, and fix a grammatical error.

4. The Rust rationale does not really belong to the initial paragraph.
   Also, I rephrased it to highlight the contrast with Go and the Kata community's
   past experience switching to Rust for the agent.

Fixes:#4193
Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
2022-07-07 09:52:46 +08:00
Zhongtao Hu
42ea854eb6 docs: kata 3.0 Architecture
An introduction for kata 3.0 architecture design

Fixes:#4193
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
2022-07-07 09:47:07 +08:00
Archana Shinde
efdb92366b build: Fix clh source build as normal user
While running make as non-privileged user, the make errors out with
the following message:
"INFO: Build cloud-hypervisor enabling the following features: tdx
Got permission denied while trying to connect to the Docker daemon
socket at unix:///var/run/docker.sock: Post
"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/create?fromImage=cloudhypervisor%2Fdev&tag=20220524-0":
dial unix /var/run/docker.sock: connect: permission denied"

Even though the user may be part of the docker group, the clh build from
source does a docker-in-docker build. It is necessary for the user of
the nested container to be part of the docker group for the build to
succeed.

Fixes #4594

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-07-06 18:28:00 -07:00
Derek Lee
0e40ecf383 tools/snap: simplify nproc
Replaces calls of nproc with

nproc ${CI:+--ignore 1}

to run nproc with one less processing unit than the maximum to prevent
DOS-ing the local machine.

If the process is being run in a container (determined via whether $CI is
null), all available processing units will be used.

Fixes #3967

Signed-off-by: Derek Lee <derlee@redhat.com>
2022-07-06 15:04:08 -07:00
Chen Yiyang
f59939a31f runk: Support exec sub-command
`exec` will execute a command inside a container which exists and is not
frozen or stopped. *Inside* means that the new process shares namespaces
and cgroup with the container init process. Command can be specified by
`--process` parameter to read from a file, or from other parameters such
as arg, env, etc. In order to be compatible with `create`/`run`
commands, I refactor libcontainer. `Container` in builder.rs is divided
into `InitContainer` and `ActivatedContainer`. `InitContainer` is used
for `create`/`run` command. It will load spec from given bundle path.
`ActivatedContainer` is used by `exec` command, and will read the
container's status file, which stores the spec and `CreateOpt` for
creating the rustjail::LinuxContainer. Adapt the spec by replacing the
process with given options and updating the namespaces with some paths
to join the container. I also rename the `ContainerContext` as
`ContainerLauncher`, which is only used to spawn process now. It uses
the `LinuxContainer` in rustjail as the runner. For `create`/`run`, the
`launch` method will create a new container and run the first process.
For `exec`, the `launch` method will spawn a process which joins a
container.

Fixes #4363

Signed-off-by: Chen Yiyang <cyyzero@qq.com>
2022-07-06 21:11:30 +08:00
Bin Liu
be68cf0712 Merge pull request #4597 from bergwolf/github/action
action: revert commit message limit to 150 bytes
2022-07-06 17:13:15 +08:00
Manabu Sugimoto
4d89476c91 runtime: Fix DisableSelinux config
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.

Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-07-06 15:50:28 +09:00
wllenyj
090de2dae2 dragonball: fix the clippy errors.
fix clippy errors and do fmt in this PR.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-06 11:29:49 +08:00
wllenyj
a1593322bd dragonball: add vsock api to api server
Enables vsock to use the api for device configuration.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-06 11:29:49 +08:00
wllenyj
89b9ba8603 dragonball: add set_vm_configuration api
Set virtual machine configurations.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-06 11:29:49 +08:00
wllenyj
95fa0c70c3 dragonball: add start microvm support
We add microvm start related support in this pull request.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-06 11:29:49 +08:00
wllenyj
5c1ccc376b dragonball: add Vmm struct
The Vmm struct is the global coordinator that manages API servers, virtual
machines, etc.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-06 11:29:49 +08:00
Jiang Liu
4d234f5742 dragonball: refactor code layout
Refactored some code layout.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2022-07-06 11:29:49 +08:00
wllenyj
cfd5dae47c dragonball: add vm struct
The vm struct manages resources and control states of a virtual
machine instance.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-07-06 11:29:46 +08:00
wllenyj
527b73a8e5 dragonball: remove unused feature in AddressSpaceMgr
log_dirty_pages is useless now and will be redesigned to support live
migration in the future.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-06 11:28:32 +08:00
Peng Tao
3bafafec58 action: extend commit message line limit to 150 bytes
So that we can add more info there, and few people use such small
terminals nowadays.

Fixes: #4596
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-07-06 11:19:08 +08:00
Fabiano Fidêncio
5010c643c4 release: Revert kata-deploy changes after 2.5.0-rc0 release
As 2.5.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-07-05 22:23:49 +02:00
Peng Tao
514b4e7235 Merge pull request #4543 from openanolis/anolis/add_vcpu_configure_aarch64
runtime-rs: Dragonball sandbox - add Vcpu::configure() function for aarch64
2022-07-05 17:47:40 +08:00
xuejun-xj
7120afe4ed dragonball: add vcpu test function for aarch64
add create_vcpu() function in vcpu test unit for aarch64

Fixes: #4445

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-04 15:23:43 +08:00
xuejun-xj
648d285a24 dragonball: add vcpu support for aarch64
add configure() function for aarch64 vcpu

Fixes: #4543

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-04 15:23:37 +08:00
xuejun-xj
7dad7c89f3 dragonball: update dbs-xxx dependency
change to up-to-date commit ID

Fixes: #4543

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
2022-07-04 15:23:11 +08:00
James O. D. Hunt
59cab9e835 Merge pull request #4380 from Tim-0731-Hzt/rund/makefile
runtime-rs: makefile for dragonball
2022-07-01 09:12:38 +01:00
Bin Liu
18093251ec Merge pull request #4527 from Tim-0731-Hzt/rund-new/netlink
runtime-rs:refactor network model with netlink
2022-07-01 11:12:54 +08:00
Zhongtao Hu
07231b2f3f runtime-rs:refactor network model with netlink
add unit test for tcfilter

Fixes: #4289
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-29 11:38:23 +08:00
Zhongtao Hu
c8a9052063 build: format files
add a newline at the end of the file

Fixes: #4379
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-29 11:19:10 +08:00
Zhongtao Hu
242992e3de build: put install methods in utils.mk
put install methods in utils.mk to avoid duplication

Fixes: #4379
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-29 11:19:10 +08:00
Zhongtao Hu
8a697268d0 build: makefile for dragonball config
use makefile to generate dragonball config file

Fixes: #4379
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-29 11:19:07 +08:00
Zhongtao Hu
9c526292e7 runtime-rs:refactor network model with netlink
refactor tcfilter with netlink

Fixes: #4289
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-29 11:03:29 +08:00
GabyCT
12c1b9e6d6 Merge pull request #4536 from Tim-0731-Hzt/runtime-rs-kata-main
runtime-rs: Merge Main into runtime-rs branch
2022-06-28 10:27:35 -05:00
Zhongtao Hu
f3907aa127 runtime-rs:Merge remote-tracking branch 'origin/main' into runtime-rs-newv
Fixes:#4536
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-28 20:58:40 +08:00
Bin Liu
badbbcd8be Merge pull request #4400 from openanolis/anolis/dragonball-2
runtime-rs: built-in Dragonball sandbox part II - vCPU manager
2022-06-28 20:41:36 +08:00
Chao Wu
71db2dd5b8 hotplug: add room for future acpi hotplug mechanism
In order to support ACPI hotplug in the future, with cooperative work
from the Kata community, we add the ACPI feature and the dbs-upcall
feature to make room for ACPI hotplug.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-27 21:52:36 +08:00
Zizheng Bian
8bb00a3dc8 dragonball: fix a bug when generating kernel boot args
We should refuse to generate boot args when hotplugging, not cold starting.

Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
2022-06-27 18:12:50 +08:00
Chao Wu
2aedd4d12a doc: add document for vCPU, api and device
Create the document for vCPU and api.

Add some detail in the device document.

Fixes: #4257

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-27 18:12:50 +08:00
wllenyj
bec22ad01f dragonball: add api module
It is used to define the vmm communication interface.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-27 18:12:50 +08:00
wllenyj
07f44c3e0a dragonball: add vcpu manager
Manage vcpu related operations.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-27 18:12:48 +08:00
wllenyj
78c9718752 dragonball: add upcall support
Upcall is a direct communication tool between VMM and guest developed
upon vsock. It is used to implement device hotplug.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
2022-06-27 17:04:47 +08:00
wllenyj
7d1953b52e dragonball: add vcpu
Virtual CPU manager for virtual machines.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-27 17:04:42 +08:00
wllenyj
468c73b3cb dragonball: add kvm context
KVM operation context for virtual machines.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-27 16:02:06 +08:00
Fupan Li
98f041ed8e Merge pull request #4486 from openanolis/runtime-rs-merge-main
runtime-rs: runtime-rs merge main
2022-06-20 13:52:14 +08:00
Chao Wu
86123f49f2 Merge branch 'main' into runtime-rs
In order to keep up to date with main, we will update runtime-rs every
week.

Fixes: #4485
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-20 10:01:58 +08:00
wllenyj
e89e6507a4 dragonball: add signal handler
Used to register dragonball's signal handler.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-16 17:31:58 +08:00
wllenyj
b6cb2c4ae3 dragonball: add metrics system
metrics system is added for collecting Dragonball metrics to analyze the
system.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-13 13:51:51 +08:00
wllenyj
e80e0c4645 dragonball: add io manager wrapper
Wrapper over IoManager to support device hotplug.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-13 13:51:46 +08:00
Bin Liu
f23d7092e3 Merge pull request #4265 from openanolis/anolis/dragonball-1
runtime-rs: built-in Dragonball sandbox part I - resource and device managers
2022-06-12 12:17:57 +08:00
Chao Wu
d5ee3fc856 safe-path: fix clippy warning
fix clippy warnings in safe-path lib to make clippy happy.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-12 10:24:05 +08:00
Chao Wu
93c10dfd86 runtime-rs: add crosvm license in Dragonball
add THIRD-PARTY file to add license for crosvm.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:24:58 +08:00
Chao Wu
dfe6de7714 dragonball: add dragonball into kata README
add dragonball description into kata README to help introduce dragonball
sandbox.

Fixes: #4257

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:24:56 +08:00
wllenyj
39ff85d610 dragonball: green ci
Revert this patch after dragonball-sandbox is ready and all
subsequent implementations are submitted.

Fixes: #4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-11 17:24:17 +08:00
wllenyj
71f24d8271 dragonball: add Makefile.
Currently supported: build, clippy, check, format, test, clean

Fixes: #4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-06-11 17:24:17 +08:00
Chao Wu
a1df6d0969 Doc: Update Dragonball Readme and add document for device
Update Dragonball Readme to fix style problem and add github issue for
TODOs.

Add document for devices in dragonball. This is the document for the
current dragonball device status and we'll keep updating it when we
introduce more devices in later pull requests.

Fixes: #4257

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:24:17 +08:00
wllenyj
8619f2b3d6 dragonball: add virtio vsock device manager.
Added VsockDeviceMgr struct to manage all vsock devices.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:23:56 +08:00
wllenyj
52d42af636 dragonball: add device manager.
Device manager to manage IO devices for a virtual machine. Added
DeviceManagerTx to provide operation transactions for device management,
and DeviceManagerContext to provide an operation context for device
management.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:23:56 +08:00
wllenyj
c1c1e5152a dragonball: add kernel config.
It is used for holding guest kernel configuration information.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:23:46 +08:00
wllenyj
6850ef99ae dragonball: add configuration manager.
It is used for managing a group of configuration information.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:23:39 +08:00
wllenyj
0bcb422fcb dragonball: add legacy devices manager
The legacy devices manager is used for managing legacy devices.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:23:33 +08:00
wllenyj
3c45c0715f dragonball: add console manager.
Console manager to manage frontend and backend console devices.

A virtual console is composed of two parts: a frontend in the virtual
machine and a backend in the host OS. A frontend may be a serial port,
virtio-console etc.; a backend may be stdio or a Unix domain socket. The
manager connects the frontend with the backend.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:23:27 +08:00
wllenyj
3d38bb3005 dragonball: add address space manager.
Address space abstraction to manage the virtual machine's physical address space.
The AddressSpaceMgr struct manages the address space.

Fixes: #4257

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:21:41 +08:00
wllenyj
aff6040555 dragonball: add resource manager support.
Resource manager manages all resources of a virtual machine instance.

Fixes: #4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:21:41 +08:00
wllenyj
8835db6b0f dragonball: initial commit
The dragonball crate initial commit that includes dragonball README and
basic code structure.

Fixes: #4257

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2022-06-11 17:21:41 +08:00
Fupan Li
9cb15ab4c5 agent: add the FSGroup support
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2022-06-11 11:30:51 +08:00
Fupan Li
ff7874bc23 protobuf: upgrade the protobuf version to 2.27.0
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2022-06-11 10:05:52 +08:00
Zhongtao Hu
06f398a34f runtime-rs: use withContext to evaluate lazily
Fixes: #4129
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 22:03:13 +08:00
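The lazy evaluation the commit above refers to can be sketched in plain Rust. This is a hypothetical std-only helper (the runtime itself builds on a `with_context`-style API): the message closure only runs on the error path, so `Ok` results pay no formatting cost.

```rust
use std::cell::Cell;

// Hypothetical std-only sketch of a closure-based context helper: the
// message is formatted only when the Result is actually an Err.
fn with_context<T, E, F>(r: Result<T, E>, msg: F) -> Result<T, String>
where
    E: std::fmt::Display,
    F: FnOnce() -> String,
{
    r.map_err(|e| format!("{}: {}", msg(), e))
}

fn main() {
    let called = Cell::new(false);

    // Ok path: the message closure is never invoked.
    let ok = with_context(Ok::<_, String>(1u32), || {
        called.set(true);
        "reading config".to_string()
    });
    assert_eq!(ok, Ok(1));
    assert!(!called.get());

    // Err path: the closure runs and the context is prepended.
    let err = with_context(Err::<u32, _>("not found".to_string()), || {
        "reading config".to_string()
    });
    assert_eq!(err.unwrap_err(), "reading config: not found");
    println!("lazy context ok");
}
```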
Quanwei Zhou
fd4c26f9c1 runtime-rs: support network resource
Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 22:02:58 +08:00
Tim Zhang
4be7185aa4 runtime-rs: runtime part implement
Fixes: #3785
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 22:01:12 +08:00
Zhongtao Hu
10343b1f3d runtime-rs: enhance runtimes
1. support oom event
2. use ContainerProcess to store container_id and exec_id
3. support stats

Fixes: #3785
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 22:01:05 +08:00
Quanwei Zhou
9887272db9 libs: enhance kata-sys-util and kata-types
Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 21:59:47 +08:00
Quanwei Zhou
3ff0db05a7 runtime-rs: support rootfs volume for resource
Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:58:01 +08:00
Tim Zhang
234d7bca04 runtime-rs: support cgroup resource
Fixes: #3785
Signed-off-by: Tim Zhang <tim@hyper.sh>
2022-06-10 19:57:53 +08:00
Quanwei Zhou
75e282b4c1 runtime-rs: hypervisor base define
Responsible for VM manager, such as Qemu, Dragonball

Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:57:45 +08:00
Quanwei Zhou
bdfee005fa runtime-rs: service and runtime framework
1. service: Responsible for processing services, such as the task service and the image service
2. runtime: Responsible for implementing different runtimes, such as Virt-container,
Linux-container and Wasm-container

Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:57:36 +08:00
Quanwei Zhou
4296e3069f runtime-rs: agent implements
Responsible for communicating with the agent, such as kata-agent in the VM

Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:57:29 +08:00
Jakob Naucke
d3da156eea runtime-rs: uint FsType for s390x
statfs type on s390x should be c_uint, not __fsword_t

Fixes: #3888
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-06-10 19:57:23 +08:00
quanwei.zqw
e705ee07c5 runtime-rs: update containerd-shim-protos to 0.2.0
Fixes: #3866
Signed-off-by: quanwei.zqw <quanwei.zqw@alibaba-inc.com>
2022-06-10 19:57:14 +08:00
quanwei.zqw
8c0a60e191 runtime-rs: modify the review suggestion
Fixes: #3876
Signed-off-by: quanwei.zqw <quanwei.zqw@alibaba-inc.com>
2022-06-10 19:57:07 +08:00
Zack
278f843f92 runtime-rs: shim implements for runtime-rs
Responsible for processing shim related commands: start, delete.

This patch is extracted from Alibaba Cloud's internal repository *runD*
Thanks to all contributors!

Fixes: #3785
Signed-off-by: acetang <aceapril@126.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Fupan Li <lifupan@gmail.com>
Signed-off-by: gexuyang <gexuyang@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
Signed-off-by: Hui Zhu <teawater@gmail.com>
Signed-off-by: Issac Hai <hjwissac@linux.alibaba.com>
Signed-off-by: Jiahuan Chao <jhchao@linux.alibaba.com>
Signed-off-by: lichenglong9 <lichenglong9@163.com>
Signed-off-by: mengze <mengze@linux.alibaba.com>
Signed-off-by: Qingyuan Hou <qingyuan.hou@linux.alibaba.com>
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Signed-off-by: shiqiangzhang <shiyu.zsq@linux.alibaba.com>
Signed-off-by: Simon Guo <wei.guo.simon@linux.alibaba.com>
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: wanglei01 <wllenyj@linux.alibaba.com>
Signed-off-by: Wei Yang <wei.yang1@linux.alibaba.com>
Signed-off-by: yanlei <yl.on.the.way@gmail.com>
Signed-off-by: Yiqun Leng <yqleng@linux.alibaba.com>
Signed-off-by: yuchang.xu <yuchang.xu@linux.alibaba.com>
Signed-off-by: Yves Chan <lingfu@linux.alibaba.com>
Signed-off-by: Zack <zmlcc@linux.alibaba.com>
Signed-off-by: Zhiheng Tao <zhihengtao@linux.alibaba.com>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
2022-06-10 19:56:59 +08:00
Quanwei Zhou
641b736106 libs: enhance kata-sys-util
1. move verify_cid from agent to libs/kata-sys-util
2. enhance kata-sys-util/k8s

Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:55:39 +08:00
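The `verify_cid` helper moved above can be sketched as follows. This is a hypothetical approximation (the exact rules live in kata-sys-util and may differ): the ID must be non-empty, start with an ASCII alphanumeric character, and contain only alphanumerics, `_`, `.` or `-`.

```rust
// Hypothetical sketch of the kind of check verify_cid performs; the
// real validation rules in kata-sys-util may differ.
fn verify_cid(id: &str) -> Result<(), String> {
    let mut chars = id.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphanumeric() => {}
        _ => return Err(format!("invalid container ID {:?}", id)),
    }
    if chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-')) {
        Ok(())
    } else {
        Err(format!("invalid container ID {:?}", id))
    }
}

fn main() {
    assert!(verify_cid("redis-01").is_ok());
    assert!(verify_cid("").is_err());         // empty
    assert!(verify_cid("-leading").is_err()); // must start alphanumeric
    assert!(verify_cid("a b").is_err());      // no spaces
    println!("verify_cid sketch ok");
}
```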
Fupan Li
69ba1ae9e4 trans: fix the issue of wrong swappiness type
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2022-06-10 19:46:25 +08:00
Quanwei Zhou
d2a9bc6674 agent: agent-protocol support async
1. support async.
2. update ttrpc and protobuf
update ttrpc to 0.6.0
update protobuf to 2.23.0
3. support trans from oci

Fixes: #3746
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:36:55 +08:00
Liu Jiang
aee9633ced libs/sys-util: provide functions to execute hooks
Provide functions to execute OCI hooks.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
Signed-off-by: Huamin Tang <huamin.thm@alibaba-inc.com>
Signed-off-by: Lei Wang <wllenyj@linux.alibaba.com>
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:24:30 +08:00
Liu Jiang
8509de0aea libs/sys-util: add function to detect and update K8s emptyDir volume
Add function to detect and update K8s emptyDir volume.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Qingyuan Hou <qingyuan.hou@linux.alibaba.com>
2022-06-10 19:15:59 +08:00
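The emptyDir detection described above rests on a kubelet convention: emptyDir volumes are mounted under a path containing the `kubernetes.io~empty-dir` component. A sketch of the check (the real helper in kata-sys-util may do more):

```rust
use std::path::Path;

// Sketch of K8s emptyDir detection via the well-known kubelet path
// component; function name is hypothetical.
const EMPTY_DIR_COMPONENT: &str = "kubernetes.io~empty-dir";

fn is_empty_dir_volume(path: &str) -> bool {
    Path::new(path)
        .components()
        .any(|c| c.as_os_str() == EMPTY_DIR_COMPONENT)
}

fn main() {
    assert!(is_empty_dir_volume(
        "/var/lib/kubelet/pods/uid/volumes/kubernetes.io~empty-dir/cache"
    ));
    assert!(!is_empty_dir_volume(
        "/var/lib/kubelet/pods/uid/volumes/kubernetes.io~configmap/cm"
    ));
    println!("emptyDir detection sketch ok");
}
```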
Liu Jiang
6d59e8e197 libs/sys-util: introduce function to get device id
Introduce get_devid() to get major/minor number of a block device.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2022-06-10 19:15:28 +08:00
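The major/minor split that `get_devid()` needs follows the glibc encoding of `dev_t`. A sketch of that arithmetic (the real helper presumably stats the block device node first to obtain `dev`):

```rust
// Sketch of the glibc dev_t encoding used to extract major/minor numbers.
fn major(dev: u64) -> u64 {
    ((dev >> 8) & 0xfff) | ((dev >> 32) & 0xffff_f000)
}

fn minor(dev: u64) -> u64 {
    (dev & 0xff) | ((dev >> 12) & 0xffff_ff00)
}

fn makedev(maj: u64, min: u64) -> u64 {
    ((maj & 0xfff) << 8)
        | ((maj & 0xffff_f000) << 32)
        | (min & 0xff)
        | ((min & 0xffff_ff00) << 12)
}

fn main() {
    // /dev/sda1 is conventionally major 8, minor 1 (dev_t 0x801).
    assert_eq!(major(0x801), 8);
    assert_eq!(minor(0x801), 1);
    // Round-trip with a large minor number.
    let dev = makedev(259, 300);
    assert_eq!(major(dev), 259);
    assert_eq!(minor(dev), 300);
    println!("devid sketch ok");
}
```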
Liu Jiang
5300ea23ad libs/sys-util: implement reflink_copy()
Implement reflink_copy() to copy file by reflink, and fallback to normal
file copy.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
2022-06-10 19:15:20 +08:00
Liu Jiang
1d5c898d7f libs/sys-util: add utilities to parse NUMA information
Add utilities to parse NUMA information.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Qingyuan Hou <qingyuan.hou@linux.alibaba.com>
Signed-off-by: Simon Guo <wei.guo.simon@linux.alibaba.com>
2022-06-10 19:15:12 +08:00
Liu Jiang
87887026f6 libs/sys-util: add utilities to manipulate cgroup
Add utilities to manipulate cgroup, currently only v1 is supported.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
Signed-off-by: Jiahuan Chao <jhchao@linux.alibaba.com>
Signed-off-by: Qingyuan Hou <qingyuan.hou@linux.alibaba.com>
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Signed-off-by: Tim Zhang <tim@hyper.sh>
2022-06-10 19:14:59 +08:00
Liu Jiang
ccd03e2cae libs/sys-util: add wrappers for mount and fs
Add some wrappers for mount and fs syscall.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
Signed-off-by: Fupan Li <lifupan@gmail.com>
Signed-off-by: Huamin Tang <huamin.thm@alibaba-inc.com>
Signed-off-by: Lei Wang <wllenyj@linux.alibaba.com>
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
2022-06-10 19:14:06 +08:00
Liu Jiang
45a00b4f02 libs/sys-util: add kata-sys-util crate under src/libs
The kata-sys-util crate is a collection of modules that provides helpers
and utilities used by multiple Kata Containers components.

Fixes: #3305

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 19:10:40 +08:00
Zhongtao Hu
48c201a1ac libs/types: make the variable name easier to understand
1. modify default values for hypervisor
2. change the variable name
3. check the min memory limit

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:01:31 +08:00
Zhongtao Hu
b9b6d70aae libs/types: modify implementation details
1. fix nit problems
2. use generic type when parsing different type

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:01:24 +08:00
Zhongtao Hu
05ad026fc0 libs/types: fix implementation details
use ok_or_else to handle get_mut(hypervisor) instead of unwrap

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:01:17 +08:00
Zhongtao Hu
d96716b4d2 libs/types:fix styles and implementation details
1. Some Nit problems are fixed
2. Make the code more readable
3. Modify some implementation details

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:01:09 +08:00
Zhongtao Hu
6cffd943be libs/types:return Result to handle parse error
If there is a parse error when we are trying to get the annotations, we
will return Result<Option<type>> to handle that.

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:00:58 +08:00
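The `Result<Option<T>>` pattern described above distinguishes "not set" from "set but malformed". A sketch with hypothetical key and function names: an absent annotation is `Ok(None)`, a present but unparsable value is an `Err`, and a parsed value is `Ok(Some(..))`.

```rust
use std::collections::HashMap;

// Sketch of annotation parsing with Result<Option<T>>; names are
// hypothetical, not the crate's actual API.
fn get_annotation_u32(
    annotations: &HashMap<String, String>,
    key: &str,
) -> Result<Option<u32>, String> {
    match annotations.get(key) {
        // Not set: the caller falls back to the configuration file.
        None => Ok(None),
        Some(v) => v
            .parse::<u32>()
            .map(Some)
            .map_err(|e| format!("annotation {}={:?}: {}", key, v, e)),
    }
}

fn main() {
    let mut ann = HashMap::new();
    ann.insert("io.kata.vcpus".to_string(), "4".to_string());
    ann.insert("io.kata.memory".to_string(), "lots".to_string());

    assert_eq!(get_annotation_u32(&ann, "io.kata.vcpus"), Ok(Some(4)));
    assert_eq!(get_annotation_u32(&ann, "io.kata.missing"), Ok(None));
    assert!(get_annotation_u32(&ann, "io.kata.memory").is_err());
    println!("annotation parsing sketch ok");
}
```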
Zhongtao Hu
6ae87d9d66 libs/types: use contains to make code more readable
use contains when validating hypervisor block_device_driver

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:00:50 +08:00
Zhongtao Hu
45e5780e7c libs/types: fix spelling and grammar errors
fixed spelling and grammar errors in some files

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 19:00:43 +08:00
Zhongtao Hu
2599a06a56 libs/types:use include_str! in test file
use include_str! to load toml file to string fmt

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 18:28:14 +08:00
Zhongtao Hu
8ffff40af4 libs/types:Option type to handle empty tomlconfig
loading from an empty string is only used to indicate that the config is
not initialized yet, so Option<TomlConfig> is a better option

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 18:28:05 +08:00
Zhongtao Hu
626828696d libs/types: add license for test-config.rs
add SPDX license identifier: Apache-2.0

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 18:27:57 +08:00
Zhongtao Hu
97d8c6c0fa docs: modify move-issues-to-in-progress.yaml
change issue backlog to runtime-rs

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 18:27:49 +08:00
Liu Jiang
8cdd70f6c2 libs/types: change method to update config by annotation
Some annotations are used to override hypervisor configurations, and
overriding them is dangerous. We must be careful when overriding hypervisor
configuration by annotations, to avoid security flaws.
There are two existing mechanisms to prevent attacks by annotations:
1) config.hypervisor.enable_annotations defines the allowed annotation
keys for config.hypervisor.
2) config.hypervisor.xxxx_paths defines allowed values for specific keys.

The access methods for config.hypervisor.xxx enforce the permission
checks for the above rules.

To update the config, traverse the annotation hashmap and check whether the
key is enabled for the hypervisor. If it is enabled: for path-related
annotations, check whether the value is valid before updating the config;
for cpu- and memory-related annotations, check whether the value is above or
below the limits for DB and QEMU before updating the config.

If it is not enabled, there are three possibilities: an agent-related
annotation, a runtime-related annotation, or a hypervisor-related annotation
that is not enabled. The function handles agent and runtime annotations
first; the remaining option is an invalid hypervisor annotation, and an
error message will be returned.

Also add more edge-case tests for updating the config, clean up unused
functions, delete unused files and fix warnings.

Fixes: #3523

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 18:27:36 +08:00
Liu Jiang
e19d04719f libs/types: implement KataConfig to wrap TomlConfig
The TomlConfig structure is a parsed form of the Kata configuration file,
but it's a little inconvenient to access that configuration
information directly. So introduce a wrapper, KataConfig, to easily
access the configuration information.

Two singletons of KataConfig are provided:
- KATA_DEFAULT_CONFIG: the original version directly loaded from the Kata
configuration file.
- KATA_ACTIVE_CONFIG: the active version, which is KATA_DEFAULT_CONFIG
patched by annotations.

The recommended way to use these two singletons:
- Load TomlConfig from the configuration file and set it as the default one.
- Clone the default one and patch it with values from annotations.
- Use the default one for permission checks, such as to check for
  allowed annotation keys/values.
- The patched version may be set as the active one or passed to clients.
- The clients directly access information from the active/passed one,
  and do not need to check annotations for overrides.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 18:26:48 +08:00
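The default/active split described above can be sketched with std-only primitives. This is a hypothetical sketch (the real crate predates `OnceLock` and uses different plumbing): the default config is set once at startup, and the active config is a patched clone that clients read from.

```rust
use std::sync::{OnceLock, RwLock};

#[derive(Clone, Debug)]
struct TomlConfig {
    enable_debug: bool,
}

// Hypothetical stand-ins for KATA_DEFAULT_CONFIG / KATA_ACTIVE_CONFIG.
static DEFAULT_CONFIG: OnceLock<TomlConfig> = OnceLock::new();
static ACTIVE_CONFIG: OnceLock<RwLock<TomlConfig>> = OnceLock::new();

fn set_default(cfg: TomlConfig) {
    // The active config starts as an unpatched copy of the default.
    ACTIVE_CONFIG.set(RwLock::new(cfg.clone())).ok();
    DEFAULT_CONFIG.set(cfg).ok();
}

fn patch_active(enable_debug: bool) {
    // Clone the default and apply annotation overrides to the clone,
    // leaving the default untouched for later permission checks.
    let mut patched = DEFAULT_CONFIG.get().expect("not initialized").clone();
    patched.enable_debug = enable_debug;
    *ACTIVE_CONFIG.get().unwrap().write().unwrap() = patched;
}

fn active() -> TomlConfig {
    ACTIVE_CONFIG.get().unwrap().read().unwrap().clone()
}

fn main() {
    set_default(TomlConfig { enable_debug: false });
    patch_active(true); // e.g. a debug annotation was allowed and applied
    assert!(active().enable_debug);
    assert!(!DEFAULT_CONFIG.get().unwrap().enable_debug);
    println!("config singleton sketch ok");
}
```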
Liu Jiang
387ffa914e libs/types: support load Kata agent configuration from file
Add structures to load Kata agent configuration from configuration files.
Also define a mechanism for vendor to extend the Kata configuration
structure.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 18:26:37 +08:00
Liu Jiang
69f10afb71 libs/types: support load Kata hypervisor configuration from file
Add structures to load the Kata hypervisor configuration from configuration
files. Also define mechanisms:
1) for hypervisors to handle the configuration info.
2) for vendor to extend the Kata configuration structure.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 18:25:24 +08:00
Liu Jiang
21cc02d724 libs/types: support load Kata runtime configuration from file
Add structures to load Kata runtime configuration from configuration
files. Also define a mechanism for vendor to extend the Kata
configuration structure.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-06-10 18:25:24 +08:00
Liu Jiang
5b89c1df2f libs/types: add kata-types crate under src/libs
Add kata-types crate to host constants and data types shared by multiple
Kata Containers components.

Fixes: #3305

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Fupan Li <lifupan@gmail.com>
Signed-off-by: Huamin Tang <huamin.thm@alibaba-inc.com>
Signed-off-by: Lei Wang <wllenyj@linux.alibaba.com>
Signed-off-by: yanlei <yl.on.the.way@gmail.com>
2022-06-10 18:25:24 +08:00
Liu Jiang
4f62a7618c libs/logging: fix clippy warnings
Fix clippy warnings of libs/logging.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 18:25:24 +08:00
Liu Jiang
6f8acb94c2 libs: refine Makefile rules
Refine Makefile rules to better support the KATA ci env.

Fixes: #3536

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 18:25:24 +08:00
Liu Jiang
7cdee4980c libs/logging: introduce a wrapper writer for logging
Introduce a wrapper writer `LogWriter` which converts every line written
to it into a log record.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Wei Yang <wei.yang1@linux.alibaba.com>
Signed-off-by: yanlei <yl.on.the.way@gmail.com>
2022-06-10 18:25:24 +08:00
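The line-to-record conversion this commit describes could look roughly like this. It is a minimal sketch: the real `LogWriter` in `src/libs/logging` emits slog records, while this stand-in collects completed lines through a callback.

```rust
use std::io::{self, Write};

// Wrapper writer: every complete line written to it becomes one
// "log record" (here, one call to the `emit` callback).
struct LogWriter<F: FnMut(&str)> {
    buf: Vec<u8>,
    emit: F,
}

impl<F: FnMut(&str)> LogWriter<F> {
    fn new(emit: F) -> Self {
        Self { buf: Vec::new(), emit }
    }
}

impl<F: FnMut(&str)> Write for LogWriter<F> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        for &b in data {
            if b == b'\n' {
                // A full line becomes one record; the newline is dropped.
                let line = String::from_utf8_lossy(&self.buf).into_owned();
                (self.emit)(&line);
                self.buf.clear();
            } else {
                self.buf.push(b);
            }
        }
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Emit any trailing partial line so nothing is lost on flush.
        if !self.buf.is_empty() {
            let line = String::from_utf8_lossy(&self.buf).into_owned();
            (self.emit)(&line);
            self.buf.clear();
        }
        Ok(())
    }
}
```

Because it implements `std::io::Write`, such a wrapper can be handed to any API that expects a writer, turning that API's byte output into structured log records.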
Liu Jiang
426f38de94 libs/logging: implement rotator for log files
Add FileRotator to rotate log files.

The FileRotator structure may be used as a writer for create_logger()
and limits the storage space occupied by log files.

Fixes: #3304

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Wei Yang <wei.yang1@linux.alibaba.com>
Signed-off-by: yanlei <yl.on.the.way@gmail.com>
2022-06-10 18:25:24 +08:00
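A size-triggered rotation scheme like the one this commit describes can be sketched as follows. This is assumed behaviour for illustration; the real `FileRotator` in `src/libs/logging` may use a different policy and API.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Once `path` grows to `max_size` bytes or more, shift the numbered
// backups up by one (`path.1` -> `path.2`, ...) and move `path` to
// `path.1`, keeping at most `keep` rotated files. This is how the
// rotator bounds the storage space occupied by log files.
fn rotate_if_needed(path: &Path, max_size: u64, keep: usize) -> std::io::Result<()> {
    let size = match fs::metadata(path) {
        Ok(m) => m.len(),
        Err(_) => return Ok(()), // nothing to rotate yet
    };
    if size < max_size {
        return Ok(());
    }
    let numbered = |i: usize| -> PathBuf { PathBuf::from(format!("{}.{}", path.display(), i)) };
    // Drop the oldest backup if present, then shift the remaining ones.
    let _ = fs::remove_file(numbered(keep));
    for i in (1..keep).rev() {
        let _ = fs::rename(numbered(i), numbered(i + 1));
    }
    fs::rename(path, numbered(1))
}
```

A writer built on top of this would call `rotate_if_needed` before each append, so the active log file never grows past the configured limit.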
Liu Jiang
392f1ecdf5 libs: convert to a cargo workspace
Convert libs into a Cargo workspace, so all libraries can share the
build infrastructure.

Fixes #3282

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2022-06-10 18:25:24 +08:00
Liu Jiang
575df4dc4d static-checks: Allow Merge commit to be >75 chars
Some generated merge commit messages are >75 chars.
Allow these to not trigger the subject-line length failure.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2022-06-10 18:25:24 +08:00
1337 changed files with 117438 additions and 26856 deletions

View File

@@ -0,0 +1,40 @@
#!/bin/bash
#
# Copyright (c) 2022 Red Hat
#
# SPDX-License-Identifier: Apache-2.0
#
script_dir=$(dirname "$(readlink -f "$0")")
parent_dir=$(realpath "${script_dir}/../..")
cidir="${parent_dir}/ci"
source "${cidir}/lib.sh"
cargo_deny_file="${script_dir}/action.yaml"
cat cargo-deny-skeleton.yaml.in > "${cargo_deny_file}"
changed_files_status=$(run_get_pr_changed_file_details)
changed_files_status=$(echo "$changed_files_status" | grep "Cargo\.toml$" || true)
changed_files=$(echo "$changed_files_status" | awk '{print $NF}' || true)
if [ -z "$changed_files" ]; then
cat >> "${cargo_deny_file}" << EOF
- run: echo "No Cargo.toml files to check"
shell: bash
EOF
fi
for path in $changed_files
do
cat >> "${cargo_deny_file}" << EOF
- name: ${path}
continue-on-error: true
shell: bash
run: |
pushd $(dirname ${path})
cargo deny check
popd
EOF
done

View File

@@ -0,0 +1,30 @@
#
# Copyright (c) 2022 Red Hat
#
# SPDX-License-Identifier: Apache-2.0
#
name: 'Cargo Crates Check'
description: 'Checks every Cargo.toml file using cargo-deny'
env:
CARGO_TERM_COLOR: always
runs:
using: "composite"
steps:
- name: Install Rust
uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
override: true
- name: Cache
uses: Swatinem/rust-cache@v2
- name: Install Cargo deny
shell: bash
run: |
which cargo
cargo install --locked cargo-deny || true

View File

@@ -0,0 +1,100 @@
name: Add backport label
on:
pull_request:
types:
- opened
- synchronize
- reopened
- edited
- labeled
- unlabeled
jobs:
check-issues:
if: ${{ github.event.label.name != 'auto-backport' }}
runs-on: ubuntu-latest
steps:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
- name: Install hub extension script
run: |
pushd $(mktemp -d) &>/dev/null
git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts
sudo install hub-util.sh /usr/local/bin
popd &>/dev/null
- name: Determine whether to add label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CONTAINS_AUTO_BACKPORT: ${{ contains(github.event.pull_request.labels.*.name, 'auto-backport') }}
id: add_label
run: |
pr=${{ github.event.pull_request.number }}
linked_issue_urls=$(hub-util.sh \
list-issues-for-pr "$pr" |\
grep -v "^\#" |\
cut -d';' -f3 || true)
[ -z "$linked_issue_urls" ] && {
echo "::error::No linked issues for PR $pr"
exit 1
}
has_bug=false
for issue_url in $(echo "$linked_issue_urls")
do
issue=$(echo "$issue_url"| awk -F\/ '{print $NF}' || true)
[ -z "$issue" ] && {
echo "::error::Cannot determine issue number from $issue_url for PR $pr"
exit 1
}
labels=$(hub-util.sh list-labels-for-issue "$issue")
label_names=$(echo $labels | jq -r '.[].name' || true)
if [[ "$label_names" =~ "bug" ]]; then
has_bug=true
break
fi
done
has_backport_needed_label=${{ contains(github.event.pull_request.labels.*.name, 'needs-backport') }}
has_no_backport_needed_label=${{ contains(github.event.pull_request.labels.*.name, 'no-backport-needed') }}
echo "::set-output name=add_backport_label::false"
if [ $has_backport_needed_label = true ] || [ $has_bug = true ]; then
if [[ $has_no_backport_needed_label = false ]]; then
echo "::set-output name=add_backport_label::true"
fi
fi
# Do not spam comment, only if auto-backport label is going to be newly added.
echo "::set-output name=auto_backport_added::$CONTAINS_AUTO_BACKPORT"
- name: Add comment
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'true' && steps.add_label.outputs.auto_backport_added == 'false' }}
uses: actions/github-script@v6
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: 'This issue has been marked for auto-backporting. Add label(s) backport-to-BRANCHNAME to backport to them'
})
# Allow label to be removed by adding no-backport-needed label
- name: Remove auto-backport label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'false' }}
uses: andymckay/labeler@e6c4322d0397f3240f0e7e30a33b5c5df2d39e90
with:
remove-labels: "auto-backport"
repo-token: ${{ secrets.GITHUB_TOKEN }}
- name: Add auto-backport label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'true' }}
uses: andymckay/labeler@e6c4322d0397f3240f0e7e30a33b5c5df2d39e90
with:
add-labels: "auto-backport"
repo-token: ${{ secrets.GITHUB_TOKEN }}

29
.github/workflows/auto-backport.yaml vendored Normal file
View File

@@ -0,0 +1,29 @@
on:
pull_request_target:
types: ["labeled", "closed"]
jobs:
backport:
name: Backport PR
runs-on: ubuntu-latest
if: |
github.event.pull_request.merged == true
&& contains(github.event.pull_request.labels.*.name, 'auto-backport')
&& (
(github.event.action == 'labeled' && github.event.label.name == 'auto-backport')
|| (github.event.action == 'closed')
)
steps:
- name: Backport Action
uses: sqren/backport-github-action@v8.9.2
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
auto_backport_label_prefix: backport-to-
- name: Info log
if: ${{ success() }}
run: cat /home/runner/.backport/backport.info.log
- name: Debug log
if: ${{ failure() }}
run: cat /home/runner/.backport/backport.debug.log

View File

@@ -0,0 +1,19 @@
name: Cargo Crates Check Runner
on: [pull_request]
jobs:
cargo-deny-runner:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
- name: Generate Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: bash cargo-deny-generator.sh
working-directory: ./.github/cargo-deny-composite-action/
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Run Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: ./.github/cargo-deny-composite-action

View File

@@ -75,8 +75,8 @@ jobs:
#
# - A SoB comment can be any length (as it is unreasonable to penalise
# people with long names/email addresses :)
pattern: '^.+(\n([a-zA-Z].{0,72}|[^a-zA-Z\n].*|[^\s\n]*|Signed-off-by:.*|))+$'
error: 'Body line too long (max 72)'
pattern: '^.+(\n([a-zA-Z].{0,150}|[^a-zA-Z\n].*|[^\s\n]*|Signed-off-by:.*|))+$'
error: 'Body line too long (max 150)'
post_error: ${{ env.error_msg }}
- name: Check Fixes

View File

@@ -10,35 +10,32 @@ jobs:
go-version: [1.17.x]
os: [ubuntu-20.04]
runs-on: ${{ matrix.os }}
# don't run this action on forks
if: github.repository_owner == 'kata-containers'
env:
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: github.repository_owner == 'kata-containers'
uses: actions/setup-go@v2
with:
go-version: ${{ matrix.go-version }}
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Set env
if: github.repository_owner == 'kata-containers'
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: github.repository_owner == 'kata-containers'
uses: actions/checkout@v2
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup
if: github.repository_owner == 'kata-containers'
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
# docs url alive check
- name: Docs URL Alive Check
if: github.repository_owner == 'kata-containers'
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make docs-url-alive-check

View File

@@ -1,8 +1,8 @@
name: Publish Kata 2.x release artifacts
name: Publish Kata release artifacts
on:
push:
tags:
- '2.*'
- '[0-9]+.[0-9]+.[0-9]+*'
jobs:
build-asset:

View File

@@ -1,8 +1,9 @@
name: Release Kata 2.x in snapcraft store
name: Release Kata in snapcraft store
on:
push:
tags:
- '2.*'
- '[0-9]+.[0-9]+.[0-9]+*'
jobs:
release-snap:
runs-on: ubuntu-20.04

View File

@@ -6,8 +6,10 @@
# List of available components
COMPONENTS =
COMPONENTS += libs
COMPONENTS += agent
COMPONENTS += runtime
COMPONENTS += runtime-rs
# List of available tools
TOOLS =
@@ -21,11 +23,6 @@ STANDARD_TARGETS = build check clean install test vendor
default: all
all: logging-crate-tests build
logging-crate-tests:
make -C src/libs/logging
include utils.mk
include ./tools/packaging/kata-deploy/local-build/Makefile
@@ -49,7 +46,6 @@ docs-url-alive-check:
binary-tarball \
default \
install-binary-tarball \
logging-crate-tests \
static-checks \
docs-url-alive-check

View File

@@ -71,6 +71,7 @@ See the [official documentation](docs) including:
- [Developer guide](docs/Developer-Guide.md)
- [Design documents](docs/design)
- [Architecture overview](docs/design/architecture)
- [Architecture 3.0 overview](docs/design/architecture_3.0/)
## Configuration
@@ -116,7 +117,10 @@ The table below lists the core parts of the project:
| Component | Type | Description |
|-|-|-|
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [runtime-rs](src/runtime-rs) | core | The Rust version runtime. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [`dragonball`](src/dragonball) | core | An optional built-in VMM that brings an out-of-the-box Kata Containers experience with optimizations for container workloads |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [tests](https://github.com/kata-containers/tests) | tests | Excludes unit tests which live with the main code. |

View File

@@ -1 +1 @@
2.5.0-rc0
3.0.0-rc1

View File

@@ -23,25 +23,27 @@ arch=${ARCH:-$(uname -m)}
workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
# Currently, specify the libseccomp version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# libseccomp_version=$(get_version "externals.libseccomp.version")
# libseccomp_url=$(get_version "externals.libseccomp.url")
libseccomp_version="2.5.1"
libseccomp_url="https://github.com/seccomp/libseccomp"
libseccomp_version="${LIBSECCOMP_VERSION:-""}"
if [ -z "${libseccomp_version}" ]; then
libseccomp_version=$(get_version "externals.libseccomp.version")
fi
libseccomp_url="${LIBSECCOMP_URL:-""}"
if [ -z "${libseccomp_url}" ]; then
libseccomp_url=$(get_version "externals.libseccomp.url")
fi
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
cflags="-O2"
# Variables for gperf
# Currently, specify the gperf version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# gperf_version=$(get_version "externals.gperf.version")
# gperf_url=$(get_version "externals.gperf.url")
gperf_version="3.1"
gperf_url="https://ftp.gnu.org/gnu/gperf"
gperf_version="${GPERF_VERSION:-""}"
if [ -z "${gperf_version}" ]; then
gperf_version=$(get_version "externals.gperf.version")
fi
gperf_url="${GPERF_URL:-""}"
if [ -z "${gperf_url}" ]; then
gperf_url=$(get_version "externals.gperf.url")
fi
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"

View File

@@ -54,3 +54,13 @@ run_docs_url_alive_check()
git fetch -a
bash "$tests_repo_dir/.ci/static-checks.sh" --docs --all "github.com/kata-containers/kata-containers"
}
run_get_pr_changed_file_details()
{
clone_tests_repo
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
source "$tests_repo_dir/.ci/lib.sh"
get_pr_changed_file_details
}

33
deny.toml Normal file
View File

@@ -0,0 +1,33 @@
targets = [
{ triple = "x86_64-apple-darwin" },
{ triple = "x86_64-unknown-linux-gnu" },
{ triple = "x86_64-unknown-linux-musl" },
]
[advisories]
vulnerability = "deny"
unsound = "deny"
unmaintained = "deny"
ignore = ["RUSTSEC-2020-0071"]
[bans]
multiple-versions = "allow"
deny = [
{ name = "cmake" },
{ name = "openssl-sys" },
]
[licenses]
unlicensed = "deny"
allow-osi-fsf-free = "neither"
copyleft = "allow"
# We want really high confidence when inferring licenses from text
confidence-threshold = 0.93
allow = ["0BSD", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "CC0-1.0", "ISC", "MIT", "MPL-2.0"]
private = { ignore = true}
exceptions = []
[sources]
unknown-registry = "allow"
unknown-git = "allow"

View File

@@ -425,7 +425,7 @@ To build utilizing the same options as Kata, you should make use of the `configu
$ cd $your_qemu_directory
$ $packaging_dir/scripts/configure-hypervisor.sh kata-qemu > kata.cfg
$ eval ./configure "$(cat kata.cfg)"
$ make -j $(nproc)
$ make -j $(nproc --ignore=1)
$ sudo -E make install
```
@@ -522,7 +522,7 @@ bash-4.2# exit
exit
```
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/master/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/main/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
with Kubernetes. For CRI-O, the namespace should set to `default` explicitly. This should not be confused with [Kubernetes namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
For other CRI-runtimes and configurations, you may need to set the namespace utilizing the `runtime-namespace` option.

View File

@@ -60,17 +60,26 @@ This section lists items that might be possible to fix.
## OCI CLI commands
### Docker and Podman support
Currently Kata Containers does not support Docker or Podman.
Currently Kata Containers does not support Podman.
See issue https://github.com/kata-containers/kata-containers/issues/722 for more information.
Docker has supported Kata Containers since 22.06:
```bash
$ sudo docker run --runtime io.containerd.kata.v2
```
Kata Containers works well with containerd; we recommend using
containerd's Docker-style command-line tool [`nerdctl`](https://github.com/containerd/nerdctl).
## Runtime commands
### checkpoint and restore
The runtime does not provide `checkpoint` and `restore` commands. There
are discussions about using VM save and restore to give us a
`[criu](https://github.com/checkpoint-restore/criu)`-like functionality,
[`criu`](https://github.com/checkpoint-restore/criu)-like functionality,
which might provide a solution.
Note that the OCI standard does not specify `checkpoint` and `restore`
@@ -93,6 +102,42 @@ All other configurations are supported and are working properly.
## Networking
### Host network
Host network (`nerdctl/docker run --net=host`or [Kubernetes `HostNetwork`](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#hosts-namespaces)) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
It should be noted that currently passing the `--net=host` option to a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring, and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
### Support for joining an existing VM network
Docker supports the ability for containers to join another container's
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in that namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is set up to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
See more documentation at
[docs.docker.com](https://docs.docker.com/network/links/).
## Resource management
Due to the way VMs differ in their CPU and memory allocation, and sharing

View File

@@ -341,7 +341,7 @@ The main repository has the most comprehensive set of skip abilities. See:
One method is to use the `nix` crate along with some custom macros:
```
```rust
#[cfg(test)]
mod tests {
#[allow(unused_macros)]

View File

@@ -0,0 +1,169 @@
# Kata 3.0 Architecture
## Overview
In cloud-native scenarios, there is increased demand for faster container startup, lower resource consumption, better stability, and stronger security, areas where the present Kata Containers runtime is challenged relative to other runtimes. To address this, we propose a solid, field-tested and secure Rust version of the kata-runtime.
Also, we provide the following designs:
- Turnkey solution with built-in `Dragonball` Sandbox
- Async I/O to reduce resource consumption
- Extensible framework for multiple services, runtimes and hypervisors
- Lifecycle management for sandbox and container associated resources
### Rationale for choosing Rust
We chose Rust because it is designed as a system language with a focus on efficiency.
In contrast to Go, Rust makes a variety of design trade-offs in order to obtain
good execution performance, with innovative techniques that, in contrast to C or
C++, provide reasonable protection against common memory errors (buffer
overflow, invalid pointers, range errors), error checking (ensuring errors are
dealt with), thread safety, ownership of resources, and more.
These benefits were verified in our project when the Kata Containers guest agent
was rewritten in Rust. We notably saw a significant reduction in memory usage
with the Rust-based implementation.
## Design
### Architecture
![architecture](./images/architecture.png)
### Built-in VMM
#### Current Kata 2.x architecture
![not_builtin_vmm](./images/not_built_in_vmm.png)
As shown in the figure, the runtime and the VMM are separate processes: the runtime process forks the VMM process, and the two interact through inter-process RPC. Typically, interaction between processes consumes more resources than interaction within a single process, so this design is relatively inefficient. The cost of resource operation and maintenance must also be considered; for example, when recovering resources after an abnormal exit, the failure of either process must be detected by the other, which must then trigger the appropriate recovery procedure. Additional processes make such recovery even more difficult.
#### How To Support Built-in VMM
We provide the `Dragonball` Sandbox to enable a built-in VMM by integrating the VMM's functionality into a Rust library, so VMM-related functionality can be invoked through library calls. Because the runtime and the VMM run in the same process, message processing is faster and API synchronization is simpler. This also guarantees that the runtime and VMM life cycles stay consistent, reducing the maintenance burden of resource recovery and exception handling, as shown in the figure:
![builtin_vmm](./images/built_in_vmm.png)
### Async Support
#### Why Need Async
**Async is already in stable Rust and allows us to write async code**
- Async provides significantly reduced CPU and memory overhead, especially for workloads with a large amount of IO-bound tasks
- Async is zero-cost in Rust, which means that you only pay for what you use. Specifically, you can use async without heap allocations and dynamic dispatch, which greatly improves efficiency
- For more details, see [Why Async?](https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html) and [The State of Asynchronous Rust](https://rust-lang.github.io/async-book/01_getting_started/03_state_of_async_rust.html).
**There may be several problems if implementing kata-runtime with Sync Rust**
- Too many threads with a new TTRPC connection
- TTRPC threads: reaper thread(1) + listener thread(1) + client handler(2)
- Add 3 I/O threads with a new container
- In Sync mode, implementing a timeout mechanism is challenging. For example, in TTRPC API interaction, the timeout mechanism is difficult to align with Golang
#### How To Support Async
The number of OS worker threads in the kata-runtime is controlled by `TOKIO_RUNTIME_WORKER_THREADS`, which defaults to 2. TTRPC and container-related work runs uniformly as `tokio` tasks on these threads, and the related dependencies (Timer, File, Netlink, etc.) need to be switched to async equivalents. With async, we can easily support non-blocking I/O and timers. Currently, we only use async in the kata-runtime; the built-in VMM keeps its own OS threads because that keeps those threads controllable.
**For N tokio worker threads and M containers**
- Sync runtime (every `tokio` task would be backed by its own OS thread, with no shared `tokio` worker threads): 4 + 12*M OS threads
- Async runtime (only the `tokio` worker pool uses OS threads): 2 + N OS threads
```shell
├─ main(OS thread)
├─ async-logger(OS thread)
└─ tokio worker(N * OS thread)
├─ agent log forwarder(1 * tokio task)
├─ health check thread(1 * tokio task)
├─ TTRPC reaper thread(M * tokio task)
├─ TTRPC listener thread(M * tokio task)
├─ TTRPC client handler thread(7 * M * tokio task)
    ├─ container stdin io thread(M * tokio task)
    ├─ container stdout io thread(M * tokio task)
    └─ container stderr io thread(M * tokio task)
```
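The two formulas can be sanity-checked against the task counts in the tree above: each container contributes 1 reaper + 1 listener + 7 client handlers + 3 container I/O tasks = 12, hence the 12*M term, while the async runtime only pays for main, the async logger, and the N `tokio` workers.

```rust
// Sync runtime: one OS thread per task, so 4 fixed threads plus
// 12 per container (1 reaper + 1 listener + 7 client handlers + 3 I/O).
fn sync_os_threads(containers: u32) -> u32 {
    4 + 12 * containers
}

// Async runtime: main + async-logger plus the fixed tokio worker pool.
fn async_os_threads(tokio_workers: u32) -> u32 {
    2 + tokio_workers
}
```

With the default of 2 workers and 10 containers, the sync design would need 124 OS threads while the async design needs 4, which is the motivation for the switch.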
### Extensible Framework
The Kata 3.x runtime is designed to be extensible across services, runtimes, and hypervisors, combined with configuration to meet the needs of different scenarios. At present, the service layer provides a registration mechanism to support multiple services. Services interact with the runtime through messages, and the runtime handler processes messages coming from services. To let a single binary support multiple runtimes and hypervisors, the runtime handler type and hypervisor type are obtained from configuration at startup.
![framework](./images/framework.png)
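The registration mechanism described above can be sketched like this. It is illustrative only; the trait and registry names are assumptions, not the actual Kata 3.x API.

```rust
use std::collections::HashMap;

// A runtime handler implements sandbox/container operations; here it
// only reports its name to keep the sketch small.
trait RuntimeHandler {
    fn name(&self) -> &'static str;
}

struct VirtContainer;

impl RuntimeHandler for VirtContainer {
    fn name(&self) -> &'static str {
        "virt_container"
    }
}

fn new_virt_container() -> Box<dyn RuntimeHandler> {
    Box::new(VirtContainer)
}

// Handlers register a constructor under a name; the name to instantiate
// is read from configuration at startup.
struct Registry {
    handlers: HashMap<&'static str, fn() -> Box<dyn RuntimeHandler>>,
}

impl Registry {
    fn new() -> Self {
        Self { handlers: HashMap::new() }
    }

    fn register(&mut self, name: &'static str, ctor: fn() -> Box<dyn RuntimeHandler>) {
        self.handlers.insert(name, ctor);
    }

    fn create(&self, name: &str) -> Option<Box<dyn RuntimeHandler>> {
        self.handlers.get(name).map(|ctor| ctor())
    }
}
```

A startup path would register all compiled-in handlers, read the handler name from the configuration file, and call `create` with it.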
### Resource Manager
In our case, there is a variety of resources, and every resource has several subtypes. Especially for `Virt-Container`, every subtype of resource has different operations, and there may be dependencies; for example, the share-fs rootfs and the share-fs volume both use share-fs resources to share files with the VM. Currently, network and share-fs are regarded as sandbox resources, while rootfs, volume, and cgroup are regarded as container resources. We also abstract a common interface for each resource and use subtype-specific operations to handle the differences between subtypes.
![resource manager](./images/resourceManager.png)
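The common-interface idea, including dependency-aware cleanup, can be sketched like this. All names below are illustrative assumptions, not the real Kata 3.x types.

```rust
// Every resource subtype implements the same interface; the manager
// only sees the trait. The `log` parameter stands in for real teardown
// side effects so the behaviour is observable.
trait Resource {
    fn cleanup(&mut self, log: &mut Vec<String>);
}

struct ShareFs;
impl Resource for ShareFs {
    fn cleanup(&mut self, log: &mut Vec<String>) {
        log.push("share-fs".to_string());
    }
}

struct Rootfs;
impl Resource for Rootfs {
    fn cleanup(&mut self, log: &mut Vec<String>) {
        log.push("rootfs".to_string());
    }
}

struct ResourceManager {
    resources: Vec<Box<dyn Resource>>,
}

impl ResourceManager {
    fn cleanup_all(&mut self, log: &mut Vec<String>) {
        // Tear down in reverse creation order so dependents (e.g. a
        // share-fs rootfs) go away before the share-fs resource they use.
        for r in self.resources.iter_mut().rev() {
            r.cleanup(log);
        }
    }
}
```

Reverse-order teardown is one simple way to respect the dependencies mentioned above without modelling them explicitly.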
## Roadmap
- Stage 1 (June): provide basic features (currently delivered)
- Stage 2 (September): support common features
- Stage 3: support full features
| **Class** | **Sub-Class** | **Development Stage** | **Status** |
| -------------------------- | ------------------- | --------------------- |------------|
| Service | task service | Stage 1 | ✅ |
| | extend service | Stage 3 | 🚫 |
| | image service | Stage 3 | 🚫 |
| Runtime handler | `Virt-Container` | Stage 1 | ✅ |
| Endpoint | VETH Endpoint | Stage 1 | ✅ |
| | Physical Endpoint | Stage 2 | ✅ |
| | Tap Endpoint | Stage 2 | ✅ |
| | `Tuntap` Endpoint | Stage 2 | ✅ |
| | `IPVlan` Endpoint | Stage 2 | ✅ |
| | `MacVlan` Endpoint | Stage 2 | ✅ |
| | MACVTAP Endpoint | Stage 3 | 🚫 |
| | `VhostUserEndpoint` | Stage 3 | 🚫 |
| Network Interworking Model | Tc filter | Stage 1 | ✅ |
| | `MacVtap` | Stage 3 | 🚧 |
| Storage | Virtio-fs | Stage 1 | ✅ |
| | `nydus` | Stage 2 | 🚧 |
| | `device mapper` | Stage 2 | 🚫 |
| `Cgroup V2` | | Stage 2 | 🚧 |
| Hypervisor | `Dragonball` | Stage 1 | 🚧 |
| | QEMU | Stage 2 | 🚫 |
| | ACRN | Stage 3 | 🚫 |
| | Cloud Hypervisor | Stage 3 | 🚫 |
| | Firecracker | Stage 3 | 🚫 |
## FAQ
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?
Yes. They are components in Kata 3.x runtime. And they will be packed into one binary.
1. The service is an interface responsible for handling multiple services, such as the task service, image service, etc.
2. The message dispatcher matches and routes requests coming from the service module.
3. The runtime handler deals with operations on sandboxes and containers.
- What is the name of the Kata 3.x runtime binary?
Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcome.
Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
Yes. It is designed to follow the functionality of the Go version of Kata, and it implements the `containerd shim v2` interface/protocol.
- How will users migrate to the Kata 3.x architecture?
The migration plan will be provided before Kata 3.x is merged into the main branch.
- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?
`Dragonball` can work as an external hypervisor, but stability and performance are harder to guarantee in that case. A built-in VMM can optimize container overhead, and it is easier to keep stable.
`runD` is the `containerd-shim-v2` counterpart of `runC` and can run a pod/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of `microvm`/VMM, we sometimes refer to it as secure sandbox.
- QEMU, Cloud Hypervisor and Firecracker support are planned, but how would that work? Would they run in separate processes?
Yes. They cannot work as built-in VMMs.
- What is `upcall`?
The `upcall` is used to hotplug CPU/memory/MMIO devices, and it solves two issues.
1. avoid dependency on PCI/ACPI
2. avoid dependency on `udevd` within the guest and get deterministic results for hotplug operations. So `upcall` is an alternative to ACPI-based CPU/memory/device hotplug, and we may cooperate with the community to add support for ACPI-based CPU/memory/device hotplug if needed.
`Dbs-upcall` is a `vsock`-based direct communication tool between the VMM and the guest. The server side of the `upcall` is a driver in the guest kernel (kernel patches are needed for this feature), and it starts serving requests once the kernel has booted. The client side is in the VMM, as a thread that communicates over VSOCK through a Unix domain socket. We implemented device hotplug / hot-unplug directly through `upcall` in order to avoid virtualizing ACPI and so minimize the virtual machine's overhead. Many other uses are possible through this direct communication channel. It is already open source:
https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
- The URL below says the kernel patches work with 4.19, but do they also work with 5.15+ ?
Forward compatibility should be achievable; we have already ported the patches to a 5.10-based kernel.
- Are these patches platform-specific or would they work for any architecture that supports VSOCK?
They are almost platform-independent, but some messages related to CPU hotplug are platform-dependent.
- Could the kernel driver be replaced with a userland daemon in the guest using loopback VSOCK?
We need to create device nodes for hot-added CPU/memory/devices, so it's not easy for a userspace daemon to do these tasks.
- The fact that `upcall` allows communication between the VMM and the guest suggests that this architecture might be incompatible with https://github.com/confidential-containers where the VMM should have no knowledge of what happens inside the VM.
1. `TDX` doesn't support CPU/memory hotplug yet.
2. ACPI-based device hotplug depends on the ACPI `DSDT` table, and the guest kernel executes `ASL` code to handle those hotplug events. It should be easier to audit VSOCK-based communication than ACPI `ASL` methods.
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in the next stage.

5 binary image files added (not shown); sizes: 95 KiB, 66 KiB, 136 KiB, 72 KiB, 139 KiB.

File diff suppressed because one or more lines are too long

View File

@@ -12,7 +12,7 @@ The OCI [runtime specification][linux-config] provides guidance on where the con
> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container
Cgroups are hierarchical, and this can be seen with the following pod example:
The cgroups are hierarchical, and this can be seen with the following pod example:
- Pod 1: `cgroupsPath=/kubepods/pod1`
- Container 1: `cgroupsPath=/kubepods/pod1/container1`
@@ -247,14 +247,14 @@ cgroup size and constraints accordingly.
# Supported cgroups
Kata Containers currently only supports cgroups `v1`.
Kata Containers currently supports cgroups `v1` and `v2`.
In the following sections each cgroup is described briefly.
## Cgroups V1
## cgroups v1
`Cgroups V1` are under a [`tmpfs`][1] filesystem mounted at `/sys/fs/cgroup`, where each cgroup is
mounted under a separate cgroup filesystem. A `Cgroups v1` hierarchy may look like the following
`cgroups v1` are under a [`tmpfs`][1] filesystem mounted at `/sys/fs/cgroup`, where each cgroup is
mounted under a separate cgroup filesystem. A `cgroups v1` hierarchy may look like the following
diagram:
```
@@ -301,13 +301,12 @@ diagram:
A process can join a cgroup by writing its process id (`pid`) to `cgroup.procs` file,
or join a cgroup partially by writing the task (thread) id (`tid`) to the `tasks` file.
Kata Containers only supports `v1`.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].
## Cgroups V2
## cgroups v2
`Cgroups v2` are also known as unified cgroups, unlike `cgroups v1`, the cgroups are
mounted under the same cgroup filesystem. A `Cgroups v2` hierarchy may look like the following
`cgroups v2` are also known as unified cgroups, unlike `cgroups v1`, the cgroups are
mounted under the same cgroup filesystem. A `cgroups v2` hierarchy may look like the following
diagram:
```
@@ -354,8 +353,6 @@ Same as `cgroups v1`, a process can join the cgroup by writing its process id (`
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to
`cgroup.threads` file.
Kata Containers does not support cgroups `v2` on the host.
### Distro Support
Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.
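The hierarchy described in this hunk is visible to any process through `/proc/self/cgroup`. A small sketch of inspecting that format (the sample content is illustrative, reusing the `/kubepods/pod1` paths from the pod example above; on a real host you would read the file directly):

```shell
# Simulated /proc/self/cgroup content. Each cgroups v1 line is
# "hierarchy-ID:controller:cgroup-path"; cgroups v2 instead shows a single
# unified "0::<path>" entry.
sample='4:cpu:/kubepods/pod1/container1
2:memory:/kubepods/pod1/container1'

# Print "<controller> <path>" for each mounted v1 hierarchy.
echo "$sample" | awk -F: '{ print $2, $3 }'
```

On a live system, `cat /proc/self/cgroup` in place of the sample shows which cgroup the current shell belongs to for each controller.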

View File

@@ -5,7 +5,7 @@
- [Run Kata containers with `crictl`](run-kata-with-crictl.md)
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and CRI (containerd) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)

View File

@@ -40,7 +40,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
@@ -132,9 +132,9 @@ The `RuntimeClass` is suggested.
The following configuration includes two runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/master/runtime/v2#binary-naming))
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/runtime/v2#binary-naming))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)).
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2)).
```toml
[plugins.cri.containerd]

View File

@@ -45,6 +45,9 @@ spec:
- name: containerdsocket
mountPath: /run/containerd/containerd.sock
readOnly: true
- name: sbs
mountPath: /run/vc/sbs/
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: containerdtask
@@ -53,3 +56,6 @@ spec:
- name: containerdsocket
hostPath:
path: /run/containerd/containerd.sock
- name: sbs
hostPath:
path: /run/vc/sbs/

View File

@@ -19,7 +19,7 @@ Also you should ensure that `kubectl` working correctly.
> **Note**: More information about Kubernetes integrations:
> - [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
> - [How to use Kata Containers and Containerd](containerd-kata.md)
> - [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
> - [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
## Configure Prometheus

View File

@@ -1,15 +1,15 @@
# How to use Kata Containers and CRI (containerd plugin) with Kubernetes
# How to use Kata Containers and containerd with Kubernetes
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
The Kubernetes cluster will use the
[CRI containerd](https://github.com/containerd/containerd/) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
[containerd](https://github.com/containerd/containerd/) and
[Kata Containers](https://katacontainers.io) to launch workloads.
## Requirements
- Kubernetes, Kubelet, `kubeadm`
- containerd with `cri` plug-in
- containerd
- Kata Containers
> **Note:** For information about the supported versions of these components,
@@ -149,7 +149,7 @@ $ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
## Create runtime class for Kata Containers
By default, all pods are created with the default runtime configured in CRI containerd plugin.
By default, all pods are created with the default runtime configured in containerd.
From Kubernetes v1.12, users can use [`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/#runtime-class) to specify a different runtime for Pods.
```bash
@@ -166,7 +166,7 @@ $ sudo -E kubectl apply -f runtime.yaml
## Run pod in Kata Containers
If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod with the
If a pod has the `runtimeClassName` set to `kata`, the CRI runs the pod with the
[Kata Containers runtime](../../src/runtime/README.md).
- Create a pod configuration that uses the Kata Containers runtime
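The `runtime.yaml` applied earlier is not shown in this hunk; a minimal sketch of such a `RuntimeClass` manifest (an assumption, not the file from the commit — the `handler` value must match the containerd runtime name, i.e. `io.containerd.kata.v2` maps to handler `kata`):

```yaml
# Hypothetical runtime.yaml sketch; names follow the runtime classes
# configured for containerd earlier in this document.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
```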

View File

@@ -40,7 +40,7 @@ See below example config:
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
```
- [Kata Containers with Containerd and CRI documentation](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
- [Containerd CRI config documentation](https://github.com/containerd/containerd/blob/main/docs/cri/config.md)
#### CRI-O

View File

@@ -15,7 +15,7 @@ After choosing one CRI implementation, you must make the appropriate configurati
to ensure it integrates with Kata Containers.
Kata Containers 1.5 introduced the `shimv2` for containerd 1.2.0, reducing the components
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](../how-to/how-to-use-k8s-with-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
An equivalent shim implementation for CRI-O is planned.
@@ -57,7 +57,7 @@ content shown below:
To customize containerd to select Kata Containers runtime, follow our
"Configure containerd to use Kata Containers" internal documentation
[here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers).
[here](../how-to/how-to-use-k8s-with-containerd-and-kata.md#configure-containerd-to-use-kata-containers).
## Install Kubernetes
@@ -85,7 +85,7 @@ Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-tim
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```
For more information about containerd see the "Configure Kubelet to use containerd"
documentation [here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-kubelet-to-use-containerd).
documentation [here](../how-to/how-to-use-k8s-with-containerd-and-kata.md#configure-kubelet-to-use-containerd).
## Run a Kubernetes pod with Kata Containers

View File

@@ -33,6 +33,7 @@ are available, their default values and how each setting can be used.
[Cloud Hypervisor] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-clh.toml` |
[Firecracker] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-fc.toml` |
[QEMU] | C | all | Type 2 ([KVM]) | `configuration-qemu.toml` |
[`Dragonball`] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-dragonball.toml` |
## Determine currently configured hypervisor
@@ -52,6 +53,7 @@ the hypervisors:
[Cloud Hypervisor] | Low latency, small memory footprint, small attack surface | Minimal | | excellent | excellent | High performance modern cloud workloads | |
[Firecracker] | Very slimline | Extremely minimal | Doesn't support all device types | excellent | excellent | Serverless / FaaS | |
[QEMU] | Lots of features | Lots | | good | good | Good option for most users | | All users |
[`Dragonball`] | Built-in VMM, low CPU and memory overhead| Minimal | | excellent | excellent | Optimized for most container workloads | `out-of-the-box` Kata Containers experience |
For further details, see the [Virtualization in Kata Containers](design/virtualization.md) document and the official documentation for each hypervisor.
@@ -60,3 +62,4 @@ For further details, see the [Virtualization in Kata Containers](design/virtuali
[Firecracker]: https://github.com/firecracker-microvm/firecracker
[KVM]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
[QEMU]: http://www.qemu-project.org
[`Dragonball`]: https://github.com/openanolis/dragonball-sandbox

View File

@@ -79,3 +79,6 @@ versions. This is not recommended for normal users.
* [upgrading document](../Upgrading.md)
* [developer guide](../Developer-Guide.md)
* [runtime documentation](../../src/runtime/README.md)
## Kata Containers 3.0 rust runtime installation
* [installation guide](../install/kata-containers-3.0-rust-runtime-installation-guide.md)

View File

@@ -19,12 +19,6 @@
> - If you decide to proceed and install a Kata Containers release, you can
> still check for the latest version of Kata Containers by running
> `kata-runtime check --only-list-releases`.
>
> - These instructions will not work for Fedora 31 and higher since those
> distribution versions only support cgroups version 2 by default. However,
> Kata Containers currently requires cgroups version 1 (on the host side). See
> https://github.com/kata-containers/kata-containers/issues/927 for further
> details.
## Install Kata Containers

View File

@@ -0,0 +1,101 @@
# Kata Containers 3.0 rust runtime installation
The following is an overview of the different installation methods available.
## Prerequisites
Kata Containers 3.0 rust runtime requires nested virtualization or bare metal. Check
[hardware requirements](/src/runtime/README.md#hardware-requirements) to see if your system is capable of running Kata
Containers.
### Platform support
Kata Containers 3.0 rust runtime currently runs on 64-bit systems supporting the following
architectures:
> **Notes:**
> For other architectures, see https://github.com/kata-containers/kata-containers/issues/4320
| Architecture | Virtualization technology |
|-|-|
| `x86_64`| [Intel](https://www.intel.com) VT-x |
| `aarch64` ("`arm64`")| [ARM](https://www.arm.com) Hyp |
## Packaged installation methods
| Installation method | Description | Automatic updates | Use case | Availability
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|----------- |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | The best way to try Kata Containers on an already up-and-running Kubernetes cluster. | No |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. | No |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. | No |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. | No |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
### Kata Deploy Installation
`ToDo`
### Official packages
`ToDo`
### Snap Installation
`ToDo`
### Automatic Installation
`ToDo`
### Manual Installation
`ToDo`
## Build from source installation
### Rust Environment Set Up
* Download `Rustup` and install `Rust`
> **Notes:**
> Rust version 1.58 is needed
Example for `x86_64`
```
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
$ rustup install 1.58
$ rustup default 1.58-x86_64-unknown-linux-gnu
```
* Musl support for fully static binary
Example for `x86_64`
```
$ rustup target add x86_64-unknown-linux-musl
```
* [Musl `libc`](http://musl.libc.org/) install
Example for musl 1.2.3
```
$ curl -O https://git.musl-libc.org/cgit/musl/snapshot/musl-1.2.3.tar.gz
$ tar vxf musl-1.2.3.tar.gz
$ cd musl-1.2.3/
$ ./configure --prefix=/usr/local/
$ make && sudo make install
```
### Install Kata 3.0 Rust Runtime Shim
```
$ git clone https://github.com/kata-containers/kata-containers.git
$ cd kata-containers/src/runtime-rs
$ make && sudo make install
```
After running the commands above, the default config file `configuration.toml` will be installed under `/usr/share/defaults/kata-containers/`, and the binary file `containerd-shim-kata-v2` will be installed under `/usr/local/bin`.
### Build Kata Containers Kernel
Follow the [Kernel installation guide](/tools/packaging/kernel/README.md).
### Build Kata Rootfs
Follow the [Rootfs installation guide](../../tools/osbuilder/rootfs-builder/README.md).
### Build Kata Image
Follow the [Image installation guide](../../tools/osbuilder/image-builder/README.md).
### Install Containerd
Follow the [Containerd installation guide](container-manager/containerd/containerd-install.md).

View File

@@ -55,11 +55,11 @@ Here are the features to set up a CRI-O based Minikube, and why you need them:
| what | why |
| ---- | --- |
| `--bootstrapper=kubeadm` | As recommended for [minikube CRI-o](https://kubernetes.io/docs/setup/minikube/#cri-o) |
| `--bootstrapper=kubeadm` | As recommended for [minikube CRI-O](https://minikube.sigs.k8s.io/docs/handbook/config/#runtime-configuration) |
| `--container-runtime=cri-o` | Using CRI-O for Kata |
| `--enable-default-cni` | As recommended for [minikube CRI-o](https://kubernetes.io/docs/setup/minikube/#cri-o) |
| `--enable-default-cni` | As recommended for [minikube CRI-O](https://minikube.sigs.k8s.io/docs/handbook/config/#runtime-configuration) |
| `--memory 6144` | Allocate sufficient memory, as Kata Containers default to 1 or 2Gb |
| `--network-plugin=cni` | As recommended for [minikube CRI-o](https://kubernetes.io/docs/setup/minikube/#cri-o) |
| `--network-plugin=cni` | As recommended for [minikube CRI-O](https://minikube.sigs.k8s.io/docs/handbook/config/#runtime-configuration) |
| `--vm-driver kvm2` | The host VM driver |
To use containerd, modify the `--container-runtime` argument:

View File

@@ -279,8 +279,8 @@ $ export KERNEL_EXTRAVERSION=$(awk '/^EXTRAVERSION =/{print $NF}' $GOPATH/$LINUX
$ export KERNEL_ROOTFS_DIR=${KERNEL_MAJOR_VERSION}.${KERNEL_PATHLEVEL}.${KERNEL_SUBLEVEL}${KERNEL_EXTRAVERSION}
$ cd $QAT_SRC
$ KERNEL_SOURCE_ROOT=$GOPATH/$LINUX_VER ./configure --enable-icp-sriov=guest
$ sudo -E make all -j$(nproc)
$ sudo -E make INSTALL_MOD_PATH=$ROOTFS_DIR qat-driver-install -j$(nproc)
$ sudo -E make all -j $($(nproc ${CI:+--ignore 1}))
$ sudo -E make INSTALL_MOD_PATH=$ROOTFS_DIR qat-driver-install -j $($(nproc ${CI:+--ignore 1}))
```
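The new `-j` flags in this hunk lean on the `${VAR:+word}` form of shell parameter expansion: `${CI:+--ignore 1}` expands to `--ignore 1` only when `CI` is set and non-empty, so CI builds leave one CPU free while local builds use every core. A quick demonstration:

```shell
# ${CI:+--ignore 1} expands to "--ignore 1" only when CI is set and non-empty.
unset CI
echo "local flags: '${CI:+--ignore 1}'"   # prints: local flags: ''
CI=true
echo "ci flags: '${CI:+--ignore 1}'"      # prints: ci flags: '--ignore 1'
```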
The `usdm_drv` module also needs to be copied into the rootfs modules path and

View File

@@ -18,16 +18,13 @@ CONFIG_X86_SGX_KVM=y
* Kubernetes cluster configured with:
* [`kata-deploy`](../../tools/packaging/kata-deploy) based Kata Containers installation
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images)
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images) and associated components including [operator](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/operator/README.md) and dependencies
> Note: Kata Containers supports creating VM sandboxes with Intel® SGX enabled
> using [cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) and [QEMU](https://www.qemu.org/) VMMs only.
### Kata Containers Configuration
Before running a Kata Container make sure that your version of `crio` or `containerd`
supports annotations.
For `containerd` check in `/etc/containerd/config.toml` that the list of `pod_annotations` passed
to the `sandbox` are: `["io.katacontainers.*", "sgx.intel.com/epc"]`.
@@ -99,4 +96,4 @@ because socket passthrough is not supported. An alternative is to deploy the `ae
container.
* Projects like [Gramine Shielded Containers (GSC)](https://gramine-gsc.readthedocs.io/en/latest/) are
also known to work. For GSC specifically, the Kata guest kernel needs to have the `CONFIG_NUMA=y`
enabled and at least one CPU online when running the GSC container.
enabled and at least one CPU online when running the GSC container. The Kata Containers guest kernel currently has `CONFIG_NUMA=y` enabled by default.

View File

@@ -193,7 +193,7 @@ parts:
# Setup and build kernel
./build-kernel.sh -v "${kernel_version}" -d setup
cd ${kernel_dir_prefix}*
make -j $(($(nproc)-1)) EXTRAVERSION=".container"
make -j $(nproc ${CI:+--ignore 1}) EXTRAVERSION=".container"
kernel_suffix="${kernel_version}.container"
kata_kernel_dir="${SNAPCRAFT_PART_INSTALL}/usr/share/kata-containers"
@@ -206,7 +206,7 @@ parts:
# Install raw kernel
vmlinux_path="vmlinux"
[ "${arch}" = "s390x" ] && vmlinux_path="arch/s390/boot/compressed/vmlinux"
[ "${arch}" = "s390x" ] && vmlinux_path="arch/s390/boot/vmlinux"
vmlinux_name="vmlinux-${kernel_suffix}"
cp "${vmlinux_path}" "${kata_kernel_dir}/${vmlinux_name}"
ln -sf "${vmlinux_name}" "${kata_kernel_dir}/vmlinux.container"
@@ -282,7 +282,7 @@ parts:
esac
# build and install
make -j $(($(nproc)-1))
make -j $(nproc ${CI:+--ignore 1})
make install DESTDIR="${SNAPCRAFT_PART_INSTALL}"
prime:
- -snap/

256
src/agent/Cargo.lock generated
View File

@@ -17,6 +17,15 @@ dependencies = [
"memchr",
]
[[package]]
name = "android_system_properties"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "819e7219dbd41043ac279b19830f2efc897156490d7fd6ea916720117ee66311"
dependencies = [
"libc",
]
[[package]]
name = "ansi_term"
version = "0.12.1"
@@ -98,6 +107,12 @@ version = "3.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "37ccbd214614c6783386c1af30caf03192f17891059cecc394b4fb119e363de3"
[[package]]
name = "byte-unit"
version = "3.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "415301c9de11005d4b92193c0eb7ac7adc37e5a49e0ac9bed0a42343512744b8"
[[package]]
name = "byteorder"
version = "1.4.3"
@@ -162,26 +177,28 @@ checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "cgroups-rs"
version = "0.2.9"
version = "0.2.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cdae996d9638ba03253ffa1c93345a585974a97abbdeab9176c77922f3efc1e8"
checksum = "cf5525f2cf84d5113ab26bfb6474180eb63224b4b1e4be31ee87be4098f11399"
dependencies = [
"libc",
"log",
"nix 0.23.1",
"nix 0.24.2",
"regex",
]
[[package]]
name = "chrono"
version = "0.4.19"
version = "0.4.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "670ad68c9088c2a963aaa298cb369688cf3f9465ce5e2d4ca10e6e0098a1ce73"
checksum = "bfd4d1b31faaa3a89d7934dbded3111da0d2ef28e3ebccdb4f0179f5929d1ef1"
dependencies = [
"libc",
"iana-time-zone",
"js-sys",
"num-integer",
"num-traits",
"time 0.1.44",
"wasm-bindgen",
"winapi",
]
@@ -224,6 +241,12 @@ dependencies = [
"os_str_bytes",
]
[[package]]
name = "common-path"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2382f75942f4b3be3690fe4f86365e9c853c1587d6ee58212cebf6e2a9ccd101"
[[package]]
name = "core-foundation-sys"
version = "0.8.3"
@@ -322,6 +345,17 @@ dependencies = [
"libc",
]
[[package]]
name = "fail"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ec3245a0ca564e7f3c797d20d833a6870f57a728ac967d5225b3ffdef4465011"
dependencies = [
"lazy_static",
"log",
"rand 0.8.5",
]
[[package]]
name = "fastrand"
version = "1.7.0"
@@ -442,6 +476,17 @@ dependencies = [
"slab",
]
[[package]]
name = "getrandom"
version = "0.1.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8fc3cb4d91f53b50155bdcfd23f6a4c39ae1969c2ae85982b135750cccaf5fce"
dependencies = [
"cfg-if 1.0.0",
"libc",
"wasi 0.9.0+wasi-snapshot-preview1",
]
[[package]]
name = "getrandom"
version = "0.2.7"
@@ -453,6 +498,12 @@ dependencies = [
"wasi 0.11.0+wasi-snapshot-preview1",
]
[[package]]
name = "glob"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
[[package]]
name = "hashbrown"
version = "0.12.1"
@@ -489,6 +540,19 @@ version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f24254aa9a54b5c858eaee2f5bccdb46aaf0e486a595ed5fd8f86ba55232a70"
[[package]]
name = "iana-time-zone"
version = "0.1.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ad2bfd338099682614d3ee3fe0cd72e0b6a41ca6a87f6a74a3bd593c91650501"
dependencies = [
"android_system_properties",
"core-foundation-sys",
"js-sys",
"wasm-bindgen",
"winapi",
]
[[package]]
name = "indexmap"
version = "1.9.1"
@@ -584,13 +648,15 @@ dependencies = [
"clap",
"futures",
"ipnetwork",
"kata-sys-util",
"kata-types",
"lazy_static",
"libc",
"log",
"logging",
"netlink-packet-utils",
"netlink-sys",
"nix 0.23.1",
"nix 0.24.2",
"oci",
"opentelemetry",
"procfs",
@@ -610,6 +676,7 @@ dependencies = [
"slog-stdlog",
"sysinfo",
"tempfile",
"test-utils",
"thiserror",
"tokio",
"tokio-vsock",
@@ -621,6 +688,47 @@ dependencies = [
"vsock-exporter",
]
[[package]]
name = "kata-sys-util"
version = "0.1.0"
dependencies = [
"byteorder",
"cgroups-rs",
"chrono",
"common-path",
"fail",
"kata-types",
"lazy_static",
"libc",
"nix 0.24.2",
"oci",
"once_cell",
"rand 0.7.3",
"serde_json",
"slog",
"slog-scope",
"subprocess",
"thiserror",
]
[[package]]
name = "kata-types"
version = "0.1.0"
dependencies = [
"byte-unit",
"glob",
"lazy_static",
"num_cpus",
"oci",
"regex",
"serde",
"serde_json",
"slog",
"slog-scope",
"thiserror",
"toml",
]
[[package]]
name = "lazy_static"
version = "1.4.0"
@@ -635,21 +743,20 @@ checksum = "349d5a591cd28b49e1d1037471617a32ddcda5731b99419008085f72d5a53836"
[[package]]
name = "libseccomp"
version = "0.1.3"
version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "36ad71a5b66ceef3acfe6a3178b29b4da063f8bcb2c36dab666d52a7a9cfdb86"
checksum = "49bda1fbf25c42ac8942ff7df1eb6172a3bc36299e84be0dba8c888a7db68c80"
dependencies = [
"libc",
"libseccomp-sys",
"nix 0.17.0",
"pkg-config",
]
[[package]]
name = "libseccomp-sys"
version = "0.1.1"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "539912de229a4fc16e507e8df12a394038a524a5b5b6c92045ad344472aac475"
checksum = "9a7cbbd4ad467251987c6e5b47d53b11a5a05add08f2447a9e2d70aef1e0d138"
[[package]]
name = "lock_api"
@@ -797,19 +904,6 @@ dependencies = [
"tokio",
]
[[package]]
name = "nix"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "50e4785f2c3b7589a0d0c1dd60285e1188adac4006e8abd6dd578e1567027363"
dependencies = [
"bitflags",
"cc",
"cfg-if 0.1.10",
"libc",
"void",
]
[[package]]
name = "nix"
version = "0.22.3"
@@ -836,6 +930,18 @@ dependencies = [
"memoffset",
]
[[package]]
name = "nix"
version = "0.24.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "195cdbc1741b8134346d515b3a56a1c94b0912758009cfd53f99ea0f57b065fc"
dependencies = [
"bitflags",
"cfg-if 1.0.0",
"libc",
"memoffset",
]
[[package]]
name = "ntapi"
version = "0.3.7"
@@ -912,7 +1018,7 @@ dependencies = [
"lazy_static",
"percent-encoding",
"pin-project",
"rand",
"rand 0.8.5",
"serde",
"thiserror",
"tokio",
@@ -1176,9 +1282,9 @@ dependencies = [
[[package]]
name = "protobuf"
version = "2.14.0"
version = "2.27.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e86d370532557ae7573551a1ec8235a0f8d6cb276c7c9e6aa490b511c447485"
checksum = "cf7e6d18738ecd0902d30d1ad232c9125985a3422929b16c65517b38adc14f96"
dependencies = [
"serde",
"serde_derive",
@@ -1186,18 +1292,18 @@ dependencies = [
[[package]]
name = "protobuf-codegen"
version = "2.14.0"
version = "2.27.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "de113bba758ccf2c1ef816b127c958001b7831136c9bc3f8e9ec695ac4e82b0c"
checksum = "aec1632b7c8f2e620343439a7dfd1f3c47b18906c4be58982079911482b5d707"
dependencies = [
"protobuf",
]
[[package]]
name = "protobuf-codegen-pure"
version = "2.14.0"
version = "2.27.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2d1a4febc73bf0cada1d77c459a0c8e5973179f1cfd5b0f1ab789d45b17b6440"
checksum = "9f8122fdb18e55190c796b088a16bdb70cd7acdcd48f7a8b796b58c62e532cc6"
dependencies = [
"protobuf",
"protobuf-codegen",
@@ -1208,6 +1314,7 @@ name = "protocols"
version = "0.1.0"
dependencies = [
"async-trait",
"oci",
"protobuf",
"ttrpc",
"ttrpc-codegen",
@@ -1222,6 +1329,19 @@ dependencies = [
"proc-macro2",
]
[[package]]
name = "rand"
version = "0.7.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6a6b1679d49b24bbfe0c803429aa1874472f50d9b363131f0e89fc356b544d03"
dependencies = [
"getrandom 0.1.16",
"libc",
"rand_chacha 0.2.2",
"rand_core 0.5.1",
"rand_hc",
]
[[package]]
name = "rand"
version = "0.8.5"
@@ -1229,8 +1349,18 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "34af8d1a0e25924bc5b7c43c079c942339d8f0a8b57c39049bef581b46327404"
dependencies = [
"libc",
"rand_chacha",
"rand_core",
"rand_chacha 0.3.1",
"rand_core 0.6.3",
]
[[package]]
name = "rand_chacha"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4c8ed856279c9737206bf725bf36935d8666ead7aa69b52be55af369d193402"
dependencies = [
"ppv-lite86",
"rand_core 0.5.1",
]
[[package]]
@@ -1240,7 +1370,16 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88"
dependencies = [
"ppv-lite86",
"rand_core",
"rand_core 0.6.3",
]
[[package]]
name = "rand_core"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "90bde5296fc891b0cef12a6d03ddccc162ce7b2aff54160af9338f8d40df6d19"
dependencies = [
"getrandom 0.1.16",
]
[[package]]
@@ -1249,7 +1388,16 @@ version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d34f1408f55294453790c48b2f1ebbb1c5b4b7563eb1f418bcfcfdbb06ebb4e7"
dependencies = [
"getrandom",
"getrandom 0.2.7",
]
[[package]]
name = "rand_hc"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ca3129af7b92a17112d59ad498c6f81eaf463253766b90396d39ea7a39d6613c"
dependencies = [
"rand_core 0.5.1",
]
[[package]]
@@ -1375,6 +1523,7 @@ dependencies = [
"slog",
"slog-scope",
"tempfile",
"test-utils",
"tokio",
]
@@ -1556,6 +1705,16 @@ version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "73473c0e59e6d5812c5dfe2a064a6444949f089e20eec9a2e5506596494e4623"
[[package]]
name = "subprocess"
version = "0.2.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0c2e86926081dda636c546d8c5e641661049d7562a68f5488be4a1f7f66f6086"
dependencies = [
"libc",
"winapi",
]
[[package]]
name = "syn"
version = "1.0.98"
@@ -1611,6 +1770,13 @@ dependencies = [
"winapi-util",
]
[[package]]
name = "test-utils"
version = "0.1.0"
dependencies = [
"nix 0.24.2",
]
[[package]]
name = "textwrap"
version = "0.15.0"
@@ -1837,9 +2003,9 @@ dependencies = [
[[package]]
name = "ttrpc"
version = "0.5.3"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c46d73bc2a74f2440921b6539afbed68064b48b2c4f194c637430d1c83d052ad"
checksum = "2ecfff459a859c6ba6668ff72b34c2f1d94d9d58f7088414c2674ad0f31cc7d8"
dependencies = [
"async-trait",
"byteorder",
@@ -1905,12 +2071,6 @@ version = "0.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "49874b5167b65d7193b8aba1567f5c7d93d001cafc34600cee003eda787e483f"
[[package]]
name = "void"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d"
[[package]]
name = "vsock"
version = "0.2.6"
@@ -1929,7 +2089,7 @@ dependencies = [
"bincode",
"byteorder",
"libc",
"nix 0.23.1",
"nix 0.24.2",
"opentelemetry",
"serde",
"slog",
@@ -1938,6 +2098,12 @@ dependencies = [
"tokio-vsock",
]
[[package]]
name = "wasi"
version = "0.9.0+wasi-snapshot-preview1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cccddf32554fecc6acb585f82a32a72e28b48f8c4c1883ddfeeeaa96f7d8e519"
[[package]]
name = "wasi"
version = "0.10.0+wasi-snapshot-preview1"

View File

@@ -3,23 +3,26 @@ name = "kata-agent"
version = "0.1.0"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
[dependencies]
oci = { path = "../libs/oci" }
rustjail = { path = "rustjail" }
protocols = { path = "../libs/protocols" }
protocols = { path = "../libs/protocols", features = ["async"] }
lazy_static = "1.3.0"
ttrpc = { version = "0.5.0", features = ["async", "protobuf-codec"], default-features = false }
protobuf = "=2.14.0"
ttrpc = { version = "0.6.0", features = ["async"], default-features = false }
protobuf = "2.27.0"
libc = "0.2.58"
nix = "0.23.0"
nix = "0.24.2"
capctl = "0.2.0"
serde_json = "1.0.39"
scan_fmt = "0.2.3"
scopeguard = "1.0.0"
thiserror = "1.0.26"
regex = "1.5.5"
regex = "1.5.6"
serial_test = "0.5.1"
kata-sys-util = { path = "../libs/kata-sys-util" }
kata-types = { path = "../libs/kata-types" }
sysinfo = "0.23.0"
# Async helpers
@@ -49,7 +52,7 @@ log = "0.4.11"
prometheus = { version = "0.13.0", features = ["process"] }
procfs = "0.12.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.8" }
cgroups = { package = "cgroups-rs", version = "0.2.10" }
# Tracing
tracing = "0.1.26"
@@ -65,6 +68,7 @@ clap = { version = "3.0.1", features = ["derive"] }
[dev-dependencies]
tempfile = "3.1.0"
test-utils = { path = "../libs/test-utils" }
[workspace]
members = [

View File

@@ -107,10 +107,7 @@ endef
##TARGET default: build code
default: $(TARGET) show-header
$(TARGET): $(GENERATED_CODE) logging-crate-tests $(TARGET_PATH)
logging-crate-tests:
make -C $(CWD)/../libs/logging
$(TARGET): $(GENERATED_CODE) $(TARGET_PATH)
$(TARGET_PATH): show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) $(if $(findstring release,$(BUILD_TYPE)),--release) $(EXTRA_RUSTFEATURES)
@@ -203,7 +200,6 @@ codecov-html: check_tarpaulin
.PHONY: \
help \
logging-crate-tests \
optimize \
show-header \
show-summary \


@@ -3,6 +3,7 @@ name = "rustjail"
version = "0.1.0"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
[dependencies]
serde = "1.0.91"
@@ -16,26 +17,27 @@ scopeguard = "1.0.0"
capctl = "0.2.0"
lazy_static = "1.3.0"
libc = "0.2.58"
protobuf = "=2.14.0"
protobuf = "2.27.0"
slog = "2.5.2"
slog-scope = "4.1.2"
scan_fmt = "0.2.6"
regex = "1.5.5"
regex = "1.5.6"
path-absolutize = "1.2.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.8" }
cgroups = { package = "cgroups-rs", version = "0.2.10" }
rlimit = "0.5.3"
cfg-if = "0.1.0"
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros"] }
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros", "rt"] }
futures = "0.3.17"
async-trait = "0.1.31"
inotify = "0.9.2"
libseccomp = { version = "0.1.3", optional = true }
libseccomp = { version = "0.2.3", optional = true }
[dev-dependencies]
serial_test = "0.5.0"
tempfile = "3.1.0"
test-utils = { path = "../../libs/test-utils" }
[features]
seccomp = ["libseccomp"]


@@ -174,7 +174,7 @@ impl CgroupManager for Manager {
freezer_controller.freeze()?;
}
_ => {
return Err(anyhow!(nix::Error::EINVAL));
return Err(anyhow!("Invalid FreezerState"));
}
}
@@ -911,9 +911,8 @@ pub fn get_paths() -> Result<HashMap<String, String>> {
Ok(m)
}
pub fn get_mounts() -> Result<HashMap<String, String>> {
pub fn get_mounts(paths: &HashMap<String, String>) -> Result<HashMap<String, String>> {
let mut m = HashMap::new();
let paths = get_paths()?;
for l in fs::read_to_string(MOUNTS)?.lines() {
let p: Vec<&str> = l.splitn(2, " - ").collect();
@@ -951,7 +950,7 @@ impl Manager {
let mut m = HashMap::new();
let paths = get_paths()?;
let mounts = get_mounts()?;
let mounts = get_mounts(&paths)?;
for key in paths.keys() {
let mnt = mounts.get(key);


@@ -106,6 +106,11 @@ impl Default for ContainerStatus {
}
}
// We might want to change this to thiserror in the future
const MissingCGroupManager: &str = "failed to get container's cgroup Manager";
const MissingLinux: &str = "no linux config";
const InvalidNamespace: &str = "invalid namespace type";
pub type Config = CreateOpts;
type NamespaceType = String;
@@ -292,7 +297,7 @@ impl Container for LinuxContainer {
self.status.transition(ContainerState::Paused);
return Ok(());
}
Err(anyhow!("failed to get container's cgroup manager"))
Err(anyhow!(MissingCGroupManager))
}
fn resume(&mut self) -> Result<()> {
@@ -310,7 +315,7 @@ impl Container for LinuxContainer {
self.status.transition(ContainerState::Running);
return Ok(());
}
Err(anyhow!("failed to get container's cgroup manager"))
Err(anyhow!(MissingCGroupManager))
}
}
@@ -397,7 +402,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
};
if spec.linux.is_none() {
return Err(anyhow!("no linux config"));
return Err(anyhow!(MissingLinux));
}
let linux = spec.linux.as_ref().unwrap();
@@ -411,7 +416,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
for ns in &nses {
let s = NAMESPACES.get(&ns.r#type.as_str());
if s.is_none() {
return Err(anyhow!("invalid ns type"));
return Err(anyhow!(InvalidNamespace));
}
let s = s.unwrap();
@@ -1092,6 +1097,16 @@ impl BaseContainer for LinuxContainer {
fs::remove_dir_all(&self.root)?;
if let Some(cgm) = self.cgroup_manager.as_mut() {
// Kill all of the processes created in this container to prevent
// leaking daemon processes when this container shares a pid namespace
// with the sandbox.
let pids = cgm.get_pids().context("get cgroup pids")?;
for i in pids {
if let Err(e) = signal::kill(Pid::from_raw(i), Signal::SIGKILL) {
warn!(self.logger, "kill the process {} error: {:?}", i, e);
}
}
cgm.destroy().context("destroy cgroups")?;
}
Ok(())
@@ -1171,7 +1186,7 @@ fn do_exec(args: &[String]) -> ! {
unreachable!()
}
fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
pub fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
info!(logger, "updating namespaces");
let linux = spec
.linux
@@ -1427,18 +1442,10 @@ impl LinuxContainer {
Some(unistd::getuid()),
Some(unistd::getgid()),
)
.context(format!("cannot change onwer of container {} root", id))?;
if config.spec.is_none() {
return Err(anyhow!(nix::Error::EINVAL));
}
.context(format!("Cannot change owner of container {} root", id))?;
let spec = config.spec.as_ref().unwrap();
if spec.linux.is_none() {
return Err(anyhow!(nix::Error::EINVAL));
}
let linux = spec.linux.as_ref().unwrap();
let cpath = if linux.cgroups_path.is_empty() {
@@ -1447,7 +1454,12 @@ impl LinuxContainer {
linux.cgroups_path.clone()
};
let cgroup_manager = FsManager::new(cpath.as_str())?;
let cgroup_manager = FsManager::new(cpath.as_str()).map_err(|e| {
anyhow!(format!(
"failed to create cgroup manager with path {}: {:}",
cpath, e
))
})?;
info!(logger, "new cgroup_manager {:?}", &cgroup_manager);
Ok(LinuxContainer {
@@ -1515,7 +1527,7 @@ pub async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()
let binary = PathBuf::from(h.path.as_str());
let path = binary.canonicalize()?;
if !path.exists() {
return Err(anyhow!(nix::Error::EINVAL));
return Err(anyhow!("Path {:?} does not exist", path));
}
let mut args = h.args.clone();
@@ -1646,12 +1658,12 @@ fn valid_env(e: &str) -> Option<(&str, &str)> {
mod tests {
use super::*;
use crate::process::Process;
use crate::skip_if_not_root;
use nix::unistd::Uid;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
use tokio::process::Command;
macro_rules! sl {


@@ -514,15 +514,6 @@ pub fn grpc_to_oci(grpc: &grpc::Spec) -> oci::Spec {
#[cfg(test)]
mod tests {
use super::*;
#[macro_export]
macro_rules! skip_if_not_root {
() => {
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
};
}
// Parameters:
//


@@ -780,18 +780,31 @@ fn mount_from(
Path::new(&dest).parent().unwrap()
};
let _ = fs::create_dir_all(&dir).map_err(|e| {
fs::create_dir_all(&dir).map_err(|e| {
log_child!(
cfd_log,
"create dir {}: {}",
dir.to_str().unwrap(),
e.to_string()
)
});
);
e
})?;
// make sure file exists so we can bind over it
if !src.is_dir() {
let _ = OpenOptions::new().create(true).write(true).open(&dest);
let _ = OpenOptions::new()
.create(true)
.write(true)
.open(&dest)
.map_err(|e| {
log_child!(
cfd_log,
"open/create dest error. {}: {:?}",
dest.as_str(),
e
);
e
})?;
}
src.to_str().unwrap().to_string()
} else {
@@ -804,8 +817,10 @@ fn mount_from(
}
};
let _ = stat::stat(dest.as_str())
.map_err(|e| log_child!(cfd_log, "dest stat error. {}: {:?}", dest.as_str(), e));
let _ = stat::stat(dest.as_str()).map_err(|e| {
log_child!(cfd_log, "dest stat error. {}: {:?}", dest.as_str(), e);
e
})?;
mount(
Some(src.as_str()),
@@ -1005,9 +1020,7 @@ pub fn finish_rootfs(cfd_log: RawFd, spec: &Spec, process: &Process) -> Result<(
}
fn mask_path(path: &str) -> Result<()> {
if !path.starts_with('/') || path.contains("..") {
return Err(anyhow!(nix::Error::EINVAL));
}
check_paths(path)?;
match mount(
Some("/dev/null"),
@@ -1025,9 +1038,7 @@ fn mask_path(path: &str) -> Result<()> {
}
fn readonly_path(path: &str) -> Result<()> {
if !path.starts_with('/') || path.contains("..") {
return Err(anyhow!(nix::Error::EINVAL));
}
check_paths(path)?;
if let Err(e) = mount(
Some(&path[1..]),
@@ -1053,11 +1064,20 @@ fn readonly_path(path: &str) -> Result<()> {
Ok(())
}
fn check_paths(path: &str) -> Result<()> {
if !path.starts_with('/') || path.contains("..") {
return Err(anyhow!(
"Cannot mount {} (path does not start with '/' or contains '..').",
path
));
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use crate::assert_result;
use crate::skip_if_not_root;
use std::fs::create_dir;
use std::fs::create_dir_all;
use std::fs::remove_dir_all;
@@ -1065,6 +1085,7 @@ mod tests {
use std::os::unix::fs;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
#[test]
#[serial(chdir)]
@@ -1405,6 +1426,55 @@ mod tests {
}
}
#[test]
fn test_check_paths() {
#[derive(Debug)]
struct TestData<'a> {
name: &'a str,
path: &'a str,
result: Result<()>,
}
let tests = &[
TestData {
name: "valid path",
path: "/foo/bar",
result: Ok(()),
},
TestData {
name: "does not start with /",
path: "foo/bar",
result: Err(anyhow!(
"Cannot mount foo/bar (path does not start with '/' or contains '..')."
)),
},
TestData {
name: "contains ..",
path: "../foo/bar",
result: Err(anyhow!(
"Cannot mount ../foo/bar (path does not start with '/' or contains '..')."
)),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d.name);
let result = check_paths(d.path);
let msg = format!("{}: result: {:?}", msg, result);
if d.result.is_ok() {
assert!(result.is_ok());
continue;
}
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
assert!(actual_error == expected_error, "{}", msg);
}
}
#[test]
fn test_check_proc_mount() {
let mount = oci::Mount {


@@ -28,7 +28,6 @@ macro_rules! close_process_stream {
($self: ident, $stream:ident, $stream_type: ident) => {
if $self.$stream.is_some() {
$self.close_stream(StreamType::$stream_type);
let _ = unistd::close($self.$stream.unwrap());
$self.$stream = None;
}
};
@@ -225,7 +224,7 @@ impl Process {
Some(writer)
}
pub fn close_stream(&mut self, stream_type: StreamType) {
fn close_stream(&mut self, stream_type: StreamType) {
let _ = self.readers.remove(&stream_type);
let _ = self.writers.remove(&stream_type);
}


@@ -26,12 +26,15 @@ fn get_rule_conditions(args: &[LinuxSeccompArg]) -> Result<Vec<ScmpArgCompare>>
return Err(anyhow!("seccomp operator is required"));
}
let cond = ScmpArgCompare::new(
arg.index,
ScmpCompareOp::from_str(&arg.op)?,
arg.value,
Some(arg.value_two),
);
let mut op = ScmpCompareOp::from_str(&arg.op)?;
let mut value = arg.value;
// For SCMP_CMP_MASKED_EQ, arg.value is the mask and arg.value_two is the value
if op == ScmpCompareOp::MaskedEqual(u64::default()) {
op = ScmpCompareOp::MaskedEqual(arg.value);
value = arg.value_two;
}
let cond = ScmpArgCompare::new(arg.index, op, value);
conditions.push(cond);
}
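The comment above captures a quirk of the OCI seccomp encoding: for `SCMP_CMP_MASKED_EQ`, the spec's `value` field carries the mask and `valueTwo` carries the datum to compare, hence the swap in the rewritten code. The predicate the kernel ultimately evaluates is plain masked equality; a sketch of that arithmetic (names illustrative):

```rust
/// SCMP_CMP_MASKED_EQ asks the kernel to check: (arg & mask) == datum.
fn masked_eq(syscall_arg: u64, mask: u64, datum: u64) -> bool {
    (syscall_arg & mask) == datum
}

fn main() {
    // e.g. "the argument has the 0o2000000 (O_CLOEXEC) bit set"
    const O_CLOEXEC: u64 = 0o2000000;
    println!("{}", masked_eq(O_CLOEXEC | 0o2, O_CLOEXEC, O_CLOEXEC));
}
```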
@@ -44,7 +47,7 @@ pub fn get_unknown_syscalls(scmp: &LinuxSeccomp) -> Option<Vec<String>> {
for syscall in &scmp.syscalls {
for name in &syscall.names {
if get_syscall_from_name(name, None).is_err() {
if ScmpSyscall::from_name(name).is_err() {
unknown_syscalls.push(name.to_string());
}
}
@@ -60,7 +63,7 @@ pub fn get_unknown_syscalls(scmp: &LinuxSeccomp) -> Option<Vec<String>> {
// init_seccomp creates a seccomp filter and loads it for the current process
// including all the child processes.
pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM as u32))?;
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM as i32))?;
// Create a new filter context
let mut filter = ScmpFilterContext::new_filter(def_action)?;
@@ -72,7 +75,7 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
}
// Unset no new privileges bit
filter.set_no_new_privs_bit(false)?;
filter.set_ctl_nnp(false)?;
// Add a rule for each system call
for syscall in &scmp.syscalls {
@@ -80,13 +83,13 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
return Err(anyhow!("syscall name is required"));
}
let action = ScmpAction::from_str(&syscall.action, Some(syscall.errno_ret))?;
let action = ScmpAction::from_str(&syscall.action, Some(syscall.errno_ret as i32))?;
if action == def_action {
continue;
}
for name in &syscall.names {
let syscall_num = match get_syscall_from_name(name, None) {
let syscall_num = match ScmpSyscall::from_name(name) {
Ok(num) => num,
Err(_) => {
// If we cannot resolve the given system call, we assume it is not supported
@@ -96,10 +99,10 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
};
if syscall.args.is_empty() {
filter.add_rule(action, syscall_num, None)?;
filter.add_rule(action, syscall_num)?;
} else {
let conditions = get_rule_conditions(&syscall.args)?;
filter.add_rule(action, syscall_num, Some(&conditions))?;
filter.add_rule_conditional(action, syscall_num, &conditions)?;
}
}
}
@@ -119,10 +122,10 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use libc::{dup3, process_vm_readv, EPERM, O_CLOEXEC};
use std::io::Error;
use std::ptr::null;
use test_utils::skip_if_not_root;
macro_rules! syscall_assert {
($e1: expr, $e2: expr) => {


@@ -4,17 +4,15 @@
//
use crate::container::Config;
use anyhow::{anyhow, Context, Error, Result};
use anyhow::{anyhow, Context, Result};
use oci::{Linux, LinuxIdMapping, LinuxNamespace, Spec};
use std::collections::HashMap;
use std::path::{Component, PathBuf};
fn einval() -> Error {
anyhow!(nix::Error::EINVAL)
}
fn get_linux(oci: &Spec) -> Result<&Linux> {
oci.linux.as_ref().ok_or_else(einval)
oci.linux
.as_ref()
.ok_or_else(|| anyhow!("Unable to get Linux section from Spec"))
}
fn contain_namespace(nses: &[LinuxNamespace], key: &str) -> bool {
@@ -31,7 +29,10 @@ fn rootfs(root: &str) -> Result<()> {
let path = PathBuf::from(root);
// not absolute path or not exists
if !path.exists() || !path.is_absolute() {
return Err(einval());
return Err(anyhow!(
"Path from {:?} does not exist or is not absolute",
root
));
}
// symbolic link? ..?
@@ -49,7 +50,7 @@ fn rootfs(root: &str) -> Result<()> {
if let Some(v) = c.as_os_str().to_str() {
stack.push(v.to_string());
} else {
return Err(einval());
return Err(anyhow!("Invalid path component (unable to convert to str)"));
}
}
@@ -58,10 +59,13 @@ fn rootfs(root: &str) -> Result<()> {
cleaned.push(e);
}
let canon = path.canonicalize().context("canonicalize")?;
let canon = path.canonicalize().context("failed to canonicalize path")?;
if cleaned != canon {
// There is a symbolic link in the path
return Err(einval());
return Err(anyhow!(
"There may be illegal symbols in the path name. Cleaned ({:?}) and canonicalized ({:?}) paths do not match",
cleaned,
canon));
}
Ok(())
@@ -74,7 +78,7 @@ fn hostname(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
if !contain_namespace(&linux.namespaces, "uts") {
return Err(einval());
return Err(anyhow!("Linux namespace does not contain uts"));
}
Ok(())
@@ -88,7 +92,7 @@ fn security(oci: &Spec) -> Result<()> {
}
if !contain_namespace(&linux.namespaces, "mount") {
return Err(einval());
return Err(anyhow!("Linux namespace does not contain mount"));
}
// don't care about selinux at present
@@ -103,7 +107,7 @@ fn idmapping(maps: &[LinuxIdMapping]) -> Result<()> {
}
}
Err(einval())
Err(anyhow!("No idmap has size > 0"))
}
fn usernamespace(oci: &Spec) -> Result<()> {
@@ -121,7 +125,7 @@ fn usernamespace(oci: &Spec) -> Result<()> {
} else {
// no user namespace but idmap
if !linux.uid_mappings.is_empty() || !linux.gid_mappings.is_empty() {
return Err(einval());
return Err(anyhow!("No user namespace, but uid or gid mapping exists"));
}
}
@@ -163,7 +167,7 @@ fn sysctl(oci: &Spec) -> Result<()> {
if contain_namespace(&linux.namespaces, "ipc") {
continue;
} else {
return Err(einval());
return Err(anyhow!("Linux namespace does not contain ipc"));
}
}
@@ -178,11 +182,11 @@ fn sysctl(oci: &Spec) -> Result<()> {
}
if key == "kernel.hostname" {
return Err(einval());
return Err(anyhow!("Kernel hostname specified in Spec"));
}
}
return Err(einval());
return Err(anyhow!("Sysctl config contains invalid settings"));
}
Ok(())
}
@@ -191,12 +195,13 @@ fn rootless_euid_mapping(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
if !contain_namespace(&linux.namespaces, "user") {
return Err(einval());
return Err(anyhow!("Linux namespace is missing user"));
}
if linux.uid_mappings.is_empty() || linux.gid_mappings.is_empty() {
// rootless containers requires at least one UID/GID mapping
return Err(einval());
return Err(anyhow!(
"Rootless containers require at least one UID/GID mapping"
));
}
Ok(())
@@ -220,7 +225,7 @@ fn rootless_euid_mount(oci: &Spec) -> Result<()> {
let fields: Vec<&str> = opt.split('=').collect();
if fields.len() != 2 {
return Err(einval());
return Err(anyhow!("Options has invalid field: {:?}", fields));
}
let id = fields[1]
@@ -229,11 +234,11 @@ fn rootless_euid_mount(oci: &Spec) -> Result<()> {
.context(format!("parse field {}", &fields[1]))?;
if opt.starts_with("uid=") && !has_idmapping(&linux.uid_mappings, id) {
return Err(einval());
return Err(anyhow!("uid of {} does not have a valid mapping", id));
}
if opt.starts_with("gid=") && !has_idmapping(&linux.gid_mappings, id) {
return Err(einval());
return Err(anyhow!("gid of {} does not have a valid mapping", id));
}
}
}
@@ -249,15 +254,18 @@ fn rootless_euid(oci: &Spec) -> Result<()> {
pub fn validate(conf: &Config) -> Result<()> {
lazy_static::initialize(&SYSCTLS);
let oci = conf.spec.as_ref().ok_or_else(einval)?;
let oci = conf
.spec
.as_ref()
.ok_or_else(|| anyhow!("Invalid config spec"))?;
if oci.linux.is_none() {
return Err(einval());
return Err(anyhow!("oci Linux is none"));
}
let root = match oci.root.as_ref() {
Some(v) => v.path.as_str(),
None => return Err(einval()),
None => return Err(anyhow!("oci root is none")),
};
rootfs(root).context("rootfs")?;


@@ -12,6 +12,8 @@ use std::str::FromStr;
use std::time;
use tracing::instrument;
use kata_types::config::default::DEFAULT_AGENT_VSOCK_PORT;
const DEBUG_CONSOLE_FLAG: &str = "agent.debug_console";
const DEV_MODE_FLAG: &str = "agent.devmode";
const TRACE_MODE_OPTION: &str = "agent.trace";
@@ -28,7 +30,6 @@ const DEFAULT_LOG_LEVEL: slog::Level = slog::Level::Info;
const DEFAULT_HOTPLUG_TIMEOUT: time::Duration = time::Duration::from_secs(3);
const DEFAULT_CONTAINER_PIPE_SIZE: i32 = 0;
const VSOCK_ADDR: &str = "vsock://-1";
const VSOCK_PORT: u16 = 1024;
// Environment variables used for development and testing
const SERVER_ADDR_ENV_VAR: &str = "KATA_AGENT_SERVER_ADDR";
@@ -147,7 +148,7 @@ impl Default for AgentConfig {
debug_console_vport: 0,
log_vport: 0,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: format!("{}:{}", VSOCK_ADDR, VSOCK_PORT),
server_addr: format!("{}:{}", VSOCK_ADDR, DEFAULT_AGENT_VSOCK_PORT),
unified_cgroup_hierarchy: false,
tracing: false,
endpoints: Default::default(),
@@ -432,7 +433,7 @@ fn get_container_pipe_size(param: &str) -> Result<i32> {
#[cfg(test)]
mod tests {
use crate::assert_result;
use test_utils::assert_result;
use super::*;
use anyhow::anyhow;


@@ -9,7 +9,7 @@ use anyhow::{anyhow, Result};
use nix::fcntl::{self, FcntlArg, FdFlag, OFlag};
use nix::libc::{STDERR_FILENO, STDIN_FILENO, STDOUT_FILENO};
use nix::pty::{openpty, OpenptyResult};
use nix::sys::socket::{self, AddressFamily, SockAddr, SockFlag, SockType};
use nix::sys::socket::{self, AddressFamily, SockFlag, SockType, VsockAddr};
use nix::sys::stat::Mode;
use nix::sys::wait;
use nix::unistd::{self, close, dup2, fork, setsid, ForkResult, Pid};
@@ -67,7 +67,7 @@ pub async fn debug_console_handler(
SockFlag::SOCK_CLOEXEC,
None,
)?;
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, port);
let addr = VsockAddr::new(libc::VMADDR_CID_ANY, port);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;


@@ -22,7 +22,7 @@ extern crate slog;
use anyhow::{anyhow, Context, Result};
use clap::{AppSettings, Parser};
use nix::fcntl::OFlag;
use nix::sys::socket::{self, AddressFamily, SockAddr, SockFlag, SockType};
use nix::sys::socket::{self, AddressFamily, SockFlag, SockType, VsockAddr};
use nix::unistd::{self, dup, Pid};
use std::env;
use std::ffi::OsStr;
@@ -49,8 +49,6 @@ mod pci;
pub mod random;
mod sandbox;
mod signal;
#[cfg(test)]
mod test_utils;
mod uevent;
mod util;
mod version;
@@ -110,10 +108,6 @@ enum SubCommand {
fn announce(logger: &Logger, config: &AgentConfig) {
info!(logger, "announce";
"agent-commit" => version::VERSION_COMMIT,
// Avoid any possibility of confusion with the old agent
"agent-type" => "rust",
"agent-version" => version::AGENT_VERSION,
"api-version" => version::API_VERSION,
"config" => format!("{:?}", config),
@@ -132,7 +126,7 @@ async fn create_logger_task(rfd: RawFd, vsock_port: u32, shutdown: Receiver<bool
None,
)?;
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, vsock_port);
let addr = VsockAddr::new(libc::VMADDR_CID_ANY, vsock_port);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;
@@ -213,7 +207,7 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
if config.log_level == slog::Level::Trace {
// Redirect ttrpc log calls to slog iff full debug requested
ttrpc_log_guard = Ok(slog_stdlog::init().map_err(|e| e)?);
ttrpc_log_guard = Ok(slog_stdlog::init()?);
}
if config.tracing {
@@ -405,7 +399,8 @@ use std::os::unix::io::{FromRawFd, RawFd};
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::test_utils::TestUserType;
use test_utils::TestUserType;
use test_utils::{assert_result, skip_if_not_root, skip_if_root};
#[tokio::test]
async fn test_create_logger_task() {


@@ -169,11 +169,12 @@ pub fn baremount(
info!(
logger,
"mount source={:?}, dest={:?}, fs_type={:?}, options={:?}",
"baremount source={:?}, dest={:?}, fs_type={:?}, options={:?}, flags={:?}",
source,
destination,
fs_type,
options
options,
flags
);
nix::mount::mount(
@@ -779,6 +780,14 @@ pub async fn add_storages(
};
// Todo need to rollback the mounted storage if err met.
if res.is_err() {
error!(
logger,
"add_storages failed, storage: {:?}, error: {:?} ", storage, res
);
}
let mount_point = res?;
if !mount_point.is_empty() {
@@ -840,7 +849,8 @@ pub fn get_mount_fs_type_from_file(mount_file: &str, mount_point: &str) -> Resul
return Err(anyhow!("Invalid mount point {}", mount_point));
}
let content = fs::read_to_string(mount_file)?;
let content = fs::read_to_string(mount_file)
.map_err(|e| anyhow!("read mount file {}: {}", mount_file, e))?;
let re = Regex::new(format!("device .+ mounted on {} with fstype (.+)", mount_point).as_str())?;
@@ -1016,8 +1026,6 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::test_utils::TestUserType;
use crate::{skip_if_not_root, skip_loop_by_user, skip_loop_if_not_root, skip_loop_if_root};
use protobuf::RepeatedField;
use protocols::agent::FSGroup;
use std::fs::File;
@@ -1025,6 +1033,10 @@ mod tests {
use std::io::Write;
use std::path::PathBuf;
use tempfile::tempdir;
use test_utils::TestUserType;
use test_utils::{
skip_if_not_root, skip_loop_by_user, skip_loop_if_not_root, skip_loop_if_root,
};
#[test]
fn test_mount() {


@@ -187,9 +187,10 @@ impl fmt::Debug for NamespaceType {
#[cfg(test)]
mod tests {
use super::{Namespace, NamespaceType};
use crate::{mount::remove_mounts, skip_if_not_root};
use crate::mount::remove_mounts;
use nix::sched::CloneFlags;
use tempfile::Builder;
use test_utils::skip_if_not_root;
#[tokio::test]
async fn test_setup_persistent_ns() {


@@ -64,7 +64,7 @@ impl Handle {
pub async fn update_interface(&mut self, iface: &Interface) -> Result<()> {
// The reliable way to find link is using hardware address
// as filter. However, hardware filter might not be supported
// by netlink, we may have to dump link list and the find the
// by netlink, we may have to dump link list and then find the
// target link. filter using name or family is supported, but
// we cannot use that to find target link.
// let's try if hardware address filter works. -_-
@@ -178,7 +178,7 @@ impl Handle {
.with_context(|| format!("Failed to parse MAC address: {}", addr))?;
// Hardware filter might not be supported by netlink,
// we may have to dump link list and the find the target link.
// we may have to dump link list and then find the target link.
stream
.try_filter(|f| {
let result = f.nlas.iter().any(|n| match n {
@@ -523,7 +523,7 @@ impl Handle {
.as_ref()
.map(|to| to.address.as_str()) // Extract address field
.and_then(|addr| if addr.is_empty() { None } else { Some(addr) }) // Make sure it's not empty
.ok_or_else(|| anyhow!(nix::Error::EINVAL))?;
.ok_or_else(|| anyhow!("Unable to determine ip address of ARP neighbor"))?;
let ip = IpAddr::from_str(ip_address)
.map_err(|e| anyhow!("Failed to parse IP {}: {:?}", ip_address, e))?;
@@ -612,7 +612,12 @@ fn parse_mac_address(addr: &str) -> Result<[u8; 6]> {
// Parse single Mac address block
let mut parse_next = || -> Result<u8> {
let v = u8::from_str_radix(split.next().ok_or_else(|| anyhow!(nix::Error::EINVAL))?, 16)?;
let v = u8::from_str_radix(
split
.next()
.ok_or_else(|| anyhow!("Invalid MAC address {}", addr))?,
16,
)?;
Ok(v)
};
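The rewrite above only improves the error message; the hex parse itself is unchanged. A self-contained sketch of the full MAC-address parse the closure belongs to, mirroring the `u8::from_str_radix(.., 16)` loop (standalone form, error type simplified to `String`):

```rust
/// Parse "aa:bb:cc:dd:ee:ff" into six bytes, rejecting strings with
/// too few or too many colon-separated blocks.
fn parse_mac(addr: &str) -> Result<[u8; 6], String> {
    let mut out = [0u8; 6];
    let mut split = addr.split(':');
    for byte in out.iter_mut() {
        let block = split
            .next()
            .ok_or_else(|| format!("Invalid MAC address {}", addr))?;
        *byte = u8::from_str_radix(block, 16)
            .map_err(|e| format!("Invalid MAC address {}: {}", addr, e))?;
    }
    // Any trailing block means the address had more than six parts.
    if split.next().is_some() {
        return Err(format!("Invalid MAC address {}", addr));
    }
    Ok(out)
}

fn main() {
    println!("{:?}", parse_mac("00:1a:2b:3c:4d:5e"));
}
```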
@@ -770,10 +775,10 @@ impl Address {
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use rtnetlink::packet;
use std::iter;
use std::process::Command;
use test_utils::skip_if_not_root;
#[tokio::test]
async fn find_link_by_name() {


@@ -76,11 +76,11 @@ fn do_setup_guest_dns(logger: Logger, dns_list: Vec<String>, src: &str, dst: &st
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use nix::mount;
use std::fs::File;
use std::io::Write;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
#[test]
fn test_setup_guest_dns() {


@@ -53,9 +53,9 @@ pub fn reseed_rng(data: &[u8]) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use std::fs::File;
use std::io::prelude::*;
use test_utils::skip_if_not_root;
#[test]
fn test_reseed_rng() {


@@ -34,6 +34,7 @@ use protocols::health::{
HealthCheckResponse, HealthCheckResponse_ServingStatus, VersionCheckResponse,
};
use protocols::types::Interface;
use protocols::{agent_ttrpc_async as agent_ttrpc, health_ttrpc_async as health_ttrpc};
use rustjail::cgroups::notifier;
use rustjail::container::{BaseContainer, Container, LinuxContainer};
use rustjail::process::Process;
@@ -133,30 +134,6 @@ pub struct AgentService {
sandbox: Arc<Mutex<Sandbox>>,
}
// A container ID must match this regex:
//
// ^[a-zA-Z0-9][a-zA-Z0-9_.-]+$
//
fn verify_cid(id: &str) -> Result<()> {
let mut chars = id.chars();
let valid = match chars.next() {
Some(first)
if first.is_alphanumeric()
&& id.len() > 1
&& chars.all(|c| c.is_alphanumeric() || ['.', '-', '_'].contains(&c)) =>
{
true
}
_ => false,
};
match valid {
true => Ok(()),
false => Err(anyhow!("invalid container ID: {:?}", id)),
}
}
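The removed `verify_cid` helper (its job is now delegated to `kata_sys_util::validate::verify_id`) checked the `^[a-zA-Z0-9][a-zA-Z0-9_.-]+$` rule above without a regex engine. A standalone sketch of the same check, kept here for reference:

```rust
/// Returns true iff `id` matches ^[a-zA-Z0-9][a-zA-Z0-9_.-]+$:
/// first character alphanumeric, at least two characters total,
/// remaining characters alphanumeric or one of '.', '-', '_'.
fn verify_cid(id: &str) -> bool {
    let mut chars = id.chars();
    match chars.next() {
        Some(first) => {
            first.is_alphanumeric()
                && id.len() > 1
                && chars.all(|c| c.is_alphanumeric() || ['.', '-', '_'].contains(&c))
        }
        None => false,
    }
}

fn main() {
    for id in ["abc-123", "", ".", "-a", "a"] {
        println!("{:?} -> {}", id, verify_cid(id));
    }
}
```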
impl AgentService {
#[instrument]
async fn do_create_container(
@@ -165,7 +142,7 @@ impl AgentService {
) -> Result<()> {
let cid = req.container_id.clone();
verify_cid(&cid)?;
kata_sys_util::validate::verify_id(&cid)?;
let mut oci_spec = req.OCI.clone();
let use_sandbox_pidns = req.get_sandbox_pidns();
@@ -249,7 +226,20 @@ impl AgentService {
info!(sl!(), "no process configurations!");
return Err(anyhow!(nix::Error::EINVAL));
};
ctr.start(p).await?;
// if starting container failed, we will do some rollback work
// to ensure no resources are leaked.
if let Err(err) = ctr.start(p).await {
error!(sl!(), "failed to start container: {:?}", err);
if let Err(e) = ctr.destroy().await {
error!(sl!(), "failed to destroy container: {:?}", e);
}
if let Err(e) = remove_container_resources(&mut s, &cid) {
error!(sl!(), "failed to remove container resources: {:?}", e);
}
return Err(err);
}
s.update_shared_pidns(&ctr)?;
s.add_container(ctr);
info!(sl!(), "created container!");
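The start-failure path added above follows a common rollback shape: on error, run best-effort cleanup steps, log (rather than propagate) any cleanup failure, and return the original error to the caller. A generic sketch of that shape (names illustrative, not the agent's API):

```rust
/// Best-effort rollback helper: if `result` is an error, run each
/// cleanup step, logging (not propagating) cleanup failures, then
/// return the original result unchanged.
fn rollback_on_err<T, E: std::fmt::Debug>(
    result: Result<T, E>,
    cleanups: Vec<Box<dyn FnOnce() -> Result<(), E>>>,
) -> Result<T, E> {
    if let Err(err) = &result {
        eprintln!("operation failed, rolling back: {:?}", err);
        for cleanup in cleanups {
            if let Err(e) = cleanup() {
                // Keep going: a failed rollback step must not mask
                // the remaining cleanup work or the original error.
                eprintln!("rollback step failed: {:?}", e);
            }
        }
    }
    result
}

fn main() {
    let failed: Result<(), &str> = Err("start failed");
    let out = rollback_on_err(
        failed,
        vec![Box::new(|| Ok(())), Box::new(|| Err("destroy failed"))],
    );
    println!("{:?}", out);
}
```

Returning the original error, as the patch does with `return Err(err)`, keeps the caller's diagnostics pointed at the root cause rather than at a secondary cleanup failure.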
@@ -295,27 +285,6 @@ impl AgentService {
req: protocols::agent::RemoveContainerRequest,
) -> Result<()> {
let cid = req.container_id.clone();
let mut cmounts: Vec<String> = vec![];
let mut remove_container_resources = |sandbox: &mut Sandbox| -> Result<()> {
// Find the sandbox storage used by this container
let mounts = sandbox.container_mounts.get(&cid);
if let Some(mounts) = mounts {
for m in mounts.iter() {
if sandbox.storages.get(m).is_some() {
cmounts.push(m.to_string());
}
}
}
for m in cmounts.iter() {
sandbox.unset_and_remove_sandbox_storage(m)?;
}
sandbox.container_mounts.remove(cid.as_str());
sandbox.containers.remove(cid.as_str());
Ok(())
};
if req.timeout == 0 {
let s = Arc::clone(&self.sandbox);
@@ -329,7 +298,7 @@ impl AgentService {
.destroy()
.await?;
remove_container_resources(&mut sandbox)?;
remove_container_resources(&mut sandbox, &cid)?;
return Ok(());
}
@@ -361,8 +330,7 @@ impl AgentService {
let s = self.sandbox.clone();
let mut sandbox = s.lock().await;
remove_container_resources(&mut sandbox)?;
remove_container_resources(&mut sandbox, &cid)?;
Ok(())
}
@@ -380,7 +348,7 @@ impl AgentService {
let mut process = req
.process
.into_option()
.ok_or_else(|| anyhow!(nix::Error::EINVAL))?;
.ok_or_else(|| anyhow!("Unable to parse process from ExecProcessRequest"))?;
// Apply any necessary corrections for PCI addresses
update_env_pci(&mut process.Env, &sandbox.pcimap)?;
@@ -629,7 +597,7 @@ impl AgentService {
};
if reader.is_none() {
return Err(anyhow!(nix::Error::EINVAL));
return Err(anyhow!("Unable to determine stream reader, is None"));
}
let reader = reader.ok_or_else(|| anyhow!("cannot get stream reader"))?;
@@ -650,7 +618,7 @@ impl AgentService {
}
#[async_trait]
impl protocols::agent_ttrpc::AgentService for AgentService {
impl agent_ttrpc::AgentService for AgentService {
async fn create_container(
&self,
ctx: &TtrpcContext,
@@ -1536,7 +1504,7 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
struct HealthService;
#[async_trait]
impl protocols::health_ttrpc::Health for HealthService {
impl health_ttrpc::Health for HealthService {
async fn check(
&self,
_ctx: &TtrpcContext,
@@ -1675,18 +1643,17 @@ async fn read_stream(reader: Arc<Mutex<ReadHalf<PipeStream>>>, l: usize) -> Resu
}
pub fn start(s: Arc<Mutex<Sandbox>>, server_address: &str) -> Result<TtrpcServer> {
let agent_service = Box::new(AgentService { sandbox: s })
as Box<dyn protocols::agent_ttrpc::AgentService + Send + Sync>;
let agent_service =
Box::new(AgentService { sandbox: s }) as Box<dyn agent_ttrpc::AgentService + Send + Sync>;
let agent_worker = Arc::new(agent_service);
let health_service =
Box::new(HealthService {}) as Box<dyn protocols::health_ttrpc::Health + Send + Sync>;
let health_service = Box::new(HealthService {}) as Box<dyn health_ttrpc::Health + Send + Sync>;
let health_worker = Arc::new(health_service);
let aservice = protocols::agent_ttrpc::create_agent_service(agent_worker);
let aservice = agent_ttrpc::create_agent_service(agent_worker);
let hservice = protocols::health_ttrpc::create_health(health_worker);
let hservice = health_ttrpc::create_health(health_worker);
let server = TtrpcServer::new()
.bind(server_address)?
@@ -1752,6 +1719,35 @@ fn update_container_namespaces(
Ok(())
}
fn remove_container_resources(sandbox: &mut Sandbox, cid: &str) -> Result<()> {
let mut cmounts: Vec<String> = vec![];
// Find the sandbox storage used by this container
let mounts = sandbox.container_mounts.get(cid);
if let Some(mounts) = mounts {
for m in mounts.iter() {
if sandbox.storages.get(m).is_some() {
cmounts.push(m.to_string());
}
}
}
for m in cmounts.iter() {
if let Err(err) = sandbox.unset_and_remove_sandbox_storage(m) {
error!(
sl!(),
"failed to unset_and_remove_sandbox_storage for container {}, error: {:?}",
cid,
err
);
}
}
sandbox.container_mounts.remove(cid);
sandbox.containers.remove(cid);
Ok(())
}
fn append_guest_hooks(s: &Sandbox, oci: &mut Spec) -> Result<()> {
if let Some(ref guest_hooks) = s.hooks {
let mut hooks = oci.hooks.take().unwrap_or_default();
@@ -1843,7 +1839,11 @@ fn do_copy_file(req: &CopyFileRequest) -> Result<()> {
let path = PathBuf::from(req.path.as_str());
if !path.starts_with(CONTAINER_BASE) {
return Err(anyhow!(nix::Error::EINVAL));
return Err(anyhow!(
"Path {:?} does not start with {}",
path,
CONTAINER_BASE
));
}
let parent = path.parent();
@@ -2011,14 +2011,12 @@ fn load_kernel_module(module: &protocols::agent::KernelModule) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::{
assert_result, namespace::Namespace, protocols::agent_ttrpc::AgentService as _,
skip_if_not_root,
};
use crate::{namespace::Namespace, protocols::agent_ttrpc_async::AgentService as _};
use nix::mount;
use nix::sched::{unshare, CloneFlags};
use oci::{Hook, Hooks, Linux, LinuxNamespace};
use tempfile::{tempdir, TempDir};
use test_utils::{assert_result, skip_if_not_root};
use ttrpc::{r#async::TtrpcContext, MessageHeader};
fn mk_ttrpc_context() -> TtrpcContext {
@@ -2084,6 +2082,7 @@ mod tests {
let result = load_kernel_module(&m);
assert!(result.is_err(), "load module should fail");
skip_if_not_root!();
// case 3: normal module.
// normally this module should exist...
m.name = "bridge".to_string();
@@ -2262,6 +2261,7 @@ mod tests {
if d.has_fd {
Some(wfd)
} else {
unistd::close(wfd).unwrap();
None
}
};
@@ -2296,13 +2296,14 @@ mod tests {
if !d.break_pipe {
unistd::close(rfd).unwrap();
}
unistd::close(wfd).unwrap();
// XXX: Do not close wfd.
// the fd will be closed on Process's dropping.
// unistd::close(wfd).unwrap();
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[tokio::test]
async fn test_update_container_namespaces() {
#[derive(Debug)]
@@ -2670,233 +2671,6 @@ OtherField:other
}
}
#[tokio::test]
async fn test_verify_cid() {
#[derive(Debug)]
struct TestData<'a> {
id: &'a str,
expect_error: bool,
}
let tests = &[
TestData {
// Cannot be blank
id: "",
expect_error: true,
},
TestData {
// Cannot be a space
id: " ",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: ".",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "-",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "_",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: " a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: ".a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "-a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "_a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "..",
expect_error: true,
},
TestData {
// Too short
id: "a",
expect_error: true,
},
TestData {
// Too short
id: "z",
expect_error: true,
},
TestData {
// Too short
id: "A",
expect_error: true,
},
TestData {
// Too short
id: "Z",
expect_error: true,
},
TestData {
// Too short
id: "0",
expect_error: true,
},
TestData {
// Too short
id: "9",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "-1",
expect_error: true,
},
TestData {
id: "/",
expect_error: true,
},
TestData {
id: "a/",
expect_error: true,
},
TestData {
id: "a/../",
expect_error: true,
},
TestData {
id: "../a",
expect_error: true,
},
TestData {
id: "../../a",
expect_error: true,
},
TestData {
id: "../../../a",
expect_error: true,
},
TestData {
id: "foo/../bar",
expect_error: true,
},
TestData {
id: "foo bar",
expect_error: true,
},
TestData {
id: "a.",
expect_error: false,
},
TestData {
id: "a..",
expect_error: false,
},
TestData {
id: "aa",
expect_error: false,
},
TestData {
id: "aa.",
expect_error: false,
},
TestData {
id: "hello..world",
expect_error: false,
},
TestData {
id: "hello/../world",
expect_error: true,
},
TestData {
id: "aa1245124sadfasdfgasdga.",
expect_error: false,
},
TestData {
id: "aAzZ0123456789_.-",
expect_error: false,
},
TestData {
id: "abcdefghijklmnopqrstuvwxyz0123456789.-_",
expect_error: false,
},
TestData {
id: "0123456789abcdefghijklmnopqrstuvwxyz.-_",
expect_error: false,
},
TestData {
id: " abcdefghijklmnopqrstuvwxyz0123456789.-_",
expect_error: true,
},
TestData {
id: ".abcdefghijklmnopqrstuvwxyz0123456789.-_",
expect_error: true,
},
TestData {
id: "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_",
expect_error: false,
},
TestData {
id: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_",
expect_error: false,
},
TestData {
id: " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_",
expect_error: true,
},
TestData {
id: ".ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_",
expect_error: true,
},
TestData {
id: "/a/b/c",
expect_error: true,
},
TestData {
id: "a/b/c",
expect_error: true,
},
TestData {
id: "foo/../../../etc/passwd",
expect_error: true,
},
TestData {
id: "../../../../../../etc/motd",
expect_error: true,
},
TestData {
id: "/etc/passwd",
expect_error: true,
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = verify_cid(d.id);
let msg = format!("{}, result: {:?}", msg, result);
if result.is_ok() {
assert!(!d.expect_error, "{}", msg);
} else {
assert!(d.expect_error, "{}", msg);
}
}
}
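The rules exercised by the table above (non-blank, at least two characters, leading ASCII alphanumeric, and only alphanumerics plus `.`, `-` and `_` afterwards) can be captured in a small standalone checker. This is an illustrative re-implementation consistent with the test expectations, not the agent's actual `verify_cid` code:

```rust
// Illustrative container-ID validator matching the test table:
// at least two characters, a leading ASCII alphanumeric, and only
// ASCII alphanumerics plus '.', '-' and '_' in the remainder.
fn verify_cid(id: &str) -> Result<(), String> {
    let mut chars = id.chars();
    let valid = match chars.next() {
        Some(first) if first.is_ascii_alphanumeric() && id.len() > 1 => {
            chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '_'))
        }
        _ => false,
    };
    if valid {
        Ok(())
    } else {
        Err(format!("invalid container ID: {:?}", id))
    }
}

fn main() {
    assert!(verify_cid("hello..world").is_ok());
    assert!(verify_cid("aAzZ0123456789_.-").is_ok());
    assert!(verify_cid("").is_err()); // blank
    assert!(verify_cid("a").is_err()); // too short
    assert!(verify_cid("_a").is_err()); // bad leading character
    assert!(verify_cid("foo/../bar").is_err()); // path traversal
    println!("all checks passed");
}
```

Rejecting `/` outright is what makes the path-traversal cases (`../a`, `/etc/passwd`) fail, since the ID is later joined onto `CONTAINER_BASE`.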
#[tokio::test]
async fn test_volume_capacity_stats() {
skip_if_not_root!();


@@ -471,7 +471,7 @@ fn online_memory(logger: &Logger) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::{mount::baremount, skip_if_not_root};
use crate::mount::baremount;
use anyhow::{anyhow, Error};
use nix::mount::MsFlags;
use oci::{Linux, Root, Spec};
@@ -484,6 +484,7 @@ mod tests {
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use tempfile::{tempdir, Builder, TempDir};
use test_utils::skip_if_not_root;
fn bind_mount(src: &str, dst: &str, logger: &Logger) -> Result<(), Error> {
let src_path = Path::new(src);


@@ -1,99 +0,0 @@
// Copyright (c) 2019 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
#![allow(clippy::module_inception)]
#[cfg(test)]
pub mod test_utils {
#[derive(Debug, PartialEq)]
pub enum TestUserType {
RootOnly,
NonRootOnly,
Any,
}
#[macro_export]
macro_rules! skip_if_root {
() => {
if nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs non-root", module_path!());
return;
}
};
}
#[macro_export]
macro_rules! skip_if_not_root {
() => {
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
};
}
#[macro_export]
macro_rules! skip_loop_if_root {
($msg:expr) => {
if nix::unistd::Uid::effective().is_root() {
println!(
"INFO: skipping loop {} in {} which needs non-root",
$msg,
module_path!()
);
continue;
}
};
}
#[macro_export]
macro_rules! skip_loop_if_not_root {
($msg:expr) => {
if !nix::unistd::Uid::effective().is_root() {
println!(
"INFO: skipping loop {} in {} which needs root",
$msg,
module_path!()
);
continue;
}
};
}
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
}
};
}
#[macro_export]
macro_rules! skip_loop_by_user {
($msg:expr, $user:expr) => {
if $user == TestUserType::RootOnly {
skip_loop_if_not_root!($msg);
} else if $user == TestUserType::NonRootOnly {
skip_loop_if_root!($msg);
}
};
}
}
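The `assert_result!` macro above powers the agent's table-driven tests. A minimal sketch of that pattern, using a condensed copy of the macro (the error-message comparison is elided) and a hypothetical `double` helper:

```rust
// Condensed copy of the assert_result! macro from the listing above;
// the full version also compares the Debug text of the two errors.
macro_rules! assert_result {
    ($expected_result:expr, $actual_result:expr, $msg:expr) => {
        if $expected_result.is_ok() {
            assert!(
                *$expected_result.as_ref().unwrap() == $actual_result.unwrap(),
                "{}",
                $msg
            );
        } else {
            assert!($actual_result.is_err(), "{}", $msg);
        }
    };
}

// Hypothetical function under test.
fn double(n: i32) -> Result<i32, String> {
    if n >= 0 { Ok(n * 2) } else { Err("negative".into()) }
}

fn main() {
    // Table-driven style used throughout the agent's tests.
    let tests: &[(i32, Result<i32, String>)] =
        &[(2, Ok(4)), (0, Ok(0)), (-1, Err("negative".into()))];
    for (i, (input, expected)) in tests.iter().enumerate() {
        let msg = format!("test[{}]: input {}", i, input);
        assert_result!(expected, double(*input), msg);
    }
    println!("all table-driven checks passed");
}
```

The per-case `msg` string is what makes a failing table entry identifiable in the assertion output.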


@@ -69,6 +69,8 @@ macro_rules! trace_rpc_call {
propagator.extract(&extract_carrier_from_ttrpc($ctx))
});
info!(sl!(), "rpc call from shim to agent: {:?}", $name);
// generate tracing span
let rpc_span = span!(tracing::Level::INFO, $name, "mod"="rpc.rs", req=?$req);


@@ -528,10 +528,10 @@ impl BindWatcher {
mod tests {
use super::*;
use crate::mount::is_mounted;
use crate::skip_if_not_root;
use nix::unistd::{Gid, Uid};
use std::fs;
use std::thread;
use test_utils::skip_if_not_root;
async fn create_test_storage(dir: &Path, id: &str) -> Result<(protos::Storage, PathBuf)> {
let src_path = dir.join(format!("src{}", id));


@@ -3,11 +3,12 @@ name = "vsock-exporter"
version = "0.1.0"
authors = ["James O. D. Hunt <james.o.hunt@intel.com>"]
edition = "2018"
license = "Apache-2.0"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
nix = "0.23.0"
nix = "0.24.2"
libc = "0.2.94"
thiserror = "1.0.26"
opentelemetry = { version = "0.14.0", features=["serialize"] }

src/dragonball/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
target
Cargo.lock
.idea

src/dragonball/Cargo.toml Normal file

@@ -0,0 +1,65 @@
[package]
name = "dragonball"
version = "0.1.0"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
description = "A secure sandbox for Kata Containers"
keywords = ["kata-containers", "sandbox", "vmm", "dragonball"]
homepage = "https://katacontainers.io/"
repository = "https://github.com/kata-containers/kata-containers.git"
license = "Apache-2.0"
edition = "2018"
[dependencies]
arc-swap = "1.5.0"
bytes = "1.1.0"
dbs-address-space = "0.1.0"
dbs-allocator = "0.1.0"
dbs-arch = "0.1.0"
dbs-boot = "0.2.0"
dbs-device = "0.1.0"
dbs-interrupt = { version = "0.1.0", features = ["kvm-irq"] }
dbs-legacy-devices = "0.1.0"
dbs-upcall = { version = "0.1.0", optional = true }
dbs-utils = "0.1.0"
dbs-virtio-devices = { version = "0.1.0", optional = true, features = ["virtio-mmio"] }
kvm-bindings = "0.5.0"
kvm-ioctls = "0.11.0"
lazy_static = "1.2"
libc = "0.2.39"
linux-loader = "0.4.0"
log = "0.4.14"
nix = "0.24.2"
seccompiler = "0.2.0"
serde = "1.0.27"
serde_derive = "1.0.27"
serde_json = "1.0.9"
slog = "2.5.2"
slog-scope = "4.4.0"
thiserror = "1"
vmm-sys-util = "0.9.0"
virtio-queue = { version = "0.1.0", optional = true }
vm-memory = { version = "0.7.0", features = ["backend-mmap"] }
[dev-dependencies]
slog-term = "2.9.0"
slog-async = "2.7.0"
[features]
acpi = []
atomic-guest-memory = []
hotplug = ["virtio-vsock"]
virtio-vsock = ["dbs-virtio-devices/virtio-vsock", "virtio-queue"]
virtio-blk = ["dbs-virtio-devices/virtio-blk", "virtio-queue"]
virtio-net = ["dbs-virtio-devices/virtio-net", "virtio-queue"]
# virtio-fs only works with atomic-guest-memory
virtio-fs = ["dbs-virtio-devices/virtio-fs", "virtio-queue", "atomic-guest-memory"]
[patch.'crates-io']
dbs-device = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-interrupt = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-legacy-devices = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-upcall = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-utils = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-virtio-devices = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-boot = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-arch = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }

src/dragonball/LICENSE Symbolic link

@@ -0,0 +1 @@
../../LICENSE

src/dragonball/Makefile Normal file

@@ -0,0 +1,29 @@
# Copyright (c) 2019-2022 Alibaba Cloud. All rights reserved.
# Copyright (c) 2019-2022 Ant Group. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
default: build
build:
# FIXME: This line will be removed when we solve the vm-memory dependency problem in Dragonball Sandbox
cargo update -p vm-memory:0.8.0 --precise 0.7.0
cargo build --all-features
check: clippy format
clippy:
@echo "INFO: cargo clippy..."
cargo clippy --all-targets --all-features \
-- \
-D warnings
format:
@echo "INFO: cargo fmt..."
cargo fmt -- --check
clean:
cargo clean
test:
@echo "INFO: testing dragonball for development build"
cargo test --all-features -- --nocapture

src/dragonball/README.md Normal file

@@ -0,0 +1,40 @@
# Introduction
`Dragonball Sandbox` is a light-weight virtual machine manager (VMM) based on Linux Kernel-based Virtual Machine (KVM),
which is optimized for container workloads with:
- container image management and acceleration service
- flexible and high-performance virtual device drivers
- low CPU and memory overhead
- minimal startup time
- optimized concurrent startup speed
`Dragonball Sandbox` aims to provide a simple solution for the Kata Containers community. It is integrated into Kata 3.0
runtime as a built-in VMM and gives users an out-of-the-box Kata Containers experience without complex environment setup
and configuration process.
# Getting Started
[TODO](https://github.com/kata-containers/kata-containers/issues/4302)
# Documentation
Device: [Device Document](docs/device.md)
vCPU: [vCPU Document](docs/vcpu.md)
API: [API Document](docs/api.md)
Documentation is still being actively added.
See the [official documentation](docs/) page for more details.
# Supported Architectures
- x86-64
- aarch64
# Supported Kernel
[TODO](https://github.com/kata-containers/kata-containers/issues/4303)
# Acknowledgement
Part of the code is based on the [Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor) project, the [`crosvm`](https://github.com/google/crosvm) project and the [Firecracker](https://github.com/firecracker-microvm/firecracker) project. They are all virtual machine managers written in Rust, with advantages in safety and security.
`Dragonball Sandbox` is designed to be a VMM customized for Kata Containers, and we will focus on optimizing container workloads for the Kata ecosystem. This focus on the Kata community is what differentiates us from other Rust-based virtual machine managers.
# License
`Dragonball` is licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0.


@@ -0,0 +1,27 @@
// Copyright 2017 The Chromium OS Authors. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


@@ -0,0 +1,27 @@
# API
We provide a rich set of APIs for the Kata runtime to interact with the `Dragonball` virtual machine manager.
This document introduces each of them.
## `ConfigureBootSource`
Configure the boot source of the VM using `BootSourceConfig`. This action can only be called before the VM has booted.
### Boot Source Config
1. `kernel_path`: Path of the kernel image. `Dragonball` only supports compressed kernel images for now.
2. `initrd_path`: Path of the initrd (could be None)
3. `boot_args`: Boot arguments passed to the kernel (could be None)
## `SetVmConfiguration`
Set virtual machine configuration using `VmConfigInfo` to initialize VM.
### VM Config Info
1. `vcpu_count`: Number of vCPUs to start. Currently we only support up to 255 vCPUs.
2. `max_vcpu_count`: Maximum number of vCPUs that can be added through CPU hotplug.
3. `cpu_pm`: CPU power management.
4. `cpu_topology`: CPU topology information (including `threads_per_core`, `cores_per_die`, `dies_per_socket` and `sockets`).
5. `vpmu_feature`: `vPMU` feature level.
6. `mem_type`: Memory type that can be either `hugetlbfs` or `shmem`, default is `shmem`.
8. `mem_file_path`: Memory file path.
8. `mem_size_mib`: The memory size in MiB. The maximum memory size is 1TB.
9. `serial_path`: Optional sock path.
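The two calls above take plain configuration structs. A minimal sketch of their shape, inferred from the field lists in this document (the subset of fields shown and their exact types are assumptions for illustration):

```rust
// Sketch of the boot-source and VM configuration described above.
// Field names follow the document; types are assumptions.
#[derive(Debug, Default)]
struct BootSourceConfig {
    kernel_path: String,         // compressed kernel image
    initrd_path: Option<String>, // optional initrd
    boot_args: Option<String>,   // optional kernel command line
}

#[derive(Debug)]
struct VmConfigInfo {
    vcpu_count: u8,     // up to 255 vCPUs
    max_vcpu_count: u8, // upper bound for CPU hotplug
    mem_type: String,   // "hugetlbfs" or "shmem" (default)
    mem_size_mib: u64,  // memory size in MiB, up to 1 TiB
}

fn main() {
    let boot = BootSourceConfig {
        kernel_path: "/path/to/vmlinux.bin".into(),
        initrd_path: None,
        boot_args: Some("console=ttyS0 reboot=k".into()),
    };
    let vm = VmConfigInfo {
        vcpu_count: 1,
        max_vcpu_count: 4,
        mem_type: "shmem".into(),
        mem_size_mib: 1024,
    };
    println!("boot: {:?}\nvm: {:?}", boot, vm);
}
```

`ConfigureBootSource` must run before boot; `SetVmConfiguration` initializes the VM with `VmConfigInfo`.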


@@ -0,0 +1,20 @@
# Device
## Device Manager
Currently we have the following device managers:
| Name | Description |
| --- | --- |
| [address space manager](../src/address_space_manager.rs) | abstracts the virtual machine's physical address space management and provides mappings for guest virtual memory and the MMIO ranges of emulated virtual devices, pass-through devices and vCPUs |
| [config manager](../src/config_manager.rs) | provides abstractions for configuration information |
| [console manager](../src/device_manager/console_manager.rs) | provides management for all console devices |
| [resource manager](../src/resource_manager.rs) | provides resource management for `legacy_irq_pool`, `msi_irq_pool`, `pio_pool`, `mmio_pool`, `mem_pool`, `kvm_mem_slot_pool` with builder `ResourceManagerBuilder` |
| [VSOCK device manager](../src/device_manager/vsock_dev_mgr.rs) | provides configuration info for `VIRTIO-VSOCK` and management for all VSOCK devices |
## Supported devices
`VIRTIO-VSOCK`
`i8042`
`COM1`
`COM2`


@@ -0,0 +1,42 @@
# vCPU
## vCPU Manager
The vCPU manager manages all vCPU-related actions; we will dive into some of the important structure members in this doc.
For now, aarch64 vCPU support is still under development; we'll introduce it when we merge `runtime-rs` into the master branch. (issue: #4445)
### vCPU config
`VcpuConfig` is used to configure guest overall CPU info.
`boot_vcpu_count` is used to define the initial vCPU number.
`max_vcpu_count` is used to define the maximum vCPU number and serves as the upper bound for the CPU hotplug feature.
`thread_per_core`, `cores_per_die`, `dies_per_socket` and `socket` are used to define the CPU topology.
`vpmu_feature` is used to define the `vPMU` feature level.
If the `vPMU` feature is `Disabled`, the `vPMU` feature is off (the default).
If the `vPMU` feature is `LimitedlyEnabled`, minimal `vPMU` counters are supported (cycles and instructions).
If the `vPMU` feature is `FullyEnabled`, all `vPMU` counters are supported.
## vCPU State
There are four states for vCPU state machine: `running`, `paused`, `waiting_exit`, `exited`. There is a state machine to maintain the task flow.
When the vCPU is created, it starts in the `paused` state. Once the vCPU resources are ready at the VMM, the VMM sends a `Resume` event to the vCPU thread, and the vCPU state changes to `running`.
During the `running` state, the VMM catches vCPU exits and executes different logic according to the exit reason.
If the VMM catches an exit reason it cannot handle, the state changes to `waiting_exit` and the VMM stops the virtual machine.
When the state switches to `waiting_exit`, an exit event is sent to the vCPU's `exit_evt`; the event manager detects the change in `exit_evt` and sets the VMM `exit_evt_flag` to 1. A thread serving the VMM event loop checks `exit_evt_flag` and, if the flag is 1, stops the VMM.
When the VMM is stopped / destroyed, the state changes to `exited`.
## vCPU Hot plug
Since `Dragonball Sandbox` doesn't support virtualization of ACPI system, we use [`upcall`](https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall) to establish a direct communication channel between `Dragonball` and Guest in order to trigger vCPU hotplug.
To use `upcall`, kernel patches are needed, you can get the patches from [`upcall`](https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall) page, and we'll provide a ready-to-use guest kernel binary for you to try.
vCPU hot plug / hot unplug operates in the range [1, `max_vcpu_count`]; operations outside this range are invalid.
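The lifecycle described above can be sketched as a small state machine. This is an illustration of the documented transitions only, not Dragonball's actual implementation:

```rust
// Illustrative vCPU state machine for the documented transitions:
// created -> paused, Resume -> running, unhandled exit -> waiting_exit,
// VMM stopped/destroyed -> exited.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VcpuState {
    Paused,
    Running,
    WaitingExit,
    Exited,
}

#[derive(Debug, Clone, Copy)]
enum VcpuEvent {
    Resume,        // vCPU resources are ready at the VMM
    UnhandledExit, // VMM caught an exit reason it cannot handle
    VmmStopped,    // the VMM is stopped / destroyed
}

fn step(state: VcpuState, event: VcpuEvent) -> VcpuState {
    use {VcpuEvent::*, VcpuState::*};
    match (state, event) {
        (Paused, Resume) => Running,
        (Running, UnhandledExit) => WaitingExit,
        (WaitingExit, VmmStopped) => Exited,
        // Other combinations leave the state unchanged in this sketch.
        (s, _) => s,
    }
}

fn main() {
    let mut s = VcpuState::Paused; // state right after the vCPU is created
    s = step(s, VcpuEvent::Resume);
    assert_eq!(s, VcpuState::Running);
    s = step(s, VcpuEvent::UnhandledExit);
    s = step(s, VcpuEvent::VmmStopped);
    assert_eq!(s, VcpuState::Exited);
    println!("final state: {:?}", s);
}
```

The `waiting_exit` to `exited` edge corresponds to the event-loop thread observing `exit_evt_flag` and stopping the VMM.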


@@ -0,0 +1,892 @@
// Copyright (C) 2019-2022 Alibaba Cloud. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//! Address space abstraction to manage virtual machine's physical address space.
//!
//! The AddressSpace abstraction is introduced to manage virtual machine's physical address space.
//! The regions in virtual machine's physical address space may be used to:
//! 1) map guest virtual memory
//! 2) map MMIO ranges for emulated virtual devices, such as virtio-fs DAX window.
//! 3) map MMIO ranges for pass-through devices, such as PCI device BARs.
//! 4) map MMIO ranges for to vCPU, such as local APIC.
//! 5) not used/available
//!
//! A related abstraction, vm_memory::GuestMemory, is used to access guest virtual memory only.
//! In other words, AddressSpace is the resource owner, and GuestMemory is an accessor for guest
//! virtual memory.
use std::collections::{BTreeMap, HashMap};
use std::fs::File;
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use dbs_address_space::{
AddressSpace, AddressSpaceError, AddressSpaceLayout, AddressSpaceRegion,
AddressSpaceRegionType, NumaNode, NumaNodeInfo, MPOL_MF_MOVE, MPOL_PREFERRED,
};
use dbs_allocator::Constraint;
use kvm_bindings::kvm_userspace_memory_region;
use kvm_ioctls::VmFd;
use log::{debug, error, info, warn};
use nix::sys::mman;
use nix::unistd::dup;
#[cfg(feature = "atomic-guest-memory")]
use vm_memory::atomic::GuestMemoryAtomic;
use vm_memory::{
Address, FileOffset, GuestAddress, GuestAddressSpace, GuestMemoryMmap, GuestMemoryRegion,
GuestRegionMmap, GuestUsize, MemoryRegionAddress, MmapRegion,
};
use crate::resource_manager::ResourceManager;
use crate::vm::NumaRegionInfo;
#[cfg(not(feature = "atomic-guest-memory"))]
/// Concrete GuestAddressSpace type used by the VMM.
pub type GuestAddressSpaceImpl = Arc<GuestMemoryMmap>;
#[cfg(feature = "atomic-guest-memory")]
/// Concrete GuestAddressSpace type used by the VMM.
pub type GuestAddressSpaceImpl = GuestMemoryAtomic<GuestMemoryMmap>;
/// Concrete GuestMemory type used by the VMM.
pub type GuestMemoryImpl = <Arc<vm_memory::GuestMemoryMmap> as GuestAddressSpace>::M;
/// Concrete GuestRegion type used by the VMM.
pub type GuestRegionImpl = GuestRegionMmap;
// Maximum number of working threads for memory pre-allocation.
const MAX_PRE_ALLOC_THREAD: u64 = 16;
// Control the actual number of pre-allocation threads. After several performance tests, we decided to use one pre-allocation thread per 4 GiB of memory.
const PRE_ALLOC_GRANULARITY: u64 = 32;
// We don't plan to support mainframe computers and only focus on PC servers.
// 64 as the maximum number of nodes should be enough for now.
const MAX_NODE: u32 = 64;
// We will split the memory region if it conflicts with the MMIO hole.
// But if the space below the MMIO hole is smaller than MINIMAL_SPLIT_SPACE, we won't split the memory region, in order to enhance performance.
const MINIMAL_SPLIT_SPACE: u64 = 128 << 20;
/// Errors associated with virtual machine address space management.
#[derive(Debug, thiserror::Error)]
pub enum AddressManagerError {
/// Invalid address space operation.
#[error("invalid address space operation")]
InvalidOperation,
/// Invalid address range.
#[error("invalid address space region (0x{0:x}, 0x{1:x})")]
InvalidAddressRange(u64, GuestUsize),
/// No available mem address.
#[error("no available mem address")]
NoAvailableMemAddress,
/// No available kvm slots.
#[error("no available kvm slots")]
NoAvailableKvmSlot,
/// Address manager failed to create memfd to map anonymous memory.
#[error("address manager failed to create memfd to map anonymous memory")]
CreateMemFd(#[source] nix::Error),
/// Address manager failed to open memory file.
#[error("address manager failed to open memory file")]
OpenFile(#[source] std::io::Error),
/// Memory file provided is invalid due to empty file path, non-existent file path and other possible mistakes.
#[error("memory file provided to address manager {0} is invalid")]
FileInvalid(String),
/// Memory file provided is invalid due to empty memory type
#[error("memory type provided to address manager {0} is invalid")]
TypeInvalid(String),
/// Failed to set size for memory file.
#[error("address manager failed to set size for memory file")]
SetFileSize(#[source] std::io::Error),
/// Failed to unlink memory file.
#[error("address manager failed to unlink memory file")]
UnlinkFile(#[source] nix::Error),
/// Failed to duplicate fd of memory file.
#[error("address manager failed to duplicate memory file descriptor")]
DupFd(#[source] nix::Error),
/// Failure in accessing the memory located at some address.
#[error("address manager failed to access guest memory located at 0x{0:x}")]
AccessGuestMemory(u64, #[source] vm_memory::mmap::Error),
/// Failed to create GuestMemory
#[error("address manager failed to create guest memory object")]
CreateGuestMemory(#[source] vm_memory::Error),
/// Failure in initializing guest memory.
#[error("address manager failed to initialize guest memory")]
GuestMemoryNotInitialized,
/// Failed to mmap() guest memory
#[error("address manager failed to mmap() guest memory into current process")]
MmapGuestMemory(#[source] vm_memory::mmap::MmapRegionError),
/// Failed to set KVM memory slot.
#[error("address manager failed to configure KVM memory slot")]
KvmSetMemorySlot(#[source] kvm_ioctls::Error),
/// Failed to set madvise on AddressSpaceRegion
#[error("address manager failed to set madvise() on guest memory region")]
Madvise(#[source] nix::Error),
/// Failed to join threads.
#[error("address manager failed to join threads")]
JoinFail,
/// Failed to create Address Space Region
#[error("address manager failed to create Address Space Region {0}")]
CreateAddressSpaceRegion(#[source] AddressSpaceError),
}
type Result<T> = std::result::Result<T, AddressManagerError>;
/// Parameters to configure address space creation operations.
pub struct AddressSpaceMgrBuilder<'a> {
mem_type: &'a str,
mem_file: &'a str,
mem_index: u32,
mem_suffix: bool,
mem_prealloc: bool,
dirty_page_logging: bool,
vmfd: Option<Arc<VmFd>>,
}
impl<'a> AddressSpaceMgrBuilder<'a> {
/// Create a new [`AddressSpaceMgrBuilder`] object.
pub fn new(mem_type: &'a str, mem_file: &'a str) -> Result<Self> {
if mem_type.is_empty() {
return Err(AddressManagerError::TypeInvalid(mem_type.to_string()));
}
Ok(AddressSpaceMgrBuilder {
mem_type,
mem_file,
mem_index: 0,
mem_suffix: true,
mem_prealloc: false,
dirty_page_logging: false,
vmfd: None,
})
}
/// Enable/disable adding numbered suffix to memory file path.
/// This feature can be useful to generate hugetlbfs files with numbered suffixes. (e.g. shmem0, shmem1)
pub fn toggle_file_suffix(&mut self, enabled: bool) {
self.mem_suffix = enabled;
}
/// Enable/disable memory pre-allocation.
/// Enabling this feature can improve performance stability at the start of a workload by avoiding page faults.
/// Disabling it may reduce performance stability, but decreases CPU resource consumption and start-up time.
pub fn toggle_prealloc(&mut self, prealloc: bool) {
self.mem_prealloc = prealloc;
}
/// Enable/disable KVM dirty page logging.
pub fn toggle_dirty_page_logging(&mut self, logging: bool) {
self.dirty_page_logging = logging;
}
/// Set KVM [`VmFd`] handle to configure memory slots.
pub fn set_kvm_vm_fd(&mut self, vmfd: Arc<VmFd>) -> Option<Arc<VmFd>> {
let mut existing_vmfd = None;
if self.vmfd.is_some() {
existing_vmfd = self.vmfd.clone();
}
self.vmfd = Some(vmfd);
existing_vmfd
}
/// Build an [`AddressSpaceMgr`] using the configured parameters.
pub fn build(
self,
res_mgr: &ResourceManager,
numa_region_infos: &[NumaRegionInfo],
) -> Result<AddressSpaceMgr> {
let mut mgr = AddressSpaceMgr::default();
mgr.create_address_space(res_mgr, numa_region_infos, self)?;
Ok(mgr)
}
fn get_next_mem_file(&mut self) -> String {
if self.mem_suffix {
let path = format!("{}{}", self.mem_file, self.mem_index);
self.mem_index += 1;
path
} else {
self.mem_file.to_string()
}
}
}
/// Struct to manage virtual machine's physical address space.
pub struct AddressSpaceMgr {
address_space: Option<AddressSpace>,
vm_as: Option<GuestAddressSpaceImpl>,
base_to_slot: Arc<Mutex<HashMap<u64, u32>>>,
prealloc_handlers: Vec<thread::JoinHandle<()>>,
prealloc_exit: Arc<AtomicBool>,
numa_nodes: BTreeMap<u32, NumaNode>,
}
impl AddressSpaceMgr {
/// Query whether the address space manager is initialized.
pub fn is_initialized(&self) -> bool {
self.address_space.is_some()
}
/// Gets address space.
pub fn address_space(&self) -> Option<&AddressSpace> {
self.address_space.as_ref()
}
/// Create the address space for a virtual machine.
///
/// This method is designed to be called when starting up a virtual machine rather than at
/// runtime; on failure the virtual machine is expected to be torn down, and no strict error recovery is attempted.
pub fn create_address_space(
&mut self,
res_mgr: &ResourceManager,
numa_region_infos: &[NumaRegionInfo],
mut param: AddressSpaceMgrBuilder,
) -> Result<()> {
let mut regions = Vec::new();
let mut start_addr = dbs_boot::layout::GUEST_MEM_START;
// Create address space regions.
for info in numa_region_infos.iter() {
info!("numa_region_info {:?}", info);
// convert size_in_mib to bytes
let size = info
.size
.checked_shl(20)
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
// Guest memory does not intersect with the MMIO hole.
// TODO: make it work for ARM (issue #4307)
if start_addr > dbs_boot::layout::MMIO_LOW_END
|| start_addr + size <= dbs_boot::layout::MMIO_LOW_START
{
let region = self.create_region(start_addr, size, info, &mut param)?;
regions.push(region);
start_addr = start_addr
.checked_add(size)
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
} else {
// Add guest memory below the MMIO hole, avoiding a split of the memory region
// if the available address region is smaller than MINIMAL_SPLIT_SPACE.
let mut below_size = dbs_boot::layout::MMIO_LOW_START
.checked_sub(start_addr)
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
if below_size < (MINIMAL_SPLIT_SPACE) {
below_size = 0;
} else {
let region = self.create_region(start_addr, below_size, info, &mut param)?;
regions.push(region);
}
// Add guest memory above the MMIO hole
let above_start = dbs_boot::layout::MMIO_LOW_END + 1;
let above_size = size
.checked_sub(below_size)
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
let region = self.create_region(above_start, above_size, info, &mut param)?;
regions.push(region);
start_addr = above_start
.checked_add(above_size)
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
}
}
// Create GuestMemory object
let mut vm_memory = GuestMemoryMmap::new();
for reg in regions.iter() {
// Allocate used guest memory addresses.
// These addresses are statically allocated, resource allocation/update should not fail.
let constraint = Constraint::new(reg.len())
.min(reg.start_addr().raw_value())
.max(reg.last_addr().raw_value());
let _key = res_mgr
.allocate_mem_address(&constraint)
.ok_or(AddressManagerError::NoAvailableMemAddress)?;
let mmap_reg = self.create_mmap_region(reg.clone())?;
vm_memory = vm_memory
.insert_region(mmap_reg.clone())
.map_err(AddressManagerError::CreateGuestMemory)?;
self.map_to_kvm(res_mgr, &param, reg, mmap_reg)?;
}
#[cfg(feature = "atomic-guest-memory")]
{
self.vm_as = Some(AddressSpace::convert_into_vm_as(vm_memory));
}
#[cfg(not(feature = "atomic-guest-memory"))]
{
self.vm_as = Some(Arc::new(vm_memory));
}
let layout = AddressSpaceLayout::new(
*dbs_boot::layout::GUEST_PHYS_END,
dbs_boot::layout::GUEST_MEM_START,
*dbs_boot::layout::GUEST_MEM_END,
);
self.address_space = Some(AddressSpace::from_regions(regions, layout));
Ok(())
}
// size unit: Byte
fn create_region(
&mut self,
start_addr: u64,
size_bytes: u64,
info: &NumaRegionInfo,
param: &mut AddressSpaceMgrBuilder,
) -> Result<Arc<AddressSpaceRegion>> {
let mem_file_path = param.get_next_mem_file();
let region = AddressSpaceRegion::create_default_memory_region(
GuestAddress(start_addr),
size_bytes,
info.host_numa_node_id,
param.mem_type,
&mem_file_path,
param.mem_prealloc,
false,
)
.map_err(AddressManagerError::CreateAddressSpaceRegion)?;
let region = Arc::new(region);
self.insert_into_numa_nodes(
&region,
info.guest_numa_node_id.unwrap_or(0),
&info.vcpu_ids,
);
info!(
"create new region: guest addr 0x{:x}-0x{:x} size {}",
start_addr,
start_addr + size_bytes,
size_bytes
);
Ok(region)
}
fn map_to_kvm(
&mut self,
res_mgr: &ResourceManager,
param: &AddressSpaceMgrBuilder,
reg: &Arc<AddressSpaceRegion>,
mmap_reg: Arc<GuestRegionImpl>,
) -> Result<()> {
// Build mapping between GPA <-> HVA, by adding kvm memory slot.
let slot = res_mgr
.allocate_kvm_mem_slot(1, None)
.ok_or(AddressManagerError::NoAvailableKvmSlot)?;
if let Some(vmfd) = param.vmfd.as_ref() {
let host_addr = mmap_reg
.get_host_address(MemoryRegionAddress(0))
.map_err(|_e| AddressManagerError::InvalidOperation)?;
let flags = 0u32;
let mem_region = kvm_userspace_memory_region {
slot: slot as u32,
guest_phys_addr: reg.start_addr().raw_value(),
memory_size: reg.len() as u64,
userspace_addr: host_addr as u64,
flags,
};
info!(
"VM: guest memory region {:x} starts at {:x?}",
reg.start_addr().raw_value(),
host_addr
);
// Safe because the guest regions are guaranteed not to overlap.
unsafe { vmfd.set_user_memory_region(mem_region) }
.map_err(AddressManagerError::KvmSetMemorySlot)?;
}
self.base_to_slot
.lock()
.unwrap()
.insert(reg.start_addr().raw_value(), slot as u32);
Ok(())
}
/// Mmap the address space region into current process.
pub fn create_mmap_region(
&mut self,
region: Arc<AddressSpaceRegion>,
) -> Result<Arc<GuestRegionImpl>> {
// Special check for 32bit host with 64bit virtual machines.
if region.len() > usize::MAX as u64 {
return Err(AddressManagerError::InvalidAddressRange(
region.start_addr().raw_value(),
region.len(),
));
}
// The device MMIO regions may not be backed by memory files, so refuse to mmap them.
if region.region_type() == AddressSpaceRegionType::DeviceMemory {
return Err(AddressManagerError::InvalidOperation);
}
// The GuestRegionMmap/MmapRegion will take ownership of the FileOffset object,
// so we have to duplicate the fd here. It's really a dirty design.
let file_offset = match region.file_offset().as_ref() {
Some(fo) => {
let fd = dup(fo.file().as_raw_fd()).map_err(AddressManagerError::DupFd)?;
// Safe because we have just duplicated the raw fd.
let file = unsafe { File::from_raw_fd(fd) };
let file_offset = FileOffset::new(file, fo.start());
Some(file_offset)
}
None => None,
};
let perm_flags = if (region.perm_flags() & libc::MAP_POPULATE) != 0 && region.is_hugepage()
{
// mmap(MAP_POPULATE) conflicts with madvise(MADV_HUGEPAGE) because mmap(MAP_POPULATE)
// pre-faults all memory with normal pages before madvise(MADV_HUGEPAGE) gets
// called. So remove the MAP_POPULATE flag and let the memory be faulted in by the
// working threads.
region.perm_flags() & (!libc::MAP_POPULATE)
} else {
region.perm_flags()
};
let mmap_reg = MmapRegion::build(
file_offset,
region.len() as usize,
libc::PROT_READ | libc::PROT_WRITE,
perm_flags,
)
.map_err(AddressManagerError::MmapGuestMemory)?;
if region.is_anonpage() {
self.configure_anon_mem(&mmap_reg)?;
}
if let Some(node_id) = region.host_numa_node_id() {
self.configure_numa(&mmap_reg, node_id)?;
}
if region.is_hugepage() {
self.configure_thp_and_prealloc(&region, &mmap_reg)?;
}
let reg = GuestRegionImpl::new(mmap_reg, region.start_addr())
.map_err(AddressManagerError::CreateGuestMemory)?;
Ok(Arc::new(reg))
}
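The flag adjustment above is plain bit masking: for hugepage-backed regions, `MAP_POPULATE` is cleared so that `madvise(MADV_HUGEPAGE)` runs before any page is faulted in. A sketch of the same logic, using the Linux x86-64 value of `MAP_POPULATE` (0x8000) as an assumed constant in place of `libc::MAP_POPULATE`:

```rust
// Assumed Linux x86-64 value; real code would use libc::MAP_POPULATE.
const MAP_POPULATE: i32 = 0x8000;

/// Clear MAP_POPULATE for hugepage regions so MADV_HUGEPAGE can take
/// effect before any page is pre-faulted.
fn effective_perm_flags(perm_flags: i32, is_hugepage: bool) -> i32 {
    if is_hugepage && (perm_flags & MAP_POPULATE) != 0 {
        perm_flags & !MAP_POPULATE
    } else {
        perm_flags
    }
}
```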
fn configure_anon_mem(&self, mmap_reg: &MmapRegion) -> Result<()> {
unsafe {
mman::madvise(
mmap_reg.as_ptr() as *mut libc::c_void,
mmap_reg.size(),
mman::MmapAdvise::MADV_DONTFORK,
)
}
.map_err(AddressManagerError::Madvise)
}
fn configure_numa(&self, mmap_reg: &MmapRegion, node_id: u32) -> Result<()> {
let nodemask = 1_u64
.checked_shl(node_id)
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
let res = unsafe {
libc::syscall(
libc::SYS_mbind,
mmap_reg.as_ptr() as *mut libc::c_void,
mmap_reg.size(),
MPOL_PREFERRED,
&nodemask as *const u64,
MAX_NODE,
MPOL_MF_MOVE,
)
};
if res < 0 {
warn!(
"failed to mbind memory to host_numa_node_id {}: this may affect performance",
node_id
);
}
Ok(())
}
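The nodemask passed to `mbind` is a single bit set at the node index, so a node id of 64 or more cannot be represented in a `u64` mask; `checked_shl` turns that overflow into an error instead of a panic. The same computation in isolation:

```rust
/// Build a single-node mbind nodemask; None when the id doesn't fit in u64.
fn single_node_mask(node_id: u32) -> Option<u64> {
    1_u64.checked_shl(node_id)
}
```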
// We enable Transparent Huge Pages (THP) through madvise to increase performance.
// To reduce the performance impact of page faults, we start several threads (up to
// MAX_PRE_ALLOC_THREAD) that touch every 4K page of the memory region, manually
// pre-allocating the memory.
// The reason we don't use mmap to enable THP and pre-allocation is that the THP
// setting doesn't take effect in that operation (tested on kernel 4.9).
fn configure_thp_and_prealloc(
&mut self,
region: &Arc<AddressSpaceRegion>,
mmap_reg: &MmapRegion,
) -> Result<()> {
debug!(
"Setting MADV_HUGEPAGE on AddressSpaceRegion addr {:x?} len {:x?}",
mmap_reg.as_ptr(),
mmap_reg.size()
);
// Safe because we just create the MmapRegion
unsafe {
mman::madvise(
mmap_reg.as_ptr() as *mut libc::c_void,
mmap_reg.size(),
mman::MmapAdvise::MADV_HUGEPAGE,
)
}
.map_err(AddressManagerError::Madvise)?;
if region.perm_flags() & libc::MAP_POPULATE > 0 {
// Touch every 4k page to trigger allocation. The step is 4K instead of 2M to ensure
// pre-allocation when running out of huge pages.
const PAGE_SIZE: u64 = 4096;
const PAGE_SHIFT: u32 = 12;
let addr = mmap_reg.as_ptr() as u64;
// Here we use >> PAGE_SHIFT to calculate how many 4K pages are in the memory region.
let npage = (mmap_reg.size() as u64) >> PAGE_SHIFT;
let mut touch_thread = ((mmap_reg.size() as u64) >> PRE_ALLOC_GRANULARITY) + 1;
if touch_thread > MAX_PRE_ALLOC_THREAD {
touch_thread = MAX_PRE_ALLOC_THREAD;
}
let per_npage = npage / touch_thread;
for n in 0..touch_thread {
let start_npage = per_npage * n;
let end_npage = if n == (touch_thread - 1) {
npage
} else {
per_npage * (n + 1)
};
let mut per_addr = addr + (start_npage * PAGE_SIZE);
let should_stop = self.prealloc_exit.clone();
let handler = thread::Builder::new()
.name("PreallocThread".to_string())
.spawn(move || {
info!("PreallocThread start start_npage: {:?}, end_npage: {:?}, per_addr: {:?}, thread_number: {:?}",
start_npage, end_npage, per_addr, touch_thread );
for _ in start_npage..end_npage {
if should_stop.load(Ordering::Acquire) {
info!("PreallocThread stop start_npage: {:?}, end_npage: {:?}, per_addr: {:?}, thread_number: {:?}",
start_npage, end_npage, per_addr, touch_thread);
break;
}
// Reading from a THP page may be served by the zero page, so only a
// write operation can ensure THP memory allocation. Use the
// compare_exchange(old_val, old_val) trick to trigger allocation.
let addr_ptr = per_addr as *mut u8;
let read_byte = unsafe { std::ptr::read_volatile(addr_ptr) };
let atomic_u8: &AtomicU8 = unsafe { &*(addr_ptr as *mut AtomicU8) };
let _ = atomic_u8.compare_exchange(read_byte, read_byte, Ordering::SeqCst, Ordering::SeqCst);
per_addr += PAGE_SIZE;
}
info!("PreallocThread done start_npage: {:?}, end_npage: {:?}, per_addr: {:?}, thread_number: {:?}",
start_npage, end_npage, per_addr, touch_thread );
});
match handler {
Err(e) => error!(
"Failed to create working thread for async pre-allocation, {:?}. This may affect performance stability at the start of the workload.",
e
),
Ok(hdl) => self.prealloc_handlers.push(hdl),
}
}
}
Ok(())
}
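The thread partitioning above divides the `npage` 4 KiB pages evenly among `touch_thread` workers, with the last worker also taking the remainder pages. A standalone sketch of the partition math, with `PRE_ALLOC_GRANULARITY` and `MAX_PRE_ALLOC_THREAD` as assumed values (the real constants live elsewhere in this crate):

```rust
const PAGE_SHIFT: u32 = 12; // 4 KiB pages
const PRE_ALLOC_GRANULARITY: u32 = 30; // assumed: roughly one thread per GiB
const MAX_PRE_ALLOC_THREAD: u64 = 16; // assumed thread cap

/// Compute the (start_page, end_page) range each pre-allocation thread touches.
fn prealloc_ranges(region_size: u64) -> Vec<(u64, u64)> {
    let npage = region_size >> PAGE_SHIFT;
    let mut threads = (region_size >> PRE_ALLOC_GRANULARITY) + 1;
    if threads > MAX_PRE_ALLOC_THREAD {
        threads = MAX_PRE_ALLOC_THREAD;
    }
    let per_npage = npage / threads;
    (0..threads)
        .map(|n| {
            let start = per_npage * n;
            // The last thread also covers the remainder pages.
            let end = if n == threads - 1 { npage } else { per_npage * (n + 1) };
            (start, end)
        })
        .collect()
}
```

For a 1 GiB region this yields two contiguous ranges that together cover all 262144 pages.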
/// Get the address space object
pub fn get_address_space(&self) -> Option<&AddressSpace> {
self.address_space.as_ref()
}
/// Get the default guest memory object, which will be used to access virtual machine's default
/// guest memory.
pub fn get_vm_as(&self) -> Option<&GuestAddressSpaceImpl> {
self.vm_as.as_ref()
}
/// Get the base to slot map
pub fn get_base_to_slot_map(&self) -> Arc<Mutex<HashMap<u64, u32>>> {
self.base_to_slot.clone()
}
/// Get NUMA node information from the address space manager.
pub fn get_numa_nodes(&self) -> &BTreeMap<u32, NumaNode> {
&self.numa_nodes
}
/// Add CPU and memory NUMA information to the BTreeMap.
fn insert_into_numa_nodes(
&mut self,
region: &Arc<AddressSpaceRegion>,
guest_numa_node_id: u32,
vcpu_ids: &[u32],
) {
let node = self
.numa_nodes
.entry(guest_numa_node_id)
.or_insert_with(NumaNode::new);
node.add_info(&NumaNodeInfo {
base: region.start_addr(),
size: region.len(),
});
node.add_vcpu_ids(vcpu_ids);
}
/// Get the address space layout from the address space manager.
pub fn get_layout(&self) -> Result<AddressSpaceLayout> {
self.address_space
.as_ref()
.map(|v| v.layout())
.ok_or(AddressManagerError::GuestMemoryNotInitialized)
}
/// Wait for the pre-allocation working threads to finish work.
///
/// Force all working threads to exit if `stop` is true.
pub fn wait_prealloc(&mut self, stop: bool) -> Result<()> {
if stop {
self.prealloc_exit.store(true, Ordering::Release);
}
while let Some(handlers) = self.prealloc_handlers.pop() {
if let Err(e) = handlers.join() {
error!("wait_prealloc join fail {:?}", e);
return Err(AddressManagerError::JoinFail);
}
}
Ok(())
}
}
impl Default for AddressSpaceMgr {
/// Create a new empty AddressSpaceMgr
fn default() -> Self {
AddressSpaceMgr {
address_space: None,
vm_as: None,
base_to_slot: Arc::new(Mutex::new(HashMap::new())),
prealloc_handlers: Vec::new(),
prealloc_exit: Arc::new(AtomicBool::new(false)),
numa_nodes: BTreeMap::new(),
}
}
}
#[cfg(test)]
mod tests {
use dbs_boot::layout::GUEST_MEM_START;
use std::ops::Deref;
use vm_memory::{Bytes, GuestAddressSpace, GuestMemory, GuestMemoryRegion};
use vmm_sys_util::tempfile::TempFile;
use super::*;
#[test]
fn test_create_address_space() {
let res_mgr = ResourceManager::new(None);
let mem_size = 128 << 20;
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: vec![1, 2],
}];
let builder = AddressSpaceMgrBuilder::new("shmem", "").unwrap();
let as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
let vm_as = as_mgr.get_vm_as().unwrap();
let guard = vm_as.memory();
let gmem = guard.deref();
assert_eq!(gmem.num_regions(), 1);
let reg = gmem
.find_region(GuestAddress(GUEST_MEM_START + mem_size - 1))
.unwrap();
assert_eq!(reg.start_addr(), GuestAddress(GUEST_MEM_START));
assert_eq!(reg.len(), mem_size);
assert!(gmem
.find_region(GuestAddress(GUEST_MEM_START + mem_size))
.is_none());
assert!(reg.file_offset().is_some());
let buf = [0x1u8, 0x2u8, 0x3u8, 0x4u8, 0x5u8];
gmem.write_slice(&buf, GuestAddress(GUEST_MEM_START))
.unwrap();
// Update middle of mapped memory region
let mut val = 0xa5u8;
gmem.write_obj(val, GuestAddress(GUEST_MEM_START + 0x1))
.unwrap();
val = gmem.read_obj(GuestAddress(GUEST_MEM_START + 0x1)).unwrap();
assert_eq!(val, 0xa5);
val = gmem.read_obj(GuestAddress(GUEST_MEM_START)).unwrap();
assert_eq!(val, 1);
val = gmem.read_obj(GuestAddress(GUEST_MEM_START + 0x2)).unwrap();
assert_eq!(val, 3);
val = gmem.read_obj(GuestAddress(GUEST_MEM_START + 0x5)).unwrap();
assert_eq!(val, 0);
// Read ahead of mapped memory region
assert!(gmem
.read_obj::<u8>(GuestAddress(GUEST_MEM_START + mem_size))
.is_err());
let res_mgr = ResourceManager::new(None);
let mem_size = dbs_boot::layout::MMIO_LOW_START + (1 << 30);
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: vec![1, 2],
}];
let builder = AddressSpaceMgrBuilder::new("shmem", "").unwrap();
let as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
let vm_as = as_mgr.get_vm_as().unwrap();
let guard = vm_as.memory();
let gmem = guard.deref();
#[cfg(target_arch = "x86_64")]
assert_eq!(gmem.num_regions(), 2);
#[cfg(target_arch = "aarch64")]
assert_eq!(gmem.num_regions(), 1);
// Test dropping GuestMemoryMmap object releases all resources.
for _ in 0..10000 {
let res_mgr = ResourceManager::new(None);
let mem_size = 1 << 20;
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: vec![1, 2],
}];
let builder = AddressSpaceMgrBuilder::new("shmem", "").unwrap();
let _as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
}
let file = TempFile::new().unwrap().into_file();
let fd = file.as_raw_fd();
// fd should be small enough if there's no leaking of fds.
assert!(fd < 1000);
}
#[test]
fn test_address_space_mgr_get_boundary() {
let layout = AddressSpaceLayout::new(
*dbs_boot::layout::GUEST_PHYS_END,
dbs_boot::layout::GUEST_MEM_START,
*dbs_boot::layout::GUEST_MEM_END,
);
let res_mgr = ResourceManager::new(None);
let mem_size = 128 << 20;
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: vec![1, 2],
}];
let builder = AddressSpaceMgrBuilder::new("shmem", "").unwrap();
let as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
assert_eq!(as_mgr.get_layout().unwrap(), layout);
}
#[test]
fn test_address_space_mgr_get_numa_nodes() {
let res_mgr = ResourceManager::new(None);
let mem_size = 128 << 20;
let cpu_vec = vec![1, 2];
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: cpu_vec.clone(),
}];
let builder = AddressSpaceMgrBuilder::new("shmem", "").unwrap();
let as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
let mut numa_node = NumaNode::new();
numa_node.add_info(&NumaNodeInfo {
base: GuestAddress(GUEST_MEM_START),
size: mem_size,
});
numa_node.add_vcpu_ids(&cpu_vec);
assert_eq!(*as_mgr.get_numa_nodes().get(&0).unwrap(), numa_node);
}
#[test]
fn test_address_space_mgr_async_prealloc() {
let res_mgr = ResourceManager::new(None);
let mem_size = 2 << 20;
let cpu_vec = vec![1, 2];
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: cpu_vec,
}];
let mut builder = AddressSpaceMgrBuilder::new("hugeshmem", "").unwrap();
builder.toggle_prealloc(true);
let mut as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
as_mgr.wait_prealloc(false).unwrap();
}
#[test]
fn test_address_space_mgr_builder() {
let mut builder = AddressSpaceMgrBuilder::new("shmem", "/tmp/shmem").unwrap();
assert_eq!(builder.mem_type, "shmem");
assert_eq!(builder.mem_file, "/tmp/shmem");
assert_eq!(builder.mem_index, 0);
assert!(builder.mem_suffix);
assert!(!builder.mem_prealloc);
assert!(!builder.dirty_page_logging);
assert!(builder.vmfd.is_none());
assert_eq!(&builder.get_next_mem_file(), "/tmp/shmem0");
assert_eq!(&builder.get_next_mem_file(), "/tmp/shmem1");
assert_eq!(&builder.get_next_mem_file(), "/tmp/shmem2");
assert_eq!(builder.mem_index, 3);
builder.toggle_file_suffix(false);
assert_eq!(&builder.get_next_mem_file(), "/tmp/shmem");
assert_eq!(&builder.get_next_mem_file(), "/tmp/shmem");
assert_eq!(builder.mem_index, 3);
builder.toggle_prealloc(true);
builder.toggle_dirty_page_logging(true);
assert!(builder.mem_prealloc);
assert!(builder.dirty_page_logging);
}
#[test]
fn test_configure_invalid_numa() {
let res_mgr = ResourceManager::new(None);
let mem_size = 128 << 20;
let numa_region_infos = vec![NumaRegionInfo {
size: mem_size >> 20,
host_numa_node_id: None,
guest_numa_node_id: Some(0),
vcpu_ids: vec![1, 2],
}];
let builder = AddressSpaceMgrBuilder::new("shmem", "").unwrap();
let as_mgr = builder.build(&res_mgr, &numa_region_infos).unwrap();
let mmap_reg = MmapRegion::new(8).unwrap();
assert!(as_mgr.configure_numa(&mmap_reg, u32::MAX).is_err());
}
}


@@ -0,0 +1,6 @@
// Copyright (C) 2019-2022 Alibaba Cloud. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//! API related data structures to configure the vmm.
pub mod v1;


@@ -0,0 +1,55 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
use serde_derive::{Deserialize, Serialize};
/// Default guest kernel command line:
/// - `reboot=k` shut down the guest on reboot, instead of well... rebooting;
/// - `panic=1` on panic, reboot after 1 second;
/// - `pci=off` do not scan for PCI devices (saves boot time);
/// - `nomodules` disable loadable kernel module support;
/// - `8250.nr_uarts=0` disable the 8250 serial interface;
/// - `i8042.noaux` do not probe the i8042 controller for an attached mouse (saves boot time);
/// - `i8042.nomux` do not probe i8042 for a multiplexing controller (saves boot time);
/// - `i8042.nopnp` do not use ACPI PnP to discover KBD/AUX controllers (saves boot time);
/// - `i8042.dumbkbd` do not attempt to control keyboard state via the i8042 (saves boot time).
pub const DEFAULT_KERNEL_CMDLINE: &str = "reboot=k panic=1 pci=off nomodules 8250.nr_uarts=0 \
i8042.noaux i8042.nomux i8042.nopnp i8042.dumbkbd";
/// Strongly typed data structure used to configure the boot source of the microvm.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize, Default)]
#[serde(deny_unknown_fields)]
pub struct BootSourceConfig {
/// Path of the kernel image.
/// We only support uncompressed kernels for Dragonball.
pub kernel_path: String,
/// Path of the initrd, if there is one.
/// Note: the rootfs is set in `BlockDeviceConfigInfo`.
pub initrd_path: Option<String>,
/// The boot arguments to pass to the kernel.
#[serde(skip_serializing_if = "Option::is_none")]
pub boot_args: Option<String>,
}
/// Errors associated with actions on `BootSourceConfig`.
#[derive(Debug, thiserror::Error)]
pub enum BootSourceConfigError {
/// The kernel file cannot be opened.
#[error(
"the kernel file cannot be opened due to invalid kernel path or invalid permissions: {0}"
)]
InvalidKernelPath(#[source] std::io::Error),
/// The initrd file cannot be opened.
#[error("the initrd file cannot be opened due to invalid path or invalid permissions: {0}")]
InvalidInitrdPath(#[source] std::io::Error),
/// The kernel command line is invalid.
#[error("the kernel command line is invalid: {0}")]
InvalidKernelCommandLine(#[source] linux_loader::cmdline::Error),
/// The boot source cannot be updated post boot.
#[error("the update operation is not allowed after boot")]
UpdateNotAllowedPostBoot,
}


@@ -0,0 +1,88 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//
// SPDX-License-Identifier: Apache-2.0
use serde_derive::{Deserialize, Serialize};
/// The microvm state.
///
/// When Dragonball starts, the instance state is Uninitialized. Once the start_microvm method
/// is called, the state goes from Uninitialized to Starting. The state changes to Running when
/// the start_microvm method ends. Halting and Halted are currently unsupported.
#[derive(Copy, Clone, Debug, Deserialize, PartialEq, Serialize)]
pub enum InstanceState {
/// Microvm is not initialized.
Uninitialized,
/// Microvm is starting.
Starting,
/// Microvm is running.
Running,
/// Microvm is Paused.
Paused,
/// Microvm received a halt instruction.
Halting,
/// Microvm is halted.
Halted,
/// Microvm exited with the given exit code, instead of the VMM process exiting.
Exited(i32),
}
/// The state of async actions
#[derive(Debug, Deserialize, Serialize, Clone, PartialEq)]
pub enum AsyncState {
/// Uninitialized
Uninitialized,
/// Success
Success,
/// Failure
Failure,
}
/// Strongly typed data structure that contains general information about the microVM.
#[derive(Debug, Deserialize, Serialize)]
pub struct InstanceInfo {
/// The ID of the microVM.
pub id: String,
/// The state of the microVM.
pub state: InstanceState,
/// The version of the VMM that runs the microVM.
pub vmm_version: String,
/// The pid of the current VMM process.
pub pid: u32,
/// The state of async actions.
pub async_state: AsyncState,
/// List of tids of vcpu threads (vcpu index, tid)
pub tids: Vec<(u8, u32)>,
/// Last instance downtime
pub last_instance_downtime: u64,
}
impl InstanceInfo {
/// Create an instance info object with the given id and VMM version.
pub fn new(id: String, vmm_version: String) -> Self {
InstanceInfo {
id,
state: InstanceState::Uninitialized,
vmm_version,
pid: std::process::id(),
async_state: AsyncState::Uninitialized,
tids: Vec::new(),
last_instance_downtime: 0,
}
}
}
impl Default for InstanceInfo {
fn default() -> Self {
InstanceInfo {
id: String::from(""),
state: InstanceState::Uninitialized,
vmm_version: env!("CARGO_PKG_VERSION").to_string(),
pid: std::process::id(),
async_state: AsyncState::Uninitialized,
tids: Vec::new(),
last_instance_downtime: 0,
}
}
}


@@ -0,0 +1,86 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
/// We only support this number of vCPUs for now, mostly because all vCPU-related metrics
/// are typed as u8 and going beyond u8 would take extra effort.
pub const MAX_SUPPORTED_VCPUS: u8 = 254;
/// Memory hotplug values must be aligned to this size (unit: MiB).
pub const MEMORY_HOTPLUG_ALIGHMENT: u8 = 64;
/// Errors associated with configuring the microVM.
#[derive(Debug, PartialEq, thiserror::Error)]
pub enum VmConfigError {
/// Cannot update the configuration of the microvm post boot.
#[error("update operation is not allowed after boot")]
UpdateNotAllowedPostBoot,
/// The max vcpu count is invalid.
#[error("the vCPU number shouldn't be larger than {}", MAX_SUPPORTED_VCPUS)]
VcpuCountExceedsMaximum,
/// The vcpu count is invalid. When hyperthreading is enabled, the `cpu_count` must be either
/// 1 or an even number.
#[error(
"the vCPU number '{0}' can only be 1 or an even number when hyperthreading is enabled"
)]
InvalidVcpuCount(u8),
/// The threads_per_core is invalid. It should be either 1 or 2.
#[error("the threads_per_core number '{0}' can only be 1 or 2")]
InvalidThreadsPerCore(u8),
/// The cores_per_die is invalid. It should be larger than 0.
#[error("the cores_per_die number '{0}' can only be larger than 0")]
InvalidCoresPerDie(u8),
/// The dies_per_socket is invalid. It should be larger than 0.
#[error("the dies_per_socket number '{0}' can only be larger than 0")]
InvalidDiesPerSocket(u8),
/// The socket number is invalid. It should be either 1 or 2.
#[error("the socket number '{0}' can only be 1 or 2")]
InvalidSocket(u8),
/// The max vcpu count inferred from the CPU topology (threads_per_core * cores_per_die * dies_per_socket * sockets) should be larger than or equal to vcpu_count.
#[error("the max vcpu count inferred from cpu topology '{0}' (threads_per_core * cores_per_die * dies_per_socket * sockets) should be larger than or equal to vcpu_count")]
InvalidCpuTopology(u8),
/// The max vcpu count is invalid.
#[error(
"the max vCPU number '{0}' shouldn't be less than the vCPU count and can only be 1 or an even number when hyperthreading is enabled"
)]
InvalidMaxVcpuCount(u8),
/// The memory size is invalid. The memory can only be an unsigned integer.
#[error("the memory size 0x{0:x}MiB is invalid")]
InvalidMemorySize(usize),
/// The hotplug memory size is invalid. The memory can only be an unsigned integer.
#[error(
"the hotplug memory size '{0}' (MiB) is invalid, must be multiple of {}",
MEMORY_HOTPLUG_ALIGHMENT
)]
InvalidHotplugMemorySize(usize),
/// The memory type is invalid.
#[error("the memory type '{0}' is invalid")]
InvalidMemType(String),
/// The memory file path is invalid.
#[error("the memory file path is invalid")]
InvalidMemFilePath(String),
/// NUMA region memory size is invalid.
#[error("Total size of memory in NUMA regions: {0}, should match the memory size in config")]
InvalidNumaRegionMemorySize(usize),
/// NUMA region vCPU count is invalid.
#[error("Total count of vCPUs in NUMA regions: {0}, should match the max vCPU count in config")]
InvalidNumaRegionCpuCount(u16),
/// NUMA region vCPU max id is invalid.
#[error("Max id of vCPUs in NUMA regions: {0}, should match the max vCPU count in config")]
InvalidNumaRegionCpuMaxId(u16),
}
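The constraints these errors encode can be sketched as a small validator: the vCPU count must not exceed `MAX_SUPPORTED_VCPUS`, and hotplug memory must be a multiple of `MEMORY_HOTPLUG_ALIGHMENT` MiB. A minimal sketch mirroring (not reproducing) the crate's validation, with a trimmed-down error enum:

```rust
const MAX_SUPPORTED_VCPUS: u8 = 254;
const MEMORY_HOTPLUG_ALIGHMENT: u8 = 64; // MiB (constant name as in the source)

// Trimmed stand-in for the crate's VmConfigError.
#[derive(Debug, PartialEq)]
enum ConfigError {
    VcpuCountExceedsMaximum,
    InvalidHotplugMemorySize(usize),
}

/// Check the vCPU count and the hotplug memory size (in MiB).
fn validate(vcpus: u16, hotplug_mib: usize) -> Result<(), ConfigError> {
    if vcpus > MAX_SUPPORTED_VCPUS as u16 {
        return Err(ConfigError::VcpuCountExceedsMaximum);
    }
    if hotplug_mib % MEMORY_HOTPLUG_ALIGHMENT as usize != 0 {
        return Err(ConfigError::InvalidHotplugMemorySize(hotplug_mib));
    }
    Ok(())
}
```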


@@ -0,0 +1,19 @@
// Copyright (C) 2019-2022 Alibaba Cloud. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//! API Version 1 related data structures to configure the vmm.
mod vmm_action;
pub use self::vmm_action::*;
/// Wrapper for configuring the microVM boot source.
mod boot_source;
pub use self::boot_source::{BootSourceConfig, BootSourceConfigError, DEFAULT_KERNEL_CMDLINE};
/// Wrapper over the microVM general information.
mod instance_info;
pub use self::instance_info::{InstanceInfo, InstanceState};
/// Wrapper for configuring the memory and CPU of the microVM.
mod machine_config;
pub use self::machine_config::{VmConfigError, MAX_SUPPORTED_VCPUS};


@@ -0,0 +1,636 @@
// Copyright (C) 2020-2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
use std::fs::File;
use std::sync::mpsc::{Receiver, Sender, TryRecvError};
use log::{debug, error, info, warn};
use crate::error::{Result, StartMicroVmError, StopMicrovmError};
use crate::event_manager::EventManager;
use crate::vm::{CpuTopology, KernelConfigInfo, VmConfigInfo};
use crate::vmm::Vmm;
use self::VmConfigError::*;
use self::VmmActionError::MachineConfig;
#[cfg(feature = "virtio-blk")]
pub use crate::device_manager::blk_dev_mgr::{
BlockDeviceConfigInfo, BlockDeviceConfigUpdateInfo, BlockDeviceError, BlockDeviceMgr,
};
#[cfg(feature = "virtio-fs")]
pub use crate::device_manager::fs_dev_mgr::{
FsDeviceConfigInfo, FsDeviceConfigUpdateInfo, FsDeviceError, FsDeviceMgr, FsMountConfigInfo,
};
#[cfg(feature = "virtio-net")]
pub use crate::device_manager::virtio_net_dev_mgr::{
VirtioNetDeviceConfigInfo, VirtioNetDeviceConfigUpdateInfo, VirtioNetDeviceError,
VirtioNetDeviceMgr,
};
#[cfg(feature = "virtio-vsock")]
pub use crate::device_manager::vsock_dev_mgr::{VsockDeviceConfigInfo, VsockDeviceError};
use super::*;
/// Wrapper for all errors associated with VMM actions.
#[derive(Debug, thiserror::Error)]
pub enum VmmActionError {
/// Invalid virtual machine instance ID.
#[error("the virtual machine instance ID is invalid")]
InvalidVMID,
/// Failed to hotplug, due to Upcall not ready.
#[error("Upcall not ready, can't hotplug device.")]
UpcallNotReady,
/// The action `ConfigureBootSource` failed either because of bad user input or an internal
/// error.
#[error("failed to configure boot source for VM: {0}")]
BootSource(#[source] BootSourceConfigError),
/// The action `StartMicroVm` failed either because of bad user input or an internal error.
#[error("failed to boot the VM: {0}")]
StartMicroVm(#[source] StartMicroVmError),
/// The action `StopMicroVm` failed either because of bad user input or an internal error.
#[error("failed to shutdown the VM: {0}")]
StopMicrovm(#[source] StopMicrovmError),
/// One of the actions `GetVmConfiguration` or `SetVmConfiguration` failed either because of bad
/// input or an internal error.
#[error("failed to set configuration for the VM: {0}")]
MachineConfig(#[source] VmConfigError),
#[cfg(feature = "virtio-vsock")]
/// The action `InsertVsockDevice` failed either because of bad user input or an internal error.
#[error("failed to add virtio-vsock device: {0}")]
Vsock(#[source] VsockDeviceError),
#[cfg(feature = "virtio-blk")]
/// Block device related errors.
#[error("virtio-blk device error: {0}")]
Block(#[source] BlockDeviceError),
#[cfg(feature = "virtio-net")]
/// Net device related errors.
#[error("virtio-net device error: {0}")]
VirtioNet(#[source] VirtioNetDeviceError),
#[cfg(feature = "virtio-fs")]
/// The action `InsertFsDevice` failed either because of bad user input or an internal error.
#[error("virtio-fs device: {0}")]
FsDevice(#[source] FsDeviceError),
}
/// This enum represents the public interface of the VMM. Each action contains various
/// bits of information (ids, paths, etc.).
#[derive(Clone, Debug, PartialEq)]
pub enum VmmAction {
/// Configure the boot source of the microVM using `BootSourceConfig`.
/// This action can only be called before the microVM has booted.
ConfigureBootSource(BootSourceConfig),
/// Launch the microVM. This action can only be called before the microVM has booted.
StartMicroVm,
/// Shut down the microVM. This action can only be called after the microVM has booted.
/// When the VMM is used as a crate by another process, this action shuts down the vCPU
/// threads and destroys all of the objects.
ShutdownMicroVm,
/// Get the configuration of the microVM.
GetVmConfiguration,
/// Set the microVM configuration (memory & vcpu) using `VmConfig` as input. This
/// action can only be called before the microVM has booted.
SetVmConfiguration(VmConfigInfo),
#[cfg(feature = "virtio-vsock")]
/// Add a new vsock device or update one that already exists using the
/// `VsockDeviceConfig` as input. This action can only be called before the microVM has
/// booted. The response is sent using the `OutcomeSender`.
InsertVsockDevice(VsockDeviceConfigInfo),
#[cfg(feature = "virtio-blk")]
/// Add a new block device or update one that already exists using the `BlockDeviceConfig` as
/// input. This action can only be called before the microVM has booted.
InsertBlockDevice(BlockDeviceConfigInfo),
#[cfg(feature = "virtio-blk")]
/// Remove an existing block device according to the given drive_id.
RemoveBlockDevice(String),
#[cfg(feature = "virtio-blk")]
/// Update a block device, after microVM start. Currently, the only updatable properties
/// are the RX and TX rate limiters.
UpdateBlockDevice(BlockDeviceConfigUpdateInfo),
#[cfg(feature = "virtio-net")]
/// Add a new network interface config or update one that already exists using the
/// `NetworkInterfaceConfig` as input. This action can only be called before the microVM has
/// booted. The response is sent using the `OutcomeSender`.
InsertNetworkDevice(VirtioNetDeviceConfigInfo),
#[cfg(feature = "virtio-net")]
/// Update a network interface, after microVM start. Currently, the only updatable properties
/// are the RX and TX rate limiters.
UpdateNetworkInterface(VirtioNetDeviceConfigUpdateInfo),
#[cfg(feature = "virtio-fs")]
/// Add a new shared fs device or update one that already exists using the
/// `FsDeviceConfig` as input. This action can only be called before the microVM has
/// booted.
InsertFsDevice(FsDeviceConfigInfo),
#[cfg(feature = "virtio-fs")]
/// Attach a new virtiofs backend fs or detach an existing virtiofs backend fs using the
/// `FsMountConfig` as input. This action can only be called _after_ the microVM has
/// booted.
ManipulateFsBackendFs(FsMountConfigInfo),
#[cfg(feature = "virtio-fs")]
/// Update fs rate limiter, after microVM start.
UpdateFsDevice(FsDeviceConfigUpdateInfo),
}
/// The enum represents the response sent by the VMM in case of success. The response is either
/// empty, when no data needs to be sent, or an internal VMM structure.
#[derive(Debug)]
pub enum VmmData {
/// No data is sent on the channel.
Empty,
/// The microVM configuration represented by `VmConfigInfo`.
MachineConfiguration(Box<VmConfigInfo>),
}
/// Request data type used to communicate between the API and the VMM.
pub type VmmRequest = Box<VmmAction>;
/// Data type used to communicate between the API and the VMM.
pub type VmmRequestResult = std::result::Result<VmmData, VmmActionError>;
/// Response data type used to communicate between the API and the VMM.
pub type VmmResponse = Box<VmmRequestResult>;
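The `VmmRequest`/`VmmResponse` aliases implement a boxed request-reply pattern over channels: the API side sends a boxed action and waits for a boxed result from the VMM core. A toy sketch of the same wiring using `std::sync::mpsc` and a stand-in action type (the real crate's channel setup lives outside this file):

```rust
use std::sync::mpsc;
use std::thread;

// Stand-ins for VmmAction / VmmData in the real crate.
#[derive(Debug)]
enum Action {
    Ping,
}
type Request = Box<Action>;
type Response = Box<Result<&'static str, String>>;

fn request_reply_sketch() -> &'static str {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (resp_tx, resp_rx) = mpsc::channel::<Response>();

    // VMM core side: receive one action, send back a boxed result.
    let core = thread::spawn(move || {
        let req = req_rx.recv().unwrap();
        let reply: Result<&'static str, String> = match *req {
            Action::Ping => Ok("pong"),
        };
        resp_tx.send(Box::new(reply)).unwrap();
    });

    // API side: submit a request, then block on the reply.
    req_tx.send(Box::new(Action::Ping)).unwrap();
    let resp = resp_rx.recv().unwrap();
    core.join().unwrap();
    (*resp).unwrap()
}
```

Boxing keeps the channel payloads a single pointer wide even as the action and response enums grow.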
/// VMM Service to handle requests from the API server.
///
/// There are two levels of API servers as below:
/// API client <--> VMM API Server <--> VMM Core
pub struct VmmService {
from_api: Receiver<VmmRequest>,
to_api: Sender<VmmResponse>,
machine_config: VmConfigInfo,
}
impl VmmService {
/// Create a new VMM API server instance.
pub fn new(from_api: Receiver<VmmRequest>, to_api: Sender<VmmResponse>) -> Self {
VmmService {
from_api,
to_api,
machine_config: VmConfigInfo::default(),
}
}
/// Handle requests from the HTTP API Server and send back replies.
pub fn run_vmm_action(&mut self, vmm: &mut Vmm, event_mgr: &mut EventManager) -> Result<()> {
let request = match self.from_api.try_recv() {
Ok(t) => *t,
Err(TryRecvError::Empty) => {
warn!("Got a spurious notification from api thread");
return Ok(());
}
Err(TryRecvError::Disconnected) => {
panic!("The channel's sending half was disconnected. Cannot receive data.");
}
};
debug!("receive vmm action: {:?}", request);
let response = match request {
VmmAction::ConfigureBootSource(boot_source_body) => {
self.configure_boot_source(vmm, boot_source_body)
}
VmmAction::StartMicroVm => self.start_microvm(vmm, event_mgr),
VmmAction::ShutdownMicroVm => self.shutdown_microvm(vmm),
VmmAction::GetVmConfiguration => Ok(VmmData::MachineConfiguration(Box::new(
self.machine_config.clone(),
))),
VmmAction::SetVmConfiguration(machine_config) => {
self.set_vm_configuration(vmm, machine_config)
}
#[cfg(feature = "virtio-vsock")]
VmmAction::InsertVsockDevice(vsock_cfg) => self.add_vsock_device(vmm, vsock_cfg),
#[cfg(feature = "virtio-blk")]
VmmAction::InsertBlockDevice(block_device_config) => {
self.add_block_device(vmm, event_mgr, block_device_config)
}
#[cfg(feature = "virtio-blk")]
VmmAction::UpdateBlockDevice(blk_update) => {
self.update_blk_rate_limiters(vmm, blk_update)
}
#[cfg(feature = "virtio-blk")]
VmmAction::RemoveBlockDevice(drive_id) => {
self.remove_block_device(vmm, event_mgr, &drive_id)
}
#[cfg(feature = "virtio-net")]
VmmAction::InsertNetworkDevice(virtio_net_cfg) => {
self.add_virtio_net_device(vmm, event_mgr, virtio_net_cfg)
}
#[cfg(feature = "virtio-net")]
VmmAction::UpdateNetworkInterface(netif_update) => {
self.update_net_rate_limiters(vmm, netif_update)
}
#[cfg(feature = "virtio-fs")]
VmmAction::InsertFsDevice(fs_cfg) => self.add_fs_device(vmm, fs_cfg),
#[cfg(feature = "virtio-fs")]
VmmAction::ManipulateFsBackendFs(fs_mount_cfg) => {
self.manipulate_fs_backend_fs(vmm, fs_mount_cfg)
}
#[cfg(feature = "virtio-fs")]
VmmAction::UpdateFsDevice(fs_update_cfg) => {
self.update_fs_rate_limiters(vmm, fs_update_cfg)
}
};
debug!("send vmm response: {:?}", response);
self.send_response(response)
}
fn send_response(&self, result: VmmRequestResult) -> Result<()> {
self.to_api
.send(Box::new(result))
.map_err(|_| ())
.expect("vmm: one-shot API result channel has been closed");
Ok(())
}
fn configure_boot_source(
&self,
vmm: &mut Vmm,
boot_source_config: BootSourceConfig,
) -> VmmRequestResult {
use super::BootSourceConfigError::{
InvalidInitrdPath, InvalidKernelCommandLine, InvalidKernelPath,
UpdateNotAllowedPostBoot,
};
use super::VmmActionError::BootSource;
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
if vm.is_vm_initialized() {
return Err(BootSource(UpdateNotAllowedPostBoot));
}
let kernel_file = File::open(&boot_source_config.kernel_path)
.map_err(|e| BootSource(InvalidKernelPath(e)))?;
let initrd_file = match boot_source_config.initrd_path {
None => None,
Some(ref path) => Some(File::open(path).map_err(|e| BootSource(InvalidInitrdPath(e)))?),
};
let mut cmdline = linux_loader::cmdline::Cmdline::new(dbs_boot::layout::CMDLINE_MAX_SIZE);
let boot_args = boot_source_config
.boot_args
.clone()
.unwrap_or_else(|| String::from(DEFAULT_KERNEL_CMDLINE));
cmdline
.insert_str(boot_args)
.map_err(|e| BootSource(InvalidKernelCommandLine(e)))?;
let kernel_config = KernelConfigInfo::new(kernel_file, initrd_file, cmdline);
vm.set_kernel_config(kernel_config);
Ok(VmmData::Empty)
}
fn start_microvm(&mut self, vmm: &mut Vmm, event_mgr: &mut EventManager) -> VmmRequestResult {
use self::StartMicroVmError::MicroVMAlreadyRunning;
use self::VmmActionError::StartMicroVm;
let vmm_seccomp_filter = vmm.vmm_seccomp_filter();
let vcpu_seccomp_filter = vmm.vcpu_seccomp_filter();
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
if vm.is_vm_initialized() {
return Err(StartMicroVm(MicroVMAlreadyRunning));
}
vm.start_microvm(event_mgr, vmm_seccomp_filter, vcpu_seccomp_filter)
.map(|_| VmmData::Empty)
.map_err(StartMicroVm)
}
fn shutdown_microvm(&mut self, vmm: &mut Vmm) -> VmmRequestResult {
vmm.event_ctx.exit_evt_triggered = true;
Ok(VmmData::Empty)
}
/// Set virtual machine configuration.
pub fn set_vm_configuration(
&mut self,
vmm: &mut Vmm,
machine_config: VmConfigInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
if vm.is_vm_initialized() {
return Err(MachineConfig(UpdateNotAllowedPostBoot));
}
// All checks passed; apply the new configuration as a whole.
let mut config = vm.vm_config().clone();
if config.vcpu_count != machine_config.vcpu_count {
let vcpu_count = machine_config.vcpu_count;
// Check that the vcpu_count value is >=1.
if vcpu_count == 0 {
return Err(MachineConfig(InvalidVcpuCount(vcpu_count)));
}
config.vcpu_count = vcpu_count;
}
if config.cpu_topology != machine_config.cpu_topology {
let cpu_topology = &machine_config.cpu_topology;
config.cpu_topology = handle_cpu_topology(cpu_topology, config.vcpu_count)?.clone();
} else {
// No topology requested: build the default one (single socket, single die,
// one thread per core, with cores_per_die covering all vcpus).
let mut default_cpu_topology = CpuTopology {
threads_per_core: 1,
cores_per_die: config.vcpu_count,
dies_per_socket: 1,
sockets: 1,
};
if machine_config.max_vcpu_count > config.vcpu_count {
default_cpu_topology.cores_per_die = machine_config.max_vcpu_count;
}
config.cpu_topology = default_cpu_topology;
}
let cpu_topology = &config.cpu_topology;
let max_vcpu_from_topo = cpu_topology.threads_per_core
* cpu_topology.cores_per_die
* cpu_topology.dies_per_socket
* cpu_topology.sockets;
// If the max_vcpu_count inferred from cpu_topology differs from the requested
// max_vcpu_count, the topology-derived value wins. Currently, max_vcpu_count is
// only used when cpu_topology is not defined, to set cores_per_die for the
// default topology.
let mut max_vcpu_count = machine_config.max_vcpu_count;
if max_vcpu_count < config.vcpu_count {
return Err(MachineConfig(InvalidMaxVcpuCount(max_vcpu_count)));
}
if max_vcpu_from_topo != max_vcpu_count {
max_vcpu_count = max_vcpu_from_topo;
info!("Since max_vcpu_count is not equal to cpu topo information, we have changed the max vcpu count to {}", max_vcpu_from_topo);
}
config.max_vcpu_count = max_vcpu_count;
config.cpu_pm = machine_config.cpu_pm;
config.mem_type = machine_config.mem_type;
let mem_size_mib_value = machine_config.mem_size_mib;
// Support 1TB memory at most, 2MB aligned for huge page.
if mem_size_mib_value == 0 || mem_size_mib_value > 0x10_0000 || mem_size_mib_value % 2 != 0
{
return Err(MachineConfig(InvalidMemorySize(mem_size_mib_value)));
}
config.mem_size_mib = mem_size_mib_value;
config.mem_file_path = machine_config.mem_file_path.clone();
if config.mem_type == "hugetlbfs" && config.mem_file_path.is_empty() {
return Err(MachineConfig(InvalidMemFilePath("".to_owned())));
}
config.vpmu_feature = machine_config.vpmu_feature;
let vm_id = vm.shared_info().read().unwrap().id.clone();
let serial_path = match machine_config.serial_path {
Some(value) => value,
None => {
if config.serial_path.is_none() {
String::from("/run/dragonball/") + &vm_id + "_com1"
} else {
// Safe to unwrap() because we have checked it has a value.
config.serial_path.as_ref().unwrap().clone()
}
}
};
config.serial_path = Some(serial_path);
vm.set_vm_config(config.clone());
self.machine_config = config;
Ok(VmmData::Empty)
}
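The memory-size check in `set_vm_configuration` accepts values that are non-zero, at most 1 TiB (0x10_0000 MiB), and even (2 MiB aligned for huge pages). A minimal stand-alone sketch of that validation; the function name is illustrative, not from the VMM:

```rust
/// Validate a guest memory size in MiB the way set_vm_configuration does:
/// non-zero, at most 1 TiB, and 2 MiB aligned for hugetlbfs backing.
fn is_valid_mem_size_mib(mem_size_mib: u64) -> bool {
    mem_size_mib != 0 && mem_size_mib <= 0x10_0000 && mem_size_mib % 2 == 0
}

fn main() {
    assert!(is_valid_mem_size_mib(1024)); // 1 GiB, aligned
    assert!(!is_valid_mem_size_mib(0)); // zero is rejected
    assert!(!is_valid_mem_size_mib(1025)); // odd, not 2 MiB aligned
    assert!(!is_valid_mem_size_mib(0x10_0002)); // above 1 TiB
}
```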
#[cfg(feature = "virtio-vsock")]
fn add_vsock_device(&self, vmm: &mut Vmm, config: VsockDeviceConfigInfo) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
if vm.is_vm_initialized() {
return Err(VmmActionError::Vsock(
VsockDeviceError::UpdateNotAllowedPostBoot,
));
}
// VMADDR_CID_ANY (-1U) means any address for binding;
// VMADDR_CID_HYPERVISOR (0) is reserved for services built into the hypervisor;
// VMADDR_CID_RESERVED (1) must not be used;
// VMADDR_CID_HOST (2) is the well-known address of the host.
if config.guest_cid <= 2 {
return Err(VmmActionError::Vsock(VsockDeviceError::GuestCIDInvalid(
config.guest_cid,
)));
}
info!("add_vsock_device: {:?}", config);
let ctx = vm.create_device_op_context(None).map_err(|e| {
info!("create device op context error: {:?}", e);
VmmActionError::Vsock(VsockDeviceError::UpdateNotAllowedPostBoot)
})?;
vm.device_manager_mut()
.vsock_manager
.insert_device(ctx, config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::Vsock)
}
#[cfg(feature = "virtio-blk")]
// Only call this function as part of the API.
// If the drive_id does not exist, a new Block Device Config is added to the list.
fn add_block_device(
&mut self,
vmm: &mut Vmm,
event_mgr: &mut EventManager,
config: BlockDeviceConfigInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
let ctx = vm
.create_device_op_context(Some(event_mgr.epoll_manager()))
.map_err(|e| {
if let StartMicroVmError::UpcallNotReady = e {
return VmmActionError::UpcallNotReady;
}
VmmActionError::Block(BlockDeviceError::UpdateNotAllowedPostBoot)
})?;
BlockDeviceMgr::insert_device(vm.device_manager_mut(), ctx, config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::Block)
}
#[cfg(feature = "virtio-blk")]
/// Updates the rate limiters of an emulated block device as described in `config`.
fn update_blk_rate_limiters(
&mut self,
vmm: &mut Vmm,
config: BlockDeviceConfigUpdateInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
BlockDeviceMgr::update_device_ratelimiters(vm.device_manager_mut(), config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::Block)
}
#[cfg(feature = "virtio-blk")]
// Remove the block device identified by `drive_id`.
fn remove_block_device(
&mut self,
vmm: &mut Vmm,
event_mgr: &mut EventManager,
drive_id: &str,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
let ctx = vm
.create_device_op_context(Some(event_mgr.epoll_manager()))
.map_err(|_| VmmActionError::Block(BlockDeviceError::UpdateNotAllowedPostBoot))?;
BlockDeviceMgr::remove_device(vm.device_manager_mut(), ctx, drive_id)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::Block)
}
#[cfg(feature = "virtio-net")]
fn add_virtio_net_device(
&mut self,
vmm: &mut Vmm,
event_mgr: &mut EventManager,
config: VirtioNetDeviceConfigInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
let ctx = vm
.create_device_op_context(Some(event_mgr.epoll_manager()))
.map_err(|e| {
if let StartMicroVmError::MicroVMAlreadyRunning = e {
VmmActionError::VirtioNet(VirtioNetDeviceError::UpdateNotAllowedPostBoot)
} else if let StartMicroVmError::UpcallNotReady = e {
VmmActionError::UpcallNotReady
} else {
VmmActionError::StartMicroVm(e)
}
})?;
VirtioNetDeviceMgr::insert_device(vm.device_manager_mut(), ctx, config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::VirtioNet)
}
#[cfg(feature = "virtio-net")]
fn update_net_rate_limiters(
&mut self,
vmm: &mut Vmm,
config: VirtioNetDeviceConfigUpdateInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
VirtioNetDeviceMgr::update_device_ratelimiters(vm.device_manager_mut(), config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::VirtioNet)
}
#[cfg(feature = "virtio-fs")]
fn add_fs_device(&mut self, vmm: &mut Vmm, config: FsDeviceConfigInfo) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
let hotplug = vm.is_vm_initialized();
if !cfg!(feature = "hotplug") && hotplug {
return Err(VmmActionError::FsDevice(
FsDeviceError::UpdateNotAllowedPostBoot,
));
}
let ctx = vm.create_device_op_context(None).map_err(|e| {
info!("create device op context error: {:?}", e);
VmmActionError::FsDevice(FsDeviceError::UpdateNotAllowedPostBoot)
})?;
FsDeviceMgr::insert_device(vm.device_manager_mut(), ctx, config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::FsDevice)
}
#[cfg(feature = "virtio-fs")]
fn manipulate_fs_backend_fs(
&self,
vmm: &mut Vmm,
config: FsMountConfigInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
if !vm.is_vm_initialized() {
return Err(VmmActionError::FsDevice(FsDeviceError::MicroVMNotRunning));
}
FsDeviceMgr::manipulate_backend_fs(vm.device_manager_mut(), config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::FsDevice)
}
#[cfg(feature = "virtio-fs")]
fn update_fs_rate_limiters(
&self,
vmm: &mut Vmm,
config: FsDeviceConfigUpdateInfo,
) -> VmmRequestResult {
let vm = vmm.get_vm_mut().ok_or(VmmActionError::InvalidVMID)?;
if !vm.is_vm_initialized() {
return Err(VmmActionError::FsDevice(FsDeviceError::MicroVMNotRunning));
}
FsDeviceMgr::update_device_ratelimiters(vm.device_manager_mut(), config)
.map(|_| VmmData::Empty)
.map_err(VmmActionError::FsDevice)
}
}
fn handle_cpu_topology(
cpu_topology: &CpuTopology,
vcpu_count: u8,
) -> std::result::Result<&CpuTopology, VmmActionError> {
// Check that threads_per_core, cores_per_die, dies_per_socket and sockets are valid.
if cpu_topology.threads_per_core < 1 || cpu_topology.threads_per_core > 2 {
return Err(MachineConfig(InvalidThreadsPerCore(
cpu_topology.threads_per_core,
)));
}
let vcpu_count_from_topo = cpu_topology
.sockets
.checked_mul(cpu_topology.dies_per_socket)
.ok_or(MachineConfig(VcpuCountExceedsMaximum))?
.checked_mul(cpu_topology.cores_per_die)
.ok_or(MachineConfig(VcpuCountExceedsMaximum))?
.checked_mul(cpu_topology.threads_per_core)
.ok_or(MachineConfig(VcpuCountExceedsMaximum))?;
if vcpu_count_from_topo > MAX_SUPPORTED_VCPUS {
return Err(MachineConfig(VcpuCountExceedsMaximum));
}
if vcpu_count_from_topo < vcpu_count {
return Err(MachineConfig(InvalidCpuTopology(vcpu_count_from_topo)));
}
Ok(cpu_topology)
}
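`handle_cpu_topology` derives the vcpu count as sockets × dies_per_socket × cores_per_die × threads_per_core, chaining `checked_mul` so the `u8` product cannot silently wrap. A self-contained sketch of that overflow-safe product; the struct mirrors the fields used above, and a `None` result corresponds to the `VcpuCountExceedsMaximum` error path:

```rust
struct CpuTopology {
    threads_per_core: u8,
    cores_per_die: u8,
    dies_per_socket: u8,
    sockets: u8,
}

/// Overflow-safe product of the topology dimensions. `None` means the
/// u8 multiplication would wrap, which the caller treats as "too many vcpus".
fn vcpu_count_from_topology(t: &CpuTopology) -> Option<u8> {
    t.sockets
        .checked_mul(t.dies_per_socket)?
        .checked_mul(t.cores_per_die)?
        .checked_mul(t.threads_per_core)
}

fn main() {
    let small = CpuTopology { threads_per_core: 2, cores_per_die: 4, dies_per_socket: 1, sockets: 1 };
    assert_eq!(vcpu_count_from_topology(&small), Some(8));
    // 2 * 200 = 400 does not fit in a u8, so the product is rejected.
    let huge = CpuTopology { threads_per_core: 2, cores_per_die: 200, dies_per_socket: 1, sockets: 1 };
    assert_eq!(vcpu_count_from_topology(&huge), None);
}
```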


@@ -0,0 +1,823 @@
// Copyright (C) 2020-2022 Alibaba Cloud. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use std::convert::TryInto;
use std::io;
use std::ops::{Index, IndexMut};
use std::sync::Arc;
use dbs_device::DeviceIo;
use dbs_utils::rate_limiter::{RateLimiter, TokenBucket};
use serde_derive::{Deserialize, Serialize};
/// Get bucket update for rate limiter.
#[macro_export]
macro_rules! get_bucket_update {
($self:ident, $rate_limiter: ident, $metric: ident) => {{
match &$self.$rate_limiter {
Some(rl_cfg) => {
let tb_cfg = &rl_cfg.$metric;
dbs_utils::rate_limiter::RateLimiter::make_bucket(
tb_cfg.size,
tb_cfg.one_time_burst,
tb_cfg.refill_time,
)
// Update the active rate-limiter.
.map(dbs_utils::rate_limiter::BucketUpdate::Update)
// Deactivate the rate-limiter when no bucket could be built.
.unwrap_or(dbs_utils::rate_limiter::BucketUpdate::Disabled)
}
// No update to the rate-limiter.
None => dbs_utils::rate_limiter::BucketUpdate::None,
}
}};
}
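The macro above maps an optional rate-limiter config to one of three outcomes: no config leaves the limiter untouched, a config that yields a bucket updates it, and a config that yields no bucket disables it. A stand-in sketch of that three-way mapping; the enum mirrors `dbs_utils::rate_limiter::BucketUpdate`, and `make_bucket` here is a simplified placeholder, not the real constructor:

```rust
#[derive(Debug, PartialEq)]
enum BucketUpdate {
    /// Replace the active bucket (real code carries a TokenBucket; u64 stands in).
    Update(u64),
    /// Turn rate limiting off.
    Disabled,
    /// Leave the current bucket untouched.
    None,
}

/// Simplified stand-in for RateLimiter::make_bucket: a zero-size
/// bucket is meaningless, so no bucket is built.
fn make_bucket(size: u64) -> Option<u64> {
    if size == 0 { None } else { Some(size) }
}

/// The same three-way mapping the macro expands to.
fn bucket_update(cfg: Option<u64>) -> BucketUpdate {
    match cfg {
        Some(size) => make_bucket(size)
            .map(BucketUpdate::Update)
            .unwrap_or(BucketUpdate::Disabled),
        None => BucketUpdate::None,
    }
}

fn main() {
    assert_eq!(bucket_update(None), BucketUpdate::None);
    assert_eq!(bucket_update(Some(0)), BucketUpdate::Disabled);
    assert_eq!(bucket_update(Some(1024)), BucketUpdate::Update(1024));
}
```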
/// Trait for generic configuration information.
pub trait ConfigItem {
/// Related errors.
type Err;
/// Get the unique identifier of the configuration item.
fn id(&self) -> &str;
/// Check whether current configuration item conflicts with another one.
fn check_conflicts(&self, other: &Self) -> std::result::Result<(), Self::Err>;
}
/// Struct to manage a group of configuration items.
#[derive(Debug, Default, Deserialize, PartialEq, Serialize)]
pub struct ConfigInfos<T>
where
T: ConfigItem + Clone,
{
configs: Vec<T>,
}
impl<T> ConfigInfos<T>
where
T: ConfigItem + Clone + Default,
{
/// Constructor
pub fn new() -> Self {
ConfigInfos::default()
}
/// Insert a configuration item in the group.
pub fn insert(&mut self, config: T) -> std::result::Result<(), T::Err> {
for item in self.configs.iter() {
config.check_conflicts(item)?;
}
self.configs.push(config);
Ok(())
}
/// Update a configuration item in the group.
pub fn update(&mut self, config: T, err: T::Err) -> std::result::Result<(), T::Err> {
match self.get_index_by_id(&config) {
None => Err(err),
Some(index) => {
for (idx, item) in self.configs.iter().enumerate() {
if idx != index {
config.check_conflicts(item)?;
}
}
self.configs[index] = config;
Ok(())
}
}
}
/// Insert or update a configuration item in the group.
pub fn insert_or_update(&mut self, config: T) -> std::result::Result<(), T::Err> {
match self.get_index_by_id(&config) {
None => {
for item in self.configs.iter() {
config.check_conflicts(item)?;
}
self.configs.push(config)
}
Some(index) => {
for (idx, item) in self.configs.iter().enumerate() {
if idx != index {
config.check_conflicts(item)?;
}
}
self.configs[index] = config;
}
}
Ok(())
}
/// Remove the matching configuration entry.
pub fn remove(&mut self, config: &T) -> Option<T> {
if let Some(index) = self.get_index_by_id(config) {
Some(self.configs.remove(index))
} else {
None
}
}
/// Returns an immutable iterator over the config items
pub fn iter(&self) -> ::std::slice::Iter<T> {
self.configs.iter()
}
/// Get the configuration entry with matching ID.
pub fn get_by_id(&self, item: &T) -> Option<&T> {
let id = item.id();
self.configs.iter().rfind(|cfg| cfg.id() == id)
}
fn get_index_by_id(&self, item: &T) -> Option<usize> {
let id = item.id();
self.configs.iter().position(|cfg| cfg.id() == id)
}
}
impl<T> Clone for ConfigInfos<T>
where
T: ConfigItem + Clone,
{
fn clone(&self) -> Self {
ConfigInfos {
configs: self.configs.clone(),
}
}
}
/// Struct to maintain configuration information for a device.
pub struct DeviceConfigInfo<T>
where
T: ConfigItem + Clone,
{
/// Configuration information for the device object.
pub config: T,
/// The associated device object.
pub device: Option<Arc<dyn DeviceIo>>,
}
impl<T> DeviceConfigInfo<T>
where
T: ConfigItem + Clone,
{
/// Create a new instance of [`DeviceConfigInfo`].
pub fn new(config: T) -> Self {
DeviceConfigInfo {
config,
device: None,
}
}
/// Create a new instance of [`DeviceConfigInfo`] with an optional device.
pub fn new_with_device(config: T, device: Option<Arc<dyn DeviceIo>>) -> Self {
DeviceConfigInfo { config, device }
}
/// Set the device object associated with the configuration.
pub fn set_device(&mut self, device: Arc<dyn DeviceIo>) {
self.device = Some(device);
}
}
impl<T> Clone for DeviceConfigInfo<T>
where
T: ConfigItem + Clone,
{
fn clone(&self) -> Self {
DeviceConfigInfo::new_with_device(self.config.clone(), self.device.clone())
}
}
/// Struct to maintain configuration information for a group of devices.
pub struct DeviceConfigInfos<T>
where
T: ConfigItem + Clone,
{
info_list: Vec<DeviceConfigInfo<T>>,
}
impl<T> Default for DeviceConfigInfos<T>
where
T: ConfigItem + Clone,
{
fn default() -> Self {
Self::new()
}
}
impl<T> DeviceConfigInfos<T>
where
T: ConfigItem + Clone,
{
/// Create a new instance of [`DeviceConfigInfos`].
pub fn new() -> Self {
DeviceConfigInfos {
info_list: Vec::new(),
}
}
/// Insert or update configuration information for a device.
pub fn insert_or_update(&mut self, config: &T) -> std::result::Result<usize, T::Err> {
let device_info = DeviceConfigInfo::new(config.clone());
Ok(match self.get_index_by_id(config) {
Some(index) => {
for (idx, info) in self.info_list.iter().enumerate() {
if idx != index {
info.config.check_conflicts(config)?;
}
}
self.info_list[index] = device_info;
index
}
None => {
for info in self.info_list.iter() {
info.config.check_conflicts(config)?;
}
self.info_list.push(device_info);
self.info_list.len() - 1
}
})
}
/// Remove a device configuration information object.
pub fn remove(&mut self, index: usize) -> Option<DeviceConfigInfo<T>> {
if self.info_list.len() > index {
Some(self.info_list.remove(index))
} else {
None
}
}
/// Get number of device configuration information objects.
pub fn len(&self) -> usize {
self.info_list.len()
}
/// Returns true if there are no device configuration information objects.
pub fn is_empty(&self) -> bool {
self.info_list.len() == 0
}
/// Add a device configuration information object at the tail.
pub fn push(&mut self, info: DeviceConfigInfo<T>) {
self.info_list.push(info);
}
/// Iterator for configuration information objects.
pub fn iter(&self) -> std::slice::Iter<DeviceConfigInfo<T>> {
self.info_list.iter()
}
/// Mutable iterator for configuration information objects.
pub fn iter_mut(&mut self) -> std::slice::IterMut<DeviceConfigInfo<T>> {
self.info_list.iter_mut()
}
fn get_index_by_id(&self, config: &T) -> Option<usize> {
self.info_list
.iter()
.position(|info| info.config.id().eq(config.id()))
}
}
impl<T> Index<usize> for DeviceConfigInfos<T>
where
T: ConfigItem + Clone,
{
type Output = DeviceConfigInfo<T>;
fn index(&self, idx: usize) -> &Self::Output {
&self.info_list[idx]
}
}
impl<T> IndexMut<usize> for DeviceConfigInfos<T>
where
T: ConfigItem + Clone,
{
fn index_mut(&mut self, idx: usize) -> &mut Self::Output {
&mut self.info_list[idx]
}
}
impl<T> Clone for DeviceConfigInfos<T>
where
T: ConfigItem + Clone,
{
fn clone(&self) -> Self {
DeviceConfigInfos {
info_list: self.info_list.clone(),
}
}
}
/// Configuration information for RateLimiter token bucket.
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
pub struct TokenBucketConfigInfo {
/// The size for the token bucket. A TokenBucket of `size` total capacity will take `refill_time`
/// milliseconds to go from zero tokens to total capacity.
pub size: u64,
/// Number of free initial tokens, that can be consumed at no cost.
pub one_time_burst: u64,
/// Complete refill time in milliseconds.
pub refill_time: u64,
}
impl TokenBucketConfigInfo {
fn resize(&mut self, n: u64) {
if n != 0 {
self.size /= n;
self.one_time_burst /= n;
}
}
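`resize` divides the bucket size and one-time burst by `n` using integer division, leaves `refill_time` unchanged, and treats `n == 0` as a no-op rather than dividing by zero. A minimal sketch of that behavior on a stand-in struct:

```rust
#[derive(Debug, PartialEq)]
struct TokenBucketCfg {
    size: u64,
    one_time_burst: u64,
    refill_time: u64,
}

impl TokenBucketCfg {
    /// Shrink the bucket to 1/n of its budget; refill_time is untouched
    /// and n == 0 is ignored to avoid dividing by zero.
    fn resize(&mut self, n: u64) {
        if n != 0 {
            self.size /= n;
            self.one_time_burst /= n;
        }
    }
}

fn main() {
    let mut cfg = TokenBucketCfg { size: 1000, one_time_burst: 100, refill_time: 250 };
    cfg.resize(4);
    assert_eq!(cfg, TokenBucketCfg { size: 250, one_time_burst: 25, refill_time: 250 });
    cfg.resize(0); // no-op
    assert_eq!(cfg.size, 250);
}
```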
}
impl From<TokenBucketConfigInfo> for TokenBucket {
fn from(t: TokenBucketConfigInfo) -> TokenBucket {
(&t).into()
}
}
impl From<&TokenBucketConfigInfo> for TokenBucket {
fn from(t: &TokenBucketConfigInfo) -> TokenBucket {
TokenBucket::new(t.size, t.one_time_burst, t.refill_time)
}
}
/// Configuration information for RateLimiter objects.
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
pub struct RateLimiterConfigInfo {
/// Data used to initialize the RateLimiter::bandwidth bucket.
pub bandwidth: TokenBucketConfigInfo,
/// Data used to initialize the RateLimiter::ops bucket.
pub ops: TokenBucketConfigInfo,
}
impl RateLimiterConfigInfo {
/// Update the bandwidth budget configuration.
pub fn update_bandwidth(&mut self, new_config: TokenBucketConfigInfo) {
self.bandwidth = new_config;
}
/// Update the ops budget configuration.
pub fn update_ops(&mut self, new_config: TokenBucketConfigInfo) {
self.ops = new_config;
}
/// Resize the limiter to 1/n of its original budget.
pub fn resize(&mut self, n: u64) {
self.bandwidth.resize(n);
self.ops.resize(n);
}
}
impl TryInto<RateLimiter> for &RateLimiterConfigInfo {
type Error = io::Error;
fn try_into(self) -> Result<RateLimiter, Self::Error> {
RateLimiter::new(
self.bandwidth.size,
self.bandwidth.one_time_burst,
self.bandwidth.refill_time,
self.ops.size,
self.ops.one_time_burst,
self.ops.refill_time,
)
}
}
impl TryInto<RateLimiter> for RateLimiterConfigInfo {
type Error = io::Error;
fn try_into(self) -> Result<RateLimiter, Self::Error> {
RateLimiter::new(
self.bandwidth.size,
self.bandwidth.one_time_burst,
self.bandwidth.refill_time,
self.ops.size,
self.ops.one_time_burst,
self.ops.refill_time,
)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[derive(Debug, thiserror::Error)]
pub enum DummyError {
#[error("configuration entry exists")]
Exist,
}
#[derive(Clone, Debug, Default)]
pub struct DummyConfigInfo {
id: String,
content: String,
}
impl ConfigItem for DummyConfigInfo {
type Err = DummyError;
fn id(&self) -> &str {
&self.id
}
fn check_conflicts(&self, other: &Self) -> Result<(), DummyError> {
if self.id == other.id || self.content == other.content {
Err(DummyError::Exist)
} else {
Ok(())
}
}
}
type DummyConfigInfos = ConfigInfos<DummyConfigInfo>;
#[test]
fn test_insert_config_info() {
let mut configs = DummyConfigInfos::new();
let config1 = DummyConfigInfo {
id: "1".to_owned(),
content: "a".to_owned(),
};
configs.insert(config1).unwrap();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "a");
// Test case: cannot insert new item with the same id.
let config2 = DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
};
configs.insert(config2).unwrap_err();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "a");
let config3 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.insert(config3).unwrap();
assert_eq!(configs.configs.len(), 2);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "a");
assert_eq!(configs.configs[1].id, "2");
assert_eq!(configs.configs[1].content, "c");
// Test case: cannot insert new item with the same content.
let config4 = DummyConfigInfo {
id: "3".to_owned(),
content: "c".to_owned(),
};
configs.insert(config4).unwrap_err();
assert_eq!(configs.configs.len(), 2);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "a");
assert_eq!(configs.configs[1].id, "2");
assert_eq!(configs.configs[1].content, "c");
}
#[test]
fn test_update_config_info() {
let mut configs = DummyConfigInfos::new();
let config1 = DummyConfigInfo {
id: "1".to_owned(),
content: "a".to_owned(),
};
configs.insert(config1).unwrap();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "a");
// Test case: succeed to update an existing entry
let config2 = DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
};
configs.update(config2, DummyError::Exist).unwrap();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "b");
// Test case: cannot update a non-existing entry
let config3 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.update(config3, DummyError::Exist).unwrap_err();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "b");
// Test case: cannot update an entry with conflicting content
let config4 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.insert(config4).unwrap();
let config5 = DummyConfigInfo {
id: "1".to_owned(),
content: "c".to_owned(),
};
configs.update(config5, DummyError::Exist).unwrap_err();
}
#[test]
fn test_insert_or_update_config_info() {
let mut configs = DummyConfigInfos::new();
let config1 = DummyConfigInfo {
id: "1".to_owned(),
content: "a".to_owned(),
};
configs.insert_or_update(config1).unwrap();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "a");
// Test case: succeed to update an existing entry
let config2 = DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
};
configs.insert_or_update(config2.clone()).unwrap();
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "b");
// Add a second entry
let config3 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.insert_or_update(config3.clone()).unwrap();
assert_eq!(configs.configs.len(), 2);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "b");
assert_eq!(configs.configs[1].id, "2");
assert_eq!(configs.configs[1].content, "c");
// Lookup the first entry
let config4 = configs
.get_by_id(&DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
})
.unwrap();
assert_eq!(config4.id, config2.id);
assert_eq!(config4.content, config2.content);
// Lookup the second entry
let config5 = configs
.get_by_id(&DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
})
.unwrap();
assert_eq!(config5.id, config3.id);
assert_eq!(config5.content, config3.content);
// Test case: can't insert an entry with conflicting content
let config6 = DummyConfigInfo {
id: "3".to_owned(),
content: "c".to_owned(),
};
configs.insert_or_update(config6).unwrap_err();
assert_eq!(configs.configs.len(), 2);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "b");
assert_eq!(configs.configs[1].id, "2");
assert_eq!(configs.configs[1].content, "c");
}
#[test]
fn test_remove_config_info() {
let mut configs = DummyConfigInfos::new();
let config1 = DummyConfigInfo {
id: "1".to_owned(),
content: "a".to_owned(),
};
configs.insert_or_update(config1).unwrap();
let config2 = DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
};
configs.insert_or_update(config2.clone()).unwrap();
let config3 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.insert_or_update(config3.clone()).unwrap();
assert_eq!(configs.configs.len(), 2);
assert_eq!(configs.configs[0].id, "1");
assert_eq!(configs.configs[0].content, "b");
assert_eq!(configs.configs[1].id, "2");
assert_eq!(configs.configs[1].content, "c");
let config4 = configs
.remove(&DummyConfigInfo {
id: "1".to_owned(),
content: "no value".to_owned(),
})
.unwrap();
assert_eq!(config4.id, config2.id);
assert_eq!(config4.content, config2.content);
assert_eq!(configs.configs.len(), 1);
assert_eq!(configs.configs[0].id, "2");
assert_eq!(configs.configs[0].content, "c");
let config5 = configs
.remove(&DummyConfigInfo {
id: "2".to_owned(),
content: "no value".to_owned(),
})
.unwrap();
assert_eq!(config5.id, config3.id);
assert_eq!(config5.content, config3.content);
assert_eq!(configs.configs.len(), 0);
}
type DummyDeviceInfoList = DeviceConfigInfos<DummyConfigInfo>;
#[test]
fn test_insert_or_update_device_info() {
let mut configs = DummyDeviceInfoList::new();
let config1 = DummyConfigInfo {
id: "1".to_owned(),
content: "a".to_owned(),
};
configs.insert_or_update(&config1).unwrap();
assert_eq!(configs.len(), 1);
assert_eq!(configs[0].config.id, "1");
assert_eq!(configs[0].config.content, "a");
// Test case: succeed to update an existing entry
let config2 = DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
};
configs.insert_or_update(&config2).unwrap();
assert_eq!(configs.len(), 1);
assert_eq!(configs[0].config.id, "1");
assert_eq!(configs[0].config.content, "b");
// Add a second entry
let config3 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.insert_or_update(&config3).unwrap();
assert_eq!(configs.len(), 2);
assert_eq!(configs[0].config.id, "1");
assert_eq!(configs[0].config.content, "b");
assert_eq!(configs[1].config.id, "2");
assert_eq!(configs[1].config.content, "c");
// Lookup the first entry
let config4_id = configs
.get_index_by_id(&DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
})
.unwrap();
let config4 = &configs[config4_id].config;
assert_eq!(config4.id, config2.id);
assert_eq!(config4.content, config2.content);
// Lookup the second entry
let config5_id = configs
.get_index_by_id(&DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
})
.unwrap();
let config5 = &configs[config5_id].config;
assert_eq!(config5.id, config3.id);
assert_eq!(config5.content, config3.content);
// Test case: can't insert an entry with conflicting content
let config6 = DummyConfigInfo {
id: "3".to_owned(),
content: "c".to_owned(),
};
configs.insert_or_update(&config6).unwrap_err();
assert_eq!(configs.len(), 2);
assert_eq!(configs[0].config.id, "1");
assert_eq!(configs[0].config.content, "b");
assert_eq!(configs[1].config.id, "2");
assert_eq!(configs[1].config.content, "c");
}
#[test]
fn test_remove_device_info() {
let mut configs = DummyDeviceInfoList::new();
let config1 = DummyConfigInfo {
id: "1".to_owned(),
content: "a".to_owned(),
};
configs.insert_or_update(&config1).unwrap();
let config2 = DummyConfigInfo {
id: "1".to_owned(),
content: "b".to_owned(),
};
configs.insert_or_update(&config2).unwrap();
let config3 = DummyConfigInfo {
id: "2".to_owned(),
content: "c".to_owned(),
};
configs.insert_or_update(&config3).unwrap();
assert_eq!(configs.len(), 2);
assert_eq!(configs[0].config.id, "1");
assert_eq!(configs[0].config.content, "b");
assert_eq!(configs[1].config.id, "2");
assert_eq!(configs[1].config.content, "c");
let config4 = configs.remove(0).unwrap().config;
assert_eq!(config4.id, config2.id);
assert_eq!(config4.content, config2.content);
assert_eq!(configs.len(), 1);
assert_eq!(configs[0].config.id, "2");
assert_eq!(configs[0].config.content, "c");
let config5 = configs.remove(0).unwrap().config;
assert_eq!(config5.id, config3.id);
assert_eq!(config5.content, config3.content);
assert_eq!(configs.len(), 0);
}
#[test]
fn test_rate_limiter_configs() {
const SIZE: u64 = 1024 * 1024;
const ONE_TIME_BURST: u64 = 1024;
const REFILL_TIME: u64 = 1000;
let c: TokenBucketConfigInfo = TokenBucketConfigInfo {
size: SIZE,
one_time_burst: ONE_TIME_BURST,
refill_time: REFILL_TIME,
};
let b: TokenBucket = c.into();
assert_eq!(b.capacity(), SIZE);
assert_eq!(b.one_time_burst(), ONE_TIME_BURST);
assert_eq!(b.refill_time_ms(), REFILL_TIME);
let mut rlc = RateLimiterConfigInfo {
bandwidth: TokenBucketConfigInfo {
size: SIZE,
one_time_burst: ONE_TIME_BURST,
refill_time: REFILL_TIME,
},
ops: TokenBucketConfigInfo {
size: SIZE * 2,
one_time_burst: 0,
refill_time: REFILL_TIME * 2,
},
};
let rl: RateLimiter = (&rlc).try_into().unwrap();
assert_eq!(rl.bandwidth().unwrap().capacity(), SIZE);
assert_eq!(rl.bandwidth().unwrap().one_time_burst(), ONE_TIME_BURST);
assert_eq!(rl.bandwidth().unwrap().refill_time_ms(), REFILL_TIME);
assert_eq!(rl.ops().unwrap().capacity(), SIZE * 2);
assert_eq!(rl.ops().unwrap().one_time_burst(), 0);
assert_eq!(rl.ops().unwrap().refill_time_ms(), REFILL_TIME * 2);
let bandwidth = TokenBucketConfigInfo {
size: SIZE * 2,
one_time_burst: ONE_TIME_BURST * 2,
refill_time: REFILL_TIME * 2,
};
rlc.update_bandwidth(bandwidth);
assert_eq!(rlc.bandwidth.size, SIZE * 2);
assert_eq!(rlc.bandwidth.one_time_burst, ONE_TIME_BURST * 2);
assert_eq!(rlc.bandwidth.refill_time, REFILL_TIME * 2);
assert_eq!(rlc.ops.size, SIZE * 2);
assert_eq!(rlc.ops.one_time_burst, 0);
assert_eq!(rlc.ops.refill_time, REFILL_TIME * 2);
let ops = TokenBucketConfigInfo {
size: SIZE * 3,
one_time_burst: ONE_TIME_BURST * 3,
refill_time: REFILL_TIME * 3,
};
rlc.update_ops(ops);
assert_eq!(rlc.bandwidth.size, SIZE * 2);
assert_eq!(rlc.bandwidth.one_time_burst, ONE_TIME_BURST * 2);
assert_eq!(rlc.bandwidth.refill_time, REFILL_TIME * 2);
assert_eq!(rlc.ops.size, SIZE * 3);
assert_eq!(rlc.ops.one_time_burst, ONE_TIME_BURST * 3);
assert_eq!(rlc.ops.refill_time, REFILL_TIME * 3);
}
}


@@ -0,0 +1,773 @@
// Copyright 2020-2022 Alibaba, Inc. or its affiliates. All Rights Reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
//! Device manager for virtio-blk and vhost-user-blk devices.
use std::collections::{vec_deque, VecDeque};
use std::convert::TryInto;
use std::fs::OpenOptions;
use std::os::unix::fs::OpenOptionsExt;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use dbs_virtio_devices as virtio;
use dbs_virtio_devices::block::{aio::Aio, io_uring::IoUring, Block, LocalFile, Ufile};
use serde_derive::{Deserialize, Serialize};
use crate::address_space_manager::GuestAddressSpaceImpl;
use crate::config_manager::{ConfigItem, DeviceConfigInfo, RateLimiterConfigInfo};
use crate::device_manager::blk_dev_mgr::BlockDeviceError::InvalidDeviceId;
use crate::device_manager::{DeviceManager, DeviceMgrError, DeviceOpContext};
use crate::get_bucket_update;
use crate::vm::KernelConfigInfo;
use super::DbsMmioV2Device;
// Whether to use a shared irq by default.
const USE_SHARED_IRQ: bool = true;
// Whether to use a generic irq by default.
const USE_GENERIC_IRQ: bool = true;
macro_rules! info(
($l:expr, $($args:tt)+) => {
slog::info!($l, $($args)+; slog::o!("subsystem" => "block_manager"))
};
);
macro_rules! error(
($l:expr, $($args:tt)+) => {
slog::error!($l, $($args)+; slog::o!("subsystem" => "block_manager"))
};
);
/// Default queue size for VirtIo block devices.
pub const QUEUE_SIZE: u16 = 128;
/// Errors associated with the operations allowed on a drive.
#[derive(Debug, thiserror::Error)]
pub enum BlockDeviceError {
/// Invalid VM instance ID.
#[error("invalid VM instance id")]
InvalidVMID,
/// The block device path is invalid.
#[error("invalid block device path '{0}'")]
InvalidBlockDevicePath(PathBuf),
/// The block device type is invalid.
#[error("invalid block device type")]
InvalidBlockDeviceType,
/// The block device path was already used for a different drive.
#[error("block device path '{0}' already exists")]
BlockDevicePathAlreadyExists(PathBuf),
/// The device id doesn't exist.
#[error("invalid block device id '{0}'")]
InvalidDeviceId(String),
/// Cannot perform the requested operation after booting the microVM.
#[error("block device does not support runtime update")]
UpdateNotAllowedPostBoot,
/// A root block device was already added.
#[error("could not add multiple virtual machine root devices")]
RootBlockDeviceAlreadyAdded,
/// Failed to send patch message to block epoll handler.
#[error("could not send patch message to the block epoll handler")]
BlockEpollHanderSendFail,
/// Failure from device manager,
#[error("device manager errors: {0}")]
DeviceManager(#[from] DeviceMgrError),
/// Failure from virtio subsystem.
#[error(transparent)]
Virtio(virtio::Error),
/// Unable to seek the block device backing file due to invalid permissions or
/// the file was deleted/corrupted.
#[error("cannot create block device: {0}")]
CreateBlockDevice(#[source] virtio::Error),
/// Cannot open the block device backing file.
#[error("cannot open the block device backing file: {0}")]
OpenBlockDevice(#[source] std::io::Error),
/// Cannot initialize a MMIO Block Device or add a device to the MMIO Bus.
#[error("failure while registering block device: {0}")]
RegisterBlockDevice(#[source] DeviceMgrError),
}
/// Type of low level storage device/protocol for virtio-blk devices.
#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
pub enum BlockDeviceType {
/// Unknown low level device type.
Unknown,
/// Vhost-user-blk based low level device.
/// SPOOL is a reliable NVMe virtualization system for the cloud environment.
/// You can learn more about SPOOL here: https://www.usenix.org/conference/atc20/presentation/xue
Spool,
/// Local disk/file based low level device.
RawBlock,
}
impl BlockDeviceType {
/// Get type of low level storage device/protocol by parsing `path`.
pub fn get_type(path: &str) -> BlockDeviceType {
// A SPOOL path starts with "spool:/", e.g. "spool:/device1".
if path.starts_with("spool:/") {
BlockDeviceType::Spool
} else {
BlockDeviceType::RawBlock
}
}
}
/// Configuration information for a block device.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct BlockDeviceConfigUpdateInfo {
/// Unique identifier of the drive.
pub drive_id: String,
/// Rate Limiter for I/O operations.
pub rate_limiter: Option<RateLimiterConfigInfo>,
}
impl BlockDeviceConfigUpdateInfo {
/// Provides a `BucketUpdate` description for the bandwidth rate limiter.
pub fn bytes(&self) -> dbs_utils::rate_limiter::BucketUpdate {
get_bucket_update!(self, rate_limiter, bandwidth)
}
/// Provides a `BucketUpdate` description for the ops rate limiter.
pub fn ops(&self) -> dbs_utils::rate_limiter::BucketUpdate {
get_bucket_update!(self, rate_limiter, ops)
}
}
/// Configuration information for a block device.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct BlockDeviceConfigInfo {
/// Unique identifier of the drive.
pub drive_id: String,
/// Type of low level storage/protocol.
pub device_type: BlockDeviceType,
/// Path of the drive.
pub path_on_host: PathBuf,
/// If set to true, it makes the current device the root block device.
/// Setting this flag to true will mount the block device in the
/// guest under /dev/vda unless the part_uuid is present.
pub is_root_device: bool,
/// Part-UUID. Represents the unique id of the boot partition of this device.
/// It is optional and it will be used only if the `is_root_device` field is true.
pub part_uuid: Option<String>,
/// If set to true, the drive is opened in read-only mode. Otherwise, the
/// drive is opened as read-write.
pub is_read_only: bool,
/// If set to false, the drive is opened in buffered I/O mode. Otherwise, the
/// drive is opened in direct I/O mode.
pub is_direct: bool,
/// Don't close `path_on_host` file when dropping the device.
pub no_drop: bool,
/// Block device multi-queue
pub num_queues: usize,
/// Virtio queue size (number of descriptors per queue).
pub queue_size: u16,
/// Rate Limiter for I/O operations.
pub rate_limiter: Option<RateLimiterConfigInfo>,
/// Use shared irq
pub use_shared_irq: Option<bool>,
/// Use generic irq
pub use_generic_irq: Option<bool>,
}
impl std::default::Default for BlockDeviceConfigInfo {
fn default() -> Self {
Self {
drive_id: String::default(),
device_type: BlockDeviceType::RawBlock,
path_on_host: PathBuf::default(),
is_root_device: false,
part_uuid: None,
is_read_only: false,
is_direct: Self::default_direct(),
no_drop: Self::default_no_drop(),
num_queues: Self::default_num_queues(),
queue_size: 256,
rate_limiter: None,
use_shared_irq: None,
use_generic_irq: None,
}
}
}
impl BlockDeviceConfigInfo {
/// Get default queue numbers
pub fn default_num_queues() -> usize {
1
}
/// Get default value of is_direct switch
pub fn default_direct() -> bool {
true
}
/// Get default value of no_drop switch
pub fn default_no_drop() -> bool {
false
}
/// Get type of low level storage/protocol.
pub fn device_type(&self) -> BlockDeviceType {
self.device_type
}
/// Returns a reference to `path_on_host`.
pub fn path_on_host(&self) -> &PathBuf {
&self.path_on_host
}
/// Returns a reference to the part_uuid.
pub fn get_part_uuid(&self) -> Option<&String> {
self.part_uuid.as_ref()
}
/// Checks whether the drive has read-only permissions.
pub fn is_read_only(&self) -> bool {
self.is_read_only
}
/// Checks whether the drive uses direct I/O
pub fn is_direct(&self) -> bool {
self.is_direct
}
/// Get number and size of queues supported.
pub fn queue_sizes(&self) -> Vec<u16> {
(0..self.num_queues)
.map(|_| self.queue_size)
.collect::<Vec<u16>>()
}
}
impl ConfigItem for BlockDeviceConfigInfo {
type Err = BlockDeviceError;
fn id(&self) -> &str {
&self.drive_id
}
fn check_conflicts(&self, other: &Self) -> Result<(), BlockDeviceError> {
if self.drive_id == other.drive_id {
Ok(())
} else if self.path_on_host == other.path_on_host {
Err(BlockDeviceError::BlockDevicePathAlreadyExists(
self.path_on_host.clone(),
))
} else {
Ok(())
}
}
}
impl std::fmt::Debug for BlockDeviceInfo {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{:?}", self.config)
}
}
/// Block Device Info
pub type BlockDeviceInfo = DeviceConfigInfo<BlockDeviceConfigInfo>;
/// Wrapper for the collection that holds all the Block Devices Configs
//#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
#[derive(Clone)]
pub struct BlockDeviceMgr {
/// A list of `BlockDeviceInfo` objects.
info_list: VecDeque<BlockDeviceInfo>,
has_root_block: bool,
has_part_uuid_root: bool,
read_only_root: bool,
part_uuid: Option<String>,
use_shared_irq: bool,
}
impl BlockDeviceMgr {
/// Returns a front-to-back iterator.
pub fn iter(&self) -> vec_deque::Iter<BlockDeviceInfo> {
self.info_list.iter()
}
/// Checks whether any of the added BlockDevice is the root.
pub fn has_root_block_device(&self) -> bool {
self.has_root_block
}
/// Checks whether the root device is configured using a part UUID.
pub fn has_part_uuid_root(&self) -> bool {
self.has_part_uuid_root
}
/// Checks whether the root device has read-only permissions.
pub fn is_read_only_root(&self) -> bool {
self.read_only_root
}
/// Gets the index of the device with the specified `drive_id` if it exists in the list.
pub fn get_index_of_drive_id(&self, id: &str) -> Option<usize> {
self.info_list
.iter()
.position(|info| info.config.id().eq(id))
}
/// Gets the `BlockDeviceConfigInfo` of the device with the specified `drive_id` if it exists in the list.
pub fn get_config_of_drive_id(&self, drive_id: &str) -> Option<BlockDeviceConfigInfo> {
match self.get_index_of_drive_id(drive_id) {
Some(index) => {
let config = self.info_list.get(index).unwrap().config.clone();
Some(config)
}
None => None,
}
}
/// Inserts `block_device_config` in the block device configuration list.
/// If an entry with the same id already exists, it will attempt to update
/// the existing entry.
/// Inserting a secondary root block device will fail.
pub fn insert_device(
device_mgr: &mut DeviceManager,
mut ctx: DeviceOpContext,
config: BlockDeviceConfigInfo,
) -> std::result::Result<(), BlockDeviceError> {
if !cfg!(feature = "hotplug") && ctx.is_hotplug {
return Err(BlockDeviceError::UpdateNotAllowedPostBoot);
}
let mgr = &mut device_mgr.block_manager;
// If the id of the drive already exists in the list, the operation is an update.
match mgr.get_index_of_drive_id(config.id()) {
Some(index) => {
// No support for runtime update yet.
if ctx.is_hotplug {
Err(BlockDeviceError::BlockDevicePathAlreadyExists(
config.path_on_host.clone(),
))
} else {
for (idx, info) in mgr.info_list.iter().enumerate() {
if idx != index {
info.config.check_conflicts(&config)?;
}
}
mgr.update(index, config)
}
}
None => {
for info in mgr.info_list.iter() {
info.config.check_conflicts(&config)?;
}
let index = mgr.create(config.clone())?;
if !ctx.is_hotplug {
return Ok(());
}
match config.device_type {
BlockDeviceType::RawBlock => {
let device = Self::create_blk_device(&config, &mut ctx)
.map_err(BlockDeviceError::Virtio)?;
let dev = DeviceManager::create_mmio_virtio_device(
device,
&mut ctx,
config.use_shared_irq.unwrap_or(mgr.use_shared_irq),
config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(BlockDeviceError::DeviceManager)?;
mgr.update_device_by_index(index, Arc::clone(&dev))?;
// Live upgrade needs to save/restore the device from info.device.
mgr.info_list[index].set_device(dev.clone());
ctx.insert_hotplug_mmio_device(&dev, None).map_err(|e| {
let logger = ctx.logger().new(slog::o!());
BlockDeviceMgr::remove_device(device_mgr, ctx, &config.drive_id)
.unwrap();
error!(
logger,
"failed to hot-add virtio block device {}, {:?}",
&config.drive_id,
e
);
BlockDeviceError::DeviceManager(e)
})
}
_ => Err(BlockDeviceError::InvalidBlockDeviceType),
}
}
}
}
/// Attaches all block devices from the BlockDevicesConfig.
pub fn attach_devices(
&mut self,
ctx: &mut DeviceOpContext,
) -> std::result::Result<(), BlockDeviceError> {
for info in self.info_list.iter_mut() {
match info.config.device_type {
BlockDeviceType::RawBlock => {
info!(
ctx.logger(),
"attach virtio-blk device, drive_id {}, path {}",
info.config.drive_id,
info.config.path_on_host.to_str().unwrap_or("<unknown>")
);
let device = Self::create_blk_device(&info.config, ctx)
.map_err(BlockDeviceError::Virtio)?;
let device = DeviceManager::create_mmio_virtio_device(
device,
ctx,
info.config.use_shared_irq.unwrap_or(self.use_shared_irq),
info.config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(BlockDeviceError::RegisterBlockDevice)?;
info.device = Some(device);
}
_ => {
return Err(BlockDeviceError::OpenBlockDevice(
std::io::Error::from_raw_os_error(libc::EINVAL),
));
}
}
}
Ok(())
}
/// Removes all virtio-blk devices
pub fn remove_devices(&mut self, ctx: &mut DeviceOpContext) -> Result<(), DeviceMgrError> {
while let Some(mut info) = self.info_list.pop_back() {
info!(ctx.logger(), "remove drive {}", info.config.drive_id);
if let Some(device) = info.device.take() {
DeviceManager::destroy_mmio_virtio_device(device, ctx)?;
}
}
Ok(())
}
fn remove(&mut self, drive_id: &str) -> Option<BlockDeviceInfo> {
match self.get_index_of_drive_id(drive_id) {
Some(index) => self.info_list.remove(index),
None => None,
}
}
/// Removes a block device; this is essentially the inverse operation of `insert_device`.
pub fn remove_device(
dev_mgr: &mut DeviceManager,
mut ctx: DeviceOpContext,
drive_id: &str,
) -> std::result::Result<(), BlockDeviceError> {
if !cfg!(feature = "hotplug") {
return Err(BlockDeviceError::UpdateNotAllowedPostBoot);
}
let mgr = &mut dev_mgr.block_manager;
match mgr.remove(drive_id) {
Some(mut info) => {
info!(ctx.logger(), "remove drive {}", info.config.drive_id);
if let Some(device) = info.device.take() {
DeviceManager::destroy_mmio_virtio_device(device, &mut ctx)
.map_err(BlockDeviceError::DeviceManager)?;
}
}
None => return Err(BlockDeviceError::InvalidDeviceId(drive_id.to_owned())),
}
Ok(())
}
fn create_blk_device(
cfg: &BlockDeviceConfigInfo,
ctx: &mut DeviceOpContext,
) -> std::result::Result<Box<Block<GuestAddressSpaceImpl>>, virtio::Error> {
let epoll_mgr = ctx.epoll_mgr.clone().ok_or(virtio::Error::InvalidInput)?;
let mut block_files: Vec<Box<dyn Ufile>> = vec![];
match cfg.device_type {
BlockDeviceType::RawBlock => {
let custom_flags = if cfg.is_direct() {
info!(
ctx.logger(),
"Open block device \"{}\" in direct mode.",
cfg.path_on_host().display()
);
libc::O_DIRECT
} else {
info!(
ctx.logger(),
"Open block device \"{}\" in buffer mode.",
cfg.path_on_host().display(),
);
0
};
let io_uring_supported = IoUring::is_supported();
for i in 0..cfg.num_queues {
let queue_size = cfg.queue_sizes()[i] as u32;
let file = OpenOptions::new()
.read(true)
.custom_flags(custom_flags)
.write(!cfg.is_read_only())
.open(cfg.path_on_host())?;
info!(ctx.logger(), "Queue {}: block file opened", i);
if io_uring_supported {
info!(
ctx.logger(),
"Queue {}: Using io_uring Raw disk file, queue size {}.", i, queue_size
);
let io_engine = IoUring::new(file.as_raw_fd(), queue_size)?;
block_files.push(Box::new(LocalFile::new(file, cfg.no_drop, io_engine)?));
} else {
info!(
ctx.logger(),
"Queue {}: io_uring not supported, falling back to Aio raw disk file, queue size {}", i, queue_size
);
let io_engine = Aio::new(file.as_raw_fd(), queue_size)?;
block_files.push(Box::new(LocalFile::new(file, cfg.no_drop, io_engine)?));
}
}
}
_ => {
error!(
ctx.logger(),
"invalid block device type: {:?}", cfg.device_type
);
return Err(virtio::Error::InvalidInput);
}
};
let mut limiters = vec![];
for _i in 0..cfg.num_queues {
if let Some(limiter) = cfg.rate_limiter.clone().map(|mut v| {
v.resize(cfg.num_queues as u64);
v.try_into().unwrap()
}) {
limiters.push(limiter);
}
}
Ok(Box::new(Block::new(
block_files,
cfg.is_read_only,
Arc::new(cfg.queue_sizes()),
epoll_mgr,
limiters,
)?))
}
/// Generated guest kernel commandline related to root block device.
pub fn generate_kernel_boot_args(
&self,
kernel_config: &mut KernelConfigInfo,
) -> std::result::Result<(), DeviceMgrError> {
// Respect the user configuration if kernel_cmdline already contains "root=";
// pad with a leading space to also catch a command line starting with "root=xxx".
let old_kernel_cmdline = format!(" {}", kernel_config.kernel_cmdline().as_str());
if !old_kernel_cmdline.contains(" root=") && self.has_root_block {
let cmdline = kernel_config.kernel_cmdline_mut();
if let Some(ref uuid) = self.part_uuid {
cmdline
.insert("root", &format!("PART_UUID={}", uuid))
.map_err(DeviceMgrError::Cmdline)?;
} else {
cmdline
.insert("root", "/dev/vda")
.map_err(DeviceMgrError::Cmdline)?;
}
if self.read_only_root {
if old_kernel_cmdline.contains(" rw") {
return Err(DeviceMgrError::InvalidOperation);
}
cmdline.insert_str("ro").map_err(DeviceMgrError::Cmdline)?;
}
}
Ok(())
}
/// Inserts a block device's config. Returns the index on success.
fn create(
&mut self,
block_device_config: BlockDeviceConfigInfo,
) -> std::result::Result<usize, BlockDeviceError> {
self.check_data_file_present(&block_device_config)?;
if self
.get_index_of_drive_path(&block_device_config.path_on_host)
.is_some()
{
return Err(BlockDeviceError::BlockDevicePathAlreadyExists(
block_device_config.path_on_host,
));
}
// Check whether the device config describes a root device;
// a VMM may only have one root device.
if block_device_config.is_root_device {
if self.has_root_block {
return Err(BlockDeviceError::RootBlockDeviceAlreadyAdded);
} else {
self.has_root_block = true;
self.read_only_root = block_device_config.is_read_only;
self.has_part_uuid_root = block_device_config.part_uuid.is_some();
self.part_uuid = block_device_config.part_uuid.clone();
// Root Device should be the first in the list whether or not PART_UUID is specified
// in order to avoid bugs in case of switching from part_uuid boot scenarios to
// /dev/vda boot type.
self.info_list
.push_front(BlockDeviceInfo::new(block_device_config));
Ok(0)
}
} else {
self.info_list
.push_back(BlockDeviceInfo::new(block_device_config));
Ok(self.info_list.len() - 1)
}
}
/// Updates a Block Device Config. The update fails if it would result in two
/// root block devices.
fn update(
&mut self,
mut index: usize,
new_config: BlockDeviceConfigInfo,
) -> std::result::Result<(), BlockDeviceError> {
// Check if the path exists
self.check_data_file_present(&new_config)?;
if let Some(idx) = self.get_index_of_drive_path(&new_config.path_on_host) {
if idx != index {
return Err(BlockDeviceError::BlockDevicePathAlreadyExists(
new_config.path_on_host.clone(),
));
}
}
if self.info_list.get(index).is_none() {
return Err(InvalidDeviceId(index.to_string()));
}
// Check if the root block device is being updated.
if self.info_list[index].config.is_root_device {
self.has_root_block = new_config.is_root_device;
self.read_only_root = new_config.is_root_device && new_config.is_read_only;
self.has_part_uuid_root = new_config.part_uuid.is_some();
self.part_uuid = new_config.part_uuid.clone();
} else if new_config.is_root_device {
// Check if a second root block device is being added.
if self.has_root_block {
return Err(BlockDeviceError::RootBlockDeviceAlreadyAdded);
} else {
// One of the non-root blocks is becoming root.
self.has_root_block = true;
self.read_only_root = new_config.is_read_only;
self.has_part_uuid_root = new_config.part_uuid.is_some();
self.part_uuid = new_config.part_uuid.clone();
// Make sure the root device is on the first position.
self.info_list.swap(0, index);
// Block config to be updated has moved to first position.
index = 0;
}
}
// Update the config.
self.info_list[index].config = new_config;
Ok(())
}
fn check_data_file_present(
&self,
block_device_config: &BlockDeviceConfigInfo,
) -> std::result::Result<(), BlockDeviceError> {
if block_device_config.device_type == BlockDeviceType::RawBlock
&& !block_device_config.path_on_host.exists()
{
Err(BlockDeviceError::InvalidBlockDevicePath(
block_device_config.path_on_host.clone(),
))
} else {
Ok(())
}
}
fn get_index_of_drive_path(&self, drive_path: &Path) -> Option<usize> {
self.info_list
.iter()
.position(|info| info.config.path_on_host.eq(drive_path))
}
/// Updates device information in `info_list`. The caller of this method is
/// `insert_device` when hotplug is true.
pub fn update_device_by_index(
&mut self,
index: usize,
device: Arc<DbsMmioV2Device>,
) -> Result<(), BlockDeviceError> {
if let Some(info) = self.info_list.get_mut(index) {
info.device = Some(device);
return Ok(());
}
Err(BlockDeviceError::InvalidDeviceId("".to_owned()))
}
/// Update the ratelimiter settings of a virtio blk device.
pub fn update_device_ratelimiters(
device_mgr: &mut DeviceManager,
new_cfg: BlockDeviceConfigUpdateInfo,
) -> std::result::Result<(), BlockDeviceError> {
let mgr = &mut device_mgr.block_manager;
match mgr.get_index_of_drive_id(&new_cfg.drive_id) {
Some(index) => {
let config = &mut mgr.info_list[index].config;
config.rate_limiter = new_cfg.rate_limiter.clone();
let device = mgr.info_list[index]
.device
.as_mut()
.ok_or_else(|| BlockDeviceError::InvalidDeviceId("".to_owned()))?;
if let Some(mmio_dev) = device.as_any().downcast_ref::<DbsMmioV2Device>() {
let guard = mmio_dev.state();
let inner_dev = guard.get_inner_device();
if let Some(blk_dev) = inner_dev
.as_any()
.downcast_ref::<virtio::block::Block<GuestAddressSpaceImpl>>()
{
return blk_dev
.set_patch_rate_limiters(new_cfg.bytes(), new_cfg.ops())
.map(|_p| ())
.map_err(|_e| BlockDeviceError::BlockEpollHanderSendFail);
}
}
Ok(())
}
None => Err(BlockDeviceError::InvalidDeviceId(new_cfg.drive_id)),
}
}
}
impl Default for BlockDeviceMgr {
/// Constructor for the BlockDeviceMgr. It initializes an empty `VecDeque`.
fn default() -> BlockDeviceMgr {
BlockDeviceMgr {
info_list: VecDeque::<BlockDeviceInfo>::new(),
has_root_block: false,
has_part_uuid_root: false,
read_only_root: false,
part_uuid: None,
use_shared_irq: USE_SHARED_IRQ,
}
}
}


@@ -0,0 +1,440 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
//! Virtual machine console device manager.
//!
//! A virtual console is composed of two parts: a frontend in the virtual machine and a backend in
//! the host OS. A frontend may be a serial port, virtio-console, etc.; a backend may be stdio or a
//! Unix domain socket. The manager connects the frontend with the backend.
use std::io::{self, Read};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::sync::{Arc, Mutex};
use bytes::{BufMut, BytesMut};
use dbs_legacy_devices::{ConsoleHandler, SerialDevice};
use dbs_utils::epoll_manager::{
EpollManager, EventOps, EventSet, Events, MutEventSubscriber, SubscriberId,
};
use vmm_sys_util::terminal::Terminal;
use super::{DeviceMgrError, Result};
const EPOLL_EVENT_SERIAL: u32 = 0;
const EPOLL_EVENT_SERIAL_DATA: u32 = 1;
const EPOLL_EVENT_STDIN: u32 = 2;
// Maximum number of bytes transferred per backend data transaction.
const MAX_BACKEND_THROUGHPUT: usize = 64;
/// Errors related to Console manager operations.
#[derive(Debug, thiserror::Error)]
pub enum ConsoleManagerError {
/// Cannot create unix domain socket for serial port
#[error("cannot create socket for serial console")]
CreateSerialSock(#[source] std::io::Error),
/// An operation on the epoll instance failed due to resource exhaustion or bad configuration.
#[error("failure while managing epoll event for console fd")]
EpollMgr(#[source] dbs_utils::epoll_manager::Error),
/// Cannot set mode for terminal.
#[error("failure while setting attribute for terminal")]
StdinHandle(#[source] vmm_sys_util::errno::Error),
}
enum Backend {
StdinHandle(std::io::Stdin),
SockPath(String),
}
/// Console manager to manage frontend and backend console devices.
pub struct ConsoleManager {
epoll_mgr: EpollManager,
logger: slog::Logger,
subscriber_id: Option<SubscriberId>,
backend: Option<Backend>,
}
impl ConsoleManager {
/// Create a console manager instance.
pub fn new(epoll_mgr: EpollManager, logger: &slog::Logger) -> Self {
let logger = logger.new(slog::o!("subsystem" => "console_manager"));
ConsoleManager {
epoll_mgr,
logger,
subscriber_id: Default::default(),
backend: None,
}
}
/// Create a console backend device by using stdio streams.
pub fn create_stdio_console(&mut self, device: Arc<Mutex<SerialDevice>>) -> Result<()> {
let stdin_handle = std::io::stdin();
stdin_handle
.lock()
.set_raw_mode()
.map_err(|e| DeviceMgrError::ConsoleManager(ConsoleManagerError::StdinHandle(e)))?;
let handler = ConsoleEpollHandler::new(device, Some(stdin_handle), None, &self.logger);
self.subscriber_id = Some(self.epoll_mgr.add_subscriber(Box::new(handler)));
self.backend = Some(Backend::StdinHandle(std::io::stdin()));
Ok(())
}
/// Create a console backend device by using a Unix domain socket.
pub fn create_socket_console(
&mut self,
device: Arc<Mutex<SerialDevice>>,
sock_path: String,
) -> Result<()> {
let sock_listener = Self::bind_domain_socket(&sock_path).map_err(|e| {
DeviceMgrError::ConsoleManager(ConsoleManagerError::CreateSerialSock(e))
})?;
let handler = ConsoleEpollHandler::new(device, None, Some(sock_listener), &self.logger);
self.subscriber_id = Some(self.epoll_mgr.add_subscriber(Box::new(handler)));
self.backend = Some(Backend::SockPath(sock_path));
Ok(())
}
/// Reset the host side terminal to canonical mode.
pub fn reset_console(&self) -> Result<()> {
if let Some(Backend::StdinHandle(stdin_handle)) = self.backend.as_ref() {
stdin_handle
.lock()
.set_canon_mode()
.map_err(|e| DeviceMgrError::ConsoleManager(ConsoleManagerError::StdinHandle(e)))?;
}
Ok(())
}
fn bind_domain_socket(serial_path: &str) -> std::result::Result<UnixListener, std::io::Error> {
let path = Path::new(serial_path);
if path.is_file() {
let _ = std::fs::remove_file(serial_path);
}
UnixListener::bind(path)
}
}
struct ConsoleEpollHandler {
device: Arc<Mutex<SerialDevice>>,
stdin_handle: Option<std::io::Stdin>,
sock_listener: Option<UnixListener>,
sock_conn: Option<UnixStream>,
logger: slog::Logger,
}
impl ConsoleEpollHandler {
fn new(
device: Arc<Mutex<SerialDevice>>,
stdin_handle: Option<std::io::Stdin>,
sock_listener: Option<UnixListener>,
logger: &slog::Logger,
) -> Self {
ConsoleEpollHandler {
device,
stdin_handle,
sock_listener,
sock_conn: None,
logger: logger.new(slog::o!("subsystem" => "console_manager")),
}
}
fn uds_listener_accept(&mut self, ops: &mut EventOps) -> std::io::Result<()> {
if self.sock_conn.is_some() {
slog::warn!(self.logger,
"UDS for serial port 1 already exists, reject the new connection";
"subsystem" => "console_mgr",
);
// Accept and immediately drop the incoming connection to reject it.
let _ = self.sock_listener.as_mut().unwrap().accept();
} else {
// Safe to unwrap() because self.sock_listener is Some().
let (conn_sock, _) = self.sock_listener.as_ref().unwrap().accept()?;
let events = Events::with_data(&conn_sock, EPOLL_EVENT_SERIAL_DATA, EventSet::IN);
if let Err(e) = ops.add(events) {
slog::error!(self.logger,
"failed to register epoll event for serial, {:?}", e;
"subsystem" => "console_mgr",
);
return Err(std::io::Error::last_os_error());
}
let conn_sock_copy = conn_sock.try_clone()?;
// Do not expect a poisoned lock.
self.device
.lock()
.unwrap()
.set_output_stream(Some(Box::new(conn_sock_copy)));
self.sock_conn = Some(conn_sock);
}
Ok(())
}
fn uds_read_in(&mut self, ops: &mut EventOps) -> std::io::Result<()> {
let mut should_drop = true;
if let Some(conn_sock) = self.sock_conn.as_mut() {
let mut out = [0u8; MAX_BACKEND_THROUGHPUT];
match conn_sock.read(&mut out[..]) {
Ok(0) => {
// Zero-length read means EOF. Remove this conn sock.
self.device
.lock()
.expect("console: poisoned console lock")
.set_output_stream(None);
}
Ok(count) => {
self.device
.lock()
.expect("console: poisoned console lock")
.raw_input(&out[..count])?;
should_drop = false;
}
Err(e) => {
slog::warn!(self.logger,
"error while reading serial conn sock: {:?}", e;
"subsystem" => "console_mgr"
);
self.device
.lock()
.expect("console: poisoned console lock")
.set_output_stream(None);
}
}
}
if should_drop {
assert!(self.sock_conn.is_some());
// Safe to unwrap() because self.sock_conn is Some().
let sock_conn = self.sock_conn.take().unwrap();
let events = Events::with_data(&sock_conn, EPOLL_EVENT_SERIAL_DATA, EventSet::IN);
if let Err(e) = ops.remove(events) {
slog::error!(self.logger,
"failed deregister epoll event for UDS, {:?}", e;
"subsystem" => "console_mgr"
);
}
}
Ok(())
}
fn stdio_read_in(&mut self, ops: &mut EventOps) -> std::io::Result<()> {
let mut should_drop = true;
if let Some(handle) = self.stdin_handle.as_ref() {
let mut out = [0u8; MAX_BACKEND_THROUGHPUT];
let stdin_lock = handle.lock();
match stdin_lock.read_raw(&mut out[..]) {
Ok(0) => {
// Zero-length read indicates EOF. Remove from pollables.
self.device
.lock()
.expect("console: poisoned console lock")
.set_output_stream(None);
}
Ok(count) => {
self.device
.lock()
.expect("console: poisoned console lock")
.raw_input(&out[..count])?;
should_drop = false;
}
Err(e) => {
slog::warn!(self.logger,
"error while reading stdin: {:?}", e;
"subsystem" => "console_mgr"
);
self.device
.lock()
.expect("console: poisoned console lock")
.set_output_stream(None);
}
}
}
if should_drop {
let events = Events::with_data_raw(libc::STDIN_FILENO, EPOLL_EVENT_STDIN, EventSet::IN);
if let Err(e) = ops.remove(events) {
slog::error!(self.logger,
"failed to deregister epoll event for stdin, {:?}", e;
"subsystem" => "console_mgr"
);
}
}
Ok(())
}
}
impl MutEventSubscriber for ConsoleEpollHandler {
fn process(&mut self, events: Events, ops: &mut EventOps) {
slog::trace!(self.logger, "ConsoleEpollHandler::process()");
let slot = events.data();
match slot {
EPOLL_EVENT_SERIAL => {
if let Err(e) = self.uds_listener_accept(ops) {
slog::warn!(self.logger, "failed to accept incoming connection, {:?}", e);
}
}
EPOLL_EVENT_SERIAL_DATA => {
if let Err(e) = self.uds_read_in(ops) {
slog::warn!(self.logger, "failed to read data from UDS, {:?}", e);
}
}
EPOLL_EVENT_STDIN => {
if let Err(e) = self.stdio_read_in(ops) {
slog::warn!(self.logger, "failed to read data from stdin, {:?}", e);
}
}
_ => slog::error!(self.logger, "unknown epoll slot number {}", slot),
}
}
fn init(&mut self, ops: &mut EventOps) {
slog::trace!(self.logger, "ConsoleEpollHandler::init()");
if self.stdin_handle.is_some() {
slog::info!(self.logger, "ConsoleEpollHandler: stdin handler");
let events = Events::with_data_raw(libc::STDIN_FILENO, EPOLL_EVENT_STDIN, EventSet::IN);
if let Err(e) = ops.add(events) {
slog::error!(
self.logger,
"failed to register epoll event for stdin, {:?}",
e
);
}
}
if let Some(sock) = self.sock_listener.as_ref() {
slog::info!(self.logger, "ConsoleEpollHandler: sock listener");
let events = Events::with_data(sock, EPOLL_EVENT_SERIAL, EventSet::IN);
if let Err(e) = ops.add(events) {
slog::error!(
self.logger,
"failed to register epoll event for UDS listener, {:?}",
e
);
}
}
if let Some(conn) = self.sock_conn.as_ref() {
slog::info!(self.logger, "ConsoleEpollHandler: sock connection");
let events = Events::with_data(conn, EPOLL_EVENT_SERIAL_DATA, EventSet::IN);
if let Err(e) = ops.add(events) {
slog::error!(
self.logger,
"failed to register epoll event for UDS connection, {:?}",
e
);
}
}
}
}
/// Writer to process guest kernel dmesg.
pub struct DmesgWriter {
buf: BytesMut,
logger: slog::Logger,
}
impl DmesgWriter {
/// Creates a new instance.
pub fn new(logger: &slog::Logger) -> Self {
Self {
buf: BytesMut::with_capacity(1024),
logger: logger.new(slog::o!("subsystem" => "dmesg")),
}
}
}
impl io::Write for DmesgWriter {
/// 0000000 [ 0 . 0 3 4 9 1 6 ] R
/// 5b 20 20 20 20 30 2e 30 33 34 39 31 36 5d 20 52
/// 0000020 u n / s b i n / i n i t a s
/// 75 6e 20 2f 73 62 69 6e 2f 69 6e 69 74 20 61 73
/// 0000040 i n i t p r o c e s s \r \n [
///
/// dmesg messages end each line with "\r\n". When redirecting a message to the
/// logger, we should strip the trailing "\r\n".
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
let arr: Vec<&[u8]> = buf.split(|c| *c == b'\n').collect();
let count = arr.len();
for (i, sub) in arr.iter().enumerate() {
if sub.is_empty() {
if !self.buf.is_empty() {
slog::info!(
self.logger,
"{}",
String::from_utf8_lossy(self.buf.as_ref()).trim_end()
);
self.buf.clear();
}
} else if sub.len() < buf.len() && i < count - 1 {
slog::info!(
self.logger,
"{}{}",
String::from_utf8_lossy(self.buf.as_ref()).trim_end(),
String::from_utf8_lossy(sub).trim_end(),
);
self.buf.clear();
} else {
self.buf.put_slice(sub);
}
}
Ok(buf.len())
}
fn flush(&mut self) -> io::Result<()> {
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
use slog::Drain;
use std::io::Write;
fn create_logger() -> slog::Logger {
let decorator = slog_term::TermDecorator::new().build();
let drain = slog_term::FullFormat::new(decorator).build().fuse();
let drain = slog_async::Async::new(drain).build().fuse();
slog::Logger::root(drain, slog::o!())
}
#[test]
fn test_dmesg_writer() {
let mut writer = DmesgWriter {
buf: Default::default(),
logger: create_logger(),
};
writer.flush().unwrap();
writer.write_all("".as_bytes()).unwrap();
writer.write_all("\n".as_bytes()).unwrap();
writer.write_all("\n\n".as_bytes()).unwrap();
writer.write_all("\n\n\n".as_bytes()).unwrap();
writer.write_all("12\n23\n34\n56".as_bytes()).unwrap();
writer.write_all("78".as_bytes()).unwrap();
writer.write_all("90\n".as_bytes()).unwrap();
writer.flush().unwrap();
}
// TODO: add unit tests for console manager
}
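The line-splitting behavior of `DmesgWriter::write` above can be sketched standalone. This is a simplified model assuming only `std`: a `Vec<u8>` stands in for `BytesMut`, completed lines are collected into a `Vec<String>` instead of being sent to a slog logger, and `LineBuffer` is an illustrative name, not part of the crate.

```rust
// Minimal sketch of the DmesgWriter buffering logic: split input on '\n',
// emit every completed line, and keep any unterminated tail buffered.
struct LineBuffer {
    buf: Vec<u8>,
    lines: Vec<String>,
}

impl LineBuffer {
    fn new() -> Self {
        Self { buf: Vec::new(), lines: Vec::new() }
    }

    // Split the input on '\n'; flush every completed line (buffered prefix
    // plus current part, with the trailing '\r' trimmed) and keep an
    // unterminated tail in the buffer for the next write.
    fn write(&mut self, input: &[u8]) {
        let parts: Vec<&[u8]> = input.split(|c| *c == b'\n').collect();
        let count = parts.len();
        for (i, part) in parts.iter().enumerate() {
            if i < count - 1 {
                // This part was terminated by '\n': flush buffer + part.
                let mut line = self.buf.clone();
                line.extend_from_slice(part);
                let s = String::from_utf8_lossy(&line).trim_end().to_string();
                if !s.is_empty() {
                    self.lines.push(s);
                }
                self.buf.clear();
            } else if !part.is_empty() {
                // Unterminated tail: keep it until the next write() call.
                self.buf.extend_from_slice(part);
            }
        }
    }
}
```

Feeding the same fragments as the unit test above shows how a line split across three writes (`"56"`, `"78"`, `"90\r\n"`) is reassembled into one logged line.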


@@ -0,0 +1,528 @@
// Copyright 2020-2022 Alibaba Cloud. All Rights Reserved.
// Copyright 2019 Intel Corporation. All Rights Reserved.
//
// SPDX-License-Identifier: Apache-2.0
use std::convert::TryInto;
use dbs_utils::epoll_manager::EpollManager;
use dbs_virtio_devices::{self as virtio, Error as VirtIoError};
use serde_derive::{Deserialize, Serialize};
use slog::{error, info};
use crate::address_space_manager::GuestAddressSpaceImpl;
use crate::config_manager::{
ConfigItem, DeviceConfigInfo, DeviceConfigInfos, RateLimiterConfigInfo,
};
use crate::device_manager::{
DbsMmioV2Device, DeviceManager, DeviceMgrError, DeviceOpContext, DeviceVirtioRegionHandler,
};
use crate::get_bucket_update;
use super::DbsVirtioDevice;
// The flag of whether to use the shared irq.
const USE_SHARED_IRQ: bool = true;
// The flag of whether to use the generic irq.
const USE_GENERIC_IRQ: bool = true;
// Default cache size is 2 GiB since this is a typical VM memory size.
const DEFAULT_CACHE_SIZE: u64 = 2 * 1024 * 1024 * 1024;
// We support two fs device modes: vhostuser and virtio.
const VHOSTUSER_FS_MODE: &str = "vhostuser";
const VIRTIO_FS_MODE: &str = "virtio";
/// Errors associated with `FsDeviceConfig`.
#[derive(Debug, thiserror::Error)]
pub enum FsDeviceError {
/// Invalid fs, "virtio" or "vhostuser" is allowed.
#[error("the fs type is invalid, virtio or vhostuser is allowed")]
InvalidFs,
/// Cannot access address space.
#[error("Cannot access address space.")]
AddressSpaceNotInitialized,
/// Cannot convert RateLimiterConfigInfo into RateLimiter.
#[error("failure while converting RateLimiterConfigInfo into RateLimiter: {0}")]
RateLimterConfigInfoTryInto(#[source] std::io::Error),
/// The fs device tag was already used for a different fs.
#[error("VirtioFs device tag {0} already exists")]
FsDeviceTagAlreadyExists(String),
/// The fs device path was already used for a different fs.
#[error("VirtioFs device sock path {0} already exists")]
FsDevicePathAlreadyExists(String),
/// The update is not allowed after booting the microvm.
#[error("update operation is not allowed after boot")]
UpdateNotAllowedPostBoot,
/// The attach-backend-fs operation failed.
#[error("fs device failed to attach a backend fs: {0}")]
AttachBackendFailed(String),
/// Attaching a backend fs must be done while the VM is running.
#[error("vm is not running when attaching a backend fs")]
MicroVMNotRunning,
/// The mount tag doesn't exist.
#[error("fs tag '{0}' doesn't exist")]
TagNotExists(String),
/// Failed to send patch message to VirtioFs epoll handler.
#[error("could not send patch message to the VirtioFs epoll handler")]
VirtioFsEpollHanderSendFail,
/// Creating a shared-fs device failed (e.g. the vhost-user socket cannot be opened).
#[error("cannot create shared-fs device: {0}")]
CreateFsDevice(#[source] VirtIoError),
/// Cannot initialize a shared-fs device or add a device to the MMIO Bus.
#[error("failure while registering shared-fs device: {0}")]
RegisterFsDevice(#[source] DeviceMgrError),
/// The device manager errors.
#[error("DeviceManager error: {0}")]
DeviceManager(#[source] DeviceMgrError),
}
/// Configuration information for a vhost-user-fs device.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct FsDeviceConfigInfo {
/// vhost-user socket path.
pub sock_path: String,
/// virtiofs mount tag name used inside the guest.
/// used as the device name during mount.
pub tag: String,
/// Number of virtqueues to use.
pub num_queues: usize,
/// Size of each virtqueue (number of descriptors).
pub queue_size: u16,
/// DAX cache window size
pub cache_size: u64,
/// Number of thread pool workers.
pub thread_pool_size: u16,
/// The caching policy the file system should use (auto, always or never).
/// This cache policy is set for virtio-fs; see https://gitlab.com/virtio-fs/virtiofsd for further information.
pub cache_policy: String,
/// Writeback cache
pub writeback_cache: bool,
/// Enable no_open or not
pub no_open: bool,
/// Enable xattr or not
pub xattr: bool,
/// Drop CAP_SYS_RESOURCE or not
pub drop_sys_resource: bool,
/// virtio fs or vhostuser fs.
pub mode: String,
/// Enable kill_priv_v2 or not
pub fuse_killpriv_v2: bool,
/// Enable no_readdir or not
pub no_readdir: bool,
/// Rate Limiter for I/O operations.
pub rate_limiter: Option<RateLimiterConfigInfo>,
/// Use shared irq
pub use_shared_irq: Option<bool>,
/// Use generic irq
pub use_generic_irq: Option<bool>,
}
impl std::default::Default for FsDeviceConfigInfo {
fn default() -> Self {
Self {
sock_path: String::default(),
tag: String::default(),
num_queues: 1,
queue_size: 1024,
cache_size: DEFAULT_CACHE_SIZE,
thread_pool_size: 0,
cache_policy: Self::default_cache_policy(),
writeback_cache: Self::default_writeback_cache(),
no_open: Self::default_no_open(),
fuse_killpriv_v2: Self::default_fuse_killpriv_v2(),
no_readdir: Self::default_no_readdir(),
xattr: Self::default_xattr(),
drop_sys_resource: Self::default_drop_sys_resource(),
mode: Self::default_fs_mode(),
rate_limiter: Some(RateLimiterConfigInfo::default()),
use_shared_irq: None,
use_generic_irq: None,
}
}
}
impl FsDeviceConfigInfo {
/// The default mode is set to 'virtio' for 'virtio-fs' device.
pub fn default_fs_mode() -> String {
String::from(VIRTIO_FS_MODE)
}
/// The default cache policy
pub fn default_cache_policy() -> String {
"always".to_string()
}
/// The default setting of writeback cache
pub fn default_writeback_cache() -> bool {
true
}
/// The default setting of no_open
pub fn default_no_open() -> bool {
true
}
/// The default setting of killpriv_v2
pub fn default_fuse_killpriv_v2() -> bool {
false
}
/// The default setting of xattr
pub fn default_xattr() -> bool {
false
}
/// The default setting of drop_sys_resource
pub fn default_drop_sys_resource() -> bool {
false
}
/// The default setting of no_readdir
pub fn default_no_readdir() -> bool {
false
}
/// The default setting of rate limiter
pub fn default_fs_rate_limiter() -> Option<RateLimiterConfigInfo> {
None
}
}
/// Configuration information for virtio-fs.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct FsDeviceConfigUpdateInfo {
/// virtiofs mount tag name used inside the guest.
/// used as the device name during mount.
pub tag: String,
/// Rate Limiter for I/O operations.
pub rate_limiter: Option<RateLimiterConfigInfo>,
}
impl FsDeviceConfigUpdateInfo {
/// Provides a `BucketUpdate` description for the bandwidth rate limiter.
pub fn bytes(&self) -> dbs_utils::rate_limiter::BucketUpdate {
get_bucket_update!(self, rate_limiter, bandwidth)
}
/// Provides a `BucketUpdate` description for the ops rate limiter.
pub fn ops(&self) -> dbs_utils::rate_limiter::BucketUpdate {
get_bucket_update!(self, rate_limiter, ops)
}
}
impl ConfigItem for FsDeviceConfigInfo {
type Err = FsDeviceError;
fn id(&self) -> &str {
&self.tag
}
fn check_conflicts(&self, other: &Self) -> Result<(), FsDeviceError> {
if self.tag == other.tag {
Err(FsDeviceError::FsDeviceTagAlreadyExists(self.tag.clone()))
} else if self.mode.as_str() == VHOSTUSER_FS_MODE && self.sock_path == other.sock_path {
Err(FsDeviceError::FsDevicePathAlreadyExists(
self.sock_path.clone(),
))
} else {
Ok(())
}
}
}
/// Configuration information of manipulating backend fs for a virtiofs device.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct FsMountConfigInfo {
/// Mount operation: one of mount, update, umount.
pub ops: String,
/// The backend fs type to mount.
pub fstype: Option<String>,
/// The source file/directory the backend fs points to.
pub source: Option<String>,
/// Where the backend fs gets mounted.
pub mountpoint: String,
/// Backend fs config content in JSON format.
pub config: Option<String>,
/// virtiofs mount tag name used inside the guest.
/// used as the device name during mount.
pub tag: String,
/// Path to a file containing the list of files to be prefetched by rafs.
pub prefetch_list_path: Option<String>,
/// File size threshold, in KiB, for mapping a file through DAX.
pub dax_threshold_size_kb: Option<u64>,
}
pub(crate) type FsDeviceInfo = DeviceConfigInfo<FsDeviceConfigInfo>;
impl ConfigItem for FsDeviceInfo {
type Err = FsDeviceError;
fn id(&self) -> &str {
&self.config.tag
}
fn check_conflicts(&self, other: &Self) -> Result<(), FsDeviceError> {
if self.config.tag == other.config.tag {
Err(FsDeviceError::FsDeviceTagAlreadyExists(
self.config.tag.clone(),
))
} else if self.config.sock_path == other.config.sock_path {
Err(FsDeviceError::FsDevicePathAlreadyExists(
self.config.sock_path.clone(),
))
} else {
Ok(())
}
}
}
/// Wrapper for the collection that holds all the Fs Devices Configs
pub struct FsDeviceMgr {
/// A list of `FsDeviceConfig` objects.
pub(crate) info_list: DeviceConfigInfos<FsDeviceConfigInfo>,
pub(crate) use_shared_irq: bool,
}
impl FsDeviceMgr {
/// Inserts `fs_cfg` in the shared-fs device configuration list.
pub fn insert_device(
device_mgr: &mut DeviceManager,
ctx: DeviceOpContext,
fs_cfg: FsDeviceConfigInfo,
) -> std::result::Result<(), FsDeviceError> {
// Managing the life cycle of the shared-fs service process is too complicated to support hotplug.
if ctx.is_hotplug {
error!(
ctx.logger(),
"shared-fs device hotplug is not supported";
"subsystem" => "shared-fs",
"tag" => &fs_cfg.tag,
);
return Err(FsDeviceError::UpdateNotAllowedPostBoot);
}
info!(
ctx.logger(),
"add shared-fs device configuration";
"subsystem" => "shared-fs",
"tag" => &fs_cfg.tag,
);
device_mgr
.fs_manager
.lock()
.unwrap()
.info_list
.insert_or_update(&fs_cfg)?;
Ok(())
}
/// Attaches all vhost-user-fs devices from the FsDevicesConfig.
pub fn attach_devices(
&mut self,
ctx: &mut DeviceOpContext,
) -> std::result::Result<(), FsDeviceError> {
let epoll_mgr = ctx
.epoll_mgr
.clone()
.ok_or(FsDeviceError::CreateFsDevice(virtio::Error::InvalidInput))?;
for info in self.info_list.iter_mut() {
let device = Self::create_fs_device(&info.config, ctx, epoll_mgr.clone())?;
let mmio_device = DeviceManager::create_mmio_virtio_device(
device,
ctx,
info.config.use_shared_irq.unwrap_or(self.use_shared_irq),
info.config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(FsDeviceError::RegisterFsDevice)?;
info.set_device(mmio_device);
}
Ok(())
}
fn create_fs_device(
config: &FsDeviceConfigInfo,
ctx: &mut DeviceOpContext,
epoll_mgr: EpollManager,
) -> std::result::Result<DbsVirtioDevice, FsDeviceError> {
match &config.mode as &str {
VIRTIO_FS_MODE => Self::attach_virtio_fs_devices(config, ctx, epoll_mgr),
_ => Err(FsDeviceError::CreateFsDevice(virtio::Error::InvalidInput)),
}
}
fn attach_virtio_fs_devices(
config: &FsDeviceConfigInfo,
ctx: &mut DeviceOpContext,
epoll_mgr: EpollManager,
) -> std::result::Result<DbsVirtioDevice, FsDeviceError> {
info!(
ctx.logger(),
"add virtio-fs device configuration";
"subsystem" => "virtio-fs",
"tag" => &config.tag,
"dax_window_size" => &config.cache_size,
);
let limiter = if let Some(rlc) = config.rate_limiter.clone() {
Some(
rlc.try_into()
.map_err(FsDeviceError::RateLimterConfigInfoTryInto)?,
)
} else {
None
};
let vm_as = ctx.get_vm_as().map_err(|e| {
error!(ctx.logger(), "virtio-fs get vm_as error: {:?}", e;
"subsystem" => "virtio-fs");
FsDeviceError::DeviceManager(e)
})?;
let address_space = match ctx.address_space.as_ref() {
Some(address_space) => address_space.clone(),
None => {
error!(ctx.logger(), "virtio-fs get address_space error"; "subsystem" => "virtio-fs");
return Err(FsDeviceError::AddressSpaceNotInitialized);
}
};
let handler = DeviceVirtioRegionHandler {
vm_as,
address_space,
};
let device = Box::new(
virtio::fs::VirtioFs::new(
&config.tag,
config.num_queues,
config.queue_size,
config.cache_size,
&config.cache_policy,
config.thread_pool_size,
config.writeback_cache,
config.no_open,
config.fuse_killpriv_v2,
config.xattr,
config.drop_sys_resource,
config.no_readdir,
Box::new(handler),
epoll_mgr,
limiter,
)
.map_err(FsDeviceError::CreateFsDevice)?,
);
Ok(device)
}
/// Attach a backend fs to a VirtioFs device or detach a backend
/// fs from a VirtioFs device.
pub fn manipulate_backend_fs(
device_mgr: &mut DeviceManager,
config: FsMountConfigInfo,
) -> std::result::Result<(), FsDeviceError> {
let mut found = false;
let mgr = &mut device_mgr.fs_manager.lock().unwrap();
for info in mgr
.info_list
.iter()
.filter(|info| info.config.tag.as_str() == config.tag.as_str())
{
found = true;
if let Some(device) = info.device.as_ref() {
if let Some(mmio_dev) = device.as_any().downcast_ref::<DbsMmioV2Device>() {
let mut guard = mmio_dev.state();
let inner_dev = guard.get_inner_device_mut();
if let Some(virtio_fs_dev) = inner_dev
.as_any_mut()
.downcast_mut::<virtio::fs::VirtioFs<GuestAddressSpaceImpl>>()
{
return virtio_fs_dev
.manipulate_backend_fs(
config.source,
config.fstype,
&config.mountpoint,
config.config,
&config.ops,
config.prefetch_list_path,
config.dax_threshold_size_kb,
)
.map(|_p| ())
.map_err(|e| FsDeviceError::AttachBackendFailed(e.to_string()));
}
}
}
}
if !found {
Err(FsDeviceError::AttachBackendFailed(
"fs tag not found".to_string(),
))
} else {
Ok(())
}
}
/// Gets the index of the device with the specified `tag` if it exists in the list.
pub fn get_index_of_tag(&self, tag: &str) -> Option<usize> {
self.info_list
.iter()
.position(|info| info.config.id().eq(tag))
}
/// Update the ratelimiter settings of a virtio fs device.
pub fn update_device_ratelimiters(
device_mgr: &mut DeviceManager,
new_cfg: FsDeviceConfigUpdateInfo,
) -> std::result::Result<(), FsDeviceError> {
let mgr = &mut device_mgr.fs_manager.lock().unwrap();
match mgr.get_index_of_tag(&new_cfg.tag) {
Some(index) => {
let config = &mut mgr.info_list[index].config;
config.rate_limiter = new_cfg.rate_limiter.clone();
let device = mgr.info_list[index]
.device
.as_mut()
.ok_or_else(|| FsDeviceError::TagNotExists("".to_owned()))?;
if let Some(mmio_dev) = device.as_any().downcast_ref::<DbsMmioV2Device>() {
let guard = mmio_dev.state();
let inner_dev = guard.get_inner_device();
if let Some(fs_dev) = inner_dev
.as_any()
.downcast_ref::<virtio::fs::VirtioFs<GuestAddressSpaceImpl>>()
{
return fs_dev
.set_patch_rate_limiters(new_cfg.bytes(), new_cfg.ops())
.map(|_p| ())
.map_err(|_e| FsDeviceError::VirtioFsEpollHanderSendFail);
}
}
Ok(())
}
None => Err(FsDeviceError::TagNotExists(new_cfg.tag)),
}
}
}
impl Default for FsDeviceMgr {
/// Create a new `FsDeviceMgr` object.
fn default() -> Self {
FsDeviceMgr {
info_list: DeviceConfigInfos::new(),
use_shared_irq: USE_SHARED_IRQ,
}
}
}


@@ -0,0 +1,246 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
//! Device Manager for Legacy Devices.
use std::io;
use std::sync::{Arc, Mutex};
use dbs_device::device_manager::Error as IoManagerError;
#[cfg(target_arch = "aarch64")]
use dbs_legacy_devices::RTCDevice;
use dbs_legacy_devices::SerialDevice;
use vmm_sys_util::eventfd::EventFd;
// The I8042 data port (IO port 0x60) is used for reading data that was received from an
// I8042 device or from the I8042 controller itself, and for writing data to an I8042
// device or to the I8042 controller itself.
const I8042_DATA_PORT: u16 = 0x60;
/// Errors generated by legacy device manager.
#[derive(Debug, thiserror::Error)]
pub enum Error {
/// Cannot add legacy device to Bus.
#[error("bus failure while managing legacy device")]
BusError(#[source] IoManagerError),
/// Cannot create EventFd.
#[error("failure while reading EventFd file descriptor")]
EventFd(#[source] io::Error),
/// Failed to register/deregister interrupt.
#[error("failure while managing interrupt for legacy device")]
IrqManager(#[source] vmm_sys_util::errno::Error),
}
/// The `LegacyDeviceManager` is a wrapper that is used for registering legacy devices
/// on an I/O Bus.
///
/// It currently manages the uart and i8042 devices. The `LegacyDeviceManager` should be initialized
/// only by using the constructor.
pub struct LegacyDeviceManager {
#[cfg(target_arch = "x86_64")]
i8042_reset_eventfd: EventFd,
#[cfg(target_arch = "aarch64")]
pub(crate) _rtc_device: Arc<Mutex<RTCDevice>>,
#[cfg(target_arch = "aarch64")]
_rtc_eventfd: EventFd,
pub(crate) com1_device: Arc<Mutex<SerialDevice>>,
_com1_eventfd: EventFd,
pub(crate) com2_device: Arc<Mutex<SerialDevice>>,
_com2_eventfd: EventFd,
}
impl LegacyDeviceManager {
/// Get the serial device for com1.
pub fn get_com1_serial(&self) -> Arc<Mutex<SerialDevice>> {
self.com1_device.clone()
}
/// Get the serial device for com2
pub fn get_com2_serial(&self) -> Arc<Mutex<SerialDevice>> {
self.com2_device.clone()
}
}
#[cfg(target_arch = "x86_64")]
pub(crate) mod x86_64 {
use super::*;
use dbs_device::device_manager::IoManager;
use dbs_device::resources::Resource;
use dbs_legacy_devices::{EventFdTrigger, I8042Device, I8042DeviceMetrics};
use kvm_ioctls::VmFd;
pub(crate) const COM1_IRQ: u32 = 4;
pub(crate) const COM1_PORT1: u16 = 0x3f8;
pub(crate) const COM2_IRQ: u32 = 3;
pub(crate) const COM2_PORT1: u16 = 0x2f8;
type Result<T> = ::std::result::Result<T, Error>;
impl LegacyDeviceManager {
/// Create a LegacyDeviceManager instance handling legacy devices (uart, i8042).
pub fn create_manager(bus: &mut IoManager, vm_fd: Option<Arc<VmFd>>) -> Result<Self> {
let (com1_device, com1_eventfd) =
Self::create_com_device(bus, vm_fd.as_ref(), COM1_IRQ, COM1_PORT1)?;
let (com2_device, com2_eventfd) =
Self::create_com_device(bus, vm_fd.as_ref(), COM2_IRQ, COM2_PORT1)?;
let exit_evt = EventFd::new(libc::EFD_NONBLOCK).map_err(Error::EventFd)?;
let i8042_device = Arc::new(Mutex::new(I8042Device::new(
EventFdTrigger::new(exit_evt.try_clone().map_err(Error::EventFd)?),
Arc::new(I8042DeviceMetrics::default()),
)));
let resources = [Resource::PioAddressRange {
// 0x60 and 0x64 are the IO ports that i8042 devices use.
// We register the pio address range 0x60 - 0x64 (base I8042_DATA_PORT, size 0x5) for the i8042.
base: I8042_DATA_PORT,
size: 0x5,
}];
bus.register_device_io(i8042_device, &resources)
.map_err(Error::BusError)?;
Ok(LegacyDeviceManager {
i8042_reset_eventfd: exit_evt,
com1_device,
_com1_eventfd: com1_eventfd,
com2_device,
_com2_eventfd: com2_eventfd,
})
}
/// Get the eventfd for exit notification.
pub fn get_reset_eventfd(&self) -> Result<EventFd> {
self.i8042_reset_eventfd.try_clone().map_err(Error::EventFd)
}
fn create_com_device(
bus: &mut IoManager,
vm_fd: Option<&Arc<VmFd>>,
irq: u32,
port_base: u16,
) -> Result<(Arc<Mutex<SerialDevice>>, EventFd)> {
let eventfd = EventFd::new(libc::EFD_NONBLOCK).map_err(Error::EventFd)?;
let device = Arc::new(Mutex::new(SerialDevice::new(
eventfd.try_clone().map_err(Error::EventFd)?,
)));
// port_base defines the base port address for the COM devices.
// Every COM device has 8 data registers, so we register the pio address range with size 0x8.
let resources = [Resource::PioAddressRange {
base: port_base,
size: 0x8,
}];
bus.register_device_io(device.clone(), &resources)
.map_err(Error::BusError)?;
if let Some(fd) = vm_fd {
fd.register_irqfd(&eventfd, irq)
.map_err(Error::IrqManager)?;
}
Ok((device, eventfd))
}
}
}
#[cfg(target_arch = "aarch64")]
pub(crate) mod aarch64 {
use super::*;
use dbs_device::device_manager::IoManager;
use dbs_device::resources::DeviceResources;
use kvm_ioctls::VmFd;
use std::collections::HashMap;
type Result<T> = ::std::result::Result<T, Error>;
/// LegacyDeviceType: com1
pub const COM1: &str = "com1";
/// LegacyDeviceType: com2
pub const COM2: &str = "com2";
/// LegacyDeviceType: rtc
pub const RTC: &str = "rtc";
impl LegacyDeviceManager {
/// Create a LegacyDeviceManager instance handling legacy devices.
pub fn create_manager(
bus: &mut IoManager,
vm_fd: Option<Arc<VmFd>>,
resources: &HashMap<String, DeviceResources>,
) -> Result<Self> {
let (com1_device, com1_eventfd) =
Self::create_com_device(bus, vm_fd.as_ref(), resources.get(COM1).unwrap())?;
let (com2_device, com2_eventfd) =
Self::create_com_device(bus, vm_fd.as_ref(), resources.get(COM2).unwrap())?;
let (rtc_device, rtc_eventfd) =
Self::create_rtc_device(bus, vm_fd.as_ref(), resources.get(RTC).unwrap())?;
Ok(LegacyDeviceManager {
_rtc_device: rtc_device,
_rtc_eventfd: rtc_eventfd,
com1_device,
_com1_eventfd: com1_eventfd,
com2_device,
_com2_eventfd: com2_eventfd,
})
}
fn create_com_device(
bus: &mut IoManager,
vm_fd: Option<&Arc<VmFd>>,
resources: &DeviceResources,
) -> Result<(Arc<Mutex<SerialDevice>>, EventFd)> {
let eventfd = EventFd::new(libc::EFD_NONBLOCK).map_err(Error::EventFd)?;
let device = Arc::new(Mutex::new(SerialDevice::new(
eventfd.try_clone().map_err(Error::EventFd)?,
)));
bus.register_device_io(device.clone(), resources.get_all_resources())
.map_err(Error::BusError)?;
if let Some(fd) = vm_fd {
let irq = resources.get_legacy_irq().unwrap();
fd.register_irqfd(&eventfd, irq)
.map_err(Error::IrqManager)?;
}
Ok((device, eventfd))
}
fn create_rtc_device(
bus: &mut IoManager,
vm_fd: Option<&Arc<VmFd>>,
resources: &DeviceResources,
) -> Result<(Arc<Mutex<RTCDevice>>, EventFd)> {
let eventfd = EventFd::new(libc::EFD_NONBLOCK).map_err(Error::EventFd)?;
let device = Arc::new(Mutex::new(RTCDevice::new()));
bus.register_device_io(device.clone(), resources.get_all_resources())
.map_err(Error::BusError)?;
if let Some(fd) = vm_fd {
let irq = resources.get_legacy_irq().unwrap();
fd.register_irqfd(&eventfd, irq)
.map_err(Error::IrqManager)?;
}
Ok((device, eventfd))
}
}
}
#[cfg(test)]
mod tests {
#[cfg(target_arch = "x86_64")]
use super::*;
#[test]
#[cfg(target_arch = "x86_64")]
fn test_create_legacy_device_manager() {
let mut bus = dbs_device::device_manager::IoManager::new();
let mgr = LegacyDeviceManager::create_manager(&mut bus, None).unwrap();
let _exit_fd = mgr.get_reset_eventfd().unwrap();
}
}
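The port layout described in the comments above (COM1 at base 0x3f8, COM2 at base 0x2f8, each spanning 8 registers) implies a simple routing rule: a PIO access resolves to a device plus a register offset within it. A std-only sketch, with `resolve_com_port` as an illustrative helper rather than anything from `dbs_device`:

```rust
// x86_64 legacy COM port constants, matching the manager above.
const COM1_PORT1: u16 = 0x3f8;
const COM2_PORT1: u16 = 0x2f8;
const COM_PORT_SIZE: u16 = 0x8; // 8 data registers per UART

// Map a PIO port to (device name, register offset), or None if the port
// does not belong to either COM range.
fn resolve_com_port(port: u16) -> Option<(&'static str, u16)> {
    for (name, base) in [("com1", COM1_PORT1), ("com2", COM2_PORT1)] {
        if port >= base && port < base + COM_PORT_SIZE {
            return Some((name, port - base));
        }
    }
    None
}
```

This is the same resolution an `IoManager` performs when a registered `PioAddressRange` is hit, here reduced to two fixed ranges.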


@@ -0,0 +1,110 @@
// Copyright 2022 Alibaba, Inc. or its affiliates. All Rights Reserved.
//
// SPDX-License-Identifier: Apache-2.0
use std::io;
use std::sync::Arc;
use dbs_address_space::{AddressSpace, AddressSpaceRegion, AddressSpaceRegionType};
use dbs_virtio_devices::{Error as VirtIoError, VirtioRegionHandler};
use log::{debug, error};
use vm_memory::{FileOffset, GuestAddressSpace, GuestMemoryRegion, GuestRegionMmap};
use crate::address_space_manager::GuestAddressSpaceImpl;
/// This struct implements the VirtioRegionHandler trait, which inserts the memory
/// region of the virtio device into vm_as and address_space.
///
/// * After the region is inserted into vm_as, the virtio device can read guest memory
/// data using vm_as.get_slice with a GuestAddress.
///
/// * The region is also inserted into address_space so that the correct guest last
/// address can be found when initializing the e820 table. The e820 table describes the
/// guest memory layout and is prepared before guest startup, so we need to configure
/// the correct guest memory addresses and lengths in it. The virtio device memory
/// belongs to the MMIO space, not to the guest memory space, and therefore must not
/// appear in the e820 table. By creating the AddressSpaceRegion with the
/// AddressSpaceRegionType::ReservedMemory type, address_space knows that this region
/// is special memory and will not put it in the e820 table.
///
/// This struct relies on the atomic-guest-memory feature. Without that feature, memory
/// regions cannot be inserted into vm_as: the insert_region interface of vm_as does
/// not insert regions in place but returns a new collection of regions, which must
/// then be swapped back into vm_as, and that is what the atomic-guest-memory feature
/// enables.
pub struct DeviceVirtioRegionHandler {
pub(crate) vm_as: GuestAddressSpaceImpl,
pub(crate) address_space: AddressSpace,
}
impl DeviceVirtioRegionHandler {
fn insert_address_space(
&mut self,
region: Arc<GuestRegionMmap>,
) -> std::result::Result<(), VirtIoError> {
let file_offset = match region.file_offset() {
// TODO: use from_arc
Some(f) => Some(FileOffset::new(f.file().try_clone()?, 0)),
None => None,
};
let as_region = Arc::new(AddressSpaceRegion::build(
AddressSpaceRegionType::DAXMemory,
region.start_addr(),
region.size() as u64,
None,
file_offset,
region.flags(),
false,
));
self.address_space.insert_region(as_region).map_err(|e| {
error!("error inserting address space region: {}", e);
// dbs-virtio-devices should not depend on dbs-address-space.
// So here io::Error is used instead of AddressSpaceError directly.
VirtIoError::IOError(io::Error::new(
io::ErrorKind::Other,
format!(
"invalid address space region ({0:#x}, {1:#x})",
region.start_addr().0,
region.len()
),
))
})?;
Ok(())
}
fn insert_vm_as(
&mut self,
region: Arc<GuestRegionMmap>,
) -> std::result::Result<(), VirtIoError> {
let vm_as_new = self.vm_as.memory().insert_region(region).map_err(|e| {
error!(
"DeviceVirtioRegionHandler failed to insert guest memory region: {:?}.",
e
);
VirtIoError::InsertMmap(e)
})?;
// Do not expect poisoned lock here, so safe to unwrap().
self.vm_as.lock().unwrap().replace(vm_as_new);
Ok(())
}
}
impl VirtioRegionHandler for DeviceVirtioRegionHandler {
fn insert_region(
&mut self,
region: Arc<GuestRegionMmap>,
) -> std::result::Result<(), VirtIoError> {
debug!(
"add guest memory region to address_space/vm_as, new region: {:?}",
region
);
self.insert_address_space(region.clone())?;
self.insert_vm_as(region)?;
Ok(())
}
}
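The e820 point made in the doc comment above can be sketched with only `std`: when building the guest memory map, regions inserted for virtio MMIO/DAX windows are marked reserved and must be skipped, so only RAM regions contribute e820 entries. The types and names here (`RegionType`, `Region`, `e820_entries`) are illustrative, not the `dbs-address-space` API.

```rust
// Simplified model of filtering reserved regions out of the e820 table.
#[derive(Clone, Copy, PartialEq)]
enum RegionType {
    Ram,
    ReservedMemory, // virtio MMIO/DAX windows: not reported in e820
}

struct Region {
    start: u64,
    size: u64,
    kind: RegionType,
}

// Collect (start, size) pairs for the e820 table, skipping reserved regions.
fn e820_entries(regions: &[Region]) -> Vec<(u64, u64)> {
    regions
        .iter()
        .filter(|r| r.kind == RegionType::Ram)
        .map(|r| (r.start, r.size))
        .collect()
}
```

This mirrors why the handler tags virtio regions with a non-RAM region type: the table builder can then find the correct guest last address without counting device windows as guest RAM.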

File diff suppressed because it is too large


@@ -0,0 +1,387 @@
// Copyright 2020-2022 Alibaba, Inc. or its affiliates. All Rights Reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
use std::convert::TryInto;
use std::sync::Arc;
use dbs_utils::net::{MacAddr, Tap, TapError};
use dbs_utils::rate_limiter::BucketUpdate;
use dbs_virtio_devices as virtio;
use dbs_virtio_devices::net::Net;
use dbs_virtio_devices::Error as VirtioError;
use serde_derive::{Deserialize, Serialize};
use crate::address_space_manager::GuestAddressSpaceImpl;
use crate::config_manager::{
ConfigItem, DeviceConfigInfo, DeviceConfigInfos, RateLimiterConfigInfo,
};
use crate::device_manager::{DeviceManager, DeviceMgrError, DeviceOpContext};
use crate::get_bucket_update;
use super::DbsMmioV2Device;
/// Default number of virtio queues, one rx/tx pair.
pub const NUM_QUEUES: usize = 2;
/// Default size of virtio queues.
pub const QUEUE_SIZE: u16 = 256;
// The flag of whether to use the shared irq.
const USE_SHARED_IRQ: bool = true;
// The flag of whether to use the generic irq.
const USE_GENERIC_IRQ: bool = true;
/// Errors associated with virtio net device operations.
#[derive(Debug, thiserror::Error)]
pub enum VirtioNetDeviceError {
/// The virtual machine instance ID is invalid.
#[error("the virtual machine instance ID is invalid")]
InvalidVMID,
/// The iface ID is invalid.
#[error("invalid virtio-net iface id '{0}'")]
InvalidIfaceId(String),
/// Invalid queue number configuration for virtio_net device.
#[error("invalid queue number {0} for virtio-net device")]
InvalidQueueNum(usize),
/// Failure from device manager operations.
#[error("failure in device manager operations, {0}")]
DeviceManager(#[source] DeviceMgrError),
/// The Context Identifier is already in use.
#[error("the device ID {0} already exists")]
DeviceIDAlreadyExist(String),
/// The MAC address is already in use.
#[error("the guest MAC address {0} is already in use")]
GuestMacAddressInUse(String),
/// The host device name is already in use.
#[error("the host device name {0} is already in use")]
HostDeviceNameInUse(String),
/// Cannot open/create tap device.
#[error("cannot open TAP device")]
OpenTap(#[source] TapError),
/// Failure from virtio subsystem.
#[error(transparent)]
Virtio(VirtioError),
/// Failed to send patch message to net epoll handler.
#[error("could not send patch message to the net epoll handler")]
NetEpollHanderSendFail,
/// The update is not allowed after booting the microvm.
#[error("update operation is not allowed after boot")]
UpdateNotAllowedPostBoot,
/// TODO: split this error at some point.
/// Internal errors are due to resource exhaustion.
/// User errors are due to invalid permissions.
#[error("cannot create network device: {0}")]
CreateNetDevice(#[source] VirtioError),
/// Cannot initialize an MMIO network device or add a device to the MMIO bus.
#[error("failure while registering network device: {0}")]
RegisterNetDevice(#[source] DeviceMgrError),
}
/// Configuration information for virtio net devices.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct VirtioNetDeviceConfigUpdateInfo {
/// ID of the guest network interface.
pub iface_id: String,
/// Rate limiter for received packets.
pub rx_rate_limiter: Option<RateLimiterConfigInfo>,
/// Rate limiter for transmitted packets.
pub tx_rate_limiter: Option<RateLimiterConfigInfo>,
}
impl VirtioNetDeviceConfigUpdateInfo {
/// Provides a `BucketUpdate` description for the RX bandwidth rate limiter.
pub fn rx_bytes(&self) -> BucketUpdate {
get_bucket_update!(self, rx_rate_limiter, bandwidth)
}
/// Provides a `BucketUpdate` description for the RX ops rate limiter.
pub fn rx_ops(&self) -> BucketUpdate {
get_bucket_update!(self, rx_rate_limiter, ops)
}
/// Provides a `BucketUpdate` description for the TX bandwidth rate limiter.
pub fn tx_bytes(&self) -> BucketUpdate {
get_bucket_update!(self, tx_rate_limiter, bandwidth)
}
/// Provides a `BucketUpdate` description for the TX ops rate limiter.
pub fn tx_ops(&self) -> BucketUpdate {
get_bucket_update!(self, tx_rate_limiter, ops)
}
}
/// Configuration information for virtio net devices.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize, Default)]
pub struct VirtioNetDeviceConfigInfo {
/// ID of the guest network interface.
pub iface_id: String,
/// Host level path for the guest network interface.
pub host_dev_name: String,
/// Number of virtqueues to use.
pub num_queues: usize,
/// Size of each virtqueue, in number of descriptors.
pub queue_size: u16,
/// Guest MAC address.
pub guest_mac: Option<MacAddr>,
/// Rate limiter for received packets.
pub rx_rate_limiter: Option<RateLimiterConfigInfo>,
/// Rate limiter for transmitted packets.
pub tx_rate_limiter: Option<RateLimiterConfigInfo>,
/// Allow duplicate MAC addresses.
pub allow_duplicate_mac: bool,
/// Use a shared IRQ.
pub use_shared_irq: Option<bool>,
/// Use a generic IRQ.
pub use_generic_irq: Option<bool>,
}
impl VirtioNetDeviceConfigInfo {
/// Returns the tap device that `host_dev_name` refers to.
pub fn open_tap(&self) -> std::result::Result<Tap, VirtioNetDeviceError> {
Tap::open_named(self.host_dev_name.as_str(), false).map_err(VirtioNetDeviceError::OpenTap)
}
/// Returns a reference to the guest MAC address. If the MAC address is not
/// configured, it returns None.
pub fn guest_mac(&self) -> Option<&MacAddr> {
self.guest_mac.as_ref()
}
/// Returns the Rx and Tx virtqueue sizes.
pub fn queue_sizes(&self) -> Vec<u16> {
let mut queue_size = self.queue_size;
if queue_size == 0 {
queue_size = QUEUE_SIZE;
}
let num_queues = if self.num_queues > 0 {
self.num_queues
} else {
NUM_QUEUES
};
(0..num_queues).map(|_| queue_size).collect::<Vec<u16>>()
}
}
impl ConfigItem for VirtioNetDeviceConfigInfo {
type Err = VirtioNetDeviceError;
fn id(&self) -> &str {
&self.iface_id
}
fn check_conflicts(&self, other: &Self) -> Result<(), VirtioNetDeviceError> {
if self.iface_id == other.iface_id {
Err(VirtioNetDeviceError::DeviceIDAlreadyExist(
self.iface_id.clone(),
))
} else if !other.allow_duplicate_mac
&& self.guest_mac.is_some()
&& self.guest_mac == other.guest_mac
{
Err(VirtioNetDeviceError::GuestMacAddressInUse(
self.guest_mac.as_ref().unwrap().to_string(),
))
} else if self.host_dev_name == other.host_dev_name {
Err(VirtioNetDeviceError::HostDeviceNameInUse(
self.host_dev_name.clone(),
))
} else {
Ok(())
}
}
}
/// Virtio Net Device Info
pub type VirtioNetDeviceInfo = DeviceConfigInfo<VirtioNetDeviceConfigInfo>;
/// Device manager to manage all virtio net devices.
pub struct VirtioNetDeviceMgr {
pub(crate) info_list: DeviceConfigInfos<VirtioNetDeviceConfigInfo>,
pub(crate) use_shared_irq: bool,
}
impl VirtioNetDeviceMgr {
/// Gets the index of the device with the specified `iface_id` if it exists in the list.
pub fn get_index_of_iface_id(&self, if_id: &str) -> Option<usize> {
self.info_list
.iter()
.position(|info| info.config.iface_id.eq(if_id))
}
/// Insert or update a virtio net device into the manager.
pub fn insert_device(
device_mgr: &mut DeviceManager,
mut ctx: DeviceOpContext,
config: VirtioNetDeviceConfigInfo,
) -> std::result::Result<(), VirtioNetDeviceError> {
if config.num_queues % 2 != 0 {
return Err(VirtioNetDeviceError::InvalidQueueNum(config.num_queues));
}
if !cfg!(feature = "hotplug") && ctx.is_hotplug {
return Err(VirtioNetDeviceError::UpdateNotAllowedPostBoot);
}
let mgr = &mut device_mgr.virtio_net_manager;
slog::info!(
ctx.logger(),
"add virtio-net device configuration";
"subsystem" => "net_dev_mgr",
"id" => &config.iface_id,
"host_dev_name" => &config.host_dev_name,
);
let device_index = mgr.info_list.insert_or_update(&config)?;
if ctx.is_hotplug {
slog::info!(
ctx.logger(),
"attach virtio-net device";
"subsystem" => "net_dev_mgr",
"id" => &config.iface_id,
"host_dev_name" => &config.host_dev_name,
);
match Self::create_device(&config, &mut ctx) {
Ok(device) => {
let dev = DeviceManager::create_mmio_virtio_device(
device,
&mut ctx,
config.use_shared_irq.unwrap_or(mgr.use_shared_irq),
config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(VirtioNetDeviceError::DeviceManager)?;
ctx.insert_hotplug_mmio_device(&dev.clone(), None)
.map_err(VirtioNetDeviceError::DeviceManager)?;
// Live upgrade needs to save/restore the device from info.device.
mgr.info_list[device_index].set_device(dev);
}
Err(e) => {
mgr.info_list.remove(device_index);
return Err(VirtioNetDeviceError::Virtio(e));
}
}
}
Ok(())
}
/// Update the ratelimiter settings of a virtio net device.
pub fn update_device_ratelimiters(
device_mgr: &mut DeviceManager,
new_cfg: VirtioNetDeviceConfigUpdateInfo,
) -> std::result::Result<(), VirtioNetDeviceError> {
let mgr = &mut device_mgr.virtio_net_manager;
match mgr.get_index_of_iface_id(&new_cfg.iface_id) {
Some(index) => {
let config = &mut mgr.info_list[index].config;
config.rx_rate_limiter = new_cfg.rx_rate_limiter.clone();
config.tx_rate_limiter = new_cfg.tx_rate_limiter.clone();
let device = mgr.info_list[index].device.as_mut().ok_or_else(|| {
VirtioNetDeviceError::InvalidIfaceId(new_cfg.iface_id.clone())
})?;
if let Some(mmio_dev) = device.as_any().downcast_ref::<DbsMmioV2Device>() {
let guard = mmio_dev.state();
let inner_dev = guard.get_inner_device();
if let Some(net_dev) = inner_dev
.as_any()
.downcast_ref::<virtio::net::Net<GuestAddressSpaceImpl>>()
{
return net_dev
.set_patch_rate_limiters(
new_cfg.rx_bytes(),
new_cfg.rx_ops(),
new_cfg.tx_bytes(),
new_cfg.tx_ops(),
)
.map(|_p| ())
.map_err(|_e| VirtioNetDeviceError::NetEpollHanderSendFail);
}
}
Ok(())
}
None => Err(VirtioNetDeviceError::InvalidIfaceId(
new_cfg.iface_id.clone(),
)),
}
}
/// Attach all configured virtio net devices to the virtual machine instance.
pub fn attach_devices(
&mut self,
ctx: &mut DeviceOpContext,
) -> std::result::Result<(), VirtioNetDeviceError> {
for info in self.info_list.iter_mut() {
slog::info!(
ctx.logger(),
"attach virtio-net device";
"subsystem" => "net_dev_mgr",
"id" => &info.config.iface_id,
"host_dev_name" => &info.config.host_dev_name,
);
let device = Self::create_device(&info.config, ctx)
.map_err(VirtioNetDeviceError::CreateNetDevice)?;
let device = DeviceManager::create_mmio_virtio_device(
device,
ctx,
info.config.use_shared_irq.unwrap_or(self.use_shared_irq),
info.config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(VirtioNetDeviceError::RegisterNetDevice)?;
info.set_device(device);
}
Ok(())
}
fn create_device(
cfg: &VirtioNetDeviceConfigInfo,
ctx: &mut DeviceOpContext,
) -> std::result::Result<Box<Net<GuestAddressSpaceImpl>>, virtio::Error> {
let epoll_mgr = ctx.epoll_mgr.clone().ok_or(virtio::Error::InvalidInput)?;
let rx_rate_limiter = match cfg.rx_rate_limiter.as_ref() {
Some(rl) => Some(rl.try_into().map_err(virtio::Error::IOError)?),
None => None,
};
let tx_rate_limiter = match cfg.tx_rate_limiter.as_ref() {
Some(rl) => Some(rl.try_into().map_err(virtio::Error::IOError)?),
None => None,
};
let net_device = Net::new(
cfg.host_dev_name.clone(),
cfg.guest_mac(),
Arc::new(cfg.queue_sizes()),
epoll_mgr,
rx_rate_limiter,
tx_rate_limiter,
)?;
Ok(Box::new(net_device))
}
}
impl Default for VirtioNetDeviceMgr {
/// Create a new virtio net device manager.
fn default() -> Self {
VirtioNetDeviceMgr {
info_list: DeviceConfigInfos::new(),
use_shared_irq: USE_SHARED_IRQ,
}
}
}
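The defaulting logic in `VirtioNetDeviceConfigInfo::queue_sizes()` above (zero means "use the default") can be sketched standalone. `NetCfg` is a simplified stand-in for the real config struct; the constants mirror `NUM_QUEUES` and `QUEUE_SIZE`:

```rust
// Defaults mirrored from the module above: one rx/tx queue pair of 256 descriptors.
const NUM_QUEUES: usize = 2;
const QUEUE_SIZE: u16 = 256;

// Simplified stand-in for VirtioNetDeviceConfigInfo.
struct NetCfg {
    num_queues: usize,
    queue_size: u16,
}

impl NetCfg {
    // Zero in either field falls back to the default, as in queue_sizes() above.
    fn queue_sizes(&self) -> Vec<u16> {
        let queue_size = if self.queue_size == 0 {
            QUEUE_SIZE
        } else {
            self.queue_size
        };
        let num_queues = if self.num_queues > 0 {
            self.num_queues
        } else {
            NUM_QUEUES
        };
        (0..num_queues).map(|_| queue_size).collect()
    }
}

fn main() {
    let default_cfg = NetCfg { num_queues: 0, queue_size: 0 };
    assert_eq!(default_cfg.queue_sizes(), vec![256, 256]); // default rx/tx pair
    let custom = NetCfg { num_queues: 4, queue_size: 128 };
    assert_eq!(custom.queue_sizes(), vec![128; 4]);
    println!("ok");
}
```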


@@ -0,0 +1,299 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
use std::sync::Arc;
use dbs_virtio_devices as virtio;
use dbs_virtio_devices::mmio::DRAGONBALL_FEATURE_INTR_USED;
use dbs_virtio_devices::vsock::backend::{
VsockInnerBackend, VsockInnerConnector, VsockTcpBackend, VsockUnixStreamBackend,
};
use dbs_virtio_devices::vsock::Vsock;
use dbs_virtio_devices::Error as VirtioError;
use serde_derive::{Deserialize, Serialize};
use super::StartMicroVmError;
use crate::config_manager::{ConfigItem, DeviceConfigInfo, DeviceConfigInfos};
use crate::device_manager::{DeviceManager, DeviceOpContext};
pub use dbs_virtio_devices::vsock::QUEUE_SIZES;
const SUBSYSTEM: &str = "vsock_dev_mgr";
// Whether to use a shared IRQ by default.
const USE_SHARED_IRQ: bool = true;
// Whether to use a generic IRQ by default.
const USE_GENERIC_IRQ: bool = true;
/// Errors associated with `VsockDeviceConfigInfo`.
#[derive(Debug, thiserror::Error)]
pub enum VsockDeviceError {
/// The virtual machine instance ID is invalid.
#[error("the virtual machine instance ID is invalid")]
InvalidVMID,
/// The device ID is already in use.
#[error("the device ID {0} already exists")]
DeviceIDAlreadyExist(String),
/// The Context Identifier is invalid.
#[error("the guest CID {0} is invalid")]
GuestCIDInvalid(u32),
/// The Context Identifier is already in use.
#[error("the guest CID {0} is already in use")]
GuestCIDAlreadyInUse(u32),
/// The Unix Domain Socket path is already in use.
#[error("the Unix Domain Socket path {0} is already in use")]
UDSPathAlreadyInUse(String),
/// The net address is already in use.
#[error("the net address {0} is already in use")]
NetAddrAlreadyInUse(String),
/// The update is not allowed after booting the microvm.
#[error("update operation is not allowed after boot")]
UpdateNotAllowedPostBoot,
/// The vsock ID already exists.
#[error("vsock id {0} already exists")]
VsockIdAlreadyExists(String),
/// Inner backend creation error.
#[error("vsock inner backend create error: {0}")]
CreateInnerBackend(#[source] std::io::Error),
}
/// Configuration information for a vsock device.
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct VsockDeviceConfigInfo {
/// ID of the vsock device.
pub id: String,
/// A 32-bit Context Identifier (CID) used to identify the guest.
pub guest_cid: u32,
/// Unix domain socket path.
pub uds_path: Option<String>,
/// TCP socket address.
pub tcp_addr: Option<String>,
/// Virtio queue sizes.
pub queue_size: Vec<u16>,
/// Use shared irq
pub use_shared_irq: Option<bool>,
/// Use generic irq
pub use_generic_irq: Option<bool>,
}
impl Default for VsockDeviceConfigInfo {
fn default() -> Self {
Self {
id: String::default(),
guest_cid: 0,
uds_path: None,
tcp_addr: None,
queue_size: Vec::from(QUEUE_SIZES),
use_shared_irq: None,
use_generic_irq: None,
}
}
}
impl VsockDeviceConfigInfo {
/// Get number and size of queues supported.
pub fn queue_sizes(&self) -> Vec<u16> {
self.queue_size.clone()
}
}
impl ConfigItem for VsockDeviceConfigInfo {
type Err = VsockDeviceError;
fn id(&self) -> &str {
&self.id
}
fn check_conflicts(&self, other: &Self) -> Result<(), VsockDeviceError> {
if self.id == other.id {
return Err(VsockDeviceError::DeviceIDAlreadyExist(self.id.clone()));
}
if self.guest_cid == other.guest_cid {
return Err(VsockDeviceError::GuestCIDAlreadyInUse(self.guest_cid));
}
if let (Some(self_uds_path), Some(other_uds_path)) =
(self.uds_path.as_ref(), other.uds_path.as_ref())
{
if self_uds_path == other_uds_path {
return Err(VsockDeviceError::UDSPathAlreadyInUse(self_uds_path.clone()));
}
}
if let (Some(self_net_addr), Some(other_net_addr)) =
(self.tcp_addr.as_ref(), other.tcp_addr.as_ref())
{
if self_net_addr == other_net_addr {
return Err(VsockDeviceError::NetAddrAlreadyInUse(self_net_addr.clone()));
}
}
Ok(())
}
}
/// Vsock Device Info
pub type VsockDeviceInfo = DeviceConfigInfo<VsockDeviceConfigInfo>;
/// Device manager to manage all vsock devices.
pub struct VsockDeviceMgr {
pub(crate) info_list: DeviceConfigInfos<VsockDeviceConfigInfo>,
pub(crate) default_inner_backend: Option<VsockInnerBackend>,
pub(crate) default_inner_connector: Option<VsockInnerConnector>,
pub(crate) use_shared_irq: bool,
}
impl VsockDeviceMgr {
/// Insert or update a vsock device into the manager.
pub fn insert_device(
&mut self,
ctx: DeviceOpContext,
config: VsockDeviceConfigInfo,
) -> std::result::Result<(), VsockDeviceError> {
if ctx.is_hotplug {
slog::error!(
ctx.logger(),
"no support of virtio-vsock device hotplug";
"subsystem" => SUBSYSTEM,
"id" => &config.id,
"uds_path" => &config.uds_path,
);
return Err(VsockDeviceError::UpdateNotAllowedPostBoot);
}
// VMADDR_CID_ANY (-1U) means any address for binding;
// VMADDR_CID_HYPERVISOR (0) is reserved for services built into the hypervisor;
// VMADDR_CID_RESERVED (1) must not be used;
// VMADDR_CID_HOST (2) is the well-known address of the host.
if config.guest_cid <= 2 {
return Err(VsockDeviceError::GuestCIDInvalid(config.guest_cid));
}
slog::info!(
ctx.logger(),
"add virtio-vsock device configuration";
"subsystem" => SUBSYSTEM,
"id" => &config.id,
"uds_path" => &config.uds_path,
);
self.lazy_make_default_connector()?;
self.info_list.insert_or_update(&config)?;
Ok(())
}
/// Attach all configured vsock devices to the virtual machine instance.
pub fn attach_devices(
&mut self,
ctx: &mut DeviceOpContext,
) -> std::result::Result<(), StartMicroVmError> {
let epoll_mgr = ctx
.epoll_mgr
.clone()
.ok_or(StartMicroVmError::CreateVsockDevice(
virtio::Error::InvalidInput,
))?;
for info in self.info_list.iter_mut() {
slog::info!(
ctx.logger(),
"attach virtio-vsock device";
"subsystem" => SUBSYSTEM,
"id" => &info.config.id,
"uds_path" => &info.config.uds_path,
);
let mut device = Box::new(
Vsock::new(
info.config.guest_cid as u64,
Arc::new(info.config.queue_sizes()),
epoll_mgr.clone(),
)
.map_err(VirtioError::VirtioVsockError)
.map_err(StartMicroVmError::CreateVsockDevice)?,
);
if let Some(uds_path) = info.config.uds_path.as_ref() {
let unix_backend = VsockUnixStreamBackend::new(uds_path.clone())
.map_err(VirtioError::VirtioVsockError)
.map_err(StartMicroVmError::CreateVsockDevice)?;
device
.add_backend(Box::new(unix_backend), true)
.map_err(VirtioError::VirtioVsockError)
.map_err(StartMicroVmError::CreateVsockDevice)?;
}
if let Some(tcp_addr) = info.config.tcp_addr.as_ref() {
let tcp_backend = VsockTcpBackend::new(tcp_addr.clone())
.map_err(VirtioError::VirtioVsockError)
.map_err(StartMicroVmError::CreateVsockDevice)?;
device
.add_backend(Box::new(tcp_backend), false)
.map_err(VirtioError::VirtioVsockError)
.map_err(StartMicroVmError::CreateVsockDevice)?;
}
// Add the inner backend to the first added vsock device.
if let Some(inner_backend) = self.default_inner_backend.take() {
device
.add_backend(Box::new(inner_backend), false)
.map_err(VirtioError::VirtioVsockError)
.map_err(StartMicroVmError::CreateVsockDevice)?;
}
let device = DeviceManager::create_mmio_virtio_device_with_features(
device,
ctx,
Some(DRAGONBALL_FEATURE_INTR_USED),
info.config.use_shared_irq.unwrap_or(self.use_shared_irq),
info.config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(StartMicroVmError::RegisterVsockDevice)?;
info.device = Some(device);
}
Ok(())
}
// Check that the default connector is present, building it if needed.
fn lazy_make_default_connector(&mut self) -> std::result::Result<(), VsockDeviceError> {
if self.default_inner_connector.is_none() {
let inner_backend =
VsockInnerBackend::new().map_err(VsockDeviceError::CreateInnerBackend)?;
self.default_inner_connector = Some(inner_backend.get_connector());
self.default_inner_backend = Some(inner_backend);
}
Ok(())
}
/// Get the default vsock inner connector.
pub fn get_default_connector(
&mut self,
) -> std::result::Result<VsockInnerConnector, VsockDeviceError> {
self.lazy_make_default_connector()?;
// safe to unwrap, because we created the inner connector before
Ok(self.default_inner_connector.clone().unwrap())
}
}
impl Default for VsockDeviceMgr {
/// Create a new Vsock device manager.
fn default() -> Self {
VsockDeviceMgr {
info_list: DeviceConfigInfos::new(),
default_inner_backend: None,
default_inner_connector: None,
use_shared_irq: USE_SHARED_IRQ,
}
}
}
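The guest CID check in `VsockDeviceMgr::insert_device()` above rejects the reserved well-known addresses. A standalone sketch (the function name and `String` error are stand-ins, not part of the crate):

```rust
// Mirror of the check in insert_device(): CIDs 0 (VMADDR_CID_HYPERVISOR),
// 1 (VMADDR_CID_RESERVED) and 2 (VMADDR_CID_HOST) must not be assigned
// to a guest, so anything <= 2 is rejected.
fn validate_guest_cid(cid: u32) -> Result<u32, String> {
    if cid <= 2 {
        Err(format!("the guest CID {} is invalid", cid))
    } else {
        Ok(cid)
    }
}

fn main() {
    assert!(validate_guest_cid(0).is_err()); // reserved for the hypervisor
    assert!(validate_guest_cid(2).is_err()); // well-known host address
    assert!(validate_guest_cid(3).is_ok());  // first usable guest CID
    println!("ok");
}
```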

src/dragonball/src/error.rs

@@ -0,0 +1,224 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file
//! Error codes for the virtual machine monitor subsystem.
#[cfg(feature = "dbs-virtio-devices")]
use dbs_virtio_devices::Error as VirtIoError;
use crate::{address_space_manager, device_manager, resource_manager, vcpu, vm};
/// Shorthand result type for internal VMM commands.
pub type Result<T> = std::result::Result<T, Error>;
/// Errors associated with the VMM internal logic.
///
/// These errors cannot be generated by direct user input, but can result from bad configuration
/// of the host (for example if Dragonball doesn't have permissions to open the KVM fd).
#[derive(Debug, thiserror::Error)]
pub enum Error {
/// Empty AddressSpace from parameters.
#[error("Empty AddressSpace from parameters")]
AddressSpace,
/// The zero page extends past the end of guest_mem.
#[error("the guest zero page extends past the end of guest memory")]
ZeroPagePastRamEnd,
/// Error writing the zero page of guest memory.
#[error("failed to write to guest zero page")]
ZeroPageSetup,
/// Failure occurs in issuing KVM ioctls and errors will be returned from kvm_ioctls lib.
#[error("failure in issuing KVM ioctl command: {0}")]
Kvm(#[source] kvm_ioctls::Error),
/// The host kernel reports an unsupported KVM API version.
#[error("unsupported KVM version {0}")]
KvmApiVersion(i32),
/// Cannot initialize the KVM context due to missing capabilities.
#[error("missing KVM capability: {0:?}")]
KvmCap(kvm_ioctls::Cap),
#[cfg(target_arch = "x86_64")]
#[error("failed to configure MSRs: {0:?}")]
/// Cannot configure MSRs
GuestMSRs(dbs_arch::msr::Error),
/// MSR inner error
#[error("MSR inner error")]
Msr(vmm_sys_util::fam::Error),
/// Error writing MP table to memory.
#[cfg(target_arch = "x86_64")]
#[error("failed to write MP table to guest memory: {0}")]
MpTableSetup(#[source] dbs_boot::mptable::Error),
/// Fail to boot system
#[error("failed to boot system: {0}")]
BootSystem(#[source] dbs_boot::Error),
/// Cannot open the VM file descriptor.
#[error(transparent)]
Vm(vm::VmError),
}
/// Errors associated with starting the instance.
#[derive(Debug, thiserror::Error)]
pub enum StartMicroVmError {
/// Failed to allocate resources.
#[error("cannot allocate resources")]
AllocateResource(#[source] resource_manager::ResourceError),
/// Cannot read from an Event file descriptor.
#[error("failure while reading from EventFd file descriptor")]
EventFd,
/// Cannot add event to Epoll.
#[error("failure while registering epoll event for file descriptor")]
RegisterEvent,
/// The start command was issued more than once.
#[error("the virtual machine is already running")]
MicroVMAlreadyRunning,
/// Cannot start the VM because the kernel was not configured.
#[error("cannot start the virtual machine without kernel configuration")]
MissingKernelConfig,
#[cfg(feature = "hotplug")]
/// Upcall initialization is missing the required vsock device.
#[error("the upcall client needs a virtio-vsock device for communication")]
UpcallMissVsock,
/// Upcall is not ready
#[error("the upcall client is not ready")]
UpcallNotReady,
/// The configuration passed in is invalid.
#[error("invalid virtual machine configuration: {0}")]
ConfigureInvalid(String),
/// This error is thrown by the minimal boot loader implementation.
/// It is related to a faulty memory configuration.
#[error("failure while configuring boot information for the virtual machine: {0}")]
ConfigureSystem(#[source] Error),
/// Cannot configure the VM.
#[error("failure while configuring the virtual machine: {0}")]
ConfigureVm(#[source] vm::VmError),
/// Cannot load initrd.
#[error("cannot load Initrd into guest memory: {0}")]
InitrdLoader(#[from] LoadInitrdError),
/// Cannot load kernel due to invalid memory configuration or invalid kernel image.
#[error("cannot load guest kernel into guest memory: {0}")]
KernelLoader(#[source] linux_loader::loader::Error),
/// Cannot load command line string.
#[error("failure while configuring guest kernel commandline: {0}")]
LoadCommandline(#[source] linux_loader::loader::Error),
/// The device manager failed to manage devices.
#[error("the device manager failed to manage devices: {0}")]
DeviceManager(#[source] device_manager::DeviceMgrError),
/// Cannot add devices to the Legacy I/O Bus.
#[error("failure in managing legacy device: {0}")]
LegacyDevice(#[source] device_manager::LegacyDeviceError),
#[cfg(feature = "virtio-vsock")]
/// Failed to create the vsock device.
#[error("cannot create virtio-vsock device: {0}")]
CreateVsockDevice(#[source] VirtIoError),
#[cfg(feature = "virtio-vsock")]
/// Cannot initialize an MMIO vsock device or add a device to the MMIO bus.
#[error("failure while registering virtio-vsock device: {0}")]
RegisterVsockDevice(#[source] device_manager::DeviceMgrError),
/// Address space manager related error, e.g. cannot access the guest address space manager.
#[error("address space manager related error: {0}")]
AddressManagerError(#[source] address_space_manager::AddressManagerError),
/// Cannot create a new vCPU file descriptor.
#[error("vCPU related error: {0}")]
Vcpu(#[source] vcpu::VcpuManagerError),
#[cfg(all(feature = "hotplug", feature = "dbs-upcall"))]
/// Upcall initialize Error.
#[error("failure while initializing the upcall client: {0}")]
UpcallInitError(#[source] dbs_upcall::UpcallClientError),
#[cfg(all(feature = "hotplug", feature = "dbs-upcall"))]
/// Upcall connect Error.
#[error("failure while connecting the upcall client: {0}")]
UpcallConnectError(#[source] dbs_upcall::UpcallClientError),
#[cfg(feature = "virtio-blk")]
/// Virtio-blk errors.
#[error("virtio-blk errors: {0}")]
BlockDeviceError(#[source] device_manager::blk_dev_mgr::BlockDeviceError),
#[cfg(feature = "virtio-net")]
/// Virtio-net errors.
#[error("virtio-net errors: {0}")]
VirtioNetDeviceError(#[source] device_manager::virtio_net_dev_mgr::VirtioNetDeviceError),
#[cfg(feature = "virtio-fs")]
/// Virtio-fs errors.
#[error("virtio-fs errors: {0}")]
FsDeviceError(#[source] device_manager::fs_dev_mgr::FsDeviceError),
}
/// Errors associated with starting the instance.
#[derive(Debug, thiserror::Error)]
pub enum StopMicrovmError {
/// Guest memory has not been initialized.
#[error("Guest memory has not been initialized")]
GuestMemoryNotInitialized,
/// Cannot remove devices.
#[error("failed to remove devices in the device manager: {0}")]
DeviceManager(#[source] device_manager::DeviceMgrError),
}
/// Errors associated with loading initrd
#[derive(Debug, thiserror::Error)]
pub enum LoadInitrdError {
/// Cannot load initrd due to an invalid memory configuration.
#[error("failed to load the initrd image to guest memory")]
LoadInitrd,
/// Cannot load initrd due to an invalid image.
#[error("failed to read the initrd image: {0}")]
ReadInitrd(#[source] std::io::Error),
}
/// A dedicated error type to glue with the vmm_epoll crate.
#[derive(Debug, thiserror::Error)]
pub enum EpollError {
/// Generic internal error.
#[error("unclassified internal error")]
InternalError,
/// Errors from the epoll subsystem.
#[error("failed to issue epoll syscall: {0}")]
EpollMgr(#[from] dbs_utils::epoll_manager::Error),
/// Generic IO errors.
#[error(transparent)]
IOError(std::io::Error),
#[cfg(feature = "dbs-virtio-devices")]
/// Errors from virtio devices.
#[error("failed to manage virtio device: {0}")]
VirtIoDevice(#[source] VirtIoError),
}
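The error enums above rely on `thiserror` attributes. As a rough illustration of what a variant such as `StopMicrovmError::DeviceManager` expands to, here is a hand-written equivalent using only the standard library (`StopError` is a hypothetical stand-in, with `std::io::Error` in place of `DeviceMgrError`):

```rust
use std::error::Error;
use std::fmt;

// Hand-rolled version of a thiserror-style enum: #[error("...")] becomes the
// Display impl, #[source] becomes the Error::source() impl.
#[derive(Debug)]
enum StopError {
    GuestMemoryNotInitialized,
    DeviceManager(std::io::Error), // io::Error stands in for DeviceMgrError
}

impl fmt::Display for StopError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StopError::GuestMemoryNotInitialized => {
                write!(f, "guest memory has not been initialized")
            }
            StopError::DeviceManager(e) => {
                write!(f, "failed to remove devices in the device manager: {}", e)
            }
        }
    }
}

impl Error for StopError {
    // Expose the wrapped cause, as #[source] does.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StopError::DeviceManager(e) => Some(e),
            _ => None,
        }
    }
}

fn main() {
    let e = StopError::DeviceManager(std::io::Error::new(std::io::ErrorKind::Other, "busy"));
    assert!(e.to_string().contains("busy"));
    assert!(e.source().is_some());
    println!("ok");
}
```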


@@ -0,0 +1,169 @@
// Copyright (C) 2020-2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
//! Event manager to manage and handle IO events and requests from the API server.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use dbs_utils::epoll_manager::{
EpollManager, EventOps, EventSet, Events, MutEventSubscriber, SubscriberId,
};
use log::{error, warn};
use vmm_sys_util::eventfd::EventFd;
use crate::error::{EpollError, Result};
use crate::vmm::Vmm;
// Statically assigned epoll slot for VMM events.
pub(crate) const EPOLL_EVENT_EXIT: u32 = 0;
pub(crate) const EPOLL_EVENT_API_REQUEST: u32 = 1;
/// Shared information between vmm::vmm_thread_event_loop() and VmmEpollHandler.
pub(crate) struct EventContext {
pub api_event_fd: EventFd,
pub api_event_triggered: bool,
pub exit_evt_triggered: bool,
}
impl EventContext {
/// Create a new instance of [`EventContext`].
pub fn new(api_event_fd: EventFd) -> Result<Self> {
Ok(EventContext {
api_event_fd,
api_event_triggered: false,
exit_evt_triggered: false,
})
}
}
/// Event manager for VMM to handle API requests and IO events.
pub struct EventManager {
epoll_mgr: EpollManager,
subscriber_id: SubscriberId,
vmm_event_count: Arc<AtomicUsize>,
}
impl Drop for EventManager {
fn drop(&mut self) {
// Vmm -> Vm -> EpollManager -> VmmEpollHandler -> Vmm
// We need to remove VmmEpollHandler to break the circular reference
// so that Vmm can drop.
self.epoll_mgr
.remove_subscriber(self.subscriber_id)
.map_err(|e| {
error!("event_manager: remove_subscriber err. {:?}", e);
e
})
.ok();
}
}
impl EventManager {
/// Create a new event manager associated with the VMM object.
pub fn new(vmm: &Arc<Mutex<Vmm>>, epoll_mgr: EpollManager) -> Result<Self> {
let vmm_event_count = Arc::new(AtomicUsize::new(0));
let handler: Box<dyn MutEventSubscriber + Send> = Box::new(VmmEpollHandler {
vmm: vmm.clone(),
vmm_event_count: vmm_event_count.clone(),
});
let subscriber_id = epoll_mgr.add_subscriber(handler);
Ok(EventManager {
epoll_mgr,
subscriber_id,
vmm_event_count,
})
}
/// Get the underlying epoll event manager.
pub fn epoll_manager(&self) -> EpollManager {
self.epoll_mgr.clone()
}
/// Register the eventfd for exit notification.
pub fn register_exit_eventfd(
&mut self,
exit_evt: &EventFd,
) -> std::result::Result<(), EpollError> {
let events = Events::with_data(exit_evt, EPOLL_EVENT_EXIT, EventSet::IN);
self.epoll_mgr
.add_event(self.subscriber_id, events)
.map_err(EpollError::EpollMgr)
}
/// Poll pending events and invoke registered event handler.
///
/// # Arguments:
/// * timeout: maximum time in milliseconds to wait for pending events
pub fn handle_events(&self, timeout: i32) -> std::result::Result<usize, EpollError> {
self.epoll_mgr
.handle_events(timeout)
.map_err(EpollError::EpollMgr)
}
/// Fetch the VMM event count and reset it to zero.
pub fn fetch_vmm_event_count(&self) -> usize {
self.vmm_event_count.swap(0, Ordering::AcqRel)
}
}
struct VmmEpollHandler {
vmm: Arc<Mutex<Vmm>>,
vmm_event_count: Arc<AtomicUsize>,
}
impl MutEventSubscriber for VmmEpollHandler {
fn process(&mut self, events: Events, _ops: &mut EventOps) {
// Do not try to recover when the lock has already been poisoned.
// And be careful to avoid deadlock between process() and vmm::vmm_thread_event_loop().
let mut vmm = self.vmm.lock().unwrap();
match events.data() {
EPOLL_EVENT_API_REQUEST => {
if let Err(e) = vmm.event_ctx.api_event_fd.read() {
error!("event_manager: failed to read API eventfd, {:?}", e);
}
vmm.event_ctx.api_event_triggered = true;
self.vmm_event_count.fetch_add(1, Ordering::AcqRel);
}
EPOLL_EVENT_EXIT => {
let vm = vmm.get_vm().unwrap();
match vm.get_reset_eventfd() {
Some(ev) => {
if let Err(e) = ev.read() {
error!("event_manager: failed to read exit eventfd, {:?}", e);
}
}
None => warn!("event_manager: leftover exit event in epoll context!"),
}
vmm.event_ctx.exit_evt_triggered = true;
self.vmm_event_count.fetch_add(1, Ordering::AcqRel);
}
_ => error!("event_manager: unknown epoll slot number {}", events.data()),
}
}
fn init(&mut self, ops: &mut EventOps) {
// Do not expect poisoned lock.
let vmm = self.vmm.lock().unwrap();
let events = Events::with_data(
&vmm.event_ctx.api_event_fd,
EPOLL_EVENT_API_REQUEST,
EventSet::IN,
);
if let Err(e) = ops.add(events) {
error!(
"event_manager: failed to register epoll event for API server, {:?}",
e
);
}
}
}
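The counter protocol between `VmmEpollHandler::process()` and `EventManager::fetch_vmm_event_count()` above can be sketched with only the standard library: the handler bumps a shared `AtomicUsize` with `fetch_add`, and the VMM thread drains it with `swap(0, ...)`, reading and resetting in one atomic step so no increments are lost in between.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

fn main() {
    // Shared counter, as held by both EventManager and VmmEpollHandler.
    let count = Arc::new(AtomicUsize::new(0));

    // Handler side: one increment per observed epoll event.
    count.fetch_add(1, Ordering::AcqRel);
    count.fetch_add(1, Ordering::AcqRel);

    // VMM side: fetch-and-reset in a single atomic operation.
    let drained = count.swap(0, Ordering::AcqRel);
    assert_eq!(drained, 2);
    assert_eq!(count.load(Ordering::Acquire), 0);
    println!("ok");
}
```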


@@ -0,0 +1,60 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use std::sync::Arc;
use arc_swap::{ArcSwap, Cache};
use dbs_device::device_manager::Error;
use dbs_device::device_manager::IoManager;
/// A specialized version of [`std::result::Result`] for IO manager related operations.
pub type Result<T> = std::result::Result<T, Error>;
/// Wrapper over IoManager to support device hotplug with [`ArcSwap`] and [`Cache`].
#[derive(Clone)]
pub struct IoManagerCached(pub(crate) Cache<Arc<ArcSwap<IoManager>>, Arc<IoManager>>);
impl IoManagerCached {
/// Create a new instance of [`IoManagerCached`].
pub fn new(io_manager: Arc<ArcSwap<IoManager>>) -> Self {
IoManagerCached(Cache::new(io_manager))
}
#[cfg(target_arch = "x86_64")]
#[inline]
/// Read data from IO ports.
pub fn pio_read(&mut self, addr: u16, data: &mut [u8]) -> Result<()> {
self.0.load().pio_read(addr, data)
}
#[cfg(target_arch = "x86_64")]
#[inline]
/// Write data to IO ports.
pub fn pio_write(&mut self, addr: u16, data: &[u8]) -> Result<()> {
self.0.load().pio_write(addr, data)
}
#[inline]
/// Read data from an MMIO address.
pub fn mmio_read(&mut self, addr: u64, data: &mut [u8]) -> Result<()> {
self.0.load().mmio_read(addr, data)
}
#[inline]
/// Write data to an MMIO address.
pub fn mmio_write(&mut self, addr: u64, data: &[u8]) -> Result<()> {
self.0.load().mmio_write(addr, data)
}
#[inline]
/// Revalidate the inner cache
pub fn revalidate_cache(&mut self) {
let _ = self.0.load();
}
#[inline]
/// Get immutable reference to underlying [`IoManager`].
pub fn load(&mut self) -> &IoManager {
self.0.load()
}
}


@@ -0,0 +1,251 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
#![allow(dead_code)]
use kvm_bindings::KVM_API_VERSION;
use kvm_ioctls::{Cap, Kvm, VmFd};
use std::os::unix::io::{FromRawFd, RawFd};
use crate::error::{Error, Result};
/// Describes a KVM context that gets attached to the micro VM instance.
/// It gives access to the functionality of the KVM wrapper as long as every required
/// KVM capability is present on the host.
pub struct KvmContext {
kvm: Kvm,
max_memslots: usize,
#[cfg(target_arch = "x86_64")]
supported_msrs: kvm_bindings::MsrList,
}
impl KvmContext {
/// Create a new KVM context object, using the provided `kvm_fd` if one is given.
pub fn new(kvm_fd: Option<RawFd>) -> Result<Self> {
let kvm = if let Some(fd) = kvm_fd {
// Safe because we expect kvm_fd to contain a valid fd number when is_some() == true.
unsafe { Kvm::from_raw_fd(fd) }
} else {
Kvm::new().map_err(Error::Kvm)?
};
if kvm.get_api_version() != KVM_API_VERSION as i32 {
return Err(Error::KvmApiVersion(kvm.get_api_version()));
}
Self::check_cap(&kvm, Cap::Irqchip)?;
Self::check_cap(&kvm, Cap::Irqfd)?;
Self::check_cap(&kvm, Cap::Ioeventfd)?;
Self::check_cap(&kvm, Cap::UserMemory)?;
#[cfg(target_arch = "x86_64")]
Self::check_cap(&kvm, Cap::SetTssAddr)?;
#[cfg(target_arch = "x86_64")]
let supported_msrs = dbs_arch::msr::supported_guest_msrs(&kvm).map_err(Error::GuestMSRs)?;
let max_memslots = kvm.get_nr_memslots();
Ok(KvmContext {
kvm,
max_memslots,
#[cfg(target_arch = "x86_64")]
supported_msrs,
})
}
/// Get underlying KVM object to access kvm-ioctls interfaces.
pub fn kvm(&self) -> &Kvm {
&self.kvm
}
/// Get the maximum number of memory slots reported by this KVM context.
pub fn max_memslots(&self) -> usize {
self.max_memslots
}
/// Create a virtual machine object.
pub fn create_vm(&self) -> Result<VmFd> {
self.kvm.create_vm().map_err(Error::Kvm)
}
/// Get the max vcpu count supported by kvm
pub fn get_max_vcpus(&self) -> usize {
self.kvm.get_max_vcpus()
}
fn check_cap(kvm: &Kvm, cap: Cap) -> std::result::Result<(), Error> {
if !kvm.check_extension(cap) {
return Err(Error::KvmCap(cap));
}
Ok(())
}
}
#[cfg(target_arch = "x86_64")]
mod x86_64 {
use super::*;
use dbs_arch::msr::*;
use kvm_bindings::{kvm_msr_entry, CpuId, MsrList, Msrs};
use std::collections::HashSet;
impl KvmContext {
/// Get information about supported CPUID of x86 processor.
pub fn supported_cpuid(
&self,
max_entries_count: usize,
) -> std::result::Result<CpuId, kvm_ioctls::Error> {
self.kvm.get_supported_cpuid(max_entries_count)
}
/// Get information about supported MSRs of x86 processor.
pub fn supported_msrs(
&self,
_max_entries_count: usize,
) -> std::result::Result<MsrList, kvm_ioctls::Error> {
Ok(self.supported_msrs.clone())
}
// Manipulating MSRs is sensitive, so please be careful when changing the code below.
fn build_msrs_list(kvm: &Kvm) -> Result<Msrs> {
let mut mset: HashSet<u32> = HashSet::new();
let supported_msr_list = kvm.get_msr_index_list().map_err(super::Error::Kvm)?;
for msr in supported_msr_list.as_slice() {
mset.insert(*msr);
}
let mut msrs = vec![
MSR_IA32_APICBASE,
MSR_IA32_SYSENTER_CS,
MSR_IA32_SYSENTER_ESP,
MSR_IA32_SYSENTER_EIP,
MSR_IA32_CR_PAT,
];
let filters_list = vec![
MSR_STAR,
MSR_VM_HSAVE_PA,
MSR_TSC_AUX,
MSR_IA32_TSC_ADJUST,
MSR_IA32_TSCDEADLINE,
MSR_IA32_MISC_ENABLE,
MSR_IA32_BNDCFGS,
MSR_IA32_SPEC_CTRL,
];
for msr in filters_list {
if mset.contains(&msr) {
msrs.push(msr);
}
}
// TODO: several MSRs are optional.
// The MSR below stays disabled since our guests don't support nested-vmx, LMCE or SGX for now.
// msrs.push(MSR_IA32_FEATURE_CONTROL);
msrs.push(MSR_CSTAR);
msrs.push(MSR_KERNEL_GS_BASE);
msrs.push(MSR_SYSCALL_MASK);
msrs.push(MSR_LSTAR);
msrs.push(MSR_IA32_TSC);
msrs.push(MSR_KVM_SYSTEM_TIME_NEW);
msrs.push(MSR_KVM_WALL_CLOCK_NEW);
// FIXME: check if it's supported.
msrs.push(MSR_KVM_ASYNC_PF_EN);
msrs.push(MSR_KVM_PV_EOI_EN);
msrs.push(MSR_KVM_STEAL_TIME);
msrs.push(MSR_CORE_PERF_FIXED_CTR_CTRL);
msrs.push(MSR_CORE_PERF_GLOBAL_CTRL);
msrs.push(MSR_CORE_PERF_GLOBAL_STATUS);
msrs.push(MSR_CORE_PERF_GLOBAL_OVF_CTRL);
const MAX_FIXED_COUNTERS: u32 = 3;
for i in 0..MAX_FIXED_COUNTERS {
msrs.push(MSR_CORE_PERF_FIXED_CTR0 + i);
}
// FIXME: skip MCE for now.
let mtrr_msrs = vec![
MSR_MTRRdefType,
MSR_MTRRfix64K_00000,
MSR_MTRRfix16K_80000,
MSR_MTRRfix16K_A0000,
MSR_MTRRfix4K_C0000,
MSR_MTRRfix4K_C8000,
MSR_MTRRfix4K_D0000,
MSR_MTRRfix4K_D8000,
MSR_MTRRfix4K_E0000,
MSR_MTRRfix4K_E8000,
MSR_MTRRfix4K_F0000,
MSR_MTRRfix4K_F8000,
];
for mtrr in mtrr_msrs {
msrs.push(mtrr);
}
const MSR_MTRRCAP_VCNT: u32 = 8;
for i in 0..MSR_MTRRCAP_VCNT {
msrs.push(0x200 + 2 * i);
msrs.push(0x200 + 2 * i + 1);
}
let msrs: Vec<kvm_msr_entry> = msrs
.iter()
.map(|reg| kvm_msr_entry {
index: *reg,
reserved: 0,
data: 0,
})
.collect();
Msrs::from_entries(&msrs).map_err(super::Error::Msr)
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use kvm_ioctls::Kvm;
use std::fs::File;
use std::os::unix::fs::MetadataExt;
use std::os::unix::io::{AsRawFd, FromRawFd};
#[test]
fn test_create_kvm_context() {
let c = KvmContext::new(None).unwrap();
assert!(c.max_memslots >= 32);
let kvm = Kvm::new().unwrap();
let f = unsafe { File::from_raw_fd(kvm.as_raw_fd()) };
let m1 = f.metadata().unwrap();
let m2 = File::open("/dev/kvm").unwrap().metadata().unwrap();
assert_eq!(m1.dev(), m2.dev());
assert_eq!(m1.ino(), m2.ino());
}
#[cfg(target_arch = "x86_64")]
#[test]
fn test_get_supported_cpu_id() {
let c = KvmContext::new(None).unwrap();
let _ = c
.supported_cpuid(kvm_bindings::KVM_MAX_CPUID_ENTRIES)
.expect("failed to get supported CPUID");
assert!(c.supported_cpuid(0).is_err());
}
#[test]
fn test_create_vm() {
let c = KvmContext::new(None).unwrap();
let _ = c.create_vm().unwrap();
}
}
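`build_msrs_list` combines a mandatory baseline with optional MSRs that are pushed only when the host's supported list contains them. The filtering idea can be sketched with plain integers standing in for MSR indices (the register values here are hypothetical, not real MSR constants):

```rust
use std::collections::HashSet;

// Build a register list from a mandatory baseline plus optional entries that
// are only included when the host reports support (mirrors build_msrs_list()).
fn build_reg_list(supported: &[u32], baseline: &[u32], optional: &[u32]) -> Vec<u32> {
    let supported: HashSet<u32> = supported.iter().copied().collect();
    let mut regs: Vec<u32> = baseline.to_vec();
    for &reg in optional {
        if supported.contains(&reg) {
            regs.push(reg);
        }
    }
    regs
}

fn main() {
    // Hypothetical indices; the real code uses MSR constants from dbs_arch.
    let supported = [0x10, 0x1b, 0x277];
    let regs = build_reg_list(&supported, &[0x1b, 0x277], &[0x10, 0x3a]);
    assert_eq!(regs, vec![0x1b, 0x277, 0x10]); // 0x3a dropped: unsupported
    println!("ok");
}
```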

src/dragonball/src/lib.rs Normal file

@@ -0,0 +1,60 @@
// Copyright (C) 2018-2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//! Dragonball is a light-weight virtual machine manager (VMM) based on the Linux Kernel-based
//! Virtual Machine (KVM) and is optimized for container workloads.
#![warn(missing_docs)]
//TODO: Remove this, after the rest of dragonball has been committed.
#![allow(dead_code)]
/// Address space manager for virtual machines.
pub mod address_space_manager;
/// API to handle vmm requests.
pub mod api;
/// Structs to maintain configuration information.
pub mod config_manager;
/// Device manager for virtual machines.
pub mod device_manager;
/// Errors related to Virtual machine manager.
pub mod error;
/// KVM operation context for virtual machines.
pub mod kvm_context;
/// Metrics system.
pub mod metric;
/// Resource manager for virtual machines.
pub mod resource_manager;
/// Signal handler for virtual machines.
pub mod signal_handler;
/// Virtual CPU manager for virtual machines.
pub mod vcpu;
/// Virtual machine manager for virtual machines.
pub mod vm;
mod event_manager;
mod io_manager;
mod vmm;
pub use self::error::StartMicroVmError;
pub use self::io_manager::IoManagerCached;
pub use self::vmm::Vmm;
/// Success exit code.
pub const EXIT_CODE_OK: u8 = 0;
/// Generic error exit code.
pub const EXIT_CODE_GENERIC_ERROR: u8 = 1;
/// Generic exit code for an error considered not possible to occur if the program logic is sound.
pub const EXIT_CODE_UNEXPECTED_ERROR: u8 = 2;
/// Dragonball was shut down after intercepting a restricted system call.
pub const EXIT_CODE_BAD_SYSCALL: u8 = 148;
/// Dragonball was shut down after intercepting `SIGBUS`.
pub const EXIT_CODE_SIGBUS: u8 = 149;
/// Dragonball was shut down after intercepting `SIGSEGV`.
pub const EXIT_CODE_SIGSEGV: u8 = 150;
/// Invalid json passed to the Dragonball process for configuring microvm.
pub const EXIT_CODE_INVALID_JSON: u8 = 151;
/// Bad configuration for microvm's resources, when using a single json.
pub const EXIT_CODE_BAD_CONFIGURATION: u8 = 152;
/// Command line arguments parsing error.
pub const EXIT_CODE_ARG_PARSING: u8 = 153;
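The exit codes above let a supervisor distinguish why Dragonball terminated. A hedged sketch of the kind of signal-to-exit-code dispatch they enable (the constant values are copied from this file; the signal numbers are hard-coded for Linux x86_64 instead of using `libc`, and the dispatch function itself is illustrative, not part of the crate):

```rust
// Exit codes copied from the lib.rs constants above.
const EXIT_CODE_UNEXPECTED_ERROR: u8 = 2;
const EXIT_CODE_BAD_SYSCALL: u8 = 148;
const EXIT_CODE_SIGBUS: u8 = 149;
const EXIT_CODE_SIGSEGV: u8 = 150;

// Linux x86_64 signal numbers; real code would use libc::SIGBUS etc.
const SIGBUS: i32 = 7;
const SIGSEGV: i32 = 11;
const SIGSYS: i32 = 31;

// Map a caught signal to the process exit code.
fn exit_code_for_signal(signum: i32) -> u8 {
    match signum {
        SIGSYS => EXIT_CODE_BAD_SYSCALL,
        SIGBUS => EXIT_CODE_SIGBUS,
        SIGSEGV => EXIT_CODE_SIGSEGV,
        _ => EXIT_CODE_UNEXPECTED_ERROR,
    }
}

fn main() {
    assert_eq!(exit_code_for_signal(SIGBUS), 149);
    assert_eq!(exit_code_for_signal(0), 2);
    println!("ok");
}
```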


@@ -0,0 +1,58 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
use dbs_utils::metric::SharedIncMetric;
use lazy_static::lazy_static;
use serde::Serialize;
pub use dbs_utils::metric::IncMetric;
lazy_static! {
/// Static instance used for handling metrics.
pub static ref METRICS: DragonballMetrics = DragonballMetrics::default();
}
/// Metrics specific to VCPUs' mode of functioning.
#[derive(Default, Serialize)]
pub struct VcpuMetrics {
/// Number of KVM exits for handling input IO.
pub exit_io_in: SharedIncMetric,
/// Number of KVM exits for handling output IO.
pub exit_io_out: SharedIncMetric,
/// Number of KVM exits for handling MMIO reads.
pub exit_mmio_read: SharedIncMetric,
/// Number of KVM exits for handling MMIO writes.
pub exit_mmio_write: SharedIncMetric,
/// Number of errors during this VCPU's run.
pub failures: SharedIncMetric,
/// Failures in configuring the CPUID.
pub filter_cpuid: SharedIncMetric,
}
/// Metrics for the seccomp filtering.
#[derive(Default, Serialize)]
pub struct SeccompMetrics {
/// Number of errors inside the seccomp filtering.
pub num_faults: SharedIncMetric,
}
/// Metrics related to signals.
#[derive(Default, Serialize)]
pub struct SignalMetrics {
/// Number of times that SIGBUS was handled.
pub sigbus: SharedIncMetric,
/// Number of times that SIGSEGV was handled.
pub sigsegv: SharedIncMetric,
}
/// Structure storing all metrics while enforcing serialization support on them.
#[derive(Default, Serialize)]
pub struct DragonballMetrics {
/// Metrics related to a vcpu's functioning.
pub vcpu: VcpuMetrics,
/// Metrics related to seccomp filtering.
pub seccomp: SeccompMetrics,
/// Metrics related to signals.
pub signals: SignalMetrics,
}


@@ -0,0 +1,785 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
//
// SPDX-License-Identifier: Apache-2.0
use std::sync::Mutex;
use dbs_allocator::{Constraint, IntervalTree, Range};
use dbs_boot::layout::{
GUEST_MEM_END, GUEST_MEM_START, GUEST_PHYS_END, IRQ_BASE as LEGACY_IRQ_BASE,
IRQ_MAX as LEGACY_IRQ_MAX, MMIO_LOW_END, MMIO_LOW_START,
};
use dbs_device::resources::{DeviceResources, MsiIrqType, Resource, ResourceConstraint};
// We reserve the LEGACY_IRQ_BASE(5) for shared IRQ.
const SHARED_IRQ: u32 = LEGACY_IRQ_BASE;
// Since ioapic2 has 24 pins for legacy devices, irq numbers 0-23 are used. We set MSI_IRQ_BASE to 24.
#[cfg(target_arch = "x86_64")]
const MSI_IRQ_BASE: u32 = 24;
#[cfg(target_arch = "aarch64")]
/// We define MSI_IRQ_BASE as LEGACY_IRQ_MAX + 1 for aarch64 so it doesn't conflict with legacy irq numbers.
const MSI_IRQ_BASE: u32 = LEGACY_IRQ_MAX + 1;
// kvm max irq is defined in arch/x86/include/asm/kvm_host.h
const MSI_IRQ_MAX: u32 = 1023;
// x86's kvm user mem slots is defined in arch/x86/include/asm/kvm_host.h
#[cfg(target_arch = "x86_64")]
const KVM_USER_MEM_SLOTS: u32 = 509;
// aarch64's kvm user mem slots is defined in arch/arm64/include/asm/kvm_host.h
#[cfg(target_arch = "aarch64")]
const KVM_USER_MEM_SLOTS: u32 = 512;
const PIO_MIN: u16 = 0x0;
const PIO_MAX: u16 = 0xFFFF;
// Reserve the 64MB MMIO address range just below 4G. x86 systems have special
// devices, such as the LAPIC, IOAPIC and HPET, in this range, and we don't
// explicitly allocate MMIO addresses for those devices.
const MMIO_SPACE_RESERVED: u64 = 0x400_0000;
/// Errors associated with resource management operations
#[derive(Debug, PartialEq, thiserror::Error)]
pub enum ResourceError {
/// Unknown/unsupported resource type.
#[error("unsupported resource type")]
UnknownResourceType,
/// Invalid resource range.
#[error("invalid resource range for resource type : {0}")]
InvalidResourceRange(String),
/// No resource available.
#[error("no resource available")]
NoAvailResource,
}
#[derive(Default)]
struct ResourceManagerBuilder {
// IntervalTree for allocating legacy irq number.
legacy_irq_pool: IntervalTree<()>,
// IntervalTree for allocating message signal interrupt (MSI) irq number.
msi_irq_pool: IntervalTree<()>,
// IntervalTree for allocating port-mapped io (PIO) address.
pio_pool: IntervalTree<()>,
// IntervalTree for allocating memory-mapped io (MMIO) address.
mmio_pool: IntervalTree<()>,
// IntervalTree for allocating guest memory.
mem_pool: IntervalTree<()>,
// IntervalTree for allocating kvm memory slot.
kvm_mem_slot_pool: IntervalTree<()>,
}
impl ResourceManagerBuilder {
/// init legacy_irq_pool with arch specific constants.
fn init_legacy_irq_pool(mut self) -> Self {
// The LEGACY_IRQ_BASE irq is reserved for shared IRQ and won't be allocated / reallocated,
// so we don't insert it into the legacy_irq interval tree.
self.legacy_irq_pool
.insert(Range::new(LEGACY_IRQ_BASE + 1, LEGACY_IRQ_MAX), None);
self
}
/// init msi_irq_pool with arch specific constants.
fn init_msi_irq_pool(mut self) -> Self {
self.msi_irq_pool
.insert(Range::new(MSI_IRQ_BASE, MSI_IRQ_MAX), None);
self
}
/// init pio_pool with arch specific constants.
fn init_pio_pool(mut self) -> Self {
self.pio_pool.insert(Range::new(PIO_MIN, PIO_MAX), None);
self
}
/// Create mmio_pool with arch specific constants.
/// `allow(clippy)` is needed because `GUEST_MEM_START > MMIO_LOW_END` is currently always false,
/// but we may modify GUEST_MEM_START or MMIO_LOW_END in the future.
#[allow(clippy::absurd_extreme_comparisons)]
fn init_mmio_pool_helper(mmio: &mut IntervalTree<()>) {
mmio.insert(Range::new(MMIO_LOW_START, MMIO_LOW_END), None);
if !(*GUEST_MEM_END < MMIO_LOW_START
|| GUEST_MEM_START > MMIO_LOW_END
|| MMIO_LOW_START == MMIO_LOW_END)
{
#[cfg(target_arch = "x86_64")]
{
let constraint = Constraint::new(MMIO_SPACE_RESERVED)
.min(MMIO_LOW_END - MMIO_SPACE_RESERVED)
.max(0xffff_ffffu64);
let key = mmio.allocate(&constraint);
if let Some(k) = key.as_ref() {
mmio.update(k, ());
} else {
panic!("failed to reserve MMIO address range for x86 system devices");
}
}
}
if *GUEST_MEM_END < *GUEST_PHYS_END {
mmio.insert(Range::new(*GUEST_MEM_END + 1, *GUEST_PHYS_END), None);
}
}
/// init mmio_pool with helper function
fn init_mmio_pool(mut self) -> Self {
Self::init_mmio_pool_helper(&mut self.mmio_pool);
self
}
/// Create mem_pool with arch specific constants.
/// `allow(clippy)` is needed because `GUEST_MEM_START > MMIO_LOW_END` is currently always false,
/// but we may modify GUEST_MEM_START or MMIO_LOW_END in the future.
#[allow(clippy::absurd_extreme_comparisons)]
pub(crate) fn init_mem_pool_helper(mem: &mut IntervalTree<()>) {
if *GUEST_MEM_END < MMIO_LOW_START
|| GUEST_MEM_START > MMIO_LOW_END
|| MMIO_LOW_START == MMIO_LOW_END
{
mem.insert(Range::new(GUEST_MEM_START, *GUEST_MEM_END), None);
} else {
if MMIO_LOW_START > GUEST_MEM_START {
mem.insert(Range::new(GUEST_MEM_START, MMIO_LOW_START - 1), None);
}
if MMIO_LOW_END < *GUEST_MEM_END {
mem.insert(Range::new(MMIO_LOW_END + 1, *GUEST_MEM_END), None);
}
}
}
/// init mem_pool with helper function
fn init_mem_pool(mut self) -> Self {
Self::init_mem_pool_helper(&mut self.mem_pool);
self
}
/// init kvm_mem_slot_pool with arch specific constants.
fn init_kvm_mem_slot_pool(mut self, max_kvm_mem_slot: Option<usize>) -> Self {
let max_slots = max_kvm_mem_slot.unwrap_or(KVM_USER_MEM_SLOTS as usize);
self.kvm_mem_slot_pool
.insert(Range::new(0, max_slots as u64), None);
self
}
fn build(self) -> ResourceManager {
ResourceManager {
legacy_irq_pool: Mutex::new(self.legacy_irq_pool),
msi_irq_pool: Mutex::new(self.msi_irq_pool),
pio_pool: Mutex::new(self.pio_pool),
mmio_pool: Mutex::new(self.mmio_pool),
mem_pool: Mutex::new(self.mem_pool),
kvm_mem_slot_pool: Mutex::new(self.kvm_mem_slot_pool),
}
}
}
/// Resource manager manages all resources for a virtual machine instance.
pub struct ResourceManager {
legacy_irq_pool: Mutex<IntervalTree<()>>,
msi_irq_pool: Mutex<IntervalTree<()>>,
pio_pool: Mutex<IntervalTree<()>>,
mmio_pool: Mutex<IntervalTree<()>>,
mem_pool: Mutex<IntervalTree<()>>,
kvm_mem_slot_pool: Mutex<IntervalTree<()>>,
}
impl Default for ResourceManager {
fn default() -> Self {
ResourceManagerBuilder::default().build()
}
}
impl ResourceManager {
/// Create a resource manager instance.
pub fn new(max_kvm_mem_slot: Option<usize>) -> Self {
let res_manager_builder = ResourceManagerBuilder::default();
res_manager_builder
.init_legacy_irq_pool()
.init_msi_irq_pool()
.init_pio_pool()
.init_mmio_pool()
.init_mem_pool()
.init_kvm_mem_slot_pool(max_kvm_mem_slot)
.build()
}
/// Init mem_pool with arch specific constants.
pub fn init_mem_pool(&self) {
let mut mem = self.mem_pool.lock().unwrap();
ResourceManagerBuilder::init_mem_pool_helper(&mut mem);
}
/// Check if mem_pool is empty.
pub fn is_mem_pool_empty(&self) -> bool {
self.mem_pool.lock().unwrap().is_empty()
}
/// Allocate one legacy irq number.
///
/// Allocate the specified irq number if `fixed` contains an irq number.
pub fn allocate_legacy_irq(&self, shared: bool, fixed: Option<u32>) -> Option<u32> {
// if shared_irq is used, just return the shared irq num.
if shared {
return Some(SHARED_IRQ);
}
let mut constraint = Constraint::new(1u32);
if let Some(v) = fixed {
if v == SHARED_IRQ {
return None;
}
constraint.min = v as u64;
constraint.max = v as u64;
}
// Safe to unwrap() because we don't expect poisoned lock here.
let mut legacy_irq_pool = self.legacy_irq_pool.lock().unwrap();
let key = legacy_irq_pool.allocate(&constraint);
if let Some(k) = key.as_ref() {
legacy_irq_pool.update(k, ());
}
key.map(|v| v.min as u32)
}
/// Free a legacy irq number.
///
/// Panic if the irq number is invalid.
pub fn free_legacy_irq(&self, irq: u32) -> Result<(), ResourceError> {
// if the irq number is shared_irq, we don't need to do anything.
if irq == SHARED_IRQ {
return Ok(());
}
if !(LEGACY_IRQ_BASE..=LEGACY_IRQ_MAX).contains(&irq) {
return Err(ResourceError::InvalidResourceRange(
"Legacy IRQ".to_string(),
));
}
let key = Range::new(irq, irq);
// Safe to unwrap() because we don't expect poisoned lock here.
self.legacy_irq_pool.lock().unwrap().free(&key);
Ok(())
}
/// Allocate a group of MSI irq numbers.
///
/// The allocated MSI irq numbers may or may not be naturally aligned.
pub fn allocate_msi_irq(&self, count: u32) -> Option<u32> {
let constraint = Constraint::new(count);
// Safe to unwrap() because we don't expect poisoned lock here.
let mut msi_irq_pool = self.msi_irq_pool.lock().unwrap();
let key = msi_irq_pool.allocate(&constraint);
if let Some(k) = key.as_ref() {
msi_irq_pool.update(k, ());
}
key.map(|v| v.min as u32)
}
/// Allocate a group of MSI irq numbers, naturally aligned to `count`.
///
/// This may be used to support PCI MSI, which requires that the allocated irq number be
/// naturally aligned.
pub fn allocate_msi_irq_aligned(&self, count: u32) -> Option<u32> {
let constraint = Constraint::new(count).align(count);
// Safe to unwrap() because we don't expect poisoned lock here.
let mut msi_irq_pool = self.msi_irq_pool.lock().unwrap();
let key = msi_irq_pool.allocate(&constraint);
if let Some(k) = key.as_ref() {
msi_irq_pool.update(k, ());
}
key.map(|v| v.min as u32)
}
/// Free a group of MSI irq numbers.
///
/// Panic if `irq` or `count` is invalid.
pub fn free_msi_irq(&self, irq: u32, count: u32) -> Result<(), ResourceError> {
if irq < MSI_IRQ_BASE
|| count == 0
|| irq.checked_add(count).is_none()
|| irq + count - 1 > MSI_IRQ_MAX
{
return Err(ResourceError::InvalidResourceRange("MSI IRQ".to_string()));
}
let key = Range::new(irq, irq + count - 1);
// Safe to unwrap() because we don't expect poisoned lock here.
self.msi_irq_pool.lock().unwrap().free(&key);
Ok(())
}
/// Allocate a range of PIO addresses and return the allocated PIO base address.
pub fn allocate_pio_address_simple(&self, size: u16) -> Option<u16> {
let constraint = Constraint::new(size);
self.allocate_pio_address(&constraint)
}
/// Allocate a range of PIO addresses and return the allocated PIO base address.
pub fn allocate_pio_address(&self, constraint: &Constraint) -> Option<u16> {
// Safe to unwrap() because we don't expect poisoned lock here.
let mut pio_pool = self.pio_pool.lock().unwrap();
let key = pio_pool.allocate(constraint);
if let Some(k) = key.as_ref() {
pio_pool.update(k, ());
}
key.map(|v| v.min as u16)
}
/// Free PIO address range `[base, base + size - 1]`.
///
/// Panic if `base` or `size` is invalid.
pub fn free_pio_address(&self, base: u16, size: u16) -> Result<(), ResourceError> {
if base.checked_add(size).is_none() {
return Err(ResourceError::InvalidResourceRange(
"PIO Address".to_string(),
));
}
let key = Range::new(base, base + size - 1);
// Safe to unwrap() because we don't expect poisoned lock here.
self.pio_pool.lock().unwrap().free(&key);
Ok(())
}
/// Allocate an MMIO address range aligned to `align` and return the allocated base address.
pub fn allocate_mmio_address_aligned(&self, size: u64, align: u64) -> Option<u64> {
let constraint = Constraint::new(size).align(align);
self.allocate_mmio_address(&constraint)
}
/// Allocate an MMIO address range and return the allocated base address.
pub fn allocate_mmio_address(&self, constraint: &Constraint) -> Option<u64> {
// Safe to unwrap() because we don't expect poisoned lock here.
let mut mmio_pool = self.mmio_pool.lock().unwrap();
let key = mmio_pool.allocate(constraint);
key.map(|v| v.min)
}
/// Free MMIO address range `[base, base + size - 1]`
pub fn free_mmio_address(&self, base: u64, size: u64) -> Result<(), ResourceError> {
if base.checked_add(size).is_none() {
return Err(ResourceError::InvalidResourceRange(
"MMIO Address".to_string(),
));
}
let key = Range::new(base, base + size - 1);
// Safe to unwrap() because we don't expect poisoned lock here.
self.mmio_pool.lock().unwrap().free(&key);
Ok(())
}
/// Allocate a guest memory address range and return the allocated base address.
pub fn allocate_mem_address(&self, constraint: &Constraint) -> Option<u64> {
// Safe to unwrap() because we don't expect poisoned lock here.
let mut mem_pool = self.mem_pool.lock().unwrap();
let key = mem_pool.allocate(constraint);
key.map(|v| v.min)
}
/// Free the guest memory address range `[base, base + size - 1]`.
///
/// Panic if the guest memory address range is invalid.
/// `allow(clippy)` is needed because `base < GUEST_MEM_START` is currently always false,
/// but we may modify GUEST_MEM_START in the future.
#[allow(clippy::absurd_extreme_comparisons)]
pub fn free_mem_address(&self, base: u64, size: u64) -> Result<(), ResourceError> {
if base.checked_add(size).is_none()
|| base < GUEST_MEM_START
|| base + size > *GUEST_MEM_END
{
return Err(ResourceError::InvalidResourceRange(
"MEM Address".to_string(),
));
}
let key = Range::new(base, base + size - 1);
// Safe to unwrap() because we don't expect poisoned lock here.
self.mem_pool.lock().unwrap().free(&key);
Ok(())
}
/// Allocate a kvm memory slot number.
///
/// Allocate the specified slot if `fixed` contains a slot number.
pub fn allocate_kvm_mem_slot(&self, size: u32, fixed: Option<u32>) -> Option<u32> {
let mut constraint = Constraint::new(size);
if let Some(v) = fixed {
constraint.min = v as u64;
constraint.max = v as u64;
}
// Safe to unwrap() because we don't expect poisoned lock here.
let mut kvm_mem_slot_pool = self.kvm_mem_slot_pool.lock().unwrap();
let key = kvm_mem_slot_pool.allocate(&constraint);
if let Some(k) = key.as_ref() {
kvm_mem_slot_pool.update(k, ());
}
key.map(|v| v.min as u32)
}
/// Free a kvm memory slot number.
pub fn free_kvm_mem_slot(&self, slot: u32) -> Result<(), ResourceError> {
let key = Range::new(slot, slot);
// Safe to unwrap() because we don't expect poisoned lock here.
self.kvm_mem_slot_pool.lock().unwrap().free(&key);
Ok(())
}
/// Allocate requested resources for a device.
pub fn allocate_device_resources(
&self,
requests: &[ResourceConstraint],
shared_irq: bool,
) -> std::result::Result<DeviceResources, ResourceError> {
let mut resources = DeviceResources::new();
for resource in requests.iter() {
let res = match resource {
ResourceConstraint::PioAddress { range, align, size } => {
let mut constraint = Constraint::new(*size).align(*align);
if let Some(r) = range {
constraint.min = r.0 as u64;
constraint.max = r.1 as u64;
}
match self.allocate_pio_address(&constraint) {
Some(base) => Resource::PioAddressRange {
base: base as u16,
size: *size,
},
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
}
}
ResourceConstraint::MmioAddress { range, align, size } => {
let mut constraint = Constraint::new(*size).align(*align);
if let Some(r) = range {
constraint.min = r.0;
constraint.max = r.1;
}
match self.allocate_mmio_address(&constraint) {
Some(base) => Resource::MmioAddressRange { base, size: *size },
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
}
}
ResourceConstraint::MemAddress { range, align, size } => {
let mut constraint = Constraint::new(*size).align(*align);
if let Some(r) = range {
constraint.min = r.0;
constraint.max = r.1;
}
match self.allocate_mem_address(&constraint) {
Some(base) => Resource::MemAddressRange { base, size: *size },
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
}
}
ResourceConstraint::LegacyIrq { irq } => {
match self.allocate_legacy_irq(shared_irq, *irq) {
Some(v) => Resource::LegacyIrq(v),
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
}
}
ResourceConstraint::PciMsiIrq { size } => {
match self.allocate_msi_irq_aligned(*size) {
Some(base) => Resource::MsiIrq {
ty: MsiIrqType::PciMsi,
base,
size: *size,
},
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
}
}
ResourceConstraint::PciMsixIrq { size } => match self.allocate_msi_irq(*size) {
Some(base) => Resource::MsiIrq {
ty: MsiIrqType::PciMsix,
base,
size: *size,
},
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
},
ResourceConstraint::GenericIrq { size } => match self.allocate_msi_irq(*size) {
Some(base) => Resource::MsiIrq {
ty: MsiIrqType::GenericMsi,
base,
size: *size,
},
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
},
ResourceConstraint::KvmMemSlot { slot, size } => {
match self.allocate_kvm_mem_slot(*size, *slot) {
Some(v) => Resource::KvmMemSlot(v),
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
} else {
return Err(ResourceError::NoAvailResource);
}
}
}
}
};
resources.append(res);
}
Ok(resources)
}
/// Free resources allocated for a device.
pub fn free_device_resources(&self, resources: &DeviceResources) -> Result<(), ResourceError> {
for res in resources.iter() {
let result = match res {
Resource::PioAddressRange { base, size } => self.free_pio_address(*base, *size),
Resource::MmioAddressRange { base, size } => self.free_mmio_address(*base, *size),
Resource::MemAddressRange { base, size } => self.free_mem_address(*base, *size),
Resource::LegacyIrq(base) => self.free_legacy_irq(*base),
Resource::MsiIrq { ty: _, base, size } => self.free_msi_irq(*base, *size),
Resource::KvmMemSlot(slot) => self.free_kvm_mem_slot(*slot),
Resource::MacAddresss(_) => Ok(()),
};
if result.is_err() {
return result;
}
}
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_allocate_legacy_irq() {
let mgr = ResourceManager::new(None);
// Allocate/free shared IRQs multiple times.
assert_eq!(mgr.allocate_legacy_irq(true, None).unwrap(), SHARED_IRQ);
assert_eq!(mgr.allocate_legacy_irq(true, None).unwrap(), SHARED_IRQ);
mgr.free_legacy_irq(SHARED_IRQ);
mgr.free_legacy_irq(SHARED_IRQ);
mgr.free_legacy_irq(SHARED_IRQ);
// Allocate specified IRQs.
assert_eq!(
mgr.allocate_legacy_irq(false, Some(LEGACY_IRQ_BASE + 10))
.unwrap(),
LEGACY_IRQ_BASE + 10
);
mgr.free_legacy_irq(LEGACY_IRQ_BASE + 10);
assert_eq!(
mgr.allocate_legacy_irq(false, Some(LEGACY_IRQ_BASE + 10))
.unwrap(),
LEGACY_IRQ_BASE + 10
);
assert!(mgr
.allocate_legacy_irq(false, Some(LEGACY_IRQ_BASE + 10))
.is_none());
assert!(mgr.allocate_legacy_irq(false, None).is_some());
assert!(mgr
.allocate_legacy_irq(false, Some(LEGACY_IRQ_BASE - 1))
.is_none());
assert!(mgr
.allocate_legacy_irq(false, Some(LEGACY_IRQ_MAX + 1))
.is_none());
assert!(mgr.allocate_legacy_irq(false, Some(SHARED_IRQ)).is_none());
}
#[test]
fn test_invalid_free_legacy_irq() {
let mgr = ResourceManager::new(None);
assert_eq!(
mgr.free_legacy_irq(LEGACY_IRQ_MAX + 1),
Err(ResourceError::InvalidResourceRange(
"Legacy IRQ".to_string(),
))
);
}
#[test]
fn test_allocate_msi_irq() {
let mgr = ResourceManager::new(None);
let msi = mgr.allocate_msi_irq(3).unwrap();
mgr.free_msi_irq(msi, 3);
let msi = mgr.allocate_msi_irq(3).unwrap();
mgr.free_msi_irq(msi, 3);
let irq = mgr.allocate_msi_irq_aligned(8).unwrap();
assert_eq!(irq & 0x7, 0);
mgr.free_msi_irq(irq, 8);
let irq = mgr.allocate_msi_irq_aligned(8).unwrap();
assert_eq!(irq & 0x7, 0);
let irq = mgr.allocate_msi_irq_aligned(512).unwrap();
assert_eq!(irq, 512);
mgr.free_msi_irq(irq, 512);
let irq = mgr.allocate_msi_irq_aligned(512).unwrap();
assert_eq!(irq, 512);
assert!(mgr.allocate_msi_irq(4099).is_none());
}
#[test]
fn test_invalid_free_msi_irq() {
let mgr = ResourceManager::new(None);
assert_eq!(
mgr.free_msi_irq(MSI_IRQ_MAX, 3),
Err(ResourceError::InvalidResourceRange("MSI IRQ".to_string()))
);
}
#[test]
fn test_allocate_pio_addr() {
let mgr = ResourceManager::new(None);
assert!(mgr.allocate_pio_address_simple(10).is_some());
let mut requests = vec![
ResourceConstraint::PioAddress {
range: None,
align: 0x1000,
size: 0x2000,
},
ResourceConstraint::PioAddress {
range: Some((0x8000, 0x9000)),
align: 0x1000,
size: 0x1000,
},
ResourceConstraint::PioAddress {
range: Some((0x9000, 0xa000)),
align: 0x1000,
size: 0x1000,
},
ResourceConstraint::PioAddress {
range: Some((0xb000, 0xc000)),
align: 0x1000,
size: 0x1000,
},
];
let resources = mgr.allocate_device_resources(&requests, false).unwrap();
mgr.free_device_resources(&resources);
let resources = mgr.allocate_device_resources(&requests, false).unwrap();
mgr.free_device_resources(&resources);
requests.push(ResourceConstraint::PioAddress {
range: Some((0xc000, 0xc000)),
align: 0x1000,
size: 0x1000,
});
assert!(mgr.allocate_device_resources(&requests, false).is_err());
let resources = mgr
.allocate_device_resources(&requests[0..requests.len() - 1], false)
.unwrap();
mgr.free_device_resources(&resources);
}
#[test]
fn test_invalid_free_pio_addr() {
let mgr = ResourceManager::new(None);
assert_eq!(
mgr.free_pio_address(u16::MAX, 3),
Err(ResourceError::InvalidResourceRange(
"PIO Address".to_string(),
))
);
}
#[test]
fn test_allocate_kvm_mem_slot() {
let mgr = ResourceManager::new(None);
assert_eq!(mgr.allocate_kvm_mem_slot(1, None).unwrap(), 0);
assert_eq!(mgr.allocate_kvm_mem_slot(1, Some(200)).unwrap(), 200);
mgr.free_kvm_mem_slot(200);
assert_eq!(mgr.allocate_kvm_mem_slot(1, Some(200)).unwrap(), 200);
assert_eq!(
mgr.allocate_kvm_mem_slot(1, Some(KVM_USER_MEM_SLOTS))
.unwrap(),
KVM_USER_MEM_SLOTS
);
assert!(mgr
.allocate_kvm_mem_slot(1, Some(KVM_USER_MEM_SLOTS + 1))
.is_none());
}
#[test]
fn test_allocate_mmio_address() {
let mgr = ResourceManager::new(None);
#[cfg(target_arch = "x86_64")]
{
// Can't allocate from reserved region
let constraint = Constraint::new(0x100_0000u64)
.min(0x1_0000_0000u64 - 0x200_0000u64)
.max(0xffff_ffffu64);
assert!(mgr.allocate_mmio_address(&constraint).is_none());
}
let constraint = Constraint::new(0x100_0000u64).min(0x1_0000_0000u64 - 0x200_0000u64);
assert!(mgr.allocate_mmio_address(&constraint).is_some());
#[cfg(target_arch = "x86_64")]
{
// Can't allocate from reserved region
let constraint = Constraint::new(0x100_0000u64)
.min(0x1_0000_0000u64 - 0x200_0000u64)
.max(0xffff_ffffu64);
assert!(mgr.allocate_mem_address(&constraint).is_none());
}
#[cfg(target_arch = "aarch64")]
{
let constraint = Constraint::new(0x200_0000u64)
.min(0x1_0000_0000u64 - 0x200_0000u64)
.max(0xffff_fffeu64);
assert!(mgr.allocate_mem_address(&constraint).is_none());
}
let constraint = Constraint::new(0x100_0000u64).min(0x1_0000_0000u64 - 0x200_0000u64);
assert!(mgr.allocate_mem_address(&constraint).is_some());
}
#[test]
#[should_panic]
fn test_allocate_duplicate_memory() {
let mgr = ResourceManager::new(None);
let constraint_1 = Constraint::new(0x100_0000u64)
.min(0x1_0000_0000u64)
.max(0x1_0000_0000u64 + 0x100_0000u64);
let constraint_2 = Constraint::new(0x100_0000u64)
.min(0x1_0000_0000u64)
.max(0x1_0000_0000u64 + 0x100_0000u64);
assert!(mgr.allocate_mem_address(&constraint_1).is_some());
assert!(mgr.allocate_mem_address(&constraint_2).is_some());
}
}
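All of the pools above delegate to `dbs_allocator`'s `IntervalTree`, which hands out ranges subject to size and alignment constraints and splits the free space around each allocation. A simplified first-fit free-list allocator illustrates the same allocate-and-split behavior (a sketch, not the real `IntervalTree`; assumes `size >= 1` and `align >= 1`):

```rust
// First-fit range allocation with size/alignment constraints, a simplified
// stand-in for dbs_allocator's IntervalTree::allocate (names are illustrative).
struct FreeList {
    // Sorted, non-overlapping free ranges with inclusive bounds,
    // like Range::new(a, b) above.
    free: Vec<(u64, u64)>,
}

impl FreeList {
    fn new(start: u64, end: u64) -> Self {
        FreeList { free: vec![(start, end)] }
    }

    // Allocate `size` units aligned to `align`; returns the base on success.
    fn allocate(&mut self, size: u64, align: u64) -> Option<u64> {
        for i in 0..self.free.len() {
            let (start, end) = self.free[i];
            // Round the candidate base up to the requested alignment.
            let base = (start + align - 1) / align * align;
            let last = base.checked_add(size - 1)?;
            if last > end {
                continue;
            }
            // Split the free range around the allocated span.
            self.free.remove(i);
            if last < end {
                self.free.insert(i, (last + 1, end));
            }
            if base > start {
                self.free.insert(i, (start, base - 1));
            }
            return Some(base);
        }
        None
    }
}

fn main() {
    let mut pool = FreeList::new(0x100, 0xFFF);
    let a = pool.allocate(0x100, 0x200).unwrap();
    assert_eq!(a, 0x200); // 0x100 rounded up to 0x200 alignment
    let b = pool.allocate(0x80, 1).unwrap();
    assert_eq!(b, 0x100); // first-fit reuses the hole below the aligned block
    println!("ok");
}
```

The real `IntervalTree` also supports `min`/`max` window constraints and tracks per-range payloads via `update()`; the sketch covers only the core first-fit split.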


@@ -0,0 +1,219 @@
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
use libc::{_exit, c_int, c_void, siginfo_t, SIGBUS, SIGSEGV, SIGSYS};
use log::error;
use vmm_sys_util::signal::register_signal_handler;
use crate::metric::{IncMetric, METRICS};
// The offset of `si_syscall` (offending syscall identifier) within the siginfo structure
// expressed as an `(u)int*`.
// Offset `6` for an `i32` field means that the needed information is located at `6 * sizeof(i32)`.
// See /usr/include/linux/signal.h for the C struct definition.
// See https://github.com/rust-lang/libc/issues/716 for why the offset is different in Rust.
const SI_OFF_SYSCALL: isize = 6;
const SYS_SECCOMP_CODE: i32 = 1;
extern "C" {
fn __libc_current_sigrtmin() -> c_int;
fn __libc_current_sigrtmax() -> c_int;
}
/// Gets current sigrtmin
pub fn sigrtmin() -> c_int {
unsafe { __libc_current_sigrtmin() }
}
/// Gets current sigrtmax
pub fn sigrtmax() -> c_int {
unsafe { __libc_current_sigrtmax() }
}
/// Signal handler for `SIGSYS`.
///
/// Increments the `seccomp.num_faults` metric, logs an error message and terminates the process
/// with a specific exit code.
extern "C" fn sigsys_handler(num: c_int, info: *mut siginfo_t, _unused: *mut c_void) {
// Safe because we're just reading some fields from a supposedly valid argument.
let si_signo = unsafe { (*info).si_signo };
let si_code = unsafe { (*info).si_code };
// Sanity check. The condition should never be true.
if num != si_signo || num != SIGSYS || si_code != SYS_SECCOMP_CODE as i32 {
// Safe because we're terminating the process anyway.
unsafe { _exit(i32::from(super::EXIT_CODE_UNEXPECTED_ERROR)) };
}
// Other signals which might do async unsafe things incompatible with the rest of this
// function are blocked due to the sa_mask used when registering the signal handler.
let syscall = unsafe { *(info as *const i32).offset(SI_OFF_SYSCALL) as usize };
    // SIGSYS is delivered when a bad syscall is detected, and num_faults is only
    // incremented here, so the metric counts bad syscalls exclusively.
METRICS.seccomp.num_faults.inc();
error!(
"Shutting down VM after intercepting a bad syscall ({}).",
syscall
);
// Safe because we're terminating the process anyway. We don't actually do anything when
// running unit tests.
#[cfg(not(test))]
unsafe {
_exit(i32::from(super::EXIT_CODE_BAD_SYSCALL))
};
}
/// Signal handler for `SIGBUS` and `SIGSEGV`.
///
/// Logs an error message and terminates the process with a specific exit code.
extern "C" fn sigbus_sigsegv_handler(num: c_int, info: *mut siginfo_t, _unused: *mut c_void) {
// Safe because we're just reading some fields from a supposedly valid argument.
let si_signo = unsafe { (*info).si_signo };
let si_code = unsafe { (*info).si_code };
// Sanity check. The condition should never be true.
if num != si_signo || (num != SIGBUS && num != SIGSEGV) {
// Safe because we're terminating the process anyway.
unsafe { _exit(i32::from(super::EXIT_CODE_UNEXPECTED_ERROR)) };
}
// Other signals which might do async unsafe things incompatible with the rest of this
// function are blocked due to the sa_mask used when registering the signal handler.
match si_signo {
SIGBUS => METRICS.signals.sigbus.inc(),
SIGSEGV => METRICS.signals.sigsegv.inc(),
_ => (),
}
error!(
"Shutting down VM after intercepting signal {}, code {}.",
si_signo, si_code
);
// Safe because we're terminating the process anyway. We don't actually do anything when
// running unit tests.
#[cfg(not(test))]
unsafe {
_exit(i32::from(match si_signo {
SIGBUS => super::EXIT_CODE_SIGBUS,
SIGSEGV => super::EXIT_CODE_SIGSEGV,
_ => super::EXIT_CODE_UNEXPECTED_ERROR,
}))
};
}
/// Registers all the required signal handlers.
///
/// Custom handlers are installed for: `SIGBUS`, `SIGSEGV`, `SIGSYS`.
pub fn register_signal_handlers() -> vmm_sys_util::errno::Result<()> {
    // `register_signal_handler` installs handlers that run on the current thread,
    // interrupting whatever work is in progress there, so the registered handlers
    // must only perform async-signal-safe operations.
register_signal_handler(SIGSYS, sigsys_handler)?;
register_signal_handler(SIGBUS, sigbus_sigsegv_handler)?;
register_signal_handler(SIGSEGV, sigbus_sigsegv_handler)?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use libc::{cpu_set_t, syscall};
use std::convert::TryInto;
use std::{mem, process, thread};
use seccompiler::{apply_filter, BpfProgram, SeccompAction, SeccompFilter};
// This function is used when running unit tests, so all the unsafes are safe.
fn cpu_count() -> usize {
let mut cpuset: cpu_set_t = unsafe { mem::zeroed() };
unsafe {
libc::CPU_ZERO(&mut cpuset);
}
let ret = unsafe {
libc::sched_getaffinity(
0,
mem::size_of::<cpu_set_t>(),
&mut cpuset as *mut cpu_set_t,
)
};
assert_eq!(ret, 0);
let mut num = 0;
for i in 0..libc::CPU_SETSIZE as usize {
if unsafe { libc::CPU_ISSET(i, &cpuset) } {
num += 1;
}
}
num
}
#[test]
fn test_signal_handler() {
let child = thread::spawn(move || {
assert!(register_signal_handlers().is_ok());
let filter = SeccompFilter::new(
vec![
(libc::SYS_brk, vec![]),
(libc::SYS_exit, vec![]),
(libc::SYS_futex, vec![]),
(libc::SYS_getpid, vec![]),
(libc::SYS_munmap, vec![]),
(libc::SYS_kill, vec![]),
(libc::SYS_rt_sigprocmask, vec![]),
(libc::SYS_rt_sigreturn, vec![]),
(libc::SYS_sched_getaffinity, vec![]),
(libc::SYS_set_tid_address, vec![]),
(libc::SYS_sigaltstack, vec![]),
(libc::SYS_write, vec![]),
]
.into_iter()
.collect(),
SeccompAction::Trap,
SeccompAction::Allow,
std::env::consts::ARCH.try_into().unwrap(),
)
.unwrap();
assert!(apply_filter(&TryInto::<BpfProgram>::try_into(filter).unwrap()).is_ok());
assert_eq!(METRICS.seccomp.num_faults.count(), 0);
            // Call `SYS_mkdirat`, which is not on the allowed list, to trigger SIGSYS.
unsafe { syscall(libc::SYS_mkdirat, "/foo/bar\0") };
// Call SIGBUS signal handler.
assert_eq!(METRICS.signals.sigbus.count(), 0);
unsafe {
syscall(libc::SYS_kill, process::id(), SIGBUS);
}
// Call SIGSEGV signal handler.
assert_eq!(METRICS.signals.sigsegv.count(), 0);
unsafe {
syscall(libc::SYS_kill, process::id(), SIGSEGV);
}
});
assert!(child.join().is_ok());
// Sanity check.
assert!(cpu_count() > 0);
// Kcov somehow messes with our handler getting the SIGSYS signal when a bad syscall
// is caught, so the following assertion no longer holds. Ideally, we'd have a surefire
// way of either preventing this behaviour, or detecting for certain whether this test is
// run by kcov or not. The best we could do so far is to look at the perceived number of
// available CPUs. Kcov seems to make a single CPU available to the process running the
        // tests, so we use this as a heuristic to decide if we check the assertion.
if cpu_count() > 1 {
// The signal handler should let the program continue during unit tests.
assert!(METRICS.seccomp.num_faults.count() >= 1);
}
assert!(METRICS.signals.sigbus.count() >= 1);
assert!(METRICS.signals.sigsegv.count() >= 1);
}
}
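The `SI_OFF_SYSCALL` trick in the handler above reads the offending syscall number by treating the `siginfo_t` as an array of `i32` fields and dereferencing index 6. A minimal sketch of that pointer arithmetic, using a fake buffer in place of a real `siginfo_t` (the buffer and its contents are invented for illustration; `SYS_mkdirat` is 258 on x86_64):

```rust
// Field offset of `si_syscall` when viewing siginfo as i32 fields,
// as in the handler above.
const SI_OFF_SYSCALL: isize = 6;

/// Reads the "offending syscall" field the way the real handler does:
/// cast the struct pointer to *const i32 and offset into it.
fn read_syscall_field(siginfo: &[i32]) -> usize {
    let ptr = siginfo.as_ptr();
    // Safe here because the buffer is known to be large enough.
    unsafe { *ptr.offset(SI_OFF_SYSCALL) as usize }
}

fn main() {
    // Fake siginfo-sized buffer of i32 fields.
    let mut fake_siginfo = [0i32; 32];
    fake_siginfo[6] = 258; // pretend the offending syscall was SYS_mkdirat (x86_64)
    assert_eq!(read_syscall_field(&fake_siginfo), 258);
}
```

In the real handler the pointer comes from the kernel, which is why the offset must match the C layout of `siginfo_t` rather than any Rust-side definition.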


@@ -0,0 +1,123 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.
use std::ops::Deref;
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use crate::IoManagerCached;
use dbs_arch::regs;
use dbs_boot::get_fdt_addr;
use dbs_utils::time::TimestampUs;
use kvm_ioctls::{VcpuFd, VmFd};
use vm_memory::{Address, GuestAddress, GuestAddressSpace};
use vmm_sys_util::eventfd::EventFd;
use crate::address_space_manager::GuestAddressSpaceImpl;
use crate::vcpu::vcpu_impl::{Result, Vcpu, VcpuError, VcpuStateEvent};
use crate::vcpu::VcpuConfig;
#[allow(unused)]
impl Vcpu {
/// Constructs a new VCPU for `vm`.
///
/// # Arguments
///
/// * `id` - Represents the CPU number between [0, max vcpus).
/// * `vcpu_fd` - The kvm `VcpuFd` for the vcpu.
/// * `io_mgr` - The io-manager used to access port-io and mmio devices.
/// * `exit_evt` - An `EventFd` that will be written into when this vcpu
/// exits.
    /// * `vcpu_state_event` - The eventfd used to notify the vmm that the state
    ///   of some vcpu should change.
/// * `vcpu_state_sender` - The channel to send state change message from
/// vcpu thread to vmm thread.
/// * `create_ts` - A timestamp used by the vcpu to calculate its lifetime.
    /// * `support_immediate_exit` - Whether kvm supports the immediate_exit flag.
pub fn new_aarch64(
id: u8,
vcpu_fd: Arc<VcpuFd>,
io_mgr: IoManagerCached,
exit_evt: EventFd,
vcpu_state_event: EventFd,
vcpu_state_sender: Sender<VcpuStateEvent>,
create_ts: TimestampUs,
support_immediate_exit: bool,
) -> Result<Self> {
let (event_sender, event_receiver) = channel();
let (response_sender, response_receiver) = channel();
Ok(Vcpu {
fd: vcpu_fd,
id,
io_mgr,
create_ts,
event_receiver,
event_sender: Some(event_sender),
response_receiver: Some(response_receiver),
response_sender,
vcpu_state_event,
vcpu_state_sender,
support_immediate_exit,
mpidr: 0,
exit_evt,
})
}
/// Configures an aarch64 specific vcpu.
///
/// # Arguments
///
/// * `vcpu_config` - vCPU config for this vCPU status
/// * `vm_fd` - The kvm `VmFd` for this microvm.
/// * `vm_as` - The guest memory address space used by this microvm.
/// * `kernel_load_addr` - Offset from `guest_mem` at which the kernel is loaded.
/// * `_pgtable_addr` - pgtable address for ap vcpu (not used in aarch64)
pub fn configure(
&mut self,
_vcpu_config: &VcpuConfig,
vm_fd: &VmFd,
vm_as: &GuestAddressSpaceImpl,
kernel_load_addr: Option<GuestAddress>,
_pgtable_addr: Option<GuestAddress>,
) -> Result<()> {
let mut kvi: kvm_bindings::kvm_vcpu_init = kvm_bindings::kvm_vcpu_init::default();
// This reads back the kernel's preferred target type.
vm_fd
.get_preferred_target(&mut kvi)
.map_err(VcpuError::VcpuArmPreferredTarget)?;
// We already checked that the capability is supported.
kvi.features[0] |= 1 << kvm_bindings::KVM_ARM_VCPU_PSCI_0_2;
// Non-boot cpus are powered off initially.
if self.id > 0 {
kvi.features[0] |= 1 << kvm_bindings::KVM_ARM_VCPU_POWER_OFF;
}
self.fd.vcpu_init(&kvi).map_err(VcpuError::VcpuArmInit)?;
if let Some(address) = kernel_load_addr {
regs::setup_regs(
&self.fd,
self.id,
address.raw_value(),
get_fdt_addr(vm_as.memory().deref()),
)
.map_err(VcpuError::REGSConfiguration)?;
}
self.mpidr = regs::read_mpidr(&self.fd).map_err(VcpuError::REGSConfiguration)?;
Ok(())
}
/// Gets the MPIDR register value.
pub fn get_mpidr(&self) -> u64 {
self.mpidr
}
}
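The feature-flag logic inside `configure` can be isolated as a tiny pure function: PSCI 0.2 is always enabled, and non-boot vcpus (`id > 0`) additionally start powered off. The bit positions below match the kernel's `KVM_ARM_VCPU_*` constants, but this standalone sketch does not touch KVM and is for illustration only:

```rust
// Feature bit positions, stand-ins for the kvm_bindings constants.
const KVM_ARM_VCPU_POWER_OFF: u32 = 0;
const KVM_ARM_VCPU_PSCI_0_2: u32 = 2;

/// Mirrors how `configure` builds `kvi.features[0]`: PSCI 0.2 always,
/// plus POWER_OFF for every non-boot vcpu.
fn vcpu_features(id: u8) -> u64 {
    let mut features: u64 = 1 << KVM_ARM_VCPU_PSCI_0_2;
    if id > 0 {
        // Non-boot cpus are powered off initially.
        features |= 1 << KVM_ARM_VCPU_POWER_OFF;
    }
    features
}

fn main() {
    // Boot vcpu: only PSCI 0.2.
    assert_eq!(vcpu_features(0), 1 << KVM_ARM_VCPU_PSCI_0_2);
    // Secondary vcpu: PSCI 0.2 plus POWER_OFF.
    assert_eq!(
        vcpu_features(1),
        (1 << KVM_ARM_VCPU_PSCI_0_2) | (1 << KVM_ARM_VCPU_POWER_OFF)
    );
}
```

In the real code the computed bits are written into the `kvm_vcpu_init` structure returned by `get_preferred_target` before calling `vcpu_init`, so the preferred target type is preserved.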

Some files were not shown because too many files have changed in this diff.