Compare commits

..

88 Commits
3.1.3 ... 3.0.2

Author SHA1 Message Date
Archana Shinde
35b32156ad Merge pull request #6282 from amshinde/3.0.2-branch-bump
# Kata Containers 3.0.2
2023-02-15 16:20:58 -08:00
Archana Shinde
2f638b3666 release: Kata Containers 3.0.2
- stable-3.0: Stable 3.0 backports
- stable-3.0 | docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
- Stable-3.0 | Upgrade to Cloud Hypervisor v28.2
- Qemu logs for stable 3.0
- Backport CI fixes for s390x and ppc64le to stable-3.0
- docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
- Stable-3.0 | Upgrade to Cloud Hypervisor v28.1

4ebeb51bb release: Adapt kata-deploy for 3.0.2
178ee3d7e agent: check command before do test_ip_tables
7461bcd76 runtime-rs: change cache mode
123c86717 SEV: Update ReducedPhysBits
98f60c100 clh: Enforce API timeout only for vm.boot request
960f089d3 virtiofsd: fix the build on ppc64le
92f3b11c9 runtime:all APIs are hang in the service.mu
92619c833 runtime: Drop QEMU log file support
4f3db7678 runtime: Collect QEMU's stderr
918c11e46 runtime: Start QEMU undaemonized
8c4507be2 runtime: Launch QEMU with cmd.Start()
a61fba6d4 runtime: Pre-establish the QMP connection
ad9cb0ba5 govmm: Optionally pass QMP listener to QEMU
d6dd99e98 govmm: Optionally start QMP with a pre-configured connection
0623f1fe6 virtiofsd: Not use "link-self-contained=yes" on s390x
5883dc1bd CI: Set docker version to v20.10 in ubuntu:20.04 for s390x|ppc64le
4a5877f45 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
d3b57325e versions: Upgrade to Cloud Hypervisor v28.2
0d7bd066d docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
ac1ce2d30 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
f4d71af45 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
fcc120d49 versions: Upgrade to Cloud Hypervisor v28.1

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-02-15 10:56:42 -08:00
Archana Shinde
98bacb0efc release: Adapt kata-deploy for 3.0.2
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-02-15 10:56:42 -08:00
Archana Shinde
69e681961a Merge pull request #6275 from amshinde/stable-3.0-backports
stable-3.0: Stable 3.0 backports
2023-02-14 14:28:04 -08:00
Jianyong Wu
178ee3d7e3 agent: check command before do test_ip_tables
test_ip_tables test depends on iptables tools. But we can't
ensure these tools are exist. it's better to skip the test
if there is no such tools.

Fixes: #5697
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
(cherry picked from commit b53171b605)
2023-02-14 08:55:53 -08:00
Zhongtao Hu
7461bcd760 runtime-rs: change cache mode
use never as the cache mode if none is configured

Fixes:#6020
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
(cherry picked from commit 6199b69178)
2023-02-14 08:55:53 -08:00
Larry Dewey
123c867172 SEV: Update ReducedPhysBits
Updating this field, as `cpuid` provides host level data, which is not
what a guest would expect for Reduced Phsycial Bits. In almost all
cases, we should be using `1` for the value here.

Amend: Adding unit test change.

Fixes: #5006

Signed-off-by: Larry Dewey <larry.dewey@amd.com>
(cherry picked from commit 67b8f0773f)
2023-02-14 08:55:53 -08:00
Alexandru Matei
98f60c100c clh: Enforce API timeout only for vm.boot request
launchClh already has a timeout of 10seconds for launching clh, e.g.
if launchClh or setupVirtiofsDaemon takes a few seconds the context's
deadline will already be expired by the time it reaches bootVM

Fixes #6240
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit ac64b021a6)
2023-02-14 08:55:53 -08:00
Archana Shinde
960f089d3c virtiofsd: fix the build on ppc64le
link-self-contained is not supported on ppc64le rust target.
Hence, do not pass it while building virtiofsd.

Fixes: #6195

Backport of #856ab66871

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-02-14 08:55:53 -08:00
ls
92f3b11c94 runtime:all APIs are hang in the service.mu
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time

Fixes: #6059

Signed-off-by: ls <335814617@qq.com>
(cherry picked from commit 69fc8de712)
2023-02-14 08:55:53 -08:00
GabyCT
e299c6bd4b Merge pull request #6196 from singhwang/stable-3.0
stable-3.0 | docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
2023-02-10 10:37:08 -06:00
Bo Chen
06c94933f2 Merge pull request #6139 from likebreath/0126/clh_v28.2
Stable-3.0 | Upgrade to Cloud Hypervisor v28.2
2023-02-07 08:19:37 -08:00
Greg Kurz
8c5053ca5d Merge pull request #6175 from gkurz/qemu-logs-for-stable-3.0
Qemu logs for stable 3.0
2023-02-07 07:45:13 +01:00
Greg Kurz
92619c833e runtime: Drop QEMU log file support
The QEMU log file is essentially about fine grain tracing of QEMU
internals and mostly useful for developpers, not production. Notably,
the log file isn't limited in size, nor rotated in any way. It means
that a container running in the VM could possibly flood the log file
with a guest triggerable trace. For example, on openshift, the log
file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means
that each pod running with the kata runtime could potentially consume
this amount of host RAM which is not acceptable.

Error messages are best collected from QEMU's stderr as kata is doing
now since PR #5736 was merged. Drop support for the QEMU log file
because it doesn't bring any value but can certainly do harm.

Fixes #6173

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 334c4b8bdc)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
4f3db76780 runtime: Collect QEMU's stderr
LaunchQemu now connects a pipe to QEMU's stderr and makes it
usable by callers through a Go io.ReadCloser object. As
explained in [0], all messages should be read from the pipe
before calling cmd.Wait : introduce a LogAndWait helper to handle
that.

Fixes #5780

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 39fe4a4b6f)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
918c11e46b runtime: Start QEMU undaemonized
QEMU has always been started daemonized since the beginning. I
could not find any justification for that though, but it certainly
introduces a problem : QEMU stops logging errors when started this
way, which isn't accaptable from a support standpoint. The QEMU
community discourages the use of -daemonize ; mostly because
libvirt, QEMU's primary consummer, doesn't use this option and
prefers getting errors from QEMU's stderr through a pipe in order
to enforce rollover.

Now that virtcontainers knows how to start QEMU with a pre-
established QMP connection, let's start QEMU without -daemonize.
This requires to handle the reaping of QEMU when it terminates.
Since cmd.Wait() is blocking, call it from a goroutine.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit a5319c6be6)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
8c4507be21 runtime: Launch QEMU with cmd.Start()
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.

cmd.Run() is :

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().

If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit bf4e3a618f)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
a61fba6d45 runtime: Pre-establish the QMP connection
Running QEMU daemonized ensures that the QMP socket is ready to
accept connections when LaunchQemu() returns. In order to be
able to run QEMU undaemonized, let's handle that part upfront.
Create a listener socket and connect to it. Pass the listener
to QEMU and pass the connected socket to QMP : this ensures
that we cannot fail to establish QMP connection and that we
can detect if QEMU exits before accepting the connection.
This is basically what libvirt does.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8a1723a5cb)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
ad9cb0ba58 govmm: Optionally pass QMP listener to QEMU
QEMU's -qmp option can be passed the file descriptor of a socket that
is already in listening mode. This is done with by passing `fd=XXX`
to `-qmp` instead of a path. Note that these two options are mutually
exclusive : QEMU errors out if both are passed, so we check that as
well in the validation function.

While here add the `path=` stanza in the path based case for clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8a4f08cb0f)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
d6dd99e986 govmm: Optionally start QMP with a pre-configured connection
When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 219bb8e7d0)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
3cbdec5a02 Merge pull request #6215 from gkurz/backport-6212-for-stable-3.0
Backport CI fixes for s390x and ppc64le to stable-3.0
2023-02-04 17:55:22 +01:00
Hyounggyu Choi
0623f1fe6b virtiofsd: Not use "link-self-contained=yes" on s390x
The compile option link-self-contained=yes asks rustc to use
C library startup object files that come with the compiler,
which are not available on the target s390x-unknown-linux-gnu.
A build does not contain any startup files leading to a
broken executable entry point (causing segmentation fault).

Fixes: #5522 for stable-3.0
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
(cherry picked from commit 43fcb8fd09)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 11:17:26 +01:00
Hyounggyu Choi
5883dc1bd9 CI: Set docker version to v20.10 in ubuntu:20.04 for s390x|ppc64le
This is to make a docker version to v20.10 in docker upstream image ubuntu:20.04 for s390x and ppc64le.

Fixes: #6211 for stable-3.0
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
(cherry picked from commit f49b89b632)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 11:16:11 +01:00
SinghWang
4a5877f451 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
The key steps in how-to-hotplug-memory-arm64.md are missing, resulting in the kata qemu pod not being created successfully.

Fixes: #6105
Signed-off-by: SinghWang <wangxin_0611@126.com>
2023-02-02 16:28:45 +08:00
Bin Liu
f90e75e542 Merge pull request #6106 from singhwang/stable-3.0
docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
2023-01-28 09:07:58 +08:00
Bo Chen
d3b57325ee versions: Upgrade to Cloud Hypervisor v28.2
This patch upgrade Cloud Hypervisor to its latest bug release v28.2:
https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v28.2

Fixes: #6138

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-01-26 11:31:35 -08:00
SinghWang
0d7bd066d3 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
The key steps in how-to-hotplug-memory-arm64.md are missing, resulting in the kata qemu pod not being created successfully.

Fixes: #6105
Signed-off-by: SinghWang <wangxin_0611@126.com>
2023-01-20 11:48:13 +08:00
SinghWang
ac1ce2d30b docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
The key steps in how-to-hotplug-memory-arm64.md are missing, resulting in the kata qemu pod not being created successfully.

Fixes: #6105
Signed-off-by: SinghWang <wangxin_0611@126.com>
2023-01-19 19:29:59 +08:00
SinghWang
f4d71af457 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
The key steps in how-to-hotplug-memory-arm64.md are missing, resulting in the kata qemu pod not being created successfully.

Fixes: #6105
Signed-off-by: SinghWang <wangxin_0611@126.com>
2023-01-19 15:12:17 +08:00
Bo Chen
f36f8ffa16 Merge pull request #5978 from likebreath/0104/backport_clh_v28.1
Stable-3.0 | Upgrade to Cloud Hypervisor v28.1
2023-01-05 09:05:18 -08:00
Bo Chen
fcc120d495 versions: Upgrade to Cloud Hypervisor v28.1
This patch upgrade Cloud Hypervisor to its latest bug release v28.1:
https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v28.1

Fixes: #5973

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 652021ad95)
2023-01-04 10:44:03 -08:00
Fabiano Fidêncio
cfbc834602 Merge pull request #5922 from fidencio/3.0.1-branch-bump
# Kata Containers 3.0.1
2022-12-19 19:54:56 +01:00
Fabiano Fidêncio
ea74df1270 release: Kata Containers 3.0.1
- stable-3.0 | kata-deploy: Fix the pod of kata deploy starts to occur an error
- Stable-3.0 | Upgrade to Cloud Hypervisor v28.0
- stable-3.0 | Snap CI backports
- stable-3.0 | package: add nydus to release artifacts

19f51c7cc release: Adapt kata-deploy for 3.0.1
d3f7b829f versions: update nydusd version
1bf7f2f68 package: add nydus to release artifacts
9cf1af873 runtime: clh: Re-generate the client code
4d6ca7623 versions: Upgrade to Cloud Hypervisor v28.0
719017d68 clh: return faster with dead clh process from isClhRunning
569ecdbe7 clh: fast exit from isClhRunning if the process was stopped
fa8a0ad49 clh: don't try to stop clh multiple times
8fbf862fa cloud-hypervisor: Fix GetThreadIDs function
9141acd94 versions: Update Cloud Hypervisor to b4e39427080
9a0ab92f6 runtime: clh: Use the new API to boot with TDX firmware (td-shim)
f3eac35b5 runtime: clh: Re-generate the client code
8a7e0efd1 versions: Upgrade to Cloud Hypervisor v27.0
a152f6034 runk: Ignore an error when calling kill cmd with --all option
50bf4434d log-parser: Simplify check
74791ed38 runtime: Fix gofmt issues
778ebb6e6 golang: Stop using io/ioutils
b5661e988 versions: Update golangci-lint
88c13b682 versions: bump containerd version
b8ce291dd build: update golang version to 1.19.2
f5e5ca427 github: Parallelise static checks
eaa7ab746 snap: Unbreak docker install
8d2fd2449 snap: Use metadata for dependencies
ab83ab6be snap: Build virtiofsd using the kata-deploy scripts
1772df5ac snap: Create a task for installing docker
2e4958644 virtiofsd: Build inside a container

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-12-19 16:05:12 +01:00
Fabiano Fidêncio
c712057ae7 release: Adapt kata-deploy for 3.0.1
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-12-19 16:05:12 +01:00
Peng Tao
bc5bbfa60f versions: update nydusd version
To the latest stable v2.1.1.

Fixes: #5635
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit a636d426d9)
2022-12-19 16:05:03 +01:00
Bin Liu
0afcc57a92 package: add nydus to release artifacts
Install nydus related binaries under /opt/kata/libexec/

Fixes: #5726

Signed-off-by: Bin Liu <bin@hyper.sh>
(cherry picked from commit abb9ebeece)
2022-12-19 15:51:18 +01:00
Peng Tao
bcc2ee6e12 Merge pull request #5913 from singhwang/stable-3.0
stable-3.0 | kata-deploy: Fix the pod of kata deploy starts to occur an error
2022-12-16 16:53:45 +08:00
Fabiano Fidêncio
bd797eddec kata-deploy: Fix the pod of kata deploy starts to occur an error
If a pod of kata is deployed on a machine, after the machine restarts, the pod status of kata-deploy will be CrashLoopBackOff.

Fixes: #5868
Signed-off-by: SinghWang <wangxin_0611@126.com>
2022-12-15 14:40:11 +08:00
Fabiano Fidêncio
b3760bb3a6 Merge pull request #5699 from likebreath/1118/backport_clh_v28.0
Stable-3.0 | Upgrade to Cloud Hypervisor v28.0
2022-11-26 11:41:35 +01:00
Bo Chen
9cf1af873b runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v28.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #5683

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 36545aa81a)
2022-11-25 17:53:03 +01:00
Bo Chen
4d6ca7623a versions: Upgrade to Cloud Hypervisor v28.0
Details of this release can be found in our new roadmap project as
iteration v28.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: #5683

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f4b02c2244)
2022-11-25 17:53:03 +01:00
Alexandru Matei
719017d688 clh: return faster with dead clh process from isClhRunning
Through proactively checking if Cloud Hypervisor process is dead,
this patch provides a faster path for isClhRunning

Fixes: #5623

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit 0e3ac66e76)
2022-11-25 17:53:03 +01:00
Alexandru Matei
569ecdbe76 clh: fast exit from isClhRunning if the process was stopped
Use atomic operations instead of acquiring a mutex in isClhRunning.
This stops isClhRunning from generating a deadlock by trying to
reacquire an already-acquired lock when called via StopVM->terminate.

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit 9ef68e0c7a)
2022-11-25 17:53:03 +01:00
Alexandru Matei
fa8a0ad49b clh: don't try to stop clh multiple times
Avoid executing StopVM concurrently when virtiofs dies as a result of clh
being stopped in StopVM.

Fixes: #5622

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit 2631b08ff1)
2022-11-25 17:53:03 +01:00
Guanglu Guo
8fbf862fa6 cloud-hypervisor: Fix GetThreadIDs function
Get vcpu thread-ids by reading cloud-hypervisor process tasks information.

Fixes: #5568

Signed-off-by: Guanglu Guo <guoguanglu@qiyi.com>
(cherry picked from commit daeee26a1e)
2022-11-25 17:53:03 +01:00
Fabiano Fidêncio
9141acd94c versions: Update Cloud Hypervisor to b4e39427080
An API change, done a long time ago, has been exposed on Cloud
Hypervisor and we should update it on the Kata Containers side to ensure
it doesn't affect Cloud Hypervisor CI and because the change is needed
for an upcoming work to get QAT working with Cloud Hypervisor.

Fixes: #5492

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 9d286af7b4)
2022-11-25 17:53:03 +01:00
Bo Chen
9a0ab92f65 runtime: clh: Use the new API to boot with TDX firmware (td-shim)
The new way to boot from TDX firmware (e.g. td-shim) is using the
combination of '--platform tdx=on' with '--firmware tdshim'.

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 067e2b1e33)
2022-11-25 17:53:03 +01:00
Bo Chen
f3eac35b55 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v27.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 5d63fcf344)
2022-11-25 17:53:03 +01:00
Bo Chen
8a7e0efd14 versions: Upgrade to Cloud Hypervisor v27.0
This release has been tracked in our new [roadmap project ](https://github.com/orgs/cloud-hypervisor/projects/6) as iteration v27.0.

**Community Engagement**
A new mailing list has been created to support broader community discussions.
Please consider [subscribing](https://lists.cloudhypervisor.org/g/dev/); an announcement of a regular meeting will be
announced via this list shortly.

**Prebuilt Packages**
Prebuilt packages are now available. Please see this [document](https://github.com/cloud-hypervisor/obs-packaging/blob/main/README.md)
on how to install. These packages also include packages for the different
firmware options available.

**Network Device MTU Exposed to Guest**
The MTU for the TAP device associated with a virtio-net device is now exposed
to the guest. If the user provides a MTU with --net mtu=.. then that MTU is
applied to created TAP interfaces. This functionality is also exposed for
vhost-user-net devices including those created with the reference backend.

**Boot Tracing**
Support for generating a trace report for the boot time has been added
including a script for generating an SVG from that trace.

**Simplified Build Feature Flags**
The set of feature flags, for e.g. experimental features, have been simplified:

* msvh and kvm features provide support for those specific hypervisors
(with kvm enabled by default),
* tdx provides support for Intel TDX; and although there is no MSHV support
now it is now possible to compile with the mshv feature,
* tracing adds support for boot tracing,
* guest_debug now covers both support for gdbing a guest (formerly gdb
feature) and dumping guest memory.

The following feature flags were removed as the functionality was enabled by
default: amx, fwdebug, cmos and common.

**Asynchronous Kernel Loading**
AArch64 has gained support for loading the guest kernel asynchronously like
x86-64.

**GDB Support for AArch64**
GDB stub support (accessed through --gdb under guest_debug feature) is now
available on AArch64 as well as as x86-64.

**Notable Bug Fixes**
* This version incorporates a version of virtio-queue that addresses an issue
where a rogue guest can potentially DoS the VMM,
* Improvements around PTY handling for virtio-console and serial devices,
* Improved error handling in virtio devices.

**Deprecations**
Deprecated features will be removed in a subsequent release and users should
plan to use alternatives.

* Booting legacy firmware (compiled without a PVH header) has been deprecated.
All the firmware options (Cloud Hypervisor OVMF and Rust Hypervisor Firmware)
support booting with PVH so support for loading firmware in a legacy mode is no
longer needed. This functionality will be removed in the next release.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v27.0

Note: To have the new API of loading firmware for booting (e.g. boot
from td-shim), a specific commit revision after the v27.0 release is
used as the Cloud Hypervisor version from the 'versions.yaml'.

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit fe61070426)
2022-11-25 17:53:03 +01:00
Fabiano Fidêncio
754308c478 Merge pull request #5734 from fidencio/topic/stable-3.0-snap-ci-backports
stable-3.0 | Snap CI backports
2022-11-25 17:51:34 +01:00
Manabu Sugimoto
a152f6034e runk: Ignore an error when calling kill cmd with --all option
Ignore an error handling that is triggered when the kill command is called
with `--all option` to the stopped container.

High-level container runtimes such as containerd call the kill command with
`--all` option in order to terminate all processes inside the container
even if the container already is stopped. Hence, a low-level runtime
should allow `kill --all` regardless of the container state like runc.

This commit reverts to the previous behavior.

Fixes: #5555

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
(cherry picked from commit 16dca4ecd4)
2022-11-25 14:10:32 +01:00
Fabiano Fidêncio
50bf4434dd log-parser: Simplify check
```
14:13:15 parse.go:306:5: S1009: should omit nil check; len() for github.com/kata-containers/kata-containers/src/tools/log-parser.kvPairs is defined as zero (gosimple)
14:13:15 	if pairs == nil || len(pairs) == 0 {
14:13:15 	   ^
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2f5f575a43)
2022-11-25 14:10:32 +01:00
Fabiano Fidêncio
74791ed389 runtime: Fix gofmt issues
It seems that bumping the version of golang and golangci-lint new format
changes are required.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d94718fb30)
2022-11-25 14:10:06 +01:00
Fabiano Fidêncio
778ebb6e60 golang: Stop using io/ioutils
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 16b8375095)
2022-11-25 13:29:34 +01:00
Fabiano Fidêncio
b5661e9882 versions: Update golangci-lint
Let's bump the golangci-lint in order to fix issues that popped up after
updating Golang to its 1.19.2 version.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 66aa330d0d)
2022-11-25 13:29:31 +01:00
Peng Tao
88c13b6823 versions: bump containerd version
v1.5.2 cannot be built from source by newer golang. Let's bump
containerd version to 1.6.8. The GO runtime dependency has
been moved to v1.6.6 for some time already.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit b3a4a16294)
2022-11-25 13:29:30 +01:00
Peng Tao
b8ce291dd0 build: update golang version to 1.19.2
So that we get the latest language fixes.

There is little use to maitain compiler backward compatibility.
Let's just set the default golang version to the latest 1.19.2.

Fixes: #5494
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit eab8d6be13)
2022-11-25 13:29:04 +01:00
Fabiano Fidêncio
f5e5ca427d github: Parallelise static checks
Although introducing an awful amount of code duplication, let's
parallelise the static checks in order to reduce its time and the space
used in the VMs running those.

While I understand there may be ways to make the whole setup less
repetitive and error prone, I'm taking the approach of:
* Make it work
* Make it right
* Make it fast

So, it's clear that I'm only attempting to make it work, and I'd
appreciate community help in order to improve the situation here.  But,
for now, this is a stopgap solution.

JFYI, the time needed for run the tests on the `main` branch went down
from ~110 minutes to ~60 minutes.  Plus, we're not running those on a
single VM anymore, which decreases the change to hit the space limit.

Reference: https://github.com/kata-containers/kata-containers/actions/runs/3393468605/jobs/5640842041

Ideally, each one of the following tests should be also split into
smaller tests, each test for one component, for instance.
* static-checks
* compiler-checks
* unit-tests
* unit-tests-as-root

Fixes: #5585

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 40d514aa2c)
2022-11-25 13:29:03 +01:00
James O. D. Hunt
eaa7ab7462 snap: Unbreak docker install
It appears that _either_ the GitHub workflow runners have changed their
environment, or the Ubuntu archive has changed package dependencies,
resulting in the following error when building the snap:

```
Installing build dependencies: bc bison build-essential cpio curl docker.io ...

    :

The following packages have unmet dependencies:
docker.io : Depends: containerd (>= 1.2.6-0ubuntu1~)
E: Unable to correct problems, you have held broken packages.
```

This PR uses the simplest solution: install the `containerd` and `runc`
packages. However, we might want to investigate alternative solutions in
the future given that the docker and containerd packages seem to have
gone wild in the Ubuntu GitHub workflow runner environment. If you
include the official docker repo (which the snap uses), a _subset_ of
the related packages is now:

- `containerd`
- `containerd.io`
- `docker-ce`
- `docker.io`
- `moby-containerd`
- `moby-engine`
- `moby-runc`
- `runc`

Fixes: #5545.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
(cherry picked from commit 990e6359b7)
2022-11-25 13:29:02 +01:00
James O. D. Hunt
8d2fd24492 snap: Use metadata for dependencies
Rather than hard-coding the package manager into the docker part,
use the `build-packages` section to specify the parts package
dependencies in a distro agnostic manner.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
(cherry picked from commit ca69a9ad6d)
2022-11-25 13:29:01 +01:00
Fabiano Fidêncio
ab83ab6be5 snap: Build virtiofsd using the kata-deploy scripts
Let's build virtiofsd using the kata-deploy build scripts, which
simplifies and unifies the way we build our components.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0bc5baafb9)
2022-11-25 13:29:00 +01:00
Fabiano Fidêncio
1772df5ac2 snap: Create a task for installing docker
Let's have the docker installation / configuration as part of its own
task, which can be set as a dependency of other tasks whcih may or may
not depend on docker.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit cb4ef4734f)
2022-11-25 13:28:59 +01:00
Fabiano Fidêncio
2e49586445 virtiofsd: Build inside a container
When moving to building the CI artefacts using the kata-deploy scripts,
we've noticed that the build would fail on any machine where the tarball
wasn't officially provided.

This happens as rust is missing from the 1st layer container.  However,
it's a very common practice to leave the 1st layer container with the
minimum possible dependencies and install whatever is needed for
building a specific component in a 2nd layer container, which virtiofsd
never had.

In this commit we introduce the second layer containers (yes,
comtainers), one for building virtiofsd using musl, and one for building
virtiofsd using glibc.  The reason for taking this approach was to
actually simplify the scripts and avoid building the dependencies
(libseccomp, libcap-ng) using musl libc.

Fixes: #5425

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 7e5941c578)
2022-11-25 13:28:57 +01:00
Peng Tao
e2a8815ba4 Merge pull request #5379 from bergwolf/3.0.0-branch-bump
# Kata Containers 3.0.0
2022-10-09 16:59:20 +08:00
Peng Tao
63495cf43a release: Kata Containers 3.0.0
- stable-3.0: backport agent fixes
- backport fix for 3.0.0 release

fb4430549 release: Adapt kata-deploy for 3.0.0
20c02528e agent: reduce reference count for failed mount
3eb6f5858 agent: don't exit early if signal fails due to ESRCH
8dc8565ed versions: Update gperf url to avoid libseccomp random failures
740e7e2f7 kata-sys-util: fix typo `unknow`

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-10-08 12:42:06 +00:00
Peng Tao
fb44305497 release: Adapt kata-deploy for 3.0.0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-10-08 12:42:06 +00:00
Fabiano Fidêncio
cea5c29e70 Merge pull request #5377 from bergwolf/github/backport-3.0
stable-3.0: backport agent fixes
2022-10-08 11:55:19 +02:00
Feng Wang
20c02528e5 agent: reduce reference count for failed mount
The kata agent adds a reference for each storage object before mount
and skip mount again if the storage object is known. We need to
remove the object reference if mount fails.

Fixes: #5364

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-10-08 15:13:39 +08:00
Feng Wang
3eb6f5858a agent: don't exit early if signal fails due to ESRCH
ESRCH usually means the process has exited. In this case,
the execution should continue to kill remaining container processes.

Fixes: #5366

Signed-off-by: Feng Wang <feng.wang@databricks.com>
[Fix up cargo updates]
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-10-08 15:13:30 +08:00
Peng Tao
8b0231bec8 Merge pull request #5372 from bergwolf/github/backport-3.0
backport fix for 3.0.0 release
2022-10-08 10:33:21 +08:00
Gabriela Cervantes
8dc8565ed5 versions: Update gperf url to avoid libseccomp random failures
This PR updates the gperf url to avoid random failures when installing
libseccomp as it seems that the mirrror url produces network random
failures in multiple CIs.

Fixes #5294

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-10-07 20:56:43 +08:00
Bin Liu
740e7e2f77 kata-sys-util: fix typo unknow
Change `unknow` to `unknown`.

Fixes: #5296

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-10-07 19:58:46 +08:00
Greg Kurz
ef49fa95f7 Merge pull request #5290 from gkurz/3.0.0-rc1-branch-bump
# Kata Containers 3.0.0-rc1
2022-09-30 08:43:06 +02:00
Greg Kurz
727f233e2a release: Kata Containers 3.0.0-rc1
- tools: release: fix bogus version check
- osbuilder: Export directory variables for libseccomp
- kata-deploy: support runtime-rs for kata deploy
- Last backport for 3.0-rc1
- stable-3.0: backport runtime/runtime-rs dependency updates

babab160bc tools: release: fix bogus version check
af22e71375 osbuilder: Export directory variables for libseccomp
b0c5f040f0 runtime-rs: set agent timeout to 0 for stream RPCs
d44e39e059 runtime-rs: fix incorrect comments
43b0e95800 runtime: store the user name in hypervisor config
81801888a2 runtime: make StopVM thread-safe
fba39ef32d runtime: add more debug logs for non-root user operation
63309514ca runtime-rs: drop dependency on rustc-serialize
e229a03cc8 runtime: update runc dependency
d663f110d7 kata-deploy: get the config path from cri options
c6b3dcb67d kata-deploy: support kata-deploy for runtime-rs
a394761a5c kata-deploy: add installation for runtime-rs

Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-29 17:21:11 +02:00
Greg Kurz
619d1b487f Merge pull request #5286 from gkurz/backport-3.0/5284-release-script
tools: release: fix bogus version check
2022-09-29 17:11:23 +02:00
Greg Kurz
babab160bc tools: release: fix bogus version check
Shell expands `*"rc"*` to the top-level `src` directory. This results
in comparing a version with a directory name. This doesn't make sense
and causes the script to choose the wrong branch of the `if`.

The intent of the check is actually to detect `rc` in the version.

Fixes: #5283
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 421729f991)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-29 14:56:52 +02:00
Archana Shinde
f168555569 Merge pull request #5273 from gkurz/backport-3.0/5233-osbuilder
osbuilder: Export directory variables for libseccomp
2022-09-28 17:22:51 -07:00
Gabriela Cervantes
af22e71375 osbuilder: Export directory variables for libseccomp
To avoid the random failures when we are building the rootfs as it seems
that it does not find the value for the libseccomp and gperf directory,
this PR export these variables.

Fixes #5232

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit a4a23457ca)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-28 13:16:09 +02:00
Greg Kurz
b9379521a0 Merge pull request #5263 from openanolis/origin/kata-deploy
kata-deploy: support runtime-rs for kata deploy
2022-09-28 09:41:12 +02:00
Peng Tao
5b3bbc62ba Merge pull request #5257 from gkurz/backport-3_0_rc1
Last backport for 3.0-rc1
2022-09-28 11:01:09 +08:00
Bin Liu
b0c5f040f0 runtime-rs: set agent timeout to 0 for stream RPCs
For stream RPCs:
- write_stdin
- read_stdout
- read_stderr

there should be no timeout (by setting it to 0).

Fixes: #5249

Signed-off-by: Bin Liu <bin@hyper.sh>
(cherry picked from commit 20bcaf0e36)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 16:01:17 +02:00
Bin Liu
d44e39e059 runtime-rs: fix incorrect comments
Some comments for types are incorrect in file
 src/libs/kata-types/src/config/hypervisor/mod.rs

Fixes: #5187

Signed-off-by: Bin Liu <bin@hyper.sh>
(cherry picked from commit 3f65ff2d07)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:27 +02:00
Feng Wang
43b0e95800 runtime: store the user name in hypervisor config
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit f914319874)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:26 +02:00
Feng Wang
81801888a2 runtime: make StopVM thread-safe
StopVM can be invoked by multiple threads and needs to be thread-safe

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit 5cafe21770)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:25 +02:00
Feng Wang
fba39ef32d runtime: add more debug logs for non-root user operation
Previously the logging was insufficient and made debugging difficult

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit c3015927a3)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:24 +02:00
Fupan Li
57261ec97a Merge pull request #5251 from bergwolf/github/backport-3.0
stable-3.0: backport runtime/runtime-rs dependency updates
2022-09-27 14:55:55 +08:00
Peng Tao
63309514ca runtime-rs: drop dependency on rustc-serialize
We are not using it and it hasn't got any updates for more than five
years, leaving open CVEs unresolved.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-27 11:54:44 +08:00
Peng Tao
e229a03cc8 runtime: update runc dependency
To bring fix to CVE-2022-29162.

Fixes: #5217
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-27 11:54:37 +08:00
502 changed files with 5286 additions and 26393 deletions

View File

@@ -1,12 +1,5 @@
name: Cargo Crates Check Runner
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
on: [pull_request]
jobs:
cargo-deny-runner:
runs-on: ubuntu-latest

View File

@@ -5,7 +5,7 @@ on:
- edited
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
name: Darwin tests
jobs:
test:
@@ -14,7 +14,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: 1.19.3
go-version: 1.19.2
- name: Checkout code
uses: actions/checkout@v2
- name: Build utils

View File

@@ -14,7 +14,7 @@ jobs:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: 1.19.3
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Set env

View File

@@ -18,7 +18,6 @@ jobs:
matrix:
asset:
- kernel
- kernel-dragonball-experimental
- shim-v2
- qemu
- cloud-hypervisor
@@ -29,6 +28,12 @@ jobs:
- nydus
steps:
- uses: actions/checkout@v2
- name: Install docker
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |

View File

@@ -1,10 +1,5 @@
on:
workflow_dispatch: # this is used to trigger the workflow on non-main branches
inputs:
pr:
description: 'PR number from the selected branch to test'
type: string
required: true
issue_comment:
types: [created, edited]
@@ -18,20 +13,19 @@ jobs:
&& github.event_name == 'issue_comment'
&& github.event.action == 'created'
&& startsWith(github.event.comment.body, '/test_kata_deploy')
|| github.event_name == 'workflow_dispatch'
steps:
- name: Check membership on comment or dispatch
- name: Check membership
uses: kata-containers/is-organization-member@1.0.1
id: is_organization_member
with:
organization: kata-containers
username: ${{ github.event.comment.user.login || github.event.sender.login }}
username: ${{ github.event.comment.user.login }}
token: ${{ secrets.GITHUB_TOKEN }}
- name: Fail if not member
run: |
result=${{ steps.is_organization_member.outputs.result }}
if [ $result == false ]; then
user=${{ github.event.comment.user.login || github.event.sender.login }}
user=${{ github.event.comment.user.login }}
echo Either ${user} is not part of the kata-containers organization
echo or ${user} has its Organization Visibility set to Private at
echo https://github.com/orgs/kata-containers/people?query=${user}
@@ -50,7 +44,6 @@ jobs:
- cloud-hypervisor
- firecracker
- kernel
- kernel-dragonball-experimental
- nydus
- qemu
- rootfs-image
@@ -61,17 +54,18 @@ jobs:
- name: get-PR-ref
id: get-PR-ref
run: |
if [ ${{ github.event_name }} == 'issue_comment' ]; then
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
else # workflow_dispatch
ref="refs/pull/${{ github.event.inputs.pr }}/merge"
fi
echo "reference for PR: " ${ref} "event:" ${{ github.event_name }}
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: Install docker
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
@@ -96,12 +90,8 @@ jobs:
- name: get-PR-ref
id: get-PR-ref
run: |
if [ ${{ github.event_name }} == 'issue_comment' ]; then
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
else # workflow_dispatch
ref="refs/pull/${{ github.event.inputs.pr }}/merge"
fi
echo "reference for PR: " ${ref} "event:" ${{ github.event_name }}
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:
@@ -127,12 +117,8 @@ jobs:
- name: get-PR-ref
id: get-PR-ref
run: |
if [ ${{ github.event_name }} == 'issue_comment' ]; then
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
else # workflow_dispatch
ref="refs/pull/${{ github.event.inputs.pr }}/merge"
fi
echo "reference for PR: " ${ref} "event:" ${{ github.event_name }}
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:

View File

@@ -13,7 +13,6 @@ jobs:
- cloud-hypervisor
- firecracker
- kernel
- kernel-dragonball-experimental
- nydus
- qemu
- rootfs-image
@@ -22,6 +21,11 @@ jobs:
- virtiofsd
steps:
- uses: actions/checkout@v2
- name: Install docker
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-copy-yq-installer.sh
@@ -119,7 +123,8 @@ jobs:
name: kata-static-tarball
- name: install hub
run: |
wget -q -O- https://github.com/mislav/hub/releases/download/v2.14.2/hub-linux-amd64-2.14.2.tgz | \
HUB_VER=$(curl -s "https://api.github.com/repos/github/hub/releases/latest" | jq -r .tag_name | sed 's/^v//')
wget -q -O- https://github.com/github/hub/releases/download/v$HUB_VER/hub-linux-amd64-$HUB_VER.tgz | \
tar xz --strip-components=2 --wildcards '*/bin/hub' && sudo mv hub /usr/local/bin/hub
- name: push static tarball to github
run: |

View File

@@ -4,9 +4,6 @@ on:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
env:
SNAPCRAFT_STORE_CREDENTIALS: ${{ secrets.snapcraft_token }}
jobs:
release-snap:
runs-on: ubuntu-20.04
@@ -17,16 +14,9 @@ jobs:
fetch-depth: 0
- name: Install Snapcraft
run: |
# Required to avoid snapcraft install failure
sudo chown root:root /
# "--classic" is needed for the GitHub action runner
# environment.
sudo snap install snapcraft --classic
# Allow other parts to access snap binaries
echo /snap/bin >> "$GITHUB_PATH"
uses: samuelmeuli/action-snapcraft@v1
with:
snapcraft_token: ${{ secrets.snapcraft_token }}
- name: Build snap
run: |

View File

@@ -6,7 +6,6 @@ on:
- synchronize
- reopened
- edited
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
jobs:
test:
@@ -20,16 +19,7 @@ jobs:
- name: Install Snapcraft
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
# Required to avoid snapcraft install failure
sudo chown root:root /
# "--classic" is needed for the GitHub action runner
# environment.
sudo snap install snapcraft --classic
# Allow other parts to access snap binaries
echo /snap/bin >> "$GITHUB_PATH"
uses: samuelmeuli/action-snapcraft@v1
- name: Build snap
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

View File

@@ -1,33 +0,0 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
name: Static checks dragonball
jobs:
test-dragonball:
runs-on: self-hosted
env:
RUST_BACKTRACE: "1"
steps:
- uses: actions/checkout@v3
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
- name: Install Rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
- name: Run Unit Test
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd src/dragonball
cargo version
rustc --version
sudo -E env PATH=$PATH LIBC=gnu SUPPORT_VIRTUALIZATION=true make test

View File

@@ -8,16 +8,8 @@ on:
name: Static checks
jobs:
static-checks:
check-vendored-code:
runs-on: ubuntu-20.04
strategy:
matrix:
cmd:
- "make vendor"
- "make static-checks"
- "make check"
- "make test"
- "sudo -E PATH=\"$PATH\" make test"
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
@@ -26,33 +18,13 @@ jobs:
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.3
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Check kernel config version
run: |
cd "${{ github.workspace }}/src/github.com/${{ github.repository }}"
kernel_dir="tools/packaging/kernel/"
kernel_version_file="${kernel_dir}kata_config_version"
modified_files=$(git diff --name-only origin/main..HEAD)
result=$(git whatchanged origin/main..HEAD "${kernel_dir}" >>"/dev/null")
if git whatchanged origin/main..HEAD "${kernel_dir}" >>"/dev/null"; then
echo "Kernel directory has changed, checking if $kernel_version_file has been updated"
if echo "$modified_files" | grep -v "README.md" | grep "${kernel_dir}" >>"/dev/null"; then
echo "$modified_files" | grep "$kernel_version_file" >>/dev/null || ( echo "Please bump version in $kernel_version_file" && exit 1)
else
echo "Readme file changed, no need for kernel config version update."
fi
echo "Check passed"
fi
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
@@ -65,6 +37,64 @@ jobs:
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
# Check whether the vendored code is up-to-date & working as the first thing
- name: Check vendored code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make vendor
static-checks:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
@@ -84,7 +114,6 @@ jobs:
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
@@ -92,7 +121,206 @@ jobs:
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run check
- name: Static Checks
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ${{ matrix.cmd }}
cd ${GOPATH}/src/github.com/${{ github.repository }} && make static-checks
compiler-checks:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run Compiler Checks
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make check
unit-tests:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run Unit Tests
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make test
unit-tests-as-root:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run Unit Tests As Root User
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && sudo -E PATH="$PATH" make test

3
.gitignore vendored
View File

@@ -4,8 +4,6 @@
**/*.rej
**/target
**/.vscode
**/.idea
**/.fleet
pkg/logging/Cargo.lock
src/agent/src/version.rs
src/agent/kata-agent.service
@@ -13,3 +11,4 @@ src/agent/protocols/src/*.rs
!src/agent/protocols/src/lib.rs
build
src/tools/log-parser/kata-log-parser

View File

@@ -8,7 +8,6 @@ COMPONENTS =
COMPONENTS += libs
COMPONENTS += agent
COMPONENTS += dragonball
COMPONENTS += runtime
COMPONENTS += runtime-rs
@@ -16,12 +15,11 @@ COMPONENTS += runtime-rs
TOOLS =
TOOLS += agent-ctl
TOOLS += kata-ctl
TOOLS += log-parser
TOOLS += runk
TOOLS += trace-forwarder
TOOLS += runk
TOOLS += log-parser
STANDARD_TARGETS = build check clean install static-checks-build test vendor
STANDARD_TARGETS = build check clean install test vendor
default: all
@@ -37,7 +35,7 @@ generate-protocols:
make -C src/agent generate-protocols
# Some static checks rely on generated source files of components.
static-checks: static-checks-build
static-checks: build
bash ci/static-checks.sh
docs-url-alive-check:
@@ -45,8 +43,10 @@ docs-url-alive-check:
.PHONY: \
all \
kata-tarball \
install-tarball \
binary-tarball \
default \
install-binary-tarball \
static-checks \
docs-url-alive-check

View File

@@ -119,8 +119,10 @@ The table below lists the core parts of the project:
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [runtime-rs](src/runtime-rs) | core | The Rust version runtime. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [`dragonball`](src/dragonball) | core | An optional built-in VMM brings out-of-the-box Kata Containers experience with optimizations on container workloads |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [tests](https://github.com/kata-containers/tests) | tests | Excludes unit tests which live with the main code. |
### Additional components
@@ -133,7 +135,6 @@ The table below lists the remaining parts of the project:
| [kernel](https://www.kernel.org) | kernel | Linux kernel used by the hypervisor to boot the guest image. Patches are stored [here](tools/packaging/kernel). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`kata-ctl`](src/tools/kata-ctl) | utility | Tool that provides advanced commands and debug facilities. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |

View File

@@ -1 +1 @@
3.1.3
3.0.2

View File

@@ -43,16 +43,6 @@ function install_yq() {
"aarch64")
goarch=arm64
;;
"arm64")
# If we're on an apple silicon machine, just assign amd64.
# The version of yq we use doesn't have a darwin arm build,
# but Rosetta can come to the rescue here.
if [ $goos == "Darwin" ]; then
goarch=amd64
else
goarch=arm64
fi
;;
"ppc64le")
goarch=ppc64le
;;
@@ -74,7 +64,7 @@ function install_yq() {
fi
## NOTE: ${var,,} => gives lowercase value of var
local yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos}_${goarch}"
local yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos,,}_${goarch}"
curl -o "${yq_path}" -LSsf "${yq_url}"
[ $? -ne 0 ] && die "Download ${yq_url} failed"
chmod +x "${yq_path}"

View File

@@ -33,41 +33,51 @@ You need to install the following to build Kata Containers components:
- `make`.
- `gcc` (required for building the shim and runtime).
# Build and install Kata Containers
## Build and install the Kata Containers runtime
# Build and install the Kata Containers runtime
```bash
$ git clone https://github.com/kata-containers/kata-containers.git
$ pushd kata-containers/src/runtime
$ make && sudo -E "PATH=$PATH" make install
$ sudo mkdir -p /etc/kata-containers/
$ sudo install -o root -g root -m 0640 /usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers
$ popd
```
$ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/runtime
$ make && sudo -E PATH=$PATH make install
```
The build will create the following:
- runtime binary: `/usr/local/bin/kata-runtime` and `/usr/local/bin/containerd-shim-kata-v2`
- configuration file: `/usr/share/defaults/kata-containers/configuration.toml` and `/etc/kata-containers/configuration.toml`
- configuration file: `/usr/share/defaults/kata-containers/configuration.toml`
# Check hardware requirements
You can check if your system is capable of creating a Kata Container by running the following:
```
$ sudo kata-runtime check
```
If your system is *not* able to run Kata Containers, the previous command will error out and explain why.
## Configure to use initrd or rootfs image
Kata containers can run with either an initrd image or a rootfs image.
If you want to test with `initrd`, make sure you have uncommented `initrd = /usr/share/kata-containers/kata-containers-initrd.img`
in your configuration file, commenting out the `image` line in
`/etc/kata-containers/configuration.toml`. For example:
If you want to test with `initrd`, make sure you have `initrd = /usr/share/kata-containers/kata-containers-initrd.img`
in your configuration file, commenting out the `image` line:
```bash
`/usr/share/defaults/kata-containers/configuration.toml` and comment out the `image` line with the following. For example:
```
$ sudo mkdir -p /etc/kata-containers/
$ sudo install -o root -g root -m 0640 /usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers
$ sudo sed -i 's/^\(image =.*\)/# \1/g' /etc/kata-containers/configuration.toml
$ sudo sed -i 's/^# \(initrd =.*\)/\1/g' /etc/kata-containers/configuration.toml
```
You can create the initrd image as shown in the [create an initrd image](#create-an-initrd-image---optional) section.
If you want to test with a rootfs `image`, make sure you have uncommented `image = /usr/share/kata-containers/kata-containers.img`
If you want to test with a rootfs `image`, make sure you have `image = /usr/share/kata-containers/kata-containers.img`
in your configuration file, commenting out the `initrd` line. For example:
```bash
```
$ sudo mkdir -p /etc/kata-containers/
$ sudo install -o root -g root -m 0640 /usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers
$ sudo sed -i 's/^\(initrd =.*\)/# \1/g' /etc/kata-containers/configuration.toml
```
The rootfs image is created as shown in the [create a rootfs image](#create-a-rootfs-image) section.
@@ -80,38 +90,19 @@ rootfs `image`(100MB+).
Enable seccomp as follows:
```bash
```
$ sudo sed -i '/^disable_guest_seccomp/ s/true/false/' /etc/kata-containers/configuration.toml
```
This will pass container seccomp profiles to the kata agent.
## Enable SELinux on the guest
> **Note:**
>
> - To enable SELinux on the guest, SELinux MUST be also enabled on the host.
> - You MUST create and build a rootfs image for SELinux in advance.
> See [Create a rootfs image](#create-a-rootfs-image) and [Build a rootfs image](#build-a-rootfs-image).
> - SELinux on the guest is supported in only a rootfs image currently, so
> you cannot enable SELinux with the agent init (`AGENT_INIT=yes`) yet.
Enable guest SELinux in Enforcing mode as follows:
```
$ sudo sed -i '/^disable_guest_selinux/ s/true/false/g' /etc/kata-containers/configuration.toml
```
The runtime automatically will set `selinux=1` to the kernel parameters and `xattr` option to
`virtiofsd` when `disable_guest_selinux` is set to `false`.
If you want to enable SELinux in Permissive mode, add `enforcing=0` to the kernel parameters.
## Enable full debug
Enable full debug as follows:
```bash
```
$ sudo mkdir -p /etc/kata-containers/
$ sudo install -o root -g root -m 0640 /usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers
$ sudo sed -i -e 's/^# *\(enable_debug\).*=.*$/\1 = true/g' /etc/kata-containers/configuration.toml
$ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.log=debug initcall_debug"/g' /etc/kata-containers/configuration.toml
```
@@ -184,7 +175,7 @@ and offers possible workarounds and fixes.
it stores. When messages are suppressed, it is noted in the logs. This can be checked
for by looking for those notifications, such as:
```bash
```sh
$ sudo journalctl --since today | fgrep Suppressed
Jun 29 14:51:17 mymachine systemd-journald[346]: Suppressed 4150 messages from /system.slice/docker.service
```
@@ -209,7 +200,7 @@ RateLimitBurst=0
Restart `systemd-journald` for the changes to take effect:
```bash
```sh
$ sudo systemctl restart systemd-journald
```
@@ -223,52 +214,39 @@ $ sudo systemctl restart systemd-journald
The agent is built with a statically linked `musl.` The default `libc` used is `musl`, but on `ppc64le` and `s390x`, `gnu` should be used. To configure this:
```bash
$ export ARCH="$(uname -m)"
```
$ export ARCH=$(uname -m)
$ if [ "$ARCH" = "ppc64le" -o "$ARCH" = "s390x" ]; then export LIBC=gnu; else export LIBC=musl; fi
$ [ "${ARCH}" == "ppc64le" ] && export ARCH=powerpc64le
$ rustup target add "${ARCH}-unknown-linux-${LIBC}"
$ [ ${ARCH} == "ppc64le" ] && export ARCH=powerpc64le
$ rustup target add ${ARCH}-unknown-linux-${LIBC}
```
To build the agent:
```
$ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/agent && make
```
The agent is built with seccomp capability by default.
If you want to build the agent without the seccomp capability, you need to run `make` with `SECCOMP=no` as follows.
```bash
$ make -C kata-containers/src/agent SECCOMP=no
```
For building the agent with seccomp support using `musl`, set the environment
variables for the [`libseccomp` crate](https://github.com/libseccomp-rs/libseccomp-rs).
```bash
$ export LIBSECCOMP_LINK_TYPE=static
$ export LIBSECCOMP_LIB_PATH="the path of the directory containing libseccomp.a"
$ make -C kata-containers/src/agent
$ make -C $GOPATH/src/github.com/kata-containers/kata-containers/src/agent SECCOMP=no
```
If the compilation fails when the agent tries to link the `libseccomp` library statically
against `musl`, you will need to build `libseccomp` manually with `-U_FORTIFY_SOURCE`.
You can use [our script](https://github.com/kata-containers/kata-containers/blob/main/ci/install_libseccomp.sh)
to install `libseccomp` for the agent.
```bash
$ mkdir -p ${seccomp_install_path} ${gperf_install_path}
$ kata-containers/ci/install_libseccomp.sh ${seccomp_install_path} ${gperf_install_path}
$ export LIBSECCOMP_LIB_PATH="${seccomp_install_path}/lib"
```
On `ppc64le` and `s390x`, `glibc` is used. You will need to install the `libseccomp` library
provided by your distribution.
> e.g. `libseccomp-dev` for Ubuntu, or `libseccomp-devel` for CentOS
> **Note:**
>
> - If you enable seccomp in the main configuration file but build the agent without seccomp capability,
> the runtime exits conservatively with an error message.
## Get the osbuilder
```
$ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder
```
## Create a rootfs image
### Create a local rootfs
@@ -276,32 +254,24 @@ As a prerequisite, you need to install Docker. Otherwise, you will not be
able to run the `rootfs.sh` script with `USE_DOCKER=true` as expected in
the following example.
```bash
$ export distro="ubuntu" # example
$ export ROOTFS_DIR="$(realpath kata-containers/tools/osbuilder/rootfs-builder/rootfs)"
$ sudo rm -rf "${ROOTFS_DIR}"
$ pushd kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E USE_DOCKER=true ./rootfs.sh "${distro}"'
$ popd
```
$ export ROOTFS_DIR=${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true ./rootfs.sh ${distro}'
```
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```bash
$ ./kata-containers/tools/osbuilder/rootfs-builder/rootfs.sh -l
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```bash
$ script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh "${distro}"'
```
If you want to enable SELinux on the guest, you MUST choose `centos` and run the `rootfs.sh` script with `SELINUX=yes` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SELINUX=yes ./rootfs.sh centos'
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
> **Note:**
@@ -317,32 +287,18 @@ $ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SELINUX=yes ./rootfs.sh ce
>
> - You should only do this step if you are testing with the latest version of the agent.
```bash
$ sudo install -o root -g root -m 0550 -t "${ROOTFS_DIR}/usr/bin" "${ROOTFS_DIR}/../../../../src/agent/target/x86_64-unknown-linux-musl/release/kata-agent"
$ sudo install -o root -g root -m 0440 "${ROOTFS_DIR}/../../../../src/agent/kata-agent.service" "${ROOTFS_DIR}/usr/lib/systemd/system/"
$ sudo install -o root -g root -m 0440 "${ROOTFS_DIR}/../../../../src/agent/kata-containers.target" "${ROOTFS_DIR}/usr/lib/systemd/system/"
```
$ sudo install -o root -g root -m 0550 -t ${ROOTFS_DIR}/usr/bin ../../../src/agent/target/x86_64-unknown-linux-musl/release/kata-agent
$ sudo install -o root -g root -m 0440 ../../../src/agent/kata-agent.service ${ROOTFS_DIR}/usr/lib/systemd/system/
$ sudo install -o root -g root -m 0440 ../../../src/agent/kata-containers.target ${ROOTFS_DIR}/usr/lib/systemd/system/
```
### Build a rootfs image
```bash
$ pushd kata-containers/tools/osbuilder/image-builder
$ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh "${ROOTFS_DIR}"'
$ popd
```
If you want to enable SELinux on the guest, you MUST run the `image_builder.sh` script with `SELINUX=yes`
to label the guest image as follows.
To label the image on the host, you need to make sure that SELinux is enabled (`selinuxfs` is mounted) on the host
and the rootfs MUST be created by running the `rootfs.sh` with `SELINUX=yes`.
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/image-builder
$ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
```
$ script -fec 'sudo -E USE_DOCKER=true SELINUX=yes ./image_builder.sh ${ROOTFS_DIR}'
```
Currently, the `image_builder.sh` uses `chcon` as an interim solution in order to apply `container_runtime_exec_t`
to the `kata-agent`. Hence, if you run `restorecon` to the guest image after running the `image_builder.sh`,
the `kata-agent` needs to be labeled `container_runtime_exec_t` again by yourself.
> **Notes:**
>
@@ -353,31 +309,25 @@ the `kata-agent` needs to be labeled `container_runtime_exec_t` again by yoursel
> variable in the previous command and ensure the `qemu-img` command is
> available on your system.
> - If `qemu-img` is not installed, you will likely see errors such as `ERROR: File /dev/loop19p1 is not a block device` and `losetup: /tmp/tmp.bHz11oY851: Warning: file is smaller than 512 bytes; the loop device may be useless or invisible for system tools`. These can be mitigated by installing the `qemu-img` command (available in the `qemu-img` package on Fedora or the `qemu-utils` package on Debian).
> - If `loop` module is not probed, you will likely see errors such as `losetup: cannot find an unused loop device`. Execute `modprobe loop` could resolve it.
### Install the rootfs image
```bash
$ pushd kata-containers/tools/osbuilder/image-builder
$ commit="$(git log --format=%h -1 HEAD)"
$ date="$(date +%Y-%m-%d-%T.%N%z)"
```
$ commit=$(git log --format=%h -1 HEAD)
$ date=$(date +%Y-%m-%d-%T.%N%z)
$ image="kata-containers-${date}-${commit}"
$ sudo install -o root -g root -m 0640 -D kata-containers.img "/usr/share/kata-containers/${image}"
$ (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers.img)
$ popd
```
## Create an initrd image - OPTIONAL
### Create a local rootfs for initrd image
```bash
$ export distro="ubuntu" # example
$ export ROOTFS_DIR="$(realpath kata-containers/tools/osbuilder/rootfs-builder/rootfs)"
$ sudo rm -rf "${ROOTFS_DIR}"
$ pushd kata-containers/tools/osbuilder/rootfs-builder/
$ script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh "${distro}"'
$ popd
```
$ export ROOTFS_DIR="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs"
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`.
@@ -385,14 +335,14 @@ always set `AGENT_INIT` to `yes`.
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```bash
$ ./kata-containers/tools/osbuilder/rootfs-builder/rootfs.sh -l
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```bash
$ script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh "${distro}"'
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
> **Note:**
@@ -401,31 +351,28 @@ $ script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh "${
Optionally, add your custom agent binary to the rootfs with the following commands. The default `$LIBC` used
is `musl`, but on ppc64le and s390x, `gnu` should be used. Also, Rust refers to ppc64le as `powerpc64le`:
```bash
$ export ARCH="$(uname -m)"
$ [ "${ARCH}" == "ppc64le" ] || [ "${ARCH}" == "s390x" ] && export LIBC=gnu || export LIBC=musl
$ [ "${ARCH}" == "ppc64le" ] && export ARCH=powerpc64le
$ sudo install -o root -g root -m 0550 -T "${ROOTFS_DIR}/../../../../src/agent/target/${ARCH}-unknown-linux-${LIBC}/release/kata-agent" "${ROOTFS_DIR}/sbin/init"
```
$ export ARCH=$(uname -m)
$ [ ${ARCH} == "ppc64le" ] || [ ${ARCH} == "s390x" ] && export LIBC=gnu || export LIBC=musl
$ [ ${ARCH} == "ppc64le" ] && export ARCH=powerpc64le
$ sudo install -o root -g root -m 0550 -T ../../../src/agent/target/${ARCH}-unknown-linux-${LIBC}/release/kata-agent ${ROOTFS_DIR}/sbin/init
```
### Build an initrd image
```bash
$ pushd kata-containers/tools/osbuilder/initrd-builder
$ script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true ./initrd_builder.sh "${ROOTFS_DIR}"'
$ popd
```
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/initrd-builder
$ script -fec 'sudo -E AGENT_INIT=yes USE_DOCKER=true ./initrd_builder.sh ${ROOTFS_DIR}'
```
### Install the initrd image
```bash
$ pushd kata-containers/tools/osbuilder/initrd-builder
$ commit="$(git log --format=%h -1 HEAD)"
$ date="$(date +%Y-%m-%d-%T.%N%z)"
```
$ commit=$(git log --format=%h -1 HEAD)
$ date=$(date +%Y-%m-%d-%T.%N%z)
$ image="kata-containers-initrd-${date}-${commit}"
$ sudo install -o root -g root -m 0640 -D kata-containers-initrd.img "/usr/share/kata-containers/${image}"
$ (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers-initrd.img)
$ popd
```
# Install guest kernel images
@@ -444,44 +391,44 @@ Kata Containers makes use of upstream QEMU branch. The exact version
and repository utilized can be found by looking at the [versions file](../versions.yaml).
Find the correct version of QEMU from the versions file:
```bash
$ source kata-containers/tools/packaging/scripts/lib.sh
$ qemu_version="$(get_from_kata_deps "assets.hypervisor.qemu.version")"
$ echo "${qemu_version}"
```
$ source ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/scripts/lib.sh
$ qemu_version=$(get_from_kata_deps "assets.hypervisor.qemu.version")
$ echo ${qemu_version}
```
Get source from the matching branch of QEMU:
```bash
$ git clone -b "${qemu_version}" https://github.com/qemu/qemu.git
$ your_qemu_directory="$(realpath qemu)"
```
$ go get -d github.com/qemu/qemu
$ cd ${GOPATH}/src/github.com/qemu/qemu
$ git checkout ${qemu_version}
$ your_qemu_directory=${GOPATH}/src/github.com/qemu/qemu
```
There are scripts to manage the build and packaging of QEMU. For the examples below, set your
environment as:
```bash
$ packaging_dir="$(realpath kata-containers/tools/packaging)"
```
$ go get -d github.com/kata-containers/kata-containers
$ packaging_dir="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging"
```
Kata often utilizes patches for not-yet-upstream and/or backported fixes for components,
including QEMU. These can be found in the [packaging/QEMU directory](../tools/packaging/qemu/patches),
and it's *recommended* that you apply them. For example, suppose that you are going to build QEMU
version 5.2.0, do:
```bash
$ "$packaging_dir/scripts/apply_patches.sh" "$packaging_dir/qemu/patches/5.2.x/"
```
$ cd $your_qemu_directory
$ $packaging_dir/scripts/apply_patches.sh $packaging_dir/qemu/patches/5.2.x/
```
To build utilizing the same options as Kata, you should make use of the `configure-hypervisor.sh` script. For example:
```bash
$ pushd "$your_qemu_directory"
$ "$packaging_dir/scripts/configure-hypervisor.sh" kata-qemu > kata.cfg
```
$ cd $your_qemu_directory
$ $packaging_dir/scripts/configure-hypervisor.sh kata-qemu > kata.cfg
$ eval ./configure "$(cat kata.cfg)"
$ make -j $(nproc --ignore=1)
# Optional
$ sudo -E make install
$ popd
```
If you do not want to install the respective QEMU version, the configuration file can be modified to point to the correct binary. In `/etc/kata-containers/configuration.toml`, change `path = "/path/to/qemu/build/qemu-system-x86_64"` to point to the correct QEMU binary.
See the [static-build script for QEMU](../tools/packaging/static-build/qemu/build-static-qemu.sh) for a reference on how to get, setup, configure and build QEMU for Kata.
### Build a custom QEMU for aarch64/arm64 - REQUIRED
@@ -492,33 +439,11 @@ See the [static-build script for QEMU](../tools/packaging/static-build/qemu/buil
> under upstream review for supporting NVDIMM on aarch64.
>
You could build the custom `qemu-system-aarch64` as required with the following command:
```bash
$ git clone https://github.com/kata-containers/tests.git
$ script -fec 'sudo -E tests/.ci/install_qemu.sh'
```
## Build `virtiofsd`
When using the file system type virtio-fs (default), `virtiofsd` is required
```bash
$ pushd kata-containers/tools/packaging/static-build/virtiofsd
$ ./build.sh
$ popd
$ go get -d github.com/kata-containers/tests
$ script -fec 'sudo -E ${GOPATH}/src/github.com/kata-containers/tests/.ci/install_qemu.sh'
```
Modify `/etc/kata-containers/configuration.toml` and update value `virtio_fs_daemon = "/path/to/kata-containers/tools/packaging/static-build/virtiofsd/virtiofsd/virtiofsd"` to point to the binary.
# Check hardware requirements
You can check if your system is capable of creating a Kata Container by running the following:
```bash
$ sudo kata-runtime check
```
If your system is *not* able to run Kata Containers, the previous command will error out and explain why.
# Run Kata Containers with Containerd
Refer to the [How to use Kata Containers and Containerd](how-to/containerd-kata.md) how-to guide.
@@ -549,7 +474,7 @@ See [Set up a debug console](#set-up-a-debug-console).
## Checking Docker default runtime
```bash
```
$ sudo docker info 2>/dev/null | grep -i "default runtime" | cut -d: -f2- | grep -q runc && echo "SUCCESS" || echo "ERROR: Incorrect default Docker runtime"
```
## Set up a debug console
@@ -566,7 +491,7 @@ contain either `/bin/sh` or `/bin/bash`.
Enable debug_console_enabled in the `configuration.toml` configuration file:
```toml
```
[agent.kata]
debug_console_enabled = true
```
@@ -577,7 +502,7 @@ This will pass `agent.debug_console agent.debug_console_vport=1026` to agent as
For Kata Containers `2.0.x` releases, the `kata-runtime exec` command depends on the`kata-monitor` running, in order to get the sandbox's `vsock` address to connect to. Thus, first start the `kata-monitor` process.
```bash
```
$ sudo kata-monitor
```
@@ -639,10 +564,10 @@ an additional `coreutils` package.
For example using CentOS:
```bash
$ pushd kata-containers/tools/osbuilder/rootfs-builder
$ export ROOTFS_DIR="$(realpath ./rootfs)"
$ script -fec 'sudo -E USE_DOCKER=true EXTRA_PKGS="bash coreutils" ./rootfs.sh centos'
```
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ export ROOTFS_DIR=${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true EXTRA_PKGS="bash coreutils" ./rootfs.sh centos'
```
#### Build the debug image
@@ -657,10 +582,9 @@ Install the image:
>**Note**: When using an initrd image, replace the below rootfs image name `kata-containers.img`
>with the initrd image name `kata-containers-initrd.img`.
```bash
```
$ name="kata-containers-centos-with-debug-console.img"
$ sudo install -o root -g root -m 0640 kata-containers.img "/usr/share/kata-containers/${name}"
$ popd
```
Next, modify the `image=` values in the `[hypervisor.qemu]` section of the
@@ -669,7 +593,7 @@ to specify the full path to the image name specified in the previous code
section. Alternatively, recreate the symbolic link so it points to
the new debug image:
```bash
```
$ (cd /usr/share/kata-containers && sudo ln -sf "$name" kata-containers.img)
```
@@ -680,7 +604,7 @@ to avoid all subsequently created containers from using the debug image.
Create a container as normal. For example using `crictl`:
```bash
```
$ sudo crictl run -r kata container.yaml pod.yaml
```
@@ -693,7 +617,7 @@ those for firecracker / cloud-hypervisor.
Add `agent.debug_console` to the guest kernel command line to allow the agent process to start a debug console.
```bash
```
$ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.debug_console"/g' "${kata_configuration_file}"
```
@@ -714,7 +638,7 @@ between the host and the guest. The kernel command line option `agent.debug_cons
Add the parameter `agent.debug_console_vport=1026` to the kernel command line
as shown below:
```bash
```
sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.debug_console_vport=1026"/g' "${kata_configuration_file}"
```
@@ -727,7 +651,7 @@ Next, connect to the debug console. The VSOCKS paths vary slightly between each
VMM solution.
In case of cloud-hypervisor, connect to the `vsock` as shown:
```bash
```
$ sudo su -c 'cd /var/run/vc/vm/${sandbox_id}/root/ && socat stdin unix-connect:clh.sock'
CONNECT 1026
```
@@ -735,7 +659,7 @@ CONNECT 1026
**Note**: You need to type `CONNECT 1026` and press `RETURN` key after entering the `socat` command.
For firecracker, connect to the `hvsock` as shown:
```bash
```
$ sudo su -c 'cd /var/run/vc/firecracker/${sandbox_id}/root/ && socat stdin unix-connect:kata.hvsock'
CONNECT 1026
```
@@ -744,7 +668,7 @@ CONNECT 1026
For QEMU, connect to the `vsock` as shown:
```bash
```
$ sudo su -c 'cd /var/run/vc/vm/${sandbox_id} && socat "stdin,raw,echo=0,escape=0x11" "unix-connect:console.sock"'
```
@@ -757,7 +681,7 @@ If the image is created using
[osbuilder](../tools/osbuilder), the following YAML
file exists and contains details of the image and how it was created:
```bash
```
$ cat /var/lib/osbuilder/osbuilder.yaml
```

View File

@@ -7,9 +7,7 @@ Kata Containers design documents:
- [Design requirements for Kata Containers](kata-design-requirements.md)
- [VSocks](VSocks.md)
- [VCPU handling](vcpu-handling.md)
- [VCPU threads pinning](vcpu-threads-pinning.md)
- [Host cgroups](host-cgroups.md)
- [Agent systemd cgroup](agent-systemd-cgroup.md)
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)

View File

@@ -1,84 +0,0 @@
# Systemd Cgroup for Agent
As we know, we can interact with cgroups in two ways, **`cgroupfs`** and **`systemd`**. The former is achieved by reading and writing cgroup `tmpfs` files under `/sys/fs/cgroup` while the latter is done by configuring a transient unit by requesting systemd. Kata agent uses **`cgroupfs`** by default, unless you pass the parameter `--systemd-cgroup`.
## usage
For systemd, kata agent configures cgroups according to the following `linux.cgroupsPath` format standard provided by `runc` (`[slice]:[prefix]:[name]`). If you don't provide a valid `linux.cgroupsPath`, kata agent will treat it as `"system.slice:kata_agent:<container-id>"`.
> Here slice is a systemd slice under which the container is placed. If empty, it defaults to system.slice, except when cgroup v2 is used and rootless container is created, in which case it defaults to user.slice.
>
> Note that slice can contain dashes to denote a sub-slice (e.g. user-1000.slice is a correct notation, meaning a `subslice` of user.slice), but it must not contain slashes (e.g. user.slice/user-1000.slice is invalid).
>
> A slice of `-` represents a root slice.
>
> Next, prefix and name are used to compose the unit name, which is `<prefix>-<name>.scope`, unless name has `.slice` suffix, in which case prefix is ignored and the name is used as is.
## supported properties
The kata agent will translate the parameters in the `linux.resources` of `config.json` into systemd unit properties, and send it to systemd for configuration. Since systemd supports limited properties, only the following parameters in `linux.resources` will be applied. We will simply treat hybrid mode as legacy mode by the way.
- CPU
- v1
| runtime spec resource | systemd property name |
| --------------------- | --------------------- |
| `cpu.shares` | `CPUShares` |
- v2
| runtime spec resource | systemd property name |
| -------------------------- | -------------------------- |
| `cpu.shares` | `CPUShares` |
| `cpu.period` | `CPUQuotaPeriodUSec`(v242) |
| `cpu.period` & `cpu.quota` | `CPUQuotaPerSecUSec` |
- MEMORY
- v1
| runtime spec resource | systemd property name |
| --------------------- | --------------------- |
| `memory.limit` | `MemoryLimit` |
- v2
| runtime spec resource | systemd property name |
| ------------------------------ | --------------------- |
| `memory.low` | `MemoryLow` |
| `memory.max` | `MemoryMax` |
| `memory.swap` & `memory.limit` | `MemorySwapMax` |
- PIDS
| runtime spec resource | systemd property name |
| --------------------- | --------------------- |
| `pids.limit ` | `TasksMax` |
- CPUSET
| runtime spec resource | systemd property name |
| --------------------- | -------------------------- |
| `cpuset.cpus` | `AllowedCPUs`(v244) |
| `cpuset.mems` | `AllowedMemoryNodes`(v244) |
## Systemd Interface
`session.rs` and `system.rs` in `src/agent/rustjail/src/cgroups/systemd/interface` are automatically generated by `zbus-xmlgen`, which is is an accompanying tool provided by `zbus` to generate Rust code from `D-Bus XML interface descriptions`. The specific commands to generate these two files are as follows:
```shell
// system.rs
zbus-xmlgen --system org.freedesktop.systemd1 /org/freedesktop/systemd1
// session.rs
zbus-xmlgen --session org.freedesktop.systemd1 /org/freedesktop/systemd1
```
The current implementation of `cgroups/systemd` uses `system.rs` while `session.rs` could be used to build rootless containers in the future.
## references
- [runc - systemd cgroup driver](https://github.com/opencontainers/runc/blob/main/docs/systemd.md)
- [systemd.resource-control — Resource control unit settings](https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 193 KiB

View File

@@ -64,8 +64,8 @@ The kata-runtime is controlled by TOKIO_RUNTIME_WORKER_THREADS to run the OS thr
├─ TTRPC listener thread(M * tokio task)
├─ TTRPC client handler thread(7 * M * tokio task)
├─ container stdin io thread(M * tokio task)
├─ container stdout io thread(M * tokio task)
└─ container stderr io thread(M * tokio task)
├─ container stdin io thread(M * tokio task)
└─ container stdin io thread(M * tokio task)
```
### Extensible Framework
The Kata 3.x runtime is designed with the extension of service, runtime, and hypervisor, combined with configuration to meet the needs of different scenarios. At present, the service provides a register mechanism to support multiple services. Services could interact with runtime through messages. In addition, the runtime handler handles messages from services. To meet the needs of a binary that supports multiple runtimes and hypervisors, the startup must obtain the runtime handler type and hypervisor type through configuration.

View File

@@ -81,7 +81,7 @@ Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime,
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mountInfo.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
information to a `mount-info.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,

View File

@@ -1,37 +0,0 @@
# Design Doc for Kata Containers' VCPUs Pinning Feature
## Background
By now, vCPU threads of Kata Containers are scheduled randomly to CPUs. And each pod would request a specific set of CPUs which we call it CPU set (just the CPU set meaning in Linux cgroups).
If the number of vCPU threads are equal to that of CPUs claimed in CPU set, we can then pin each vCPU thread to one specified CPU, to reduce the cost of random scheduling.
## Detailed Design
### Passing Config Parameters
Two ways are provided to use this vCPU thread pinning feature: through `QEMU` configuration file and through annotations. Finally the pinning parameter is passed to `HypervisorConfig`.
### Related Linux Thread Scheduling API
| API Info | Value |
|-------------------|-----------------------------------------------------------|
| Package | `golang.org/x/sys/unix` |
| Method | `unix.SchedSetaffinity(thread_id, &unixCPUSet)` |
| Official Doc Page | https://pkg.go.dev/golang.org/x/sys/unix#SchedSetaffinity |
### When is VCPUs Pinning Checked?
As shown in Section 1, when `num(vCPU threads) == num(CPUs in CPU set)`, we shall pin each vCPU thread to a specified CPU. And when this condition is broken, we should restore to the original random scheduling pattern.
So when may `num(CPUs in CPU set)` change? There are 5 possible scenes:
| Possible scenes | Related Code |
|-----------------------------------|--------------------------------------------|
| when creating a container | File Sandbox.go, in method `CreateContainer` |
| when starting a container | File Sandbox.go, in method `StartContainer` |
| when deleting a container | File Sandbox.go, in method `DeleteContainer` |
| when updating a container | File Sandbox.go, in method `UpdateContainer` |
| when creating multiple containers | File Sandbox.go, in method `createContainers` |
### Core Pinning Logics
We can split the whole process into the following steps. Related methods are `checkVCPUsPinning` and `resetVCPUsPinning`, in file Sandbox.go.
![](arch-images/vcpus-pinning-process.png)

View File

@@ -110,7 +110,7 @@ Devices and features used:
- VFIO
- hotplug
- seccomp filters
- [HTTP OpenAPI](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/vmm/src/api/openapi/cloud-hypervisor.yaml)
- [HTTP OpenAPI](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vmm/src/api/openapi/cloud-hypervisor.yaml)
### Summary

View File

@@ -42,6 +42,4 @@
- [How to setup swap devices in guest kernel](how-to-setup-swap-devices-in-guest-kernel.md)
- [How to run rootless vmm](how-to-run-rootless-vmm.md)
- [How to run Docker with Kata Containers](how-to-run-docker-with-kata.md)
- [How to run Kata Containers with `nydus`](how-to-use-virtio-fs-nydus-with-kata.md)
- [How to run Kata Containers with AMD SEV-SNP](how-to-run-kata-containers-with-SNP-VMs.md)
- [How to use EROFS to build rootfs in Kata Containers](how-to-use-erofs-build-rootfs.md)
- [How to run Kata Containers with `nydus`](how-to-use-virtio-fs-nydus-with-kata.md)

View File

@@ -77,8 +77,8 @@ $ command -v containerd
You can manually install CNI plugins as follows:
```bash
$ git clone https://github.com/containernetworking/plugins.git
$ pushd plugins
$ go get github.com/containernetworking/plugins
$ pushd $GOPATH/src/github.com/containernetworking/plugins
$ ./build_linux.sh
$ sudo mkdir /opt/cni
$ sudo cp -r bin /opt/cni/
@@ -93,8 +93,8 @@ $ popd
You can install the `cri-tools` from source code:
```bash
$ git clone https://github.com/kubernetes-sigs/cri-tools.git
$ pushd cri-tools
$ go get github.com/kubernetes-sigs/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-sigs/cri-tools
$ make
$ sudo -E make install
$ popd
@@ -257,48 +257,6 @@ This launches a BusyBox container named `hello`, and it will be removed by `--rm
The `--cni` flag enables CNI networking for the container. Without this flag, a container with just a
loopback interface is created.
### Launch containers using `ctr` command line with rootfs bundle
#### Get rootfs
Use the script to create rootfs
```bash
ctr i pull quay.io/prometheus/busybox:latest
ctr i export rootfs.tar quay.io/prometheus/busybox:latest
rootfs_tar=rootfs.tar
bundle_dir="./bundle"
mkdir -p "${bundle_dir}"
# extract busybox rootfs
rootfs_dir="${bundle_dir}/rootfs"
mkdir -p "${rootfs_dir}"
layers_dir="$(mktemp -d)"
tar -C "${layers_dir}" -pxf "${rootfs_tar}"
for ((i=0;i<$(cat ${layers_dir}/manifest.json | jq -r ".[].Layers | length");i++)); do
tar -C ${rootfs_dir} -xf ${layers_dir}/$(cat ${layers_dir}/manifest.json | jq -r ".[].Layers[${i}]")
done
```
#### Get `config.json`
Use runc spec to generate `config.json`
```bash
cd ./bundle/rootfs
runc spec
mv config.json ../
```
Change the root `path` in `config.json` to the absolute path of rootfs
```JSON
"root":{
"path":"/root/test/bundle/rootfs",
"readonly": false
},
```
#### Run container
```bash
sudo ctr run -d --runtime io.containerd.run.kata.v2 --config bundle/config.json hello
sudo ctr t exec --exec-id ${ID} -t hello sh
```
### Launch Pods with `crictl` command line
With the `crictl` command line of `cri-tools`, you can specify runtime class with `-r` or `--runtime` flag.

View File

@@ -1,159 +0,0 @@
# Kata Containers with AMD SEV-SNP VMs
## Disclaimer
This guide is designed for developers and is - same as the Developer Guide - not intended for production systems or end users. It is advisable to only follow this guide on non-critical development systems.
## Prerequisites
To run Kata Containers in SNP-VMs, the following software stack is used.
![Kubernetes integration with shimv2](./images/SNP-stack.svg)
The host BIOS and kernel must be capable of supporting AMD SEV-SNP and configured accordingly. For Kata Containers, the host kernel with branch [`sev-snp-iommu-avic_5.19-rc6_v3`](https://github.com/AMDESE/linux/tree/sev-snp-iommu-avic_5.19-rc6_v3) and commit [`3a88547`](https://github.com/AMDESE/linux/commit/3a885471cf89156ea555341f3b737ad2a8d9d3d0) is known to work in conjunction with SEV Firmware version 1.51.3 (0xh\_1.33.03) available on AMD's [SEV developer website](https://developer.amd.com/sev/). See [AMD's guide](https://github.com/AMDESE/AMDSEV/tree/sev-snp-devel) to configure the host accordingly. Verify that you are able to run SEV-SNP encrypted VMs first. The guest components required for Kata Containers are built as described below.
**Tip**: It is easiest to first have Kata Containers running on your system and then modify it to run containers in SNP-VMs. Follow the [Developer guide](../Developer-Guide.md#warning) and then follow the below steps. Nonetheless, you can just follow this guide from the start.
## How to build
Follow all of the below steps to install Kata Containers with SNP-support from scratch. These steps mostly follow the developer guide with modifications to support SNP
__Steps from the Developer Guide:__
- Get all the [required components](../Developer-Guide.md#requirements-to-build-individual-components) for building the kata-runtime
- [Build the and install kata-runtime](../Developer-Guide.md#build-and-install-the-kata-containers-runtime)
- [Build a custom agent](../Developer-Guide.md#build-a-custom-kata-agent---optional)
- [Create an initrd image](../Developer-Guide.md#create-an-initrd-image---optional) by first building a rootfs, then building the initrd based on the rootfs, use a custom agent and install. `ubuntu` works as the distribution of choice.
- Get the [required components](../../tools/packaging/kernel/README.md#requirements) to build a custom kernel
__SNP-specific steps:__
- Build the SNP-specific kernel as shown below (see this [guide](../../tools/packaging/kernel/README.md#build-kata-containers-kernel) for more information)
```bash
$ pushd kata-containers/tools/packaging/kernel/
$ ./build-kernel.sh -a x86_64 -x snp setup
$ ./build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./build-kernel.sh -x snp install
$ popd
```
- Build a current OVMF capable of SEV-SNP:
```bash
$ pushd kata-containers/tools/packaging/static-build/ovmf
$ ./build.sh
$ tar -xvf edk2-x86_64.tar.gz
$ popd
```
- Build a custom QEMU
```bash
$ source kata-containers/tools/packaging/scripts/lib.sh
$ qemu_url="$(get_from_kata_deps "assets.hypervisor.qemu.snp.url")"
$ qemu_branch="$(get_from_kata_deps "assets.hypervisor.qemu.snp.branch")"
$ qemu_commit="$(get_from_kata_deps "assets.hypervisor.qemu.snp.commit")"
$ git clone -b "${qemu_branch}" "${qemu_url}"
$ pushd qemu
$ git checkout "${qemu_commit}"
$ ./configure --enable-virtfs --target-list=x86_64-softmmu --enable-debug
$ make -j "$(nproc)"
$ popd
```
### Kata Containers Configuration for SNP
The configuration file located at `/etc/kata-containers/configuration.toml` must be adapted as follows to support SNP-VMs:
- Use the SNP-specific kernel for the guest VM (change path)
```toml
kernel = "/usr/share/kata-containers/vmlinuz-snp.container"
```
- Enable the use of an initrd (uncomment)
```toml
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
```
- Disable the use of a rootfs (comment out)
```toml
# image = "/usr/share/kata-containers/kata-containers.img"
```
- Use the custom QEMU capable of SNP (change path)
```toml
path = "/path/to/qemu/build/qemu-system-x86_64"
```
- Use `virtio-9p` device since `virtio-fs` is unsupported due to bugs / shortcomings in QEMU version [`snp-v3`](https://github.com/AMDESE/qemu/tree/snp-v3) for SEV and SEV-SNP (change value)
```toml
shared_fs = "virtio-9p"
```
- Disable `virtiofsd` since it is no longer required (comment out)
```toml
# virtio_fs_daemon = "/usr/libexec/virtiofsd"
```
- Disable NVDIMM (uncomment)
```toml
disable_image_nvdimm = true
```
- Disable shared memory (uncomment)
```toml
file_mem_backend = ""
```
- Enable confidential guests (uncomment)
```toml
confidential_guest = true
```
- Enable SNP-VMs (uncomment)
```toml
sev_snp_guest = true
```
- Configure an OVMF (add path)
```toml
firmware = "/path/to/kata-containers/tools/packaging/static-build/ovmf/opt/kata/share/ovmf/OVMF.fd"
```
## Test Kata Containers with Containerd
With Kata Containers configured to support SNP-VMs, we use containerd to test and deploy containers in these VMs.
### Install Containerd
If not already present, follow [this guide](./containerd-kata.md#install) to install containerd and its related components including `CNI` and the `cri-tools` (skip Kata Containers since we already installed it)
### Containerd Configuration
Follow [this guide](./containerd-kata.md#configuration) to configure containerd to use Kata Containers
## Run Kata Containers in SNP-VMs
Run the below commands to start a container. See [this guide](./containerd-kata.md#run) for more information
```bash
$ sudo ctr image pull docker.io/library/busybox:latest
$ sudo ctr run --cni --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
```
### Check for active SNP:
Inside the running container, run the following commands to check if SNP is active. It should look something like this:
```
/ # dmesg | grep -i sev
[ 0.299242] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP
[ 0.472286] SEV: Using SNP CPUID table, 31 entries present.
[ 0.514574] SEV: SNP guest platform device initialized.
[ 0.885425] sev-guest sev-guest: Initialized SEV guest driver (using vmpck_id 0)
```
### Obtain an SNP Attestation Report
To obtain an attestation report inside the container, the `/dev/sev-guest` must first be configured. As of now, the VM does not perform this step, however it can be performed inside the container, either in the terminal or in code.
Example for shell:
```
/ # SNP_MAJOR=$(cat /sys/devices/virtual/misc/sev-guest/dev | awk -F: '{print $1}')
/ # SNP_MINOR=$(cat /sys/devices/virtual/misc/sev-guest/dev | awk -F: '{print $2}')
/ # mknod -m 600 /dev/sev-guest c "${SNP_MAJOR}" "${SNP_MINOR}"
```
## Known Issues
- Support for cgroups v2 is still [work in progress](https://github.com/kata-containers/kata-containers/issues/927). If issues occur due to cgroups v2 becoming the default in newer systems, one possible solution is to downgrade cgroups to v1:
```bash
sudo sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
sudo update-grub
sudo reboot
```
- If both SEV and SEV-SNP are supported by the host, Kata Containers uses SEV-SNP by default. You can verify what features are enabled by checking `/sys/module/kvm_amd/parameters/sev` and `sev_snp`. This means that Kata Containers can not run both SEV-SNP-VMs and SEV-VMs at the same time. If SEV is to be used by Kata Containers instead, reload the `kvm_amd` kernel module without SNP-support, this will disable SNP-support for the entire platform.
```bash
sudo rmmod kvm_amd && sudo modprobe kvm_amd sev_snp=0
```

View File

@@ -57,7 +57,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_vhost_user_store` | `boolean` | enable vhost-user storage device (QEMU) |
| `io.katacontainers.config.hypervisor.vhost_user_reconnect_timeout_sec` | `string`| the timeout for reconnecting vhost user socket (QEMU)
| `io.katacontainers.config.hypervisor.enable_virtio_mem` | `boolean` | enable virtio-mem (QEMU) |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` (R) | string | file based memory backend root directory |
@@ -88,7 +87,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.use_vsock` | `boolean` | specify use of `vsock` for agent communication |
| `io.katacontainers.config.hypervisor.vhost_user_store_path` (R) | `string` | specify the directory path where vhost-user devices related folders, sockets and device nodes should be (QEMU) |
| `io.katacontainers.config.hypervisor.virtio_fs_cache_size` | uint32 | virtio-fs DAX cache size in `MiB` |
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `never` |
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |

View File

@@ -17,9 +17,9 @@ Enable setup swap device in guest kernel as follows:
$ sudo sed -i -e 's/^#enable_guest_swap.*$/enable_guest_swap = true/g' /etc/kata-containers/configuration.toml
```
## Run a Kata Containers utilizing swap device
## Run a Kata Container utilizing swap device
Use following command to start a Kata Containers with swappiness 60 and 1GB swap device (swap_in_bytes - memory_limit_in_bytes).
Use following command to start a Kata Container with swappiness 60 and 1GB swap device (swap_in_bytes - memory_limit_in_bytes).
```
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
@@ -43,12 +43,12 @@ command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp --runtime kata $pod_yaml)
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
```
Kata Containers setups swap device for this container only when `io.katacontainers.container.resource.swappiness` is set.
Kata Container setups swap device for this container only when `io.katacontainers.container.resource.swappiness` is set.
The following table shows the swap size how to decide if `io.katacontainers.container.resource.swappiness` is set.
|`io.katacontainers.container.resource.swap_in_bytes`|`memory_limit_in_bytes`|swap size|

View File

@@ -1,90 +0,0 @@
# Configure Kata Containers to use EROFS build rootfs
## Introduction
For kata containers, rootfs is used in the read-only way. EROFS can noticeably decrease metadata overhead.
`mkfs.erofs` can generate compressed and uncompressed EROFS images.
For uncompressed images, no files are compressed. However, it is optional to inline the data blocks at the end of the file with the metadata.
For compressed images, each file will be compressed using the lz4 or lz4hc algorithm, and it will be confirmed whether it can save space. Use No compression of the file if compression does not save space.
## Performance comparison
| | EROFS | EXT4 | XFS |
|-----------------|-------| --- | --- |
| Image Size [MB] | 106(uncompressed) | 256 | 126 |
## Guidance
### Install the `erofs-utils`
#### `apt/dnf` install
On newer `Ubuntu/Debian` systems, it can be installed directly using the `apt` command, and on `Fedora` it can be installed directly using the `dnf` command.
```shell
# Debian/Ubuntu
$ apt install erofs-utils
# Fedora
$ dnf install erofs-utils
```
#### Source install
[https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git](https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git)
##### Compile dependencies
If you need to enable the `Lz4` compression feature, `Lz4 1.8.0+` is required, and `Lz4 1.9.3+` is strongly recommended.
##### Compilation process
For some old lz4 versions (lz4-1.8.0~1.8.3), if lz4-static is not installed, the lz4hc algorithm will not be supported. lz4-static can be installed with apt install lz4-static.x86_64. However, these versions have some bugs in compression, and it is not recommended to use these versions directly.
If you use `lz4 1.9.0+`, you can directly use the following command to compile.
```shell
$ ./autogen.sh
$ ./configure
$ make
```
The compiled `mkfs.erofs` program will be saved in the `mkfs` directory. Afterwards, the generated tools can be installed to a system directory using make install (requires root privileges).
### Create a local rootfs
```shell
$ export distro="ubuntu"
$ export FS_TYPE="erofs"
$ export ROOTFS_DIR="realpath kata-containers/tools/osbuilder/rootfs-builder/rootfs"
$ sudo rm -rf "${ROOTFS_DIR}"
$ pushd kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E SECCOMP=no ./rootfs.sh "${distro}"'
$ popd
```
### Add a custom agent to the image - OPTIONAL
> Note:
> - You should only do this step if you are testing with the latest version of the agent.
```shell
$ sudo install -o root -g root -m 0550 -t "${ROOTFS_DIR}/usr/bin" "${ROOTFS_DIR}/../../../../src/agent/target/x86_64-unknown-linux-musl/release/kata-agent"
$ sudo install -o root -g root -m 0440 "${ROOTFS_DIR}/../../../../src/agent/kata-agent.service" "${ROOTFS_DIR}/usr/lib/systemd/system/"
$ sudo install -o root -g root -m 0440 "${ROOTFS_DIR}/../../../../src/agent/kata-containers.target" "${ROOTFS_DIR}/usr/lib/systemd/system/"
```
### Build a root image
```shell
$ pushd kata-containers/tools/osbuilder/image-builder
$ script -fec 'sudo -E ./image_builder.sh "${ROOTFS_DIR}"'
$ popd
```
### Install the rootfs image
```shell
$ pushd kata-containers/tools/osbuilder/image-builder
$ commit="$(git log --format=%h -1 HEAD)"
$ date="$(date +%Y-%m-%d-%T.%N%z)"
$ rootfs="erofs"
$ image="kata-containers-${rootfs}-${date}-${commit}"
$ sudo install -o root -g root -m 0640 -D kata-containers.img "/usr/share/kata-containers/${image}"
$ (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers.img)
$ popd
```
### Use `EROFS` in the runtime
```shell
$ sudo sed -i -e 's/^# *\(rootfs_type\).*=.*$/\1 = erofs/g' /etc/kata-containers/configuration.toml
```

View File

@@ -104,7 +104,7 @@ sudo dmsetup create "${POOL_NAME}" \
cat << EOF
#
# Add this to your config.toml configuration file and restart containerd daemon
# Add this to your config.toml configuration file and restart `containerd` daemon
#
[plugins]
[plugins.devmapper]
@@ -212,7 +212,7 @@ Next, we need to configure containerd. Add a file in your path (e.g. `/usr/local
```
#!/bin/bash
KATA_CONF_FILE=/etc/kata-containers/configuration-fc.toml /usr/local/bin/containerd-shim-kata-v2 $@
KATA_CONF_FILE=/etc/containers/configuration-fc.toml /usr/local/bin/containerd-shim-kata-v2 $@
```
> **Note:** You may need to edit the paths of the configuration file and the `containerd-shim-kata-v2` to correspond to your setup.

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 9.0 KiB

View File

@@ -24,7 +24,7 @@ architectures:
| Installation method | Description | Automatic updates | Use case | Availability
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|----------- |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. | Yes |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. | No |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. | No |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. | No |
@@ -32,8 +32,7 @@ architectures:
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
### Kata Deploy Installation
Follow the [`kata-deploy`](../../tools/packaging/kata-deploy/README.md).
`ToDo`
### Official packages
`ToDo`
### Snap Installation
@@ -49,14 +48,14 @@ Follow the [`kata-deploy`](../../tools/packaging/kata-deploy/README.md).
* Download `Rustup` and install `Rust`
> **Notes:**
> Rust version 1.62.0 is needed
> Rust version 1.58 is needed
Example for `x86_64`
```
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
$ rustup install 1.62.0
$ rustup default 1.62.0-x86_64-unknown-linux-gnu
$ rustup install 1.58
$ rustup default 1.58-x86_64-unknown-linux-gnu
```
* Musl support for fully static binary
@@ -84,7 +83,7 @@ $ git clone https://github.com/kata-containers/kata-containers.git
$ cd kata-containers/src/runtime-rs
$ make && sudo make install
```
After running the command above, the default config file `configuration.toml` will be installed under `/usr/share/defaults/kata-containers/`, the binary file `containerd-shim-kata-v2` will be installed under `/usr/local/bin/` .
After running the command above, the default config file `configuration.toml` will be installed under `/usr/share/defaults/kata-containers/`, the binary file `containerd-shim-kata-v2` will be installed under `/user/local/bin` .
### Build Kata Containers Kernel
Follow the [Kernel installation guide](/tools/packaging/kernel/README.md).

View File

@@ -71,6 +71,12 @@ To use containerd, modify the `--container-runtime` argument:
> **Notes:**
> - Adjust the `--memory 6144` line to suit your environment and requirements. Kata Containers default to
> requesting 2048MB per container. We recommended you supply more than that to the Minikube node.
> - Prior to Minikube/Kubernetes v1.14, the beta `RuntimeClass` feature also needed enabling with
> the following.
>
> | what | why |
> | ---- | --- |
> | `--feature-gates=RuntimeClass=true` | Kata needs to use the `RuntimeClass` Kubernetes feature |
The full command is therefore:
@@ -132,9 +138,17 @@ $ kubectl -n kube-system exec ${podname} -- ps -ef | fgrep infinity
## Enabling Kata Containers
> **Note:** Only Minikube/Kubernetes versions <= 1.13 require this step. Since version
> v1.14, the `RuntimeClass` is enabled by default. Performing this step on Kubernetes > v1.14 is
> however benign.
Now you have installed the Kata Containers components in the Minikube node. Next, you need to configure
Kubernetes `RuntimeClass` to know when to use Kata Containers to run a pod.
```sh
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/node-api/master/manifests/runtimeclass_crd.yaml > runtimeclass_crd.yaml
```
### Register the runtime
Now register the `kata qemu` runtime with that class. This should result in no errors:

View File

@@ -545,12 +545,6 @@ Create the hook execution file for Kata:
/usr/bin/nvidia-container-toolkit -debug $@
```
Make sure the hook shell is executable:
```sh
chmod +x $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
```
As the last step one can do some cleanup of files or package caches. Build the
rootfs and configure it for use with Kata according to the development guide.

View File

@@ -49,7 +49,7 @@ the latest driver.
$ export QAT_DRIVER_VER=qat1.7.l.4.14.0-00031.tar.gz
$ export QAT_DRIVER_URL=https://downloadmirror.intel.com/30178/eng/${QAT_DRIVER_VER}
$ export QAT_CONF_LOCATION=~/QAT_conf
$ export QAT_DOCKERFILE=https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/main/demo/openssl-qat-engine/Dockerfile
$ export QAT_DOCKERFILE=https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/master/demo/openssl-qat-engine/Dockerfile
$ export QAT_SRC=~/src/QAT
$ export GOPATH=~/src/go
$ export KATA_KERNEL_LOCATION=~/kata

View File

@@ -61,9 +61,6 @@ spec:
name: eosgx-demo-job-1
image: oeciteam/oe-helloworld:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /dev
name: dev-mount
securityContext:
readOnlyRootFilesystem: true
capabilities:

View File

@@ -197,6 +197,11 @@ vhost_user_store_path = "<Path of the base directory for vhost-user device>"
> under `[hypervisor.qemu]` section.
For the subdirectories of `vhost_user_store_path`: `block` is used for block
device; `block/sockets` is where we expect UNIX domain sockets for vhost-user
block devices to live; `block/devices` is where simulated block device nodes
for vhost-user block devices are created.
For the subdirectories of `vhost_user_store_path`:
- `block` is used for block device;
- `block/sockets` is where we expect UNIX domain sockets for vhost-user

628
src/agent/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -23,6 +23,7 @@ regex = "1.5.6"
serial_test = "0.5.1"
kata-sys-util = { path = "../libs/kata-sys-util" }
kata-types = { path = "../libs/kata-types" }
sysinfo = "0.23.0"
# Async helpers
async-trait = "0.1.42"
@@ -30,7 +31,7 @@ async-recursion = "0.3.2"
futures = "0.3.17"
# Async runtime
tokio = { version = "1.28.1", features = ["full"] }
tokio = { version = "1.14.0", features = ["full"] }
tokio-vsock = "0.3.1"
netlink-sys = { version = "0.7.0", features = ["tokio_socket",]}
@@ -51,7 +52,7 @@ log = "0.4.11"
prometheus = { version = "0.13.0", features = ["process"] }
procfs = "0.12.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.3.2" }
cgroups = { package = "cgroups-rs", version = "0.2.10" }
# Tracing
tracing = "0.1.26"

View File

@@ -107,8 +107,6 @@ endef
##TARGET default: build code
default: $(TARGET) show-header
static-checks-build: $(GENERATED_CODE)
$(TARGET): $(GENERATED_CODE) $(TARGET_PATH)
$(TARGET_PATH): show-summary

View File

@@ -11,7 +11,6 @@ serde_json = "1.0.39"
serde_derive = "1.0.91"
oci = { path = "../../libs/oci" }
protocols = { path ="../../libs/protocols" }
kata-sys-util = { path = "../../libs/kata-sys-util" }
caps = "0.5.0"
nix = "0.24.2"
scopeguard = "1.0.0"
@@ -25,18 +24,15 @@ scan_fmt = "0.2.6"
regex = "1.5.6"
path-absolutize = "1.2.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.3.2" }
cgroups = { package = "cgroups-rs", version = "0.2.10" }
rlimit = "0.5.3"
cfg-if = "0.1.0"
tokio = { version = "1.28.1", features = ["sync", "io-util", "process", "time", "macros", "rt"] }
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros", "rt"] }
futures = "0.3.17"
async-trait = "0.1.31"
inotify = "0.9.2"
libseccomp = { version = "0.3.0", optional = true }
zbus = "2.3.0"
bit-vec= "0.6.3"
xattr = "0.2.3"
libseccomp = { version = "0.2.3", optional = true }
[dev-dependencies]
serial_test = "0.5.0"

View File

@@ -32,7 +32,6 @@ use protocols::agent::{
BlkioStats, BlkioStatsEntry, CgroupStats, CpuStats, CpuUsage, HugetlbStats, MemoryData,
MemoryStats, PidsStats, ThrottlingData,
};
use std::any::Any;
use std::collections::HashMap;
use std::fs;
use std::path::Path;
@@ -76,7 +75,7 @@ macro_rules! set_resource {
impl CgroupManager for Manager {
fn apply(&self, pid: pid_t) -> Result<()> {
self.cgroup.add_task_by_tgid(CgroupPid::from(pid as u64))?;
self.cgroup.add_task(CgroupPid::from(pid as u64))?;
Ok(())
}
@@ -194,83 +193,6 @@ impl CgroupManager for Manager {
Ok(result)
}
fn update_cpuset_path(&self, guest_cpuset: &str, container_cpuset: &str) -> Result<()> {
if guest_cpuset.is_empty() {
return Ok(());
}
info!(sl!(), "update_cpuset_path to: {}", guest_cpuset);
let h = cgroups::hierarchies::auto();
let root_cg = h.root_control_group();
let root_cpuset_controller: &CpuSetController = root_cg.controller_of().unwrap();
let path = root_cpuset_controller.path();
let root_path = Path::new(path);
info!(sl!(), "root cpuset path: {:?}", &path);
let container_cpuset_controller: &CpuSetController = self.cgroup.controller_of().unwrap();
let path = container_cpuset_controller.path();
let container_path = Path::new(path);
info!(sl!(), "container cpuset path: {:?}", &path);
let mut paths = vec![];
for ancestor in container_path.ancestors() {
if ancestor == root_path {
break;
}
paths.push(ancestor);
}
info!(sl!(), "parent paths to update cpuset: {:?}", &paths);
let mut i = paths.len();
loop {
if i == 0 {
break;
}
i -= 1;
// remove cgroup root from path
let r_path = &paths[i]
.to_str()
.unwrap()
.trim_start_matches(root_path.to_str().unwrap());
info!(sl!(), "updating cpuset for parent path {:?}", &r_path);
let cg = new_cgroup(cgroups::hierarchies::auto(), r_path)?;
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
cpuset_controller.set_cpus(guest_cpuset)?;
}
if !container_cpuset.is_empty() {
info!(
sl!(),
"updating cpuset for container path: {:?} cpuset: {}",
&container_path,
container_cpuset
);
container_cpuset_controller.set_cpus(container_cpuset)?;
}
Ok(())
}
fn get_cgroup_path(&self, cg: &str) -> Result<String> {
if cgroups::hierarchies::is_cgroup2_unified_mode() {
let cg_path = format!("/sys/fs/cgroup/{}", self.cpath);
return Ok(cg_path);
}
// for cgroup v1
Ok(self.paths.get(cg).map(|s| s.to_string()).unwrap())
}
fn as_any(&self) -> Result<&dyn Any> {
Ok(self)
}
fn name(&self) -> &str {
"cgroupfs"
}
}
fn set_network_resources(
@@ -330,28 +252,19 @@ fn set_devices_resources(
}
fn set_hugepages_resources(
cg: &cgroups::Cgroup,
_cg: &cgroups::Cgroup,
hugepage_limits: &[LinuxHugepageLimit],
res: &mut cgroups::Resources,
) {
info!(sl!(), "cgroup manager set hugepage");
let mut limits = vec![];
let hugetlb_controller = cg.controller_of::<HugeTlbController>();
for l in hugepage_limits.iter() {
if hugetlb_controller.is_some() && hugetlb_controller.unwrap().size_supported(&l.page_size)
{
let hr = HugePageResource {
size: l.page_size.clone(),
limit: l.limit,
};
limits.push(hr);
} else {
warn!(
sl!(),
"{} page size support cannot be verified, dropping requested limit", l.page_size
);
}
let hr = HugePageResource {
size: l.page_size.clone(),
limit: l.limit,
};
limits.push(hr);
}
res.hugepages.limits = limits;
}
@@ -695,6 +608,17 @@ fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> SingularPtrField<CpuUsage> {
});
}
if cg.v2() {
return SingularPtrField::some(CpuUsage {
total_usage: 0,
percpu_usage: vec![],
usage_in_kernelmode: 0,
usage_in_usermode: 0,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
});
}
// try to get from cpu controller
let cpu_controller: &CpuController = get_controller_or_return_singular_none!(cg);
let stat = cpu_controller.cpu().stat;
@@ -725,7 +649,7 @@ fn get_memory_stats(cg: &cgroups::Cgroup) -> SingularPtrField<MemoryStats> {
let value = memory.use_hierarchy;
let use_hierarchy = value == 1;
// get memory data
// gte memory datas
let usage = SingularPtrField::some(MemoryData {
usage: memory.usage_in_bytes,
max_usage: memory.max_usage_in_bytes,
@@ -1016,9 +940,9 @@ pub fn get_mounts(paths: &HashMap<String, String>) -> Result<HashMap<String, Str
Ok(m)
}
fn new_cgroup(h: Box<dyn cgroups::Hierarchy>, path: &str) -> Result<Cgroup> {
fn new_cgroup(h: Box<dyn cgroups::Hierarchy>, path: &str) -> Cgroup {
let valid_path = path.trim_start_matches('/').to_string();
cgroups::Cgroup::new(h, valid_path.as_str()).map_err(anyhow::Error::from)
cgroups::Cgroup::new(h, valid_path.as_str())
}
impl Manager {
@@ -1040,16 +964,83 @@ impl Manager {
m.insert(key.to_string(), p);
}
let cg = new_cgroup(cgroups::hierarchies::auto(), cpath)?;
Ok(Self {
paths: m,
mounts,
// rels: paths,
cpath: cpath.to_string(),
cgroup: cg,
cgroup: new_cgroup(cgroups::hierarchies::auto(), cpath),
})
}
pub fn update_cpuset_path(&self, guest_cpuset: &str, container_cpuset: &str) -> Result<()> {
if guest_cpuset.is_empty() {
return Ok(());
}
info!(sl!(), "update_cpuset_path to: {}", guest_cpuset);
let h = cgroups::hierarchies::auto();
let root_cg = h.root_control_group();
let root_cpuset_controller: &CpuSetController = root_cg.controller_of().unwrap();
let path = root_cpuset_controller.path();
let root_path = Path::new(path);
info!(sl!(), "root cpuset path: {:?}", &path);
let container_cpuset_controller: &CpuSetController = self.cgroup.controller_of().unwrap();
let path = container_cpuset_controller.path();
let container_path = Path::new(path);
info!(sl!(), "container cpuset path: {:?}", &path);
let mut paths = vec![];
for ancestor in container_path.ancestors() {
if ancestor == root_path {
break;
}
paths.push(ancestor);
}
info!(sl!(), "parent paths to update cpuset: {:?}", &paths);
let mut i = paths.len();
loop {
if i == 0 {
break;
}
i -= 1;
// remove cgroup root from path
let r_path = &paths[i]
.to_str()
.unwrap()
.trim_start_matches(root_path.to_str().unwrap());
info!(sl!(), "updating cpuset for parent path {:?}", &r_path);
let cg = new_cgroup(cgroups::hierarchies::auto(), r_path);
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
cpuset_controller.set_cpus(guest_cpuset)?;
}
if !container_cpuset.is_empty() {
info!(
sl!(),
"updating cpuset for container path: {:?} cpuset: {}",
&container_path,
container_cpuset
);
container_cpuset_controller.set_cpus(container_cpuset)?;
}
Ok(())
}
pub fn get_cg_path(&self, cg: &str) -> Option<String> {
if cgroups::hierarchies::is_cgroup2_unified_mode() {
let cg_path = format!("/sys/fs/cgroup/{}", self.cpath);
return Some(cg_path);
}
// for cgroup v1
self.paths.get(cg).map(|s| s.to_string())
}
}
// get the guest's online cpus.

View File

@@ -11,7 +11,6 @@ use anyhow::Result;
use cgroups::freezer::FreezerState;
use libc::{self, pid_t};
use oci::LinuxResources;
use std::any::Any;
use std::collections::HashMap;
use std::string::String;
@@ -54,22 +53,6 @@ impl CgroupManager for Manager {
fn get_pids(&self) -> Result<Vec<pid_t>> {
Ok(Vec::new())
}
fn update_cpuset_path(&self, _: &str, _: &str) -> Result<()> {
Ok(())
}
fn get_cgroup_path(&self, _: &str) -> Result<String> {
Ok("".to_string())
}
fn as_any(&self) -> Result<&dyn Any> {
Ok(self)
}
fn name(&self) -> &str {
"mock"
}
}
impl Manager {
@@ -80,4 +63,12 @@ impl Manager {
cpath: cpath.to_string(),
})
}
pub fn update_cpuset_path(&self, _: &str, _: &str) -> Result<()> {
Ok(())
}
pub fn get_cg_path(&self, _: &str) -> Option<String> {
Some("".to_string())
}
}

View File

@@ -4,10 +4,8 @@
//
use anyhow::{anyhow, Result};
use core::fmt::Debug;
use oci::LinuxResources;
use protocols::agent::CgroupStats;
use std::any::Any;
use cgroups::freezer::FreezerState;
@@ -40,24 +38,4 @@ pub trait Manager {
fn set(&self, _container: &LinuxResources, _update: bool) -> Result<()> {
Err(anyhow!("not supported!"))
}
fn update_cpuset_path(&self, _: &str, _: &str) -> Result<()> {
Err(anyhow!("not supported!"))
}
fn get_cgroup_path(&self, _: &str) -> Result<String> {
Err(anyhow!("not supported!"))
}
fn as_any(&self) -> Result<&dyn Any> {
Err(anyhow!("not supported!"))
}
fn name(&self) -> &str;
}
impl Debug for dyn Manager + Send + Sync {
fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
write!(f, "{}", self.name())
}
}

View File

@@ -0,0 +1,10 @@
// Copyright (c) 2019 Ant Financial
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::cgroups::Manager as CgroupManager;
pub struct Manager {}
impl CgroupManager for Manager {}

View File

@@ -1,95 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use super::common::{DEFAULT_SLICE, SCOPE_SUFFIX, SLICE_SUFFIX};
use std::string::String;
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct CgroupsPath {
pub slice: String,
pub prefix: String,
pub name: String,
}
impl CgroupsPath {
pub fn new(cgroups_path_str: &str) -> Result<Self> {
let path_vec: Vec<&str> = cgroups_path_str.split(':').collect();
if path_vec.len() != 3 {
return Err(anyhow!("invalid cpath: {:?}", cgroups_path_str));
}
Ok(CgroupsPath {
slice: if path_vec[0].is_empty() {
DEFAULT_SLICE.to_string()
} else {
path_vec[0].to_owned()
},
prefix: path_vec[1].to_owned(),
name: path_vec[2].to_owned(),
})
}
// ref: https://github.com/opencontainers/runc/blob/main/docs/systemd.md
// return: (parent_slice, unit_name)
pub fn parse(&self) -> Result<(String, String)> {
Ok((
parse_parent(self.slice.to_owned())?,
get_unit_name(self.prefix.to_owned(), self.name.to_owned()),
))
}
}
fn parse_parent(slice: String) -> Result<String> {
if !slice.ends_with(SLICE_SUFFIX) || slice.contains('/') {
return Err(anyhow!("invalid slice name: {}", slice));
} else if slice == "-.slice" {
return Ok(String::new());
}
let mut slice_path = String::new();
let mut prefix = String::new();
for subslice in slice.trim_end_matches(SLICE_SUFFIX).split('-') {
if subslice.is_empty() {
return Err(anyhow!("invalid slice name: {}", slice));
}
slice_path = format!("{}/{}{}{}", slice_path, prefix, subslice, SLICE_SUFFIX);
prefix = format!("{}{}-", prefix, subslice);
}
slice_path.remove(0);
Ok(slice_path)
}
fn get_unit_name(prefix: String, name: String) -> String {
if name.ends_with(SLICE_SUFFIX) {
name
} else if prefix.is_empty() {
format!("{}{}", name, SCOPE_SUFFIX)
} else {
format!("{}-{}{}", prefix, name, SCOPE_SUFFIX)
}
}
#[cfg(test)]
mod tests {
use super::CgroupsPath;
#[test]
fn test_cgroup_path_parse() {
let slice = "system.slice";
let prefix = "kata_agent";
let name = "123";
let cgroups_path =
CgroupsPath::new(format!("{}:{}:{}", slice, prefix, name).as_str()).unwrap();
assert_eq!(slice, cgroups_path.slice.as_str());
assert_eq!(prefix, cgroups_path.prefix.as_str());
assert_eq!(name, cgroups_path.name.as_str());
let (parent_slice, unit_name) = cgroups_path.parse().unwrap();
assert_eq!(format!("{}", slice), parent_slice);
assert_eq!(format!("{}-{}.scope", prefix, name), unit_name);
}
}

View File

@@ -1,17 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
pub const DEFAULT_SLICE: &str = "system.slice";
pub const SLICE_SUFFIX: &str = ".slice";
pub const SCOPE_SUFFIX: &str = ".scope";
pub const UNIT_MODE: &str = "replace";
pub type Properties<'a> = Vec<(&'a str, zbus::zvariant::Value<'a>)>;
#[derive(Serialize, Deserialize, Debug, Clone)]
pub enum CgroupHierarchy {
Legacy,
Unified,
}

View File

@@ -1,129 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use std::vec;
use super::common::CgroupHierarchy;
use super::common::{Properties, SLICE_SUFFIX, UNIT_MODE};
use super::interface::system::ManagerProxyBlocking as SystemManager;
use anyhow::{Context, Result};
use zbus::zvariant::Value;
pub trait SystemdInterface {
fn start_unit(
&self,
pid: i32,
parent: &str,
unit_name: &str,
cg_hierarchy: &CgroupHierarchy,
) -> Result<()>;
fn set_properties(&self, unit_name: &str, properties: &Properties) -> Result<()>;
fn stop_unit(&self, unit_name: &str) -> Result<()>;
fn get_version(&self) -> Result<String>;
fn unit_exists(&self, unit_name: &str) -> Result<bool>;
fn add_process(&self, pid: i32, unit_name: &str) -> Result<()>;
}
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct DBusClient {}
impl DBusClient {
fn build_proxy(&self) -> Result<SystemManager<'static>> {
let connection =
zbus::blocking::Connection::system().context("Establishing a D-Bus connection")?;
let proxy = SystemManager::new(&connection).context("Building a D-Bus proxy manager")?;
Ok(proxy)
}
}
impl SystemdInterface for DBusClient {
fn start_unit(
&self,
pid: i32,
parent: &str,
unit_name: &str,
cg_hierarchy: &CgroupHierarchy,
) -> Result<()> {
let proxy = self.build_proxy()?;
// enable CPUAccounting & MemoryAccounting & (Block)IOAccounting by default
let mut properties: Properties = vec![
("CPUAccounting", Value::Bool(true)),
("DefaultDependencies", Value::Bool(false)),
("MemoryAccounting", Value::Bool(true)),
("TasksAccounting", Value::Bool(true)),
("Description", Value::Str("kata-agent container".into())),
("PIDs", Value::Array(vec![pid as u32].into())),
];
match *cg_hierarchy {
CgroupHierarchy::Legacy => properties.push(("IOAccounting", Value::Bool(true))),
CgroupHierarchy::Unified => properties.push(("BlockIOAccounting", Value::Bool(true))),
}
if unit_name.ends_with(SLICE_SUFFIX) {
properties.push(("Wants", Value::Str(parent.into())));
} else {
properties.push(("Slice", Value::Str(parent.into())));
properties.push(("Delegate", Value::Bool(true)));
}
proxy
.start_transient_unit(unit_name, UNIT_MODE, &properties, &[])
.with_context(|| format!("failed to start transient unit {}", unit_name))?;
Ok(())
}
fn set_properties(&self, unit_name: &str, properties: &Properties) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.set_unit_properties(unit_name, true, properties)
.with_context(|| format!("failed to set unit properties {}", unit_name))?;
Ok(())
}
fn stop_unit(&self, unit_name: &str) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.stop_unit(unit_name, UNIT_MODE)
.with_context(|| format!("failed to stop unit {}", unit_name))?;
Ok(())
}
fn get_version(&self) -> Result<String> {
let proxy = self.build_proxy()?;
let systemd_version = proxy
.version()
.with_context(|| "failed to get systemd version".to_string())?;
Ok(systemd_version)
}
fn unit_exists(&self, unit_name: &str) -> Result<bool> {
let proxy = self
.build_proxy()
.with_context(|| format!("Checking if systemd unit {} exists", unit_name))?;
Ok(proxy.get_unit(unit_name).is_ok())
}
fn add_process(&self, pid: i32, unit_name: &str) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.attach_processes_to_unit(unit_name, "/", &[pid as u32])
.with_context(|| format!("failed to add process {}", unit_name))?;
Ok(())
}
}

View File

@@ -1,7 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
pub(crate) mod session;
pub(crate) mod system;

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,133 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::cgroups::Manager as CgroupManager;
use crate::protocols::agent::CgroupStats;
use anyhow::Result;
use cgroups::freezer::FreezerState;
use libc::{self, pid_t};
use oci::LinuxResources;
use std::any::Any;
use std::collections::HashMap;
use std::convert::TryInto;
use std::string::String;
use std::vec;
use super::super::fs::Manager as FsManager;
use super::cgroups_path::CgroupsPath;
use super::common::{CgroupHierarchy, Properties};
use super::dbus_client::{DBusClient, SystemdInterface};
use super::subsystem::transformer::Transformer;
use super::subsystem::{cpu::Cpu, cpuset::CpuSet, memory::Memory, pids::Pids};
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Manager {
pub paths: HashMap<String, String>,
pub mounts: HashMap<String, String>,
pub cgroups_path: CgroupsPath,
pub cpath: String,
pub unit_name: String,
// dbus client for set properties
dbus_client: DBusClient,
// fs manager for get properties
fs_manager: FsManager,
// cgroup version for different dbus properties
cg_hierarchy: CgroupHierarchy,
}
impl CgroupManager for Manager {
fn apply(&self, pid: pid_t) -> Result<()> {
let unit_name = self.unit_name.as_str();
if self.dbus_client.unit_exists(unit_name)? {
self.dbus_client.add_process(pid, self.unit_name.as_str())?;
} else {
self.dbus_client.start_unit(
(pid as u32).try_into().unwrap(),
self.cgroups_path.slice.as_str(),
self.unit_name.as_str(),
&self.cg_hierarchy,
)?;
}
Ok(())
}
fn set(&self, r: &LinuxResources, _: bool) -> Result<()> {
let mut properties: Properties = vec![];
let systemd_version = self.dbus_client.get_version()?;
let systemd_version_str = systemd_version.as_str();
Cpu::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
Memory::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
Pids::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
CpuSet::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
self.dbus_client
.set_properties(self.unit_name.as_str(), &properties)?;
Ok(())
}
fn get_stats(&self) -> Result<CgroupStats> {
self.fs_manager.get_stats()
}
fn freeze(&self, state: FreezerState) -> Result<()> {
self.fs_manager.freeze(state)
}
fn destroy(&mut self) -> Result<()> {
self.dbus_client.stop_unit(self.unit_name.as_str())?;
self.fs_manager.destroy()
}
fn get_pids(&self) -> Result<Vec<pid_t>> {
self.fs_manager.get_pids()
}
fn update_cpuset_path(&self, guest_cpuset: &str, container_cpuset: &str) -> Result<()> {
self.fs_manager
.update_cpuset_path(guest_cpuset, container_cpuset)
}
fn get_cgroup_path(&self, cg: &str) -> Result<String> {
self.fs_manager.get_cgroup_path(cg)
}
fn as_any(&self) -> Result<&dyn Any> {
Ok(self)
}
fn name(&self) -> &str {
"systemd"
}
}
impl Manager {
pub fn new(cgroups_path_str: &str) -> Result<Self> {
let cgroups_path = CgroupsPath::new(cgroups_path_str)?;
let (parent_slice, unit_name) = cgroups_path.parse()?;
let cpath = parent_slice + "/" + &unit_name;
let fs_manager = FsManager::new(cpath.as_str())?;
Ok(Manager {
paths: fs_manager.paths.clone(),
mounts: fs_manager.mounts.clone(),
cgroups_path,
cpath,
unit_name,
dbus_client: DBusClient {},
fs_manager,
cg_hierarchy: if cgroups::hierarchies::is_cgroup2_unified_mode() {
CgroupHierarchy::Unified
} else {
CgroupHierarchy::Legacy
},
})
}
}

View File

@@ -1,12 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
pub mod manager;
mod cgroups_path;
mod common;
mod dbus_client;
mod interface;
mod subsystem;

View File

@@ -1,139 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use super::super::common::{CgroupHierarchy, Properties};
use super::transformer::Transformer;
use anyhow::Result;
use oci::{LinuxCpu, LinuxResources};
use zbus::zvariant::Value;
const BASIC_SYSTEMD_VERSION: &str = "242";
const DEFAULT_CPUQUOTAPERIOD: u64 = 100 * 1000;
const SEC2MICROSEC: u64 = 1000 * 1000;
const BASIC_INTERVAL: u64 = 10 * 1000;
pub struct Cpu {}
impl Transformer for Cpu {
fn apply(
r: &LinuxResources,
properties: &mut Properties,
cgroup_hierarchy: &CgroupHierarchy,
systemd_version: &str,
) -> Result<()> {
if let Some(cpu_resources) = &r.cpu {
match cgroup_hierarchy {
CgroupHierarchy::Legacy => {
Self::legacy_apply(cpu_resources, properties, systemd_version)?
}
CgroupHierarchy::Unified => {
Self::unified_apply(cpu_resources, properties, systemd_version)?
}
}
}
Ok(())
}
}
impl Cpu {
// v1:
// cpu.shares <-> CPUShares
// cpu.period <-> CPUQuotaPeriodUSec
// cpu.period & cpu.quota <-> CPUQuotaPerSecUSec
fn legacy_apply(
cpu_resources: &LinuxCpu,
properties: &mut Properties,
systemd_version: &str,
) -> Result<()> {
if let Some(shares) = cpu_resources.shares {
properties.push(("CPUShares", Value::U64(shares)));
}
if let Some(period) = cpu_resources.period {
if period != 0 && systemd_version >= BASIC_SYSTEMD_VERSION {
properties.push(("CPUQuotaPeriodUSec", Value::U64(period)));
}
}
if let Some(quota) = cpu_resources.quota {
let period = cpu_resources.period.unwrap_or(DEFAULT_CPUQUOTAPERIOD);
if period != 0 {
let cpu_quota_per_sec_usec = resolve_cpuquota(quota, period);
properties.push(("CPUQuotaPerSecUSec", Value::U64(cpu_quota_per_sec_usec)));
}
}
Ok(())
}
// v2:
// cpu.shares <-> CPUWeight
// cpu.period <-> CPUQuotaPeriodUSec
// cpu.period & cpu.quota <-> CPUQuotaPerSecUSec
fn unified_apply(
cpu_resources: &LinuxCpu,
properties: &mut Properties,
systemd_version: &str,
) -> Result<()> {
if let Some(shares) = cpu_resources.shares {
let weight = shares_to_weight(shares);
properties.push(("CPUWeight", Value::U64(weight)));
}
if let Some(period) = cpu_resources.period {
if period != 0 && systemd_version >= BASIC_SYSTEMD_VERSION {
properties.push(("CPUQuotaPeriodUSec", Value::U64(period)));
}
}
if let Some(quota) = cpu_resources.quota {
let period = cpu_resources.period.unwrap_or(DEFAULT_CPUQUOTAPERIOD);
if period != 0 {
let cpu_quota_per_sec_usec = resolve_cpuquota(quota, period);
properties.push(("CPUQuotaPerSecUSec", Value::U64(cpu_quota_per_sec_usec)));
}
}
Ok(())
}
}
// ref: https://github.com/containers/crun/blob/main/crun.1.md#cgroup-v2
// [2-262144] to [1-10000]
fn shares_to_weight(shares: u64) -> u64 {
if shares == 0 {
return 100;
}
1 + ((shares - 2) * 9999) / 262142
}
fn resolve_cpuquota(quota: i64, period: u64) -> u64 {
let mut cpu_quota_per_sec_usec = u64::MAX;
if quota > 0 {
cpu_quota_per_sec_usec = (quota as u64) * SEC2MICROSEC / period;
if cpu_quota_per_sec_usec % BASIC_INTERVAL != 0 {
cpu_quota_per_sec_usec =
((cpu_quota_per_sec_usec / BASIC_INTERVAL) + 1) * BASIC_INTERVAL;
}
}
cpu_quota_per_sec_usec
}
#[cfg(test)]
mod tests {
use crate::cgroups::systemd::subsystem::cpu::resolve_cpuquota;
#[test]
fn test_unified_cpuquota() {
let quota: i64 = 1000000;
let period: u64 = 500000;
let cpu_quota_per_sec_usec = resolve_cpuquota(quota, period);
assert_eq!(2000000, cpu_quota_per_sec_usec);
}
}

View File

@@ -1,124 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use super::super::common::{CgroupHierarchy, Properties};
use super::transformer::Transformer;
use anyhow::{bail, Result};
use bit_vec::BitVec;
use oci::{LinuxCpu, LinuxResources};
use std::convert::{TryFrom, TryInto};
use zbus::zvariant::Value;
const BASIC_SYSTEMD_VERSION: &str = "244";
pub struct CpuSet {}
impl Transformer for CpuSet {
fn apply(
r: &LinuxResources,
properties: &mut Properties,
_: &CgroupHierarchy,
systemd_version: &str,
) -> Result<()> {
if let Some(cpuset_resources) = &r.cpu {
Self::apply(cpuset_resources, properties, systemd_version)?;
}
Ok(())
}
}
// v1 & v2:
// cpuset.cpus <-> AllowedCPUs (v244)
// cpuset.mems <-> AllowedMemoryNodes (v244)
impl CpuSet {
fn apply(
cpuset_resources: &LinuxCpu,
properties: &mut Properties,
systemd_version: &str,
) -> Result<()> {
if systemd_version < BASIC_SYSTEMD_VERSION {
return Ok(());
}
let cpus = cpuset_resources.cpus.as_str();
if !cpus.is_empty() {
let cpus_vec: BitMask = cpus.try_into()?;
properties.push(("AllowedCPUs", Value::Array(cpus_vec.0.into())));
}
let mems = cpuset_resources.mems.as_str();
if !mems.is_empty() {
let mems_vec: BitMask = mems.try_into()?;
properties.push(("AllowedMemoryNodes", Value::Array(mems_vec.0.into())));
}
Ok(())
}
}
struct BitMask(Vec<u8>);
impl TryFrom<&str> for BitMask {
type Error = anyhow::Error;
fn try_from(bitmask_str: &str) -> Result<Self, Self::Error> {
let mut bitmask_vec = BitVec::from_elem(8, false);
let bitmask_str_vec: Vec<&str> = bitmask_str.split(',').collect();
for bitmask in bitmask_str_vec.iter() {
let range: Vec<&str> = bitmask.split('-').collect();
match range.len() {
1 => {
let idx: usize = range[0].parse()?;
while idx >= bitmask_vec.len() {
bitmask_vec.grow(8, false);
}
bitmask_vec.set(adjust_index(idx), true);
}
2 => {
let left_index = range[0].parse()?;
let right_index = range[1].parse()?;
while right_index >= bitmask_vec.len() {
bitmask_vec.grow(8, false);
}
for idx in left_index..=right_index {
bitmask_vec.set(adjust_index(idx), true);
}
}
_ => bail!("invalid bitmask str {}", bitmask_str),
}
}
let mut result_vec = bitmask_vec.to_bytes();
result_vec.reverse();
Ok(BitMask(result_vec))
}
}
#[inline(always)]
fn adjust_index(idx: usize) -> usize {
idx / 8 * 8 + 7 - idx % 8
}
#[cfg(test)]
mod tests {
use std::convert::TryInto;
use crate::cgroups::systemd::subsystem::cpuset::BitMask;
#[test]
fn test_bitmask_conversion() {
let cpus_vec: BitMask = "2-4".try_into().unwrap();
assert_eq!(vec![0b11100 as u8], cpus_vec.0);
let cpus_vec: BitMask = "1,7".try_into().unwrap();
assert_eq!(vec![0b10000010 as u8], cpus_vec.0);
let cpus_vec: BitMask = "0,2-3,7".try_into().unwrap();
assert_eq!(vec![0b10001101 as u8], cpus_vec.0);
}
}

View File

@@ -1,117 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use super::super::common::{CgroupHierarchy, Properties};
use super::transformer::Transformer;
use anyhow::{bail, Result};
use oci::{LinuxMemory, LinuxResources};
use zbus::zvariant::Value;
pub struct Memory {}
impl Transformer for Memory {
fn apply(
r: &LinuxResources,
properties: &mut Properties,
cgroup_hierarchy: &CgroupHierarchy,
_: &str,
) -> Result<()> {
if let Some(memory_resources) = &r.memory {
match cgroup_hierarchy {
CgroupHierarchy::Legacy => Self::legacy_apply(memory_resources, properties)?,
CgroupHierarchy::Unified => Self::unified_apply(memory_resources, properties)?,
}
}
Ok(())
}
}
impl Memory {
// v1:
// memory.limit <-> MemoryLimit
fn legacy_apply(memory_resources: &LinuxMemory, properties: &mut Properties) -> Result<()> {
if let Some(limit) = memory_resources.limit {
let limit = match limit {
1..=i64::MAX => limit as u64,
0 => u64::MAX,
_ => bail!("invalid memory.limit"),
};
properties.push(("MemoryLimit", Value::U64(limit)));
}
Ok(())
}
// v2:
// memory.low <-> MemoryLow
// memory.max <-> MemoryMax
// memory.swap & memory.limit <-> MemorySwapMax
fn unified_apply(memory_resources: &LinuxMemory, properties: &mut Properties) -> Result<()> {
if let Some(limit) = memory_resources.limit {
let limit = match limit {
1..=i64::MAX => limit as u64,
0 => u64::MAX,
_ => bail!("invalid memory.limit: {}", limit),
};
properties.push(("MemoryMax", Value::U64(limit)));
}
if let Some(reservation) = memory_resources.reservation {
let reservation = match reservation {
1..=i64::MAX => reservation as u64,
0 => u64::MAX,
_ => bail!("invalid memory.reservation: {}", reservation),
};
properties.push(("MemoryLow", Value::U64(reservation)));
}
let swap = match memory_resources.swap {
Some(0) => u64::MAX,
Some(1..=i64::MAX) => match memory_resources.limit {
Some(1..=i64::MAX) => {
(memory_resources.limit.unwrap() - memory_resources.swap.unwrap()) as u64
}
_ => bail!("invalid memory.limit when memory.swap specified"),
},
None => u64::MAX,
_ => bail!("invalid memory.swap"),
};
properties.push(("MemorySwapMax", Value::U64(swap)));
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::Memory;
use super::Properties;
use super::Value;
#[test]
fn test_unified_memory() {
let memory_resources = oci::LinuxMemory {
limit: Some(736870912),
reservation: Some(536870912),
swap: Some(536870912),
kernel: Some(0),
kernel_tcp: Some(0),
swappiness: Some(0),
disable_oom_killer: Some(false),
};
let mut properties: Properties = vec![];
assert_eq!(
true,
Memory::unified_apply(&memory_resources, &mut properties).is_ok()
);
assert_eq!(Value::U64(200000000), properties[2].1);
}
}

View File

@@ -1,10 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
pub mod cpu;
pub mod cpuset;
pub mod memory;
pub mod pids;
pub mod transformer;

View File

@@ -1,60 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use super::super::common::{CgroupHierarchy, Properties};
use super::transformer::Transformer;
use anyhow::Result;
use oci::{LinuxPids, LinuxResources};
use zbus::zvariant::Value;
pub struct Pids {}
impl Transformer for Pids {
fn apply(
r: &LinuxResources,
properties: &mut Properties,
_: &CgroupHierarchy,
_: &str,
) -> Result<()> {
if let Some(pids_resources) = &r.pids {
Self::apply(pids_resources, properties)?;
}
Ok(())
}
}
// pids.limit <-> TasksMax
impl Pids {
fn apply(pids_resources: &LinuxPids, properties: &mut Properties) -> Result<()> {
let limit = if pids_resources.limit > 0 {
pids_resources.limit as u64
} else {
u64::MAX
};
properties.push(("TasksMax", Value::U64(limit)));
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::Pids;
use super::Properties;
use super::Value;
#[test]
fn test_subsystem_workflow() {
let pids_resources = oci::LinuxPids { limit: 0 };
let mut properties: Properties = vec![];
assert_eq!(true, Pids::apply(&pids_resources, &mut properties).is_ok());
assert_eq!(Value::U64(u64::MAX), properties[0].1);
}
}

View File

@@ -1,17 +0,0 @@
// Copyright 2021-2022 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use super::super::common::{CgroupHierarchy, Properties};
use anyhow::Result;
use oci::LinuxResources;
pub trait Transformer {
fn apply(
r: &LinuxResources,
properties: &mut Properties,
cgroup_hierarchy: &CgroupHierarchy,
systemd_version: &str,
) -> Result<()>;
}

View File

@@ -6,7 +6,7 @@
use anyhow::{anyhow, Context, Result};
use libc::pid_t;
use oci::{ContainerState, LinuxDevice, LinuxIdMapping};
use oci::{Linux, LinuxNamespace, LinuxResources, Spec};
use oci::{Hook, Linux, LinuxNamespace, LinuxResources, Spec};
use std::clone::Clone;
use std::ffi::CString;
use std::fmt::Display;
@@ -22,7 +22,6 @@ use crate::capabilities;
use crate::cgroups::fs::Manager as FsManager;
#[cfg(test)]
use crate::cgroups::mock::Manager as FsManager;
use crate::cgroups::systemd::manager::Manager as SystemdManager;
use crate::cgroups::Manager;
#[cfg(feature = "standard-oci-runtime")]
use crate::console;
@@ -30,7 +29,6 @@ use crate::log_child;
use crate::process::Process;
#[cfg(feature = "seccomp")]
use crate::seccomp;
use crate::selinux;
use crate::specconv::CreateOpts;
use crate::{mount, validator};
@@ -51,7 +49,6 @@ use std::os::unix::io::AsRawFd;
use protobuf::SingularPtrField;
use oci::State as OCIState;
use regex::Regex;
use std::collections::HashMap;
use std::os::unix::io::FromRawFd;
use std::str::FromStr;
@@ -67,9 +64,6 @@ use rlimit::{setrlimit, Resource, Rlim};
use tokio::io::AsyncBufReadExt;
use tokio::sync::Mutex;
use kata_sys_util::hooks::HookStates;
use kata_sys_util::validate::valid_env;
pub const EXEC_FIFO_FILENAME: &str = "exec.fifo";
const INIT: &str = "INIT";
@@ -113,6 +107,7 @@ impl Default for ContainerStatus {
}
// We might want to change this to thiserror in the future
const MissingCGroupManager: &str = "failed to get container's cgroup Manager";
const MissingLinux: &str = "no linux config";
const InvalidNamespace: &str = "invalid namespace type";
@@ -206,8 +201,6 @@ lazy_static! {
},
]
};
pub static ref SYSTEMD_CGROUP_PATH_FORMAT:Regex = Regex::new(r"^[\w\-.]*:[\w\-.]*:[\w\-.]*$").unwrap();
}
#[derive(Serialize, Deserialize, Debug)]
@@ -246,7 +239,7 @@ pub struct LinuxContainer {
pub id: String,
pub root: String,
pub config: Config,
pub cgroup_manager: Box<dyn Manager + Send + Sync>,
pub cgroup_manager: Option<FsManager>,
pub init_process_pid: pid_t,
pub init_process_start_time: u64,
pub uid_map_path: String,
@@ -295,11 +288,16 @@ impl Container for LinuxContainer {
));
}
self.cgroup_manager.as_ref().freeze(FreezerState::Frozen)?;
if self.cgroup_manager.is_some() {
self.cgroup_manager
.as_ref()
.unwrap()
.freeze(FreezerState::Frozen)?;
self.status.transition(ContainerState::Paused);
Ok(())
self.status.transition(ContainerState::Paused);
return Ok(());
}
Err(anyhow!(MissingCGroupManager))
}
fn resume(&mut self) -> Result<()> {
@@ -308,11 +306,16 @@ impl Container for LinuxContainer {
return Err(anyhow!("container status is: {:?}, not paused", status));
}
self.cgroup_manager.as_ref().freeze(FreezerState::Thawed)?;
if self.cgroup_manager.is_some() {
self.cgroup_manager
.as_ref()
.unwrap()
.freeze(FreezerState::Thawed)?;
self.status.transition(ContainerState::Running);
Ok(())
self.status.transition(ContainerState::Running);
return Ok(());
}
Err(anyhow!(MissingCGroupManager))
}
}
@@ -387,9 +390,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let buf = read_sync(crfd)?;
let cm_str = std::str::from_utf8(&buf)?;
// deserialize cm_str into FsManager and SystemdManager separately
let fs_cm: Result<FsManager, serde_json::Error> = serde_json::from_str(cm_str);
let systemd_cm: Result<SystemdManager, serde_json::Error> = serde_json::from_str(cm_str);
let cm: FsManager = serde_json::from_str(cm_str)?;
#[cfg(feature = "standard-oci-runtime")]
let csocket_fd = console::setup_console_socket(&std::env::var(CONSOLE_SOCKET_FD)?)?;
@@ -530,8 +531,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
}
}
let selinux_enabled = selinux::is_enabled()?;
sched::unshare(to_new & !CloneFlags::CLONE_NEWUSER)?;
if userns {
@@ -549,18 +548,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
if to_new.contains(CloneFlags::CLONE_NEWNS) {
// setup rootfs
if let Ok(systemd_cm) = systemd_cm {
mount::init_rootfs(
cfd_log,
&spec,
&systemd_cm.paths,
&systemd_cm.mounts,
bind_device,
)?;
} else {
let fs_cm = fs_cm.unwrap();
mount::init_rootfs(cfd_log, &spec, &fs_cm.paths, &fs_cm.mounts, bind_device)?;
}
mount::init_rootfs(cfd_log, &spec, &cm.paths, &cm.mounts, bind_device)?;
}
if init {
@@ -633,18 +621,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
capctl::prctl::set_no_new_privs().map_err(|_| anyhow!("cannot set no new privileges"))?;
}
// Set SELinux label
if !oci_process.selinux_label.is_empty() {
if !selinux_enabled {
return Err(anyhow!(
"SELinux label for the process is provided but SELinux is not enabled on the running kernel"
));
}
log_child!(cfd_log, "Set SELinux label to the container process");
selinux::set_exec_label(&oci_process.selinux_label)?;
}
// Log unknown seccomp system calls in advance before the log file descriptor closes.
#[cfg(feature = "seccomp")]
if let Some(ref scmp) = linux.seccomp {
@@ -854,17 +830,22 @@ impl BaseContainer for LinuxContainer {
}
fn stats(&self) -> Result<StatsContainerResponse> {
let mut r = StatsContainerResponse::default();
if self.cgroup_manager.is_some() {
r.cgroup_stats =
SingularPtrField::some(self.cgroup_manager.as_ref().unwrap().get_stats()?);
}
// what about network interface stats?
Ok(StatsContainerResponse {
cgroup_stats: SingularPtrField::some(self.cgroup_manager.as_ref().get_stats()?),
..Default::default()
})
Ok(r)
}
fn set(&mut self, r: LinuxResources) -> Result<()> {
self.cgroup_manager.as_ref().set(&r, true)?;
if self.cgroup_manager.is_some() {
self.cgroup_manager.as_ref().unwrap().set(&r, true)?;
}
self.config
.spec
.as_mut()
@@ -1037,8 +1018,7 @@ impl BaseContainer for LinuxContainer {
&logger,
spec,
&p,
self.cgroup_manager.as_ref(),
self.config.use_systemd_cgroup,
self.cgroup_manager.as_ref().unwrap(),
&st,
&mut pipe_w,
&mut pipe_r,
@@ -1101,14 +1081,12 @@ impl BaseContainer for LinuxContainer {
}
}
// guest Poststop hook
// * should be executed after the container is deleted but before the delete operation returns
// * the executable file is in agent namespace
// * should also be executed in agent namespace.
if let Some(hooks) = spec.hooks.as_ref() {
info!(self.logger, "guest Poststop hook");
let mut hook_states = HookStates::new();
hook_states.execute_hooks(&hooks.poststop, Some(st))?;
if spec.hooks.is_some() {
info!(self.logger, "poststop");
let hooks = spec.hooks.as_ref().unwrap();
for h in hooks.poststop.iter() {
execute_hook(&self.logger, h, &st).await?;
}
}
self.status.transition(ContainerState::Stopped);
@@ -1118,19 +1096,19 @@ impl BaseContainer for LinuxContainer {
)?;
fs::remove_dir_all(&self.root)?;
let cgm = self.cgroup_manager.as_mut();
// Kill all of the processes created in this container to prevent
// the leak of some daemon process when this container shared pidns
// with the sandbox.
let pids = cgm.get_pids().context("get cgroup pids")?;
for i in pids {
if let Err(e) = signal::kill(Pid::from_raw(i), Signal::SIGKILL) {
warn!(self.logger, "kill the process {} error: {:?}", i, e);
if let Some(cgm) = self.cgroup_manager.as_mut() {
// Kill all of the processes created in this container to prevent
// the leak of some daemon process when this container shared pidns
// with the sandbox.
let pids = cgm.get_pids().context("get cgroup pids")?;
for i in pids {
if let Err(e) = signal::kill(Pid::from_raw(i), Signal::SIGKILL) {
warn!(self.logger, "kill the process {} error: {:?}", i, e);
}
}
cgm.destroy().context("destroy cgroups")?;
}
cgm.destroy().context("destroy cgroups")?;
Ok(())
}
@@ -1154,14 +1132,16 @@ impl BaseContainer for LinuxContainer {
.ok_or_else(|| anyhow!("OCI spec was not found"))?;
let st = self.oci_state()?;
// guest Poststart hook
// * should be executed after the container is started but before the delete operation returns
// * the executable file is in agent namespace
// * should also be executed in agent namespace.
if let Some(hooks) = spec.hooks.as_ref() {
info!(self.logger, "guest Poststart hook");
let mut hook_states = HookStates::new();
hook_states.execute_hooks(&hooks.poststart, Some(st))?;
// run poststart hook
if spec.hooks.is_some() {
info!(self.logger, "poststart hook");
let hooks = spec
.hooks
.as_ref()
.ok_or_else(|| anyhow!("OCI hooks were not found"))?;
for h in hooks.poststart.iter() {
execute_hook(&self.logger, h, &st).await?;
}
}
unistd::close(fd)?;
@@ -1300,13 +1280,11 @@ pub fn setup_child_logger(fd: RawFd, child_logger: Logger) -> tokio::task::JoinH
})
}
#[allow(clippy::too_many_arguments)]
async fn join_namespaces(
logger: &Logger,
spec: &Spec,
p: &Process,
cm: &(dyn Manager + Send + Sync),
use_systemd_cgroup: bool,
cm: &FsManager,
st: &OCIState,
pipe_w: &mut PipeStream,
pipe_r: &mut PipeStream,
@@ -1333,11 +1311,7 @@ async fn join_namespaces(
info!(logger, "wait child received oci process");
read_async(pipe_r).await?;
let cm_str = if use_systemd_cgroup {
serde_json::to_string(cm.as_any()?.downcast_ref::<SystemdManager>().unwrap())
} else {
serde_json::to_string(cm.as_any()?.downcast_ref::<FsManager>().unwrap())
}?;
let cm_str = serde_json::to_string(cm)?;
write_async(pipe_w, SYNC_DATA, cm_str.as_str()).await?;
// wait child setup user namespace
@@ -1360,16 +1334,13 @@ async fn join_namespaces(
}
// apply cgroups
// For FsManger, it's no matter about the order of apply and set.
// For SystemdManger, apply must be precede set because we can only create a systemd unit with specific processes(pids).
if res.is_some() {
info!(logger, "apply processes to cgroups!");
cm.apply(p.pid)?;
if p.init && res.is_some() {
info!(logger, "apply cgroups!");
cm.set(res.unwrap(), false)?;
}
if p.init && res.is_some() {
info!(logger, "set properties to cgroups!");
cm.set(res.unwrap(), false)?;
if res.is_some() {
cm.apply(p.pid)?;
}
info!(logger, "notify child to continue");
@@ -1382,14 +1353,13 @@ async fn join_namespaces(
info!(logger, "get ready to run prestart hook!");
// guest Prestart hook
// * should be executed during the start operation, and before the container command is executed
// * the executable file is in agent namespace
// * should also be executed in agent namespace.
if let Some(hooks) = spec.hooks.as_ref() {
info!(logger, "guest Prestart hook");
let mut hook_states = HookStates::new();
hook_states.execute_hooks(&hooks.prestart, Some(st.clone()))?;
// run prestart hook
if spec.hooks.is_some() {
info!(logger, "prestart hook");
let hooks = spec.hooks.as_ref().unwrap();
for h in hooks.prestart.iter() {
execute_hook(&logger, h, st).await?;
}
}
// notify child run prestart hooks completed
@@ -1475,41 +1445,27 @@ impl LinuxContainer {
.context(format!("Cannot change owner of container {} root", id))?;
let spec = config.spec.as_ref().unwrap();
let linux = spec.linux.as_ref().unwrap();
let cpath = if config.use_systemd_cgroup {
if linux.cgroups_path.len() == 2 {
format!("system.slice:kata_agent:{}", id.as_str())
} else {
linux.cgroups_path.clone()
}
} else if linux.cgroups_path.is_empty() {
let cpath = if linux.cgroups_path.is_empty() {
format!("/{}", id.as_str())
} else {
// if we have a systemd cgroup path we need to convert it to a fs cgroup path
linux.cgroups_path.replace(':', "/")
linux.cgroups_path.clone()
};
let cgroup_manager: Box<dyn Manager + Send + Sync> = if config.use_systemd_cgroup {
Box::new(SystemdManager::new(cpath.as_str()).map_err(|e| {
anyhow!(format!(
"fail to create cgroup manager with path {}: {:}",
cpath, e
))
})?)
} else {
Box::new(FsManager::new(cpath.as_str()).map_err(|e| {
anyhow!(format!(
"fail to create cgroup manager with path {}: {:}",
cpath, e
))
})?)
};
let cgroup_manager = FsManager::new(cpath.as_str()).map_err(|e| {
anyhow!(format!(
"fail to create cgroup manager with path {}: {:}",
cpath, e
))
})?;
info!(logger, "new cgroup_manager {:?}", &cgroup_manager);
Ok(LinuxContainer {
id: id.clone(),
root,
cgroup_manager,
cgroup_manager: Some(cgroup_manager),
status: ContainerStatus::new(),
uid_map_path: String::from(""),
gid_map_path: "".to_string(),
@@ -1561,6 +1517,143 @@ fn set_sysctls(sysctls: &HashMap<String, String>) -> Result<()> {
Ok(())
}
use std::process::Stdio;
use std::time::Duration;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
pub async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
let logger = logger.new(o!("action" => "execute-hook"));
let binary = PathBuf::from(h.path.as_str());
let path = binary.canonicalize()?;
if !path.exists() {
return Err(anyhow!("Path {:?} does not exist", path));
}
let mut args = h.args.clone();
// the hook.args[0] is the hook binary name which shouldn't be included
// in the Command.args
if args.len() > 1 {
args.remove(0);
}
// all invalid envs will be omitted, only valid envs will be passed to hook.
let env: HashMap<&str, &str> = h.env.iter().filter_map(|e| valid_env(e)).collect();
// Avoid the exit signal to be reaped by the global reaper.
let _wait_locker = WAIT_PID_LOCKER.lock().await;
let mut child = tokio::process::Command::new(path)
.args(args.iter())
.envs(env.iter())
.kill_on_drop(true)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()?;
// default timeout 10s
let mut timeout: u64 = 10;
// if timeout is set if hook, then use the specified value
if let Some(t) = h.timeout {
if t > 0 {
timeout = t as u64;
}
}
let state = serde_json::to_string(st)?;
let path = h.path.clone();
let join_handle = tokio::spawn(async move {
if let Some(mut stdin) = child.stdin.take() {
match stdin.write_all(state.as_bytes()).await {
Ok(_) => {}
Err(e) => {
info!(logger, "write to child stdin failed: {:?}", e);
}
}
}
// read something from stdout and stderr for debug
if let Some(stdout) = child.stdout.as_mut() {
let mut out = String::new();
match stdout.read_to_string(&mut out).await {
Ok(_) => {
info!(logger, "child stdout: {}", out.as_str());
}
Err(e) => {
info!(logger, "read from child stdout failed: {:?}", e);
}
}
}
let mut err = String::new();
if let Some(stderr) = child.stderr.as_mut() {
match stderr.read_to_string(&mut err).await {
Ok(_) => {
info!(logger, "child stderr: {}", err.as_str());
}
Err(e) => {
info!(logger, "read from child stderr failed: {:?}", e);
}
}
}
match child.wait().await {
Ok(exit) => {
let code = exit
.code()
.ok_or_else(|| anyhow!("hook exit status has no status code"))?;
if code != 0 {
error!(
logger,
"hook {} exit status is {}, error message is {}", &path, code, err
);
return Err(anyhow!(nix::Error::UnknownErrno));
}
debug!(logger, "hook {} exit status is 0", &path);
Ok(())
}
Err(e) => Err(anyhow!(
"wait child error: {} {}",
e,
e.raw_os_error().unwrap()
)),
}
});
match tokio::time::timeout(Duration::new(timeout, 0), join_handle).await {
Ok(r) => r.unwrap(),
Err(_) => Err(anyhow!(nix::Error::ETIMEDOUT)),
}
}
// valid environment variables according to https://doc.rust-lang.org/std/env/fn.set_var.html#panics
fn valid_env(e: &str) -> Option<(&str, &str)> {
// wherther key or value will contain NULL char.
if e.as_bytes().contains(&b'\0') {
return None;
}
let v: Vec<&str> = e.splitn(2, '=').collect();
// key can't hold an `equal` sign, but value can
if v.len() != 2 {
return None;
}
let (key, value) = (v[0].trim(), v[1].trim());
// key can't be empty
if key.is_empty() {
return None;
}
Some((key, value))
}
#[cfg(test)]
mod tests {
use super::*;
@@ -1571,6 +1664,7 @@ mod tests {
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
use tokio::process::Command;
macro_rules! sl {
() => {
@@ -1578,6 +1672,113 @@ mod tests {
};
}
async fn which(cmd: &str) -> String {
let output: std::process::Output = Command::new("which")
.arg(cmd)
.output()
.await
.expect("which command failed to run");
match String::from_utf8(output.stdout) {
Ok(v) => v.trim_end_matches('\n').to_string(),
Err(e) => panic!("Invalid UTF-8 sequence: {}", e),
}
}
#[tokio::test]
async fn test_execute_hook() {
let temp_file = "/tmp/test_execute_hook";
let touch = which("touch").await;
defer!(fs::remove_file(temp_file).unwrap(););
let invalid_str = vec![97, b'\0', 98];
let invalid_string = std::str::from_utf8(&invalid_str).unwrap();
let invalid_env = format!("{}=value", invalid_string);
execute_hook(
&slog_scope::logger(),
&Hook {
path: touch,
args: vec!["touch".to_string(), temp_file.to_string()],
env: vec![invalid_env],
timeout: Some(10),
},
&OCIState {
version: "1.2.3".to_string(),
id: "321".to_string(),
status: ContainerState::Running,
pid: 2,
bundle: "".to_string(),
annotations: Default::default(),
},
)
.await
.unwrap();
assert_eq!(Path::new(&temp_file).exists(), true);
}
#[tokio::test]
async fn test_execute_hook_with_error() {
let ls = which("ls").await;
let res = execute_hook(
&slog_scope::logger(),
&Hook {
path: ls,
args: vec!["ls".to_string(), "/tmp/not-exist".to_string()],
env: vec![],
timeout: None,
},
&OCIState {
version: "1.2.3".to_string(),
id: "321".to_string(),
status: ContainerState::Running,
pid: 2,
bundle: "".to_string(),
annotations: Default::default(),
},
)
.await;
let expected_err = nix::Error::UnknownErrno;
assert_eq!(
res.unwrap_err().downcast::<nix::Error>().unwrap(),
expected_err
);
}
#[tokio::test]
async fn test_execute_hook_with_timeout() {
let sleep = which("sleep").await;
let res = execute_hook(
&slog_scope::logger(),
&Hook {
path: sleep,
args: vec!["sleep".to_string(), "2".to_string()],
env: vec![],
timeout: Some(1),
},
&OCIState {
version: "1.2.3".to_string(),
id: "321".to_string(),
status: ContainerState::Running,
pid: 2,
bundle: "".to_string(),
annotations: Default::default(),
},
)
.await;
let expected_err = nix::Error::ETIMEDOUT;
assert_eq!(
res.unwrap_err().downcast::<nix::Error>().unwrap(),
expected_err
);
}
#[test]
fn test_status_transtition() {
let mut status = ContainerStatus::new();
@@ -1730,12 +1931,20 @@ mod tests {
assert!(format!("{:?}", ret).contains("failed to pause container"))
}
#[test]
fn test_linuxcontainer_pause_cgroupmgr_is_none() {
let ret = new_linux_container_and_then(|mut c: LinuxContainer| {
c.cgroup_manager = None;
c.pause().map_err(|e| anyhow!(e))
});
assert!(ret.is_err(), "Expecting error, Got {:?}", ret);
}
#[test]
fn test_linuxcontainer_pause() {
let ret = new_linux_container_and_then(|mut c: LinuxContainer| {
c.cgroup_manager = Box::new(FsManager::new("").map_err(|e| {
anyhow!(format!("fail to create cgroup manager with path: {:}", e))
})?);
c.cgroup_manager = FsManager::new("").ok();
c.pause().map_err(|e| anyhow!(e))
});
@@ -1754,12 +1963,21 @@ mod tests {
assert!(format!("{:?}", ret).contains("not paused"))
}
#[test]
fn test_linuxcontainer_resume_cgroupmgr_is_none() {
let ret = new_linux_container_and_then(|mut c: LinuxContainer| {
c.status.transition(ContainerState::Paused);
c.cgroup_manager = None;
c.resume().map_err(|e| anyhow!(e))
});
assert!(ret.is_err(), "Expecting error, Got {:?}", ret);
}
#[test]
fn test_linuxcontainer_resume() {
let ret = new_linux_container_and_then(|mut c: LinuxContainer| {
c.cgroup_manager = Box::new(FsManager::new("").map_err(|e| {
anyhow!(format!("fail to create cgroup manager with path: {:}", e))
})?);
c.cgroup_manager = FsManager::new("").ok();
// Change status to paused, this way we can resume it
c.status.transition(ContainerState::Paused);
c.resume().map_err(|e| anyhow!(e))
@@ -1892,4 +2110,49 @@ mod tests {
let ret = do_init_child(std::io::stdin().as_raw_fd());
assert!(ret.is_err(), "Expecting Err, Got {:?}", ret);
}
#[test]
fn test_valid_env() {
let env = valid_env("a=b=c");
assert_eq!(Some(("a", "b=c")), env);
let env = valid_env("a=b");
assert_eq!(Some(("a", "b")), env);
let env = valid_env("a =b");
assert_eq!(Some(("a", "b")), env);
let env = valid_env(" a =b");
assert_eq!(Some(("a", "b")), env);
let env = valid_env("a= b");
assert_eq!(Some(("a", "b")), env);
let env = valid_env("a=b ");
assert_eq!(Some(("a", "b")), env);
let env = valid_env("a=b c ");
assert_eq!(Some(("a", "b c")), env);
let env = valid_env("=b");
assert_eq!(None, env);
let env = valid_env("a=");
assert_eq!(Some(("a", "")), env);
let env = valid_env("a==");
assert_eq!(Some(("a", "=")), env);
let env = valid_env("a");
assert_eq!(None, env);
let invalid_str = vec![97, b'\0', 98];
let invalid_string = std::str::from_utf8(&invalid_str).unwrap();
let invalid_env = format!("{}=value", invalid_string);
let env = valid_env(&invalid_env);
assert_eq!(None, env);
let invalid_env = format!("key={}", invalid_string);
let env = valid_env(&invalid_env);
assert_eq!(None, env);
}
}

View File

@@ -38,7 +38,6 @@ pub mod pipestream;
pub mod process;
#[cfg(feature = "seccomp")]
pub mod seccomp;
pub mod selinux;
pub mod specconv;
pub mod sync;
pub mod sync_with_async;

View File

@@ -25,7 +25,6 @@ use std::fs::File;
use std::io::{BufRead, BufReader};
use crate::container::DEFAULT_DEVICES;
use crate::selinux;
use crate::sync::write_count;
use std::string::ToString;
@@ -182,8 +181,6 @@ pub fn init_rootfs(
None => flags |= MsFlags::MS_SLAVE,
}
let label = &linux.mount_label;
let root = spec
.root
.as_ref()
@@ -247,7 +244,7 @@ pub fn init_rootfs(
}
}
mount_from(cfd_log, m, rootfs, flags, &data, label)?;
mount_from(cfd_log, m, rootfs, flags, &data, "")?;
// bind mount won't change mount options, we need remount to make mount options
// effective.
// first check that we have non-default options required before attempting a
@@ -527,6 +524,7 @@ pub fn pivot_rootfs<P: ?Sized + NixPath + std::fmt::Debug>(path: &P) -> Result<(
fn rootfs_parent_mount_private(path: &str) -> Result<()> {
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let mut max_len = 0;
let mut mount_point = String::from("");
let mut options = String::from("");
@@ -769,9 +767,9 @@ fn mount_from(
rootfs: &str,
flags: MsFlags,
data: &str,
label: &str,
_label: &str,
) -> Result<()> {
let mut d = String::from(data);
let d = String::from(data);
let dest = secure_join(rootfs, &m.destination);
let src = if m.r#type.as_str() == "bind" {
@@ -782,7 +780,7 @@ fn mount_from(
Path::new(&dest).parent().unwrap()
};
fs::create_dir_all(dir).map_err(|e| {
fs::create_dir_all(&dir).map_err(|e| {
log_child!(
cfd_log,
"create dir {}: {}",
@@ -824,37 +822,6 @@ fn mount_from(
e
})?;
// Set the SELinux context for the mounts
let mut use_xattr = false;
if !label.is_empty() {
if selinux::is_enabled()? {
let device = Path::new(&m.source)
.file_name()
.ok_or_else(|| anyhow!("invalid device source path: {}", &m.source))?
.to_str()
.ok_or_else(|| anyhow!("failed to convert device source path: {}", &m.source))?;
match device {
// SELinux does not support labeling of /proc or /sys
"proc" | "sysfs" => (),
// SELinux does not support mount labeling against /dev/mqueue,
// so we use setxattr instead
"mqueue" => {
use_xattr = true;
}
_ => {
log_child!(cfd_log, "add SELinux mount label to {}", dest.as_str());
selinux::add_mount_label(&mut d, label);
}
}
} else {
log_child!(
cfd_log,
"SELinux label for the mount is provided but SELinux is not enabled on the running kernel"
);
}
}
mount(
Some(src.as_str()),
dest.as_str(),
@@ -867,10 +834,6 @@ fn mount_from(
e
})?;
if !label.is_empty() && selinux::is_enabled()? && use_xattr {
xattr::set(dest.as_str(), "security.selinux", label.as_bytes())?;
}
if flags.contains(MsFlags::MS_BIND)
&& flags.intersects(
!(MsFlags::MS_REC

View File

@@ -63,7 +63,7 @@ pub fn get_unknown_syscalls(scmp: &LinuxSeccomp) -> Option<Vec<String>> {
// init_seccomp creates a seccomp filter and loads it for the current process
// including all the child processes.
pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM))?;
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM as i32))?;
// Create a new filter context
let mut filter = ScmpFilterContext::new_filter(def_action)?;

View File

@@ -1,80 +0,0 @@
// Copyright 2022 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{Context, Result};
use nix::unistd::gettid;
use std::fs::{self, OpenOptions};
use std::io::prelude::*;
use std::path::Path;
pub fn is_enabled() -> Result<bool> {
let buf = fs::read_to_string("/proc/mounts")?;
let enabled = buf.contains("selinuxfs");
Ok(enabled)
}
pub fn add_mount_label(data: &mut String, label: &str) {
if data.is_empty() {
let context = format!("context=\"{}\"", label);
data.push_str(&context);
} else {
let context = format!(",context=\"{}\"", label);
data.push_str(&context);
}
}
pub fn set_exec_label(label: &str) -> Result<()> {
let mut attr_path = Path::new("/proc/thread-self/attr/exec").to_path_buf();
if !attr_path.exists() {
// Fall back to the old convention
attr_path = Path::new("/proc/self/task")
.join(gettid().to_string())
.join("attr/exec")
}
let mut file = OpenOptions::new()
.write(true)
.truncate(true)
.open(attr_path)?;
file.write_all(label.as_bytes())
.with_context(|| "failed to apply SELinux label")?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
const TEST_LABEL: &str = "system_u:system_r:unconfined_t:s0";
#[test]
fn test_is_enabled() {
let ret = is_enabled();
assert!(ret.is_ok(), "Expecting Ok, Got {:?}", ret);
}
#[test]
fn test_add_mount_label() {
let mut data = String::new();
add_mount_label(&mut data, TEST_LABEL);
assert_eq!(data, format!("context=\"{}\"", TEST_LABEL));
let mut data = String::from("defaults");
add_mount_label(&mut data, TEST_LABEL);
assert_eq!(data, format!("defaults,context=\"{}\"", TEST_LABEL));
}
#[test]
fn test_set_exec_label() {
let ret = set_exec_label(TEST_LABEL);
if is_enabled().unwrap() {
assert!(ret.is_ok(), "Expecting Ok, Got {:?}", ret);
} else {
assert!(ret.is_err(), "Expecting error, Got {:?}", ret);
}
}
}

View File

@@ -6,7 +6,6 @@
use crate::container::Config;
use anyhow::{anyhow, Context, Result};
use oci::{Linux, LinuxIdMapping, LinuxNamespace, Spec};
use regex::Regex;
use std::collections::HashMap;
use std::path::{Component, PathBuf};
@@ -87,23 +86,6 @@ fn hostname(oci: &Spec) -> Result<()> {
fn security(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let label_pattern = r".*_u:.*_r:.*_t:s[0-9]|1[0-5].*";
let label_regex = Regex::new(label_pattern)?;
if let Some(ref process) = oci.process {
if !process.selinux_label.is_empty() && !label_regex.is_match(&process.selinux_label) {
return Err(anyhow!(
"SELinux label for the process is invalid format: {}",
&process.selinux_label
));
}
}
if !linux.mount_label.is_empty() && !label_regex.is_match(&linux.mount_label) {
return Err(anyhow!(
"SELinux label for the mount is invalid format: {}",
&linux.mount_label
));
}
if linux.masked_paths.is_empty() && linux.readonly_paths.is_empty() {
return Ok(());
@@ -113,6 +95,8 @@ fn security(oci: &Spec) -> Result<()> {
return Err(anyhow!("Linux namespace does not contain mount"));
}
// don't care about selinux at present
Ok(())
}
@@ -301,7 +285,7 @@ pub fn validate(conf: &Config) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use oci::{Mount, Process};
use oci::Mount;
#[test]
fn test_namespace() {
@@ -404,29 +388,6 @@ mod tests {
];
spec.linux = Some(linux);
security(&spec).unwrap();
// SELinux
let valid_label = "system_u:system_r:container_t:s0:c123,c456";
let mut process = Process::default();
process.selinux_label = valid_label.to_string();
spec.process = Some(process);
security(&spec).unwrap();
let mut linux = Linux::default();
linux.mount_label = valid_label.to_string();
spec.linux = Some(linux);
security(&spec).unwrap();
let invalid_label = "system_u:system_r:container_t";
let mut process = Process::default();
process.selinux_label = invalid_label.to_string();
spec.process = Some(process);
security(&spec).unwrap_err();
let mut linux = Linux::default();
linux.mount_label = invalid_label.to_string();
spec.linux = Some(linux);
security(&spec).unwrap_err();
}
#[test]

View File

@@ -414,7 +414,7 @@ fn scan_scsi_bus(scsi_addr: &str) -> Result<()> {
// Scan scsi host passing in the channel, SCSI id and LUN.
// Channel is always 0 because we have only one SCSI controller.
let scan_data = &format!("0 {} {}", tokens[0], tokens[1]);
let scan_data = format!("0 {} {}", tokens[0], tokens[1]);
for entry in fs::read_dir(SYSFS_SCSI_HOST_PATH)? {
let host = entry?.file_name();
@@ -428,7 +428,7 @@ fn scan_scsi_bus(scsi_addr: &str) -> Result<()> {
let scan_path = PathBuf::from(&format!("{}/{}/{}", SYSFS_SCSI_HOST_PATH, host_str, "scan"));
fs::write(scan_path, scan_data)?;
fs::write(scan_path, &scan_data)?;
}
Ok(())
@@ -1531,7 +1531,7 @@ mod tests {
pci_driver_override(syspci, dev0, "drv_b").unwrap();
assert_eq!(fs::read_to_string(&dev0override).unwrap(), "drv_b");
assert_eq!(fs::read_to_string(&probepath).unwrap(), dev0.to_string());
assert_eq!(fs::read_to_string(drvaunbind).unwrap(), dev0.to_string());
assert_eq!(fs::read_to_string(&drvaunbind).unwrap(), dev0.to_string());
}
#[test]
@@ -1543,7 +1543,7 @@ mod tests {
let dev0 = pci::Address::new(0, 0, pci::SlotFn::new(0, 0).unwrap());
let dev0path = syspci.join("devices").join(dev0.to_string());
fs::create_dir_all(dev0path).unwrap();
fs::create_dir_all(&dev0path).unwrap();
// Test dev0
assert!(pci_iommu_group(&syspci, dev0).unwrap().is_none());
@@ -1554,7 +1554,7 @@ mod tests {
let dev1group = dev1path.join("iommu_group");
fs::create_dir_all(&dev1path).unwrap();
std::os::unix::fs::symlink("../../../kernel/iommu_groups/12", dev1group).unwrap();
std::os::unix::fs::symlink("../../../kernel/iommu_groups/12", &dev1group).unwrap();
// Test dev1
assert_eq!(
@@ -1567,7 +1567,7 @@ mod tests {
let dev2path = syspci.join("devices").join(dev2.to_string());
let dev2group = dev2path.join("iommu_group");
fs::create_dir_all(dev2group).unwrap();
fs::create_dir_all(&dev2group).unwrap();
// Test dev2
assert!(pci_iommu_group(&syspci, dev2).is_err());

View File

@@ -339,7 +339,7 @@ async fn start_sandbox(
sandbox.lock().await.sender = Some(tx);
// vsock:///dev/vsock, port
let mut server = rpc::start(sandbox.clone(), config.server_addr.as_str(), init_mode)?;
let mut server = rpc::start(sandbox.clone(), config.server_addr.as_str())?;
server.start().await?;
rx.await?;
@@ -436,8 +436,9 @@ mod tests {
let msg = format!("test[{}]: {:?}", i, d);
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
// XXX: Never try to close rfd, because it will be closed by PipeStream in
// create_logger_task() and it's not safe to close the same fd twice time.
// rfd is closed by the use of PipeStream in the crate_logger_task function,
// but we will attempt to close in case of a failure
let _ = unistd::close(rfd);
unistd::close(wfd).unwrap();
});

View File

@@ -5,11 +5,10 @@
extern crate procfs;
use prometheus::{Encoder, Gauge, GaugeVec, IntCounter, Opts, Registry, TextEncoder};
use prometheus::{Encoder, Gauge, GaugeVec, IntCounter, TextEncoder};
use anyhow::{anyhow, Result};
use anyhow::Result;
use slog::warn;
use std::sync::Mutex;
use tracing::instrument;
const NAMESPACE_KATA_AGENT: &str = "kata_agent";
@@ -24,70 +23,55 @@ macro_rules! sl {
lazy_static! {
static ref REGISTERED: Mutex<bool> = Mutex::new(false);
static ref AGENT_SCRAPE_COUNT: IntCounter =
prometheus::register_int_counter!(format!("{}_{}",NAMESPACE_KATA_AGENT,"scrape_count"), "Metrics scrape count").unwrap();
// custom registry
static ref REGISTRY: Registry = Registry::new();
static ref AGENT_THREADS: Gauge =
prometheus::register_gauge!(format!("{}_{}",NAMESPACE_KATA_AGENT,"threads"), "Agent process threads").unwrap();
static ref AGENT_SCRAPE_COUNT: IntCounter =
IntCounter::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"scrape_count"), "Metrics scrape count").unwrap();
static ref AGENT_TOTAL_TIME: Gauge =
prometheus::register_gauge!(format!("{}_{}",NAMESPACE_KATA_AGENT,"total_time"), "Agent process total time").unwrap();
// agent metrics
static ref AGENT_THREADS: Gauge =
Gauge::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"threads"), "Agent process threads").unwrap();
static ref AGENT_TOTAL_VM: Gauge =
prometheus::register_gauge!(format!("{}_{}",NAMESPACE_KATA_AGENT,"total_vm"), "Agent process total VM size").unwrap();
static ref AGENT_TOTAL_TIME: Gauge =
Gauge::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"total_time"), "Agent process total time").unwrap();
static ref AGENT_TOTAL_RSS: Gauge =
prometheus::register_gauge!(format!("{}_{}",NAMESPACE_KATA_AGENT,"total_rss"), "Agent process total RSS size").unwrap();
static ref AGENT_TOTAL_VM: Gauge =
Gauge::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"total_vm"), "Agent process total VM size").unwrap() ;
static ref AGENT_PROC_STATUS: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_AGENT,"proc_status"), "Agent process status.", &["item"]).unwrap();
static ref AGENT_TOTAL_RSS: Gauge =
Gauge::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"total_rss"), "Agent process total RSS size").unwrap();
static ref AGENT_IO_STAT: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_AGENT,"io_stat"), "Agent process IO statistics.", &["item"]).unwrap();
static ref AGENT_PROC_STATUS: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"proc_status"), "Agent process status."), &["item"]).unwrap();
static ref AGENT_IO_STAT: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"io_stat"), "Agent process IO statistics."), &["item"]).unwrap();
static ref AGENT_PROC_STAT: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_AGENT,"proc_stat"), "Agent process statistics."), &["item"]).unwrap();
static ref AGENT_PROC_STAT: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_AGENT,"proc_stat"), "Agent process statistics.", &["item"]).unwrap();
// guest os metrics
static ref GUEST_LOAD: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"load"), "Guest system load."), &["item"]).unwrap();
static ref GUEST_LOAD: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"load") , "Guest system load.", &["item"]).unwrap();
static ref GUEST_TASKS: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"tasks"), "Guest system load."), &["item"]).unwrap();
static ref GUEST_TASKS: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"tasks") , "Guest system load.", &["item"]).unwrap();
static ref GUEST_CPU_TIME: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"cpu_time"), "Guest CPU statistics."), &["cpu","item"]).unwrap();
static ref GUEST_CPU_TIME: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"cpu_time") , "Guest CPU statistics.", &["cpu","item"]).unwrap();
static ref GUEST_VM_STAT: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"vm_stat"), "Guest virtual memory statistics."), &["item"]).unwrap();
static ref GUEST_VM_STAT: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"vm_stat") , "Guest virtual memory statistics.", &["item"]).unwrap();
static ref GUEST_NETDEV_STAT: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"netdev_stat"), "Guest net devices statistics."), &["interface","item"]).unwrap();
static ref GUEST_NETDEV_STAT: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"netdev_stat") , "Guest net devices statistics.", &["interface","item"]).unwrap();
static ref GUEST_DISKSTAT: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"diskstat"), "Disks statistics in system."), &["disk","item"]).unwrap();
static ref GUEST_DISKSTAT: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"diskstat") , "Disks statistics in system.", &["disk","item"]).unwrap();
static ref GUEST_MEMINFO: GaugeVec =
GaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_GUEST,"meminfo"), "Statistics about memory usage in the system."), &["item"]).unwrap();
static ref GUEST_MEMINFO: GaugeVec =
prometheus::register_gauge_vec!(format!("{}_{}",NAMESPACE_KATA_GUEST,"meminfo") , "Statistics about memory usage in the system.", &["item"]).unwrap();
}
#[instrument]
pub fn get_metrics(_: &protocols::agent::GetMetricsRequest) -> Result<String> {
let mut registered = REGISTERED
.lock()
.map_err(|e| anyhow!("failed to check agent metrics register status {:?}", e))?;
if !(*registered) {
register_metrics()?;
*registered = true;
}
AGENT_SCRAPE_COUNT.inc();
// update agent process metrics
@@ -97,7 +81,7 @@ pub fn get_metrics(_: &protocols::agent::GetMetricsRequest) -> Result<String> {
update_guest_metrics();
// gather all metrics and return as a String
let metric_families = REGISTRY.gather();
let metric_families = prometheus::gather();
let mut buffer = Vec::new();
let encoder = TextEncoder::new();
@@ -106,31 +90,6 @@ pub fn get_metrics(_: &protocols::agent::GetMetricsRequest) -> Result<String> {
Ok(String::from_utf8(buffer)?)
}
#[instrument]
fn register_metrics() -> Result<()> {
REGISTRY.register(Box::new(AGENT_SCRAPE_COUNT.clone()))?;
// agent metrics
REGISTRY.register(Box::new(AGENT_THREADS.clone()))?;
REGISTRY.register(Box::new(AGENT_TOTAL_TIME.clone()))?;
REGISTRY.register(Box::new(AGENT_TOTAL_VM.clone()))?;
REGISTRY.register(Box::new(AGENT_TOTAL_RSS.clone()))?;
REGISTRY.register(Box::new(AGENT_PROC_STATUS.clone()))?;
REGISTRY.register(Box::new(AGENT_IO_STAT.clone()))?;
REGISTRY.register(Box::new(AGENT_PROC_STAT.clone()))?;
// guest metrics
REGISTRY.register(Box::new(GUEST_LOAD.clone()))?;
REGISTRY.register(Box::new(GUEST_TASKS.clone()))?;
REGISTRY.register(Box::new(GUEST_CPU_TIME.clone()))?;
REGISTRY.register(Box::new(GUEST_VM_STAT.clone()))?;
REGISTRY.register(Box::new(GUEST_NETDEV_STAT.clone()))?;
REGISTRY.register(Box::new(GUEST_DISKSTAT.clone()))?;
REGISTRY.register(Box::new(GUEST_MEMINFO.clone()))?;
Ok(())
}
#[instrument]
fn update_agent_metrics() -> Result<()> {
let me = procfs::process::Process::myself();

View File

@@ -648,7 +648,7 @@ pub fn recursive_ownership_change(
) -> Result<()> {
let mut mask = if read_only { RO_MASK } else { RW_MASK };
if path.is_dir() {
for entry in fs::read_dir(path)? {
for entry in fs::read_dir(&path)? {
recursive_ownership_change(entry?.path().as_path(), uid, gid, read_only)?;
}
mask |= EXEC_MASK;
@@ -894,7 +894,7 @@ pub fn get_cgroup_mounts(
}]);
}
let file = File::open(cg_path)?;
let file = File::open(&cg_path)?;
let reader = BufReader::new(file);
let mut has_device_cgroup = false;
@@ -1777,7 +1777,7 @@ mod tests {
let tempdir = tempdir().unwrap();
let src = if d.mask_src {
tempdir.path().join(d.src)
tempdir.path().join(&d.src)
} else {
Path::new(d.src).to_path_buf()
};

View File

@@ -78,7 +78,6 @@ impl Namespace {
// setup creates persistent namespace without switching to it.
// Note, pid namespaces cannot be persisted.
#[instrument]
#[allow(clippy::question_mark)]
pub async fn setup(mut self) -> Result<Self> {
fs::create_dir_all(&self.persistent_ns_dir)?;
@@ -89,7 +88,7 @@ impl Namespace {
}
let logger = self.logger.clone();
let new_ns_path = ns_path.join(ns_type.get());
let new_ns_path = ns_path.join(&ns_type.get());
File::create(new_ns_path.as_path())?;
@@ -103,7 +102,7 @@ impl Namespace {
let source = Path::new(&origin_ns_path);
let destination = new_ns_path.as_path();
File::open(source)?;
File::open(&source)?;
// Create a new netns on the current thread.
let cf = ns_type.get_flags();

View File

@@ -529,9 +529,7 @@ impl Handle {
.map_err(|e| anyhow!("Failed to parse IP {}: {:?}", ip_address, e))?;
// Import rtnetlink objects that make sense only for this function
use packet::constants::{
NDA_UNSPEC, NLM_F_ACK, NLM_F_CREATE, NLM_F_REPLACE, NLM_F_REQUEST,
};
use packet::constants::{NDA_UNSPEC, NLM_F_ACK, NLM_F_CREATE, NLM_F_EXCL, NLM_F_REQUEST};
use packet::neighbour::{NeighbourHeader, NeighbourMessage};
use packet::nlas::neighbour::Nla;
use packet::{NetlinkMessage, NetlinkPayload, RtnlMessage};
@@ -574,7 +572,7 @@ impl Handle {
// Send request and ACK
let mut req = NetlinkMessage::from(RtnlMessage::NewNeighbour(message));
req.header.flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_REPLACE;
req.header.flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
let mut response = self.handle.request(req)?;
while let Some(message) = response.next().await {
@@ -946,13 +944,13 @@ mod tests {
fn clean_env_for_test_add_one_arp_neighbor(dummy_name: &str, ip: &str) {
// ip link delete dummy
Command::new("ip")
.args(["link", "delete", dummy_name])
.args(&["link", "delete", dummy_name])
.output()
.expect("prepare: failed to delete dummy");
// ip neigh del dev dummy ip
Command::new("ip")
.args(["neigh", "del", dummy_name, ip])
.args(&["neigh", "del", dummy_name, ip])
.output()
.expect("prepare: failed to delete neigh");
}
@@ -967,19 +965,19 @@ mod tests {
// ip link add dummy type dummy
Command::new("ip")
.args(["link", "add", dummy_name, "type", "dummy"])
.args(&["link", "add", dummy_name, "type", "dummy"])
.output()
.expect("failed to add dummy interface");
// ip addr add 192.168.0.2/16 dev dummy
Command::new("ip")
.args(["addr", "add", "192.168.0.2/16", "dev", dummy_name])
.args(&["addr", "add", "192.168.0.2/16", "dev", dummy_name])
.output()
.expect("failed to add ip for dummy");
// ip link set dummy up;
Command::new("ip")
.args(["link", "set", dummy_name, "up"])
.args(&["link", "set", dummy_name, "up"])
.output()
.expect("failed to up dummy");
}
@@ -1011,7 +1009,7 @@ mod tests {
// ip neigh show dev dummy ip
let stdout = Command::new("ip")
.args(["neigh", "show", "dev", dummy_name, to_ip])
.args(&["neigh", "show", "dev", dummy_name, to_ip])
.output()
.expect("failed to show neigh")
.stdout;

View File

@@ -64,7 +64,7 @@ fn do_setup_guest_dns(logger: Logger, dns_list: Vec<String>, src: &str, dst: &st
.map(|x| x.trim())
.collect::<Vec<&str>>()
.join("\n");
fs::write(src, content)?;
fs::write(src, &content)?;
// bind mount to /etc/resolv.conf
mount::mount(Some(src), dst, Some("bind"), MsFlags::MS_BIND, None::<&str>)

View File

@@ -36,7 +36,7 @@ use protocols::health::{
use protocols::types::Interface;
use protocols::{agent_ttrpc_async as agent_ttrpc, health_ttrpc_async as health_ttrpc};
use rustjail::cgroups::notifier;
use rustjail::container::{BaseContainer, Container, LinuxContainer, SYSTEMD_CGROUP_PATH_FORMAT};
use rustjail::container::{BaseContainer, Container, LinuxContainer};
use rustjail::process::Process;
use rustjail::specconv::CreateOpts;
@@ -44,6 +44,7 @@ use nix::errno::Errno;
use nix::mount::MsFlags;
use nix::sys::{stat, statfs};
use nix::unistd::{self, Pid};
use rustjail::cgroups::Manager;
use rustjail::process::ProcessOperations;
use crate::device::{
@@ -83,15 +84,9 @@ use std::path::PathBuf;
const CONTAINER_BASE: &str = "/run/kata-containers";
const MODPROBE_PATH: &str = "/sbin/modprobe";
/// the iptables seriers binaries could appear either in /sbin
/// or /usr/sbin, we need to check both of them
const USR_IPTABLES_SAVE: &str = "/usr/sbin/iptables-save";
const IPTABLES_SAVE: &str = "/sbin/iptables-save";
const USR_IPTABLES_RESTORE: &str = "/usr/sbin/iptables-store";
const IPTABLES_RESTORE: &str = "/sbin/iptables-restore";
const USR_IP6TABLES_SAVE: &str = "/usr/sbin/ip6tables-save";
const IP6TABLES_SAVE: &str = "/sbin/ip6tables-save";
const USR_IP6TABLES_RESTORE: &str = "/usr/sbin/ip6tables-save";
const IP6TABLES_RESTORE: &str = "/sbin/ip6tables-restore";
const ERR_CANNOT_GET_WRITER: &str = "Cannot get writer";
@@ -137,7 +132,6 @@ macro_rules! is_allowed {
#[derive(Clone, Debug)]
pub struct AgentService {
sandbox: Arc<Mutex<Sandbox>>,
init_mode: bool,
}
impl AgentService {
@@ -211,20 +205,9 @@ impl AgentService {
// restore the cwd for kata-agent process.
defer!(unistd::chdir(&olddir).unwrap());
// determine which cgroup driver to take and then assign to use_systemd_cgroup
// systemd: "[slice]:[prefix]:[name]"
// fs: "/path_a/path_b"
// If agent is init we can't use systemd cgroup mode, no matter what the host tells us
let cgroups_path = oci.linux.as_ref().map_or("", |linux| &linux.cgroups_path);
let use_systemd_cgroup = if self.init_mode {
false
} else {
SYSTEMD_CGROUP_PATH_FORMAT.is_match(cgroups_path)
};
let opts = CreateOpts {
cgroup_name: "".to_string(),
use_systemd_cgroup,
use_systemd_cgroup: false,
no_pivot_root: s.no_pivot_root,
no_new_keyring: false,
spec: Some(oci.clone()),
@@ -283,13 +266,14 @@ impl AgentService {
}
// start oom event loop
if let Some(ref ctr) = ctr.cgroup_manager {
let cg_path = ctr.get_cg_path("memory");
let cg_path = ctr.cgroup_manager.as_ref().get_cgroup_path("memory");
if let Some(cg_path) = cg_path {
let rx = notifier::notify_oom(cid.as_str(), cg_path.to_string()).await?;
if let Ok(cg_path) = cg_path {
let rx = notifier::notify_oom(cid.as_str(), cg_path.to_string()).await?;
s.run_oom_event_monitor(rx, cid.clone()).await;
s.run_oom_event_monitor(rx, cid.clone()).await;
}
}
Ok(())
@@ -393,7 +377,6 @@ impl AgentService {
"signal process";
"container-id" => cid.clone(),
"exec-id" => eid.clone(),
"signal" => req.signal,
);
let mut sig: libc::c_int = req.signal as libc::c_int;
@@ -476,7 +459,11 @@ impl AgentService {
let ctr = sandbox
.get_container(cid)
.ok_or_else(|| anyhow!("Invalid container id {}", cid))?;
ctr.cgroup_manager.as_ref().freeze(state)?;
let cm = ctr
.cgroup_manager
.as_ref()
.ok_or_else(|| anyhow!("cgroup manager not exist"))?;
cm.freeze(state)?;
Ok(())
}
@@ -486,7 +473,11 @@ impl AgentService {
let ctr = sandbox
.get_container(cid)
.ok_or_else(|| anyhow!("Invalid container id {}", cid))?;
let pids = ctr.cgroup_manager.as_ref().get_pids()?;
let cm = ctr
.cgroup_manager
.as_ref()
.ok_or_else(|| anyhow!("cgroup manager not exist"))?;
let pids = cm.get_pids()?;
Ok(pids)
}
@@ -1007,18 +998,8 @@ impl agent_ttrpc::AgentService for AgentService {
info!(sl!(), "get_ip_tables: request received");
// the binary could exists in either /usr/sbin or /sbin
// here check both of the places and return the one exists
// if none exists, return the /sbin one, and the rpc will
// returns an internal error
let cmd = if req.is_ipv6 {
if Path::new(USR_IP6TABLES_SAVE).exists() {
USR_IP6TABLES_SAVE
} else {
IP6TABLES_SAVE
}
} else if Path::new(USR_IPTABLES_SAVE).exists() {
USR_IPTABLES_SAVE
IP6TABLES_SAVE
} else {
IPTABLES_SAVE
}
@@ -1046,18 +1027,8 @@ impl agent_ttrpc::AgentService for AgentService {
info!(sl!(), "set_ip_tables request received");
// the binary could exists in both /usr/sbin and /sbin
// here check both of the places and return the one exists
// if none exists, return the /sbin one, and the rpc will
// returns an internal error
let cmd = if req.is_ipv6 {
if Path::new(USR_IP6TABLES_RESTORE).exists() {
USR_IP6TABLES_RESTORE
} else {
IP6TABLES_RESTORE
}
} else if Path::new(USR_IPTABLES_RESTORE).exists() {
USR_IPTABLES_RESTORE
IP6TABLES_RESTORE
} else {
IPTABLES_RESTORE
}
@@ -1685,11 +1656,9 @@ async fn read_stream(reader: Arc<Mutex<ReadHalf<PipeStream>>>, l: usize) -> Resu
Ok(content)
}
pub fn start(s: Arc<Mutex<Sandbox>>, server_address: &str, init_mode: bool) -> Result<TtrpcServer> {
let agent_service = Box::new(AgentService {
sandbox: s,
init_mode,
}) as Box<dyn agent_ttrpc::AgentService + Send + Sync>;
pub fn start(s: Arc<Mutex<Sandbox>>, server_address: &str) -> Result<TtrpcServer> {
let agent_service =
Box::new(AgentService { sandbox: s }) as Box<dyn agent_ttrpc::AgentService + Send + Sync>;
let agent_worker = Arc::new(agent_service);
@@ -2165,7 +2134,6 @@ mod tests {
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
init_mode: true,
});
let req = protocols::agent::UpdateInterfaceRequest::default();
@@ -2183,7 +2151,6 @@ mod tests {
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
init_mode: true,
});
let req = protocols::agent::UpdateRoutesRequest::default();
@@ -2201,7 +2168,6 @@ mod tests {
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
init_mode: true,
});
let req = protocols::agent::AddARPNeighborsRequest::default();
@@ -2335,7 +2301,6 @@ mod tests {
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
init_mode: true,
});
let result = agent_service
@@ -2791,32 +2756,22 @@ OtherField:other
async fn test_ip_tables() {
skip_if_not_root!();
let iptables_cmd_list = [
USR_IPTABLES_SAVE,
USR_IP6TABLES_SAVE,
USR_IPTABLES_RESTORE,
USR_IP6TABLES_RESTORE,
IPTABLES_SAVE,
IP6TABLES_SAVE,
IPTABLES_RESTORE,
IP6TABLES_RESTORE,
];
for cmd in iptables_cmd_list {
if !check_command(cmd) {
warn!(
sl!(),
"one or more commands for ip tables test are missing, skip it"
);
return;
}
if !check_command(IPTABLES_SAVE)
|| !check_command(IPTABLES_RESTORE)
|| !check_command(IP6TABLES_SAVE)
|| !check_command(IP6TABLES_RESTORE)
{
warn!(
sl!(),
"one or more commands for ip tables test are missing, skip it"
);
return;
}
let logger = slog::Logger::root(slog::Discard, o!());
let sandbox = Sandbox::new(&logger).unwrap();
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
init_mode: true,
});
let ctx = mk_ttrpc_context();
@@ -2922,7 +2877,7 @@ COMMIT
.unwrap();
assert!(!result.data.is_empty(), "we should have non-zero output:");
assert!(
std::str::from_utf8(&result.data).unwrap().contains(
std::str::from_utf8(&*result.data).unwrap().contains(
"PREROUTING -d 192.168.103.153/32 -j DNAT --to-destination 192.168.188.153"
),
"We should see the resulting rule"
@@ -2960,7 +2915,7 @@ COMMIT
.unwrap();
assert!(!result.data.is_empty(), "we should have non-zero output:");
assert!(
std::str::from_utf8(&result.data)
std::str::from_utf8(&*result.data)
.unwrap()
.contains("INPUT -s 2001:db8:100::1/128 -i sit+ -p tcp -m tcp --sport 512:65535"),
"We should see the resulting rule"

View File

@@ -296,6 +296,7 @@ impl Sandbox {
info!(self.logger, "updating {}", ctr.id.as_str());
ctr.cgroup_manager
.as_ref()
.unwrap()
.update_cpuset_path(guest_cpuset.as_str(), container_cpust)?;
}
@@ -326,7 +327,7 @@ impl Sandbox {
// Reject non-file, symlinks and non-executable files
if !entry.file_type()?.is_file()
|| entry.file_type()?.is_symlink()
|| entry.metadata()?.permissions().mode() & 0o111 == 0
|| entry.metadata()?.permissions().mode() & 0o777 & 0o111 == 0
{
continue;
}
@@ -1072,7 +1073,7 @@ mod tests {
fs::create_dir(&subdir_path).unwrap();
for file in j.files {
let subfile_path = format!("{}/{}", subdir_path, file.name);
let mut subfile = File::create(subfile_path).unwrap();
let mut subfile = File::create(&subfile_path).unwrap();
subfile.write_all(file.content.as_bytes()).unwrap();
}
}

View File

@@ -24,7 +24,7 @@ async fn handle_sigchild(logger: Logger, sandbox: Arc<Mutex<Sandbox>>) -> Result
loop {
// Avoid reaping the undesirable child's signal, e.g., execute_hook's
// The lock should be released immediately.
let _locker = rustjail::container::WAIT_PID_LOCKER.lock().await;
rustjail::container::WAIT_PID_LOCKER.lock().await;
let result = wait::waitpid(
Some(Pid::from_raw(-1)),
Some(WaitPidFlag::WNOHANG | WaitPidFlag::__WALL),

View File

@@ -11,7 +11,7 @@ use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::SystemTime;
use anyhow::{anyhow, ensure, Context, Result};
use anyhow::{ensure, Context, Result};
use async_recursion::async_recursion;
use nix::mount::{umount, MsFlags};
use nix::unistd::{Gid, Uid};
@@ -34,13 +34,9 @@ const MAX_SIZE_PER_WATCHABLE_MOUNT: u64 = 1024 * 1024;
/// How often to check for modified files.
const WATCH_INTERVAL_SECS: u64 = 2;
/// Destination path for tmpfs, which used by the golang runtime
/// Destination path for tmpfs
const WATCH_MOUNT_POINT_PATH: &str = "/run/kata-containers/shared/containers/watchable/";
/// Destination path for tmpfs for runtime-rs passthrough file sharing
const WATCH_MOUNT_POINT_PATH_PASSTHROUGH: &str =
"/run/kata-containers/shared/containers/passthrough/watchable/";
/// Represents a single watched storage entry which may have multiple files to watch.
#[derive(Default, Debug, Clone)]
struct Storage {
@@ -124,7 +120,7 @@ impl Storage {
// if we are creating a directory: just create it, nothing more to do
if metadata.file_type().is_dir() {
let dest_file_path = self.make_target_path(source_file_path)?;
let dest_file_path = self.make_target_path(&source_file_path)?;
fs::create_dir_all(&dest_file_path)
.await
@@ -152,7 +148,7 @@ impl Storage {
// Assume target mount is a file path
self.target_mount_point.clone()
} else {
let dest_file_path = self.make_target_path(source_file_path)?;
let dest_file_path = self.make_target_path(&source_file_path)?;
if let Some(path) = dest_file_path.parent() {
debug!(logger, "Creating destination directory: {}", path.display());
@@ -455,7 +451,7 @@ impl BindWatcher {
) -> Result<()> {
if self.watch_thread.is_none() {
// Virtio-fs shared path is RO by default, so we back the target-mounts by tmpfs.
self.mount(logger).await.context("mount watch directory")?;
self.mount(logger).await?;
// Spawn background thread to monitor changes
self.watch_thread = Some(Self::spawn_watcher(
@@ -504,28 +500,16 @@ impl BindWatcher {
}
async fn mount(&self, logger: &Logger) -> Result<()> {
// the watchable directory is created on the host side.
// here we can only check if it exist.
// first we will check the default WATCH_MOUNT_POINT_PATH,
// and then check WATCH_MOUNT_POINT_PATH_PASSTHROUGH
// in turn which are introduced by runtime-rs file sharing.
let watchable_dir = if Path::new(WATCH_MOUNT_POINT_PATH).is_dir() {
WATCH_MOUNT_POINT_PATH
} else if Path::new(WATCH_MOUNT_POINT_PATH_PASSTHROUGH).is_dir() {
WATCH_MOUNT_POINT_PATH_PASSTHROUGH
} else {
return Err(anyhow!("watchable mount source not found"));
};
fs::create_dir_all(WATCH_MOUNT_POINT_PATH).await?;
baremount(
Path::new("tmpfs"),
Path::new(watchable_dir),
Path::new(WATCH_MOUNT_POINT_PATH),
"tmpfs",
MsFlags::empty(),
"",
logger,
)
.context("baremount watchable mount path")?;
)?;
Ok(())
}
@@ -536,12 +520,7 @@ impl BindWatcher {
handle.abort();
}
// try umount watchable mount path in turn
if Path::new(WATCH_MOUNT_POINT_PATH).is_dir() {
let _ = umount(WATCH_MOUNT_POINT_PATH);
} else if Path::new(WATCH_MOUNT_POINT_PATH_PASSTHROUGH).is_dir() {
let _ = umount(WATCH_MOUNT_POINT_PATH_PASSTHROUGH);
}
let _ = umount(WATCH_MOUNT_POINT_PATH);
}
}
@@ -550,7 +529,6 @@ mod tests {
use super::*;
use crate::mount::is_mounted;
use nix::unistd::{Gid, Uid};
use scopeguard::defer;
use std::fs;
use std::thread;
use test_utils::skip_if_not_root;
@@ -778,7 +756,7 @@ mod tests {
22
);
assert_eq!(
fs::read_to_string(entries.0[0].target_mount_point.as_path().join("1.txt")).unwrap(),
fs::read_to_string(&entries.0[0].target_mount_point.as_path().join("1.txt")).unwrap(),
"updated"
);
@@ -823,7 +801,7 @@ mod tests {
2
);
assert_eq!(
fs::read_to_string(entries.0[1].target_mount_point.as_path().join("foo.txt")).unwrap(),
fs::read_to_string(&entries.0[1].target_mount_point.as_path().join("foo.txt")).unwrap(),
"updated"
);
@@ -1000,7 +978,7 @@ mod tests {
// create a path we'll remove later
fs::create_dir_all(source_dir.path().join("tmp")).unwrap();
fs::write(source_dir.path().join("tmp/test-file"), "foo").unwrap();
fs::write(&source_dir.path().join("tmp/test-file"), "foo").unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 3); // root, ./tmp, test-file
// Verify expected directory, file:
@@ -1291,26 +1269,19 @@ mod tests {
#[tokio::test]
#[serial]
#[cfg(not(target_arch = "aarch64"))]
async fn create_tmpfs() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
let mut watcher = BindWatcher::default();
for mount_point in [WATCH_MOUNT_POINT_PATH, WATCH_MOUNT_POINT_PATH_PASSTHROUGH] {
fs::create_dir_all(mount_point).unwrap();
// ensure the watchable directory is deleted.
defer!(fs::remove_dir_all(mount_point).unwrap());
watcher.mount(&logger).await.unwrap();
assert!(is_mounted(WATCH_MOUNT_POINT_PATH).unwrap());
watcher.mount(&logger).await.unwrap();
assert!(is_mounted(mount_point).unwrap());
thread::sleep(Duration::from_millis(20));
thread::sleep(Duration::from_millis(20));
watcher.cleanup();
assert!(!is_mounted(mount_point).unwrap());
}
watcher.cleanup();
assert!(!is_mounted(WATCH_MOUNT_POINT_PATH).unwrap());
}
#[tokio::test]
@@ -1318,10 +1289,6 @@ mod tests {
async fn spawn_thread() {
skip_if_not_root!();
fs::create_dir_all(WATCH_MOUNT_POINT_PATH).unwrap();
// ensure the watchable directory is deleted.
defer!(fs::remove_dir_all(WATCH_MOUNT_POINT_PATH).unwrap());
let source_dir = tempfile::tempdir().unwrap();
fs::write(source_dir.path().join("1.txt"), "one").unwrap();
@@ -1352,10 +1319,6 @@ mod tests {
async fn verify_container_cleanup_watching() {
skip_if_not_root!();
fs::create_dir_all(WATCH_MOUNT_POINT_PATH).unwrap();
// ensure the watchable directory is deleted.
defer!(fs::remove_dir_all(WATCH_MOUNT_POINT_PATH).unwrap());
let source_dir = tempfile::tempdir().unwrap();
fs::write(source_dir.path().join("1.txt"), "one").unwrap();

View File

@@ -18,4 +18,4 @@ bincode = "1.3.3"
byteorder = "1.4.3"
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_debug"] }
async-trait = "0.1.50"
tokio = "1.28.1"
tokio = "1.2.0"

1850
src/dragonball/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -12,21 +12,21 @@ edition = "2018"
[dependencies]
arc-swap = "1.5.0"
bytes = "1.1.0"
dbs-address-space = "0.2.0"
dbs-address-space = "0.1.0"
dbs-allocator = "0.1.0"
dbs-arch = "0.2.0"
dbs-boot = "0.3.0"
dbs-device = "0.2.0"
dbs-interrupt = { version = "0.2.0", features = ["kvm-irq"] }
dbs-arch = "0.1.0"
dbs-boot = "0.2.0"
dbs-device = "0.1.0"
dbs-interrupt = { version = "0.1.0", features = ["kvm-irq"] }
dbs-legacy-devices = "0.1.0"
dbs-upcall = { version = "0.1.0", optional = true }
dbs-utils = "0.2.0"
dbs-utils = "0.1.0"
dbs-virtio-devices = { version = "0.1.0", optional = true, features = ["virtio-mmio"] }
kvm-bindings = "0.5.0"
kvm-ioctls = "0.11.0"
lazy_static = "1.2"
libc = "0.2.39"
linux-loader = "0.6.0"
linux-loader = "0.4.0"
log = "0.4.14"
nix = "0.24.2"
seccompiler = "0.2.0"
@@ -36,21 +36,30 @@ serde_json = "1.0.9"
slog = "2.5.2"
slog-scope = "4.4.0"
thiserror = "1"
vmm-sys-util = "0.11.0"
virtio-queue = { version = "0.4.0", optional = true }
vm-memory = { version = "0.9.0", features = ["backend-mmap"] }
vmm-sys-util = "0.9.0"
virtio-queue = { version = "0.1.0", optional = true }
vm-memory = { version = "0.7.0", features = ["backend-mmap"] }
[dev-dependencies]
slog-term = "2.9.0"
slog-async = "2.7.0"
test-utils = { path = "../libs/test-utils" }
[features]
acpi = []
atomic-guest-memory = [ "vm-memory/backend-atomic" ]
atomic-guest-memory = []
hotplug = ["virtio-vsock"]
virtio-vsock = ["dbs-virtio-devices/virtio-vsock", "virtio-queue"]
virtio-blk = ["dbs-virtio-devices/virtio-blk", "virtio-queue"]
virtio-net = ["dbs-virtio-devices/virtio-net", "virtio-queue"]
# virtio-fs only work on atomic-guest-memory
virtio-fs = ["dbs-virtio-devices/virtio-fs", "virtio-queue", "atomic-guest-memory"]
[patch.'crates-io']
dbs-device = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-interrupt = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-legacy-devices = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-upcall = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-utils = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-virtio-devices = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-boot = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }
dbs-arch = { git = "https://github.com/openanolis/dragonball-sandbox.git", rev = "7a8e832b53d66994d6a16f0513d69f540583dcd0" }

View File

@@ -2,22 +2,12 @@
# Copyright (c) 2019-2022 Ant Group. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
include ../../utils.mk
ifeq ($(ARCH), s390x)
default build check test clippy:
@echo "s390x not support currently"
exit 0
else
default: build
build:
@echo "INFO: cargo build..."
cargo build --all-features --target $(TRIPLE)
static-checks-build:
@echo "INFO: static-checks-build do nothing.."
# FIXME: This line will be removed when we solve the vm-memory dependency problem in Dragonball Sandbox
cargo update -p vm-memory:0.8.0 --precise 0.7.0
cargo build --all-features
check: clippy format
@@ -27,9 +17,6 @@ clippy:
-- \
-D warnings
vendor:
@echo "INFO: vendor do nothing.."
format:
@echo "INFO: cargo fmt..."
cargo fmt -- --check
@@ -38,13 +25,5 @@ clean:
cargo clean
test:
ifdef SUPPORT_VIRTUALIZATION
cargo test --all-features --target $(TRIPLE) -- --nocapture
else
@echo "INFO: skip testing dragonball, it need virtualization support."
exit 0
endif
endif # ifeq ($(ARCH), s390x)
.DEFAULT_GOAL := default
@echo "INFO: testing dragonball for development build"
cargo test --all-features -- --nocapture

View File

@@ -19,7 +19,6 @@ and configuration process.
Device: [Device Document](docs/device.md)
vCPU: [vCPU Document](docs/vcpu.md)
API: [API Document](docs/api.md)
`Upcall`: [`Upcall` Document](docs/upcall.md)
Currently, the documents are still actively adding.
You could see the [official documentation](docs/) page for more details.

View File

@@ -1,177 +0,0 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xl="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.1" viewBox="51 242 818 479" width="818" height="479">
<defs>
<marker orient="auto" overflow="visible" markerUnits="strokeWidth" id="FilledArrow_Marker" stroke-linejoin="miter" stroke-miterlimit="10" viewBox="-1 -4 10 8" markerWidth="10" markerHeight="8" color="black">
<g>
<path d="M 8 0 L 0 -3 L 0 3 Z" fill="currentColor" stroke="currentColor" stroke-width="1"/>
</g>
</marker>
<marker orient="auto" overflow="visible" markerUnits="strokeWidth" id="FilledArrow_Marker_2" stroke-linejoin="miter" stroke-miterlimit="10" viewBox="-9 -4 10 8" markerWidth="10" markerHeight="8" color="black">
<g>
<path d="M -8 0 L 0 3 L 0 -3 Z" fill="currentColor" stroke="currentColor" stroke-width="1"/>
</g>
</marker>
</defs>
<g id="Canvas_1" fill="none" fill-opacity="1" stroke="none" stroke-opacity="1" stroke-dasharray="none">
<title>Canvas 1</title>
<rect fill="white" x="51" y="242" width="818" height="479"/>
<g id="Canvas_1_Layer_1">
<title>Layer 1</title>
<g id="Line_4">
<line x1="153" y1="279.5" x2="856.1097" y2="279.5" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_5">
<text transform="translate(56 247.5)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">Guest User</tspan>
</text>
</g>
<g id="Graphic_6">
<text transform="translate(56 286)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">Guest Kernel</tspan>
</text>
</g>
<g id="Line_7">
<line x1="153" y1="592" x2="856.1097" y2="592" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_8">
<text transform="translate(62.76 597.5)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="7531753e-19" y="17">Hypervisor</tspan>
</text>
</g>
<g id="Graphic_10">
<path d="M 264 328 L 347.456 328 C 354.0834 328 359.456 333.3726 359.456 340 L 359.456 524.5 C 359.456 531.1274 354.0834 536.5 347.456 536.5 L 264 536.5 C 257.37258 536.5 252 531.1274 252 524.5 L 252 340 C 252 333.3726 257.37258 328 264 328 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_11">
<text transform="translate(276.776 333)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">socket</tspan>
</text>
</g>
<g id="Graphic_12">
<path d="M 582 294.5 L 672 294.5 C 678.6274 294.5 684 299.8726 684 306.5 L 684 354 C 684 360.6274 678.6274 366 672 366 L 582 366 C 575.3726 366 570 360.6274 570 354 L 570 306.5 C 570 299.8726 575.3726 294.5 582 294.5 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(575 302.578)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="27.704" y="15">Device </tspan>
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="20.44" y="33.448">Manager</tspan>
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="25.488" y="51.895996">Service</tspan>
</text>
</g>
<g id="Graphic_13">
<text transform="translate(284.824 374)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="8135714e-19" y="17">bind</tspan>
</text>
</g>
<g id="Graphic_14">
<text transform="translate(280.528 416.25)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">listen</tspan>
</text>
</g>
<g id="Graphic_15">
<text transform="translate(274.92 459.5)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">accept</tspan>
</text>
</g>
<g id="Graphic_16">
<text transform="translate(256.372 503.5)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="8668621e-19" y="17">new kthread</tspan>
</text>
</g>
<g id="Graphic_17">
<path d="M 268 566.5 L 807.5 566.5 C 813.0228 566.5 817.5 570.97715 817.5 576.5 L 817.5 576.5 C 817.5 582.02285 813.0228 586.5 807.5 586.5 L 268 586.5 C 262.47715 586.5 258 582.02285 258 576.5 L 258 576.5 C 258 570.97715 262.47715 566.5 268 566.5 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(263 567.276)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="226.454" y="15">virtio-vsocket</tspan>
</text>
</g>
<g id="Graphic_18">
<path d="M 268 598.5 L 807.5 598.5 C 813.0228 598.5 817.5 602.97715 817.5 608.5 L 817.5 608.5 C 817.5 614.02285 813.0228 618.5 807.5 618.5 L 268 618.5 C 262.47715 618.5 258 614.02285 258 608.5 L 258 608.5 C 258 602.97715 262.47715 598.5 268 598.5 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(263 599.276)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="193.254" y="15">virtio-vsocket backend</tspan>
</text>
</g>
<g id="Line_20">
<line x1="301.9" y1="352" x2="301.9" y2="369.84976" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_21">
<line x1="300.828" y1="394.6251" x2="300.828" y2="412.4749" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_22">
<line x1="300.828" y1="437.56256" x2="300.828" y2="455.4123" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_23">
<line x1="299.9" y1="480.1251" x2="299.9" y2="497.9749" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_24">
<rect x="266.5" y="541.5" width="71.188" height="20" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(271.5 540.5)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="13.858" y="17">Port</tspan>
</text>
</g>
<g id="Graphic_27">
<path d="M 582 648.5 L 672 648.5 C 678.6274 648.5 684 653.8726 684 660.5 L 684 708 C 684 714.6274 678.6274 720 672 720 L 582 720 C 575.3726 720 570 714.6274 570 708 L 570 660.5 C 570 653.8726 575.3726 648.5 582 648.5 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(575 656.578)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="27.704" y="15">Device </tspan>
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="20.44" y="33.448">Manager</tspan>
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="20.288" y="51.895996">Backend</tspan>
</text>
</g>
<g id="Line_28">
<line x1="627" y1="375.9" x2="627" y2="638.6" marker-end="url(#FilledArrow_Marker)" marker-start="url(#FilledArrow_Marker_2)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_31">
<path d="M 711 294.5 L 801 294.5 C 807.6274 294.5 813 299.8726 813 306.5 L 813 354 C 813 360.6274 807.6274 366 801 366 L 711 366 C 704.3726 366 699 360.6274 699 354 L 699 306.5 C 699 299.8726 704.3726 294.5 711 294.5 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(704 321.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="17.784" y="15">Service B</tspan>
</text>
</g>
<g id="Graphic_30">
<path d="M 711 648.5 L 801 648.5 C 807.6274 648.5 813 653.8726 813 660.5 L 813 708 C 813 714.6274 807.6274 720 801 720 L 711 720 C 704.3726 720 699 714.6274 699 708 L 699 660.5 C 699 653.8726 704.3726 648.5 711 648.5 Z" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(704 675.026)" fill="black">
<tspan font-family="Helvetica Neue" font-size="16" fill="black" x="12.584" y="15">Backend B</tspan>
</text>
</g>
<g id="Line_29">
<line x1="756" y1="375.9" x2="756" y2="638.6" marker-end="url(#FilledArrow_Marker)" marker-start="url(#FilledArrow_Marker_2)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-dasharray="4.0,4.0" stroke-width="1"/>
</g>
<g id="Graphic_32">
<text transform="translate(833 319.25)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="58264504e-20" y="17">……</tspan>
</text>
</g>
<g id="Graphic_33">
<text transform="translate(833 673.25)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="58264504e-20" y="17">……</tspan>
</text>
</g>
<g id="Graphic_34">
<text transform="translate(252.616 296)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">Upcall Server</tspan>
</text>
</g>
<g id="Line_39">
<path d="M 251.372 514.94444 L 196.16455 515.40173 L 196.2135 443.25 L 290.92825 443.92903" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Graphic_41">
<text transform="translate(417 503.5)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="0" y="17">Service handler</tspan>
</text>
</g>
<g id="Graphic_42">
<rect x="591.406" y="540.4723" width="71.188" height="20" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(596.406 539.4723)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="10.386" y="17">Conn</tspan>
</text>
</g>
<g id="Graphic_43">
<rect x="720.406" y="541.4723" width="71.188" height="20" stroke="gray" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
<text transform="translate(725.406 540.4723)" fill="black">
<tspan font-family="Alibaba PuHuiTi" font-size="16" fill="black" x="10.386" y="17">Conn</tspan>
</text>
</g>
<g id="Line_44">
<line x1="358.684" y1="514.5" x2="402.1" y2="514.5" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
<g id="Line_46">
<path d="M 479.2467 498.5 L 480 328 L 560.10116 329.22604" marker-end="url(#FilledArrow_Marker)" stroke="black" stroke-linecap="round" stroke-linejoin="round" stroke-width="1"/>
</g>
</g>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 12 KiB

View File

@@ -1,30 +0,0 @@
# `Upcall`
## What is `Upcall`?
`Upcall` is a direct communication tool between VMM and guest developed upon `vsock`. The server side of the `upcall` is a driver in guest kernel (kernel patches are needed for this feature) and it'll start to serve the requests after the kernel starts. And the client side is in Dragonball VMM , it'll be a thread that communicates with `vsock` through `uds`.
We want to keep the lightweight of the VM through the implementation of the `upcall`.
![architecture overview](images/upcall-architecture.svg)
## What can `upcall` do?
We define specific operations in the device manager service (one of the services in `upcall` we developed) to perform device hotplug / hot-unplug including vCPU hotplug, `virtio-mmio` hotplug, and memory hotplug. We have accomplished device hotplug / hot-unplug directly through `upcall` in order to avoid the virtualization of ACPI to minimize virtual machines overhead. And there could be many other uses if other services are implemented.
## How to enable `upcall`?
`Upcall` needs a server in the guest kernel which will be several kernel patches for the `upcall` server itself and different services registered in the `upcall` server. It's currently tested on upstream Linux kernel 5.10.
To make it easy for users to use, we have open-source the `upcall` guest patches in [Dragonball experimental guest patches](../../../tools/packaging/kernel/patches/5.10.x/dragonball-experimental) and develop `upcall` support in [Kata guest kernel building script](../../../tools/packaging/kernel/build-kernel.sh).
You could use following command to download the upstream kernel (currently Dragonball uses 5.10.25) and put the `upcall` patches and other Kata patches into kernel code.
`sh build-kernel.sh -e -t dragonball -f setup`
`-e` here means experimental, mainly because `upcall` patches are not in upstream Linux kernel.
`-t dragonball` is for specifying hypervisor type
`-f` is for generating `.config` file
After this command, the kernel code with `upcall` and related `.config` file are all set up in the directory `kata-linux-dragonball-experimental-5.10.25-[config version]`. You can either manually compile the kernel with `make` command or following [Document for build-kernel.sh](../../../tools/packaging/kernel/README.md) to build and use this guest kernel.
Also, a client-side is also needed in VMM. Dragonball has already open-source the way to implement `upcall` client and Dragonball compiled with `dbs-upcall` feature will enable Dragonball client side.

View File

@@ -33,10 +33,10 @@ use log::{debug, error, info, warn};
use nix::sys::mman;
use nix::unistd::dup;
#[cfg(feature = "atomic-guest-memory")]
use vm_memory::GuestMemoryAtomic;
use vm_memory::atomic::GuestMemoryAtomic;
use vm_memory::{
address::Address, FileOffset, GuestAddress, GuestAddressSpace, GuestMemoryMmap,
GuestMemoryRegion, GuestRegionMmap, GuestUsize, MemoryRegionAddress, MmapRegion,
Address, FileOffset, GuestAddress, GuestAddressSpace, GuestMemoryMmap, GuestMemoryRegion,
GuestRegionMmap, GuestUsize, MemoryRegionAddress, MmapRegion,
};
use crate::resource_manager::ResourceManager;
@@ -250,11 +250,6 @@ impl AddressSpaceMgr {
self.address_space.as_ref()
}
/// Get the guest memory.
pub fn vm_memory(&self) -> Option<<GuestAddressSpaceImpl as GuestAddressSpace>::T> {
self.get_vm_as().map(|m| m.memory())
}
/// Create the address space for a virtual machine.
///
/// This method is designed to be called when starting up a virtual machine instead of at
@@ -275,7 +270,7 @@ impl AddressSpaceMgr {
let size = info
.size
.checked_shl(20)
.ok_or(AddressManagerError::InvalidOperation)?;
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
// Guest memory does not intersect with the MMIO hole.
// TODO: make it work for ARM (issue #4307)
@@ -286,13 +281,13 @@ impl AddressSpaceMgr {
regions.push(region);
start_addr = start_addr
.checked_add(size)
.ok_or(AddressManagerError::InvalidOperation)?;
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
} else {
// Add guest memory below the MMIO hole, avoid splitting the memory region
// if the available address region is small than MINIMAL_SPLIT_SPACE MiB.
let mut below_size = dbs_boot::layout::MMIO_LOW_START
.checked_sub(start_addr)
.ok_or(AddressManagerError::InvalidOperation)?;
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
if below_size < (MINIMAL_SPLIT_SPACE) {
below_size = 0;
} else {
@@ -304,12 +299,12 @@ impl AddressSpaceMgr {
let above_start = dbs_boot::layout::MMIO_LOW_END + 1;
let above_size = size
.checked_sub(below_size)
.ok_or(AddressManagerError::InvalidOperation)?;
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
let region = self.create_region(above_start, above_size, info, &mut param)?;
regions.push(region);
start_addr = above_start
.checked_add(above_size)
.ok_or(AddressManagerError::InvalidOperation)?;
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
}
}
@@ -406,9 +401,9 @@ impl AddressSpaceMgr {
let flags = 0u32;
let mem_region = kvm_userspace_memory_region {
slot,
slot: slot as u32,
guest_phys_addr: reg.start_addr().raw_value(),
memory_size: reg.len(),
memory_size: reg.len() as u64,
userspace_addr: host_addr as u64,
flags,
};
@@ -426,7 +421,7 @@ impl AddressSpaceMgr {
self.base_to_slot
.lock()
.unwrap()
.insert(reg.start_addr().raw_value(), slot);
.insert(reg.start_addr().raw_value(), slot as u32);
Ok(())
}
@@ -507,7 +502,7 @@ impl AddressSpaceMgr {
fn configure_numa(&self, mmap_reg: &MmapRegion, node_id: u32) -> Result<()> {
let nodemask = 1_u64
.checked_shl(node_id)
.ok_or(AddressManagerError::InvalidOperation)?;
.ok_or_else(|| AddressManagerError::InvalidOperation)?;
let res = unsafe {
libc::syscall(
libc::SYS_mbind,

View File

@@ -18,7 +18,7 @@ pub const DEFAULT_KERNEL_CMDLINE: &str = "reboot=k panic=1 pci=off nomodules 825
i8042.noaux i8042.nomux i8042.nopnp i8042.dumbkbd";
/// Strongly typed data structure used to configure the boot source of the microvm.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize, Default)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize, Default)]
#[serde(deny_unknown_fields)]
pub struct BootSourceConfig {
/// Path of the kernel image.

View File

@@ -10,7 +10,7 @@ use serde_derive::{Deserialize, Serialize};
/// When Dragonball starts, the instance state is Uninitialized. Once start_microvm method is
/// called, the state goes from Uninitialized to Starting. The state is changed to Running until
/// the start_microvm method ends. Halting and Halted are currently unsupported.
#[derive(Copy, Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Copy, Clone, Debug, Deserialize, PartialEq, Serialize)]
pub enum InstanceState {
/// Microvm is not initialized.
Uninitialized,
@@ -29,7 +29,7 @@ pub enum InstanceState {
}
/// The state of async actions
#[derive(Debug, Deserialize, Serialize, Clone, PartialEq, Eq)]
#[derive(Debug, Deserialize, Serialize, Clone, PartialEq)]
pub enum AsyncState {
/// Uninitialized
Uninitialized,

View File

@@ -10,7 +10,7 @@ pub const MAX_SUPPORTED_VCPUS: u8 = 254;
pub const MEMORY_HOTPLUG_ALIGHMENT: u8 = 64;
/// Errors associated with configuring the microVM.
#[derive(Debug, PartialEq, Eq, thiserror::Error)]
#[derive(Debug, PartialEq, thiserror::Error)]
pub enum VmConfigError {
/// Cannot update the configuration of the microvm post boot.
#[error("update operation is not allowed after boot")]

View File

@@ -35,9 +35,6 @@ pub use crate::device_manager::virtio_net_dev_mgr::{
#[cfg(feature = "virtio-vsock")]
pub use crate::device_manager::vsock_dev_mgr::{VsockDeviceConfigInfo, VsockDeviceError};
#[cfg(feature = "hotplug")]
pub use crate::vcpu::{VcpuResizeError, VcpuResizeInfo};
use super::*;
/// Wrapper for all errors associated with VMM actions.
@@ -47,13 +44,9 @@ pub enum VmmActionError {
#[error("the virtual machine instance ID is invalid")]
InvalidVMID,
/// VM doesn't exist and can't get VM information.
#[error("VM doesn't exist and can't get VM information")]
VmNotExist,
/// Failed to hotplug, due to Upcall not ready.
#[error("Upcall not ready, can't hotplug device.")]
UpcallServerNotReady,
UpcallNotReady,
/// The action `ConfigureBootSource` failed either because of bad user input or an internal
/// error.
@@ -90,18 +83,13 @@ pub enum VmmActionError {
#[cfg(feature = "virtio-fs")]
/// The action `InsertFsDevice` failed either because of bad user input or an internal error.
#[error("virtio-fs device error: {0}")]
#[error("virtio-fs device: {0}")]
FsDevice(#[source] FsDeviceError),
#[cfg(feature = "hotplug")]
/// The action `ResizeVcpu` Failed
#[error("vcpu resize error : {0}")]
ResizeVcpu(#[source] VcpuResizeError),
}
/// This enum represents the public interface of the VMM. Each action contains various
/// bits of information (ids, paths, etc.).
#[derive(Clone, Debug, PartialEq, Eq)]
#[derive(Clone, Debug, PartialEq)]
pub enum VmmAction {
/// Configure the boot source of the microVM using `BootSourceConfig`.
/// This action can only be called before the microVM has booted.
@@ -168,10 +156,6 @@ pub enum VmmAction {
#[cfg(feature = "virtio-fs")]
/// Update fs rate limiter, after microVM start.
UpdateFsDevice(FsDeviceConfigUpdateInfo),
#[cfg(feature = "hotplug")]
/// Resize Vcpu number in the guest.
ResizeVcpu(VcpuResizeInfo),
}
/// The enum represents the response sent by the VMM in case of success. The response is either
@@ -272,8 +256,6 @@ impl VmmService {
VmmAction::UpdateFsDevice(fs_update_cfg) => {
self.update_fs_rate_limiters(vmm, fs_update_cfg)
}
#[cfg(feature = "hotplug")]
VmmAction::ResizeVcpu(vcpu_resize_cfg) => self.resize_vcpu(vmm, vcpu_resize_cfg),
};
debug!("send vmm response: {:?}", response);
@@ -316,6 +298,7 @@ impl VmmService {
let mut cmdline = linux_loader::cmdline::Cmdline::new(dbs_boot::layout::CMDLINE_MAX_SIZE);
let boot_args = boot_source_config
.boot_args
.clone()
.unwrap_or_else(|| String::from(DEFAULT_KERNEL_CMDLINE));
cmdline
.insert_str(boot_args)
@@ -424,10 +407,19 @@ impl VmmService {
}
config.vpmu_feature = machine_config.vpmu_feature;
// If serial_path is:
// - None, legacy_manager will create_stdio_console.
// - Some(path), legacy_manager will create_socket_console on that path.
config.serial_path = machine_config.serial_path;
let vm_id = vm.shared_info().read().unwrap().id.clone();
let serial_path = match machine_config.serial_path {
Some(value) => value,
None => {
if config.serial_path.is_none() {
String::from("/run/dragonball/") + &vm_id + "_com1"
} else {
// Safe to unwrap() because we have checked it has a value.
config.serial_path.as_ref().unwrap().clone()
}
}
};
config.serial_path = Some(serial_path);
vm.set_vm_config(config.clone());
self.machine_config = config;
@@ -480,8 +472,8 @@ impl VmmService {
let ctx = vm
.create_device_op_context(Some(event_mgr.epoll_manager()))
.map_err(|e| {
if let StartMicroVmError::UpcallServerNotReady = e {
return VmmActionError::UpcallServerNotReady;
if let StartMicroVmError::UpcallNotReady = e {
return VmmActionError::UpcallNotReady;
}
VmmActionError::Block(BlockDeviceError::UpdateNotAllowedPostBoot)
})?;
@@ -536,8 +528,8 @@ impl VmmService {
.map_err(|e| {
if let StartMicroVmError::MicroVMAlreadyRunning = e {
VmmActionError::VirtioNet(VirtioNetDeviceError::UpdateNotAllowedPostBoot)
} else if let StartMicroVmError::UpcallServerNotReady = e {
VmmActionError::UpcallServerNotReady
} else if let StartMicroVmError::UpcallNotReady = e {
VmmActionError::UpcallNotReady
} else {
VmmActionError::StartMicroVm(e)
}
@@ -613,37 +605,6 @@ impl VmmService {
.map(|_| VmmData::Empty)
.map_err(VmmActionError::FsDevice)
}
#[cfg(feature = "hotplug")]
fn resize_vcpu(&mut self, vmm: &mut Vmm, config: VcpuResizeInfo) -> VmmRequestResult {
if !cfg!(target_arch = "x86_64") {
// TODO: Arm need to support vcpu hotplug. issue: #6010
warn!("This arch do not support vm resize!");
return Ok(VmmData::Empty);
}
if !cfg!(feature = "dbs-upcall") {
warn!("We only support cpu resize through upcall server in the guest kernel now, please enable dbs-upcall feature.");
return Ok(VmmData::Empty);
}
let vm = vmm.get_vm_mut().ok_or(VmmActionError::VmNotExist)?;
if !vm.is_vm_initialized() {
return Err(VmmActionError::ResizeVcpu(
VcpuResizeError::UpdateNotAllowedPreBoot,
));
}
vm.resize_vcpu(config, None).map_err(|e| {
if let VcpuResizeError::UpcallServerNotReady = e {
return VmmActionError::UpcallServerNotReady;
}
VmmActionError::ResizeVcpu(e)
})?;
Ok(VmmData::Empty)
}
}
fn handle_cpu_topology(
@@ -673,783 +634,3 @@ fn handle_cpu_topology(
Ok(cpu_topology)
}
#[cfg(test)]
mod tests {
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex};
use dbs_utils::epoll_manager::EpollManager;
use test_utils::skip_if_not_root;
use vmm_sys_util::tempfile::TempFile;
use super::*;
use crate::vmm::tests::create_vmm_instance;
struct TestData<'a> {
req: Option<VmmAction>,
vm_state: InstanceState,
f: &'a dyn Fn(VmmRequestResult),
}
impl<'a> TestData<'a> {
fn new(req: VmmAction, vm_state: InstanceState, f: &'a dyn Fn(VmmRequestResult)) -> Self {
Self {
req: Some(req),
vm_state,
f,
}
}
fn check_request(&mut self) {
let (to_vmm, from_api) = channel();
let (to_api, from_vmm) = channel();
let epoll_mgr = EpollManager::default();
let vmm = Arc::new(Mutex::new(create_vmm_instance(epoll_mgr.clone())));
let mut vservice = VmmService::new(from_api, to_api);
let mut event_mgr = EventManager::new(&vmm, epoll_mgr).unwrap();
let mut v = vmm.lock().unwrap();
let vm = v.get_vm_mut().unwrap();
vm.set_instance_state(self.vm_state);
to_vmm.send(Box::new(self.req.take().unwrap())).unwrap();
assert!(vservice.run_vmm_action(&mut v, &mut event_mgr).is_ok());
let response = from_vmm.try_recv();
assert!(response.is_ok());
(self.f)(*response.unwrap());
}
}
#[test]
fn test_vmm_action_receive_unknown() {
skip_if_not_root!();
let (_to_vmm, from_api) = channel();
let (to_api, _from_vmm) = channel();
let epoll_mgr = EpollManager::default();
let vmm = Arc::new(Mutex::new(create_vmm_instance(epoll_mgr.clone())));
let mut vservice = VmmService::new(from_api, to_api);
let mut event_mgr = EventManager::new(&vmm, epoll_mgr).unwrap();
let mut v = vmm.lock().unwrap();
assert!(vservice.run_vmm_action(&mut v, &mut event_mgr).is_ok());
}
#[should_panic]
#[test]
fn test_vmm_action_disconnected() {
let (to_vmm, from_api) = channel();
let (to_api, _from_vmm) = channel();
let epoll_mgr = EpollManager::default();
let vmm = Arc::new(Mutex::new(create_vmm_instance(epoll_mgr.clone())));
let mut vservice = VmmService::new(from_api, to_api);
let mut event_mgr = EventManager::new(&vmm, epoll_mgr).unwrap();
let mut v = vmm.lock().unwrap();
drop(to_vmm);
vservice.run_vmm_action(&mut v, &mut event_mgr).unwrap();
}
#[test]
fn test_vmm_action_config_boot_source() {
skip_if_not_root!();
let kernel_file = TempFile::new().unwrap();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::ConfigureBootSource(BootSourceConfig::default()),
InstanceState::Running,
&|result| {
if let Err(VmmActionError::BootSource(
BootSourceConfigError::UpdateNotAllowedPostBoot,
)) = result
{
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to configure boot source for VM: \
the update operation is not allowed after boot",
);
assert_eq!(err_string, expected_err);
} else {
panic!();
}
},
),
// invalid kernel file path
TestData::new(
VmmAction::ConfigureBootSource(BootSourceConfig::default()),
InstanceState::Uninitialized,
&|result| {
if let Err(VmmActionError::BootSource(
BootSourceConfigError::InvalidKernelPath(_),
)) = result
{
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to configure boot source for VM: \
the kernel file cannot be opened due to invalid kernel path or invalid permissions: \
No such file or directory (os error 2)");
assert_eq!(err_string, expected_err);
} else {
panic!();
}
},
),
//success
TestData::new(
VmmAction::ConfigureBootSource(BootSourceConfig {
kernel_path: kernel_file.as_path().to_str().unwrap().to_string(),
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[test]
fn test_vmm_action_set_vm_configuration() {
skip_if_not_root!();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo::default()),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::UpdateNotAllowedPostBoot
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
update operation is not allowed after boot",
);
assert_eq!(err_string, expected_err);
},
),
// invalid cpu count (0)
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
vcpu_count: 0,
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::InvalidVcpuCount(0)
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
the vCPU number '0' can only be 1 or an even number when hyperthreading is enabled");
assert_eq!(err_string, expected_err);
},
),
// invalid max cpu count (too small)
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
vcpu_count: 4,
max_vcpu_count: 2,
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::InvalidMaxVcpuCount(2)
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
the max vCPU number '2' shouldn't less than vCPU count and can only be 1 or an even number when hyperthreading is enabled");
assert_eq!(err_string, expected_err);
},
),
// invalid cpu topology (larger than 254)
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
vcpu_count: 254,
cpu_topology: CpuTopology {
threads_per_core: 2,
cores_per_die: 128,
dies_per_socket: 1,
sockets: 1,
},
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::VcpuCountExceedsMaximum
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
the vCPU number shouldn't large than 254",
);
assert_eq!(err_string, expected_err)
},
),
// cpu topology and max_vcpu_count are not matched - success
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
vcpu_count: 16,
max_vcpu_count: 32,
cpu_topology: CpuTopology {
threads_per_core: 1,
cores_per_die: 128,
dies_per_socket: 1,
sockets: 1,
},
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
result.unwrap();
},
),
// invalid threads_per_core
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
vcpu_count: 4,
max_vcpu_count: 4,
cpu_topology: CpuTopology {
threads_per_core: 4,
cores_per_die: 1,
dies_per_socket: 1,
sockets: 1,
},
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::InvalidThreadsPerCore(4)
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
the threads_per_core number '4' can only be 1 or 2",
);
assert_eq!(err_string, expected_err)
},
),
// invalid mem size
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
mem_size_mib: 3,
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::InvalidMemorySize(3)
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
the memory size 0x3MiB is invalid",
);
assert_eq!(err_string, expected_err);
},
),
// invalid mem path
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo {
mem_type: String::from("hugetlbfs"),
mem_file_path: String::from(""),
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::MachineConfig(
VmConfigError::InvalidMemFilePath(_)
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to set configuration for the VM: \
the memory file path is invalid",
);
assert_eq!(err_string, expected_err);
},
),
// success
TestData::new(
VmmAction::SetVmConfiguration(VmConfigInfo::default()),
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[test]
fn test_vmm_action_start_microvm() {
skip_if_not_root!();
let tests = &mut [
// invalid state (running)
TestData::new(VmmAction::StartMicroVm, InstanceState::Running, &|result| {
assert!(matches!(
result,
Err(VmmActionError::StartMicroVm(
StartMicroVmError::MicroVMAlreadyRunning
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to boot the VM: \
the virtual machine is already running",
);
assert_eq!(err_string, expected_err);
}),
// no kernel configuration
TestData::new(
VmmAction::StartMicroVm,
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::StartMicroVm(
StartMicroVmError::MissingKernelConfig
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to boot the VM: \
cannot start the virtual machine without kernel configuration",
);
assert_eq!(err_string, expected_err);
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[test]
fn test_vmm_action_shutdown_microvm() {
skip_if_not_root!();
let tests = &mut [
// success
TestData::new(
VmmAction::ShutdownMicroVm,
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-blk")]
#[test]
fn test_vmm_action_insert_block_device() {
skip_if_not_root!();
let dummy_file = TempFile::new().unwrap();
let dummy_path = dummy_file.as_path().to_owned();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::InsertBlockDevice(BlockDeviceConfigInfo::default()),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::Block(
BlockDeviceError::UpdateNotAllowedPostBoot
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-blk device error: \
block device does not support runtime update",
);
assert_eq!(err_string, expected_err);
},
),
// success
TestData::new(
VmmAction::InsertBlockDevice(BlockDeviceConfigInfo {
path_on_host: dummy_path,
device_type: crate::device_manager::blk_dev_mgr::BlockDeviceType::RawBlock,
is_root_device: true,
part_uuid: None,
is_read_only: false,
is_direct: false,
no_drop: false,
drive_id: String::from("1"),
rate_limiter: None,
num_queues: BlockDeviceConfigInfo::default_num_queues(),
queue_size: 256,
use_shared_irq: None,
use_generic_irq: None,
}),
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-blk")]
#[test]
fn test_vmm_action_update_block_device() {
skip_if_not_root!();
let tests = &mut [
// invalid id
TestData::new(
VmmAction::UpdateBlockDevice(BlockDeviceConfigUpdateInfo {
drive_id: String::from("1"),
rate_limiter: None,
}),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::Block(BlockDeviceError::InvalidDeviceId(_)))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-blk device error: \
invalid block device id '1'",
);
assert_eq!(err_string, expected_err);
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-blk")]
#[test]
fn test_vmm_action_remove_block_device() {
skip_if_not_root!();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::RemoveBlockDevice(String::from("1")),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::Block(
BlockDeviceError::UpdateNotAllowedPostBoot
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-blk device error: \
block device does not support runtime update",
);
assert_eq!(err_string, expected_err);
},
),
// invalid id
TestData::new(
VmmAction::RemoveBlockDevice(String::from("1")),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::Block(BlockDeviceError::InvalidDeviceId(_)))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-blk device error: \
invalid block device id '1'",
);
assert_eq!(err_string, expected_err);
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-fs")]
#[test]
fn test_vmm_action_insert_fs_device() {
skip_if_not_root!();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::InsertFsDevice(FsDeviceConfigInfo::default()),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::FsDevice(
FsDeviceError::UpdateNotAllowedPostBoot
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-fs device error: \
update operation is not allowed after boot",
);
assert_eq!(err_string, expected_err);
},
),
// success
TestData::new(
VmmAction::InsertFsDevice(FsDeviceConfigInfo::default()),
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-fs")]
#[test]
fn test_vmm_action_manipulate_fs_device() {
skip_if_not_root!();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::ManipulateFsBackendFs(FsMountConfigInfo::default()),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::FsDevice(FsDeviceError::MicroVMNotRunning))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-fs device error: \
vm is not running when attaching a backend fs",
);
assert_eq!(err_string, expected_err);
},
),
// invalid backend
TestData::new(
VmmAction::ManipulateFsBackendFs(FsMountConfigInfo::default()),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::FsDevice(
FsDeviceError::AttachBackendFailed(_)
))
));
let err_string = format!("{}", result.unwrap_err());
println!("{}", err_string);
let expected_err = String::from(
"virtio-fs device error: \
Fs device attach a backend fs failed",
);
assert_eq!(err_string, expected_err);
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-net")]
#[test]
fn test_vmm_action_insert_network_device() {
skip_if_not_root!();
let tests = &mut [
// hotplug unready
TestData::new(
VmmAction::InsertNetworkDevice(VirtioNetDeviceConfigInfo::default()),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::StartMicroVm(
StartMicroVmError::UpcallMissVsock
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to boot the VM: \
the upcall client needs a virtio-vsock device for communication",
);
assert_eq!(err_string, expected_err);
},
),
// success
TestData::new(
VmmAction::InsertNetworkDevice(VirtioNetDeviceConfigInfo::default()),
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-net")]
#[test]
fn test_vmm_action_update_network_interface() {
skip_if_not_root!();
let tests = &mut [
// invalid id
TestData::new(
VmmAction::UpdateNetworkInterface(VirtioNetDeviceConfigUpdateInfo {
iface_id: String::from("1"),
rx_rate_limiter: None,
tx_rate_limiter: None,
}),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::VirtioNet(
VirtioNetDeviceError::InvalidIfaceId(_)
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"virtio-net device error: \
invalid virtio-net iface id '1'",
);
assert_eq!(err_string, expected_err);
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
#[cfg(feature = "virtio-vsock")]
#[test]
fn test_vmm_action_insert_vsock_device() {
skip_if_not_root!();
let tests = &mut [
// invalid state
TestData::new(
VmmAction::InsertVsockDevice(VsockDeviceConfigInfo::default()),
InstanceState::Running,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::Vsock(
VsockDeviceError::UpdateNotAllowedPostBoot
))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to add virtio-vsock device: \
update operation is not allowed after boot",
);
assert_eq!(err_string, expected_err);
},
),
// invalid guest_cid
TestData::new(
VmmAction::InsertVsockDevice(VsockDeviceConfigInfo::default()),
InstanceState::Uninitialized,
&|result| {
assert!(matches!(
result,
Err(VmmActionError::Vsock(VsockDeviceError::GuestCIDInvalid(0)))
));
let err_string = format!("{}", result.unwrap_err());
let expected_err = String::from(
"failed to add virtio-vsock device: \
the guest CID 0 is invalid",
);
assert_eq!(err_string, expected_err);
},
),
// success
TestData::new(
VmmAction::InsertVsockDevice(VsockDeviceConfigInfo {
guest_cid: 3,
..Default::default()
}),
InstanceState::Uninitialized,
&|result| {
assert!(result.is_ok());
},
),
];
for t in tests.iter_mut() {
t.check_request();
}
}
}

View File

@@ -46,7 +46,7 @@ pub trait ConfigItem {
}
/// Struct to manage a group of configuration items.
#[derive(Debug, Default, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Debug, Default, Deserialize, PartialEq, Serialize)]
pub struct ConfigInfos<T>
where
T: ConfigItem + Clone,
@@ -316,7 +316,7 @@ where
}
/// Configuration information for RateLimiter token bucket.
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
pub struct TokenBucketConfigInfo {
/// The size for the token bucket. A TokenBucket of `size` total capacity will take `refill_time`
/// milliseconds to go from zero tokens to total capacity.
@@ -349,7 +349,7 @@ impl From<&TokenBucketConfigInfo> for TokenBucket {
}
/// Configuration information for RateLimiter objects.
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
pub struct RateLimiterConfigInfo {
/// Data used to initialize the RateLimiter::bandwidth bucket.
pub bandwidth: TokenBucketConfigInfo,

View File

@@ -106,7 +106,7 @@ pub enum BlockDeviceError {
}
/// Type of low level storage device/protocol for virtio-blk devices.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
pub enum BlockDeviceType {
/// Unknown low level device type.
Unknown,
@@ -131,7 +131,7 @@ impl BlockDeviceType {
}
/// Configuration information for a block device.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct BlockDeviceConfigUpdateInfo {
/// Unique identifier of the drive.
pub drive_id: String,
@@ -151,7 +151,7 @@ impl BlockDeviceConfigUpdateInfo {
}
/// Configuration information for a block device.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct BlockDeviceConfigInfo {
/// Unique identifier of the drive.
pub drive_id: String,
@@ -285,6 +285,7 @@ impl std::fmt::Debug for BlockDeviceInfo {
pub type BlockDeviceInfo = DeviceConfigInfo<BlockDeviceConfigInfo>;
/// Wrapper for the collection that holds all the Block Devices Configs
//#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
#[derive(Clone)]
pub struct BlockDeviceMgr {
/// A list of `BlockDeviceInfo` objects.
@@ -576,13 +577,7 @@ impl BlockDeviceMgr {
) -> std::result::Result<(), DeviceMgrError> {
// Respect user configuration if kernel_cmdline contains "root=",
// special attention for the case when kernel command line starting with "root=xxx"
let old_kernel_cmdline = format!(
" {:?}",
kernel_config
.kernel_cmdline()
.as_cstring()
.map_err(DeviceMgrError::Cmdline)?
);
let old_kernel_cmdline = format!(" {}", kernel_config.kernel_cmdline().as_str());
if !old_kernel_cmdline.contains(" root=") && self.has_root_block {
let cmdline = kernel_config.kernel_cmdline_mut();
if let Some(ref uuid) = self.part_uuid {
@@ -624,7 +619,7 @@ impl BlockDeviceMgr {
// we need to satisfy the condition by which a VMM can only have on root device
if block_device_config.is_root_device {
if self.has_root_block {
Err(BlockDeviceError::RootBlockDeviceAlreadyAdded)
return Err(BlockDeviceError::RootBlockDeviceAlreadyAdded);
} else {
self.has_root_block = true;
self.read_only_root = block_device_config.is_read_only;

View File

@@ -74,20 +74,11 @@ impl ConsoleManager {
/// Create a console backend device by using stdio streams.
pub fn create_stdio_console(&mut self, device: Arc<Mutex<SerialDevice>>) -> Result<()> {
device
.lock()
.unwrap()
.set_output_stream(Some(Box::new(std::io::stdout())));
let stdin_handle = std::io::stdin();
stdin_handle
.lock()
.set_raw_mode()
.map_err(|e| DeviceMgrError::ConsoleManager(ConsoleManagerError::StdinHandle(e)))?;
stdin_handle
.lock()
.set_non_block(true)
.map_err(ConsoleManagerError::StdinHandle)
.map_err(DeviceMgrError::ConsoleManager)?;
let handler = ConsoleEpollHandler::new(device, Some(stdin_handle), None, &self.logger);
self.subscriber_id = Some(self.epoll_mgr.add_subscriber(Box::new(handler)));

View File

@@ -89,7 +89,7 @@ pub enum FsDeviceError {
}
/// Configuration information for a vhost-user-fs device.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct FsDeviceConfigInfo {
/// vhost-user socket path.
pub sock_path: String,
@@ -201,7 +201,7 @@ impl FsDeviceConfigInfo {
}
/// Configuration information for virtio-fs.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct FsDeviceConfigUpdateInfo {
/// virtiofs mount tag name used inside the guest.
/// used as the device name during mount.
@@ -242,7 +242,7 @@ impl ConfigItem for FsDeviceConfigInfo {
}
/// Configuration information of manipulating backend fs for a virtiofs device.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize, Default)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct FsMountConfigInfo {
/// Mount operations, mount, update, umount
pub ops: String,

View File

@@ -147,17 +147,13 @@ pub type Result<T> = ::std::result::Result<T, DeviceMgrError>;
/// Type of the dragonball virtio devices.
#[cfg(feature = "dbs-virtio-devices")]
pub type DbsVirtioDevice = Box<
dyn VirtioDevice<
GuestAddressSpaceImpl,
virtio_queue::QueueStateSync,
vm_memory::GuestRegionMmap,
>,
dyn VirtioDevice<GuestAddressSpaceImpl, virtio_queue::QueueState, vm_memory::GuestRegionMmap>,
>;
/// Type of the dragonball virtio mmio devices.
#[cfg(feature = "dbs-virtio-devices")]
pub type DbsMmioV2Device =
MmioV2Device<GuestAddressSpaceImpl, virtio_queue::QueueStateSync, vm_memory::GuestRegionMmap>;
MmioV2Device<GuestAddressSpaceImpl, virtio_queue::QueueState, vm_memory::GuestRegionMmap>;
/// Struct to support transactional operations for device management.
pub struct DeviceManagerTx {
@@ -595,17 +591,18 @@ impl DeviceManager {
.map_err(|_| StartMicroVmError::EventFd)?;
info!(self.logger, "init console path: {:?}", com1_sock_path);
if let Some(legacy_manager) = self.legacy_manager.as_ref() {
let com1 = legacy_manager.get_com1_serial();
if let Some(path) = com1_sock_path {
if let Some(path) = com1_sock_path {
if let Some(legacy_manager) = self.legacy_manager.as_ref() {
let com1 = legacy_manager.get_com1_serial();
self.con_manager
.create_socket_console(com1, path)
.map_err(StartMicroVmError::DeviceManager)?;
} else {
self.con_manager
.create_stdio_console(com1)
.map_err(StartMicroVmError::DeviceManager)?;
}
} else if let Some(legacy_manager) = self.legacy_manager.as_ref() {
let com1 = legacy_manager.get_com1_serial();
self.con_manager
.create_stdio_console(com1)
.map_err(StartMicroVmError::DeviceManager)?;
}
Ok(())
@@ -789,14 +786,13 @@ impl DeviceManager {
fn allocate_mmio_device_resource(
&self,
) -> std::result::Result<DeviceResources, StartMicroVmError> {
let requests = vec![
ResourceConstraint::MmioAddress {
range: None,
align: MMIO_DEFAULT_CFG_SIZE,
size: MMIO_DEFAULT_CFG_SIZE,
},
ResourceConstraint::LegacyIrq { irq: None },
];
let mut requests = Vec::new();
requests.push(ResourceConstraint::MmioAddress {
range: None,
align: MMIO_DEFAULT_CFG_SIZE,
size: MMIO_DEFAULT_CFG_SIZE,
});
requests.push(ResourceConstraint::LegacyIrq { irq: None });
self.res_manager
.allocate_device_resources(&requests, false)
@@ -996,7 +992,7 @@ impl DeviceManager {
{
self.vsock_manager
.get_default_connector()
.map(Some)
.map(|d| Some(d))
.unwrap_or(None)
}
#[cfg(not(feature = "virtio-vsock"))]
@@ -1005,170 +1001,3 @@ impl DeviceManager {
}
}
}
#[cfg(test)]
mod tests {
use std::sync::{Arc, Mutex};
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use vm_memory::{GuestAddress, MmapRegion};
use super::*;
use crate::vm::CpuTopology;
impl DeviceManager {
pub fn new_test_mgr() -> Self {
let kvm = Kvm::new().unwrap();
let vm = kvm.create_vm().unwrap();
let vm_fd = Arc::new(vm);
let epoll_manager = EpollManager::default();
let res_manager = Arc::new(ResourceManager::new(None));
let logger = slog_scope::logger().new(slog::o!());
DeviceManager {
vm_fd: Arc::clone(&vm_fd),
con_manager: ConsoleManager::new(epoll_manager, &logger),
io_manager: Arc::new(ArcSwap::new(Arc::new(IoManager::new()))),
io_lock: Arc::new(Mutex::new(())),
irq_manager: Arc::new(KvmIrqManager::new(vm_fd.clone())),
res_manager,
legacy_manager: None,
#[cfg(feature = "virtio-blk")]
block_manager: BlockDeviceMgr::default(),
#[cfg(feature = "virtio-fs")]
fs_manager: Arc::new(Mutex::new(FsDeviceMgr::default())),
#[cfg(feature = "virtio-net")]
virtio_net_manager: VirtioNetDeviceMgr::default(),
#[cfg(feature = "virtio-vsock")]
vsock_manager: VsockDeviceMgr::default(),
#[cfg(target_arch = "aarch64")]
mmio_device_info: HashMap::new(),
logger,
}
}
}
#[test]
fn test_create_device_manager() {
skip_if_not_root!();
let mgr = DeviceManager::new_test_mgr();
let _ = mgr.io_manager();
}
#[cfg(target_arch = "x86_64")]
#[test]
fn test_create_devices() {
skip_if_not_root!();
use crate::vm::VmConfigInfo;
let epoll_manager = EpollManager::default();
let vmm = Arc::new(Mutex::new(crate::vmm::tests::create_vmm_instance(
epoll_manager.clone(),
)));
let event_mgr = crate::event_manager::EventManager::new(&vmm, epoll_manager).unwrap();
let mut vm = crate::vm::tests::create_vm_instance();
let vm_config = VmConfigInfo {
vcpu_count: 1,
max_vcpu_count: 1,
cpu_pm: "off".to_string(),
mem_type: "shmem".to_string(),
mem_file_path: "".to_string(),
mem_size_mib: 16,
serial_path: None,
cpu_topology: CpuTopology {
threads_per_core: 1,
cores_per_die: 1,
dies_per_socket: 1,
sockets: 1,
},
vpmu_feature: 0,
};
vm.set_vm_config(vm_config);
vm.init_guest_memory().unwrap();
vm.setup_interrupt_controller().unwrap();
let vm_as = vm.vm_as().cloned().unwrap();
let kernel_temp_file = vmm_sys_util::tempfile::TempFile::new().unwrap();
let kernel_file = kernel_temp_file.into_file();
let mut cmdline = crate::vm::KernelConfigInfo::new(
kernel_file,
None,
linux_loader::cmdline::Cmdline::new(0x1000),
);
let address_space = vm.vm_address_space().cloned();
let mgr = vm.device_manager_mut();
let guard = mgr.io_manager.load();
let mut lcr = [0u8];
// 0x3f8 is the adddress of serial device
guard.pio_read(0x3f8 + 3, &mut lcr).unwrap_err();
assert_eq!(lcr[0], 0x0);
mgr.create_interrupt_manager().unwrap();
mgr.create_devices(
vm_as,
event_mgr.epoll_manager(),
&mut cmdline,
None,
None,
address_space.as_ref(),
)
.unwrap();
let guard = mgr.io_manager.load();
guard.pio_read(0x3f8 + 3, &mut lcr).unwrap();
assert_eq!(lcr[0], 0x3);
}
#[cfg(feature = "virtio-fs")]
#[test]
fn test_handler_insert_region() {
skip_if_not_root!();
use dbs_virtio_devices::VirtioRegionHandler;
use lazy_static::__Deref;
use vm_memory::{GuestAddressSpace, GuestMemory, GuestMemoryRegion};
let vm = crate::test_utils::tests::create_vm_for_test();
let ctx = DeviceOpContext::new(
Some(vm.epoll_manager().clone()),
vm.device_manager(),
Some(vm.vm_as().unwrap().clone()),
vm.vm_address_space().cloned(),
true,
);
let guest_addr = GuestAddress(0x200000000000);
let cache_len = 1024 * 1024 * 1024;
let mmap_region = MmapRegion::build(
None,
cache_len as usize,
libc::PROT_NONE,
libc::MAP_ANONYMOUS | libc::MAP_NORESERVE | libc::MAP_PRIVATE,
)
.unwrap();
let guest_mmap_region =
Arc::new(vm_memory::GuestRegionMmap::new(mmap_region, guest_addr).unwrap());
let mut handler = DeviceVirtioRegionHandler {
vm_as: ctx.get_vm_as().unwrap(),
address_space: ctx.address_space.as_ref().unwrap().clone(),
};
handler.insert_region(guest_mmap_region).unwrap();
let mut find_region = false;
let find_region_ptr = &mut find_region;
let guard = vm.vm_as().unwrap().clone().memory();
let mem = guard.deref();
for region in mem.iter() {
if region.start_addr() == guest_addr && region.len() == cache_len {
*find_region_ptr = true;
}
}
assert!(find_region);
}
}

View File

@@ -93,7 +93,7 @@ pub enum VirtioNetDeviceError {
}
/// Configuration information for virtio net devices.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct VirtioNetDeviceConfigUpdateInfo {
/// ID of the guest network interface.
pub iface_id: String,
@@ -123,7 +123,7 @@ impl VirtioNetDeviceConfigUpdateInfo {
}
/// Configuration information for virtio net devices.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize, Default)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize, Default)]
pub struct VirtioNetDeviceConfigInfo {
/// ID of the guest network interface.
pub iface_id: String,
@@ -264,7 +264,7 @@ impl VirtioNetDeviceMgr {
config.use_generic_irq.unwrap_or(USE_GENERIC_IRQ),
)
.map_err(VirtioNetDeviceError::DeviceManager)?;
ctx.insert_hotplug_mmio_device(&dev, None)
ctx.insert_hotplug_mmio_device(&dev.clone(), None)
.map_err(VirtioNetDeviceError::DeviceManager)?;
// live-upgrade need save/restore device from info.device.
mgr.info_list[device_index].set_device(dev);
@@ -320,7 +320,7 @@ impl VirtioNetDeviceMgr {
}
}
/// Attach all configured net device to the virtual machine instance.
/// Attach all configured vsock device to the virtual machine instance.
pub fn attach_devices(
&mut self,
ctx: &mut DeviceOpContext,

View File

@@ -70,7 +70,7 @@ pub enum VsockDeviceError {
}
/// Configuration information for a vsock device.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
pub struct VsockDeviceConfigInfo {
/// ID of the vsock device.
pub id: String,

View File

@@ -100,7 +100,7 @@ pub enum StartMicroVmError {
/// Upcall is not ready
#[error("the upcall client is not ready")]
UpcallServerNotReady,
UpcallNotReady,
/// Configuration passed in is invalidate.
#[error("invalid virtual machine configuration: {0} ")]
@@ -127,10 +127,6 @@ pub enum StartMicroVmError {
#[error("failure while configuring guest kernel commandline: {0}")]
LoadCommandline(#[source] linux_loader::loader::Error),
/// Cannot process command line string.
#[error("failure while processing guest kernel commandline: {0}.")]
ProcessCommandlne(#[source] linux_loader::cmdline::Error),
/// The device manager was not configured.
#[error("the device manager failed to manage devices: {0}")]
DeviceManager(#[source] device_manager::DeviceMgrError),

View File

@@ -101,6 +101,7 @@ impl EventManager {
/// Poll pending events and invoke registered event handler.
///
/// # Arguments:
/// * max_events: maximum number of pending events to handle
/// * timeout: maximum time in milliseconds to wait
pub fn handle_events(&self, timeout: i32) -> std::result::Result<usize, EpollError> {
self.epoll_mgr

View File

@@ -210,25 +210,20 @@ mod x86_64 {
#[cfg(test)]
mod tests {
use super::*;
use kvm_ioctls::Kvm;
use std::fs::File;
use std::os::unix::fs::MetadataExt;
use std::os::unix::io::{AsRawFd, FromRawFd};
use kvm_ioctls::Kvm;
use test_utils::skip_if_not_root;
use super::*;
#[test]
fn test_create_kvm_context() {
skip_if_not_root!();
let c = KvmContext::new(None).unwrap();
assert!(c.max_memslots >= 32);
let kvm = Kvm::new().unwrap();
let f = std::mem::ManuallyDrop::new(unsafe { File::from_raw_fd(kvm.as_raw_fd()) });
let f = unsafe { File::from_raw_fd(kvm.as_raw_fd()) };
let m1 = f.metadata().unwrap();
let m2 = File::open("/dev/kvm").unwrap().metadata().unwrap();
@@ -239,8 +234,6 @@ mod tests {
#[cfg(target_arch = "x86_64")]
#[test]
fn test_get_supported_cpu_id() {
skip_if_not_root!();
let c = KvmContext::new(None).unwrap();
let _ = c
@@ -251,8 +244,6 @@ mod tests {
#[test]
fn test_create_vm() {
skip_if_not_root!();
let c = KvmContext::new(None).unwrap();
let _ = c.create_vm().unwrap();

View File

@@ -34,9 +34,6 @@ pub mod vm;
mod event_manager;
mod io_manager;
mod test_utils;
mod vmm;
pub use self::error::StartMicroVmError;

View File

@@ -36,7 +36,7 @@ const PIO_MAX: u16 = 0xFFFF;
const MMIO_SPACE_RESERVED: u64 = 0x400_0000;
/// Errors associated with resource management operations
#[derive(Debug, PartialEq, Eq, thiserror::Error)]
#[derive(Debug, PartialEq, thiserror::Error)]
pub enum ResourceError {
/// Unknown/unsupported resource type.
#[error("unsupported resource type")]
@@ -420,7 +420,6 @@ impl ResourceManager {
}
/// Allocate requested resources for a device.
#[allow(clippy::question_mark)]
pub fn allocate_device_resources(
&self,
requests: &[ResourceConstraint],
@@ -436,7 +435,10 @@ impl ResourceManager {
constraint.max = r.1 as u64;
}
match self.allocate_pio_address(&constraint) {
Some(base) => Resource::PioAddressRange { base, size: *size },
Some(base) => Resource::PioAddressRange {
base: base as u16,
size: *size,
},
None => {
if let Err(e) = self.free_device_resources(&resources) {
return Err(e);
@@ -567,7 +569,9 @@ impl ResourceManager {
Resource::KvmMemSlot(slot) => self.free_kvm_mem_slot(*slot),
Resource::MacAddresss(_) => Ok(()),
};
result?;
if result.is_err() {
return result;
}
}
Ok(())
}
@@ -584,9 +588,9 @@ mod tests {
// Allocate/free shared IRQs multiple times.
assert_eq!(mgr.allocate_legacy_irq(true, None).unwrap(), SHARED_IRQ);
assert_eq!(mgr.allocate_legacy_irq(true, None).unwrap(), SHARED_IRQ);
mgr.free_legacy_irq(SHARED_IRQ).unwrap();
mgr.free_legacy_irq(SHARED_IRQ).unwrap();
mgr.free_legacy_irq(SHARED_IRQ).unwrap();
mgr.free_legacy_irq(SHARED_IRQ);
mgr.free_legacy_irq(SHARED_IRQ);
mgr.free_legacy_irq(SHARED_IRQ);
// Allocate specified IRQs.
assert_eq!(
@@ -594,7 +598,7 @@ mod tests {
.unwrap(),
LEGACY_IRQ_BASE + 10
);
mgr.free_legacy_irq(LEGACY_IRQ_BASE + 10).unwrap();
mgr.free_legacy_irq(LEGACY_IRQ_BASE + 10);
assert_eq!(
mgr.allocate_legacy_irq(false, Some(LEGACY_IRQ_BASE + 10))
.unwrap(),
@@ -631,19 +635,19 @@ mod tests {
let mgr = ResourceManager::new(None);
let msi = mgr.allocate_msi_irq(3).unwrap();
mgr.free_msi_irq(msi, 3).unwrap();
mgr.free_msi_irq(msi, 3);
let msi = mgr.allocate_msi_irq(3).unwrap();
mgr.free_msi_irq(msi, 3).unwrap();
mgr.free_msi_irq(msi, 3);
let irq = mgr.allocate_msi_irq_aligned(8).unwrap();
assert_eq!(irq & 0x7, 0);
mgr.free_msi_irq(msi, 8).unwrap();
mgr.free_msi_irq(msi, 8);
let irq = mgr.allocate_msi_irq_aligned(8).unwrap();
assert_eq!(irq & 0x7, 0);
let irq = mgr.allocate_msi_irq_aligned(512).unwrap();
assert_eq!(irq, 512);
mgr.free_msi_irq(irq, 512).unwrap();
mgr.free_msi_irq(irq, 512);
let irq = mgr.allocate_msi_irq_aligned(512).unwrap();
assert_eq!(irq, 512);
@@ -686,9 +690,9 @@ mod tests {
},
];
let resources = mgr.allocate_device_resources(&requests, false).unwrap();
mgr.free_device_resources(&resources).unwrap();
mgr.free_device_resources(&resources);
let resources = mgr.allocate_device_resources(&requests, false).unwrap();
mgr.free_device_resources(&resources).unwrap();
mgr.free_device_resources(&resources);
requests.push(ResourceConstraint::PioAddress {
range: Some((0xc000, 0xc000)),
align: 0x1000,
@@ -698,7 +702,7 @@ mod tests {
let resources = mgr
.allocate_device_resources(&requests[0..requests.len() - 1], false)
.unwrap();
mgr.free_device_resources(&resources).unwrap();
mgr.free_device_resources(&resources);
}
#[test]
@@ -717,7 +721,7 @@ mod tests {
let mgr = ResourceManager::new(None);
assert_eq!(mgr.allocate_kvm_mem_slot(1, None).unwrap(), 0);
assert_eq!(mgr.allocate_kvm_mem_slot(1, Some(200)).unwrap(), 200);
mgr.free_kvm_mem_slot(200).unwrap();
mgr.free_kvm_mem_slot(200);
assert_eq!(mgr.allocate_kvm_mem_slot(1, Some(200)).unwrap(), 200);
assert_eq!(
mgr.allocate_kvm_mem_slot(1, Some(KVM_USER_MEM_SLOTS))

Some files were not shown because too many files have changed in this diff Show More