Compare commits

..

67 Commits
2.3.0 ... 2.2.1

Author SHA1 Message Date
Fabiano Fidêncio
af0fbb9460 Merge pull request #2723 from fidencio/2.2.1-branch-bump
# Kata Containers 2.2.1
2021-09-25 00:02:01 +02:00
Fabiano Fidêncio
bc48a58806 Merge pull request #2731 from fidencio/wip/stable-2.2-release-fix-using-vendored-sources
stable-2.2 | workflows: Fix the config file path for using vendored sources
2021-09-25 00:01:43 +02:00
Fabiano Fidêncio
d581cdab4e Merge pull request #2728 from fidencio/wip/stable-2.2-fix-wrong-tags-attribution
stable-2.2 | workflows: Fix tag attribution
2021-09-24 23:01:18 +02:00
Fabiano Fidêncio
52fdfc4fed workflows: Fix the config file path for using vendored sources
There's a typo in the file that should receive the output of `cargo
vendor`.  We should use forward the output to `.cargo/config` instead of
`.cargo/vendor`.

This was introduced by 21c8511630.

Backports: #2730
Fixes: #2729

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit a525991c2c)
2021-09-24 20:29:15 +02:00
Fabiano Fidêncio
8d98e01414 workflows: Fix tag attribution
While releasing kata-containers 2.3.0-alpha1 we've hit some issues as
the tags attribution is done incorrectly.  We want an array of tags to
iterate over, but the currently code is just lost is the parenthesis.

This issue was introduced in a156288c1f.

Fixes: #2725

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 39dcbaa672)
2021-09-24 20:07:55 +02:00
Fabiano Fidêncio
688cc8e2bd release: Kata Containers 2.2.1
- stable-2.2 | watcher: ensure we create target mount point for storage
- stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
- [backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
- stable-2.2 | kata-deploy: Also provide "stable" & "latest" tags
- stable-2.2 | runtime: tracing: Fix logger passed in newContainer
- stable-2.2 | runtime: tracing: Use root context to stop tracing
- packaging: Backport QEMU's GitLab switch to 5.1.x
- stable-2.2 | workflows,release: Upload the vendored cargo code
- backport: Call agent shutdown test only in the correspondent CI_JOB
- packaging: Backport QEMU's switch to GitLab repos
- stable-2.2 | virtcontainers: fc: parse vcpuID correctly
- shimv2: Backport fixes for #2527
- backport-2.2: remove default config for arm64.
- stable-2.2 | versions: Upgrade to Cloud Hypervisor v18.0
- [backport]sandbox: Add device permissions such as /dev/null to cgroup
- [backport] runtime: Fix README link
- [backport] snap: Test variable instead of executing "branch"

d9b41fc5 watcher: ensure we create target mount point for storage
2b6327ac kata-deploy: Add more info about the stable tag
5256e085 kata-deploy: Improve README
02b46268 kata-deploy: Remove qemu-virtiofs runtime class
1b3058dd release: update the kata-deploy yaml files accordingly
98e2e935 kata-deploy: Add "stable" info to the README
8f25c7da kata-deploy: Update the README
84da2f8d workflows: Add "stable" & "latest" tags to kata-deploy
5c76f1c6 packaging: Backport QEMU's GitLab switch to 5.1.x
ba6fc328 packaging: Backport QEMU's switch to GitLab repos
d5f5da43 workflows,release: Upload the vendored cargo code
017cd3c5 ci: Call agent shutdown test only in the correspondent CI_JOB
2ca867da runtime: Add container field to logs
f4da502c shimv2: add information to method comment
16164241 shimv2: add logging to shimv2 api calls
25c7e118 virtiofs: Create shared directory with 0700 mode, not 0750
4c5bf057 virtcontainers: fc: parse vcpuID correctly
b3e620db runtime: tracing: Fix logger passed in newContainer
98c2ca13 runtime: tracing: Use root context to stop tracing
0481c507 backport-2.2: remove default config for arm64.
56920bc9 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
a1874ccd virtcontainers: clh: Revert the workaround incorrect default values
c2c65050 virtcontainers: clh: Re-generate the client code
7ee43f94 versions: Upgrade to Cloud Hypervisor v18.0
1792a9fe runtime: Fix README link
807cc8a3 sandbox: Add device permissions such as /dev/null to cgroup
5987f3b5 snap: Test variable instead of executing "branch"

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-09-24 12:34:35 +02:00
Fabiano Fidêncio
ebc23df752 Merge pull request #2714 from egernst/watcher-fixup-backport
stable-2.2 | watcher: ensure we create target mount point for storage
2021-09-24 09:32:29 +02:00
Fabiano Fidêncio
e58fabfc20 Merge pull request #2598 from c3d/backport/2589-virtiofsd-perms-perms
stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
2021-09-24 09:16:59 +02:00
Peng Tao
feb06dad8a Merge pull request #2623 from Bevisy/stable-2.2-2615-bp
[backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
2021-09-24 14:04:36 +08:00
Eric Ernst
d9b41fc583 watcher: ensure we create target mount point for storage
We would only create the target when updating files. We need to make
sure that we create the target if the source is a directory. Without
this, we'll fail to start a container that utilizes an empty configmap,
for example.

Add unit tests for this.

Fixes: #2638

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-23 15:45:57 -07:00
Julio Montes
7852b9f8e1 Merge pull request #2711 from fidencio/wip/stable-2.2-kata-deploy-use-stable-and-latest-tags
stable-2.2 | kata-deploy: Also provide "stable" & "latest" tags
2021-09-23 12:18:00 -05:00
Chelsea Mafrica
83f219577d Merge pull request #2668 from cmaf/tracing-newContainer-logger-bp-2.2
stable-2.2 | runtime: tracing: Fix logger passed in newContainer
2021-09-23 09:58:14 -07:00
Chelsea Mafrica
97421afe17 Merge pull request #2664 from cmaf/tracing-stop-rootctx-bp-2.2
stable-2.2 | runtime: tracing: Use root context to stop tracing
2021-09-23 09:57:57 -07:00
Fabiano Fidêncio
2b6327ac37 kata-deploy: Add more info about the stable tag
Let's make it as clear as possible for the user that if they go for a
tagged version of kata-deploy, eg, 2.2.1, they'll have the kata runtime
2.2.1 deployed on their cluster.

Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 3bdcfaa658)
2021-09-23 14:05:17 +02:00
Fabiano Fidêncio
5256e0852c kata-deploy: Improve README
Let's add more instructions in the README in order to make clear to the
reader what they can do to check whether kata-deploy is ready, or
whether they have to wait till proceeding with the next instruction.

Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 41c590fa0a)
2021-09-23 14:04:57 +02:00
Fabiano Fidêncio
02b46268f4 kata-deploy: Remove qemu-virtiofs runtime class
There's only one QEMU runtime class deployed as part of kata-deploy, and
that includes virtiofs support (which is the default for quite some time
already).  Knowing this, let's just remove the `qemu-virtiofs` runtime
class definition.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit debf3c9fe9)
2021-09-23 14:04:50 +02:00
Fabiano Fidêncio
1b3058dd24 release: update the kata-deploy yaml files accordingly
Let's teach our `update-repository-version.sh` script to properly update
the kata-deploy tags on both kata-deploy and kata-cleanup yaml files.

The 3 scenarios that we're dealing with, based on which branch we're
targetting, are:
```
 1) [main] ------> [main]        NO-OP
   "alpha0"       "alpha1"

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "latest"       | "latest"       |
  -----------------+----------------+----------------+
  kata-deploy-base | "stable        | "stable"       |
  -----------------+----------------+----------------+

 2) [main] ------> [stable] Update kata-deploy and
   "alpha2"         "rc0"   get rid of kata-deploy-base

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "latest"       | "rc0"          |
  -----------------+----------------+----------------+
  kata-deploy-base | "stable"       | REMOVED        |
  -----------------+----------------+----------------+

 3) [stable] ------> [stable]    Update kata-deploy
    "x.y.z"         "x.y.(z+1)"

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "x.y.z"        | "x.y.(z+1)"    |
  -----------------+----------------+----------------+
  kata-deploy-base | NON-EXISTENT   | NON-EXISTENT   |
  -----------------+----------------+----------------+
```

And we can easily cover those 3 cases only with the information about
the "${target_branch}" and the "${new_version}", where:
* case 1) if "${target_branch}" is "main" *and* "${new_version}"
  contains "alpha", do nothing
* case 2) if "${target_branch}" is "main" *and* "${new_version}"
  contains "rc":
  * change the kata-deploy & kata-cleanup tags from "latest" to
    "${new_version}".
  * delete the kata-deploy-stable & kata-cleanup-stable files.
* case 3) if the "${target_branch}" contains "stable":
  * change the kata-deploy & kata-cleanup tags from "${current_version}"
    to "${new_version}".

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 43a72d76e2)
2021-09-23 14:04:44 +02:00
Fabiano Fidêncio
98e2e93552 kata-deploy: Add "stable" info to the README
Similar to the instructions we have for the "latest" images, let's also
add instructions about the "stable" images.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit ea9b2f9c92)
2021-09-23 14:04:38 +02:00
Fabiano Fidêncio
8f25c7da11 kata-deploy: Update the README
Let's just point to our repo URLs rather than assume users using
kata-deploy will have our repo cloned.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit e541105680)
2021-09-23 14:04:29 +02:00
Fabiano Fidêncio
84da2f8ddc workflows: Add "stable" & "latest" tags to kata-deploy
When releasing a tarball, let's *also* add the "stable" & "latest" tags
to the kata-deploy image.

The "stable" tag refers to any official release, while the "latest" tag
refers to any pre-release / release candidate.

Fixes: #2302

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit a156288c1f)
2021-09-23 14:01:33 +02:00
Fabiano Fidêncio
de0e3915b7 Merge pull request #2702 from Jakob-Naucke/backport-qemu-gitlab
packaging: Backport QEMU's GitLab switch to 5.1.x
2021-09-23 12:59:17 +02:00
Jakob Naucke
5c76f1c65a packaging: Backport QEMU's GitLab switch to 5.1.x
This brings #2699 to 5.1.x for ARM. Add a `no_patches.txt` for 5.1.0
which was missing apparently.

Fixes: #2701
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-09-23 11:11:45 +02:00
Fabiano Fidêncio
522a53010c Merge pull request #2690 from fidencio/wip/stable-2.2-upload-cargo-vendored-tarball
stable-2.2 | workflows,release: Upload the vendored cargo code
2021-09-22 22:07:08 +02:00
Julio Montes
852fc53351 Merge pull request #2688 from GabyCT/shutdown
backport: Call agent shutdown test only in the correspondent CI_JOB
2021-09-22 09:53:14 -05:00
Julio Montes
e0a27b5e90 Merge pull request #2699 from Jakob-Naucke/backport-qemu-gitlab
packaging: Backport QEMU's switch to GitLab repos
2021-09-22 09:19:16 -05:00
Jakob Naucke
ba6fc32804 packaging: Backport QEMU's switch to GitLab repos
QEMU's submodule checkout from git.qemu.org can fail. On QEMU 6.x, this
is not a problem because they moved to GitLab. However, we use QEMU 5.2
on stable-2.2, which can be a problem when no cached QEMU is used.
Backport QEMU's switch.

Fixes: #2698
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-09-22 14:59:35 +02:00
Fabiano Fidêncio
d5f5da4323 workflows,release: Upload the vendored cargo code
As part of the release, let's also upload a tarball with the vendored
cargo code.  By doing this we allow distros, which usually don't have
access to the internet while performing the builds, to just add the
vendored code as a second source, making the life of the downstream
maintainers slightly easier*.

Fixes: #1203
Backports: #2573

*: The current workflow requires the downstream maintainer to download
the tarball, unpack it, run `cargo vendor`, create the tarball, etc.
Although this doesn't look like a ridiculous amount of work, it's better
if we can have it in an automated fashion.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 21c8511630)
2021-09-21 21:48:58 +02:00
Gabriela Cervantes
017cd3c53c ci: Call agent shutdown test only in the correspondent CI_JOB
The agent shutdown test should only run on the CI JOB of CRI_CONTAINERD_K8S_MINIMAL
which is the only one where testing tracing is being enabled, however, this
test is being triggered in multiple CI jobs where it should not run. This PR
fixes that issue.

Fixes #2683

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2021-09-21 17:01:09 +00:00
Chelsea Mafrica
484af1a559 Merge pull request #2678 from nubificus/stable-2.2-fix_fc_vcpu_thread
stable-2.2 | virtcontainers: fc: parse vcpuID correctly
2021-09-20 09:46:07 -07:00
Chelsea Mafrica
a572a6ebf8 Merge pull request #2679 from c3d/backport/2527-adding-debugging-msgs
shimv2: Backport fixes for #2527
2021-09-20 09:42:53 -07:00
Snir Sheriber
2ca867da7b runtime: Add container field to logs
and unified field naming

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 0c7789fad6
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:04:09 +02:00
Snir Sheriber
f4da502c4f shimv2: add information to method comment
add a comment to explicitly mentioned method is a binary call

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 72e3538e36
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:03:45 +02:00
Snir Sheriber
16164241df shimv2: add logging to shimv2 api calls
and also fetch and log container id from the request

Fixes: #2527
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 8dadca9cd1
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:02:35 +02:00
Christophe de Dinechin
25c7e1181a virtiofs: Create shared directory with 0700 mode, not 0750
A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no real good reason for a non-root user to access
the shared directory, and this is potentially dangerous.

Fixes: #2589

[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 10:54:18 +02:00
Anastassios Nanos
4c5bf0576b virtcontainers: fc: parse vcpuID correctly
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any go variant of string to int
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".

This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.

Fixes: #2592

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-09-18 08:10:13 +00:00
Chelsea Mafrica
b3e620dbcf runtime: tracing: Fix logger passed in newContainer
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.

The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.

Backport of #2666
Fixes #2667

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 16:30:29 -07:00
Chelsea Mafrica
98c2ca13c1 runtime: tracing: Use root context to stop tracing
Call StopTracing with s.rootCtx, which is the root context for tracing,
instead of s.ctx, which is parent to a subset of trace spans.

Backport of #2662

Fixes #2663

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 11:19:40 -07:00
Fabiano Fidêncio
a97c9063db Merge pull request #2642 from jongwu/qemu_mak_2.2
backport-2.2: remove default config for arm64.
2021-09-16 07:21:32 +02:00
Jianyong Wu
0481c5070c backport-2.2: remove default config for arm64.
The current default config in qemu for arm64 doesn't suit for qemu
version 5.1+, so remove them here.

Fixes: #2595
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-09-15 10:07:13 +08:00
Samuel Ortiz
64504061c8 Merge pull request #2619 from likebreath/0913/backport_clh_v18.0
stable-2.2 | versions: Upgrade to Cloud Hypervisor v18.0
2021-09-14 12:02:50 +02:00
Binbin Zhang
56920bc943 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.

Fixes: #2615
Backport: #2616

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-14 10:33:49 +08:00
Bo Chen
a1874ccd62 virtcontainers: clh: Revert the workaround incorrect default values
Given the fix to the bugs of the openapi spec file is included in the
Cloud Hypervisor v18.0 [1], this patch reverts the workaround we carried
in the CLH driver.

This reverts commit 932ee41b3f.

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3029

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f785ff0bf2)
2021-09-13 14:17:58 -07:00
Bo Chen
c2c650500b virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v18.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 0e0e59dc5f)
2021-09-13 14:17:58 -07:00
Bo Chen
7ee43f9468 versions: Upgrade to Cloud Hypervisor v18.0
Highlights from the Cloud Hypervisor release v18.0: 1) Experimental User
Device (vfio-user) support; 2) Migration support for vhost-user devices;
3) VHDX disk image support; 4) Device pass through on MSHV hypervisor;
5) AArch64 for support virtio-mem; 6) Live migration on MSHV hypervisor;
7) AArch64 CPU topology support; 8) Power button support on AArch64; 9)
Various bug fixes on PTY, TTY, signal handling, and live-migration on
AArch64.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v18.0

Fixes: #2543

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f0b5331430)
2021-09-13 14:17:58 -07:00
Samuel Ortiz
eedf139076 Merge pull request #2608 from Bevisy/main-2539-bp
[backport]sandbox: Add device permissions such as /dev/null to cgroup
2021-09-13 19:07:17 +02:00
Fabiano Fidêncio
54a6890c3c Merge pull request #2614 from sameo/stable-2.2
[backport] runtime: Fix README link
2021-09-13 17:45:07 +02:00
Samuel Ortiz
1792a9fe11 runtime: Fix README link
The LICENSE file lives in the project's root.

Fixes #2612

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-09-11 09:57:49 +02:00
Julio Montes
9bf95279be Merge pull request #2588 from devimc/2021-09-07/backport/fixSnap
[backport] snap: Test variable instead of executing "branch"
2021-09-10 14:44:55 -05:00
Binbin Zhang
807cc8a3a5 sandbox: Add device permissions such as /dev/null to cgroup
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec

Fixes: #2539
Backports: #2603

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-10 17:33:26 +08:00
David Gibson
5987f3b5e1 snap: Test variable instead of executing "branch"
In snapcraft.yaml we have a case statement on $(branch) - that is on the
output of executing a command "branch".  From the selections it appears
that what it actually wants is to simply select on the contents of the
$branch variable, which should be ${branch} instead.

fixes #2558

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-07 09:37:17 -05:00
Fabiano Fidêncio
caafd0f952 Merge pull request #2541 from fidencio/2.2.0-branch-bump
# Kata Containers 2.2.0
2021-09-01 00:33:25 +02:00
Fabiano Fidêncio
800126b272 release: Kata Containers 2.2.0
- runtime: drop qemu-lite support
- stable-2.2 | virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
- backport ci: Temporarily skip agent shutdown test on s390x
- backport: build_image: Fix error soft link about initrd.img

dca35c17 docs: remove mentioning of qemu-lite
0bdfdad2 runtime: drop qemu-lite support
60155756 runtime: fix default hypervisor path
ca9e6538 ci: Temporarily skip agent shutdown test on s390x
938b01ae virtcontainers: clh: Workaround incorrect default values
abd708e8 virtcontainers: clh: Fix the unit test
61babd45 virtcontainers: clh: Use constructors to ensure proper default value
59c51f62 virtcontainers: clh: Migrate to use the updated client APIs
c1f260cc virtcontainers: clh: Re-generate the client code
4cd6909f virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
efa2d54e build_image: Fix error soft link about initrd.img

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-08-31 18:44:03 +02:00
Archana Shinde
b1372b353f Merge pull request #2533 from bergwolf/qemu-lite
runtime: drop qemu-lite support
2021-08-31 07:39:24 -07:00
Peng Tao
dca35c1730 docs: remove mentioning of qemu-lite
vm-templating should just work with upstream qemu v4.1.0 or above.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:12 +08:00
Peng Tao
0bdfdad236 runtime: drop qemu-lite support
As the project is not maintained and we have not been testing against it
for a long time.

Fixes: #2529
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:06 +08:00
Peng Tao
60155756f3 runtime: fix default hypervisor path
Should not be qemu-lite.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:16:57 +08:00
Fabiano Fidêncio
669888c339 Merge pull request #2525 from likebreath/0827/backport_clh_generator
stable-2.2 | virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
2021-08-30 21:25:05 +02:00
GabyCT
cde008f441 Merge pull request #2531 from Jakob-Naucke/backport-s390x-skip-agent-shutdown-test
backport ci: Temporarily skip agent shutdown test on s390x
2021-08-30 09:25:50 -05:00
Peng Tao
7c866073f9 Merge pull request #2520 from Bevisy/stable-2.2-2503
backport: build_image: Fix error soft link about initrd.img
2021-08-30 20:16:55 +08:00
Jakob Naucke
ca9e6538e6 ci: Temporarily skip agent shutdown test on s390x
see https://github.com/kata-containers/tests/issues/3878 for tracking

Fixes: #2507
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-08-30 14:14:43 +02:00
Bo Chen
938b01aedc virtcontainers: clh: Workaround incorrect default values
Two default values defined in the 'cloud-hypervisor.yaml' have typo, and this
patch manually overwrites them with the correct value as a workaround
before the corresponding fix is landed to Cloud Hypervisor upstream.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 932ee41b3f)
2021-08-27 13:37:47 -07:00
Bo Chen
abd708e814 virtcontainers: clh: Fix the unit test
This patch fixes the unit tests over clh.go with the updated client code.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit bff38e4f4d)
2021-08-27 13:37:47 -07:00
Bo Chen
61babd45ed virtcontainers: clh: Use constructors to ensure proper default value
With the updated openapi-generator, the client code now handles optional
attributes correctly, and ensures to assign the right default
values. This patch enables to use those constructors to make sure the
proper default values being used.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d967d3cb37)
2021-08-27 13:37:47 -07:00
Bo Chen
59c51f6201 virtcontainers: clh: Migrate to use the updated client APIs
The client code (and APIs) for Cloud Hypervisor has been changed
dramatically due to the upgrade to `openapi-generator` v5.2.1. This
patch migrate the Cloud Hypervisor driver in the kata-runtime to use
those updated APIs.

The main change from the client code is that it now uses "pointer" type
to represent "optional" attributes from the input openapi specification
file.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit a6a2e525de)
2021-08-27 13:37:47 -07:00
Bo Chen
c1f260cc40 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor with the
updated `openapi-generator` v5.2.1.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 46eb07e14f)
2021-08-27 13:37:47 -07:00
Bo Chen
4cd6909f18 virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
To improve the quality and correctness of the auto-generated code, this
patch upgrade the `openapi-generator` to its latest stable release
v5.2.1.

Fixes: #2487

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 80fba4d637)
2021-08-27 13:37:47 -07:00
Binbin Zhang
efa2d54e85 build_image: Fix error soft link about initrd.img
fix error soft link about initrd.img

Fixes #2503

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-27 16:15:49 +08:00
812 changed files with 44615 additions and 36991 deletions

View File

@@ -1,6 +1,6 @@
name: kata deploy build
name: kata-deploy-build
on: [push, pull_request]
on: push
jobs:
build-asset:
@@ -9,7 +9,6 @@ jobs:
matrix:
asset:
- kernel
- kernel-experimental
- shim-v2
- qemu
- cloud-hypervisor
@@ -25,7 +24,7 @@ jobs:
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r --preserve=all "${build_dir}" "kata-build"
@@ -48,21 +47,12 @@ jobs:
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: build
path: kata-artifacts
- name: merge-artifacts
run: |
make merge-builds
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
make-kata-tarball:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: make kata-tarball
run: |
make kata-tarball
sudo make install-tarball

View File

@@ -5,121 +5,60 @@ on:
name: test-kata-deploy
jobs:
check-comment-and-membership:
runs-on: ubuntu-latest
if: |
github.event.issue.pull_request
&& github.event_name == 'issue_comment'
&& github.event.action == 'created'
&& startsWith(github.event.comment.body, '/test_kata_deploy')
steps:
- name: Check membership
uses: kata-containers/is-organization-member@1.0.1
id: is_organization_member
with:
organization: kata-containers
username: ${{ github.event.comment.user.login }}
token: ${{ secrets.GITHUB_TOKEN }}
- name: Fail if not member
run: |
result=${{ steps.is_organization_member.outputs.result }}
if [ $result == false ]; then
user=${{ github.event.comment.user.login }}
echo Either ${user} is not part of the kata-containers organization
echo or ${user} has its Organization Visibility set to Private at
echo https://github.com/orgs/kata-containers/people?query=${user}
echo
echo Ensure you change your Organization Visibility to Public and
echo trigger the test again.
exit 1
fi
build-asset:
runs-on: ubuntu-latest
needs: check-comment-and-membership
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
steps:
- uses: actions/checkout@v2
- name: Install docker
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v2
- name: get-artifacts
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
kata-deploy:
needs: create-kata-tarball
check_comments:
if: ${{ github.event.issue.pull_request }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: get-kata-tarball
uses: actions/download-artifact@v2
- name: Check for Command
id: command
uses: kata-containers/slash-command-action@v1
with:
name: kata-static-tarball
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
repo-token: ${{ secrets.GITHUB_TOKEN }}
command: "test_kata_deploy"
reaction: "true"
reaction-type: "eyes"
allow-edits: "false"
permission-level: admin
- name: verify command arg is kata-deploy
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
pushd $GITHUB_WORKSPACE
git checkout $tag
pkg_sha=$(git rev-parse HEAD)
popd
mv kata-static.tar.xz $GITHUB_WORKSPACE/tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t quay.io/kata-containers/kata-deploy-ci:$pkg_sha $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$pkg_sha
mkdir -p packaging/kata-deploy
ln -s $GITHUB_WORKSPACE/tools/packaging/kata-deploy/action packaging/kata-deploy/action
echo "::set-output name=PKG_SHA::${pkg_sha}"
echo "The command was '${{ steps.command.outputs.command-name }}' with arguments '${{ steps.command.outputs.command-arguments }}'"
create-and-test-container:
needs: check_comments
runs-on: ubuntu-latest
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- name: check out
uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: build-container-image
id: build-container-image
run: |
PR_SHA=$(git log --format=format:%H -n1)
VERSION="2.0.0"
ARTIFACT_URL="https://github.com/kata-containers/kata-containers/releases/download/${VERSION}/kata-static-${VERSION}-x86_64.tar.xz"
wget "${ARTIFACT_URL}" -O tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:${PR_SHA} -t quay.io/kata-containers/kata-deploy-ci:${PR_SHA} ./tools/packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$PR_SHA
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$PR_SHA
echo "##[set-output name=pr-sha;]${PR_SHA}"
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
uses: ./tools/packaging/kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
packaging-sha: ${{ steps.build-container-image.outputs.pr-sha }}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
PKG_SHA: ${{ steps.build-container-image.outputs.pr-sha }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

295
.github/workflows/main.yaml vendored Normal file
View File

@@ -0,0 +1,295 @@
name: Publish release tarball
on:
push:
tags:
- '1.*'
jobs:
get-artifact-list:
runs-on: ubuntu-latest
steps:
- name: get the list
run: |
pushd $GITHUB_WORKSPACE
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git checkout $tag
popd
$GITHUB_WORKSPACE/tools/packaging/artifact-list.sh > artifact-list.txt
- name: save-artifact-list
uses: actions/upload-artifact@master
with:
name: artifact-list
path: artifact-list.txt
build-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kernel"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-kernel
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-kernel.tar.gz
build-experimental-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_experimental_kernel"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-experimental-kernel
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-experimental-kernel.tar.gz
build-qemu:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-qemu
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-qemu.tar.gz
# Job for building the image
build-image:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_image"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-image
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-image.tar.gz
# Job for building firecracker hypervisor
build-firecracker:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_firecracker"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-firecracker
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-firecracker.tar.gz
# Job for building cloud-hypervisor
build-clh:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_clh"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-clh
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-clh.tar.gz
# Job for building kata components
build-kata-components:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kata_components"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-kata-components
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-kata-components.tar.gz
gather-artifacts:
runs-on: ubuntu-16.04
needs: [build-experimental-kernel, build-kernel, build-qemu, build-image, build-firecracker, build-kata-components, build-clh]
steps:
- uses: actions/checkout@v1
- name: get-artifacts
uses: actions/download-artifact@master
with:
name: kata-artifacts
- name: colate-artifacts
run: |
$GITHUB_WORKSPACE/.github/workflows/gather-artifacts.sh
- name: store-artifacts
uses: actions/upload-artifact@master
with:
name: release-candidate
path: kata-static.tar.xz
kata-deploy:
needs: gather-artifacts
runs-on: ubuntu-latest
steps:
- name: get-artifacts
uses: actions/download-artifact@master
with:
name: release-candidate
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git clone https://github.com/kata-containers/packaging
pushd packaging
git checkout $tag
pkg_sha=$(git rev-parse HEAD)
popd
mv release-candidate/kata-static.tar.xz ./packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:$pkg_sha -t quay.io/kata-containers/kata-deploy-ci:$pkg_sha ./packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$pkg_sha
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$pkg_sha
echo "::set-output name=PKG_SHA::${pkg_sha}"
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: push-tarball
run: |
# tag the container image we created and push to DockerHub
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
docker tag katadocker/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} katadocker/kata-deploy:${tag}
docker push katadocker/kata-deploy:${tag}
upload-static-tarball:
needs: kata-deploy
runs-on: ubuntu-latest
steps:
- name: download-artifacts
uses: actions/download-artifact@master
with:
name: release-candidate
- name: install hub
run: |
HUB_VER=$(curl -s "https://api.github.com/repos/github/hub/releases/latest" | jq -r .tag_name | sed 's/^v//')
wget -q -O- https://github.com/github/hub/releases/download/v$HUB_VER/hub-linux-amd64-$HUB_VER.tgz | \
tar xz --strip-components=2 --wildcards '*/bin/hub' && sudo mv hub /usr/local/bin/hub
- name: push static tarball to github
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-static-$tag-x86_64.tar.xz"
repo="https://github.com/kata-containers/runtime.git"
mv release-candidate/kata-static.tar.xz "release-candidate/${tarball}"
git clone "${repo}"
cd runtime
echo "uploading asset '${tarball}' to '${repo}' tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "../release-candidate/${tarball}" "${tag}"

View File

@@ -149,31 +149,3 @@ jobs:
tar -cvzf "${tarball}" src/agent/.cargo/config src/agent/vendor
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
popd
upload-libseccomp-tarball:
needs: upload-cargo-vendored-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: download-and-upload-tarball
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
GOPATH: ${HOME}/go
run: |
pushd $GITHUB_WORKSPACE
./ci/install_yq.sh
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
versions_yaml="versions.yaml"
version=$(${GOPATH}/bin/yq read ${versions_yaml} "externals.libseccomp.version")
repo_url=$(${GOPATH}/bin/yq read ${versions_yaml} "externals.libseccomp.url")
download_url="${repo_url}/releases/download/v${version}"
tarball="libseccomp-${version}.tar.gz"
asc="${tarball}.asc"
curl -sSLO "${download_url}/${tarball}"
curl -sSLO "${download_url}/${asc}"
# "-m" option should be empty to re-use the existing release title
# without opening a text editor.
# For the details, check https://hub.github.com/hub-release.1.html.
hub release edit -m "" -a "${tarball}" "${tag}"
hub release edit -m "" -a "${asc}" "${tag}"
popd

View File

@@ -12,7 +12,8 @@ on:
- reopened
- labeled
- unlabeled
branches:
pull_request:
branches:
- main
jobs:
@@ -31,6 +32,8 @@ jobs:
- name: Checkout code to allow hub to communicate with the project
uses: actions/checkout@v2
with:
token: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
- name: Install porting checker script
run: |

View File

@@ -13,7 +13,7 @@ jobs:
test:
strategy:
matrix:
go-version: [1.16.x, 1.17.x]
go-version: [1.15.x, 1.16.x]
os: [ubuntu-20.04]
runs-on: ${{ matrix.os }}
env:
@@ -60,21 +60,13 @@ jobs:
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
- name: Building rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
# Check whether the vendored code is up-to-date & working as the first thing
- name: Check vendored code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
@@ -92,7 +84,3 @@ jobs:
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make test
- name: Run Unit Tests As Root User
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && sudo -E PATH="$PATH" make test

View File

@@ -18,7 +18,6 @@ TOOLS += agent-ctl
STANDARD_TARGETS = build check clean install test vendor
include utils.mk
include ./tools/packaging/kata-deploy/local-build/Makefile
all: build
@@ -34,4 +33,10 @@ generate-protocols:
static-checks: build
bash ci/static-checks.sh
binary-tarball:
make -f ./tools/packaging/kata-deploy/local-build/Makefile
install-binary-tarball:
make -f ./tools/packaging/kata-deploy/local-build/Makefile install
.PHONY: all default static-checks binary-tarball install-binary-tarball

View File

@@ -1 +1 @@
2.3.0
2.2.1

30
ci/go-no-os-exit.sh Executable file
View File

@@ -0,0 +1,30 @@
#!/bin/bash
# Copyright (c) 2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Check there are no os.Exit() calls creeping into the code
# We don't use that exit path in the Kata codebase.
# Allow the path to check to be over-ridden.
# Default to the current directory.
go_packages=${1:-.}
echo "Checking for no os.Exit() calls for package [${go_packages}]"
candidates=`go list -f '{{.Dir}}/*.go' $go_packages`
for f in $candidates; do
filename=`basename $f`
# skip all go test files
[[ $filename == *_test.go ]] && continue
# skip exit.go where, the only file we should call os.Exit() from.
[[ $filename == "exit.go" ]] && continue
files="$f $files"
done
[ -z "$files" ] && echo "No files to check, skipping" && exit 0
if egrep -n '\<os\.Exit\>' $files; then
echo "Direct calls to os.Exit() are forbidden, please use exit() so atexit() works"
exit 1
fi

View File

@@ -1,109 +0,0 @@
#!/bin/bash
#
# Copyright 2021 Sony Group Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
clone_tests_repo
source "${tests_repo_dir}/.ci/lib.sh"
# The following variables if set on the environment will change the behavior
# of gperf and libseccomp configure scripts, that may lead this script to
# fail. So let's ensure they are unset here.
unset PREFIX DESTDIR
arch=$(uname -m)
workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
# Currently, specify the libseccomp version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# libseccomp_version=$(get_version "externals.libseccomp.version")
# libseccomp_url=$(get_version "externals.libseccomp.url")
libseccomp_version="2.5.1"
libseccomp_url="https://github.com/seccomp/libseccomp"
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
cflags="-O2"
# Variables for gperf
# Currently, specify the gperf version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# gperf_version=$(get_version "externals.gperf.version")
# gperf_url=$(get_version "externals.gperf.url")
gperf_version="3.1"
gperf_url="https://ftp.gnu.org/gnu/gperf"
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
# We need to build the libseccomp library from sources to create a static library for the musl libc.
# However, ppc64le and s390x have no musl targets in Rust. Hence, we do not set cflags for the musl libc.
if ([ "${arch}" != "ppc64le" ] && [ "${arch}" != "s390x" ]); then
# Set FORTIFY_SOURCE=1 because the musl-libc does not have some functions about FORTIFY_SOURCE=2
cflags="-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -O2"
fi
die() {
msg="$*"
echo "[Error] ${msg}" >&2
exit 1
}
finish() {
rm -rf "${workdir}"
}
trap finish EXIT
build_and_install_gperf() {
echo "Build and install gperf version ${gperf_version}"
mkdir -p "${gperf_install_dir}"
curl -sLO "${gperf_tarball_url}"
tar -xf "${gperf_tarball}"
pushd "gperf-${gperf_version}"
./configure --prefix="${gperf_install_dir}"
make
make install
export PATH=$PATH:"${gperf_install_dir}"/bin
popd
echo "Gperf installed successfully"
}
build_and_install_libseccomp() {
echo "Build and install libseccomp version ${libseccomp_version}"
mkdir -p "${libseccomp_install_dir}"
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static
make
make install
popd
echo "Libseccomp installed successfully"
}
main() {
local libseccomp_install_dir="${1:-}"
local gperf_install_dir="${2:-}"
if [ -z "${libseccomp_install_dir}" ] || [ -z "${gperf_install_dir}" ]; then
die "Usage: ${0} <libseccomp-install-dir> <gperf-install-dir>"
fi
pushd "$workdir"
# gperf is required for building the libseccomp.
build_and_install_gperf
build_and_install_libseccomp
popd
}
main "$@"

View File

@@ -12,5 +12,5 @@ source "${cidir}/lib.sh"
clone_tests_repo
pushd ${tests_repo_dir}
.ci/install_rust.sh ${1:-}
.ci/install_rust.sh
popd

View File

@@ -4,6 +4,6 @@
#
# This is the build root image for Kata Containers on OpenShift CI.
#
FROM registry.centos.org/centos:8
FROM centos:8
RUN yum -y update && yum -y install git sudo wget

View File

@@ -86,16 +86,6 @@ One of the `initrd` and `image` options in Kata runtime config file **MUST** be
The main difference between the options is that the size of `initrd`(10MB+) is significantly smaller than
rootfs `image`(100MB+).
## Enable seccomp
Enable seccomp as follows:
```
$ sudo sed -i '/^disable_guest_seccomp/ s/true/false/' /etc/kata-containers/configuration.toml
```
This will pass container seccomp profiles to the kata agent.
## Enable full debug
Enable full debug as follows:
@@ -226,18 +216,6 @@ $ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/agent && make
```
The agent is built with seccomp capability by default.
If you want to build the agent without the seccomp capability, you need to run `make` with `SECCOMP=no` as follows.
```
$ make -C $GOPATH/src/github.com/kata-containers/kata-containers/src/agent SECCOMP=no
```
> **Note:**
>
> - If you enable seccomp in the main configuration file but build the agent without seccomp capability,
> the runtime exits conservatively with an error message.
## Get the osbuilder
```
@@ -256,21 +234,9 @@ the following example.
$ export ROOTFS_DIR=${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true ./rootfs.sh ${distro}'
```
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
You MUST choose one of `alpine`, `centos`, `clearlinux`, `debian`, `euleros`, `fedora`, `suse`, and `ubuntu` for `${distro}`. By default `seccomp` packages are not included in the rootfs image. Set `SECCOMP` to `yes` to include them.
> **Note:**
>
@@ -306,7 +272,6 @@ $ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
> - If you do *not* wish to build under Docker, remove the `USE_DOCKER`
> variable in the previous command and ensure the `qemu-img` command is
> available on your system.
> - If `qemu-img` is not installed, you will likely see errors such as `ERROR: File /dev/loop19p1 is not a block device` and `losetup: /tmp/tmp.bHz11oY851: Warning: file is smaller than 512 bytes; the loop device may be useless or invisible for system tools`. These can be mitigated by installing the `qemu-img` command (available in the `qemu-img` package on Fedora or the `qemu-utils` package on Debian).
### Install the rootfs image
@@ -325,23 +290,12 @@ $ (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers.img)
$ export ROOTFS_DIR="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs"
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`.
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`. By default `seccomp` packages are not included in the initrd image. Set `SECCOMP` to `yes` to include them.
You MUST choose one of `alpine`, `centos`, `clearlinux`, `euleros`, and `fedora` for `${distro}`.
> **Note:**
>

View File

@@ -11,10 +11,6 @@ For details of the other Kata Containers repositories, see the
* [Installation guides](./install/README.md): Install and run Kata Containers with Docker or Kubernetes
## Tracing
See the [tracing documentation](tracing.md).
## More User Guides
* [Upgrading](Upgrading.md): how to upgrade from [Clear Containers](https://github.com/clearcontainers) and [runV](https://github.com/hyperhq/runv) to [Kata Containers](https://github.com/kata-containers) and how to upgrade an existing Kata Containers system to the latest version.
@@ -44,7 +40,6 @@ Documents that help to understand and contribute to Kata Containers.
* [Kata Containers Architecture](design/architecture.md): Architectural overview of Kata Containers
* [Kata Containers E2E Flow](design/end-to-end-flow.md): The entire end-to-end flow of Kata Containers
* [Kata Containers design](./design/README.md): More Kata Containers design documents
* [Kata Containers threat model](./threat-model/threat-model.md): Kata Containers threat model
### How to Contribute

View File

@@ -64,7 +64,7 @@
### Check Git-hub Actions
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/release.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/kata-containers/releases).

View File

@@ -14,7 +14,7 @@ through the [CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and
Kata Containers creates a QEMU\*/KVM virtual machine for pod that `kubelet` (Kubernetes) creates respectively.
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/cmd/containerd-shim-kata-v2/)
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/containerd-shim-v2)
is the Kata Containers entrypoint, which
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
@@ -259,7 +259,7 @@ With `RuntimeClass`, users can define Kata Containers as a `RuntimeClass` and th
## DAX
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.rst?h=v5.14)
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
feature to efficiently map some host-side files into the guest VM space.
In particular, Kata Containers uses the QEMU NVDIMM feature to provide a
memory-mapped virtual device that can be used to DAX map the virtual machine's

View File

@@ -12,244 +12,187 @@ The OCI [runtime specification][linux-config] provides guidance on where the con
> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container
Cgroups are hierarchical, and this can be seen with the following pod example:
cgroups are hierarchical, and this can be seen with the following pod example:
- Pod 1: `cgroupsPath=/kubepods/pod1`
- Container 1: `cgroupsPath=/kubepods/pod1/container1`
- Container 2: `cgroupsPath=/kubepods/pod1/container2`
- Container 1:
`cgroupsPath=/kubepods/pod1/container1`
- Container 2:
`cgroupsPath=/kubepods/pod1/container2`
- Pod 2: `cgroupsPath=/kubepods/pod2`
- Container 1: `cgroupsPath=/kubepods/pod2/container2`
- Container 2: `cgroupsPath=/kubepods/pod2/container2`
- Container 1:
`cgroupsPath=/kubepods/pod2/container2`
- Container 2:
`cgroupsPath=/kubepods/pod2/container2`
Depending on the upper-level orchestration layers, the cgroup under which the pod is placed is
managed by the orchestrator or not. In the case of Kubernetes, the pod cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime.
Kubelet will size the pod cgroup based on the container resource requirements, to which it may add
a configured set of [pod resource overheads](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/).
Depending on the upper-level orchestrator, the cgroup under which the pod is placed is
managed by the orchestrator. In the case of Kubernetes, the pod-cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime. Kubelet will size the pod-cgroup
based on the container resource requirements.
Kata Containers introduces a non-negligible resource overhead for running a sandbox (pod). Typically, the Kata shim,
through its underlying VMM invocation, will create many additional threads compared to process based container runtimes:
the para-virtualized I/O back-ends, the VMM instance or even the Kata shim process, all of those host processes consume
memory and CPU time not directly tied to the container workload, and introduces a sandbox resource overhead.
In order for a Kata workload to run without significant performance degradation, its sandbox overhead must be
provisioned accordingly. Two scenarios are possible:
Kata Containers introduces a non-negligible overhead for running a sandbox (pod). Based on this, two scenarios are possible:
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod-cgroup, or
2) Kata Containers do not fully constrain the VMM and associated processes, instead placing a subset of them outside of the pod-cgroup.
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod cgroup.
For example, Kubernetes [`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
feature lets the orchestrator add a configured sandbox overhead to the sum of all its containers resources. In
that case, the pod sandbox is properly sized and all Kata created processes will run under the pod cgroup
defined constraints and limits.
2) The upper-layer orchestrator does **not** take the sandbox overhead into account and the pod cgroup is not
sized to properly run all Kata created processes. With that scenario, attaching all the Kata processes to the sandbox
cgroup may lead to non-negligible workload performance degradations. As a consequence, Kata Containers will move
all processes but the vCPU threads into a dedicated overhead cgroup under `/kata_overhead`. The Kata runtime will
not apply any constraints or limits to that cgroup, it is up to the infrastructure owner to optionally set it up.
Kata Containers provides two options for how cgroups are handled on the host. Selection of these options is done through
the `SandboxCgroupOnly` flag within the Kata Containers [configuration](../../src/runtime/README.md#configuration)
file.
Those 2 scenarios are not dynamically detected by the Kata Containers runtime implementation, and thus the
infrastructure owner must configure the runtime according to how the upper-layer orchestrator creates and sizes the
pod cgroup. That configuration selection is done through the `sandbox_cgroup_only` flag within the Kata Containers
[configuration](../../src/runtime/README.md#configuration) file.
## `SandboxCgroupOnly` enabled
## `sandbox_cgroup_only = true`
With `SandboxCgroupOnly` enabled, it is expected that the parent cgroup is sized to take the overhead of running
a sandbox into account. This is ideal, as all the applicable Kata Containers components can be placed within the
given cgroup-path.
Setting `sandbox_cgroup_only` to `true` from the Kata Containers configuration file means that the pod cgroup is
properly sized and takes the pod overhead into account. This is ideal, as all the applicable Kata Containers processes
can simply be placed within the given cgroup path.
In the context of Kubernetes, Kubelet can size the pod cgroup to take the overhead of running a Kata-based sandbox
into account. This has been supported since the 1.16 Kubernetes release, through the
[`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) feature.
In the context of Kubernetes, Kubelet will size the pod-cgroup to take the overhead of running a Kata-based sandbox
into account. This will be feasible in the 1.16 Kubernetes release through the `PodOverhead` feature.
```
┌─────────────────────────────────────────┐
┌──────────────────────────────────┐ │
│ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ vCPU threads
│ │ │ I/O threads │ │ │ │
│ │ │ │ VMM │ │
│ │ │ │ Kata Shim
│ │ │ │ │ │ │
│ │ │ │ /kata_<sandbox_id>
│ │ └─────────────────────┘ │ │ │
│ │Pod 1 │ │ │
│ │ └─────────────────────────────┘ │ │
│ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │
│ │ │ vCPU threads
│ │ │ I/O threads │ │ │ │
│ │ │ │ VMM
│ │ │ Kata Shim │ │ │ │
│ │ │ │
│ │ │ │ /kata_<sandbox_id>
│ │ │ └─────────────────────┘ │ │ │
│ │ │Pod 2 │ │ │
│ │ └─────────────────────────────┘ │ │
│ │ │ │
│ │/kubepods │ │
│ └──────────────────────────────────┘ │
│ │
│ Node │
└─────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation details
### What does Kata do in this configuration?
1. Given a `PodSandbox` container creation, let:
When `sandbox_cgroup_only` is enabled, the Kata shim will create a per pod
sub-cgroup under the pod's dedicated cgroup. For example, in the Kubernetes context,
it will create a `/kata_<PodSandboxID>` under the `/kubepods` cgroup hierarchy.
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, the memory cgroup
subsystem for a pod with sandbox ID `12345678` would live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678`.
```
podCgroup=Parent(container.CgroupsPath)
KataSandboxCgroup=<podCgroup>/kata_<PodSandboxID>
```
In most cases, the `/kata_<PodSandboxID>` created cgroup is unrestricted and inherits and shares all
constraints and limits from the parent cgroup (`/kubepods` in the Kubernetes case). The exception is
for the `cpuset` and `devices` cgroup subsystems, which are managed by the Kata shim.
2. Create the cgroup, `KataSandboxCgroup`
After creating the `/kata_<PodSandboxID>` cgroup, the Kata Containers shim will move itself to it, **before** starting
the virtual machine. As a consequence all processes subsequently created by the Kata Containers shim (the VMM itself, and
all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>` cgroup.
3. Join the `KataSandboxCgroup`
### Why create a kata-cgroup under the parent cgroup?
Any process created by the runtime will be created in `KataSandboxCgroup`.
The runtime will limit the cgroup in the host only if the sandbox doesn't have a
container type annotation, but the caller is free to set the proper limits for the `podCgroup`.
And why not directly adding the per sandbox shim directly to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?
In the example above the pod cgroups are `/kubepods/pod1` and `/kubepods/pod2`.
Kata creates the unrestricted sandbox cgroup under the pod cgroup.
The Kata Containers shim implementation creates a per-sandbox cgroup
(`/kata_<PodSandboxID>`) to support the `Docker` use case. Although `Docker` does not
have a notion of pods, Kata Containers still creates a sandbox to support the pod-less,
single container use case that `Docker` implements. Since `Docker` does create any
cgroup hierarchy to place a container into, it would be very complex for Kata to map
a particular container to its sandbox without placing it under a `/kata_<containerID>>`
sub-cgroup first.
### Why create a Kata-cgroup under the parent cgroup?
### Advantages
`Docker` does not have a notion of pods, and will not create a cgroup directory
to place a particular container in (i.e., all containers would be in a path like
`/docker/container-id`. To simplify the implementation and continue to support `Docker`,
Kata Containers creates the sandbox-cgroup, in the case of Kubernetes, or a container cgroup, in the case
of docker.
Keeping all Kata Containers processes under a properly sized pod cgroup is ideal
and makes for a simpler Kata Containers implementation. It also helps with gathering
accurate statistics and preventing Kata workloads from being noisy neighbors.
### Improvements
#### Pod resources statistics
- Get statistics about pod resources
If the Kata caller wants to know the resource usage on the host it can get
statistics from the pod cgroup. All cgroups stats in the hierarchy will include
the Kata overhead. This gives the possibility of gathering usage-statics at the
pod level and the container level.
#### Better host resource isolation
- Better host resource isolation
Because the Kata runtime will place all the Kata processes in the pod cgroup,
the resource limits that the caller applies to the pod cgroup will affect all
processes that belong to the Kata sandbox in the host. This will improve the
isolation in the host preventing Kata to become a noisy neighbor.
## `sandbox_cgroup_only = false` (Default setting)
If the cgroup provided to Kata is not sized appropriately, Kata components will
consume resources that the actual container workloads expect to see and use.
This can cause instability and performance degradations.
To avoid that situation, Kata Containers creates an unconstrained overhead
cgroup and moves all non workload related processes (Anything but the virtual CPU
threads) to it. The name of this overhead cgroup is `/kata_overhead` and a per
sandbox sub cgroup will be created under it for each sandbox Kata Containers creates.
Kata Containers does not add any constraints or limitations on the overhead cgroup. It is up to the infrastructure
owner to either:
- Provision nodes with a pre-sized `/kata_overhead` cgroup. Kata Containers will
load that existing cgroup and move all non workload related processes to it.
- Let Kata Containers create the `/kata_overhead` cgroup, leave it
unconstrained or resize it a-posteriori.
## `SandboxCgroupOnly` disabled (default, legacy)
If the cgroup provided to Kata is not sized appropriately, instability will be
introduced when fully constraining Kata components, and the user-workload will
see a subset of resources that were requested. Based on this, the default
handling for Kata Containers is to not fully constrain the VMM and Kata
components on the host.
```
┌────────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────┐ ┌───────────────────────────┐ │
│ │ │ │ │ │
┌─────────────────────────┼────┼─────────────────────────┐ │ │
│ │ │ │ │ │
│ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ vCPU threads │ │ │ │ VMM │ │ │ │
│ │ │ │ │ │ I/O threads │ │ │ │
│ │ │ │ │ │ │ │ Kata Shim │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ /kata_<sandbox_id> │ │ │ │ /<sandbox_id> │ │ │ │
│ └─────────────────────┘ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ Pod 1 │ │ │ │ │
└─────────────────────────┼────┼─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ │ │
┌─────────────────────────┼────┼─────────────────────────┐ │ │
│ │ │ │ │ │ │ │
│ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │ │
│ │ vCPU threads │ │ │ │ VMM │ │ │ │
│ │ │ │ │ │ │ │ I/O threads │ │ │ │
│ │ │ │ │ │ │ │ Kata Shim │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ /kata_<sandbox_id> │ │ │ │ /<sandbox_id> │ │ │ │
│ │ │ └─────────────────────┘ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │
│ │ │ Pod 2 │ │ │ │ │
│ │ └─────────────────────────┼────┼─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ /kubepods │ │ /kata_overhead │ │
│ └─────────────────────────────┘ └───────────────────────────┘ │
│ │
│ │
│ Node │
└────────────────────────────────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| +---------------------------------------------------+ |
| | Hypervisor | |
| |Kata | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation Details
### What does this method do?
When `sandbox_cgroup_only` is disabled, the Kata Containers shim will create a per pod
sub-cgroup under the pods dedicated cgroup, and another one under the overhead cgroup.
For example, in the Kubernetes context, it will create a `/kata_<PodSandboxID>` under
the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overhead` one.
1. Given a container creation let `containerCgroupHost=container.CgroupsPath`
1. Rename `containerCgroupHost` path to add `kata_`
1. Let `PodCgroupPath=PodSanboxContainerCgroup` where `PodSanboxContainerCgroup` is the cgroup of a container of type `PodSandbox`
1. Limit the `PodCgroupPath` with the sum of all the container limits in the Sandbox
1. Move only vCPU threads of hypervisor to `PodCgroupPath`
1. Per each container, move its `kata-shim` to its own `containerCgroupHost`
1. Move hypervisor and applicable threads to memory cgroup `/kata`
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod which sandbox
ID is `12345678`, create with `sandbox_cgroup_only` disabled, the 2 memory subsystems
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.
_Note_: the Kata Containers runtime will not add all the hypervisor threads to
the cgroup path requested, only vCPUs. These threads are run unconstrained.
Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.
This mitigates the risk of the VMM and other threads receiving an out of memory scenario (`OOM`).
With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate for the actual container workloads processes. For Kata, this maps
to the VMM created virtual CPU threads and so they are the only ones running under the pod
cgroup. This mitigates the risk of the VMM, the Kata shim and the I/O threads going through
a catastrophic out of memory scenario (`OOM`).
#### Pros and Cons
#### Impact
Running all non vCPU threads under an unconstrained overhead cgroup could lead to workloads
potentially consuming a large amount of host resources.
On the other hand, running all non vCPU threads under a dedicated overhead cgroup can provide
accurate metrics on the actual Kata Container pod overhead, allowing for tuning the overhead
cgroup size and constraints accordingly.
If resources are reserved at a system level to account for the overheads of
running sandbox containers, this configuration can be utilized with adequate
stability. In this scenario, non-negligible amounts of CPU and memory will be
utilized unaccounted for on the host.
[linux-config]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#cgroups-path
# Supported cgroups
Kata Containers currently only supports cgroups `v1`.
In the following sections each cgroup is described briefly.
Kata Containers supports cgroups `v1` and `v2`. In the following sections each cgroup is
described briefly and what changes are needed in Kata Containers to support it.
## Cgroups V1
@@ -301,7 +244,7 @@ diagram:
A process can join a cgroup by writing its process id (`pid`) to `cgroup.procs` file,
or join a cgroup partially by writing the task (thread) id (`tid`) to the `tasks` file.
Kata Containers only supports `v1`.
Kata Containers supports `v1` by default and no change in the configuration file is needed.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].
## Cgroups V2
@@ -354,13 +297,22 @@ Same as `cgroups v1`, a process can join the cgroup by writing its process id (`
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to
`cgroup.threads` file.
Kata Containers does not support cgroups `v2` on the host.
For backwards compatibility Kata Containers defaults to supporting cgroups v1 by default.
To change this to `v2`, set `sandbox_cgroup_only=true` in the `configuration.toml` file.
To know more about `cgroups v2`, see [cgroupsv2(7)][3].
### Distro Support
Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.
For more information about the status of this feature see [issue #2494][4].
# Summary
| cgroup option | default? | status | pros | cons | cgroups
|-|-|-|-|-|-|
| `SandboxCgroupOnly=false` | yes | legacy | Easiest to make Kata work | Unaccounted for memory and resource utilization | v1
| `SandboxCgroupOnly=true` | no | recommended | Complete tracking of Kata memory and CPU utilization. In Kubernetes, the Kubelet can fully constrain Kata via the pod cgroup | Requires upper layer orchestrator which sizes sandbox cgroup appropriately | v1, v2
[1]: http://man7.org/linux/man-pages/man5/tmpfs.5.html
[2]: http://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_1

View File

@@ -207,7 +207,7 @@ Metrics for Firecracker vmm.
| `kata_firecracker_uart`: <br> Metrics specific to the UART device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`flush_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li><li>`read_count`</li><li>`write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vcpu`: <br> Metrics specific to VCPUs' mode of functioning. | `GAUGE` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vmm`: <br> Metrics specific to the machine manager as a whole. | `GAUGE` | | <ul><li>`item`<ul><li>`device_events`</li><li>`panic_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vsock`: <br> VSOCK-related metrics. | `GAUGE` | | <ul><li>`item`<ul><li>`activate_fails`</li><li>`cfg_fails`</li><li>`conn_event_fails`</li><li>`conns_added`</li><li>`conns_killed`</li><li>`conns_removed`</li><li>`ev_queue_event_fails`</li><li>`killq_resync`</li><li>`muxer_event_fails`</li><li>`rx_bytes_count`</li><li>`rx_packets_count`</li><li>`rx_queue_event_count`</li><li>`rx_queue_event_fails`</li><li>`rx_read_fails`</li><li>`tx_bytes_count`</li><li>`tx_flush_fails`</li><li>`tx_packets_count`</li><li>`tx_queue_event_count`</li><li>`tx_queue_event_fails`</li><li>`tx_write_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vsock`: <br> Vsock-related metrics. | `GAUGE` | | <ul><li>`item`<ul><li>`activate_fails`</li><li>`cfg_fails`</li><li>`conn_event_fails`</li><li>`conns_added`</li><li>`conns_killed`</li><li>`conns_removed`</li><li>`ev_queue_event_fails`</li><li>`killq_resync`</li><li>`muxer_event_fails`</li><li>`rx_bytes_count`</li><li>`rx_packets_count`</li><li>`rx_queue_event_count`</li><li>`rx_queue_event_fails`</li><li>`rx_read_fails`</li><li>`tx_bytes_count`</li><li>`tx_flush_fails`</li><li>`tx_packets_count`</li><li>`tx_queue_event_count`</li><li>`tx_queue_event_fails`</li><li>`tx_write_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
### Kata guest OS metrics

View File

@@ -30,7 +30,7 @@ The Kata Containers runtime **MUST** implement the following command line option
The Kata Containers project **MUST** provide two interfaces for CRI shims to manage hardware
virtualization based Kubernetes pods and containers:
- An OCI and `runc` compatible command line interface, as described in the previous section.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`containerd`](https://github.com/containerd/containerd), for example.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`cri-containerd`](https://github.com/containerd/cri-containerd), for example.
- A hardware virtualization runtime library API for CRI shims to consume and provide a more
CRI native implementation. The [`frakti`](https://github.com/kubernetes/frakti) CRI shim is an example of such a consumer.

View File

@@ -5,7 +5,7 @@
- [Run Kata containers with `crictl`](run-kata-with-crictl.md)
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and CRI (containerd) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)
@@ -17,9 +17,10 @@
- `firecracker`
- `ACRN`
While `qemu` , `cloud-hypervisor` and `firecracker` work out of the box with installation of Kata,
some additional configuration is needed in case of `ACRN`.
While `qemu` and `cloud-hypervisor` work out of the box with installation of Kata,
some additional configuration is needed in case of `firecracker` and `ACRN`.
Refer to the following guides for additional configuration steps:
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)
## Advanced Topics
@@ -34,5 +35,3 @@
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)
- [How to monitor Kata Containers in K8s](how-to-set-prometheus-in-k8s.md)
- [How to use hotplug memory on arm64 in Kata Containers](how-to-hotplug-memory-arm64.md)
- [How to setup swap devices in guest kernel](how-to-setup-swap-devices-in-guest-kernel.md)
- [How to run rootless vmm](how-to-run-rootless-vmm.md)

View File

@@ -39,7 +39,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`

View File

@@ -1,33 +0,0 @@
## Introduction
To improve security, Kata Container supports running the VMM process (currently only QEMU) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
The permission and ownership of the `kvm` device node (`/dev/kvm`) need to be configured to:
```
$ crw-rw---- 1 root kvm
```
use the following commands:
```
$ sudo groupadd kvm -r
$ sudo chown root:kvm /dev/kvm
$ sudo chmod 660 /dev/kvm
```
## Configure rootless VMM
By default, the VMM process still runs as the root user. There are two ways to enable rootless VMM:
1. Set the `rootless` flag to `true` in the hypervisor section of `configuration.toml`.
2. Set the Kubernetes annotation `io.katacontainers.hypervisor.rootless` to `true`.
## Implementation details
When `rootless` flag is enabled, upon a request to create a Pod, Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs.sock`, `qmp.sock`) to a directory (under `/run/user/[uid]/`) where only the non-root hypervisor has access to.
## Limitations
1. Only the VMM process is running as a non-root user. Other processes such as Kata Container shimv2 and `virtiofsd` still run as the root user.
2. Currently, this feature is only supported in QEMU. Still need to bring it to Firecracker and Cloud Hypervisor (see https://github.com/kata-containers/kata-containers/issues/2567).
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` device will also not work because of permission denied error.

View File

@@ -34,6 +34,8 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.container_pipe_size` | uint32 | specify the size of the std(in/out) pipes created for containers |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |
## Hypervisor Options
| Key | Value Type | Comments |
@@ -89,13 +91,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |
## Container Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.container.resource.swappiness"` | `uint64` | specify the `Resources.Memory.Swappiness` |
| `io.katacontainers.container.resource.swap_in_bytes"` | `uint64` | specify the `Resources.Memory.Swap` |
# CRI-O Configuration
@@ -105,12 +100,11 @@ In case of CRI-O, all annotations specified in the pod spec are passed down to K
For containerd, annotations specified in the pod spec are passed down to Kata
starting with version `1.3.0` of containerd. Additionally, extra configuration is
needed for containerd, by providing `pod_annotations` field and
`container_annotations` field in the containerd config
file. The `pod_annotations` field and `container_annotations` field are two lists of
annotations that can be passed down to Kata as OCI annotations. They support golang match
patterns. Since annotations supported by Kata follow the pattern `io.katacontainers.*`,
the following configuration would work for passing annotations to Kata from containerd:
needed for containerd, by providing a `pod_annotations` field in the containerd config
file. The `pod_annotations` field is a list of annotations that can be passed down to
Kata as OCI annotations. It supports golang match patterns. Since annotations supported
by Kata follow the pattern `io.katacontainers.*`, the following configuration would work
for passing annotations to Kata from containerd:
```
$ cat /etc/containerd/config
@@ -119,7 +113,6 @@ $ cat /etc/containerd/config
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
pod_annotations = ["io.katacontainers.*"]
container_annotations = ["io.katacontainers.*"]
....
```

View File

@@ -1,59 +0,0 @@
# Setup swap device in guest kernel
## Introduction
Setup swap device in guest kernel can help to increase memory capacity, handle some memory issues and increase file access speed sometimes.
Kata Containers can insert a raw file to the guest as the swap device.
## Requisites
The swap config of the containers should be set by [annotations](how-to-set-sandbox-config-kata.md#container-options). So [extra configuration is needed for containerd](how-to-set-sandbox-config-kata.md#containerd-configuration).
Kata Containers just supports setup swap device in guest kernel with QEMU.
Install and setup Kata Containers as shown [here](../install/README.md).
Enable setup swap device in guest kernel as follows:
```
$ sudo sed -i -e 's/^#enable_guest_swap.*$/enable_guest_swap = true/g' /etc/kata-containers/configuration.toml
```
## Run a Kata Container utilizing swap device
Use following command to start a Kata Container with swappiness 60 and 1GB swap device (swap_in_bytes - memory_limit_in_bytes).
```
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
name: busybox-test-swap
annotations:
io.katacontainers.container.resource.swappiness: "60"
io.katacontainers.container.resource.swap_in_bytes: "2147483648"
linux:
resources:
memory_limit_in_bytes: 1073741824
image:
image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
```
Kata Container setups swap device for this container only when `io.katacontainers.container.resource.swappiness` is set.
The following table shows the swap size how to decide if `io.katacontainers.container.resource.swappiness` is set.
|`io.katacontainers.container.resource.swap_in_bytes`|`memory_limit_in_bytes`|swap size|
|---|---|---|
|set|set| `io.katacontainers.container.resource.swap_in_bytes` - `memory_limit_in_bytes`|
|not set|set| `memory_limit_in_bytes`|
|not set|not set| `io.katacontainers.config.hypervisor.default_memory`|
|set|not set|cgroup doesn't support this usage|

View File

@@ -3,7 +3,7 @@
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
The Kubernetes cluster will use the
[CRI containerd](https://github.com/containerd/containerd/) and
[CRI containerd plugin](https://github.com/containerd/cri) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
## Requirements
@@ -71,12 +71,12 @@ $ for service in ${services}; do
service_dir="/etc/systemd/system/${service}.service.d/"
sudo mkdir -p ${service_dir}
cat << EOF | sudo tee "${service_dir}/proxy.conf"
cat << EOT | sudo tee "${service_dir}/proxy.conf"
[Service]
Environment="HTTP_PROXY=${http_proxy}"
Environment="HTTPS_PROXY=${https_proxy}"
Environment="NO_PROXY=${no_proxy}"
EOF
EOT
done
$ sudo systemctl daemon-reload
@@ -172,7 +172,7 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod w
- Create an pod configuration that using Kata Containers runtime
```bash
$ cat << EOF | tee nginx-kata.yaml
$ cat << EOT | tee nginx-kata.yaml
apiVersion: v1
kind: Pod
metadata:
@@ -183,7 +183,7 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod w
- name: nginx
image: nginx
EOF
EOT
```
- Create the pod

View File

@@ -22,7 +22,7 @@ This document requires the presence of the ACRN hypervisor and Kata Containers o
- ACRN supported [Hardware](https://projectacrn.github.io/latest/hardware.html#supported-hardware).
> **Note:** Please make sure to have a minimum of 4 logical processors (HT) or cores.
- ACRN [software](https://projectacrn.github.io/latest/tutorials/run_kata_containers.html) setup.
- ACRN [software](https://projectacrn.github.io/latest/tutorials/kbl-nuc-sdc.html#use-the-script-to-set-up-acrn-automatically) setup.
- For networking, ACRN supports either MACVTAP or TAP. If MACVTAP is not enabled in the Service OS, please follow the below steps to update the kernel:
```sh

View File

@@ -16,9 +16,9 @@ from the host, a potentially undesirable side-effect that decreases the security
The following sections document how to configure this behavior in different container runtimes.
#### Containerd
#### Containerd and CRI
The Containerd allows configuring the privileged host devices behavior for each runtime in the containerd config. This is
The Containerd CRI allows configuring the privileged host devices behavior for each runtime in the CRI config. This is
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
devices into the guest, even when privileged is enabled.
@@ -41,7 +41,7 @@ See below example config:
```
- [Kata Containers with Containerd and CRI documentation](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Containerd CRI config documentation](https://github.com/containerd/containerd/blob/main/docs/cri/config.md)
- [Containerd CRI config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### CRI-O

View File

@@ -9,7 +9,7 @@ Kubernetes CRI (Container Runtime Interface) implementations allow using any
OCI-compatible runtime with Kubernetes, such as the Kata Containers runtime.
Kata Containers support both the [CRI-O](https://github.com/kubernetes-incubator/cri-o) and
[containerd](https://github.com/containerd/containerd) CRI implementations.
[CRI-containerd](https://github.com/containerd/cri) CRI implementations.
After choosing one CRI implementation, you must make the appropriate configuration
to ensure it integrates with Kata Containers.
@@ -111,7 +111,11 @@ manage_ns_lifecycle = true
```
### containerd
### containerd with CRI plugin
If you select containerd with `cri` plugin, follow the "Getting Started for Developers"
instructions [here](https://github.com/containerd/cri#getting-started-for-developers)
to properly install it.
To customize containerd to select Kata Containers runtime, follow our
"Configure containerd to use Kata Containers" internal documentation
@@ -156,7 +160,7 @@ $ sudo systemctl restart kubelet
# If using CRI-O
$ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /var/run/crio/crio.sock --pod-network-cidr=10.244.0.0/16
# If using containerd
# If using CRI-containerd
$ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
$ export KUBECONFIG=/etc/kubernetes/admin.conf

View File

@@ -34,7 +34,7 @@ as the proxy starts.
Follow the [instructions](../install/README.md)
to get Kata Containers properly installed and configured with Kubernetes.
You can choose between CRI-O and containerd, both are supported
You can choose between CRI-O and CRI-containerd, both are supported
through this document.
For both cases, select the workloads as _trusted_ by default. This way,
@@ -159,7 +159,7 @@ containers with `privileged: true` to `privileged: false`.
There is no difference between Istio and Linkerd in this section. It is
about which CRI implementation you use.
For both CRI-O and containerd, you have to add an annotation indicating
For both CRI-O and CRI-containerd, you have to add an annotation indicating
the workload for this deployment is not _trusted_, which will trigger
`kata-runtime` to be called instead of `runc`.
@@ -193,9 +193,9 @@ spec:
...
```
__containerd:__
__CRI-containerd:__
Add the following annotation for containerd
Add the following annotation for CRI-containerd
```yaml
io.kubernetes.cri.untrusted-workload: "true"
```

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 150 KiB

View File

@@ -1,137 +0,0 @@
# Kata Containers threat model
This document discusses threat models associated with the Kata Containers project.
Kata was designed to provide additional isolation of container workloads, protecting
the host infrastructure from potentially malicious container users or workloads. Since
Kata Containers adds a level of isolation on top of traditional containers, the focus
is on the additional layer provided, not on traditional container security.
This document provides a brief background on containers and layered security, describes
the interface to Kata from CRI runtimes, a review of utilized virtual machine interfaces, and then
a review of threats.
## Kata security objective
Kata seeks to prevent an untrusted container workload or user of that container workload to gain
control of, obtain information from, or tamper with the host infrastructure.
In our scenario, an asset is anything on the host system, or elsewhere in the cluster
infrastructure. The attacker is assumed to be either a malicious user or the workload itself
running within the container. The goal of Kata is to prevent attacks which would allow
any access to the defined assets.
## Background on containers, layered security
Traditional containers leverage several key Linux kernel features to provide isolation and
a view that the container workload is the only entity running on the host. Key features include
`Namespaces`, `cgroups`, `capablities`, `SELinux` and `seccomp`. The canonical runtime for creating such
a container is `runc`. In the remainder of the document, the term `traditional-container` will be used
to describe a container workload created by runc.
Kata Containers provides a second layer of isolation on top of those provided by traditional-containers.
The hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight
virtual machine, and uses the guests Linux kernel to create a container workload, or workloads in the case
of multi-container pods. In Kubernetes and in the Kata implementation, the sandbox is carried out at the
pod level. In Kata, this sandbox is created using a virtual machine.
## Interface to Kata Containers: CRI, v2-shim, OCI
A typical Kata Containers deployment uses Kubernetes with a CRI implementation.
On every node, Kubelet will interact with a CRI implementor, which will in turn interface with
an OCI based runtime, such as Kata Containers. Typical CRI implementors are `cri-o` and `containerd`.
The CRI API, as defined at the Kubernetes [CRI-API repo](https://github.com/kubernetes/cri-api/),
results in a few constructs being supported by the CRI implementation, and ultimately in the OCI
runtime creating the workloads.
In order to run a container inside of the Kata sandbox, several virtual machine devices and interfaces
are required. Kata translates sandbox and container definitions to underlying virtualization technologies provided
by a set of virtual machine monitors (VMMs) and hypervisors. These devices and their underlying
implementations are discussed in detail in the following section.
## Interface to the Kata sandbox/virtual machine
In case of Kata, today the devices which we need in the guest are:
- Storage: In the current design of Kata Containers, we are reliant on the CRI implementor to
assist in image handling and volume management on the host. As a result, we need to support a way of passing to the sandbox the container rootfs, volumes requested
by the workload, and any other volumes created to facilitate sharing of secrets and `configmaps` with the containers. Depending on how these are managed, a block based device or file-system
sharing is required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
- Networking: A method for enabling network connectivity with the workload is required. Typically this will be done providing a `TAP` device
to the VMM, and this will be exposed to the guest as a `virtio-net` device. It is feasible to pass in a NIC device directly, in which case `VFIO` is leveraged
and the device itself will be exposed to the guest.
- Control: In order to interact with the guest agent and retrieve `STDIO` from containers, a medium of communication is required.
This is available via `virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM resource management (for example: CPU, memory, device hotplug). This is required when containers are resized,
or more generally when containers are added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify the default settings provided when integrating Kata
with the QEMU, Firecracker and Cloud Hypervisor VMMs in the following sections.
### Devices
Each virtio device is implemented by a backend, which may execute within userspace on the host (vhost-user), the VMM itself, or within the host kernel (vhost). While it may provide enhanced performance,
vhost devices are often seen as higher risk since an exploit would be already running within the kernel space. While VMM and vhost-user are both in userspace on the host, `vhost-user` generally allows for the back-end process to require less system calls and capabilities compared to a full VMM.
#### `virtio-blk` and `virtio-scsi`
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself (ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and QEMU.
While `vhost` based back-ends are available for QEMU, it is not recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they are not utilized in Kata today.
#### `virtio-fs`
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction with the host filesystem is done through a vhost-user daemon, `virtiofsd`.
The `virtio-fs` client, running in the guest, will generate requests to access files. `virtiofsd` will receive requests, open the file, and request the VMM
to `mmap` it into the guest. When DAX is utilized, the guest will access the host's page cache, avoiding the need for copy and duplication. DAX is still an experimental feature,
and is not enabled by default.
From the `virtiofsd` [documentation](https://qemu-project.gitlab.io/qemu/tools/virtiofsd.html):
```This program must be run as the root user. Upon startup the program will switch into a new file system namespace with the shared directory tree as its root. This prevents “file system escapes” due to symlinks and other file system objects that might lead to files outside the shared directory. The program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other vectors that could allow an attacker to compromise the system after gaining control of the virtiofsd process.```
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU VMM supports virtio-fs as of v4.2. Cloud Hypervisor
supports `virtio-fs`.
#### `virtio-net`
`virtio-net` has many options, depending on the VMM and Kata configurations.
##### QEMU networking
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the `virtio-net` backend
for Kata defaults to `vhost-net` for performance reasons. The default configuration is being
reevaluated.
##### Firecracker networking
For Firecracker, the `virtio-net` backend is within Firecracker's VMM.
##### Cloud Hypervisor networking
For Cloud Hypervisor, the current backend default is within the VMM. `vhost-user-net` support
is being added (written in rust, Cloud Hypervisor specific).
#### virtio-vsock
##### QEMU vsock
In QEMU, vsock is backed by `vhost_vsock`, which runs within the kernel itself.
##### Firecracker and Cloud Hypervisor
In Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in the hosts userspace.
#### VFIO
Utilizing VFIO, devices can be passed through to the virtual machine. We will assess this separately. Exposure to
host is limited to gaps in device pass-through handling. This is supported in QEMU and Cloud Hypervisor, but not
Firecracker.
#### ACPI
ACPI is necessary for hotplug of CPU, memory and devices. ACPI is available in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug
are not available in Firecracker.
## Devices and threat model
![Threat model](threat-model-boundaries.svg "threat-model")

View File

@@ -1,214 +0,0 @@
# Overview
This document explains how to trace Kata Containers components.
# Introduction
The Kata Containers runtime and agent are able to generate
[OpenTelemetry][opentelemetry] trace spans, which allow the administrator to
observe what those components are doing and how much time they are spending on
each operation.
# OpenTelemetry summary
An OpenTelemetry-enabled application creates a number of trace "spans". A span
contains the following attributes:
- A name
- A pair of timestamps (recording the start time and end time of some operation)
- A reference to the span's parent span
All spans need to be *finished*, or *completed*, to allow the OpenTelemetry
framework to generate the final trace information (by effectively closing the
transaction encompassing the initial (root) span and all its children).
For Kata, the root span represents the total amount of time taken to run a
particular component from startup to its shutdown (the "run time").
# Architecture
## Runtime tracing architecture
The runtime, which runs in the host environment, has been modified to
optionally generate trace spans which are sent to a trace collector on the
host.
## Agent tracing architecture
An OpenTelemetry system (such as [Jaeger][jaeger-tracing]) uses a collector to
gather up trace spans from the application for viewing and processing. For an
application to use the collector, it must run in the same context as
the collector.
This poses a problem for tracing the Kata Containers agent since it does not
run in the same context as the collector: it runs inside a virtual machine (VM).
To allow spans from the agent to be sent to the trace collector, Kata provides
a [trace forwarder][trace-forwarder] component. This runs in the same context
as the collector (generally on the host system) and listens on a
[`VSOCK`][vsock] channel for traces generated by the agent, forwarding them on
to the trace collector.
> **Note:**
>
> This design supports agent tracing without having to make changes to the
> image, but also means that [custom images][osbuilder] can also benefit from
> agent tracing.
The following diagram summarises the architecture used to trace the Kata
Containers agent:
```
+--------------------------------------------+
| Host |
| |
| +---------------+ |
| | OpenTelemetry | |
| | Trace | |
| | Collector | |
| +---------------+ |
| ^ +---------------+ |
| | spans | Kata VM | |
| +-----+-----+ | | |
| | Kata | spans o +-------+ | |
| | Trace |<-----------------| Kata | | |
| | Forwarder | VSOCK o | Agent | | |
| +-----------+ Channel | +-------+ | |
| +---------------+ |
+--------------------------------------------+
```
# Agent tracing prerequisites
- You must have a trace collector running.
Although the collector normally runs on the host, it can also be run from
inside a Docker image configured to expose the appropriate host ports to the
collector.
The [Jaeger "all-in-one" Docker image][jaeger-all-in-one] method
is the quickest and simplest way to run the collector for testing.
- If you wish to trace the agent, you must start the
[trace forwarder][trace-forwarder].
> **Notes:**
>
> - If agent tracing is enabled but the forwarder is not running,
> the agent will log an error (signalling that it cannot generate trace
> spans), but continue to work as normal.
>
> - The trace forwarder requires a trace collector (such as Jaeger) to be
> running before it is started. If a collector is not running, the trace
> forwarder will exit with an error.
# Enable tracing
By default, tracing is disabled for all components. To enable _any_ form of
tracing an `enable_tracing` option must be enabled for at least one component.
> **Note:**
>
> Enabling this option will only allow tracing for subsequently
> started containers.
## Enable runtime tracing
To enable runtime tracing, set the tracing option as shown:
```toml
[runtime]
enable_tracing = true
```
## Enable agent tracing
To enable agent tracing, set the tracing option as shown:
```toml
[agent.kata]
enable_tracing = true
```
> **Note:**
>
> If both agent tracing and runtime tracing are enabled, the resulting trace
> spans will be "collated": expanding individual runtime spans in the Jaeger
> web UI will show the agent trace spans resulting from the runtime
> operation.
# Appendices
## Agent tracing requirements
### Host environment
- The host kernel must support the VSOCK socket type.
This will be available if the kernel is built with the
`CONFIG_VHOST_VSOCK` configuration option.
- The VSOCK kernel module must be loaded:
```
$ sudo modprobe vhost_vsock
```
### Guest environment
- The guest kernel must support the VSOCK socket type:
This will be available if the kernel is built with the
`CONFIG_VIRTIO_VSOCKETS` configuration option.
> **Note:** The default Kata Containers guest kernel provides this feature.
## Agent tracing limitations
- Agent tracing is only "completed" when the workload and the Kata agent
process have exited.
Although trace information *can* be inspected before the workload and agent
have exited, it is incomplete. This is shown as `<trace-without-root-span>`
in the Jaeger web UI.
If the workload is still running, the trace transaction -- which spans the entire
runtime of the Kata agent -- will not have been completed. To view the complete
trace details, wait for the workload to end, or stop the container.
## Performance impact
[OpenTelemetry][opentelemetry] is designed for high performance. It combines
the best of two previous generation projects (OpenTracing and OpenCensus) and
uses a very efficient mechanism to capture trace spans. Further, the trace
points inserted into the agent are generated dynamically at compile time. This
is advantageous since new versions of the agent will automatically benefit
from improvements in the tracing infrastructure. Overall, the impact of
enabling runtime and agent tracing should be extremely low.
## Agent shutdown behaviour
In normal operation, the Kata runtime manages the VM shutdown and performs
certain optimisations to speed up this process. However, if agent tracing is
enabled, the agent itself is responsible for shutting down the VM. This it to
ensure all agent trace transactions are completed. This means there will be a
small performance impact for container shutdown when agent tracing is enabled
as the runtime must wait for the VM to shutdown fully.
## Set up a tracing development environment
If you want to debug, further develop, or test tracing,
[enabling full debug][enable-full-debug]
is highly recommended. For working with the agent, you may also wish to
[enable a debug console][setup-debug-console]
to allow you to access the VM environment.
[agent-ctl]: https://github.com/kata-containers/kata-containers/blob/main/tools/agent-ctl
[enable-full-debug]: https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#enable-full-debug
[jaeger-all-in-one]: https://www.jaegertracing.io/docs/getting-started/
[jaeger-tracing]: https://www.jaegertracing.io
[opentelemetry]: https://opentelemetry.io
[osbuilder]: https://github.com/kata-containers/kata-containers/blob/main/tools/osbuilder
[setup-debug-console]: https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#set-up-a-debug-console
[trace-forwarder]: https://github.com/kata-containers/kata-containers/blob/main/src/trace-forwarder
[vsock]: https://wiki.qemu.org/Features/VirtioVsock

View File

@@ -67,7 +67,7 @@ To use large BARs devices (for example, Nvidia Tesla P100), you need Kata versio
The following configuration in the Kata `configuration.toml` file as shown below can work:
Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
Hotplug for PCI devices by `shpchp` (Linux's SHPC PCI Hotplug driver):
```
machine_type = "q35"
@@ -91,6 +91,7 @@ The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_HOTPLUG_PCI_SHPC=y
# Support for loading modules (Required for load Nvidia drivers)
CONFIG_MODULES=y

View File

@@ -1,113 +1,107 @@
# Kata Containers with SGX
Intel Software Guard Extensions (SGX) is a set of instructions that increases the security
Intel® Software Guard Extensions (SGX) is a set of instructions that increases the security
of applications code and data, giving them more protections from disclosure or modification.
This document guides you to run containers with SGX enclaves with Kata Containers in Kubernetes.
> **Note:** At the time of writing this document, SGX patches have not landed on the Linux kernel
> project, so specific versions for guest and host kernels must be installed to enable SGX.
## Preconditions
## Check if SGX is enabled
* Intel SGX capable bare metal nodes
* Host kernel Linux 5.13 or later with SGX and SGX KVM enabled:
Run the following command to check if your host supports SGX.
```sh
$ grep SGX /boot/config-`uname -r`
CONFIG_X86_SGX=y
CONFIG_X86_SGX_KVM=y
$ grep -o sgx /proc/cpuinfo
```
* Kubernetes cluster configured with:
* [`kata-deploy`](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy) based Kata Containers installation
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images)
Continue to the following section if the output of the above command is empty,
otherwise continue to section [Install Guest kernel with SGX support](#install-guest-kernel-with-sgx-support)
> Note: Kata Containers supports creating VM sandboxes with Intel® SGX enabled
> using [cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) VMM only. QEMU support is waiting to get the
> Intel SGX enabled QEMU upstream release.
## Install Host kernel with SGX support
## Installation
### Kata Containers Guest Kernel
Follow the instructions to [setup](../../tools/packaging/kernel/README.md#setup-kernel-source-code) and [build](../../tools/packaging/kernel/README.md#build-the-kernel) the experimental guest kernel. Then, install as:
The following commands were tested on Fedora 32, they might work on other distros too.
```sh
$ sudo cp kata-linux-experimental-*/vmlinux /opt/kata/share/kata-containers/vmlinux.sgx
$ sudo sed -i 's|vmlinux.container|vmlinux.sgx|g' \
/opt/kata/share/defaults/kata-containers/configuration-clh.toml
$ git clone --depth=1 https://github.com/intel/kvm-sgx
$ pushd kvm-sgx
$ cp /boot/config-$(uname -r) .config
$ yes "" | make oldconfig
$ # In the following step, enable: INTEL_SGX and INTEL_SGX_VIRTUALIZATION
$ make menuconfig
$ make -j$(($(nproc)-1)) bzImage
$ make -j$(($(nproc)-1)) modules
$ sudo make modules_install
$ sudo make install
$ popd
$ sudo reboot
```
### Kata Containers Configuration
> **Notes:**
> * Run: `mokutil --sb-state` to check whether secure boot is enabled, if so, you will need to sign the kernel.
> * You'll lose SGX support when a new distro kernel is installed and the system rebooted.
Once you have restarted your system with the new brand Linux Kernel with SGX support, run
the following command to make sure it's enabled. If the output is empty, go to the BIOS
setup and enable SGX manually.
```sh
$ grep -o sgx /proc/cpuinfo
```
## Install Guest kernel with SGX support
Install the guest kernel in the Kata Containers directory, this way it can be used to run
Kata Containers.
```sh
$ curl -LOk https://github.com/devimc/kvm-sgx/releases/download/v0.0.1/kata-virtiofs-sgx.tar.gz
$ sudo tar -xf kata-virtiofs-sgx.tar.gz -C /usr/share/kata-containers/
$ sudo sed -i 's|kernel =|kernel = "/usr/share/kata-containers/vmlinux-virtiofs-sgx.container"|g' \
/usr/share/defaults/kata-containers/configuration.toml
```
## Run Kata Containers with SGX enabled
Before running a Kata Container make sure that your version of `crio` or `containerd`
supports annotations.
For `containerd` check in `/etc/containerd/config.toml` that the list of `pod_annotations` passed
to the `sandbox` are: `["io.katacontainers.*", "sgx.intel.com/epc"]`.
## Usage
With the following sample job deployed using `kubectl apply -f`:
> `sgx.yaml`
```yaml
apiVersion: batch/v1
kind: Job
apiVersion: v1
kind: Pod
metadata:
name: oesgx-demo-job
labels:
jobgroup: oesgx-demo
name: sgx
annotations:
sgx.intel.com/epc: "32Mi"
spec:
template:
metadata:
labels:
jobgroup: oesgx-demo
spec:
runtimeClassName: kata-clh
initContainers:
- name: init-sgx
image: busybox
command: ['sh', '-c', 'mkdir /dev/sgx; ln -s /dev/sgx_enclave /dev/sgx/enclave; ln -s /dev/sgx_provision /dev/sgx/provision']
volumeMounts:
- mountPath: /dev
name: dev-mount
restartPolicy: Never
containers:
-
name: eosgx-demo-job-1
image: oeciteam/oe-helloworld:latest
imagePullPolicy: IfNotPresent
securityContext:
readOnlyRootFilesystem: true
capabilities:
add: ["IPC_LOCK"]
resources:
limits:
sgx.intel.com/epc: "512Ki"
volumes:
- name: dev-mount
hostPath:
path: /dev
terminationGracePeriodSeconds: 0
runtimeClassName: kata
containers:
- name: c1
image: busybox
command:
- sh
stdin: true
tty: true
volumeMounts:
- mountPath: /dev/sgx/
name: test-volume
volumes:
- name: test-volume
hostPath:
path: /dev/sgx/
type: Directory
```
You'll see the enclave output:
```sh
$ kubectl logs oesgx-demo-job-wh42g
Hello world from the enclave
Enclave called into host to print: Hello World!
$ kubectl apply -f sgx.yaml
$ kubectl exec -ti sgx ls /dev/sgx/
enclave provision
```
### Notes
The output of the latest command shouldn't be empty, otherwise check
your system environment to make sure SGX is fully supported.
* The Kata VM's SGX Encrypted Page Cache (EPC) memory size is based on the sum of `sgx.intel.com/epc`
resource requests within the pod.
* `init-sgx` can be removed from the YAML configuration file if the Kata rootfs is modified with the
necessary udev rules.
See the [note on SGX backwards compatibility](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#backwards-compatibility-note).
* Intel SGX DCAP attestation is known to work from Kata sandboxes but it comes with one limitation: If
the Intel SGX `aesm` daemon runs on the bare metal node and DCAP `out-of-proc` attestation is used,
containers within the Kata sandbox cannot get the access to the host's `/var/run/aesmd/aesm.sock`
because socket passthrough is not supported. An alternative is to deploy the `aesm` daemon as a side-car
container.
* Projects like [Gramine Shielded Containers (GSC)](https://gramine-gsc.readthedocs.io/en/latest/) are
also known to work. For GSC specifically, the Kata guest kernel needs to have the `CONFIG_NUMA=y`
enabled and at least one CPU online when running the GSC container.
[1]: github.com/cloud-hypervisor/cloud-hypervisor/

View File

@@ -12,7 +12,7 @@ serde_json = "1.0.39"
# - Dynamic keys required to allow HashMap keys to be slog::Serialized.
# - The 'max_*' features allow changing the log level at runtime
# (by stopping the compiler from removing log calls).
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_debug"] }
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_info"] }
slog-json = "2.3.0"
slog-async = "2.3.0"
slog-scope = "4.1.2"

View File

@@ -59,7 +59,7 @@ parts:
yq_version=3.4.1
yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos}_${goarch}"
curl -o "${yq_path}" -L "${yq_url}"
curl -o "${yq_path}" -LSsf "${yq_url}"
chmod +x "${yq_path}"
kata_dir=gopath/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
@@ -139,7 +139,7 @@ parts:
cp kata-containers*.img ${kata_image_dir}
runtime:
after: [godeps, image, cloud-hypervisor]
after: [godeps, image]
plugin: nil
build-attributes: [no-patchelf]
override-build: |
@@ -185,7 +185,6 @@ parts:
- flex
override-build: |
yq=${SNAPCRAFT_STAGE}/yq
export PATH="${PATH}:${SNAPCRAFT_STAGE}"
export GOPATH=${SNAPCRAFT_STAGE}/gopath
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
versions_file="${kata_dir}/versions.yaml"
@@ -200,17 +199,10 @@ parts:
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
cd ${kata_dir}/tools/packaging/kernel
kernel_dir_prefix="kata-linux-"
# Setup and build kernel
if [ "$(uname -m)" = "x86_64" ]; then
kernel_version="$(${yq} r $versions_file assets.kernel-experimental.tag)"
kernel_version=${kernel_version#v}
kernel_dir_prefix="kata-linux-experimental-"
./build-kernel.sh -e -v ${kernel_version} -d setup
else
./build-kernel.sh -v ${kernel_version} -d setup
fi
./build-kernel.sh -v ${kernel_version} -d setup
kernel_dir_prefix="kata-linux-"
cd ${kernel_dir_prefix}*
make -j $(($(nproc)-1)) EXTRAVERSION=".container"
@@ -313,7 +305,7 @@ parts:
;;
*)
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* configs/devices/
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* default-configs/devices/
;;
esac
@@ -335,22 +327,6 @@ parts:
# Hack: move qemu to /
"snap/kata-containers/current/": "./"
cloud-hypervisor:
plugin: nil
after: [godeps]
override-build: |
export GOPATH=${SNAPCRAFT_STAGE}/gopath
yq=${SNAPCRAFT_STAGE}/yq
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
versions_file="${kata_dir}/versions.yaml"
version="$(${yq} r ${versions_file} assets.hypervisor.cloud_hypervisor.version)"
url="https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download/${version}"
curl -L ${url}/cloud-hypervisor-static -o cloud-hypervisor
curl -LO ${url}/clh-remote
install -D cloud-hypervisor ${SNAPCRAFT_PART_INSTALL}/usr/bin/cloud-hypervisor
install -D clh-remote ${SNAPCRAFT_PART_INSTALL}/usr/bin/clh-remote
apps:
runtime:
command: usr/bin/kata-runtime

58
src/agent/Cargo.lock generated
View File

@@ -544,9 +544,7 @@ dependencies = [
"rustjail",
"scan_fmt",
"scopeguard",
"serde",
"serde_json",
"serial_test",
"slog",
"slog-scope",
"slog-stdlog",
@@ -554,7 +552,6 @@ dependencies = [
"thiserror",
"tokio",
"tokio-vsock",
"toml",
"tracing",
"tracing-opentelemetry",
"tracing-subscriber",
@@ -594,24 +591,6 @@ dependencies = [
"rle-decode-fast",
]
[[package]]
name = "libseccomp"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "36ad71a5b66ceef3acfe6a3178b29b4da063f8bcb2c36dab666d52a7a9cfdb86"
dependencies = [
"libc",
"libseccomp-sys",
"nix 0.17.0",
"pkg-config",
]
[[package]]
name = "libseccomp-sys"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "539912de229a4fc16e507e8df12a394038a524a5b5b6c92045ad344472aac475"
[[package]]
name = "lock_api"
version = "0.4.4"
@@ -781,19 +760,6 @@ dependencies = [
"void",
]
[[package]]
name = "nix"
version = "0.17.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "50e4785f2c3b7589a0d0c1dd60285e1188adac4006e8abd6dd578e1567027363"
dependencies = [
"bitflags",
"cc",
"cfg-if 0.1.10",
"libc",
"void",
]
[[package]]
name = "nix"
version = "0.19.1"
@@ -1008,12 +974,6 @@ version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
[[package]]
name = "pkg-config"
version = "0.3.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3831453b3449ceb48b6d9c7ad7c96d5ea673e9b470a1dc578c2ce6521230884c"
[[package]]
name = "ppv-lite86"
version = "0.2.10"
@@ -1321,7 +1281,6 @@ dependencies = [
"inotify",
"lazy_static",
"libc",
"libseccomp",
"nix 0.21.0",
"oci",
"path-absolutize",
@@ -1364,18 +1323,18 @@ checksum = "d29ab0c6d3fc0ee92fe66e2d99f700eab17a8d57d1c1d3b748380fb20baa78cd"
[[package]]
name = "serde"
version = "1.0.129"
version = "1.0.126"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d1f72836d2aa753853178eda473a3b9d8e4eefdaf20523b919677e6de489f8f1"
checksum = "ec7505abeacaec74ae4778d9d9328fe5a5d04253220a85c4ee022239fc996d03"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.129"
version = "1.0.126"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e57ae87ad533d9a56427558b516d0adac283614e347abf85b0dc0cbbf0a249f3"
checksum = "963a7dbc9895aeac7ac90e74f34a5d5261828f79df35cbed41e10189d3804d43"
dependencies = [
"proc-macro2 1.0.26",
"quote 1.0.9",
@@ -1659,15 +1618,6 @@ dependencies = [
"vsock",
]
[[package]]
name = "toml"
version = "0.5.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a31142970826733df8241ef35dc040ef98c679ab14d7c3e54d827099b3acecaa"
dependencies = [
"serde",
]
[[package]]
name = "tracing"
version = "0.1.26"

View File

@@ -6,6 +6,7 @@ edition = "2018"
[dependencies]
oci = { path = "oci" }
logging = { path = "../../pkg/logging" }
rustjail = { path = "rustjail" }
protocols = { path = "protocols" }
lazy_static = "1.3.0"
@@ -19,7 +20,6 @@ scan_fmt = "0.2.3"
scopeguard = "1.0.0"
thiserror = "1.0.26"
regex = "1"
serial_test = "0.5.1"
# Async helpers
async-trait = "0.1.42"
@@ -35,10 +35,11 @@ rtnetlink = "0.8.0"
netlink-packet-utils = "0.4.1"
ipnetwork = "0.17.0"
# Note: this crate sets the slog 'max_*' features which allows the log level
# to be modified at runtime.
logging = { path = "../../pkg/logging" }
slog = "2.5.2"
# slog:
# - Dynamic keys required to allow HashMap keys to be slog::Serialized.
# - The 'max_*' features allow changing the log level at runtime
# (by stopping the compiler from removing log calls).
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_info"] }
slog-scope = "4.1.2"
# Redirect ttrpc log calls
@@ -57,10 +58,6 @@ tracing-opentelemetry = "0.13.0"
opentelemetry = { version = "0.14.0", features = ["rt-tokio-current-thread"]}
vsock-exporter = { path = "vsock-exporter" }
# Configuration
serde = { version = "1.0.129", features = ["derive"] }
toml = "0.5.8"
[dev-dependencies]
tempfile = "3.1.0"
@@ -73,6 +70,3 @@ members = [
[profile.release]
lto = true
[features]
seccomp = ["rustjail/seccomp"]

202
src/agent/LICENSE Normal file
View File

@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@@ -27,20 +27,6 @@ COMMIT_MSG = $(if $(COMMIT),$(COMMIT),unknown)
# Exported to allow cargo to see it
export VERSION_COMMIT := $(if $(COMMIT),$(VERSION)-$(COMMIT),$(VERSION))
EXTRA_RUSTFEATURES :=
##VAR SECCOMP=yes|no define if agent enables seccomp feature
SECCOMP := yes
# Enable seccomp feature of rust build
ifeq ($(SECCOMP),yes)
override EXTRA_RUSTFEATURES += seccomp
endif
ifneq ($(EXTRA_RUSTFEATURES),)
override EXTRA_RUSTFEATURES := --features $(EXTRA_RUSTFEATURES)
endif
include ../../utils.mk
TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
@@ -104,14 +90,15 @@ default: $(TARGET) show-header
$(TARGET): $(GENERATED_CODE) $(TARGET_PATH)
$(TARGET_PATH): $(SOURCES) | show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES)
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE)
$(GENERATED_FILES): %: %.in
@sed $(foreach r,$(GENERATED_REPLACEMENTS),-e 's|@$r@|$($r)|g') "$<" > "$@"
##TARGET optimize: optimized build
optimize: $(SOURCES) | show-summary show-header
@RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny-warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES)
@RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny-warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE)
##TARGET clippy: run clippy linter
clippy: $(GENERATED_CODE)
@@ -140,7 +127,7 @@ vendor:
#TARGET test: run cargo tests
test:
@cargo test --all --target $(TRIPLE) $(EXTRA_RUSTFEATURES) -- --nocapture
@cargo test --all --target $(TRIPLE)
##TARGET check: run test
check: clippy format

View File

@@ -19,7 +19,6 @@ After that, we drafted the initial code here, and any contributions are welcome.
| I/O stream | :white_check_mark: |
| Cgroups | :white_check_mark: |
| Capabilities, `rlimit`, readonly path, masked path, users | :white_check_mark: |
| Seccomp | :white_check_mark: |
| container stats (`stats_container`) | :white_check_mark: |
| Hooks | :white_check_mark: |
| **Agent Features & APIs** |

View File

@@ -46,7 +46,6 @@ message Route {
string device = 3;
string source = 4;
uint32 scope = 5;
IPFamily family = 6;
}
message ARPNeighbor {

View File

@@ -30,11 +30,7 @@ tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "
futures = "0.3"
async-trait = "0.1.31"
inotify = "0.9.2"
libseccomp = { version = "0.1.3", optional = true }
[dev-dependencies]
serial_test = "0.5.0"
tempfile = "3.1.0"
[features]
seccomp = ["libseccomp"]

View File

@@ -25,8 +25,6 @@ use crate::cgroups::mock::Manager as FsManager;
use crate::cgroups::Manager;
use crate::log_child;
use crate::process::Process;
#[cfg(feature = "seccomp")]
use crate::seccomp;
use crate::specconv::CreateOpts;
use crate::{mount, validator};
@@ -153,7 +151,7 @@ lazy_static! {
},
LinuxDevice {
path: "/dev/full".to_string(),
r#type: "c".to_string(),
r#type: String::from("c"),
major: 1,
minor: 7,
file_mode: Some(0o666),
@@ -595,22 +593,11 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
})?;
}
// NoNewPrivileges
// NoNewPeiviledges, Drop capabilities
if oci_process.no_new_privileges {
capctl::prctl::set_no_new_privs().map_err(|_| anyhow!("cannot set no new privileges"))?;
}
// Without NoNewPrivileges, we need to set seccomp
// before dropping capabilities because the calling thread
// must have the CAP_SYS_ADMIN.
#[cfg(feature = "seccomp")]
if !oci_process.no_new_privileges {
if let Some(ref scmp) = linux.seccomp {
seccomp::init_seccomp(scmp)?;
}
}
// Drop capabilities
if oci_process.capabilities.is_some() {
let c = oci_process.capabilities.as_ref().unwrap();
capabilities::drop_privileges(cfd_log, c)?;
@@ -654,7 +641,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let exec_file = Path::new(&args[0]);
log_child!(cfd_log, "process command: {:?}", &args);
if !exec_file.exists() {
find_file(exec_file).ok_or_else(|| anyhow!("the file {} was not found", &args[0]))?;
find_file(exec_file).ok_or_else(|| anyhow!("the file {} is not exist", &args[0]))?;
}
// notify parent that the child's ready to start
@@ -682,16 +669,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
unistd::read(fd, &mut buf)?;
}
// With NoNewPrivileges, we should set seccomp as close to
// do_exec as possible in order to reduce the amount of
// system calls in the seccomp profiles.
#[cfg(feature = "seccomp")]
if oci_process.no_new_privileges {
if let Some(ref scmp) = linux.seccomp {
seccomp::init_seccomp(scmp)?;
}
}
do_exec(&args);
}
@@ -856,20 +833,6 @@ impl BaseContainer for LinuxContainer {
}
let linux = spec.linux.as_ref().unwrap();
if p.oci.capabilities.is_none() {
// No capabilities, inherit from container process
let process = spec
.process
.as_ref()
.ok_or_else(|| anyhow!("no process config"))?;
p.oci.capabilities = Some(
process
.capabilities
.clone()
.ok_or_else(|| anyhow!("missing process capabilities"))?,
);
}
let (pfd_log, cfd_log) = unistd::pipe().context("failed to create pipe")?;
let _ = fcntl::fcntl(pfd_log, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC))

View File

@@ -34,8 +34,6 @@ pub mod container;
pub mod mount;
pub mod pipestream;
pub mod process;
#[cfg(feature = "seccomp")]
pub mod seccomp;
pub mod specconv;
pub mod sync;
pub mod sync_with_async;

View File

@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Context, Result};
use anyhow::{anyhow, bail, Context, Result};
use libc::uid_t;
use nix::errno::Errno;
use nix::fcntl::{self, OFlag};
@@ -19,7 +19,7 @@ use std::fs::{self, OpenOptions};
use std::mem::MaybeUninit;
use std::os::unix;
use std::os::unix::io::RawFd;
use std::path::{Component, Path, PathBuf};
use std::path::{Path, PathBuf};
use path_absolutize::*;
use std::fs::File;
@@ -828,35 +828,18 @@ fn default_symlinks() -> Result<()> {
}
Ok(())
}
fn dev_rel_path(path: &str) -> Option<&Path> {
let path = Path::new(path);
if !path.starts_with("/dev")
|| path == Path::new("/dev")
|| path.components().any(|c| c == Component::ParentDir)
{
return None;
}
path.strip_prefix("/").ok()
}
fn create_devices(devices: &[LinuxDevice], bind: bool) -> Result<()> {
let op: fn(&LinuxDevice, &Path) -> Result<()> = if bind { bind_dev } else { mknod_dev };
let op: fn(&LinuxDevice) -> Result<()> = if bind { bind_dev } else { mknod_dev };
let old = stat::umask(Mode::from_bits_truncate(0o000));
for dev in DEFAULT_DEVICES.iter() {
let path = Path::new(&dev.path[1..]);
op(dev, path).context(format!("Creating container device {:?}", dev))?;
op(dev)?;
}
for dev in devices {
let path = dev_rel_path(&dev.path).ok_or_else(|| {
if !dev.path.starts_with("/dev") || dev.path.contains("..") {
let msg = format!("{} is not a valid device path", dev.path);
anyhow!(msg)
})?;
if let Some(dir) = path.parent() {
fs::create_dir_all(dir).context(format!("Creating container device {:?}", dev))?;
bail!(anyhow!(msg));
}
op(dev, path).context(format!("Creating container device {:?}", dev))?;
op(dev)?;
}
stat::umask(old);
Ok(())
@@ -878,21 +861,21 @@ lazy_static! {
};
}
fn mknod_dev(dev: &LinuxDevice, relpath: &Path) -> Result<()> {
fn mknod_dev(dev: &LinuxDevice) -> Result<()> {
let f = match LINUXDEVICETYPE.get(dev.r#type.as_str()) {
Some(v) => v,
None => return Err(anyhow!("invalid spec".to_string())),
};
stat::mknod(
relpath,
&dev.path[1..],
*f,
Mode::from_bits_truncate(dev.file_mode.unwrap_or(0)),
nix::sys::stat::makedev(dev.major as u64, dev.minor as u64),
)?;
unistd::chown(
relpath,
&dev.path[1..],
Some(Uid::from_raw(dev.uid.unwrap_or(0) as uid_t)),
Some(Gid::from_raw(dev.gid.unwrap_or(0) as uid_t)),
)?;
@@ -900,9 +883,9 @@ fn mknod_dev(dev: &LinuxDevice, relpath: &Path) -> Result<()> {
Ok(())
}
fn bind_dev(dev: &LinuxDevice, relpath: &Path) -> Result<()> {
fn bind_dev(dev: &LinuxDevice) -> Result<()> {
let fd = fcntl::open(
relpath,
&dev.path[1..],
OFlag::O_RDWR | OFlag::O_CREAT,
Mode::from_bits_truncate(0o644),
)?;
@@ -911,7 +894,7 @@ fn bind_dev(dev: &LinuxDevice, relpath: &Path) -> Result<()> {
mount(
Some(&*dev.path),
relpath,
&dev.path[1..],
None::<&str>,
MsFlags::MS_BIND,
None::<&str>,
@@ -1275,12 +1258,11 @@ mod tests {
uid: Some(unistd::getuid().as_raw()),
gid: Some(unistd::getgid().as_raw()),
};
let path = Path::new("fifo");
let ret = mknod_dev(&dev, path);
let ret = mknod_dev(&dev);
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
let ret = stat::stat(path);
let ret = stat::stat("fifo");
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
}
#[test]
@@ -1397,26 +1379,4 @@ mod tests {
assert!(result == t.result, "{}", msg);
}
}
#[test]
fn test_dev_rel_path() {
// Valid device paths
assert_eq!(dev_rel_path("/dev/sda").unwrap(), Path::new("dev/sda"));
assert_eq!(dev_rel_path("//dev/sda").unwrap(), Path::new("dev/sda"));
assert_eq!(
dev_rel_path("/dev/vfio/99").unwrap(),
Path::new("dev/vfio/99")
);
assert_eq!(dev_rel_path("/dev/...").unwrap(), Path::new("dev/..."));
assert_eq!(dev_rel_path("/dev/a..b").unwrap(), Path::new("dev/a..b"));
assert_eq!(dev_rel_path("/dev//foo").unwrap(), Path::new("dev/foo"));
// Bad device paths
assert!(dev_rel_path("/devfoo").is_none());
assert!(dev_rel_path("/etc/passwd").is_none());
assert!(dev_rel_path("/dev/../etc/passwd").is_none());
assert!(dev_rel_path("dev/foo").is_none());
assert!(dev_rel_path("").is_none());
assert!(dev_rel_path("/dev").is_none());
}
}

View File

@@ -24,16 +24,6 @@ use tokio::io::{split, ReadHalf, WriteHalf};
use tokio::sync::Mutex;
use tokio::sync::Notify;
macro_rules! close_process_stream {
($self: ident, $stream:ident, $stream_type: ident) => {
if $self.$stream.is_some() {
$self.close_stream(StreamType::$stream_type);
let _ = unistd::close($self.$stream.unwrap());
$self.$stream = None;
}
};
}
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub enum StreamType {
Stdin,
@@ -157,22 +147,6 @@ impl Process {
notify.notify_one();
}
pub fn close_stdin(&mut self) {
close_process_stream!(self, term_master, TermMaster);
close_process_stream!(self, parent_stdin, ParentStdin);
self.notify_term_close();
}
pub fn cleanup_process_stream(&mut self) {
close_process_stream!(self, parent_stdin, ParentStdin);
close_process_stream!(self, parent_stdout, ParentStdout);
close_process_stream!(self, parent_stderr, ParentStderr);
close_process_stream!(self, term_master, TermMaster);
self.notify_term_close();
}
fn get_fd(&self, stream_type: &StreamType) -> Option<RawFd> {
match stream_type {
StreamType::Stdin => self.stdin,

View File

@@ -1,237 +0,0 @@
// Copyright 2021 Sony Group Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Result};
use libseccomp::*;
use oci::{LinuxSeccomp, LinuxSeccompArg};
use std::str::FromStr;
fn get_filter_attr_from_flag(flag: &str) -> Result<ScmpFilterAttr> {
match flag {
"SECCOMP_FILTER_FLAG_TSYNC" => Ok(ScmpFilterAttr::CtlTsync),
"SECCOMP_FILTER_FLAG_LOG" => Ok(ScmpFilterAttr::CtlLog),
"SECCOMP_FILTER_FLAG_SPEC_ALLOW" => Ok(ScmpFilterAttr::CtlSsb),
_ => Err(anyhow!("Invalid seccomp flag")),
}
}
// get_rule_conditions gets rule conditions for a system call from the args.
fn get_rule_conditions(args: &[LinuxSeccompArg]) -> Result<Vec<ScmpArgCompare>> {
let mut conditions: Vec<ScmpArgCompare> = Vec::new();
for arg in args {
if arg.op.is_empty() {
return Err(anyhow!("seccomp opreator is required"));
}
let cond = ScmpArgCompare::new(
arg.index,
ScmpCompareOp::from_str(&arg.op)?,
arg.value,
Some(arg.value_two),
);
conditions.push(cond);
}
Ok(conditions)
}
// init_seccomp creates a seccomp filter and loads it for the current process
// including all the child processes.
pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM as u32))?;
// Create a new filter context
let mut filter = ScmpFilterContext::new_filter(def_action)?;
// Add extra architectures
for arch in &scmp.architectures {
let scmp_arch = ScmpArch::from_str(arch)?;
filter.add_arch(scmp_arch)?;
}
// Unset no new privileges bit
filter.set_no_new_privs_bit(false)?;
// Add a rule for each system call
for syscall in &scmp.syscalls {
if syscall.names.is_empty() {
return Err(anyhow!("syscall name is required"));
}
let action = ScmpAction::from_str(&syscall.action, Some(syscall.errno_ret))?;
if action == def_action {
continue;
}
for name in &syscall.names {
let syscall_num = get_syscall_from_name(name, None)?;
if syscall.args.is_empty() {
filter.add_rule(action, syscall_num, None)?;
} else {
let conditions = get_rule_conditions(&syscall.args)?;
filter.add_rule(action, syscall_num, Some(&conditions))?;
}
}
}
// Set filter attributes for each seccomp flag
for flag in &scmp.flags {
let scmp_attr = get_filter_attr_from_flag(flag)?;
filter.set_filter_attr(scmp_attr, 1)?;
}
// Load the filter
filter.load()?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use libc::{dup3, process_vm_readv, EPERM, O_CLOEXEC};
use std::io::Error;
use std::ptr::null;
macro_rules! syscall_assert {
($e1: expr, $e2: expr) => {
let mut errno: i32 = 0;
if $e1 < 0 {
errno = -Error::last_os_error().raw_os_error().unwrap();
}
assert_eq!(errno, $e2);
};
}
#[test]
fn test_get_filter_attr_from_flag() {
skip_if_not_root!();
assert_eq!(
get_filter_attr_from_flag("SECCOMP_FILTER_FLAG_TSYNC").unwrap(),
ScmpFilterAttr::CtlTsync
);
assert_eq!(get_filter_attr_from_flag("ERROR").is_err(), true);
}
#[test]
fn test_init_seccomp() {
skip_if_not_root!();
let data = r#"{
"defaultAction": "SCMP_ACT_ALLOW",
"architectures": [
],
"flags": [
"SECCOMP_FILTER_FLAG_LOG"
],
"syscalls": [
{
"names": [
"dup3"
],
"action": "SCMP_ACT_ERRNO"
},
{
"names": [
"process_vm_readv"
],
"action": "SCMP_ACT_ERRNO",
"errnoRet": 111,
"args": [
{
"index": 0,
"value": 10,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"process_vm_readv"
],
"action": "SCMP_ACT_ERRNO",
"errnoRet": 111,
"args": [
{
"index": 0,
"value": 20,
"op": "SCMP_CMP_EQ"
}
]
},
{
"names": [
"process_vm_readv"
],
"action": "SCMP_ACT_ERRNO",
"errnoRet": 222,
"args": [
{
"index": 0,
"value": 30,
"op": "SCMP_CMP_EQ"
},
{
"index": 2,
"value": 40,
"op": "SCMP_CMP_EQ"
}
]
}
]
}"#;
let mut scmp: oci::LinuxSeccomp = serde_json::from_str(data).unwrap();
let mut arch: Vec<oci::Arch>;
if cfg!(target_endian = "little") {
// For little-endian architectures
arch = vec![
"SCMP_ARCH_X86".to_string(),
"SCMP_ARCH_X32".to_string(),
"SCMP_ARCH_X86_64".to_string(),
"SCMP_ARCH_AARCH64".to_string(),
"SCMP_ARCH_ARM".to_string(),
"SCMP_ARCH_PPC64LE".to_string(),
];
} else {
// For big-endian architectures
arch = vec!["SCMP_ARCH_S390X".to_string()];
}
scmp.architectures.append(&mut arch);
init_seccomp(&scmp).unwrap();
// Basic syscall with simple rule
syscall_assert!(unsafe { dup3(0, 1, O_CLOEXEC) }, -EPERM);
// Syscall with permitted arguments
syscall_assert!(unsafe { process_vm_readv(1, null(), 0, null(), 0, 0) }, 0);
// Multiple arguments with OR rules with ERRNO
syscall_assert!(
unsafe { process_vm_readv(10, null(), 0, null(), 0, 0) },
-111
);
syscall_assert!(
unsafe { process_vm_readv(20, null(), 0, null(), 0, 0) },
-111
);
// Multiple arguments with AND rules with ERRNO
syscall_assert!(unsafe { process_vm_readv(30, null(), 0, null(), 0, 0) }, 0);
syscall_assert!(
unsafe { process_vm_readv(30, null(), 40, null(), 0, 0) },
-222
);
}
}

View File

@@ -1,41 +0,0 @@
# This is an agent configuration file example.
dev_mode = true
server_addr = 'vsock://8:2048'
[endpoints]
# All endpoints are allowed
allowed = [
"AddARPNeighborsRequest",
"AddSwapRequest",
"CloseStdinRequest",
"CopyFileRequest",
"CreateContainerRequest",
"CreateSandboxRequest",
"DestroySandboxRequest",
"ExecProcessRequest",
"GetMetricsRequest",
"GetOOMEventRequest",
"GuestDetailsRequest",
"ListInterfacesRequest",
"ListRoutesRequest",
"MemHotplugByProbeRequest",
"OnlineCPUMemRequest",
"PauseContainerRequest",
"PullImageRequest",
"ReadStreamRequest",
"RemoveContainerRequest",
"ReseedRandomDevRequest",
"ResumeContainerRequest",
"SetGuestDateTimeRequest",
"SignalProcessRequest",
"StartContainerRequest",
"StartTracingRequest",
"StatsContainerRequest",
"StopTracingRequest",
"TtyWinResizeRequest",
"UpdateContainerRequest",
"UpdateInterfaceRequest",
"UpdateRoutesRequest",
"WaitProcessRequest",
"WriteStreamRequest"
]

View File

@@ -2,13 +2,10 @@
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::rpc;
use crate::tracer;
use anyhow::{bail, ensure, Context, Result};
use serde::Deserialize;
use std::collections::HashSet;
use std::env;
use std::fs;
use std::str::FromStr;
use std::time;
use tracing::instrument;
@@ -22,7 +19,6 @@ const DEBUG_CONSOLE_VPORT_OPTION: &str = "agent.debug_console_vport";
const LOG_VPORT_OPTION: &str = "agent.log_vport";
const CONTAINER_PIPE_SIZE_OPTION: &str = "agent.container_pipe_size";
const UNIFIED_CGROUP_HIERARCHY_OPTION: &str = "agent.unified_cgroup_hierarchy";
const CONFIG_FILE: &str = "agent.config_file";
const DEFAULT_LOG_LEVEL: slog::Level = slog::Level::Info;
const DEFAULT_HOTPLUG_TIMEOUT: time::Duration = time::Duration::from_secs(3);
@@ -33,7 +29,7 @@ const VSOCK_PORT: u16 = 1024;
// Environment variables used for development and testing
const SERVER_ADDR_ENV_VAR: &str = "KATA_AGENT_SERVER_ADDR";
const LOG_LEVEL_ENV_VAR: &str = "KATA_AGENT_LOG_LEVEL";
const TRACING_ENV_VAR: &str = "KATA_AGENT_TRACING";
const TRACE_TYPE_ENV_VAR: &str = "KATA_AGENT_TRACE_TYPE";
const ERR_INVALID_LOG_LEVEL: &str = "invalid log level";
const ERR_INVALID_LOG_LEVEL_PARAM: &str = "invalid log level parameter";
@@ -51,17 +47,6 @@ const ERR_INVALID_CONTAINER_PIPE_SIZE_PARAM: &str = "unable to parse container p
const ERR_INVALID_CONTAINER_PIPE_SIZE_KEY: &str = "invalid container pipe size key name";
const ERR_INVALID_CONTAINER_PIPE_NEGATIVE: &str = "container pipe size should not be negative";
#[derive(Debug, Default, Deserialize)]
pub struct EndpointsConfig {
pub allowed: Vec<String>,
}
#[derive(Debug, Default)]
pub struct AgentEndpoints {
pub allowed: HashSet<String>,
pub all_allowed: bool,
}
#[derive(Debug)]
pub struct AgentConfig {
pub debug_console: bool,
@@ -73,38 +58,7 @@ pub struct AgentConfig {
pub container_pipe_size: i32,
pub server_addr: String,
pub unified_cgroup_hierarchy: bool,
pub tracing: bool,
pub endpoints: AgentEndpoints,
pub supports_seccomp: bool,
}
#[derive(Debug, Deserialize)]
pub struct AgentConfigBuilder {
pub debug_console: Option<bool>,
pub dev_mode: Option<bool>,
pub log_level: Option<String>,
pub hotplug_timeout: Option<time::Duration>,
pub debug_console_vport: Option<i32>,
pub log_vport: Option<i32>,
pub container_pipe_size: Option<i32>,
pub server_addr: Option<String>,
pub unified_cgroup_hierarchy: Option<bool>,
pub tracing: Option<bool>,
pub endpoints: Option<EndpointsConfig>,
}
macro_rules! config_override {
($builder:ident, $config:ident, $field:ident) => {
if let Some(v) = $builder.$field {
$config.$field = v;
}
};
($builder:ident, $config:ident, $field:ident, $func: ident) => {
if let Some(v) = $builder.$field {
$config.$field = $func(&v)?;
}
};
pub tracing: tracer::TraceType,
}
// parse_cmdline_param parse commandline parameters.
@@ -137,8 +91,8 @@ macro_rules! parse_cmdline_param {
};
}
impl Default for AgentConfig {
fn default() -> Self {
impl AgentConfig {
pub fn new() -> AgentConfig {
AgentConfig {
debug_console: false,
dev_mode: false,
@@ -149,84 +103,34 @@ impl Default for AgentConfig {
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: format!("{}:{}", VSOCK_ADDR, VSOCK_PORT),
unified_cgroup_hierarchy: false,
tracing: false,
endpoints: Default::default(),
supports_seccomp: rpc::have_seccomp(),
tracing: tracer::TraceType::Disabled,
}
}
}
impl FromStr for AgentConfig {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let agent_config_builder: AgentConfigBuilder =
toml::from_str(s).map_err(anyhow::Error::new)?;
let mut agent_config: AgentConfig = Default::default();
// Overwrite default values with the configuration files ones.
config_override!(agent_config_builder, agent_config, debug_console);
config_override!(agent_config_builder, agent_config, dev_mode);
config_override!(
agent_config_builder,
agent_config,
log_level,
logrus_to_slog_level
);
config_override!(agent_config_builder, agent_config, hotplug_timeout);
config_override!(agent_config_builder, agent_config, debug_console_vport);
config_override!(agent_config_builder, agent_config, log_vport);
config_override!(agent_config_builder, agent_config, container_pipe_size);
config_override!(agent_config_builder, agent_config, server_addr);
config_override!(agent_config_builder, agent_config, unified_cgroup_hierarchy);
config_override!(agent_config_builder, agent_config, tracing);
// Populate the allowed endpoints hash set, if we got any from the config file.
if let Some(endpoints) = agent_config_builder.endpoints {
for ep in endpoints.allowed {
agent_config.endpoints.allowed.insert(ep);
}
}
Ok(agent_config)
}
}
impl AgentConfig {
#[instrument]
pub fn from_cmdline(file: &str) -> Result<AgentConfig> {
let mut config: AgentConfig = Default::default();
pub fn parse_cmdline(&mut self, file: &str) -> Result<()> {
let cmdline = fs::read_to_string(file)?;
let params: Vec<&str> = cmdline.split_ascii_whitespace().collect();
for param in params.iter() {
// If we get a configuration file path from the command line, we
// generate our config from it.
// The agent will fail to start if the configuration file is not present,
// or if it can't be parsed properly.
if param.starts_with(format!("{}=", CONFIG_FILE).as_str()) {
let config_file = get_string_value(param)?;
return AgentConfig::from_config_file(&config_file);
}
// parse cmdline flags
parse_cmdline_param!(param, DEBUG_CONSOLE_FLAG, config.debug_console);
parse_cmdline_param!(param, DEV_MODE_FLAG, config.dev_mode);
parse_cmdline_param!(param, DEBUG_CONSOLE_FLAG, self.debug_console);
parse_cmdline_param!(param, DEV_MODE_FLAG, self.dev_mode);
// Support "bare" tracing option for backwards compatibility with
// Kata 1.x.
if param == &TRACE_MODE_OPTION {
config.tracing = true;
self.tracing = tracer::TraceType::Isolated;
continue;
}
parse_cmdline_param!(param, TRACE_MODE_OPTION, config.tracing, get_bool_value);
parse_cmdline_param!(param, TRACE_MODE_OPTION, self.tracing, get_trace_type);
// parse cmdline options
parse_cmdline_param!(param, LOG_LEVEL_OPTION, config.log_level, get_log_level);
parse_cmdline_param!(param, LOG_LEVEL_OPTION, self.log_level, get_log_level);
parse_cmdline_param!(
param,
SERVER_ADDR_OPTION,
config.server_addr,
self.server_addr,
get_string_value
);
@@ -234,7 +138,7 @@ impl AgentConfig {
parse_cmdline_param!(
param,
HOTPLUG_TIMOUT_OPTION,
config.hotplug_timeout,
self.hotplug_timeout,
get_hotplug_timeout,
|hotplug_timeout: time::Duration| hotplug_timeout.as_secs() > 0
);
@@ -243,14 +147,14 @@ impl AgentConfig {
parse_cmdline_param!(
param,
DEBUG_CONSOLE_VPORT_OPTION,
config.debug_console_vport,
self.debug_console_vport,
get_vsock_port,
|port| port > 0
);
parse_cmdline_param!(
param,
LOG_VPORT_OPTION,
config.log_vport,
self.log_vport,
get_vsock_port,
|port| port > 0
);
@@ -258,47 +162,34 @@ impl AgentConfig {
parse_cmdline_param!(
param,
CONTAINER_PIPE_SIZE_OPTION,
config.container_pipe_size,
self.container_pipe_size,
get_container_pipe_size
);
parse_cmdline_param!(
param,
UNIFIED_CGROUP_HIERARCHY_OPTION,
config.unified_cgroup_hierarchy,
self.unified_cgroup_hierarchy,
get_bool_value
);
}
if let Ok(addr) = env::var(SERVER_ADDR_ENV_VAR) {
config.server_addr = addr;
self.server_addr = addr;
}
if let Ok(addr) = env::var(LOG_LEVEL_ENV_VAR) {
if let Ok(level) = logrus_to_slog_level(&addr) {
config.log_level = level;
self.log_level = level;
}
}
if let Ok(value) = env::var(TRACING_ENV_VAR) {
let name_value = format!("{}={}", TRACING_ENV_VAR, value);
config.tracing = get_bool_value(&name_value)?;
if let Ok(value) = env::var(TRACE_TYPE_ENV_VAR) {
if let Ok(result) = value.parse::<tracer::TraceType>() {
self.tracing = result;
}
}
// We did not get a configuration file: allow all endpoints.
config.endpoints.all_allowed = true;
Ok(config)
}
#[instrument]
pub fn from_config_file(file: &str) -> Result<AgentConfig> {
let config = fs::read_to_string(file)?;
AgentConfig::from_str(&config)
}
pub fn is_allowed_endpoint(&self, ep: &str) -> bool {
self.endpoints.all_allowed || self.endpoints.allowed.contains(ep)
Ok(())
}
}
@@ -345,6 +236,25 @@ fn get_log_level(param: &str) -> Result<slog::Level> {
logrus_to_slog_level(fields[1])
}
#[instrument]
fn get_trace_type(param: &str) -> Result<tracer::TraceType> {
ensure!(!param.is_empty(), "invalid trace type parameter");
let fields: Vec<&str> = param.split('=').collect();
ensure!(
fields[0] == TRACE_MODE_OPTION,
"invalid trace type key name"
);
if fields.len() == 1 {
return Ok(tracer::TraceType::Isolated);
}
let result = fields[1].parse::<tracer::TraceType>()?;
Ok(result)
}
#[instrument]
fn get_hotplug_timeout(param: &str) -> Result<time::Duration> {
let fields: Vec<&str> = param.split('=').collect();
@@ -429,6 +339,10 @@ mod tests {
use std::time;
use tempfile::tempdir;
const ERR_INVALID_TRACE_TYPE_PARAM: &str = "invalid trace type parameter";
const ERR_INVALID_TRACE_TYPE: &str = "invalid trace type";
const ERR_INVALID_TRACE_TYPE_KEY: &str = "invalid trace type key name";
// Parameters:
//
// 1: expected Result
@@ -457,7 +371,7 @@ mod tests {
#[test]
fn test_new() {
let config: AgentConfig = Default::default();
let config = AgentConfig::new();
assert!(!config.debug_console);
assert!(!config.dev_mode);
assert_eq!(config.log_level, DEFAULT_LOG_LEVEL);
@@ -465,7 +379,7 @@ mod tests {
}
#[test]
fn test_from_cmdline() {
fn test_parse_cmdline() {
const TEST_SERVER_ADDR: &str = "vsock://-1:1024";
#[derive(Debug)]
@@ -479,7 +393,7 @@ mod tests {
container_pipe_size: i32,
server_addr: &'a str,
unified_cgroup_hierarchy: bool,
tracing: bool,
tracing: tracer::TraceType,
}
impl Default for TestData<'_> {
@@ -494,7 +408,7 @@ mod tests {
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
tracing: false,
tracing: tracer::TraceType::Disabled,
}
}
}
@@ -753,121 +667,64 @@ mod tests {
},
TestData {
contents: "trace",
tracing: false,
tracing: tracer::TraceType::Disabled,
..Default::default()
},
TestData {
contents: ".trace",
tracing: false,
tracing: tracer::TraceType::Disabled,
..Default::default()
},
TestData {
contents: "agent.tracer",
tracing: false,
tracing: tracer::TraceType::Disabled,
..Default::default()
},
TestData {
contents: "agent.trac",
tracing: false,
tracing: tracer::TraceType::Disabled,
..Default::default()
},
TestData {
contents: "agent.trace",
tracing: true,
tracing: tracer::TraceType::Isolated,
..Default::default()
},
TestData {
contents: "agent.trace=true",
tracing: true,
contents: "agent.trace=isolated",
tracing: tracer::TraceType::Isolated,
..Default::default()
},
TestData {
contents: "agent.trace=false",
tracing: false,
..Default::default()
},
TestData {
contents: "agent.trace=0",
tracing: false,
..Default::default()
},
TestData {
contents: "agent.trace=1",
tracing: true,
..Default::default()
},
TestData {
contents: "agent.trace=a",
tracing: false,
..Default::default()
},
TestData {
contents: "agent.trace=foo",
tracing: false,
..Default::default()
},
TestData {
contents: "agent.trace=.",
tracing: false,
..Default::default()
},
TestData {
contents: "agent.trace=,",
tracing: false,
contents: "agent.trace=disabled",
tracing: tracer::TraceType::Disabled,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING="],
tracing: false,
env_vars: vec!["KATA_AGENT_TRACE_TYPE=isolated"],
tracing: tracer::TraceType::Isolated,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=''"],
tracing: false,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=0"],
tracing: false,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=."],
tracing: false,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=,"],
tracing: false,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=foo"],
tracing: false,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=1"],
tracing: true,
..Default::default()
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_TRACING=true"],
tracing: true,
env_vars: vec!["KATA_AGENT_TRACE_TYPE=disabled"],
tracing: tracer::TraceType::Disabled,
..Default::default()
},
];
let dir = tempdir().expect("failed to create tmpdir");
// First, check a missing file is handled
let file_path = dir.path().join("enoent");
let filename = file_path.to_str().expect("failed to create filename");
let mut config = AgentConfig::new();
let result = config.parse_cmdline(&filename.to_owned());
assert!(result.is_err());
// Now, test various combinations of file contents and environment
// variables.
for (i, d) in tests.iter().enumerate() {
@@ -896,7 +753,22 @@ mod tests {
vars_to_unset.push(name);
}
let config = AgentConfig::from_cmdline(filename).expect("Failed to parse command line");
let mut config = AgentConfig::new();
assert!(!config.debug_console, "{}", msg);
assert!(!config.dev_mode, "{}", msg);
assert!(!config.unified_cgroup_hierarchy, "{}", msg);
assert_eq!(
config.hotplug_timeout,
time::Duration::from_secs(3),
"{}",
msg
);
assert_eq!(config.container_pipe_size, 0, "{}", msg);
assert_eq!(config.server_addr, TEST_SERVER_ADDR, "{}", msg);
assert_eq!(config.tracing, tracer::TraceType::Disabled, "{}", msg);
let result = config.parse_cmdline(filename);
assert!(result.is_ok(), "{}", msg);
assert_eq!(d.debug_console, config.debug_console, "{}", msg);
assert_eq!(d.dev_mode, config.dev_mode, "{}", msg);
@@ -1348,33 +1220,60 @@ Caused by:
}
#[test]
fn test_config_builder_from_string() {
let config = AgentConfig::from_str(
r#"
dev_mode = true
server_addr = 'vsock://8:2048'
fn test_get_trace_type() {
#[derive(Debug)]
struct TestData<'a> {
param: &'a str,
result: Result<tracer::TraceType>,
}
[endpoints]
allowed = ["CreateContainer", "StartContainer"]
"#,
)
.unwrap();
let tests = &[
TestData {
param: "",
result: Err(anyhow!(ERR_INVALID_TRACE_TYPE_PARAM)),
},
TestData {
param: "agent.tracer",
result: Err(anyhow!(ERR_INVALID_TRACE_TYPE_KEY)),
},
TestData {
param: "agent.trac",
result: Err(anyhow!(ERR_INVALID_TRACE_TYPE_KEY)),
},
TestData {
param: "agent.trace=",
result: Err(anyhow!(ERR_INVALID_TRACE_TYPE)),
},
TestData {
param: "agent.trace==",
result: Err(anyhow!(ERR_INVALID_TRACE_TYPE)),
},
TestData {
param: "agent.trace=foo",
result: Err(anyhow!(ERR_INVALID_TRACE_TYPE)),
},
TestData {
param: "agent.trace",
result: Ok(tracer::TraceType::Isolated),
},
TestData {
param: "agent.trace=isolated",
result: Ok(tracer::TraceType::Isolated),
},
TestData {
param: "agent.trace=disabled",
result: Ok(tracer::TraceType::Disabled),
},
];
// Verify that the all_allowed flag is false
assert!(!config.endpoints.all_allowed);
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
// Verify that the override worked
assert!(config.dev_mode);
assert_eq!(config.server_addr, "vsock://8:2048");
assert_eq!(
config.endpoints.allowed,
vec!["CreateContainer".to_string(), "StartContainer".to_string()]
.iter()
.cloned()
.collect()
);
let result = get_trace_type(d.param);
// Verify that the default values are valid
assert_eq!(config.hotplug_timeout, DEFAULT_HOTPLUG_TIMEOUT);
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
}

View File

@@ -7,10 +7,7 @@ use libc::{c_uint, major, minor};
use nix::sys::stat;
use regex::Regex;
use std::collections::HashMap;
use std::ffi::OsStr;
use std::fmt;
use std::fs;
use std::os::unix::ffi::OsStrExt;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::str::FromStr;
@@ -20,6 +17,10 @@ use tokio::sync::Mutex;
#[cfg(target_arch = "s390x")]
use crate::ccw;
use crate::linux_abi::*;
use crate::mount::{
DRIVER_BLK_CCW_TYPE, DRIVER_BLK_TYPE, DRIVER_MMIO_BLK_TYPE, DRIVER_NVDIMM_TYPE,
DRIVER_SCSI_TYPE,
};
use crate::pci;
use crate::sandbox::Sandbox;
use crate::uevent::{wait_for_uevent, Uevent, UeventMatcher};
@@ -37,22 +38,6 @@ macro_rules! sl {
const VM_ROOTFS: &str = "/";
pub const DRIVER_9P_TYPE: &str = "9p";
pub const DRIVER_VIRTIOFS_TYPE: &str = "virtio-fs";
pub const DRIVER_BLK_TYPE: &str = "blk";
pub const DRIVER_BLK_CCW_TYPE: &str = "blk-ccw";
pub const DRIVER_MMIO_BLK_TYPE: &str = "mmioblk";
pub const DRIVER_SCSI_TYPE: &str = "scsi";
pub const DRIVER_NVDIMM_TYPE: &str = "nvdimm";
pub const DRIVER_EPHEMERAL_TYPE: &str = "ephemeral";
pub const DRIVER_LOCAL_TYPE: &str = "local";
pub const DRIVER_WATCHABLE_BIND_TYPE: &str = "watchable-bind";
// VFIO device to be bound to a guest kernel driver
pub const DRIVER_VFIO_GK_TYPE: &str = "vfio-gk";
// VFIO device to be bound to vfio-pci and made available inside the
// container as a VFIO device node
pub const DRIVER_VFIO_TYPE: &str = "vfio";
#[derive(Debug)]
struct DevIndexEntry {
idx: usize,
@@ -62,89 +47,17 @@ struct DevIndexEntry {
#[derive(Debug)]
struct DevIndex(HashMap<String, DevIndexEntry>);
#[instrument]
pub fn rescan_pci_bus() -> Result<()> {
online_device(SYSFS_PCI_BUS_RESCAN_FILE)
}
#[instrument]
pub fn online_device(path: &str) -> Result<()> {
fs::write(path, "1")?;
Ok(())
}
// Force a given PCI device to bind to the given driver, does
// basically the same thing as
// driverctl set-override <PCI address> <driver>
#[instrument]
pub fn pci_driver_override<T, U>(syspci: T, dev: pci::Address, drv: U) -> Result<()>
where
T: AsRef<OsStr> + std::fmt::Debug,
U: AsRef<OsStr> + std::fmt::Debug,
{
let syspci = Path::new(&syspci);
let drv = drv.as_ref();
info!(sl!(), "rebind_pci_driver: {} => {:?}", dev, drv);
let devpath = syspci.join("devices").join(dev.to_string());
let overridepath = &devpath.join("driver_override");
fs::write(overridepath, drv.as_bytes())?;
let drvpath = &devpath.join("driver");
let need_unbind = match fs::read_link(drvpath) {
Ok(d) if d.file_name() == Some(drv) => return Ok(()), // Nothing to do
Err(e) if e.kind() == std::io::ErrorKind::NotFound => false, // No current driver
Err(e) => return Err(anyhow!("Error checking driver on {}: {}", dev, e)),
Ok(_) => true, // Current driver needs unbinding
};
if need_unbind {
let unbindpath = &drvpath.join("unbind");
fs::write(unbindpath, dev.to_string())?;
}
let probepath = syspci.join("drivers_probe");
fs::write(probepath, dev.to_string())?;
Ok(())
}
// Represents an IOMMU group
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct IommuGroup(u32);
impl fmt::Display for IommuGroup {
fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
write!(f, "{}", self.0)
}
}
// Determine the IOMMU group of a PCI device
#[instrument]
fn pci_iommu_group<T>(syspci: T, dev: pci::Address) -> Result<Option<IommuGroup>>
where
T: AsRef<OsStr> + std::fmt::Debug,
{
let syspci = Path::new(&syspci);
let grouppath = syspci
.join("devices")
.join(dev.to_string())
.join("iommu_group");
match fs::read_link(&grouppath) {
// Device has no group
Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok(None),
Err(e) => Err(anyhow!("Error reading link {:?}: {}", &grouppath, e)),
Ok(group) => {
if let Some(group) = group.file_name() {
if let Some(group) = group.to_str() {
if let Ok(group) = group.parse::<u32>() {
return Ok(Some(IommuGroup(group)));
}
}
}
Err(anyhow!(
"Unexpected IOMMU group link {:?} => {:?}",
grouppath,
group
))
}
}
}
// pcipath_to_sysfs fetches the sysfs path for a PCI path, relative to
// the sysfs path for the PCI host bridge, based on the PCI path
// provided.
@@ -154,7 +67,7 @@ pub fn pcipath_to_sysfs(root_bus_sysfs: &str, pcipath: &pci::Path) -> Result<Str
let mut relpath = String::new();
for i in 0..pcipath.len() {
let bdf = format!("{}:{}", bus, pcipath[i]);
let bdf = format!("{}:{}.0", bus, pcipath[i]);
relpath = format!("{}/{}", relpath, bdf);
@@ -249,6 +162,8 @@ pub async fn get_virtio_blk_pci_device_name(
let sysfs_rel_path = pcipath_to_sysfs(&root_bus_sysfs, pcipath)?;
let matcher = VirtioBlkPciMatcher::new(&sysfs_rel_path);
rescan_pci_bus()?;
let uev = wait_for_uevent(sandbox, matcher).await?;
Ok(format!("{}/{}", SYSTEM_DEV_PATH, &uev.devname))
}
@@ -340,72 +255,6 @@ pub async fn wait_for_pmem_device(sandbox: &Arc<Mutex<Sandbox>>, devpath: &str)
Ok(())
}
#[derive(Debug)]
struct PciMatcher {
devpath: String,
}
impl PciMatcher {
fn new(relpath: &str) -> Result<PciMatcher> {
let root_bus = create_pci_root_bus_path();
Ok(PciMatcher {
devpath: format!("{}{}", root_bus, relpath),
})
}
}
impl UeventMatcher for PciMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.devpath == self.devpath
}
}
pub async fn wait_for_pci_device(
sandbox: &Arc<Mutex<Sandbox>>,
pcipath: &pci::Path,
) -> Result<pci::Address> {
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path());
let sysfs_rel_path = pcipath_to_sysfs(&root_bus_sysfs, pcipath)?;
let matcher = PciMatcher::new(&sysfs_rel_path)?;
let uev = wait_for_uevent(sandbox, matcher).await?;
let addr = uev
.devpath
.rsplit('/')
.next()
.ok_or_else(|| anyhow!("Bad device path {:?} in uevent", &uev.devpath))?;
let addr = pci::Address::from_str(addr)?;
Ok(addr)
}
#[derive(Debug)]
struct VfioMatcher {
syspath: String,
}
impl VfioMatcher {
fn new(grp: IommuGroup) -> VfioMatcher {
VfioMatcher {
syspath: format!("/devices/virtual/vfio/{}", grp),
}
}
}
impl UeventMatcher for VfioMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.devpath == self.syspath
}
}
#[instrument]
async fn get_vfio_device_name(sandbox: &Arc<Mutex<Sandbox>>, grp: IommuGroup) -> Result<String> {
let matcher = VfioMatcher::new(grp);
let uev = wait_for_uevent(sandbox, matcher).await?;
Ok(format!("{}/{}", SYSTEM_DEV_PATH, &uev.devname))
}
/// Scan SCSI bus for the given SCSI address(SCSI-Id and LUN)
#[instrument]
fn scan_scsi_bus(scsi_addr: &str) -> Result<()> {
@@ -436,27 +285,24 @@ fn scan_scsi_bus(scsi_addr: &str) -> Result<()> {
Ok(())
}
// update_spec_device updates the device list in the OCI spec to make
// it include details appropriate for the VM, instead of the host. It
// is given the host path to the device (to locate the device in the
// original OCI spec) and the VM path which it uses to determine the
// VM major/minor numbers, and the final path with which to present
// the device in the (inner) container
// update_spec_device_list takes a device description provided by the caller,
// trying to find it on the guest. Once this device has been identified, the
// "real" information that can be read from inside the VM is used to update
// the same device in the list of devices provided through the OCI spec.
// This is needed to update information about minor/major numbers that cannot
// be predicted from the caller.
#[instrument]
fn update_spec_device(
spec: &mut Spec,
devidx: &DevIndex,
host_path: &str,
vm_path: &str,
final_path: &str,
) -> Result<()> {
fn update_spec_device_list(device: &Device, spec: &mut Spec, devidx: &DevIndex) -> Result<()> {
let major_id: c_uint;
let minor_id: c_uint;
// If no container_path is provided, we won't be able to match and
// update the device in the OCI spec device list. This is an error.
if host_path.is_empty() {
return Err(anyhow!("Host path cannot empty for device"));
if device.container_path.is_empty() {
return Err(anyhow!(
"container_path cannot empty for device {:?}",
device
));
}
let linux = spec
@@ -464,11 +310,11 @@ fn update_spec_device(
.as_mut()
.ok_or_else(|| anyhow!("Spec didn't container linux field"))?;
if !Path::new(vm_path).exists() {
return Err(anyhow!("vm_path:{} doesn't exist", vm_path));
if !Path::new(&device.vm_path).exists() {
return Err(anyhow!("vm_path:{} doesn't exist", device.vm_path));
}
let meta = fs::metadata(vm_path)?;
let meta = fs::metadata(&device.vm_path)?;
let dev_id = meta.rdev();
unsafe {
major_id = major(dev_id);
@@ -477,27 +323,24 @@ fn update_spec_device(
info!(
sl!(),
"update_spec_device(): vm_path={}, major: {}, minor: {}\n", vm_path, major_id, minor_id
"got the device: dev_path: {}, major: {}, minor: {}\n", &device.vm_path, major_id, minor_id
);
if let Some(idxdata) = devidx.0.get(host_path) {
if let Some(idxdata) = devidx.0.get(device.container_path.as_str()) {
let dev = &mut linux.devices[idxdata.idx];
let host_major = dev.major;
let host_minor = dev.minor;
dev.major = major_id as i64;
dev.minor = minor_id as i64;
dev.path = final_path.to_string();
info!(
sl!(),
"change the device from path: {} major: {} minor: {} to vm device path: {} major: {} minor: {}",
host_path,
"change the device from major: {} minor: {} to vm device major: {} minor: {}",
host_major,
host_minor,
dev.path,
dev.major,
dev.minor,
major_id,
minor_id
);
// Resources must be updated since they are used to identify
@@ -518,7 +361,7 @@ fn update_spec_device(
} else {
Err(anyhow!(
"Should have found a matching device {} in the spec",
vm_path
device.vm_path
))
}
}
@@ -536,13 +379,7 @@ async fn virtiommio_blk_device_handler(
return Err(anyhow!("Invalid path for virtio mmio blk device"));
}
update_spec_device(
spec,
devidx,
&device.container_path,
&device.vm_path,
&device.container_path,
)
update_spec_device_list(device, spec, devidx)
}
// device.Id should be a PCI path string
@@ -558,13 +395,7 @@ async fn virtio_blk_device_handler(
dev.vm_path = get_virtio_blk_pci_device_name(sandbox, &pcipath).await?;
update_spec_device(
spec,
devidx,
&dev.container_path,
&dev.vm_path,
&dev.container_path,
)
update_spec_device_list(&dev, spec, devidx)
}
// device.id should be a CCW path string
@@ -579,13 +410,7 @@ async fn virtio_blk_ccw_device_handler(
let mut dev = device.clone();
let ccw_device = ccw::Device::from_str(&device.id)?;
dev.vm_path = get_virtio_blk_ccw_device_name(sandbox, &ccw_device).await?;
update_spec_device(
spec,
devidx,
&dev.container_path,
&dev.vm_path,
&dev.container_path,
)
update_spec_device_list(&dev, spec, devidx)
}
#[cfg(not(target_arch = "s390x"))]
@@ -609,13 +434,7 @@ async fn virtio_scsi_device_handler(
) -> Result<()> {
let mut dev = device.clone();
dev.vm_path = get_scsi_device_name(sandbox, &device.id).await?;
update_spec_device(
spec,
devidx,
&dev.container_path,
&dev.vm_path,
&dev.container_path,
)
update_spec_device_list(&dev, spec, devidx)
}
#[instrument]
@@ -629,79 +448,7 @@ async fn virtio_nvdimm_device_handler(
return Err(anyhow!("Invalid path for nvdimm device"));
}
update_spec_device(
spec,
devidx,
&device.container_path,
&device.vm_path,
&device.container_path,
)
}
fn split_vfio_option(opt: &str) -> Option<(&str, &str)> {
let mut tokens = opt.split('=');
let hostbdf = tokens.next()?;
let path = tokens.next()?;
if tokens.next().is_some() {
None
} else {
Some((hostbdf, path))
}
}
// device.options should have one entry for each PCI device in the VFIO group
// Each option should have the form "DDDD:BB:DD.F=<pcipath>"
// DDDD:BB:DD.F is the device's PCI address in the host
// <pcipath> is a PCI path to the device in the guest (see pci.rs)
async fn vfio_device_handler(
device: &Device,
spec: &mut Spec,
sandbox: &Arc<Mutex<Sandbox>>,
devidx: &DevIndex,
) -> Result<()> {
let vfio_in_guest = device.field_type != DRIVER_VFIO_GK_TYPE;
let mut group = None;
for opt in device.options.iter() {
let (_, pcipath) =
split_vfio_option(opt).ok_or_else(|| anyhow!("Malformed VFIO option {:?}", opt))?;
let pcipath = pci::Path::from_str(pcipath)?;
let guestdev = wait_for_pci_device(sandbox, &pcipath).await?;
if vfio_in_guest {
pci_driver_override(SYSFS_BUS_PCI_PATH, guestdev, "vfio-pci")?;
let devgroup = pci_iommu_group(SYSFS_BUS_PCI_PATH, guestdev)?;
if devgroup.is_none() {
// Devices must have an IOMMU group to be usable via VFIO
return Err(anyhow!("{} has no IOMMU group", guestdev));
}
if group.is_some() && group != devgroup {
// If PCI devices associated with the same VFIO device
// (and therefore group) in the host don't end up in
// the same group in the guest, something has gone
// horribly wrong
return Err(anyhow!(
"{} is not in guest IOMMU group {}",
guestdev,
group.unwrap()
));
}
group = devgroup;
}
}
if vfio_in_guest {
// If there are any devices at all, logic above ensures that group is not None
let group = group.unwrap();
let vmpath = get_vfio_device_name(sandbox, group).await?;
update_spec_device(spec, devidx, &device.container_path, &vmpath, &vmpath)?;
}
Ok(())
update_spec_device_list(device, spec, devidx)
}
impl DevIndex {
@@ -773,9 +520,6 @@ async fn add_device(
DRIVER_MMIO_BLK_TYPE => virtiommio_blk_device_handler(device, spec, sandbox, devidx).await,
DRIVER_NVDIMM_TYPE => virtio_nvdimm_device_handler(device, spec, sandbox, devidx).await,
DRIVER_SCSI_TYPE => virtio_scsi_device_handler(device, spec, sandbox, devidx).await,
DRIVER_VFIO_GK_TYPE | DRIVER_VFIO_TYPE => {
vfio_device_handler(device, spec, sandbox, devidx).await
}
_ => Err(anyhow!("Unknown device type {}", device.field_type)),
}
}
@@ -840,28 +584,28 @@ mod tests {
}
#[test]
fn test_update_spec_device() {
fn test_update_spec_device_list() {
let (major, minor) = (7, 2);
let mut device = Device::default();
let mut spec = Spec::default();
// container_path empty
let container_path = "";
let vm_path = "";
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_err());
device.container_path = "/dev/null".to_string();
// linux is empty
let container_path = "/dev/null";
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_err());
spec.linux = Some(Linux::default());
// linux.devices is empty
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_err());
spec.linux.as_mut().unwrap().devices = vec![oci::LinuxDevice {
@@ -873,32 +617,26 @@ mod tests {
// vm_path empty
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_err());
let vm_path = "/dev/null";
device.vm_path = "/dev/null".to_string();
// guest and host path are not the same
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
assert!(
res.is_err(),
"container_path={:?} vm_path={:?} spec={:?}",
container_path,
vm_path,
spec
);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_err(), "device={:?} spec={:?}", device, spec);
spec.linux.as_mut().unwrap().devices[0].path = container_path.to_string();
spec.linux.as_mut().unwrap().devices[0].path = device.container_path.clone();
// spec.linux.resources is empty
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_ok());
// update both devices and cgroup lists
spec.linux.as_mut().unwrap().devices = vec![oci::LinuxDevice {
path: container_path.to_string(),
path: device.container_path.clone(),
major,
minor,
..oci::LinuxDevice::default()
@@ -914,12 +652,12 @@ mod tests {
});
let devidx = DevIndex::new(&spec);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&device, &mut spec, &devidx);
assert!(res.is_ok());
}
#[test]
fn test_update_spec_device_guest_host_conflict() {
fn test_update_spec_device_list_guest_host_conflict() {
let null_rdev = fs::metadata("/dev/null").unwrap().rdev();
let zero_rdev = fs::metadata("/dev/zero").unwrap().rdev();
let full_rdev = fs::metadata("/dev/full").unwrap().rdev();
@@ -970,14 +708,20 @@ mod tests {
};
let devidx = DevIndex::new(&spec);
let container_path_a = "/dev/a";
let vm_path_a = "/dev/zero";
let dev_a = Device {
container_path: "/dev/a".to_string(),
vm_path: "/dev/zero".to_string(),
..Device::default()
};
let guest_major_a = stat::major(zero_rdev) as i64;
let guest_minor_a = stat::minor(zero_rdev) as i64;
let container_path_b = "/dev/b";
let vm_path_b = "/dev/full";
let dev_b = Device {
container_path: "/dev/b".to_string(),
vm_path: "/dev/full".to_string(),
..Device::default()
};
let guest_major_b = stat::major(full_rdev) as i64;
let guest_minor_b = stat::minor(full_rdev) as i64;
@@ -994,13 +738,7 @@ mod tests {
assert_eq!(Some(host_major_b), specresources.devices[1].major);
assert_eq!(Some(host_minor_b), specresources.devices[1].minor);
let res = update_spec_device(
&mut spec,
&devidx,
container_path_a,
vm_path_a,
container_path_a,
);
let res = update_spec_device_list(&dev_a, &mut spec, &devidx);
assert!(res.is_ok());
let specdevices = &spec.linux.as_ref().unwrap().devices;
@@ -1015,13 +753,7 @@ mod tests {
assert_eq!(Some(host_major_b), specresources.devices[1].major);
assert_eq!(Some(host_minor_b), specresources.devices[1].minor);
let res = update_spec_device(
&mut spec,
&devidx,
container_path_b,
vm_path_b,
container_path_b,
);
let res = update_spec_device_list(&dev_b, &mut spec, &devidx);
assert!(res.is_ok());
let specdevices = &spec.linux.as_ref().unwrap().devices;
@@ -1038,7 +770,7 @@ mod tests {
}
#[test]
fn test_update_spec_device_char_block_conflict() {
fn test_update_spec_device_list_char_block_conflict() {
let null_rdev = fs::metadata("/dev/null").unwrap().rdev();
let guest_major = stat::major(null_rdev) as i64;
@@ -1087,8 +819,11 @@ mod tests {
};
let devidx = DevIndex::new(&spec);
let container_path = "/dev/char";
let vm_path = "/dev/null";
let dev = Device {
container_path: "/dev/char".to_string(),
vm_path: "/dev/null".to_string(),
..Device::default()
};
let specresources = spec.linux.as_ref().unwrap().resources.as_ref().unwrap();
assert_eq!(Some(host_major), specresources.devices[0].major);
@@ -1096,7 +831,7 @@ mod tests {
assert_eq!(Some(host_major), specresources.devices[1].major);
assert_eq!(Some(host_minor), specresources.devices[1].minor);
let res = update_spec_device(&mut spec, &devidx, container_path, vm_path, container_path);
let res = update_spec_device_list(&dev, &mut spec, &devidx);
assert!(res.is_ok());
// Only the char device, not the block device should be updated
@@ -1107,43 +842,6 @@ mod tests {
assert_eq!(Some(host_minor), specresources.devices[1].minor);
}
#[test]
fn test_update_spec_device_final_path() {
let null_rdev = fs::metadata("/dev/null").unwrap().rdev();
let guest_major = stat::major(null_rdev) as i64;
let guest_minor = stat::minor(null_rdev) as i64;
let host_path = "/dev/host";
let host_major: i64 = 99;
let host_minor: i64 = 99;
let mut spec = Spec {
linux: Some(Linux {
devices: vec![oci::LinuxDevice {
path: host_path.to_string(),
r#type: "c".to_string(),
major: host_major,
minor: host_minor,
..oci::LinuxDevice::default()
}],
..Linux::default()
}),
..Spec::default()
};
let devidx = DevIndex::new(&spec);
let vm_path = "/dev/null";
let final_path = "/dev/final";
let res = update_spec_device(&mut spec, &devidx, host_path, vm_path, final_path);
assert!(res.is_ok());
let specdevices = &spec.linux.as_ref().unwrap().devices;
assert_eq!(guest_major, specdevices[0].major);
assert_eq!(guest_minor, specdevices[0].minor);
assert_eq!(final_path, specdevices[0].path);
}
#[test]
fn test_pcipath_to_sysfs() {
let testdir = tempdir().expect("failed to create tmpdir");
@@ -1370,112 +1068,4 @@ mod tests {
assert!(!matcher_b.is_match(&uev_a));
assert!(!matcher_a.is_match(&uev_b));
}
#[tokio::test]
async fn test_vfio_matcher() {
let grpa = IommuGroup(1);
let grpb = IommuGroup(22);
let mut uev_a = crate::uevent::Uevent::default();
uev_a.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev_a.devname = format!("vfio/{}", grpa);
uev_a.devpath = format!("/devices/virtual/vfio/{}", grpa);
let matcher_a = VfioMatcher::new(grpa);
let mut uev_b = uev_a.clone();
uev_b.devpath = format!("/devices/virtual/vfio/{}", grpb);
let matcher_b = VfioMatcher::new(grpb);
assert!(matcher_a.is_match(&uev_a));
assert!(matcher_b.is_match(&uev_b));
assert!(!matcher_b.is_match(&uev_a));
assert!(!matcher_a.is_match(&uev_b));
}
#[test]
fn test_split_vfio_option() {
assert_eq!(
split_vfio_option("0000:01:00.0=02/01"),
Some(("0000:01:00.0", "02/01"))
);
assert_eq!(split_vfio_option("0000:01:00.0=02/01=rubbish"), None);
assert_eq!(split_vfio_option("0000:01:00.0"), None);
}
#[test]
fn test_pci_driver_override() {
let testdir = tempdir().expect("failed to create tmpdir");
let syspci = testdir.path(); // Path to mock /sys/bus/pci
let dev0 = pci::Address::new(0, 0, pci::SlotFn::new(0, 0).unwrap());
let dev0path = syspci.join("devices").join(dev0.to_string());
let dev0drv = dev0path.join("driver");
let dev0override = dev0path.join("driver_override");
let drvapath = syspci.join("drivers").join("drv_a");
let drvaunbind = drvapath.join("unbind");
let probepath = syspci.join("drivers_probe");
// Start mocking dev0 as being unbound
fs::create_dir_all(&dev0path).unwrap();
pci_driver_override(syspci, dev0, "drv_a").unwrap();
assert_eq!(fs::read_to_string(&dev0override).unwrap(), "drv_a");
assert_eq!(fs::read_to_string(&probepath).unwrap(), dev0.to_string());
// Now mock dev0 already being attached to drv_a
fs::create_dir_all(&drvapath).unwrap();
std::os::unix::fs::symlink(&drvapath, dev0drv).unwrap();
std::fs::remove_file(&probepath).unwrap();
pci_driver_override(syspci, dev0, "drv_a").unwrap(); // no-op
assert_eq!(fs::read_to_string(&dev0override).unwrap(), "drv_a");
assert!(!probepath.exists());
// Now try binding to a different driver
pci_driver_override(syspci, dev0, "drv_b").unwrap();
assert_eq!(fs::read_to_string(&dev0override).unwrap(), "drv_b");
assert_eq!(fs::read_to_string(&probepath).unwrap(), dev0.to_string());
assert_eq!(fs::read_to_string(&drvaunbind).unwrap(), dev0.to_string());
}
#[test]
fn test_pci_iommu_group() {
let testdir = tempdir().expect("failed to create tmpdir"); // mock /sys
let syspci = testdir.path().join("bus").join("pci");
// Mock dev0, which has no group
let dev0 = pci::Address::new(0, 0, pci::SlotFn::new(0, 0).unwrap());
let dev0path = syspci.join("devices").join(dev0.to_string());
fs::create_dir_all(&dev0path).unwrap();
// Test dev0
assert!(pci_iommu_group(&syspci, dev0).unwrap().is_none());
// Mock dev1, which is in group 12
let dev1 = pci::Address::new(0, 1, pci::SlotFn::new(0, 0).unwrap());
let dev1path = syspci.join("devices").join(dev1.to_string());
let dev1group = dev1path.join("iommu_group");
fs::create_dir_all(&dev1path).unwrap();
std::os::unix::fs::symlink("../../../kernel/iommu_groups/12", &dev1group).unwrap();
// Test dev1
assert_eq!(
pci_iommu_group(&syspci, dev1).unwrap(),
Some(IommuGroup(12))
);
// Mock dev2, which has a bogus group (dir instead of symlink)
let dev2 = pci::Address::new(0, 2, pci::SlotFn::new(0, 0).unwrap());
let dev2path = syspci.join("devices").join(dev2.to_string());
let dev2group = dev2path.join("iommu_group");
fs::create_dir_all(&dev2group).unwrap();
// Test dev2
assert!(pci_iommu_group(&syspci, dev2).is_err());
}
}

View File

@@ -9,6 +9,7 @@
use std::fs;
pub const SYSFS_DIR: &str = "/sys";
pub const SYSFS_PCI_BUS_RESCAN_FILE: &str = "/sys/bus/pci/rescan";
#[cfg(any(
target_arch = "powerpc64",
target_arch = "s390x",
@@ -83,8 +84,6 @@ pub const SYSFS_MEMORY_ONLINE_PATH: &str = "/sys/devices/system/memory";
pub const SYSFS_SCSI_HOST_PATH: &str = "/sys/class/scsi_host";
pub const SYSFS_BUS_PCI_PATH: &str = "/sys/bus/pci";
pub const SYSFS_CGROUPPATH: &str = "/sys/fs/cgroup";
pub const SYSFS_ONLINE_FILE: &str = "online";
@@ -96,7 +95,6 @@ pub const SYSTEM_DEV_PATH: &str = "/dev";
// Linux UEvent related consts.
pub const U_EVENT_ACTION: &str = "ACTION";
pub const U_EVENT_ACTION_ADD: &str = "add";
pub const U_EVENT_ACTION_REMOVE: &str = "remove";
pub const U_EVENT_DEV_PATH: &str = "DEVPATH";
pub const U_EVENT_SUB_SYSTEM: &str = "SUBSYSTEM";
pub const U_EVENT_SEQ_NUM: &str = "SEQNUM";

View File

@@ -77,11 +77,11 @@ mod rpc;
mod tracer;
const NAME: &str = "kata-agent";
const KERNEL_CMDLINE_FILE: &str = "/proc/cmdline";
lazy_static! {
static ref AGENT_CONFIG: Arc<RwLock<AgentConfig>> = Arc::new(RwLock::new(
AgentConfig::from_cmdline("/proc/cmdline").unwrap()
));
static ref AGENT_CONFIG: Arc<RwLock<AgentConfig>> =
Arc::new(RwLock::new(config::AgentConfig::new()));
}
#[instrument]
@@ -134,11 +134,15 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
console::initialize();
lazy_static::initialize(&AGENT_CONFIG);
// support vsock log
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC)?;
let (shutdown_tx, shutdown_rx) = channel(true);
let agent_config = AGENT_CONFIG.clone();
let init_mode = unistd::getpid() == Pid::from_raw(1);
if init_mode {
// dup a new file descriptor for this temporary logger writer,
@@ -159,15 +163,20 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
e
})?;
lazy_static::initialize(&AGENT_CONFIG);
let mut config = agent_config.write().await;
config.parse_cmdline(KERNEL_CMDLINE_FILE)?;
init_agent_as_init(&logger, AGENT_CONFIG.read().await.unified_cgroup_hierarchy)?;
init_agent_as_init(&logger, config.unified_cgroup_hierarchy)?;
drop(logger_async_guard);
} else {
lazy_static::initialize(&AGENT_CONFIG);
// once parsed cmdline and set the config, release the write lock
// as soon as possible in case other thread would get read lock on
// it.
let mut config = agent_config.write().await;
config.parse_cmdline(KERNEL_CMDLINE_FILE)?;
}
let config = agent_config.read().await;
let config = AGENT_CONFIG.read().await;
let log_vport = config.log_vport as u32;
let log_handle = tokio::spawn(create_logger_task(rfd, log_vport, shutdown_rx.clone()));
@@ -196,16 +205,16 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
ttrpc_log_guard = Ok(slog_stdlog::init().map_err(|e| e)?);
}
if config.tracing {
tracer::setup_tracing(NAME, &logger)?;
if config.tracing != tracer::TraceType::Disabled {
let _ = tracer::setup_tracing(NAME, &logger, &config)?;
}
let root_span = span!(tracing::Level::TRACE, "root-span");
let root = span!(tracing::Level::TRACE, "root-span", work_units = 2);
// XXX: Start the root trace transaction.
//
// XXX: Note that *ALL* spans needs to start after this point!!
let span_guard = root_span.enter();
let _enter = root.enter();
// Start the sandbox and wait for its ttRPC server to end
start_sandbox(&logger, &config, init_mode, &mut tasks, shutdown_rx.clone()).await?;
@@ -229,29 +238,19 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
// Wait for all threads to finish
let results = join_all(tasks).await;
// force flushing spans
drop(span_guard);
drop(root_span);
for result in results {
if let Err(e) = result {
return Err(anyhow!(e).into());
}
}
if config.tracing {
if config.tracing != tracer::TraceType::Disabled {
tracer::end_tracing();
}
eprintln!("{} shutdown complete", NAME);
let mut wait_errors: Vec<tokio::task::JoinError> = vec![];
for result in results {
if let Err(e) = result {
eprintln!("wait task error: {:#?}", e);
wait_errors.push(e);
}
}
if wait_errors.is_empty() {
Ok(())
} else {
Err(anyhow!("wait all tasks failed: {:#?}", wait_errors).into())
}
Ok(())
}
fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {

View File

@@ -4,27 +4,28 @@
//
use std::collections::HashMap;
use std::ffi::CString;
use std::fs;
use std::fs::File;
use std::io;
use std::io::{BufRead, BufReader};
use std::iter;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;
use std::ptr::null;
use std::str::FromStr;
use std::sync::Arc;
use tokio::sync::Mutex;
use nix::mount::MsFlags;
use libc::{c_void, mount};
use nix::mount::{self, MsFlags};
use nix::unistd::Gid;
use regex::Regex;
use crate::device::{
get_scsi_device_name, get_virtio_blk_pci_device_name, online_device, wait_for_pmem_device,
DRIVER_9P_TYPE, DRIVER_BLK_CCW_TYPE, DRIVER_BLK_TYPE, DRIVER_EPHEMERAL_TYPE, DRIVER_LOCAL_TYPE,
DRIVER_MMIO_BLK_TYPE, DRIVER_NVDIMM_TYPE, DRIVER_SCSI_TYPE, DRIVER_VIRTIOFS_TYPE,
DRIVER_WATCHABLE_BIND_TYPE,
};
use crate::linux_abi::*;
use crate::pci;
@@ -36,6 +37,17 @@ use anyhow::{anyhow, Context, Result};
use slog::Logger;
use tracing::instrument;
pub const DRIVER_9P_TYPE: &str = "9p";
pub const DRIVER_VIRTIOFS_TYPE: &str = "virtio-fs";
pub const DRIVER_BLK_TYPE: &str = "blk";
pub const DRIVER_BLK_CCW_TYPE: &str = "blk-ccw";
pub const DRIVER_MMIO_BLK_TYPE: &str = "mmioblk";
pub const DRIVER_SCSI_TYPE: &str = "scsi";
pub const DRIVER_NVDIMM_TYPE: &str = "nvdimm";
pub const DRIVER_EPHEMERAL_TYPE: &str = "ephemeral";
pub const DRIVER_LOCAL_TYPE: &str = "local";
pub const DRIVER_WATCHABLE_BIND_TYPE: &str = "watchable-bind";
pub const TYPE_ROOTFS: &str = "rootfs";
pub const MOUNT_GUEST_TAG: &str = "kataShared";
@@ -137,53 +149,96 @@ pub const STORAGE_HANDLER_LIST: &[&str] = &[
DRIVER_WATCHABLE_BIND_TYPE,
];
#[instrument]
pub fn baremount(
source: &str,
destination: &str,
fs_type: &str,
#[derive(Debug, Clone)]
pub struct BareMount<'a> {
source: &'a str,
destination: &'a str,
fs_type: &'a str,
flags: MsFlags,
options: &str,
logger: &Logger,
) -> Result<()> {
let logger = logger.new(o!("subsystem" => "baremount"));
options: &'a str,
logger: Logger,
}
if source.is_empty() {
return Err(anyhow!("need mount source"));
// mount mounts a source in to a destination. This will do some bookkeeping:
// * evaluate all symlinks
// * ensure the source exists
impl<'a> BareMount<'a> {
#[instrument]
pub fn new(
s: &'a str,
d: &'a str,
fs_type: &'a str,
flags: MsFlags,
options: &'a str,
logger: &Logger,
) -> Self {
BareMount {
source: s,
destination: d,
fs_type,
flags,
options,
logger: logger.new(o!("subsystem" => "baremount")),
}
}
if destination.is_empty() {
return Err(anyhow!("need mount destination"));
#[instrument]
pub fn mount(&self) -> Result<()> {
let source;
let dest;
let fs_type;
let mut options = null();
let cstr_options: CString;
let cstr_source: CString;
let cstr_dest: CString;
let cstr_fs_type: CString;
if self.source.is_empty() {
return Err(anyhow!("need mount source"));
}
if self.destination.is_empty() {
return Err(anyhow!("need mount destination"));
}
cstr_source = CString::new(self.source)?;
source = cstr_source.as_ptr();
cstr_dest = CString::new(self.destination)?;
dest = cstr_dest.as_ptr();
if self.fs_type.is_empty() {
return Err(anyhow!("need mount FS type"));
}
cstr_fs_type = CString::new(self.fs_type)?;
fs_type = cstr_fs_type.as_ptr();
if !self.options.is_empty() {
cstr_options = CString::new(self.options)?;
options = cstr_options.as_ptr() as *const c_void;
}
info!(
self.logger,
"mount source={:?}, dest={:?}, fs_type={:?}, options={:?}",
self.source,
self.destination,
self.fs_type,
self.options
);
let rc = unsafe { mount(source, dest, fs_type, self.flags.bits(), options) };
if rc < 0 {
return Err(anyhow!(
"failed to mount {:?} to {:?}, with error: {}",
self.source,
self.destination,
io::Error::last_os_error()
));
}
Ok(())
}
if fs_type.is_empty() {
return Err(anyhow!("need mount FS type"));
}
info!(
logger,
"mount source={:?}, dest={:?}, fs_type={:?}, options={:?}",
source,
destination,
fs_type,
options
);
nix::mount::mount(
Some(source),
destination,
Some(fs_type),
flags,
Some(options),
)
.map_err(|e| {
anyhow!(
"failed to mount {:?} to {:?}, with error: {}",
source,
destination,
e
)
})
}
#[instrument]
@@ -431,14 +486,17 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
return Ok(());
}
let mount_path = Path::new(&storage.mount_point);
let src_path = Path::new(&storage.source);
if storage.fstype == "bind" && !src_path.is_dir() {
ensure_destination_file_exists(mount_path)
} else {
fs::create_dir_all(mount_path).map_err(anyhow::Error::from)
match storage.fstype.as_str() {
DRIVER_9P_TYPE | DRIVER_VIRTIOFS_TYPE => {
let dest_path = Path::new(storage.mount_point.as_str());
if !dest_path.exists() {
fs::create_dir_all(dest_path).context("Create mount destination failed")?;
}
}
_ => {
ensure_destination_exists(storage.mount_point.as_str(), storage.fstype.as_str())?;
}
}
.context("Could not create mountpoint")?;
let options_vec = storage.options.to_vec();
let options_vec = options_vec.iter().map(String::as_str).collect();
@@ -451,14 +509,16 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
"mount-options" => options.as_str(),
);
baremount(
let bare_mount = BareMount::new(
storage.source.as_str(),
storage.mount_point.as_str(),
storage.fstype.as_str(),
flags,
options.as_str(),
&logger,
)
);
bare_mount.mount()
}
/// Looks for `mount_point` entry in the /proc/mounts.
@@ -577,9 +637,11 @@ fn mount_to_rootfs(logger: &Logger, m: &InitMount) -> Result<()> {
let (flags, options) = parse_mount_flags_and_options(options_vec);
let bare_mount = BareMount::new(m.src, m.dest, m.fstype, flags, options.as_str(), logger);
fs::create_dir_all(Path::new(m.dest)).context("could not create directory")?;
baremount(m.src, m.dest, m.fstype, flags, &options, logger).or_else(|e| {
bare_mount.mount().or_else(|e| {
if m.src != "dev" {
return Err(e);
}
@@ -754,27 +816,32 @@ pub fn cgroups_mount(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result<
#[instrument]
pub fn remove_mounts(mounts: &[String]) -> Result<()> {
for m in mounts.iter() {
nix::mount::umount(m.as_str()).context(format!("failed to umount {:?}", m))?;
mount::umount(m.as_str()).context(format!("failed to umount {:?}", m))?;
}
Ok(())
}
// ensure_destination_exists will recursively create a given mountpoint. If directories
// are created, their permissions are initialized to mountPerm(0755)
#[instrument]
fn ensure_destination_file_exists(path: &Path) -> Result<()> {
if path.is_file() {
fn ensure_destination_exists(destination: &str, fs_type: &str) -> Result<()> {
let d = Path::new(destination);
if d.exists() {
return Ok(());
} else if path.exists() {
return Err(anyhow!("{:?} exists but is not a regular file", path));
}
let dir = d
.parent()
.ok_or_else(|| anyhow!("mount destination {} doesn't exist", destination))?;
if !dir.exists() {
fs::create_dir_all(dir).context(format!("create dir all {:?}", dir))?;
}
// The only way parent() can return None is if the path is /,
// which always exists, so the test above will already have caught
// it, thus the unwrap() is safe
let dir = path.parent().unwrap();
fs::create_dir_all(dir).context(format!("create_dir_all {:?}", dir))?;
fs::File::create(path).context(format!("create empty file {:?}", path))?;
if fs_type != "bind" || d.is_dir() {
fs::create_dir_all(d).context(format!("create dir all {:?}", d))?;
} else {
fs::File::create(d).context(format!("create file {:?}", d))?;
}
Ok(())
}
@@ -798,6 +865,8 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
mod tests {
use super::*;
use crate::{skip_if_not_root, skip_loop_if_not_root, skip_loop_if_root};
use libc::umount;
use std::fs::metadata;
use std::fs::File;
use std::fs::OpenOptions;
use std::io::Write;
@@ -937,7 +1006,7 @@ mod tests {
std::fs::create_dir_all(d).expect("failed to created directory");
}
let result = baremount(
let bare_mount = BareMount::new(
&src_filename,
&dest_filename,
d.fs_type,
@@ -946,13 +1015,25 @@ mod tests {
&logger,
);
let result = bare_mount.mount();
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
// Cleanup
nix::mount::umount(dest_filename.as_str()).unwrap();
unsafe {
let cstr_dest =
CString::new(dest_filename).expect("failed to convert dest to cstring");
let umount_dest = cstr_dest.as_ptr();
let ret = umount(umount_dest);
let msg = format!("{}: umount result: {:?}", msg, result);
assert!(ret == 0, "{}", msg);
};
continue;
}
@@ -1022,7 +1103,7 @@ mod tests {
}
// Create an actual mount
let result = baremount(
let bare_mount = BareMount::new(
mnt_src_filename,
mnt_dest_filename,
"bind",
@@ -1030,6 +1111,8 @@ mod tests {
"",
&logger,
);
let result = bare_mount.mount();
assert!(result.is_ok(), "mount for test setup failed");
let tests = &[
@@ -1361,20 +1444,37 @@ mod tests {
}
#[test]
fn test_ensure_destination_file_exists() {
fn test_ensure_destination_exists() {
let dir = tempdir().expect("failed to create tmpdir");
let mut testfile = dir.into_path();
testfile.push("testfile");
let result = ensure_destination_file_exists(&testfile);
let result = ensure_destination_exists(testfile.to_str().unwrap(), "bind");
assert!(result.is_ok());
assert!(testfile.exists());
let result = ensure_destination_file_exists(&testfile);
let result = ensure_destination_exists(testfile.to_str().unwrap(), "bind");
assert!(result.is_ok());
assert!(testfile.is_file());
let meta = metadata(testfile).unwrap();
assert!(meta.is_file());
let dir = tempdir().expect("failed to create tmpdir");
let mut testdir = dir.into_path();
testdir.push("testdir");
let result = ensure_destination_exists(testdir.to_str().unwrap(), "ext4");
assert!(result.is_ok());
assert!(testdir.exists());
let result = ensure_destination_exists(testdir.to_str().unwrap(), "ext4");
assert!(result.is_ok());
//let meta = metadata(testdir.to_str().unwrap()).unwrap();
let meta = metadata(testdir).unwrap();
assert!(meta.is_dir());
}
}

View File

@@ -13,7 +13,7 @@ use std::fs::File;
use std::path::{Path, PathBuf};
use tracing::instrument;
use crate::mount::{baremount, FLAGS};
use crate::mount::{BareMount, FLAGS};
use slog::Logger;
const PERSISTENT_NS_DIR: &str = "/var/run/sandbox-ns";
@@ -129,7 +129,8 @@ impl Namespace {
}
};
baremount(source, destination, "none", flags, "", &logger).map_err(|e| {
let bare_mount = BareMount::new(source, destination, "none", flags, "", &logger);
bare_mount.mount().map_err(|e| {
anyhow!(
"Failed to mount {} to {} with err:{:?}",
source,

View File

@@ -6,7 +6,6 @@
use anyhow::{anyhow, Context, Result};
use futures::{future, StreamExt, TryStreamExt};
use ipnetwork::{IpNetwork, Ipv4Network, Ipv6Network};
use nix::errno::Errno;
use protobuf::RepeatedField;
use protocols::types::{ARPNeighbor, IPAddress, IPFamily, Interface, Route};
use rtnetlink::{new_connection, packet, IpVersion};
@@ -313,6 +312,7 @@ impl Handle {
for route in list {
let link = self.find_link(LinkFilter::Name(&route.device)).await?;
let is_v6 = is_ipv6(route.get_gateway()) || is_ipv6(route.get_dest());
const MAIN_TABLE: u8 = packet::constants::RT_TABLE_MAIN;
const UNICAST: u8 = packet::constants::RTN_UNICAST;
@@ -334,7 +334,7 @@ impl Handle {
// `rtnetlink` offers a separate request builders for different IP versions (IP v4 and v6).
// This if branch is a bit clumsy because it does almost the same.
if route.get_family() == IPFamily::v6 {
if is_v6 {
let dest_addr = if !route.dest.is_empty() {
Ipv6Network::from_str(&route.dest)?
} else {
@@ -364,17 +364,14 @@ impl Handle {
request = request.gateway(ip);
}
if let Err(rtnetlink::Error::NetlinkError(message)) = request.execute().await {
if Errno::from_i32(message.code.abs()) != Errno::EEXIST {
return Err(anyhow!(
"Failed to add IP v6 route (src: {}, dst: {}, gtw: {},Err: {})",
route.get_source(),
route.get_dest(),
route.get_gateway(),
message
));
}
}
request.execute().await.with_context(|| {
format!(
"Failed to add IP v6 route (src: {}, dst: {}, gtw: {})",
route.get_source(),
route.get_dest(),
route.get_gateway()
)
})?;
} else {
let dest_addr = if !route.dest.is_empty() {
Ipv4Network::from_str(&route.dest)?
@@ -405,17 +402,7 @@ impl Handle {
request = request.gateway(ip);
}
if let Err(rtnetlink::Error::NetlinkError(message)) = request.execute().await {
if Errno::from_i32(message.code.abs()) != Errno::EEXIST {
return Err(anyhow!(
"Failed to add IP v4 route (src: {}, dst: {}, gtw: {},Err: {})",
route.get_source(),
route.get_dest(),
route.get_gateway(),
message
));
}
}
request.execute().await?;
}
}
@@ -607,6 +594,10 @@ fn format_address(data: &[u8]) -> Result<String> {
}
}
fn is_ipv6(str: &str) -> bool {
Ipv6Addr::from_str(str).is_ok()
}
fn parse_mac_address(addr: &str) -> Result<[u8; 6]> {
let mut split = addr.splitn(6, ':');
@@ -941,6 +932,16 @@ mod tests {
assert_eq!(bytes, [0xAB, 0x0C, 0xDE, 0x12, 0x34, 0x56]);
}
#[test]
fn check_ipv6() {
assert!(is_ipv6("::1"));
assert!(is_ipv6("2001:0:3238:DFE1:63::FEFB"));
assert!(!is_ipv6(""));
assert!(!is_ipv6("127.0.0.1"));
assert!(!is_ipv6("10.10.10.10"));
}
fn clean_env_for_test_add_one_arp_neighbor(dummy_name: &str, ip: &str) {
// ip link delete dummy
Command::new("ip")

View File

@@ -9,143 +9,51 @@ use std::str::FromStr;
use anyhow::anyhow;
// The PCI spec reserves 5 bits (0..31) for slot number (a.k.a. device
// number)
// The PCI spec reserves 5 bits for slot number (a.k.a. device
// number), giving slots 0..31
const SLOT_BITS: u8 = 5;
const SLOT_MAX: u8 = (1 << SLOT_BITS) - 1;
// The PCI spec reserves 3 bits (0..7) for function number
const FUNCTION_BITS: u8 = 3;
const FUNCTION_MAX: u8 = (1 << FUNCTION_BITS) - 1;
// Represents a PCI function's slot (a.k.a. device) and function
// numbers, giving its location on a single logical bus
// Represents a PCI function's slot number (a.k.a. device number),
// giving its location on a single bus
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SlotFn(u8);
pub struct Slot(u8);
impl SlotFn {
pub fn new<T, U>(ss: T, f: U) -> anyhow::Result<Self>
where
T: TryInto<u8> + fmt::Display + Copy,
U: TryInto<u8> + fmt::Display + Copy,
{
let ss8 = match ss.try_into() {
Ok(ss8) if ss8 <= SLOT_MAX => ss8,
_ => {
return Err(anyhow!(
"PCI slot {} should be in range [0..{:#x}]",
ss,
SLOT_MAX
));
impl Slot {
pub fn new<T: TryInto<u8> + fmt::Display + Copy>(v: T) -> anyhow::Result<Self> {
if let Ok(v8) = v.try_into() {
if v8 <= SLOT_MAX {
return Ok(Slot(v8));
}
};
let f8 = match f.try_into() {
Ok(f8) if f8 <= FUNCTION_MAX => f8,
_ => {
return Err(anyhow!(
"PCI function {} should be in range [0..{:#x}]",
f,
FUNCTION_MAX
));
}
};
Ok(SlotFn(ss8 << FUNCTION_BITS | f8))
}
pub fn slot(self) -> u8 {
self.0 >> FUNCTION_BITS
}
pub fn function(self) -> u8 {
self.0 & FUNCTION_MAX
}
Err(anyhow!(
"PCI slot {} should be in range [0..{:#x}]",
v,
SLOT_MAX
))
}
}
impl FromStr for SlotFn {
impl FromStr for Slot {
type Err = anyhow::Error;
fn from_str(s: &str) -> anyhow::Result<Self> {
let mut tokens = s.split('.').fuse();
let slot = tokens.next();
let func = tokens.next();
if slot.is_none() || tokens.next().is_some() {
return Err(anyhow!(
"PCI slot/function {} should have the format SS.F",
s
));
}
let slot = isize::from_str_radix(slot.unwrap(), 16)?;
let func = match func {
Some(func) => isize::from_str_radix(func, 16)?,
None => 0,
};
SlotFn::new(slot, func)
let v = isize::from_str_radix(s, 16)?;
Slot::new(v)
}
}
impl fmt::Display for SlotFn {
impl fmt::Display for Slot {
fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
write!(f, "{:02x}.{:01x}", self.slot(), self.function())
}
}
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address {
domain: u16,
bus: u8,
slotfn: SlotFn,
}
impl Address {
pub fn new(domain: u16, bus: u8, slotfn: SlotFn) -> Self {
Address {
domain,
bus,
slotfn,
}
}
}
impl FromStr for Address {
type Err = anyhow::Error;
fn from_str(s: &str) -> anyhow::Result<Self> {
let mut tokens = s.split(':').fuse();
let domain = tokens.next();
let bus = tokens.next();
let slotfn = tokens.next();
if domain.is_none() || bus.is_none() || slotfn.is_none() || tokens.next().is_some() {
return Err(anyhow!(
"PCI address {} should have the format DDDD:BB:SS.F",
s
));
}
let domain = u16::from_str_radix(domain.unwrap(), 16)?;
let bus = u8::from_str_radix(bus.unwrap(), 16)?;
let slotfn = SlotFn::from_str(slotfn.unwrap())?;
Ok(Address::new(domain, bus, slotfn))
}
}
impl fmt::Display for Address {
fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
write!(f, "{:04x}:{:02x}:{}", self.domain, self.bus, self.slotfn)
write!(f, "{:02x}", self.0)
}
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Path(Vec<SlotFn>);
pub struct Path(Vec<Slot>);
impl Path {
pub fn new(slots: Vec<SlotFn>) -> anyhow::Result<Self> {
pub fn new(slots: Vec<Slot>) -> anyhow::Result<Self> {
if slots.is_empty() {
return Err(anyhow!("PCI path must have at least one element"));
}
@@ -155,7 +63,7 @@ impl Path {
// Let Path be treated as a slice of Slots
impl Deref for Path {
type Target = [SlotFn];
type Target = [Slot];
fn deref(&self) -> &Self::Target {
&self.0
@@ -177,170 +85,83 @@ impl FromStr for Path {
type Err = anyhow::Error;
fn from_str(s: &str) -> anyhow::Result<Self> {
let rslots: anyhow::Result<Vec<SlotFn>> = s.split('/').map(SlotFn::from_str).collect();
let rslots: anyhow::Result<Vec<Slot>> = s.split('/').map(Slot::from_str).collect();
Path::new(rslots?)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::pci::{Path, Slot};
use std::str::FromStr;
#[test]
fn test_slotfn() {
fn test_slot() {
// Valid slots
let sf = SlotFn::new(0x00, 0x0).unwrap();
assert_eq!(format!("{}", sf), "00.0");
let slot = Slot::new(0x00).unwrap();
assert_eq!(format!("{}", slot), "00");
let sf = SlotFn::from_str("00.0").unwrap();
assert_eq!(format!("{}", sf), "00.0");
let slot = Slot::from_str("00").unwrap();
assert_eq!(format!("{}", slot), "00");
let sf = SlotFn::from_str("00").unwrap();
assert_eq!(format!("{}", sf), "00.0");
let sf = SlotFn::new(31, 7).unwrap();
let sf2 = SlotFn::from_str("1f.7").unwrap();
assert_eq!(sf, sf2);
let slot = Slot::new(31).unwrap();
let slot2 = Slot::from_str("1f").unwrap();
assert_eq!(slot, slot2);
// Bad slots
let sf = SlotFn::new(-1, 0);
assert!(sf.is_err());
let slot = Slot::new(-1);
assert!(slot.is_err());
let sf = SlotFn::new(32, 0);
assert!(sf.is_err());
let slot = Slot::new(32);
assert!(slot.is_err());
let sf = SlotFn::from_str("20.0");
assert!(sf.is_err());
let slot = Slot::from_str("20");
assert!(slot.is_err());
let sf = SlotFn::from_str("20");
assert!(sf.is_err());
let slot = Slot::from_str("xy");
assert!(slot.is_err());
let sf = SlotFn::from_str("xy.0");
assert!(sf.is_err());
let slot = Slot::from_str("00/");
assert!(slot.is_err());
let sf = SlotFn::from_str("xy");
assert!(sf.is_err());
// Bad functions
let sf = SlotFn::new(0, -1);
assert!(sf.is_err());
let sf = SlotFn::new(0, 8);
assert!(sf.is_err());
let sf = SlotFn::from_str("00.8");
assert!(sf.is_err());
let sf = SlotFn::from_str("00.x");
assert!(sf.is_err());
// Bad formats
let sf = SlotFn::from_str("");
assert!(sf.is_err());
let sf = SlotFn::from_str("00.0.0");
assert!(sf.is_err());
let sf = SlotFn::from_str("00.0/");
assert!(sf.is_err());
let sf = SlotFn::from_str("00/");
assert!(sf.is_err());
}
#[test]
fn test_address() {
// Valid addresses
let sf0_0 = SlotFn::new(0, 0).unwrap();
let sf1f_7 = SlotFn::new(0x1f, 7).unwrap();
let addr = Address::new(0, 0, sf0_0);
assert_eq!(format!("{}", addr), "0000:00:00.0");
let addr2 = Address::from_str("0000:00:00.0").unwrap();
assert_eq!(addr, addr2);
let addr = Address::new(0xffff, 0xff, sf1f_7);
assert_eq!(format!("{}", addr), "ffff:ff:1f.7");
let addr2 = Address::from_str("ffff:ff:1f.7").unwrap();
assert_eq!(addr, addr2);
// Bad addresses
let addr = Address::from_str("10000:00:00.0");
assert!(addr.is_err());
let addr = Address::from_str("0000:100:00.0");
assert!(addr.is_err());
let addr = Address::from_str("0000:00:20.0");
assert!(addr.is_err());
let addr = Address::from_str("0000:00:00.8");
assert!(addr.is_err());
let addr = Address::from_str("xyz");
assert!(addr.is_err());
let addr = Address::from_str("xyxy:xy:xy.z");
assert!(addr.is_err());
let addr = Address::from_str("0000:00:00.0:00");
assert!(addr.is_err());
let slot = Slot::from_str("");
assert!(slot.is_err());
}
#[test]
fn test_path() {
let sf3_0 = SlotFn::new(0x03, 0).unwrap();
let sf4_0 = SlotFn::new(0x04, 0).unwrap();
let sf5_0 = SlotFn::new(0x05, 0).unwrap();
let sfa_5 = SlotFn::new(0x0a, 5).unwrap();
let sfb_6 = SlotFn::new(0x0b, 6).unwrap();
let sfc_7 = SlotFn::new(0x0c, 7).unwrap();
let slot3 = Slot::new(0x03).unwrap();
let slot4 = Slot::new(0x04).unwrap();
let slot5 = Slot::new(0x05).unwrap();
// Valid paths
let pcipath = Path::new(vec![sf3_0]).unwrap();
assert_eq!(format!("{}", pcipath), "03.0");
let pcipath2 = Path::from_str("03.0").unwrap();
assert_eq!(pcipath, pcipath2);
let pcipath = Path::new(vec![slot3]).unwrap();
assert_eq!(format!("{}", pcipath), "03");
let pcipath2 = Path::from_str("03").unwrap();
assert_eq!(pcipath, pcipath2);
assert_eq!(pcipath.len(), 1);
assert_eq!(pcipath[0], sf3_0);
assert_eq!(pcipath[0], slot3);
let pcipath = Path::new(vec![sf3_0, sf4_0]).unwrap();
assert_eq!(format!("{}", pcipath), "03.0/04.0");
let pcipath2 = Path::from_str("03.0/04.0").unwrap();
assert_eq!(pcipath, pcipath2);
let pcipath = Path::new(vec![slot3, slot4]).unwrap();
assert_eq!(format!("{}", pcipath), "03/04");
let pcipath2 = Path::from_str("03/04").unwrap();
assert_eq!(pcipath, pcipath2);
assert_eq!(pcipath.len(), 2);
assert_eq!(pcipath[0], sf3_0);
assert_eq!(pcipath[1], sf4_0);
assert_eq!(pcipath[0], slot3);
assert_eq!(pcipath[1], slot4);
let pcipath = Path::new(vec![sf3_0, sf4_0, sf5_0]).unwrap();
assert_eq!(format!("{}", pcipath), "03.0/04.0/05.0");
let pcipath2 = Path::from_str("03.0/04.0/05.0").unwrap();
assert_eq!(pcipath, pcipath2);
let pcipath = Path::new(vec![slot3, slot4, slot5]).unwrap();
assert_eq!(format!("{}", pcipath), "03/04/05");
let pcipath2 = Path::from_str("03/04/05").unwrap();
assert_eq!(pcipath, pcipath2);
assert_eq!(pcipath.len(), 3);
assert_eq!(pcipath[0], sf3_0);
assert_eq!(pcipath[1], sf4_0);
assert_eq!(pcipath[2], sf5_0);
let pcipath = Path::new(vec![sfa_5, sfb_6, sfc_7]).unwrap();
assert_eq!(format!("{}", pcipath), "0a.5/0b.6/0c.7");
let pcipath2 = Path::from_str("0a.5/0b.6/0c.7").unwrap();
assert_eq!(pcipath, pcipath2);
assert_eq!(pcipath.len(), 3);
assert_eq!(pcipath[0], sfa_5);
assert_eq!(pcipath[1], sfb_6);
assert_eq!(pcipath[2], sfc_7);
assert_eq!(pcipath[0], slot3);
assert_eq!(pcipath[1], slot4);
assert_eq!(pcipath[2], slot5);
// Bad paths
assert!(Path::new(vec!()).is_err());
assert!(Path::from_str("20").is_err());
assert!(Path::from_str("00.8").is_err());
assert!(Path::from_str("//").is_err());
assert!(Path::from_str("xyz").is_err());
}

View File

@@ -3,6 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
use crate::pci;
use async_trait::async_trait;
use rustjail::{pipestream::PipeStream, process::StreamType};
use tokio::io::{AsyncReadExt, AsyncWriteExt, ReadHalf};
@@ -20,7 +21,7 @@ use ttrpc::{
use anyhow::{anyhow, Context, Result};
use oci::{LinuxNamespace, Root, Spec};
use protobuf::{Message, RepeatedField, SingularPtrField};
use protobuf::{RepeatedField, SingularPtrField};
use protocols::agent::{
AddSwapRequest, AgentDetails, CopyFileRequest, GuestDetailsResponse, Interfaces, Metrics,
OOMEvent, ReadStreamResponse, Routes, StatsContainerResponse, WaitProcessResponse,
@@ -43,13 +44,12 @@ use nix::sys::stat;
use nix::unistd::{self, Pid};
use rustjail::process::ProcessOperations;
use crate::device::{add_devices, get_virtio_blk_pci_device_name, update_device_cgroup};
use crate::device::{add_devices, pcipath_to_sysfs, rescan_pci_bus, update_device_cgroup};
use crate::linux_abi::*;
use crate::metrics::get_metrics;
use crate::mount::{add_storages, baremount, remove_mounts, STORAGE_HANDLER_LIST};
use crate::mount::{add_storages, remove_mounts, BareMount, STORAGE_HANDLER_LIST};
use crate::namespace::{NSTYPEIPC, NSTYPEPID, NSTYPEUTS};
use crate::network::setup_guest_dns;
use crate::pci;
use crate::random;
use crate::sandbox::Sandbox;
use crate::version::{AGENT_VERSION, API_VERSION};
@@ -86,21 +86,6 @@ macro_rules! sl {
};
}
macro_rules! is_allowed {
($req:ident) => {
if !AGENT_CONFIG
.read()
.await
.is_allowed_endpoint($req.descriptor().name())
{
return Err(ttrpc_error(
ttrpc::Code::UNIMPLEMENTED,
format!("{} is blocked", $req.descriptor().name()),
));
}
};
}
#[derive(Clone, Debug)]
pub struct AgentService {
sandbox: Arc<Mutex<Sandbox>>,
@@ -149,6 +134,10 @@ impl AgentService {
info!(sl!(), "receive createcontainer, spec: {:?}", &oci);
// re-scan PCI bus
// looking for hidden devices
rescan_pci_bus().context("Could not rescan PCI bus")?;
// Some devices need some extra processing (the ones invoked with
// --device for instance), and that's what this call is doing. It
// updates the devices listed in the OCI spec, so that they actually
@@ -433,7 +422,7 @@ impl AgentService {
.get_container(&cid)
.ok_or_else(|| anyhow!("Invalid container id"))?;
let p = match ctr.processes.get_mut(&pid) {
let mut p = match ctr.processes.get_mut(&pid) {
Some(p) => p,
None => {
// Lost race, pick up exit code from channel
@@ -444,7 +433,7 @@ impl AgentService {
// need to close all fd
// ignore errors for some fd might be closed by stream
p.cleanup_process_stream();
let _ = cleanup_process(&mut p);
resp.status = p.exit_code;
// broadcast exit code to all parallel watchers
@@ -546,7 +535,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::CreateContainerRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "create_container", req);
is_allowed!(req);
match self.do_create_container(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
@@ -559,7 +547,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::StartContainerRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "start_container", req);
is_allowed!(req);
match self.do_start_container(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
@@ -572,7 +559,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::RemoveContainerRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "remove_container", req);
is_allowed!(req);
match self.do_remove_container(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
@@ -585,7 +571,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::ExecProcessRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "exec_process", req);
is_allowed!(req);
match self.do_exec_process(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
@@ -598,7 +583,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::SignalProcessRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "signal_process", req);
is_allowed!(req);
match self.do_signal_process(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
@@ -611,7 +595,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::WaitProcessRequest,
) -> ttrpc::Result<WaitProcessResponse> {
trace_rpc_call!(ctx, "wait_process", req);
is_allowed!(req);
self.do_wait_process(req)
.await
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))
@@ -623,7 +606,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::UpdateContainerRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "update_container", req);
is_allowed!(req);
let cid = req.container_id.clone();
let res = req.resources;
@@ -659,7 +641,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::StatsContainerRequest,
) -> ttrpc::Result<StatsContainerResponse> {
trace_rpc_call!(ctx, "stats_container", req);
is_allowed!(req);
let cid = req.container_id;
let s = Arc::clone(&self.sandbox);
let mut sandbox = s.lock().await;
@@ -681,7 +662,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::PauseContainerRequest,
) -> ttrpc::Result<protocols::empty::Empty> {
trace_rpc_call!(ctx, "pause_container", req);
is_allowed!(req);
let cid = req.get_container_id();
let s = Arc::clone(&self.sandbox);
let mut sandbox = s.lock().await;
@@ -705,7 +685,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::ResumeContainerRequest,
) -> ttrpc::Result<protocols::empty::Empty> {
trace_rpc_call!(ctx, "resume_container", req);
is_allowed!(req);
let cid = req.get_container_id();
let s = Arc::clone(&self.sandbox);
let mut sandbox = s.lock().await;
@@ -728,7 +707,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
_ctx: &TtrpcContext,
req: protocols::agent::WriteStreamRequest,
) -> ttrpc::Result<WriteStreamResponse> {
is_allowed!(req);
self.do_write_stream(req)
.await
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))
@@ -739,7 +717,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
_ctx: &TtrpcContext,
req: protocols::agent::ReadStreamRequest,
) -> ttrpc::Result<ReadStreamResponse> {
is_allowed!(req);
self.do_read_stream(req, true)
.await
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))
@@ -750,7 +727,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
_ctx: &TtrpcContext,
req: protocols::agent::ReadStreamRequest,
) -> ttrpc::Result<ReadStreamResponse> {
is_allowed!(req);
self.do_read_stream(req, false)
.await
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))
@@ -762,7 +738,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::CloseStdinRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "close_stdin", req);
is_allowed!(req);
let cid = req.container_id.clone();
let eid = req.exec_id;
@@ -776,7 +751,19 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
)
})?;
p.close_stdin();
if p.term_master.is_some() {
p.close_stream(StreamType::TermMaster);
let _ = unistd::close(p.term_master.unwrap());
p.term_master = None;
}
if p.parent_stdin.is_some() {
p.close_stream(StreamType::ParentStdin);
let _ = unistd::close(p.parent_stdin.unwrap());
p.parent_stdin = None;
}
p.notify_term_close();
Ok(Empty::new())
}
@@ -787,7 +774,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::TtyWinResizeRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "tty_win_resize", req);
is_allowed!(req);
let cid = req.container_id.clone();
let eid = req.exec_id.clone();
@@ -828,7 +814,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::UpdateInterfaceRequest,
) -> ttrpc::Result<Interface> {
trace_rpc_call!(ctx, "update_interface", req);
is_allowed!(req);
let interface = req.interface.into_option().ok_or_else(|| {
ttrpc_error(
@@ -856,7 +841,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::UpdateRoutesRequest,
) -> ttrpc::Result<Routes> {
trace_rpc_call!(ctx, "update_routes", req);
is_allowed!(req);
let new_routes = req
.routes
@@ -897,7 +881,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::ListInterfacesRequest,
) -> ttrpc::Result<Interfaces> {
trace_rpc_call!(ctx, "list_interfaces", req);
is_allowed!(req);
let list = self
.sandbox
@@ -925,7 +908,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::ListRoutesRequest,
) -> ttrpc::Result<Routes> {
trace_rpc_call!(ctx, "list_routes", req);
is_allowed!(req);
let list = self
.sandbox
@@ -948,16 +930,14 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::StartTracingRequest,
) -> ttrpc::Result<Empty> {
info!(sl!(), "start_tracing {:?}", req);
is_allowed!(req);
Ok(Empty::new())
}
async fn stop_tracing(
&self,
_ctx: &TtrpcContext,
req: protocols::agent::StopTracingRequest,
_req: protocols::agent::StopTracingRequest,
) -> ttrpc::Result<Empty> {
is_allowed!(req);
Ok(Empty::new())
}
@@ -967,7 +947,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::CreateSandboxRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "create_sandbox", req);
is_allowed!(req);
{
let sandbox = self.sandbox.clone();
@@ -1033,7 +1012,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::DestroySandboxRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "destroy_sandbox", req);
is_allowed!(req);
let s = Arc::clone(&self.sandbox);
let mut sandbox = s.lock().await;
@@ -1055,7 +1033,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::AddARPNeighborsRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "add_arp_neighbors", req);
is_allowed!(req);
let neighs = req
.neighbors
@@ -1089,7 +1066,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
ctx: &TtrpcContext,
req: protocols::agent::OnlineCPUMemRequest,
) -> ttrpc::Result<Empty> {
is_allowed!(req);
let s = Arc::clone(&self.sandbox);
let sandbox = s.lock().await;
trace_rpc_call!(ctx, "online_cpu_mem", req);
@@ -1107,7 +1083,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::ReseedRandomDevRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "reseed_random_dev", req);
is_allowed!(req);
random::reseed_rng(req.data.as_slice())
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))?;
@@ -1121,7 +1096,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::GuestDetailsRequest,
) -> ttrpc::Result<GuestDetailsResponse> {
trace_rpc_call!(ctx, "get_guest_details", req);
is_allowed!(req);
info!(sl!(), "get guest details!");
let mut resp = GuestDetailsResponse::new();
@@ -1150,7 +1124,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::MemHotplugByProbeRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "mem_hotplug_by_probe", req);
is_allowed!(req);
do_mem_hotplug_by_probe(&req.memHotplugProbeAddr)
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))?;
@@ -1164,7 +1137,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::SetGuestDateTimeRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "set_guest_date_time", req);
is_allowed!(req);
do_set_guest_date_time(req.Sec, req.Usec)
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))?;
@@ -1178,7 +1150,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::CopyFileRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "copy_file", req);
is_allowed!(req);
do_copy_file(&req).map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))?;
@@ -1191,7 +1162,6 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::GetMetricsRequest,
) -> ttrpc::Result<Metrics> {
trace_rpc_call!(ctx, "get_metrics", req);
is_allowed!(req);
match get_metrics(&req) {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
@@ -1206,9 +1176,8 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
async fn get_oom_event(
&self,
_ctx: &TtrpcContext,
req: protocols::agent::GetOOMEventRequest,
_req: protocols::agent::GetOOMEventRequest,
) -> ttrpc::Result<OOMEvent> {
is_allowed!(req);
let sandbox = self.sandbox.clone();
let s = sandbox.lock().await;
let event_rx = &s.event_rx.clone();
@@ -1234,11 +1203,8 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
req: protocols::agent::AddSwapRequest,
) -> ttrpc::Result<Empty> {
trace_rpc_call!(ctx, "add_swap", req);
is_allowed!(req);
do_add_swap(&self.sandbox, &req)
.await
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))?;
do_add_swap(&req).map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e.to_string()))?;
Ok(Empty::new())
}
@@ -1322,19 +1288,11 @@ fn get_memory_info(block_size: bool, hotplug: bool) -> Result<(u64, bool)> {
Ok((size, plug))
}
pub fn have_seccomp() -> bool {
if cfg!(feature = "seccomp") {
return true;
}
false
}
fn get_agent_details() -> AgentDetails {
let mut detail = AgentDetails::new();
detail.set_version(AGENT_VERSION.to_string());
detail.set_supports_seccomp(have_seccomp());
detail.set_supports_seccomp(false);
detail.init_daemon = unistd::getpid() == Pid::from_raw(1);
detail.device_handlers = RepeatedField::new();
@@ -1599,13 +1557,43 @@ fn do_copy_file(req: &CopyFileRequest) -> Result<()> {
Ok(())
}
async fn do_add_swap(sandbox: &Arc<Mutex<Sandbox>>, req: &AddSwapRequest) -> Result<()> {
pub fn path_name_lookup<P: std::clone::Clone + AsRef<Path> + std::fmt::Debug>(
path: P,
lookup: &str,
) -> Result<(PathBuf, String)> {
for entry in fs::read_dir(path.clone())? {
let entry = entry?;
if let Some(name) = entry.path().file_name() {
if let Some(name) = name.to_str() {
if Some(0) == name.find(lookup) {
return Ok((entry.path(), name.to_string()));
}
}
}
}
Err(anyhow!("cannot get {} dir in {:?}", lookup, path))
}
fn do_add_swap(req: &AddSwapRequest) -> Result<()> {
// re-scan PCI bus
// looking for hidden devices
rescan_pci_bus().context("Could not rescan PCI bus")?;
let mut slots = Vec::new();
for slot in &req.PCIPath {
slots.push(pci::SlotFn::new(*slot, 0)?);
slots.push(pci::Slot::new(*slot as u8)?);
}
let pcipath = pci::Path::new(slots)?;
let dev_name = get_virtio_blk_pci_device_name(sandbox, &pcipath).await?;
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path());
let sysfs_rel_path = format!(
"{}{}",
root_bus_sysfs,
pcipath_to_sysfs(&root_bus_sysfs, &pcipath)?
);
let (mut virtio_path, _) = path_name_lookup(sysfs_rel_path, "virtio")?;
virtio_path.push("block");
let (_, dev_name) = path_name_lookup(virtio_path, "vd")?;
let dev_name = format!("/dev/{}", dev_name);
let c_str = CString::new(dev_name)?;
let ret = unsafe { libc::swapon(c_str.as_ptr() as *const c_char, 0) };
@@ -1636,19 +1624,25 @@ fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
let rootfs_path = bundle_path.join("rootfs");
fs::create_dir_all(&rootfs_path)?;
baremount(
BareMount::new(
&spec_root.path,
rootfs_path.to_str().unwrap(),
"bind",
MsFlags::MS_BIND,
"",
&sl!(),
)?;
)
.mount()?;
spec.root = Some(Root {
path: rootfs_path.to_str().unwrap().to_owned(),
readonly: spec_root.readonly,
});
info!(
sl!(),
"{:?}",
spec.process.as_ref().unwrap().console_size.as_ref()
);
let _ = spec.save(config_path.to_str().unwrap());
let olddir = unistd::getcwd().context("cannot getcwd")?;
@@ -1657,6 +1651,37 @@ fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
Ok(olddir)
}
fn cleanup_process(p: &mut Process) -> Result<()> {
if p.parent_stdin.is_some() {
p.close_stream(StreamType::ParentStdin);
unistd::close(p.parent_stdin.unwrap())?;
}
if p.parent_stdout.is_some() {
p.close_stream(StreamType::ParentStdout);
unistd::close(p.parent_stdout.unwrap())?;
}
if p.parent_stderr.is_some() {
p.close_stream(StreamType::ParentStderr);
unistd::close(p.parent_stderr.unwrap())?;
}
if p.term_master.is_some() {
p.close_stream(StreamType::TermMaster);
unistd::close(p.term_master.unwrap())?;
}
p.notify_term_close();
p.parent_stdin = None;
p.parent_stdout = None;
p.parent_stderr = None;
p.term_master = None;
Ok(())
}
fn load_kernel_module(module: &protocols::agent::KernelModule) -> Result<()> {
if module.name.is_empty() {
return Err(anyhow!("Kernel module name is empty"));

View File

@@ -449,7 +449,7 @@ fn online_memory(logger: &Logger) -> Result<()> {
#[cfg(test)]
mod tests {
use super::Sandbox;
use crate::{mount::baremount, skip_if_not_root};
use crate::{mount::BareMount, skip_if_not_root};
use anyhow::Error;
use nix::mount::MsFlags;
use oci::{Linux, Root, Spec};
@@ -461,13 +461,11 @@ mod tests {
use tempfile::Builder;
fn bind_mount(src: &str, dst: &str, logger: &Logger) -> Result<(), Error> {
baremount(src, dst, "bind", MsFlags::MS_BIND, "", logger)
let baremount = BareMount::new(src, dst, "bind", MsFlags::MS_BIND, "", logger);
baremount.mount()
}
use serial_test::serial;
#[tokio::test]
#[serial]
async fn set_sandbox_storage() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -502,7 +500,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn remove_sandbox_storage() {
skip_if_not_root!();
@@ -559,7 +556,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn unset_and_remove_sandbox_storage() {
skip_if_not_root!();
@@ -611,7 +607,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn unset_sandbox_storage() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -695,7 +690,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn get_container_entry_exist() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
@@ -709,7 +703,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn get_container_no_entry() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -719,7 +712,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn add_and_get_container() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
@@ -731,7 +723,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn update_shared_pidns() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
@@ -750,7 +741,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn add_guest_hooks() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -774,7 +764,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn test_sandbox_set_destroy() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();

View File

@@ -3,17 +3,60 @@
// SPDX-License-Identifier: Apache-2.0
//
use crate::config::AgentConfig;
use anyhow::Result;
use opentelemetry::sdk::propagation::TraceContextPropagator;
use opentelemetry::{global, sdk::trace::Config, trace::TracerProvider};
use slog::{info, o, Logger};
use std::collections::HashMap;
use std::error::Error;
use std::fmt;
use std::str::FromStr;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;
use ttrpc::r#async::TtrpcContext;
pub fn setup_tracing(name: &'static str, logger: &Logger) -> Result<()> {
#[derive(Debug, PartialEq)]
pub enum TraceType {
Disabled,
Isolated,
}
#[derive(Debug)]
pub struct TraceTypeError {
details: String,
}
impl TraceTypeError {
fn new(msg: &str) -> TraceTypeError {
TraceTypeError {
details: msg.into(),
}
}
}
impl Error for TraceTypeError {}
impl fmt::Display for TraceTypeError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", self.details)
}
}
impl FromStr for TraceType {
type Err = TraceTypeError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s {
"isolated" => Ok(TraceType::Isolated),
"disabled" => Ok(TraceType::Disabled),
_ => Err(TraceTypeError::new("invalid trace type")),
}
}
}
pub fn setup_tracing(name: &'static str, logger: &Logger, _agent_cfg: &AgentConfig) -> Result<()> {
let logger = logger.new(o!("subsystem" => "vsock-tracer"));
let exporter = vsock_exporter::Exporter::builder()

View File

@@ -97,18 +97,10 @@ impl Uevent {
})
}
#[instrument]
async fn process_remove(&self, logger: &Logger, sandbox: &Arc<Mutex<Sandbox>>) {
let mut sb = sandbox.lock().await;
sb.uevent_map.remove(&self.devpath);
}
#[instrument]
async fn process(&self, logger: &Logger, sandbox: &Arc<Mutex<Sandbox>>) {
if self.action == U_EVENT_ACTION_ADD {
return self.process_add(logger, sandbox).await;
} else if self.action == U_EVENT_ACTION_REMOVE {
return self.process_remove(logger, sandbox).await;
}
debug!(*logger, "ignoring event"; "uevent" => format!("{:?}", self));
}
@@ -119,13 +111,10 @@ pub async fn wait_for_uevent(
sandbox: &Arc<Mutex<Sandbox>>,
matcher: impl UeventMatcher,
) -> Result<Uevent> {
let logprefix = format!("Waiting for {:?}", &matcher);
info!(sl!(), "{}", logprefix);
let mut sb = sandbox.lock().await;
for uev in sb.uevent_map.values() {
if matcher.is_match(uev) {
info!(sl!(), "{}: found {:?} in uevent map", logprefix, &uev);
info!(sl!(), "Device {:?} found in device map", uev);
return Ok(uev.clone());
}
}
@@ -140,8 +129,7 @@ pub async fn wait_for_uevent(
sb.uevent_watchers.push(Some((Box::new(matcher), tx)));
drop(sb); // unlock
info!(sl!(), "{}: waiting on channel", logprefix);
info!(sl!(), "Waiting on channel for uevent notification\n");
let hotplug_timeout = AGENT_CONFIG.read().await.hotplug_timeout;
let uev = match tokio::time::timeout(hotplug_timeout, rx).await {
@@ -158,7 +146,6 @@ pub async fn wait_for_uevent(
}
};
info!(sl!(), "{}: found {:?} on channel", logprefix, &uev);
Ok(uev)
}

View File

@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
#![allow(unknown_lints)]
#![allow(clippy::unknown_clippy_lints)]
use std::collections::HashMap;
use std::path::{Path, PathBuf};
@@ -20,7 +20,7 @@ use tokio::sync::Mutex;
use tokio::task;
use tokio::time::{self, Duration};
use crate::mount::baremount;
use crate::mount::BareMount;
use crate::protocols::agent as protos;
/// The maximum number of file system entries agent will watch for each mount.
@@ -49,7 +49,7 @@ struct Storage {
/// the source becomes too large, either in number of files (>16) or total size (>1MB).
watch: bool,
/// The list of files, directories, symlinks to watch from the source mount point and updated in the target one.
/// The list of files to watch from the source mount point and updated in the target one.
watched_files: HashMap<PathBuf, SystemTime>,
}
@@ -79,20 +79,6 @@ impl Drop for Storage {
}
}
async fn copy(from: impl AsRef<Path>, to: impl AsRef<Path>) -> Result<()> {
if fs::symlink_metadata(&from).await?.file_type().is_symlink() {
// if source is a symlink, create new symlink with same link source. If
// the symlink exists, remove and create new one:
if fs::symlink_metadata(&to).await.is_ok() {
fs::remove_file(&to).await?;
}
fs::symlink(fs::read_link(&from).await?, &to).await?;
} else {
fs::copy(from, to).await?;
}
Ok(())
}
impl Storage {
async fn new(storage: protos::Storage) -> Result<Storage> {
let entry = Storage {
@@ -107,16 +93,6 @@ impl Storage {
async fn update_target(&self, logger: &Logger, source_path: impl AsRef<Path>) -> Result<()> {
let source_file_path = source_path.as_ref();
// if we are creating a directory: just create it, nothing more to do
if source_file_path.symlink_metadata()?.file_type().is_dir() {
fs::create_dir_all(source_file_path)
.await
.with_context(|| {
format!("Unable to mkdir all for {}", source_file_path.display())
})?
}
// Assume we are dealing with either a file or a symlink now:
let dest_file_path = if self.source_mount_point.is_file() {
// Simple file to file copy
// Assume target mount is a file path
@@ -134,13 +110,19 @@ impl Storage {
dest_file_path
};
copy(&source_file_path, &dest_file_path)
debug!(
logger,
"Copy from {} to {}",
source_file_path.display(),
dest_file_path.display()
);
fs::copy(&source_file_path, &dest_file_path)
.await
.with_context(|| {
format!(
"Copy from {} to {} failed",
source_file_path.display(),
dest_file_path.display(),
dest_file_path.display()
)
})?;
@@ -153,7 +135,7 @@ impl Storage {
let mut remove_list = Vec::new();
let mut updated_files: Vec<PathBuf> = Vec::new();
// Remove deleted files for tracking list.
// Remove deleted files for tracking list
self.watched_files.retain(|st, _| {
if st.exists() {
true
@@ -165,19 +147,10 @@ impl Storage {
// Delete from target
for path in remove_list {
// File has been deleted, remove it from target mount
let target = self.make_target_path(path)?;
// The target may be a directory or a file. If it is a directory that is removed,
// we'll remove all files under that directory as well. Because of this, there's a
// chance the target (a subdirectory or file under a prior removed target) was already
// removed. Make sure we check if the target exists before checking the metadata, and
// don't return an error if the remove fails
if target.exists() && target.symlink_metadata()?.file_type().is_dir() {
debug!(logger, "Removing a directory: {}", target.display());
let _ = fs::remove_dir_all(target).await;
} else {
debug!(logger, "Removing a file: {}", target.display());
let _ = fs::remove_file(target).await;
}
debug!(logger, "Removing file from mount: {}", target.display());
let _ = fs::remove_file(target).await;
}
// Scan new & changed files
@@ -209,18 +182,25 @@ impl Storage {
let mut size: u64 = 0;
debug!(logger, "Scanning path: {}", path.display());
let metadata = path
.symlink_metadata()
.with_context(|| format!("Failed to query metadata for: {}", path.display()))?;
if path.is_file() {
let metadata = path
.metadata()
.with_context(|| format!("Failed to query metadata for: {}", path.display()))?;
let modified = metadata
.modified()
.with_context(|| format!("Failed to get modified date for: {}", path.display()))?;
let modified = metadata
.modified()
.with_context(|| format!("Failed to get modified date for: {}", path.display()))?;
// Treat files and symlinks the same:
if path.is_file() || metadata.file_type().is_symlink() {
size += metadata.len();
ensure!(
self.watched_files.len() <= MAX_ENTRIES_PER_STORAGE,
WatcherError::MountTooManyFiles {
count: self.watched_files.len(),
mnt: self.source_mount_point.display().to_string()
}
);
// Insert will return old entry if any
if let Some(old_st) = self.watched_files.insert(path.to_path_buf(), modified) {
if modified > old_st {
@@ -231,25 +211,7 @@ impl Storage {
debug!(logger, "New entry: {}", path.display());
update_list.push(PathBuf::from(&path))
}
ensure!(
self.watched_files.len() <= MAX_ENTRIES_PER_STORAGE,
WatcherError::MountTooManyFiles {
count: self.watched_files.len(),
mnt: self.source_mount_point.display().to_string()
}
);
} else {
// Handling regular directories - check to see if this directory is already being tracked, and
// track if not:
if self
.watched_files
.insert(path.to_path_buf(), modified)
.is_none()
{
update_list.push(path.to_path_buf());
}
// Scan dir recursively
let mut entries = fs::read_dir(path)
.await
@@ -365,14 +327,16 @@ impl SandboxStorages {
}
}
match baremount(
match BareMount::new(
entry.source_mount_point.to_str().unwrap(),
entry.target_mount_point.to_str().unwrap(),
"bind",
MsFlags::MS_BIND,
"bind",
logger,
) {
)
.mount()
{
Ok(_) => {
entry.watch = false;
info!(logger, "watchable mount replaced with bind mount")
@@ -476,14 +440,15 @@ impl BindWatcher {
async fn mount(&self, logger: &Logger) -> Result<()> {
fs::create_dir_all(WATCH_MOUNT_POINT_PATH).await?;
baremount(
BareMount::new(
"tmpfs",
WATCH_MOUNT_POINT_PATH,
"tmpfs",
MsFlags::empty(),
"",
logger,
)?;
)
.mount()?;
Ok(())
}
@@ -650,7 +615,7 @@ mod tests {
.unwrap();
// setup storage3: many files, but still watchable
for i in 1..MAX_ENTRIES_PER_STORAGE {
for i in 1..MAX_ENTRIES_PER_STORAGE + 1 {
fs::write(src3_path.join(format!("{}.txt", i)), "original").unwrap();
}
@@ -660,9 +625,6 @@ mod tests {
..Default::default()
};
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
entries
.add(std::iter::once(storage0), &logger)
.await
@@ -715,7 +677,7 @@ mod tests {
std::fs::read_dir(entries.0[3].target_mount_point.as_path())
.unwrap()
.count(),
MAX_ENTRIES_PER_STORAGE - 1
MAX_ENTRIES_PER_STORAGE
);
// Add two files to storage 0, verify it is updated without needing to run check:
@@ -733,9 +695,6 @@ mod tests {
"updated"
);
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
//
// Prepare for second check: update mount sources
//
@@ -788,7 +747,7 @@ mod tests {
std::fs::read_dir(entries.0[3].target_mount_point.as_path())
.unwrap()
.count(),
MAX_ENTRIES_PER_STORAGE
MAX_ENTRIES_PER_STORAGE + 1
);
// verify that we can remove files as well, but that it isn't observed until check is run
@@ -866,20 +825,15 @@ mod tests {
fs::remove_file(source_dir.path().join("big.txt")).unwrap();
fs::remove_file(source_dir.path().join("too-big.txt")).unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
// Up to 15 files should be okay (can watch 15 files + 1 directory)
for i in 1..MAX_ENTRIES_PER_STORAGE {
// Up to 16 files should be okay:
for i in 1..MAX_ENTRIES_PER_STORAGE + 1 {
fs::write(source_dir.path().join(format!("{}.txt", i)), "original").unwrap();
}
assert_eq!(
entry.scan(&logger).await.unwrap(),
MAX_ENTRIES_PER_STORAGE - 1
);
assert_eq!(entry.scan(&logger).await.unwrap(), MAX_ENTRIES_PER_STORAGE);
// 16 files wll be too many:
fs::write(source_dir.path().join("16.txt"), "updated").unwrap();
// 17 files is too many:
fs::write(source_dir.path().join("17.txt"), "updated").unwrap();
thread::sleep(Duration::from_secs(1));
// Expect to receive a MountTooManyFiles error
@@ -892,180 +846,6 @@ mod tests {
}
}
#[tokio::test]
async fn test_copy() {
// prepare tmp src/destination
let source_dir = tempfile::tempdir().unwrap();
let dest_dir = tempfile::tempdir().unwrap();
// verify copy of a regular file
let src_file = source_dir.path().join("file.txt");
let dst_file = dest_dir.path().join("file.txt");
fs::write(&src_file, "foo").unwrap();
copy(&src_file, &dst_file).await.unwrap();
// verify destination:
assert!(!fs::symlink_metadata(dst_file)
.unwrap()
.file_type()
.is_symlink());
// verify copy of a symlink
let src_symlink_file = source_dir.path().join("symlink_file.txt");
let dst_symlink_file = dest_dir.path().join("symlink_file.txt");
tokio::fs::symlink(&src_file, &src_symlink_file)
.await
.unwrap();
copy(src_symlink_file, &dst_symlink_file).await.unwrap();
// verify destination:
assert!(fs::symlink_metadata(&dst_symlink_file)
.unwrap()
.file_type()
.is_symlink());
assert_eq!(fs::read_link(&dst_symlink_file).unwrap(), src_file);
assert_eq!(fs::read_to_string(&dst_symlink_file).unwrap(), "foo")
}
#[tokio::test]
async fn watch_directory_verify_dir_removal() {
let source_dir = tempfile::tempdir().unwrap();
let dest_dir = tempfile::tempdir().unwrap();
let mut entry = Storage::new(protos::Storage {
source: source_dir.path().display().to_string(),
mount_point: dest_dir.path().display().to_string(),
..Default::default()
})
.await
.unwrap();
let logger = slog::Logger::root(slog::Discard, o!());
// create a path we'll remove later
fs::create_dir_all(source_dir.path().join("tmp")).unwrap();
fs::write(&source_dir.path().join("tmp/test-file"), "foo").unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 3); // root, ./tmp, test-file
// Verify expected directory, file:
assert_eq!(
std::fs::read_dir(dest_dir.path().join("tmp"))
.unwrap()
.count(),
1
);
assert_eq!(std::fs::read_dir(&dest_dir).unwrap().count(), 1);
// Now, remove directory, and verify that the directory (and its file) are removed:
fs::remove_dir_all(source_dir.path().join("tmp")).unwrap();
thread::sleep(Duration::from_secs(1));
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
assert_eq!(std::fs::read_dir(&dest_dir).unwrap().count(), 0);
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
}
#[tokio::test]
async fn watch_directory_with_symlinks() {
// Prepare source directory:
// ..2021_10_29_03_10_48.161654083/file.txt
// ..data -> ..2021_10_29_03_10_48.161654083
// file.txt -> ..data/file.txt
let source_dir = tempfile::tempdir().unwrap();
let actual_dir = source_dir.path().join("..2021_10_29_03_10_48.161654083");
let actual_file = actual_dir.join("file.txt");
let sym_dir = source_dir.path().join("..data");
let sym_file = source_dir.path().join("file.txt");
let relative_to_dir = PathBuf::from("..2021_10_29_03_10_48.161654083");
// create backing file/path
fs::create_dir_all(&actual_dir).unwrap();
fs::write(&actual_file, "two").unwrap();
// create indirection symlink directory that points to the directory that holds the actual file:
tokio::fs::symlink(&relative_to_dir, &sym_dir)
.await
.unwrap();
// create presented data file symlink:
tokio::fs::symlink(PathBuf::from("..data/file.txt"), sym_file)
.await
.unwrap();
let dest_dir = tempfile::tempdir().unwrap();
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
let mut entry = Storage::new(protos::Storage {
source: source_dir.path().display().to_string(),
mount_point: dest_dir.path().display().to_string(),
..Default::default()
})
.await
.unwrap();
let logger = slog::Logger::root(slog::Discard, o!());
assert_eq!(entry.scan(&logger).await.unwrap(), 5);
// Should copy no files since nothing is changed since last check
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
// now what, what is updated?
fs::write(actual_file, "updated").unwrap();
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
assert_eq!(entry.scan(&logger).await.unwrap(), 1);
assert_eq!(
fs::read_to_string(dest_dir.path().join("file.txt")).unwrap(),
"updated"
);
// Verify that resulting file.txt is a symlink:
assert!(
tokio::fs::symlink_metadata(dest_dir.path().join("file.txt"))
.await
.unwrap()
.file_type()
.is_symlink()
);
// Verify that .data directory is a symlink:
assert!(tokio::fs::symlink_metadata(&dest_dir.path().join("..data"))
.await
.unwrap()
.file_type()
.is_symlink());
// Should copy no new files after copy happened
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
// Now, simulate configmap update.
// - create a new actual dir/file,
// - update the symlink directory to point to this one
// - remove old dir/file
let new_actual_dir = source_dir.path().join("..2021_10_31");
let new_actual_file = new_actual_dir.join("file.txt");
fs::create_dir_all(&new_actual_dir).unwrap();
fs::write(&new_actual_file, "new configmap").unwrap();
tokio::fs::remove_file(&sym_dir).await.unwrap();
tokio::fs::symlink(PathBuf::from("..2021_10_31"), &sym_dir)
.await
.unwrap();
tokio::fs::remove_dir_all(&actual_dir).await.unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 3); // file, file-dir, symlink
assert_eq!(
fs::read_to_string(dest_dir.path().join("file.txt")).unwrap(),
"new configmap"
);
}
#[tokio::test]
async fn watch_directory() {
// Prepare source directory:
@@ -1076,9 +856,6 @@ mod tests {
fs::create_dir_all(source_dir.path().join("A/B")).unwrap();
fs::write(source_dir.path().join("A/B/1.txt"), "two").unwrap();
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
let dest_dir = tempfile::tempdir().unwrap();
let mut entry = Storage::new(protos::Storage {
@@ -1091,11 +868,13 @@ mod tests {
let logger = slog::Logger::root(slog::Discard, o!());
assert_eq!(entry.scan(&logger).await.unwrap(), 5);
assert_eq!(entry.scan(&logger).await.unwrap(), 2);
// Should copy no files since nothing is changed since last check
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
// Should copy 1 file
thread::sleep(Duration::from_secs(1));
fs::write(source_dir.path().join("A/B/1.txt"), "updated").unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 1);
assert_eq!(
@@ -1103,9 +882,6 @@ mod tests {
"updated"
);
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
// Should copy no new files after copy happened
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
@@ -1136,9 +912,7 @@ mod tests {
assert_eq!(entry.scan(&logger).await.unwrap(), 1);
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
thread::sleep(Duration::from_secs(1));
fs::write(&source_file, "two").unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 1);
assert_eq!(fs::read_to_string(&dest_file).unwrap(), "two");
@@ -1164,9 +938,8 @@ mod tests {
let logger = slog::Logger::root(slog::Discard, o!());
// expect the root directory and the file:
assert_eq!(entry.scan(&logger).await.unwrap(), 2);
assert_eq!(entry.watched_files.len(), 2);
assert_eq!(entry.scan(&logger).await.unwrap(), 1);
assert_eq!(entry.watched_files.len(), 1);
assert!(target_file.exists());
assert!(entry.watched_files.contains_key(&source_file));
@@ -1176,7 +949,7 @@ mod tests {
assert_eq!(entry.scan(&logger).await.unwrap(), 0);
assert_eq!(entry.watched_files.len(), 1);
assert_eq!(entry.watched_files.len(), 0);
assert!(!target_file.exists());
}
@@ -1209,10 +982,7 @@ mod tests {
);
}
use serial_test::serial;
#[tokio::test]
#[serial]
async fn create_tmpfs() {
skip_if_not_root!();
@@ -1222,14 +992,11 @@ mod tests {
watcher.mount(&logger).await.unwrap();
assert!(is_mounted(WATCH_MOUNT_POINT_PATH).unwrap());
thread::sleep(Duration::from_millis(20));
watcher.cleanup();
assert!(!is_mounted(WATCH_MOUNT_POINT_PATH).unwrap());
}
#[tokio::test]
#[serial]
async fn spawn_thread() {
skip_if_not_root!();
@@ -1259,7 +1026,6 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn verify_container_cleanup_watching() {
skip_if_not_root!();

View File

@@ -15,6 +15,6 @@ serde = { version = "1.0.126", features = ["derive"] }
tokio-vsock = "0.3.1"
bincode = "1.3.3"
byteorder = "1.4.3"
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_debug"] }
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_info"] }
async-trait = "0.1.50"
tokio = "1.2.0"

View File

@@ -12,7 +12,7 @@
// payload, which allows the forwarder to know how many bytes it must read to
// consume the trace span. The payload is a serialised version of the trace span.
#![allow(unknown_lints)]
#![allow(clippy::unknown_clippy_lints)]
use async_trait::async_trait;
use byteorder::{ByteOrder, NetworkEndian};

View File

@@ -5,10 +5,16 @@ coverage.txt
coverage.html
.git-commit
.git-commit.tmp
/config/*.toml
config-generated.go
/cli/config/configuration-acrn.toml
/cli/config/configuration-clh.toml
/cli/config/configuration-fc.toml
/cli/config/configuration-qemu.toml
/cli/config/configuration-clh.toml
/cli/config-generated.go
/cli/containerd-shim-kata-v2/config-generated.go
/cli/coverage.html
/containerd-shim-kata-v2
/pkg/containerd-shim-v2/monitor_address
/containerd-shim-v2/monitor_address
/data/kata-collect-data.sh
/kata-monitor
/kata-netmon
@@ -17,4 +23,7 @@ config-generated.go
/virtcontainers/hack/virtc/virtc
/virtcontainers/hook/mock/hook
/virtcontainers/profile.cov
/virtcontainers/shim/mock/cc-shim/cc-shim
/virtcontainers/shim/mock/kata-shim/kata-shim
/virtcontainers/shim/mock/shim
/virtcontainers/utils/supportfiles

View File

@@ -51,13 +51,12 @@ PROJECT_DIR = $(PROJECT_TAG)
IMAGENAME = $(PROJECT_TAG).img
TARGET = $(BIN_PREFIX)-runtime
RUNTIME_OUTPUT = $(CURDIR)/$(TARGET)
RUNTIME_DIR = $(CLI_DIR)/$(TARGET)
TARGET_OUTPUT = $(CURDIR)/$(TARGET)
BINLIST += $(TARGET)
NETMON_DIR = $(CLI_DIR)/netmon
NETMON_DIR = netmon
NETMON_TARGET = $(PROJECT_TYPE)-netmon
NETMON_RUNTIME_OUTPUT = $(CURDIR)/$(NETMON_TARGET)
NETMON_TARGET_OUTPUT = $(CURDIR)/$(NETMON_TARGET)
BINLIBEXECLIST += $(NETMON_TARGET)
DESTDIR ?= /
@@ -190,7 +189,6 @@ DEFVALIDVHOSTUSERSTOREPATHS := [\"$(DEFVHOSTUSERSTOREPATH)\"]
DEFFILEMEMBACKEND := ""
DEFVALIDFILEMEMBACKENDS := [\"$(DEFFILEMEMBACKEND)\"]
DEFMSIZE9P := 8192
DEFVFIOMODE := guest-kernel
# Default cgroup model
DEFSANDBOXCGROUPONLY ?= false
@@ -202,7 +200,7 @@ FEATURE_SELINUX ?= check
SED = sed
CLI_DIR = cmd
CLI_DIR = cli
SHIMV2 = containerd-shim-kata-v2
SHIMV2_OUTPUT = $(CURDIR)/$(SHIMV2)
SHIMV2_DIR = $(CLI_DIR)/$(SHIMV2)
@@ -227,7 +225,7 @@ ifneq (,$(QEMUCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_QEMU)
CONFIG_FILE_QEMU = configuration-qemu.toml
CONFIG_QEMU = config/$(CONFIG_FILE_QEMU)
CONFIG_QEMU = $(CLI_DIR)/config/$(CONFIG_FILE_QEMU)
CONFIG_QEMU_IN = $(CONFIG_QEMU).in
CONFIG_PATH_QEMU = $(abspath $(CONFDIR)/$(CONFIG_FILE_QEMU))
@@ -250,7 +248,7 @@ ifneq (,$(CLHCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_CLH)
CONFIG_FILE_CLH = configuration-clh.toml
CONFIG_CLH = config/$(CONFIG_FILE_CLH)
CONFIG_CLH = $(CLI_DIR)/config/$(CONFIG_FILE_CLH)
CONFIG_CLH_IN = $(CONFIG_CLH).in
CONFIG_PATH_CLH = $(abspath $(CONFDIR)/$(CONFIG_FILE_CLH))
@@ -273,7 +271,7 @@ ifneq (,$(FCCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_FC)
CONFIG_FILE_FC = configuration-fc.toml
CONFIG_FC = config/$(CONFIG_FILE_FC)
CONFIG_FC = $(CLI_DIR)/config/$(CONFIG_FILE_FC)
CONFIG_FC_IN = $(CONFIG_FC).in
CONFIG_PATH_FC = $(abspath $(CONFDIR)/$(CONFIG_FILE_FC))
@@ -296,7 +294,7 @@ ifneq (,$(ACRNCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_ACRN)
CONFIG_FILE_ACRN = configuration-acrn.toml
CONFIG_ACRN = config/$(CONFIG_FILE_ACRN)
CONFIG_ACRN = $(CLI_DIR)/config/$(CONFIG_FILE_ACRN)
CONFIG_ACRN_IN = $(CONFIG_ACRN).in
CONFIG_PATH_ACRN = $(abspath $(CONFDIR)/$(CONFIG_FILE_ACRN))
@@ -460,7 +458,6 @@ USER_VARS += DEFENTROPYSOURCE
USER_VARS += DEFVALIDENTROPYSOURCES
USER_VARS += DEFSANDBOXCGROUPONLY
USER_VARS += DEFBINDMOUNTS
USER_VARS += DEFVFIOMODE
USER_VARS += FEATURE_SELINUX
USER_VARS += BUILDFLAGS
@@ -525,15 +522,15 @@ containerd-shim-v2: $(SHIMV2_OUTPUT)
monitor: $(MONITOR_OUTPUT)
netmon: $(NETMON_RUNTIME_OUTPUT)
netmon: $(NETMON_TARGET_OUTPUT)
$(NETMON_RUNTIME_OUTPUT): $(SOURCES) VERSION
$(NETMON_TARGET_OUTPUT): $(SOURCES) VERSION
$(QUIET_BUILD)(cd $(NETMON_DIR) && go build $(BUILDFLAGS) -o $@ -ldflags "-X main.version=$(VERSION)" $(KATA_LDFLAGS))
runtime: $(RUNTIME_OUTPUT) $(CONFIGS)
runtime: $(TARGET_OUTPUT) $(CONFIGS)
.DEFAULT: default
build: all
build: default
#Install an executable file
# params:
@@ -561,12 +558,16 @@ define MAKE_KERNEL_VIRTIOFS_NAME
$(if $(findstring uncompressed,$1),vmlinux-virtiofs.container,vmlinuz-virtiofs.container)
endef
GENERATED_CONFIG = $(abspath $(CLI_DIR)/config-generated.go)
GENERATED_FILES += $(GENERATED_CONFIG)
GENERATED_FILES += pkg/katautils/config-settings.go
$(RUNTIME_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) | show-summary
$(QUIET_BUILD)(cd $(RUNTIME_DIR) && go build $(KATA_LDFLAGS) $(BUILDFLAGS) -o $@ .)
$(TARGET_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) | show-summary
$(QUIET_BUILD)(cd $(CLI_DIR) && go build $(KATA_LDFLAGS) $(BUILDFLAGS) -o $@ .)
$(SHIMV2_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST)
$(QUIET_BUILD)(cd $(SHIMV2_DIR)/ && ln -fs $(GENERATED_CONFIG))
$(QUIET_BUILD)(cd $(SHIMV2_DIR)/ && go build $(KATA_LDFLAGS) $(BUILDFLAGS) -o $@ .)
$(MONITOR_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) .git-commit
@@ -575,11 +576,10 @@ $(MONITOR_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) .git-commit
.PHONY: \
check \
check-go-static \
coverage \
default \
install \
lint \
pre-commit \
show-header \
show-summary \
show-variables \
@@ -598,6 +598,8 @@ $(GENERATED_FILES): %: %.in $(MAKEFILE_LIST) VERSION .git-commit
generate-config: $(CONFIGS)
check: check-go-static
test: install-hook go-test
install-hook:
@@ -608,37 +610,17 @@ ifeq ($(shell id -u), 0)
endif
go-test: $(GENERATED_FILES)
go clean -testcache
go test -v -mod=vendor ./...
fast-test: $(GENERATED_FILES)
go clean -testcache
for s in $$(go list ./...); do if ! go test -failfast -v -mod=vendor -p 1 $$s; then break; fi; done
GOLANGCI_LINT_FILE := ../../../tests/.ci/.golangci.yml
GOLANGCI_LINT_NAME = golangci-lint
GOLANGCI_LINT_CMD := $(shell command -v $(GOLANGCI_LINT_NAME) 2>/dev/null)
lint: all
if [ -z $(GOLANGCI_LINT_CMD) ] ; \
then \
echo "ERROR: command $(GOLANGCI_LINT_NAME) not found. Please install it first." >&2; exit 1; \
fi
if [ -f $(GOLANGCI_LINT_FILE) ] ; \
then \
echo "running $(GOLANGCI_LINT_NAME)..."; \
$(GOLANGCI_LINT_NAME) run -c $(GOLANGCI_LINT_FILE) ; \
else \
echo "ERROR: file $(GOLANGCI_LINT_FILE) not found. You should clone https://github.com/kata-containers/tests to run $(GOLANGCI_LINT_NAME) locally." >&2; exit 1; \
fi;
pre-commit: lint fast-test
check-go-static:
$(QUIET_CHECK)../../ci/go-no-os-exit.sh ./cli
$(QUIET_CHECK)../../ci/go-no-os-exit.sh ./virtcontainers
coverage:
go test -v -mod=vendor -covermode=atomic -coverprofile=coverage.txt ./...
go tool cover -html=coverage.txt -o coverage.html
install: all install-runtime install-containerd-shim-v2 install-monitor install-netmon
install: default install-runtime install-containerd-shim-v2 install-monitor install-netmon
install-bin: $(BINLIST)
$(QUIET_INST)$(foreach f,$(BINLIST),$(call INSTALL_EXEC,$f,$(BINDIR)))
@@ -681,6 +663,7 @@ clean:
$(NETMON_TARGET) \
$(MONITOR) \
$(SHIMV2) \
$(SHIMV2_DIR)/$(notdir $(GENERATED_CONFIG)) \
$(TARGET) \
.git-commit .git-commit.tmp
@@ -695,9 +678,6 @@ show-usage: show-header
@printf "\n"
@printf "\tbuild : standard build (build everything).\n"
@printf "\ttest : run tests.\n"
@printf "\tpre-commit : run $(GOLANGCI_LINT_NAME) and tests locally.\n"
@printf "\tlint : run $(GOLANGCI_LINT_NAME).\n"
@printf "\tfast-test : run tests with failfast option.\n"
@printf "\tcheck : run code checks.\n"
@printf "\tclean : remove built files.\n"
@printf "\tcontainerd-shim-v2 : only build containerd shim v2.\n"

View File

@@ -26,7 +26,8 @@ to work seamlessly with both Docker and Kubernetes respectively.
## License
The code is licensed under an Apache 2.0 license.
See [the license file](https://github.com/kata-containers/kata-containers/blob/main/LICENSE) for further details.
See [the license file](../../LICENSE) for further details.
## Platform support

View File

@@ -0,0 +1,40 @@
//
// Copyright (c) 2018-2019 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
// WARNING: This file is auto-generated - DO NOT EDIT!
//
// Note that some variables are "var" to allow them to be modified
// by the tests.
package main
// name is the name of the runtime
const name = "@RUNTIME_NAME@"
// name of the project
const project = "@PROJECT_NAME@"
// prefix used to denote non-standard CLI commands and options.
const projectPrefix = "@PROJECT_TYPE@"
// original URL for this project
const projectURL = "@PROJECT_URL@"
// Project URL's organisation name
const projectORG = "@PROJECT_ORG@"
const defaultRootDirectory = "@PKGRUNDIR@"
// commit is the git commit the runtime is compiled from.
var commit = "@COMMIT@"
// version is the runtime version.
var version = "@VERSION@"
// Default config file used by stateless systems.
var defaultRuntimeConfiguration = "@CONFIG_PATH@"
// Alternate config file that takes precedence over
// defaultRuntimeConfiguration.
var defaultSysConfRuntimeConfiguration = "@SYSCONFIG@"

View File

@@ -124,17 +124,24 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_ACRN@"
# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
# If enabled, the default trace mode is "dynamic" and the
# default trace type is "isolated". The trace mode and type are set
# explicity with the `trace_type=` and `trace_mode=` options.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
# associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shutdown,
# increasing the container shutdown time slightly.
# - Tracing is ONLY enabled when `enable_tracing` is set: explicitly
# setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing`
# will NOT activate agent tracing.
#
# - See https://github.com/kata-containers/agent/blob/master/TRACING.md for
# full details.
#
# (default: disabled)
#enable_tracing = true
#
#trace_mode = "dynamic"
#trace_type = "isolated"
# Enable debug console.

View File

@@ -105,17 +105,10 @@ virtio_fs_extra_args = @DEFVIRTIOFSEXTRAARGS@
virtio_fs_cache = "@DEFVIRTIOFSCACHE@"
# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is virtio-blk.
# rootfs is backed by a block device. This is virtio-scsi, virtio-blk
# or nvdimm.
block_device_driver = "virtio-blk"
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
#enable_hugepages = true
# Disable the 'seccomp' feature from Cloud Hypervisor, default false
# disable_seccomp = true
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#
@@ -146,17 +139,24 @@ block_device_driver = "virtio-blk"
# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
# If enabled, the default trace mode is "dynamic" and the
# default trace type is "isolated". The trace mode and type are set
# explicity with the `trace_type=` and `trace_mode=` options.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
# associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shutdown,
# increasing the container shutdown time slightly.
# - Tracing is ONLY enabled when `enable_tracing` is set: explicitly
# setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing`
# will NOT activate agent tracing.
#
# - See https://github.com/kata-containers/agent/blob/master/TRACING.md for
# full details.
#
# (default: disabled)
#enable_tracing = true
#
#trace_mode = "dynamic"
#trace_type = "isolated"
# Enable debug console.
@@ -262,27 +262,6 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
# Options:
#
# - vfio
# Matches behaviour of OCI runtimes (e.g. runc) as much as
# possible. VFIO devices will appear in the container as VFIO
# character devices under /dev/vfio. The exact names may differ
# from the host (they need to match the VM's IOMMU group numbers
# rather than the host's)
#
# - guest-kernel
# This is a Kata-specific behaviour that's useful in certain cases.
# The VFIO device is managed by whatever driver in the VM kernel
# claims it. This means it will appear as one or more device nodes
# or network interfaces depending on the nature of the device.
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.

View File

@@ -246,17 +246,24 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
# If enabled, the default trace mode is "dynamic" and the
# default trace type is "isolated". The trace mode and type are set
# explicity with the `trace_type=` and `trace_mode=` options.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
# associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shutdown,
# increasing the container shutdown time slightly.
# - Tracing is ONLY enabled when `enable_tracing` is set: explicitly
# setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing`
# will NOT activate agent tracing.
#
# - See https://github.com/kata-containers/agent/blob/master/TRACING.md for
# full details.
#
# (default: disabled)
#enable_tracing = true
#
#trace_mode = "dynamic"
#trace_type = "isolated"
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).

View File

@@ -24,11 +24,6 @@ machine_type = "@MACHINETYPE@"
# Default false
# confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path"
@@ -365,7 +360,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
# if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness")
# is bigger than 0.
# The size of the swap device should be
# The size of the swap device should be
# swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
@@ -422,17 +417,24 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
# If enabled, the default trace mode is "dynamic" and the
# default trace type is "isolated". The trace mode and type are set
# explicity with the `trace_type=` and `trace_mode=` options.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
# associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shutdown,
# increasing the container shutdown time slightly.
# - Tracing is ONLY enabled when `enable_tracing` is set: explicitly
# setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing`
# will NOT activate agent tracing.
#
# - See https://github.com/kata-containers/agent/blob/master/TRACING.md for
# full details.
#
# (default: disabled)
#enable_tracing = true
#
#trace_mode = "dynamic"
#trace_type = "isolated"
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -543,27 +545,6 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
# Options:
#
# - vfio
# Matches behaviour of OCI runtimes (e.g. runc) as much as
# possible. VFIO devices will appear in the container as VFIO
# character devices under /dev/vfio. The exact names may differ
# from the host (they need to match the VM's IOMMU group numbers
# rather than the host's)
#
# - guest-kernel
# This is a Kata-specific behaviour that's useful in certain cases.
# The VFIO device is managed by whatever driver in the VM kernel
# claims it. This means it will appear as one or more device nodes
# or network interfaces depending on the nature of the device.
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.

View File

@@ -0,0 +1,30 @@
// Copyright (c) 2018 HyperHQ Inc.
//
// SPDX-License-Identifier: Apache-2.0
//
package main
import (
"fmt"
"os"
"github.com/containerd/containerd/runtime/v2/shim"
containerdshim "github.com/kata-containers/kata-containers/src/runtime/containerd-shim-v2"
"github.com/kata-containers/kata-containers/src/runtime/pkg/types"
)
func shimConfig(config *shim.Config) {
config.NoReaper = true
config.NoSubreaper = true
}
func main() {
if len(os.Args) == 2 && os.Args[1] == "--version" {
fmt.Printf("%s containerd shim: id: %q, version: %s, commit: %v\n", project, types.DefaultKataRuntimeName, version, commit)
os.Exit(0)
}
shim.Run(types.DefaultKataRuntimeName, containerdshim.New, shimConfig)
}

28
src/runtime/cli/exit.go Normal file
View File

@@ -0,0 +1,28 @@
// Copyright (c) 2017 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
package main
import "os"
var atexitFuncs []func()
var exitFunc = os.Exit
// atexit registers a function f that will be run when exit is called. The
// handlers so registered will be called the in reverse order of their
// registration.
func atexit(f func()) {
atexitFuncs = append(atexitFuncs, f)
}
// exit calls all atexit handlers before exiting the process with status.
func exit(status int) {
for i := len(atexitFuncs) - 1; i >= 0; i-- {
f := atexitFuncs[i]
f()
}
exitFunc(status)
}

View File

@@ -0,0 +1,42 @@
// Copyright (c) 2017 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
package main
import (
"os"
"testing"
"github.com/stretchr/testify/assert"
)
var testFoo string
func testFunc() {
testFoo = "bar"
}
func TestExit(t *testing.T) {
assert := assert.New(t)
var testExitStatus int
exitFunc = func(status int) {
testExitStatus = status
}
defer func() {
exitFunc = os.Exit
}()
// test with no atexit functions added.
exit(1)
assert.Equal(testExitStatus, 1)
// test with a function added to the atexit list.
atexit(testFunc)
exit(0)
assert.Equal(testFoo, "bar")
assert.Equal(testExitStatus, 0)
}

View File

@@ -25,6 +25,7 @@ import (
"strings"
"syscall"
"github.com/containerd/cgroups"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/oci"
@@ -61,9 +62,9 @@ type vmContainerCapableDetails struct {
const (
moduleParamDir = "parameters"
successMessageCapable = "System is capable of running " + katautils.PROJECT
successMessageCreate = "System can currently create " + katautils.PROJECT
failMessage = "System is not capable of running " + katautils.PROJECT
successMessageCapable = "System is capable of running " + project
successMessageCreate = "System can currently create " + project
failMessage = "System is not capable of running " + project
kernelPropertyCorrect = "Kernel property value correct"
// these refer to fields in the procCPUINFO file
@@ -228,7 +229,7 @@ func checkKernelModules(modules map[string]kernelModule, handler kernelParamHand
}
if !haveKernelModule(module) {
kataLog.WithFields(fields).Errorf("kernel property %s not found", module)
kataLog.WithFields(fields).Error("kernel property not found")
if details.required {
count++
}
@@ -291,9 +292,11 @@ func genericHostIsVMContainerCapable(details vmContainerCapableDetails) error {
errorCount := uint32(0)
count := checkCPUAttribs(cpuinfo, details.requiredCPUAttribs)
errorCount += count
count = checkCPUFlags(cpuFlags, details.requiredCPUFlags)
errorCount += count
count, err = checkKernelModules(details.requiredKernelModules, archKernelParamHandler)
@@ -313,7 +316,7 @@ func genericHostIsVMContainerCapable(details vmContainerCapableDetails) error {
var kataCheckCLICommand = cli.Command{
Name: "check",
Aliases: []string{"kata-check"},
Usage: "tests if system can run " + katautils.PROJECT,
Usage: "tests if system can run " + project,
Flags: []cli.Flag{
cli.BoolFlag{
Name: "check-version-only",
@@ -372,14 +375,14 @@ EXAMPLES:
$ %s check --only-list-releases --include-all-releases
`,
katautils.PROJECT,
project,
noNetworkEnvVar,
katautils.NAME,
katautils.NAME,
katautils.NAME,
katautils.NAME,
katautils.NAME,
katautils.NAME,
name,
name,
name,
name,
name,
name,
),
Action: func(context *cli.Context) error {
@@ -398,7 +401,7 @@ EXAMPLES:
if os.Geteuid() == 0 {
kataLog.Warn("Not running network checks as super user")
} else {
err := HandleReleaseVersions(cmd, katautils.VERSION, context.Bool("include-all-releases"))
err := HandleReleaseVersions(cmd, version, context.Bool("include-all-releases"))
if err != nil {
return err
}
@@ -414,6 +417,11 @@ EXAMPLES:
return errors.New("check: cannot determine runtime config")
}
// check if cgroup can work use the same logic for creating containers
if _, err := vc.V1Constraints(); err != nil && err == cgroups.ErrMountPointNotExist && !runtimeConfig.SandboxCgroupOnly {
return fmt.Errorf("Cgroup v2 requires the following configuration: `sandbox_cgroup_only=true`.")
}
err := setCPUtype(runtimeConfig.HypervisorType)
if err != nil {
return err

View File

@@ -161,16 +161,6 @@ func setCPUtype(hypervisorType vc.HypervisorType) error {
required: false,
},
}
case "mock":
archRequiredCPUFlags = map[string]string{
cpuFlagVMX: "Virtualization support",
cpuFlagLM: "64Bit CPU",
cpuFlagSSE4_1: "SSE4.1",
}
archRequiredCPUAttribs = map[string]string{
archGenuineIntel: "Intel Architecture CPU",
}
default:
return fmt.Errorf("setCPUtype: Unknown hypervisor type %s", hypervisorType)
}
@@ -302,8 +292,6 @@ func archHostCanCreateVMContainer(hypervisorType vc.HypervisorType) error {
return kvmIsUsable()
case "acrn":
return acrnIsUsable()
case "mock":
return nil
default:
return fmt.Errorf("archHostCanCreateVMContainer: Unknown hypervisor type %s", hypervisorType)
}

View File

@@ -317,13 +317,12 @@ func TestCheckHostIsVMContainerCapable(t *testing.T) {
}
}
// to check if host is capable for Kata Containers, must setup CPU info first.
_, config, err := makeRuntimeConfig(dir)
assert.NoError(err)
setCPUtype(config.HypervisorType)
setupCheckHostIsVMContainerCapable(assert, cpuInfoFile, cpuData, moduleData)
// remove the modules to force a failure
err = os.RemoveAll(sysModuleDir)
assert.NoError(err)
details := vmContainerCapableDetails{
cpuInfoFile: cpuInfoFile,
requiredCPUFlags: archRequiredCPUFlags,
@@ -333,12 +332,6 @@ func TestCheckHostIsVMContainerCapable(t *testing.T) {
err = hostIsVMContainerCapable(details)
assert.Nil(err)
// remove the modules to force a failure
err = os.RemoveAll(sysModuleDir)
assert.NoError(err)
err = hostIsVMContainerCapable(details)
assert.Error(err)
}
func TestArchKernelParamHandler(t *testing.T) {

View File

@@ -28,9 +28,9 @@ func setupCheckHostIsVMContainerCapable(assert *assert.Assertions, cpuInfoFile s
func TestCCCheckCLIFunction(t *testing.T) {
var cpuData []testCPUData
moduleData := []testModuleData{
{filepath.Join(sysModuleDir, "kvm"), "", true},
{filepath.Join(sysModuleDir, "vhost"), "", true},
{filepath.Join(sysModuleDir, "vhost_net"), "", true},
{filepath.Join(sysModuleDir, "kvm"), true, ""},
{filepath.Join(sysModuleDir, "vhost"), true, ""},
{filepath.Join(sysModuleDir, "vhost_net"), true, ""},
}
genericCheckCLIFunction(t, cpuData, moduleData)

View File

@@ -10,7 +10,7 @@ vendor_id : IBM/S390
# processors : 4
bogomips per cpu: 20325.00
max thread id : 0
features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
cache0 : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1 : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2 : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8

View File

@@ -39,8 +39,7 @@ func testSetCPUTypeGeneric(t *testing.T) {
_, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(err)
err = setCPUtype(config.HypervisorType)
assert.NoError(err)
setCPUtype(config.HypervisorType)
assert.Equal(archRequiredCPUFlags, savedArchRequiredCPUFlags)
assert.Equal(archRequiredCPUAttribs, savedArchRequiredCPUAttribs)

View File

@@ -47,8 +47,8 @@ func TestCCCheckCLIFunction(t *testing.T) {
}
moduleData := []testModuleData{
{filepath.Join(sysModuleDir, "kvm"), "", true},
{filepath.Join(sysModuleDir, "kvm_hv"), "", true},
{filepath.Join(sysModuleDir, "kvm"), false, "Y"},
{filepath.Join(sysModuleDir, "kvm_hv"), false, "Y"},
}
genericCheckCLIFunction(t, cpuData, moduleData)
@@ -58,51 +58,51 @@ func TestArchKernelParamHandler(t *testing.T) {
assert := assert.New(t)
type testData struct {
fields logrus.Fields
msg string
onVMM bool
expectIgnore bool
fields logrus.Fields
msg string
}
data := []testData{
{logrus.Fields{}, "", true, false},
{logrus.Fields{}, "", false, false},
{true, false, logrus.Fields{}, ""},
{false, false, logrus.Fields{}, ""},
{
false,
false,
logrus.Fields{
// wrong type
"parameter": 123,
},
"foo",
false,
false,
},
{
false,
false,
logrus.Fields{
"parameter": "unrestricted_guest",
},
"",
false,
false,
},
{
true,
true,
logrus.Fields{
"parameter": "unrestricted_guest",
},
"",
true,
true,
},
{
false,
true,
logrus.Fields{
"parameter": "nested",
},
"",
false,
true,
},
}

View File

@@ -47,7 +47,7 @@ func TestCCCheckCLIFunction(t *testing.T) {
}
moduleData := []testModuleData{
{filepath.Join(sysModuleDir, "kvm"), "", true},
{filepath.Join(sysModuleDir, "kvm"), false, "Y"},
}
genericCheckCLIFunction(t, cpuData, moduleData)
@@ -57,51 +57,51 @@ func TestArchKernelParamHandler(t *testing.T) {
assert := assert.New(t)
type testData struct {
fields logrus.Fields
msg string
onVMM bool
expectIgnore bool
fields logrus.Fields
msg string
}
data := []testData{
{logrus.Fields{}, "", true, false},
{logrus.Fields{}, "", false, false},
{true, false, logrus.Fields{}, ""},
{false, false, logrus.Fields{}, ""},
{
false,
false,
logrus.Fields{
// wrong type
"parameter": 123,
},
"foo",
false,
false,
},
{
false,
false,
logrus.Fields{
"parameter": "unrestricted_guest",
},
"",
false,
false,
},
{
true,
true,
logrus.Fields{
"parameter": "unrestricted_guest",
},
"",
true,
true,
},
{
false,
true,
logrus.Fields{
"parameter": "nested",
},
"",
false,
true,
},
}

View File

@@ -17,10 +17,8 @@ import (
"strings"
"testing"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katatestutils"
ktu "github.com/kata-containers/kata-containers/src/runtime/pkg/katatestutils"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
"github.com/sirupsen/logrus"
"github.com/stretchr/testify/assert"
"github.com/urfave/cli"
@@ -249,13 +247,6 @@ func genericCheckCLIFunction(t *testing.T, cpuData []testCPUData, moduleData []t
flagSet := &flag.FlagSet{}
ctx := createCLIContext(flagSet)
ctx.App.Name = "foo"
if katatestutils.IsInGitHubActions() {
// only set to mock if on GitHub
t.Logf("running tests under GitHub actions")
config.HypervisorType = vc.MockHypervisor
}
ctx.App.Metadata["runtimeConfig"] = config
// create buffer to save logger output

View File

@@ -13,23 +13,21 @@ import (
"strings"
"github.com/BurntSushi/toml"
specs "github.com/opencontainers/runtime-spec/specs-go"
"github.com/prometheus/procfs"
"github.com/urfave/cli"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
"github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
exp "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/experimental"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/oci"
vcUtils "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
specs "github.com/opencontainers/runtime-spec/specs-go"
"github.com/prometheus/procfs"
"github.com/urfave/cli"
)
// Semantic version for the output of the command.
//
// XXX: Increment for every change to the output format
// (meaning any change to the EnvInfo type).
const formatVersion = "1.0.26"
const formatVersion = "1.0.25"
// MetaInfo stores information on the format of the output itself
type MetaInfo struct {
@@ -108,7 +106,6 @@ type HypervisorInfo struct {
EntropySource string
SharedFS string
VirtioFSDaemon string
SocketPath string
Msize9p uint32
MemorySlots uint32
PCIeRootPort uint32
@@ -118,8 +115,10 @@ type HypervisorInfo struct {
// AgentInfo stores agent details
type AgentInfo struct {
Debug bool
Trace bool
TraceMode string
TraceType string
Debug bool
Trace bool
}
// DistroInfo stores host operating system distribution details.
@@ -130,14 +129,13 @@ type DistroInfo struct {
// HostInfo stores host details
type HostInfo struct {
AvailableGuestProtections []string
Kernel string
Architecture string
Distro DistroInfo
CPU CPUInfo
Memory MemoryInfo
VMContainerCapable bool
SupportVSocks bool
Kernel string
Architecture string
Distro DistroInfo
CPU CPUInfo
Memory MemoryInfo
VMContainerCapable bool
SupportVSocks bool
}
// NetmonInfo stores netmon details
@@ -157,11 +155,11 @@ type EnvInfo struct {
Meta MetaInfo
Image ImageInfo
Initrd InitrdInfo
Hypervisor HypervisorInfo
Runtime RuntimeInfo
Netmon NetmonInfo
Host HostInfo
Agent AgentInfo
Hypervisor HypervisorInfo
Netmon NetmonInfo
Runtime RuntimeInfo
Host HostInfo
}
func getMetaInfo() MetaInfo {
@@ -171,8 +169,8 @@ func getMetaInfo() MetaInfo {
}
func getRuntimeInfo(configFile string, config oci.RuntimeConfig) RuntimeInfo {
runtimeVersionInfo := constructVersionInfo(katautils.VERSION)
runtimeVersionInfo.Commit = katautils.COMMIT
runtimeVersionInfo := constructVersionInfo(version)
runtimeVersionInfo.Commit = commit
runtimeVersion := RuntimeVersionInfo{
Version: runtimeVersionInfo,
@@ -242,17 +240,14 @@ func getHostInfo() (HostInfo, error) {
memoryInfo := getMemoryInfo()
availableGuestProtection := vc.AvailableGuestProtections()
host := HostInfo{
Kernel: hostKernelVersion,
Architecture: arch,
Distro: hostDistro,
CPU: hostCPU,
Memory: memoryInfo,
AvailableGuestProtections: availableGuestProtection,
VMContainerCapable: hostVMContainerCapable,
SupportVSocks: supportVSocks,
Kernel: hostKernelVersion,
Architecture: arch,
Distro: hostDistro,
CPU: hostCPU,
Memory: memoryInfo,
VMContainerCapable: hostVMContainerCapable,
SupportVSocks: supportVSocks,
}
return host, nil
@@ -306,11 +301,13 @@ func getAgentInfo(config oci.RuntimeConfig) (AgentInfo, error) {
agentConfig := config.AgentConfig
agent.Debug = agentConfig.Debug
agent.Trace = agentConfig.Trace
agent.TraceMode = agentConfig.TraceMode
agent.TraceType = agentConfig.TraceType
return agent, nil
}
func getHypervisorInfo(config oci.RuntimeConfig) (HypervisorInfo, error) {
func getHypervisorInfo(config oci.RuntimeConfig) HypervisorInfo {
hypervisorPath := config.HypervisorConfig.HypervisorPath
version, err := getCommandVersion(hypervisorPath)
@@ -318,19 +315,6 @@ func getHypervisorInfo(config oci.RuntimeConfig) (HypervisorInfo, error) {
version = unknown
}
hypervisorType := config.HypervisorType
socketPath := unknown
// It is only reliable to make this call as root since a
// non-privileged user may not have access to /dev/vhost-vsock.
if os.Geteuid() == 0 {
socketPath, err = vc.GetHypervisorSocketTemplate(hypervisorType, &config.HypervisorConfig)
if err != nil {
return HypervisorInfo{}, err
}
}
return HypervisorInfo{
Debug: config.HypervisorConfig.Debug,
MachineType: config.HypervisorConfig.HypervisorMachineType,
@@ -345,8 +329,7 @@ func getHypervisorInfo(config oci.RuntimeConfig) (HypervisorInfo, error) {
HotplugVFIOOnRootBus: config.HypervisorConfig.HotplugVFIOOnRootBus,
PCIeRootPort: config.HypervisorConfig.PCIeRootPort,
SocketPath: socketPath,
}, nil
}
}
func getEnvInfo(configFile string, config oci.RuntimeConfig) (env EnvInfo, err error) {
@@ -371,10 +354,7 @@ func getEnvInfo(configFile string, config oci.RuntimeConfig) (env EnvInfo, err e
return EnvInfo{}, err
}
hypervisor, err := getHypervisorInfo(config)
if err != nil {
return EnvInfo{}, err
}
hypervisor := getHypervisorInfo(config)
image := ImageInfo{
Path: config.HypervisorConfig.ImagePath,

Some files were not shown because too many files have changed in this diff Show More