Compare commits

278 Commits
2.1.1 ... 2.0.1

Author SHA1 Message Date
Eric Ernst
3df65f4f3a Merge pull request #1282 from egernst/fix-2.0-stable-release
Fix 2.0 stable release
2021-01-15 17:30:17 -08:00
Eric Ernst
c5a6354718 actions: w/a deprecated set-env
set-env is no longer allowed. Updated to use the new recommended syntax.

Fixes: #1273

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-15 16:29:37 -08:00
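For context on commit c5a6354718: GitHub Actions replaced the `set-env` workflow command with an environment file whose path the runner exposes in `GITHUB_ENV`. The following is a minimal sketch of that mechanism, not the change made in the Kata workflows; the helper `export_env`, the `demo` function, and the variable name `EXAMPLE_VERSION` are hypothetical.

```python
import os
import tempfile


def export_env(name: str, value: str) -> None:
    """Append NAME=value to the file named by GITHUB_ENV.

    Steps that run later in the same job see NAME in their environment.
    """
    env_file = os.environ["GITHUB_ENV"]  # provided by the Actions runner
    with open(env_file, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


def demo() -> str:
    """Simulate the runner locally by pointing GITHUB_ENV at a temp file."""
    with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
        path = tmp.name
    previous = os.environ.get("GITHUB_ENV")
    os.environ["GITHUB_ENV"] = path
    try:
        export_env("EXAMPLE_VERSION", "2.0.1")  # hypothetical variable
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        # Restore the caller's environment and clean up the temp file.
        if previous is None:
            os.environ.pop("GITHUB_ENV", None)
        else:
            os.environ["GITHUB_ENV"] = previous
        os.remove(path)
```

In a shell step the same effect is usually achieved by appending a `NAME=value` line to `$GITHUB_ENV`.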
Eric Ernst
867d8bc9b4 packaging: should tag/update tests repo when releasing
We should still bump/version the tests repository, just as we do for
1.x.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-15 14:39:39 -08:00
Eric Ernst
cfe9470ff1 Merge pull request #1270 from egernst/2.0.1-branch-bump
# Kata Containers 2.0.1
2021-01-14 13:56:03 -08:00
Fabiano Fidêncio
9820459a0f Merge pull request #1271 from devimc/2021-01-14/stable-2.0/fixSnap
[backport] snap: tag yq version
2021-01-14 22:41:56 +01:00
Julio Montes
4e141a96ed snap: tag yq version
yq major releases are not backward compatible; install the same
major version used in the CI to avoid conflicts when building the Kata
components.
We should update yq when the CI updates it, not before.

fixes #1232

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-01-14 14:03:39 -06:00
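The rule stated in commit 4e141a96ed (stay on the major version the CI uses) can be expressed as a small check. This is only a sketch; the function name `same_major` and the version strings in the usage note are hypothetical and are not the versions pinned by the snap.

```python
def same_major(installed: str, ci_version: str) -> bool:
    """Return True when both version strings share the same major number.

    A leading "v" (as in "v3.1.0") is tolerated.
    """
    def major(version: str) -> int:
        return int(version.lstrip("v").split(".")[0])

    return major(installed) == major(ci_version)
```

For example, `same_major("3.4.1", "v3.1.0")` holds, while `same_major("4.2.0", "3.4.1")` does not, which is the case the commit wants to avoid.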
Eric Ernst
c8028da3c6 release: Kata Containers 2.0.1
- volume cleanup, RO blk device support
- Backport to stable-2.0 branch
- [stable 2.0] backport VFIO fixes
- [backport] snap: fix snap release channel
- [backport] snap: add GH actions jobs to release the snap package
- backport fixes to stable-2.0.0
- Backport: Backport doc changes from 2.0 dev

e4cea92a blk-dev: hotplug readonly if applicable
0590fedd volumes: cleanup / minor refactoring
6b666899 vendor: revendor govmm from intel to kata-containers
65ae1271 runtime: clh: update cloud-hypervisor
9bc6fe6c runtime: clh: disable virtiofs DAX when FS cache size is 0
349d496f versions: Update cloud-hypervisor to release v0.11.0
60050264 rootfs: Fix indentation inside a switch
91b43a99 rootfs: apparmor=unconfined is needed for non Red Hat host OSes
2478b8f4 rootfs: Always add SYS_ADMIN, CHROOT, and MKNOD caps to docker cmdline
499aa24d rootfs: Don't fallthrough in the docker_extra_args() switch
1edb7fe7 rustjail: fix the issue of sync read
607a892f rustjail: fix the issue of bind mount /dev
26f176e2 rustjail: allow network sysctls
3306195f agent: Avoid container stats panic caused by cgroup controller non-exist
a7568b52 agent: Clean up commented use declarations
e6d68349 agent: Fix temp prefix on Namespace::test_setup_persistent_ns
1f943bd6 agent: Return error on trying to persist a pid namespace
9a41d09f shimv2: Avoid double removing of container from sandbox
8fdb85e0 jail/validator: avoid unwrap() for safety
49516ef6 rustjail: add more context info for errors
21fad464 oci: fix two incompatible issues with OCI spec
b745e5ff agent: consume ttrpc crate from crates.io
40316f68 qemu: no state to save if QEMU isn't running
35b619ff oci: fix a typo in "addtionalGids"
662e8db5 agent/sandbox: Don't update cpuset when ncpus = 0
9117dd40 runtime/network: Fix error reporting in listRoutes()
fce14f36 runtime/network: Correct error reporting in listInterfaces()
0fd70f7e rootfs-builder: add support for gentoo
4727a9c3 rootfs: reduce size of debian image
7ab8f62d runtime: Allow to overwrite DESTDIR
7e92833b packaging: Make qemu/apply_patches.sh common
14b18b55 packaging/qemu: Delete the temporary container
1dde0de1 packaging/qemu: Build and package completely in the container
d4c1b768 packaging/qemu: Add QEMU_DESTDIR argument to dockerfiles
3c36ce81 rootfs-builder: add functions to run before and after the container
c9d4e2c4 agent-ctl: Add void "install" target
5fadc5fc trace-forwarder: Add void "install" target
5f887506 snap: fix snap release channel
7526ee93 snap: add GH actions jobs to release the snap package
21ed9dc2 agent: update proto file copyright
5f1520bd agent: generate proto files properly
e30bd673 agent-ctl: update cargo.lock
78df4a0c runtime: remove the unused proto files
7daf9cff agent: move gogo.proto out of the github.com namespance
293be9d0 agent: types.pb.go is not regenerated
84e1a34f agent/protocols: Move agent.proto out of the mock folder of agent
cf56307e agent/protocols: Fix copyright header checking
359f76d2 agent/protocols: Stop generate agent proto files in the shellscript
ca8f1399 agent/protocols: Ignore generated files and remove these files from repo
0bb559a4 agent/protocols: Generate proto files programmatically
4ca4412f docs: fix spell check
3d80c848 docs: Update how-to Readme with hypervisor information.
f0fdc8e1 docs: Update Readme to remove hypervisor information
e53645ec docs: Remove docs for nemu

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-01-14 11:05:32 -08:00
Eric Ernst
0aa68ccfef Merge pull request #1258 from egernst/ro-stable
volume cleanup, RO blk device support
2021-01-14 09:04:54 -08:00
Eric Ernst
e4cea92ad3 blk-dev: hotplug readonly if applicable
If a block-based volume is read-only, let's make sure we add it as a RO
device.

Fixes: #1246

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-13 14:44:45 -08:00
Eric Ernst
0590fedd98 volumes: cleanup / minor refactoring
Update some headers, very minor refactoring

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-13 14:44:45 -08:00
Eric Ernst
6b6668998f vendor: revendor govmm from intel to kata-containers
- Update where we vendor govmm
- Grab latest

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-13 14:44:45 -08:00
Eric Ernst
4f7f25d1a1 Merge pull request #1251 from bergwolf/backport-2.0.0
Backport to stable-2.0 branch
2021-01-13 12:25:15 -08:00
Eric Ernst
216eb29e04 Merge pull request #1256 from devimc/2021-01-13/stable-2.0/fixVfio
[stable 2.0] backport VFIO fixes
2021-01-13 11:29:54 -08:00
Julio Montes
65ae12710d runtime: clh: update cloud-hypervisor
Update cloud-hypervisor to commit 2706319.
Fixes a limitation in the OpenAPITools/openapi-generator tool:
it's impossible to send Go zero values, like false and 0, to
cloud-hypervisor because `omitempty` is added if a field is not
required.
See cloud-hypervisor/cloud-hypervisor#1961 for more information.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-01-13 11:38:24 -06:00
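The `omitempty` limitation described in commit 65ae12710d is specific to Go's JSON encoding, but the effect is easy to reproduce elsewhere. The sketch below is a Python analogy, not the generated client code: one encoder drops every zero value the way `omitempty` does, the other drops only unset fields. The field names in the usage note are hypothetical.

```python
import json


def encode_omitempty(fields: dict) -> str:
    """Mimic `omitempty`: every zero value (False, 0, "", None) is dropped."""
    kept = {key: value for key, value in fields.items() if value}
    return json.dumps(kept, sort_keys=True)


def encode_explicit(fields: dict) -> str:
    """Drop only unset (None) fields, so False and 0 still reach the server."""
    kept = {key: value for key, value in fields.items() if value is not None}
    return json.dumps(kept, sort_keys=True)
```

With a request such as `{"dax": False, "cache_size": 0, "path": "/x"}`, the first encoder sends only `path`, so the receiver falls back to its own defaults for the other two fields; the second encoder transmits all three.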
Julio Montes
9bc6fe6c83 runtime: clh: disable virtiofs DAX when FS cache size is 0
The guest consumes 120 MB more memory when DAX is enabled and the default
FS cache size (8G) is used. Disable DAX when it is not required,
reducing the guest's memory footprint.

Without this patch:

```
7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:              187876 kB
```

With this patch:

```
7fa970000000-7fa9b0000000 rw-s 612001  /memfd:ch_ram (deleted)
Size:            1048576 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:               57308 kB
Pss:               56722 kB
```

fixes #1100

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-01-13 11:38:24 -06:00
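The two smaps excerpts in commit 9bc6fe6c83 allow a quick check of the saving. The sketch below only redoes the arithmetic on the `Rss` values quoted there; the constant and function names are made up for the example.

```python
RSS_WITHOUT_PATCH_KB = 187876  # Rss from the "Without this patch" excerpt
RSS_WITH_PATCH_KB = 57308      # Rss from the "With this patch" excerpt


def saved_mib(before_kb: int, after_kb: int) -> float:
    """Return the resident-memory difference in MiB (smaps reports kB)."""
    return (before_kb - after_kb) / 1024
```

For this single mapping the difference is 130568 kB, about 127.5 MiB, which is in the same range as the 120 MB figure quoted in the commit message.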
Bo Chen
349d496f7f versions: Update cloud-hypervisor to release v0.11.0
The release v0.11.0 of cloud-hypervisor features the following changes:
1) Improved Linux Boot Time, 2) `SIGTERM/SIGINT` Interrupt Signal
Handling, 3) Default Log Level Changed, 4) `io_uring` support by default
for `virtio-block` (on host kernel version 5.8+), 5) Windows Guest
Support, 6) New `--balloon` Parameter Added, 7) Experimental
`virtio-watchdog` Support, 8) Bug fixes.

Fixes: #1089

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-01-13 11:38:02 -06:00
Fabiano Fidêncio
6005026416 rootfs: Fix indentation inside a switch
While touching this part of the code, let's help my OCD.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-01-13 18:30:46 +08:00
Fabiano Fidêncio
91b43a9964 rootfs: apparmor=unconfined is needed for non Red Hat host OSes
This is not needed for Fedora, RHEL, and CentOS, but it is required when
using any other host OS.  Having --security-opt apparmor=unconfined used
unconditionally is a no go as it'd break podman.

The reason this was only added when building for SUSE (as target distro)
was that the debian and ubuntu cases would fall through the switch to
the suse case (which makes me think that the fall-through was not
accidental).

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-01-13 18:30:46 +08:00
Fabiano Fidêncio
2478b8f400 rootfs: Always add SYS_ADMIN, CHROOT, and MKNOD caps to docker cmdline
We use those, independently of the distro.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-01-13 18:30:46 +08:00
Fabiano Fidêncio
499aa24d38 rootfs: Don't fallthrough in the docker_extra_args() switch
Falling through the switch cases in docker_extra_args() looks like a
typo and causes issues when building with podman, as `--security-opt
apparmor=unconfined` shouldn't be passed if AppArmor is not enabled on
the system.

Fixes: #1241

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-01-13 18:30:46 +08:00
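Taken together, the three rootfs commits 91b43a9964, 2478b8f400 and 499aa24d38 describe this behaviour: the capabilities are always added, and the AppArmor option is added only when the host is not a Red Hat family OS. The sketch below models that logic in Python for illustration; the real code is a shell function, the signature used here is an assumption, and the capability names are copied from the commit title as written.

```python
from typing import List

# Hosts for which the commit message says the AppArmor option is not needed.
RED_HAT_HOSTS = {"fedora", "rhel", "centos"}


def docker_extra_args(host_os: str) -> List[str]:
    """Return extra container-engine arguments for the given host OS."""
    args: List[str] = []
    # Added independently of the distro (commit 2478b8f400).
    for cap in ("SYS_ADMIN", "CHROOT", "MKNOD"):
        args += ["--cap-add", cap]
    # Added only on non Red Hat hosts, with no fall-through between cases.
    if host_os.lower() not in RED_HAT_HOSTS:
        args += ["--security-opt", "apparmor=unconfined"]
    return args
```

Keeping each condition explicit, rather than relying on one case falling through into the next, is what prevents the AppArmor option from leaking into builds on hosts where it breaks podman.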
fupan.lfp
1edb7fe7da rustjail: fix the issue of sync read
The sync read should check the read count and return an
error if the read count doesn't match the expected
number.

Fixes: #1233

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-01-13 18:30:46 +08:00
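The check described in commit 1edb7fe7da can be sketched as follows. This is a Python illustration of the idea, not the rustjail code; `read_exact` is a hypothetical helper that performs one read and treats a short read as an error.

```python
import os


def read_exact(fd: int, count: int) -> bytes:
    """Read from fd and fail if fewer than `count` bytes were returned."""
    data = os.read(fd, count)
    if len(data) != count:
        # A short read means the peer sent less than the protocol expects.
        raise OSError(f"sync read: expected {count} bytes, got {len(data)}")
    return data
```

Without the length check, a truncated message would be silently interpreted as a complete one.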
fupan.lfp
607a892f2e rustjail: fix the issue of bind mount /dev
If the container rootfs's /dev is overridden
by a bind mount from another directory, there's
no need to create the default device nodes and symlinks
in /dev.

Fixes: #692

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-01-13 18:30:46 +08:00
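One way to picture the decision in commit 607a892f2e is a scan of the OCI mount list for a bind mount whose destination is /dev. The sketch below assumes mounts are plain dictionaries with `destination` and `type` keys, as in an OCI config.json; the exact test rustjail performs is not shown in the commit message, so this condition is an assumption.

```python
from typing import Dict, List


def dev_is_bind_mounted(mounts: List[Dict[str, str]]) -> bool:
    """Return True if the spec bind-mounts another directory over /dev."""
    return any(
        m.get("destination") == "/dev" and m.get("type") == "bind"
        for m in mounts
    )


def needs_default_devices(mounts: List[Dict[str, str]]) -> bool:
    """Default device nodes and symlinks are only created for a fresh /dev."""
    return not dev_is_bind_mounted(mounts)
```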
Snir Sheriber
26f176e2d9 rustjail: allow network sysctls
The network ns is shared with the guest, so skip looking for it
in the spec.

Fixes: #1228
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
Tim Zhang
3306195f66 agent: Avoid container stats panic caused by cgroup controller non-exist
Return SingularPtrField::none() instead of panicking when getting stats
from the cgroup fails because a cgroup controller is missing.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
Wainer dos Santos Moschetta
a7568b520c agent: Clean up commented use declarations
There were some commented-out use declarations; remove them all.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-01-13 18:30:46 +08:00
Wainer dos Santos Moschetta
e6d68349fa agent: Fix temp prefix on Namespace::test_setup_persistent_ns
Fix the wrong prefix on the temp directory created by test_setup_persistent_ns
for the uts namespace type test.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-01-13 18:30:46 +08:00
Wainer dos Santos Moschetta
1f943bd6bf agent: Return error on trying to persist a pid namespace
A pid namespace cannot be persisted, so add a check-and-error on
Namespace::setup() for handling that case.

Fixes #1220

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
Snir Sheriber
9a41d09f39 shimv2: Avoid double removing of container from sandbox
RemoveContainerRequest results in a call to deleteContainer. According
to the spec, calling RemoveContainer is idempotent and "must not return
an error if the container has already been removed"; hence, don't
return an error if the error reports that the container is not found.

Fixes: #836

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-01-13 18:30:46 +08:00
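The idempotency rule quoted in commit 9a41d09f39 amounts to swallowing the "not found" error on removal. A minimal sketch, assuming a toy `Sandbox` class and a `ContainerNotFound` exception that are both invented for this example and do not mirror the shimv2 types:

```python
class ContainerNotFound(Exception):
    """Raised by the toy sandbox when a container ID is unknown."""


class Sandbox:
    def __init__(self, container_ids):
        self.containers = set(container_ids)

    def delete_container(self, container_id: str) -> None:
        if container_id not in self.containers:
            raise ContainerNotFound(container_id)
        self.containers.remove(container_id)


def remove_container(sandbox: Sandbox, container_id: str) -> None:
    """Remove a container; removing it a second time is not an error."""
    try:
        sandbox.delete_container(container_id)
    except ContainerNotFound:
        # Already gone: the caller's desired end state is reached.
        pass
```

Any other failure still propagates, so only the "already removed" case is treated as success.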
Liu Jiang
8fdb85e062 jail/validator: avoid unwrap() for safety
Explicitly return error codes instead of unwrap().

Fixes: #1214

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
Liu Jiang
49516ef6f2 rustjail: add more context info for errors
To help debug.

Fixes: #1214

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
Liu Jiang
21fad464e8 oci: fix two incompatible issues with OCI spec
The first incompatible issue is caused by a typo, "swapiness" should
be "swappiness". The second incompatible issue is caused by a serde
format. The struct LinuxBlockIODevice is introduced for convenience,
but it also changes serialized data, so "#[serde(flatten)]" should
be used for compatibility with OCI spec.

Fixes: #1211

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2021-01-13 18:30:46 +08:00
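The second incompatibility in commit 21fad464e8 is about the shape of the serialized data: a helper struct nested inside a device entry adds an extra level that the OCI spec does not have, and flattening removes it. The sketch below shows the two shapes using Python dataclasses as an analogy to the Rust structs; the field name `blk` and both encoder functions are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BlockIODevice:
    major: int
    minor: int


@dataclass
class WeightDevice:
    blk: BlockIODevice  # convenience grouping; hypothetical field name
    weight: int


def to_json_nested(dev: WeightDevice) -> str:
    """Serialize as-is: the helper struct shows up as a nested object."""
    return json.dumps(asdict(dev), sort_keys=True)


def to_json_flat(dev: WeightDevice) -> str:
    """Lift the helper's fields into the parent, as flattening does."""
    data = asdict(dev)
    data.update(data.pop("blk"))
    return json.dumps(data, sort_keys=True)
```

Only the flat form puts `major` and `minor` next to `weight`, which is the layout a consumer of the OCI runtime configuration expects.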
Liu Jiang
b745e5ff02 agent: consume ttrpc crate from crates.io
The ttrpc v0.3.0 has been published to crates.io, so consume from
crates.io.

Fixes: #1213

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2021-01-13 18:30:46 +08:00
Eric Ernst
40316f688a qemu: no state to save if QEMU isn't running
On pod delete, we were looking to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (we clean up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.

Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid error messages being displayed when we are
stopping and deleting a sandbox:

```
level=error msg="Could not read qemu pid file"
```

I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.

Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
  https://github.com/kata-containers/kata-containers/issues/1181

Fixes: #1179

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-13 18:30:46 +08:00
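The guard described in commit 40316f688a can be pictured as an early return before any file is touched. This is a sketch under assumptions, not the runtime's Go code: `save_state`, its parameters, and the dictionary used as a stand-in for the persist store are all hypothetical.

```python
def read_pid(pid_file: str) -> int:
    """Read a PID from a file; raises if the file no longer exists."""
    with open(pid_file, encoding="utf-8") as f:
        return int(f.read().strip())


def save_state(vm_running: bool, pid_file: str, store: dict) -> bool:
    """Persist hypervisor state only while the VM is running.

    Returns True if something was saved.
    """
    if not vm_running:
        # The VM directory may already be cleaned up; nothing to save.
        return False
    store["pid"] = read_pid(pid_file)
    return True
```

Because the check comes first, a stopped sandbox never reaches the PID-file read that produced the error message.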
Liu Jiang
35b619ff58 oci: fix a typo in "addtionalGids"
There's a typo in "addtionalGids", which should be "additionalGids".

Fixes: #1211

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
2021-01-13 18:30:46 +08:00
Maruth Goyal
662e8db5dd agent/sandbox: Don't update cpuset when ncpus = 0
When receiving an OnlineCpuMemory RPC, if the number of CPUs to be
made available is 0, then updating the cpusets is a redundant operation.

Fixes: #1172

Signed-off-by: Maruth Goyal <maruthgoyal@gmail.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
David Gibson
9117dd409e runtime/network: Fix error reporting in listRoutes()
If the upcast from resultingRoutes to *grpc.IRoutes fails, we return
(nil, err), but previous code ensures that err is nil at that point, so we
return no error.

fixes #1206

Forward port of
0ffaeeb5d8

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-13 18:30:46 +08:00
David Gibson
fce14f3697 runtime/network: Correct error reporting in listInterfaces()
If the upcast from resultingInterfaces to *grpc.Interfaces fails, we
return (nil, err), but previous code ensures that err is nil at that
point, so we return no error.

Forward port of
b86e904c2d

fixes #1206

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-13 18:30:46 +08:00
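The bug fixed by commits 9117dd409e and fce14f3697 is a failure path that returns an error variable already known to be nil. The sketch below reproduces the pattern in Python using Go-style `(value, error)` tuples; the function names and the type check are stand-ins for the real upcast.

```python
def list_routes_buggy(result):
    """Failure path returns the stale `err`, which is None at this point."""
    err = None  # all earlier calls succeeded
    if not isinstance(result, list):
        return None, err  # bug: caller sees (None, None), i.e. no error
    return result, None


def list_routes_fixed(result):
    """Failure path builds a fresh error describing the bad type."""
    if not isinstance(result, list):
        return None, TypeError(f"unexpected type: {type(result).__name__}")
    return result, None
```

With the buggy version a caller that checks only the error would go on to use a missing value; the fixed version makes the failure visible.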
Julio Montes
0fd70f7ec3 rootfs-builder: add support for gentoo
Generate images based on gentoo

fixes #1178

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-01-13 18:30:46 +08:00
Julio Montes
4727a9c3e4 rootfs: reduce size of debian image
Improve Kata Containers memory footprint by reducing debian
image size.

Without this change:
Debian image -> 256MB

With this change:
Debian image -> 128MB

Note: this change *will not* impact ubuntu image.

fixes #1188

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-01-13 18:30:46 +08:00
Wainer dos Santos Moschetta
7ab8f62d43 runtime: Allow to overwrite DESTDIR
In runtime/Makefile the value of DESTDIR is set to "/", unless one
passes that variable as an argument to `make`. This change also
allows overriding it if DESTDIR is exported in the environment.

Fixes #1182

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-01-13 18:30:46 +08:00
Wainer dos Santos Moschetta
7e92833bd4 packaging: Make qemu/apply_patches.sh common
Move qemu/apply_patches.sh to the common scripts directory and
refactor it so that it can be used as a generic and consistent way
to apply patches.

Fixes #1014

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 18:30:46 +08:00
Wainer dos Santos Moschetta
14b18b55be packaging/qemu: Delete the temporary container
A temporary container is used to pull the QEMU tarball out
of the build image, but this container is never deleted. This
change ensures it gets deleted after its execution.

Fixes #1168

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-01-13 15:50:56 +08:00
Wainer dos Santos Moschetta
1dde0de1d7 packaging/qemu: Build and package completely in the container
Currently QEMU is built inside the container, its tarball pulled to
the host, files removed, then packaged again. Instead, let's run all
those steps inside the container so that the resulting tarball is
the final version. To that end, introduce the
qemu-build-post.sh script, which removes the unneeded files and
creates the tarball.

The patterns for directories on qemu.blacklist had to be changed
to work properly with `find -path`.

Fixes #1168

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-01-13 15:50:24 +08:00
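Commit 1dde0de1d7 notes that the directory patterns had to change to work with `find -path`. The reason is that `-path` matches the pattern against the whole path, with `*` also matching `/`. Python's `fnmatch.fnmatchcase` behaves the same way in that respect, so it can illustrate the difference; the paths and patterns below are hypothetical and are not taken from qemu.blacklist.

```python
from fnmatch import fnmatchcase


def path_matches(path: str, pattern: str) -> bool:
    """Whole-path shell-pattern match, where `*` may span `/`."""
    return fnmatchcase(path, pattern)
```

A bare `usr/share/doc` does not match `./usr/share/doc/README`, because the pattern must cover the entire path; `*/usr/share/doc/*` matches the files inside the directory, and `*/usr/share/doc` matches the directory itself.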
Wainer dos Santos Moschetta
d4c1b768a6 packaging/qemu: Add QEMU_DESTDIR argument to dockerfiles
The dockerfiles used to build qemu and qemu-virtiofs have the QEMU destination
path hardcoded, and it is also hardcoded in the build scripts. This refactors
the dockerfiles to add the QEMU_DESTDIR argument, whose value is passed by the scripts.

Fixes #1168

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-01-13 15:48:12 +08:00
Julio Montes
3c36ce8139 rootfs-builder: add functions to run before and after the container
Define the `before_starting_container` and `after_stopping_container`
functions. These functions run before and after the container that
builds the rootfs, respectively.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-01-13 15:48:04 +08:00
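The hook pair introduced by commit 3c36ce8139 follows a common before/after pattern. The sketch below shows one way such hooks can wrap a build step; it is written in Python rather than the shell used by rootfs-builder, `build_rootfs` and `run_container` are hypothetical names, and running the second hook even when the build fails is an assumption of this example.

```python
from typing import Callable, Optional


def build_rootfs(
    run_container: Callable[[], None],
    before_starting_container: Optional[Callable[[], None]] = None,
    after_stopping_container: Optional[Callable[[], None]] = None,
) -> None:
    """Run optional hooks around the container that builds the rootfs."""
    if before_starting_container is not None:
        before_starting_container()
    try:
        run_container()
    finally:
        # Assumption: cleanup-style hooks should run even on failure.
        if after_stopping_container is not None:
            after_stopping_container()
```

A distro-specific configuration can then supply only the hooks it needs and leave the others unset.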
Fabiano Fidêncio
c9d4e2c4b0 agent-ctl: Add void "install" target
Otherwise `make install` run from the top directory would just fail as
the target is not defined.

Fixes: #1149

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-01-13 15:47:57 +08:00
Fabiano Fidêncio
5fadc5fcb4 trace-forwarder: Add void "install" target
Otherwise `make install` run from the top directory would just fail as
the target is not defined.

Fixes: #1149

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-01-13 15:47:52 +08:00
Fabiano Fidêncio
7cc7fd6888 Merge pull request #1176 from devimc/2020-12-07/backport/fixSnapWorkflow
[backport] snap: fix snap release channel
2021-01-08 15:06:25 +01:00
Julio Montes
5f8875064b snap: fix snap release channel
According to the new snap document
`docs/install/snap-installation-guide.md`, Kata Containers 2.x should
be available in the snapcraft `candidate` channel.

fixes #1174

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-12-08 09:33:32 -06:00
Julio Montes
3b925d6ad1 Merge pull request #1024 from devimc/2020-10-20/backport/snap-release
[backport] snap: add GH actions jobs to release the snap package
2020-11-27 15:02:52 -06:00
Julio Montes
7526ee9350 snap: add GH actions jobs to release the snap package
Use GitHub Actions to build and release the snap package automatically
when a new tag is pushed.

fixes #1006

Depends-on: github.com/kata-containers/tests#3085

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-27 11:04:45 -06:00
Julio Montes
c46a6244ba Merge pull request #1145 from devimc/2020-11-26/fixSpellcheck
backport fixes to stable-2.0.0
2020-11-27 10:53:09 -06:00
Peng Tao
21ed9dc23f agent: update proto file copyright
Now that it is Ant Group...

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-27 09:35:25 -06:00
Peng Tao
5f1520bdee agent: generate proto files properly
Need to generate all protos.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-27 09:35:22 -06:00
Peng Tao
e30bd6733b agent-ctl: update cargo.lock
Just compiling would show that the cargo.lock file is not updated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-27 08:26:51 -06:00
Peng Tao
78df4a0c3f runtime: remove the unused proto files
These are moved to the agent and no longer needed.

Fixes: #1028
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-27 08:26:51 -06:00
Peng Tao
7daf9cffb1 agent: move gogo.proto out of the github.com namespance
To follow the same namespace scope as other proto files.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-27 08:26:51 -06:00
Peng Tao
293be9d0ad agent: types.pb.go is not regenerated
When types.proto was relocated, types.pb.go was not regenerated and still
references the old location.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-11-27 08:26:51 -06:00
Tim Zhang
84e1a34f8f agent/protocols: Move agent.proto out of the mock folder of agent
Because the repos have been merged and the agent repo will be removed in the future,
we do not need to mock the file structure any more.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-11-26 14:55:26 -06:00
Tim Zhang
cf56307edb agent/protocols: Fix copyright header checking
Caused by: bb718ba1dd

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-11-26 14:55:26 -06:00
Tim Zhang
359f76d209 agent/protocols: Stop generate agent proto files in the shellscript
Because the job has been done by build.rs.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-11-26 14:55:26 -06:00
Tim Zhang
ca8f1399ca agent/protocols: Ignore generated files and remove these files from repo
Files generated by build.rs do not need to be stored in the repo.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-11-26 14:55:26 -06:00
Tim Zhang
0bb559a438 agent/protocols: Generate proto files programmatically
Build proto with build.rs

Fixes: #1019

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-11-26 14:55:26 -06:00
Julio Montes
4ca4412f64 docs: fix spell check
Fix the following errors:

```
Word 'containerID': did you mean one of the following?: ...
Word 'configurated': did you mean one of the following?: ...
Word 'cri': did you mean one of the following?: ...
```

fixes #1144

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-11-26 10:11:36 -06:00
Archana Shinde
e2424b9eb1 Merge pull request #998 from amshinde/doc-2.0-stable
Backport: Backport doc changes from 2.0 dev
2020-10-20 09:41:42 -07:00
Archana Shinde
3d80c84869 docs: Update how-to Readme with hypervisor information.
While we have setup guides for firecracker and ACRN, as these
need additional configuration, it may confuse users looking
at this guide to find mentions of just these 2 hypervisors.
Call out all the hypervisors supported with Kata here.

Fixes #996

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
(cherry picked from commit b88aac049d)
2020-10-19 16:40:09 -07:00
Archana Shinde
f0fdc8e17c docs: Update Readme to remove hypervisor information
The repo https://github.com/kata-containers/qemu has been
archived. We should remove this, as this is not the only
hypervisor we support now.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
(cherry picked from commit d64641174e)
2020-10-19 16:39:56 -07:00
Archana Shinde
e53645ec85 docs: Remove docs for nemu
This hypervisor is no longer supported with Kata.
Remove related docs.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
(cherry picked from commit b4f9fb513e)
2020-10-19 16:39:44 -07:00
Bin Liu
aa295c91f2 Merge pull request #992 from bergwolf/2.0.0-branch-bump
# Kata Containers 2.0.0
2020-10-19 16:02:09 +08:00
Ubuntu
6648c8c7fc release: Kata Containers 2.0.0
- backport 2.0-dev commits to stable-2.0.0

dbfe85e snap: install libseccomp-dev
0c3b6a9 package: drop qemu-virtiofs shim
f751c98 packaging: install virtiofsd for normal qemu build as well
08361c5 runtime: enable virtiofs by default
da9bfb2 runtime: Pass `--thread-pool-size=1` to virtiofsd
7347d43 packaging: Apply virtiofs performance related fixes to 5.x
c7bb1e2 tools: Improve agent-ctl README
e6f7ddd tools: Make agent-ctl support more APIs
46cfed5 tools: Remove commented out code in agent-ctl
81fb2c9 tools: Log request in agent-ctl tool if debug enabled
0c43215 tools: Rename agent-ctl command to GetGuestDetails
6511ffe tools: Fix comment in agent-ctl
ee59378 kernel: update to 5.4.71
ef11213 config: make virtio-fs part of standard kernel
1fb6730 agent: remove `unwrap()` for `e.as_errno()`
05e9fe0 agent: Use `?` instead of `match` when the error returns directly
d658129 kata-monitor: use regexp to check if runtime is kata containers
ae2d89e agent: use anyhow `context` to attach context to `Error` instead of `match`
095d4ad agent: remove useless match
bd816df agent: Use `ok_or_else` instead of match for Option -> Result
d413bf7 agent: Fix crasher if AddARPNeighbors request empty
76408c0 agent: Fix crasher if UpdateRoutes request empty
6e4da19 agent: Fix crasher if UpdateInterface request empty
8f8061d agent: replace `match Result` with `or_else`
64e4b2f agent: replace unnecessary `match Result` with `map_err`
7c0d68f agent: replace check! with map_err for readability
82ed34a agent: remove `check!` in child process because we cant' see logs.
9def624 agent: replace `if let Err` with `or_else`
6926914 agent: refactor namespace::setup to optimize error handling
e733c13 agent: replace `if let Err` with `map_err`
ba069f9 rustjail: add length check for uid_mappings in rootless euid mapping
cc8ec7b versions: Update Kubernetes, containerd, cri-o and cri-tools
8a364d2 annotations: Correct unit tests to validate new protections
0cc6297 annotations: Split addHypervisorOverrides to reduce complexity
b6059f3 annotations: Add unit test for checkPathIsInGlobs
c6afad2 annotations: Add unit test for regexpContains function
451608f makefile: Add missing generated vars to `USER_VARS`
8328136 makefile: Improve names of config entries for annotation checks
a92a630 annotations: Give better names to local variabes in search functions
997f7c4 annotations: Rename checkPathIsInGlobList with checkPathIsInGlobs
74d4065 config: Add better comments in the template files
73bb3fd config: Whitelist hypervisor annotations by name
5a587ba config: Use glob instead of regexp to match paths in annotations
29f5dec annotations: Fix typo in comment
d71f9e1 config: Add makefile variables for path lists
28c386c config: Protect file_mem_backend against annotation attacks
c2a186b config: Protect vhost_user_store_path against annotation attacks
8cd094c config: Add security warning on configuration examples
b5f2a1e config: Protect ctlpath from annotation attack
2d65b3b config: Protect jailer_path annotation
fe5e1cf config: Add examples for path_list configuration
3f7bcf5 annotations: Simplify negative logic
80144fc config: Add hypervisor path override through annotations
2f5f356 config: Fix typo in function name
2faafbd config: Protect virtio_fs_daemon annotation
9e5ed41 config: Add 'List' alternates for hypervisor configuration paths
b33d4fe agent: fix panic on malformed device resource in container update
1838233 cpuset: add cpuset pkg
bfbbe8b cpuset: don't set cpuset.mems in the guest
5c21ec2 sandbox: consider cpusets if quota is not enforced
9bb0d48 cpuset: support setting mems for sandbox
64a2ef6 virtcontainers: add method for calculating cpuset for sandbox
a441f21 cpuset: add cpuset pkg
ce54090 docs: Update upgrading guide
e884fef docs: update the build kata containers kernel document
9c16643 agent/device: Check type as well as major:minor when looking up devices
4978c90 agent/device: Index all devices in spec before updating them
a7ba362 agent/device: Forward port update_spec_device_list() unit test
230a983 agent/device: update_spec_device_list() should error if dev not found
a6d9fd4 sandbox: don't constrain cpus, mem only cpuset, devices
8f0cb2f cgroups: add ability to update CPUSet
cbdae44 agent: fix errorneous parsing for guest block size
97acaa8 docs: Add containerd install guide
2324666 agent: use ok_or/map_err instead of match
ebe5ad1 rustjail: use Iterator to manipulate vector elements
c9497c8 rustjail: delete codes commented out
d5d9928 rustjail: delete unused test code
f70892a agent: use chain of Result to avoid early return
ab64780 agent: update not accurate comments
9e064ba agent: use macro to simplify parse_cmdline function in config.rs
42c48f5 agent: add blank lines between methods
d3a36fa agent: delete unused field in agentService
fa54660 agent: use no-named closure to reduce codes
efddcb4 agent: use a local fn to reduce duplicated codes
7bb3e56 packaging: fix cloud-hypervisor binary path
7b53041 packaging: fix missing cloud_hypervisor_repo
38212ba packaging: apply qemu v5.1 stable fixes
fb7e9b4 agent: fix aarch64 build
0cfcbf7 docs: add namespace key to pod/container config files
997f1f6 docs: Add crictl example json files
f60f43a runtime: Clear the VCMock 1.x API Methods from 2.0
1789527 ci: snap: add event filtering
999f67d agent: do not follow link when mounting container proc and sysfs
cb2255f agent: set init process non-dumpable
2a6c9ee agent-ctl: include cargo lock updates
eaff5de versions: add plugins section
4f1d23b virtiofs: Disable DAX
6d80df9 snap: specify python version
a116ce0 osbuilder: Create target directory for agent
4dc3bc0 rust-agent: Treat warnings as error
8f7a484 rust-agent: Identify unused results in tests
ce54e5d rust-agent: Log returned errors rather than ignore them
9adb7b7 rust-agent: Remove unused imports
73ab9b1 rust-agent: Report errors to caller if possible
4db3f9e rust-agent: Ignore write errors while writing to the logs
19cb657 rust-agent: Remove unused code that has undefined behavior
86bc151 rust-agent: Remove 'mut' where not needed
8d8adb6 rust-agent: Remove uses of deprecated functions
76298c1 rust-agent: Remove or rename unused parameters
7d303ec rust-agent: Remove or rename unused variables
e0b79eb rust-agent: Remove unused functions
8ed61b1 rust-agent: Remove useless braces
cc4f02e rust-agent: Remove unused macros
ace6f1e clh: Support VFIO device unplug
47cfeaa clh: Remove unnecessary VmmPing
63c4757 versions: cloud-hypervisor: Bump to version 6d30fe05
059b89c docs: Change kata_tap0 to tap0_kata
4ff3ed5 docs: update networking description
de8dcb1 dev-guide: update kata-agent install details
c488cc4 docs: Update docs for enabling agent debug console
e5acb12 docs: update dev guide for agent build
1bddde7 ci: add github action to test the snap
9517b0a versions: cloud-hypervisor: bump version
f5a7175 runtime: cloud-hypervisor: tag openapi-generator-cli container

Signed-off-by: Ubuntu <ubuntu@ip-172-31-19-197.ap-southeast-1.compute.internal>
2020-10-19 06:18:08 +00:00
Xu Wang
49776f76bf Merge pull request #984 from bergwolf/prepare-release
backport 2.0-dev commits to stable-2.0.0
2020-10-18 13:46:16 +08:00
Peng Tao
dbfe85e705 snap: install libseccomp-dev
To build qemu with virtio-fs support.

Depends-on: github.com/kata-containers/tests#2979
Fixes: #982
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Peng Tao
0c3b6a94b3 package: drop qemu-virtiofs shim
We have enabled qemu-virtiofs by default.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Peng Tao
f751c98da3 packaging: install virtiofsd for normal qemu build as well
For experimental-virtiofs, we use it to test virtiofs with DAX. Let's
rename its virtiofsd to virtiofsd-dax.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Peng Tao
08361c5948 runtime: enable virtiofs by default
We've been shipping it for a long time. It's time to make it the default,
replacing the old obsolete 9pfs.

Fixes: #935
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Fabiano Fidêncio
da9bfb27ed runtime: Pass --thread-pool-size=1 to virtiofsd
Dave Gilbert brought up that passing --thread-pool-size=1 to virtiofsd
may result in a performance improvement, especially when using
`cache=none`. While our current default is `cache=auto`, Dave mentioned
that he sees no harm in having it set, and he also mentioned that it may
use a lot less stack space on aarch/arm.

Fixes: #943

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-18 00:43:15 +08:00
Fabiano Fidêncio
7347d43cf9 packaging: Apply virtiofs performance related fixes to 5.x
Vivek Goyal found out that using "shared" thread pool, instead of
"exclusive" results in better performance.

Knowing that, and with the plan to have virtio-fs as the default fs for
the 2.0, let's bring this patch in for both 5.0 and 5.1.

Fixes: #944

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
c7bb1e2790 tools: Improve agent-ctl README
Add a summary to help understand how to use the `agent-ctl` tool.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
e6f7ddd9a2 tools: Make agent-ctl support more APIs
Added new `agent-ctl` commands to allow the following agent API calls to
be made:

- `AddARPNeighborsRequest`
- `CloseStdinRequest`
- `CopyFileRequest`
- `GetMetricsRequest`
- `GetOOMEventRequest`
- `MemHotplugByProbeRequest`
- `OnlineCPUMemRequest`
- `ReadStreamRequest`
- `ReseedRandomDevRequest`
- `SetGuestDateTimeRequest`
- `TtyWinResizeRequest`
- `UpdateContainerRequest`
- `WriteStreamRequest`

Fixes: #969.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
46cfed5025 tools: Remove commented out code in agent-ctl
Remove a few lines of commented out code.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
81fb2c9980 tools: Log request in agent-ctl tool if debug enabled
Display the API request before making the call so users can see what is
sent to the agent.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
0c432153df tools: Rename agent-ctl command to GetGuestDetails
Rename the `GuestDetails` command to `GetGuestDetails` to match the
actual agent API name.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
6511ffe89d tools: Fix comment in agent-ctl
Correct a comment in the agent control tool.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
Eric Ernst
ee59378232 kernel: update to 5.4.71
vsock fix was backported to 5.4 stable, so we can drop this patch.

Fixes: #973

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:43:15 +08:00
Eric Ernst
ef11213a4e config: make virtio-fs part of standard kernel
Basic virtio-fs support has made it upstream in the Linux kernel, as
well as in QEMU and Cloud Hypervisor. Let's go ahead and add it to the
standard configuration.

Since the device driver / DAX handling is still in progress for
upstream, we will want to still build a separate experimental kernel for
those who are comfortable trading off bleeding edge stability/kernel
updates for improved FIO numbers.

Fixes: #963

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:43:15 +08:00
Tim Zhang
1fb6730984 agent: remove unwrap() for e.as_errno()
Use `{:?}` to print `e.as_errno()` instead of using `{}`
to print `e.as_errno().unwrap().desc()`.

Avoid panic only caused by error's content.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
05e9fe0591 agent: Use ? instead of match when the error returns directly
It's more clear and more readable.
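
As a minimal sketch of the pattern (the function names below are
hypothetical and are not taken from the agent), both variants forward
the error unchanged, but the second one does it with `?`:

```rust
use std::num::ParseIntError;

// Before: an explicit match whose Err arm only forwards the error.
fn parse_port_with_match(s: &str) -> Result<u16, ParseIntError> {
    let port = match s.parse::<u16>() {
        Ok(p) => p,
        Err(e) => return Err(e),
    };
    Ok(port)
}

// After: `?` returns the error to the caller directly.
fn parse_port_with_question(s: &str) -> Result<u16, ParseIntError> {
    let port = s.parse::<u16>()?;
    Ok(port)
}
```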

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
bin liu
d658129695 kata-monitor: use regexp to check if runtime is kata containers
To support a few common configurations for Kata, including:

- `io.containerd.kata.v2`
- `io.containerd.kata-qemu.v2`
- `io.containerd.kata-clh.v2`

`kata-monitor` now uses a regexp instead of direct string comparison.

Fixes: #957

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
ae2d89e95e agent: use anyhow context to attach context to Error instead of match
Context is clearer than match for these situations.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
095d4ad08d agent: remove useless match
Remove useless match.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
bd816dfcec agent: Use ok_or_else instead of match for Option -> Result
Using `ok_or_else` is clearer than match.
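
A small illustration of the conversion, assuming a hypothetical
container table keyed by id (the map, the value type and the error
message are placeholders, not the agent's real types):

```rust
use std::collections::HashMap;

// Turn the Option returned by `get` into a Result. The closure passed to
// `ok_or_else` only runs (and only allocates the message) on a miss.
fn find_container<'a>(
    containers: &'a HashMap<String, u32>,
    id: &str,
) -> Result<&'a u32, String> {
    containers
        .get(id)
        .ok_or_else(|| format!("container {} not found", id))
}
```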

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
d413bf7d44 agent: Fix crasher if AddARPNeighbors request empty
Check if the ARP neighbours specified in the `AddARPNeighbors` API are
set before using them to avoid crashing the agent.

Fixes: #955.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
76408c0f13 agent: Fix crasher if UpdateRoutes request empty
Check if the routes specified in the `UpdateRoutes` API are set before
using them to avoid crashing the agent.

Fixes: #949.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
6e4da19fa5 agent: Fix crasher if UpdateInterface request empty
Check if the interface specified in the `UpdateInterface` API is set
before using it to avoid crashing the agent.

Fixes: #950.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
Tim Zhang
8f8061da08 agent: replace match Result with or_else
`or_else` is suitable for more complicated situations.
We can use it to return Ok in Err handling.
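
For illustration only, a sketch of returning Ok from the error path.
`MountError` and `tolerate_already_mounted` are made-up names used to
show the shape of the call, not code from the agent:

```rust
#[derive(Debug, PartialEq)]
enum MountError {
    AlreadyMounted,
    Other(String),
}

// Treat one specific error as success and pass every other error through.
fn tolerate_already_mounted(r: Result<(), MountError>) -> Result<(), MountError> {
    r.or_else(|e| match e {
        MountError::AlreadyMounted => Ok(()),
        other => Err(other),
    })
}
```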

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
64e4b2fa83 agent: replace unnecessary match Result with map_err
Replace `match Result` whose Ok arm is useless.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
7c0d68f7f7 agent: replace check! with map_err for readability
Calling a method through a macro is ambiguous and hard to read.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
82ed34aee1 agent: remove check! in child process because we can't see logs.
The check macro logs the errors, but logs from the child process can't
be seen, so just drop it.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
9def624c05 agent: replace if let Err with or_else
Fixes #934

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
6926914683 agent: refactor namespace::setup to optimize error handling
- Replace the return value with anyhow::Result.
- Remove if let Err.
- Remove match.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
e733c13cf7 agent: replace if let Err with map_err
Fixes #934

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
bin liu
ba069f9baa rustjail: add length check for uid_mappings in rootless euid mapping
This looks like a copy-paste mistake: gid_mappings is checked twice, and
one of the checks should be on uid_mappings.

Fixes: #952

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:43:15 +08:00
Salvador Fuentes
cc8ec7b0e9 versions: Update Kubernetes, containerd, cri-o and cri-tools
Kubernetes: from 1.17.3 to 1.18.9
CRI-O: from 0eec454168e381e460b3d6de07bf50bfd9b0d082 (1.17) to 1.18.3
Containerd: from 3a4acfbc99aa976849f51a8edd4af20ead51d8d7 (1.3.3) to 1.3.7
cri-tools: from 1.17.0 to 1.18.0

Fixes: #960.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
8a364d2145 annotations: Correct unit tests to validate new protections
Add the verification of some basic protections, namely that:
- EnableAnnotations is honored
- Dangerous paths cannot be modified if no match
- Errors are returned when expected

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
0cc6297716 annotations: Split addHypervisorOverrides to reduce complexity
Warning from gocyclo during make check:
 virtcontainers/pkg/oci/utils.go:404:1: cyclomatic complexity 37 of func `addHypervisorConfigOverrides` is high (> 30) (gocyclo)
 func addHypervisorConfigOverrides(ocispec specs.Spec, config *vc.SandboxConfig, runtime RuntimeConfig) error {
^

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
b6059f3566 annotations: Add unit test for checkPathIsInGlobs
There are a few interesting corner cases to consider for this
function.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
c6afad2a06 annotations: Add unit test for regexpContains function
James O.D Hunt: "But also, regexpContains() and
checkPathIsInGlobList() seem like good candidates for some unit
tests. They "look" obvious, but a few boundary condition tests would be
useful I think (filenames with spaces, backslashes, special
characters, and relative & absolute paths are also an interesting
thought here)."

There aren't that many boundary conditions on a list with regexps,
if you assume the regexp match function itself works. However, the
test is useful in documenting expectations.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
451608fb28 makefile: Add missing generated vars to USER_VARS
This was discovered while checking a massive change in variables.
The root cause for the error is a very long list of manual
replacements, that is best replaced with a $(foreach).

All individual variables in the output configuration files were
checked against the old build using diff.

This is a forward port of a makefile fix included in
PR https://github.com/kata-containers/runtime/issues/3004
for issue https://github.com/kata-containers/runtime/issues/2943

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
8328136575 makefile: Improve names of config entries for annotation checks
The entries used to be things like PATH_LIST, which are too generic.
Replace them with a more precise name with a distinguishing keyword,
namely VALID. For example valid_hypervisor_paths.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
a92a63031d annotations: Give better names to local variables in search functions
Use more meaningful variable names for clarity.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
997f7c4433 annotations: Rename checkPathIsInGlobList with checkPathIsInGlobs
The name is shorter and more specific

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
74d4065197 config: Add better comments in the template files
When there is a default value from the code (usually empty) that
differs from a possible suggested value from the distro, then the
wording "default: empty" is confusing.

Fixes: #901

Suggested-by: Julio Montes <julio.montes@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
73bb3fdbee config: Whitelist hypervisor annotations by name
Add a field "enable_annotations" to the runtime configuration that can
be used to whitelist annotations using a list of regular expressions,
which are used to match any part of the base annotation name, i.e. the
part after "io.katacontainers.config.hypervisor."

For example, the following configuration will match "virtio_fs_daemon",
"initrd" and "jailer_path", but not "path" nor "firmware":

  enable_annotations = [ "virtio.*", "initrd", "_path" ]

The default is an empty list of enabled annotations, which disables
annotations entirely.

If an annotation is rejected, the message is something like:

  annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled

Fixes: #901

Suggested-by: Peng Tao <tao.peng@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:10 +08:00
Christophe de Dinechin
5a587ba506 config: Use glob instead of regexp to match paths in annotations
When filtering annotations that correspond to paths,
e.g. hypervisor.path, it is better to use a glob syntax than a regexp
syntax, as it is more usual for paths, and prevents classes of matches
that are undesirable in our case, such as matching .. against .*

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
29f5dec38f annotations: Fix typo in comment
A comment talking about runtime related annotations describes them as
being related to the agent. A similar comment for the agent
annotations is missing.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
d71f9e1155 config: Add makefile variables for path lists
Add variables to override defaults at build time for the various lists
used to control path annotations.

Fixes: #901

Suggested-by: Fabiano Fidencio <fidencio@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
28c386c51f config: Protect file_mem_backend against annotation attacks
This one could theoretically be used to overwrite data on the host.
It seems somewhat less risky than the earlier ones for a number
of reasons, but worth protecting a little anyway.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
c2a186b18c config: Protect vhost_user_store_path against annotation attacks
This path could be used to overwrite data on the host.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8cd094cf06 config: Add security warning on configuration examples
Add the following text explaining the risk of using regular
expressions in path lists:

Each member of the list can be a regular expression, but prefer names.
Otherwise, please read and understand the following carefully.
SECURITY WARNING: If you use regular expressions, be mindful that
an attacker could craft an annotation that uses .. to escape the paths
you gave. For example, if your regexp is /bin/qemu.* then if there is
a directory named /bin/qemu.d/, then an attacker can pass an annotation
containing /bin/qemu.d/../put-any-binary-name-here and attack your host.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
b5f2a1e8c4 config: Protect ctlpath from annotation attack
This also adds an annotation for ctlpath, which was not present
before. It's better to implement the code consistently right now to make
sure that we don't end up with a leaky implementation tacked on later.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2d65b3bfd8 config: Protect jailer_path annotation
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
fe5e1cf2e1 config: Add examples for path_list configuration
The path_list configuration gives a series of regular expressions that
limit which values are acceptable through annotations in order to
avoid kata launching arbitrary binaries on the host when receiving an
annotation.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
3f7bcf54f0 annotations: Simplify negative logic
Replace strange negative logic (!ok -> continue) with positive
logic (ok -> do it)

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
80144fc415 config: Add hypervisor path override through annotations
The annotation is provided, so it should be respected.
Furthermore, it is important to implement it with the appropriate
protections similar to what was done for virtiofsd.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2f5f35608a config: Fix typo in function name
There was an extra 'p' in addHypervisorVirtioFsOverrides.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2faafbdd3a config: Protect virtio_fs_daemon annotation
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
9e5ed41511 config: Add 'List' alternates for hypervisor configuration paths
Paths mentioned in the hypervisor configuration can be overridden
using annotations, which is potentially dangerous. For each path,
add a 'List' variant that specifies the list of acceptable values
from annotations.

Bug: https://bugs.launchpad.net/katacontainers.io/+bug/1878234

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Peng Tao
b33d4fe708 agent: fix panic on malformed device resource in container update
Somehow containerd is sending a malformed device in update API. While it
should not happen, we should not panic either.

Fixes: #946
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Eric Ernst
183823398d cpuset: add cpuset pkg
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
bfbbe8ba6b cpuset: don't set cpuset.mems in the guest
Kata doesn't map any numa topologies in the guest. Let's make sure we
clear the Cpuset fields before passing container updates to the
guest.

Note, in the future we may want to have a vCPU to guest CPU mapping and
still include the cpuset.Cpus. Until we have this support, clear this as
well.

Fixes: #932

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
5c21ec278c sandbox: consider cpusets if quota is not enforced
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
9bb0d48d56 cpuset: support setting mems for sandbox
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
64a2ef62e0 virtcontainers: add method for calculating cpuset for sandbox
Calculate sandbox's CPUSet as the union of each of the container's
CPUSets.
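
The union idea can be sketched as follows. This is a standalone
illustration written in Rust rather than in the runtime's Go, it does
not use the vendored cpuset package, and the function names are
hypothetical:

```rust
use std::collections::BTreeSet;

// Parse a cpuset list such as "0-2,4" into a set of CPU ids.
fn parse_cpuset(s: &str) -> Result<BTreeSet<u32>, String> {
    let mut cpus = BTreeSet::new();
    for part in s.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            Some((lo, hi)) => {
                let lo = lo.parse::<u32>().map_err(|_| format!("bad range: {}", part))?;
                let hi = hi.parse::<u32>().map_err(|_| format!("bad range: {}", part))?;
                if lo > hi {
                    return Err(format!("inverted range: {}", part));
                }
                cpus.extend(lo..=hi);
            }
            None => {
                let cpu = part.parse::<u32>().map_err(|_| format!("bad cpu id: {}", part))?;
                cpus.insert(cpu);
            }
        }
    }
    Ok(cpus)
}

// The sandbox cpuset is the union of every container's cpuset.
fn sandbox_cpuset(containers: &[&str]) -> Result<BTreeSet<u32>, String> {
    let mut union = BTreeSet::new();
    for c in containers {
        union.extend(parse_cpuset(c)?);
    }
    Ok(union)
}
```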

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
a441f21c40 cpuset: add cpuset pkg
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
James O. D. Hunt
ce54090f25 docs: Update upgrading guide
Update the upgrading guide for 2.0.

Fixes: #928.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:40:16 +08:00
Ychau Wang
e884fef483 docs: update the build kata containers kernel document
Update the build kata containers kernel document for 2.0 release. Fixed
the 1.x release project paths and urls, using the kata-containers
project file paths and urls.

Fixes: #929

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-18 00:40:16 +08:00
David Gibson
9c16643c12 agent/device: Check type as well as major:minor when looking up devices
To update device resource entries from host to guest, we search for
the right entry by host major:minor numbers, then later update it.
However block and character devices exist in separate major:minor
namespaces so we could have one block and one character device with
matching major:minor and thus incorrectly update both with the details
for whichever device is processed second.

Add a check on device type to prevent this.
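
A simplified sketch of the lookup with the extra type check. The
`DeviceResource` struct and `find_resource` helper are illustrative
stand-ins, not the agent's actual OCI types:

```rust
#[derive(Debug, Clone, PartialEq)]
struct DeviceResource {
    kind: char, // 'b' for block, 'c' for character
    major: i64,
    minor: i64,
}

// Match on the device type as well as major:minor, so a block device and
// a character device sharing the same numbers are not confused.
fn find_resource(
    resources: &[DeviceResource],
    kind: char,
    major: i64,
    minor: i64,
) -> Option<usize> {
    resources
        .iter()
        .position(|r| r.kind == kind && r.major == major && r.minor == minor)
}
```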

Port from the Kata 1 Go agent
https://github.com/kata-containers/agent/commit/27ebdc9d2761

Fixes: #703

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
4978c9092c agent/device: Index all devices in spec before updating them
The agent needs to update device entries in the OCI spec so that it
has the correct major:minor numbers for the guest, which may differ
from the host.

Entries in the main device list are looked up by device path, but
entries in the device resources list are looked up by (host)
major:minor.  This is done one device at a time, updating as we go in
update_spec_device_list().

But since the host and guest have different namespaces, one device
might have the same major:minor as a different device on the host.  In
that case we could update one resource entry to the correct guest
values, then mistakenly update it again because it now matches a
different host device.

To avoid this, rather than looking up and updating one by one, we make
all the lookups in advance, creating a map from (host) device path to
the indices in the spec where the device and resource entries can be
found.
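
The two-phase approach can be sketched like this. It is a reduced
model that indexes by host major:minor only instead of by device path,
and `Resource`, `Update` and `update_all` are hypothetical names:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Resource {
    major: i64,
    minor: i64,
}

// One pending change: host (major, minor) -> guest (major, minor).
struct Update {
    host: (i64, i64),
    guest: (i64, i64),
}

fn update_all(resources: &mut [Resource], updates: &[Update]) -> Result<(), String> {
    // Phase 1: resolve every lookup while all entries still hold host values.
    let mut planned = Vec::new();
    for u in updates {
        let idx = resources
            .iter()
            .position(|r| (r.major, r.minor) == u.host)
            .ok_or_else(|| format!("device {}:{} not found", u.host.0, u.host.1))?;
        planned.push((idx, u.guest));
    }
    // Phase 2: apply the guest values. A freshly written guest number can no
    // longer be mistaken for another device's host number.
    for (idx, (major, minor)) in planned {
        resources[idx].major = major;
        resources[idx].minor = minor;
    }
    Ok(())
}
```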

Port from the Go agent in Kata 1,
https://github.com/kata-containers/agent/commit/d88d46849130

Fixes: #703

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
a7ba362f92 agent/device: Forward port update_spec_device_list() unit test
The Kata 1 Go agent included a unit test for updateSpecDeviceList, but no
such unit test exists for the Rust agent's equivalent
update_spec_device_list().  Port the Kata1 test to Rust.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
230a9833f8 agent/device: update_spec_device_list() should error if dev not found
If update_spec_device_list() is given a device that can't be found in the
OCI spec, it currently does nothing, and returns Ok(()).  That doesn't
seem like what we'd expect and is not what the Go agent in Kata 1 does.

Change it to return an error in that case, like Kata 1.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
Eric Ernst
a6d9fd4118 sandbox: don't constrain cpus, mem only cpuset, devices
Allow for constraining the cpuset as well as the devices-whitelist. Revert
sandbox constraints for cpu/memory, as they break the K8S use case. Can
re-add behind a non-default flag in the future.

The sandbox CPUSet should be updated every time a container is created,
updated, or removed.

To facilitate this without rewriting the 'non constrained cgroup'
handling, let's add to the Sandbox's cgroupsUpdate function.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
8f0cb2f1ea cgroups: add ability to update CPUSet
Add function for applying a cpuset change to a cgroup

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
cbdae44992 agent: fix erroneous parsing for guest block size
We were assuming base 10 string before, when the block size from sysfs
is actually a hex string. Let's fix that.
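
A minimal sketch of the corrected parsing, assuming the sysfs value is
a bare hex string such as `8000000` (the helper name and the optional
`0x` handling are assumptions made for the example):

```rust
use std::num::ParseIntError;

// Parse the block size as base 16. Parsing "8000000" as base 10 would
// yield 8_000_000 instead of 0x8000000 (128 MiB).
fn parse_block_size(raw: &str) -> Result<u64, ParseIntError> {
    let s = raw.trim();
    let digits = s.strip_prefix("0x").unwrap_or(s);
    u64::from_str_radix(digits, 16)
}
```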

Fixes: #908

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
James O. D. Hunt
97acaa8124 docs: Add containerd install guide
Create a containerd installation guide and a new `kata-manager` script
for 2.0 that automates the steps outlined in the guide.

Also cleaned up and improved the installation documentation in various
ways, the most significant being:

- Added legacy install link for 1.x installs.
- Official packages section:
  - Removed "Contact" column (since it was empty!)
  - Reworded "Versions" column to clarify the versions are a minimum
    (to reduce maintenance burden).
  - Add a column to show which installation methods receive automatic updates.
  - Modified order of installation options in table and document to
    de-emphasise automatic installation and promote official packages
    and snap more.
- Removed sections no longer relevant for 2.0.

Fixes: #738.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:40:16 +08:00
bin liu
23246662b2 agent: use ok_or/map_err instead of match
Sometimes `Option.ok_or` and `Result.map_err` are simpler
than a match statement, especially in rpc.rs, where
many `ctr.get_process` and `sandbox.get_container` calls
use `match`.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
ebe5ad1386 rustjail: use Iterator to manipulate vector elements
Using Iterator saves code and makes the code more readable.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
c9497c88e4 rustjail: delete codes commented out
Some uses, code and struct fields are commented out and are
unlikely to be uncommented again, so delete them.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
d5d9928f97 rustjail: delete unused test code
The auto-generated test code is meaningless; delete it.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
f70892a5bb agent: use chain of Result to avoid early return
Using Rust `Result`'s `or_else`/`and_then` makes the code cleaner
and avoids early returns that check whether the `Result`
is `Ok` or `Err`.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
ab64780a0b agent: update not accurate comments
This commit:
- updates comments that did not match the function name
- fixes file paths with a doubled slash

Fixes: #922

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
9e064ba192 agent: use macro to simplify parse_cmdline function in config.rs
The parse_cmdline function contains several similar code blocks; if we
add more command-line arguments, the code will grow too long.
Using a macro reduces the code that shares the same logic/processing.
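
A rough sketch of the idea, not the agent's actual macros. The `Config`
fields, the macro names and the parameter names are chosen only for
illustration:

```rust
#[derive(Debug, Default, PartialEq)]
struct Config {
    debug_console: bool,
    log_level: String,
    timeout: u64,
}

// Set a boolean field when the parameter equals the key.
macro_rules! parse_flag {
    ($param:ident, $key:expr, $field:expr) => {
        if $param == $key {
            $field = true;
            continue;
        }
    };
}

// Set a field from a "key=value" parameter using a conversion function.
macro_rules! parse_value {
    ($param:ident, $key:expr, $field:expr, $conv:expr) => {
        if let Some(v) = $param.strip_prefix($key).and_then(|r| r.strip_prefix('=')) {
            $field = $conv(v)?;
            continue;
        }
    };
}

fn parse_string(v: &str) -> Result<String, String> {
    Ok(v.to_string())
}

fn parse_u64(v: &str) -> Result<u64, String> {
    v.parse::<u64>().map_err(|e| format!("invalid number {}: {}", v, e))
}

// Each supported parameter is now one line; unknown parameters are ignored.
fn parse_cmdline(cmdline: &str) -> Result<Config, String> {
    let mut config = Config::default();
    for param in cmdline.split_whitespace() {
        parse_flag!(param, "agent.debug_console", config.debug_console);
        parse_value!(param, "agent.log", config.log_level, parse_string);
        parse_value!(param, "agent.timeout", config.timeout, parse_u64);
    }
    Ok(config)
}
```

Adding another parameter then means adding one macro invocation
instead of another copy of the matching and conversion logic.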

Fixes: #914

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
42c48f54ed agent: add blank lines between methods
In rpc.rs, there are no blank lines between methods. This commit
adds blank lines between these methods.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
d3a36fa06f agent: delete unused field in agentService
The code is for test, and not needed now.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
fa546600ff agent: use no-named closure to reduce codes
For simple closures, inline closures save code.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
efddcb4ab8 agent: use a local fn to reduce duplicated codes
The same code is used twice; aggregating it into a function
reduces code.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
7bb3e562bc packaging: fix cloud-hypervisor binary path
1. ensure build-static-clh.sh puts cloud-hypervisor under ./cloud-hypervisor directory
2. install cloud-hypervisor/cloud-hypervisor binary

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
7b53041bad packaging: fix missing cloud_hypervisor_repo
It is needed in order to build from source.

Fixes: #916
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
38212ba6d8 packaging: apply qemu v5.1 stable fixes
Qemu v5.1 was released with an offending commit 9b3a35ec82
(virtio: verify that legacy support is not accidentally on).
As a result, it breaks command-line compatibility for old qemu
users. Upstream qemu has fixed it but no release has been put out yet.
Let's apply these fixes by hand for now.

Refs: https://www.mail-archive.com/qemu-devel@nongnu.org/msg729556.html

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Jianyong Wu
fb7e9b4f32 agent: fix aarch64 build
aarch64 needs libgcc to resolve some non-builtin symbols.

Fixes: #909
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
0cfcbf79b8 docs: add namespace key to pod/container config files
If there is no namespace field in the config files, CRI-O will fail:
 setting pod sandbox name and id: cannot generate pod name without namespace

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
997f1f6cd0 docs: Add crictl example json files
Add basic sample pod/container config files to show
how to use `crictl` with Kata containers.

Fixes: #881

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
Ychau Wang
f60f43af6b runtime: Clear the VCMock 1.x API Methods from 2.0
Remove the 1.x branch API methods from 2.0. Keep the same methods as
the VC interface, like the VCImpl struct.

Fixes: #751

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-18 00:40:16 +08:00
Julio Montes
1789527d61 ci: snap: add event filtering
Running the snap CI on every PR is not needed. Don't run the snap CI
on PRs that don't change the source code (*.go/*.rs), a configuration
file or Makefile.

fixes #896

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Peng Tao
999f67d573 agent: do not follow link when mounting container proc and sysfs
Attackers might use it to explore other containers in the same pod.
While it is still safe to allow it, we can just close the race window
like runc does.

Fixes: #885
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
cb2255f199 agent: set init process non-dumpable
On old kernels (like v4.9), the kernel applies O_CLOEXEC in the wrong order w.r.t.
dumpable task flags. As a result, we might leak guest file descriptor to
containers. This is a former runc CVE-2016-9962 and still applies to
kata agent. Although Kata container is still valid at protecting the
host, we should not leak extra resources to user containers.

This sets the init processes that join and setup the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other ) namespace.

This setting is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.

This prevents parent processes, the pid 1 of the container, from ptracing
the init process before it drops caps and sets other LSMs.

The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.

Refs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=613cc2b6f272c1a8ad33aefa21cad77af23139f7
https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
opencontainers/runc@50a19c6
https://nvd.nist.gov/vuln/detail/CVE-2016-9962

Fixes: #890
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
2a6c9eec74 agent-ctl: include cargo lock updates
Simply running `make` would generate some cargo lock updates for
agent-ctl. Let's include them so that we have fixed dependencies.

Fixes: #883
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Julio Montes
eaff5de37a versions: add plugins section
The plugins section contains the details of plugins required for
the components or testing.

Add sriov-network-device-plugin url and version that are consumed
by the VFIO test in the tests repository.

fixes #879

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Jose Carlos Venegas Munoz
4f1d23b651 virtiofs: Disable DAX
virtiofs DAX support is not stable today; a few corner cases remain
before it can be made the default.

Fixes: #862
Fixes: #875

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
6d80df9831 snap: specify python version
In order to avoid `unmet dependencies` error in the CI,
the python version must be specified in the yaml.

fixes #877

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Ralf Haferkamp
a116ce0b75 osbuilder: Create target directory for agent
When building with AGENT_SOURCE_BIN pointing to an already built
kata-agent binary, the target directory needs to be created in the
rootfs tree.

Fixes #873

Signed-off-by: Ralf Haferkamp <rhafer@suse.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
4dc3bc0020 rust-agent: Treat warnings as error
Avoid the accumulation of warnings we had, as reported in #750.

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8f7a4842c2 rust-agent: Identify unused results in tests
Assign unused results to _ in order to silence warnings.

This addresses the following warnings:

    warning: unused `std::result::Result` that must be used
        --> rustjail/src/mount.rs:1182:16
         |
    1182 |         defer!(unistd::chdir(&olddir););
         |                ^^^^^^^^^^^^^^^^^^^^^^^
         |
         = note: `#[warn(unused_must_use)]` on by default
         = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
        --> rustjail/src/mount.rs:1183:9
         |
    1183 |         unistd::chdir(tempdir.path());
         |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         |
         = note: this `Result` may be an `Err` variant, which should be handled

While in regular code, we want to log possible errors, in test code
it's OK to simply ignore the returned value.

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
ce54e5dd57 rust-agent: Log returned errors rather than ignore them
In a number of cases, we have functions that return a Result<...>
and where the possible error case is simply ignored. This is a bit
unhealthy.

Add a `check!` macro that allows us to not ignore error values
that we want to log, while not interrupting the flow by returning
them. This is useful for low-level functions such as `signal::kill` or
`unistd::close` where an error is probably significant, but should not
necessarily interrupt the flow of the program (i.e. using `call()?` is
not the right answer).
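
A rough sketch of what such a macro could look like, assuming errors go
to stderr rather than to the agent's logger. The helper `log_if_err` is
hypothetical and this is not the macro added by this commit.

```rust
use std::fmt::Debug;

/// Log the error, if any, and hand back the success value as an Option.
fn log_if_err<T, E: Debug>(result: Result<T, E>, context: &str) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(e) => {
            eprintln!("{}: ignoring error: {:?}", context, e);
            None
        }
    }
}

/// Evaluate a fallible call, log a failure, and keep going.
macro_rules! check {
    ($call:expr, $context:expr) => {{
        let _ = log_if_err($call, $context);
    }};
}
```

With this shape, `check!(unistd::close(fd), "unistd::close")` would log
a failed close and continue, whereas `unistd::close(fd)?` would return
early from the enclosing function.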

The check! macro is then used on low-level calls. This addresses the
following warnings from #750:

    warning: unused `std::result::Result` that must be used
       --> /home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail/src/container.rs:903:17
        |
    903 |                 signal::kill(Pid::from_raw(p.pid), Some(Signal::SIGKILL));
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> /home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail/src/container.rs:916:17
        |
    916 |                 signal::kill(Pid::from_raw(child.id() as i32), Some(Signal::SIGKILL));
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:340:13
        |
    340 |             write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:554:13
        |
    554 | /             write_sync(
    555 | |                 cwfd,
    556 | |                 SYNC_FAILED,
    557 | |                 format!("setgroups failed: {:?}", e).as_str(),
    558 | |             );
        | |______________^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:340:13
        |
    340 |             write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:340:13
        |
    340 |             write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:554:13
        |
    554 | /             write_sync(
    555 | |                 cwfd,
    556 | |                 SYNC_FAILED,
    557 | |                 format!("setgroups failed: {:?}", e).as_str(),
    558 | |             );
        | |______________^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:626:5
        |
    626 |     unistd::close(cfd_log);
        |     ^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:627:5
        |
    627 |     unistd::close(crfd);
        |     ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:628:5
        |
    628 |     unistd::close(cwfd);
        |     ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:770:9
        |
    770 |         fcntl::fcntl(pfd_log, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:799:9
        |
    799 |         fcntl::fcntl(prfd, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:800:9
        |
    800 |         fcntl::fcntl(pwfd, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:803:13
        |
    803 |             unistd::close(prfd);
        |             ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:930:9
        |
    930 |         log_handler.join();
        |         ^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:803:13
        |
    803 |             unistd::close(prfd);
        |             ^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:804:13
        |
    804 |             unistd::close(pwfd);
        |             ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:842:13
        |
    842 |             sched::setns(old_pid_ns, CloneFlags::CLONE_NEWPID);
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:843:13
        |
    843 |             unistd::close(old_pid_ns);
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

Fixes: #844
Fixes: #750

Suggested-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
9adb7b7c28 rust-agent: Remove unused imports
This addresses the following warnings (and similar ones):

    Compiling rustjail v0.1.0 (/home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail)
    warning: unused import: `debug`
      --> rustjail/src/container.rs:57:12
       |
    57 | use slog::{debug, info, o, Logger};
       |            ^^^^^

    warning: unused imports: `AddressFamily`, `SockFlag`, `SockType`, `self`
      --> rustjail/src/process.rs:18:24
       |
    18 | use nix::sys::socket::{self, AddressFamily, SockFlag, SockType};
       |                        ^^^^  ^^^^^^^^^^^^^  ^^^^^^^^  ^^^^^^^^

    warning: unused import: `nix::Error`
      --> rustjail/src/process.rs:23:5
       |
    23 | use nix::Error;
       |     ^^^^^^^^^^

    warning: unused import: `protobuf::RepeatedField`
      --> rustjail/src/validator.rs:11:5
       |
    11 | use protobuf::RepeatedField;
       |     ^^^^^^^^^^^^^^^^^^^^^^^

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
73ab9b1d6d rust-agent: Report errors to caller if possible
Various recently added calls can fail; report their errors to the caller.

This addresses the following warning:

    warning: unused `std::result::Result` that must be used
      --> rustjail/src/cgroups/fs/mod.rs:93:9
       |
    93 |         cg.add_task(CgroupPid::from(pid as u64));
       |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
       = note: `#[warn(unused_must_use)]` on by default
       = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:196:17
        |
    196 |                 freezer_controller.thaw();
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:199:17
        |
    199 |                 freezer_controller.freeze();
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:365:9
        |
    365 |         cpuset_controller.set_cpus(&cpu.cpus);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:369:9
        |
    369 |         cpuset_controller.set_mems(&cpu.mems);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:381:13
        |
    381 |             cpu_controller.set_shares(shares);
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:385:5
        |
    385 |     cpu_controller.set_cfs_quota_and_period(cpu.quota, cpu.period);
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
        --> rustjail/src/cgroups/fs/mod.rs:1061:13
         |
    1061 |             cpuset_controller.set_cpus(cpuset_cpus);
         |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         |
         = note: this `Result` may be an `Err` variant, which should be handled

The specific case of cpu_controller.set_cfs_quota_and_period is
addressed in a way that changes the logic following a suggestion by
Liu Bin, who had just added the code.
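
For illustration, propagating controller errors with `?` could look
like the sketch below. `FakeCpuController` and `CgroupError` are
hypothetical stand-ins, not the cgroups crate API.

```rust
#[derive(Debug, PartialEq)]
struct CgroupError(String);

// Hypothetical stand-in for a cgroup CPU controller.
struct FakeCpuController {
    fail_on_shares: bool,
}

impl FakeCpuController {
    fn set_shares(&self, shares: u64) -> Result<(), CgroupError> {
        if self.fail_on_shares {
            Err(CgroupError(format!("cannot set shares to {}", shares)))
        } else {
            Ok(())
        }
    }

    fn set_cfs_quota(&self, _quota: i64) -> Result<(), CgroupError> {
        Ok(())
    }
}

/// Each `?` returns the first error to the caller instead of dropping it.
fn apply_cpu_settings(
    ctl: &FakeCpuController,
    shares: u64,
    quota: i64,
) -> Result<(), CgroupError> {
    ctl.set_shares(shares)?;
    ctl.set_cfs_quota(quota)?;
    Ok(())
}
```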

Fixes: #750

Suggested-by: Liu Bin <bin@hyper.sh>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
4db3f9e226 rust-agent: Ignore write errors while writing to the logs
When we are writing to the logs and there is an error doing so, there
is not much we can do. Chances are that a panic would make things
worse. So let it go through.

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/sync.rs:26:9
        |
    26  |         write_count(lfd, log_str.as_bytes(), log_str.len());
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
       ::: rustjail/src/container.rs:339:13
        |
    339 |             log_child!(cfd_log, "child exit: {:?}", e);
        |             ------------------------------------------- in this macro invocation
        |
        = note: this `Result` may be an `Err` variant, which should be handled
        = note: this warning originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
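
A small sketch of the "let it go through" approach, assuming any
`std::io::Write` sink. The names `log_best_effort` and `BrokenSink` are
hypothetical.

```rust
use std::io::{self, Write};

/// Write a log line; a failing sink is deliberately ignored.
fn log_best_effort<W: Write>(sink: &mut W, message: &str) {
    // `let _ =` marks the Result as intentionally unused.
    let _ = sink.write_all(message.as_bytes());
}

/// A sink that always fails, to show that logging never panics.
struct BrokenSink;

impl Write for BrokenSink {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "sink closed"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```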

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
19cb657299 rust-agent: Remove unused code that has undefined behavior
Some functions have undefined behavior and are not actually used.

This addresses the following warning:
    warning: the type `oci::User` does not permit zero-initialization
      --> rustjail/src/lib.rs:99:18
       |
    99 |         unsafe { MaybeUninit::zeroed().assume_init() }
       |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |                  |
       |                  this code causes undefined behavior when executed
       |                  help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
       |
       = note: `#[warn(invalid_value)]` on by default
    note: `std::ptr::Unique<u32>` must be non-null (in this struct field)

    warning: the type `protocols::oci::Process` does not permit zero-initialization
       --> rustjail/src/lib.rs:146:14
        |
    146 |     unsafe { MaybeUninit::zeroed().assume_init() }
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |              |
        |              this code causes undefined behavior when executed
        |              help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
        |
    note: `std::ptr::Unique<std::string::String>` must be non-null (in this struct field)
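
The commit removes the functions because they are unused. Had an empty
value been needed, a safe alternative would be `Default`, as in this
sketch. The `User` struct here is illustrative, not the `oci::User`
definition.

```rust
#[derive(Debug, Default, PartialEq)]
struct User {
    uid: u32,
    gid: u32,
    additional_gids: Vec<u32>,
    username: String,
}

/// Safe replacement for `unsafe { MaybeUninit::zeroed().assume_init() }`:
/// `Vec` and `String` hold non-null pointers, so an all-zero bit pattern
/// is not a valid value for them, while `Default` builds valid empty ones.
fn empty_user() -> User {
    User::default()
}
```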

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
86bc151787 rust-agent: Remove 'mut' where not needed
Addresses the following warning (and a few similar ones):
    warning: variable does not need to be mutable
       --> rustjail/src/container.rs:369:9
        |
    369 |     let mut oci_process: oci::Process = serde_json::from_str(process_str)?;
        |         ----^^^^^^^^^^^
        |         |
        |         help: remove this `mut`
        |
        = note: `#[warn(unused_mut)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8d8adb6887 rust-agent: Remove uses of deprecated functions
This addresses the following:

    warning: use of deprecated item 'std::error::Error::description': use the Display impl or to_string()
        --> rustjail/src/container.rs:1598:31
         |
    1598 | ...                   e.description(),
         |                         ^^^^^^^^^^^
         |
         = note: `#[warn(deprecated)]` on by default
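
As the warning suggests, the replacement is the `Display` impl via
`to_string()`. A minimal sketch with a hypothetical `MountError` type:

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct MountError {
    target: String,
}

impl fmt::Display for MountError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "failed to mount {}", self.target)
    }
}

impl Error for MountError {}

/// Use `to_string()` (backed by Display) instead of the deprecated
/// `Error::description()`.
fn describe(e: &dyn Error) -> String {
    e.to_string()
}
```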

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
76298c12b7 rust-agent: Remove or rename unused parameters
Parameters that are never used were removed.
Parameters that are unused, but necessary because of some common
interface were renamed with a _ prefix.
In one case, consume the parameter by adding an info! call, and fix a
minor typo in a message in the same function.

This addresses the following warning:

    warning: unused variable: `child`
        --> rustjail/src/container.rs:1128:5
         |
    1128 |     child: &mut Child,
         |     ^^^^^ help: if this is intentional, prefix it with an underscore: `_child`

    warning: unused variable: `logger`
        --> rustjail/src/container.rs:1049:22
         |
    1049 | fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
         |                      ^^^^^^ help: if this is intentional, prefix it with an underscore: `_logger`
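
A short sketch of the `_` prefix case, where a common interface forces
a parameter that one implementation does not need. The `Hook` trait and
`LogOnly` type are hypothetical.

```rust
/// A common interface that every implementation must match.
trait Hook {
    fn run(&self, container_id: &str, pid: i32) -> String;
}

struct LogOnly;

impl Hook for LogOnly {
    // `_pid` is required by the trait signature but unused here; the
    // underscore prefix silences the `unused_variables` warning.
    fn run(&self, container_id: &str, _pid: i32) -> String {
        format!("hook ran for {}", container_id)
    }
}
```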

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
7d303ec2d0 rust-agent: Remove or rename unused variables
Remove variables that are simply not used.
Rename as _ variables where only initialization matters.

This addresses the following warnings:

    warning: unused variable: `writer`
       --> src/main.rs:130:9
        |
    130 |     let writer = unsafe { File::from_raw_fd(wfd) };
        |         ^^^^^^ help: if this is intentional, prefix it with an underscore: `_writer`
        |
        = note: `#[warn(unused_variables)]` on by default

    warning: unused variable: `ctx`
       --> src/rpc.rs:782:9
        |
    782 |         ctx: &ttrpc::TtrpcContext,
        |         ^^^ help: if this is intentional, prefix it with an underscore: `_ctx`

    warning: unused variable: `ctx`
       --> src/rpc.rs:808:9
        |
    808 |         ctx: &ttrpc::TtrpcContext,
        |         ^^^ help: if this is intentional, prefix it with an underscore: `_ctx`

    warning: unused variable: `dns_list`
        --> src/rpc.rs:1152:16
         |
    1152 |             Ok(dns_list) => {
         |                ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_dns_list`

    warning: value assigned to `child_stdin` is never read
       --> rustjail/src/container.rs:807:13
        |
    807 |         let mut child_stdin = std::process::Stdio::null();
        |             ^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_assignments)]` on by default
        = help: maybe it is overwritten before being read?

    warning: value assigned to `child_stdout` is never read
       --> rustjail/src/container.rs:808:13
        |
    808 |         let mut child_stdout = std::process::Stdio::null();
        |             ^^^^^^^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `child_stderr` is never read
       --> rustjail/src/container.rs:809:13
        |
    809 |         let mut child_stderr = std::process::Stdio::null();
        |             ^^^^^^^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `stdin` is never read
       --> rustjail/src/container.rs:810:13
        |
    810 |         let mut stdin = -1;
        |             ^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `stdout` is never read
       --> rustjail/src/container.rs:811:13
        |
    811 |         let mut stdout = -1;
        |             ^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `stderr` is never read
       --> rustjail/src/container.rs:812:13
        |
    812 |         let mut stderr = -1;
        |             ^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?
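
One detail worth keeping in mind when renaming: a `_name` binding keeps
the value alive until the end of the scope, while a bare `_` pattern
drops it immediately. The sketch below uses a hypothetical `Guard` type
to make the difference observable.

```rust
use std::cell::Cell;

/// Records when it is dropped by setting the shared flag.
struct Guard<'a>(&'a Cell<bool>);

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// `_guard` lives until the function returns, so the flag is still false here.
fn named_underscore_binding(dropped: &Cell<bool>) -> bool {
    let _guard = Guard(dropped);
    dropped.get()
}

/// `_` does not bind, so the Guard is dropped at the end of the statement.
fn bare_underscore(dropped: &Cell<bool>) -> bool {
    let _ = Guard(dropped);
    dropped.get()
}
```

This matters for values such as a `File` built from a raw fd, where
dropping closes the descriptor.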

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
e0b79eb57f rust-agent: Remove unused functions
Fixes the following warning:

   Compiling logging v0.1.0 (/home/ddd/go/src/github.com/kata-containers-2.0/pkg/logging)
   warning: associated function is never used: `set_level`
      --> /home/ddd/go/src/github.com/kata-containers-2.0/pkg/logging/src/lib.rs:186:8
       |
   186 |     fn set_level(&self, level: slog::Level) {
       |        ^^^^^^^^^
       |
       = note: `#[warn(dead_code)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8ed61b1bb9 rust-agent: Remove useless braces
This addresses the following warning:

    warning: unnecessary braces around assigned value
        --> src/rpc.rs:1411:26
         |
    1411 |     detail.init_daemon = { unistd::getpid() == Pid::from_raw(1) };
         |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: remove these braces
         |
         = note: `#[warn(unused_braces)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
cc4f02e2b6 rust-agent: Remove unused macros
This addresses the following warnings:

   Compiling rustjail v0.1.0 (/home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail)
   warning: unused `#[macro_use]` import
     --> rustjail/src/lib.rs:15:1
      |
   15 | #[macro_use]
      | ^^^^^^^^^^^^
      |
      = note: `#[warn(unused_imports)]` on by default

   warning: unused macro definition
     --> rustjail/src/lib.rs:38:1
      |
   38 | / macro_rules! sl {
   39 | |     () => {
   40 | |         slog_scope::logger().new(o!("subsystem" => "rustjail"))
   41 | |     };
   42 | | }
      | |_^
      |
      = note: `#[warn(unused_macros)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Bo Chen
ace6f1e66e clh: Support VFIO device unplug
This patch adds the support of VFIO device unplug when using
cloud-hypervisor.

Fixes: #860

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-18 00:40:16 +08:00
Bo Chen
47cfeaaf18 clh: Remove unnecessary VmmPing
We can rely on the error handling of the actual HTTP API calls to catch
errors, and don't need to call VmmPing explicitly in advance.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-18 00:40:16 +08:00
Bo Chen
63c475786f versions: cloud-hypervisor: Bump to version 6d30fe05
The cloud-hypervisor commit `6d30fe05` introduced a fix on its API for
VFIO device hotplug (`VmAddDevice`), which is required for supporting
VFIO unplug through openAPI calls in kata.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-18 00:40:16 +08:00
Chelsea Mafrica
059b89cd03 docs: Change kata_tap0 to tap0_kata
The tap device should be named tap0_kata in architecture.md.

Fixes #797

Signed-off-by: duanquanfeng <duanquanfeng_yewu@cmss.chinamobile.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-10-18 00:40:16 +08:00
Chelsea Mafrica
4ff3ed5101 docs: update networking description
First, most people don't care about CNM. Move that out of main doc.

Second, tc-filter is the default. Let's add a bit more background on
our usage of tc-filter (and clarify why we use this instead of macvtap).

Fixes #797

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
de8dcb1549 dev-guide: update kata-agent install details
Install paths were wrong. Updated based on new agent...

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Archana Shinde
c488cc48a2 docs: Update docs for enabling agent debug console
The systemd method of adding a debug console is not really
user friendly. Since we have added a much more straightforward
method to enable agent debug console, update developer guide to
reflect this.

Fixes #834

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
e5acb1257f docs: update dev guide for agent build
Include details on setting up rust.

Fixes: #851

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Julio Montes
1bddde729b ci: add github action to test the snap
Add a GitHub action to check that the snap package is generated
correctly. This CI doesn't test the snap, it just builds it.

fixes #838

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
9517b0a933 versions: cloud-hypervisor: bump version
Use commit c54452c08a467a3e35d8d72f2a91d424e9718c57 as
version for cloud-hypervisor.
Bring openapi fix cloud-hypervisor/cloud-hypervisor#1760 to
support SGX.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
f5a7175f92 runtime: cloud-hypervisor: tag openapi-generator-cli container
Tag openapi-generator-cli container to v4.3.1, the latest stable
release. This way we get reproducible builds and the same
generated code on all systems.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
9b969bb7da packaging: fix image build script
Relative paths are error prone. Fix error.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:57:28 -07:00
Eric Ernst
fb2f3cfce2 release: Kata Containers 2.0.0-rc1
ae6ccbe8 rust-agent: Update README
3faef791 docs: drop docker installation guide
f3466b87 docs: fix static check errors in docs/install/README.md
89ec614d docs: update architecture.md
1ed73179 qemu: upgrade qemu version to 5.1.0 for arm64.
cb79dddf agent: Fix OCI Windows network shared container name typo
c50aee9d github: Remove issue template and use central one
2a4c3e6a docs: fix broken links
9e2a314e Packaging: release notes script using error kernel path urls
aed20f43 rust-agent: Replaces improper use of match for non-constant patterns
868d0248 devices: fix go test warning in manager_test.go
14164392 action: Allow long lines if non-alphabetic
2ece152c agent: remove unreachable code
033925f9 agent: Change do_exec return type to ! because it will never return
c90fff82 agent: propagate the internal detail errors to users
c0ea9102 packaging: Stop providing OBS packages
ca54edef install: Add contacts to the distribution packages
b5ece037 install: Update information about Community Packages
378e429d install: Update SUSE information
567f8587 install: Update openSUSE information
18f32d13 install: Update RHEL information
8280523c install: Update Fedora information
578db2fc install: Update CentOS information
781d6eca ci: fix clone_tests_repo function
c18c5e2c agent: Set LIBC=gnu for ppc64le arch by default
a378ba53 fc: integrate Firecracker's metrics
9991f4b5 static-build/qemu-virtiofs: Refactor apply virtiofs patches
4a0fd6c2 packaging/qemu: Add common code to apply patches
37acc030 static-build/qemu-virtiofs: Fix to apply QEMU patches
6c275c92 runtime: fix TestNewConsole UT failure
0479a4cb travis: skip static checker for ppc64
b3e52844 runtime: fix golint errors
d36d3486 agent: fix cargo fmt
e1094d7f ci: always checkout 2.0-dev of test repository
c8ba30f9 docs: fix static check errors
eaa5c433 runtime: fix make check
07caa2f2 gitignore: ignore agent service file
f34e2e66 agent: fix UT failures due to chdir
442e5906 agent: Only allow proc mount if it is procfs
f2850668 rustjail: make the mount error info much more clear
73414554 runtime: add enable_debug_console configuration item for agent
0b62f5a9 runtime: add debug console service
c23a401e runtime: Call s.newStore.Destroy if globalSandboxList.addSandbox
80879197 shimv2: add a comment in checkAndMount()
b6066cbc osbuilder: specify default toolchain version in rust-init.
1290d007 runtime: Update cloud-hypervisor client pkg to version v0.10.0
afeece42 agent/oci: Don't use deprecated Error::description() method
a4075f0f runtime: Fix linter errors in release files
01df3c1d packaging: Build from source if the clh release binary is missing
bacd41bb runtime: add podman configuration to data collection script
d9746f31 ci: use export command to export envs instead of env config item
ca2a1176 ci: use Travis cache to reduce build time
67af593a agent: update cgroups crate
cabc60f3 docs: Update the reference path of kata-deploy in the packaging
a5859197 runtime: make kata-check check for newer release
08d194b8 how-to: add privileged_without_host_devices to containerd guide
89ade8f3 travis: enable RUST_BACKTRACE
4b30001d agent/rustjail: add more unit tests
232c8213 agent/rustjail: remove makedev function
74bcd510 agent/rustjail: add unit tests for ms_move_rootfs and mask_path
a36f93c9 agent/rustjail: implement functions to chroot
fe0f2198 agent/rustjail: add unit test for pivot_rootfs
5770c2a2 agent/rustjail: implement functions to pivot_root
838b1794 agent/rustjail: add unit test for mount_cgroups
1a60c1de agent/rustjail: add unit test for init_rootfs
77ecfed2 agent/rustjail/mount: don't use unwrap
fa7079bc agent/rustjail: add tempfile crate as dependency
c23bac5c rustjail: implement functions to mount and umount files
e99f3e79 docs: Fix the kata-pkgsync tool's docs script path
d05a7cda docs: fix k8s containerd howto links
f6877fa4 docs: fix up developer guide for 2.0
6d326f21 gitignore: ignore agent version.rs
407cb9a3 agent: fix agent panic running as init
38eb1df4 packaging: use local version file for kata 2.0 in Makefile
313dfee3 docs: fix release process doc
0c4e7b21 packaging: fix release notes

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
f32a741c76 actions: add kata deploy test
Pull over kata-deploy-test from the 1.x packaging repository. This is
intended to be used for testing any changes to the kata-deploy
scripting, and does not exercise any new source code changes.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
512e79f61a packaging: cleaning, updating based on new filepaths
Update scripts to take into account some files being moved, and some
general cleanup.

Fixes: #866

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
aa70080423 packaging: remove obs-packaging
No longer required -- let's remove them.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
34015bae12 packaging: pull versions, build-image out from obs dir
These are still required; let's pull them out.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
93b60a8327 packaging: Revert "packaging: Stop providing OBS packages"
This reverts commit c0ea910273.

Two scripts are still required for release and testing, which should
have never been under obs-packaging dir in the first place.  Let's
revert, move the scripts / update references to it, and then we can
remove the remaining obs-packaging/ tooling.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Yang Bo
aa9951f2cd rust-agent: Update README
The Rust agent has not used grpc as a submodule for a while; update the
README to reflect the change.

Fixes: #196
Signed-off-by: Yang Bo <bo@hyper.sh>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
9d8c72998b docs: drop docker installation guide
We have removed CLI support, which means Docker support is dropped
for now. It also doesn't make sense to duplicate instructions for each
distribution when we can simply refer to the official Docker guide on how
to install Docker.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
033ed13202 docs: fix static check errors in docs/install/README.md
It was merged in while the static checker was disabled.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
c058d04b94 docs: update architecture.md
To match the current architecture of Kata Containers 2.0.

Fixes: #831
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Edmond AK Dantes
9d2bb0c452 qemu: upgrade qemu version to 5.1.0 for arm64.
The QEMU version used on arm64 is very old, and newer QEMU releases have
merged several new features, so it is time to upgrade. As obs-packaging
has been removed, the QEMU patch now lives under qemu/patch/5.1.x.
As vxfs has been deprecated in qemu-5.1, it no longer appears in
configuration-hyperversior.sh when the QEMU version is newer than 5.0.

Fixes: #816
Signed-off-by: Edmond AK Dantes <edmond.dantes.ak47@outlook.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
627d062fb2 agent: Fix OCI Windows network shared container name typo
Correct the typo which would break the Windows-specific OCI network
shared container name feature.

See:

- https://github.com/opencontainers/runtime-spec/blob/master/config-windows.md#network

Fixes: #685.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
96afe62576 github: Remove issue template and use central one
Remove the GitHub issue template from this repository. We already have a
central set of templates [1] that are being used so the template in this
repository is redundant.

[1] - https://github.com/kata-containers/.github/tree/master/.github/ISSUE_TEMPLATE/

Fixes: #728.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
d946016eb7 docs: fix broken links
Some sections and files were removed in a previous commit;
remove all references to them to fix the
check-markdown test.

fixes #826

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Ychau Wang
37f1a77a6a Packaging: release notes script using wrong kernel path urls
The 2.0 packaging runtime-release-notes.sh script uses the 1.x packaging
kernel URLs. Point these URLs at the 2.0 branch packaging paths.

Fixes: #829

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-06 17:54:13 -07:00
Christophe de Dinechin
450a81cc54 rust-agent: Replaces improper use of match for non-constant patterns
The code used `match` as a switch with variable patterns `ev_fd` and
`cf_fd`, but the way Rust interprets the code is that the first
pattern matches all values. The code does not perform as expected.

This addresses the following warning:

   warning: unreachable pattern
      --> rustjail/src/cgroups/notifier.rs:114:21
       |
   107 |                     ev_fd => {
       |                     ----- matches any value
   ...
   114 |                     cg_fd => {
       |                     ^^^^^ unreachable pattern
       |
       = note: `#[warn(unreachable_patterns)]` on by default
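
A minimal sketch of the pitfall, not the actual notifier code: in Rust a
bare identifier in a `match` arm is a new binding, not a comparison with
an existing variable. Only the names `ev_fd` and `cg_fd` come from the
warning above; the functions and return strings are hypothetical.

```rust
/// Buggy form: `ev_fd` in the first arm is a fresh binding that shadows
/// the parameter, so that arm matches every value of `fd`.
#[allow(unreachable_patterns, unused_variables)]
fn classify_buggy(fd: i32, ev_fd: i32, cg_fd: i32) -> &'static str {
    match fd {
        ev_fd => "event",
        cg_fd => "cgroup", // never reached
        _ => "other",      // never reached
    }
}

/// Fixed form: match guards compare `fd` against the runtime values.
fn classify(fd: i32, ev_fd: i32, cg_fd: i32) -> &'static str {
    match fd {
        x if x == ev_fd => "event",
        x if x == cg_fd => "cgroup",
        _ => "other",
    }
}
```

An `if`/`else if` chain expresses the same comparison and may read more
naturally when there are only two descriptors to tell apart.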

Fixes: #750
Fixes: #793

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-06 17:54:13 -07:00
zhanghj
c09f02e6f6 devices: fix go test warning in manager_test.go
Create "class" and "config" files in the temporary device BDF dir,
and remove the dir created by ioutil.TempDir() when the test finishes.

fixes: #746

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
58c7469110 action: Allow long lines if non-alphabetic
Overly long commit lines are annoying. But sometimes,
we need to be able to force the use of long lines
(for example to reference a URL).

Ironically, I can't refer to the URL that explains this
because of ... the long line check! Hence:

```sh
$ cat <<EOT | tr -d '\n'; echo
See: https://github.com/kata-containers/tests/tree/master/
cmd/checkcommits#handling-long-lines
EOT
```

Maximum body length updated to 150 bytes for parity with:

https://github.com/kata-containers/tests/pull/2848

Fixes: #687.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Tim Zhang
c36ea0968d agent: remove unreachable code
The code at the end of init_child is unreachable and needs to be removed.
The code after do_exec is unreachable and needs to be removed.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-06 17:54:13 -07:00
Tim Zhang
ba197302e2 agent: Change do_exec return type to ! because it will never return
Indicates unreachable code.
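
A minimal sketch of what the `!` return type buys, assuming a
hypothetical `die` helper in place of the agent's `do_exec`: the
compiler knows the call never returns, so callers need no dummy value
and code placed after the call is reported as unreachable.

```rust
use std::process;

/// Hypothetical stand-in for a function that ends or replaces the
/// current process. The `!` return type marks it as diverging.
fn die(code: i32) -> ! {
    process::exit(code)
}

/// Because `die` has type `!`, the `None` arm type-checks without
/// producing a `u32`.
fn value_or_die(v: Option<u32>) -> u32 {
    match v {
        Some(n) => n,
        None => die(1),
    }
}
```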

Fixes #819

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-06 17:54:13 -07:00
fupan.lfp
725ad067c1 agent: propagate the internal detail errors to users
Detailed errors should be propagated to users when
an RPC call fails.

Fixes: #824

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
9858c23c59 packaging: Stop providing OBS packages
The community has discussed this and decided to promote
kata-deploy as the way of distributing and using Kata on distros that
do not officially maintain packages for the project.

Fixes: #623
Fixes: https://github.com/kata-containers/packaging/issues/1120

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
fc8f1ff03c install: Add contacts to the distribution packages
Let's add a new column to the Official packages table, and let the
maintainers of the official distro packages jump in and add their
names there.

This will help us ping the right people and redirect to them any issues
reported against the official packages.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
f7b4f76082 install: Update information about Community Packages
Kata Containers will stop distributing the community packages in favour
of kata-deploy.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
4fd66fa689 install: Update SUSE information
Following up a conversation with Ralf Haferkamp, we can safely drop the
instructions for using Kata Containers on SLES 12 SP3 in favour of using
the official builds provided for SLE 15 SP1, and SLE 15 SP2.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
e6ff42b8ad install: Update openSUSE information
Let's update the openSUSE Installation Guide to reflect the current
information on how to install kata packages provided by the distro
itself.

The official packages are present on Leap 15.2 and Tumbleweed, and can
be just installed. Leap 15.1 is slightly different, as the .repo file
has to be added before the packages can be installed.

Leap 15.0 has been removed as it already reached its EOL.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
6710d87c6a install: Update RHEL information
Although the community packages are present for RHEL, everything about
them is extremely unsupported on the Red Hat side.

Knowing this, we'd be better off simply not mentioning those and, if users
really want to try kata-containers on RHEL, they can simply follow the
CentOS installation guide.

In the future, if the Fedora packages make their way to RHEL, we can add
the information here. However, if we're recommending something
unsupported, we'd be better off recommending kata-deploy instead.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
178b79f122 install: Update Fedora information
Let's update the Fedora Installation Guide to reflect the current
information on how to install kata packages provided by the distro
itself.

These are official packages and we, as Fedora members, recommend using
kata-containers on Fedora 32 and onwards, as from this version
everything works out-of-the-box. Also, Fedora 31 will reach its EOL as
soon as Fedora 33 is out, which should happen in October.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
bc545c6549 install: Update CentOS information
Let's update the CentOS Installation Guide to reflect the current
information on how to install kata packages provided by the
Virtualization Special Interest Group.

These are not official CentOS packages, as those are not coming from Red
Hat Enterprise Linux. These are the same packages we have on Fedora and
we have decided to keep them up-to-date and sync'ed on both Fedora and
CentOS, so people can give Kata Containers a try also on CentOS.

The nature of these packages makes me think that those are "as official
as they can be", so that's the reason I've decided to add the
instructions to the "official" table.

Together with the change in the Installation Guide, let's also update
the README and reflect the fact we **strongly recommend** using CentOS
8, with the packages provided by the Virtualization Special Interest
Group, instead of using the CentOS 7 with packages built on OBS.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Salvador Fuentes
585481990a ci: fix clone_tests_repo function
We should not checkout to 2.0-dev branch in the clone_tests_repo
function when running in Jenkins CI as it discards changes from
tests repo.

Fixes: #818.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2020-10-06 17:54:13 -07:00
Pradipta Kr. Banerjee
0057f86cfa agent: Set LIBC=gnu for ppc64le arch by default
Fixes: #812

Signed-off-by: Pradipta Kr. Banerjee <pradipta.banerjee@gmail.com>
2020-10-06 17:54:13 -07:00
bin liu
fa0401793f fc: integrate Firecracker's metrics
Firecracker exposes metrics through a FIFO file
in JSON format. This PR parses
Firecracker's metrics and converts them to Prometheus metrics.

Fixes: #472

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
Wainer dos Santos Moschetta
60b7265961 static-build/qemu-virtiofs: Refactor apply virtiofs patches
In static-build/qemu-virtiofs/Dockerfile the code which
applies the virtiofs specific patches is spread in several
RUN instructions. Refactor this code so that it runs in a
single RUN and produces a single overlay image.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-10-06 17:54:13 -07:00
Wainer dos Santos Moschetta
57b53dbae8 packaging/qemu: Add common code to apply patches
The qemu and qemu-virtiofs Dockerfile files repeat the code to apply
patches based on the QEMU stable branch being built. Instead, this adds
a common script (qemu/apply_patches.sh) and calls it from the
respective Dockerfile files.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-10-06 17:54:13 -07:00
Wainer dos Santos Moschetta
ddf1a545d1 static-build/qemu-virtiofs: Fix to apply QEMU patches
Fix a bug in the qemu-virtiofs Dockerfile which ends up not applying
the QEMU patches.

Fixes #786

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-10-06 17:54:13 -07:00
Peng Tao
cbdf6400ae runtime: fix TestNewConsole UT failure
It needs root.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
ceeecf9c66 travis: skip static checker for ppc64
As we have already run it on x64.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
7c53baea8a runtime: fix golint errors
Need to run gofmt -s on them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
b549d354bf agent: fix cargo fmt
Otherwise travis fails.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
9f3113e1f6 ci: always checkout 2.0-dev of test repository
We use 2.0-dev in the tests repository now. Always make sure
we use the right branch.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
ef94742320 docs: fix static check errors
Static checks have not been running for a long time,
and that ended up with a lot of errors.

* Ensure debug options are valid is dropped
* fix snap links
* drop extra CONTRIBUTING.md
* reference kata-pkgsync
* move CODEOWNERS to proper place
* remove extra CODE_OF_CONDUCT.md.
* fix spell checker error on Developer-Guide.md

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
d71764985d runtime: fix make check
Need to use the correct script path.

Fixes: #802
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
0fc04a269d gitignore: ignore agent service file
As it is auto-generated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
8d7ac5f01c agent: fix UT failures due to chdir
Current working directory is a process level resource. We cannot call
chdir in parallel from multiple threads, which would cause cwd confusion
and result in UT failures.

The agent code itself is correct that chdir is only called from spawned
child init process. Well, there is one exception that it is also called
in do_create_container() but it is safe to assume that containers are
never created in parallel (at least for now).
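
One way to keep tests that need `chdir` from racing is sketched below.
It is an illustration only, not the change made in this commit:
`CWD_LOCK` and `with_cwd` are hypothetical names, and a `const`
`Mutex::new` assumes Rust 1.63 or newer.

```rust
use std::env;
use std::io;
use std::path::Path;
use std::sync::Mutex;

/// The working directory is shared by every thread in the process, so
/// all code that changes it takes this one lock first.
static CWD_LOCK: Mutex<()> = Mutex::new(());

/// Run `f` with the working directory set to `dir`, then restore the
/// previous directory. If `f` panics, the directory is not restored.
fn with_cwd<T>(dir: &Path, f: impl FnOnce() -> T) -> io::Result<T> {
    // A poisoned lock only means another holder panicked; keep going.
    let _guard = CWD_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let saved = env::current_dir()?;
    env::set_current_dir(dir)?;
    let out = f();
    env::set_current_dir(saved)?;
    Ok(out)
}
```

The lock only helps if every test that touches the working directory
goes through `with_cwd`; a stray direct `set_current_dir` call still
races.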

Fixes: #782
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
fupan.lfp
612acbe319 agent: Only allow proc mount if it is procfs
This allows only whitelisted files to be bind mounted under /proc
and prevents other, potentially malicious mounts onto procfs.
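
A rough sketch of such a check, under stated assumptions: the list
below is illustrative and may not match the agent's real whitelist,
`proc_mount_allowed` is a hypothetical name, and the destination is
assumed to be already normalized (no `..` components).

```rust
use std::path::Path;

/// Illustrative list only; the agent's real list may differ.
const ALLOWED_PROC_FILES: &[&str] = &["/proc/cpuinfo", "/proc/meminfo", "/proc/uptime"];

/// Decide whether a mount onto `dest` with filesystem type `fstype`
/// passes the /proc check.
fn proc_mount_allowed(dest: &str, fstype: &str) -> bool {
    let dest = Path::new(dest);
    // Component-wise prefix test, so "/process" is not treated as /proc.
    if !dest.starts_with("/proc") {
        return true; // outside /proc, this check does not apply
    }
    if dest == Path::new("/proc") {
        return fstype == "proc"; // /proc itself must be a real procfs
    }
    ALLOWED_PROC_FILES.iter().any(|p| dest == Path::new(p))
}
```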

Fixes: #807

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
fupan.lfp
f3a487cd41 rustjail: make the mount error info much more clear
Make the error message for an invalid mount destination much
clearer.

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
bin liu
3a559521d1 runtime: add enable_debug_console configuration item for agent
When enable_debug_console=true is set in Kata's configuration file,
the runtime passes `agent.debug_console`
and `agent.debug_console_vport=1026` to the agent.
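
The mapping from the configuration item to the agent options can be
sketched as below. The runtime itself is written in Go; this Rust
function and its name are hypothetical, and only the two option strings
come from the commit message.

```rust
/// Return the agent options implied by `enable_debug_console`.
/// Hypothetical helper; the option strings are quoted from the commit.
fn debug_console_params(enable_debug_console: bool) -> Vec<String> {
    if !enable_debug_console {
        return Vec::new();
    }
    vec![
        "agent.debug_console".to_string(),
        "agent.debug_console_vport=1026".to_string(),
    ]
}
```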

Fixes: #245

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
bin liu
567daf5a42 runtime: add debug console service
Add `kata-runtime exec` to enter the guest OS
through a shell started by the agent

Fixes: #245

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
Shukui Yang
c7d913f436 runtime: Call s.newStore.Destroy if globalSandboxList.addSandbox
Fixes: #696

Signed-off-by: Shukui Yang <keloyangsk@gmail.com>
2020-10-06 17:54:13 -07:00
Qian Cai
7bd410c725 shimv2: add a comment in checkAndMount()
In checkAndMount(), it is not clear why we check IsBlockDevice() and if
DisableBlockDeviceUse == false and then only return "false, nil" instead
of "false, err". Adding a comment to make it a bit more readable.

Fixes: #732
Signed-off-by: Qian Cai <cai@redhat.com>
2020-10-06 17:54:13 -07:00
zhanghj
7fbc789855 osbuilder: specify default toolchain version in rust-init.
Specify default toolchain version in rust-init.

Fixes: #799

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2020-10-06 17:54:13 -07:00
Bo Chen
7fc41a771a runtime: Update cloud-hypervisor client pkg to version v0.10.0
The latest release of cloud-hypervisor v0.10.0 contains the following
updates: 1) `virtio-block` Support for Multiple Descriptors; 2) Memory
Zones; 3) `Seccomp` Sandbox Improvements; 4) Preliminary KVM HyperV
Emulation Control; 5) various bug fixes and refactoring.

Note that this patch updates the client code of clh's HTTP API in kata,
while the 'versions.yaml' file was updated in an earlier PR.

Fixes: #789

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-06 17:54:13 -07:00
David Gibson
a31d82fec2 agent/oci: Don't use deprecated Error::description() method
We shouldn't use it, and we don't need to implement it.

fixes #791

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
9ef4c80340 runtime: Fix linter errors in release files
Fix the linter errors caught in the `runtime` repo's `master` branch [1],
but not in the `2.0-dev` branch [2]. See [3] for further details.

[1] - https://github.com/kata-containers/runtime/pull/2976
[2] - https://github.com/kata-containers/kata-containers/pull/735
[3] - https://github.com/kata-containers/tests/issues/2870

Fixes: #783.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Bo Chen
6a4e413758 packaging: Build from source if the clh release binary is missing
This patch adds a fall-back code path that builds the cloud-hypervisor
static binary from source when downloading the cloud-hypervisor binary
fails. This is useful when we experience network issues, and also
for upgrading clh to a non-released version.

Together with the changes in the tests repo
(https://github.com/kata-containers/tests/pull/2862), the Jenkins config
file is also updated with new Execute shell script for the clh CI in the
kata-containers repo. Those two changes fix the regression on clh CI
here. Please check details in the issue below.

Fixes: #781
Fixes: https://github.com/kata-containers/tests/issues/2858

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-06 17:54:13 -07:00
Francesco Giudici
678d4d189d runtime: add podman configuration to data collection script
Be more verbose about podman configuration in the output of the data
collection script: get the system configuration as seen by podman and
dump the configuration files when present.

Fixes: #243
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2020-10-06 17:54:13 -07:00
bin liu
718f718764 ci: use export command to export envs instead of env config item
Config item env is used as a Matrix Expansion key, so these envs
will export to build jobs individually.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
bin liu
d860ded3f0 ci: use Travis cache to reduce build time
This PR includes these changes:
- use Rust installed by Travis
- install x86_64-unknown-linux-musl
- install rustfmt
- use Travis cache
- delete ci/install_vc.sh

Fixes: #748

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
fupan.lfp
a141da8a20 agent: update cgroups crate
Update cgroups crate to fix the building issue
on Aarch64.

Fixes: #770

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
Ychau Wang
aaaaee7a4b docs: Update the reference path of kata-deploy in the packaging
Use the relative path of kata-deploy to replace the 1.x packaging URL in
the kata-deploy/README.md file. This fixes the path issue caused by
creating the new branch.

Fixes: #777

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
21efaf1fca runtime: make kata-check check for newer release
Update `kata-check` to see if there is a newer version available for
download. Useful for users installing static packages (without a package
manager).

Fixes: #734.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Peng Tao
2056623e13 how-to: add privileged_without_host_devices to containerd guide
It should be set by default for Kata containers working with containerd.

Fixes: #775
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Julio Montes
34126ee704 travis: enable RUST_BACKTRACE
RUST_BACKTRACE=1 will help us a lot to debug unit tests when
a test is failing

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
980a338454 agent/rustjail: add more unit tests
Add unit tests for finish_root, read_only_path and mknod_dev
increasing code coverage of mount.rs

fixes #284

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
e14f766895 agent/rustjail: remove makedev function
remove `makedev` function, use `nix`'s implementation instead

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
2e0731f479 agent/rustjail: add unit tests for ms_move_rootfs and mask_path
Increase code coverage of mount.rs

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
addf62087c agent/rustjail: implement functions to chroot
Use conditional compilation (#[cfg]) to change chroot behaviour
at compilation time. For example, such function will just return
`Ok(())` when the unit tests are being compiled, otherwise real
chroot operation is performed.
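
The pattern can be sketched as follows. This is an illustration, not
the rustjail code: it uses `std::os::unix::fs::chroot` (Rust 1.56 or
newer, Unix only) as the real operation, and `enter_rootfs` is a
hypothetical caller.

```rust
use std::io;
use std::path::Path;

/// Real implementation, compiled into normal builds.
#[cfg(not(test))]
fn chroot(path: &Path) -> io::Result<()> {
    std::os::unix::fs::chroot(path)
}

/// Stub compiled only when building unit tests: it needs no privileges
/// and leaves the process untouched.
#[cfg(test)]
fn chroot(_path: &Path) -> io::Result<()> {
    Ok(())
}

/// Hypothetical caller: the call site is identical in both builds.
fn enter_rootfs(rootfs: &Path) -> io::Result<()> {
    chroot(rootfs)?;
    std::env::set_current_dir("/")
}
```

The trade-off is that the real system call is never exercised by unit
tests, so it still needs coverage from integration tests run with
sufficient privileges.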

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
c24b68dc4f agent/rustjail: add unit test for pivot_rootfs
Add unit test for pivot_rootfs increasing the code coverage of
mount.rs

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
24677d7484 agent/rustjail: implement functions to pivot_root
Use conditional compilation (#[cfg]) to change pivot_root behaviour
at compilation time. For example, such function will just return
`Ok(())` when the unit tests are being compiled, otherwise real
pivot_root operation is performed.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
9e74c28158 agent/rustjail: add unit test for mount_cgroups
Add a unit test for `mount_cgroups` increasing the code coverage
of mount.rs from 44% to 52%

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
b7aae33cc1 agent/rustjail: add unit test for init_rootfs
Add a unit test for `init_rootfs` increasing the code coverage
of mount.rs from 0% to 44%.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
6d9d58278e agent/rustjail/mount: don't use unwrap
Don't use unwrap in `init_rootfs`; return an Error instead. This way
we can write unit tests that don't panic.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
1bc6fbda8c agent/rustjail: add tempfile crate as dependency
Add the tempfile crate as a dependency; it will be used in the following
commits to create temporary directories for unit testing.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
d39f5a85e6 rustjail: implement functions to mount and umount files
Use conditional compilation (#[cfg]) to change mount and umount
behaviours at compilation time. For example, such functions will just
return `Ok(())` when the unit tests are being compiled, otherwise real
mount and umount operations are performed.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Ychau Wang
d90a0eefbe docs: Fix the kata-pkgsync tool's docs script path
Fix the kata-pkgsync tool's docs, change the download path of the
packaging tool in 2.0 release.

Fixes: #773

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-06 17:54:13 -07:00
Peng Tao
2618c014a0 docs: fix k8s containerd howto links
It should point to the internal versions.yaml file.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
5c4878f37e docs: fix up developer guide for 2.0
1. Until we restore docker/moby support, we should use crictl as
developer example.
2. Most of the hyperlinks should point to kata-containers repository.
3. There is no more standalone mode.

Fixes: #767
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
bd6b169e98 gitignore: ignore agent version.rs
It is auto-generated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
5770336572 agent: fix agent panic running as init
We should mount procfs before trying to parse kernel command lines.

Fixes: #771
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
zhanghj
45daec7b37 packaging: use local version file for kata 2.0 in Makefile
Use local version file instead of downloading from upstream repo.

Fixes: #756

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2020-10-06 17:54:13 -07:00
Peng Tao
ed5a7dc022 docs: fix release process doc
We no longer build OBS packages. And we use
kata-containers/tools/packaging/release to do release.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
6fc7c77721 packaging: fix release notes
Should mention the 2.0 branch docs.

Fixes: #763
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
1007 changed files with 41926 additions and 92960 deletions

View File

@@ -10,7 +10,7 @@ env:
error_msg: |+
See the document below for help on formatting commits for the project.
https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-format
https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-forma
jobs:
commit-message-check:

View File

@@ -1,12 +1,7 @@
on:
issue_comment:
types: [created, edited]
on: issue_comment
name: test-kata-deploy
jobs:
check_comments:
if: ${{ github.event.issue.pull_request }}
runs-on: ubuntu-latest
steps:
- name: Check for Command
@@ -14,7 +9,7 @@ jobs:
uses: kata-containers/slash-command-action@v1
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
command: "test_kata_deploy"
command: "test-kata-deploy"
reaction: "true"
reaction-type: "eyes"
allow-edits: "false"
@@ -22,7 +17,6 @@ jobs:
- name: verify command arg is kata-deploy
run: |
echo "The command was '${{ steps.command.outputs.command-name }}' with arguments '${{ steps.command.outputs.command-arguments }}'"
create-and-test-container:
needs: check_comments
runs-on: ubuntu-latest
@@ -33,26 +27,22 @@ jobs:
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- name: check out
uses: actions/checkout@v2
- uses: actions/checkout@v2-beta
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: build-container-image
id: build-container-image
run: |
PR_SHA=$(git log --format=format:%H -n1)
VERSION="2.0.0"
VERSION=$(curl https://raw.githubusercontent.com/kata-containers/kata-containers/2.0-dev/VERSION)
ARTIFACT_URL="https://github.com/kata-containers/kata-containers/releases/download/${VERSION}/kata-static-${VERSION}-x86_64.tar.xz"
wget "${ARTIFACT_URL}" -O tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:${PR_SHA} ./tools/packaging/kata-deploy
wget "${ARTIFACT_URL}" -O ./kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:${PR_SHA} ./kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$PR_SHA
echo "##[set-output name=pr-sha;]${PR_SHA}"
- name: test-kata-deploy-ci-in-aks
uses: ./tools/packaging/kata-deploy/action
uses: ./kata-deploy/action
with:
packaging-sha: ${{ steps.build-container-image.outputs.pr-sha }}
env:

View File

@@ -43,7 +43,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
@@ -71,7 +71,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
@@ -97,12 +97,65 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-qemu.tar.gz
build-nemu:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_nemu"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-nemu
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-nemu.tar.gz
# Job for building the QEMU binaries with virtiofs support
build-qemu-virtiofsd:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu_virtiofsd"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-qemu-virtiofsd
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-qemu-virtiofsd.tar.gz
# Job for building the image
build-image:
runs-on: ubuntu-16.04
@@ -124,7 +177,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
@@ -151,7 +204,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
@@ -178,7 +231,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
@@ -205,7 +258,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
@@ -213,7 +266,7 @@ jobs:
gather-artifacts:
runs-on: ubuntu-16.04
needs: [build-experimental-kernel, build-kernel, build-qemu, build-image, build-firecracker, build-kata-components, build-clh]
needs: [build-experimental-kernel, build-kernel, build-qemu, build-qemu-virtiofsd, build-image, build-firecracker, build-kata-components, build-nemu, build-clh]
steps:
- uses: actions/checkout@v1
- name: get-artifacts


@@ -44,7 +44,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -72,7 +72,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -98,12 +98,38 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-qemu.tar.gz
build-qemu-virtiofsd:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu_virtiofsd"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-qemu-virtiofsd
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-qemu-virtiofsd.tar.gz
build-image:
runs-on: ubuntu-16.04
needs: get-artifact-list
@@ -124,7 +150,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -150,7 +176,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -177,7 +203,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -203,7 +229,7 @@ jobs:
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -211,7 +237,7 @@ jobs:
gather-artifacts:
runs-on: ubuntu-16.04
needs: [build-experimental-kernel, build-kernel, build-qemu, build-image, build-firecracker, build-kata-components, build-clh]
needs: [build-experimental-kernel, build-kernel, build-qemu, build-qemu-virtiofsd, build-image, build-firecracker, build-kata-components, build-clh]
steps:
- uses: actions/checkout@v2
- name: get-artifacts


@@ -6,9 +6,6 @@
name: Ensure PR has required porting labels
on:
pull_request:
branches:
- main
pull_request_target:
types:
- opened


@@ -19,10 +19,10 @@ jobs:
run: |
sudo apt-get install -y git git-extras
kata_url="https://github.com/kata-containers/kata-containers"
latest_version=$(git ls-remote --tags ${kata_url} | egrep -o "refs.*" | egrep -v "\-alpha|\-rc|{}" | egrep -o "[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" | sort -V -r | head -1)
latest_version=$(git ls-remote --tags ${kata_url} | egrep -o "refs.*" | egrep -o "[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" | sort -V -r -u | head -1)
current_version="$(echo ${GITHUB_REF} | cut -d/ -f3)"
# Check semantic versioning format (x.y.z) and if the current tag is the latest tag
if echo "${current_version}" | grep -q "^[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" && echo -e "$latest_version\n$current_version" | sort -C -V; then
# Check if the current tag is the latest tag
if echo -e "$latest_version\n$current_version" | sort -C -V; then
# Current version is the latest version, build it
snapcraft -d snap --destructive-mode
fi
@@ -33,5 +33,5 @@ jobs:
snap_file="kata-containers_${snap_version}_amd64.snap"
# Upload the snap if it exists
if [ -f ${snap_file} ]; then
snapcraft upload --release=stable ${snap_file}
snapcraft upload --release=candidate ${snap_file}
fi
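
The tag-selection and "is this the latest tag" logic in the hunk above can be sketched as two small functions. This is a minimal sketch, not the workflow's actual code: the function names `latest_stable_version` and `is_releasable` are hypothetical, and it assumes GNU `sort` (for `-V` and `-C`) and GNU `grep`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `git ls-remote --tags` style refs on stdin and
# print the highest x.y.z tag, ignoring -alpha/-rc tags and peeled "^{}" refs.
latest_stable_version() {
    grep -vF -e '-alpha' -e '-rc' -e '^{}' \
        | grep -oE '[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+' \
        | sort -V -r \
        | head -1
}

# Hypothetical helper: succeed only when CURRENT is a plain x.y.z version
# that sorts at or after LATEST in version order.
is_releasable() {
    local latest="$1" current="$2"
    # Reject anything that is not strictly x.y.z (for example 2.1.0-rc0).
    echo "${current}" | grep -qE '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+$' || return 1
    # `sort -C` exits 0 when its input is already sorted; with -V the
    # comparison is by version, so 2.0.9 sorts before 2.0.10.
    printf '%s\n%s\n' "${latest}" "${current}" | sort -C -V
}
```

Version ordering is the point of `-V`: a plain lexical sort would place `2.0.10` before `2.0.9` and pick the wrong "latest" tag.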


@@ -1,5 +1,15 @@
name: snap CI
on: ["pull_request"]
on:
pull_request:
paths:
- "**/Makefile"
- "**/*.go"
- "**/*.mk"
- "**/*.rs"
- "**/*.sh"
- "**/*.toml"
- "**/*.yaml"
- "**/*.yml"
jobs:
test:
runs-on: ubuntu-20.04


@@ -1,66 +0,0 @@
on: ["pull_request"]
name: Static checks
jobs:
test:
strategy:
matrix:
go-version: [1.13.x, 1.14.x, 1.15.x]
os: [ubuntu-20.04]
runs-on: ${{ matrix.os }}
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${TRAVIS_BRANCH}
steps:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: ${{ matrix.go-version }}
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
uses: actions/checkout@v2
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Building rust
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
# Must build before static checks as we depend on some generated code in runtime and agent
- name: Build
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make
- name: Static Checks
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/static-checks.sh
- name: Run Compiler Checks
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make check
- name: Run Unit Tests
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make test

2
.gitignore vendored

@@ -1,10 +1,8 @@
**/*.bk
**/*~
**/*.orig
**/*.rej
**/target
**/.vscode
pkg/logging/Cargo.lock
src/agent/src/version.rs
src/agent/kata-agent.service
src/agent/protocols/src/*.rs

62
.travis.yml Normal file

@@ -0,0 +1,62 @@
# Copyright (c) 2019 Ant Financial
#
# SPDX-License-Identifier: Apache-2.0
#
dist: bionic
os: linux
# set cache directories manually, because
# we are using a non-standard directory struct
# cargo root is in src/agent
#
# If needed, caches can be cleared
# by ways documented in
# https://docs.travis-ci.com/user/caching#clearing-caches
language: rust
rust:
- 1.44.1
cache:
cargo: true
directories:
- src/agent/target
before_install:
- git remote set-branches --add origin "${TRAVIS_BRANCH}"
- git fetch
- export RUST_BACKTRACE=1
- export target_branch=$TRAVIS_BRANCH
- "ci/setup.sh"
# we use install to run check agent
# so that it is easy to skip for non-amd64 platform
install:
- export PATH=$PATH:"$HOME/.cargo/bin"
- export RUST_AGENT=yes
- rustup target add x86_64-unknown-linux-musl
- sudo ln -sf /usr/bin/g++ /bin/musl-g++
- rustup component add rustfmt
- make -C ${TRAVIS_BUILD_DIR}/src/agent
- make -C ${TRAVIS_BUILD_DIR}/src/agent check
- sudo -E PATH=$PATH make -C ${TRAVIS_BUILD_DIR}/src/agent check
before_script:
- "ci/install_go.sh"
- make -C ${TRAVIS_BUILD_DIR}/src/runtime
- make -C ${TRAVIS_BUILD_DIR}/src/runtime test
- sudo -E PATH=$PATH GOPATH=$GOPATH make -C ${TRAVIS_BUILD_DIR}/src/runtime test
script:
- "ci/static-checks.sh"
jobs:
include:
- name: x86_64 test
os: linux
- name: ppc64le test
os: linux-ppc64le
install: skip
script: skip
allow_failures:
- name: ppc64le test
fast_finish: true

201
README.md

@@ -2,143 +2,130 @@
# Kata Containers
* [Kata Containers](#kata-containers)
* [Introduction](#introduction)
* [Getting started](#getting-started)
* [Documentation](#documentation)
* [Raising issues](#raising-issues)
* [Kata Containers repositories](#kata-containers-repositories)
* [Code Repositories](#code-repositories)
* [Kata Containers-developed components](#kata-containers-developed-components)
* [Agent](#agent)
* [KSM throttler](#ksm-throttler)
* [Runtime](#runtime)
* [Trace forwarder](#trace-forwarder)
* [Additional](#additional)
* [Kernel](#kernel)
* [CI](#ci)
* [Community](#community)
* [Getting help](#getting-help)
* [Raising issues](#raising-issues)
* [Kata Containers 1.x versions](#kata-containers-1x-versions)
* [Developers](#developers)
* [Components](#components)
* [Kata Containers 1.x components](#kata-containers-1x-components)
* [Common repositories](#common-repositories)
* [Packaging and releases](#packaging-and-releases)
* [Documentation](#documentation)
* [Packaging](#packaging)
* [Test code](#test-code)
* [Utilities](#utilities)
* [OS builder](#os-builder)
* [Web content](#web-content)
---
Welcome to Kata Containers!
This repository is the home of the Kata Containers code for the 2.0 and newer
releases.
The purpose of this repository is to act as a "top level" site for the project. Specifically it is used:
If you want to learn about Kata Containers, visit the main
[Kata Containers website](https://katacontainers.io).
- To provide a list of the various *other* [Kata Containers repositories](#kata-containers-repositories),
along with a brief explanation of their purpose.
For further details on the older (first generation) Kata Containers 1.x
versions, see the
[Kata Containers 1.x components](#kata-containers-1x-components)
section.
- To provide a general area for [Raising Issues](#raising-issues).
## Introduction
## Raising issues
Kata Containers is an open source project and community working to build a
standard implementation of lightweight Virtual Machines (VMs) that feel and
perform like containers, but provide the workload isolation and security
advantages of VMs.
This repository is used for [raising
issues](https://github.com/kata-containers/kata-containers/issues/new):
## Getting started
- That might affect multiple code repositories.
See the [installation documentation](docs/install).
## Documentation
See the [official documentation](docs)
(including [installation guides](docs/install),
[the developer guide](docs/Developer-Guide.md),
[design documents](docs/design) and more).
## Community
To learn more about the project, its community and governance, see the
[community repository](https://github.com/kata-containers/community). This is
the first place to go if you wish to contribute to the project.
## Getting help
See the [community](#community) section for ways to contact us.
### Raising issues
Please raise an issue
[in this repository](https://github.com/kata-containers/kata-containers/issues).
- Where the raiser is unsure which repositories are affected.
> **Note:**
> If you are reporting a security issue, please follow the [vulnerability reporting process](https://github.com/kata-containers/community#vulnerability-handling)
>
> - If an issue affects only a single component, it should be raised in that
> component's repository.
#### Kata Containers 1.x versions
## Kata Containers repositories
For older Kata Containers 1.x releases, please raise an issue in the
[Kata Containers 1.x component repository](#kata-containers-1x-components)
that seems most appropriate.
### CI
If in doubt, raise an issue
[in the Kata Containers 1.x runtime repository](https://github.com/kata-containers/runtime/issues).
The [CI](https://github.com/kata-containers/ci) repository stores the Continuous
Integration (CI) system configuration information.
## Developers
### Community
### Components
The [Community](https://github.com/kata-containers/community) repository is
the first place to go if you want to use or contribute to the project.
| Component | Type | Description |
|-|-|-|
| [agent-ctl](tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images for the hypervisor. |
| [packaging](tools/packaging) | infrastructure | Scripts and metadata for producing packaged binaries<br/>(components, hypervisors, kernel and rootfs). |
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [trace-forwarder](src/trace-forwarder) | utility | Agent tracing helper. |
### Code Repositories
#### Kata Containers 1.x components
#### Kata Containers-developed components
For the first generation of Kata Containers (1.x versions), each component was
kept in a separate repository.
##### Agent
For information on the Kata Containers 1.x releases, see the
[Kata Containers 1.x releases page](https://github.com/kata-containers/runtime/releases).
The [`kata-agent`](src/agent/README.md) runs inside the
virtual machine and sets up the container environment.
For further information on particular Kata Containers 1.x components, see the
individual component repositories:
##### KSM throttler
| Component | Type | Description |
|-|-|-|
| [agent](https://github.com/kata-containers/agent) | core | See [components](#components). |
| [documentation](https://github.com/kata-containers/documentation) | documentation | |
| [KSM throttler](https://github.com/kata-containers/ksm-throttler) | optional core | Daemon that monitors containers and deduplicates memory to maximize container density on the host. |
| [osbuilder](https://github.com/kata-containers/osbuilder) | infrastructure | See [components](#components). |
| [packaging](https://github.com/kata-containers/packaging) | infrastructure | See [components](#components). |
| [proxy](https://github.com/kata-containers/proxy) | core | Multiplexes communications between the shims, agent and runtime. |
| [runtime](https://github.com/kata-containers/runtime) | core | See [components](#components). |
| [shim](https://github.com/kata-containers/shim) | core | Handles standard I/O and signals on behalf of the container process. |
The [`kata-ksm-throttler`](https://github.com/kata-containers/ksm-throttler)
is an optional utility that monitors containers and deduplicates memory to
maximize container density on a host.
> **Note:**
>
> - There are more components for the original Kata Containers 1.x implementation.
> - The current implementation simplifies the design significantly:
> compare the [current](docs/design/architecture.md) and
> [previous generation](https://github.com/kata-containers/documentation/blob/master/design/architecture.md)
> designs.
##### Runtime
### Common repositories
The [`kata-runtime`](src/runtime/README.md) is usually
invoked by a container manager and provides high-level verbs to manage
containers.
The following repositories are used by both the current and first generation Kata Containers implementations:
##### Trace forwarder
| Component | Description | Current | First generation | Notes |
|-|-|-|-|-|
| CI | Continuous Integration configuration files and scripts. | [Kata 2.x](https://github.com/kata-containers/ci/tree/main) | [Kata 1.x](https://github.com/kata-containers/ci/tree/master) | |
| kernel | The Linux kernel used by the hypervisor to boot the guest image. | [Kata 2.x][kernel] | [Kata 1.x][kernel] | Patches are stored in the packaging component. |
| tests | Test code. | [Kata 2.x](https://github.com/kata-containers/tests/tree/main) | [Kata 1.x](https://github.com/kata-containers/tests/tree/master) | Excludes unit tests which live with the main code. |
| www.katacontainers.io | Contains the source for the [main web site](https://www.katacontainers.io). | [Kata 2.x][github-katacontainers.io] | [Kata 1.x][github-katacontainers.io] | |
The [`kata-trace-forwarder`](src/trace-forwarder) is a component only used
when tracing the [agent](#agent) process.
### Packaging and releases
#### Additional
Kata Containers is now
[available natively for most distributions](docs/install/README.md#packaged-installation-methods).
However, packaging scripts and metadata are still used to generate snap and GitHub releases. See
the [components](#components) section for further details.
##### Kernel
---
The hypervisor uses a [Linux\* kernel](https://github.com/kata-containers/linux) to boot the guest image.
[kernel]: https://www.kernel.org
[github-katacontainers.io]: https://github.com/kata-containers/www.katacontainers.io
### Documentation
The [docs](docs/README.md) directory holds documentation common to all code components.
### Packaging
We use the [packaging](tools/packaging/README.md) scripts to create packages for the [system
components](#kata-containers-developed-components) including
[rootfs](#os-builder) and [kernel](#kernel) images.
### Test code
The [tests](https://github.com/kata-containers/tests) repository hosts all
test code except the unit testing code (which is kept in the same repository
as the component it tests).
### Utilities
#### OS builder
The [osbuilder](tools/osbuilder/README.md) tool can create
a rootfs and a "mini O/S" image. This image is used by the hypervisor to set up
the environment before switching to the workload.
#### `kata-agent-ctl`
[`kata-agent-ctl`](tools/agent-ctl) is a low-level test tool for
interacting with the agent.
### Web content
The
[www.katacontainers.io](https://github.com/kata-containers/www.katacontainers.io)
repository contains all sources for the https://www.katacontainers.io site.
## Credits
Kata Containers uses [packagecloud](https://packagecloud.io) for package
hosting.


@@ -1 +1 @@
2.1.1
2.0.1


@@ -12,11 +12,10 @@ install_aarch64_musl() {
local musl_tar="${arch}-linux-musl-native.tgz"
local musl_dir="${arch}-linux-musl-native"
pushd /tmp
if curl -sLO --fail https://musl.cc/${musl_tar}; then
tar -zxf ${musl_tar}
mkdir -p /usr/local/musl/
cp -r ${musl_dir}/* /usr/local/musl/
fi
curl -sLO https://musl.cc/${musl_tar}
tar -zxf ${musl_tar}
mkdir -p /usr/local/musl/
cp -r ${musl_dir}/* /usr/local/musl/
popd
fi
}


@@ -18,9 +18,7 @@ function install_yq() {
GOPATH=${GOPATH:-${HOME}/go}
local yq_path="${GOPATH}/bin/yq"
local yq_pkg="github.com/mikefarah/yq"
local yq_version=3.4.1
[ -x "${GOPATH}/bin/yq" ] && [ "`${GOPATH}/bin/yq --version`"X == "yq version ${yq_version}"X ] && return
[ -x "${GOPATH}/bin/yq" ] && return
read -r -a sysInfo <<< "$(uname -sm)"
@@ -58,6 +56,8 @@ function install_yq() {
die "Please install curl"
fi
local yq_version=3.1.0
## NOTE: ${var,,} => gives lowercase value of var
local yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos,,}_${goarch}"
curl -o "${yq_path}" -LSsf "${yq_url}"
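
The version-pinned early return in the first hunk above (skip the download only when the installed `yq` reports the expected version) can be sketched on its own. This is a minimal sketch under stated assumptions: the helper name `yq_is_current` is hypothetical, and it assumes the binary prints `yq version <x.y.z>` for `--version`, as the guard in the hunk expects.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only when an executable exists at $1 and its
# `--version` output is exactly "yq version $2".
yq_is_current() {
    local yq_path="$1" wanted="$2"
    # No executable at that path: the caller should (re)install.
    [ -x "${yq_path}" ] || return 1
    # Compare the reported version string with the pinned one.
    [ "$("${yq_path}" --version)" = "yq version ${wanted}" ]
}
```

A caller could then write `yq_is_current "${GOPATH}/bin/yq" "${yq_version}" && return` so that a stale binary left by an earlier run is replaced rather than silently reused.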


@@ -5,27 +5,18 @@
export tests_repo="${tests_repo:-github.com/kata-containers/tests}"
export tests_repo_dir="$GOPATH/src/$tests_repo"
export branch="${branch:-main}"
export branch="${branch:-2.0-dev}"
# Clones the tests repository and checkout to the branch pointed out by
# the global $branch variable.
# If the clone exists and `CI` is exported then it does nothing. Otherwise
# it will clone the repository or `git pull` the latest code.
#
clone_tests_repo()
{
if [ -d "$tests_repo_dir" ]; then
[ -n "$CI" ] && return
pushd "${tests_repo_dir}"
git checkout "${branch}"
git pull
popd
else
git clone -q "https://${tests_repo}" "$tests_repo_dir"
pushd "${tests_repo_dir}"
git checkout "${branch}"
popd
if [ -d "$tests_repo_dir" -a -n "$CI" ]
then
return
fi
go get -d -u "$tests_repo" || true
pushd "${tests_repo_dir}" && git checkout "${branch}" && popd
}
run_static_checks()


@@ -1,9 +0,0 @@
# Copyright (c) 2021 Red Hat, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# This is the build root image for Kata Containers on OpenShift CI.
#
FROM centos:8
RUN yum -y update && yum -y install git sudo wget


@@ -1,54 +1,55 @@
- [Warning](#warning)
- [Assumptions](#assumptions)
- [Initial setup](#initial-setup)
- [Requirements to build individual components](#requirements-to-build-individual-components)
- [Build and install the Kata Containers runtime](#build-and-install-the-kata-containers-runtime)
- [Check hardware requirements](#check-hardware-requirements)
- [Configure to use initrd or rootfs image](#configure-to-use-initrd-or-rootfs-image)
- [Enable full debug](#enable-full-debug)
- [debug logs and shimv2](#debug-logs-and-shimv2)
- [Enabling full `containerd` debug](#enabling-full-containerd-debug)
- [Enabling just `containerd shim` debug](#enabling-just-containerd-shim-debug)
- [Enabling `CRI-O` and `shimv2` debug](#enabling-cri-o-and-shimv2-debug)
- [journald rate limiting](#journald-rate-limiting)
- [`systemd-journald` suppressing messages](#systemd-journald-suppressing-messages)
- [Disabling `systemd-journald` rate limiting](#disabling-systemd-journald-rate-limiting)
- [Create and install rootfs and initrd image](#create-and-install-rootfs-and-initrd-image)
- [Build a custom Kata agent - OPTIONAL](#build-a-custom-kata-agent---optional)
- [Get the osbuilder](#get-the-osbuilder)
- [Create a rootfs image](#create-a-rootfs-image)
- [Create a local rootfs](#create-a-local-rootfs)
- [Add a custom agent to the image - OPTIONAL](#add-a-custom-agent-to-the-image---optional)
- [Build a rootfs image](#build-a-rootfs-image)
- [Install the rootfs image](#install-the-rootfs-image)
- [Create an initrd image - OPTIONAL](#create-an-initrd-image---optional)
- [Create a local rootfs for initrd image](#create-a-local-rootfs-for-initrd-image)
- [Build an initrd image](#build-an-initrd-image)
- [Install the initrd image](#install-the-initrd-image)
- [Install guest kernel images](#install-guest-kernel-images)
- [Install a hypervisor](#install-a-hypervisor)
- [Build a custom QEMU](#build-a-custom-qemu)
- [Build a custom QEMU for aarch64/arm64 - REQUIRED](#build-a-custom-qemu-for-aarch64arm64---required)
- [Run Kata Containers with Containerd](#run-kata-containers-with-containerd)
- [Run Kata Containers with Kubernetes](#run-kata-containers-with-kubernetes)
- [Troubleshoot Kata Containers](#troubleshoot-kata-containers)
- [Appendices](#appendices)
- [Checking Docker default runtime](#checking-docker-default-runtime)
- [Set up a debug console](#set-up-a-debug-console)
- [Simple debug console setup](#simple-debug-console-setup)
- [Enable agent debug console](#enable-agent-debug-console)
- [Connect to debug console](#connect-to-debug-console)
- [Traditional debug console setup](#traditional-debug-console-setup)
- [Create a custom image containing a shell](#create-a-custom-image-containing-a-shell)
- [Build the debug image](#build-the-debug-image)
- [Configure runtime for custom debug image](#configure-runtime-for-custom-debug-image)
- [Create a container](#create-a-container)
- [Connect to the virtual machine using the debug console](#connect-to-the-virtual-machine-using-the-debug-console)
- [Enabling debug console for QEMU](#enabling-debug-console-for-qemu)
- [Enabling debug console for cloud-hypervisor / firecracker](#enabling-debug-console-for-cloud-hypervisor--firecracker)
- [Connecting to the debug console](#connecting-to-the-debug-console)
- [Obtain details of the image](#obtain-details-of-the-image)
- [Capturing kernel boot logs](#capturing-kernel-boot-logs)
* [Warning](#warning)
* [Assumptions](#assumptions)
* [Initial setup](#initial-setup)
* [Requirements to build individual components](#requirements-to-build-individual-components)
* [Build and install the Kata Containers runtime](#build-and-install-the-kata-containers-runtime)
* [Check hardware requirements](#check-hardware-requirements)
* [Configure to use initrd or rootfs image](#configure-to-use-initrd-or-rootfs-image)
* [Enable full debug](#enable-full-debug)
* [debug logs and shimv2](#debug-logs-and-shimv2)
* [Enabling full `containerd` debug](#enabling-full-containerd-debug)
* [Enabling just `containerd shim` debug](#enabling-just-containerd-shim-debug)
* [Enabling `CRI-O` and `shimv2` debug](#enabling-cri-o-and-shimv2-debug)
* [journald rate limiting](#journald-rate-limiting)
* [`systemd-journald` suppressing messages](#systemd-journald-suppressing-messages)
* [Disabling `systemd-journald` rate limiting](#disabling-systemd-journald-rate-limiting)
* [Create and install rootfs and initrd image](#create-and-install-rootfs-and-initrd-image)
* [Build a custom Kata agent - OPTIONAL](#build-a-custom-kata-agent---optional)
* [Get the osbuilder](#get-the-osbuilder)
* [Create a rootfs image](#create-a-rootfs-image)
* [Create a local rootfs](#create-a-local-rootfs)
* [Add a custom agent to the image - OPTIONAL](#add-a-custom-agent-to-the-image---optional)
* [Build a rootfs image](#build-a-rootfs-image)
* [Install the rootfs image](#install-the-rootfs-image)
* [Create an initrd image - OPTIONAL](#create-an-initrd-image---optional)
* [Create a local rootfs for initrd image](#create-a-local-rootfs-for-initrd-image)
* [Build an initrd image](#build-an-initrd-image)
* [Install the initrd image](#install-the-initrd-image)
* [Install guest kernel images](#install-guest-kernel-images)
* [Install a hypervisor](#install-a-hypervisor)
* [Build a custom QEMU](#build-a-custom-qemu)
* [Build a custom QEMU for aarch64/arm64 - REQUIRED](#build-a-custom-qemu-for-aarch64arm64---required)
* [Run Kata Containers with Containerd](#run-kata-containers-with-containerd)
* [Run Kata Containers with Kubernetes](#run-kata-containers-with-kubernetes)
* [Troubleshoot Kata Containers](#troubleshoot-kata-containers)
* [Appendices](#appendices)
* [Checking Docker default runtime](#checking-docker-default-runtime)
* [Set up a debug console](#set-up-a-debug-console)
* [Simple debug console setup](#simple-debug-console-setup)
* [Enable agent debug console](#enable-agent-debug-console)
* [Start `kata-monitor`](#start-kata-monitor)
* [Connect to debug console](#connect-to-debug-console)
* [Traditional debug console setup](#traditional-debug-console-setup)
* [Create a custom image containing a shell](#create-a-custom-image-containing-a-shell)
* [Build the debug image](#build-the-debug-image)
* [Configure runtime for custom debug image](#configure-runtime-for-custom-debug-image)
* [Connect to the virtual machine using the debug console](#connect-to-the-virtual-machine-using-the-debug-console)
* [Enabling debug console for QEMU](#enabling-debug-console-for-qemu)
* [Enabling debug console for cloud-hypervisor / firecracker](#enabling-debug-console-for-cloud-hypervisor--firecracker)
* [Create a container](#create-a-container)
* [Connect to the virtual machine using the debug console](#connect-to-the-virtual-machine-using-the-debug-console)
* [Obtain details of the image](#obtain-details-of-the-image)
* [Capturing kernel boot logs](#capturing-kernel-boot-logs)
# Warning
@@ -103,7 +104,7 @@ The build will create the following:
You can check if your system is capable of creating a Kata Container by running the following:
```
$ sudo kata-runtime check
$ sudo kata-runtime kata-check
```
If your system is *not* able to run Kata Containers, the previous command will error out and explain why.
@@ -353,12 +354,9 @@ You MUST choose one of `alpine`, `centos`, `clearlinux`, `euleros`, and `fedora`
>
> - Check the [compatibility matrix](../tools/osbuilder/README.md#platform-distro-compatibility-matrix) before creating rootfs.
Optionally, add your custom agent binary to the rootfs with the following commands. `LIBC` defaults to `musl`; if `ARCH` is `ppc64le`, set `LIBC=gnu` and `ARCH=powerpc64le`:
Optionally, add your custom agent binary to the rootfs with the following:
```
$ export ARCH=$(uname -m)
$ [ "${ARCH}" == "ppc64le" ] && export LIBC=gnu || export LIBC=musl
$ [ "${ARCH}" == "ppc64le" ] && export ARCH=powerpc64le
$ sudo install -o root -g root -m 0550 -T ../../../src/agent/target/${ARCH}-unknown-linux-${LIBC}/release/kata-agent ${ROOTFS_DIR}/sbin/init
$ sudo install -o root -g root -m 0550 -T ../../agent/kata-agent ${ROOTFS_DIR}/sbin/init
```
### Build an initrd image
@@ -384,30 +382,31 @@ You can build and install the guest kernel image as shown [here](../tools/packag
# Install a hypervisor
When setting up Kata using a [packaged installation method](install/README.md#installing-on-a-linux-system), the
`QEMU` VMM is installed automatically. Cloud-Hypervisor and Firecracker VMMs are available from the [release tarballs](https://github.com/kata-containers/kata-containers/releases), as well as through [`kata-deploy`](../tools/packaging/kata-deploy/README.md).
You may choose to manually build your VMM/hypervisor.
When setting up Kata using a [packaged installation method](install/README.md#installing-on-a-linux-system), the `qemu-lite` hypervisor is installed automatically. For other installation methods, you will need to manually install a suitable hypervisor.
## Build a custom QEMU
Kata Containers makes use of upstream QEMU branch. The exact version
and repository utilized can be found by looking at the [versions file](../versions.yaml).
Your QEMU directory needs to be prepared with source code. Alternatively, you can use the [Kata containers QEMU](https://github.com/kata-containers/qemu/tree/master) and check out the recommended branch:
Kata often utilizes patches for not-yet-upstream fixes for components,
including QEMU. These can be found in the [packaging/QEMU directory](../tools/packaging/qemu/patches)
```
$ go get -d github.com/kata-containers/qemu
$ qemu_branch=$(grep qemu-lite- ${GOPATH}/src/github.com/kata-containers/kata-containers/versions.yaml | cut -d '"' -f2)
$ cd ${GOPATH}/src/github.com/kata-containers/qemu
$ git checkout -b $qemu_branch remotes/origin/$qemu_branch
$ your_qemu_directory=${GOPATH}/src/github.com/kata-containers/qemu
```
To build a version of QEMU using the same options as the default `qemu-lite` version, you could use the `configure-hypervisor.sh` script:
To build utilizing the same options as Kata, you should make use of the `configure-hypervisor.sh` script. For example:
```
$ go get -d github.com/kata-containers/kata-containers/tools/packaging
$ cd $your_qemu_directory
$ ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/scripts/configure-hypervisor.sh kata-qemu > kata.cfg
$ ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/scripts/configure-hypervisor.sh qemu > kata.cfg
$ eval ./configure "$(cat kata.cfg)"
$ make -j $(nproc)
$ sudo -E make install
```
See the [static-build script for QEMU](../tools/packaging/static-build/qemu/build-static-qemu.sh) for a reference on how to get, setup, configure and build QEMU for Kata.
### Build a custom QEMU for aarch64/arm64 - REQUIRED
> **Note:**
>
@@ -475,6 +474,17 @@ debug_console_enabled = true
This passes `agent.debug_console agent.debug_console_vport=1026` to the agent as kernel parameters; sandboxes created with these parameters start a shell in the guest when a new connection is accepted from VSOCK.
#### Start `kata-monitor`
The `kata-runtime exec` command needs `kata-monitor` to get the sandbox's `vsock` address to connect to, so start `kata-monitor` first.
```
$ sudo kata-monitor
```
`kata-monitor` will serve at `localhost:8090` by default.
#### Connect to debug console
Command `kata-runtime exec` is used to connect to the debug console.
@@ -489,10 +499,6 @@ bash-4.2# exit
exit
```
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/master/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
with Kubernetes. For CRI-O, the namespace should be set to `default` explicitly. This should not be confused with [Kubernetes namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
For other CRI-runtimes and configurations, you may need to set the namespace utilizing the `runtime-namespace` option.
If you want to access the guest OS in the traditional way, see [Traditional debug console setup](#traditional-debug-console-setup).
### Traditional debug console setup
@@ -612,11 +618,8 @@ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.debug_cons
> **Note** Ports 1024 and 1025 are reserved for communication with the agent
> and gathering of agent logs respectively.
##### Connecting to the debug console
Next, connect to the debug console. The VSOCKS paths vary slightly between each
VMM solution.
Next, connect to the debug console. The VSOCKS paths vary slightly between
cloud-hypervisor and firecracker.
In case of cloud-hypervisor, connect to the `vsock` as shown:
```
$ sudo su -c 'cd /var/run/vc/vm/{sandbox_id}/root/ && socat stdin unix-connect:clh.sock'
@@ -633,12 +636,6 @@ CONNECT 1026
**Note**: You need to press the `RETURN` key to see the shell prompt.
For QEMU, connect to the `vsock` as shown:
```
$ sudo su -c 'cd /var/run/vc/vm/{sandbox_id} && socat "stdin,raw,echo=0,escape=0x11" "unix-connect:console.sock"'
```
To disconnect from the virtual machine, type `CONTROL+q` (hold down the
`CONTROL` key and press `q`).
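
The `escape=0x11` value in the `socat` command above is the byte a terminal sends for `CONTROL+q`. As a quick check, a control key corresponds to the letter's ASCII code masked to its low five bits. The sketch below assumes a bash shell; `ctrl_code` is a hypothetical helper name and is not part of Kata or `socat`:

```bash
#!/bin/bash
# ctrl_code is a hypothetical helper (not part of Kata or socat): it prints
# the byte a terminal sends for CONTROL+<letter>, computed as the letter's
# ASCII code masked to its low five bits.
ctrl_code() {
    local ascii
    ascii=$(printf '%d' "'$1")          # numeric code of the letter, e.g. q -> 113
    printf '0x%02x\n' $(( ascii & 0x1f ))
}

ctrl_code q   # prints 0x11, the value given to socat's escape= option
```

To use a different disconnect key, the same calculation would give the value to pass to `escape=`.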

@@ -22,4 +22,4 @@ licensing and allows automated tooling to check the license of individual
files.
This SPDX licence identifier requirement is enforced by the
[CI (Continuous Integration) system](https://github.com/kata-containers/tests/blob/main/.ci/static-checks.sh).
[CI (Continuous Integration) system](https://github.com/kata-containers/tests/blob/master/.ci/static-checks.sh).
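
As a rough local approximation of that check (a sketch only; this is not the CI script, and `missing_spdx` is a hypothetical helper name), the files lacking an SPDX identifier can be listed with `grep -L`:

```bash
#!/bin/bash
# missing_spdx is a hypothetical helper, not the project's CI check: it prints
# each of the given files that contains no "SPDX-License-Identifier" line.
missing_spdx() {
    # grep -L lists the files in which the pattern does NOT occur.
    grep -L 'SPDX-License-Identifier' "$@"
}
```

For example, `missing_spdx *.sh` would print the shell scripts in the current directory that still need a license header.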

@@ -19,8 +19,6 @@
* [Support for joining an existing VM network](#support-for-joining-an-existing-vm-network)
* [docker --net=host](#docker---nethost)
* [docker run --link](#docker-run---link)
* [Storage limitations](#storage-limitations)
* [Kubernetes `volumeMounts.subPaths`](#kubernetes-volumemountssubpaths)
* [Host resource sharing](#host-resource-sharing)
* [docker run --privileged](#docker-run---privileged)
* [Miscellaneous](#miscellaneous)
@@ -28,7 +26,7 @@
* [Appendices](#appendices)
* [The constraints challenge](#the-constraints-challenge)
***
---
# Overview
@@ -94,9 +92,7 @@ This section lists items that might be possible to fix.
### checkpoint and restore
The runtime does not provide `checkpoint` and `restore` commands. There
are discussions about using VM save and restore to give us a
`[criu](https://github.com/checkpoint-restore/criu)`-like functionality,
which might provide a solution.
are discussions about using VM save and restore to give [`criu`](https://github.com/checkpoint-restore/criu)-like functionality, which might provide a solution.
Note that the OCI standard does not specify `checkpoint` and `restore`
commands.
@@ -220,17 +216,6 @@ Equivalent functionality can be achieved with the newer docker networking comman
See more documentation at
[docs.docker.com](https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/).
## Storage limitations
### Kubernetes `volumeMounts.subPaths`
Kubernetes `volumeMount.subPath` is not supported by Kata Containers at the
moment.
See [this issue](https://github.com/kata-containers/runtime/issues/2812) for more details.
[Another issue](https://github.com/kata-containers/kata-containers/issues/1728) focuses on the case of `emptyDir`.
## Host resource sharing
### docker run --privileged
@@ -239,7 +224,7 @@ Privileged support in Kata is essentially different from `runc` containers.
Kata does support `docker run --privileged` command, but in this case full access
to the guest VM is provided in addition to some host access.
The container runs with elevated capabilities within the guest and is granted
The container runs with elevated capabilities within the guest and is granted
access to guest devices instead of the host devices.
This is also true when using `securityContext privileged=true` with Kubernetes.

@@ -40,7 +40,6 @@ See the [howto documentation](how-to).
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [VPP with Kata](./use-cases/using-vpp-and-kata.md)
* [SPDK vhost-user with Kata](./use-cases/using-SPDK-vhostuser-and-kata.md)
* [Intel SGX with Kata](./use-cases/using-Intel-SGX-and-kata.md)
## Developer Guide
@@ -49,7 +48,6 @@ Documents that help to understand and contribute to Kata Containers.
### Design and Implementations
* [Kata Containers Architecture](design/architecture.md): Architectural overview of Kata Containers
* [Kata Containers E2E Flow](design/end-to-end-flow.md): The entire end-to-end flow of Kata Containers
* [Kata Containers design](./design/README.md): More Kata Containers design documents
### How to Contribute

@@ -18,7 +18,8 @@
## Requirements
- [hub](https://github.com/github/hub)
* Using an [application token](https://github.com/settings/tokens) is required for hub.
- OBS account with permissions on [`/home:katacontainers`](https://build.opensuse.org/project/subprojects/home:katacontainers)
- GitHub permissions to push tags and create releases in Kata repositories.
@@ -31,9 +32,14 @@
### Bump all Kata repositories
Bump the repositories using a script in the Kata packaging repo, where:
- `BRANCH=<the-branch-you-want-to-bump>`
- `NEW_VERSION=<the-new-kata-version>`
- We have set up a Jenkins job to bump the version in the `VERSION` file in all Kata repositories. Go to the [Jenkins bump-job page](http://jenkins.katacontainers.io/job/release/build) to trigger a new job.
- Start a new job with variables for the job passed as:
- `BRANCH=<the-branch-you-want-to-bump>`
- `NEW_VERSION=<the-new-kata-version>`
For example, in the case where you want to make a patch release `1.10.2`, the variable `NEW_VERSION` should be `1.10.2` and `BRANCH` should point to `stable-1.10`. In case of an alpha or release candidate release, `BRANCH` should point to `master` branch.
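
The branch selection described above can be sketched as a small shell function. This is an illustration only: `branch_for_version` is a hypothetical helper, and it assumes that pre-release versions carry an `-alpha` or `-rc` suffix and that every other version maps to `stable-X.Y`.

```bash
#!/bin/bash
# branch_for_version is a hypothetical helper, not part of the release
# scripts. It maps a version string to the branch to bump:
#   version with an -alpha or -rc suffix -> master
#   plain X.Y.Z                          -> stable-X.Y
branch_for_version() {
    local version=$1
    case "$version" in
        *-alpha*|*-rc*) echo "master" ;;
        # ${version%.*} strips the last dot-separated component (the patch)
        *)              echo "stable-${version%.*}" ;;
    esac
}

branch_for_version 1.10.2       # prints stable-1.10
branch_for_version 2.0.0-rc0    # prints master
```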
Alternatively, you can also bump the repositories using a script in the Kata packaging repo:
```
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
$ export NEW_VERSION=<the-new-kata-version>
@@ -60,7 +66,7 @@
### Check Git-hub Actions
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/master/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/kata-containers/releases).
@@ -73,9 +79,9 @@
```
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
# Note: OLD_VERSION is where the script should start to get changes.
$ ./release-notes.sh ${OLD_VERSION} ${NEW_VERSION} > notes.md
$ ./runtime-release-notes.sh ${OLD_VERSION} ${NEW_VERSION} > notes.md
# Edit the `notes.md` file to review and make any changes to the release notes.
# Add the release notes in the project's GitHub.
# Add the release notes in GitHub runtime.
$ hub release edit -F notes.md "${NEW_VERSION}"
```

@@ -48,10 +48,10 @@ Alternatively, if you are using Kata Containers version 1.12.0 or newer, you
can check for newer releases using the command line:
```bash
$ kata-runtime check --check-version-only
$ kata-runtime kata-check --check-version-only
```
There are various other related options. Run `kata-runtime check --help`
There are various other related options. Run `kata-runtime kata-check --help`
for further details.
# Configuration changes

Binary file not shown (image; size before: 1.2 MiB).

File diff suppressed because one or more lines are too long (image; size before: 1.0 MiB).

@@ -58,7 +58,7 @@ to go through the VSOCK interface exported by QEMU.
The container workload, that is, the actual OCI bundle rootfs, is exported from the
host to the virtual machine. In the case where a block-based graph driver is
configured, `virtio-scsi` will be used. In all other cases a `virtio-fs` VIRTIO mount point
configured, `virtio-scsi` will be used. In all other cases a 9pfs VIRTIO mount point
will be used. `kata-agent` uses this mount point as the root filesystem for the
container processes.
@@ -137,7 +137,7 @@ The runtime uses a TOML format configuration file called `configuration.toml`. B
The actual configuration file paths can be determined by running:
```
$ kata-runtime --show-default-config-paths
$ kata-runtime --kata-show-default-config-paths
```
Most users will not need to modify the configuration file.

@@ -1,4 +0,0 @@
# Kata Containers E2E Flow
![Kata containers e2e flow](arch-images/katacontainers-e2e-with-bg.jpg)

@@ -3,6 +3,7 @@ To fulfill the [Kata design requirements](kata-design-requirements.md), and base
- Sandbox based top API
- Storage and network hotplug API
- Plugin frameworks for external proprietary Kata runtime extensions
- Built-in shim and proxy types and capabilities
## Sandbox Based API
### Sandbox Management API
@@ -56,7 +57,7 @@ To fulfill the [Kata design requirements](kata-design-requirements.md), and base
|Name|Description|
|---|---|
|`sandbox.GetOOMEvent()`| Monitor the OOM events that occur in the sandbox.|
|`sandbox.UpdateRuntimeMetrics()`| Update the `shim/hypervisor` metrics of the running sandbox.|
|`sandbox.UpdateRuntimeMetrics()`| Update the shim/`hypervisor`'s metrics of the running sandbox.|
|`sandbox.GetAgentMetrics()`| Get metrics of the agent and the guest in the running sandbox.|
## Plugin framework for external proprietary Kata runtime extensions
@@ -98,3 +99,32 @@ Built-in implementations include:
### Sandbox Connection Plugin Workflow
![Sandbox Connection Plugin Workflow](https://raw.githubusercontent.com/bergwolf/raw-contents/master/kata/Sandbox-Connection.png "Sandbox Connection Plugin Workflow")
## Built-in Shim and Proxy Types and Capabilities
### Built-in shim/proxy sandbox configurations
- Supported shim configurations:
|Name|Description|
|---|---|
|`noopshim`|Do not start any shim process.|
|`ccshim`| Start the cc-shim binary.|
|`katashim`| Start the `kata-shim` binary.|
|`katashimbuiltin`|No standalone shim process but shim functionality APIs are exported.|
- Supported proxy configurations:
|Name|Description|
|---|---|
|`noopProxy`| a dummy proxy implementation of the proxy interface, only used for testing purpose.|
|`noProxy`|generic implementation for any case where no actual proxy is needed.|
|`ccProxy`|run `ccProxy` to proxy between runtime and agent.|
|`kataProxy`|run `kata-proxy` to translate Yamux connections between runtime and Kata agent. |
|`kataProxyBuiltin`| no standalone proxy process and connect to Kata agent with internal Yamux translation.|
### Built-in Shim Capability
Built-in shim capability is implemented by removing the standalone shim process and
supporting the shim-related APIs.
### Built-in Proxy Capability
Built-in proxy capability is achieved by removing the standalone proxy process and
connecting to the Kata agent with a custom gRPC dialer that performs the Yamux translation internally.
The behavior is enabled when proxy is configured as `kataProxyBuiltin`.

@@ -22,10 +22,10 @@ the multiple hypervisors and virtual machine monitors that Kata supports.
## Mapping container concepts to virtual machine technologies
A typical deployment of Kata Containers will be in Kubernetes by way of a Container Runtime Interface (CRI) implementation. On every node,
Kubelet will interact with a CRI implementer (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI based runtime).
Kubelet will interact with a CRI implementor (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI based runtime).
The CRI API, as defined at the [Kubernetes CRI-API repo](https://github.com/kubernetes/cri-api/), implies a few constructs being supported by the
CRI implementation, and ultimately in Kata Containers. In order to support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI-implementer, Kata must provide the following constructs:
CRI implementation, and ultimately in Kata Containers. In order to support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI-implementor, Kata must provide the following constructs:
![API to construct](./arch-images/api-to-construct.png)
@@ -41,9 +41,14 @@ Each hypervisor or VMM varies on how or if it handles each of these.
## Kata Containers Hypervisor and VMM support
Kata Containers [supports multiple hypervisors](../hypervisors.md).
Kata Containers is designed to support multiple virtual machine monitors (VMMs) and hypervisors.
Kata Containers supports:
- [ACRN hypervisor](https://projectacrn.org/)
- [Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor)/[KVM](https://www.linux-kvm.org/page/Main_Page)
- [Firecracker](https://github.com/firecracker-microvm/firecracker)/KVM
- [QEMU](http://www.qemu-project.org/)/KVM
Details of each solution and a summary are provided below.
Which configuration to use will depend on the end user's requirements. Details of each solution and a summary are provided below.
### QEMU/KVM
@@ -57,7 +62,7 @@ be changed by editing the runtime [`configuration`](./architecture.md/#configura
Devices and features used:
- virtio VSOCK or virtio serial
- virtio block or virtio SCSI
- [virtio net](https://www.redhat.com/en/virtio-networking-series)
- virtio net
- virtio fs or virtio 9p (recommend: virtio fs)
- VFIO
- hotplug
@@ -100,34 +105,25 @@ Devices used:
### Cloud Hypervisor/KVM
[Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor), based
on [rust-vmm](https://github.com/rust-vmm), is designed to have a
lighter footprint and smaller attack surface for running modern cloud
workloads. Kata Containers with Cloud
Hypervisor provides mostly complete compatibility with Kubernetes
comparable to the QEMU configuration. As of the 1.12 and 2.0.0 release
of Kata Containers, the Cloud Hypervisor configuration supports both CPU
and memory resize, device hotplug (disk and VFIO), file-system sharing through virtio-fs,
block-based volumes, booting from VM images backed by pmem device, and
fine-grained seccomp filters for each VMM thread (e.g. all virtio
device worker threads). Please check [this GitHub Project](https://github.com/orgs/kata-containers/projects/21)
for details of ongoing integration efforts.
Cloud Hypervisor, based on [rust-VMM](https://github.com/rust-vmm), is designed to have a lighter footprint and attack surface. For Kata Containers,
relative to Firecracker, the Cloud Hypervisor configuration provides better compatibility at the expense of exposing additional devices: file system
sharing and direct device assignment. As of the 1.10 release of Kata Containers, Cloud Hypervisor does not support device hotplug, and as a result
does not support updating container resources after boot, or utilizing block based volumes. While Cloud Hypervisor does support VFIO, Kata is still adding
this support. As of 1.10, Kata does not support block based volumes or direct device assignment. See [Cloud Hypervisor device support documentation](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/docs/device_model.md)
for more details on Cloud Hypervisor.
Devices and features used:
- virtio VSOCK or virtio serial
Devices used:
- virtio VSOCK
- virtio block
- virtio net
- virtio fs
- virtio pmem
- VFIO
- hotplug
- seccomp filters
- [HTTP OpenAPI](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vmm/src/api/openapi/cloud-hypervisor.yaml)
### Summary
| Solution | release introduced | brief summary |
|-|-|-|
| Cloud Hypervisor | 1.10 | upstream Cloud Hypervisor with rich feature support, e.g. hotplug, VFIO and FS sharing|
| Firecracker | 1.5 | upstream Firecracker, rust-VMM based, no VFIO, no FS sharing, no memory/CPU hotplug |
| QEMU | 1.0 | upstream QEMU, with support for hotplug and filesystem sharing |
| NEMU | 1.4 | Deprecated, removed as of 1.10 release. Slimmed down fork of QEMU, with experimental support of virtio-fs |
| Firecracker | 1.5 | upstream Firecracker, rust-VMM based, no VFIO, no FS sharing, no memory/CPU hotplug |
| QEMU-virtio-fs | 1.7 | upstream QEMU with support for virtio-fs. Will be removed once virtio-fs lands in upstream QEMU |
| Cloud Hypervisor | 1.10 | rust-VMM based, includes VFIO and FS sharing through virtio-fs, no hotplug |

@@ -185,7 +185,7 @@ in Kibana:
![Kata tags in EFK](./images/efk_syslog_entry_detail.png).
We can however further sub-parse the Kata entries using the
[Fluentd plugins](https://docs.fluentbit.io/manual/pipeline/parsers/logfmt) that will parse
[Fluentd plugins](https://docs.fluentbit.io/manual/parser/logfmt) that will parse
`logfmt` formatted data. We can utilise these to parse the sub-fields using a Fluentd filter
section. At the same time, we will prefix the new fields with `kata_` to make it clear where
they have come from:
@@ -222,7 +222,7 @@ test to check the parsing works. The resulting output from Fluentd is:
"_COMM":"kata-runtime",
"_EXE":"/opt/kata/bin/kata-runtime",
"SYSLOG_TIMESTAMP":"Feb 21 10:31:27 ",
"_CMDLINE":"/opt/kata/bin/kata-runtime --config /opt/kata/share/defaults/kata-containers/configuration-qemu.toml --root /run/runc state 7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997",
"_CMDLINE":"/opt/kata/bin/kata-runtime --kata-config /opt/kata/share/defaults/kata-containers/configuration-qemu.toml --root /run/runc state 7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997",
"SYSLOG_PID":"14314",
"_PID":"14314",
"MESSAGE":"time=\"2020-02-21T10:31:27.810781647Z\" level=info msg=\"release sandbox\" arch=amd64 command=state container=7cdd31660d8705facdadeb8598d2c0bd008e8142c54e3b3069abd392c8d58997 name=kata-runtime pid=14314 sandbox=1c3e77cad66aa2b6d8cc846f818370f79cb0104c0b840f67d0f502fd6562b68c source=virtcontainers subsystem=sandbox",
@@ -281,7 +281,7 @@ own file (rather than into the system journal).
```bash
#!/bin/bash
/opt/kata/bin/kata-runtime --config "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml" --log-format=json --log=/var/log/kata-runtime.log $@
/opt/kata/bin/kata-runtime --kata-config "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml" --log-format=json --log=/var/log/kata-runtime.log $@
```
And then we'll add the Fluentd config section to parse that file. Note, we inform the parser that Kata is

@@ -56,9 +56,8 @@ There are some limitations with this approach:
As was mentioned above, not all containers need the same modules, therefore using
the configuration file for specifying the list of kernel modules per [POD][3] can
be a pain.
Unlike the configuration file, [annotations](how-to-set-sandbox-config-kata.md)
provide a way to specify custom configurations per POD.
be a pain. Unlike the configuration file, annotations provide a way to specify
custom configurations per POD.
The list of kernel modules and parameters can be set using the annotation
`io.katacontainers.config.agent.kernel_modules` as a semicolon separated
@@ -102,7 +101,7 @@ spec:
tty: true
```
> **Note**: To pass annotations to Kata containers, [CRI-O must be configured correctly](how-to-set-sandbox-config-kata.md#cri-o-configuration)
> **Note**: To pass annotations to Kata containers, [`CRI` must be configured correctly](how-to-set-sandbox-config-kata.md#cri-configuration)
[1]: ../../src/runtime
[2]: ../../src/agent

@@ -34,7 +34,7 @@ Also you should ensure that `kubectl` working correctly.
Start Prometheus by utilizing our sample manifest:
```
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/docs/how-to/data/prometheus.yml
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/2.0-dev/docs/how-to/data/prometheus.yml
```
This will create a new namespace, `prometheus`, and create the following resources:
@@ -60,7 +60,7 @@ go_gc_duration_seconds{quantile="0.75"} 0.000229911
`kata-monitor` can be started on the cluster as follows:
```
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/docs/how-to/data/kata-monitor-daemonset.yml
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/2.0-dev/docs/how-to/data/kata-monitor-daemonset.yml
```
This will create a new namespace `kata-system` and a `daemonset` in it.
@@ -73,7 +73,7 @@ Once the `daemonset` is running, Prometheus should discover `kata-monitor` as a
Run this command to run Grafana in Kubernetes:
```
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/docs/how-to/data/grafana.yml
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/2.0-dev/docs/how-to/data/grafana.yml
```
This will create a deployment and a service for Grafana in the `prometheus` namespace.
@@ -99,7 +99,7 @@ You can import this dashboard using Grafana UI, or using `curl` command in conso
$ curl -XPOST -i localhost:3000/api/dashboards/import \
-u admin:admin \
-H "Content-Type: application/json" \
-d "{\"dashboard\":$(curl -sL https://raw.githubusercontent.com/kata-containers/kata-containers/main/docs/how-to/data/dashboard.json )}"
-d "{\"dashboard\":$(curl -sL https://raw.githubusercontent.com/kata-containers/kata-containers/2.0-dev/docs/how-to/data/dashboard.json )}"
```
## References

@@ -3,11 +3,6 @@
Kata Containers gives users freedom to customize at per-pod level, by setting
a wide range of Kata specific annotations in the pod specification.
Some annotations may be [restricted](#restricted-annotations) by the
configuration file for security reasons, notably annotations that could lead the
runtime to execute programs on the host. Such annotations are marked with _(R)_ in
the tables below.
# Kata Configuration Annotations
There are several kinds of Kata configurations and they are listed below.
@@ -26,13 +21,11 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.runtime.disable_new_netns` | `boolean` | determines if a new netns is created for the hypervisor process |
| `io.katacontainers.config.runtime.internetworking_model` | string| determines how the VM should be connected to the container network interface. Valid values are `macvtap`, `tcfilter` and `none` |
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in sandbox cgroup |
| `io.katacontainers.config.runtime.enable_pprof` | `boolean` | enables Golang `pprof` for `containerd-shim-kata-v2` process |
## Agent Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.container_pipe_size` | uint32 | specify the size of the std(in/out) pipes created for containers |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |
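
The `kernel_modules` value in the table above is a semicolon-separated list in which each entry is a module name followed by its parameters. As a minimal sketch of how such a value breaks down into per-module entries (`split_kernel_modules` is a hypothetical helper written for illustration; it is not how the agent parses the annotation):

```bash
#!/bin/bash
# split_kernel_modules is a hypothetical helper, not Kata code: it prints one
# "module params..." entry per line from a semicolon-separated list, trimming
# surrounding whitespace and skipping empty entries.
split_kernel_modules() {
    local list=$1 entry
    local -a entries
    IFS=';' read -r -a entries <<< "$list"
    for entry in "${entries[@]}"; do
        # Trim leading, then trailing, whitespace
        entry="${entry#"${entry%%[![:space:]]*}"}"
        entry="${entry%"${entry##*[![:space:]]}"}"
        if [ -n "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}

split_kernel_modules "e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0"
```

Each printed line has the shape of the arguments that would be handed to `modprobe`(8) for one module.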
@@ -45,24 +38,17 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.block_device_cache_noflush` | `boolean` | Denotes whether flush requests for the device are ignored |
| `io.katacontainers.config.hypervisor.block_device_cache_set` | `boolean` | cache-related options will be set to block devices or not |
| `io.katacontainers.config.hypervisor.block_device_driver` | string | the driver to be used for block device, valid values are `virtio-blk`, `virtio-scsi`, `nvdimm`|
| `io.katacontainers.config.hypervisor.cpu_features` | `string` | Comma-separated list of CPU features to pass to the CPU (QEMU) |
| `io.katacontainers.config.hypervisor.ctlpath` (R) | `string` | Path to the `acrnctl` binary for the ACRN hypervisor |
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | uint32| the default vCPUs assigned for a VM by the hypervisor |
| `io.katacontainers.config.hypervisor.disable_block_device_use` | `boolean` | disallow a block device from being used |
| `io.katacontainers.config.hypervisor.disable_image_nvdimm` | `boolean` | specify if a `nvdimm` device should be used as rootfs for the guest (QEMU) |
| `io.katacontainers.config.hypervisor.disable_vhost_net` | `boolean` | specify if `vhost-net` is not available on the host |
| `io.katacontainers.config.hypervisor.enable_hugepages` | `boolean` | if the memory should be `pre-allocated` from huge pages |
| `io.katacontainers.config.hypervisor.enable_iommu_platform` | `boolean` | enable `iommu` on CCW devices (QEMU s390x) |
| `io.katacontainers.config.hypervisor.enable_iommu` | `boolean` | enable `iommu` on Q35 (QEMU x86_64) |
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_swap` | `boolean` | enable swap of VM memory |
| `io.katacontainers.config.hypervisor.enable_vhost_user_store` | `boolean` | enable vhost-user storage device (QEMU) |
| `io.katacontainers.config.hypervisor.enable_virtio_mem` | `boolean` | enable virtio-mem (QEMU) |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` (R) | string | file based memory backend root directory |
| `io.katacontainers.config.hypervisor.entropy_source` | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` | string | file based memory backend root directory |
| `io.katacontainers.config.hypervisor.firmware_hash` | string | container firmware SHA-512 hash value |
| `io.katacontainers.config.hypervisor.firmware` | string | the guest firmware that will run the container VM |
| `io.katacontainers.config.hypervisor.guest_hook_path` | string | the path within the VM that will be used for drop in hooks |
@@ -73,7 +59,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.initrd_hash` | string | container guest initrd SHA-512 hash value |
| `io.katacontainers.config.hypervisor.initrd` | string | the guest initrd image that will run in the container VM |
| `io.katacontainers.config.hypervisor.jailer_hash` | string | container jailer SHA-512 hash value |
| `io.katacontainers.config.hypervisor.jailer_path` (R) | string | the jailer that will constrain the container VM |
| `io.katacontainers.config.hypervisor.jailer_path` | string | the jailer that will constrain the container VM |
| `io.katacontainers.config.hypervisor.kernel_hash` | string | container kernel image SHA-512 hash value |
| `io.katacontainers.config.hypervisor.kernel_params` | string | additional guest kernel parameters |
| `io.katacontainers.config.hypervisor.kernel` | string | the kernel used to boot the container VM |
@@ -83,21 +69,17 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.memory_slots` | uint32| the memory slots assigned to the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.msize_9p` | uint32 | the `msize` for 9p shares |
| `io.katacontainers.config.hypervisor.path` | string | the hypervisor that will run the container VM |
| `io.katacontainers.config.hypervisor.pcie_root_port` | uint32 | specify the number of PCIe Root Port devices. The PCIe Root Port device is used to hot-plug a PCIe device (QEMU) |
| `io.katacontainers.config.hypervisor.shared_fs` | string | the shared file system type, either `virtio-9p` or `virtio-fs` |
| `io.katacontainers.config.hypervisor.use_vsock` | `boolean` | specify use of `vsock` for agent communication |
| `io.katacontainers.config.hypervisor.vhost_user_store_path` (R) | `string` | specify the directory path where vhost-user devices related folders, sockets and device nodes should be (QEMU) |
| `io.katacontainers.config.hypervisor.virtio_fs_cache_size` | uint32 | virtio-fs DAX cache size in `MiB` |
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
# CRI-O Configuration
# CRI Configuration
In case of CRI-O, all annotations specified in the pod spec are passed down to Kata.
# containerd Configuration
For containerd, annotations specified in the pod spec are passed down to Kata
starting with version `1.3.0` of containerd. Additionally, extra configuration is
needed for containerd, by providing a `pod_annotations` field in the containerd config
@@ -110,14 +92,16 @@ for passing annotations to Kata from containerd:
$ cat /etc/containerd/config
....
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.runc.v1"
pod_annotations = ["io.katacontainers.*"]
[plugins.cri.containerd.runtimes.kata.options]
BinaryName = "/usr/bin/kata-runtime"
....
```
Additional documentation on the above configuration can be found in the
Additional documentation on the above configuration can be found in the
[containerd docs](https://github.com/containerd/cri/blob/8d5a8355d07783ba2f8f451209f6bdcc7c412346/docs/config.md).
# Example - Using annotations
@@ -175,32 +159,3 @@ spec:
stdin: true
tty: true
```
# Restricted annotations
Some annotations are _restricted_, meaning that the configuration file specifies
the acceptable values. Currently, only hypervisor annotations are restricted,
for security reasons, with the intent to control which binaries the Kata
Containers runtime will launch on your behalf.
The configuration file validates the annotation _name_ as well as the annotation
_value_.
The acceptable annotation names are defined by the `enable_annotations` entry in
the configuration file.
For restricted annotations, an additional configuration entry provides a list of
acceptable values. Since most restricted annotations are intended to control
which binaries the runtime can execute, the valid value is generally provided by
a shell pattern, as defined by `glob(3)`. The table below provides the name of
the configuration entry:
| Key | Config file entry | Comments |
|-------| ----- | ----- |
| `ctlpath` | `valid_ctlpaths` | Valid paths for `acrnctl` binary |
| `entropy_source` | `valid_entropy_sources` | Valid entropy sources, e.g. `/dev/random` |
| `file_mem_backend` | `valid_file_mem_backends` | Valid locations for the file-based memory backend root directory |
| `jailer_path` | `valid_jailer_paths`| Valid paths for the jailer constraining the container VM (Firecracker) |
| `path` | `valid_hypervisor_paths` | Valid hypervisors to run the container VM |
| `vhost_user_store_path` | `valid_vhost_user_store_paths` | Valid paths for vhost-user related files|
| `virtio_fs_daemon` | `valid_virtio_fs_daemon_paths` | Valid paths for the `virtiofsd` daemon |
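
The value check described above can be approximated in shell. This is only a sketch: `matches_any_glob` is a hypothetical helper, not part of Kata Containers, and the patterns in the usage note are illustrative assumptions. Note also that a shell `case` pattern lets `*` match `/`, which may be more permissive than the matching the runtime actually performs.

```bash
# Hypothetical helper: succeed if "$1" matches any of the remaining glob patterns.
matches_any_glob() {
    value=$1
    shift
    for pattern in "$@"; do
        # An unquoted pattern in a `case` arm is matched as a shell glob.
        case "$value" in
            $pattern) return 0 ;;
        esac
    done
    return 1
}
```

For example, `matches_any_glob /usr/bin/qemu-system-x86_64 '/usr/bin/qemu*'` succeeds, while a path outside every listed pattern is rejected.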

View File

@@ -7,10 +7,9 @@
* [Configure Kubelet to use containerd](#configure-kubelet-to-use-containerd)
* [Configure HTTP proxy - OPTIONAL](#configure-http-proxy---optional)
* [Start Kubernetes](#start-kubernetes)
* [Configure Pod Network](#configure-pod-network)
* [Install a Pod Network](#install-a-pod-network)
* [Allow pods to run in the master node](#allow-pods-to-run-in-the-master-node)
* [Create runtime class for Kata Containers](#create-runtime-class-for-kata-containers)
* [Run pod in Kata Containers](#run-pod-in-kata-containers)
* [Create an untrusted pod using Kata Containers](#create-an-untrusted-pod-using-kata-containers)
* [Delete created pod](#delete-created-pod)
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
@@ -19,6 +18,9 @@ The Kubernetes cluster will use the
[CRI containerd plugin](https://github.com/containerd/cri) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
For Kata Containers 1.5.0-rc2 and above, we will use `containerd-shim-kata-v2` (abbreviated as `shimv2` in this documentation)
to launch Kata Containers. For earlier versions of Kata Containers, pods are launched with `kata-runtime`.
## Requirements
- Kubernetes, Kubelet, `kubeadm`
@@ -123,33 +125,43 @@ $ sudo systemctl daemon-reload
$ sudo -E kubectl get pods
```
## Configure Pod Network
## Install a Pod Network
A pod network plugin is needed to allow pods to communicate with each other.
You can find more about CNI plugins from the [Creating a cluster with `kubeadm`](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#instructions) guide.
By default, the CNI plugin binaries are installed under `/opt/cni/bin` (in the `kubernetes-cni` package), so you only need to create a configuration file for the CNI plugin.
- Install the `flannel` plugin by following the
[Using `kubeadm` to Create a Cluster](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#instructions)
guide, starting from the **Installing a pod network** section.
- Create a pod network using flannel
> **Note:** There is no known way to determine programmatically the best version (commit) to use.
> See https://github.com/coreos/flannel/issues/995.
```bash
$ sudo -E mkdir -p /etc/cni/net.d
$ sudo -E kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
$ cat <<EOF | sudo -E tee /etc/cni/net.d/10-mynet.conf
{
"cniVersion": "0.2.0",
"name": "mynet",
"type": "bridge",
"bridge": "cni0",
"isGateway": true,
"ipMasq": true,
"ipam": {
"type": "host-local",
"subnet": "172.19.0.0/24",
"routes": [
{ "dst": "0.0.0.0/0" }
]
}
}
EOF
- Wait for the pod network to become available
```bash
# number of seconds to wait for pod network to become available
$ timeout_dns=420
$ while [ "$timeout_dns" -gt 0 ]; do
if sudo -E kubectl get pods --all-namespaces | grep dns | grep Running; then
break
fi
sleep 1s
((timeout_dns--))
done
```
- Check the pod network is running
```bash
$ sudo -E kubectl get pods --all-namespaces | grep dns | grep Running && echo "OK" || ( echo "FAIL" && false )
```
## Allow pods to run in the master node
@@ -160,38 +172,24 @@ By default, the cluster will not schedule pods in the master node. To enable mas
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
```
## Create runtime class for Kata Containers
## Create an untrusted pod using Kata Containers
By default, all pods are created with the default runtime configured in CRI containerd plugin.
From Kubernetes v1.12, users can use [`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/#runtime-class) to specify a different runtime for Pods.
```bash
$ cat > runtime.yaml <<EOF
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
name: kata
handler: kata
EOF
$ sudo -E kubectl apply -f runtime.yaml
```
## Run pod in Kata Containers
If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod with the
If a pod has the `io.kubernetes.cri.untrusted-workload` annotation set to `"true"`, the CRI plugin runs the pod with the
[Kata Containers runtime](../../src/runtime/README.md).
- Create a pod configuration that uses the Kata Containers runtime
- Create an untrusted pod configuration
```bash
$ cat << EOT | tee nginx-kata.yaml
$ cat << EOT | tee nginx-untrusted.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-kata
name: nginx-untrusted
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
runtimeClassName: kata
containers:
- name: nginx
image: nginx
@@ -199,9 +197,9 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod w
EOT
```
- Create the pod
- Create an untrusted pod
```bash
$ sudo -E kubectl apply -f nginx-kata.yaml
$ sudo -E kubectl apply -f nginx-untrusted.yaml
```
- Check pod is running
@@ -218,5 +216,5 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod w
## Delete created pod
```bash
$ sudo -E kubectl delete -f nginx-kata.yaml
$ sudo -E kubectl delete -f nginx-untrusted.yaml
```

View File

@@ -91,7 +91,7 @@ To configure Kata Containers with ACRN, copy the generated `configuration-acrn.t
The following command shows full paths to the `configuration.toml` files that the runtime loads. It will use the first path that exists. (Please make sure the kernel and image paths are set correctly in the `configuration.toml` file)
```bash
$ sudo kata-runtime --show-default-config-paths
$ sudo kata-runtime --kata-show-default-config-paths
```
>**Warning:** Please offline CPUs using [this](offline_cpu.sh) script, else VM launches will fail.

View File

@@ -1,12 +1,61 @@
# Kata Containers with virtio-fs
- [Kata Containers with virtio-fs](#kata-containers-with-virtio-fs)
- [Introduction](#introduction)
- [Introduction](#introduction)
- [Pre-requisites](#pre-requisites)
- [Install Kata Containers with virtio-fs support](#install-kata-containers-with-virtio-fs-support)
- [Run a Kata Container utilizing virtio-fs](#run-a-kata-container-utilizing-virtio-fs)
## Introduction
Container deployments utilize explicit or implicit file sharing between host filesystem and containers. From a trust perspective, avoiding a shared file-system between the trusted host and untrusted container is recommended. This is not always feasible. In Kata Containers, block-based volumes are preferred as they allow usage of either device pass through or `virtio-blk` for access within the virtual machine.
As of the 2.0 release of Kata Containers, [virtio-fs](https://virtio-fs.gitlab.io/) is the default filesystem sharing mechanism.
As of the 1.7 release of Kata Containers, [9pfs](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) is the default filesystem sharing mechanism. While this does allow for workload compatibility, it does so with degraded performance and potential for POSIX compliance limitations.
virtio-fs support works out of the box for `cloud-hypervisor` and `qemu`, when Kata Containers is deployed using `kata-deploy`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/packaging/tree/master/kata-deploy#kubernetes-quick-start).
To help address these limitations, [virtio-fs](https://virtio-fs.gitlab.io/) has been developed. virtio-fs is a shared file system that lets virtual machines access a directory tree on the host. In Kata Containers, virtio-fs can be used to share container volumes, secrets, config-maps, configuration files (hostname, hosts, `resolv.conf`) and the container rootfs on the host with the guest. virtio-fs provides significant performance and POSIX compliance improvements compared to 9pfs.
Enabling of virtio-fs requires changes in the guest kernel as well as the VMM. For Kata Containers, experimental virtio-fs support is enabled through `qemu` and `cloud-hypervisor` VMMs.
**Note: virtio-fs support is experimental in the 1.7 release of Kata Containers. Work is underway to improve stability, performance and upstream integration. This is available for early preview - use at your own risk**
This document describes how to get Kata Containers to work with virtio-fs.
## Pre-requisites
Before Kata 1.8, this feature required the host to have hugepages support enabled. Enable this with the `sysctl vm.nr_hugepages=1024` command on the host. In later versions of Kata, virtio-fs leverages `/dev/shm` as the shared memory backend. The default size of `/dev/shm` on a system is typically half of the total system memory. This can limit the maximum number of pods that can be launched with virtio-fs. The limit can be raised by increasing the size of `/dev/shm` as shown below:
```bash
$ mount -o remount,size=${desired_shm_size} /dev/shm
```
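
Before remounting, it can help to see how large `/dev/shm` currently is. The following is a minimal sketch; `shm_size_kib` is a hypothetical helper name, and it only reads the size reported by `df`.

```bash
# Print the total size, in KiB, of the filesystem backing a path (default: /dev/shm).
shm_size_kib() {
    # -P forces the portable one-line-per-filesystem format; -k reports 1024-byte blocks.
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print $2 }'
}
```

Calling `shm_size_kib` with no argument reports on `/dev/shm`; passing another mount point reports on that filesystem instead.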
## Install Kata Containers with virtio-fs support
The Kata Containers `qemu` configuration with virtio-fs and the `virtiofs` daemon are available in the [Kata Containers release](https://github.com/kata-containers/runtime/releases) artifacts starting with the 1.9 release. Installation is available through [distribution packages](https://github.com/kata-containers/documentation/blob/master/install/README.md#supported-distributions) as well as through [`kata-deploy`](https://github.com/kata-containers/packaging/tree/master/kata-deploy).
**Note: Support for virtio-fs was first introduced in `NEMU` hypervisor in Kata 1.8 release. This hypervisor has been deprecated.**
Install the latest release of Kata with `kata-deploy` as follows:
```
docker run --runtime=runc -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install
```
This will place the Kata release artifacts in `/opt/kata`, and update Docker's configuration to include a runtime target, `kata-qemu-virtiofs`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/packaging/tree/master/kata-deploy#kubernetes-quick-start).
## Run a Kata Container utilizing virtio-fs
Once installed, start a new container, utilizing `qemu` + `virtiofs`:
```bash
$ docker run --runtime=kata-qemu-virtiofs -it busybox
```
Verify the new container is running with the `qemu` hypervisor as well as using `virtiofsd`. To do this look for the hypervisor path and the `virtiofs` daemon process on the host:
```bash
$ ps -aux | grep virtiofs
root ... /home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt
... -machine virt,accel=kvm,kernel_irqchip,nvdimm ...
root ... /home/foo/build-x86_64_virt/virtiofsd-x86_64 ...
```
You can also try out virtio-fs using `cloud-hypervisor` VMM:
```bash
$ docker run --runtime=kata-clh -it busybox
```

View File

@@ -13,23 +13,26 @@ Kata Containers with `virtio-mem` supports memory resize.
## Requisites
Kata Containers supports `virtio-mem` only with QEMU.
Install and setup Kata Containers as shown [here](../install/README.md).
Kata Containers with `virtio-mem` requires a Linux kernel and a QEMU that support `virtio-mem`.
The upstream Linux kernel and QEMU do not yet support `virtio-mem`; @davidhildenbrand is working on them.
Please use the following unofficial versions of the Linux kernel and QEMU that support `virtio-mem` with Kata Containers.
### With x86_64
The x86_64 Kata Linux kernel is built with the `virtio-mem` config option enabled.
Enable `virtio-mem` as follows:
```
$ sudo sed -i -e 's/^#enable_virtio_mem.*$/enable_virtio_mem = true/g' /etc/kata-containers/configuration.toml
The Linux kernel is at https://github.com/davidhildenbrand/linux/tree/virtio-mem-rfc-v4.
The Linux kernel config that can work with Kata Containers is at https://gist.github.com/teawater/016194ee84748c768745a163d08b0fb9.
The QEMU is at https://github.com/teawater/qemu/tree/kata-virtio-mem. (The original source is at https://github.com/davidhildenbrand/qemu/tree/virtio-mem. Its base version of QEMU does not work with Kata Containers, so the `virtio-mem` commits were merged into upstream QEMU.)
Point Kata Containers at the Linux kernel and QEMU that support `virtio-mem` with the following lines in the Kata Containers QEMU configuration `configuration-qemu.toml`:
```toml
[hypervisor.qemu]
path = "qemu-dir"
kernel = "vmlinux-dir"
```
### With other architectures
The `virtio-mem` config option is not enabled in the Kata Linux kernel for other architectures.
You can enable it as follows:
Enable `virtio-mem` with the following line in the Kata Containers configuration:
```toml
enable_virtio_mem = true
```
CONFIG_VIRTIO_MEM=y
```
Then you can build and install the guest kernel image as shown [here](../../tools/packaging/kernel/README.md#build-kata-containers-kernel).
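
Before building, the kernel configuration file can be checked for the option. This is a sketch only: `has_virtio_mem` is a hypothetical helper, and the location of the config file depends on how the kernel tree was set up.

```bash
# Hypothetical helper: succeed if the kernel config file "$1" builds virtio-mem in.
has_virtio_mem() {
    # Match the exact built-in setting; "=m" or a commented-out line does not count.
    grep -q '^CONFIG_VIRTIO_MEM=y$' "$1"
}
```

A call such as `has_virtio_mem .config && echo "virtio-mem enabled"` would be run from the directory holding the kernel config.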
## Run a Kata Container utilizing `virtio-mem`
@@ -38,35 +41,13 @@ Use following command to enable memory overcommitment of a Linux kernel. Becaus
$ echo 1 | sudo tee /proc/sys/vm/overcommit_memory
```
Use the following command to start a Kata Container.
Use the following command to start a Kata Container.
```
$ pod_yaml=pod.yaml
$ container_yaml=${REPORT_DIR}/container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
name: busybox-killed-vmm
image:
image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
$ docker run --rm -it --runtime=kata --name test busybox
```
Use the following command to set the container memory limit to 2g and the memory size of the VM to its default_memory + 2g.
Use the following command to set the memory size of `test` to default_memory + 512m.
```
$ sudo crictl update --memory $((2*1024*1024*1024)) $cid
$ docker update -m 512m --memory-swap -1 test
```
Use the following command to set the container memory limit to 1g and the memory size of the VM to its default_memory + 1g.
```
$ sudo crictl update --memory $((1*1024*1024*1024)) $cid
```
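
The byte values passed to `--memory` above are plain GiB-to-bytes conversions. A small sketch of that arithmetic follows; `gib_to_bytes` is a hypothetical helper introduced only for illustration.

```bash
# Convert a whole number of GiB to bytes using shell arithmetic.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

With it, the first update could be written as `sudo crictl update --memory "$(gib_to_bytes 2)" "$cid"`.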

View File

@@ -46,7 +46,6 @@ overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
- `enable_template = true`
- `initrd =` is set
- `image =` option is commented out or removed
- `shared_fs` should not be `virtio-fs`
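
The conditions listed above can be checked with a few `grep` calls. This is a rough sketch, not a TOML parser: `template_config_ok` is a hypothetical helper, it inspects the file line by line, and it ignores which section a key appears in.

```bash
# Hypothetical helper: succeed if the file "$1" meets the templating conditions listed above.
template_config_ok() {
    cfg=$1
    # enable_template must be set to true.
    grep -Eq '^[[:space:]]*enable_template[[:space:]]*=[[:space:]]*true' "$cfg" || return 1
    # initrd must be set (an uncommented assignment).
    grep -Eq '^[[:space:]]*initrd[[:space:]]*=' "$cfg" || return 1
    # image must be commented out or removed.
    if grep -Eq '^[[:space:]]*image[[:space:]]*=' "$cfg"; then return 1; fi
    # shared_fs must not select virtio-fs.
    if grep -Eq '^[[:space:]]*shared_fs[[:space:]]*=[[:space:]]*"virtio-fs"' "$cfg"; then return 1; fi
    return 0
}
```

For instance, `template_config_ok /etc/kata-containers/configuration.toml || echo "templating prerequisites not met"`.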
Then you can create a VM template for later use by calling
```

View File

@@ -1,68 +0,0 @@
# Hypervisors
* [Hypervisors](#hypervisors)
* [Introduction](#introduction)
* [Types](#types)
* [Determine currently configured hypervisor](#determine-currently-configured-hypervisor)
* [Choose a Hypervisor](#choose-a-hypervisor)
## Introduction
Kata Containers supports multiple hypervisors. This document provides a very
high level overview of the available hypervisors, giving suggestions as to
which hypervisors you may wish to investigate further.
> **Note:**
>
> This document is not prescriptive or authoritative:
>
> - It is up to you to decide which hypervisors may be most appropriate for
> your use-case.
> - Refer to the official documentation for each hypervisor for further details.
## Types
Since each hypervisor offers different features and options, Kata Containers
provides a separate
[configuration file](/src/runtime/README.md#configuration)
for each. The configuration files contain comments explaining which options
are available, their default values and how each setting can be used.
> **Note:**
>
> The simplest way to switch between hypervisors is to create a symbolic link
> to the appropriate hypervisor-specific configuration file.
| Hypervisor | Written in | Architectures | Type | Configuration file |
|-|-|-|-|-|
[ACRN] | C | `x86_64` | Type 1 (bare metal) | `configuration-acrn.toml` |
[Cloud Hypervisor] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-clh.toml` |
[Firecracker] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-fc.toml` |
[QEMU] | C | all | Type 2 ([KVM]) | `configuration-qemu.toml` |
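
The symbolic-link approach mentioned in the note can be sketched as follows. `select_hypervisor_config` is a hypothetical helper, and the directory passed to it is an assumption: use whichever directory holds the hypervisor-specific files on your system. Be aware that `ln -sf` replaces an existing `configuration.toml`, including a regular file.

```bash
# Hypothetical helper: point configuration.toml in directory "$1" at the file named "$2".
select_hypervisor_config() {
    dir=$1
    target=$2
    # Refuse to create a dangling link.
    [ -f "$dir/$target" ] || return 1
    # Create a relative link, replacing any existing configuration.toml.
    ln -sf "$target" "$dir/configuration.toml"
}
```

For example, `select_hypervisor_config "$config_dir" configuration-fc.toml` would switch to Firecracker, where `$config_dir` is a placeholder for the configuration directory.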
## Determine currently configured hypervisor
```bash
$ kata-runtime kata-env | awk -v RS= '/\[Hypervisor\]/' | grep Path
```
## Choose a Hypervisor
The table below provides a brief summary of some of the differences between
the hypervisors:
| Hypervisor | Summary | Features | Limitations | Container Creation speed | Memory density | Use cases | Comment |
|-|-|-|-|-|-|-|-|
[ACRN] | Safety critical and real-time workloads | | | excellent | excellent | Embedded and IOT systems | For advanced users |
[Cloud Hypervisor] | Low latency, small memory footprint, small attack surface | Minimal | | excellent | excellent | High performance modern cloud workloads | |
[Firecracker] | Very slimline | Extremely minimal | Doesn't support all device types | excellent | excellent | Serverless / FaaS | |
[QEMU] | Lots of features | Lots | | good | good | Good option for most users | All users |
For further details, see the [Virtualization in Kata Containers](design/virtualization.md) document and the official documentation for each hypervisor.
[ACRN]: https://projectacrn.org
[Cloud Hypervisor]: https://github.com/cloud-hypervisor/cloud-hypervisor
[Firecracker]: https://github.com/firecracker-microvm/firecracker
[KVM]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
[QEMU]: http://www.qemu-project.org

View File

@@ -50,7 +50,9 @@ Kata packages are provided by official distribution repositories for:
| Distribution (link to installation guide) | Minimum versions |
|----------------------------------------------------------|--------------------------------------------------------------------------------|
| [CentOS](centos-installation-guide.md) | 8 |
| [Fedora](fedora-installation-guide.md) | 34 |
| [Fedora](fedora-installation-guide.md) | 32, Rawhide |
| [openSUSE](opensuse-installation-guide.md) | [Leap 15.1](opensuse-leap-15.1-installation-guide.md)<br>Leap 15.2, Tumbleweed |
| [SUSE Linux Enterprise (SLE)](sle-installation-guide.md) | SLE 15 SP1, 15 SP2 |
> **Note:**
>

View File

@@ -3,9 +3,15 @@
1. Install the Kata Containers components with the following commands:
```bash
$ sudo -E dnf install -y centos-release-advanced-virtualization
$ sudo -E dnf module disable -y virt:rhel
$ source /etc/os-release
$ cat <<EOF | sudo -E tee /etc/yum.repos.d/advanced-virt.repo
[advanced-virt]
name=Advanced Virtualization
baseurl=http://mirror.centos.org/\$contentdir/\$releasever/virt/\$basearch/advanced-virtualization
enabled=1
gpgcheck=1
skip_if_unavailable=1
EOF
$ cat <<EOF | sudo -E tee /etc/yum.repos.d/kata-containers.repo
[kata-containers]
name=Kata Containers
@@ -14,7 +20,8 @@
gpgcheck=1
skip_if_unavailable=1
EOF
$ sudo -E dnf install -y kata-containers
$ sudo -E dnf module disable -y virt:rhel
$ sudo -E dnf install -y kata-runtime
```
2. Decide which container manager to use and select the corresponding link that follows:

View File

@@ -18,7 +18,7 @@
>
> - If you decide to proceed and install a Kata Containers release, you can
> still check for the latest version of Kata Containers by running
> `kata-runtime check --only-list-releases`.
> `kata-runtime kata-check --only-list-releases`.
>
> - These instructions will not work for Fedora 31 and higher since those
> distribution versions only support cgroups version 2 by default. However,

View File

@@ -3,7 +3,7 @@
1. Install the Kata Containers components with the following commands:
```bash
$ sudo -E dnf -y install kata-containers
$ sudo -E dnf -y install kata-runtime
```
2. Decide which container manager to use and select the corresponding link that follows:

View File

@@ -6,7 +6,7 @@
* [Install Kata](#install-kata)
* [Create a Kata-enabled Image](#create-a-kata-enabled-image)
Kata Containers on Google Compute Engine (GCE) makes use of [nested virtualization](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances). Most of the installation procedure is identical to that for Kata on your preferred distribution, but enabling nested virtualization currently requires extra steps on GCE. This guide walks you through creating an image and instance with nested virtualization enabled. Note that `kata-runtime check` checks for nested virtualization, but does not fail if support is not found.
Kata Containers on Google Compute Engine (GCE) makes use of [nested virtualization](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances). Most of the installation procedure is identical to that for Kata on your preferred distribution, but enabling nested virtualization currently requires extra steps on GCE. This guide walks you through creating an image and instance with nested virtualization enabled. Note that `kata-runtime kata-check` checks for nested virtualization, but does not fail if support is not found.
As a pre-requisite, this guide assumes an installed and configured instance of the [Google Cloud SDK](https://cloud.google.com/sdk/downloads). For a zero-configuration option, all of the commands below have been tested under [Google Cloud Shell](https://cloud.google.com/shell/) (as of Jun 2018). Verify your `gcloud` installation and configuration:

View File

@@ -54,7 +54,7 @@ to enable nested virtualization can be found on the
[KVM Nested Guests page](https://www.linux-kvm.org/page/Nested_Guests)
Alternatively, and for other architectures, the Kata Containers built-in
[`check`](../../src/runtime/README.md#hardware-requirements)
[`kata-check`](../../src/runtime/README.md#hardware-requirements)
command can be used *inside Minikube* once Kata has been installed, to check for compatibility.
## Setting up Minikube

View File

@@ -0,0 +1,10 @@
# Install Kata Containers on openSUSE
1. Install the Kata Containers components with the following commands:
```bash
$ sudo -E zypper -n install katacontainers
```
2. Decide which container manager to use and select the corresponding link that follows:
- [Kubernetes](../Developer-Guide.md#run-kata-containers-with-kubernetes)

View File

@@ -0,0 +1,11 @@
# Install Kata Containers on openSUSE Leap 15.1
1. Install the Kata Containers components with the following commands:
```bash
$ sudo -E zypper addrepo --refresh "https://download.opensuse.org/repositories/devel:/kubic/openSUSE_Leap_15.1/devel:kubic.repo"
$ sudo -E zypper -n --gpg-auto-import-keys install katacontainers
```
2. Decide which container manager to use and select the corresponding link that follows:
- [Kubernetes](../Developer-Guide.md#run-kata-containers-with-kubernetes)

View File

@@ -0,0 +1,13 @@
# Install Kata Containers on SLE
1. Install the Kata Containers components with the following commands:
```bash
$ source /etc/os-release
$ DISTRO_VERSION=$(sed "s/-/_/g" <<< "$VERSION")
$ sudo -E zypper addrepo --refresh "https://download.opensuse.org/repositories/devel:/kubic/SLE_${DISTRO_VERSION}_Backports/devel:kubic.repo"
$ sudo -E zypper -n --gpg-auto-import-keys install katacontainers
```
2. Decide which container manager to use and select the corresponding link that follows:
- [Kubernetes](../Developer-Guide.md#run-kata-containers-with-kubernetes)

View File

@@ -1,58 +1,13 @@
# Kata Containers snap package
* [Install Kata Containers](#install-kata-containers)
* [Configure Kata Containers](#configure-kata-containers)
* [Integration with shim v2 Container Engines](#integration-with-shim-v2-container-engines)
* [Remove Kata Containers snap package](#remove-kata-containers-snap-package)
## Install Kata Containers
# Install Kata Containers from `snapcraft.io`
Kata Containers can be installed in any Linux distribution that supports
[snapd](https://docs.snapcraft.io/installing-snapd).
Run the following command to install **Kata Containers**:
Run the following command to install Kata Containers:
```sh
$ sudo snap install kata-containers --stable --classic
```
```bash
$ sudo snap install kata-containers --classic
```
## Configure Kata Containers
By default, the Kata Containers snap image is mounted at `/snap/kata-containers` as a
read-only file system, so the default configuration file cannot be edited.
Fortunately, Kata Containers supports loading a configuration file from a
path other than the default.
```sh
$ sudo mkdir -p /etc/kata-containers
$ sudo cp /snap/kata-containers/current/usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers/
$ $EDITOR /etc/kata-containers/configuration.toml
```
## Integration with shim v2 Container Engines
The Container engine daemon (`cri-o`, `containerd`, etc) needs to be able to find the
`containerd-shim-kata-v2` binary to allow Kata Containers to be created.
Run the following command to create a symbolic link to the shim v2 binary.
```sh
$ sudo ln -sf /snap/kata-containers/current/usr/bin/containerd-shim-kata-v2 /usr/local/bin/containerd-shim-kata-v2
```
Once the symbolic link has been created and the engine daemon configured, `io.containerd.kata.v2`
can be used as runtime.
Read the following documents to learn how to run Kata Containers 2.x with `containerd`.
* [How to use Kata Containers and Containerd](https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/containerd-kata.md)
* [Install Kata Containers with containerd](https://github.com/kata-containers/kata-containers/blob/main/docs/install/container-manager/containerd/containerd-install.md)
## Remove Kata Containers snap package
Run the following command to remove the Kata Containers snap:
```sh
$ sudo snap remove kata-containers
```
For further information on integrating and configuring the `snap` Kata Containers install,
refer to the [Kata Containers packaging `snap` documentation](https://github.com/kata-containers/packaging/blob/master/snap/README.md#configure-kata-containers).

View File

@@ -0,0 +1,15 @@
# Install Kata Containers on Ubuntu
1. Install the Kata Containers components with the following commands:
```bash
$ ARCH=$(arch)
$ BRANCH="${BRANCH:-master}"
$ sudo sh -c "echo 'deb http://download.opensuse.org/repositories/home:/katacontainers:/releases:/${ARCH}:/${BRANCH}/xUbuntu_$(lsb_release -rs)/ /' > /etc/apt/sources.list.d/kata-containers.list"
$ curl -sL http://download.opensuse.org/repositories/home:/katacontainers:/releases:/${ARCH}:/${BRANCH}/xUbuntu_$(lsb_release -rs)/Release.key | sudo apt-key add -
$ sudo -E apt-get update
$ sudo -E apt-get -y install kata-runtime kata-proxy kata-shim
```
2. Decide which container manager to use and select the corresponding link that follows:
- [Kubernetes](../Developer-Guide.md#run-kata-containers-with-kubernetes)

View File

@@ -1,62 +1,56 @@
# Table of Contents
- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Helpful Links before starting](#helpful-links-before-starting)
- [Steps to enable Intel® QAT in Kata Containers](#steps-to-enable-intel-qat-in-kata-containers)
- [Script variables](#script-variables)
- [Set environment variables (Every Reboot)](#set-environment-variables-every-reboot)
- [Prepare the Ubuntu Host](#prepare-the-ubuntu-host)
- [Identify which PCI Bus the Intel® QAT card is on](#identify-which-pci-bus-the-intel-qat-card-is-on)
- [Install necessary packages for Ubuntu](#install-necessary-packages-for-ubuntu)
- [Download Intel® QAT drivers](#download-intel-qat-drivers)
- [Copy Intel® QAT configuration files and enable virtual functions](#copy-intel-qat-configuration-files-and-enable-virtual-functions)
- [Expose and Bind Intel® QAT virtual functions to VFIO-PCI (Every reboot)](#expose-and-bind-intel-qat-virtual-functions-to-vfio-pci-every-reboot)
- [Check Intel® QAT virtual functions are enabled](#check-intel-qat-virtual-functions-are-enabled)
- [Prepare Kata Containers](#prepare-kata-containers)
- [Download Kata kernel Source](#download-kata-kernel-source)
- [Build Kata kernel](#build-kata-kernel)
- [Copy Kata kernel](#copy-kata-kernel)
- [Prepare Kata root filesystem](#prepare-kata-root-filesystem)
- [Compile Intel® QAT drivers for Kata Containers kernel and add to Kata Containers rootfs](#compile-intel-qat-drivers-for-kata-containers-kernel-and-add-to-kata-containers-rootfs)
- [Copy Kata rootfs](#copy-kata-rootfs)
- [Verify Intel® QAT works in a container](#verify-intel-qat-works-in-a-container)
- [Build OpenSSL Intel® QAT engine container](#build-openssl-intel-qat-engine-container)
- [Test Intel® QAT with the ctr tool](#test-intel-qat-with-the-ctr-tool)
- [Test Intel® QAT in Kubernetes](#test-intel-qat-in-kubernetes)
- [Troubleshooting](#troubleshooting)
- [Optional Scripts](#optional-scripts)
- [Verify Intel® QAT card counters are incremented](#verify-intel-qat-card-counters-are-incremented)
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Helpful Links before starting](#helpful-links-before-starting)
* [Steps to enable Intel QAT in Kata Containers](#steps-to-enable-intel-qat-in-kata-containers)
* [Script variables](#script-variables)
* [Set environment variables (Every Reboot)](#set-environment-variables-every-reboot)
* [Prepare the Clear Linux Host](#prepare-the-clear-linux-host)
* [Identify which PCI Bus the Intel QAT card is on](#identify-which-pci-bus-the-intel-qat-card-is-on)
* [Install necessary bundles for Clear Linux](#install-necessary-bundles-for-clear-linux)
* [Download Intel QAT drivers](#download-intel-qat-drivers)
* [Copy Intel QAT configuration files and enable Virtual Functions](#copy-intel-qat-configuration-files-and-enable-virtual-functions)
* [Expose and Bind Intel QAT virtual functions to VFIO-PCI (Every reboot)](#expose-and-bind-intel-qat-virtual-functions-to-vfio-pci-every-reboot)
* [Check Intel QAT virtual functions are enabled](#check-intel-qat-virtual-functions-are-enabled)
* [Prepare Kata Containers](#prepare-kata-containers)
* [Download Kata kernel Source](#download-kata-kernel-source)
* [Build Kata kernel](#build-kata-kernel)
* [Copy Kata kernel](#copy-kata-kernel)
* [Prepare Kata root filesystem](#prepare-kata-root-filesystem)
* [Compile Intel QAT drivers for Kata Containers kernel and add to Kata Containers rootfs](#compile-intel-qat-drivers-for-kata-containers-kernel-and-add-to-kata-containers-rootfs)
* [Copy Kata rootfs](#copy-kata-rootfs)
* [Update Kata configuration to point to custom kernel and rootfs](#update-kata-configuration-to-point-to-custom-kernel-and-rootfs)
* [Verify Intel QAT works in a Docker Kata Containers container](#verify-intel-qat-works-in-a-docker-kata-containers-container)
* [Build OpenSSL Intel QAT engine container](#build-openssl-intel-qat-engine-container)
* [Test Intel QAT in Docker](#test-intel-qat-in-docker)
* [Troubleshooting](#troubleshooting)
* [Optional Scripts](#optional-scripts)
* [Verify Intel QAT card counters are incremented](#verify-intel-qat-card-counters-are-incremented)
# Introduction
Intel® QuickAssist Technology (QAT) provides hardware acceleration
Intel QuickAssist Technology (Intel QAT) provides hardware acceleration
for security (cryptography) and compression. These instructions cover the
steps for the latest [Ubuntu LTS release](https://ubuntu.com/download/desktop)
which already includes the QAT host driver. These instructions can be adapted to
any Linux distribution. These instructions guide the user on how to download
the kernel sources, compile kernel driver modules against those sources, and
load them onto the host as well as preparing a specially built Kata Containers
kernel and custom Kata Containers rootfs.
* Download kernel sources
* Compile Kata kernel
* Compile kernel driver modules against those sources
* Download rootfs
* Add driver modules to rootfs
* Build rootfs image
steps for [Clear Linux](https://clearlinux.org) but can be adapted to any
Linux distribution. Your distribution may already have the Intel QAT
drivers, but it is likely they do not contain the necessary user space
components. These instructions guide the user on how to download the kernel
sources, compile kernel driver modules against those sources, and load them
onto the host as well as preparing a specially built Kata Containers kernel
and custom Kata Containers rootfs.
## Helpful Links before starting
[Intel® QuickAssist Technology at `01.org`](https://01.org/intel-quickassist-technology)
[Intel QAT Engine](https://github.com/intel/QAT_Engine)
[Intel® QuickAssist Technology Engine for OpenSSL](https://github.com/intel/QAT_Engine)
[Intel QuickAssist Technology at `01.org`](https://01.org/intel-quickassist-technology)
[Intel Device Plugin for Kubernetes](https://github.com/intel/intel-device-plugins-for-kubernetes)
[Intel® QuickAssist Technology for Crypto Poll Mode Driver](https://dpdk-docs.readthedocs.io/en/latest/cryptodevs/qat.html)
[Intel QuickAssist Crypto Poll Mode Driver](https://dpdk-docs.readthedocs.io/en/latest/cryptodevs/qat.html)
## Steps to enable Intel® QAT in Kata Containers
## Steps to enable Intel QAT in Kata Containers
There are some steps to complete only once, some steps to complete with every
reboot, and some steps to complete when the host kernel changes.
@@ -73,95 +67,91 @@ needed to point to updated drivers or different install locations.
Make sure to check [`01.org`](https://01.org/intel-quickassist-technology) for
the latest driver.
```bash
$ export QAT_DRIVER_VER=qat1.7.l.4.12.0-00011.tar.gz
$ export QAT_DRIVER_URL=https://downloadmirror.intel.com/30178/eng/${QAT_DRIVER_VER}
```sh
$ export QAT_DRIVER_VER=qat1.7.l.4.8.0-00005.tar.gz
$ export QAT_DRIVER_URL=https://01.org/sites/default/files/downloads/${QAT_DRIVER_VER}
$ export QAT_CONF_LOCATION=~/QAT_conf
$ export QAT_DOCKERFILE=https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/master/demo/openssl-qat-engine/Dockerfile
$ export QAT_SRC=~/src/QAT
$ export GOPATH=~/src/go
$ export OSBUILDER=~/src/osbuilder
$ export KATA_KERNEL_LOCATION=~/kata
$ export KATA_ROOTFS_LOCATION=~/kata
```
## Prepare the Ubuntu Host
## Prepare the Clear Linux Host
The host could be a bare metal instance or a virtual machine. If using a
virtual machine, make sure that KVM nesting is enabled. The following
instructions reference an Intel® C62X chipset. Some of the instructions must be
modified if using a different Intel® QAT device. The Intel® QAT chipset can be
identified by executing the following.
instructions reference an Intel QAT. Some of the instructions must be
modified if using a different Intel QAT device. You can identify the Intel QAT
chipset by executing the following.
### Identify which PCI Bus the Intel® QAT card is on
### Identify which PCI Bus the Intel QAT card is on
```bash
```sh
$ for i in 0434 0435 37c8 1f18 1f19; do lspci -d 8086:$i; done
```
### Install necessary packages for Ubuntu
### Install necessary bundles for Clear Linux
These packages are necessary to compile the Kata kernel, Intel® QAT driver, and to
prepare the rootfs for Kata. [Docker](https://docs.docker.com/engine/install/ubuntu/)
also needs to be installed to be able to build the rootfs. To test that
everything works a Kubernetes pod is started requesting Intel® QAT resources. For the
pass through of the virtual functions, the kernel boot parameters need to include
`intel_iommu=on`.
Clear Linux version 30780 (Released August 13, 2019) includes a
`linux-firmware-qat` bundle that has the necessary QAT firmware along with a
functional QAT host driver that works with Kata Containers.
```bash
$ sudo apt update
$ sudo apt install -y golang-go build-essential python pkg-config zlib1g-dev libudev-dev bison libelf-dev flex libtool automake autotools-dev autoconf bc libpixman-1-dev coreutils libssl-dev
$ sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT=""/GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"/' /etc/default/grub
$ sudo update-grub
```sh
$ sudo swupd bundle-add network-basic linux-firmware-qat make c-basic go-basic containers-virt dev-utils devpkg-elfutils devpkg-systemd devpkg-ssl
$ sudo clr-boot-manager update
$ sudo systemctl enable --now docker
$ sudo reboot
```
### Download Intel® QAT drivers
### Download Intel QAT drivers
This will download the [Intel® QAT drivers](https://01.org/intel-quickassist-technology).
This will download the Intel QAT drivers from [`01.org`](https://01.org/intel-quickassist-technology).
Make sure to check the website for the latest version.
```bash
```sh
$ mkdir -p $QAT_SRC
$ cd $QAT_SRC
$ curl -L $QAT_DRIVER_URL | tar zx
```
### Copy Intel® QAT configuration files and enable virtual functions
### Copy Intel QAT configuration files and enable Virtual Functions
Modify the instructions below as necessary if using a different Intel® QAT hardware
Modify the instructions below as necessary if using a different QAT hardware
platform. You can learn more about customizing configuration files at the
[Intel® QAT Engine repository](https://github.com/intel/QAT_Engine/#copy-the-correct-intel-quickassist-technology-driver-config-files)
[Intel QAT Engine repository](https://github.com/intel/QAT_Engine/#copy-the-correct-intel-quickassist-technology-driver-config-files)
This section starts from a base config file and changes the `SSL` section to
`SHIM` to support the OpenSSL engine. There are more tweaks that you can make
depending on the use case and how many Intel® QAT engines should be run. You
depending on the use case and how many Intel QAT engines should be run. You
can find more information about how to customize in the
[Intel® QuickAssist Technology Software for Linux* - Programmer's Guide.](https://01.org/sites/default/files/downloads/336210qatswprogrammersguiderev006.pdf)
> **Note: This section assumes that an Intel® QAT `c6xx` platform is used.**
> **Note: This section assumes that a QAT `c6xx` platform is used.**
```bash
```sh
$ mkdir -p $QAT_CONF_LOCATION
$ cp $QAT_SRC/quickassist/utilities/adf_ctl/conf_files/c6xxvf_dev0.conf.vm $QAT_CONF_LOCATION/c6xxvf_dev0.conf
$ sed -i 's/\[SSL\]/\[SHIM\]/g' $QAT_CONF_LOCATION/c6xxvf_dev0.conf
```
### Expose and Bind Intel® QAT virtual functions to VFIO-PCI (Every reboot)
### Expose and Bind Intel QAT virtual functions to VFIO-PCI (Every reboot)
To enable virtual functions, the host OS should have IOMMU groups enabled. In
the UEFI Firmware Intel® Virtualization Technology for Directed I/O
(Intel® VT-d) must be enabled. Also, the kernel boot parameter should be
`intel_iommu=on` or `intel_iommu=igfx_off`. This should have been set from
the instructions above. Check the output of `/proc/cmdline` to confirm. The
following commands assume you installed an Intel® QAT card, IOMMU is on, and
the UEFI Firmware Intel Virtualization Technology for Directed I/O
(Intel VT-d) must be enabled. Also, the kernel boot parameter should be
`intel_iommu=on` or `intel_iommu=igfx_off`. The default in Clear Linux currently
is `intel_iommu=igfx_off` which should work with the Intel QAT device. The
following commands assume you installed an Intel QAT card, IOMMU is on, and
VT-d is enabled. The vendor and device ID are added to the `VFIO-PCI` driver so that
each exposed virtual function can be bound to the `VFIO-PCI` driver. Once
complete, each virtual function can be passed into a Kata Containers container using
the PCIe device passthrough feature. For Kubernetes, the
[Intel device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes)
for Kubernetes handles the binding of the driver, but the VFs still must be
the PCIe device passthrough feature. For Kubernetes, the Intel device plugin
for Kubernetes handles the binding of the driver but the VFs still must be
enabled.
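
The boot-parameter requirement described here can be checked from a script before any virtual functions are bound. The following is a minimal sketch, not part of the Kata or Intel QAT tooling; `iommu_enabled` is a hypothetical helper name, and it only accepts the two values this guide mentions.

```bash
#!/usr/bin/env bash
# Sketch: report whether a kernel command line enables the Intel IOMMU.
# iommu_enabled is a hypothetical helper introduced for illustration.
iommu_enabled() {
    local cmdline="$1"
    # Pad with spaces so each parameter is matched as a whole word.
    case " ${cmdline} " in
        *" intel_iommu=on "*|*" intel_iommu=igfx_off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if [ -r /proc/cmdline ] && iommu_enabled "$(cat /proc/cmdline)"; then
    echo "intel_iommu is set"
else
    echo "intel_iommu is not set; update the boot parameters and reboot"
fi
```

Matching whole words avoids a false positive on a value such as `intel_iommu=off`.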
```bash
```sh
$ sudo modprobe vfio-pci
$ QAT_PCI_BUS_PF_NUMBERS=$((lspci -d :435 && lspci -d :37c8 && lspci -d :19e2 && lspci -d :6f54) | cut -d ' ' -f 1)
$ QAT_PCI_BUS_PF_1=$(echo $QAT_PCI_BUS_PF_NUMBERS | cut -d ' ' -f 1)
@@ -170,10 +160,8 @@ $ QAT_PCI_ID_VF=$(cat /sys/bus/pci/devices/0000:${QAT_PCI_BUS_PF_1}/virtfn0/ueve
$ QAT_VENDOR_AND_ID_VF=$(echo ${QAT_PCI_ID_VF/PCI_ID=} | sed 's/:/ /')
$ echo $QAT_VENDOR_AND_ID_VF | sudo tee --append /sys/bus/pci/drivers/vfio-pci/new_id
```
Loop through all the virtual functions and bind to the VFIO driver
```bash
```sh
$ for f in /sys/bus/pci/devices/0000:$QAT_PCI_BUS_PF_1/virtfn*
do QAT_PCI_BUS_VF=$(basename $(readlink $f))
echo $QAT_PCI_BUS_VF | sudo tee --append /sys/bus/pci/drivers/c6xxvf/unbind
@@ -181,23 +169,22 @@ $ for f in /sys/bus/pci/devices/0000:$QAT_PCI_BUS_PF_1/virtfn*
done
```
### Check Intel® QAT virtual functions are enabled
### Check Intel QAT virtual functions are enabled
If the following command returns empty, then the virtual functions are not
properly enabled. This command checks the enumerated device IDs for just the
virtual functions. Using the Intel® QAT as an example, the physical device ID
virtual functions. Using the Intel QAT as an example, the physical device ID
is `37c8` and virtual function device ID is `37c9`. The following command checks
if VF's are enabled for any of the currently known Intel® QAT device ID's. The
if VF's are enabled for any of the currently known Intel QAT device ID's. The
following `ls` command should show the 16 VF's bound to `VFIO-PCI`.
```bash
```sh
$ for i in 0442 0443 37c9 19e3; do lspci -d 8086:$i; done
```
Another way to check is to see which PCI devices `VFIO-PCI` is mapped to.
They should match the device IDs of the VFs.
```bash
```sh
$ ls -la /sys/bus/pci/drivers/vfio-pci
```
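
Instead of reading the `ls` listing by eye, the bound devices can be counted. This is a hedged sketch: `count_bound_devices` is a hypothetical helper, and it assumes that bound devices appear in the driver directory as symlinks named after their PCI address in domain `0000`, as in the paths used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Sketch: count the PCI devices currently bound to a driver.
# count_bound_devices is a hypothetical helper introduced for illustration.
count_bound_devices() {
    local driver_dir="$1" count=0 entry
    for entry in "${driver_dir}"/0000:*; do
        # A bound device shows up as a symlink named after its PCI address;
        # control files such as new_id and unbind are skipped by the glob.
        if [ -L "${entry}" ]; then
            count=$((count + 1))
        fi
    done
    echo "${count}"
}

# Prints 0 when the driver is not loaded or nothing is bound.
count_bound_devices /sys/bus/pci/drivers/vfio-pci
```

On a host set up as described in this guide, the result should equal the number of virtual functions that were bound.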
@@ -214,16 +201,16 @@ There are some patches that must be installed as well, which the
`build-kernel.sh` script should automatically apply. If you are using a
different kernel version, then you might need to manually apply them. Since
the Kata Containers kernel has a minimal set of kernel flags set, you must
create an Intel® QAT kernel fragment with the necessary `CONFIG_CRYPTO_*` options set.
create a QAT kernel fragment with the necessary `CONFIG_CRYPTO_*` options set.
Update the config to set some of the `CRYPTO` flags to enabled. This might
change with different kernel versions. The following instructions were tested
with kernel `v5.4.0-64-generic`.
change with different kernel versions. We tested the following instructions
with kernel `v4.19.28-41`.
```bash
```sh
$ mkdir -p $GOPATH
$ cd $GOPATH
$ go get -v github.com/kata-containers/kata-containers
$ cat << EOF > $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/configs/fragments/common/qat.conf
$ go get -v github.com/kata-containers/packaging
$ cat << EOF > $GOPATH/src/github.com/kata-containers/packaging/kernel/configs/fragments/common/qat.conf
CONFIG_PCIEAER=y
CONFIG_UIO=y
CONFIG_CRYPTO_HW=y
@@ -234,70 +221,61 @@ CONFIG_MODULE_SIG=y
CONFIG_CRYPTO_AUTHENC=y
CONFIG_CRYPTO_DH=y
EOF
$ $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/build-kernel.sh setup
$ $GOPATH/src/github.com/kata-containers/packaging/kernel/build-kernel.sh setup
```
### Build Kata kernel
```bash
$ cd $GOPATH
$ export LINUX_VER=$(ls -d kata-linux-*)
```sh
$ export LINUX_VER=$(ls -d kata*)
$ sed -i 's/EXTRAVERSION =/EXTRAVERSION = .qat.container/' $LINUX_VER/Makefile
$ $GOPATH/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/build-kernel.sh build
$ $GOPATH/src/github.com/kata-containers/packaging/kernel/build-kernel.sh build
```
### Copy Kata kernel
```bash
$ export KATA_KERNEL_NAME=vmlinux-${LINUX_VER}_qat
```sh
$ mkdir -p $KATA_KERNEL_LOCATION
$ cp ${GOPATH}/${LINUX_VER}/vmlinux ${KATA_KERNEL_LOCATION}/${KATA_KERNEL_NAME}
$ cp $LINUX_VER/arch/x86/boot/bzImage $KATA_KERNEL_LOCATION/vmlinuz-${LINUX_VER}_qat
```
### Prepare Kata root filesystem
These instructions build upon the OS builder instructions located in the
[Developer Guide](../Developer-Guide.md). At this point it is recommended that
[Docker](https://docs.docker.com/engine/install/ubuntu/) is installed first, and
then [Kata-deploy](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy)
is used to install Kata. This will make sure that the correct `agent` version
is installed into the rootfs in the steps below.
[Developer Guide](../Developer-Guide.md). The following instructions use Clear
Linux (Kata Containers default) as the root filesystem with systemd as the
init and will add in the `kmod` binary, which is not a standard binary in a
Kata rootfs image. The `kmod` binary is necessary to load the QAT kernel
modules when the virtual machine rootfs boots. You should install Docker on
your system before running the following commands. If you need to use a custom
`kata-agent`, then refer to the previous link on how to add it in.
The following instructions use Debian as the root filesystem with systemd as
the init and will add in the `kmod` binary, which is not a standard binary in
a Kata rootfs image. The `kmod` binary is necessary to load the Intel® QAT
kernel modules when the virtual machine rootfs boots.
```bash
$ export OSBUILDER=$GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder
$ export ROOTFS_DIR=${OSBUILDER}/rootfs-builder/rootfs
```sh
$ mkdir -p $OSBUILDER
$ cd $OSBUILDER
$ git clone https://github.com/kata-containers/osbuilder.git
$ export ROOTFS_DIR=${OSBUILDER}/osbuilder/rootfs-builder/rootfs
$ export EXTRA_PKGS='kmod'
```
Make sure that the `kata-agent` version matches the installed `kata-runtime`
version. Also make sure the `kata-runtime` install location is in your `PATH`
variable. The following `AGENT_VERSION` can be set manually to match
the `kata-runtime` version if the following commands don't work.
```bash
$ export PATH=$PATH:/opt/kata/bin
$ cd $GOPATH
version.
```sh
$ export AGENT_VERSION=$(kata-runtime version | head -n 1 | grep -o "[0-9.]\+")
$ cd ${OSBUILDER}/rootfs-builder
$ cd ${OSBUILDER}/osbuilder/rootfs-builder
$ sudo rm -rf ${ROOTFS_DIR}
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh debian'
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh clearlinux'
```
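
The version extraction used for `AGENT_VERSION` can be made more defensive so that a missing `kata-runtime` binary does not leave the variable empty. This is a sketch under assumptions: `parse_version` is a hypothetical helper, the sample string in the comment is only an assumed shape of the first line of `kata-runtime version`, and the fallback `2.0.1` is a placeholder that must be replaced with the version actually installed.

```bash
#!/usr/bin/env bash
# Sketch: derive AGENT_VERSION from the first line of the runtime's version
# output, with a manual fallback.
# parse_version is a hypothetical helper introduced for illustration.
parse_version() {
    # Example input (assumed format): "kata-runtime  : 2.0.1"
    printf '%s\n' "$1" | head -n 1 | grep -o '[0-9][0-9.]*' | head -n 1
}

if command -v kata-runtime >/dev/null 2>&1; then
    AGENT_VERSION=$(parse_version "$(kata-runtime version)")
fi

# Placeholder fallback: set this by hand to match the installed kata-runtime.
export AGENT_VERSION="${AGENT_VERSION:-2.0.1}"
echo "AGENT_VERSION=${AGENT_VERSION}"
```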
### Compile Intel® QAT drivers for Kata Containers kernel and add to Kata Containers rootfs
### Compile Intel QAT drivers for Kata Containers kernel and add to Kata Containers rootfs
After the Kata Containers kernel builds with the proper configuration flags,
you must build the Intel® QAT drivers against that Kata Containers kernel
you must build the Intel QAT drivers against that Kata Containers kernel
version in a similar way they were previously built for the host OS. You must
set the `KERNEL_SOURCE_ROOT` variable to the Kata Containers kernel source
directory and build the Intel® QAT drivers again. The `make` command will
install the Intel® QAT modules into the Kata rootfs.
directory and build the Intel QAT drivers again.
```bash
```sh
$ cd $GOPATH
$ export LINUX_VER=$(ls -d kata*)
$ export KERNEL_MAJOR_VERSION=$(awk '/^VERSION =/{print $NF}' $GOPATH/$LINUX_VER/Makefile)
@@ -306,18 +284,16 @@ $ export KERNEL_SUBLEVEL=$(awk '/^SUBLEVEL =/{print $NF}' $GOPATH/$LINUX_VER/Mak
$ export KERNEL_EXTRAVERSION=$(awk '/^EXTRAVERSION =/{print $NF}' $GOPATH/$LINUX_VER/Makefile)
$ export KERNEL_ROOTFS_DIR=${KERNEL_MAJOR_VERSION}.${KERNEL_PATHLEVEL}.${KERNEL_SUBLEVEL}${KERNEL_EXTRAVERSION}
$ cd $QAT_SRC
$ KERNEL_SOURCE_ROOT=$GOPATH/$LINUX_VER ./configure --enable-icp-sriov=guest
$ KERNEL_SOURCE_ROOT=$GOPATH/$LINUX_VER ./configure --disable-qat-lkcf --enable-icp-sriov=guest
$ sudo -E make all -j$(nproc)
$ sudo -E make INSTALL_MOD_PATH=$ROOTFS_DIR qat-driver-install -j$(nproc)
```
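
The module directory name used inside the rootfs is the kernel release string assembled from the kernel `Makefile`. The sketch below computes it in one pass so it can be compared with the directory created under the rootfs modules path. `kernel_release` is a hypothetical helper; it reads the third field of each line so that an empty `EXTRAVERSION =` yields an empty suffix rather than a stray `=`.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the kernel release string from a kernel Makefile.
# kernel_release is a hypothetical helper introduced for illustration.
kernel_release() {
    local makefile="$1"
    awk '
        /^VERSION =/      { v = $3 }
        /^PATCHLEVEL =/   { p = $3 }
        /^SUBLEVEL =/     { s = $3 }
        /^EXTRAVERSION =/ { e = $3 }
        END { printf "%s.%s.%s%s\n", v, p, s, e }
    ' "${makefile}"
}

# Only runs when the kernel tree from the earlier steps is present.
if [ -f "${GOPATH:-}/${LINUX_VER:-}/Makefile" ]; then
    kernel_release "${GOPATH}/${LINUX_VER}/Makefile"
fi
```

The printed value should match the directory name that `depmod` is later run against.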
The `usdm_drv` module also needs to be copied into the rootfs modules path and
`depmod` should be run.
```bash
$ sudo cp $QAT_SRC/build/usdm_drv.ko $ROOTFS_DIR/lib/modules/${KERNEL_ROOTFS_DIR}/updates/drivers
```sh
$ sudo cp $QAT_SRC/build/usdm_drv.ko $ROOTFS_DIR/usr/lib/modules/${KERNEL_ROOTFS_DIR}/updates/drivers
$ sudo depmod -a -b ${ROOTFS_DIR} ${KERNEL_ROOTFS_DIR}
$ cd ${OSBUILDER}/image-builder
$ cd ${OSBUILDER}/osbuilder/image-builder
$ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
```
@@ -326,225 +302,84 @@ $ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
### Copy Kata rootfs
```bash
```sh
$ mkdir -p $KATA_ROOTFS_LOCATION
$ cp ${OSBUILDER}/image-builder/kata-containers.img $KATA_ROOTFS_LOCATION
$ cp ${OSBUILDER}/osbuilder/image-builder/kata-containers.img $KATA_ROOTFS_LOCATION
```
## Verify Intel® QAT works in a container
### Update Kata configuration to point to custom kernel and rootfs
The following instructions use an OpenSSL Dockerfile that builds the
Intel® QAT engine to allow OpenSSL to offload crypto functions. It is a
convenient way to test that VFIO device passthrough for the Intel® QAT VFs is
You must update the `configuration.toml` for Kata Containers to point to the
custom kernel, custom rootfs, and to specify which modules to load when the
virtual machine is booted when a container is run. The following example
assumes you installed an Intel QAT, and you need to load those modules.
```sh
$ sudo mkdir -p /etc/kata-containers
$ sudo cp /usr/share/defaults/kata-containers/configuration-qemu.toml /etc/kata-containers/configuration.toml
$ sudo sed -i "s|kernel_params = \"\"|kernel_params = \"modules-load=usdm_drv,qat_c62xvf\"|g" /etc/kata-containers/configuration.toml
$ sudo sed -i "s|\/usr\/share\/kata-containers\/kata-containers.img|${KATA_ROOTFS_LOCATION}\/kata-containers.img|g" /etc/kata-containers/configuration.toml
$ sudo sed -i "s|\/usr\/share\/kata-containers\/vmlinuz.container|${KATA_KERNEL_LOCATION}\/vmlinuz-${LINUX_VER}_qat|g" /etc/kata-containers/configuration.toml
```
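
The path substitutions only take effect if the default paths in the file match exactly what `sed` searches for, so it is worth reading the values back. The following sketch keys on the TOML setting name instead of the old path. `set_toml_path` and `show_toml_path` are hypothetical helpers, GNU `sed` is assumed, and the value must not contain `|` or `&`.

```bash
#!/usr/bin/env bash
# Sketch: rewrite and read back a `key = "value"` line in configuration.toml.
# Both helpers are hypothetical and introduced for illustration only.
set_toml_path() {
    local file="$1" key="$2" value="$3"
    # Anchoring on "^key = " keeps kernel_params untouched when key is "kernel".
    sed -i "s|^${key} = \".*\"|${key} = \"${value}\"|" "${file}"
}

show_toml_path() {
    local file="$1" key="$2"
    sed -n "s|^${key} = \"\(.*\)\"|\1|p" "${file}"
}
```

A possible use, after the configuration file has been copied to `/etc/kata-containers`, is `sudo` running `set_toml_path` on that file for `kernel` and `image`, followed by `show_toml_path` to confirm that both point at the files copied in the previous steps.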
## Verify Intel QAT works in a Docker Kata Containers container
The following instructions leverage an OpenSSL Dockerfile that builds the
Intel QAT engine to allow OpenSSL to offload crypto functions. It is a
convenient way to test that VFIO device passthrough for the Intel QAT VFs is
working properly with the Kata Containers VM.
### Build OpenSSL Intel® QAT engine container
## Build OpenSSL Intel QAT engine container
Use the OpenSSL Intel® QAT [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/master/demo/openssl-qat-engine)
Use the OpenSSL Intel QAT [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/master/demo/openssl-qat-engine)
to build a container image with an optimized OpenSSL engine for
Intel® QAT. Using `docker build` with the Kata Containers runtime can sometimes
have issues. Therefore, make sure that `runc` is the default Docker container
runtime.
Intel QAT. Using `docker build` with the Kata Containers runtime can sometimes
have issues. Therefore, we recommend you change the default runtime to
`runc` before doing a build. Instructions for this are below.
```bash
```sh
$ cd $QAT_SRC
$ curl -O $QAT_DOCKERFILE
$ sudo sed -i 's/kata-runtime/runc/g' /etc/systemd/system/docker.service.d/50-runtime.conf
$ sudo systemctl daemon-reload && sudo systemctl restart docker
$ sudo docker build -t openssl-qat-engine .
```
> **Note: The Intel® QAT driver version in this container might not match the
> Intel® QAT driver compiled and loaded on the host when compiling.**
> **Note: The Intel QAT driver version in this container might not match the
> Intel QAT driver compiled and loaded on the host when compiling.**
### Test Intel® QAT with the ctr tool
### Test Intel QAT in Docker
The `ctr` tool can be used to interact with the containerd daemon. It may be
more convenient to use this tool to verify the kernel and image instead of
setting up a Kubernetes cluster. The correct Kata runtimes need to be added
to the containerd `config.toml`. Below is a sample snippet that can be added
to allow QEMU and Cloud Hypervisor (CLH) to work with `ctr`.
The host should already be set up with 16 virtual functions of the Intel QAT
card bound to `VFIO-PCI`. Verify this by looking in `/dev/vfio` for a listing
of devices. Replace the number 90 with one of the VFs exposed in `/dev/vfio`.
It might require you to add an `IPC_LOCK` capability to your Docker runtime
depending on which rootfs you use.
```
[plugins.cri.containerd.runtimes.kata-qemu]
runtime_type = "io.containerd.kata-qemu.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
[plugins.cri.containerd.runtimes.kata-qemu.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
[plugins.cri.containerd.runtimes.kata-clh]
runtime_type = "io.containerd.kata-clh.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
[plugins.cri.containerd.runtimes.kata-clh.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-clh.toml"
```sh
$ sudo docker run -it --runtime=kata-runtime --cap-add=IPC_LOCK --cap-add=SYS_ADMIN --device=/dev/vfio/90 -v /dev:/dev -v ${QAT_CONF_LOCATION}:/etc openssl-qat-engine bash
```
In addition, containerd expects the binary to be in `/usr/local/bin` so add
this small script so that it redirects to be able to use either QEMU or
Cloud Hypervisor with Kata.
```bash
$ echo '#!/bin/bash' | sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-qemu.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-qemu-v2
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo '#!/bin/bash' | sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-clh.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-clh-v2
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-clh-v2
```
After the OpenSSL image is built and imported into containerd, an Intel® QAT
virtual function exposed in the step above can be added to the `ctr` command.
Make sure to change the `/dev/vfio` number to one that actually exists on the
host system. When using the `ctr` tool, the `configuration.toml` for Kata needs
to point to the custom Kata kernel and rootfs built above and the Intel® QAT
modules in the Kata rootfs need to load at boot. The following steps assume that
`kata-deploy` was used to install Kata and QEMU is being tested. If using a
different hypervisor, different install method for Kata, or a different
Intel® QAT chipset then the command will need to be modified.
> **Note: The following was tested with
[containerd v1.3.9](https://github.com/containerd/containerd/releases/tag/v1.3.9).**
```bash
$ config_file="/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
$ sudo sed -i "/kernel =/c kernel = "\"${KATA_KERNEL_LOCATION}/${KATA_KERNEL_NAME}\""" $config_file
$ sudo sed -i "/image =/c image = "\"${KATA_ROOTFS_LOCATION}/kata-containers.img\""" $config_file
$ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 modules-load=usdm_drv,qat_c62xvf"/g' $config_file
$ sudo docker save -o openssl-qat-engine.tar openssl-qat-engine:latest
$ sudo ctr images import openssl-qat-engine.tar
$ sudo ctr run --runtime io.containerd.run.kata-qemu.v2 --privileged -t --rm --device=/dev/vfio/180 --mount type=bind,src=/dev,dst=/dev,options=rbind:rw --mount type=bind,src=${QAT_CONF_LOCATION}/c6xxvf_dev0.conf,dst=/etc/c6xxvf_dev0.conf,options=rbind:rw docker.io/library/openssl-qat-engine:latest bash
```
Below are some commands to run in the container image to verify Intel® QAT is
Below are some commands to run in the container image to verify Intel QAT is
working
```sh
root@67561dc2757a/ # cat /proc/modules
qat_c62xvf 16384 - - Live 0xffffffffc00d9000 (OE)
usdm_drv 86016 - - Live 0xffffffffc00e8000 (OE)
intel_qat 249856 - - Live 0xffffffffc009b000 (OE)
root@67561dc2757a/ # adf_ctl restart
Restarting all devices.
Processing /etc/c6xxvf_dev0.conf
root@67561dc2757a/ # adf_ctl status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c6xxvf, inst_id: 0, node_id: 0, bsf: 0000:01:01.0, #accel: 1 #engines: 1 state: up
root@67561dc2757a/ # openssl engine -c -t qat-hw
(qat-hw) Reference implementation of QAT crypto engine v0.6.1
[RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF, HKDF, X25519, X448]
[ available ]
bash-5.0# cat /proc/modules
bash-5.0# adf_ctl restart
bash-5.0# adf_ctl status
bash-5.0# openssl engine -c -t qat
```
### Test Intel® QAT in Kubernetes
Start a Kubernetes cluster with containerd as the CRI. The host should
already be set up with 16 virtual functions of the Intel® QAT card bound to
`VFIO-PCI`. Verify this by looking in `/dev/vfio` for a listing of devices.
You might need to disable Docker before initializing Kubernetes. Be aware
that the OpenSSL container image built above will need to be exported from
Docker and imported into containerd.
If Kata is installed through [`kata-deploy`](https://github.com/kata-containers/kata-containers/blob/stable-2.0/tools/packaging/kata-deploy/README.md)
there will be multiple `configuration.toml` files associated with different
hypervisors. Rather than add in the custom Kata kernel, Kata rootfs, and
kernel modules to each `configuration.toml` as the default, instead use
[annotations](https://github.com/kata-containers/kata-containers/blob/stable-2.0/docs/how-to/how-to-load-kernel-modules-with-kata.md)
in the Kubernetes YAML file to tell Kata which kernel and rootfs to use. The
easy way to do this is to use `kata-deploy` which will install the Kata binaries
to `/opt` and properly configure the `/etc/containerd/config.toml` with annotation
support. However, the `configuration.toml` needs to enable support for
annotations as well. The following configures both QEMU and Cloud Hypervisor
`configuration.toml` files that are currently available with Kata Containers
versions 2.0 and higher.
```bash
$ sudo sed -i 's/enable_annotations\s=\s\[\]/enable_annotations = [".*"]/' /opt/kata/share/defaults/kata-containers/configuration-qemu.toml
$ sudo sed -i 's/enable_annotations\s=\s\[\]/enable_annotations = [".*"]/' /opt/kata/share/defaults/kata-containers/configuration-clh.toml
```
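
Because the `sed` expression only matches an empty `enable_annotations = []` list, it silently does nothing if the line has a different form. The sketch below checks the result; `annotations_enabled` is a hypothetical helper, and the two paths are the `kata-deploy` locations used in the commands above.

```bash
#!/usr/bin/env bash
# Sketch: confirm that enable_annotations is a non-empty list in a Kata
# configuration file.
# annotations_enabled is a hypothetical helper introduced for illustration.
annotations_enabled() {
    # Succeeds only when at least one character sits between the brackets.
    grep -Eq '^enable_annotations[[:space:]]*=[[:space:]]*\[[^]]+\]' "$1"
}

for f in /opt/kata/share/defaults/kata-containers/configuration-qemu.toml \
         /opt/kata/share/defaults/kata-containers/configuration-clh.toml; do
    [ -f "$f" ] || continue
    if annotations_enabled "$f"; then
        echo "$f: annotations enabled"
    else
        echo "$f: enable_annotations is still empty or missing"
    fi
done
```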
Export the OpenSSL image from Docker and import into containerd.
```bash
$ sudo docker save -o openssl-qat-engine.tar openssl-qat-engine:latest
$ sudo ctr -n=k8s.io images import openssl-qat-engine.tar
```
The [Intel® QAT Plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/cmd/qat_plugin/README.md)
needs to be started so that the virtual functions can be discovered and
used by Kubernetes.
The following YAML file can be used to start a Kata container with Intel® QAT
support. If Kata is installed with `kata-deploy`, then the containerd
`configuration.toml` should have all of the Kata runtime classes already
populated and annotations supported. To use an Intel® QAT virtual function, the
Intel® QAT plugin needs to be started after the VF's are bound to `VFIO-PCI` as
described [above](#expose-and-bind-intel-qat-virtual-functions-to-vfio-pci-every-reboot).
Edit the following to point to the correct Kata kernel and rootfs location
built with Intel® QAT support.
```bash
$ cat << EOF > kata-openssl-qat.yaml
apiVersion: v1
kind: Pod
metadata:
name: kata-openssl-qat
labels:
app: kata-openssl-qat
annotations:
io.katacontainers.config.hypervisor.kernel: "$KATA_KERNEL_LOCATION/$KATA_KERNEL_NAME"
io.katacontainers.config.hypervisor.image: "$KATA_ROOTFS_LOCATION/kata-containers.img"
io.katacontainers.config.hypervisor.kernel_params: "modules-load=usdm_drv,qat_c62xvf"
spec:
runtimeClassName: kata-qemu
containers:
- name: kata-openssl-qat
image: docker.io/library/openssl-qat-engine:latest
imagePullPolicy: IfNotPresent
resources:
limits:
qat.intel.com/generic: 1
cpu: 1
securityContext:
capabilities:
add: ["IPC_LOCK", "SYS_ADMIN"]
volumeMounts:
- mountPath: /etc/c6xxvf_dev0.conf
name: etc-mount
- mountPath: /dev
name: dev-mount
volumes:
- name: dev-mount
hostPath:
path: /dev
- name: etc-mount
hostPath:
path: $QAT_CONF_LOCATION/c6xxvf_dev0.conf
EOF
```
Use `kubectl` to start the pod. Verify that Intel® QAT card acceleration is
working with the Intel® QAT engine.
```bash
$ kubectl apply -f kata-openssl-qat.yaml
```
Test with Intel QAT card acceleration
```sh
$ kubectl exec -it kata-openssl-qat -- adf_ctl restart
Restarting all devices.
Processing /etc/c6xxvf_dev0.conf
bash-5.0# openssl speed -engine qat -elapsed -async_jobs 72 rsa2048
```
$ kubectl exec -it kata-openssl-qat -- adf_ctl status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c6xxvf, inst_id: 0, node_id: 0, bsf: 0000:01:01.0, #accel: 1 #engines: 1 state: up
Test with CPU acceleration
$ kubectl exec -it kata-openssl-qat -- openssl engine -c -t qat-hw
(qat-hw) Reference implementation of QAT crypto engine v0.6.1
[RSA, DSA, DH, AES-128-CBC-HMAC-SHA1, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA1, AES-256-CBC-HMAC-SHA256, TLS1-PRF, HKDF, X25519, X448]
[ available ]
```sh
bash-5.0# openssl speed -elapsed rsa2048
```
### Troubleshooting
@@ -577,9 +412,9 @@ c6xxvf_dev10.conf c6xxvf_dev13.conf c6xxvf_dev2.conf c6xxvf_dev5.conf c6xxvf
```
* Check `dmesg` inside the container to see if there are any issues with the
Intel® QAT driver.
Intel QAT driver.
* If there are issues building the OpenSSL Intel® QAT container image, then
* If there are issues building the OpenSSL Intel QAT container image, then
check to make sure that runc is the default runtime for building containers.
```sh
@@ -590,18 +425,17 @@ Environment="DOCKER_DEFAULT_RUNTIME=--default-runtime runc"
## Optional Scripts
### Verify Intel® QAT card counters are incremented
### Verify Intel QAT card counters are incremented
To check the built-in firmware counters, the Intel® QAT driver has to be compiled
and installed on the host; the host's built-in driver cannot be relied on. The
counters will increase when the accelerator is actively being used. To verify
Intel® QAT is actively accelerating the containerized application, use the
following instructions to check if any of the counters increment. Make
sure to change the PCI Device ID to match what's in the system.
Use the `lspci` command to figure out which PCI bus the Intel QAT accelerators
are on. The counters will increase when the accelerator is actively being
used. To verify QAT is actively accelerating the containerized application,
use the following instructions to check if any of the counters are
incrementing. You will have to change the PCI device ID to match your system.
```bash
```sh
$ for i in 0434 0435 37c8 1f18 1f19; do lspci -d 8086:$i; done
$ sudo watch cat /sys/kernel/debug/qat_c6xx_0000\:b1\:00.0/fw_counters
$ sudo watch cat /sys/kernel/debug/qat_c6xx_0000\:b3\:00.0/fw_counters
$ sudo watch cat /sys/kernel/debug/qat_c6xx_0000\:b5\:00.0/fw_counters
```
```


@@ -1,112 +0,0 @@
# Kata Containers with SGX
- [Check if SGX is enabled](#check-if-sgx-is-enabled)
- [Install Host kernel with SGX support](#install-host-kernel-with-sgx-support)
- [Install Guest kernel with SGX support](#install-guest-kernel-with-sgx-support)
- [Run Kata Containers with SGX enabled](#run-kata-containers-with-sgx-enabled)
Intel® Software Guard Extensions (SGX) is a set of instructions that increases the security
of application code and data, giving them more protection from disclosure or modification.
> **Note:** At the time of writing this document, SGX patches have not landed on the Linux kernel
> project, so specific versions for guest and host kernels must be installed to enable SGX.
## Check if SGX is enabled
Run the following command to check if your host supports SGX.
```sh
$ grep -o sgx /proc/cpuinfo
```
If the output of the above command is empty, continue to the following section.
Otherwise, continue to section [Install Guest kernel with SGX support](#install-guest-kernel-with-sgx-support).
## Install Host kernel with SGX support
The following commands were tested on Fedora 32; they might work on other distros too.
```sh
$ git clone --depth=1 https://github.com/intel/kvm-sgx
$ pushd kvm-sgx
$ cp /boot/config-$(uname -r) .config
$ yes "" | make oldconfig
$ # In the following step, enable: INTEL_SGX and INTEL_SGX_VIRTUALIZATION
$ make menuconfig
$ make -j$(($(nproc)-1)) bzImage
$ make -j$(($(nproc)-1)) modules
$ sudo make modules_install
$ sudo make install
$ popd
$ sudo reboot
```
> **Notes:**
> * Run `mokutil --sb-state` to check whether secure boot is enabled. If it is, you will need to sign the kernel.
> * You'll lose SGX support when a new distro kernel is installed and the system rebooted.
Once you have restarted your system with the brand new Linux kernel with SGX support, run
the following command to make sure it's enabled. If the output is empty, go to the BIOS
setup and enable SGX manually.
```sh
$ grep -o sgx /proc/cpuinfo
```
## Install Guest kernel with SGX support
Install the guest kernel in the Kata Containers directory so that it can be used to run
Kata Containers.
```sh
$ curl -LOk https://github.com/devimc/kvm-sgx/releases/download/v0.0.1/kata-virtiofs-sgx.tar.gz
$ sudo tar -xf kata-virtiofs-sgx.tar.gz -C /usr/share/kata-containers/
$ sudo sed -i 's|kernel =|kernel = "/usr/share/kata-containers/vmlinux-virtiofs-sgx.container"|g' \
/usr/share/defaults/kata-containers/configuration.toml
```
## Run Kata Containers with SGX enabled
Before running a Kata Container make sure that your version of `crio` or `containerd`
supports annotations.
For `containerd`, check in `/etc/containerd/config.toml` that the list of `pod_annotations` passed
to the `sandbox` is: `["io.katacontainers.*", "sgx.intel.com/epc"]`.
> `sgx.yaml`
```yaml
apiVersion: v1
kind: Pod
metadata:
name: sgx
annotations:
sgx.intel.com/epc: "32Mi"
spec:
terminationGracePeriodSeconds: 0
runtimeClassName: kata
containers:
- name: c1
image: busybox
command:
- sh
stdin: true
tty: true
volumeMounts:
- mountPath: /dev/sgx/
name: test-volume
volumes:
- name: test-volume
hostPath:
path: /dev/sgx/
type: Directory
```
```sh
$ kubectl apply -f sgx.yaml
$ kubectl exec -ti sgx ls /dev/sgx/
enclave provision
```
The output of the last command should not be empty. If it is, check
your system environment to make sure SGX is fully supported.
[1]: github.com/cloud-hypervisor/cloud-hypervisor/
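The `grep -o sgx /proc/cpuinfo` check used in this document can also be done programmatically. The following is a minimal Rust sketch, not part of the Kata Containers sources: the function names `has_cpu_flag` and `host_has_cpu_flag` are hypothetical, and it assumes the x86 `/proc/cpuinfo` layout in which CPU features appear on `flags` lines. Unlike `grep -o sgx`, it compares whole flag names, so a line that only lists `sgx_lc` does not count as `sgx`.

```rust
/// Returns true if any `flags` line of /proc/cpuinfo-style text lists `flag`
/// as a whole, whitespace-separated word.
pub fn has_cpu_flag(cpuinfo: &str, flag: &str) -> bool {
    cpuinfo
        .lines()
        // Keep only the value part of lines whose key is exactly "flags".
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            if key.trim() == "flags" {
                Some(value)
            } else {
                None
            }
        })
        .any(|flags| flags.split_whitespace().any(|f| f == flag))
}

/// Convenience wrapper that reads the real /proc/cpuinfo (Linux only).
pub fn host_has_cpu_flag(flag: &str) -> std::io::Result<bool> {
    let text = std::fs::read_to_string("/proc/cpuinfo")?;
    Ok(has_cpu_flag(&text, flag))
}
```

Calling `host_has_cpu_flag("sgx")` on the host corresponds to the manual check: a `false` result matches the "empty output" case described in the text.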

View File

@@ -10,6 +10,9 @@ Currently, the instructions are based on the following links:
- https://docs.openstack.org/zun/latest/admin/clear-containers.html
- ../install/ubuntu-installation-guide.md
## Install Git to use with DevStack
```sh
@@ -51,7 +54,7 @@ $ zun delete test
## Install Kata Containers
Follow [these instructions](../install/README.md)
Follow [these instructions](../install/ubuntu-installation-guide.md)
to install the Kata Containers components.
## Update Docker with new Kata Containers runtime

View File

@@ -21,12 +21,7 @@ const LOG_LEVELS: &[(&str, slog::Level)] = &[
];
// XXX: 'writer' param used to make testing possible.
pub fn create_logger<W>(
name: &str,
source: &str,
level: slog::Level,
writer: W,
) -> (slog::Logger, slog_async::AsyncGuard)
pub fn create_logger<W>(name: &str, source: &str, level: slog::Level, writer: W) -> slog::Logger
where
W: Write + Send + Sync + 'static,
{
@@ -42,21 +37,17 @@ where
let filter_drain = RuntimeLevelFilter::new(unique_drain, level).fuse();
// Ensure the logger is thread-safe
let (async_drain, guard) = slog_async::Async::new(filter_drain)
.thread_name("slog-async-logger".into())
.build_with_guard();
let async_drain = slog_async::Async::new(filter_drain).build().fuse();
// Add some "standard" fields
let logger = slog::Logger::root(
slog::Logger::root(
async_drain.fuse(),
o!("version" => env!("CARGO_PKG_VERSION"),
"subsystem" => "root",
"pid" => process::id().to_string(),
"name" => name.to_string(),
"source" => source.to_string()),
);
(logger, guard)
)
}
pub fn get_log_levels() -> Vec<&'static str> {
@@ -102,7 +93,9 @@ impl HashSerializer {
// Take care to only add the first instance of a key. This matters for loggers (but not
// Records) since child loggers have parents and the loggers are serialised child first
// meaning the *newest* fields are serialised first.
self.fields.entry(key).or_insert(value);
if !self.fields.contains_key(&key) {
self.fields.insert(key, value);
}
}
fn remove_field(&mut self, key: &str) {
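The two variants of the field insertion shown in this hunk (`entry(key).or_insert(value)` versus `contains_key` followed by `insert`) should behave the same: the first value stored for a key wins. The sketch below illustrates this under the assumption that `fields` is a `HashMap<String, String>`; the free functions `add_field_entry` and `add_field_check` are hypothetical stand-ins for the method body, not code from the repository.

```rust
use std::collections::HashMap;

/// Entry-API variant: inserts only if the key is absent, with a single lookup.
pub fn add_field_entry(fields: &mut HashMap<String, String>, key: String, value: String) {
    fields.entry(key).or_insert(value);
}

/// Explicit variant: same observable behaviour, but two lookups.
pub fn add_field_check(fields: &mut HashMap<String, String>, key: String, value: String) {
    if !fields.contains_key(&key) {
        fields.insert(key, value);
    }
}
```

The entry-based form is the one Clippy's `map_entry` lint suggests, which is a plausible reason for preferring it.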

View File

@@ -69,7 +69,7 @@ parts:
tar -xf ${tarfile} --strip-components=1
image:
after: [godeps, qemu, kernel]
after: [godeps]
plugin: nil
build-packages:
- docker.io
@@ -89,8 +89,6 @@ parts:
export GOROOT=${SNAPCRAFT_STAGE}
export PATH="${GOROOT}/bin:${PATH}"
http_proxy=${http_proxy:-""}
https_proxy=${https_proxy:-""}
if [ -n "$http_proxy" ]; then
echo "Setting proxy $http_proxy"
sudo -E systemctl set-environment http_proxy=$http_proxy || true
@@ -171,7 +169,7 @@ parts:
fi
kernel:
after: [godeps]
after: [godeps, image]
plugin: nil
build-packages:
- libelf-dev
@@ -185,8 +183,8 @@ parts:
cd ${kata_dir}/tools/packaging/kernel
# Setup and build kernel
./build-kernel.sh -d setup
# Say 'no' to everything, fix issues with incomplete .config files
yes "n" | ./build-kernel.sh setup
kernel_dir_prefix="kata-linux-"
cd ${kernel_dir_prefix}*
version=$(basename ${PWD} | sed 's|'"${kernel_dir_prefix}"'||' | cut -d- -f1)
@@ -208,7 +206,7 @@ parts:
qemu:
plugin: make
after: [godeps]
after: [godeps, runtime]
build-packages:
- gcc
- python3
@@ -228,7 +226,6 @@ parts:
- libffi-dev
- libmount-dev
- libselinux1-dev
- ninja-build
override-build: |
yq=${SNAPCRAFT_STAGE}/yq
export GOPATH=${SNAPCRAFT_STAGE}/gopath
@@ -245,11 +242,10 @@ parts:
;;
*)
branch="$(${yq} r ${versions_file} assets.hypervisor.qemu.version)"
branch="$(${yq} r ${versions_file} assets.hypervisor.qemu.tag)"
url="$(${yq} r ${versions_file} assets.hypervisor.qemu.url)"
commit=""
patches_dir="${kata_dir}/tools/packaging/qemu/patches/$(echo ${branch} | sed -e 's/.[[:digit:]]*$//' -e 's/^v//').x"
patches_version_dir="${kata_dir}/tools/packaging/qemu/patches/tag_patches/${branch}"
;;
esac
@@ -262,23 +258,31 @@ parts:
[ -n "$(ls -A ui/keycodemapdb)" ] || git clone https://github.com/qemu/keycodemapdb ui/keycodemapdb/
[ -n "$(ls -A capstone)" ] || git clone https://github.com/qemu/capstone capstone
# Apply branch patches
${kata_dir}/tools/packaging/scripts/apply_patches.sh "${patches_dir}"
${kata_dir}/tools/packaging/scripts/apply_patches.sh "${patches_version_dir}"
# Apply patches
for patch in ${patches_dir}/*.patch; do
echo "Applying $(basename "$patch") ..."
patch \
--batch \
--forward \
--strip 1 \
--input "$patch"
done
# Only x86_64 supports libpmem
[ "$(uname -m)" = "x86_64" ] && sudo apt-get --no-install-recommends install -y apt-utils ca-certificates libpmem-dev libseccomp-dev
configure_hypervisor=${kata_dir}/tools/packaging/scripts/configure-hypervisor.sh
chmod +x ${configure_hypervisor}
# static build. The --prefix, --libdir, --libexecdir, --datadir arguments are
# based on PREFIX and set by configure-hypervisor.sh
echo "$(PREFIX=/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr ${configure_hypervisor} -s kata-qemu) \
--disable-rbd " \
# static build
echo "$(${configure_hypervisor} -s qemu) \
--disable-rbd
--prefix=/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr \
--datadir=/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr/share \
--libexecdir=/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr/libexec/qemu" \
| xargs ./configure
# Copy QEMU configurations (Kconfigs)
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* default-configs/devices/
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* default-configs/
# build and install
make -j $(($(nproc)-1))
@@ -289,6 +293,7 @@ parts:
- -usr/bin/qemu-pr-helper
- -usr/bin/virtfs-proxy-helper
- -usr/include/
- -usr/libexec/
- -usr/share/applications/
- -usr/share/icons/
- -usr/var/
@@ -300,8 +305,4 @@ parts:
apps:
runtime:
command: usr/bin/kata-runtime
shim:
command: usr/bin/containerd-shim-kata-v2
collect-data:
command: usr/bin/kata-collect-data.sh
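The `patches_dir` line in the QEMU part derives a patch series directory name from the version string with two `sed` expressions: `s/.[[:digit:]]*$//` removes the trailing run of digits together with the single character before it (the `.` is unescaped, so it matches any character), and `s/^v//` drops a leading `v`. The Rust sketch below mirrors that transformation for ASCII version strings. The function name `patches_series` and the sample versions are hypothetical, chosen only for illustration.

```rust
/// Mirrors `sed -e 's/.[[:digit:]]*$//' -e 's/^v//'` followed by appending ".x".
/// Assumes an ASCII version string such as "v5.0.0".
pub fn patches_series(tag: &str) -> String {
    // Byte index where the trailing run of ASCII digits begins
    // (tag.len() if the string does not end in a digit).
    let digits_start = tag
        .char_indices()
        .rev()
        .take_while(|(_, c)| c.is_ascii_digit())
        .last()
        .map(|(i, _)| i)
        .unwrap_or(tag.len());
    // sed's `.` also consumes the one character before that run, if any.
    let cut = tag[..digits_start]
        .char_indices()
        .last()
        .map(|(i, _)| i)
        .unwrap_or(0);
    let trimmed = &tag[..cut];
    // Second sed expression: strip a single leading 'v'.
    let without_v = trimmed.strip_prefix('v').unwrap_or(trimmed);
    format!("{}.x", without_v)
}
```

With this sketch, a version of `v5.0.0` maps to the series name `5.0.x`, which is then appended to `${kata_dir}/tools/packaging/qemu/patches/` in the snapcraft script.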

View File

@@ -1 +0,0 @@
tarpaulin-report.html

View File

@@ -1 +0,0 @@
edition = "2018"

793
src/agent/Cargo.lock generated

File diff suppressed because it is too large

View File

@@ -9,53 +9,35 @@ oci = { path = "oci" }
logging = { path = "../../pkg/logging" }
rustjail = { path = "rustjail" }
protocols = { path = "protocols" }
netlink = { path = "netlink", features = ["with-log", "with-agent-handler"] }
lazy_static = "1.3.0"
ttrpc = { version = "0.5.0", features = ["async", "protobuf-codec"], default-features = false }
ttrpc = "0.3.0"
protobuf = "=2.14.0"
libc = "0.2.58"
nix = "0.17.0"
prctl = "1.0.0"
serde_json = "1.0.39"
signal-hook = "0.1.9"
scan_fmt = "0.2.3"
scopeguard = "1.0.0"
regex = "1"
async-trait = "0.1.42"
tokio = { version = "1.2.0", features = ["rt", "rt-multi-thread", "sync", "macros", "io-util", "time", "signal", "io-std", "process", "fs"] }
futures = "0.3.12"
netlink-sys = { version = "0.6.0", features = ["tokio_socket",]}
tokio-vsock = "0.3.1"
# Because the crate's author has no time to maintain it, we switch the dependency to GitHub.
# Once a new version is released on crates.io, we will switch it back.
# https://github.com/little-dude/netlink/issues/161
rtnetlink = { git = "https://github.com/little-dude/netlink", rev = "a9367bc4700496ddebc088110c28f40962923326" }
netlink-packet-utils = "0.4.0"
ipnetwork = "0.17.0"
# slog:
# - Dynamic keys required to allow HashMap keys to be slog::Serialized.
# - The 'max_*' features allow changing the log level at runtime
# (by stopping the compiler from removing log calls).
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_info"] }
slog-scope = "4.1.2"
# Redirect ttrpc log calls
slog-stdlog = "4.0.0"
log = "0.4.11"
# for testing
tempfile = "3.1.0"
prometheus = { version = "0.9.0", features = ["process"] }
procfs = "0.7.9"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.5" }
cgroups = { git = "https://github.com/kata-containers/cgroups-rs", branch = "stable-0.1.1"}
[workspace]
members = [
"netlink",
"oci",
"protocols",
"rustjail",
]
[profile.release]
lto = true

View File

@@ -3,11 +3,6 @@
# SPDX-License-Identifier: Apache-2.0
#
# To show variables or targets help on `make help`
# Use the following format:
# '##VAR VARIABLE_NAME: help about variable'
# '##TARGET TARGET_NAME: help about target'
PROJECT_NAME = Kata Containers
PROJECT_URL = https://github.com/kata-containers
PROJECT_COMPONENT = kata-agent
@@ -21,18 +16,16 @@ SOURCES := \
VERSION_FILE := ./VERSION
VERSION := $(shell grep -v ^\# $(VERSION_FILE))
COMMIT_NO := $(shell git rev-parse HEAD 2>/dev/null || true)
COMMIT_NO_SHORT := $(shell git rev-parse --short HEAD 2>/dev/null || true)
COMMIT := $(if $(shell git status --porcelain --untracked-files=no 2>/dev/null || true),${COMMIT_NO}-dirty,${COMMIT_NO})
COMMIT_MSG = $(if $(COMMIT),$(COMMIT),unknown)
# Exported to allow cargo to see it
export VERSION_COMMIT := $(if $(COMMIT),$(VERSION)-$(COMMIT),$(VERSION))
##VAR BUILD_TYPE=release|debug type of rust build
BUILD_TYPE = release
##VAR ARCH=arch target to build (format: uname -m)
ARCH = $(shell uname -m)
##VAR LIBC=musl|gnu
LIBC ?= musl
ifneq ($(LIBC),musl)
ifeq ($(LIBC),gnu)
@@ -48,11 +41,6 @@ ifeq ($(ARCH), ppc64le)
$(warning "WARNING: powerpc64le-unknown-linux-musl target is unavailable")
endif
ifeq ($(ARCH), s390x)
override LIBC = gnu
$(warning "WARNING: s390x-unknown-linux-musl target is unavailable")
endif
EXTRA_RUSTFLAGS :=
ifeq ($(ARCH), aarch64)
@@ -64,12 +52,10 @@ TRIPLE = $(ARCH)-unknown-linux-$(LIBC)
TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
##VAR DESTDIR=<path> is a directory prepended to each installed target file
DESTDIR :=
##VAR BINDIR=<path> is a directory for installing executable programs
BINDIR := /usr/bin
##VAR INIT=yes|no define if agent will be installed as init
# Define if agent will be installed as init
INIT := no
# Path to systemd unit directory if installed as not init.
@@ -117,7 +103,6 @@ define INSTALL_FILE
install -D -m 644 $1 $(DESTDIR)$2/$1 || exit 1;
endef
##TARGET default: build code
default: $(TARGET) show-header
$(TARGET): $(GENERATED_CODE) $(TARGET_PATH)
@@ -125,55 +110,36 @@ $(TARGET): $(GENERATED_CODE) $(TARGET_PATH)
$(TARGET_PATH): $(SOURCES) | show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE)
$(GENERATED_FILES): %: %.in
@sed $(foreach r,$(GENERATED_REPLACEMENTS),-e 's|@$r@|$($r)|g') "$<" > "$@"
##TARGET optimize: optimized build
optimize: $(SOURCES) | show-summary show-header
@RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny-warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE)
show-header:
@printf "%s - version %s (commit %s)\n\n" "$(TARGET)" "$(VERSION)" "$(COMMIT_MSG)"
##TARGET clippy: run clippy linter
clippy: $(GENERATED_CODE)
cargo clippy --all-targets --all-features --release \
-- \
-Aclippy::redundant_allocation \
-D warnings
$(GENERATED_FILES): %: %.in
@sed $(foreach r,$(GENERATED_REPLACEMENTS),-e 's|@$r@|$($r)|g') "$<" > "$@"
format:
cargo fmt -- --check
##TARGET install: install agent
install: install-services
install: build-service
@install -D $(TARGET_PATH) $(DESTDIR)/$(BINDIR)/$(TARGET)
##TARGET clean: clean build
clean:
@cargo clean
@rm -f $(GENERATED_FILES)
@rm -f tarpaulin-report.html
#TARGET test: run cargo tests
test:
@cargo test --all --target $(TRIPLE)
##TARGET check: run test
check: clippy format
check: test
##TARGET run: build and run agent
run:
@cargo run --target $(TRIPLE)
install-services: $(GENERATED_FILES)
build-service: $(GENERATED_FILES)
ifeq ($(INIT),no)
@echo "Installing systemd unit files..."
$(foreach f,$(UNIT_FILES),$(call INSTALL_FILE,$f,$(UNIT_DIR)))
endif
show-header:
@printf "%s - version %s (commit %s)\n\n" "$(TARGET)" "$(VERSION)" "$(COMMIT_MSG)"
show-summary: show-header
@printf "project:\n"
@printf " name: $(PROJECT_NAME)\n"
@@ -189,35 +155,7 @@ show-summary: show-header
@printf " %s\n" "$(call get_toolchain_version)"
@printf "\n"
## help: Show help comments that start with `##VAR` and `##TARGET`
help: Makefile show-summary
@echo "==========================Help============================="
@echo "Variables:"
@sed -n 's/^##VAR//p' $< | sort
@echo ""
@echo "Targets:"
@sed -n 's/^##TARGET//p' $< | sort
TARPAULIN_ARGS:=-v --workspace
install-tarpaulin:
cargo install cargo-tarpaulin
# Check if cargo tarpaulin is installed
HAS_TARPAULIN:= $(shell cargo --list | grep tarpaulin 2>/dev/null)
check_tarpaulin:
ifndef HAS_TARPAULIN
$(error "tarpaulin is not available please: run make install-tarpaulin ")
else
$(info OK: tarpaulin installed)
endif
##TARGET codecov: Generate code coverage report
codecov: check_tarpaulin
cargo tarpaulin $(TARPAULIN_ARGS)
##TARGET codecov-html: Generate code coverage html report
codecov-html: check_tarpaulin
cargo tarpaulin $(TARPAULIN_ARGS) -o Html
help: show-summary
.PHONY: \
help \
@@ -225,6 +163,5 @@ codecov-html: check_tarpaulin
show-summary \
optimize
##TARGET generate-protocols: generate/update grpc agent protocols
generate-protocols:
protocols/hack/update-generated-proto.sh all
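The `help` target in this Makefile diff builds its output with `sed -n 's/^##VAR//p'` and `sed -n 's/^##TARGET//p'`, each piped through `sort`. The Rust sketch below reproduces that extraction so the matching rule is easy to see. The function name `help_entries` is hypothetical, and the sort is a plain byte-wise sort, whereas the shell `sort` depends on the locale.

```rust
/// Returns the remainder of every line that starts with `marker`
/// (for example "##VAR" or "##TARGET"), sorted.
/// Like the sed expression, it keeps the space that follows the marker.
pub fn help_entries(makefile: &str, marker: &str) -> Vec<String> {
    let mut entries: Vec<String> = makefile
        .lines()
        .filter_map(|line| line.strip_prefix(marker))
        .map(|rest| rest.to_string())
        .collect();
    entries.sort();
    entries
}
```

Because the match is anchored at the start of the line, a comment written with a single `#`, such as `#TARGET test: run cargo tests`, is not picked up by either expression.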

View File

@@ -39,27 +39,11 @@ After that, we drafted the initial code here, and any contributions are welcome.
## Getting Started
### Build from Source
The rust-agent needs to be built statically and linked with `musl`.
> **Note:** Skip this step for ppc64le; the build scripts explicitly use gnu for ppc64le.
The rust-agent needs to be built with Rust newer than 1.37 and statically linked with `musl`.
```bash
$ arch=$(uname -m)
$ rustup target add "${arch}-unknown-linux-musl"
$ sudo ln -s /usr/bin/g++ /bin/musl-g++
```
ppc64le-only: Manually install `protoc`, e.g.
```bash
$ sudo dnf install protobuf-compiler
```
Download the source files in the Kata containers repository and build the agent:
```bash
$ GOPATH="${GOPATH:-$HOME/go}"
$ dir="$GOPATH/src/github.com/kata-containers"
$ git -C ${dir} clone --depth 1 https://github.com/kata-containers/kata-containers
$ make -C ${dir}/kata-containers/src/agent
rustup target add x86_64-unknown-linux-musl
sudo ln -s /usr/bin/g++ /bin/musl-g++
cargo build --target x86_64-unknown-linux-musl --release
```
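The target triple used in these build instructions follows the `<arch>-unknown-linux-<libc>` pattern, with `musl` as the default and `gnu` forced on architectures where no musl target is available. The Rust sketch below shows that selection logic. The function name `target_triple` is hypothetical, and the mapping of `ppc64le` (as printed by `uname -m`) to the Rust architecture name `powerpc64le` is an assumption based on the warning text in the agent Makefile.

```rust
/// Builds a Rust target triple from `uname -m` output and the requested libc.
/// ppc64le and s390x are forced to gnu, mirroring the Makefile warnings.
pub fn target_triple(arch: &str, requested_libc: &str) -> String {
    let (rust_arch, libc) = match arch {
        // Assumption: uname reports "ppc64le", Rust calls it "powerpc64le".
        "ppc64le" => ("powerpc64le", "gnu"),
        "s390x" => ("s390x", "gnu"),
        other => (other, requested_libc),
    };
    format!("{}-unknown-linux-{}", rust_arch, libc)
}
```

The result is the value that would be passed to `rustup target add` and to `cargo build --target`.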
## Run Kata CI with rust-agent

View File

@@ -15,10 +15,8 @@ Wants=kata-containers.target
StandardOutput=tty
Type=simple
ExecStart=@BINDIR@/@AGENT_NAME@
LimitNOFILE=1048576
LimitNOFILE=infinity
# ExecStop is required for static agent tracing; in all other scenarios
# the runtime handles shutting down the VM.
ExecStop=/bin/sync ; /usr/bin/systemctl --force poweroff
FailureAction=poweroff
# Discourage OOM-killer from touching the agent
OOMScoreAdjust=-997

View File

@@ -0,0 +1,20 @@
[package]
name = "netlink"
version = "0.1.0"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
libc = "0.2.58"
nix = "0.17.0"
protobuf = { version = "=2.14.0", optional = true }
protocols = { path = "../protocols", optional = true }
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_info"], optional = true }
slog-scope = { version = "4.1.2", optional = true }
[features]
with-log = ["slog", "slog-scope"]
with-agent-handler = ["protobuf", "protocols"]

View File

@@ -0,0 +1,572 @@
// Copyright (c) 2020 Ant Financial
// Copyright (C) 2020 Alibaba Cloud. All rights reserved.
//
// SPDX-License-Identifier: Apache-2.0
//
//! Dedicated Netlink interfaces for Kata agent protocol handler.
use std::convert::TryFrom;
use protobuf::RepeatedField;
use protocols::types::{ARPNeighbor, IPAddress, IPFamily, Interface, Route};
use super::*;
#[cfg(feature = "with-log")]
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
slog_scope::logger().new(o!("subsystem" => "netlink"))
};
}
impl super::RtnlHandle {
pub fn update_interface(&mut self, iface: &Interface) -> Result<Interface> {
// the reliable way to find link is using hardware address
// as filter. However, hardware filter might not be supported
// by netlink, we may have to dump link list and then find the
// target link. filter using name or family is supported, but
// we cannot use that to find target link.
// let's try if hardware address filter works. -_-
let ifinfo = self.find_link_by_hwaddr(iface.hwAddr.as_str())?;
// bring down interface if it is up
if ifinfo.ifi_flags & libc::IFF_UP as u32 != 0 {
self.set_link_status(&ifinfo, false)?;
}
// delete all addresses associated with the link
let del_addrs: Vec<RtIPAddr> = self.get_link_addresses(&ifinfo)?;
self.delete_all_addrs(&ifinfo, del_addrs.as_ref())?;
// add new ip addresses in request
for grpc_addr in &iface.IPAddresses {
let rtip = RtIPAddr::try_from(grpc_addr.clone())?;
self.add_one_address(&ifinfo, &rtip)?;
}
let mut v: Vec<u8> = vec![0; DEFAULT_NETLINK_BUF_SIZE];
// Safe because we have allocated enough buffer space.
let nlh = unsafe { &mut *(v.as_mut_ptr() as *mut nlmsghdr) };
let ifi = unsafe { &mut *(NLMSG_DATA!(nlh) as *mut ifinfomsg) };
// set name, set mtu, IFF_NOARP. in one rtnl_talk.
nlh.nlmsg_len = NLMSG_LENGTH!(mem::size_of::<ifinfomsg>() as u32) as __u32;
nlh.nlmsg_type = RTM_NEWLINK;
nlh.nlmsg_flags = NLM_F_REQUEST;
self.assign_seqnum(nlh);
ifi.ifi_family = ifinfo.ifi_family;
ifi.ifi_type = ifinfo.ifi_type;
ifi.ifi_index = ifinfo.ifi_index;
if iface.raw_flags & libc::IFF_NOARP as u32 != 0 {
ifi.ifi_change |= libc::IFF_NOARP as u32;
ifi.ifi_flags |= libc::IFF_NOARP as u32;
}
// Safe because we have allocated enough buffer space.
unsafe {
nlh.addattr32(IFLA_MTU, iface.mtu as u32);
// if str is null terminated, use addattr_var.
// otherwise, use addattr_str
nlh.addattr_var(IFLA_IFNAME, iface.name.as_ref());
}
self.rtnl_talk(v.as_mut_slice(), false)?;
// TODO: why the result is ignored here?
let _ = self.set_link_status(&ifinfo, true);
Ok(iface.clone())
}
/// Delete this interface/link per request
pub fn remove_interface(&mut self, iface: &Interface) -> Result<Interface> {
let ifinfo = self.find_link_by_hwaddr(iface.hwAddr.as_str())?;
self.set_link_status(&ifinfo, false)?;
let mut v: Vec<u8> = vec![0; DEFAULT_NETLINK_BUF_SIZE];
// Safe because we have allocated enough buffer space.
let nlh = unsafe { &mut *(v.as_mut_ptr() as *mut nlmsghdr) };
let ifi = unsafe { &mut *(NLMSG_DATA!(nlh) as *mut ifinfomsg) };
// No attributes needed?
nlh.nlmsg_len = NLMSG_LENGTH!(mem::size_of::<ifinfomsg>()) as __u32;
nlh.nlmsg_type = RTM_DELLINK;
nlh.nlmsg_flags = NLM_F_REQUEST;
self.assign_seqnum(nlh);
ifi.ifi_family = ifinfo.ifi_family;
ifi.ifi_index = ifinfo.ifi_index;
ifi.ifi_type = ifinfo.ifi_type;
self.rtnl_talk(v.as_mut_slice(), false)?;
Ok(iface.clone())
}
pub fn list_interfaces(&mut self) -> Result<Vec<Interface>> {
let mut ifaces: Vec<Interface> = Vec::new();
let (_slv, lv) = self.dump_all_links()?;
let (_sav, av) = self.dump_all_addresses(0)?;
for link in &lv {
// Safe because dump_all_links() returns valid pointers.
let nlh = unsafe { &**link };
if nlh.nlmsg_type != RTM_NEWLINK && nlh.nlmsg_type != RTM_DELLINK {
continue;
}
if nlh.nlmsg_len < NLMSG_SPACE!(mem::size_of::<ifinfomsg>()) {
info!(
sl!(),
"invalid nlmsg! nlmsg_len: {}, nlmsg_space: {}",
nlh.nlmsg_len,
NLMSG_SPACE!(mem::size_of::<ifinfomsg>())
);
break;
}
// Safe because we have just validated available buffer space above.
let ifi = unsafe { &*(NLMSG_DATA!(nlh) as *const ifinfomsg) };
let rta: *mut rtattr = IFLA_RTA!(ifi as *const ifinfomsg) as *mut rtattr;
let rtalen = IFLA_PAYLOAD!(nlh) as u32;
let attrs = unsafe { parse_attrs(rta, rtalen, (IFLA_MAX + 1) as usize)? };
// fill out some fields of Interface,
let mut iface: Interface = Interface::default();
// Safe because parse_attrs() returns valid pointers.
unsafe {
if !attrs[IFLA_IFNAME as usize].is_null() {
let t = attrs[IFLA_IFNAME as usize];
iface.name = String::from_utf8(getattr_var(t as *const rtattr))?;
}
if !attrs[IFLA_MTU as usize].is_null() {
let t = attrs[IFLA_MTU as usize];
iface.mtu = getattr32(t) as u64;
}
if !attrs[IFLA_ADDRESS as usize].is_null() {
let alen = RTA_PAYLOAD!(attrs[IFLA_ADDRESS as usize]);
let a: *const u8 = RTA_DATA!(attrs[IFLA_ADDRESS as usize]) as *const u8;
iface.hwAddr = parser::format_address(a, alen as u32)?;
}
}
// get ip address info from av
let mut ads: Vec<IPAddress> = Vec::new();
for address in &av {
// Safe because dump_all_addresses() returns valid pointers.
let alh = unsafe { &**address };
if alh.nlmsg_type != RTM_NEWADDR {
continue;
}
let tlen = NLMSG_SPACE!(mem::size_of::<ifaddrmsg>());
if alh.nlmsg_len < tlen {
info!(
sl!(),
"invalid nlmsg! nlmsg_len: {}, nlmsg_space: {}", alh.nlmsg_len, tlen
);
break;
}
// Safe because we have checked available buffer space by NLMSG_SPACE above.
let ifa = unsafe { &*(NLMSG_DATA!(alh) as *const ifaddrmsg) };
let arta: *mut rtattr = IFA_RTA!(ifa) as *mut rtattr;
let artalen = IFA_PAYLOAD!(alh) as u32;
if ifa.ifa_index as u32 == ifi.ifi_index as u32 {
// found target addresses, parse attributes and fill out Interface
let addrs = unsafe { parse_attrs(arta, artalen, (IFA_MAX + 1) as usize)? };
// fill address field of Interface
let mut one: IPAddress = IPAddress::default();
let tattr: *const rtattr = if !addrs[IFA_ADDRESS as usize].is_null() {
addrs[IFA_ADDRESS as usize]
} else {
addrs[IFA_LOCAL as usize]
};
one.mask = format!("{}", ifa.ifa_prefixlen);
one.family = IPFamily::v4;
if ifa.ifa_family == libc::AF_INET6 as u8 {
one.family = IPFamily::v6;
}
// Safe because parse_attrs() returns valid pointers.
unsafe {
let a: *const u8 = RTA_DATA!(tattr) as *const u8;
let alen = RTA_PAYLOAD!(tattr);
one.address = parser::format_address(a, alen as u32)?;
}
ads.push(one);
}
}
iface.IPAddresses = RepeatedField::from_vec(ads);
ifaces.push(iface);
}
Ok(ifaces)
}
pub fn update_routes(&mut self, rt: &[Route]) -> Result<Vec<Route>> {
let rs = self.get_all_routes()?;
self.delete_all_routes(&rs)?;
for grpcroute in rt {
if grpcroute.gateway.as_str() == "" {
let r = RtRoute::try_from(grpcroute.clone())?;
if r.index == -1 {
continue;
}
self.add_one_route(&r)?;
}
}
for grpcroute in rt {
if grpcroute.gateway.as_str() != "" {
let r = RtRoute::try_from(grpcroute.clone())?;
if r.index == -1 {
continue;
}
self.add_one_route(&r)?;
}
}
Ok(rt.to_owned())
}
pub fn list_routes(&mut self) -> Result<Vec<Route>> {
// currently, only dump routes from main table for ipv4
// ie, rtmsg.rtmsg_family = AF_INET, set RT_TABLE_MAIN
// attribute in dump request
// Fix Me: think about other tables, ipv6..
let mut rs: Vec<Route> = Vec::new();
let (_srv, rv) = self.dump_all_routes()?;
// parse out routes and store in rs
for r in &rv {
// Safe because dump_all_routes() returns valid pointers.
let nlh = unsafe { &**r };
if nlh.nlmsg_type != RTM_NEWROUTE && nlh.nlmsg_type != RTM_DELROUTE {
info!(sl!(), "not route message!");
continue;
}
let tlen = NLMSG_SPACE!(mem::size_of::<rtmsg>());
if nlh.nlmsg_len < tlen {
info!(
sl!(),
"invalid nlmsg! nlmsg_len: {}, nlmsg_space: {}", nlh.nlmsg_len, tlen
);
break;
}
// Safe because we have just validated available buffer space above.
let rtm = unsafe { &mut *(NLMSG_DATA!(nlh) as *mut rtmsg) };
if rtm.rtm_table != RT_TABLE_MAIN as u8 {
continue;
}
let rta: *mut rtattr = RTM_RTA!(rtm) as *mut rtattr;
let rtalen = RTM_PAYLOAD!(nlh) as u32;
let attrs = unsafe { parse_attrs(rta, rtalen, (RTA_MAX + 1) as usize)? };
let t = attrs[RTA_TABLE as usize];
if !t.is_null() {
// Safe because parse_attrs() returns valid pointers
let table = unsafe { getattr32(t) };
if table != RT_TABLE_MAIN {
continue;
}
}
// find source, destination, gateway, scope, and device name
let mut t = attrs[RTA_DST as usize];
let mut rte: Route = Route::default();
// Safe because parse_attrs() returns valid pointers
unsafe {
// destination
if !t.is_null() {
let data: *const u8 = RTA_DATA!(t) as *const u8;
let len = RTA_PAYLOAD!(t) as u32;
rte.dest =
format!("{}/{}", parser::format_address(data, len)?, rtm.rtm_dst_len);
}
// gateway
t = attrs[RTA_GATEWAY as usize];
if !t.is_null() {
let data: *const u8 = RTA_DATA!(t) as *const u8;
let len = RTA_PAYLOAD!(t) as u32;
rte.gateway = parser::format_address(data, len)?;
// for gateway, destination is 0.0.0.0
rte.dest = "0.0.0.0".to_string();
}
// source
t = attrs[RTA_SRC as usize];
if t.is_null() {
t = attrs[RTA_PREFSRC as usize];
}
if !t.is_null() {
let data: *const u8 = RTA_DATA!(t) as *const u8;
let len = RTA_PAYLOAD!(t) as u32;
rte.source = parser::format_address(data, len)?;
if rtm.rtm_src_len != 0 {
rte.source = format!("{}/{}", rte.source.as_str(), rtm.rtm_src_len);
}
}
// scope
rte.scope = rtm.rtm_scope as u32;
// oif
t = attrs[RTA_OIF as usize];
if !t.is_null() {
let data = &*(RTA_DATA!(t) as *const i32);
assert_eq!(RTA_PAYLOAD!(t), 4);
rte.device = self
.get_name_by_index(*data)
.unwrap_or_else(|_| "unknown".to_string());
}
}
rs.push(rte);
}
Ok(rs)
}
pub fn add_arp_neighbors(&mut self, neighs: &[ARPNeighbor]) -> Result<()> {
for neigh in neighs {
self.add_one_arp_neighbor(&neigh)?;
}
Ok(())
}
pub fn add_one_arp_neighbor(&mut self, neigh: &ARPNeighbor) -> Result<()> {
let to_ip = match neigh.toIPAddress.as_ref() {
None => return nix_errno(Errno::EINVAL),
Some(v) => {
if v.address.is_empty() {
return nix_errno(Errno::EINVAL);
}
v.address.as_ref()
}
};
let dev = self.find_link_by_name(&neigh.device)?;
let mut v: Vec<u8> = vec![0; DEFAULT_NETLINK_BUF_SIZE];
// Safe because we have allocated enough buffer space.
let nlh = unsafe { &mut *(v.as_mut_ptr() as *mut nlmsghdr) };
let ndm = unsafe { &mut *(NLMSG_DATA!(nlh) as *mut ndmsg) };
nlh.nlmsg_len = NLMSG_LENGTH!(std::mem::size_of::<ndmsg>()) as u32;
nlh.nlmsg_type = RTM_NEWNEIGH;
nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
self.assign_seqnum(nlh);
ndm.ndm_family = libc::AF_UNSPEC as __u8;
ndm.ndm_state = IFA_F_PERMANENT as __u16;
// process lladdr
if neigh.lladdr != "" {
let llabuf = parser::parse_mac_addr(&neigh.lladdr)?;
// Safe because we have allocated enough buffer space.
unsafe { nlh.addattr_var(NDA_LLADDR, llabuf.as_ref()) };
}
let (family, ip_data) = parser::parse_ip_addr_with_family(&to_ip)?;
ndm.ndm_family = family;
// Safe because we have allocated enough buffer space.
unsafe { nlh.addattr_var(NDA_DST, ip_data.as_ref()) };
// process state
if neigh.state != 0 {
ndm.ndm_state = neigh.state as __u16;
}
// process flags
ndm.ndm_flags = (*ndm).ndm_flags | neigh.flags as __u8;
// process dev
ndm.ndm_ifindex = dev.ifi_index;
// send
self.rtnl_talk(v.as_mut_slice(), false)?;
Ok(())
}
}
impl TryFrom<IPAddress> for RtIPAddr {
type Error = nix::Error;
fn try_from(ipi: IPAddress) -> std::result::Result<Self, Self::Error> {
let ip_family = if ipi.family == IPFamily::v4 {
libc::AF_INET
} else {
libc::AF_INET6
} as __u8;
let ip_mask = parser::parse_u8(ipi.mask.as_str(), 10)?;
let addr = parser::parse_ip_addr(ipi.address.as_ref())?;
Ok(Self {
ip_family,
ip_mask,
addr,
})
}
}
impl TryFrom<Route> for RtRoute {
type Error = nix::Error;
fn try_from(r: Route) -> std::result::Result<Self, Self::Error> {
// only handle ipv4
let index = {
let mut rh = RtnlHandle::new(NETLINK_ROUTE, 0)?;
match rh.find_link_by_name(r.device.as_str()) {
Ok(ifi) => ifi.ifi_index,
Err(_) => -1,
}
};
let (dest, dst_len) = if r.dest.is_empty() {
(Some(vec![0 as u8; 4]), 0)
} else {
let (dst, mask) = parser::parse_cidr(r.dest.as_str())?;
(Some(dst), mask)
};
let (source, src_len) = if r.source.is_empty() {
(None, 0)
} else {
let (src, mask) = parser::parse_cidr(r.source.as_str())?;
(Some(src), mask)
};
let gateway = if r.gateway.is_empty() {
None
} else {
Some(parser::parse_ip_addr(r.gateway.as_str())?)
};
Ok(Self {
dest,
source,
src_len,
dst_len,
index,
gateway,
scope: r.scope as u8,
protocol: RTPROTO_UNSPEC,
})
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::{RtnlHandle, NETLINK_ROUTE};
use protocols::types::IPAddress;
use std::process::Command;
fn clean_env_for_test_add_one_arp_neighbor(dummy_name: &str, ip: &str) {
// ip link delete dummy
Command::new("ip")
.args(&["link", "delete", dummy_name])
.output()
.expect("prepare: failed to delete dummy");
// ip neigh del dev dummy ip
Command::new("ip")
.args(&["neigh", "del", dummy_name, ip])
.output()
.expect("prepare: failed to delete neigh");
}
fn prepare_env_for_test_add_one_arp_neighbor(dummy_name: &str, ip: &str) {
clean_env_for_test_add_one_arp_neighbor(dummy_name, ip);
// modprobe dummy
Command::new("modprobe")
.arg("dummy")
.output()
.expect("failed to run modprobe dummy");
// ip link add dummy type dummy
Command::new("ip")
.args(&["link", "add", dummy_name, "type", "dummy"])
.output()
.expect("failed to add dummy interface");
// ip addr add 192.168.0.2/16 dev dummy
Command::new("ip")
.args(&["addr", "add", "192.168.0.2/16", "dev", dummy_name])
.output()
.expect("failed to add ip for dummy");
// ip link set dummy up;
Command::new("ip")
.args(&["link", "set", dummy_name, "up"])
.output()
.expect("failed to up dummy");
}
#[test]
fn test_add_one_arp_neighbor() {
// skip_if_not_root
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
let mac = "6a:92:3a:59:70:aa";
let to_ip = "169.254.1.1";
let dummy_name = "dummy_for_arp";
prepare_env_for_test_add_one_arp_neighbor(dummy_name, to_ip);
let mut ip_address = IPAddress::new();
ip_address.set_address(to_ip.to_string());
let mut neigh = ARPNeighbor::new();
neigh.set_toIPAddress(ip_address);
neigh.set_device(dummy_name.to_string());
neigh.set_lladdr(mac.to_string());
neigh.set_state(0x80);
let mut rtnl = RtnlHandle::new(NETLINK_ROUTE, 0).unwrap();
rtnl.add_one_arp_neighbor(&neigh).unwrap();
// ip neigh show dev dummy ip
let stdout = Command::new("ip")
.args(&["neigh", "show", "dev", dummy_name, to_ip])
.output()
.expect("failed to show neigh")
.stdout;
let stdout = std::str::from_utf8(&stdout).expect("failed to convert stdout");
assert_eq!(stdout, format!("{} lladdr {} PERMANENT\n", to_ip, mac));
clean_env_for_test_add_one_arp_neighbor(dummy_name, to_ip);
}
}

2354
src/agent/netlink/src/lib.rs Normal file

File diff suppressed because it is too large


@@ -0,0 +1,201 @@
// Copyright (c) 2019 Ant Financial
//
// SPDX-License-Identifier: Apache-2.0
//! Parser for IPv4/IPv6/MAC addresses.
use std::net::{Ipv4Addr, Ipv6Addr};
use std::str::FromStr;
use super::{Errno, Result, __u8, nix_errno};
#[inline]
pub(crate) fn parse_u8(s: &str, radix: u32) -> Result<u8> {
if radix >= 2 && radix <= 36 {
u8::from_str_radix(s, radix).map_err(|_| nix::Error::Sys(Errno::EINVAL))
} else {
u8::from_str(s).map_err(|_| nix::Error::Sys(Errno::EINVAL))
}
}
pub fn parse_ipv4_addr(s: &str) -> Result<Vec<u8>> {
match Ipv4Addr::from_str(s) {
Ok(v) => Ok(Vec::from(v.octets().as_ref())),
Err(_e) => nix_errno(Errno::EINVAL),
}
}
pub fn parse_ip_addr(s: &str) -> Result<Vec<u8>> {
if let Ok(v6) = Ipv6Addr::from_str(s) {
Ok(Vec::from(v6.octets().as_ref()))
} else {
parse_ipv4_addr(s)
}
}
pub fn parse_ip_addr_with_family(ip_address: &str) -> Result<(__u8, Vec<u8>)> {
if let Ok(v6) = Ipv6Addr::from_str(ip_address) {
Ok((libc::AF_INET6 as __u8, Vec::from(v6.octets().as_ref())))
} else {
parse_ipv4_addr(ip_address).map(|v| (libc::AF_INET as __u8, v))
}
}
pub fn parse_ipv4_cidr(s: &str) -> Result<(Vec<u8>, u8)> {
let fields: Vec<&str> = s.split('/').collect();
if fields.len() != 2 {
nix_errno(Errno::EINVAL)
} else {
Ok((parse_ipv4_addr(fields[0])?, parse_u8(fields[1], 10)?))
}
}
pub fn parse_cidr(s: &str) -> Result<(Vec<u8>, u8)> {
let fields: Vec<&str> = s.split('/').collect();
if fields.len() != 2 {
nix_errno(Errno::EINVAL)
} else {
Ok((parse_ip_addr(fields[0])?, parse_u8(fields[1], 10)?))
}
}
pub fn parse_mac_addr(hwaddr: &str) -> Result<Vec<u8>> {
let fields: Vec<&str> = hwaddr.split(':').collect();
if fields.len() != 6 {
nix_errno(Errno::EINVAL)
} else {
Ok(vec![
parse_u8(fields[0], 16)?,
parse_u8(fields[1], 16)?,
parse_u8(fields[2], 16)?,
parse_u8(fields[3], 16)?,
parse_u8(fields[4], 16)?,
parse_u8(fields[5], 16)?,
])
}
}
/// Format an IPv4/IPv6/MAC address.
///
/// # Safety
/// Caller needs to ensure that addr and len are valid.
pub unsafe fn format_address(addr: *const u8, len: u32) -> Result<String> {
let mut a: String;
if len == 4 {
// ipv4
let mut i = 1;
let mut p = addr as i64;
a = format!("{}", *(p as *const u8));
while i < len {
p += 1;
i += 1;
a.push_str(format!(".{}", *(p as *const u8)).as_str());
}
return Ok(a);
}
if len == 6 {
// hwaddr
let mut i = 1;
let mut p = addr as i64;
a = format!("{:0>2X}", *(p as *const u8));
while i < len {
p += 1;
i += 1;
a.push_str(format!(":{:0>2X}", *(p as *const u8)).as_str());
}
return Ok(a);
}
if len == 16 {
// ipv6
let p = addr as *const u8 as *const libc::c_void;
let mut ar: [u8; 16] = [0; 16];
let mut v: Vec<u8> = vec![0; 16];
let dp: *mut libc::c_void = v.as_mut_ptr() as *mut libc::c_void;
libc::memcpy(dp, p, 16);
ar.copy_from_slice(v.as_slice());
return Ok(Ipv6Addr::from(ar).to_string());
}
nix_errno(Errno::EINVAL)
}
#[cfg(test)]
mod tests {
use super::*;
use libc;
#[test]
fn test_ip_addr() {
let ip = parse_ipv4_addr("1.2.3.4").unwrap();
assert_eq!(ip, vec![0x1u8, 0x2u8, 0x3u8, 0x4u8]);
parse_ipv4_addr("1.2.3.4.5").unwrap_err();
parse_ipv4_addr("1.2.3-4").unwrap_err();
parse_ipv4_addr("1.2.3.a").unwrap_err();
parse_ipv4_addr("1.2.3.x").unwrap_err();
parse_ipv4_addr("-1.2.3.4").unwrap_err();
parse_ipv4_addr("+1.2.3.4").unwrap_err();
let (family, _) = parse_ip_addr_with_family("192.168.1.1").unwrap();
assert_eq!(family, libc::AF_INET as __u8);
let (family, ip) =
parse_ip_addr_with_family("2001:0db8:85a3:0000:0000:8a2e:0370:7334").unwrap();
assert_eq!(family, libc::AF_INET6 as __u8);
assert_eq!(ip.len(), 16);
parse_ip_addr_with_family("2001:0db8:85a3:0000:0000:8a2e:0370:73345").unwrap_err();
let ip = parse_ip_addr("::1").unwrap();
assert_eq!(ip[0], 0x0);
assert_eq!(ip[15], 0x1);
}
#[test]
fn test_parse_cidr() {
let (_, mask) = parse_ipv4_cidr("1.2.3.4/31").unwrap();
assert_eq!(mask, 31);
parse_ipv4_cidr("1.2.3/4/31").unwrap_err();
parse_ipv4_cidr("1.2.3.4/f").unwrap_err();
parse_ipv4_cidr("1.2.3/8").unwrap_err();
parse_ipv4_cidr("1.2.3.4.8").unwrap_err();
let (ip, mask) = parse_cidr("2001:db8:a::123/64").unwrap();
assert_eq!(mask, 64);
assert_eq!(ip[0], 0x20);
assert_eq!(ip[15], 0x23);
}
#[test]
fn test_parse_mac_addr() {
let mac = parse_mac_addr("FF:FF:FF:FF:FF:FE").unwrap();
assert_eq!(mac.len(), 6);
assert_eq!(mac[0], 0xff);
assert_eq!(mac[5], 0xfe);
parse_mac_addr("FF:FF:FF:FF:FF:FE:A0").unwrap_err();
parse_mac_addr("FF:FF:FF:FF:FF:FX").unwrap_err();
parse_mac_addr("FF:FF:FF:FF:FF").unwrap_err();
}
#[test]
fn test_format_address() {
let buf = [1u8, 2u8, 3u8, 4u8];
let addr = unsafe { format_address(&buf as *const u8, 4).unwrap() };
assert_eq!(addr, "1.2.3.4");
let buf = [1u8, 2u8, 3u8, 4u8, 5u8, 6u8];
let addr = unsafe { format_address(&buf as *const u8, 6).unwrap() };
assert_eq!(addr, "01:02:03:04:05:06");
}
}
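For readers who want to try the address handling outside the agent crate, here is a minimal std-only sketch of the same ideas. It is not the crate's code: the names `parse_mac`, `parse_cidr` and this slice-based `format_address` are illustrative assumptions, errors are collapsed to `Option` instead of `nix::Error`, and the formatter takes a slice so no `unsafe` pointer arithmetic is needed. Like the original, `parse_cidr` does not check that the prefix length fits the address family.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Parse "aa:bb:cc:dd:ee:ff" into six bytes; `None` on any malformed input.
/// (Hypothetical helper, mirrors the field-by-field hex parsing in the diff.)
fn parse_mac(s: &str) -> Option<[u8; 6]> {
    let mut out = [0u8; 6];
    let mut fields = s.split(':');
    for byte in out.iter_mut() {
        // A missing field or a non-hex field aborts the parse.
        *byte = u8::from_str_radix(fields.next()?, 16).ok()?;
    }
    // Reject trailing fields such as a seventh octet.
    if fields.next().is_some() {
        return None;
    }
    Some(out)
}

/// Parse "ADDR/PREFIX" for either IPv4 or IPv6.
/// (Hypothetical helper; the prefix range is not validated.)
fn parse_cidr(s: &str) -> Option<(IpAddr, u8)> {
    let (addr, prefix) = s.split_once('/')?;
    Some((addr.parse().ok()?, prefix.parse().ok()?))
}

/// Format a 4-byte (IPv4), 6-byte (MAC) or 16-byte (IPv6) address.
/// Taking a slice keeps the length and the data together.
fn format_address(addr: &[u8]) -> Option<String> {
    match addr.len() {
        4 => Some(Ipv4Addr::new(addr[0], addr[1], addr[2], addr[3]).to_string()),
        6 => Some(
            addr.iter()
                .map(|b| format!("{:02X}", b))
                .collect::<Vec<_>>()
                .join(":"),
        ),
        16 => {
            let mut a = [0u8; 16];
            a.copy_from_slice(addr);
            Some(Ipv6Addr::from(a).to_string())
        }
        _ => None,
    }
}
```

With these definitions, `parse_mac("FF:FF:FF:FF:FF")` and `parse_cidr("1.2.3.4/f")` both yield `None`, matching the error cases exercised by the tests in the diff.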


@@ -8,7 +8,7 @@ extern crate serde;
extern crate serde_derive;
extern crate serde_json;
use libc::{self, mode_t};
use libc::mode_t;
use std::collections::HashMap;
mod serialize;
@@ -27,10 +27,6 @@ where
*d == T::default()
}
fn default_seccomp_errno() -> u32 {
libc::EPERM as u32
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct Spec {
#[serde(
@@ -58,7 +54,7 @@ pub struct Spec {
#[serde(skip_serializing_if = "Option::is_none")]
pub windows: Option<Windows<String>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub vm: Option<Vm>,
pub vm: Option<VM>,
}
impl Spec {
@@ -71,7 +67,7 @@ impl Spec {
}
}
pub type LinuxRlimit = PosixRlimit;
pub type LinuxRlimit = POSIXRlimit;
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct Process {
@@ -93,7 +89,7 @@ pub struct Process {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub capabilities: Option<LinuxCapabilities>,
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub rlimits: Vec<PosixRlimit>,
pub rlimits: Vec<POSIXRlimit>,
#[serde(default, rename = "noNewPrivileges")]
pub no_new_privileges: bool,
#[serde(
@@ -199,9 +195,9 @@ pub struct Hooks {
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct Linux {
#[serde(default, rename = "uidMappings", skip_serializing_if = "Vec::is_empty")]
pub uid_mappings: Vec<LinuxIdMapping>,
pub uid_mappings: Vec<LinuxIDMapping>,
#[serde(default, rename = "gidMappings", skip_serializing_if = "Vec::is_empty")]
pub gid_mappings: Vec<LinuxIdMapping>,
pub gid_mappings: Vec<LinuxIDMapping>,
#[serde(default, skip_serializing_if = "HashMap::is_empty")]
pub sysctl: HashMap<String, String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
@@ -261,7 +257,7 @@ pub const UTSNAMESPACE: &str = "uts";
pub const CGROUPNAMESPACE: &str = "cgroup";
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct LinuxIdMapping {
pub struct LinuxIDMapping {
#[serde(default, rename = "containerID")]
pub container_id: u32,
#[serde(default, rename = "hostID")]
@@ -271,7 +267,7 @@ pub struct LinuxIdMapping {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct PosixRlimit {
pub struct POSIXRlimit {
#[serde(default)]
pub r#type: String,
#[serde(default)]
@@ -297,7 +293,7 @@ pub struct LinuxInterfacePriority {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct LinuxBlockIoDevice {
pub struct LinuxBlockIODevice {
#[serde(default)]
pub major: i64,
#[serde(default)]
@@ -307,7 +303,7 @@ pub struct LinuxBlockIoDevice {
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct LinuxWeightDevice {
#[serde(flatten)]
pub blk: LinuxBlockIoDevice,
pub blk: LinuxBlockIODevice,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub weight: Option<u16>,
#[serde(
@@ -321,13 +317,13 @@ pub struct LinuxWeightDevice {
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct LinuxThrottleDevice {
#[serde(flatten)]
pub blk: LinuxBlockIoDevice,
pub blk: LinuxBlockIODevice,
#[serde(default)]
pub rate: u64,
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct LinuxBlockIo {
pub struct LinuxBlockIO {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub weight: Option<u16>,
#[serde(
@@ -391,7 +387,7 @@ pub struct LinuxMemory {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct LinuxCpu {
pub struct LinuxCPU {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub shares: Option<u64>,
#[serde(default, skip_serializing_if = "Option::is_none")]
@@ -453,11 +449,11 @@ pub struct LinuxResources {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub memory: Option<LinuxMemory>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub cpu: Option<LinuxCpu>,
pub cpu: Option<LinuxCPU>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub pids: Option<LinuxPids>,
#[serde(skip_serializing_if = "Option::is_none", rename = "blockIO")]
pub block_io: Option<LinuxBlockIo>,
pub block_io: Option<LinuxBlockIO>,
#[serde(
default,
skip_serializing_if = "Vec::is_empty",
@@ -517,7 +513,7 @@ pub struct Solaris {
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub anet: Vec<SolarisAnet>,
#[serde(default, skip_serializing_if = "Option::is_none", rename = "cappedCPU")]
pub capped_cpu: Option<SolarisCappedCpu>,
pub capped_cpu: Option<SolarisCappedCPU>,
#[serde(
default,
skip_serializing_if = "Option::is_none",
@@ -527,7 +523,7 @@ pub struct Solaris {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct SolarisCappedCpu {
pub struct SolarisCappedCPU {
#[serde(default, skip_serializing_if = "String::is_empty")]
pub ncpus: String,
}
@@ -605,7 +601,7 @@ pub struct WindowsResources {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub memory: Option<WindowsMemoryResources>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub cpu: Option<WindowsCpuResources>,
pub cpu: Option<WindowsCPUResources>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub storage: Option<WindowsStorageResources>,
}
@@ -617,7 +613,7 @@ pub struct WindowsMemoryResources {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct WindowsCpuResources {
pub struct WindowsCPUResources {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub count: Option<u64>,
#[serde(default, skip_serializing_if = "Option::is_none")]
@@ -675,14 +671,14 @@ pub struct WindowsHyperV {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct Vm {
pub hypervisor: VmHypervisor,
pub kernel: VmKernel,
pub image: VmImage,
pub struct VM {
pub hypervisor: VMHypervisor,
pub kernel: VMKernel,
pub image: VMImage,
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct VmHypervisor {
pub struct VMHypervisor {
#[serde(default)]
pub path: String,
#[serde(default, skip_serializing_if = "String::is_empty")]
@@ -690,7 +686,7 @@ pub struct VmHypervisor {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct VmKernel {
pub struct VMKernel {
#[serde(default)]
pub path: String,
#[serde(default, skip_serializing_if = "String::is_empty")]
@@ -700,7 +696,7 @@ pub struct VmKernel {
}
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct VmImage {
pub struct VMImage {
#[serde(default)]
pub path: String,
#[serde(default)]
@@ -714,8 +710,6 @@ pub struct LinuxSeccomp {
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub architectures: Vec<Arch>,
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub flags: Vec<LinuxSeccompFlag>,
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub syscalls: Vec<LinuxSyscall>,
}
@@ -739,20 +733,14 @@ pub const ARCHS390: &str = "SCMP_ARCH_S390";
pub const ARCHS390X: &str = "SCMP_ARCH_S390X";
pub const ARCHPARISC: &str = "SCMP_ARCH_PARISC";
pub const ARCHPARISC64: &str = "SCMP_ARCH_PARISC64";
pub const ARCHRISCV64: &str = "SCMP_ARCH_RISCV64";
pub type LinuxSeccompFlag = String;
pub type LinuxSeccompAction = String;
pub const ACTKILL: &str = "SCMP_ACT_KILL";
pub const ACTKILLPROCESS: &str = "SCMP_ACT_KILL_PROCESS";
pub const ACTKILLTHREAD: &str = "SCMP_ACT_KILL_THREAD";
pub const ACTTRAP: &str = "SCMP_ACT_TRAP";
pub const ACTERRNO: &str = "SCMP_ACT_ERRNO";
pub const ACTTRACE: &str = "SCMP_ACT_TRACE";
pub const ACTALLOW: &str = "SCMP_ACT_ALLOW";
pub const ACTLOG: &str = "SCMP_ACT_LOG";
pub type LinuxSeccompOperator = String;
@@ -782,8 +770,6 @@ pub struct LinuxSyscall {
pub names: Vec<String>,
#[serde(default, skip_serializing_if = "String::is_empty")]
pub action: LinuxSeccompAction,
#[serde(default = "default_seccomp_errno", rename = "errnoRet")]
pub errno_ret: u32,
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub args: Vec<LinuxSeccompArg>,
}
@@ -798,17 +784,7 @@ pub struct LinuxIntelRdt {
pub l3_cache_schema: String,
}
#[derive(Debug, Serialize, Deserialize, Copy, Clone, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum ContainerState {
Creating,
Created,
Running,
Stopped,
Paused,
}
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
#[derive(Serialize, Deserialize, Debug, Default, Clone, PartialEq)]
pub struct State {
#[serde(
default,
@@ -818,7 +794,8 @@ pub struct State {
pub version: String,
#[serde(default, skip_serializing_if = "String::is_empty")]
pub id: String,
pub status: ContainerState,
#[serde(default, skip_serializing_if = "String::is_empty")]
pub status: String,
#[serde(default)]
pub pid: i32,
#[serde(default, skip_serializing_if = "String::is_empty")]
@@ -829,8 +806,6 @@ pub struct State {
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_deserialize_state() {
let data = r#"{
@@ -843,10 +818,10 @@ mod tests {
"myKey": "myValue"
}
}"#;
let expected = State {
let expected = crate::State {
version: "0.2.0".to_string(),
id: "oci-container1".to_string(),
status: ContainerState::Running,
status: "running".to_string(),
pid: 4422,
bundle: "/containers/redis".to_string(),
annotations: [("myKey".to_string(), "myValue".to_string())]
@@ -1271,12 +1246,12 @@ mod tests {
ambient: vec!["CAP_NET_BIND_SERVICE".to_string()],
}),
rlimits: vec![
crate::PosixRlimit {
crate::POSIXRlimit {
r#type: "RLIMIT_CORE".to_string(),
hard: 1024,
soft: 1024,
},
crate::PosixRlimit {
crate::POSIXRlimit {
r#type: "RLIMIT_NOFILE".to_string(),
hard: 1024,
soft: 1024,
@@ -1408,12 +1383,12 @@ mod tests {
.cloned()
.collect(),
linux: Some(crate::Linux {
uid_mappings: vec![crate::LinuxIdMapping {
uid_mappings: vec![crate::LinuxIDMapping {
container_id: 0,
host_id: 1000,
size: 32000,
}],
gid_mappings: vec![crate::LinuxIdMapping {
gid_mappings: vec![crate::LinuxIDMapping {
container_id: 0,
host_id: 1000,
size: 32000,
@@ -1458,7 +1433,7 @@ mod tests {
swappiness: Some(0),
disable_oom_killer: Some(false),
}),
cpu: Some(crate::LinuxCpu {
cpu: Some(crate::LinuxCPU {
shares: Some(1024),
quota: Some(1000000),
period: Some(500000),
@@ -1468,17 +1443,17 @@ mod tests {
mems: "0-7".to_string(),
}),
pids: Some(crate::LinuxPids { limit: 32771 }),
block_io: Some(crate::LinuxBlockIo {
block_io: Some(crate::LinuxBlockIO {
weight: Some(10),
leaf_weight: Some(10),
weight_device: vec![
crate::LinuxWeightDevice {
blk: crate::LinuxBlockIoDevice { major: 8, minor: 0 },
blk: crate::LinuxBlockIODevice { major: 8, minor: 0 },
weight: Some(500),
leaf_weight: Some(300),
},
crate::LinuxWeightDevice {
blk: crate::LinuxBlockIoDevice {
blk: crate::LinuxBlockIODevice {
major: 8,
minor: 16,
},
@@ -1487,13 +1462,13 @@ mod tests {
},
],
throttle_read_bps_device: vec![crate::LinuxThrottleDevice {
blk: crate::LinuxBlockIoDevice { major: 8, minor: 0 },
blk: crate::LinuxBlockIODevice { major: 8, minor: 0 },
rate: 600,
}],
throttle_write_bps_device: vec![],
throttle_read_iops_device: vec![],
throttle_write_iops_device: vec![crate::LinuxThrottleDevice {
blk: crate::LinuxBlockIoDevice {
blk: crate::LinuxBlockIODevice {
major: 8,
minor: 16,
},
@@ -1579,11 +1554,9 @@ mod tests {
seccomp: Some(crate::LinuxSeccomp {
default_action: "SCMP_ACT_ALLOW".to_string(),
architectures: vec!["SCMP_ARCH_X86".to_string(), "SCMP_ARCH_X32".to_string()],
flags: vec![],
syscalls: vec![crate::LinuxSyscall {
names: vec!["getcwd".to_string(), "chmod".to_string()],
action: "SCMP_ACT_ERRNO".to_string(),
errno_ret: crate::default_seccomp_errno(),
args: vec![],
}],
}),


@@ -4,6 +4,7 @@
//
use serde::{Deserialize, Serialize};
use serde_json;
use std::error;
use std::fmt::{Display, Formatter, Result as FmtResult};


@@ -5,9 +5,9 @@ authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
[dependencies]
ttrpc = { version = "0.5.0", features = ["async"] }
async-trait = "0.1.42"
ttrpc = "0.3.0"
protobuf = "=2.14.0"
futures = "0.1.27"
[build-dependencies]
ttrpc-codegen = "0.2.0"
ttrpc-codegen = "0.1.2"


@@ -3,8 +3,8 @@
// SPDX-License-Identifier: Apache-2.0
//
use std::fs;
use ttrpc_codegen::{Codegen, Customize};
use std::fs::File;
use std::io::{Read, Write};
fn main() {
let protos = vec![
@@ -15,15 +15,16 @@ fn main() {
"protos/oci.proto",
];
Codegen::new()
// Tell Cargo that if the .proto files changed, to rerun this build script.
protos
.iter()
.for_each(|p| println!("cargo:rerun-if-changed={}", &p));
ttrpc_codegen::Codegen::new()
.out_dir("src")
.inputs(&protos)
.include("protos")
.rust_protobuf()
.customize(Customize {
async_server: true,
..Default::default()
})
.run()
.expect("Gen codes failed.");
@@ -39,6 +40,16 @@ fn main() {
}
fn replace_text_in_file(file_name: &str, from: &str, to: &str) -> Result<(), std::io::Error> {
let new_contents = fs::read_to_string(file_name)?.replace(from, to);
fs::write(&file_name, new_contents.as_bytes())
let mut src = File::open(file_name)?;
let mut contents = String::new();
src.read_to_string(&mut contents).unwrap();
drop(src);
let new_contents = contents.replace(from, to);
let mut dst = File::create(&file_name)?;
dst.write_all(new_contents.as_bytes())?;
Ok(())
}
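Both variants of `replace_text_in_file` in the hunk above do the same job: read the whole file, substitute a string, and write the result back. A minimal standalone sketch of that pattern, using only `std`, might look like the following. Taking a `&Path` instead of `&str` is an assumption made here for illustration; the build script itself passes a file name string.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Rewrite `path` in place, replacing every occurrence of `from` with `to`.
/// Any I/O error (missing file, non-UTF-8 contents, write failure) is
/// returned to the caller rather than causing a panic.
fn replace_text_in_file(path: &Path, from: &str, to: &str) -> io::Result<()> {
    let contents = fs::read_to_string(path)?;
    fs::write(path, contents.replace(from, to))
}
```

Propagating the read error with `?` keeps the failure mode uniform, which matters in a build script where a panic and an `Err` are reported differently.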


@@ -47,7 +47,7 @@ show_usage() {
}
generate_go_sources() {
local cmd="protoc -I$GOPATH/src:$GOPATH/src/github.com/kata-containers/kata-containers/src/agent/protocols/protos \
local cmd="protoc -I$GOPATH/src/github.com/kata-containers/agent/vendor/github.com/gogo/protobuf:$GOPATH/src/github.com/kata-containers/agent/vendor:$GOPATH/src/github.com/gogo/protobuf:$GOPATH/src/github.com/gogo/googleapis:$GOPATH/src:$GOPATH/src/github.com/kata-containers/kata-containers/src/agent/protocols/protos \
--gogottrpc_out=plugins=ttrpc+fieldpath,\
import_path=github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/agent/protocols/grpc,\
\
@@ -65,7 +65,7 @@ $GOPATH/src/github.com/kata-containers/kata-containers/src/agent/protocols/proto
}
if [ "$(basename $(pwd))" != "agent" ]; then
die "Please go to root directory of agent before execute this shell"
die "Please go to directory of protocols before execute this shell"
fi
# Protocol buffer files required to generate golang/rust bindings.
@@ -80,6 +80,12 @@ fi;
which protoc
[ $? -eq 0 ] || die "Please install protoc from github.com/protocolbuffers/protobuf"
which protoc-gen-rust
[ $? -eq 0 ] || die "Please install protobuf-codegen from github.com/pingcap/grpc-rs"
which ttrpc_rust_plugin
[ $? -eq 0 ] || die "Please install ttrpc_rust_plugin from https://github.com/containerd/ttrpc-rust"
which protoc-gen-gogottrpc
[ $? -eq 0 ] || die "Please install protoc-gen-gogottrpc from https://github.com/containerd/ttrpc"


@@ -32,6 +32,7 @@ service AgentService {
rpc ExecProcess(ExecProcessRequest) returns (google.protobuf.Empty);
rpc SignalProcess(SignalProcessRequest) returns (google.protobuf.Empty);
rpc WaitProcess(WaitProcessRequest) returns (WaitProcessResponse); // wait & reap like waitpid(2)
rpc ListProcesses(ListProcessesRequest) returns (ListProcessesResponse);
rpc UpdateContainer(UpdateContainerRequest) returns (google.protobuf.Empty);
rpc StatsContainer(StatsContainerRequest) returns (StatsContainerResponse);
rpc PauseContainer(PauseContainerRequest) returns (google.protobuf.Empty);
@@ -125,6 +126,18 @@ message WaitProcessResponse {
int32 status = 1;
}
// ListProcessesRequest contains the options used to list running processes inside the container
message ListProcessesRequest {
string container_id = 1;
string format = 2;
repeated string args = 3;
}
// ListProcessesResponse represents the list of running processes inside the container
message ListProcessesResponse {
bytes process_list = 1;
}
message UpdateContainerRequest {
string container_id = 1;
LinuxResources resources = 2;


@@ -12,6 +12,7 @@ option go_package = "github.com/kata-containers/kata-containers/src/runtime/virt
package grpc;
import "gogo/protobuf/gogoproto/gogo.proto";
import "google/protobuf/wrappers.proto";
option (gogoproto.equal_all) = true;
option (gogoproto.populate_all) = true;
@@ -441,8 +442,7 @@ message LinuxInterfacePriority {
message LinuxSeccomp {
string DefaultAction = 1;
repeated string Architectures = 2;
repeated string Flags = 3;
repeated LinuxSyscall Syscalls = 4 [(gogoproto.nullable) = false];
repeated LinuxSyscall Syscalls = 3 [(gogoproto.nullable) = false];
}
message LinuxSeccompArg {
@@ -455,10 +455,7 @@ message LinuxSeccompArg {
message LinuxSyscall {
repeated string Names = 1;
string Action = 2;
oneof ErrnoRet {
uint32 errnoret = 3;
}
repeated LinuxSeccompArg Args = 4 [(gogoproto.nullable) = false];
repeated LinuxSeccompArg Args = 3 [(gogoproto.nullable) = false];
}
message LinuxIntelRdt {


@@ -29,8 +29,10 @@ message Interface {
uint64 mtu = 4;
string hwAddr = 5;
// PCI path for the device (see the pci::Path (Rust) or types.PciPath (Go) type for format details)
string pciPath = 6;
// pciAddr is the PCI address in the format "bridgeAddr/deviceAddr".
// Here, bridgeAddr is the address at which the bridge is attached on the root bus,
// while deviceAddr is the address at which the network device is attached on the bridge.
string pciAddr = 6;
// Type defines the type of interface described by this structure.
// The expected values are the one that are defined by the netlink


@@ -3,7 +3,6 @@
// SPDX-License-Identifier: Apache-2.0
//
#![allow(bare_trait_objects)]
#![allow(clippy::redundant_field_names)]
pub mod agent;
pub mod agent_ttrpc;
@@ -12,3 +11,11 @@ pub mod health;
pub mod health_ttrpc;
pub mod oci;
pub mod types;
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
}


@@ -10,27 +10,22 @@ serde_json = "1.0.39"
serde_derive = "1.0.91"
oci = { path = "../oci" }
protocols = { path ="../protocols" }
caps = "0.5.0"
caps = "0.3.0"
nix = "0.17.0"
scopeguard = "1.0.0"
prctl = "1.0.0"
lazy_static = "1.3.0"
libc = "0.2.58"
protobuf = "=2.14.0"
protobuf = "2.8.1"
slog = "2.5.2"
slog-scope = "4.1.2"
scan_fmt = "0.2"
regex = "1.1"
path-absolutize = "1.2.0"
dirs = "3.0.1"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.5" }
cgroups = { git = "https://github.com/kata-containers/cgroups-rs", branch = "stable-0.1.1"}
tempfile = "3.1.0"
rlimit = "0.5.3"
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros"] }
futures = "0.3"
async-trait = "0.1.31"
inotify = "0.9.2"
[dev-dependencies]
serial_test = "0.5.0"


@@ -6,47 +6,102 @@
// looks like we can use caps to manipulate capabilities
// conveniently, use caps to do it directly.. maybe
use lazy_static;
use crate::log_child;
use crate::sync::write_count;
use anyhow::{anyhow, Result};
use caps::{self, runtime, CapSet, Capability, CapsHashSet};
use caps::{self, CapSet, Capability, CapsHashSet};
use oci::LinuxCapabilities;
use std::collections::HashMap;
use std::os::unix::io::RawFd;
use std::str::FromStr;
lazy_static! {
pub static ref CAPSMAP: HashMap<String, Capability> = {
let mut m = HashMap::new();
m.insert("CAP_CHOWN".to_string(), Capability::CAP_CHOWN);
m.insert("CAP_DAC_OVERRIDE".to_string(), Capability::CAP_DAC_OVERRIDE);
m.insert(
"CAP_DAC_READ_SEARCH".to_string(),
Capability::CAP_DAC_READ_SEARCH,
);
m.insert("CAP_FOWNER".to_string(), Capability::CAP_FOWNER);
m.insert("CAP_FSETID".to_string(), Capability::CAP_FSETID);
m.insert("CAP_KILL".to_string(), Capability::CAP_KILL);
m.insert("CAP_SETGID".to_string(), Capability::CAP_SETGID);
m.insert("CAP_SETUID".to_string(), Capability::CAP_SETUID);
m.insert("CAP_SETPCAP".to_string(), Capability::CAP_SETPCAP);
m.insert(
"CAP_LINUX_IMMUTABLE".to_string(),
Capability::CAP_LINUX_IMMUTABLE,
);
m.insert(
"CAP_NET_BIND_SERVICE".to_string(),
Capability::CAP_NET_BIND_SERVICE,
);
m.insert(
"CAP_NET_BROADCAST".to_string(),
Capability::CAP_NET_BROADCAST,
);
m.insert("CAP_NET_ADMIN".to_string(), Capability::CAP_NET_ADMIN);
m.insert("CAP_NET_RAW".to_string(), Capability::CAP_NET_RAW);
m.insert("CAP_IPC_LOCK".to_string(), Capability::CAP_IPC_LOCK);
m.insert("CAP_IPC_OWNER".to_string(), Capability::CAP_IPC_OWNER);
m.insert("CAP_SYS_MODULE".to_string(), Capability::CAP_SYS_MODULE);
m.insert("CAP_SYS_RAWIO".to_string(), Capability::CAP_SYS_RAWIO);
m.insert("CAP_SYS_CHROOT".to_string(), Capability::CAP_SYS_CHROOT);
m.insert("CAP_SYS_PTRACE".to_string(), Capability::CAP_SYS_PTRACE);
m.insert("CAP_SYS_PACCT".to_string(), Capability::CAP_SYS_PACCT);
m.insert("CAP_SYS_ADMIN".to_string(), Capability::CAP_SYS_ADMIN);
m.insert("CAP_SYS_BOOT".to_string(), Capability::CAP_SYS_BOOT);
m.insert("CAP_SYS_NICE".to_string(), Capability::CAP_SYS_NICE);
m.insert("CAP_SYS_RESOURCE".to_string(), Capability::CAP_SYS_RESOURCE);
m.insert("CAP_SYS_TIME".to_string(), Capability::CAP_SYS_TIME);
m.insert(
"CAP_SYS_TTY_CONFIG".to_string(),
Capability::CAP_SYS_TTY_CONFIG,
);
m.insert("CAP_MKNOD".to_string(), Capability::CAP_MKNOD);
m.insert("CAP_LEASE".to_string(), Capability::CAP_LEASE);
m.insert("CAP_AUDIT_WRITE".to_string(), Capability::CAP_AUDIT_WRITE);
m.insert("CAP_AUDIT_CONTROL".to_string(), Capability::CAP_AUDIT_CONTROL);
m.insert("CAP_SETFCAP".to_string(), Capability::CAP_SETFCAP);
m.insert("CAP_MAC_OVERRIDE".to_string(), Capability::CAP_MAC_OVERRIDE);
m.insert("CAP_SYSLOG".to_string(), Capability::CAP_SYSLOG);
m.insert("CAP_WAKE_ALARM".to_string(), Capability::CAP_WAKE_ALARM);
m.insert(
"CAP_BLOCK_SUSPEND".to_string(),
Capability::CAP_BLOCK_SUSPEND,
);
m.insert("CAP_AUDIT_READ".to_string(), Capability::CAP_AUDIT_READ);
m
};
}
fn to_capshashset(cfd_log: RawFd, caps: &[String]) -> CapsHashSet {
let mut r = CapsHashSet::new();
for cap in caps.iter() {
match Capability::from_str(cap) {
Err(_) => {
log_child!(cfd_log, "{} is not a cap", cap);
continue;
}
Ok(c) => r.insert(c),
};
let c = CAPSMAP.get(cap);
if c.is_none() {
log_child!(cfd_log, "{} is not a cap", cap);
continue;
}
r.insert(*c.unwrap());
}
r
}
pub fn get_all_caps() -> CapsHashSet {
let mut caps_set =
runtime::procfs_all_supported(None).unwrap_or_else(|_| runtime::thread_all_supported());
if caps_set.is_empty() {
caps_set = caps::all();
}
caps_set
}
pub fn reset_effective() -> Result<()> {
let all = get_all_caps();
caps::set(None, CapSet::Effective, &all).map_err(|e| anyhow!(e.to_string()))?;
caps::set(None, CapSet::Effective, caps::all()).map_err(|e| anyhow!(e.to_string()))?;
Ok(())
}
pub fn drop_privileges(cfd_log: RawFd, caps: &LinuxCapabilities) -> Result<()> {
let all = get_all_caps();
let all = caps::all();
for c in all.difference(&to_capshashset(cfd_log, caps.bounding.as_ref())) {
caps::drop(None, CapSet::Bounding, *c).map_err(|e| anyhow!(e.to_string()))?;
@@ -55,26 +110,26 @@ pub fn drop_privileges(cfd_log: RawFd, caps: &LinuxCapabilities) -> Result<()> {
caps::set(
None,
CapSet::Effective,
&to_capshashset(cfd_log, caps.effective.as_ref()),
to_capshashset(cfd_log, caps.effective.as_ref()),
)
.map_err(|e| anyhow!(e.to_string()))?;
caps::set(
None,
CapSet::Permitted,
&to_capshashset(cfd_log, caps.permitted.as_ref()),
to_capshashset(cfd_log, caps.permitted.as_ref()),
)
.map_err(|e| anyhow!(e.to_string()))?;
caps::set(
None,
CapSet::Inheritable,
&to_capshashset(cfd_log, caps.inheritable.as_ref()),
to_capshashset(cfd_log, caps.inheritable.as_ref()),
)
.map_err(|e| anyhow!(e.to_string()))?;
let _ = caps::set(
None,
CapSet::Ambient,
&to_capshashset(cfd_log, caps.ambient.as_ref()),
to_capshashset(cfd_log, caps.ambient.as_ref()),
)
.map_err(|_| log_child!(cfd_log, "failed to set ambient capability"));


@@ -21,10 +21,11 @@ use cgroups::{
use crate::cgroups::Manager as CgroupManager;
use crate::container::DEFAULT_DEVICES;
use anyhow::{anyhow, Context, Result};
use lazy_static;
use libc::{self, pid_t};
use nix::errno::Errno;
use oci::{
LinuxBlockIo, LinuxCpu, LinuxDevice, LinuxDeviceCgroup, LinuxHugepageLimit, LinuxMemory,
LinuxBlockIO, LinuxCPU, LinuxDevice, LinuxDeviceCgroup, LinuxHugepageLimit, LinuxMemory,
LinuxNetwork, LinuxPids, LinuxResources,
};
@@ -37,8 +38,6 @@ use std::collections::HashMap;
use std::fs;
use std::path::Path;
const GUEST_CPUS_PATH: &str = "/sys/devices/system/cpu/online";
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
@@ -46,6 +45,28 @@ macro_rules! sl {
};
}
pub fn load_or_create<'a>(h: Box<&'a dyn cgroups::Hierarchy>, path: &str) -> Cgroup<'a> {
let valid_path = path.trim_start_matches("/").to_string();
let cg = load(h.clone(), &valid_path);
if cg.is_none() {
info!(sl!(), "create new cgroup: {}", &valid_path);
cgroups::Cgroup::new(h, valid_path.as_str())
} else {
cg.unwrap()
}
}
pub fn load<'a>(h: Box<&'a dyn cgroups::Hierarchy>, path: &str) -> Option<Cgroup<'a>> {
let valid_path = path.trim_start_matches("/").to_string();
let cg = cgroups::Cgroup::load(h, valid_path.as_str());
let cpu_controller: &CpuController = cg.controller_of().unwrap();
if cpu_controller.exists() {
Some(cg)
} else {
None
}
}
macro_rules! get_controller_or_return_singular_none {
($cg:ident) => {
match $cg.controller_of() {
@@ -59,9 +80,8 @@ macro_rules! get_controller_or_return_singular_none {
pub struct Manager {
pub paths: HashMap<String, String>,
pub mounts: HashMap<String, String>,
// pub rels: HashMap<String, String>,
pub cpath: String,
#[serde(skip)]
cgroup: cgroups::Cgroup,
}
// set_resource is used to set resources by cgroup controller.
@@ -76,11 +96,17 @@ macro_rules! set_resource {
impl CgroupManager for Manager {
fn apply(&self, pid: pid_t) -> Result<()> {
self.cgroup.add_task(CgroupPid::from(pid as u64))?;
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load_or_create(h, &self.cpath);
cg.add_task(CgroupPid::from(pid as u64))?;
Ok(())
}
fn set(&self, r: &LinuxResources, update: bool) -> Result<()> {
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load_or_create(h, &self.cpath);
info!(
sl!(),
"cgroup manager set resources for container. Resources input {:?}", r
@@ -90,49 +116,53 @@ impl CgroupManager for Manager {
// set cpuset and cpu resources
if let Some(cpu) = &r.cpu {
set_cpu_resources(&self.cgroup, cpu)?;
set_cpu_resources(&cg, cpu)?;
}
// set memory resources
if let Some(memory) = &r.memory {
set_memory_resources(&self.cgroup, memory, update)?;
set_memory_resources(&cg, memory, update)?;
}
// set pids resources
if let Some(pids_resources) = &r.pids {
set_pids_resources(&self.cgroup, pids_resources)?;
set_pids_resources(&cg, pids_resources)?;
}
// set block_io resources
if let Some(blkio) = &r.block_io {
set_block_io_resources(&self.cgroup, blkio, res);
set_block_io_resources(&cg, blkio, res)?;
}
// set hugepages resources
if !r.hugepage_limits.is_empty() {
set_hugepages_resources(&self.cgroup, &r.hugepage_limits, res);
if r.hugepage_limits.len() > 0 {
set_hugepages_resources(&cg, &r.hugepage_limits, res)?;
}
// set network resources
if let Some(network) = &r.network {
set_network_resources(&self.cgroup, network, res);
set_network_resources(&cg, network, res)?;
}
// set devices resources
set_devices_resources(&self.cgroup, &r.devices, res);
set_devices_resources(&cg, &r.devices, res)?;
info!(sl!(), "resources after processed {:?}", res);
// apply resources
self.cgroup.apply(res)?;
cg.apply(res)?;
Ok(())
}
fn get_stats(&self) -> Result<CgroupStats> {
// CpuStats
let cpu_usage = get_cpuacct_stats(&self.cgroup);
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load_or_create(h, &self.cpath);
let throttling_data = get_cpu_stats(&self.cgroup);
// CpuStats
let cpu_usage = get_cpuacct_stats(&cg);
let throttling_data = get_cpu_stats(&cg);
let cpu_stats = SingularPtrField::some(CpuStats {
cpu_usage,
@@ -142,17 +172,17 @@ impl CgroupManager for Manager {
});
// Memorystats
let memory_stats = get_memory_stats(&self.cgroup);
let memory_stats = get_memory_stats(&cg);
// PidsStats
let pids_stats = get_pids_stats(&self.cgroup);
let pids_stats = get_pids_stats(&cg);
// BlkioStats
// note that virtiofs has no blkio stats
let blkio_stats = get_blkio_stats(&self.cgroup);
let blkio_stats = get_blkio_stats(&cg);
// HugetlbStats
let hugetlb_stats = get_hugetlb_stats(&self.cgroup);
let hugetlb_stats = get_hugetlb_stats(&cg);
Ok(CgroupStats {
cpu_stats,
@@ -166,7 +196,10 @@ impl CgroupManager for Manager {
}
fn freeze(&self, state: FreezerState) -> Result<()> {
let freezer_controller: &FreezerController = self.cgroup.controller_of().unwrap();
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load_or_create(h, &self.cpath);
let freezer_controller: &FreezerController = cg.controller_of().unwrap();
match state {
FreezerState::Thawed => {
freezer_controller.thaw()?;
@@ -183,12 +216,20 @@ impl CgroupManager for Manager {
}
fn destroy(&mut self) -> Result<()> {
let _ = self.cgroup.delete();
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load(h, &self.cpath);
if cg.is_some() {
cg.unwrap().delete();
}
Ok(())
}
fn get_pids(&self) -> Result<Vec<pid_t>> {
let mem_controller: &MemController = self.cgroup.controller_of().unwrap();
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load_or_create(h, &self.cpath);
let mem_controller: &MemController = cg.controller_of().unwrap();
let pids = mem_controller.tasks();
let result = pids.iter().map(|x| x.pid as i32).collect::<Vec<i32>>();
@@ -200,14 +241,14 @@ fn set_network_resources(
_cg: &cgroups::Cgroup,
network: &LinuxNetwork,
res: &mut cgroups::Resources,
) {
) -> Result<()> {
info!(sl!(), "cgroup manager set network");
// set classid
// description can be found at https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/net_cls.html
let class_id = network.class_id.unwrap_or(0) as u64;
if class_id != 0 {
res.network.class_id = Some(class_id);
res.network.class_id = class_id;
}
// set network priorities
@@ -220,14 +261,16 @@ fn set_network_resources(
});
}
res.network.update_values = true;
res.network.priorities = priorities;
Ok(())
}
fn set_devices_resources(
_cg: &cgroups::Cgroup,
device_resources: &[LinuxDeviceCgroup],
device_resources: &Vec<LinuxDeviceCgroup>,
res: &mut cgroups::Resources,
) {
) -> Result<()> {
info!(sl!(), "cgroup manager set devices");
let mut devices = vec![];
@@ -249,15 +292,19 @@ fn set_devices_resources(
}
}
res.devices.update_values = true;
res.devices.devices = devices;
Ok(())
}
fn set_hugepages_resources(
_cg: &cgroups::Cgroup,
hugepage_limits: &[LinuxHugepageLimit],
hugepage_limits: &Vec<LinuxHugepageLimit>,
res: &mut cgroups::Resources,
) {
) -> Result<()> {
info!(sl!(), "cgroup manager set hugepage");
res.hugepages.update_values = true;
let mut limits = vec![];
for l in hugepage_limits.iter() {
@@ -268,25 +315,42 @@ fn set_hugepages_resources(
limits.push(hr);
}
res.hugepages.limits = limits;
Ok(())
}
fn set_block_io_resources(
_cg: &cgroups::Cgroup,
blkio: &LinuxBlockIo,
cg: &cgroups::Cgroup,
blkio: &LinuxBlockIO,
res: &mut cgroups::Resources,
) {
) -> Result<()> {
info!(sl!(), "cgroup manager set block io");
res.blkio.update_values = true;
res.blkio.weight = blkio.weight;
res.blkio.leaf_weight = blkio.leaf_weight;
if cg.v2() {
res.blkio.weight = convert_blk_io_to_v2_value(blkio.weight);
res.blkio.leaf_weight = convert_blk_io_to_v2_value(blkio.leaf_weight);
} else {
res.blkio.weight = blkio.weight;
res.blkio.leaf_weight = blkio.leaf_weight;
}
let mut blk_device_resources = vec![];
for d in blkio.weight_device.iter() {
let (w, lw) = if cg.v2() {
(
convert_blk_io_to_v2_value(blkio.weight),
convert_blk_io_to_v2_value(blkio.leaf_weight),
)
} else {
(blkio.weight, blkio.leaf_weight)
};
let dr = BlkIoDeviceResource {
major: d.blk.major as u64,
minor: d.blk.minor as u64,
weight: blkio.weight,
leaf_weight: blkio.leaf_weight,
weight: w,
leaf_weight: lw,
};
blk_device_resources.push(dr);
}
@@ -300,17 +364,17 @@ fn set_block_io_resources(
build_blk_io_device_throttle_resource(&blkio.throttle_read_iops_device);
res.blkio.throttle_write_iops_device =
build_blk_io_device_throttle_resource(&blkio.throttle_write_iops_device);
Ok(())
}
fn set_cpu_resources(cg: &cgroups::Cgroup, cpu: &LinuxCpu) -> Result<()> {
fn set_cpu_resources(cg: &cgroups::Cgroup, cpu: &LinuxCPU) -> Result<()> {
info!(sl!(), "cgroup manager set cpu");
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
if !cpu.cpus.is_empty() {
if let Err(e) = cpuset_controller.set_cpus(&cpu.cpus) {
warn!(sl!(), "write cpuset failed: {:?}", e);
}
cpuset_controller.set_cpus(&cpu.cpus)?;
}
if !cpu.mems.is_empty() {
@@ -349,34 +413,14 @@ fn set_memory_resources(cg: &cgroups::Cgroup, memory: &LinuxMemory, update: bool
mem_controller.set_kmem_limit(-1)?;
}
// If the memory update is set to -1 we should also
// set swap to -1, it means unlimited memory.
let mut swap = memory.swap.unwrap_or(0);
if memory.limit == Some(-1) {
swap = -1;
}
set_resource!(mem_controller, set_limit, memory, limit);
set_resource!(mem_controller, set_soft_limit, memory, reservation);
set_resource!(mem_controller, set_kmem_limit, memory, kernel);
set_resource!(mem_controller, set_tcp_limit, memory, kernel_tcp);
if memory.limit.is_some() && swap != 0 {
let memstat = get_memory_stats(cg)
.into_option()
.ok_or_else(|| anyhow!("failed to get the cgroup memory stats"))?;
let memusage = memstat.get_usage();
// When updating the memory limit, the kernel checks the current memory limit
// against the new swap setting. If the current memory limit is larger than
// the new swap, set the limit first; otherwise the kernel complains and
// refuses the change. Conversely, if the current memory limit is smaller than
// the new swap, set the swap first and then set the memory limit.
if swap == -1 || memusage.get_limit() < swap as u64 {
mem_controller.set_memswap_limit(swap)?;
set_resource!(mem_controller, set_limit, memory, limit);
} else {
set_resource!(mem_controller, set_limit, memory, limit);
mem_controller.set_memswap_limit(swap)?;
}
} else {
set_resource!(mem_controller, set_limit, memory, limit);
swap = if cg.v2() {
if let Some(swap) = memory.swap {
// set memory swap
let swap = if cg.v2() {
convert_memory_swap_to_v2_value(swap, memory.limit.unwrap_or(0))?
} else {
swap
@@ -386,12 +430,8 @@ fn set_memory_resources(cg: &cgroups::Cgroup, memory: &LinuxMemory, update: bool
}
}
set_resource!(mem_controller, set_soft_limit, memory, reservation);
set_resource!(mem_controller, set_kmem_limit, memory, kernel);
set_resource!(mem_controller, set_tcp_limit, memory, kernel_tcp);
if let Some(swappiness) = memory.swappiness {
if (0..=100).contains(&swappiness) {
if swappiness >= 0 && swappiness <= 100 {
mem_controller.set_swappiness(swappiness as u64)?;
} else {
return Err(anyhow!(
@@ -422,7 +462,7 @@ fn set_pids_resources(cg: &cgroups::Cgroup, pids: &LinuxPids) -> Result<()> {
}
fn build_blk_io_device_throttle_resource(
input: &[oci::LinuxThrottleDevice],
input: &Vec<oci::LinuxThrottleDevice>,
) -> Vec<BlkIoDeviceThrottleResource> {
let mut blk_io_device_throttle_resources = vec![];
for d in input.iter() {
@@ -513,61 +553,63 @@ lazy_static! {
};
pub static ref DEFAULT_ALLOWED_DEVICES: Vec<LinuxDeviceCgroup> = {
vec![
// all mknod to all char devices
LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(WILDCARD),
minor: Some(WILDCARD),
access: "m".to_string(),
},
let mut v = Vec::new();
// all mknod to all block devices
LinuxDeviceCgroup {
allow: true,
r#type: "b".to_string(),
major: Some(WILDCARD),
minor: Some(WILDCARD),
access: "m".to_string(),
},
// all mknod to all char devices
v.push(LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(WILDCARD),
minor: Some(WILDCARD),
access: "m".to_string(),
});
// all read/write/mknod to char device /dev/console
LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(5),
minor: Some(1),
access: "rwm".to_string(),
},
// all mknod to all block devices
v.push(LinuxDeviceCgroup {
allow: true,
r#type: "b".to_string(),
major: Some(WILDCARD),
minor: Some(WILDCARD),
access: "m".to_string(),
});
// all read/write/mknod to char device /dev/pts/<N>
LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(136),
minor: Some(WILDCARD),
access: "rwm".to_string(),
},
// all read/write/mknod to char device /dev/console
v.push(LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(5),
minor: Some(1),
access: "rwm".to_string(),
});
// all read/write/mknod to char device /dev/ptmx
LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(5),
minor: Some(2),
access: "rwm".to_string(),
},
// all read/write/mknod to char device /dev/pts/<N>
v.push(LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(136),
minor: Some(WILDCARD),
access: "rwm".to_string(),
});
// all read/write/mknod to char device /dev/net/tun
LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(10),
minor: Some(200),
access: "rwm".to_string(),
},
]
// all read/write/mknod to char device /dev/ptmx
v.push(LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(5),
minor: Some(2),
access: "rwm".to_string(),
});
// all read/write/mknod to char device /dev/net/tun
v.push(LinuxDeviceCgroup {
allow: true,
r#type: "c".to_string(),
major: Some(10),
minor: Some(200),
access: "rwm".to_string(),
});
v
};
}
@@ -648,7 +690,7 @@ fn get_memory_stats(cg: &cgroups::Cgroup) -> SingularPtrField<MemoryStats> {
// use_hierarchy
let value = memory.use_hierarchy;
let use_hierarchy = value == 1;
let use_hierarchy = if value == 1 { true } else { false };
// get memory data
let usage = SingularPtrField::some(MemoryData {
@@ -702,12 +744,13 @@ fn get_pids_stats(cg: &cgroups::Cgroup) -> SingularPtrField<PidsStats> {
let current = pid_controller.get_pid_current().unwrap_or(0);
let max = pid_controller.get_pid_max();
let limit = match max {
Err(_) => 0,
Ok(max) => match max {
let limit = if max.is_err() {
0
} else {
match max.unwrap() {
MaxValue::Value(v) => v,
MaxValue::Max => 0,
},
}
} as u64;
SingularPtrField::some(PidsStats {
@@ -750,9 +793,9 @@ https://github.com/opencontainers/runc/blob/a5847db387ae28c0ca4ebe4beee1a76900c8
Total 0
*/
fn get_blkio_stat_blkiodata(blkiodata: &[BlkIoData]) -> RepeatedField<BlkioStatsEntry> {
fn get_blkio_stat_blkiodata(blkiodata: &Vec<BlkIoData>) -> RepeatedField<BlkioStatsEntry> {
let mut m = RepeatedField::new();
if blkiodata.is_empty() {
if blkiodata.len() == 0 {
return m;
}
@@ -772,10 +815,10 @@ fn get_blkio_stat_blkiodata(blkiodata: &[BlkIoData]) -> RepeatedField<BlkioStats
m
}
fn get_blkio_stat_ioservice(services: &[IoService]) -> RepeatedField<BlkioStatsEntry> {
fn get_blkio_stat_ioservice(services: &Vec<IoService>) -> RepeatedField<BlkioStatsEntry> {
let mut m = RepeatedField::new();
if services.is_empty() {
if services.len() == 0 {
return m;
}
@@ -796,7 +839,7 @@ fn build_blkio_stats_entry(major: i16, minor: i16, op: &str, value: u64) -> Blki
major: major as u64,
minor: minor as u64,
op: op.to_string(),
value,
value: value,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
}
@@ -837,7 +880,7 @@ fn get_blkio_stats(cg: &cgroups::Cgroup) -> SingularPtrField<BlkioStats> {
let mut m = BlkioStats::new();
let io_serviced_recursive = blkio.io_serviced_recursive;
if io_serviced_recursive.is_empty() {
if io_serviced_recursive.len() == 0 {
// fall back to generic stats
// blkio.throttle.io_service_bytes,
// maybe io_service_bytes_recursive?
@@ -892,8 +935,8 @@ fn get_hugetlb_stats(cg: &cgroups::Cgroup) -> HashMap<String, HugetlbStats> {
h
}
pub const PATHS: &str = "/proc/self/cgroup";
pub const MOUNTS: &str = "/proc/self/mountinfo";
pub const PATHS: &'static str = "/proc/self/cgroup";
pub const MOUNTS: &'static str = "/proc/self/mountinfo";
pub fn get_paths() -> Result<HashMap<String, String>> {
let mut m = HashMap::new();
@@ -948,11 +991,6 @@ pub fn get_mounts() -> Result<HashMap<String, String>> {
Ok(m)
}
fn new_cgroup(h: Box<dyn cgroups::Hierarchy>, path: &str) -> Cgroup {
let valid_path = path.trim_start_matches('/').to_string();
cgroups::Cgroup::new(h, valid_path.as_str())
}
impl Manager {
pub fn new(cpath: &str) -> Result<Self> {
let mut m = HashMap::new();
@@ -960,14 +998,18 @@ impl Manager {
let paths = get_paths()?;
let mounts = get_mounts()?;
for key in paths.keys() {
for (key, value) in &paths {
let mnt = mounts.get(key);
if mnt.is_none() {
continue;
}
let p = format!("{}/{}", mnt.unwrap(), cpath);
let p = if value == "/" {
format!("{}/{}", mnt.unwrap(), cpath)
} else {
format!("{}{}/{}", mnt.unwrap(), value, cpath)
};
m.insert(key.to_string(), p);
}
@@ -977,26 +1019,29 @@ impl Manager {
mounts,
// rels: paths,
cpath: cpath.to_string(),
cgroup: new_cgroup(cgroups::hierarchies::auto(), cpath),
})
}
pub fn update_cpuset_path(&self, guest_cpuset: &str, container_cpuset: &str) -> Result<()> {
if guest_cpuset.is_empty() {
pub fn update_cpuset_path(&self, cpuset_cpus: &str) -> Result<()> {
if cpuset_cpus == "" {
return Ok(());
}
info!(sl!(), "update_cpuset_path to: {}", guest_cpuset);
info!(sl!(), "update_cpuset_path to: {}", cpuset_cpus);
let h = cgroups::hierarchies::auto();
let root_cg = h.root_control_group();
let h = Box::new(&*h);
let root_cg = load_or_create(h, "");
let root_cpuset_controller: &CpuSetController = root_cg.controller_of().unwrap();
let path = root_cpuset_controller.path();
let root_path = Path::new(path);
info!(sl!(), "root cpuset path: {:?}", &path);
let container_cpuset_controller: &CpuSetController = self.cgroup.controller_of().unwrap();
let path = container_cpuset_controller.path();
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
let cg = load_or_create(h, &self.cpath);
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
let path = cpuset_controller.path();
let container_path = Path::new(path);
info!(sl!(), "container cpuset path: {:?}", &path);
@@ -1005,36 +1050,30 @@ impl Manager {
if ancestor == root_path {
break;
}
paths.push(ancestor);
if ancestor != container_path {
paths.push(ancestor);
}
}
info!(sl!(), "parent paths to update cpuset: {:?}", &paths);
info!(sl!(), "paths to update cpuset: {:?}", &paths);
let mut i = paths.len();
loop {
if i == 0 {
break;
}
i -= 1;
i = i - 1;
let h = cgroups::hierarchies::auto();
let h = Box::new(&*h);
// remove cgroup root from path
let r_path = &paths[i]
.to_str()
.unwrap()
.trim_start_matches(root_path.to_str().unwrap());
info!(sl!(), "updating cpuset for parent path {:?}", &r_path);
let cg = new_cgroup(cgroups::hierarchies::auto(), &r_path);
info!(sl!(), "updating cpuset for path {:?}", &r_path);
let cg = load_or_create(h, &r_path);
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
cpuset_controller.set_cpus(guest_cpuset)?;
}
if !container_cpuset.is_empty() {
info!(
sl!(),
"updating cpuset for container path: {:?} cpuset: {}",
&container_path,
container_cpuset
);
container_cpuset_controller.set_cpus(container_cpuset)?;
cpuset_controller.set_cpus(cpuset_cpus)?;
}
Ok(())
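Note on the loop above: it walks the collected ancestor paths from the end of the list, so the directory nearest the cgroup root is updated first. A minimal sketch of that ordering, assuming the container path lies below the root path; `cpuset_parents` is a hypothetical helper and not part of the agent code:

```rust
use std::path::{Path, PathBuf};

/// Collect the cgroup directories strictly between `root` and `container`,
/// ordered from the one nearest the root down to the container's parent.
/// Assumes `container` is located somewhere below `root`.
fn cpuset_parents(root: &Path, container: &Path) -> Vec<PathBuf> {
    let mut parents: Vec<PathBuf> = container
        .ancestors()
        .skip(1) // ancestors() yields `container` itself first
        .take_while(|p| *p != root)
        .map(Path::to_path_buf)
        .collect();
    // ancestors() runs leaf-to-root; reverse so parents come before children.
    parents.reverse();
    parents
}
```

Writing the cpuset top-down matters on cgroup v1 because a child's `cpuset.cpus` has to stay within its parent's set.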
@@ -1051,10 +1090,23 @@ impl Manager {
}
}
// get the guest's online cpus.
pub fn get_guest_cpuset() -> Result<String> {
let c = fs::read_to_string(GUEST_CPUS_PATH)?;
Ok(c.trim().to_string())
// for cgroup v2
if cgroups::hierarchies::is_cgroup2_unified_mode() {
let c = fs::read_to_string("/sys/fs/cgroup/cpuset.cpus.effective")?;
return Ok(c);
}
// for cgroup v1
let m = get_mounts()?;
if m.get("cpuset").is_none() {
warn!(sl!(), "no cpuset cgroup!");
return Err(nix::Error::Sys(Errno::ENOENT).into());
}
let p = format!("{}/cpuset.cpus", m.get("cpuset").unwrap());
let c = fs::read_to_string(p.as_str())?;
Ok(c)
}
// Since the OCI spec is designed for cgroup v1, in some cases
@@ -1097,6 +1149,20 @@ fn convert_memory_swap_to_v2_value(memory_swap: i64, memory: i64) -> Result<i64>
Ok(memory_swap - memory)
}
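Note on the swap conversion: the OCI `memory.swap` value is the combined memory+swap limit, while cgroup v2 takes a swap-only limit, hence the subtraction. A sketch under stated assumptions: only the final subtraction is visible in this hunk; the guard conditions and the name `memory_swap_v1_to_v2` are illustrative guesses at reasonable input checks, not the agent's actual code.

```rust
/// Convert an OCI `memory.swap` value (memory + swap, cgroup v1 semantics)
/// into a swap-only value as used by cgroup v2.
/// The guards below are assumptions made for this sketch.
fn memory_swap_v1_to_v2(memory_swap: i64, memory: i64) -> Result<i64, String> {
    // -1 means "unlimited" and 0 means "unset"; pass both through unchanged.
    if memory_swap == -1 || memory_swap == 0 {
        return Ok(memory_swap);
    }
    // A finite combined limit needs a finite memory limit to subtract.
    if memory <= 0 {
        return Err(format!("memory+swap is limited but memory is {}", memory));
    }
    if memory_swap < memory {
        return Err("memory+swap limit is smaller than the memory limit".to_string());
    }
    Ok(memory_swap - memory)
}
```

For example, a combined limit of 300 with a memory limit of 100 leaves 200 for swap.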
// Since the OCI spec is designed for cgroup v1, in some cases
// there is need to convert from the cgroup v1 configuration to cgroup v2
// the formula for BlkIOWeight is y = (1 + (x - 10) * 9999 / 990)
// convert linearly from [10-1000] to [1-10000]
// https://github.com/opencontainers/runc/blob/a5847db387ae28c0ca4ebe4beee1a76900c86414/libcontainer/cgroups/utils.go#L382
fn convert_blk_io_to_v2_value(blk_io_weight: Option<u16>) -> Option<u16> {
let v = blk_io_weight.unwrap_or(0);
if v != 0 {
return None;
}
Some(1 + (v - 10) * 9999 / 990 as u16)
}
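Note on the weight conversion: as written in this hunk, the guard `v != 0` appears inverted (it returns `None` for every non-zero weight), and `(v - 10) * 9999` cannot be held in a `u16`. A minimal sketch of the formula from the comment, y = 1 + (x - 10) * 9999 / 990, assuming that an unset or out-of-range weight should map to `None`; `blkio_weight_v1_to_v2` is a hypothetical name:

```rust
/// Map a cgroup v1 blkio weight (10..=1000) onto the cgroup v2 range
/// (1..=10000) with the linear formula y = 1 + (x - 10) * 9999 / 990.
/// Returning None for unset or out-of-range input is an assumption.
fn blkio_weight_v1_to_v2(weight: Option<u16>) -> Option<u16> {
    let v = u32::from(weight?);
    if !(10..=1000).contains(&v) {
        return None;
    }
    // Widen to u32: (1000 - 10) * 9999 does not fit in a u16.
    let converted = 1 + (v - 10) * 9999 / 990;
    // The result is at most 10000, so narrowing back to u16 is lossless.
    Some(converted as u16)
}
```

The endpoints map as expected: 10 becomes 1 and 1000 becomes 10000.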
#[cfg(test)]
mod tests {
use super::*;

@@ -1,74 +0,0 @@
// Copyright (c) 2020 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use protobuf::{CachedSize, SingularPtrField, UnknownFields};
use crate::cgroups::Manager as CgroupManager;
use crate::protocols::agent::{BlkioStats, CgroupStats, CpuStats, MemoryStats, PidsStats};
use anyhow::Result;
use cgroups::freezer::FreezerState;
use libc::{self, pid_t};
use oci::LinuxResources;
use std::collections::HashMap;
use std::string::String;
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Manager {
pub paths: HashMap<String, String>,
pub mounts: HashMap<String, String>,
pub cpath: String,
}
impl CgroupManager for Manager {
fn apply(&self, _: pid_t) -> Result<()> {
Ok(())
}
fn set(&self, _: &LinuxResources, _: bool) -> Result<()> {
Ok(())
}
fn get_stats(&self) -> Result<CgroupStats> {
Ok(CgroupStats {
cpu_stats: SingularPtrField::some(CpuStats::default()),
memory_stats: SingularPtrField::some(MemoryStats::new()),
pids_stats: SingularPtrField::some(PidsStats::new()),
blkio_stats: SingularPtrField::some(BlkioStats::new()),
hugetlb_stats: HashMap::new(),
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
})
}
fn freeze(&self, _: FreezerState) -> Result<()> {
Ok(())
}
fn destroy(&mut self) -> Result<()> {
Ok(())
}
fn get_pids(&self) -> Result<Vec<pid_t>> {
Ok(Vec::new())
}
}
impl Manager {
pub fn new(cpath: &str) -> Result<Self> {
Ok(Self {
paths: HashMap::new(),
mounts: HashMap::new(),
cpath: cpath.to_string(),
})
}
pub fn update_cpuset_path(&self, _: &str, _: &str) -> Result<()> {
Ok(())
}
pub fn get_cg_path(&self, _: &str) -> Option<String> {
Some("".to_string())
}
}

@@ -10,7 +10,6 @@ use protocols::agent::CgroupStats;
use cgroups::freezer::FreezerState;
pub mod fs;
pub mod mock;
pub mod notifier;
pub mod systemd;

@@ -3,18 +3,16 @@
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Context, Result};
use anyhow::{anyhow, Result};
use eventfd::{eventfd, EfdFlags};
use nix::sys::eventfd;
use nix::sys::inotify::{AddWatchFlags, InitFlags, Inotify};
use std::fs::{self, File};
use std::io::Read;
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::path::Path;
use crate::pipestream::PipeStream;
use futures::StreamExt as _;
use inotify::{Inotify, WatchMask};
use tokio::io::AsyncReadExt;
use tokio::sync::mpsc::{channel, Receiver};
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Receiver};
use std::thread;
// Convenience macro to obtain the scope logger
macro_rules! sl {
@@ -23,11 +21,11 @@ macro_rules! sl {
};
}
pub async fn notify_oom(cid: &str, cg_dir: String) -> Result<Receiver<String>> {
pub fn notify_oom(cid: &str, cg_dir: String) -> Result<Receiver<String>> {
if cgroups::hierarchies::is_cgroup2_unified_mode() {
return notify_on_oom_v2(cid, cg_dir).await;
return notify_on_oom_v2(cid, cg_dir);
}
notify_on_oom(cid, cg_dir).await
notify_on_oom(cid, cg_dir)
}
// get_value_from_cgroup parses a cgroup file in the `Flat keyed` format.
@@ -35,7 +33,7 @@ pub async fn notify_oom(cid: &str, cg_dir: String) -> Result<Receiver<String>> {
// Flat keyed file format:
// KEY0 VAL0\n
// KEY1 VAL1\n
fn get_value_from_cgroup(path: &Path, key: &str) -> Result<i64> {
fn get_value_from_cgroup(path: &PathBuf, key: &str) -> Result<i64> {
let content = fs::read_to_string(path)?;
info!(
sl!(),
@@ -43,7 +41,7 @@ fn get_value_from_cgroup(path: &Path, key: &str) -> Result<i64> {
);
for line in content.lines() {
let arr: Vec<&str> = line.split(' ').collect();
let arr: Vec<&str> = line.split(" ").collect();
if arr.len() == 2 && arr[0] == key {
let r = arr[1].parse::<i64>()?;
return Ok(r);
@@ -54,11 +52,11 @@ fn get_value_from_cgroup(path: &Path, key: &str) -> Result<i64> {
// notify_on_oom returns a channel that delivers OOM events;
// if the process exits without an OOM, the channel is closed.
pub async fn notify_on_oom_v2(containere_id: &str, cg_dir: String) -> Result<Receiver<String>> {
register_memory_event_v2(containere_id, cg_dir, "memory.events", "cgroup.events").await
pub fn notify_on_oom_v2(containere_id: &str, cg_dir: String) -> Result<Receiver<String>> {
register_memory_event_v2(containere_id, cg_dir, "memory.events", "cgroup.events")
}
async fn register_memory_event_v2(
fn register_memory_event_v2(
containere_id: &str,
cg_dir: String,
memory_event_name: &str,
@@ -75,49 +73,49 @@ async fn register_memory_event_v2(
"register_memory_event_v2 cgroup_event_control_path: {:?}", &cgroup_event_control_path
);
let mut inotify = Inotify::init().context("Failed to initialize inotify")?;
let fd = Inotify::init(InitFlags::empty()).unwrap();
// watching oom kill
let ev_wd = inotify.add_watch(&event_control_path, WatchMask::MODIFY)?;
let ev_fd = fd
.add_watch(&event_control_path, AddWatchFlags::IN_MODIFY)
.unwrap();
// The cgroup file system emits no `unix.IN_DELETE|unix.IN_DELETE_SELF` event, so watch for all processes having exited instead
let cg_wd = inotify.add_watch(&cgroup_event_control_path, WatchMask::MODIFY)?;
let cg_fd = fd
.add_watch(&cgroup_event_control_path, AddWatchFlags::IN_MODIFY)
.unwrap();
info!(sl!(), "ev_fd: {:?}", ev_fd);
info!(sl!(), "cg_fd: {:?}", cg_fd);
info!(sl!(), "ev_wd: {:?}", ev_wd);
info!(sl!(), "cg_wd: {:?}", cg_wd);
let (sender, receiver) = channel(100);
let (sender, receiver) = mpsc::channel();
let containere_id = containere_id.to_string();
tokio::spawn(async move {
let mut buffer = [0; 32];
let mut stream = inotify
.event_stream(&mut buffer)
.expect("create inotify event stream failed");
while let Some(event_or_error) = stream.next().await {
let event = event_or_error.unwrap();
thread::spawn(move || {
loop {
let events = fd.read_events().unwrap();
info!(
sl!(),
"container[{}] get event for container: {:?}", &containere_id, &event
"container[{}] get events for container: {:?}", &containere_id, &events
);
// info!("is1: {}", event.wd == wd1);
info!(sl!(), "event.wd: {:?}", event.wd);
if event.wd == ev_wd {
let oom = get_value_from_cgroup(&event_control_path, "oom_kill");
if oom.unwrap_or(0) > 0 {
let _ = sender.send(containere_id.clone()).await.map_err(|e| {
error!(sl!(), "send containere_id failed, error: {:?}", e);
});
return;
for event in events {
if event.mask & AddWatchFlags::IN_MODIFY != AddWatchFlags::IN_MODIFY {
continue;
}
} else if event.wd == cg_wd {
let pids = get_value_from_cgroup(&cgroup_event_control_path, "populated");
if pids.unwrap_or(-1) == 0 {
return;
info!(sl!(), "event.wd: {:?}", event.wd);
if event.wd == ev_fd {
let oom = get_value_from_cgroup(&event_control_path, "oom_kill");
if oom.unwrap_or(0) > 0 {
sender.send(containere_id.clone()).unwrap();
return;
}
} else if event.wd == cg_fd {
let pids = get_value_from_cgroup(&cgroup_event_control_path, "populated");
if pids.unwrap_or(-1) == 0 {
return;
}
}
}
// When a cgroup is destroyed, an event is sent to eventfd.
// So if the control path is gone, return instead of notifying.
if !Path::new(&event_control_path).exists() {
@@ -131,17 +129,17 @@ async fn register_memory_event_v2(
// notify_on_oom returns a channel that delivers OOM events;
// if the process exits without an OOM, the channel is closed.
async fn notify_on_oom(cid: &str, dir: String) -> Result<Receiver<String>> {
if dir.is_empty() {
fn notify_on_oom(cid: &str, dir: String) -> Result<Receiver<String>> {
if dir == "" {
return Err(anyhow!("memory controller missing"));
}
register_memory_event(cid, dir, "memory.oom_control", "").await
register_memory_event(cid, dir, "memory.oom_control", "")
}
// level is one of "low", "medium", or "critical"
async fn notify_memory_pressure(cid: &str, dir: String, level: &str) -> Result<Receiver<String>> {
if dir.is_empty() {
fn notify_memory_pressure(cid: &str, dir: String, level: &str) -> Result<Receiver<String>> {
if dir == "" {
return Err(anyhow!("memory controller missing"));
}
@@ -149,10 +147,10 @@ async fn notify_memory_pressure(cid: &str, dir: String, level: &str) -> Result<R
return Err(anyhow!("invalid pressure level {}", level));
}
register_memory_event(cid, dir, "memory.pressure_level", level).await
register_memory_event(cid, dir, "memory.pressure_level", level)
}
async fn register_memory_event(
fn register_memory_event(
cid: &str,
cg_dir: String,
event_name: &str,
@@ -165,7 +163,7 @@ async fn register_memory_event(
let event_control_path = Path::new(&cg_dir).join("cgroup.event_control");
let data;
if arg.is_empty() {
if arg == "" {
data = format!("{} {}", eventfd, event_file.as_raw_fd());
} else {
data = format!("{} {} {}", eventfd, event_file.as_raw_fd(), arg);
@@ -173,16 +171,15 @@ async fn register_memory_event(
fs::write(&event_control_path, data)?;
let mut eventfd_stream = unsafe { PipeStream::from_raw_fd(eventfd) };
let mut eventfd_file = unsafe { File::from_raw_fd(eventfd) };
let (sender, receiver) = tokio::sync::mpsc::channel(100);
let (sender, receiver) = mpsc::channel();
let containere_id = cid.to_string();
tokio::spawn(async move {
thread::spawn(move || {
loop {
let sender = sender.clone();
let mut buf = [0u8; 8];
match eventfd_stream.read(&mut buf).await {
let mut buf = [0; 8];
match eventfd_file.read(&mut buf) {
Err(err) => {
warn!(sl!(), "failed to read from eventfd: {:?}", err);
return;
@@ -191,10 +188,7 @@ async fn register_memory_event(
let content = fs::read_to_string(path.clone());
info!(
sl!(),
"cgroup event for container: {}, path: {:?}, content: {:?}",
&containere_id,
&path,
content
"OOM event for container: {}, content: {:?}", &containere_id, content
);
}
}
@@ -204,10 +198,7 @@ async fn register_memory_event(
if !Path::new(&event_control_path).exists() {
return;
}
let _ = sender.send(containere_id.clone()).await.map_err(|e| {
error!(sl!(), "send containere_id failed, error: {:?}", e);
});
sender.send(containere_id.clone()).unwrap();
}
});
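Note on the flat-keyed format used by `memory.events` and `cgroup.events`: each line is a `KEY VALUE` pair, and the watcher only needs one key at a time (`oom_kill` or `populated`). A minimal sketch of that lookup working on an in-memory string rather than a file; `flat_keyed_value` is a hypothetical helper, not the function in this diff:

```rust
/// Look up `key` in the contents of a flat-keyed cgroup file, where every
/// line has the form "KEY VALUE". Lines that are malformed or whose value
/// is not an integer are skipped.
fn flat_keyed_value(content: &str, key: &str) -> Option<i64> {
    content.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next(), parts.next()) {
            // Exactly two fields and a matching key.
            (Some(k), Some(v), None) if k == key => v.parse().ok(),
            _ => None,
        }
    })
}
```

With contents such as `"oom 1\noom_kill 1\n"`, looking up `oom_kill` yields `Some(1)`, and a missing key yields `None`.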

@@ -54,8 +54,6 @@ pub struct Seccomp {
#[serde(default)]
architectures: Vec<String>,
#[serde(default)]
flags: Vec<String>,
#[serde(default)]
syscalls: Vec<Syscall>,
}
@@ -76,11 +74,9 @@ pub struct Arg {
#[derive(Serialize, Deserialize, Debug)]
pub struct Syscall {
#[serde(default, skip_serializing_if = "String::is_empty")]
names: String,
name: String,
#[serde(default)]
action: Action,
#[serde(default, rename = "errnoRet")]
errno_ret: u32,
#[serde(default, skip_serializing_if = "Vec::is_empty")]
args: Vec<Arg>,
}

File diff suppressed because it is too large

@@ -40,13 +40,11 @@ pub mod capabilities;
pub mod cgroups;
pub mod container;
pub mod mount;
pub mod pipestream;
pub mod process;
pub mod specconv;
pub mod sync;
pub mod sync_with_async;
pub mod utils;
pub mod validator;
// pub mod factory;
//pub mod configs;
// pub mod devices;
@@ -58,16 +56,24 @@ pub mod validator;
// pub mod user;
//pub mod intelrdt;
// construtc ociSpec from grpcSpec, which is needed for hook
// execution. since hooks read config.json
use oci::{
Box as ociBox, Hooks as ociHooks, Linux as ociLinux, LinuxCapabilities as ociLinuxCapabilities,
Mount as ociMount, POSIXRlimit as ociPOSIXRlimit, Process as ociProcess, Root as ociRoot,
Spec as ociSpec, User as ociUser,
};
use protocols::oci::{
Hooks as grpcHooks, Linux as grpcLinux, Mount as grpcMount, Process as grpcProcess,
Root as grpcRoot, Spec as grpcSpec,
};
use std::collections::HashMap;
use protocols::oci as grpc;
// construct ociSpec from grpc::Spec, which is needed for hook
// execution. since hooks read config.json
pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
pub fn process_grpc_to_oci(p: &grpcProcess) -> ociProcess {
let console_size = if p.ConsoleSize.is_some() {
let c = p.ConsoleSize.as_ref().unwrap();
Some(oci::Box {
Some(ociBox {
height: c.Height,
width: c.Width,
})
@@ -77,14 +83,14 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
let user = if p.User.is_some() {
let u = p.User.as_ref().unwrap();
oci::User {
ociUser {
uid: u.UID,
gid: u.GID,
additional_gids: u.AdditionalGids.clone(),
username: u.Username.clone(),
}
} else {
oci::User {
ociUser {
uid: 0,
gid: 0,
additional_gids: vec![],
@@ -95,7 +101,7 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
let capabilities = if p.Capabilities.is_some() {
let cap = p.Capabilities.as_ref().unwrap();
Some(oci::LinuxCapabilities {
Some(ociLinuxCapabilities {
bounding: cap.Bounding.clone().into_vec(),
effective: cap.Effective.clone().into_vec(),
inheritable: cap.Inheritable.clone().into_vec(),
@@ -109,7 +115,7 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
let rlimits = {
let mut r = Vec::new();
for lm in p.Rlimits.iter() {
r.push(oci::PosixRlimit {
r.push(ociPOSIXRlimit {
r#type: lm.Type.clone(),
hard: lm.Hard,
soft: lm.Soft,
@@ -118,7 +124,7 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
r
};
oci::Process {
ociProcess {
terminal: p.Terminal,
console_size,
user,
@@ -134,15 +140,15 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
}
}
fn root_grpc_to_oci(root: &grpc::Root) -> oci::Root {
oci::Root {
fn root_grpc_to_oci(root: &grpcRoot) -> ociRoot {
ociRoot {
path: root.Path.clone(),
readonly: root.Readonly,
}
}
fn mount_grpc_to_oci(m: &grpc::Mount) -> oci::Mount {
oci::Mount {
fn mount_grpc_to_oci(m: &grpcMount) -> ociMount {
ociMount {
destination: m.destination.clone(),
r#type: m.field_type.clone(),
source: m.source.clone(),
@@ -150,12 +156,13 @@ fn mount_grpc_to_oci(m: &grpc::Mount) -> oci::Mount {
}
}
use oci::Hook as ociHook;
use protocols::oci::Hook as grpcHook;
fn hook_grpc_to_oci(h: &[grpcHook]) -> Vec<oci::Hook> {
fn hook_grpc_to_oci(h: &[grpcHook]) -> Vec<ociHook> {
let mut r = Vec::new();
for e in h.iter() {
r.push(oci::Hook {
r.push(ociHook {
path: e.Path.clone(),
args: e.Args.clone().into_vec(),
env: e.Env.clone().into_vec(),
@@ -165,29 +172,39 @@ fn hook_grpc_to_oci(h: &[grpcHook]) -> Vec<oci::Hook> {
r
}
fn hooks_grpc_to_oci(h: &grpc::Hooks) -> oci::Hooks {
fn hooks_grpc_to_oci(h: &grpcHooks) -> ociHooks {
let prestart = hook_grpc_to_oci(h.Prestart.as_ref());
let poststart = hook_grpc_to_oci(h.Poststart.as_ref());
let poststop = hook_grpc_to_oci(h.Poststop.as_ref());
oci::Hooks {
ociHooks {
prestart,
poststart,
poststop,
}
}
fn idmap_grpc_to_oci(im: &grpc::LinuxIDMapping) -> oci::LinuxIdMapping {
oci::LinuxIdMapping {
use oci::{
LinuxDevice as ociLinuxDevice, LinuxIDMapping as ociLinuxIDMapping,
LinuxIntelRdt as ociLinuxIntelRdt, LinuxNamespace as ociLinuxNamespace,
LinuxResources as ociLinuxResources, LinuxSeccomp as ociLinuxSeccomp,
};
use protocols::oci::{
LinuxIDMapping as grpcLinuxIDMapping, LinuxResources as grpcLinuxResources,
LinuxSeccomp as grpcLinuxSeccomp,
};
fn idmap_grpc_to_oci(im: &grpcLinuxIDMapping) -> ociLinuxIDMapping {
ociLinuxIDMapping {
container_id: im.ContainerID,
host_id: im.HostID,
size: im.Size,
}
}
fn idmaps_grpc_to_oci(ims: &[grpc::LinuxIDMapping]) -> Vec<oci::LinuxIdMapping> {
fn idmaps_grpc_to_oci(ims: &[grpcLinuxIDMapping]) -> Vec<ociLinuxIDMapping> {
let mut r = Vec::new();
for im in ims.iter() {
r.push(idmap_grpc_to_oci(im));
@@ -195,13 +212,24 @@ fn idmaps_grpc_to_oci(ims: &[grpc::LinuxIDMapping]) -> Vec<oci::LinuxIdMapping>
r
}
fn throttle_devices_grpc_to_oci(
tds: &[grpc::LinuxThrottleDevice],
) -> Vec<oci::LinuxThrottleDevice> {
use oci::{
LinuxBlockIO as ociLinuxBlockIO, LinuxBlockIODevice as ociLinuxBlockIODevice,
LinuxCPU as ociLinuxCPU, LinuxDeviceCgroup as ociLinuxDeviceCgroup,
LinuxHugepageLimit as ociLinuxHugepageLimit,
LinuxInterfacePriority as ociLinuxInterfacePriority, LinuxMemory as ociLinuxMemory,
LinuxNetwork as ociLinuxNetwork, LinuxPids as ociLinuxPids,
LinuxThrottleDevice as ociLinuxThrottleDevice, LinuxWeightDevice as ociLinuxWeightDevice,
};
use protocols::oci::{
LinuxBlockIO as grpcLinuxBlockIO, LinuxThrottleDevice as grpcLinuxThrottleDevice,
LinuxWeightDevice as grpcLinuxWeightDevice,
};
fn throttle_devices_grpc_to_oci(tds: &[grpcLinuxThrottleDevice]) -> Vec<ociLinuxThrottleDevice> {
let mut r = Vec::new();
for td in tds.iter() {
r.push(oci::LinuxThrottleDevice {
blk: oci::LinuxBlockIoDevice {
r.push(ociLinuxThrottleDevice {
blk: ociLinuxBlockIODevice {
major: td.Major,
minor: td.Minor,
},
@@ -211,11 +239,11 @@ fn throttle_devices_grpc_to_oci(
r
}
fn weight_devices_grpc_to_oci(wds: &[grpc::LinuxWeightDevice]) -> Vec<oci::LinuxWeightDevice> {
fn weight_devices_grpc_to_oci(wds: &[grpcLinuxWeightDevice]) -> Vec<ociLinuxWeightDevice> {
let mut r = Vec::new();
for wd in wds.iter() {
r.push(oci::LinuxWeightDevice {
blk: oci::LinuxBlockIoDevice {
r.push(ociLinuxWeightDevice {
blk: ociLinuxBlockIODevice {
major: wd.Major,
minor: wd.Minor,
},
@@ -226,7 +254,7 @@ fn weight_devices_grpc_to_oci(wds: &[grpc::LinuxWeightDevice]) -> Vec<oci::Linux
r
}
fn blockio_grpc_to_oci(blk: &grpc::LinuxBlockIO) -> oci::LinuxBlockIo {
fn blockio_grpc_to_oci(blk: &grpcLinuxBlockIO) -> ociLinuxBlockIO {
let weight_device = weight_devices_grpc_to_oci(blk.WeightDevice.as_ref());
let throttle_read_bps_device = throttle_devices_grpc_to_oci(blk.ThrottleReadBpsDevice.as_ref());
let throttle_write_bps_device =
@@ -236,7 +264,7 @@ fn blockio_grpc_to_oci(blk: &grpc::LinuxBlockIO) -> oci::LinuxBlockIo {
let throttle_write_iops_device =
throttle_devices_grpc_to_oci(blk.ThrottleWriteIOPSDevice.as_ref());
oci::LinuxBlockIo {
ociLinuxBlockIO {
weight: Some(blk.Weight as u16),
leaf_weight: Some(blk.LeafWeight as u16),
weight_device,
@@ -247,7 +275,7 @@ fn blockio_grpc_to_oci(blk: &grpc::LinuxBlockIO) -> oci::LinuxBlockIo {
}
}
pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources {
pub fn resources_grpc_to_oci(res: &grpcLinuxResources) -> ociLinuxResources {
let devices = {
let mut d = Vec::new();
for dev in res.Devices.iter() {
@@ -262,7 +290,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
} else {
Some(dev.Minor)
};
d.push(oci::LinuxDeviceCgroup {
d.push(ociLinuxDeviceCgroup {
allow: dev.Allow,
r#type: dev.Type.clone(),
major,
@@ -275,7 +303,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
let memory = if res.Memory.is_some() {
let mem = res.Memory.as_ref().unwrap();
Some(oci::LinuxMemory {
Some(ociLinuxMemory {
limit: Some(mem.Limit),
reservation: Some(mem.Reservation),
swap: Some(mem.Swap),
@@ -290,7 +318,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
let cpu = if res.CPU.is_some() {
let c = res.CPU.as_ref().unwrap();
Some(oci::LinuxCpu {
Some(ociLinuxCPU {
shares: Some(c.Shares),
quota: Some(c.Quota),
period: Some(c.Period),
@@ -305,7 +333,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
let pids = if res.Pids.is_some() {
let p = res.Pids.as_ref().unwrap();
Some(oci::LinuxPids { limit: p.Limit })
Some(ociLinuxPids { limit: p.Limit })
} else {
None
};
@@ -321,7 +349,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
let hugepage_limits = {
let mut r = Vec::new();
for hl in res.HugepageLimits.iter() {
r.push(oci::LinuxHugepageLimit {
r.push(ociLinuxHugepageLimit {
page_size: hl.Pagesize.clone(),
limit: hl.Limit,
});
@@ -334,14 +362,14 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
let priorities = {
let mut r = Vec::new();
for pr in net.Priorities.iter() {
r.push(oci::LinuxInterfacePriority {
r.push(ociLinuxInterfacePriority {
name: pr.Name.clone(),
priority: pr.Priority,
});
}
r
};
Some(oci::LinuxNetwork {
Some(ociLinuxNetwork {
class_id: Some(net.ClassID),
priorities,
})
@@ -349,7 +377,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
None
};
oci::LinuxResources {
ociLinuxResources {
devices,
memory,
cpu,
@@ -361,22 +389,17 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
}
}
fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
use oci::{LinuxSeccompArg as ociLinuxSeccompArg, LinuxSyscall as ociLinuxSyscall};
fn seccomp_grpc_to_oci(sec: &grpcLinuxSeccomp) -> ociLinuxSeccomp {
let syscalls = {
let mut r = Vec::new();
for sys in sec.Syscalls.iter() {
let mut args = Vec::new();
let errno_ret: u32;
if sys.has_errnoret() {
errno_ret = sys.get_errnoret();
} else {
errno_ret = libc::EPERM as u32;
}
for arg in sys.Args.iter() {
args.push(oci::LinuxSeccompArg {
args.push(ociLinuxSeccompArg {
index: arg.Index as u32,
value: arg.Value,
value_two: arg.ValueTwo,
@@ -384,25 +407,23 @@ fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
});
}
r.push(oci::LinuxSyscall {
r.push(ociLinuxSyscall {
names: sys.Names.clone().into_vec(),
action: sys.Action.clone(),
errno_ret,
args,
});
}
r
};
oci::LinuxSeccomp {
ociLinuxSeccomp {
default_action: sec.DefaultAction.clone(),
architectures: sec.Architectures.clone().into_vec(),
flags: sec.Flags.clone().into_vec(),
syscalls,
}
}
fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
fn linux_grpc_to_oci(l: &grpcLinux) -> ociLinux {
let uid_mappings = idmaps_grpc_to_oci(l.UIDMappings.as_ref());
let gid_mappings = idmaps_grpc_to_oci(l.GIDMappings.as_ref());
@@ -422,7 +443,7 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
let mut r = Vec::new();
for ns in l.Namespaces.iter() {
r.push(oci::LinuxNamespace {
r.push(ociLinuxNamespace {
r#type: ns.Type.clone(),
path: ns.Path.clone(),
});
@@ -434,7 +455,7 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
let mut r = Vec::new();
for d in l.Devices.iter() {
r.push(oci::LinuxDevice {
r.push(ociLinuxDevice {
path: d.Path.clone(),
r#type: d.Type.clone(),
major: d.Major,
@@ -450,14 +471,14 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
let intel_rdt = if l.IntelRdt.is_some() {
let rdt = l.IntelRdt.as_ref().unwrap();
Some(oci::LinuxIntelRdt {
Some(ociLinuxIntelRdt {
l3_cache_schema: rdt.L3CacheSchema.clone(),
})
} else {
None
};
oci::Linux {
ociLinux {
uid_mappings,
gid_mappings,
sysctl: l.Sysctl.clone(),
@@ -474,11 +495,11 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
}
}
fn linux_oci_to_grpc(_l: &oci::Linux) -> grpc::Linux {
grpc::Linux::default()
fn linux_oci_to_grpc(_l: &ociLinux) -> grpcLinux {
grpcLinux::default()
}
pub fn grpc_to_oci(grpc: &grpc::Spec) -> oci::Spec {
pub fn grpc_to_oci(grpc: &grpcSpec) -> ociSpec {
// process
let process = if grpc.Process.is_some() {
Some(process_grpc_to_oci(grpc.Process.as_ref().unwrap()))
@@ -516,7 +537,7 @@ pub fn grpc_to_oci(grpc: &grpc::Spec) -> oci::Spec {
None
};
oci::Spec {
ociSpec {
version: grpc.Version.clone(),
process,
root,


@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, bail, Context, Result};
use anyhow::{anyhow, bail, Context, Error, Result};
use libc::uid_t;
use nix::errno::Errno;
use nix::fcntl::{self, OFlag};
@@ -22,11 +22,13 @@ use std::os::unix::io::RawFd;
use std::path::{Path, PathBuf};
use path_absolutize::*;
use scan_fmt;
use std::fs::File;
use std::io::{BufRead, BufReader};
use crate::container::DEFAULT_DEVICES;
use crate::sync::write_count;
use lazy_static;
use std::string::ToString;
use crate::log_child;
@@ -48,16 +50,14 @@ pub struct Info {
vfs_opts: String,
}
const MOUNTINFOFORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const MOUNTINFOFORMAT: &'static str = "{d} {d} {d}:{d} {} {} {} {}";
const PROC_PATH: &str = "/proc";
// libc does not define this constant for musl, so it is redefined here.
#[cfg(all(target_os = "linux", target_env = "gnu", not(target_arch = "s390x")))]
#[cfg(all(target_os = "linux", target_env = "gnu"))]
const PROC_SUPER_MAGIC: libc::c_long = 0x00009fa0;
#[cfg(all(target_os = "linux", target_env = "musl"))]
const PROC_SUPER_MAGIC: libc::c_ulong = 0x00009fa0;
#[cfg(all(target_os = "linux", target_env = "gnu", target_arch = "s390x"))]
const PROC_SUPER_MAGIC: libc::c_uint = 0x00009fa0;
lazy_static! {
static ref PROPAGATION: HashMap<&'static str, MsFlags> = {
@@ -68,8 +68,6 @@ lazy_static! {
m.insert("rprivate", MsFlags::MS_PRIVATE | MsFlags::MS_REC);
m.insert("slave", MsFlags::MS_SLAVE);
m.insert("rslave", MsFlags::MS_SLAVE | MsFlags::MS_REC);
m.insert("unbindable", MsFlags::MS_UNBINDABLE);
m.insert("runbindable", MsFlags::MS_UNBINDABLE | MsFlags::MS_REC);
m
};
static ref OPTIONS: HashMap<&'static str, (bool, MsFlags)> = {
@@ -95,6 +93,17 @@ lazy_static! {
m.insert("nodiratime", (false, MsFlags::MS_NODIRATIME));
m.insert("bind", (false, MsFlags::MS_BIND));
m.insert("rbind", (false, MsFlags::MS_BIND | MsFlags::MS_REC));
m.insert("unbindable", (false, MsFlags::MS_UNBINDABLE));
m.insert(
"runbindable",
(false, MsFlags::MS_UNBINDABLE | MsFlags::MS_REC),
);
m.insert("private", (false, MsFlags::MS_PRIVATE));
m.insert("rprivate", (false, MsFlags::MS_PRIVATE | MsFlags::MS_REC));
m.insert("shared", (false, MsFlags::MS_SHARED));
m.insert("rshared", (false, MsFlags::MS_SHARED | MsFlags::MS_REC));
m.insert("slave", (false, MsFlags::MS_SLAVE));
m.insert("rslave", (false, MsFlags::MS_SLAVE | MsFlags::MS_REC));
m.insert("relatime", (false, MsFlags::MS_RELATIME));
m.insert("norelatime", (true, MsFlags::MS_RELATIME));
m.insert("strictatime", (false, MsFlags::MS_STRICTATIME));
@@ -105,12 +114,7 @@ lazy_static! {
#[inline(always)]
#[allow(unused_variables)]
pub fn mount<
P1: ?Sized + NixPath,
P2: ?Sized + NixPath,
P3: ?Sized + NixPath,
P4: ?Sized + NixPath,
>(
fn mount<P1: ?Sized + NixPath, P2: ?Sized + NixPath, P3: ?Sized + NixPath, P4: ?Sized + NixPath>(
source: Option<&P1>,
target: &P2,
fstype: Option<&P3>,
@@ -125,7 +129,7 @@ pub fn mount<
#[inline(always)]
#[allow(unused_variables)]
pub fn umount2<P: ?Sized + NixPath>(
fn umount2<P: ?Sized + NixPath>(
target: &P,
flags: MntFlags,
) -> std::result::Result<(), nix::Error> {
@@ -149,7 +153,7 @@ pub fn init_rootfs(
let linux = &spec
.linux
.as_ref()
.ok_or_else(|| anyhow!("Could not get linux configuration from spec"))?;
.ok_or::<Error>(anyhow!("Could not get linux configuration from spec"))?;
let mut flags = MsFlags::MS_REC;
match PROPAGATION.get(&linux.rootfs_propagation.as_str()) {
@@ -160,14 +164,14 @@ pub fn init_rootfs(
let root = spec
.root
.as_ref()
.ok_or_else(|| anyhow!("Could not get rootfs path from spec"))
.ok_or(anyhow!("Could not get rootfs path from spec"))
.and_then(|r| {
fs::canonicalize(r.path.as_str()).context("Could not canonicalize rootfs path")
})?;
let rootfs = (*root)
.to_str()
.ok_or_else(|| anyhow!("Could not convert rootfs path to string"))?;
.ok_or(anyhow!("Could not convert rootfs path to string"))?;
mount(None::<&str>, "/", None::<&str>, flags, None::<&str>)?;
@@ -183,8 +187,8 @@ pub fn init_rootfs(
let mut bind_mount_dev = false;
for m in &spec.mounts {
let (mut flags, pgflags, data) = parse_mount(&m);
if !m.destination.starts_with('/') || m.destination.contains("..") {
let (mut flags, data) = parse_mount(&m);
if !m.destination.starts_with("/") || m.destination.contains("..") {
return Err(anyhow!(
"the mount destination {} is invalid",
m.destination
@@ -225,15 +229,13 @@ pub fn init_rootfs(
// effective.
// first check that we have non-default options required before attempting a
// remount
if m.r#type == "bind" && !pgflags.is_empty() {
let dest = secure_join(rootfs, &m.destination);
mount(
None::<&str>,
dest.as_str(),
None::<&str>,
pgflags,
None::<&str>,
)?;
if m.r#type == "bind" {
for o in &m.options {
if let Some(fl) = PROPAGATION.get(o.as_str()) {
let dest = format!("{}{}", &rootfs, &m.destination);
mount(None::<&str>, dest.as_str(), None::<&str>, *fl, None::<&str>)?;
}
}
}
}
}
@@ -280,9 +282,9 @@ fn check_proc_mount(m: &Mount) -> Result<()> {
// only allow a mount on top of proc if its source is "proc"
unsafe {
let mut stats = MaybeUninit::<libc::statfs>::uninit();
if m.source
if let Ok(_) = m
.source
.with_nix_path(|path| libc::statfs(path.as_ptr(), stats.as_mut_ptr()))
.is_ok()
{
if stats.assume_init().f_type == PROC_SUPER_MAGIC {
return Ok(());
@@ -305,7 +307,7 @@ fn check_proc_mount(m: &Mount) -> Result<()> {
)));
}
Ok(())
return Ok(());
}
fn mount_cgroups_v2(cfd_log: RawFd, m: &Mount, rootfs: &str, flags: MsFlags) -> Result<()> {
@@ -593,14 +595,15 @@ pub fn ms_move_root(rootfs: &str) -> Result<bool> {
let abs_root_buf = root_path.absolutize()?;
let abs_root = abs_root_buf
.to_str()
.ok_or_else(|| anyhow!("failed to parse {} to absolute path", rootfs))?;
.ok_or::<Error>(anyhow!("failed to parse {} to absolute path", rootfs))?;
for info in mount_infos.iter() {
let mount_point = Path::new(&info.mount_point);
let abs_mount_buf = mount_point.absolutize()?;
let abs_mount_point = abs_mount_buf
.to_str()
.ok_or_else(|| anyhow!("failed to parse {} to absolute path", info.mount_point))?;
let abs_mount_point = abs_mount_buf.to_str().ok_or::<Error>(anyhow!(
"failed to parse {} to absolute path",
info.mount_point
))?;
let abs_mount_point_string = String::from(abs_mount_point);
// Unmount every sysfs and proc file system, except those under the container rootfs
@@ -650,73 +653,26 @@ pub fn ms_move_root(rootfs: &str) -> Result<bool> {
Ok(true)
}
fn parse_mount(m: &Mount) -> (MsFlags, MsFlags, String) {
fn parse_mount(m: &Mount) -> (MsFlags, String) {
let mut flags = MsFlags::empty();
let mut pgflags = MsFlags::empty();
let mut data = Vec::new();
for o in &m.options {
if let Some(v) = OPTIONS.get(o.as_str()) {
let (clear, fl) = *v;
if clear {
flags &= !fl;
} else {
flags |= fl;
}
} else if let Some(fl) = PROPAGATION.get(o.as_str()) {
pgflags |= *fl;
} else {
data.push(o.clone());
}
}
(flags, pgflags, data.join(","))
}
// This function constructs a canonicalized path by combining the `rootfs` and `unsafe_path` elements.
// The resulting path is guaranteed to be ("below" / "in a directory under") the `rootfs` directory.
//
// Parameters:
//
// - `rootfs` is the absolute path to the container's root filesystem directory.
// - `unsafe_path` is a path inside the container. It is unsafe because it may try to "escape" from the
// container's rootfs by using one or more "../" path elements, or because it is a symlink to another path.
fn secure_join(rootfs: &str, unsafe_path: &str) -> String {
let mut path = PathBuf::from(format!("{}/", rootfs));
let unsafe_p = Path::new(&unsafe_path);
for it in unsafe_p.iter() {
let it_p = Path::new(&it);
// if it_p starts with "/", path.push(it) would replace the whole path with it, so skip "/"
if it_p.has_root() {
continue;
};
path.push(it);
if let Ok(v) = path.read_link() {
if v.is_absolute() {
path = PathBuf::from(format!("{}{}", rootfs, v.to_str().unwrap().to_string()));
} else {
path.pop();
for it in v.iter() {
path.push(it);
if path.exists() {
path = path.canonicalize().unwrap();
if !path.starts_with(rootfs) {
path = PathBuf::from(rootfs.to_string());
}
}
match OPTIONS.get(o.as_str()) {
Some(v) => {
let (clear, fl) = *v;
if clear {
flags &= !fl;
} else {
flags |= fl;
}
}
}
// skip any ".."
if path.ends_with("..") {
path.pop();
None => data.push(o.clone()),
}
}
path.to_str().unwrap().to_string()
(flags, data.join(","))
}
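The option handling in `parse_mount` can be read in isolation with a small sketch. This is a std-only approximation, not the agent's code: `option_table` and `parse_options` are hypothetical names, and plain `u64` constants (values from the Linux `<sys/mount.h>` header) stand in for nix's `MsFlags`. Each known option either sets or clears a flag, and anything unknown is passed through as mount data.

```rust
use std::collections::HashMap;

// Linux mount flag values; the agent uses nix's `MsFlags` instead (assumption
// made here only to keep the sketch free of external crates).
const MS_BIND: u64 = 4096;
const MS_REC: u64 = 16384;
const MS_RELATIME: u64 = 1 << 21;

/// Hypothetical stand-in for the OPTIONS table: name -> (clear, flag).
fn option_table() -> HashMap<&'static str, (bool, u64)> {
    let mut m = HashMap::new();
    m.insert("bind", (false, MS_BIND));
    m.insert("rbind", (false, MS_BIND | MS_REC));
    m.insert("relatime", (false, MS_RELATIME));
    m.insert("norelatime", (true, MS_RELATIME));
    m
}

/// Splits mount options into a flag word and a comma-joined data string.
fn parse_options(options: &[&str]) -> (u64, String) {
    let table = option_table();
    let mut flags = 0u64;
    let mut data = Vec::new();
    for o in options {
        match table.get(*o) {
            // Known option: `clear == true` removes the flag, otherwise it is set.
            Some(&(clear, fl)) => {
                if clear {
                    flags &= !fl;
                } else {
                    flags |= fl;
                }
            }
            // Unknown option: forwarded to the filesystem as data.
            None => data.push(*o),
        }
    }
    (flags, data.join(","))
}
```

With this sketch, `parse_options(&["rbind", "relatime", "norelatime", "size=64k"])` leaves only the bind and recursive bits set, because `norelatime` clears what `relatime` set, and returns `size=64k` as data.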
fn mount_from(
@@ -728,14 +684,14 @@ fn mount_from(
_label: &str,
) -> Result<()> {
let d = String::from(data);
let dest = secure_join(rootfs, &m.destination);
let dest = format!("{}{}", rootfs, &m.destination);
let src = if m.r#type.as_str() == "bind" {
let src = fs::canonicalize(m.source.as_str())?;
let dir = if src.is_dir() {
Path::new(&dest)
} else {
let dir = if src.is_file() {
Path::new(&dest).parent().unwrap()
} else {
Path::new(&dest)
};
let _ = fs::create_dir_all(&dir).map_err(|e| {
@@ -748,7 +704,7 @@ fn mount_from(
});
// make sure file exists so we can bind over it
if !src.is_dir() {
if src.is_file() {
let _ = OpenOptions::new().create(true).write(true).open(&dest);
}
src.to_str().unwrap().to_string()
@@ -808,7 +764,7 @@ fn mount_from(
Ok(())
}
static SYMLINKS: &[(&str, &str)] = &[
static SYMLINKS: &'static [(&'static str, &'static str)] = &[
("/proc/self/fd", "dev/fd"),
("/proc/self/fd/0", "dev/stdin"),
("/proc/self/fd/1", "dev/stdout"),
@@ -916,7 +872,7 @@ pub fn finish_rootfs(cfd_log: RawFd, spec: &Spec) -> Result<()> {
for m in spec.mounts.iter() {
if m.destination == "/dev" {
let (flags, _, _) = parse_mount(m);
let (flags, _) = parse_mount(m);
if flags.contains(MsFlags::MS_RDONLY) {
mount(
Some("/dev"),
@@ -941,7 +897,7 @@ pub fn finish_rootfs(cfd_log: RawFd, spec: &Spec) -> Result<()> {
}
fn mask_path(path: &str) -> Result<()> {
if !path.starts_with('/') || path.contains("..") {
if !path.starts_with("/") || path.contains("..") {
return Err(nix::Error::Sys(Errno::EINVAL).into());
}
@@ -970,7 +926,7 @@ fn mask_path(path: &str) -> Result<()> {
}
fn readonly_path(path: &str) -> Result<()> {
if !path.starts_with('/') || path.contains("..") {
if !path.starts_with("/") || path.contains("..") {
return Err(nix::Error::Sys(Errno::EINVAL).into());
}
@@ -1012,10 +968,6 @@ fn readonly_path(path: &str) -> Result<()> {
mod tests {
use super::*;
use crate::skip_if_not_root;
use std::fs::create_dir;
use std::fs::create_dir_all;
use std::fs::remove_dir_all;
use std::os::unix::fs;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
@@ -1045,7 +997,7 @@ mod tests {
);
let rootfs = tempdir().unwrap();
let ret = create_dir(rootfs.path().join("dev"));
let ret = fs::create_dir(rootfs.path().join("dev"));
assert!(ret.is_ok(), "Got: {:?}", ret);
spec.root = Some(oci::Root {
@@ -1056,8 +1008,8 @@ mod tests {
// there is no spec.mounts, but should pass
let ret = init_rootfs(stdout_fd, &spec, &cpath, &mounts, true);
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
let _ = remove_dir_all(rootfs.path().join("dev"));
let _ = create_dir(rootfs.path().join("dev"));
let _ = fs::remove_dir_all(rootfs.path().join("dev"));
let _ = fs::create_dir(rootfs.path().join("dev"));
// Adding bad mount point to spec.mounts
spec.mounts.push(oci::Mount {
@@ -1075,8 +1027,8 @@ mod tests {
ret
);
spec.mounts.pop();
let _ = remove_dir_all(rootfs.path().join("dev"));
let _ = create_dir(rootfs.path().join("dev"));
let _ = fs::remove_dir_all(rootfs.path().join("dev"));
let _ = fs::create_dir(rootfs.path().join("dev"));
// mounting a cgroup
spec.mounts.push(oci::Mount {
@@ -1089,8 +1041,8 @@ mod tests {
let ret = init_rootfs(stdout_fd, &spec, &cpath, &mounts, true);
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
spec.mounts.pop();
let _ = remove_dir_all(rootfs.path().join("dev"));
let _ = create_dir(rootfs.path().join("dev"));
let _ = fs::remove_dir_all(rootfs.path().join("dev"));
let _ = fs::create_dir(rootfs.path().join("dev"));
// mounting /dev
spec.mounts.push(oci::Mount {
@@ -1127,11 +1079,11 @@ mod tests {
cgroup_mounts.insert("cpu".to_string(), "cpu".to_string());
cgroup_mounts.insert("memory".to_string(), "memory".to_string());
let ret = create_dir_all(tempdir.path().join("cgroups"));
let ret = fs::create_dir_all(tempdir.path().join("cgroups"));
assert!(ret.is_ok(), "Should pass. Got {:?}", ret);
let ret = create_dir_all(tempdir.path().join("cpu"));
let ret = fs::create_dir_all(tempdir.path().join("cpu"));
assert!(ret.is_ok(), "Should pass. Got {:?}", ret);
let ret = create_dir_all(tempdir.path().join("memory"));
let ret = fs::create_dir_all(tempdir.path().join("memory"));
assert!(ret.is_ok(), "Should pass. Got {:?}", ret);
let ret = mount_cgroups(
@@ -1279,89 +1231,4 @@ mod tests {
assert!(check_proc_mount(&mount).is_err());
}
#[test]
fn test_secure_join() {
#[derive(Debug)]
struct TestData<'a> {
name: &'a str,
rootfs: &'a str,
unsafe_path: &'a str,
symlink_path: &'a str,
result: &'a str,
}
// create a temporary directory to simulate a container rootfs with symlinks
let rootfs_dir = tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path().to_str().unwrap();
let tests = &[
TestData {
name: "rootfs_not_exist",
rootfs: "/home/rootfs",
unsafe_path: "a/b/c",
symlink_path: "",
result: "/home/rootfs/a/b/c",
},
TestData {
name: "relative_path",
rootfs: "/home/rootfs",
unsafe_path: "../../../a/b/c",
symlink_path: "",
result: "/home/rootfs/a/b/c",
},
TestData {
name: "skip any ..",
rootfs: "/home/rootfs",
unsafe_path: "../../../a/../../b/../../c",
symlink_path: "",
result: "/home/rootfs/a/b/c",
},
TestData {
name: "rootfs is null",
rootfs: "",
unsafe_path: "",
symlink_path: "",
result: "/",
},
TestData {
name: "relative softlink beyond container rootfs",
rootfs: rootfs_path,
unsafe_path: "1",
symlink_path: "../../../",
result: rootfs_path,
},
TestData {
name: "abs softlink points to the non-exist directory",
rootfs: rootfs_path,
unsafe_path: "2",
symlink_path: "/dddd",
result: &format!("{}/dddd", rootfs_path).as_str().to_owned(),
},
TestData {
name: "abs softlink points to the root",
rootfs: rootfs_path,
unsafe_path: "3",
symlink_path: "/",
result: &format!("{}/", rootfs_path).as_str().to_owned(),
},
];
for (i, t) in tests.iter().enumerate() {
// Create a string containing details of the test
let msg = format!("test[{}]: {:?}", i, t);
// if a symlink path is given, set up the symlink before calling secure_join
if t.symlink_path != "" {
fs::symlink(t.symlink_path, format!("{}/{}", t.rootfs, t.unsafe_path)).unwrap();
}
let result = secure_join(t.rootfs, t.unsafe_path);
// Update the test details string with the results of the call
let msg = format!("{}, result: {:?}", msg, result);
// Perform the checks
assert!(result == t.result, "{}", msg);
}
}
}
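The `secure_join` comment in this file describes joining an untrusted container path under `rootfs` so that the result cannot escape it. A minimal sketch of the lexical half of that idea follows. It is an assumption-laden illustration, not the agent's function: `lexical_join` is a hypothetical name, and symlink resolution, which `secure_join` also performs, is deliberately left out.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not the agent's `secure_join`): joins `unsafe_path`
/// under `rootfs` using lexical rules only, without touching the filesystem.
fn lexical_join(rootfs: &str, unsafe_path: &str) -> PathBuf {
    let mut out = PathBuf::from(rootfs);
    for comp in Path::new(unsafe_path).components() {
        match comp {
            // A plain name is appended as-is.
            Component::Normal(name) => out.push(name),
            // "/", "." and ".." are dropped, so the result never climbs above `rootfs`.
            Component::RootDir
            | Component::CurDir
            | Component::ParentDir
            | Component::Prefix(_) => {}
        }
    }
    out
}
```

For the `"skip any .."` case from `test_secure_join`, `lexical_join("/home/rootfs", "../../../a/../../b/../../c")` yields `/home/rootfs/a/b/c`. A plain `format!("{}{}", rootfs, destination)` applies no such filtering to `..` elements.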


@@ -1,203 +0,0 @@
// Copyright (c) 2020 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//
//! Async support for pipes or anything else that has a file descriptor
use nix::unistd;
use std::{
fmt, io,
io::{Read, Result, Write},
mem,
os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, RawFd},
pin::Pin,
task::{Context, Poll},
};
use futures::ready;
use tokio::io::{unix::AsyncFd, AsyncRead, AsyncWrite, ReadBuf};
fn set_nonblocking(fd: RawFd) {
unsafe {
libc::fcntl(fd, libc::F_SETFL, libc::O_NONBLOCK);
}
}
struct StreamFd(RawFd);
impl io::Read for &StreamFd {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
match unistd::read(self.0, buf) {
Ok(l) => Ok(l),
Err(e) => Err(e.as_errno().unwrap().into()),
}
}
}
impl io::Write for &StreamFd {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
match unistd::write(self.0, buf) {
Ok(l) => Ok(l),
Err(e) => Err(e.as_errno().unwrap().into()),
}
}
fn flush(&mut self) -> io::Result<()> {
Ok(())
}
}
impl StreamFd {
fn close(&mut self) -> io::Result<()> {
match unistd::close(self.0) {
Ok(()) => Ok(()),
Err(e) => Err(e.as_errno().unwrap().into()),
}
}
}
impl Drop for StreamFd {
fn drop(&mut self) {
self.close().ok();
}
}
impl AsRawFd for StreamFd {
fn as_raw_fd(&self) -> RawFd {
self.0
}
}
pub struct PipeStream(AsyncFd<StreamFd>);
impl PipeStream {
pub fn new(fd: RawFd) -> Result<Self> {
set_nonblocking(fd);
Ok(Self(AsyncFd::new(StreamFd(fd))?))
}
pub fn from_fd(fd: RawFd) -> Self {
unsafe { Self::from_raw_fd(fd) }
}
}
impl AsRawFd for PipeStream {
fn as_raw_fd(&self) -> RawFd {
self.0.as_raw_fd()
}
}
impl IntoRawFd for PipeStream {
fn into_raw_fd(self) -> RawFd {
let fd = self.as_raw_fd();
mem::forget(self);
fd
}
}
impl FromRawFd for PipeStream {
unsafe fn from_raw_fd(fd: RawFd) -> Self {
Self::new(fd).unwrap()
}
}
impl fmt::Debug for PipeStream {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "PipeStream({})", self.as_raw_fd())
}
}
impl AsyncRead for PipeStream {
fn poll_read(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<()>> {
let b;
unsafe {
b = &mut *(buf.unfilled_mut() as *mut [mem::MaybeUninit<u8>] as *mut [u8]);
};
loop {
let mut guard = ready!(self.0.poll_read_ready(cx))?;
match guard.try_io(|inner| inner.get_ref().read(b)) {
Ok(Ok(n)) => {
unsafe {
buf.assume_init(n);
}
buf.advance(n);
return Ok(()).into();
}
Ok(Err(e)) => return Err(e).into(),
Err(_would_block) => {
continue;
}
}
}
}
}
impl AsyncWrite for PipeStream {
fn poll_write(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &[u8],
) -> Poll<io::Result<usize>> {
loop {
let mut guard = ready!(self.0.poll_write_ready(cx))?;
match guard.try_io(|inner| inner.get_ref().write(buf)) {
Ok(result) => return Poll::Ready(result),
Err(_would_block) => continue,
}
}
}
fn poll_flush(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
Poll::Ready(Ok(()))
}
fn poll_shutdown(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
// Doing nothing in shutdown is very important.
// The only correct way to shut down a pipe is to drop it.
// Otherwise this PipeStream would conflict with its twins,
// because they share the same fd and are all registered.
Poll::Ready(Ok(()))
}
}
#[cfg(test)]
mod tests {
use super::*;
use nix::fcntl::OFlag;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
#[tokio::test]
// Shutdown should never close the inner fd.
async fn test_pipestream_shutdown() {
let (_, wfd1) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
let mut writer1 = PipeStream::new(wfd1).unwrap();
// if shutdown closed the fd, the fd number would be reused
// and the test would fail
let _ = writer1.shutdown().await.unwrap();
// let _ = unistd::close(wfd1);
let (rfd2, wfd2) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap(); // reuse fd number, rfd2 == wfd1
let mut reader2 = PipeStream::new(rfd2).unwrap();
let mut writer2 = PipeStream::new(wfd2).unwrap();
// deregister writer1, then reader2 which has the same fd will be deregistered from epoll
drop(writer1);
let _ = writer2.write(b"1").await;
let mut content = vec![0u8; 1];
// Will block here if shutdown closes the fd.
let _ = reader2.read(&mut content).await;
}
}


@@ -6,7 +6,7 @@
use libc::pid_t;
use std::fs::File;
use std::os::unix::io::RawFd;
use tokio::sync::mpsc::Sender;
use std::sync::mpsc::Sender;
use nix::fcntl::{fcntl, FcntlArg, OFlag};
use nix::sys::signal::{self, Signal};
@@ -17,35 +17,14 @@ use nix::Result;
use oci::Process as OCIProcess;
use slog::Logger;
use crate::pipestream::PipeStream;
use std::collections::HashMap;
use std::sync::Arc;
use tokio::io::{split, ReadHalf, WriteHalf};
use tokio::sync::Mutex;
use tokio::sync::Notify;
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub enum StreamType {
Stdin,
Stdout,
Stderr,
TermMaster,
ParentStdin,
ParentStdout,
ParentStderr,
}
type Reader = Arc<Mutex<ReadHalf<PipeStream>>>;
type Writer = Arc<Mutex<WriteHalf<PipeStream>>>;
#[derive(Debug)]
pub struct Process {
pub exec_id: String,
pub stdin: Option<RawFd>,
pub stdout: Option<RawFd>,
pub stderr: Option<RawFd>,
pub exit_tx: Option<tokio::sync::watch::Sender<bool>>,
pub exit_rx: Option<tokio::sync::watch::Receiver<bool>>,
pub exit_pipe_r: Option<RawFd>,
pub exit_pipe_w: Option<RawFd>,
pub extra_files: Vec<File>,
pub term_master: Option<RawFd>,
pub tty: bool,
@@ -61,10 +40,6 @@ pub struct Process {
pub exit_watchers: Vec<Sender<i32>>,
pub oci: OCIProcess,
pub logger: Logger,
pub term_exit_notifier: Arc<Notify>,
readers: HashMap<StreamType, Reader>,
writers: HashMap<StreamType, Writer>,
}
pub trait ProcessOperations {
@@ -96,15 +71,14 @@ impl Process {
pipe_size: i32,
) -> Result<Self> {
let logger = logger.new(o!("subsystem" => "process"));
let (exit_tx, exit_rx) = tokio::sync::watch::channel(false);
let mut p = Process {
exec_id: String::from(id),
stdin: None,
stdout: None,
stderr: None,
exit_tx: Some(exit_tx),
exit_rx: Some(exit_rx),
exit_pipe_w: None,
exit_pipe_r: None,
extra_files: Vec::new(),
tty: ocip.terminal,
term_master: None,
@@ -117,9 +91,6 @@ impl Process {
exit_watchers: Vec::new(),
oci: ocip.clone(),
logger: logger.clone(),
term_exit_notifier: Arc::new(Notify::new()),
readers: HashMap::new(),
writers: HashMap::new(),
};
info!(logger, "before create console socket!");
@@ -141,60 +112,6 @@ impl Process {
}
Ok(p)
}
pub fn notify_term_close(&mut self) {
let notify = self.term_exit_notifier.clone();
notify.notify_one();
}
fn get_fd(&self, stream_type: &StreamType) -> Option<RawFd> {
match stream_type {
StreamType::Stdin => self.stdin,
StreamType::Stdout => self.stdout,
StreamType::Stderr => self.stderr,
StreamType::TermMaster => self.term_master,
StreamType::ParentStdin => self.parent_stdin,
StreamType::ParentStdout => self.parent_stdout,
StreamType::ParentStderr => self.parent_stderr,
}
}
fn get_stream_and_store(&mut self, stream_type: StreamType) -> Option<(Reader, Writer)> {
let fd = self.get_fd(&stream_type)?;
let stream = PipeStream::from_fd(fd);
let (reader, writer) = split(stream);
let reader = Arc::new(Mutex::new(reader));
let writer = Arc::new(Mutex::new(writer));
self.readers.insert(stream_type.clone(), reader.clone());
self.writers.insert(stream_type, writer.clone());
Some((reader, writer))
}
pub fn get_reader(&mut self, stream_type: StreamType) -> Option<Reader> {
if let Some(reader) = self.readers.get(&stream_type) {
return Some(reader.clone());
}
let (reader, _) = self.get_stream_and_store(stream_type)?;
Some(reader)
}
pub fn get_writer(&mut self, stream_type: StreamType) -> Option<Writer> {
if let Some(writer) = self.writers.get(&stream_type) {
return Some(writer.clone());
}
let (_, writer) = self.get_stream_and_store(stream_type)?;
Some(writer)
}
pub fn close_stream(&mut self, stream_type: StreamType) {
let _ = self.readers.remove(&stream_type);
let _ = self.writers.remove(&stream_type);
}
}
fn create_extended_pipe(flags: OFlag, pipe_size: i32) -> Result<(RawFd, RawFd)> {
@@ -252,6 +169,7 @@ mod tests {
// -1 by default
assert_eq!(process.pid, -1);
assert!(process.wait().is_err());
// signal to every process in the process
// group of the calling process.
process.pid = 0;


@@ -14,8 +14,8 @@ pub const SYNC_SUCCESS: i32 = 1;
pub const SYNC_FAILED: i32 = 2;
pub const SYNC_DATA: i32 = 3;
pub const DATA_SIZE: usize = 100;
pub const MSG_SIZE: usize = mem::size_of::<i32>();
const DATA_SIZE: usize = 100;
const MSG_SIZE: usize = mem::size_of::<i32>();
#[macro_export]
macro_rules! log_child {
@@ -96,14 +96,14 @@ pub fn read_sync(fd: RawFd) -> Result<Vec<u8>> {
let buf_array: [u8; MSG_SIZE] = [buf[0], buf[1], buf[2], buf[3]];
let msg: i32 = i32::from_be_bytes(buf_array);
match msg {
SYNC_SUCCESS => Ok(Vec::new()),
SYNC_SUCCESS => return Ok(Vec::new()),
SYNC_DATA => {
let buf = read_count(fd, MSG_SIZE)?;
let buf_array: [u8; MSG_SIZE] = [buf[0], buf[1], buf[2], buf[3]];
let msg_length: i32 = i32::from_be_bytes(buf_array);
let data_buf = read_count(fd, msg_length as usize)?;
Ok(data_buf)
return Ok(data_buf);
}
SYNC_FAILED => {
let mut error_buf = vec![];
@@ -127,9 +127,9 @@ pub fn read_sync(fd: RawFd) -> Result<Vec<u8>> {
}
};
Err(anyhow!(error_str))
return Err(anyhow!(error_str));
}
_ => Err(anyhow!("error in receive sync message")),
_ => return Err(anyhow!("error in receive sync message")),
}
}
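The wire format that `read_sync` decodes is a 4-byte big-endian message type, followed for `SYNC_DATA` by a 4-byte big-endian length and the payload. The sketch below reproduces that framing over generic `Read`/`Write` values so it can be exercised in memory. `write_data_frame` and `read_frame` are hypothetical names, the constants are copied from the diff, and the `SYNC_FAILED` branch is simplified: it reads the rest of the stream as the error text instead of looping over `DATA_SIZE` chunks.

```rust
use std::io::{self, Read, Write};
use std::mem;

// Message-type values as they appear in the diff.
const SYNC_SUCCESS: i32 = 1;
const SYNC_FAILED: i32 = 2;
const SYNC_DATA: i32 = 3;
const MSG_SIZE: usize = mem::size_of::<i32>();

/// Hypothetical encoder: writes one SYNC_DATA frame (type, length, payload).
fn write_data_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&SYNC_DATA.to_be_bytes())?;
    w.write_all(&(payload.len() as i32).to_be_bytes())?;
    w.write_all(payload)
}

/// Hypothetical decoder: returns the payload of one frame, or an error.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut head = [0u8; MSG_SIZE];
    r.read_exact(&mut head)?;
    match i32::from_be_bytes(head) {
        // Success carries no payload.
        SYNC_SUCCESS => Ok(Vec::new()),
        SYNC_DATA => {
            let mut len = [0u8; MSG_SIZE];
            r.read_exact(&mut len)?;
            let n = i32::from_be_bytes(len);
            if n < 0 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "negative length"));
            }
            let mut data = vec![0u8; n as usize];
            r.read_exact(&mut data)?;
            Ok(data)
        }
        // Simplification: treat everything that follows as the error message.
        SYNC_FAILED => {
            let mut msg = String::new();
            r.read_to_string(&mut msg)?;
            Err(io::Error::new(io::ErrorKind::Other, msg))
        }
        other => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unknown sync message type {}", other),
        )),
    }
}
```

Writing `b"hello"` with `write_data_frame` into a `Vec<u8>` produces 13 bytes (4 for the type, 4 for the length, 5 for the payload), and `read_frame` over that buffer returns the payload again.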


@@ -1,140 +0,0 @@
// Copyright (c) 2020 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//
//! The async version of sync module used for IPC
use crate::pipestream::PipeStream;
use anyhow::{anyhow, Result};
use nix::errno::Errno;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use crate::sync::{DATA_SIZE, MSG_SIZE, SYNC_DATA, SYNC_FAILED, SYNC_SUCCESS};
async fn write_count(pipe_w: &mut PipeStream, buf: &[u8], count: usize) -> Result<usize> {
let mut len = 0;
loop {
match pipe_w.write(&buf[len..]).await {
Ok(l) => {
len += l;
if len == count {
break;
}
}
Err(e) => {
if e.raw_os_error().unwrap() != Errno::EINTR as i32 {
return Err(e.into());
}
}
}
}
Ok(len)
}
async fn read_count(pipe_r: &mut PipeStream, count: usize) -> Result<Vec<u8>> {
let mut v: Vec<u8> = vec![0; count];
let mut len = 0;
loop {
match pipe_r.read(&mut v[len..]).await {
Ok(l) => {
len += l;
if len == count || l == 0 {
break;
}
}
Err(e) => {
if e.raw_os_error().unwrap() != Errno::EINTR as i32 {
return Err(e.into());
}
}
}
}
Ok(v[0..len].to_vec())
}
pub async fn read_async(pipe_r: &mut PipeStream) -> Result<Vec<u8>> {
let buf = read_count(pipe_r, MSG_SIZE).await?;
if buf.len() != MSG_SIZE {
return Err(anyhow!(
"process: {} failed to receive async message from peer: got msg length: {}, expected: {}",
std::process::id(),
buf.len(),
MSG_SIZE
));
}
let buf_array: [u8; MSG_SIZE] = [buf[0], buf[1], buf[2], buf[3]];
let msg: i32 = i32::from_be_bytes(buf_array);
match msg {
SYNC_SUCCESS => Ok(Vec::new()),
SYNC_DATA => {
let buf = read_count(pipe_r, MSG_SIZE).await?;
let buf_array: [u8; MSG_SIZE] = [buf[0], buf[1], buf[2], buf[3]];
let msg_length: i32 = i32::from_be_bytes(buf_array);
let data_buf = read_count(pipe_r, msg_length as usize).await?;
Ok(data_buf)
}
SYNC_FAILED => {
let mut error_buf = vec![];
loop {
let buf = read_count(pipe_r, DATA_SIZE).await?;
error_buf.extend(&buf);
if DATA_SIZE == buf.len() {
continue;
} else {
break;
}
}
let error_str = match std::str::from_utf8(&error_buf) {
Ok(v) => String::from(v),
Err(e) => {
return Err(
anyhow!(e).context("receive error message from child process failed")
);
}
};
Err(anyhow!(error_str))
}
_ => Err(anyhow!("error in receive sync message")),
}
}
pub async fn write_async(pipe_w: &mut PipeStream, msg_type: i32, data_str: &str) -> Result<()> {
let buf = msg_type.to_be_bytes();
let count = write_count(pipe_w, &buf, MSG_SIZE).await?;
if count != MSG_SIZE {
return Err(anyhow!("error in send sync message"));
}
match msg_type {
SYNC_FAILED => {
if let Err(e) = write_count(pipe_w, data_str.as_bytes(), data_str.len()).await {
return Err(anyhow!(e).context("error in send message to process"));
}
}
SYNC_DATA => {
let length: i32 = data_str.len() as i32;
write_count(pipe_w, &length.to_be_bytes(), MSG_SIZE)
.await
.map_err(|e| anyhow!(e).context("error in send message to process"))?;
write_count(pipe_w, data_str.as_bytes(), data_str.len())
.await
.map_err(|e| anyhow!(e).context("error in send message to process"))?;
}
_ => (),
};
Ok(())
}
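
The framing used by `read_async` and `write_async` above can be illustrated with a small synchronous sketch. It is not the agent's implementation: it swaps `PipeStream` for the standard `Read`/`Write` traits, the numeric tag values are hypothetical (the real `SYNC_*` constants live in the sync module and are not shown in this diff), and `write_frame`, `read_frame` and `roundtrip` are illustrative names.

```rust
use std::io::{self, Cursor, Read, Write};

// Hypothetical tag values; the real SYNC_* constants are not shown in this diff.
const SYNC_SUCCESS: i32 = 1;
const SYNC_FAILED: i32 = 2;
const SYNC_DATA: i32 = 3;
const MSG_SIZE: usize = std::mem::size_of::<i32>();

/// Write one frame: a big-endian tag, then for SYNC_DATA a big-endian
/// length followed by the payload, or for SYNC_FAILED the raw error text.
fn write_frame<W: Write>(w: &mut W, msg_type: i32, data: &str) -> io::Result<()> {
    w.write_all(&msg_type.to_be_bytes())?;
    match msg_type {
        SYNC_DATA => {
            let len = data.len() as i32;
            w.write_all(&len.to_be_bytes())?;
            w.write_all(data.as_bytes())?;
        }
        SYNC_FAILED => w.write_all(data.as_bytes())?,
        _ => {}
    }
    Ok(())
}

/// Read exactly MSG_SIZE bytes and decode them as a big-endian i32.
fn read_i32<R: Read>(r: &mut R) -> io::Result<i32> {
    let mut buf = [0u8; MSG_SIZE];
    r.read_exact(&mut buf)?;
    Ok(i32::from_be_bytes(buf))
}

/// Read one frame and return its payload (empty for SYNC_SUCCESS).
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    match read_i32(r)? {
        SYNC_SUCCESS => Ok(Vec::new()),
        SYNC_DATA => {
            let len = read_i32(r)?;
            if len < 0 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "negative length"));
            }
            let mut payload = vec![0u8; len as usize];
            r.read_exact(&mut payload)?;
            Ok(payload)
        }
        SYNC_FAILED => {
            // The error text has no length prefix; it runs to end of stream.
            let mut text = String::new();
            r.read_to_string(&mut text)?;
            Err(io::Error::new(io::ErrorKind::Other, text))
        }
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "unknown sync message")),
    }
}

/// Encode a frame into an in-memory buffer and decode it again.
fn roundtrip(msg_type: i32, data: &str) -> io::Result<Vec<u8>> {
    let mut wire = Vec::new();
    write_frame(&mut wire, msg_type, data)?;
    read_frame(&mut Cursor::new(wire))
}
```

Calling `roundtrip(SYNC_DATA, "hello")` returns the payload bytes, while `roundtrip(SYNC_FAILED, "boom")` surfaces the text as an error, mirroring the `SYNC_FAILED` branch above. Unlike the original, the sketch rejects a negative length instead of casting it to `usize`.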

@@ -1,119 +0,0 @@
// Copyright (c) 2021 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Context, Result};
use libc::gid_t;
use libc::uid_t;
use std::fs::File;
use std::io::{BufRead, BufReader};
const PASSWD_FILE: &str = "/etc/passwd";
// An entry from /etc/passwd
#[derive(Debug, PartialEq, PartialOrd)]
pub struct PasswdEntry {
// username
pub name: String,
// user password
pub passwd: String,
// user id
pub uid: uid_t,
// group id
pub gid: gid_t,
// user Information
pub gecos: String,
// home directory
pub dir: String,
// User's Shell
pub shell: String,
}
// get an entry for a given `uid` from `/etc/passwd`
fn get_entry_by_uid(uid: uid_t, path: &str) -> Result<PasswdEntry> {
let file = File::open(path).with_context(|| format!("open file {}", path))?;
let mut reader = BufReader::new(file);
let mut line = String::new();
loop {
line.clear();
match reader.read_line(&mut line) {
Ok(0) => return Err(anyhow!(format!("file {} is empty", path))),
Ok(_) => (),
Err(e) => {
return Err(anyhow!(format!(
"failed to read file {} with {:?}",
path, e
)))
}
}
if line.starts_with('#') {
continue;
}
let parts: Vec<&str> = line.split(':').map(|part| part.trim()).collect();
if parts.len() != 7 {
continue;
}
match parts[2].parse() {
Err(_e) => continue,
Ok(new_uid) => {
if uid != new_uid {
continue;
}
let entry = PasswdEntry {
name: parts[0].to_string(),
passwd: parts[1].to_string(),
uid: new_uid,
gid: parts[3].parse().unwrap_or(0),
gecos: parts[4].to_string(),
dir: parts[5].to_string(),
shell: parts[6].to_string(),
};
return Ok(entry);
}
}
}
}
pub fn home_dir(uid: uid_t) -> Result<String> {
get_entry_by_uid(uid, PASSWD_FILE).map(|entry| entry.dir)
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::Write;
use tempfile::Builder;
#[test]
fn test_get_entry_by_uid() {
let tmpdir = Builder::new().tempdir().unwrap();
let tmpdir_path = tmpdir.path().to_str().unwrap();
let temp_passwd = format!("{}/passwd", tmpdir_path);
let mut tempf = File::create(temp_passwd.as_str()).unwrap();
writeln!(tempf, "root:x:0:0:root:/root0:/bin/bash").unwrap();
writeln!(tempf, "root:x:1:0:root:/root1:/bin/bash").unwrap();
writeln!(tempf, "#root:x:1:0:root:/rootx:/bin/bash").unwrap();
writeln!(tempf, "root:x:2:0:root:/root2:/bin/bash").unwrap();
writeln!(tempf, "root:x:3:0:root:/root3").unwrap();
writeln!(tempf, "root:x:3:0:root:/root3:/bin/bash").unwrap();
let entry = get_entry_by_uid(0, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root0");
let entry = get_entry_by_uid(1, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root1");
let entry = get_entry_by_uid(2, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root2");
let entry = get_entry_by_uid(3, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root3");
}
}
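
The lookup performed by `get_entry_by_uid` can be sketched without touching the filesystem. The version below is a simplified assumption-laden illustration, not the removed module itself: `Entry`, `parse_passwd_line` and `home_dir_in` are hypothetical names, and the input is an in-memory string rather than `/etc/passwd`. It keeps the same rules as the code above: skip `#` comments, require exactly seven colon-separated fields, skip lines whose uid does not parse, and default the gid to 0.

```rust
/// Hypothetical, trimmed-down counterpart of `PasswdEntry`.
#[derive(Debug, PartialEq)]
struct Entry {
    name: String,
    uid: u32,
    gid: u32,
    dir: String,
    shell: String,
}

/// Parse one passwd line; returns None for comments and malformed lines.
fn parse_passwd_line(line: &str) -> Option<Entry> {
    if line.starts_with('#') {
        return None;
    }
    let parts: Vec<&str> = line.split(':').map(str::trim).collect();
    if parts.len() != 7 {
        return None;
    }
    Some(Entry {
        name: parts[0].to_string(),
        // A line with an unparsable uid is skipped entirely.
        uid: parts[2].parse().ok()?,
        // An unparsable gid falls back to 0, as in the code above.
        gid: parts[3].parse().unwrap_or(0),
        dir: parts[5].to_string(),
        shell: parts[6].to_string(),
    })
}

/// Return the home directory of the first entry matching `uid`.
fn home_dir_in(contents: &str, uid: u32) -> Option<String> {
    contents
        .lines()
        .filter_map(parse_passwd_line)
        .find(|entry| entry.uid == uid)
        .map(|entry| entry.dir)
}
```

One deliberate difference: the sketch returns `None` when no entry matches, whereas the function above reports a missing uid with the message "file ... is empty" once it hits end of file.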

@@ -4,21 +4,14 @@
//
use crate::container::Config;
use anyhow::{anyhow, Context, Error, Result};
use anyhow::{anyhow, Context, Result};
use lazy_static;
use nix::errno::Errno;
use oci::{Linux, LinuxIdMapping, LinuxNamespace, Spec};
use oci::{LinuxIDMapping, LinuxNamespace, Spec};
use std::collections::HashMap;
use std::path::{Component, PathBuf};
fn einval() -> Error {
anyhow!(nix::Error::from_errno(Errno::EINVAL))
}
fn get_linux(oci: &Spec) -> Result<&Linux> {
oci.linux.as_ref().ok_or_else(einval)
}
fn contain_namespace(nses: &[LinuxNamespace], key: &str) -> bool {
fn contain_namespace(nses: &Vec<LinuxNamespace>, key: &str) -> bool {
for ns in nses {
if ns.r#type.as_str() == key {
return true;
@@ -28,28 +21,30 @@ fn contain_namespace(nses: &[LinuxNamespace], key: &str) -> bool {
false
}
fn get_namespace_path(nses: &[LinuxNamespace], key: &str) -> Result<String> {
fn get_namespace_path(nses: &Vec<LinuxNamespace>, key: &str) -> Result<String> {
for ns in nses {
if ns.r#type.as_str() == key {
return Ok(ns.path.clone());
}
}
Err(einval())
Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)))
}
fn rootfs(root: &str) -> Result<()> {
let path = PathBuf::from(root);
// not absolute path or not exists
if !path.exists() || !path.is_absolute() {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
// symbolic link? ..?
let mut stack: Vec<String> = Vec::new();
for c in path.components() {
if stack.is_empty() && (c == Component::RootDir || c == Component::ParentDir) {
continue;
if stack.is_empty() {
if c == Component::RootDir || c == Component::ParentDir {
continue;
}
}
if c == Component::ParentDir {
@@ -60,7 +55,7 @@ fn rootfs(root: &str) -> Result<()> {
if let Some(v) = c.as_os_str().to_str() {
stack.push(v.to_string());
} else {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
}
@@ -72,34 +67,43 @@ fn rootfs(root: &str) -> Result<()> {
let canon = path.canonicalize().context("canonicalize")?;
if cleaned != canon {
// There is symbolic in path
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
Ok(())
}
fn network(_oci: &Spec) -> Result<()> {
Ok(())
}
fn hostname(oci: &Spec) -> Result<()> {
if oci.hostname.is_empty() {
if oci.hostname.is_empty() || oci.hostname == "".to_string() {
return Ok(());
}
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
if !contain_namespace(&linux.namespaces, "uts") {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
Ok(())
}
fn security(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
if linux.masked_paths.is_empty() && linux.readonly_paths.is_empty() {
return Ok(());
}
if !contain_namespace(&linux.namespaces, "mount") {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
// don't care about selinux at present
@@ -107,19 +111,21 @@ fn security(oci: &Spec) -> Result<()> {
Ok(())
}
fn idmapping(maps: &[LinuxIdMapping]) -> Result<()> {
fn idmapping(maps: &Vec<LinuxIDMapping>) -> Result<()> {
for map in maps {
if map.size > 0 {
return Ok(());
}
}
Err(einval())
Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)))
}
fn usernamespace(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
if contain_namespace(&linux.namespaces, "user") {
let user_ns = PathBuf::from("/proc/self/ns/user");
if !user_ns.exists() {
@@ -131,8 +137,8 @@ fn usernamespace(oci: &Spec) -> Result<()> {
idmapping(&linux.gid_mappings).context("idmapping gid")?;
} else {
// no user namespace but idmap
if !linux.uid_mappings.is_empty() || !linux.gid_mappings.is_empty() {
return Err(einval());
if linux.uid_mappings.len() != 0 || linux.gid_mappings.len() != 0 {
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
}
@@ -140,8 +146,10 @@ fn usernamespace(oci: &Spec) -> Result<()> {
}
fn cgroupnamespace(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
if contain_namespace(&linux.namespaces, "cgroup") {
let path = PathBuf::from("/proc/self/ns/cgroup");
if !path.exists() {
@@ -185,21 +193,23 @@ fn check_host_ns(path: &str) -> Result<()> {
.read_link()
.context(format!("read link {:?}", cpath))?;
if real_cpath == real_hpath {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
Ok(())
}
fn sysctl(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
for (key, _) in linux.sysctl.iter() {
if SYSCTLS.contains_key(key.as_str()) || key.starts_with("fs.mqueue.") {
if contain_namespace(&linux.namespaces, "ipc") {
continue;
} else {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
}
@@ -214,31 +224,33 @@ fn sysctl(oci: &Spec) -> Result<()> {
}
if key == "kernel.hostname" {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
}
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
Ok(())
}
fn rootless_euid_mapping(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
if !contain_namespace(&linux.namespaces, "user") {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
if linux.uid_mappings.is_empty() || linux.gid_mappings.is_empty() {
if linux.uid_mappings.len() == 0 || linux.gid_mappings.len() == 0 {
// rootless containers requires at least one UID/GID mapping
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
Ok(())
}
fn has_idmapping(maps: &[LinuxIdMapping], id: u32) -> bool {
fn has_idmapping(maps: &Vec<LinuxIDMapping>, id: u32) -> bool {
for map in maps {
if id >= map.container_id && id < map.container_id + map.size {
return true;
@@ -248,7 +260,10 @@ fn has_idmapping(maps: &[LinuxIdMapping], id: u32) -> bool {
}
fn rootless_euid_mount(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
let linux = oci
.linux
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
for mnt in oci.mounts.iter() {
for opt in mnt.options.iter() {
@@ -256,7 +271,7 @@ fn rootless_euid_mount(oci: &Spec) -> Result<()> {
let fields: Vec<&str> = opt.split('=').collect();
if fields.len() != 2 {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
let id = fields[1]
@@ -264,12 +279,16 @@ fn rootless_euid_mount(oci: &Spec) -> Result<()> {
.parse::<u32>()
.context(format!("parse field {}", &fields[1]))?;
if opt.starts_with("uid=") && !has_idmapping(&linux.uid_mappings, id) {
return Err(einval());
if opt.starts_with("uid=") {
if !has_idmapping(&linux.uid_mappings, id) {
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
}
if opt.starts_with("gid=") && !has_idmapping(&linux.gid_mappings, id) {
return Err(einval());
if opt.starts_with("gid=") {
if !has_idmapping(&linux.gid_mappings, id) {
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
}
}
}
@@ -285,18 +304,22 @@ fn rootless_euid(oci: &Spec) -> Result<()> {
pub fn validate(conf: &Config) -> Result<()> {
lazy_static::initialize(&SYSCTLS);
let oci = conf.spec.as_ref().ok_or_else(einval)?;
let oci = conf
.spec
.as_ref()
.ok_or(anyhow!(nix::Error::from_errno(Errno::EINVAL)))?;
if oci.linux.is_none() {
return Err(einval());
return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL)));
}
let root = match oci.root.as_ref() {
Some(v) => v.path.as_str(),
None => return Err(einval()),
None => return Err(anyhow!(nix::Error::from_errno(Errno::EINVAL))),
};
rootfs(root).context("rootfs")?;
network(oci).context("network")?;
hostname(oci).context("hostname")?;
security(oci).context("security")?;
usernamespace(oci).context("usernamespace")?;
@@ -309,274 +332,3 @@ pub fn validate(conf: &Config) -> Result<()> {
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use oci::Mount;
#[test]
fn test_namespace() {
let namespaces = [
LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
},
LinuxNamespace {
r#type: "uts".to_owned(),
path: "/sys/cgroups/uts".to_owned(),
},
];
assert_eq!(contain_namespace(&namespaces, "net"), true);
assert_eq!(contain_namespace(&namespaces, "uts"), true);
assert_eq!(contain_namespace(&namespaces, ""), false);
assert_eq!(contain_namespace(&namespaces, "Net"), false);
assert_eq!(contain_namespace(&namespaces, "ipc"), false);
assert_eq!(
get_namespace_path(&namespaces, "net").unwrap(),
"/sys/cgroups/net"
);
assert_eq!(
get_namespace_path(&namespaces, "uts").unwrap(),
"/sys/cgroups/uts"
);
get_namespace_path(&namespaces, "").unwrap_err();
get_namespace_path(&namespaces, "Uts").unwrap_err();
get_namespace_path(&namespaces, "ipc").unwrap_err();
}
#[test]
fn test_rootfs() {
rootfs("/_no_exit_fs_xxxxxxxxxxx").unwrap_err();
rootfs("sys").unwrap_err();
rootfs("/proc/self/root").unwrap_err();
rootfs("/proc/self/root/sys").unwrap_err();
rootfs("/proc/self").unwrap_err();
rootfs("/./proc/self").unwrap_err();
rootfs("/proc/././self").unwrap_err();
rootfs("/proc/.././self").unwrap_err();
rootfs("/proc/uptime").unwrap();
rootfs("/../proc/uptime").unwrap();
rootfs("/../../proc/uptime").unwrap();
rootfs("/proc/../proc/uptime").unwrap();
rootfs("/proc/../../proc/uptime").unwrap();
}
#[test]
fn test_hostname() {
let mut spec = Spec::default();
hostname(&spec).unwrap();
spec.hostname = "a.test.com".to_owned();
hostname(&spec).unwrap_err();
let mut linux = Linux::default();
linux.namespaces = vec![
LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
},
LinuxNamespace {
r#type: "uts".to_owned(),
path: "/sys/cgroups/uts".to_owned(),
},
];
spec.linux = Some(linux);
hostname(&spec).unwrap();
}
#[test]
fn test_security() {
let mut spec = Spec::default();
let linux = Linux::default();
spec.linux = Some(linux);
security(&spec).unwrap();
let mut linux = Linux::default();
linux.masked_paths.push("/test".to_owned());
linux.namespaces = vec![
LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
},
LinuxNamespace {
r#type: "uts".to_owned(),
path: "/sys/cgroups/uts".to_owned(),
},
];
spec.linux = Some(linux);
security(&spec).unwrap_err();
let mut linux = Linux::default();
linux.masked_paths.push("/test".to_owned());
linux.namespaces = vec![
LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
},
LinuxNamespace {
r#type: "mount".to_owned(),
path: "/sys/cgroups/mount".to_owned(),
},
];
spec.linux = Some(linux);
security(&spec).unwrap();
}
#[test]
fn test_usernamespace() {
let mut spec = Spec::default();
usernamespace(&spec).unwrap_err();
let linux = Linux::default();
spec.linux = Some(linux);
usernamespace(&spec).unwrap();
let mut linux = Linux::default();
linux.uid_mappings = vec![LinuxIdMapping {
container_id: 0,
host_id: 1000,
size: 0,
}];
spec.linux = Some(linux);
usernamespace(&spec).unwrap_err();
let mut linux = Linux::default();
linux.uid_mappings = vec![LinuxIdMapping {
container_id: 0,
host_id: 1000,
size: 100,
}];
spec.linux = Some(linux);
usernamespace(&spec).unwrap_err();
}
#[test]
fn test_rootless_euid() {
let mut spec = Spec::default();
// Test case: without linux
rootless_euid_mapping(&spec).unwrap_err();
rootless_euid_mount(&spec).unwrap_err();
// Test case: without user namespace
let linux = Linux::default();
spec.linux = Some(linux);
rootless_euid_mapping(&spec).unwrap_err();
// Test case: without user namespace
let linux = spec.linux.as_mut().unwrap();
linux.namespaces = vec![
LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
},
LinuxNamespace {
r#type: "uts".to_owned(),
path: "/sys/cgroups/uts".to_owned(),
},
];
rootless_euid_mapping(&spec).unwrap_err();
let linux = spec.linux.as_mut().unwrap();
linux.namespaces = vec![
LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
},
LinuxNamespace {
r#type: "user".to_owned(),
path: "/sys/cgroups/user".to_owned(),
},
];
linux.uid_mappings = vec![LinuxIdMapping {
container_id: 0,
host_id: 1000,
size: 1000,
}];
linux.gid_mappings = vec![LinuxIdMapping {
container_id: 0,
host_id: 1000,
size: 1000,
}];
rootless_euid_mapping(&spec).unwrap();
spec.mounts.push(Mount {
destination: "/app".to_owned(),
r#type: "tmpfs".to_owned(),
source: "".to_owned(),
options: vec!["uid=10000".to_owned()],
});
rootless_euid_mount(&spec).unwrap_err();
spec.mounts = vec![
(Mount {
destination: "/app".to_owned(),
r#type: "tmpfs".to_owned(),
source: "".to_owned(),
options: vec!["uid=500".to_owned(), "gid=500".to_owned()],
}),
];
rootless_euid(&spec).unwrap();
}
#[test]
fn test_check_host_ns() {
check_host_ns("/proc/self/ns/net").unwrap_err();
check_host_ns("/proc/sys/net/ipv4/tcp_sack").unwrap();
}
#[test]
fn test_sysctl() {
let mut spec = Spec::default();
let mut linux = Linux::default();
linux.namespaces = vec![LinuxNamespace {
r#type: "net".to_owned(),
path: "/sys/cgroups/net".to_owned(),
}];
linux
.sysctl
.insert("kernel.domainname".to_owned(), "test.com".to_owned());
spec.linux = Some(linux);
sysctl(&spec).unwrap_err();
spec.linux
.as_mut()
.unwrap()
.namespaces
.push(LinuxNamespace {
r#type: "uts".to_owned(),
path: "/sys/cgroups/uts".to_owned(),
});
sysctl(&spec).unwrap();
}
#[test]
fn test_validate() {
let spec = Spec::default();
let mut config = Config {
cgroup_name: "container1".to_owned(),
use_systemd_cgroup: false,
no_pivot_root: true,
no_new_keyring: true,
rootless_euid: false,
rootless_cgroup: false,
spec: Some(spec),
};
validate(&config).unwrap_err();
let linux = Linux::default();
config.spec.as_mut().unwrap().linux = Some(linux);
validate(&config).unwrap_err();
}
}
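
The symlink check in `rootfs` compares a lexically cleaned path with the result of `canonicalize()`. Part of that function is elided between hunks in this diff, so the following is only a sketch of the idea under that assumption; `clean_lexically` and `has_no_symlink` are hypothetical names, not functions from the validator.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Resolve `..` purely lexically, without consulting the filesystem.
/// A `..` at the root has nothing to pop and is therefore ignored.
fn clean_lexically(path: &Path) -> PathBuf {
    let mut stack: Vec<&OsStr> = Vec::new();
    for component in path.components() {
        match component {
            Component::ParentDir => {
                stack.pop();
            }
            Component::Normal(name) => stack.push(name),
            // RootDir, CurDir and platform prefixes add nothing to the stack.
            _ => {}
        }
    }
    let mut cleaned = PathBuf::from("/");
    cleaned.extend(stack);
    cleaned
}

/// True when the path is absolute and the filesystem resolves it to the
/// same location as the lexical cleaning, i.e. no symlink was followed.
fn has_no_symlink(path: &Path) -> std::io::Result<bool> {
    Ok(path.is_absolute() && clean_lexically(path) == path.canonicalize()?)
}
```

This matches the expectations in `test_rootfs` above: `/proc/../../proc/uptime` cleans to `/proc/uptime` and is accepted, while `/proc/self` is a symlink, so its canonical form differs from the cleaned form and it is rejected.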

@@ -10,7 +10,6 @@ use std::time;
const DEBUG_CONSOLE_FLAG: &str = "agent.debug_console";
const DEV_MODE_FLAG: &str = "agent.devmode";
const LOG_LEVEL_OPTION: &str = "agent.log";
const SERVER_ADDR_OPTION: &str = "agent.server_addr";
const HOTPLUG_TIMOUT_OPTION: &str = "agent.hotplug_timeout";
const DEBUG_CONSOLE_VPORT_OPTION: &str = "agent.debug_console_vport";
const LOG_VPORT_OPTION: &str = "agent.log_vport";
@@ -22,29 +21,14 @@ const DEFAULT_HOTPLUG_TIMEOUT: time::Duration = time::Duration::from_secs(3);
const DEFAULT_CONTAINER_PIPE_SIZE: i32 = 0;
const VSOCK_ADDR: &str = "vsock://-1";
const VSOCK_PORT: u16 = 1024;
// Environment variables used for development and testing
const SERVER_ADDR_ENV_VAR: &str = "KATA_AGENT_SERVER_ADDR";
const LOG_LEVEL_ENV_VAR: &str = "KATA_AGENT_LOG_LEVEL";
const ERR_INVALID_LOG_LEVEL: &str = "invalid log level";
const ERR_INVALID_LOG_LEVEL_PARAM: &str = "invalid log level parameter";
const ERR_INVALID_GET_VALUE_PARAM: &str = "expected name=value";
const ERR_INVALID_GET_VALUE_NO_NAME: &str = "name=value parameter missing name";
const ERR_INVALID_GET_VALUE_NO_VALUE: &str = "name=value parameter missing value";
const ERR_INVALID_LOG_LEVEL_KEY: &str = "invalid log level key name";
const ERR_INVALID_HOTPLUG_TIMEOUT: &str = "invalid hotplug timeout parameter";
const ERR_INVALID_HOTPLUG_TIMEOUT_PARAM: &str = "unable to parse hotplug timeout";
const ERR_INVALID_HOTPLUG_TIMEOUT_KEY: &str = "invalid hotplug timeout key name";
const ERR_INVALID_CONTAINER_PIPE_SIZE: &str = "invalid container pipe size parameter";
const ERR_INVALID_CONTAINER_PIPE_SIZE_PARAM: &str = "unable to parse container pipe size";
const ERR_INVALID_CONTAINER_PIPE_SIZE_KEY: &str = "invalid container pipe size key name";
const ERR_INVALID_CONTAINER_PIPE_NEGATIVE: &str = "container pipe size should not be negative";
// FIXME: unused
const TRACE_MODE_FLAG: &str = "agent.trace";
const USE_VSOCK_FLAG: &str = "agent.use_vsock";
#[derive(Debug)]
pub struct AgentConfig {
pub struct agentConfig {
pub debug_console: bool,
pub dev_mode: bool,
pub log_level: slog::Level,
@@ -86,9 +70,9 @@ macro_rules! parse_cmdline_param {
};
}
impl AgentConfig {
pub fn new() -> AgentConfig {
AgentConfig {
impl agentConfig {
pub fn new() -> agentConfig {
agentConfig {
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
@@ -111,12 +95,6 @@ impl AgentConfig {
// parse cmdline options
parse_cmdline_param!(param, LOG_LEVEL_OPTION, self.log_level, get_log_level);
parse_cmdline_param!(
param,
SERVER_ADDR_OPTION,
self.server_addr,
get_string_value
);
// ensure the timeout is a positive value
parse_cmdline_param!(
@@ -124,7 +102,7 @@ impl AgentConfig {
HOTPLUG_TIMOUT_OPTION,
self.hotplug_timeout,
get_hotplug_timeout,
|hotplug_timeout: time::Duration| hotplug_timeout.as_secs() > 0
|hotplugTimeout: time::Duration| hotplugTimeout.as_secs() > 0
);
// vsock port should be positive values
@@ -161,18 +139,12 @@ impl AgentConfig {
self.server_addr = addr;
}
if let Ok(addr) = env::var(LOG_LEVEL_ENV_VAR) {
if let Ok(level) = logrus_to_slog_level(&addr) {
self.log_level = level;
}
}
Ok(())
}
}
fn get_vsock_port(p: &str) -> Result<i32> {
let fields: Vec<&str> = p.split('=').collect();
let fields: Vec<&str> = p.split("=").collect();
if fields.len() != 2 {
return Err(anyhow!("invalid port parameter"));
}
@@ -200,7 +172,7 @@ fn logrus_to_slog_level(logrus_level: &str) -> Result<slog::Level> {
"trace" => slog::Level::Trace,
_ => {
return Err(anyhow!(ERR_INVALID_LOG_LEVEL));
return Err(anyhow!("invalid log level"));
}
};
@@ -208,41 +180,41 @@ fn logrus_to_slog_level(logrus_level: &str) -> Result<slog::Level> {
}
fn get_log_level(param: &str) -> Result<slog::Level> {
let fields: Vec<&str> = param.split('=').collect();
let fields: Vec<&str> = param.split("=").collect();
if fields.len() != 2 {
return Err(anyhow!(ERR_INVALID_LOG_LEVEL_PARAM));
return Err(anyhow!("invalid log level parameter"));
}
if fields[0] != LOG_LEVEL_OPTION {
Err(anyhow!(ERR_INVALID_LOG_LEVEL_KEY))
Err(anyhow!("invalid log level key name"))
} else {
Ok(logrus_to_slog_level(fields[1])?)
}
}
fn get_hotplug_timeout(param: &str) -> Result<time::Duration> {
let fields: Vec<&str> = param.split('=').collect();
let fields: Vec<&str> = param.split("=").collect();
if fields.len() != 2 {
return Err(anyhow!(ERR_INVALID_HOTPLUG_TIMEOUT));
return Err(anyhow!("invalid hotplug timeout parameter"));
}
let key = fields[0];
if key != HOTPLUG_TIMOUT_OPTION {
return Err(anyhow!(ERR_INVALID_HOTPLUG_TIMEOUT_KEY));
return Err(anyhow!("invalid hotplug timeout key name"));
}
let value = fields[1].parse::<u64>();
if value.is_err() {
return Err(anyhow!(ERR_INVALID_HOTPLUG_TIMEOUT_PARAM));
return Err(anyhow!("unable to parse hotplug timeout"));
}
Ok(time::Duration::from_secs(value.unwrap()))
}
fn get_bool_value(param: &str) -> Result<bool> {
let fields: Vec<&str> = param.split('=').collect();
let fields: Vec<&str> = param.split("=").collect();
if fields.len() != 2 {
return Ok(false);
@@ -253,58 +225,36 @@ fn get_bool_value(param: &str) -> Result<bool> {
// first try to parse as bool value
v.parse::<bool>().or_else(|_err1| {
// then try to parse as integer value
v.parse::<u64>().or(Ok(0)).map(|v| !matches!(v, 0))
v.parse::<u64>().or_else(|_err2| Ok(0)).and_then(|v| {
// only `0` returns false, otherwise returns true
Ok(match v {
0 => false,
_ => true,
})
})
})
}
// Return the value from a "name=value" string.
//
// Note:
//
// - A name *and* a value is required.
// - A value can contain any number of equal signs.
// - We could/should maybe check if the name is pure whitespace
// since this is considered to be invalid.
fn get_string_value(param: &str) -> Result<String> {
let fields: Vec<&str> = param.split('=').collect();
if fields.len() < 2 {
return Err(anyhow!(ERR_INVALID_GET_VALUE_PARAM));
}
// We need name (but the value can be blank)
if fields[0].is_empty() {
return Err(anyhow!(ERR_INVALID_GET_VALUE_NO_NAME));
}
let value = fields[1..].join("=");
if value.is_empty() {
return Err(anyhow!(ERR_INVALID_GET_VALUE_NO_VALUE));
}
Ok(value)
}
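
The rules in the comment above `get_string_value` (a name and a value are both required, and the value may itself contain equal signs) can also be expressed with `str::split_once`. This is an alternative sketch, not the code from either side of the diff; `name_value` is a hypothetical helper and it returns plain `&'static str` errors instead of `anyhow` errors to stay dependency-free.

```rust
/// Split "name=value" at the first '=' only, so the value may contain '='.
fn name_value(param: &str) -> Result<(&str, &str), &'static str> {
    let (name, value) = param.split_once('=').ok_or("expected name=value")?;
    if name.is_empty() {
        return Err("name=value parameter missing name");
    }
    if value.is_empty() {
        return Err("name=value parameter missing value");
    }
    Ok((name, value))
}
```

Splitting once at the first `=` yields the same value as splitting on every `=` and re-joining `fields[1..]` with `"="`, but avoids the intermediate `Vec` and `String`.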
fn get_container_pipe_size(param: &str) -> Result<i32> {
let fields: Vec<&str> = param.split('=').collect();
let fields: Vec<&str> = param.split("=").collect();
if fields.len() != 2 {
return Err(anyhow!(ERR_INVALID_CONTAINER_PIPE_SIZE));
return Err(anyhow!("invalid container pipe size parameter"));
}
let key = fields[0];
if key != CONTAINER_PIPE_SIZE_OPTION {
return Err(anyhow!(ERR_INVALID_CONTAINER_PIPE_SIZE_KEY));
return Err(anyhow!("invalid container pipe size key name"));
}
let res = fields[1].parse::<i32>();
if res.is_err() {
return Err(anyhow!(ERR_INVALID_CONTAINER_PIPE_SIZE_PARAM));
return Err(anyhow!("unable to parse container pipe size"));
}
let value = res.unwrap();
if value < 0 {
return Err(anyhow!(ERR_INVALID_CONTAINER_PIPE_NEGATIVE));
return Err(anyhow!("container pipe size should not be negative"));
}
Ok(value)
@@ -319,6 +269,19 @@ mod tests {
use std::time;
use tempfile::tempdir;
const ERR_INVALID_LOG_LEVEL: &str = "invalid log level";
const ERR_INVALID_LOG_LEVEL_PARAM: &str = "invalid log level parameter";
const ERR_INVALID_LOG_LEVEL_KEY: &str = "invalid log level key name";
const ERR_INVALID_HOTPLUG_TIMEOUT: &str = "invalid hotplug timeout parameter";
const ERR_INVALID_HOTPLUG_TIMEOUT_PARAM: &str = "unable to parse hotplug timeout";
const ERR_INVALID_HOTPLUG_TIMEOUT_KEY: &str = "invalid hotplug timeout key name";
const ERR_INVALID_CONTAINER_PIPE_SIZE: &str = "invalid container pipe size parameter";
const ERR_INVALID_CONTAINER_PIPE_SIZE_PARAM: &str = "unable to parse container pipe size";
const ERR_INVALID_CONTAINER_PIPE_SIZE_KEY: &str = "invalid container pipe size key name";
const ERR_INVALID_CONTAINER_PIPE_NEGATIVE: &str = "container pipe size should not be negative";
// helper function to make errors less crazy-long
fn make_err(desc: &str) -> Error {
anyhow!(desc.to_string())
@@ -334,25 +297,22 @@ mod tests {
if $expected_result.is_ok() {
let expected_level = $expected_result.as_ref().unwrap();
let actual_level = $actual_result.unwrap();
assert!(*expected_level == actual_level, "{}", $msg);
assert!(*expected_level == actual_level, $msg);
} else {
let expected_error = $expected_result.as_ref().unwrap_err();
let actual_error = $actual_result.unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", actual_error);
if let Err(actual_error) = $actual_result {
let actual_error_msg = format!("{:?}", actual_error);
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
} else {
assert!(expected_error_msg == "expected error, got OK", "{}", $msg);
}
assert!(expected_error_msg == actual_error_msg, $msg);
}
};
}
#[test]
fn test_new() {
let config = AgentConfig::new();
let config = agentConfig::new();
assert_eq!(config.debug_console, false);
assert_eq!(config.dev_mode, false);
assert_eq!(config.log_level, DEFAULT_LOG_LEVEL);
@@ -361,550 +321,297 @@ mod tests {
#[test]
fn test_parse_cmdline() {
const TEST_SERVER_ADDR: &str = "vsock://-1:1024";
#[derive(Debug)]
struct TestData<'a> {
contents: &'a str,
env_vars: Vec<&'a str>,
debug_console: bool,
dev_mode: bool,
log_level: slog::Level,
hotplug_timeout: time::Duration,
container_pipe_size: i32,
server_addr: &'a str,
unified_cgroup_hierarchy: bool,
}
let tests = &[
TestData {
contents: "agent.debug_consolex agent.devmode",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.debug_console agent.devmodex",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.logx=debug",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.log=debug",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: slog::Level::Debug,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.log=debug",
env_vars: vec!["KATA_AGENT_LOG_LEVEL=trace"],
debug_console: false,
dev_mode: false,
log_level: slog::Level::Trace,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo bar",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo bar",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent bar",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo debug_console agent bar devmode",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.debug_console",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: " agent.debug_console ",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.debug_console foo",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: " agent.debug_console foo",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent.debug_console bar",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent.debug_console",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent.debug_console ",
env_vars: Vec::new(),
debug_console: true,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: " agent.devmode ",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode foo",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: " agent.devmode foo",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent.devmode bar",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent.devmode",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "foo agent.devmode ",
env_vars: Vec::new(),
debug_console: false,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode agent.debug_console",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode agent.debug_console agent.hotplug_timeout=100 agent.unified_cgroup_hierarchy=a",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: time::Duration::from_secs(100),
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode agent.debug_console agent.hotplug_timeout=0 agent.unified_cgroup_hierarchy=11",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: true,
},
TestData {
contents: "agent.devmode agent.debug_console agent.container_pipe_size=2097152 agent.unified_cgroup_hierarchy=false",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: 2097152,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode agent.debug_console agent.container_pipe_size=100 agent.unified_cgroup_hierarchy=true",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: 100,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: true,
},
TestData {
contents: "agent.devmode agent.debug_console agent.container_pipe_size=0 agent.unified_cgroup_hierarchy=0",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.devmode agent.debug_console agent.container_pip_siz=100 agent.unified_cgroup_hierarchy=1",
env_vars: Vec::new(),
debug_console: true,
dev_mode: true,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: true,
},
TestData {
contents: "",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_SERVER_ADDR=foo"],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "foo",
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_SERVER_ADDR=="],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "=",
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_SERVER_ADDR==foo"],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "=foo",
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_SERVER_ADDR=foo=bar=baz="],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "foo=bar=baz=",
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_SERVER_ADDR=unix:///tmp/foo.socket"],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "unix:///tmp/foo.socket",
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_SERVER_ADDR=unix://@/tmp/foo.socket"],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "unix://@/tmp/foo.socket",
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_LOG_LEVEL="],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_LOG_LEVEL=invalid"],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_LOG_LEVEL=debug"],
debug_console: false,
dev_mode: false,
log_level: slog::Level::Debug,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "",
env_vars: vec!["KATA_AGENT_LOG_LEVEL=debugger"],
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "server_addr=unix:///tmp/foo.socket",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.server_address=unix:///tmp/foo.socket",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: TEST_SERVER_ADDR,
unified_cgroup_hierarchy: false,
},
TestData {
contents: "agent.server_addr=unix:///tmp/foo.socket",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "unix:///tmp/foo.socket",
unified_cgroup_hierarchy: false,
},
TestData {
contents: " agent.server_addr=unix:///tmp/foo.socket",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "unix:///tmp/foo.socket",
unified_cgroup_hierarchy: false,
},
TestData {
contents: " agent.server_addr=unix:///tmp/foo.socket a",
env_vars: Vec::new(),
debug_console: false,
dev_mode: false,
log_level: DEFAULT_LOG_LEVEL,
hotplug_timeout: DEFAULT_HOTPLUG_TIMEOUT,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: "unix:///tmp/foo.socket",
unified_cgroup_hierarchy: false,
},
];
let dir = tempdir().expect("failed to create tmpdir");
@@ -914,12 +621,11 @@ mod tests {
let filename = file_path.to_str().expect("failed to create filename");
let mut config = AgentConfig::new();
let mut config = agentConfig::new();
let result = config.parse_cmdline(&filename.to_owned());
assert!(result.is_err());
// Now, test various combinations of file contents and environment
// variables.
// Now, test various combinations of file contents
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
@@ -928,25 +634,12 @@ mod tests {
let filename = file_path.to_str().expect("failed to create filename");
let mut file =
File::create(filename).unwrap_or_else(|_| panic!("{}: failed to create file", msg));
File::create(filename).expect(&format!("{}: failed to create file", msg));
file.write_all(d.contents.as_bytes())
.unwrap_or_else(|_| panic!("{}: failed to write file contents", msg));
.expect(&format!("{}: failed to write file contents", msg));
let mut vars_to_unset = Vec::new();
for v in &d.env_vars {
let fields: Vec<&str> = v.split('=').collect();
let name = fields[0];
let value = fields[1..].join("=");
env::set_var(name, value);
vars_to_unset.push(name);
}
let mut config = AgentConfig::new();
let mut config = agentConfig::new();
assert_eq!(config.debug_console, false, "{}", msg);
assert_eq!(config.dev_mode, false, "{}", msg);
assert_eq!(config.unified_cgroup_hierarchy, false, "{}", msg);
@@ -957,7 +650,6 @@ mod tests {
msg
);
assert_eq!(config.container_pipe_size, 0, "{}", msg);
assert_eq!(config.server_addr, TEST_SERVER_ADDR, "{}", msg);
let result = config.parse_cmdline(filename);
assert!(result.is_ok(), "{}", msg);
@@ -972,11 +664,6 @@ mod tests {
assert_eq!(d.log_level, config.log_level, "{}", msg);
assert_eq!(d.hotplug_timeout, config.hotplug_timeout, "{}", msg);
assert_eq!(d.container_pipe_size, config.container_pipe_size, "{}", msg);
assert_eq!(d.server_addr, config.server_addr, "{}", msg);
for v in vars_to_unset {
env::remove_var(v);
}
}
}
@@ -1050,7 +737,7 @@ mod tests {
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
assert_result!(d.result, result, format!("{}", msg));
}
}
@@ -1144,7 +831,7 @@ mod tests {
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
assert_result!(d.result, result, format!("{}", msg));
}
}
@@ -1214,7 +901,7 @@ mod tests {
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
assert_result!(d.result, result, format!("{}", msg));
}
}
@@ -1288,85 +975,7 @@ mod tests {
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[test]
fn test_get_string_value() {
#[derive(Debug)]
struct TestData<'a> {
param: &'a str,
result: Result<String>,
}
let tests = &[
TestData {
param: "",
result: Err(make_err(ERR_INVALID_GET_VALUE_PARAM)),
},
TestData {
param: "=",
result: Err(make_err(ERR_INVALID_GET_VALUE_NO_NAME)),
},
TestData {
param: "==",
result: Err(make_err(ERR_INVALID_GET_VALUE_NO_NAME)),
},
TestData {
param: "x=",
result: Err(make_err(ERR_INVALID_GET_VALUE_NO_VALUE)),
},
TestData {
param: "x==",
result: Ok("=".into()),
},
TestData {
param: "x===",
result: Ok("==".into()),
},
TestData {
param: "x==x",
result: Ok("=x".into()),
},
TestData {
param: "x=x",
result: Ok("x".into()),
},
TestData {
param: "x=x=",
result: Ok("x=".into()),
},
TestData {
param: "x=x=x",
result: Ok("x=x".into()),
},
TestData {
param: "foo=bar",
result: Ok("bar".into()),
},
TestData {
param: "x= =",
result: Ok(" =".into()),
},
TestData {
param: "x= =",
result: Ok(" =".into()),
},
TestData {
param: "x= = ",
result: Ok(" = ".into()),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = get_string_value(d.param);
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
assert_result!(d.result, result, format!("{}", msg));
}
}
}
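
The `test_get_string_value` table above pins down how a `name=value` parameter is split: only the first `=` separates name from value, so the value may itself contain `=`, and an empty name or empty value is an error. Below is a minimal sketch consistent with those cases. It is not the agent's actual implementation; `ParamError` and `split_param_value` are hypothetical names standing in for the `ERR_INVALID_GET_VALUE_*` errors and `get_string_value`.

```rust
// Hypothetical error type standing in for the ERR_INVALID_GET_VALUE_* messages.
#[derive(Debug, PartialEq)]
pub enum ParamError {
    InvalidParam, // no '=' separator at all
    NoName,       // nothing before the first '='
    NoValue,      // nothing after the first '='
}

// Split `name=value` on the first '=' only, so the value may contain '='.
pub fn split_param_value(param: &str) -> Result<String, ParamError> {
    let mut fields = param.splitn(2, '=');
    let name = fields.next().unwrap_or("");
    let value = match fields.next() {
        Some(v) => v,
        None => return Err(ParamError::InvalidParam),
    };

    // The name check comes first, so "=" and "==" report a missing name.
    if name.is_empty() {
        return Err(ParamError::NoName);
    }
    if value.is_empty() {
        return Err(ParamError::NoValue);
    }

    Ok(value.to_string())
}
```

Calling `split_param_value("foo=bar")` yields `Ok("bar")`, while `split_param_value("x=")` yields `Err(ParamError::NoValue)`, matching the corresponding rows of the table.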


@@ -1,294 +0,0 @@
// Copyright (c) 2021 Ant Group
// Copyright (c) 2021 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use crate::util;
use anyhow::{anyhow, Result};
use nix::fcntl::{self, FcntlArg, FdFlag, OFlag};
use nix::libc::{STDERR_FILENO, STDIN_FILENO, STDOUT_FILENO};
use nix::pty::{openpty, OpenptyResult};
use nix::sys::socket::{self, AddressFamily, SockAddr, SockFlag, SockType};
use nix::sys::stat::Mode;
use nix::sys::wait;
use nix::unistd::{self, close, dup2, fork, setsid, ForkResult, Pid};
use rustjail::pipestream::PipeStream;
use slog::Logger;
use std::ffi::CString;
use std::os::unix::io::{FromRawFd, RawFd};
use std::path::PathBuf;
use std::process::Stdio;
use std::sync::Arc;
use std::sync::Mutex as SyncMutex;
use futures::StreamExt;
use tokio::io::{AsyncRead, AsyncWrite};
use tokio::select;
use tokio::sync::watch::Receiver;
const CONSOLE_PATH: &str = "/dev/console";
lazy_static! {
static ref SHELLS: Arc<SyncMutex<Vec<String>>> = {
let mut v = Vec::new();
if !cfg!(test) {
v.push("/bin/bash".to_string());
v.push("/bin/sh".to_string());
}
Arc::new(SyncMutex::new(v))
};
}
pub fn initialize() {
lazy_static::initialize(&SHELLS);
}
pub async fn debug_console_handler(
logger: Logger,
port: u32,
mut shutdown: Receiver<bool>,
) -> Result<()> {
let logger = logger.new(o!("subsystem" => "debug-console"));
let shells = SHELLS.lock().unwrap().to_vec();
let shell = shells
.into_iter()
.find(|sh| PathBuf::from(sh).exists())
.ok_or_else(|| anyhow!("no shell found to launch debug console"))?;
if port > 0 {
let listenfd = socket::socket(
AddressFamily::Vsock,
SockType::Stream,
SockFlag::SOCK_CLOEXEC,
None,
)?;
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, port);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;
let mut incoming = util::get_vsock_incoming(listenfd);
loop {
select! {
_ = shutdown.changed() => {
info!(logger, "debug console got shutdown request");
break;
}
conn = incoming.next() => {
if let Some(conn) = conn {
// Accept a new connection
match conn {
Ok(stream) => {
let logger = logger.clone();
let shell = shell.clone();
// Do not block(await) here, or we'll never receive the shutdown signal
tokio::spawn(async move {
let _ = run_debug_console_vsock(logger, shell, stream).await;
});
}
Err(e) => {
error!(logger, "{:?}", e);
}
}
} else {
break;
}
}
}
}
} else {
let mut flags = OFlag::empty();
flags.insert(OFlag::O_RDWR);
flags.insert(OFlag::O_CLOEXEC);
let fd = fcntl::open(CONSOLE_PATH, flags, Mode::empty())?;
select! {
_ = shutdown.changed() => {
info!(logger, "debug console got shutdown request");
}
result = run_debug_console_serial(shell.clone(), fd) => {
match result {
Ok(_) => {
info!(logger, "run_debug_console_shell session finished");
}
Err(err) => {
error!(logger, "run_debug_console_shell failed: {:?}", err);
}
}
}
}
};
Ok(())
}
fn run_in_child(slave_fd: libc::c_int, shell: String) -> Result<()> {
// create new session with child as session leader
setsid()?;
// dup stdin, stdout, stderr to let child act as a terminal
dup2(slave_fd, STDIN_FILENO)?;
dup2(slave_fd, STDOUT_FILENO)?;
dup2(slave_fd, STDERR_FILENO)?;
// set tty
unsafe {
libc::ioctl(0, libc::TIOCSCTTY);
}
let cmd = CString::new(shell).unwrap();
// run shell
let _ = unistd::execvp(cmd.as_c_str(), &[]).map_err(|e| match e {
nix::Error::Sys(errno) => {
std::process::exit(errno as i32);
}
_ => std::process::exit(-2),
});
Ok(())
}
async fn run_in_parent<T: AsyncRead + AsyncWrite>(
logger: Logger,
stream: T,
pseudo: OpenptyResult,
child_pid: Pid,
) -> Result<()> {
info!(logger, "get debug shell pid {:?}", child_pid);
let master_fd = pseudo.master;
let _ = close(pseudo.slave);
let (mut socket_reader, mut socket_writer) = tokio::io::split(stream);
let (mut master_reader, mut master_writer) = tokio::io::split(PipeStream::from_fd(master_fd));
select! {
res = tokio::io::copy(&mut master_reader, &mut socket_writer) => {
debug!(
logger,
"master closed: {:?}", res
);
}
res = tokio::io::copy(&mut socket_reader, &mut master_writer) => {
info!(
logger,
"socket closed: {:?}", res
);
}
}
let wait_status = wait::waitpid(child_pid, None);
info!(logger, "debug console process exit code: {:?}", wait_status);
Ok(())
}
async fn run_debug_console_vsock<T: AsyncRead + AsyncWrite>(
logger: Logger,
shell: String,
stream: T,
) -> Result<()> {
let logger = logger.new(o!("subsystem" => "debug-console-shell"));
let pseudo = openpty(None, None)?;
let _ = fcntl::fcntl(pseudo.master, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
let _ = fcntl::fcntl(pseudo.slave, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
let slave_fd = pseudo.slave;
match fork() {
Ok(ForkResult::Child) => run_in_child(slave_fd, shell),
Ok(ForkResult::Parent { child: child_pid }) => {
run_in_parent(logger.clone(), stream, pseudo, child_pid).await
}
Err(err) => Err(anyhow!("fork error: {:?}", err)),
}
}
async fn run_debug_console_serial(shell: String, fd: RawFd) -> Result<()> {
let mut child = match tokio::process::Command::new(shell)
.arg("-i")
.kill_on_drop(true)
.stdin(unsafe { Stdio::from_raw_fd(fd) })
.stdout(unsafe { Stdio::from_raw_fd(fd) })
.stderr(unsafe { Stdio::from_raw_fd(fd) })
.spawn()
{
Ok(c) => c,
Err(_) => return Err(anyhow!("failed to spawn shell")),
};
child.wait().await?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
use tokio::sync::watch;
#[tokio::test]
async fn test_setup_debug_console_no_shells() {
{
// Guarantee no shells have been added
// (required to avoid racing with
// test_setup_debug_console_invalid_shell()).
let shells_ref = SHELLS.clone();
let mut shells = shells_ref.lock().unwrap();
shells.clear();
}
let logger = slog_scope::logger();
let (_, rx) = watch::channel(true);
let result = debug_console_handler(logger, 0, rx).await;
assert!(result.is_err());
assert_eq!(
result.unwrap_err().to_string(),
"no shell found to launch debug console"
);
}
#[tokio::test]
async fn test_setup_debug_console_invalid_shell() {
{
let shells_ref = SHELLS.clone();
let mut shells = shells_ref.lock().unwrap();
let dir = tempdir().expect("failed to create tmpdir");
// Add an invalid shell
let shell = dir
.path()
.join("enoent")
.to_str()
.expect("failed to construct shell path")
.to_string();
shells.push(shell);
}
let logger = slog_scope::logger();
let (_, rx) = watch::channel(true);
let result = debug_console_handler(logger, 0, rx).await;
assert!(result.is_err());
assert_eq!(
result.unwrap_err().to_string(),
"no shell found to launch debug console"
);
}
}
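
Both tests above exercise the shell lookup in `debug_console_handler`: the handler takes the first configured shell path that exists and fails with "no shell found to launch debug console" otherwise. The lookup on its own can be sketched as follows, assuming only the standard library; `first_existing_shell` is a hypothetical helper name, not a function in the agent.

```rust
use std::path::Path;

// Return the first candidate path that exists on this filesystem, if any.
// The order of `candidates` is the order of preference.
pub fn first_existing_shell(candidates: &[&str]) -> Option<String> {
    candidates
        .iter()
        .copied()
        .find(|sh| Path::new(sh).exists())
        .map(|sh| sh.to_string())
}
```

An empty list and a list holding only a nonexistent path both return `None`, which corresponds to the two error cases the tests assert on.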


@@ -5,20 +5,16 @@
use libc::{c_uint, major, minor};
use nix::sys::stat;
use regex::Regex;
use std::collections::HashMap;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::str::FromStr;
use std::sync::Arc;
use tokio::sync::Mutex;
use std::sync::{mpsc, Arc, Mutex};
use crate::linux_abi::*;
use crate::mount::{DRIVER_BLK_TYPE, DRIVER_MMIO_BLK_TYPE, DRIVER_NVDIMM_TYPE, DRIVER_SCSI_TYPE};
use crate::pci;
use crate::mount::{DRIVERBLKTYPE, DRIVERMMIOBLKTYPE, DRIVERNVDIMMTYPE, DRIVERSCSITYPE};
use crate::sandbox::Sandbox;
use crate::uevent::{wait_for_uevent, Uevent, UeventMatcher};
use crate::{AGENT_CONFIG, GLOBAL_DEVICE_WATCHER};
use anyhow::{anyhow, Result};
use oci::{LinuxDeviceCgroup, LinuxResources, Spec};
use protocols::agent::Device;
@@ -39,6 +35,22 @@ struct DevIndexEntry {
struct DevIndex(HashMap<String, DevIndexEntry>);
// DeviceHandler is the type of callback to be defined to handle every type of device driver.
type DeviceHandler = fn(&Device, &mut Spec, &Arc<Mutex<Sandbox>>, &DevIndex) -> Result<()>;
// DeviceHandlerList lists the supported drivers.
#[cfg_attr(rustfmt, rustfmt_skip)]
lazy_static! {
static ref DEVICEHANDLERLIST: HashMap<&'static str, DeviceHandler> = {
let mut m: HashMap<&'static str, DeviceHandler> = HashMap::new();
m.insert(DRIVERBLKTYPE, virtio_blk_device_handler);
m.insert(DRIVERMMIOBLKTYPE, virtiommio_blk_device_handler);
m.insert(DRIVERNVDIMMTYPE, virtio_nvdimm_device_handler);
m.insert(DRIVERSCSITYPE, virtio_scsi_device_handler);
m
};
}
pub fn rescan_pci_bus() -> Result<()> {
online_device(SYSFS_PCI_BUS_RESCAN_FILE)
}
@@ -48,161 +60,112 @@ pub fn online_device(path: &str) -> Result<()> {
Ok(())
}
// pcipath_to_sysfs fetches the sysfs path for a PCI path, relative to
// the sysfs path for the PCI host bridge, based on the PCI path
// provided.
fn pcipath_to_sysfs(root_bus_sysfs: &str, pcipath: &pci::Path) -> Result<String> {
let mut bus = "0000:00".to_string();
let mut relpath = String::new();
// get_pci_device_address fetches the complete PCI address in sysfs, based on the PCI
// identifier provided. This should be in the format: "bridgeAddr/deviceAddr".
// Here, bridgeAddr is the address at which the bridge is attached on the root bus,
// while deviceAddr is the address at which the device is attached on the bridge.
fn get_pci_device_address(pci_id: &str) -> Result<String> {
let tokens: Vec<&str> = pci_id.split("/").collect();
for i in 0..pcipath.len() {
let bdf = format!("{}:{}.0", bus, pcipath[i]);
relpath = format!("{}/{}", relpath, bdf);
if i == pcipath.len() - 1 {
// Final device need not be a bridge
break;
}
// Find out the bus exposed by bridge
let bridgebuspath = format!("{}{}/pci_bus", root_bus_sysfs, relpath);
let mut files: Vec<_> = fs::read_dir(&bridgebuspath)?.collect();
if files.len() != 1 {
return Err(anyhow!(
"Expected exactly one PCI bus in {}, got {} instead",
bridgebuspath,
files.len()
));
}
// unwrap is safe, because of the length test above
let busfile = files.pop().unwrap()?;
bus = busfile
.file_name()
.into_string()
.map_err(|e| anyhow!("Bad filename under {}: {:?}", &bridgebuspath, e))?;
}
Ok(relpath)
}
// FIXME: This matcher is only correct if the guest has at most one
// SCSI host.
#[derive(Debug)]
struct ScsiBlockMatcher {
search: String,
}
impl ScsiBlockMatcher {
fn new(scsi_addr: &str) -> ScsiBlockMatcher {
let search = format!(r"/0:0:{}/block/", scsi_addr);
ScsiBlockMatcher { search }
}
}
impl UeventMatcher for ScsiBlockMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == "block" && uev.devpath.contains(&self.search) && !uev.devname.is_empty()
}
}
pub async fn get_scsi_device_name(
sandbox: &Arc<Mutex<Sandbox>>,
scsi_addr: &str,
) -> Result<String> {
let matcher = ScsiBlockMatcher::new(scsi_addr);
scan_scsi_bus(scsi_addr)?;
let uev = wait_for_uevent(sandbox, matcher).await?;
Ok(format!("{}/{}", SYSTEM_DEV_PATH, &uev.devname))
}
#[derive(Debug)]
struct VirtioBlkPciMatcher {
rex: Regex,
}
impl VirtioBlkPciMatcher {
fn new(relpath: &str) -> VirtioBlkPciMatcher {
let root_bus = create_pci_root_bus_path();
let re = format!(r"^{}{}/virtio[0-9]+/block/", root_bus, relpath);
VirtioBlkPciMatcher {
rex: Regex::new(&re).unwrap(),
}
}
}
impl UeventMatcher for VirtioBlkPciMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == "block" && self.rex.is_match(&uev.devpath) && !uev.devname.is_empty()
}
}
pub async fn get_virtio_blk_pci_device_name(
sandbox: &Arc<Mutex<Sandbox>>,
pcipath: &pci::Path,
) -> Result<String> {
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path());
let sysfs_rel_path = pcipath_to_sysfs(&root_bus_sysfs, pcipath)?;
let matcher = VirtioBlkPciMatcher::new(&sysfs_rel_path);
rescan_pci_bus()?;
let uev = wait_for_uevent(sandbox, matcher).await?;
Ok(format!("{}/{}", SYSTEM_DEV_PATH, &uev.devname))
}
#[derive(Debug)]
struct PmemBlockMatcher {
suffix: String,
}
impl PmemBlockMatcher {
fn new(devname: &str) -> PmemBlockMatcher {
let suffix = format!(r"/block/{}", devname);
PmemBlockMatcher { suffix }
}
}
impl UeventMatcher for PmemBlockMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == "block"
&& uev.devpath.starts_with(ACPI_DEV_PATH)
&& uev.devpath.ends_with(&self.suffix)
&& !uev.devname.is_empty()
}
}
pub async fn wait_for_pmem_device(sandbox: &Arc<Mutex<Sandbox>>, devpath: &str) -> Result<()> {
let devname = match devpath.strip_prefix("/dev/") {
Some(dev) => dev,
None => {
return Err(anyhow!(
"Storage source '{}' must start with /dev/",
devpath
))
}
};
let matcher = PmemBlockMatcher::new(devname);
let uev = wait_for_uevent(sandbox, matcher).await?;
if uev.devname != devname {
if tokens.len() != 2 {
return Err(anyhow!(
"Unexpected device name {} for pmem device (expected {})",
uev.devname,
devname
"PCI Identifier for device should be of format [bridgeAddr/deviceAddr], got {}",
pci_id
));
}
Ok(())
let bridge_id = tokens[0];
let device_id = tokens[1];
// Deduce the complete bridge address based on the bridge address identifier passed
// and the fact that bridges are attached on the main bus with function 0.
let pci_bridge_addr = format!("0000:00:{}.0", bridge_id);
// Find out the bus exposed by bridge
let bridge_bus_path = format!("{}/{}/pci_bus/", SYSFS_PCI_BUS_PREFIX, pci_bridge_addr);
let files_slice: Vec<_> = fs::read_dir(&bridge_bus_path)
.unwrap()
.map(|res| res.unwrap().path())
.collect();
let bus_num = files_slice.len();
if bus_num != 1 {
return Err(anyhow!(
"Expected an entry for bus in {}, got {} entries instead",
bridge_bus_path,
bus_num
));
}
let bus = files_slice[0].file_name().unwrap().to_str().unwrap();
// Device address is based on the bus of the bridge to which it is attached.
// We do not pass devices as multifunction, hence the trailing 0 in the address.
let pci_device_addr = format!("{}:{}.0", bus, device_id);
let bridge_device_pci_addr = format!("{}/{}", pci_bridge_addr, pci_device_addr);
info!(
sl!(),
"Fetched PCI address for device PCIAddr:{}\n", bridge_device_pci_addr
);
Ok(bridge_device_pci_addr)
}
fn get_device_name(sandbox: &Arc<Mutex<Sandbox>>, dev_addr: &str) -> Result<String> {
// Keep the same lock order as uevent::handle_block_add_event(), otherwise it may cause deadlock.
let mut w = GLOBAL_DEVICE_WATCHER.lock().unwrap();
let sb = sandbox.lock().unwrap();
for (key, value) in sb.pci_device_map.iter() {
if key.contains(dev_addr) {
info!(sl!(), "Device {} found in pci device map", dev_addr);
return Ok(format!("{}/{}", SYSTEM_DEV_PATH, value));
}
}
drop(sb);
// If device is not found in the device map, hotplug event has not
// been received yet, create and add channel to the watchers map.
// The key of the watchers map is the device we are interested in.
// Note this is done inside the lock, not to miss any events from the
// global udev listener.
let (tx, rx) = mpsc::channel::<String>();
w.insert(dev_addr.to_string(), tx);
drop(w);
info!(sl!(), "Waiting on channel for device notification\n");
let hotplug_timeout = AGENT_CONFIG.read().unwrap().hotplug_timeout;
let dev_name = rx.recv_timeout(hotplug_timeout).map_err(|_| {
GLOBAL_DEVICE_WATCHER.lock().unwrap().remove_entry(dev_addr);
anyhow!(
"Timeout reached after {:?} waiting for device {}",
hotplug_timeout,
dev_addr
)
})?;
Ok(format!("{}/{}", SYSTEM_DEV_PATH, &dev_name))
}
pub fn get_scsi_device_name(sandbox: &Arc<Mutex<Sandbox>>, scsi_addr: &str) -> Result<String> {
let dev_sub_path = format!("{}{}/{}", SCSI_HOST_CHANNEL, scsi_addr, SCSI_BLOCK_SUFFIX);
scan_scsi_bus(scsi_addr)?;
get_device_name(sandbox, &dev_sub_path)
}
pub fn get_pci_device_name(sandbox: &Arc<Mutex<Sandbox>>, pci_id: &str) -> Result<String> {
let pci_addr = get_pci_device_address(pci_id)?;
rescan_pci_bus()?;
get_device_name(sandbox, &pci_addr)
}
/// Scan SCSI bus for the given SCSI address(SCSI-Id and LUN)
fn scan_scsi_bus(scsi_addr: &str) -> Result<()> {
let tokens: Vec<&str> = scsi_addr.split(':').collect();
let tokens: Vec<&str> = scsi_addr.split(":").collect();
if tokens.len() != 2 {
return Err(anyhow!(
"Unexpected format for SCSI Address: {}, expect SCSIID:LUN",
@@ -241,7 +204,7 @@ fn update_spec_device_list(device: &Device, spec: &mut Spec, devidx: &DevIndex)
// If no container_path is provided, we won't be able to match and
// update the device in the OCI spec device list. This is an error.
if device.container_path.is_empty() {
if device.container_path == "" {
return Err(anyhow!(
"container_path cannot be empty for device {:?}",
device
@@ -311,53 +274,58 @@ fn update_spec_device_list(device: &Device, spec: &mut Spec, devidx: &DevIndex)
// device.Id should be the predicted device name (vda, vdb, ...)
// device.VmPath already provides a way to send it in
async fn virtiommio_blk_device_handler(
fn virtiommio_blk_device_handler(
device: &Device,
spec: &mut Spec,
_sandbox: &Arc<Mutex<Sandbox>>,
devidx: &DevIndex,
) -> Result<()> {
if device.vm_path.is_empty() {
if device.vm_path == "" {
return Err(anyhow!("Invalid path for virtio mmio blk device"));
}
update_spec_device_list(device, spec, devidx)
}
// device.Id should be a PCI path string
async fn virtio_blk_device_handler(
// device.Id should be the PCI address in the format "bridgeAddr/deviceAddr".
// Here, bridgeAddr is the address at which the bridge is attached on the root bus,
// while deviceAddr is the address at which the device is attached on the bridge.
fn virtio_blk_device_handler(
device: &Device,
spec: &mut Spec,
sandbox: &Arc<Mutex<Sandbox>>,
devidx: &DevIndex,
) -> Result<()> {
let mut dev = device.clone();
let pcipath = pci::Path::from_str(&device.id)?;
dev.vm_path = get_virtio_blk_pci_device_name(sandbox, &pcipath).await?;
// When "Id (PCIAddr)" is not set, fall back to the predicted "VmPath" passed from kata-runtime.
// Note: this is a special code path for cloud-hypervisor, used when BDF information is not available.
if device.id != "" {
dev.vm_path = get_pci_device_name(sandbox, &device.id)?;
}
update_spec_device_list(&dev, spec, devidx)
}
// device.Id should be the SCSI address of the disk in the format "scsiID:lunID"
async fn virtio_scsi_device_handler(
fn virtio_scsi_device_handler(
device: &Device,
spec: &mut Spec,
sandbox: &Arc<Mutex<Sandbox>>,
devidx: &DevIndex,
) -> Result<()> {
let mut dev = device.clone();
dev.vm_path = get_scsi_device_name(sandbox, &device.id).await?;
dev.vm_path = get_scsi_device_name(sandbox, &device.id)?;
update_spec_device_list(&dev, spec, devidx)
}
async fn virtio_nvdimm_device_handler(
fn virtio_nvdimm_device_handler(
device: &Device,
spec: &mut Spec,
_sandbox: &Arc<Mutex<Sandbox>>,
devidx: &DevIndex,
) -> Result<()> {
if device.vm_path.is_empty() {
if device.vm_path == "" {
return Err(anyhow!("Invalid path for nvdimm device"));
}
@@ -368,11 +336,11 @@ impl DevIndex {
fn new(spec: &Spec) -> DevIndex {
let mut map = HashMap::new();
if let Some(linux) = spec.linux.as_ref() {
for linux in spec.linux.as_ref() {
for (i, d) in linux.devices.iter().enumerate() {
let mut residx = Vec::new();
if let Some(linuxres) = linux.resources.as_ref() {
for linuxres in linux.resources.as_ref() {
for (j, r) in linuxres.devices.iter().enumerate() {
if r.r#type == d.r#type
&& r.major == Some(d.major)
@@ -389,7 +357,7 @@ impl DevIndex {
}
}
pub async fn add_devices(
pub fn add_devices(
devices: &[Device],
spec: &mut Spec,
sandbox: &Arc<Mutex<Sandbox>>,
@@ -397,13 +365,13 @@ pub async fn add_devices(
let devidx = DevIndex::new(spec);
for device in devices.iter() {
add_device(device, spec, sandbox, &devidx).await?;
add_device(device, spec, sandbox, &devidx)?;
}
Ok(())
}
async fn add_device(
fn add_device(
device: &Device,
spec: &mut Spec,
sandbox: &Arc<Mutex<Sandbox>>,
@@ -413,24 +381,21 @@ async fn add_device(
info!(sl!(), "device-id: {}, device-type: {}, device-vm-path: {}, device-container-path: {}, device-options: {:?}",
device.id, device.field_type, device.vm_path, device.container_path, device.options);
if device.field_type.is_empty() {
if device.field_type == "" {
return Err(anyhow!("invalid type for device {:?}", device));
}
if device.id.is_empty() && device.vm_path.is_empty() {
if device.id == "" && device.vm_path == "" {
return Err(anyhow!("invalid ID and VM path for device {:?}", device));
}
if device.container_path.is_empty() {
if device.container_path == "" {
return Err(anyhow!("invalid container path for device {:?}", device));
}
match device.field_type.as_str() {
DRIVER_BLK_TYPE => virtio_blk_device_handler(device, spec, sandbox, devidx).await,
DRIVER_MMIO_BLK_TYPE => virtiommio_blk_device_handler(device, spec, sandbox, devidx).await,
DRIVER_NVDIMM_TYPE => virtio_nvdimm_device_handler(device, spec, sandbox, devidx).await,
DRIVER_SCSI_TYPE => virtio_scsi_device_handler(device, spec, sandbox, devidx).await,
_ => Err(anyhow!("Unknown device type {}", device.field_type)),
match DEVICEHANDLERLIST.get(device.field_type.as_str()) {
None => Err(anyhow!("Unknown device type {}", device.field_type)),
Some(dev_handler) => dev_handler(device, spec, sandbox, devidx),
}
}
@@ -467,16 +432,13 @@ pub fn update_device_cgroup(spec: &mut Spec) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::uevent::spawn_test_watcher;
use oci::Linux;
use tempfile::tempdir;
#[test]
fn test_update_device_cgroup() {
let mut spec = Spec {
linux: Some(Linux::default()),
..Default::default()
};
let mut spec = Spec::default();
spec.linux = Some(Linux::default());
update_device_cgroup(&mut spec).unwrap();
@@ -750,171 +712,4 @@ mod tests {
assert_eq!(Some(host_major), specresources.devices[1].major);
assert_eq!(Some(host_minor), specresources.devices[1].minor);
}
#[test]
fn test_pcipath_to_sysfs() {
let testdir = tempdir().expect("failed to create tmpdir");
let rootbuspath = testdir.path().to_str().unwrap();
let path2 = pci::Path::from_str("02").unwrap();
let path23 = pci::Path::from_str("02/03").unwrap();
let path234 = pci::Path::from_str("02/03/04").unwrap();
let relpath = pcipath_to_sysfs(rootbuspath, &path2);
assert_eq!(relpath.unwrap(), "/0000:00:02.0");
let relpath = pcipath_to_sysfs(rootbuspath, &path23);
assert!(relpath.is_err());
let relpath = pcipath_to_sysfs(rootbuspath, &path234);
assert!(relpath.is_err());
// Create mock sysfs files for the device at 0000:00:02.0
let bridge2path = format!("{}{}", rootbuspath, "/0000:00:02.0");
fs::create_dir_all(&bridge2path).unwrap();
let relpath = pcipath_to_sysfs(rootbuspath, &path2);
assert_eq!(relpath.unwrap(), "/0000:00:02.0");
let relpath = pcipath_to_sysfs(rootbuspath, &path23);
assert!(relpath.is_err());
let relpath = pcipath_to_sysfs(rootbuspath, &path234);
assert!(relpath.is_err());
// Create mock sysfs files to indicate that 0000:00:02.0 is a bridge to bus 01
let bridge2bus = "0000:01";
let bus2path = format!("{}/pci_bus/{}", bridge2path, bridge2bus);
fs::create_dir_all(bus2path).unwrap();
let relpath = pcipath_to_sysfs(rootbuspath, &path2);
assert_eq!(relpath.unwrap(), "/0000:00:02.0");
let relpath = pcipath_to_sysfs(rootbuspath, &path23);
assert_eq!(relpath.unwrap(), "/0000:00:02.0/0000:01:03.0");
let relpath = pcipath_to_sysfs(rootbuspath, &path234);
assert!(relpath.is_err());
// Create mock sysfs files for a bridge at 0000:01:03.0 to bus 02
let bridge3path = format!("{}/0000:01:03.0", bridge2path);
let bridge3bus = "0000:02";
let bus3path = format!("{}/pci_bus/{}", bridge3path, bridge3bus);
fs::create_dir_all(bus3path).unwrap();
let relpath = pcipath_to_sysfs(rootbuspath, &path2);
assert_eq!(relpath.unwrap(), "/0000:00:02.0");
let relpath = pcipath_to_sysfs(rootbuspath, &path23);
assert_eq!(relpath.unwrap(), "/0000:00:02.0/0000:01:03.0");
let relpath = pcipath_to_sysfs(rootbuspath, &path234);
assert_eq!(relpath.unwrap(), "/0000:00:02.0/0000:01:03.0/0000:02:04.0");
}
// We use device specific variants of this for real cases, but
// they have some complications that make them troublesome to unit
// test
async fn example_get_device_name(
sandbox: &Arc<Mutex<Sandbox>>,
relpath: &str,
) -> Result<String> {
let matcher = VirtioBlkPciMatcher::new(relpath);
let uev = wait_for_uevent(sandbox, matcher).await?;
Ok(uev.devname)
}
#[tokio::test]
async fn test_get_device_name() {
let devname = "vda";
let root_bus = create_pci_root_bus_path();
let relpath = "/0000:00:0a.0/0000:03:0b.0";
let devpath = format!("{}{}/virtio4/block/{}", root_bus, relpath, devname);
let mut uev = crate::uevent::Uevent::default();
uev.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev.subsystem = "block".to_string();
uev.devpath = devpath.clone();
uev.devname = devname.to_string();
let logger = slog::Logger::root(slog::Discard, o!());
let sandbox = Arc::new(Mutex::new(Sandbox::new(&logger).unwrap()));
let mut sb = sandbox.lock().await;
sb.uevent_map.insert(devpath.clone(), uev);
drop(sb); // unlock
let name = example_get_device_name(&sandbox, relpath).await;
assert!(name.is_ok(), "{}", name.unwrap_err());
assert_eq!(name.unwrap(), devname);
let mut sb = sandbox.lock().await;
let uev = sb.uevent_map.remove(&devpath).unwrap();
drop(sb); // unlock
spawn_test_watcher(sandbox.clone(), uev);
let name = example_get_device_name(&sandbox, relpath).await;
assert!(name.is_ok(), "{}", name.unwrap_err());
assert_eq!(name.unwrap(), devname);
}
#[tokio::test]
async fn test_virtio_blk_matcher() {
let root_bus = create_pci_root_bus_path();
let devname = "vda";
let mut uev_a = crate::uevent::Uevent::default();
let relpath_a = "/0000:00:0a.0";
uev_a.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev_a.subsystem = "block".to_string();
uev_a.devname = devname.to_string();
uev_a.devpath = format!("{}{}/virtio4/block/{}", root_bus, relpath_a, devname);
let matcher_a = VirtioBlkPciMatcher::new(&relpath_a);
let mut uev_b = uev_a.clone();
let relpath_b = "/0000:00:0a.0/0000:00:0b.0";
uev_b.devpath = format!("{}{}/virtio0/block/{}", root_bus, relpath_b, devname);
let matcher_b = VirtioBlkPciMatcher::new(&relpath_b);
assert!(matcher_a.is_match(&uev_a));
assert!(matcher_b.is_match(&uev_b));
assert!(!matcher_b.is_match(&uev_a));
assert!(!matcher_a.is_match(&uev_b));
}
#[tokio::test]
async fn test_scsi_block_matcher() {
let root_bus = create_pci_root_bus_path();
let devname = "sda";
let mut uev_a = crate::uevent::Uevent::default();
let addr_a = "0:0";
uev_a.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev_a.subsystem = "block".to_string();
uev_a.devname = devname.to_string();
uev_a.devpath = format!(
"{}/0000:00:00.0/virtio0/host0/target0:0:0/0:0:{}/block/sda",
root_bus, addr_a
);
let matcher_a = ScsiBlockMatcher::new(&addr_a);
let mut uev_b = uev_a.clone();
let addr_b = "2:0";
uev_b.devpath = format!(
"{}/0000:00:00.0/virtio0/host0/target0:0:2/0:0:{}/block/sdb",
root_bus, addr_b
);
let matcher_b = ScsiBlockMatcher::new(&addr_b);
assert!(matcher_a.is_match(&uev_a));
assert!(matcher_b.is_match(&uev_b));
assert!(!matcher_b.is_match(&uev_a));
assert!(!matcher_a.is_match(&uev_b));
}
}
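
The expectations in test_pcipath_to_sysfs above suggest that a PCI path such as "02/03/04" is resolved one slot at a time, and that every device except the last must be a bridge exposing a child bus. A minimal standalone sketch of that walk follows. It is not the agent's pcipath_to_sysfs: the names pcipath_to_relpath and BridgeMap are hypothetical, and an in-memory map stands in for reading the pci_bus directory under sysfs.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for sysfs: maps the relative sysfs path of a bridge
// to the bus behind it, e.g. "/0000:00:02.0" -> "0000:01".
type BridgeMap = HashMap<String, String>;

// Resolve a PCI path ("02/03/04") to a path relative to the root bus.
// Assumption: every slot is a two-digit hex device number, function 0.
fn pcipath_to_relpath(pcipath: &str, bridges: &BridgeMap) -> Result<String, String> {
    let slots: Vec<&str> = pcipath.split('/').collect();
    let mut bus = String::from("0000:00"); // the walk starts on the root bus
    let mut relpath = String::new();

    for (i, slot) in slots.iter().enumerate() {
        let slot = u8::from_str_radix(slot, 16)
            .map_err(|e| format!("invalid slot {:?}: {}", slot, e))?;
        if slot > 0x1f {
            return Err(format!("slot {:#x} out of range", slot));
        }
        relpath.push_str(&format!("/{}:{:02x}.0", bus, slot));

        // The last element is the device itself; it need not be a bridge.
        if i + 1 == slots.len() {
            break;
        }

        // Every earlier element must be a bridge leading to another bus.
        bus = bridges
            .get(&relpath)
            .cloned()
            .ok_or_else(|| format!("{} is not a bridge", relpath))?;
    }

    Ok(relpath)
}
```

As in the test, a single-element path resolves even when nothing is known about the device, while longer paths fail until each intermediate bridge has been registered.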


@@ -5,10 +5,9 @@
/// Linux ABI related constants.
#[cfg(target_arch = "aarch64")]
use std::fs;
pub const SYSFS_DIR: &str = "/sys";
pub const SYSFS_PCI_BUS_PREFIX: &str = "/sys/bus/pci/devices";
pub const SYSFS_PCI_BUS_RESCAN_FILE: &str = "/sys/bus/pci/rescan";
#[cfg(any(
target_arch = "powerpc64",
@@ -16,61 +15,9 @@ pub const SYSFS_PCI_BUS_RESCAN_FILE: &str = "/sys/bus/pci/rescan";
target_arch = "x86_64",
target_arch = "x86"
))]
pub fn create_pci_root_bus_path() -> String {
String::from("/devices/pci0000:00")
}
pub const PCI_ROOT_BUS_PATH: &str = "/devices/pci0000:00";
#[cfg(target_arch = "aarch64")]
pub fn create_pci_root_bus_path() -> String {
let ret = String::from("/devices/platform/4010000000.pcie/pci0000:00");
let acpi_root_bus_path = String::from("/devices/pci0000:00");
let mut acpi_sysfs_dir = String::from(SYSFS_DIR);
let mut sysfs_dir = String::from(SYSFS_DIR);
let mut start_root_bus_path = String::from("/devices/platform/");
let end_root_bus_path = String::from("/pci0000:00");
// check if there is pci bus path for acpi
acpi_sysfs_dir.push_str(&acpi_root_bus_path);
if let Ok(_) = fs::metadata(&acpi_sysfs_dir) {
return acpi_root_bus_path;
}
sysfs_dir.push_str(&start_root_bus_path);
let entries = match fs::read_dir(sysfs_dir) {
Ok(e) => e,
Err(_) => return ret,
};
for entry in entries {
let pathname = match entry {
Ok(p) => p.path(),
Err(_) => return ret,
};
let dir_name = match pathname.file_name() {
Some(p) => p.to_str(),
None => return ret,
};
let dir_name = match dir_name {
Some(p) => p,
None => return ret,
};
let dir_name = String::from(dir_name);
if dir_name.ends_with(".pcie") {
start_root_bus_path.push_str(&dir_name);
start_root_bus_path.push_str(&end_root_bus_path);
return start_root_bus_path;
}
}
ret
}
// From https://www.kernel.org/doc/Documentation/acpi/namespace.txt
// The Linux kernel's core ACPI subsystem creates struct acpi_device
// objects for ACPI namespace objects representing devices, power resources
// processors, thermal zones. Those objects are exported to user space via
// sysfs as directories in the subtree under /sys/devices/LNXSYSTM:00
pub const ACPI_DEV_PATH: &str = "/devices/LNXSYSTM";
pub const PCI_ROOT_BUS_PATH: &str = "/devices/platform/4010000000.pcie/pci0000:00";
pub const SYSFS_CPU_ONLINE_PATH: &str = "/sys/devices/system/cpu";
@@ -78,6 +25,11 @@ pub const SYSFS_MEMORY_BLOCK_SIZE_PATH: &str = "/sys/devices/system/memory/block
pub const SYSFS_MEMORY_HOTPLUG_PROBE_PATH: &str = "/sys/devices/system/memory/probe";
pub const SYSFS_MEMORY_ONLINE_PATH: &str = "/sys/devices/system/memory";
// Here in "0:0", the first number is the SCSI host number because
// only one SCSI controller has been plugged, while the second number
// is always 0.
pub const SCSI_HOST_CHANNEL: &str = "0:0:";
pub const SCSI_BLOCK_SUFFIX: &str = "block";
pub const SYSFS_SCSI_HOST_PATH: &str = "/sys/class/scsi_host";
pub const SYSFS_CGROUPPATH: &str = "/sys/fs/cgroup";
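
The aarch64 create_pci_root_bus_path in this file prefers the ACPI root bus path when it exists, then looks for a "*.pcie" entry under /devices/platform, and otherwise falls back to a fixed default. The sketch below restates that lookup order in a form that can be exercised against a scratch directory. The function name find_pci_root_bus_path and the sysfs_dir parameter are assumptions made for illustration, and unlike the original it skips unreadable directory entries instead of returning the default immediately.

```rust
use std::fs;
use std::path::Path;

// Default taken from the diff above.
const DEFAULT_ROOT_BUS: &str = "/devices/platform/4010000000.pcie/pci0000:00";
const ACPI_ROOT_BUS: &str = "/devices/pci0000:00";

// Hypothetical helper: `sysfs_dir` would be "/sys" on a real guest.
fn find_pci_root_bus_path(sysfs_dir: &Path) -> String {
    // 1. Prefer the ACPI-style root bus if it is present.
    if sysfs_dir.join(ACPI_ROOT_BUS.trim_start_matches('/')).exists() {
        return ACPI_ROOT_BUS.to_string();
    }

    // 2. Otherwise look for a platform device whose name ends in ".pcie".
    if let Ok(entries) = fs::read_dir(sysfs_dir.join("devices/platform")) {
        for entry in entries.flatten() {
            if let Some(name) = entry.file_name().to_str() {
                if name.ends_with(".pcie") {
                    return format!("/devices/platform/{}/pci0000:00", name);
                }
            }
        }
    }

    // 3. Nothing found: fall back to the fixed default.
    DEFAULT_ROOT_BUS.to_string()
}
```

Taking the sysfs root as a parameter is a choice made here purely so the lookup can be tested without a real /sys.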


@@ -3,6 +3,11 @@
// SPDX-License-Identifier: Apache-2.0
//
#![allow(non_camel_case_types)]
#![allow(unused_parens)]
#![allow(unused_unsafe)]
#![allow(dead_code)]
#![allow(non_snake_case)]
#[macro_use]
extern crate lazy_static;
extern crate oci;
@@ -10,76 +15,79 @@ extern crate prctl;
extern crate prometheus;
extern crate protocols;
extern crate regex;
extern crate rustjail;
extern crate scan_fmt;
extern crate serde_json;
extern crate signal_hook;
#[macro_use]
extern crate scopeguard;
#[macro_use]
extern crate slog;
extern crate netlink;
use crate::netlink::{RtnlHandle, NETLINK_ROUTE};
use anyhow::{anyhow, Context, Result};
use nix::fcntl::OFlag;
use nix::fcntl::{self, OFlag};
use nix::fcntl::{FcntlArg, FdFlag};
use nix::libc::{STDERR_FILENO, STDIN_FILENO, STDOUT_FILENO};
use nix::pty;
use nix::sys::select::{select, FdSet};
use nix::sys::socket::{self, AddressFamily, SockAddr, SockFlag, SockType};
use nix::unistd::{self, dup, Pid};
use nix::sys::wait::{self, WaitStatus};
use nix::unistd::{self, close, dup, dup2, fork, setsid, ForkResult};
use prctl::set_child_subreaper;
use signal_hook::{iterator::Signals, SIGCHLD};
use std::collections::HashMap;
use std::env;
use std::ffi::OsStr;
use std::ffi::{CStr, CString, OsStr};
use std::fs::{self, File};
use std::io::{Read, Write};
use std::os::unix::ffi::OsStrExt;
use std::os::unix::fs as unixfs;
use std::os::unix::io::AsRawFd;
use std::path::Path;
use std::process::exit;
use std::sync::Arc;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex, RwLock};
use std::{io, thread, thread::JoinHandle};
use unistd::Pid;
mod config;
mod console;
mod device;
mod linux_abi;
mod metrics;
mod mount;
mod namespace;
mod netlink;
mod network;
mod pci;
pub mod random;
mod sandbox;
mod signal;
#[cfg(test)]
mod test_utils;
mod uevent;
mod util;
mod version;
use mount::{cgroups_mount, general_mount};
use sandbox::Sandbox;
use signal::setup_signal_handler;
use slog::Logger;
use uevent::watch_uevents;
use futures::future::join_all;
use rustjail::pipestream::PipeStream;
use tokio::{
io::AsyncWrite,
sync::{
watch::{channel, Receiver},
Mutex, RwLock,
},
task::JoinHandle,
};
mod rpc;
const NAME: &str = "kata-agent";
const KERNEL_CMDLINE_FILE: &str = "/proc/cmdline";
const CONSOLE_PATH: &str = "/dev/console";
const DEFAULT_BUF_SIZE: usize = 8 * 1024;
lazy_static! {
static ref AGENT_CONFIG: Arc<RwLock<AgentConfig>> =
Arc::new(RwLock::new(config::AgentConfig::new()));
static ref GLOBAL_DEVICE_WATCHER: Arc<Mutex<HashMap<String, Sender<String>>>> =
Arc::new(Mutex::new(HashMap::new()));
static ref AGENT_CONFIG: Arc<RwLock<agentConfig>> =
Arc::new(RwLock::new(config::agentConfig::new()));
}
fn announce(logger: &Logger, config: &AgentConfig) {
fn announce(logger: &Logger, config: &agentConfig) {
info!(logger, "announce";
"agent-commit" => version::VERSION_COMMIT,
@@ -92,147 +100,7 @@ fn announce(logger: &Logger, config: &AgentConfig) {
);
}
// Create a thread to handle reading from the logger pipe. The thread will
// output to the vsock port specified, or stdout.
async fn create_logger_task(rfd: RawFd, vsock_port: u32, shutdown: Receiver<bool>) -> Result<()> {
let mut reader = PipeStream::from_fd(rfd);
let mut writer: Box<dyn AsyncWrite + Unpin + Send>;
if vsock_port > 0 {
let listenfd = socket::socket(
AddressFamily::Vsock,
SockType::Stream,
SockFlag::SOCK_CLOEXEC,
None,
)?;
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, vsock_port);
socket::bind(listenfd, &addr).unwrap();
socket::listen(listenfd, 1).unwrap();
writer = Box::new(util::get_vsock_stream(listenfd).await.unwrap());
} else {
writer = Box::new(tokio::io::stdout());
}
let _ = util::interruptable_io_copier(&mut reader, &mut writer, shutdown).await;
Ok(())
}
async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
env::set_var("RUST_BACKTRACE", "full");
// List of tasks that need to be stopped for a clean shutdown
let mut tasks: Vec<JoinHandle<Result<()>>> = vec![];
console::initialize();
lazy_static::initialize(&AGENT_CONFIG);
// support vsock log
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC)?;
let (shutdown_tx, shutdown_rx) = channel(true);
let agent_config = AGENT_CONFIG.clone();
let init_mode = unistd::getpid() == Pid::from_raw(1);
if init_mode {
// dup a new file descriptor for this temporary logger writer,
// since this logger will be dropped and its writer
// closed at the end of this code block.
let newwfd = dup(wfd)?;
let writer = unsafe { File::from_raw_fd(newwfd) };
// Init a temporary logger for when the agent runs as the init process:
// until the base mounts are done, it cannot read "/proc/cmdline"
// to get the customized debug level.
let (logger, logger_async_guard) =
logging::create_logger(NAME, "agent", slog::Level::Debug, writer);
// Must mount proc fs before parsing kernel command line
general_mount(&logger).map_err(|e| {
error!(logger, "fail general mount: {}", e);
e
})?;
let mut config = agent_config.write().await;
config.parse_cmdline(KERNEL_CMDLINE_FILE)?;
init_agent_as_init(&logger, config.unified_cgroup_hierarchy)?;
drop(logger_async_guard);
} else {
// Once the cmdline is parsed and the config set, release the write lock
// as soon as possible in case another thread wants a read lock on
// it.
let mut config = agent_config.write().await;
config.parse_cmdline(KERNEL_CMDLINE_FILE)?;
}
let config = agent_config.read().await;
let log_vport = config.log_vport as u32;
let log_handle = tokio::spawn(create_logger_task(rfd, log_vport, shutdown_rx.clone()));
tasks.push(log_handle);
let writer = unsafe { File::from_raw_fd(wfd) };
// Recreate a logger with the log level obtained from "/proc/cmdline".
let (logger, logger_async_guard) =
logging::create_logger(NAME, "agent", config.log_level, writer);
announce(&logger, &config);
// This variable is required as it enables the global (and crucially static) logger,
// which is required to satisfy the lifetime constraints of the auto-generated gRPC code.
let global_logger = slog_scope::set_global_logger(logger.new(o!("subsystem" => "rpc")));
// Allow the global logger to be modified later (for shutdown)
global_logger.cancel_reset();
let mut ttrpc_log_guard: Result<(), log::SetLoggerError> = Ok(());
if config.log_level == slog::Level::Trace {
// Redirect ttrpc log calls to slog iff full debug requested
ttrpc_log_guard = Ok(slog_stdlog::init().map_err(|e| e)?);
}
// Start the sandbox and wait for its ttRPC server to end
start_sandbox(&logger, &config, init_mode, &mut tasks, shutdown_rx.clone()).await?;
// Install a NOP logger for the remainder of the shutdown sequence
// to ensure any log calls made by local crates using the scope logger
// don't fail.
let global_logger_guard2 =
slog_scope::set_global_logger(slog::Logger::root(slog::Discard, o!()));
global_logger_guard2.cancel_reset();
drop(logger_async_guard);
drop(ttrpc_log_guard);
// Trigger a controlled shutdown
shutdown_tx
.send(true)
.map_err(|e| anyhow!(e).context("failed to request shutdown"))?;
// Wait for all threads to finish
let results = join_all(tasks).await;
for result in results {
if let Err(e) = result {
return Err(anyhow!(e).into());
}
}
eprintln!("{} shutdown complete", NAME);
Ok(())
}
fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
fn main() -> Result<()> {
let args: Vec<String> = env::args().collect();
if args.len() == 2 && args[1] == "--version" {
@@ -248,67 +116,231 @@ fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
}
if args.len() == 2 && args[1] == "init" {
reset_sigpipe();
rustjail::container::init_child();
exit(0);
}
let rt = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()?;
env::set_var("RUST_BACKTRACE", "full");
rt.block_on(real_main())
lazy_static::initialize(&SHELLS);
lazy_static::initialize(&AGENT_CONFIG);
// support vsock log
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC)?;
let agentConfig = AGENT_CONFIG.clone();
let init_mode = unistd::getpid() == Pid::from_raw(1);
if init_mode {
// dup a new file descriptor for this temporary logger writer,
// since this logger will be dropped and its writer
// closed at the end of this code block.
let newwfd = dup(wfd)?;
let writer = unsafe { File::from_raw_fd(newwfd) };
// Init a temporary logger for when the agent runs as the init process:
// until the base mounts are done, it cannot read "/proc/cmdline"
// to get the customized debug level.
let logger = logging::create_logger(NAME, "agent", slog::Level::Debug, writer);
// Must mount proc fs before parsing kernel command line
general_mount(&logger).map_err(|e| {
error!(logger, "fail general mount: {}", e);
e
})?;
let mut config = agentConfig.write().unwrap();
config.parse_cmdline(KERNEL_CMDLINE_FILE)?;
init_agent_as_init(&logger, config.unified_cgroup_hierarchy)?;
} else {
// Once the cmdline is parsed and the config set, release the write lock
// as soon as possible in case another thread wants a read lock on
// it.
let mut config = agentConfig.write().unwrap();
config.parse_cmdline(KERNEL_CMDLINE_FILE)?;
}
let config = agentConfig.read().unwrap();
let log_vport = config.log_vport as u32;
let log_handle = thread::spawn(move || -> Result<()> {
let mut reader = unsafe { File::from_raw_fd(rfd) };
if log_vport > 0 {
let listenfd = socket::socket(
AddressFamily::Vsock,
SockType::Stream,
SockFlag::SOCK_CLOEXEC,
None,
)?;
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, log_vport);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;
let datafd = socket::accept4(listenfd, SockFlag::SOCK_CLOEXEC)?;
let mut log_writer = unsafe { File::from_raw_fd(datafd) };
let _ = io::copy(&mut reader, &mut log_writer)?;
let _ = unistd::close(listenfd);
let _ = unistd::close(datafd);
}
// copy log to stdout
let mut stdout_writer = io::stdout();
let _ = io::copy(&mut reader, &mut stdout_writer)?;
Ok(())
});
let writer = unsafe { File::from_raw_fd(wfd) };
// Recreate a logger with the log level obtained from "/proc/cmdline".
let logger = logging::create_logger(NAME, "agent", config.log_level, writer);
announce(&logger, &config);
// This "unused" variable is required as it enables the global (and crucially static) logger,
// which is required to satisfy the lifetime constraints of the auto-generated gRPC code.
let _guard = slog_scope::set_global_logger(logger.new(o!("subsystem" => "rpc")));
start_sandbox(&logger, &config, init_mode)?;
let _ = log_handle.join();
Ok(())
}
async fn start_sandbox(
logger: &Logger,
config: &AgentConfig,
init_mode: bool,
tasks: &mut Vec<JoinHandle<Result<()>>>,
shutdown: Receiver<bool>,
) -> Result<()> {
fn start_sandbox(logger: &Logger, config: &agentConfig, init_mode: bool) -> Result<()> {
let shells = SHELLS.clone();
let debug_console_vport = config.debug_console_vport as u32;
let mut shell_handle: Option<JoinHandle<()>> = None;
if config.debug_console {
let debug_console_task = tokio::task::spawn(console::debug_console_handler(
logger.clone(),
debug_console_vport,
shutdown.clone(),
));
let thread_logger = logger.clone();
tasks.push(debug_console_task);
let builder = thread::Builder::new();
let handle = builder.spawn(move || {
let shells = shells.lock().unwrap();
let result = setup_debug_console(&thread_logger, shells.to_vec(), debug_console_vport);
if result.is_err() {
// Report error, but don't fail
warn!(thread_logger, "failed to setup debug console";
"error" => format!("{}", result.unwrap_err()));
}
})?;
shell_handle = Some(handle);
}
// Initialize unique sandbox structure.
let s = Sandbox::new(&logger).context("Failed to create sandbox")?;
let mut s = Sandbox::new(&logger).context("Failed to create sandbox")?;
if init_mode {
s.rtnl.handle_localhost().await?;
let mut rtnl = RtnlHandle::new(NETLINK_ROUTE, 0).unwrap();
rtnl.handle_localhost()?;
s.rtnl = Some(rtnl);
}
let sandbox = Arc::new(Mutex::new(s));
let signal_handler_task = tokio::spawn(setup_signal_handler(
logger.clone(),
sandbox.clone(),
shutdown.clone(),
));
setup_signal_handler(&logger, sandbox.clone()).unwrap();
watch_uevents(sandbox.clone());
tasks.push(signal_handler_task);
let (tx, rx) = mpsc::channel::<i32>();
sandbox.lock().unwrap().sender = Some(tx);
let uevents_handler_task = tokio::spawn(watch_uevents(sandbox.clone(), shutdown.clone()));
tasks.push(uevents_handler_task);
let (tx, rx) = tokio::sync::oneshot::channel();
sandbox.lock().await.sender = Some(tx);
// vsock:///dev/vsock, port
//vsock:///dev/vsock, port
let mut server = rpc::start(sandbox.clone(), config.server_addr.as_str());
server.start().await?;
let _ = rx.await?;
server.shutdown().await?;
let _ = server.start().unwrap();
let _ = rx.recv()?;
server.shutdown();
if let Some(handle) = shell_handle {
handle.join().map_err(|e| anyhow!("{:?}", e))?;
}
Ok(())
}
use nix::sys::wait::WaitPidFlag;
fn setup_signal_handler(logger: &Logger, sandbox: Arc<Mutex<Sandbox>>) -> Result<()> {
let logger = logger.new(o!("subsystem" => "signals"));
set_child_subreaper(true)
.map_err(|err| anyhow!(err).context("failed to setup agent as a child subreaper"))?;
let signals = Signals::new(&[SIGCHLD])?;
let s = sandbox.clone();
thread::spawn(move || {
'outer: for sig in signals.forever() {
info!(logger, "received signal"; "signal" => sig);
// Several signals can be combined together
// as one, so loop around to reap all
// exited children.
'inner: loop {
let wait_status = match wait::waitpid(
Some(Pid::from_raw(-1)),
Some(WaitPidFlag::WNOHANG | WaitPidFlag::__WALL),
) {
Ok(s) => {
if s == WaitStatus::StillAlive {
continue 'outer;
}
s
}
Err(e) => {
info!(
logger,
"waitpid reaper failed";
"error" => e.as_errno().unwrap().desc()
);
continue 'outer;
}
};
let pid = wait_status.pid();
if pid.is_some() {
let raw_pid = pid.unwrap().as_raw();
let child_pid = format!("{}", raw_pid);
let logger = logger.new(o!("child-pid" => child_pid));
let mut sandbox = s.lock().unwrap();
let process = sandbox.find_process(raw_pid);
if process.is_none() {
info!(logger, "child exited unexpectedly");
continue 'inner;
}
let mut p = process.unwrap();
if p.exit_pipe_w.is_none() {
error!(logger, "the process's exit_pipe_w isn't set");
continue 'inner;
}
let pipe_write = p.exit_pipe_w.unwrap();
let ret: i32;
match wait_status {
WaitStatus::Exited(_, c) => ret = c,
WaitStatus::Signaled(_, sig, _) => ret = sig as i32,
_ => {
info!(logger, "got wrong status for process";
"child-status" => format!("{:?}", wait_status));
continue 'inner;
}
}
p.exit_code = ret;
let _ = unistd::close(pipe_write);
}
}
}
});
Ok(())
}
@@ -329,13 +361,12 @@ fn init_agent_as_init(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result
unistd::setsid()?;
unsafe {
libc::ioctl(std::io::stdin().as_raw_fd(), libc::TIOCSCTTY, 1);
libc::ioctl(io::stdin().as_raw_fd(), libc::TIOCSCTTY, 1);
}
env::set_var("PATH", "/bin:/sbin/:/usr/bin/:/usr/sbin/");
let contents =
std::fs::read_to_string("/etc/hostname").unwrap_or_else(|_| String::from("localhost"));
let contents = std::fs::read_to_string("/etc/hostname").unwrap_or(String::from("localhost"));
let contents_array: Vec<&str> = contents.split(' ').collect();
let hostname = contents_array[0].trim();
@@ -359,16 +390,295 @@ fn sethostname(hostname: &OsStr) -> Result<()> {
}
}
// The Rust standard library suppresses the default SIGPIPE behavior,
// see https://github.com/rust-lang/rust/pull/13158.
// Since the parent's signal handler is inherited by its child process,
// we re-enable the standard SIGPIPE behavior as a workaround for
// https://github.com/kata-containers/kata-containers/issues/1887.
fn reset_sigpipe() {
unsafe {
libc::signal(libc::SIGPIPE, libc::SIG_DFL);
lazy_static! {
static ref SHELLS: Arc<Mutex<Vec<String>>> = {
let mut v = Vec::new();
if !cfg!(test) {
v.push("/bin/bash".to_string());
v.push("/bin/sh".to_string());
}
Arc::new(Mutex::new(v))
};
}
// pub static mut LOG_LEVEL: ;
// pub static mut TRACE_MODE: ;
use crate::config::agentConfig;
use nix::sys::stat::Mode;
use std::os::unix::io::{FromRawFd, RawFd};
use std::path::PathBuf;
use std::process::exit;
fn setup_debug_console(logger: &Logger, shells: Vec<String>, port: u32) -> Result<()> {
let mut shell: &str = "";
for sh in shells.iter() {
let binary = PathBuf::from(sh);
if binary.exists() {
shell = sh;
break;
}
}
if shell == "" {
return Err(anyhow!("no shell found to launch debug console"));
}
if port > 0 {
let listenfd = socket::socket(
AddressFamily::Vsock,
SockType::Stream,
SockFlag::SOCK_CLOEXEC,
None,
)?;
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, port);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;
loop {
let f: RawFd = socket::accept4(listenfd, SockFlag::SOCK_CLOEXEC)?;
match run_debug_console_shell(logger, shell, f) {
Ok(_) => {
info!(logger, "run_debug_console_shell session finished");
}
Err(err) => {
error!(logger, "run_debug_console_shell failed: {:?}", err);
}
}
}
} else {
let mut flags = OFlag::empty();
flags.insert(OFlag::O_RDWR);
flags.insert(OFlag::O_CLOEXEC);
loop {
let f: RawFd = fcntl::open(CONSOLE_PATH, flags, Mode::empty())?;
match run_debug_console_shell(logger, shell, f) {
Ok(_) => {
info!(logger, "run_debug_console_shell session finished");
}
Err(err) => {
error!(logger, "run_debug_console_shell failed: {:?}", err);
}
}
}
};
}
fn io_copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> io::Result<u64>
where
R: Read,
W: Write,
{
let mut buf = [0; DEFAULT_BUF_SIZE];
let buf_len;
match reader.read(&mut buf) {
Ok(0) => return Ok(0),
Ok(len) => buf_len = len,
Err(err) => return Err(err),
};
// write and return
match writer.write_all(&buf[..buf_len]) {
Ok(_) => return Ok(buf_len as u64),
Err(err) => return Err(err),
}
}
use crate::config::AgentConfig;
use std::os::unix::io::{FromRawFd, RawFd};
fn run_debug_console_shell(logger: &Logger, shell: &str, socket_fd: RawFd) -> Result<()> {
let pseduo = pty::openpty(None, None)?;
let _ = fcntl::fcntl(pseduo.master, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
let _ = fcntl::fcntl(pseduo.slave, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
let slave_fd = pseduo.slave;
match fork() {
Ok(ForkResult::Child) => {
// create new session with child as session leader
setsid()?;
// dup stdin, stdout, stderr to let child act as a terminal
dup2(slave_fd, STDIN_FILENO)?;
dup2(slave_fd, STDOUT_FILENO)?;
dup2(slave_fd, STDERR_FILENO)?;
// set tty
unsafe {
libc::ioctl(0, libc::TIOCSCTTY);
}
let cmd = CString::new(shell).unwrap();
let args: Vec<&CStr> = vec![];
// run shell
let _ = unistd::execvp(cmd.as_c_str(), args.as_slice()).map_err(|e| match e {
nix::Error::Sys(errno) => {
std::process::exit(errno as i32);
}
_ => std::process::exit(-2),
});
}
Ok(ForkResult::Parent { child: child_pid }) => {
info!(logger, "get debug shell pid {:?}", child_pid);
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC)?;
let master_fd = pseduo.master;
let debug_shell_logger = logger.clone();
// channel used to sync between the thread and the main process
let (tx, rx) = mpsc::channel::<i32>();
// start a thread to do IO copy between socket and pseduo.master
thread::spawn(move || {
let mut master_reader = unsafe { File::from_raw_fd(master_fd) };
let mut master_writer = unsafe { File::from_raw_fd(master_fd) };
let mut socket_reader = unsafe { File::from_raw_fd(socket_fd) };
let mut socket_writer = unsafe { File::from_raw_fd(socket_fd) };
loop {
let mut fd_set = FdSet::new();
fd_set.insert(rfd);
fd_set.insert(master_fd);
fd_set.insert(socket_fd);
match select(
Some(fd_set.highest().unwrap() + 1),
&mut fd_set,
None,
None,
None,
) {
Ok(_) => (),
Err(e) => {
if e == nix::Error::from(nix::errno::Errno::EINTR) {
continue;
} else {
error!(debug_shell_logger, "select error {:?}", e);
tx.send(1).unwrap();
break;
}
}
}
if fd_set.contains(rfd) {
info!(
debug_shell_logger,
"debug shell process {} exited", child_pid
);
tx.send(1).unwrap();
break;
}
if fd_set.contains(master_fd) {
match io_copy(&mut master_reader, &mut socket_writer) {
Ok(0) => {
debug!(debug_shell_logger, "master fd closed");
tx.send(1).unwrap();
break;
}
Ok(_) => {}
Err(ref e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
Err(e) => {
error!(debug_shell_logger, "read master fd error {:?}", e);
tx.send(1).unwrap();
break;
}
}
}
if fd_set.contains(socket_fd) {
match io_copy(&mut socket_reader, &mut master_writer) {
Ok(0) => {
debug!(debug_shell_logger, "socket fd closed");
tx.send(1).unwrap();
break;
}
Ok(_) => {}
Err(ref e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
Err(e) => {
error!(debug_shell_logger, "read socket fd error {:?}", e);
tx.send(1).unwrap();
break;
}
}
}
}
});
let wait_status = wait::waitpid(child_pid, None);
info!(logger, "debug console process exit code: {:?}", wait_status);
info!(logger, "notify debug monitor thread to exit");
// close pipe to exit select loop
let _ = close(wfd);
// wait for thread exit.
let _ = rx.recv().unwrap();
info!(logger, "debug monitor thread has exited");
// close files
let _ = close(rfd);
let _ = close(master_fd);
let _ = close(slave_fd);
}
Err(err) => {
return Err(anyhow!("fork error: {:?}", err));
}
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
#[test]
fn test_setup_debug_console_no_shells() {
// Guarantee no shells have been added
// (required to avoid racing with
// test_setup_debug_console_invalid_shell()).
let shells_ref = SHELLS.clone();
let mut shells = shells_ref.lock().unwrap();
shells.clear();
let logger = slog_scope::logger();
let result = setup_debug_console(&logger, shells.to_vec(), 0);
assert!(result.is_err());
assert_eq!(
result.unwrap_err().to_string(),
"no shell found to launch debug console"
);
}
#[test]
fn test_setup_debug_console_invalid_shell() {
let shells_ref = SHELLS.clone();
let mut shells = shells_ref.lock().unwrap();
let dir = tempdir().expect("failed to create tmpdir");
// Add an invalid shell
let shell = dir
.path()
.join("enoent")
.to_str()
.expect("failed to construct shell path")
.to_string();
shells.push(shell);
let logger = slog_scope::logger();
let result = setup_debug_console(&logger, shells.to_vec(), 0);
assert!(result.is_err());
assert_eq!(
result.unwrap_err().to_string(),
"no shell found to launch debug console"
);
}
}
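
The debug console code above creates an `mpsc` channel so the main flow can block until the I/O copy thread has left its `select` loop. The sketch below isolates that hand-off using only the standard library. `run_and_wait` is a hypothetical helper introduced for illustration, not a function from the agent, and the `-1` fallback is an assumption made here so that a dead worker does not panic the caller.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (not part of the agent): runs `work` on a helper
/// thread and blocks until that thread reports a status code.
fn run_and_wait<F>(work: F) -> i32
where
    F: FnOnce() -> i32 + Send + 'static,
{
    // Channel used only to tell the caller that the thread has finished.
    let (tx, rx) = mpsc::channel::<i32>();

    let handle = thread::spawn(move || {
        let status = work();
        // A failed send only means the receiver is already gone.
        let _ = tx.send(status);
    });

    // Blocks until the worker sends. If the worker exits without sending,
    // the sender is dropped and recv() returns Err, mapped here to -1.
    let status = rx.recv().unwrap_or(-1);
    let _ = handle.join();
    status
}
```

Unlike the `tx.send(1).unwrap()` and `rx.recv().unwrap()` calls in the diff, this variant tolerates a disconnected channel instead of panicking. That is a design choice of the sketch, not a description of the agent's behavior.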


@@ -8,6 +8,7 @@ extern crate procfs;
use prometheus::{Encoder, Gauge, GaugeVec, IntCounter, TextEncoder};
use anyhow::Result;
use protocols;
const NAMESPACE_KATA_AGENT: &str = "kata_agent";
const NAMESPACE_KATA_GUEST: &str = "kata_guest";
@@ -84,15 +85,17 @@ pub fn get_metrics(_: &protocols::agent::GetMetricsRequest) -> Result<String> {
let encoder = TextEncoder::new();
encoder.encode(&metric_families, &mut buffer).unwrap();
Ok(String::from_utf8(buffer).unwrap())
Ok(String::from_utf8(buffer.clone()).unwrap())
}
fn update_agent_metrics() {
let me = procfs::process::Process::myself();
if let Err(err) = me {
error!(sl!(), "failed to create process instance: {:?}", err);
return;
match me {
Err(err) => {
error!(sl!(), "failed to create process instance: {:?}", err);
return;
}
Ok(_) => {}
}
let me = me.unwrap();
@@ -187,9 +190,9 @@ fn update_guest_metrics() {
info!(sl!(), "failed to get guest KernelStats: {:?}", err);
}
Ok(kernel_stats) => {
set_gauge_vec_cpu_time(&GUEST_CPU_TIME, "total", &kernel_stats.total);
set_gauge_vec_CPU_time(&GUEST_CPU_TIME, "total", &kernel_stats.total);
for (i, cpu_time) in kernel_stats.cpu_time.iter().enumerate() {
set_gauge_vec_cpu_time(&GUEST_CPU_TIME, format!("{}", i).as_str(), &cpu_time);
set_gauge_vec_CPU_time(&GUEST_CPU_TIME, format!("{}", i).as_str(), &cpu_time);
}
}
}
@@ -332,7 +335,7 @@ fn set_gauge_vec_meminfo(gv: &prometheus::GaugeVec, meminfo: &procfs::Meminfo) {
.set(meminfo.k_reclaimable.unwrap_or(0) as f64);
}
fn set_gauge_vec_cpu_time(gv: &prometheus::GaugeVec, cpu: &str, cpu_time: &procfs::CpuTime) {
fn set_gauge_vec_CPU_time(gv: &prometheus::GaugeVec, cpu: &str, cpu_time: &procfs::CpuTime) {
gv.with_label_values(&[cpu, "user"])
.set(cpu_time.user as f64);
gv.with_label_values(&[cpu, "nice"])


@@ -7,49 +7,39 @@ use std::collections::HashMap;
use std::ffi::CString;
use std::fs;
use std::io;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::iter::FromIterator;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::ptr::null;
use std::str::FromStr;
use std::sync::Arc;
use tokio::sync::Mutex;
use std::sync::{Arc, Mutex};
use libc::{c_void, mount};
use nix::mount::{self, MsFlags};
use nix::unistd::Gid;
use regex::Regex;
use std::fs::File;
use std::io::{BufRead, BufReader};
use crate::device::{
get_scsi_device_name, get_virtio_blk_pci_device_name, online_device, wait_for_pmem_device,
};
use crate::device::{get_pci_device_name, get_scsi_device_name, online_device};
use crate::linux_abi::*;
use crate::pci;
use crate::protocols::agent::Storage;
use crate::Sandbox;
use anyhow::{anyhow, Context, Result};
use slog::Logger;
pub const DRIVER_9P_TYPE: &str = "9p";
pub const DRIVER_VIRTIOFS_TYPE: &str = "virtio-fs";
pub const DRIVER_BLK_TYPE: &str = "blk";
pub const DRIVER_MMIO_BLK_TYPE: &str = "mmioblk";
pub const DRIVER_SCSI_TYPE: &str = "scsi";
pub const DRIVER_NVDIMM_TYPE: &str = "nvdimm";
pub const DRIVER_EPHEMERAL_TYPE: &str = "ephemeral";
pub const DRIVER_LOCAL_TYPE: &str = "local";
pub const DRIVER9PTYPE: &str = "9p";
pub const DRIVERVIRTIOFSTYPE: &str = "virtio-fs";
pub const DRIVERBLKTYPE: &str = "blk";
pub const DRIVERMMIOBLKTYPE: &str = "mmioblk";
pub const DRIVERSCSITYPE: &str = "scsi";
pub const DRIVERNVDIMMTYPE: &str = "nvdimm";
pub const DRIVEREPHEMERALTYPE: &str = "ephemeral";
pub const DRIVERLOCALTYPE: &str = "local";
pub const TYPE_ROOTFS: &str = "rootfs";
pub const TYPEROOTFS: &str = "rootfs";
pub const MOUNT_GUEST_TAG: &str = "kataShared";
// Allocating an FSGroup that owns the pod's volumes
const FS_GID: &str = "fsgid";
#[rustfmt::skip]
#[cfg_attr(rustfmt, rustfmt_skip)]
lazy_static! {
pub static ref FLAGS: HashMap<&'static str, (bool, MsFlags)> = {
let mut m = HashMap::new();
@@ -91,14 +81,14 @@ lazy_static! {
}
#[derive(Debug, PartialEq)]
pub struct InitMount {
pub struct INIT_MOUNT {
fstype: &'static str,
src: &'static str,
dest: &'static str,
options: Vec<&'static str>,
}
#[rustfmt::skip]
#[cfg_attr(rustfmt, rustfmt_skip)]
lazy_static!{
static ref CGROUPS: HashMap<&'static str, &'static str> = {
let mut m = HashMap::new();
@@ -119,28 +109,44 @@ lazy_static!{
};
}
#[rustfmt::skip]
#[cfg_attr(rustfmt, rustfmt_skip)]
lazy_static! {
pub static ref INIT_ROOTFS_MOUNTS: Vec<InitMount> = vec![
InitMount{fstype: "proc", src: "proc", dest: "/proc", options: vec!["nosuid", "nodev", "noexec"]},
InitMount{fstype: "sysfs", src: "sysfs", dest: "/sys", options: vec!["nosuid", "nodev", "noexec"]},
InitMount{fstype: "devtmpfs", src: "dev", dest: "/dev", options: vec!["nosuid"]},
InitMount{fstype: "tmpfs", src: "tmpfs", dest: "/dev/shm", options: vec!["nosuid", "nodev"]},
InitMount{fstype: "devpts", src: "devpts", dest: "/dev/pts", options: vec!["nosuid", "noexec"]},
InitMount{fstype: "tmpfs", src: "tmpfs", dest: "/run", options: vec!["nosuid", "nodev"]},
pub static ref INIT_ROOTFS_MOUNTS: Vec<INIT_MOUNT> = vec![
INIT_MOUNT{fstype: "proc", src: "proc", dest: "/proc", options: vec!["nosuid", "nodev", "noexec"]},
INIT_MOUNT{fstype: "sysfs", src: "sysfs", dest: "/sys", options: vec!["nosuid", "nodev", "noexec"]},
INIT_MOUNT{fstype: "devtmpfs", src: "dev", dest: "/dev", options: vec!["nosuid"]},
INIT_MOUNT{fstype: "tmpfs", src: "tmpfs", dest: "/dev/shm", options: vec!["nosuid", "nodev"]},
INIT_MOUNT{fstype: "devpts", src: "devpts", dest: "/dev/pts", options: vec!["nosuid", "noexec"]},
INIT_MOUNT{fstype: "tmpfs", src: "tmpfs", dest: "/run", options: vec!["nosuid", "nodev"]},
];
}
pub const STORAGE_HANDLER_LIST: [&str; 8] = [
DRIVER_BLK_TYPE,
DRIVER_9P_TYPE,
DRIVER_VIRTIOFS_TYPE,
DRIVER_EPHEMERAL_TYPE,
DRIVER_MMIO_BLK_TYPE,
DRIVER_LOCAL_TYPE,
DRIVER_SCSI_TYPE,
DRIVER_NVDIMM_TYPE,
];
// StorageHandler is the type of callback to be defined to handle every
// type of storage driver.
type StorageHandler = fn(&Logger, &Storage, Arc<Mutex<Sandbox>>) -> Result<String>;
// STORAGEHANDLERLIST lists the supported drivers.
#[cfg_attr(rustfmt, rustfmt_skip)]
lazy_static! {
pub static ref STORAGEHANDLERLIST: HashMap<&'static str, StorageHandler> = {
let mut m = HashMap::new();
let blk: StorageHandler = virtio_blk_storage_handler;
m.insert(DRIVERBLKTYPE, blk);
let p9: StorageHandler= virtio9p_storage_handler;
m.insert(DRIVER9PTYPE, p9);
let virtiofs: StorageHandler = virtiofs_storage_handler;
m.insert(DRIVERVIRTIOFSTYPE, virtiofs);
let ephemeral: StorageHandler = ephemeral_storage_handler;
m.insert(DRIVEREPHEMERALTYPE, ephemeral);
let virtiommio: StorageHandler = virtiommio_blk_storage_handler;
m.insert(DRIVERMMIOBLKTYPE, virtiommio);
let local: StorageHandler = local_storage_handler;
m.insert(DRIVERLOCALTYPE, local);
let scsi: StorageHandler = virtio_scsi_storage_handler;
m.insert(DRIVERSCSITYPE, scsi);
m
};
}
#[derive(Debug, Clone)]
pub struct BareMount<'a> {
@@ -167,9 +173,9 @@ impl<'a> BareMount<'a> {
BareMount {
source: s,
destination: d,
fs_type,
flags,
options,
fs_type: fs_type,
flags: flags,
options: options,
logger: logger.new(o!("subsystem" => "baremount")),
}
}
@@ -184,11 +190,11 @@ impl<'a> BareMount<'a> {
let cstr_dest: CString;
let cstr_fs_type: CString;
if self.source.is_empty() {
if self.source.len() == 0 {
return Err(anyhow!("need mount source"));
}
if self.destination.is_empty() {
if self.destination.len() == 0 {
return Err(anyhow!("need mount destination"));
}
@@ -198,14 +204,14 @@ impl<'a> BareMount<'a> {
cstr_dest = CString::new(self.destination)?;
dest = cstr_dest.as_ptr();
if self.fs_type.is_empty() {
if self.fs_type.len() == 0 {
return Err(anyhow!("need mount FS type"));
}
cstr_fs_type = CString::new(self.fs_type)?;
fs_type = cstr_fs_type.as_ptr();
if !self.options.is_empty() {
if self.options.len() > 0 {
cstr_options = CString::new(self.options)?;
options = cstr_options.as_ptr() as *const c_void;
}
@@ -232,12 +238,13 @@ impl<'a> BareMount<'a> {
}
}
async fn ephemeral_storage_handler(
fn ephemeral_storage_handler(
logger: &Logger,
storage: &Storage,
sandbox: Arc<Mutex<Sandbox>>,
) -> Result<String> {
let mut sb = sandbox.lock().await;
let s = sandbox.clone();
let mut sb = s.lock().unwrap();
let new_storage = sb.set_sandbox_storage(&storage.mount_point);
if !new_storage {
@@ -245,45 +252,18 @@ async fn ephemeral_storage_handler(
}
fs::create_dir_all(Path::new(&storage.mount_point))?;
// For now we only support one option field: "fsGroup", which
// isn't a valid mount option, so we should remove it when
// mounting.
if storage.options.len() > 0 {
// ephemeral_storage doesn't support mount options except fsGroup.
let mut new_storage = storage.clone();
new_storage.options = protobuf::RepeatedField::default();
common_storage_handler(logger, &new_storage)?;
let opts_vec: Vec<String> = storage.options.to_vec();
let opts = parse_options(opts_vec);
if let Some(fsgid) = opts.get(FS_GID) {
let gid = fsgid.parse::<u32>()?;
nix::unistd::chown(storage.mount_point.as_str(), None, Some(Gid::from_raw(gid)))?;
let meta = fs::metadata(&storage.mount_point)?;
let mut permission = meta.permissions();
let o_mode = meta.mode() | 0o2000;
permission.set_mode(o_mode);
fs::set_permissions(&storage.mount_point, permission)?;
}
} else {
common_storage_handler(logger, &storage)?;
}
common_storage_handler(logger, storage)?;
Ok("".to_string())
}
async fn local_storage_handler(
fn local_storage_handler(
_logger: &Logger,
storage: &Storage,
sandbox: Arc<Mutex<Sandbox>>,
) -> Result<String> {
let mut sb = sandbox.lock().await;
let s = sandbox.clone();
let mut sb = s.lock().unwrap();
let new_storage = sb.set_sandbox_storage(&storage.mount_point);
if !new_storage {
@@ -298,24 +278,12 @@ async fn local_storage_handler(
let opts_vec: Vec<String> = storage.options.to_vec();
let opts = parse_options(opts_vec);
let mut need_set_fsgid = false;
if let Some(fsgid) = opts.get(FS_GID) {
let gid = fsgid.parse::<u32>()?;
nix::unistd::chown(storage.mount_point.as_str(), None, Some(Gid::from_raw(gid)))?;
need_set_fsgid = true;
}
if let Some(mode) = opts.get("mode") {
let mode = opts.get("mode");
if mode.is_some() {
let mode = mode.unwrap();
let mut permission = fs::metadata(&storage.mount_point)?.permissions();
let mut o_mode = u32::from_str_radix(mode, 8)?;
if need_set_fsgid {
// set SetGid mode mask.
o_mode |= 0o2000;
}
let o_mode = u32::from_str_radix(mode, 8)?;
permission.set_mode(o_mode);
fs::set_permissions(&storage.mount_point, permission)?;
@@ -324,7 +292,7 @@ async fn local_storage_handler(
Ok("".to_string())
}
async fn virtio9p_storage_handler(
fn virtio9p_storage_handler(
logger: &Logger,
storage: &Storage,
_sandbox: Arc<Mutex<Sandbox>>,
@@ -333,7 +301,7 @@ async fn virtio9p_storage_handler(
}
// virtiommio_blk_storage_handler handles the storage for mmio blk driver.
async fn virtiommio_blk_storage_handler(
fn virtiommio_blk_storage_handler(
logger: &Logger,
storage: &Storage,
_sandbox: Arc<Mutex<Sandbox>>,
@@ -343,7 +311,7 @@ async fn virtiommio_blk_storage_handler(
}
// virtiofs_storage_handler handles the storage for virtio-fs.
async fn virtiofs_storage_handler(
fn virtiofs_storage_handler(
logger: &Logger,
storage: &Storage,
_sandbox: Arc<Mutex<Sandbox>>,
@@ -352,14 +320,14 @@ async fn virtiofs_storage_handler(
}
// virtio_blk_storage_handler handles the storage for blk driver.
async fn virtio_blk_storage_handler(
fn virtio_blk_storage_handler(
logger: &Logger,
storage: &Storage,
sandbox: Arc<Mutex<Sandbox>>,
) -> Result<String> {
let mut storage = storage.clone();
// If hot-plugged, get the device node path based on the PCI path
// otherwise use the virt path provided in Storage Source
// If hot-plugged, get the device node path based on the PCI address else
// use the virt path provided in Storage Source
if storage.source.starts_with("/dev") {
let metadata = fs::metadata(&storage.source)
.context(format!("get metadata on file {:?}", &storage.source))?;
@@ -369,8 +337,7 @@ async fn virtio_blk_storage_handler(
return Err(anyhow!("Invalid device {}", &storage.source));
}
} else {
let pcipath = pci::Path::from_str(&storage.source)?;
let dev_path = get_virtio_blk_pci_device_name(&sandbox, &pcipath).await?;
let dev_path = get_pci_device_name(&sandbox, &storage.source)?;
storage.source = dev_path;
}
@@ -378,7 +345,7 @@ async fn virtio_blk_storage_handler(
}
// virtio_scsi_storage_handler handles the storage for scsi driver.
async fn virtio_scsi_storage_handler(
fn virtio_scsi_storage_handler(
logger: &Logger,
storage: &Storage,
sandbox: Arc<Mutex<Sandbox>>,
@@ -386,7 +353,7 @@ async fn virtio_scsi_storage_handler(
let mut storage = storage.clone();
// Retrieve the device path from SCSI address.
let dev_path = get_scsi_device_name(&sandbox, &storage.source).await?;
let dev_path = get_scsi_device_name(&sandbox, &storage.source)?;
storage.source = dev_path;
common_storage_handler(logger, &storage)
@@ -399,37 +366,12 @@ fn common_storage_handler(logger: &Logger, storage: &Storage) -> Result<String>
mount_storage(logger, storage).and(Ok(mount_point))
}
// nvdimm_storage_handler handles the storage for NVDIMM driver.
async fn nvdimm_storage_handler(
logger: &Logger,
storage: &Storage,
sandbox: Arc<Mutex<Sandbox>>,
) -> Result<String> {
let storage = storage.clone();
// Retrieve the device path from NVDIMM address.
wait_for_pmem_device(&sandbox, &storage.source).await?;
common_storage_handler(logger, &storage)
}
// mount_storage performs the mount described by the storage structure.
fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
let logger = logger.new(o!("subsystem" => "mount"));
// Check share before attempting to mount to see if the destination is already a mount point.
// If so, skip doing the mount. This facilitates mounting the sharedfs automatically
// in the guest before the agent service starts.
if storage.source == MOUNT_GUEST_TAG && is_mounted(&storage.mount_point)? {
warn!(
logger,
"{} already mounted on {}, ignoring...", MOUNT_GUEST_TAG, &storage.mount_point
);
return Ok(());
}
match storage.fstype.as_str() {
DRIVER_9P_TYPE | DRIVER_VIRTIOFS_TYPE => {
DRIVER9PTYPE | DRIVERVIRTIOFSTYPE => {
let dest_path = Path::new(storage.mount_point.as_str());
if !dest_path.exists() {
fs::create_dir_all(dest_path).context("Create mount destination failed")?;
@@ -441,7 +383,7 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
}
let options_vec = storage.options.to_vec();
let options_vec = options_vec.iter().map(String::as_str).collect();
let options_vec = Vec::from_iter(options_vec.iter().map(String::as_str));
let (flags, options) = parse_mount_flags_and_options(options_vec);
info!(logger, "mounting storage";
@@ -463,40 +405,22 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
bare_mount.mount()
}
/// Looks for `mount_point` entry in the /proc/mounts.
fn is_mounted(mount_point: &str) -> Result<bool> {
let mount_point = mount_point.trim_end_matches('/');
let found = fs::metadata(mount_point).is_ok()
// Looks through /proc/mounts and checks if the mount exists
&& fs::read_to_string("/proc/mounts")?
.lines()
.any(|line| {
// The 2nd column reveals the mount point.
line.split_whitespace()
.nth(1)
.map(|target| mount_point.eq(target))
.unwrap_or(false)
});
Ok(found)
}
fn parse_mount_flags_and_options(options_vec: Vec<&str>) -> (MsFlags, String) {
let mut flags = MsFlags::empty();
let mut options: String = "".to_string();
for opt in options_vec {
if !opt.is_empty() {
if opt.len() != 0 {
match FLAGS.get(opt) {
Some(x) => {
let (_, f) = *x;
flags |= f;
flags = flags | f;
}
None => {
if !options.is_empty() {
if options.len() > 0 {
options.push_str(format!(",{}", opt).as_str());
} else {
options.push_str(opt.to_string().as_str());
options.push_str(format!("{}", opt).as_str());
}
}
};
@@ -509,7 +433,7 @@ fn parse_mount_flags_and_options(options_vec: Vec<&str>) -> (MsFlags, String) {
// associated operations such as waiting for the device to show up, and mount
// it to a specific location, according to the type of handler chosen, and for
// each storage.
pub async fn add_storages(
pub fn add_storages(
logger: Logger,
storages: Vec<Storage>,
sandbox: Arc<Mutex<Sandbox>>,
@@ -522,35 +446,19 @@ pub async fn add_storages(
"subsystem" => "storage",
"storage-type" => handler_name.to_owned()));
let res = match handler_name.as_str() {
DRIVER_BLK_TYPE => virtio_blk_storage_handler(&logger, &storage, sandbox.clone()).await,
DRIVER_9P_TYPE => virtio9p_storage_handler(&logger, &storage, sandbox.clone()).await,
DRIVER_VIRTIOFS_TYPE => {
virtiofs_storage_handler(&logger, &storage, sandbox.clone()).await
}
DRIVER_EPHEMERAL_TYPE => {
ephemeral_storage_handler(&logger, &storage, sandbox.clone()).await
}
DRIVER_MMIO_BLK_TYPE => {
virtiommio_blk_storage_handler(&logger, &storage, sandbox.clone()).await
}
DRIVER_LOCAL_TYPE => local_storage_handler(&logger, &storage, sandbox.clone()).await,
DRIVER_SCSI_TYPE => {
virtio_scsi_storage_handler(&logger, &storage, sandbox.clone()).await
}
DRIVER_NVDIMM_TYPE => nvdimm_storage_handler(&logger, &storage, sandbox.clone()).await,
_ => {
return Err(anyhow!(
let handler = STORAGEHANDLERLIST
.get(&handler_name.as_str())
.ok_or_else(|| {
anyhow!(
"Failed to find the storage handler {}",
storage.driver.to_owned()
));
}
};
)
})?;
// TODO: roll back the mounted storage if an error occurs.
let mount_point = res?;
let mount_point = handler(&logger, &storage, sandbox.clone())?;
if !mount_point.is_empty() {
if mount_point.len() > 0 {
mount_list.push(mount_point);
}
}
@@ -558,7 +466,7 @@ pub async fn add_storages(
Ok(mount_list)
}
fn mount_to_rootfs(logger: &Logger, m: &InitMount) -> Result<()> {
fn mount_to_rootfs(logger: &Logger, m: &INIT_MOUNT) -> Result<()> {
let options_vec: Vec<&str> = m.options.clone();
let (flags, options) = parse_mount_flags_and_options(options_vec);
@@ -601,7 +509,7 @@ pub fn get_mount_fs_type(mount_point: &str) -> Result<String> {
// get_mount_fs_type_from_file returns the FS type corresponding to the passed mount point and
// any error encountered.
pub fn get_mount_fs_type_from_file(mount_file: &str, mount_point: &str) -> Result<String> {
if mount_point.is_empty() {
if mount_point == "" {
return Err(anyhow!("Invalid mount point {}", mount_point));
}
@@ -634,11 +542,11 @@ pub fn get_cgroup_mounts(
logger: &Logger,
cg_path: &str,
unified_cgroup_hierarchy: bool,
) -> Result<Vec<InitMount>> {
) -> Result<Vec<INIT_MOUNT>> {
// cgroup v2
// https://github.com/kata-containers/agent/blob/8c9bbadcd448c9a67690fbe11a860aaacc69813c/agent.go#L1249
if unified_cgroup_hierarchy {
return Ok(vec![InitMount {
return Ok(vec![INIT_MOUNT {
fstype: "cgroup2",
src: "cgroup2",
dest: "/sys/fs/cgroup",
@@ -650,7 +558,7 @@ pub fn get_cgroup_mounts(
let reader = BufReader::new(file);
let mut has_device_cgroup = false;
let mut cg_mounts: Vec<InitMount> = vec![InitMount {
let mut cg_mounts: Vec<INIT_MOUNT> = vec![INIT_MOUNT {
fstype: "tmpfs",
src: "tmpfs",
dest: SYSFS_CGROUPPATH,
@@ -662,10 +570,10 @@ pub fn get_cgroup_mounts(
'outer: for (_, line) in reader.lines().enumerate() {
let line = line?;
let fields: Vec<&str> = line.split('\t').collect();
let fields: Vec<&str> = line.split("\t").collect();
// Ignore comment header
if fields[0].starts_with('#') {
if fields[0].starts_with("#") {
continue;
}
@@ -686,7 +594,7 @@ pub fn get_cgroup_mounts(
}
}
if fields[0].is_empty() {
if fields[0] == "" {
continue;
}
@@ -696,7 +604,7 @@ pub fn get_cgroup_mounts(
if let Some(value) = CGROUPS.get(&fields[0]) {
let key = CGROUPS.keys().find(|&&f| f == fields[0]).unwrap();
cg_mounts.push(InitMount {
cg_mounts.push(INIT_MOUNT {
fstype: "cgroup",
src: "cgroup",
dest: *value,
@@ -710,7 +618,7 @@ pub fn get_cgroup_mounts(
return Ok(Vec::new());
}
cg_mounts.push(InitMount {
cg_mounts.push(INIT_MOUNT {
fstype: "tmpfs",
src: "tmpfs",
dest: SYSFS_CGROUPPATH,
@@ -735,7 +643,7 @@ pub fn cgroups_mount(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result<
Ok(())
}
pub fn remove_mounts(mounts: &[String]) -> Result<()> {
pub fn remove_mounts(mounts: &Vec<String>) -> Result<()> {
for m in mounts.iter() {
mount::umount(m.as_str()).context(format!("failed to umount {:?}", m))?;
}
@@ -767,7 +675,7 @@ fn ensure_destination_exists(destination: &str, fs_type: &str) -> Result<()> {
fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
let mut options = HashMap::new();
for opt in option_list.iter() {
let fields: Vec<&str> = opt.split('=').collect();
let fields: Vec<&str> = opt.split("=").collect();
if fields.len() != 2 {
continue;
}
@@ -893,7 +801,7 @@ mod tests {
let src_filename: String;
let dest_filename: String;
if !d.src.is_empty() {
if d.src != "" {
src = dir.path().join(d.src.to_string());
src_filename = src
.to_str()
@@ -903,7 +811,7 @@ mod tests {
src_filename = "".to_owned();
}
if !d.dest.is_empty() {
if d.dest != "" {
dest = dir.path().join(d.dest.to_string());
dest_filename = dest
.to_str()
@@ -915,7 +823,7 @@ mod tests {
// Create the mount directories
for d in [src_filename.clone(), dest_filename.clone()].iter() {
if d.is_empty() {
if d == "" {
continue;
}
@@ -935,8 +843,8 @@ mod tests {
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
if d.error_contains == "" {
assert!(result.is_ok(), msg);
// Cleanup
unsafe {
@@ -948,7 +856,7 @@ mod tests {
let msg = format!("{}: umount result: {:?}", msg, result);
assert!(ret == 0, "{}", msg);
assert!(ret == 0, format!("{}", msg));
};
continue;
@@ -956,18 +864,10 @@ mod tests {
let err = result.unwrap_err();
let error_msg = format!("{}", err);
assert!(error_msg.contains(d.error_contains), "{}", msg);
assert!(error_msg.contains(d.error_contains), msg);
}
}
#[test]
fn test_is_mounted() {
assert!(is_mounted("/proc").unwrap());
assert!(!is_mounted("").unwrap());
assert!(!is_mounted("!").unwrap());
assert!(!is_mounted("/not_existing_path").unwrap());
}
#[test]
fn test_remove_mounts() {
skip_if_not_root!();
@@ -1014,8 +914,7 @@ mod tests {
.expect("failed to create mount destination filename");
for d in [test_dir_filename, mnt_src_filename, mnt_dest_filename].iter() {
std::fs::create_dir_all(d)
.unwrap_or_else(|_| panic!("failed to create directory {}", d));
std::fs::create_dir_all(d).expect(&format!("failed to create directory {}", d));
}
// Create an actual mount
@@ -1061,14 +960,14 @@ mod tests {
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
if d.error_contains == "" {
assert!(result.is_ok(), msg);
continue;
}
let error_msg = format!("{:#}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
assert!(error_msg.contains(d.error_contains), msg);
}
}
@@ -1144,7 +1043,6 @@ mod tests {
assert!(
format!("{}", err).contains("No such file or directory"),
"{}",
msg
);
}
@@ -1157,29 +1055,29 @@ mod tests {
let filename = file_path
.to_str()
.unwrap_or_else(|| panic!("{}: failed to create filename", msg));
.expect(&format!("{}: failed to create filename", msg));
let mut file =
File::create(filename).unwrap_or_else(|_| panic!("{}: failed to create file", msg));
File::create(filename).expect(&format!("{}: failed to create file", msg));
file.write_all(d.contents.as_bytes())
.unwrap_or_else(|_| panic!("{}: failed to write file contents", msg));
.expect(&format!("{}: failed to write file contents", msg));
let result = get_mount_fs_type_from_file(filename, d.mount_point);
// add more details if an assertion fails
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
if d.error_contains == "" {
let fs_type = result.unwrap();
assert!(d.fs_type == fs_type, "{}", msg);
assert!(d.fs_type == fs_type, msg);
continue;
}
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
assert!(error_msg.contains(d.error_contains), msg);
}
}
@@ -1217,21 +1115,21 @@ mod tests {
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
let first_mount = InitMount {
let first_mount = INIT_MOUNT {
fstype: "tmpfs",
src: "tmpfs",
dest: SYSFS_CGROUPPATH,
options: vec!["nosuid", "nodev", "noexec", "mode=755"],
};
let last_mount = InitMount {
let last_mount = INIT_MOUNT {
fstype: "tmpfs",
src: "tmpfs",
dest: SYSFS_CGROUPPATH,
options: vec!["remount", "ro", "nosuid", "nodev", "noexec", "mode=755"],
};
let cg_devices_mount = InitMount {
let cg_devices_mount = INIT_MOUNT {
fstype: "cgroup",
src: "cgroup",
dest: "/sys/fs/cgroup/devices",
@@ -1319,43 +1217,43 @@ mod tests {
.expect("failed to create cgroup file filename");
let mut file =
File::create(filename).unwrap_or_else(|_| panic!("{}: failed to create file", msg));
File::create(filename).expect(&format!("{}: failed to create file", msg));
file.write_all(d.contents.as_bytes())
.unwrap_or_else(|_| panic!("{}: failed to write file contents", msg));
.expect(&format!("{}: failed to write file contents", msg));
let result = get_cgroup_mounts(&logger, filename, false);
let msg = format!("{}: result: {:?}", msg, result);
if !d.error_contains.is_empty() {
assert!(result.is_err(), "{}", msg);
if d.error_contains != "" {
assert!(result.is_err(), msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
assert!(error_msg.contains(d.error_contains), msg);
continue;
}
assert!(result.is_ok(), "{}", msg);
assert!(result.is_ok(), msg);
let mounts = result.unwrap();
let count = mounts.len();
if !d.devices_cgroup {
assert!(count == 0, "{}", msg);
assert!(count == 0, msg);
continue;
}
// get_cgroup_mounts() adds the device cgroup plus two other mounts.
assert!(count == (1 + 2), "{}", msg);
assert!(count == (1 + 2), msg);
// First mount
assert!(mounts[0].eq(&first_mount), "{}", msg);
assert!(mounts[0].eq(&first_mount), msg);
// Last mount
assert!(mounts[2].eq(&last_mount), "{}", msg);
assert!(mounts[2].eq(&last_mount), msg);
// Devices cgroup
assert!(mounts[1].eq(&cg_devices_mount), "{}", msg);
assert!(mounts[1].eq(&cg_devices_mount), msg);
}
}
}
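
The `local_storage_handler` hunks above parse `key=value` storage options, read `mode` as an octal string, and, on the side of the diff that handles `fsgid`, OR in the setgid bit (`0o2000`). A minimal standard-library sketch of that arithmetic follows. `effective_mode` is a hypothetical helper; it only checks whether `fsgid` is present, whereas the diff parses the gid and calls `chown`. The tail of `parse_options` is not visible in the diff, so the insertion step shown here is an assumption.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

/// Splits "key=value" entries into a map. Entries that do not contain
/// exactly one '=' are skipped, mirroring the `fields.len() != 2` check.
fn parse_options(option_list: &[String]) -> HashMap<String, String> {
    let mut options = HashMap::new();
    for opt in option_list {
        let fields: Vec<&str> = opt.split('=').collect();
        if fields.len() != 2 {
            continue;
        }
        // Assumed behavior: store the key and value as owned strings.
        options.insert(fields[0].to_string(), fields[1].to_string());
    }
    options
}

/// Hypothetical helper: computes the permission bits for a mount point.
/// Returns Ok(None) when no "mode" option was supplied.
fn effective_mode(opts: &HashMap<String, String>) -> Result<Option<u32>, ParseIntError> {
    let mode = match opts.get("mode") {
        Some(m) => m,
        None => return Ok(None),
    };
    // The mode string is octal, e.g. "0750".
    let mut o_mode = u32::from_str_radix(mode, 8)?;
    if opts.contains_key("fsgid") {
        // Set the setgid bit so new files inherit the directory's group.
        o_mode |= 0o2000;
    }
    Ok(Some(o_mode))
}
```

With the options `mode=0750` and `fsgid=1000`, the resulting mode is `0o2750`. Without `fsgid`, the mode is used as given.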


@@ -11,6 +11,7 @@ use std::fmt;
use std::fs;
use std::fs::File;
use std::path::{Path, PathBuf};
use std::thread::{self};
use crate::mount::{BareMount, FLAGS};
use slog::Logger;
@@ -45,30 +46,29 @@ impl Namespace {
logger: logger.clone(),
path: String::from(""),
persistent_ns_dir: String::from(PERSISTENT_NS_DIR),
ns_type: NamespaceType::Ipc,
ns_type: NamespaceType::IPC,
hostname: None,
}
}
pub fn get_ipc(mut self) -> Self {
self.ns_type = NamespaceType::Ipc;
pub fn as_ipc(mut self) -> Self {
self.ns_type = NamespaceType::IPC;
self
}
pub fn get_uts(mut self, hostname: &str) -> Self {
self.ns_type = NamespaceType::Uts;
if !hostname.is_empty() {
pub fn as_uts(mut self, hostname: &str) -> Self {
self.ns_type = NamespaceType::UTS;
if hostname != "" {
self.hostname = Some(String::from(hostname));
}
self
}
pub fn get_pid(mut self) -> Self {
self.ns_type = NamespaceType::Pid;
pub fn as_pid(mut self) -> Self {
self.ns_type = NamespaceType::PID;
self
}
#[allow(dead_code)]
pub fn set_root_dir(mut self, dir: &str) -> Self {
self.persistent_ns_dir = dir.to_string();
self
@@ -76,12 +76,12 @@ impl Namespace {
// setup creates persistent namespace without switching to it.
// Note, pid namespaces cannot be persisted.
pub async fn setup(mut self) -> Result<Self> {
pub fn setup(mut self) -> Result<Self> {
fs::create_dir_all(&self.persistent_ns_dir)?;
let ns_path = PathBuf::from(&self.persistent_ns_dir);
let ns_type = self.ns_type;
if ns_type == NamespaceType::Pid {
if ns_type == NamespaceType::PID {
return Err(anyhow!("Cannot persist namespace of PID type"));
}
let logger = self.logger.clone();
@@ -93,51 +93,48 @@ impl Namespace {
self.path = new_ns_path.clone().into_os_string().into_string().unwrap();
let hostname = self.hostname.clone();
let new_thread = tokio::spawn(async move {
if let Err(err) = || -> Result<()> {
let origin_ns_path = get_current_thread_ns_path(&ns_type.get());
let new_thread = thread::spawn(move || -> Result<()> {
let origin_ns_path = get_current_thread_ns_path(&ns_type.get());
File::open(Path::new(&origin_ns_path))?;
File::open(Path::new(&origin_ns_path))?;
// Create a new netns on the current thread.
let cf = ns_type.get_flags();
// Create a new netns on the current thread.
let cf = ns_type.get_flags().clone();
unshare(cf)?;
unshare(cf)?;
if ns_type == NamespaceType::Uts && hostname.is_some() {
nix::unistd::sethostname(hostname.unwrap())?;
}
// Bind mount the new namespace from the current thread onto the mount point to persist it.
let source: &str = origin_ns_path.as_str();
let destination: &str = new_ns_path.as_path().to_str().unwrap_or("none");
let mut flags = MsFlags::empty();
if let Some(x) = FLAGS.get("rbind") {
let (_, f) = *x;
flags |= f;
};
let bare_mount = BareMount::new(source, destination, "none", flags, "", &logger);
bare_mount.mount().map_err(|e| {
anyhow!(
"Failed to mount {} to {} with err:{:?}",
source,
destination,
e
)
})?;
Ok(())
}() {
return Err(err);
if ns_type == NamespaceType::UTS && hostname.is_some() {
nix::unistd::sethostname(hostname.unwrap())?;
}
// Bind mount the new namespace from the current thread onto the mount point to persist it.
let source: &str = origin_ns_path.as_str();
let destination: &str = new_ns_path.as_path().to_str().unwrap_or("none");
let mut flags = MsFlags::empty();
match FLAGS.get("rbind") {
Some(x) => {
let (_, f) = *x;
flags = flags | f;
}
None => (),
};
let bare_mount = BareMount::new(source, destination, "none", flags, "", &logger);
bare_mount.mount().map_err(|e| {
anyhow!(
"Failed to mount {} to {} with err:{:?}",
source,
destination,
e
)
})?;
Ok(())
});
new_thread
.await
.join()
.map_err(|e| anyhow!("Failed to join thread {:?}!", e))??;
Ok(self)
@@ -147,27 +144,27 @@ impl Namespace {
/// Represents the Namespace type.
#[derive(Clone, Copy, PartialEq)]
enum NamespaceType {
Ipc,
Uts,
Pid,
IPC,
UTS,
PID,
}
impl NamespaceType {
/// Get the string representation of the namespace type.
pub fn get(&self) -> &str {
match *self {
Self::Ipc => "ipc",
Self::Uts => "uts",
Self::Pid => "pid",
Self::IPC => "ipc",
Self::UTS => "uts",
Self::PID => "pid",
}
}
/// Get the flags associated with the namespace type.
pub fn get_flags(&self) -> CloneFlags {
match *self {
Self::Ipc => CloneFlags::CLONE_NEWIPC,
Self::Uts => CloneFlags::CLONE_NEWUTS,
Self::Pid => CloneFlags::CLONE_NEWPID,
Self::IPC => CloneFlags::CLONE_NEWIPC,
Self::UTS => CloneFlags::CLONE_NEWUTS,
Self::PID => CloneFlags::CLONE_NEWPID,
}
}
}
@@ -178,6 +175,12 @@ impl fmt::Debug for NamespaceType {
}
}
impl Default for NamespaceType {
fn default() -> Self {
NamespaceType::IPC
}
}
#[cfg(test)]
mod tests {
use super::{Namespace, NamespaceType};
@@ -185,58 +188,55 @@ mod tests {
use nix::sched::CloneFlags;
use tempfile::Builder;
#[tokio::test]
async fn test_setup_persistent_ns() {
#[test]
fn test_setup_persistent_ns() {
skip_if_not_root!();
// Create dummy logger and temp folder.
let logger = slog::Logger::root(slog::Discard, o!());
let tmpdir = Builder::new().prefix("ipc").tempdir().unwrap();
let ns_ipc = Namespace::new(&logger)
.get_ipc()
.as_ipc()
.set_root_dir(tmpdir.path().to_str().unwrap())
.setup()
.await;
.setup();
assert!(ns_ipc.is_ok());
assert!(remove_mounts(&[ns_ipc.unwrap().path]).is_ok());
assert!(remove_mounts(&vec![ns_ipc.unwrap().path]).is_ok());
let logger = slog::Logger::root(slog::Discard, o!());
let tmpdir = Builder::new().prefix("uts").tempdir().unwrap();
let ns_uts = Namespace::new(&logger)
.get_uts("test_hostname")
.as_uts("test_hostname")
.set_root_dir(tmpdir.path().to_str().unwrap())
.setup()
.await;
.setup();
assert!(ns_uts.is_ok());
assert!(remove_mounts(&[ns_uts.unwrap().path]).is_ok());
assert!(remove_mounts(&vec![ns_uts.unwrap().path]).is_ok());
// Check it cannot persist pid namespaces.
let logger = slog::Logger::root(slog::Discard, o!());
let tmpdir = Builder::new().prefix("pid").tempdir().unwrap();
let ns_pid = Namespace::new(&logger)
.get_pid()
.as_pid()
.set_root_dir(tmpdir.path().to_str().unwrap())
.setup()
.await;
.setup();
assert!(ns_pid.is_err());
}
#[test]
fn test_namespace_type() {
let ipc = NamespaceType::Ipc;
let ipc = NamespaceType::IPC;
assert_eq!("ipc", ipc.get());
assert_eq!(CloneFlags::CLONE_NEWIPC, ipc.get_flags());
let uts = NamespaceType::Uts;
let uts = NamespaceType::UTS;
assert_eq!("uts", uts.get());
assert_eq!(CloneFlags::CLONE_NEWUTS, uts.get_flags());
let pid = NamespaceType::Pid;
let pid = NamespaceType::PID;
assert_eq!("pid", pid.get());
assert_eq!(CloneFlags::CLONE_NEWPID, pid.get_flags());
}
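
Note on the hunks above: the two sides of this diff differ mainly in how the `NamespaceType` variants are spelled (`Ipc`/`Uts`/`Pid` versus `IPC`/`UTS`/`PID`). The following is a minimal, self-contained sketch of the enum using the `Ipc`/`Uts`/`Pid` spelling that Rust's naming lints prefer. It is not the agent's actual code: to avoid the `nix` dependency, `get_flags` here returns plain `u32` values that are assumed to mirror the Linux `CLONE_NEW*` constants, whereas the diff returns `nix::sched::CloneFlags`.

```rust
/// Illustrative stand-in for the namespace kinds shown in the diff.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum NamespaceType {
    Ipc,
    Uts,
    Pid,
}

impl NamespaceType {
    /// String name of the namespace, as used under /proc/<pid>/ns/.
    pub fn get(&self) -> &'static str {
        match self {
            Self::Ipc => "ipc",
            Self::Uts => "uts",
            Self::Pid => "pid",
        }
    }

    /// Assumption: raw clone(2) flag values hardcoded for illustration;
    /// the code in the diff uses nix::sched::CloneFlags instead.
    pub fn get_flags(&self) -> u32 {
        match self {
            Self::Ipc => 0x0800_0000, // CLONE_NEWIPC
            Self::Uts => 0x0400_0000, // CLONE_NEWUTS
            Self::Pid => 0x2000_0000, // CLONE_NEWPID
        }
    }
}
```

Renaming the variants does not change behaviour; only the identifiers differ between the two sides of the diff.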

File diff suppressed because it is too large

@@ -48,7 +48,7 @@ pub fn setup_guest_dns(logger: Logger, dns_list: Vec<String>) -> Result<()> {
fn do_setup_guest_dns(logger: Logger, dns_list: Vec<String>, src: &str, dst: &str) -> Result<()> {
let logger = logger.new(o!( "subsystem" => "network"));
if dns_list.is_empty() {
if dns_list.len() == 0 {
info!(
logger,
"Did not set sandbox DNS as DNS not received as part of request."
@@ -117,12 +117,12 @@ mod tests {
];
// write to /run/kata-containers/sandbox/resolv.conf
let mut src_file = File::create(src_filename)
.unwrap_or_else(|_| panic!("failed to create file {:?}", src_filename));
let mut src_file =
File::create(src_filename).expect(&format!("failed to create file {:?}", src_filename));
let content = dns.join("\n");
src_file
.write_all(content.as_bytes())
.expect("failed to write file contents");
.expect(&format!("failed to write file contents"));
// call do_setup_guest_dns
let result = do_setup_guest_dns(logger, dns.clone(), src_filename, dst_filename);
@@ -139,10 +139,10 @@ mod tests {
assert_eq!(true, content.is_ok());
let content = content.unwrap();
let expected_dns: Vec<&str> = content.split('\n').collect();
let expected_DNS: Vec<&str> = content.split('\n').collect();
// assert the data are the same as /run/kata-containers/sandbox/resolv.conf
assert_eq!(dns, expected_dns);
assert_eq!(dns, expected_DNS);
// umount /etc/resolv.conf
let _ = mount::umount(dst_filename);
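
Note on the hunks above: the differences in this file are stylistic (`is_empty()` versus `len() == 0`, `unwrap_or_else(|_| panic!(...))` versus `expect(&format!(...))`, and `expected_dns` versus `expected_DNS`). The sketch below shows the `is_empty()` and lazily formatted panic forms in isolation. The helper names `has_dns`, `render_resolv_conf` and `first_nameserver` are hypothetical and do not appear in the agent's source.

```rust
/// Hypothetical helper: true when there is at least one DNS entry to apply.
/// `is_empty()` states the intent more directly than `len() == 0`.
pub fn has_dns(dns_list: &[String]) -> bool {
    !dns_list.is_empty()
}

/// Hypothetical helper: join entries with newlines, as the test in the
/// diff does before writing the source file.
pub fn render_resolv_conf(dns_list: &[String]) -> String {
    dns_list.join("\n")
}

/// Hypothetical helper: return the first entry, building the panic message
/// only on failure. `expect(&format!(...))` would allocate the message even
/// when the value is present.
pub fn first_nameserver(dns_list: &[String]) -> &String {
    dns_list
        .first()
        .unwrap_or_else(|| panic!("no DNS entries in {:?}", dns_list))
}
```

Splitting the rendered content on `'\n'` recovers the original entries, which is the round trip that the test in the diff checks after the bind mount.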

Some files were not shown because too many files have changed in this diff