Compare commits

..

36 Commits

Author SHA1 Message Date
Fabiano Fidêncio
0ad6f05dee Merge pull request #4024 from bergwolf/2.4.0-branch-bump
# Kata Containers 2.4.0
2022-04-01 13:46:35 +02:00
Peng Tao
4c9c01a124 release: Kata Containers 2.4.0
- stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
- stable-2.4 | kata-monitor: fix duplicated output when printing usage
- stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
- kata-deploy: fix version bump from -rc to stable
- stable-2.4: release: Include all the rust vendored code into the vendored tarball
- stable-2.4 | tools: release: Do not consider release candidates as stable releases
- agent: Signal the whole process group
- stable-2.4 | docs: Update k8s documentation
- backport main commits to stable 2.4
- stable-2.4: Bump QEMU to 6.2 (bringing then SGX support in)
- runtime: Properly handle ESRCH error when signaling container
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1

f2319d69 release: Adapt kata-deploy for 2.4.0
cae48e9c agent: fix container stop error with signal SIGRTMIN+3
342aa95c kata-monitor: fix duplicated output when printing usage
9f75e226 runtime: add logs around sandbox monitor
363fbed8 runtime: stop getting OOM events when ttrpc: closed error
f840de5a workflows,release: Ship *all* the rust vendored code
952cea5f tools: Add a generate_vendor.sh script
cc965fa0 kata-deploy: fix version bump from -rc to stable
f41cc184 tools: release: Do not consider release candidates as stable releases
e059b50f runtime: Add more debug logs for container io stream copy
71ce6f53 agent: Kill the all the container processes of the same cgroup
30fc2c86 docs: Update k8s documentation
24028969 virtcontainers: Run mock hook from build tree rather than system bin dir
4e54aa5a doc: fix filename typo
d815393c manager: Add options to change self test behaviour
4111e1a3 manager: Add option to enable component debug
2918be18 manager: Create containerd link
6b31b068 kernel: fix cve-2022-0847
5589b246 doc: update Intel SGX use cases document
1da88dca tools: update QEMU to 6.2
3e2f9223 runtime: Properly handle ESRCH error when signaling container
4c21cb3e versions: Upgrade to Cloud Hypervisor v22.1

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Peng Tao
f2319d693d release: Adapt kata-deploy for 2.4.0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Bin Liu
98ccf8f6a1 Merge pull request #4008 from wxx213/stable-2.4
stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
2022-04-01 11:29:18 +08:00
Wang Xingxing
cae48e9c9b agent: fix container stop error with signal SIGRTMIN+3
The nix::sys::signal::Signal package api cannot deal with SIGRTMIN+3,
directly use libc function to send the signal.

Fixes: #3990

Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(cherry picked from commit 0d765bd082)
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
2022-03-31 16:49:06 +08:00
snir911
a36103c759 Merge pull request #4003 from fgiudici/kata-monitor_fix_help_backport
stable-2.4 | kata-monitor: fix duplicated output when printing usage
2022-03-30 18:57:17 +03:00
Fabiano Fidêncio
6abbcc551c Merge pull request #3997 from liubin/backport-2.4
stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
2022-03-30 14:08:55 +02:00
Francesco Giudici
342aa95cc8 kata-monitor: fix duplicated output when printing usage
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:

[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
  -listen-address string
        The address to listen on for HTTP requests. (default ":8090")
  -log-level string
        Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
  -runtime-endpoint string
        Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")

the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.

Fixes: #3998

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit a63bbf9793)
2022-03-30 14:02:54 +02:00
bin
9f75e226f1 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:40 +08:00
bin
363fbed804 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.

When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:35 +08:00
Fabiano Fidêncio
54a638317a Merge pull request #3988 from bergwolf/github/kata-deploy
kata-deploy: fix version bump from -rc to stable
2022-03-30 11:01:45 +02:00
Peng Tao
8ce6b12b41 Merge pull request #3993 from fidencio/wip/stable-2.4-release-include-all-rust-vendored-code-to-the-vendored-tarball
stable-2.4: release: Include all the rust vendored code into the vendored tarball
2022-03-30 16:10:47 +08:00
Fabiano Fidêncio
f840de5acb workflows,release: Ship *all* the rust vendored code
Instead of only vendoring the code needed by the agent, let's ensure we
vendor all the needed rust code, and let's do it using the newly
introduced enerate_vendor.sh script.

Fixes: #3973

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3606923ac8)
2022-03-29 23:27:43 +02:00
Fabiano Fidêncio
952cea5f5d tools: Add a generate_vendor.sh script
This script is responsible for generating a tarball with all the rust
vendored code that is needed for fully building kata-containers on a
disconnected environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2eb07455d0)
2022-03-29 23:27:29 +02:00
Peng Tao
cc965fa0cb kata-deploy: fix version bump from -rc to stable
In such case, we should bump from "latest" tag rather than from
current_version.

Fixes: #3986
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-03-29 03:45:27 +00:00
GabyCT
44b1473d0c Merge pull request #3977 from fidencio/wip/backport-fix-for-3847
stable-2.4 | tools: release: Do not consider release candidates as stable releases
2022-03-28 10:38:47 -06:00
Fupan Li
565efd1bf2 Merge pull request #3975 from bergwolf/github/backport-stable-2.4
agent: Signal the whole process group
2022-03-28 18:26:12 +08:00
Fabiano Fidêncio
f41cc18427 tools: release: Do not consider release candidates as stable releases
During the release of 2.4.0-rc0 @egernst noticed an incositency in the
way we handle release tags, as release candidates are being taken as
"stable" releases, while both the kata-deploy tests and the release
action consider this as "latest".

Ideally we should have our own tag for "release candidate", but that's
something that could and should be discussed more extensively outside of
the scope of this quick fix.

For now, let's align the code generating the PR for bumping the release
with what we already do as part of the release action and kata-deploy
test, and tag "-rc"  as latest, regardless of which branch it's coming
from.

Fixes: #3847

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4adf93ef2c)
2022-03-28 11:01:58 +02:00
Feng Wang
e059b50f5c runtime: Add more debug logs for container io stream copy
This can help debugging container lifecycle issues

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:22:22 +08:00
Feng Wang
71ce6f537f agent: Kill the all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:21:51 +08:00
Bin Liu
a2b73b60bd Merge pull request #3960 from cmaf/update-k8s-docs-1-stable-2.4
stable-2.4 | docs: Update k8s documentation
2022-03-25 15:25:25 +08:00
Bin Liu
2ce9ce7b8f Merge pull request #3954 from bergwolf/github/backport-stable-2.4
backport main commits to stable 2.4
2022-03-25 14:45:17 +08:00
Chelsea Mafrica
30fc2c863d docs: Update k8s documentation
Update documentation with missing step to untaint node to enable
scheduling and update the example to run a pod using the kata runtime
class instead of untrusted workloads, which applies to versions of CRI-O
prior to v1.12.

Fixes #3863

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit 5c434270d1)
2022-03-24 11:22:18 -07:00
David Gibson
24028969c2 virtcontainers: Run mock hook from build tree rather than system bin dir
Running unit tests should generally have minimal dependencies on
things outside the build tree.  It *definitely* shouldn't modify
system wide things outside the build tree.  Currently the runtime
"make test" target does so, though.

Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary.  They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.

Go tests automatically run within the package's directory though, so
there's no need to use a system wide path.  We can use a relative path to
the binary build within the tree just as easily.

fixes #3941

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-24 12:02:00 +08:00
Garrett Mahin
4e54aa5a7b doc: fix filename typo
Corrects a filename typo in cleanup cluster part
of kata-deploy README.md

Fixes: #3869
Signed-off-by: Garrett Mahin <garrett.mahin@gmail.com>
2022-03-24 12:00:17 +08:00
James O. D. Hunt
d815393c3e manager: Add options to change self test behaviour
Added new `kata-manager` options to control the self-test behaviour. By
default, after installation the manager will run a test to ensure a Kata
Containers container can be created. New options allow:

- The self test to be disabled.
- Only the self test to be run (no installation).

These features allow changes to be made to the installed system before
the self test is run.

Fixes: #3851.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:48 +08:00
James O. D. Hunt
4111e1a3de manager: Add option to enable component debug
Added a `-d` option to `kata-manager` to enable Kata Containers
and containerd debug.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:33 +08:00
James O. D. Hunt
2918be180f manager: Create containerd link
Make the `kata-manager` create a `containerd` link to ensure the
downloaded containerd systemd service file can find the daemon when
using the GitHub packaged version of containerd.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:26 +08:00
Julio Montes
6b31b06832 kernel: fix cve-2022-0847
bump guest kernel version to fix cve-2022-0847 "Dirty Pipe"

fixes #3852

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-03-24 11:58:43 +08:00
Fabiano Fidêncio
53a9cf7dc4 Merge pull request #3927 from fidencio/stable-2.4/qemu-bump
stable-2.4: Bump QEMU to 6.2 (bringing then SGX support in)
2022-03-23 07:20:35 +01:00
Julio Montes
5589b246d7 doc: update Intel SGX use cases document
Installation section is not longer needed because of the latest
default kata kernel supports Intel SGX.
Include QEMU to the list of supported hypervisors.

fixes #3911

Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 24b29310b2)
2022-03-22 08:36:04 +01:00
Julio Montes
1da88dca4b tools: update QEMU to 6.2
bring Intel SGX support

Changes tha may impact in Kata Containers
Arm:
The 'virt' machine now supports an emulated ITS
The 'virt' machine now supports more than 123 CPUs in TCG emulation mode
The pl031 real-time clock device now supports sending RTC_CHANGE QMP events

PowerPC:
Improved POWER10 support for the 'powernv' machine
Initial support for POWER10 DD2.0 CPU added
Added support for FORM2 PAPR NUMA descriptions in the "pseries" machine
 type

s390x:
Improved storage key emulation (e.g. fixed address handling, lazy
 storage key enablement for TCG, ...)
New gen16 CPU features are now enabled automatically in the latest
 machine type

KVM:
Support for SGX in the virtual machine, using the /dev/sgx_vepc device
 on the host and the "memory-backend-epc" backend in QEMU.
New "hv-apicv" CPU property (aliased to "hv-avic") sets the
 HV_DEPRECATING_AEOI_RECOMMENDED bit in CPUID[0x40000004].EAX.

virtio-mem:
QEMU now fully supports guest memory dumps with virtio-mem.
QEMU now cleanly supports precopy migration, postcopy migration and
 background snapshots with virtio-mem.

fixes #3902

Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 18d4d7fb1d)
2022-03-22 08:35:45 +01:00
Peng Tao
8cc2231818 Merge pull request #3892 from fengwang666/my_2.4_pr_backport
runtime: Properly handle ESRCH error when signaling container
2022-03-15 10:11:25 +08:00
GabyCT
63c1498f05 Merge pull request #3891 from likebreath/stable-2.4
stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1
2022-03-14 17:44:09 -06:00
Feng Wang
3e2f9223b0 runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aa5ae6b17c)
2022-03-14 13:15:54 -07:00
Bo Chen
4c21cb3eb1 versions: Upgrade to Cloud Hypervisor v22.1
This is a bug fix release. The following issues have been addressed:
1) VFIO ioctl reordering to fix MSI on AMD platforms; 2) Fix virtio-net
control queue.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v22.1

Fixes: #3872

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 7a18e32fa7)
2022-03-14 12:34:31 -07:00
242 changed files with 2608 additions and 11806 deletions

View File

@@ -1,38 +0,0 @@
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
name: Add PR sizing label
on:
pull_request_target:
types:
- opened
- reopened
- synchronize
jobs:
add-pr-size-label:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v1
- name: Install PR sizing label script
run: |
# Clone into a temporary directory to avoid overwriting
# any existing github directory.
pushd $(mktemp -d) &>/dev/null
git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts
sudo install pr-add-size-label.sh /usr/local/bin
popd &>/dev/null
- name: Add PR sizing label
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_PR_SIZE_TOKEN }}
run: |
pr=${{ github.event.number }}
sudo apt -y install diffstat patchutils
pr-add-size-label.sh -p "$pr"

View File

@@ -10,7 +10,7 @@ env:
error_msg: |+
See the document below for help on formatting commits for the project.
https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md#patch-format
https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-format
jobs:
commit-message-check:

View File

@@ -1,44 +0,0 @@
on:
schedule:
- cron: '0 23 * * 0'
name: Docs URL Alive Check
jobs:
test:
strategy:
matrix:
go-version: [1.17.x]
os: [ubuntu-20.04]
runs-on: ${{ matrix.os }}
env:
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v2
with:
go-version: ${{ matrix.go-version }}
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
# docs url alive check
- name: Docs URL Alive Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make docs-url-alive-check

View File

@@ -26,7 +26,6 @@ jobs:
- name: Build ${{ matrix.asset }}
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-copy-yq-installer.sh
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink

1
.gitignore vendored
View File

@@ -9,5 +9,4 @@ src/agent/src/version.rs
src/agent/kata-agent.service
src/agent/protocols/src/*.rs
!src/agent/protocols/src/lib.rs
build

View File

@@ -14,7 +14,6 @@ TOOLS =
TOOLS += agent-ctl
TOOLS += trace-forwarder
TOOLS += runk
STANDARD_TARGETS = build check clean install test vendor
@@ -40,16 +39,10 @@ generate-protocols:
static-checks: build
bash ci/static-checks.sh
docs-url-alive-check:
bash ci/docs-url-alive-check.sh
.PHONY: \
all \
binary-tarball \
default \
install-binary-tarball \
logging-crate-tests \
static-checks \
docs-url-alive-check
static-checks

View File

@@ -118,7 +118,6 @@ The table below lists the core parts of the project:
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [tests](https://github.com/kata-containers/tests) | tests | Excludes unit tests which live with the main code. |
### Additional components
@@ -132,7 +131,6 @@ The table below lists the remaining parts of the project:
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |

View File

@@ -1 +1 @@
2.5.0-alpha1
2.4.0

View File

@@ -1,12 +0,0 @@
#!/bin/bash
#
# Copyright (c) 2021 Easystack Inc.
#
# SPDX-License-Identifier: Apache-2.0
set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
run_docs_url_alive_check

View File

@@ -19,7 +19,7 @@ source "${tests_repo_dir}/.ci/lib.sh"
# fail. So let's ensure they are unset here.
unset PREFIX DESTDIR
arch=${ARCH:-$(uname -m)}
arch=$(uname -m)
workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
@@ -70,8 +70,7 @@ build_and_install_gperf() {
curl -sLO "${gperf_tarball_url}"
tar -xf "${gperf_tarball}"
pushd "gperf-${gperf_version}"
# Unset $CC for configure, we will always use native for gperf
CC= ./configure --prefix="${gperf_install_dir}"
./configure --prefix="${gperf_install_dir}"
make
make install
export PATH=$PATH:"${gperf_install_dir}"/bin
@@ -85,7 +84,7 @@ build_and_install_libseccomp() {
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static
make
make install
popd

24
ci/install_musl.sh Executable file
View File

@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# Copyright (c) 2020 Ant Group
#
# SPDX-License-Identifier: Apache-2.0
#
set -e
install_aarch64_musl() {
local arch=$(uname -m)
if [ "${arch}" == "aarch64" ]; then
local musl_tar="${arch}-linux-musl-native.tgz"
local musl_dir="${arch}-linux-musl-native"
pushd /tmp
if curl -sLO --fail https://musl.cc/${musl_tar}; then
tar -zxf ${musl_tar}
mkdir -p /usr/local/musl/
cp -r ${musl_dir}/* /usr/local/musl/
fi
popd
fi
}
install_aarch64_musl

View File

@@ -44,12 +44,3 @@ run_go_test()
clone_tests_repo
bash "$tests_repo_dir/.ci/go-test.sh"
}
run_docs_url_alive_check()
{
clone_tests_repo
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
bash "$tests_repo_dir/.ci/static-checks.sh" --docs --all "github.com/kata-containers/kata-containers"
}

View File

@@ -46,7 +46,7 @@ The following link shows the latest list of limitations:
# Contributing
If you would like to work on resolving a limitation, please refer to the
[contributors guide](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
[contributors guide](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).
If you wish to raise an issue for a new limitation, either
[raise an issue directly on the runtime](https://github.com/kata-containers/kata-containers/issues/new)
or see the

View File

@@ -30,6 +30,7 @@ See the [how-to documentation](how-to).
* [GPU Passthrough with Kata](./use-cases/GPU-passthrough-and-Kata.md)
* [SR-IOV with Kata](./use-cases/using-SRIOV-and-kata.md)
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [VPP with Kata](./use-cases/using-vpp-and-kata.md)
* [SPDK vhost-user with Kata](./use-cases/using-SPDK-vhostuser-and-kata.md)
* [Intel SGX with Kata](./use-cases/using-Intel-SGX-and-kata.md)

View File

@@ -277,9 +277,7 @@ mod tests {
## Temporary files
Use `t.TempDir()` to create temporary directory. The directory created by
`t.TempDir()` is automatically removed when the test and all its subtests
complete.
Always delete temporary files on success.
### Golang temporary files
@@ -288,7 +286,11 @@ func TestSomething(t *testing.T) {
assert := assert.New(t)
// Create a temporary directory
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
// Delete it at the end of the test
defer os.RemoveAll(tmpdir)
// Add test logic that will use the tmpdir here...
}

View File

@@ -11,7 +11,6 @@ Kata Containers design documents:
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
- [Design for direct-assigned volume](direct-blk-device-assignment.md)
---

View File

@@ -67,15 +67,22 @@ Using a proxy for multiplexing the connections between the VM and the host uses
4.5MB per [POD][2]. In a high density deployment this could add up to GBs of
memory that could have been used to host more PODs. When we talk about density
each kilobyte matters and it might be the decisive factor between run another
POD or not. Before making the decision not to use VSOCKs, you should ask
POD or not. For example if you have 500 PODs running in a server, the same
amount of [`kata-proxy`][3] processes will be running and consuming for around
2250MB of RAM. Before making the decision not to use VSOCKs, you should ask
yourself, how many more containers can run with the memory RAM consumed by the
Kata proxies?
### Reliability
[`kata-proxy`][3] is in charge of multiplexing the connections between virtual
machine and host processes, if it dies all connections get broken. For example
if you have a [POD][2] with 10 containers running, if `kata-proxy` dies it would
be impossible to contact your containers, though they would still be running.
Since communication via VSOCKs is direct, the only way to lose communication
with the containers is if the VM itself or the `containerd-shim-kata-v2` dies, if this happens
the containers are removed automatically.
[1]: https://wiki.qemu.org/Features/VirtioVsock
[2]: ./vcpu-handling.md#virtual-cpus-and-kubernetes-pods
[3]: https://github.com/kata-containers/proxy

View File

@@ -1,253 +0,0 @@
# Motivation
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, its cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, its difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. Its also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
* **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
type MountInfo struct {
// The type of the volume (ie. block)
VolumeType string `json:"volume-type"`
// The device backing the volume.
Device string `json:"device"`
// The filesystem type to be mounted on the volume.
FsType string `json:"fstype"`
// Additional metadata to pass to the agent regarding this volume.
Metadata map[string]string `json:"metadata,omitempty"`
// Additional mount options.
Options []string `json:"options,omitempty"`
}
```
Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime, it shouldn't container any secrets (such as SMB mount password).
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mount-info.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
`containerd-shim-kata-v2` provides an API for sandbox management through a Unix domain socket. Two new handlers are proposed: `/direct-volume/stats` and `/direct-volume/resize`:
Example:
```bash
$ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volume/stats/[urlSafeVolumePath]'
$ curl --unix-socket "$shim_socket_path" -I -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath"": [volumePath], "Size": "123123" }'
```
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
### Kata agent changes
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
Two new APIs and three new GRPC objects are added to GRPC protocol between the shim and agent for resizing and getting volume stats:
```protobuf
rpc GetVolumeStats(VolumeStatsRequest) returns (VolumeStatsResponse);
rpc ResizeVolume(ResizeVolumeRequest) returns (google.protobuf.Empty);
message VolumeStatsRequest {
// The volume path on the guest outside the container
string volume_guest_path = 1;
}
message ResizeVolumeRequest {
// Full VM guest path of the volume (outside the container)
string volume_guest_path = 1;
uint64 size = 2;
}
// This should be kept in sync with CSI NodeGetVolumeStatsResponse (https://github.com/container-storage-interface/spec/blob/v1.5.0/csi.proto)
message VolumeStatsResponse {
// This field is OPTIONAL.
repeated VolumeUsage usage = 1;
// Information about the current condition of the volume.
// This field is OPTIONAL.
// This field MUST be specified if the VOLUME_CONDITION node
// capability is supported.
VolumeCondition volume_condition = 2;
}
message VolumeUsage {
enum Unit {
UNKNOWN = 0;
BYTES = 1;
INODES = 2;
}
// The available capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 available = 1;
// The total capacity in specified Unit. This field is REQUIRED.
// The value of this field MUST NOT be negative.
uint64 total = 2;
// The used capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 used = 3;
// Units by which values are measured. This field is REQUIRED.
Unit unit = 4;
}
// VolumeCondition represents the current condition of a volume.
message VolumeCondition {
// Normal volumes are available for use and operating optimally.
// An abnormal volume does not meet these criteria.
// This field is REQUIRED.
bool abnormal = 1;
// The message describing the condition of the volume.
// This field is REQUIRED.
string message = 2;
}
```
### Step by step walk-through
Given the following definition:
```YAML
---
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
runtime-class: kata-qemu
containers:
- name: app
image: centos
command: ["/bin/sh"]
args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
volumeMounts:
- name: persistent-storage
mountPath: /data
volumes:
- name: persistent-storage
persistentVolumeClaim:
claimName: ebs-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
skip-hostmount: "true"
name: ebs-claim
spec:
accessModes:
- ReadWriteOncePod
volumeMode: Filesystem
storageClassName: ebs-sc
resources:
requests:
storage: 4Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
csi.storage.k8s.io/fstype: ext4
```
Lets assume that changes have been made in the `aws-ebs-csi-driver` node driver.
**Node publish volume**
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Node unstage volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Use the volume in sandbox**
1. Upon the request to start a container, the `containerd-shim-kata-v2` examines the container spec,
and iterates through the mounts. For each mount, if there is a `mountInfo.json` file under `/run/kata-containers/shared/direct-volumes/[mount source path]`,
it generates a `storage` GRPC object after overwriting the mount spec with the information in `mountInfo.json`.
2. The shim sends the storage objects to kata-agent through TTRPC.
3. The shim writes a file with the sandbox id as the name under `/run/kata-containers/shared/direct-volumes/[mount source path]`.
4. The kata-agent mounts the storage objects for the container.
**Node expand volume**
1. In the node CSI driver, the `NodeExpandVolume` API invokes: `kata-runtime direct-volume resize -volume-path "/kubelet/a/b/c/d/sdf" -size 8Gi`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to resize the volume.
4. The shim handles the request, asks the hypervisor to resize the block device and sends a GRPC request to Kata agent to resize the filesystem.
5. Kata agent receives the request and resizes the filesystem.
**Node get volume stats**
1. In the node CSI driver, the `NodeGetVolumeStats` API invokes: `kata-runtime direct-volume stats -volume-path "/kubelet/a/b/c/d/sdf"`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to get the volume stats.
4. The shim handles the request and forwards it to the Kata agent.
5. Kata agent receives the request and returns the filesystem stats.

View File

@@ -51,7 +51,6 @@ The `kata-monitor` management agent should be started on each node where the Kat
> **Note**: a *node* running Kata containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.
- Aggregate sandbox metrics running on the node, adding the `sandbox_id` label to them.
- Attach the additional `cri_uid`, `cri_name` and `cri_namespace` labels to the sandbox metrics, tracking the `uid`, `name` and `namespace` Kubernetes pod metadata.
- Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This simplifies the targets count in Prometheus and avoids exposing shim's metrics by `ip:port`.
Only one `kata-monitor` process runs in each node.

View File

@@ -2,15 +2,24 @@
## Default number of virtual CPUs
Before starting a container, the [runtime][4] reads the `default_vcpus` option
from the [configuration file][5] to determine the number of virtual CPUs
Before starting a container, the [runtime][6] reads the `default_vcpus` option
from the [configuration file][7] to determine the number of virtual CPUs
(vCPUs) needed to start the virtual machine. By default, `default_vcpus` is
equal to 1 for fast boot time and a small memory footprint per virtual machine.
Be aware that increasing this value negatively impacts the virtual machine's
boot time and memory footprint.
In general, we recommend that you do not edit this variable, unless you know
what are you doing. If your container needs more than one vCPU, use
[Kubernetes `cpu` limits][1] to assign more resources.
[docker `--cpus`][1], [docker update][4], or [Kubernetes `cpu` limits][2] to
assign more resources.
*Docker*
```sh
$ docker run --name foo -ti --cpus 2 debian bash
$ docker update --cpus 4 foo
```
*Kubernetes*
@@ -40,7 +49,7 @@ $ sudo -E kubectl create -f ~/cpu-demo.yaml
## Virtual CPUs and Kubernetes pods
A Kubernetes pod is a group of one or more containers, with shared storage and
network, and a specification for how to run the containers [[specification][2]].
network, and a specification for how to run the containers [[specification][3]].
In Kata Containers this group of containers, which is called a sandbox, runs inside
the same virtual machine. If you do not specify a CPU constraint, the runtime does
not add more vCPUs and the container is not placed inside a CPU cgroup.
@@ -64,7 +73,13 @@ constraints with each container trying to consume 100% of vCPU, the resources
divide in two parts, 50% of vCPU for each container because your virtual
machine does not have enough resources to satisfy containers needs. If you want
to give access to a greater or lesser portion of vCPUs to a specific container,
use [Kubernetes `cpu` requests][1].
use [`docker --cpu-shares`][1] or [Kubernetes `cpu` requests][2].
*Docker*
```sh
$ docker run -ti --cpus-shares=512 debian bash
```
*Kubernetes*
@@ -94,9 +109,10 @@ $ sudo -E kubectl create -f ~/cpu-demo.yaml
Before running containers without CPU constraint, consider that your containers
are not running alone. Since your containers run inside a virtual machine other
processes use the vCPUs as well (e.g. `systemd` and the Kata Containers
[agent][3]). In general, we recommend setting `default_vcpus` equal to 1 to
[agent][5]). In general, we recommend setting `default_vcpus` equal to 1 to
allow non-container processes to run on this vCPU and to specify a CPU
constraint for each container.
constraint for each container. If your container is already running and needs
more vCPUs, you can add more using [docker update][4].
## Container with CPU constraint
@@ -105,7 +121,7 @@ constraints using the following formula: `vCPUs = ceiling( quota / period )`, wh
`quota` specifies the number of microseconds per CPU Period that the container is
guaranteed CPU access and `period` specifies the CPU CFS scheduler period of time
in microseconds. The result determines the number of vCPU to hot plug into the
virtual machine. Once the vCPUs have been added, the [agent][3] places the
virtual machine. Once the vCPUs have been added, the [agent][5] places the
container inside a CPU cgroup. This placement allows the container to use only
its assigned resources.
@@ -122,6 +138,25 @@ the virtual machine starts with 8 vCPUs and 1 vCPUs is added and assigned
to the container. Non-container processes might be able to use 8 vCPUs but they
use a maximum 1 vCPU, hence 7 vCPUs might not be used.
*Container without CPU constraint*
```sh
$ docker run -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_*"
1 # number of vCPUs
100000 # cfs period
-1 # cfs quota
```
*Container with CPU constraint*
```sh
docker run --cpus 4 -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_*"
5 # number of vCPUs
100000 # cfs period
400000 # cfs quota
```
## Virtual CPU handling without hotplug
In some cases, the hardware and/or software architecture being utilized does not support
@@ -148,8 +183,11 @@ the container's `spec` will provide the sizing information directly. If these ar
calculate the number of CPUs required for the workload and augment this by `default_vcpus`
configuration option, and use this for the virtual machine size.
[1]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
[2]: https://kubernetes.io/docs/concepts/workloads/pods/pod/
[3]: ../../src/agent
[4]: ../../src/runtime
[5]: ../../src/runtime/README.md#configuration
[1]: https://docs.docker.com/config/containers/resource_constraints/#cpu
[2]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
[3]: https://kubernetes.io/docs/concepts/workloads/pods/pod/
[4]: https://docs.docker.com/engine/reference/commandline/update/
[5]: ../../src/agent
[6]: ../../src/runtime
[7]: ../../src/runtime/README.md#configuration

View File

@@ -39,7 +39,7 @@ Details of each solution and a summary are provided below.
Kata Containers with QEMU has complete compatibility with Kubernetes.
Depending on the host architecture, Kata Containers supports various machine types,
for example `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `q35`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](architecture/README.md#configuration) file.
@@ -60,8 +60,9 @@ Machine accelerators are architecture specific and can be used to improve the pe
and enable specific features of the machine types. The following machine accelerators
are used in Kata Containers:
- NVDIMM: This machine accelerator is x86 specific and only supported by `q35` machine types.
`nvdimm` is used to provide the root filesystem as a persistent memory device to the Virtual Machine.
- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
memory device to the Virtual Machine.
#### Hotplug devices

View File

@@ -15,11 +15,6 @@
- `qemu`
- `cloud-hypervisor`
- `firecracker`
In the case of `firecracker` the use of a block device `snapshotter` is needed
for the VM rootfs. Refer to the following guide for additional configuration
steps:
- [Setup Kata containers with `firecracker`](how-to-use-kata-containers-with-firecracker.md)
- `ACRN`
While `qemu` , `cloud-hypervisor` and `firecracker` work out of the box with installation of Kata,

View File

@@ -48,9 +48,9 @@ Running Docker containers Kata Containers requires care because `VOLUME`s specif
kataShared on / type virtiofs (rw,relatime,dax)
```
`kataShared` mount types are powered by [`virtio-fs`](https://virtio-fs.gitlab.io/), a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
`kataShared` mount types are powered by [`virtio-fs`][virtio-fs], a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker` will fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker` fill fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
## Workarounds/Solutions
@@ -58,7 +58,7 @@ Thanks to various community contributions (see [issue references below](#referen
### Use a memory backed volume
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir), which allows for memdisk-backed storage -- the the `medium: Memory`. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume][k8s-emptydir], which allows for memdisk-backed storage -- the [the `medium: Memory` ][k8s-memory-volume-type]. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
```yaml
apiVersion: v1

View File

@@ -101,7 +101,7 @@ Start an ACRN based Kata Container,
$ sudo docker run -ti --runtime=kata-runtime busybox sh
```
You will see ACRN(`acrn-dm`) is now running on your system, as well as a `kata-shim`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
You will see ACRN(`acrn-dm`) is now running on your system, as well as a `kata-shim`, `kata-proxy`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
```bash
$ ps -ef | grep -E "kata|acrn"

View File

@@ -1,254 +0,0 @@
# Configure Kata Containers to use Firecracker
This document provides an overview on how to run Kata Containers with the AWS Firecracker hypervisor.
## Introduction
AWS Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services that provide serverless operational models. AWS Firecracker runs workloads in lightweight virtual machines, called `microVMs`, which combine the security and isolation properties provided by hardware virtualization technology with the speed and flexibility of Containers.
Please refer to AWS Firecracker [documentation](https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md) for more details.
## Pre-requisites
This document requires the presence of Kata Containers on your system. Install using the instructions available through the following links:
- Kata Containers [automated installation](../install/README.md)
- Kata Containers manual installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](../Developer-Guide.md) steps.
> **Note:** Create rootfs image and not initrd image.
## Install AWS Firecracker
Kata Containers only support AWS Firecracker v0.23.4 ([yet](https://github.com/kata-containers/kata-containers/pull/1519)).
To install Firecracker we need to get the `firecracker` and `jailer` binaries:
```bash
$ release_url="https://github.com/firecracker-microvm/firecracker/releases"
$ version="v0.23.1"
$ arch=`uname -m`
$ curl ${release_url}/download/${version}/firecracker-${version}-${arch} -o firecracker
$ curl ${release_url}/download/${version}/jailer-${version}-${arch} -o jailer
$ chmod +x jailer firecracker
```
To make the binaries available from the default system `PATH` it is recommended to move them to `/usr/local/bin` or add a symbolic link:
```bash
$ sudo ln -s $(pwd)/firecracker /usr/local/bin
$ sudo ln -s $(pwd)/jailer /usr/local/bin
```
More details can be found in [AWS Firecracker docs](https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md)
In order to run Kata with AWS Firecracker a block device as the backing store for a VM is required. To interact with `containerd` and Kata we use the `devmapper` `snapshotter`.
## Configure `devmapper`
To check support for your `containerd` installation, you can run:
```
$ ctr plugins ls |grep devmapper
```
if the output of the above command is:
```
io.containerd.snapshotter.v1 devmapper linux/amd64 ok
```
then you can skip this section and move on to `Configure Kata Containers with AWS Firecracker`
If the output of the above command is:
```
io.containerd.snapshotter.v1 devmapper linux/amd64 error
```
then we need to setup `devmapper` `snapshotter`. Based on a [very useful
guide](https://docs.docker.com/storage/storagedriver/device-mapper-driver/)
from docker, we can set it up using the following scripts:
> **Note:** The following scripts assume a 100G sparse file for storing container images, a 10G sparse file for the thin-provisioning pool and 10G base image files for any sandboxed container created. This means that we will need at least 10GB free space.
```
#!/bin/bash
set -ex
DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool
mkdir -p ${DATA_DIR}
# Create data file
sudo touch "${DATA_DIR}/data"
sudo truncate -s 100G "${DATA_DIR}/data"
# Create metadata file
sudo touch "${DATA_DIR}/meta"
sudo truncate -s 10G "${DATA_DIR}/meta"
# Allocate loop devices
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")
# Define thin-pool parameters.
# See https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt for details.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768
# Create a thin-pool device
sudo dmsetup create "${POOL_NAME}" \
--table "0 ${LENGTH_IN_SECTORS} thin-pool ${META_DEV} ${DATA_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
cat << EOF
#
# Add this to your config.toml configuration file and restart `containerd` daemon
#
[plugins]
[plugins.devmapper]
pool_name = "${POOL_NAME}"
root_path = "${DATA_DIR}"
base_image_size = "10GB"
discard_blocks = true
EOF
```
Make it executable and run it:
```bash
$ sudo chmod +x ~/scripts/devmapper/create.sh
$ cd ~/scripts/devmapper/
$ sudo ./create.sh
```
Now, we can add the `devmapper` configuration provided from the script to `/etc/containerd/config.toml`.
> **Note:** If you are using the default `containerd` configuration (`containerd config default >> /etc/containerd/config.toml`), you may need to edit the existing `[plugins."io.containerd.snapshotter.v1.devmapper"]`configuration.
Save and restart `containerd`:
```bash
$ sudo systemctl restart containerd
```
We can use `dmsetup` to verify that the thin-pool was created successfully.
```bash
$ sudo dmsetup ls
```
We should also check that `devmapper` is registered and running:
```bash
$ sudo ctr plugins ls | grep devmapper
```
This script needs to be run only once, while setting up the `devmapper` `snapshotter` for `containerd`. Afterwards, make sure that on each reboot, the thin-pool is initialized from the same data directory. Otherwise, all the fetched containers (or the ones that you have created) will be re-initialized. A simple script that re-creates the thin-pool from the same data directory is shown below:
```
#!/bin/bash
set -ex
DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool
# Allocate loop devices
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")
# Define thin-pool parameters.
# See https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt for details.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768
# Create a thin-pool device
sudo dmsetup create "${POOL_NAME}" \
--table "0 ${LENGTH_IN_SECTORS} thin-pool ${META_DEV} ${DATA_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
```
We can create a systemd service to run the above script on each reboot:
```bash
$ sudo nano /lib/systemd/system/devmapper_reload.service
```
The service file:
```
[Unit]
Description=Devmapper reload script
[Service]
ExecStart=/path/to/script/reload.sh
[Install]
WantedBy=multi-user.target
```
Enable the newly created service:
```bash
$ sudo systemctl daemon-reload
$ sudo systemctl enable devmapper_reload.service
$ sudo systemctl start devmapper_reload.service
```
## Configure Kata Containers with AWS Firecracker
To configure Kata Containers with AWS Firecracker, copy the generated `configuration-fc.toml` file when building the `kata-runtime` to either `/etc/kata-containers/configuration-fc.toml` or `/usr/share/defaults/kata-containers/configuration-fc.toml`.
The following command shows full paths to the `configuration.toml` files that the runtime loads. It will use the first path that exists. (Please make sure the kernel and image paths are set correctly in the `configuration.toml` file)
```bash
$ sudo kata-runtime --show-default-config-paths
```
## Configure `containerd`
Next, we need to configure containerd. Add a file in your path (e.g. `/usr/local/bin/containerd-shim-kata-fc-v2`) with the following contents:
```
#!/bin/bash
KATA_CONF_FILE=/etc/containers/configuration-fc.toml /usr/local/bin/containerd-shim-kata-v2 $@
```
> **Note:** You may need to edit the paths of the configuration file and the `containerd-shim-kata-v2` to correspond to your setup.
Make it executable:
```bash
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-fc-v2
```
Add the relevant section in `containerd`s `config.toml` file (`/etc/containerd/config.toml`):
```
[plugins.cri.containerd.runtimes]
[plugins.cri.containerd.runtimes.kata-fc]
runtime_type = "io.containerd.kata-fc.v2"
```
> **Note:** If you are using the default `containerd` configuration (`containerd config default >> /etc/containerd/config.toml`),
> the configuration should change to :
```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
runtime_type = "io.containerd.kata-fc.v2"
```
Restart `containerd`:
```bash
$ sudo systemctl restart containerd
```
## Verify the installation
We are now ready to launch a container using Kata with Firecracker to verify that everything worked:
```bash
$ sudo ctr images pull --snapshotter devmapper docker.io/library/ubuntu:latest
$ sudo ctr run --snapshotter devmapper --runtime io.containerd.run.kata-fc.v2 -t --rm docker.io/library/ubuntu
```

View File

@@ -81,7 +81,7 @@
- Download the standard `systemd(1)` service file and install to
`/etc/systemd/system/`:
- https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
- https://raw.githubusercontent.com/containerd/containerd/master/containerd.service
> **Notes:**
>

View File

@@ -3,4 +3,4 @@
Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:
- [Intel](Intel-GPU-passthrough-and-Kata.md)
- [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)
- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)

View File

@@ -1,372 +0,0 @@
# Using NVIDIA GPU device with Kata Containers
An NVIDIA GPU device can be passed to a Kata Containers container using GPU
passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
(NVIDIA vGPU mode).
NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to one
VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
is accessed exclusively by the NVIDIA driver running in the VM to which it is
assigned. The GPU is not shared among VMs.
NVIDIA Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have
simultaneous, direct access to a single physical GPU, using the same NVIDIA
graphics drivers that are deployed on non-virtualized operating systems. By
doing this, NVIDIA vGPU provides VMs with unparalleled graphics performance,
compute performance, and application compatibility, together with the
cost-effectiveness and scalability brought about by sharing a GPU among multiple
workloads. A vGPU can be either time-sliced or Multi-Instance GPU (MIG)-backed
with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
| Technology | Description | Behavior | Detail |
| --- | --- | --- | --- |
| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
## Hardware Requirements
NVIDIA GPUs Recommended for Virtualization:
- NVIDIA Tesla (T4, M10, P6, V100 or newer)
- NVIDIA Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BARs window, for example, NVIDIA Tesla P100,
K40m
```sh
$ lspci -s d0:00.0 -vv | grep Region
Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
```
For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Some hardware vendors use different name in BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
If one is using a GPU based on the Ampere architecture and later additionally
SR-IOV needs to be enabled for the vGPU use-case.
The following steps outline the workflow for using an NVIDIA GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
line.
## Install and configure Kata Containers
To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
version 1.3.0 or above. Follow the [Kata Containers setup
instructions](../install/README.md) to install the latest version of Kata.
To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
version 1.11.0 or above.
The following configuration in the Kata `configuration.toml` file as shown below
can work:
Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
Hotplug driver):
```sh
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
driver):
```sh
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU
support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
with the necessary GPU support.
The following kernel config options need to be enabled:
```sh
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
# Support for loading modules (Required for load NVIDIA drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (Required for large BARs device)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```sh
# Disable Open Source NVIDIA driver nouveau
# It conflicts with NVIDIA official driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
It is worth checking that it is not enabled in your kernel configuration to
prevent any conflicts.
Build the Kata Containers kernel with the previous config options, using the
instructions described in [Building Kata Containers
kernel](../../tools/packaging/kernel). For further details on building and
installing guest kernels, see [the developer
guide](../Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports NVIDIA GPU:
```sh
## Build guest kernel with ../../tools/packaging/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 5.15.23 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
```
To build NVIDIA Driver in Kata container, `linux-headers` is required.
This is a way to generate deb packages for `linux-headers`:
> **Note**:
> Run `make rpm-pkg` to build the rpm package.
> Run `make deb-pkg` to build the deb package.
>
```sh
$ cd kata-linux-5.15.23-89
$ make deb-pkg
```
Before using the new guest kernel, please update the `kernel` parameters in
`configuration.toml`.
```sh
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## NVIDIA GPU pass-through mode with Kata Containers
Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device on host:
```sh
$ sudo lspci -nn -D | grep -i nvidia
0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
```
> PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
> `10de:20b9` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```sh
$ BDF="0000:d0:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
```
The previous output shows that the GPU belongs to IOMMU group 192. The next
step is to bind the GPU to the VFIO-PCI driver.
```sh
$ BDF="0000:d0:00.0"
$ DEV="/sys/bus/pci/devices/$BDF"
$ echo "vfio-pci" > $DEV/driver_override
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
# To return the device to the standard driver, we simply clear the
# driver_override and reprobe the device, ex:
$ echo > $DEV/preferred_driver
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
```
3. Check the IOMMU group number under `/dev/vfio`:
```sh
$ ls -l /dev/vfio
total 0
crw------- 1 zvonkok zvonkok 243, 0 Mar 18 03:06 192
crw-rw-rw- 1 root root 10, 196 Mar 18 02:27 vfio
```
4. Start a Kata container with GPU device:
```sh
# You may need to `modprobe vhost-vsock` if you get
# host system doesn't support vsock: stat /dev/vhost-vsock
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the `lspci` output.
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
```
6. Additionally, you can check the PCI BARs space of the NVIDIA GPU device in the container:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
```
> **Note**: If you see a message similar to the above, the BAR space of the NVIDIA
> GPU has been successfully allocated.
## NVIDIA vGPU mode with Kata Containers
NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM.
> **TODO**: Will follow up with instructions
## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
rootfs base image for a distribution of your choice. This is going to be used as
a base for a NVIDIA enabled guest OS. Use the `EXTRA_PKGS` variable to install
all the needed packages to compile the drivers. Also copy the kernel development
packages from the previous `make deb-pkg` into `$ROOTFS_DIR`.
```sh
export EXTRA_PKGS="gcc make curl gnupg"
```
Having the `$ROOTFS_DIR` exported in the previous step we can now install all the
need parts in the guest OS. In this case we have an Ubuntu based rootfs.
First off all mount the special filesystems into the rootfs
```sh
$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
```
Now we can enter `chroot`
```sh
$ sudo chroot ${ROOTFS_DIR}
```
Inside the rootfs one is going to install the drivers and toolkit to enable easy
creation of GPU containers with Kata. We can also use this rootfs for any other
container not specifically only for GPUs.
As a prerequisite install the copied kernel development packages
```sh
$ sudo dpkg -i *.deb
```
Get the driver run file, since we need to build the driver against a kernel that
is not running on the host we need the ability to specify the exact version we
want the driver to build against. Take the kernel version one used for building
the NVIDIA kernel (`5.15.23-nvidia-gpu`).
```sh
$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
$ chmod +x NVIDIA-Linux-x86_64-510.54.run
# Extract the source files so we can run the installer with arguments
$ ./NVIDIA-Linux-x86_64-510.54.run -x
$ cd NVIDIA-Linux-x86_64-510.54
$ ./nvidia-installer -k 5.15.23-nvidia-gpu
```
Having the drivers installed we need to install the toolkit which will take care
of providing the right bits into the container.
```sh
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt update
$ apt install nvidia-container-toolkit
```
Create the hook execution file for Kata:
```
# Content of $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
#!/bin/bash -x
/usr/bin/nvidia-container-toolkit -debug $@
```
As a last step one can do some cleanup of files or package caches. Build the
rootfs and configure it for use with Kata according to the development guide.
Enable the `guest_hook_path` in Kata's `configuration.toml`
```sh
guest_hook_path = "/usr/share/oci/hooks"
```
One has build a NVIDIA rootfs, kernel and now we can run any GPU container
without installing the drivers into the container. Check NVIDIA device status
with `nvidia-smi`
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
Fri Mar 18 10:36:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54 Driver Version: 510.54 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A30X Off | 00000000:02:00.0 Off | 0 |
| N/A 38C P0 67W / 230W | 0MiB / 24576MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
As a last step one can remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers

View File

@@ -0,0 +1,293 @@
# Using Nvidia GPU device with Kata Containers
An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode). 
Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
exclusively by the Nvidia driver running in the VM to which it is assigned.
The GPU is not shared among VMs.
Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
direct access to a single physical GPU, using the same Nvidia graphics drivers that are
deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
with unparalleled graphics performance, compute performance, and application compatibility,
together with the cost-effectiveness and scalability brought about by sharing a GPU
among multiple workloads.
| Technology | Description | Behaviour | Detail |
| --- | --- | --- | --- |
| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
## Hardware Requirements
Nvidia GPUs Recommended for Virtualization:
- Nvidia Tesla (T4, M10, P6, V100 or newer)
- Nvidia Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m
```
$ lspci -s 04:00.0 -vv | grep Region
Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
```
For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Some hardware vendors use different name in BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
The following steps outline the workflow for using an Nvidia GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
## Install and configure Kata Containers
To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
Follow the [Kata Containers setup instructions](../install/README.md)
to install the latest version of Kata.
To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
The following configuration in the Kata `configuration.toml` file as shown below can work:
Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU support.
To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
necessary GPU support.
The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
# Support for loading modules (Required for load Nvidia drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (Required for large BARs device)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```
# Disable Open Source Nvidia driver nouveau
# It conflicts with Nvidia official driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
Build the Kata Containers kernel with the previous config options,
using the instructions described in [Building Kata Containers kernel](../../tools/packaging/kernel).
For further details on building and installing guest kernels,
see [the developer guide](../Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports Nvidia GPU:
```
## Build guest kernel with ../../tools/packaging/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 4.19.86 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
```
To build Nvidia Driver in Kata container, `kernel-devel` is required.
This is a way to generate rpm packages for `kernel-devel`:
```
$ cd kata-linux-4.19.86-68
$ make rpm-pkg
Output RPMs:
~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
```
> **Note**:
> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer.
> - Run `make deb-pkg` to build the deb package.
Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`.
```
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## Nvidia GPU pass-through mode with Kata Containers
Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device on host:
```
$ sudo lspci -nn -D | grep -i nvidia
0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
```
> PCI address `0000:04:00.0` is assigned to the hardware GPU device.
> `10de:15f8` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```
$ BDF="0000:04:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
/sys/kernel/iommu_groups/45
```
The previous output shows that the GPU belongs to IOMMU group 45.
3. Check the IOMMU group number under `/dev/vfio`:
```
$ ls -l /dev/vfio
total 0
crw------- 1 root root 248, 0 Feb 28 09:57 45
crw------- 1 root root 248, 1 Feb 28 09:57 54
crw-rw-rw- 1 root root 10, 196 Feb 28 09:57 vfio
```
4. Start a Kata container with GPU device:
```
$ sudo docker run -it --runtime=kata-runtime --cap-add=ALL --device /dev/vfio/45 centos /bin/bash
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
```
$ lspci -nn -D | grep '10de:15f8'
0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
```
6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
```
$ lspci -s 01:01.0 -vv | grep Region
Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
```
> **Note**: If you see a message similar to the above, the BAR space of the Nvidia
> GPU has been successfully allocated.
## Nvidia vGPU mode with Kata Containers
Nvidia vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM.
> **Note**: There is no suitable test environment, so it is not written here.
## Install Nvidia Driver in Kata Containers
Download the official Nvidia driver from
[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
for example `NVIDIA-Linux-x86_64-418.87.01.run`.
Install the `kernel-devel`(generated in the previous steps) for guest kernel:
```
$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm
```
Here is an example to extract, compile and install Nvidia driver:
```
## Extract
$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
## Compile and install (It will take some time)
$ cd NVIDIA-Linux-x86_64-418.87.01
$ sudo ./nvidia-installer -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
Or just run one command line:
```
$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
To view detailed logs of the installer:
```
$ tail -f /var/log/nvidia-installer.log
```
Load Nvidia driver module manually
```
# Optionalgenerate modules.dep and map files for Nvidia driver
$ sudo depmod
# Load module
$ sudo modprobe nvidia-drm
# Check module
$ lsmod | grep nvidia
nvidia_drm 45056 0
nvidia_modeset 1093632 1 nvidia_drm
nvidia 18202624 1 nvidia_modeset
drm_kms_helper 159744 1 nvidia_drm
drm 364544 3 nvidia_drm,drm_kms_helper
i2c_core 65536 3 nvidia,drm_kms_helper,drm
ipmi_msghandler 49152 1 nvidia
```
Check Nvidia device status with `nvidia-smi`
```
$ nvidia-smi
Tue Mar 3 00:03:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:01:01.0 Off | 0 |
| N/A 27C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers

View File

@@ -0,0 +1,76 @@
# Setup to run VPP
The Data Plane Development Kit (DPDK) is a set of libraries and drivers for
fast packet processing. Vector Packet Processing (VPP) is a platform
extensible framework that provides out-of-the-box production quality
switch and router functionality. VPP is a high performance packet-processing
stack that can run on commodity CPUs. Enabling VPP with DPDK support can
yield significant performance improvements over a Linux\* bridge providing a
switch with DPDK VHOST-USER ports.
For more information about VPP visit their [wiki](https://wiki.fd.io/view/VPP).
## Install and configure Kata Containers
Follow the [Kata Containers setup instructions](../Developer-Guide.md).
In order to make use of VHOST-USER based interfaces, the container needs to be backed
by huge pages. `HugePages` support is required for the large memory pool allocation used for
DPDK packet buffers. This is a feature which must be configured within the Linux Kernel. See
[the DPDK documentation](https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html#use-of-hugepages-in-the-linux-environment)
for details on how to enable for the host. After enabling huge pages support on the host system,
update the Kata configuration to enable huge page support in the guest kernel:
```
$ sudo sed -i -e 's/^# *\(enable_hugepages\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml
```
## Install VPP
Follow the [VPP installation instructions](https://wiki.fd.io/view/VPP/Installing_VPP_binaries_from_packages).
After a successful installation, your host system is ready to start
connecting Kata Containers with VPP bridges.
### Install the VPP Docker\* plugin
To create a Docker network and connect Kata Containers easily to that network through
Docker, install a VPP Docker plugin.
To install the plugin, follow the [plugin installation instructions](https://github.com/clearcontainers/vpp).
This VPP plugin allows the creation of a VPP network. Every container added
to this network is connected through an L2 bridge-domain provided by VPP.
## Example: Launch two Kata Containers using VPP
To use VPP, use Docker to create a network that makes use of VPP.
For example:
```
$ sudo docker network create -d=vpp --ipam-driver=vpp --subnet=192.168.1.0/24 --gateway=192.168.1.1 vpp_net
```
Test connectivity by launching two containers:
```
$ sudo docker run --runtime=kata-runtime --net=vpp_net --ip=192.168.1.2 --mac-address=CA:FE:CA:FE:01:02 -it busybox bash -c "ip a; ip route; sleep 300"
$ sudo docker run --runtime=kata-runtime --net=vpp_net --ip=192.168.1.3 --mac-address=CA:FE:CA:FE:01:03 -it busybox bash -c "ip a; ip route; ping 192.168.1.2"
```
These commands setup two Kata Containers connected via a VPP L2 bridge
domain. The first of the two VMs displays the networking details and then
sleeps providing a period of time for it to be pinged. The second
VM displays its networking details and then pings the first VM, verifying
connectivity between them.
After verifying connectivity, cleanup with the following commands:
```
$ sudo docker kill $(sudo docker ps --no-trunc -aq)
$ sudo docker rm $(sudo docker ps --no-trunc -aq)
$ sudo docker network rm vpp_net
$ sudo service vpp stop
```

1
src/agent/Cargo.lock generated
View File

@@ -1370,7 +1370,6 @@ dependencies = [
"async-trait",
"capctl",
"caps",
"cfg-if 0.1.10",
"cgroups-rs",
"futures",
"inotify",

View File

@@ -76,13 +76,3 @@ lto = true
[features]
seccomp = ["rustjail/seccomp"]
standard-oci-runtime = ["rustjail/standard-oci-runtime"]
[[bin]]
name = "kata-agent"
path = "src/main.rs"
[[bin]]
name = "oci-kata-agent"
path = "src/main.rs"
required-features = ["standard-oci-runtime"]

View File

@@ -14,6 +14,10 @@ PROJECT_COMPONENT = kata-agent
TARGET = $(PROJECT_COMPONENT)
SOURCES := \
$(shell find . 2>&1 | grep -E '.*\.rs$$') \
Cargo.toml
VERSION_FILE := ./VERSION
VERSION := $(shell grep -v ^\# $(VERSION_FILE))
COMMIT_NO := $(shell git rev-parse HEAD 2>/dev/null || true)
@@ -33,16 +37,8 @@ ifeq ($(SECCOMP),yes)
override EXTRA_RUSTFEATURES += seccomp
endif
##VAR STANDARD_OCI_RUNTIME=yes|no define if agent enables standard oci runtime feature
STANDARD_OCI_RUNTIME := no
# Enable standard oci runtime feature of rust build
ifeq ($(STANDARD_OCI_RUNTIME),yes)
override EXTRA_RUSTFEATURES += standard-oci-runtime
endif
ifneq ($(EXTRA_RUSTFEATURES),)
override EXTRA_RUSTFEATURES := --features "$(EXTRA_RUSTFEATURES)"
override EXTRA_RUSTFEATURES := --features $(EXTRA_RUSTFEATURES)
endif
include ../../utils.mk
@@ -112,14 +108,14 @@ $(TARGET): $(GENERATED_CODE) logging-crate-tests $(TARGET_PATH)
logging-crate-tests:
make -C $(CWD)/../libs/logging
$(TARGET_PATH): show-summary
$(TARGET_PATH): $(SOURCES) | show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES)
$(GENERATED_FILES): %: %.in
@sed $(foreach r,$(GENERATED_REPLACEMENTS),-e 's|@$r@|$($r)|g') "$<" > "$@"
##TARGET optimize: optimized build
optimize: show-summary show-header
optimize: $(SOURCES) | show-summary show-header
@RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES)
##TARGET install: install agent

View File

@@ -25,7 +25,6 @@ path-absolutize = "1.2.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.8" }
rlimit = "0.5.3"
cfg-if = "0.1.0"
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros"] }
futures = "0.3.17"
@@ -39,4 +38,3 @@ tempfile = "3.1.0"
[features]
seccomp = ["libseccomp"]
standard-oci-runtime = []

View File

@@ -391,7 +391,7 @@ fn set_memory_resources(cg: &cgroups::Cgroup, memory: &LinuxMemory, update: bool
if let Some(swappiness) = memory.swappiness {
if (0..=100).contains(&swappiness) {
mem_controller.set_swappiness(swappiness)?;
mem_controller.set_swappiness(swappiness as u64)?;
} else {
return Err(anyhow!(
"invalid value:{}. valid memory swappiness range is 0-100",
@@ -590,9 +590,9 @@ fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> SingularPtrField<CpuUsage> {
let h = lines_to_map(&cpuacct.stat);
let usage_in_usermode =
(((*h.get("user").unwrap_or(&0) * NANO_PER_SECOND) as f64) / *CLOCK_TICKS) as u64;
(((*h.get("user").unwrap() * NANO_PER_SECOND) as f64) / *CLOCK_TICKS) as u64;
let usage_in_kernelmode =
(((*h.get("system").unwrap_or(&0) * NANO_PER_SECOND) as f64) / *CLOCK_TICKS) as u64;
(((*h.get("system").unwrap() * NANO_PER_SECOND) as f64) / *CLOCK_TICKS) as u64;
let total_usage = cpuacct.usage;
@@ -623,9 +623,9 @@ fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> SingularPtrField<CpuUsage> {
let cpu_controller: &CpuController = get_controller_or_return_singular_none!(cg);
let stat = cpu_controller.cpu().stat;
let h = lines_to_map(&stat);
let usage_in_usermode = *h.get("user_usec").unwrap_or(&0);
let usage_in_kernelmode = *h.get("system_usec").unwrap_or(&0);
let total_usage = *h.get("usage_usec").unwrap_or(&0);
let usage_in_usermode = *h.get("user_usec").unwrap();
let usage_in_kernelmode = *h.get("system_usec").unwrap();
let total_usage = *h.get("usage_usec").unwrap();
let percpu_usage = vec![];
SingularPtrField::some(CpuUsage {

View File

@@ -151,12 +151,12 @@ async fn register_memory_event(
let eventfd = eventfd(0, EfdFlags::EFD_CLOEXEC)?;
let event_control_path = Path::new(&cg_dir).join("cgroup.event_control");
let data = if arg.is_empty() {
format!("{} {}", eventfd, event_file.as_raw_fd())
let data;
if arg.is_empty() {
data = format!("{} {}", eventfd, event_file.as_raw_fd());
} else {
format!("{} {} {}", eventfd, event_file.as_raw_fd(), arg)
};
data = format!("{} {} {}", eventfd, event_file.as_raw_fd(), arg);
}
fs::write(&event_control_path, data)?;

View File

@@ -1,82 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
//
// Copyright 2021 Sony Group Corporation
//
use anyhow::{anyhow, Result};
use nix::errno::Errno;
use nix::pty;
use nix::sys::{socket, uio};
use nix::unistd::{self, dup2};
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;
pub fn setup_console_socket(csocket_path: &str) -> Result<Option<RawFd>> {
if csocket_path.is_empty() {
return Ok(None);
}
let socket_fd = socket::socket(
socket::AddressFamily::Unix,
socket::SockType::Stream,
socket::SockFlag::empty(),
None,
)?;
match socket::connect(
socket_fd,
&socket::SockAddr::Unix(socket::UnixAddr::new(Path::new(csocket_path))?),
) {
Ok(()) => Ok(Some(socket_fd)),
Err(errno) => Err(anyhow!("failed to open console fd: {}", errno)),
}
}
pub fn setup_master_console(socket_fd: RawFd) -> Result<()> {
let pseudo = pty::openpty(None, None)?;
let pty_name: &[u8] = b"/dev/ptmx";
let iov = [uio::IoVec::from_slice(pty_name)];
let fds = [pseudo.master];
let cmsg = socket::ControlMessage::ScmRights(&fds);
socket::sendmsg(socket_fd, &iov, &[cmsg], socket::MsgFlags::empty(), None)?;
unistd::setsid()?;
let ret = unsafe { libc::ioctl(pseudo.slave, libc::TIOCSCTTY) };
Errno::result(ret).map_err(|e| anyhow!(e).context("ioctl TIOCSCTTY"))?;
dup2(pseudo.slave, std::io::stdin().as_raw_fd())?;
dup2(pseudo.slave, std::io::stdout().as_raw_fd())?;
dup2(pseudo.slave, std::io::stderr().as_raw_fd())?;
unistd::close(socket_fd)?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use std::fs::File;
use std::os::unix::net::UnixListener;
use std::path::PathBuf;
use tempfile::{self, tempdir};
const CONSOLE_SOCKET: &str = "console-socket";
#[test]
fn test_setup_console_socket() {
let dir = tempdir()
.map_err(|e| anyhow!(e).context("tempdir failed"))
.unwrap();
let socket_path = dir.path().join(CONSOLE_SOCKET);
let _listener = UnixListener::bind(&socket_path).unwrap();
let ret = setup_console_socket(socket_path.to_str().unwrap());
assert!(ret.is_ok());
}
}

View File

@@ -23,8 +23,6 @@ use crate::cgroups::fs::Manager as FsManager;
#[cfg(test)]
use crate::cgroups::mock::Manager as FsManager;
use crate::cgroups::Manager;
#[cfg(feature = "standard-oci-runtime")]
use crate::console;
use crate::log_child;
use crate::process::Process;
#[cfg(feature = "seccomp")]
@@ -66,7 +64,7 @@ use tokio::sync::Mutex;
use crate::utils;
pub const EXEC_FIFO_FILENAME: &str = "exec.fifo";
const EXEC_FIFO_FILENAME: &str = "exec.fifo";
const INIT: &str = "INIT";
const NO_PIVOT: &str = "NO_PIVOT";
@@ -76,10 +74,6 @@ const CLOG_FD: &str = "CLOG_FD";
const FIFO_FD: &str = "FIFO_FD";
const HOME_ENV_KEY: &str = "HOME";
const PIDNS_FD: &str = "PIDNS_FD";
const CONSOLE_SOCKET_FD: &str = "CONSOLE_SOCKET_FD";
#[cfg(feature = "standard-oci-runtime")]
const OCI_AGENT_BINARY: &str = "oci-kata-agent";
#[derive(Debug)]
pub struct ContainerStatus {
@@ -88,7 +82,7 @@ pub struct ContainerStatus {
}
impl ContainerStatus {
pub fn new() -> Self {
fn new() -> Self {
ContainerStatus {
pre_status: ContainerState::Created,
cur_status: ContainerState::Created,
@@ -105,12 +99,6 @@ impl ContainerStatus {
}
}
impl Default for ContainerStatus {
fn default() -> Self {
Self::new()
}
}
pub type Config = CreateOpts;
type NamespaceType = String;
@@ -118,7 +106,7 @@ lazy_static! {
// This locker ensures the child exit signal will be received by the right receiver.
pub static ref WAIT_PID_LOCKER: Arc<Mutex<bool>> = Arc::new(Mutex::new(false));
pub static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
let mut m = HashMap::new();
m.insert("user", CloneFlags::CLONE_NEWUSER);
m.insert("ipc", CloneFlags::CLONE_NEWIPC);
@@ -131,7 +119,7 @@ lazy_static! {
};
// type to name hashmap, better to be in NAMESPACES
pub static ref TYPETONAME: HashMap<&'static str, &'static str> = {
static ref TYPETONAME: HashMap<&'static str, &'static str> = {
let mut m = HashMap::new();
m.insert("ipc", "ipc");
m.insert("user", "user");
@@ -248,8 +236,6 @@ pub struct LinuxContainer {
pub status: ContainerStatus,
pub created: SystemTime,
pub logger: Logger,
#[cfg(feature = "standard-oci-runtime")]
pub console_socket: PathBuf,
}
#[derive(Serialize, Deserialize, Debug)]
@@ -373,6 +359,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
)));
}
}
log_child!(cfd_log, "child process start run");
let buf = read_sync(crfd)?;
let spec_str = std::str::from_utf8(&buf)?;
@@ -392,9 +379,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let cm: FsManager = serde_json::from_str(cm_str)?;
#[cfg(feature = "standard-oci-runtime")]
let csocket_fd = console::setup_console_socket(&std::env::var(CONSOLE_SOCKET_FD)?)?;
let p = if spec.process.is_some() {
spec.process.as_ref().unwrap()
} else {
@@ -686,19 +670,10 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let _ = unistd::close(crfd);
let _ = unistd::close(cwfd);
unistd::setsid().context("create a new session")?;
if oci_process.terminal {
cfg_if::cfg_if! {
if #[cfg(feature = "standard-oci-runtime")] {
if let Some(csocket_fd) = csocket_fd {
console::setup_master_console(csocket_fd)?;
} else {
return Err(anyhow!("failed to get console master socket fd"));
}
}
else {
unistd::setsid().context("create a new session")?;
unsafe { libc::ioctl(0, libc::TIOCSCTTY) };
}
unsafe {
libc::ioctl(0, libc::TIOCSCTTY);
}
}
@@ -951,24 +926,8 @@ impl BaseContainer for LinuxContainer {
let _ = unistd::close(pid);
});
cfg_if::cfg_if! {
if #[cfg(feature = "standard-oci-runtime")] {
let exec_path = PathBuf::from(OCI_AGENT_BINARY);
}
else {
let exec_path = std::env::current_exe()?;
}
}
let exec_path = std::env::current_exe()?;
let mut child = std::process::Command::new(exec_path);
#[allow(unused_mut)]
let mut console_name = PathBuf::from("");
#[cfg(feature = "standard-oci-runtime")]
if !self.console_socket.as_os_str().is_empty() {
console_name = self.console_socket.clone();
}
let mut child = child
.arg("init")
.stdin(child_stdin)
@@ -978,8 +937,7 @@ impl BaseContainer for LinuxContainer {
.env(NO_PIVOT, format!("{}", self.config.no_pivot_root))
.env(CRFD_FD, format!("{}", crfd))
.env(CWFD_FD, format!("{}", cwfd))
.env(CLOG_FD, format!("{}", cfd_log))
.env(CONSOLE_SOCKET_FD, console_name);
.env(CLOG_FD, format!("{}", cfd_log));
if p.init {
child = child.env(FIFO_FD, format!("{}", fifofd));
@@ -1461,16 +1419,8 @@ impl LinuxContainer {
.unwrap()
.as_secs(),
logger: logger.new(o!("module" => "rustjail", "subsystem" => "container", "cid" => id)),
#[cfg(feature = "standard-oci-runtime")]
console_socket: Path::new("").to_path_buf(),
})
}
#[cfg(feature = "standard-oci-runtime")]
pub fn set_console_socket(&mut self, console_socket: &Path) -> Result<()> {
self.console_socket = console_socket.to_path_buf();
Ok(())
}
}
fn setgroups(grps: &[libc::gid_t]) -> Result<()> {
@@ -1510,7 +1460,7 @@ use std::process::Stdio;
use std::time::Duration;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
pub async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
let logger = logger.new(o!("action" => "execute-hook"));
let binary = PathBuf::from(h.path.as_str());

View File

@@ -30,8 +30,6 @@ extern crate regex;
pub mod capabilities;
pub mod cgroups;
#[cfg(feature = "standard-oci-runtime")]
pub mod console;
pub mod container;
pub mod mount;
pub mod pipestream;
@@ -267,7 +265,7 @@ pub fn resources_grpc_to_oci(res: &grpc::LinuxResources) -> oci::LinuxResources
swap: Some(mem.Swap),
kernel: Some(mem.Kernel),
kernel_tcp: Some(mem.KernelTCP),
swappiness: Some(mem.Swappiness),
swappiness: Some(mem.Swappiness as i64),
disable_oom_killer: Some(mem.DisableOOMKiller),
})
} else {
@@ -353,12 +351,13 @@ fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
for sys in sec.Syscalls.iter() {
let mut args = Vec::new();
let errno_ret: u32;
let errno_ret: u32 = if sys.has_errnoret() {
sys.get_errnoret()
if sys.has_errnoret() {
errno_ret = sys.get_errnoret();
} else {
libc::EPERM as u32
};
errno_ret = libc::EPERM as u32;
}
for arg in sys.Args.iter() {
args.push(oci::LinuxSeccompArg {
@@ -514,7 +513,6 @@ pub fn grpc_to_oci(grpc: &grpc::Spec) -> oci::Spec {
#[cfg(test)]
mod tests {
use super::*;
#[macro_export]
macro_rules! skip_if_not_root {
() => {
@@ -524,595 +522,4 @@ mod tests {
}
};
}
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
}
};
}
#[test]
fn test_process_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpcproc: grpc::Process,
result: oci::Process,
}
let tests = &[
TestData {
// All fields specified
grpcproc: grpc::Process {
Terminal: true,
ConsoleSize: protobuf::SingularPtrField::<grpc::Box>::some(grpc::Box {
Height: 123,
Width: 456,
..Default::default()
}),
User: protobuf::SingularPtrField::<grpc::User>::some(grpc::User {
UID: 1234,
GID: 5678,
AdditionalGids: Vec::from([910, 1112]),
Username: String::from("username"),
..Default::default()
}),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([String::from("env")])),
Cwd: String::from("cwd"),
Capabilities: protobuf::SingularPtrField::some(grpc::LinuxCapabilities {
Bounding: protobuf::RepeatedField::from(Vec::from([String::from("bnd")])),
Effective: protobuf::RepeatedField::from(Vec::from([String::from("eff")])),
Inheritable: protobuf::RepeatedField::from(Vec::from([String::from(
"inher",
)])),
Permitted: protobuf::RepeatedField::from(Vec::from([String::from("perm")])),
Ambient: protobuf::RepeatedField::from(Vec::from([String::from("amb")])),
..Default::default()
}),
Rlimits: protobuf::RepeatedField::from(Vec::from([
grpc::POSIXRlimit {
Type: String::from("r#type"),
Hard: 123,
Soft: 456,
..Default::default()
},
grpc::POSIXRlimit {
Type: String::from("r#type2"),
Hard: 789,
Soft: 1011,
..Default::default()
},
])),
NoNewPrivileges: true,
ApparmorProfile: String::from("apparmor profile"),
OOMScoreAdj: 123456,
SelinuxLabel: String::from("Selinux Label"),
..Default::default()
},
result: oci::Process {
terminal: true,
console_size: Some(oci::Box {
height: 123,
width: 456,
}),
user: oci::User {
uid: 1234,
gid: 5678,
additional_gids: Vec::from([910, 1112]),
username: String::from("username"),
},
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env")]),
cwd: String::from("cwd"),
capabilities: Some(oci::LinuxCapabilities {
bounding: Vec::from([String::from("bnd")]),
effective: Vec::from([String::from("eff")]),
inheritable: Vec::from([String::from("inher")]),
permitted: Vec::from([String::from("perm")]),
ambient: Vec::from([String::from("amb")]),
}),
rlimits: Vec::from([
oci::PosixRlimit {
r#type: String::from("r#type"),
hard: 123,
soft: 456,
},
oci::PosixRlimit {
r#type: String::from("r#type2"),
hard: 789,
soft: 1011,
},
]),
no_new_privileges: true,
apparmor_profile: String::from("apparmor profile"),
oom_score_adj: Some(123456),
selinux_label: String::from("Selinux Label"),
},
},
TestData {
// None ConsoleSize
grpcproc: grpc::Process {
ConsoleSize: protobuf::SingularPtrField::<grpc::Box>::none(),
OOMScoreAdj: 0,
..Default::default()
},
result: oci::Process {
console_size: None,
oom_score_adj: Some(0),
..Default::default()
},
},
TestData {
// None User
grpcproc: grpc::Process {
User: protobuf::SingularPtrField::<grpc::User>::none(),
OOMScoreAdj: 0,
..Default::default()
},
result: oci::Process {
user: oci::User {
uid: 0,
gid: 0,
additional_gids: vec![],
username: String::from(""),
},
oom_score_adj: Some(0),
..Default::default()
},
},
TestData {
// None Capabilities
grpcproc: grpc::Process {
Capabilities: protobuf::SingularPtrField::none(),
OOMScoreAdj: 0,
..Default::default()
},
result: oci::Process {
capabilities: None,
oom_score_adj: Some(0),
..Default::default()
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = process_grpc_to_oci(&d.grpcproc);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_root_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpcroot: grpc::Root,
result: oci::Root,
}
let tests = &[
TestData {
// Default fields
grpcroot: grpc::Root {
..Default::default()
},
result: oci::Root {
..Default::default()
},
},
TestData {
// Specified fields, readonly false
grpcroot: grpc::Root {
Path: String::from("path"),
Readonly: false,
..Default::default()
},
result: oci::Root {
path: String::from("path"),
readonly: false,
..Default::default()
},
},
TestData {
// Specified fields, readonly true
grpcroot: grpc::Root {
Path: String::from("path"),
Readonly: true,
..Default::default()
},
result: oci::Root {
path: String::from("path"),
readonly: true,
..Default::default()
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = root_grpc_to_oci(&d.grpcroot);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_hooks_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpchooks: grpc::Hooks,
result: oci::Hooks,
}
let tests = &[
TestData {
// Default fields
grpchooks: grpc::Hooks {
..Default::default()
},
result: oci::Hooks {
..Default::default()
},
},
TestData {
// All specified
grpchooks: grpc::Hooks {
Prestart: protobuf::RepeatedField::from(Vec::from([
grpc::Hook {
Path: String::from("prestartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
},
grpc::Hook {
Path: String::from("prestartpath2"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg3"),
String::from("arg4"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env3"),
String::from("env4"),
])),
Timeout: 25,
..Default::default()
},
])),
Poststart: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
Poststop: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststoppath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
..Default::default()
},
result: oci::Hooks {
prestart: Vec::from([
oci::Hook {
path: String::from("prestartpath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
},
oci::Hook {
path: String::from("prestartpath2"),
args: Vec::from([String::from("arg3"), String::from("arg4")]),
env: Vec::from([String::from("env3"), String::from("env4")]),
timeout: Some(25),
},
]),
poststart: Vec::from([oci::Hook {
path: String::from("poststartpath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
poststop: Vec::from([oci::Hook {
path: String::from("poststoppath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
},
},
TestData {
// Prestart empty
grpchooks: grpc::Hooks {
Prestart: protobuf::RepeatedField::from(Vec::from([])),
Poststart: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
Poststop: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststoppath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
..Default::default()
},
result: oci::Hooks {
prestart: Vec::from([]),
poststart: Vec::from([oci::Hook {
path: String::from("poststartpath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
poststop: Vec::from([oci::Hook {
path: String::from("poststoppath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = hooks_grpc_to_oci(&d.grpchooks);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_mount_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpcmount: grpc::Mount,
result: oci::Mount,
}
let tests = &[
TestData {
// Default fields
grpcmount: grpc::Mount {
..Default::default()
},
result: oci::Mount {
..Default::default()
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::from([
String::from("option1"),
String::from("option2"),
])),
..Default::default()
},
result: oci::Mount {
destination: String::from("destination"),
source: String::from("source"),
r#type: String::from("fieldtype"),
options: Vec::from([String::from("option1"), String::from("option2")]),
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::new()),
..Default::default()
},
result: oci::Mount {
destination: String::from("destination"),
source: String::from("source"),
r#type: String::from("fieldtype"),
options: Vec::new(),
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::new(),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::from([String::from("option1")])),
..Default::default()
},
result: oci::Mount {
destination: String::new(),
source: String::from("source"),
r#type: String::from("fieldtype"),
options: Vec::from([String::from("option1")]),
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::new(),
options: protobuf::RepeatedField::from(Vec::from([String::from("option1")])),
..Default::default()
},
result: oci::Mount {
destination: String::from("destination"),
source: String::from("source"),
r#type: String::new(),
options: Vec::from([String::from("option1")]),
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = mount_grpc_to_oci(&d.grpcmount);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_hook_grpc_to_oci<'a>() {
#[derive(Debug)]
struct TestData<'a> {
grpchook: &'a [grpc::Hook],
result: Vec<oci::Hook>,
}
let tests = &[
TestData {
// Default fields
grpchook: &[
grpc::Hook {
Timeout: 0,
..Default::default()
},
grpc::Hook {
Timeout: 0,
..Default::default()
},
],
result: vec![
oci::Hook {
timeout: Some(0),
..Default::default()
},
oci::Hook {
timeout: Some(0),
..Default::default()
},
],
},
TestData {
// Specified fields
grpchook: &[
grpc::Hook {
Path: String::from("path"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
},
grpc::Hook {
Path: String::from("path2"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg3"),
String::from("arg4"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env3"),
String::from("env4"),
])),
Timeout: 20,
..Default::default()
},
],
result: vec![
oci::Hook {
path: String::from("path"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
},
oci::Hook {
path: String::from("path2"),
args: Vec::from([String::from("arg3"), String::from("arg4")]),
env: Vec::from([String::from("env3"), String::from("env4")]),
timeout: Some(20),
},
],
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = hook_grpc_to_oci(d.grpchook);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
}

View File

@@ -32,21 +32,16 @@ use crate::log_child;
// Info reveals information about a particular mounted filesystem. This
// struct is populated from the content in the /proc/<pid>/mountinfo file.
#[derive(std::fmt::Debug, PartialEq)]
#[derive(std::fmt::Debug)]
pub struct Info {
mount_point: String,
optional: String,
fstype: String,
}
const MOUNTINFO_FORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const MOUNTINFO_PATH: &str = "/proc/self/mountinfo";
const MOUNTINFOFORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const PROC_PATH: &str = "/proc";
const ERR_FAILED_PARSE_MOUNTINFO: &str = "failed to parse mountinfo file";
const ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS: &str =
"failed to parse final fields in mountinfo file";
// since libc didn't defined this const for musl, thus redefined it here.
#[cfg(all(target_os = "linux", target_env = "gnu", not(target_arch = "s390x")))]
const PROC_SUPER_MAGIC: libc::c_long = 0x00009fa0;
@@ -523,7 +518,7 @@ pub fn pivot_rootfs<P: ?Sized + NixPath + std::fmt::Debug>(path: &P) -> Result<(
}
fn rootfs_parent_mount_private(path: &str) -> Result<()> {
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let mount_infos = parse_mount_table()?;
let mut max_len = 0;
let mut mount_point = String::from("");
@@ -551,8 +546,8 @@ fn rootfs_parent_mount_private(path: &str) -> Result<()> {
// Parse /proc/self/mountinfo because comparing Dev and ino does not work from
// bind mounts
fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
let file = File::open(mountinfo_path)?;
fn parse_mount_table() -> Result<Vec<Info>> {
let file = File::open("/proc/self/mountinfo")?;
let reader = BufReader::new(file);
let mut infos = Vec::new();
@@ -574,7 +569,7 @@ fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
let (_id, _parent, _major, _minor, _root, mount_point, _opts, optional) = scan_fmt!(
&line,
MOUNTINFO_FORMAT,
MOUNTINFOFORMAT,
i32,
i32,
i32,
@@ -583,17 +578,12 @@ fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
String,
String,
String
)
.map_err(|_| anyhow!(ERR_FAILED_PARSE_MOUNTINFO))?;
)?;
let fields: Vec<&str> = line.split(" - ").collect();
if fields.len() == 2 {
let final_fields: Vec<&str> = fields[1].split_whitespace().collect();
if final_fields.len() != 3 {
return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS));
}
let fstype = final_fields[0].to_string();
let (fstype, _source, _vfs_opts) =
scan_fmt!(fields[1], "{} {} {}", String, String, String)?;
let mut optional_new = String::new();
if optional != "-" {
@@ -608,7 +598,7 @@ fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
infos.push(info);
} else {
return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO));
return Err(anyhow!("failed to parse mount info file".to_string()));
}
}
@@ -629,7 +619,7 @@ fn chroot<P: ?Sized + NixPath>(_path: &P) -> Result<(), nix::Error> {
pub fn ms_move_root(rootfs: &str) -> Result<bool> {
unistd::chdir(rootfs)?;
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let mount_infos = parse_mount_table()?;
let root_path = Path::new(rootfs);
let abs_root_buf = root_path.absolutize()?;
@@ -1056,12 +1046,10 @@ fn readonly_path(path: &str) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::assert_result;
use crate::skip_if_not_root;
use std::fs::create_dir;
use std::fs::create_dir_all;
use std::fs::remove_dir_all;
use std::io;
use std::os::unix::fs;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
@@ -1298,113 +1286,6 @@ mod tests {
let ret = stat::stat(path);
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
}
#[test]
fn test_mount_from() {
#[derive(Debug)]
struct TestData<'a> {
source: &'a str,
destination: &'a str,
r#type: &'a str,
flags: MsFlags,
error_contains: &'a str,
// if true, a directory will be created at path in source
make_source_directory: bool,
// if true, a file will be created at path in source
make_source_file: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
source: "tmp",
destination: "dest",
r#type: "tmpfs",
flags: MsFlags::empty(),
error_contains: "",
make_source_directory: true,
make_source_file: false,
}
}
}
let tests = &[
TestData {
..Default::default()
},
TestData {
flags: MsFlags::MS_BIND,
..Default::default()
},
TestData {
r#type: "bind",
..Default::default()
},
TestData {
r#type: "cgroup2",
..Default::default()
},
TestData {
r#type: "bind",
make_source_directory: false,
error_contains: &format!("{}", std::io::Error::from_raw_os_error(libc::ENOENT)),
..Default::default()
},
TestData {
r#type: "bind",
make_source_directory: false,
make_source_file: true,
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let tempdir = tempdir().unwrap();
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
unistd::close(rfd).unwrap();
unistd::close(wfd).unwrap();
});
let source_path = tempdir.path().join(d.source).to_str().unwrap().to_string();
if d.make_source_directory {
std::fs::create_dir_all(&source_path).unwrap();
} else if d.make_source_file {
std::fs::write(&source_path, []).unwrap();
}
let mount = Mount {
source: source_path,
destination: d.destination.to_string(),
r#type: d.r#type.to_string(),
options: vec![],
};
let result = mount_from(
wfd,
&mount,
tempdir.path().to_str().unwrap(),
d.flags,
"",
"",
);
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_check_proc_mount() {
let mount = oci::Mount {
@@ -1520,121 +1401,6 @@ mod tests {
}
}
#[test]
fn test_parse_mount_table() {
#[derive(Debug)]
struct TestData<'a> {
mountinfo_data: Option<&'a str>,
result: Result<Vec<Info>>,
}
let tests = &[
TestData {
mountinfo_data: Some(
"22 933 0:20 / /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
),
result: Ok(vec![Info {
mount_point: "/sys".to_string(),
optional: "shared:2".to_string(),
fstype: "sysfs".to_string(),
}]),
},
TestData {
mountinfo_data: Some(
r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
),
result: Ok(vec![
Info {
mount_point: "/sys".to_string(),
optional: "".to_string(),
fstype: "sysfs".to_string(),
},
Info {
mount_point: "/tmp/dir".to_string(),
optional: "shared:2".to_string(),
fstype: "tmpfs".to_string(),
},
]),
},
TestData {
mountinfo_data: Some(
"22 933 0:20 /foo\040-\040bar /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
),
result: Ok(vec![Info {
mount_point: "/sys".to_string(),
optional: "shared:2".to_string(),
fstype: "sysfs".to_string(),
}]),
},
TestData {
mountinfo_data: Some(""),
result: Ok(vec![]),
},
TestData {
mountinfo_data: Some("invalid line data - sysfs sysfs rw"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs sysfs rw rw"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec shared:2 - x - x"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("-"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("--"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("- -"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some(" - "),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some(
r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
invalid line
81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: None,
result: Err(anyhow!(io::Error::from_raw_os_error(libc::ENOENT))),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let tempdir = tempdir().unwrap();
let mountinfo_path = tempdir.path().join("mountinfo");
if let Some(mountinfo_data) = d.mountinfo_data {
std::fs::write(&mountinfo_path, mountinfo_data).unwrap();
}
let result = parse_mount_table(mountinfo_path.to_str().unwrap());
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[test]
fn test_dev_rel_path() {
// Valid device paths

View File

@@ -5,7 +5,7 @@
use oci::Spec;
#[derive(Serialize, Deserialize, Debug, Default, Clone)]
#[derive(Debug)]
pub struct CreateOpts {
pub cgroup_name: String,
pub use_systemd_cgroup: bool,

View File

@@ -97,13 +97,12 @@ mod tests {
let temp_passwd = format!("{}/passwd", tmpdir_path);
let mut tempf = File::create(temp_passwd.as_str()).unwrap();
let passwd_entries = "root:x:0:0:root:/root0:/bin/bash
root:x:1:0:root:/root1:/bin/bash
#root:x:1:0:root:/rootx:/bin/bash
root:x:2:0:root:/root2:/bin/bash
root:x:3:0:root:/root3
root:x:3:0:root:/root3:/bin/bash";
writeln!(tempf, "{}", passwd_entries).unwrap();
writeln!(tempf, "root:x:0:0:root:/root0:/bin/bash").unwrap();
writeln!(tempf, "root:x:1:0:root:/root1:/bin/bash").unwrap();
writeln!(tempf, "#root:x:1:0:root:/rootx:/bin/bash").unwrap();
writeln!(tempf, "root:x:2:0:root:/root2:/bin/bash").unwrap();
writeln!(tempf, "root:x:3:0:root:/root3").unwrap();
writeln!(tempf, "root:x:3:0:root:/root3:/bin/bash").unwrap();
let entry = get_entry_by_uid(0, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root0");

View File

@@ -432,8 +432,6 @@ fn get_container_pipe_size(param: &str) -> Result<i32> {
#[cfg(test)]
mod tests {
use crate::assert_result;
use super::*;
use anyhow::anyhow;
use std::fs::File;
@@ -441,6 +439,32 @@ mod tests {
use std::time;
use tempfile::tempdir;
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_level = $expected_result.as_ref().unwrap();
let actual_level = $actual_result.unwrap();
assert!(*expected_level == actual_level, "{}", $msg);
} else {
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
if let Err(actual_error) = $actual_result {
let actual_error_msg = format!("{:?}", actual_error);
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
} else {
assert!(expected_error_msg == "expected error, got OK", "{}", $msg);
}
}
};
}
#[test]
fn test_new() {
let config: AgentConfig = Default::default();

View File

@@ -125,7 +125,9 @@ fn announce(logger: &Logger, config: &AgentConfig) {
// output to the vsock port specified, or stdout.
async fn create_logger_task(rfd: RawFd, vsock_port: u32, shutdown: Receiver<bool>) -> Result<()> {
let mut reader = PipeStream::from_fd(rfd);
let mut writer: Box<dyn AsyncWrite + Unpin + Send> = if vsock_port > 0 {
let mut writer: Box<dyn AsyncWrite + Unpin + Send>;
if vsock_port > 0 {
let listenfd = socket::socket(
AddressFamily::Vsock,
SockType::Stream,
@@ -137,10 +139,10 @@ async fn create_logger_task(rfd: RawFd, vsock_port: u32, shutdown: Receiver<bool
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;
Box::new(util::get_vsock_stream(listenfd).await?)
writer = Box::new(util::get_vsock_stream(listenfd).await?);
} else {
Box::new(tokio::io::stdout())
};
writer = Box::new(tokio::io::stdout());
}
let _ = util::interruptable_io_copier(&mut reader, &mut writer, shutdown).await;
@@ -416,59 +418,3 @@ fn reset_sigpipe() {
use crate::config::AgentConfig;
use std::os::unix::io::{FromRawFd, RawFd};
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::test_utils::TestUserType;
#[tokio::test]
async fn test_create_logger_task() {
#[derive(Debug)]
struct TestData {
vsock_port: u32,
test_user: TestUserType,
result: Result<()>,
}
let tests = &[
TestData {
// non-root user cannot use privileged vsock port
vsock_port: 1,
test_user: TestUserType::NonRootOnly,
result: Err(anyhow!(nix::errno::Errno::from_i32(libc::EACCES))),
},
TestData {
// passing vsock_port 0 causes logger task to write to stdout
vsock_port: 0,
test_user: TestUserType::Any,
result: Ok(()),
},
];
for (i, d) in tests.iter().enumerate() {
if d.test_user == TestUserType::RootOnly {
skip_if_not_root!();
} else if d.test_user == TestUserType::NonRootOnly {
skip_if_root!();
}
let msg = format!("test[{}]: {:?}", i, d);
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
// rfd is closed by the use of PipeStream in the crate_logger_task function,
// but we will attempt to close in case of a failure
let _ = unistd::close(rfd);
unistd::close(wfd).unwrap();
});
let (shutdown_tx, shutdown_rx) = channel(true);
shutdown_tx.send(true).unwrap();
let result = create_logger_task(rfd, d.vsock_port, shutdown_rx).await;
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
}

View File

@@ -344,25 +344,25 @@ fn set_gauge_vec_meminfo(gv: &prometheus::GaugeVec, meminfo: &procfs::Meminfo) {
#[instrument]
fn set_gauge_vec_cpu_time(gv: &prometheus::GaugeVec, cpu: &str, cpu_time: &procfs::CpuTime) {
gv.with_label_values(&[cpu, "user"])
.set(cpu_time.user_ms() as f64);
.set(cpu_time.user as f64);
gv.with_label_values(&[cpu, "nice"])
.set(cpu_time.nice_ms() as f64);
.set(cpu_time.nice as f64);
gv.with_label_values(&[cpu, "system"])
.set(cpu_time.system_ms() as f64);
.set(cpu_time.system as f64);
gv.with_label_values(&[cpu, "idle"])
.set(cpu_time.idle_ms() as f64);
.set(cpu_time.idle as f64);
gv.with_label_values(&[cpu, "iowait"])
.set(cpu_time.iowait_ms().unwrap_or(0) as f64);
.set(cpu_time.iowait.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "irq"])
.set(cpu_time.irq_ms().unwrap_or(0) as f64);
.set(cpu_time.irq.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "softirq"])
.set(cpu_time.softirq_ms().unwrap_or(0) as f64);
.set(cpu_time.softirq.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "steal"])
.set(cpu_time.steal_ms().unwrap_or(0) as f64);
.set(cpu_time.steal.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "guest"])
.set(cpu_time.guest_ms().unwrap_or(0) as f64);
.set(cpu_time.guest.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "guest_nice"])
.set(cpu_time.guest_nice_ms().unwrap_or(0) as f64);
.set(cpu_time.guest_nice.unwrap_or(0) as f64);
}
#[instrument]

View File

@@ -16,7 +16,7 @@ use std::sync::Arc;
use tokio::sync::Mutex;
use nix::mount::MsFlags;
use nix::unistd::{Gid, Uid};
use nix::unistd::Gid;
use regex::Regex;
@@ -29,7 +29,6 @@ use crate::device::{
use crate::linux_abi::*;
use crate::pci;
use crate::protocols::agent::Storage;
use crate::protocols::types::FSGroupChangePolicy;
use crate::Sandbox;
#[cfg(target_arch = "s390x")]
use crate::{ccw, device::get_virtio_blk_ccw_device_name};
@@ -44,11 +43,6 @@ pub const MOUNT_GUEST_TAG: &str = "kataShared";
// Allocating an FSGroup that owns the pod's volumes
const FS_GID: &str = "fsgid";
const RW_MASK: u32 = 0o660;
const RO_MASK: u32 = 0o440;
const EXEC_MASK: u32 = 0o110;
const MODE_SETGID: u32 = 0o2000;
#[rustfmt::skip]
lazy_static! {
pub static ref FLAGS: HashMap<&'static str, (bool, MsFlags)> = {
@@ -91,11 +85,11 @@ lazy_static! {
}
#[derive(Debug, PartialEq)]
pub struct InitMount<'a> {
fstype: &'a str,
src: &'a str,
dest: &'a str,
options: Vec<&'a str>,
pub struct InitMount {
fstype: &'static str,
src: &'static str,
dest: &'static str,
options: Vec<&'static str>,
}
#[rustfmt::skip]
@@ -121,7 +115,7 @@ lazy_static!{
#[rustfmt::skip]
lazy_static! {
pub static ref INIT_ROOTFS_MOUNTS: Vec<InitMount<'static>> = vec![
pub static ref INIT_ROOTFS_MOUNTS: Vec<InitMount> = vec![
InitMount{fstype: "proc", src: "proc", dest: "/proc", options: vec!["nosuid", "nodev", "noexec"]},
InitMount{fstype: "sysfs", src: "sysfs", dest: "/sys", options: vec!["nosuid", "nodev", "noexec"]},
InitMount{fstype: "devtmpfs", src: "dev", dest: "/dev", options: vec!["nosuid"]},
@@ -228,7 +222,7 @@ async fn ephemeral_storage_handler(
let meta = fs::metadata(&storage.mount_point)?;
let mut permission = meta.permissions();
let o_mode = meta.mode() | MODE_SETGID;
let o_mode = meta.mode() | 0o2000;
permission.set_mode(o_mode);
fs::set_permissions(&storage.mount_point, permission)?;
}
@@ -278,7 +272,7 @@ async fn local_storage_handler(
if need_set_fsgid {
// set SetGid mode mask.
o_mode |= MODE_SETGID;
o_mode |= 0o2000;
}
permission.set_mode(o_mode);
@@ -327,39 +321,26 @@ fn allocate_hugepages(logger: &Logger, options: &[String]) -> Result<()> {
// sysfs entry is always of the form hugepages-${pagesize}kB
// Ref: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
let path = Path::new(SYS_FS_HUGEPAGES_PREFIX)
.join(format!("hugepages-{}kB", pagesize / 1024))
.join("nr_hugepages");
let path = Path::new(SYS_FS_HUGEPAGES_PREFIX).join(format!("hugepages-{}kB", pagesize / 1024));
if !path.exists() {
fs::create_dir_all(&path).context("create hugepages-size directory")?;
}
// write numpages to nr_hugepages file.
let path = path.join("nr_hugepages");
let numpages = format!("{}", size / pagesize);
info!(logger, "write {} pages to {:?}", &numpages, &path);
let mut file = OpenOptions::new()
.write(true)
.create(true)
.open(&path)
.context(format!("open nr_hugepages directory {:?}", &path))?;
file.write_all(numpages.as_bytes())
.context(format!("write nr_hugepages failed: {:?}", &path))?;
// Even if the write succeeds, the kernel isn't guaranteed to be
// able to allocate all the pages we requested. Verify that it
// did.
let verify = fs::read_to_string(&path).context(format!("reading {:?}", &path))?;
let allocated = verify
.trim_end()
.parse::<u64>()
.map_err(|_| anyhow!("Unexpected text {:?} in {:?}", &verify, &path))?;
if allocated != size / pagesize {
return Err(anyhow!(
"Only allocated {} of {} hugepages of size {}",
allocated,
numpages,
pagesize
));
}
Ok(())
}
@@ -495,9 +476,7 @@ fn common_storage_handler(logger: &Logger, storage: &Storage) -> Result<String>
// Mount the storage device.
let mount_point = storage.mount_point.to_string();
mount_storage(logger, storage)?;
set_ownership(logger, storage)?;
Ok(mount_point)
mount_storage(logger, storage).and(Ok(mount_point))
}
// nvdimm_storage_handler handles the storage for NVDIMM driver.
@@ -581,91 +560,6 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
)
}
#[instrument]
pub fn set_ownership(logger: &Logger, storage: &Storage) -> Result<()> {
let logger = logger.new(o!("subsystem" => "mount", "fn" => "set_ownership"));
// If fsGroup is not set, skip performing ownership change
if storage.fs_group.is_none() {
return Ok(());
}
let fs_group = storage.get_fs_group();
let mut read_only = false;
let opts_vec: Vec<String> = storage.options.to_vec();
if opts_vec.contains(&String::from("ro")) {
read_only = true;
}
let mount_path = Path::new(&storage.mount_point);
let metadata = mount_path.metadata().map_err(|err| {
error!(logger, "failed to obtain metadata for mount path";
"mount-path" => mount_path.to_str(),
"error" => err.to_string(),
);
err
})?;
if fs_group.group_change_policy == FSGroupChangePolicy::OnRootMismatch
&& metadata.gid() == fs_group.group_id
{
let mut mask = if read_only { RO_MASK } else { RW_MASK };
mask |= EXEC_MASK;
// With fsGroup change policy to OnRootMismatch, if the current
// gid of the mount path root directory matches the desired gid
// and the current permission of mount path root directory is correct,
// then ownership change will be skipped.
let current_mode = metadata.permissions().mode();
if (mask & current_mode == mask) && (current_mode & MODE_SETGID != 0) {
info!(logger, "skipping ownership change for volume";
"mount-path" => mount_path.to_str(),
"fs-group" => fs_group.group_id.to_string(),
);
return Ok(());
}
}
info!(logger, "performing recursive ownership change";
"mount-path" => mount_path.to_str(),
"fs-group" => fs_group.group_id.to_string(),
);
recursive_ownership_change(
mount_path,
None,
Some(Gid::from_raw(fs_group.group_id)),
read_only,
)
}
#[instrument]
pub fn recursive_ownership_change(
path: &Path,
uid: Option<Uid>,
gid: Option<Gid>,
read_only: bool,
) -> Result<()> {
let mut mask = if read_only { RO_MASK } else { RW_MASK };
if path.is_dir() {
for entry in fs::read_dir(&path)? {
recursive_ownership_change(entry?.path().as_path(), uid, gid, read_only)?;
}
mask |= EXEC_MASK;
mask |= MODE_SETGID;
}
nix::unistd::chown(path, uid, gid)?;
if gid.is_some() {
let metadata = path.metadata()?;
let mut permission = metadata.permissions();
let target_mode = metadata.mode() | mask;
permission.set_mode(target_mode);
fs::set_permissions(path, permission)?;
}
Ok(())
}
/// Looks for `mount_point` entry in the /proc/mounts.
#[instrument]
pub fn is_mounted(mount_point: &str) -> Result<bool> {
@@ -869,7 +763,7 @@ pub fn get_cgroup_mounts(
logger: &Logger,
cg_path: &str,
unified_cgroup_hierarchy: bool,
) -> Result<Vec<InitMount<'static>>> {
) -> Result<Vec<InitMount>> {
// cgroup v2
// https://github.com/kata-containers/agent/blob/8c9bbadcd448c9a67690fbe11a860aaacc69813c/agent.go#L1249
if unified_cgroup_hierarchy {
@@ -1017,16 +911,20 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
#[cfg(test)]
mod tests {
use super::*;
use crate::test_utils::test_utils::TestUserType;
use crate::{skip_if_not_root, skip_loop_if_not_root, skip_loop_if_root};
use protobuf::RepeatedField;
use protocols::agent::FSGroup;
use std::fs::File;
use std::fs::OpenOptions;
use std::io::Write;
use std::path::PathBuf;
use tempfile::tempdir;
#[derive(Debug, PartialEq)]
enum TestUserType {
RootOnly,
NonRootOnly,
Any,
}
#[test]
fn test_mount() {
#[derive(Debug)]
@@ -1125,7 +1023,7 @@ mod tests {
let dest_filename: String;
if !d.src.is_empty() {
src = dir.path().join(d.src);
src = dir.path().join(d.src.to_string());
src_filename = src
.to_str()
.expect("failed to convert src to filename")
@@ -1135,7 +1033,7 @@ mod tests {
}
if !d.dest.is_empty() {
dest = dir.path().join(d.dest);
dest = dir.path().join(d.dest.to_string());
dest_filename = dest
.to_str()
.expect("failed to convert dest to filename")
@@ -1586,233 +1484,6 @@ mod tests {
assert!(testfile.is_file());
}
#[test]
fn test_mount_storage() {
#[derive(Debug)]
struct TestData<'a> {
test_user: TestUserType,
storage: Storage,
error_contains: &'a str,
make_source_dir: bool,
make_mount_dir: bool,
deny_mount_permission: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
test_user: TestUserType::Any,
storage: Storage {
mount_point: "mnt".to_string(),
source: "src".to_string(),
fstype: "tmpfs".to_string(),
..Default::default()
},
make_source_dir: true,
make_mount_dir: false,
deny_mount_permission: false,
error_contains: "",
}
}
}
let tests = &[
TestData {
test_user: TestUserType::NonRootOnly,
error_contains: "EPERM: Operation not permitted",
..Default::default()
},
TestData {
test_user: TestUserType::RootOnly,
..Default::default()
},
TestData {
storage: Storage {
mount_point: "mnt".to_string(),
source: "src".to_string(),
fstype: "bind".to_string(),
..Default::default()
},
make_source_dir: false,
make_mount_dir: true,
error_contains: "Could not create mountpoint",
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
deny_mount_permission: true,
error_contains: "Could not create mountpoint",
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
if d.test_user == TestUserType::RootOnly {
skip_loop_if_not_root!(msg);
} else if d.test_user == TestUserType::NonRootOnly {
skip_loop_if_root!(msg);
}
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
let tempdir = tempdir().unwrap();
let source = tempdir.path().join(&d.storage.source);
let mount_point = tempdir.path().join(&d.storage.mount_point);
let storage = Storage {
source: source.to_str().unwrap().to_string(),
mount_point: mount_point.to_str().unwrap().to_string(),
..d.storage.clone()
};
if d.make_source_dir {
fs::create_dir_all(&storage.source).unwrap();
}
if d.make_mount_dir {
fs::create_dir_all(&storage.mount_point).unwrap();
}
if d.deny_mount_permission {
fs::set_permissions(
mount_point.parent().unwrap(),
fs::Permissions::from_mode(0o000),
)
.unwrap();
}
let result = mount_storage(&logger, &storage);
// restore permissions so tempdir can be cleaned up
if d.deny_mount_permission {
fs::set_permissions(
mount_point.parent().unwrap(),
fs::Permissions::from_mode(0o755),
)
.unwrap();
}
if result.is_ok() {
nix::mount::umount(&mount_point).unwrap();
}
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_mount_to_rootfs() {
#[derive(Debug)]
struct TestData<'a> {
test_user: TestUserType,
src: &'a str,
options: Vec<&'a str>,
error_contains: &'a str,
deny_mount_dir_permission: bool,
// if true src will be prepended with a temporary directory
mask_src: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
test_user: TestUserType::Any,
src: "src",
options: vec![],
error_contains: "",
deny_mount_dir_permission: false,
mask_src: true,
}
}
}
let tests = &[
TestData {
test_user: TestUserType::NonRootOnly,
error_contains: "EPERM: Operation not permitted",
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
src: "dev",
mask_src: false,
..Default::default()
},
TestData {
test_user: TestUserType::RootOnly,
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
deny_mount_dir_permission: true,
error_contains: "could not create directory",
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
if d.test_user == TestUserType::RootOnly {
skip_loop_if_not_root!(msg);
} else if d.test_user == TestUserType::NonRootOnly {
skip_loop_if_root!(msg);
}
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
let tempdir = tempdir().unwrap();
let src = if d.mask_src {
tempdir.path().join(&d.src)
} else {
Path::new(d.src).to_path_buf()
};
let dest = tempdir.path().join("mnt");
let init_mount = InitMount {
fstype: "tmpfs",
src: src.to_str().unwrap(),
dest: dest.to_str().unwrap(),
options: d.options.clone(),
};
if d.deny_mount_dir_permission {
fs::set_permissions(dest.parent().unwrap(), fs::Permissions::from_mode(0o000))
.unwrap();
}
let result = mount_to_rootfs(&logger, &init_mount);
// restore permissions so tempdir can be cleaned up
if d.deny_mount_dir_permission {
fs::set_permissions(dest.parent().unwrap(), fs::Permissions::from_mode(0o755))
.unwrap();
}
if result.is_ok() && d.mask_src {
nix::mount::umount(&dest).unwrap();
}
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_get_pagesize_and_size_from_option() {
let expected_pagesize = 2048;
@@ -1868,263 +1539,4 @@ mod tests {
}
}
}
#[test]
fn test_parse_mount_flags_and_options() {
#[derive(Debug)]
struct TestData<'a> {
options_vec: Vec<&'a str>,
result: (MsFlags, &'a str),
}
let tests = &[
TestData {
options_vec: vec![],
result: (MsFlags::empty(), ""),
},
TestData {
options_vec: vec!["ro"],
result: (MsFlags::MS_RDONLY, ""),
},
TestData {
options_vec: vec!["rw"],
result: (MsFlags::empty(), ""),
},
TestData {
options_vec: vec!["ro", "rw"],
result: (MsFlags::empty(), ""),
},
TestData {
options_vec: vec!["ro", "nodev"],
result: (MsFlags::MS_RDONLY | MsFlags::MS_NODEV, ""),
},
TestData {
options_vec: vec!["option1", "nodev", "option2"],
result: (MsFlags::MS_NODEV, "option1,option2"),
},
TestData {
options_vec: vec!["rbind", "", "ro"],
result: (MsFlags::MS_BIND | MsFlags::MS_REC | MsFlags::MS_RDONLY, ""),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = parse_mount_flags_and_options(d.options_vec.clone());
let msg = format!("{}: result: {:?}", msg, result);
let expected_result = (d.result.0, d.result.1.to_owned());
assert_eq!(expected_result, result, "{}", msg);
}
}
#[test]
fn test_set_ownership() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
#[derive(Debug)]
struct TestData<'a> {
mount_path: &'a str,
fs_group: Option<FSGroup>,
read_only: bool,
expected_group_id: u32,
expected_permission: u32,
}
let tests = &[
TestData {
mount_path: "foo",
fs_group: None,
read_only: false,
expected_group_id: 0,
expected_permission: 0,
},
TestData {
mount_path: "rw_mount",
fs_group: Some(FSGroup {
group_id: 3000,
group_change_policy: FSGroupChangePolicy::Always,
unknown_fields: Default::default(),
cached_size: Default::default(),
}),
read_only: false,
expected_group_id: 3000,
expected_permission: RW_MASK | EXEC_MASK | MODE_SETGID,
},
TestData {
mount_path: "ro_mount",
fs_group: Some(FSGroup {
group_id: 3000,
group_change_policy: FSGroupChangePolicy::OnRootMismatch,
unknown_fields: Default::default(),
cached_size: Default::default(),
}),
read_only: true,
expected_group_id: 3000,
expected_permission: RO_MASK | EXEC_MASK | MODE_SETGID,
},
];
let tempdir = tempdir().expect("failed to create tmpdir");
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let mount_dir = tempdir.path().join(d.mount_path);
fs::create_dir(&mount_dir)
.unwrap_or_else(|_| panic!("{}: failed to create root directory", msg));
let directory_mode = mount_dir.as_path().metadata().unwrap().permissions().mode();
let mut storage_data = Storage::new();
if d.read_only {
storage_data.set_options(RepeatedField::from_slice(&[
"foo".to_string(),
"ro".to_string(),
]));
}
if let Some(fs_group) = d.fs_group.clone() {
storage_data.set_fs_group(fs_group);
}
storage_data.mount_point = mount_dir.clone().into_os_string().into_string().unwrap();
let result = set_ownership(&logger, &storage_data);
assert!(result.is_ok());
assert_eq!(
mount_dir.as_path().metadata().unwrap().gid(),
d.expected_group_id
);
assert_eq!(
mount_dir.as_path().metadata().unwrap().permissions().mode(),
(directory_mode | d.expected_permission)
);
}
}
#[test]
fn test_recursive_ownership_change() {
skip_if_not_root!();
const COUNT: usize = 5;
#[derive(Debug)]
struct TestData<'a> {
// Directory where the recursive ownership change should be performed on
path: &'a str,
// User ID for ownership change
uid: u32,
// Group ID for ownership change
gid: u32,
// Set when the permission should be read-only
read_only: bool,
// The expected permission of all directories after ownership change
expected_permission_directory: u32,
// The expected permission of all files after ownership change
expected_permission_file: u32,
}
let tests = &[
TestData {
path: "no_gid_change",
uid: 0,
gid: 0,
read_only: false,
expected_permission_directory: 0,
expected_permission_file: 0,
},
TestData {
path: "rw_gid_change",
uid: 0,
gid: 3000,
read_only: false,
expected_permission_directory: RW_MASK | EXEC_MASK | MODE_SETGID,
expected_permission_file: RW_MASK,
},
TestData {
path: "ro_gid_change",
uid: 0,
gid: 3000,
read_only: true,
expected_permission_directory: RO_MASK | EXEC_MASK | MODE_SETGID,
expected_permission_file: RO_MASK,
},
];
let tempdir = tempdir().expect("failed to create tmpdir");
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let mount_dir = tempdir.path().join(d.path);
fs::create_dir(&mount_dir)
.unwrap_or_else(|_| panic!("{}: failed to create root directory", msg));
let directory_mode = mount_dir.as_path().metadata().unwrap().permissions().mode();
let mut file_mode: u32 = 0;
// create testing directories and files
for n in 1..COUNT {
let nest_dir = mount_dir.join(format!("nested{}", n));
fs::create_dir(&nest_dir)
.unwrap_or_else(|_| panic!("{}: failed to create nest directory", msg));
for f in 1..COUNT {
let filename = nest_dir.join(format!("file{}", f));
File::create(&filename)
.unwrap_or_else(|_| panic!("{}: failed to create file", msg));
file_mode = filename.as_path().metadata().unwrap().permissions().mode();
}
}
let uid = if d.uid > 0 {
Some(Uid::from_raw(d.uid))
} else {
None
};
let gid = if d.gid > 0 {
Some(Gid::from_raw(d.gid))
} else {
None
};
let result = recursive_ownership_change(&mount_dir, uid, gid, d.read_only);
assert!(result.is_ok());
assert_eq!(mount_dir.as_path().metadata().unwrap().gid(), d.gid);
assert_eq!(
mount_dir.as_path().metadata().unwrap().permissions().mode(),
(directory_mode | d.expected_permission_directory)
);
for n in 1..COUNT {
let nest_dir = mount_dir.join(format!("nested{}", n));
for f in 1..COUNT {
let filename = nest_dir.join(format!("file{}", f));
let file = Path::new(&filename);
assert_eq!(file.metadata().unwrap().gid(), d.gid);
assert_eq!(
file.metadata().unwrap().permissions().mode(),
(file_mode | d.expected_permission_file)
);
}
let dir = Path::new(&nest_dir);
assert_eq!(dir.metadata().unwrap().gid(), d.gid);
assert_eq!(
dir.metadata().unwrap().permissions().mode(),
(directory_mode | d.expected_permission_directory)
);
}
}
}
}

View File

@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{ensure, Result};
use anyhow::Result;
use nix::errno::Errno;
use nix::fcntl::{self, OFlag};
use nix::sys::stat::Mode;
@@ -13,7 +13,7 @@ use tracing::instrument;
pub const RNGDEV: &str = "/dev/random";
pub const RNDADDTOENTCNT: libc::c_int = 0x40045201;
pub const RNDRESEEDCRNG: libc::c_int = 0x5207;
pub const RNDRESEEDRNG: libc::c_int = 0x5207;
// Handle the differing ioctl(2) request types for different targets
#[cfg(target_env = "musl")]
@@ -24,9 +24,6 @@ type IoctlRequestType = libc::c_ulong;
#[instrument]
pub fn reseed_rng(data: &[u8]) -> Result<()> {
let len = data.len() as libc::c_long;
ensure!(len > 0, "missing entropy data");
fs::write(RNGDEV, data)?;
let f = {
@@ -44,52 +41,8 @@ pub fn reseed_rng(data: &[u8]) -> Result<()> {
};
Errno::result(ret).map(drop)?;
let ret = unsafe { libc::ioctl(f.as_raw_fd(), RNDRESEEDCRNG as IoctlRequestType, 0) };
let ret = unsafe { libc::ioctl(f.as_raw_fd(), RNDRESEEDRNG as IoctlRequestType, 0) };
Errno::result(ret).map(drop)?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use std::fs::File;
use std::io::prelude::*;
#[test]
fn test_reseed_rng() {
skip_if_not_root!();
const POOL_SIZE: usize = 512;
let mut f = File::open("/dev/urandom").unwrap();
let mut seed = [0; POOL_SIZE];
let n = f.read(&mut seed).unwrap();
// Ensure the buffer was filled.
assert!(n == POOL_SIZE);
let ret = reseed_rng(&seed);
assert!(ret.is_ok());
}
#[test]
fn test_reseed_rng_not_root() {
const POOL_SIZE: usize = 512;
let mut f = File::open("/dev/urandom").unwrap();
let mut seed = [0; POOL_SIZE];
let n = f.read(&mut seed).unwrap();
// Ensure the buffer was filled.
assert!(n == POOL_SIZE);
let ret = reseed_rng(&seed);
if nix::unistd::Uid::effective().is_root() {
assert!(ret.is_ok());
} else {
assert!(!ret.is_ok());
}
}
#[test]
fn test_reseed_rng_zero_data() {
let seed = [];
let ret = reseed_rng(&seed);
assert!(!ret.is_ok());
}
}

View File

@@ -85,11 +85,6 @@ use std::path::PathBuf;
const CONTAINER_BASE: &str = "/run/kata-containers";
const MODPROBE_PATH: &str = "/sbin/modprobe";
const ERR_CANNOT_GET_WRITER: &str = "Cannot get writer";
const ERR_INVALID_BLOCK_SIZE: &str = "Invalid block size";
const ERR_NO_LINUX_FIELD: &str = "Spec does not contain linux field";
const ERR_NO_SANDBOX_PIDNS: &str = "Sandbox does not have sandbox_pidns";
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
@@ -409,8 +404,7 @@ impl AgentService {
// For container initProcess, if it hasn't installed handler for "SIGTERM" signal,
// it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal
// instead of "SIGTERM" to terminate it.
let proc_status_file = format!("/proc/{}/status", p.pid);
if p.init && sig == libc::SIGTERM && !is_signal_handled(&proc_status_file, sig as u32) {
if p.init && sig == libc::SIGTERM && !is_signal_handled(p.pid, sig as u32) {
sig = libc::SIGKILL;
}
p.signal(sig)?;
@@ -581,7 +575,7 @@ impl AgentService {
}
};
let writer = writer.ok_or_else(|| anyhow!(ERR_CANNOT_GET_WRITER))?;
let writer = writer.ok_or_else(|| anyhow!("cannot get writer"))?;
writer.lock().await.write_all(req.data.as_slice()).await?;
let mut resp = WriteStreamResponse::new();
@@ -1225,12 +1219,7 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
info!(sl!(), "get guest details!");
let mut resp = GuestDetailsResponse::new();
// to get memory block size
match get_memory_info(
req.mem_block_size,
req.mem_hotplug_probe,
SYSFS_MEMORY_BLOCK_SIZE_PATH,
SYSFS_MEMORY_HOTPLUG_PROBE_PATH,
) {
match get_memory_info(req.mem_block_size, req.mem_hotplug_probe) {
Ok((u, v)) => {
resp.mem_block_size_bytes = u;
resp.support_mem_hotplug_probe = v;
@@ -1419,29 +1408,24 @@ impl protocols::health_ttrpc::Health for HealthService {
}
}
fn get_memory_info(
block_size: bool,
hotplug: bool,
block_size_path: &str,
hotplug_probe_path: &str,
) -> Result<(u64, bool)> {
fn get_memory_info(block_size: bool, hotplug: bool) -> Result<(u64, bool)> {
let mut size: u64 = 0;
let mut plug: bool = false;
if block_size {
match fs::read_to_string(block_size_path) {
match fs::read_to_string(SYSFS_MEMORY_BLOCK_SIZE_PATH) {
Ok(v) => {
if v.is_empty() {
warn!(sl!(), "file {} is empty", block_size_path);
return Err(anyhow!(ERR_INVALID_BLOCK_SIZE));
info!(sl!(), "string in empty???");
return Err(anyhow!("Invalid block size"));
}
size = u64::from_str_radix(v.trim(), 16).map_err(|_| {
warn!(sl!(), "failed to parse the str {} to hex", size);
anyhow!(ERR_INVALID_BLOCK_SIZE)
anyhow!("Invalid block size")
})?;
}
Err(e) => {
warn!(sl!(), "memory block size error: {:?}", e.kind());
info!(sl!(), "memory block size error: {:?}", e.kind());
if e.kind() != std::io::ErrorKind::NotFound {
return Err(anyhow!(e));
}
@@ -1450,10 +1434,10 @@ fn get_memory_info(
}
if hotplug {
match stat::stat(hotplug_probe_path) {
match stat::stat(SYSFS_MEMORY_HOTPLUG_PROBE_PATH) {
Ok(_) => plug = true,
Err(e) => {
warn!(sl!(), "hotplug memory error: {:?}", e);
info!(sl!(), "hotplug memory error: {:?}", e);
match e {
nix::Error::ENOENT => plug = false,
_ => return Err(anyhow!(e)),
@@ -1591,7 +1575,7 @@ fn update_container_namespaces(
let linux = spec
.linux
.as_mut()
.ok_or_else(|| anyhow!(ERR_NO_LINUX_FIELD))?;
.ok_or_else(|| anyhow!("Spec didn't container linux field"))?;
let namespaces = linux.namespaces.as_mut_slice();
for namespace in namespaces.iter_mut() {
@@ -1618,7 +1602,7 @@ fn update_container_namespaces(
if let Some(ref pidns) = &sandbox.sandbox_pidns {
pid_ns.path = String::from(pidns.path.as_str());
} else {
return Err(anyhow!(ERR_NO_SANDBOX_PIDNS));
return Err(anyhow!("failed to get sandbox pidns"));
}
}
@@ -1638,33 +1622,21 @@ fn append_guest_hooks(s: &Sandbox, oci: &mut Spec) -> Result<()> {
Ok(())
}
// Check if the container process installed the
// Check is the container process installed the
// handler for specific signal.
fn is_signal_handled(proc_status_file: &str, signum: u32) -> bool {
let shift_count: u64 = if signum == 0 {
// signum 0 is used to check for process liveness.
// Since that signal is not part of the mask in the file, we only need
// to know if the file (and therefore) process exists to handle
// that signal.
return fs::metadata(proc_status_file).is_ok();
} else if signum > 64 {
// Ensure invalid signum won't break bit shift logic
warn!(sl!(), "received invalid signum {}", signum);
return false;
} else {
(signum - 1).into()
};
fn is_signal_handled(pid: pid_t, signum: u32) -> bool {
let sig_mask: u64 = 1u64 << (signum - 1);
let file_name = format!("/proc/{}/status", pid);
// Open the file in read-only mode (ignoring errors).
let file = match File::open(proc_status_file) {
let file = match File::open(&file_name) {
Ok(f) => f,
Err(_) => {
warn!(sl!(), "failed to open file {}", proc_status_file);
warn!(sl!(), "failed to open file {}\n", file_name);
return false;
}
};
let sig_mask: u64 = 1 << shift_count;
let reader = BufReader::new(file);
// Read the file line by line using the lines() iterator from std::io::BufRead.
@@ -1672,21 +1644,21 @@ fn is_signal_handled(proc_status_file: &str, signum: u32) -> bool {
let line = match line {
Ok(l) => l,
Err(_) => {
warn!(sl!(), "failed to read file {}", proc_status_file);
warn!(sl!(), "failed to read file {}\n", file_name);
return false;
}
};
if line.starts_with("SigCgt:") {
let mask_vec: Vec<&str> = line.split(':').collect();
if mask_vec.len() != 2 {
warn!(sl!(), "parse the SigCgt field failed");
warn!(sl!(), "parse the SigCgt field failed\n");
return false;
}
let sig_cgt_str = mask_vec[1];
let sig_cgt_mask = match u64::from_str_radix(sig_cgt_str, 16) {
Ok(h) => h,
Err(_) => {
warn!(sl!(), "failed to parse the str {} to hex", sig_cgt_str);
warn!(sl!(), "failed to parse the str {} to hex\n", sig_cgt_str);
return false;
}
};
@@ -1802,7 +1774,7 @@ async fn do_add_swap(sandbox: &Arc<Mutex<Sandbox>>, req: &AddSwapRequest) -> Res
// - config.json at /<CONTAINER_BASE>/<cid>/config.json
// - container rootfs bind mounted at /<CONTAINER_BASE>/<cid>/rootfs
// - modify container spec root to point to /<CONTAINER_BASE>/<cid>/rootfs
pub fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
let spec_root = if let Some(sr) = &spec.root {
sr
} else {
@@ -1894,9 +1866,8 @@ fn load_kernel_module(module: &protocols::agent::KernelModule) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::{assert_result, namespace::Namespace, protocols::agent_ttrpc::AgentService as _};
use oci::{Hook, Hooks, Linux, LinuxNamespace};
use tempfile::{tempdir, TempDir};
use crate::protocols::agent_ttrpc::AgentService as _;
use oci::{Hook, Hooks};
use ttrpc::{r#async::TtrpcContext, MessageHeader};
fn mk_ttrpc_context() -> TtrpcContext {
@@ -1908,44 +1879,6 @@ mod tests {
}
}
fn create_dummy_opts() -> CreateOpts {
let root = Root {
path: String::from("/"),
..Default::default()
};
let spec = Spec {
linux: Some(oci::Linux::default()),
root: Some(root),
..Default::default()
};
CreateOpts {
cgroup_name: "".to_string(),
use_systemd_cgroup: false,
no_pivot_root: false,
no_new_keyring: false,
spec: Some(spec),
rootless_euid: false,
rootless_cgroup: false,
}
}
fn create_linuxcontainer() -> (LinuxContainer, TempDir) {
let dir = tempdir().expect("failed to make tempdir");
(
LinuxContainer::new(
"some_id",
dir.path().join("rootfs").to_str().unwrap(),
create_dummy_opts(),
&slog_scope::logger(),
)
.unwrap(),
dir,
)
}
#[test]
fn test_load_kernel_module() {
let mut m = protocols::agent::KernelModule {
@@ -2038,491 +1971,6 @@ mod tests {
assert!(result.is_err(), "expected add arp neighbors to fail");
}
#[tokio::test]
async fn test_do_write_stream() {
#[derive(Debug)]
struct TestData<'a> {
create_container: bool,
has_fd: bool,
has_tty: bool,
break_pipe: bool,
container_id: &'a str,
exec_id: &'a str,
data: Vec<u8>,
result: Result<protocols::agent::WriteStreamResponse>,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
create_container: true,
has_fd: true,
has_tty: true,
break_pipe: false,
container_id: "1",
exec_id: "2",
data: vec![1, 2, 3],
result: Ok(WriteStreamResponse {
len: 3,
..WriteStreamResponse::default()
}),
}
}
}
let tests = &[
TestData {
..Default::default()
},
TestData {
has_tty: false,
..Default::default()
},
TestData {
break_pipe: true,
result: Err(anyhow!(std::io::Error::from_raw_os_error(libc::EPIPE))),
..Default::default()
},
TestData {
create_container: false,
result: Err(anyhow!(crate::sandbox::ERR_INVALID_CONTAINER_ID)),
..Default::default()
},
TestData {
container_id: "8181",
result: Err(anyhow!(crate::sandbox::ERR_INVALID_CONTAINER_ID)),
..Default::default()
},
TestData {
data: vec![],
result: Ok(WriteStreamResponse {
len: 0,
..WriteStreamResponse::default()
}),
..Default::default()
},
TestData {
has_fd: false,
result: Err(anyhow!(ERR_CANNOT_GET_WRITER)),
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let logger = slog::Logger::root(slog::Discard, o!());
let mut sandbox = Sandbox::new(&logger).unwrap();
let (rfd, wfd) = unistd::pipe().unwrap();
if d.break_pipe {
unistd::close(rfd).unwrap();
}
if d.create_container {
let (mut linux_container, _root) = create_linuxcontainer();
let exec_process_id = 2;
linux_container.id = "1".to_string();
let mut exec_process = Process::new(
&logger,
&oci::Process::default(),
&exec_process_id.to_string(),
false,
1,
)
.unwrap();
let fd = {
if d.has_fd {
Some(wfd)
} else {
None
}
};
if d.has_tty {
exec_process.parent_stdin = None;
exec_process.term_master = fd;
} else {
exec_process.parent_stdin = fd;
exec_process.term_master = None;
}
linux_container
.processes
.insert(exec_process_id, exec_process);
sandbox.add_container(linux_container);
}
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
});
let result = agent_service
.do_write_stream(protocols::agent::WriteStreamRequest {
container_id: d.container_id.to_string(),
exec_id: d.exec_id.to_string(),
data: d.data.clone(),
..Default::default()
})
.await;
if !d.break_pipe {
unistd::close(rfd).unwrap();
}
unistd::close(wfd).unwrap();
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[tokio::test]
async fn test_update_container_namespaces() {
#[derive(Debug)]
struct TestData<'a> {
has_linux_in_spec: bool,
sandbox_pidns_path: Option<&'a str>,
namespaces: Vec<LinuxNamespace>,
use_sandbox_pidns: bool,
result: Result<()>,
expected_namespaces: Vec<LinuxNamespace>,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
has_linux_in_spec: true,
sandbox_pidns_path: Some("sharedpidns"),
namespaces: vec![
LinuxNamespace {
r#type: NSTYPEIPC.to_string(),
path: "ipcpath".to_string(),
},
LinuxNamespace {
r#type: NSTYPEUTS.to_string(),
path: "utspath".to_string(),
},
],
use_sandbox_pidns: false,
result: Ok(()),
expected_namespaces: vec![
LinuxNamespace {
r#type: NSTYPEIPC.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEUTS.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "".to_string(),
},
],
}
}
}
let tests = &[
TestData {
..Default::default()
},
TestData {
use_sandbox_pidns: true,
expected_namespaces: vec![
LinuxNamespace {
r#type: NSTYPEIPC.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEUTS.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "sharedpidns".to_string(),
},
],
..Default::default()
},
TestData {
namespaces: vec![],
use_sandbox_pidns: true,
expected_namespaces: vec![LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "sharedpidns".to_string(),
}],
..Default::default()
},
TestData {
namespaces: vec![],
use_sandbox_pidns: false,
expected_namespaces: vec![LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "".to_string(),
}],
..Default::default()
},
TestData {
namespaces: vec![],
sandbox_pidns_path: None,
use_sandbox_pidns: true,
result: Err(anyhow!(ERR_NO_SANDBOX_PIDNS)),
expected_namespaces: vec![],
..Default::default()
},
TestData {
has_linux_in_spec: false,
result: Err(anyhow!(ERR_NO_LINUX_FIELD)),
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let logger = slog::Logger::root(slog::Discard, o!());
let mut sandbox = Sandbox::new(&logger).unwrap();
if let Some(pidns_path) = d.sandbox_pidns_path {
let mut sandbox_pidns = Namespace::new(&logger);
sandbox_pidns.path = pidns_path.to_string();
sandbox.sandbox_pidns = Some(sandbox_pidns);
}
let mut oci = Spec::default();
if d.has_linux_in_spec {
oci.linux = Some(Linux {
namespaces: d.namespaces.clone(),
..Default::default()
});
}
let result = update_container_namespaces(&sandbox, &mut oci, d.use_sandbox_pidns);
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
if let Some(linux) = oci.linux {
assert_eq!(d.expected_namespaces, linux.namespaces, "{}", msg);
}
}
}
#[tokio::test]
async fn test_get_memory_info() {
#[derive(Debug)]
struct TestData<'a> {
// if None is provided, no file will be generated, else the data in the Option will populate the file
block_size_data: Option<&'a str>,
hotplug_probe_data: bool,
get_block_size: bool,
get_hotplug: bool,
result: Result<(u64, bool)>,
}
let tests = &[
TestData {
block_size_data: Some("10000000"),
hotplug_probe_data: true,
get_block_size: true,
get_hotplug: true,
result: Ok((268435456, true)),
},
TestData {
block_size_data: Some("100"),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: true,
result: Ok((256, false)),
},
TestData {
block_size_data: None,
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: true,
result: Ok((0, false)),
},
TestData {
block_size_data: Some(""),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some("-1"),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some(" "),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some("some data"),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some("some data"),
hotplug_probe_data: true,
get_block_size: false,
get_hotplug: false,
result: Ok((0, false)),
},
TestData {
block_size_data: Some("100"),
hotplug_probe_data: true,
get_block_size: false,
get_hotplug: false,
result: Ok((0, false)),
},
TestData {
block_size_data: Some("100"),
hotplug_probe_data: true,
get_block_size: false,
get_hotplug: true,
result: Ok((0, true)),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let dir = tempdir().expect("failed to make tempdir");
let block_size_path = dir.path().join("block_size_bytes");
let hotplug_probe_path = dir.path().join("probe");
if let Some(block_size_data) = d.block_size_data {
fs::write(&block_size_path, block_size_data).unwrap();
}
if d.hotplug_probe_data {
fs::write(&hotplug_probe_path, []).unwrap();
}
let result = get_memory_info(
d.get_block_size,
d.get_hotplug,
block_size_path.to_str().unwrap(),
hotplug_probe_path.to_str().unwrap(),
);
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[tokio::test]
async fn test_is_signal_handled() {
#[derive(Debug)]
struct TestData<'a> {
status_file_data: Option<&'a str>,
signum: u32,
result: bool,
}
let tests = &[
TestData {
status_file_data: Some(
r#"
SigBlk:0000000000010000
SigCgt:0000000000000001
OtherField:other
"#,
),
signum: 1,
result: true,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 4,
result: true,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 3,
result: false,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 65,
result: false,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 0,
result: true,
},
TestData {
status_file_data: Some("SigCgt:ZZZZZZZZ"),
signum: 1,
result: false,
},
TestData {
status_file_data: Some("SigCgt:-1"),
signum: 1,
result: false,
},
TestData {
status_file_data: Some("SigCgt"),
signum: 1,
result: false,
},
TestData {
status_file_data: Some("any data"),
signum: 0,
result: true,
},
TestData {
status_file_data: Some("SigBlk:0000000000000001"),
signum: 1,
result: false,
},
TestData {
status_file_data: None,
signum: 1,
result: false,
},
TestData {
status_file_data: None,
signum: 0,
result: false,
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let dir = tempdir().expect("failed to make tempdir");
let proc_status_file_path = dir.path().join("status");
if let Some(file_data) = d.status_file_data {
fs::write(&proc_status_file_path, file_data).unwrap();
}
let result = is_signal_handled(proc_status_file_path.to_str().unwrap(), d.signum);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[tokio::test]
async fn test_verify_cid() {
#[derive(Debug)]

View File

@@ -32,8 +32,6 @@ use tokio::sync::oneshot;
use tokio::sync::Mutex;
use tracing::instrument;
pub const ERR_INVALID_CONTAINER_ID: &str = "Invalid container id";
type UeventWatcher = (Box<dyn UeventMatcher>, oneshot::Sender<Uevent>);
#[derive(Debug)]
@@ -151,12 +149,7 @@ impl Sandbox {
pub fn remove_sandbox_storage(&self, path: &str) -> Result<()> {
let mounts = vec![path.to_string()];
remove_mounts(&mounts)?;
// "remove_dir" will fail if the mount point is backed by a read-only filesystem.
// This is the case with the device mapper snapshotter, where we mount the block device directly
// at the underlying sandbox path which was provided from the base RO kataShared path from the host.
if let Err(err) = fs::remove_dir(path) {
warn!(self.logger, "failed to remove dir {}, {:?}", path, err);
}
fs::remove_dir_all(path).context(format!("failed to remove dir {:?}", path))?;
Ok(())
}
@@ -239,7 +232,7 @@ impl Sandbox {
pub fn find_container_process(&mut self, cid: &str, eid: &str) -> Result<&mut Process> {
let ctr = self
.get_container(cid)
.ok_or_else(|| anyhow!(ERR_INVALID_CONTAINER_ID))?;
.ok_or_else(|| anyhow!("Invalid container id"))?;
if eid.is_empty() {
return ctr
@@ -569,8 +562,19 @@ mod tests {
.remove_sandbox_storage(invalid_dir.to_str().unwrap())
.is_err());
assert!(bind_mount(srcdir_path, destdir_path, &logger).is_ok());
// Now, create a double mount as this guarantees the directory cannot
// be deleted after the first umount.
for _i in 0..2 {
assert!(bind_mount(srcdir_path, destdir_path, &logger).is_ok());
}
assert!(
s.remove_sandbox_storage(destdir_path).is_err(),
"Expect fail as deletion cannot happen due to the second mount."
);
// This time it should work as the previous two calls have undone the double
// mount.
assert!(s.remove_sandbox_storage(destdir_path).is_ok());
}

View File

@@ -58,16 +58,17 @@ async fn handle_sigchild(logger: Logger, sandbox: Arc<Mutex<Sandbox>>) -> Result
}
let mut p = process.unwrap();
let ret: i32;
let ret: i32 = match wait_status {
WaitStatus::Exited(_, c) => c,
WaitStatus::Signaled(_, sig, _) => sig as i32,
match wait_status {
WaitStatus::Exited(_, c) => ret = c,
WaitStatus::Signaled(_, sig, _) => ret = sig as i32,
_ => {
info!(logger, "got wrong status for process";
"child-status" => format!("{:?}", wait_status));
continue;
}
};
}
p.exit_code = ret;
let _ = p.exit_tx.take();

View File

@@ -5,14 +5,7 @@
#![allow(clippy::module_inception)]
#[cfg(test)]
pub mod test_utils {
#[derive(Debug, PartialEq)]
pub enum TestUserType {
RootOnly,
NonRootOnly,
Any,
}
mod test_utils {
#[macro_export]
macro_rules! skip_if_root {
() => {
@@ -60,29 +53,4 @@ pub mod test_utils {
}
};
}
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
}
};
}
}

View File

@@ -237,6 +237,8 @@ mod tests {
JoinError,
>;
let result: std::result::Result<u64, std::io::Error>;
select! {
res = handle => spawn_result = res,
_ = &mut timeout => panic!("timed out"),
@@ -244,7 +246,7 @@ mod tests {
assert!(spawn_result.is_ok());
let result: std::result::Result<u64, std::io::Error> = spawn_result.unwrap();
result = spawn_result.unwrap();
assert!(result.is_ok());
@@ -276,6 +278,8 @@ mod tests {
let spawn_result: std::result::Result<std::result::Result<u64, std::io::Error>, JoinError>;
let result: std::result::Result<u64, std::io::Error>;
select! {
res = handle => spawn_result = res,
_ = &mut timeout => panic!("timed out"),
@@ -283,7 +287,7 @@ mod tests {
assert!(spawn_result.is_ok());
let result: std::result::Result<u64, std::io::Error> = spawn_result.unwrap();
result = spawn_result.unwrap();
assert!(result.is_ok());
@@ -316,6 +320,8 @@ mod tests {
let spawn_result: std::result::Result<std::result::Result<u64, std::io::Error>, JoinError>;
let result: std::result::Result<u64, std::io::Error>;
select! {
res = handle => spawn_result = res,
_ = &mut timeout => panic!("timed out"),
@@ -323,7 +329,7 @@ mod tests {
assert!(spawn_result.is_ok());
let result: std::result::Result<u64, std::io::Error> = spawn_result.unwrap();
result = spawn_result.unwrap();
assert!(result.is_ok());

View File

@@ -6,7 +6,6 @@
#![allow(unknown_lints)]
use std::collections::HashMap;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::SystemTime;
@@ -14,7 +13,6 @@ use std::time::SystemTime;
use anyhow::{ensure, Context, Result};
use async_recursion::async_recursion;
use nix::mount::{umount, MsFlags};
use nix::unistd::{Gid, Uid};
use slog::{debug, error, info, warn, Logger};
use thiserror::Error;
use tokio::fs;
@@ -82,8 +80,7 @@ impl Drop for Storage {
}
async fn copy(from: impl AsRef<Path>, to: impl AsRef<Path>) -> Result<()> {
let metadata = fs::symlink_metadata(&from).await?;
if metadata.file_type().is_symlink() {
if fs::symlink_metadata(&from).await?.file_type().is_symlink() {
// if source is a symlink, create new symlink with same link source. If
// the symlink exists, remove and create new one:
if fs::symlink_metadata(&to).await.is_ok() {
@@ -91,15 +88,8 @@ async fn copy(from: impl AsRef<Path>, to: impl AsRef<Path>) -> Result<()> {
}
fs::symlink(fs::read_link(&from).await?, &to).await?;
} else {
fs::copy(&from, &to).await?;
fs::copy(from, to).await?;
}
// preserve the source uid and gid to the destination.
nix::unistd::chown(
to.as_ref(),
Some(Uid::from_raw(metadata.uid())),
Some(Gid::from_raw(metadata.gid())),
)?;
Ok(())
}
@@ -116,29 +106,14 @@ impl Storage {
async fn update_target(&self, logger: &Logger, source_path: impl AsRef<Path>) -> Result<()> {
let source_file_path = source_path.as_ref();
let metadata = source_file_path.symlink_metadata()?;
// if we are creating a directory: just create it, nothing more to do
if metadata.file_type().is_dir() {
if source_file_path.symlink_metadata()?.file_type().is_dir() {
let dest_file_path = self.make_target_path(&source_file_path)?;
fs::create_dir_all(&dest_file_path)
.await
.with_context(|| format!("Unable to mkdir all for {}", dest_file_path.display()))?;
// set the directory permissions to match the source directory permissions
fs::set_permissions(&dest_file_path, metadata.permissions())
.await
.with_context(|| {
format!("Unable to set permissions for {}", dest_file_path.display())
})?;
// preserve the source directory uid and gid to the destination.
nix::unistd::chown(
&dest_file_path,
Some(Uid::from_raw(metadata.uid())),
Some(Gid::from_raw(metadata.gid())),
)
.with_context(|| format!("Unable to set ownership for {}", dest_file_path.display()))?;
return Ok(());
}
@@ -529,7 +504,6 @@ mod tests {
use super::*;
use crate::mount::is_mounted;
use crate::skip_if_not_root;
use nix::unistd::{Gid, Uid};
use std::fs;
use std::thread;
@@ -921,28 +895,20 @@ mod tests {
#[tokio::test]
async fn test_copy() {
skip_if_not_root!();
// prepare tmp src/destination
let source_dir = tempfile::tempdir().unwrap();
let dest_dir = tempfile::tempdir().unwrap();
let uid = Uid::from_raw(10);
let gid = Gid::from_raw(200);
// verify copy of a regular file
let src_file = source_dir.path().join("file.txt");
let dst_file = dest_dir.path().join("file.txt");
fs::write(&src_file, "foo").unwrap();
nix::unistd::chown(&src_file, Some(uid), Some(gid)).unwrap();
copy(&src_file, &dst_file).await.unwrap();
// verify destination:
assert!(!fs::symlink_metadata(&dst_file)
assert!(!fs::symlink_metadata(dst_file)
.unwrap()
.file_type()
.is_symlink());
assert_eq!(fs::metadata(&dst_file).unwrap().uid(), uid.as_raw());
assert_eq!(fs::metadata(&dst_file).unwrap().gid(), gid.as_raw());
// verify copy of a symlink
let src_symlink_file = source_dir.path().join("symlink_file.txt");
@@ -950,7 +916,7 @@ mod tests {
tokio::fs::symlink(&src_file, &src_symlink_file)
.await
.unwrap();
copy(&src_symlink_file, &dst_symlink_file).await.unwrap();
copy(src_symlink_file, &dst_symlink_file).await.unwrap();
// verify destination:
assert!(fs::symlink_metadata(&dst_symlink_file)
.unwrap()
@@ -958,8 +924,6 @@ mod tests {
.is_symlink());
assert_eq!(fs::read_link(&dst_symlink_file).unwrap(), src_file);
assert_eq!(fs::read_to_string(&dst_symlink_file).unwrap(), "foo");
assert_ne!(fs::metadata(&dst_symlink_file).unwrap().uid(), uid.as_raw());
assert_ne!(fs::metadata(&dst_symlink_file).unwrap().gid(), gid.as_raw());
}
#[tokio::test]
@@ -1105,8 +1069,6 @@ mod tests {
#[tokio::test]
async fn watch_directory() {
skip_if_not_root!();
// Prepare source directory:
// ./tmp/1.txt
// ./tmp/A/B/2.txt
@@ -1117,9 +1079,7 @@ mod tests {
// A/C is an empty directory
let empty_dir = "A/C";
let path = source_dir.path().join(empty_dir);
fs::create_dir_all(&path).unwrap();
nix::unistd::chown(&path, Some(Uid::from_raw(10)), Some(Gid::from_raw(200))).unwrap();
fs::create_dir_all(source_dir.path().join(empty_dir)).unwrap();
// delay 20 ms between writes to files in order to ensure filesystem timestamps are unique
thread::sleep(Duration::from_millis(20));
@@ -1163,9 +1123,7 @@ mod tests {
// create another empty directory A/C/D
let empty_dir = "A/C/D";
let path = source_dir.path().join(empty_dir);
fs::create_dir_all(&path).unwrap();
nix::unistd::chown(&path, Some(Uid::from_raw(10)), Some(Gid::from_raw(200))).unwrap();
fs::create_dir_all(source_dir.path().join(empty_dir)).unwrap();
assert_eq!(entry.scan(&logger).await.unwrap(), 1);
assert!(dest_dir.path().join(empty_dir).exists());
}

View File

@@ -178,11 +178,13 @@ impl Builder {
pub fn init(self) -> Exporter {
let Builder { port, cid, logger } = self;
let cid_str: String = if self.cid == libc::VMADDR_CID_ANY {
ANY_CID.to_string()
let cid_str: String;
if self.cid == libc::VMADDR_CID_ANY {
cid_str = ANY_CID.to_string();
} else {
format!("{}", self.cid)
};
cid_str = format!("{}", self.cid);
}
Exporter {
port,

View File

@@ -1,6 +0,0 @@
[workspace]
members = [
"logging",
"safe-path",
]
resolver = "2"

View File

@@ -1,10 +0,0 @@
The `src/libs` directory hosts library crates which may be shared by multiple Kata Containers components
or published to [`crates.io`](https://crates.io/index.html).
### Library Crates
Currently it provides following library crates:
| Library | Description |
|-|-|-|
| [logging](logging/) | Facilities to setup logging subsystem based slog. |
| [safe-path](safe-path/) | Utilities to safely resolve filesystem paths. |

View File

@@ -1,5 +1,7 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3
[[package]]
name = "arc-swap"
version = "1.5.0"
@@ -39,9 +41,9 @@ dependencies = [
[[package]]
name = "crossbeam-channel"
version = "0.5.2"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e54ea8bc3fb1ee042f5aace6e3c6e025d3874866da222930f70ce62aceba0bfa"
checksum = "06ed27e177f16d65f0f0c22a213e17c696ace5dd64b14258b52f9417ccb52db4"
dependencies = [
"cfg-if",
"crossbeam-utils",
@@ -49,30 +51,23 @@ dependencies = [
[[package]]
name = "crossbeam-utils"
version = "0.8.6"
version = "0.8.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cfcae03edb34f947e64acdb1c33ec169824e20657e9ecb61cef6c8c74dcb8120"
checksum = "d82cfc11ce7f2c3faef78d8a684447b40d503d9681acebed6cb728d45940c4db"
dependencies = [
"cfg-if",
"lazy_static",
]
[[package]]
name = "fastrand"
version = "1.6.0"
name = "getrandom"
version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "779d043b6a0b90cc4c0ed7ee380a6504394cee7efd7db050e3774eee387324b2"
dependencies = [
"instant",
]
[[package]]
name = "instant"
version = "0.1.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a5bbe824c507c5da5956355e86a746d82e0e1464f65d862cc5e71da70e94b2c"
checksum = "7fcd999463524c52659517fe2cea98493cfe485d10565e7b0fb07dbba7ad2753"
dependencies = [
"cfg-if",
"libc",
"wasi",
]
[[package]]
@@ -89,9 +84,9 @@ checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
[[package]]
name = "libc"
version = "0.2.113"
version = "0.2.112"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eef78b64d87775463c549fbd80e19249ef436ea3bf1de2a1eb7e717ec7fab1e9"
checksum = "1b03d17f364a3a042d5e5d46b053bbbf82c92c9430c592dd4c064dc6ee997125"
[[package]]
name = "logging"
@@ -130,6 +125,52 @@ version = "1.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da32515d9f6e6e489d7bc9d84c71b060db7247dc035bbe44eac88cf87486d8d5"
[[package]]
name = "ppv-lite86"
version = "0.2.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ed0cfbc8191465bed66e1718596ee0b0b35d5ee1f41c5df2189d0fe8bde535ba"
[[package]]
name = "rand"
version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2e7573632e6454cf6b99d7aac4ccca54be06da05aca2ef7423d22d27d4d4bcd8"
dependencies = [
"libc",
"rand_chacha",
"rand_core",
"rand_hc",
]
[[package]]
name = "rand_chacha"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88"
dependencies = [
"ppv-lite86",
"rand_core",
]
[[package]]
name = "rand_core"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d34f1408f55294453790c48b2f1ebbb1c5b4b7563eb1f418bcfcfdbb06ebb4e7"
dependencies = [
"getrandom",
]
[[package]]
name = "rand_hc"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d51e9f596de227fda2ea6c84607f5558e196eeaf43c986b724ba4fb8fdf497e7"
dependencies = [
"rand_core",
]
[[package]]
name = "redox_syscall"
version = "0.2.10"
@@ -154,25 +195,17 @@ version = "1.0.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "73b4b750c782965c211b42f022f59af1fbceabdd026623714f104152f1ec149f"
[[package]]
name = "safe-path"
version = "0.1.0"
dependencies = [
"libc",
"tempfile",
]
[[package]]
name = "serde"
version = "1.0.133"
version = "1.0.131"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "97565067517b60e2d1ea8b268e59ce036de907ac523ad83a0475da04e818989a"
checksum = "b4ad69dfbd3e45369132cc64e6748c2d65cdfb001a2b1c232d128b4ad60561c1"
[[package]]
name = "serde_json"
version = "1.0.75"
version = "1.0.73"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c059c05b48c5c0067d4b4b2b4f0732dd65feb52daf7e0ea09cd87e7dadc1af79"
checksum = "bcbd0344bc6533bc7ec56df11d42fb70f1b912351c0825ccb7211b59d8af7cf5"
dependencies = [
"itoa",
"ryu",
@@ -228,13 +261,13 @@ checksum = "f764005d11ee5f36500a149ace24e00e3da98b0158b3e2d53a7495660d3f4d60"
[[package]]
name = "tempfile"
version = "3.3.0"
version = "3.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5cdb1ef4eaeeaddc8fbd371e5017057064af0911902ef36b39801f67cc6d79e4"
checksum = "dac1c663cfc93810f88aed9b8941d48cabf856a1b111c29a40439018d870eb22"
dependencies = [
"cfg-if",
"fastrand",
"libc",
"rand",
"redox_syscall",
"remove_dir_all",
"winapi",
@@ -251,20 +284,19 @@ dependencies = [
[[package]]
name = "time"
version = "0.1.44"
version = "0.1.43"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6db9e6914ab8b1ae1c260a4ae7a49b6c5611b40328a735b21862567685e73255"
checksum = "ca8a50ef2360fbd1eeb0ecd46795a87a19024eb4b53c5dc916ca1fd95fe62438"
dependencies = [
"libc",
"wasi",
"winapi",
]
[[package]]
name = "wasi"
version = "0.10.0+wasi-snapshot-preview1"
version = "0.10.2+wasi-snapshot-preview1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1a143597ca7c7793eff794def352d41792a93c481eb1042423ff7ff72ba2c31f"
checksum = "fd6fbd9a79829dd1ad0cc20627bf1ed606756a7f77edff7b66b7064f9cb327c6"
[[package]]
name = "winapi"

View File

@@ -381,7 +381,7 @@ pub struct LinuxMemory {
#[serde(default, skip_serializing_if = "Option::is_none", rename = "kernelTCP")]
pub kernel_tcp: Option<i64>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub swappiness: Option<u64>,
pub swappiness: Option<i64>,
#[serde(
default,
skip_serializing_if = "Option::is_none",

View File

@@ -1,7 +1,6 @@
Cargo.lock
src/agent.rs
src/agent_ttrpc.rs
src/csi.rs
src/empty.rs
src/health.rs
src/health_ttrpc.rs

View File

@@ -7,14 +7,14 @@
# //
die() {
cat <<EOF >&2
cat <<EOT >&2
====================================================================
==== compile protocols failed ====
$1
====================================================================
EOF
EOT
exit 1
}

View File

@@ -399,17 +399,6 @@ message SetGuestDateTimeRequest {
int64 Usec = 2;
}
// FSGroup consists of the group id and group ownership change policy
// that a volume should have its ownership changed to.
message FSGroup {
// GroupID is the ID that the group ownership of the
// files in the mounted volume will need to be changed to.
uint32 group_id = 2;
// GroupChangePolicy specifies the policy for applying group id
// ownership change on a mounted volume.
types.FSGroupChangePolicy group_change_policy = 3;
}
// Storage represents both the rootfs of the container, and any volume that
// could have been defined through the Mount list of the OCI specification.
message Storage {
@@ -433,14 +422,11 @@ message Storage {
// device, "9p" for shared filesystem, or "tmpfs" for shared /dev/shm.
string fstype = 4;
// Options describes the additional options that might be needed to
// mount properly the storage filesystem.
// mount properly the storage filesytem.
repeated string options = 5;
// MountPoint refers to the path where the storage should be mounted
// inside the VM.
string mount_point = 6;
// FSGroup consists of the group ID and group ownership change policy
// that the mounted volume must have its group ID changed to when specified.
FSGroup fs_group = 7;
}
// Device represents only the devices that could have been defined through the

View File

@@ -16,15 +16,6 @@ enum IPFamily {
v6 = 1;
}
// FSGroupChangePolicy defines the policy for applying group id ownership change on a mounted volume.
enum FSGroupChangePolicy {
// Always indicates that the volume ownership will always be changed.
Always = 0;
// OnRootMismatch indicates that the volume ownership will be changed only
// when the ownership of the root directory does not match with the expected group id for the volume.
OnRootMismatch = 1;
}
message IPAddress {
IPFamily family = 1;
string address = 2;

View File

@@ -1,18 +0,0 @@
[package]
name = "safe-path"
version = "0.1.0"
description = "A library to safely handle file system paths for container runtimes"
keywords = ["kata", "container", "path", "securejoin"]
categories = ["parser-implementations", "filesystem"]
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
repository = "https://github.com/kata-containers/kata-containers.git"
homepage = "https://katacontainers.io/"
readme = "README.md"
license = "Apache-2.0"
edition = "2018"
[dependencies]
libc = "0.2.100"
[dev-dependencies]
tempfile = "3.2.0"

View File

@@ -1,21 +0,0 @@
Safe Path
====================
[![CI](https://github.com/magiclen/path-absolutize/actions/workflows/ci.yml/badge.svg)](https://github.com/magiclen/path-absolutize/actions/workflows/ci.yml)
A library to safely handle filesystem paths, typically for container runtimes.
There are often path related attacks, such as symlink based attacks, TOCTTOU attacks. The `safe-path` crate
provides several functions and utility structures to protect against path resolution related attacks.
## Support
**Operating Systems**:
- Linux
## Reference
- [`filepath-securejoin`](https://github.com/cyphar/filepath-securejoin): secure_join() written in Go.
- [CVE-2021-30465](https://github.com/advisories/GHSA-c3xm-pvg7-gh7r): symlink related TOCTOU flaw in `runC`.
## License
This code is licensed under [Apache-2.0](../../../LICENSE).

View File

@@ -1,65 +0,0 @@
// Copyright (c) 2022 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
//! A library to safely handle filesystem paths, typically for container runtimes.
//!
//! Linux [mount namespace](https://man7.org/linux/man-pages/man7/mount_namespaces.7.html)
//! provides isolation of the list of mounts seen by the processes in each
//! [namespace](https://man7.org/linux/man-pages/man7/namespaces.7.html) instance.
//! Thus, the processes in each of the mount namespace instances will see distinct single-directory
//! hierarchies.
//!
//! Containers are used to isolate workloads from the host system. Container on Linux systems
//! depends on the mount namespace to build an isolated root filesystem for each container,
//! thus protect the host and containers from each other. When creating containers, the container
//! runtime needs to setup filesystem mounts for container rootfs/volumes. Configuration for
//! mounts/paths may be indirectly controlled by end users through:
//! - container images
//! - Kubernetes pod specifications
//! - hook command line arguments
//!
//! These volume configuration information may be controlled by end users/malicious attackers,
//! so it must not be trusted by container runtimes. When the container runtime is preparing mount
//! namespace for a container, it must be very careful to validate user input configuration
//! information and ensure data out of the container rootfs directory won't be affected
//! by the container. There are several types of attacks related to container mount namespace:
//! - symlink based attack
//! - Time of check to time of use (TOCTTOU)
//!
//! This crate provides several mechanisms for container runtimes to safely handle filesystem paths
//! when preparing mount namespace for containers.
//! - [scoped_join()](crate::scoped_join()): safely join `unsafe_path` to `root`, and ensure
//! `unsafe_path` is scoped under `root`.
//! - [scoped_resolve()](crate::scoped_resolve()): resolve `unsafe_path` to a relative path,
//! rooted at and constrained by `root`.
//! - [struct PinnedPathBuf](crate::PinnedPathBuf): safe version of `PathBuf` to protect from
//! TOCTTOU style of attacks, which ensures:
//! - the value of [`PinnedPathBuf::as_path()`] never changes.
//! - the path returned by [`PinnedPathBuf::as_path()`] is always a symlink.
//! - the filesystem object referenced by the symlink [`PinnedPathBuf::as_path()`] never changes.
//! - the value of [`PinnedPathBuf::target()`] never changes.
//! - [struct ScopedDirBuilder](crate::ScopedDirBuilder): safe version of `DirBuilder` to protect
//! from symlink race and TOCTTOU style of attacks, which enhances security by:
//! - ensuring the new directories are created under a specified `root` directory.
//! - avoiding symlink race attacks during making directories.
//! - returning a [PinnedPathBuf] for the last level of directory, so it could be used for other
//! operations safely.
//!
//! The work is inspired by:
//! - [`filepath-securejoin`](https://github.com/cyphar/filepath-securejoin): secure_join() written
//! in Go.
//! - [CVE-2021-30465](https://github.com/advisories/GHSA-c3xm-pvg7-gh7r): symlink related TOCTOU
//! flaw in `runC`.
#![deny(missing_docs)]
mod pinned_path_buf;
pub use pinned_path_buf::PinnedPathBuf;
mod scoped_dir_builder;
pub use scoped_dir_builder::ScopedDirBuilder;
mod scoped_path_resolver;
pub use scoped_path_resolver::{scoped_join, scoped_resolve};

View File

@@ -1,444 +0,0 @@
// Copyright (c) 2022 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::ffi::{CString, OsStr};
use std::fs::{self, File, Metadata, OpenOptions};
use std::io::{Error, ErrorKind, Result};
use std::ops::Deref;
use std::os::unix::ffi::OsStrExt;
use std::os::unix::fs::OpenOptionsExt;
use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};
use std::path::{Component, Path, PathBuf};
use crate::scoped_join;
/// A safe version of [`PathBuf`] pinned to an underlying filesystem object to protect from
/// `TOCTTOU` style of attacks.
///
/// A [`PinnedPathBuf`] is a resolved path buffer pinned to an underlying filesystem object, which
/// guarantees:
/// - the value of [`PinnedPathBuf::as_path()`] never changes.
/// - the path returned by [`PinnedPathBuf::as_path()`] is always a symlink.
/// - the filesystem object referenced by the symlink [`PinnedPathBuf::as_path()`] never changes.
/// - the value of [`PinnedPathBuf::target()`] never changes.
///
/// Note:
/// - Though the filesystem object referenced by the symlink [`PinnedPathBuf::as_path()`] never
/// changes, the value of `fs::read_link(PinnedPathBuf::as_path())` may change due to filesystem
/// operations.
/// - The value of [`PinnedPathBuf::target()`] is a cached version of
/// `fs::read_link(PinnedPathBuf::as_path())` generated when creating the `PinnedPathBuf` object.
/// - It's a sign of possible attacks if `[PinnedPathBuf::target()]` doesn't match
/// `fs::read_link(PinnedPathBuf::as_path())`.
/// - Once the [`PinnedPathBuf`] object gets dropped, the [`Path`] returned by
/// [`PinnedPathBuf::as_path()`] becomes invalid.
///
/// With normal [`PathBuf`], there's a race window for attackers between time to validate a path and
/// time to use the path. An attacker may maliciously change filesystem object referenced by the
/// path by using symlinks to compose an attack.
///
/// The [`PinnedPathBuf`] is introduced to protect from such attacks, by using the
/// `/proc/self/fd/xxx` files on Linux. The `/proc/self/fd/xxx` file on Linux is a symlink to the
/// real target corresponding to the process's file descriptor `xxx`. And the target filesystem
/// object referenced by the symlink will be kept stable until the file descriptor has been closed.
/// Combined with `O_PATH`, a safe version of `PathBuf` could be built by:
/// - Generate a safe path from `root` and `path` by using [`crate::scoped_join()`].
/// - Open the safe path with O_PATH | O_CLOEXEC flags, say the fd number is `fd_num`.
/// - Read the symlink target of `/proc/self/fd/fd_num`.
/// - Compare the symlink target with the safe path, it's safe if these two paths equal.
/// - Use the proc file path as a safe version of [`PathBuf`].
/// - Close the `fd_num` when dropping the [`PinnedPathBuf`] object.
#[derive(Debug)]
pub struct PinnedPathBuf {
handle: File,
path: PathBuf,
target: PathBuf,
}
impl PinnedPathBuf {
/// Create a [`PinnedPathBuf`] object from `root` and `path`.
///
/// The `path` must be a subdirectory of `root`, otherwise error will be returned.
pub fn new<R: AsRef<Path>, U: AsRef<Path>>(root: R, path: U) -> Result<Self> {
let path = scoped_join(root, path)?;
Self::from_path(path)
}
/// Create a `PinnedPathBuf` from `path`.
///
/// If the resolved value of `path` doesn't equal to `path`, an error will be returned.
pub fn from_path<P: AsRef<Path>>(orig_path: P) -> Result<Self> {
let orig_path = orig_path.as_ref();
let handle = Self::open_by_path(orig_path)?;
Self::new_from_file(handle, orig_path)
}
/// Try to clone the [`PinnedPathBuf`] object.
pub fn try_clone(&self) -> Result<Self> {
let fd = unsafe { libc::dup(self.path_fd()) };
if fd < 0 {
Err(Error::last_os_error())
} else {
Ok(Self {
handle: unsafe { File::from_raw_fd(fd) },
path: Self::get_proc_path(fd),
target: self.target.clone(),
})
}
}
/// Return the underlying file descriptor representing the pinned path.
///
/// Following operations are supported by the returned `RawFd`:
/// - fchdir
/// - fstat/fstatfs
/// - openat/linkat/fchownat/fstatat/readlinkat/mkdirat/*at
/// - fcntl(F_GETFD, F_SETFD, F_GETFL)
pub fn path_fd(&self) -> RawFd {
self.handle.as_raw_fd()
}
/// Get the symlink path referring the target filesystem object.
pub fn as_path(&self) -> &Path {
self.path.as_path()
}
/// Get the cached real path of the target filesystem object.
///
/// The target path is cached version of `fs::read_link(PinnedPathBuf::as_path())` generated
/// when creating the `PinnedPathBuf` object. On the other hand, the value of
/// `fs::read_link(PinnedPathBuf::as_path())` may change due to underlying filesystem operations.
/// So it's a sign of possible attacks if `PinnedPathBuf::target()` does not match
/// `fs::read_link(PinnedPathBuf::as_path())`.
pub fn target(&self) -> &Path {
&self.target
}
/// Get [`Metadata`] about the path handle.
pub fn metadata(&self) -> Result<Metadata> {
self.handle.metadata()
}
/// Open a direct child of the filesystem objected referenced by the `PinnedPathBuf` object.
pub fn open_child(&self, path_comp: &OsStr) -> Result<Self> {
let name = Self::prepare_path_component(path_comp)?;
let oflags = libc::O_PATH | libc::O_CLOEXEC;
let res = unsafe { libc::openat(self.path_fd(), name.as_ptr(), oflags, 0) };
if res < 0 {
Err(Error::last_os_error())
} else {
let handle = unsafe { File::from_raw_fd(res) };
Self::new_from_file(handle, self.target.join(path_comp))
}
}
/// Create or open a child directory if current object is a directory.
pub fn mkdir(&self, path_comp: &OsStr, mode: libc::mode_t) -> Result<Self> {
let path_name = Self::prepare_path_component(path_comp)?;
let res = unsafe { libc::mkdirat(self.handle.as_raw_fd(), path_name.as_ptr(), mode) };
if res < 0 {
Err(Error::last_os_error())
} else {
self.open_child(path_comp)
}
}
/// Open a directory/file by path.
///
/// Obtain a file descriptor that can be used for two purposes:
/// - indicate a location in the filesystem tree
/// - perform operations that act purely at the file descriptor level
fn open_by_path<P: AsRef<Path>>(path: P) -> Result<File> {
// When O_PATH is specified in flags, flag bits other than O_CLOEXEC, O_DIRECTORY, and
// O_NOFOLLOW are ignored.
let o_flags = libc::O_PATH | libc::O_CLOEXEC;
OpenOptions::new()
.read(true)
.custom_flags(o_flags)
.open(path.as_ref())
}
fn get_proc_path<F: AsRawFd>(file: F) -> PathBuf {
PathBuf::from(format!("/proc/self/fd/{}", file.as_raw_fd()))
}
fn new_from_file<P: AsRef<Path>>(handle: File, orig_path: P) -> Result<Self> {
let path = Self::get_proc_path(handle.as_raw_fd());
let link_path = fs::read_link(path.as_path())?;
if link_path != orig_path.as_ref() {
Err(Error::new(
ErrorKind::Other,
format!(
"Path changed from {} to {} on open, possible attack",
orig_path.as_ref().display(),
link_path.display()
),
))
} else {
Ok(PinnedPathBuf {
handle,
path,
target: link_path,
})
}
}
#[inline]
fn prepare_path_component(path_comp: &OsStr) -> Result<CString> {
let path = Path::new(path_comp);
let mut comps = path.components();
let name = comps.next();
if !matches!(name, Some(Component::Normal(_))) || comps.next().is_some() {
return Err(Error::new(
ErrorKind::Other,
format!("Path component {} is invalid", path_comp.to_string_lossy()),
));
}
let name = name.unwrap();
if name.as_os_str() != path_comp {
return Err(Error::new(
ErrorKind::Other,
format!("Path component {} is invalid", path_comp.to_string_lossy()),
));
}
CString::new(path_comp.as_bytes()).map_err(|_e| {
Error::new(
ErrorKind::Other,
format!("Path component {} is invalid", path_comp.to_string_lossy()),
)
})
}
}
impl Deref for PinnedPathBuf {
type Target = PathBuf;
fn deref(&self) -> &Self::Target {
&self.path
}
}
impl AsRef<Path> for PinnedPathBuf {
fn as_ref(&self) -> &Path {
self.path.as_path()
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::ffi::OsString;
use std::fs::DirBuilder;
use std::io::Write;
use std::os::unix::fs::{symlink, MetadataExt};
use std::sync::{Arc, Barrier};
use std::thread;
#[test]
fn test_pinned_path_buf() {
// Create a root directory, which itself contains symlinks.
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
DirBuilder::new()
.create(rootfs_dir.path().join("b"))
.unwrap();
symlink(rootfs_dir.path().join("b"), rootfs_dir.path().join("a")).unwrap();
let rootfs_path = &rootfs_dir.path().join("a");
// Create a file and a symlink to it.
fs::create_dir(rootfs_path.join("symlink_dir")).unwrap();
symlink("/endpoint", rootfs_path.join("symlink_dir/endpoint")).unwrap();
fs::write(rootfs_path.join("endpoint"), "test").unwrap();
// Pin the target and validate the path/content.
let path = PinnedPathBuf::new(rootfs_path.to_path_buf(), "symlink_dir/endpoint").unwrap();
assert!(!path.is_dir());
let path_ref = path.deref();
let target = fs::read_link(path_ref).unwrap();
assert_eq!(target, rootfs_path.join("endpoint").canonicalize().unwrap());
let content = fs::read_to_string(&path).unwrap();
assert_eq!(&content, "test");
// Remove the target file and validate that we could still read data from the pinned path.
fs::remove_file(&target).unwrap();
fs::read_to_string(&target).unwrap_err();
let content = fs::read_to_string(&path).unwrap();
assert_eq!(&content, "test");
}
#[test]
fn test_pinned_path_buf_race() {
let root_dir = tempfile::tempdir().expect("failed to create tmpdir");
let root_path = root_dir.path();
let barrier = Arc::new(Barrier::new(2));
fs::write(root_path.join("a"), b"a").unwrap();
fs::write(root_path.join("b"), b"b").unwrap();
fs::write(root_path.join("c"), b"c").unwrap();
symlink("a", root_path.join("s")).unwrap();
let root_path2 = root_path.to_path_buf();
let barrier2 = barrier.clone();
let thread = thread::spawn(move || {
// step 1
barrier2.wait();
fs::remove_file(root_path2.join("a")).unwrap();
symlink("b", root_path2.join("a")).unwrap();
barrier2.wait();
// step 2
barrier2.wait();
fs::remove_file(root_path2.join("b")).unwrap();
symlink("c", root_path2.join("b")).unwrap();
barrier2.wait();
});
let path = scoped_join(&root_path, "s").unwrap();
let data = fs::read_to_string(&path).unwrap();
assert_eq!(&data, "a");
assert!(path.is_file());
barrier.wait();
barrier.wait();
// Verify the target has been redirected.
let data = fs::read_to_string(&path).unwrap();
assert_eq!(&data, "b");
PinnedPathBuf::from_path(&path).unwrap_err();
let pinned_path = PinnedPathBuf::new(&root_path, "s").unwrap();
let data = fs::read_to_string(&pinned_path).unwrap();
assert_eq!(&data, "b");
// step2
barrier.wait();
barrier.wait();
// Verify it still points to the old target.
let data = fs::read_to_string(&pinned_path).unwrap();
assert_eq!(&data, "b");
thread.join().unwrap();
}
#[test]
fn test_new_pinned_path_buf() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
let path = PinnedPathBuf::from_path(rootfs_path).unwrap();
let _ = OpenOptions::new().read(true).open(&path).unwrap();
}
#[test]
fn test_pinned_path_try_clone() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
let path = PinnedPathBuf::from_path(rootfs_path).unwrap();
let path2 = path.try_clone().unwrap();
assert_ne!(path.as_path(), path2.as_path());
}
#[test]
fn test_new_pinned_path_buf_from_nonexist_file() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
PinnedPathBuf::new(rootfs_path, "does_not_exist").unwrap_err();
}
#[test]
fn test_new_pinned_path_buf_without_read_perm() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
let path = rootfs_path.join("write_only_file");
let mut file = OpenOptions::new()
.read(false)
.write(true)
.create(true)
.mode(0o200)
.open(&path)
.unwrap();
file.write_all(&[0xa5u8]).unwrap();
let md = fs::metadata(&path).unwrap();
let umask = unsafe { libc::umask(0022) };
unsafe { libc::umask(umask) };
assert_eq!(md.mode() & 0o700, 0o200 & !umask);
PinnedPathBuf::from_path(&path).unwrap();
}
#[test]
fn test_pinned_path_buf_path_fd() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
let path = rootfs_path.join("write_only_file");
let mut file = OpenOptions::new()
.read(false)
.write(true)
.create(true)
.mode(0o200)
.open(&path)
.unwrap();
file.write_all(&[0xa5u8]).unwrap();
let handle = PinnedPathBuf::from_path(&path).unwrap();
// Check that `fstat()` etc works with the fd returned by `path_fd()`.
let fd = handle.path_fd();
let mut stat: libc::stat = unsafe { std::mem::zeroed() };
let res = unsafe { libc::fstat(fd, &mut stat as *mut _) };
assert_eq!(res, 0);
}
#[test]
fn test_pinned_path_buf_open_child() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
let path = PinnedPathBuf::from_path(rootfs_path).unwrap();
fs::write(path.join("child"), "test").unwrap();
let path = path.open_child(OsStr::new("child")).unwrap();
let content = fs::read_to_string(&path).unwrap();
assert_eq!(&content, "test");
path.open_child(&OsString::from("__does_not_exist__"))
.unwrap_err();
path.open_child(&OsString::from("test/a")).unwrap_err();
}
#[test]
fn test_prepare_path_component() {
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from(".")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("..")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("/")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("//")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/b")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("./b")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/.")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/..")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/./")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/../")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/./a")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a/../a")).is_err());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a")).is_ok());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a.b")).is_ok());
assert!(PinnedPathBuf::prepare_path_component(&OsString::from("a..b")).is_ok());
}
#[test]
fn test_target_fs_object_changed() {
let rootfs_dir = tempfile::tempdir().expect("failed to create tmpdir");
let rootfs_path = rootfs_dir.path();
let file = rootfs_path.join("child");
fs::write(&file, "test").unwrap();
let path = PinnedPathBuf::from_path(&file).unwrap();
let path3 = fs::read_link(path.as_path()).unwrap();
assert_eq!(&path3, path.target());
fs::rename(file, rootfs_path.join("child2")).unwrap();
let path4 = fs::read_link(path.as_path()).unwrap();
assert_ne!(&path4, path.target());
fs::remove_file(rootfs_path.join("child2")).unwrap();
let path5 = fs::read_link(path.as_path()).unwrap();
assert_ne!(&path4, &path5);
}
}

View File

@@ -1,294 +0,0 @@
// Copyright (c) 2022 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::io::{Error, ErrorKind, Result};
use std::path::Path;
use crate::{scoped_join, scoped_resolve, PinnedPathBuf};
const DIRECTORY_MODE_DEFAULT: u32 = 0o777;
const DIRECTORY_MODE_MASK: u32 = 0o777;
/// Safe version of `DirBuilder` to protect from TOCTOU style of attacks.
///
/// The `ScopedDirBuilder` is a counterpart for `DirBuilder`, with safety enhancements of:
/// - ensuring the new directories are created under a specified `root` directory.
/// - ensuring all created directories are still scoped under `root` even under symlink based
/// attacks.
/// - returning a [PinnedPathBuf] for the last level of directory, so it could be used for other
/// operations safely.
#[derive(Debug)]
pub struct ScopedDirBuilder {
root: PinnedPathBuf,
mode: u32,
recursive: bool,
}
impl ScopedDirBuilder {
/// Create a new instance of `ScopedDirBuilder` with with default mode/security settings.
pub fn new<P: AsRef<Path>>(root: P) -> Result<Self> {
let root = root.as_ref().canonicalize()?;
let root = PinnedPathBuf::from_path(root)?;
if !root.metadata()?.is_dir() {
return Err(Error::new(
ErrorKind::Other,
format!("Invalid root path: {}", root.display()),
));
}
Ok(ScopedDirBuilder {
root,
mode: DIRECTORY_MODE_DEFAULT,
recursive: false,
})
}
/// Indicates that directories should be created recursively, creating all parent directories.
///
/// Parents that do not exist are created with the same security and permissions settings.
pub fn recursive(&mut self, recursive: bool) -> &mut Self {
self.recursive = recursive;
self
}
/// Sets the mode to create new directories with. This option defaults to 0o755.
pub fn mode(&mut self, mode: u32) -> &mut Self {
self.mode = mode & DIRECTORY_MODE_MASK;
self
}
/// Creates the specified directory with the options configured in this builder.
///
/// This is a helper to create subdirectory with an absolute path, without stripping off
/// `self.root`. So error will be returned if path does start with `self.root`.
/// It is considered an error if the directory already exists unless recursive mode is enabled.
pub fn create_with_unscoped_path<P: AsRef<Path>>(&self, path: P) -> Result<PinnedPathBuf> {
if !path.as_ref().is_absolute() {
return Err(Error::new(
ErrorKind::Other,
format!(
"Expected absolute directory path: {}",
path.as_ref().display()
),
));
}
// Partially canonicalize `path` so we can strip the `root` part.
let scoped_path = scoped_join("/", path)?;
let stripped_path = scoped_path.strip_prefix(self.root.target()).map_err(|_| {
Error::new(
ErrorKind::Other,
format!(
"Path {} is not under {}",
scoped_path.display(),
self.root.target().display()
),
)
})?;
self.do_mkdir(&stripped_path)
}
/// Creates sub-directory with the options configured in this builder.
///
/// It is considered an error if the directory already exists unless recursive mode is enabled.
pub fn create<P: AsRef<Path>>(&self, path: P) -> Result<PinnedPathBuf> {
let path = scoped_resolve(&self.root, path)?;
self.do_mkdir(&path)
}
fn do_mkdir(&self, path: &Path) -> Result<PinnedPathBuf> {
assert!(path.is_relative());
if path.file_name().is_none() {
if !self.recursive {
return Err(Error::new(
ErrorKind::AlreadyExists,
"directory already exists",
));
} else {
return self.root.try_clone();
}
}
// Safe because `path` have at least one level.
let levels = path.iter().count() - 1;
let mut dir = self.root.try_clone()?;
for (idx, comp) in path.iter().enumerate() {
match dir.open_child(comp) {
Ok(v) => {
if !v.metadata()?.is_dir() {
return Err(Error::new(
ErrorKind::Other,
format!("Path {} is not a directory", v.display()),
));
} else if !self.recursive && idx == levels {
return Err(Error::new(
ErrorKind::AlreadyExists,
"directory already exists",
));
}
dir = v;
}
Err(_e) => {
if !self.recursive && idx != levels {
return Err(Error::new(
ErrorKind::NotFound,
format!("parent directory does not exist"),
));
}
dir = dir.mkdir(comp, self.mode)?;
}
}
}
Ok(dir)
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs;
use std::fs::DirBuilder;
use std::os::unix::fs::{symlink, MetadataExt};
use tempfile::tempdir;
#[test]
fn test_scoped_dir_builder() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
DirBuilder::new()
.create(rootfs_dir.path().join("b"))
.unwrap();
symlink(rootfs_dir.path().join("b"), rootfs_dir.path().join("a")).unwrap();
let rootfs_path = &rootfs_dir.path().join("a");
// root directory doesn't exist
ScopedDirBuilder::new(rootfs_path.join("__does_not_exist__")).unwrap_err();
ScopedDirBuilder::new("__does_not_exist__").unwrap_err();
// root is a file
fs::write(rootfs_path.join("txt"), "test").unwrap();
ScopedDirBuilder::new(rootfs_path.join("txt")).unwrap_err();
let mut builder = ScopedDirBuilder::new(&rootfs_path).unwrap();
// file with the same name already exists.
builder
.create_with_unscoped_path(rootfs_path.join("txt"))
.unwrap_err();
// parent is a file
builder.create("/txt/a").unwrap_err();
// Not starting with root
builder.create_with_unscoped_path("/txt/a").unwrap_err();
// creating "." without recursive mode should fail
builder
.create_with_unscoped_path(rootfs_path.join("."))
.unwrap_err();
// parent doesn't exist
builder
.create_with_unscoped_path(rootfs_path.join("a/b"))
.unwrap_err();
builder.create("a/b/c").unwrap_err();
let path = builder.create("a").unwrap();
assert!(rootfs_path.join("a").is_dir());
assert_eq!(path.target(), rootfs_path.join("a").canonicalize().unwrap());
// Creating an existing directory without recursive mode should fail.
builder
.create_with_unscoped_path(rootfs_path.join("a"))
.unwrap_err();
// Creating an existing directory with recursive mode should succeed.
builder.recursive(true);
let path = builder
.create_with_unscoped_path(rootfs_path.join("a"))
.unwrap();
assert_eq!(path.target(), rootfs_path.join("a").canonicalize().unwrap());
let path = builder.create(".").unwrap();
assert_eq!(path.target(), rootfs_path.canonicalize().unwrap());
let umask = unsafe { libc::umask(0022) };
unsafe { libc::umask(umask) };
builder.mode(0o740);
let path = builder.create("a/b/c/d").unwrap();
assert_eq!(
path.target(),
rootfs_path.join("a/b/c/d").canonicalize().unwrap()
);
assert!(rootfs_path.join("a/b/c/d").is_dir());
assert_eq!(
rootfs_path.join("a").metadata().unwrap().mode() & 0o777,
DIRECTORY_MODE_DEFAULT & !umask,
);
assert_eq!(
rootfs_path.join("a/b").metadata().unwrap().mode() & 0o777,
0o740 & !umask
);
assert_eq!(
rootfs_path.join("a/b/c").metadata().unwrap().mode() & 0o777,
0o740 & !umask
);
assert_eq!(
rootfs_path.join("a/b/c/d").metadata().unwrap().mode() & 0o777,
0o740 & !umask
);
// Creating should fail if some components are not directory.
builder.create("txt/e/f").unwrap_err();
fs::write(rootfs_path.join("a/b/txt"), "test").unwrap();
builder.create("a/b/txt/h/i").unwrap_err();
}
#[test]
fn test_create_root() {
let mut builder = ScopedDirBuilder::new("/").unwrap();
builder.recursive(true);
builder.create("/").unwrap();
builder.create(".").unwrap();
builder.create("..").unwrap();
builder.create("../../.").unwrap();
builder.create("").unwrap();
builder.create_with_unscoped_path("/").unwrap();
builder.create_with_unscoped_path("/..").unwrap();
builder.create_with_unscoped_path("/../.").unwrap();
}
#[test]
fn test_create_with_absolute_path() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
DirBuilder::new()
.create(rootfs_dir.path().join("b"))
.unwrap();
symlink(rootfs_dir.path().join("b"), rootfs_dir.path().join("a")).unwrap();
let rootfs_path = &rootfs_dir.path().join("a");
let mut builder = ScopedDirBuilder::new(&rootfs_path).unwrap();
builder.create_with_unscoped_path("/").unwrap_err();
builder
.create_with_unscoped_path(rootfs_path.join("../__xxxx___xxx__"))
.unwrap_err();
builder
.create_with_unscoped_path(rootfs_path.join("c/d"))
.unwrap_err();
// Return `AlreadyExist` when recursive is false
builder.create_with_unscoped_path(&rootfs_path).unwrap_err();
builder
.create_with_unscoped_path(rootfs_path.join("."))
.unwrap_err();
builder.recursive(true);
builder.create_with_unscoped_path(&rootfs_path).unwrap();
builder
.create_with_unscoped_path(rootfs_path.join("."))
.unwrap();
builder
.create_with_unscoped_path(rootfs_path.join("c/d"))
.unwrap();
}
}

View File

@@ -1,415 +0,0 @@
// Copyright (c) 2022 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::io::{Error, ErrorKind, Result};
use std::path::{Component, Path, PathBuf};
// Follow the same configuration as
// [secure_join](https://github.com/cyphar/filepath-securejoin/blob/master/join.go#L51)
const MAX_SYMLINK_DEPTH: u32 = 255;
fn do_scoped_resolve<R: AsRef<Path>, U: AsRef<Path>>(
root: R,
unsafe_path: U,
) -> Result<(PathBuf, PathBuf)> {
let root = root.as_ref().canonicalize()?;
let mut nlinks = 0u32;
let mut curr_path = unsafe_path.as_ref().to_path_buf();
'restart: loop {
let mut subpath = PathBuf::new();
let mut iter = curr_path.components();
'next_comp: while let Some(comp) = iter.next() {
match comp {
// Linux paths don't have prefixes.
Component::Prefix(_) => {
return Err(Error::new(
ErrorKind::Other,
format!("Invalid path prefix in: {}", unsafe_path.as_ref().display()),
));
}
// `RootDir` should always be the first component, and Path::components() ensures
// that.
Component::RootDir | Component::CurDir => {
continue 'next_comp;
}
Component::ParentDir => {
subpath.pop();
}
Component::Normal(n) => {
let path = root.join(&subpath).join(n);
if let Ok(v) = path.read_link() {
nlinks += 1;
if nlinks > MAX_SYMLINK_DEPTH {
return Err(Error::new(
ErrorKind::Other,
format!(
"Too many levels of symlinks: {}",
unsafe_path.as_ref().display()
),
));
}
curr_path = if v.is_absolute() {
v.join(iter.as_path())
} else {
subpath.join(v).join(iter.as_path())
};
continue 'restart;
} else {
subpath.push(n);
}
}
}
}
return Ok((root, subpath));
}
}
/// Resolve `unsafe_path` to a relative path, rooted at and constrained by `root`.
///
/// The `scoped_resolve()` function assumes `root` exists and is an absolute path. It processes
/// each path component in `unsafe_path` as below:
/// - assume it's not a symlink and output if the component doesn't exist yet.
/// - ignore if it's "/" or ".".
/// - go to parent directory but constrained by `root` if it's "..".
/// - recursively resolve to the real path if it's a symlink. All symlink resolutions will be
/// constrained by `root`.
/// - otherwise output the path component.
///
/// # Arguments
/// - `root`: the absolute path to constrain the symlink resolution.
/// - `unsafe_path`: the path to resolve.
///
/// Note that the guarantees provided by this function only apply if the path components in the
/// returned PathBuf are not modified (in other words are not replaced with symlinks on the
/// filesystem) after this function has returned. You may use [crate::PinnedPathBuf] to protect
/// from such TOCTOU attacks.
pub fn scoped_resolve<R: AsRef<Path>, U: AsRef<Path>>(root: R, unsafe_path: U) -> Result<PathBuf> {
do_scoped_resolve(root, unsafe_path).map(|(_root, path)| path)
}
/// Safely join `unsafe_path` to `root`, and ensure `unsafe_path` is scoped under `root`.
///
/// The `scoped_join()` function assumes `root` exists and is an absolute path. It safely joins the
/// two given paths and ensures:
/// - The returned path is guaranteed to be scoped inside `root`.
/// - Any symbolic links in the path are evaluated with the given `root` treated as the root of the
/// filesystem, similar to a chroot.
///
/// It's modelled after [secure_join](https://github.com/cyphar/filepath-securejoin), but only
/// for Linux systems.
///
/// # Arguments
/// - `root`: the absolute path to scope the symlink evaluation.
/// - `unsafe_path`: the path to evaluated and joint with `root`. It is unsafe since it may try to
/// escape from the `root` by using "../" or symlinks.
///
/// # Security
/// On success return, the `scoped_join()` function guarantees that:
/// - The resulting PathBuf must be a child path of `root` and will not contain any symlink path
/// components (they will all get expanded).
/// - When expanding symlinks, all symlink path components must be resolved relative to the provided
/// `root`. In particular, this can be considered a userspace implementation of how chroot(2)
/// operates on file paths.
/// - Non-existent path components are unaffected.
///
/// Note that the guarantees provided by this function only apply if the path components in the
/// returned string are not modified (in other words are not replaced with symlinks on the
/// filesystem) after this function has returned. You may use [crate::PinnedPathBuf] to protect
/// from such TOCTTOU attacks.
pub fn scoped_join<R: AsRef<Path>, U: AsRef<Path>>(root: R, unsafe_path: U) -> Result<PathBuf> {
do_scoped_resolve(root, unsafe_path).map(|(root, path)| root.join(path))
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs::DirBuilder;
use std::os::unix::fs;
use tempfile::tempdir;
#[allow(dead_code)]
#[derive(Debug)]
struct TestData<'a> {
name: &'a str,
rootfs: &'a Path,
unsafe_path: &'a str,
result: &'a str,
}
fn exec_tests(tests: &[TestData]) {
for (i, t) in tests.iter().enumerate() {
// Create a string containing details of the test
let msg = format!("test[{}]: {:?}", i, t);
let result = scoped_resolve(t.rootfs, t.unsafe_path).unwrap();
let msg = format!("{}, result: {:?}", msg, result);
// Perform the checks
assert_eq!(&result, Path::new(t.result), "{}", msg);
}
}
#[test]
fn test_scoped_resolve() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
DirBuilder::new()
.create(rootfs_dir.path().join("b"))
.unwrap();
fs::symlink(rootfs_dir.path().join("b"), rootfs_dir.path().join("a")).unwrap();
let rootfs_path = &rootfs_dir.path().join("a");
let tests = [
TestData {
name: "normal path",
rootfs: rootfs_path,
unsafe_path: "a/b/c",
result: "a/b/c",
},
TestData {
name: "path with .. at beginning",
rootfs: rootfs_path,
unsafe_path: "../../../a/b/c",
result: "a/b/c",
},
TestData {
name: "path with complex .. pattern",
rootfs: rootfs_path,
unsafe_path: "../../../a/../../b/../../c",
result: "c",
},
TestData {
name: "path with .. in middle",
rootfs: rootfs_path,
unsafe_path: "/usr/bin/../../bin/ls",
result: "bin/ls",
},
TestData {
name: "path with . and ..",
rootfs: rootfs_path,
unsafe_path: "/usr/./bin/../../bin/./ls",
result: "bin/ls",
},
TestData {
name: "path with . at end",
rootfs: rootfs_path,
unsafe_path: "/usr/./bin/../../bin/./ls/.",
result: "bin/ls",
},
TestData {
name: "path try to escape by ..",
rootfs: rootfs_path,
unsafe_path: "/usr/./bin/../../../../bin/./ls/../ls",
result: "bin/ls",
},
TestData {
name: "path with .. at the end",
rootfs: rootfs_path,
unsafe_path: "/usr/./bin/../../bin/./ls/..",
result: "bin",
},
TestData {
name: "path ..",
rootfs: rootfs_path,
unsafe_path: "..",
result: "",
},
TestData {
name: "path .",
rootfs: rootfs_path,
unsafe_path: ".",
result: "",
},
TestData {
name: "path /",
rootfs: rootfs_path,
unsafe_path: "/",
result: "",
},
TestData {
name: "empty path",
rootfs: rootfs_path,
unsafe_path: "",
result: "",
},
];
exec_tests(&tests);
}
#[test]
fn test_scoped_resolve_invalid() {
scoped_resolve("./root_is_not_absolute_path", ".").unwrap_err();
scoped_resolve("C:", ".").unwrap_err();
scoped_resolve(r#"\\server\test"#, ".").unwrap_err();
scoped_resolve(r#"http://localhost/test"#, ".").unwrap_err();
// Chinese Unicode characters
scoped_resolve(r#"您好"#, ".").unwrap_err();
}
#[test]
fn test_scoped_resolve_symlink() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
let rootfs_path = &rootfs_dir.path();
std::fs::create_dir(rootfs_path.join("symlink_dir")).unwrap();
fs::symlink("../../../", rootfs_path.join("1")).unwrap();
let tests = [TestData {
name: "relative symlink beyond root",
rootfs: rootfs_path,
unsafe_path: "1",
result: "",
}];
exec_tests(&tests);
fs::symlink("/dddd", rootfs_path.join("2")).unwrap();
let tests = [TestData {
name: "abs symlink pointing to non-exist directory",
rootfs: rootfs_path,
unsafe_path: "2",
result: "dddd",
}];
exec_tests(&tests);
fs::symlink("/", rootfs_path.join("3")).unwrap();
let tests = [TestData {
name: "abs symlink pointing to /",
rootfs: rootfs_path,
unsafe_path: "3",
result: "",
}];
exec_tests(&tests);
fs::symlink("usr/bin/../bin/ls", rootfs_path.join("4")).unwrap();
let tests = [TestData {
name: "symlink with one ..",
rootfs: rootfs_path,
unsafe_path: "4",
result: "usr/bin/ls",
}];
exec_tests(&tests);
fs::symlink("usr/bin/../../bin/ls", rootfs_path.join("5")).unwrap();
let tests = [TestData {
name: "symlink with two ..",
rootfs: rootfs_path,
unsafe_path: "5",
result: "bin/ls",
}];
exec_tests(&tests);
fs::symlink(
"../usr/bin/../../../bin/ls",
rootfs_path.join("symlink_dir/6"),
)
.unwrap();
let tests = [TestData {
name: "symlink try to escape",
rootfs: rootfs_path,
unsafe_path: "symlink_dir/6",
result: "bin/ls",
}];
exec_tests(&tests);
// Detect symlink loop.
fs::symlink("/endpoint_b", rootfs_path.join("endpoint_a")).unwrap();
fs::symlink("/endpoint_a", rootfs_path.join("endpoint_b")).unwrap();
scoped_resolve(rootfs_path, "endpoint_a").unwrap_err();
}
#[test]
fn test_scoped_join() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
let rootfs_path = &rootfs_dir.path();
assert_eq!(
scoped_join(&rootfs_path, "a").unwrap(),
rootfs_path.join("a")
);
assert_eq!(
scoped_join(&rootfs_path, "./a").unwrap(),
rootfs_path.join("a")
);
assert_eq!(
scoped_join(&rootfs_path, "././a").unwrap(),
rootfs_path.join("a")
);
assert_eq!(
scoped_join(&rootfs_path, "c/d/../../a").unwrap(),
rootfs_path.join("a")
);
assert_eq!(
scoped_join(&rootfs_path, "c/d/../../../.././a").unwrap(),
rootfs_path.join("a")
);
assert_eq!(
scoped_join(&rootfs_path, "../../a").unwrap(),
rootfs_path.join("a")
);
assert_eq!(
scoped_join(&rootfs_path, "./../a").unwrap(),
rootfs_path.join("a")
);
}
#[test]
fn test_scoped_join_symlink() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
let rootfs_path = &rootfs_dir.path();
DirBuilder::new()
.recursive(true)
.create(rootfs_dir.path().join("b/c"))
.unwrap();
fs::symlink("b/c", rootfs_dir.path().join("a")).unwrap();
let target = rootfs_path.join("b/c");
assert_eq!(scoped_join(&rootfs_path, "a").unwrap(), target);
assert_eq!(scoped_join(&rootfs_path, "./a").unwrap(), target);
assert_eq!(scoped_join(&rootfs_path, "././a").unwrap(), target);
assert_eq!(scoped_join(&rootfs_path, "b/c/../../a").unwrap(), target);
assert_eq!(
scoped_join(&rootfs_path, "b/c/../../../.././a").unwrap(),
target
);
assert_eq!(scoped_join(&rootfs_path, "../../a").unwrap(), target);
assert_eq!(scoped_join(&rootfs_path, "./../a").unwrap(), target);
assert_eq!(scoped_join(&rootfs_path, "a/../../../a").unwrap(), target);
assert_eq!(scoped_join(&rootfs_path, "a/../../../b/c").unwrap(), target);
}
#[test]
fn test_scoped_join_symlink_loop() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
let rootfs_path = &rootfs_dir.path();
fs::symlink("/endpoint_b", rootfs_path.join("endpoint_a")).unwrap();
fs::symlink("/endpoint_a", rootfs_path.join("endpoint_b")).unwrap();
scoped_join(rootfs_path, "endpoint_a").unwrap_err();
}
#[test]
fn test_scoped_join_unicode_character() {
// create temporary directory to emulate container rootfs with symlink
let rootfs_dir = tempdir().expect("failed to create tmpdir");
let rootfs_path = &rootfs_dir.path().canonicalize().unwrap();
let path = scoped_join(rootfs_path, "您好").unwrap();
assert_eq!(path, rootfs_path.join("您好"));
let path = scoped_join(rootfs_path, "../../../您好").unwrap();
assert_eq!(path, rootfs_path.join("您好"));
let path = scoped_join(rootfs_path, "。。/您好").unwrap();
assert_eq!(path, rootfs_path.join("。。/您好"));
let path = scoped_join(rootfs_path, "您好/../../test").unwrap();
assert_eq!(path, rootfs_path.join("test"));
}
}

View File

@@ -592,7 +592,7 @@ generate-config: $(CONFIGS)
test: hook go-test
hook:
make -C pkg/katautils/mockhook
make -C virtcontainers hook
go-test: $(GENERATED_FILES)
go clean -testcache

View File

@@ -10,7 +10,6 @@ This repository contains the following components:
|-|-|
| `containerd-shim-kata-v2` | The [shimv2 runtime](../../docs/design/architecture/README.md#runtime) |
| `kata-runtime` | [utility program](../../docs/design/architecture/README.md#utility-program) |
| `kata-monitor` | [metrics collector daemon](cmd/kata-monitor/README.md) |
For details of the other Kata Containers repositories, see the
[repository summary](https://github.com/kata-containers/kata-containers).
@@ -159,7 +158,7 @@ See [the community repository](https://github.com/kata-containers/community).
### Contact
See [how to reach the community](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md#contact).
See [how to reach the community](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#contact).
## Further information

View File

@@ -1,68 +0,0 @@
# Kata monitor
## Overview
`kata-monitor` is a daemon able to collect and expose metrics related to all the Kata Containers workloads running on the same host.
Once started, it detects all the running Kata Containers runtimes (`containerd-shim-kata-v2`) in the system and exposes few http endpoints to allow the retrieval of the available data.
The main endpoint is the `/metrics` one which aggregates metrics from all the kata workloads.
Available metrics include:
* Kata runtime metrics
* Kata agent metrics
* Kata guest OS metrics
* Hypervisor metrics
* Firecracker metrics
* Kata monitor metrics
All the provided metrics are in Prometheus format. While `kata-monitor` can be used as a standalone daemon on any host running Kata Containers workloads and can be used for retrieving profiling data from the running Kata runtimes, its main expected usage is to be deployed as a DaemonSet on a Kubernetes cluster: there Prometheus should scrape the metrics from the kata-monitor endpoints.
For more information on the Kata Containers metrics architecture and a detailed list of the available metrics provided by Kata monitor check the [Kata 2.0 Metrics Design](../../../../docs/design/kata-2-0-metrics.md) document.
## Usage
Each `kata-monitor` instance detects and monitors the Kata Container workloads running on the same node.
### Kata monitor arguments
The `kata-monitor` binary accepts the following arguments:
* `--listen-address` _IP:PORT_
* `--runtime-enpoint` _PATH_TO_THE_CONTAINER_MANAGER_CRI_INTERFACE_
* `--log-level` _[ trace | debug | info | warn | error | fatal | panic ]_
The **listen-address** specifies the IP and TCP port where the kata-monitor HTTP endpoints will be exposed. It defaults to `127.0.0.1:8090`.
The **runtime-endpoint** is the CRI of a CRI compliant container manager: it will be used to retrieve the CRI `PodSandboxMetadata` (`uid`, `name` and `namespace`) which will be attached to the Kata metrics through the labels `cri_uid`, `cri_name` and `cri_namespace`. It defaults to the containerd socket: `/run/containerd/containerd.sock`.
The **log-level** allows the chose how verbose the logs should be. The default is `info`.
### Kata monitor HTTP endpoints
`kata-monitor` exposes the following endpoints:
* `/metrics` : get Kata sandboxes metrics.
* `/sandboxes` : list all the Kata sandboxes running on the host.
* `/agent-url` : Get the agent URL of a Kata sandbox.
* `/debug/vars` : Internal data of the Kata runtime shim.
* `/debug/pprof/` : Golang profiling data of the Kata runtime shim: index page.
* `/debug/pprof/cmdline` : Golang profiling data of the Kata runtime shim: `cmdline` endpoint.
* `/debug/pprof/profile` : Golang profiling data of the Kata runtime shim: `profile` endpoint (CPU profiling).
* `/debug/pprof/symbol` : Golang profiling data of the Kata runtime shim: `symbol` endpoint.
* `/debug/pprof/trace` : Golang profiling data of the Kata runtime shim: `trace` endpoint.
**NOTE: The debug endpoints are available only if the [Kata Containers configuration file](https://github.com/kata-containers/kata-containers/blob/9d5b03a1b70bbd175237ec4b9f821d6ccee0a1f6/src/runtime/config/configuration-qemu.toml.in#L590-L592) includes** `enable_pprof = true` **in the** `[runtime]` **section**.
The `/sandboxes` endpoint lists the _sandbox ID_ of all the detected Kata runtimes. If accessed via a web browser, it provides html links to the endpoints available for each sandbox.
In order to retrieve data for a specific Kata workload, the _sandbox ID_ should be passed in the query string using the _sandbox_ key. The `/agent-url`, and all the `/debug/`* endpoints require `sandbox_id` to be specified in the query string.
<br>
#### Examples
Retrieve the IDs of the available sandboxes:
```bash
$ curl 127.0.0.1:8090/sandboxes
```
output:
```
6fcf0a90b01e90d8747177aa466c3462d02e02a878bc393649df83d4c314af0c
df96b24bd49ec437c872c1a758edc084121d607ce1242ff5d2263a0e1b693343
```
Retrieve the `agent-url` of the sandbox with ID _df96b24bd49ec437c872c1a758edc084121d607ce1242ff5d2263a0e1b693343_:
```bash
$ curl 127.0.0.1:8090/agent-url?sandbox=df96b24bd49ec437c872c1a758edc084121d607ce1242ff5d2263a0e1b693343
```
output:
```
vsock://830455376:1024
```

View File

@@ -175,15 +175,6 @@ func main() {
}
func indexPage(w http.ResponseWriter, r *http.Request) {
htmlResponse := kataMonitor.IfReturnHTMLResponse(w, r)
if htmlResponse {
indexPageHTML(w, r)
} else {
indexPageText(w, r)
}
}
func indexPageText(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("Available HTTP endpoints:\n"))
spacing := 0
@@ -193,35 +184,13 @@ func indexPageText(w http.ResponseWriter, r *http.Request) {
}
}
spacing = spacing + 3
formatter := fmt.Sprintf("%%-%ds: %%s\n", spacing)
formattedString := fmt.Sprintf("%%-%ds: %%s\n", spacing)
for _, endpoint := range endpoints {
w.Write([]byte(fmt.Sprintf(formatter, endpoint.path, endpoint.desc)))
w.Write([]byte(fmt.Sprintf(formattedString, endpoint.path, endpoint.desc)))
}
}
func indexPageHTML(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("<h1>Available HTTP endpoints:</h1>\n"))
var formattedString string
needLinkPaths := []string{"/metrics", "/sandboxes"}
w.Write([]byte("<ul>"))
for _, endpoint := range endpoints {
formattedString = fmt.Sprintf("<b>%s</b>: %s\n", endpoint.path, endpoint.desc)
for _, linkPath := range needLinkPaths {
if linkPath == endpoint.path {
formattedString = fmt.Sprintf("<b><a href='%s'>%s</a></b>: %s\n", endpoint.path, endpoint.path, endpoint.desc)
break
}
}
formattedString = fmt.Sprintf("<li>%s</li>", formattedString)
w.Write([]byte(formattedString))
}
w.Write([]byte("</ul>"))
}
// initLog setup logger
func initLog() {
kataMonitorLog := logrus.WithFields(logrus.Fields{

View File

@@ -8,6 +8,7 @@ package main
import (
"context"
"flag"
"os"
"testing"
"github.com/stretchr/testify/assert"
@@ -42,7 +43,9 @@ func TestFactoryCLIFunctionNoRuntimeConfig(t *testing.T) {
func TestFactoryCLIFunctionInit(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -89,7 +92,9 @@ func TestFactoryCLIFunctionInit(t *testing.T) {
func TestFactoryCLIFunctionDestroy(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -121,7 +126,9 @@ func TestFactoryCLIFunctionDestroy(t *testing.T) {
func TestFactoryCLIFunctionStatus(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)

View File

@@ -71,7 +71,11 @@ func TestCCCheckCLIFunction(t *testing.T) {
func TestCheckCheckKernelModulesNoNesting(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedSysModuleDir := sysModuleDir
savedProcCPUInfo := procCPUInfo
@@ -87,7 +91,7 @@ func TestCheckCheckKernelModulesNoNesting(t *testing.T) {
procCPUInfo = savedProcCPUInfo
}()
err := os.MkdirAll(sysModuleDir, testDirMode)
err = os.MkdirAll(sysModuleDir, testDirMode)
if err != nil {
t.Fatal(err)
}
@@ -152,7 +156,11 @@ func TestCheckCheckKernelModulesNoNesting(t *testing.T) {
func TestCheckCheckKernelModulesNoUnrestrictedGuest(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedSysModuleDir := sysModuleDir
savedProcCPUInfo := procCPUInfo
@@ -168,7 +176,7 @@ func TestCheckCheckKernelModulesNoUnrestrictedGuest(t *testing.T) {
procCPUInfo = savedProcCPUInfo
}()
err := os.MkdirAll(sysModuleDir, testDirMode)
err = os.MkdirAll(sysModuleDir, testDirMode)
if err != nil {
t.Fatal(err)
}
@@ -247,7 +255,11 @@ func TestCheckHostIsVMContainerCapable(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedSysModuleDir := sysModuleDir
savedProcCPUInfo := procCPUInfo
@@ -263,7 +275,7 @@ func TestCheckHostIsVMContainerCapable(t *testing.T) {
procCPUInfo = savedProcCPUInfo
}()
err := os.MkdirAll(sysModuleDir, testDirMode)
err = os.MkdirAll(sysModuleDir, testDirMode)
if err != nil {
t.Fatal(err)
}
@@ -393,7 +405,11 @@ func TestArchKernelParamHandler(t *testing.T) {
func TestKvmIsUsable(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedKvmDevice := kvmDevice
fakeKVMDevice := filepath.Join(dir, "kvm")
@@ -403,7 +419,7 @@ func TestKvmIsUsable(t *testing.T) {
kvmDevice = savedKvmDevice
}()
err := kvmIsUsable()
err = kvmIsUsable()
assert.Error(err)
err = createEmptyFile(fakeKVMDevice)
@@ -441,7 +457,9 @@ foo : bar
func TestSetCPUtype(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
savedArchRequiredCPUFlags := archRequiredCPUFlags
savedArchRequiredCPUAttribs := archRequiredCPUAttribs

View File

@@ -67,7 +67,11 @@ foo : bar
{validContents, validNormalizeVendorName, validNormalizeModelName, false},
}
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
savedProcCPUInfo := procCPUInfo
@@ -80,7 +84,7 @@ foo : bar
procCPUInfo = savedProcCPUInfo
}()
_, _, err := getCPUDetails()
_, _, err = getCPUDetails()
// ENOENT
assert.Error(t, err)
assert.True(t, os.IsNotExist(err))

View File

@@ -9,6 +9,7 @@
package main
import (
"os"
"testing"
"github.com/stretchr/testify/assert"
@@ -17,7 +18,9 @@ import (
func testSetCPUTypeGeneric(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
savedArchRequiredCPUFlags := archRequiredCPUFlags
savedArchRequiredCPUAttribs := archRequiredCPUAttribs

View File

@@ -7,6 +7,7 @@ package main
import (
"fmt"
"os"
"path/filepath"
"testing"
@@ -117,7 +118,11 @@ func TestArchKernelParamHandler(t *testing.T) {
func TestKvmIsUsable(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedKvmDevice := kvmDevice
fakeKVMDevice := filepath.Join(dir, "kvm")
@@ -127,7 +132,7 @@ func TestKvmIsUsable(t *testing.T) {
kvmDevice = savedKvmDevice
}()
err := kvmIsUsable()
err = kvmIsUsable()
assert.Error(err)
err = createEmptyFile(fakeKVMDevice)

View File

@@ -7,6 +7,7 @@ package main
import (
"fmt"
"os"
"path/filepath"
"testing"
@@ -116,7 +117,11 @@ func TestArchKernelParamHandler(t *testing.T) {
func TestKvmIsUsable(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedKvmDevice := kvmDevice
fakeKVMDevice := filepath.Join(dir, "kvm")
@@ -126,7 +131,7 @@ func TestKvmIsUsable(t *testing.T) {
kvmDevice = savedKvmDevice
}()
err := kvmIsUsable()
err = kvmIsUsable()
assert.Error(err)
err = createEmptyFile(fakeKVMDevice)

View File

@@ -155,7 +155,11 @@ func makeCPUInfoFile(path, vendorID, flags string) error {
// nolint: unused, deadcode
func genericTestGetCPUDetails(t *testing.T, validVendor string, validModel string, validContents string, data []testCPUDetail) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
savedProcCPUInfo := procCPUInfo
@@ -168,7 +172,7 @@ func genericTestGetCPUDetails(t *testing.T, validVendor string, validModel strin
procCPUInfo = savedProcCPUInfo
}()
_, _, err := getCPUDetails()
_, _, err = getCPUDetails()
// ENOENT
assert.Error(t, err)
assert.True(t, os.IsNotExist(err))
@@ -193,7 +197,11 @@ func genericTestGetCPUDetails(t *testing.T, validVendor string, validModel strin
func genericCheckCLIFunction(t *testing.T, cpuData []testCPUData, moduleData []testModuleData) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
_, config, err := makeRuntimeConfig(dir)
assert.NoError(err)
@@ -299,11 +307,15 @@ func TestCheckGetCPUInfo(t *testing.T) {
{"foo\n\nbar\nbaz\n\n", "foo", false},
}
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
file := filepath.Join(dir, "cpuinfo")
// file doesn't exist
_, err := getCPUInfo(file)
_, err = getCPUInfo(file)
assert.Error(err)
for _, d := range data {
@@ -515,7 +527,11 @@ func TestCheckHaveKernelModule(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedModProbeCmd := modProbeCmd
savedSysModuleDir := sysModuleDir
@@ -529,7 +545,7 @@ func TestCheckHaveKernelModule(t *testing.T) {
sysModuleDir = savedSysModuleDir
}()
err := os.MkdirAll(sysModuleDir, testDirMode)
err = os.MkdirAll(sysModuleDir, testDirMode)
if err != nil {
t.Fatal(err)
}
@@ -561,7 +577,11 @@ func TestCheckHaveKernelModule(t *testing.T) {
func TestCheckCheckKernelModules(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedModProbeCmd := modProbeCmd
savedSysModuleDir := sysModuleDir
@@ -575,7 +595,7 @@ func TestCheckCheckKernelModules(t *testing.T) {
sysModuleDir = savedSysModuleDir
}()
err := os.MkdirAll(sysModuleDir, testDirMode)
err = os.MkdirAll(sysModuleDir, testDirMode)
if err != nil {
t.Fatal(err)
}
@@ -642,7 +662,11 @@ func TestCheckCheckKernelModulesUnreadableFile(t *testing.T) {
t.Skip(ktu.TestDisabledNeedNonRoot)
}
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
testData := map[string]kernelModule{
"foo": {
@@ -667,7 +691,7 @@ func TestCheckCheckKernelModulesUnreadableFile(t *testing.T) {
}()
modPath := filepath.Join(sysModuleDir, "foo/parameters")
err := os.MkdirAll(modPath, testDirMode)
err = os.MkdirAll(modPath, testDirMode)
assert.NoError(err)
modParamFile := filepath.Join(modPath, "param1")
@@ -686,7 +710,11 @@ func TestCheckCheckKernelModulesUnreadableFile(t *testing.T) {
func TestCheckCheckKernelModulesInvalidFileContents(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
testData := map[string]kernelModule{
"foo": {
@@ -711,7 +739,7 @@ func TestCheckCheckKernelModulesInvalidFileContents(t *testing.T) {
}()
modPath := filepath.Join(sysModuleDir, "foo/parameters")
err := os.MkdirAll(modPath, testDirMode)
err = os.MkdirAll(modPath, testDirMode)
assert.NoError(err)
modParamFile := filepath.Join(modPath, "param1")
@@ -727,7 +755,11 @@ func TestCheckCheckKernelModulesInvalidFileContents(t *testing.T) {
func TestCheckCLIFunctionFail(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
_, config, err := makeRuntimeConfig(dir)
assert.NoError(err)
@@ -756,7 +788,11 @@ func TestCheckCLIFunctionFail(t *testing.T) {
func TestCheckKernelParamHandler(t *testing.T) {
assert := assert.New(t)
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedModProbeCmd := modProbeCmd
savedSysModuleDir := sysModuleDir
@@ -834,7 +870,9 @@ func TestCheckKernelParamHandler(t *testing.T) {
func TestArchRequiredKernelModules(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
_, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(err)
@@ -847,7 +885,11 @@ func TestArchRequiredKernelModules(t *testing.T) {
return
}
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
savedModProbeCmd := modProbeCmd
savedSysModuleDir := sysModuleDir

View File

@@ -6,6 +6,7 @@
package main
import (
"os"
"testing"
"github.com/stretchr/testify/assert"
@@ -21,7 +22,9 @@ func getExpectedHostDetails(tmpdir string) (HostInfo, error) {
func TestEnvGetEnvInfoSetsCPUType(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
savedArchRequiredCPUFlags := archRequiredCPUFlags
savedArchRequiredCPUAttribs := archRequiredCPUAttribs

View File

@@ -9,6 +9,7 @@
package main
import (
"os"
"testing"
"github.com/stretchr/testify/assert"
@@ -17,7 +18,9 @@ import (
func testEnvGetEnvInfoSetsCPUTypeGeneric(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
savedArchRequiredCPUFlags := archRequiredCPUFlags
savedArchRequiredCPUAttribs := archRequiredCPUAttribs

View File

@@ -364,7 +364,11 @@ func TestEnvGetMetaInfo(t *testing.T) {
}
func TestEnvGetHostInfo(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
expectedHostDetails, err := getExpectedHostDetails(tmpdir)
assert.NoError(t, err)
@@ -385,9 +389,13 @@ func TestEnvGetHostInfo(t *testing.T) {
}
func TestEnvGetHostInfoNoProcCPUInfo(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
_, err := getExpectedHostDetails(tmpdir)
_, err = getExpectedHostDetails(tmpdir)
assert.NoError(t, err)
err = os.Remove(procCPUInfo)
@@ -398,9 +406,13 @@ func TestEnvGetHostInfoNoProcCPUInfo(t *testing.T) {
}
func TestEnvGetHostInfoNoOSRelease(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
_, err := getExpectedHostDetails(tmpdir)
_, err = getExpectedHostDetails(tmpdir)
assert.NoError(t, err)
err = os.Remove(osRelease)
@@ -411,9 +423,13 @@ func TestEnvGetHostInfoNoOSRelease(t *testing.T) {
}
func TestEnvGetHostInfoNoProcVersion(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
_, err := getExpectedHostDetails(tmpdir)
_, err = getExpectedHostDetails(tmpdir)
assert.NoError(t, err)
err = os.Remove(procVersion)
@@ -424,7 +440,11 @@ func TestEnvGetHostInfoNoProcVersion(t *testing.T) {
}
func TestEnvGetEnvInfo(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
// Run test twice to ensure the individual component debug+trace
// options are tested.
@@ -454,7 +474,9 @@ func TestEnvGetEnvInfo(t *testing.T) {
func TestEnvGetEnvInfoNoHypervisorVersion(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(err)
@@ -479,14 +501,20 @@ func TestEnvGetEnvInfoNoHypervisorVersion(t *testing.T) {
func TestEnvGetEnvInfoAgentError(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
_, _, err := makeRuntimeConfig(tmpdir)
_, _, err = makeRuntimeConfig(tmpdir)
assert.NoError(err)
}
func TestEnvGetEnvInfoNoOSRelease(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -502,7 +530,11 @@ func TestEnvGetEnvInfoNoOSRelease(t *testing.T) {
}
func TestEnvGetEnvInfoNoProcCPUInfo(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -518,7 +550,11 @@ func TestEnvGetEnvInfoNoProcCPUInfo(t *testing.T) {
}
func TestEnvGetEnvInfoNoProcVersion(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -534,7 +570,11 @@ func TestEnvGetEnvInfoNoProcVersion(t *testing.T) {
}
func TestEnvGetRuntimeInfo(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -547,7 +587,11 @@ func TestEnvGetRuntimeInfo(t *testing.T) {
}
func TestEnvGetAgentInfo(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
_, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -682,7 +726,11 @@ func testEnvShowJSONSettings(t *testing.T, tmpdir string, tmpfile *os.File) erro
}
func TestEnvShowSettings(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
tmpfile, err := os.CreateTemp("", "envShowSettings-")
assert.NoError(t, err)
@@ -699,7 +747,11 @@ func TestEnvShowSettings(t *testing.T) {
}
func TestEnvShowSettingsInvalidFile(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
tmpfile, err := os.CreateTemp("", "envShowSettings-")
assert.NoError(t, err)
@@ -719,7 +771,11 @@ func TestEnvShowSettingsInvalidFile(t *testing.T) {
}
func TestEnvHandleSettings(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -749,7 +805,9 @@ func TestEnvHandleSettings(t *testing.T) {
func TestEnvHandleSettingsInvalidParams(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
configFile, _, err := makeRuntimeConfig(tmpdir)
assert.NoError(err)
@@ -801,7 +859,11 @@ func TestEnvHandleSettingsInvalidRuntimeConfigType(t *testing.T) {
}
func TestEnvCLIFunction(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -842,7 +904,11 @@ func TestEnvCLIFunction(t *testing.T) {
}
func TestEnvCLIFunctionFail(t *testing.T) {
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
configFile, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(t, err)
@@ -874,7 +940,9 @@ func TestEnvCLIFunctionFail(t *testing.T) {
func TestGetHypervisorInfo(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
_, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(err)
@@ -894,7 +962,9 @@ func TestGetHypervisorInfo(t *testing.T) {
func TestGetHypervisorInfoSocket(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
_, config, err := makeRuntimeConfig(tmpdir)
assert.NoError(err)

View File

@@ -54,10 +54,7 @@ var addCommand = cli.Command{
},
},
Action: func(c *cli.Context) error {
if err := volume.Add(volumePath, mountInfo); err != nil {
return cli.NewExitError(err.Error(), 1)
}
return nil
return volume.Add(volumePath, mountInfo)
},
}
@@ -72,10 +69,7 @@ var removeCommand = cli.Command{
},
},
Action: func(c *cli.Context) error {
if err := volume.Remove(volumePath); err != nil {
return cli.NewExitError(err.Error(), 1)
}
return nil
return volume.Remove(volumePath)
},
}
@@ -92,8 +86,9 @@ var statsCommand = cli.Command{
Action: func(c *cli.Context) (string, error) {
stats, err := Stats(volumePath)
if err != nil {
return "", cli.NewExitError(err.Error(), 1)
return "", err
}
return string(stats), nil
},
}
@@ -114,10 +109,7 @@ var resizeCommand = cli.Command{
},
},
Action: func(c *cli.Context) error {
if err := Resize(volumePath, size); err != nil {
return cli.NewExitError(err.Error(), 1)
}
return nil
return Resize(volumePath, size)
},
}

View File

@@ -258,12 +258,14 @@ func TestMainBeforeSubCommands(t *testing.T) {
func TestMainBeforeSubCommandsInvalidLogFile(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
logFile := filepath.Join(tmpdir, "log")
// create the file as the wrong type to force a failure
err := os.MkdirAll(logFile, testDirMode)
err = os.MkdirAll(logFile, testDirMode)
assert.NoError(err)
set := flag.NewFlagSet("", 0)
@@ -279,7 +281,9 @@ func TestMainBeforeSubCommandsInvalidLogFile(t *testing.T) {
func TestMainBeforeSubCommandsInvalidLogFormat(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
logFile := filepath.Join(tmpdir, "log")
@@ -298,7 +302,7 @@ func TestMainBeforeSubCommandsInvalidLogFormat(t *testing.T) {
ctx := createCLIContext(set)
err := beforeSubcommands(ctx)
err = beforeSubcommands(ctx)
assert.Error(err)
assert.NotNil(kataLog.Logger.Out)
}
@@ -306,7 +310,9 @@ func TestMainBeforeSubCommandsInvalidLogFormat(t *testing.T) {
func TestMainBeforeSubCommandsLoadConfigurationFail(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
logFile := filepath.Join(tmpdir, "log")
configFile := filepath.Join(tmpdir, "config")
@@ -339,7 +345,9 @@ func TestMainBeforeSubCommandsLoadConfigurationFail(t *testing.T) {
func TestMainBeforeSubCommandsShowCCConfigPaths(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
set := flag.NewFlagSet("", 0)
set.Bool("show-default-config-paths", true, "")
@@ -401,7 +409,9 @@ func TestMainBeforeSubCommandsShowCCConfigPaths(t *testing.T) {
func TestMainFatal(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
var exitStatus int
savedExitFunc := exitFunc
@@ -623,7 +633,9 @@ func TestMainCreateRuntime(t *testing.T) {
func TestMainVersionPrinter(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
savedOutputFile := defaultOutputFile

View File

@@ -17,14 +17,18 @@ import (
)
func TestFileExists(t *testing.T) {
dir := t.TempDir()
dir, err := os.MkdirTemp("", "katatest")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
file := filepath.Join(dir, "foo")
assert.False(t, katautils.FileExists(file),
fmt.Sprintf("File %q should not exist", file))
err := createEmptyFile(file)
err = createEmptyFile(file)
if err != nil {
t.Fatal(err)
}
@@ -50,10 +54,14 @@ func TestGetKernelVersion(t *testing.T) {
{validContents, validVersion, false},
}
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
subDir := filepath.Join(tmpdir, "subdir")
err := os.MkdirAll(subDir, testDirMode)
err = os.MkdirAll(subDir, testDirMode)
assert.NoError(t, err)
_, err = getKernelVersion()
@@ -95,7 +103,11 @@ func TestGetDistroDetails(t *testing.T) {
const unknown = "<<unknown>>"
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
if err != nil {
panic(err)
}
defer os.RemoveAll(tmpdir)
testOSRelease := filepath.Join(tmpdir, "os-release")
testOSReleaseClr := filepath.Join(tmpdir, "os-release-clr")
@@ -119,7 +131,7 @@ VERSION_ID="%s"
`, nonClrExpectedName, nonClrExpectedVersion)
subDir := filepath.Join(tmpdir, "subdir")
err := os.MkdirAll(subDir, testDirMode)
err = os.MkdirAll(subDir, testDirMode)
assert.NoError(t, err)
// override

View File

@@ -125,8 +125,7 @@ virtio_fs_cache_size = @DEFVIRTIOFSCACHESIZE@
#
# Format example:
# ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"]
# Examples:
# Set virtiofsd log level to debug : ["-o", "log_level=debug"] or ["-d"]
#
# see `virtiofsd -h` for possible options.
virtio_fs_extra_args = @DEFVIRTIOFSEXTRAARGS@
@@ -180,78 +179,6 @@ block_device_driver = "virtio-blk"
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
#
# These options are related to network rate limiter at the VMM level, and are
# based on the Cloud Hypervisor I/O throttling. Those are disabled by default
# and we strongly advise users to refer the Cloud Hypervisor official
# documentation for a better understanding of its internals:
# https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/io_throttling.md
#
# Bandwidth rate limiter options
#
# net_rate_limiter_bw_max_rate controls network I/O bandwidth (size in bits/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#net_rate_limiter_bw_max_rate = 0
#
# net_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
# set to a non zero value.
#net_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
# net_rate_limiter_ops_max_rate controls network I/O bandwidth (size in ops/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#net_rate_limiter_ops_max_rate = 0
#
# net_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
# set to a non zero value.
#net_rate_limiter_ops_one_time_burst = 0
#
# These options are related to disk rate limiter at the VMM level, and are
# based on the Cloud Hypervisor I/O throttling. Those are disabled by default
# and we strongly advise users to refer the Cloud Hypervisor official
# documentation for a better understanding of its internals:
# https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/io_throttling.md
#
# Bandwidth rate limiter options
#
# disk_rate_limiter_bw_max_rate controls disk I/O bandwidth (size in bits/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_bw_max_rate = 0
#
# disk_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
# disk_rate_limiter_ops_max_rate controls disk I/O bandwidth (size in ops/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_ops_max_rate = 0
#
# disk_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_ops_one_time_burst = 0
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
@@ -397,30 +324,3 @@ experimental=@DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
# WARNING: All the options in the following section have not been implemented yet.
# This section was added as a placeholder. DO NOT USE IT!
[image]
# Container image service.
#
# Offload the CRI image management service to the Kata agent.
# (default: false)
#service_offload = true
# Container image decryption keys provisioning.
# Applies only if service_offload is true.
# Keys can be provisioned locally (e.g. through a special command or
# a local file) or remotely (usually after the guest is remotely attested).
# The provision setting is a complete URL that lets the Kata agent decide
# which method to use in order to fetch the keys.
#
# Keys can be stored in a local file, in a measured and attested initrd:
#provision=data:///local/key/file
#
# Keys could be fetched through a special command or binary from the
# initrd (guest) image, e.g. a firmware call:
#provision=file:///path/to/bin/fetcher/in/guest
#
# Keys can be remotely provisioned. The Kata agent fetches them from e.g.
# a HTTPS URL:
#provision=https://my-key-broker.foo/tenant/<tenant-id>

View File

@@ -168,8 +168,6 @@ virtio_fs_cache_size = @DEFVIRTIOFSCACHESIZE@
#
# Format example:
# ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"]
# Examples:
# Set virtiofsd log level to debug : ["-o", "log_level=debug"] or ["-d"]
#
# see `virtiofsd -h` for possible options.
virtio_fs_extra_args = @DEFVIRTIOFSEXTRAARGS@

View File

@@ -50,6 +50,7 @@ func TestCreateSandboxSuccess(t *testing.T) {
}()
tmpdir, bundlePath, ociConfigFile := ktu.SetupOCIConfigFile(t)
// defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -98,6 +99,7 @@ func TestCreateSandboxFail(t *testing.T) {
assert := assert.New(t)
tmpdir, bundlePath, ociConfigFile := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -135,6 +137,7 @@ func TestCreateSandboxConfigFail(t *testing.T) {
assert := assert.New(t)
tmpdir, bundlePath, _ := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -184,6 +187,7 @@ func TestCreateContainerSuccess(t *testing.T) {
}
tmpdir, bundlePath, ociConfigFile := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -223,6 +227,7 @@ func TestCreateContainerFail(t *testing.T) {
assert := assert.New(t)
tmpdir, bundlePath, ociConfigFile := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -273,6 +278,7 @@ func TestCreateContainerConfigFail(t *testing.T) {
}()
tmpdir, bundlePath, ociConfigFile := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(tmpdir)
runtimeConfig, err := newTestRuntimeConfig(tmpdir, testConsole, true)
assert.NoError(err)
@@ -376,7 +382,9 @@ func createAllRuntimeConfigFiles(dir, hypervisor string) (config string, err err
func TestCreateLoadRuntimeConfig(t *testing.T) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(tmpdir)
config, err := createAllRuntimeConfigFiles(tmpdir, "qemu")
assert.NoError(err)

View File

@@ -7,6 +7,7 @@
package containerdshim
import (
"os"
"testing"
taskAPI "github.com/containerd/containerd/runtime/v2/task"
@@ -24,8 +25,8 @@ func TestDeleteContainerSuccessAndFail(t *testing.T) {
MockID: testSandboxID,
}
_, bundlePath, _ := ktu.SetupOCIConfigFile(t)
rootPath, bundlePath, _ := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(rootPath)
_, err := compatoci.ParseConfigJSON(bundlePath)
assert.NoError(err)

View File

@@ -41,7 +41,8 @@ func TestServiceCreate(t *testing.T) {
assert := assert.New(t)
_, bundleDir, _ := ktu.SetupOCIConfigFile(t)
tmpdir, bundleDir, _ := ktu.SetupOCIConfigFile(t)
defer os.RemoveAll(tmpdir)
ctx := context.Background()

View File

@@ -26,7 +26,9 @@ func TestNewTtyIOFifoReopen(t *testing.T) {
assert := assert.New(t)
ctx := context.TODO()
testDir := t.TempDir()
testDir, err := os.MkdirTemp("", "kata-")
assert.NoError(err)
defer os.RemoveAll(testDir)
fifoPath, err := os.MkdirTemp(testDir, "fifo-path-")
assert.NoError(err)
@@ -102,7 +104,9 @@ func TestIoCopy(t *testing.T) {
testBytes2 := []byte("Test2")
testBytes3 := []byte("Test3")
testDir := t.TempDir()
testDir, err := os.MkdirTemp("", "kata-")
assert.NoError(err)
defer os.RemoveAll(testDir)
fifoPath, err := os.MkdirTemp(testDir, "fifo-path-")
assert.NoError(err)

View File

@@ -6,36 +6,17 @@
package volume
import (
b64 "encoding/base64"
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"strings"
)
const (
mountInfoFileName = "mountInfo.json"
FSGroupMetadataKey = "fsGroup"
FSGroupChangePolicyMetadataKey = "fsGroupChangePolicy"
)
// FSGroupChangePolicy holds policies that will be used for applying fsGroup to a volume.
// This type and the allowed values are tracking the PodFSGroupChangePolicy defined in
// https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go
// It is up to the client using the direct-assigned volume feature (e.g. CSI drivers) to determine
// the optimal setting for this change policy (i.e. from Pod spec or assuming volume ownership
// based on the storage offering).
type FSGroupChangePolicy string
const (
// FSGroupChangeAlways indicates that volume's ownership should always be changed.
FSGroupChangeAlways FSGroupChangePolicy = "Always"
// FSGroupChangeOnRootMismatch indicates that volume's ownership will be changed
// only when ownership of root directory does not match with the desired group id.
FSGroupChangeOnRootMismatch FSGroupChangePolicy = "OnRootMismatch"
)
var kataDirectVolumeRootPath = "/run/kata-containers/shared/direct-volumes"
@@ -56,20 +37,19 @@ type MountInfo struct {
// Add writes the mount info of a direct volume into a filesystem path known to Kata Container.
func Add(volumePath string, mountInfo string) error {
volumeDir := filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath)))
volumeDir := filepath.Join(kataDirectVolumeRootPath, volumePath)
stat, err := os.Stat(volumeDir)
if err != nil {
if !errors.Is(err, os.ErrNotExist) {
return err
}
if err := os.MkdirAll(volumeDir, 0700); err != nil {
return err
}
if err != nil && !errors.Is(err, os.ErrNotExist) {
return err
}
if stat != nil && !stat.IsDir() {
return fmt.Errorf("%s should be a directory", volumeDir)
}
if errors.Is(err, os.ErrNotExist) {
if err := os.MkdirAll(volumeDir, 0700); err != nil {
return err
}
}
var deserialized MountInfo
if err := json.Unmarshal([]byte(mountInfo), &deserialized); err != nil {
return err
@@ -80,12 +60,14 @@ func Add(volumePath string, mountInfo string) error {
// Remove deletes the direct volume path including all the files inside it.
func Remove(volumePath string) error {
return os.RemoveAll(filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath))))
// Find the base of the volume path to delete the whole volume path
base := strings.SplitN(volumePath, string(os.PathSeparator), 2)[0]
return os.RemoveAll(filepath.Join(kataDirectVolumeRootPath, base))
}
// VolumeMountInfo retrieves the mount info of a direct volume.
func VolumeMountInfo(volumePath string) (*MountInfo, error) {
mountInfoFilePath := filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath)), mountInfoFileName)
mountInfoFilePath := filepath.Join(kataDirectVolumeRootPath, volumePath, mountInfoFileName)
if _, err := os.Stat(mountInfoFilePath); err != nil {
return nil, err
}
@@ -102,17 +84,16 @@ func VolumeMountInfo(volumePath string) (*MountInfo, error) {
// RecordSandboxId associates a sandbox id with a direct volume.
func RecordSandboxId(sandboxId string, volumePath string) error {
encodedPath := b64.URLEncoding.EncodeToString([]byte(volumePath))
mountInfoFilePath := filepath.Join(kataDirectVolumeRootPath, encodedPath, mountInfoFileName)
mountInfoFilePath := filepath.Join(kataDirectVolumeRootPath, volumePath, mountInfoFileName)
if _, err := os.Stat(mountInfoFilePath); err != nil {
return err
}
return ioutil.WriteFile(filepath.Join(kataDirectVolumeRootPath, encodedPath, sandboxId), []byte(""), 0600)
return ioutil.WriteFile(filepath.Join(kataDirectVolumeRootPath, volumePath, sandboxId), []byte(""), 0600)
}
func GetSandboxIdForVolume(volumePath string) (string, error) {
files, err := ioutil.ReadDir(filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath))))
files, err := ioutil.ReadDir(filepath.Join(kataDirectVolumeRootPath, volumePath))
if err != nil {
return "", err
}

View File

@@ -6,7 +6,6 @@
package volume
import (
b64 "encoding/base64"
"encoding/json"
"errors"
"os"
@@ -19,17 +18,16 @@ import (
func TestAdd(t *testing.T) {
var err error
kataDirectVolumeRootPath = t.TempDir()
kataDirectVolumeRootPath, err = os.MkdirTemp(os.TempDir(), "add-test")
assert.Nil(t, err)
defer os.RemoveAll(kataDirectVolumeRootPath)
var volumePath = "/a/b/c"
var basePath = "a"
actual := MountInfo{
VolumeType: "block",
Device: "/dev/sda",
FsType: "ext4",
Metadata: map[string]string{
FSGroupMetadataKey: "3000",
FSGroupChangePolicyMetadataKey: string(FSGroupChangeOnRootMismatch),
},
Options: []string{"journal_dev", "noload"},
Options: []string{"journal_dev", "noload"},
}
buf, err := json.Marshal(actual)
assert.Nil(t, err)
@@ -43,22 +41,22 @@ func TestAdd(t *testing.T) {
assert.Equal(t, expected.Device, actual.Device)
assert.Equal(t, expected.FsType, actual.FsType)
assert.Equal(t, expected.Options, actual.Options)
assert.Equal(t, expected.Metadata, actual.Metadata)
_, err = os.Stat(filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath))))
assert.Nil(t, err)
// Remove the file
err = Remove(volumePath)
assert.Nil(t, err)
_, err = os.Stat(filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath))))
_, err = os.Stat(filepath.Join(kataDirectVolumeRootPath, basePath))
assert.True(t, errors.Is(err, os.ErrNotExist))
_, err = os.Stat(filepath.Join(kataDirectVolumeRootPath))
assert.Nil(t, err)
// Test invalid mount info json
assert.Error(t, Add(volumePath, "{invalid json}"))
}
func TestRecordSandboxId(t *testing.T) {
var err error
kataDirectVolumeRootPath = t.TempDir()
kataDirectVolumeRootPath, err = os.MkdirTemp(os.TempDir(), "recordSanboxId-test")
assert.Nil(t, err)
defer os.RemoveAll(kataDirectVolumeRootPath)
var volumePath = "/a/b/c"
mntInfo := MountInfo{
@@ -84,7 +82,9 @@ func TestRecordSandboxId(t *testing.T) {
func TestRecordSandboxIdNoMountInfoFile(t *testing.T) {
var err error
kataDirectVolumeRootPath = t.TempDir()
kataDirectVolumeRootPath, err = os.MkdirTemp(os.TempDir(), "recordSanboxId-test")
assert.Nil(t, err)
defer os.RemoveAll(kataDirectVolumeRootPath)
var volumePath = "/a/b/c"
sandboxId := uuid.Generate().String()

View File

@@ -78,21 +78,6 @@ func (km *KataMonitor) ProcessMetricsRequest(w http.ResponseWriter, r *http.Requ
scrapeDurationsHistogram.Observe(float64(time.Since(start).Nanoseconds() / int64(time.Millisecond)))
}()
// this is likely the same as `kata-runtime metrics <SANDBOX>`.
sandboxID, err := getSandboxIDFromReq(r)
if err == nil && sandboxID != "" {
metrics, err := GetSandboxMetrics(sandboxID)
if err != nil {
w.WriteHeader(http.StatusInternalServerError)
w.Write([]byte(err.Error()))
return
}
w.Write([]byte(metrics))
return
}
// if no sandbox provided, will get all sandbox's metrics.
// prepare writer for writing response.
contentType := expfmt.Negotiate(r.Header)

View File

@@ -27,7 +27,6 @@ const (
RuntimeCRIO = "cri-o"
fsMonitorRetryDelaySeconds = 60
podCacheRefreshDelaySeconds = 5
contentTypeHtml = "text/html"
)
// SetLogger sets the logger for katamonitor package.
@@ -195,41 +194,7 @@ func (km *KataMonitor) GetAgentURL(w http.ResponseWriter, r *http.Request) {
// ListSandboxes list all sandboxes running in Kata
func (km *KataMonitor) ListSandboxes(w http.ResponseWriter, r *http.Request) {
sandboxes := km.sandboxCache.getSandboxList()
htmlResponse := IfReturnHTMLResponse(w, r)
if htmlResponse {
listSandboxesHtml(sandboxes, w)
} else {
listSandboxesText(sandboxes, w)
}
}
func listSandboxesText(sandboxes []string, w http.ResponseWriter) {
for _, s := range sandboxes {
w.Write([]byte(fmt.Sprintf("%s\n", s)))
}
}
func listSandboxesHtml(sandboxes []string, w http.ResponseWriter) {
w.Write([]byte("<h1>Sandbox list</h1>\n"))
w.Write([]byte("<ul>\n"))
for _, s := range sandboxes {
w.Write([]byte(fmt.Sprintf("<li>%s: <a href='/debug/pprof/?sandbox=%s'>pprof</a>, <a href='/metrics?sandbox=%s'>metrics</a>, <a href='/agent-url?sandbox=%s'>agent-url</a></li>\n", s, s, s, s)))
}
w.Write([]byte("</ul>\n"))
}
// IfReturnHTMLResponse returns true if request accepts html response
// NOTE: IfReturnHTMLResponse will also set response header to `text/html`
func IfReturnHTMLResponse(w http.ResponseWriter, r *http.Request) bool {
accepts := r.Header["Accept"]
for _, accept := range accepts {
fields := strings.Split(accept, ",")
for _, field := range fields {
if field == contentTypeHtml {
w.Header().Set("Content-Type", contentTypeHtml)
return true
}
}
}
return false
}

View File

@@ -10,8 +10,6 @@ import (
"io"
"net"
"net/http"
"regexp"
"strings"
cdshim "github.com/containerd/containerd/runtime/v2/shim"
@@ -35,13 +33,7 @@ func (km *KataMonitor) composeSocketAddress(r *http.Request) (string, error) {
return shim.SocketAddress(sandbox), nil
}
func (km *KataMonitor) proxyRequest(w http.ResponseWriter, r *http.Request,
proxyResponse func(req *http.Request, w io.Writer, r io.Reader) error) {
if proxyResponse == nil {
proxyResponse = copyResponse
}
func (km *KataMonitor) proxyRequest(w http.ResponseWriter, r *http.Request) {
w.Header().Set("X-Content-Type-Options", "nosniff")
socketAddress, err := km.composeSocketAddress(r)
@@ -63,10 +55,8 @@ func (km *KataMonitor) proxyRequest(w http.ResponseWriter, r *http.Request,
}
uri := fmt.Sprintf("http://shim%s", r.URL.String())
monitorLog.Debugf("proxyRequest to: %s, uri: %s", socketAddress, uri)
resp, err := client.Get(uri)
if err != nil {
serveError(w, http.StatusInternalServerError, fmt.Sprintf("failed to request %s through %s", uri, socketAddress))
return
}
@@ -83,68 +73,38 @@ func (km *KataMonitor) proxyRequest(w http.ResponseWriter, r *http.Request,
w.Header().Set("Content-Disposition", contentDisposition)
}
err = proxyResponse(r, w, output)
if err != nil {
monitorLog.WithError(err).Errorf("failed proxying %s from %s", uri, socketAddress)
serveError(w, http.StatusInternalServerError, "error retrieving resource")
}
io.Copy(w, output)
}
// ExpvarHandler handles other `/debug/vars` requests
func (km *KataMonitor) ExpvarHandler(w http.ResponseWriter, r *http.Request) {
km.proxyRequest(w, r, nil)
km.proxyRequest(w, r)
}
// PprofIndex handles other `/debug/pprof/` requests
func (km *KataMonitor) PprofIndex(w http.ResponseWriter, r *http.Request) {
if len(strings.TrimPrefix(r.URL.Path, "/debug/pprof/")) == 0 {
km.proxyRequest(w, r, copyResponseAddingSandboxIdToHref)
} else {
km.proxyRequest(w, r, nil)
}
km.proxyRequest(w, r)
}
// PprofCmdline handles other `/debug/cmdline` requests
func (km *KataMonitor) PprofCmdline(w http.ResponseWriter, r *http.Request) {
km.proxyRequest(w, r, nil)
km.proxyRequest(w, r)
}
// PprofProfile handles other `/debug/profile` requests
func (km *KataMonitor) PprofProfile(w http.ResponseWriter, r *http.Request) {
km.proxyRequest(w, r, nil)
km.proxyRequest(w, r)
}
// PprofSymbol handles other `/debug/symbol` requests
func (km *KataMonitor) PprofSymbol(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/plain; charset=utf-8")
km.proxyRequest(w, r, nil)
km.proxyRequest(w, r)
}
// PprofTrace handles other `/debug/trace` requests
func (km *KataMonitor) PprofTrace(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/octet-stream")
w.Header().Set("Content-Disposition", `attachment; filename="trace"`)
km.proxyRequest(w, r, nil)
}
func copyResponse(req *http.Request, w io.Writer, r io.Reader) error {
_, err := io.Copy(w, r)
return err
}
func copyResponseAddingSandboxIdToHref(req *http.Request, w io.Writer, r io.Reader) error {
sb, err := getSandboxIDFromReq(req)
if err != nil {
monitorLog.WithError(err).Warning("missing sandbox query in pprof url")
return copyResponse(req, w, r)
}
buf, err := io.ReadAll(r)
if err != nil {
return err
}
re := regexp.MustCompile(`<a href=(['"])(\w+)\?(\w+=\w+)['"]>`)
outHtml := re.ReplaceAllString(string(buf), fmt.Sprintf("<a href=$1$2?sandbox=%s&$3$1>", sb))
w.Write([]byte(outHtml))
return nil
km.proxyRequest(w, r)
}

View File

@@ -1,117 +0,0 @@
// Copyright (c) 2022 Red Hat Inc.
//
// SPDX-License-Identifier: Apache-2.0
//
package katamonitor
import (
"bytes"
"net/http"
"net/url"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestCopyResponseAddingSandboxIdToHref(t *testing.T) {
assert := assert.New(t)
htmlIn := strings.NewReader(`
<html>
<head>
<title>/debug/pprof/</title>
<style>
.profile-name{
display:inline-block;
width:6rem;
}
</style>
</head>
<body>
/debug/pprof/<br>
<br>
Types of profiles available:
<table>
<thead><td>Count</td><td>Profile</td></thead>
<tr><td>27</td><td><a href='allocs?debug=1'>allocs</a></td></tr>
<tr><td>0</td><td><a href='block?debug=1'>block</a></td></tr>
<tr><td>0</td><td><a href='cmdline?debug=1'>cmdline</a></td></tr>
<tr><td>39</td><td><a href='goroutine?debug=1'>goroutine</a></td></tr>
<tr><td>27</td><td><a href='heap?debug=1'>heap</a></td></tr>
<tr><td>0</td><td><a href='mutex?debug=1'>mutex</a></td></tr>
<tr><td>0</td><td><a href='profile?debug=1'>profile</a></td></tr>
<tr><td>10</td><td><a href='threadcreate?debug=1'>threadcreate</a></td></tr>
<tr><td>0</td><td><a href='trace?debug=1'>trace</a></td></tr>
</table>
<a href="goroutine?debug=2">full goroutine stack dump</a>
<br>
<p>
Profile Descriptions:
<ul>
<li><div class=profile-name>allocs: </div> A sampling of all past memory allocations</li>
<li><div class=profile-name>block: </div> Stack traces that led to blocking on synchronization primitives</li>
<li><div class=profile-name>cmdline: </div> The command line invocation of the current program</li>
<li><div class=profile-name>goroutine: </div> Stack traces of all current goroutines</li>
<li><div class=profile-name>heap: </div> A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample.</li>
<li><div class=profile-name>mutex: </div> Stack traces of holders of contended mutexes</li>
<li><div class=profile-name>profile: </div> CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile.</li>
<li><div class=profile-name>threadcreate: </div> Stack traces that led to the creation of new OS threads</li>
<li><div class=profile-name>trace: </div> A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace.</li>
</ul>
</p>
</body>
</html>`)
htmlExpected := bytes.NewBufferString(`
<html>
<head>
<title>/debug/pprof/</title>
<style>
.profile-name{
display:inline-block;
width:6rem;
}
</style>
</head>
<body>
/debug/pprof/<br>
<br>
Types of profiles available:
<table>
<thead><td>Count</td><td>Profile</td></thead>
<tr><td>27</td><td><a href='allocs?sandbox=1234567890&debug=1'>allocs</a></td></tr>
<tr><td>0</td><td><a href='block?sandbox=1234567890&debug=1'>block</a></td></tr>
<tr><td>0</td><td><a href='cmdline?sandbox=1234567890&debug=1'>cmdline</a></td></tr>
<tr><td>39</td><td><a href='goroutine?sandbox=1234567890&debug=1'>goroutine</a></td></tr>
<tr><td>27</td><td><a href='heap?sandbox=1234567890&debug=1'>heap</a></td></tr>
<tr><td>0</td><td><a href='mutex?sandbox=1234567890&debug=1'>mutex</a></td></tr>
<tr><td>0</td><td><a href='profile?sandbox=1234567890&debug=1'>profile</a></td></tr>
<tr><td>10</td><td><a href='threadcreate?sandbox=1234567890&debug=1'>threadcreate</a></td></tr>
<tr><td>0</td><td><a href='trace?sandbox=1234567890&debug=1'>trace</a></td></tr>
</table>
<a href="goroutine?sandbox=1234567890&debug=2">full goroutine stack dump</a>
<br>
<p>
Profile Descriptions:
<ul>
<li><div class=profile-name>allocs: </div> A sampling of all past memory allocations</li>
<li><div class=profile-name>block: </div> Stack traces that led to blocking on synchronization primitives</li>
<li><div class=profile-name>cmdline: </div> The command line invocation of the current program</li>
<li><div class=profile-name>goroutine: </div> Stack traces of all current goroutines</li>
<li><div class=profile-name>heap: </div> A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample.</li>
<li><div class=profile-name>mutex: </div> Stack traces of holders of contended mutexes</li>
<li><div class=profile-name>profile: </div> CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile.</li>
<li><div class=profile-name>threadcreate: </div> Stack traces that led to the creation of new OS threads</li>
<li><div class=profile-name>trace: </div> A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace.</li>
</ul>
</p>
</body>
</html>`)
req := &http.Request{URL: &url.URL{RawQuery: "sandbox=1234567890"}}
buf := bytes.NewBuffer(nil)
copyResponseAddingSandboxIdToHref(req, buf, htmlIn)
assert.Equal(htmlExpected, buf)
}

View File

@@ -271,12 +271,14 @@ func TestGetFileContents(t *testing.T) {
{"foo\nbar"},
}
dir := t.TempDir()
dir, err := os.MkdirTemp("", "")
assert.NoError(err)
defer os.RemoveAll(dir)
file := filepath.Join(dir, "foo")
// file doesn't exist
_, err := getFileContents(file)
_, err = getFileContents(file)
assert.Error(err)
for _, d := range data {

View File

@@ -346,10 +346,11 @@ func IsInGitHubActions() bool {
func SetupOCIConfigFile(t *testing.T) (rootPath string, bundlePath, ociConfigFile string) {
assert := assert.New(t)
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "katatest-")
assert.NoError(err)
bundlePath = filepath.Join(tmpdir, "bundle")
err := os.MkdirAll(bundlePath, testDirMode)
err = os.MkdirAll(bundlePath, testDirMode)
assert.NoError(err)
ociConfigFile = filepath.Join(bundlePath, "config.json")

View File

@@ -74,78 +74,70 @@ type factory struct {
}
type hypervisor struct {
Path string `toml:"path"`
JailerPath string `toml:"jailer_path"`
Kernel string `toml:"kernel"`
CtlPath string `toml:"ctlpath"`
Initrd string `toml:"initrd"`
Image string `toml:"image"`
Firmware string `toml:"firmware"`
FirmwareVolume string `toml:"firmware_volume"`
MachineAccelerators string `toml:"machine_accelerators"`
CPUFeatures string `toml:"cpu_features"`
KernelParams string `toml:"kernel_params"`
MachineType string `toml:"machine_type"`
BlockDeviceDriver string `toml:"block_device_driver"`
EntropySource string `toml:"entropy_source"`
SharedFS string `toml:"shared_fs"`
VirtioFSDaemon string `toml:"virtio_fs_daemon"`
VirtioFSCache string `toml:"virtio_fs_cache"`
VhostUserStorePath string `toml:"vhost_user_store_path"`
FileBackedMemRootDir string `toml:"file_mem_backend"`
GuestHookPath string `toml:"guest_hook_path"`
GuestMemoryDumpPath string `toml:"guest_memory_dump_path"`
HypervisorPathList []string `toml:"valid_hypervisor_paths"`
JailerPathList []string `toml:"valid_jailer_paths"`
CtlPathList []string `toml:"valid_ctlpaths"`
VirtioFSDaemonList []string `toml:"valid_virtio_fs_daemon_paths"`
VirtioFSExtraArgs []string `toml:"virtio_fs_extra_args"`
PFlashList []string `toml:"pflashes"`
VhostUserStorePathList []string `toml:"valid_vhost_user_store_paths"`
FileBackedMemRootList []string `toml:"valid_file_mem_backends"`
EntropySourceList []string `toml:"valid_entropy_sources"`
EnableAnnotations []string `toml:"enable_annotations"`
RxRateLimiterMaxRate uint64 `toml:"rx_rate_limiter_max_rate"`
TxRateLimiterMaxRate uint64 `toml:"tx_rate_limiter_max_rate"`
MemOffset uint64 `toml:"memory_offset"`
DiskRateLimiterBwMaxRate int64 `toml:"disk_rate_limiter_bw_max_rate"`
DiskRateLimiterBwOneTimeBurst int64 `toml:"disk_rate_limiter_bw_one_time_burst"`
DiskRateLimiterOpsMaxRate int64 `toml:"disk_rate_limiter_ops_max_rate"`
DiskRateLimiterOpsOneTimeBurst int64 `toml:"disk_rate_limiter_ops_one_time_burst"`
NetRateLimiterBwMaxRate int64 `toml:"net_rate_limiter_bw_max_rate"`
NetRateLimiterBwOneTimeBurst int64 `toml:"net_rate_limiter_bw_one_time_burst"`
NetRateLimiterOpsMaxRate int64 `toml:"net_rate_limiter_ops_max_rate"`
NetRateLimiterOpsOneTimeBurst int64 `toml:"net_rate_limiter_ops_one_time_burst"`
VirtioFSCacheSize uint32 `toml:"virtio_fs_cache_size"`
DefaultMaxVCPUs uint32 `toml:"default_maxvcpus"`
MemorySize uint32 `toml:"default_memory"`
MemSlots uint32 `toml:"memory_slots"`
DefaultBridges uint32 `toml:"default_bridges"`
Msize9p uint32 `toml:"msize_9p"`
PCIeRootPort uint32 `toml:"pcie_root_port"`
NumVCPUs int32 `toml:"default_vcpus"`
BlockDeviceCacheSet bool `toml:"block_device_cache_set"`
BlockDeviceCacheDirect bool `toml:"block_device_cache_direct"`
BlockDeviceCacheNoflush bool `toml:"block_device_cache_noflush"`
EnableVhostUserStore bool `toml:"enable_vhost_user_store"`
DisableBlockDeviceUse bool `toml:"disable_block_device_use"`
MemPrealloc bool `toml:"enable_mem_prealloc"`
HugePages bool `toml:"enable_hugepages"`
VirtioMem bool `toml:"enable_virtio_mem"`
IOMMU bool `toml:"enable_iommu"`
IOMMUPlatform bool `toml:"enable_iommu_platform"`
Debug bool `toml:"enable_debug"`
DisableNestingChecks bool `toml:"disable_nesting_checks"`
EnableIOThreads bool `toml:"enable_iothreads"`
DisableImageNvdimm bool `toml:"disable_image_nvdimm"`
HotplugVFIOOnRootBus bool `toml:"hotplug_vfio_on_root_bus"`
DisableVhostNet bool `toml:"disable_vhost_net"`
GuestMemoryDumpPaging bool `toml:"guest_memory_dump_paging"`
ConfidentialGuest bool `toml:"confidential_guest"`
GuestSwap bool `toml:"enable_guest_swap"`
Rootless bool `toml:"rootless"`
DisableSeccomp bool `toml:"disable_seccomp"`
DisableSeLinux bool `toml:"disable_selinux"`
Path string `toml:"path"`
JailerPath string `toml:"jailer_path"`
Kernel string `toml:"kernel"`
CtlPath string `toml:"ctlpath"`
Initrd string `toml:"initrd"`
Image string `toml:"image"`
Firmware string `toml:"firmware"`
FirmwareVolume string `toml:"firmware_volume"`
MachineAccelerators string `toml:"machine_accelerators"`
CPUFeatures string `toml:"cpu_features"`
KernelParams string `toml:"kernel_params"`
MachineType string `toml:"machine_type"`
BlockDeviceDriver string `toml:"block_device_driver"`
EntropySource string `toml:"entropy_source"`
SharedFS string `toml:"shared_fs"`
VirtioFSDaemon string `toml:"virtio_fs_daemon"`
VirtioFSCache string `toml:"virtio_fs_cache"`
VhostUserStorePath string `toml:"vhost_user_store_path"`
FileBackedMemRootDir string `toml:"file_mem_backend"`
GuestHookPath string `toml:"guest_hook_path"`
GuestMemoryDumpPath string `toml:"guest_memory_dump_path"`
HypervisorPathList []string `toml:"valid_hypervisor_paths"`
JailerPathList []string `toml:"valid_jailer_paths"`
CtlPathList []string `toml:"valid_ctlpaths"`
VirtioFSDaemonList []string `toml:"valid_virtio_fs_daemon_paths"`
VirtioFSExtraArgs []string `toml:"virtio_fs_extra_args"`
PFlashList []string `toml:"pflashes"`
VhostUserStorePathList []string `toml:"valid_vhost_user_store_paths"`
FileBackedMemRootList []string `toml:"valid_file_mem_backends"`
EntropySourceList []string `toml:"valid_entropy_sources"`
EnableAnnotations []string `toml:"enable_annotations"`
RxRateLimiterMaxRate uint64 `toml:"rx_rate_limiter_max_rate"`
TxRateLimiterMaxRate uint64 `toml:"tx_rate_limiter_max_rate"`
MemOffset uint64 `toml:"memory_offset"`
VirtioFSCacheSize uint32 `toml:"virtio_fs_cache_size"`
DefaultMaxVCPUs uint32 `toml:"default_maxvcpus"`
MemorySize uint32 `toml:"default_memory"`
MemSlots uint32 `toml:"memory_slots"`
DefaultBridges uint32 `toml:"default_bridges"`
Msize9p uint32 `toml:"msize_9p"`
PCIeRootPort uint32 `toml:"pcie_root_port"`
NumVCPUs int32 `toml:"default_vcpus"`
BlockDeviceCacheSet bool `toml:"block_device_cache_set"`
BlockDeviceCacheDirect bool `toml:"block_device_cache_direct"`
BlockDeviceCacheNoflush bool `toml:"block_device_cache_noflush"`
EnableVhostUserStore bool `toml:"enable_vhost_user_store"`
DisableBlockDeviceUse bool `toml:"disable_block_device_use"`
MemPrealloc bool `toml:"enable_mem_prealloc"`
HugePages bool `toml:"enable_hugepages"`
VirtioMem bool `toml:"enable_virtio_mem"`
IOMMU bool `toml:"enable_iommu"`
IOMMUPlatform bool `toml:"enable_iommu_platform"`
Debug bool `toml:"enable_debug"`
DisableNestingChecks bool `toml:"disable_nesting_checks"`
EnableIOThreads bool `toml:"enable_iothreads"`
DisableImageNvdimm bool `toml:"disable_image_nvdimm"`
HotplugVFIOOnRootBus bool `toml:"hotplug_vfio_on_root_bus"`
DisableVhostNet bool `toml:"disable_vhost_net"`
GuestMemoryDumpPaging bool `toml:"guest_memory_dump_paging"`
ConfidentialGuest bool `toml:"confidential_guest"`
GuestSwap bool `toml:"enable_guest_swap"`
Rootless bool `toml:"rootless"`
DisableSeccomp bool `toml:"disable_seccomp"`
DisableSeLinux bool `toml:"disable_selinux"`
}
type runtime struct {
@@ -477,47 +469,17 @@ func (h hypervisor) getInitrdAndImage() (initrd string, image string, err error)
image, errImage := h.image()
if h.ConfidentialGuest && h.MachineType == vc.QemuCCWVirtio {
if image != "" || initrd != "" {
return "", "", errors.New("Neither the image nor initrd path may be set for Secure Execution")
}
} else if image != "" && initrd != "" {
if image != "" && initrd != "" {
return "", "", errors.New("having both an image and an initrd defined in the configuration file is not supported")
} else if errInitrd != nil && errImage != nil {
}
if errInitrd != nil && errImage != nil {
return "", "", fmt.Errorf("Either initrd or image must be set to a valid path (initrd: %v) (image: %v)", errInitrd, errImage)
}
return
}
func (h hypervisor) getDiskRateLimiterBwMaxRate() int64 {
return h.DiskRateLimiterBwMaxRate
}
func (h hypervisor) getDiskRateLimiterBwOneTimeBurst() int64 {
if h.DiskRateLimiterBwOneTimeBurst != 0 && h.getDiskRateLimiterBwMaxRate() == 0 {
kataUtilsLogger.Warn("The DiskRateLimiterBwOneTimeBurst is set but DiskRateLimiterBwMaxRate is not set, this option will be ignored.")
h.DiskRateLimiterBwOneTimeBurst = 0
}
return h.DiskRateLimiterBwOneTimeBurst
}
func (h hypervisor) getDiskRateLimiterOpsMaxRate() int64 {
return h.DiskRateLimiterOpsMaxRate
}
func (h hypervisor) getDiskRateLimiterOpsOneTimeBurst() int64 {
if h.DiskRateLimiterOpsOneTimeBurst != 0 && h.getDiskRateLimiterOpsMaxRate() == 0 {
kataUtilsLogger.Warn("The DiskRateLimiterOpsOneTimeBurst is set but DiskRateLimiterOpsMaxRate is not set, this option will be ignored.")
h.DiskRateLimiterOpsOneTimeBurst = 0
}
return h.DiskRateLimiterOpsOneTimeBurst
}
func (h hypervisor) getRxRateLimiterCfg() uint64 {
return h.RxRateLimiterMaxRate
}
@@ -526,34 +488,6 @@ func (h hypervisor) getTxRateLimiterCfg() uint64 {
return h.TxRateLimiterMaxRate
}
func (h hypervisor) getNetRateLimiterBwMaxRate() int64 {
return h.NetRateLimiterBwMaxRate
}
func (h hypervisor) getNetRateLimiterBwOneTimeBurst() int64 {
if h.NetRateLimiterBwOneTimeBurst != 0 && h.getNetRateLimiterBwMaxRate() == 0 {
kataUtilsLogger.Warn("The NetRateLimiterBwOneTimeBurst is set but NetRateLimiterBwMaxRate is not set, this option will be ignored.")
h.NetRateLimiterBwOneTimeBurst = 0
}
return h.NetRateLimiterBwOneTimeBurst
}
func (h hypervisor) getNetRateLimiterOpsMaxRate() int64 {
return h.NetRateLimiterOpsMaxRate
}
func (h hypervisor) getNetRateLimiterOpsOneTimeBurst() int64 {
if h.NetRateLimiterOpsOneTimeBurst != 0 && h.getNetRateLimiterOpsMaxRate() == 0 {
kataUtilsLogger.Warn("The NetRateLimiterOpsOneTimeBurst is set but NetRateLimiterOpsMaxRate is not set, this option will be ignored.")
h.NetRateLimiterOpsOneTimeBurst = 0
}
return h.NetRateLimiterOpsOneTimeBurst
}
func (h hypervisor) getIOMMUPlatform() bool {
if h.IOMMUPlatform {
kataUtilsLogger.Info("IOMMUPlatform is enabled by default.")
@@ -671,6 +605,16 @@ func newQemuHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
return vc.HypervisorConfig{}, err
}
if image != "" && initrd != "" {
return vc.HypervisorConfig{},
errors.New("having both an image and an initrd defined in the configuration file is not supported")
}
if image == "" && initrd == "" {
return vc.HypervisorConfig{},
errors.New("either image or initrd must be defined in the configuration file")
}
firmware, err := h.firmware()
if err != nil {
return vc.HypervisorConfig{}, err
@@ -892,60 +836,52 @@ func newClhHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
}
return vc.HypervisorConfig{
HypervisorPath: hypervisor,
HypervisorPathList: h.HypervisorPathList,
KernelPath: kernel,
InitrdPath: initrd,
ImagePath: image,
FirmwarePath: firmware,
MachineAccelerators: machineAccelerators,
KernelParams: vc.DeserializeParams(strings.Fields(kernelParams)),
HypervisorMachineType: machineType,
NumVCPUs: h.defaultVCPUs(),
DefaultMaxVCPUs: h.defaultMaxVCPUs(),
MemorySize: h.defaultMemSz(),
MemSlots: h.defaultMemSlots(),
MemOffset: h.defaultMemOffset(),
VirtioMem: h.VirtioMem,
EntropySource: h.GetEntropySource(),
EntropySourceList: h.EntropySourceList,
DefaultBridges: h.defaultBridges(),
DisableBlockDeviceUse: h.DisableBlockDeviceUse,
SharedFS: sharedFS,
VirtioFSDaemon: h.VirtioFSDaemon,
VirtioFSDaemonList: h.VirtioFSDaemonList,
VirtioFSCacheSize: h.VirtioFSCacheSize,
VirtioFSCache: h.VirtioFSCache,
MemPrealloc: h.MemPrealloc,
HugePages: h.HugePages,
FileBackedMemRootDir: h.FileBackedMemRootDir,
FileBackedMemRootList: h.FileBackedMemRootList,
Debug: h.Debug,
DisableNestingChecks: h.DisableNestingChecks,
BlockDeviceDriver: blockDriver,
BlockDeviceCacheSet: h.BlockDeviceCacheSet,
BlockDeviceCacheDirect: h.BlockDeviceCacheDirect,
BlockDeviceCacheNoflush: h.BlockDeviceCacheNoflush,
EnableIOThreads: h.EnableIOThreads,
Msize9p: h.msize9p(),
HotplugVFIOOnRootBus: h.HotplugVFIOOnRootBus,
PCIeRootPort: h.PCIeRootPort,
DisableVhostNet: true,
GuestHookPath: h.guestHookPath(),
VirtioFSExtraArgs: h.VirtioFSExtraArgs,
SGXEPCSize: defaultSGXEPCSize,
EnableAnnotations: h.EnableAnnotations,
DisableSeccomp: h.DisableSeccomp,
ConfidentialGuest: h.ConfidentialGuest,
DisableSeLinux: h.DisableSeLinux,
NetRateLimiterBwMaxRate: h.getNetRateLimiterBwMaxRate(),
NetRateLimiterBwOneTimeBurst: h.getNetRateLimiterBwOneTimeBurst(),
NetRateLimiterOpsMaxRate: h.getNetRateLimiterOpsMaxRate(),
NetRateLimiterOpsOneTimeBurst: h.getNetRateLimiterOpsOneTimeBurst(),
DiskRateLimiterBwMaxRate: h.getDiskRateLimiterBwMaxRate(),
DiskRateLimiterBwOneTimeBurst: h.getDiskRateLimiterBwOneTimeBurst(),
DiskRateLimiterOpsMaxRate: h.getDiskRateLimiterOpsMaxRate(),
DiskRateLimiterOpsOneTimeBurst: h.getDiskRateLimiterOpsOneTimeBurst(),
HypervisorPath: hypervisor,
HypervisorPathList: h.HypervisorPathList,
KernelPath: kernel,
InitrdPath: initrd,
ImagePath: image,
FirmwarePath: firmware,
MachineAccelerators: machineAccelerators,
KernelParams: vc.DeserializeParams(strings.Fields(kernelParams)),
HypervisorMachineType: machineType,
NumVCPUs: h.defaultVCPUs(),
DefaultMaxVCPUs: h.defaultMaxVCPUs(),
MemorySize: h.defaultMemSz(),
MemSlots: h.defaultMemSlots(),
MemOffset: h.defaultMemOffset(),
VirtioMem: h.VirtioMem,
EntropySource: h.GetEntropySource(),
EntropySourceList: h.EntropySourceList,
DefaultBridges: h.defaultBridges(),
DisableBlockDeviceUse: h.DisableBlockDeviceUse,
SharedFS: sharedFS,
VirtioFSDaemon: h.VirtioFSDaemon,
VirtioFSDaemonList: h.VirtioFSDaemonList,
VirtioFSCacheSize: h.VirtioFSCacheSize,
VirtioFSCache: h.VirtioFSCache,
MemPrealloc: h.MemPrealloc,
HugePages: h.HugePages,
FileBackedMemRootDir: h.FileBackedMemRootDir,
FileBackedMemRootList: h.FileBackedMemRootList,
Debug: h.Debug,
DisableNestingChecks: h.DisableNestingChecks,
BlockDeviceDriver: blockDriver,
BlockDeviceCacheSet: h.BlockDeviceCacheSet,
BlockDeviceCacheDirect: h.BlockDeviceCacheDirect,
BlockDeviceCacheNoflush: h.BlockDeviceCacheNoflush,
EnableIOThreads: h.EnableIOThreads,
Msize9p: h.msize9p(),
HotplugVFIOOnRootBus: h.HotplugVFIOOnRootBus,
PCIeRootPort: h.PCIeRootPort,
DisableVhostNet: true,
GuestHookPath: h.guestHookPath(),
VirtioFSExtraArgs: h.VirtioFSExtraArgs,
SGXEPCSize: defaultSGXEPCSize,
EnableAnnotations: h.EnableAnnotations,
DisableSeccomp: h.DisableSeccomp,
ConfidentialGuest: h.ConfidentialGuest,
DisableSeLinux: h.DisableSeLinux,
}, nil
}

Some files were not shown because too many files have changed in this diff Show More