Compare commits

..

117 Commits
3.0.0 ... 2.4.3

Author SHA1 Message Date
Archana Shinde
6330386ab6 Merge pull request #4593 from fidencio/2.4.3-branch-bump
# Kata Containers 2.4.3
2022-07-05 15:04:34 -07:00
Fabiano Fidêncio
847003187c release: Kata Containers 2.4.3
- stable-2.4 | shim: set a non-zero return code if the wait process call failed.
- stable-2.4 | rootfs: Fix chronyd.service failing on boot

396fed42c release: Adapt kata-deploy for 2.4.3
025e3ea6a shim: set a non-zero return code if the wait process call failed.
f32a14663 snap: Fix debug cli option
0718b9b55 rootfs: Fix chronyd.service failing on boot

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-07-05 22:26:52 +02:00
Fabiano Fidêncio
396fed42c1 release: Adapt kata-deploy for 2.4.3
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-07-05 22:26:52 +02:00
GabyCT
ca7bb9dceb Merge pull request #4571 from fidencio/topic/stable-2.4-set-status-if-wait-process-failed
stable-2.4 | shim: set a non-zero return code if the wait process call failed.
2022-07-01 11:28:35 -05:00
liubin
025e3ea6ab shim: set a non-zero return code if the wait process call failed.
Return code is an int32 type, so if an error occurred, the default value
may be zero, this value will be created as a normal exit code.

Set return code to 255 will let the caller(for example Kubernetes) know
that there are some problems with the pod/container.

Fixes: #4419

Signed-off-by: liubin <liubin0329@gmail.com>
(cherry picked from commit ab5f1c9564)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-30 21:55:13 +02:00
GabyCT
5d50fc3908 Merge pull request #4501 from fidencio/topic/stable-2.4-backport-chronyd-fix
stable-2.4 | rootfs: Fix chronyd.service failing on boot
2022-06-22 12:32:21 -05:00
James O. D. Hunt
f32a146637 snap: Fix debug cli option
`snap`/`snapcraft` seems to have changed recently. Since `snap`
auto-updates all `snap` packages and since we use the `snapcraft` `snap`
for building snaps, this is impacting all our CI jobs which now show:

```
Installing Snapcraft for Linux…
snapcraft 7.0.4 from Canonical* installed

Run snapcraft -d snap --destructive-mode
Usage: snapcraft [options] command [args]...
Try 'snapcraft pack -h' for help.
Error: unrecognized arguments: -d
Error: Process completed with exit code 1.
```

Move the debug option to make it a sub-command (long) option to resolve
this issue.

Fixes: #4457.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 90a7763ac6)
2022-06-21 16:35:53 +02:00
Champ-Goblem
0718b9b55f rootfs: Fix chronyd.service failing on boot
In at least kata versions 2.3.3 and 2.4.0 it was noticed that the guest
operating system's clock would drift out of sync slowly over time
whilst the pod was running.

This had previously been raised and fixed in the old reposity via [1].
In essence kvm_ptp and chrony were paired together in order to
keep the system clock up to date with the host.

In the recent versions of kata metioned above,
the chronyd.service fails upon boot with status `266/NAMESPACE`
which seems to be due to the fact that the `/var/lib/chrony`
directory no longer exists.

This change sets the `/var/lib/chrony` directory for the `ReadWritePaths`
to be ignored when the directory does not exist, as per [2].

[1] https://github.com/kata-containers/runtime/issues/1279
[2] https://www.freedesktop.org/software/systemd
/man/systemd.exec.html#ReadWritePaths=

Fixes: #4167
Signed-off-by: Champ-Goblem <cameron_mcdermott@yahoo.co.uk>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 1b7fd19acb)
2022-06-21 15:49:56 +02:00
snir911
6d93875ead Merge pull request #4385 from snir911/2.4.2-branch-bump
# Kata Containers 2.4.2
2022-06-08 19:38:17 +03:00
Snir Sheriber
7fd22d77d0 release: Kata Containers 2.4.2
- My 2.4 pr backport -- fix shim leak caused by ESRCH in agent destroy
- backport-2.4 | workflows: add workflow_dispatch triggering to test-kata-deploy
- stable-2.4: runtime: Adding the correct detection of mediated PCIe devices
- stable-2.4: backport agent fixes
- stable-2.4 | clh: Update to the v24.0 release
- stable-2.4 | Backport fixes for direct-volume stats
- stable-2.4 | tools: Add QEMU patches for SGX numa support
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.1

607a8a9c2 release: Adapt kata-deploy for 2.4.2
e5568a31a agent: ignore ESRCH error when destroying containers
322839ac7 runtime: force stop container after the container process exits
b75d5cee7 docs: update release process github token instructions
e938ce443 docs: update release process with latest workflow triggering
046ba4df7 workflows: add workflow_dispatch triggering to test-kata-deploy
14ce4b01b runtime: Adding the correct detection of mediated PCIe devices
f54d5cf16 agent: Fix is_signal_handled failing parsing str to u64
80d5f9e14 agent: move assert_result macro to test_utils file
50a74dfee agent: add tests for is_signal_handled function
560247f8d agent: add tests for update_container_namespaces
47d4e79c1 agent: add tests for do_write_stream function
e3ce8aff9 agent: add tests for get_memory_info function
ebe9fc2ca clh: Update to the v24.0 release
29c9391da agent: fix direct-assigned volume stats
d1848523d runtime: direct-volume stats use correct name
338c9f2b0 runtime: direct-volume stats update to use GET parameter
f528bc010 runtime: fix incorrect Action function for direct-volume stats
3413c8588 tools: Add QEMU patches for SGX numa support
db6d4f7e1 versions: Upgrade to Cloud Hypervisor v23.1

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-08 11:54:59 +03:00
Snir Sheriber
607a8a9c2d release: Adapt kata-deploy for 2.4.2
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-08 11:54:59 +03:00
Feng Wang
562e968d19 Merge pull request #4389 from fengwang666/my_2.4_pr_backport
My 2.4 pr backport -- fix shim leak caused by ESRCH in agent destroy
2022-06-02 14:28:40 -07:00
Feng Wang
e5568a31a7 agent: ignore ESRCH error when destroying containers
destroy() method should ignore the ESRCH error from signal::kill
and continue the operation as ESRCH is often considered harmless.

Fixes: #4359

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-02 12:50:54 -07:00
Feng Wang
322839ac75 runtime: force stop container after the container process exits
Set thestop container force flag to true so that the container state is always set to
“StateStopped” after the container wait goroutine is finished. This is necessary for
the following delete container step to succeed.

Fixes: #4359

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-02 12:50:40 -07:00
snir911
4be3aebd15 Merge pull request #4352 from snir911/fix-workflow-stable-2.4
backport-2.4 | workflows: add workflow_dispatch triggering to test-kata-deploy
2022-06-02 13:19:19 +03:00
Snir Sheriber
b75d5cee74 docs: update release process github token instructions
and fix the gpg generating key url

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-01 19:12:45 +03:00
Snir Sheriber
e938ce443c docs: update release process with latest workflow triggering
instructions

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-01 19:12:37 +03:00
Snir Sheriber
046ba4df7f workflows: add workflow_dispatch triggering to test-kata-deploy
This will allow to trigger the test-kata-deploy workflow manually from
any branch instead of using always the one that is defined on main

See: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/

Fixes: #4349
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-01 16:25:25 +03:00
Fabiano Fidêncio
a1d2049bee Merge pull request #4337 from snir911/backports-stable-2.4
stable-2.4: runtime: Adding the correct detection of mediated PCIe devices
2022-05-30 22:35:26 +02:00
James O. D. Hunt
8dcf6c354f Merge pull request #4274 from egernst/backport-agent-fixes
stable-2.4: backport agent fixes
2022-05-30 16:57:07 +01:00
Zvonko Kaiser
14ce4b01ba runtime: Adding the correct detection of mediated PCIe devices
Fixes #4212

Backport-of: https://github.com/kata-containers/kata-containers/pull/4213
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-05-30 12:32:41 +03:00
snir911
4018fdc9b2 Merge pull request #4319 from fidencio/topic/stable-2.4-update-clh-to-v24.0
stable-2.4 | clh: Update to the v24.0 release
2022-05-29 11:46:58 +03:00
Champ-Goblem
f54d5cf165 agent: Fix is_signal_handled failing parsing str to u64
In the is_signal_handled function, when parsing the hex string returned
from `/proc/<pid>/status` the space/tab character after the colon
is not removed.

This patch trims the result of SigCgt so that
all whitespace characters are removed. It also extends the existing
test cases to check for this scenario.

Fixes: #4250
Signed-off-by: Champ-Goblem <cameron@northflank.com>
2022-05-26 15:44:56 -07:00
Braden Rayhorn
80d5f9e145 agent: move assert_result macro to test_utils file
Move the assert_result macro to the shared test_utils file
so that it is not duplicated in individual files.

Fixes: #4093

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:44:56 -07:00
Braden Rayhorn
50a74dfeee agent: add tests for is_signal_handled function
Add test coverage for is_signal_handled function in rpc.rs. Includes
refactors to make the function testable and handle additional cases.

Fixes #3939

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:43:45 -07:00
Braden Rayhorn
560247f8da agent: add tests for update_container_namespaces
Add test coverage for update_container_namespaces function
in src/rpc.rs. Includes minor refactor to make function easier
to test.

Fixes #4034

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:43:45 -07:00
Braden Rayhorn
47d4e79c15 agent: add tests for do_write_stream function
Add test coverage for do_write_stream function of AgentService
in src/rpc.rs. Includes minor refactoring to make function more
easily testable.

Fixes #3984

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:42:45 -07:00
Braden Rayhorn
e3ce8aff99 agent: add tests for get_memory_info function
Add test coverage for get_memory_info function in src/rpc.rs. Includes
some minor refactoring of the function.

Fixes #3837

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:42:45 -07:00
Yibo Zhuang
fc2c933a88 Merge pull request #4305 from yibozhuang/stable-2.4
stable-2.4 | Backport fixes for direct-volume stats
2022-05-26 13:52:19 -07:00
Fabiano Fidêncio
ebe9fc2cad clh: Update to the v24.0 release
This release has been tracked through the v24.0 project.

virtio-iommu specification describes how a device can be attached by default
to a bypass domain. This feature is particularly helpful for booting a VM with
guest software which doesn't support virtio-iommu but still need to access
the device. Now that Cloud Hypervisor supports this feature, it can boot a VM
with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing
the disk image is placed behind a virtual IOMMU.

Multiple checks have been added to the code to prevent devices with identical
identifiers from being created, and therefore avoid unexpected behaviors at boot
or whenever a device was hot plugged into the VM.

Sparse mmap support has been added to both VFIO and vfio-user devices. This
allows the device regions that are not fully mappable to be partially mapped.
And the more a device region can be mapped into the guest address space, the
fewer VM exits will be generated when this device is accessed. This directly
impacts the performance related to this device.

A new serial_number option has been added to --platform, allowing a user to
set a specific serial number for the platform. This number is exposed to the
guest through the SMBIOS.

* Fix loading RAW firmware (#4072)
* Reject compressed QCOW images (#4055)
* Reject virtio-mem resize if device is not activated (#4003)
* Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069)
* Fix algorithm finding HOB memory resources (#3983)

* Refactor interrupt handling (#4083)
* Load kernel asynchronously (#4022)
* Only create ACPI memory manager DSDT when resizable (#4013)

Deprecated features will be removed in a subsequent release and users should
plan to use alternatives

* The mergeable option from the virtio-pmem support has been deprecated
(#3968)
* The dax option from the virtio-fs support has been deprecated (#3889)

Fixes: #4317

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-26 08:58:27 +00:00
Yibo Zhuang
29c9391da1 agent: fix direct-assigned volume stats
The current implementation of walking the
disks to match with the requested volume path
in agent doesn't work because the volume path
provided by the shim to the agent is the mount
path within the guest and not the device name.
The current logic is trying to match the
device name to the volume path which will never
match.

This change will simplify the
get_volume_capacity_stats and
get_volume_inode_stats to just call statfs and
get the bytes and inodes usage of the volume
path directly.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:40:35 -07:00
Yibo Zhuang
d1848523d3 runtime: direct-volume stats use correct name
Today the shim does a translation when doing
direct-volume stats where it takes the source and
returns the mount path within the guest.

The source for a direct-assigned volume is actually
the device path on the host and not the publish
volume path.

This change will perform a lookup of the mount info
during direct-volume stats to ensure that the
device path is provided to the shim for querying
the volume stats.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:34:50 -07:00
Yibo Zhuang
338c9f2b0b runtime: direct-volume stats update to use GET parameter
The go default http mux AFAIK doesn’t support pattern
routing so right now client is padding the url
for direct-volume stats with a subpath of the volume
path and this will always result in 404 not found returned
by the shim.

This change will update the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, then return
400 BadRequest to the client.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:34:41 -07:00
Yibo Zhuang
f528bc0103 runtime: fix incorrect Action function for direct-volume stats
The action function expects a function that returns error
but the current direct-volume stats Action returns
(string, error) which is invalid.

This change fixes the format and print out the stats from
the command instead.

Fixes: #4293

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:34:29 -07:00
Chelsea Mafrica
f821ecbdc6 Merge pull request #4268 from cmaf/tools-patch-qemu-sgx-numa-2.4
stable-2.4 | tools: Add QEMU patches for SGX numa support
2022-05-16 14:31:48 -07:00
Chelsea Mafrica
3413c8588d tools: Add QEMU patches for SGX numa support
There are a few patches for SGX numa support in QEMU added after the
6.2.0 release. Add them for SGX support in Kata.

Fixes #4254

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit b4b9068cb7)
2022-05-16 11:48:22 -07:00
Fabiano Fidêncio
b93a0b1012 Merge pull request #4229 from likebreath/0510/backport_clh_v23.1
stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.1
2022-05-12 21:55:20 +02:00
Bo Chen
db6d4f7e16 versions: Upgrade to Cloud Hypervisor v23.1
The following issues have been addressed from the latest bug fix release
v23.1 of Cloud Hypervisor: 1) Add some missing seccomp rules; 2) Remove
virtio-fs filesystem entries from config on removal; 3) Do not delete
API socket on API server start; 4) Reject virtio-mem resize if the guest
doesn't activate the device; 5) Fix OpenAPI naming of I/O throttling
knobs;

Fixes: #4222

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 82ea018281)
2022-05-10 11:26:57 -07:00
Fabiano Fidêncio
67d67ab66d Merge pull request #4204 from fidencio/2.4.1-branch-bump
# Kata Containers 2.4.1
2022-05-04 19:11:37 +02:00
Fabiano Fidêncio
99c6726cf6 release: Kata Containers 2.4.1
- stable-2.4 | Second round of backports for the 2.4.1 release
- stable-2.4 | First round of backports for the 2.4.1 release
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.0
- stable-2.4 | runtime: Base64 encode the direct volume mountInfo path
- stable-2.4 | agent: Avoid agent panic when reading empty stats

8e076c87 release: Adapt kata-deploy for 2.4.1
b50b091c agent: watchers: ensure uid/gid is preserved on copy/mkdir
03bc89ab clh: Rely on Cloud Hypervisor for generating the device ID
6b2c641f tools: fix typo in clh directory name
81e10fe3 packaging: Fix clh build from source fall-back
8b21c5f7 agent: modify the type of swappiness to u64
3f5c6e71 runtime: Allock mockfs storage to be placed in any directory
0bd1abac runtime: Let MockFSInit create a mock fs driver at any path
3e74243f runtime: Move mockfs control global into mockfs.go
aed4fe6a runtime: Export StoragePathSuffix
e1c4f57c runtime: Don't abuse MockStorageRootPath() for factory tests
c49084f3 runtime: Make bind mount tests better clean up after themselves
4e350f7d runtime: Clean up mock hook logs in tests
415420f6 runtime: Make SetupOCIConfigFile clean up after itself
688b9abd runtime: Don't use fixed /tmp/mountPoint path
dc1288de kata-monitor: add a README file
78edf827 kata-monitor: add some links when generating pages for browsers
eff74fab agent: fsGroup support for direct-assigned volume
01cd5809 proto: fsGroup support for direct-assigned volume
97ad1d55 runtime: fsGroup support for direct-assigned volume
b62cced7 runtime: no need to write virtiofsd error to log
8242cfd2 kata-monitor: update the hrefs in the debug/pprof index page
a37d4e53 agent: best-effort removing mount point
d1197ee8 tools/packaging: Fix error path in 'kata-deploy-binaries.sh -s'
c9c77511 tools/packaging: Fix usage of kata-deploy-binaries.sh
1e622316 tools/packaging/kata-deploy: Copy install_yq.sh in a dedicated script
8fa64e01 packaging: Eliminate TTY_OPT and NO_TTY variables in kata-deploy
8f67f9e3 tools/packaging/kata-deploy/local-build: Add build to gitignore
3049b776 versions: Bump firecracker to v0.23.4
aedfef29 runtime/virtcontainers: Pass the hugepages resources to agent
c9e1f727 agent: Verify that we allocated as many hugepages as we need
ba858e8c agent: Don't attempt to create directories for hugepage configuration
bc32eff7 virtcontainers: clh: Re-generate the client code
984ef538 versions: Upgrade to Cloud Hypervisor v23.0
adf6493b runtime: Base64 encode the direct volume mountInfo path
6b417540 agent: Avoid agent panic when reading empty stats

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 16:18:45 +02:00
Fabiano Fidêncio
8e076c8701 release: Adapt kata-deploy for 2.4.1
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 16:18:45 +02:00
Fabiano Fidêncio
b11f7df5ab Merge pull request #4202 from fidencio/topic/second-round-of-backports-for-2.4.1
stable-2.4 | Second round of backports for the 2.4.1 release
2022-05-04 14:30:23 +02:00
Yibo Zhuang
b50b091c87 agent: watchers: ensure uid/gid is preserved on copy/mkdir
Today in agent watchers, when we copy files/symlinks
or create directories, the ownership of the source path
is not preserved which can lead to permission issues.

In copy, ensure that we do a chown of the source path
uid/gid to the destination file/symlink after copy to
ensure that ownership matches the source ownership.
fs::copy() takes care of setting the permissions.

For directory creation, ensure that we set the
permissions of the created directory to the source
directory permissions and also perform a chown of the
source path uid/gid to ensure directory ownership
and permissions matches to the source.

Fixes: #4188

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 70eda2fa6c)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 12:39:44 +02:00
Fabiano Fidêncio
03bc89ab0b clh: Rely on Cloud Hypervisor for generating the device ID
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.

This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed.  Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.

This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.

The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts.  With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.

This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that.  Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.

Fixes: #4196

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 33a8b70558)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 12:39:39 +02:00
Fabiano Fidêncio
d4dccb4900 Merge pull request #4153 from fidencio/wip/first-round-of-backports-for-2.4.1
stable-2.4 | First round of backports for the 2.4.1 release
2022-04-27 11:23:35 +02:00
Greg Kurz
6b2c641f0b tools: fix typo in clh directory name
This allows to get released binaries again.

Fixes: #4151

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit b658dccc5f)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:11:05 +02:00
Greg Kurz
81e10fe34f packaging: Fix clh build from source fall-back
If we fail to download the clh binary, we fall-back to build from source.
Unfortunately, `pull_clh_released_binary()` leaves a `cloud_hypervisor`
directory behind, which causes `build_clh_from_source()` not to clone
the git repo:

    [ -d "${repo_dir}" ] || git clone "${cloud_hypervisor_repo}"

When building from a kata-containers git repo, the subsequent calls
to `git` in this function thus apply to the kata-containers repo and
eventually fail, e.g.:

+ git checkout v23.0
error: pathspec 'v23.0' did not match any file(s) known to git

It doesn't quite make sense actually to keep an existing directory the
content of which is arbitrary when we want to it to contain a specific
version of clh. Just remove it instead.

Fixes: #4151

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit afbd60da27)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:11:01 +02:00
holyfei
8b21c5f78d agent: modify the type of swappiness to u64
The type of MemorySwappiness in runtime is uint64, and the type of swappiness in agent is int64,
if we set max uint64 in runtime and pass it to agent, the value will be equal to -1. We should
modify the type of swappiness to u64

Fixes: #4123

Signed-off-by: holyfei <yangfeiyu20092010@163.com>
(cherry picked from commit 0239502781)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
3f5c6e7182 runtime: Allock mockfs storage to be placed in any directory
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs.  This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens).  If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.

So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting().  In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.

fixes #4140

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1b931f4203)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
0bd1abac3e runtime: Let MockFSInit create a mock fs driver at any path
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs.  This change allows it to be initialized at any path
given as a parameter.  This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).

For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit ef6d54a781)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
3e74243fbe runtime: Move mockfs control global into mockfs.go
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing.  A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.

This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go.  This is slightly cleaner to begin with, and will
allow some further enhancements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 5d8438e939)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
aed4fe6a2e runtime: Export StoragePathSuffix
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant.  We
duplicate this information in fc.go which also needs it.

Export it from fs.go instead, so it can be used in fc.go.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 963d03ea8a)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
e1c4f57c35 runtime: Don't abuse MockStorageRootPath() for factory tests
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory.  This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.

Instead use t.TempDir() which is for exactly this purpose.  As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1719a8b491)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
c49084f303 runtime: Make bind mount tests better clean up after themselves
There are several tests in mount_test.go which perform a sample bind
mount.  These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint.  Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.

In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer.  Change them all to use the t.Cleanup mechanism.

For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test.  That
stops the test being messed up by a previous run, but messily.  Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit bec59f9e39)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
4e350f7d53 runtime: Clean up mock hook logs in tests
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log.  Currently we don't clean up those logs
when the tests are done.  Use a test cleanup function to do this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit f7ba21c86f)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
415420f689 runtime: Make SetupOCIConfigFile clean up after itself
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp().  This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.

Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 90b2f5b776)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
688b9abd35 runtime: Don't use fixed /tmp/mountPoint path
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount.  This is not cleaned up after the test.  Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.

Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2eeb5dc223)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
Francesco Giudici
dc1288de8d kata-monitor: add a README file
Fixes: #3704

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 7b2ff02647)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
bin
78edf827df kata-monitor: add some links when generating pages for browsers
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.

Fixes: #4061

Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit f8cc5d1ad8)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
Yibo Zhuang
eff74fab0e agent: fsGroup support for direct-assigned volume
Adding two functions set_ownership and
recursive_ownership_change to support changing group id
ownership for a mounted volume.

The set_ownership will be called in common_storage_handler
after mount_storage performs the mount for the volume.
set_ownership will be a noop if the FSGroup field in the
Storage struct is not set which indicates no chown will be
performed. If FSGroup field is specified, then it will
perform the recursive walk of the mounted volume path to
change ownership of all files and directories to the
desired group id. It will also configure the SetGid bit
so that files created the directory will have group
following parent directory group.

If the fsGroupChangePolicy is on root mismatch,
then the group ownership will be skipped if the root
directory group id alreasy matches the desired group
id and if the SetGid bit is also set on the root directory.

This is the same behavior as what
Kubelet does today when performing the recursive walk
to change ownership.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 92c00c7e84)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:09:22 +02:00
Yibo Zhuang
01cd58094e proto: fsGroup support for direct-assigned volume
This change adds two fields to the Storage pb

FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.

FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.

These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 6a47b82c81)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Yibo Zhuang
97ad1d55ff runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.

Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 532d53977e)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Zhuoyu Tie
b62cced7f4 runtime: no need to write virtiofsd error to log
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log

Fixes: #4063

Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
(cherry picked from commit 6e79042aa0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Francesco Giudici
8242cfd2be kata-monitor: update the hrefs in the debug/pprof index page
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.

The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.

Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.

Fixes: #4054

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 86977ff780)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Feng Wang
a37d4e538f agent: best-effort removing mount point
During container exit, the agent tries to remove all the mount point directories,
which can fail if it's a readonly filesytem (e.g. device mapper). This commit ignores
the removal failure and logs a warning message.

Fixes: #4043

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aabcebbf58)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
d1197ee8e5 tools/packaging: Fix error path in 'kata-deploy-binaries.sh -s'
`make kata-tarball` relies on `kata-deploy-binaries.sh -s` which
silently ignores errors, and you may end up with an incomplete
tarball without noticing it because `make`'s exit status is 0.

`kata-deploy-binaries.sh` does set the `errexit` option and all the
code in the script seems to assume that since it doesn't do error
checking. Unfortunately, bash automatically disables `errexit` when
calling a function from a conditional pipeline, like done in the `-s`
case:

	if [ "${silent}" == true ]; then
		if ! handle_build "${t}" &>"$log_file"; then
                ^^^^^^
           this disables `errexit`

and `handle_build` ends with a `tar tvf` that always succeeds.

Adding error checking all over the place isn't really an option
as it would seriously obfuscate the code. Drop the conditional
pipeline instead and print the final error message from a `trap`
handler on the special ERR signal. This requires the `errtrace`
option as `trap`s aren't propagated to functions by default.

Since all outputs of `handle_build` are redirected to the build
log file, some file descriptor duplication magic is needed for
the handler to be able to write to the orignal stdout and stderr.

Fixes #3757

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit a779e19bee)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
c9c7751184 tools/packaging: Fix usage of kata-deploy-binaries.sh
Add missing documentation for -s .

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0baebd2b37)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
1e62231610 tools/packaging/kata-deploy: Copy install_yq.sh in a dedicated script
'make kata-tarball' sometimes fails early with:

cp: cannot create regular file '[...]/tools/packaging/kata-deploy/local-build/dockerbuild/install_yq.sh': File exists

This happens because all assets are built in parallel using the same
`kata-deploy-binaries-in-docker.sh` script, and thus all try to copy
the `install_yq.sh` script to the same location with the `cp` command.
This is a well known race condition that cannot be avoided without
serialization of `cp` invocations.

Move the copying of `install_yq.sh` to a separate script and ensure
it is called *before* parallel builds. Make the presence of the copy
a prerequisite for each sub-build so that they still can be triggered
individually. Update the GH release workflow to also call this script
before calling `kata-deploy-binaries-in-docker.sh`.

Fixes #3756

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 154c8b03d3)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
8fa64e011d packaging: Eliminate TTY_OPT and NO_TTY variables in kata-deploy
NO_TTY configured whether to add the -t option to docker run.  It makes no
sense for the caller to configure this, since whether you need it depends
on the commands you're running.  Since the point here is to run
non-interactive build scripts, we don't need -t, or -i either.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1ed7da8fc7)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
8f67f9e384 tools/packaging/kata-deploy/local-build: Add build to gitignore
This directory consists entirely of files built during a make kata-tarball,
so it should not be committed to the tree. A symbolic link to this directory
might be created during 'make tarball', ignore it as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[greg: - rearranged the subject to make the subsystem checker happy
       - also ignore the symbolic link created by
         `kata-deploy-binaries-in-docker.sh`]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit bad859d2f8)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
3049b7760a versions: Bump firecracker to v0.23.4
This release changes Docker images repository from DockerHub to Amazon
ECR. This resolves the `You have reached your pull rate limit` error
when building the firecracker tarball.

Fixes #4001

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0d5f80b803)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Miao Xia
aedfef29a3 runtime/virtcontainers: Pass the hugepages resources to agent
The hugepages resources claimed by containers should be limited
by cgroup in the guest OS.

Fixes: #3695

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
(cherry picked from commit a2f5c1768e)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
c9e1f72785 agent: Verify that we allocated as many hugepages as we need
allocate_hugepages() writes to the kernel sysfs file to allocate hugepages
in the Kata VM.  However, even if the write succeeds, it's not certain that
the kernel will actually be able to allocate as many hugepages as we
requested.

This patch reads back the file after writing it to check if we were able to
allocate all the required hugepages.

fixes #3816

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 42e35505b0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
ba858e8cd9 agent: Don't attempt to create directories for hugepage configuration
allocate_hugepages() constructs the path for the sysfs directory containing
hugepage configuration, then attempts to create this directory if it does
not exist.

This doesn't make sense: sysfs is a view into kernel configuration, if the
kernel has support for the hugepage size, the directory will already be
there, if it doesn't, trying to create it won't help.

For the same reason, attempting to create the "nr_hugepages" file
itself is pointless, so there's no reason to call
OpenOptions::create(true).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 608e003abc)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Fabiano Fidêncio
b784763685 Merge pull request #4120 from likebreath/0420/backport_clh_v23.0
stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.0
2022-04-21 14:33:37 +02:00
Fabiano Fidêncio
df2d57e9b8 Merge pull request #4098 from fengwang666/stable-2.4_backport
stable-2.4 | runtime: Base64 encode the direct volume mountInfo path
2022-04-21 12:54:03 +02:00
Bo Chen
bc32eff7b4 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v23.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 29e569aa92)
2022-04-20 15:57:50 -07:00
Bo Chen
984ef5389e versions: Upgrade to Cloud Hypervisor v23.0
Highlights from the Cloud Hypervisor release v23.0: 1) vDPA Support; 2)
Updated OS Support list (Jammy 22.04 added with EOLed versions removed);
3) AArch64 Memory Map Improvements; 4) AMX Support; 5) Bug Fixes;

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v23.0

Fixes: #4101

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 6012c19707)
2022-04-20 15:57:50 -07:00
Feng Wang
adf6493b89 runtime: Base64 encode the direct volume mountInfo path
This is to avoid accidentally deleting multiple volumes.

Fixes #4020

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit 354cd3b9b6)
2022-04-13 22:30:53 -07:00
Greg Kurz
10bab3c96a Merge pull request #4081 from fidencio/wip/stable-2.4-agent-avoid-panic-when-getting-empty-stats
stable-2.4 | agent: Avoid agent panic when reading empty stats
2022-04-13 14:13:13 +02:00
Fabiano Fidêncio
6b41754018 agent: Avoid agent panic when reading empty stats
This was seen in an issue report, where we'd try to unwrap a None value,
leading to a panic.

Fixes: #4077
Related: #4043

Full backtrace:
```
"thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', rustjail/src/cgroups/fs/mod.rs:593:31"
"stack backtrace:"
"   0:     0x7f0390edcc3a - std::backtrace_rs::backtrace::libunwind::trace::hd5eff4de16dbdd15"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5"
"   1:     0x7f0390edcc3a - std::backtrace_rs::backtrace::trace_unsynchronized::h04a775b4c6ab90d6"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5"
"   2:     0x7f0390edcc3a - std::sys_common::backtrace::_print_fmt::h3253c3db9f17d826"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5"
"   3:     0x7f0390edcc3a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h02bfc712fc868664"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22"
"   4:     0x7f0390a91fbc - core::fmt::write::hfd5090d1132106d8"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17"
"   5:     0x7f0390edb804 - std::io::Write::write_fmt::h34acb699c6d6f5a9"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15"
"   6:     0x7f0390edbee0 - std::sys_common::backtrace::_print::hfca761479e3d91ed"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5"
"   7:     0x7f0390edbee0 - std::sys_common::backtrace::print::hf666af0b87d2b5ba"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9"
"   8:     0x7f0390edbee0 - std::panicking::default_hook::{{closure}}::hb4617bd1d4a09097"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50"
"   9:     0x7f0390edb2da - std::panicking::default_hook::h84f684d9eff1eede"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9"
"  10:     0x7f0390edb2da - std::panicking::rust_panic_with_hook::h8e784f5c39f46346"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17"
"  11:     0x7f0390f0c416 - std::panicking::begin_panic_handler::{{closure}}::hef496869aa926670"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:500:13"
"  12:     0x7f0390f0c3b6 - std::sys_common::backtrace::__rust_end_short_backtrace::h8e9b039b8ed3e70f"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18"
"  13:     0x7f0390f0c372 - rust_begin_unwind"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5"
"  14:     0x7f03909062c0 - core::panicking::panic_fmt::h568976b83a33ae59"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14"
"  15:     0x7f039090641c - core::panicking::panic::he2e71cfa6548cc2c"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:48:5"
"  16:     0x7f0390eb443f - <rustjail::cgroups::fs::Manager as rustjail::cgroups::Manager>::get_stats::h85031fc1c59c53d9"
"  17:     0x7f03909c0138 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hfa6e6cd7516f8d11"
"  18:     0x7f0390d697e5 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hffbaa534cfa97d44"
"  19:     0x7f039099c0b3 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hae3ab083a06d0b4b"
"  20:     0x7f0390af9e1e - std::panic::catch_unwind::h1fdd25c8ebba32e1"
"  21:     0x7f0390b7c4e6 - tokio::runtime::task::raw::poll::hd3ebbd0717dac808"
"  22:     0x7f0390f49f3f - tokio::runtime::thread_pool::worker::Context::run_task::hfdd63cd1e0b17abf"
"  23:     0x7f0390f3a599 - tokio::runtime::task::raw::poll::h62954f6369b1d210"
"  24:     0x7f0390f37863 - std::sys_common::backtrace::__rust_begin_short_backtrace::h1c58f232c078bfe9"
"  25:     0x7f0390f4f3dd - core::ops::function::FnOnce::call_once{{vtable.shim}}::h2d329a84c0feed57"
"  26:     0x7f0390f0e535 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h137e5243c6233a3b"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9"
"  27:     0x7f0390f0e535 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7331c46863d912b7"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9"
"  28:     0x7f0390f0e535 - std::sys::unix::thread::Thread::new::thread_start::h1fb20b966cb927ab"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17"
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 78f30c33c6)
2022-04-12 18:59:02 +02:00
Fabiano Fidêncio
0ad6f05dee Merge pull request #4024 from bergwolf/2.4.0-branch-bump
# Kata Containers 2.4.0
2022-04-01 13:46:35 +02:00
Peng Tao
4c9c01a124 release: Kata Containers 2.4.0
- stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
- stable-2.4 | kata-monitor: fix duplicated output when printing usage
- stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
- kata-deploy: fix version bump from -rc to stable
- stable-2.4: release: Include all the rust vendored code into the vendored tarball
- stable-2.4 | tools: release: Do not consider release candidates as stable releases
- agent: Signal the whole process group
- stable-2.4 | docs: Update k8s documentation
- backport main commits to stable 2.4
- stable-2.4: Bump QEMU to 6.2 (bringing then SGX support in)
- runtime: Properly handle ESRCH error when signaling container
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1

f2319d69 release: Adapt kata-deploy for 2.4.0
cae48e9c agent: fix container stop error with signal SIGRTMIN+3
342aa95c kata-monitor: fix duplicated output when printing usage
9f75e226 runtime: add logs around sandbox monitor
363fbed8 runtime: stop getting OOM events when ttrpc: closed error
f840de5a workflows,release: Ship *all* the rust vendored code
952cea5f tools: Add a generate_vendor.sh script
cc965fa0 kata-deploy: fix version bump from -rc to stable
f41cc184 tools: release: Do not consider release candidates as stable releases
e059b50f runtime: Add more debug logs for container io stream copy
71ce6f53 agent: Kill the all the container processes of the same cgroup
30fc2c86 docs: Update k8s documentation
24028969 virtcontainers: Run mock hook from build tree rather than system bin dir
4e54aa5a doc: fix filename typo
d815393c manager: Add options to change self test behaviour
4111e1a3 manager: Add option to enable component debug
2918be18 manager: Create containerd link
6b31b068 kernel: fix cve-2022-0847
5589b246 doc: update Intel SGX use cases document
1da88dca tools: update QEMU to 6.2
3e2f9223 runtime: Properly handle ESRCH error when signaling container
4c21cb3e versions: Upgrade to Cloud Hypervisor v22.1

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Peng Tao
f2319d693d release: Adapt kata-deploy for 2.4.0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Bin Liu
98ccf8f6a1 Merge pull request #4008 from wxx213/stable-2.4
stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
2022-04-01 11:29:18 +08:00
Wang Xingxing
cae48e9c9b agent: fix container stop error with signal SIGRTMIN+3
The nix::sys::signal::Signal package api cannot deal with SIGRTMIN+3,
directly use libc function to send the signal.

Fixes: #3990

Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(cherry picked from commit 0d765bd082)
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
2022-03-31 16:49:06 +08:00
snir911
a36103c759 Merge pull request #4003 from fgiudici/kata-monitor_fix_help_backport
stable-2.4 | kata-monitor: fix duplicated output when printing usage
2022-03-30 18:57:17 +03:00
Fabiano Fidêncio
6abbcc551c Merge pull request #3997 from liubin/backport-2.4
stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
2022-03-30 14:08:55 +02:00
Francesco Giudici
342aa95cc8 kata-monitor: fix duplicated output when printing usage
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:

[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
  -listen-address string
        The address to listen on for HTTP requests. (default ":8090")
  -log-level string
        Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
  -runtime-endpoint string
        Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")

the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.

Fixes: #3998

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit a63bbf9793)
2022-03-30 14:02:54 +02:00
bin
9f75e226f1 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:40 +08:00
bin
363fbed804 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.

When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:35 +08:00
Fabiano Fidêncio
54a638317a Merge pull request #3988 from bergwolf/github/kata-deploy
kata-deploy: fix version bump from -rc to stable
2022-03-30 11:01:45 +02:00
Peng Tao
8ce6b12b41 Merge pull request #3993 from fidencio/wip/stable-2.4-release-include-all-rust-vendored-code-to-the-vendored-tarball
stable-2.4: release: Include all the rust vendored code into the vendored tarball
2022-03-30 16:10:47 +08:00
Fabiano Fidêncio
f840de5acb workflows,release: Ship *all* the rust vendored code
Instead of only vendoring the code needed by the agent, let's ensure we
vendor all the needed rust code, and let's do it using the newly
introduced enerate_vendor.sh script.

Fixes: #3973

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3606923ac8)
2022-03-29 23:27:43 +02:00
Fabiano Fidêncio
952cea5f5d tools: Add a generate_vendor.sh script
This script is responsible for generating a tarball with all the rust
vendored code that is needed for fully building kata-containers on a
disconnected environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2eb07455d0)
2022-03-29 23:27:29 +02:00
Peng Tao
cc965fa0cb kata-deploy: fix version bump from -rc to stable
In such case, we should bump from "latest" tag rather than from
current_version.

Fixes: #3986
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-03-29 03:45:27 +00:00
GabyCT
44b1473d0c Merge pull request #3977 from fidencio/wip/backport-fix-for-3847
stable-2.4 | tools: release: Do not consider release candidates as stable releases
2022-03-28 10:38:47 -06:00
Fupan Li
565efd1bf2 Merge pull request #3975 from bergwolf/github/backport-stable-2.4
agent: Signal the whole process group
2022-03-28 18:26:12 +08:00
Fabiano Fidêncio
f41cc18427 tools: release: Do not consider release candidates as stable releases
During the release of 2.4.0-rc0 @egernst noticed an incositency in the
way we handle release tags, as release candidates are being taken as
"stable" releases, while both the kata-deploy tests and the release
action consider this as "latest".

Ideally we should have our own tag for "release candidate", but that's
something that could and should be discussed more extensively outside of
the scope of this quick fix.

For now, let's align the code generating the PR for bumping the release
with what we already do as part of the release action and kata-deploy
test, and tag "-rc"  as latest, regardless of which branch it's coming
from.

Fixes: #3847

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4adf93ef2c)
2022-03-28 11:01:58 +02:00
Feng Wang
e059b50f5c runtime: Add more debug logs for container io stream copy
This can help debugging container lifecycle issues

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:22:22 +08:00
Feng Wang
71ce6f537f agent: Kill the all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:21:51 +08:00
Bin Liu
a2b73b60bd Merge pull request #3960 from cmaf/update-k8s-docs-1-stable-2.4
stable-2.4 | docs: Update k8s documentation
2022-03-25 15:25:25 +08:00
Bin Liu
2ce9ce7b8f Merge pull request #3954 from bergwolf/github/backport-stable-2.4
backport main commits to stable 2.4
2022-03-25 14:45:17 +08:00
Chelsea Mafrica
30fc2c863d docs: Update k8s documentation
Update documentation with missing step to untaint node to enable
scheduling and update the example to run a pod using the kata runtime
class instead of untrusted workloads, which applies to versions of CRI-O
prior to v1.12.

Fixes #3863

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit 5c434270d1)
2022-03-24 11:22:18 -07:00
David Gibson
24028969c2 virtcontainers: Run mock hook from build tree rather than system bin dir
Running unit tests should generally have minimal dependencies on
things outside the build tree.  It *definitely* shouldn't modify
system wide things outside the build tree.  Currently the runtime
"make test" target does so, though.

Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary.  They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.

Go tests automatically run within the package's directory though, so
there's no need to use a system wide path.  We can use a relative path to
the binary build within the tree just as easily.

fixes #3941

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-24 12:02:00 +08:00
Garrett Mahin
4e54aa5a7b doc: fix filename typo
Corrects a filename typo in cleanup cluster part
of kata-deploy README.md

Fixes: #3869
Signed-off-by: Garrett Mahin <garrett.mahin@gmail.com>
2022-03-24 12:00:17 +08:00
James O. D. Hunt
d815393c3e manager: Add options to change self test behaviour
Added new `kata-manager` options to control the self-test behaviour. By
default, after installation the manager will run a test to ensure a Kata
Containers container can be created. New options allow:

- The self test to be disabled.
- Only the self test to be run (no installation).

These features allow changes to be made to the installed system before
the self test is run.

Fixes: #3851.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:48 +08:00
James O. D. Hunt
4111e1a3de manager: Add option to enable component debug
Added a `-d` option to `kata-manager` to enable Kata Containers
and containerd debug.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:33 +08:00
James O. D. Hunt
2918be180f manager: Create containerd link
Make the `kata-manager` create a `containerd` link to ensure the
downloaded containerd systemd service file can find the daemon when
using the GitHub packaged version of containerd.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:26 +08:00
Julio Montes
6b31b06832 kernel: fix cve-2022-0847
bump guest kernel version to fix cve-2022-0847 "Dirty Pipe"

fixes #3852

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-03-24 11:58:43 +08:00
Fabiano Fidêncio
53a9cf7dc4 Merge pull request #3927 from fidencio/stable-2.4/qemu-bump
stable-2.4: Bump QEMU to 6.2 (bringing then SGX support in)
2022-03-23 07:20:35 +01:00
Julio Montes
5589b246d7 doc: update Intel SGX use cases document
Installation section is not longer needed because of the latest
default kata kernel supports Intel SGX.
Include QEMU to the list of supported hypervisors.

fixes #3911

Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 24b29310b2)
2022-03-22 08:36:04 +01:00
Julio Montes
1da88dca4b tools: update QEMU to 6.2
bring Intel SGX support

Changes tha may impact in Kata Containers
Arm:
The 'virt' machine now supports an emulated ITS
The 'virt' machine now supports more than 123 CPUs in TCG emulation mode
The pl031 real-time clock device now supports sending RTC_CHANGE QMP events

PowerPC:
Improved POWER10 support for the 'powernv' machine
Initial support for POWER10 DD2.0 CPU added
Added support for FORM2 PAPR NUMA descriptions in the "pseries" machine
 type

s390x:
Improved storage key emulation (e.g. fixed address handling, lazy
 storage key enablement for TCG, ...)
New gen16 CPU features are now enabled automatically in the latest
 machine type

KVM:
Support for SGX in the virtual machine, using the /dev/sgx_vepc device
 on the host and the "memory-backend-epc" backend in QEMU.
New "hv-apicv" CPU property (aliased to "hv-avic") sets the
 HV_DEPRECATING_AEOI_RECOMMENDED bit in CPUID[0x40000004].EAX.

virtio-mem:
QEMU now fully supports guest memory dumps with virtio-mem.
QEMU now cleanly supports precopy migration, postcopy migration and
 background snapshots with virtio-mem.

fixes #3902

Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 18d4d7fb1d)
2022-03-22 08:35:45 +01:00
Peng Tao
8cc2231818 Merge pull request #3892 from fengwang666/my_2.4_pr_backport
runtime: Properly handle ESRCH error when signaling container
2022-03-15 10:11:25 +08:00
GabyCT
63c1498f05 Merge pull request #3891 from likebreath/stable-2.4
stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1
2022-03-14 17:44:09 -06:00
Feng Wang
3e2f9223b0 runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aa5ae6b17c)
2022-03-14 13:15:54 -07:00
Bo Chen
4c21cb3eb1 versions: Upgrade to Cloud Hypervisor v22.1
This is a bug fix release. The following issues have been addressed:
1) VFIO ioctl reordering to fix MSI on AMD platforms; 2) Fix virtio-net
control queue.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v22.1

Fixes: #3872

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 7a18e32fa7)
2022-03-14 12:34:31 -07:00
2183 changed files with 31379 additions and 435555 deletions

View File

@@ -1,40 +0,0 @@
#!/bin/bash
#
# Copyright (c) 2022 Red Hat
#
# SPDX-License-Identifier: Apache-2.0
#
script_dir=$(dirname "$(readlink -f "$0")")
parent_dir=$(realpath "${script_dir}/../..")
cidir="${parent_dir}/ci"
source "${cidir}/lib.sh"
cargo_deny_file="${script_dir}/action.yaml"
cat cargo-deny-skeleton.yaml.in > "${cargo_deny_file}"
changed_files_status=$(run_get_pr_changed_file_details)
changed_files_status=$(echo "$changed_files_status" | grep "Cargo\.toml$" || true)
changed_files=$(echo "$changed_files_status" | awk '{print $NF}' || true)
if [ -z "$changed_files" ]; then
cat >> "${cargo_deny_file}" << EOF
- run: echo "No Cargo.toml files to check"
shell: bash
EOF
fi
for path in $changed_files
do
cat >> "${cargo_deny_file}" << EOF
- name: ${path}
continue-on-error: true
shell: bash
run: |
pushd $(dirname ${path})
cargo deny check
popd
EOF
done

View File

@@ -1,30 +0,0 @@
#
# Copyright (c) 2022 Red Hat
#
# SPDX-License-Identifier: Apache-2.0
#
name: 'Cargo Crates Check'
description: 'Checks every Cargo.toml file using cargo-deny'
env:
CARGO_TERM_COLOR: always
runs:
using: "composite"
steps:
- name: Install Rust
uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
override: true
- name: Cache
uses: Swatinem/rust-cache@v2
- name: Install Cargo deny
shell: bash
run: |
which cargo
cargo install --locked cargo-deny || true

View File

@@ -1,100 +0,0 @@
name: Add backport label
on:
pull_request:
types:
- opened
- synchronize
- reopened
- edited
- labeled
- unlabeled
jobs:
check-issues:
if: ${{ github.event.label.name != 'auto-backport' }}
runs-on: ubuntu-latest
steps:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
- name: Install hub extension script
run: |
pushd $(mktemp -d) &>/dev/null
git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts
sudo install hub-util.sh /usr/local/bin
popd &>/dev/null
- name: Determine whether to add label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CONTAINS_AUTO_BACKPORT: ${{ contains(github.event.pull_request.labels.*.name, 'auto-backport') }}
id: add_label
run: |
pr=${{ github.event.pull_request.number }}
linked_issue_urls=$(hub-util.sh \
list-issues-for-pr "$pr" |\
grep -v "^\#" |\
cut -d';' -f3 || true)
[ -z "$linked_issue_urls" ] && {
echo "::error::No linked issues for PR $pr"
exit 1
}
has_bug=false
for issue_url in $(echo "$linked_issue_urls")
do
issue=$(echo "$issue_url"| awk -F\/ '{print $NF}' || true)
[ -z "$issue" ] && {
echo "::error::Cannot determine issue number from $issue_url for PR $pr"
exit 1
}
labels=$(hub-util.sh list-labels-for-issue "$issue")
label_names=$(echo $labels | jq -r '.[].name' || true)
if [[ "$label_names" =~ "bug" ]]; then
has_bug=true
break
fi
done
has_backport_needed_label=${{ contains(github.event.pull_request.labels.*.name, 'needs-backport') }}
has_no_backport_needed_label=${{ contains(github.event.pull_request.labels.*.name, 'no-backport-needed') }}
echo "::set-output name=add_backport_label::false"
if [ $has_backport_needed_label = true ] || [ $has_bug = true ]; then
if [[ $has_no_backport_needed_label = false ]]; then
echo "::set-output name=add_backport_label::true"
fi
fi
# Do not spam comment, only if auto-backport label is going to be newly added.
echo "::set-output name=auto_backport_added::$CONTAINS_AUTO_BACKPORT"
- name: Add comment
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'true' && steps.add_label.outputs.auto_backport_added == 'false' }}
uses: actions/github-script@v6
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: 'This issue has been marked for auto-backporting. Add label(s) backport-to-BRANCHNAME to backport to them'
})
# Allow label to be removed by adding no-backport-needed label
- name: Remove auto-backport label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'false' }}
uses: andymckay/labeler@e6c4322d0397f3240f0e7e30a33b5c5df2d39e90
with:
remove-labels: "auto-backport"
repo-token: ${{ secrets.GITHUB_TOKEN }}
- name: Add auto-backport label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'true' }}
uses: andymckay/labeler@e6c4322d0397f3240f0e7e30a33b5c5df2d39e90
with:
add-labels: "auto-backport"
repo-token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -1,40 +0,0 @@
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
name: Add PR sizing label
on:
pull_request_target:
types:
- opened
- reopened
- synchronize
jobs:
add-pr-size-label:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v1
- name: Install PR sizing label script
run: |
# Clone into a temporary directory to avoid overwriting
# any existing github directory.
pushd $(mktemp -d) &>/dev/null
git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts
sudo install pr-add-size-label.sh /usr/local/bin
popd &>/dev/null
- name: Add PR sizing label
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_PR_SIZE_TOKEN }}
run: |
pr=${{ github.event.number }}
# Removing man-db, workflow kept failing, fixes: #4480
sudo apt -y remove --purge man-db
sudo apt -y install diffstat patchutils
pr-add-size-label.sh -p "$pr"

View File

@@ -1,29 +0,0 @@
on:
pull_request_target:
types: ["labeled", "closed"]
jobs:
backport:
name: Backport PR
runs-on: ubuntu-latest
if: |
github.event.pull_request.merged == true
&& contains(github.event.pull_request.labels.*.name, 'auto-backport')
&& (
(github.event.action == 'labeled' && github.event.label.name == 'auto-backport')
|| (github.event.action == 'closed')
)
steps:
- name: Backport Action
uses: sqren/backport-github-action@v8.9.2
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
auto_backport_label_prefix: backport-to-
- name: Info log
if: ${{ success() }}
run: cat /home/runner/.backport/backport.info.log
- name: Debug log
if: ${{ failure() }}
run: cat /home/runner/.backport/backport.debug.log

View File

@@ -1,19 +0,0 @@
name: Cargo Crates Check Runner
on: [pull_request]
jobs:
cargo-deny-runner:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
- name: Generate Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: bash cargo-deny-generator.sh
working-directory: ./.github/cargo-deny-composite-action/
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Run Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: ./.github/cargo-deny-composite-action

View File

@@ -10,7 +10,7 @@ env:
error_msg: |+
See the document below for help on formatting commits for the project.
https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md#patch-format
https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-format
jobs:
commit-message-check:
@@ -63,8 +63,7 @@ jobs:
# the entire commit message.
#
# - Body lines *can* be longer than the maximum if they start
# with a non-alphabetic character or if there is no whitespace in
# the line.
# with a non-alphabetic character.
#
# This allows stack traces, log files snippets, emails, long URLs,
# etc to be specified. Some of these naturally "work" as they start
@@ -75,8 +74,8 @@ jobs:
#
# - A SoB comment can be any length (as it is unreasonable to penalise
# people with long names/email addresses :)
pattern: '^.+(\n([a-zA-Z].{0,150}|[^a-zA-Z\n].*|[^\s\n]*|Signed-off-by:.*|))+$'
error: 'Body line too long (max 150)'
pattern: '^.+(\n([a-zA-Z].{0,149}|[^a-zA-Z\n].*|Signed-off-by:.*|))+$'
error: 'Body line too long (max 72)'
post_error: ${{ env.error_msg }}
- name: Check Fixes

View File

@@ -1,41 +0,0 @@
on:
schedule:
- cron: '0 23 * * 0'
name: Docs URL Alive Check
jobs:
test:
strategy:
matrix:
go-version: [1.17.x]
os: [ubuntu-20.04]
runs-on: ${{ matrix.os }}
# don't run this action on forks
if: github.repository_owner == 'kata-containers'
env:
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: ${{ matrix.go-version }}
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Set env
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
uses: actions/checkout@v2
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
# docs url alive check
- name: Docs URL Alive Check
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make docs-url-alive-check

View File

@@ -24,7 +24,6 @@ jobs:
- firecracker
- rootfs-image
- rootfs-initrd
- virtiofsd
steps:
- uses: actions/checkout@v2
- name: Install docker

View File

@@ -48,7 +48,6 @@ jobs:
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- name: get-PR-ref
id: get-PR-ref

View File

@@ -1,8 +1,8 @@
name: Publish Kata release artifacts
name: Publish Kata 2.x release artifacts
on:
push:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
- '2.*'
jobs:
build-asset:
@@ -17,7 +17,6 @@ jobs:
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- uses: actions/checkout@v2
- name: Install docker

View File

@@ -1,9 +1,8 @@
name: Release Kata in snapcraft store
name: Release Kata 2.x in snapcraft store
on:
push:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
- '2.*'
jobs:
release-snap:
runs-on: ubuntu-20.04
@@ -20,8 +19,6 @@ jobs:
- name: Build snap
run: |
# Removing man-db, workflow kept failing, fixes: #4480
sudo apt -y remove --purge man-db
sudo apt-get install -y git git-extras
kata_url="https://github.com/kata-containers/kata-containers"
latest_version=$(git ls-remote --tags ${kata_url} | egrep -o "refs.*" | egrep -v "\-alpha|\-rc|{}" | egrep -o "[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" | sort -V -r | head -1)

1
.gitignore vendored
View File

@@ -10,5 +10,4 @@ src/agent/kata-agent.service
src/agent/protocols/src/*.rs
!src/agent/protocols/src/lib.rs
build
src/tools/log-parser/kata-log-parser

View File

@@ -6,23 +6,24 @@
# List of available components
COMPONENTS =
COMPONENTS += libs
COMPONENTS += agent
COMPONENTS += runtime
COMPONENTS += runtime-rs
# List of available tools
TOOLS =
TOOLS += agent-ctl
TOOLS += trace-forwarder
TOOLS += runk
TOOLS += log-parser
STANDARD_TARGETS = build check clean install test vendor
default: all
all: logging-crate-tests build
logging-crate-tests:
make -C src/libs/logging
include utils.mk
include ./tools/packaging/kata-deploy/local-build/Makefile
@@ -38,15 +39,10 @@ generate-protocols:
static-checks: build
bash ci/static-checks.sh
docs-url-alive-check:
bash ci/docs-url-alive-check.sh
.PHONY: \
all \
binary-tarball \
default \
install-binary-tarball \
static-checks \
docs-url-alive-check
logging-crate-tests \
static-checks

View File

@@ -71,7 +71,6 @@ See the [official documentation](docs) including:
- [Developer guide](docs/Developer-Guide.md)
- [Design documents](docs/design)
- [Architecture overview](docs/design/architecture)
- [Architecture 3.0 overview](docs/design/architecture_3.0/)
## Configuration
@@ -117,12 +116,8 @@ The table below lists the core parts of the project:
| Component | Type | Description |
|-|-|-|
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [runtime-rs](src/runtime-rs) | core | The Rust version runtime. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [`dragonball`](src/dragonball) | core | An optional built-in VMM brings out-of-the-box Kata Containers experience with optimizations on container workloads |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [tests](https://github.com/kata-containers/tests) | tests | Excludes unit tests which live with the main code. |
### Additional components
@@ -136,7 +131,6 @@ The table below lists the remaining parts of the project:
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
@@ -144,7 +138,7 @@ The table below lists the remaining parts of the project:
Kata Containers is now
[available natively for most distributions](docs/install/README.md#packaged-installation-methods).
However, packaging scripts and metadata are still used to generate [snap](snap/local) and GitHub releases. See
However, packaging scripts and metadata are still used to generate snap and GitHub releases. See
the [components](#components) section for further details.
## Glossary of Terms

View File

@@ -1 +1 @@
3.0.0
2.4.3

View File

@@ -11,10 +11,10 @@ runtimedir=$cidir/../src/runtime
build_working_packages() {
# working packages:
device_api=$runtimedir/pkg/device/api
device_config=$runtimedir/pkg/device/config
device_drivers=$runtimedir/pkg/device/drivers
device_manager=$runtimedir/pkg/device/manager
device_api=$runtimedir/virtcontainers/device/api
device_config=$runtimedir/virtcontainers/device/config
device_drivers=$runtimedir/virtcontainers/device/drivers
device_manager=$runtimedir/virtcontainers/device/manager
rc_pkg_dir=$runtimedir/pkg/resourcecontrol/
utils_pkg_dir=$runtimedir/virtcontainers/utils

View File

@@ -1,6 +1,6 @@
#!/bin/bash
#!/usr/bin/env bash
#
# Copyright (c) 2021 Easystack Inc.
# Copyright (c) 2020 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
@@ -9,4 +9,4 @@ set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
run_docs_url_alive_check
run_go_test

View File

@@ -19,31 +19,29 @@ source "${tests_repo_dir}/.ci/lib.sh"
# fail. So let's ensure they are unset here.
unset PREFIX DESTDIR
arch=${ARCH:-$(uname -m)}
arch=$(uname -m)
workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
libseccomp_version="${LIBSECCOMP_VERSION:-""}"
if [ -z "${libseccomp_version}" ]; then
libseccomp_version=$(get_version "externals.libseccomp.version")
fi
libseccomp_url="${LIBSECCOMP_URL:-""}"
if [ -z "${libseccomp_url}" ]; then
libseccomp_url=$(get_version "externals.libseccomp.url")
fi
# Currently, specify the libseccomp version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# libseccomp_version=$(get_version "externals.libseccomp.version")
# libseccomp_url=$(get_version "externals.libseccomp.url")
libseccomp_version="2.5.1"
libseccomp_url="https://github.com/seccomp/libseccomp"
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
cflags="-O2"
# Variables for gperf
gperf_version="${GPERF_VERSION:-""}"
if [ -z "${gperf_version}" ]; then
gperf_version=$(get_version "externals.gperf.version")
fi
gperf_url="${GPERF_URL:-""}"
if [ -z "${gperf_url}" ]; then
gperf_url=$(get_version "externals.gperf.url")
fi
# Currently, specify the gperf version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# gperf_version=$(get_version "externals.gperf.version")
# gperf_url=$(get_version "externals.gperf.url")
gperf_version="3.1"
gperf_url="https://ftp.gnu.org/gnu/gperf"
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
@@ -72,8 +70,7 @@ build_and_install_gperf() {
curl -sLO "${gperf_tarball_url}"
tar -xf "${gperf_tarball}"
pushd "gperf-${gperf_version}"
# Unset $CC for configure, we will always use native for gperf
CC= ./configure --prefix="${gperf_install_dir}"
./configure --prefix="${gperf_install_dir}"
make
make install
export PATH=$PATH:"${gperf_install_dir}"/bin
@@ -87,7 +84,7 @@ build_and_install_libseccomp() {
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static
make
make install
popd

24
ci/install_musl.sh Executable file
View File

@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# Copyright (c) 2020 Ant Group
#
# SPDX-License-Identifier: Apache-2.0
#
set -e
install_aarch64_musl() {
local arch=$(uname -m)
if [ "${arch}" == "aarch64" ]; then
local musl_tar="${arch}-linux-musl-native.tgz"
local musl_dir="${arch}-linux-musl-native"
pushd /tmp
if curl -sLO --fail https://musl.cc/${musl_tar}; then
tar -zxf ${musl_tar}
mkdir -p /usr/local/musl/
cp -r ${musl_dir}/* /usr/local/musl/
fi
popd
fi
}
install_aarch64_musl

View File

@@ -18,13 +18,6 @@ clone_tests_repo()
{
if [ -d "$tests_repo_dir" ]; then
[ -n "${CI:-}" ] && return
# git config --global --add safe.directory will always append
# the target to .gitconfig without checking the existence of
# the target, so it's better to check it before adding the target repo.
local sd="$(git config --global --get safe.directory ${tests_repo_dir} || true)"
if [ -z "${sd}" ]; then
git config --global --add safe.directory ${tests_repo_dir}
fi
pushd "${tests_repo_dir}"
git checkout "${branch}"
git pull
@@ -46,21 +39,8 @@ run_static_checks()
bash "$tests_repo_dir/.ci/static-checks.sh" "$@"
}
run_docs_url_alive_check()
run_go_test()
{
clone_tests_repo
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
bash "$tests_repo_dir/.ci/static-checks.sh" --docs --all "github.com/kata-containers/kata-containers"
}
run_get_pr_changed_file_details()
{
clone_tests_repo
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
source "$tests_repo_dir/.ci/lib.sh"
get_pr_changed_file_details
bash "$tests_repo_dir/.ci/go-test.sh"
}

View File

@@ -1,33 +0,0 @@
targets = [
{ triple = "x86_64-apple-darwin" },
{ triple = "x86_64-unknown-linux-gnu" },
{ triple = "x86_64-unknown-linux-musl" },
]
[advisories]
vulnerability = "deny"
unsound = "deny"
unmaintained = "deny"
ignore = ["RUSTSEC-2020-0071"]
[bans]
multiple-versions = "allow"
deny = [
{ name = "cmake" },
{ name = "openssl-sys" },
]
[licenses]
unlicensed = "deny"
allow-osi-fsf-free = "neither"
copyleft = "allow"
# We want really high confidence when inferring licenses from text
confidence-threshold = 0.93
allow = ["0BSD", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "CC0-1.0", "ISC", "MIT", "MPL-2.0"]
private = { ignore = true}
exceptions = []
[sources]
unknown-registry = "allow"
unknown-git = "allow"

View File

@@ -116,7 +116,7 @@ detailed below.
The Kata logs appear in the `containerd` log files, along with logs from `containerd` itself.
For more information about `containerd` debug, please see the
[`containerd` documentation](https://github.com/containerd/containerd/blob/main/docs/getting-started.md).
[`containerd` documentation](https://github.com/containerd/containerd/blob/master/docs/getting-started.md).
#### Enabling full `containerd` debug
@@ -425,7 +425,7 @@ To build utilizing the same options as Kata, you should make use of the `configu
$ cd $your_qemu_directory
$ $packaging_dir/scripts/configure-hypervisor.sh kata-qemu > kata.cfg
$ eval ./configure "$(cat kata.cfg)"
$ make -j $(nproc --ignore=1)
$ make -j $(nproc)
$ sudo -E make install
```
@@ -465,7 +465,7 @@ script and paste its output directly into a
> [runtime](../src/runtime) repository.
To perform analysis on Kata logs, use the
[`kata-log-parser`](../src/tools/log-parser)
[`kata-log-parser`](https://github.com/kata-containers/tests/tree/main/cmd/log-parser)
tool, which can convert the logs into formats (e.g. JSON, TOML, XML, and YAML).
See [Set up a debug console](#set-up-a-debug-console).
@@ -522,7 +522,7 @@ bash-4.2# exit
exit
```
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/main/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/master/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
with Kubernetes. For CRI-O, the namespace should set to `default` explicitly. This should not be confused with [Kubernetes namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
For other CRI-runtimes and configurations, you may need to set the namespace utilizing the `runtime-namespace` option.
@@ -700,11 +700,11 @@ options to have the kernel boot messages logged into the system journal.
For generic information on enabling debug in the configuration file, see the
[Enable full debug](#enable-full-debug) section.
The kernel boot messages will appear in the `kata` logs (and in the `containerd` or `CRI-O` log appropriately).
The kernel boot messages will appear in the `containerd` or `CRI-O` log appropriately,
such as:
```bash
$ sudo journalctl -t kata
$ sudo journalctl -t containerd
-- Logs begin at Thu 2020-02-13 16:20:40 UTC, end at Thu 2020-02-13 16:30:23 UTC. --
...
time="2020-09-15T14:56:23.095113803+08:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791/console.sock pid=107642 sandbox=ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791 source=virtcontainers subsystem=sandbox vmconsole="[ 0.395399] brd: module loaded"
@@ -714,4 +714,3 @@ time="2020-09-15T14:56:23.105268162+08:00" level=debug msg="reading guest consol
time="2020-09-15T14:56:23.121121598+08:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791/console.sock pid=107642 sandbox=ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791 source=virtcontainers subsystem=sandbox vmconsole="[ 0.421324] memmap_init_zone_device initialised 32768 pages in 12ms"
...
```
Refer to the [kata-log-parser documentation](../src/tools/log-parser/README.md) which is useful to fetch these.

View File

@@ -46,7 +46,7 @@ The following link shows the latest list of limitations:
# Contributing
If you would like to work on resolving a limitation, please refer to the
[contributors guide](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
[contributors guide](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).
If you wish to raise an issue for a new limitation, either
[raise an issue directly on the runtime](https://github.com/kata-containers/kata-containers/issues/new)
or see the
@@ -60,26 +60,17 @@ This section lists items that might be possible to fix.
## OCI CLI commands
### Docker and Podman support
Currently Kata Containers does not support Podman.
Currently Kata Containers does not support Docker or Podman.
See issue https://github.com/kata-containers/kata-containers/issues/722 for more information.
Docker supports Kata Containers since 22.06:
```bash
$ sudo docker run --runtime io.containerd.kata.v2
```
Kata Containers works perfectly with containerd, we recommend to use
containerd's Docker-style command line tool [`nerdctl`](https://github.com/containerd/nerdctl).
## Runtime commands
### checkpoint and restore
The runtime does not provide `checkpoint` and `restore` commands. There
are discussions about using VM save and restore to give us a
[`criu`](https://github.com/checkpoint-restore/criu)-like functionality,
`[criu](https://github.com/checkpoint-restore/criu)`-like functionality,
which might provide a solution.
Note that the OCI standard does not specify `checkpoint` and `restore`
@@ -102,42 +93,6 @@ All other configurations are supported and are working properly.
## Networking
### Host network
Host network (`nerdctl/docker run --net=host`or [Kubernetes `HostNetwork`](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#hosts-namespaces)) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
It should be noted, currently passing the `--net=host` option into a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
### Support for joining an existing VM network
Docker supports the ability for containers to join another containers
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in the network namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is setup to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
See more documentation at
[docs.docker.com](https://docs.docker.com/network/links/).
## Resource management
Due to the way VMs differ in their CPU and memory allocation, and sharing

View File

@@ -30,6 +30,7 @@ See the [how-to documentation](how-to).
* [GPU Passthrough with Kata](./use-cases/GPU-passthrough-and-Kata.md)
* [SR-IOV with Kata](./use-cases/using-SRIOV-and-kata.md)
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [VPP with Kata](./use-cases/using-vpp-and-kata.md)
* [SPDK vhost-user with Kata](./use-cases/using-SPDK-vhostuser-and-kata.md)
* [Intel SGX with Kata](./use-cases/using-Intel-SGX-and-kata.md)

View File

@@ -277,9 +277,7 @@ mod tests {
## Temporary files
Use `t.TempDir()` to create temporary directory. The directory created by
`t.TempDir()` is automatically removed when the test and all its subtests
complete.
Always delete temporary files on success.
### Golang temporary files
@@ -288,7 +286,11 @@ func TestSomething(t *testing.T) {
assert := assert.New(t)
// Create a temporary directory
tmpdir := t.TempDir()
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
// Delete it at the end of the test
defer os.RemoveAll(tmpdir)
// Add test logic that will use the tmpdir here...
}
@@ -320,7 +322,7 @@ mod tests {
## Test user
[Unit tests are run *twice*](../src/runtime/go-test.sh):
[Unit tests are run *twice*](https://github.com/kata-containers/tests/blob/main/.ci/go-test.sh):
- as the current user
- as the `root` user (if different to the current user)
@@ -341,7 +343,7 @@ The main repository has the most comprehensive set of skip abilities. See:
One method is to use the `nix` crate along with some custom macros:
```rust
```
#[cfg(test)]
mod tests {
#[allow(unused_macros)]

View File

@@ -79,7 +79,7 @@ a "`BUG: feature X not implemented see {bug-url}`" type error.
- Don't use multiple log calls when a single log call could be used.
- Use structured logging where possible to allow
[standard tooling](../src/tools/log-parser)
[standard tooling](https://github.com/kata-containers/tests/tree/main/cmd/log-parser)
be able to extract the log fields.
### Names

View File

@@ -11,8 +11,7 @@ Kata Containers design documents:
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
- [Design for direct-assigned volume](direct-blk-device-assignment.md)
- [Design for core-scheduling](core-scheduling.md)
---
- [Design proposals](proposals)

View File

@@ -67,15 +67,22 @@ Using a proxy for multiplexing the connections between the VM and the host uses
4.5MB per [POD][2]. In a high density deployment this could add up to GBs of
memory that could have been used to host more PODs. When we talk about density
each kilobyte matters and it might be the decisive factor between run another
POD or not. Before making the decision not to use VSOCKs, you should ask
POD or not. For example if you have 500 PODs running in a server, the same
amount of [`kata-proxy`][3] processes will be running and consuming for around
2250MB of RAM. Before making the decision not to use VSOCKs, you should ask
yourself, how many more containers can run with the memory RAM consumed by the
Kata proxies?
### Reliability
[`kata-proxy`][3] is in charge of multiplexing the connections between virtual
machine and host processes, if it dies all connections get broken. For example
if you have a [POD][2] with 10 containers running, if `kata-proxy` dies it would
be impossible to contact your containers, though they would still be running.
Since communication via VSOCKs is direct, the only way to lose communication
with the containers is if the VM itself or the `containerd-shim-kata-v2` dies, if this happens
the containers are removed automatically.
[1]: https://wiki.qemu.org/Features/VirtioVsock
[2]: ./vcpu-handling.md#virtual-cpus-and-kubernetes-pods
[3]: https://github.com/kata-containers/proxy

View File

@@ -17,7 +17,7 @@ Kubelet instance is responsible for managing the lifecycle of pods
within the nodes and eventually relies on a container runtime to
handle execution. The Kubelet architecture decouples lifecycle
management from container execution through a dedicated gRPC based
[Container Runtime Interface (CRI)](https://github.com/kubernetes/design-proposals-archive/blob/main/node/container-runtime-interface-v1.md).
[Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/container-runtime-interface-v1.md).
In other words, a Kubelet is a CRI client and expects a CRI
implementation to handle the server side of the interface.

View File

@@ -1,17 +1,5 @@
# Storage
## Limits
Kata Containers is [compatible](README.md#compatibility) with existing
standards and runtime. From the perspective of storage, this means no
limits are placed on the amount of storage a container
[workload](README.md#workload) may use.
Since cgroups are not able to set limits on storage allocation, if you
wish to constrain the amount of storage a container uses, consider
using an existing facility such as `quota(1)` limits or
[device mapper](#devicemapper) limits.
## virtio SCSI
If a block-based graph driver is [configured](README.md#configuration),
@@ -32,7 +20,7 @@ For virtio-fs, the [runtime](README.md#runtime) starts one `virtiofsd` daemon
## Devicemapper
The
[devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/main/snapshots/devmapper)
[devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/master/snapshots/devmapper)
is a special case. The `snapshotter` uses dedicated block devices
rather than formatted filesystems, and operates at the block level
rather than the file level. This knowledge is used to directly use the

View File

@@ -1,169 +0,0 @@
# Kata 3.0 Architecture
## Overview
In cloud-native scenarios, there is an increased demand for container startup speed, resource consumption, stability, and security, areas where the present Kata Containers runtime is challenged relative to other runtimes. To achieve this, we propose a solid, field-tested and secure Rust version of the kata-runtime.
Also, we provide the following designs:
- Turn key solution with builtin `Dragonball` Sandbox
- Async I/O to reduce resource consumption
- Extensible framework for multiple services, runtimes and hypervisors
- Lifecycle management for sandbox and container associated resources
### Rationale for choosing Rust
We chose Rust because it is designed as a system language with a focus on efficiency.
In contrast to Go, Rust makes a variety of design trade-offs in order to obtain
good execution performance, with innovative techniques that, in contrast to C or
C++, provide reasonable protection against common memory errors (buffer
overflow, invalid pointers, range errors), error checking (ensuring errors are
dealt with), thread safety, ownership of resources, and more.
These benefits were verified in our project when the Kata Containers guest agent
was rewritten in Rust. We notably saw a significant reduction in memory usage
with the Rust-based implementation.
## Design
### Architecture
![architecture](./images/architecture.png)
### Built-in VMM
#### Current Kata 2.x architecture
![not_builtin_vmm](./images/not_built_in_vmm.png)
As shown in the figure, runtime and VMM are separate processes. The runtime process forks the VMM process and interacts through the inter-process RPC. Typically, process interaction consumes more resources than peers within the process, and it will result in relatively low efficiency. At the same time, the cost of resource operation and maintenance should be considered. For example, when performing resource recovery under abnormal conditions, the exception of any process must be detected by others and activate the appropriate resource recovery process. If there are additional processes, the recovery becomes even more difficult.
#### How To Support Built-in VMM
We provide `Dragonball` Sandbox to enable built-in VMM by integrating VMM's function into the Rust library. We could perform VMM-related functionalities by using the library. Because runtime and VMM are in the same process, there is a benefit in terms of message processing speed and API synchronization. It can also guarantee the consistency of the runtime and the VMM life cycle, reducing resource recovery and exception handling maintenance, as shown in the figure:
![builtin_vmm](./images/built_in_vmm.png)
### Async Support
#### Why Need Async
**Async is already in stable Rust and allows us to write async code**
- Async provides significantly reduced CPU and memory overhead, especially for workloads with a large amount of IO-bound tasks
- Async is zero-cost in Rust, which means that you only pay for what you use. Specifically, you can use async without heap allocations and dynamic dispatch, which greatly improves efficiency
- For more (see [Why Async?](https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html) and [The State of Asynchronous Rust](https://rust-lang.github.io/async-book/01_getting_started/03_state_of_async_rust.html)).
**There may be several problems if implementing kata-runtime with Sync Rust**
- Too many threads with a new TTRPC connection
- TTRPC threads: reaper thread(1) + listener thread(1) + client handler(2)
- Add 3 I/O threads with a new container
- In Sync mode, implementing a timeout mechanism is challenging. For example, in TTRPC API interaction, the timeout mechanism is difficult to align with Golang
#### How To Support Async
The kata-runtime is controlled by TOKIO_RUNTIME_WORKER_THREADS to run the OS thread, which is 2 threads by default. For TTRPC and container-related threads run in the `tokio` thread in a unified manner, and related dependencies need to be switched to Async, such as Timer, File, Netlink, etc. With the help of Async, we can easily support no-block I/O and timer. Currently, we only utilize Async for kata-runtime. The built-in VMM keeps the OS thread because it can ensure that the threads are controllable.
**For N tokio worker threads and M containers**
- Sync runtime(both OS thread and `tokio` task are OS thread but without `tokio` worker thread) OS thread number: 4 + 12*M
- Async runtime(only OS thread is OS thread) OS thread number: 2 + N
```shell
├─ main(OS thread)
├─ async-logger(OS thread)
└─ tokio worker(N * OS thread)
├─ agent log forwarder(1 * tokio task)
├─ health check thread(1 * tokio task)
├─ TTRPC reaper thread(M * tokio task)
├─ TTRPC listener thread(M * tokio task)
├─ TTRPC client handler thread(7 * M * tokio task)
├─ container stdin io thread(M * tokio task)
├─ container stdin io thread(M * tokio task)
└─ container stdin io thread(M * tokio task)
```
### Extensible Framework
The Kata 3.x runtime is designed with the extension of service, runtime, and hypervisor, combined with configuration to meet the needs of different scenarios. At present, the service provides a register mechanism to support multiple services. Services could interact with runtime through messages. In addition, the runtime handler handles messages from services. To meet the needs of a binary that supports multiple runtimes and hypervisors, the startup must obtain the runtime handler type and hypervisor type through configuration.
![framework](./images/framework.png)
### Resource Manager
In our case, there will be a variety of resources, and every resource has several subtypes. Especially for `Virt-Container`, every subtype of resource has different operations. And there may be dependencies, such as the share-fs rootfs and the share-fs volume will use share-fs resources to share files to the VM. Currently, network and share-fs are regarded as sandbox resources, while rootfs, volume, and cgroup are regarded as container resources. Also, we abstract a common interface for each resource and use subclass operations to evaluate the differences between different subtypes.
![resource manager](./images/resourceManager.png)
## Roadmap
- Stage 1 (June): provide basic features (current delivered)
- Stage 2 (September): support common features
- Stage 3: support full features
| **Class** | **Sub-Class** | **Development Stage** | **Status** |
| -------------------------- | ------------------- | --------------------- |------------|
| Service | task service | Stage 1 | ✅ |
| | extend service | Stage 3 | 🚫 |
| | image service | Stage 3 | 🚫 |
| Runtime handler | `Virt-Container` | Stage 1 | ✅ |
| Endpoint | VETH Endpoint | Stage 1 | ✅ |
| | Physical Endpoint | Stage 2 | ✅ |
| | Tap Endpoint | Stage 2 | ✅ |
| | `Tuntap` Endpoint | Stage 2 | ✅ |
| | `IPVlan` Endpoint | Stage 2 | ✅ |
| | `MacVlan` Endpoint | Stage 2 | ✅ |
| | MACVTAP Endpoint | Stage 3 | 🚫 |
| | `VhostUserEndpoint` | Stage 3 | 🚫 |
| Network Interworking Model | Tc filter | Stage 1 | ✅ |
| | `MacVtap` | Stage 3 | 🚧 |
| Storage | Virtio-fs | Stage 1 | ✅ |
| | `nydus` | Stage 2 | 🚧 |
| | `device mapper` | Stage 2 | 🚫 |
| `Cgroup V2` | | Stage 2 | 🚧 |
| Hypervisor | `Dragonball` | Stage 1 | 🚧 |
| | QEMU | Stage 2 | 🚫 |
| | ACRN | Stage 3 | 🚫 |
| | Cloud Hypervisor | Stage 3 | 🚫 |
| | Firecracker | Stage 3 | 🚫 |
## FAQ
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?
Yes. They are components in Kata 3.x runtime. And they will be packed into one binary.
1. Service is an interface, which is responsible for handling multiple services like task service, image service and etc.
2. Message dispatcher, it is used to match multiple requests from the service module.
3. Runtime handler is used to deal with the operation for sandbox and container.
- What is the name of the Kata 3.x runtime binary?
Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcomed.
Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
Yes. It is designed to follow the functionality of go version kata. And it implements the `containerd shim v2` interface/protocol.
- How will users migrate to the Kata 3.x architecture?
The migration plan will be provided before the Kata 3.x is merging into the main branch.
- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?
The `Dragonball` could work as an external hypervisor. However, stability and performance is challenging in this case. Built in VMM could optimise the container overhead, and it's easy to maintain stability.
`runD` is the `containerd-shim-v2` counterpart of `runC` and can run a pod/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of `microvm`/VMM, we sometimes refer to it as secure sandbox.
- QEMU, Cloud Hypervisor and Firecracker support are planned, but how that would work. Are they working in separate process?
Yes. They are unable to work as built in VMM.
- What is `upcall`?
The `upcall` is used to hotplug CPU/memory/MMIO devices, and it solves two issues.
1. avoid dependency on PCI/ACPI
2. avoid dependency on `udevd` within guest and get deterministic results for hotplug operations. So `upcall` is an alternative to ACPI based CPU/memory/device hotplug. And we may cooperate with the community to add support for ACPI based CPU/memory/device hotplug if needed.
`Dbs-upcall` is a `vsock-based` direct communication tool between VMM and guests. The server side of the `upcall` is a driver in guest kernel (kernel patches are needed for this feature) and it'll start to serve the requests once the kernel has started. And the client side is in VMM , it'll be a thread that communicates with VSOCK through `uds`. We have accomplished device hotplug / hot-unplug directly through `upcall` in order to avoid virtualization of ACPI to minimize virtual machine's overhead. And there could be many other usage through this direct communication channel. It's already open source.
https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
- The URL below says the kernel patches work with 4.19, but do they also work with 5.15+ ?
Forward compatibility should be achievable, we have ported it to 5.10 based kernel.
- Are these patches platform-specific or would they work for any architecture that supports VSOCK?
It's almost platform independent, but some message related to CPU hotplug are platform dependent.
- Could the kernel driver be replaced with a userland daemon in the guest using loopback VSOCK?
We need to create device nodes for hot-added CPU/memory/devices, so it's not easy for userspace daemon to do these tasks.
- The fact that `upcall` allows communication between the VMM and the guest suggests that this architecture might be incompatible with https://github.com/confidential-containers where the VMM should have no knowledge of what happens inside the VM.
1. `TDX` doesn't support CPU/memory hotplug yet.
2. For ACPI based device hotplug, it depends on ACPI `DSDT` table, and the guest kernel will execute `ASL` code to handle during handling those hotplug event. And it should be easier to audit VSOCK based communication than ACPI `ASL` methods.
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in next stage.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 95 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 66 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 136 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 72 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 139 KiB

File diff suppressed because one or more lines are too long

View File

@@ -1,12 +0,0 @@
# Core scheduling
Core scheduling is a Linux kernel feature that allows only trusted tasks to run concurrently on
CPUs sharing compute resources (for example, hyper-threads on a core).
Containerd versions >= 1.6.4 leverage this to treat all of the processes associated with a
given pod or container to be a single group of trusted tasks. To indicate this should be carried
out, containerd sets the `SCHED_CORE` environment variable for each shim it spawns. When this is
set, the Kata Containers shim implementation uses the `prctl` syscall to create a new core scheduling
domain for the shim process itself as well as future VMM processes it will start.
For more details on the core scheduling feature, see the [Linux documentation](https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html).

View File

@@ -1,253 +0,0 @@
# Motivation
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, its cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, its difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. Its also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
* **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
type MountInfo struct {
// The type of the volume (ie. block)
VolumeType string `json:"volume-type"`
// The device backing the volume.
Device string `json:"device"`
// The filesystem type to be mounted on the volume.
FsType string `json:"fstype"`
// Additional metadata to pass to the agent regarding this volume.
Metadata map[string]string `json:"metadata,omitempty"`
// Additional mount options.
Options []string `json:"options,omitempty"`
}
```
Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime, it shouldn't container any secrets (such as SMB mount password).
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mount-info.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
`containerd-shim-kata-v2` provides an API for sandbox management through a Unix domain socket. Two new handlers are proposed: `/direct-volume/stats` and `/direct-volume/resize`:
Example:
```bash
$ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volume/stats/[urlSafeVolumePath]'
$ curl --unix-socket "$shim_socket_path" -I -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath"": [volumePath], "Size": "123123" }'
```
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
### Kata agent changes
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
Two new APIs and three new GRPC objects are added to GRPC protocol between the shim and agent for resizing and getting volume stats:
```protobuf
rpc GetVolumeStats(VolumeStatsRequest) returns (VolumeStatsResponse);
rpc ResizeVolume(ResizeVolumeRequest) returns (google.protobuf.Empty);
message VolumeStatsRequest {
// The volume path on the guest outside the container
string volume_guest_path = 1;
}
message ResizeVolumeRequest {
// Full VM guest path of the volume (outside the container)
string volume_guest_path = 1;
uint64 size = 2;
}
// This should be kept in sync with CSI NodeGetVolumeStatsResponse (https://github.com/container-storage-interface/spec/blob/v1.5.0/csi.proto)
message VolumeStatsResponse {
// This field is OPTIONAL.
repeated VolumeUsage usage = 1;
// Information about the current condition of the volume.
// This field is OPTIONAL.
// This field MUST be specified if the VOLUME_CONDITION node
// capability is supported.
VolumeCondition volume_condition = 2;
}
message VolumeUsage {
enum Unit {
UNKNOWN = 0;
BYTES = 1;
INODES = 2;
}
// The available capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 available = 1;
// The total capacity in specified Unit. This field is REQUIRED.
// The value of this field MUST NOT be negative.
uint64 total = 2;
// The used capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 used = 3;
// Units by which values are measured. This field is REQUIRED.
Unit unit = 4;
}
// VolumeCondition represents the current condition of a volume.
message VolumeCondition {
// Normal volumes are available for use and operating optimally.
// An abnormal volume does not meet these criteria.
// This field is REQUIRED.
bool abnormal = 1;
// The message describing the condition of the volume.
// This field is REQUIRED.
string message = 2;
}
```
### Step by step walk-through
Given the following definition:
```YAML
---
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
runtime-class: kata-qemu
containers:
- name: app
image: centos
command: ["/bin/sh"]
args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
volumeMounts:
- name: persistent-storage
mountPath: /data
volumes:
- name: persistent-storage
persistentVolumeClaim:
claimName: ebs-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
skip-hostmount: "true"
name: ebs-claim
spec:
accessModes:
- ReadWriteOncePod
volumeMode: Filesystem
storageClassName: ebs-sc
resources:
requests:
storage: 4Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
csi.storage.k8s.io/fstype: ext4
```
Lets assume that changes have been made in the `aws-ebs-csi-driver` node driver.
**Node publish volume**
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Node unstage volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Use the volume in sandbox**
1. Upon the request to start a container, the `containerd-shim-kata-v2` examines the container spec,
and iterates through the mounts. For each mount, if there is a `mountInfo.json` file under `/run/kata-containers/shared/direct-volumes/[mount source path]`,
it generates a `storage` GRPC object after overwriting the mount spec with the information in `mountInfo.json`.
2. The shim sends the storage objects to kata-agent through TTRPC.
3. The shim writes a file with the sandbox id as the name under `/run/kata-containers/shared/direct-volumes/[mount source path]`.
4. The kata-agent mounts the storage objects for the container.
**Node expand volume**
1. In the node CSI driver, the `NodeExpandVolume` API invokes: `kata-runtime direct-volume resize -volume-path "/kubelet/a/b/c/d/sdf" -size 8Gi`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to resize the volume.
4. The shim handles the request, asks the hypervisor to resize the block device and sends a GRPC request to Kata agent to resize the filesystem.
5. Kata agent receives the request and resizes the filesystem.
**Node get volume stats**
1. In the node CSI driver, the `NodeGetVolumeStats` API invokes: `kata-runtime direct-volume stats -volume-path "/kubelet/a/b/c/d/sdf"`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to get the volume stats.
4. The shim handles the request and forwards it to the Kata agent.
5. Kata agent receives the request and returns the filesystem stats.

View File

@@ -12,7 +12,7 @@ The OCI [runtime specification][linux-config] provides guidance on where the con
> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container
The cgroups are hierarchical, and this can be seen with the following pod example:
Cgroups are hierarchical, and this can be seen with the following pod example:
- Pod 1: `cgroupsPath=/kubepods/pod1`
- Container 1: `cgroupsPath=/kubepods/pod1/container1`
@@ -247,14 +247,14 @@ cgroup size and constraints accordingly.
# Supported cgroups
Kata Containers currently supports cgroups `v1` and `v2`.
Kata Containers currently only supports cgroups `v1`.
In the following sections each cgroup is described briefly.
## cgroups v1
## Cgroups V1
`cgroups v1` are under a [`tmpfs`][1] filesystem mounted at `/sys/fs/cgroup`, where each cgroup is
mounted under a separate cgroup filesystem. A `cgroups v1` hierarchy may look like the following
`Cgroups V1` are under a [`tmpfs`][1] filesystem mounted at `/sys/fs/cgroup`, where each cgroup is
mounted under a separate cgroup filesystem. A `Cgroups v1` hierarchy may look like the following
diagram:
```
@@ -301,12 +301,13 @@ diagram:
A process can join a cgroup by writing its process id (`pid`) to `cgroup.procs` file,
or join a cgroup partially by writing the task (thread) id (`tid`) to the `tasks` file.
Kata Containers only supports `v1`.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].
## cgroups v2
## Cgroups V2
`cgroups v2` are also known as unified cgroups, unlike `cgroups v1`, the cgroups are
mounted under the same cgroup filesystem. A `cgroups v2` hierarchy may look like the following
`Cgroups v2` are also known as unified cgroups, unlike `cgroups v1`, the cgroups are
mounted under the same cgroup filesystem. A `Cgroups v2` hierarchy may look like the following
diagram:
```
@@ -353,6 +354,8 @@ Same as `cgroups v1`, a process can join the cgroup by writing its process id (`
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to
`cgroup.threads` file.
Kata Containers does not support cgroups `v2` on the host.
### Distro Support
Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.

View File

@@ -2,15 +2,24 @@
## Default number of virtual CPUs
Before starting a container, the [runtime][4] reads the `default_vcpus` option
from the [configuration file][5] to determine the number of virtual CPUs
Before starting a container, the [runtime][6] reads the `default_vcpus` option
from the [configuration file][7] to determine the number of virtual CPUs
(vCPUs) needed to start the virtual machine. By default, `default_vcpus` is
equal to 1 for fast boot time and a small memory footprint per virtual machine.
Be aware that increasing this value negatively impacts the virtual machine's
boot time and memory footprint.
In general, we recommend that you do not edit this variable, unless you know
what are you doing. If your container needs more than one vCPU, use
[Kubernetes `cpu` limits][1] to assign more resources.
[docker `--cpus`][1], [docker update][4], or [Kubernetes `cpu` limits][2] to
assign more resources.
*Docker*
```sh
$ docker run --name foo -ti --cpus 2 debian bash
$ docker update --cpus 4 foo
```
*Kubernetes*
@@ -40,7 +49,7 @@ $ sudo -E kubectl create -f ~/cpu-demo.yaml
## Virtual CPUs and Kubernetes pods
A Kubernetes pod is a group of one or more containers, with shared storage and
network, and a specification for how to run the containers [[specification][2]].
network, and a specification for how to run the containers [[specification][3]].
In Kata Containers this group of containers, which is called a sandbox, runs inside
the same virtual machine. If you do not specify a CPU constraint, the runtime does
not add more vCPUs and the container is not placed inside a CPU cgroup.
@@ -64,7 +73,13 @@ constraints with each container trying to consume 100% of vCPU, the resources
divide in two parts, 50% of vCPU for each container because your virtual
machine does not have enough resources to satisfy containers needs. If you want
to give access to a greater or lesser portion of vCPUs to a specific container,
use [Kubernetes `cpu` requests][1].
use [`docker --cpu-shares`][1] or [Kubernetes `cpu` requests][2].
*Docker*
```sh
$ docker run -ti --cpus-shares=512 debian bash
```
*Kubernetes*
@@ -94,9 +109,10 @@ $ sudo -E kubectl create -f ~/cpu-demo.yaml
Before running containers without CPU constraint, consider that your containers
are not running alone. Since your containers run inside a virtual machine other
processes use the vCPUs as well (e.g. `systemd` and the Kata Containers
[agent][3]). In general, we recommend setting `default_vcpus` equal to 1 to
[agent][5]). In general, we recommend setting `default_vcpus` equal to 1 to
allow non-container processes to run on this vCPU and to specify a CPU
constraint for each container.
constraint for each container. If your container is already running and needs
more vCPUs, you can add more using [docker update][4].
## Container with CPU constraint
@@ -105,7 +121,7 @@ constraints using the following formula: `vCPUs = ceiling( quota / period )`, wh
`quota` specifies the number of microseconds per CPU Period that the container is
guaranteed CPU access and `period` specifies the CPU CFS scheduler period of time
in microseconds. The result determines the number of vCPU to hot plug into the
virtual machine. Once the vCPUs have been added, the [agent][3] places the
virtual machine. Once the vCPUs have been added, the [agent][5] places the
container inside a CPU cgroup. This placement allows the container to use only
its assigned resources.
@@ -122,6 +138,25 @@ the virtual machine starts with 8 vCPUs and 1 vCPUs is added and assigned
to the container. Non-container processes might be able to use 8 vCPUs but they
use a maximum 1 vCPU, hence 7 vCPUs might not be used.
*Container without CPU constraint*
```sh
$ docker run -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_*"
1 # number of vCPUs
100000 # cfs period
-1 # cfs quota
```
*Container with CPU constraint*
```sh
docker run --cpus 4 -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_*"
5 # number of vCPUs
100000 # cfs period
400000 # cfs quota
```
## Virtual CPU handling without hotplug
In some cases, the hardware and/or software architecture being utilized does not support
@@ -148,8 +183,11 @@ the container's `spec` will provide the sizing information directly. If these ar
calculate the number of CPUs required for the workload and augment this by `default_vcpus`
configuration option, and use this for the virtual machine size.
[1]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
[2]: https://kubernetes.io/docs/concepts/workloads/pods/pod/
[3]: ../../src/agent
[4]: ../../src/runtime
[5]: ../../src/runtime/README.md#configuration
[1]: https://docs.docker.com/config/containers/resource_constraints/#cpu
[2]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
[3]: https://kubernetes.io/docs/concepts/workloads/pods/pod/
[4]: https://docs.docker.com/engine/reference/commandline/update/
[5]: ../../src/agent
[6]: ../../src/runtime
[7]: ../../src/runtime/README.md#configuration

View File

@@ -39,7 +39,7 @@ Details of each solution and a summary are provided below.
Kata Containers with QEMU has complete compatibility with Kubernetes.
Depending on the host architecture, Kata Containers supports various machine types,
for example `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `q35`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](architecture/README.md#configuration) file.
@@ -60,8 +60,9 @@ Machine accelerators are architecture specific and can be used to improve the pe
and enable specific features of the machine types. The following machine accelerators
are used in Kata Containers:
- NVDIMM: This machine accelerator is x86 specific and only supported by `q35` machine types.
`nvdimm` is used to provide the root filesystem as a persistent memory device to the Virtual Machine.
- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
memory device to the Virtual Machine.
#### Hotplug devices

View File

@@ -5,7 +5,7 @@
- [Run Kata containers with `crictl`](run-kata-with-crictl.md)
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
- [How to use Kata Containers and CRI (containerd) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)
@@ -15,11 +15,6 @@
- `qemu`
- `cloud-hypervisor`
- `firecracker`
In the case of `firecracker` the use of a block device `snapshotter` is needed
for the VM rootfs. Refer to the following guide for additional configuration
steps:
- [Setup Kata containers with `firecracker`](how-to-use-kata-containers-with-firecracker.md)
- `ACRN`
While `qemu` , `cloud-hypervisor` and `firecracker` work out of the box with installation of Kata,

View File

@@ -40,7 +40,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2) for Kata.
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
@@ -72,6 +72,7 @@ $ command -v containerd
### Install CNI plugins
> **Note:** You do not need to install CNI plugins if you do not want to use containerd with Kubernetes.
> If you have installed Kubernetes with `kubeadm`, you might have already installed the CNI plugins.
You can manually install CNI plugins as follows:
@@ -93,8 +94,8 @@ $ popd
You can install the `cri-tools` from source code:
```bash
$ go get github.com/kubernetes-sigs/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-sigs/cri-tools
$ go get github.com/kubernetes-incubator/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
$ make
$ sudo -E make install
$ popd
@@ -130,42 +131,74 @@ For
The `RuntimeClass` is suggested.
The following configuration includes two runtime classes:
The following configuration includes three runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/runtime/v2#binary-naming))
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/master/runtime/v2#binary-naming))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2)).
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)).
- `plugins.cri.containerd.runtimes.katacli`: the `containerd-shim-runc-v1` calls `kata-runtime`, which is the legacy process.
```toml
[plugins.cri.containerd]
no_pivot = false
[plugins.cri.containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
privileged_without_host_devices = false
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v1"
[plugins.cri.containerd.runtimes.runc.options]
NoPivotRoot = false
NoNewKeyring = false
ShimCgroup = ""
IoUid = 0
IoGid = 0
BinaryName = "runc"
Root = ""
CriuPath = ""
SystemdCgroup = false
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
container_annotations = ["io.katacontainers.*"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
[plugins.cri.containerd.runtimes.katacli]
runtime_type = "io.containerd.runc.v1"
[plugins.cri.containerd.runtimes.katacli.options]
NoPivotRoot = false
NoNewKeyring = false
ShimCgroup = ""
IoUid = 0
IoGid = 0
BinaryName = "/usr/bin/kata-runtime"
Root = ""
CriuPath = ""
SystemdCgroup = false
```
From Containerd v1.2.4 and Kata v1.6.0, there is a new runtime option supported, which allows you to specify a specific Kata configuration file as follows:
```toml
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
[plugins.cri.containerd.runtimes.kata.options]
ConfigPath = "/etc/kata-containers/config.toml"
```
`privileged_without_host_devices` tells containerd that a privileged Kata container should not have direct access to all host devices. If unset, containerd will pass all host devices to Kata container, which may cause security issues.
`pod_annotations` is the list of pod annotations passed to both the pod sandbox as well as container through the OCI config.
`container_annotations` is the list of container annotations passed through to the OCI config of the containers.
This `ConfigPath` option is optional. If you do not specify it, shimv2 first tries to get the configuration file from the environment variable `KATA_CONF_FILE`. If neither are set, shimv2 will use the default Kata configuration file paths (`/etc/kata-containers/configuration.toml` and `/usr/share/defaults/kata-containers/configuration.toml`).
If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0 and also want to specify a configuration file, you can use the following workaround, since the shimv2 accepts an environment variable, `KATA_CONF_FILE` for the configuration file path. Then, you can create a
shell script with the following:
```bash
#!/usr/bin/env bash
KATA_CONF_FILE=/etc/kata-containers/firecracker.toml containerd-shim-kata-v2 $@
```
Name it as `/usr/local/bin/containerd-shim-katafc-v2` and reference it in the configuration of containerd:
```toml
[plugins.cri.containerd.runtimes.kata-firecracker]
runtime_type = "io.containerd.katafc.v2"
```
#### Kata Containers as the runtime for untrusted workload
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
@@ -185,8 +218,28 @@ and then, run an untrusted workload with Kata Containers:
runtime_type = "io.containerd.kata.v2"
```
For the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
# "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
[plugins.cri.containerd.default_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
[plugins.cri.containerd.untrusted_workload_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# runtime_engine is the name of the runtime engine used by containerd.
runtime_engine = "/usr/bin/kata-runtime"
```
You can find more information on the [Containerd config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### Kata Containers as the default runtime
If you want to set Kata Containers as the only runtime in the deployment, you can simply configure as follows:
@@ -197,6 +250,15 @@ If you want to set Kata Containers as the only runtime in the deployment, you ca
runtime_type = "io.containerd.kata.v2"
```
Alternatively, for the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/bin/kata-runtime"
```
### Configuration for `cri-tools`
> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.
@@ -250,12 +312,10 @@ To run a container with Kata Containers through the containerd command line, you
```bash
$ sudo ctr image pull docker.io/library/busybox:latest
$ sudo ctr run --cni --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
$ sudo ctr run --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
```
This launches a BusyBox container named `hello`, and it will be removed by `--rm` after it quits.
The `--cni` flag enables CNI networking for the container. Without this flag, a container with just a
loopback interface is created.
### Launch Pods with `crictl` command line

View File

@@ -45,9 +45,6 @@ spec:
- name: containerdsocket
mountPath: /run/containerd/containerd.sock
readOnly: true
- name: sbs
mountPath: /run/vc/sbs/
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: containerdtask
@@ -56,6 +53,3 @@ spec:
- name: containerdsocket
hostPath:
path: /run/containerd/containerd.sock
- name: sbs
hostPath:
path: /run/vc/sbs/

View File

@@ -68,7 +68,7 @@ the Kata logs import to the EFK stack.
> stack they are able to utilise in order to modify and test as necessary.
Minikube by default
[configures](https://github.com/kubernetes/minikube/blob/master/deploy/iso/minikube-iso/board/minikube/x86_64/rootfs-overlay/etc/systemd/journald.conf)
[configures](https://github.com/kubernetes/minikube/blob/master/deploy/iso/minikube-iso/board/coreos/minikube/rootfs-overlay/etc/systemd/journald.conf)
the `systemd-journald` with the
[`Storage=volatile`](https://www.freedesktop.org/software/systemd/man/journald.conf.html) option,
which results in the journal being stored in `/run/log/journal`. Unfortunately, the Minikube EFK
@@ -163,7 +163,7 @@ sub-filter on, for instance, the `SYSLOG_IDENTIFIER` to differentiate the Kata c
on the `PRIORITY` to filter out critical issues etc.
Kata generates a significant amount of Kata specific information, which can be seen as
[`logfmt`](../../src/tools/log-parser/README.md#logfile-requirements).
[`logfmt`](https://github.com/kata-containers/tests/tree/main/cmd/log-parser#logfile-requirements).
data contained in the `MESSAGE` field. Imported as-is, there is no easy way to filter on that data
in Kibana:

View File

@@ -48,9 +48,9 @@ Running Docker containers Kata Containers requires care because `VOLUME`s specif
kataShared on / type virtiofs (rw,relatime,dax)
```
`kataShared` mount types are powered by [`virtio-fs`](https://virtio-fs.gitlab.io/), a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
`kataShared` mount types are powered by [`virtio-fs`][virtio-fs], a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker` will fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker` fill fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
## Workarounds/Solutions
@@ -58,7 +58,7 @@ Thanks to various community contributions (see [issue references below](#referen
### Use a memory backed volume
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir), which allows for memdisk-backed storage -- the the `medium: Memory`. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume][k8s-emptydir], which allows for memdisk-backed storage -- the [the `medium: Memory` ][k8s-memory-volume-type]. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
```yaml
apiVersion: v1

View File

@@ -19,7 +19,7 @@ Also you should ensure that `kubectl` working correctly.
> **Note**: More information about Kubernetes integrations:
> - [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
> - [How to use Kata Containers and Containerd](containerd-kata.md)
> - [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
> - [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
## Configure Prometheus

View File

@@ -91,7 +91,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |
| `io.katacontainers.config.hypervisor.use_legacy_serial` | `boolean` | uses legacy serial device for guest's console (QEMU) |
## Container Options
| Key | Value Type | Comments |
@@ -173,7 +172,7 @@ kind: Pod
metadata:
name: pod2
annotations:
io.katacontainers.config.runtime.disable_guest_seccomp: "false"
io.katacontainers.config.runtime.disable_guest_seccomp: false
spec:
runtimeClassName: kata
containers:

View File

@@ -1,15 +1,15 @@
# How to use Kata Containers and containerd with Kubernetes
# How to use Kata Containers and CRI (containerd plugin) with Kubernetes
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
The Kubernetes cluster will use the
[containerd](https://github.com/containerd/containerd/) and
[Kata Containers](https://katacontainers.io) to launch workloads.
[CRI containerd](https://github.com/containerd/containerd/) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
## Requirements
- Kubernetes, Kubelet, `kubeadm`
- containerd
- containerd with `cri` plug-in
- Kata Containers
> **Note:** For information about the supported versions of these components,
@@ -149,7 +149,7 @@ $ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
## Create runtime class for Kata Containers
By default, all pods are created with the default runtime configured in containerd.
By default, all pods are created with the default runtime configured in CRI containerd plugin.
From Kubernetes v1.12, users can use [`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/#runtime-class) to specify a different runtime for Pods.
```bash
@@ -166,7 +166,7 @@ $ sudo -E kubectl apply -f runtime.yaml
## Run pod in Kata Containers
If a pod has the `runtimeClassName` set to `kata`, the CRI runs the pod with the
If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod with the
[Kata Containers runtime](../../src/runtime/README.md).
- Create an pod configuration that using Kata Containers runtime

View File

@@ -101,7 +101,7 @@ Start an ACRN based Kata Container,
$ sudo docker run -ti --runtime=kata-runtime busybox sh
```
You will see ACRN(`acrn-dm`) is now running on your system, as well as a `kata-shim`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
You will see ACRN(`acrn-dm`) is now running on your system, as well as a `kata-shim`, `kata-proxy`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
```bash
$ ps -ef | grep -E "kata|acrn"

View File

@@ -1,254 +0,0 @@
# Configure Kata Containers to use Firecracker
This document provides an overview on how to run Kata Containers with the AWS Firecracker hypervisor.
## Introduction
AWS Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services that provide serverless operational models. AWS Firecracker runs workloads in lightweight virtual machines, called `microVMs`, which combine the security and isolation properties provided by hardware virtualization technology with the speed and flexibility of Containers.
Please refer to AWS Firecracker [documentation](https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md) for more details.
## Pre-requisites
This document requires the presence of Kata Containers on your system. Install using the instructions available through the following links:
- Kata Containers [automated installation](../install/README.md)
- Kata Containers manual installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](../Developer-Guide.md) steps.
> **Note:** Create rootfs image and not initrd image.
## Install AWS Firecracker
Kata Containers only support AWS Firecracker v0.23.4 ([yet](https://github.com/kata-containers/kata-containers/pull/1519)).
To install Firecracker we need to get the `firecracker` and `jailer` binaries:
```bash
$ release_url="https://github.com/firecracker-microvm/firecracker/releases"
$ version="v0.23.1"
$ arch=`uname -m`
$ curl ${release_url}/download/${version}/firecracker-${version}-${arch} -o firecracker
$ curl ${release_url}/download/${version}/jailer-${version}-${arch} -o jailer
$ chmod +x jailer firecracker
```
To make the binaries available from the default system `PATH` it is recommended to move them to `/usr/local/bin` or add a symbolic link:
```bash
$ sudo ln -s $(pwd)/firecracker /usr/local/bin
$ sudo ln -s $(pwd)/jailer /usr/local/bin
```
More details can be found in [AWS Firecracker docs](https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md)
In order to run Kata with AWS Firecracker a block device as the backing store for a VM is required. To interact with `containerd` and Kata we use the `devmapper` `snapshotter`.
## Configure `devmapper`
To check support for your `containerd` installation, you can run:
```
$ ctr plugins ls |grep devmapper
```
if the output of the above command is:
```
io.containerd.snapshotter.v1 devmapper linux/amd64 ok
```
then you can skip this section and move on to `Configure Kata Containers with AWS Firecracker`
If the output of the above command is:
```
io.containerd.snapshotter.v1 devmapper linux/amd64 error
```
then we need to setup `devmapper` `snapshotter`. Based on a [very useful
guide](https://docs.docker.com/storage/storagedriver/device-mapper-driver/)
from docker, we can set it up using the following scripts:
> **Note:** The following scripts assume a 100G sparse file for storing container images, a 10G sparse file for the thin-provisioning pool and 10G base image files for any sandboxed container created. This means that we will need at least 10GB free space.
```
#!/bin/bash
set -ex
DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool
mkdir -p ${DATA_DIR}
# Create data file
sudo touch "${DATA_DIR}/data"
sudo truncate -s 100G "${DATA_DIR}/data"
# Create metadata file
sudo touch "${DATA_DIR}/meta"
sudo truncate -s 10G "${DATA_DIR}/meta"
# Allocate loop devices
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")
# Define thin-pool parameters.
# See https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt for details.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768
# Create a thin-pool device
sudo dmsetup create "${POOL_NAME}" \
--table "0 ${LENGTH_IN_SECTORS} thin-pool ${META_DEV} ${DATA_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
cat << EOF
#
# Add this to your config.toml configuration file and restart `containerd` daemon
#
[plugins]
[plugins.devmapper]
pool_name = "${POOL_NAME}"
root_path = "${DATA_DIR}"
base_image_size = "10GB"
discard_blocks = true
EOF
```
Make it executable and run it:
```bash
$ sudo chmod +x ~/scripts/devmapper/create.sh
$ cd ~/scripts/devmapper/
$ sudo ./create.sh
```
Now, we can add the `devmapper` configuration provided from the script to `/etc/containerd/config.toml`.
> **Note:** If you are using the default `containerd` configuration (`containerd config default >> /etc/containerd/config.toml`), you may need to edit the existing `[plugins."io.containerd.snapshotter.v1.devmapper"]`configuration.
Save and restart `containerd`:
```bash
$ sudo systemctl restart containerd
```
We can use `dmsetup` to verify that the thin-pool was created successfully.
```bash
$ sudo dmsetup ls
```
We should also check that `devmapper` is registered and running:
```bash
$ sudo ctr plugins ls | grep devmapper
```
This script needs to be run only once, while setting up the `devmapper` `snapshotter` for `containerd`. Afterwards, make sure that on each reboot, the thin-pool is initialized from the same data directory. Otherwise, all the fetched containers (or the ones that you have created) will be re-initialized. A simple script that re-creates the thin-pool from the same data directory is shown below:
```
#!/bin/bash
set -ex
DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool
# Allocate loop devices
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")
# Define thin-pool parameters.
# See https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt for details.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768
# Create a thin-pool device
sudo dmsetup create "${POOL_NAME}" \
--table "0 ${LENGTH_IN_SECTORS} thin-pool ${META_DEV} ${DATA_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
```
We can create a systemd service to run the above script on each reboot:
```bash
$ sudo nano /lib/systemd/system/devmapper_reload.service
```
The service file:
```
[Unit]
Description=Devmapper reload script
[Service]
ExecStart=/path/to/script/reload.sh
[Install]
WantedBy=multi-user.target
```
Enable the newly created service:
```bash
$ sudo systemctl daemon-reload
$ sudo systemctl enable devmapper_reload.service
$ sudo systemctl start devmapper_reload.service
```
## Configure Kata Containers with AWS Firecracker
To configure Kata Containers with AWS Firecracker, copy the generated `configuration-fc.toml` file when building the `kata-runtime` to either `/etc/kata-containers/configuration-fc.toml` or `/usr/share/defaults/kata-containers/configuration-fc.toml`.
The following command shows full paths to the `configuration.toml` files that the runtime loads. It will use the first path that exists. (Please make sure the kernel and image paths are set correctly in the `configuration.toml` file)
```bash
$ sudo kata-runtime --show-default-config-paths
```
## Configure `containerd`
Next, we need to configure containerd. Add a file in your path (e.g. `/usr/local/bin/containerd-shim-kata-fc-v2`) with the following contents:
```
#!/bin/bash
KATA_CONF_FILE=/etc/containers/configuration-fc.toml /usr/local/bin/containerd-shim-kata-v2 $@
```
> **Note:** You may need to edit the paths of the configuration file and the `containerd-shim-kata-v2` to correspond to your setup.
Make it executable:
```bash
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-fc-v2
```
Add the relevant section in `containerd`s `config.toml` file (`/etc/containerd/config.toml`):
```
[plugins.cri.containerd.runtimes]
[plugins.cri.containerd.runtimes.kata-fc]
runtime_type = "io.containerd.kata-fc.v2"
```
> **Note:** If you are using the default `containerd` configuration (`containerd config default >> /etc/containerd/config.toml`),
> the configuration should change to :
```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
runtime_type = "io.containerd.kata-fc.v2"
```
Restart `containerd`:
```bash
$ sudo systemctl restart containerd
```
## Verify the installation
We are now ready to launch a container using Kata with Firecracker to verify that everything worked:
```bash
$ sudo ctr images pull --snapshotter devmapper docker.io/library/ubuntu:latest
$ sudo ctr run --snapshotter devmapper --runtime io.containerd.run.kata-fc.v2 -t --rm docker.io/library/ubuntu
```

View File

@@ -31,7 +31,7 @@ See below example config:
[plugins.cri]
[plugins.cri.containerd]
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_type = "io.containerd.runc.v1"
privileged_without_host_devices = false
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
@@ -40,7 +40,7 @@ See below example config:
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
```
- [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
- [Kata Containers with Containerd and CRI documentation](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Containerd CRI config documentation](https://github.com/containerd/containerd/blob/main/docs/cri/config.md)
#### CRI-O

View File

@@ -15,7 +15,7 @@ After choosing one CRI implementation, you must make the appropriate configurati
to ensure it integrates with Kata Containers.
Kata Containers 1.5 introduced the `shimv2` for containerd 1.2.0, reducing the components
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](../how-to/how-to-use-k8s-with-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
An equivalent shim implementation for CRI-O is planned.
@@ -57,7 +57,7 @@ content shown below:
To customize containerd to select Kata Containers runtime, follow our
"Configure containerd to use Kata Containers" internal documentation
[here](../how-to/how-to-use-k8s-with-containerd-and-kata.md#configure-containerd-to-use-kata-containers).
[here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers).
## Install Kubernetes
@@ -85,7 +85,7 @@ Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-tim
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```
For more information about containerd see the "Configure Kubelet to use containerd"
documentation [here](../how-to/how-to-use-k8s-with-containerd-and-kata.md#configure-kubelet-to-use-containerd).
documentation [here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-kubelet-to-use-containerd).
## Run a Kubernetes pod with Kata Containers
@@ -99,18 +99,7 @@ $ sudo systemctl restart kubelet
$ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /var/run/crio/crio.sock --pod-network-cidr=10.244.0.0/16
# If using containerd
$ cat <<EOF | tee kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
criSocket: "/run/containerd/containerd.sock"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: cgroupfs
podCIDR: "10.244.0.0/16"
EOF
$ sudo kubeadm init --ignore-preflight-errors=all --config kubeadm-config.yaml
$ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
$ export KUBECONFIG=/etc/kubernetes/admin.conf
```

View File

@@ -33,7 +33,6 @@ are available, their default values and how each setting can be used.
[Cloud Hypervisor] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-clh.toml` |
[Firecracker] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-fc.toml` |
[QEMU] | C | all | Type 2 ([KVM]) | `configuration-qemu.toml` |
[`Dragonball`] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) | `configuration-dragonball.toml` |
## Determine currently configured hypervisor
@@ -53,7 +52,6 @@ the hypervisors:
[Cloud Hypervisor] | Low latency, small memory footprint, small attack surface | Minimal | | excellent | excellent | High performance modern cloud workloads | |
[Firecracker] | Very slimline | Extremely minimal | Doesn't support all device types | excellent | excellent | Serverless / FaaS | |
[QEMU] | Lots of features | Lots | | good | good | Good option for most users | | All users |
[`Dragonball`] | Built-in VMM, low CPU and memory overhead| Minimal | | excellent | excellent | Optimized for most container workloads | `out-of-the-box` Kata Containers experience |
For further details, see the [Virtualization in Kata Containers](design/virtualization.md) document and the official documentation for each hypervisor.
@@ -62,4 +60,3 @@ For further details, see the [Virtualization in Kata Containers](design/virtuali
[Firecracker]: https://github.com/firecracker-microvm/firecracker
[KVM]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
[QEMU]: http://www.qemu-project.org
[`Dragonball`]: https://github.com/openanolis/dragonball-sandbox

View File

@@ -79,6 +79,3 @@ versions. This is not recommended for normal users.
* [upgrading document](../Upgrading.md)
* [developer guide](../Developer-Guide.md)
* [runtime documentation](../../src/runtime/README.md)
## Kata Containers 3.0 rust runtime installation
* [installation guide](../install/kata-containers-3.0-rust-runtime-installation-guide.md)

View File

@@ -19,6 +19,12 @@
> - If you decide to proceed and install a Kata Containers release, you can
> still check for the latest version of Kata Containers by running
> `kata-runtime check --only-list-releases`.
>
> - These instructions will not work for Fedora 31 and higher since those
> distribution versions only support cgroups version 2 by default. However,
> Kata Containers currently requires cgroups version 1 (on the host side). See
> https://github.com/kata-containers/kata-containers/issues/927 for further
> details.
## Install Kata Containers
@@ -75,7 +81,7 @@
- Download the standard `systemd(1)` service file and install to
`/etc/systemd/system/`:
- https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
- https://raw.githubusercontent.com/containerd/containerd/master/containerd.service
> **Notes:**
>

View File

@@ -1,101 +0,0 @@
# Kata Containers 3.0 rust runtime installation
The following is an overview of the different installation methods available.
## Prerequisites
Kata Containers 3.0 rust runtime requires nested virtualization or bare metal. Check
[hardware requirements](/src/runtime/README.md#hardware-requirements) to see if your system is capable of running Kata
Containers.
### Platform support
Kata Containers 3.0 rust runtime currently runs on 64-bit systems supporting the following
architectures:
> **Notes:**
> For other architectures, see https://github.com/kata-containers/kata-containers/issues/4320
| Architecture | Virtualization technology |
|-|-|
| `x86_64`| [Intel](https://www.intel.com) VT-x |
| `aarch64` ("`arm64`")| [ARM](https://www.arm.com) Hyp |
## Packaged installation methods
| Installation method | Description | Automatic updates | Use case | Availability
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|----------- |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. | No |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. | No |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. | No |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. | No |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
### Kata Deploy Installation
`ToDo`
### Official packages
`ToDo`
### Snap Installation
`ToDo`
### Automatic Installation
`ToDo`
### Manual Installation
`ToDo`
## Build from source installation
### Rust Environment Set Up
* Download `Rustup` and install `Rust`
> **Notes:**
> Rust version 1.58 is needed
Example for `x86_64`
```
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
$ rustup install 1.58
$ rustup default 1.58-x86_64-unknown-linux-gnu
```
* Musl support for fully static binary
Example for `x86_64`
```
$ rustup target add x86_64-unknown-linux-musl
```
* [Musl `libc`](http://musl.libc.org/) install
Example for musl 1.2.3
```
$ curl -O https://git.musl-libc.org/cgit/musl/snapshot/musl-1.2.3.tar.gz
$ tar vxf musl-1.2.3.tar.gz
$ cd musl-1.2.3/
$ ./configure --prefix=/usr/local/
$ make && sudo make install
```
### Install Kata 3.0 Rust Runtime Shim
```
$ git clone https://github.com/kata-containers/kata-containers.git
$ cd kata-containers/src/runtime-rs
$ make && sudo make install
```
After running the command above, the default config file `configuration.toml` will be installed under `/usr/share/defaults/kata-containers/`, the binary file `containerd-shim-kata-v2` will be installed under `/user/local/bin` .
### Build Kata Containers Kernel
Follow the [Kernel installation guide](/tools/packaging/kernel/README.md).
### Build Kata Rootfs
Follow the [Rootfs installation guide](../../tools/osbuilder/rootfs-builder/README.md).
### Build Kata Image
Follow the [Image installation guide](../../tools/osbuilder/image-builder/README.md).
### Install Containerd
Follow the [Containerd installation guide](container-manager/containerd/containerd-install.md).

View File

@@ -55,11 +55,11 @@ Here are the features to set up a CRI-O based Minikube, and why you need them:
| what | why |
| ---- | --- |
| `--bootstrapper=kubeadm` | As recommended for [minikube CRI-O](https://minikube.sigs.k8s.io/docs/handbook/config/#runtime-configuration) |
| `--bootstrapper=kubeadm` | As recommended for [minikube CRI-o](https://kubernetes.io/docs/setup/minikube/#cri-o) |
| `--container-runtime=cri-o` | Using CRI-O for Kata |
| `--enable-default-cni` | As recommended for [minikube CRI-O](https://minikube.sigs.k8s.io/docs/handbook/config/#runtime-configuration) |
| `--enable-default-cni` | As recommended for [minikube CRI-o](https://kubernetes.io/docs/setup/minikube/#cri-o) |
| `--memory 6144` | Allocate sufficient memory, as Kata Containers default to 1 or 2Gb |
| `--network-plugin=cni` | As recommended for [minikube CRI-O](https://minikube.sigs.k8s.io/docs/handbook/config/#runtime-configuration) |
| `--network-plugin=cni` | As recommended for [minikube CRI-o](https://kubernetes.io/docs/setup/minikube/#cri-o) |
| `--vm-driver kvm2` | The host VM driver |
To use containerd, modify the `--container-runtime` argument:

View File

@@ -3,4 +3,4 @@
Kata Containers supports passing certain GPUs from the host into the container. Select the GPU vendor for detailed information:
- [Intel](Intel-GPU-passthrough-and-Kata.md)
- [NVIDIA](NVIDIA-GPU-passthrough-and-Kata.md)
- [Nvidia](Nvidia-GPU-passthrough-and-Kata.md)

View File

@@ -1,592 +0,0 @@
# Using NVIDIA GPU device with Kata Containers
An NVIDIA GPU device can be passed to a Kata Containers container using GPU
passthrough (NVIDIA GPU pass-through mode) as well as GPU mediated passthrough
(NVIDIA `vGPU` mode).
NVIDIA GPU pass-through mode, an entire physical GPU is directly assigned to one
VM, bypassing the NVIDIA Virtual GPU Manager. In this mode of operation, the GPU
is accessed exclusively by the NVIDIA driver running in the VM to which it is
assigned. The GPU is not shared among VMs.
NVIDIA Virtual GPU (`vGPU`) enables multiple virtual machines (VMs) to have
simultaneous, direct access to a single physical GPU, using the same NVIDIA
graphics drivers that are deployed on non-virtualized operating systems. By
doing this, NVIDIA `vGPU` provides VMs with unparalleled graphics performance,
compute performance, and application compatibility, together with the
cost-effectiveness and scalability brought about by sharing a GPU among multiple
workloads. A `vGPU` can be either time-sliced or Multi-Instance GPU (MIG)-backed
with [MIG-slices](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
| Technology | Description | Behavior | Detail |
| --- | --- | --- | --- |
| NVIDIA GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| NVIDIA vGPU time-sliced | GPU time-sliced | Physical GPU time-sliced for multiple VMs | Mediated passthrough |
| NVIDIA vGPU MIG-backed | GPU with MIG-slices | Physical GPU MIG-sliced for multiple VMs | Mediated passthrough |
## Hardware Requirements
NVIDIA GPUs Recommended for Virtualization:
- NVIDIA Tesla (T4, M10, P6, V100 or newer)
- NVIDIA Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BARs window, for example, NVIDIA Tesla P100,
K40m
```sh
$ lspci -s d0:00.0 -vv | grep Region
Region 0: Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 222800000000 (64-bit, prefetchable) [size=32G] # Above 4G
Region 3: Memory at 223810000000 (64-bit, prefetchable) [size=32M]
```
For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Some hardware vendors use a different name in BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
If one is using a GPU based on the Ampere architecture and later additionally
SR-IOV needs to be enabled for the `vGPU` use-case.
The following steps outline the workflow for using an NVIDIA GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command
line.
## Install and configure Kata Containers
To use non-large BARs devices (for example, NVIDIA Tesla T4), you need Kata
version 1.3.0 or above. Follow the [Kata Containers setup
instructions](../install/README.md) to install the latest version of Kata.
To use large BARs devices (for example, NVIDIA Tesla P100), you need Kata
version 1.11.0 or above.
The following configuration in the Kata `configuration.toml` file as shown below
can work:
Hotplug for PCI devices with small BARs by `acpi_pcihp` (Linux's ACPI PCI
Hotplug driver):
```sh
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices with large BARs by `pciehp` (Linux's PCIe Hotplug
driver):
```sh
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU
support. To use an NVIDIA GPU with Kata Containers, you need to build a kernel
with the necessary GPU support.
The following kernel config options need to be enabled:
```sh
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
# Support for loading modules (Required for load NVIDIA drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (Required for large BARs device)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```sh
# Disable Open Source NVIDIA driver nouveau
# It conflicts with NVIDIA official driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
It is worth checking that it is not enabled in your kernel configuration to
prevent any conflicts.
Build the Kata Containers kernel with the previous config options, using the
instructions described in [Building Kata Containers
kernel](../../tools/packaging/kernel). For further details on building and
installing guest kernels, see [the developer
guide](../Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports NVIDIA GPU:
```sh
## Build guest kernel with ../../tools/packaging/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 5.15.23 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 5.15.23 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 5.15.23 -g nvidia install
```
To build NVIDIA Driver in Kata container, `linux-headers` are required.
This is a way to generate deb packages for `linux-headers`:
> **Note**:
> Run `make rpm-pkg` to build the rpm package.
> Run `make deb-pkg` to build the deb package.
>
```sh
$ cd kata-linux-5.15.23-89
$ make deb-pkg
```
Before using the new guest kernel, please update the `kernel` parameters in
`configuration.toml`.
```sh
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## NVIDIA GPU pass-through mode with Kata Containers
Use the following steps to pass an NVIDIA GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for the GPU device on the host:
```sh
$ sudo lspci -nn -D | grep -i nvidia
0000:d0:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b9] (rev a1)
```
> PCI address `0000:d0:00.0` is assigned to the hardware GPU device.
> `10de:20b9` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```sh
$ BDF="0000:d0:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
```
The previous output shows that the GPU belongs to IOMMU group 192. The next
step is to bind the GPU to the VFIO-PCI driver.
```sh
$ BDF="0000:d0:00.0"
$ DEV="/sys/bus/pci/devices/$BDF"
$ echo "vfio-pci" > $DEV/driver_override
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
# To return the device to the standard driver, we simply clear the
# driver_override and reprobe the device, ex:
$ echo > $DEV/preferred_driver
$ echo $BDF > $DEV/driver/unbind
$ echo $BDF > /sys/bus/pci/drivers_probe
```
3. Check the IOMMU group number under `/dev/vfio`:
```sh
$ ls -l /dev/vfio
total 0
crw------- 1 zvonkok zvonkok 243, 0 Mar 18 03:06 192
crw-rw-rw- 1 root root 10, 196 Mar 18 02:27 vfio
```
4. Start a Kata container with the GPU device:
```sh
# You may need to `modprobe vhost-vsock` if you get
# host system doesn't support vsock: stat /dev/vhost-vsock
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch uname -r
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:20b9`) in the `lspci` output.
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -nn | grep '10de:20b9'"
```
6. Additionally, you can check the PCI BARs space of the NVIDIA GPU device in the container:
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/library/archlinux:latest" arch sh -c "lspci -s 02:00.0 -vv | grep Region"
```
> **Note**: If you see a message similar to the above, the BAR space of the NVIDIA
> GPU has been successfully allocated.
## NVIDIA vGPU mode with Kata Containers
NVIDIA vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM. NVIDIA vGPU manager
needs to be installed on the host to configure GPUs in vGPU mode. See [NVIDIA Virtual GPU Software Documentation v14.0 through 14.1](https://docs.nvidia.com/grid/14.0/) for more details.
### NVIDIA vGPU time-sliced
In the time-sliced mode, the GPU is not partitioned and the workload uses the
whole GPU and shares access to the GPU engines. Processes are scheduled in
series. The best effort scheduler is the default one and can be exchanged by
other scheduling policies see the documentation above how to do that.
Beware if you had `MIG` enabled before to disable `MIG` on the GPU if you want
to use `time-sliced` `vGPU`.
```sh
$ sudo nvidia-smi -mig 0
```
Enable the virtual functions for the physical GPU in the `sysfs` file system.
```sh
$ sudo /usr/lib/nvidia/sriov-manage -e 0000:41:00.0
```
Get the `BDF` of the available virtual function on the GPU, and choose one for the
following steps.
```sh
$ cd /sys/bus/pci/devices/0000:41:00.0/
$ ls -l | grep virtfn
```
#### List all available vGPU instances
The following shell snippet will walk the `sysfs` and only print instances
that are available, that can be created.
```sh
# The 00.0 is often the PF of the device the VFs will have the funciont in the
# BDF incremented by some values so e.g. the very first VF is 0000:41:00.4
cd /sys/bus/pci/devices/0000:41:00.0/
for vf in $(ls -d virtfn*)
do
BDF=$(basename $(readlink -f $vf))
for md in $(ls -d $vf/mdev_supported_types/*)
do
AVAIL=$(cat $md/available_instances)
NAME=$(cat $md/name)
DIR=$(basename $md)
if [ $AVAIL -gt 0 ]; then
echo "| BDF | INSTANCES | NAME | DIR |"
echo "+--------------+-----------+----------------+------------+"
printf "| %12s |%10d |%15s | %10s |\n\n" "$BDF" "$AVAIL" "$NAME" "$DIR"
fi
done
done
```
If there are available instances you get something like this (for the first VF),
beware that the output is highly dependent on the GPU you have, if there is no
output check again if `MIG` is really disabled.
```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-4C | nvidia-692 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-8C | nvidia-693 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-10C | nvidia-694 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-16C | nvidia-695 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-20C | nvidia-696 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-40C | nvidia-697 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 | GRID A100D-80C | nvidia-698 |
```
Change to the `mdev_supported_types` directory for the virtual function on which
you want to create the `vGPU`. Taking the first output as an example:
```sh
$ cd virtfn0/mdev_supported_types/nvidia-692
$ UUIDGEN=$(uuidgen)
$ sudo bash -c "echo $UUIDGEN > create"
```
Confirm that the `vGPU` was created. You should see the `UUID` pointing to a
subdirectory of the `sysfs` space.
```sh
$ ls -l /sys/bus/mdev/devices/
```
Get the `IOMMU` group number and verify there is a `VFIO` device created to use
with Kata.
```sh
$ ls -l /sys/bus/mdev/devices/*/
$ ls -l /dev/vfio
```
Use the `VFIO` device created in the same way as in the pass-through use-case.
Beware that the guest needs the NVIDIA guest drivers, so one would need to build
a new guest `OS` image.
### NVIDIA vGPU MIG-backed
We're not going into detail what `MIG` is but briefly it is a technology to
partition the hardware into independent instances with guaranteed quality of
service. For more details see [NVIDIA Multi-Instance GPU User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/).
First enable `MIG` mode for a GPU, depending on the platform you're running
a reboot would be necessary. Some platforms support GPU reset.
```sh
$ sudo nvidia-smi -mig 1
```
If the platform supports a GPU reset one can run, otherwise you will get a
warning to reboot the server.
```sh
$ sudo nvidia-smi --gpu-reset
```
The driver per default provides a number of profiles that users can opt-in when
configuring the MIG feature.
```sh
$ sudo nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles: |
| GPU Name ID Instances Memory P2P SM DEC ENC |
| Free/Total GiB CE JPEG OFA |
|=============================================================================|
| 0 MIG 1g.10gb 19 7/7 9.50 No 14 0 0 |
| 1 0 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 1g.10gb+me 20 1/1 9.50 No 14 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 0 MIG 2g.20gb 14 3/3 19.50 No 28 1 0 |
| 2 0 0 |
+-----------------------------------------------------------------------------+
...
```
Create the GPU instances that correspond to the `vGPU` types of the `MIG-backed`
`vGPUs` that you will create [NVIDIA A100 PCIe 80GB Virtual GPU Types](https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#vgpu-types-nvidia-a100-pcie-80gb).
```sh
# MIG 1g.10gb --> vGPU A100D-1-10C
$ sudo nvidia-smi mig -cgi 19
```
List the GPU instances and get the GPU instance id to create the compute
instance.
```sh
$ sudo nvidia-smi mig -lgi # list the created GPU instances
$ sudo nvidia-smi mig -cci -gi 9 # each GPU instance can have several compute
# instances. Instance -> Workload
```
Verify that the compute instances were created within the GPU instance
```sh
$ nvidia-smi
... snip ...
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 9 0 0 | 0MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 4095MiB | | |
+------------------+----------------------+-----------+-----------------------+
... snip ...
```
We can use the [snippet](#list-all-available-vgpu-instances) from before to list
the available `vGPU` instances, this time `MIG-backed`.
```sh
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.4 | 1 |GRID A100D-1-10C | nvidia-699 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:00.5 | 1 |GRID A100D-1-10C | nvidia-699 |
| BDF | INSTANCES | NAME | DIR |
+--------------+-----------+----------------+------------+
| 0000:41:01.6 | 1 |GRID A100D-1-10C | nvidia-699 |
... snip ...
```
Repeat the steps after the [snippet](#list-all-available-vgpu-instances) listing
to create the corresponding `mdev` device and use the guest `OS` created in the
previous section with `time-sliced` `vGPUs`.
## Install NVIDIA Driver + Toolkit in Kata Containers Guest OS
Consult the [Developer-Guide](https://github.com/kata-containers/kata-containers/blob/main/docs/Developer-Guide.md#create-a-rootfs-image) on how to create a
rootfs base image for a distribution of your choice. This is going to be used as
a base for a NVIDIA enabled guest OS. Use the `EXTRA_PKGS` variable to install
all the needed packages to compile the drivers. Also copy the kernel development
packages from the previous `make deb-pkg` into `$ROOTFS_DIR`.
```sh
export EXTRA_PKGS="gcc make curl gnupg"
```
Having the `$ROOTFS_DIR` exported in the previous step we can now install all the
needed parts in the guest OS. In this case, we have an Ubuntu based rootfs.
First off all mount the special filesystems into the rootfs
```sh
$ sudo mount -t sysfs -o ro none ${ROOTFS_DIR}/sys
$ sudo mount -t proc -o ro none ${ROOTFS_DIR}/proc
$ sudo mount -t tmpfs none ${ROOTFS_DIR}/tmp
$ sudo mount -o bind,ro /dev ${ROOTFS_DIR}/dev
$ sudo mount -t devpts none ${ROOTFS_DIR}/dev/pts
```
Now we can enter `chroot`
```sh
$ sudo chroot ${ROOTFS_DIR}
```
Inside the rootfs one is going to install the drivers and toolkit to enable the
easy creation of GPU containers with Kata. We can also use this rootfs for any
other container not specifically only for GPUs.
As a prerequisite install the copied kernel development packages
```sh
$ sudo dpkg -i *.deb
```
Get the driver run file, since we need to build the driver against a kernel that
is not running on the host we need the ability to specify the exact version we
want the driver to build against. Take the kernel version one used for building
the NVIDIA kernel (`5.15.23-nvidia-gpu`).
```sh
$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/NVIDIA-Linux-x86_64-510.54.run
$ chmod +x NVIDIA-Linux-x86_64-510.54.run
# Extract the source files so we can run the installer with arguments
$ ./NVIDIA-Linux-x86_64-510.54.run -x
$ cd NVIDIA-Linux-x86_64-510.54
$ ./nvidia-installer -k 5.15.23-nvidia-gpu
```
Having the drivers installed we need to install the toolkit which will take care
of providing the right bits into the container.
```sh
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt update
$ apt install nvidia-container-toolkit
```
Create the hook execution file for Kata:
```
# Content of $ROOTFS_DIR/usr/share/oci/hooks/prestart/nvidia-container-toolkit.sh
#!/bin/bash -x
/usr/bin/nvidia-container-toolkit -debug $@
```
As the last step one can do some cleanup of files or package caches. Build the
rootfs and configure it for use with Kata according to the development guide.
Enable the `guest_hook_path` in Kata's `configuration.toml`
```sh
guest_hook_path = "/usr/share/oci/hooks"
```
One has built a NVIDIA rootfs, kernel and now we can run any GPU container
without installing the drivers into the container. Check NVIDIA device status
with `nvidia-smi`
```sh
$ sudo ctr --debug run --runtime "io.containerd.kata.v2" --device /dev/vfio/192 --rm -t "docker.io/nvidia/cuda:11.6.0-base-ubuntu20.04" cuda nvidia-smi
Fri Mar 18 10:36:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54 Driver Version: 510.54 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A30X Off | 00000000:02:00.0 Off | 0 |
| N/A 38C P0 67W / 230W | 0MiB / 24576MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
As the last step one can remove the additional packages and files that were added
to the `$ROOTFS_DIR` to keep it as small as possible.
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers

View File

@@ -0,0 +1,293 @@
# Using Nvidia GPU device with Kata Containers
An Nvidia GPU device can be passed to a Kata Containers container using GPU passthrough
(Nvidia GPU pass-through mode) as well as GPU mediated passthrough (Nvidia vGPU mode). 
Nvidia GPU pass-through mode, an entire physical GPU is directly assigned to one VM,
bypassing the Nvidia Virtual GPU Manager. In this mode of operation, the GPU is accessed
exclusively by the Nvidia driver running in the VM to which it is assigned.
The GPU is not shared among VMs.
Nvidia Virtual GPU (vGPU) enables multiple virtual machines (VMs) to have simultaneous,
direct access to a single physical GPU, using the same Nvidia graphics drivers that are
deployed on non-virtualized operating systems. By doing this, Nvidia vGPU provides VMs
with unparalleled graphics performance, compute performance, and application compatibility,
together with the cost-effectiveness and scalability brought about by sharing a GPU
among multiple workloads.
| Technology | Description | Behaviour | Detail |
| --- | --- | --- | --- |
| Nvidia GPU pass-through mode | GPU passthrough | Physical GPU assigned to a single VM | Direct GPU assignment to VM without limitation |
| Nvidia vGPU mode | GPU sharing | Physical GPU shared by multiple VMs | Mediated passthrough |
## Hardware Requirements
Nvidia GPUs Recommended for Virtualization:
- Nvidia Tesla (T4, M10, P6, V100 or newer)
- Nvidia Quadro RTX 6000/8000
## Host BIOS Requirements
Some hardware requires a larger PCI BARs window, for example, Nvidia Tesla P100, K40m
```
$ lspci -s 04:00.0 -vv | grep Region
Region 0: Memory at c6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G] #above 4G
Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
```
For large BARs devices, MMIO mapping above 4G address space should be `enabled`
in the PCI configuration of the BIOS.
Some hardware vendors use different name in BIOS, such as:
- Above 4G Decoding
- Memory Hole for PCI MMIO
- Memory Mapped I/O above 4GB
The following steps outline the workflow for using an Nvidia GPU with Kata.
## Host Kernel Requirements
The following configurations need to be enabled on your host kernel:
- `CONFIG_VFIO`
- `CONFIG_VFIO_IOMMU_TYPE1`
- `CONFIG_VFIO_MDEV`
- `CONFIG_VFIO_MDEV_DEVICE`
- `CONFIG_VFIO_PCI`
Your host kernel needs to be booted with `intel_iommu=on` on the kernel command line.
## Install and configure Kata Containers
To use non-large BARs devices (for example, Nvidia Tesla T4), you need Kata version 1.3.0 or above.
Follow the [Kata Containers setup instructions](../install/README.md)
to install the latest version of Kata.
To use large BARs devices (for example, Nvidia Tesla P100), you need Kata version 1.11.0 or above.
The following configuration in the Kata `configuration.toml` file as shown below can work:
Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = false
```
Hotplug for PCIe devices by `pciehp` (Linux's PCIe Hotplug driver):
```
machine_type = "q35"
hotplug_vfio_on_root_bus = true
pcie_root_port = 1
```
## Build Kata Containers kernel with GPU support
The default guest kernel installed with Kata Containers does not provide GPU support.
To use an Nvidia GPU with Kata Containers, you need to build a kernel with the
necessary GPU support.
The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
# Support for loading modules (Required for load Nvidia drivers)
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# Enable the MMIO access method for PCIe devices (Required for large BARs device)
CONFIG_PCI_MMCONFIG=y
```
The following kernel config options need to be disabled:
```
# Disable Open Source Nvidia driver nouveau
# It conflicts with Nvidia official driver
CONFIG_DRM_NOUVEAU=n
```
> **Note**: `CONFIG_DRM_NOUVEAU` is normally disabled by default.
It is worth checking that it is not enabled in your kernel configuration to prevent any conflicts.
Build the Kata Containers kernel with the previous config options,
using the instructions described in [Building Kata Containers kernel](../../tools/packaging/kernel).
For further details on building and installing guest kernels,
see [the developer guide](../Developer-Guide.md#install-guest-kernel-images).
There is an easy way to build a guest kernel that supports Nvidia GPU:
```
## Build guest kernel with ../../tools/packaging/kernel
# Prepare (download guest kernel source, generate .config)
$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
# Build guest kernel
$ ./build-kernel.sh -v 4.19.86 -g nvidia build
# Install guest kernel
$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install
/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu
```
To build Nvidia Driver in Kata container, `kernel-devel` is required.
This is a way to generate rpm packages for `kernel-devel`:
```
$ cd kata-linux-4.19.86-68
$ make rpm-pkg
Output RPMs:
~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm
```
> **Note**:
> - `kernel-devel` should be installed in Kata container before run Nvidia driver installer.
> - Run `make deb-pkg` to build the deb package.
Before using the new guest kernel, please update the `kernel` parameters in `configuration.toml`.
```
kernel = "/usr/share/kata-containers/vmlinuz-nvidia-gpu.container"
```
## Nvidia GPU pass-through mode with Kata Containers
Use the following steps to pass an Nvidia GPU device in pass-through mode with Kata:
1. Find the Bus-Device-Function (BDF) for GPU device on host:
```
$ sudo lspci -nn -D | grep -i nvidia
0000:04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
0000:84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
```
> PCI address `0000:04:00.0` is assigned to the hardware GPU device.
> `10de:15f8` is the device ID of the hardware GPU device.
2. Find the IOMMU group for the GPU device:
```
$ BDF="0000:04:00.0"
$ readlink -e /sys/bus/pci/devices/$BDF/iommu_group
/sys/kernel/iommu_groups/45
```
The previous output shows that the GPU belongs to IOMMU group 45.
3. Check the IOMMU group number under `/dev/vfio`:
```
$ ls -l /dev/vfio
total 0
crw------- 1 root root 248, 0 Feb 28 09:57 45
crw------- 1 root root 248, 1 Feb 28 09:57 54
crw-rw-rw- 1 root root 10, 196 Feb 28 09:57 vfio
```
4. Start a Kata container with GPU device:
```
$ sudo docker run -it --runtime=kata-runtime --cap-add=ALL --device /dev/vfio/45 centos /bin/bash
```
5. Run `lspci` within the container to verify the GPU device is seen in the list
of the PCI devices. Note the vendor-device id of the GPU (`10de:15f8`) in the `lspci` output.
```
$ lspci -nn -D | grep '10de:15f8'
0000:01:01.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
```
6. Additionally, you can check the PCI BARs space of the Nvidia GPU device in the container:
```
$ lspci -s 01:01.0 -vv | grep Region
Region 0: Memory at c0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 4400000000 (64-bit, prefetchable) [disabled] [size=16G]
Region 3: Memory at 4800000000 (64-bit, prefetchable) [disabled] [size=32M]
```
> **Note**: If you see a message similar to the above, the BAR space of the Nvidia
> GPU has been successfully allocated.
## Nvidia vGPU mode with Kata Containers
Nvidia vGPU is a licensed product on all supported GPU boards. A software license
is required to enable all vGPU features within the guest VM.
> **Note**: There is no suitable test environment, so it is not written here.
## Install Nvidia Driver in Kata Containers
Download the official Nvidia driver from
[https://www.nvidia.com/Download/index.aspx](https://www.nvidia.com/Download/index.aspx),
for example `NVIDIA-Linux-x86_64-418.87.01.run`.
Install the `kernel-devel`(generated in the previous steps) for guest kernel:
```
$ sudo rpm -ivh kernel-devel-4.19.86_gpu-1.x86_64.rpm
```
Here is an example to extract, compile and install Nvidia driver:
```
## Extract
$ sh ./NVIDIA-Linux-x86_64-418.87.01.run -x
## Compile and install (It will take some time)
$ cd NVIDIA-Linux-x86_64-418.87.01
$ sudo ./nvidia-installer -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
Or just run one command line:
```
$ sudo sh ./NVIDIA-Linux-x86_64-418.87.01.run -a -q --ui=none \
--no-cc-version-check \
--no-opengl-files --no-install-libglvnd \
--kernel-source-path=/usr/src/kernels/`uname -r`
```
To view detailed logs of the installer:
```
$ tail -f /var/log/nvidia-installer.log
```
Load Nvidia driver module manually
```
# Optionalgenerate modules.dep and map files for Nvidia driver
$ sudo depmod
# Load module
$ sudo modprobe nvidia-drm
# Check module
$ lsmod | grep nvidia
nvidia_drm 45056 0
nvidia_modeset 1093632 1 nvidia_drm
nvidia 18202624 1 nvidia_modeset
drm_kms_helper 159744 1 nvidia_drm
drm 364544 3 nvidia_drm,drm_kms_helper
i2c_core 65536 3 nvidia,drm_kms_helper,drm
ipmi_msghandler 49152 1 nvidia
```
Check Nvidia device status with `nvidia-smi`
```
$ nvidia-smi
Tue Mar 3 00:03:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:01:01.0 Off | 0 |
| N/A 27C P0 25W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
## References
- [Configuring a VM for GPU Pass-Through by Using the QEMU Command Line](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#using-gpu-pass-through-red-hat-el-qemu-cli)
- https://gitlab.com/nvidia/container-images/driver/-/tree/master
- https://github.com/NVIDIA/nvidia-docker/wiki/Driver-containers

View File

@@ -279,8 +279,8 @@ $ export KERNEL_EXTRAVERSION=$(awk '/^EXTRAVERSION =/{print $NF}' $GOPATH/$LINUX
$ export KERNEL_ROOTFS_DIR=${KERNEL_MAJOR_VERSION}.${KERNEL_PATHLEVEL}.${KERNEL_SUBLEVEL}${KERNEL_EXTRAVERSION}
$ cd $QAT_SRC
$ KERNEL_SOURCE_ROOT=$GOPATH/$LINUX_VER ./configure --enable-icp-sriov=guest
$ sudo -E make all -j $($(nproc ${CI:+--ignore 1}))
$ sudo -E make INSTALL_MOD_PATH=$ROOTFS_DIR qat-driver-install -j $($(nproc ${CI:+--ignore 1}))
$ sudo -E make all -j$(nproc)
$ sudo -E make INSTALL_MOD_PATH=$ROOTFS_DIR qat-driver-install -j$(nproc)
```
The `usdm_drv` module also needs to be copied into the rootfs modules path and
@@ -312,7 +312,7 @@ working properly with the Kata Containers VM.
### Build OpenSSL Intel® QAT engine container
Use the OpenSSL Intel® QAT [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/demo/openssl-qat-engine)
Use the OpenSSL Intel® QAT [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/master/demo/openssl-qat-engine)
to build a container image with an optimized OpenSSL engine for
Intel® QAT. Using `docker build` with the Kata Containers runtime can sometimes
have issues. Therefore, make sure that `runc` is the default Docker container
@@ -444,7 +444,7 @@ $ sudo docker save -o openssl-qat-engine.tar openssl-qat-engine:latest
$ sudo ctr -n=k8s.io images import openssl-qat-engine.tar
```
The [Intel® QAT Plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/qat_plugin/README.md)
The [Intel® QAT Plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/cmd/qat_plugin/README.md)
needs to be started so that the virtual functions can be discovered and
used by Kubernetes.

View File

@@ -18,13 +18,16 @@ CONFIG_X86_SGX_KVM=y
* Kubernetes cluster configured with:
* [`kata-deploy`](../../tools/packaging/kata-deploy) based Kata Containers installation
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images) and associated components including [operator](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/operator/README.md) and dependencies
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images)
> Note: Kata Containers supports creating VM sandboxes with Intel® SGX enabled
> using [cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) and [QEMU](https://www.qemu.org/) VMMs only.
### Kata Containers Configuration
Before running a Kata Container make sure that your version of `crio` or `containerd`
supports annotations.
For `containerd` check in `/etc/containerd/config.toml` that the list of `pod_annotations` passed
to the `sandbox` are: `["io.katacontainers.*", "sgx.intel.com/epc"]`.
@@ -96,4 +99,4 @@ because socket passthrough is not supported. An alternative is to deploy the `ae
container.
* Projects like [Gramine Shielded Containers (GSC)](https://gramine-gsc.readthedocs.io/en/latest/) are
also known to work. For GSC specifically, the Kata guest kernel needs to have the `CONFIG_NUMA=y`
enabled and at least one CPU online when running the GSC container. The Kata Containers guest kernel currently has `CONFIG_NUMA=y` enabled by default.
enabled and at least one CPU online when running the GSC container.

View File

@@ -0,0 +1,76 @@
# Setup to run VPP
The Data Plane Development Kit (DPDK) is a set of libraries and drivers for
fast packet processing. Vector Packet Processing (VPP) is a platform
extensible framework that provides out-of-the-box production quality
switch and router functionality. VPP is a high performance packet-processing
stack that can run on commodity CPUs. Enabling VPP with DPDK support can
yield significant performance improvements over a Linux\* bridge providing a
switch with DPDK VHOST-USER ports.
For more information about VPP visit their [wiki](https://wiki.fd.io/view/VPP).
## Install and configure Kata Containers
Follow the [Kata Containers setup instructions](../Developer-Guide.md).
In order to make use of VHOST-USER based interfaces, the container needs to be backed
by huge pages. `HugePages` support is required for the large memory pool allocation used for
DPDK packet buffers. This is a feature which must be configured within the Linux Kernel. See
[the DPDK documentation](https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html#use-of-hugepages-in-the-linux-environment)
for details on how to enable for the host. After enabling huge pages support on the host system,
update the Kata configuration to enable huge page support in the guest kernel:
```
$ sudo sed -i -e 's/^# *\(enable_hugepages\).*=.*$/\1 = true/g' /usr/share/defaults/kata-containers/configuration.toml
```
## Install VPP
Follow the [VPP installation instructions](https://wiki.fd.io/view/VPP/Installing_VPP_binaries_from_packages).
After a successful installation, your host system is ready to start
connecting Kata Containers with VPP bridges.
### Install the VPP Docker\* plugin
To create a Docker network and connect Kata Containers easily to that network through
Docker, install a VPP Docker plugin.
To install the plugin, follow the [plugin installation instructions](https://github.com/clearcontainers/vpp).
This VPP plugin allows the creation of a VPP network. Every container added
to this network is connected through an L2 bridge-domain provided by VPP.
## Example: Launch two Kata Containers using VPP
To use VPP, use Docker to create a network that makes use of VPP.
For example:
```
$ sudo docker network create -d=vpp --ipam-driver=vpp --subnet=192.168.1.0/24 --gateway=192.168.1.1 vpp_net
```
Test connectivity by launching two containers:
```
$ sudo docker run --runtime=kata-runtime --net=vpp_net --ip=192.168.1.2 --mac-address=CA:FE:CA:FE:01:02 -it busybox bash -c "ip a; ip route; sleep 300"
$ sudo docker run --runtime=kata-runtime --net=vpp_net --ip=192.168.1.3 --mac-address=CA:FE:CA:FE:01:03 -it busybox bash -c "ip a; ip route; ping 192.168.1.2"
```
These commands setup two Kata Containers connected via a VPP L2 bridge
domain. The first of the two VMs displays the networking details and then
sleeps providing a period of time for it to be pinged. The second
VM displays its networking details and then pings the first VM, verifying
connectivity between them.
After verifying connectivity, cleanup with the following commands:
```
$ sudo docker kill $(sudo docker ps --no-trunc -aq)
$ sudo docker rm $(sudo docker ps --no-trunc -aq)
$ sudo docker network rm vpp_net
$ sudo service vpp stop
```

View File

@@ -22,35 +22,21 @@ $ sudo snap install kata-containers --classic
## Build and install snap image
Run the command below which will use the packaging Makefile to build the snap image:
Run next command at the root directory of the packaging repository.
```sh
$ make -C tools/packaging snap
$ make snap
```
> **Warning:**
>
> By default, `snapcraft` will create a clean virtual machine
> environment to build the snap in using the `multipass` tool.
>
> However, `multipass` is silently disabled when `--destructive-mode` is
> used.
>
> Since building the Kata Containers package currently requires
> `--destructive-mode`, the snap will be built using the host
> environment. To avoid parts of the build auto-detecting additional
> features to enable (for example for QEMU), we recommend that you
> only run the snap build in a minimal host environment.
To install the resulting snap image, snap must be put in [classic mode][3] and the
security confinement must be disabled (`--classic`). Also since the resulting snap
has not been signed the verification of signature must be omitted (`--dangerous`).
security confinement must be disabled (*--classic*). Also since the resulting snap
has not been signed the verification of signature must be omitted (*--dangerous*).
```sh
$ sudo snap install --classic --dangerous "kata-containers_${version}_${arch}.snap"
$ sudo snap install --classic --dangerous kata-containers_[VERSION]_[ARCH].snap
```
Replace `${version}` with the current version of Kata Containers and `${arch}` with
Replace `VERSION` with the current version of Kata Containers and `ARCH` with
the system architecture.
## Configure Kata Containers
@@ -90,12 +76,12 @@ then a new configuration file can be [created](#configure-kata-containers)
and [configured][7].
[1]: https://docs.snapcraft.io/snaps/intro
[2]: ../../docs/design/architecture/README.md#root-filesystem-image
[2]: ../docs/design/architecture/README.md#root-filesystem-image
[3]: https://docs.snapcraft.io/reference/confinement#classic
[4]: https://github.com/kata-containers/kata-containers/tree/main/src/runtime#configuration
[4]: https://github.com/kata-containers/runtime#configuration
[5]: https://docs.docker.com/engine/reference/commandline/dockerd
[6]: ../../docs/install/docker/ubuntu-docker-install.md
[7]: ../../docs/Developer-Guide.md#configure-to-use-initrd-or-rootfs-image
[6]: ../docs/install/docker/ubuntu-docker-install.md
[7]: ../docs/Developer-Guide.md#configure-to-use-initrd-or-rootfs-image
[8]: https://snapcraft.io/kata-containers
[9]: ../../docs/Developer-Guide.md#run-kata-containers-with-docker
[10]: ../../docs/Developer-Guide.md#run-kata-containers-with-kubernetes
[9]: ../docs/Developer-Guide.md#run-kata-containers-with-docker
[10]: ../docs/Developer-Guide.md#run-kata-containers-with-kubernetes

View File

@@ -1,114 +0,0 @@
#!/usr/bin/env bash
#
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Description: Idempotent script to be sourced by all parts in a
# snapcraft config file.
set -o errexit
set -o nounset
set -o pipefail
# XXX: Bash-specific code. zsh doesn't support this option and that *does*
# matter if this script is run sourced... since it'll be using zsh! ;)
[ -n "$BASH_VERSION" ] && set -o errtrace
[ -n "${DEBUG:-}" ] && set -o xtrace
die()
{
echo >&2 "ERROR: $0: $*"
}
[ -n "${SNAPCRAFT_STAGE:-}" ] ||\
die "must be sourced from a snapcraft config file"
snap_yq_version=3.4.1
snap_common_install_yq()
{
export yq="${SNAPCRAFT_STAGE}/bin/yq"
local yq_pkg
yq_pkg="github.com/mikefarah/yq"
local yq_url
yq_url="https://${yq_pkg}/releases/download/${snap_yq_version}/yq_${goos}_${goarch}"
curl -o "${yq}" -L "${yq_url}"
chmod +x "${yq}"
}
# Function that should be called for each snap "part" in
# snapcraft.yaml.
snap_common_main()
{
# Architecture
arch="$(uname -m)"
case "${arch}" in
aarch64)
goarch="arm64"
qemu_arch="${arch}"
;;
ppc64le)
goarch="ppc64le"
qemu_arch="ppc64"
;;
s390x)
goarch="${arch}"
qemu_arch="${arch}"
;;
x86_64)
goarch="amd64"
qemu_arch="${arch}"
;;
*) die "unsupported architecture: ${arch}" ;;
esac
dpkg_arch=$(dpkg --print-architecture)
# golang
#
# We need the O/S name in golang format, but since we don't
# know if the godeps part has run, we don't know if golang is
# available yet, hence fall back to a standard system command.
goos="$(go env GOOS &>/dev/null || true)"
[ -z "$goos" ] && goos=$(uname -s|tr '[A-Z]' '[a-z]')
export GOROOT="${SNAPCRAFT_STAGE}"
export GOPATH="${GOROOT}/gopath"
export GO111MODULE="auto"
mkdir -p "${GOPATH}/bin"
export PATH="${GOPATH}/bin:${PATH}"
# Proxy
export http_proxy="${http_proxy:-}"
export https_proxy="${https_proxy:-}"
# Binaries
mkdir -p "${SNAPCRAFT_STAGE}/bin"
export PATH="$PATH:${SNAPCRAFT_STAGE}/bin"
# YAML query tool
export yq="${SNAPCRAFT_STAGE}/bin/yq"
# Kata paths
export kata_dir=$(printf "%s/src/github.com/%s/%s" \
"${GOPATH}" \
"${SNAPCRAFT_PROJECT_NAME}" \
"${SNAPCRAFT_PROJECT_NAME}")
export versions_file="${kata_dir}/versions.yaml"
[ -n "${yq:-}" ] && [ -x "${yq:-}" ] || snap_common_install_yq
}
snap_common_main

View File

@@ -1,5 +1,4 @@
name: kata-containers
website: https://github.com/kata-containers/kata-containers
summary: Build lightweight VMs that seamlessly plug into the containers ecosystem
description: |
Kata Containers is an open source project and community working to build a
@@ -19,18 +18,20 @@ parts:
- git
- git-extras
override-pull: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
version="9999"
kata_url="https://github.com/kata-containers/kata-containers"
if echo "${GITHUB_REF:-}" | grep -q -E "^refs/tags"; then
version=$(echo ${GITHUB_REF:-} | cut -d/ -f3)
if echo "${GITHUB_REF}" | grep -q -E "^refs/tags"; then
version=$(echo ${GITHUB_REF} | cut -d/ -f3)
git checkout ${version}
fi
snapcraftctl set-grade "stable"
snapcraftctl set-version "${version}"
# setup GOPATH - this repo dir should be there
export GOPATH=${SNAPCRAFT_STAGE}/gopath
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
mkdir -p $(dirname ${kata_dir})
ln -sf $(realpath "${SNAPCRAFT_STAGE}/..") ${kata_dir}
@@ -42,46 +43,31 @@ parts:
build-packages:
- curl
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
# put everything in stage
cd "${SNAPCRAFT_STAGE}"
cd ${SNAPCRAFT_STAGE}
version="$(${yq} r ${kata_dir}/versions.yaml languages.golang.meta.newest-version)"
yq_path="./yq"
yq_pkg="github.com/mikefarah/yq"
goos="linux"
case "$(uname -m)" in
aarch64) goarch="arm64";;
ppc64le) goarch="ppc64le";;
x86_64) goarch="amd64";;
s390x) goarch="s390x";;
*) echo "unsupported architecture: $(uname -m)"; exit 1;;
esac
yq_version=3.4.1
yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos}_${goarch}"
curl -o "${yq_path}" -L "${yq_url}"
chmod +x "${yq_path}"
kata_dir=gopath/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
version="$(${yq_path} r ${kata_dir}/versions.yaml languages.golang.meta.newest-version)"
tarfile="go${version}.${goos}-${goarch}.tar.gz"
curl -LO https://golang.org/dl/${tarfile}
tar -xf ${tarfile} --strip-components=1
rustdeps:
after: [metadata]
plugin: nil
prime:
- -*
build-packages:
- curl
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
# put everything in stage
cd "${SNAPCRAFT_STAGE}"
version="$(${yq} r ${kata_dir}/versions.yaml languages.rust.meta.newest-version)"
if ! command -v rustup > /dev/null; then
curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain ${version}
fi
export PATH=${PATH}:${HOME}/.cargo/bin
rustup toolchain install ${version}
rustup default ${version}
if [ "${arch}" == "ppc64le" ] || [ "${arch}" == "s390x" ] ; then
[ "${arch}" == "ppc64le" ] && arch="powerpc64le"
rustup target add ${arch}-unknown-linux-gnu
else
rustup target add ${arch}-unknown-linux-musl
$([ "$(whoami)" != "root" ] && echo sudo) ln -sf /usr/bin/g++ /bin/musl-g++
fi
rustup component add rustfmt
image:
after: [godeps, qemu, kernel]
plugin: nil
@@ -94,17 +80,28 @@ parts:
- uidmap
- gnupg2
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
[ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "s390x" ] && sudo apt-get --no-install-recommends install -y protobuf-compiler
[ "${arch}" = "ppc64le" ] || [ "${arch}" = "s390x" ] && sudo apt-get --no-install-recommends install -y protobuf-compiler
yq=${SNAPCRAFT_STAGE}/yq
# set GOPATH
export GOPATH=${SNAPCRAFT_STAGE}/gopath
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
export GOROOT=${SNAPCRAFT_STAGE}
export PATH="${GOROOT}/bin:${PATH}"
export GO111MODULE="auto"
http_proxy=${http_proxy:-""}
https_proxy=${https_proxy:-""}
if [ -n "$http_proxy" ]; then
echo "Setting proxy $http_proxy"
sudo -E systemctl set-environment http_proxy="$http_proxy" || true
sudo -E systemctl set-environment https_proxy="$https_proxy" || true
sudo -E systemctl set-environment http_proxy=$http_proxy || true
sudo -E systemctl set-environment https_proxy=$https_proxy || true
fi
# Copy yq binary. It's used in the container
mkdir -p "${GOPATH}/bin/"
cp -a "${yq}" "${GOPATH}/bin/"
echo "Unmasking docker service"
@@ -115,54 +112,63 @@ parts:
echo "Starting docker"
sudo -E systemctl start docker || true
cd "${kata_dir}/tools/osbuilder"
cd ${kata_dir}/tools/osbuilder
# build image
export AGENT_INIT=yes
export USE_DOCKER=1
export DEBUG=1
arch="$(uname -m)"
initrd_distro=$(${yq} r -X ${kata_dir}/versions.yaml assets.initrd.architecture.${arch}.name)
image_distro=$(${yq} r -X ${kata_dir}/versions.yaml assets.image.architecture.${arch}.name)
case "$arch" in
x86_64)
# In some build systems it's impossible to build a rootfs image, try with the initrd image
sudo -E PATH=$PATH make image DISTRO="${image_distro}" || sudo -E PATH="$PATH" make initrd DISTRO="${initrd_distro}"
sudo -E PATH=$PATH make image DISTRO=${image_distro} || sudo -E PATH=$PATH make initrd DISTRO=${initrd_distro}
;;
aarch64|ppc64le|s390x)
sudo -E PATH="$PATH" make initrd DISTRO="${initrd_distro}"
sudo -E PATH=$PATH make initrd DISTRO=${initrd_distro}
;;
*) die "unsupported architecture: ${arch}" ;;
*) echo "unsupported architecture: $(uname -m)"; exit 1;;
esac
# Install image
kata_image_dir="${SNAPCRAFT_PART_INSTALL}/usr/share/kata-containers"
mkdir -p "${kata_image_dir}"
cp kata-containers*.img "${kata_image_dir}"
kata_image_dir=${SNAPCRAFT_PART_INSTALL}/usr/share/kata-containers
mkdir -p ${kata_image_dir}
cp kata-containers*.img ${kata_image_dir}
runtime:
after: [godeps, image, cloud-hypervisor]
plugin: nil
build-attributes: [no-patchelf]
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
# set GOPATH
export GOPATH=${SNAPCRAFT_STAGE}/gopath
export GOROOT=${SNAPCRAFT_STAGE}
export PATH="${GOROOT}/bin:${PATH}"
export GO111MODULE="auto"
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
cd "${kata_dir}/src/runtime"
cd ${kata_dir}/src/runtime
qemu_cmd="qemu-system-${qemu_arch}"
# setup arch
arch=$(uname -m)
if [ ${arch} = "ppc64le" ]; then
arch="ppc64"
fi
# build and install runtime
make \
PREFIX="/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr" \
PREFIX=/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr \
SKIP_GO_VERSION_CHECK=1 \
QEMUCMD="${qemu_cmd}"
QEMUCMD=qemu-system-$arch
make install \
PREFIX=/usr \
DESTDIR="${SNAPCRAFT_PART_INSTALL}" \
DESTDIR=${SNAPCRAFT_PART_INSTALL} \
SKIP_GO_VERSION_CHECK=1 \
QEMUCMD="${qemu_cmd}"
QEMUCMD=qemu-system-$arch
if [ ! -f ${SNAPCRAFT_PART_INSTALL}/../../image/install/usr/share/kata-containers/kata-containers.img ]; then
sed -i -e "s|^image =.*|initrd = \"/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr/share/kata-containers/kata-containers-initrd.img\"|" \
@@ -179,37 +185,44 @@ parts:
- bison
- flex
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
yq=${SNAPCRAFT_STAGE}/yq
export PATH="${PATH}:${SNAPCRAFT_STAGE}"
export GOPATH=${SNAPCRAFT_STAGE}/gopath
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
versions_file="${kata_dir}/versions.yaml"
kernel_version="$(${yq} r $versions_file assets.kernel.version)"
#Remove extra 'v'
kernel_version="${kernel_version#v}"
kernel_version=${kernel_version#v}
[ "${arch}" = "s390x" ] && sudo apt-get --no-install-recommends install -y libssl-dev
[ "$(uname -m)" = "s390x" ] && sudo apt-get --no-install-recommends install -y libssl-dev
cd "${kata_dir}/tools/packaging/kernel"
export GOPATH=${SNAPCRAFT_STAGE}/gopath
export GO111MODULE="auto"
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
cd ${kata_dir}/tools/packaging/kernel
kernel_dir_prefix="kata-linux-"
# Setup and build kernel
./build-kernel.sh -v "${kernel_version}" -d setup
./build-kernel.sh -v ${kernel_version} -d setup
cd ${kernel_dir_prefix}*
make -j $(nproc ${CI:+--ignore 1}) EXTRAVERSION=".container"
make -j $(($(nproc)-1)) EXTRAVERSION=".container"
kernel_suffix="${kernel_version}.container"
kata_kernel_dir="${SNAPCRAFT_PART_INSTALL}/usr/share/kata-containers"
mkdir -p "${kata_kernel_dir}"
kernel_suffix=${kernel_version}.container
kata_kernel_dir=${SNAPCRAFT_PART_INSTALL}/usr/share/kata-containers
mkdir -p ${kata_kernel_dir}
# Install bz kernel
make install INSTALL_PATH="${kata_kernel_dir}" EXTRAVERSION=".container" || true
vmlinuz_name="vmlinuz-${kernel_suffix}"
ln -sf "${vmlinuz_name}" "${kata_kernel_dir}/vmlinuz.container"
make install INSTALL_PATH=${kata_kernel_dir} EXTRAVERSION=".container" || true
vmlinuz_name=vmlinuz-${kernel_suffix}
ln -sf ${vmlinuz_name} ${kata_kernel_dir}/vmlinuz.container
# Install raw kernel
vmlinux_path="vmlinux"
[ "${arch}" = "s390x" ] && vmlinux_path="arch/s390/boot/vmlinux"
vmlinux_name="vmlinux-${kernel_suffix}"
cp "${vmlinux_path}" "${kata_kernel_dir}/${vmlinux_name}"
ln -sf "${vmlinux_name}" "${kata_kernel_dir}/vmlinux.container"
vmlinux_path=vmlinux
[ "$(uname -m)" = "s390x" ] && vmlinux_path=arch/s390/boot/compressed/vmlinux
vmlinux_name=vmlinux-${kernel_suffix}
cp ${vmlinux_path} ${kata_kernel_dir}/${vmlinux_name}
ln -sf ${vmlinux_name} ${kata_kernel_dir}/vmlinux.container
qemu:
plugin: make
@@ -236,8 +249,12 @@ parts:
- libselinux1-dev
- ninja-build
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
yq=${SNAPCRAFT_STAGE}/yq
export GOPATH=${SNAPCRAFT_STAGE}/gopath
export GO111MODULE="auto"
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
versions_file="${kata_dir}/versions.yaml"
branch="$(${yq} r ${versions_file} assets.hypervisor.qemu.version)"
url="$(${yq} r ${versions_file} assets.hypervisor.qemu.url)"
commit=""
@@ -245,11 +262,11 @@ parts:
patches_version_dir="${kata_dir}/tools/packaging/qemu/patches/tag_patches/${branch}"
# download source
qemu_dir="${SNAPCRAFT_STAGE}/qemu"
qemu_dir=${SNAPCRAFT_STAGE}/qemu
rm -rf "${qemu_dir}"
git clone --depth 1 --branch ${branch} --single-branch ${url} "${qemu_dir}"
cd "${qemu_dir}"
[ -z "${commit}" ] || git checkout "${commit}"
cd ${qemu_dir}
[ -z "${commit}" ] || git checkout ${commit}
[ -n "$(ls -A ui/keycodemapdb)" ] || git clone --depth 1 https://github.com/qemu/keycodemapdb ui/keycodemapdb/
[ -n "$(ls -A capstone)" ] || git clone --depth 1 https://github.com/qemu/capstone capstone
@@ -260,10 +277,10 @@ parts:
${kata_dir}/tools/packaging/scripts/apply_patches.sh "${patches_version_dir}"
# Only x86_64 supports libpmem
[ "${arch}" = "x86_64" ] && sudo apt-get --no-install-recommends install -y apt-utils ca-certificates libpmem-dev
[ "$(uname -m)" = "x86_64" ] && sudo apt-get --no-install-recommends install -y apt-utils ca-certificates libpmem-dev
configure_hypervisor="${kata_dir}/tools/packaging/scripts/configure-hypervisor.sh"
chmod +x "${configure_hypervisor}"
configure_hypervisor=${kata_dir}/tools/packaging/scripts/configure-hypervisor.sh
chmod +x ${configure_hypervisor}
# static build. The --prefix, --libdir, --libexecdir, --datadir arguments are
# based on PREFIX and set by configure-hypervisor.sh
echo "$(PREFIX=/snap/${SNAPCRAFT_PROJECT_NAME}/current/usr ${configure_hypervisor} -s kata-qemu) \
@@ -273,17 +290,17 @@ parts:
# Copy QEMU configurations (Kconfigs)
case "${branch}" in
"v5.1.0")
cp -a "${kata_dir}"/tools/packaging/qemu/default-configs/* default-configs
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* default-configs
;;
*)
cp -a "${kata_dir}"/tools/packaging/qemu/default-configs/* configs/devices/
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* configs/devices/
;;
esac
# build and install
make -j $(nproc ${CI:+--ignore 1})
make install DESTDIR="${SNAPCRAFT_PART_INSTALL}"
make -j $(($(nproc)-1))
make install DESTDIR=${SNAPCRAFT_PART_INSTALL}
prime:
- -snap/
- -usr/bin/qemu-ga
@@ -299,67 +316,26 @@ parts:
# Hack: move qemu to /
"snap/kata-containers/current/": "./"
virtiofsd:
plugin: nil
after: [godeps, rustdeps]
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
# Currently, powerpc makes use of the QEMU's C implementation.
# The other platforms make use of the new rust virtiofsd.
#
# See "tools/packaging/scripts/configure-hypervisor.sh".
if [ "${arch}" == "ppc64le" ]
then
echo "INFO: Building QEMU's C version of virtiofsd"
# Handled by the 'qemu' part, so nothing more to do here.
exit 0
else
echo "INFO: Building rust version of virtiofsd"
fi
cd "${kata_dir}"
export PATH=${PATH}:${HOME}/.cargo/bin
# Download the rust implementation of virtiofsd
tools/packaging/static-build/virtiofsd/build-static-virtiofsd.sh
sudo install \
--owner='root' \
--group='root' \
--mode=0755 \
-D \
--target-directory="${SNAPCRAFT_PART_INSTALL}/usr/libexec/" \
virtiofsd/virtiofsd
cloud-hypervisor:
plugin: nil
after: [godeps]
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
if [ "${arch}" == "aarch64" ] || [ "${arch}" == "x86_64" ]; then
arch=$(uname -m)
if [ "{$arch}" == "aarch64" ] || [ "${arch}" == "x64_64" ]; then
sudo apt-get -y update
sudo apt-get -y install ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg |\
sudo gpg --batch --yes --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
distro_codename=$(lsb_release -cs)
echo "deb [arch=${dpkg_arch} signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu ${distro_codename} stable" |\
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --batch --yes --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io
sudo systemctl start docker.socket
cd "${SNAPCRAFT_PROJECT_DIR}"
export GOPATH=${SNAPCRAFT_STAGE}/gopath
kata_dir=${GOPATH}/src/github.com/${SNAPCRAFT_PROJECT_NAME}/${SNAPCRAFT_PROJECT_NAME}
cd ${kata_dir}
sudo -E NO_TTY=true make cloud-hypervisor-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-cloud-hypervisor.tar.xz"
tmpdir=$(mktemp -d)
tar -xvJpf "${tarfile}" -C "${tmpdir}"
install -D "${tmpdir}/opt/kata/bin/cloud-hypervisor" "${SNAPCRAFT_PART_INSTALL}/usr/bin/cloud-hypervisor"
rm -rf "${tmpdir}"
tar xvJpf build/kata-static-cloud-hypervisor.tar.xz -C /tmp/
install -D /tmp/opt/kata/bin/cloud-hypervisor ${SNAPCRAFT_PART_INSTALL}/usr/bin/cloud-hypervisor
fi
apps:

866
src/agent/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -3,26 +3,23 @@ name = "kata-agent"
version = "0.1.0"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
[dependencies]
oci = { path = "../libs/oci" }
rustjail = { path = "rustjail" }
protocols = { path = "../libs/protocols", features = ["async"] }
protocols = { path = "../libs/protocols" }
lazy_static = "1.3.0"
ttrpc = { version = "0.6.0", features = ["async"], default-features = false }
protobuf = "2.27.0"
ttrpc = { version = "0.5.0", features = ["async", "protobuf-codec"], default-features = false }
protobuf = "=2.14.0"
libc = "0.2.58"
nix = "0.24.2"
nix = "0.23.0"
capctl = "0.2.0"
serde_json = "1.0.39"
scan_fmt = "0.2.3"
scopeguard = "1.0.0"
thiserror = "1.0.26"
regex = "1.5.6"
regex = "1.5.4"
serial_test = "0.5.1"
kata-sys-util = { path = "../libs/kata-sys-util" }
kata-types = { path = "../libs/kata-types" }
sysinfo = "0.23.0"
# Async helpers
@@ -52,7 +49,7 @@ log = "0.4.11"
prometheus = { version = "0.13.0", features = ["process"] }
procfs = "0.12.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.10" }
cgroups = { package = "cgroups-rs", version = "0.2.8" }
# Tracing
tracing = "0.1.26"
@@ -68,7 +65,6 @@ clap = { version = "3.0.1", features = ["derive"] }
[dev-dependencies]
tempfile = "3.1.0"
test-utils = { path = "../libs/test-utils" }
[workspace]
members = [
@@ -80,8 +76,3 @@ lto = true
[features]
seccomp = ["rustjail/seccomp"]
standard-oci-runtime = ["rustjail/standard-oci-runtime"]
[[bin]]
name = "kata-agent"
path = "src/main.rs"

View File

@@ -14,6 +14,10 @@ PROJECT_COMPONENT = kata-agent
TARGET = $(PROJECT_COMPONENT)
SOURCES := \
$(shell find . 2>&1 | grep -E '.*\.rs$$') \
Cargo.toml
VERSION_FILE := ./VERSION
VERSION := $(shell grep -v ^\# $(VERSION_FILE))
COMMIT_NO := $(shell git rev-parse HEAD 2>/dev/null || true)
@@ -33,16 +37,8 @@ ifeq ($(SECCOMP),yes)
override EXTRA_RUSTFEATURES += seccomp
endif
##VAR STANDARD_OCI_RUNTIME=yes|no define if agent enables standard oci runtime feature
STANDARD_OCI_RUNTIME := no
# Enable standard oci runtime feature of rust build
ifeq ($(STANDARD_OCI_RUNTIME),yes)
override EXTRA_RUSTFEATURES += standard-oci-runtime
endif
ifneq ($(EXTRA_RUSTFEATURES),)
override EXTRA_RUSTFEATURES := --features "$(EXTRA_RUSTFEATURES)"
override EXTRA_RUSTFEATURES := --features $(EXTRA_RUSTFEATURES)
endif
include ../../utils.mk
@@ -107,17 +103,20 @@ endef
##TARGET default: build code
default: $(TARGET) show-header
$(TARGET): $(GENERATED_CODE) $(TARGET_PATH)
$(TARGET): $(GENERATED_CODE) logging-crate-tests $(TARGET_PATH)
$(TARGET_PATH): show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) $(if $(findstring release,$(BUILD_TYPE)),--release) $(EXTRA_RUSTFEATURES)
logging-crate-tests:
make -C $(CWD)/../libs/logging
$(TARGET_PATH): $(SOURCES) | show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES)
$(GENERATED_FILES): %: %.in
@sed $(foreach r,$(GENERATED_REPLACEMENTS),-e 's|@$r@|$($r)|g') "$<" > "$@"
##TARGET optimize: optimized build
optimize: show-summary show-header
@RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) $(if $(findstring release,$(BUILD_TYPE)),--release) $(EXTRA_RUSTFEATURES)
optimize: $(SOURCES) | show-summary show-header
@RUSTFLAGS="-C link-arg=-s $(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) --$(BUILD_TYPE) $(EXTRA_RUSTFEATURES)
##TARGET install: install agent
install: install-services
@@ -200,6 +199,7 @@ codecov-html: check_tarpaulin
.PHONY: \
help \
logging-crate-tests \
optimize \
show-header \
show-summary \

View File

@@ -3,7 +3,6 @@ name = "rustjail"
version = "0.1.0"
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
[dependencies]
serde = "1.0.91"
@@ -12,33 +11,30 @@ serde_derive = "1.0.91"
oci = { path = "../../libs/oci" }
protocols = { path ="../../libs/protocols" }
caps = "0.5.0"
nix = "0.24.2"
nix = "0.23.0"
scopeguard = "1.0.0"
capctl = "0.2.0"
lazy_static = "1.3.0"
libc = "0.2.58"
protobuf = "2.27.0"
protobuf = "=2.14.0"
slog = "2.5.2"
slog-scope = "4.1.2"
scan_fmt = "0.2.6"
regex = "1.5.6"
regex = "1.5.4"
path-absolutize = "1.2.0"
anyhow = "1.0.32"
cgroups = { package = "cgroups-rs", version = "0.2.10" }
cgroups = { package = "cgroups-rs", version = "0.2.8" }
rlimit = "0.5.3"
cfg-if = "0.1.0"
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros", "rt"] }
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros"] }
futures = "0.3.17"
async-trait = "0.1.31"
inotify = "0.9.2"
libseccomp = { version = "0.2.3", optional = true }
libseccomp = { version = "0.1.3", optional = true }
[dev-dependencies]
serial_test = "0.5.0"
tempfile = "3.1.0"
test-utils = { path = "../../libs/test-utils" }
[features]
seccomp = ["libseccomp"]
standard-oci-runtime = []

View File

@@ -174,7 +174,7 @@ impl CgroupManager for Manager {
freezer_controller.freeze()?;
}
_ => {
return Err(anyhow!("Invalid FreezerState"));
return Err(anyhow!(nix::Error::EINVAL));
}
}
@@ -911,8 +911,9 @@ pub fn get_paths() -> Result<HashMap<String, String>> {
Ok(m)
}
pub fn get_mounts(paths: &HashMap<String, String>) -> Result<HashMap<String, String>> {
pub fn get_mounts() -> Result<HashMap<String, String>> {
let mut m = HashMap::new();
let paths = get_paths()?;
for l in fs::read_to_string(MOUNTS)?.lines() {
let p: Vec<&str> = l.splitn(2, " - ").collect();
@@ -950,7 +951,7 @@ impl Manager {
let mut m = HashMap::new();
let paths = get_paths()?;
let mounts = get_mounts(&paths)?;
let mounts = get_mounts()?;
for key in paths.keys() {
let mnt = mounts.get(key);

View File

@@ -151,12 +151,12 @@ async fn register_memory_event(
let eventfd = eventfd(0, EfdFlags::EFD_CLOEXEC)?;
let event_control_path = Path::new(&cg_dir).join("cgroup.event_control");
let data = if arg.is_empty() {
format!("{} {}", eventfd, event_file.as_raw_fd())
let data;
if arg.is_empty() {
data = format!("{} {}", eventfd, event_file.as_raw_fd());
} else {
format!("{} {} {}", eventfd, event_file.as_raw_fd(), arg)
};
data = format!("{} {} {}", eventfd, event_file.as_raw_fd(), arg);
}
fs::write(&event_control_path, data)?;

View File

@@ -1,77 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
//
// Copyright 2021 Sony Group Corporation
//
use anyhow::{anyhow, Result};
use nix::errno::Errno;
use nix::pty;
use nix::sys::socket;
use nix::unistd::{self, dup2};
use std::io::IoSlice;
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;
pub fn setup_console_socket(csocket_path: &str) -> Result<Option<RawFd>> {
if csocket_path.is_empty() {
return Ok(None);
}
let socket_fd = socket::socket(
socket::AddressFamily::Unix,
socket::SockType::Stream,
socket::SockFlag::empty(),
None,
)?;
match socket::connect(socket_fd, &socket::UnixAddr::new(Path::new(csocket_path))?) {
Ok(()) => Ok(Some(socket_fd)),
Err(errno) => Err(anyhow!("failed to open console fd: {}", errno)),
}
}
pub fn setup_master_console(socket_fd: RawFd) -> Result<()> {
let pseudo = pty::openpty(None, None)?;
let pty_name: &[u8] = b"/dev/ptmx";
let iov = [IoSlice::new(pty_name)];
let fds = [pseudo.master];
let cmsg = socket::ControlMessage::ScmRights(&fds);
socket::sendmsg::<()>(socket_fd, &iov, &[cmsg], socket::MsgFlags::empty(), None)?;
unistd::setsid()?;
let ret = unsafe { libc::ioctl(pseudo.slave, libc::TIOCSCTTY) };
Errno::result(ret).map_err(|e| anyhow!(e).context("ioctl TIOCSCTTY"))?;
dup2(pseudo.slave, std::io::stdin().as_raw_fd())?;
dup2(pseudo.slave, std::io::stdout().as_raw_fd())?;
dup2(pseudo.slave, std::io::stderr().as_raw_fd())?;
unistd::close(socket_fd)?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use std::os::unix::net::UnixListener;
use tempfile::{self, tempdir};
const CONSOLE_SOCKET: &str = "console-socket";
#[test]
fn test_setup_console_socket() {
let dir = tempdir()
.map_err(|e| anyhow!(e).context("tempdir failed"))
.unwrap();
let socket_path = dir.path().join(CONSOLE_SOCKET);
let _listener = UnixListener::bind(&socket_path).unwrap();
let ret = setup_console_socket(socket_path.to_str().unwrap());
assert!(ret.is_ok());
}
}

View File

@@ -23,8 +23,6 @@ use crate::cgroups::fs::Manager as FsManager;
#[cfg(test)]
use crate::cgroups::mock::Manager as FsManager;
use crate::cgroups::Manager;
#[cfg(feature = "standard-oci-runtime")]
use crate::console;
use crate::log_child;
use crate::process::Process;
#[cfg(feature = "seccomp")]
@@ -42,7 +40,7 @@ use nix::pty;
use nix::sched::{self, CloneFlags};
use nix::sys::signal::{self, Signal};
use nix::sys::stat::{self, Mode};
use nix::unistd::{self, fork, ForkResult, Gid, Pid, Uid, User};
use nix::unistd::{self, fork, ForkResult, Gid, Pid, Uid};
use std::os::unix::fs::MetadataExt;
use std::os::unix::io::AsRawFd;
@@ -64,7 +62,9 @@ use rlimit::{setrlimit, Resource, Rlim};
use tokio::io::AsyncBufReadExt;
use tokio::sync::Mutex;
pub const EXEC_FIFO_FILENAME: &str = "exec.fifo";
use crate::utils;
const EXEC_FIFO_FILENAME: &str = "exec.fifo";
const INIT: &str = "INIT";
const NO_PIVOT: &str = "NO_PIVOT";
@@ -74,7 +74,6 @@ const CLOG_FD: &str = "CLOG_FD";
const FIFO_FD: &str = "FIFO_FD";
const HOME_ENV_KEY: &str = "HOME";
const PIDNS_FD: &str = "PIDNS_FD";
const CONSOLE_SOCKET_FD: &str = "CONSOLE_SOCKET_FD";
#[derive(Debug)]
pub struct ContainerStatus {
@@ -83,7 +82,7 @@ pub struct ContainerStatus {
}
impl ContainerStatus {
pub fn new() -> Self {
fn new() -> Self {
ContainerStatus {
pre_status: ContainerState::Created,
cur_status: ContainerState::Created,
@@ -100,17 +99,6 @@ impl ContainerStatus {
}
}
impl Default for ContainerStatus {
fn default() -> Self {
Self::new()
}
}
// We might want to change this to thiserror in the future
const MissingCGroupManager: &str = "failed to get container's cgroup Manager";
const MissingLinux: &str = "no linux config";
const InvalidNamespace: &str = "invalid namespace type";
pub type Config = CreateOpts;
type NamespaceType = String;
@@ -118,7 +106,7 @@ lazy_static! {
// This locker ensures the child exit signal will be received by the right receiver.
pub static ref WAIT_PID_LOCKER: Arc<Mutex<bool>> = Arc::new(Mutex::new(false));
pub static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
static ref NAMESPACES: HashMap<&'static str, CloneFlags> = {
let mut m = HashMap::new();
m.insert("user", CloneFlags::CLONE_NEWUSER);
m.insert("ipc", CloneFlags::CLONE_NEWIPC);
@@ -131,7 +119,7 @@ lazy_static! {
};
// type to name hashmap, better to be in NAMESPACES
pub static ref TYPETONAME: HashMap<&'static str, &'static str> = {
static ref TYPETONAME: HashMap<&'static str, &'static str> = {
let mut m = HashMap::new();
m.insert("ipc", "ipc");
m.insert("user", "user");
@@ -227,7 +215,7 @@ pub trait BaseContainer {
async fn start(&mut self, p: Process) -> Result<()>;
async fn run(&mut self, p: Process) -> Result<()>;
async fn destroy(&mut self) -> Result<()>;
async fn exec(&mut self) -> Result<()>;
fn exec(&mut self) -> Result<()>;
}
// LinuxContainer protected by Mutex
@@ -248,8 +236,6 @@ pub struct LinuxContainer {
pub status: ContainerStatus,
pub created: SystemTime,
pub logger: Logger,
#[cfg(feature = "standard-oci-runtime")]
pub console_socket: PathBuf,
}
#[derive(Serialize, Deserialize, Debug)]
@@ -297,7 +283,7 @@ impl Container for LinuxContainer {
self.status.transition(ContainerState::Paused);
return Ok(());
}
Err(anyhow!(MissingCGroupManager))
Err(anyhow!("failed to get container's cgroup manager"))
}
fn resume(&mut self) -> Result<()> {
@@ -315,7 +301,7 @@ impl Container for LinuxContainer {
self.status.transition(ContainerState::Running);
return Ok(());
}
Err(anyhow!(MissingCGroupManager))
Err(anyhow!("failed to get container's cgroup manager"))
}
}
@@ -373,6 +359,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
)));
}
}
log_child!(cfd_log, "child process start run");
let buf = read_sync(crfd)?;
let spec_str = std::str::from_utf8(&buf)?;
@@ -392,9 +379,6 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let cm: FsManager = serde_json::from_str(cm_str)?;
#[cfg(feature = "standard-oci-runtime")]
let csocket_fd = console::setup_console_socket(&std::env::var(CONSOLE_SOCKET_FD)?)?;
let p = if spec.process.is_some() {
spec.process.as_ref().unwrap()
} else {
@@ -402,7 +386,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
};
if spec.linux.is_none() {
return Err(anyhow!(MissingLinux));
return Err(anyhow!("no linux config"));
}
let linux = spec.linux.as_ref().unwrap();
@@ -416,7 +400,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
for ns in &nses {
let s = NAMESPACES.get(&ns.r#type.as_str());
if s.is_none() {
return Err(anyhow!(InvalidNamespace));
return Err(anyhow!("invalid ns type"));
}
let s = s.unwrap();
@@ -592,20 +576,14 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
// only change stdio devices owner when user
// isn't root.
if !uid.is_root() {
set_stdio_permissions(uid)?;
if guser.uid != 0 {
set_stdio_permissions(guser.uid)?;
}
setid(uid, gid)?;
if !guser.additional_gids.is_empty() {
let gids: Vec<Gid> = guser
.additional_gids
.iter()
.map(|gid| Gid::from_raw(*gid))
.collect();
unistd::setgroups(&gids).map_err(|e| {
setgroups(guser.additional_gids.as_slice()).map_err(|e| {
let _ = write_sync(
cwfd,
SYNC_FAILED,
@@ -645,6 +623,11 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
capabilities::drop_privileges(cfd_log, c)?;
}
if init {
// notify parent to run poststart hooks
write_sync(cwfd, SYNC_SUCCESS, "")?;
}
let args = oci_process.args.to_vec();
let env = oci_process.env.to_vec();
@@ -666,17 +649,12 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
}
}
// set the "HOME" env getting from "/etc/passwd", if
// there's no uid entry in /etc/passwd, set "/" as the
// home env.
if env::var_os(HOME_ENV_KEY).is_none() {
// try to set "HOME" env by uid
if let Ok(Some(user)) = User::from_uid(Uid::from_raw(guser.uid)) {
if let Ok(user_home_dir) = user.dir.into_os_string().into_string() {
env::set_var(HOME_ENV_KEY, user_home_dir);
}
}
// set default home dir as "/" if "HOME" env is still empty
if env::var_os(HOME_ENV_KEY).is_none() {
env::set_var(HOME_ENV_KEY, String::from("/"));
}
let home_dir = utils::home_dir(guser.uid).unwrap_or_else(|_| String::from("/"));
env::set_var(HOME_ENV_KEY, home_dir);
}
let exec_file = Path::new(&args[0]);
@@ -692,19 +670,10 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let _ = unistd::close(crfd);
let _ = unistd::close(cwfd);
unistd::setsid().context("create a new session")?;
if oci_process.terminal {
cfg_if::cfg_if! {
if #[cfg(feature = "standard-oci-runtime")] {
if let Some(csocket_fd) = csocket_fd {
console::setup_master_console(csocket_fd)?;
} else {
return Err(anyhow!("failed to get console master socket fd"));
}
}
else {
unistd::setsid().context("create a new session")?;
unsafe { libc::ioctl(0, libc::TIOCSCTTY) };
}
unsafe {
libc::ioctl(0, libc::TIOCSCTTY);
}
}
@@ -736,7 +705,7 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
// within the container to the specified user.
// The ownership needs to match because it is created outside of
// the container and needs to be localized.
fn set_stdio_permissions(uid: Uid) -> Result<()> {
fn set_stdio_permissions(uid: libc::uid_t) -> Result<()> {
let meta = fs::metadata("/dev/null")?;
let fds = [
std::io::stdin().as_raw_fd(),
@@ -751,13 +720,19 @@ fn set_stdio_permissions(uid: Uid) -> Result<()> {
continue;
}
// According to the POSIX specification, -1 is used to indicate that owner and group
// are not to be changed. Since uid_t and gid_t are unsigned types, we have to wrap
// around to get -1.
let gid = 0u32.wrapping_sub(1);
// We only change the uid owner (as it is possible for the mount to
// prefer a different gid, and there's no reason for us to change it).
// The reason why we don't just leave the default uid=X mount setup is
// that users expect to be able to actually use their console. Without
// this code, you couldn't effectively run as a non-root user inside a
// container and also have a console set up.
unistd::fchown(*fd, Some(uid), None).with_context(|| "set stdio permissions failed")?;
let res = unsafe { libc::fchown(*fd, uid, gid) };
Errno::result(res).map_err(|e| anyhow!(e).context("set stdio permissions failed"))?;
}
Ok(())
@@ -953,14 +928,6 @@ impl BaseContainer for LinuxContainer {
let exec_path = std::env::current_exe()?;
let mut child = std::process::Command::new(exec_path);
#[allow(unused_mut)]
let mut console_name = PathBuf::from("");
#[cfg(feature = "standard-oci-runtime")]
if !self.console_socket.as_os_str().is_empty() {
console_name = self.console_socket.clone();
}
let mut child = child
.arg("init")
.stdin(child_stdin)
@@ -970,8 +937,7 @@ impl BaseContainer for LinuxContainer {
.env(NO_PIVOT, format!("{}", self.config.no_pivot_root))
.env(CRFD_FD, format!("{}", crfd))
.env(CWFD_FD, format!("{}", cwfd))
.env(CLOG_FD, format!("{}", cfd_log))
.env(CONSOLE_SOCKET_FD, console_name);
.env(CLOG_FD, format!("{}", cfd_log));
if p.init {
child = child.env(FIFO_FD, format!("{}", fifofd));
@@ -1054,7 +1020,7 @@ impl BaseContainer for LinuxContainer {
self.start(p).await?;
if init {
self.exec().await?;
self.exec()?;
self.status.transition(ContainerState::Running);
}
@@ -1097,22 +1063,12 @@ impl BaseContainer for LinuxContainer {
fs::remove_dir_all(&self.root)?;
if let Some(cgm) = self.cgroup_manager.as_mut() {
// Kill all of the processes created in this container to prevent
// the leak of some daemon process when this container shared pidns
// with the sandbox.
let pids = cgm.get_pids().context("get cgroup pids")?;
for i in pids {
if let Err(e) = signal::kill(Pid::from_raw(i), Signal::SIGKILL) {
warn!(self.logger, "kill the process {} error: {:?}", i, e);
}
}
cgm.destroy().context("destroy cgroups")?;
}
Ok(())
}
async fn exec(&mut self) -> Result<()> {
fn exec(&mut self) -> Result<()> {
let fifo = format!("{}/{}", &self.root, EXEC_FIFO_FILENAME);
let fd = fcntl::open(fifo.as_str(), OFlag::O_WRONLY, Mode::from_bits_truncate(0))?;
let data: &[u8] = &[0];
@@ -1124,26 +1080,6 @@ impl BaseContainer for LinuxContainer {
.as_secs();
self.status.transition(ContainerState::Running);
let spec = self
.config
.spec
.as_ref()
.ok_or_else(|| anyhow!("OCI spec was not found"))?;
let st = self.oci_state()?;
// run poststart hook
if spec.hooks.is_some() {
info!(self.logger, "poststart hook");
let hooks = spec
.hooks
.as_ref()
.ok_or_else(|| anyhow!("OCI hooks were not found"))?;
for h in hooks.poststart.iter() {
execute_hook(&self.logger, h, &st).await?;
}
}
unistd::close(fd)?;
Ok(())
@@ -1186,7 +1122,7 @@ fn do_exec(args: &[String]) -> ! {
unreachable!()
}
pub fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
info!(logger, "updating namespaces");
let linux = spec
.linux
@@ -1365,6 +1301,20 @@ async fn join_namespaces(
// notify child run prestart hooks completed
info!(logger, "notify child run prestart hook completed!");
write_async(pipe_w, SYNC_SUCCESS, "").await?;
info!(logger, "notify child parent ready to run poststart hook!");
// wait to run poststart hook
read_async(pipe_r).await?;
info!(logger, "get ready to run poststart hook!");
// run poststart hook
if spec.hooks.is_some() {
info!(logger, "poststart hook");
let hooks = spec.hooks.as_ref().unwrap();
for h in hooks.poststart.iter() {
execute_hook(&logger, h, st).await?;
}
}
}
info!(logger, "wait for child process ready to run exec");
@@ -1442,10 +1392,18 @@ impl LinuxContainer {
Some(unistd::getuid()),
Some(unistd::getgid()),
)
.context(format!("Cannot change owner of container {} root", id))?;
.context(format!("cannot change onwer of container {} root", id))?;
if config.spec.is_none() {
return Err(anyhow!(nix::Error::EINVAL));
}
let spec = config.spec.as_ref().unwrap();
if spec.linux.is_none() {
return Err(anyhow!(nix::Error::EINVAL));
}
let linux = spec.linux.as_ref().unwrap();
let cpath = if linux.cgroups_path.is_empty() {
@@ -1454,12 +1412,7 @@ impl LinuxContainer {
linux.cgroups_path.clone()
};
let cgroup_manager = FsManager::new(cpath.as_str()).map_err(|e| {
anyhow!(format!(
"fail to create cgroup manager with path {}: {:}",
cpath, e
))
})?;
let cgroup_manager = FsManager::new(cpath.as_str())?;
info!(logger, "new cgroup_manager {:?}", &cgroup_manager);
Ok(LinuxContainer {
@@ -1478,16 +1431,14 @@ impl LinuxContainer {
.unwrap()
.as_secs(),
logger: logger.new(o!("module" => "rustjail", "subsystem" => "container", "cid" => id)),
#[cfg(feature = "standard-oci-runtime")]
console_socket: Path::new("").to_path_buf(),
})
}
}
#[cfg(feature = "standard-oci-runtime")]
pub fn set_console_socket(&mut self, console_socket: &Path) -> Result<()> {
self.console_socket = console_socket.to_path_buf();
Ok(())
}
fn setgroups(grps: &[libc::gid_t]) -> Result<()> {
let ret = unsafe { libc::setgroups(grps.len(), grps.as_ptr() as *const libc::gid_t) };
Errno::result(ret).map(drop)?;
Ok(())
}
use std::fs::OpenOptions;
@@ -1521,13 +1472,13 @@ use std::process::Stdio;
use std::time::Duration;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
pub async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
async fn execute_hook(logger: &Logger, h: &Hook, st: &OCIState) -> Result<()> {
let logger = logger.new(o!("action" => "execute-hook"));
let binary = PathBuf::from(h.path.as_str());
let path = binary.canonicalize()?;
if !path.exists() {
return Err(anyhow!("Path {:?} does not exist", path));
return Err(anyhow!(nix::Error::EINVAL));
}
let mut args = h.args.clone();
@@ -1658,12 +1609,11 @@ fn valid_env(e: &str) -> Option<(&str, &str)> {
mod tests {
use super::*;
use crate::process::Process;
use nix::unistd::Uid;
use crate::skip_if_not_root;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
use tokio::process::Command;
macro_rules! sl {
@@ -1805,7 +1755,7 @@ mod tests {
let old_uid = meta.uid();
let uid = 1000;
set_stdio_permissions(Uid::from_raw(uid)).unwrap();
set_stdio_permissions(uid).unwrap();
let meta = fs::metadata("/dev/stdin").unwrap();
assert_eq!(meta.uid(), uid);
@@ -1817,7 +1767,7 @@ mod tests {
assert_eq!(meta.uid(), uid);
// restore the uid
set_stdio_permissions(Uid::from_raw(old_uid)).unwrap();
set_stdio_permissions(old_uid).unwrap();
}
#[test]
@@ -2098,10 +2048,9 @@ mod tests {
assert!(ret.is_ok(), "Expecting Ok, Got {:?}", ret);
}
#[tokio::test]
async fn test_linuxcontainer_exec() {
let (c, _dir) = new_linux_container();
let ret = c.unwrap().exec().await;
#[test]
fn test_linuxcontainer_exec() {
let ret = new_linux_container_and_then(|mut c: LinuxContainer| c.exec());
assert!(ret.is_err(), "Expecting Err, Got {:?}", ret);
}

View File

@@ -30,8 +30,6 @@ extern crate regex;
pub mod capabilities;
pub mod cgroups;
#[cfg(feature = "standard-oci-runtime")]
pub mod console;
pub mod container;
pub mod mount;
pub mod pipestream;
@@ -41,6 +39,7 @@ pub mod seccomp;
pub mod specconv;
pub mod sync;
pub mod sync_with_async;
pub mod utils;
pub mod validator;
use std::collections::HashMap;
@@ -352,12 +351,13 @@ fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
for sys in sec.Syscalls.iter() {
let mut args = Vec::new();
let errno_ret: u32;
let errno_ret: u32 = if sys.has_errnoret() {
sys.get_errnoret()
if sys.has_errnoret() {
errno_ret = sys.get_errnoret();
} else {
libc::EPERM as u32
};
errno_ret = libc::EPERM as u32;
}
for arg in sys.Args.iter() {
args.push(oci::LinuxSeccompArg {
@@ -513,596 +513,13 @@ pub fn grpc_to_oci(grpc: &grpc::Spec) -> oci::Spec {
#[cfg(test)]
mod tests {
use super::*;
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
macro_rules! skip_if_not_root {
() => {
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
};
}
#[test]
fn test_process_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpcproc: grpc::Process,
result: oci::Process,
}
let tests = &[
TestData {
// All fields specified
grpcproc: grpc::Process {
Terminal: true,
ConsoleSize: protobuf::SingularPtrField::<grpc::Box>::some(grpc::Box {
Height: 123,
Width: 456,
..Default::default()
}),
User: protobuf::SingularPtrField::<grpc::User>::some(grpc::User {
UID: 1234,
GID: 5678,
AdditionalGids: Vec::from([910, 1112]),
Username: String::from("username"),
..Default::default()
}),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([String::from("env")])),
Cwd: String::from("cwd"),
Capabilities: protobuf::SingularPtrField::some(grpc::LinuxCapabilities {
Bounding: protobuf::RepeatedField::from(Vec::from([String::from("bnd")])),
Effective: protobuf::RepeatedField::from(Vec::from([String::from("eff")])),
Inheritable: protobuf::RepeatedField::from(Vec::from([String::from(
"inher",
)])),
Permitted: protobuf::RepeatedField::from(Vec::from([String::from("perm")])),
Ambient: protobuf::RepeatedField::from(Vec::from([String::from("amb")])),
..Default::default()
}),
Rlimits: protobuf::RepeatedField::from(Vec::from([
grpc::POSIXRlimit {
Type: String::from("r#type"),
Hard: 123,
Soft: 456,
..Default::default()
},
grpc::POSIXRlimit {
Type: String::from("r#type2"),
Hard: 789,
Soft: 1011,
..Default::default()
},
])),
NoNewPrivileges: true,
ApparmorProfile: String::from("apparmor profile"),
OOMScoreAdj: 123456,
SelinuxLabel: String::from("Selinux Label"),
..Default::default()
},
result: oci::Process {
terminal: true,
console_size: Some(oci::Box {
height: 123,
width: 456,
}),
user: oci::User {
uid: 1234,
gid: 5678,
additional_gids: Vec::from([910, 1112]),
username: String::from("username"),
},
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env")]),
cwd: String::from("cwd"),
capabilities: Some(oci::LinuxCapabilities {
bounding: Vec::from([String::from("bnd")]),
effective: Vec::from([String::from("eff")]),
inheritable: Vec::from([String::from("inher")]),
permitted: Vec::from([String::from("perm")]),
ambient: Vec::from([String::from("amb")]),
}),
rlimits: Vec::from([
oci::PosixRlimit {
r#type: String::from("r#type"),
hard: 123,
soft: 456,
},
oci::PosixRlimit {
r#type: String::from("r#type2"),
hard: 789,
soft: 1011,
},
]),
no_new_privileges: true,
apparmor_profile: String::from("apparmor profile"),
oom_score_adj: Some(123456),
selinux_label: String::from("Selinux Label"),
},
},
TestData {
// None ConsoleSize
grpcproc: grpc::Process {
ConsoleSize: protobuf::SingularPtrField::<grpc::Box>::none(),
OOMScoreAdj: 0,
..Default::default()
},
result: oci::Process {
console_size: None,
oom_score_adj: Some(0),
..Default::default()
},
},
TestData {
// None User
grpcproc: grpc::Process {
User: protobuf::SingularPtrField::<grpc::User>::none(),
OOMScoreAdj: 0,
..Default::default()
},
result: oci::Process {
user: oci::User {
uid: 0,
gid: 0,
additional_gids: vec![],
username: String::from(""),
},
oom_score_adj: Some(0),
..Default::default()
},
},
TestData {
// None Capabilities
grpcproc: grpc::Process {
Capabilities: protobuf::SingularPtrField::none(),
OOMScoreAdj: 0,
..Default::default()
},
result: oci::Process {
capabilities: None,
oom_score_adj: Some(0),
..Default::default()
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = process_grpc_to_oci(&d.grpcproc);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_root_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpcroot: grpc::Root,
result: oci::Root,
}
let tests = &[
TestData {
// Default fields
grpcroot: grpc::Root {
..Default::default()
},
result: oci::Root {
..Default::default()
},
},
TestData {
// Specified fields, readonly false
grpcroot: grpc::Root {
Path: String::from("path"),
Readonly: false,
..Default::default()
},
result: oci::Root {
path: String::from("path"),
readonly: false,
..Default::default()
},
},
TestData {
// Specified fields, readonly true
grpcroot: grpc::Root {
Path: String::from("path"),
Readonly: true,
..Default::default()
},
result: oci::Root {
path: String::from("path"),
readonly: true,
..Default::default()
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = root_grpc_to_oci(&d.grpcroot);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_hooks_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpchooks: grpc::Hooks,
result: oci::Hooks,
}
let tests = &[
TestData {
// Default fields
grpchooks: grpc::Hooks {
..Default::default()
},
result: oci::Hooks {
..Default::default()
},
},
TestData {
// All specified
grpchooks: grpc::Hooks {
Prestart: protobuf::RepeatedField::from(Vec::from([
grpc::Hook {
Path: String::from("prestartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
},
grpc::Hook {
Path: String::from("prestartpath2"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg3"),
String::from("arg4"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env3"),
String::from("env4"),
])),
Timeout: 25,
..Default::default()
},
])),
Poststart: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
Poststop: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststoppath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
..Default::default()
},
result: oci::Hooks {
prestart: Vec::from([
oci::Hook {
path: String::from("prestartpath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
},
oci::Hook {
path: String::from("prestartpath2"),
args: Vec::from([String::from("arg3"), String::from("arg4")]),
env: Vec::from([String::from("env3"), String::from("env4")]),
timeout: Some(25),
},
]),
poststart: Vec::from([oci::Hook {
path: String::from("poststartpath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
poststop: Vec::from([oci::Hook {
path: String::from("poststoppath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
},
},
TestData {
// Prestart empty
grpchooks: grpc::Hooks {
Prestart: protobuf::RepeatedField::from(Vec::from([])),
Poststart: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
Poststop: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Path: String::from("poststoppath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
}])),
..Default::default()
},
result: oci::Hooks {
prestart: Vec::from([]),
poststart: Vec::from([oci::Hook {
path: String::from("poststartpath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
poststop: Vec::from([oci::Hook {
path: String::from("poststoppath"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
}]),
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = hooks_grpc_to_oci(&d.grpchooks);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_mount_grpc_to_oci() {
#[derive(Debug)]
struct TestData {
grpcmount: grpc::Mount,
result: oci::Mount,
}
let tests = &[
TestData {
// Default fields
grpcmount: grpc::Mount {
..Default::default()
},
result: oci::Mount {
..Default::default()
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::from([
String::from("option1"),
String::from("option2"),
])),
..Default::default()
},
result: oci::Mount {
destination: String::from("destination"),
source: String::from("source"),
r#type: String::from("fieldtype"),
options: Vec::from([String::from("option1"), String::from("option2")]),
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::new()),
..Default::default()
},
result: oci::Mount {
destination: String::from("destination"),
source: String::from("source"),
r#type: String::from("fieldtype"),
options: Vec::new(),
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::new(),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::from([String::from("option1")])),
..Default::default()
},
result: oci::Mount {
destination: String::new(),
source: String::from("source"),
r#type: String::from("fieldtype"),
options: Vec::from([String::from("option1")]),
},
},
TestData {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::new(),
options: protobuf::RepeatedField::from(Vec::from([String::from("option1")])),
..Default::default()
},
result: oci::Mount {
destination: String::from("destination"),
source: String::from("source"),
r#type: String::new(),
options: Vec::from([String::from("option1")]),
},
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = mount_grpc_to_oci(&d.grpcmount);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[test]
fn test_hook_grpc_to_oci<'a>() {
#[derive(Debug)]
struct TestData<'a> {
grpchook: &'a [grpc::Hook],
result: Vec<oci::Hook>,
}
let tests = &[
TestData {
// Default fields
grpchook: &[
grpc::Hook {
Timeout: 0,
..Default::default()
},
grpc::Hook {
Timeout: 0,
..Default::default()
},
],
result: vec![
oci::Hook {
timeout: Some(0),
..Default::default()
},
oci::Hook {
timeout: Some(0),
..Default::default()
},
],
},
TestData {
// Specified fields
grpchook: &[
grpc::Hook {
Path: String::from("path"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Timeout: 10,
..Default::default()
},
grpc::Hook {
Path: String::from("path2"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg3"),
String::from("arg4"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env3"),
String::from("env4"),
])),
Timeout: 20,
..Default::default()
},
],
result: vec![
oci::Hook {
path: String::from("path"),
args: Vec::from([String::from("arg1"), String::from("arg2")]),
env: Vec::from([String::from("env1"), String::from("env2")]),
timeout: Some(10),
},
oci::Hook {
path: String::from("path2"),
args: Vec::from([String::from("arg3"), String::from("arg4")]),
env: Vec::from([String::from("env3"), String::from("env4")]),
timeout: Some(20),
},
],
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = hook_grpc_to_oci(d.grpchook);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
}

View File

@@ -32,21 +32,16 @@ use crate::log_child;
// Info reveals information about a particular mounted filesystem. This
// struct is populated from the content in the /proc/<pid>/mountinfo file.
#[derive(std::fmt::Debug, PartialEq)]
#[derive(std::fmt::Debug)]
pub struct Info {
mount_point: String,
optional: String,
fstype: String,
}
const MOUNTINFO_FORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const MOUNTINFO_PATH: &str = "/proc/self/mountinfo";
const MOUNTINFOFORMAT: &str = "{d} {d} {d}:{d} {} {} {} {}";
const PROC_PATH: &str = "/proc";
const ERR_FAILED_PARSE_MOUNTINFO: &str = "failed to parse mountinfo file";
const ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS: &str =
"failed to parse final fields in mountinfo file";
// since libc didn't defined this const for musl, thus redefined it here.
#[cfg(all(target_os = "linux", target_env = "gnu", not(target_arch = "s390x")))]
const PROC_SUPER_MAGIC: libc::c_long = 0x00009fa0;
@@ -523,7 +518,7 @@ pub fn pivot_rootfs<P: ?Sized + NixPath + std::fmt::Debug>(path: &P) -> Result<(
}
fn rootfs_parent_mount_private(path: &str) -> Result<()> {
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let mount_infos = parse_mount_table()?;
let mut max_len = 0;
let mut mount_point = String::from("");
@@ -551,8 +546,8 @@ fn rootfs_parent_mount_private(path: &str) -> Result<()> {
// Parse /proc/self/mountinfo because comparing Dev and ino does not work from
// bind mounts
fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
let file = File::open(mountinfo_path)?;
fn parse_mount_table() -> Result<Vec<Info>> {
let file = File::open("/proc/self/mountinfo")?;
let reader = BufReader::new(file);
let mut infos = Vec::new();
@@ -574,7 +569,7 @@ fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
let (_id, _parent, _major, _minor, _root, mount_point, _opts, optional) = scan_fmt!(
&line,
MOUNTINFO_FORMAT,
MOUNTINFOFORMAT,
i32,
i32,
i32,
@@ -583,17 +578,12 @@ fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
String,
String,
String
)
.map_err(|_| anyhow!(ERR_FAILED_PARSE_MOUNTINFO))?;
)?;
let fields: Vec<&str> = line.split(" - ").collect();
if fields.len() == 2 {
let final_fields: Vec<&str> = fields[1].split_whitespace().collect();
if final_fields.len() != 3 {
return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS));
}
let fstype = final_fields[0].to_string();
let (fstype, _source, _vfs_opts) =
scan_fmt!(fields[1], "{} {} {}", String, String, String)?;
let mut optional_new = String::new();
if optional != "-" {
@@ -608,7 +598,7 @@ fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
infos.push(info);
} else {
return Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO));
return Err(anyhow!("failed to parse mount info file".to_string()));
}
}
@@ -629,7 +619,7 @@ fn chroot<P: ?Sized + NixPath>(_path: &P) -> Result<(), nix::Error> {
pub fn ms_move_root(rootfs: &str) -> Result<bool> {
unistd::chdir(rootfs)?;
let mount_infos = parse_mount_table(MOUNTINFO_PATH)?;
let mount_infos = parse_mount_table()?;
let root_path = Path::new(rootfs);
let abs_root_buf = root_path.absolutize()?;
@@ -780,31 +770,18 @@ fn mount_from(
Path::new(&dest).parent().unwrap()
};
fs::create_dir_all(&dir).map_err(|e| {
let _ = fs::create_dir_all(&dir).map_err(|e| {
log_child!(
cfd_log,
"create dir {}: {}",
dir.to_str().unwrap(),
e.to_string()
);
e
})?;
)
});
// make sure file exists so we can bind over it
if !src.is_dir() {
let _ = OpenOptions::new()
.create(true)
.write(true)
.open(&dest)
.map_err(|e| {
log_child!(
cfd_log,
"open/create dest error. {}: {:?}",
dest.as_str(),
e
);
e
})?;
let _ = OpenOptions::new().create(true).write(true).open(&dest);
}
src.to_str().unwrap().to_string()
} else {
@@ -817,10 +794,8 @@ fn mount_from(
}
};
let _ = stat::stat(dest.as_str()).map_err(|e| {
log_child!(cfd_log, "dest stat error. {}: {:?}", dest.as_str(), e);
e
})?;
let _ = stat::stat(dest.as_str())
.map_err(|e| log_child!(cfd_log, "dest stat error. {}: {:?}", dest.as_str(), e));
mount(
Some(src.as_str()),
@@ -1020,7 +995,9 @@ pub fn finish_rootfs(cfd_log: RawFd, spec: &Spec, process: &Process) -> Result<(
}
fn mask_path(path: &str) -> Result<()> {
check_paths(path)?;
if !path.starts_with('/') || path.contains("..") {
return Err(anyhow!(nix::Error::EINVAL));
}
match mount(
Some("/dev/null"),
@@ -1038,7 +1015,9 @@ fn mask_path(path: &str) -> Result<()> {
}
fn readonly_path(path: &str) -> Result<()> {
check_paths(path)?;
if !path.starts_with('/') || path.contains("..") {
return Err(anyhow!(nix::Error::EINVAL));
}
if let Err(e) = mount(
Some(&path[1..]),
@@ -1064,28 +1043,16 @@ fn readonly_path(path: &str) -> Result<()> {
Ok(())
}
fn check_paths(path: &str) -> Result<()> {
if !path.starts_with('/') || path.contains("..") {
return Err(anyhow!(
"Cannot mount {} (path does not start with '/' or contains '..').",
path
));
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use crate::assert_result;
use crate::skip_if_not_root;
use std::fs::create_dir;
use std::fs::create_dir_all;
use std::fs::remove_dir_all;
use std::io;
use std::os::unix::fs;
use std::os::unix::io::AsRawFd;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
#[test]
#[serial(chdir)]
@@ -1319,162 +1286,6 @@ mod tests {
let ret = stat::stat(path);
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
}
#[test]
fn test_mount_from() {
#[derive(Debug)]
struct TestData<'a> {
source: &'a str,
destination: &'a str,
r#type: &'a str,
flags: MsFlags,
error_contains: &'a str,
// if true, a directory will be created at path in source
make_source_directory: bool,
// if true, a file will be created at path in source
make_source_file: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
source: "tmp",
destination: "dest",
r#type: "tmpfs",
flags: MsFlags::empty(),
error_contains: "",
make_source_directory: true,
make_source_file: false,
}
}
}
let tests = &[
TestData {
..Default::default()
},
TestData {
flags: MsFlags::MS_BIND,
..Default::default()
},
TestData {
r#type: "bind",
..Default::default()
},
TestData {
r#type: "cgroup2",
..Default::default()
},
TestData {
r#type: "bind",
make_source_directory: false,
error_contains: &format!("{}", std::io::Error::from_raw_os_error(libc::ENOENT)),
..Default::default()
},
TestData {
r#type: "bind",
make_source_directory: false,
make_source_file: true,
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let tempdir = tempdir().unwrap();
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
unistd::close(rfd).unwrap();
unistd::close(wfd).unwrap();
});
let source_path = tempdir.path().join(d.source).to_str().unwrap().to_string();
if d.make_source_directory {
std::fs::create_dir_all(&source_path).unwrap();
} else if d.make_source_file {
std::fs::write(&source_path, []).unwrap();
}
let mount = Mount {
source: source_path,
destination: d.destination.to_string(),
r#type: d.r#type.to_string(),
options: vec![],
};
let result = mount_from(
wfd,
&mount,
tempdir.path().to_str().unwrap(),
d.flags,
"",
"",
);
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_check_paths() {
#[derive(Debug)]
struct TestData<'a> {
name: &'a str,
path: &'a str,
result: Result<()>,
}
let tests = &[
TestData {
name: "valid path",
path: "/foo/bar",
result: Ok(()),
},
TestData {
name: "does not starts with /",
path: "foo/bar",
result: Err(anyhow!(
"Cannot mount foo/bar (path does not start with '/' or contains '..')."
)),
},
TestData {
name: "contains ..",
path: "../foo/bar",
result: Err(anyhow!(
"Cannot mount ../foo/bar (path does not start with '/' or contains '..')."
)),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d.name);
let result = check_paths(d.path);
let msg = format!("{}: result: {:?}", msg, result);
if d.result.is_ok() {
assert!(result.is_ok());
continue;
}
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
assert!(actual_error == expected_error, "{}", msg);
}
}
#[test]
fn test_check_proc_mount() {
let mount = oci::Mount {
@@ -1590,121 +1401,6 @@ mod tests {
}
}
#[test]
fn test_parse_mount_table() {
#[derive(Debug)]
struct TestData<'a> {
mountinfo_data: Option<&'a str>,
result: Result<Vec<Info>>,
}
let tests = &[
TestData {
mountinfo_data: Some(
"22 933 0:20 / /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
),
result: Ok(vec![Info {
mount_point: "/sys".to_string(),
optional: "shared:2".to_string(),
fstype: "sysfs".to_string(),
}]),
},
TestData {
mountinfo_data: Some(
r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
),
result: Ok(vec![
Info {
mount_point: "/sys".to_string(),
optional: "".to_string(),
fstype: "sysfs".to_string(),
},
Info {
mount_point: "/tmp/dir".to_string(),
optional: "shared:2".to_string(),
fstype: "tmpfs".to_string(),
},
]),
},
TestData {
mountinfo_data: Some(
"22 933 0:20 /foo\040-\040bar /sys rw,nodev shared:2 - sysfs sysfs rw,noexec",
),
result: Ok(vec![Info {
mount_point: "/sys".to_string(),
optional: "shared:2".to_string(),
fstype: "sysfs".to_string(),
}]),
},
TestData {
mountinfo_data: Some(""),
result: Ok(vec![]),
},
TestData {
mountinfo_data: Some("invalid line data - sysfs sysfs rw"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec - sysfs sysfs rw rw"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO_FINAL_FIELDS)),
},
TestData {
mountinfo_data: Some("22 96 0:21 / /sys rw,noexec shared:2 - x - x"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("-"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("--"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some("- -"),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some(" - "),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: Some(
r#"22 933 0:20 / /sys rw,nodev - sysfs sysfs rw,noexec
invalid line
81 13 1:2 / /tmp/dir rw shared:2 - tmpfs tmpfs rw"#,
),
result: Err(anyhow!(ERR_FAILED_PARSE_MOUNTINFO)),
},
TestData {
mountinfo_data: None,
result: Err(anyhow!(io::Error::from_raw_os_error(libc::ENOENT))),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let tempdir = tempdir().unwrap();
let mountinfo_path = tempdir.path().join("mountinfo");
if let Some(mountinfo_data) = d.mountinfo_data {
std::fs::write(&mountinfo_path, mountinfo_data).unwrap();
}
let result = parse_mount_table(mountinfo_path.to_str().unwrap());
let msg = format!("{}: result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[test]
fn test_dev_rel_path() {
// Valid device paths

View File

@@ -5,7 +5,7 @@
use libc::pid_t;
use std::fs::File;
use std::os::unix::io::{AsRawFd, RawFd};
use std::os::unix::io::RawFd;
use tokio::sync::mpsc::Sender;
use nix::errno::Errno;
@@ -28,6 +28,7 @@ macro_rules! close_process_stream {
($self: ident, $stream:ident, $stream_type: ident) => {
if $self.$stream.is_some() {
$self.close_stream(StreamType::$stream_type);
let _ = unistd::close($self.$stream.unwrap());
$self.$stream = None;
}
};
@@ -136,25 +137,19 @@ impl Process {
info!(logger, "before create console socket!");
if !p.tty {
if cfg!(feature = "standard-oci-runtime") {
p.stdin = Some(std::io::stdin().as_raw_fd());
p.stdout = Some(std::io::stdout().as_raw_fd());
p.stderr = Some(std::io::stderr().as_raw_fd());
} else {
info!(logger, "created console socket!");
info!(logger, "created console socket!");
let (stdin, pstdin) = unistd::pipe2(OFlag::O_CLOEXEC)?;
p.parent_stdin = Some(pstdin);
p.stdin = Some(stdin);
let (stdin, pstdin) = unistd::pipe2(OFlag::O_CLOEXEC)?;
p.parent_stdin = Some(pstdin);
p.stdin = Some(stdin);
let (pstdout, stdout) = create_extended_pipe(OFlag::O_CLOEXEC, pipe_size)?;
p.parent_stdout = Some(pstdout);
p.stdout = Some(stdout);
let (pstdout, stdout) = create_extended_pipe(OFlag::O_CLOEXEC, pipe_size)?;
p.parent_stdout = Some(pstdout);
p.stdout = Some(stdout);
let (pstderr, stderr) = create_extended_pipe(OFlag::O_CLOEXEC, pipe_size)?;
p.parent_stderr = Some(pstderr);
p.stderr = Some(stderr);
}
let (pstderr, stderr) = create_extended_pipe(OFlag::O_CLOEXEC, pipe_size)?;
p.parent_stderr = Some(pstderr);
p.stderr = Some(stderr);
}
Ok(p)
}
@@ -224,7 +219,7 @@ impl Process {
Some(writer)
}
fn close_stream(&mut self, stream_type: StreamType) {
pub fn close_stream(&mut self, stream_type: StreamType) {
let _ = self.readers.remove(&stream_type);
let _ = self.writers.remove(&stream_type);
}
@@ -289,11 +284,5 @@ mod tests {
// group of the calling process.
process.pid = 0;
assert!(process.signal(libc::SIGCONT).is_ok());
if cfg!(feature = "standard-oci-runtime") {
assert_eq!(process.stdin.unwrap(), std::io::stdin().as_raw_fd());
assert_eq!(process.stdout.unwrap(), std::io::stdout().as_raw_fd());
assert_eq!(process.stderr.unwrap(), std::io::stderr().as_raw_fd());
}
}
}

View File

@@ -26,15 +26,12 @@ fn get_rule_conditions(args: &[LinuxSeccompArg]) -> Result<Vec<ScmpArgCompare>>
return Err(anyhow!("seccomp opreator is required"));
}
let mut op = ScmpCompareOp::from_str(&arg.op)?;
let mut value = arg.value;
// For SCMP_CMP_MASKED_EQ, arg.value is the mask and arg.value_two is the value
if op == ScmpCompareOp::MaskedEqual(u64::default()) {
op = ScmpCompareOp::MaskedEqual(arg.value);
value = arg.value_two;
}
let cond = ScmpArgCompare::new(arg.index, op, value);
let cond = ScmpArgCompare::new(
arg.index,
ScmpCompareOp::from_str(&arg.op)?,
arg.value,
Some(arg.value_two),
);
conditions.push(cond);
}
@@ -47,7 +44,7 @@ pub fn get_unknown_syscalls(scmp: &LinuxSeccomp) -> Option<Vec<String>> {
for syscall in &scmp.syscalls {
for name in &syscall.names {
if ScmpSyscall::from_name(name).is_err() {
if get_syscall_from_name(name, None).is_err() {
unknown_syscalls.push(name.to_string());
}
}
@@ -63,7 +60,7 @@ pub fn get_unknown_syscalls(scmp: &LinuxSeccomp) -> Option<Vec<String>> {
// init_seccomp creates a seccomp filter and loads it for the current process
// including all the child processes.
pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM as i32))?;
let def_action = ScmpAction::from_str(scmp.default_action.as_str(), Some(libc::EPERM as u32))?;
// Create a new filter context
let mut filter = ScmpFilterContext::new_filter(def_action)?;
@@ -75,7 +72,7 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
}
// Unset no new privileges bit
filter.set_ctl_nnp(false)?;
filter.set_no_new_privs_bit(false)?;
// Add a rule for each system call
for syscall in &scmp.syscalls {
@@ -83,13 +80,13 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
return Err(anyhow!("syscall name is required"));
}
let action = ScmpAction::from_str(&syscall.action, Some(syscall.errno_ret as i32))?;
let action = ScmpAction::from_str(&syscall.action, Some(syscall.errno_ret))?;
if action == def_action {
continue;
}
for name in &syscall.names {
let syscall_num = match ScmpSyscall::from_name(name) {
let syscall_num = match get_syscall_from_name(name, None) {
Ok(num) => num,
Err(_) => {
// If we cannot resolve the given system call, we assume it is not supported
@@ -99,10 +96,10 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
};
if syscall.args.is_empty() {
filter.add_rule(action, syscall_num)?;
filter.add_rule(action, syscall_num, None)?;
} else {
let conditions = get_rule_conditions(&syscall.args)?;
filter.add_rule_conditional(action, syscall_num, &conditions)?;
filter.add_rule(action, syscall_num, Some(&conditions))?;
}
}
}
@@ -122,10 +119,10 @@ pub fn init_seccomp(scmp: &LinuxSeccomp) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use libc::{dup3, process_vm_readv, EPERM, O_CLOEXEC};
use std::io::Error;
use std::ptr::null;
use test_utils::skip_if_not_root;
macro_rules! syscall_assert {
($e1: expr, $e2: expr) => {

View File

@@ -5,7 +5,7 @@
use oci::Spec;
#[derive(Serialize, Deserialize, Debug, Default, Clone)]
#[derive(Debug)]
pub struct CreateOpts {
pub cgroup_name: String,
pub use_systemd_cgroup: bool,

View File

@@ -0,0 +1,119 @@
// Copyright (c) 2021 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{anyhow, Context, Result};
use libc::gid_t;
use libc::uid_t;
use std::fs::File;
use std::io::{BufRead, BufReader};
const PASSWD_FILE: &str = "/etc/passwd";
// An entry from /etc/passwd
#[derive(Debug, PartialEq, PartialOrd)]
pub struct PasswdEntry {
// username
pub name: String,
// user password
pub passwd: String,
// user id
pub uid: uid_t,
// group id
pub gid: gid_t,
// user Information
pub gecos: String,
// home directory
pub dir: String,
// User's Shell
pub shell: String,
}
// get an entry for a given `uid` from `/etc/passwd`
fn get_entry_by_uid(uid: uid_t, path: &str) -> Result<PasswdEntry> {
let file = File::open(path).with_context(|| format!("open file {}", path))?;
let mut reader = BufReader::new(file);
let mut line = String::new();
loop {
line.clear();
match reader.read_line(&mut line) {
Ok(0) => return Err(anyhow!(format!("file {} is empty", path))),
Ok(_) => (),
Err(e) => {
return Err(anyhow!(format!(
"failed to read file {} with {:?}",
path, e
)))
}
}
if line.starts_with('#') {
continue;
}
let parts: Vec<&str> = line.split(':').map(|part| part.trim()).collect();
if parts.len() != 7 {
continue;
}
match parts[2].parse() {
Err(_e) => continue,
Ok(new_uid) => {
if uid != new_uid {
continue;
}
let entry = PasswdEntry {
name: parts[0].to_string(),
passwd: parts[1].to_string(),
uid: new_uid,
gid: parts[3].parse().unwrap_or(0),
gecos: parts[4].to_string(),
dir: parts[5].to_string(),
shell: parts[6].to_string(),
};
return Ok(entry);
}
}
}
}
pub fn home_dir(uid: uid_t) -> Result<String> {
get_entry_by_uid(uid, PASSWD_FILE).map(|entry| entry.dir)
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::Write;
use tempfile::Builder;
#[test]
fn test_get_entry_by_uid() {
let tmpdir = Builder::new().tempdir().unwrap();
let tmpdir_path = tmpdir.path().to_str().unwrap();
let temp_passwd = format!("{}/passwd", tmpdir_path);
let mut tempf = File::create(temp_passwd.as_str()).unwrap();
writeln!(tempf, "root:x:0:0:root:/root0:/bin/bash").unwrap();
writeln!(tempf, "root:x:1:0:root:/root1:/bin/bash").unwrap();
writeln!(tempf, "#root:x:1:0:root:/rootx:/bin/bash").unwrap();
writeln!(tempf, "root:x:2:0:root:/root2:/bin/bash").unwrap();
writeln!(tempf, "root:x:3:0:root:/root3").unwrap();
writeln!(tempf, "root:x:3:0:root:/root3:/bin/bash").unwrap();
let entry = get_entry_by_uid(0, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root0");
let entry = get_entry_by_uid(1, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root1");
let entry = get_entry_by_uid(2, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root2");
let entry = get_entry_by_uid(3, temp_passwd.as_str()).unwrap();
assert_eq!(entry.dir.as_str(), "/root3");
}
}

View File

@@ -4,15 +4,17 @@
//
use crate::container::Config;
use anyhow::{anyhow, Context, Result};
use anyhow::{anyhow, Context, Error, Result};
use oci::{Linux, LinuxIdMapping, LinuxNamespace, Spec};
use std::collections::HashMap;
use std::path::{Component, PathBuf};
fn einval() -> Error {
anyhow!(nix::Error::EINVAL)
}
fn get_linux(oci: &Spec) -> Result<&Linux> {
oci.linux
.as_ref()
.ok_or_else(|| anyhow!("Unable to get Linux section from Spec"))
oci.linux.as_ref().ok_or_else(einval)
}
fn contain_namespace(nses: &[LinuxNamespace], key: &str) -> bool {
@@ -29,10 +31,7 @@ fn rootfs(root: &str) -> Result<()> {
let path = PathBuf::from(root);
// not absolute path or not exists
if !path.exists() || !path.is_absolute() {
return Err(anyhow!(
"Path from {:?} does not exist or is not absolute",
root
));
return Err(einval());
}
// symbolic link? ..?
@@ -50,7 +49,7 @@ fn rootfs(root: &str) -> Result<()> {
if let Some(v) = c.as_os_str().to_str() {
stack.push(v.to_string());
} else {
return Err(anyhow!("Invalid path component (unable to convert to str)"));
return Err(einval());
}
}
@@ -59,13 +58,10 @@ fn rootfs(root: &str) -> Result<()> {
cleaned.push(e);
}
let canon = path.canonicalize().context("failed to canonicalize path")?;
let canon = path.canonicalize().context("canonicalize")?;
if cleaned != canon {
// There is symbolic in path
return Err(anyhow!(
"There may be illegal symbols in the path name. Cleaned ({:?}) and canonicalized ({:?}) paths do not match",
cleaned,
canon));
return Err(einval());
}
Ok(())
@@ -78,7 +74,7 @@ fn hostname(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
if !contain_namespace(&linux.namespaces, "uts") {
return Err(anyhow!("Linux namespace does not contain uts"));
return Err(einval());
}
Ok(())
@@ -92,7 +88,7 @@ fn security(oci: &Spec) -> Result<()> {
}
if !contain_namespace(&linux.namespaces, "mount") {
return Err(anyhow!("Linux namespace does not contain mount"));
return Err(einval());
}
// don't care about selinux at present
@@ -107,7 +103,7 @@ fn idmapping(maps: &[LinuxIdMapping]) -> Result<()> {
}
}
Err(anyhow!("No idmap has size > 0"))
Err(einval())
}
fn usernamespace(oci: &Spec) -> Result<()> {
@@ -125,7 +121,7 @@ fn usernamespace(oci: &Spec) -> Result<()> {
} else {
// no user namespace but idmap
if !linux.uid_mappings.is_empty() || !linux.gid_mappings.is_empty() {
return Err(anyhow!("No user namespace, but uid or gid mapping exists"));
return Err(einval());
}
}
@@ -167,7 +163,7 @@ fn sysctl(oci: &Spec) -> Result<()> {
if contain_namespace(&linux.namespaces, "ipc") {
continue;
} else {
return Err(anyhow!("Linux namespace does not contain ipc"));
return Err(einval());
}
}
@@ -182,11 +178,11 @@ fn sysctl(oci: &Spec) -> Result<()> {
}
if key == "kernel.hostname" {
return Err(anyhow!("Kernel hostname specfied in Spec"));
return Err(einval());
}
}
return Err(anyhow!("Sysctl config contains invalid settings"));
return Err(einval());
}
Ok(())
}
@@ -195,13 +191,12 @@ fn rootless_euid_mapping(oci: &Spec) -> Result<()> {
let linux = get_linux(oci)?;
if !contain_namespace(&linux.namespaces, "user") {
return Err(anyhow!("Linux namespace is missing user"));
return Err(einval());
}
if linux.uid_mappings.is_empty() || linux.gid_mappings.is_empty() {
return Err(anyhow!(
"Rootless containers require at least one UID/GID mapping"
));
// rootless containers requires at least one UID/GID mapping
return Err(einval());
}
Ok(())
@@ -225,7 +220,7 @@ fn rootless_euid_mount(oci: &Spec) -> Result<()> {
let fields: Vec<&str> = opt.split('=').collect();
if fields.len() != 2 {
return Err(anyhow!("Options has invalid field: {:?}", fields));
return Err(einval());
}
let id = fields[1]
@@ -234,11 +229,11 @@ fn rootless_euid_mount(oci: &Spec) -> Result<()> {
.context(format!("parse field {}", &fields[1]))?;
if opt.starts_with("uid=") && !has_idmapping(&linux.uid_mappings, id) {
return Err(anyhow!("uid of {} does not have a valid mapping", id));
return Err(einval());
}
if opt.starts_with("gid=") && !has_idmapping(&linux.gid_mappings, id) {
return Err(anyhow!("gid of {} does not have a valid mapping", id));
return Err(einval());
}
}
}
@@ -254,18 +249,15 @@ fn rootless_euid(oci: &Spec) -> Result<()> {
pub fn validate(conf: &Config) -> Result<()> {
lazy_static::initialize(&SYSCTLS);
let oci = conf
.spec
.as_ref()
.ok_or_else(|| anyhow!("Invalid config spec"))?;
let oci = conf.spec.as_ref().ok_or_else(einval)?;
if oci.linux.is_none() {
return Err(anyhow!("oci Linux is none"));
return Err(einval());
}
let root = match oci.root.as_ref() {
Some(v) => v.path.as_str(),
None => return Err(anyhow!("oci root is none")),
None => return Err(einval()),
};
rootfs(root).context("rootfs")?;

View File

@@ -12,8 +12,6 @@ use std::str::FromStr;
use std::time;
use tracing::instrument;
use kata_types::config::default::DEFAULT_AGENT_VSOCK_PORT;
const DEBUG_CONSOLE_FLAG: &str = "agent.debug_console";
const DEV_MODE_FLAG: &str = "agent.devmode";
const TRACE_MODE_OPTION: &str = "agent.trace";
@@ -30,6 +28,7 @@ const DEFAULT_LOG_LEVEL: slog::Level = slog::Level::Info;
const DEFAULT_HOTPLUG_TIMEOUT: time::Duration = time::Duration::from_secs(3);
const DEFAULT_CONTAINER_PIPE_SIZE: i32 = 0;
const VSOCK_ADDR: &str = "vsock://-1";
const VSOCK_PORT: u16 = 1024;
// Environment variables used for development and testing
const SERVER_ADDR_ENV_VAR: &str = "KATA_AGENT_SERVER_ADDR";
@@ -148,7 +147,7 @@ impl Default for AgentConfig {
debug_console_vport: 0,
log_vport: 0,
container_pipe_size: DEFAULT_CONTAINER_PIPE_SIZE,
server_addr: format!("{}:{}", VSOCK_ADDR, DEFAULT_AGENT_VSOCK_PORT),
server_addr: format!("{}:{}", VSOCK_ADDR, VSOCK_PORT),
unified_cgroup_hierarchy: false,
tracing: false,
endpoints: Default::default(),
@@ -433,7 +432,7 @@ fn get_container_pipe_size(param: &str) -> Result<i32> {
#[cfg(test)]
mod tests {
use test_utils::assert_result;
use crate::assert_result;
use super::*;
use anyhow::anyhow;

View File

@@ -9,7 +9,7 @@ use anyhow::{anyhow, Result};
use nix::fcntl::{self, FcntlArg, FdFlag, OFlag};
use nix::libc::{STDERR_FILENO, STDIN_FILENO, STDOUT_FILENO};
use nix::pty::{openpty, OpenptyResult};
use nix::sys::socket::{self, AddressFamily, SockFlag, SockType, VsockAddr};
use nix::sys::socket::{self, AddressFamily, SockAddr, SockFlag, SockType};
use nix::sys::stat::Mode;
use nix::sys::wait;
use nix::unistd::{self, close, dup2, fork, setsid, ForkResult, Pid};
@@ -67,7 +67,7 @@ pub async fn debug_console_handler(
SockFlag::SOCK_CLOEXEC,
None,
)?;
let addr = VsockAddr::new(libc::VMADDR_CID_ANY, port);
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, port);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;

View File

@@ -22,11 +22,12 @@ extern crate slog;
use anyhow::{anyhow, Context, Result};
use clap::{AppSettings, Parser};
use nix::fcntl::OFlag;
use nix::sys::socket::{self, AddressFamily, SockFlag, SockType, VsockAddr};
use nix::sys::socket::{self, AddressFamily, SockAddr, SockFlag, SockType};
use nix::unistd::{self, dup, Pid};
use std::env;
use std::ffi::OsStr;
use std::fs::{self, File};
use std::os::unix::ffi::OsStrExt;
use std::os::unix::fs as unixfs;
use std::os::unix::io::AsRawFd;
use std::path::Path;
@@ -49,6 +50,8 @@ mod pci;
pub mod random;
mod sandbox;
mod signal;
#[cfg(test)]
mod test_utils;
mod uevent;
mod util;
mod version;
@@ -108,6 +111,10 @@ enum SubCommand {
fn announce(logger: &Logger, config: &AgentConfig) {
info!(logger, "announce";
"agent-commit" => version::VERSION_COMMIT,
// Avoid any possibility of confusion with the old agent
"agent-type" => "rust",
"agent-version" => version::AGENT_VERSION,
"api-version" => version::API_VERSION,
"config" => format!("{:?}", config),
@@ -118,7 +125,9 @@ fn announce(logger: &Logger, config: &AgentConfig) {
// output to the vsock port specified, or stdout.
async fn create_logger_task(rfd: RawFd, vsock_port: u32, shutdown: Receiver<bool>) -> Result<()> {
let mut reader = PipeStream::from_fd(rfd);
let mut writer: Box<dyn AsyncWrite + Unpin + Send> = if vsock_port > 0 {
let mut writer: Box<dyn AsyncWrite + Unpin + Send>;
if vsock_port > 0 {
let listenfd = socket::socket(
AddressFamily::Vsock,
SockType::Stream,
@@ -126,14 +135,14 @@ async fn create_logger_task(rfd: RawFd, vsock_port: u32, shutdown: Receiver<bool
None,
)?;
let addr = VsockAddr::new(libc::VMADDR_CID_ANY, vsock_port);
let addr = SockAddr::new_vsock(libc::VMADDR_CID_ANY, vsock_port);
socket::bind(listenfd, &addr)?;
socket::listen(listenfd, 1)?;
Box::new(util::get_vsock_stream(listenfd).await?)
writer = Box::new(util::get_vsock_stream(listenfd).await?);
} else {
Box::new(tokio::io::stdout())
};
writer = Box::new(tokio::io::stdout());
}
let _ = util::interruptable_io_copier(&mut reader, &mut writer, shutdown).await;
@@ -207,7 +216,7 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
if config.log_level == slog::Level::Trace {
// Redirect ttrpc log calls to slog iff full debug requested
ttrpc_log_guard = Ok(slog_stdlog::init()?);
ttrpc_log_guard = Ok(slog_stdlog::init().map_err(|e| e)?);
}
if config.tracing {
@@ -375,13 +384,27 @@ fn init_agent_as_init(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result
let contents_array: Vec<&str> = contents.split(' ').collect();
let hostname = contents_array[0].trim();
if unistd::sethostname(OsStr::new(hostname)).is_err() {
if sethostname(OsStr::new(hostname)).is_err() {
warn!(logger, "failed to set hostname");
}
Ok(())
}
#[instrument]
fn sethostname(hostname: &OsStr) -> Result<()> {
let size = hostname.len() as usize;
let result =
unsafe { libc::sethostname(hostname.as_bytes().as_ptr() as *const libc::c_char, size) };
if result != 0 {
Err(anyhow!("failed to set hostname"))
} else {
Ok(())
}
}
// The Rust standard library had suppressed the default SIGPIPE behavior,
// see https://github.com/rust-lang/rust/pull/13158.
// Since the parent's signal handler would be inherited by it's child process,
@@ -395,60 +418,3 @@ fn reset_sigpipe() {
use crate::config::AgentConfig;
use std::os::unix::io::{FromRawFd, RawFd};
#[cfg(test)]
mod tests {
use super::*;
use test_utils::TestUserType;
use test_utils::{assert_result, skip_if_not_root, skip_if_root};
#[tokio::test]
async fn test_create_logger_task() {
#[derive(Debug)]
struct TestData {
vsock_port: u32,
test_user: TestUserType,
result: Result<()>,
}
let tests = &[
TestData {
// non-root user cannot use privileged vsock port
vsock_port: 1,
test_user: TestUserType::NonRootOnly,
result: Err(anyhow!(nix::errno::Errno::from_i32(libc::EACCES))),
},
TestData {
// passing vsock_port 0 causes logger task to write to stdout
vsock_port: 0,
test_user: TestUserType::Any,
result: Ok(()),
},
];
for (i, d) in tests.iter().enumerate() {
if d.test_user == TestUserType::RootOnly {
skip_if_not_root!();
} else if d.test_user == TestUserType::NonRootOnly {
skip_if_root!();
}
let msg = format!("test[{}]: {:?}", i, d);
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
// rfd is closed by the use of PipeStream in the crate_logger_task function,
// but we will attempt to close in case of a failure
let _ = unistd::close(rfd);
unistd::close(wfd).unwrap();
});
let (shutdown_tx, shutdown_rx) = channel(true);
shutdown_tx.send(true).unwrap();
let result = create_logger_task(rfd, d.vsock_port, shutdown_rx).await;
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
}

View File

@@ -344,25 +344,25 @@ fn set_gauge_vec_meminfo(gv: &prometheus::GaugeVec, meminfo: &procfs::Meminfo) {
#[instrument]
fn set_gauge_vec_cpu_time(gv: &prometheus::GaugeVec, cpu: &str, cpu_time: &procfs::CpuTime) {
gv.with_label_values(&[cpu, "user"])
.set(cpu_time.user_ms() as f64);
.set(cpu_time.user as f64);
gv.with_label_values(&[cpu, "nice"])
.set(cpu_time.nice_ms() as f64);
.set(cpu_time.nice as f64);
gv.with_label_values(&[cpu, "system"])
.set(cpu_time.system_ms() as f64);
.set(cpu_time.system as f64);
gv.with_label_values(&[cpu, "idle"])
.set(cpu_time.idle_ms() as f64);
.set(cpu_time.idle as f64);
gv.with_label_values(&[cpu, "iowait"])
.set(cpu_time.iowait_ms().unwrap_or(0) as f64);
.set(cpu_time.iowait.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "irq"])
.set(cpu_time.irq_ms().unwrap_or(0) as f64);
.set(cpu_time.irq.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "softirq"])
.set(cpu_time.softirq_ms().unwrap_or(0) as f64);
.set(cpu_time.softirq.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "steal"])
.set(cpu_time.steal_ms().unwrap_or(0) as f64);
.set(cpu_time.steal.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "guest"])
.set(cpu_time.guest_ms().unwrap_or(0) as f64);
.set(cpu_time.guest.unwrap_or(0) as f64);
gv.with_label_values(&[cpu, "guest_nice"])
.set(cpu_time.guest_nice_ms().unwrap_or(0) as f64);
.set(cpu_time.guest_nice.unwrap_or(0) as f64);
}
#[instrument]

View File

@@ -91,11 +91,11 @@ lazy_static! {
}
#[derive(Debug, PartialEq)]
pub struct InitMount<'a> {
fstype: &'a str,
src: &'a str,
dest: &'a str,
options: Vec<&'a str>,
pub struct InitMount {
fstype: &'static str,
src: &'static str,
dest: &'static str,
options: Vec<&'static str>,
}
#[rustfmt::skip]
@@ -121,7 +121,7 @@ lazy_static!{
#[rustfmt::skip]
lazy_static! {
pub static ref INIT_ROOTFS_MOUNTS: Vec<InitMount<'static>> = vec![
pub static ref INIT_ROOTFS_MOUNTS: Vec<InitMount> = vec![
InitMount{fstype: "proc", src: "proc", dest: "/proc", options: vec!["nosuid", "nodev", "noexec"]},
InitMount{fstype: "sysfs", src: "sysfs", dest: "/sys", options: vec!["nosuid", "nodev", "noexec"]},
InitMount{fstype: "devtmpfs", src: "dev", dest: "/dev", options: vec!["nosuid"]},
@@ -169,12 +169,11 @@ pub fn baremount(
info!(
logger,
"baremount source={:?}, dest={:?}, fs_type={:?}, options={:?}, flags={:?}",
"mount source={:?}, dest={:?}, fs_type={:?}, options={:?}",
source,
destination,
fs_type,
options,
flags
options
);
nix::mount::mount(
@@ -779,20 +778,8 @@ pub async fn add_storages(
}
};
let mount_point = match res {
Err(e) => {
error!(
logger,
"add_storages failed, storage: {:?}, error: {:?} ", storage, e
);
let mut sb = sandbox.lock().await;
sb.unset_sandbox_storage(&storage.mount_point)
.map_err(|e| warn!(logger, "fail to unset sandbox storage {:?}", e))
.ok();
return Err(e);
}
Ok(m) => m,
};
// Todo need to rollback the mounted storage if err met.
let mount_point = res?;
if !mount_point.is_empty() {
mount_list.push(mount_point);
@@ -853,14 +840,15 @@ pub fn get_mount_fs_type_from_file(mount_file: &str, mount_point: &str) -> Resul
return Err(anyhow!("Invalid mount point {}", mount_point));
}
let content = fs::read_to_string(mount_file)
.map_err(|e| anyhow!("read mount file {}: {}", mount_file, e))?;
let file = File::open(mount_file)?;
let reader = BufReader::new(file);
let re = Regex::new(format!("device .+ mounted on {} with fstype (.+)", mount_point).as_str())?;
// Read the file line by line using the lines() iterator from std::io::BufRead.
for (_index, line) in content.lines().enumerate() {
let capes = match re.captures(line) {
for (_index, line) in reader.lines().enumerate() {
let line = line?;
let capes = match re.captures(line.as_str()) {
Some(c) => c,
None => continue,
};
@@ -871,9 +859,8 @@ pub fn get_mount_fs_type_from_file(mount_file: &str, mount_point: &str) -> Resul
}
Err(anyhow!(
"failed to find FS type for mount point {}, mount file content: {:?}",
mount_point,
content
"failed to find FS type for mount point {}",
mount_point
))
}
@@ -882,7 +869,7 @@ pub fn get_cgroup_mounts(
logger: &Logger,
cg_path: &str,
unified_cgroup_hierarchy: bool,
) -> Result<Vec<InitMount<'static>>> {
) -> Result<Vec<InitMount>> {
// cgroup v2
// https://github.com/kata-containers/agent/blob/8c9bbadcd448c9a67690fbe11a860aaacc69813c/agent.go#L1249
if unified_cgroup_hierarchy {
@@ -1030,6 +1017,7 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
#[cfg(test)]
mod tests {
use super::*;
use crate::{skip_if_not_root, skip_loop_if_not_root, skip_loop_if_root};
use protobuf::RepeatedField;
use protocols::agent::FSGroup;
use std::fs::File;
@@ -1037,10 +1025,13 @@ mod tests {
use std::io::Write;
use std::path::PathBuf;
use tempfile::tempdir;
use test_utils::TestUserType;
use test_utils::{
skip_if_not_root, skip_loop_by_user, skip_loop_if_not_root, skip_loop_if_root,
};
#[derive(Debug, PartialEq)]
enum TestUserType {
RootOnly,
NonRootOnly,
Any,
}
#[test]
fn test_mount() {
@@ -1127,7 +1118,11 @@ mod tests {
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
skip_loop_by_user!(msg, d.test_user);
if d.test_user == TestUserType::RootOnly {
skip_loop_if_not_root!(msg);
} else if d.test_user == TestUserType::NonRootOnly {
skip_loop_if_root!(msg);
}
let src: PathBuf;
let dest: PathBuf;
@@ -1136,7 +1131,7 @@ mod tests {
let dest_filename: String;
if !d.src.is_empty() {
src = dir.path().join(d.src);
src = dir.path().join(d.src.to_string());
src_filename = src
.to_str()
.expect("failed to convert src to filename")
@@ -1146,7 +1141,7 @@ mod tests {
}
if !d.dest.is_empty() {
dest = dir.path().join(d.dest);
dest = dir.path().join(d.dest.to_string());
dest_filename = dest
.to_str()
.expect("failed to convert dest to filename")
@@ -1597,226 +1592,6 @@ mod tests {
assert!(testfile.is_file());
}
#[test]
fn test_mount_storage() {
#[derive(Debug)]
struct TestData<'a> {
test_user: TestUserType,
storage: Storage,
error_contains: &'a str,
make_source_dir: bool,
make_mount_dir: bool,
deny_mount_permission: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
test_user: TestUserType::Any,
storage: Storage {
mount_point: "mnt".to_string(),
source: "src".to_string(),
fstype: "tmpfs".to_string(),
..Default::default()
},
make_source_dir: true,
make_mount_dir: false,
deny_mount_permission: false,
error_contains: "",
}
}
}
let tests = &[
TestData {
test_user: TestUserType::NonRootOnly,
error_contains: "EPERM: Operation not permitted",
..Default::default()
},
TestData {
test_user: TestUserType::RootOnly,
..Default::default()
},
TestData {
storage: Storage {
mount_point: "mnt".to_string(),
source: "src".to_string(),
fstype: "bind".to_string(),
..Default::default()
},
make_source_dir: false,
make_mount_dir: true,
error_contains: "Could not create mountpoint",
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
deny_mount_permission: true,
error_contains: "Could not create mountpoint",
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
skip_loop_by_user!(msg, d.test_user);
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
let tempdir = tempdir().unwrap();
let source = tempdir.path().join(&d.storage.source);
let mount_point = tempdir.path().join(&d.storage.mount_point);
let storage = Storage {
source: source.to_str().unwrap().to_string(),
mount_point: mount_point.to_str().unwrap().to_string(),
..d.storage.clone()
};
if d.make_source_dir {
fs::create_dir_all(&storage.source).unwrap();
}
if d.make_mount_dir {
fs::create_dir_all(&storage.mount_point).unwrap();
}
if d.deny_mount_permission {
fs::set_permissions(
mount_point.parent().unwrap(),
fs::Permissions::from_mode(0o000),
)
.unwrap();
}
let result = mount_storage(&logger, &storage);
// restore permissions so tempdir can be cleaned up
if d.deny_mount_permission {
fs::set_permissions(
mount_point.parent().unwrap(),
fs::Permissions::from_mode(0o755),
)
.unwrap();
}
if result.is_ok() {
nix::mount::umount(&mount_point).unwrap();
}
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_mount_to_rootfs() {
#[derive(Debug)]
struct TestData<'a> {
test_user: TestUserType,
src: &'a str,
options: Vec<&'a str>,
error_contains: &'a str,
deny_mount_dir_permission: bool,
// if true src will be prepended with a temporary directory
mask_src: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
test_user: TestUserType::Any,
src: "src",
options: vec![],
error_contains: "",
deny_mount_dir_permission: false,
mask_src: true,
}
}
}
let tests = &[
TestData {
test_user: TestUserType::NonRootOnly,
error_contains: "EPERM: Operation not permitted",
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
src: "dev",
mask_src: false,
..Default::default()
},
TestData {
test_user: TestUserType::RootOnly,
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
deny_mount_dir_permission: true,
error_contains: "could not create directory",
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
skip_loop_by_user!(msg, d.test_user);
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
let tempdir = tempdir().unwrap();
let src = if d.mask_src {
tempdir.path().join(&d.src)
} else {
Path::new(d.src).to_path_buf()
};
let dest = tempdir.path().join("mnt");
let init_mount = InitMount {
fstype: "tmpfs",
src: src.to_str().unwrap(),
dest: dest.to_str().unwrap(),
options: d.options.clone(),
};
if d.deny_mount_dir_permission {
fs::set_permissions(dest.parent().unwrap(), fs::Permissions::from_mode(0o000))
.unwrap();
}
let result = mount_to_rootfs(&logger, &init_mount);
// restore permissions so tempdir can be cleaned up
if d.deny_mount_dir_permission {
fs::set_permissions(dest.parent().unwrap(), fs::Permissions::from_mode(0o755))
.unwrap();
}
if result.is_ok() && d.mask_src {
nix::mount::umount(&dest).unwrap();
}
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_get_pagesize_and_size_from_option() {
let expected_pagesize = 2048;
@@ -1873,57 +1648,6 @@ mod tests {
}
}
#[test]
fn test_parse_mount_flags_and_options() {
#[derive(Debug)]
struct TestData<'a> {
options_vec: Vec<&'a str>,
result: (MsFlags, &'a str),
}
let tests = &[
TestData {
options_vec: vec![],
result: (MsFlags::empty(), ""),
},
TestData {
options_vec: vec!["ro"],
result: (MsFlags::MS_RDONLY, ""),
},
TestData {
options_vec: vec!["rw"],
result: (MsFlags::empty(), ""),
},
TestData {
options_vec: vec!["ro", "rw"],
result: (MsFlags::empty(), ""),
},
TestData {
options_vec: vec!["ro", "nodev"],
result: (MsFlags::MS_RDONLY | MsFlags::MS_NODEV, ""),
},
TestData {
options_vec: vec!["option1", "nodev", "option2"],
result: (MsFlags::MS_NODEV, "option1,option2"),
},
TestData {
options_vec: vec!["rbind", "", "ro"],
result: (MsFlags::MS_BIND | MsFlags::MS_REC | MsFlags::MS_RDONLY, ""),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = parse_mount_flags_and_options(d.options_vec.clone());
let msg = format!("{}: result: {:?}", msg, result);
let expected_result = (d.result.0, d.result.1.to_owned());
assert_eq!(expected_result, result, "{}", msg);
}
}
#[test]
fn test_set_ownership() {
skip_if_not_root!();

View File

@@ -187,10 +187,9 @@ impl fmt::Debug for NamespaceType {
#[cfg(test)]
mod tests {
use super::{Namespace, NamespaceType};
use crate::mount::remove_mounts;
use crate::{mount::remove_mounts, skip_if_not_root};
use nix::sched::CloneFlags;
use tempfile::Builder;
use test_utils::skip_if_not_root;
#[tokio::test]
async fn test_setup_persistent_ns() {

View File

@@ -64,7 +64,7 @@ impl Handle {
pub async fn update_interface(&mut self, iface: &Interface) -> Result<()> {
// The reliable way to find link is using hardware address
// as filter. However, hardware filter might not be supported
// by netlink, we may have to dump link list and then find the
// by netlink, we may have to dump link list and the find the
// target link. filter using name or family is supported, but
// we cannot use that to find target link.
// let's try if hardware address filter works. -_-
@@ -178,7 +178,7 @@ impl Handle {
.with_context(|| format!("Failed to parse MAC address: {}", addr))?;
// Hardware filter might not be supported by netlink,
// we may have to dump link list and then find the target link.
// we may have to dump link list and the find the target link.
stream
.try_filter(|f| {
let result = f.nlas.iter().any(|n| match n {
@@ -523,7 +523,7 @@ impl Handle {
.as_ref()
.map(|to| to.address.as_str()) // Extract address field
.and_then(|addr| if addr.is_empty() { None } else { Some(addr) }) // Make sure it's not empty
.ok_or_else(|| anyhow!("Unable to determine ip address of ARP neighbor"))?;
.ok_or(anyhow!(nix::Error::EINVAL))?;
let ip = IpAddr::from_str(ip_address)
.map_err(|e| anyhow!("Failed to parse IP {}: {:?}", ip_address, e))?;
@@ -612,12 +612,7 @@ fn parse_mac_address(addr: &str) -> Result<[u8; 6]> {
// Parse single Mac address block
let mut parse_next = || -> Result<u8> {
let v = u8::from_str_radix(
split
.next()
.ok_or_else(|| anyhow!("Invalid MAC address {}", addr))?,
16,
)?;
let v = u8::from_str_radix(split.next().ok_or(anyhow!(nix::Error::EINVAL))?, 16)?;
Ok(v)
};
@@ -775,10 +770,10 @@ impl Address {
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use rtnetlink::packet;
use std::iter;
use std::process::Command;
use test_utils::skip_if_not_root;
#[tokio::test]
async fn find_link_by_name() {

View File

@@ -76,11 +76,11 @@ fn do_setup_guest_dns(logger: Logger, dns_list: Vec<String>, src: &str, dst: &st
#[cfg(test)]
mod tests {
use super::*;
use crate::skip_if_not_root;
use nix::mount;
use std::fs::File;
use std::io::Write;
use tempfile::tempdir;
use test_utils::skip_if_not_root;
#[test]
fn test_setup_guest_dns() {

View File

@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{ensure, Result};
use anyhow::Result;
use nix::errno::Errno;
use nix::fcntl::{self, OFlag};
use nix::sys::stat::Mode;
@@ -13,7 +13,7 @@ use tracing::instrument;
pub const RNGDEV: &str = "/dev/random";
pub const RNDADDTOENTCNT: libc::c_int = 0x40045201;
pub const RNDRESEEDCRNG: libc::c_int = 0x5207;
pub const RNDRESEEDRNG: libc::c_int = 0x5207;
// Handle the differing ioctl(2) request types for different targets
#[cfg(target_env = "musl")]
@@ -24,9 +24,6 @@ type IoctlRequestType = libc::c_ulong;
#[instrument]
pub fn reseed_rng(data: &[u8]) -> Result<()> {
let len = data.len() as libc::c_long;
ensure!(len > 0, "missing entropy data");
fs::write(RNGDEV, data)?;
let f = {
@@ -44,52 +41,8 @@ pub fn reseed_rng(data: &[u8]) -> Result<()> {
};
Errno::result(ret).map(drop)?;
let ret = unsafe { libc::ioctl(f.as_raw_fd(), RNDRESEEDCRNG as IoctlRequestType, 0) };
let ret = unsafe { libc::ioctl(f.as_raw_fd(), RNDRESEEDRNG as IoctlRequestType, 0) };
Errno::result(ret).map(drop)?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs::File;
use std::io::prelude::*;
use test_utils::skip_if_not_root;
#[test]
fn test_reseed_rng() {
skip_if_not_root!();
const POOL_SIZE: usize = 512;
let mut f = File::open("/dev/urandom").unwrap();
let mut seed = [0; POOL_SIZE];
let n = f.read(&mut seed).unwrap();
// Ensure the buffer was filled.
assert!(n == POOL_SIZE);
let ret = reseed_rng(&seed);
assert!(ret.is_ok());
}
#[test]
fn test_reseed_rng_not_root() {
const POOL_SIZE: usize = 512;
let mut f = File::open("/dev/urandom").unwrap();
let mut seed = [0; POOL_SIZE];
let n = f.read(&mut seed).unwrap();
// Ensure the buffer was filled.
assert!(n == POOL_SIZE);
let ret = reseed_rng(&seed);
if nix::unistd::Uid::effective().is_root() {
assert!(ret.is_ok());
} else {
assert!(ret.is_err());
}
}
#[test]
fn test_reseed_rng_zero_data() {
let seed = [];
let ret = reseed_rng(&seed);
assert!(ret.is_err());
}
}

View File

@@ -23,9 +23,8 @@ use cgroups::freezer::FreezerState;
use oci::{LinuxNamespace, Root, Spec};
use protobuf::{Message, RepeatedField, SingularPtrField};
use protocols::agent::{
AddSwapRequest, AgentDetails, CopyFileRequest, GetIPTablesRequest, GetIPTablesResponse,
GuestDetailsResponse, Interfaces, Metrics, OOMEvent, ReadStreamResponse, Routes,
SetIPTablesRequest, SetIPTablesResponse, StatsContainerResponse, VolumeStatsRequest,
AddSwapRequest, AgentDetails, CopyFileRequest, GuestDetailsResponse, Interfaces, Metrics,
OOMEvent, ReadStreamResponse, Routes, StatsContainerResponse, VolumeStatsRequest,
WaitProcessResponse, WriteStreamResponse,
};
use protocols::csi::{VolumeCondition, VolumeStatsResponse, VolumeUsage, VolumeUsage_Unit};
@@ -34,7 +33,6 @@ use protocols::health::{
HealthCheckResponse, HealthCheckResponse_ServingStatus, VersionCheckResponse,
};
use protocols::types::Interface;
use protocols::{agent_ttrpc_async as agent_ttrpc, health_ttrpc_async as health_ttrpc};
use rustjail::cgroups::notifier;
use rustjail::container::{BaseContainer, Container, LinuxContainer};
use rustjail::process::Process;
@@ -77,29 +75,18 @@ use std::time::Duration;
use nix::unistd::{Gid, Uid};
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};
use std::io::{BufRead, BufReader};
use std::os::unix::fs::FileExt;
use std::path::PathBuf;
const CONTAINER_BASE: &str = "/run/kata-containers";
const MODPROBE_PATH: &str = "/sbin/modprobe";
const IPTABLES_SAVE: &str = "/sbin/iptables-save";
const IPTABLES_RESTORE: &str = "/sbin/iptables-restore";
const IP6TABLES_SAVE: &str = "/sbin/ip6tables-save";
const IP6TABLES_RESTORE: &str = "/sbin/ip6tables-restore";
const ERR_CANNOT_GET_WRITER: &str = "Cannot get writer";
const ERR_INVALID_BLOCK_SIZE: &str = "Invalid block size";
const ERR_NO_LINUX_FIELD: &str = "Spec does not contain linux field";
const ERR_NO_SANDBOX_PIDNS: &str = "Sandbox does not have sandbox_pidns";
// IPTABLES_RESTORE_WAIT_SEC is the timeout value provided to iptables-restore --wait. Since we
// don't expect other writers to iptables, we don't expect contention for grabbing the iptables
// filesystem lock. Based on this, 5 seconds seems a resonable timeout period in case the lock is
// not available.
const IPTABLES_RESTORE_WAIT_SEC: u64 = 5;
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
@@ -134,6 +121,30 @@ pub struct AgentService {
sandbox: Arc<Mutex<Sandbox>>,
}
// A container ID must match this regex:
//
// ^[a-zA-Z0-9][a-zA-Z0-9_.-]+$
//
fn verify_cid(id: &str) -> Result<()> {
let mut chars = id.chars();
let valid = match chars.next() {
Some(first)
if first.is_alphanumeric()
&& id.len() > 1
&& chars.all(|c| c.is_alphanumeric() || ['.', '-', '_'].contains(&c)) =>
{
true
}
_ => false,
};
match valid {
true => Ok(()),
false => Err(anyhow!("invalid container ID: {:?}", id)),
}
}
impl AgentService {
#[instrument]
async fn do_create_container(
@@ -142,7 +153,7 @@ impl AgentService {
) -> Result<()> {
let cid = req.container_id.clone();
kata_sys_util::validate::verify_id(&cid)?;
verify_cid(&cid)?;
let mut oci_spec = req.OCI.clone();
let use_sandbox_pidns = req.get_sandbox_pidns();
@@ -226,20 +237,7 @@ impl AgentService {
info!(sl!(), "no process configurations!");
return Err(anyhow!(nix::Error::EINVAL));
};
// if starting container failed, we will do some rollback work
// to ensure no resources are leaked.
if let Err(err) = ctr.start(p).await {
error!(sl!(), "failed to start container: {:?}", err);
if let Err(e) = ctr.destroy().await {
error!(sl!(), "failed to destroy container: {:?}", e);
}
if let Err(e) = remove_container_resources(&mut s, &cid) {
error!(sl!(), "failed to remove container resources: {:?}", e);
}
return Err(err);
}
ctr.start(p).await?;
s.update_shared_pidns(&ctr)?;
s.add_container(ctr);
info!(sl!(), "created container!");
@@ -259,7 +257,7 @@ impl AgentService {
.get_container(&cid)
.ok_or_else(|| anyhow!("Invalid container id"))?;
ctr.exec().await?;
ctr.exec()?;
if sid == cid {
return Ok(());
@@ -285,6 +283,27 @@ impl AgentService {
req: protocols::agent::RemoveContainerRequest,
) -> Result<()> {
let cid = req.container_id.clone();
let mut cmounts: Vec<String> = vec![];
let mut remove_container_resources = |sandbox: &mut Sandbox| -> Result<()> {
// Find the sandbox storage used by this container
let mounts = sandbox.container_mounts.get(&cid);
if let Some(mounts) = mounts {
for m in mounts.iter() {
if sandbox.storages.get(m).is_some() {
cmounts.push(m.to_string());
}
}
}
for m in cmounts.iter() {
sandbox.unset_and_remove_sandbox_storage(m)?;
}
sandbox.container_mounts.remove(cid.as_str());
sandbox.containers.remove(cid.as_str());
Ok(())
};
if req.timeout == 0 {
let s = Arc::clone(&self.sandbox);
@@ -298,7 +317,7 @@ impl AgentService {
.destroy()
.await?;
remove_container_resources(&mut sandbox, &cid)?;
remove_container_resources(&mut sandbox)?;
return Ok(());
}
@@ -330,7 +349,8 @@ impl AgentService {
let s = self.sandbox.clone();
let mut sandbox = s.lock().await;
remove_container_resources(&mut sandbox, &cid)?;
remove_container_resources(&mut sandbox)?;
Ok(())
}
@@ -348,7 +368,7 @@ impl AgentService {
let mut process = req
.process
.into_option()
.ok_or_else(|| anyhow!("Unable to parse process from ExecProcessRequest"))?;
.ok_or_else(|| anyhow!(nix::Error::EINVAL))?;
// Apply any necessary corrections for PCI addresses
update_env_pci(&mut process.Env, &sandbox.pcimap)?;
@@ -390,22 +410,8 @@ impl AgentService {
if p.init && sig == libc::SIGTERM && !is_signal_handled(&proc_status_file, sig as u32) {
sig = libc::SIGKILL;
}
match p.signal(sig) {
Err(Errno::ESRCH) => {
info!(
sl!(),
"signal encounter ESRCH, continue";
"container-id" => cid.clone(),
"exec-id" => eid.clone(),
"pid" => p.pid,
"signal" => sig,
);
}
Err(err) => return Err(anyhow!(err)),
Ok(()) => (),
}
};
p.signal(sig)?;
}
if eid.is_empty() {
// eid is empty, signal all the remaining processes in the container cgroup
@@ -611,7 +617,7 @@ impl AgentService {
};
if reader.is_none() {
return Err(anyhow!("Unable to determine stream reader, is None"));
return Err(anyhow!(nix::Error::EINVAL));
}
let reader = reader.ok_or_else(|| anyhow!("cannot get stream reader"))?;
@@ -632,7 +638,7 @@ impl AgentService {
}
#[async_trait]
impl agent_ttrpc::AgentService for AgentService {
impl protocols::agent_ttrpc::AgentService for AgentService {
async fn create_container(
&self,
ctx: &TtrpcContext,
@@ -988,140 +994,6 @@ impl agent_ttrpc::AgentService for AgentService {
})
}
async fn get_ip_tables(
&self,
ctx: &TtrpcContext,
req: GetIPTablesRequest,
) -> ttrpc::Result<GetIPTablesResponse> {
trace_rpc_call!(ctx, "get_iptables", req);
is_allowed!(req);
info!(sl!(), "get_ip_tables: request received");
let cmd = if req.is_ipv6 {
IP6TABLES_SAVE
} else {
IPTABLES_SAVE
}
.to_string();
match Command::new(cmd.clone()).output() {
Ok(output) => Ok(GetIPTablesResponse {
data: output.stdout,
..Default::default()
}),
Err(e) => {
warn!(sl!(), "failed to run {}: {:?}", cmd, e.kind());
return Err(ttrpc_error!(ttrpc::Code::INTERNAL, e));
}
}
}
async fn set_ip_tables(
&self,
ctx: &TtrpcContext,
req: SetIPTablesRequest,
) -> ttrpc::Result<SetIPTablesResponse> {
trace_rpc_call!(ctx, "set_iptables", req);
is_allowed!(req);
info!(sl!(), "set_ip_tables request received");
let cmd = if req.is_ipv6 {
IP6TABLES_RESTORE
} else {
IPTABLES_RESTORE
}
.to_string();
let mut child = match Command::new(cmd.clone())
.arg("--wait")
.arg(IPTABLES_RESTORE_WAIT_SEC.to_string())
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
{
Ok(child) => child,
Err(e) => {
warn!(sl!(), "failure to spawn {}: {:?}", cmd, e.kind());
return Err(ttrpc_error!(ttrpc::Code::INTERNAL, e));
}
};
let mut stdin = match child.stdin.take() {
Some(si) => si,
None => {
println!("failed to get stdin from child");
return Err(ttrpc_error!(
ttrpc::Code::INTERNAL,
"failed to take stdin from child".to_string()
));
}
};
let (tx, rx) = tokio::sync::oneshot::channel::<i32>();
let handle = tokio::spawn(async move {
let _ = match stdin.write_all(&req.data) {
Ok(o) => o,
Err(e) => {
warn!(sl!(), "error writing stdin: {:?}", e.kind());
return;
}
};
if tx.send(1).is_err() {
warn!(sl!(), "stdin writer thread receiver dropped");
};
});
if tokio::time::timeout(Duration::from_secs(IPTABLES_RESTORE_WAIT_SEC), rx)
.await
.is_err()
{
return Err(ttrpc_error!(
ttrpc::Code::INTERNAL,
"timeout waiting for stdin writer to complete".to_string()
));
}
if handle.await.is_err() {
return Err(ttrpc_error!(
ttrpc::Code::INTERNAL,
"stdin writer thread failure".to_string()
));
}
let output = match child.wait_with_output() {
Ok(o) => o,
Err(e) => {
warn!(
sl!(),
"failure waiting for spawned {} to complete: {:?}",
cmd,
e.kind()
);
return Err(ttrpc_error!(ttrpc::Code::INTERNAL, e));
}
};
if !output.status.success() {
warn!(sl!(), "{} failed: {:?}", cmd, output.stderr);
return Err(ttrpc_error!(
ttrpc::Code::INTERNAL,
format!(
"{} failed: {:?}",
cmd,
String::from_utf8_lossy(&output.stderr)
)
));
}
Ok(SetIPTablesResponse {
data: output.stdout,
..Default::default()
})
}
async fn list_interfaces(
&self,
ctx: &TtrpcContext,
@@ -1518,7 +1390,7 @@ impl agent_ttrpc::AgentService for AgentService {
struct HealthService;
#[async_trait]
impl health_ttrpc::Health for HealthService {
impl protocols::health_ttrpc::Health for HealthService {
async fn check(
&self,
_ctx: &TtrpcContext,
@@ -1657,17 +1529,18 @@ async fn read_stream(reader: Arc<Mutex<ReadHalf<PipeStream>>>, l: usize) -> Resu
}
pub fn start(s: Arc<Mutex<Sandbox>>, server_address: &str) -> Result<TtrpcServer> {
let agent_service =
Box::new(AgentService { sandbox: s }) as Box<dyn agent_ttrpc::AgentService + Send + Sync>;
let agent_service = Box::new(AgentService { sandbox: s })
as Box<dyn protocols::agent_ttrpc::AgentService + Send + Sync>;
let agent_worker = Arc::new(agent_service);
let health_service = Box::new(HealthService {}) as Box<dyn health_ttrpc::Health + Send + Sync>;
let health_service =
Box::new(HealthService {}) as Box<dyn protocols::health_ttrpc::Health + Send + Sync>;
let health_worker = Arc::new(health_service);
let aservice = agent_ttrpc::create_agent_service(agent_worker);
let aservice = protocols::agent_ttrpc::create_agent_service(agent_worker);
let hservice = health_ttrpc::create_health(health_worker);
let hservice = protocols::health_ttrpc::create_health(health_worker);
let server = TtrpcServer::new()
.bind(server_address)?
@@ -1733,35 +1606,6 @@ fn update_container_namespaces(
Ok(())
}
fn remove_container_resources(sandbox: &mut Sandbox, cid: &str) -> Result<()> {
let mut cmounts: Vec<String> = vec![];
// Find the sandbox storage used by this container
let mounts = sandbox.container_mounts.get(cid);
if let Some(mounts) = mounts {
for m in mounts.iter() {
if sandbox.storages.get(m).is_some() {
cmounts.push(m.to_string());
}
}
}
for m in cmounts.iter() {
if let Err(err) = sandbox.unset_and_remove_sandbox_storage(m) {
error!(
sl!(),
"failed to unset_and_remove_sandbox_storage for container {}, error: {:?}",
cid,
err
);
}
}
sandbox.container_mounts.remove(cid);
sandbox.containers.remove(cid);
Ok(())
}
fn append_guest_hooks(s: &Sandbox, oci: &mut Spec) -> Result<()> {
if let Some(ref guest_hooks) = s.hooks {
let mut hooks = oci.hooks.take().unwrap_or_default();
@@ -1803,25 +1647,34 @@ fn is_signal_handled(proc_status_file: &str, signum: u32) -> bool {
let sig_mask: u64 = 1 << shift_count;
let reader = BufReader::new(file);
// read lines start with SigBlk/SigIgn/SigCgt and check any match the signal mask
reader
.lines()
.flatten()
.filter(|line| {
line.starts_with("SigBlk:")
|| line.starts_with("SigIgn:")
|| line.starts_with("SigCgt:")
})
.any(|line| {
let mask_vec: Vec<&str> = line.split(':').collect();
if mask_vec.len() == 2 {
let sig_str = mask_vec[1].trim();
if let Ok(sig) = u64::from_str_radix(sig_str, 16) {
return sig & sig_mask == sig_mask;
}
// Read the file line by line using the lines() iterator from std::io::BufRead.
for (_index, line) in reader.lines().enumerate() {
let line = match line {
Ok(l) => l,
Err(_) => {
warn!(sl!(), "failed to read file {}", proc_status_file);
return false;
}
false
})
};
if line.starts_with("SigCgt:") {
let mask_vec: Vec<&str> = line.split(':').collect();
if mask_vec.len() != 2 {
warn!(sl!(), "parse the SigCgt field failed");
return false;
}
let sig_cgt_str = mask_vec[1].trim();
let sig_cgt_mask = match u64::from_str_radix(sig_cgt_str, 16) {
Ok(h) => h,
Err(_) => {
warn!(sl!(), "failed to parse the str {} to hex", sig_cgt_str);
return false;
}
};
return (sig_cgt_mask & sig_mask) == sig_mask;
}
}
false
}
fn do_mem_hotplug_by_probe(addrs: &[u64]) -> Result<()> {
@@ -1853,11 +1706,7 @@ fn do_copy_file(req: &CopyFileRequest) -> Result<()> {
let path = PathBuf::from(req.path.as_str());
if !path.starts_with(CONTAINER_BASE) {
return Err(anyhow!(
"Path {:?} does not start with {}",
path,
CONTAINER_BASE
));
return Err(anyhow!(nix::Error::EINVAL));
}
let parent = path.parent();
@@ -1933,7 +1782,7 @@ async fn do_add_swap(sandbox: &Arc<Mutex<Sandbox>>, req: &AddSwapRequest) -> Res
// - config.json at /<CONTAINER_BASE>/<cid>/config.json
// - container rootfs bind mounted at /<CONTAINER_BASE>/<cid>/rootfs
// - modify container spec root to point to /<CONTAINER_BASE>/<cid>/rootfs
pub fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
let spec_root = if let Some(sr) = &spec.root {
sr
} else {
@@ -2025,12 +1874,13 @@ fn load_kernel_module(module: &protocols::agent::KernelModule) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::{namespace::Namespace, protocols::agent_ttrpc_async::AgentService as _};
use crate::{
assert_result, namespace::Namespace, protocols::agent_ttrpc::AgentService as _,
skip_if_not_root,
};
use nix::mount;
use nix::sched::{unshare, CloneFlags};
use oci::{Hook, Hooks, Linux, LinuxNamespace};
use tempfile::{tempdir, TempDir};
use test_utils::{assert_result, skip_if_not_root};
use ttrpc::{r#async::TtrpcContext, MessageHeader};
fn mk_ttrpc_context() -> TtrpcContext {
@@ -2096,7 +1946,6 @@ mod tests {
let result = load_kernel_module(&m);
assert!(result.is_err(), "load module should failed");
skip_if_not_root!();
// case 3: normal module.
// normally this module should eixsts...
m.name = "bridge".to_string();
@@ -2275,7 +2124,6 @@ mod tests {
if d.has_fd {
Some(wfd)
} else {
unistd::close(wfd).unwrap();
None
}
};
@@ -2310,14 +2158,13 @@ mod tests {
if !d.break_pipe {
unistd::close(rfd).unwrap();
}
// XXX: Do not close wfd.
// the fd will be closed on Process's dropping.
// unistd::close(wfd).unwrap();
unistd::close(wfd).unwrap();
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[tokio::test]
async fn test_update_container_namespaces() {
#[derive(Debug)]
@@ -2648,12 +2495,7 @@ OtherField:other
TestData {
status_file_data: Some("SigBlk:0000000000000001"),
signum: 1,
result: true,
},
TestData {
status_file_data: Some("SigIgn:0000000000000001"),
signum: 1,
result: true,
result: false,
},
TestData {
status_file_data: None,
@@ -2685,6 +2527,233 @@ OtherField:other
}
}
#[tokio::test]
async fn test_verify_cid() {
#[derive(Debug)]
struct TestData<'a> {
id: &'a str,
expect_error: bool,
}
let tests = &[
TestData {
// Cannot be blank
id: "",
expect_error: true,
},
TestData {
// Cannot be a space
id: " ",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: ".",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "-",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "_",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: " a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: ".a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "-a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "_a",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "..",
expect_error: true,
},
TestData {
// Too short
id: "a",
expect_error: true,
},
TestData {
// Too short
id: "z",
expect_error: true,
},
TestData {
// Too short
id: "A",
expect_error: true,
},
TestData {
// Too short
id: "Z",
expect_error: true,
},
TestData {
// Too short
id: "0",
expect_error: true,
},
TestData {
// Too short
id: "9",
expect_error: true,
},
TestData {
// Must start with an alphanumeric
id: "-1",
expect_error: true,
},
TestData {
id: "/",
expect_error: true,
},
TestData {
id: "a/",
expect_error: true,
},
TestData {
id: "a/../",
expect_error: true,
},
TestData {
id: "../a",
expect_error: true,
},
TestData {
id: "../../a",
expect_error: true,
},
TestData {
id: "../../../a",
expect_error: true,
},
TestData {
id: "foo/../bar",
expect_error: true,
},
TestData {
id: "foo bar",
expect_error: true,
},
TestData {
id: "a.",
expect_error: false,
},
TestData {
id: "a..",
expect_error: false,
},
TestData {
id: "aa",
expect_error: false,
},
TestData {
id: "aa.",
expect_error: false,
},
TestData {
id: "hello..world",
expect_error: false,
},
TestData {
id: "hello/../world",
expect_error: true,
},
TestData {
id: "aa1245124sadfasdfgasdga.",
expect_error: false,
},
TestData {
id: "aAzZ0123456789_.-",
expect_error: false,
},
TestData {
id: "abcdefghijklmnopqrstuvwxyz0123456789.-_",
expect_error: false,
},
TestData {
id: "0123456789abcdefghijklmnopqrstuvwxyz.-_",
expect_error: false,
},
TestData {
id: " abcdefghijklmnopqrstuvwxyz0123456789.-_",
expect_error: true,
},
TestData {
id: ".abcdefghijklmnopqrstuvwxyz0123456789.-_",
expect_error: true,
},
TestData {
id: "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_",
expect_error: false,
},
TestData {
id: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_",
expect_error: false,
},
TestData {
id: " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_",
expect_error: true,
},
TestData {
id: ".ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_",
expect_error: true,
},
TestData {
id: "/a/b/c",
expect_error: true,
},
TestData {
id: "a/b/c",
expect_error: true,
},
TestData {
id: "foo/../../../etc/passwd",
expect_error: true,
},
TestData {
id: "../../../../../../etc/motd",
expect_error: true,
},
TestData {
id: "/etc/passwd",
expect_error: true,
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let result = verify_cid(d.id);
let msg = format!("{}, result: {:?}", msg, result);
if result.is_ok() {
assert!(!d.expect_error, "{}", msg);
} else {
assert!(d.expect_error, "{}", msg);
}
}
}
#[tokio::test]
async fn test_volume_capacity_stats() {
skip_if_not_root!();
@@ -2746,162 +2815,4 @@ OtherField:other
assert_eq!(stats.used, 3);
assert_eq!(stats.available, available - 2);
}
#[tokio::test]
async fn test_ip_tables() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
let sandbox = Sandbox::new(&logger).unwrap();
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
});
let ctx = mk_ttrpc_context();
// Move to a new netns in order to ensure we don't trash the hosts' iptables
unshare(CloneFlags::CLONE_NEWNET).unwrap();
// Get initial iptables, we expect to be empty:
let result = agent_service
.get_ip_tables(
&ctx,
GetIPTablesRequest {
is_ipv6: false,
..Default::default()
},
)
.await;
assert!(result.is_ok(), "get ip tables should succeed");
assert_eq!(
result.unwrap().data.len(),
0,
"ip tables should be empty initially"
);
// Initial ip6 ip tables should also be empty:
let result = agent_service
.get_ip_tables(
&ctx,
GetIPTablesRequest {
is_ipv6: true,
..Default::default()
},
)
.await;
assert!(result.is_ok(), "get ip6 tables should succeed");
assert_eq!(
result.unwrap().data.len(),
0,
"ip tables should be empty initially"
);
// Verify that attempting to write 'empty' iptables results in no error:
let empty_rules = "";
let result = agent_service
.set_ip_tables(
&ctx,
SetIPTablesRequest {
is_ipv6: false,
data: empty_rules.as_bytes().to_vec(),
..Default::default()
},
)
.await;
assert!(result.is_ok(), "set ip tables with no data should succeed");
// Verify that attempting to write "garbage" iptables results in an error:
let garbage_rules = r#"
this
is
just garbage
"#;
let result = agent_service
.set_ip_tables(
&ctx,
SetIPTablesRequest {
is_ipv6: false,
data: garbage_rules.as_bytes().to_vec(),
..Default::default()
},
)
.await;
assert!(result.is_err(), "set iptables with garbage should fail");
// Verify setup of valid iptables:Setup valid set of iptables:
let valid_rules = r#"
*nat
-A PREROUTING -d 192.168.103.153/32 -j DNAT --to-destination 192.168.188.153
COMMIT
"#;
let result = agent_service
.set_ip_tables(
&ctx,
SetIPTablesRequest {
is_ipv6: false,
data: valid_rules.as_bytes().to_vec(),
..Default::default()
},
)
.await;
assert!(result.is_ok(), "set ip tables should succeed");
let result = agent_service
.get_ip_tables(
&ctx,
GetIPTablesRequest {
is_ipv6: false,
..Default::default()
},
)
.await
.unwrap();
assert!(!result.data.is_empty(), "we should have non-zero output:");
assert!(
std::str::from_utf8(&*result.data).unwrap().contains(
"PREROUTING -d 192.168.103.153/32 -j DNAT --to-destination 192.168.188.153"
),
"We should see the resulting rule"
);
// Verify setup of valid ip6tables:
let valid_ipv6_rules = r#"
*filter
-A INPUT -s 2001:db8:100::1/128 -i sit+ -p tcp -m tcp --sport 512:65535
COMMIT
"#;
let result = agent_service
.set_ip_tables(
&ctx,
SetIPTablesRequest {
is_ipv6: true,
data: valid_ipv6_rules.as_bytes().to_vec(),
..Default::default()
},
)
.await;
assert!(result.is_ok(), "set ip6 tables should succeed");
let result = agent_service
.get_ip_tables(
&ctx,
GetIPTablesRequest {
is_ipv6: true,
..Default::default()
},
)
.await
.unwrap();
assert!(!result.data.is_empty(), "we should have non-zero output:");
assert!(
std::str::from_utf8(&*result.data)
.unwrap()
.contains("INPUT -s 2001:db8:100::1/128 -i sit+ -p tcp -m tcp --sport 512:65535"),
"We should see the resulting rule"
);
}
}

View File

@@ -470,8 +470,8 @@ fn online_memory(logger: &Logger) -> Result<()> {
#[cfg(test)]
mod tests {
use super::*;
use crate::mount::baremount;
use super::Sandbox;
use crate::{mount::baremount, skip_if_not_root};
use anyhow::{anyhow, Error};
use nix::mount::MsFlags;
use oci::{Linux, Root, Spec};
@@ -480,11 +480,9 @@ mod tests {
use rustjail::specconv::CreateOpts;
use slog::Logger;
use std::fs::{self, File};
use std::io::prelude::*;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use tempfile::{tempdir, Builder, TempDir};
use test_utils::skip_if_not_root;
fn bind_mount(src: &str, dst: &str, logger: &Logger) -> Result<(), Error> {
let src_path = Path::new(src);
@@ -849,259 +847,4 @@ mod tests {
let p = s.find_container_process("not-exist-cid", "");
assert!(p.is_err(), "Expecting Error, Got {:?}", p);
}
#[tokio::test]
async fn test_find_process() {
let logger = slog::Logger::root(slog::Discard, o!());
let test_pids = [std::i32::MIN, -1, 0, 1, std::i32::MAX];
for test_pid in test_pids {
let mut s = Sandbox::new(&logger).unwrap();
let (mut linux_container, _root) = create_linuxcontainer();
let mut test_process = Process::new(
&logger,
&oci::Process::default(),
"this_is_a_test_process",
true,
1,
)
.unwrap();
// processes interally only have pids when manually set
test_process.pid = test_pid;
linux_container.processes.insert(test_pid, test_process);
s.add_container(linux_container);
let find_result = s.find_process(test_pid);
// test first if it finds anything
assert!(find_result.is_some(), "Should be able to find a process");
let found_process = find_result.unwrap();
// then test if it founds the correct process
assert_eq!(
found_process.pid, test_pid,
"Should be able to find correct process"
);
}
// to test for nonexistent pids, any pid that isn't the one set
// above should work, as linuxcontainer starts with no processes
let mut s = Sandbox::new(&logger).unwrap();
let nonexistent_test_pid = 1234;
let find_result = s.find_process(nonexistent_test_pid);
assert!(
find_result.is_none(),
"Shouldn't find a process for non existent pid"
);
}
#[tokio::test]
async fn test_online_resources() {
#[derive(Debug, Default)]
struct TestFile {
name: String,
content: String,
}
#[derive(Debug, Default)]
struct TestDirectory<'a> {
name: String,
files: &'a [TestFile],
}
#[derive(Debug)]
struct TestData<'a> {
directory_autogen_name: String,
number_autogen_directories: u32,
extra_directories: &'a [TestDirectory<'a>],
pattern: String,
to_enable: i32,
result: Result<i32>,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
directory_autogen_name: Default::default(),
number_autogen_directories: Default::default(),
extra_directories: Default::default(),
pattern: Default::default(),
to_enable: Default::default(),
result: Ok(Default::default()),
}
}
}
let tests = &[
// 4 well formed directories, request enabled 4,
// correct result 4 enabled, should pass
TestData {
directory_autogen_name: String::from("cpu"),
number_autogen_directories: 4,
pattern: String::from(r"cpu[0-9]+"),
to_enable: 4,
result: Ok(4),
..Default::default()
},
// 0 well formed directories, request enabled 4,
// correct result 0 enabled, should pass
TestData {
number_autogen_directories: 0,
to_enable: 4,
result: Ok(0),
..Default::default()
},
// 10 well formed directories, request enabled 4,
// correct result 4 enabled, should pass
TestData {
directory_autogen_name: String::from("cpu"),
number_autogen_directories: 10,
pattern: String::from(r"cpu[0-9]+"),
to_enable: 4,
result: Ok(4),
..Default::default()
},
// 0 well formed directories, request enabled 0,
// correct result 0 enabled, should pass
TestData {
number_autogen_directories: 0,
pattern: String::from(r"cpu[0-9]+"),
to_enable: 0,
result: Ok(0),
..Default::default()
},
// 4 well formed directories, 1 malformed (no online file),
// request enable 5, correct result 4
TestData {
directory_autogen_name: String::from("cpu"),
number_autogen_directories: 4,
pattern: String::from(r"cpu[0-9]+"),
extra_directories: &[TestDirectory {
name: String::from("cpu4"),
files: &[],
}],
to_enable: 5,
result: Ok(4),
},
// 3 malformed directories (no online files),
// request enable 3, correct result 0
TestData {
pattern: String::from(r"cpu[0-9]+"),
extra_directories: &[
TestDirectory {
name: String::from("cpu0"),
files: &[],
},
TestDirectory {
name: String::from("cpu1"),
files: &[],
},
TestDirectory {
name: String::from("cpu2"),
files: &[],
},
],
to_enable: 3,
result: Ok(0),
..Default::default()
},
// 1 malformed directories (online file with content "1"),
// request enable 1, correct result 0
TestData {
pattern: String::from(r"cpu[0-9]+"),
extra_directories: &[TestDirectory {
name: String::from("cpu0"),
files: &[TestFile {
name: SYSFS_ONLINE_FILE.to_string(),
content: String::from("1"),
}],
}],
to_enable: 1,
result: Ok(0),
..Default::default()
},
// 2 well formed directories, 1 malformed (online file with content "1"),
// request enable 3, correct result 2
TestData {
directory_autogen_name: String::from("cpu"),
number_autogen_directories: 2,
pattern: String::from(r"cpu[0-9]+"),
extra_directories: &[TestDirectory {
name: String::from("cpu2"),
files: &[TestFile {
name: SYSFS_ONLINE_FILE.to_string(),
content: String::from("1"),
}],
}],
to_enable: 3,
result: Ok(2),
},
];
let logger = slog::Logger::root(slog::Discard, o!());
let tmpdir = Builder::new().tempdir().unwrap();
let tmpdir_path = tmpdir.path().to_str().unwrap();
for (i, d) in tests.iter().enumerate() {
let current_test_dir_path = format!("{}/test_{}", tmpdir_path, i);
fs::create_dir(&current_test_dir_path).unwrap();
// create numbered directories and fill using root name
for j in 0..d.number_autogen_directories {
let subdir_path = format!(
"{}/{}{}",
current_test_dir_path, d.directory_autogen_name, j
);
let subfile_path = format!("{}/{}", subdir_path, SYSFS_ONLINE_FILE);
fs::create_dir(&subdir_path).unwrap();
let mut subfile = File::create(subfile_path).unwrap();
subfile.write_all(b"0").unwrap();
}
// create extra directories and fill to specification
for j in d.extra_directories {
let subdir_path = format!("{}/{}", current_test_dir_path, j.name);
fs::create_dir(&subdir_path).unwrap();
for file in j.files {
let subfile_path = format!("{}/{}", subdir_path, file.name);
let mut subfile = File::create(&subfile_path).unwrap();
subfile.write_all(file.content.as_bytes()).unwrap();
}
}
// run created directory structure against online_resources
let result = online_resources(&logger, &current_test_dir_path, &d.pattern, d.to_enable);
let mut msg = format!(
"test[{}]: {:?}, expected {}, actual {}",
i,
d,
d.result.is_ok(),
result.is_ok()
);
assert_eq!(result.is_ok(), d.result.is_ok(), "{}", msg);
if d.result.is_ok() {
let test_result_val = *d.result.as_ref().ok().unwrap();
let result_val = result.ok().unwrap();
msg = format!(
"test[{}]: {:?}, expected {}, actual {}",
i, d, test_result_val, result_val
);
assert_eq!(test_result_val, result_val, "{}", msg);
}
}
}
}

View File

@@ -58,16 +58,17 @@ async fn handle_sigchild(logger: Logger, sandbox: Arc<Mutex<Sandbox>>) -> Result
}
let mut p = process.unwrap();
let ret: i32;
let ret: i32 = match wait_status {
WaitStatus::Exited(_, c) => c,
WaitStatus::Signaled(_, sig, _) => sig as i32,
match wait_status {
WaitStatus::Exited(_, c) => ret = c,
WaitStatus::Signaled(_, sig, _) => ret = sig as i32,
_ => {
info!(logger, "got wrong status for process";
"child-status" => format!("{:?}", wait_status));
continue;
}
};
}
p.exit_code = ret;
let _ = p.exit_tx.take();

View File

@@ -0,0 +1,81 @@
// Copyright (c) 2019 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
#![allow(clippy::module_inception)]
#[cfg(test)]
mod test_utils {
#[macro_export]
macro_rules! skip_if_root {
() => {
if nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs non-root", module_path!());
return;
}
};
}
#[macro_export]
macro_rules! skip_if_not_root {
() => {
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
};
}
#[macro_export]
macro_rules! skip_loop_if_root {
($msg:expr) => {
if nix::unistd::Uid::effective().is_root() {
println!(
"INFO: skipping loop {} in {} which needs non-root",
$msg,
module_path!()
);
continue;
}
};
}
#[macro_export]
macro_rules! skip_loop_if_not_root {
($msg:expr) => {
if !nix::unistd::Uid::effective().is_root() {
println!(
"INFO: skipping loop {} in {} which needs root",
$msg,
module_path!()
);
continue;
}
};
}
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
}
};
}
}

View File

@@ -69,8 +69,6 @@ macro_rules! trace_rpc_call {
propagator.extract(&extract_carrier_from_ttrpc($ctx))
});
info!(sl!(), "rpc call from shim to agent: {:?}", $name);
// generate tracing span
let rpc_span = span!(tracing::Level::INFO, $name, "mod"="rpc.rs", req=?$req);

View File

@@ -237,6 +237,8 @@ mod tests {
JoinError,
>;
let result: std::result::Result<u64, std::io::Error>;
select! {
res = handle => spawn_result = res,
_ = &mut timeout => panic!("timed out"),
@@ -244,7 +246,7 @@ mod tests {
assert!(spawn_result.is_ok());
let result: std::result::Result<u64, std::io::Error> = spawn_result.unwrap();
result = spawn_result.unwrap();
assert!(result.is_ok());
@@ -276,6 +278,8 @@ mod tests {
let spawn_result: std::result::Result<std::result::Result<u64, std::io::Error>, JoinError>;
let result: std::result::Result<u64, std::io::Error>;
select! {
res = handle => spawn_result = res,
_ = &mut timeout => panic!("timed out"),
@@ -283,7 +287,7 @@ mod tests {
assert!(spawn_result.is_ok());
let result: std::result::Result<u64, std::io::Error> = spawn_result.unwrap();
result = spawn_result.unwrap();
assert!(result.is_ok());
@@ -316,6 +320,8 @@ mod tests {
let spawn_result: std::result::Result<std::result::Result<u64, std::io::Error>, JoinError>;
let result: std::result::Result<u64, std::io::Error>;
select! {
res = handle => spawn_result = res,
_ = &mut timeout => panic!("timed out"),
@@ -323,7 +329,7 @@ mod tests {
assert!(spawn_result.is_ok());
let result: std::result::Result<u64, std::io::Error> = spawn_result.unwrap();
result = spawn_result.unwrap();
assert!(result.is_ok());

View File

@@ -528,10 +528,10 @@ impl BindWatcher {
mod tests {
use super::*;
use crate::mount::is_mounted;
use crate::skip_if_not_root;
use nix::unistd::{Gid, Uid};
use std::fs;
use std::thread;
use test_utils::skip_if_not_root;
async fn create_test_storage(dir: &Path, id: &str) -> Result<(protos::Storage, PathBuf)> {
let src_path = dir.join(format!("src{}", id));

Some files were not shown because too many files have changed in this diff Show More