Compare commits

114 Commits

Author SHA1 Message Date
Archana Shinde
3a1804cd73 Merge pull request #2975 from bergwolf/2.2.3-branch-bump
# Kata Containers 2.2.3
2021-11-05 04:31:27 -07:00
Peng Tao
b7493fd5d5 release: Kata Containers 2.2.3
ad45107a2 release: Kata Containers 2.2.3
4f73e58d7 packaging/static-build: s390x fixes
45f65a73c agent: Handle uevent remove actions
06d304934 agent: fix race condition when test watcher
0366f6e81 template: disable template unit test on arm
7cb650abc runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
e97cd23bd runtime: current vcpu number should be limited
6b6d81cce runtime: kernel version with '+' as suffix panic in parse
a479eca7d docs: Fix outdated links
b794a3940 virtcontainers: clh: Re-generate the client code
39d95f486 versions: Upgrade to Cloud Hypervisor v19.0

Depends-on: github.com/kata-containers/tests#4155
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-11-05 15:10:01 +08:00
Chelsea Mafrica
63ecbcf14b Merge pull request #2971 from wainersm/stable-2.2_image-builder-fix
stable-2.2 | osbuilder: build image-builder image from Fedora 34
2021-11-04 21:37:50 -07:00
Jakob Naucke
4f73e58d73 packaging/static-build: s390x fixes
- Install OpenSSL for key generation in kernel build
- Do not install libpmem
- Do not exclude `*/share/*/*.img` files in QEMU tarball since among
  them are boot loader files critical for IPLing.

Fixes: #2895
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-11-05 11:35:54 +08:00
Haitao Li
45f65a73c8 agent: Handle uevent remove actions
Uevents with action=remove were ignored, causing the agent to reuse stale
data in the device map. This patch adds handling of such uevents.

Fixes #2405

Signed-off-by: Haitao Li <lihaitao@gmail.com>
2021-11-05 11:35:34 +08:00
Jianyong Wu
06d3049349 agent: fix race condition when test watcher
The create_tmpfs test fails because of a race condition in the watcher's
unmount. To quote James:

1. Rust runs all tests in parallel.
2. Mounts are a process-wide resource, not a per-thread one.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.

To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:52 +08:00
Jianyong Wu
0366f6e817 template: disable template unit test on arm
Template is broken on arm, so disable the template unit test
temporarily.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:35 +08:00
Jianyong Wu
7cb650abcf runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
DefaultMaxVCPUs may be larger than defaultMaxQemuVCPUs; this should
be checked for and avoided.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:22 +08:00
Jianyong Wu
e97cd23bd6 runtime: current vcpu number should be limited
The current number of physical vCPUs should not be used directly, because
the maximum vCPU number is limited to defaultMaxQemuVCPUs.
A new helper is introduced in pkg/katautils/config.go to get the
current vCPU number.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:16 +08:00
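The limit described in the two vCPU commits above can be pictured with a minimal sketch. It assumes a hypothetical `clampVCPUs` helper and an illustrative `maxQemuVCPUs` value; neither is taken from the Kata sources, and the real helper in pkg/katautils/config.go may look different.

```go
package main

import (
	"fmt"
	"runtime"
)

// maxQemuVCPUs stands in for the runtime's defaultMaxQemuVCPUs.
// The value is illustrative only (assumption).
const maxQemuVCPUs uint32 = 240

// clampVCPUs returns n, limited to at most limit.
func clampVCPUs(n, limit uint32) uint32 {
	if n > limit {
		return limit
	}
	return n
}

// currentVCPUs reports the host CPU count, capped at maxQemuVCPUs,
// instead of using the physical count directly.
func currentVCPUs() uint32 {
	return clampVCPUs(uint32(runtime.NumCPU()), maxQemuVCPUs)
}

func main() {
	fmt.Println("usable vCPUs:", currentVCPUs())
}
```

The same clamp could be applied to a configured default maximum, so that neither value can exceed what the hypervisor accepts.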
Jianyong Wu
6b6d81cced runtime: kernel version with '+' as suffix panic in parse
The kernel version parsing library cannot handle a '+' suffix. A modified
kernel appends '+' to its version string, so a panic occurs.

For example, if the current kernel version is "5.14.0-rc4+", test
TestHostNetworkingRequested will panic:
--- FAIL: TestHostNetworkingRequested (0.00s)
panic: &{DistroName:ubuntu DistroVersion:18.04
KernelVersion:5.11.0-rc3+ Issue: Passed:[] Failed:[] Debug:true
ActualEUID:0}: failed to check test constraints: error: Build meta data
is empty

To fix this, strip the '+' suffix in the kernel version fix helper.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:08 +08:00
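As a rough illustration of the fix in the commit above, the sketch below strips trailing '+' characters before a version string would be handed to a semantic-version parser. The helper name `stripPlusSuffix` is hypothetical, and the actual helper in the runtime may do more than this.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPlusSuffix removes any trailing '+' characters from a kernel
// version string. A bare '+' reads as empty build metadata to a semver
// parser, which is the failure shown in the commit message.
// Hypothetical helper, not the runtime's actual function.
func stripPlusSuffix(version string) string {
	return strings.TrimRight(version, "+")
}

func main() {
	for _, v := range []string{"5.14.0-rc4+", "5.10.0"} {
		fmt.Printf("%q -> %q\n", v, stripPlusSuffix(v))
	}
}
```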
Binbin Zhang
a479eca7de docs: Fix outdated links
Fix outdated links that were detected by workflow/docs-url-alive-check.

Fixes #2630

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-11-05 11:31:44 +08:00
Wainer dos Santos Moschetta
ee3bf4a411 osbuilder: build image-builder image from Fedora 34
Currently the image-builder image is built from `fedora:latest` and
this is error-prone as any update of the base image can lead to
breakage. Instead let's create the image from Fedora 34, which is the
last known version to build fine.

Fixes #2960
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit a239a38f45)
2021-11-04 13:35:32 -04:00
James O. D. Hunt
4443a982e6 Merge pull request #2888 from likebreath/1022/backport_clh_v19.0_seccomp
stable-2.2 | versions: Upgrade to Cloud Hypervisor v19.0
2021-10-25 10:49:39 +01:00
Bo Chen
b794a39401 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v19.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 8030b6caf0)
2021-10-22 16:39:03 -07:00
Bo Chen
39d95f486b versions: Upgrade to Cloud Hypervisor v19.0
Highlights from the Cloud Hypervisor release v19.0: 1) Improved PTY
handling for serial and virtio-console; 2) PCI boot time optimisations;
3) Improved TDX support; 4) Live migration enhancements (support with
virtio-mem and virtio-balloon); 5) virtio-mem support with vfio-user; 6)
AArch64 for virtio-iommu; 7) Various bug fixes for live-migration and
VFIO passthrough.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v19.0

Fixes: #2871

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 8296754e07)
2021-10-22 16:39:03 -07:00
Peng Tao
aa40324c52 Merge pull request #2841 from fidencio/2.2.2-branch-bump
# Kata Containers 2.2.2
2021-10-14 19:29:27 +08:00
Fabiano Fidêncio
9053137592 release: Kata Containers 2.2.2
- stable-2.2 | Backport #2821 and #2769
- Backport runtime: Fix !x86 static checks
- stable-2.2 | agent: exec should inherit container process capabilities
- stable-2.2 | vendor: Update containerd to v1.5.7
- stable-2.2 | fc: fix version parsing for fc >= 0.25
- [backport] kata-monitor: cache improvements

eea2c019 virtcontainers: clh: Use 'quiet' as the default kernel parameter
1e798b96 virtcontainers: clh: Turn-off serial and virtio-console by default
53c4492f agent: netlink: Use the grpc IP family field when updating the route
893623df runtime: Pass the route IP family to the agent
503ce9c1 agent: protos: Add a Family field to the Route payload
9932e76f runtime: vendor: Bump the netlink package dependency
0034f40b agent: exec should inherit container process capabilities
1f6b0f65 protection: add confidential compute frame for arm
112e0f63 check: fix typecheck failure in qemu_arm64_test.go
18820e31 virtcontainers: fix lint failure on ppc64le
8fafced9 virtcontainers: nolint guestProtection
9668095a runtime: Fix field alignment on s390x
3e145ea9 vendor: Update containerd to v1.5.7
79e0754a fc: fix version parsing for fc >= 0.25
b8fc1af3 runtime: set the sandbox storage path static
97167ccd runtime: rename GetSanboxesStoragePath() --> GetSandboxesStoragePath()
b0aca51e kata-monitor: bump version to 0.2.0
28873c4d kata-monitor: refresh kata sandbox list on fs events
3525a2ed kata-monitor: improve detection of kata workloads
30d07d44 kata-monitor: add getSandboxFS()
623b1082 runtime: add GetSandboxesStoragePath()
fc1822f0 kata-monitor: improve sandbox caching
ba6ad1c8 kata-monitor: warn when unable to retrive the lower level runtime
22d3df91 kata-monitor: minor fixes

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-14 09:58:00 +02:00
Fabiano Fidêncio
c4e8e86acf Merge pull request #2839 from fidencio/wip/stable-2.2-backport-2821-and-2769
stable-2.2 | Backport #2821 and #2769
2021-10-14 09:57:02 +02:00
Bo Chen
eea2c0195f virtcontainers: clh: Use 'quiet' as the default kernel parameter
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.

Fix: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 7b2bfd4eca)
2021-10-14 08:53:50 +02:00
Bo Chen
1e798b96fd virtcontainers: clh: Turn-off serial and virtio-console by default
We need console output from the guest only for debugging
purposes. As a result, we can turn off both the serial and
virtio-console devices by default for better boot time.

Fixes: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 3e24e46c70)
2021-10-14 08:53:44 +02:00
Samuel Ortiz
53c4492fb3 agent: netlink: Use the grpc IP family field when updating the route
Not all routes have either a gateway or a destination IP.
Interface routes, where the source, destination and gateway are undefined,
will default to IPv4 with the current is_ipv6() check even when they
are v6 routes.

We use the provided gRPC Route.Family field instead. This field is built
from the host netlink messages, and is a reliable way of finding out
a route's IP family.

Fixes: #2768

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit a44cde7e8d)
2021-10-14 08:53:10 +02:00
Samuel Ortiz
893623dfbc runtime: Pass the route IP family to the agent
When updating the guest routing table, we should forward the IP family
information up to the guest.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit 71ce6cfe9e)
2021-10-14 08:53:06 +02:00
Samuel Ortiz
503ce9c154 agent: protos: Add a Family field to the Route payload
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit 99450bd1f7)
2021-10-14 08:53:01 +02:00
Samuel Ortiz
9932e76f27 runtime: vendor: Bump the netlink package dependency
We need to be able to get the IP family from the netlink route messages,
and the Route.Family field only got recently added to the netlink
package.

The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit f85fe70231)
2021-10-14 08:52:47 +02:00
GabyCT
3a035c1f43 Merge pull request #2831 from Jakob-Naucke/backport-!x86-static
Backport runtime: Fix !x86 static checks
2021-10-13 13:35:48 -05:00
Eric Ernst
4102a18aa1 Merge pull request #2832 from bergwolf/capability-fix-for-2.2
stable-2.2 | agent: exec should inherit container process capabilities
2021-10-13 10:22:27 -07:00
Peng Tao
0034f40b67 agent: exec should inherit container process capabilities
Otherwise rustjail would not set its capabilities and it ends up getting
all capabilities.

Fixes: #2828
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-10-13 17:43:08 +08:00
Jianyong Wu
1f6b0f651e protection: add confidential compute frame for arm
Even though CCA, the confidential compute architecture, is not ready
yet, add an empty implementation to avoid a static check error.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-13 11:06:42 +02:00
Jianyong Wu
112e0f6381 check: fix typecheck failure in qemu_arm64_test.go
fix typecheck failure in qemu_arm64_test.go

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-13 11:06:42 +02:00
Amulya Meka
18820e31d9 virtcontainers: fix lint failure on ppc64le
Add nolint to architecture-specific code to exclude it
from the lint check.

Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
2021-10-13 11:06:42 +02:00
Jakob Naucke
8fafced9ff virtcontainers: nolint guestProtection
Exclude from lint checking for it is ultimately only used in
architecture-specific code.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-13 11:06:41 +02:00
Jakob Naucke
9668095abd runtime: Fix field alignment on s390x
Follow-up of #2237 for s390x -- field alignment isn't always minimal

Fixes: #2830
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-13 11:06:41 +02:00
Chelsea Mafrica
be51808a13 Merge pull request #2803 from fidencio/wip/stable-2.2-upgrade-vendored-containerd
stable-2.2 | vendor: Update containerd to v1.5.7
2021-10-06 18:06:44 -07:00
Fabiano Fidêncio
3e145ea94c vendor: Update containerd to v1.5.7
Bump containerd to v1.5.7 in order to bring in a fix for CVE-2021-41103,
"insufficiently restricted permissions on plugins directories
(GHSA-c2h3-6mxw-7mvq)".

dependabot found a potential security vulnerability and raised a PR to
fix it.  However, dependabot neither properly follows nor understands
the needs of our CIs (mainly related to formatting the PR and whatnot),
thus I'm re-raising it.

Fixes: #2796
Backports: #2797

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-06 21:08:37 +02:00
Chelsea Mafrica
3951834565 Merge pull request #2800 from fidencio/wip/stable-2.2-backport-fix-for-parsing-firecracker-version-if-it-is-0-25-or-over
stable-2.2 | fc: fix version parsing for fc >= 0.25
2021-10-06 09:50:59 -07:00
Bl1tz23
79e0754a7b fc: fix version parsing for fc >= 0.25
Allows using Firecracker versions >= 0.25.

Fixes: #2471

Signed-off-by: Bl1tz23 <alex3angle@gmail.com>
(cherry picked from commit 87bbae1bd7)
2021-10-06 17:27:22 +02:00
snir911
afe6005785 Merge pull request #2717 from fgiudici/stable-2.2_kata-monitor
[backport] kata-monitor: cache improvements
2021-10-03 18:45:01 +03:00
Francesco Giudici
b8fc1af363 runtime: set the sandbox storage path static
Since we now have "unix://" kind of socket returned by the
SocketAddress() function, there is no more need to build the sandbox
storage path dynamically to keep OS compatibility.

Fixes: #2738
Suggested-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 2304a59601)
2021-09-30 11:48:53 +02:00
Francesco Giudici
97167ccddd runtime: rename GetSanboxesStoragePath() --> GetSandboxesStoragePath()
Add the missing 'd'.

Fixes: #2738
Suggested-by: Jakob Naucke <jakob.naucke@ibm.com>
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 315295e0ef)
2021-09-30 11:48:09 +02:00
Fabiano Fidêncio
af0fbb9460 Merge pull request #2723 from fidencio/2.2.1-branch-bump
# Kata Containers 2.2.1
2021-09-25 00:02:01 +02:00
Fabiano Fidêncio
bc48a58806 Merge pull request #2731 from fidencio/wip/stable-2.2-release-fix-using-vendored-sources
stable-2.2 | workflows: Fix the config file path for using vendored sources
2021-09-25 00:01:43 +02:00
Fabiano Fidêncio
d581cdab4e Merge pull request #2728 from fidencio/wip/stable-2.2-fix-wrong-tags-attribution
stable-2.2 | workflows: Fix tag attribution
2021-09-24 23:01:18 +02:00
Fabiano Fidêncio
52fdfc4fed workflows: Fix the config file path for using vendored sources
There's a typo in the file that should receive the output of `cargo
vendor`.  We should forward the output to `.cargo/config` instead of
`.cargo/vendor`.

This was introduced by 21c8511630.

Backports: #2730
Fixes: #2729

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit a525991c2c)
2021-09-24 20:29:15 +02:00
Fabiano Fidêncio
8d98e01414 workflows: Fix tag attribution
While releasing kata-containers 2.3.0-alpha1 we've hit some issues as
the tag attribution is done incorrectly.  We want an array of tags to
iterate over, but the current code gets the parentheses wrong.

This issue was introduced in a156288c1f.

Fixes: #2725

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 39dcbaa672)
2021-09-24 20:07:55 +02:00
Fabiano Fidêncio
688cc8e2bd release: Kata Containers 2.2.1
- stable-2.2 | watcher: ensure we create target mount point for storage
- stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
- [backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
- stable-2.2 | kata-deploy: Also provide "stable" & "latest" tags
- stable-2.2 | runtime: tracing: Fix logger passed in newContainer
- stable-2.2 | runtime: tracing: Use root context to stop tracing
- packaging: Backport QEMU's GitLab switch to 5.1.x
- stable-2.2 | workflows,release: Upload the vendored cargo code
- backport: Call agent shutdown test only in the correspondent CI_JOB
- packaging: Backport QEMU's switch to GitLab repos
- stable-2.2 | virtcontainers: fc: parse vcpuID correctly
- shimv2: Backport fixes for #2527
- backport-2.2: remove default config for arm64.
- stable-2.2 | versions: Upgrade to Cloud Hypervisor v18.0
- [backport]sandbox: Add device permissions such as /dev/null to cgroup
- [backport] runtime: Fix README link
- [backport] snap: Test variable instead of executing "branch"

d9b41fc5 watcher: ensure we create target mount point for storage
2b6327ac kata-deploy: Add more info about the stable tag
5256e085 kata-deploy: Improve README
02b46268 kata-deploy: Remove qemu-virtiofs runtime class
1b3058dd release: update the kata-deploy yaml files accordingly
98e2e935 kata-deploy: Add "stable" info to the README
8f25c7da kata-deploy: Update the README
84da2f8d workflows: Add "stable" & "latest" tags to kata-deploy
5c76f1c6 packaging: Backport QEMU's GitLab switch to 5.1.x
ba6fc328 packaging: Backport QEMU's switch to GitLab repos
d5f5da43 workflows,release: Upload the vendored cargo code
017cd3c5 ci: Call agent shutdown test only in the correspondent CI_JOB
2ca867da runtime: Add container field to logs
f4da502c shimv2: add information to method comment
16164241 shimv2: add logging to shimv2 api calls
25c7e118 virtiofs: Create shared directory with 0700 mode, not 0750
4c5bf057 virtcontainers: fc: parse vcpuID correctly
b3e620db runtime: tracing: Fix logger passed in newContainer
98c2ca13 runtime: tracing: Use root context to stop tracing
0481c507 backport-2.2: remove default config for arm64.
56920bc9 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
a1874ccd virtcontainers: clh: Revert the workaround incorrect default values
c2c65050 virtcontainers: clh: Re-generate the client code
7ee43f94 versions: Upgrade to Cloud Hypervisor v18.0
1792a9fe runtime: Fix README link
807cc8a3 sandbox: Add device permissions such as /dev/null to cgroup
5987f3b5 snap: Test variable instead of executing "branch"

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-09-24 12:34:35 +02:00
Fabiano Fidêncio
ebc23df752 Merge pull request #2714 from egernst/watcher-fixup-backport
stable-2.2 | watcher: ensure we create target mount point for storage
2021-09-24 09:32:29 +02:00
Francesco Giudici
b0aca51eac kata-monitor: bump version to 0.2.0
We now support any CRI-compliant container engine. Let's bump the
kata-monitor version to 0.2.0.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 8b0bc1f45e)
2021-09-24 09:25:13 +02:00
Francesco Giudici
28873c4d75 kata-monitor: refresh kata sandbox list on fs events
This commit stops the container engine polling in favor of
the kata sandbox storage path monitoring.
The pod cache list is now refreshed based on fs events and synced with
the container engine only when needed.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit bfb556d56a)
2021-09-24 09:25:00 +02:00
Francesco Giudici
3525a2ed03 kata-monitor: improve detection of kata workloads
When the container engine is different than containerd or CRI-O we
lack proper detection of kata workloads and consider all the pods as
kata ones.
Instead of querying the container engine for the lower level runtime
used in each pod, check if a directory matching the pod exists in
the virtualcontainers sandboxes storage path.
This provides a container engine independent way to check for kata pods.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 0e854f3b80)
2021-09-24 09:24:17 +02:00
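The engine-independent check described in the commit above boils down to a directory lookup. Here is a minimal sketch, assuming a hypothetical `isKataSandbox` helper and an illustrative storage path; the real path comes from the runtime and is not reproduced here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isKataSandbox reports whether a directory named after the pod exists
// under the sandboxes storage path. Hypothetical helper: the name and
// signature are assumptions, not kata-monitor's actual code.
func isKataSandbox(storagePath, podID string) bool {
	info, err := os.Stat(filepath.Join(storagePath, podID))
	return err == nil && info.IsDir()
}

func main() {
	// Illustrative path only.
	fmt.Println(isKataSandbox("/run/example-sandboxes", "some-pod-id"))
}
```

Because the check only touches the filesystem, it works the same way regardless of which container engine created the pod.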
Francesco Giudici
30d07d4407 kata-monitor: add getSandboxFS()
Retrieve the absolute sandbox storage path. We will soon need this to
monitor the creation/deletion of new kata sandboxes.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit afad910d0e)
2021-09-24 09:24:03 +02:00
Francesco Giudici
623b108227 runtime: add GetSandboxesStoragePath()
The storage path we use to collect the sandbox files is defined in the
virtcontainers/persist/fs package.
We create the runtime socket in that storage path, by hardcoding the
full path in the SocketAddress() function in the runtime package.
This commit splits the hardcoded path by the socket address path so that
the runtime package will be able to provide the storage path to all the
components that may need it.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit e38686f74d)
2021-09-24 09:23:47 +02:00
Francesco Giudici
fc1822f094 kata-monitor: improve sandbox caching
In order to retrieve the list of sandboxes, we poll the container engine
every 15 seconds via the CRI. Once we have the list we have to inspect
each pod to find out the kata ones.
This commit extends the sandbox cache to keep track of all the pods,
marking the kata ones, so that during the next polling only the new
sandboxes should be inspected to figure out which ones are using the
kata runtime.

Fixes: #2563
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 245a12bbb7)
2021-09-24 09:23:33 +02:00
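The caching idea in the commit above (remember every pod, mark the kata ones, and inspect only pods not seen before) can be sketched as follows. The `sandboxCache` type and its `refresh` method are hypothetical names used for illustration, not kata-monitor's actual data structures.

```go
package main

import "fmt"

// sandboxCache maps a pod ID to whether that pod uses the kata runtime.
// Hypothetical type for illustration.
type sandboxCache map[string]bool

// refresh syncs the cache with the pod list from the container engine.
// inspect is called only for pods that are not cached yet; pods that
// disappeared are dropped. It returns how many pods were inspected.
func (c sandboxCache) refresh(pods []string, inspect func(id string) bool) int {
	seen := make(map[string]struct{}, len(pods))
	inspected := 0
	for _, id := range pods {
		seen[id] = struct{}{}
		if _, cached := c[id]; cached {
			continue // already classified during an earlier poll
		}
		c[id] = inspect(id)
		inspected++
	}
	for id := range c {
		if _, ok := seen[id]; !ok {
			delete(c, id) // pod is gone
		}
	}
	return inspected
}

func main() {
	cache := sandboxCache{}
	isKata := func(id string) bool { return id == "kata-pod" }
	fmt.Println(cache.refresh([]string{"kata-pod", "runc-pod"}, isKata))
	fmt.Println(cache.refresh([]string{"kata-pod", "runc-pod", "new-pod"}, isKata))
}
```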
Francesco Giudici
ba6ad1c804 kata-monitor: warn when unable to retrive the lower level runtime
This is an unexpected event (likely a change in how containerd/cri-o
record the lower level runtime in the pod) and should be more visible:
raise the log level to "warning".

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit fc067d61d4)
2021-09-24 09:21:10 +02:00
Francesco Giudici
22d3df9141 kata-monitor: minor fixes
fix comment and use literals

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 53ec4df953)
2021-09-24 09:19:56 +02:00
Fabiano Fidêncio
e58fabfc20 Merge pull request #2598 from c3d/backport/2589-virtiofsd-perms-perms
stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
2021-09-24 09:16:59 +02:00
Peng Tao
feb06dad8a Merge pull request #2623 from Bevisy/stable-2.2-2615-bp
[backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
2021-09-24 14:04:36 +08:00
Eric Ernst
d9b41fc583 watcher: ensure we create target mount point for storage
We would only create the target when updating files. We need to make
sure that we create the target if the source is a directory. Without
this, we'll fail to start a container that utilizes an empty configmap,
for example.

Add unit tests for this.

Fixes: #2638

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-23 15:45:57 -07:00
Julio Montes
7852b9f8e1 Merge pull request #2711 from fidencio/wip/stable-2.2-kata-deploy-use-stable-and-latest-tags
stable-2.2 | kata-deploy: Also provide "stable" & "latest" tags
2021-09-23 12:18:00 -05:00
Chelsea Mafrica
83f219577d Merge pull request #2668 from cmaf/tracing-newContainer-logger-bp-2.2
stable-2.2 | runtime: tracing: Fix logger passed in newContainer
2021-09-23 09:58:14 -07:00
Chelsea Mafrica
97421afe17 Merge pull request #2664 from cmaf/tracing-stop-rootctx-bp-2.2
stable-2.2 | runtime: tracing: Use root context to stop tracing
2021-09-23 09:57:57 -07:00
Fabiano Fidêncio
2b6327ac37 kata-deploy: Add more info about the stable tag
Let's make it as clear as possible for the user that if they go for a
tagged version of kata-deploy, eg, 2.2.1, they'll have the kata runtime
2.2.1 deployed on their cluster.

Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 3bdcfaa658)
2021-09-23 14:05:17 +02:00
Fabiano Fidêncio
5256e0852c kata-deploy: Improve README
Let's add more instructions in the README in order to make clear to the
reader what they can do to check whether kata-deploy is ready, or
whether they have to wait before proceeding with the next instruction.

Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 41c590fa0a)
2021-09-23 14:04:57 +02:00
Fabiano Fidêncio
02b46268f4 kata-deploy: Remove qemu-virtiofs runtime class
There's only one QEMU runtime class deployed as part of kata-deploy, and
that includes virtiofs support (which is the default for quite some time
already).  Knowing this, let's just remove the `qemu-virtiofs` runtime
class definition.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit debf3c9fe9)
2021-09-23 14:04:50 +02:00
Fabiano Fidêncio
1b3058dd24 release: update the kata-deploy yaml files accordingly
Let's teach our `update-repository-version.sh` script to properly update
the kata-deploy tags on both kata-deploy and kata-cleanup yaml files.

The 3 scenarios that we're dealing with, based on which branch we're
targeting, are:
```
 1) [main] ------> [main]        NO-OP
   "alpha0"       "alpha1"

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "latest"       | "latest"       |
  -----------------+----------------+----------------+
  kata-deploy-base | "stable"       | "stable"       |
  -----------------+----------------+----------------+

 2) [main] ------> [stable] Update kata-deploy and
   "alpha2"         "rc0"   get rid of kata-deploy-base

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "latest"       | "rc0"          |
  -----------------+----------------+----------------+
  kata-deploy-base | "stable"       | REMOVED        |
  -----------------+----------------+----------------+

 3) [stable] ------> [stable]    Update kata-deploy
    "x.y.z"         "x.y.(z+1)"

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "x.y.z"        | "x.y.(z+1)"    |
  -----------------+----------------+----------------+
  kata-deploy-base | NON-EXISTENT   | NON-EXISTENT   |
  -----------------+----------------+----------------+
```

And we can easily cover those 3 cases only with the information about
the "${target_branch}" and the "${new_version}", where:
* case 1) if "${target_branch}" is "main" *and* "${new_version}"
  contains "alpha", do nothing
* case 2) if "${target_branch}" is "main" *and* "${new_version}"
  contains "rc":
  * change the kata-deploy & kata-cleanup tags from "latest" to
    "${new_version}".
  * delete the kata-deploy-stable & kata-cleanup-stable files.
* case 3) if the "${target_branch}" contains "stable":
  * change the kata-deploy & kata-cleanup tags from "${current_version}"
    to "${new_version}".

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 43a72d76e2)
2021-09-23 14:04:44 +02:00
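The three cases listed in the commit above reduce to a small decision on the target branch and the new version. The sketch below models only that decision, in Go rather than in the shell of `update-repository-version.sh`. The `action` type and `decide` function are hypothetical, and treating every other combination as a no-op is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// action names what should happen to the kata-deploy yaml files.
type action string

const (
	noOp        action = "no-op"
	promoteToRC action = "retag latest -> new version, delete -stable files"
	bumpStable  action = "retag current version -> new version"
)

// decide mirrors cases 1) to 3) from the commit message.
// Hypothetical helper; unlisted combinations fall back to noOp.
func decide(targetBranch, newVersion string) action {
	switch {
	case targetBranch == "main" && strings.Contains(newVersion, "alpha"):
		return noOp
	case targetBranch == "main" && strings.Contains(newVersion, "rc"):
		return promoteToRC
	case strings.Contains(targetBranch, "stable"):
		return bumpStable
	default:
		return noOp
	}
}

func main() {
	fmt.Println(decide("main", "2.3.0-alpha1"))
	fmt.Println(decide("main", "2.3.0-rc0"))
	fmt.Println(decide("stable-2.2", "2.2.2"))
}
```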
Fabiano Fidêncio
98e2e93552 kata-deploy: Add "stable" info to the README
Similar to the instructions we have for the "latest" images, let's also
add instructions about the "stable" images.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit ea9b2f9c92)
2021-09-23 14:04:38 +02:00
Fabiano Fidêncio
8f25c7da11 kata-deploy: Update the README
Let's just point to our repo URLs rather than assume users using
kata-deploy will have our repo cloned.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit e541105680)
2021-09-23 14:04:29 +02:00
Fabiano Fidêncio
84da2f8ddc workflows: Add "stable" & "latest" tags to kata-deploy
When releasing a tarball, let's *also* add the "stable" & "latest" tags
to the kata-deploy image.

The "stable" tag refers to any official release, while the "latest" tag
refers to any pre-release / release candidate.

Fixes: #2302

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit a156288c1f)
2021-09-23 14:01:33 +02:00
Fabiano Fidêncio
de0e3915b7 Merge pull request #2702 from Jakob-Naucke/backport-qemu-gitlab
packaging: Backport QEMU's GitLab switch to 5.1.x
2021-09-23 12:59:17 +02:00
Jakob Naucke
5c76f1c65a packaging: Backport QEMU's GitLab switch to 5.1.x
This brings #2699 to 5.1.x for ARM. Add a `no_patches.txt` for 5.1.0,
which was apparently missing.

Fixes: #2701
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-09-23 11:11:45 +02:00
Fabiano Fidêncio
522a53010c Merge pull request #2690 from fidencio/wip/stable-2.2-upload-cargo-vendored-tarball
stable-2.2 | workflows,release: Upload the vendored cargo code
2021-09-22 22:07:08 +02:00
Julio Montes
852fc53351 Merge pull request #2688 from GabyCT/shutdown
backport: Call agent shutdown test only in the correspondent CI_JOB
2021-09-22 09:53:14 -05:00
Julio Montes
e0a27b5e90 Merge pull request #2699 from Jakob-Naucke/backport-qemu-gitlab
packaging: Backport QEMU's switch to GitLab repos
2021-09-22 09:19:16 -05:00
Jakob Naucke
ba6fc32804 packaging: Backport QEMU's switch to GitLab repos
QEMU's submodule checkout from git.qemu.org can fail. On QEMU 6.x, this
is not a problem because they moved to GitLab. However, we use QEMU 5.2
on stable-2.2, which can be a problem when no cached QEMU is used.
Backport QEMU's switch.

Fixes: #2698
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-09-22 14:59:35 +02:00
Fabiano Fidêncio
d5f5da4323 workflows,release: Upload the vendored cargo code
As part of the release, let's also upload a tarball with the vendored
cargo code.  By doing this we allow distros, which usually don't have
access to the internet while performing the builds, to just add the
vendored code as a second source, making the life of the downstream
maintainers slightly easier*.

Fixes: #1203
Backports: #2573

*: The current workflow requires the downstream maintainer to download
the tarball, unpack it, run `cargo vendor`, create the tarball, etc.
Although this doesn't look like a ridiculous amount of work, it's better
if we can have it in an automated fashion.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 21c8511630)
2021-09-21 21:48:58 +02:00
Gabriela Cervantes
017cd3c53c ci: Call agent shutdown test only in the correspondent CI_JOB
The agent shutdown test should only run in the CRI_CONTAINERD_K8S_MINIMAL
CI job, which is the only one where tracing tests are enabled. However, this
test is being triggered in multiple CI jobs where it should not run. This PR
fixes that issue.

Fixes #2683

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2021-09-21 17:01:09 +00:00
Chelsea Mafrica
484af1a559 Merge pull request #2678 from nubificus/stable-2.2-fix_fc_vcpu_thread
stable-2.2 | virtcontainers: fc: parse vcpuID correctly
2021-09-20 09:46:07 -07:00
Chelsea Mafrica
a572a6ebf8 Merge pull request #2679 from c3d/backport/2527-adding-debugging-msgs
shimv2: Backport fixes for #2527
2021-09-20 09:42:53 -07:00
Snir Sheriber
2ca867da7b runtime: Add container field to logs
and unified field naming

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 0c7789fad6
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:04:09 +02:00
Snir Sheriber
f4da502c4f shimv2: add information to method comment
Add a comment explicitly mentioning that the method is a binary call.

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 72e3538e36
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:03:45 +02:00
Snir Sheriber
16164241df shimv2: add logging to shimv2 api calls
and also fetch and log container id from the request

Fixes: #2527
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 8dadca9cd1
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:02:35 +02:00
Christophe de Dinechin
25c7e1181a virtiofs: Create shared directory with 0700 mode, not 0750
A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no real good reason for a non-root user to access
the shared directory, and this is potentially dangerous.

Fixes: #2589

[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 10:54:18 +02:00
Anastassios Nanos
4c5bf0576b virtcontainers: fc: parse vcpuID correctly
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any Go string-to-int conversion
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".

This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.

Fixes: #2592

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-09-18 08:10:13 +00:00
Chelsea Mafrica
b3e620dbcf runtime: tracing: Fix logger passed in newContainer
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.

The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.

Backport of #2666
Fixes #2667

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 16:30:29 -07:00
Chelsea Mafrica
98c2ca13c1 runtime: tracing: Use root context to stop tracing
Call StopTracing with s.rootCtx, which is the root context for tracing,
instead of s.ctx, which is parent to a subset of trace spans.

Backport of #2662

Fixes #2663

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 11:19:40 -07:00
Fabiano Fidêncio
a97c9063db Merge pull request #2642 from jongwu/qemu_mak_2.2
backport-2.2: remove default config for arm64.
2021-09-16 07:21:32 +02:00
Jianyong Wu
0481c5070c backport-2.2: remove default config for arm64.
The current default config in QEMU for arm64 doesn't suit QEMU
version 5.1+, so remove it here.

Fixes: #2595
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-09-15 10:07:13 +08:00
Samuel Ortiz
64504061c8 Merge pull request #2619 from likebreath/0913/backport_clh_v18.0
stable-2.2 | versions: Upgrade to Cloud Hypervisor v18.0
2021-09-14 12:02:50 +02:00
Binbin Zhang
56920bc943 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
If a device such as /dev/null or /dev/urandom has no permission set,
it needs to be added to the cgroup.

Fixes: #2615
Backport: #2616

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-14 10:33:49 +08:00
Bo Chen
a1874ccd62 virtcontainers: clh: Revert the workaround incorrect default values
Given the fix to the bugs of the openapi spec file is included in the
Cloud Hypervisor v18.0 [1], this patch reverts the workaround we carried
in the CLH driver.

This reverts commit 932ee41b3f.

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3029

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f785ff0bf2)
2021-09-13 14:17:58 -07:00
Bo Chen
c2c650500b virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v18.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 0e0e59dc5f)
2021-09-13 14:17:58 -07:00
Bo Chen
7ee43f9468 versions: Upgrade to Cloud Hypervisor v18.0
Highlights from the Cloud Hypervisor release v18.0: 1) Experimental User
Device (vfio-user) support; 2) Migration support for vhost-user devices;
3) VHDX disk image support; 4) Device pass through on MSHV hypervisor;
5) AArch64 support for virtio-mem; 6) Live migration on MSHV hypervisor;
7) AArch64 CPU topology support; 8) Power button support on AArch64; 9)
Various bug fixes on PTY, TTY, signal handling, and live-migration on
AArch64.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v18.0

Fixes: #2543

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f0b5331430)
2021-09-13 14:17:58 -07:00
Samuel Ortiz
eedf139076 Merge pull request #2608 from Bevisy/main-2539-bp
[backport]sandbox: Add device permissions such as /dev/null to cgroup
2021-09-13 19:07:17 +02:00
Fabiano Fidêncio
54a6890c3c Merge pull request #2614 from sameo/stable-2.2
[backport] runtime: Fix README link
2021-09-13 17:45:07 +02:00
Samuel Ortiz
1792a9fe11 runtime: Fix README link
The LICENSE file lives in the project's root.

Fixes #2612

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-09-11 09:57:49 +02:00
Julio Montes
9bf95279be Merge pull request #2588 from devimc/2021-09-07/backport/fixSnap
[backport] snap: Test variable instead of executing "branch"
2021-09-10 14:44:55 -05:00
Binbin Zhang
807cc8a3a5 sandbox: Add device permissions such as /dev/null to cgroup
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec

Fixes: #2539
Backports: #2603

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-10 17:33:26 +08:00
David Gibson
5987f3b5e1 snap: Test variable instead of executing "branch"
In snapcraft.yaml we have a case statement on $(branch) - that is on the
output of executing a command "branch".  From the selections it appears
that what it actually wants is to simply select on the contents of the
$branch variable, which should be ${branch} instead.

fixes #2558

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-07 09:37:17 -05:00
Fabiano Fidêncio
caafd0f952 Merge pull request #2541 from fidencio/2.2.0-branch-bump
# Kata Containers 2.2.0
2021-09-01 00:33:25 +02:00
Fabiano Fidêncio
800126b272 release: Kata Containers 2.2.0
- runtime: drop qemu-lite support
- stable-2.2 | virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
- backport ci: Temporarily skip agent shutdown test on s390x
- backport: build_image: Fix error soft link about initrd.img

dca35c17 docs: remove mentioning of qemu-lite
0bdfdad2 runtime: drop qemu-lite support
60155756 runtime: fix default hypervisor path
ca9e6538 ci: Temporarily skip agent shutdown test on s390x
938b01ae virtcontainers: clh: Workaround incorrect default values
abd708e8 virtcontainers: clh: Fix the unit test
61babd45 virtcontainers: clh: Use constructors to ensure proper default value
59c51f62 virtcontainers: clh: Migrate to use the updated client APIs
c1f260cc virtcontainers: clh: Re-generate the client code
4cd6909f virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
efa2d54e build_image: Fix error soft link about initrd.img

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-08-31 18:44:03 +02:00
Archana Shinde
b1372b353f Merge pull request #2533 from bergwolf/qemu-lite
runtime: drop qemu-lite support
2021-08-31 07:39:24 -07:00
Peng Tao
dca35c1730 docs: remove mentioning of qemu-lite
vm-templating should just work with upstream qemu v4.1.0 or above.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:12 +08:00
Peng Tao
0bdfdad236 runtime: drop qemu-lite support
As the project is not maintained and we have not been testing against it
for a long time.

Fixes: #2529
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:06 +08:00
Peng Tao
60155756f3 runtime: fix default hypervisor path
Should not be qemu-lite.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:16:57 +08:00
Fabiano Fidêncio
669888c339 Merge pull request #2525 from likebreath/0827/backport_clh_generator
stable-2.2 | virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
2021-08-30 21:25:05 +02:00
GabyCT
cde008f441 Merge pull request #2531 from Jakob-Naucke/backport-s390x-skip-agent-shutdown-test
backport ci: Temporarily skip agent shutdown test on s390x
2021-08-30 09:25:50 -05:00
Peng Tao
7c866073f9 Merge pull request #2520 from Bevisy/stable-2.2-2503
backport: build_image: Fix error soft link about initrd.img
2021-08-30 20:16:55 +08:00
Jakob Naucke
ca9e6538e6 ci: Temporarily skip agent shutdown test on s390x
see https://github.com/kata-containers/tests/issues/3878 for tracking

Fixes: #2507
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-08-30 14:14:43 +02:00
Bo Chen
938b01aedc virtcontainers: clh: Workaround incorrect default values
Two default values defined in 'cloud-hypervisor.yaml' contain typos. This
patch manually overwrites them with the correct values as a workaround
until the corresponding fix lands in upstream Cloud Hypervisor.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 932ee41b3f)
2021-08-27 13:37:47 -07:00
Bo Chen
abd708e814 virtcontainers: clh: Fix the unit test
This patch fixes the unit tests over clh.go with the updated client code.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit bff38e4f4d)
2021-08-27 13:37:47 -07:00
Bo Chen
61babd45ed virtcontainers: clh: Use constructors to ensure proper default value
With the updated openapi-generator, the client code now handles optional
attributes correctly and makes sure the right default values are
assigned. This patch uses those constructors so that the proper default
values are used.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d967d3cb37)
2021-08-27 13:37:47 -07:00
Bo Chen
59c51f6201 virtcontainers: clh: Migrate to use the updated client APIs
The client code (and APIs) for Cloud Hypervisor has changed
dramatically due to the upgrade to `openapi-generator` v5.2.1. This
patch migrates the Cloud Hypervisor driver in the kata-runtime to use
those updated APIs.

The main change from the client code is that it now uses "pointer" type
to represent "optional" attributes from the input openapi specification
file.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit a6a2e525de)
2021-08-27 13:37:47 -07:00
Bo Chen
c1f260cc40 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor with the
updated `openapi-generator` v5.2.1.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 46eb07e14f)
2021-08-27 13:37:47 -07:00
Bo Chen
4cd6909f18 virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
To improve the quality and correctness of the auto-generated code, this
patch upgrades the `openapi-generator` to its latest stable release
v5.2.1.

Fixes: #2487

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 80fba4d637)
2021-08-27 13:37:47 -07:00
Binbin Zhang
efa2d54e85 build_image: Fix error soft link about initrd.img
fix error soft link about initrd.img

Fixes #2503

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-27 16:15:49 +08:00
555 changed files with 54553 additions and 5196 deletions

View File

@@ -1,6 +1,6 @@
name: kata deploy build
name: kata-deploy-build
on: [push, pull_request]
on: push
jobs:
build-asset:
@@ -24,7 +24,7 @@ jobs:
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r --preserve=all "${build_dir}" "kata-build"
@@ -47,21 +47,12 @@ jobs:
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: build
path: kata-artifacts
- name: merge-artifacts
run: |
make merge-builds
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
make-kata-tarball:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: make kata-tarball
run: |
make kata-tarball
sudo make install-tarball

View File

@@ -60,7 +60,7 @@ jobs:
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
- name: Building rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
@@ -84,7 +84,3 @@ jobs:
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make test
- name: Run Unit Tests As Root User
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && sudo -E PATH="$PATH" make test

View File

@@ -18,7 +18,6 @@ TOOLS += agent-ctl
STANDARD_TARGETS = build check clean install test vendor
include utils.mk
include ./tools/packaging/kata-deploy/local-build/Makefile
all: build
@@ -34,4 +33,10 @@ generate-protocols:
static-checks: build
bash ci/static-checks.sh
binary-tarball:
make -f ./tools/packaging/kata-deploy/local-build/Makefile
install-binary-tarball:
make -f ./tools/packaging/kata-deploy/local-build/Makefile install
.PHONY: all default static-checks binary-tarball install-binary-tarball

View File

@@ -1 +1 @@
2.3.0-alpha1
2.2.3

View File

@@ -4,6 +4,6 @@
#
# This is the build root image for Kata Containers on OpenShift CI.
#
FROM registry.centos.org/centos:8
FROM centos:8
RUN yum -y update && yum -y install git sudo wget

View File

@@ -1,4 +1,3 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -187,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2015 xeipuuv
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

View File

@@ -40,7 +40,6 @@ Documents that help to understand and contribute to Kata Containers.
* [Kata Containers Architecture](design/architecture.md): Architectural overview of Kata Containers
* [Kata Containers E2E Flow](design/end-to-end-flow.md): The entire end-to-end flow of Kata Containers
* [Kata Containers design](./design/README.md): More Kata Containers design documents
* [Kata Containers threat model](./threat-model/threat-model.md): Kata Containers threat model
### How to Contribute

View File

@@ -14,7 +14,7 @@ through the [CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and
Kata Containers creates a QEMU\*/KVM virtual machine for pod that `kubelet` (Kubernetes) creates respectively.
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/cmd/containerd-shim-kata-v2/)
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/containerd-shim-v2)
is the Kata Containers entrypoint, which
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
@@ -259,7 +259,7 @@ With `RuntimeClass`, users can define Kata Containers as a `RuntimeClass` and th
## DAX
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.rst?h=v5.14)
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
feature to efficiently map some host-side files into the guest VM space.
In particular, Kata Containers uses the QEMU NVDIMM feature to provide a
memory-mapped virtual device that can be used to DAX map the virtual machine's

View File

@@ -12,244 +12,187 @@ The OCI [runtime specification][linux-config] provides guidance on where the con
> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container
Cgroups are hierarchical, and this can be seen with the following pod example:
cgroups are hierarchical, and this can be seen with the following pod example:
- Pod 1: `cgroupsPath=/kubepods/pod1`
- Container 1: `cgroupsPath=/kubepods/pod1/container1`
- Container 2: `cgroupsPath=/kubepods/pod1/container2`
- Container 1:
`cgroupsPath=/kubepods/pod1/container1`
- Container 2:
`cgroupsPath=/kubepods/pod1/container2`
- Pod 2: `cgroupsPath=/kubepods/pod2`
- Container 1: `cgroupsPath=/kubepods/pod2/container1`
- Container 2: `cgroupsPath=/kubepods/pod2/container2`
- Container 1:
`cgroupsPath=/kubepods/pod2/container1`
- Container 2:
`cgroupsPath=/kubepods/pod2/container2`
Depending on the upper-level orchestration layers, the cgroup under which the pod is placed is
managed by the orchestrator or not. In the case of Kubernetes, the pod cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime.
Kubelet will size the pod cgroup based on the container resource requirements, to which it may add
a configured set of [pod resource overheads](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/).
Depending on the upper-level orchestrator, the cgroup under which the pod is placed is
managed by the orchestrator. In the case of Kubernetes, the pod-cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime. Kubelet will size the pod-cgroup
based on the container resource requirements.
Kata Containers introduces a non-negligible resource overhead for running a sandbox (pod). Typically, the Kata shim,
through its underlying VMM invocation, will create many additional threads compared to process-based container runtimes.
The para-virtualized I/O back-ends, the VMM instance and even the Kata shim process are all host processes that consume
memory and CPU time not directly tied to the container workload, and they introduce a sandbox resource overhead.
In order for a Kata workload to run without significant performance degradation, its sandbox overhead must be
provisioned accordingly. Two scenarios are possible:
Kata Containers introduces a non-negligible overhead for running a sandbox (pod). Based on this, two scenarios are possible:
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod-cgroup, or
2) Kata Containers does not fully constrain the VMM and associated processes, instead placing a subset of them outside of the pod-cgroup.
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod cgroup.
For example, the Kubernetes [`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
feature lets the orchestrator add a configured sandbox overhead to the sum of all its containers' resources. In
that case, the pod sandbox is properly sized and all Kata-created processes will run under the constraints and
limits defined for the pod cgroup.
2) The upper-layer orchestrator does **not** take the sandbox overhead into account and the pod cgroup is not
sized to properly run all Kata-created processes. In that scenario, attaching all the Kata processes to the sandbox
cgroup may lead to non-negligible workload performance degradation. As a consequence, Kata Containers will move
all processes but the vCPU threads into a dedicated overhead cgroup under `/kata_overhead`. The Kata runtime will
not apply any constraints or limits to that cgroup; it is up to the infrastructure owner to optionally set it up.
Kata Containers provides two options for how cgroups are handled on the host. Selection of these options is done through
the `SandboxCgroupOnly` flag within the Kata Containers [configuration](../../src/runtime/README.md#configuration)
file.
Those 2 scenarios are not dynamically detected by the Kata Containers runtime implementation, and thus the
infrastructure owner must configure the runtime according to how the upper-layer orchestrator creates and sizes the
pod cgroup. That configuration selection is done through the `sandbox_cgroup_only` flag within the Kata Containers
[configuration](../../src/runtime/README.md#configuration) file.
## `SandboxCgroupOnly` enabled
## `sandbox_cgroup_only = true`
With `SandboxCgroupOnly` enabled, it is expected that the parent cgroup is sized to take the overhead of running
a sandbox into account. This is ideal, as all the applicable Kata Containers components can be placed within the
given cgroup-path.
Setting `sandbox_cgroup_only` to `true` from the Kata Containers configuration file means that the pod cgroup is
properly sized and takes the pod overhead into account. This is ideal, as all the applicable Kata Containers processes
can simply be placed within the given cgroup path.
In the context of Kubernetes, Kubelet can size the pod cgroup to take the overhead of running a Kata-based sandbox
into account. This has been supported since the 1.16 Kubernetes release, through the
[`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) feature.
In the context of Kubernetes, Kubelet will size the pod-cgroup to take the overhead of running a Kata-based sandbox
into account. This will be feasible in the 1.16 Kubernetes release through the `PodOverhead` feature.
```
┌─────────────────────────────────────────┐
│                                         │
│  ┌──────────────────────────────────┐   │
│  │                                  │   │
│  │ ┌─────────────────────────────┐  │   │
│  │ │                             │  │   │
│  │ │ ┌─────────────────────┐     │  │   │
│  │ │ │ vCPU threads        │     │  │   │
│  │ │ │ I/O threads         │     │  │   │
│  │ │ │ VMM                 │     │  │   │
│  │ │ │ Kata Shim           │     │  │   │
│  │ │ │                     │     │  │   │
│  │ │ │ /kata_<sandbox_id>  │     │  │   │
│  │ │ └─────────────────────┘     │  │   │
│  │ │Pod 1                        │  │   │
│  │ └─────────────────────────────┘  │   │
│  │                                  │   │
│  │ ┌─────────────────────────────┐  │   │
│  │ │                             │  │   │
│  │ │ ┌─────────────────────┐     │  │   │
│  │ │ │ vCPU threads        │     │  │   │
│  │ │ │ I/O threads         │     │  │   │
│  │ │ │ VMM                 │     │  │   │
│  │ │ │ Kata Shim           │     │  │   │
│  │ │ │                     │     │  │   │
│  │ │ │ /kata_<sandbox_id>  │     │  │   │
│  │ │ └─────────────────────┘     │  │   │
│  │ │Pod 2                        │  │   │
│  │ └─────────────────────────────┘  │   │
│  │                                  │   │
│  │/kubepods                         │   │
│  └──────────────────────────────────┘   │
│                                         │
│ Node                                    │
└─────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation details
### What does Kata do in this configuration?
1. Given a `PodSandbox` container creation, let:
When `sandbox_cgroup_only` is enabled, the Kata shim will create a per pod
sub-cgroup under the pod's dedicated cgroup. For example, in the Kubernetes context,
it will create a `/kata_<PodSandboxID>` under the `/kubepods` cgroup hierarchy.
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, the memory cgroup
subsystem for a pod with sandbox ID `12345678` would live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678`.
```
podCgroup=Parent(container.CgroupsPath)
KataSandboxCgroup=<podCgroup>/kata_<PodSandboxID>
```
In most cases, the `/kata_<PodSandboxID>` created cgroup is unrestricted and inherits and shares all
constraints and limits from the parent cgroup (`/kubepods` in the Kubernetes case). The exception is
for the `cpuset` and `devices` cgroup subsystems, which are managed by the Kata shim.
2. Create the cgroup, `KataSandboxCgroup`
After creating the `/kata_<PodSandboxID>` cgroup, the Kata Containers shim will move itself to it, **before** starting
the virtual machine. As a consequence all processes subsequently created by the Kata Containers shim (the VMM itself, and
all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>` cgroup.
3. Join the `KataSandboxCgroup`
### Why create a kata-cgroup under the parent cgroup?
Any process created by the runtime will be created in `KataSandboxCgroup`.
The runtime will limit the cgroup in the host only if the sandbox doesn't have a
container type annotation, but the caller is free to set the proper limits for the `podCgroup`.
Why not add the per-sandbox shim directly to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?
In the example above the pod cgroups are `/kubepods/pod1` and `/kubepods/pod2`.
Kata creates the unrestricted sandbox cgroup under the pod cgroup.
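The `podCgroup` / `KataSandboxCgroup` derivation above can be sketched in a few lines of Python. This is only an illustration of the path arithmetic, not the runtime's actual code; the function name `kata_sandbox_cgroup` and its parameters are hypothetical.

```python
import posixpath


def kata_sandbox_cgroup(container_cgroups_path: str, sandbox_id: str) -> str:
    """Return <podCgroup>/kata_<PodSandboxID> for a PodSandbox container.

    Hypothetical helper: it mirrors the two assignments from the text,
    podCgroup = Parent(container.CgroupsPath) and
    KataSandboxCgroup = <podCgroup>/kata_<PodSandboxID>.
    """
    # Parent() of the container's cgroupsPath is the pod cgroup.
    pod_cgroup = posixpath.dirname(container_cgroups_path.rstrip("/"))
    # The sandbox cgroup is a "kata_"-prefixed child of the pod cgroup.
    return posixpath.join(pod_cgroup, f"kata_{sandbox_id}")


if __name__ == "__main__":
    print(kata_sandbox_cgroup("/kubepods/pod1/container1", "12345678"))
```

With the pod example used earlier, a container path of `/kubepods/pod1/container1` yields a sandbox cgroup located under `/kubepods/pod1`.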
The Kata Containers shim implementation creates a per-sandbox cgroup
(`/kata_<PodSandboxID>`) to support the `Docker` use case. Although `Docker` does not
have a notion of pods, Kata Containers still creates a sandbox to support the pod-less,
single-container use case that `Docker` implements. Since `Docker` does not create any
cgroup hierarchy to place a container into, it would be very complex for Kata to map
a particular container to its sandbox without placing it under a `/kata_<containerID>`
sub-cgroup first.
### Why create a Kata-cgroup under the parent cgroup?
### Advantages
`Docker` does not have a notion of pods, and will not create a cgroup directory
to place a particular container in (i.e., all containers would be in a path like
`/docker/container-id`). To simplify the implementation and continue to support `Docker`,
Kata Containers creates the sandbox cgroup, in the case of Kubernetes, or a container cgroup, in the case
of `Docker`.
Keeping all Kata Containers processes under a properly sized pod cgroup is ideal
and makes for a simpler Kata Containers implementation. It also helps with gathering
accurate statistics and preventing Kata workloads from being noisy neighbors.
### Improvements
#### Pod resources statistics
- Get statistics about pod resources
If the Kata caller wants to know the resource usage on the host, it can get
statistics from the pod cgroup. All cgroup stats in the hierarchy will include
the Kata overhead. This makes it possible to gather usage statistics at the
pod level and the container level.
#### Better host resource isolation
- Better host resource isolation
Because the Kata runtime will place all the Kata processes in the pod cgroup,
the resource limits that the caller applies to the pod cgroup will affect all
processes that belong to the Kata sandbox on the host. This improves
isolation on the host and prevents Kata from becoming a noisy neighbor.
## `sandbox_cgroup_only = false` (Default setting)
If the cgroup provided to Kata is not sized appropriately, Kata components will
consume resources that the actual container workloads expect to see and use.
This can cause instability and performance degradations.
To avoid that situation, Kata Containers creates an unconstrained overhead
cgroup and moves all non-workload-related processes (anything but the virtual CPU
threads) to it. The name of this overhead cgroup is `/kata_overhead`, and a
per-sandbox sub-cgroup is created under it for each sandbox Kata Containers creates.
Kata Containers does not add any constraints or limitations on the overhead cgroup. It is up to the infrastructure
owner to either:
- Provision nodes with a pre-sized `/kata_overhead` cgroup. Kata Containers will
load that existing cgroup and move all non-workload-related processes to it.
- Let Kata Containers create the `/kata_overhead` cgroup, and then either leave it
unconstrained or resize it afterwards.
## `SandboxCgroupOnly` disabled (default, legacy)
If the cgroup provided to Kata is not sized appropriately, instability will be
introduced when fully constraining Kata components, and the user-workload will
see a subset of resources that were requested. Based on this, the default
handling for Kata Containers is to not fully constrain the VMM and Kata
components on the host.
```
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  ┌─────────────────────────────┐    ┌───────────────────────────┐  │
│  │                             │    │                           │  │
│  │   ┌─────────────────────────┼────┼─────────────────────────┐ │  │
│  │   │                         │    │                         │ │  │
│  │   │ ┌─────────────────────┐ │    │ ┌─────────────────────┐ │ │  │
│  │   │ │ vCPU threads        │ │    │ │ VMM                 │ │ │  │
│  │   │ │                     │ │    │ │ I/O threads         │ │ │  │
│  │   │ │                     │ │    │ │ Kata Shim           │ │ │  │
│  │   │ │                     │ │    │ │                     │ │ │  │
│  │   │ │ /kata_<sandbox_id>  │ │    │ │ /<sandbox_id>       │ │ │  │
│  │   │ └─────────────────────┘ │    │ └─────────────────────┘ │ │  │
│  │   │                         │    │                         │ │  │
│  │   │ Pod 1                   │    │                         │ │  │
│  │   └─────────────────────────┼────┼─────────────────────────┘ │  │
│  │                             │    │                           │  │
│  │   ┌─────────────────────────┼────┼─────────────────────────┐ │  │
│  │   │                         │    │                         │ │  │
│  │   │ ┌─────────────────────┐ │    │ ┌─────────────────────┐ │ │  │
│  │   │ │ vCPU threads        │ │    │ │ VMM                 │ │ │  │
│  │   │ │                     │ │    │ │ I/O threads         │ │ │  │
│  │   │ │                     │ │    │ │ Kata Shim           │ │ │  │
│  │   │ │                     │ │    │ │                     │ │ │  │
│  │   │ │ /kata_<sandbox_id>  │ │    │ │ /<sandbox_id>       │ │ │  │
│  │   │ └─────────────────────┘ │    │ └─────────────────────┘ │ │  │
│  │   │                         │    │                         │ │  │
│  │   │ Pod 2                   │    │                         │ │  │
│  │   └─────────────────────────┼────┼─────────────────────────┘ │  │
│  │                             │    │                           │  │
│  │ /kubepods                   │    │ /kata_overhead            │  │
│  └─────────────────────────────┘    └───────────────────────────┘  │
│                                                                    │
│ Node                                                               │
└────────────────────────────────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| +---------------------------------------------------+ |
| | Hypervisor | |
| |Kata | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation Details
### What does this method do?
When `sandbox_cgroup_only` is disabled, the Kata Containers shim will create a per-pod
sub-cgroup under the pod's dedicated cgroup, and another one under the overhead cgroup.
For example, in the Kubernetes context, it will create a `/kata_<PodSandboxID>` under
the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overhead` one.
1. Given a container creation let `containerCgroupHost=container.CgroupsPath`
1. Rename `containerCgroupHost` path to add `kata_`
1. Let `PodCgroupPath=PodSanboxContainerCgroup` where `PodSanboxContainerCgroup` is the cgroup of a container of type `PodSandbox`
1. Limit the `PodCgroupPath` with the sum of all the container limits in the Sandbox
1. Move only vCPU threads of hypervisor to `PodCgroupPath`
1. Per each container, move its `kata-shim` to its own `containerCgroupHost`
1. Move hypervisor and applicable threads to memory cgroup `/kata`
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod whose sandbox
ID is `12345678`, created with `sandbox_cgroup_only` disabled, the two memory subsystems
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.
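As a minimal sketch of the path layout described above (not the runtime's actual code), the two locations can be derived from the sandbox ID. The function name `kata_cgroup_paths` and its parameters are hypothetical, and the cgroup v1 mount point is assumed to be `/sys/fs/cgroup`, as in the example:

```python
from pathlib import PurePosixPath

# Assumed cgroup v1 mount point, matching the example in the text.
CGROUP_V1_ROOT = PurePosixPath("/sys/fs/cgroup")


def kata_cgroup_paths(sandbox_id, subsystem="memory",
                      pod_parent="kubepods", overhead_parent="kata_overhead"):
    """Return (sandbox_cgroup, overhead_cgroup) paths for one v1 subsystem.

    Hypothetical helper: it only builds strings and does not touch the host.
    """
    base = CGROUP_V1_ROOT / subsystem
    sandbox = base / pod_parent / f"kata_{sandbox_id}"  # holds the vCPU threads
    overhead = base / overhead_parent / sandbox_id      # holds VMM, shim, I/O threads
    return str(sandbox), str(overhead)


sandbox, overhead = kata_cgroup_paths("12345678")
print(sandbox)
print(overhead)
```

The sandbox path carries the `kata_` prefix while the overhead path uses the bare sandbox ID, mirroring the two paths quoted in the paragraph.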
_Note_: the Kata Containers runtime will not add all the hypervisor threads to
the cgroup path requested, only vCPUs. These threads are run unconstrained.
Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.
This mitigates the risk of the VMM and other threads hitting an out of memory (`OOM`) scenario.
With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate the actual container workload processes. For Kata, this maps
to the virtual CPU threads created by the VMM, so they are the only ones running under the pod
cgroup. This mitigates the risk of the VMM, the Kata shim and the I/O threads going through
a catastrophic out of memory scenario (`OOM`).
#### Pros and Cons
#### Impact
Running all non vCPU threads under an unconstrained overhead cgroup could lead to workloads
potentially consuming a large amount of host resources.
On the other hand, running all non vCPU threads under a dedicated overhead cgroup can provide
accurate metrics on the actual Kata Container pod overhead, allowing for tuning the overhead
cgroup size and constraints accordingly.
If resources are reserved at a system level to account for the overheads of
running sandbox containers, this configuration can be utilized with adequate
stability. In this scenario, non-negligible amounts of CPU and memory will be
utilized unaccounted for on the host.
[linux-config]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#cgroups-path
# Supported cgroups
Kata Containers currently only supports cgroups `v1`.
In the following sections each cgroup is described briefly.
Kata Containers supports cgroups `v1` and `v2`. In the following sections each cgroup is
described briefly and what changes are needed in Kata Containers to support it.
## Cgroups V1
@@ -301,7 +244,7 @@ diagram:
A process can join a cgroup by writing its process id (`pid`) to `cgroup.procs` file,
or join a cgroup partially by writing the task (thread) id (`tid`) to the `tasks` file.
Kata Containers only supports `v1`.
Kata Containers supports `v1` by default and no change in the configuration file is needed.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].
## Cgroups V2
@@ -354,13 +297,22 @@ Same as `cgroups v1`, a process can join the cgroup by writing its process id (`
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to
`cgroup.threads` file.
Kata Containers does not support cgroups `v2` on the host.
For backwards compatibility, Kata Containers supports cgroups v1 by default.
To change this to `v2`, set `sandbox_cgroup_only=true` in the `configuration.toml` file.
To know more about `cgroups v2`, see [cgroupsv2(7)][3].
### Distro Support
Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.
For more information about the status of this feature see [issue #2494][4].
# Summary
| cgroup option | default? | status | pros | cons | cgroups
|-|-|-|-|-|-|
| `SandboxCgroupOnly=false` | yes | legacy | Easiest to make Kata work | Unaccounted for memory and resource utilization | v1
| `SandboxCgroupOnly=true` | no | recommended | Complete tracking of Kata memory and CPU utilization. In Kubernetes, the Kubelet can fully constrain Kata via the pod cgroup | Requires upper layer orchestrator which sizes sandbox cgroup appropriately | v1, v2
[1]: http://man7.org/linux/man-pages/man5/tmpfs.5.html
[2]: http://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_1

View File

@@ -17,9 +17,10 @@
- `firecracker`
- `ACRN`
While `qemu`, `cloud-hypervisor` and `firecracker` work out of the box with installation of Kata,
some additional configuration is needed in case of `ACRN`.
While `qemu` and `cloud-hypervisor` work out of the box with installation of Kata,
some additional configuration is needed in case of `firecracker` and `ACRN`.
Refer to the following guides for additional configuration steps:
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)
## Advanced Topics
@@ -34,5 +35,3 @@
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)
- [How to monitor Kata Containers in K8s](how-to-set-prometheus-in-k8s.md)
- [How to use hotplug memory on arm64 in Kata Containers](how-to-hotplug-memory-arm64.md)
- [How to setup swap devices in guest kernel](how-to-setup-swap-devices-in-guest-kernel.md)
- [How to run rootless vmm](how-to-run-rootless-vmm.md)

View File

@@ -39,7 +39,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`

View File

@@ -1,33 +0,0 @@
## Introduction
To improve security, Kata Containers supports running the VMM process (currently only QEMU) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
The permissions and ownership of the `kvm` device node (`/dev/kvm`) need to be configured as follows:
```
crw-rw---- 1 root kvm
```
Use the following commands:
```
$ sudo groupadd kvm -r
$ sudo chown root:kvm /dev/kvm
$ sudo chmod 660 /dev/kvm
```
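To check the resulting state before enabling rootless mode, a small helper can compare a device node's mode and ownership against the expected `crw-rw---- root kvm`. This is a sketch only: `kvm_node_ok` is a hypothetical name, and the example feeds it a synthetic mode value rather than reading the real `/dev/kvm`.

```python
import stat


def kvm_node_ok(st_mode, owner, group):
    """True if the node is a character device with mode 660 owned by root:kvm."""
    return (
        stat.S_ISCHR(st_mode)
        and stat.S_IMODE(st_mode) == 0o660
        and owner == "root"
        and group == "kvm"
    )


# Synthetic example value; on a real host, pass the fields of
# os.stat("/dev/kvm") together with pwd/grp name lookups instead.
mode = stat.S_IFCHR | 0o660
print(stat.filemode(mode), kvm_node_ok(mode, "root", "kvm"))
```

A node left at a more permissive mode such as 666, or owned by a different group, makes the check return `False`.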
## Configure rootless VMM
By default, the VMM process still runs as the root user. There are two ways to enable rootless VMM:
1. Set the `rootless` flag to `true` in the hypervisor section of `configuration.toml`.
2. Set the Kubernetes annotation `io.katacontainers.hypervisor.rootless` to `true`.
## Implementation details
When the `rootless` flag is enabled, upon a request to create a Pod, the Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs.sock`, `qmp.sock`) to a directory (under `/run/user/[uid]/`) to which only the non-root hypervisor user has access.
## Limitations
1. Only the VMM process runs as a non-root user. Other processes, such as the Kata Containers shimv2 and `virtiofsd`, still run as the root user.
2. Currently, this feature is only supported in QEMU. It still needs to be brought to Firecracker and Cloud Hypervisor (see https://github.com/kata-containers/kata-containers/issues/2567).
3. Certain features will not work when rootless VMM is enabled, including:
   1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access them (leading to a permission denied error). More permissive access (e.g. mode 666) may overcome this issue, but be aware of the security implications of loosening permissions on such devices.
   2. `vfio` devices will also not work because of a permission denied error.

View File

@@ -91,13 +91,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |
## Container Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.container.resource.swappiness` | `uint64` | specify the `Resources.Memory.Swappiness` |
| `io.katacontainers.container.resource.swap_in_bytes` | `uint64` | specify the `Resources.Memory.Swap` |
# CRI-O Configuration
@@ -107,12 +100,11 @@ In case of CRI-O, all annotations specified in the pod spec are passed down to K
For containerd, annotations specified in the pod spec are passed down to Kata
starting with version `1.3.0` of containerd. Additionally, extra configuration is
needed for containerd, by providing `pod_annotations` field and
`container_annotations` field in the containerd config
file. The `pod_annotations` field and `container_annotations` field are two lists of
annotations that can be passed down to Kata as OCI annotations. They support golang match
patterns. Since annotations supported by Kata follow the pattern `io.katacontainers.*`,
the following configuration would work for passing annotations to Kata from containerd:
needed for containerd, by providing a `pod_annotations` field in the containerd config
file. The `pod_annotations` field is a list of annotations that can be passed down to
Kata as OCI annotations. It supports golang match patterns. Since annotations supported
by Kata follow the pattern `io.katacontainers.*`, the following configuration would work
for passing annotations to Kata from containerd:
```
$ cat /etc/containerd/config
@@ -121,7 +113,6 @@ $ cat /etc/containerd/config
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
pod_annotations = ["io.katacontainers.*"]
container_annotations = ["io.katacontainers.*"]
....
```

View File

@@ -1,59 +0,0 @@
# Setup swap device in guest kernel
## Introduction
Setting up a swap device in the guest kernel can help increase memory capacity, handle some memory issues, and sometimes increase file access speed.
Kata Containers can insert a raw file into the guest as the swap device.
## Requisites
The swap configuration of the containers should be set by [annotations](how-to-set-sandbox-config-kata.md#container-options), so [extra configuration is needed for containerd](how-to-set-sandbox-config-kata.md#containerd-configuration).
Kata Containers only supports setting up a swap device in the guest kernel with QEMU.
Install and set up Kata Containers as shown [here](../install/README.md).
Enable the guest swap device as follows:
```
$ sudo sed -i -e 's/^#enable_guest_swap.*$/enable_guest_swap = true/g' /etc/kata-containers/configuration.toml
```
## Run a Kata Container utilizing swap device
Use the following commands to start a Kata Container with swappiness 60 and a 1GB swap device (`swap_in_bytes` - `memory_limit_in_bytes`).
```
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
  name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
  name: busybox-test-swap
annotations:
  io.katacontainers.container.resource.swappiness: "60"
  io.katacontainers.container.resource.swap_in_bytes: "2147483648"
linux:
  resources:
    memory_limit_in_bytes: 1073741824
image:
  image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
```
Kata Containers sets up a swap device for this container only when `io.katacontainers.container.resource.swappiness` is set.
The following table shows how the swap size is decided when `io.katacontainers.container.resource.swappiness` is set.
|`io.katacontainers.container.resource.swap_in_bytes`|`memory_limit_in_bytes`|swap size|
|---|---|---|
|set|set| `io.katacontainers.container.resource.swap_in_bytes` - `memory_limit_in_bytes`|
|not set|set| `memory_limit_in_bytes`|
|not set|not set| `io.katacontainers.config.hypervisor.default_memory`|
|set|not set|cgroup doesn't support this usage|
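The decision table above can be read as a small function. The following is a sketch under stated assumptions, not the runtime's implementation: the name `guest_swap_size` and its parameters are hypothetical, and the hypervisor's `default_memory` is assumed to have been converted to bytes by the caller.

```python
def guest_swap_size(swappiness_set, swap_in_bytes=None,
                    memory_limit_in_bytes=None, default_memory_bytes=None):
    """Return the swap device size in bytes, or None if no swap device is set up.

    Mirrors the table row by row; `None` means "not set".
    """
    if not swappiness_set:
        return None  # swap is only set up when swappiness is set
    if swap_in_bytes is not None and memory_limit_in_bytes is not None:
        return swap_in_bytes - memory_limit_in_bytes
    if swap_in_bytes is None and memory_limit_in_bytes is not None:
        return memory_limit_in_bytes
    if swap_in_bytes is None and memory_limit_in_bytes is None:
        return default_memory_bytes
    # swap_in_bytes set without a memory limit: the table marks this as unsupported
    raise ValueError("swap_in_bytes requires memory_limit_in_bytes")


# Values from the example container above: 2 GiB swap_in_bytes, 1 GiB memory limit.
print(guest_swap_size(True, 2147483648, 1073741824))
```

With the annotation values from the example container, the function yields 1073741824 bytes, which is the 1GB swap device mentioned earlier.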

View File

@@ -3,7 +3,7 @@
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
The Kubernetes cluster will use the
[CRI containerd plugin](https://github.com/containerd/containerd/tree/main/pkg/cri) and
[CRI containerd plugin](https://github.com/containerd/cri) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
## Requirements

View File

@@ -22,7 +22,7 @@ This document requires the presence of the ACRN hypervisor and Kata Containers o
- ACRN supported [Hardware](https://projectacrn.github.io/latest/hardware.html#supported-hardware).
> **Note:** Please make sure to have a minimum of 4 logical processors (HT) or cores.
- ACRN [software](https://projectacrn.github.io/latest/tutorials/kbl-nuc-sdc.html#use-the-script-to-set-up-acrn-automatically) setup.
- ACRN [software](https://projectacrn.github.io/latest/tutorials/run_kata_containers.html) setup.
- For networking, ACRN supports either MACVTAP or TAP. If MACVTAP is not enabled in the Service OS, please follow the below steps to update the kernel:
```sh

File diff suppressed because one or more lines are too long


View File

@@ -1,137 +0,0 @@
# Kata Containers threat model
This document discusses threat models associated with the Kata Containers project.
Kata was designed to provide additional isolation of container workloads, protecting
the host infrastructure from potentially malicious container users or workloads. Since
Kata Containers adds a level of isolation on top of traditional containers, the focus
is on the additional layer provided, not on traditional container security.
This document provides a brief background on containers and layered security, describes
the interface to Kata from CRI runtimes, a review of utilized virtual machine interfaces, and then
a review of threats.
## Kata security objective
Kata seeks to prevent an untrusted container workload or user of that container workload from gaining
control of, obtaining information from, or tampering with the host infrastructure.
In our scenario, an asset is anything on the host system, or elsewhere in the cluster
infrastructure. The attacker is assumed to be either a malicious user or the workload itself
running within the container. The goal of Kata is to prevent attacks which would allow
any access to the defined assets.
## Background on containers, layered security
Traditional containers leverage several key Linux kernel features to provide isolation and
a view that the container workload is the only entity running on the host. Key features include
`Namespaces`, `cgroups`, `capabilities`, `SELinux` and `seccomp`. The canonical runtime for creating such
a container is `runc`. In the remainder of the document, the term `traditional-container` will be used
to describe a container workload created by `runc`.
Kata Containers provides a second layer of isolation on top of those provided by traditional-containers.
The hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight
virtual machine, and uses the guest's Linux kernel to create a container workload, or workloads in the case
of multi-container pods. In Kubernetes and in the Kata implementation, the sandbox is carried out at the
pod level. In Kata, this sandbox is created using a virtual machine.
## Interface to Kata Containers: CRI, v2-shim, OCI
A typical Kata Containers deployment uses Kubernetes with a CRI implementation.
On every node, Kubelet will interact with a CRI implementor, which will in turn interface with
an OCI based runtime, such as Kata Containers. Typical CRI implementors are `cri-o` and `containerd`.
The CRI API, as defined at the Kubernetes [CRI-API repo](https://github.com/kubernetes/cri-api/),
results in a few constructs being supported by the CRI implementation, and ultimately in the OCI
runtime creating the workloads.
In order to run a container inside of the Kata sandbox, several virtual machine devices and interfaces
are required. Kata translates sandbox and container definitions to underlying virtualization technologies provided
by a set of virtual machine monitors (VMMs) and hypervisors. These devices and their underlying
implementations are discussed in detail in the following section.
## Interface to the Kata sandbox/virtual machine
In case of Kata, today the devices which we need in the guest are:
- Storage: In the current design of Kata Containers, we are reliant on the CRI implementor to
assist in image handling and volume management on the host. As a result, we need to support a way of passing to the sandbox the container rootfs, volumes requested
by the workload, and any other volumes created to facilitate sharing of secrets and `configmaps` with the containers. Depending on how these are managed, a block based device or file-system
sharing is required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
- Networking: A method for enabling network connectivity with the workload is required. Typically this will be done providing a `TAP` device
to the VMM, and this will be exposed to the guest as a `virtio-net` device. It is feasible to pass in a NIC device directly, in which case `VFIO` is leveraged
and the device itself will be exposed to the guest.
- Control: In order to interact with the guest agent and retrieve `STDIO` from containers, a medium of communication is required.
This is available via `virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM resource management (for example: CPU, memory, device hotplug). This is required when containers are resized,
or more generally when containers are added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify the default settings provided when integrating Kata
with the QEMU, Firecracker and Cloud Hypervisor VMMs in the following sections.
### Devices
Each virtio device is implemented by a backend, which may execute within userspace on the host (vhost-user), the VMM itself, or within the host kernel (vhost). While they may provide enhanced performance,
vhost devices are often seen as higher risk since an exploit would already be running within kernel space. While the VMM and vhost-user are both in userspace on the host, `vhost-user` generally allows the back-end process to require fewer system calls and capabilities compared to a full VMM.
#### `virtio-blk` and `virtio-scsi`
The backends for `virtio-blk` and `virtio-scsi` are based in the VMM itself (ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and QEMU.
While `vhost` based back-ends are available for QEMU, they are not recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, but they are not utilized in Kata today.
#### `virtio-fs`
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction with the host filesystem is done through a vhost-user daemon, `virtiofsd`.
The `virtio-fs` client, running in the guest, will generate requests to access files. `virtiofsd` will receive requests, open the file, and request the VMM
to `mmap` it into the guest. When DAX is utilized, the guest will access the host's page cache, avoiding the need for copy and duplication. DAX is still an experimental feature,
and is not enabled by default.
From the `virtiofsd` [documentation](https://qemu-project.gitlab.io/qemu/tools/virtiofsd.html):
```This program must be run as the root user. Upon startup the program will switch into a new file system namespace with the shared directory tree as its root. This prevents “file system escapes” due to symlinks and other file system objects that might lead to files outside the shared directory. The program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other vectors that could allow an attacker to compromise the system after gaining control of the virtiofsd process.```
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU VMM supports virtio-fs as of v4.2. Cloud Hypervisor
supports `virtio-fs`.
#### `virtio-net`
`virtio-net` has many options, depending on the VMM and Kata configurations.
##### QEMU networking
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the `virtio-net` backend
for Kata defaults to `vhost-net` for performance reasons. The default configuration is being
reevaluated.
##### Firecracker networking
For Firecracker, the `virtio-net` backend is within Firecracker's VMM.
##### Cloud Hypervisor networking
For Cloud Hypervisor, the current backend default is within the VMM. `vhost-user-net` support
is being added (written in Rust, Cloud Hypervisor specific).
#### virtio-vsock
##### QEMU vsock
In QEMU, vsock is backed by `vhost_vsock`, which runs within the kernel itself.
##### Firecracker and Cloud Hypervisor
In Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in the host's userspace.
#### VFIO
Utilizing VFIO, devices can be passed through to the virtual machine. We will assess this separately. Exposure to
host is limited to gaps in device pass-through handling. This is supported in QEMU and Cloud Hypervisor, but not
Firecracker.
#### ACPI
ACPI is necessary for hotplug of CPU, memory and devices. ACPI is available in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug
are not available in Firecracker.
## Devices and threat model
![Threat model](threat-model-boundaries.svg "threat-model")

View File

@@ -67,7 +67,7 @@ To use large BARs devices (for example, Nvidia Tesla P100), you need Kata versio
The following configuration in the Kata `configuration.toml` file as shown below can work:
Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
Hotplug for PCI devices by `shpchp` (Linux's SHPC PCI Hotplug driver):
```
machine_type = "q35"
@@ -91,6 +91,7 @@ The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_HOTPLUG_PCI_SHPC=y
# Support for loading modules (Required for load Nvidia drivers)
CONFIG_MODULES=y

View File

@@ -305,7 +305,7 @@ parts:
;;
*)
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* configs/devices/
cp -a ${kata_dir}/tools/packaging/qemu/default-configs/* default-configs/devices/
;;
esac

1
src/agent/Cargo.lock generated
View File

@@ -545,6 +545,7 @@ dependencies = [
"scan_fmt",
"scopeguard",
"serde_json",
"serial_test",
"slog",
"slog-scope",
"slog-stdlog",

View File

@@ -20,6 +20,7 @@ scan_fmt = "0.2.3"
scopeguard = "1.0.0"
thiserror = "1.0.26"
regex = "1"
serial_test = "0.5.1"
# Async helpers
async-trait = "0.1.42"

202
src/agent/LICENSE Normal file
View File

@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -127,7 +127,7 @@ vendor:
#TARGET test: run cargo tests
test:
@cargo test --all --target $(TRIPLE) -- --nocapture
@cargo test --all --target $(TRIPLE)
##TARGET check: run test
check: clippy format

@@ -46,6 +46,7 @@ message Route {
string device = 3;
string source = 4;
uint32 scope = 5;
IPFamily family = 6;
}
message ARPNeighbor {

@@ -833,6 +833,20 @@ impl BaseContainer for LinuxContainer {
}
let linux = spec.linux.as_ref().unwrap();
if p.oci.capabilities.is_none() {
// No capabilities, inherit from container process
let process = spec
.process
.as_ref()
.ok_or_else(|| anyhow!("no process config"))?;
p.oci.capabilities = Some(
process
.capabilities
.clone()
.ok_or_else(|| anyhow!("missing process capabilities"))?,
);
}
let (pfd_log, cfd_log) = unistd::pipe().context("failed to create pipe")?;
let _ = fcntl::fcntl(pfd_log, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC))
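The capability fallback in this hunk can be read as a small Option-chaining pattern: when an exec'd process carries no capabilities, it inherits those of the container process, and the call fails if there is nothing to inherit. The following is a minimal sketch of that pattern only. `Capabilities`, `Process`, and `inherit_capabilities` are hypothetical stand-ins for illustration (the agent itself uses the `oci` crate types and `anyhow` errors), and plain `String` errors are used here to keep the example free of external crates.

```rust
// Hypothetical, simplified stand-ins for the OCI spec types.
#[derive(Debug, Clone, PartialEq)]
struct Capabilities {
    effective: Vec<String>,
}

#[derive(Debug, Clone, Default)]
struct Process {
    capabilities: Option<Capabilities>,
}

/// Fill in `exec.capabilities` from the container process when unset.
/// Returns an error when there is no process config or no capabilities
/// to inherit from. An already-populated `exec` is left untouched.
fn inherit_capabilities(exec: &mut Process, container: Option<&Process>) -> Result<(), String> {
    if exec.capabilities.is_none() {
        let process = container.ok_or_else(|| "no process config".to_string())?;
        exec.capabilities = Some(
            process
                .capabilities
                .clone()
                .ok_or_else(|| "missing process capabilities".to_string())?,
        );
    }
    Ok(())
}
```

Returning an error instead of silently leaving the field empty matches the hunk's use of `ok_or_else`, so a misconfigured spec surfaces early rather than at privilege-setup time.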

@@ -95,6 +95,7 @@ pub const SYSTEM_DEV_PATH: &str = "/dev";
// Linux UEvent related consts.
pub const U_EVENT_ACTION: &str = "ACTION";
pub const U_EVENT_ACTION_ADD: &str = "add";
pub const U_EVENT_ACTION_REMOVE: &str = "remove";
pub const U_EVENT_DEV_PATH: &str = "DEVPATH";
pub const U_EVENT_SUB_SYSTEM: &str = "SUBSYSTEM";
pub const U_EVENT_SEQ_NUM: &str = "SEQNUM";

@@ -4,18 +4,22 @@
//
use std::collections::HashMap;
use std::ffi::CString;
use std::fs;
use std::fs::File;
use std::io;
use std::io::{BufRead, BufReader};
use std::iter;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;
use std::ptr::null;
use std::str::FromStr;
use std::sync::Arc;
use tokio::sync::Mutex;
use nix::mount::MsFlags;
use libc::{c_void, mount};
use nix::mount::{self, MsFlags};
use nix::unistd::Gid;
use regex::Regex;
@@ -145,53 +149,96 @@ pub const STORAGE_HANDLER_LIST: &[&str] = &[
DRIVER_WATCHABLE_BIND_TYPE,
];
#[instrument]
pub fn baremount(
source: &str,
destination: &str,
fs_type: &str,
#[derive(Debug, Clone)]
pub struct BareMount<'a> {
source: &'a str,
destination: &'a str,
fs_type: &'a str,
flags: MsFlags,
options: &str,
logger: &Logger,
) -> Result<()> {
let logger = logger.new(o!("subsystem" => "baremount"));
options: &'a str,
logger: Logger,
}
if source.is_empty() {
return Err(anyhow!("need mount source"));
// mount mounts a source into a destination. This will do some bookkeeping:
// * evaluate all symlinks
// * ensure the source exists
impl<'a> BareMount<'a> {
#[instrument]
pub fn new(
s: &'a str,
d: &'a str,
fs_type: &'a str,
flags: MsFlags,
options: &'a str,
logger: &Logger,
) -> Self {
BareMount {
source: s,
destination: d,
fs_type,
flags,
options,
logger: logger.new(o!("subsystem" => "baremount")),
}
}
if destination.is_empty() {
return Err(anyhow!("need mount destination"));
#[instrument]
pub fn mount(&self) -> Result<()> {
let source;
let dest;
let fs_type;
let mut options = null();
let cstr_options: CString;
let cstr_source: CString;
let cstr_dest: CString;
let cstr_fs_type: CString;
if self.source.is_empty() {
return Err(anyhow!("need mount source"));
}
if self.destination.is_empty() {
return Err(anyhow!("need mount destination"));
}
cstr_source = CString::new(self.source)?;
source = cstr_source.as_ptr();
cstr_dest = CString::new(self.destination)?;
dest = cstr_dest.as_ptr();
if self.fs_type.is_empty() {
return Err(anyhow!("need mount FS type"));
}
cstr_fs_type = CString::new(self.fs_type)?;
fs_type = cstr_fs_type.as_ptr();
if !self.options.is_empty() {
cstr_options = CString::new(self.options)?;
options = cstr_options.as_ptr() as *const c_void;
}
info!(
self.logger,
"mount source={:?}, dest={:?}, fs_type={:?}, options={:?}",
self.source,
self.destination,
self.fs_type,
self.options
);
let rc = unsafe { mount(source, dest, fs_type, self.flags.bits(), options) };
if rc < 0 {
return Err(anyhow!(
"failed to mount {:?} to {:?}, with error: {}",
self.source,
self.destination,
io::Error::last_os_error()
));
}
Ok(())
}
if fs_type.is_empty() {
return Err(anyhow!("need mount FS type"));
}
info!(
logger,
"mount source={:?}, dest={:?}, fs_type={:?}, options={:?}",
source,
destination,
fs_type,
options
);
nix::mount::mount(
Some(source),
destination,
Some(fs_type),
flags,
Some(options),
)
.map_err(|e| {
anyhow!(
"failed to mount {:?} to {:?}, with error: {}",
source,
destination,
e
)
})
}
#[instrument]
@@ -439,14 +486,17 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
return Ok(());
}
let mount_path = Path::new(&storage.mount_point);
let src_path = Path::new(&storage.source);
if storage.fstype == "bind" && !src_path.is_dir() {
ensure_destination_file_exists(mount_path)
} else {
fs::create_dir_all(mount_path).map_err(anyhow::Error::from)
match storage.fstype.as_str() {
DRIVER_9P_TYPE | DRIVER_VIRTIOFS_TYPE => {
let dest_path = Path::new(storage.mount_point.as_str());
if !dest_path.exists() {
fs::create_dir_all(dest_path).context("Create mount destination failed")?;
}
}
_ => {
ensure_destination_exists(storage.mount_point.as_str(), storage.fstype.as_str())?;
}
}
.context("Could not create mountpoint")?;
let options_vec = storage.options.to_vec();
let options_vec = options_vec.iter().map(String::as_str).collect();
@@ -459,14 +509,16 @@ fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
"mount-options" => options.as_str(),
);
baremount(
let bare_mount = BareMount::new(
storage.source.as_str(),
storage.mount_point.as_str(),
storage.fstype.as_str(),
flags,
options.as_str(),
&logger,
)
);
bare_mount.mount()
}
/// Looks for `mount_point` entry in the /proc/mounts.
@@ -585,9 +637,11 @@ fn mount_to_rootfs(logger: &Logger, m: &InitMount) -> Result<()> {
let (flags, options) = parse_mount_flags_and_options(options_vec);
let bare_mount = BareMount::new(m.src, m.dest, m.fstype, flags, options.as_str(), logger);
fs::create_dir_all(Path::new(m.dest)).context("could not create directory")?;
baremount(m.src, m.dest, m.fstype, flags, &options, logger).or_else(|e| {
bare_mount.mount().or_else(|e| {
if m.src != "dev" {
return Err(e);
}
@@ -762,27 +816,32 @@ pub fn cgroups_mount(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result<
#[instrument]
pub fn remove_mounts(mounts: &[String]) -> Result<()> {
for m in mounts.iter() {
nix::mount::umount(m.as_str()).context(format!("failed to umount {:?}", m))?;
mount::umount(m.as_str()).context(format!("failed to umount {:?}", m))?;
}
Ok(())
}
// ensure_destination_exists will recursively create a given mountpoint. If directories
// are created, their permissions are initialized to mountPerm(0755)
#[instrument]
fn ensure_destination_file_exists(path: &Path) -> Result<()> {
if path.is_file() {
fn ensure_destination_exists(destination: &str, fs_type: &str) -> Result<()> {
let d = Path::new(destination);
if d.exists() {
return Ok(());
} else if path.exists() {
return Err(anyhow!("{:?} exists but is not a regular file", path));
}
let dir = d
.parent()
.ok_or_else(|| anyhow!("mount destination {} doesn't exist", destination))?;
if !dir.exists() {
fs::create_dir_all(dir).context(format!("create dir all {:?}", dir))?;
}
// The only way parent() can return None is if the path is /,
// which always exists, so the test above will already have caught
// it, thus the unwrap() is safe
let dir = path.parent().unwrap();
fs::create_dir_all(dir).context(format!("create_dir_all {:?}", dir))?;
fs::File::create(path).context(format!("create empty file {:?}", path))?;
if fs_type != "bind" || d.is_dir() {
fs::create_dir_all(d).context(format!("create dir all {:?}", d))?;
} else {
fs::File::create(d).context(format!("create file {:?}", d))?;
}
Ok(())
}
@@ -806,6 +865,8 @@ fn parse_options(option_list: Vec<String>) -> HashMap<String, String> {
mod tests {
use super::*;
use crate::{skip_if_not_root, skip_loop_if_not_root, skip_loop_if_root};
use libc::umount;
use std::fs::metadata;
use std::fs::File;
use std::fs::OpenOptions;
use std::io::Write;
@@ -945,7 +1006,7 @@ mod tests {
std::fs::create_dir_all(d).expect("failed to create directory");
}
let result = baremount(
let bare_mount = BareMount::new(
&src_filename,
&dest_filename,
d.fs_type,
@@ -954,13 +1015,25 @@ mod tests {
&logger,
);
let result = bare_mount.mount();
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
// Cleanup
nix::mount::umount(dest_filename.as_str()).unwrap();
unsafe {
let cstr_dest =
CString::new(dest_filename).expect("failed to convert dest to cstring");
let umount_dest = cstr_dest.as_ptr();
let ret = umount(umount_dest);
let msg = format!("{}: umount result: {:?}", msg, ret);
assert!(ret == 0, "{}", msg);
};
continue;
}
@@ -1030,7 +1103,7 @@ mod tests {
}
// Create an actual mount
let result = baremount(
let bare_mount = BareMount::new(
mnt_src_filename,
mnt_dest_filename,
"bind",
@@ -1038,6 +1111,8 @@ mod tests {
"",
&logger,
);
let result = bare_mount.mount();
assert!(result.is_ok(), "mount for test setup failed");
let tests = &[
@@ -1369,20 +1444,37 @@ mod tests {
}
#[test]
fn test_ensure_destination_file_exists() {
fn test_ensure_destination_exists() {
let dir = tempdir().expect("failed to create tmpdir");
let mut testfile = dir.into_path();
testfile.push("testfile");
let result = ensure_destination_file_exists(&testfile);
let result = ensure_destination_exists(testfile.to_str().unwrap(), "bind");
assert!(result.is_ok());
assert!(testfile.exists());
let result = ensure_destination_file_exists(&testfile);
let result = ensure_destination_exists(testfile.to_str().unwrap(), "bind");
assert!(result.is_ok());
assert!(testfile.is_file());
let meta = metadata(testfile).unwrap();
assert!(meta.is_file());
let dir = tempdir().expect("failed to create tmpdir");
let mut testdir = dir.into_path();
testdir.push("testdir");
let result = ensure_destination_exists(testdir.to_str().unwrap(), "ext4");
assert!(result.is_ok());
assert!(testdir.exists());
let result = ensure_destination_exists(testdir.to_str().unwrap(), "ext4");
assert!(result.is_ok());
//let meta = metadata(testdir.to_str().unwrap()).unwrap();
let meta = metadata(testdir).unwrap();
assert!(meta.is_dir());
}
}
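The destination handling that `ensure_destination_exists` performs in this file can be sketched with the standard library alone. This is a simplified illustration, not the agent's code: `ensure_destination` is a hypothetical name, it takes a `Path` rather than a `&str`, it returns `io::Result` instead of an `anyhow` result, and it assumes that a "bind" destination that does not exist yet should become an empty file while every other file system type gets a directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create `destination` if it does not exist yet: an empty file for a
/// "bind" mount, a directory for any other file system type.
/// An existing destination is left as it is.
fn ensure_destination(destination: &Path, fs_type: &str) -> io::Result<()> {
    if destination.exists() {
        return Ok(());
    }

    // Make sure the parent directory chain exists first.
    if let Some(parent) = destination.parent() {
        fs::create_dir_all(parent)?;
    }

    if fs_type == "bind" {
        // Bind-mounting onto a path requires the target to exist;
        // an empty file serves as the mount target here.
        fs::File::create(destination)?;
    } else {
        fs::create_dir_all(destination)?;
    }
    Ok(())
}
```

Calling the function twice with the same arguments is harmless, which is the property the `test_ensure_destination_exists` test in the diff exercises for both the file and the directory case.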

@@ -13,7 +13,7 @@ use std::fs::File;
use std::path::{Path, PathBuf};
use tracing::instrument;
use crate::mount::{baremount, FLAGS};
use crate::mount::{BareMount, FLAGS};
use slog::Logger;
const PERSISTENT_NS_DIR: &str = "/var/run/sandbox-ns";
@@ -129,7 +129,8 @@ impl Namespace {
}
};
baremount(source, destination, "none", flags, "", &logger).map_err(|e| {
let bare_mount = BareMount::new(source, destination, "none", flags, "", &logger);
bare_mount.mount().map_err(|e| {
anyhow!(
"Failed to mount {} to {} with err:{:?}",
source,

@@ -312,7 +312,6 @@ impl Handle {
for route in list {
let link = self.find_link(LinkFilter::Name(&route.device)).await?;
let is_v6 = is_ipv6(route.get_gateway()) || is_ipv6(route.get_dest());
const MAIN_TABLE: u8 = packet::constants::RT_TABLE_MAIN;
const UNICAST: u8 = packet::constants::RTN_UNICAST;
@@ -334,7 +333,7 @@ impl Handle {
// `rtnetlink` offers separate request builders for the two IP versions (IPv4 and IPv6).
// This if branch is a bit clumsy because both arms do almost the same thing.
if is_v6 {
if route.get_family() == IPFamily::v6 {
let dest_addr = if !route.dest.is_empty() {
Ipv6Network::from_str(&route.dest)?
} else {
@@ -594,10 +593,6 @@ fn format_address(data: &[u8]) -> Result<String> {
}
}
fn is_ipv6(str: &str) -> bool {
Ipv6Addr::from_str(str).is_ok()
}
fn parse_mac_address(addr: &str) -> Result<[u8; 6]> {
let mut split = addr.splitn(6, ':');
@@ -932,16 +927,6 @@ mod tests {
assert_eq!(bytes, [0xAB, 0x0C, 0xDE, 0x12, 0x34, 0x56]);
}
#[test]
fn check_ipv6() {
assert!(is_ipv6("::1"));
assert!(is_ipv6("2001:0:3238:DFE1:63::FEFB"));
assert!(!is_ipv6(""));
assert!(!is_ipv6("127.0.0.1"));
assert!(!is_ipv6("10.10.10.10"));
}
fn clean_env_for_test_add_one_arp_neighbor(dummy_name: &str, ip: &str) {
// ip link delete dummy
Command::new("ip")
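The two ways of choosing the IPv6 branch that appear in this file, parsing the address strings versus reading an explicit `family` field, can disagree. The sketch below shows one such case under a stated assumption: a route may arrive with both `dest` and `gateway` empty. `IpFamily`, `Route`, and both helper functions are hypothetical simplifications for illustration and are not the generated protobuf types.

```rust
use std::net::Ipv6Addr;
use std::str::FromStr;

// Hypothetical, simplified versions of the protocol types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IpFamily {
    V4,
    V6,
}

struct Route {
    dest: String,
    gateway: String,
    family: IpFamily,
}

/// Address-parsing heuristic: the route counts as IPv6 if either the
/// gateway or the destination parses as a bare IPv6 address.
fn is_v6_by_parsing(route: &Route) -> bool {
    Ipv6Addr::from_str(&route.gateway).is_ok() || Ipv6Addr::from_str(&route.dest).is_ok()
}

/// Explicit check: trust the family the sender recorded on the route.
fn is_v6_by_family(route: &Route) -> bool {
    route.family == IpFamily::V6
}
```

With empty strings in both address fields the parsing heuristic has nothing to go on and reports "not IPv6", while the explicit field still carries the sender's intent.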

@@ -47,7 +47,7 @@ use rustjail::process::ProcessOperations;
use crate::device::{add_devices, pcipath_to_sysfs, rescan_pci_bus, update_device_cgroup};
use crate::linux_abi::*;
use crate::metrics::get_metrics;
use crate::mount::{add_storages, baremount, remove_mounts, STORAGE_HANDLER_LIST};
use crate::mount::{add_storages, remove_mounts, BareMount, STORAGE_HANDLER_LIST};
use crate::namespace::{NSTYPEIPC, NSTYPEPID, NSTYPEUTS};
use crate::network::setup_guest_dns;
use crate::random;
@@ -1624,14 +1624,15 @@ fn setup_bundle(cid: &str, spec: &mut Spec) -> Result<PathBuf> {
let rootfs_path = bundle_path.join("rootfs");
fs::create_dir_all(&rootfs_path)?;
baremount(
BareMount::new(
&spec_root.path,
rootfs_path.to_str().unwrap(),
"bind",
MsFlags::MS_BIND,
"",
&sl!(),
)?;
)
.mount()?;
spec.root = Some(Root {
path: rootfs_path.to_str().unwrap().to_owned(),
readonly: spec_root.readonly,

@@ -449,7 +449,7 @@ fn online_memory(logger: &Logger) -> Result<()> {
#[cfg(test)]
mod tests {
use super::Sandbox;
use crate::{mount::baremount, skip_if_not_root};
use crate::{mount::BareMount, skip_if_not_root};
use anyhow::Error;
use nix::mount::MsFlags;
use oci::{Linux, Root, Spec};
@@ -461,10 +461,14 @@ mod tests {
use tempfile::Builder;
fn bind_mount(src: &str, dst: &str, logger: &Logger) -> Result<(), Error> {
baremount(src, dst, "bind", MsFlags::MS_BIND, "", logger)
let baremount = BareMount::new(src, dst, "bind", MsFlags::MS_BIND, "", logger);
baremount.mount()
}
use serial_test::serial;
#[tokio::test]
#[serial]
async fn set_sandbox_storage() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -499,6 +503,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn remove_sandbox_storage() {
skip_if_not_root!();
@@ -555,6 +560,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn unset_and_remove_sandbox_storage() {
skip_if_not_root!();
@@ -606,6 +612,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn unset_sandbox_storage() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -689,6 +696,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn get_container_entry_exist() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
@@ -702,6 +710,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn get_container_no_entry() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -711,6 +720,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn add_and_get_container() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
@@ -722,6 +732,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn update_shared_pidns() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
@@ -740,6 +751,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn add_guest_hooks() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();
@@ -763,6 +775,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn test_sandbox_set_destroy() {
let logger = slog::Logger::root(slog::Discard, o!());
let mut s = Sandbox::new(&logger).unwrap();

@@ -97,10 +97,18 @@ impl Uevent {
})
}
#[instrument]
async fn process_remove(&self, logger: &Logger, sandbox: &Arc<Mutex<Sandbox>>) {
let mut sb = sandbox.lock().await;
sb.uevent_map.remove(&self.devpath);
}
#[instrument]
async fn process(&self, logger: &Logger, sandbox: &Arc<Mutex<Sandbox>>) {
if self.action == U_EVENT_ACTION_ADD {
return self.process_add(logger, sandbox).await;
} else if self.action == U_EVENT_ACTION_REMOVE {
return self.process_remove(logger, sandbox).await;
}
debug!(*logger, "ignoring event"; "uevent" => format!("{:?}", self));
}
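The add and remove dispatch in this hunk amounts to keeping a device map in step with kernel uevents. The following is a minimal synchronous sketch of that idea, assuming a plain `HashMap` keyed by `devpath`. The `Uevent` struct and the `process` function here are simplified, hypothetical versions: the agent's own code is async, holds the map inside a locked `Sandbox`, and does more work on "add".

```rust
use std::collections::HashMap;

const U_EVENT_ACTION_ADD: &str = "add";
const U_EVENT_ACTION_REMOVE: &str = "remove";

// Simplified, hypothetical uevent: only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Uevent {
    action: String,
    devpath: String,
}

/// Apply one uevent to the device map.
/// Returns true if the action was handled, false if it was ignored.
fn process(map: &mut HashMap<String, Uevent>, ev: Uevent) -> bool {
    match ev.action.as_str() {
        U_EVENT_ACTION_ADD => {
            map.insert(ev.devpath.clone(), ev);
            true
        }
        U_EVENT_ACTION_REMOVE => {
            // Dropping the entry keeps later lookups from seeing stale data.
            map.remove(&ev.devpath);
            true
        }
        _ => false,
    }
}
```

Without the remove arm, an entry inserted on "add" would stay in the map after the device is gone, which is the stale-data situation the commit message for this change describes.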

@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
#![allow(unknown_lints)]
#![allow(clippy::unknown_clippy_lints)]
use std::collections::HashMap;
use std::path::{Path, PathBuf};
@@ -20,7 +20,7 @@ use tokio::sync::Mutex;
use tokio::task;
use tokio::time::{self, Duration};
use crate::mount::baremount;
use crate::mount::BareMount;
use crate::protocols::agent as protos;
/// The maximum number of file system entries agent will watch for each mount.
@@ -193,6 +193,14 @@ impl Storage {
size += metadata.len();
ensure!(
self.watched_files.len() <= MAX_ENTRIES_PER_STORAGE,
WatcherError::MountTooManyFiles {
count: self.watched_files.len(),
mnt: self.source_mount_point.display().to_string()
}
);
// Insert will return old entry if any
if let Some(old_st) = self.watched_files.insert(path.to_path_buf(), modified) {
if modified > old_st {
@@ -203,14 +211,6 @@ impl Storage {
debug!(logger, "New entry: {}", path.display());
update_list.push(PathBuf::from(&path))
}
ensure!(
self.watched_files.len() <= MAX_ENTRIES_PER_STORAGE,
WatcherError::MountTooManyFiles {
count: self.watched_files.len(),
mnt: self.source_mount_point.display().to_string()
}
);
} else {
// Scan dir recursively
let mut entries = fs::read_dir(path)
@@ -327,14 +327,16 @@ impl SandboxStorages {
}
}
match baremount(
match BareMount::new(
entry.source_mount_point.to_str().unwrap(),
entry.target_mount_point.to_str().unwrap(),
"bind",
MsFlags::MS_BIND,
"bind",
logger,
) {
)
.mount()
{
Ok(_) => {
entry.watch = false;
info!(logger, "watchable mount replaced with bind mount")
@@ -438,14 +440,15 @@ impl BindWatcher {
async fn mount(&self, logger: &Logger) -> Result<()> {
fs::create_dir_all(WATCH_MOUNT_POINT_PATH).await?;
baremount(
BareMount::new(
"tmpfs",
WATCH_MOUNT_POINT_PATH,
"tmpfs",
MsFlags::empty(),
"",
logger,
)?;
)
.mount()?;
Ok(())
}
@@ -979,7 +982,10 @@ mod tests {
);
}
use serial_test::serial;
#[tokio::test]
#[serial]
async fn create_tmpfs() {
skip_if_not_root!();
@@ -994,6 +1000,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn spawn_thread() {
skip_if_not_root!();
@@ -1023,6 +1030,7 @@ mod tests {
}
#[tokio::test]
#[serial]
async fn verify_container_cleanup_watching() {
skip_if_not_root!();

@@ -12,7 +12,7 @@
// payload, which allows the forwarder to know how many bytes it must read to
// consume the trace span. The payload is a serialised version of the trace span.
#![allow(unknown_lints)]
#![allow(clippy::unknown_clippy_lints)]
use async_trait::async_trait;
use byteorder::{ByteOrder, NetworkEndian};

@@ -5,10 +5,16 @@ coverage.txt
coverage.html
.git-commit
.git-commit.tmp
/config/*.toml
config-generated.go
/cli/config/configuration-acrn.toml
/cli/config/configuration-clh.toml
/cli/config/configuration-fc.toml
/cli/config/configuration-qemu.toml
/cli/config/configuration-clh.toml
/cli/config-generated.go
/cli/containerd-shim-kata-v2/config-generated.go
/cli/coverage.html
/containerd-shim-kata-v2
/cmd/containerd-shim-v2/monitor_address
/containerd-shim-v2/monitor_address
/data/kata-collect-data.sh
/kata-monitor
/kata-netmon
@@ -17,4 +23,7 @@ config-generated.go
/virtcontainers/hack/virtc/virtc
/virtcontainers/hook/mock/hook
/virtcontainers/profile.cov
/virtcontainers/shim/mock/cc-shim/cc-shim
/virtcontainers/shim/mock/kata-shim/kata-shim
/virtcontainers/shim/mock/shim
/virtcontainers/utils/supportfiles

201
src/runtime/LICENSE Normal file

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -51,13 +51,12 @@ PROJECT_DIR = $(PROJECT_TAG)
IMAGENAME = $(PROJECT_TAG).img
TARGET = $(BIN_PREFIX)-runtime
RUNTIME_OUTPUT = $(CURDIR)/$(TARGET)
RUNTIME_DIR = $(CLI_DIR)/$(TARGET)
TARGET_OUTPUT = $(CURDIR)/$(TARGET)
BINLIST += $(TARGET)
NETMON_DIR = $(CLI_DIR)/netmon
NETMON_DIR = netmon
NETMON_TARGET = $(PROJECT_TYPE)-netmon
NETMON_RUNTIME_OUTPUT = $(CURDIR)/$(NETMON_TARGET)
NETMON_TARGET_OUTPUT = $(CURDIR)/$(NETMON_TARGET)
BINLIBEXECLIST += $(NETMON_TARGET)
DESTDIR ?= /
@@ -201,7 +200,7 @@ FEATURE_SELINUX ?= check
SED = sed
CLI_DIR = cmd
CLI_DIR = cli
SHIMV2 = containerd-shim-kata-v2
SHIMV2_OUTPUT = $(CURDIR)/$(SHIMV2)
SHIMV2_DIR = $(CLI_DIR)/$(SHIMV2)
@@ -226,7 +225,7 @@ ifneq (,$(QEMUCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_QEMU)
CONFIG_FILE_QEMU = configuration-qemu.toml
CONFIG_QEMU = config/$(CONFIG_FILE_QEMU)
CONFIG_QEMU = $(CLI_DIR)/config/$(CONFIG_FILE_QEMU)
CONFIG_QEMU_IN = $(CONFIG_QEMU).in
CONFIG_PATH_QEMU = $(abspath $(CONFDIR)/$(CONFIG_FILE_QEMU))
@@ -249,7 +248,7 @@ ifneq (,$(CLHCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_CLH)
CONFIG_FILE_CLH = configuration-clh.toml
CONFIG_CLH = config/$(CONFIG_FILE_CLH)
CONFIG_CLH = $(CLI_DIR)/config/$(CONFIG_FILE_CLH)
CONFIG_CLH_IN = $(CONFIG_CLH).in
CONFIG_PATH_CLH = $(abspath $(CONFDIR)/$(CONFIG_FILE_CLH))
@@ -272,7 +271,7 @@ ifneq (,$(FCCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_FC)
CONFIG_FILE_FC = configuration-fc.toml
CONFIG_FC = config/$(CONFIG_FILE_FC)
CONFIG_FC = $(CLI_DIR)/config/$(CONFIG_FILE_FC)
CONFIG_FC_IN = $(CONFIG_FC).in
CONFIG_PATH_FC = $(abspath $(CONFDIR)/$(CONFIG_FILE_FC))
@@ -295,7 +294,7 @@ ifneq (,$(ACRNCMD))
KNOWN_HYPERVISORS += $(HYPERVISOR_ACRN)
CONFIG_FILE_ACRN = configuration-acrn.toml
CONFIG_ACRN = config/$(CONFIG_FILE_ACRN)
CONFIG_ACRN = $(CLI_DIR)/config/$(CONFIG_FILE_ACRN)
CONFIG_ACRN_IN = $(CONFIG_ACRN).in
CONFIG_PATH_ACRN = $(abspath $(CONFDIR)/$(CONFIG_FILE_ACRN))
@@ -523,12 +522,12 @@ containerd-shim-v2: $(SHIMV2_OUTPUT)
monitor: $(MONITOR_OUTPUT)
netmon: $(NETMON_RUNTIME_OUTPUT)
netmon: $(NETMON_TARGET_OUTPUT)
$(NETMON_RUNTIME_OUTPUT): $(SOURCES) VERSION
$(NETMON_TARGET_OUTPUT): $(SOURCES) VERSION
$(QUIET_BUILD)(cd $(NETMON_DIR) && go build $(BUILDFLAGS) -o $@ -ldflags "-X main.version=$(VERSION)" $(KATA_LDFLAGS))
runtime: $(RUNTIME_OUTPUT) $(CONFIGS)
runtime: $(TARGET_OUTPUT) $(CONFIGS)
.DEFAULT: default
build: default
@@ -559,12 +558,16 @@ define MAKE_KERNEL_VIRTIOFS_NAME
$(if $(findstring uncompressed,$1),vmlinux-virtiofs.container,vmlinuz-virtiofs.container)
endef
GENERATED_CONFIG = $(abspath $(CLI_DIR)/config-generated.go)
GENERATED_FILES += $(GENERATED_CONFIG)
GENERATED_FILES += pkg/katautils/config-settings.go
$(RUNTIME_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) | show-summary
$(QUIET_BUILD)(cd $(RUNTIME_DIR) && go build $(KATA_LDFLAGS) $(BUILDFLAGS) -o $@ .)
$(TARGET_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) | show-summary
$(QUIET_BUILD)(cd $(CLI_DIR) && go build $(KATA_LDFLAGS) $(BUILDFLAGS) -o $@ .)
$(SHIMV2_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST)
$(QUIET_BUILD)(cd $(SHIMV2_DIR)/ && ln -fs $(GENERATED_CONFIG))
$(QUIET_BUILD)(cd $(SHIMV2_DIR)/ && go build $(KATA_LDFLAGS) $(BUILDFLAGS) -o $@ .)
$(MONITOR_OUTPUT): $(SOURCES) $(GENERATED_FILES) $(MAKEFILE_LIST) .git-commit
@@ -607,7 +610,6 @@ ifeq ($(shell id -u), 0)
endif
go-test: $(GENERATED_FILES)
go clean -testcache
go test -v -mod=vendor ./...
check-go-static:
@@ -661,6 +663,7 @@ clean:
$(NETMON_TARGET) \
$(MONITOR) \
$(SHIMV2) \
$(SHIMV2_DIR)/$(notdir $(GENERATED_CONFIG)) \
$(TARGET) \
.git-commit .git-commit.tmp

@@ -26,7 +26,8 @@ to work seamlessly with both Docker and Kubernetes respectively.
## License
The code is licensed under an Apache 2.0 license.
See [the license file](https://github.com/kata-containers/kata-containers/blob/main/LICENSE) for further details.
See [the license file](../../LICENSE) for further details.
## Platform support

@@ -0,0 +1,40 @@
//
// Copyright (c) 2018-2019 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
// WARNING: This file is auto-generated - DO NOT EDIT!
//
// Note that some variables are "var" to allow them to be modified
// by the tests.
package main
// name is the name of the runtime
const name = "@RUNTIME_NAME@"
// name of the project
const project = "@PROJECT_NAME@"
// prefix used to denote non-standard CLI commands and options.
const projectPrefix = "@PROJECT_TYPE@"
// original URL for this project
const projectURL = "@PROJECT_URL@"
// Project URL's organisation name
const projectORG = "@PROJECT_ORG@"
const defaultRootDirectory = "@PKGRUNDIR@"
// commit is the git commit the runtime is compiled from.
var commit = "@COMMIT@"
// version is the runtime version.
var version = "@VERSION@"
// Default config file used by stateless systems.
var defaultRuntimeConfiguration = "@CONFIG_PATH@"
// Alternate config file that takes precedence over
// defaultRuntimeConfiguration.
var defaultSysConfRuntimeConfiguration = "@SYSCONFIG@"
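Reviewer note: the `@NAME@` tokens in this template are filled in at build time. The Makefile in this diff defines `SED = sed`, so the substitution is presumably sed-based. The sketch below shows the same idea in Go. The `expand` helper and the sample values are hypothetical and are not the project's build step.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches tokens such as @VERSION@ or @PROJECT_NAME@.
var placeholder = regexp.MustCompile(`@([A-Z_]+)@`)

// expand is a hypothetical helper: it replaces every @KEY@ token that has a
// value in vals and leaves unknown tokens untouched.
func expand(tmpl string, vals map[string]string) string {
	return placeholder.ReplaceAllStringFunc(tmpl, func(tok string) string {
		key := tok[1 : len(tok)-1] // strip the surrounding '@' characters
		if v, ok := vals[key]; ok {
			return v
		}
		return tok
	})
}

func main() {
	line := `var version = "@VERSION@" // commit "@COMMIT@"`
	// Only VERSION is supplied, so @COMMIT@ stays as-is in the output.
	fmt.Println(expand(line, map[string]string{"VERSION": "2.2.3"}))
}
```

Leaving unknown tokens in place makes a missed substitution easy to spot in the generated file.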

@@ -109,11 +109,6 @@ virtio_fs_cache = "@DEFVIRTIOFSCACHE@"
# or nvdimm.
block_device_driver = "virtio-blk"
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
#enable_hugepages = true
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#

@@ -24,11 +24,6 @@ machine_type = "@MACHINETYPE@"
# Default false
# confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path"
@@ -365,7 +360,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
# if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness")
# is bigger than 0.
# The size of the swap device should be
# The size of the swap device should be
# swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should

@@ -0,0 +1,30 @@
// Copyright (c) 2018 HyperHQ Inc.
//
// SPDX-License-Identifier: Apache-2.0
//
package main
import (
"fmt"
"os"
"github.com/containerd/containerd/runtime/v2/shim"
containerdshim "github.com/kata-containers/kata-containers/src/runtime/containerd-shim-v2"
"github.com/kata-containers/kata-containers/src/runtime/pkg/types"
)
func shimConfig(config *shim.Config) {
config.NoReaper = true
config.NoSubreaper = true
}
func main() {
if len(os.Args) == 2 && os.Args[1] == "--version" {
fmt.Printf("%s containerd shim: id: %q, version: %s, commit: %v\n", project, types.DefaultKataRuntimeName, version, commit)
os.Exit(0)
}
shim.Run(types.DefaultKataRuntimeName, containerdshim.New, shimConfig)
}

@@ -25,6 +25,7 @@ import (
"strings"
"syscall"
"github.com/containerd/cgroups"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/oci"
@@ -61,9 +62,9 @@ type vmContainerCapableDetails struct {
const (
moduleParamDir = "parameters"
successMessageCapable = "System is capable of running " + katautils.PROJECT
successMessageCreate = "System can currently create " + katautils.PROJECT
failMessage = "System is not capable of running " + katautils.PROJECT
successMessageCapable = "System is capable of running " + project
successMessageCreate = "System can currently create " + project
failMessage = "System is not capable of running " + project
kernelPropertyCorrect = "Kernel property value correct"
// these refer to fields in the procCPUINFO file
@@ -228,7 +229,7 @@ func checkKernelModules(modules map[string]kernelModule, handler kernelParamHand
}
if !haveKernelModule(module) {
kataLog.WithFields(fields).Errorf("kernel property %s not found", module)
kataLog.WithFields(fields).Error("kernel property not found")
if details.required {
count++
}
@@ -291,9 +292,11 @@ func genericHostIsVMContainerCapable(details vmContainerCapableDetails) error {
errorCount := uint32(0)
count := checkCPUAttribs(cpuinfo, details.requiredCPUAttribs)
errorCount += count
count = checkCPUFlags(cpuFlags, details.requiredCPUFlags)
errorCount += count
count, err = checkKernelModules(details.requiredKernelModules, archKernelParamHandler)
@@ -313,7 +316,7 @@ func genericHostIsVMContainerCapable(details vmContainerCapableDetails) error {
var kataCheckCLICommand = cli.Command{
Name: "check",
Aliases: []string{"kata-check"},
Usage: "tests if system can run " + katautils.PROJECT,
Usage: "tests if system can run " + project,
Flags: []cli.Flag{
cli.BoolFlag{
Name: "check-version-only",
@@ -372,14 +375,14 @@ EXAMPLES:
$ %s check --only-list-releases --include-all-releases
`,
katautils.PROJECT,
project,
noNetworkEnvVar,
katautils.NAME,
katautils.NAME,
katautils.NAME,
katautils.NAME,
katautils.NAME,
katautils.NAME,
name,
name,
name,
name,
name,
name,
),
Action: func(context *cli.Context) error {
@@ -398,7 +401,7 @@ EXAMPLES:
if os.Geteuid() == 0 {
kataLog.Warn("Not running network checks as super user")
} else {
err := HandleReleaseVersions(cmd, katautils.VERSION, context.Bool("include-all-releases"))
err := HandleReleaseVersions(cmd, version, context.Bool("include-all-releases"))
if err != nil {
return err
}
@@ -414,6 +417,11 @@ EXAMPLES:
return errors.New("check: cannot determine runtime config")
}
// check if cgroup can work use the same logic for creating containers
if _, err := vc.V1Constraints(); err != nil && err == cgroups.ErrMountPointNotExist && !runtimeConfig.SandboxCgroupOnly {
return fmt.Errorf("Cgroup v2 requires the following configuration: `sandbox_cgroup_only=true`.")
}
err := setCPUtype(runtimeConfig.HypervisorType)
if err != nil {
return err

@@ -161,16 +161,6 @@ func setCPUtype(hypervisorType vc.HypervisorType) error {
required: false,
},
}
case "mock":
archRequiredCPUFlags = map[string]string{
cpuFlagVMX: "Virtualization support",
cpuFlagLM: "64Bit CPU",
cpuFlagSSE4_1: "SSE4.1",
}
archRequiredCPUAttribs = map[string]string{
archGenuineIntel: "Intel Architecture CPU",
}
default:
return fmt.Errorf("setCPUtype: Unknown hypervisor type %s", hypervisorType)
}
@@ -302,8 +292,6 @@ func archHostCanCreateVMContainer(hypervisorType vc.HypervisorType) error {
return kvmIsUsable()
case "acrn":
return acrnIsUsable()
case "mock":
return nil
default:
return fmt.Errorf("archHostCanCreateVMContainer: Unknown hypervisor type %s", hypervisorType)
}

@@ -317,13 +317,12 @@ func TestCheckHostIsVMContainerCapable(t *testing.T) {
}
}
// to check if host is capable for Kata Containers, must setup CPU info first.
_, config, err := makeRuntimeConfig(dir)
assert.NoError(err)
setCPUtype(config.HypervisorType)
setupCheckHostIsVMContainerCapable(assert, cpuInfoFile, cpuData, moduleData)
// remove the modules to force a failure
err = os.RemoveAll(sysModuleDir)
assert.NoError(err)
details := vmContainerCapableDetails{
cpuInfoFile: cpuInfoFile,
requiredCPUFlags: archRequiredCPUFlags,
@@ -333,12 +332,6 @@ func TestCheckHostIsVMContainerCapable(t *testing.T) {
err = hostIsVMContainerCapable(details)
assert.Nil(err)
// remove the modules to force a failure
err = os.RemoveAll(sysModuleDir)
assert.NoError(err)
err = hostIsVMContainerCapable(details)
assert.Error(err)
}
func TestArchKernelParamHandler(t *testing.T) {

@@ -10,7 +10,7 @@ vendor_id : IBM/S390
# processors : 4
bogomips per cpu: 20325.00
max thread id : 0
features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
cache0 : level=1 type=Data scope=Private size=128K line_size=256 associativity=8
cache1 : level=1 type=Instruction scope=Private size=96K line_size=256 associativity=6
cache2 : level=2 type=Data scope=Private size=2048K line_size=256 associativity=8

@@ -57,51 +57,51 @@ func TestArchKernelParamHandler(t *testing.T) {
assert := assert.New(t)
type testData struct {
onVMM bool
expectIgnore bool
fields logrus.Fields
msg string
onVMM bool
expectIgnore bool
}
data := []testData{
{true, false, logrus.Fields{}, ""},
{false, false, logrus.Fields{}, ""},
{logrus.Fields{}, "", true, false},
{logrus.Fields{}, "", false, false},
{
false,
false,
logrus.Fields{
// wrong type
"parameter": 123,
},
"foo",
false,
false,
},
{
false,
false,
logrus.Fields{
"parameter": "unrestricted_guest",
},
"",
false,
false,
},
{
true,
true,
logrus.Fields{
"parameter": "unrestricted_guest",
},
"",
true,
true,
},
{
false,
true,
logrus.Fields{
"parameter": "nested",
},
"",
false,
true,
},
}
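Reviewer note: this hunk reorders the fields of `testData`, so every positional literal in the table has to be rewritten. Keyed literals would not be affected by field order. The sketch below assumes a struct shaped like `testData`, with a plain map standing in for `logrus.Fields` so that it needs no third-party imports. The `cases` function is hypothetical.

```go
package main

import "fmt"

// testData mirrors the shape of the struct in the hunk above; fields is a
// plain map here as a stand-in for logrus.Fields.
type testData struct {
	fields       map[string]interface{}
	msg          string
	onVMM        bool
	expectIgnore bool
}

// cases uses keyed literals, so reordering the struct fields would not
// require touching any entry. Omitted fields take their zero values.
func cases() []testData {
	return []testData{
		{onVMM: true},
		{fields: map[string]interface{}{"parameter": 123}, msg: "foo"},
		{fields: map[string]interface{}{"parameter": "nested"}, expectIgnore: true},
	}
}

func main() {
	for i, d := range cases() {
		fmt.Printf("case %d: onVMM=%v expectIgnore=%v msg=%q\n", i, d.onVMM, d.expectIgnore, d.msg)
	}
}
```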

@@ -17,10 +17,8 @@ import (
"strings"
"testing"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katatestutils"
ktu "github.com/kata-containers/kata-containers/src/runtime/pkg/katatestutils"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
"github.com/sirupsen/logrus"
"github.com/stretchr/testify/assert"
"github.com/urfave/cli"
@@ -249,13 +247,6 @@ func genericCheckCLIFunction(t *testing.T, cpuData []testCPUData, moduleData []t
flagSet := &flag.FlagSet{}
ctx := createCLIContext(flagSet)
ctx.App.Name = "foo"
if katatestutils.IsInGitHubActions() {
// only set to mock if on GitHub
t.Logf("running tests under GitHub actions")
config.HypervisorType = vc.MockHypervisor
}
ctx.App.Metadata["runtimeConfig"] = config
// create buffer to save logger output

@@ -13,16 +13,14 @@ import (
"strings"
"github.com/BurntSushi/toml"
specs "github.com/opencontainers/runtime-spec/specs-go"
"github.com/prometheus/procfs"
"github.com/urfave/cli"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
"github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
exp "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/experimental"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/oci"
vcUtils "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
specs "github.com/opencontainers/runtime-spec/specs-go"
"github.com/prometheus/procfs"
"github.com/urfave/cli"
)
// Semantic version for the output of the command.
@@ -171,8 +169,8 @@ func getMetaInfo() MetaInfo {
}
func getRuntimeInfo(configFile string, config oci.RuntimeConfig) RuntimeInfo {
runtimeVersionInfo := constructVersionInfo(katautils.VERSION)
runtimeVersionInfo.Commit = katautils.COMMIT
runtimeVersionInfo := constructVersionInfo(version)
runtimeVersionInfo.Commit = commit
runtimeVersion := RuntimeVersionInfo{
Version: runtimeVersionInfo,

@@ -314,8 +314,8 @@ func getExpectedKernel(config oci.RuntimeConfig) KernelInfo {
func getExpectedRuntimeDetails(config oci.RuntimeConfig, configFile string) RuntimeInfo {
runtimePath, _ := os.Executable()
runtimeVersionInfo := constructVersionInfo(katautils.VERSION)
runtimeVersionInfo.Commit = katautils.COMMIT
runtimeVersionInfo := constructVersionInfo(version)
runtimeVersionInfo.Commit = commit
return RuntimeInfo{
Version: RuntimeVersionInfo{
Version: runtimeVersionInfo,

@@ -25,7 +25,7 @@ var logLevel = flag.String("log-level", "info", "Log level of logrus(trace/debug
var (
appName = "kata-monitor"
// version is the kata monitor version.
version = "0.1.0"
version = "0.2.0"
GitCommit = "unknown-commit"
)

@@ -39,18 +39,18 @@ const arch = goruntime.GOARCH
var usage = fmt.Sprintf(`%s runtime
%s is a command line program for running applications packaged
according to the Open Container Initiative (OCI).`, katautils.NAME, katautils.NAME)
according to the Open Container Initiative (OCI).`, name, name)
var notes = fmt.Sprintf(`
NOTES:
- Commands starting "%s-" and options starting "--%s-" are `+katautils.PROJECT+` extensions.
- Commands starting "%s-" and options starting "--%s-" are `+project+` extensions.
URL:
The canonical URL for this project is: %s
`, katautils.PROJECTPREFIX, katautils.PROJECTPREFIX, katautils.PROJECTURL)
`, projectPrefix, projectPrefix, projectURL)
// kataLog is the logger used to record all messages
var kataLog *logrus.Entry
@@ -82,7 +82,7 @@ var defaultErrorFile = os.Stderr
var runtimeFlags = []cli.Flag{
cli.StringFlag{
Name: "config, kata-config",
Usage: katautils.PROJECT + " config file path",
Usage: project + " config file path",
},
cli.StringFlag{
Name: "log",
@@ -96,7 +96,7 @@ var runtimeFlags = []cli.Flag{
},
cli.StringFlag{
Name: "root",
Value: katautils.DEFAULTROOTDIRECTORY,
Value: defaultRootDirectory,
Usage: "root directory for storage of container state (this should be located in tmpfs)",
},
cli.StringFlag{
@@ -145,7 +145,7 @@ var savedCLIErrWriter = cli.ErrWriter
func init() {
kataLog = logrus.WithFields(logrus.Fields{
"name": katautils.NAME,
"name": name,
"source": "runtime",
"arch": arch,
"pid": os.Getpid(),
@@ -222,7 +222,7 @@ func beforeSubcommands(c *cli.Context) error {
var runtimeConfig oci.RuntimeConfig
var err error
katautils.SetConfigOptions(katautils.NAME, katautils.DEFAULTRUNTIMECONFIGURATION, katautils.DEFAULTSYSCONFRUNTIMECONFIGURATION)
katautils.SetConfigOptions(name, defaultRuntimeConfiguration, defaultSysConfRuntimeConfiguration)
handleShowConfig(c)
@@ -302,8 +302,8 @@ func beforeSubcommands(c *cli.Context) error {
args := strings.Join(c.Args(), " ")
fields := logrus.Fields{
"version": katautils.VERSION,
"commit": katautils.COMMIT,
"version": version,
"commit": commit,
"arguments": `"` + args + `"`,
}
@@ -365,14 +365,14 @@ func commandNotFound(c *cli.Context, command string) {
func makeVersionString() string {
v := make([]string, 0, 3)
versionStr := katautils.VERSION
versionStr := version
if versionStr == "" {
versionStr = unknown
}
v = append(v, katautils.NAME+" : "+versionStr)
v = append(v, name+" : "+versionStr)
commitStr := katautils.COMMIT
commitStr := commit
if commitStr == "" {
commitStr = unknown
}
@@ -411,7 +411,7 @@ func setCLIGlobals() {
func createRuntimeApp(ctx context.Context, args []string) error {
app := cli.NewApp()
app.Name = katautils.NAME
app.Name = name
app.Writer = defaultOutputFile
app.Usage = usage
app.CommandNotFound = runtimeCommandNotFound

@@ -57,19 +57,19 @@ var (
var testingImpl = &vcmock.VCMock{}
func init() {
if katautils.VERSION == "" {
if version == "" {
panic("ERROR: invalid build: version not set")
}
if katautils.COMMIT == "" {
if commit == "" {
panic("ERROR: invalid build: commit not set")
}
if katautils.DEFAULTSYSCONFRUNTIMECONFIGURATION == "" {
if defaultSysConfRuntimeConfiguration == "" {
panic("ERROR: invalid build: defaultSysConfRuntimeConfiguration not set")
}
if katautils.DEFAULTRUNTIMECONFIGURATION == "" {
if defaultRuntimeConfiguration == "" {
panic("ERROR: invalid build: defaultRuntimeConfiguration not set")
}
@@ -82,7 +82,7 @@ func init() {
var err error
fmt.Printf("INFO: creating test directory\n")
testDir, err = ioutil.TempDir("", fmt.Sprintf("%s-", katautils.NAME))
testDir, err = ioutil.TempDir("", fmt.Sprintf("%s-", name))
if err != nil {
panic(fmt.Sprintf("ERROR: failed to create test directory: %v", err))
}
@@ -153,8 +153,8 @@ func runUnitTests(m *testing.M) {
func TestMain(m *testing.M) {
// If the test binary name is kata-runtime.coverage, we've are being asked to
// run the coverage-instrumented kata-runtime.
if path.Base(os.Args[0]) == katautils.NAME+".coverage" ||
path.Base(os.Args[0]) == katautils.NAME {
if path.Base(os.Args[0]) == name+".coverage" ||
path.Base(os.Args[0]) == name {
main()
exit(0)
}
@@ -666,9 +666,9 @@ func TestMainBeforeSubCommandsShowCCConfigPaths(t *testing.T) {
for i, line := range lines {
switch i {
case 0:
assert.Equal(line, katautils.DEFAULTSYSCONFRUNTIMECONFIGURATION)
assert.Equal(line, defaultSysConfRuntimeConfiguration)
case 1:
assert.Equal(line, katautils.DEFAULTRUNTIMECONFIGURATION)
assert.Equal(line, defaultRuntimeConfiguration)
}
}
}
@@ -715,7 +715,7 @@ func testVersionString(assert *assert.Assertions, versionString, expectedVersion
foundCommit := false
foundOCIVersion := false
versionRE := regexp.MustCompile(fmt.Sprintf(`%s\s*:\s*%v`, katautils.NAME, expectedVersion))
versionRE := regexp.MustCompile(fmt.Sprintf(`%s\s*:\s*%v`, name, expectedVersion))
commitRE := regexp.MustCompile(fmt.Sprintf(`%s\s*:\s*%v`, "commit", expectedCommit))
ociRE := regexp.MustCompile(fmt.Sprintf(`%s\s*:\s*%v`, "OCI specs", expectedOCIVersion))
@@ -753,37 +753,37 @@ func TestMainMakeVersionString(t *testing.T) {
v := makeVersionString()
testVersionString(assert, v, katautils.VERSION, katautils.COMMIT, specs.Version)
testVersionString(assert, v, version, commit, specs.Version)
}
func TestMainMakeVersionStringNoVersion(t *testing.T) {
assert := assert.New(t)
savedVersion := katautils.VERSION
katautils.VERSION = ""
savedVersion := version
version = ""
defer func() {
katautils.VERSION = savedVersion
version = savedVersion
}()
v := makeVersionString()
testVersionString(assert, v, unknown, katautils.COMMIT, specs.Version)
testVersionString(assert, v, unknown, commit, specs.Version)
}
func TestMainMakeVersionStringNoCommit(t *testing.T) {
assert := assert.New(t)
savedCommit := katautils.COMMIT
katautils.COMMIT = ""
savedCommit := commit
commit = ""
defer func() {
katautils.COMMIT = savedCommit
commit = savedCommit
}()
v := makeVersionString()
testVersionString(assert, v, katautils.VERSION, unknown, specs.Version)
testVersionString(assert, v, version, unknown, specs.Version)
}
func TestMainMakeVersionStringNoOCIVersion(t *testing.T) {
@@ -798,7 +798,7 @@ func TestMainMakeVersionStringNoOCIVersion(t *testing.T) {
v := makeVersionString()
testVersionString(assert, v, katautils.VERSION, katautils.COMMIT, unknown)
testVersionString(assert, v, version, commit, unknown)
}
func TestMainCreateRuntimeApp(t *testing.T) {
@@ -824,7 +824,7 @@ func TestMainCreateRuntimeApp(t *testing.T) {
defaultOutputFile = savedOutputFile
}()
args := []string{katautils.NAME}
args := []string{name}
err = createRuntimeApp(context.Background(), args)
assert.NoError(err, "%v", args)
@@ -849,7 +849,7 @@ func TestMainCreateRuntimeAppInvalidSubCommand(t *testing.T) {
}()
// calls fatal() so no return
_ = createRuntimeApp(context.Background(), []string{katautils.NAME, "i-am-an-invalid-sub-command"})
_ = createRuntimeApp(context.Background(), []string{name, "i-am-an-invalid-sub-command"})
assert.NotEqual(exitStatus, 0)
}
@@ -869,7 +869,7 @@ func TestMainCreateRuntime(t *testing.T) {
savedBefore := runtimeBeforeSubcommands
savedCommands := runtimeCommands
os.Args = []string{katautils.NAME, cmd}
os.Args = []string{name, cmd}
exitFunc = func(status int) { exitStatus = status }
// disable
@@ -920,10 +920,10 @@ func TestMainVersionPrinter(t *testing.T) {
setCLIGlobals()
err = createRuntimeApp(context.Background(), []string{katautils.NAME, "--version"})
err = createRuntimeApp(context.Background(), []string{name, "--version"})
assert.NoError(err)
err = grep(fmt.Sprintf(`%s\s*:\s*%s`, katautils.NAME, katautils.VERSION), output)
err = grep(fmt.Sprintf(`%s\s*:\s*%s`, name, version), output)
assert.NoError(err)
}
@@ -968,7 +968,7 @@ func TestMainFatalWriter(t *testing.T) {
setCLIGlobals()
err := createRuntimeApp(context.Background(), []string{katautils.NAME, cmd})
err := createRuntimeApp(context.Background(), []string{name, cmd})
assert.Error(err)
re := regexp.MustCompile(
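Reviewer note: several tests in this file override a package-level variable (`version`, `commit`) and restore it with a deferred function. The sketch below isolates that save-and-restore pattern. The `withVersion` helper, the sample version, and the value of `unknown` are assumptions made for illustration only.

```go
package main

import "fmt"

// unknown is a placeholder value assumed for this sketch.
const unknown = "<<unknown>>"

// version is a "var" rather than a "const" so that tests can modify it.
var version = "2.2.3"

// versionString falls back to unknown when version is empty.
func versionString() string {
	if version == "" {
		return unknown
	}
	return version
}

// withVersion temporarily overrides version, runs fn, and restores the
// previous value even if fn panics.
func withVersion(v string, fn func()) {
	saved := version
	version = v
	defer func() { version = saved }()
	fn()
}

func main() {
	withVersion("", func() { fmt.Println(versionString()) })
	fmt.Println(versionString()) // the original value is back
}
```

Because the override touches shared state, tests written this way should not run in parallel with other tests that read the same variable.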

@@ -16,8 +16,6 @@ import (
"strings"
"github.com/blang/semver"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
)
type ReleaseCmd int
@@ -31,7 +29,7 @@ type releaseDetails struct {
const (
// A release URL is expected to be prefixed with this value
projectAPIURL = "https://api.github.com/repos/" + katautils.PROJECTORG
projectAPIURL = "https://api.github.com/repos/" + projectORG
releasesSuffix = "/releases"
downloadsSuffix = releasesSuffix + "/download"
@@ -39,12 +37,12 @@ const (
// Kata 1.x
kata1xRepo = "runtime"
kataLegacyReleaseURL = projectAPIURL + "/" + kata1xRepo + releasesSuffix
kataLegacyDownloadURL = katautils.PROJECTURL + "/" + kata1xRepo + downloadsSuffix
kataLegacyDownloadURL = projectURL + "/" + kata1xRepo + downloadsSuffix
// Kata 2.x or newer
kata2xRepo = "kata-containers"
kataReleaseURL = projectAPIURL + "/" + kata2xRepo + releasesSuffix
kataDownloadURL = katautils.PROJECTURL + "/" + kata2xRepo + downloadsSuffix
kataDownloadURL = projectURL + "/" + kata2xRepo + downloadsSuffix
// Environment variable that can be used to override a release URL
ReleaseURLEnvVar = "KATA_RELEASE_URL"
@@ -379,7 +377,7 @@ func HandleReleaseVersions(cmd ReleaseCmd, currentVersion string, includeAll boo
currentSemver, err := semver.Make(currentVersion)
if err != nil {
return fmt.Errorf("BUG: Current version of %s (%s) has invalid SemVer version: %v", katautils.NAME, currentVersion, err)
return fmt.Errorf("BUG: Current version of %s (%s) has invalid SemVer version: %v", name, currentVersion, err)
}
releaseURL, err := getReleaseURL(currentSemver)

@@ -12,7 +12,6 @@ import (
"testing"
"github.com/blang/semver"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
"github.com/stretchr/testify/assert"
)
@@ -21,7 +20,7 @@ var expectedReleasesURL string
func init() {
var err error
currentSemver, err = semver.Make(katautils.VERSION)
currentSemver, err = semver.Make(version)
if err != nil {
panic(fmt.Sprintf("failed to create semver for testing: %v", err))
@@ -308,7 +307,7 @@ func TestDownloadURLIsValid(t *testing.T) {
{"foo", true},
{"foo bar", true},
{"https://google.com", true},
{katautils.PROJECTURL, true},
{projectURL, true},
{validKata1xDownload, false},
{validKata2xDownload, false},
}

@@ -1,32 +0,0 @@
// Copyright (c) 2018 HyperHQ Inc.
//
// SPDX-License-Identifier: Apache-2.0
//
package main
import (
"fmt"
"os"
shimapi "github.com/containerd/containerd/runtime/v2/shim"
shim "github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
"github.com/kata-containers/kata-containers/src/runtime/pkg/types"
)
func shimConfig(config *shimapi.Config) {
config.NoReaper = true
config.NoSubreaper = true
}
func main() {
if len(os.Args) == 2 && os.Args[1] == "--version" {
fmt.Printf("%s containerd shim: id: %q, version: %s, commit: %v\n", katautils.PROJECT, types.DefaultKataRuntimeName, katautils.VERSION, katautils.COMMIT)
os.Exit(0)
}
shimapi.Run(types.DefaultKataRuntimeName, shim.New, shimConfig)
}

@@ -10,15 +10,8 @@ package containerdshim
import (
"context"
"fmt"
"github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/rootless"
"math/rand"
"os"
"os/user"
"path"
"path/filepath"
"strconv"
"syscall"
containerd_types "github.com/containerd/containerd/api/types"
"github.com/containerd/containerd/mount"
@@ -110,12 +103,6 @@ func create(ctx context.Context, s *service, r *taskAPI.CreateTaskRequest) (*con
}()
katautils.HandleFactory(ctx, vci, s.config)
rootless.SetRootless(s.config.HypervisorConfig.Rootless)
if rootless.IsRootless() {
if err := configureNonRootHypervisor(s.config); err != nil {
return nil, err
}
}
// Pass service's context instead of local ctx to CreateSandbox(), since local
// ctx will be canceled after this rpc service call, but the sandbox will live
@@ -272,112 +259,3 @@ func doMount(mounts []*containerd_types.Mount, rootfs string) error {
}
return nil
}
func configureNonRootHypervisor(runtimeConfig *oci.RuntimeConfig) error {
userName, err := createVmmUser()
if err != nil {
return err
}
defer func() {
if err != nil {
removeVmmUser(userName)
}
}()
u, err := user.Lookup(userName)
if err != nil {
return err
}
uid, err := strconv.Atoi(u.Uid)
if err != nil {
return err
}
gid, err := strconv.Atoi(u.Gid)
if err != nil {
return err
}
runtimeConfig.HypervisorConfig.Uid = uint32(uid)
runtimeConfig.HypervisorConfig.Gid = uint32(gid)
userTmpDir := path.Join("/run/user/", fmt.Sprint(uid))
dir, err := os.Stat(userTmpDir)
if os.IsNotExist(err) {
if err = os.Mkdir(userTmpDir, vc.DirMode); err != nil {
return err
}
defer func() {
if err != nil {
if err = os.RemoveAll(userTmpDir); err != nil {
shimLog.WithField("userTmpDir", userTmpDir).WithError(err).Warn("failed to remove userTmpDir")
}
}
}()
if err = syscall.Chown(userTmpDir, uid, gid); err != nil {
return err
}
}
if dir != nil && !dir.IsDir() {
return fmt.Errorf("%s is expected to be a directory", userTmpDir)
}
if err := os.Setenv("XDG_RUNTIME_DIR", userTmpDir); err != nil {
return err
}
info, err := os.Stat("/dev/kvm")
if err != nil {
return err
}
if stat, ok := info.Sys().(*syscall.Stat_t); ok {
// Add the kvm group to the hypervisor supplemental group so that the hypervisor process can access /dev/kvm
runtimeConfig.HypervisorConfig.Groups = append(runtimeConfig.HypervisorConfig.Groups, stat.Gid)
return nil
}
return fmt.Errorf("failed to get the gid of /dev/kvm")
}
func createVmmUser() (string, error) {
var (
err error
userName string
)
useraddPath, err := utils.FirstValidExecutable([]string{"/usr/sbin/useradd", "/sbin/useradd", "/bin/useradd"})
if err != nil {
return "", err
}
nologinPath, err := utils.FirstValidExecutable([]string{"/usr/sbin/nologin", "/sbin/nologin", "/bin/nologin"})
if err != nil {
return "", err
}
// Add retries to mitigate temporary errors and race conditions. For example, the user already exists
// or another instance of the runtime is also creating a user.
maxAttempt := 5
for i := 0; i < maxAttempt; i++ {
userName = fmt.Sprintf("kata-%v", rand.Intn(100000))
_, err = utils.RunCommand([]string{useraddPath, "-M", "-s", nologinPath, userName, "-c", "\"Kata Containers temporary hypervisor user\""})
if err == nil {
return userName, nil
}
shimLog.WithField("attempt", i+1).WithField("username", userName).
WithError(err).Warn("failed to add user, will try again")
}
return "", fmt.Errorf("could not create VMM user: %v", err)
}
func removeVmmUser(user string) {
userdelPath, err := utils.FirstValidExecutable([]string{"/usr/sbin/userdel", "/sbin/userdel", "/bin/userdel"})
if err != nil {
shimLog.WithField("username", user).WithError(err).Warn("failed to remove user")
}
// Add retries to mitigate temporary errors and race conditions.
for i := 0; i < 5; i++ {
_, err := utils.RunCommand([]string{userdelPath, "-f", user})
if err == nil {
return
}
shimLog.WithField("username", user).WithField("attempt", i+1).WithError(err).Warn("failed to remove user")
}
}

@@ -183,8 +183,13 @@ func (s *service) mountPprofHandle(m *http.ServeMux, ociSpec *specs.Spec) {
m.Handle("/debug/pprof/trace", http.HandlerFunc(pprof.Trace))
}
// SocketAddress returns the address of the abstract domain socket for communicating with the
// GetSandboxesStoragePath returns the storage path where sandboxes info are stored
func GetSandboxesStoragePath() string {
return "/run/vc/sbs"
}
// SocketAddress returns the address of the unix domain socket for communicating with the
// shim management endpoint
func SocketAddress(id string) string {
return fmt.Sprintf("unix://%s", filepath.Join(string(filepath.Separator), "run", "vc", "sbs", id, "shim-monitor.sock"))
return fmt.Sprintf("unix://%s", filepath.Join(string(filepath.Separator), GetSandboxesStoragePath(), id, "shim-monitor.sock"))
}
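Reviewer note: both versions of `SocketAddress` should produce the same string, because `filepath.Join` cleans the duplicate separator that arises when a leading `/` is joined with an absolute path. The sketch below checks this on Linux using lower-case copies of the two variants. The function names and the sample ID are made up for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sandboxesStoragePath mirrors the value returned by GetSandboxesStoragePath.
const sandboxesStoragePath = "/run/vc/sbs"

// socketAddressSplit builds the path from separate components.
func socketAddressSplit(id string) string {
	return fmt.Sprintf("unix://%s", filepath.Join(string(filepath.Separator), "run", "vc", "sbs", id, "shim-monitor.sock"))
}

// socketAddressHelper builds the path from the shared storage path.
func socketAddressHelper(id string) string {
	return fmt.Sprintf("unix://%s", filepath.Join(string(filepath.Separator), sandboxesStoragePath, id, "shim-monitor.sock"))
}

func main() {
	id := "abc123" // hypothetical sandbox ID
	fmt.Println(socketAddressSplit(id))
	fmt.Println(socketAddressHelper(id) == socketAddressSplit(id))
}
```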
