Compare commits

..

114 Commits
2.4.3 ... 2.2.3

Author SHA1 Message Date
Archana Shinde
3a1804cd73 Merge pull request #2975 from bergwolf/2.2.3-branch-bump
# Kata Containers 2.2.3
2021-11-05 04:31:27 -07:00
Peng Tao
b7493fd5d5 release: Kata Containers 2.2.3
ad45107a2 release: Kata Containers 2.2.3
4f73e58d7 packaging/static-build: s390x fixes
45f65a73c agent: Handle uevent remove actions
06d304934 agent: fix race condition when test watcher
0366f6e81 template: disable template unit test on arm
7cb650abc runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
e97cd23bd runtime: current vcpu number should be limited
6b6d81cce runtime: kernel version with '+' as suffix panic in parse
a479eca7d docs: Fix outdated links
b794a3940 virtcontainers: clh: Re-generate the client code
39d95f486 versions: Upgrade to Cloud Hypervisor v19.0

Depends-on: github.com/kata-containers/tests#4155
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-11-05 15:10:01 +08:00
Chelsea Mafrica
63ecbcf14b Merge pull request #2971 from wainersm/stable-2.2_image-builder-fix
stable-2.2 | osbuilder: build image-builder image from Fedora 34
2021-11-04 21:37:50 -07:00
Jakob Naucke
4f73e58d73 packaging/static-build: s390x fixes
- Install OpenSSL for key generation in kernel build
- Do not install libpmem
- Do not exclude `*/share/*/*.img` files in QEMU tarball since among
  them are boot loader files critical for IPLing.

Fixes: #2895
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-11-05 11:35:54 +08:00
Haitao Li
45f65a73c8 agent: Handle uevent remove actions
uevents with action=remove was ignored causing the agent to reuse stale
data in the device map. This patch adds handling of such uevents.

Fixes #2405

Signed-off-by: Haitao Li <lihaitao@gmail.com>
2021-11-05 11:35:34 +08:00
Jianyong Wu
06d3049349 agent: fix race condition when test watcher
create_tmpfs won't pass as the race condition in watcher umount. quote
James's words here:

1. Rust runs all tests in parallel.
2. Mounts are a process-wide, not a per-thread resource.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.

To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:52 +08:00
Jianyong Wu
0366f6e817 template: disable template unit test on arm
Template is broken on arm. here we disable the template unit test
temporarily.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:35 +08:00
Jianyong Wu
7cb650abcf runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:22 +08:00
Jianyong Wu
e97cd23bd6 runtime: current vcpu number should be limited
The physical current vcpu number should not be used directly as the
largest vcpu number is limited to defaultMaxQemuVCPUs.
Here, a new helper is introduced in pkg/katautils/config.go to get
current vcpu number.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:16 +08:00
Jianyong Wu
6b6d81cced runtime: kernel version with '+' as suffix panic in parse
The current kernel version parse lib can't process suffix '+', as the
modified kernel version will add '+' as suffix, thus panic will occur.

For example, if the current kernel version is "5.14.0-rc4+", test
TestHostNetworkingRequested will panic:
--- FAIL: TestHostNetworkingRequested (0.00s)
panic: &{DistroName:ubuntu DistroVersion:18.04
KernelVersion:5.11.0-rc3+ Issue: Passed:[] Failed:[] Debug:true
ActualEUID:0}: failed to check test constraints: error: Build meta data
is empty

Here, remove the suffix '+' in kernel version fix helper.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-11-05 11:34:08 +08:00
Binbin Zhang
a479eca7de docs: Fix outdated links
fix outdated links which were checked out by workflow/docs-url-alive-check

Fixes #2630

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-11-05 11:31:44 +08:00
Wainer dos Santos Moschetta
ee3bf4a411 osbuilder: build image-builder image from Fedora 34
Currently the image-builder image is built from `fedora:latest` and
this is error-prone as any update of the base image can lead to
breakage. Instead let's create the image from Fedora 34, which is the
last known version to build fine.

Fixes #2960
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit a239a38f45)
2021-11-04 13:35:32 -04:00
James O. D. Hunt
4443a982e6 Merge pull request #2888 from likebreath/1022/backport_clh_v19.0_seccomp
stable-2.2 | versions: Upgrade to Cloud Hypervisor v19.0
2021-10-25 10:49:39 +01:00
Bo Chen
b794a39401 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v19.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 8030b6caf0)
2021-10-22 16:39:03 -07:00
Bo Chen
39d95f486b versions: Upgrade to Cloud Hypervisor v19.0
Highlights from the Cloud Hypervisor release v19.0: 1) Improved PTY
handling for serial and virtio-console; 2) PCI boot time optimisations;
3) Improved TDX support; 4) Live migration enhancements (support with
virtio-mem and virtio-balloon); 5) virtio-mem support with vfio-user; 6)
AArch64 for virtio-iommu; 7) Various bug fixes for live-migration and
VFIO passthrough.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v19.0

Fixes: #2871

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 8296754e07)
2021-10-22 16:39:03 -07:00
Peng Tao
aa40324c52 Merge pull request #2841 from fidencio/2.2.2-branch-bump
# Kata Containers 2.2.2
2021-10-14 19:29:27 +08:00
Fabiano Fidêncio
9053137592 release: Kata Containers 2.2.2
- stable-2.2 | Backport #2821 and #2769
- Backport runtime: Fix !x86 static checks
- stable-2.2 | agent: exec should inherit container process capabilities
- stable-2.2 | vendor: Update containerd to v1.5.7
- stable-2.2 | fc: fix version parsing for fc >= 0.25
- [backport] kata-monitor: cache improvements

eea2c019 virtcontainers: clh: Use 'quiet' as the default kernel parameter
1e798b96 virtcontainers: clh: Turn-off serial and virtio-console by default
53c4492f agent: netlink: Use the grpc IP family field when updating the route
893623df runtime: Pass the route IP family to the agent
503ce9c1 agent: protos: Add a Family field to the Route payload
9932e76f runtime: vendor: Bump the netlink package dependency
0034f40b agent: exec should inherit container process capabilities
1f6b0f65 protection: add confidential compute frame for arm
112e0f63 check: fix typecheck failure in qemu_arm64_test.go
18820e31 virtcontainers: fix lint failure on ppc64le
8fafced9 virtcontainers: nolint guestProtection
9668095a runtime: Fix field alignment on s390x
3e145ea9 vendor: Update containerd to v1.5.7
79e0754a fc: fix version parsing for fc >= 0.25
b8fc1af3 runtime: set the sandbox storage path static
97167ccd runtime: rename GetSanboxesStoragePath() --> GetSandboxesStoragePath()
b0aca51e kata-monitor: bump version to 0.2.0
28873c4d kata-monitor: refresh kata sandbox list on fs events
3525a2ed kata-monitor: improve detection of kata workloads
30d07d44 kata-monitor: add getSandboxFS()
623b1082 runtime: add GetSandboxesStoragePath()
fc1822f0 kata-monitor: improve sandbox caching
ba6ad1c8 kata-monitor: warn when unable to retrive the lower level runtime
22d3df91 kata-monitor: minor fixes

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-14 09:58:00 +02:00
Fabiano Fidêncio
c4e8e86acf Merge pull request #2839 from fidencio/wip/stable-2.2-backport-2821-and-2769
stable-2.2 | Backport #2821 and #2769
2021-10-14 09:57:02 +02:00
Bo Chen
eea2c0195f virtcontainers: clh: Use 'quiet' as the default kernel parameter
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.

Fix: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 7b2bfd4eca)
2021-10-14 08:53:50 +02:00
Bo Chen
1e798b96fd virtcontainers: clh: Turn-off serial and virtio-console by default
We will need to have console output from the guest only for debugging
purposes. As a result, we can turn-off both the serial and
virtio-console devices by default for better boot time.

Fixes: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 3e24e46c70)
2021-10-14 08:53:44 +02:00
Samuel Ortiz
53c4492fb3 agent: netlink: Use the grpc IP family field when updating the route
Not all routes have either a gateway or a destination IP.
Interface routes, where the source, destination and gateway are undefined,
will default to IP v4 with the current is_ipv6() check even when they
are v6 routes.

We use the provided gRPC Route.Family field instead. This field is built
from the host netlink messages, and is a reliable way of finding out
a route's IP family.

Fixes: #2768

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit a44cde7e8d)
2021-10-14 08:53:10 +02:00
Samuel Ortiz
893623dfbc runtime: Pass the route IP family to the agent
When updating the guest routing table, we should forward the IP family
information up to the guest.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit 71ce6cfe9e)
2021-10-14 08:53:06 +02:00
Samuel Ortiz
503ce9c154 agent: protos: Add a Family field to the Route payload
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit 99450bd1f7)
2021-10-14 08:53:01 +02:00
Samuel Ortiz
9932e76f27 runtime: vendor: Bump the netlink package dependency
We need to be able to get the IP family from the netlink route meesages,
and the Route.Family field only got recently added to the netlink
package.

The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
(cherry picked from commit f85fe70231)
2021-10-14 08:52:47 +02:00
GabyCT
3a035c1f43 Merge pull request #2831 from Jakob-Naucke/backport-!x86-static
Backport runtime: Fix !x86 static checks
2021-10-13 13:35:48 -05:00
Eric Ernst
4102a18aa1 Merge pull request #2832 from bergwolf/capability-fix-for-2.2
stable-2.2 | agent: exec should inherit container process capabilities
2021-10-13 10:22:27 -07:00
Peng Tao
0034f40b67 agent: exec should inherit container process capabilities
Otherwise rustjail would not set its capabilities and it ends up getting
all capabilities.

Fixes: #2828
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-10-13 17:43:08 +08:00
Jianyong Wu
1f6b0f651e protection: add confidential compute frame for arm
Even CCA, which is the confidential compute archtecture, has not been
ready, add a empty implementation to avoid static check error.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-13 11:06:42 +02:00
Jianyong Wu
112e0f6381 check: fix typecheck failure in qemu_arm64_test.go
fix typecheck failure in qemu_arm64_test.go

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-13 11:06:42 +02:00
Amulya Meka
18820e31d9 virtcontainers: fix lint failure on ppc64le
Add nolint for arch specific code to exclude
from lint check.

Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
2021-10-13 11:06:42 +02:00
Jakob Naucke
8fafced9ff virtcontainers: nolint guestProtection
Exclude from lint checking for it is ultimately only used in
architecture-specific code.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-13 11:06:41 +02:00
Jakob Naucke
9668095abd runtime: Fix field alignment on s390x
Follow-up of #2237 for s390x -- field alignment isn't always minimal

Fixes: #2830
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-13 11:06:41 +02:00
Chelsea Mafrica
be51808a13 Merge pull request #2803 from fidencio/wip/stable-2.2-upgrade-vendored-containerd
stable-2.2 | vendor: Update containerd to v1.5.7
2021-10-06 18:06:44 -07:00
Fabiano Fidêncio
3e145ea94c vendor: Update containerd to v1.5.7
Bump containerd to v1.5.7 in order to bring in a fix for CVE-2021-41103,
"insufficiently restricted permissions on plugins directories
(GHSA-c2h3-6mxw-7mvq)".

dependabot found a potential security vulnerability and raised a PR to
fix it.  However, dependabot does not properly follows nor understands
the needed of our CIs (mainly related to formatting the PR and whatnot),
thus I'm re-raising it.

Fixes: #2796
Backports: #2797

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-06 21:08:37 +02:00
Chelsea Mafrica
3951834565 Merge pull request #2800 from fidencio/wip/stable-2.2-backport-fix-for-parsing-firecracker-version-if-it-is-0-25-or-over
stable-2.2 | fc: fix version parsing for fc >= 0.25
2021-10-06 09:50:59 -07:00
Bl1tz23
79e0754a7b fc: fix version parsing for fc >= 0.25
Allows to use firecracker version >=0.25.

Fixes: #2471

Signed-off-by: Bl1tz23 <alex3angle@gmail.com>
(cherry picked from commit 87bbae1bd7)
2021-10-06 17:27:22 +02:00
snir911
afe6005785 Merge pull request #2717 from fgiudici/stable-2.2_kata-monitor
[backport] kata-monitor: cache improvements
2021-10-03 18:45:01 +03:00
Francesco Giudici
b8fc1af363 runtime: set the sandbox storage path static
Since we now have "unix://" kind of socket returned by the
SocketAddress() function, there is no more need to build the sandbox
storage path dynamically to keep OS compatibility.

Fixes: #2738
Suggested-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 2304a59601)
2021-09-30 11:48:53 +02:00
Francesco Giudici
97167ccddd runtime: rename GetSanboxesStoragePath() --> GetSandboxesStoragePath()
Add the missing 'd'.

Fixes: #2738
Suggested-by: Jakob Naucke <jakob.naucke@ibm.com>
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 315295e0ef)
2021-09-30 11:48:09 +02:00
Fabiano Fidêncio
af0fbb9460 Merge pull request #2723 from fidencio/2.2.1-branch-bump
# Kata Containers 2.2.1
2021-09-25 00:02:01 +02:00
Fabiano Fidêncio
bc48a58806 Merge pull request #2731 from fidencio/wip/stable-2.2-release-fix-using-vendored-sources
stable-2.2 | workflows: Fix the config file path for using vendored sources
2021-09-25 00:01:43 +02:00
Fabiano Fidêncio
d581cdab4e Merge pull request #2728 from fidencio/wip/stable-2.2-fix-wrong-tags-attribution
stable-2.2 | workflows: Fix tag attribution
2021-09-24 23:01:18 +02:00
Fabiano Fidêncio
52fdfc4fed workflows: Fix the config file path for using vendored sources
There's a typo in the file that should receive the output of `cargo
vendor`.  We should use forward the output to `.cargo/config` instead of
`.cargo/vendor`.

This was introduced by 21c8511630.

Backports: #2730
Fixes: #2729

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit a525991c2c)
2021-09-24 20:29:15 +02:00
Fabiano Fidêncio
8d98e01414 workflows: Fix tag attribution
While releasing kata-containers 2.3.0-alpha1 we've hit some issues as
the tags attribution is done incorrectly.  We want an array of tags to
iterate over, but the currently code is just lost is the parenthesis.

This issue was introduced in a156288c1f.

Fixes: #2725

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 39dcbaa672)
2021-09-24 20:07:55 +02:00
Fabiano Fidêncio
688cc8e2bd release: Kata Containers 2.2.1
- stable-2.2 | watcher: ensure we create target mount point for storage
- stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
- [backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
- stable-2.2 | kata-deploy: Also provide "stable" & "latest" tags
- stable-2.2 | runtime: tracing: Fix logger passed in newContainer
- stable-2.2 | runtime: tracing: Use root context to stop tracing
- packaging: Backport QEMU's GitLab switch to 5.1.x
- stable-2.2 | workflows,release: Upload the vendored cargo code
- backport: Call agent shutdown test only in the correspondent CI_JOB
- packaging: Backport QEMU's switch to GitLab repos
- stable-2.2 | virtcontainers: fc: parse vcpuID correctly
- shimv2: Backport fixes for #2527
- backport-2.2: remove default config for arm64.
- stable-2.2 | versions: Upgrade to Cloud Hypervisor v18.0
- [backport]sandbox: Add device permissions such as /dev/null to cgroup
- [backport] runtime: Fix README link
- [backport] snap: Test variable instead of executing "branch"

d9b41fc5 watcher: ensure we create target mount point for storage
2b6327ac kata-deploy: Add more info about the stable tag
5256e085 kata-deploy: Improve README
02b46268 kata-deploy: Remove qemu-virtiofs runtime class
1b3058dd release: update the kata-deploy yaml files accordingly
98e2e935 kata-deploy: Add "stable" info to the README
8f25c7da kata-deploy: Update the README
84da2f8d workflows: Add "stable" & "latest" tags to kata-deploy
5c76f1c6 packaging: Backport QEMU's GitLab switch to 5.1.x
ba6fc328 packaging: Backport QEMU's switch to GitLab repos
d5f5da43 workflows,release: Upload the vendored cargo code
017cd3c5 ci: Call agent shutdown test only in the correspondent CI_JOB
2ca867da runtime: Add container field to logs
f4da502c shimv2: add information to method comment
16164241 shimv2: add logging to shimv2 api calls
25c7e118 virtiofs: Create shared directory with 0700 mode, not 0750
4c5bf057 virtcontainers: fc: parse vcpuID correctly
b3e620db runtime: tracing: Fix logger passed in newContainer
98c2ca13 runtime: tracing: Use root context to stop tracing
0481c507 backport-2.2: remove default config for arm64.
56920bc9 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
a1874ccd virtcontainers: clh: Revert the workaround incorrect default values
c2c65050 virtcontainers: clh: Re-generate the client code
7ee43f94 versions: Upgrade to Cloud Hypervisor v18.0
1792a9fe runtime: Fix README link
807cc8a3 sandbox: Add device permissions such as /dev/null to cgroup
5987f3b5 snap: Test variable instead of executing "branch"

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-09-24 12:34:35 +02:00
Fabiano Fidêncio
ebc23df752 Merge pull request #2714 from egernst/watcher-fixup-backport
stable-2.2 | watcher: ensure we create target mount point for storage
2021-09-24 09:32:29 +02:00
Francesco Giudici
b0aca51eac kata-monitor: bump version to 0.2.0
We now support any container engine CRI compliant. Let's bump the
kata-monitor version to 0.2.0.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 8b0bc1f45e)
2021-09-24 09:25:13 +02:00
Francesco Giudici
28873c4d75 kata-monitor: refresh kata sandbox list on fs events
This commit stops the container engine polling in favor of
the kata sandbox storage path monitoring.
The pod cache list is now refreshed based on fs events and synced with
the container engine only when needed.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit bfb556d56a)
2021-09-24 09:25:00 +02:00
Francesco Giudici
3525a2ed03 kata-monitor: improve detection of kata workloads
When the container engine is different than containerd or CRI-O we
lack proper detection of kata workloads and consider all the pods as
kata ones.
Instead of querying the container engine for the lower level runtime
used in each pod, check if a directory matching the pod exists in
the virtualcontainers sandboxes storage path.
This provides a container engine independent way to check for kata pods.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 0e854f3b80)
2021-09-24 09:24:17 +02:00
Francesco Giudici
30d07d4407 kata-monitor: add getSandboxFS()
Retrieve the absolute sandbox storage path. We will soon need this to
monitor the creation/deletion of new kata sandboxes.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit afad910d0e)
2021-09-24 09:24:03 +02:00
Francesco Giudici
623b108227 runtime: add GetSandboxesStoragePath()
The storage path we use to collect the sandbox files is defined in the
virtcontainers/persist/fs package.
We create the runtime socket in that storage path, by hardcoding the
full path in the SocketAddress() function in the runtime package.
This commit splits the hardcoded path by the socket address path so that
the runtime package will be able to provide the storage path to all the
components that may need it.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit e38686f74d)
2021-09-24 09:23:47 +02:00
Francesco Giudici
fc1822f094 kata-monitor: improve sandbox caching
In order to retrieve the list of sandboxes, we poll the container engine
every 15 seconds via the CRI. Once we have the list we have to inspect
each pod to find out the kata ones.
This commit extend the sandbox cache to keep track of all the pods,
marking the kata ones, so that during the next polling only the new
sandboxes should be inspected to figure out which ones are using the
kata runtime.

Fixes: #2563
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 245a12bbb7)
2021-09-24 09:23:33 +02:00
Francesco Giudici
ba6ad1c804 kata-monitor: warn when unable to retrive the lower level runtime
this is an unexpected event (likely a change in how containerd/cri-o
record the lower level runtime in the pod) and should be more visible:
raise the log level to "warning".

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit fc067d61d4)
2021-09-24 09:21:10 +02:00
Francesco Giudici
22d3df9141 kata-monitor: minor fixes
fix comment and use literals

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 53ec4df953)
2021-09-24 09:19:56 +02:00
Fabiano Fidêncio
e58fabfc20 Merge pull request #2598 from c3d/backport/2589-virtiofsd-perms-perms
stable-2.2 | virtiofs: Create shared directory with 0700 mode, not 0750
2021-09-24 09:16:59 +02:00
Peng Tao
feb06dad8a Merge pull request #2623 from Bevisy/stable-2.2-2615-bp
[backport]sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
2021-09-24 14:04:36 +08:00
Eric Ernst
d9b41fc583 watcher: ensure we create target mount point for storage
We would only create the target when updating files. We need to make
sure that we create the target if the source is a directory. Without
this, we'll fail to start a container that utilizes an empty configmap,
for example.

Add unit tests for this.

Fixes: #2638

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-23 15:45:57 -07:00
Julio Montes
7852b9f8e1 Merge pull request #2711 from fidencio/wip/stable-2.2-kata-deploy-use-stable-and-latest-tags
stable-2.2 | kata-deploy: Also provide "stable" & "latest" tags
2021-09-23 12:18:00 -05:00
Chelsea Mafrica
83f219577d Merge pull request #2668 from cmaf/tracing-newContainer-logger-bp-2.2
stable-2.2 | runtime: tracing: Fix logger passed in newContainer
2021-09-23 09:58:14 -07:00
Chelsea Mafrica
97421afe17 Merge pull request #2664 from cmaf/tracing-stop-rootctx-bp-2.2
stable-2.2 | runtime: tracing: Use root context to stop tracing
2021-09-23 09:57:57 -07:00
Fabiano Fidêncio
2b6327ac37 kata-deploy: Add more info about the stable tag
Let's make it as clear as possible for the user that if they go for a
tagged version of kata-deploy, eg, 2.2.1, they'll have the kata runtime
2.2.1 deployed on their cluster.

Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 3bdcfaa658)
2021-09-23 14:05:17 +02:00
Fabiano Fidêncio
5256e0852c kata-deploy: Improve README
Let's add more instructions in the README in order to make clear to the
reader what they can do to check whether kata-deploy is ready, or
whether they have to wait till proceeding with the next instruction.

Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 41c590fa0a)
2021-09-23 14:04:57 +02:00
Fabiano Fidêncio
02b46268f4 kata-deploy: Remove qemu-virtiofs runtime class
There's only one QEMU runtime class deployed as part of kata-deploy, and
that includes virtiofs support (which is the default for quite some time
already).  Knowing this, let's just remove the `qemu-virtiofs` runtime
class definition.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit debf3c9fe9)
2021-09-23 14:04:50 +02:00
Fabiano Fidêncio
1b3058dd24 release: update the kata-deploy yaml files accordingly
Let's teach our `update-repository-version.sh` script to properly update
the kata-deploy tags on both kata-deploy and kata-cleanup yaml files.

The 3 scenarios that we're dealing with, based on which branch we're
targetting, are:
```
 1) [main] ------> [main]        NO-OP
   "alpha0"       "alpha1"

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "latest"       | "latest"       |
  -----------------+----------------+----------------+
  kata-deploy-base | "stable        | "stable"       |
  -----------------+----------------+----------------+

 2) [main] ------> [stable] Update kata-deploy and
   "alpha2"         "rc0"   get rid of kata-deploy-base

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "latest"       | "rc0"          |
  -----------------+----------------+----------------+
  kata-deploy-base | "stable"       | REMOVED        |
  -----------------+----------------+----------------+

 3) [stable] ------> [stable]    Update kata-deploy
    "x.y.z"         "x.y.(z+1)"

                   +----------------+----------------+
                   |      from      |       to       |
  -----------------+----------------+----------------+
  kata-deploy      | "x.y.z"        | "x.y.(z+1)"    |
  -----------------+----------------+----------------+
  kata-deploy-base | NON-EXISTENT   | NON-EXISTENT   |
  -----------------+----------------+----------------+
```

And we can easily cover those 3 cases only with the information about
the "${target_branch}" and the "${new_version}", where:
* case 1) if "${target_branch}" is "main" *and* "${new_version}"
  contains "alpha", do nothing
* case 2) if "${target_branch}" is "main" *and* "${new_version}"
  contains "rc":
  * change the kata-deploy & kata-cleanup tags from "latest" to
    "${new_version}".
  * delete the kata-deploy-stable & kata-cleanup-stable files.
* case 3) if the "${target_branch}" contains "stable":
  * change the kata-deploy & kata-cleanup tags from "${current_version}"
    to "${new_version}".

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 43a72d76e2)
2021-09-23 14:04:44 +02:00
Fabiano Fidêncio
98e2e93552 kata-deploy: Add "stable" info to the README
Similar to the instructions we have for the "latest" images, let's also
add instructions about the "stable" images.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit ea9b2f9c92)
2021-09-23 14:04:38 +02:00
Fabiano Fidêncio
8f25c7da11 kata-deploy: Update the README
Let's just point to our repo URLs rather than assume users using
kata-deploy will have our repo cloned.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit e541105680)
2021-09-23 14:04:29 +02:00
Fabiano Fidêncio
84da2f8ddc workflows: Add "stable" & "latest" tags to kata-deploy
When releasing a tarball, let's *also* add the "stable" & "latest" tags
to the kata-deploy image.

The "stable" tag refers to any official release, while the "latest" tag
refers to any pre-release / release candidate.

Fixes: #2302

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit a156288c1f)
2021-09-23 14:01:33 +02:00
Fabiano Fidêncio
de0e3915b7 Merge pull request #2702 from Jakob-Naucke/backport-qemu-gitlab
packaging: Backport QEMU's GitLab switch to 5.1.x
2021-09-23 12:59:17 +02:00
Jakob Naucke
5c76f1c65a packaging: Backport QEMU's GitLab switch to 5.1.x
This brings #2699 to 5.1.x for ARM. Add a `no_patches.txt` for 5.1.0
which was missing apparently.

Fixes: #2701
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-09-23 11:11:45 +02:00
Fabiano Fidêncio
522a53010c Merge pull request #2690 from fidencio/wip/stable-2.2-upload-cargo-vendored-tarball
stable-2.2 | workflows,release: Upload the vendored cargo code
2021-09-22 22:07:08 +02:00
Julio Montes
852fc53351 Merge pull request #2688 from GabyCT/shutdown
backport: Call agent shutdown test only in the correspondent CI_JOB
2021-09-22 09:53:14 -05:00
Julio Montes
e0a27b5e90 Merge pull request #2699 from Jakob-Naucke/backport-qemu-gitlab
packaging: Backport QEMU's switch to GitLab repos
2021-09-22 09:19:16 -05:00
Jakob Naucke
ba6fc32804 packaging: Backport QEMU's switch to GitLab repos
QEMU's submodule checkout from git.qemu.org can fail. On QEMU 6.x, this
is not a problem because they moved to GitLab. However, we use QEMU 5.2
on stable-2.2, which can be a problem when no cached QEMU is used.
Backport QEMU's switch.

Fixes: #2698
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-09-22 14:59:35 +02:00
Fabiano Fidêncio
d5f5da4323 workflows,release: Upload the vendored cargo code
As part of the release, let's also upload a tarball with the vendored
cargo code.  By doing this we allow distros, which usually don't have
access to the internet while performing the builds, to just add the
vendored code as a second source, making the life of the downstream
maintainers slightly easier*.

Fixes: #1203
Backports: #2573

*: The current workflow requires the downstream maintainer to download
the tarball, unpack it, run `cargo vendor`, create the tarball, etc.
Although this doesn't look like a ridiculous amount of work, it's better
if we can have it in an automated fashion.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
(cherry picked from commit 21c8511630)
2021-09-21 21:48:58 +02:00
Gabriela Cervantes
017cd3c53c ci: Call agent shutdown test only in the correspondent CI_JOB
The agent shutdown test should only run on the CI JOB of CRI_CONTAINERD_K8S_MINIMAL
which is the only one where testing tracing is being enabled, however, this
test is being triggered in multiple CI jobs where it should not run. This PR
fixes that issue.

Fixes #2683

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2021-09-21 17:01:09 +00:00
Chelsea Mafrica
484af1a559 Merge pull request #2678 from nubificus/stable-2.2-fix_fc_vcpu_thread
stable-2.2 | virtcontainers: fc: parse vcpuID correctly
2021-09-20 09:46:07 -07:00
Chelsea Mafrica
a572a6ebf8 Merge pull request #2679 from c3d/backport/2527-adding-debugging-msgs
shimv2: Backport fixes for #2527
2021-09-20 09:42:53 -07:00
Snir Sheriber
2ca867da7b runtime: Add container field to logs
and unified field naming

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 0c7789fad6
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:04:09 +02:00
Snir Sheriber
f4da502c4f shimv2: add information to method comment
add a comment to explicitly mentioned method is a binary call

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 72e3538e36
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:03:45 +02:00
Snir Sheriber
16164241df shimv2: add logging to shimv2 api calls
and also fetch and log container id from the request

Fixes: #2527
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>

Backport from commit 8dadca9cd1
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 11:02:35 +02:00
Christophe de Dinechin
25c7e1181a virtiofs: Create shared directory with 0700 mode, not 0750
A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no real good reason for a non-root user to access
the shared directory, and this is potentially dangerous.

Fixes: #2589

[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 10:54:18 +02:00
Anastassios Nanos
4c5bf0576b virtcontainers: fc: parse vcpuID correctly
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any go variant of string to int
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".

This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.

Fixes: #2592

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-09-18 08:10:13 +00:00
Chelsea Mafrica
b3e620dbcf runtime: tracing: Fix logger passed in newContainer
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.

The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.

Backport of #2666
Fixes #2667

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 16:30:29 -07:00
Chelsea Mafrica
98c2ca13c1 runtime: tracing: Use root context to stop tracing
Call StopTracing with s.rootCtx, which is the root context for tracing,
instead of s.ctx, which is parent to a subset of trace spans.

Backport of #2662

Fixes #2663

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-16 11:19:40 -07:00
Fabiano Fidêncio
a97c9063db Merge pull request #2642 from jongwu/qemu_mak_2.2
backport-2.2: remove default config for arm64.
2021-09-16 07:21:32 +02:00
Jianyong Wu
0481c5070c backport-2.2: remove default config for arm64.
The current default config in qemu for arm64 doesn't suit for qemu
version 5.1+, so remove them here.

Fixes: #2595
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-09-15 10:07:13 +08:00
Samuel Ortiz
64504061c8 Merge pull request #2619 from likebreath/0913/backport_clh_v18.0
stable-2.2 | versions: Upgrade to Cloud Hypervisor v18.0
2021-09-14 12:02:50 +02:00
Binbin Zhang
56920bc943 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.

Fixes: #2615
Backport: #2616

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-14 10:33:49 +08:00
Bo Chen
a1874ccd62 virtcontainers: clh: Revert the workaround incorrect default values
Given the fix to the bugs of the openapi spec file is included in the
Cloud Hypervisor v18.0 [1], this patch reverts the workaround we carried
in the CLH driver.

This reverts commit 932ee41b3f.

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3029

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f785ff0bf2)
2021-09-13 14:17:58 -07:00
Bo Chen
c2c650500b virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v18.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 0e0e59dc5f)
2021-09-13 14:17:58 -07:00
Bo Chen
7ee43f9468 versions: Upgrade to Cloud Hypervisor v18.0
Highlights from the Cloud Hypervisor release v18.0: 1) Experimental User
Device (vfio-user) support; 2) Migration support for vhost-user devices;
3) VHDX disk image support; 4) Device pass through on MSHV hypervisor;
5) AArch64 for support virtio-mem; 6) Live migration on MSHV hypervisor;
7) AArch64 CPU topology support; 8) Power button support on AArch64; 9)
Various bug fixes on PTY, TTY, signal handling, and live-migration on
AArch64.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v18.0

Fixes: #2543

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f0b5331430)
2021-09-13 14:17:58 -07:00
Samuel Ortiz
eedf139076 Merge pull request #2608 from Bevisy/main-2539-bp
[backport]sandbox: Add device permissions such as /dev/null to cgroup
2021-09-13 19:07:17 +02:00
Fabiano Fidêncio
54a6890c3c Merge pull request #2614 from sameo/stable-2.2
[backport] runtime: Fix README link
2021-09-13 17:45:07 +02:00
Samuel Ortiz
1792a9fe11 runtime: Fix README link
The LICENSE file lives in the project's root.

Fixes #2612

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-09-11 09:57:49 +02:00
Julio Montes
9bf95279be Merge pull request #2588 from devimc/2021-09-07/backport/fixSnap
[backport] snap: Test variable instead of executing "branch"
2021-09-10 14:44:55 -05:00
Binbin Zhang
807cc8a3a5 sandbox: Add device permissions such as /dev/null to cgroup
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec

Fixes: #2539
Backports: #2603

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-10 17:33:26 +08:00
David Gibson
5987f3b5e1 snap: Test variable instead of executing "branch"
In snapcraft.yaml we have a case statement on $(branch) - that is on the
output of executing a command "branch".  From the selections it appears
that what it actually wants is to simply select on the contents of the
$branch variable, which should be ${branch} instead.

fixes #2558

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-07 09:37:17 -05:00
Fabiano Fidêncio
caafd0f952 Merge pull request #2541 from fidencio/2.2.0-branch-bump
# Kata Containers 2.2.0
2021-09-01 00:33:25 +02:00
Fabiano Fidêncio
800126b272 release: Kata Containers 2.2.0
- runtime: drop qemu-lite support
- stable-2.2 | virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
- backport ci: Temporarily skip agent shutdown test on s390x
- backport: build_image: Fix error soft link about initrd.img

dca35c17 docs: remove mentioning of qemu-lite
0bdfdad2 runtime: drop qemu-lite support
60155756 runtime: fix default hypervisor path
ca9e6538 ci: Temporarily skip agent shutdown test on s390x
938b01ae virtcontainers: clh: Workaround incorrect default values
abd708e8 virtcontainers: clh: Fix the unit test
61babd45 virtcontainers: clh: Use constructors to ensure proper default value
59c51f62 virtcontainers: clh: Migrate to use the updated client APIs
c1f260cc virtcontainers: clh: Re-generate the client code
4cd6909f virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
efa2d54e build_image: Fix error soft link about initrd.img

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-08-31 18:44:03 +02:00
Archana Shinde
b1372b353f Merge pull request #2533 from bergwolf/qemu-lite
runtime: drop qemu-lite support
2021-08-31 07:39:24 -07:00
Peng Tao
dca35c1730 docs: remove mentioning of qemu-lite
vm-templating should just work with upstream qemu v4.1.0 or above.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:12 +08:00
Peng Tao
0bdfdad236 runtime: drop qemu-lite support
As the project is not maintained and we have not been testing against it
for a long time.

Fixes: #2529
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:17:06 +08:00
Peng Tao
60155756f3 runtime: fix default hypervisor path
Should not be qemu-lite.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-31 10:16:57 +08:00
Fabiano Fidêncio
669888c339 Merge pull request #2525 from likebreath/0827/backport_clh_generator
stable-2.2 | virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
2021-08-30 21:25:05 +02:00
GabyCT
cde008f441 Merge pull request #2531 from Jakob-Naucke/backport-s390x-skip-agent-shutdown-test
backport ci: Temporarily skip agent shutdown test on s390x
2021-08-30 09:25:50 -05:00
Peng Tao
7c866073f9 Merge pull request #2520 from Bevisy/stable-2.2-2503
backport: build_image: Fix error soft link about initrd.img
2021-08-30 20:16:55 +08:00
Jakob Naucke
ca9e6538e6 ci: Temporarily skip agent shutdown test on s390x
see https://github.com/kata-containers/tests/issues/3878 for tracking

Fixes: #2507
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-08-30 14:14:43 +02:00
Bo Chen
938b01aedc virtcontainers: clh: Workaround incorrect default values
Two default values defined in the 'cloud-hypervisor.yaml' have typo, and this
patch manually overwrites them with the correct value as a workaround
before the corresponding fix is landed to Cloud Hypervisor upstream.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 932ee41b3f)
2021-08-27 13:37:47 -07:00
Bo Chen
abd708e814 virtcontainers: clh: Fix the unit test
This patch fixes the unit tests over clh.go with the updated client code.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit bff38e4f4d)
2021-08-27 13:37:47 -07:00
Bo Chen
61babd45ed virtcontainers: clh: Use constructors to ensure proper default value
With the updated openapi-generator, the client code now handles optional
attributes correctly, and ensures to assign the right default
values. This patch enables to use those constructors to make sure the
proper default values being used.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit d967d3cb37)
2021-08-27 13:37:47 -07:00
Bo Chen
59c51f6201 virtcontainers: clh: Migrate to use the updated client APIs
The client code (and APIs) for Cloud Hypervisor has been changed
dramatically due to the upgrade to `openapi-generator` v5.2.1. This
patch migrate the Cloud Hypervisor driver in the kata-runtime to use
those updated APIs.

The main change from the client code is that it now uses "pointer" type
to represent "optional" attributes from the input openapi specification
file.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit a6a2e525de)
2021-08-27 13:37:47 -07:00
Bo Chen
c1f260cc40 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor with the
updated `openapi-generator` v5.2.1.

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 46eb07e14f)
2021-08-27 13:37:47 -07:00
Bo Chen
4cd6909f18 virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
To improve the quality and correctness of the auto-generated code, this
patch upgrade the `openapi-generator` to its latest stable release
v5.2.1.

Fixes: #2487

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 80fba4d637)
2021-08-27 13:37:47 -07:00
Binbin Zhang
efa2d54e85 build_image: Fix error soft link about initrd.img
fix error soft link about initrd.img

Fixes #2503

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-27 16:15:49 +08:00
2089 changed files with 325072 additions and 107184 deletions

View File

@@ -15,7 +15,6 @@ jobs:
name: WIP Check
steps:
- name: WIP Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: tim-actions/wip-check@1c2a1ca6c110026b3e2297bb2ef39e1747b5a755
with:
labels: '["do-not-merge", "wip", "rfc"]'

View File

@@ -18,32 +18,24 @@ jobs:
name: Commit Message Check
steps:
- name: Get PR Commits
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
id: 'get-pr-commits'
uses: tim-actions/get-pr-commits@v1.2.0
uses: tim-actions/get-pr-commits@v1.0.0
with:
token: ${{ secrets.GITHUB_TOKEN }}
# Filter out revert commits
# The format of a revert commit is as follows:
#
# Revert "<original-subject-line>"
#
filter_out_pattern: '^Revert "'
- name: DCO Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: tim-actions/dco@2fd0504dc0d27b33f542867c300c60840c6dcb20
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
- name: Commit Body Missing Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-body-check@v1.0.2
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
- name: Check Subject Line Length
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
@@ -52,7 +44,7 @@ jobs:
post_error: ${{ env.error_msg }}
- name: Check Body Line Length
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
@@ -79,7 +71,7 @@ jobs:
post_error: ${{ env.error_msg }}
- name: Check Fixes
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
@@ -90,7 +82,7 @@ jobs:
one_pass_all_pass: 'true'
- name: Check Subsystem
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}

View File

@@ -1,25 +0,0 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
name: Darwin tests
jobs:
test:
strategy:
matrix:
go-version: [1.16.x, 1.17.x]
os: [macos-latest]
runs-on: ${{ matrix.os }}
steps:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: ${{ matrix.go-version }}
- name: Checkout code
uses: actions/checkout@v2
- name: Build utils
run: ./ci/darwin-test.sh

View File

@@ -1,15 +1,6 @@
name: kata deploy build
name: kata-deploy-build
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths:
- tools/**
- versions.yaml
on: push
jobs:
build-asset:
@@ -27,15 +18,13 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Install docker
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make "${KATA_ASSET}-tarball"
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r --preserve=all "${build_dir}" "kata-build"
@@ -43,7 +32,6 @@ jobs:
KATA_ASSET: ${{ matrix.asset }}
- name: store-artifact ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
@@ -56,28 +44,15 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: get-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: build
path: kata-artifacts
- name: merge-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make merge-builds
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
make-kata-tarball:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: make kata-tarball
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make kata-tarball
sudo make install-tarball

View File

@@ -1,115 +1,30 @@
on:
workflow_dispatch: # this is used to trigger the workflow on non-main branches
issue_comment:
types: [created, edited]
name: test-kata-deploy
jobs:
check-comment-and-membership:
check_comments:
if: ${{ github.event.issue.pull_request }}
runs-on: ubuntu-latest
if: |
github.event.issue.pull_request
&& github.event_name == 'issue_comment'
&& github.event.action == 'created'
&& startsWith(github.event.comment.body, '/test_kata_deploy')
steps:
- name: Check membership
uses: kata-containers/is-organization-member@1.0.1
id: is_organization_member
- name: Check for Command
id: command
uses: kata-containers/slash-command-action@v1
with:
organization: kata-containers
username: ${{ github.event.comment.user.login }}
token: ${{ secrets.GITHUB_TOKEN }}
- name: Fail if not member
repo-token: ${{ secrets.GITHUB_TOKEN }}
command: "test_kata_deploy"
reaction: "true"
reaction-type: "eyes"
allow-edits: "false"
permission-level: admin
- name: verify command arg is kata-deploy
run: |
result=${{ steps.is_organization_member.outputs.result }}
if [ $result == false ]; then
user=${{ github.event.comment.user.login }}
echo Either ${user} is not part of the kata-containers organization
echo or ${user} has its Organization Visibility set to Private at
echo https://github.com/orgs/kata-containers/people?query=${user}
echo
echo Ensure you change your Organization Visibility to Public and
echo trigger the test again.
exit 1
fi
echo "The command was '${{ steps.command.outputs.command-name }}' with arguments '${{ steps.command.outputs.command-arguments }}'"
build-asset:
runs-on: ubuntu-latest
needs: check-comment-and-membership
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: Install docker
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: get-artifacts
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
kata-deploy:
needs: create-kata-tarball
create-and-test-container:
needs: check_comments
runs-on: ubuntu-latest
steps:
- name: get-PR-ref
@@ -118,30 +33,32 @@ jobs:
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
- name: check out
uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: get-kata-tarball
uses: actions/download-artifact@v2
with:
name: kata-static-tarball
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: build-container-image
id: build-container-image
run: |
PR_SHA=$(git log --format=format:%H -n1)
mv kata-static.tar.xz $GITHUB_WORKSPACE/tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t quay.io/kata-containers/kata-deploy-ci:$PR_SHA $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$PR_SHA
mkdir -p packaging/kata-deploy
ln -s $GITHUB_WORKSPACE/tools/packaging/kata-deploy/action packaging/kata-deploy/action
echo "::set-output name=PKG_SHA::${PR_SHA}"
PR_SHA=$(git log --format=format:%H -n1)
VERSION="2.0.0"
ARTIFACT_URL="https://github.com/kata-containers/kata-containers/releases/download/${VERSION}/kata-static-${VERSION}-x86_64.tar.xz"
wget "${ARTIFACT_URL}" -O tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:${PR_SHA} -t quay.io/kata-containers/kata-deploy-ci:${PR_SHA} ./tools/packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$PR_SHA
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$PR_SHA
echo "##[set-output name=pr-sha;]${PR_SHA}"
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
uses: ./tools/packaging/kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
packaging-sha: ${{ steps.build-container-image.outputs.pr-sha }}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
PKG_SHA: ${{ steps.build-container-image.outputs.pr-sha }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

295
.github/workflows/main.yaml vendored Normal file
View File

@@ -0,0 +1,295 @@
name: Publish release tarball
on:
push:
tags:
- '1.*'
jobs:
get-artifact-list:
runs-on: ubuntu-latest
steps:
- name: get the list
run: |
pushd $GITHUB_WORKSPACE
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git checkout $tag
popd
$GITHUB_WORKSPACE/tools/packaging/artifact-list.sh > artifact-list.txt
- name: save-artifact-list
uses: actions/upload-artifact@master
with:
name: artifact-list
path: artifact-list.txt
build-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kernel"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-kernel
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-kernel.tar.gz
build-experimental-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_experimental_kernel"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-experimental-kernel
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-experimental-kernel.tar.gz
build-qemu:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-qemu
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-qemu.tar.gz
# Job for building the image
build-image:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_image"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-image
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-image.tar.gz
# Job for building firecracker hypervisor
build-firecracker:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_firecracker"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-firecracker
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-firecracker.tar.gz
# Job for building cloud-hypervisor
build-clh:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_clh"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-clh
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-clh.tar.gz
# Job for building kata components
build-kata-components:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kata_components"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-kata-components
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo "artifact-built=true" >> $GITHUB_ENV
else
echo "artifact-built=false" >> $GITHUB_ENV
fi
- name: store-artifacts
if: ${{ env.artifact-built }} == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-kata-components.tar.gz
gather-artifacts:
runs-on: ubuntu-16.04
needs: [build-experimental-kernel, build-kernel, build-qemu, build-image, build-firecracker, build-kata-components, build-clh]
steps:
- uses: actions/checkout@v1
- name: get-artifacts
uses: actions/download-artifact@master
with:
name: kata-artifacts
- name: colate-artifacts
run: |
$GITHUB_WORKSPACE/.github/workflows/gather-artifacts.sh
- name: store-artifacts
uses: actions/upload-artifact@master
with:
name: release-candidate
path: kata-static.tar.xz
kata-deploy:
needs: gather-artifacts
runs-on: ubuntu-latest
steps:
- name: get-artifacts
uses: actions/download-artifact@master
with:
name: release-candidate
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git clone https://github.com/kata-containers/packaging
pushd packaging
git checkout $tag
pkg_sha=$(git rev-parse HEAD)
popd
mv release-candidate/kata-static.tar.xz ./packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:$pkg_sha -t quay.io/kata-containers/kata-deploy-ci:$pkg_sha ./packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$pkg_sha
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$pkg_sha
echo "::set-output name=PKG_SHA::${pkg_sha}"
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: push-tarball
run: |
# tag the container image we created and push to DockerHub
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
docker tag katadocker/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} katadocker/kata-deploy:${tag}
docker push katadocker/kata-deploy:${tag}
upload-static-tarball:
needs: kata-deploy
runs-on: ubuntu-latest
steps:
- name: download-artifacts
uses: actions/download-artifact@master
with:
name: release-candidate
- name: install hub
run: |
HUB_VER=$(curl -s "https://api.github.com/repos/github/hub/releases/latest" | jq -r .tag_name | sed 's/^v//')
wget -q -O- https://github.com/github/hub/releases/download/v$HUB_VER/hub-linux-amd64-$HUB_VER.tgz | \
tar xz --strip-components=2 --wildcards '*/bin/hub' && sudo mv hub /usr/local/bin/hub
- name: push static tarball to github
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-static-$tag-x86_64.tar.xz"
repo="https://github.com/kata-containers/runtime.git"
mv release-candidate/kata-static.tar.xz "release-candidate/${tarball}"
git clone "${repo}"
cd runtime
echo "uploading asset '${tarball}' to '${repo}' tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "../release-candidate/${tarball}" "${tag}"

View File

@@ -16,7 +16,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Install hub
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
HUB_ARCH="amd64"
HUB_VER=$(curl -sL "https://api.github.com/repos/github/hub/releases/latest" |\
@@ -27,7 +26,6 @@ jobs:
sudo install hub /usr/local/bin
- name: Install hub extension script
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
# Clone into a temporary directory to avoid overwriting
# any existing github directory.
@@ -37,11 +35,9 @@ jobs:
popd &>/dev/null
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
- name: Move issue to "In progress"
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
run: |

View File

@@ -26,7 +26,6 @@ jobs:
- name: Build ${{ matrix.asset }}
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-copy-yq-installer.sh
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
@@ -141,37 +140,12 @@ jobs:
- uses: actions/checkout@v2
- name: generate-and-upload-tarball
run: |
pushd $GITHUB_WORKSPACE/src/agent
cargo vendor >> .cargo/config
popd
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-containers-$tag-vendor.tar.gz"
pushd $GITHUB_WORKSPACE
bash -c "tools/packaging/release/generate_vendor.sh ${tarball}"
tar -cvzf "${tarball}" src/agent/.cargo/config src/agent/vendor
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
popd
upload-libseccomp-tarball:
needs: upload-cargo-vendored-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: download-and-upload-tarball
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
GOPATH: ${HOME}/go
run: |
pushd $GITHUB_WORKSPACE
./ci/install_yq.sh
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
versions_yaml="versions.yaml"
version=$(${GOPATH}/bin/yq read ${versions_yaml} "externals.libseccomp.version")
repo_url=$(${GOPATH}/bin/yq read ${versions_yaml} "externals.libseccomp.url")
download_url="${repo_url}/releases/download/v${version}"
tarball="libseccomp-${version}.tar.gz"
asc="${tarball}.asc"
curl -sSLO "${download_url}/${tarball}"
curl -sSLO "${download_url}/${asc}"
# "-m" option should be empty to re-use the existing release title
# without opening a text editor.
# For the details, check https://hub.github.com/hub-release.1.html.
hub release edit -m "" -a "${tarball}" "${tag}"
hub release edit -m "" -a "${asc}" "${tag}"
popd

View File

@@ -12,7 +12,8 @@ on:
- reopened
- labeled
- unlabeled
branches:
pull_request:
branches:
- main
jobs:
@@ -20,7 +21,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Install hub
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
HUB_ARCH="amd64"
HUB_VER=$(curl -sL "https://api.github.com/repos/github/hub/releases/latest" |\
@@ -31,8 +31,9 @@ jobs:
sudo install hub /usr/local/bin
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
with:
token: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
- name: Install porting checker script
run: |
@@ -44,7 +45,6 @@ jobs:
popd &>/dev/null
- name: Stop PR being merged unless it has a correct set of porting labels
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
run: |

View File

@@ -26,7 +26,7 @@ jobs:
# Check semantic versioning format (x.y.z) and if the current tag is the latest tag
if echo "${current_version}" | grep -q "^[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" && echo -e "$latest_version\n$current_version" | sort -C -V; then
# Current version is the latest version, build it
snapcraft snap --debug --destructive-mode
snapcraft -d snap --destructive-mode
fi
- name: Upload snap

View File

@@ -1,27 +1,17 @@
name: snap CI
on:
pull_request:
types:
- opened
- synchronize
- reopened
- edited
on: ["pull_request"]
jobs:
test:
runs-on: ubuntu-20.04
steps:
- name: Check out
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Install Snapcraft
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: samuelmeuli/action-snapcraft@v1
- name: Build snap
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
snapcraft snap --debug --destructive-mode
snapcraft -d snap --destructive-mode

View File

@@ -5,13 +5,15 @@ on:
- edited
- reopened
- synchronize
- labeled
- unlabeled
name: Static checks
jobs:
test:
strategy:
matrix:
go-version: [1.16.x, 1.17.x]
go-version: [1.15.x, 1.16.x]
os: [ubuntu-20.04]
runs-on: ${{ matrix.os }}
env:
@@ -58,21 +60,13 @@ jobs:
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
- name: Building rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
# Check whether the vendored code is up-to-date & working as the first thing
- name: Check vendored code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
@@ -90,7 +84,3 @@ jobs:
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make test
- name: Run Unit Tests As Root User
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && sudo -E PATH="$PATH" make test

1
.gitignore vendored
View File

@@ -9,5 +9,4 @@ src/agent/src/version.rs
src/agent/kata-agent.service
src/agent/protocols/src/*.rs
!src/agent/protocols/src/lib.rs
build

View File

@@ -2,4 +2,4 @@
## This repo is part of [Kata Containers](https://katacontainers.io)
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).

View File

@@ -1,3 +1,94 @@
# Glossary
See the [project glossary hosted in the wiki](https://github.com/kata-containers/kata-containers/wiki/Glossary).
[A](#a), [B](#b), [C](#c), [D](#d), [E](#e), [F](#f), [G](#g), [H](#h), [I](#i), [J](#j), [K](#k), [L](#l), [M](#m), [N](#n), [O](#o), [P](#p), [Q](#q), [R](#r), [S](#s), [T](#t), [U](#u), [V](#v), [W](#w), [X](#x), [Y](#y), [Z](#z)
## A
### Auto Scaling
a method used in cloud computing, whereby the amount of computational resources in a server farm, typically measured in terms of the number of active servers, which vary automatically based on the load on the farm.
## B
## C
### Container Security Solutions
The process of implementing security tools and policies that will give you the assurance that everything in your container is running as intended, and only as intended.
### Container Software
A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
### Container Runtime Interface
A plugin interface which enables Kubelet to use a wide variety of container runtimes, without the need to recompile.
### Container Virtualization
A container is a virtual runtime environment that runs on top of a single operating system (OS) kernel and emulates an operating system rather than the underlying hardware.
## D
## E
## F
## G
## H
## I
### Infrastructure Architecture
A structured and modern approach for supporting an organization and facilitating innovation within an enterprise.
## J
## K
### Kata Containers
Kata containers is an open source project delivering increased container security and Workload isolation through an implementation of lightweight virtual machines.
## L
## M
## N
## O
## P
### Pod Containers
A Group of one or more containers , with shared storage/network, and a specification for how to run the containers.
### Private Cloud
A computing model that offers a proprietary environment dedicated to a single business entity.
### Public Cloud
Computing services offered by third-party providers over the public Internet, making them available to anyone who wants to use or purchase them.
## Q
## R
## S
### Serverless Containers
An architecture in which code is executed on-demand. Serverless workloads are typically in the cloud, but on-premises serverless platforms exist, too.
## T
## U
## V
### Virtual Machine Monitor
Computer software, firmware or hardware that creates and runs virtual machines.
### Virtual Machine Software
A software program or operating system that not only exhibits the behavior of a separate computer, but is also capable of performing tasks such as running applications and programs like a separate computer.
## W
## X
## Y
## Z

View File

@@ -8,24 +8,18 @@ COMPONENTS =
COMPONENTS += agent
COMPONENTS += runtime
COMPONENTS += trace-forwarder
# List of available tools
TOOLS =
TOOLS += agent-ctl
TOOLS += trace-forwarder
STANDARD_TARGETS = build check clean install test vendor
default: all
all: logging-crate-tests build
logging-crate-tests:
make -C src/libs/logging
include utils.mk
include ./tools/packaging/kata-deploy/local-build/Makefile
all: build
# Create the rules
$(eval $(call create_all_rules,$(COMPONENTS),$(TOOLS),$(STANDARD_TARGETS)))
@@ -39,10 +33,10 @@ generate-protocols:
static-checks: build
bash ci/static-checks.sh
.PHONY: \
all \
binary-tarball \
default \
install-binary-tarball \
logging-crate-tests \
static-checks
binary-tarball:
make -f ./tools/packaging/kata-deploy/local-build/Makefile
install-binary-tarball:
make -f ./tools/packaging/kata-deploy/local-build/Makefile install
.PHONY: all default static-checks binary-tarball install-binary-tarball

View File

@@ -17,73 +17,16 @@ standard implementation of lightweight Virtual Machines (VMs) that feel and
perform like containers, but provide the workload isolation and security
advantages of VMs.
## License
The code is licensed under the Apache 2.0 license.
See [the license file](LICENSE) for further details.
## Platform support
Kata Containers currently runs on 64-bit systems supporting the following
technologies:
| Architecture | Virtualization technology |
|-|-|
| `x86_64`, `amd64` | [Intel](https://www.intel.com) VT-x, AMD SVM |
| `aarch64` ("`arm64`")| [ARM](https://www.arm.com) Hyp |
| `ppc64le` | [IBM](https://www.ibm.com) Power |
| `s390x` | [IBM](https://www.ibm.com) Z & LinuxONE SIE |
### Hardware requirements
The [Kata Containers runtime](src/runtime) provides a command to
determine if your host system is capable of running and creating a
Kata Container:
```bash
$ kata-runtime check
```
> **Notes:**
>
> - This command runs a number of checks including connecting to the
> network to determine if a newer release of Kata Containers is
> available on GitHub. If you do not wish this to check to run, add
> the `--no-network-checks` option.
>
> - By default, only a brief success / failure message is printed.
> If more details are needed, the `--verbose` flag can be used to display the
> list of all the checks performed.
>
> - If the command is run as the `root` user additional checks are
> run (including checking if another incompatible hypervisor is running).
> When running as `root`, network checks are automatically disabled.
## Getting started
See the [installation documentation](docs/install).
## Documentation
See the [official documentation](docs) including:
- [Installation guides](docs/install)
- [Developer guide](docs/Developer-Guide.md)
- [Design documents](docs/design)
- [Architecture overview](docs/design/architecture)
## Configuration
Kata Containers uses a single
[configuration file](src/runtime/README.md#configuration)
which contains a number of sections for various parts of the Kata
Containers system including the [runtime](src/runtime), the
[agent](src/agent) and the [hypervisor](#hypervisors).
## Hypervisors
See the [hypervisors document](docs/hypervisors.md) and the
[Hypervisor specific configuration details](src/runtime/README.md#hypervisor-specific-configuration).
See the [official documentation](docs)
(including [installation guides](docs/install),
[the developer guide](docs/Developer-Guide.md),
[design documents](docs/design) and more).
## Community
@@ -105,8 +48,6 @@ Please raise an issue
## Developers
See the [developer guide](docs/Developer-Guide.md).
### Components
### Main components
@@ -129,8 +70,8 @@ The table below lists the remaining parts of the project:
| [packaging](tools/packaging) | infrastructure | Scripts and metadata for producing packaged binaries<br/>(components, hypervisors, kernel and rootfs). |
| [kernel](https://www.kernel.org) | kernel | Linux kernel used by the hypervisor to boot the guest image. Patches are stored [here](tools/packaging/kernel). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`agent-ctl`](tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`trace-forwarder`](src/trace-forwarder) | utility | Agent tracing helper. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
@@ -143,4 +84,8 @@ the [components](#components) section for further details.
## Glossary of Terms
See the [glossary of terms](https://github.com/kata-containers/kata-containers/wiki/Glossary) related to Kata Containers.
See the [glossary of terms](Glossary.md) related to Kata Containers.
---
[kernel]: https://www.kernel.org
[github-katacontainers.io]: https://github.com/kata-containers/www.katacontainers.io

View File

@@ -1 +1 @@
2.4.3
2.2.3

View File

@@ -1,42 +0,0 @@
#!/usr/bin/env bash
#
# Copyright (c) 2022 Apple Inc.
#
# SPDX-License-Identifier: Apache-2.0
set -e
cidir=$(dirname "$0")
runtimedir=$cidir/../src/runtime
build_working_packages() {
# working packages:
device_api=$runtimedir/virtcontainers/device/api
device_config=$runtimedir/virtcontainers/device/config
device_drivers=$runtimedir/virtcontainers/device/drivers
device_manager=$runtimedir/virtcontainers/device/manager
rc_pkg_dir=$runtimedir/pkg/resourcecontrol/
utils_pkg_dir=$runtimedir/virtcontainers/utils
# broken packages :( :
#katautils=$runtimedir/pkg/katautils
#oci=$runtimedir/pkg/oci
#vc=$runtimedir/virtcontainers
pkgs=(
"$device_api"
"$device_config"
"$device_drivers"
"$device_manager"
"$utils_pkg_dir"
"$rc_pkg_dir")
for pkg in "${pkgs[@]}"; do
echo building "$pkg"
pushd "$pkg" &>/dev/null
go build
go test
popd &>/dev/null
done
}
build_working_packages

30
ci/go-no-os-exit.sh Executable file
View File

@@ -0,0 +1,30 @@
#!/bin/bash
# Copyright (c) 2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Check there are no os.Exit() calls creeping into the code
# We don't use that exit path in the Kata codebase.
# Allow the path to check to be over-ridden.
# Default to the current directory.
go_packages=${1:-.}
echo "Checking for no os.Exit() calls for package [${go_packages}]"
candidates=`go list -f '{{.Dir}}/*.go' $go_packages`
for f in $candidates; do
filename=`basename $f`
# skip all go test files
[[ $filename == *_test.go ]] && continue
# skip exit.go where, the only file we should call os.Exit() from.
[[ $filename == "exit.go" ]] && continue
files="$f $files"
done
[ -z "$files" ] && echo "No files to check, skipping" && exit 0
if egrep -n '\<os\.Exit\>' $files; then
echo "Direct calls to os.Exit() are forbidden, please use exit() so atexit() works"
exit 1
fi

View File

@@ -1,4 +1,3 @@
#!/usr/bin/env bash
#
# Copyright (c) 2020 Intel Corporation
#

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2019 Intel Corporation
#

View File

@@ -1,109 +0,0 @@
#!/usr/bin/env bash
#
# Copyright 2021 Sony Group Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
clone_tests_repo
source "${tests_repo_dir}/.ci/lib.sh"
# The following variables if set on the environment will change the behavior
# of gperf and libseccomp configure scripts, that may lead this script to
# fail. So let's ensure they are unset here.
unset PREFIX DESTDIR
arch=$(uname -m)
workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
# Currently, specify the libseccomp version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# libseccomp_version=$(get_version "externals.libseccomp.version")
# libseccomp_url=$(get_version "externals.libseccomp.url")
libseccomp_version="2.5.1"
libseccomp_url="https://github.com/seccomp/libseccomp"
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
cflags="-O2"
# Variables for gperf
# Currently, specify the gperf version directly without using `versions.yaml`
# because the current Snap workflow is incomplete.
# After solving the issue, replace this code by using the `versions.yaml`.
# gperf_version=$(get_version "externals.gperf.version")
# gperf_url=$(get_version "externals.gperf.url")
gperf_version="3.1"
gperf_url="https://ftp.gnu.org/gnu/gperf"
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
# We need to build the libseccomp library from sources to create a static library for the musl libc.
# However, ppc64le and s390x have no musl targets in Rust. Hence, we do not set cflags for the musl libc.
if ([ "${arch}" != "ppc64le" ] && [ "${arch}" != "s390x" ]); then
# Set FORTIFY_SOURCE=1 because the musl-libc does not have some functions about FORTIFY_SOURCE=2
cflags="-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -O2"
fi
die() {
msg="$*"
echo "[Error] ${msg}" >&2
exit 1
}
finish() {
rm -rf "${workdir}"
}
trap finish EXIT
build_and_install_gperf() {
echo "Build and install gperf version ${gperf_version}"
mkdir -p "${gperf_install_dir}"
curl -sLO "${gperf_tarball_url}"
tar -xf "${gperf_tarball}"
pushd "gperf-${gperf_version}"
./configure --prefix="${gperf_install_dir}"
make
make install
export PATH=$PATH:"${gperf_install_dir}"/bin
popd
echo "Gperf installed successfully"
}
build_and_install_libseccomp() {
echo "Build and install libseccomp version ${libseccomp_version}"
mkdir -p "${libseccomp_install_dir}"
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static
make
make install
popd
echo "Libseccomp installed successfully"
}
main() {
local libseccomp_install_dir="${1:-}"
local gperf_install_dir="${2:-}"
if [ -z "${libseccomp_install_dir}" ] || [ -z "${gperf_install_dir}" ]; then
die "Usage: ${0} <libseccomp-install-dir> <gperf-install-dir>"
fi
pushd "$workdir"
# gperf is required for building the libseccomp.
build_and_install_gperf
build_and_install_libseccomp
popd
}
main "$@"

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
# Copyright (c) 2020 Ant Group
#
# SPDX-License-Identifier: Apache-2.0

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
# Copyright (c) 2019 Ant Financial
#
# SPDX-License-Identifier: Apache-2.0
@@ -12,5 +12,5 @@ source "${cidir}/lib.sh"
clone_tests_repo
pushd ${tests_repo_dir}
.ci/install_rust.sh ${1:-}
.ci/install_rust.sh
popd

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2018 Intel Corporation
#

View File

@@ -36,7 +36,7 @@ run_static_checks()
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
bash "$tests_repo_dir/.ci/static-checks.sh" "$@"
bash "$tests_repo_dir/.ci/static-checks.sh" "github.com/kata-containers/kata-containers"
}
run_go_test()

View File

@@ -4,11 +4,6 @@
#
# This is the build root image for Kata Containers on OpenShift CI.
#
FROM quay.io/centos/centos:stream8
FROM centos:8
RUN yum -y update && \
yum -y install \
git \
sudo \
wget && \
yum clean all
RUN yum -y update && yum -y install git sudo wget

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2019 Ant Financial
#

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2018 Intel Corporation
#

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2017-2018 Intel Corporation
#
@@ -9,4 +9,4 @@ set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
run_static_checks "${@:-github.com/kata-containers/kata-containers}"
run_static_checks

View File

@@ -86,16 +86,6 @@ One of the `initrd` and `image` options in Kata runtime config file **MUST** be
The main difference between the options is that the size of `initrd`(10MB+) is significantly smaller than
rootfs `image`(100MB+).
## Enable seccomp
Enable seccomp as follows:
```
$ sudo sed -i '/^disable_guest_seccomp/ s/true/false/' /etc/kata-containers/configuration.toml
```
This will pass container seccomp profiles to the kata agent.
## Enable full debug
Enable full debug as follows:
@@ -212,13 +202,11 @@ $ sudo systemctl restart systemd-journald
>
> - You should only do this step if you are testing with the latest version of the agent.
The agent is built with a statically linked `musl.` The default `libc` used is `musl`, but on `ppc64le` and `s390x`, `gnu` should be used. To configure this:
The rust-agent is built with a static linked `musl.` To configure this:
```
$ export ARCH=$(uname -m)
$ if [ "$ARCH" = "ppc64le" -o "$ARCH" = "s390x" ]; then export LIBC=gnu; else export LIBC=musl; fi
$ [ ${ARCH} == "ppc64le" ] && export ARCH=powerpc64le
$ rustup target add ${ARCH}-unknown-linux-${LIBC}
rustup target add x86_64-unknown-linux-musl
sudo ln -s /usr/bin/g++ /bin/musl-g++
```
To build the agent:
@@ -228,18 +216,6 @@ $ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/agent && make
```
The agent is built with seccomp capability by default.
If you want to build the agent without the seccomp capability, you need to run `make` with `SECCOMP=no` as follows.
```
$ make -C $GOPATH/src/github.com/kata-containers/kata-containers/src/agent SECCOMP=no
```
> **Note:**
>
> - If you enable seccomp in the main configuration file but build the agent without seccomp capability,
> the runtime exits conservatively with an error message.
## Get the osbuilder
```
@@ -258,21 +234,9 @@ the following example.
$ export ROOTFS_DIR=${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true ./rootfs.sh ${distro}'
```
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
You MUST choose one of `alpine`, `centos`, `clearlinux`, `debian`, `euleros`, `fedora`, `suse`, and `ubuntu` for `${distro}`. By default `seccomp` packages are not included in the rootfs image. Set `SECCOMP` to `yes` to include them.
> **Note:**
>
@@ -308,7 +272,6 @@ $ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
> - If you do *not* wish to build under Docker, remove the `USE_DOCKER`
> variable in the previous command and ensure the `qemu-img` command is
> available on your system.
> - If `qemu-img` is not installed, you will likely see errors such as `ERROR: File /dev/loop19p1 is not a block device` and `losetup: /tmp/tmp.bHz11oY851: Warning: file is smaller than 512 bytes; the loop device may be useless or invisible for system tools`. These can be mitigated by installing the `qemu-img` command (available in the `qemu-img` package on Fedora or the `qemu-utils` package on Debian).
### Install the rootfs image
@@ -327,23 +290,12 @@ $ (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers.img)
$ export ROOTFS_DIR="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs"
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`.
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`. By default `seccomp` packages are not included in the initrd image. Set `SECCOMP` to `yes` to include them.
You MUST choose one of `alpine`, `centos`, `clearlinux`, `euleros`, and `fedora` for `${distro}`.
> **Note:**
>

View File

@@ -57,13 +57,6 @@ for advice on which repository to raise the issue against.
This section lists items that might be possible to fix.
## OCI CLI commands
### Docker and Podman support
Currently Kata Containers does not support Docker or Podman.
See issue https://github.com/kata-containers/kata-containers/issues/722 for more information.
## Runtime commands
### checkpoint and restore
@@ -93,6 +86,21 @@ All other configurations are supported and are working properly.
## Networking
### Docker swarm and compose support
The newest version of Docker supported is specified by the
`externals.docker.version` variable in the
[versions database](https://github.com/kata-containers/runtime/blob/master/versions.yaml).
Basic Docker swarm support works. However, if you want to use custom networks
with Docker's swarm, an older version of Docker is required. This is specified
by the `externals.docker.meta.swarm-version` variable in the
[versions database](https://github.com/kata-containers/runtime/blob/master/versions.yaml).
See issue https://github.com/kata-containers/runtime/issues/175 for more information.
Docker compose normally uses custom networks, so also has the same limitations.
## Resource management
Due to the way VMs differ in their CPU and memory allocation, and sharing
@@ -104,12 +112,82 @@ See issue https://github.com/clearcontainers/runtime/issues/341 and [the constra
For CPUs resource management see
[CPU constraints](design/vcpu-handling.md).
### docker run and shared memory
The runtime does not implement the `docker run --shm-size` command to
set the size of the `/dev/shm tmpfs` within the container. It is possible to pass this configuration value into the VM container so the appropriate mount command happens at launch time.
See issue https://github.com/kata-containers/kata-containers/issues/21 for more information.
### docker run and sysctl
The `docker run --sysctl` feature is not implemented. At the runtime
level, this equates to the `linux.sysctl` OCI configuration. Docker
allows configuring the sysctl settings that support namespacing. From a security and isolation point of view, it might make sense to set them in the VM, which isolates sysctl settings. Also, given that each Kata Container has its own kernel, we can support setting of sysctl settings that are not namespaced. In some cases, we might need to support configuring some of the settings on both the host side Kata Container namespace and the Kata Containers kernel.
See issue https://github.com/kata-containers/runtime/issues/185 for more information.
## Docker daemon features
Some features enabled or implemented via the
[`dockerd` daemon](https://docs.docker.com/config/daemon/) configuration are not yet
implemented.
### SELinux support
The `dockerd` configuration option `"selinux-enabled": true` is not presently implemented
in Kata Containers. Enabling this option causes an OCI runtime error.
See issue https://github.com/kata-containers/runtime/issues/784 for more information.
The consequence of this is that the [Docker --security-opt is only partially supported](#docker---security-opt-option-partially-supported).
Kubernetes [SELinux labels](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container) will also not be applied.
# Architectural limitations
This section lists items that might not be fixed due to fundamental
architectural differences between "soft containers" (i.e. traditional Linux*
containers) and those based on VMs.
## Networking limitations
### Support for joining an existing VM network
Docker supports the ability for containers to join another containers
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in the network namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is setup to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
### docker --net=host
Docker host network support (`docker --net=host run`) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
It should be noted, currently passing the `--net=host` option into a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
See more documentation at
[docs.docker.com](https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/).
## Storage limitations
### Kubernetes `volumeMounts.subPaths`
@@ -120,11 +198,15 @@ moment.
See [this issue](https://github.com/kata-containers/runtime/issues/2812) for more details.
[Another issue](https://github.com/kata-containers/kata-containers/issues/1728) focuses on the case of `emptyDir`.
## Host resource sharing
### Privileged containers
### docker run --privileged
Privileged support in Kata is essentially different from `runc` containers.
Kata does support `docker run --privileged` command, but in this case full access
to the guest VM is provided in addition to some host access.
The container runs with elevated capabilities within the guest and is granted
access to guest devices instead of the host devices.
This is also true with using `securityContext privileged=true` with Kubernetes.
@@ -134,6 +216,17 @@ The container may also be granted full access to a subset of host devices
See [Privileged Kata Containers](how-to/privileged.md) for how to configure some of this behavior.
# Miscellaneous
This section lists limitations where the possible solutions are uncertain.
## Docker --security-opt option partially supported
The `--security-opt=` option used by Docker is partially supported.
We only support `--security-opt=no-new-privileges` and `--security-opt seccomp=/path/to/seccomp/profile.json`
option as of today.
Note: The `--security-opt apparmor=your_profile` is not yet supported. See https://github.com/kata-containers/runtime/issues/707.
# Appendices
## The constraints challenge

View File

@@ -11,23 +11,20 @@ For details of the other Kata Containers repositories, see the
* [Installation guides](./install/README.md): Install and run Kata Containers with Docker or Kubernetes
## Tracing
See the [tracing documentation](tracing.md).
## More User Guides
* [Upgrading](Upgrading.md): how to upgrade from [Clear Containers](https://github.com/clearcontainers) and [runV](https://github.com/hyperhq/runv) to [Kata Containers](https://github.com/kata-containers) and how to upgrade an existing Kata Containers system to the latest version.
* [Limitations](Limitations.md): differences and limitations compared with the default [Docker](https://www.docker.com/) runtime,
[`runc`](https://github.com/opencontainers/runc).
### How-to guides
### Howto guides
See the [how-to documentation](how-to).
See the [howto documentation](how-to).
## Kata Use-Cases
* [GPU Passthrough with Kata](./use-cases/GPU-passthrough-and-Kata.md)
* [OpenStack Zun with Kata Containers](./use-cases/zun_kata.md)
* [SR-IOV with Kata](./use-cases/using-SRIOV-and-kata.md)
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [VPP with Kata](./use-cases/using-vpp-and-kata.md)
@@ -40,30 +37,16 @@ Documents that help to understand and contribute to Kata Containers.
### Design and Implementations
* [Kata Containers Architecture](design/architecture): Architectural overview of Kata Containers
* [Kata Containers Architecture](design/architecture.md): Architectural overview of Kata Containers
* [Kata Containers E2E Flow](design/end-to-end-flow.md): The entire end-to-end flow of Kata Containers
* [Kata Containers design](./design/README.md): More Kata Containers design documents
* [Kata Containers threat model](./threat-model/threat-model.md): Kata Containers threat model
### How to Contribute
* [Developer Guide](Developer-Guide.md): Setup the Kata Containers developing environments
* [How to contribute to Kata Containers](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md)
* [How to contribute to Kata Containers](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md)
* [Code of Conduct](../CODE_OF_CONDUCT.md)
## Help Writing a Code PR
* [Code PR advice](code-pr-advice.md).
## Help Writing Unit Tests
* [Unit Test Advice](Unit-Test-Advice.md)
* [Unit testing presentation](presentations/unit-testing/kata-containers-unit-testing.md)
## Help Improving the Documents
* [Documentation Requirements](Documentation-Requirements.md)
### Code Licensing
* [Licensing](Licensing-strategy.md): About the licensing strategy of Kata Containers.
@@ -73,9 +56,9 @@ Documents that help to understand and contribute to Kata Containers.
* [Release strategy](Stable-Branch-Strategy.md)
* [Release Process](Release-Process.md)
## Presentations
## Help Improving the Documents
* [Presentations](presentations)
* [Documentation Requirements](Documentation-Requirements.md)
## Website Changes

View File

@@ -4,11 +4,11 @@
## Requirements
- [hub](https://github.com/github/hub)
* Using an [application token](https://github.com/settings/tokens) is required for hub (set to a GITHUB_TOKEN environment variable).
* Using an [application token](https://github.com/settings/tokens) is required for hub.
- GitHub permissions to push tags and create releases in Kata repositories.
- GPG configured to sign git tags. https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key
- GPG configured to sign git tags. https://help.github.com/articles/generating-a-new-gpg-key/
- You should configure your GitHub to use your ssh keys (to push to branches). See https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/.
* As an alternative, configure hub to push and fork with HTTPS, `git config --global hub.protocol https` (Not tested yet) *
@@ -48,7 +48,6 @@
### Merge all bump version Pull requests
- The above step will create a GitHub pull request in the Kata projects. Trigger the CI using `/test` command on each bump Pull request.
- Trigger the `test-kata-deploy` workflow which is under the `Actions` tab on the repository GitHub page (make sure to select the correct branch and validate it passes).
- Check any failures and fix if needed.
- Work with the Kata approvers to verify that the CI works and the pull requests are merged.
@@ -65,7 +64,7 @@
### Check Git-hub Actions
We make use of [GitHub actions](https://github.com/features/actions) in this [file](../.github/workflows/release.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/kata-containers/releases).

View File

@@ -120,7 +120,7 @@ stable and main. While this is not in place currently, it should be considered i
### Patch releases
Releases are made every four weeks, which include a GitHub release as
Releases are made every three weeks, which include a GitHub release as
well as binary packages. These patch releases are made for both stable branches, and a "release candidate"
for the next `MAJOR` or `MINOR` is created from main. If there are no changes across all the repositories, no
release is created and an announcement is made on the developer mailing list to highlight this.
@@ -136,7 +136,8 @@ The process followed for making a release can be found at [Release Process](Rele
### Frequency
Minor releases are less frequent in order to provide a more stable baseline for users. They are currently
running on a sixteen weeks cadence. The release schedule can be seen on the
running on a twelve week cadence. As the Kata Containers code base has reached a certain level of
maturity, we have increased the cadence from six weeks to twelve weeks. The release schedule can be seen on the
[release rotation wiki page](https://github.com/kata-containers/community/wiki/Release-Team-Rota).
### Compatibility

View File

@@ -1,379 +0,0 @@
# Unit Test Advice
## Overview
This document offers advice on writing a Unit Test (UT) in
[Golang](https://golang.org) and [Rust](https://www.rust-lang.org).
## General advice
### Unit test strategies
#### Positive and negative tests
Always add positive tests (where success is expected) *and* negative
tests (where failure is expected).
#### Boundary condition tests
Try to add unit tests that exercise boundary conditions such as:
- Missing values (`null` or `None`).
- Empty strings and huge strings.
- Empty (or uninitialised) complex data structures
(such as lists, vectors and hash tables).
- Common numeric values (such as `-1`, `0`, `1` and the minimum and
maximum values).
#### Test unusual values
Also always consider "unusual" input values such as:
- String values containing spaces, Unicode characters, special
characters, escaped characters or null bytes.
> **Note:** Consider these unusual values in prefix, infix and
> suffix position.
- String values that cannot be converted into numeric values or which
contain invalid structured data (such as invalid JSON).
#### Other types of tests
If the code requires other forms of testing (such as stress testing,
fuzz testing and integration testing), raise a GitHub issue and
reference it on the issue you are using for the main work. This
ensures the test team are aware that a new test is required.
### Test environment
#### Create unique files and directories
Ensure your tests do not write to a fixed file or directory. This can
cause problems when running multiple tests simultaneously and also
when running tests after a previous test run failure.
#### Assume parallel testing
Always assume your tests will be run *in parallel*. If this is
problematic for a test, force it to run in isolation using the
`serial_test` crate for Rust code for example.
### Running
Ensure you run the unit tests and they all pass before raising a PR.
Ideally do this on different distributions on different architectures
to maximise coverage (and so minimise surprises when your code runs in
the CI).
## Assertions
### Golang assertions
Use the `testify` assertions package to create a new assertion object as this
keeps the test code free from distracting `if` tests:
```go
func TestSomething(t *testing.T) {
assert := assert.New(t)
err := doSomething()
assert.NoError(err)
}
```
### Rust assertions
Use the standard set of `assert!()` macros.
## Table driven tests
Try to write tests using a table-based approach. This allows you to distill
the logic into a compact table (rather than spreading the tests across
multiple test functions). It also makes it easy to cover all the
interesting boundary conditions:
### Golang table driven tests
Assume the following function:
```go
// The function under test.
//
// Accepts a string and an integer and returns the
// result of sticking them together separated by a dash as a string.
func joinParamsWithDash(str string, num int) (string, error) {
if str == "" {
return "", errors.New("string cannot be blank")
}
if num <= 0 {
return "", errors.New("number must be positive")
}
return fmt.Sprintf("%s-%d", str, num), nil
}
```
A table driven approach to testing it:
```go
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestJoinParamsWithDash(t *testing.T) {
assert := assert.New(t)
// Type used to hold function parameters and expected results.
type testData struct {
param1 string
param2 int
expectedResult string
expectError bool
}
// List of tests to run including the expected results
data := []testData{
// Failure scenarios
{"", -1, "", true},
{"", 0, "", true},
{"", 1, "", true},
{"foo", 0, "", true},
{"foo", -1, "", true},
// Success scenarios
{"foo", 1, "foo-1", false},
{"bar", 42, "bar-42", false},
}
// Run the tests
for i, d := range data {
// Create a test-specific string that is added to each assert
// call. It will be displayed if any assert test fails.
msg := fmt.Sprintf("test[%d]: %+v", i, d)
// Call the function under test
result, err := joinParamsWithDash(d.param1, d.param2)
// update the message for more information on failure
msg = fmt.Sprintf("%s, result: %q, err: %v", msg, result, err)
if d.expectError {
assert.Error(err, msg)
// If an error is expected, there is no point
// performing additional checks.
continue
}
assert.NoError(err, msg)
assert.Equal(d.expectedResult, result, msg)
}
}
```
### Rust table driven tests
Assume the following function:
```rust
// Convenience type to allow Result return types to only specify the type
// for the true case; failures are specified as static strings.
// XXX: This is an example. In real code use the "anyhow" and
// XXX: "thiserror" crates.
pub type Result<T> = std::result::Result<T, &'static str>;
// The function under test.
//
// Accepts a string and an integer and returns the
// result of sticking them together separated by a dash as a string.
fn join_params_with_dash(str: &str, num: i32) -> Result<String> {
if str.is_empty() {
return Err("string cannot be blank");
}
if num <= 0 {
return Err("number must be positive");
}
let result = format!("{}-{}", str, num);
Ok(result)
}
```
A table driven approach to testing it:
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_join_params_with_dash() {
// This is a type used to record all details of the inputs
// and outputs of the function under test.
#[derive(Debug)]
struct TestData<'a> {
str: &'a str,
num: i32,
result: Result<String>,
}
// The tests can now be specified as a set of inputs and outputs
let tests = &[
// Failure scenarios
TestData {
str: "",
num: 0,
result: Err("string cannot be blank"),
},
TestData {
str: "foo",
num: -1,
result: Err("number must be positive"),
},
// Success scenarios
TestData {
str: "foo",
num: 42,
result: Ok("foo-42".to_string()),
},
TestData {
str: "-",
num: 1,
result: Ok("--1".to_string()),
},
];
// Run the tests
for (i, d) in tests.iter().enumerate() {
// Create a string containing details of the test
let msg = format!("test[{}]: {:?}", i, d);
// Call the function under test
let result = join_params_with_dash(d.str, d.num);
// Update the test details string with the results of the call
let msg = format!("{}, result: {:?}", msg, result);
// Perform the checks
if d.result.is_ok() {
assert!(result == d.result, msg);
continue;
}
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
assert!(actual_error == expected_error, msg);
}
}
}
```
## Temporary files
Always delete temporary files on success.
### Golang temporary files
```go
func TestSomething(t *testing.T) {
assert := assert.New(t)
// Create a temporary directory
tmpdir, err := os.MkdirTemp("", "")
assert.NoError(err)
// Delete it at the end of the test
defer os.RemoveAll(tmpdir)
// Add test logic that will use the tmpdir here...
}
```
### Rust temporary files
Use the `tempfile` crate which allows files and directories to be deleted
automatically:
```rust
#[cfg(test)]
mod tests {
use tempfile::tempdir;
#[test]
fn test_something() {
// Create a temporary directory (which will be deleted automatically
let dir = tempdir().expect("failed to create tmpdir");
let filename = dir.path().join("file.txt");
// create filename ...
}
}
```
## Test user
[Unit tests are run *twice*](https://github.com/kata-containers/tests/blob/main/.ci/go-test.sh):
- as the current user
- as the `root` user (if different to the current user)
When writing a test consider which user should run it; even if the code the
test is exercising runs as `root`, it may be necessary to *only* run the test
as a non-`root` for the test to be meaningful. Add appropriate skip
guards around code that requires `root` and non-`root` so that the test
will run if the correct type of user is detected and skipped if not.
### Run Golang tests as a different user
The main repository has the most comprehensive set of skip abilities. See:
- [`katatestutils`](../src/runtime/pkg/katatestutils)
### Run Rust tests as a different user
One method is to use the `nix` crate along with some custom macros:
```
#[cfg(test)]
mod tests {
#[allow(unused_macros)]
macro_rules! skip_if_root {
() => {
if nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs non-root", module_path!());
return;
}
};
}
#[allow(unused_macros)]
macro_rules! skip_if_not_root {
() => {
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
};
}
#[test]
fn test_that_must_be_run_as_root() {
// Not running as the superuser, so skip.
skip_if_not_root!();
// Run test *iff* the user running the test is root
// ...
}
}
```

View File

@@ -102,7 +102,7 @@ first
[install the latest release](#determine-latest-version).
See the
[manual installation documentation](install/README.md#manual-installation)
[manual installation installation documentation](install/README.md#manual-installation)
for details on how to automatically install and configuration a static release
with containerd.
@@ -114,7 +114,7 @@ with containerd.
> kernel or image.
If you are using custom
[guest assets](design/architecture/README.md#guest-assets),
[guest assets](design/architecture.md#guest-assets),
you must upgrade them to work with Kata Containers 2.x since Kata
Containers 1.x assets will **not** work.

View File

@@ -1,247 +0,0 @@
# Code PR Advice
Before raising a PR containing code changes, we suggest you consider
the following to ensure a smooth and fast process.
> **Note:**
>
> - All the advice in this document is optional. However, if the
> advice provided is not followed, there is no guarantee your PR
> will be merged.
>
> - All the check tools will be run automatically on your PR by the CI.
> However, if you run them locally first, there is a much better
> chance of a successful initial CI run.
## Assumptions
This document assumes you have already read (and in the case of the
code of conduct agreed to):
- The [Kata Containers code of conduct](https://github.com/kata-containers/community/blob/main/CODE_OF_CONDUCT.md).
- The [Kata Containers contributing guide](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
## Code
### Architectures
Do not write architecture-specific code if it is possible to write the
code generically.
### General advice
- Do not write code to impress: instead write code that is easy to read and understand.
- Always consider which user will run the code. Try to minimise
the privileges the code requires.
### Comments
Always add comments if the intent of the code is not obvious. However,
try to avoid comments if the code could be made clearer (for example
by using more meaningful variable names).
### Constants
Don't embed magic numbers and strings in functions, particularly if
they are used repeatedly.
Create constants at the top of the file instead.
### Copyright and license
Ensure all new files contain a copyright statement and an SPDX license
identifier in the comments at the top of the file.
### FIXME and TODO
If the code contains areas that are not fully implemented, make this
clear a comment which provides a link to a GitHub issue that provides
further information.
Do not just rely on comments in this case though: if possible, return
a "`BUG: feature X not implemented see {bug-url}`" type error.
### Functions
- Keep functions relatively short (less than 100 lines is a good "rule of thumb").
- Document functions if the parameters, return value or general intent
of the function is not obvious.
- Always return errors where possible.
Do not discard error return values from the functions this function
calls.
### Logging
- Don't use multiple log calls when a single log call could be used.
- Use structured logging where possible to allow
[standard tooling](https://github.com/kata-containers/tests/tree/main/cmd/log-parser)
be able to extract the log fields.
### Names
Give functions, macros and variables clear and meaningful names.
### Structures
#### Golang structures
Unlike Rust, Go does not enforce that all structure members be set.
This has lead to numerous bugs in the past where code like the
following is used:
```go
type Foo struct {
Key string
Value string
}
// BUG: Key not set, but nobody noticed! ;(
let foo1 = Foo {
Value: "foo",
}
```
A much safer approach is to create a constructor function to enforce
integrity:
```go
type Foo struct {
Key string
Value string
}
func NewFoo(key, value string) (*Foo, error) {
if key == "" {
return nil, errors.New("Foo needs a key")
}
if value == "" {
return nil, errors.New("Foo needs a value")
}
return &Foo{
Key: key,
Value: value,
}, nil
}
func testFoo() error {
// BUG: Key not set, but nobody noticed! ;(
badFoo := Foo{Value: "value"}
// Ok - the constructor performs needed validation
goodFoo, err := NewFoo("name", "value")
if err != nil {
return err
}
return nil
```
> **Note:**
>
> The above is just an example. The *safest* approach would be to move
> `NewFoo()` into a separate package and make `Foo` and it's elements
> private. The compiler would then enforce the use of the constructor
> to guarantee correctly defined objects.
### Tracing
Consider if the code needs to create a new
[trace span](./tracing.md).
Ensure any new trace spans added to the code are completed.
## Tests
### Unit tests
Where possible, code changes should be accompanied by unit tests.
Consider using the standard
[table-based approach](Unit-Test-Advice.md)
as it encourages you to make functions small and simple, and also
allows you to think about what types of value to test.
### Other categories of test
Raised a GitHub issue in the
[`tests`](https://github.com/kata-containers/tests) repository that
explains what sort of test is required along with as much detail as
possible. Ensure the original issue is referenced on the `tests` issue.
### Unsafe code
#### Rust language specifics
Minimise the use of `unsafe` blocks in Rust code and since it is
potentially dangerous always write [unit tests][#unit-tests]
for this code where possible.
`expect()` and `unwrap()` will cause the code to panic on error.
Prefer to return a `Result` on error rather than using these calls to
allow the caller to deal with the error condition.
The table below lists the small number of cases where use of
`expect()` and `unwrap()` are permitted:
| Area | Rationale for permitting |
|-|-|
| In test code (the `tests` module) | Panics will cause the test to fail, which is desirable. |
| `lazy_static!()` | This magic macro cannot "return" a value as it runs before `main()`. |
| `defer!()` | Similar to golang's `defer()` but doesn't allow the use of `?`. |
| `tokio::spawn(async move {})` | Cannot currently return a `Result` from an `async move` closure. |
| If an explicit test is performed before the `unwrap()` / `expect()` | *"Just about acceptable"*, but not ideal `[*]` |
| `Mutex.lock()` | Almost unrecoverable if failed in the lock acquisition |
`[*]` - There can lead to bad *future* code: consider what would
happen if the explicit test gets dropped in the future. This is easier
to happen if the test and the extraction of the value are two separate
operations. In summary, this strategy can introduce an insidious
maintenance issue.
## Documentation
### General requirements
- All new features should be accompanied by documentation explaining:
- What the new feature does
- Why it is useful
- How to use the feature
- Any known issues or limitations
Links should be provided to GitHub issues tracking the issues
- The [documentation requirements document](Documentation-Requirements.md)
explains how the project formats documentation.
### Markdown syntax
Run the
[markdown checker](https://github.com/kata-containers/tests/tree/main/cmd/check-markdown)
on your documentation changes.
### Spell check
Run the
[spell checker](https://github.com/kata-containers/tests/tree/main/cmd/check-spelling)
on your documentation changes.
## Finally
You may wish to read the documentation that the
[Kata Review Team](https://github.com/kata-containers/community/blob/main/Rota-Process.md) use to help review PRs:
- [PR review guide](https://github.com/kata-containers/community/blob/main/PR-Review-Guide.md).
- [documentation review process](https://github.com/kata-containers/community/blob/main/Documentation-Review-Process.md).

View File

@@ -2,7 +2,7 @@
Kata Containers design documents:
- [Kata Containers architecture](architecture)
- [Kata Containers architecture](architecture.md)
- [API Design of Kata Containers](kata-api-design.md)
- [Design requirements for Kata Containers](kata-design-requirements.md)
- [VSocks](VSocks.md)
@@ -10,7 +10,6 @@ Kata Containers design documents:
- [Host cgroups](host-cgroups.md)
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
---

View File

@@ -1 +1 @@
<mxfile host="app.diagrams.net" modified="2021-11-05T13:07:32.992Z" agent="5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36" etag="j5e7J3AOXxeQrt-Zz2uw" version="15.6.8" type="device"><diagram id="XNV8G0dePIPkhS_Khqr4" name="Page-1">7Vxdd9o4EP01nLP7QI5s+fORUNhNT7rNbnqaZl/2CCywG2OxQhDIr18Z29iyZD6CDZRuHho8toQ9986dGVlNC3Yny98omvqfiIfDlg68ZQt+aOm6AUyb/4otq8TiGnpiGNPAS0wgNzwGbzgxapl1Hnh4ltoSEyMkZMFUNA5JFOEhE2yIUvIqXjYioScYpmiMJcPjEIWy9SnwmJ9YdQjd/MTvOBj76VdDCNI7n6Ds6tQw85FHXgsm2GvBLiWEJZ8myy4OY++JjulXnN3cGcUR22fAn3fPzx+jj7e9HrIXA330feIZ7czPCxTO00dO75atMh+MKZlP08swZXip8jwaZJcD+ca0zeNyomAywYyu+CXpRG3NNM1kUMoSXU+PX3OfG9nEfsHdG56gFOfxZvbcE/xD6gy1Yz7/88nuPiLwcNcZDLvEvZ10Vm1zt1+4WyIPx5NoLXj76gcMP07RMD77yqOB23w2CdPTIxKxlN78nuHtmCIv4A7qkpDQ9XzQxsjC8blREIYFu4ewMxpy+4xR8oILZ6yhgwcjfkZ2+Xa0yzDK2JzP81ZzntcNtedHI8+1LNnzo9FIHyo971kDy7Qa8Xx61gJCSGhyRHCtkXHRLLMhXGwJF659/KFrCwtHBkAbIA3rKgAAsHqdfjqDCBn/aRIYTRORUWiVa8rAwKZwcSRcuCQzFESYcrNWLz6K4HFtD9i2QrZM7HiGCjtHH8Ak3ETs+n0A+v0msdNhCTv7RkZPM1TwNSV37lb4asw6VwifDc4MnqJ88ldTTBfBjHulDB1/SibiI/o2IhEuAZGaUBiMI3445B7lvIC3sc8CXqZ20hOTwPPir1ESIqcMUORDn9DgLaZcmF7QnHKWUhqQ4TNUpUZj6GkSeuM5nvGUBl4wjeJW5odAsDnAzBJihiLgrIYgUz6DrJYSRqdvVwzblol82qJZM3Y75sfrV9wKGC+pXdEa7BTP16/s7/lLbVc0uY+8hn7lYGCkdgVKyJy0XdHkPvJn6VcOxu4C2xVte7t5zf3K0fCdv12Ry6efpl05XDgvrFvR5V7zmruVw/E6Z7OiDjcoQYK9MX5MDwllPhmTCIW93FpyXn7NPSHTFMXvmLFV6lI0Z0TEGC8D9q3w+TmeiueN5OjDMp15fbCSMdJijDg0dPUtuzI+KMwSH+bTrI+yeWRss3xO5nSYsfb+y53jDp6+P1r+y+fXj+HLU97AMETHmG1xalo+xI7cygqKQ8SCBRZuQwXxemiHUrQqXDAlQcRmhZkfYoPQBNqOmJstu0gYxQjD1ksjnBLFkrvICbd5nCNUA0qqwRidDpXMvEcDLiMCm/ZXAopnwVvaV8dcSF3IJzdvW+YHBctktmwNo727+erWHdxormWIMpEcHUYXGV0osmFz09kUZDSaYSZJymEIqyNHLqirEb5A7Xmv1hTZZB+nPa6sPVtFaqf45HyDBrRFvtnHEa5WQqklQy7ir5g6aiE6hjpqEbuQtOUCUf52p63yCFOzS6w7Lm1t9WtB1LqFNrMXjfmnlm6FcYE74CZrHH/6ZdOLeq04F/T5v92/7tqff5UoXeeyj+U5tqXsPWHHNGA2w37LPtlLib0L37auOa4IKpQXpDXHkktfq4bSV4lf1tb+HBpyXOmr75t+4L7pZ28ROQpjXY7RBxrP7eP5LH5wTBdYXla4rsATyz4LqALPaCbw1Mn7nHGXx9pzMdR2xF0eas/ZfCfJ3VDRcqrFrPa4e2fytk1bSbfq5F0eYTpmrclbyUH5XaTP2FRJzMvsOPUKJTi44QQ3Um6uqd/MXm9l/aZuiFM01x7ICwqV6J5MdqCY8QGwTqI98aQPmAbcpTFTj1wC21+P4J56tGGhCQEUK8QjaVhNs8NVzbHFezNAaSP7TlUrjTha1dDX0f1d+292B77+8dRGX7/MfHCmzPpOplaycPcah9tIslM1lpq4jWZTPGWTJBGTjjuGYVXftKXpLY0wHbh9hA5ca9uIZtpkKKfaF8RQe0KigCle6V3sXofDa2/NNfWbCgIVqm/dVLxgba76lrcvXGjb+64yetvK1u4VMKtuYTkOKjl0frQXI5VR8446FZqmXlNlCsqvQkqyXktpqnxnvMdWvOZ3hzqGmAgMR7EmYG928hR1yW5s05XCMcnaqRcsBAdZ/87j/5C4pmR7tuZkh1+gGdPlmpjZ+WzFNV9wbc/8YNJep5/FZnp+t+tvSC6uLx8ZDeejrfQ6aDtqBdTNrbzKujYjw5f6XK/a8esAwIt4hes7JgAGOKXrs/2o2o1e2qUtb51T1QZ1bL5SPoL8mvYCxEn5pkDNWKcpxir3rv+vToeGiF1BnMtSJ3n16ArUaX/XV6qTKW9Wq0md+GH+RwaSSiv/Ww2w9x8=</diagram></mxfile>
<mxfile host="Chrome" modified="2020-07-02T06:44:28.736Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36" etag="r7FpfnbGNK7jbg54Gu9x" version="13.3.5" type="device"><diagram id="XNV8G0dePIPkhS_Khqr4" name="Page-1">7VvZcuI4FP0aHqFky+sjkNCTqfR0qtLV6fTLlMDy0hiLscWWrx8Zy3iRQkhjQxbygnVlhK1zz9HRkg4cztZfYjT3vxIHhx0VOOsOvOqoKrRthX2kkU0WMTWYBbw4cLIQKAL3wRPOgkoeXQQOTngsC1FCQhrMq8EJiSI8oZUYimOyqt7mktCpBObIw0LgfoJCMfoQONTPoiqEdlHxFw48n/80hIA/+Qzld/NA4iOHrEoheN2Bw5gQml3N1kMcpr1X7ZjRM7W7J4txRA/5wrd/v5rDewTubvrjyZDYg1l/01W0rJklChf8lfnT0k3eBzFZRA5OW1E6cLDyA4rv52iS1q4Y6izm01nIq10SUQ4jwxAOvBg5AXvCIQlJvG0PmhgZOK1zgzAsxR2ELXfC4gmNyRSXaoyJhccuqxHfmXfDEscUr0sh3gdfMJlhGm/YLby2q+i6nn2J56QOzKy8KhDWctT8Eri7rEQ8q7xd60W/swve9a+BQW8PBlWTw+C6jm0YIgyu66oTKQyOMTZ0oykYNLsGQ/7OJRgYnUQYFENvCYYWUXgvZFDss5PBuHBBVc7OBQUKvY4dNjbyIompTzwSofC6iA4KXNKULu65JWTO0fiNKd1wONCCkipWeB3Qn6Xrx7Spns5LV2ve8raw4YUyy7R9iCRkEU/4u3i3328se/zw+97wp99Wf4fTh2I0pCj2MN3TOZwjaYfsBTjGIaLBsmomGodKhQJjKI3nEymAt2jMPFql01EYeBG7nrAOwyy/B2nqBswE9XnFLHCcDF+cBE9ovG0v7fo5CSK6fR190NGvDgJjb7YJpNlZO/6rFfMkJRPoKbahVUUtKx2MBm/8Ln27Ustqmonldrt2tQ3iugnLmzqeu4c8COK9mVkRRSNkXTpwgmUFZeO/RWopt0h0ky0UfXaDos3XWzzyenblpZ+sfykKIhw73cQPZt0poqi73DXPHnf7C9nNzY2Hmqi2yhgpWJWpLQDGdX/EW6jqM/trSoUts6bCUBwLFdPMk6Csw0YDg6EcePMV3H6D4szwiDc/y4XSt9Ji8bVtSSbq5nGibouivpdjL6o6TxjQA5ZuVjMGHKc0jQqJfKwQzdQHzpwj7YAkc+Sj17nswN7HLklGofHNCbglCrjhWKahyQQc9nUN5i20JeBMnMHLAi6bzLQm35r+megGjqKbeqhQw0OF+jR8U0W+3cVp2z5eJOmL45gl8ccmnm1XiGdIltQUS2uHePJx7py8K7j2WKbaC7wrqPaYt3eSYQ5KZr1yMTsb76QQi1Min9K5FPe3OelVnyHaq+e8oMfYVWXgkVPefIKrKL3aAlN71lRcxXgWz5PxWP2YRIYHEnmXXxBa1fSyj8uvRrMJ/XBvb7q/6A348c9DF/34nvjgzANA61lzgizJMX4jNguKer9dqpqRKKCkXX917pUpW1drS48yh6Xqp1yZ0kS9Tshk2hwOsl0xCwATynDo6wBooG0cLFibYFoiCjIQYGs2VxH6+43OL/9omNu32vLyqozxpuyqKurXe9uleZYyf+BYoa6rx3mInJWGUuFkVwOn86yKgOllW6Zp0TVr2zKaRHRPvS2jiWT+8IOfqcBeDQodqucd/8TdMeSl7/iRzaCm1bYpVREEWwZCW0dFLAGEnXilCtcsGJLTO7bpANOUEEbHliNdFbXUMczO+1SBGo0AGI2aAgqqdaC0XKTK0qWdkjB79oZYWJw0f1qsDMkgc1Kk8vN15fWwzRzHyyCRzHbZi9IqGNWOjEiEa73OQ4f7Shn61blE/aidT+LgKc2vsPPCssWrzizWBVB2ZlF2ZLE1qEQb6C1wkprUKY6j9Ez8u4CrmeEJVNGBsk1Y46TwiCdKP59L0HOhOpdLkJxkutgE2dCjw/PbBGW/p7v4hB1Y5tl9gmjpLj5B6hNka+Yn9QmqaOkuPmGHjtaeT2DF4h/tsrW/4v8V4fX/</diagram></mxfile>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 90 KiB

After

Width:  |  Height:  |  Size: 93 KiB

View File

@@ -1 +0,0 @@
<mxfile host="app.diagrams.net" modified="2022-01-18T14:06:01.890Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" etag="nId-8OV6FDjWTDgzqDu-" version="15.8.9" type="device"><diagram id="bkF_ZONM9sPFCpIYoGFl" name="Page-1">5Vtbj6M2GP01eUyEbW55nGSmM2q70mqnUnf6UrnBCdYSnIIzSfbX14AJYDsLSSDZUWdHGvxhDD7n8F1sdoTm6/1zgjfhJxaQaAStYD9CjyMIgQ1d8SezHAqL504LwyqhgexUGV7pdyKNlrRuaUDSRkfOWMTppmlcsDgmC96w4SRhu2a3JYuad93gFdEMrwsc6dY/acDDwupDr7K/ELoKyzuDcn5rXHaWM0lDHLBdzYSeRmieMMaLo/V+TqIMvBKX4rpfTpw9PlhCYt7lgpd4/J1SSP9++bR8Cb6A1de/XsZ2Mco7jrZywiFLuXxgfihR4GQv7jEL+ToSBiAOU56wb2TOIpYIS8xi0XO2pFGkmHBEV7FoLsRTEmGfvZOEU4HvgzyxpkGQ3Wa2Cyknrxu8yO65E2oStoRt44BkE7BES5+xBCEbk+xrJonAM2FrwpOD6CLP+o5kQ8rRsZ2ivavItWWXsMZrSSKWclodR64QFwcSdDMB8ycYj+f+Hw/+b3jxvN57e/vXsa8RoIHfBMEEU42XJYu5fIugfeSplC5QSBpBtHSyf/LKmr340ZgWZ9z858iHBr6BopN8INDkAwGdj6llIMSxh2JkamDEjbhEqEGN+++WlSfGaY76g+gA3c2+OimOVtnf+BBs03Ea400aMp69DHJY8ZTFyEW/H/AP+uC/D9aQNbFAkzjDiwQ8A3H+ULyVSrqCOARNxInQwjGNSRIMzth0OMacCYJN14csnTFnOkG+Tpo3GGnAQJqCJomDhyySZ1EkwmlKFzlKOOG6uYZr023WUBYTRDOBW3L4mp2cOGXzTV6ZNx738sqidWjEIBJoWYMWlFK2TRakg2DFTFaEt3kkndoab47JQ0pbQiLM6XvzeU1Eyjt8ZjR/W0rluErELD10OUQxT3lVPf9QBrIVV2+7ykAFDtpAua6O075Cauh6x97iH8ZpSNfjb5jj8TscxFn04Aocx2n3A65BUMM5AT0L7c+lwqFcqg8UHKEeAVGJdSOXdAYD0rle4tOTucvw4W8wrhyvyZU7NWQr0KB5dzCq3OupMqaZufcRVWnOzwfNVnxbiTlTg4tCP4h5/dPlXZin1KA7phxjkT3DRtZhTbxj+0Tikbc+k4SKCWWFdGHcU/61HF4cv1UJjWhVI2WNITIYdM/MxIOKStSEomtmosrNVVOcoTOTDosAncWl5LNWm6ykgirVvNX0dCMFdciBC0ruJjWkKAReKjWnZaCBpQZNRfLFUmu6sFYPdmdn1bXcuq9Xc1WFqClIV6mpA3nWjaV2aWlfl9oFkql5QgvYTYkC95Ioexd/Z/9MoVWLiJ39HWiJ0UOLEBpEeF6aDXxTmr3akrRzhv0zbZ9cl5grcdBxJL732j6BpqWDM/k1llHFNthHordZifn9EA6A4gmQYZXjtozraxxzoFFyaU2bB4hBalpggROpX1tRO9gaBNTXILLt6GX6IeH0O8KJBoNTXyOg6+zzAhGOPw6sSi3sGTZkgWlDdjhYTdXxmS7eMbn4NBSwBDQZZJ2s9OwRWfJ+qJmq+bxxq/yGxKAOteStNzc0t2BC6aZeodx1/d/LV0kdfeve8jXtB95ZvtNpO0i3VW+Hrbm2Iv70RjysL0DWS/xbrQkVL+e9qmzfP8H2uVW2Fhrs21bZyLTv2K9KykWd4wJkvx9rtK7HFFnIvZQCLNiXVFxVKt7kxmLRq47yo7g8mpmL63Mrahm4TtbTqXDjNF79nnd7tCvLF0leZmLi8mWUazYUFxIxwmyT4ZIj5czEr0Bznq1IOuJZ56INqrb4zbonfM5i8fiY5pojOOW7bO0okzzHHP+Tz1Sv4HvLiFzHLJ2adD3DZwrDxZRet7vO24MIcBoe43mP7qEQ9f3cg6VwrC6/dHUP6kYXALA//yCa1efuRffqPw2gp/8A</diagram></mxfile>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 51 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 390 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 942 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 182 KiB

290
docs/design/architecture.md Normal file
View File

@@ -0,0 +1,290 @@
# Kata Containers Architecture
## Overview
This is an architectural overview of Kata Containers, based on the 2.0 release.
The primary deliverable of the Kata Containers project is a CRI friendly shim. There is also a CRI friendly library API behind them.
The [Kata Containers runtime](../../src/runtime)
is compatible with the [OCI](https://github.com/opencontainers) [runtime specification](https://github.com/opencontainers/runtime-spec)
and therefore works seamlessly with the [Kubernetes\* Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
through the [CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and
[Containerd\*](https://github.com/containerd/containerd) implementation.
Kata Containers creates a QEMU\*/KVM virtual machine for pod that `kubelet` (Kubernetes) creates respectively.
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/containerd-shim-v2)
is the Kata Containers entrypoint, which
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
Before `shimv2` (as done in [Kata Containers 1.x releases](https://github.com/kata-containers/runtime/releases)), we need to create a `containerd-shim` and a [`kata-shim`](https://github.com/kata-containers/shim) for each container and the Pod sandbox itself, plus an optional [`kata-proxy`](https://github.com/kata-containers/proxy) when VSOCK is not available. With `shimv2`, Kubernetes can launch Pod and OCI compatible containers with one shim (the `shimv2`) per Pod instead of `2N+1` shims, and no standalone `kata-proxy` process even if no VSOCK is available.
![Kubernetes integration with shimv2](arch-images/shimv2.svg)
The container process is then spawned by
[`kata-agent`](../../src/agent), an agent process running
as a daemon inside the virtual machine. `kata-agent` runs a [`ttRPC`](https://github.com/containerd/ttrpc-rust) server in
the guest using a VIRTIO serial or VSOCK interface which QEMU exposes as a socket
file on the host. `shimv2` uses a `ttRPC` protocol to communicate with
the agent. This protocol allows the runtime to send container management
commands to the agent. The protocol is also used to carry the I/O streams (stdout,
stderr, stdin) between the containers and the manage engines (e.g. CRI-O or containerd).
For any given container, both the init process and all potentially executed
commands within that container, together with their related I/O streams, need
to go through the VSOCK interface exported by QEMU.
The container workload, that is, the actual OCI bundle rootfs, is exported from the
host to the virtual machine. In the case where a block-based graph driver is
configured, `virtio-scsi` will be used. In all other cases a `virtio-fs` VIRTIO mount point
will be used. `kata-agent` uses this mount point as the root filesystem for the
container processes.
## Virtualization
How Kata Containers maps container concepts to virtual machine technologies, and how this is realized in the multiple
hypervisors and VMMs that Kata supports is described within the [virtualization documentation](./virtualization.md)
## Guest assets
The hypervisor will launch a virtual machine which includes a minimal guest kernel
and a guest image.
### Guest kernel
The guest kernel is passed to the hypervisor and used to boot the virtual
machine. The default kernel provided in Kata Containers is highly optimized for
kernel boot time and minimal memory footprint, providing only those services
required by a container workload. This is based on a very current upstream Linux
kernel.
### Guest image
Kata Containers supports both an `initrd` and `rootfs` based minimal guest image.
#### Root filesystem image
The default packaged root filesystem image, sometimes referred to as the "mini O/S", is a
highly optimized container bootstrap system based on [Clear Linux](https://clearlinux.org/). It provides an extremely minimal environment and
has a highly optimized boot path.
The only services running in the context of the mini O/S are the init daemon
(`systemd`) and the [Agent](#agent). The real workload the user wishes to run
is created using libcontainer, creating a container in the same manner that is done
by `runc`.
For example, when `ctr run -ti ubuntu date` is run:
- The hypervisor will boot the mini-OS image using the guest kernel.
- `systemd`, running inside the mini-OS context, will launch the `kata-agent` in
the same context.
- The agent will create a new confined context to run the specified command in
(`date` in this example).
- The agent will then execute the command (`date` in this example) inside this
new context, first setting the root filesystem to the expected Ubuntu\* root
filesystem.
#### Initrd image
A compressed `cpio(1)` archive, created from a rootfs which is loaded into memory and used as part of the Linux startup process. During startup, the kernel unpacks it into a special instance of a `tmpfs` that becomes the initial root filesystem.
The only service running in the context of the initrd is the [Agent](#agent) as the init daemon. The real workload the user wishes to run is created using libcontainer, creating a container in the same manner that is done by `runc`.
## Agent
[`kata-agent`](../../src/agent) is a process running in the guest as a supervisor for managing containers and processes running within those containers.
For the 2.0 release, the `kata-agent` is rewritten in the [RUST programming language](https://www.rust-lang.org/) so that we can minimize its memory footprint while keeping the memory safety of the original GO version of [`kata-agent` used in Kata Container 1.x](https://github.com/kata-containers/agent). This memory footprint reduction is pretty impressive, from tens of megabytes down to less than 100 kilobytes, enabling Kata Containers in more use cases like functional computing and edge computing.
The `kata-agent` execution unit is the sandbox. A `kata-agent` sandbox is a container sandbox defined by a set of namespaces (NS, UTS, IPC and PID). `shimv2` can
run several containers per VM to support container engines that require multiple
containers running inside a pod.
`kata-agent` communicates with the other Kata components over `ttRPC`.
## Runtime
`containerd-shim-kata-v2` is a [containerd runtime shimv2](https://github.com/containerd/containerd/blob/v1.4.1/runtime/v2/README.md) implementation and is responsible for handling the `runtime v2 shim APIs`, which is similar to [the OCI runtime specification](https://github.com/opencontainers/runtime-spec) but simplifies the architecture by loading the runtime once and making RPC calls to handle the various container lifecycle commands. This refinement is an improvement on the OCI specification which requires the container manager call the runtime binary multiple times, at least once for each lifecycle command.
`containerd-shim-kata-v2` heavily utilizes the
[virtcontainers package](../../src/runtime/virtcontainers/), which provides a generic, runtime-specification agnostic, hardware-virtualized containers library.
### Configuration
The runtime uses a TOML format configuration file called `configuration.toml`. By default this file is installed in the `/usr/share/defaults/kata-containers` directory and contains various settings such as the paths to the hypervisor, the guest kernel and the mini-OS image.
The actual configuration file paths can be determined by running:
```
$ kata-runtime --show-default-config-paths
```
Most users will not need to modify the configuration file.
The file is well commented and provides a few "knobs" that can be used to modify the behavior of the runtime and your chosen hypervisor.
The configuration file is also used to enable runtime [debug output](../Developer-Guide.md#enable-full-debug).
## Networking
Containers will typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network, but
which is shared between containers
In order to do so, container engines will usually add one end of a virtual
ethernet (`veth`) pair into the container networking namespace. The other end of
the `veth` pair is added to the host networking namespace.
This is a very namespace-centric approach as many hypervisors/VMMs cannot handle `veth`
interfaces. Typically, `TAP` interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
and virtual machines, Kata Containers networking transparently connects `veth`
interfaces with `TAP` ones using Traffic Control:
![Kata Containers networking](arch-images/network.png)
With a TC filter in place, a redirection is created between the container network and the
virtual machine. As an example, the CNI may create a device, `eth0`, in the container's network
namespace, which is a VETH device. Kata Containers will create a tap device for the VM, `tap0_kata`,
and setup a TC redirection filter to mirror traffic from `eth0`'s ingress to `tap0_kata`'s egress,
and a second to mirror traffic from `tap0_kata`'s ingress to `eth0`'s egress.
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata. TC-filter
is the default because it allows for simpler configuration, better CNI plugin compatibility, and performance
on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.
Kata Containers supports both
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md#the-container-network-model)
and [CNI](https://github.com/containernetworking/cni) for networking management.
### Network Hotplug
Kata Containers has developed a set of network sub-commands and APIs to add, list and
remove a guest network endpoint and to manipulate the guest route table.
The following diagram illustrates the Kata Containers network hotplug workflow.
![Network Hotplug](arch-images/kata-containers-network-hotplug.png)
## Storage
Container workloads are shared with the virtualized environment through [virtio-fs](https://virtio-fs.gitlab.io/).
The [devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/master/snapshots/devmapper) is a special case. The `snapshotter` uses dedicated block devices rather than formatted filesystems, and operates at the block level rather than the file level. This knowledge is used to directly use the underlying block device instead of the overlay file system for the container root file system. The block device maps to the top read-write layer for the overlay. This approach gives much better I/O performance compared to using `virtio-fs` to share the container file system.
Kata Containers has the ability to hotplug and remove block devices, which makes it possible to use block devices for containers started after the VM has been launched.
Users can check to see if the container uses the devicemapper block device as its rootfs by calling `mount(8)` within the container. If the devicemapper block device
is used, `/` will be mounted on `/dev/vda`. Users can disable direct mounting of the underlying block device through the runtime configuration.
## Kubernetes support
[Kubernetes\*](https://github.com/kubernetes/kubernetes/) is a popular open source
container orchestration engine. In Kubernetes, a set of containers sharing resources
such as networking, storage, mount, PID, etc. is called a
[Pod](https://kubernetes.io/docs/user-guide/pods/).
A node can have multiple pods, but at a minimum, a node within a Kubernetes cluster
only needs to run a container runtime and a container agent (called a
[Kubelet](https://kubernetes.io/docs/admin/kubelet/)).
A Kubernetes cluster runs a control plane where a scheduler (typically running on a
dedicated master node) calls into a compute Kubelet. This Kubelet instance is
responsible for managing the lifecycle of pods within the nodes and eventually relies
on a container runtime to handle execution. The Kubelet architecture decouples
lifecycle management from container execution through the dedicated
`gRPC` based [Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/container-runtime-interface-v1.md).
In other words, a Kubelet is a CRI client and expects a CRI implementation to
handle the server side of the interface.
[CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and [Containerd\*](https://github.com/containerd/containerd/) are CRI implementations that rely on [OCI](https://github.com/opencontainers/runtime-spec)
compatible runtimes for managing container instances.
Kata Containers is an officially supported CRI-O and Containerd runtime. Refer to the following guides on how to set up Kata Containers with Kubernetes:
- [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
- [Run Kata Containers with Kubernetes](../how-to/run-kata-with-k8s.md)
#### OCI annotations
In order for the Kata Containers runtime (or any virtual machine based OCI compatible
runtime) to be able to understand if it needs to create a full virtual machine or if it
has to create a new container inside an existing pod's virtual machine, CRI-O adds
specific annotations to the OCI configuration file (`config.json`) which is passed to
the OCI compatible runtime.
Before calling its runtime, CRI-O will always add a `io.kubernetes.cri-o.ContainerType`
annotation to the `config.json` configuration file it produces from the Kubelet CRI
request. The `io.kubernetes.cri-o.ContainerType` annotation can either be set to `sandbox`
or `container`. Kata Containers will then use this annotation to decide if it needs to
respectively create a virtual machine or a container inside a virtual machine associated
with a Kubernetes pod:
```Go
containerType, err := ociSpec.ContainerType()
if err != nil {
return err
}
handleFactory(ctx, runtimeConfig)
disableOutput := noNeedForOutput(detach, ociSpec.Process.Terminal)
var process vc.Process
switch containerType {
case vc.PodSandbox:
process, err = createSandbox(ctx, ociSpec, runtimeConfig, containerID, bundlePath, console, disableOutput, systemdCgroup)
if err != nil {
return err
}
case vc.PodContainer:
process, err = createContainer(ctx, ociSpec, containerID, bundlePath, console, disableOutput)
if err != nil {
return err
}
}
```
#### Mixing VM based and namespace based runtimes
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/)
> has been supported and the user can specify runtime without the non-standardized annotations.
With `RuntimeClass`, users can define Kata Containers as a `RuntimeClass` and then explicitly specify that a pod being created as a Kata Containers pod. For details, please refer to [How to use Kata Containers and Containerd](../../docs/how-to/containerd-kata.md).
# Appendices
## DAX
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
feature to efficiently map some host-side files into the guest VM space.
In particular, Kata Containers uses the QEMU NVDIMM feature to provide a
memory-mapped virtual device that can be used to DAX map the virtual machine's
root filesystem into the guest memory address space.
Mapping files using DAX provides a number of benefits over more traditional VM
file and device mapping mechanisms:
- Mapping as a direct access devices allows the guest to directly access
the host memory pages (such as via Execute In Place (XIP)), bypassing the guest
page cache. This provides both time and space optimizations.
- Mapping as a direct access device inside the VM allows pages from the
host to be demand loaded using page faults, rather than having to make requests
via a virtualized device (causing expensive VM exits/hypercalls), thus providing
a speed optimization.
- Utilizing `MAP_SHARED` shared memory on the host allows the host to efficiently
share pages.
Kata Containers uses the following steps to set up the DAX mappings:
1. QEMU is configured with an NVDIMM memory device, with a memory file
backend to map in the host-side file into the virtual NVDIMM space.
2. The guest kernel command line mounts this NVDIMM device with the DAX
feature enabled, allowing direct page mapping and access, thus bypassing the
guest page cache.
![DAX](arch-images/DAX.png)
Information on the use of NVDIMM via QEMU is available in the [QEMU source code](http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/nvdimm.txt;hb=HEAD)

View File

@@ -1,477 +0,0 @@
# Kata Containers Architecture
## Overview
Kata Containers is an open source community working to build a secure
container [runtime](#runtime) with lightweight virtual machines (VM's)
that feel and perform like standard Linux containers, but provide
stronger [workload](#workload) isolation using hardware
[virtualization](#virtualization) technology as a second layer of
defence.
Kata Containers runs on [multiple architectures](../../../src/runtime/README.md#platform-support)
and supports [multiple hypervisors](../../hypervisors.md).
This document is a summary of the Kata Containers architecture.
## Background knowledge
This document assumes the reader understands a number of concepts
related to containers and file systems. The
[background](background.md) document explains these concepts.
## Example command
This document makes use of a particular [example
command](example-command.md) throughout the text to illustrate certain
concepts.
## Virtualization
For details on how Kata Containers maps container concepts to VM
technologies, and how this is realized in the multiple hypervisors and
VMMs that Kata supports see the
[virtualization documentation](../virtualization.md).
## Compatibility
The [Kata Containers runtime](../../../src/runtime) is compatible with
the [OCI](https://github.com/opencontainers)
[runtime specification](https://github.com/opencontainers/runtime-spec)
and therefore works seamlessly with the
[Kubernetes Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
through the [CRI-O](https://github.com/kubernetes-incubator/cri-o)
and [containerd](https://github.com/containerd/containerd)
implementations.
Kata Containers provides a ["shimv2"](#shim-v2-architecture) compatible runtime.
## Shim v2 architecture
The Kata Containers runtime is shim v2 ("shimv2") compatible. This
section explains what this means.
> **Note:**
>
> For a comparison with the Kata 1.x architecture, see
> [the architectural history document](history.md).
The
[containerd runtime shimv2 architecture](https://github.com/containerd/containerd/tree/main/runtime/v2)
or _shim API_ architecture resolves the issues with the old
architecture by defining a set of shimv2 APIs that a compatible
runtime implementation must supply. Rather than calling the runtime
binary multiple times for each new container, the shimv2 architecture
runs a single instance of the runtime binary (for any number of
containers). This improves performance and resolves the state handling
issue.
The shimv2 API is similar to the
[OCI runtime](https://github.com/opencontainers/runtime-spec)
API in terms of the way the container lifecycle is split into
different verbs. Rather than calling the runtime multiple times, the
container manager creates a socket and passes it to the shimv2
runtime. The socket is a bi-directional communication channel that
uses a gRPC based protocol to allow the container manager to send API
calls to the runtime, which returns the result to the container
manager using the same channel.
The shimv2 architecture allows running several containers per VM to
support container engines that require multiple containers running
inside a pod.
With the new architecture [Kubernetes](kubernetes.md) can
launch both Pod and OCI compatible containers with a single
[runtime](#runtime) shim per Pod, rather than `2N+1` shims. No stand
alone `kata-proxy` process is required, even if VSOCK is not
available.
## Workload
The workload is the command the user requested to run in the
container and is specified in the [OCI bundle](background.md#oci-bundle)'s
configuration file.
In our [example](example-command.md), the workload is the `sh(1)` command.
### Workload root filesystem
For details of how the [runtime](#runtime) makes the
[container image](background.md#container-image) chosen by the user available to
the workload process, see the
[Container creation](#container-creation) and [storage](#storage) sections.
Note that the workload is isolated from the [guest VM](#environments) environment by its
surrounding [container environment](#environments). The guest VM
environment where the container runs in is also isolated from the _outer_
[host environment](#environments) where the container manager runs.
## System overview
### Environments
The following terminology is used to describe the different or
environments (or contexts) various processes run in. It is necessary
to study this table closely to make sense of what follows:
| Type | Name | Virtualized | Containerized | rootfs | Rootfs device type | Mount type | Description |
|-|-|-|-|-|-|-|-|
| Host | Host | no `[1]` | no | Host specific | Host specific | Host specific | The environment provided by a standard, physical non virtualized system. |
| VM root | Guest VM | yes | no | rootfs inside the [guest image](guest-assets.md#guest-image) | Hypervisor specific `[2]` | `ext4` | The first (or top) level VM environment created on a host system. |
| VM container root | Container | yes | yes | rootfs type requested by user ([`ubuntu` in the example](example-command.md)) | `kataShared` | [virtio FS](storage.md#virtio-fs) | The first (or top) level container environment created inside the VM. Based on the [OCI bundle](background.md#oci-bundle). |
**Key:**
- `[1]`: For simplicity, this document assumes the host environment
runs on physical hardware.
- `[2]`: See the [DAX](#dax) section.
> **Notes:**
>
> - The word "root" is used to mean _top level_ here in a similar
> manner to the term [rootfs](background.md#root-filesystem).
>
> - The term "first level" prefix used above is important since it implies
> that it is possible to create multi level systems. However, they do
> not form part of a standard Kata Containers environment so will not
> be considered in this document.
The reasons for containerizing the [workload](#workload) inside the VM
are:
- Isolates the workload entirely from the VM environment.
- Provides better isolation between containers in a [pod](kubernetes.md).
- Allows the workload to be managed and monitored through its cgroup
confinement.
### Container creation
The steps below show at a high level how a Kata Containers container is
created using the containerd container manager:
1. The user requests the creation of a container by running a command
like the [example command](example-command.md).
1. The container manager daemon runs a single instance of the Kata
[runtime](#runtime).
1. The Kata runtime loads its [configuration file](#configuration).
1. The container manager calls a set of shimv2 API functions on the runtime.
1. The Kata runtime launches the configured [hypervisor](#hypervisor).
1. The hypervisor creates and starts (_boots_) a VM using the
[guest assets](guest-assets.md#guest-assets):
- The hypervisor [DAX](#dax) shares the
[guest image](guest-assets.md#guest-image)
into the VM to become the VM [rootfs](background.md#root-filesystem) (mounted on a `/dev/pmem*` device),
which is known as the [VM root environment](#environments).
- The hypervisor mounts the [OCI bundle](background.md#oci-bundle), using [virtio FS](storage.md#virtio-fs),
into a container specific directory inside the VM's rootfs.
This container specific directory will become the
[container rootfs](#environments), known as the
[container environment](#environments).
1. The [agent](#agent) is started as part of the VM boot.
1. The runtime calls the agent's `CreateSandbox` API to request the
agent create a container:
1. The agent creates a [container environment](#environments)
in the container specific directory that contains the [container rootfs](#environments).
The container environment hosts the [workload](#workload) in the
[container rootfs](#environments) directory.
1. The agent spawns the workload inside the container environment.
> **Notes:**
>
> - The container environment created by the agent is equivalent to
> a container environment created by the
> [`runc`](https://github.com/opencontainers/runc) OCI runtime;
> Linux cgroups and namespaces are created inside the VM by the
> [guest kernel](guest-assets.md#guest-kernel) to isolate the
> workload from the VM environment the container is created in.
> See the [Environments](#environments) section for an
> explanation of why this is done.
>
> - See the [guest image](guest-assets.md#guest-image) section for
> details of exactly how the agent is started.
1. The container manager returns control of the container to the
user running the `ctr` command.
> **Note:**
>
> At this point, the container is running and:
>
> - The [workload](#workload) process ([`sh(1)` in the example](example-command.md))
> is running in the [container environment](#environments).
> - The user is now able to interact with the workload
> (using the [`ctr` command in the example](example-command.md)).
> - The [agent](#agent), running inside the VM is monitoring the
> [workload](#workload) process.
> - The [runtime](#runtime) is waiting for the agent's `WaitProcess` API
> call to complete.
Further details of these steps are provided in the sections below.
### Container shutdown
There are two possible ways for the container environment to be
terminated:
- When the [workload](#workload) exits.
This is the standard, or _graceful_ shutdown method.
- When the container manager forces the container to be deleted.
#### Workload exit
The [agent](#agent) will detect when the [workload](#workload) process
exits, capture its exit status (see `wait(2)`) and return that value
to the [runtime](#runtime) by specifying it as the response to the
`WaitProcess` agent API call made by the [runtime](#runtime).
The runtime then passes the value back to the container manager by the
`Wait` [shimv2 API](#shim-v2-architecture) call.
Once the workload has fully exited, the VM is no longer needed and the
runtime cleans up the environment (which includes terminating the
[hypervisor](#hypervisor) process).
> **Note:**
>
> When [agent tracing is enabled](../../tracing.md#agent-shutdown-behaviour),
> the shutdown behaviour is different.
#### Container manager requested shutdown
If the container manager requests the container be deleted, the
[runtime](#runtime) will signal the agent by sending it a
`DestroySandbox` [ttRPC API](../../../src/libs/protocols/protos/agent.proto) request.
## Guest assets
The guest assets comprise a guest image and a guest kernel that are
used by the [hypervisor](#hypervisor).
See the [guest assets](guest-assets.md) document for further
information.
## Hypervisor
The [hypervisor](../../hypervisors.md) specified in the
[configuration file](#configuration) creates a VM to host the
[agent](#agent) and the [workload](#workload) inside the
[container environment](#environments).
> **Note:**
>
> The hypervisor process runs inside an environment slightly different
> to the host environment:
>
> - It is run in a different cgroup environment to the host.
> - It is given a separate network namespace from the host.
> - If the [OCI configuration specifies a SELinux label](https://github.com/opencontainers/runtime-spec/blob/main/config.md#linux-process),
> the hypervisor process will run with that label (*not* the workload running inside the hypervisor's VM).
## Agent
The Kata Containers agent ([`kata-agent`](../../../src/agent)), written
in the [Rust programming language](https://www.rust-lang.org), is a
long running process that runs inside the VM. It acts as the
supervisor for managing the containers and the [workload](#workload)
running within those containers. Only a single agent process is run
for each VM created.
### Agent communications protocol
The agent communicates with the other Kata components (primarily the
[runtime](#runtime)) using a
[`ttRPC`](https://github.com/containerd/ttrpc-rust) based
[protocol](../../../src/libs/protocols/protos).
> **Note:**
>
> If you wish to learn more about this protocol, a practical way to do
> so is to experiment with the
> [agent control tool](#agent-control-tool) on a test system.
> This tool is for test and development purposes only and can send
> arbitrary ttRPC agent API commands to the [agent](#agent).
## Runtime
The Kata Containers runtime (the [`containerd-shim-kata-v2`](../../../src/runtime/cmd/containerd-shim-kata-v2
) binary) is a [shimv2](#shim-v2-architecture) compatible runtime.
> **Note:**
>
> The Kata Containers runtime is sometimes referred to as the Kata
> _shim_. Both terms are correct since the `containerd-shim-kata-v2`
> is a container runtime, and that runtime implements the containerd
> shim v2 API.
The runtime makes heavy use of the [`virtcontainers`
package](../../../src/runtime/virtcontainers), which provides a generic,
runtime-specification agnostic, hardware-virtualized containers
library.
The runtime is responsible for starting the [hypervisor](#hypervisor)
and it's VM, and communicating with the [agent](#agent) using a
[ttRPC based protocol](#agent-communications-protocol) over a VSOCK
socket that provides a communications link between the VM and the
host.
This protocol allows the runtime to send container management commands
to the agent. The protocol is also used to carry the standard I/O
streams (`stdout`, `stderr`, `stdin`) between the containers and
container managers (such as CRI-O or containerd).
## Utility program
The `kata-runtime` binary is a utility program that provides
administrative commands to manipulate and query a Kata Containers
installation.
> **Note:**
>
> In Kata 1.x, this program also acted as the main
> [runtime](#runtime), but this is no longer required due to the
> improved shimv2 architecture.
### exec command
The `exec` command allows an administrator or developer to enter the
[VM root environment](#environments) which is not accessible by the container
[workload](#workload).
See [the developer guide](../../Developer-Guide.md#connect-to-debug-console) for further details.
### Configuration
See the [configuration file details](../../../src/runtime/README.md#configuration).
The configuration file is also used to enable runtime [debug output](../../Developer-Guide.md#enable-full-debug).
## Process overview
The table below shows an example of the main processes running in the
different [environments](#environments) when a Kata Container is
created with containerd using our [example command](example-command.md):
| Description | Host | VM root environment | VM container environment |
|-|-|-|-|
| Container manager | `containerd` | |
| Kata Containers | [runtime](#runtime), [`virtiofsd`](storage.md#virtio-fs), [hypervisor](#hypervisor) | [agent](#agent) |
| User [workload](#workload) | | | [`ubuntu sh`](example-command.md) |
## Networking
See the [networking document](networking.md).
## Storage
See the [storage document](storage.md).
## Kubernetes support
See the [Kubernetes document](kubernetes.md).
#### OCI annotations
In order for the Kata Containers [runtime](#runtime) (or any VM based OCI compatible
runtime) to be able to understand if it needs to create a full VM or if it
has to create a new container inside an existing pod's VM, CRI-O adds
specific annotations to the OCI configuration file (`config.json`) which is passed to
the OCI compatible runtime.
Before calling its runtime, CRI-O will always add a `io.kubernetes.cri-o.ContainerType`
annotation to the `config.json` configuration file it produces from the Kubelet CRI
request. The `io.kubernetes.cri-o.ContainerType` annotation can either be set to `sandbox`
or `container`. Kata Containers will then use this annotation to decide if it needs to
respectively create a virtual machine or a container inside a virtual machine associated
with a Kubernetes pod:
| Annotation value | Kata VM created? | Kata container created? |
|-|-|-|
| `sandbox` | yes | yes (inside new VM) |
| `container`| no | yes (in existing VM) |
#### Mixing VM based and namespace based runtimes
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/)
> has been supported and the user can specify runtime without the non-standardized annotations.
With `RuntimeClass`, users can define Kata Containers as a
`RuntimeClass` and then explicitly specify that a pod must be created
as a Kata Containers pod. For details, please refer to [How to use
Kata Containers and containerd](../../../docs/how-to/containerd-kata.md).
## Tracing
The [tracing document](../../tracing.md) provides details on the tracing
architecture.
# Appendices
## DAX
Kata Containers utilizes the Linux kernel DAX
[(Direct Access filesystem)](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.rst?h=v5.14)
feature to efficiently map the [guest image](guest-assets.md#guest-image) in the
[host environment](#environments) into the
[guest VM environment](#environments) to become the VM's
[rootfs](background.md#root-filesystem).
If the [configured](#configuration) [hypervisor](#hypervisor) is set
to either QEMU or Cloud Hypervisor, DAX is used with the feature shown
in the table below:
| Hypervisor | Feature used | rootfs device type |
|-|-|-|
| Cloud Hypervisor (CH) | `dax` `FsConfig` configuration option | PMEM (emulated Persistent Memory device) |
| QEMU | NVDIMM memory device with a memory file backend | NVDIMM (emulated Non-Volatile Dual In-line Memory Module device) |
The features in the table above are equivalent in that they provide a memory-mapped
virtual device which is used to DAX map the VM's
[rootfs](background.md#root-filesystem) into the [VM guest](#environments) memory
address space.
The VM is then booted, specifying the `root=` kernel parameter to make
the [guest kernel](guest-assets.md#guest-kernel) use the appropriate emulated device
as its rootfs.
### DAX advantages
Mapping files using [DAX](#dax) provides a number of benefits over
more traditional VM file and device mapping mechanisms:
- Mapping as a direct access device allows the guest to directly
access the host memory pages (such as via Execute In Place (XIP)),
bypassing the [guest kernel](guest-assets.md#guest-kernel)'s page cache. This
zero copy provides both time and space optimizations.
- Mapping as a direct access device inside the VM allows pages from the
host to be demand loaded using page faults, rather than having to make requests
via a virtualized device (causing expensive VM exits/hypercalls), thus providing
a speed optimization.
- Utilizing `mmap(2)`'s `MAP_SHARED` shared memory option on the host
allows the host to efficiently share pages.
![DAX](../arch-images/DAX.png)
For further details of the use of NVDIMM with QEMU, see the [QEMU
project documentation](https://www.qemu.org).
## Agent control tool
The [agent control tool](../../../src/tools/agent-ctl) is a test and
development tool that can be used to learn more about a Kata
Containers system.
## Terminology
See the [project glossary](../../../Glossary.md).

View File

@@ -1,81 +0,0 @@
# Kata Containers architecture background knowledge
The following sections explain some of the background concepts
required to understand the [architecture document](README.md).
## Root filesystem
This document uses the term _rootfs_ to refer to a root filesystem
which is mounted as the top-level directory ("`/`") and often referred
to as _slash_.
It is important to understand this term since the overall system uses
multiple different rootfs's (as explained in the
[Environments](README.md#environments) section.
## Container image
In the [example command](example-command.md) the user has specified the
type of container they wish to run via the container image name:
`ubuntu`. This image name corresponds to a _container image_ that can
be used to create a container with an Ubuntu Linux environment. Hence,
in our [example](example-command.md), the `sh(1)` command will be run
inside a container which has an Ubuntu rootfs.
> **Note:**
>
> The term _container image_ is confusing since the image in question
> is **not** a container: it is simply a set of files (_an image_)
> that can be used to _create_ a container. The term _container
> template_ would be more accurate but the term _container image_ is
> commonly used so this document uses the standard term.
For the purposes of this document, the most important part of the
[example command line](example-command.md) is the container image the
user has requested. Normally, the container manager will _pull_
(download) a container image from a remote site and store a copy
locally. This local container image is used by the container manager
to create an [OCI bundle](#oci-bundle) which will form the environment
the container will run in. After creating the OCI bundle, the
container manager launches a [runtime](README.md#runtime) which will create the
container using the provided OCI bundle.
## OCI bundle
To understand what follows, it is important to know at a high level
how an OCI ([Open Containers Initiative](https://opencontainers.org)) compatible container is created.
An OCI compatible container is created by taking a
[container image](#container-image) and converting the embedded rootfs
into an
[OCI rootfs bundle](https://github.com/opencontainers/runtime-spec/blob/main/bundle.md),
or more simply, an _OCI bundle_.
An OCI bundle is a `tar(1)` archive normally created by a container
manager which is passed to an OCI [runtime](README.md#runtime) which converts
it into a full container rootfs. The bundle contains two assets:
- A container image [rootfs](#root-filesystem)
This is simply a directory of files that will be used to represent
the rootfs for the container.
For the [example command](example-command.md), the directory will
contain the files necessary to create a minimal Ubuntu root
filesystem.
- An [OCI configuration file](https://github.com/opencontainers/runtime-spec/blob/main/config.md)
This is a JSON file called `config.json`.
The container manager will create this file so that:
- The `root.path` value is set to the full path of the specified
container rootfs.
In [the example](example-command.md) this value will be `ubuntu`.
- The `process.args` array specifies the list of commands the user
wishes to run. This is known as the [workload](README.md#workload).
In [the example](example-command.md) the workload is `sh(1)`.

View File

@@ -1,30 +0,0 @@
# Example command
The following containerd command creates a container. It is referred
to throughout the architecture document to help explain various points:
```bash
$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t "quay.io/libpod/ubuntu:latest" foo sh
```
This command requests that containerd:
- Create a container (`ctr run`).
- Use the Kata [shimv2](README.md#shim-v2-architecture) runtime (`--runtime "io.containerd.kata.v2"`).
- Delete the container when it [exits](README.md#workload-exit) (`--rm`).
- Attach the container to the user's terminal (`-t`).
- Use the Ubuntu Linux [container image](background.md#container-image)
to create the container [rootfs](background.md#root-filesystem) that will become
the [container environment](README.md#environments)
(`quay.io/libpod/ubuntu:latest`).
- Create the container with the name "`foo`".
- Run the `sh(1)` command in the Ubuntu rootfs based container
environment.
The command specified here is referred to as the [workload](README.md#workload).
> **Note:**
>
> For the purposes of this document and to keep explanations
> simpler, we assume the user is running this command in the
> [host environment](README.md#environments).

View File

@@ -1,152 +0,0 @@
# Guest assets
Kata Containers creates a VM in which to run one or more containers.
It does this by launching a [hypervisor](README.md#hypervisor) to
create the VM. The hypervisor needs two assets for this task: a Linux
kernel and a small root filesystem image to boot the VM.
## Guest kernel
The [guest kernel](../../../tools/packaging/kernel)
is passed to the hypervisor and used to boot the VM.
The default kernel provided in Kata Containers is highly optimized for
kernel boot time and minimal memory footprint, providing only those
services required by a container workload. It is based on the latest
Linux LTS (Long Term Support) [kernel](https://www.kernel.org).
## Guest image
The hypervisor uses an image file which provides a minimal root
filesystem used by the guest kernel to boot the VM and host the Kata
Container. Kata Containers supports both initrd and rootfs based
minimal guest images. The [default packages](../../install/) provide both
an image and an initrd, both of which are created using the
[`osbuilder`](../../../tools/osbuilder) tool.
> **Notes:**
>
> - Although initrd and rootfs based images are supported, not all
> [hypervisors](README.md#hypervisor) support both types of image.
>
> - The guest image is *unrelated* to the image used in a container
> workload.
>
> For example, if a user creates a container that runs a shell in a
> BusyBox image, they will run that shell in a BusyBox environment.
> However, the guest image running inside the VM that is used to
> *host* that BusyBox image could be running Clear Linux, Ubuntu,
> Fedora or any other distribution potentially.
>
> The `osbuilder` tool provides
> [configurations for various common Linux distributions](../../../tools/osbuilder/rootfs-builder)
> which can be built into either initrd or rootfs guest images.
>
> - If you are using a [packaged version of Kata
> Containers](../../install), you can see image details by running the
> [`kata-collect-data.sh`](../../../src/runtime/data/kata-collect-data.sh.in)
> script as `root` and looking at the "Image details" section of the
> output.
#### Root filesystem image
The default packaged rootfs image, sometimes referred to as the _mini
O/S_, is a highly optimized container bootstrap system.
If this image type is [configured](README.md#configuration), when the
user runs the [example command](example-command.md):
- The [runtime](README.md#runtime) will launch the configured [hypervisor](README.md#hypervisor).
- The hypervisor will boot the mini-OS image using the [guest kernel](#guest-kernel).
- The kernel will start the init daemon as PID 1 (`systemd`) inside the VM root environment.
- `systemd`, running inside the mini-OS context, will launch the [agent](README.md#agent)
in the root context of the VM.
- The agent will create a new container environment, setting its root
filesystem to that requested by the user (Ubuntu in [the example](example-command.md)).
- The agent will then execute the command (`sh(1)` in [the example](example-command.md))
inside the new container.
The table below summarises the default mini O/S showing the
environments that are created, the services running in those
environments (for all platforms) and the root filesystem used by
each service:
| Process | Environment | systemd service? | rootfs | User accessible | Notes |
|-|-|-|-|-|-|
| systemd | VM root | n/a | [VM guest image](#guest-image)| [debug console][debug-console] | The init daemon, running as PID 1 |
| [Agent](README.md#agent) | VM root | yes | [VM guest image](#guest-image)| [debug console][debug-console] | Runs as a systemd service |
| `chronyd` | VM root | yes | [VM guest image](#guest-image)| [debug console][debug-console] | Used to synchronise the time with the host |
| container workload (`sh(1)` in [the example](example-command.md)) | VM container | no | User specified (Ubuntu in [the example](example-command.md)) | [exec command](README.md#exec-command) | Managed by the agent |
See also the [process overview](README.md#process-overview).
> **Notes:**
>
> - The "User accessible" column shows how an administrator can access
> the environment.
>
> - The container workload is running inside a full container
> environment which itself is running within a VM environment.
>
> - See the [configuration files for the `osbuilder` tool](../../../tools/osbuilder/rootfs-builder)
> for details of the default distribution for platforms other than
> Intel x86_64.
#### Initrd image
The initrd image is a compressed `cpio(1)` archive, created from a
rootfs which is loaded into memory and used as part of the Linux
startup process. During startup, the kernel unpacks it into a special
instance of a `tmpfs` mount that becomes the initial root filesystem.
If this image type is [configured](README.md#configuration), when the user runs
the [example command](example-command.md):
- The [runtime](README.md#runtime) will launch the configured [hypervisor](README.md#hypervisor).
- The hypervisor will boot the mini-OS image using the [guest kernel](#guest-kernel).
- The kernel will start the init daemon as PID 1 (the
[agent](README.md#agent))
inside the VM root environment.
- The [agent](README.md#agent) will create a new container environment, setting its root
filesystem to that requested by the user (`ubuntu` in
[the example](example-command.md)).
- The agent will then execute the command (`sh(1)` in [the example](example-command.md))
inside the new container.
The table below summarises the default mini O/S showing the environments that are created,
the processes running in those environments (for all platforms) and
the root filesystem used by each service:
| Process | Environment | rootfs | User accessible | Notes |
|-|-|-|-|-|
| [Agent](README.md#agent) | VM root | [VM guest image](#guest-image) | [debug console][debug-console] | Runs as the init daemon (PID 1) |
| container workload | VM container | User specified (Ubuntu in this example) | [exec command](README.md#exec-command) | Managed by the agent |
> **Notes:**
>
> - The "User accessible" column shows how an administrator can access
> the environment.
>
> - It is possible to use a standard init daemon such as systemd with
> an initrd image if this is desirable.
See also the [process overview](README.md#process-overview).
#### Image summary
| Image type | Default distro | Init daemon | Reason | Notes |
|-|-|-|-|-|
| [image](background.md#root-filesystem-image) | [Clear Linux](https://clearlinux.org) (for x86_64 systems)| systemd | Minimal and highly optimized | systemd offers flexibility |
| [initrd](#initrd-image) | [Alpine Linux](https://alpinelinux.org) | Kata [agent](README.md#agent) (as no systemd support) | Security hardened and tiny C library |
See also:
- The [osbuilder](../../../tools/osbuilder) tool
This is used to build all default image types.
- The [versions database](../../../versions.yaml)
The `default-image-name` and `default-initrd-name` options specify
the default distributions for each image type.
[debug-console]: ../../Developer-Guide.md#connect-to-debug-console

View File

@@ -1,41 +0,0 @@
# History
## Kata 1.x architecture
In the old [Kata 1.x architecture](https://github.com/kata-containers/documentation/blob/master/design/architecture.md),
the Kata [runtime](README.md#runtime) was an executable called `kata-runtime`.
The container manager called this executable multiple times when
creating each container. Each time the runtime was called a different
OCI command-line verb was provided. This architecture was simple, but
not well suited to creating VM based containers due to the issue of
handling state between calls. Additionally, the architecture suffered
from performance issues related to continually having to spawn new
instances of the runtime binary, and
[Kata shim](https://github.com/kata-containers/shim) and
[Kata proxy](https://github.com/kata-containers/proxy) processes for systems
that did not provide VSOCK.
## Kata 2.x architecture
See the ["shimv2"](README.md#shim-v2-architecture) section of the
architecture document.
## Architectural comparison
| Kata version | Kata Runtime process calls | Kata shim processes | Kata proxy processes (if no VSOCK) |
|-|-|-|-|
| 1.x | multiple per container | 1 per container connection | 1 |
| 2.x | 1 per VM (hosting any number of containers) | 0 | 0 |
> **Notes:**
>
> - A single VM can host one or more containers.
>
> - The "Kata shim processes" column refers to the old
> [Kata shim](https://github.com/kata-containers/shim) (`kata-shim` binary),
> *not* the new shimv2 runtime instance (`containerd-shim-kata-v2` binary).
The diagram below shows how the original architecture was simplified
with the advent of shimv2.
![Kubernetes integration with shimv2](../arch-images/shimv2.svg)

View File

@@ -1,35 +0,0 @@
# Kubernetes support
[Kubernetes](https://github.com/kubernetes/kubernetes/), or K8s, is a popular open source
container orchestration engine. In Kubernetes, a set of containers sharing resources
such as networking, storage, mount, PID, etc. is called a
[pod](https://kubernetes.io/docs/user-guide/pods/).
A node can have multiple pods, but at a minimum, a node within a Kubernetes cluster
only needs to run a container runtime and a container agent (called a
[Kubelet](https://kubernetes.io/docs/admin/kubelet/)).
Kata Containers represents a Kubelet pod as a VM.
A Kubernetes cluster runs a control plane where a scheduler (typically
running on a dedicated master node) calls into a compute Kubelet. This
Kubelet instance is responsible for managing the lifecycle of pods
within the nodes and eventually relies on a container runtime to
handle execution. The Kubelet architecture decouples lifecycle
management from container execution through a dedicated gRPC based
[Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/container-runtime-interface-v1.md).
In other words, a Kubelet is a CRI client and expects a CRI
implementation to handle the server side of the interface.
[CRI-O](https://github.com/kubernetes-incubator/cri-o) and
[containerd](https://github.com/containerd/containerd/) are CRI
implementations that rely on
[OCI](https://github.com/opencontainers/runtime-spec) compatible
runtimes for managing container instances.
Kata Containers is an officially supported CRI-O and containerd
runtime. Refer to the following guides on how to set up Kata
Containers with Kubernetes:
- [How to use Kata Containers and containerd](../../how-to/containerd-kata.md)
- [Run Kata Containers with Kubernetes](../../how-to/run-kata-with-k8s.md)

View File

@@ -1,49 +0,0 @@
# Networking
Containers typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network.
In order to setup the network for a container, container engines call into a
networking plugin. The network plugin will usually create a virtual
ethernet (`veth`) pair adding one end of the `veth` pair into the container
networking namespace, while the other end of the `veth` pair is added to the
host networking namespace.
This is a very namespace-centric approach as many hypervisors or VM
Managers (VMMs) such as `virt-manager` cannot handle `veth`
interfaces. Typically, [`TAP`](https://www.kernel.org/doc/Documentation/networking/tuntap.txt)
interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
and virtual machines, Kata Containers networking transparently connects `veth`
interfaces with `TAP` ones using [Traffic Control](https://man7.org/linux/man-pages/man8/tc.8.html):
![Kata Containers networking](../arch-images/network.png)
With a TC filter rules in place, a redirection is created between the container network
and the virtual machine. As an example, the network plugin may place a device,
`eth0`, in the container's network namespace, which is one end of a VETH device.
Kata Containers will create a tap device for the VM, `tap0_kata`,
and setup a TC redirection filter to redirect traffic from `eth0`'s ingress to `tap0_kata`'s egress,
and a second TC filter to redirect traffic from `tap0_kata`'s ingress to `eth0`'s egress.
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata.
With this method, Kata created a MACVTAP device to connect directly to the `eth0` device.
TC-filter is the default because it allows for simpler configuration, better CNI plugin
compatibility, and performance on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.
Kata Containers supports both
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md#the-container-network-model)
and [CNI](https://github.com/containernetworking/cni) for networking management.
## Network Hotplug
Kata Containers has developed a set of network sub-commands and APIs to add, list and
remove a guest network endpoint and to manipulate the guest route table.
The following diagram illustrates the Kata Containers network hotplug workflow.
![Network Hotplug](../arch-images/kata-containers-network-hotplug.png)

View File

@@ -1,44 +0,0 @@
# Storage
## virtio SCSI
If a block-based graph driver is [configured](README.md#configuration),
`virtio-scsi` is used to _share_ the workload image (such as
`busybox:latest`) into the container's environment inside the VM.
## virtio FS
If a block-based graph driver is _not_ [configured](README.md#configuration), a
[`virtio-fs`](https://virtio-fs.gitlab.io) (`VIRTIO`) overlay
filesystem mount point is used to _share_ the workload image instead. The
[agent](README.md#agent) uses this mount point as the root filesystem for the
container processes.
For virtio-fs, the [runtime](README.md#runtime) starts one `virtiofsd` daemon
(that runs in the host context) for each VM created.
## Devicemapper
The
[devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/master/snapshots/devmapper)
is a special case. The `snapshotter` uses dedicated block devices
rather than formatted filesystems, and operates at the block level
rather than the file level. This knowledge is used to directly use the
underlying block device instead of the overlay file system for the
container root file system. The block device maps to the top
read-write layer for the overlay. This approach gives much better I/O
performance compared to using `virtio-fs` to share the container file
system.
#### Hot plug and unplug
Kata Containers has the ability to hot plug add and hot plug remove
block devices. This makes it possible to use block devices for
containers started after the VM has been launched.
Users can check to see if the container uses the `devicemapper` block
device as its rootfs by calling `mount(8)` within the container. If
the `devicemapper` block device is used, the root filesystem (`/`)
will be mounted from `/dev/vda`. Users can disable direct mounting of
the underlying block device through the runtime
[configuration](README.md#configuration).

View File

@@ -1825,8 +1825,12 @@ components:
desc: ""
- value: grpc.StartContainerRequest
desc: ""
- value: grpc.StartTracingRequest
desc: ""
- value: grpc.StatsContainerRequest
desc: ""
- value: grpc.StopTracingRequest
desc: ""
- value: grpc.TtyWinResizeRequest
desc: ""
- value: grpc.UpdateContainerRequest

View File

@@ -12,244 +12,187 @@ The OCI [runtime specification][linux-config] provides guidance on where the con
> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container
Cgroups are hierarchical, and this can be seen with the following pod example:
cgroups are hierarchical, and this can be seen with the following pod example:
- Pod 1: `cgroupsPath=/kubepods/pod1`
- Container 1: `cgroupsPath=/kubepods/pod1/container1`
- Container 2: `cgroupsPath=/kubepods/pod1/container2`
- Container 1:
`cgroupsPath=/kubepods/pod1/container1`
- Container 2:
`cgroupsPath=/kubepods/pod1/container2`
- Pod 2: `cgroupsPath=/kubepods/pod2`
- Container 1: `cgroupsPath=/kubepods/pod2/container1`
- Container 2: `cgroupsPath=/kubepods/pod2/container2`
- Container 1:
`cgroupsPath=/kubepods/pod2/container2`
- Container 2:
`cgroupsPath=/kubepods/pod2/container2`
Depending on the upper-level orchestration layers, the cgroup under which the pod is placed is
managed by the orchestrator or not. In the case of Kubernetes, the pod cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime.
Kubelet will size the pod cgroup based on the container resource requirements, to which it may add
a configured set of [pod resource overheads](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/).
Depending on the upper-level orchestrator, the cgroup under which the pod is placed is
managed by the orchestrator. In the case of Kubernetes, the pod-cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime. Kubelet will size the pod-cgroup
based on the container resource requirements.
Kata Containers introduces a non-negligible resource overhead for running a sandbox (pod). Typically, the Kata shim,
through its underlying VMM invocation, will create many additional threads compared to process based container runtimes:
the para-virtualized I/O back-ends, the VMM instance or even the Kata shim process, all of those host processes consume
memory and CPU time not directly tied to the container workload, and introduces a sandbox resource overhead.
In order for a Kata workload to run without significant performance degradation, its sandbox overhead must be
provisioned accordingly. Two scenarios are possible:
Kata Containers introduces a non-negligible overhead for running a sandbox (pod). Based on this, two scenarios are possible:
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod-cgroup, or
2) Kata Containers do not fully constrain the VMM and associated processes, instead placing a subset of them outside of the pod-cgroup.
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod cgroup.
For example, Kubernetes [`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
feature lets the orchestrator add a configured sandbox overhead to the sum of all its containers resources. In
that case, the pod sandbox is properly sized and all Kata created processes will run under the pod cgroup
defined constraints and limits.
2) The upper-layer orchestrator does **not** take the sandbox overhead into account and the pod cgroup is not
sized to properly run all Kata created processes. With that scenario, attaching all the Kata processes to the sandbox
cgroup may lead to non-negligible workload performance degradations. As a consequence, Kata Containers will move
all processes but the vCPU threads into a dedicated overhead cgroup under `/kata_overhead`. The Kata runtime will
not apply any constraints or limits to that cgroup, it is up to the infrastructure owner to optionally set it up.
Kata Containers provides two options for how cgroups are handled on the host. Selection of these options is done through
the `SandboxCgroupOnly` flag within the Kata Containers [configuration](../../src/runtime/README.md#configuration)
file.
Those 2 scenarios are not dynamically detected by the Kata Containers runtime implementation, and thus the
infrastructure owner must configure the runtime according to how the upper-layer orchestrator creates and sizes the
pod cgroup. That configuration selection is done through the `sandbox_cgroup_only` flag within the Kata Containers
[configuration](../../src/runtime/README.md#configuration) file.
## `SandboxCgroupOnly` enabled
## `sandbox_cgroup_only = true`
With `SandboxCgroupOnly` enabled, it is expected that the parent cgroup is sized to take the overhead of running
a sandbox into account. This is ideal, as all the applicable Kata Containers components can be placed within the
given cgroup-path.
Setting `sandbox_cgroup_only` to `true` from the Kata Containers configuration file means that the pod cgroup is
properly sized and takes the pod overhead into account. This is ideal, as all the applicable Kata Containers processes
can simply be placed within the given cgroup path.
In the context of Kubernetes, Kubelet can size the pod cgroup to take the overhead of running a Kata-based sandbox
into account. This has been supported since the 1.16 Kubernetes release, through the
[`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) feature.
In the context of Kubernetes, Kubelet will size the pod-cgroup to take the overhead of running a Kata-based sandbox
into account. This will be feasible in the 1.16 Kubernetes release through the `PodOverhead` feature.
```
┌─────────────────────────────────────────┐
┌──────────────────────────────────┐ │
│ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ vCPU threads
│ │ │ I/O threads │ │ │ │
│ │ │ │ VMM │ │
│ │ │ │ Kata Shim
│ │ │ │ │ │ │
│ │ │ │ /kata_<sandbox_id>
│ │ └─────────────────────┘ │ │ │
│ │Pod 1 │ │ │
│ │ └─────────────────────────────┘ │ │
│ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │
│ │ │ vCPU threads
│ │ │ I/O threads │ │ │ │
│ │ │ │ VMM
│ │ │ Kata Shim │ │ │ │
│ │ │ │
│ │ │ │ /kata_<sandbox_id>
│ │ │ └─────────────────────┘ │ │ │
│ │ │Pod 2 │ │ │
│ │ └─────────────────────────────┘ │ │
│ │ │ │
│ │/kubepods │ │
│ └──────────────────────────────────┘ │
│ │
│ Node │
└─────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation details
### What does Kata do in this configuration?
1. Given a `PodSandbox` container creation, let:
When `sandbox_cgroup_only` is enabled, the Kata shim will create a per pod
sub-cgroup under the pod's dedicated cgroup. For example, in the Kubernetes context,
it will create a `/kata_<PodSandboxID>` under the `/kubepods` cgroup hierarchy.
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, the memory cgroup
subsystem for a pod with sandbox ID `12345678` would live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678`.
```
podCgroup=Parent(container.CgroupsPath)
KataSandboxCgroup=<podCgroup>/kata_<PodSandboxID>
```
In most cases, the `/kata_<PodSandboxID>` created cgroup is unrestricted and inherits and shares all
constraints and limits from the parent cgroup (`/kubepods` in the Kubernetes case). The exception is
for the `cpuset` and `devices` cgroup subsystems, which are managed by the Kata shim.
2. Create the cgroup, `KataSandboxCgroup`
After creating the `/kata_<PodSandboxID>` cgroup, the Kata Containers shim will move itself to it, **before** starting
the virtual machine. As a consequence all processes subsequently created by the Kata Containers shim (the VMM itself, and
all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>` cgroup.
3. Join the `KataSandboxCgroup`
### Why create a kata-cgroup under the parent cgroup?
Any process created by the runtime will be created in `KataSandboxCgroup`.
The runtime will limit the cgroup in the host only if the sandbox doesn't have a
container type annotation, but the caller is free to set the proper limits for the `podCgroup`.
And why not directly adding the per sandbox shim directly to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?
In the example above the pod cgroups are `/kubepods/pod1` and `/kubepods/pod2`.
Kata creates the unrestricted sandbox cgroup under the pod cgroup.
The Kata Containers shim implementation creates a per-sandbox cgroup
(`/kata_<PodSandboxID>`) to support the `Docker` use case. Although `Docker` does not
have a notion of pods, Kata Containers still creates a sandbox to support the pod-less,
single container use case that `Docker` implements. Since `Docker` does create any
cgroup hierarchy to place a container into, it would be very complex for Kata to map
a particular container to its sandbox without placing it under a `/kata_<containerID>>`
sub-cgroup first.
### Why create a Kata-cgroup under the parent cgroup?
### Advantages
`Docker` does not have a notion of pods, and will not create a cgroup directory
to place a particular container in (i.e., all containers would be in a path like
`/docker/container-id`. To simplify the implementation and continue to support `Docker`,
Kata Containers creates the sandbox-cgroup, in the case of Kubernetes, or a container cgroup, in the case
of docker.
Keeping all Kata Containers processes under a properly sized pod cgroup is ideal
and makes for a simpler Kata Containers implementation. It also helps with gathering
accurate statistics and preventing Kata workloads from being noisy neighbors.
### Improvements
#### Pod resources statistics
- Get statistics about pod resources
If the Kata caller wants to know the resource usage on the host it can get
statistics from the pod cgroup. All cgroups stats in the hierarchy will include
the Kata overhead. This gives the possibility of gathering usage-statics at the
pod level and the container level.
#### Better host resource isolation
- Better host resource isolation
Because the Kata runtime will place all the Kata processes in the pod cgroup,
the resource limits that the caller applies to the pod cgroup will affect all
processes that belong to the Kata sandbox in the host. This will improve the
isolation in the host preventing Kata to become a noisy neighbor.
## `sandbox_cgroup_only = false` (Default setting)
If the cgroup provided to Kata is not sized appropriately, Kata components will
consume resources that the actual container workloads expect to see and use.
This can cause instability and performance degradations.
To avoid that situation, Kata Containers creates an unconstrained overhead
cgroup and moves all non workload related processes (Anything but the virtual CPU
threads) to it. The name of this overhead cgroup is `/kata_overhead` and a per
sandbox sub cgroup will be created under it for each sandbox Kata Containers creates.
Kata Containers does not add any constraints or limitations on the overhead cgroup. It is up to the infrastructure
owner to either:
- Provision nodes with a pre-sized `/kata_overhead` cgroup. Kata Containers will
load that existing cgroup and move all non workload related processes to it.
- Let Kata Containers create the `/kata_overhead` cgroup, leave it
unconstrained or resize it a-posteriori.
## `SandboxCgroupOnly` disabled (default, legacy)
If the cgroup provided to Kata is not sized appropriately, instability will be
introduced when fully constraining Kata components, and the user-workload will
see a subset of resources that were requested. Based on this, the default
handling for Kata Containers is to not fully constrain the VMM and Kata
components on the host.
```
┌────────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────┐ ┌───────────────────────────┐ │
│ │ │ │ │ │
┌─────────────────────────┼────┼─────────────────────────┐ │ │
│ │ │ │ │ │
│ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ vCPU threads │ │ │ │ VMM │ │ │ │
│ │ │ │ │ │ I/O threads │ │ │ │
│ │ │ │ │ │ │ │ Kata Shim │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ /kata_<sandbox_id> │ │ │ │ /<sandbox_id> │ │ │ │
│ └─────────────────────┘ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ Pod 1 │ │ │ │ │
└─────────────────────────┼────┼─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ │ │
┌─────────────────────────┼────┼─────────────────────────┐ │ │
│ │ │ │ │ │ │ │
│ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │ │
│ │ vCPU threads │ │ │ │ VMM │ │ │ │
│ │ │ │ │ │ │ │ I/O threads │ │ │ │
│ │ │ │ │ │ │ │ Kata Shim │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ /kata_<sandbox_id> │ │ │ │ /<sandbox_id> │ │ │ │
│ │ │ └─────────────────────┘ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │
│ │ │ Pod 2 │ │ │ │ │
│ │ └─────────────────────────┼────┼─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ /kubepods │ │ /kata_overhead │ │
│ └─────────────────────────────┘ └───────────────────────────┘ │
│ │
│ │
│ Node │
└────────────────────────────────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| +---------------------------------------------------+ |
| | Hypervisor | |
| |Kata | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation Details
### What does this method do?
When `sandbox_cgroup_only` is disabled, the Kata Containers shim will create a per pod
sub-cgroup under the pods dedicated cgroup, and another one under the overhead cgroup.
For example, in the Kubernetes context, it will create a `/kata_<PodSandboxID>` under
the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overhead` one.
1. Given a container creation let `containerCgroupHost=container.CgroupsPath`
1. Rename `containerCgroupHost` path to add `kata_`
1. Let `PodCgroupPath=PodSanboxContainerCgroup` where `PodSanboxContainerCgroup` is the cgroup of a container of type `PodSandbox`
1. Limit the `PodCgroupPath` with the sum of all the container limits in the Sandbox
1. Move only vCPU threads of hypervisor to `PodCgroupPath`
1. Per each container, move its `kata-shim` to its own `containerCgroupHost`
1. Move hypervisor and applicable threads to memory cgroup `/kata`
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod which sandbox
ID is `12345678`, create with `sandbox_cgroup_only` disabled, the 2 memory subsystems
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.
_Note_: the Kata Containers runtime will not add all the hypervisor threads to
the cgroup path requested, only vCPUs. These threads are run unconstrained.
Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.
This mitigates the risk of the VMM and other threads receiving an out of memory scenario (`OOM`).
With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate for the actual container workloads processes. For Kata, this maps
to the VMM created virtual CPU threads and so they are the only ones running under the pod
cgroup. This mitigates the risk of the VMM, the Kata shim and the I/O threads going through
a catastrophic out of memory scenario (`OOM`).
#### Pros and Cons
#### Impact
Running all non vCPU threads under an unconstrained overhead cgroup could lead to workloads
potentially consuming a large amount of host resources.
If resources are reserved at a system level to account for the overheads of
running sandbox containers, this configuration can be utilized with adequate
stability. In this scenario, non-negligible amounts of CPU and memory will be
utilized unaccounted for on the host.
On the other hand, running all non vCPU threads under a dedicated overhead cgroup can provide
accurate metrics on the actual Kata Container pod overhead, allowing for tuning the overhead
cgroup size and constraints accordingly.
[linux-config]: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#cgroups-path
[linux-config]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#cgroups-path
# Supported cgroups
Kata Containers currently only supports cgroups `v1`.
In the following sections each cgroup is described briefly.
Kata Containers supports cgroups `v1` and `v2`. In the following sections each cgroup is
described briefly and what changes are needed in Kata Containers to support it.
## Cgroups V1
@@ -301,7 +244,7 @@ diagram:
A process can join a cgroup by writing its process id (`pid`) to `cgroup.procs` file,
or join a cgroup partially by writing the task (thread) id (`tid`) to the `tasks` file.
Kata Containers only supports `v1`.
Kata Containers supports `v1` by default and no change in the configuration file is needed.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].
## Cgroups V2
@@ -354,13 +297,22 @@ Same as `cgroups v1`, a process can join the cgroup by writing its process id (`
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to
`cgroup.threads` file.
Kata Containers does not support cgroups `v2` on the host.
For backwards compatibility Kata Containers defaults to supporting cgroups v1 by default.
To change this to `v2`, set `sandbox_cgroup_only=true` in the `configuration.toml` file.
To know more about `cgroups v2`, see [cgroupsv2(7)][3].
### Distro Support
Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.
For more information about the status of this feature see [issue #2494][4].
# Summary
| cgroup option | default? | status | pros | cons | cgroups
|-|-|-|-|-|-|
| `SandboxCgroupOnly=false` | yes | legacy | Easiest to make Kata work | Unaccounted for memory and resource utilization | v1
| `SandboxCgroupOnly=true` | no | recommended | Complete tracking of Kata memory and CPU utilization. In Kubernetes, the Kubelet can fully constrain Kata via the pod cgroup | Requires upper layer orchestrator which sizes sandbox cgroup appropriately | v1, v2
[1]: http://man7.org/linux/man-pages/man5/tmpfs.5.html
[2]: http://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_1

View File

@@ -1,21 +1,21 @@
# Kata 2.0 Metrics Design
Kata implements CRI's API and supports [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose containers metrics. User can use these interfaces to get basic metrics about containers.
Kata implement CRI's API and support [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose containers metrics. User can use these interface to get basic metrics about container.
Unlike `runc`, Kata is a VM-based runtime and has a different architecture.
But unlike `runc`, Kata is a VM-based runtime and has a different architecture.
## Limitations of Kata 1.x and target of Kata 2.0
## Limitations of Kata 1.x and the target of Kata 2.0
Kata 1.x has a number of limitations related to observability that may be obstacles to running Kata Containers at scale.
In Kata 2.0, the following components will be able to provide more details about the system:
In Kata 2.0, the following components will be able to provide more details about the system.
- containerd shim v2 (effectively `kata-runtime`)
- Hypervisor statistics
- Agent process
- Guest OS statistics
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata then introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
>
> For Kata 2.0, the main component is the Kata containerd shim v2, although the deprecated `kata-runtime` binary will be maintained for a period of time.
>
@@ -25,15 +25,14 @@ In Kata 2.0, the following components will be able to provide more details about
Kata 2.0 metrics strongly depend on [Prometheus](https://prometheus.io/), a graduated project from CNCF.
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the Kata components on the host. It's shipped with the Kata runtime to provide an interface to:
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the other Kata components on the host. It's the monitor interface with Kata runtime, and we can do something like these:
- Get metrics
- Get events
At present, `kata-monitor` supports retrieval of metrics only: this is what will be covered in this document.
In this document we will cover metrics only. And until now it only supports metrics function.
This is the architecture overview of metrics in Kata Containers 2.0:
This is the architecture overview metrics in Kata Containers 2.0.
![Kata Containers 2.0 metrics](arch-images/kata-2-metrics.png)
@@ -46,39 +45,38 @@ For a quick evaluation, you can check out [this how to](../how-to/how-to-set-pro
### Kata monitor
The `kata-monitor` management agent should be started on each node where the Kata containers runtime is installed. `kata-monitor` will:
`kata-monitor` is a management agent on one node, where many Kata containers are running. `kata-monitor`'s work include:
> **Note**: a *node* running Kata containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.
> **Note**: node is a single host system or a node in K8s clusters.
- Aggregate sandbox metrics running on the node, adding the `sandbox_id` label to them.
- Attach the additional `cri_uid`, `cri_name` and `cri_namespace` labels to the sandbox metrics, tracking the `uid`, `name` and `namespace` Kubernetes pod metadata.
- Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This simplifies the targets count in Prometheus and avoids exposing shim's metrics by `ip:port`.
- Aggregate sandbox metrics running on this node, and add `sandbox_id` label
- As a Prometheus target, all metrics from Kata shim on this node will be collected by Prometheus indirectly. This can easy the targets count in Prometheus, and also need not to expose shim's metrics by `ip:port`
Only one `kata-monitor` process runs in each node.
Only one `kata-monitor` process are running on one node.
`kata-monitor` uses a different communication channel than the one used by the container engine (`containerd`/`CRI-O`) to communicate with the Kata shim. The Kata shim exposes a dedicated socket address reserved to `kata-monitor`.
`kata-monitor` is using a different communication channel other than that `conatinerd` communicating with Kata shim, and Kata shim listen on a new socket address for communicating with `kata-monitor`.
The shim's metrics socket file is created under the virtcontainers sandboxes directory, i.e. `vc/sbs/${PODID}/shim-monitor.sock`.
The way `kata-monitor` get shim's metrics socket file(`monitor_address`) like that `containerd` get shim address. The socket is an abstract socket and saved as file `abstract` with the same directory of `address` for `containerd`.
> **Note**: If there is no Prometheus server configured, i.e., there are no scrape operations, `kata-monitor` will not collect any metrics.
> **Note**: If there is no Prometheus server is configured, i.e., there is no scrape operations, `kata-monitor` will do nothing initiative.
### Kata runtime
Kata runtime is responsible for:
Runtime is responsible for:
- Gather metrics about shim process
- Gather metrics about hypervisor process
- Gather metrics about running sandbox
- Get metrics from Kata agent (through `ttrpc`)
- Get metrics from Kata agent(through `ttrpc`)
### Kata agent
Kata agent is responsible for:
Agent is responsible for:
- Gather agent process metrics
- Gather guest OS metrics
In Kata 2.0, the agent adds a new interface:
And in Kata 2.0, agent will add a new interface:
```protobuf
rpc GetMetrics(GetMetricsRequest) returns (Metrics);
@@ -95,49 +93,33 @@ The `metrics` field is Prometheus encoded content. This can avoid defining a fix
### Performance and overhead
Metrics should not become a bottleneck for the system or downgrade the performance: they should run with minimal overhead.
Metrics should not become the bottleneck of system, downgrade the performance, and run with minimal overhead.
Requirements:
* Metrics **MUST** be quick to collect
* Metrics **MUST** be small
* Metrics **MUST** be small.
* Metrics **MUST** be generated only if there are subscribers to the Kata metrics service
* Metrics **MUST** be stateless
In Kata 2.0, metrics are collected only when needed (pull mode), mainly from the `/proc` filesystem, and consumed by Prometheus. This means that if the Prometheus collector is not running (so no one cares about the metrics) the overhead will be zero.
In Kata 2.0, metrics are collected mainly from `/proc` filesystem, and consumed by Prometheus, based on a pull mode, that is mean if there is no Prometheus collector is running, so there will be zero overhead if nobody cares the metrics.
The metrics service also doesn't hold any metrics in memory.
#### Metrics size ####
Metrics service also doesn't hold any metrics in memory.
|\*|No Sandbox | 1 Sandbox | 2 Sandboxes |
|---|---|---|---|
|Metrics count| 39 | 106 | 173 |
|Metrics size (bytes)| 9K | 144K | 283K |
|Metrics size (`gzipped`, bytes)| 2K | 10K | 17K |
|Metrics size(bytes)| 9K | 144K | 283K |
|Metrics size(`gzipped`, bytes)| 2K | 10K | 17K |
*Metrics size*: response size of one Prometheus scrape request.
*Metrics size*: Response size of one Prometheus scrape request.
It's easy to estimate the size of one metrics fetch request issued by Prometheus.
The formula to calculate the expected size when no gzip compression is in place is:
9 + (144 - 9) * `number of kata sandboxes`
Prometheus supports `gzip compression`. When enabled, the response size of each request will be smaller:
2 + (10 - 2) * `number of kata sandboxes`
**Example**
We have 10 sandboxes running on a node. The expected size of one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
9 + (144 - 9) * 10 = **1.35M**
If `gzip compression` is enabled:
2 + (10 - 2) * 10 = **82K**
#### Metrics delay ####
It's easy to estimated that if there are 10 sandboxes running in the host, the size of one metrics fetch request issued by Prometheus will be about to 9 + (144 - 9) * 10 = 1.35M (not `gzipped`) or 2 + (10 - 2) * 10 = 82K (`gzipped`). Of course Prometheus support `gzip` compression, that can reduce the response size of every request.
And here is some test data:
- End-to-end (from Prometheus server to `kata-monitor` and `kata-monitor` write response back): **20ms**(avg)
- Agent (RPC all from shim to agent): **3ms**(avg)
- End-to-end (from Prometheus server to `kata-monitor` and `kata-monitor` write response back): 20ms(avg)
- Agent(RPC all from shim to agent): 3ms(avg)
Test infrastructure:
@@ -146,13 +128,13 @@ Test infrastructure:
**Scrape interval**
Prometheus default `scrape_interval` is 1 minute, but it is usually set to 15 seconds. A smaller `scrape_interval` causes more overhead, so users should set it depending on their monitoring needs.
Prometheus default `scrape_interval` is 1 minute, and usually it is set to 15s. Small `scrape_interval` will cause more overhead, so user should set it on monitor demand.
## Metrics list
Here are listed all the metrics supported by Kata 2.0. Some metrics are dependent on the VM guest kernel, so the available ones may differ based on the environment.
Here listed is all supported metrics by Kata 2.0. Some metrics is dependent on guest kernels in the VM, so there may be some different by your environment.
Metrics are categorized by the component from/for which the metrics are collected.
Metrics is categorized by component where metrics are collected from and for.
* [Metric types](#metric-types)
* [Kata agent metrics](#kata-agent-metrics)
@@ -163,15 +145,15 @@ Metrics are categorized by the component from/for which the metrics are collecte
* [Kata containerd shim v2 metrics](#kata-containerd-shim-v2-metrics)
> **Note**:
> * Labels here do not include the `instance` and `job` labels added by Prometheus.
> * Labels here are not include `instance` and `job` labels that added by Prometheus.
> * Notes about metrics unit
> * `Kibibytes`, abbreviated `KiB`. 1 `KiB` equals 1024 B.
> * For some metrics (like network devices statistics from file `/proc/net/dev`), unit depends on label( for example `recv_bytes` and `recv_packets` have different units).
> * Most of these metrics are collected from the `/proc` filesystem, so the unit of each metric matches the unit of the relevant `/proc` entry. See the `proc(5)` manual page for further details.
> * For some metrics (like network devices statistics from file `/proc/net/dev`), unit is depend on label( for example `recv_bytes` and `recv_packets` are having different units).
> * Most of these metrics is collected from `/proc` filesystem, so the unit of metrics are keeping the same unit as `/proc`. See the `proc(5)` manual page for further details.
### Metric types
Prometheus offers four core metric types.
Prometheus offer four core metric types.
- Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase.
@@ -225,7 +207,7 @@ Metrics for Firecracker vmm.
| `kata_firecracker_uart`: <br> Metrics specific to the UART device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`flush_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li><li>`read_count`</li><li>`write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vcpu`: <br> Metrics specific to VCPUs' mode of functioning. | `GAUGE` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vmm`: <br> Metrics specific to the machine manager as a whole. | `GAUGE` | | <ul><li>`item`<ul><li>`device_events`</li><li>`panic_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vsock`: <br> VSOCK-related metrics. | `GAUGE` | | <ul><li>`item`<ul><li>`activate_fails`</li><li>`cfg_fails`</li><li>`conn_event_fails`</li><li>`conns_added`</li><li>`conns_killed`</li><li>`conns_removed`</li><li>`ev_queue_event_fails`</li><li>`killq_resync`</li><li>`muxer_event_fails`</li><li>`rx_bytes_count`</li><li>`rx_packets_count`</li><li>`rx_queue_event_count`</li><li>`rx_queue_event_fails`</li><li>`rx_read_fails`</li><li>`tx_bytes_count`</li><li>`tx_flush_fails`</li><li>`tx_packets_count`</li><li>`tx_queue_event_count`</li><li>`tx_queue_event_fails`</li><li>`tx_write_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vsock`: <br> Vsock-related metrics. | `GAUGE` | | <ul><li>`item`<ul><li>`activate_fails`</li><li>`cfg_fails`</li><li>`conn_event_fails`</li><li>`conns_added`</li><li>`conns_killed`</li><li>`conns_removed`</li><li>`ev_queue_event_fails`</li><li>`killq_resync`</li><li>`muxer_event_fails`</li><li>`rx_bytes_count`</li><li>`rx_packets_count`</li><li>`rx_queue_event_count`</li><li>`rx_queue_event_fails`</li><li>`rx_read_fails`</li><li>`tx_bytes_count`</li><li>`tx_flush_fails`</li><li>`tx_packets_count`</li><li>`tx_queue_event_count`</li><li>`tx_queue_event_fails`</li><li>`tx_write_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
### Kata guest OS metrics
@@ -306,7 +288,7 @@ Metrics about Kata containerd shim v2 process.
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StartTracingRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.StopTracingRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_fds`: <br> Kata containerd shim v2 open FDs. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_gc_duration_seconds`: <br> A summary of the pause duration of garbage collection cycles. | `SUMMARY` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_goroutines`: <br> Number of goroutines that currently exist. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |

View File

@@ -30,7 +30,7 @@ The Kata Containers runtime **MUST** implement the following command line option
The Kata Containers project **MUST** provide two interfaces for CRI shims to manage hardware
virtualization based Kubernetes pods and containers:
- An OCI and `runc` compatible command line interface, as described in the previous section.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`containerd`](https://github.com/containerd/containerd), for example.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`cri-containerd`](https://github.com/containerd/cri-containerd), for example.
- A hardware virtualization runtime library API for CRI shims to consume and provide a more
CRI native implementation. The [`frakti`](https://github.com/kubernetes/frakti) CRI shim is an example of such a consumer.

View File

@@ -1,93 +0,0 @@
# Background
[Research](https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter) shows that time to take for pull operation accounts for 76% of container startup time but only 6.4% of that data is read. So if we can get data on demand (lazy load), it will speed up the container start. [`Nydus`](https://github.com/dragonflyoss/image-service) is a project which build image with new format and can get data on demand when container start.
The following benchmarking result shows the performance improvement compared with the OCI image for the container cold startup elapsed time on containerd. As the OCI image size increases, the container startup time of using `nydus` image remains very short. [Click here](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) to see `nydus` design.
![`nydus`-performance](arch-images/nydus-performance.png)
## Proposal - Bring `lazyload` ability to Kata Containers
`Nydusd` is a fuse/`virtiofs` daemon which is provided by `nydus` project and it supports `PassthroughFS` and [RAFS](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) (Registry Acceleration File System) natively, so in Kata Containers, we can use `nydusd` in place of `virtiofsd` and mount `nydus` image to guest in the meanwhile.
The process of creating/starting Kata Containers with `virtiofsd`,
1. When creating sandbox, the Kata Containers Containerd v2 [shim](https://github.com/kata-containers/kata-containers/blob/main/docs/design/architecture/README.md#runtime) will launch `virtiofsd` before VM starts and share directories with VM.
2. When creating container, the Kata Containers Containerd v2 shim will mount rootfs to `kataShared`(/run/kata-containers/shared/sandboxes/\<SANDBOX\>/mounts/\<CONTAINER\>/rootfs), so it can be seen at the path `/run/kata-containers/shared/containers/shared/\<CONTAINER\>/rootfs` in the guest and used as container's rootfs.
The process of creating/starting Kata Containers with `nydusd`,
![kata-`nydus`](arch-images/kata-nydus.png)
1. When creating sandbox, the Kata Containers Containerd v2 shim will launch `nydusd` daemon before VM starts.
After VM starts, `kata-agent` will mount `virtiofs` at the path `/run/kata-containers/shared` and Kata Containers Containerd v2 shim mount `passthroughfs` filesystem to path `/run/kata-containers/shared/containers` when the VM starts.
```bash
# start nydusd
$ sandbox_id=my-test-sandbox
$ sudo /usr/local/bin/nydusd --log-level info --sock /run/vc/vm/${sandbox_id}/vhost-user-fs.sock --apisock /run/vc/vm/${sandbox_id}/api.sock
```
```bash
# source: the host sharedir which will pass through to guest
$ sudo curl -v --unix-socket /run/vc/vm/${sandbox_id}/api.sock \
-X POST "http://localhost/api/v1/mount?mountpoint=/containers" -H "accept: */*" \
-H "Content-Type: application/json" \
-d '{
"source":"/path/to/sharedir",
"fs_type":"passthrough_fs",
"config":""
}'
```
2. When creating normal container, the Kata Containers Containerd v2 shim send request to `nydusd` to mount `rafs` at the path `/run/kata-containers/shared/rafs/<container_id>/lowerdir` in guest.
```bash
# source: the metafile of nydus image
# config: the config of this image
$ sudo curl --unix-socket /run/vc/vm/${sandbox_id}/api.sock \
-X POST "http://localhost/api/v1/mount?mountpoint=/rafs/<container_id>/lowerdir" -H "accept: */*" \
-H "Content-Type: application/json" \
-d '{
"source":"/path/to/bootstrap",
"fs_type":"rafs",
"config":"config":"{\"device\":{\"backend\":{\"type\":\"localfs\",\"config\":{\"dir\":\"blobs\"}},\"cache\":{\"type\":\"blobcache\",\"config\":{\"work_dir\":\"cache\"}}},\"mode\":\"direct\",\"digest_validate\":true}",
}'
```
The Kata Containers Containerd v2 shim will also bind mount `snapshotdir` which `nydus-snapshotter` assigns to `sharedir`
So in guest, container rootfs=overlay(`lowerdir=rafs`, `upperdir=snapshotdir/fs`, `workdir=snapshotdir/work`)
> how to transfer the `rafs` info from `nydus-snapshotter` to the Kata Containers Containerd v2 shim?
By default, when creating `OCI` image container, `nydus-snapshotter` will return [`struct` Mount slice](https://github.com/containerd/containerd/blob/main/mount/mount.go#L21) below to containerd and containerd use them to mount rootfs
```
[
{
Type: "overlay",
Source: "overlay",
Options: [lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_A>/mnt,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/work],
}
]
```
Then, we can append `rafs` info into `Options`, but if do this, containerd will mount failed, as containerd can not identify `rafs` info. Here, we can refer to [containerd mount helper](https://github.com/containerd/containerd/blob/main/mount/mount_linux.go#L42) and provide a binary called `nydus-overlayfs`. The `Mount` slice which `nydus-snapshotter` returned becomes
```
[
{
Type: "fuse.nydus-overlayfs",
Source: "overlay",
Options: [lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_A>/mnt,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/work,extraoption=base64({source:xxx,config:xxx,snapshotdir:xxx})],
}
]
```
When containerd find `Type` is `fuse.nydus-overlayfs`,
1. containerd will call `mount.fuse` command;
2. in `mount.fuse`, it will call `nydus-overlayfs`.
3. in `nydus-overlayfs`, it will ignore the `extraoption` and do the overlay mount.
Finally, in the Kata Containers Containerd v2 shim, it parse `extraoption` and get the `rafs` info to mount the image in guest.

View File

@@ -209,5 +209,5 @@ network accessible to the collector.
- The trace collection proposals are still being considered.
[kata-1x-tracing]: https://github.com/kata-containers/agent/blob/master/TRACING.md
[trace-forwarder]: /src/tools/trace-forwarder
[trace-forwarder]: /src/trace-forwarder
[tracing-doc-pr]: https://github.com/kata-containers/kata-containers/pull/1937

View File

@@ -157,32 +157,6 @@ docker run --cpus 4 -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cp
400000 # cfs quota
```
## Virtual CPU handling without hotplug
In some cases, the hardware and/or software architecture being utilized does not support
hotplug. For example, Firecracker VMM does not support CPU or memory hotplug. Similarly,
the current Linux Kernel for aarch64 does not support CPU or memory hotplug. To appropriately
size the virtual machine for the workload within the container or pod, we provide a `static_sandbox_resource_mgmt`
flag within the Kata Containers configuration. When this is set, the runtime will:
- Size the VM based on the workload requirements as well as the `default_vcpus` option specified in the configuration.
- Not resize the virtual machine after it has been launched.
VM size determination varies depending on the type of container being run, and may not always
be available. If workload sizing information is not available, the virtual machine will be started with the
`default_vcpus`.
In the case of a pod, the initial sandbox container (pause container) typically doesn't contain any resource
information in its runtime `spec`. It is possible that the upper layer runtime
(i.e. containerd or CRI-O) may pass sandbox sizing annotations within the pause container's
`spec`. If these are provided, we will use this to appropriately size the VM. In particular,
we'll calculate the number of CPUs required for the workload and augment this by `default_vcpus`
configuration option, and use this for the virtual machine size.
In the case of a single container (i.e., not a pod), if the container specifies resource requirements,
the container's `spec` will provide the sizing information directly. If these are set, we will
calculate the number of CPUs required for the workload and augment this by `default_vcpus`
configuration option, and use this for the virtual machine size.
[1]: https://docs.docker.com/config/containers/resource_constraints/#cpu
[2]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource

View File

@@ -40,8 +40,8 @@ Kata Containers with QEMU has complete compatibility with Kubernetes.
Depending on the host architecture, Kata Containers supports various machine types,
for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `q35`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](architecture/README.md#configuration) file.
machine type is `pc`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](./architecture.md/#configuration) file.
Devices and features used:
- virtio VSOCK or virtio serial

View File

@@ -5,7 +5,7 @@
- [Run Kata containers with `crictl`](run-kata-with-crictl.md)
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and CRI (containerd) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)
@@ -17,9 +17,10 @@
- `firecracker`
- `ACRN`
While `qemu` , `cloud-hypervisor` and `firecracker` work out of the box with installation of Kata,
some additional configuration is needed in case of `ACRN`.
While `qemu` and `cloud-hypervisor` work out of the box with installation of Kata,
some additional configuration is needed in case of `firecracker` and `ACRN`.
Refer to the following guides for additional configuration steps:
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)
## Advanced Topics
@@ -34,7 +35,3 @@
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)
- [How to monitor Kata Containers in K8s](how-to-set-prometheus-in-k8s.md)
- [How to use hotplug memory on arm64 in Kata Containers](how-to-hotplug-memory-arm64.md)
- [How to setup swap devices in guest kernel](how-to-setup-swap-devices-in-guest-kernel.md)
- [How to run rootless vmm](how-to-run-rootless-vmm.md)
- [How to run Docker with Kata Containers](how-to-run-docker-with-kata.md)
- [How to run Kata Containers with `nydus`](how-to-use-virtio-fs-nydus-with-kata.md)

View File

@@ -39,7 +39,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
@@ -188,7 +188,7 @@ If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0
shell script with the following:
```bash
#!/usr/bin/env bash
#!/bin/bash
KATA_CONF_FILE=/etc/kata-containers/firecracker.toml containerd-shim-kata-v2 $@
```

View File

@@ -4,7 +4,7 @@
This document describes how to import Kata Containers logs into [Fluentd](https://www.fluentd.org/),
typically for importing into an
Elastic/Fluentd/Kibana([EFK](https://github.com/kubernetes-sigs/instrumentation-addons/tree/master/fluentd-elasticsearch#running-efk-stack-in-production))
Elastic/Fluentd/Kibana([EFK](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch#running-efk-stack-in-production))
or Elastic/Logstash/Kibana([ELK](https://www.elastic.co/elastic-stack)) stack.
The majority of this document focusses on CRI-O based (classic) Kata runtime. Much of that information
@@ -257,14 +257,14 @@ go directly to a full Kata specific JSON format logfile test.
Kata runtime has the ability to generate JSON logs directly, rather than its default `logfmt` format. Passing
the `--log-format=json` argument to the Kata runtime enables this. The easiest way to pass in this extra
parameter from a [Kata deploy](../../tools/packaging/kata-deploy) installation
parameter from a [Kata deploy](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy) installation
is to edit the `/opt/kata/bin/kata-qemu` shell script.
At the same time, we will add the `--log=/var/log/kata-runtime.log` argument to store the Kata logs in their
own file (rather than into the system journal).
```bash
#!/usr/bin/env bash
#!/bin/bash
/opt/kata/bin/kata-runtime --config "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml" --log-format=json --log=/var/log/kata-runtime.log $@
```

View File

@@ -1,141 +0,0 @@
# How to run Docker in Docker with Kata Containers
This document describes the why and how behind running Docker in a Kata Container.
> **Note:** While in other environments this might be described as "Docker in Docker", the new architecture of Kata 2.x means [Docker can no longer be used to create containers using a Kata Containers runtime](https://github.com/kata-containers/kata-containers/issues/722).
## Requirements
- A working Kata Containers installation
## Install and configure Kata Containers
Follow the [Kata Containers installation guide](../install/README.md) to Install Kata Containers on your Kubernetes cluster.
## Background
Docker in Docker ("DinD") is the colloquial name for the ability to run `docker` from inside a container.
You can learn more about about Docker-in-Docker at the following links:
- [The original announcement of DinD](https://www.docker.com/blog/docker-can-now-run-within-docker/)
- [`docker` image Docker Hub page](https://hub.docker.com/_/docker/) (this page lists the `-dind` releases)
While normally DinD refers to running `docker` from inside a Docker container,
Kata Containers 2.x allows only [supported runtimes][kata-2.x-supported-runtimes] (such as [`containerd`](../install/container-manager/containerd/containerd-install.md)).
Running `docker` in a Kata Container implies creating Docker containers from inside a container managed by `containerd` (or another supported container manager), as illustrated below:
```
container manager -> Kata Containers shim -> Docker Daemon -> Docker container
(containerd) (containerd-shim-kata-v2) (dockerd) (busybox sh)
```
[OverlayFS][OverlayFS] is the preferred storage driver for most container runtimes on Linux ([including Docker](https://docs.docker.com/storage/storagedriver/select-storage-driver)).
> **Note:** While in the past Kata Containers did not contain the [`overlay` kernel module (aka OverlayFS)][OverlayFS], the kernel modules have been included since the [Kata Containers v2.0.0 release][v2.0.0].
[OverlayFS]: https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html
[v2.0.0]: https://github.com/kata-containers/kata-containers/releases/tag/2.0.0
[kata-2.x-supported-runtimes]: ../install/container-manager/containerd/containerd-install.md
## Why Docker in Kata Containers 2.x requires special measures
Running Docker containers Kata Containers requires care because `VOLUME`s specified in `Dockerfile`s run by Kata Containers are given the `kataShared` mount type by default, which applies to the root directory `/`:
```console
/ # mount
kataShared on / type virtiofs (rw,relatime,dax)
```
`kataShared` mount types are powered by [`virtio-fs`][virtio-fs], a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker` fill fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
## Workarounds/Solutions
Thanks to various community contributions (see [issue references below](#references)) the following options, with various trade-offs have been uncovered:
### Use a memory backed volume
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume][k8s-emptydir], which allows for memdisk-backed storage -- the [the `medium: Memory` ][k8s-memory-volume-type]. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: dind
spec:
runtimeClassName: kata
containers:
- name: dind
securityContext:
privileged: true
image: docker:20.10-dind
args: ["--storage-driver=overlay2"]
resources:
limits:
memory: "3G"
volumeMounts:
- mountPath: /var/run/
name: dockersock
- mountPath: /var/lib/docker
name: docker
volumes:
- name: dockersock
emptyDir: {}
- name: docker
emptyDir:
medium: Memory
```
Inside the container you can view the mount:
```console
/ # mount | grep lib\/docker
tmpfs on /var/lib/docker type tmpfs (rw,relatime)
```
As is mentioned in the comment encapsulating this code, using volatile memory for container storage backing is a risky and could be possibly wasteful on machines that do not have a lot of RAM.
### Use a loop mounted disk
Using a loop mounted disk that is provisioned shortly before starting of the container workload is another approach that yields good performance.
Contributors provided [an example in issue #1888](https://github.com/kata-containers/runtime/issues/1888#issuecomment-739057384), which is reproduced in part below:
```yaml
spec:
containers:
- name: docker
image: docker:20.10-dind
command: ["sh", "-c"]
args:
- if [[ $(df -PT /var/lib/docker | awk 'NR==2 {print $2}') == virtiofs ]]; then
apk add e2fsprogs &&
truncate -s 20G /tmp/disk.img &&
mkfs.ext4 /tmp/disk.img &&
mount /tmp/disk.img /var/lib/docker; fi &&
dockerd-entrypoint.sh;
securityContext:
privileged: true
```
Note that loop mounted disks are often sparse, which means they *do not* take up the full amount of space that has been provisioned. This solution seems to produce the best performance and flexibility, at the expense of increased complexity and additional required setup.
### Build a custom kernel
It's possible to [modify the kernel](https://github.com/kata-containers/runtime/issues/1888#issuecomment-616872558) (in addition to applying the earlier mentioned mailing list patch) to support using `virtio-fs` as an upper. Note that if you modify your kernel and use `virtio-fs` you may require [additional changes](https://github.com/kata-containers/runtime/issues/1888#issuecomment-739057384) for decent performance and to address other issues.
> **NOTE:** A future kernel release may rectify the usability and performance issues of using `virtio-fs` as an OverlayFS upper layer.
## References
The solutions proposed in this document are an amalgamation of thoughtful contributions from the Kata Containers community.
Find links to issues & related discussion and the fruits therein below:
- [How to run Docker in Docker with Kata Containers (#2474)](https://github.com/kata-containers/kata-containers/issues/2474)
- [Does Kata-container support AUFS/OverlayFS? (#2493)](https://github.com/kata-containers/runtime/issues/2493)
- [Unable to start docker in docker with virtio-fs (#1888)](https://github.com/kata-containers/runtime/issues/1888)
- [Not using native diff for overlay2 (#1429)](https://github.com/kata-containers/runtime/issues/1429)

View File

@@ -1,33 +0,0 @@
## Introduction
To improve security, Kata Container supports running the VMM process (currently only QEMU) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
The permission and ownership of the `kvm` device node (`/dev/kvm`) need to be configured to:
```
$ crw-rw---- 1 root kvm
```
use the following commands:
```
$ sudo groupadd kvm -r
$ sudo chown root:kvm /dev/kvm
$ sudo chmod 660 /dev/kvm
```
## Configure rootless VMM
By default, the VMM process still runs as the root user. There are two ways to enable rootless VMM:
1. Set the `rootless` flag to `true` in the hypervisor section of `configuration.toml`.
2. Set the Kubernetes annotation `io.katacontainers.hypervisor.rootless` to `true`.
## Implementation details
When `rootless` flag is enabled, upon a request to create a Pod, Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs.sock`, `qmp.sock`) to a directory (under `/run/user/[uid]/`) where only the non-root hypervisor has access to.
## Limitations
1. Only the VMM process is running as a non-root user. Other processes such as Kata Container shimv2 and `virtiofsd` still run as the root user.
2. Currently, this feature is only supported in QEMU. Still need to bring it to Firecracker and Cloud Hypervisor (see https://github.com/kata-containers/kata-containers/issues/2567).
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` device will also not work because of permission denied error.

View File

@@ -34,6 +34,8 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.container_pipe_size` | uint32 | specify the size of the std(in/out) pipes created for containers |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |
## Hypervisor Options
| Key | Value Type | Comments |
@@ -56,14 +58,13 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_iommu` | `boolean` | enable `iommu` on Q35 (QEMU x86_64) |
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_swap` | `boolean` | enable swap of VM memory |
| `io.katacontainers.config.hypervisor.enable_vhost_user_store` | `boolean` | enable vhost-user storage device (QEMU) |
| `io.katacontainers.config.hypervisor.enable_virtio_mem` | `boolean` | enable virtio-mem (QEMU) |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` (R) | string | file based memory backend root directory |
| `io.katacontainers.config.hypervisor.firmware_hash` | string | container firmware SHA-512 hash value |
| `io.katacontainers.config.hypervisor.firmware` | string | the guest firmware that will run the container VM |
| `io.katacontainers.config.hypervisor.firmware_volume_hash` | string | container firmware volume SHA-512 hash value |
| `io.katacontainers.config.hypervisor.firmware_volume` | string | the guest firmware volume that will be passed to the container VM |
| `io.katacontainers.config.hypervisor.guest_hook_path` | string | the path within the VM that will be used for drop in hooks |
| `io.katacontainers.config.hypervisor.hotplug_vfio_on_root_bus` | `boolean` | indicate if devices need to be hotplugged on the root bus instead of a bridge|
| `io.katacontainers.config.hypervisor.hypervisor_hash` | string | container hypervisor binary SHA-512 hash value |
@@ -90,13 +91,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.virtio_fs_cache` | string | the cache mode for virtio-fs, valid values are `always`, `auto` and `none` |
| `io.katacontainers.config.hypervisor.virtio_fs_daemon` | string | virtio-fs `vhost-user` daemon path |
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |
## Container Options
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.container.resource.swappiness"` | `uint64` | specify the `Resources.Memory.Swappiness` |
| `io.katacontainers.container.resource.swap_in_bytes"` | `uint64` | specify the `Resources.Memory.Swap` |
# CRI-O Configuration
@@ -106,12 +100,11 @@ In case of CRI-O, all annotations specified in the pod spec are passed down to K
For containerd, annotations specified in the pod spec are passed down to Kata
starting with version `1.3.0` of containerd. Additionally, extra configuration is
needed for containerd, by providing `pod_annotations` field and
`container_annotations` field in the containerd config
file. The `pod_annotations` field and `container_annotations` field are two lists of
annotations that can be passed down to Kata as OCI annotations. They support golang match
patterns. Since annotations supported by Kata follow the pattern `io.katacontainers.*`,
the following configuration would work for passing annotations to Kata from containerd:
needed for containerd, by providing a `pod_annotations` field in the containerd config
file. The `pod_annotations` field is a list of annotations that can be passed down to
Kata as OCI annotations. It supports golang match patterns. Since annotations supported
by Kata follow the pattern `io.katacontainers.*`, the following configuration would work
for passing annotations to Kata from containerd:
```
$ cat /etc/containerd/config
@@ -120,7 +113,6 @@ $ cat /etc/containerd/config
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
pod_annotations = ["io.katacontainers.*"]
container_annotations = ["io.katacontainers.*"]
....
```

View File

@@ -1,59 +0,0 @@
# Setup swap device in guest kernel
## Introduction
Setup swap device in guest kernel can help to increase memory capacity, handle some memory issues and increase file access speed sometimes.
Kata Containers can insert a raw file to the guest as the swap device.
## Requisites
The swap config of the containers should be set by [annotations](how-to-set-sandbox-config-kata.md#container-options). So [extra configuration is needed for containerd](how-to-set-sandbox-config-kata.md#containerd-configuration).
Kata Containers just supports setup swap device in guest kernel with QEMU.
Install and setup Kata Containers as shown [here](../install/README.md).
Enable setup swap device in guest kernel as follows:
```
$ sudo sed -i -e 's/^#enable_guest_swap.*$/enable_guest_swap = true/g' /etc/kata-containers/configuration.toml
```
## Run a Kata Container utilizing swap device
Use following command to start a Kata Container with swappiness 60 and 1GB swap device (swap_in_bytes - memory_limit_in_bytes).
```
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
name: busybox-test-swap
annotations:
io.katacontainers.container.resource.swappiness: "60"
io.katacontainers.container.resource.swap_in_bytes: "2147483648"
linux:
resources:
memory_limit_in_bytes: 1073741824
image:
image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
```
Kata Container setups swap device for this container only when `io.katacontainers.container.resource.swappiness` is set.
The following table shows the swap size how to decide if `io.katacontainers.container.resource.swappiness` is set.
|`io.katacontainers.container.resource.swap_in_bytes`|`memory_limit_in_bytes`|swap size|
|---|---|---|
|set|set| `io.katacontainers.container.resource.swap_in_bytes` - `memory_limit_in_bytes`|
|not set|set| `memory_limit_in_bytes`|
|not set|not set| `io.katacontainers.config.hypervisor.default_memory`|
|set|not set|cgroup doesn't support this usage|

View File

@@ -3,7 +3,7 @@
This document describes how to set up a single-machine Kubernetes (k8s) cluster.
The Kubernetes cluster will use the
[CRI containerd](https://github.com/containerd/containerd/) and
[CRI containerd plugin](https://github.com/containerd/cri) and
[Kata Containers](https://katacontainers.io) to launch untrusted workloads.
## Requirements
@@ -71,12 +71,12 @@ $ for service in ${services}; do
service_dir="/etc/systemd/system/${service}.service.d/"
sudo mkdir -p ${service_dir}
cat << EOF | sudo tee "${service_dir}/proxy.conf"
cat << EOT | sudo tee "${service_dir}/proxy.conf"
[Service]
Environment="HTTP_PROXY=${http_proxy}"
Environment="HTTPS_PROXY=${https_proxy}"
Environment="NO_PROXY=${no_proxy}"
EOF
EOT
done
$ sudo systemctl daemon-reload
@@ -154,7 +154,7 @@ From Kubernetes v1.12, users can use [`RuntimeClass`](https://kubernetes.io/docs
```bash
$ cat > runtime.yaml <<EOF
apiVersion: node.k8s.io/v1
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
name: kata
@@ -172,7 +172,7 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod w
- Create an pod configuration that using Kata Containers runtime
```bash
$ cat << EOF | tee nginx-kata.yaml
$ cat << EOT | tee nginx-kata.yaml
apiVersion: v1
kind: Pod
metadata:
@@ -183,7 +183,7 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod w
- name: nginx
image: nginx
EOF
EOT
```
- Create the pod

View File

@@ -1,57 +0,0 @@
# Kata Containers with virtio-fs-nydus
## Introduction
Refer to [kata-`nydus`-design](../design/kata-nydus-design.md) for introduction and `nydus` has supported Kata Containers with hypervisor `QEMU` and `CLH` currently.
## How to
You can use Kata Containers with `nydus` as follows,
1. Use [`nydus` latest branch](https://github.com/dragonflyoss/image-service);
2. Deploy `nydus` environment as [`Nydus` Setup for Containerd Environment](https://github.com/dragonflyoss/image-service/blob/master/docs/containerd-env-setup.md);
3. Start `nydus-snapshotter` with `enable_nydus_overlayfs` enabled;
4. Use [kata-containers](https://github.com/kata-containers/kata-containers) `latest` branch to compile and build `kata-containers.img`;
5. Update `configuration-qemu.toml` or `configuration-clh.toml`to include:
```toml
shared_fs = "virtio-fs-nydus"
virtio_fs_daemon = "<nydusd binary path>"
virtio_fs_extra_args = []
```
6. run `crictl run -r kata nydus-container.yaml nydus-sandbox.yaml`;
The `nydus-sandbox.yaml` looks like below:
```yaml
metadata:
attempt: 1
name: nydus-sandbox
namespace: default
log_directory: /tmp
linux:
security_context:
namespace_options:
network: 2
annotations:
"io.containerd.osfeature": "nydus.remoteimage.v1"
```
The `nydus-container.yaml` looks like below:
```yaml
metadata:
name: nydus-container
image:
image: localhost:5000/ubuntu-nydus:latest
command:
- /bin/sleep
args:
- 600
log_path: container.1.log
```

View File

@@ -6,4 +6,4 @@ Container deployments utilize explicit or implicit file sharing between host fil
As of the 2.0 release of Kata Containers, [virtio-fs](https://virtio-fs.gitlab.io/) is the default filesystem sharing mechanism.
virtio-fs support works out of the box for `cloud-hypervisor` and `qemu`, when Kata Containers is deployed using `kata-deploy`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](../../tools/packaging/kata-deploy/README.md#kubernetes-quick-start).
virtio-fs support works out of the box for `cloud-hypervisor` and `qemu`, when Kata Containers is deployed using `kata-deploy`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy#kubernetes-quick-start).

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

View File

@@ -16,9 +16,9 @@ from the host, a potentially undesirable side-effect that decreases the security
The following sections document how to configure this behavior in different container runtimes.
#### Containerd
#### Containerd and CRI
The Containerd allows configuring the privileged host devices behavior for each runtime in the containerd config. This is
The Containerd CRI allows configuring the privileged host devices behavior for each runtime in the CRI config. This is
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
devices into the guest, even when privileged is enabled.
@@ -41,7 +41,7 @@ See below example config:
```
- [Kata Containers with Containerd and CRI documentation](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Containerd CRI config documentation](https://github.com/containerd/containerd/blob/main/docs/cri/config.md)
- [Containerd CRI config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### CRI-O

View File

@@ -9,7 +9,7 @@ Kubernetes CRI (Container Runtime Interface) implementations allow using any
OCI-compatible runtime with Kubernetes, such as the Kata Containers runtime.
Kata Containers support both the [CRI-O](https://github.com/kubernetes-incubator/cri-o) and
[containerd](https://github.com/containerd/containerd) CRI implementations.
[CRI-containerd](https://github.com/containerd/cri) CRI implementations.
After choosing one CRI implementation, you must make the appropriate configuration
to ensure it integrates with Kata Containers.
@@ -20,9 +20,9 @@ required to spawn pods and containers, and this is the preferred way to run Kata
An equivalent shim implementation for CRI-O is planned.
### CRI-O
For CRI-O installation instructions, refer to the [CRI-O Tutorial](https://github.com/cri-o/cri-o/blob/main/tutorial.md) page.
For CRI-O installation instructions, refer to the [CRI-O Tutorial](https://github.com/kubernetes-incubator/cri-o/blob/master/tutorial.md) page.
The following sections show how to set up the CRI-O snippet configuration file (default path: `/etc/crio/crio.conf`) for Kata.
The following sections show how to set up the CRI-O configuration file (default path: `/etc/crio/crio.conf`) for Kata.
Unless otherwise stated, all the following settings are specific to the `crio.runtime` table:
```toml
@@ -30,7 +30,7 @@ Unless otherwise stated, all the following settings are specific to the `crio.ru
# runtime used and options for how to set up and manage the OCI runtime.
[crio.runtime]
```
A comprehensive documentation of the configuration file can be found [here](https://github.com/cri-o/cri-o/blob/main/docs/crio.conf.5.md).
A comprehensive documentation of the configuration file can be found [here](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md).
> **Note**: After any change to this file, the CRI-O daemon have to be restarted with:
>````
@@ -40,20 +40,82 @@ A comprehensive documentation of the configuration file can be found [here](http
#### Kubernetes Runtime Class (CRI-O v1.12+)
The [Kubernetes Runtime Class](https://kubernetes.io/docs/concepts/containers/runtime-class/)
is the preferred way of specifying the container runtime configuration to run a Pod's containers.
To use this feature, Kata must added as a runtime handler. This can be done by
dropping a `50-kata` snippet file into `/etc/crio/crio.conf.d`, with the
content shown below:
To use this feature, Kata must added as a runtime handler with:
```toml
[crio.runtime.runtimes.kata]
runtime_path = "/usr/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
privileged_without_host_devices = true
[crio.runtime.runtimes.kata-runtime]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
```
You can also add multiple entries to specify alternatives hypervisors, e.g.:
```toml
[crio.runtime.runtimes.kata-qemu]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
[crio.runtime.runtimes.kata-fc]
runtime_path = "/usr/bin/kata-runtime"
runtime_type = "oci"
```
#### Untrusted annotation (until CRI-O v1.12)
The untrusted annotation is used to specify a runtime for __untrusted__ workloads, i.e.
a runtime to be used when the workload cannot be trusted and a higher level of security
is required. An additional flag can be used to let CRI-O know if a workload
should be considered _trusted_ or _untrusted_ by default.
For further details, see the documentation
[here](../design/architecture.md#mixing-vm-based-and-namespace-based-runtimes).
```toml
# runtime is the OCI compatible runtime used for trusted container workloads.
# This is a mandatory setting as this runtime will be the default one
# and will also be used for untrusted container workloads if
# runtime_untrusted_workload is not set.
runtime = "/usr/bin/runc"
# runtime_untrusted_workload is the OCI compatible runtime used for untrusted
# container workloads. This is an optional setting, except if
# default_container_trust is set to "untrusted".
runtime_untrusted_workload = "/usr/bin/kata-runtime"
# default_workload_trust is the default level of trust crio puts in container
# workloads. It can either be "trusted" or "untrusted", and the default
# is "trusted".
# Containers can be run through different container runtimes, depending on
# the trust hints we receive from kubelet:
# - If kubelet tags a container workload as untrusted, crio will try first to
# run it through the untrusted container workload runtime. If it is not set,
# crio will use the trusted runtime.
# - If kubelet does not provide any information about the container workload trust
# level, the selected runtime will depend on the default_container_trust setting.
# If it is set to "untrusted", then all containers except for the host privileged
# ones, will be run by the runtime_untrusted_workload runtime. Host privileged
# containers are by definition trusted and will always use the trusted container
# runtime. If default_container_trust is set to "trusted", crio will use the trusted
# container runtime for all containers.
default_workload_trust = "untrusted"
```
#### Network namespace management
To enable networking for the workloads run by Kata, CRI-O needs to be configured to
manage network namespaces, by setting the following key to `true`.
In CRI-O v1.16:
```toml
manage_network_ns_lifecycle = true
```
In CRI-O v1.17+:
```toml
manage_ns_lifecycle = true
```
### containerd
### containerd with CRI plugin
If you select containerd with `cri` plugin, follow the "Getting Started for Developers"
instructions [here](https://github.com/containerd/cri#getting-started-for-developers)
to properly install it.
To customize containerd to select Kata Containers runtime, follow our
"Configure containerd to use Kata Containers" internal documentation
@@ -98,75 +160,32 @@ $ sudo systemctl restart kubelet
# If using CRI-O
$ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /var/run/crio/crio.sock --pod-network-cidr=10.244.0.0/16
# If using containerd
# If using CRI-containerd
$ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /run/containerd/containerd.sock --pod-network-cidr=10.244.0.0/16
$ export KUBECONFIG=/etc/kubernetes/admin.conf
```
### Allow pods to run in the master node
You can force Kubelet to use Kata Containers by adding some `untrusted`
annotation to your pod configuration. In our case, this ensures Kata
Containers is the selected runtime to run the described workload.
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
```
### Create runtime class for Kata Containers
Users can use [`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/#runtime-class) to specify a different runtime for Pods.
```bash
$ cat > runtime.yaml <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
`nginx-untrusted.yaml`
```yaml
apiVersion: v1
kind: Pod
metadata:
name: kata
handler: kata
EOF
$ sudo -E kubectl apply -f runtime.yaml
```
### Run pod in Kata Containers
If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod with the
[Kata Containers runtime](../../src/runtime/README.md).
- Create an pod configuration that using Kata Containers runtime
```bash
$ cat << EOF | tee nginx-kata.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-kata
spec:
runtimeClassName: kata
containers:
name: nginx-untrusted
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
containers:
- name: nginx
image: nginx
EOF
```
- Create the pod
```bash
$ sudo -E kubectl apply -f nginx-kata.yaml
```
- Check pod is running
```bash
$ sudo -E kubectl get pods
```
- Check hypervisor is running
```bash
$ ps aux | grep qemu
```
### Delete created pod
```bash
$ sudo -E kubectl delete -f nginx-kata.yaml
```
Next, you run your pod:
```
$ sudo -E kubectl apply -f nginx-untrusted.yaml
```

View File

@@ -34,7 +34,7 @@ as the proxy starts.
Follow the [instructions](../install/README.md)
to get Kata Containers properly installed and configured with Kubernetes.
You can choose between CRI-O and containerd, both are supported
You can choose between CRI-O and CRI-containerd, both are supported
through this document.
For both cases, select the workloads as _trusted_ by default. This way,
@@ -159,7 +159,7 @@ containers with `privileged: true` to `privileged: false`.
There is no difference between Istio and Linkerd in this section. It is
about which CRI implementation you use.
For both CRI-O and containerd, you have to add an annotation indicating
For both CRI-O and CRI-containerd, you have to add an annotation indicating
the workload for this deployment is not _trusted_, which will trigger
`kata-runtime` to be called instead of `runc`.
@@ -193,9 +193,9 @@ spec:
...
```
__containerd:__
__CRI-containerd:__
Add the following annotation for containerd
Add the following annotation for CRI-containerd
```yaml
io.kubernetes.cri.untrusted-workload: "true"
```

View File

@@ -12,26 +12,16 @@ Containers.
Packaged installation methods uses your distribution's native package format (such as RPM or DEB).
> **Note:** We encourage installation methods that provides automatic updates, it ensures security updates and bug fixes are
> easily applied.
*Note:* We encourage installation methods that provides automatic updates, it ensures security updates and bug fixes are
easily applied.
| Installation method | Description | Automatic updates | Use case |
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. |
### Kata Deploy Installation
Kata Deploy provides a Dockerfile, which contains all of the binaries and
artifacts required to run Kata Containers, as well as reference DaemonSets,
which can be utilized to install Kata Containers on a running Kubernetes
cluster.
[Use Kata Deploy](/tools/packaging/kata-deploy/README.md) to install Kata Containers on a Kubernetes Cluster.
| Installation method | Description | Automatic updates | Use case |
|------------------------------------------------------|---------------------------------------------------------------------|-------------------|----------------------------------------------------------|
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. |
### Official packages
@@ -58,9 +48,9 @@ Follow the [containerd installation guide](container-manager/containerd/containe
## Build from source installation
> **Note:** Power users who decide to build from sources should be aware of the
> implications of using an unpackaged system which will not be automatically
> updated as new [releases](../Stable-Branch-Strategy.md) are made available.
*Note:* Power users who decide to build from sources should be aware of the
implications of using an unpackaged system which will not be automatically
updated as new [releases](../Stable-Branch-Strategy.md) are made available.
[Building from sources](../Developer-Guide.md#initial-setup) allows power users
who are comfortable building software from source to use the latest component

View File

@@ -6,7 +6,7 @@
cluster locally. It creates a single node Kubernetes stack in a local VM.
[Kata Containers](https://github.com/kata-containers) can be installed into a Minikube cluster using
[`kata-deploy`](../../tools/packaging/kata-deploy).
[`kata-deploy`](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy).
This document details the pre-requisites, installation steps, and how to check
the installation has been successful.
@@ -123,7 +123,7 @@ $ kubectl apply -f kata-deploy/base/kata-deploy.yaml
This installs the Kata Containers components into `/opt/kata` inside the Minikube node. It can take
a few minutes for the operation to complete. You can check the installation has worked by checking
the status of the `kata-deploy` pod, which will be executing
[this script](../../tools/packaging/kata-deploy/scripts/kata-deploy.sh),
[this script](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy/scripts/kata-deploy.sh),
and will be executing a `sleep infinity` once it has successfully completed its work.
You can accomplish this by running the following:

View File

@@ -39,8 +39,8 @@ can be used as runtime.
Read the following documents to know how to run Kata Containers 2.x with `containerd`.
* [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
* [Install Kata Containers with containerd](./container-manager/containerd/containerd-install.md)
* [How to use Kata Containers and Containerd](https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/containerd-kata.md)
* [Install Kata Containers with containerd](https://github.com/kata-containers/kata-containers/blob/main/docs/install/container-manager/containerd/containerd-install.md)
## Remove Kata Containers snap package

View File

@@ -1,3 +0,0 @@
# Kata Containers presentations
* [Unit testing](unit-testing)

View File

@@ -1,14 +0,0 @@
# Kata Containers unit testing presentation
## Markdown version
See [the Kata Containers unit testing presentation](kata-containers-unit-testing.md).
### To view as an HTML presentation
```bash
$ infile="kata-containers-unit-testing.md"
$ outfile="/tmp/kata-containers-unit-testing.html"
$ pandoc -s --metadata title="Kata Containers unit testing" -f markdown -t revealjs --highlight-style="zenburn" -i -o "$outfile" "$infile"
$ xdg-open "file://$outfile"
```

View File

@@ -1,335 +0,0 @@
## Why write unit tests?
- Catch regressions
- Improve the code being tested
Structure, quality, security, performance, "shakes out" implicit
assumptions, _etc_
- Extremely instructive
Once you've fully tested a single function, you'll understand that
code very well indeed.
## Why write unit tests? (continued)
- Fun!
Yes, really! Don't believe me? Try it! ;)
## Run all Kata Containers agent unit tests
As an example, to run all agent unit tests:
```bash
$ cd $GOPATH/src/github.com/kata-containers/kata-containers
$ cd src/agent
$ make test
```
## List all unit tests
- Identify the full name of all the tests _in the current package_:
```bash
$ cargo test -- --list
```
- Identify the full name of all tests in the `foo` "local crate"
(sub-directory containing another `Cargo.toml` file):
```bash
$ cargo test -p "foo" -- --list
```
## Run a single unit test
- Run a test in the current package in verbose mode:
```bash
# Example
$ test="config::tests::test_get_log_level"
$ cargo test "$test" -vv -- --exact --nocapture
```
## Test coverage setup
```bash
$ cargo install cargo-tarpaulin
```
## Show test coverage
```bash
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/agent
$ cargo -v tarpaulin --all-features --run-types AllTargets --count --force-clean -o Html
$ xdg-open "file://$PWD/tarpaulin-report.html"
```
## Testability (part 1)
- To be testable, a function should:
- Not be "too long" (say >100 lines).
- Not be "too complex" (say >3 levels of indentation).
- Should return a `Result` or an `Option` so error paths
can be tested.
- If functions don't conform, they need to be reworked (refactored)
before writing tests.
## Testability (part 2)
- Some functions can't be fully tested.
- However, you _can_ test the initial code that checks
the parameter values (test error paths only).
## Writing new tests: General advice (part 1)
- KISS: Keep It Simple Stupid
You don't get extra points for cryptic code.
- DRY: Don't Repeat Yourself
Make use of existing facilities (don't "re-invert the wheel").
- Read the [unit test advice document](https://github.com/kata-containers/kata-containers/blob/main/docs/Unit-Test-Advice.md)
## Writing new tests: General advice (part 2)
- Attack the function in all possible ways
- Use the _table driven_ approach:
- Simple
- Compact
- Easy to debug
- Makes boundary analysis easy
- Encourages functions to be testable
## Writing new tests: Specific advice (part 1)
- Create a new "`tests`" module if necessary.
- Give each test function a "`test_`" prefix.
- Add the "`#[test]`" annotation on each test function.
## Writing new tests: Specific advice (part 2)
- If you need to `use` (import) packages for the tests,
_only do it in the `tests` module_:
```rust
use some_test_pkg::{foo, bar}; // <-- Not here
#[cfg(test)]
mod tests {
use super::*;
use some_test_pkg:{foo, bar}; // <-- Put it here
}
```
## Writing new tests: Specific advice (part 3)
- You can add test-specific dependencies in `Cargo.toml`:
```toml
[dev-dependencies]
serial_test = "0.5.1"
```
## Writing new tests: Specific advice (part 4)
- Don't add in lots of error handling code: let the test panic!
```rust
// This will panic if the unwrap fails.
// - NOT acceptable generally for production code.
// - PERFECTLY acceptable for test code since:
// - Keeps the test code simple.
// - Rust will detect the panic and fail the test.
let result = func().unwrap();
```
## Debugging tests (part 1)
- Comment out all tests in your `TestData` array apart from the failing test.
- Add temporary `println!("FIXME: ...")` statements in the code.
- Set `RUST_BACKTRACE=full` before running `cargo test`.
## Debugging tests (part 2)
- Use a debugger (not normally necessary though):
```bash
# Disable optimisation
$ RUSTFLAGS="-C opt-level=0" cargo test --no-run
# Find the test binary
$ test_binary=$(find target/debug/deps | grep "kata_agent-[a-z0-9][a-z0-9]*$" | tail -1)
$ rust-gdb "$test_binary"
```
## Useful tips
- Always start a test with a "clean environment":
Create new set of objects / files / directories / _etc_
for each test.
- Mounts
- Linux allows mounts on top of existing mounts.
- Bind mounts and read-only mounts can be useful.
## Gotchas (part 1)
If a test runs successfully _most of the time_:
- Review the test logic.
- Add a `#[serial]` annotation on the test function
Requires the `serial_test` package in the `[dev-dependencies]`
section of `Cargo.toml`.
If this makes it work the test is probably sharing resources with
another task (thread).
## Gotchas (part 2)
If a test works locally but fails in the CI, consider the following
attributes of each environment (local and CI):
- The version of rust being used.
- The hardware architecture.
- Number (and spec) of the CPUs.
## Gotchas (part 3)
If in doubt, look at the
["test artifacts" attached to the failing CI test](http://jenkins.katacontainers.io).
## Before raising a PR
- Remember to check that the test runs locally:
- As a non-privileged user.
- As the `root` user (carefully!)
- Run the [static checker](https://github.com/kata-containers/tests/blob/main/.ci/static-checks.sh)
on your changes.
Checks formatting and many other things.
## If in doubt
- Ask for help! ;)
## Quiz 1
What's wrong with this function?
```rust
fn foo(config: &Config, path_prefix: String, container_id: String, pid: String) -> Result<()> {
let mut full_path = format!("{}/{}", path_prefix, container_id);
let _ = remove_recursively(&mut full_path);
write_number_to_file(pid, full_path);
Ok(())
}
```
## Quiz 1: Answers (part 1)
- No check that `path_prefix`, `container_id` and `pid` are not `""`.
- No check that `path_prefix` is absolute.
- No check that `container_id` does not contain slashes / contains only valid characters.
- Result of `remove_recursively()` discarded.
- `remove_recursively()` _may_ modify `full_path` without `foo()` knowing!
## Quiz 1: Answers (part 2)
- Why is `pid` not a numeric?
- No check to ensure the PID is positive.
- No check to recreate any directories in the original `path_prefix`.
- `write_number_to_file()` could fail so why doesn't it return a value?
- The `config` parameter is unused.
## Quiz 1: What if...
Imagine if the caller managed to do this:
```rust
foo(config, "", "sbin/init", r#"#!/bin/sh\n/sbin/reboot"#);
```
## Quiz 2
What makes this function difficult to test?
```rust
fn get_user_id(username: String) -> i32 {
let line = grep_file(username, "/etc/passwd").unwrap();
let fields = line.split(':');
let uid = fields.nth(2).ok_or("failed").unwrap();
uid.parse::<i32>()
}
```
## Quiz 2: Answers (part 1)
- Unhelpful error message ("failed").
- Panics on error! Return a `Result` instead!
- UID's cannot be negative so function should return an unsigned
value.
## Quiz 2: Answers (part 2)
- Hard-coded filename.
This would be better:
```rust
const PASSWD_DB: &str = "/etc/passwd";
// Test code can now pass valid and invalid files!
fn get_user_id(filename: String, username: String) -> i32 {
// ...
}
let id = get_user_id(PASSWD_DB, username);
```
## Quiz 3
What's wrong with this test code?
```rust
let mut obj = Object::new();
// Sanity check
assert_eq!(obj.num, 0);
assert_eq!(obj.wibble, false);
// Test 1
obj->foo_method(7);
assert_eq!(obj.num, 7);
// Test 2
obj->bar_method(true);
assert_eq!(obj.wibble, true);
```
## Quiz 3: Answers
- The test code is "fragile":
- The 2nd test re-uses the object created in the first test.
## Finally
- [We need a GH action to run the unit tests](https://github.com/kata-containers/kata-containers/issues/2934)
Needs to fail PRs that decrease test coverage<br/> by "x%".

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 150 KiB

View File

@@ -1,137 +0,0 @@
# Kata Containers threat model
This document discusses threat models associated with the Kata Containers project.
Kata was designed to provide additional isolation of container workloads, protecting
the host infrastructure from potentially malicious container users or workloads. Since
Kata Containers adds a level of isolation on top of traditional containers, the focus
is on the additional layer provided, not on traditional container security.
This document provides a brief background on containers and layered security, describes
the interface to Kata from CRI runtimes, a review of utilized virtual machine interfaces, and then
a review of threats.
## Kata security objective
Kata seeks to prevent an untrusted container workload or user of that container workload to gain
control of, obtain information from, or tamper with the host infrastructure.
In our scenario, an asset is anything on the host system, or elsewhere in the cluster
infrastructure. The attacker is assumed to be either a malicious user or the workload itself
running within the container. The goal of Kata is to prevent attacks which would allow
any access to the defined assets.
## Background on containers, layered security
Traditional containers leverage several key Linux kernel features to provide isolation and
a view that the container workload is the only entity running on the host. Key features include
`Namespaces`, `cgroups`, `capablities`, `SELinux` and `seccomp`. The canonical runtime for creating such
a container is `runc`. In the remainder of the document, the term `traditional-container` will be used
to describe a container workload created by runc.
Kata Containers provides a second layer of isolation on top of those provided by traditional-containers.
The hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight
virtual machine, and uses the guests Linux kernel to create a container workload, or workloads in the case
of multi-container pods. In Kubernetes and in the Kata implementation, the sandbox is carried out at the
pod level. In Kata, this sandbox is created using a virtual machine.
## Interface to Kata Containers: CRI, v2-shim, OCI
A typical Kata Containers deployment uses Kubernetes with a CRI implementation.
On every node, Kubelet will interact with a CRI implementor, which will in turn interface with
an OCI based runtime, such as Kata Containers. Typical CRI implementors are `cri-o` and `containerd`.
The CRI API, as defined at the Kubernetes [CRI-API repo](https://github.com/kubernetes/cri-api/),
results in a few constructs being supported by the CRI implementation, and ultimately in the OCI
runtime creating the workloads.
In order to run a container inside of the Kata sandbox, several virtual machine devices and interfaces
are required. Kata translates sandbox and container definitions to underlying virtualization technologies provided
by a set of virtual machine monitors (VMMs) and hypervisors. These devices and their underlying
implementations are discussed in detail in the following section.
## Interface to the Kata sandbox/virtual machine
In case of Kata, today the devices which we need in the guest are:
- Storage: In the current design of Kata Containers, we are reliant on the CRI implementor to
assist in image handling and volume management on the host. As a result, we need to support a way of passing to the sandbox the container rootfs, volumes requested
by the workload, and any other volumes created to facilitate sharing of secrets and `configmaps` with the containers. Depending on how these are managed, a block based device or file-system
sharing is required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
- Networking: A method for enabling network connectivity with the workload is required. Typically this will be done providing a `TAP` device
to the VMM, and this will be exposed to the guest as a `virtio-net` device. It is feasible to pass in a NIC device directly, in which case `VFIO` is leveraged
and the device itself will be exposed to the guest.
- Control: In order to interact with the guest agent and retrieve `STDIO` from containers, a medium of communication is required.
This is available via `virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM resource management (for example: CPU, memory, device hotplug). This is required when containers are resized,
or more generally when containers are added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify the default settings provided when integrating Kata
with the QEMU, Firecracker and Cloud Hypervisor VMMs in the following sections.
### Devices
Each virtio device is implemented by a backend, which may execute within userspace on the host (vhost-user), the VMM itself, or within the host kernel (vhost). While it may provide enhanced performance,
vhost devices are often seen as higher risk since an exploit would be already running within the kernel space. While VMM and vhost-user are both in userspace on the host, `vhost-user` generally allows for the back-end process to require less system calls and capabilities compared to a full VMM.
#### `virtio-blk` and `virtio-scsi`
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself (ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and QEMU.
While `vhost` based back-ends are available for QEMU, it is not recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they are not utilized in Kata today.
#### `virtio-fs`
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction with the host filesystem is done through a vhost-user daemon, `virtiofsd`.
The `virtio-fs` client, running in the guest, will generate requests to access files. `virtiofsd` will receive requests, open the file, and request the VMM
to `mmap` it into the guest. When DAX is utilized, the guest will access the host's page cache, avoiding the need for copy and duplication. DAX is still an experimental feature,
and is not enabled by default.
From the `virtiofsd` [documentation](https://qemu-project.gitlab.io/qemu/tools/virtiofsd.html):
```This program must be run as the root user. Upon startup the program will switch into a new file system namespace with the shared directory tree as its root. This prevents “file system escapes” due to symlinks and other file system objects that might lead to files outside the shared directory. The program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other vectors that could allow an attacker to compromise the system after gaining control of the virtiofsd process.```
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU VMM supports virtio-fs as of v4.2. Cloud Hypervisor
supports `virtio-fs`.
#### `virtio-net`
`virtio-net` has many options, depending on the VMM and Kata configurations.
##### QEMU networking
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the `virtio-net` backend
for Kata defaults to `vhost-net` for performance reasons. The default configuration is being
reevaluated.
##### Firecracker networking
For Firecracker, the `virtio-net` backend is within Firecracker's VMM.
##### Cloud Hypervisor networking
For Cloud Hypervisor, the current backend default is within the VMM. `vhost-user-net` support
is being added (written in rust, Cloud Hypervisor specific).
#### virtio-vsock
##### QEMU vsock
In QEMU, vsock is backed by `vhost_vsock`, which runs within the kernel itself.
##### Firecracker and Cloud Hypervisor
In Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in the hosts userspace.
#### VFIO
Utilizing VFIO, devices can be passed through to the virtual machine. We will assess this separately. Exposure to
host is limited to gaps in device pass-through handling. This is supported in QEMU and Cloud Hypervisor, but not
Firecracker.
#### ACPI
ACPI is necessary for hotplug of CPU, memory and devices. ACPI is available in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug
are not available in Firecracker.
## Devices and threat model
![Threat model](threat-model-boundaries.svg "threat-model")

View File

@@ -1,213 +0,0 @@
# Overview
This document explains how to trace Kata Containers components.
# Introduction
The Kata Containers runtime and agent are able to generate
[OpenTelemetry][opentelemetry] trace spans, which allow the administrator to
observe what those components are doing and how much time they are spending on
each operation.
# OpenTelemetry summary
An OpenTelemetry-enabled application creates a number of trace "spans". A span
contains the following attributes:
- A name
- A pair of timestamps (recording the start time and end time of some operation)
- A reference to the span's parent span
All spans need to be *finished*, or *completed*, to allow the OpenTelemetry
framework to generate the final trace information (by effectively closing the
transaction encompassing the initial (root) span and all its children).
For Kata, the root span represents the total amount of time taken to run a
particular component from startup to its shutdown (the "run time").
# Architecture
## Runtime tracing architecture
The runtime, which runs in the host environment, has been modified to
optionally generate trace spans which are sent to a trace collector on the
host.
## Agent tracing architecture
An OpenTelemetry system (such as [Jaeger][jaeger-tracing]) uses a collector to
gather up trace spans from the application for viewing and processing. For an
application to use the collector, it must run in the same context as
the collector.
This poses a problem for tracing the Kata Containers agent since it does not
run in the same context as the collector: it runs inside a virtual machine (VM).
To allow spans from the agent to be sent to the trace collector, Kata provides
a [trace forwarder][trace-forwarder] component. This runs in the same context
as the collector (generally on the host system) and listens on a
[`VSOCK`][vsock] channel for traces generated by the agent, forwarding them on
to the trace collector.
> **Note:**
>
> This design supports agent tracing without having to make changes to the
> image, but also means that [custom images][osbuilder] can also benefit from
> agent tracing.
The following diagram summarises the architecture used to trace the Kata
Containers agent:
```
+--------------------------------------------+
| Host |
| |
| +---------------+ |
| | OpenTelemetry | |
| | Trace | |
| | Collector | |
| +---------------+ |
| ^ +---------------+ |
| | spans | Kata VM | |
| +-----+-----+ | | |
| | Kata | spans o +-------+ | |
| | Trace |<-----------------| Kata | | |
| | Forwarder | VSOCK o | Agent | | |
| +-----------+ Channel | +-------+ | |
| +---------------+ |
+--------------------------------------------+
```
# Agent tracing prerequisites
- You must have a trace collector running.
Although the collector normally runs on the host, it can also be run from
inside a Docker image configured to expose the appropriate host ports to the
collector.
The [Jaeger "all-in-one" Docker image][jaeger-all-in-one] method
is the quickest and simplest way to run the collector for testing.
- If you wish to trace the agent, you must start the
[trace forwarder][trace-forwarder].
> **Notes:**
>
> - If agent tracing is enabled but the forwarder is not running,
> the agent will log an error (signalling that it cannot generate trace
> spans), but continue to work as normal.
>
> - The trace forwarder requires a trace collector (such as Jaeger) to be
> running before it is started. If a collector is not running, the trace
> forwarder will exit with an error.
# Enable tracing
By default, tracing is disabled for all components. To enable _any_ form of
tracing an `enable_tracing` option must be enabled for at least one component.
> **Note:**
>
> Enabling this option will only allow tracing for subsequently
> started containers.
## Enable runtime tracing
To enable runtime tracing, set the tracing option as shown:
```toml
[runtime]
enable_tracing = true
```
## Enable agent tracing
To enable agent tracing, set the tracing option as shown:
```toml
[agent.kata]
enable_tracing = true
```
> **Note:**
>
> If both agent tracing and runtime tracing are enabled, the resulting trace
> spans will be "collated": expanding individual runtime spans in the Jaeger
> web UI will show the agent trace spans resulting from the runtime
> operation.
# Appendices
## Agent tracing requirements
### Host environment
- The host kernel must support the VSOCK socket type.
This will be available if the kernel is built with the
`CONFIG_VHOST_VSOCK` configuration option.
- The VSOCK kernel module must be loaded:
```
$ sudo modprobe vhost_vsock
```
### Guest environment
- The guest kernel must support the VSOCK socket type:
This will be available if the kernel is built with the
`CONFIG_VIRTIO_VSOCKETS` configuration option.
> **Note:** The default Kata Containers guest kernel provides this feature.
## Agent tracing limitations
- Agent tracing is only "completed" when the workload and the Kata agent
process have exited.
Although trace information *can* be inspected before the workload and agent
have exited, it is incomplete. This is shown as `<trace-without-root-span>`
in the Jaeger web UI.
If the workload is still running, the trace transaction -- which spans the entire
runtime of the Kata agent -- will not have been completed. To view the complete
trace details, wait for the workload to end, or stop the container.
## Performance impact
[OpenTelemetry][opentelemetry] is designed for high performance. It combines
the best of two previous generation projects (OpenTracing and OpenCensus) and
uses a very efficient mechanism to capture trace spans. Further, the trace
points inserted into the agent are generated dynamically at compile time. This
is advantageous since new versions of the agent will automatically benefit
from improvements in the tracing infrastructure. Overall, the impact of
enabling runtime and agent tracing should be extremely low.
## Agent shutdown behaviour
In normal operation, the Kata runtime manages the VM shutdown and performs
certain optimisations to speed up this process. However, if agent tracing is
enabled, the agent itself is responsible for shutting down the VM. This it to
ensure all agent trace transactions are completed. This means there will be a
small performance impact for container shutdown when agent tracing is enabled
as the runtime must wait for the VM to shutdown fully.
## Set up a tracing development environment
If you want to debug, further develop, or test tracing,
[enabling full debug][enable-full-debug]
is highly recommended. For working with the agent, you may also wish to
[enable a debug console][setup-debug-console]
to allow you to access the VM environment.
[enable-full-debug]: ./Developer-Guide.md#enable-full-debug
[jaeger-all-in-one]: https://www.jaegertracing.io/docs/getting-started/
[jaeger-tracing]: https://www.jaegertracing.io
[opentelemetry]: https://opentelemetry.io
[osbuilder]: ../tools/osbuilder
[setup-debug-console]: ./Developer-Guide.md#set-up-a-debug-console
[trace-forwarder]: /src/tools/trace-forwarder
[vsock]: https://wiki.qemu.org/Features/VirtioVsock

View File

@@ -67,7 +67,7 @@ To use large BARs devices (for example, Nvidia Tesla P100), you need Kata versio
The following configuration in the Kata `configuration.toml` file as shown below can work:
Hotplug for PCI devices by `acpi_pcihp` (Linux's ACPI PCI Hotplug driver):
Hotplug for PCI devices by `shpchp` (Linux's SHPC PCI Hotplug driver):
```
machine_type = "q35"
@@ -91,6 +91,7 @@ The following kernel config options need to be enabled:
```
# Support PCI/PCIe device hotplug (Required for large BARs device)
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_HOTPLUG_PCI_SHPC=y
# Support for loading modules (Required for load Nvidia drivers)
CONFIG_MODULES=y

Binary file not shown.

After

Width:  |  Height:  |  Size: 113 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 114 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 100 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 250 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 77 KiB

View File

@@ -231,11 +231,11 @@ $ cp ${GOPATH}/${LINUX_VER}/vmlinux ${KATA_KERNEL_LOCATION}/${KATA_KERNEL_NAME}
These instructions build upon the OS builder instructions located in the
[Developer Guide](../Developer-Guide.md). At this point it is recommended that
[Docker](https://docs.docker.com/engine/install/ubuntu/) is installed first, and
then [Kata-deploy](../../tools/packaging/kata-deploy)
then [Kata-deploy](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy)
is use to install Kata. This will make sure that the correct `agent` version
is installed into the rootfs in the steps below.
The following instructions use Ubuntu as the root filesystem with systemd as
The following instructions use Debian as the root filesystem with systemd as
the init and will add in the `kmod` binary, which is not a standard binary in
a Kata rootfs image. The `kmod` binary is necessary to load the Intel® QAT
kernel modules when the virtual machine rootfs boots.
@@ -257,7 +257,7 @@ $ cd $GOPATH
$ export AGENT_VERSION=$(kata-runtime version | head -n 1 | grep -o "[0-9.]\+")
$ cd ${OSBUILDER}/rootfs-builder
$ sudo rm -rf ${ROOTFS_DIR}
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh ubuntu'
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh debian'
```
### Compile Intel® QAT drivers for Kata Containers kernel and add to Kata Containers rootfs
@@ -355,10 +355,10 @@ this small script so that it redirects to be able to use either QEMU or
Cloud Hypervisor with Kata.
```bash
$ echo '#!/usr/bin/env bash' | sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo '#!/bin/bash' | sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-qemu.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-qemu-v2
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo '#!/usr/bin/env bash' | sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo '#!/bin/bash' | sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo 'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-clh.toml /opt/kata/bin/containerd-shim-kata-v2 $@' | sudo tee -a /usr/local/bin/containerd-shim-kata-clh-v2
$ sudo chmod +x /usr/local/bin/containerd-shim-kata-clh-v2
```
@@ -419,11 +419,11 @@ You might need to disable Docker before initializing Kubernetes. Be aware
that the OpenSSL container image built above will need to be exported from
Docker and imported into containerd.
If Kata is installed through [`kata-deploy`](../../tools/packaging/kata-deploy/README.md)
If Kata is installed through [`kata-deploy`](https://github.com/kata-containers/kata-containers/blob/stable-2.0/tools/packaging/kata-deploy/README.md)
there will be multiple `configuration.toml` files associated with different
hypervisors. Rather than add in the custom Kata kernel, Kata rootfs, and
kernel modules to each `configuration.toml` as the default, instead use
[annotations](../how-to/how-to-load-kernel-modules-with-kata.md)
[annotations](https://github.com/kata-containers/kata-containers/blob/stable-2.0/docs/how-to/how-to-load-kernel-modules-with-kata.md)
in the Kubernetes YAML file to tell Kata which kernel and rootfs to use. The
easy way to do this is to use `kata-deploy` which will install the Kata binaries
to `/opt` and properly configure the `/etc/containerd/config.toml` with annotation

View File

@@ -1,102 +1,107 @@
# Kata Containers with SGX
Intel Software Guard Extensions (SGX) is a set of instructions that increases the security
Intel® Software Guard Extensions (SGX) is a set of instructions that increases the security
of applications code and data, giving them more protections from disclosure or modification.
This document guides you to run containers with SGX enclaves with Kata Containers in Kubernetes.
> **Note:** At the time of writing this document, SGX patches have not landed on the Linux kernel
> project, so specific versions for guest and host kernels must be installed to enable SGX.
## Preconditions
## Check if SGX is enabled
* Intel SGX capable bare metal nodes
* Host kernel Linux 5.13 or later with SGX and SGX KVM enabled:
Run the following command to check if your host supports SGX.
```sh
$ grep SGX /boot/config-`uname -r`
CONFIG_X86_SGX=y
CONFIG_X86_SGX_KVM=y
$ grep -o sgx /proc/cpuinfo
```
* Kubernetes cluster configured with:
* [`kata-deploy`](../../tools/packaging/kata-deploy) based Kata Containers installation
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images)
Continue to the following section if the output of the above command is empty,
otherwise continue to section [Install Guest kernel with SGX support](#install-guest-kernel-with-sgx-support)
> Note: Kata Containers supports creating VM sandboxes with Intel® SGX enabled
> using [cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) and [QEMU](https://www.qemu.org/) VMMs only.
## Install Host kernel with SGX support
### Kata Containers Configuration
The following commands were tested on Fedora 32, they might work on other distros too.
```sh
$ git clone --depth=1 https://github.com/intel/kvm-sgx
$ pushd kvm-sgx
$ cp /boot/config-$(uname -r) .config
$ yes "" | make oldconfig
$ # In the following step, enable: INTEL_SGX and INTEL_SGX_VIRTUALIZATION
$ make menuconfig
$ make -j$(($(nproc)-1)) bzImage
$ make -j$(($(nproc)-1)) modules
$ sudo make modules_install
$ sudo make install
$ popd
$ sudo reboot
```
> **Notes:**
> * Run: `mokutil --sb-state` to check whether secure boot is enabled, if so, you will need to sign the kernel.
> * You'll lose SGX support when a new distro kernel is installed and the system rebooted.
Once you have restarted your system with the new brand Linux Kernel with SGX support, run
the following command to make sure it's enabled. If the output is empty, go to the BIOS
setup and enable SGX manually.
```sh
$ grep -o sgx /proc/cpuinfo
```
## Install Guest kernel with SGX support
Install the guest kernel in the Kata Containers directory, this way it can be used to run
Kata Containers.
```sh
$ curl -LOk https://github.com/devimc/kvm-sgx/releases/download/v0.0.1/kata-virtiofs-sgx.tar.gz
$ sudo tar -xf kata-virtiofs-sgx.tar.gz -C /usr/share/kata-containers/
$ sudo sed -i 's|kernel =|kernel = "/usr/share/kata-containers/vmlinux-virtiofs-sgx.container"|g' \
/usr/share/defaults/kata-containers/configuration.toml
```
## Run Kata Containers with SGX enabled
Before running a Kata Container make sure that your version of `crio` or `containerd`
supports annotations.
For `containerd` check in `/etc/containerd/config.toml` that the list of `pod_annotations` passed
to the `sandbox` are: `["io.katacontainers.*", "sgx.intel.com/epc"]`.
## Usage
With the following sample job deployed using `kubectl apply -f`:
> Note: Change the `runtimeClassName` option accordingly, only `kata-clh` and `kata-qemu` support Intel® SGX.
> `sgx.yaml`
```yaml
apiVersion: batch/v1
kind: Job
apiVersion: v1
kind: Pod
metadata:
name: oesgx-demo-job
labels:
jobgroup: oesgx-demo
name: sgx
annotations:
sgx.intel.com/epc: "32Mi"
spec:
template:
metadata:
labels:
jobgroup: oesgx-demo
spec:
runtimeClassName: kata-clh
initContainers:
- name: init-sgx
image: busybox
command: ['sh', '-c', 'mkdir /dev/sgx; ln -s /dev/sgx_enclave /dev/sgx/enclave; ln -s /dev/sgx_provision /dev/sgx/provision']
volumeMounts:
- mountPath: /dev
name: dev-mount
restartPolicy: Never
containers:
-
name: eosgx-demo-job-1
image: oeciteam/oe-helloworld:latest
imagePullPolicy: IfNotPresent
securityContext:
readOnlyRootFilesystem: true
capabilities:
add: ["IPC_LOCK"]
resources:
limits:
sgx.intel.com/epc: "512Ki"
volumes:
- name: dev-mount
hostPath:
path: /dev
terminationGracePeriodSeconds: 0
runtimeClassName: kata
containers:
- name: c1
image: busybox
command:
- sh
stdin: true
tty: true
volumeMounts:
- mountPath: /dev/sgx/
name: test-volume
volumes:
- name: test-volume
hostPath:
path: /dev/sgx/
type: Directory
```
You'll see the enclave output:
```sh
$ kubectl logs oesgx-demo-job-wh42g
Hello world from the enclave
Enclave called into host to print: Hello World!
$ kubectl apply -f sgx.yaml
$ kubectl exec -ti sgx ls /dev/sgx/
enclave provision
```
### Notes
The output of the latest command shouldn't be empty, otherwise check
your system environment to make sure SGX is fully supported.
* The Kata VM's SGX Encrypted Page Cache (EPC) memory size is based on the sum of `sgx.intel.com/epc`
resource requests within the pod.
* `init-sgx` can be removed from the YAML configuration file if the Kata rootfs is modified with the
necessary udev rules.
See the [note on SGX backwards compatibility](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#backwards-compatibility-note).
* Intel SGX DCAP attestation is known to work from Kata sandboxes but it comes with one limitation: If
the Intel SGX `aesm` daemon runs on the bare metal node and DCAP `out-of-proc` attestation is used,
containers within the Kata sandbox cannot get the access to the host's `/var/run/aesmd/aesm.sock`
because socket passthrough is not supported. An alternative is to deploy the `aesm` daemon as a side-car
container.
* Projects like [Gramine Shielded Containers (GSC)](https://gramine-gsc.readthedocs.io/en/latest/) are
also known to work. For GSC specifically, the Kata guest kernel needs to have the `CONFIG_NUMA=y`
enabled and at least one CPU online when running the GSC container.
[1]: github.com/cloud-hypervisor/cloud-hypervisor/

View File

@@ -1,4 +1,4 @@
# Setup to run SPDK vhost-user devices with Kata Containers
# Setup to run SPDK vhost-user devices with Kata Containers and Docker*
> **Note:** This guide only applies to QEMU, since the vhost-user storage
> device is only available for QEMU now. The enablement work on other
@@ -104,7 +104,7 @@ devices:
- `vhost-user-blk`
- `vhost-user-scsi`
- `vhost-user-nvme` (deprecated from SPDK 21.07 release)
- `vhost-user-nvme`
For more information, visit [SPDK](https://spdk.io) and [SPDK vhost-user target](https://spdk.io/doc/vhost.html).
@@ -222,43 +222,26 @@ minor `0` should be created for it, in order to be recognized by Kata runtime:
$ sudo mknod /var/run/kata-containers/vhost-user/block/devices/vhostblk0 b 241 0
```
> **Note:** The enablement of vhost-user block device in Kata containers
> is supported by Kata Containers `1.11.0-alpha1` or newer.
> Make sure you have updated your Kata containers before evaluation.
## Launch a Kata container with SPDK vhost-user block device
To use `vhost-user-blk` device, use `ctr` to pass a host `vhost-user-blk`
device to the container. In your `config.json`, you should use `devices`
To use `vhost-user-blk` device, use Docker to pass a host `vhost-user-blk`
device to the container. In docker, `--device=HOST-DIR:CONTAINER-DIR` is used
to pass a host device to the container.
For example (only `vhost-user-blk` listed):
```json
{
"linux": {
"devices": [
{
"path": "/dev/vda",
"type": "b",
"major": 241,
"minor": 0,
"fileMode": 420,
"uid": 0,
"gid": 0
}
]
}
}
```
With `rootfs` provisioned under `bundle` directory, you can run your SPDK container:
For example:
```bash
$ sudo ctr run -d --runtime io.containerd.run.kata.v2 --config bundle/config.json spdk_container
$ sudo docker run --runtime kata-runtime --device=/var/run/kata-containers/vhost-user/block/devices/vhostblk0:/dev/vda -it busybox sh
```
Example of performing I/O operations on the `vhost-user-blk` device inside
container:
```
$ sudo ctr t exec --exec-id 1 -t spdk_container sh
/ # ls -l /dev/vda
brw-r--r-- 1 root root 254, 0 Jan 20 03:54 /dev/vda
/ # dd if=/dev/vda of=/tmp/ddtest bs=4k count=20

121
docs/use-cases/zun_kata.md Normal file
View File

@@ -0,0 +1,121 @@
# OpenStack Zun DevStack working with Kata Containers
## Introduction
This guide describes how to get Kata Containers to work with OpenStack Zun
using DevStack on Ubuntu 16.04. Running DevStack with this guide will setup
Docker and Clear Containers 2.0, which you replace with Kata Containers.
Currently, the instructions are based on the following links:
- https://docs.openstack.org/zun/latest/contributor/quickstart.html
- https://docs.openstack.org/zun/latest/admin/clear-containers.html
## Install Git to use with DevStack
```sh
$ sudo apt install git
```
## Setup OpenStack DevStack
The following commands will sync DevStack from GitHub, create your
`local.conf` file, assign your host IP to this file, enable Clear
Containers, start DevStack, and set the environment variables to use
`zun` on the command line.
```sh
$ sudo mkdir -p /opt/stack
$ sudo chown $USER /opt/stack
$ git clone https://github.com/openstack-dev/devstack /opt/stack/devstack
$ HOST_IP="$(ip addr | grep 'state UP' -A2 | tail -n1 | awk '{print $2}' | cut -f1 -d'/')"
$ git clone https://github.com/openstack/zun /opt/stack/zun
$ cat /opt/stack/zun/devstack/local.conf.sample \
$ | sed "s/HOST_IP=.*/HOST_IP=$HOST_IP/" \
$ > /opt/stack/devstack/local.conf
$ sed -i "s/KURYR_CAPABILITY_SCOPE=.*/KURYR_CAPABILITY_SCOPE=local/" /opt/stack/devstack/local.conf
$ echo "ENABLE_CLEAR_CONTAINER=true" >> /opt/stack/devstack/local.conf
$ echo "enable_plugin zun-ui https://git.openstack.org/openstack/zun-ui" >> /opt/stack/devstack/local.conf
$ /opt/stack/devstack/stack.sh
$ source /opt/stack/devstack/openrc admin admin
```
The previous commands start OpenStack DevStack with Zun support. You can test
it using `runc` as shown by the following commands to make sure everything
installed correctly and is working.
```sh
$ zun run --name test cirros ping -c 4 8.8.8.8
$ zun list
$ zun logs test
$ zun delete test
```
## Install Kata Containers
Follow [these instructions](../install/README.md)
to install the Kata Containers components.
## Update Docker with new Kata Containers runtime
The following commands replace the Clear Containers 2.x runtime setup with
DevStack, with Kata Containers:
```sh
$ sudo sed -i 's/"cor"/"kata-runtime"/' /etc/docker/daemon.json
$ sudo sed -i 's/"\/usr\/bin\/cc-oci-runtime"/"\/usr\/bin\/kata-runtime"/' /etc/docker/daemon.json
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```
## Test that everything works in both Docker and OpenStack Zun
```sh
$ sudo docker run -ti --runtime kata-runtime busybox sh
$ zun run --name kata --runtime kata-runtime cirros ping -c 4 8.8.8.8
$ zun list
$ zun logs kata
$ zun delete kata
```
## Stop DevStack and clean up system (Optional)
```sh
$ /opt/stack/devstack/unstack.sh
$ /opt/stack/devstack/clean.sh
```
## Restart DevStack and reset CC 2.x runtime to `kata-runtime`
Run the following commands if you already setup Kata Containers and want to
restart DevStack:
```sh
$ /opt/stack/devstack/unstack.sh
$ /opt/stack/devstack/clean.sh
$ /opt/stack/devstack/stack.sh
$ source /opt/stack/devstack/openrc admin admin
$ sudo sed -i 's/"cor"/"kata-runtime"/' /etc/docker/daemon.json
$ sudo sed -i 's/"\/usr\/bin\/cc-oci-runtime"/"\/usr\/bin\/kata-runtime"/' /etc/docker/daemon.json
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```
![Kata Zun image 1](./images/kata-zun1.png)
Figure 1: Create a BusyBox container image
![Kata Zun image 2](./images/kata-zun2.png)
Figure 2: Select `kata-runtime` to use
![Kata Zun image 3](./images/kata-zun3.png)
Figure 3: Two BusyBox containers successfully launched
![Kata Zun image 4](./images/kata-zun4.png)
Figure 4: Test connectivity between Kata Containers
![Kata Zun image 5](./images/kata-zun5.png)
Figure 5: CLI for Zun

View File

@@ -7,15 +7,15 @@ edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
serde_json = "1.0.73"
serde_json = "1.0.39"
# slog:
# - Dynamic keys required to allow HashMap keys to be slog::Serialized.
# - The 'max_*' features allow changing the log level at runtime
# (by stopping the compiler from removing log calls).
slog = { version = "2.7.0", features = ["dynamic-keys", "max_level_trace", "release_max_level_debug"] }
slog-json = "2.4.0"
slog-async = "2.7.0"
slog-scope = "4.4.0"
slog = { version = "2.5.2", features = ["dynamic-keys", "max_level_trace", "release_max_level_info"] }
slog-json = "2.3.0"
slog-async = "2.3.0"
slog-scope = "4.1.2"
[dev-dependencies]
tempfile = "3.2.0"
tempfile = "3.1.0"

View File

@@ -20,8 +20,6 @@ const LOG_LEVELS: &[(&str, slog::Level)] = &[
("critical", slog::Level::Critical),
];
const DEFAULT_SUBSYSTEM: &str = "root";
// XXX: 'writer' param used to make testing possible.
pub fn create_logger<W>(
name: &str,
@@ -52,7 +50,7 @@ where
let logger = slog::Logger::root(
async_drain.fuse(),
o!("version" => env!("CARGO_PKG_VERSION"),
"subsystem" => DEFAULT_SUBSYSTEM,
"subsystem" => "root",
"pid" => process::id().to_string(),
"name" => name.to_string(),
"source" => source.to_string()),
@@ -218,8 +216,8 @@ where
#[cfg(test)]
mod tests {
use super::*;
use serde_json::{json, Value};
use slog::{crit, debug, error, info, warn, Logger};
use serde_json::Value;
use slog::info;
use std::io::prelude::*;
use tempfile::NamedTempFile;
@@ -297,15 +295,15 @@ mod tests {
let result_level = result.unwrap();
let expected_level = d.result.unwrap();
assert!(result_level == expected_level, "{}", msg);
assert!(result_level == expected_level, msg);
continue;
} else {
assert!(result.is_err(), "{}", msg);
assert!(result.is_err(), msg);
}
let expected_error = d.result.as_ref().unwrap_err();
let actual_error = result.unwrap_err();
assert!(&actual_error == expected_error, "{}", msg);
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
assert!(actual_error == expected_error, msg);
}
}
@@ -352,13 +350,13 @@ mod tests {
let msg = format!("{}, result: {:?}", msg, result);
if d.result.is_ok() {
assert!(result == d.result, "{}", msg);
assert!(result == d.result, msg);
continue;
}
let expected_error = d.result.as_ref().unwrap_err();
let actual_error = result.unwrap_err();
assert!(&actual_error == expected_error, "{}", msg);
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
assert!(actual_error == expected_error, msg);
}
}
@@ -378,17 +376,14 @@ mod tests {
let record_key = "record-key-1";
let record_value = "record-key-2";
let (logger, guard) = create_logger(name, source, level, writer);
let logger = create_logger(name, source, level, writer);
let msg = "foo, bar, baz";
// Call the logger (which calls the drain)
// Note: This "mid level" log level should be available in debug or
// release builds.
info!(&logger, "{}", msg; "subsystem" => record_subsystem, record_key => record_value);
info!(logger, "{}", msg; "subsystem" => record_subsystem, record_key => record_value);
// Force temp file to be flushed
drop(guard);
drop(logger);
let mut contents = String::new();
@@ -435,168 +430,4 @@ mod tests {
.expect("failed to find record key field");
assert_eq!(field_record_value, record_value);
}
#[test]
fn test_logger_levels() {
let name = "name";
let source = "source";
let debug_msg = "a debug log level message";
let info_msg = "an info log level message";
let warn_msg = "a warn log level message";
let error_msg = "an error log level message";
let critical_msg = "a critical log level message";
// The slog crate will *remove* macro calls for log levels "above" the
// configured log level.lock
//
// At the time of writing, the default slog log
// level is "info", but this crate overrides that using the magic
// "*max_level*" features in the "Cargo.toml" manifest.
// However, there are two log levels:
//
// - max_level_${level}
//
// This is the log level for normal "cargo build" (development/debug)
// builds.
//
// - release_max_level_${level}
//
// This is the log level for "cargo install" and
// "cargo build --release" (release) builds.
//
// This crate sets them to different values, which is sensible and
// standard practice. However, that causes a problem: there is
// currently no clean way for this test code to detect _which_
// profile the test is being built for (development or release),
// meaning we cannot know which macros are expected to produce output
// and which aren't ;(
//
// The best we can do is test the following log levels which
// are expected to work in all build profiles.
let debug_closure = |logger: &Logger, msg: String| debug!(logger, "{}", msg);
let info_closure = |logger: &Logger, msg: String| info!(logger, "{}", msg);
let warn_closure = |logger: &Logger, msg: String| warn!(logger, "{}", msg);
let error_closure = |logger: &Logger, msg: String| error!(logger, "{}", msg);
let critical_closure = |logger: &Logger, msg: String| crit!(logger, "{}", msg);
struct TestData<'a> {
slog_level: slog::Level,
slog_level_tag: &'a str,
msg: String,
closure: Box<dyn Fn(&Logger, String)>,
}
let tests = &[
TestData {
slog_level: slog::Level::Debug,
// Looks like a typo but tragically it isn't! ;(
slog_level_tag: "DEBG",
msg: debug_msg.into(),
closure: Box::new(debug_closure),
},
TestData {
slog_level: slog::Level::Info,
slog_level_tag: "INFO",
msg: info_msg.into(),
closure: Box::new(info_closure),
},
TestData {
slog_level: slog::Level::Warning,
slog_level_tag: "WARN",
msg: warn_msg.into(),
closure: Box::new(warn_closure),
},
TestData {
slog_level: slog::Level::Error,
// Another language tragedy
slog_level_tag: "ERRO",
msg: error_msg.into(),
closure: Box::new(error_closure),
},
TestData {
slog_level: slog::Level::Critical,
slog_level_tag: "CRIT",
msg: critical_msg.into(),
closure: Box::new(critical_closure),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]", i);
// Create a writer for the logger drain to use
let writer =
NamedTempFile::new().expect(&format!("{:}: failed to create tempfile", msg));
// Used to check file contents before the temp file is unlinked
let mut writer_ref = writer
.reopen()
.expect(&format!("{:?}: failed to clone tempfile", msg));
let (logger, logger_guard) = create_logger(name, source, d.slog_level, writer);
// Call the logger (which calls the drain)
(d.closure)(&logger, d.msg.to_owned());
// Force temp file to be flushed
drop(logger_guard);
drop(logger);
let mut contents = String::new();
writer_ref
.read_to_string(&mut contents)
.expect(&format!("{:?}: failed to read tempfile contents", msg));
// Convert file to JSON
let fields: Value = serde_json::from_str(&contents)
.expect(&format!("{:?}: failed to convert logfile to json", msg));
// Check the expected JSON fields
let field_ts = fields
.get("ts")
.expect(&format!("{:?}: failed to find timestamp field", msg));
assert_ne!(field_ts, "", "{}", msg);
let field_version = fields
.get("version")
.expect(&format!("{:?}: failed to find version field", msg));
assert_eq!(field_version, env!("CARGO_PKG_VERSION"), "{}", msg);
let field_pid = fields
.get("pid")
.expect(&format!("{:?}: failed to find pid field", msg));
assert_ne!(field_pid, "", "{}", msg);
let field_level = fields
.get("level")
.expect(&format!("{:?}: failed to find level field", msg));
assert_eq!(field_level, d.slog_level_tag, "{}", msg);
let field_msg = fields
.get("msg")
.expect(&format!("{:?}: failed to find msg field", msg));
assert_eq!(field_msg, &json!(d.msg), "{}", msg);
let field_name = fields
.get("name")
.expect(&format!("{:?}: failed to find name field", msg));
assert_eq!(field_name, name, "{}", msg);
let field_source = fields
.get("source")
.expect(&format!("{:?}: failed to find source field", msg));
assert_eq!(field_source, source, "{}", msg);
let field_subsystem = fields
.get("subsystem")
.expect(&format!("{:?}: failed to find subsystem field", msg));
// No explicit subsystem, so should be the default
assert_eq!(field_subsystem, &json!(DEFAULT_SUBSYSTEM), "{}", msg);
}
}
}

Some files were not shown because too many files have changed in this diff Show More