Compare commits

..

210 Commits
3.0.2 ... 2.0.0

Author SHA1 Message Date
Bin Liu
aa295c91f2 Merge pull request #992 from bergwolf/2.0.0-branch-bump
# Kata Containers 2.0.0
2020-10-19 16:02:09 +08:00
Ubuntu
6648c8c7fc release: Kata Containers 2.0.0
- backport 2.0-dev commits to stable-2.0.0

dbfe85e snap: install libseccomp-dev
0c3b6a9 package: drop qemu-virtiofs shim
f751c98 packaging: install virtiofsd for normal qemu build as well
08361c5 runtime: enable virtiofs by default
da9bfb2 runtime: Pass `--thread-pool-size=1` to virtiofsd
7347d43 packaging: Apply virtiofs performance related fixes to 5.x
c7bb1e2 tools: Improve agent-ctl README
e6f7ddd tools: Make agent-ctl support more APIs
46cfed5 tools: Remove commented out code in agent-ctl
81fb2c9 tools: Log request in agent-ctl tool if debug enabled
0c43215 tools: Rename agent-ctl command to GetGuestDetails
6511ffe tools: Fix comment in agent-ctl
ee59378 kernel: update to 5.4.71
ef11213 config: make virtio-fs part of standard kernel
1fb6730 agent: remove `unwrap()` for `e.as_errno()`
05e9fe0 agent: Use `?` instead of `match` when the error returns directly
d658129 kata-monitor: use regexp to check if runtime is kata containers
ae2d89e agent: use anyhow `context` to attach context to `Error` instead of `match`
095d4ad agent: remove useless match
bd816df agent: Use `ok_or_else` instead of match for Option -> Result
d413bf7 agent: Fix crasher if AddARPNeighbors request empty
76408c0 agent: Fix crasher if UpdateRoutes request empty
6e4da19 agent: Fix crasher if UpdateInterface request empty
8f8061d agent: replace `match Result` with `or_else`
64e4b2f agent: replace unnecessary `match Result` with `map_err`
7c0d68f agent: replace check! with map_err for readability
82ed34a agent: remove `check!` in child process because we cant' see logs.
9def624 agent: replace `if let Err` with `or_else`
6926914 agent: refactor namespace::setup to optimize error handling
e733c13 agent: replace `if let Err` with `map_err`
ba069f9 rustjail: add length check for uid_mappings in rootless euid mapping
cc8ec7b versions: Update Kubernetes, containerd, cri-o and cri-tools
8a364d2 annotations: Correct unit tests to validate new protections
0cc6297 annotations: Split addHypervisorOverrides to reduce complexity
b6059f3 annotations: Add unit test for checkPathIsInGlobs
c6afad2 annotations: Add unit test for regexpContains function
451608f makefile: Add missing generated vars to `USER_VARS`
8328136 makefile: Improve names of config entries for annotation checks
a92a630 annotations: Give better names to local variabes in search functions
997f7c4 annotations: Rename checkPathIsInGlobList with checkPathIsInGlobs
74d4065 config: Add better comments in the template files
73bb3fd config: Whitelist hypervisor annotations by name
5a587ba config: Use glob instead of regexp to match paths in annotations
29f5dec annotations: Fix typo in comment
d71f9e1 config: Add makefile variables for path lists
28c386c config: Protect file_mem_backend against annotation attacks
c2a186b config: Protect vhost_user_store_path against annotation attacks
8cd094c config: Add security warning on configuration examples
b5f2a1e config: Protect ctlpath from annotation attack
2d65b3b config: Protect jailer_path annotation
fe5e1cf config: Add examples for path_list configuration
3f7bcf5 annotations: Simplify negative logic
80144fc config: Add hypervisor path override through annotations
2f5f356 config: Fix typo in function name
2faafbd config: Protect virtio_fs_daemon annotation
9e5ed41 config: Add 'List' alternates for hypervisor configuration paths
b33d4fe agent: fix panic on malformed device resource in container update
1838233 cpuset: add cpuset pkg
bfbbe8b cpuset: don't set cpuset.mems in the guest
5c21ec2 sandbox: consider cpusets if quota is not enforced
9bb0d48 cpuset: support setting mems for sandbox
64a2ef6 virtcontainers: add method for calculating cpuset for sandbox
a441f21 cpuset: add cpuset pkg
ce54090 docs: Update upgrading guide
e884fef docs: update the build kata containers kernel document
9c16643 agent/device: Check type as well as major:minor when looking up devices
4978c90 agent/device: Index all devices in spec before updating them
a7ba362 agent/device: Forward port update_spec_device_list() unit test
230a983 agent/device: update_spec_device_list() should error if dev not found
a6d9fd4 sandbox: don't constrain cpus, mem only cpuset, devices
8f0cb2f cgroups: add ability to update CPUSet
cbdae44 agent: fix errorneous parsing for guest block size
97acaa8 docs: Add containerd install guide
2324666 agent: use ok_or/map_err instead of match
ebe5ad1 rustjail: use Iterator to manipulate vector elements
c9497c8 rustjail: delete codes commented out
d5d9928 rustjail: delete unused test code
f70892a agent: use chain of Result to avoid early return
ab64780 agent: update not accurate comments
9e064ba agent: use macro to simplify parse_cmdline function in config.rs
42c48f5 agent: add blank lines between methods
d3a36fa agent: delete unused field in agentService
fa54660 agent: use no-named closure to reduce codes
efddcb4 agent: use a local fn to reduce duplicated codes
7bb3e56 packaging: fix cloud-hypervisor binary path
7b53041 packaging: fix missing cloud_hypervisor_repo
38212ba packaging: apply qemu v5.1 stable fixes
fb7e9b4 agent: fix aarch64 build
0cfcbf7 docs: add namespace key to pod/container config files
997f1f6 docs: Add crictl example json files
f60f43a runtime: Clear the VCMock 1.x API Methods from 2.0
1789527 ci: snap: add event filtering
999f67d agent: do not follow link when mounting container proc and sysfs
cb2255f agent: set init process non-dumpable
2a6c9ee agent-ctl: include cargo lock updates
eaff5de versions: add plugins section
4f1d23b virtiofs: Disable DAX
6d80df9 snap: specify python version
a116ce0 osbuilder: Create target directory for agent
4dc3bc0 rust-agent: Treat warnings as error
8f7a484 rust-agent: Identify unused results in tests
ce54e5d rust-agent: Log returned errors rather than ignore them
9adb7b7 rust-agent: Remove unused imports
73ab9b1 rust-agent: Report errors to caller if possible
4db3f9e rust-agent: Ignore write errors while writing to the logs
19cb657 rust-agent: Remove unused code that has undefined behavior
86bc151 rust-agent: Remove 'mut' where not needed
8d8adb6 rust-agent: Remove uses of deprecated functions
76298c1 rust-agent: Remove or rename unused parameters
7d303ec rust-agent: Remove or rename unused variables
e0b79eb rust-agent: Remove unused functions
8ed61b1 rust-agent: Remove useless braces
cc4f02e rust-agent: Remove unused macros
ace6f1e clh: Support VFIO device unplug
47cfeaa clh: Remove unnecessary VmmPing
63c4757 versions: cloud-hypervisor: Bump to version 6d30fe05
059b89c docs: Change kata_tap0 to tap0_kata
4ff3ed5 docs: update networking description
de8dcb1 dev-guide: update kata-agent install details
c488cc4 docs: Update docs for enabling agent debug console
e5acb12 docs: update dev guide for agent build
1bddde7 ci: add github action to test the snap
9517b0a versions: cloud-hypervisor: bump version
f5a7175 runtime: cloud-hypervisor: tag openapi-generator-cli container

Signed-off-by: Ubuntu <ubuntu@ip-172-31-19-197.ap-southeast-1.compute.internal>
2020-10-19 06:18:08 +00:00
Xu Wang
49776f76bf Merge pull request #984 from bergwolf/prepare-release
backport 2.0-dev commits to stable-2.0.0
2020-10-18 13:46:16 +08:00
Peng Tao
dbfe85e705 snap: install libseccomp-dev
To build qemu with virtio-fs support.

Depends-on: github.com/kata-containers/tests#2979
Fixes: #982
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Peng Tao
0c3b6a94b3 package: drop qemu-virtiofs shim
We have enabled qemu-virtiofs by default.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Peng Tao
f751c98da3 packaging: install virtiofsd for normal qemu build as well
For experimental-virtiofs, we use it to test virtiofs with DAX. Let's
rename its virtiofsd to virtiofsd-dax.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Peng Tao
08361c5948 runtime: enable virtiofs by default
We've been shipping it for a long time. It's time to make it default
replacing the old obsolet 9pfs.

Fixes: #935
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:43:15 +08:00
Fabiano Fidêncio
da9bfb27ed runtime: Pass --thread-pool-size=1 to virtiofsd
Dave Gilbert brough up that passing --thread-pool-size=1 to virtiofsd
may result in a performance improvement especially when using
`cache=none`. While our current default is `cache=auto`, Dave mentioned
that he seems no harm in having it set and he also mentiond that it may
use a lot less stack space on aarch/arm.

Fixes: #943

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-18 00:43:15 +08:00
Fabiano Fidêncio
7347d43cf9 packaging: Apply virtiofs performance related fixes to 5.x
Vivek Goyal found out that using "shared" thread pool, instead of
"exclusive" results in better performance.

Knowning that and with the plan to have virtio-fs as the default fs for
the 2.0, let's bring this patch in for both 5.0 and 5.1.

Fixes: #944

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
c7bb1e2790 tools: Improve agent-ctl README
Add a summary to help understand how to use the `agent-ctl` tool.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
e6f7ddd9a2 tools: Make agent-ctl support more APIs
Added new `agent-ctl` commands to allow the following agent API calls to
be made:

- `AddARPNeighborsRequest`
- `CloseStdinRequest`
- `CopyFileRequest`
- `GetMetricsRequest`
- `GetOOMEventRequest`
- `MemHotplugByProbeRequest`
- `OnlineCPUMemRequest`
- `ReadStreamRequest`
- `ReseedRandomDevRequest`
- `SetGuestDateTimeRequest`
- `TtyWinResizeRequest`
- `UpdateContainerRequest`
- `WriteStreamRequest`

Fixes: #969.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
46cfed5025 tools: Remove commented out code in agent-ctl
Remove a few lines of commented out code.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
81fb2c9980 tools: Log request in agent-ctl tool if debug enabled
Display the API request before making the call so users can see what is
sent to the agent.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
0c432153df tools: Rename agent-ctl command to GetGuestDetails
Rename the `GuestDetails` command to `GetGuestDetails` to match the
actual agent API name.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
6511ffe89d tools: Fix comment in agent-ctl
Correct a comment in the agent control tool.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
Eric Ernst
ee59378232 kernel: update to 5.4.71
vsock fix was backported to 5.4 stable, so we can drop this patch.

Fixes: #973

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:43:15 +08:00
Eric Ernst
ef11213a4e config: make virtio-fs part of standard kernel
Basic virtio-fs support has made it upstream in the Linux kernel, as
well as in QEMU and Cloud Hypervisor. Let's go ahead and add it to the
standard configuration.

Since the device driver / DAX handling is still in progress for
upstream, we will want to still build a seperate experimental kernel for
those who are comfortable trading off bleeding edge stability/kernel
updates for improved FIO numbers.

Fixes: #963

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:43:15 +08:00
Tim Zhang
1fb6730984 agent: remove unwrap() for e.as_errno()
Use `{:?}` to print `e.as_errno()` instead of using `{}`
to print `e.as_errno().unwrap().desc()`.

Avoid panic only caused by error's content.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
05e9fe0591 agent: Use ? instead of match when the error returns directly
It's more clear and more readable.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
bin liu
d658129695 kata-monitor: use regexp to check if runtime is kata containers
To support a few common configurations for Kata, including:

- `io.containerd.kata.v2`
- `io.containerd.kata-qemu.v2`
- `io.containerd.kata-clh.v2`

`kata-monintor` changes to use regexp instead of direct string comparison.

Fixes: #957

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
ae2d89e95e agent: use anyhow context to attach context to Error instead of match
Context is clearer than match for these situations.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
095d4ad08d agent: remove useless match
Remove useless match.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
bd816dfcec agent: Use ok_or_else instead of match for Option -> Result
Using ok_or is clearer than match.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
d413bf7d44 agent: Fix crasher if AddARPNeighbors request empty
Check if the ARP neighbours specified in the `AddARPNeighbors` API is
set before using it to avoid crashing the agent.

Fixes: #955.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
76408c0f13 agent: Fix crasher if UpdateRoutes request empty
Check if the routes specified in the `UpdateRoutes` API is set before
using it to avoid crashing the agent.

Fixes: #949.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
James O. D. Hunt
6e4da19fa5 agent: Fix crasher if UpdateInterface request empty
Check if the interface specified in the `UpdateInterface` API is set
before using it to avoid crashing the agent.

Fixes: #950.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:43:15 +08:00
Tim Zhang
8f8061da08 agent: replace match Result with or_else
`or_else` is suitable for more complicated situations.
We can use it to return Ok in Err handling.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
64e4b2fa83 agent: replace unnecessary match Result with map_err
Replace `match Result` whose Ok hand is useless.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
7c0d68f7f7 agent: replace check! with map_err for readability
It's ambiguous and not easy to read to call method use macro.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
82ed34aee1 agent: remove check! in child process because we cant' see logs.
The check macro will log the errors but the log in child process can't
be seen, just ignore it.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
9def624c05 agent: replace if let Err with or_else
Fixes #934

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
6926914683 agent: refactor namespace::setup to optimize error handling
- Replace the return value with anyhow::Result.
- Remove if let Err.
- Remove match.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
Tim Zhang
e733c13cf7 agent: replace if let Err with map_err
Fixes #934

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-18 00:43:15 +08:00
bin liu
ba069f9baa rustjail: add length check for uid_mappings in rootless euid mapping
This might be a copy miss, gid_mappings is checked twice, one should
be uid_mappings.

Fixes: #952

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:43:15 +08:00
Salvador Fuentes
cc8ec7b0e9 versions: Update Kubernetes, containerd, cri-o and cri-tools
Kubernetes: from 1.17.3 to 1.18.9
CRI-O: from 0eec454168e381e460b3d6de07bf50bfd9b0d082 (1.17) to 1.18.3
Containerd: from 3a4acfbc99aa976849f51a8edd4af20ead51d8d7 (1.3.3) to 1.3.7
cri-tools: from 1.17.0 to 1.18.0

Fixes: #960.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
8a364d2145 annotations: Correct unit tests to validate new protections
Add the verification of some basic protections, namely that:
- EnableAnnotations is honored
- Dangerous paths cannot be modified if no match
- Errors are returned when expected

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
0cc6297716 annotations: Split addHypervisorOverrides to reduce complexity
Warning from gocyclo during make check:
 virtcontainers/pkg/oci/utils.go:404:1: cyclomatic complexity 37 of func `addHypervisorConfigOverrides` is high (> 30) (gocyclo)
 func addHypervisorConfigOverrides(ocispec specs.Spec, config *vc.SandboxConfig, runtime RuntimeConfig) error {
^

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
b6059f3566 annotations: Add unit test for checkPathIsInGlobs
There are a few interesting corner cases to consider for this
function.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
c6afad2a06 annotations: Add unit test for regexpContains function
James O.D Hunt: "But also, regexpContains() and
checkPathIsInGlobList() seem like good candidates for some unit
tests. The "look" obvious, but a few boundary condition tests would be
useful I think (filenames with spaces, backslashes, special
characters, and relative & absolute paths are also an interesting
thought here)."

There aren't that many boundary conditions on a list with regexps,
if you assume the regexp match function itself works. However, the
tests is useful in documenting expectations.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
451608fb28 makefile: Add missing generated vars to USER_VARS
This was discovered while checking a massive change in variables.
The root cause for the error is a very long list of manual
replacements, that is best replaced with a $(foreach).

All individual variables in the output configuration files were
checked against the old build using diff.

This is a forward port of a makefile fix included in
PR https://github.com/kata-containers/runtime/issues/3004
for issue https://github.com/kata-containers/runtime/issues/2943

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
8328136575 makefile: Improve names of config entries for annotation checks
The entries used to be things like PATH_LIST, which are too generic.
Replace them with more precise name with a distinguishing keyword,
namely VALID. For example valid_hypervisor_paths.

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
a92a63031d annotations: Give better names to local variabes in search functions
Use more meaningful variable names for clarity.

Fixes: #901

Suggested-by: James O.D. Hunt james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
997f7c4433 annotations: Rename checkPathIsInGlobList with checkPathIsInGlobs
The name is shorter and more specific

Fixes: #901

Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
74d4065197 config: Add better comments in the template files
When there is a default value from the code (usually empty) that
differs from a possible suggested value from the distro, then the
wording "default: empty" is confusing.

Fixes: #901

Suggested-by: Julio Montes <julio.montes@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:15 +08:00
Christophe de Dinechin
73bb3fdbee config: Whitelist hypervisor annotations by name
Add a field "enable_annotations" to the runtime configuration that can
be used to whitelist annotations using a list of regular expressions,
which are used to match any part of the base annotation name, i.e. the
part after "io.katacontainers.config.hypervisor."

For example, the following configuraiton will match "virtio_fs_daemon",
"initrd" and "jailer_path", but not "path" nor "firmware":

  enable_annotations = [ "virtio.*", "initrd", "_path" ]

The default is an empty list of enabled annotations, which disables
annotations entirely.

If an anontation is rejected, the message is something like:

  annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled

Fixes: #901

Suggested-by: Peng Tao <tao.peng@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:43:10 +08:00
Christophe de Dinechin
5a587ba506 config: Use glob instead of regexp to match paths in annotations
When filtering annotations that correspond to paths,
e.g. hypervisor.path, it is better to use a glob syntax than a regexp
syntax, as it is more usual for paths, and prevents classes of matches
that are undesirable in our case, such as matching .. against .*

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
29f5dec38f annotations: Fix typo in comment
A comment talking about runtime related annotations describes them as
being related to the agent. A similar comment for the agent
annotations is missing.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
d71f9e1155 config: Add makefile variables for path lists
Add variables to override defaults at build time for the various lists
used to control path annotations.

Fixes: #901

Suggested-by: Fabiano Fidencio <fidencio@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
28c386c51f config: Protect file_mem_backend against annotation attacks
This one could theoretically be used to overwrite data on the host.
It seems somewhat less risky than the earlier ones for a number
of reasons, but worth protecting a little anyway.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
c2a186b18c config: Protect vhost_user_store_path against annotation attacks
This path could be used to overwrite data on the host.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8cd094cf06 config: Add security warning on configuration examples
Add the following text explaining the risk of using regular
expressions in path lists:

Each member of the list can be a regular expression, but prefer names.
Otherwise, please read and understand the following carefully.
SECURITY WARNING: If you use regular expressions, be mindful that
an attacker could craft an annotation that uses .. to escape the paths
you gave. For example, if your regexp is /bin/qemu.* then if there is
a directory named /bin/qemu.d/, then an attacker can pass an annotation
containing /bin/qemu.d/../put-any-binary-name-here and attack your host.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
b5f2a1e8c4 config: Protect ctlpath from annotation attack
This also adds annotation for ctlpath which were not present
before. It's better to implement the code consistenly right now to make
sure that we don't end up with a leaky implementation tacked on later.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2d65b3bfd8 config: Protect jailer_path annotation
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
fe5e1cf2e1 config: Add examples for path_list configuration
The path_list configuration gives a series of regular expressions that
limit which values are acceptable through annotations in order to
avoid kata launching arbitrary binaries on the host when receiving an
annotation.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
3f7bcf54f0 annotations: Simplify negative logic
Replace strange negative logic  (!ok -> continue) with positive
logic (ok -> do it)

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
80144fc415 config: Add hypervisor path override through annotations
The annotation is provided, so it should be respected.
Furthermore, it is important to implement it with the appropriate
protetions similar to what was done for virtiofsd.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2f5f35608a config: Fix typo in function name
There was an extra 'p' in addHypervisorVirtioFsOverrides.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2faafbdd3a config: Protect virtio_fs_daemon annotation
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
9e5ed41511 config: Add 'List' alternates for hypervisor configuration paths
Paths mentioned in the hypervisor configuration can be overriden
using annotations, which is potentially dangerous. For each path,
add a 'List' variant that specifies the list of acceptable values
from annotations.

Bug: https://bugs.launchpad.net/katacontainers.io/+bug/1878234

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Peng Tao
b33d4fe708 agent: fix panic on malformed device resource in container update
Somehow containerd is sending a malformed device in update API. While it
should not happen, we should not panic either.

Fixes: #946
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Eric Ernst
183823398d cpuset: add cpuset pkg
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
bfbbe8ba6b cpuset: don't set cpuset.mems in the guest
Kata doesn't map any numa topologies in the guest. Let's make sure we
clear the Cpuset fields before passing container updates to the
guest.

Note, in the future we may want to have a vCPU to guest CPU mapping and
still include the cpuset.Cpus. Until we have this support, clear this as
well.

Fixes: #932

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
5c21ec278c sandbox: consider cpusets if quota is not enforced
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
9bb0d48d56 cpuset: support setting mems for sandbox
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
64a2ef62e0 virtcontainers: add method for calculating cpuset for sandbox
Calculate sandbox's CPUSet as the union of each of the container's
CPUSets.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
a441f21c40 cpuset: add cpuset pkg
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
James O. D. Hunt
ce54090f25 docs: Update upgrading guide
Update the upgrading guide for 2.0.

Fixes: #928.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:40:16 +08:00
Ychau Wang
e884fef483 docs: update the build kata containers kernel document
Update the build kata containers kernel document for 2.0 release. Fixed
the 1.x release project paths and urls, using the kata-containers
project file paths and urls.

Fixes: #929

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-18 00:40:16 +08:00
David Gibson
9c16643c12 agent/device: Check type as well as major:minor when looking up devices
To update device resource entries from host to guest, we search for
the right entry by host major:minor numbers, then later update it.
However block and character devices exist in separate major:minor
namespaces so we could have one block and one character device with
matching major:minor and thus incorrectly update both with the details
for whichever device is processed second.

Add a check on device type to prevent this.

Port from the Kata 1 Go agent
https://github.com/kata-containers/agent/commit/27ebdc9d2761

Fixes: #703

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
4978c9092c agent/device: Index all devices in spec before updating them
The agent needs to update device entries in the OCI spec so that it
has the correct major:minor numbers for the guest, which may differ
from the host.

Entries in the main device list are looked up by device path, but
entries in the device resources list are looked up by (host)
major:minor.  This is done one device at a time, updating as we go in
update_spec_device_list().

But since the host and guest have different namespaces, one device
might have the same major:minor as a different device on the host.  In
that case we could update one resource entry to the correct guest
values, then mistakenly update it again because it now matches a
different host device.

To avoid this, rather than looking up and updating one by one, we make
all the lookups in advance, creating a map from (host) device path to
the indices in the spec where the device and resource entries can be
found.

Port from the Go agent in Kata 1,
https://github.com/kata-containers/agent/commit/d88d46849130

Fixes: #703

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
a7ba362f92 agent/device: Forward port update_spec_device_list() unit test
The Kata 1 Go agent included a unit test for updateSpecDeviceList, but no
such unit test exists for the Rust agent's equivalent
update_spec_device_list().  Port the Kata1 test to Rust.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
230a9833f8 agent/device: update_spec_device_list() should error if dev not found
If update_spec_device_list() is given a device that can't be found in the
OCI spec, it currently does nothing, and returns Ok(()).  That doesn't
seem like what we'd expect and is not what the Go agent in Kata 1 does.

Change it to return an error in that case, like Kata 1.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
Eric Ernst
a6d9fd4118 sandbox: don't constrain cpus, mem only cpuset, devices
Allow for constraining the cpuset as well as the devices-whitelist . Revert
sandbox constraints for cpu/memory, as they break the K8S use case. Can
re-add behind a non-default flag in the future.

The sandbox CPUSet should be updated every time a container is created,
updated, or removed.

To facilitate this without rewriting the 'non constrained cgroup'
handling, let's add to the Sandbox's cgroupsUpdate function.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
8f0cb2f1ea cgroups: add ability to update CPUSet
Add function for applying a cpuset change to a cgroup

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
cbdae44992 agent: fix errorneous parsing for guest block size
We were assuming base 10 string before, when the block size from sysfs
is actually a hex string. Let's fix that.

Fixes: #908

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
James O. D. Hunt
97acaa8124 docs: Add containerd install guide
Create a containerd installation guide and a new `kata-manager` script
for 2.0 that automated the steps outlined in the guide.

Also cleaned up and improved the installation documentation in various
ways, the most significant being:

- Added legacy install link for 1.x installs.
- Official packages section:
  - Removed "Contact" column (since it was empty!)
  - Reworded "Versions" column to clarify the versions are a minimum
    (to reduce maintenance burden).
  - Add a column to show which installation methods receive automatic updates.
  - Modified order of installation options in table and document to
    de-emphasise automatic installation and promote official packages
    and snap more.
- Removed sections no longer relevant for 2.0.

Fixes: #738.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:40:16 +08:00
bin liu
23246662b2 agent: use ok_or/map_err instead of match
Sometimes `Option.or_or` and `Result.map_err` may be simpler
than match statement. Especially in rpc.rs, there are
many `ctr.get_process` and `sandbox.get_container` which
are using `match`.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
ebe5ad1386 rustjail: use Iterator to manipulate vector elements
Use Iterator can save codes, and make code more readable

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
c9497c88e4 rustjail: delete codes commented out
There are some uses/codes/struct fields are commented out, and
may not turn into  un-comment these codes, so delete these comments.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
d5d9928f97 rustjail: delete unused test code
The auto generated test code is no meanings, delete it.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
f70892a5bb agent: use chain of Result to avoid early return
Use rust `Result`'s `or_else`/`and_then` can write clean codes.
And can avoid early return by check wether the `Result`
is `Ok` or `Err`.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
ab64780a0b agent: update not accurate comments
This commit includes:
- update comments that not matched the function name
- file path with doubled slash

Fixes: #922

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
9e064ba192 agent: use macro to simplify parse_cmdline function in config.rs
In function parse_cmdline there are some similar codes, if we want
to add more commandline arguments, the code will grow too long.
Use macro can reduce some codes with the same logic/processing.

Fixes: #914

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
42c48f54ed agent: add blank lines between methods
In rpc.rs, there are no blank lines between methods, this commit
add blank lines for these methods.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
d3a36fa06f agent: delete unused field in agentService
The code is for test, and not needed now.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
fa546600ff agent: use no-named closure to reduce codes
For simple closures, inline closures can save codes.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
efddcb4ab8 agent: use a local fn to reduce duplicated codes
The same codes used twices, aggregated into a function can
reduce codes.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
7bb3e562bc packaging: fix cloud-hypervisor binary path
1. ensure build-static-clh.sh puts cloud-hypervisor under ./cloud-hypervisor directory
2. install cloud-hypervisor/cloud-hypervisor binary

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
7b53041bad packaging: fix missing cloud_hypervisor_repo
It is needed in order to build from source.

Fixes: #916
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
38212ba6d8 packaging: apply qemu v5.1 stable fixes
Qemu v5.1 was released with an affending commit 9b3a35ec82
(virtio: verify that legacy support is not accidentally on).
As a result, it breaks commandline compatiblilities for old qemu
users. Upstream qemu has fixed it but no release has been put out yet.
Let's apply these fixes by hand for now.

Refs: https://www.mail-archive.com/qemu-devel@nongnu.org/msg729556.html

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Jianyong Wu
fb7e9b4f32 agent: fix aarch64 build
aarch64 needs libgcc to resolve some non-builtin symbols.

Fixes: #909
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
0cfcbf79b8 docs: add namespace key to pod/container config files
If no namespace field in config files, CRI-O will failed:
 setting pod sandbox name and id: cannot generate pod name without namespace

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
997f1f6cd0 docs: Add crictl example json files
Add basic sample pod/container config files to show
how to use `crictl` with Kata containers.

Fixes: #881

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
Ychau Wang
f60f43af6b runtime: Clear the VCMock 1.x API Methods from 2.0
Clear the 1.x branch api methods in the 2.0. Keep the same methods to
the VC interface, like the VCImpl struct.

Fixes: #751

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-18 00:40:16 +08:00
Julio Montes
1789527d61 ci: snap: add event filtering
Run the snap CI on every PR is not needed. Don't run the snap CI
on PRs that don't change the source code (*.go/*.rs), a configuration
file or Makefile.

fixes #896

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Peng Tao
999f67d573 agent: do not follow link when mounting container proc and sysfs
Attackers might use it to explore other containers in the same pod.
While it is still safe to allow it, we can just close the race window
like runc does.

Fixes: #885
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
cb2255f199 agent: set init process non-dumpable
On old kernels (like v4.9), kernel applies CLOECEC in wrong order w.r.t.
dumpable task flags. As a result, we might leak guest file descriptor to
containers. This is a former runc CVE-2016-9962 and still applies to
kata agent. Although Kata container is still valid at protecting the
host, we should not leak extra resources to user containers.

This sets the init processes that join and setup the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other ) namespace.

This settings is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.

This prevents parent processes, the pid 1 of the container, to ptrace
the init process before it drops caps and other sets LSMs.

The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.

Refs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=613cc2b6f272c1a8ad33aefa21cad77af23139f7
https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
opencontainers/runc@50a19c6
https://nvd.nist.gov/vuln/detail/CVE-2016-9962

Fixes: #890
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
2a6c9eec74 agent-ctl: include cargo lock updates
Simply running `make` would generate some cargo lock updates for
agent-ctl. Let's include them so that we have fixed dependencies.

Fixes: #883
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Julio Montes
eaff5de37a versions: add plugins section
plugins sections contains the details of plugins required for
the components or testing.

Add sriov-network-device-plugin url and version that are consumed
by the VFIO test in the tests repository.

fixes #879

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Jose Carlos Venegas Munoz
4f1d23b651 virtiofs: Disable DAX
virtiofs DAX support is not stable today, there are
a few corner cases to make it default.

Fixes: #862
Fixes: #875

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
6d80df9831 snap: specify python version
In order to avoid `unmet dependencies` error in the CI,
the python version must be specified in the yaml.

fixes #877

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Ralf Haferkamp
a116ce0b75 osbuilder: Create target directory for agent
When building with AGENT_SOURCE_BIN pointing to an already built
kata-agent binary, the target directory needs to be created in the
rootfs tree.

Fixes #873

Signed-off-by: Ralf Haferkamp <rhafer@suse.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
4dc3bc0020 rust-agent: Treat warnings as error
Avoid the accumulation of warnings we had, as reported in #750.

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8f7a4842c2 rust-agent: Identify unused results in tests
Assign unused results to _ in order to silence warnings.

This addresses the following warnings:

    warning: unused `std::result::Result` that must be used
        --> rustjail/src/mount.rs:1182:16
         |
    1182 |         defer!(unistd::chdir(&olddir););
         |                ^^^^^^^^^^^^^^^^^^^^^^^
         |
         = note: `#[warn(unused_must_use)]` on by default
         = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
        --> rustjail/src/mount.rs:1183:9
         |
    1183 |         unistd::chdir(tempdir.path());
         |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         |
         = note: this `Result` may be an `Err` variant, which should be handled

While in regular code, we want to log possible errors, in test code
it's OK to simply ignore the returned value.

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
ce54e5dd57 rust-agent: Log returned errors rather than ignore them
In a number of cases, we have functions that return a Result<...>
and where the possible error case is simply ignored. This is a bit
unhealthy.

Add a `check!` macro that allows us to not ignore error values
that we want to log, while not interrupting the flow by returning
them. This is useful for low-level functions such as `signal::kill` or
`unistd::close` where an error is probably significant, but should not
necessarily interrupt the flow of the program (i.e. using `call()?` is
not the right answer.

The check! macro is then used on low-level calls. This addresses the
following warnings from #750:

This addresses the following warning:

    warning: unused `std::result::Result` that must be used
       --> /home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail/src/container.rs:903:17
        |
    903 |                 signal::kill(Pid::from_raw(p.pid), Some(Signal::SIGKILL));
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> /home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail/src/container.rs:916:17
        |
    916 |                 signal::kill(Pid::from_raw(child.id() as i32), Some(Signal::SIGKILL));
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:340:13
        |
    340 |             write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:554:13
        |
    554 | /             write_sync(
    555 | |                 cwfd,
    556 | |                 SYNC_FAILED,
    557 | |                 format!("setgroups failed: {:?}", e).as_str(),
    558 | |             );
        | |______________^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:340:13
        |
    340 |             write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:340:13
        |
    340 |             write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:554:13
        |
    554 | /             write_sync(
    555 | |                 cwfd,
    556 | |                 SYNC_FAILED,
    557 | |                 format!("setgroups failed: {:?}", e).as_str(),
    558 | |             );
        | |______________^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:626:5
        |
    626 |     unistd::close(cfd_log);
        |     ^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:627:5
        |
    627 |     unistd::close(crfd);
        |     ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:628:5
        |
    628 |     unistd::close(cwfd);
        |     ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:770:9
        |
    770 |         fcntl::fcntl(pfd_log, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:799:9
        |
    799 |         fcntl::fcntl(prfd, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:800:9
        |
    800 |         fcntl::fcntl(pwfd, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:803:13
        |
    803 |             unistd::close(prfd);
        |             ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:930:9
        |
    930 |         log_handler.join();
        |         ^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:803:13
        |
    803 |             unistd::close(prfd);
        |             ^^^^^^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_must_use)]` on by default
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:804:13
        |
    804 |             unistd::close(pwfd);
        |             ^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:842:13
        |
    842 |             sched::setns(old_pid_ns, CloneFlags::CLONE_NEWPID);
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/container.rs:843:13
        |
    843 |             unistd::close(old_pid_ns);
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

Fixes: #844
Fixes: #750

Suggested-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
9adb7b7c28 rust-agent: Remove unused imports
This addresses the following warnings (and similar ones)::

    Compiling rustjail v0.1.0 (/home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail)
    warning: unused import: `debug`
      --> rustjail/src/container.rs:57:12
       |
    57 | use slog::{debug, info, o, Logger};
       |            ^^^^^

    warning: unused imports: `AddressFamily`, `SockFlag`, `SockType`, `self`
      --> rustjail/src/process.rs:18:24
       |
    18 | use nix::sys::socket::{self, AddressFamily, SockFlag, SockType};
       |                        ^^^^  ^^^^^^^^^^^^^  ^^^^^^^^  ^^^^^^^^

    warning: unused import: `nix::Error`
      --> rustjail/src/process.rs:23:5
       |
    23 | use nix::Error;
       |     ^^^^^^^^^^

    warning: unused import: `protobuf::RepeatedField`
      --> rustjail/src/validator.rs:11:5
       |
    11 | use protobuf::RepeatedField;
       |     ^^^^^^^^^^^^^^^^^^^^^^^

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
73ab9b1d6d rust-agent: Report errors to caller if possible
Various recently added error-causing calls

This addresses the following warning:

    warning: unused `std::result::Result` that must be used
      --> rustjail/src/cgroups/fs/mod.rs:93:9
       |
    93 |         cg.add_task(CgroupPid::from(pid as u64));
       |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
       = note: `#[warn(unused_must_use)]` on by default
       = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:196:17
        |
    196 |                 freezer_controller.thaw();
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:199:17
        |
    199 |                 freezer_controller.freeze();
        |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:365:9
        |
    365 |         cpuset_controller.set_cpus(&cpu.cpus);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:369:9
        |
    369 |         cpuset_controller.set_mems(&cpu.mems);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:381:13
        |
    381 |             cpu_controller.set_shares(shares);
        |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/cgroups/fs/mod.rs:385:5
        |
    385 |     cpu_controller.set_cfs_quota_and_period(cpu.quota, cpu.period);
        |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = note: this `Result` may be an `Err` variant, which should be handled

    warning: unused `std::result::Result` that must be used
        --> rustjail/src/cgroups/fs/mod.rs:1061:13
         |
    1061 |             cpuset_controller.set_cpus(cpuset_cpus);
         |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         |
         = note: this `Result` may be an `Err` variant, which should be handled

The specific case of cpu_controller.set_cfs_quota_and_period is
addressed in a way that changes the logic following a suggestion by
Liu Bin, who had just added the code.

Fixes: #750

Suggested-by: Liu Bin <bin@hyper.sh>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
4db3f9e226 rust-agent: Ignore write errors while writing to the logs
When we are writing to the logs and there is an error doing so, there
is not much we can do. Chances are that a panic would make things
worse. So let it go through.

    warning: unused `std::result::Result` that must be used
       --> rustjail/src/sync.rs:26:9
        |
    26  |         write_count(lfd, log_str.as_bytes(), log_str.len());
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
       ::: rustjail/src/container.rs:339:13
        |
    339 |             log_child!(cfd_log, "child exit: {:?}", e);
        |             ------------------------------------------- in this macro invocation
        |
        = note: this `Result` may be an `Err` variant, which should be handled
        = note: this warning originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
19cb657299 rust-agent: Remove unused code that has undefined behavior
Some functions have undefined behavior and are not actually used.

This addresses the following warning:
    warning: the type `oci::User` does not permit zero-initialization
      --> rustjail/src/lib.rs:99:18
       |
    99 |         unsafe { MaybeUninit::zeroed().assume_init() }
       |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |                  |
       |                  this code causes undefined behavior when executed
       |                  help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
       |
       = note: `#[warn(invalid_value)]` on by default
    note: `std::ptr::Unique<u32>` must be non-null (in this struct field)

    warning: the type `protocols::oci::Process` does not permit zero-initialization
       --> rustjail/src/lib.rs:146:14
        |
    146 |     unsafe { MaybeUninit::zeroed().assume_init() }
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |              |
        |              this code causes undefined behavior when executed
        |              help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
        |
    note: `std::ptr::Unique<std::string::String>` must be non-null (in this struct field)

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
86bc151787 rust-agent: Remove 'mut' where not needed
Addresses the following warning (and a few similar ones):
    warning: variable does not need to be mutable
       --> rustjail/src/container.rs:369:9
        |
    369 |     let mut oci_process: oci::Process = serde_json::from_str(process_str)?;
        |         ----^^^^^^^^^^^
        |         |
        |         help: remove this `mut`
        |
        = note: `#[warn(unused_mut)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8d8adb6887 rust-agent: Remove uses of deprecated functions
This addresses the following:

    warning: use of deprecated item 'std::error::Error::description': use the Display impl or to_string()
        --> rustjail/src/container.rs:1598:31
         |
    1598 | ...                   e.description(),
         |                         ^^^^^^^^^^^
         |
         = note: `#[warn(deprecated)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
76298c12b7 rust-agent: Remove or rename unused parameters
Parameters that are never used were removed.
Parameters that are unused, but necessary because of some common
interface were renamed with a _ prefix.
In one case, consume the parameter by adding an info! call, and fix a
minor typo in a message in the same function.

This addresses the following warning:

    warning: unused variable: `child`
        --> rustjail/src/container.rs:1128:5
         |
    1128 |     child: &mut Child,
         |     ^^^^^ help: if this is intentional, prefix it with an underscore: `_child`

    warning: unused variable: `logger`
        --> rustjail/src/container.rs:1049:22
         |
    1049 | fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
         |                      ^^^^^^ help: if this is intentional, prefix it with an underscore: `_logger`

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
7d303ec2d0 rust-agent: Remove or rename unused variables
Remove variables that are simply not used.
Rename as _ variables where only initialization matters.

This addresses the following warnings:

    warning: unused variable: `writer`
       --> src/main.rs:130:9
        |
    130 |     let writer = unsafe { File::from_raw_fd(wfd) };
        |         ^^^^^^ help: if this is intentional, prefix it with an underscore: `_writer`
        |
        = note: `#[warn(unused_variables)]` on by default

    warning: unused variable: `ctx`
       --> src/rpc.rs:782:9
        |
    782 |         ctx: &ttrpc::TtrpcContext,
        |         ^^^ help: if this is intentional, prefix it with an underscore: `_ctx`

    warning: unused variable: `ctx`
       --> src/rpc.rs:808:9
        |
    808 |         ctx: &ttrpc::TtrpcContext,
        |         ^^^ help: if this is intentional, prefix it with an underscore: `_ctx`

    warning: unused variable: `dns_list`
        --> src/rpc.rs:1152:16
         |
    1152 |             Ok(dns_list) => {
         |                ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_dns_list`

    warning: value assigned to `child_stdin` is never read
       --> rustjail/src/container.rs:807:13
        |
    807 |         let mut child_stdin = std::process::Stdio::null();
        |             ^^^^^^^^^^^^^^^
        |
        = note: `#[warn(unused_assignments)]` on by default
        = help: maybe it is overwritten before being read?

    warning: value assigned to `child_stdout` is never read
       --> rustjail/src/container.rs:808:13
        |
    808 |         let mut child_stdout = std::process::Stdio::null();
        |             ^^^^^^^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `child_stderr` is never read
       --> rustjail/src/container.rs:809:13
        |
    809 |         let mut child_stderr = std::process::Stdio::null();
        |             ^^^^^^^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `stdin` is never read
       --> rustjail/src/container.rs:810:13
        |
    810 |         let mut stdin = -1;
        |             ^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `stdout` is never read
       --> rustjail/src/container.rs:811:13
        |
    811 |         let mut stdout = -1;
        |             ^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

    warning: value assigned to `stderr` is never read
       --> rustjail/src/container.rs:812:13
        |
    812 |         let mut stderr = -1;
        |             ^^^^^^^^^^
        |
        = help: maybe it is overwritten before being read?

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
e0b79eb57f rust-agent: Remove unused functions
Fixes the following warning:

   Compiling logging v0.1.0 (/home/ddd/go/src/github.com/kata-containers-2.0/pkg/logging)
   warning: associated function is never used: `set_level`
      --> /home/ddd/go/src/github.com/kata-containers-2.0/pkg/logging/src/lib.rs:186:8
       |
   186 |     fn set_level(&self, level: slog::Level) {
       |        ^^^^^^^^^
       |
       = note: `#[warn(dead_code)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
8ed61b1bb9 rust-agent: Remove useless braces
This addresses the following warning:

    warning: unnecessary braces around assigned value
        --> src/rpc.rs:1411:26
         |
    1411 |     detail.init_daemon = { unistd::getpid() == Pid::from_raw(1) };
         |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: remove these braces
         |
         = note: `#[warn(unused_braces)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
cc4f02e2b6 rust-agent: Remove unused macros
This addresses the following warnings:

   Compiling rustjail v0.1.0 (/home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail)
   warning: unused `#[macro_use]` import
     --> rustjail/src/lib.rs:15:1
      |
   15 | #[macro_use]
      | ^^^^^^^^^^^^
      |
      = note: `#[warn(unused_imports)]` on by default

   warning: unused macro definition
     --> rustjail/src/lib.rs:38:1
      |
   38 | / macro_rules! sl {
   39 | |     () => {
   40 | |         slog_scope::logger().new(o!("subsystem" => "rustjail"))
   41 | |     };
   42 | | }
      | |_^
      |
      = note: `#[warn(unused_macros)]` on by default

Fixes: #750

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Bo Chen
ace6f1e66e clh: Support VFIO device unplug
This patch adds the support of VFIO device unplug when using
cloud-hypervisor.

Fixes: #860

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-18 00:40:16 +08:00
Bo Chen
47cfeaaf18 clh: Remove unnecessary VmmPing
We can rely on the error handling of the actual HTTP API calls to catch
errors, and don't need to call VmmPing explicitly in advance.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-18 00:40:16 +08:00
Bo Chen
63c475786f versions: cloud-hypervisor: Bump to version 6d30fe05
The cloud-hypervisor commit `6d30fe05` introduced a fix on its API for
VFIO device hotplug (`VmAddDevice`), which is required for supporting
VFIO unplug through openAPI calls in kata.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-18 00:40:16 +08:00
Chelsea Mafrica
059b89cd03 docs: Change kata_tap0 to tap0_kata
Tap device's should be tap0_kata for architecture.md

Fixes #797

Signed-off-by: duanquanfeng <duanquanfeng_yewu@cmss.chinamobile.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-10-18 00:40:16 +08:00
Chelsea Mafrica
4ff3ed5101 docs: update networking description
First, most people don't care about CNM. Move that out of main doc.

Second, tc-filter is the default. Let's add a bit more background on
our usage of tc-filter (and clarify why we use this instead of macvtap).

Fixes #797

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
de8dcb1549 dev-guide: update kata-agent install details
Install paths were wrong. Updated based on new agent...

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Archana Shinde
c488cc48a2 docs: Update docs for enabling agent debug console
The systemd method of adding a debug console is not really
user friendly. Since we have added a much more straightforward
method to enable agent debug console, update developer guide to
reflect this.

Fixes #834

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
e5acb1257f docs: update dev guide for agent build
Include details on setting up rust.

Fixes: #851

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Julio Montes
1bddde729b ci: add github action to test the snap
Add github action to test that the snap package was generated
correctly, this CI don't test the snap, it just build it.

fixes #838

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
9517b0a933 versions: cloud-hypervisor: bump version
Use commit c54452c08a467a3e35d8d72f2a91d424e9718c57 as
version for cloud-hypervisor.
Bring openapi fix cloud-hypervisor/cloud-hypervisor#1760 to
support SGX.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
f5a7175f92 runtime: cloud-hypervisor: tag openapi-generator-cli container
Tag openapi-generator-cli container to v4.3.1 that is the latest
stable, this way we can have reproducible builds and the same
generated code in all the systems

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
9b969bb7da packaging: fix image build script
Relative paths are error prone. Fix error.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:57:28 -07:00
Eric Ernst
fb2f3cfce2 release: Kata Containers 2.0.0-rc1
ae6ccbe8 rust-agent: Update README
3faef791 docs: drop docker installation guide
f3466b87 docs: fix static check errors in docs/install/README.md
89ec614d docs: update architecture.md
1ed73179 qemu: upgrade qemu version to 5.1.0 for arm64.
cb79dddf agent: Fix OCI Windows network shared container name typo
c50aee9d github: Remove issue template and use central one
2a4c3e6a docs: fix broken links
9e2a314e Packaging: release notes script using error kernel path urls
aed20f43 rust-agent: Replaces improper use of match for non-constant patterns
868d0248 devices: fix go test warning in manager_test.go
14164392 action: Allow long lines if non-alphabetic
2ece152c agent: remove unreachable code
033925f9 agent: Change do_exec return type to ! because it will never return
c90fff82 agent: propagate the internal detail errors to users
c0ea9102 packaging: Stop providing OBS packages
ca54edef install: Add contacts to the distribution packages
b5ece037 install: Update information about Community Packages
378e429d install: Update SUSE information
567f8587 install: Update openSUSE information
18f32d13 install: Update RHEL information
8280523c install: Update Fedora information
578db2fc install: Update CentOS information
781d6eca ci: fix clone_tests_repo function
c18c5e2c agent: Set LIBC=gnu for ppc64le arch by default
a378ba53 fc: integrate Firecracker's metrics
9991f4b5 static-build/qemu-virtiofs: Refactor apply virtiofs patches
4a0fd6c2 packaging/qemu: Add common code to apply patches
37acc030 static-build/qemu-virtiofs: Fix to apply QEMU patches
6c275c92 runtime: fix TestNewConsole UT failure
0479a4cb travis: skip static checker for ppc64
b3e52844 runtime: fix golint errors
d36d3486 agent: fix cargo fmt
e1094d7f ci: always checkout 2.0-dev of test repository
c8ba30f9 docs: fix static check errors
eaa5c433 runtime: fix make check
07caa2f2 gitignore: ignore agent service file
f34e2e66 agent: fix UT failures due to chdir
442e5906 agent: Only allow proc mount if it is procfs
f2850668 rustjail: make the mount error info much more clear
73414554 runtime: add enable_debug_console configuration item for agent
0b62f5a9 runtime: add debug console service
c23a401e runtime: Call s.newStore.Destroy if globalSandboxList.addSandbox
80879197 shimv2: add a comment in checkAndMount()
b6066cbc osbuilder: specify default toolchain verion in rust-init.
1290d007 runtime: Update cloud-hypervisor client pkg to version v0.10.0
afeece42 agent/oci: Don't use deprecated Error::description() method
a4075f0f runtime: Fix linter errors in release files
01df3c1d packaging: Build from source if the clh release binary is missing
bacd41bb runtime: add podman configuration to data collection script
d9746f31 ci: use export command to export envs instead of env config item
ca2a1176 ci: use Travis cache to reduce build time
67af593a agent: update cgroups crate
cabc60f3 docs: Update the reference path of kata-deploy in the packaging
a5859197 runtime: make kata-check check for newer release
08d194b8 how-to: add privileged_without_host_devices to containerd guide
89ade8f3 travis: enable RUST_BACKTRACE
4b30001d agent/rustjail: add more unit tests
232c8213 agent/rustjail: remove makedev function
74bcd510 agent/rustjail: add unit tests for ms_move_rootfs and mask_path
a36f93c9 agent/rustjail: implement functions to chroot
fe0f2198 agent/rustjail: add unit test for pivot_rootfs
5770c2a2 agent/rustjail: implement functions to pivot_root
838b1794 agent/rustjail: add unit test for mount_cgroups
1a60c1de agent/rustjail: add unit test for init_rootfs
77ecfed2 agent/rustjail/mount: don't use unwrap
fa7079bc agent/rustjail: add tempfile crate as depedency
c23bac5c rustjail: implement functions to mount and umount files
e99f3e79 docs: Fix the kata-pkgsync tool's docs script path
d05a7cda docs: fix k8s containerd howto links
f6877fa4 docs: fix up developer guide for 2.0
6d326f21 gitignore: ignore agent version.rs
407cb9a3 agent: fix agent panic running as init
38eb1df4 packaging: use local version file for kata 2.0 in Makefile
313dfee3 docs: fix release process doc
0c4e7b21 packaging: fix release notes

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
f32a741c76 actions: add kata deploy test
Pull over kata-deploy-test from the 1.x packaging repository. This is
intended to be used for testing any changes to the kata-deploy
scripting, and does not exercise any new source code changes.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
512e79f61a packaging: cleaning, updating based on new filepaths
Update scripts to take into account some files being moved, and some
general cleanup.

Fixes: #866

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
aa70080423 packaging: remove obs-packaging
No longer required -- let's remove them.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
34015bae12 packaging: pull versions, build-image out from obs dir
These are still required; let's pull them out.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Eric Ernst
93b60a8327 packaging: Revert "packaging: Stop providing OBS packages"
This reverts commit c0ea910273.

Two scripts are still required for release and testing, which should
have never been under obs-packaging dir in the first place.  Let's
revert, move the scripts / update references to it, and then we can
remove the remaining obs-packaging/ tooling.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-06 17:54:13 -07:00
Yang Bo
aa9951f2cd rust-agent: Update README
rust agent does not use grpc as submodule for a while, update README
to reflect the change.

Fixes: #196
Signed-off-by: Yang Bo <bo@hyper.sh>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
9d8c72998b docs: drop docker installation guide
We have removed cli support and that means dockder support is dropped
for now. Also it doesn't make sense to have so many duplications on each
distribution as we can simply refer to the official docker guide on how
to install docker.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
033ed13202 docs: fix static check errors in docs/install/README.md
It was merged in while the static checker is disabled.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
c058d04b94 docs: update architecture.md
To match the current architecture of Kata Containers 2.0.

Fixes: #831
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Edmond AK Dantes
9d2bb0c452 qemu: upgrade qemu version to 5.1.0 for arm64.
Now, the qemu version used in arm is so old. As some new features have merged
in current qemu, so it's time to upgrade it. As obs-packaging has been removed,
I put the qemu patch under qemu/patch/5.1.x.
As vxfs has been Deprecated in qemu-5.1, it will be no longer exist in
configuration-hyperversior.sh when qemu version larger than 5.0.

Fixes: #816
Signed-off-by: Edmond AK Dantes <edmond.dantes.ak47@outlook.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
627d062fb2 agent: Fix OCI Windows network shared container name typo
Correct the typo which would break the Windows-specific OCI network
shared container name feature.

See:

- https://github.com/opencontainers/runtime-spec/blob/master/config-windows.md#network

Fixes: #685.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
96afe62576 github: Remove issue template and use central one
Remove the GitHub issue template from this repository. We already have a
central set of templates [1] that are being used so the template in this
repository is redundant.

[1] - https://github.com/kata-containers/.github/tree/master/.github/ISSUE_TEMPLATE/

Fixes: #728.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
d946016eb7 docs: fix broken links
Some sections and files were removed in a previous commit,
remove all reference to such sections and files to fix the
check-markdown test.

fixes #826

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Ychau Wang
37f1a77a6a Packaging: release notes script using error kernel path urls
2.0 Packaging runtime-release-notes.sh script is using 1.x Packaging
kernel urls. Fix these urls to 2.0 branch Packaging urls.

Fixes: #829

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-06 17:54:13 -07:00
Christophe de Dinechin
450a81cc54 rust-agent: Replaces improper use of match for non-constant patterns
The code used `match` as a switch with variable patterns `ev_fd` and
`cf_fd`, but the way Rust interprets the code is that the first
pattern matches all values. The code does not perform as expected.

This addresses the following warning:

   warning: unreachable pattern
      --> rustjail/src/cgroups/notifier.rs:114:21
       |
   107 |                     ev_fd => {
       |                     ----- matches any value
   ...
   114 |                     cg_fd => {
       |                     ^^^^^ unreachable pattern
       |
       = note: `#[warn(unreachable_patterns)]` on by default

Fixes: #750
Fixes: #793

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-06 17:54:13 -07:00
zhanghj
c09f02e6f6 devices: fix go test warning in manager_test.go
Create "class" and "config" file in temporary device BDF dir,
and remove dir created  by ioutil.TempDir() when test finished.

fixes: #746

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
58c7469110 action: Allow long lines if non-alphabetic
Overly long commit lines are annoying. But sometimes,
we need to be able to force the use of long lines
(for example to reference a URL).

Ironically, I can't refer to the URL that explains this
because of ... the long line check! Hence:

```sh
$ cat <<EOT | tr -d '\n'; echo
See: https://github.com/kata-containers/tests/tree/master/
cmd/checkcommits#handling-long-lines
EOT
```

Maximum body length updated to 150 bytes for parity with:

https://github.com/kata-containers/tests/pull/2848

Fixes: #687.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Tim Zhang
c36ea0968d agent: remove unreachable code
The code in the end of init_child is unreachable and need to be removed.
The code after do_exec is unreachable and need to be removed.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-06 17:54:13 -07:00
Tim Zhang
ba197302e2 agent: Change do_exec return type to ! because it will never return
Indicates unreachable code.

Fixes #819

Signed-off-by: Tim Zhang <tim@hyper.sh>
2020-10-06 17:54:13 -07:00
fupan.lfp
725ad067c1 agent: propagate the internal detail errors to users
It's should propagate the detail errors to users when
the rpc call failed.

Fixes: #824

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
9858c23c59 packaging: Stop providing OBS packages
The community has discussed and took the decision in favour of promoting
kata-deploy as the way of distributing and using kata for distros that
officially don't maintain the project.

Fixes: #623
Fixes: https://github.com/kata-containers/packaging/issues/1120

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
fc8f1ff03c install: Add contacts to the distribution packages
Let's add a new column to the Official packages table, and let the
maintainers of the official distro packages to jump in and add their
names there.

This will help us to ping & redirect to the right people possible issues
that are reported against the official packages.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
f7b4f76082 install: Update information about Community Packages
Kata Containers will stop distributing the community packages in favour
of kata-deploy.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
4fd66fa689 install: Update SUSE information
Following up a conversation with Ralf Haferkamp, we can safely drop the
instructions for using Kata Containers on SLES 12 SP3 in favour of using
the official builds provided for SLE 15 SP1, and SLE 15 SP2.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
e6ff42b8ad install: Update openSUSE information
Let's update the openSUSE Installation Guide to reflect the current
information on how to install kata packages provided by the distro
itself.

The official packages are present on Leap 15.2 and Tumbleweed, and can
be just installed. Leap 15.1 is slightly different, as the .repo file
has to be added before the packages can be installed.

Leap 15.0 has been removed as it already reached its EOL.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
6710d87c6a install: Update RHEL information
Although the community packages are present for RHEL, everything about
them is extremely unsupported on the Red Hat side.

Knowing this, we'd be better to simply not mentioned those and, if users
really want to try kata-containers on RHEL, they can simply follow the
CentOS installation guide.

In the future, if the Fedora packages make their way to RHEL, we can add
the information here. However, if we're recommending something
unsupported we'd be better recommending kata-deploy instead.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
178b79f122 install: Update Fedora information
Let's update the Fedora Installation Guide to reflect the current
information on how to install kata packages provided by the distro
itself.

These are official packages and we, as Fedora members, recommend using
kata-containers on Fedora 32 and onwards, as from this version
everything works out-of-the-box. Also, Fedora 31 will reach its EOL as
soon as Fedora 33 is out, which should happen on October.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Fabiano Fidêncio
bc545c6549 install: Update CentOS information
Let's update the CentOS Installation Guide to reflect the current
information on how to install kata packages provided by the
Virtualiation Special Interest Group.

These are not official CentOS packages, as those are not coming from Red
Hat Enterprise Linux. These are the same packages we have on Fedora and
we have decided to keep them up-to-date and sync'ed on both Fedora and
CentOS, so people can give Kata Containers a try also on CentOS.

The nature of these packages makes me think that those are "as official
as they can be", so that's the reason I've decided to add the
instructions to the "official" table.

Together with the change in the Installation Guide, let's also update
the README and reflect the fact we **strongly recommend** using CentOS
8, with the packages provided by the Virtualization Special Interest
Group, instead of using the CentOS 7 with packages built on OBS.

Fixes: #623

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-10-06 17:54:13 -07:00
Salvador Fuentes
585481990a ci: fix clone_tests_repo function
We should not checkout to 2.0-dev branch in the clone_tests_repo
function when running in Jenkins CI as it discards changes from
tests repo.

Fixes: #818.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2020-10-06 17:54:13 -07:00
Pradipta Kr. Banerjee
0057f86cfa agent: Set LIBC=gnu for ppc64le arch by default
Fixes: #812

Signed-off-by: Pradipta Kr. Banerjee <pradipta.banerjee@gmail.com>
2020-10-06 17:54:13 -07:00
bin liu
fa0401793f fc: integrate Firecracker's metrics
Firecracker expose metrics through fifo file
and using a JSON format. This PR will parse the
Firecracker's metrics and convert to Prometheus metrics.

Fixes: #472

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
Wainer dos Santos Moschetta
60b7265961 static-build/qemu-virtiofs: Refactor apply virtiofs patches
In static-build/qemu-virtiofs/Dockerfile the code which
applies the virtiofs specific patches is spread in several
RUN instructions. Refactor this code so that it runs in a
single RUN and produce a single overlay image.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-10-06 17:54:13 -07:00
Wainer dos Santos Moschetta
57b53dbae8 packaging/qemu: Add common code to apply patches
The qemu and qemu-virtiofs Dockerfile files repeat the code to apply
patches based on QEMU stable branch being built. Instead, this adds
a common script (qemu/apply_patches.sh) and make it called by the
respective Dockerfile files.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-10-06 17:54:13 -07:00
Wainer dos Santos Moschetta
ddf1a545d1 static-build/qemu-virtiofs: Fix to apply QEMU patches
Fix a bug on qemu-virtiofs Dockerfile which end up not applying
the QEMU patches.

Fixes #786

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2020-10-06 17:54:13 -07:00
Peng Tao
cbdf6400ae runtime: fix TestNewConsole UT failure
It needs root.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
ceeecf9c66 travis: skip static checker for ppc64
As we have already run it on x64.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
7c53baea8a runtime: fix golint errors
Need to run gofmt -s on them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
b549d354bf agent: fix cargo fmt
Otherwise travis fails.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
9f3113e1f6 ci: always checkout 2.0-dev of test repository
We use 2.0-dev in the tests repository now. Always make sure
we use the right branch.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
ef94742320 docs: fix static check errors
Somehow we are not running static checks for a long time.
And that ended up with a lot for errors.

* Ensure debug options are valid is dropped
* fix snap links
* drop extra CONTRIBUTING.md
* reference kata-pkgsync
* move CODEOWNERS to proper place
* remove extra CODE_OF_CONDUCT.md.
* fix spell checker error on Developer-Guide.md

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
d71764985d runtime: fix make check
Need to use the correct script path.

Fixes: #802
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
0fc04a269d gitignore: ignore agent service file
As it is auto-generated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
8d7ac5f01c agent: fix UT failures due to chdir
Current working directory is a process level resource. We cannot call
chdir in parallel from multiple threads, which would cause cwd confusion
and result in UT failures.

The agent code itself is correct that chdir is only called from spawned
child init process. Well, there is one exception that it is also called
in do_create_container() but it is safe to assume that containers are
never created in parallel (at least for now).

Fixes: #782
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
fupan.lfp
612acbe319 agent: Only allow proc mount if it is procfs
This only allows some whitelists files bind mounted under proc
and prevent other malicious mount to procfs.

Fixes: #807

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
fupan.lfp
f3a487cd41 rustjail: make the mount error info much more clear
Make the invalid mount destination's error info much
more clear.

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
bin liu
3a559521d1 runtime: add enable_debug_console configuration item for agent
Set enable_debug_console=true in Kata's congiguration file,
runtime will pass `agent.debug_console`
and `agent.debug_console_vport=1026` to agent.

Fixes: #245

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
bin liu
567daf5a42 runtime: add debug console service
Add `kata-runtime exec` to enter guest OS
through shell started by agent

Fixes: #245

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
Shukui Yang
c7d913f436 runtime: Call s.newStore.Destroy if globalSandboxList.addSandbox
Fixes: #696

Signed-off-by: Shukui Yang <keloyangsk@gmail.com>
2020-10-06 17:54:13 -07:00
Qian Cai
7bd410c725 shimv2: add a comment in checkAndMount()
In checkAndMount(), it is not clear why we check IsBlockDevice() and if
DisableBlockDeviceUse == false and then only return "false, nil" instead
of "false, err". Adding a comment to make it a bit more readable.

Fixes: #732
Signed-off-by: Qian Cai <cai@redhat.com>
2020-10-06 17:54:13 -07:00
zhanghj
7fbc789855 osbuilder: specify default toolchain verion in rust-init.
Specify default toolchain version in rust-init.

Fixes: #799

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2020-10-06 17:54:13 -07:00
Bo Chen
7fc41a771a runtime: Update cloud-hypervisor client pkg to version v0.10.0
The latest release of cloud-hypervisor v0.10.0 contains the following
updates: 1) `virtio-block` Support for Multiple Descriptors; 2) Memory
Zones; 3) `Seccomp` Sandbox Improvements; 4) Preliminary KVM HyperV
Emulation Control; 5) various bug fixes and refactoring.

Note that this patch updates the client code of clh's HTTP API in kata,
while the 'versions.yaml' file was updated in an earlier PR.

Fixes: #789

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-06 17:54:13 -07:00
David Gibson
a31d82fec2 agent/oci: Don't use deprecated Error::description() method
We shouldn't use it, and we don't need to implement it.

fixes #791

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
9ef4c80340 runtime: Fix linter errors in release files
Fix the linter errors caught in the `runtime` repos `master` branch [1],
but not in the `2.0-dev` branch [2]. See [3] for further details.

[1] - https://github.com/kata-containers/runtime/pull/2976
[2] - https://github.com/kata-containers/kata-containers/pull/735
[3] - https://github.com/kata-containers/tests/issues/2870

Fixes: #783.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Bo Chen
6a4e413758 packaging: Build from source if the clh release binary is missing
This patch add fall-back code path that builds cloud-hypervisor static
binary from source, when the downloading of cloud-hypervisor binary is
failing. This is useful when we experience network issues, and also
useful for upgrading clh to non-released version.

Together with the changes in the tests repo
(https://github.com/kata-containers/tests/pull/2862), the Jenkins config
file is also updated with new Execute shell script for the clh CI in the
kata-containers repo. Those two changes fix the regression on clh CI
here. Please check details in the issue below.

Fixes: #781
Fixes: https://github.com/kata-containers/tests/issues/2858

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-06 17:54:13 -07:00
Francesco Giudici
678d4d189d runtime: add podman configuration to data collection script
Be more verbose about podman configuration in the output of the data
collection script: get the system configuration as seen by podman and
dump the configuration files when present.

Fixes: #243
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2020-10-06 17:54:13 -07:00
bin liu
718f718764 ci: use export command to export envs instead of env config item
Config item env is used as a Matrix Expansion key, so these envs
will export to build jobs individually.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
bin liu
d860ded3f0 ci: use Travis cache to reduce build time
This PR includes these changes:
- use Rust installed by Travis
- install x86_64-unknown-linux-musl
- install rustfmt
- use Travis cache
- delete ci/install_vc.sh

Fixes: #748

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-06 17:54:13 -07:00
fupan.lfp
a141da8a20 agent: update cgroups crate
Update cgroups crate to fix the building issue
on Aarch64.

Fixes: #770

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2020-10-06 17:54:13 -07:00
Ychau Wang
aaaaee7a4b docs: Update the reference path of kata-deploy in the packaging
Use the relative path of kata-deploy to replace the 1.x packaging url in
the kata-deploy/README.md file. Fixed the path issue, producted by
creating new branch.

Fixes: #777

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-06 17:54:13 -07:00
James O. D. Hunt
21efaf1fca runtime: make kata-check check for newer release
Update `kata-check` to see if there is a newer version available for
download. Useful for users installing static packages (without a package
manager).

Fixes: #734.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-06 17:54:13 -07:00
Peng Tao
2056623e13 how-to: add privileged_without_host_devices to containerd guide
It should be set by default for Kata containers working with containerd.

Fixes: #775
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Julio Montes
34126ee704 travis: enable RUST_BACKTRACE
RUST_BACKTRACE=1 will help us a lot to debug unit tests when
a test is failing

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
980a338454 agent/rustjail: add more unit tests
Add unit tests for finish_root, read_only_path and mknod_dev
increasing code coverage of mount.rs

fixes #284

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
e14f766895 agent/rustjail: remove makedev function
remove `makedev` function, use `nix`'s implementation instead

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
2e0731f479 agent/rustjail: add unit tests for ms_move_rootfs and mask_path
Increase code coverage of mount.rs

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
addf62087c agent/rustjail: implement functions to chroot
Use conditional compilation (#[cfg]) to change chroot behaviour
at compilation time. For example, such function will just return
`Ok(())` when the unit tests are being compiled, otherwise real
chroot operation is performed.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
c24b68dc4f agent/rustjail: add unit test for pivot_rootfs
Add unit test for pivot_rootfs increasing the code coverage of
mount.rs

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
24677d7484 agent/rustjail: implement functions to pivot_root
Use conditional compilation (#[cfg]) to change pivot_root behaviour
at compilation time. For example, such function will just return
`Ok(())` when the unit tests are being compiled, otherwise real
pivot_root operation is performed.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
9e74c28158 agent/rustjail: add unit test for mount_cgroups
Add a unit test for `mount_cgroups` increasing the code coverage
of mount.rs from 44% to 52%

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
b7aae33cc1 agent/rustjail: add unit test for init_rootfs
Add a unit test for `init_rootfs` increasing the code coverage
of mount.rs from 0% to 44%.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
6d9d58278e agent/rustjail/mount: don't use unwrap
Don't use unwrap in `init_rootfs` instead return an Error, this way
we can write unit tests that don't panic.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
1bc6fbda8c agent/rustjail: add tempfile crate as depedency
Add tempfile crate as depedency, it will be used in the following
commits to create temporary directories for unit testing.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Julio Montes
d39f5a85e6 rustjail: implement functions to mount and umount files
Use conditional compilation (#[cfg]) to change mount and umount
behaviours at compilation time. For example, such functions will just
return `Ok(())` when the unit tests are being compiled, otherwise real
mount and umount operations are performed.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-06 17:54:13 -07:00
Ychau Wang
d90a0eefbe docs: Fix the kata-pkgsync tool's docs script path
Fix the kata-pkgsync tool's docs, change the download path of the
packaging tool in 2.0 release.

Fixes: #773

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-06 17:54:13 -07:00
Peng Tao
2618c014a0 docs: fix k8s containerd howto links
It should points to the internal versions.yaml file.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
5c4878f37e docs: fix up developer guide for 2.0
1. Until we restore docker/moby support, we should use crictl as
developer example.
2. Most of the hyperlinks should point to kata-containers repository.
3. There is no more standalone mode.

Fixes: #767
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
bd6b169e98 gitignore: ignore agent version.rs
It is auto-generated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
5770336572 agent: fix agent panic running as init
We should mount procfs before trying to parse kernel command lines.

Fixes: #771
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
zhanghj
45daec7b37 packaging: use local version file for kata 2.0 in Makefile
Use local version file instead of downloading from upstream repo.

Fixes: #756

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2020-10-06 17:54:13 -07:00
Peng Tao
ed5a7dc022 docs: fix release process doc
We no longer build OBS packages. And we use
kata-containers/tools/packaging/release to do release.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
Peng Tao
6fc7c77721 packaging: fix release notes
Should mention the 2.0 branch docs.

Fixes: #763
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-06 17:54:13 -07:00
4885 changed files with 313066 additions and 616784 deletions

View File

@@ -1,40 +0,0 @@
#!/bin/bash
#
# Copyright (c) 2022 Red Hat
#
# SPDX-License-Identifier: Apache-2.0
#
script_dir=$(dirname "$(readlink -f "$0")")
parent_dir=$(realpath "${script_dir}/../..")
cidir="${parent_dir}/ci"
source "${cidir}/lib.sh"
cargo_deny_file="${script_dir}/action.yaml"
cat cargo-deny-skeleton.yaml.in > "${cargo_deny_file}"
changed_files_status=$(run_get_pr_changed_file_details)
changed_files_status=$(echo "$changed_files_status" | grep "Cargo\.toml$" || true)
changed_files=$(echo "$changed_files_status" | awk '{print $NF}' || true)
if [ -z "$changed_files" ]; then
cat >> "${cargo_deny_file}" << EOF
- run: echo "No Cargo.toml files to check"
shell: bash
EOF
fi
for path in $changed_files
do
cat >> "${cargo_deny_file}" << EOF
- name: ${path}
continue-on-error: true
shell: bash
run: |
pushd $(dirname ${path})
cargo deny check
popd
EOF
done

View File

@@ -1,30 +0,0 @@
#
# Copyright (c) 2022 Red Hat
#
# SPDX-License-Identifier: Apache-2.0
#
name: 'Cargo Crates Check'
description: 'Checks every Cargo.toml file using cargo-deny'
env:
CARGO_TERM_COLOR: always
runs:
using: "composite"
steps:
- name: Install Rust
uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
override: true
- name: Cache
uses: Swatinem/rust-cache@v2
- name: Install Cargo deny
shell: bash
run: |
which cargo
cargo install --locked cargo-deny || true

View File

@@ -15,7 +15,6 @@ jobs:
name: WIP Check
steps:
- name: WIP Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: tim-actions/wip-check@1c2a1ca6c110026b3e2297bb2ef39e1747b5a755
with:
labels: '["do-not-merge", "wip", "rfc"]'

View File

@@ -1,100 +0,0 @@
name: Add backport label
on:
pull_request:
types:
- opened
- synchronize
- reopened
- edited
- labeled
- unlabeled
jobs:
check-issues:
if: ${{ github.event.label.name != 'auto-backport' }}
runs-on: ubuntu-latest
steps:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
- name: Install hub extension script
run: |
pushd $(mktemp -d) &>/dev/null
git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts
sudo install hub-util.sh /usr/local/bin
popd &>/dev/null
- name: Determine whether to add label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CONTAINS_AUTO_BACKPORT: ${{ contains(github.event.pull_request.labels.*.name, 'auto-backport') }}
id: add_label
run: |
pr=${{ github.event.pull_request.number }}
linked_issue_urls=$(hub-util.sh \
list-issues-for-pr "$pr" |\
grep -v "^\#" |\
cut -d';' -f3 || true)
[ -z "$linked_issue_urls" ] && {
echo "::error::No linked issues for PR $pr"
exit 1
}
has_bug=false
for issue_url in $(echo "$linked_issue_urls")
do
issue=$(echo "$issue_url"| awk -F\/ '{print $NF}' || true)
[ -z "$issue" ] && {
echo "::error::Cannot determine issue number from $issue_url for PR $pr"
exit 1
}
labels=$(hub-util.sh list-labels-for-issue "$issue")
label_names=$(echo $labels | jq -r '.[].name' || true)
if [[ "$label_names" =~ "bug" ]]; then
has_bug=true
break
fi
done
has_backport_needed_label=${{ contains(github.event.pull_request.labels.*.name, 'needs-backport') }}
has_no_backport_needed_label=${{ contains(github.event.pull_request.labels.*.name, 'no-backport-needed') }}
echo "::set-output name=add_backport_label::false"
if [ $has_backport_needed_label = true ] || [ $has_bug = true ]; then
if [[ $has_no_backport_needed_label = false ]]; then
echo "::set-output name=add_backport_label::true"
fi
fi
# Do not spam comment, only if auto-backport label is going to be newly added.
echo "::set-output name=auto_backport_added::$CONTAINS_AUTO_BACKPORT"
- name: Add comment
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'true' && steps.add_label.outputs.auto_backport_added == 'false' }}
uses: actions/github-script@v6
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: 'This issue has been marked for auto-backporting. Add label(s) backport-to-BRANCHNAME to backport to them'
})
# Allow label to be removed by adding no-backport-needed label
- name: Remove auto-backport label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'false' }}
uses: andymckay/labeler@e6c4322d0397f3240f0e7e30a33b5c5df2d39e90
with:
remove-labels: "auto-backport"
repo-token: ${{ secrets.GITHUB_TOKEN }}
- name: Add auto-backport label
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && steps.add_label.outputs.add_backport_label == 'true' }}
uses: andymckay/labeler@e6c4322d0397f3240f0e7e30a33b5c5df2d39e90
with:
add-labels: "auto-backport"
repo-token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -1,40 +0,0 @@
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
name: Add PR sizing label
on:
pull_request_target:
types:
- opened
- reopened
- synchronize
jobs:
add-pr-size-label:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v1
- name: Install PR sizing label script
run: |
# Clone into a temporary directory to avoid overwriting
# any existing github directory.
pushd $(mktemp -d) &>/dev/null
git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts
sudo install pr-add-size-label.sh /usr/local/bin
popd &>/dev/null
- name: Add PR sizing label
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_PR_SIZE_TOKEN }}
run: |
pr=${{ github.event.number }}
# Removing man-db, workflow kept failing, fixes: #4480
sudo apt -y remove --purge man-db
sudo apt -y install diffstat patchutils
pr-add-size-label.sh -p "$pr"

View File

@@ -1,29 +0,0 @@
on:
pull_request_target:
types: ["labeled", "closed"]
jobs:
backport:
name: Backport PR
runs-on: ubuntu-latest
if: |
github.event.pull_request.merged == true
&& contains(github.event.pull_request.labels.*.name, 'auto-backport')
&& (
(github.event.action == 'labeled' && github.event.label.name == 'auto-backport')
|| (github.event.action == 'closed')
)
steps:
- name: Backport Action
uses: sqren/backport-github-action@v8.9.2
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
auto_backport_label_prefix: backport-to-
- name: Info log
if: ${{ success() }}
run: cat /home/runner/.backport/backport.info.log
- name: Debug log
if: ${{ failure() }}
run: cat /home/runner/.backport/backport.debug.log

View File

@@ -1,19 +0,0 @@
name: Cargo Crates Check Runner
on: [pull_request]
jobs:
cargo-deny-runner:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
- name: Generate Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: bash cargo-deny-generator.sh
working-directory: ./.github/cargo-deny-composite-action/
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Run Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: ./.github/cargo-deny-composite-action

View File

@@ -10,7 +10,7 @@ env:
error_msg: |+
See the document below for help on formatting commits for the project.
https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md#patch-format
https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md#patch-forma
jobs:
commit-message-check:
@@ -18,32 +18,24 @@ jobs:
name: Commit Message Check
steps:
- name: Get PR Commits
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
id: 'get-pr-commits'
uses: tim-actions/get-pr-commits@v1.2.0
uses: tim-actions/get-pr-commits@v1.0.0
with:
token: ${{ secrets.GITHUB_TOKEN }}
# Filter out revert commits
# The format of a revert commit is as follows:
#
# Revert "<original-subject-line>"
#
filter_out_pattern: '^Revert "'
- name: DCO Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: tim-actions/dco@2fd0504dc0d27b33f542867c300c60840c6dcb20
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
- name: Commit Body Missing Check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-body-check@v1.0.2
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
- name: Check Subject Line Length
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
@@ -52,7 +44,7 @@ jobs:
post_error: ${{ env.error_msg }}
- name: Check Body Line Length
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
@@ -63,8 +55,7 @@ jobs:
# the entire commit message.
#
# - Body lines *can* be longer than the maximum if they start
# with a non-alphabetic character or if there is no whitespace in
# the line.
# with a non-alphabetic character.
#
# This allows stack traces, log files snippets, emails, long URLs,
# etc to be specified. Some of these naturally "work" as they start
@@ -75,12 +66,12 @@ jobs:
#
# - A SoB comment can be any length (as it is unreasonable to penalise
# people with long names/email addresses :)
pattern: '^.+(\n([a-zA-Z].{0,150}|[^a-zA-Z\n].*|[^\s\n]*|Signed-off-by:.*|))+$'
error: 'Body line too long (max 150)'
pattern: '^.+(\n([a-zA-Z].{0,149}|[^a-zA-Z\n].*|Signed-off-by:.*|))+$'
error: 'Body line too long (max 72)'
post_error: ${{ env.error_msg }}
- name: Check Fixes
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}
@@ -91,7 +82,7 @@ jobs:
one_pass_all_pass: 'true'
- name: Check Subsystem
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}
if: ${{ success() || failure() }}
uses: tim-actions/commit-message-checker-with-regex@v0.3.1
with:
commits: ${{ steps.get-pr-commits.outputs.commits }}

View File

@@ -1,21 +0,0 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
name: Darwin tests
jobs:
test:
runs-on: macos-latest
steps:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: 1.19.2
- name: Checkout code
uses: actions/checkout@v2
- name: Build utils
run: ./ci/darwin-test.sh

View File

@@ -1,37 +0,0 @@
on:
schedule:
- cron: '0 23 * * 0'
name: Docs URL Alive Check
jobs:
test:
runs-on: ubuntu-20.04
# don't run this action on forks
if: github.repository_owner == 'kata-containers'
env:
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
uses: actions/setup-go@v2
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Set env
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
uses: actions/checkout@v2
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
# docs url alive check
- name: Docs URL Alive Check
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make docs-url-alive-check

18
.github/workflows/gather-artifacts.sh vendored Executable file
View File

@@ -0,0 +1,18 @@
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
set -o pipefail
pushd kata-artifacts >>/dev/null
for c in ./*.tar.gz
do
echo "untarring tarball $c"
tar -xvf $c
done
tar cvfJ ../kata-static.tar.xz ./opt
popd >>/dev/null

View File

@@ -0,0 +1,36 @@
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
set -o pipefail
main() {
artifact_stage=${1:-}
artifact=$(echo ${artifact_stage} | sed -n -e 's/^install_//p' | sed -r 's/_/-/g')
if [ -z "${artifact}" ]; then
"Scripts needs artifact name to build"
exit 1
fi
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
export GOPATH=$HOME/go
go get github.com/kata-containers/packaging || true
pushd $GOPATH/src/github.com/kata-containers/packaging/release >>/dev/null
git checkout $tag
pushd ../obs-packaging
./gen_versions_txt.sh $tag
popd
source ./kata-deploy-binaries.sh
${artifact_stage} $tag
popd
mv $HOME/go/src/github.com/kata-containers/packaging/release/kata-static-${artifact}.tar.gz .
}
main $@

View File

@@ -0,0 +1,34 @@
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
# Copyright (c) 2020 Ant Group
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
set -o pipefail
main() {
artifact_stage=${1:-}
artifact=$(echo ${artifact_stage} | sed -n -e 's/^install_//p' | sed -r 's/_/-/g')
if [ -z "${artifact}" ]; then
"Scripts needs artifact name to build"
exit 1
fi
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
pushd $GITHUB_WORKSPACE/tools/packaging
git checkout $tag
./scripts/gen_versions_txt.sh $tag
popd
pushd $GITHUB_WORKSPACE/tools/packaging/release
source ./kata-deploy-binaries.sh
${artifact_stage} $tag
popd
mv $GITHUB_WORKSPACE/tools/packaging/release/kata-static-${artifact}.tar.gz .
}
main $@

View File

@@ -1,85 +0,0 @@
name: kata deploy build
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths:
- tools/**
- versions.yaml
jobs:
build-asset:
runs-on: ubuntu-latest
strategy:
matrix:
asset:
- kernel
- shim-v2
- qemu
- cloud-hypervisor
- firecracker
- rootfs-image
- rootfs-initrd
- virtiofsd
- nydus
steps:
- uses: actions/checkout@v2
- name: Install docker
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r --preserve=all "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
- name: store-artifact ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v2
- name: get-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: build
- name: merge-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make merge-builds
- name: store-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
make-kata-tarball:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: make kata-tarball
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make kata-tarball
sudo make install-tarball

View File

@@ -1,117 +1,24 @@
on:
workflow_dispatch: # this is used to trigger the workflow on non-main branches
issue_comment:
types: [created, edited]
on: issue_comment
name: test-kata-deploy
jobs:
check-comment-and-membership:
check_comments:
runs-on: ubuntu-latest
if: |
github.event.issue.pull_request
&& github.event_name == 'issue_comment'
&& github.event.action == 'created'
&& startsWith(github.event.comment.body, '/test_kata_deploy')
steps:
- name: Check membership
uses: kata-containers/is-organization-member@1.0.1
id: is_organization_member
- name: Check for Command
id: command
uses: kata-containers/slash-command-action@v1
with:
organization: kata-containers
username: ${{ github.event.comment.user.login }}
token: ${{ secrets.GITHUB_TOKEN }}
- name: Fail if not member
repo-token: ${{ secrets.GITHUB_TOKEN }}
command: "test-kata-deploy"
reaction: "true"
reaction-type: "eyes"
allow-edits: "false"
permission-level: admin
- name: verify command arg is kata-deploy
run: |
result=${{ steps.is_organization_member.outputs.result }}
if [ $result == false ]; then
user=${{ github.event.comment.user.login }}
echo Either ${user} is not part of the kata-containers organization
echo or ${user} has its Organization Visibility set to Private at
echo https://github.com/orgs/kata-containers/people?query=${user}
echo
echo Ensure you change your Organization Visibility to Public and
echo trigger the test again.
exit 1
fi
build-asset:
runs-on: ubuntu-latest
needs: check-comment-and-membership
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- nydus
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: Install docker
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: get-artifacts
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
kata-deploy:
needs: create-kata-tarball
echo "The command was '${{ steps.command.outputs.command-name }}' with arguments '${{ steps.command.outputs.command-arguments }}'"
create-and-test-container:
needs: check_comments
runs-on: ubuntu-latest
steps:
- name: get-PR-ref
@@ -120,30 +27,26 @@ jobs:
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
echo "reference for PR: " ${ref}
echo "##[set-output name=pr-ref;]${ref}"
- uses: actions/checkout@v2
- uses: actions/checkout@v2-beta
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: get-kata-tarball
uses: actions/download-artifact@v2
with:
name: kata-static-tarball
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
- name: build-container-image
id: build-container-image
run: |
PR_SHA=$(git log --format=format:%H -n1)
mv kata-static.tar.xz $GITHUB_WORKSPACE/tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t quay.io/kata-containers/kata-deploy-ci:$PR_SHA $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$PR_SHA
mkdir -p packaging/kata-deploy
ln -s $GITHUB_WORKSPACE/tools/packaging/kata-deploy/action packaging/kata-deploy/action
echo "::set-output name=PKG_SHA::${PR_SHA}"
PR_SHA=$(git log --format=format:%H -n1)
VERSION=$(curl https://raw.githubusercontent.com/kata-containers/kata-containers/2.0-dev/VERSION)
ARTIFACT_URL="https://github.com/kata-containers/kata-containers/releases/download/${VERSION}/kata-static-${VERSION}-x86_64.tar.xz"
wget "${ARTIFACT_URL}" -O ./kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:${PR_SHA} ./kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$PR_SHA
echo "##[set-output name=pr-sha;]${PR_SHA}"
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
uses: ./kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
packaging-sha: ${{ steps.build-container-image.outputs.pr-sha }}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
PKG_SHA: ${{ steps.build-container-image.outputs.pr-sha }}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

348
.github/workflows/main.yaml vendored Normal file
View File

@@ -0,0 +1,348 @@
name: Publish release tarball
on:
push:
tags:
- '1.*'
jobs:
get-artifact-list:
runs-on: ubuntu-latest
steps:
- name: get the list
run: |
pushd $GITHUB_WORKSPACE
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git checkout $tag
popd
$GITHUB_WORKSPACE/tools/packaging/artifact-list.sh > artifact-list.txt
- name: save-artifact-list
uses: actions/upload-artifact@master
with:
name: artifact-list
path: artifact-list.txt
build-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kernel"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-kernel
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-kernel.tar.gz
build-experimental-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_experimental_kernel"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-experimental-kernel
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-experimental-kernel.tar.gz
build-qemu:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-qemu
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-qemu.tar.gz
build-nemu:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_nemu"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-nemu
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-nemu.tar.gz
# Job for building the QEMU binaries with virtiofs support
build-qemu-virtiofsd:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu_virtiofsd"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-qemu-virtiofsd
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-qemu-virtiofsd.tar.gz
# Job for building the image
build-image:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_image"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-image
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-image.tar.gz
# Job for building firecracker hypervisor
build-firecracker:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_firecracker"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-firecracker
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-firecracker.tar.gz
# Job for building cloud-hypervisor
build-clh:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_clh"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-clh
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-clh.tar.gz
# Job for building kata components
build-kata-components:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kata_components"
steps:
- uses: actions/checkout@v1
- name: get-artifact-list
uses: actions/download-artifact@master
with:
name: artifact-list
- name: build-kata-components
run: |
if grep -q $buildstr ./artifact-list/artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@master
with:
name: kata-artifacts
path: kata-static-kata-components.tar.gz
gather-artifacts:
runs-on: ubuntu-16.04
needs: [build-experimental-kernel, build-kernel, build-qemu, build-qemu-virtiofsd, build-image, build-firecracker, build-kata-components, build-nemu, build-clh]
steps:
- uses: actions/checkout@v1
- name: get-artifacts
uses: actions/download-artifact@master
with:
name: kata-artifacts
- name: colate-artifacts
run: |
$GITHUB_WORKSPACE/.github/workflows/gather-artifacts.sh
- name: store-artifacts
uses: actions/upload-artifact@master
with:
name: release-candidate
path: kata-static.tar.xz
kata-deploy:
needs: gather-artifacts
runs-on: ubuntu-latest
steps:
- name: get-artifacts
uses: actions/download-artifact@master
with:
name: release-candidate
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git clone https://github.com/kata-containers/packaging
pushd packaging
git checkout $tag
pkg_sha=$(git rev-parse HEAD)
popd
mv release-candidate/kata-static.tar.xz ./packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:$pkg_sha ./packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$pkg_sha
echo "##[set-output name=PKG_SHA;]${pkg_sha}"
echo ::set-env name=TAG::$tag
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: push-tarball
run: |
# tag the container image we created and push to DockerHub
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
docker tag katadocker/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} katadocker/kata-deploy:${tag}
docker push katadocker/kata-deploy:${tag}
upload-static-tarball:
needs: kata-deploy
runs-on: ubuntu-latest
steps:
- name: download-artifacts
uses: actions/download-artifact@master
with:
name: release-candidate
- name: install hub
run: |
HUB_VER=$(curl -s "https://api.github.com/repos/github/hub/releases/latest" | jq -r .tag_name | sed 's/^v//')
wget -q -O- https://github.com/github/hub/releases/download/v$HUB_VER/hub-linux-amd64-$HUB_VER.tgz | \
tar xz --strip-components=2 --wildcards '*/bin/hub' && sudo mv hub /usr/local/bin/hub
- name: push static tarball to github
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-static-$tag-x86_64.tar.xz"
repo="https://github.com/kata-containers/runtime.git"
mv release-candidate/kata-static.tar.xz "release-candidate/${tarball}"
git clone "${repo}"
cd runtime
echo "uploading asset '${tarball}' to '${repo}' tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "../release-candidate/${tarball}" "${tag}"

View File

@@ -16,7 +16,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Install hub
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
HUB_ARCH="amd64"
HUB_VER=$(curl -sL "https://api.github.com/repos/github/hub/releases/latest" |\
@@ -27,7 +26,6 @@ jobs:
sudo install hub /usr/local/bin
- name: Install hub extension script
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
# Clone into a temporary directory to avoid overwriting
# any existing github directory.
@@ -37,11 +35,9 @@ jobs:
popd &>/dev/null
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
- name: Move issue to "In progress"
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
run: |

View File

@@ -1,52 +1,243 @@
name: Publish Kata release artifacts
name: Publish Kata 2.x release artifacts
on:
push:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
- '2.*'
jobs:
build-asset:
get-artifact-list:
runs-on: ubuntu-latest
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- nydus
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- uses: actions/checkout@v2
- name: Install docker
- name: get the list
run: |
curl -fsSL https://test.docker.com -o test-docker.sh
sh test-docker.sh
pushd $GITHUB_WORKSPACE
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
git checkout $tag
popd
$GITHUB_WORKSPACE/tools/packaging/artifact-list.sh > artifact-list.txt
- name: save-artifact-list
uses: actions/upload-artifact@v2
with:
name: artifact-list
path: artifact-list.txt
- name: Build ${{ matrix.asset }}
build-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kernel"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-kernel
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-copy-yq-installer.sh
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
- name: store-artifact ${{ matrix.asset }}
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
path: kata-static-kernel.tar.gz
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
build-experimental-kernel:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_experimental_kernel"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- run: |
sudo apt-get update && sudo apt install -y flex bison libelf-dev bc iptables
- name: build-experimental-kernel
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-experimental-kernel.tar.gz
build-qemu:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-qemu
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-qemu.tar.gz
build-qemu-virtiofsd:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_qemu_virtiofsd"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-qemu-virtiofsd
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-qemu-virtiofsd.tar.gz
build-image:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_image"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-image
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-image.tar.gz
build-firecracker:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_firecracker"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-firecracker
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-firecracker.tar.gz
build-clh:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_clh"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-clh
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-clh.tar.gz
build-kata-components:
runs-on: ubuntu-16.04
needs: get-artifact-list
env:
buildstr: "install_kata_components"
steps:
- uses: actions/checkout@v2
- name: get-artifact-list
uses: actions/download-artifact@v2
with:
name: artifact-list
- name: build-kata-components
run: |
if grep -q $buildstr artifact-list.txt; then
$GITHUB_WORKSPACE/.github/workflows/generate-local-artifact-tarball.sh $buildstr
echo ::set-env name=artifact-built::true
else
echo ::set-env name=artifact-built::false
fi
- name: store-artifacts
if: env.artifact-built == 'true'
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-static-kata-components.tar.gz
gather-artifacts:
runs-on: ubuntu-16.04
needs: [build-experimental-kernel, build-kernel, build-qemu, build-qemu-virtiofsd, build-image, build-firecracker, build-kata-components, build-clh]
steps:
- uses: actions/checkout@v2
- name: get-artifacts
@@ -54,24 +245,24 @@ jobs:
with:
name: kata-artifacts
path: kata-artifacts
- name: merge-artifacts
- name: colate-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
$GITHUB_WORKSPACE/.github/workflows/gather-artifacts.sh
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
name: release-candidate
path: kata-static.tar.xz
kata-deploy:
needs: create-kata-tarball
needs: gather-artifacts
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: get-kata-tarball
- name: get-artifacts
uses: actions/download-artifact@v2
with:
name: kata-static-tarball
name: release-candidate
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
run: |
@@ -81,14 +272,14 @@ jobs:
pkg_sha=$(git rev-parse HEAD)
popd
mv kata-static.tar.xz $GITHUB_WORKSPACE/tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:$pkg_sha -t quay.io/kata-containers/kata-deploy-ci:$pkg_sha $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:$pkg_sha $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$pkg_sha
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$pkg_sha
echo "##[set-output name=PKG_SHA;]${pkg_sha}"
echo ::set-env name=TAG::$tag
mkdir -p packaging/kata-deploy
ln -s $GITHUB_WORKSPACE/tools/packaging/kata-deploy/action packaging/kata-deploy/action
echo "::set-output name=PKG_SHA::${pkg_sha}"
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
with:
@@ -103,14 +294,8 @@ jobs:
run: |
# tag the container image we created and push to DockerHub
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tags=($tag)
tags+=($([[ "$tag" =~ "alpha"|"rc" ]] && echo "latest" || echo "stable"))
for tag in ${tags[@]}; do \
docker tag katadocker/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} katadocker/kata-deploy:${tag} && \
docker tag quay.io/kata-containers/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} quay.io/kata-containers/kata-deploy:${tag} && \
docker push katadocker/kata-deploy:${tag} && \
docker push quay.io/kata-containers/kata-deploy:${tag}; \
done
docker tag katadocker/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} katadocker/kata-deploy:${tag}
docker push katadocker/kata-deploy:${tag}
upload-static-tarball:
needs: kata-deploy
@@ -120,7 +305,7 @@ jobs:
- name: download-artifacts
uses: actions/download-artifact@v2
with:
name: kata-static-tarball
name: release-candidate
- name: install hub
run: |
HUB_VER=$(curl -s "https://api.github.com/repos/github/hub/releases/latest" | jq -r .tag_name | sed 's/^v//')
@@ -134,46 +319,3 @@ jobs:
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
popd
upload-cargo-vendored-tarball:
needs: upload-static-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: generate-and-upload-tarball
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-containers-$tag-vendor.tar.gz"
pushd $GITHUB_WORKSPACE
bash -c "tools/packaging/release/generate_vendor.sh ${tarball}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
popd
upload-libseccomp-tarball:
needs: upload-cargo-vendored-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: download-and-upload-tarball
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
GOPATH: ${HOME}/go
run: |
pushd $GITHUB_WORKSPACE
./ci/install_yq.sh
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
versions_yaml="versions.yaml"
version=$(${GOPATH}/bin/yq read ${versions_yaml} "externals.libseccomp.version")
repo_url=$(${GOPATH}/bin/yq read ${versions_yaml} "externals.libseccomp.url")
download_url="${repo_url}/releases/download/v${version}"
tarball="libseccomp-${version}.tar.gz"
asc="${tarball}.asc"
curl -sSLO "${download_url}/${tarball}"
curl -sSLO "${download_url}/${asc}"
# "-m" option should be empty to re-use the existing release title
# without opening a text editor.
# For the details, check https://hub.github.com/hub-release.1.html.
hub release edit -m "" -a "${tarball}" "${tag}"
hub release edit -m "" -a "${asc}" "${tag}"
popd

View File

@@ -12,15 +12,12 @@ on:
- reopened
- labeled
- unlabeled
branches:
- main
jobs:
check-pr-porting-labels:
runs-on: ubuntu-latest
steps:
- name: Install hub
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
HUB_ARCH="amd64"
HUB_VER=$(curl -sL "https://api.github.com/repos/github/hub/releases/latest" |\
@@ -31,8 +28,9 @@ jobs:
sudo install hub /usr/local/bin
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
with:
token: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
- name: Install porting checker script
run: |
@@ -44,7 +42,6 @@ jobs:
popd &>/dev/null
- name: Stop PR being merged unless it has a correct set of porting labels
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
env:
GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}
run: |

View File

@@ -1,42 +0,0 @@
name: Release Kata in snapcraft store
on:
push:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
jobs:
release-snap:
runs-on: ubuntu-20.04
steps:
- name: Check out Git repository
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Install Snapcraft
uses: samuelmeuli/action-snapcraft@v1
with:
snapcraft_token: ${{ secrets.snapcraft_token }}
- name: Build snap
run: |
# Removing man-db, workflow kept failing, fixes: #4480
sudo apt -y remove --purge man-db
sudo apt-get install -y git git-extras
kata_url="https://github.com/kata-containers/kata-containers"
latest_version=$(git ls-remote --tags ${kata_url} | egrep -o "refs.*" | egrep -v "\-alpha|\-rc|{}" | egrep -o "[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" | sort -V -r | head -1)
current_version="$(echo ${GITHUB_REF} | cut -d/ -f3)"
# Check semantic versioning format (x.y.z) and if the current tag is the latest tag
if echo "${current_version}" | grep -q "^[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" && echo -e "$latest_version\n$current_version" | sort -C -V; then
# Current version is the latest version, build it
snapcraft snap --debug --destructive-mode
fi
- name: Upload snap
run: |
snap_version="$(echo ${GITHUB_REF} | cut -d/ -f3)"
snap_file="kata-containers_${snap_version}_amd64.snap"
# Upload the snap if it exists
if [ -f ${snap_file} ]; then
snapcraft upload --release=stable ${snap_file}
fi

View File

@@ -1,27 +1,25 @@
name: snap CI
on:
pull_request:
types:
- opened
- synchronize
- reopened
- edited
paths:
- "**/Makefile"
- "**/*.go"
- "**/*.mk"
- "**/*.rs"
- "**/*.sh"
- "**/*.toml"
- "**/*.yaml"
- "**/*.yml"
jobs:
test:
runs-on: ubuntu-20.04
steps:
- name: Check out
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Install Snapcraft
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: samuelmeuli/action-snapcraft@v1
- name: Build snap
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
snapcraft snap --debug --destructive-mode
snapcraft -d snap --destructive-mode

View File

@@ -1,326 +0,0 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
name: Static checks
jobs:
check-vendored-code:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
# Check whether the vendored code is up-to-date & working as the first thing
- name: Check vendored code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make vendor
static-checks:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Static Checks
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make static-checks
compiler-checks:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run Compiler Checks
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make check
unit-tests:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run Unit Tests
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && make test
unit-tests-as-root:
runs-on: ubuntu-20.04
env:
TRAVIS: "true"
TRAVIS_BRANCH: ${{ github.base_ref }}
TRAVIS_PULL_REQUEST_BRANCH: ${{ github.head_ref }}
TRAVIS_PULL_REQUEST_SHA : ${{ github.event.pull_request.head.sha }}
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
steps:
- name: Install Go
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/setup-go@v3
with:
go-version: 1.19.2
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Setup GOPATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH: ${TRAVIS_BRANCH}"
echo "TRAVIS_PULL_REQUEST_BRANCH: ${TRAVIS_PULL_REQUEST_BRANCH}"
echo "TRAVIS_PULL_REQUEST_SHA: ${TRAVIS_PULL_REQUEST_SHA}"
echo "TRAVIS: ${TRAVIS}"
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Setup travis references
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "TRAVIS_BRANCH=${TRAVIS_BRANCH:-$(echo $GITHUB_REF | awk 'BEGIN { FS = \"/\" } ; { print $3 }')}"
target_branch=${TRAVIS_BRANCH}
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
env:
GOPATH: ${{ runner.workspace }}/kata-containers
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run Unit Tests As Root User
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && sudo -E PATH="$PATH" make test

7
.gitignore vendored
View File

@@ -1,14 +1,7 @@
**/*.bk
**/*~
**/*.orig
**/*.rej
**/target
**/.vscode
pkg/logging/Cargo.lock
src/agent/src/version.rs
src/agent/kata-agent.service
src/agent/protocols/src/*.rs
!src/agent/protocols/src/lib.rs
build
src/tools/log-parser/kata-log-parser

62
.travis.yml Normal file
View File

@@ -0,0 +1,62 @@
# Copyright (c) 2019 Ant Financial
#
# SPDX-License-Identifier: Apache-2.0
#
dist: bionic
os: linux
# set cache directories manually, because
# we are using a non-standard directory struct
# cargo root is in srs/agent
#
# If needed, caches can be cleared
# by ways documented in
# https://docs.travis-ci.com/user/caching#clearing-caches
language: rust
rust:
- 1.44.1
cache:
cargo: true
directories:
- src/agent/target
before_install:
- git remote set-branches --add origin "${TRAVIS_BRANCH}"
- git fetch
- export RUST_BACKTRACE=1
- export target_branch=$TRAVIS_BRANCH
- "ci/setup.sh"
# we use install to run check agent
# so that it is easy to skip for non-amd64 platform
install:
- export PATH=$PATH:"$HOME/.cargo/bin"
- export RUST_AGENT=yes
- rustup target add x86_64-unknown-linux-musl
- sudo ln -sf /usr/bin/g++ /bin/musl-g++
- rustup component add rustfmt
- make -C ${TRAVIS_BUILD_DIR}/src/agent
- make -C ${TRAVIS_BUILD_DIR}/src/agent check
- sudo -E PATH=$PATH make -C ${TRAVIS_BUILD_DIR}/src/agent check
before_script:
- "ci/install_go.sh"
- make -C ${TRAVIS_BUILD_DIR}/src/runtime
- make -C ${TRAVIS_BUILD_DIR}/src/runtime test
- sudo -E PATH=$PATH GOPATH=$GOPATH make -C ${TRAVIS_BUILD_DIR}/src/runtime test
script:
- "ci/static-checks.sh"
jobs:
include:
- name: x86_64 test
os: linux
- name: ppc64le test
os: linux-ppc64le
install: skip
script: skip
allow_failures:
- name: ppc64le test
fast_finish: true

View File

@@ -2,4 +2,4 @@
## This repo is part of [Kata Containers](https://katacontainers.io)
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).

View File

@@ -1,3 +0,0 @@
# Glossary
See the [project glossary hosted in the wiki](https://github.com/kata-containers/kata-containers/wiki/Glossary).

View File

@@ -6,25 +6,20 @@
# List of available components
COMPONENTS =
COMPONENTS += libs
COMPONENTS += agent
COMPONENTS += runtime
COMPONENTS += runtime-rs
COMPONENTS += trace-forwarder
# List of available tools
TOOLS =
TOOLS += agent-ctl
TOOLS += trace-forwarder
TOOLS += runk
TOOLS += log-parser
STANDARD_TARGETS = build check clean install test vendor
default: all
STANDARD_TARGETS = build check clean install test
include utils.mk
include ./tools/packaging/kata-deploy/local-build/Makefile
all: build
# Create the rules
$(eval $(call create_all_rules,$(COMPONENTS),$(TOOLS),$(STANDARD_TARGETS)))
@@ -34,19 +29,4 @@ $(eval $(call create_all_rules,$(COMPONENTS),$(TOOLS),$(STANDARD_TARGETS)))
generate-protocols:
make -C src/agent generate-protocols
# Some static checks rely on generated source files of components.
static-checks: build
bash ci/static-checks.sh
docs-url-alive-check:
bash ci/docs-url-alive-check.sh
.PHONY: \
all \
binary-tarball \
default \
install-binary-tarball \
static-checks \
docs-url-alive-check
.PHONY: all default

235
README.md
View File

@@ -2,151 +2,136 @@
# Kata Containers
* [Raising issues](#raising-issues)
* [Kata Containers repositories](#kata-containers-repositories)
* [Code Repositories](#code-repositories)
* [Kata Containers-developed components](#kata-containers-developed-components)
* [Agent](#agent)
* [KSM throttler](#ksm-throttler)
* [Runtime](#runtime)
* [Trace forwarder](#trace-forwarder)
* [Additional](#additional)
* [Hypervisor](#hypervisor)
* [Kernel](#kernel)
* [CI](#ci)
* [Community](#community)
* [Documentation](#documentation)
* [Packaging](#packaging)
* [Test code](#test-code)
* [Utilities](#utilities)
* [OS builder](#os-builder)
* [Web content](#web-content)
---
Welcome to Kata Containers!
This repository is the home of the Kata Containers code for the 2.0 and newer
releases.
The purpose of this repository is to act as a "top level" site for the project. Specifically it is used:
If you want to learn about Kata Containers, visit the main
[Kata Containers website](https://katacontainers.io).
- To provide a list of the various *other* [Kata Containers repositories](#kata-containers-repositories),
along with a brief explanation of their purpose.
## Introduction
- To provide a general area for [Raising Issues](#raising-issues).
Kata Containers is an open source project and community working to build a
standard implementation of lightweight Virtual Machines (VMs) that feel and
perform like containers, but provide the workload isolation and security
advantages of VMs.
## Raising issues
## License
This repository is used for [raising
issues](https://github.com/kata-containers/kata-containers/issues/new):
The code is licensed under the Apache 2.0 license.
See [the license file](LICENSE) for further details.
- That might affect multiple code repositories.
## Platform support
Kata Containers currently runs on 64-bit systems supporting the following
technologies:
| Architecture | Virtualization technology |
|-|-|
| `x86_64`, `amd64` | [Intel](https://www.intel.com) VT-x, AMD SVM |
| `aarch64` ("`arm64`")| [ARM](https://www.arm.com) Hyp |
| `ppc64le` | [IBM](https://www.ibm.com) Power |
| `s390x` | [IBM](https://www.ibm.com) Z & LinuxONE SIE |
### Hardware requirements
The [Kata Containers runtime](src/runtime) provides a command to
determine if your host system is capable of running and creating a
Kata Container:
```bash
$ kata-runtime check
```
> **Notes:**
>
> - This command runs a number of checks including connecting to the
> network to determine if a newer release of Kata Containers is
> available on GitHub. If you do not wish this to check to run, add
> the `--no-network-checks` option.
>
> - By default, only a brief success / failure message is printed.
> If more details are needed, the `--verbose` flag can be used to display the
> list of all the checks performed.
>
> - If the command is run as the `root` user additional checks are
> run (including checking if another incompatible hypervisor is running).
> When running as `root`, network checks are automatically disabled.
## Getting started
See the [installation documentation](docs/install).
## Documentation
See the [official documentation](docs) including:
- [Installation guides](docs/install)
- [Developer guide](docs/Developer-Guide.md)
- [Design documents](docs/design)
- [Architecture overview](docs/design/architecture)
- [Architecture 3.0 overview](docs/design/architecture_3.0/)
## Configuration
Kata Containers uses a single
[configuration file](src/runtime/README.md#configuration)
which contains a number of sections for various parts of the Kata
Containers system including the [runtime](src/runtime), the
[agent](src/agent) and the [hypervisor](#hypervisors).
## Hypervisors
See the [hypervisors document](docs/hypervisors.md) and the
[Hypervisor specific configuration details](src/runtime/README.md#hypervisor-specific-configuration).
## Community
To learn more about the project, its community and governance, see the
[community repository](https://github.com/kata-containers/community). This is
the first place to go if you wish to contribute to the project.
## Getting help
See the [community](#community) section for ways to contact us.
### Raising issues
Please raise an issue
[in this repository](https://github.com/kata-containers/kata-containers/issues).
- Where the raiser is unsure which repositories are affected.
> **Note:**
> If you are reporting a security issue, please follow the [vulnerability reporting process](https://github.com/kata-containers/community#vulnerability-handling)
>
> - If an issue affects only a single component, it should be raised in that
> components repository.
## Developers
## Kata Containers repositories
See the [developer guide](docs/Developer-Guide.md).
### CI
### Components
The [CI](https://github.com/kata-containers/ci) repository stores the Continuous
Integration (CI) system configuration information.
### Main components
### Community
The table below lists the core parts of the project:
The [Community](https://github.com/kata-containers/community) repository is
the first place to go if you want to use or contribute to the project.
| Component | Type | Description |
|-|-|-|
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [runtime-rs](src/runtime-rs) | core | The Rust version runtime. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [`dragonball`](src/dragonball) | core | An optional built-in VMM brings out-of-the-box Kata Containers experience with optimizations on container workloads |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [libraries](src/libs) | core | Library crates shared by multiple Kata Container components or published to [`crates.io`](https://crates.io/index.html) |
| [tests](https://github.com/kata-containers/tests) | tests | Excludes unit tests which live with the main code. |
### Code Repositories
### Additional components
#### Kata Containers-developed components
The table below lists the remaining parts of the project:
##### Agent
| Component | Type | Description |
|-|-|-|
| [packaging](tools/packaging) | infrastructure | Scripts and metadata for producing packaged binaries<br/>(components, hypervisors, kernel and rootfs). |
| [kernel](https://www.kernel.org) | kernel | Linux kernel used by the hypervisor to boot the guest image. Patches are stored [here](tools/packaging/kernel). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
The [`kata-agent`](src/agent/README.md) runs inside the
virtual machine and sets up the container environment.
### Packaging and releases
##### KSM throttler
Kata Containers is now
[available natively for most distributions](docs/install/README.md#packaged-installation-methods).
However, packaging scripts and metadata are still used to generate [snap](snap/local) and GitHub releases. See
the [components](#components) section for further details.
The [`kata-ksm-throttler`](https://github.com/kata-containers/ksm-throttler)
is an optional utility that monitors containers and deduplicates memory to
maximize container density on a host.
## Glossary of Terms
##### Runtime
See the [glossary of terms](https://github.com/kata-containers/kata-containers/wiki/Glossary) related to Kata Containers.
The [`kata-runtime`](src/runtime/README.md) is usually
invoked by a container manager and provides high-level verbs to manage
containers.
##### Trace forwarder
The [`kata-trace-forwarder`](src/trace-forwarder) is a component only used
when tracing the [agent](#agent) process.
#### Additional
##### Hypervisor
The [`qemu`](https://github.com/kata-containers/qemu) hypervisor is used to
create virtual machines for hosting the containers.
##### Kernel
The hypervisor uses a [Linux\* kernel](https://github.com/kata-containers/linux) to boot the guest image.
### Documentation
The [docs](docs/README.md) directory holds documentation common to all code components.
### Packaging
We use the [packaging](tools/packaging/README.md) to create packages for the [system
components](#kata-containers-developed-components) including
[rootfs](#os-builder) and [kernel](#kernel) images.
### Test code
The [tests](https://github.com/kata-containers/tests) repository hosts all
test code except the unit testing code (which is kept in the same repository
as the component it tests).
### Utilities
#### OS builder
The [osbuilder](tools/osbuilder/README.md) tool can create
a rootfs and a "mini O/S" image. This image is used by the hypervisor to setup
the environment before switching to the workload.
#### `kata-agent-ctl`
[`kata-agent-ctl`](tools/agent-ctl) is a low-level test tool for
interacting with the agent.
### Web content
The
[www.katacontainers.io](https://github.com/kata-containers/www.katacontainers.io)
repository contains all sources for the https://www.katacontainers.io site.
## Credits
Kata Containers uses [packagecloud](https://packagecloud.io) for package
hosting.

View File

@@ -1 +1 @@
3.0.2
2.0.0

View File

@@ -1,42 +0,0 @@
#!/usr/bin/env bash
#
# Copyright (c) 2022 Apple Inc.
#
# SPDX-License-Identifier: Apache-2.0
set -e
cidir=$(dirname "$0")
runtimedir=$cidir/../src/runtime
build_working_packages() {
# working packages:
device_api=$runtimedir/pkg/device/api
device_config=$runtimedir/pkg/device/config
device_drivers=$runtimedir/pkg/device/drivers
device_manager=$runtimedir/pkg/device/manager
rc_pkg_dir=$runtimedir/pkg/resourcecontrol/
utils_pkg_dir=$runtimedir/virtcontainers/utils
# broken packages :( :
#katautils=$runtimedir/pkg/katautils
#oci=$runtimedir/pkg/oci
#vc=$runtimedir/virtcontainers
pkgs=(
"$device_api"
"$device_config"
"$device_drivers"
"$device_manager"
"$utils_pkg_dir"
"$rc_pkg_dir")
for pkg in "${pkgs[@]}"; do
echo building "$pkg"
pushd "$pkg" &>/dev/null
go build
go test
popd &>/dev/null
done
}
build_working_packages

View File

@@ -1,12 +0,0 @@
#!/bin/bash
#
# Copyright (c) 2021 Easystack Inc.
#
# SPDX-License-Identifier: Apache-2.0
set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
run_docs_url_alive_check

30
ci/go-no-os-exit.sh Executable file
View File

@@ -0,0 +1,30 @@
#!/bin/bash
# Copyright (c) 2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Check there are no os.Exit() calls creeping into the code
# We don't use that exit path in the Kata codebase.
# Allow the path to check to be over-ridden.
# Default to the current directory.
go_packages=${1:-.}
echo "Checking for no os.Exit() calls for package [${go_packages}]"
candidates=`go list -f '{{.Dir}}/*.go' $go_packages`
for f in $candidates; do
filename=`basename $f`
# skip all go test files
[[ $filename == *_test.go ]] && continue
# skip exit.go where, the only file we should call os.Exit() from.
[[ $filename == "exit.go" ]] && continue
files="$f $files"
done
[ -z "$files" ] && echo "No files to check, skipping" && exit 0
if egrep -n '\<os\.Exit\>' $files; then
echo "Direct calls to os.Exit() are forbidden, please use exit() so atexit() works"
exit 1
fi

11
ci/go-test.sh Executable file
View File

@@ -0,0 +1,11 @@
#
# Copyright (c) 2020 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
run_go_test

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2019 Intel Corporation
#

View File

@@ -1,112 +0,0 @@
#!/usr/bin/env bash
#
# Copyright 2021 Sony Group Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
clone_tests_repo
source "${tests_repo_dir}/.ci/lib.sh"
# The following variables if set on the environment will change the behavior
# of gperf and libseccomp configure scripts, that may lead this script to
# fail. So let's ensure they are unset here.
unset PREFIX DESTDIR
arch=${ARCH:-$(uname -m)}
workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
libseccomp_version="${LIBSECCOMP_VERSION:-""}"
if [ -z "${libseccomp_version}" ]; then
libseccomp_version=$(get_version "externals.libseccomp.version")
fi
libseccomp_url="${LIBSECCOMP_URL:-""}"
if [ -z "${libseccomp_url}" ]; then
libseccomp_url=$(get_version "externals.libseccomp.url")
fi
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
cflags="-O2"
# Variables for gperf
gperf_version="${GPERF_VERSION:-""}"
if [ -z "${gperf_version}" ]; then
gperf_version=$(get_version "externals.gperf.version")
fi
gperf_url="${GPERF_URL:-""}"
if [ -z "${gperf_url}" ]; then
gperf_url=$(get_version "externals.gperf.url")
fi
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
# We need to build the libseccomp library from sources to create a static library for the musl libc.
# However, ppc64le and s390x have no musl targets in Rust. Hence, we do not set cflags for the musl libc.
if ([ "${arch}" != "ppc64le" ] && [ "${arch}" != "s390x" ]); then
# Set FORTIFY_SOURCE=1 because the musl-libc does not have some functions about FORTIFY_SOURCE=2
cflags="-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -O2"
fi
die() {
msg="$*"
echo "[Error] ${msg}" >&2
exit 1
}
finish() {
rm -rf "${workdir}"
}
trap finish EXIT
build_and_install_gperf() {
echo "Build and install gperf version ${gperf_version}"
mkdir -p "${gperf_install_dir}"
curl -sLO "${gperf_tarball_url}"
tar -xf "${gperf_tarball}"
pushd "gperf-${gperf_version}"
# Unset $CC for configure, we will always use native for gperf
CC= ./configure --prefix="${gperf_install_dir}"
make
make install
export PATH=$PATH:"${gperf_install_dir}"/bin
popd
echo "Gperf installed successfully"
}
build_and_install_libseccomp() {
echo "Build and install libseccomp version ${libseccomp_version}"
mkdir -p "${libseccomp_install_dir}"
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
make
make install
popd
echo "Libseccomp installed successfully"
}
main() {
local libseccomp_install_dir="${1:-}"
local gperf_install_dir="${2:-}"
if [ -z "${libseccomp_install_dir}" ] || [ -z "${gperf_install_dir}" ]; then
die "Usage: ${0} <libseccomp-install-dir> <gperf-install-dir>"
fi
pushd "$workdir"
# gperf is required for building the libseccomp.
build_and_install_gperf
build_and_install_libseccomp
popd
}
main "$@"

23
ci/install_musl.sh Executable file
View File

@@ -0,0 +1,23 @@
#!/bin/bash
# Copyright (c) 2020 Ant Group
#
# SPDX-License-Identifier: Apache-2.0
#
set -e
install_aarch64_musl() {
local arch=$(uname -m)
if [ "${arch}" == "aarch64" ]; then
local musl_tar="${arch}-linux-musl-native.tgz"
local musl_dir="${arch}-linux-musl-native"
pushd /tmp
curl -sLO https://musl.cc/${musl_tar}
tar -zxf ${musl_tar}
mkdir -p /usr/local/musl/
cp -r ${musl_dir}/* /usr/local/musl/
popd
fi
}
install_aarch64_musl

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
# Copyright (c) 2019 Ant Financial
#
# SPDX-License-Identifier: Apache-2.0
@@ -12,5 +12,5 @@ source "${cidir}/lib.sh"
clone_tests_repo
pushd ${tests_repo_dir}
.ci/install_rust.sh ${1:-}
.ci/install_rust.sh
popd

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2018 Intel Corporation
#

View File

@@ -15,18 +15,10 @@ die() {
# Install the yq yaml query package from the mikefarah github repo
# Install via binary download, as we may not have golang installed at this point
function install_yq() {
GOPATH=${GOPATH:-${HOME}/go}
local yq_path="${GOPATH}/bin/yq"
local yq_pkg="github.com/mikefarah/yq"
local yq_version=3.4.1
INSTALL_IN_GOPATH=${INSTALL_IN_GOPATH:-true}
if [ "${INSTALL_IN_GOPATH}" == "true" ];then
GOPATH=${GOPATH:-${HOME}/go}
mkdir -p "${GOPATH}/bin"
local yq_path="${GOPATH}/bin/yq"
else
yq_path="/usr/local/bin/yq"
fi
[ -x "${yq_path}" ] && [ "`${yq_path} --version`"X == "yq version ${yq_version}"X ] && return
[ -x "${GOPATH}/bin/yq" ] && return
read -r -a sysInfo <<< "$(uname -sm)"
@@ -57,12 +49,15 @@ function install_yq() {
;;
esac
mkdir -p "${GOPATH}/bin"
# Check curl
if ! command -v "curl" >/dev/null; then
die "Please install curl"
fi
local yq_version=3.1.0
## NOTE: ${var,,} => gives lowercase value of var
local yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos,,}_${goarch}"
curl -o "${yq_path}" -LSsf "${yq_url}"

View File

@@ -3,38 +3,20 @@
#
# SPDX-License-Identifier: Apache-2.0
set -o nounset
export tests_repo="${tests_repo:-github.com/kata-containers/tests}"
export tests_repo_dir="$GOPATH/src/$tests_repo"
export branch="${target_branch:-main}"
export branch="${branch:-2.0-dev}"
# Clones the tests repository and checkout to the branch pointed out by
# the global $branch variable.
# If the clone exists and `CI` is exported then it does nothing. Otherwise
# it will clone the repository or `git pull` the latest code.
#
clone_tests_repo()
{
if [ -d "$tests_repo_dir" ]; then
[ -n "${CI:-}" ] && return
# git config --global --add safe.directory will always append
# the target to .gitconfig without checking the existence of
# the target, so it's better to check it before adding the target repo.
local sd="$(git config --global --get safe.directory ${tests_repo_dir} || true)"
if [ -z "${sd}" ]; then
git config --global --add safe.directory ${tests_repo_dir}
fi
pushd "${tests_repo_dir}"
git checkout "${branch}"
git pull
popd
else
git clone -q "https://${tests_repo}" "$tests_repo_dir"
pushd "${tests_repo_dir}"
git checkout "${branch}"
popd
if [ -d "$tests_repo_dir" -a -n "$CI" ]
then
return
fi
go get -d -u "$tests_repo" || true
pushd "${tests_repo_dir}" && git checkout "${branch}" && popd
}
run_static_checks()
@@ -43,24 +25,11 @@ run_static_checks()
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
bash "$tests_repo_dir/.ci/static-checks.sh" "$@"
bash "$tests_repo_dir/.ci/static-checks.sh" "github.com/kata-containers/kata-containers"
}
run_docs_url_alive_check()
run_go_test()
{
clone_tests_repo
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
bash "$tests_repo_dir/.ci/static-checks.sh" --docs --all "github.com/kata-containers/kata-containers"
}
run_get_pr_changed_file_details()
{
clone_tests_repo
# Make sure we have the targeting branch
git remote set-branches --add origin "${branch}"
git fetch -a
source "$tests_repo_dir/.ci/lib.sh"
get_pr_changed_file_details
bash "$tests_repo_dir/.ci/go-test.sh"
}

View File

@@ -1,14 +0,0 @@
# Copyright (c) 2021 Red Hat, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# This is the build root image for Kata Containers on OpenShift CI.
#
FROM quay.io/centos/centos:stream8
RUN yum -y update && \
yum -y install \
git \
sudo \
wget && \
yum clean all

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2019 Ant Financial
#
@@ -8,14 +8,9 @@
set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
export CI_JOB="${CI_JOB:-}"
clone_tests_repo
pushd ${tests_repo_dir}
.ci/run.sh
# temporary fix, see https://github.com/kata-containers/tests/issues/3878
if [ "$(uname -m)" != "s390x" ] && [ "$CI_JOB" == "CRI_CONTAINERD_K8S_MINIMAL" ]; then
tracing/test-agent-shutdown.sh
fi
popd

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2018 Intel Corporation
#

View File

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
#
# Copyright (c) 2017-2018 Intel Corporation
#
@@ -9,4 +9,4 @@ set -e
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
run_static_checks "${@:-github.com/kata-containers/kata-containers}"
run_static_checks

View File

@@ -1,33 +0,0 @@
targets = [
{ triple = "x86_64-apple-darwin" },
{ triple = "x86_64-unknown-linux-gnu" },
{ triple = "x86_64-unknown-linux-musl" },
]
[advisories]
vulnerability = "deny"
unsound = "deny"
unmaintained = "deny"
ignore = ["RUSTSEC-2020-0071"]
[bans]
multiple-versions = "allow"
deny = [
{ name = "cmake" },
{ name = "openssl-sys" },
]
[licenses]
unlicensed = "deny"
allow-osi-fsf-free = "neither"
copyleft = "allow"
# We want really high confidence when inferring licenses from text
confidence-threshold = 0.93
allow = ["0BSD", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "CC0-1.0", "ISC", "MIT", "MPL-2.0"]
private = { ignore = true}
exceptions = []
[sources]
unknown-registry = "allow"
unknown-git = "allow"

View File

@@ -1,3 +1,56 @@
* [Warning](#warning)
* [Assumptions](#assumptions)
* [Initial setup](#initial-setup)
* [Requirements to build individual components](#requirements-to-build-individual-components)
* [Build and install the Kata Containers runtime](#build-and-install-the-kata-containers-runtime)
* [Check hardware requirements](#check-hardware-requirements)
* [Configure to use initrd or rootfs image](#configure-to-use-initrd-or-rootfs-image)
* [Enable full debug](#enable-full-debug)
* [debug logs and shimv2](#debug-logs-and-shimv2)
* [Enabling full `containerd` debug](#enabling-full-containerd-debug)
* [Enabling just `containerd shim` debug](#enabling-just-containerd-shim-debug)
* [Enabling `CRI-O` and `shimv2` debug](#enabling-cri-o-and-shimv2-debug)
* [journald rate limiting](#journald-rate-limiting)
* [`systemd-journald` suppressing messages](#systemd-journald-suppressing-messages)
* [Disabling `systemd-journald` rate limiting](#disabling-systemd-journald-rate-limiting)
* [Create and install rootfs and initrd image](#create-and-install-rootfs-and-initrd-image)
* [Build a custom Kata agent - OPTIONAL](#build-a-custom-kata-agent---optional)
* [Get the osbuilder](#get-the-osbuilder)
* [Create a rootfs image](#create-a-rootfs-image)
* [Create a local rootfs](#create-a-local-rootfs)
* [Add a custom agent to the image - OPTIONAL](#add-a-custom-agent-to-the-image---optional)
* [Build a rootfs image](#build-a-rootfs-image)
* [Install the rootfs image](#install-the-rootfs-image)
* [Create an initrd image - OPTIONAL](#create-an-initrd-image---optional)
* [Create a local rootfs for initrd image](#create-a-local-rootfs-for-initrd-image)
* [Build an initrd image](#build-an-initrd-image)
* [Install the initrd image](#install-the-initrd-image)
* [Install guest kernel images](#install-guest-kernel-images)
* [Install a hypervisor](#install-a-hypervisor)
* [Build a custom QEMU](#build-a-custom-qemu)
* [Build a custom QEMU for aarch64/arm64 - REQUIRED](#build-a-custom-qemu-for-aarch64arm64---required)
* [Run Kata Containers with Containerd](#run-kata-containers-with-containerd)
* [Run Kata Containers with Kubernetes](#run-kata-containers-with-kubernetes)
* [Troubleshoot Kata Containers](#troubleshoot-kata-containers)
* [Appendices](#appendices)
* [Checking Docker default runtime](#checking-docker-default-runtime)
* [Set up a debug console](#set-up-a-debug-console)
* [Simple debug console setup](#simple-debug-console-setup)
* [Enable agent debug console](#enable-agent-debug-console)
* [Start `kata-monitor`](#start-kata-monitor)
* [Connect to debug console](#connect-to-debug-console)
* [Traditional debug console setup](#traditional-debug-console-setup)
* [Create a custom image containing a shell](#create-a-custom-image-containing-a-shell)
* [Build the debug image](#build-the-debug-image)
* [Configure runtime for custom debug image](#configure-runtime-for-custom-debug-image)
* [Connect to the virtual machine using the debug console](#connect-to-the-virtual-machine-using-the-debug-console)
* [Enabling debug console for QEMU](#enabling-debug-console-for-qemu)
* [Enabling debug console for cloud-hypervisor / firecracker](#enabling-debug-console-for-cloud-hypervisor--firecracker)
* [Create a container](#create-a-container)
* [Connect to the virtual machine using the debug console](#connect-to-the-virtual-machine-using-the-debug-console)
* [Obtain details of the image](#obtain-details-of-the-image)
* [Capturing kernel boot logs](#capturing-kernel-boot-logs)
# Warning
This document is written **specifically for developers**: it is not intended for end users.
@@ -51,7 +104,7 @@ The build will create the following:
You can check if your system is capable of creating a Kata Container by running the following:
```
$ sudo kata-runtime check
$ sudo kata-runtime kata-check
```
If your system is *not* able to run Kata Containers, the previous command will error out and explain why.
@@ -86,16 +139,6 @@ One of the `initrd` and `image` options in Kata runtime config file **MUST** be
The main difference between the options is that the size of `initrd`(10MB+) is significantly smaller than
rootfs `image`(100MB+).
## Enable seccomp
Enable seccomp as follows:
```
$ sudo sed -i '/^disable_guest_seccomp/ s/true/false/' /etc/kata-containers/configuration.toml
```
This will pass container seccomp profiles to the kata agent.
## Enable full debug
Enable full debug as follows:
@@ -116,7 +159,7 @@ detailed below.
The Kata logs appear in the `containerd` log files, along with logs from `containerd` itself.
For more information about `containerd` debug, please see the
[`containerd` documentation](https://github.com/containerd/containerd/blob/main/docs/getting-started.md).
[`containerd` documentation](https://github.com/containerd/containerd/blob/master/docs/getting-started.md).
#### Enabling full `containerd` debug
@@ -212,13 +255,11 @@ $ sudo systemctl restart systemd-journald
>
> - You should only do this step if you are testing with the latest version of the agent.
The agent is built with a statically linked `musl.` The default `libc` used is `musl`, but on `ppc64le` and `s390x`, `gnu` should be used. To configure this:
The rust-agent is built with a static linked `musl.` To configure this:
```
$ export ARCH=$(uname -m)
$ if [ "$ARCH" = "ppc64le" -o "$ARCH" = "s390x" ]; then export LIBC=gnu; else export LIBC=musl; fi
$ [ ${ARCH} == "ppc64le" ] && export ARCH=powerpc64le
$ rustup target add ${ARCH}-unknown-linux-${LIBC}
rustup target add x86_64-unknown-linux-musl
sudo ln -s /usr/bin/g++ /bin/musl-g++
```
To build the agent:
@@ -228,18 +269,6 @@ $ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/agent && make
```
The agent is built with seccomp capability by default.
If you want to build the agent without the seccomp capability, you need to run `make` with `SECCOMP=no` as follows.
```
$ make -C $GOPATH/src/github.com/kata-containers/kata-containers/src/agent SECCOMP=no
```
> **Note:**
>
> - If you enable seccomp in the main configuration file but build the agent without seccomp capability,
> the runtime exits conservatively with an error message.
## Get the osbuilder
```
@@ -258,21 +287,9 @@ the following example.
$ export ROOTFS_DIR=${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true ./rootfs.sh ${distro}'
```
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
$ script -fec 'sudo -E GOPATH=$GOPATH USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
You MUST choose one of `alpine`, `centos`, `clearlinux`, `debian`, `euleros`, `fedora`, `suse`, and `ubuntu` for `${distro}`. By default `seccomp` packages are not included in the rootfs image. Set `SECCOMP` to `yes` to include them.
> **Note:**
>
@@ -288,7 +305,7 @@ $ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no
> - You should only do this step if you are testing with the latest version of the agent.
```
$ sudo install -o root -g root -m 0550 -t ${ROOTFS_DIR}/usr/bin ../../../src/agent/target/x86_64-unknown-linux-musl/release/kata-agent
$ sudo install -o root -g root -m 0550 -t ${ROOTFS_DIR}/bin ../../../src/agent/target/x86_64-unknown-linux-musl/release/kata-agent
$ sudo install -o root -g root -m 0440 ../../../src/agent/kata-agent.service ${ROOTFS_DIR}/usr/lib/systemd/system/
$ sudo install -o root -g root -m 0440 ../../../src/agent/kata-containers.target ${ROOTFS_DIR}/usr/lib/systemd/system/
```
@@ -308,7 +325,6 @@ $ script -fec 'sudo -E USE_DOCKER=true ./image_builder.sh ${ROOTFS_DIR}'
> - If you do *not* wish to build under Docker, remove the `USE_DOCKER`
> variable in the previous command and ensure the `qemu-img` command is
> available on your system.
> - If `qemu-img` is not installed, you will likely see errors such as `ERROR: File /dev/loop19p1 is not a block device` and `losetup: /tmp/tmp.bHz11oY851: Warning: file is smaller than 512 bytes; the loop device may be useless or invisible for system tools`. These can be mitigated by installing the `qemu-img` command (available in the `qemu-img` package on Fedora or the `qemu-utils` package on Debian).
### Install the rootfs image
@@ -327,35 +343,20 @@ $ (cd /usr/share/kata-containers && sudo ln -sf "$image" kata-containers.img)
$ export ROOTFS_DIR="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs"
$ sudo rm -rf ${ROOTFS_DIR}
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`.
You MUST choose a distribution (e.g., `ubuntu`) for `${distro}`.
You can get a supported distributions list in the Kata Containers by running the following.
```
$ ./rootfs.sh -l
```
If you want to build the agent without seccomp capability, you need to run the `rootfs.sh` script with `SECCOMP=no` as follows.
```
$ script -fec 'sudo -E GOPATH=$GOPATH AGENT_INIT=yes USE_DOCKER=true SECCOMP=no ./rootfs.sh ${distro}'
```
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
always set `AGENT_INIT` to `yes`. By default `seccomp` packages are not included in the initrd image. Set `SECCOMP` to `yes` to include them.
You MUST choose one of `alpine`, `centos`, `clearlinux`, `euleros`, and `fedora` for `${distro}`.
> **Note:**
>
> - Check the [compatibility matrix](../tools/osbuilder/README.md#platform-distro-compatibility-matrix) before creating rootfs.
Optionally, add your custom agent binary to the rootfs with the following commands. The default `$LIBC` used
is `musl`, but on ppc64le and s390x, `gnu` should be used. Also, Rust refers to ppc64le as `powerpc64le`:
Optionally, add your custom agent binary to the rootfs with the following:
```
$ export ARCH=$(uname -m)
$ [ ${ARCH} == "ppc64le" ] || [ ${ARCH} == "s390x" ] && export LIBC=gnu || export LIBC=musl
$ [ ${ARCH} == "ppc64le" ] && export ARCH=powerpc64le
$ sudo install -o root -g root -m 0550 -T ../../../src/agent/target/${ARCH}-unknown-linux-${LIBC}/release/kata-agent ${ROOTFS_DIR}/sbin/init
$ sudo install -o root -g root -m 0550 -T ../../agent/kata-agent ${ROOTFS_DIR}/sbin/init
```
### Build an initrd image
@@ -381,56 +382,31 @@ You can build and install the guest kernel image as shown [here](../tools/packag
# Install a hypervisor
When setting up Kata using a [packaged installation method](install/README.md#installing-on-a-linux-system), the
`QEMU` VMM is installed automatically. Cloud-Hypervisor and Firecracker VMMs are available from the [release tarballs](https://github.com/kata-containers/kata-containers/releases), as well as through [`kata-deploy`](../tools/packaging/kata-deploy/README.md).
You may choose to manually build your VMM/hypervisor.
When setting up Kata using a [packaged installation method](install/README.md#installing-on-a-linux-system), the `qemu-lite` hypervisor is installed automatically. For other installation methods, you will need to manually install a suitable hypervisor.
## Build a custom QEMU
Kata Containers makes use of upstream QEMU branch. The exact version
and repository utilized can be found by looking at the [versions file](../versions.yaml).
Your QEMU directory need to be prepared with source code. Alternatively, you can use the [Kata containers QEMU](https://github.com/kata-containers/qemu/tree/master) and checkout the recommended branch:
Find the correct version of QEMU from the versions file:
```
$ source ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/scripts/lib.sh
$ qemu_version=$(get_from_kata_deps "assets.hypervisor.qemu.version")
$ echo ${qemu_version}
```
Get source from the matching branch of QEMU:
```
$ go get -d github.com/qemu/qemu
$ cd ${GOPATH}/src/github.com/qemu/qemu
$ git checkout ${qemu_version}
$ your_qemu_directory=${GOPATH}/src/github.com/qemu/qemu
$ go get -d github.com/kata-containers/qemu
$ qemu_branch=$(grep qemu-lite- ${GOPATH}/src/github.com/kata-containers/kata-containers/versions.yaml | cut -d '"' -f2)
$ cd ${GOPATH}/src/github.com/kata-containers/qemu
$ git checkout -b $qemu_branch remotes/origin/$qemu_branch
$ your_qemu_directory=${GOPATH}/src/github.com/kata-containers/qemu
```
There are scripts to manage the build and packaging of QEMU. For the examples below, set your
environment as:
```
$ go get -d github.com/kata-containers/kata-containers
$ packaging_dir="${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging"
```
To build a version of QEMU using the same options as the default `qemu-lite` version , you could use the `configure-hypervisor.sh` script:
Kata often utilizes patches for not-yet-upstream and/or backported fixes for components,
including QEMU. These can be found in the [packaging/QEMU directory](../tools/packaging/qemu/patches),
and it's *recommended* that you apply them. For example, suppose that you are going to build QEMU
version 5.2.0, do:
```
$ go get -d github.com/kata-containers/kata-containers/tools/packaging
$ cd $your_qemu_directory
$ $packaging_dir/scripts/apply_patches.sh $packaging_dir/qemu/patches/5.2.x/
```
To build utilizing the same options as Kata, you should make use of the `configure-hypervisor.sh` script. For example:
```
$ cd $your_qemu_directory
$ $packaging_dir/scripts/configure-hypervisor.sh kata-qemu > kata.cfg
$ ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/scripts/configure-hypervisor.sh qemu > kata.cfg
$ eval ./configure "$(cat kata.cfg)"
$ make -j $(nproc --ignore=1)
$ make -j $(nproc)
$ sudo -E make install
```
See the [static-build script for QEMU](../tools/packaging/static-build/qemu/build-static-qemu.sh) for a reference on how to get, setup, configure and build QEMU for Kata.
### Build a custom QEMU for aarch64/arm64 - REQUIRED
> **Note:**
>
@@ -465,7 +441,7 @@ script and paste its output directly into a
> [runtime](../src/runtime) repository.
To perform analysis on Kata logs, use the
[`kata-log-parser`](../src/tools/log-parser)
[`kata-log-parser`](https://github.com/kata-containers/tests/tree/master/cmd/log-parser)
tool, which can convert the logs into formats (e.g. JSON, TOML, XML, and YAML).
See [Set up a debug console](#set-up-a-debug-console).
@@ -498,9 +474,9 @@ debug_console_enabled = true
This will pass `agent.debug_console agent.debug_console_vport=1026` to agent as kernel parameters, and sandboxes created using this parameters will start a shell in guest if new connection is accept from VSOCK.
#### Start `kata-monitor` - ONLY NEEDED FOR 2.0.x
#### Start `kata-monitor`
For Kata Containers `2.0.x` releases, the `kata-runtime exec` command depends on the`kata-monitor` running, in order to get the sandbox's `vsock` address to connect to. Thus, first start the `kata-monitor` process.
The `kata-runtime exec` command needs `kata-monitor` to get the sandbox's `vsock` address to connect to, first start `kata-monitor`.
```
$ sudo kata-monitor
@@ -508,6 +484,7 @@ $ sudo kata-monitor
`kata-monitor` will serve at `localhost:8090` by default.
#### Connect to debug console
Command `kata-runtime exec` is used to connect to the debug console.
@@ -522,10 +499,6 @@ bash-4.2# exit
exit
```
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/main/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
with Kubernetes. For CRI-O, the namespace should set to `default` explicitly. This should not be confused with [Kubernetes namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
For other CRI-runtimes and configurations, you may need to set the namespace utilizing the `runtime-namespace` option.
If you want to access guest OS through a traditional way, see [Traditional debug console setup)](#traditional-debug-console-setup).
### Traditional debug console setup
@@ -645,14 +618,11 @@ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.debug_cons
> **Note** Ports 1024 and 1025 are reserved for communication with the agent
> and gathering of agent logs respectively.
##### Connecting to the debug console
Next, connect to the debug console. The VSOCKS paths vary slightly between each
VMM solution.
Next, connect to the debug console. The VSOCKS paths vary slightly between
cloud-hypervisor and firecracker.
In case of cloud-hypervisor, connect to the `vsock` as shown:
```
$ sudo su -c 'cd /var/run/vc/vm/${sandbox_id}/root/ && socat stdin unix-connect:clh.sock'
$ sudo su -c 'cd /var/run/vc/vm/{sandbox_id}/root/ && socat stdin unix-connect:clh.sock'
CONNECT 1026
```
@@ -660,18 +630,12 @@ CONNECT 1026
For firecracker, connect to the `hvsock` as shown:
```
$ sudo su -c 'cd /var/run/vc/firecracker/${sandbox_id}/root/ && socat stdin unix-connect:kata.hvsock'
$ sudo su -c 'cd /var/run/vc/firecracker/{sandbox_id}/root/ && socat stdin unix-connect:kata.hvsock'
CONNECT 1026
```
**Note**: You need to press the `RETURN` key to see the shell prompt.
For QEMU, connect to the `vsock` as shown:
```
$ sudo su -c 'cd /var/run/vc/vm/${sandbox_id} && socat "stdin,raw,echo=0,escape=0x11" "unix-connect:console.sock"'
```
To disconnect from the virtual machine, type `CONTROL+q` (hold down the
`CONTROL` key and press `q`).
@@ -700,11 +664,11 @@ options to have the kernel boot messages logged into the system journal.
For generic information on enabling debug in the configuration file, see the
[Enable full debug](#enable-full-debug) section.
The kernel boot messages will appear in the `kata` logs (and in the `containerd` or `CRI-O` log appropriately).
The kernel boot messages will appear in the `containerd` or `CRI-O` log appropriately,
such as:
```bash
$ sudo journalctl -t kata
$ sudo journalctl -t containerd
-- Logs begin at Thu 2020-02-13 16:20:40 UTC, end at Thu 2020-02-13 16:30:23 UTC. --
...
time="2020-09-15T14:56:23.095113803+08:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791/console.sock pid=107642 sandbox=ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791 source=virtcontainers subsystem=sandbox vmconsole="[ 0.395399] brd: module loaded"
@@ -714,4 +678,3 @@ time="2020-09-15T14:56:23.105268162+08:00" level=debug msg="reading guest consol
time="2020-09-15T14:56:23.121121598+08:00" level=debug msg="reading guest console" console-protocol=unix console-url=/run/vc/vm/ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791/console.sock pid=107642 sandbox=ab9f633385d4987828d342e47554fc6442445b32039023eeddaa971c1bb56791 source=virtcontainers subsystem=sandbox vmconsole="[ 0.421324] memmap_init_zone_device initialised 32768 pages in 12ms"
...
```
Refer to the [kata-log-parser documentation](../src/tools/log-parser/README.md) which is useful to fetch these.

View File

@@ -1,3 +1,16 @@
* [Introduction](#introduction)
* [General requirements](#general-requirements)
* [Linking advice](#linking-advice)
* [Notes](#notes)
* [Warnings and other admonitions](#warnings-and-other-admonitions)
* [Files and command names](#files-and-command-names)
* [Code blocks](#code-blocks)
* [Images](#images)
* [Spelling](#spelling)
* [Names](#names)
* [Version numbers](#version-numbers)
* [The apostrophe](#the-apostrophe)
# Introduction
This document outlines the requirements for all documentation in the [Kata
@@ -10,6 +23,10 @@ All documents must:
- Be written in simple English.
- Be written in [GitHub Flavored Markdown](https://github.github.com/gfm) format.
- Have a `.md` file extension.
- Include a TOC (table of contents) at the top of the document with links to
all heading sections. We recommend using the
[`kata-check-markdown`](https://github.com/kata-containers/tests/tree/master/cmd/check-markdown)
tool to generate the TOC.
- Be linked to from another document in the same repository.
Although GitHub allows navigation of the entire repository, it should be
@@ -26,10 +43,6 @@ All documents must:
which can then execute the commands specified to ensure the instructions are
correct. This avoids documents becoming out of date over time.
> **Note:**
>
> Do not add a table of contents (TOC) since GitHub will auto-generate one.
# Linking advice
Linking between documents is strongly encouraged to help users and developers
@@ -105,7 +118,7 @@ This section lists requirements for displaying commands and command output.
The requirements must be adhered to since documentation containing code blocks
is validated by the CI system, which executes the command blocks with the help
of the
[doc-to-script](https://github.com/kata-containers/tests/tree/main/.ci/kata-doc-to-script.sh)
[doc-to-script](https://github.com/kata-containers/tests/tree/master/.ci/kata-doc-to-script.sh)
utility.
- If a document includes commands the user should run, they **MUST** be shown
@@ -189,7 +202,7 @@ and compare them with standard tools (e.g. `diff(1)`).
Since this project uses a number of terms not found in conventional
dictionaries, we have a
[spell checking tool](https://github.com/kata-containers/tests/tree/main/cmd/check-spelling)
[spell checking tool](https://github.com/kata-containers/tests/tree/master/cmd/check-spelling)
that checks both dictionary words and the additional terms we use.
Run the spell checking tool on your document before raising a PR to ensure it

View File

@@ -1,5 +1,9 @@
# Licensing strategy
* [Project License](#project-license)
* [License file](#license-file)
* [License for individual files](#license-for-individual-files)
## Project License
The license for the [Kata Containers](https://github.com/kata-containers)
@@ -18,4 +22,4 @@ licensing and allows automated tooling to check the license of individual
files.
This SPDX licence identifier requirement is enforced by the
[CI (Continuous Integration) system](https://github.com/kata-containers/tests/blob/main/.ci/static-checks.sh).
[CI (Continuous Integration) system](https://github.com/kata-containers/tests/blob/master/.ci/static-checks.sh).

View File

@@ -1,3 +1,33 @@
* [Overview](#overview)
* [Definition of a limitation](#definition-of-a-limitation)
* [Scope](#scope)
* [Contributing](#contributing)
* [Pending items](#pending-items)
* [Runtime commands](#runtime-commands)
* [checkpoint and restore](#checkpoint-and-restore)
* [events command](#events-command)
* [update command](#update-command)
* [Networking](#networking)
* [Docker swarm and compose support](#docker-swarm-and-compose-support)
* [Resource management](#resource-management)
* [docker run and shared memory](#docker-run-and-shared-memory)
* [docker run and sysctl](#docker-run-and-sysctl)
* [Docker daemon features](#docker-daemon-features)
* [SELinux support](#selinux-support)
* [Architectural limitations](#architectural-limitations)
* [Networking limitations](#networking-limitations)
* [Support for joining an existing VM network](#support-for-joining-an-existing-vm-network)
* [docker --net=host](#docker---nethost)
* [docker run --link](#docker-run---link)
* [Host resource sharing](#host-resource-sharing)
* [docker run --privileged](#docker-run---privileged)
* [Miscellaneous](#miscellaneous)
* [Docker --security-opt option partially supported](#docker---security-opt-option-partially-supported)
* [Appendices](#appendices)
* [The constraints challenge](#the-constraints-challenge)
---
# Overview
A [Kata Container](https://github.com/kata-containers) utilizes a Virtual Machine (VM) to enhance security and
@@ -46,7 +76,7 @@ The following link shows the latest list of limitations:
# Contributing
If you would like to work on resolving a limitation, please refer to the
[contributors guide](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
[contributors guide](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).
If you wish to raise an issue for a new limitation, either
[raise an issue directly on the runtime](https://github.com/kata-containers/kata-containers/issues/new)
or see the
@@ -57,30 +87,12 @@ for advice on which repository to raise the issue against.
This section lists items that might be possible to fix.
## OCI CLI commands
### Docker and Podman support
Currently Kata Containers does not support Podman.
See issue https://github.com/kata-containers/kata-containers/issues/722 for more information.
Docker supports Kata Containers since 22.06:
```bash
$ sudo docker run --runtime io.containerd.kata.v2
```
Kata Containers works perfectly with containerd, we recommend to use
containerd's Docker-style command line tool [`nerdctl`](https://github.com/containerd/nerdctl).
## Runtime commands
### checkpoint and restore
The runtime does not provide `checkpoint` and `restore` commands. There
are discussions about using VM save and restore to give us a
[`criu`](https://github.com/checkpoint-restore/criu)-like functionality,
which might provide a solution.
are discussions about using VM save and restore to give [`criu`](https://github.com/checkpoint-restore/criu)-like functionality, which might provide a solution.
Note that the OCI standard does not specify `checkpoint` and `restore`
commands.
@@ -102,41 +114,20 @@ All other configurations are supported and are working properly.
## Networking
### Host network
### Docker swarm and compose support
Host network (`nerdctl/docker run --net=host`or [Kubernetes `HostNetwork`](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#hosts-namespaces)) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The newest version of Docker supported is specified by the
`externals.docker.version` variable in the
[versions database](https://github.com/kata-containers/runtime/blob/master/versions.yaml).
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
Basic Docker swarm support works. However, if you want to use custom networks
with Docker's swarm, an older version of Docker is required. This is specified
by the `externals.docker.meta.swarm-version` variable in the
[versions database](https://github.com/kata-containers/runtime/blob/master/versions.yaml).
It should be noted, currently passing the `--net=host` option into a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
See issue https://github.com/kata-containers/runtime/issues/175 for more information.
### Support for joining an existing VM network
Docker supports the ability for containers to join another containers
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in the network namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is setup to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
See more documentation at
[docs.docker.com](https://docs.docker.com/network/links/).
Docker compose normally uses custom networks, so also has the same limitations.
## Resource management
@@ -149,28 +140,91 @@ See issue https://github.com/clearcontainers/runtime/issues/341 and [the constra
For CPUs resource management see
[CPU constraints](design/vcpu-handling.md).
### docker run and shared memory
The runtime does not implement the `docker run --shm-size` command to
set the size of the `/dev/shm tmpfs` within the container. It is possible to pass this configuration value into the VM container so the appropriate mount command happens at launch time.
See issue https://github.com/kata-containers/kata-containers/issues/21 for more information.
### docker run and sysctl
The `docker run --sysctl` feature is not implemented. At the runtime
level, this equates to the `linux.sysctl` OCI configuration. Docker
allows configuring the sysctl settings that support namespacing. From a security and isolation point of view, it might make sense to set them in the VM, which isolates sysctl settings. Also, given that each Kata Container has its own kernel, we can support setting of sysctl settings that are not namespaced. In some cases, we might need to support configuring some of the settings on both the host side Kata Container namespace and the Kata Containers kernel.
See issue https://github.com/kata-containers/runtime/issues/185 for more information.
## Docker daemon features
Some features enabled or implemented via the
[`dockerd` daemon](https://docs.docker.com/config/daemon/) configuration are not yet
implemented.
### SELinux support
The `dockerd` configuration option `"selinux-enabled": true` is not presently implemented
in Kata Containers. Enabling this option causes an OCI runtime error.
See issue https://github.com/kata-containers/runtime/issues/784 for more information.
The consequence of this is that the [Docker --security-opt is only partially supported](#docker---security-opt-option-partially-supported).
Kubernetes [SELinux labels](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container) will also not be applied.
# Architectural limitations
This section lists items that might not be fixed due to fundamental
architectural differences between "soft containers" (i.e. traditional Linux*
containers) and those based on VMs.
## Storage limitations
## Networking limitations
### Kubernetes `volumeMounts.subPaths`
### Support for joining an existing VM network
Kubernetes `volumeMount.subPath` is not supported by Kata Containers at the
moment.
Docker supports the ability for containers to join another containers
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in the network namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is setup to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
See [this issue](https://github.com/kata-containers/runtime/issues/2812) for more details.
[Another issue](https://github.com/kata-containers/kata-containers/issues/1728) focuses on the case of `emptyDir`.
### docker --net=host
Docker host network support (`docker --net=host run`) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
It should be noted, currently passing the `--net=host` option into a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
See more documentation at
[docs.docker.com](https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/).
## Host resource sharing
### Privileged containers
### docker run --privileged
Privileged support in Kata is essentially different from `runc` containers.
The container runs with elevated capabilities within the guest and is granted
Kata does support `docker run --privileged` command, but in this case full access
to the guest VM is provided in addition to some host access.
The container runs with elevated capabilities within the guest and is granted
access to guest devices instead of the host devices.
This is also true with using `securityContext privileged=true` with Kubernetes.
@@ -179,6 +233,17 @@ The container may also be granted full access to a subset of host devices
See [Privileged Kata Containers](how-to/privileged.md) for how to configure some of this behavior.
# Miscellaneous
This section lists limitations where the possible solutions are uncertain.
## Docker --security-opt option partially supported
The `--security-opt=` option used by Docker is partially supported.
We only support `--security-opt=no-new-privileges` and `--security-opt seccomp=/path/to/seccomp/profile.json`
option as of today.
Note: The `--security-opt apparmor=your_profile` is not yet supported. See https://github.com/kata-containers/runtime/issues/707.
# Appendices
## The constraints challenge

View File

@@ -1,5 +1,16 @@
# Documentation
* [Getting Started](#getting-started)
* [More User Guides](#more-user-guides)
* [Kata Use-Cases](#kata-use-cases)
* [Developer Guide](#developer-guide)
* [Design and Implementations](#design-and-implementations)
* [How to Contribute](#how-to-contribute)
* [Code Licensing](#code-licensing)
* [The Release Process](#the-release-process)
* [Help Improving the Documents](#help-improving-the-documents)
* [Website Changes](#website-changes)
The [Kata Containers](https://github.com/kata-containers)
documentation repository hosts overall system documentation, with information
common to multiple components.
@@ -11,27 +22,24 @@ For details of the other Kata Containers repositories, see the
* [Installation guides](./install/README.md): Install and run Kata Containers with Docker or Kubernetes
## Tracing
See the [tracing documentation](tracing.md).
## More User Guides
* [Upgrading](Upgrading.md): how to upgrade from [Clear Containers](https://github.com/clearcontainers) and [runV](https://github.com/hyperhq/runv) to [Kata Containers](https://github.com/kata-containers) and how to upgrade an existing Kata Containers system to the latest version.
* [Limitations](Limitations.md): differences and limitations compared with the default [Docker](https://www.docker.com/) runtime,
[`runc`](https://github.com/opencontainers/runc).
### How-to guides
### Howto guides
See the [how-to documentation](how-to).
See the [howto documentation](how-to).
## Kata Use-Cases
* [GPU Passthrough with Kata](./use-cases/GPU-passthrough-and-Kata.md)
* [OpenStack Zun with Kata Containers](./use-cases/zun_kata.md)
* [SR-IOV with Kata](./use-cases/using-SRIOV-and-kata.md)
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [VPP with Kata](./use-cases/using-vpp-and-kata.md)
* [SPDK vhost-user with Kata](./use-cases/using-SPDK-vhostuser-and-kata.md)
* [Intel SGX with Kata](./use-cases/using-Intel-SGX-and-kata.md)
## Developer Guide
@@ -39,30 +47,15 @@ Documents that help to understand and contribute to Kata Containers.
### Design and Implementations
* [Kata Containers Architecture](design/architecture): Architectural overview of Kata Containers
* [Kata Containers E2E Flow](design/end-to-end-flow.md): The entire end-to-end flow of Kata Containers
* [Kata Containers Architecture](design/architecture.md): Architectural overview of Kata Containers
* [Kata Containers design](./design/README.md): More Kata Containers design documents
* [Kata Containers threat model](./threat-model/threat-model.md): Kata Containers threat model
### How to Contribute
* [Developer Guide](Developer-Guide.md): Setup the Kata Containers developing environments
* [How to contribute to Kata Containers](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md)
* [How to contribute to Kata Containers](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md)
* [Code of Conduct](../CODE_OF_CONDUCT.md)
## Help Writing a Code PR
* [Code PR advice](code-pr-advice.md).
## Help Writing Unit Tests
* [Unit Test Advice](Unit-Test-Advice.md)
* [Unit testing presentation](presentations/unit-testing/kata-containers-unit-testing.md)
## Help Improving the Documents
* [Documentation Requirements](Documentation-Requirements.md)
### Code Licensing
* [Licensing](Licensing-strategy.md): About the licensing strategy of Kata Containers.
@@ -72,9 +65,9 @@ Documents that help to understand and contribute to Kata Containers.
* [Release strategy](Stable-Branch-Strategy.md)
* [Release Process](Release-Process.md)
## Presentations
## Help Improving the Documents
* [Presentations](presentations)
* [Documentation Requirements](Documentation-Requirements.md)
## Website Changes

View File

@@ -1,26 +1,45 @@
# How to do a Kata Containers Release
This document lists the tasks required to create a Kata Release.
<!-- TOC START min:1 max:3 link:true asterisk:false update:true -->
- [How to do a Kata Containers Release](#how-to-do-a-kata-containers-release)
- [Requirements](#requirements)
- [Release Process](#release-process)
- [Bump all Kata repositories](#bump-all-kata-repositories)
- [Merge all bump version Pull requests](#merge-all-bump-version-pull-requests)
- [Tag all Kata repositories](#tag-all-kata-repositories)
- [Check Git-hub Actions](#check-git-hub-actions)
- [Create release notes](#create-release-notes)
- [Announce the release](#announce-the-release)
<!-- TOC END -->
## Requirements
- [hub](https://github.com/github/hub)
* Using an [application token](https://github.com/settings/tokens) is required for hub (set to a GITHUB_TOKEN environment variable).
- OBS account with permissions on [`/home:katacontainers`](https://build.opensuse.org/project/subprojects/home:katacontainers)
- GitHub permissions to push tags and create releases in Kata repositories.
- GPG configured to sign git tags. https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key
- GPG configured to sign git tags. https://help.github.com/articles/generating-a-new-gpg-key/
- You should configure your GitHub to use your ssh keys (to push to branches). See https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/.
* As an alternative, configure hub to push and fork with HTTPS, `git config --global hub.protocol https` (Not tested yet) *
## Release Process
### Bump all Kata repositories
Bump the repositories using a script in the Kata packaging repo, where:
- `BRANCH=<the-branch-you-want-to-bump>`
- `NEW_VERSION=<the-new-kata-version>`
- We have set up a Jenkins job to bump the version in the `VERSION` file in all Kata repositories. Go to the [Jenkins bump-job page](http://jenkins.katacontainers.io/job/release/build) to trigger a new job.
- Start a new job with variables for the job passed as:
- `BRANCH=<the-branch-you-want-to-bump>`
- `NEW_VERSION=<the-new-kata-version>`
For example, in the case where you want to make a patch release `1.10.2`, the variable `NEW_VERSION` should be `1.10.2` and `BRANCH` should point to `stable-1.10`. In case of an alpha or release candidate release, `BRANCH` should point to `master` branch.
Alternatively, you can also bump the repositories using a script in the Kata packaging repo
```
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
$ export NEW_VERSION=<the-new-kata-version>
@@ -28,34 +47,16 @@
$ ./update-repository-version.sh -p "$NEW_VERSION" "$BRANCH"
```
### Point tests repository to stable branch
If you create a new stable branch, i.e. if your release changes a major or minor version number (not a patch release), then
you should modify the `tests` repository to point to that newly created stable branch and not the `main` branch.
The objective is that changes in the CI on the main branch will not impact the stable branch.
In the test directory, change references the main branch in:
* `README.md`
* `versions.yaml`
* `cmd/github-labels/labels.yaml.in`
* `cmd/pmemctl/pmemctl.sh`
* `.ci/lib.sh`
* `.ci/static-checks.sh`
See the commits in [the corresponding PR for stable-2.1](https://github.com/kata-containers/tests/pull/3504) for an example of the changes.
### Merge all bump version Pull requests
- The above step will create a GitHub pull request in the Kata projects. Trigger the CI using `/test` command on each bump Pull request.
- Trigger the `test-kata-deploy` workflow which is under the `Actions` tab on the repository GitHub page (make sure to select the correct branch and validate it passes).
- Check any failures and fix if needed.
- Work with the Kata approvers to verify that the CI works and the pull requests are merged.
### Tag all Kata repositories
Once all the pull requests to bump versions in all Kata repositories are merged,
tag all the repositories as shown below.
tag all the repositories as shown below.
```
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
$ git checkout <kata-branch-to-release>
@@ -65,7 +66,7 @@
### Check Git-hub Actions
We make use of [GitHub actions](https://github.com/features/actions) in this [file](../.github/workflows/release.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/master/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-conatiners` repository.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/kata-containers/releases).
@@ -78,9 +79,9 @@
```
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
# Note: OLD_VERSION is where the script should start to get changes.
$ ./release-notes.sh ${OLD_VERSION} ${NEW_VERSION} > notes.md
$ ./runtime-release-notes.sh ${OLD_VERSION} ${NEW_VERSION} > notes.md
# Edit the `notes.md` file to review and make any changes to the release notes.
# Add the release notes in the project's GitHub.
# Add the release notes in GitHub runtime.
$ hub release edit -F notes.md "${NEW_VERSION}"
```

View File

@@ -32,16 +32,16 @@ provides additional information regarding release `99.123.77` in the previous ex
changing the existing behavior*.
- When `MAJOR` increases, the new release adds **new features, bug fixes, or
both** and which **changes the behavior from the previous release** (incompatible with previous releases).
both** and which *changes the behavior from the previous release* (incompatible with previous releases).
A major release will also likely require a change of the container manager version used,
for example Containerd or CRI-O. Please refer to the release notes for further details.
for example Docker\*. Please refer to the release notes for further details.
## Release Strategy
Any new features added since the last release will be available in the next minor
release. These will include bug fixes as well. To facilitate a stable user environment,
Kata provides stable branch-based releases and a main branch release.
Kata provides stable branch-based releases and a master branch release.
## Stable branch patch criteria
@@ -49,10 +49,9 @@ No new features should be introduced to stable branches. This is intended to li
providing only bug and security fixes.
## Branch Management
Kata Containers will maintain **one** stable release branch, in addition to the main branch, for
each active major release.
Once a new MAJOR or MINOR release is created from main, a new stable branch is created for
the prior MAJOR or MINOR release and the previous stable branch is no longer maintained. End of
Kata Containers will maintain two stable release branches in addition to the master branch.
Once a new MAJOR or MINOR release is created from master, a new stable branch is created for
the prior MAJOR or MINOR release and the older stable branch is no longer maintained. End of
maintenance for a branch is announced on the Kata Containers mailing list. Users can determine
the version currently installed by running `kata-runtime kata-env`. It is recommended to use the
latest stable branch available.
@@ -62,67 +61,67 @@ A couple of examples follow to help clarify this process.
### New bug fix introduced
A bug fix is submitted against the runtime which does not introduce new inter-component dependencies.
This fix is applied to both the main and stable branches, and there is no need to create a new
This fix is applied to both the master and stable branches, and there is no need to create a new
stable branch.
| Branch | Original version | New version |
|--|--|--|
| `main` | `2.3.0-rc0` | `2.3.0-rc1` |
| `stable-2.2` | `2.2.0` | `2.2.1` |
| `stable-2.1` | (unmaintained) | (unmaintained) |
| `master` | `1.3.0-rc0` | `1.3.0-rc1` |
| `stable-1.2` | `1.2.0` | `1.2.1` |
| `stable-1.1` | `1.1.2` | `1.1.3` |
### New release made feature or change adding new inter-component dependency
A new feature is introduced, which adds a new inter-component dependency. In this case a new stable
branch is created (stable-2.3) starting from main and the previous stable branch (stable-2.2)
branch is created (stable-1.3) starting from master and the older stable branch (stable-1.1)
is dropped from maintenance.
| Branch | Original version | New version |
|--|--|--|
| `main` | `2.3.0-rc1` | `2.3.0` |
| `stable-2.3` | N/A| `2.3.0` |
| `stable-2.2` | `2.2.1` | (unmaintained) |
| `stable-2.1` | (unmaintained) | (unmaintained) |
| `master` | `1.3.0-rc1` | `1.3.0` |
| `stable-1.3` | N/A| `1.3.0` |
| `stable-1.2` | `1.2.1` | `1.2.2` |
| `stable-1.1` | `1.1.3` | (unmaintained) |
Note, the stable-2.2 branch will still exist with tag 2.2.1, but under current plans it is
not maintained further. The next tag applied to main will be 2.4.0-alpha0. We would then
Note, the stable-1.1 branch will still exist with tag 1.1.3, but under current plans it is
not maintained further. The next tag applied to master will be 1.4.0-alpha0. We would then
create a couple of alpha releases gathering features targeted for that particular release (in
this case 2.4.0), followed by a release candidate. The release candidate marks a feature freeze.
this case 1.4.0), followed by a release candidate. The release candidate marks a feature freeze.
A new stable branch is created for the release candidate. Only bug fixes and any security issues
are added to the branch going forward until release 2.4.0 is made.
are added to the branch going forward until release 1.4.0 is made.
## Backporting Process
Development that occurs against the main branch and applicable code commits should also be submitted
Development that occurs against the master branch and applicable code commits should also be submitted
against the stable branches. Some guidelines for this process follow::
1. Only bug and security fixes which do not introduce inter-component dependencies are
candidates for stable branches. These PRs should be marked with "bug" in GitHub.
2. Once a PR is created against main which meets requirement of (1), a comparable one
2. Once a PR is created against master which meets requirement of (1), a comparable one
should also be submitted against the stable branches. It is the responsibility of the submitter
to apply their pull request against stable, and it is the responsibility of the
reviewers to help identify stable-candidate pull requests.
## Continuous Integration Testing
The test repository is forked to create stable branches from main. Full CI
runs on each stable and main PR using its respective tests repository branch.
The test repository is forked to create stable branches from master. Full CI
runs on each stable and master PR using its respective tests repository branch.
### An alternative method for CI testing:
Ideally, the continuous integration infrastructure will run the same test suite on both main
Ideally, the continuous integration infrastructure will run the same test suite on both master
and the stable branches. When tests are modified or new feature tests are introduced, explicit
logic should exist within the testing CI to make sure only applicable tests are executed against
stable and main. While this is not in place currently, it should be considered in the long term.
stable and master. While this is not in place currently, it should be considered in the long term.
## Release Management
### Patch releases
Releases are made every four weeks, which include a GitHub release as
Releases are made every three weeks, which include a GitHub release as
well as binary packages. These patch releases are made for both stable branches, and a "release candidate"
for the next `MAJOR` or `MINOR` is created from main. If there are no changes across all the repositories, no
for the next `MAJOR` or `MINOR` is created from master. If there are no changes across all the repositories, no
release is created and an announcement is made on the developer mailing list to highlight this.
If a release is being made, each repository is tagged for this release, regardless
of whether changes are introduced. The release schedule can be seen on the
@@ -136,16 +135,17 @@ The process followed for making a release can be found at [Release Process](Rele
### Frequency
Minor releases are less frequent in order to provide a more stable baseline for users. They are currently
running on a sixteen weeks cadence. The release schedule can be seen on the
running on a twelve week cadence. As the Kata Containers code base has reached a certain level of
maturity, we have increased the cadence from six weeks to twelve weeks. The release schedule can be seen on the
[release rotation wiki page](https://github.com/kata-containers/community/wiki/Release-Team-Rota).
### Compatibility
Kata guarantees compatibility between components that are within one minor release of each other.
This is critical for dependencies which cross between host (shimv2 runtime) and
This is critical for dependencies which cross between host (runtime, shim, proxy) and
the guest (hypervisor, rootfs and agent). For example, consider a cluster with a long-running
deployment, workload-never-dies, all on Kata version 2.1.3 components. If the operator updates
the Kata components to the next new minor release (i.e. 2.2.0), we need to guarantee that the 2.2.0
shimv2 runtime still communicates with 2.1.3 agent within workload-never-dies.
deployment, workload-never-dies, all on Kata version 1.1.3 components. If the operator updates
the Kata components to the next new minor release (i.e. 1.2.0), we need to guarantee that the 1.2.0
runtime still communicates with 1.1.3 agent within workload-never-dies.
Handling live-update is out of the scope of this document. See this [`kata-runtime` issue](https://github.com/kata-containers/runtime/issues/492) for details.

View File

@@ -1,377 +0,0 @@
# Unit Test Advice
## Overview
This document offers advice on writing a Unit Test (UT) in
[Golang](https://golang.org) and [Rust](https://www.rust-lang.org).
## General advice
### Unit test strategies
#### Positive and negative tests
Always add positive tests (where success is expected) *and* negative
tests (where failure is expected).
#### Boundary condition tests
Try to add unit tests that exercise boundary conditions such as:
- Missing values (`null` or `None`).
- Empty strings and huge strings.
- Empty (or uninitialised) complex data structures
(such as lists, vectors and hash tables).
- Common numeric values (such as `-1`, `0`, `1` and the minimum and
maximum values).
#### Test unusual values
Also always consider "unusual" input values such as:
- String values containing spaces, Unicode characters, special
characters, escaped characters or null bytes.
> **Note:** Consider these unusual values in prefix, infix and
> suffix position.
- String values that cannot be converted into numeric values or which
contain invalid structured data (such as invalid JSON).
#### Other types of tests
If the code requires other forms of testing (such as stress testing,
fuzz testing and integration testing), raise a GitHub issue and
reference it on the issue you are using for the main work. This
ensures the test team are aware that a new test is required.
### Test environment
#### Create unique files and directories
Ensure your tests do not write to a fixed file or directory. This can
cause problems when running multiple tests simultaneously and also
when running tests after a previous test run failure.
#### Assume parallel testing
Always assume your tests will be run *in parallel*. If this is
problematic for a test, force it to run in isolation using the
`serial_test` crate for Rust code for example.
### Running
Ensure you run the unit tests and they all pass before raising a PR.
Ideally do this on different distributions on different architectures
to maximise coverage (and so minimise surprises when your code runs in
the CI).
## Assertions
### Golang assertions
Use the `testify` assertions package to create a new assertion object as this
keeps the test code free from distracting `if` tests:
```go
func TestSomething(t *testing.T) {
assert := assert.New(t)
err := doSomething()
assert.NoError(err)
}
```
### Rust assertions
Use the standard set of `assert!()` macros.
## Table driven tests
Try to write tests using a table-based approach. This allows you to distill
the logic into a compact table (rather than spreading the tests across
multiple test functions). It also makes it easy to cover all the
interesting boundary conditions:
### Golang table driven tests
Assume the following function:
```go
// The function under test.
//
// Accepts a string and an integer and returns the
// result of sticking them together separated by a dash as a string.
func joinParamsWithDash(str string, num int) (string, error) {
if str == "" {
return "", errors.New("string cannot be blank")
}
if num <= 0 {
return "", errors.New("number must be positive")
}
return fmt.Sprintf("%s-%d", str, num), nil
}
```
A table driven approach to testing it:
```go
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestJoinParamsWithDash(t *testing.T) {
assert := assert.New(t)
// Type used to hold function parameters and expected results.
type testData struct {
param1 string
param2 int
expectedResult string
expectError bool
}
// List of tests to run including the expected results
data := []testData{
// Failure scenarios
{"", -1, "", true},
{"", 0, "", true},
{"", 1, "", true},
{"foo", 0, "", true},
{"foo", -1, "", true},
// Success scenarios
{"foo", 1, "foo-1", false},
{"bar", 42, "bar-42", false},
}
// Run the tests
for i, d := range data {
// Create a test-specific string that is added to each assert
// call. It will be displayed if any assert test fails.
msg := fmt.Sprintf("test[%d]: %+v", i, d)
// Call the function under test
result, err := joinParamsWithDash(d.param1, d.param2)
// update the message for more information on failure
msg = fmt.Sprintf("%s, result: %q, err: %v", msg, result, err)
if d.expectError {
assert.Error(err, msg)
// If an error is expected, there is no point
// performing additional checks.
continue
}
assert.NoError(err, msg)
assert.Equal(d.expectedResult, result, msg)
}
}
```
### Rust table driven tests
Assume the following function:
```rust
// Convenience type to allow Result return types to only specify the type
// for the true case; failures are specified as static strings.
// XXX: This is an example. In real code use the "anyhow" and
// XXX: "thiserror" crates.
pub type Result<T> = std::result::Result<T, &'static str>;
// The function under test.
//
// Accepts a string and an integer and returns the
// result of sticking them together separated by a dash as a string.
fn join_params_with_dash(str: &str, num: i32) -> Result<String> {
if str.is_empty() {
return Err("string cannot be blank");
}
if num <= 0 {
return Err("number must be positive");
}
let result = format!("{}-{}", str, num);
Ok(result)
}
```
A table driven approach to testing it:
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_join_params_with_dash() {
// This is a type used to record all details of the inputs
// and outputs of the function under test.
#[derive(Debug)]
struct TestData<'a> {
str: &'a str,
num: i32,
result: Result<String>,
}
// The tests can now be specified as a set of inputs and outputs
let tests = &[
// Failure scenarios
TestData {
str: "",
num: 0,
result: Err("string cannot be blank"),
},
TestData {
str: "foo",
num: -1,
result: Err("number must be positive"),
},
// Success scenarios
TestData {
str: "foo",
num: 42,
result: Ok("foo-42".to_string()),
},
TestData {
str: "-",
num: 1,
result: Ok("--1".to_string()),
},
];
// Run the tests
for (i, d) in tests.iter().enumerate() {
// Create a string containing details of the test
let msg = format!("test[{}]: {:?}", i, d);
// Call the function under test
let result = join_params_with_dash(d.str, d.num);
// Update the test details string with the results of the call
let msg = format!("{}, result: {:?}", msg, result);
// Perform the checks
if d.result.is_ok() {
assert!(result == d.result, msg);
continue;
}
let expected_error = format!("{}", d.result.as_ref().unwrap_err());
let actual_error = format!("{}", result.unwrap_err());
assert!(actual_error == expected_error, msg);
}
}
}
```
## Temporary files
Use `t.TempDir()` to create temporary directory. The directory created by
`t.TempDir()` is automatically removed when the test and all its subtests
complete.
### Golang temporary files
```go
func TestSomething(t *testing.T) {
assert := assert.New(t)
// Create a temporary directory
tmpdir := t.TempDir()
// Add test logic that will use the tmpdir here...
}
```
### Rust temporary files
Use the `tempfile` crate which allows files and directories to be deleted
automatically:
```rust
#[cfg(test)]
mod tests {
use tempfile::tempdir;
#[test]
fn test_something() {
// Create a temporary directory (which will be deleted automatically
let dir = tempdir().expect("failed to create tmpdir");
let filename = dir.path().join("file.txt");
// create filename ...
}
}
```
## Test user
[Unit tests are run *twice*](../src/runtime/go-test.sh):
- as the current user
- as the `root` user (if different to the current user)
When writing a test consider which user should run it; even if the code the
test is exercising runs as `root`, it may be necessary to *only* run the test
as a non-`root` for the test to be meaningful. Add appropriate skip
guards around code that requires `root` and non-`root` so that the test
will run if the correct type of user is detected and skipped if not.
### Run Golang tests as a different user
The main repository has the most comprehensive set of skip abilities. See:
- [`katatestutils`](../src/runtime/pkg/katatestutils)
### Run Rust tests as a different user
One method is to use the `nix` crate along with some custom macros:
```rust
#[cfg(test)]
mod tests {
#[allow(unused_macros)]
macro_rules! skip_if_root {
() => {
if nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs non-root", module_path!());
return;
}
};
}
#[allow(unused_macros)]
macro_rules! skip_if_not_root {
() => {
if !nix::unistd::Uid::effective().is_root() {
println!("INFO: skipping {} which needs root", module_path!());
return;
}
};
}
#[test]
fn test_that_must_be_run_as_root() {
// Not running as the superuser, so skip.
skip_if_not_root!();
// Run test *iff* the user running the test is root
// ...
}
}
```

View File

@@ -1,3 +1,16 @@
* [Introduction](#introduction)
* [Maintenance warning](#maintenance-warning)
* [Determine current version](#determine-current-version)
* [Determine latest version](#determine-latest-version)
* [Configuration changes](#configuration-changes)
* [Upgrade Kata Containers](#upgrade-kata-containers)
* [Upgrade native distribution packaged version](#upgrade-native-distribution-packaged-version)
* [Static installation](#static-installation)
* [Determine if you are using a static installation](#determine-if-you-are-using-a-static-installation)
* [Remove a static installation](#remove-a-static-installation)
* [Upgrade a static installation](#upgrade-a-static-installation)
* [Custom assets](#custom-assets)
# Introduction
This document outlines the options for upgrading from a
@@ -35,10 +48,10 @@ Alternatively, if you are using Kata Containers version 1.12.0 or newer, you
can check for newer releases using the command line:
```bash
$ kata-runtime check --check-version-only
$ kata-runtime kata-check --check-version-only
```
There are various other related options. Run `kata-runtime check --help`
There are various other related options. Run `kata-runtime kata-check --help`
for further details.
# Configuration changes
@@ -102,7 +115,7 @@ first
[install the latest release](#determine-latest-version).
See the
[manual installation documentation](install/README.md#manual-installation)
[manual installation installation documentation](install/README.md#manual-installation)
for details on how to automatically install and configuration a static release
with containerd.
@@ -114,7 +127,7 @@ with containerd.
> kernel or image.
If you are using custom
[guest assets](design/architecture/README.md#guest-assets),
[guest assets](design/architecture.md#guest-assets),
you must upgrade them to work with Kata Containers 2.x since Kata
Containers 1.x assets will **not** work.

View File

@@ -1,247 +0,0 @@
# Code PR Advice
Before raising a PR containing code changes, we suggest you consider
the following to ensure a smooth and fast process.
> **Note:**
>
> - All the advice in this document is optional. However, if the
> advice provided is not followed, there is no guarantee your PR
> will be merged.
>
> - All the check tools will be run automatically on your PR by the CI.
> However, if you run them locally first, there is a much better
> chance of a successful initial CI run.
## Assumptions
This document assumes you have already read (and in the case of the
code of conduct agreed to):
- The [Kata Containers code of conduct](https://github.com/kata-containers/community/blob/main/CODE_OF_CONDUCT.md).
- The [Kata Containers contributing guide](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
## Code
### Architectures
Do not write architecture-specific code if it is possible to write the
code generically.
### General advice
- Do not write code to impress: instead write code that is easy to read and understand.
- Always consider which user will run the code. Try to minimise
the privileges the code requires.
### Comments
Always add comments if the intent of the code is not obvious. However,
try to avoid comments if the code could be made clearer (for example
by using more meaningful variable names).
### Constants
Don't embed magic numbers and strings in functions, particularly if
they are used repeatedly.
Create constants at the top of the file instead.
### Copyright and license
Ensure all new files contain a copyright statement and an SPDX license
identifier in the comments at the top of the file.
### FIXME and TODO
If the code contains areas that are not fully implemented, make this
clear a comment which provides a link to a GitHub issue that provides
further information.
Do not just rely on comments in this case though: if possible, return
a "`BUG: feature X not implemented see {bug-url}`" type error.
### Functions
- Keep functions relatively short (less than 100 lines is a good "rule of thumb").
- Document functions if the parameters, return value or general intent
of the function is not obvious.
- Always return errors where possible.
Do not discard error return values from the functions this function
calls.
### Logging
- Don't use multiple log calls when a single log call could be used.
- Use structured logging where possible to allow
[standard tooling](../src/tools/log-parser)
be able to extract the log fields.
### Names
Give functions, macros and variables clear and meaningful names.
### Structures
#### Golang structures
Unlike Rust, Go does not enforce that all structure members be set.
This has lead to numerous bugs in the past where code like the
following is used:
```go
type Foo struct {
Key string
Value string
}
// BUG: Key not set, but nobody noticed! ;(
let foo1 = Foo {
Value: "foo",
}
```
A much safer approach is to create a constructor function to enforce
integrity:
```go
type Foo struct {
Key string
Value string
}
func NewFoo(key, value string) (*Foo, error) {
if key == "" {
return nil, errors.New("Foo needs a key")
}
if value == "" {
return nil, errors.New("Foo needs a value")
}
return &Foo{
Key: key,
Value: value,
}, nil
}
func testFoo() error {
// BUG: Key not set, but nobody noticed! ;(
badFoo := Foo{Value: "value"}
// Ok - the constructor performs needed validation
goodFoo, err := NewFoo("name", "value")
if err != nil {
return err
}
return nil
```
> **Note:**
>
> The above is just an example. The *safest* approach would be to move
> `NewFoo()` into a separate package and make `Foo` and it's elements
> private. The compiler would then enforce the use of the constructor
> to guarantee correctly defined objects.
### Tracing
Consider if the code needs to create a new
[trace span](./tracing.md).
Ensure any new trace spans added to the code are completed.
## Tests
### Unit tests
Where possible, code changes should be accompanied by unit tests.
Consider using the standard
[table-based approach](Unit-Test-Advice.md)
as it encourages you to make functions small and simple, and also
allows you to think about what types of value to test.
### Other categories of test
Raised a GitHub issue in the
[`tests`](https://github.com/kata-containers/tests) repository that
explains what sort of test is required along with as much detail as
possible. Ensure the original issue is referenced on the `tests` issue.
### Unsafe code
#### Rust language specifics
Minimise the use of `unsafe` blocks in Rust code and since it is
potentially dangerous always write [unit tests][#unit-tests]
for this code where possible.
`expect()` and `unwrap()` will cause the code to panic on error.
Prefer to return a `Result` on error rather than using these calls to
allow the caller to deal with the error condition.
The table below lists the small number of cases where use of
`expect()` and `unwrap()` are permitted:
| Area | Rationale for permitting |
|-|-|
| In test code (the `tests` module) | Panics will cause the test to fail, which is desirable. |
| `lazy_static!()` | This magic macro cannot "return" a value as it runs before `main()`. |
| `defer!()` | Similar to golang's `defer()` but doesn't allow the use of `?`. |
| `tokio::spawn(async move {})` | Cannot currently return a `Result` from an `async move` closure. |
| If an explicit test is performed before the `unwrap()` / `expect()` | *"Just about acceptable"*, but not ideal `[*]` |
| `Mutex.lock()` | Almost unrecoverable if failed in the lock acquisition |
`[*]` - There can lead to bad *future* code: consider what would
happen if the explicit test gets dropped in the future. This is easier
to happen if the test and the extraction of the value are two separate
operations. In summary, this strategy can introduce an insidious
maintenance issue.
## Documentation
### General requirements
- All new features should be accompanied by documentation explaining:
- What the new feature does
- Why it is useful
- How to use the feature
- Any known issues or limitations
Links should be provided to GitHub issues tracking the issues
- The [documentation requirements document](Documentation-Requirements.md)
explains how the project formats documentation.
### Markdown syntax
Run the
[markdown checker](https://github.com/kata-containers/tests/tree/main/cmd/check-markdown)
on your documentation changes.
### Spell check
Run the
[spell checker](https://github.com/kata-containers/tests/tree/main/cmd/check-spelling)
on your documentation changes.
## Finally
You may wish to read the documentation that the
[Kata Review Team](https://github.com/kata-containers/community/blob/main/Rota-Process.md) use to help review PRs:
- [PR review guide](https://github.com/kata-containers/community/blob/main/PR-Review-Guide.md).
- [documentation review process](https://github.com/kata-containers/community/blob/main/Documentation-Review-Process.md).

View File

@@ -2,17 +2,10 @@
Kata Containers design documents:
- [Kata Containers architecture](architecture)
- [Kata Containers architecture](architecture.md)
- [API Design of Kata Containers](kata-api-design.md)
- [Design requirements for Kata Containers](kata-design-requirements.md)
- [VSocks](VSocks.md)
- [VCPU handling](vcpu-handling.md)
- [Host cgroups](host-cgroups.md)
- [`Inotify` support](inotify.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
- [Design for direct-assigned volume](direct-blk-device-assignment.md)
- [Design for core-scheduling](core-scheduling.md)
---
- [Design proposals](proposals)

View File

@@ -1,5 +1,12 @@
# Kata Containers and VSOCKs
- [Introduction](#introduction)
- [VSOCK communication diagram](#vsock-communication-diagram)
- [System requirements](#system-requirements)
- [Advantages of using VSOCKs](#advantages-of-using-vsocks)
- [High density](#high-density)
- [Reliability](#reliability)
## Introduction
There are two different ways processes in the virtual machine can communicate
@@ -67,15 +74,22 @@ Using a proxy for multiplexing the connections between the VM and the host uses
4.5MB per [POD][2]. In a high density deployment this could add up to GBs of
memory that could have been used to host more PODs. When we talk about density
each kilobyte matters and it might be the decisive factor between run another
POD or not. Before making the decision not to use VSOCKs, you should ask
POD or not. For example if you have 500 PODs running in a server, the same
amount of [`kata-proxy`][3] processes will be running and consuming for around
2250MB of RAM. Before making the decision not to use VSOCKs, you should ask
yourself, how many more containers can run with the memory RAM consumed by the
Kata proxies?
### Reliability
[`kata-proxy`][3] is in charge of multiplexing the connections between virtual
machine and host processes, if it dies all connections get broken. For example
if you have a [POD][2] with 10 containers running, if `kata-proxy` dies it would
be impossible to contact your containers, though they would still be running.
Since communication via VSOCKs is direct, the only way to lose communication
with the containers is if the VM itself or the `containerd-shim-kata-v2` dies, if this happens
the containers are removed automatically.
[1]: https://wiki.qemu.org/Features/VirtioVsock
[2]: ./vcpu-handling.md#virtual-cpus-and-kubernetes-pods
[3]: https://github.com/kata-containers/proxy

Binary file not shown.

Before

Width:  |  Height:  |  Size: 101 KiB

View File

@@ -1 +1 @@
<mxfile host="app.diagrams.net" modified="2021-11-05T13:07:32.992Z" agent="5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36" etag="j5e7J3AOXxeQrt-Zz2uw" version="15.6.8" type="device"><diagram id="XNV8G0dePIPkhS_Khqr4" name="Page-1">7Vxdd9o4EP01nLP7QI5s+fORUNhNT7rNbnqaZl/2CCywG2OxQhDIr18Z29iyZD6CDZRuHho8toQ9986dGVlNC3Yny98omvqfiIfDlg68ZQt+aOm6AUyb/4otq8TiGnpiGNPAS0wgNzwGbzgxapl1Hnh4ltoSEyMkZMFUNA5JFOEhE2yIUvIqXjYioScYpmiMJcPjEIWy9SnwmJ9YdQjd/MTvOBj76VdDCNI7n6Ds6tQw85FHXgsm2GvBLiWEJZ8myy4OY++JjulXnN3cGcUR22fAn3fPzx+jj7e9HrIXA330feIZ7czPCxTO00dO75atMh+MKZlP08swZXip8jwaZJcD+ca0zeNyomAywYyu+CXpRG3NNM1kUMoSXU+PX3OfG9nEfsHdG56gFOfxZvbcE/xD6gy1Yz7/88nuPiLwcNcZDLvEvZ10Vm1zt1+4WyIPx5NoLXj76gcMP07RMD77yqOB23w2CdPTIxKxlN78nuHtmCIv4A7qkpDQ9XzQxsjC8blREIYFu4ewMxpy+4xR8oILZ6yhgwcjfkZ2+Xa0yzDK2JzP81ZzntcNtedHI8+1LNnzo9FIHyo971kDy7Qa8Xx61gJCSGhyRHCtkXHRLLMhXGwJF659/KFrCwtHBkAbIA3rKgAAsHqdfjqDCBn/aRIYTRORUWiVa8rAwKZwcSRcuCQzFESYcrNWLz6K4HFtD9i2QrZM7HiGCjtHH8Ak3ETs+n0A+v0msdNhCTv7RkZPM1TwNSV37lb4asw6VwifDc4MnqJ88ldTTBfBjHulDB1/SibiI/o2IhEuAZGaUBiMI3445B7lvIC3sc8CXqZ20hOTwPPir1ESIqcMUORDn9DgLaZcmF7QnHKWUhqQ4TNUpUZj6GkSeuM5nvGUBl4wjeJW5odAsDnAzBJihiLgrIYgUz6DrJYSRqdvVwzblol82qJZM3Y75sfrV9wKGC+pXdEa7BTP16/s7/lLbVc0uY+8hn7lYGCkdgVKyJy0XdHkPvJn6VcOxu4C2xVte7t5zf3K0fCdv12Ry6efpl05XDgvrFvR5V7zmruVw/E6Z7OiDjcoQYK9MX5MDwllPhmTCIW93FpyXn7NPSHTFMXvmLFV6lI0Z0TEGC8D9q3w+TmeiueN5OjDMp15fbCSMdJijDg0dPUtuzI+KMwSH+bTrI+yeWRss3xO5nSYsfb+y53jDp6+P1r+y+fXj+HLU97AMETHmG1xalo+xI7cygqKQ8SCBRZuQwXxemiHUrQqXDAlQcRmhZkfYoPQBNqOmJstu0gYxQjD1ksjnBLFkrvICbd5nCNUA0qqwRidDpXMvEcDLiMCm/ZXAopnwVvaV8dcSF3IJzdvW+YHBctktmwNo727+erWHdxormWIMpEcHUYXGV0osmFz09kUZDSaYSZJymEIqyNHLqirEb5A7Xmv1hTZZB+nPa6sPVtFaqf45HyDBrRFvtnHEa5WQqklQy7ir5g6aiE6hjpqEbuQtOUCUf52p63yCFOzS6w7Lm1t9WtB1LqFNrMXjfmnlm6FcYE74CZrHH/6ZdOLeq04F/T5v92/7tqff5UoXeeyj+U5tqXsPWHHNGA2w37LPtlLib0L37auOa4IKpQXpDXHkktfq4bSV4lf1tb+HBpyXOmr75t+4L7pZ28ROQpjXY7RBxrP7eP5LH5wTBdYXla4rsATyz4LqALPaCbw1Mn7nHGXx9pzMdR2xF0eas/ZfCfJ3VDRcqrFrPa4e2fytk1bSbfq5F0eYTpmrclbyUH5XaTP2FRJzMvsOPUKJTi44QQ3Um6uqd/MXm9l/aZuiFM01x7ICwqV6J5MdqCY8QGwTqI98aQPmAbcpTFTj1wC21+P4J56tGGhCQEUK8QjaVhNs8NVzbHFezNAaSP7TlUrjTha1dDX0f1d+292B77+8dRGX7/MfHCmzPpOplaycPcah9tIslM1lpq4jWZTPGWTJBGTjjuGYVXftKXpLY0wHbh9hA5ca9uIZtpkKKfaF8RQe0KigCle6V3sXofDa2/NNfWbCgIVqm/dVLxgba76lrcvXGjb+64yetvK1u4VMKtuYTkOKjl0frQXI5VR8446FZqmXlNlCsqvQkqyXktpqnxnvMdWvOZ3hzqGmAgMR7EmYG928hR1yW5s05XCMcnaqRcsBAdZ/87j/5C4pmR7tuZkh1+gGdPlmpjZ+WzFNV9wbc/8YNJep5/FZnp+t+tvSC6uLx8ZDeejrfQ6aDtqBdTNrbzKujYjw5f6XK/a8esAwIt4hes7JgAGOKXrs/2o2o1e2qUtb51T1QZ1bL5SPoL8mvYCxEn5pkDNWKcpxir3rv+vToeGiF1BnMtSJ3n16ArUaX/XV6qTKW9Wq0md+GH+RwaSSiv/Ww2w9x8=</diagram></mxfile>
<mxfile host="Chrome" modified="2020-07-02T06:44:28.736Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36" etag="r7FpfnbGNK7jbg54Gu9x" version="13.3.5" type="device"><diagram id="XNV8G0dePIPkhS_Khqr4" name="Page-1">7VvZcuI4FP0aHqFky+sjkNCTqfR0qtLV6fTLlMDy0hiLscWWrx8Zy3iRQkhjQxbygnVlhK1zz9HRkg4cztZfYjT3vxIHhx0VOOsOvOqoKrRthX2kkU0WMTWYBbw4cLIQKAL3wRPOgkoeXQQOTngsC1FCQhrMq8EJiSI8oZUYimOyqt7mktCpBObIw0LgfoJCMfoQONTPoiqEdlHxFw48n/80hIA/+Qzld/NA4iOHrEoheN2Bw5gQml3N1kMcpr1X7ZjRM7W7J4txRA/5wrd/v5rDewTubvrjyZDYg1l/01W0rJklChf8lfnT0k3eBzFZRA5OW1E6cLDyA4rv52iS1q4Y6izm01nIq10SUQ4jwxAOvBg5AXvCIQlJvG0PmhgZOK1zgzAsxR2ELXfC4gmNyRSXaoyJhccuqxHfmXfDEscUr0sh3gdfMJlhGm/YLby2q+i6nn2J56QOzKy8KhDWctT8Eri7rEQ8q7xd60W/swve9a+BQW8PBlWTw+C6jm0YIgyu66oTKQyOMTZ0oykYNLsGQ/7OJRgYnUQYFENvCYYWUXgvZFDss5PBuHBBVc7OBQUKvY4dNjbyIompTzwSofC6iA4KXNKULu65JWTO0fiNKd1wONCCkipWeB3Qn6Xrx7Spns5LV2ve8raw4YUyy7R9iCRkEU/4u3i3328se/zw+97wp99Wf4fTh2I0pCj2MN3TOZwjaYfsBTjGIaLBsmomGodKhQJjKI3nEymAt2jMPFql01EYeBG7nrAOwyy/B2nqBswE9XnFLHCcDF+cBE9ovG0v7fo5CSK6fR190NGvDgJjb7YJpNlZO/6rFfMkJRPoKbahVUUtKx2MBm/8Ln27Ustqmonldrt2tQ3iugnLmzqeu4c8COK9mVkRRSNkXTpwgmUFZeO/RWopt0h0ky0UfXaDos3XWzzyenblpZ+sfykKIhw73cQPZt0poqi73DXPHnf7C9nNzY2Hmqi2yhgpWJWpLQDGdX/EW6jqM/trSoUts6bCUBwLFdPMk6Csw0YDg6EcePMV3H6D4szwiDc/y4XSt9Ji8bVtSSbq5nGibouivpdjL6o6TxjQA5ZuVjMGHKc0jQqJfKwQzdQHzpwj7YAkc+Sj17nswN7HLklGofHNCbglCrjhWKahyQQc9nUN5i20JeBMnMHLAi6bzLQm35r+megGjqKbeqhQw0OF+jR8U0W+3cVp2z5eJOmL45gl8ccmnm1XiGdIltQUS2uHePJx7py8K7j2WKbaC7wrqPaYt3eSYQ5KZr1yMTsb76QQi1Min9K5FPe3OelVnyHaq+e8oMfYVWXgkVPefIKrKL3aAlN71lRcxXgWz5PxWP2YRIYHEnmXXxBa1fSyj8uvRrMJ/XBvb7q/6A348c9DF/34nvjgzANA61lzgizJMX4jNguKer9dqpqRKKCkXX917pUpW1drS48yh6Xqp1yZ0kS9Tshk2hwOsl0xCwATynDo6wBooG0cLFibYFoiCjIQYGs2VxH6+43OL/9omNu32vLyqozxpuyqKurXe9uleZYyf+BYoa6rx3mInJWGUuFkVwOn86yKgOllW6Zp0TVr2zKaRHRPvS2jiWT+8IOfqcBeDQodqucd/8TdMeSl7/iRzaCm1bYpVREEWwZCW0dFLAGEnXilCtcsGJLTO7bpANOUEEbHliNdFbXUMczO+1SBGo0AGI2aAgqqdaC0XKTK0qWdkjB79oZYWJw0f1qsDMkgc1Kk8vN15fWwzRzHyyCRzHbZi9IqGNWOjEiEa73OQ4f7Shn61blE/aidT+LgKc2vsPPCssWrzizWBVB2ZlF2ZLE1qEQb6C1wkprUKY6j9Ez8u4CrmeEJVNGBsk1Y46TwiCdKP59L0HOhOpdLkJxkutgE2dCjw/PbBGW/p7v4hB1Y5tl9gmjpLj5B6hNka+Yn9QmqaOkuPmGHjtaeT2DF4h/tsrW/4v8V4fX/</diagram></mxfile>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 90 KiB

After

Width:  |  Height:  |  Size: 93 KiB

View File

@@ -1 +0,0 @@
<mxfile host="app.diagrams.net" modified="2022-01-18T14:06:01.890Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" etag="nId-8OV6FDjWTDgzqDu-" version="15.8.9" type="device"><diagram id="bkF_ZONM9sPFCpIYoGFl" name="Page-1">5Vtbj6M2GP01eUyEbW55nGSmM2q70mqnUnf6UrnBCdYSnIIzSfbX14AJYDsLSSDZUWdHGvxhDD7n8F1sdoTm6/1zgjfhJxaQaAStYD9CjyMIgQ1d8SezHAqL504LwyqhgexUGV7pdyKNlrRuaUDSRkfOWMTppmlcsDgmC96w4SRhu2a3JYuad93gFdEMrwsc6dY/acDDwupDr7K/ELoKyzuDcn5rXHaWM0lDHLBdzYSeRmieMMaLo/V+TqIMvBKX4rpfTpw9PlhCYt7lgpd4/J1SSP9++bR8Cb6A1de/XsZ2Mco7jrZywiFLuXxgfihR4GQv7jEL+ToSBiAOU56wb2TOIpYIS8xi0XO2pFGkmHBEV7FoLsRTEmGfvZOEU4HvgzyxpkGQ3Wa2Cyknrxu8yO65E2oStoRt44BkE7BES5+xBCEbk+xrJonAM2FrwpOD6CLP+o5kQ8rRsZ2ivavItWWXsMZrSSKWclodR64QFwcSdDMB8ycYj+f+Hw/+b3jxvN57e/vXsa8RoIHfBMEEU42XJYu5fIugfeSplC5QSBpBtHSyf/LKmr340ZgWZ9z858iHBr6BopN8INDkAwGdj6llIMSxh2JkamDEjbhEqEGN+++WlSfGaY76g+gA3c2+OimOVtnf+BBs03Ea400aMp69DHJY8ZTFyEW/H/AP+uC/D9aQNbFAkzjDiwQ8A3H+ULyVSrqCOARNxInQwjGNSRIMzth0OMacCYJN14csnTFnOkG+Tpo3GGnAQJqCJomDhyySZ1EkwmlKFzlKOOG6uYZr023WUBYTRDOBW3L4mp2cOGXzTV6ZNx738sqidWjEIBJoWYMWlFK2TRakg2DFTFaEt3kkndoab47JQ0pbQiLM6XvzeU1Eyjt8ZjR/W0rluErELD10OUQxT3lVPf9QBrIVV2+7ykAFDtpAua6O075Cauh6x97iH8ZpSNfjb5jj8TscxFn04Aocx2n3A65BUMM5AT0L7c+lwqFcqg8UHKEeAVGJdSOXdAYD0rle4tOTucvw4W8wrhyvyZU7NWQr0KB5dzCq3OupMqaZufcRVWnOzwfNVnxbiTlTg4tCP4h5/dPlXZin1KA7phxjkT3DRtZhTbxj+0Tikbc+k4SKCWWFdGHcU/61HF4cv1UJjWhVI2WNITIYdM/MxIOKStSEomtmosrNVVOcoTOTDosAncWl5LNWm6ykgirVvNX0dCMFdciBC0ruJjWkKAReKjWnZaCBpQZNRfLFUmu6sFYPdmdn1bXcuq9Xc1WFqClIV6mpA3nWjaV2aWlfl9oFkql5QgvYTYkC95Ioexd/Z/9MoVWLiJ39HWiJ0UOLEBpEeF6aDXxTmr3akrRzhv0zbZ9cl5grcdBxJL732j6BpqWDM/k1llHFNthHordZifn9EA6A4gmQYZXjtozraxxzoFFyaU2bB4hBalpggROpX1tRO9gaBNTXILLt6GX6IeH0O8KJBoNTXyOg6+zzAhGOPw6sSi3sGTZkgWlDdjhYTdXxmS7eMbn4NBSwBDQZZJ2s9OwRWfJ+qJmq+bxxq/yGxKAOteStNzc0t2BC6aZeodx1/d/LV0kdfeve8jXtB95ZvtNpO0i3VW+Hrbm2Iv70RjysL0DWS/xbrQkVL+e9qmzfP8H2uVW2Fhrs21bZyLTv2K9KykWd4wJkvx9rtK7HFFnIvZQCLNiXVFxVKt7kxmLRq47yo7g8mpmL63Mrahm4TtbTqXDjNF79nnd7tCvLF0leZmLi8mWUazYUFxIxwmyT4ZIj5czEr0Bznq1IOuJZ56INqrb4zbonfM5i8fiY5pojOOW7bO0okzzHHP+Tz1Sv4HvLiFzHLJ2adD3DZwrDxZRet7vO24MIcBoe43mP7qEQ9f3cg6VwrC6/dHUP6kYXALA//yCa1efuRffqPw2gp/8A</diagram></mxfile>

Binary file not shown.

Before

Width:  |  Height:  |  Size: 51 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.2 MiB

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 1.0 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 390 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 942 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 182 KiB

311
docs/design/architecture.md Normal file
View File

@@ -0,0 +1,311 @@
# Kata Containers Architecture
- [Kata Containers Architecture](#kata-containers-architecture)
- [Overview](#overview)
- [Virtualization](#virtualization)
- [Guest assets](#guest-assets)
- [Guest kernel](#guest-kernel)
- [Guest image](#guest-image)
- [Root filesystem image](#root-filesystem-image)
- [Initrd image](#initrd-image)
- [Agent](#agent)
- [Runtime](#runtime)
- [Configuration](#configuration)
- [Networking](#networking)
- [Network Hotplug](#network-hotplug)
- [Storage](#storage)
- [Kubernetes support](#kubernetes-support)
- [OCI annotations](#oci-annotations)
- [Mixing VM based and namespace based runtimes](#mixing-vm-based-and-namespace-based-runtimes)
- [Appendices](#appendices)
- [DAX](#dax)
## Overview
This is an architectural overview of Kata Containers, based on the 2.0 release.
The primary deliverable of the Kata Containers project is a CRI friendly shim. There is also a CRI friendly library API behind them.
The [Kata Containers runtime](../../src/runtime)
is compatible with the [OCI](https://github.com/opencontainers) [runtime specification](https://github.com/opencontainers/runtime-spec)
and therefore works seamlessly with the [Kubernetes\* Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
through the [CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and
[Containerd\*](https://github.com/containerd/containerd) implementation.
Kata Containers creates a QEMU\*/KVM virtual machine for pod that `kubelet` (Kubernetes) creates respectively.
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/containerd-shim-v2)
is the Kata Containers entrypoint, which
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
Before `shimv2` (as done in [Kata Containers 1.x releases](https://github.com/kata-containers/runtime/releases)), we need to create a `containerd-shim` and a [`kata-shim`](https://github.com/kata-containers/shim) for each container and the Pod sandbox itself, plus an optional [`kata-proxy`](https://github.com/kata-containers/proxy) when VSOCK is not available. With `shimv2`, Kubernetes can launch Pod and OCI compatible containers with one shim (the `shimv2`) per Pod instead of `2N+1` shims, and no standalone `kata-proxy` process even if no VSOCK is available.
![Kubernetes integration with shimv2](arch-images/shimv2.svg)
The container process is then spawned by
[`kata-agent`](../../src/agent), an agent process running
as a daemon inside the virtual machine. `kata-agent` runs a [`ttRPC`](https://github.com/containerd/ttrpc-rust) server in
the guest using a VIRTIO serial or VSOCK interface which QEMU exposes as a socket
file on the host. `shimv2` uses a `ttRPC` protocol to communicate with
the agent. This protocol allows the runtime to send container management
commands to the agent. The protocol is also used to carry the I/O streams (stdout,
stderr, stdin) between the containers and the manage engines (e.g. CRI-O or containerd).
For any given container, both the init process and all potentially executed
commands within that container, together with their related I/O streams, need
to go through the VSOCK interface exported by QEMU.
The container workload, that is, the actual OCI bundle rootfs, is exported from the
host to the virtual machine. In the case where a block-based graph driver is
configured, `virtio-scsi` will be used. In all other cases a 9pfs VIRTIO mount point
will be used. `kata-agent` uses this mount point as the root filesystem for the
container processes.
## Virtualization
How Kata Containers maps container concepts to virtual machine technologies, and how this is realized in the multiple
hypervisors and VMMs that Kata supports is described within the [virtualization documentation](./virtualization.md)
## Guest assets
The hypervisor will launch a virtual machine which includes a minimal guest kernel
and a guest image.
### Guest kernel
The guest kernel is passed to the hypervisor and used to boot the virtual
machine. The default kernel provided in Kata Containers is highly optimized for
kernel boot time and minimal memory footprint, providing only those services
required by a container workload. This is based on a very current upstream Linux
kernel.
### Guest image
Kata Containers supports both an `initrd` and `rootfs` based minimal guest image.
#### Root filesystem image
The default packaged root filesystem image, sometimes referred to as the "mini O/S", is a
highly optimized container bootstrap system based on [Clear Linux](https://clearlinux.org/). It provides an extremely minimal environment and
has a highly optimized boot path.
The only services running in the context of the mini O/S are the init daemon
(`systemd`) and the [Agent](#agent). The real workload the user wishes to run
is created using libcontainer, creating a container in the same manner that is done
by `runc`.
For example, when `ctr run -ti ubuntu date` is run:
- The hypervisor will boot the mini-OS image using the guest kernel.
- `systemd`, running inside the mini-OS context, will launch the `kata-agent` in
the same context.
- The agent will create a new confined context to run the specified command in
(`date` in this example).
- The agent will then execute the command (`date` in this example) inside this
new context, first setting the root filesystem to the expected Ubuntu\* root
filesystem.
#### Initrd image
A compressed `cpio(1)` archive, created from a rootfs which is loaded into memory and used as part of the Linux startup process. During startup, the kernel unpacks it into a special instance of a `tmpfs` that becomes the initial root filesystem.
The only service running in the context of the initrd is the [Agent](#agent) as the init daemon. The real workload the user wishes to run is created using libcontainer, creating a container in the same manner that is done by `runc`.
## Agent
[`kata-agent`](../../src/agent) is a process running in the guest as a supervisor for managing containers and processes running within those containers.
For the 2.0 release, the `kata-agent` is rewritten in the [RUST programming language](https://www.rust-lang.org/) so that we can minimize its memory footprint while keeping the memory safety of the original GO version of [`kata-agent` used in Kata Container 1.x](https://github.com/kata-containers/agent). This memory footprint reduction is pretty impressive, from tens of megabytes down to less than 100 kilobytes, enabling Kata Containers in more use cases like functional computing and edge computing.
The `kata-agent` execution unit is the sandbox. A `kata-agent` sandbox is a container sandbox defined by a set of namespaces (NS, UTS, IPC and PID). `shimv2` can
run several containers per VM to support container engines that require multiple
containers running inside a pod.
`kata-agent` communicates with the other Kata components over `ttRPC`.
## Runtime
`containerd-shim-kata-v2` is a [containerd runtime shimv2](https://github.com/containerd/containerd/blob/v1.4.1/runtime/v2/README.md) implementation and is responsible for handling the `runtime v2 shim APIs`, which is similar to [the OCI runtime specification](https://github.com/opencontainers/runtime-spec) but simplifies the architecture by loading the runtime once and making RPC calls to handle the various container lifecycle commands. This refinement is an improvement on the OCI specification which requires the container manager call the runtime binary multiple times, at least once for each lifecycle command.
`containerd-shim-kata-v2` heavily utilizes the
[virtcontainers package](../../src/runtime/virtcontainers/), which provides a generic, runtime-specification agnostic, hardware-virtualized containers library.
### Configuration
The runtime uses a TOML format configuration file called `configuration.toml`. By default this file is installed in the `/usr/share/defaults/kata-containers` directory and contains various settings such as the paths to the hypervisor, the guest kernel and the mini-OS image.
The actual configuration file paths can be determined by running:
```
$ kata-runtime --kata-show-default-config-paths
```
Most users will not need to modify the configuration file.
The file is well commented and provides a few "knobs" that can be used to modify the behavior of the runtime and your chosen hypervisor.
The configuration file is also used to enable runtime [debug output](../Developer-Guide.md#enable-full-debug).
## Networking
Containers will typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network, but
which is shared between containers
In order to do so, container engines will usually add one end of a virtual
ethernet (`veth`) pair into the container networking namespace. The other end of
the `veth` pair is added to the host networking namespace.
This is a very namespace-centric approach as many hypervisors/VMMs cannot handle `veth`
interfaces. Typically, `TAP` interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
and virtual machines, Kata Containers networking transparently connects `veth`
interfaces with `TAP` ones using Traffic Control:
![Kata Containers networking](arch-images/network.png)
With a TC filter in place, a redirection is created between the container network and the
virtual machine. As an example, the CNI may create a device, `eth0`, in the container's network
namespace, which is a VETH device. Kata Containers will create a tap device for the VM, `tap0_kata`,
and setup a TC redirection filter to mirror traffic from `eth0`'s ingress to `tap0_kata`'s egress,
and a second to mirror traffic from `tap0_kata`'s ingress to `eth0`'s egress.
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata. TC-filter
is the default because it allows for simpler configuration, better CNI plugin compatibility, and performance
on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.
Kata Containers supports both
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md#the-container-network-model)
and [CNI](https://github.com/containernetworking/cni) for networking management.
### Network Hotplug
Kata Containers has developed a set of network sub-commands and APIs to add, list and
remove a guest network endpoint and to manipulate the guest route table.
The following diagram illustrates the Kata Containers network hotplug workflow.
![Network Hotplug](arch-images/kata-containers-network-hotplug.png)
## Storage
Container workloads are shared with the virtualized environment through [virtio-fs](https://virtio-fs.gitlab.io/).
The [devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/master/snapshots/devmapper) is a special case. The `snapshotter` uses dedicated block devices rather than formatted filesystems, and operates at the block level rather than the file level. This knowledge is used to directly use the underlying block device instead of the overlay file system for the container root file system. The block device maps to the top read-write layer for the overlay. This approach gives much better I/O performance compared to using `virtio-fs` to share the container file system.
Kata Containers has the ability to hotplug and remove block devices, which makes it possible to use block devices for containers started after the VM has been launched.
Users can check to see if the container uses the devicemapper block device as its rootfs by calling `mount(8)` within the container. If the devicemapper block device
is used, `/` will be mounted on `/dev/vda`. Users can disable direct mounting of the underlying block device through the runtime configuration.
## Kubernetes support
[Kubernetes\*](https://github.com/kubernetes/kubernetes/) is a popular open source
container orchestration engine. In Kubernetes, a set of containers sharing resources
such as networking, storage, mount, PID, etc. is called a
[Pod](https://kubernetes.io/docs/user-guide/pods/).
A node can have multiple pods, but at a minimum, a node within a Kubernetes cluster
only needs to run a container runtime and a container agent (called a
[Kubelet](https://kubernetes.io/docs/admin/kubelet/)).
A Kubernetes cluster runs a control plane where a scheduler (typically running on a
dedicated master node) calls into a compute Kubelet. This Kubelet instance is
responsible for managing the lifecycle of pods within the nodes and eventually relies
on a container runtime to handle execution. The Kubelet architecture decouples
lifecycle management from container execution through the dedicated
`gRPC` based [Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/container-runtime-interface-v1.md).
In other words, a Kubelet is a CRI client and expects a CRI implementation to
handle the server side of the interface.
[CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and [Containerd\*](https://github.com/containerd/containerd/) are CRI implementations that rely on [OCI](https://github.com/opencontainers/runtime-spec)
compatible runtimes for managing container instances.
Kata Containers is an officially supported CRI-O and Containerd runtime. Refer to the following guides on how to set up Kata Containers with Kubernetes:
- [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
- [Run Kata Containers with Kubernetes](../how-to/run-kata-with-k8s.md)
#### OCI annotations
In order for the Kata Containers runtime (or any virtual machine based OCI compatible
runtime) to be able to understand if it needs to create a full virtual machine or if it
has to create a new container inside an existing pod's virtual machine, CRI-O adds
specific annotations to the OCI configuration file (`config.json`) which is passed to
the OCI compatible runtime.
Before calling its runtime, CRI-O will always add a `io.kubernetes.cri-o.ContainerType`
annotation to the `config.json` configuration file it produces from the Kubelet CRI
request. The `io.kubernetes.cri-o.ContainerType` annotation can either be set to `sandbox`
or `container`. Kata Containers will then use this annotation to decide if it needs to
respectively create a virtual machine or a container inside a virtual machine associated
with a Kubernetes pod:
```Go
containerType, err := ociSpec.ContainerType()
if err != nil {
return err
}
handleFactory(ctx, runtimeConfig)
disableOutput := noNeedForOutput(detach, ociSpec.Process.Terminal)
var process vc.Process
switch containerType {
case vc.PodSandbox:
process, err = createSandbox(ctx, ociSpec, runtimeConfig, containerID, bundlePath, console, disableOutput, systemdCgroup)
if err != nil {
return err
}
case vc.PodContainer:
process, err = createContainer(ctx, ociSpec, containerID, bundlePath, console, disableOutput)
if err != nil {
return err
}
}
```
#### Mixing VM based and namespace based runtimes
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/)
> has been supported and the user can specify runtime without the non-standardized annotations.
With `RuntimeClass`, users can define Kata Containers as a `RuntimeClass` and then explicitly specify that a pod being created as a Kata Containers pod. For details, please refer to [How to use Kata Containers and Containerd](../../docs/how-to/containerd-kata.md).
# Appendices
## DAX
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
feature to efficiently map some host-side files into the guest VM space.
In particular, Kata Containers uses the QEMU NVDIMM feature to provide a
memory-mapped virtual device that can be used to DAX map the virtual machine's
root filesystem into the guest memory address space.
Mapping files using DAX provides a number of benefits over more traditional VM
file and device mapping mechanisms:
- Mapping as a direct access devices allows the guest to directly access
the host memory pages (such as via Execute In Place (XIP)), bypassing the guest
page cache. This provides both time and space optimizations.
- Mapping as a direct access device inside the VM allows pages from the
host to be demand loaded using page faults, rather than having to make requests
via a virtualized device (causing expensive VM exits/hypercalls), thus providing
a speed optimization.
- Utilizing `MAP_SHARED` shared memory on the host allows the host to efficiently
share pages.
Kata Containers uses the following steps to set up the DAX mappings:
1. QEMU is configured with an NVDIMM memory device, with a memory file
backend to map in the host-side file into the virtual NVDIMM space.
2. The guest kernel command line mounts this NVDIMM device with the DAX
feature enabled, allowing direct page mapping and access, thus bypassing the
guest page cache.
![DAX](arch-images/DAX.png)
Information on the use of NVDIMM via QEMU is available in the [QEMU source code](http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/nvdimm.txt;hb=HEAD)

View File

@@ -1,477 +0,0 @@
# Kata Containers Architecture
## Overview
Kata Containers is an open source community working to build a secure
container [runtime](#runtime) with lightweight virtual machines (VM's)
that feel and perform like standard Linux containers, but provide
stronger [workload](#workload) isolation using hardware
[virtualization](#virtualization) technology as a second layer of
defence.
Kata Containers runs on [multiple architectures](../../../src/runtime/README.md#platform-support)
and supports [multiple hypervisors](../../hypervisors.md).
This document is a summary of the Kata Containers architecture.
## Background knowledge
This document assumes the reader understands a number of concepts
related to containers and file systems. The
[background](background.md) document explains these concepts.
## Example command
This document makes use of a particular [example
command](example-command.md) throughout the text to illustrate certain
concepts.
## Virtualization
For details on how Kata Containers maps container concepts to VM
technologies, and how this is realized in the multiple hypervisors and
VMMs that Kata supports see the
[virtualization documentation](../virtualization.md).
## Compatibility
The [Kata Containers runtime](../../../src/runtime) is compatible with
the [OCI](https://github.com/opencontainers)
[runtime specification](https://github.com/opencontainers/runtime-spec)
and therefore works seamlessly with the
[Kubernetes Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
through the [CRI-O](https://github.com/kubernetes-incubator/cri-o)
and [containerd](https://github.com/containerd/containerd)
implementations.
Kata Containers provides a ["shimv2"](#shim-v2-architecture) compatible runtime.
## Shim v2 architecture
The Kata Containers runtime is shim v2 ("shimv2") compatible. This
section explains what this means.
> **Note:**
>
> For a comparison with the Kata 1.x architecture, see
> [the architectural history document](history.md).
The
[containerd runtime shimv2 architecture](https://github.com/containerd/containerd/tree/main/runtime/v2)
or _shim API_ architecture resolves the issues with the old
architecture by defining a set of shimv2 APIs that a compatible
runtime implementation must supply. Rather than calling the runtime
binary multiple times for each new container, the shimv2 architecture
runs a single instance of the runtime binary (for any number of
containers). This improves performance and resolves the state handling
issue.
The shimv2 API is similar to the
[OCI runtime](https://github.com/opencontainers/runtime-spec)
API in terms of the way the container lifecycle is split into
different verbs. Rather than calling the runtime multiple times, the
container manager creates a socket and passes it to the shimv2
runtime. The socket is a bi-directional communication channel that
uses a gRPC based protocol to allow the container manager to send API
calls to the runtime, which returns the result to the container
manager using the same channel.
The shimv2 architecture allows running several containers per VM to
support container engines that require multiple containers running
inside a pod.
With the new architecture [Kubernetes](kubernetes.md) can
launch both Pod and OCI compatible containers with a single
[runtime](#runtime) shim per Pod, rather than `2N+1` shims. No stand
alone `kata-proxy` process is required, even if VSOCK is not
available.
## Workload
The workload is the command the user requested to run in the
container and is specified in the [OCI bundle](background.md#oci-bundle)'s
configuration file.
In our [example](example-command.md), the workload is the `sh(1)` command.
### Workload root filesystem
For details of how the [runtime](#runtime) makes the
[container image](background.md#container-image) chosen by the user available to
the workload process, see the
[Container creation](#container-creation) and [storage](#storage) sections.
Note that the workload is isolated from the [guest VM](#environments) environment by its
surrounding [container environment](#environments). The guest VM
environment where the container runs in is also isolated from the _outer_
[host environment](#environments) where the container manager runs.
## System overview
### Environments
The following terminology is used to describe the different or
environments (or contexts) various processes run in. It is necessary
to study this table closely to make sense of what follows:
| Type | Name | Virtualized | Containerized | rootfs | Rootfs device type | Mount type | Description |
|-|-|-|-|-|-|-|-|
| Host | Host | no `[1]` | no | Host specific | Host specific | Host specific | The environment provided by a standard, physical non virtualized system. |
| VM root | Guest VM | yes | no | rootfs inside the [guest image](guest-assets.md#guest-image) | Hypervisor specific `[2]` | `ext4` | The first (or top) level VM environment created on a host system. |
| VM container root | Container | yes | yes | rootfs type requested by user ([`ubuntu` in the example](example-command.md)) | `kataShared` | [virtio FS](storage.md#virtio-fs) | The first (or top) level container environment created inside the VM. Based on the [OCI bundle](background.md#oci-bundle). |
**Key:**
- `[1]`: For simplicity, this document assumes the host environment
runs on physical hardware.
- `[2]`: See the [DAX](#dax) section.
> **Notes:**
>
> - The word "root" is used to mean _top level_ here in a similar
> manner to the term [rootfs](background.md#root-filesystem).
>
> - The term "first level" prefix used above is important since it implies
> that it is possible to create multi level systems. However, they do
> not form part of a standard Kata Containers environment so will not
> be considered in this document.
The reasons for containerizing the [workload](#workload) inside the VM
are:
- Isolates the workload entirely from the VM environment.
- Provides better isolation between containers in a [pod](kubernetes.md).
- Allows the workload to be managed and monitored through its cgroup
confinement.
### Container creation
The steps below show at a high level how a Kata Containers container is
created using the containerd container manager:
1. The user requests the creation of a container by running a command
like the [example command](example-command.md).
1. The container manager daemon runs a single instance of the Kata
[runtime](#runtime).
1. The Kata runtime loads its [configuration file](#configuration).
1. The container manager calls a set of shimv2 API functions on the runtime.
1. The Kata runtime launches the configured [hypervisor](#hypervisor).
1. The hypervisor creates and starts (_boots_) a VM using the
[guest assets](guest-assets.md#guest-assets):
- The hypervisor [DAX](#dax) shares the
[guest image](guest-assets.md#guest-image)
into the VM to become the VM [rootfs](background.md#root-filesystem) (mounted on a `/dev/pmem*` device),
which is known as the [VM root environment](#environments).
- The hypervisor mounts the [OCI bundle](background.md#oci-bundle), using [virtio FS](storage.md#virtio-fs),
into a container specific directory inside the VM's rootfs.
This container specific directory will become the
[container rootfs](#environments), known as the
[container environment](#environments).
1. The [agent](#agent) is started as part of the VM boot.
1. The runtime calls the agent's `CreateSandbox` API to request the
agent create a container:
1. The agent creates a [container environment](#environments)
in the container specific directory that contains the [container rootfs](#environments).
The container environment hosts the [workload](#workload) in the
[container rootfs](#environments) directory.
1. The agent spawns the workload inside the container environment.
> **Notes:**
>
> - The container environment created by the agent is equivalent to
> a container environment created by the
> [`runc`](https://github.com/opencontainers/runc) OCI runtime;
> Linux cgroups and namespaces are created inside the VM by the
> [guest kernel](guest-assets.md#guest-kernel) to isolate the
> workload from the VM environment the container is created in.
> See the [Environments](#environments) section for an
> explanation of why this is done.
>
> - See the [guest image](guest-assets.md#guest-image) section for
> details of exactly how the agent is started.
1. The container manager returns control of the container to the
user running the `ctr` command.
> **Note:**
>
> At this point, the container is running and:
>
> - The [workload](#workload) process ([`sh(1)` in the example](example-command.md))
> is running in the [container environment](#environments).
> - The user is now able to interact with the workload
> (using the [`ctr` command in the example](example-command.md)).
> - The [agent](#agent), running inside the VM is monitoring the
> [workload](#workload) process.
> - The [runtime](#runtime) is waiting for the agent's `WaitProcess` API
> call to complete.
Further details of these steps are provided in the sections below.
### Container shutdown
There are two possible ways for the container environment to be
terminated:
- When the [workload](#workload) exits.
This is the standard, or _graceful_ shutdown method.
- When the container manager forces the container to be deleted.
#### Workload exit
The [agent](#agent) will detect when the [workload](#workload) process
exits, capture its exit status (see `wait(2)`) and return that value
to the [runtime](#runtime) by specifying it as the response to the
`WaitProcess` agent API call made by the [runtime](#runtime).
The runtime then passes the value back to the container manager by the
`Wait` [shimv2 API](#shim-v2-architecture) call.
Once the workload has fully exited, the VM is no longer needed and the
runtime cleans up the environment (which includes terminating the
[hypervisor](#hypervisor) process).
> **Note:**
>
> When [agent tracing is enabled](../../tracing.md#agent-shutdown-behaviour),
> the shutdown behaviour is different.
#### Container manager requested shutdown
If the container manager requests the container be deleted, the
[runtime](#runtime) will signal the agent by sending it a
`DestroySandbox` [ttRPC API](../../../src/libs/protocols/protos/agent.proto) request.
## Guest assets
The guest assets comprise a guest image and a guest kernel that are
used by the [hypervisor](#hypervisor).
See the [guest assets](guest-assets.md) document for further
information.
## Hypervisor
The [hypervisor](../../hypervisors.md) specified in the
[configuration file](#configuration) creates a VM to host the
[agent](#agent) and the [workload](#workload) inside the
[container environment](#environments).
> **Note:**
>
> The hypervisor process runs inside an environment slightly different
> to the host environment:
>
> - It is run in a different cgroup environment to the host.
> - It is given a separate network namespace from the host.
> - If the [OCI configuration specifies a SELinux label](https://github.com/opencontainers/runtime-spec/blob/main/config.md#linux-process),
> the hypervisor process will run with that label (*not* the workload running inside the hypervisor's VM).
## Agent
The Kata Containers agent ([`kata-agent`](../../../src/agent)), written
in the [Rust programming language](https://www.rust-lang.org), is a
long running process that runs inside the VM. It acts as the
supervisor for managing the containers and the [workload](#workload)
running within those containers. Only a single agent process is run
for each VM created.
### Agent communications protocol
The agent communicates with the other Kata components (primarily the
[runtime](#runtime)) using a
[`ttRPC`](https://github.com/containerd/ttrpc-rust) based
[protocol](../../../src/libs/protocols/protos).
> **Note:**
>
> If you wish to learn more about this protocol, a practical way to do
> so is to experiment with the
> [agent control tool](#agent-control-tool) on a test system.
> This tool is for test and development purposes only and can send
> arbitrary ttRPC agent API commands to the [agent](#agent).
## Runtime
The Kata Containers runtime (the [`containerd-shim-kata-v2`](../../../src/runtime/cmd/containerd-shim-kata-v2
) binary) is a [shimv2](#shim-v2-architecture) compatible runtime.
> **Note:**
>
> The Kata Containers runtime is sometimes referred to as the Kata
> _shim_. Both terms are correct since the `containerd-shim-kata-v2`
> is a container runtime, and that runtime implements the containerd
> shim v2 API.
The runtime makes heavy use of the [`virtcontainers`
package](../../../src/runtime/virtcontainers), which provides a generic,
runtime-specification agnostic, hardware-virtualized containers
library.
The runtime is responsible for starting the [hypervisor](#hypervisor)
and it's VM, and communicating with the [agent](#agent) using a
[ttRPC based protocol](#agent-communications-protocol) over a VSOCK
socket that provides a communications link between the VM and the
host.
This protocol allows the runtime to send container management commands
to the agent. The protocol is also used to carry the standard I/O
streams (`stdout`, `stderr`, `stdin`) between the containers and
container managers (such as CRI-O or containerd).
## Utility program
The `kata-runtime` binary is a utility program that provides
administrative commands to manipulate and query a Kata Containers
installation.
> **Note:**
>
> In Kata 1.x, this program also acted as the main
> [runtime](#runtime), but this is no longer required due to the
> improved shimv2 architecture.
### exec command
The `exec` command allows an administrator or developer to enter the
[VM root environment](#environments) which is not accessible by the container
[workload](#workload).
See [the developer guide](../../Developer-Guide.md#connect-to-debug-console) for further details.
### Configuration
See the [configuration file details](../../../src/runtime/README.md#configuration).
The configuration file is also used to enable runtime [debug output](../../Developer-Guide.md#enable-full-debug).
## Process overview
The table below shows an example of the main processes running in the
different [environments](#environments) when a Kata Container is
created with containerd using our [example command](example-command.md):
| Description | Host | VM root environment | VM container environment |
|-|-|-|-|
| Container manager | `containerd` | |
| Kata Containers | [runtime](#runtime), [`virtiofsd`](storage.md#virtio-fs), [hypervisor](#hypervisor) | [agent](#agent) |
| User [workload](#workload) | | | [`ubuntu sh`](example-command.md) |
## Networking
See the [networking document](networking.md).
## Storage
See the [storage document](storage.md).
## Kubernetes support
See the [Kubernetes document](kubernetes.md).
#### OCI annotations
In order for the Kata Containers [runtime](#runtime) (or any VM based OCI compatible
runtime) to be able to understand if it needs to create a full VM or if it
has to create a new container inside an existing pod's VM, CRI-O adds
specific annotations to the OCI configuration file (`config.json`) which is passed to
the OCI compatible runtime.
Before calling its runtime, CRI-O will always add a `io.kubernetes.cri-o.ContainerType`
annotation to the `config.json` configuration file it produces from the Kubelet CRI
request. The `io.kubernetes.cri-o.ContainerType` annotation can either be set to `sandbox`
or `container`. Kata Containers will then use this annotation to decide if it needs to
respectively create a virtual machine or a container inside a virtual machine associated
with a Kubernetes pod:
| Annotation value | Kata VM created? | Kata container created? |
|-|-|-|
| `sandbox` | yes | yes (inside new VM) |
| `container`| no | yes (in existing VM) |
#### Mixing VM based and namespace based runtimes
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/)
> has been supported and the user can specify runtime without the non-standardized annotations.
With `RuntimeClass`, users can define Kata Containers as a
`RuntimeClass` and then explicitly specify that a pod must be created
as a Kata Containers pod. For details, please refer to [How to use
Kata Containers and containerd](../../../docs/how-to/containerd-kata.md).
## Tracing
The [tracing document](../../tracing.md) provides details on the tracing
architecture.
# Appendices
## DAX
Kata Containers utilizes the Linux kernel DAX
[(Direct Access filesystem)](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.rst?h=v5.14)
feature to efficiently map the [guest image](guest-assets.md#guest-image) in the
[host environment](#environments) into the
[guest VM environment](#environments) to become the VM's
[rootfs](background.md#root-filesystem).
If the [configured](#configuration) [hypervisor](#hypervisor) is set
to either QEMU or Cloud Hypervisor, DAX is used with the feature shown
in the table below:
| Hypervisor | Feature used | rootfs device type |
|-|-|-|
| Cloud Hypervisor (CH) | `dax` `FsConfig` configuration option | PMEM (emulated Persistent Memory device) |
| QEMU | NVDIMM memory device with a memory file backend | NVDIMM (emulated Non-Volatile Dual In-line Memory Module device) |
The features in the table above are equivalent in that they provide a memory-mapped
virtual device which is used to DAX map the VM's
[rootfs](background.md#root-filesystem) into the [VM guest](#environments) memory
address space.
The VM is then booted, specifying the `root=` kernel parameter to make
the [guest kernel](guest-assets.md#guest-kernel) use the appropriate emulated device
as its rootfs.
### DAX advantages
Mapping files using [DAX](#dax) provides a number of benefits over
more traditional VM file and device mapping mechanisms:
- Mapping as a direct access device allows the guest to directly
access the host memory pages (such as via Execute In Place (XIP)),
bypassing the [guest kernel](guest-assets.md#guest-kernel)'s page cache. This
zero copy provides both time and space optimizations.
- Mapping as a direct access device inside the VM allows pages from the
host to be demand loaded using page faults, rather than having to make requests
via a virtualized device (causing expensive VM exits/hypercalls), thus providing
a speed optimization.
- Utilizing `mmap(2)`'s `MAP_SHARED` shared memory option on the host
allows the host to efficiently share pages.
![DAX](../arch-images/DAX.png)
For further details of the use of NVDIMM with QEMU, see the [QEMU
project documentation](https://www.qemu.org).
## Agent control tool
The [agent control tool](../../../src/tools/agent-ctl) is a test and
development tool that can be used to learn more about a Kata
Containers system.
## Terminology
See the [project glossary](../../../Glossary.md).

View File

@@ -1,81 +0,0 @@
# Kata Containers architecture background knowledge
The following sections explain some of the background concepts
required to understand the [architecture document](README.md).
## Root filesystem
This document uses the term _rootfs_ to refer to a root filesystem
which is mounted as the top-level directory ("`/`") and often referred
to as _slash_.
It is important to understand this term since the overall system uses
multiple different rootfs's (as explained in the
[Environments](README.md#environments) section.
## Container image
In the [example command](example-command.md) the user has specified the
type of container they wish to run via the container image name:
`ubuntu`. This image name corresponds to a _container image_ that can
be used to create a container with an Ubuntu Linux environment. Hence,
in our [example](example-command.md), the `sh(1)` command will be run
inside a container which has an Ubuntu rootfs.
> **Note:**
>
> The term _container image_ is confusing since the image in question
> is **not** a container: it is simply a set of files (_an image_)
> that can be used to _create_ a container. The term _container
> template_ would be more accurate but the term _container image_ is
> commonly used so this document uses the standard term.
For the purposes of this document, the most important part of the
[example command line](example-command.md) is the container image the
user has requested. Normally, the container manager will _pull_
(download) a container image from a remote site and store a copy
locally. This local container image is used by the container manager
to create an [OCI bundle](#oci-bundle) which will form the environment
the container will run in. After creating the OCI bundle, the
container manager launches a [runtime](README.md#runtime) which will create the
container using the provided OCI bundle.
## OCI bundle
To understand what follows, it is important to know at a high level
how an OCI ([Open Containers Initiative](https://opencontainers.org)) compatible container is created.
An OCI compatible container is created by taking a
[container image](#container-image) and converting the embedded rootfs
into an
[OCI rootfs bundle](https://github.com/opencontainers/runtime-spec/blob/main/bundle.md),
or more simply, an _OCI bundle_.
An OCI bundle is a `tar(1)` archive normally created by a container
manager which is passed to an OCI [runtime](README.md#runtime) which converts
it into a full container rootfs. The bundle contains two assets:
- A container image [rootfs](#root-filesystem)
This is simply a directory of files that will be used to represent
the rootfs for the container.
For the [example command](example-command.md), the directory will
contain the files necessary to create a minimal Ubuntu root
filesystem.
- An [OCI configuration file](https://github.com/opencontainers/runtime-spec/blob/main/config.md)
This is a JSON file called `config.json`.
The container manager will create this file so that:
- The `root.path` value is set to the full path of the specified
container rootfs.
In [the example](example-command.md) this value will be `ubuntu`.
- The `process.args` array specifies the list of commands the user
wishes to run. This is known as the [workload](README.md#workload).
In [the example](example-command.md) the workload is `sh(1)`.

View File

@@ -1,30 +0,0 @@
# Example command
The following containerd command creates a container. It is referred
to throughout the architecture document to help explain various points:
```bash
$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t "quay.io/libpod/ubuntu:latest" foo sh
```
This command requests that containerd:
- Create a container (`ctr run`).
- Use the Kata [shimv2](README.md#shim-v2-architecture) runtime (`--runtime "io.containerd.kata.v2"`).
- Delete the container when it [exits](README.md#workload-exit) (`--rm`).
- Attach the container to the user's terminal (`-t`).
- Use the Ubuntu Linux [container image](background.md#container-image)
to create the container [rootfs](background.md#root-filesystem) that will become
the [container environment](README.md#environments)
(`quay.io/libpod/ubuntu:latest`).
- Create the container with the name "`foo`".
- Run the `sh(1)` command in the Ubuntu rootfs based container
environment.
The command specified here is referred to as the [workload](README.md#workload).
> **Note:**
>
> For the purposes of this document and to keep explanations
> simpler, we assume the user is running this command in the
> [host environment](README.md#environments).

View File

@@ -1,152 +0,0 @@
# Guest assets
Kata Containers creates a VM in which to run one or more containers.
It does this by launching a [hypervisor](README.md#hypervisor) to
create the VM. The hypervisor needs two assets for this task: a Linux
kernel and a small root filesystem image to boot the VM.
## Guest kernel
The [guest kernel](../../../tools/packaging/kernel)
is passed to the hypervisor and used to boot the VM.
The default kernel provided in Kata Containers is highly optimized for
kernel boot time and minimal memory footprint, providing only those
services required by a container workload. It is based on the latest
Linux LTS (Long Term Support) [kernel](https://www.kernel.org).
## Guest image
The hypervisor uses an image file which provides a minimal root
filesystem used by the guest kernel to boot the VM and host the Kata
Container. Kata Containers supports both initrd and rootfs based
minimal guest images. The [default packages](../../install/) provide both
an image and an initrd, both of which are created using the
[`osbuilder`](../../../tools/osbuilder) tool.
> **Notes:**
>
> - Although initrd and rootfs based images are supported, not all
> [hypervisors](README.md#hypervisor) support both types of image.
>
> - The guest image is *unrelated* to the image used in a container
> workload.
>
> For example, if a user creates a container that runs a shell in a
> BusyBox image, they will run that shell in a BusyBox environment.
> However, the guest image running inside the VM that is used to
> *host* that BusyBox image could be running Clear Linux, Ubuntu,
> Fedora or any other distribution potentially.
>
> The `osbuilder` tool provides
> [configurations for various common Linux distributions](../../../tools/osbuilder/rootfs-builder)
> which can be built into either initrd or rootfs guest images.
>
> - If you are using a [packaged version of Kata
> Containers](../../install), you can see image details by running the
> [`kata-collect-data.sh`](../../../src/runtime/data/kata-collect-data.sh.in)
> script as `root` and looking at the "Image details" section of the
> output.
#### Root filesystem image
The default packaged rootfs image, sometimes referred to as the _mini
O/S_, is a highly optimized container bootstrap system.
If this image type is [configured](README.md#configuration), when the
user runs the [example command](example-command.md):
- The [runtime](README.md#runtime) will launch the configured [hypervisor](README.md#hypervisor).
- The hypervisor will boot the mini-OS image using the [guest kernel](#guest-kernel).
- The kernel will start the init daemon as PID 1 (`systemd`) inside the VM root environment.
- `systemd`, running inside the mini-OS context, will launch the [agent](README.md#agent)
in the root context of the VM.
- The agent will create a new container environment, setting its root
filesystem to that requested by the user (Ubuntu in [the example](example-command.md)).
- The agent will then execute the command (`sh(1)` in [the example](example-command.md))
inside the new container.
The table below summarises the default mini O/S showing the
environments that are created, the services running in those
environments (for all platforms) and the root filesystem used by
each service:
| Process | Environment | systemd service? | rootfs | User accessible | Notes |
|-|-|-|-|-|-|
| systemd | VM root | n/a | [VM guest image](#guest-image)| [debug console][debug-console] | The init daemon, running as PID 1 |
| [Agent](README.md#agent) | VM root | yes | [VM guest image](#guest-image)| [debug console][debug-console] | Runs as a systemd service |
| `chronyd` | VM root | yes | [VM guest image](#guest-image)| [debug console][debug-console] | Used to synchronise the time with the host |
| container workload (`sh(1)` in [the example](example-command.md)) | VM container | no | User specified (Ubuntu in [the example](example-command.md)) | [exec command](README.md#exec-command) | Managed by the agent |
See also the [process overview](README.md#process-overview).
> **Notes:**
>
> - The "User accessible" column shows how an administrator can access
> the environment.
>
> - The container workload is running inside a full container
> environment which itself is running within a VM environment.
>
> - See the [configuration files for the `osbuilder` tool](../../../tools/osbuilder/rootfs-builder)
> for details of the default distribution for platforms other than
> Intel x86_64.
#### Initrd image
The initrd image is a compressed `cpio(1)` archive, created from a
rootfs which is loaded into memory and used as part of the Linux
startup process. During startup, the kernel unpacks it into a special
instance of a `tmpfs` mount that becomes the initial root filesystem.
If this image type is [configured](README.md#configuration), when the user runs
the [example command](example-command.md):
- The [runtime](README.md#runtime) will launch the configured [hypervisor](README.md#hypervisor).
- The hypervisor will boot the mini-OS image using the [guest kernel](#guest-kernel).
- The kernel will start the init daemon as PID 1 (the
[agent](README.md#agent))
inside the VM root environment.
- The [agent](README.md#agent) will create a new container environment, setting its root
filesystem to that requested by the user (`ubuntu` in
[the example](example-command.md)).
- The agent will then execute the command (`sh(1)` in [the example](example-command.md))
inside the new container.
The table below summarises the default mini O/S showing the environments that are created,
the processes running in those environments (for all platforms) and
the root filesystem used by each service:
| Process | Environment | rootfs | User accessible | Notes |
|-|-|-|-|-|
| [Agent](README.md#agent) | VM root | [VM guest image](#guest-image) | [debug console][debug-console] | Runs as the init daemon (PID 1) |
| container workload | VM container | User specified (Ubuntu in this example) | [exec command](README.md#exec-command) | Managed by the agent |
> **Notes:**
>
> - The "User accessible" column shows how an administrator can access
> the environment.
>
> - It is possible to use a standard init daemon such as systemd with
> an initrd image if this is desirable.
See also the [process overview](README.md#process-overview).
#### Image summary
| Image type | Default distro | Init daemon | Reason | Notes |
|-|-|-|-|-|
| [image](background.md#root-filesystem-image) | [Clear Linux](https://clearlinux.org) (for x86_64 systems)| systemd | Minimal and highly optimized | systemd offers flexibility |
| [initrd](#initrd-image) | [Alpine Linux](https://alpinelinux.org) | Kata [agent](README.md#agent) (as no systemd support) | Security hardened and tiny C library |
See also:
- The [osbuilder](../../../tools/osbuilder) tool
This is used to build all default image types.
- The [versions database](../../../versions.yaml)
The `default-image-name` and `default-initrd-name` options specify
the default distributions for each image type.
[debug-console]: ../../Developer-Guide.md#connect-to-debug-console

View File

@@ -1,41 +0,0 @@
# History
## Kata 1.x architecture
In the old [Kata 1.x architecture](https://github.com/kata-containers/documentation/blob/master/design/architecture.md),
the Kata [runtime](README.md#runtime) was an executable called `kata-runtime`.
The container manager called this executable multiple times when
creating each container. Each time the runtime was called a different
OCI command-line verb was provided. This architecture was simple, but
not well suited to creating VM based containers due to the issue of
handling state between calls. Additionally, the architecture suffered
from performance issues related to continually having to spawn new
instances of the runtime binary, and
[Kata shim](https://github.com/kata-containers/shim) and
[Kata proxy](https://github.com/kata-containers/proxy) processes for systems
that did not provide VSOCK.
## Kata 2.x architecture
See the ["shimv2"](README.md#shim-v2-architecture) section of the
architecture document.
## Architectural comparison
| Kata version | Kata Runtime process calls | Kata shim processes | Kata proxy processes (if no VSOCK) |
|-|-|-|-|
| 1.x | multiple per container | 1 per container connection | 1 |
| 2.x | 1 per VM (hosting any number of containers) | 0 | 0 |
> **Notes:**
>
> - A single VM can host one or more containers.
>
> - The "Kata shim processes" column refers to the old
> [Kata shim](https://github.com/kata-containers/shim) (`kata-shim` binary),
> *not* the new shimv2 runtime instance (`containerd-shim-kata-v2` binary).
The diagram below shows how the original architecture was simplified
with the advent of shimv2.
![Kubernetes integration with shimv2](../arch-images/shimv2.svg)

View File

@@ -1,35 +0,0 @@
# Kubernetes support
[Kubernetes](https://github.com/kubernetes/kubernetes/), or K8s, is a popular open source
container orchestration engine. In Kubernetes, a set of containers sharing resources
such as networking, storage, mount, PID, etc. is called a
[pod](https://kubernetes.io/docs/user-guide/pods/).
A node can have multiple pods, but at a minimum, a node within a Kubernetes cluster
only needs to run a container runtime and a container agent (called a
[Kubelet](https://kubernetes.io/docs/admin/kubelet/)).
Kata Containers represents a Kubelet pod as a VM.
A Kubernetes cluster runs a control plane where a scheduler (typically
running on a dedicated master node) calls into a compute Kubelet. This
Kubelet instance is responsible for managing the lifecycle of pods
within the nodes and eventually relies on a container runtime to
handle execution. The Kubelet architecture decouples lifecycle
management from container execution through a dedicated gRPC based
[Container Runtime Interface (CRI)](https://github.com/kubernetes/design-proposals-archive/blob/main/node/container-runtime-interface-v1.md).
In other words, a Kubelet is a CRI client and expects a CRI
implementation to handle the server side of the interface.
[CRI-O](https://github.com/kubernetes-incubator/cri-o) and
[containerd](https://github.com/containerd/containerd/) are CRI
implementations that rely on
[OCI](https://github.com/opencontainers/runtime-spec) compatible
runtimes for managing container instances.
Kata Containers is an officially supported CRI-O and containerd
runtime. Refer to the following guides on how to set up Kata
Containers with Kubernetes:
- [How to use Kata Containers and containerd](../../how-to/containerd-kata.md)
- [Run Kata Containers with Kubernetes](../../how-to/run-kata-with-k8s.md)

View File

@@ -1,49 +0,0 @@
# Networking
Containers typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network.
In order to setup the network for a container, container engines call into a
networking plugin. The network plugin will usually create a virtual
ethernet (`veth`) pair adding one end of the `veth` pair into the container
networking namespace, while the other end of the `veth` pair is added to the
host networking namespace.
This is a very namespace-centric approach as many hypervisors or VM
Managers (VMMs) such as `virt-manager` cannot handle `veth`
interfaces. Typically, [`TAP`](https://www.kernel.org/doc/Documentation/networking/tuntap.txt)
interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
and virtual machines, Kata Containers networking transparently connects `veth`
interfaces with `TAP` ones using [Traffic Control](https://man7.org/linux/man-pages/man8/tc.8.html):
![Kata Containers networking](../arch-images/network.png)
With a TC filter rules in place, a redirection is created between the container network
and the virtual machine. As an example, the network plugin may place a device,
`eth0`, in the container's network namespace, which is one end of a VETH device.
Kata Containers will create a tap device for the VM, `tap0_kata`,
and setup a TC redirection filter to redirect traffic from `eth0`'s ingress to `tap0_kata`'s egress,
and a second TC filter to redirect traffic from `tap0_kata`'s ingress to `eth0`'s egress.
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata.
With this method, Kata created a MACVTAP device to connect directly to the `eth0` device.
TC-filter is the default because it allows for simpler configuration, better CNI plugin
compatibility, and performance on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.
Kata Containers supports both
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md#the-container-network-model)
and [CNI](https://github.com/containernetworking/cni) for networking management.
## Network Hotplug
Kata Containers has developed a set of network sub-commands and APIs to add, list and
remove a guest network endpoint and to manipulate the guest route table.
The following diagram illustrates the Kata Containers network hotplug workflow.
![Network Hotplug](../arch-images/kata-containers-network-hotplug.png)

View File

@@ -1,56 +0,0 @@
# Storage
## Limits
Kata Containers is [compatible](README.md#compatibility) with existing
standards and runtime. From the perspective of storage, this means no
limits are placed on the amount of storage a container
[workload](README.md#workload) may use.
Since cgroups are not able to set limits on storage allocation, if you
wish to constrain the amount of storage a container uses, consider
using an existing facility such as `quota(1)` limits or
[device mapper](#devicemapper) limits.
## virtio SCSI
If a block-based graph driver is [configured](README.md#configuration),
`virtio-scsi` is used to _share_ the workload image (such as
`busybox:latest`) into the container's environment inside the VM.
## virtio FS
If a block-based graph driver is _not_ [configured](README.md#configuration), a
[`virtio-fs`](https://virtio-fs.gitlab.io) (`VIRTIO`) overlay
filesystem mount point is used to _share_ the workload image instead. The
[agent](README.md#agent) uses this mount point as the root filesystem for the
container processes.
For virtio-fs, the [runtime](README.md#runtime) starts one `virtiofsd` daemon
(that runs in the host context) for each VM created.
## Devicemapper
The
[devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/main/snapshots/devmapper)
is a special case. The `snapshotter` uses dedicated block devices
rather than formatted filesystems, and operates at the block level
rather than the file level. This knowledge is used to directly use the
underlying block device instead of the overlay file system for the
container root file system. The block device maps to the top
read-write layer for the overlay. This approach gives much better I/O
performance compared to using `virtio-fs` to share the container file
system.
#### Hot plug and unplug
Kata Containers has the ability to hot plug add and hot plug remove
block devices. This makes it possible to use block devices for
containers started after the VM has been launched.
Users can check to see if the container uses the `devicemapper` block
device as its rootfs by calling `mount(8)` within the container. If
the `devicemapper` block device is used, the root filesystem (`/`)
will be mounted from `/dev/vda`. Users can disable direct mounting of
the underlying block device through the runtime
[configuration](README.md#configuration).

View File

@@ -1,169 +0,0 @@
# Kata 3.0 Architecture
## Overview
In cloud-native scenarios, there is an increased demand for container startup speed, resource consumption, stability, and security, areas where the present Kata Containers runtime is challenged relative to other runtimes. To achieve this, we propose a solid, field-tested and secure Rust version of the kata-runtime.
Also, we provide the following designs:
- Turn key solution with builtin `Dragonball` Sandbox
- Async I/O to reduce resource consumption
- Extensible framework for multiple services, runtimes and hypervisors
- Lifecycle management for sandbox and container associated resources
### Rationale for choosing Rust
We chose Rust because it is designed as a system language with a focus on efficiency.
In contrast to Go, Rust makes a variety of design trade-offs in order to obtain
good execution performance, with innovative techniques that, in contrast to C or
C++, provide reasonable protection against common memory errors (buffer
overflow, invalid pointers, range errors), error checking (ensuring errors are
dealt with), thread safety, ownership of resources, and more.
These benefits were verified in our project when the Kata Containers guest agent
was rewritten in Rust. We notably saw a significant reduction in memory usage
with the Rust-based implementation.
## Design
### Architecture
![architecture](./images/architecture.png)
### Built-in VMM
#### Current Kata 2.x architecture
![not_builtin_vmm](./images/not_built_in_vmm.png)
As shown in the figure, runtime and VMM are separate processes. The runtime process forks the VMM process and interacts through the inter-process RPC. Typically, process interaction consumes more resources than peers within the process, and it will result in relatively low efficiency. At the same time, the cost of resource operation and maintenance should be considered. For example, when performing resource recovery under abnormal conditions, the exception of any process must be detected by others and activate the appropriate resource recovery process. If there are additional processes, the recovery becomes even more difficult.
#### How To Support Built-in VMM
We provide `Dragonball` Sandbox to enable built-in VMM by integrating VMM's function into the Rust library. We could perform VMM-related functionalities by using the library. Because runtime and VMM are in the same process, there is a benefit in terms of message processing speed and API synchronization. It can also guarantee the consistency of the runtime and the VMM life cycle, reducing resource recovery and exception handling maintenance, as shown in the figure:
![builtin_vmm](./images/built_in_vmm.png)
### Async Support
#### Why Need Async
**Async is already in stable Rust and allows us to write async code**
- Async provides significantly reduced CPU and memory overhead, especially for workloads with a large amount of IO-bound tasks
- Async is zero-cost in Rust, which means that you only pay for what you use. Specifically, you can use async without heap allocations and dynamic dispatch, which greatly improves efficiency
- For more (see [Why Async?](https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html) and [The State of Asynchronous Rust](https://rust-lang.github.io/async-book/01_getting_started/03_state_of_async_rust.html)).
**There may be several problems if implementing kata-runtime with Sync Rust**
- Too many threads with a new TTRPC connection
- TTRPC threads: reaper thread(1) + listener thread(1) + client handler(2)
- Add 3 I/O threads with a new container
- In Sync mode, implementing a timeout mechanism is challenging. For example, in TTRPC API interaction, the timeout mechanism is difficult to align with Golang
#### How To Support Async
The kata-runtime is controlled by TOKIO_RUNTIME_WORKER_THREADS to run the OS thread, which is 2 threads by default. For TTRPC and container-related threads run in the `tokio` thread in a unified manner, and related dependencies need to be switched to Async, such as Timer, File, Netlink, etc. With the help of Async, we can easily support no-block I/O and timer. Currently, we only utilize Async for kata-runtime. The built-in VMM keeps the OS thread because it can ensure that the threads are controllable.
**For N tokio worker threads and M containers**
- Sync runtime(both OS thread and `tokio` task are OS thread but without `tokio` worker thread) OS thread number: 4 + 12*M
- Async runtime(only OS thread is OS thread) OS thread number: 2 + N
```shell
├─ main(OS thread)
├─ async-logger(OS thread)
└─ tokio worker(N * OS thread)
├─ agent log forwarder(1 * tokio task)
├─ health check thread(1 * tokio task)
├─ TTRPC reaper thread(M * tokio task)
├─ TTRPC listener thread(M * tokio task)
├─ TTRPC client handler thread(7 * M * tokio task)
├─ container stdin io thread(M * tokio task)
├─ container stdin io thread(M * tokio task)
└─ container stdin io thread(M * tokio task)
```
### Extensible Framework
The Kata 3.x runtime is designed with the extension of service, runtime, and hypervisor, combined with configuration to meet the needs of different scenarios. At present, the service provides a register mechanism to support multiple services. Services could interact with runtime through messages. In addition, the runtime handler handles messages from services. To meet the needs of a binary that supports multiple runtimes and hypervisors, the startup must obtain the runtime handler type and hypervisor type through configuration.
![framework](./images/framework.png)
### Resource Manager
In our case, there will be a variety of resources, and every resource has several subtypes. Especially for `Virt-Container`, every subtype of resource has different operations. And there may be dependencies, such as the share-fs rootfs and the share-fs volume will use share-fs resources to share files to the VM. Currently, network and share-fs are regarded as sandbox resources, while rootfs, volume, and cgroup are regarded as container resources. Also, we abstract a common interface for each resource and use subclass operations to evaluate the differences between different subtypes.
![resource manager](./images/resourceManager.png)
## Roadmap
- Stage 1 (June): provide basic features (current delivered)
- Stage 2 (September): support common features
- Stage 3: support full features
| **Class** | **Sub-Class** | **Development Stage** | **Status** |
| -------------------------- | ------------------- | --------------------- |------------|
| Service | task service | Stage 1 | ✅ |
| | extend service | Stage 3 | 🚫 |
| | image service | Stage 3 | 🚫 |
| Runtime handler | `Virt-Container` | Stage 1 | ✅ |
| Endpoint | VETH Endpoint | Stage 1 | ✅ |
| | Physical Endpoint | Stage 2 | ✅ |
| | Tap Endpoint | Stage 2 | ✅ |
| | `Tuntap` Endpoint | Stage 2 | ✅ |
| | `IPVlan` Endpoint | Stage 2 | ✅ |
| | `MacVlan` Endpoint | Stage 2 | ✅ |
| | MACVTAP Endpoint | Stage 3 | 🚫 |
| | `VhostUserEndpoint` | Stage 3 | 🚫 |
| Network Interworking Model | Tc filter | Stage 1 | ✅ |
| | `MacVtap` | Stage 3 | 🚧 |
| Storage | Virtio-fs | Stage 1 | ✅ |
| | `nydus` | Stage 2 | 🚧 |
| | `device mapper` | Stage 2 | 🚫 |
| `Cgroup V2` | | Stage 2 | 🚧 |
| Hypervisor | `Dragonball` | Stage 1 | 🚧 |
| | QEMU | Stage 2 | 🚫 |
| | ACRN | Stage 3 | 🚫 |
| | Cloud Hypervisor | Stage 3 | 🚫 |
| | Firecracker | Stage 3 | 🚫 |
## FAQ
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?
Yes. They are components in Kata 3.x runtime. And they will be packed into one binary.
1. Service is an interface, which is responsible for handling multiple services like task service, image service and etc.
2. Message dispatcher, it is used to match multiple requests from the service module.
3. Runtime handler is used to deal with the operation for sandbox and container.
- What is the name of the Kata 3.x runtime binary?
Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcomed.
Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
Yes. It is designed to follow the functionality of go version kata. And it implements the `containerd shim v2` interface/protocol.
- How will users migrate to the Kata 3.x architecture?
The migration plan will be provided before the Kata 3.x is merging into the main branch.
- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?
The `Dragonball` could work as an external hypervisor. However, stability and performance is challenging in this case. Built in VMM could optimise the container overhead, and it's easy to maintain stability.
`runD` is the `containerd-shim-v2` counterpart of `runC` and can run a pod/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of `microvm`/VMM, we sometimes refer to it as secure sandbox.
- QEMU, Cloud Hypervisor and Firecracker support are planned, but how that would work. Are they working in separate process?
Yes. They are unable to work as built in VMM.
- What is `upcall`?
The `upcall` is used to hotplug CPU/memory/MMIO devices, and it solves two issues.
1. avoid dependency on PCI/ACPI
2. avoid dependency on `udevd` within guest and get deterministic results for hotplug operations. So `upcall` is an alternative to ACPI based CPU/memory/device hotplug. And we may cooperate with the community to add support for ACPI based CPU/memory/device hotplug if needed.
`Dbs-upcall` is a `vsock-based` direct communication tool between VMM and guests. The server side of the `upcall` is a driver in guest kernel (kernel patches are needed for this feature) and it'll start to serve the requests once the kernel has started. And the client side is in VMM , it'll be a thread that communicates with VSOCK through `uds`. We have accomplished device hotplug / hot-unplug directly through `upcall` in order to avoid virtualization of ACPI to minimize virtual machine's overhead. And there could be many other usage through this direct communication channel. It's already open source.
https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
- The URL below says the kernel patches work with 4.19, but do they also work with 5.15+ ?
Forward compatibility should be achievable, we have ported it to 5.10 based kernel.
- Are these patches platform-specific or would they work for any architecture that supports VSOCK?
It's almost platform independent, but some message related to CPU hotplug are platform dependent.
- Could the kernel driver be replaced with a userland daemon in the guest using loopback VSOCK?
We need to create device nodes for hot-added CPU/memory/devices, so it's not easy for userspace daemon to do these tasks.
- The fact that `upcall` allows communication between the VMM and the guest suggests that this architecture might be incompatible with https://github.com/confidential-containers where the VMM should have no knowledge of what happens inside the VM.
1. `TDX` doesn't support CPU/memory hotplug yet.
2. For ACPI based device hotplug, it depends on ACPI `DSDT` table, and the guest kernel will execute `ASL` code to handle during handling those hotplug event. And it should be easier to audit VSOCK based communication than ACPI `ASL` methods.
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in next stage.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 95 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 66 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 136 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 72 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 139 KiB

File diff suppressed because one or more lines are too long

View File

@@ -1,12 +0,0 @@
# Core scheduling
Core scheduling is a Linux kernel feature that allows only trusted tasks to run concurrently on
CPUs sharing compute resources (for example, hyper-threads on a core).
Containerd versions >= 1.6.4 leverage this to treat all of the processes associated with a
given pod or container to be a single group of trusted tasks. To indicate this should be carried
out, containerd sets the `SCHED_CORE` environment variable for each shim it spawns. When this is
set, the Kata Containers shim implementation uses the `prctl` syscall to create a new core scheduling
domain for the shim process itself as well as future VMM processes it will start.
For more details on the core scheduling feature, see the [Linux documentation](https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html).

View File

@@ -1825,8 +1825,12 @@ components:
desc: ""
- value: grpc.StartContainerRequest
desc: ""
- value: grpc.StartTracingRequest
desc: ""
- value: grpc.StatsContainerRequest
desc: ""
- value: grpc.StopTracingRequest
desc: ""
- value: grpc.TtyWinResizeRequest
desc: ""
- value: grpc.UpdateContainerRequest

View File

@@ -1,253 +0,0 @@
# Motivation
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, its cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, its difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. Its also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
* **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
type MountInfo struct {
// The type of the volume (ie. block)
VolumeType string `json:"volume-type"`
// The device backing the volume.
Device string `json:"device"`
// The filesystem type to be mounted on the volume.
FsType string `json:"fstype"`
// Additional metadata to pass to the agent regarding this volume.
Metadata map[string]string `json:"metadata,omitempty"`
// Additional mount options.
Options []string `json:"options,omitempty"`
}
```
Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime, it shouldn't container any secrets (such as SMB mount password).
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mount-info.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
`containerd-shim-kata-v2` provides an API for sandbox management through a Unix domain socket. Two new handlers are proposed: `/direct-volume/stats` and `/direct-volume/resize`:
Example:
```bash
$ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volume/stats/[urlSafeVolumePath]'
$ curl --unix-socket "$shim_socket_path" -I -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath"": [volumePath], "Size": "123123" }'
```
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
### Kata agent changes
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
Two new APIs and three new GRPC objects are added to GRPC protocol between the shim and agent for resizing and getting volume stats:
```protobuf
rpc GetVolumeStats(VolumeStatsRequest) returns (VolumeStatsResponse);
rpc ResizeVolume(ResizeVolumeRequest) returns (google.protobuf.Empty);
message VolumeStatsRequest {
// The volume path on the guest outside the container
string volume_guest_path = 1;
}
message ResizeVolumeRequest {
// Full VM guest path of the volume (outside the container)
string volume_guest_path = 1;
uint64 size = 2;
}
// This should be kept in sync with CSI NodeGetVolumeStatsResponse (https://github.com/container-storage-interface/spec/blob/v1.5.0/csi.proto)
message VolumeStatsResponse {
// This field is OPTIONAL.
repeated VolumeUsage usage = 1;
// Information about the current condition of the volume.
// This field is OPTIONAL.
// This field MUST be specified if the VOLUME_CONDITION node
// capability is supported.
VolumeCondition volume_condition = 2;
}
message VolumeUsage {
enum Unit {
UNKNOWN = 0;
BYTES = 1;
INODES = 2;
}
// The available capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 available = 1;
// The total capacity in specified Unit. This field is REQUIRED.
// The value of this field MUST NOT be negative.
uint64 total = 2;
// The used capacity in specified Unit. This field is OPTIONAL.
// The value of this field MUST NOT be negative.
uint64 used = 3;
// Units by which values are measured. This field is REQUIRED.
Unit unit = 4;
}
// VolumeCondition represents the current condition of a volume.
message VolumeCondition {
// Normal volumes are available for use and operating optimally.
// An abnormal volume does not meet these criteria.
// This field is REQUIRED.
bool abnormal = 1;
// The message describing the condition of the volume.
// This field is REQUIRED.
string message = 2;
}
```
### Step by step walk-through
Given the following definition:
```YAML
---
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
runtime-class: kata-qemu
containers:
- name: app
image: centos
command: ["/bin/sh"]
args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
volumeMounts:
- name: persistent-storage
mountPath: /data
volumes:
- name: persistent-storage
persistentVolumeClaim:
claimName: ebs-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
skip-hostmount: "true"
name: ebs-claim
spec:
accessModes:
- ReadWriteOncePod
volumeMode: Filesystem
storageClassName: ebs-sc
resources:
requests:
storage: 4Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
csi.storage.k8s.io/fstype: ext4
```
Lets assume that changes have been made in the `aws-ebs-csi-driver` node driver.
**Node publish volume**
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Node unstage volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Use the volume in sandbox**
1. Upon the request to start a container, the `containerd-shim-kata-v2` examines the container spec,
and iterates through the mounts. For each mount, if there is a `mountInfo.json` file under `/run/kata-containers/shared/direct-volumes/[mount source path]`,
it generates a `storage` GRPC object after overwriting the mount spec with the information in `mountInfo.json`.
2. The shim sends the storage objects to kata-agent through TTRPC.
3. The shim writes a file with the sandbox id as the name under `/run/kata-containers/shared/direct-volumes/[mount source path]`.
4. The kata-agent mounts the storage objects for the container.
**Node expand volume**
1. In the node CSI driver, the `NodeExpandVolume` API invokes: `kata-runtime direct-volume resize -volume-path "/kubelet/a/b/c/d/sdf" -size 8Gi`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to resize the volume.
4. The shim handles the request, asks the hypervisor to resize the block device and sends a GRPC request to Kata agent to resize the filesystem.
5. Kata agent receives the request and resizes the filesystem.
**Node get volume stats**
1. In the node CSI driver, the `NodeGetVolumeStats` API invokes: `kata-runtime direct-volume stats -volume-path "/kubelet/a/b/c/d/sdf"`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to get the volume stats.
4. The shim handles the request and forwards it to the Kata agent.
5. Kata agent receives the request and returns the filesystem stats.

View File

@@ -1,3 +0,0 @@
# Kata Containers E2E Flow
![Kata containers e2e flow](arch-images/katacontainers-e2e-with-bg.jpg)

View File

@@ -1,3 +1,18 @@
- [Host cgroup management](#host-cgroup-management)
- [Introduction](#introduction)
- [`SandboxCgroupOnly` enabled](#sandboxcgrouponly-enabled)
- [What does Kata do in this configuration?](#what-does-kata-do-in-this-configuration)
- [Why create a Kata-cgroup under the parent cgroup?](#why-create-a-kata-cgroup-under-the-parent-cgroup)
- [Improvements](#improvements)
- [`SandboxCgroupOnly` disabled (default, legacy)](#sandboxcgrouponly-disabled-default-legacy)
- [What does this method do?](#what-does-this-method-do)
- [Impact](#impact)
- [Supported cgroups](#supported-cgroups)
- [Cgroups V1](#cgroups-v1)
- [Cgroups V2](#cgroups-v2)
- [Distro Support](#distro-support)
- [Summary](#summary)
# Host cgroup management
## Introduction
@@ -12,249 +27,192 @@ The OCI [runtime specification][linux-config] provides guidance on where the con
> [`cgroupsPath`][cgroupspath]: (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups
> hierarchy for containers or to run a new process in an existing container
The cgroups are hierarchical, and this can be seen with the following pod example:
cgroups are hierarchical, and this can be seen with the following pod example:
- Pod 1: `cgroupsPath=/kubepods/pod1`
- Container 1: `cgroupsPath=/kubepods/pod1/container1`
- Container 2: `cgroupsPath=/kubepods/pod1/container2`
- Container 1:
`cgroupsPath=/kubepods/pod1/container1`
- Container 2:
`cgroupsPath=/kubepods/pod1/container2`
- Pod 2: `cgroupsPath=/kubepods/pod2`
- Container 1: `cgroupsPath=/kubepods/pod2/container1`
- Container 2: `cgroupsPath=/kubepods/pod2/container2`
- Container 1:
`cgroupsPath=/kubepods/pod2/container2`
- Container 2:
`cgroupsPath=/kubepods/pod2/container2`
Depending on the upper-level orchestration layers, the cgroup under which the pod is placed is
managed by the orchestrator or not. In the case of Kubernetes, the pod cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime.
Kubelet will size the pod cgroup based on the container resource requirements, to which it may add
a configured set of [pod resource overheads](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/).
Depending on the upper-level orchestrator, the cgroup under which the pod is placed is
managed by the orchestrator. In the case of Kubernetes, the pod-cgroup is created by Kubelet,
while the container cgroups are to be handled by the runtime. Kubelet will size the pod-cgroup
based on the container resource requirements.
Kata Containers introduces a non-negligible resource overhead for running a sandbox (pod). Typically, the Kata shim,
through its underlying VMM invocation, will create many additional threads compared to process based container runtimes:
the para-virtualized I/O back-ends, the VMM instance or even the Kata shim process, all of those host processes consume
memory and CPU time not directly tied to the container workload, and introduces a sandbox resource overhead.
In order for a Kata workload to run without significant performance degradation, its sandbox overhead must be
provisioned accordingly. Two scenarios are possible:
Kata Containers introduces a non-negligible overhead for running a sandbox (pod). Based on this, two scenarios are possible:
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod-cgroup, or
2) Kata Containers do not fully constrain the VMM and associated processes, instead placing a subset of them outside of the pod-cgroup.
1) The upper-layer orchestrator takes the overhead of running a sandbox into account when sizing the pod cgroup.
For example, Kubernetes [`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
feature lets the orchestrator add a configured sandbox overhead to the sum of all its containers resources. In
that case, the pod sandbox is properly sized and all Kata created processes will run under the pod cgroup
defined constraints and limits.
2) The upper-layer orchestrator does **not** take the sandbox overhead into account and the pod cgroup is not
sized to properly run all Kata created processes. With that scenario, attaching all the Kata processes to the sandbox
cgroup may lead to non-negligible workload performance degradations. As a consequence, Kata Containers will move
all processes but the vCPU threads into a dedicated overhead cgroup under `/kata_overhead`. The Kata runtime will
not apply any constraints or limits to that cgroup, it is up to the infrastructure owner to optionally set it up.
Kata Containers provides two options for how cgroups are handled on the host. Selection of these options is done through
the `SandboxCgroupOnly` flag within the Kata Containers [configuration](../../src/runtime/README.md#configuration)
file.
Those 2 scenarios are not dynamically detected by the Kata Containers runtime implementation, and thus the
infrastructure owner must configure the runtime according to how the upper-layer orchestrator creates and sizes the
pod cgroup. That configuration selection is done through the `sandbox_cgroup_only` flag within the Kata Containers
[configuration](../../src/runtime/README.md#configuration) file.
## `SandboxCgroupOnly` enabled
## `sandbox_cgroup_only = true`
With `SandboxCgroupOnly` enabled, it is expected that the parent cgroup is sized to take the overhead of running
a sandbox into account. This is ideal, as all the applicable Kata Containers components can be placed within the
given cgroup-path.
Setting `sandbox_cgroup_only` to `true` from the Kata Containers configuration file means that the pod cgroup is
properly sized and takes the pod overhead into account. This is ideal, as all the applicable Kata Containers processes
can simply be placed within the given cgroup path.
In the context of Kubernetes, Kubelet can size the pod cgroup to take the overhead of running a Kata-based sandbox
into account. This has been supported since the 1.16 Kubernetes release, through the
[`PodOverhead`](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/) feature.
In the context of Kubernetes, Kubelet will size the pod-cgroup to take the overhead of running a Kata-based sandbox
into account. This will be feasible in the 1.16 Kubernetes release through the `PodOverhead` feature.
```
┌─────────────────────────────────────────┐
┌──────────────────────────────────┐ │
│ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ vCPU threads
│ │ │ I/O threads │ │ │ │
│ │ │ │ VMM │ │
│ │ │ │ Kata Shim
│ │ │ │ │ │ │
│ │ │ │ /kata_<sandbox_id>
│ │ └─────────────────────┘ │ │ │
│ │Pod 1 │ │ │
│ │ └─────────────────────────────┘ │ │
│ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │
│ │ │ vCPU threads
│ │ │ I/O threads │ │ │ │
│ │ │ │ VMM
│ │ │ Kata Shim │ │ │ │
│ │ │ │
│ │ │ │ /kata_<sandbox_id>
│ │ │ └─────────────────────┘ │ │ │
│ │ │Pod 2 │ │ │
│ │ └─────────────────────────────┘ │ │
│ │ │ │
│ │/kubepods │ │
│ └──────────────────────────────────┘ │
│ │
│ Node │
└─────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | | kata-shimv2, VMM and threads: | | | |
| | | | (VMM, IO-threads, vCPU threads, etc)| | | |
| | | | | | | |
| | | | kata_<sandbox-id> | | | |
| | | +--------------------------------------+ | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation details
### What does Kata do in this configuration?
1. Given a `PodSandbox` container creation, let:
When `sandbox_cgroup_only` is enabled, the Kata shim will create a per pod
sub-cgroup under the pod's dedicated cgroup. For example, in the Kubernetes context,
it will create a `/kata_<PodSandboxID>` under the `/kubepods` cgroup hierarchy.
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, the memory cgroup
subsystem for a pod with sandbox ID `12345678` would live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678`.
```
podCgroup=Parent(container.CgroupsPath)
KataSandboxCgroup=<podCgroup>/kata_<PodSandboxID>
```
In most cases, the `/kata_<PodSandboxID>` created cgroup is unrestricted and inherits and shares all
constraints and limits from the parent cgroup (`/kubepods` in the Kubernetes case). The exception is
for the `cpuset` and `devices` cgroup subsystems, which are managed by the Kata shim.
2. Create the cgroup, `KataSandboxCgroup`
After creating the `/kata_<PodSandboxID>` cgroup, the Kata Containers shim will move itself to it, **before** starting
the virtual machine. As a consequence all processes subsequently created by the Kata Containers shim (the VMM itself, and
all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>` cgroup.
3. Join the `KataSandboxCgroup`
### Why create a kata-cgroup under the parent cgroup?
Any process created by the runtime will be created in `KataSandboxCgroup`.
The runtime will limit the cgroup in the host only if the sandbox doesn't have a
container type annotation, but the caller is free to set the proper limits for the `podCgroup`.
And why not directly adding the per sandbox shim directly to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?
In the example above the pod cgroups are `/kubepods/pod1` and `/kubepods/pod2`.
Kata creates the unrestricted sandbox cgroup under the pod cgroup.
The Kata Containers shim implementation creates a per-sandbox cgroup
(`/kata_<PodSandboxID>`) to support the `Docker` use case. Although `Docker` does not
have a notion of pods, Kata Containers still creates a sandbox to support the pod-less,
single container use case that `Docker` implements. Since `Docker` does create any
cgroup hierarchy to place a container into, it would be very complex for Kata to map
a particular container to its sandbox without placing it under a `/kata_<containerID>>`
sub-cgroup first.
### Why create a Kata-cgroup under the parent cgroup?
### Advantages
`Docker` does not have a notion of pods, and will not create a cgroup directory
to place a particular container in (i.e., all containers would be in a path like
`/docker/container-id`. To simplify the implementation and continue to support `Docker`,
Kata Containers creates the sandbox-cgroup, in the case of Kubernetes, or a container cgroup, in the case
of docker.
Keeping all Kata Containers processes under a properly sized pod cgroup is ideal
and makes for a simpler Kata Containers implementation. It also helps with gathering
accurate statistics and preventing Kata workloads from being noisy neighbors.
### Improvements
#### Pod resources statistics
- Get statistics about pod resources
If the Kata caller wants to know the resource usage on the host it can get
statistics from the pod cgroup. All cgroups stats in the hierarchy will include
the Kata overhead. This gives the possibility of gathering usage-statics at the
pod level and the container level.
#### Better host resource isolation
- Better host resource isolation
Because the Kata runtime will place all the Kata processes in the pod cgroup,
the resource limits that the caller applies to the pod cgroup will affect all
processes that belong to the Kata sandbox in the host. This will improve the
isolation in the host preventing Kata to become a noisy neighbor.
## `sandbox_cgroup_only = false` (Default setting)
If the cgroup provided to Kata is not sized appropriately, Kata components will
consume resources that the actual container workloads expect to see and use.
This can cause instability and performance degradations.
To avoid that situation, Kata Containers creates an unconstrained overhead
cgroup and moves all non workload related processes (Anything but the virtual CPU
threads) to it. The name of this overhead cgroup is `/kata_overhead` and a per
sandbox sub cgroup will be created under it for each sandbox Kata Containers creates.
Kata Containers does not add any constraints or limitations on the overhead cgroup. It is up to the infrastructure
owner to either:
- Provision nodes with a pre-sized `/kata_overhead` cgroup. Kata Containers will
load that existing cgroup and move all non workload related processes to it.
- Let Kata Containers create the `/kata_overhead` cgroup, leave it
unconstrained or resize it a-posteriori.
## `SandboxCgroupOnly` disabled (default, legacy)
If the cgroup provided to Kata is not sized appropriately, instability will be
introduced when fully constraining Kata components, and the user-workload will
see a subset of resources that were requested. Based on this, the default
handling for Kata Containers is to not fully constrain the VMM and Kata
components on the host.
```
┌────────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────┐ ┌───────────────────────────┐ │
│ │ │ │ │ │
┌─────────────────────────┼────┼─────────────────────────┐ │ │
│ │ │ │ │ │
│ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ vCPU threads │ │ │ │ VMM │ │ │ │
│ │ │ │ │ │ I/O threads │ │ │ │
│ │ │ │ │ │ │ │ Kata Shim │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ /kata_<sandbox_id> │ │ │ │ /<sandbox_id> │ │ │ │
│ └─────────────────────┘ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ Pod 1 │ │ │ │ │
└─────────────────────────┼────┼─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ │ │
┌─────────────────────────┼────┼─────────────────────────┐ │ │
│ │ │ │ │ │ │ │
│ ┌─────────────────────┐ │ │ ┌─────────────────────┐ │ │ │
│ │ vCPU threads │ │ │ │ VMM │ │ │ │
│ │ │ │ │ │ │ │ I/O threads │ │ │ │
│ │ │ │ │ │ │ │ Kata Shim │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ /kata_<sandbox_id> │ │ │ │ /<sandbox_id> │ │ │ │
│ │ │ └─────────────────────┘ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │
│ │ │ Pod 2 │ │ │ │ │
│ │ └─────────────────────────┼────┼─────────────────────────┘ │ │
│ │ │ │ │ │
│ │ /kubepods │ │ /kata_overhead │ │
│ └─────────────────────────────┘ └───────────────────────────┘ │
│ │
│ │
│ Node │
└────────────────────────────────────────────────────────────────────┘
+----------------------------------------------------------+
| +---------------------------------------------------+ |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 1 | | |
| | +---------------------------------------------+ | |
| | | |
| | +---------------------------------------------+ | |
| | | +--------------------------------------+ | | |
| | | |Container 1 |-|Container 2 | | | |
| | | | |-| | | | |
| | | | Shim+container1 |-| Shim+container2 | | | |
| | | +--------------------------------------+ | | |
| | | | | |
| | |Pod 2 | | |
| | +---------------------------------------------+ | |
| |kubepods | |
| +---------------------------------------------------+ |
| +---------------------------------------------------+ |
| | Hypervisor | |
| |Kata | |
| +---------------------------------------------------+ |
| |
|Node |
+----------------------------------------------------------+
```
### Implementation Details
### What does this method do?
When `sandbox_cgroup_only` is disabled, the Kata Containers shim will create a per pod
sub-cgroup under the pods dedicated cgroup, and another one under the overhead cgroup.
For example, in the Kubernetes context, it will create a `/kata_<PodSandboxID>` under
the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overhead` one.
1. Given a container creation let `containerCgroupHost=container.CgroupsPath`
1. Rename `containerCgroupHost` path to add `kata_`
1. Let `PodCgroupPath=PodSanboxContainerCgroup` where `PodSanboxContainerCgroup` is the cgroup of a container of type `PodSandbox`
1. Limit the `PodCgroupPath` with the sum of all the container limits in the Sandbox
1. Move only vCPU threads of hypervisor to `PodCgroupPath`
1. Per each container, move its `kata-shim` to its own `containerCgroupHost`
1. Move hypervisor and applicable threads to memory cgroup `/kata`
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod which sandbox
ID is `12345678`, create with `sandbox_cgroup_only` disabled, the 2 memory subsystems
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.
_Note_: the Kata Containers runtime will not add all the hypervisor threads to
the cgroup path requested, only vCPUs. These threads are run unconstrained.
Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.
This mitigates the risk of the VMM and other threads receiving an out of memory scenario (`OOM`).
With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate for the actual container workloads processes. For Kata, this maps
to the VMM created virtual CPU threads and so they are the only ones running under the pod
cgroup. This mitigates the risk of the VMM, the Kata shim and the I/O threads going through
a catastrophic out of memory scenario (`OOM`).
#### Pros and Cons
#### Impact
Running all non vCPU threads under an unconstrained overhead cgroup could lead to workloads
potentially consuming a large amount of host resources.
If resources are reserved at a system level to account for the overheads of
running sandbox containers, this configuration can be utilized with adequate
stability. In this scenario, non-negligible amounts of CPU and memory will be
utilized unaccounted for on the host.
On the other hand, running all non vCPU threads under a dedicated overhead cgroup can provide
accurate metrics on the actual Kata Container pod overhead, allowing for tuning the overhead
cgroup size and constraints accordingly.
[linux-config]: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#cgroups-path
[linux-config]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md
[cgroupspath]: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#cgroups-path
# Supported cgroups
Kata Containers currently supports cgroups `v1` and `v2`.
Kata Containers supports cgroups `v1` and `v2`. In the following sections each cgroup is
described briefly and what changes are needed in Kata Containers to support it.
In the following sections each cgroup is described briefly.
## Cgroups V1
## cgroups v1
`cgroups v1` are under a [`tmpfs`][1] filesystem mounted at `/sys/fs/cgroup`, where each cgroup is
mounted under a separate cgroup filesystem. A `cgroups v1` hierarchy may look like the following
`Cgroups V1` are under a [`tmpfs`][1] filesystem mounted at `/sys/fs/cgroup`, where each cgroup is
mounted under a separate cgroup filesystem. A `Cgroups v1` hierarchy may look like the following
diagram:
```
@@ -301,12 +259,13 @@ diagram:
A process can join a cgroup by writing its process id (`pid`) to `cgroup.procs` file,
or join a cgroup partially by writing the task (thread) id (`tid`) to the `tasks` file.
Kata Containers supports `v1` by default and no change in the configuration file is needed.
To know more about `cgroups v1`, see [cgroupsv1(7)][2].
## cgroups v2
## Cgroups V2
`cgroups v2` are also known as unified cgroups, unlike `cgroups v1`, the cgroups are
mounted under the same cgroup filesystem. A `cgroups v2` hierarchy may look like the following
`Cgroups v2` are also known as unified cgroups, unlike `cgroups v1`, the cgroups are
mounted under the same cgroup filesystem. A `Cgroups v2` hierarchy may look like the following
diagram:
```
@@ -353,11 +312,22 @@ Same as `cgroups v1`, a process can join the cgroup by writing its process id (`
`cgroup.procs` file, or join a cgroup partially by writing the task (thread) id (`tid`) to
`cgroup.threads` file.
For backwards compatibility Kata Containers defaults to supporting cgroups v1 by default.
To change this to `v2`, set `sandbox_cgroup_only=true` in the `configuration.toml` file.
To know more about `cgroups v2`, see [cgroupsv2(7)][3].
### Distro Support
Many Linux distributions do not yet support `cgroups v2`, as it is quite a recent addition.
For more information about the status of this feature see [issue #2494][4].
# Summary
| cgroup option | default? | status | pros | cons | cgroups
|-|-|-|-|-|-|
| `SandboxCgroupOnly=false` | yes | legacy | Easiest to make Kata work | Unaccounted for memory and resource utilization | v1
| `SandboxCgroupOnly=true` | no | recommended | Complete tracking of Kata memory and CPU utilization. In Kubernetes, the Kubelet can fully constrain Kata via the pod cgroup | Requires upper layer orchestrator which sizes sandbox cgroup appropriately | v1, v2
[1]: http://man7.org/linux/man-pages/man5/tmpfs.5.html
[2]: http://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_1

View File

@@ -1,30 +0,0 @@
# Kata Containers support for `inotify`
## Background on `inotify` usage
A common pattern in Kubernetes is to watch for changes to files/directories passed in as `ConfigMaps`
or `Secrets`. Sidecar's normally use `inotify` to watch for changes and then signal the primary container to reload
the updated configuration. Kata Containers typically will pass these host files into the guest using `virtiofs`, which
does not support `inotify` today. While we work to enable this use case in `virtiofs`, we introduced a workaround in Kata Containers.
This document describes how Kata Containers implements this workaround.
### Detecting a `watchable` mount
Kubernetes creates `secrets` and `ConfigMap` mounts at very specific locations on the host filesystem. For container mounts,
the `Kata Containers` runtime will check the source of the mount to identify these special cases. For these use cases, only a single file
or very few would typically need to be watched. To avoid excessive overheads in making a mount watchable,
we enforce a limit of eight files per mount. If a `secret` or `ConfigMap` mount contains more than 8 files, it will not be
considered watchable. We similarly enforce a limit of 1 MB per mount to be considered watchable. Non-watchable mounts will
continue to propagate changes from the mount on the host to the container workload, but these updates will not trigger an
`inotify` event.
If at any point a mount grows beyond the eight file or 1MB limit, it will no longer be `watchable.`
### Presenting a `watchable` mount to the workload
For mounts that are considered `watchable`, inside the guest, the `kata-agent` will poll the mount presented from
the host through `virtiofs` and copy any changed files to a `tmpfs` mount that is presented to the container. In this way,
for `watchable` mounts, Kata will do the polling on behalf of the workload and existing workloads needn't change their usage
of `inotify`.
![drawing](arch-images/inotify-workaround.png)

View File

@@ -1,21 +1,36 @@
# Kata 2.0 Metrics Design
Kata implements CRI's API and supports [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose containers metrics. User can use these interfaces to get basic metrics about containers.
* [Limitations of Kata 1.x and the target of Kata 2.0](#limitations-of-kata-1x-and-the-target-of-kata-20)
* [Metrics architecture](#metrics-architecture)
* [Kata monitor](#kata-monitor)
* [Kata runtime](#kata-runtime)
* [Kata agent](#kata-agent)
* [Performance and overhead](#performance-and-overhead)
* [Metrics list](#metrics-list)
* [Metric types](#metric-types)
* [Kata agent metrics](#kata-agent-metrics)
* [Firecracker metrics](#firecracker-metrics)
* [Kata guest OS metrics](#kata-guest-os-metrics)
* [Hypervisor metrics](#hypervisor-metrics)
* [Kata monitor metrics](#kata-monitor-metrics)
* [Kata containerd shim v2 metrics](#kata-containerd-shim-v2-metrics)
Unlike `runc`, Kata is a VM-based runtime and has a different architecture.
Kata implement CRI's API and support [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose containers metrics. User can use these interface to get basic metrics about container.
## Limitations of Kata 1.x and target of Kata 2.0
But unlike `runc`, Kata is a VM-based runtime and has a different architecture.
## Limitations of Kata 1.x and the target of Kata 2.0
Kata 1.x has a number of limitations related to observability that may be obstacles to running Kata Containers at scale.
In Kata 2.0, the following components will be able to provide more details about the system:
In Kata 2.0, the following components will be able to provide more details about the system.
- containerd shim v2 (effectively `kata-runtime`)
- Hypervisor statistics
- Agent process
- Guest OS statistics
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata then introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
>
> For Kata 2.0, the main component is the Kata containerd shim v2, although the deprecated `kata-runtime` binary will be maintained for a period of time.
>
@@ -25,15 +40,14 @@ In Kata 2.0, the following components will be able to provide more details about
Kata 2.0 metrics strongly depend on [Prometheus](https://prometheus.io/), a graduated project from CNCF.
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the Kata components on the host. It's shipped with the Kata runtime to provide an interface to:
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the other Kata components on the host. It's the monitor interface with Kata runtime, and we can do something like these:
- Get metrics
- Get events
At present, `kata-monitor` supports retrieval of metrics only: this is what will be covered in this document.
In this document we will cover metrics only. And until now it only supports metrics function.
This is the architecture overview of metrics in Kata Containers 2.0:
This is the architecture overview metrics in Kata Containers 2.0.
![Kata Containers 2.0 metrics](arch-images/kata-2-metrics.png)
@@ -46,39 +60,38 @@ For a quick evaluation, you can check out [this how to](../how-to/how-to-set-pro
### Kata monitor
The `kata-monitor` management agent should be started on each node where the Kata containers runtime is installed. `kata-monitor` will:
`kata-monitor` is a management agent on one node, where many Kata containers are running. `kata-monitor`'s work include:
> **Note**: a *node* running Kata containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.
> **Note**: node is a single host system or a node in K8s clusters.
- Aggregate sandbox metrics running on the node, adding the `sandbox_id` label to them.
- Attach the additional `cri_uid`, `cri_name` and `cri_namespace` labels to the sandbox metrics, tracking the `uid`, `name` and `namespace` Kubernetes pod metadata.
- Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This simplifies the targets count in Prometheus and avoids exposing shim's metrics by `ip:port`.
- Aggregate sandbox metrics running on this node, and add `sandbox_id` label
- As a Prometheus target, all metrics from Kata shim on this node will be collected by Prometheus indirectly. This can easy the targets count in Prometheus, and also need not to expose shim's metrics by `ip:port`
Only one `kata-monitor` process runs in each node.
Only one `kata-monitor` process are running on one node.
`kata-monitor` uses a different communication channel than the one used by the container engine (`containerd`/`CRI-O`) to communicate with the Kata shim. The Kata shim exposes a dedicated socket address reserved to `kata-monitor`.
`kata-monitor` is using a different communication channel other than that `conatinerd` communicating with Kata shim, and Kata shim listen on a new socket address for communicating with `kata-monitor`.
The shim's metrics socket file is created under the virtcontainers sandboxes directory, i.e. `vc/sbs/${PODID}/shim-monitor.sock`.
The way `kata-monitor` get shim's metrics socket file(`monitor_address`) like that `containerd` get shim address. The socket is an abstract socket and saved as file `abstract` with the same directory of `address` for `containerd`.
> **Note**: If there is no Prometheus server configured, i.e., there are no scrape operations, `kata-monitor` will not collect any metrics.
> **Note**: If there is no Prometheus server is configured, i.e., there is no scrape operations, `kata-monitor` will do nothing initiative.
### Kata runtime
Kata runtime is responsible for:
Runtime is responsible for:
- Gather metrics about shim process
- Gather metrics about hypervisor process
- Gather metrics about running sandbox
- Get metrics from Kata agent (through `ttrpc`)
- Get metrics from Kata agent(through `ttrpc`)
### Kata agent
Kata agent is responsible for:
Agent is responsible for:
- Gather agent process metrics
- Gather guest OS metrics
In Kata 2.0, the agent adds a new interface:
And in Kata 2.0, agent will add a new interface:
```protobuf
rpc GetMetrics(GetMetricsRequest) returns (Metrics);
@@ -95,49 +108,33 @@ The `metrics` field is Prometheus encoded content. This can avoid defining a fix
### Performance and overhead
Metrics should not become a bottleneck for the system or downgrade the performance: they should run with minimal overhead.
Metrics should not become the bottleneck of system, downgrade the performance, and run with minimal overhead.
Requirements:
* Metrics **MUST** be quick to collect
* Metrics **MUST** be small
* Metrics **MUST** be small.
* Metrics **MUST** be generated only if there are subscribers to the Kata metrics service
* Metrics **MUST** be stateless
In Kata 2.0, metrics are collected only when needed (pull mode), mainly from the `/proc` filesystem, and consumed by Prometheus. This means that if the Prometheus collector is not running (so no one cares about the metrics) the overhead will be zero.
In Kata 2.0, metrics are collected mainly from `/proc` filesystem, and consumed by Prometheus, based on a pull mode, that is mean if there is no Prometheus collector is running, so there will be zero overhead if nobody cares the metrics.
The metrics service also doesn't hold any metrics in memory.
#### Metrics size ####
Metrics service also doesn't hold any metrics in memory.
|\*|No Sandbox | 1 Sandbox | 2 Sandboxes |
|---|---|---|---|
|Metrics count| 39 | 106 | 173 |
|Metrics size (bytes)| 9K | 144K | 283K |
|Metrics size (`gzipped`, bytes)| 2K | 10K | 17K |
|Metrics size(bytes)| 9K | 144K | 283K |
|Metrics size(`gzipped`, bytes)| 2K | 10K | 17K |
*Metrics size*: response size of one Prometheus scrape request.
*Metrics size*: Response size of one Prometheus scrape request.
It's easy to estimate the size of one metrics fetch request issued by Prometheus.
The formula to calculate the expected size when no gzip compression is in place is:
9 + (144 - 9) * `number of kata sandboxes`
Prometheus supports `gzip compression`. When enabled, the response size of each request will be smaller:
2 + (10 - 2) * `number of kata sandboxes`
**Example**
We have 10 sandboxes running on a node. The expected size of one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
9 + (144 - 9) * 10 = **1.35M**
If `gzip compression` is enabled:
2 + (10 - 2) * 10 = **82K**
#### Metrics delay ####
It's easy to estimated that if there are 10 sandboxes running in the host, the size of one metrics fetch request issued by Prometheus will be about to 9 + (144 - 9) * 10 = 1.35M (not `gzipped`) or 2 + (10 - 2) * 10 = 82K (`gzipped`). Of course Prometheus support `gzip` compression, that can reduce the response size of every request.
And here is some test data:
- End-to-end (from Prometheus server to `kata-monitor` and `kata-monitor` write response back): **20ms**(avg)
- Agent (RPC all from shim to agent): **3ms**(avg)
- End-to-end (from Prometheus server to `kata-monitor` and `kata-monitor` write response back): 20ms(avg)
- Agent(RPC all from shim to agent): 3ms(avg)
Test infrastructure:
@@ -146,13 +143,13 @@ Test infrastructure:
**Scrape interval**
Prometheus default `scrape_interval` is 1 minute, but it is usually set to 15 seconds. A smaller `scrape_interval` causes more overhead, so users should set it depending on their monitoring needs.
Prometheus default `scrape_interval` is 1 minute, and usually it is set to 15s. Small `scrape_interval` will cause more overhead, so user should set it on monitor demand.
## Metrics list
Here are listed all the metrics supported by Kata 2.0. Some metrics are dependent on the VM guest kernel, so the available ones may differ based on the environment.
Here listed is all supported metrics by Kata 2.0. Some metrics is dependent on guest kernels in the VM, so there may be some different by your environment.
Metrics are categorized by the component from/for which the metrics are collected.
Metrics is categorized by component where metrics are collected from and for.
* [Metric types](#metric-types)
* [Kata agent metrics](#kata-agent-metrics)
@@ -163,15 +160,15 @@ Metrics are categorized by the component from/for which the metrics are collecte
* [Kata containerd shim v2 metrics](#kata-containerd-shim-v2-metrics)
> **Note**:
> * Labels here do not include the `instance` and `job` labels added by Prometheus.
> * Labels here are not include `instance` and `job` labels that added by Prometheus.
> * Notes about metrics unit
> * `Kibibytes`, abbreviated `KiB`. 1 `KiB` equals 1024 B.
> * For some metrics (like network devices statistics from file `/proc/net/dev`), unit depends on label( for example `recv_bytes` and `recv_packets` have different units).
> * Most of these metrics are collected from the `/proc` filesystem, so the unit of each metric matches the unit of the relevant `/proc` entry. See the `proc(5)` manual page for further details.
> * For some metrics (like network devices statistics from file `/proc/net/dev`), unit is depend on label( for example `recv_bytes` and `recv_packets` are having different units).
> * Most of these metrics is collected from `/proc` filesystem, so the unit of metrics are keeping the same unit as `/proc`. See the `proc(5)` manual page for further details.
### Metric types
Prometheus offers four core metric types.
Prometheus offer four core metric types.
- Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase.
@@ -225,7 +222,7 @@ Metrics for Firecracker vmm.
| `kata_firecracker_uart`: <br> Metrics specific to the UART device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`flush_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li><li>`read_count`</li><li>`write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vcpu`: <br> Metrics specific to VCPUs' mode of functioning. | `GAUGE` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vmm`: <br> Metrics specific to the machine manager as a whole. | `GAUGE` | | <ul><li>`item`<ul><li>`device_events`</li><li>`panic_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vsock`: <br> VSOCK-related metrics. | `GAUGE` | | <ul><li>`item`<ul><li>`activate_fails`</li><li>`cfg_fails`</li><li>`conn_event_fails`</li><li>`conns_added`</li><li>`conns_killed`</li><li>`conns_removed`</li><li>`ev_queue_event_fails`</li><li>`killq_resync`</li><li>`muxer_event_fails`</li><li>`rx_bytes_count`</li><li>`rx_packets_count`</li><li>`rx_queue_event_count`</li><li>`rx_queue_event_fails`</li><li>`rx_read_fails`</li><li>`tx_bytes_count`</li><li>`tx_flush_fails`</li><li>`tx_packets_count`</li><li>`tx_queue_event_count`</li><li>`tx_queue_event_fails`</li><li>`tx_write_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vsock`: <br> Vsock-related metrics. | `GAUGE` | | <ul><li>`item`<ul><li>`activate_fails`</li><li>`cfg_fails`</li><li>`conn_event_fails`</li><li>`conns_added`</li><li>`conns_killed`</li><li>`conns_removed`</li><li>`ev_queue_event_fails`</li><li>`killq_resync`</li><li>`muxer_event_fails`</li><li>`rx_bytes_count`</li><li>`rx_packets_count`</li><li>`rx_queue_event_count`</li><li>`rx_queue_event_fails`</li><li>`rx_read_fails`</li><li>`tx_bytes_count`</li><li>`tx_flush_fails`</li><li>`tx_packets_count`</li><li>`tx_queue_event_count`</li><li>`tx_queue_event_fails`</li><li>`tx_write_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
### Kata guest OS metrics
@@ -306,7 +303,7 @@ Metrics about Kata containerd shim v2 process.
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StartTracingRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.StopTracingRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_fds`: <br> Kata containerd shim v2 open FDs. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_gc_duration_seconds`: <br> A summary of the pause duration of garbage collection cycles. | `SUMMARY` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_goroutines`: <br> Number of goroutines that currently exist. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |

View File

@@ -1,9 +1,9 @@
# Kata API Design
To fulfill the [Kata design requirements](kata-design-requirements.md), and based on the discussion on [Virtcontainers API extensions](https://docs.google.com/presentation/d/1dbGrD1h9cpuqAPooiEgtiwWDGCYhVPdatq7owsKHDEQ), the Kata runtime library features the following APIs:
- Sandbox based top API
- Storage and network hotplug API
- Plugin frameworks for external proprietary Kata runtime extensions
- Built-in shim and proxy types and capabilities
## Sandbox Based API
### Sandbox Management API
@@ -23,17 +23,17 @@ To fulfill the [Kata design requirements](kata-design-requirements.md), and base
|`sandbox.Stats()`| Get the stats of a running sandbox, return a `SandboxStats` structure.|
|`sandbox.Status()`| Get the status of the sandbox and containers, return a `SandboxStatus` structure.|
|`sandbox.Stop(force)`| Stop a sandbox and Destroy the containers in the sandbox. When force is true, ignore guest related stop failures.|
|`sandbox.CreateContainer(contConfig)`| Create new container in the sandbox with the `ContainerConfig` parameter. It will add new container config to `sandbox.config.Containers`.|
|`sandbox.DeleteContainer(containerID)`| Delete a container from the sandbox by `containerID`, return a `Container` structure.|
|`sandbox.CreateContainer(contConfig)`| Create new container in the sandbox with the `ContainerConfig` param. It will add new container config to `sandbox.config.Containers`.|
|`sandbox.DeleteContainer(containerID)`| Delete a container from the sandbox by containerID, return a `Container` structure.|
|`sandbox.EnterContainer(containerID, cmd)`| Run a new process in a container, executing customer's `types.Cmd` command.|
|`sandbox.KillContainer(containerID, signal, all)`| Signal a container in the sandbox by the `containerID`.|
|`sandbox.PauseContainer(containerID)`| Pause a running container in the sandbox by the `containerID`.|
|`sandbox.KillContainer(containerID, signal, all)`| Signal a container in the sandbox by the containerID.|
|`sandbox.PauseContainer(containerID)`| Pause a running container in the sandbox by the containerID.|
|`sandbox.ProcessListContainer(containerID, options)`| List every process running inside a specific container in the sandbox, return a `ProcessList` structure.|
|`sandbox.ResumeContainer(containerID)`| Resume a paused container in the sandbox by the `containerID`.|
|`sandbox.StartContainer(containerID)`| Start a container in the sandbox by the `containerID`.|
|`sandbox.ResumeContainer(containerID)`| Resume a paused container in the sandbox by the containerID.|
|`sandbox.StartContainer(containerID)`| Start a container in the sandbox by the containerID.|
|`sandbox.StatsContainer(containerID)`| Get the stats of a running container, return a `ContainerStats` structure.|
|`sandbox.StatusContainer(containerID)`| Get the status of a container in the sandbox, return a `ContainerStatus` structure.|
|`sandbox.StopContainer(containerID, force)`| Stop a container in the sandbox by the `containerID`.|
|`sandbox.StopContainer(containerID, force)`| Stop a container in the sandbox by the containerID.|
|`sandbox.UpdateContainer(containerID, resources)`| Update a running container in the sandbox.|
|`sandbox.WaitProcess(containerID, processID)`| Wait on a process to terminate.|
### Sandbox Hotplug API
@@ -57,7 +57,7 @@ To fulfill the [Kata design requirements](kata-design-requirements.md), and base
|Name|Description|
|---|---|
|`sandbox.GetOOMEvent()`| Monitor the OOM events that occur in the sandbox..|
|`sandbox.UpdateRuntimeMetrics()`| Update the `shim/hypervisor` metrics of the running sandbox.|
|`sandbox.UpdateRuntimeMetrics()`| Update the shim/hypervisor's metrics of the running sandbox.|
|`sandbox.GetAgentMetrics()`| Get metrics of the agent and the guest in the running sandbox.|
## Plugin framework for external proprietary Kata runtime extensions
@@ -99,3 +99,32 @@ Built-in implementations include:
### Sandbox Connection Plugin Workflow
![Sandbox Connection Plugin Workflow](https://raw.githubusercontent.com/bergwolf/raw-contents/master/kata/Sandbox-Connection.png "Sandbox Connection Plugin Workflow")
## Built-in Shim and Proxy Types and Capabilities
### Built-in shim/proxy sandbox configurations
- Supported shim configurations:
|Name|Description|
|---|---|
|`noopshim`|Do not start any shim process.|
|`ccshim`| Start the cc-shim binary.|
|`katashim`| Start the `kata-shim` binary.|
|`katashimbuiltin`|No standalone shim process but shim functionality APIs are exported.|
- Supported proxy configurations:
|Name|Description|
|---|---|
|`noopProxy`| a dummy proxy implementation of the proxy interface, only used for testing purpose.|
|`noProxy`|generic implementation for any case where no actual proxy is needed.|
|`ccProxy`|run `ccProxy` to proxy between runtime and agent.|
|`kataProxy`|run `kata-proxy` to translate Yamux connections between runtime and Kata agent. |
|`kataProxyBuiltin`| no standalone proxy process and connect to Kata agent with internal Yamux translation.|
### Built-in Shim Capability
Built-in shim capability is implemented by removing standalone shim process, and
supporting the shim related APIs.
### Built-in Proxy Capability
Built-in proxy capability is achieved by removing standalone proxy process, and
connecting to Kata agent with a custom gRPC dialer that is internal Yamux translation.
The behavior is enabled when proxy is configured as `kataProxyBuiltin`.

View File

@@ -30,7 +30,7 @@ The Kata Containers runtime **MUST** implement the following command line option
The Kata Containers project **MUST** provide two interfaces for CRI shims to manage hardware
virtualization based Kubernetes pods and containers:
- An OCI and `runc` compatible command line interface, as described in the previous section.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`containerd`](https://github.com/containerd/containerd), for example.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`cri-containerd`](https://github.com/containerd/cri-containerd), for example.
- A hardware virtualization runtime library API for CRI shims to consume and provide a more
CRI native implementation. The [`frakti`](https://github.com/kubernetes/frakti) CRI shim is an example of such a consumer.

View File

@@ -1,93 +0,0 @@
# Background
[Research](https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter) shows that time to take for pull operation accounts for 76% of container startup time but only 6.4% of that data is read. So if we can get data on demand (lazy load), it will speed up the container start. [`Nydus`](https://github.com/dragonflyoss/image-service) is a project which build image with new format and can get data on demand when container start.
The following benchmarking result shows the performance improvement compared with the OCI image for the container cold startup elapsed time on containerd. As the OCI image size increases, the container startup time of using `nydus` image remains very short. [Click here](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) to see `nydus` design.
![`nydus`-performance](arch-images/nydus-performance.png)
## Proposal - Bring `lazyload` ability to Kata Containers
`Nydusd` is a fuse/`virtiofs` daemon which is provided by `nydus` project and it supports `PassthroughFS` and [RAFS](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) (Registry Acceleration File System) natively, so in Kata Containers, we can use `nydusd` in place of `virtiofsd` and mount `nydus` image to guest in the meanwhile.
The process of creating/starting Kata Containers with `virtiofsd`,
1. When creating sandbox, the Kata Containers Containerd v2 [shim](https://github.com/kata-containers/kata-containers/blob/main/docs/design/architecture/README.md#runtime) will launch `virtiofsd` before VM starts and share directories with VM.
2. When creating container, the Kata Containers Containerd v2 shim will mount rootfs to `kataShared`(/run/kata-containers/shared/sandboxes/\<SANDBOX\>/mounts/\<CONTAINER\>/rootfs), so it can be seen at the path `/run/kata-containers/shared/containers/shared/\<CONTAINER\>/rootfs` in the guest and used as container's rootfs.
The process of creating/starting Kata Containers with `nydusd`,
![kata-`nydus`](arch-images/kata-nydus.png)
1. When creating sandbox, the Kata Containers Containerd v2 shim will launch `nydusd` daemon before VM starts.
After VM starts, `kata-agent` will mount `virtiofs` at the path `/run/kata-containers/shared` and Kata Containers Containerd v2 shim mount `passthroughfs` filesystem to path `/run/kata-containers/shared/containers` when the VM starts.
```bash
# start nydusd
$ sandbox_id=my-test-sandbox
$ sudo /usr/local/bin/nydusd --log-level info --sock /run/vc/vm/${sandbox_id}/vhost-user-fs.sock --apisock /run/vc/vm/${sandbox_id}/api.sock
```
```bash
# source: the host sharedir which will pass through to guest
$ sudo curl -v --unix-socket /run/vc/vm/${sandbox_id}/api.sock \
-X POST "http://localhost/api/v1/mount?mountpoint=/containers" -H "accept: */*" \
-H "Content-Type: application/json" \
-d '{
"source":"/path/to/sharedir",
"fs_type":"passthrough_fs",
"config":""
}'
```
2. When creating normal container, the Kata Containers Containerd v2 shim send request to `nydusd` to mount `rafs` at the path `/run/kata-containers/shared/rafs/<container_id>/lowerdir` in guest.
```bash
# source: the metafile of nydus image
# config: the config of this image
$ sudo curl --unix-socket /run/vc/vm/${sandbox_id}/api.sock \
-X POST "http://localhost/api/v1/mount?mountpoint=/rafs/<container_id>/lowerdir" -H "accept: */*" \
-H "Content-Type: application/json" \
-d '{
"source":"/path/to/bootstrap",
"fs_type":"rafs",
"config":"config":"{\"device\":{\"backend\":{\"type\":\"localfs\",\"config\":{\"dir\":\"blobs\"}},\"cache\":{\"type\":\"blobcache\",\"config\":{\"work_dir\":\"cache\"}}},\"mode\":\"direct\",\"digest_validate\":true}",
}'
```
The Kata Containers Containerd v2 shim will also bind mount `snapshotdir` which `nydus-snapshotter` assigns to `sharedir`
So in guest, container rootfs=overlay(`lowerdir=rafs`, `upperdir=snapshotdir/fs`, `workdir=snapshotdir/work`)
> how to transfer the `rafs` info from `nydus-snapshotter` to the Kata Containers Containerd v2 shim?
By default, when creating `OCI` image container, `nydus-snapshotter` will return [`struct` Mount slice](https://github.com/containerd/containerd/blob/main/mount/mount.go#L21) below to containerd and containerd use them to mount rootfs
```
[
{
Type: "overlay",
Source: "overlay",
Options: [lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_A>/mnt,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/work],
}
]
```
Then, we can append `rafs` info into `Options`, but if do this, containerd will mount failed, as containerd can not identify `rafs` info. Here, we can refer to [containerd mount helper](https://github.com/containerd/containerd/blob/main/mount/mount_linux.go#L42) and provide a binary called `nydus-overlayfs`. The `Mount` slice which `nydus-snapshotter` returned becomes
```
[
{
Type: "fuse.nydus-overlayfs",
Source: "overlay",
Options: [lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_A>/mnt,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/<snapshot_B>/work,extraoption=base64({source:xxx,config:xxx,snapshotdir:xxx})],
}
]
```
When containerd find `Type` is `fuse.nydus-overlayfs`,
1. containerd will call `mount.fuse` command;
2. in `mount.fuse`, it will call `nydus-overlayfs`.
3. in `nydus-overlayfs`, it will ignore the `extraoption` and do the overlay mount.
Finally, in the Kata Containers Containerd v2 shim, it parse `extraoption` and get the `rafs` info to mount the image in guest.

View File

@@ -1,5 +0,0 @@
# Design proposals
Kata Containers design proposal documents:
- [Kata Containers tracing](tracing-proposals.md)

View File

@@ -1,213 +0,0 @@
# Kata Tracing proposals
## Overview
This document summarises a set of proposals triggered by the
[tracing documentation PR][tracing-doc-pr].
## Required context
This section explains some terminology required to understand the proposals.
Further details can be found in the
[tracing documentation PR][tracing-doc-pr].
### Agent trace mode terminology
| Trace mode | Description | Use-case |
|-|-|-|
| Static | Trace agent from startup to shutdown | Entire lifespan |
| Dynamic | Toggle tracing on/off as desired | On-demand "snapshot" |
### Agent trace type terminology
| Trace type | Description | Use-case |
|-|-|-|
| isolated | traces all relate to single component | Observing lifespan |
| collated | traces "grouped" (runtime+agent) | Understanding component interaction |
### Container lifespan
| Lifespan | trace mode | trace type |
|-|-|-|
| short-lived | static | collated if possible, else isolated? |
| long-running | dynamic | collated? (to see interactions) |
## Original plan for agent
- Implement all trace types and trace modes for agent.
- Why?
- Maximum flexibility.
> **Counterargument:**
>
> Due to the intrusive nature of adding tracing, we have
> learnt that landing small incremental changes is simpler and quicker!
- Compatibility with [Kata 1.x tracing][kata-1x-tracing].
> **Counterargument:**
>
> Agent tracing in Kata 1.x was extremely awkward to setup (to the extent
> that it's unclear how many users actually used it!)
>
> This point, coupled with the new architecture for Kata 2.x, suggests
> that we may not need to supply the same set of tracing features (in fact
> they may not make sense)).
## Agent tracing proposals
### Agent tracing proposal 1: Don't implement dynamic trace mode
- All tracing will be static.
- Why?
- Because dynamic tracing will always be "partial"
> In fact, not only would it be only a "snapshot" of activity, it may not
> even be possible to create a complete "trace transaction". If this is
> true, the trace output would be partial and would appear "unstructured".
### Agent tracing proposal 2: Simplify handling of trace type
- Agent tracing will be "isolated" by default.
- Agent tracing will be "collated" if runtime tracing is also enabled.
- Why?
- Offers a graceful fallback for agent tracing if runtime tracing disabled.
- Simpler code!
## Questions to ask yourself (part 1)
- Are your containers long-running or short-lived?
- Would you ever need to turn on tracing "briefly"?
- If "yes", is a "partial trace" useful or useless?
> Likely to be considered useless as it is a partial snapshot.
> Alternative tracing methods may be more appropriate to dynamic
> OpenTelemetry tracing.
## Questions to ask yourself (part 2)
- Are you happy to stop a container to enable tracing?
If "no", dynamic tracing may be required.
- Would you ever want to trace the agent and the runtime "in isolation" at the
same time?
- If "yes", we need to fully implement `trace_mode=isolated`
> This seems unlikely though.
## Trace collection
The second set of proposals affect the way traces are collected.
### Motivation
Currently:
- The runtime sends trace spans to Jaeger directly.
- The agent will send trace spans to the [`trace-forwarder`][trace-forwarder] component.
- The trace forwarder will send trace spans to Jaeger.
Kata agent tracing overview:
```
+-------------------------------------------+
| Host |
| |
| +-----------+ |
| | Trace | |
| | Collector | |
| +-----+-----+ |
| ^ +--------------+ |
| | spans | Kata VM | |
| +-----+-----+ | | |
| | Kata | spans | +-----+ | |
| | Trace |<-----------------|Kata | | |
| | Forwarder | VSOCK | |Agent| | |
| +-----------+ Channel | +-----+ | |
| +--------------+ |
+-------------------------------------------+
```
Currently:
- If agent tracing is enabled but the trace forwarder is not running,
the agent will error.
- If the trace forwarder is started but Jaeger is not running,
the trace forwarder will error.
### Goals
- The runtime and agent should:
- Use the same trace collection implementation.
- Use the most the common configuration items.
- Kata should should support more trace collection software or `SaaS`
(for example `Zipkin`, `datadog`).
- Trace collection should not block normal runtime/agent operations
(for example if `vsock-exporter`/Jaeger is not running, Kata Containers should work normally).
### Trace collection proposals
#### Trace collection proposal 1: Send all spans to the trace forwarder as a span proxy
Kata runtime/agent all send spans to trace forwarder, and the trace forwarder,
acting as a tracing proxy, sends all spans to a tracing back-end, such as Jaeger or `datadog`.
**Pros:**
- Runtime/agent will be simple.
- Could update trace collection target while Kata Containers are running.
**Cons:**
- Requires the trace forwarder component to be running (that is a pressure to operation).
#### Trace collection proposal 2: Send spans to collector directly from runtime/agent
Send spans to collector directly from runtime/agent, this proposal need
network accessible to the collector.
**Pros:**
- No additional trace forwarder component needed.
**Cons:**
- Need more code/configuration to support all trace collectors.
## Future work
- We could add dynamic and fully isolated tracing at a later stage,
if required.
## Further details
- See the new [GitHub project](https://github.com/orgs/kata-containers/projects/28).
- [kata-containers-tracing-status](https://gist.github.com/jodh-intel/0ee54d41d2a803ba761e166136b42277) gist.
- [tracing documentation PR][tracing-doc-pr].
## Summary
### Time line
- 2021-07-01: A summary of the discussion was
[posted to the mail list](http://lists.katacontainers.io/pipermail/kata-dev/2021-July/001996.html).
- 2021-06-22: These proposals were
[discussed in the Kata Architecture Committee meeting](https://etherpad.opendev.org/p/Kata_Containers_2021_Architecture_Committee_Mtgs).
- 2021-06-18: These proposals where
[announced on the mailing list](http://lists.katacontainers.io/pipermail/kata-dev/2021-June/001980.html).
### Outcome
- Nobody opposed the agent proposals, so they are being implemented.
- The trace collection proposals are still being considered.
[kata-1x-tracing]: https://github.com/kata-containers/agent/blob/master/TRACING.md
[trace-forwarder]: /src/tools/trace-forwarder
[tracing-doc-pr]: https://github.com/kata-containers/kata-containers/pull/1937

View File

@@ -1,16 +1,33 @@
- [Virtual machine vCPU sizing in Kata Containers](#virtual-machine-vcpu-sizing-in-kata-containers)
* [Default number of virtual CPUs](#default-number-of-virtual-cpus)
* [Virtual CPUs and Kubernetes pods](#virtual-cpus-and-kubernetes-pods)
* [Container lifecycle](#container-lifecycle)
* [Container without CPU constraint](#container-without-cpu-constraint)
* [Container with CPU constraint](#container-with-cpu-constraint)
* [Do not waste resources](#do-not-waste-resources)
# Virtual machine vCPU sizing in Kata Containers
## Default number of virtual CPUs
Before starting a container, the [runtime][4] reads the `default_vcpus` option
from the [configuration file][5] to determine the number of virtual CPUs
Before starting a container, the [runtime][6] reads the `default_vcpus` option
from the [configuration file][7] to determine the number of virtual CPUs
(vCPUs) needed to start the virtual machine. By default, `default_vcpus` is
equal to 1 for fast boot time and a small memory footprint per virtual machine.
Be aware that increasing this value negatively impacts the virtual machine's
boot time and memory footprint.
In general, we recommend that you do not edit this variable, unless you know
what are you doing. If your container needs more than one vCPU, use
[Kubernetes `cpu` limits][1] to assign more resources.
[docker `--cpus`][1], [docker update][4], or [Kubernetes `cpu` limits][2] to
assign more resources.
*Docker*
```sh
$ docker run --name foo -ti --cpus 2 debian bash
$ docker update --cpus 4 foo
```
*Kubernetes*
@@ -40,7 +57,7 @@ $ sudo -E kubectl create -f ~/cpu-demo.yaml
## Virtual CPUs and Kubernetes pods
A Kubernetes pod is a group of one or more containers, with shared storage and
network, and a specification for how to run the containers [[specification][2]].
network, and a specification for how to run the containers [[specification][3]].
In Kata Containers this group of containers, which is called a sandbox, runs inside
the same virtual machine. If you do not specify a CPU constraint, the runtime does
not add more vCPUs and the container is not placed inside a CPU cgroup.
@@ -64,7 +81,13 @@ constraints with each container trying to consume 100% of vCPU, the resources
divide in two parts, 50% of vCPU for each container because your virtual
machine does not have enough resources to satisfy containers needs. If you want
to give access to a greater or lesser portion of vCPUs to a specific container,
use [Kubernetes `cpu` requests][1].
use [`docker --cpu-shares`][1] or [Kubernetes `cpu` requests][2].
*Docker*
```sh
$ docker run -ti --cpus-shares=512 debian bash
```
*Kubernetes*
@@ -94,9 +117,10 @@ $ sudo -E kubectl create -f ~/cpu-demo.yaml
Before running containers without CPU constraint, consider that your containers
are not running alone. Since your containers run inside a virtual machine other
processes use the vCPUs as well (e.g. `systemd` and the Kata Containers
[agent][3]). In general, we recommend setting `default_vcpus` equal to 1 to
[agent][5]). In general, we recommend setting `default_vcpus` equal to 1 to
allow non-container processes to run on this vCPU and to specify a CPU
constraint for each container.
constraint for each container. If your container is already running and needs
more vCPUs, you can add more using [docker update][4].
## Container with CPU constraint
@@ -105,7 +129,7 @@ constraints using the following formula: `vCPUs = ceiling( quota / period )`, wh
`quota` specifies the number of microseconds per CPU Period that the container is
guaranteed CPU access and `period` specifies the CPU CFS scheduler period of time
in microseconds. The result determines the number of vCPU to hot plug into the
virtual machine. Once the vCPUs have been added, the [agent][3] places the
virtual machine. Once the vCPUs have been added, the [agent][5] places the
container inside a CPU cgroup. This placement allows the container to use only
its assigned resources.
@@ -122,34 +146,30 @@ the virtual machine starts with 8 vCPUs and 1 vCPUs is added and assigned
to the container. Non-container processes might be able to use 8 vCPUs but they
use a maximum 1 vCPU, hence 7 vCPUs might not be used.
## Virtual CPU handling without hotplug
In some cases, the hardware and/or software architecture being utilized does not support
hotplug. For example, Firecracker VMM does not support CPU or memory hotplug. Similarly,
the current Linux Kernel for aarch64 does not support CPU or memory hotplug. To appropriately
size the virtual machine for the workload within the container or pod, we provide a `static_sandbox_resource_mgmt`
flag within the Kata Containers configuration. When this is set, the runtime will:
- Size the VM based on the workload requirements as well as the `default_vcpus` option specified in the configuration.
- Not resize the virtual machine after it has been launched.
*Container without CPU constraint*
VM size determination varies depending on the type of container being run, and may not always
be available. If workload sizing information is not available, the virtual machine will be started with the
`default_vcpus`.
```sh
$ docker run -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_*"
1 # number of vCPUs
100000 # cfs period
-1 # cfs quota
```
In the case of a pod, the initial sandbox container (pause container) typically doesn't contain any resource
information in its runtime `spec`. It is possible that the upper layer runtime
(i.e. containerd or CRI-O) may pass sandbox sizing annotations within the pause container's
`spec`. If these are provided, we will use this to appropriately size the VM. In particular,
we'll calculate the number of CPUs required for the workload and augment this by `default_vcpus`
configuration option, and use this for the virtual machine size.
*Container with CPU constraint*
In the case of a single container (i.e., not a pod), if the container specifies resource requirements,
the container's `spec` will provide the sizing information directly. If these are set, we will
calculate the number of CPUs required for the workload and augment this by `default_vcpus`
configuration option, and use this for the virtual machine size.
```sh
docker run --cpus 4 -ti debian bash -c "nproc; cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_*"
5 # number of vCPUs
100000 # cfs period
400000 # cfs quota
```
[1]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
[2]: https://kubernetes.io/docs/concepts/workloads/pods/pod/
[3]: ../../src/agent
[4]: ../../src/runtime
[5]: ../../src/runtime/README.md#configuration
[1]: https://docs.docker.com/config/containers/resource_constraints/#cpu
[2]: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource
[3]: https://kubernetes.io/docs/concepts/workloads/pods/pod/
[4]: https://docs.docker.com/engine/reference/commandline/update/
[5]: ../../src/agent
[6]: ../../src/runtime
[7]: ../../src/runtime/README.md#configuration

View File

@@ -1,5 +1,16 @@
# Virtualization in Kata Containers
- [Virtualization in Kata Containers](#virtualization-in-kata-containers)
- [Mapping container concepts to virtual machine technologies](#mapping-container-concepts-to-virtual-machine-technologies)
- [Kata Containers Hypervisor and VMM support](#kata-containers-hypervisor-and-vmm-support)
- [QEMU/KVM](#qemukvm)
- [Machine accelerators](#machine-accelerators)
- [Hotplug devices](#hotplug-devices)
- [Firecracker/KVM](#firecrackerkvm)
- [Cloud Hypervisor/KVM](#cloud-hypervisorkvm)
- [Summary](#summary)
Kata Containers, a second layer of isolation is created on top of those provided by traditional namespace-containers. The
hardware virtualization interface is the basis of this additional layer. Kata will launch a lightweight virtual machine,
and use the guests Linux kernel to create a container workload, or workloads in the case of multi-container pods. In Kubernetes
@@ -11,10 +22,10 @@ the multiple hypervisors and virtual machine monitors that Kata supports.
## Mapping container concepts to virtual machine technologies
A typical deployment of Kata Containers will be in Kubernetes by way of a Container Runtime Interface (CRI) implementation. On every node,
Kubelet will interact with a CRI implementer (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI based runtime).
Kubelet will interact with a CRI implementor (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI based runtime).
The CRI API, as defined at the [Kubernetes CRI-API repo](https://github.com/kubernetes/cri-api/), implies a few constructs being supported by the
CRI implementation, and ultimately in Kata Containers. In order to support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI-implementer, Kata must provide the following constructs:
CRI implementation, and ultimately in Kata Containers. In order to support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI-implementor, Kata must provide the following constructs:
![API to construct](./arch-images/api-to-construct.png)
@@ -30,23 +41,28 @@ Each hypervisor or VMM varies on how or if it handles each of these.
## Kata Containers Hypervisor and VMM support
Kata Containers [supports multiple hypervisors](../hypervisors.md).
Kata Containers is designed to support multiple virtual machine monitors (VMMs) and hypervisors.
Kata Containers supports:
- [ACRN hypervisor](https://projectacrn.org/)
- [Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor)/[KVM](https://www.linux-kvm.org/page/Main_Page)
- [Firecracker](https://github.com/firecracker-microvm/firecracker)/KVM
- [QEMU](http://www.qemu-project.org/)/KVM
Details of each solution and a summary are provided below.
Which configuration to use will depend on the end user's requirements. Details of each solution and a summary are provided below.
### QEMU/KVM
Kata Containers with QEMU has complete compatibility with Kubernetes.
Depending on the host architecture, Kata Containers supports various machine types,
for example `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `q35`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](architecture/README.md#configuration) file.
for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `pc`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](./architecture.md/#configuration) file.
Devices and features used:
- virtio VSOCK or virtio serial
- virtio block or virtio SCSI
- [virtio net](https://www.redhat.com/en/virtio-networking-series)
- virtio net
- virtio fs or virtio 9p (recommend: virtio fs)
- VFIO
- hotplug
@@ -60,8 +76,9 @@ Machine accelerators are architecture specific and can be used to improve the pe
and enable specific features of the machine types. The following machine accelerators
are used in Kata Containers:
- NVDIMM: This machine accelerator is x86 specific and only supported by `q35` machine types.
`nvdimm` is used to provide the root filesystem as a persistent memory device to the Virtual Machine.
- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
memory device to the Virtual Machine.
#### Hotplug devices
@@ -88,34 +105,25 @@ Devices used:
### Cloud Hypervisor/KVM
[Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor), based
on [rust-vmm](https://github.com/rust-vmm), is designed to have a
lighter footprint and smaller attack surface for running modern cloud
workloads. Kata Containers with Cloud
Hypervisor provides mostly complete compatibility with Kubernetes
comparable to the QEMU configuration. As of the 1.12 and 2.0.0 release
of Kata Containers, the Cloud Hypervisor configuration supports both CPU
and memory resize, device hotplug (disk and VFIO), file-system sharing through virtio-fs,
block-based volumes, booting from VM images backed by pmem device, and
fine-grained seccomp filters for each VMM threads (e.g. all virtio
device worker threads). Please check [this GitHub Project](https://github.com/orgs/kata-containers/projects/21)
for details of ongoing integration efforts.
Cloud Hypervisor, based on [rust-VMM](https://github.com/rust-vmm), is designed to have a lighter footprint and attack surface. For Kata Containers,
relative to Firecracker, the Cloud Hypervisor configuration provides better compatibility at the expense of exposing additional devices: file system
sharing and direct device assignment. As of the 1.10 release of Kata Containers, Cloud Hypervisor does not support device hotplug, and as a result
does not support updating container resources after boot, or utilizing block based volumes. While Cloud Hypervisor does support VFIO, Kata is still adding
this support. As of 1.10, Kata does not support block based volumes or direct device assignment. See [Cloud Hypervisor device support documentation](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/docs/device_model.md)
for more details on Cloud Hypervisor.
Devices and features used:
- virtio VSOCK or virtio serial
Devices used:
- virtio VSOCK
- virtio block
- virtio net
- virtio fs
- virtio pmem
- VFIO
- hotplug
- seccomp filters
- [HTTP OpenAPI](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vmm/src/api/openapi/cloud-hypervisor.yaml)
### Summary
| Solution | release introduced | brief summary |
|-|-|-|
| Cloud Hypervisor | 1.10 | upstream Cloud Hypervisor with rich feature support, e.g. hotplug, VFIO and FS sharing|
| Firecracker | 1.5 | upstream Firecracker, rust-VMM based, no VFIO, no FS sharing, no memory/CPU hotplug |
| QEMU | 1.0 | upstream QEMU, with support for hotplug and filesystem sharing |
| NEMU | 1.4 | Deprecated, removed as of 1.10 release. Slimmed down fork of QEMU, with experimental support of virtio-fs |
| Firecracker | 1.5 | upstream Firecracker, rust-VMM based, no VFIO, no FS sharing, no memory/CPU hotplug |
| QEMU-virtio-fs | 1.7 | upstream QEMU with support for virtio-fs. Will be removed once virtio-fs lands in upstream QEMU |
| Cloud Hypervisor | 1.10 | rust-VMM based, includes VFIO and FS sharing through virtio-fs, no hotplug |

View File

@@ -1,34 +1,24 @@
# Howto Guides
## Kubernetes Integration
* [Howto Guides](#howto-guides)
* [Kubernetes Integration](#kubernetes-integration)
* [Hypervisors Integration](#hypervisors-integration)
* [Advanced Topics](#advanced-topics)
## Kubernetes Integration
- [Run Kata containers with `crictl`](run-kata-with-crictl.md)
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and containerd with Kubernetes](how-to-use-k8s-with-containerd-and-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
- [Kata Containers and service mesh for Kubernetes](service-mesh.md)
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)
## Hypervisors Integration
Currently supported hypervisors with Kata Containers include:
- `qemu`
- `cloud-hypervisor`
- `firecracker`
In the case of `firecracker` the use of a block device `snapshotter` is needed
for the VM rootfs. Refer to the following guide for additional configuration
steps:
- [Setup Kata containers with `firecracker`](how-to-use-kata-containers-with-firecracker.md)
- `ACRN`
While `qemu` , `cloud-hypervisor` and `firecracker` work out of the box with installation of Kata,
some additional configuration is needed in case of `ACRN`.
Refer to the following guides for additional configuration steps:
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with NEMU](how-to-use-kata-containers-with-nemu.md)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)
## Advanced Topics
- [How to use Kata Containers with virtio-fs](how-to-use-virtio-fs-with-kata.md)
- [Setting Sysctls with Kata](how-to-use-sysctls-with-kata.md)
- [What Is VMCache and How To Enable It](what-is-vm-cache-and-how-do-I-use-it.md)
@@ -38,8 +28,3 @@
- [How to use Kata Containers with `virtio-mem`](how-to-use-virtio-mem-with-kata.md)
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)
- [How to monitor Kata Containers in K8s](how-to-set-prometheus-in-k8s.md)
- [How to use hotplug memory on arm64 in Kata Containers](how-to-hotplug-memory-arm64.md)
- [How to setup swap devices in guest kernel](how-to-setup-swap-devices-in-guest-kernel.md)
- [How to run rootless vmm](how-to-run-rootless-vmm.md)
- [How to run Docker with Kata Containers](how-to-run-docker-with-kata.md)
- [How to run Kata Containers with `nydus`](how-to-use-virtio-fs-nydus-with-kata.md)

View File

@@ -1,5 +1,23 @@
# How to use Kata Containers and Containerd
- [Concepts](#concepts)
- [Kubernetes `RuntimeClass`](#kubernetes-runtimeclass)
- [Containerd Runtime V2 API: Shim V2 API](#containerd-runtime-v2-api-shim-v2-api)
- [Install](#install)
- [Install Kata Containers](#install-kata-containers)
- [Install containerd with CRI plugin](#install-containerd-with-cri-plugin)
- [Install CNI plugins](#install-cni-plugins)
- [Install `cri-tools`](#install-cri-tools)
- [Configuration](#configuration)
- [Configure containerd to use Kata Containers](#configure-containerd-to-use-kata-containers)
- [Kata Containers as a `RuntimeClass`](#kata-containers-as-a-runtimeclass)
- [Kata Containers as the runtime for untrusted workload](#kata-containers-as-the-runtime-for-untrusted-workload)
- [Kata Containers as the default runtime](#kata-containers-as-the-default-runtime)
- [Configuration for `cri-tools`](#configuration-for-cri-tools)
- [Run](#run)
- [Launch containers with `ctr` command line](#launch-containers-with-ctr-command-line)
- [Launch Pods with `crictl` command line](#launch-pods-with-crictl-command-line)
This document covers the installation and configuration of [containerd](https://containerd.io/)
and [Kata Containers](https://katacontainers.io). The containerd provides not only the `ctr`
command line tool, but also the [CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
@@ -39,8 +57,8 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2) for Kata.
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
@@ -72,6 +90,7 @@ $ command -v containerd
### Install CNI plugins
> **Note:** You do not need to install CNI plugins if you do not want to use containerd with Kubernetes.
> If you have installed Kubernetes with `kubeadm`, you might have already installed the CNI plugins.
You can manually install CNI plugins as follows:
@@ -93,8 +112,8 @@ $ popd
You can install the `cri-tools` from source code:
```bash
$ go get github.com/kubernetes-sigs/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-sigs/cri-tools
$ go get github.com/kubernetes-incubator/cri-tools
$ pushd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
$ make
$ sudo -E make install
$ popd
@@ -130,42 +149,74 @@ For
The `RuntimeClass` is suggested.
The following configuration includes two runtime classes:
The following configuration includes three runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/runtime/v2#binary-naming))
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/master/runtime/v2#binary-naming))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2)).
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)).
- `plugins.cri.containerd.runtimes.katacli`: the `containerd-shim-runc-v1` calls `kata-runtime`, which is the legacy process.
```toml
[plugins.cri.containerd]
no_pivot = false
[plugins.cri.containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
privileged_without_host_devices = false
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v1"
[plugins.cri.containerd.runtimes.runc.options]
NoPivotRoot = false
NoNewKeyring = false
ShimCgroup = ""
IoUid = 0
IoGid = 0
BinaryName = "runc"
Root = ""
CriuPath = ""
SystemdCgroup = false
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
container_annotations = ["io.katacontainers.*"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
[plugins.cri.containerd.runtimes.katacli]
runtime_type = "io.containerd.runc.v1"
[plugins.cri.containerd.runtimes.katacli.options]
NoPivotRoot = false
NoNewKeyring = false
ShimCgroup = ""
IoUid = 0
IoGid = 0
BinaryName = "/usr/bin/kata-runtime"
Root = ""
CriuPath = ""
SystemdCgroup = false
```
From Containerd v1.2.4 and Kata v1.6.0, there is a new runtime option supported, which allows you to specify a specific Kata configuration file as follows:
```toml
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
[plugins.cri.containerd.runtimes.kata.options]
ConfigPath = "/etc/kata-containers/config.toml"
```
`privileged_without_host_devices` tells containerd that a privileged Kata container should not have direct access to all host devices. If unset, containerd will pass all host devices to Kata container, which may cause security issues.
`pod_annotations` is the list of pod annotations passed to both the pod sandbox as well as container through the OCI config.
`container_annotations` is the list of container annotations passed through to the OCI config of the containers.
This `ConfigPath` option is optional. If you do not specify it, shimv2 first tries to get the configuration file from the environment variable `KATA_CONF_FILE`. If neither are set, shimv2 will use the default Kata configuration file paths (`/etc/kata-containers/configuration.toml` and `/usr/share/defaults/kata-containers/configuration.toml`).
If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0 and also want to specify a configuration file, you can use the following workaround, since the shimv2 accepts an environment variable, `KATA_CONF_FILE` for the configuration file path. Then, you can create a
shell script with the following:
```bash
#!/bin/bash
KATA_CONF_FILE=/etc/kata-containers/firecracker.toml containerd-shim-kata-v2 $@
```
Name it as `/usr/local/bin/containerd-shim-katafc-v2` and reference it in the configuration of containerd:
```toml
[plugins.cri.containerd.runtimes.kata-firecracker]
runtime_type = "io.containerd.katafc.v2"
```
#### Kata Containers as the runtime for untrusted workload
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
@@ -185,8 +236,28 @@ and then, run an untrusted workload with Kata Containers:
runtime_type = "io.containerd.kata.v2"
```
For the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
# "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
[plugins.cri.containerd.default_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
[plugins.cri.containerd.untrusted_workload_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# runtime_engine is the name of the runtime engine used by containerd.
runtime_engine = "/usr/bin/kata-runtime"
```
You can find more information on the [Containerd config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### Kata Containers as the default runtime
If you want to set Kata Containers as the only runtime in the deployment, you can simply configure as follows:
@@ -197,6 +268,15 @@ If you want to set Kata Containers as the only runtime in the deployment, you ca
runtime_type = "io.containerd.kata.v2"
```
Alternatively, for the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/bin/kata-runtime"
```
### Configuration for `cri-tools`
> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.
@@ -250,12 +330,10 @@ To run a container with Kata Containers through the containerd command line, you
```bash
$ sudo ctr image pull docker.io/library/busybox:latest
$ sudo ctr run --cni --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
$ sudo ctr run --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/busybox:latest hello sh
```
This launches a BusyBox container named `hello`, and it will be removed by `--rm` after it quits.
The `--cni` flag enables CNI networking for the container. Without this flag, a container with just a
loopback interface is created.
### Launch Pods with `crictl` command line

Some files were not shown because too many files have changed in this diff Show More