There were still issues. Tested in fork, verified environment variable
passing works as before now.
Fixes: #1273
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
- snap: Fix yq error in build
- storage: cleanup and support read only block dev hotplug
- rootfs: Don't fallthrough in the docker_extra_args() switch
- github: Add github actions
- shimv2: Avoid double removing of container from sandbox
- Agent: return error on trying to persist a pid namespace and minor improvements
- rustjail: allow network sysctls
- rustjail: fix the issue of sync read
- rustjail: fix the issue of bind mount /dev
- qemu: no state to save if QEMU isn't running
- packaging/qemu: Build and package completely inside the container
- agent: upgrade cgroups to 0.2.0
- agent: Simplify .or_else() to .or()
- Fix error reporting in listInterfaces() and listRoutes()
- improve rustjail validator
- Add void "install" targets for both "trace-forwarder" and "agent-ctl"
- [forwardport] Add support for Gentoo
- oci: fix a typo in "addtionalGids"
- Don't update cpusets if no CPUs changed closes #1172
- rootfs: reduce size of debian image
- runtime: Allow to overwrite DESTDIR
- snap: fix snap release channel
- Don't leak fd when reseeding rng
- Fixes for make generate-protocols
- docs: Fix docs in docs/architecture.md
- docs: Update the Cloud Hypervisor description in virtualization.md
- agent: exit from exec hangs if background process is present
- [forwardport] install: Improve snap documentation
- handle vcpus properly utilized in the guest
- docs: fix the custom agent binary file path for creating initrd image
- shimv2: handle ctx passed by containerd
- runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
- agent: Adjust OOM Score to avoid agent being killed.
- [forward port] cli: make check subcommand more tolerant to failures
- docs: add link to VMT on top level README
- rustjail: fork a new child process to change the pid ns
- rustjail: remove the network ns validation against container
- snap: update apps section
- runtime: don't wait the second shim process in shim start
- agent: create pci root Bus Path for arm64
- agent: enable lto flag for Cargo to get better optimized code
- virtcontainers: revert CleanupContainer from PR 1079
- docs: Create hypervisor summary document
- Add hyperlink and fix typo
- versions: Use CRI-O v1.18.4-4-g6dee3891e
- runtime: change configuration key name from EnablePprof to enable_pprof
- runtime: delete sandboxlist.go and sandboxlist_test.go
- versions: Use release-1.18 (commit ee9128444bec10)
- runtime: clh: disable virtiofs DAX when FS cache size is 0
- release: Fix release candidate to major version upgrade check
- runtime: sleep 1 second after GetOOMEvent failed
- Agent: README updates for build on ppc64le
- runtime: clean/refactor code
- Forward port annotation doc
- versions: Update cloud-hypervisor to release v0.11.0
- docs: Add instructions for enabling VM templating
- Revert "version: revert back to crio 1.8.3"
- Dump guest memory when kernel panic for QEMU
- clh: Consolidate the code path for device unplug
- agent: Log ttrpc messages
- annotations: Improve asset annotation handling
- runtime: readonly volume should be bind mounted readonly on the host
- docs: Fix incorrect docs in config file
- CI: Fix incorrect URL
- docs: Update top-level README
- versions: Update crio version
- runtime: cloud-hypervisor: reduce memory footprint
- agent: Improve unit test coverage for src/sandbox.rs
- rustjail: fix the issue of create thread failed causing current thread panic
- Improve unit test coverage for rustjail/container.rs
- agent: Update build instructions
- cli: Provide aliases for kata-* subcommands and options
- runtime: Restore QEMUVIRTIOFSPATH variable in Makefile
- Use apply_patches.sh in qemu and kernel scripts
- clean up agent proto files
- agent: fixes the permissions of PID 1's STDIO
- Feature/1004 add version for kata monitor
- agent: Generate proto files programmatically
- runtime: Fix firecracker config
- docs: remove the 1.x version description about shim and proxy
- arm64: correct bridge type for QEMUVIRT
- snap: add GH actions jobs to release the snap package
- agent: clear clippy warnings
- agent: simplify ttrpc error construction
- Replace @RUNTIME_NAME@ with the target in generated files
- 2.0 update doc for hypervisor related information
- virtcontainers: Append max_ports to virtio-serial device
- snap: install libseccomp-dev
- runtime: set virtio-fs as default fs sharing method
- VirtioFS: backports & default settings to improve performance
- tools: Make agent-ctl support more APIs
- Validate runtime annotations
- kernel: update to 5.4.71
- config: make virtio-fs part of standard kernel
- agent: Optimize error handling
- versions: Update Kubernetes, containerd, cri-o and cri-tools
- agent: fix crashers if API requests empty
- rustjail: add length check for uid_mappings in rootless euid mapping
- kata-monitor: use regexp to check if runtime is kata containers
- docs: update the build kata containers kernel document
- cgroup and cpuset fixes from 1.x
- docs: Update upgrading guide
- agent: fix panic on malformed device resource in container update
- Forward port device conflict fixes from Kata 1 / Go agent
- docs: Add containerd install guide
- agent: simplify codes
- agent: fix erroneous parsing for guest block size
- agent: use macro to simplify parse_cmdline function in config.rs
- fix arm CI
- packaging: fix missing cloud_hypervisor_repo
- docs: Add crictl example json files
- ci: snap: add event filtering
- agent: do not follow link when mounting container proc and sysfs
- agent-ctl: include cargo lock updates
- agent: set init process non-dumpable
- runtime: Clear the VCMock 1.x API Methods from 2.0
- virtiofs: Disable DAX
- docs: Update docs for enabling agent debug console
- Remove compilation warnings
- osbuilder: Create target directory for agent
- versions: add plugins section
- snap: specify python version
- packaging: fix image build script
- Main packaging fixups
- clh: Support VFIO device unplug
- ci: add github action to test the snap
- docs: update networking description
- docs: update dev guide for agent build
- rust-agent: Update README
- docs: update architecture.md
- runtime: add support for SGX
- version: upgrade qemu version to v5.1.0 for arm64
- agent: Fix OCI Windows network shared container name typo
- github: Remove issue template and use central one
- docs: fix broken links
- Packaging: release notes script using error kernel path urls
- rust-agent: Replaces improper use of match for non-constant patterns
- devices: fix go test warning in manager_test.go
- action: Allow long lines if non-alphabetic
- Indicates never return function and remove unreachable code
- agent: propagate the internal detail errors to users
- Update Installation Guide to better reflect the current state of the project
- ci: fix clone_tests_repo function
- agent: Set LIBC=gnu for ppc64le arch by default
- fc: integrate Firecracker's metrics
- Fix to qemu experimental and improvements
- ci: resurrect travis static checkers
- agent: fix UT failures due to chdir
- agent: Only allow proc mount if it is procfs
- kata 2.0: add debug console service
- runtime: Call s.newStore.Destroy if globalSandboxList.addSandbox
- shimv2: add a comment in checkAndMount()
- osbuilder: specify default toolchain version in rust-init
- runtime: Update CLH client pkg to version v0.10.0
- agent/oci: Don't use deprecated Error::description() method
- runtime: Fix linter errors in release files
- packaging: Build from source if the clh release binary is missing
- runtime: add podman configuration to data collection script
- ci: use Travis cache to reduce build time
- agent: update cgroups crate
- docs: Update the reference path of kata-deploy in the packaging
- runtime: make kata-check check for newer release
- how-to: add privileged_without_host_devices to containerd guide
- agent: Unit tests for rustjail/mount.rs
- docs: Fix the kata-pkgsync tool's docs script path
- Fix developer guide
- fix guest panic when running agent as init
- packaging: update version file url for kata 2.0 in Makefile
- Fix release notes
789fd7c1 blk-dev: hotplug readonly if applicable
12777b26 volumes: cleanup / minor refactoring
fbc1d123 vendor: revendor govmm
6cc1920c snap: Fix yq error in build
b329a74f rootfs: Fix indentation inside a switch
8879f9a0 rootfs: apparmor=unconfined is needed for non Red Hat host OSes
bbeebcdb rootfs: Always add SYS_ADMIN, CHROOT, and MKNOD caps to docker cmdline
90ec2fa8 rootfs: Don't fallthrough in the docker_extra_args() switch
ebd9fcc2 actions: Run static checks before make agent
0d3736d5 rustjail: fix the issue of sync read
0dc02f6d rustjail: fix the issue of bind mount /dev
894fa42a rustjail: allow network sysctls
d4cd2554 agent: Avoid container stats panic caused by cgroup controller non-exist
157e055f agent: upgrade crate cgroups to 0.2.0
e3ec1d50 agent: Simplify .or_else() to .or()
14e7042c agent: Clean up commented use declarations
5fe5b321 agent: Fix temp prefix on Namespace::test_setup_persistent_ns
3a891d4e agent: Return error on trying to persist a pid namespace
5c464018 shimv2: Avoid double removing of container from sandbox
b366af93 jail: add more test cases for validator
d38a5d3f jail/validator: introduce helpers to reduce duplicated code
76ad3213 jail/validator: avoid unwrap() for safety
51fd624f rustjail: add more context info for errors
9321e1b2 oci: fix two incompatible issues with OCI spec
406a91ff agent: consume ttrpc crate from crates.io
9a7bcccc qemu: no state to save if QEMU isn't running
6181570c oci: fix a typo in "addtionalGids"
a5372e00 github: Add github actions
4af5beda agent/sandbox: Don't update cpuset when ncpus = 0
e004616b runtime/network: Fix error reporting in listRoutes()
1ae8e81a runtime/network: Correct error reporting in listInterfaces()
a19263e5 agent/protocols: Remove unneeded import from oci.proto
a19cf28c agent/protocols: Remove some unnecessary include directives from protoc
2b452090 agent/protocols: Remove some unneeded dependencies for protocol generation
b36c9ea3 docs: Fix docs in docs/architecture.md
3db1c805 agent: Don't leak fd when reseeding rng
8ac93f65 rootfs-builder: add support for gentoo
9897238f rootfs: reduce size of debian image
d47122e9 docs: Update the Cloud Hypervisor description in virtualization.md
10e9bfc6 runtime: Allow to overwrite DESTDIR
f740032c packaging/qemu: Delete the temporary container
e5c710e8 packaging/qemu: Build and package completely in the container
4c3377de packaging/qemu: Add QEMU_DESTDIR argument to dockerfiles
faed2369 rootfs-builder: add functions to run before and after the container
8e5603e6 snap: fix snap release channel
8f538935 install: Improve snap documentation
1ca415d8 agent: exit from exec hangs if background process is present
a00f7c34 docs: fix the custom agent binary file path for creating initrd image
0155fe12 shimv2: handle ctx passed by containerd
a793b8d9 agent: update cpuset of container path
705182d0 agent: ignore updating cpuset error when update cgroups
647331ac runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox'
e684a541 docs: add link to VMT on top level README
68f66c51 agent-ctl: Add void "install" target
5e407758 trace-forwarder: Add void "install" target
70f198d7 cli: check modules and permissions before loading a module
cb684cf8 cli: don't fail if rate limit is exceeded
9216f2ad rustjail: fork a new child process to change the pid ns
3b08376c rustjail: remove the network ns validation against container
c388ec5b runtime: don't wait the second shim process in shim start
d6acc4c0 agent: enable lto flag for Cargo to get better optimized code
13a8e4e3 snap: update apps section
fdbf7d32 virtcontainers: revert CleanupContainer from PR 1079
91a390f0 docs: Create hypervisor summary document
3eeb25a1 docs: Tidied up virtualisation summary table
8ec3cf08 docs: Adding hyperlink to virtio-net in kata documentation 2.0
b5b67db8 docs: Fixing typo in virtualization.md file
4d46d0f0 versions: Use CRI-O v1.18.4-4-g6dee3891e
53b5d063 agent: Adjust OOM Score to avoid agent being killed.
14a21c3a runtime: change configuration key name from EnablePprof to enable_pprof
4e3a8c01 runtime: remove global sandbox variable
29020394 runtime: delete sandboxlist.go and sandboxlist_test.go
9b88a96b versions: Use release-1.18 (commit ee9128444bec10)
36f65ce1 runtime: clh: update cloud-hypervisor
e1396f04 runtime: clh: disable virtiofs DAX when FS cache size is 0
8f38265b release: Fix release candidate to major version upgrade check
2e0bf40a tests: Ensure semver build metadata is ignored
4024a827 release: Make error format string consistent
cb0e6094 runtime: sleep 1 second after GetOOMEvent failed
4c78814b docs: Fix pre-existing spelling mistakes caught by the CI
6c083d94 docs: Add a link to document describing how to use annotations
d67921a2 docs: Document restricted annotations
1fc7b764 docs: Repair inconsistencies between 2.0 and 1.x
21801a11 versions: Revert "version: revert back to crio 1.8.3"
b8414045 runtime: remove nsenter
e3510be8 runtime: use one line if statement to check if err is nil for qemu.go
378308e2 docs: Add instructions for enabling VM templating
92c1c4c6 versions: Update cloud-hypervisor to release v0.11.0
8907a339 agent: Only show ttrpc logs for trace log level
21cd7ad1 agent: Log ttrpc messages
286eebf0 agent: Add env var to set log level
b9c6db4b agent: Add env var tests
705e9955 agent: Add env var comment
5ced96e9 hypervisor: Remove unused methods
e82c9dae annotations: Improve asset annotation handling
0f26f1cd annotations: Add missing hypervisor control annotation
76064e3e asset: Formatting, grammar and whitespace
40418f6d runtime: add guest memory dump
ff13bde3 version: revert back to crio 1.8.3
6c2fc233 agent: create pci root Bus Path for arm64
a958eaa8 runtime: mount shared mountpoint readonly
125e21ce runtime: readonly mounts should be readonly bindmount on the host
5f0abc20 CI: Fix incorrect URL
b6f8a1d5 docs: Fix incorrect docs in config file
93d79625 clh: Consolidate the code path for device unplug
18a22459 Agent: README updates for build on ppc64le
655f2649 Agent: README updates for build on ppc64le
62c7e094 docs: Remove credits
679df0fb docs: Update top-level README
dfe364f8 Agent: README updates for build on ppc64le
77b50969 runtime: cloud-hypervisor: reduce memory footprint
2e1a8f0a agent: Improve unit test coverage for src/sandbox.rs
87848e87 versions: Update crio version
172d015e rustjail: fix the issue of create thread failed causing thread panic
9e93463b agent/rustjail: improve unit test coverage for rustjail/container.rs
ad4f7b86 agent/rustjail: make mount and umount2 public
926a6186 agent/rustjail: fix typo
8130d9b2 agent/rustjail: don't use unwrap in container::oci_state
5d111071 rustjail: add mock implementation for cgroup manager
e3eff0eb agent: Update build instructions
0896ce80 agent: update proto file copyright
6e9ca457 agent: generate proto files properly
837343f0 agent-ctl: update cargo.lock
b3166618 runtime: remove the unused proto files
54e23c83 agent: move gogo.proto out of the github.com namespace
583e6ed3 agent: types.pb.go is not regenerated
bb19fcb9 docs: Update documentation with new subcommand forms
d2fe7091 cli: Use new subcommand forms in kata-manager script
4d9ab0cd cli: Support new subcommand forms in bash completion
c5d355e1 cli: Remove `kata-` prefix from env and check subcommands
f134b4a3 agent: Update build instructions
9e9988df agent/protocols: Move agent.proto out of the mock folder of agent
e90aa7b4 agent: fixes the permissions of PID 1's STDIO
b9b281e7 packaging: Use apply-patches.sh in build-kernel.sh
163e6104 packaging: Make qemu/apply_patches.sh common
d4cf3057 packaging: qemu/apply_patches.sh should sort the patches
5b065eb5 runtime: change govmm package
9cb41507 agent/protocols: Fix copyright header checking
0d58d919 agent/protocols: Stop generate agent proto files in the shellscript
7559382b agent/protocols: Ignore generated files and remove these files from repo
fdc33fb7 agent/protocols: Generate proto files programmatically
f1c3bf6b runtime: let kata-collect-data.sh collect kata-monitor info
993a8da3 kata-monitor: add version subcommand
4ee78120 runtime: Restore QEMUVIRTIOFSPATH variable in Makefile
df4ce9fa ci: add `cargo clippy` for agent
2e138788 agent: clear match_like_matches_macro/vec_resize_to_zero warnings
227edfdc agent: clear module_inception/type_complexity warnings
698d25b7 agent: clear redundant_field_names clippy warning
4dd9bd7a agent: clear clippy `len_zero` warnings
bf7dec5c agent: clear clippy warnings
56f867ee rustjail: clear clippy warnings
16757ad4 oci: clear clippy warnings
f32f49bd logging: clear clippy warnings
5b079a3b snap: add GH actions jobs to release the snap package
2738b18b runtime: Fix firecracker config
e5d4259a runtime: Simplify make variables for clh
9eab3015 arm64: correct bridge type for QEMUVIRT
b88aac04 docs: Update how-to Readme with hypervisor information.
d6464117 docs: Update Readme to remove hypervisor information
b4f9fb51 docs: Remove docs for nemu
96a4ed7d Makefile: Replace @RUNTIME_NAME@ with the target in generated files
7159fc2e agent: simplify ttrpc error construction
0f894986 snap: install libseccomp-dev
9a351509 package: drop qemu-virtiofs shim
6ed669a1 packaging: install virtiofsd for normal qemu build as well
da79b4be virtcontainers: Append max_ports to virtio-serial device
bcf48530 runtime: enable virtiofs by default
e2221d34 tools: Improve agent-ctl README
2d1f2c7b kernel: update to 5.4.71
d3c98620 config: make virtio-fs part of standard kernel
edf02af1 tools: Make agent-ctl support more APIs
56201803 tools: Remove commented out code in agent-ctl
9bac4ee6 tools: Log request in agent-ctl tool if debug enabled
68821f08 tools: Rename agent-ctl command to GetGuestDetails
8553f062 tools: Fix comment in agent-ctl
6ba294a1 agent: remove `unwrap()` for `e.as_errno()`
e77482fe agent: Use `?` instead of `match` when the error returns directly
1b7ed328 kata-monitor: use regexp to check if runtime is kata containers
47ff2fb9 agent: use anyhow `context` to attach context to `Error` instead of `match`
2f690a2b agent: remove useless match
1d8def66 agent: Use `ok_or_else` instead of match for Option -> Result
84953066 agent: Fix crasher if AddARPNeighbors request empty
3d084c7d agent: Fix crasher if UpdateRoutes request empty
5615e5a7 agent: Fix crasher if UpdateInterface request empty
0dce817e agent: replace `match Result` with `or_else`
7bf4073d agent: replace unnecessary `match Result` with `map_err`
7f9e5913 agent: replace check! with map_err for readability
09aca49e agent: remove `check!` in child process because we can't see logs.
a18899f1 agent: refactor namespace::setup to optimize error handling
a3c64e5c agent: replace `if let Err` with `or_else`
6ffa8283 agent: replace `if let Err` with `map_err`
863f918a rustjail: add length check for uid_mappings in rootless euid mapping
720eab78 versions: Update Kubernetes, containerd, cri-o and cri-tools
c5771be2 annotations: Correct unit tests to validate new protections
398d7918 annotations: Split addHypervisorOverrides to reduce complexity
b2b3bc7a annotations: Add unit test for checkPathIsInGlobs
6f52179c annotations: Add unit test for regexpContains function
966bd573 makefile: Add missing generated vars to `USER_VARS`
be6ee255 makefile: Improve names of config entries for annotation checks
b1194274 annotations: Give better names to local variables in search functions
b5db114a annotations: Rename checkPathIsInGlobList with checkPathIsInGlobs
d65a7d10 config: Add better comments in the template files
7c6aede5 config: Whitelist hypervisor annotations by name
f047fced config: Use glob instead of regexp to match paths in annotations
11b9c90c annotations: Fix typo in comment
c16cdcb2 config: Add makefile variables for path lists
4e89b885 config: Protect file_mem_backend against annotation attacks
aae9656d config: Protect vhost_user_store_path against annotation attacks
55881653 config: Add security warning on configuration examples
b21a829c config: Protect ctlpath from annotation attack
27b6620b config: Protect jailer_path annotation
07669017 config: Add examples for path_list configuration
2d431c61 annotations: Simplify negative logic
2ca9ca89 config: Add hypervisor path override through annotations
2e093dfd config: Fix typo in function name
bf13ff0a config: Protect virtio_fs_daemon annotation
8c75de19 config: Add 'List' alternates for hypervisor configuration paths
fc6468ef agent: fix panic on malformed device resource in container update
d8a8fe47 cpuset: don't set cpuset.mems in the guest
88cd7128 sandbox: consider cpusets if quota is not enforced
77a463e5 cpuset: support setting mems for sandbox
2d690536 cpuset: add cpuset pkg
1a9515a9 runtime: Pass `--thread-pool-size=1` to virtiofsd
1c528cd1 packaging: Apply virtiofs performance related fixes to 5.x
5b520003 docs: Update upgrading guide
0e0564a5 docs: update the build kata containers kernel document
ae6b8ec7 agent/device: Check type as well as major:minor when looking up devices
859301b0 agent/device: Index all devices in spec before updating them
2477c355 agent/device: Forward port update_spec_device_list() unit test
08d80c1a agent/device: update_spec_device_list() should error if dev not found
12cc0ee1 sandbox: don't constrain cpus, mem only cpuset, devices
b6cf68a9 cgroups: add ability to update CPUSet
b812d4f7 virtcontainers: add method for calculating cpuset for sandbox
f63f7405 agent: fix erroneous parsing for guest block size
43d70a32 docs: Add containerd install guide
11c1ab8b agent: use ok_or/map_err instead of match
6b9f9915 rustjail: use Iterator to manipulate vector elements
a7251651 docs: remove the 1.x version description about shim and proxy
dc1442c3 rustjail: delete codes commented out
aa04111d rustjail: delete unused test code
eae685dc agent: use chain of Result to avoid early return
5e3d1fb6 agent: add blank lines between methods
980e48ca agent: delete unused field in agentService
52b821fa agent: use no-named closure to reduce codes
82e94501 packaging: fix cloud-hypervisor binary path
b1f95e8d agent: use a local fn to reduce duplicated codes
154a356a packaging: apply qemu v5.1 stable fixes
c781a808 agent: fix aarch64 build
906b3844 agent: update not accurate comments
78318c18 packaging: fix missing cloud_hypervisor_repo
b7309943 agent: use macro to simplify parse_cmdline function in config.rs
9834a766 docs: add namespace key to pod/container config files
37e7de72 ci: snap: add event filtering
9a02e6eb docs: Add crictl example json files
b7147eda agent: do not follow link when mounting container proc and sysfs
15b71563 agent: set init process non-dumpable
00ad3fd3 agent-ctl: include cargo lock updates
8cd62d7b versions: add plugins section
c4472481 virtiofs: Disable DAX
3e56de81 snap: specify python version
e3cdc89b osbuilder: Create target directory for agent
7cad865d packaging: fix image build script
0e898c6b rust-agent: Treat warnings as error
0e4baaab rust-agent: Identify unused results in tests
5b2b5652 rust-agent: Log returned errors rather than ignore them
d617caf1 rust-agent: Remove unused imports
ee739c5d rust-agent: Report errors to caller if possible
d5b492a1 rust-agent: Ignore write errors while writing to the logs
c635c46a rust-agent: Remove unused code that has undefined behavior
ec24f688 rust-agent: Remove 'mut' where not needed
c8f406d4 rust-agent: Remove uses of deprecated functions
f832d8a6 rust-agent: Remove or rename unused parameters
5a1d3311 rust-agent: Remove or rename unused variables
27efe291 rust-agent: Remove unused functions
d76ece0c rust-agent: Remove useless braces
3682812e rust-agent: Remove unused macros
483209bf actions: add kata deploy test
07930024 packaging: cleaning, updating based on new filepaths
f0f205cd packaging: remove obs-packaging
4b1753c5 packaging: pull versions, build-image out from obs dir
3f6cd4d5 packaging: Revert "packaging: Stop providing OBS packages"
c33ee54a clh: Support VFIO device unplug
1f4dfa31 clh: Remove unnecessary VmmPing
cc80ae0a versions: cloud-hypervisor: Bump to version 6d30fe05
0fec7a4d docs: Change kata_tap0 to tap0_kata
3394a6a5 docs: update networking description
2e83f405 dev-guide: update kata-agent install details
ffea705a docs: Update docs for enabling agent debug console
777f3981 docs: update dev guide for agent build
aa8eefd8 ci: add github action to test the snap
ea1cb37b versions: cloud-hypervisor: bump version
0ebffdf2 runtime: cloud-hypervisor: tag openapi-generator-cli container
e51a1ea3 docs: use-cases: Add Intel SGX use case
7d638231 runtime/vendor: add k8s.io/apimachinery/pkg/api/resource
6df165c1 runtime: add support for SGX
a5b3e1cd docs: drop docker installation guide
6c4300c6 docs: fix static check errors in docs/install/README.md
59224a76 docs: update architecture.md
a89deb3e rust-agent: Update README
80c52834 github: Remove issue template and use central one
0ccbca3b agent: Fix OCI Windows network shared container name typo
a6221a74 qemu: upgrade qemu version to 5.1.0 for arm64.
f30b86f1 Packaging: release notes script using error kernel path urls
a7faeaac docs: fix broken links
4501c25a agent: propagate the internal detail errors to users
1984e635 ci: fix clone_tests_repo function
02c1a59f agent: Set LIBC=gnu for ppc64le arch by default
7019e72c agent: remove unreachable code
942999ed agent: Change do_exec return type to ! because it will never return
757dfa70 fc: integrate Firecracker's metrics
b03d958e gitignore: ignore agent service file
64b4f698 agent: fix UT failures due to chdir
85d22301 runtime: fix TestNewConsole UT failure
e90e9a2c travis: skip static checker for ppc64
5611283e runtime: fix golint errors
daf2a54d agent: fix cargo fmt
c05c4ba5 ci: always checkout 2.0-dev of test repository
1569b3b3 docs: fix static check errors
df3119b6 runtime: fix make check
484a595f runtime: add enable_debug_console configuration item for agent
febdf8f6 runtime: add debug console service
07d339c7 devices: fix go test warning in manager_test.go
a4afe3af rust-agent: Replaces improper use of match for non-constant patterns
acaa806c agent: Only allow proc mount if it is procfs
ca501e54 osbuilder: specify default toolchain version in rust-init.
03517327 action: Allow long lines if non-alphabetic
33513fb4 rustjail: make the mount error info much more clear
45b0b4ed agent/oci: Don't use deprecated Error::description() method
a34478ff runtime: Update cloud-hypervisor client pkg to version v0.10.0
ce675075 static-build/qemu-virtiofs: Refactor apply virtiofs patches
512b38cf packaging/qemu: Add common code to apply patches
edce2712 static-build/qemu-virtiofs: Fix to apply QEMU patches
86a864b8 packaging: Build from source if the clh release binary is missing
33585a8e runtime: Fix linter errors in release files
e3a0f9b3 ci: use export command to export envs instead of env config item
36ce7018 agent: update cgroups crate
3523167d runtime: Call s.newStore.Destroy if globalSandboxList.addSandbox
9e5a4b8b ci: use Travis cache to reduce build time
52984b67 docs: Update the reference path of kata-deploy in the packaging
eae21591 runtime: add podman configuration to data collection script
d1277848 how-to: add privileged_without_host_devices to containerd guide
98c4d11b docs: fix k8s containerd howto links
f107b12b docs: fix up developer guide for 2.0
9f2f5201 docs: Fix the kata-pkgsync tool's docs script path
96f8769a travis: enable RUST_BACKTRACE
cda7acf7 agent/rustjail: add more unit tests
98cc979a agent/rustjail: remove makedev function
b99fefad agent/rustjail: add unit tests for ms_move_rootfs and mask_path
d79fad2d agent/rustjail: implement functions to chroot
25c91afb agent/rustjail: add unit test for pivot_rootfs
7cf0fd95 agent/rustjail: implement functions to pivot_root
672da4d0 agent/rustjail: add unit test for mount_cgroups
ab61cf7f agent/rustjail: add unit test for init_rootfs
0a0714c9 agent/rustjail/mount: don't use unwrap
3dc9452b agent/rustjail: add tempfile crate as dependency
d756f52c rustjail: implement functions to mount and umount files
a02d1787 gitignore: ignore agent version.rs
b518ddea agent: fix agent panic running as init
1a77f69e runtime: make kata-check check for newer release
61181b9f packaging: use local version file for kata 2.0 in Makefile
e1c6aa27 docs: fix release process doc
1acfba4d packaging: fix release notes
1839dfd9 runtime: Clear the VCMock 1.x API Methods from 2.0
7225460a shimv2: add a comment in checkAndMount()
22ca2da6 packaging: Stop providing OBS packages
afa88c1b install: Add contacts to the distribution packages
3955cc89 install: Update information about Community Packages
218f77d7 install: Update SUSE information
2a0e76a8 install: Update openSUSE information
691f1364 install: Update RHEL information
270fc4b2 install: Update Fedora information
492b4e90 install: Update CentOS information
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
yq major releases are not backward compatible, so install the same
major version used in the CI to avoid conflicts when building the kata
components.
We should update yq when the CI updates it, not before.
fixes #1232
Signed-off-by: Julio Montes <julio.montes@intel.com>
This reverts commit 6cc1920c37.
Instead of updating the syntax of yq, let's use yq 3.x; otherwise
yq must be updated in the CI and the syntax updated in all the
tools (osbuilder, packaging).
Signed-off-by: Julio Montes <julio.montes@intel.com>
The snap build pulls the latest release of `yq`, but `yq` version 4
changed the CLI syntax for reading a YAML file.
Update the snap config file to use the new `yq` v4 syntax.
Fixes: #1232.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This is not needed for Fedora, RHEL, and CentOS, but it is required when
using any other host OS. Having --security-opt apparmor=unconfined used
unconditionally is a no go as it'd break podman.
The reason this was only added when building for SUSE (as target distro)
was because debian and ubuntu condition would fall-through the switch to
the suse case (which makes me think that the fall-through was not
accidental).
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Falling through the switch cases in docker_extra_args() looks like a
typo and causes issues when building with podman, as `--security-opt
apparmor=unconfined` shouldn't be passed if AppArmor is not enabled on
the system.
Fixes: #1241
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Run static checks prior to building the agent. Checks
fail if run afterwards, since the compilation process
produces new Rust code.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
It should check the read count and return an
error if read count didn't match the expected
number.
Fixes: #1233
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
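The read-count check described in the sync read fix above can be sketched as follows. This is a minimal illustration, not the rustjail code: the helper name `read_exact_count` and the use of `std::io::Read` are assumptions made for the example.

```rust
use std::io::{self, Read};

/// Reads exactly `expected` bytes from `reader`.
/// `read_exact_count` is a hypothetical helper name used for illustration.
fn read_exact_count<R: Read>(reader: &mut R, expected: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; expected];
    let mut total = 0;
    while total < expected {
        let n = reader.read(&mut buf[total..])?;
        if n == 0 {
            break; // EOF before the full message arrived
        }
        total += n;
    }
    // Compare the read count with the expected count instead of assuming success.
    if total != expected {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("short read: expected {} bytes, got {}", expected, total),
        ));
    }
    Ok(buf)
}

fn main() {
    let mut full: &[u8] = b"abcd";
    assert_eq!(read_exact_count(&mut full, 4).unwrap(), b"abcd".to_vec());

    let mut short: &[u8] = b"ab";
    assert!(read_exact_count(&mut short, 4).is_err());
}
```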
If the container rootfs's /dev was overridden
by a bind mount from another directory, then there's
no need to create the default device nodes and symlinks
in /dev.
Fixes: #692
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
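One way to picture the /dev bind mount check above is sketched below. The `Mount` struct and `dev_is_bind_mounted` function are hypothetical stand-ins; how rustjail actually detects the bind mount is not shown in this log.

```rust
/// Minimal stand-in for an OCI mount entry (hypothetical, not the agent's type).
struct Mount {
    destination: String,
    fs_type: String,
    options: Vec<String>,
}

/// Returns true when the spec bind-mounts something over the container's /dev.
fn dev_is_bind_mounted(mounts: &[Mount]) -> bool {
    mounts.iter().any(|m| {
        m.destination == "/dev"
            && (m.fs_type == "bind" || m.options.iter().any(|o| o == "bind" || o == "rbind"))
    })
}

fn main() {
    let mounts = vec![Mount {
        destination: "/dev".to_string(),
        fs_type: "bind".to_string(),
        options: vec!["rbind".to_string()],
    }];
    if dev_is_bind_mounted(&mounts) {
        println!("skipping default device nodes and symlinks");
    } else {
        println!("creating default device nodes and symlinks");
    }
}
```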
Return SingularPtrField::none() instead of panicking when getting stats
from a cgroup fails because the cgroup controller is missing.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Fixes: #1224
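The idea behind the stats fix above, returning an empty value rather than panicking when a controller is absent, can be sketched with a plain `Option`. The `MemoryStats` type and the map of controllers are assumptions for illustration; the agent itself uses `SingularPtrField` from its protobuf types.

```rust
use std::collections::HashMap;

/// Hypothetical stats record used only for this sketch.
#[derive(Debug, PartialEq)]
struct MemoryStats {
    usage_bytes: u64,
}

/// Looks up the "memory" controller and returns None when it does not exist,
/// instead of unwrapping and panicking.
fn memory_stats(controllers: &HashMap<&str, u64>) -> Option<MemoryStats> {
    controllers
        .get("memory")
        .map(|&usage_bytes| MemoryStats { usage_bytes })
}

fn main() {
    let mut controllers = HashMap::new();
    println!("{:?}", memory_stats(&controllers)); // controller missing

    controllers.insert("memory", 4096u64);
    println!("{:?}", memory_stats(&controllers)); // controller present
}
```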
35ecd6f (origin/change-name, change-name) Update readme
eb6577e Change package name to cgroups-rs
8f6a7e0 Merge pull request #19 from Tim-Zhang/0.2.0
9baa065 (origin/0.2.0, 0.2.0) release: v0.2.0
e160df0 Make read_i64_from private and merge read_str_from to its caller
e1e05d3 Make new_with_relative_paths=new and load_with_relative_paths=new in v2
a89f4a0 Support set notify_on_release & release_agent
61a0957 Fix set_swappiness in cgroup v2
0592045 Ignore kmem in cgroup v2
c254fff Update readme
438d774 Fix test
42ee1ba Make Cgroup can be stored in struct
b6bb5ae docs: Hide Re-exports
d2882b1 Print cause when println!("{}")
abcb5ed Add more logs for create_dir error in controller.create
1f188be Detect subsystems and get root from /proc/self/mountinfo
fbd7164 Fix warnings in tests
f342254 Remove Box wrap of Cgroup.hire
cd998f3 Do not place cgroup under relative path read from cgroup by default
1ac76b6 Make function find_v1_mount pub
121f78d Expose deletion error
0f76570 Avoid exception caused by cgroup writeback feature
10650e2 Update tests to adapt new type of fields in resource
567cdb4 Use Option as resource fields, remove the update switch: update_values
0c18b08 Support customized attributes for CpuController and MemController
ca610bb add add_task_by_tgid
Signed-off-by: Tim Zhang <tim@hyper.sh>
get_bool_value() in src/agent/src/config.rs includes a Result::or_else()
call with a trivial closure which can be replaced by a Result::or. This
removes a clippy warning.
fixes #1201
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The temp directory created in the test_setup_persistent_ns test for the
uts namespace type had the wrong prefix.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
A pid namespace cannot be persisted, so add a check-and-error on
Namespace::setup() for handling that case.
Fixes #1220
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
RemoveContainerRequest results in a call to deleteContainer. According
to the spec, RemoveContainer is idempotent and "must not return
an error if the container has already been removed". Hence, don't
return an error if the error reports that the container is not found.
Fixes: #836
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
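The idempotent removal described in the RemoveContainerRequest change above can be sketched as follows. This is a minimal illustration only: `errContainerNotFound`, `store`, `deleteContainer` and `removeContainer` here are hypothetical stand-ins, not the shim's actual types or error values.

```go
package main

import (
	"errors"
	"fmt"
)

// errContainerNotFound is a hypothetical sentinel error; the real runtime
// may report a missing container differently.
var errContainerNotFound = errors.New("container not found")

// store is a hypothetical in-memory stand-in for the sandbox's container list.
type store map[string]struct{}

// deleteContainer removes id and fails if it is not present.
func (s store) deleteContainer(id string) error {
	if _, ok := s[id]; !ok {
		return fmt.Errorf("delete %q: %w", id, errContainerNotFound)
	}
	delete(s, id)
	return nil
}

// removeContainer is idempotent: a "not found" error is treated as success,
// while any other error is still reported to the caller.
func (s store) removeContainer(id string) error {
	err := s.deleteContainer(id)
	if err != nil && !errors.Is(err, errContainerNotFound) {
		return err
	}
	return nil
}
```

Calling `removeContainer` twice for the same ID succeeds both times in this sketch, whereas a second direct `deleteContainer` call returns an error.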
The first incompatible issue is caused by a typo, "swapiness" should
be "swappiness". The second incompatible issue is caused by a serde
format. The struct LinuxBlockIODevice is introduced for convenience,
but it also changes serialized data, so "#[serde(flatten)]" should
be used for compatibility with OCI spec.
Fixes: #1211
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
On pod delete, we were trying to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (which cleans up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.
Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid error messages being displayed when we are
stopping and deleting a sandbox:
```
level=error msg="Could not read qemu pid file"
```
I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.
Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
https://github.com/kata-containers/kata-containers/issues/1181
Fixes: #1179
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
When receiving an OnlineCpuMemory RPC, if the number of CPUs to be
made available is 0, then updating the cpusets is a redundant operation.
Fixes: #1172
Signed-off-by: Maruth Goyal <maruthgoyal@gmail.com>
If the upcast from resultingRoutes to *grpc.IRoutes fails, we return
(nil, err), but previous code ensures that err is nil at that point, so we
return no error.
fixes #1206
Forward port of
0ffaeeb5d8
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If the upcast from resultingInterfaces to *grpc.Interfaces fails, we
return (nil, err), but previous code ensures that err is nil at that
point, so we return no error.
Forward port of
b86e904c2d
fixes #1206
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
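The error-reporting bug fixed in the two listRoutes()/listInterfaces() commits above follows a common Go pattern, sketched here under assumptions. `Routes` and `toRoutes` are hypothetical names and do not reproduce the agent's gRPC types; the point is only that a failed type assertion needs its own error instead of reusing an `err` that is known to be nil.

```go
package main

import "fmt"

// Routes is a hypothetical stand-in for the gRPC routes message type.
type Routes struct {
	Count int
}

// toRoutes converts an untyped result into *Routes. When the type assertion
// fails it builds a fresh error, rather than returning an earlier err
// variable that is guaranteed to be nil at this point.
func toRoutes(result interface{}) (*Routes, error) {
	routes, ok := result.(*Routes)
	if !ok {
		return nil, fmt.Errorf("unexpected type %T for routes", result)
	}
	return routes, nil
}
```

With this shape, a caller can never receive `(nil, nil)` from a failed conversion.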
oci.proto imports "google/protobuf/wrappers.proto", but doesn't appear to
use it, which causes a warning from protoc when we compile it. Remove the
import to fix the warning.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The generate_go_sources() function in update-generated-proto.sh adds a
number of include directives to the protoc command line. Some of these
don't appear to be necessary to correctly compile the agent's protocol
files, so remove them.
Amongst other things were directives pointing at the old Kata1 runtime and
agent repositories. Those ones could be actively harmful by causing odd
dependencies of the Kata2 build on the Kata1 repositories.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
src/agent/protocols/hack/update-generated-proto.sh checks for the presence
of protoc-gen-rust and ttrpc_rust_plugin, but it doesn't actually need
them. Those tools are needed to generate Rust code from the gRPC proto
files, but that's already handled in src/agent/protocols/build.rs using
Cargo for dependency management.
This script is only needed for the Go code, for which the other tools are
sufficient.
fixes #1198
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This PR wraps the raw file descriptor in a File, so that it is properly closed when it goes out of scope.
Fixes: #1192
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Improve Kata Containers memory footprint by reducing debian
image size.
Without this change:
Debian image -> 256MB
With this change:
Debian image -> 128MB
Note: this change *will not* impact ubuntu image.
fixes #1188
Signed-off-by: Julio Montes <julio.montes@intel.com>
The current description of Cloud Hypervisor support in Kata
Containers was introduced back in Kata 1.10 and is outdated.
Depends-on: github.com/kata-containers/tests#3106
Fixes: #1167
Signed-off-by: Bo Chen <chen.bo@intel.com>
In runtime/Makefile the value of DESTDIR is set to "/", unless that
variable is passed as an argument to `make`. This change also
allows it to be overridden if DESTDIR is exported in the
environment.
Fixes #1182
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
A temporary container is used to pull the QEMU tarball out
of the build image, but this container is never deleted. This
change ensures it gets deleted after its execution.
Fixes #1168
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Currently QEMU is built inside the container, its tarball is pulled to
the host, files are removed and it is then packaged again. Instead, let's
run all those steps inside the container so that the resulting tarball is
the final version. To that end, this introduces the
qemu-build-post.sh script, which removes the unneeded files and
creates the tarball.
The patterns for directories on qemu.blacklist had to be changed
to work properly with `find -path`.
Fixes #1168
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The dockerfiles used to build qemu and qemu-virtiofs have the QEMU destination
path hardcoded, and so do the build scripts. This refactors
the dockerfiles to add the QEMU_DESTDIR argument, whose value is passed by the scripts.
Fixes #1168
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Define the `before_starting_container` and `after_stopping_container`
functions. These run before and after the container that
builds the rootfs, respectively.
Signed-off-by: Julio Montes <julio.montes@intel.com>
According to the new snap document
`docs/install/snap-installation-guide.md`, Kata Containers 2.x should
be available in the snapcraft `candidate` channel.
fixes #1174
Signed-off-by: Julio Montes <julio.montes@intel.com>
Improve snap documentation, document how to install
kata 1.x and 2.x, how to configure them and their integration
with container engines.
fixes #1138
Signed-off-by: Julio Montes <julio.montes@intel.com>
This is the Rust port of https://github.com/kata-containers/agent/pull/371
`read_stdout`/`read_stderr` are blocking RPC calls. If the exec process
has exited, these calls stay blocked reading from the process's
term master fd and never get a chance to break the wait.
In this PR, `read_stdout`/`read_stderr` no longer read directly from
the term master of a process. Instead, they first have to get
an fd to read from via the newly added `epoller.poll()`. `epoller.poll()` may return:
- the term master fd of the exec process, if the process is running.
- an fd (a piped fd) that returns EOF when read, to indicate that the process has exited.
Fixes: #1160
Signed-off-by: bin liu <bin@hyper.sh>
Add trace calls to shimv2 that create spans for functions in service.go.
Tracing starts in New(), which is forked twice and is followed by either
StartShim() or Create().
Tracing cannot start without the value for Trace enabled from the
runtime config so load the config in New(), which results in it being
loaded every time New() is called in addition to where it is originally
loaded after Create().
Fixes #903
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Fix the custom agent binary file path for creating an initrd image in
the Developer-Guide.md file.
Fixes: #919
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Sometimes the shim process cannot be shut down because the container list
is not empty. This container list is written by the shim service while
creating a container. We found that if containerd cancels its Create
Container Request due to a timeout, but the runtime doesn't handle it properly
and continues the create action, then this container can never be deleted.
So we should make sure the ctx passed to the Create Service rpc call
takes effect.
Fixes #1088
Signed-off-by: Yves Chan <shanks.cyp@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
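As a rough illustration of the ctx handling described in the shimv2 change above, the sketch below refuses to register a container when the caller's context is already cancelled. `createContainer` and `cancelledCreate` are hypothetical helpers, and the map stands in for the shim's container list; the real service is more involved.

```go
package main

import (
	"context"
	"fmt"
)

// createContainer is a hypothetical stand-in for the shim's Create handler.
// It does no work if the caller's context is already cancelled or expired,
// so a cancelled request never leaves an entry in the container list.
func createContainer(ctx context.Context, id string, containers map[string]bool) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("create %q aborted: %w", id, err)
	}
	containers[id] = true
	return nil
}

// cancelledCreate shows the behaviour with an already-cancelled context and
// returns the resulting container count together with the error.
func cancelledCreate() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	containers := map[string]bool{}
	err := createContainer(ctx, "c1", containers)
	return len(containers), err
}
```

In a real handler the context would also need to be checked (or passed down) at later steps, since a timeout can fire after the initial check.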
Once CPU hotplug is available, the cpuset for containers is written into
cgroup files recursively. The paths should include the container's cgroup path and
every parent up to the root path of the cgroup filesystem.
Fixes: #1156, #1159
Signed-off-by: bin liu <bin@hyper.sh>
The result of `cpuset_controller.set_cpus(&cpu.cpus)` is unwrapped, so
creating a container fails if a cpuset is set.
The sandbox's `CreateContainer` sequence is:
c, err := newContainer(s, &contConfig)
err = c.create()
c.sandbox.agent.createContainer(c.sandbox, c) (1)
err = s.updateResources()
oldCPUs, newCPUs, err := s.hypervisor.resizeVCPUs(sandboxVCPUs) (2)
The cpuset is only available after `s.hypervisor.resizeVCPUs` has been called at (2),
and only then is the cpuset written to the cgroups file.
Fixes: #1159
Signed-off-by: bin liu <bin@hyper.sh>
We should always clean up the vm directory when doing `stopSandbox`,
but we currently skip the cleanup process on some error code paths when
using the cloud-hypervisor driver.
Fixes: #1098
Signed-off-by: Bo Chen <chen.bo@intel.com>
The VMT process is well documented, but users would need to land on
community repo to find it. Let's make it easier to identify the correct
way to disclose vulnerabilities.
Fixes: #1136
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Otherwise `make install` run from the top directory would just fail as
the target is not defined.
Fixes: #1149
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Otherwise `make install` run from the top directory would just fail as
the target is not defined.
Fixes: #1149
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Before loading a module, the check subcommand should check if the
current user can load it.
fixes #3085
Signed-off-by: Julio Montes <julio.montes@intel.com>
Don't fail if rate limit is exceeded since this is a
limitation/restriction of Github not a problem in the host.
Print a warning when the rate limit is exceeded.
For more information about Github's rate limit, see
https://developer.github.com/v3/#rate-limiting
Signed-off-by: Julio Montes <julio.montes@intel.com>
If the main process unshares the pid namespace, it can no longer
spawn new threads. To avoid this issue,
fork a new child process and do the pid namespace unshare
in that temporary process.
Fixes: #1140
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Since Kata containers share the network ns with
the guest system, there's no need to do the
network ns check.
Fixes: #1047
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
On the first shim v2 startup (with the `start` command-line option), it starts
a second shim v2 process running as the ttrpc server. There is no need to
wait for the second process, because the current shim v2 process exits immediately.
Fixes: #1127
Signed-off-by: bin liu <bin@hyper.sh>
Add the `kata-runtime` and `kata-collect-data.sh` commands to the apps
section. These two commands will be accessible through the commands
`kata-containers.runtime` and `kata-containers.collect-data`
respectively.
Henceforth the snap command for `containerd-shim-kata-v2` will be
`kata-containers.shim`
fixes #1122
Signed-off-by: Julio Montes <julio.montes@intel.com>
In PR 1079, CleanupContainer's sandboxID parameter was changed to VCSandbox, but at cleanup
no VCSandbox has been constructed, so we should load it from disk with loadSandboxConfig() in
persist.go. This commit reverts parts of #1079
Fixes: #1119
Signed-off-by: bin liu <bin@hyper.sh>
Split some of the core hypervisor details out of the virtualisation
document and present in a simpler fashion for new users.
Fixes: #1063.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
- Removed the `QEMU-virtio-fs` entry from the virtualization doc since
support is now available upstream and the QEMU virtio-fs-specific
configuration file has been removed.
- Removed NEMU as this is no longer used.
- Sorted the remaining rows.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Point the virtio-net mention in the Kata virtualization
documentation to the published virtio-networking blog series,
which explains how it works.
Fixes #612
Signed-off-by: Ariel Adam <aadam@redhat.com>
Changing "implementor" to "implementer"
Fixes: #612
Signed-off-by: Ariel Adam <aadam@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This (unreleased) version of CRI-O brings in the possibility of enabling
the `k8s-oom.bats` test.
Depends-on: github.com/kata-containers/tests#3060
Fixes: #1116
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Under stress, the agent can be OOM-killed, which exits the sandbox.
One possible hard-to-diagnose manifestation is a virtiofsd crash.
Fixes: #1111
Reported-by: Qian Cai <caiqian@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Key names in the configuration file are in snake case, not camel case,
and the key is processed as `enable_pprof` in the code, so the configuration
template file should replace `EnablePprof` with `enable_pprof`.
Fixes: #1109
Signed-off-by: bin liu <bin@hyper.sh>
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.
Signed-off-by: bin liu <bin@hyper.sh>
Let's update CRI-O version to the commit which introduced the fix for
the "k8s-copy-file" tests.
Depends-on: github.com/kata-containers/tests#3042
Fixes: #1080
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Update cloud-hypervisor to commit 2706319.
This addresses a limitation in the OpenAPITools/openapi-generator tool:
it was impossible to send Go zero values, like false and 0, to
cloud-hypervisor because `omitempty` is added if a field is not
required.
See cloud-hypervisor/cloud-hypervisor#1961 for more information
Signed-off-by: Julio Montes <julio.montes@intel.com>
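The `omitempty` limitation mentioned in the cloud-hypervisor update above comes from standard `encoding/json` behaviour, which this small sketch demonstrates. The struct names are hypothetical, and the pointer field is just one common workaround; it is not claimed to be what the upstream cloud-hypervisor change does.

```go
package main

import "encoding/json"

// withValue mimics a generated struct where an optional bool is a plain value.
// With omitempty, an explicit false is indistinguishable from "unset".
type withValue struct {
	Enabled bool `json:"enabled,omitempty"`
}

// withPointer uses a pointer so that an explicit false survives marshalling;
// only a nil pointer is omitted.
type withPointer struct {
	Enabled *bool `json:"enabled,omitempty"`
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

// encodeBoth returns the encodings of an explicit false in both struct shapes.
func encodeBoth() (string, string) {
	f := false
	return encode(withValue{Enabled: false}), encode(withPointer{Enabled: &f})
}
```

Here `encodeBoth()` returns `{}` for the plain value field and `{"enabled":false}` for the pointer field, which is why the zero value could not be sent in the first case.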
The guest consumes 120MB more memory when DAX is enabled and the default
FS cache size (8G) is used. Disable DAX when it is not required,
reducing the guest's memory footprint.
Without this patch:
```
7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted)
Size: 1048576 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 187876 kB
```
With this patch:
```
7fa970000000-7fa9b0000000 rw-s 612001 /memfd:ch_ram (deleted)
Size: 1048576 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 57308 kB
Pss: 56722 kB
```
fixes #1100
Signed-off-by: Julio Montes <julio.montes@intel.com>
Fix `kata-runtime kata-check`'s network version check which was failing
when the user was running a release candidate build and the latest
release was a major one, two examples of the error being:
- `BUG: unhandled scenario: current version: 1.12.0-rc0, latest version: 1.12.0`
- `BUG: unhandled scenario: current version: 2.0.0-rc0, latest version: 2.0.0`
Fixes: #1104.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
According to the Semantic Versioning specification, build metadata must
be ignored for version comparisons, so add some explicit tests for this
scenario to `TestGetNewReleaseType()`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
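The two version-check commits above rely on two Semantic Versioning rules: a pre-release such as `2.0.0-rc0` sorts before `2.0.0`, and build metadata after `+` is ignored. The sketch below illustrates both using only the standard library. `parse` and `compare` are hypothetical helpers, not the runtime's implementation, and pre-release identifiers are compared as plain strings, which is a simplification of the full specification.

```go
package main

import (
	"strconv"
	"strings"
)

// parse splits "MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]" into its numeric core
// and pre-release string. Build metadata is dropped. Malformed numbers parse
// as 0, a simplification for this sketch.
func parse(v string) ([3]int, string) {
	if i := strings.IndexByte(v, '+'); i >= 0 {
		v = v[:i]
	}
	pre := ""
	if i := strings.IndexByte(v, '-'); i >= 0 {
		v, pre = v[:i], v[i+1:]
	}
	var core [3]int
	for i, part := range strings.SplitN(v, ".", 3) {
		core[i], _ = strconv.Atoi(part)
	}
	return core, pre
}

// compare returns -1, 0 or 1 when a is older than, equal to or newer than b.
func compare(a, b string) int {
	ca, pa := parse(a)
	cb, pb := parse(b)
	for i := range ca {
		if ca[i] != cb[i] {
			if ca[i] < cb[i] {
				return -1
			}
			return 1
		}
	}
	switch {
	case pa == pb:
		return 0
	case pa == "":
		return 1 // a release outranks any pre-release of the same core
	case pb == "":
		return -1
	default:
		return strings.Compare(pa, pb) // simplified pre-release ordering
	}
}
```

For example, `compare("2.0.0-rc0", "2.0.0")` yields -1, and two versions that differ only in build metadata compare as equal.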
In some cases, for example when the agent has crashed but is not marked dead yet, GetOOMEvent
will return errors like `connection reset by peer` or `ttrpc: closed`. Sleep
for 1 second (the agent check interval) and let the agent health check do the check.
Fixes: #991
Signed-off-by: bin liu <bin@hyper.sh>
The documentation contains existing spelling mistakes that are caught by the CI
and prevent checking in. The errors include:
INFO: Spell checking file 'docs/how-to/how-to-load-kernel-modules-with-kata.md'
WARNING: Word 'configurated': did you mean one of the following?: configuration, reconfigured, Confederate, confederate
WARNING: Word 'cri': did you mean one of the following?: cir, crib, chi, cry, Fri, crier
ERROR: Spell check failed for file: 'docs/how-to/how-to-load-kernel-modules-with-kata.md'
INFO: spell check failed for document docs/how-to/how-to-load-kernel-modules-with-kata.md
INFO: Spell checking file 'docs/how-to/how-to-set-sandbox-config-kata.md'
INFO: Spell check successful for file: 'docs/how-to/how-to-set-sandbox-config-kata.md'
ERROR: spell check failed, See https://github.com/kata-containers/documentation/blob/master/Documentation-Requirements.md#spelling for more information.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The documentation `how-to/how-to-set-sandbox-config-kata.md` contains a number
of differences relative to the 1.x variant, which do not seem to correspond to
missing features in the actual code.
Fixes: #1046
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This reverts commit ff13bde3c1, which
moved back CRI-O to v1.18.3.
That was, IMHO, a little bit premature. We want to know exactly what
the issues on v1.18.4 are, solve those, and be prepared for a v1.18.5 bump
(or even a bump to a specific commit, if needed).
Just for the sake of the completeness, v1.18.4 caused a regression on
"k8s-copy-file" tests, which is tracked on CRI-O side as
https://github.com/cri-o/cri-o/issues/4353.
Fixes: #1080
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Use `if err := q.qmpSetup(); err != nil` to reduce code and make it easier
to read. Also, where the last function call already returns an error, drop the
extra err check and return the function call directly.
Fixes: #1081
Signed-off-by: bin liu <bin@hyper.sh>
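The refactoring described in the commit above can be illustrated with a before/after pair. `setup` and `finish` are hypothetical stand-ins for calls such as `q.qmpSetup()`; only the control-flow shape is the point.

```go
package main

import "errors"

// setup is a hypothetical stand-in for a call like q.qmpSetup().
func setup(fail bool) error {
	if fail {
		return errors.New("setup failed")
	}
	return nil
}

// finish is a hypothetical final step that also returns an error.
func finish() error {
	return nil
}

// verbose shows the older style: a separate assignment and check per call,
// including a redundant check on the last call.
func verbose(fail bool) error {
	err := setup(fail)
	if err != nil {
		return err
	}
	err = finish()
	if err != nil {
		return err
	}
	return nil
}

// concise scopes err to the if statement and returns the last call directly.
func concise(fail bool) error {
	if err := setup(fail); err != nil {
		return err
	}
	return finish()
}
```

Both functions behave identically; the second form is shorter and keeps `err` out of the enclosing scope.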
Kata 2.0 uses virtio-fs as the shared_fs by default,
but VM templating cannot be used with virtio-fs.
Fixes: #1091
Signed-off-by: AIsland <yuchunyu01@inspur.com>
The release v0.11.0 of cloud-hypervisor features the following changes:
1) Improved Linux Boot Time, 2) `SIGTERM/SIGINT` Interrupt Signal
Handling, 3) Default Log Level Changed, 4) `io_uring` support by default
for `virtio-block` (on host kernel version 5.8+), 5) Windows Guest
Support, 6) New `--balloon` Parameter Added, 7) Experimental
`virtio-watchdog` Support, 8) Bug fixes.
Fixes: #1089
Signed-off-by: Bo Chen <chen.bo@intel.com>
Only display the `ttrpc` crate log output when full logging
(trace level) is enabled.
This is a slight abuse of log levels but provides developers and testers
what they need whilst also keeping the logs relatively quiet for the
default info log level (the `ttrpc` crate logging is a bit "chatty").
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The `ttrpc` crate uses the `log` crate for logging. But the agent uses
the `slog` crate. This means that currently, all `ttrpc` log messages
are being discarded.
Use the `slog-stdlog` crate to redirect `log` crate logging calls into
`slog` so they are visible in the agent's log output.
Fixes: #978.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add support for a `KATA_AGENT_LOG_LEVEL` environment variable for testing.
This is the equivalent to the `agent.log=` kernel command line option.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Make `asset.go` the arbiter of asset annotations by removing all asset
annotations lists from other parts of the codebase.
This makes the code simpler, easier to maintain, and more robust.
Specifically, the previous behaviour was inconsistent in the following
ways:
- `createAssets()` in `sandbox.go` was not handling the following asset
  annotations:
  - firmware:
    - `io.katacontainers.config.hypervisor.firmware`
    - `io.katacontainers.config.hypervisor.firmware_hash`
  - hypervisor:
    - `io.katacontainers.config.hypervisor.path`
    - `io.katacontainers.config.hypervisor.hypervisor_hash`
  - hypervisor control binary:
    - `io.katacontainers.config.hypervisor.ctlpath`
    - `io.katacontainers.config.hypervisor.hypervisorctl_hash`
  - jailer:
    - `io.katacontainers.config.hypervisor.jailer_path`
    - `io.katacontainers.config.hypervisor.jailer_hash`
- `addAssetAnnotations()` in the `oci` package was not handling the
  following asset annotations:
  - hypervisor:
    - `io.katacontainers.config.hypervisor.path`
    - `io.katacontainers.config.hypervisor.hypervisor_hash`
  - hypervisor control binary:
    - `io.katacontainers.config.hypervisor.ctlpath`
    - `io.katacontainers.config.hypervisor.hypervisorctl_hash`
  - jailer:
    - `io.katacontainers.config.hypervisor.jailer_path`
    - `io.katacontainers.config.hypervisor.jailer_hash`
This change fixes the bug where specifying a custom hypervisor path via an
asset annotation was having no effect.
Fixes: #1085.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add missing annotation definitions for a hypervisor control binary:
- `io.katacontainers.config.hypervisor.ctlpath`
- `io.katacontainers.config.hypervisor.hypervisorctl_hash`
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
When the guest panics, dump the guest kernel memory to the host filesystem.
The dump also includes:
- hypervisor config
- hypervisor version
- and state of sandbox
Fixes: #1012
Signed-off-by: bin liu <bin@hyper.sh>
bindmount remount events are not propagated through mount subtrees,
so we have to remount the shared dir mountpoint directly.
E.g.,
```
mkdir -p source dest foo source/foo
mount -o bind --make-shared source dest
mount -o bind foo source/foo
echo bind mount rw
mount | grep foo
echo remount ro
mount -o remount,bind,ro source/foo
mount | grep foo
```
would result in:
```
bind mount rw
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
remount ro
/dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered)
/dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered)
```
The reason is that a bind mount creates new mount structs and attaches them to different mount subtrees.
However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the
change to mount structs in other subtrees.
Fixes: #1061
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Correct the link in the GitHub action commit message check showing users how to format all commits.
Fixes: #1053
Signed-off-by: AIsland <yuchunyu01@inspur.com>
Correct the default configuration of [hypervisor.qemu] shared_fs in configuration-qemu.toml to virtio-fs in kata 2.0.
Fixes: #1054
Signed-off-by: AIsland <yuchunyu01@inspur.com>
Cloud-hypervisor provides a single unified way of unplugging
devices, e.g. the `/vm.RemoveDevice` HTTP API. Taking advantage of this
API, we can simplify our implementation of `hotplugRemoveDevice` in
`clh.go`, where we can consolidate similar code paths for different
device unplug (e.g. no need to implement `hotplugRemoveBlockDevice` and
`hotplugRemoveVfioDevice` separately). We will only need to retrieve the
right `deviceID` based on the type of devices, and use the single
unified HTTP API for device unplug.
Fixes: #1076
Signed-off-by: Bo Chen <chen.bo@intel.com>
Cloud-hypervisor supports DAX, let's enable it to reduce its memory
footprint.
Before this patch:
**19.96M**
```
20448kB -- [/usr/share/kata-containers/kata.img]
```
With this patch:
**10.83M**
```
11100kB -- [/usr/share/kata-containers/kata.img]
```
fixes #1056
Signed-off-by: Julio Montes <julio.montes@intel.com>
The error from failing to spawn a new thread should be caught; otherwise,
it would cause the current thread to panic.
Fixes: #1034
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Only root is able to create and manipulate cgroups, so this mock
implementation of a cgroup manager can be used in unit testing.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Fix the instructions explaining how to build the agent from source now that make needs to be run to auto-generate some source files.
Fixes: #889.
Signed-off-by: LiYa'nan <oliverliyn@gmail.com>
Remove the old subcommands from the documentation and replace them with
the new form (without the redundant `kata-` prefix).
Signed-off-by: Daniel Knittl-Frank <knittl89+git@googlemail.com>
Update the `kata-manager` script to call the new subcommand forms
without `kata-` prefix.
Signed-off-by: Daniel Knittl-Frank <knittl89+git@googlemail.com>
Provide the subcommands `kata-env` and `kata-check` as `env` and `check`
respectively.
Fixes #1011
Signed-off-by: Daniel Knittl-Frank <knittl89+git@googlemail.com>
fixup! cli: Add aliases to kata-env and kata-check commands
Fix the instructions explaining how to build the agent from source now that make needs to be run to auto-generate some source files.
Fixes: #889
Signed-off-by: LiYa'nan <oliverliyn@gmail.com>
Because the repos have been merged and the agent repo will be removed in the future,
we do not need to mock the file structure any more.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Fix the permissions of PID 1's STDIO within the container to
the specified user.
The ownership needs to match because it is created outside of the
container and needs to be localized.
Fixes: #1022
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Calls apply-patches.sh in kernel/build-kernel.sh to apply the
kernel patches.
Fixes #1014
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Moved qemu/apply_patches.sh to the common scripts directory and
refactored it so that it can be used as a generic and consistent way
to apply patches.
Fixes #1014
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Changed the apply_patches.sh script so that patches are sorted before
they are applied.
Fixes #1014
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
To run `cargo clippy`, this commit includes changes:
- add a new Makefile target to run `cargo clippy`
- move `make`/`make check` to the last step to allow a fast return if `fmt`/`clippy` fails
Fixes: #951
Signed-off-by: bin liu <bin@hyper.sh>
To clear these two warnings, this commit did changes:
- add `#![allow(clippy::module_inception)]` to target files
- use type alias for tuple of `(MessageHeader, Vec<u8>)`
Signed-off-by: bin liu <bin@hyper.sh>
Adding `#![allow(clippy::redundant_field_names)]` skips the check for the
`protocols` package; also fix redundant_field_names in other
packages.
Signed-off-by: bin liu <bin@hyper.sh>
Use GitHub Actions to build and release the snap package automatically
when a new tag is pushed.
fixes #1006
Signed-off-by: Julio Montes <julio.montes@intel.com>
The build was setting a `FCVALIDPATHS` variable for firecracker, but
that was never being used. Conversely, the firecracker configuration
template was expecting a `FCVALIDHYPERVISORPATHS`, but that variable was
never being set.
Resolve by only setting the `FCVALIDHYPERVISORPATHS` variable to ensure
the generated firecracker config is valid once again.
Fixes: #1001.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Simplify definition of the `CLHVALIDHYPERVISORPATHS` build variable to
use the already defined `CLHPATH`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
While we have setup guides for firecracker and ACRN, as these
need additional configuration, it may confuse users looking
at this guide to find mentions of just these 2 hypervisors.
Call out all the hypervisors supported with Kata here.
Fixes #996
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
In commit 966bd57 for PR #902, the makefile was changed to automate
the replacement of user variables. However, one variable was treated
specially in the original `sed` replacements, namely `RUNTIME_NAME`
which was replaced by `$(TARGET)`.
This commit adds the `RUNTIME_NAME` variable to the makefile in order
to ensure that the replacement works correctly.
Fixes: #993
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
For experimental-virtiofs, we use it to test virtiofs with DAX. Let's
rename its virtiofsd to virtiofsd-dax.
Depends-on: github.com/kata-containers/tests#2951
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Allow API consumers to change the maximum number of ports in the
virtio-serial devices. Setting a lower number of ports can improve the
boot time and reduce the attack surface.
Before this patch on arm64:
[ 0.028664] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.055031] printk: console [hvc0] enabled
After this patch on arm64:
[ 0.028484] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.031370] printk: console [hvc0] enabled
Fixes: #2676
Signed-off-by: Jia He <justin.he@arm.com>
We've been shipping it for a long time. It's time to make it default
replacing the old obsolete 9pfs.
Fixes: #935
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Basic virtio-fs support has made it upstream in the Linux kernel, as
well as in QEMU and Cloud Hypervisor. Let's go ahead and add it to the
standard configuration.
Since the device driver / DAX handling is still in progress for
upstream, we will want to still build a separate experimental kernel for
those who are comfortable trading off bleeding edge stability/kernel
updates for improved FIO numbers.
Fixes: #963
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Added new `agent-ctl` commands to allow the following agent API calls to
be made:
- `AddARPNeighborsRequest`
- `CloseStdinRequest`
- `CopyFileRequest`
- `GetMetricsRequest`
- `GetOOMEventRequest`
- `MemHotplugByProbeRequest`
- `OnlineCPUMemRequest`
- `ReadStreamRequest`
- `ReseedRandomDevRequest`
- `SetGuestDateTimeRequest`
- `TtyWinResizeRequest`
- `UpdateContainerRequest`
- `WriteStreamRequest`
Fixes: #969.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Use `{:?}` to print `e.as_errno()` instead of using `{}`
to print `e.as_errno().unwrap().desc()`.
Avoid panic only caused by error's content.
Signed-off-by: Tim Zhang <tim@hyper.sh>
To support a few common configurations for Kata, including:
- `io.containerd.kata.v2`
- `io.containerd.kata-qemu.v2`
- `io.containerd.kata-clh.v2`
`kata-monitor` changes to use regexp instead of direct string comparison.
Fixes: #957
Signed-off-by: bin liu <bin@hyper.sh>
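A minimal sketch of matching the runtime names listed in the commit above with a regular expression follows. The pattern and the `isKataRuntime` helper are assumptions made for illustration; the expression actually used by `kata-monitor` is not shown in this log and may differ.

```go
package main

import "regexp"

// kataRuntimePattern is a hypothetical pattern accepting "io.containerd.kata.v2"
// and suffixed variants such as "io.containerd.kata-qemu.v2".
var kataRuntimePattern = regexp.MustCompile(`^io\.containerd\.kata(-[a-z0-9]+)?\.v2$`)

// isKataRuntime reports whether a containerd runtime type looks like a
// Kata shim v2 runtime under the assumed naming scheme.
func isKataRuntime(name string) bool {
	return kataRuntimePattern.MatchString(name)
}
```

Compared with a direct string comparison against a single name, one anchored pattern covers all three configurations while still rejecting unrelated runtimes such as `io.containerd.runc.v2`.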
Check if the ARP neighbours specified in the `AddARPNeighbors` API are
set before using them to avoid crashing the agent.
Fixes: #955.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Check if the routes specified in the `UpdateRoutes` API are set before
using them to avoid crashing the agent.
Fixes: #949.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Check if the interface specified in the `UpdateInterface` API is set
before using it to avoid crashing the agent.
Fixes: #950.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Kubernetes: from 1.17.3 to 1.18.9
CRI-O: from 0eec454168e381e460b3d6de07bf50bfd9b0d082 (1.17) to 1.18.3
Containerd: from 3a4acfbc99aa976849f51a8edd4af20ead51d8d7 (1.3.3) to 1.3.7
cri-tools: from 1.17.0 to 1.18.0
Fixes: #960.
Depends-on: github.com/kata-containers/tests#2958
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Add the verification of some basic protections, namely that:
- EnableAnnotations is honored
- Dangerous paths cannot be modified if no match
- Errors are returned when expected
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Warning from gocyclo during make check:
virtcontainers/pkg/oci/utils.go:404:1: cyclomatic complexity 37 of func `addHypervisorConfigOverrides` is high (> 30) (gocyclo)
func addHypervisorConfigOverrides(ocispec specs.Spec, config *vc.SandboxConfig, runtime RuntimeConfig) error {
^
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
There are a few interesting corner cases to consider for this
function.
Fixes: #901
Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
James O.D Hunt: "But also, regexpContains() and
checkPathIsInGlobList() seem like good candidates for some unit
tests. They "look" obvious, but a few boundary condition tests would be
useful I think (filenames with spaces, backslashes, special
characters, and relative & absolute paths are also an interesting
thought here)."
There aren't that many boundary conditions on a list with regexps,
if you assume the regexp match function itself works. However, the
tests are useful in documenting expectations.
Fixes: #901
Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This was discovered while checking a massive change in variables.
The root cause for the error is a very long list of manual
replacements, that is best replaced with a $(foreach).
All individual variables in the output configuration files were
checked against the old build using diff.
This is a forward port of a makefile fix included in
PR https://github.com/kata-containers/runtime/issues/3004
for issue https://github.com/kata-containers/runtime/issues/2943
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The entries used to be things like PATH_LIST, which are too generic.
Replace them with a more precise name with a distinguishing keyword,
namely VALID. For example valid_hypervisor_paths.
Fixes: #901
Suggested-by: James O.D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
When there is a default value from the code (usually empty) that
differs from a possible suggested value from the distro, then the
wording "default: empty" is confusing.
Fixes: #901
Suggested-by: Julio Montes <julio.montes@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Add a field "enable_annotations" to the runtime configuration that can
be used to whitelist annotations using a list of regular expressions,
which are used to match any part of the base annotation name, i.e. the
part after "io.katacontainers.config.hypervisor."
For example, the following configuration will match "virtio_fs_daemon",
"initrd" and "jailer_path", but not "path" nor "firmware":
enable_annotations = [ "virtio.*", "initrd", "_path" ]
The default is an empty list of enabled annotations, which disables
annotations entirely.
If an annotation is rejected, the message is something like:
annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled
Fixes: #901
Suggested-by: Peng Tao <tao.peng@linux.alibaba.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
When filtering annotations that correspond to paths,
e.g. hypervisor.path, it is better to use a glob syntax than a regexp
syntax, as it is more usual for paths, and prevents classes of matches
that are undesirable in our case, such as matching .. against .*
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
A comment talking about runtime related annotations describes them as
being related to the agent. A similar comment for the agent
annotations is missing.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Add variables to override defaults at build time for the various lists
used to control path annotations.
Fixes: #901
Suggested-by: Fabiano Fidencio <fidencio@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This one could theoretically be used to overwrite data on the host.
It seems somewhat less risky than the earlier ones for a number
of reasons, but worth protecting a little anyway.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Add the following text explaining the risk of using regular
expressions in path lists:
Each member of the list can be a regular expression, but prefer names.
Otherwise, please read and understand the following carefully.
SECURITY WARNING: If you use regular expressions, be mindful that
an attacker could craft an annotation that uses .. to escape the paths
you gave. For example, if your regexp is /bin/qemu.* then if there is
a directory named /bin/qemu.d/, then an attacker can pass an annotation
containing /bin/qemu.d/../put-any-binary-name-here and attack your host.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
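The `..` escape described in the warning above can be illustrated with a small sketch. It uses a plain prefix check as a stand-in for a loose pattern such as `/bin/qemu.*` (the runtime itself is not written in Rust, and the names `normalize` and `naive_allows` are hypothetical). The crafted path passes the loose check, yet resolves lexically to a file outside the intended directory:

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve "." and ".." lexically, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for part in path.components() {
        match part {
            // ".." drops the last component collected so far.
            Component::ParentDir => {
                out.pop();
            }
            // "." changes nothing.
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Naive check standing in for a loose pattern such as `/bin/qemu.*`:
/// it only looks at how the string starts.
fn naive_allows(path: &str) -> bool {
    path.starts_with("/bin/qemu")
}
```

With the crafted value from the warning, `naive_allows` accepts the raw string, while `normalize` shows that it actually names `/bin/put-any-binary-name-here`, which the same check would reject.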
This also adds an annotation for ctlpath, which was not present
before. It's better to implement the code consistently right now to make
sure that we don't end up with a leaky implementation tacked on later.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The path_list configuration gives a series of regular expressions that
limit which values are acceptable through annotations in order to
avoid kata launching arbitrary binaries on the host when receiving an
annotation.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The annotation is provided, so it should be respected.
Furthermore, it is important to implement it with the appropriate
protections similar to what was done for virtiofsd.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Paths mentioned in the hypervisor configuration can be overridden
using annotations, which is potentially dangerous. For each path,
add a 'List' variant that specifies the list of acceptable values
from annotations.
Bug: https://bugs.launchpad.net/katacontainers.io/+bug/1878234
Fixes: #901
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Somehow containerd is sending a malformed device in update API. While it
should not happen, we should not panic either.
Fixes: #946
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Kata doesn't map any numa topologies in the guest. Let's make sure we
clear the Cpuset fields before passing container updates to the
guest.
Note, in the future we may want to have a vCPU to guest CPU mapping and
still include the cpuset.Cpus. Until we have this support, clear this as
well.
Fixes: #932
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Dave Gilbert brought up that passing --thread-pool-size=1 to virtiofsd
may result in a performance improvement, especially when using
`cache=none`. While our current default is `cache=auto`, Dave mentioned
that he sees no harm in having it set, and he also mentioned that it may
use a lot less stack space on aarch/arm.
Fixes: #943
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Vivek Goyal found out that using "shared" thread pool, instead of
"exclusive" results in better performance.
Knowing that and with the plan to have virtio-fs as the default fs for
the 2.0, let's bring this patch in for both 5.0 and 5.1.
Fixes: #944
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Update the build kata containers kernel document for 2.0 release. Fixed
the 1.x release project paths and urls, using the kata-containers
project file paths and urls.
Fixes: #929
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
To update device resource entries from host to guest, we search for
the right entry by host major:minor numbers, then later update it.
However block and character devices exist in separate major:minor
namespaces so we could have one block and one character device with
matching major:minor and thus incorrectly update both with the details
for whichever device is processed second.
Add a check on device type to prevent this.
Port from the Kata 1 Go agent
https://github.com/kata-containers/agent/commit/27ebdc9d2761
Fixes: #703
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
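The device type check described in the entry above can be sketched as follows. This is a simplified illustration, not the agent's code: `DeviceResource` and `update_resource` are hypothetical names, and the real spec entries carry more fields.

```rust
/// Simplified, hypothetical model of a device resource entry in the OCI spec.
#[derive(Debug, Clone, PartialEq)]
struct DeviceResource {
    dev_type: char, // 'b' for block, 'c' for character
    major: i64,
    minor: i64,
}

/// Rewrite host major:minor to guest values, but only for entries of the
/// same device type, since block and character devices are numbered
/// separately. Returns how many entries were updated.
fn update_resource(
    resources: &mut [DeviceResource],
    dev_type: char,
    host: (i64, i64),
    guest: (i64, i64),
) -> usize {
    let mut updated = 0;
    for r in resources.iter_mut() {
        if r.dev_type == dev_type && (r.major, r.minor) == host {
            r.major = guest.0;
            r.minor = guest.1;
            updated += 1;
        }
    }
    updated
}
```

Without the `dev_type` comparison, a block device and a character device that share host numbers would both be rewritten with the details of whichever device was processed second.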
The agent needs to update device entries in the OCI spec so that it
has the correct major:minor numbers for the guest, which may differ
from the host.
Entries in the main device list are looked up by device path, but
entries in the device resources list are looked up by (host)
major:minor. This is done one device at a time, updating as we go in
update_spec_device_list().
But since the host and guest have different namespaces, one device
might have the same major:minor as a different device on the host. In
that case we could update one resource entry to the correct guest
values, then mistakenly update it again because it now matches a
different host device.
To avoid this, rather than looking up and updating one by one, we make
all the lookups in advance, creating a map from (host) device path to
the indices in the spec where the device and resource entries can be
found.
Port from the Go agent in Kata 1,
https://github.com/kata-containers/agent/commit/d88d46849130
Fixes: #703
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
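The "look up everything first, then update" approach from the entry above can be sketched like this. It is a reduced model under stated assumptions: resource entries are bare `(major, minor)` pairs, and `DevUpdate` and `update_all` are hypothetical names rather than the agent's API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified device description: host and guest numbers.
struct DevUpdate {
    path: &'static str,
    host: (i64, i64),
    guest: (i64, i64),
}

/// Phase 1: record, per device path, which resource entries match its host
/// major:minor, before anything is modified.
/// Phase 2: apply the guest numbers using only the recorded indices.
fn update_all(resources: &mut [(i64, i64)], updates: &[DevUpdate]) {
    let mut index: HashMap<&str, Vec<usize>> = HashMap::new();
    for u in updates {
        let hits: Vec<usize> = resources
            .iter()
            .enumerate()
            .filter(|(_, r)| **r == u.host)
            .map(|(i, _)| i)
            .collect();
        index.insert(u.path, hits);
    }
    for u in updates {
        for &i in &index[u.path] {
            resources[i] = u.guest;
        }
    }
}
```

If one device's guest numbers equal another device's host numbers, a one-by-one update would rewrite the first entry twice. Because all lookups here run against the unmodified list, each entry is rewritten only by the device it originally matched.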
The Kata 1 Go agent included a unit test for updateSpecDeviceList, but no
such unit test exists for the Rust agent's equivalent
update_spec_device_list(). Port the Kata 1 test to Rust.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If update_spec_device_list() is given a device that can't be found in the
OCI spec, it currently does nothing, and returns Ok(()). That doesn't
seem like what we'd expect and is not what the Go agent in Kata 1 does.
Change it to return an error in that case, like Kata 1.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Allow for constraining the cpuset as well as the devices-whitelist. Revert
sandbox constraints for cpu/memory, as they break the K8S use case. Can
re-add behind a non-default flag in the future.
The sandbox CPUSet should be updated every time a container is created,
updated, or removed.
To facilitate this without rewriting the 'non constrained cgroup'
handling, let's add to the Sandbox's cgroupsUpdate function.
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
We were assuming base 10 string before, when the block size from sysfs
is actually a hex string. Let's fix that.
Fixes: #908
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
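The difference between the two interpretations in the entry above is easy to show. The sketch below is illustrative only and is not the code from the fix: `parse_block_size` is a hypothetical helper, and the sample value assumes a sysfs file that reports hexadecimal digits without a prefix.

```rust
use std::num::ParseIntError;

/// Parse a block size read from sysfs, where the value is hexadecimal
/// (for example "8000000\n"), optionally with a "0x" prefix.
/// `parse_block_size` is a hypothetical helper name.
fn parse_block_size(raw: &str) -> Result<u64, ParseIntError> {
    let trimmed = raw.trim();
    let digits = trimmed.strip_prefix("0x").unwrap_or(trimmed);
    u64::from_str_radix(digits, 16)
}
```

Read as base 16, the string `8000000` is 134217728 bytes (128 MiB). Read as base 10 it would be taken as 8000000 bytes, which is the kind of mismatch the fix addresses.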
Create a containerd installation guide and a new `kata-manager` script
for 2.0 that automates the steps outlined in the guide.
Also cleaned up and improved the installation documentation in various
ways, the most significant being:
- Added legacy install link for 1.x installs.
- Official packages section:
- Removed "Contact" column (since it was empty!)
- Reworded "Versions" column to clarify the versions are a minimum
(to reduce maintenance burden).
- Add a column to show which installation methods receive automatic updates.
- Modified order of installation options in table and document to
de-emphasise automatic installation and promote official packages
and snap more.
- Removed sections no longer relevant for 2.0.
Fixes: #738.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Sometimes `Option.ok_or` and `Result.map_err` may be simpler
than a `match` statement. Especially in rpc.rs, there are
many `ctr.get_process` and `sandbox.get_container` which
are using `match`.
Signed-off-by: bin liu <bin@hyper.sh>
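As a rough illustration of the simplification in the entry above, the sketch below turns a missing container into an error in two ways. The `Sandbox` type here is a hypothetical stand-in, not the agent's real struct, and it uses `Option::ok_or_else` (the lazy variant of `ok_or`) so the error string is only built on failure.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the agent's sandbox type.
struct Sandbox {
    containers: HashMap<String, u32>,
}

impl Sandbox {
    /// Build a sandbox holding a single container id and pid.
    fn with_container(id: &str, pid: u32) -> Sandbox {
        let mut containers = HashMap::new();
        containers.insert(id.to_string(), pid);
        Sandbox { containers }
    }

    fn get_container(&self, id: &str) -> Option<&u32> {
        self.containers.get(id)
    }
}

/// Verbose form: a `match` whose only job is to turn `None` into an error.
fn pid_with_match(sandbox: &Sandbox, id: &str) -> Result<u32, String> {
    let pid = match sandbox.get_container(id) {
        Some(pid) => pid,
        None => return Err(format!("invalid container id {}", id)),
    };
    Ok(*pid)
}

/// Same behavior using `Option::ok_or_else` and the `?` operator.
fn pid_with_ok_or(sandbox: &Sandbox, id: &str) -> Result<u32, String> {
    let pid = sandbox
        .get_container(id)
        .ok_or_else(|| format!("invalid container id {}", id))?;
    Ok(*pid)
}
```

Both functions return the same results for known and unknown ids; the second simply removes the explicit early return.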
Remove the built-in shim and proxy design description from the
kata-api-design.md file.
Fixes: #912
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Some uses, code, and struct fields are commented out and are unlikely
ever to be uncommented, so delete them.
Signed-off-by: bin liu <bin@hyper.sh>
Using Rust `Result`'s `or_else`/`and_then` makes for cleaner code,
and avoids early returns that check whether the `Result`
is `Ok` or `Err`.
Signed-off-by: bin liu <bin@hyper.sh>
Qemu v5.1 was released with an offending commit 9b3a35ec82
(virtio: verify that legacy support is not accidentally on).
As a result, it breaks command-line compatibility for old qemu
users. Upstream qemu has fixed it but no release has been put out yet.
Let's apply these fixes by hand for now.
Refs: https://www.mail-archive.com/qemu-devel@nongnu.org/msg729556.html
Depends-on: github.com/kata-containers/tests#2945
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This commit includes:
- update comments that did not match the function name
- fix file paths with a doubled slash
Fixes: #922
Signed-off-by: bin liu <bin@hyper.sh>
The function parse_cmdline contains several similar pieces of code; if we
want to add more command-line arguments, the code will grow too long.
Using a macro reduces the code that shares the same logic/processing.
Fixes: #914
Signed-off-by: bin liu <bin@hyper.sh>
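The idea in the entry above can be sketched with a small `macro_rules!` helper. Everything here is an assumption made for illustration: the `Config` fields, the `parse_param!` macro name, and the two parameter keys are not taken from the agent's source.

```rust
/// Hypothetical subset of the agent configuration.
#[derive(Debug, Default, PartialEq)]
struct Config {
    log_level: String,
    hotplug_timeout: u64,
}

/// If `$param` starts with `$key=`, parse the value into the field's type
/// and store it; otherwise (or on a parse error) leave the field untouched.
macro_rules! parse_param {
    ($param:expr, $key:literal, $field:expr) => {
        if let Some(value) = $param.strip_prefix(concat!($key, "=")) {
            if let Ok(parsed) = value.parse() {
                $field = parsed;
            }
        }
    };
}

/// Each supported parameter is one macro line instead of a copied block.
fn parse_cmdline(cmdline: &str) -> Config {
    let mut config = Config::default();
    for param in cmdline.split_whitespace() {
        parse_param!(param, "agent.log", config.log_level);
        parse_param!(param, "agent.hotplug_timeout", config.hotplug_timeout);
    }
    config
}
```

Supporting another argument then means adding a field and one more `parse_param!` line, rather than another copy of the prefix check and parse logic.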
If there is no namespace field in the config files, CRI-O will fail:
setting pod sandbox name and id: cannot generate pod name without namespace
Signed-off-by: bin liu <bin@hyper.sh>
Running the snap CI on every PR is not needed. Don't run the snap CI
on PRs that don't change the source code (*.go/*.rs), a configuration
file or Makefile.
Fixes: #896
Signed-off-by: Julio Montes <julio.montes@intel.com>
Attackers might use it to explore other containers in the same pod.
While it is still safe to allow it, we can just close the race window
like runc does.
Fixes: #885
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
On old kernels (like v4.9), the kernel applies CLOEXEC in the wrong order
w.r.t. dumpable task flags. As a result, we might leak guest file
descriptors to containers. This is a former runc CVE-2016-9962 and still
applies to the kata agent. Although Kata Containers still protects the
host, we should not leak extra resources to user containers.
This sets the init processes that join and setup the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other) namespace.
This setting is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.
This prevents parent processes, the pid 1 of the container, from ptracing
the init process before it drops caps and sets other LSMs.
The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.
Refs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=613cc2b6f272c1a8ad33aefa21cad77af23139f7
https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
opencontainers/runc@50a19c6
https://nvd.nist.gov/vuln/detail/CVE-2016-9962
Fixes: #890
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Simply running `make` would generate some cargo lock updates for
agent-ctl. Let's include them so that we have fixed dependencies.
Fixes: #883
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The plugins section contains the details of plugins required for
the components or testing.
Add sriov-network-device-plugin url and version that are consumed
by the VFIO test in the tests repository.
Fixes: #879
Signed-off-by: Julio Montes <julio.montes@intel.com>
virtiofs DAX support is not stable today; there are still
a few corner cases to resolve before making it the default.
Fixes: #862
Fixes: #875
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
In order to avoid `unmet dependencies` error in the CI,
the python version must be specified in the yaml.
Fixes: #877
Signed-off-by: Julio Montes <julio.montes@intel.com>
When building with AGENT_SOURCE_BIN pointing to an already built
kata-agent binary, the target directory needs to be created in the
rootfs tree.
Fixes: #873
Signed-off-by: Ralf Haferkamp <rhafer@suse.com>
There were a couple of issues with the build-scripts discovered while
doing release:
- Relative paths are error prone. Fix error.
- short_commit_length is used to truncate sha for commits when
appending agent version to resulting files. Before this was
in pkglib.sh, which is otherwise an unused file from when we
supported OBS. Add this define to lib.sh, which is sourced by
the applicable packaging scripts.
There's plenty of room for improvement, but these fixes make the
existing scripts functional again.
Fixes: #871
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Assign unused results to _ in order to silence warnings.
This addresses the following warnings:
warning: unused `std::result::Result` that must be used
--> rustjail/src/mount.rs:1182:16
|
1182 | defer!(unistd::chdir(&olddir););
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/mount.rs:1183:9
|
1183 | unistd::chdir(tempdir.path());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
While in regular code, we want to log possible errors, in test code
it's OK to simply ignore the returned value.
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
In a number of cases, we have functions that return a Result<...>
and where the possible error case is simply ignored. This is a bit
unhealthy.
Add a `check!` macro that allows us to not ignore error values
that we want to log, while not interrupting the flow by returning
them. This is useful for low-level functions such as `signal::kill` or
`unistd::close` where an error is probably significant, but should not
necessarily interrupt the flow of the program (i.e. using `call()?` is
not the right answer).
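A minimal sketch of what such a macro could look like is shown here. It is not the macro from the commit: it prints with `eprintln!` where the agent would use its logger, and `close_fd` and `close_all` are hypothetical helpers standing in for calls like `unistd::close`.

```rust
/// Log an `Err` without returning it; hypothetical stand-in for `check!`.
/// A real agent would use its structured logger instead of `eprintln!`.
macro_rules! check {
    ($call:expr, $what:expr) => {
        if let Err(e) = $call {
            eprintln!("{} failed: {:?}", $what, e);
        }
    };
}

/// Example low-level call that may fail.
fn close_fd(fd: i32) -> Result<(), String> {
    if fd < 0 {
        Err(format!("bad file descriptor {}", fd))
    } else {
        Ok(())
    }
}

/// Closes every descriptor; a failure is logged and the loop continues.
fn close_all(fds: &[i32]) -> usize {
    let mut attempted = 0;
    for &fd in fds {
        check!(close_fd(fd), "close_fd");
        attempted += 1;
    }
    attempted
}
```

Unlike `close_fd(fd)?`, a failing call here does not abort `close_all`; the error is reported and the remaining descriptors are still processed.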
The check! macro is then used on low-level calls. This addresses the
following warnings from #750:
warning: unused `std::result::Result` that must be used
--> /home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail/src/container.rs:903:17
|
903 | signal::kill(Pid::from_raw(p.pid), Some(Signal::SIGKILL));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> /home/ddd/go/src/github.com/kata-containers-2.0/src/agent/rustjail/src/container.rs:916:17
|
916 | signal::kill(Pid::from_raw(child.id() as i32), Some(Signal::SIGKILL));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:340:13
|
340 | write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:554:13
|
554 | / write_sync(
555 | | cwfd,
556 | | SYNC_FAILED,
557 | | format!("setgroups failed: {:?}", e).as_str(),
558 | | );
| |______________^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:340:13
|
340 | write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:340:13
|
340 | write_sync(cwfd, SYNC_FAILED, format!("{:?}", e).as_str());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:554:13
|
554 | / write_sync(
555 | | cwfd,
556 | | SYNC_FAILED,
557 | | format!("setgroups failed: {:?}", e).as_str(),
558 | | );
| |______________^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:626:5
|
626 | unistd::close(cfd_log);
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:627:5
|
627 | unistd::close(crfd);
| ^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:628:5
|
628 | unistd::close(cwfd);
| ^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:770:9
|
770 | fcntl::fcntl(pfd_log, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:799:9
|
799 | fcntl::fcntl(prfd, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:800:9
|
800 | fcntl::fcntl(pwfd, FcntlArg::F_SETFD(FdFlag::FD_CLOEXEC));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:803:13
|
803 | unistd::close(prfd);
| ^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:930:9
|
930 | log_handler.join();
| ^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:803:13
|
803 | unistd::close(prfd);
| ^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:804:13
|
804 | unistd::close(pwfd);
| ^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:842:13
|
842 | sched::setns(old_pid_ns, CloneFlags::CLONE_NEWPID);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/container.rs:843:13
|
843 | unistd::close(old_pid_ns);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
Fixes: #844
Fixes: #750
Suggested-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Various recently added error-causing calls
This addresses the following warning:
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:93:9
|
93 | cg.add_task(CgroupPid::from(pid as u64));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_must_use)]` on by default
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:196:17
|
196 | freezer_controller.thaw();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:199:17
|
199 | freezer_controller.freeze();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:365:9
|
365 | cpuset_controller.set_cpus(&cpu.cpus);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:369:9
|
369 | cpuset_controller.set_mems(&cpu.mems);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:381:13
|
381 | cpu_controller.set_shares(shares);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:385:5
|
385 | cpu_controller.set_cfs_quota_and_period(cpu.quota, cpu.period);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
warning: unused `std::result::Result` that must be used
--> rustjail/src/cgroups/fs/mod.rs:1061:13
|
1061 | cpuset_controller.set_cpus(cpuset_cpus);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
The specific case of cpu_controller.set_cfs_quota_and_period is
addressed in a way that changes the logic following a suggestion by
Liu Bin, who had just added the code.
Fixes: #750
Suggested-by: Liu Bin <bin@hyper.sh>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
When we are writing to the logs and there is an error doing so, there
is not much we can do. Chances are that a panic would make things
worse. So let it go through.
warning: unused `std::result::Result` that must be used
--> rustjail/src/sync.rs:26:9
|
26 | write_count(lfd, log_str.as_bytes(), log_str.len());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
::: rustjail/src/container.rs:339:13
|
339 | log_child!(cfd_log, "child exit: {:?}", e);
| ------------------------------------------- in this macro invocation
|
= note: this `Result` may be an `Err` variant, which should be handled
= note: this warning originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Some functions have undefined behavior and are not actually used.
This addresses the following warning:
warning: the type `oci::User` does not permit zero-initialization
--> rustjail/src/lib.rs:99:18
|
99 | unsafe { MaybeUninit::zeroed().assume_init() }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| this code causes undefined behavior when executed
| help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
|
= note: `#[warn(invalid_value)]` on by default
note: `std::ptr::Unique<u32>` must be non-null (in this struct field)
warning: the type `protocols::oci::Process` does not permit zero-initialization
--> rustjail/src/lib.rs:146:14
|
146 | unsafe { MaybeUninit::zeroed().assume_init() }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| this code causes undefined behavior when executed
| help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
|
note: `std::ptr::Unique<std::string::String>` must be non-null (in this struct field)
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Addresses the following warning (and a few similar ones):
warning: variable does not need to be mutable
--> rustjail/src/container.rs:369:9
|
369 | let mut oci_process: oci::Process = serde_json::from_str(process_str)?;
| ----^^^^^^^^^^^
| |
| help: remove this `mut`
|
= note: `#[warn(unused_mut)]` on by default
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This addresses the following:
warning: use of deprecated item 'std::error::Error::description': use the Display impl or to_string()
--> rustjail/src/container.rs:1598:31
|
1598 | ... e.description(),
| ^^^^^^^^^^^
|
= note: `#[warn(deprecated)]` on by default
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Parameters that are never used were removed.
Parameters that are unused, but necessary because of some common
interface were renamed with a _ prefix.
In one case, consume the parameter by adding an info! call, and fix a
minor typo in a message in the same function.
This addresses the following warning:
warning: unused variable: `child`
--> rustjail/src/container.rs:1128:5
|
1128 | child: &mut Child,
| ^^^^^ help: if this is intentional, prefix it with an underscore: `_child`
warning: unused variable: `logger`
--> rustjail/src/container.rs:1049:22
|
1049 | fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> Result<()> {
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_logger`
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Remove variables that are simply not used.
Rename as _ variables where only initialization matters.
This addresses the following warnings:
warning: unused variable: `writer`
--> src/main.rs:130:9
|
130 | let writer = unsafe { File::from_raw_fd(wfd) };
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_writer`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused variable: `ctx`
--> src/rpc.rs:782:9
|
782 | ctx: &ttrpc::TtrpcContext,
| ^^^ help: if this is intentional, prefix it with an underscore: `_ctx`
warning: unused variable: `ctx`
--> src/rpc.rs:808:9
|
808 | ctx: &ttrpc::TtrpcContext,
| ^^^ help: if this is intentional, prefix it with an underscore: `_ctx`
warning: unused variable: `dns_list`
--> src/rpc.rs:1152:16
|
1152 | Ok(dns_list) => {
| ^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_dns_list`
warning: value assigned to `child_stdin` is never read
--> rustjail/src/container.rs:807:13
|
807 | let mut child_stdin = std::process::Stdio::null();
| ^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_assignments)]` on by default
= help: maybe it is overwritten before being read?
warning: value assigned to `child_stdout` is never read
--> rustjail/src/container.rs:808:13
|
808 | let mut child_stdout = std::process::Stdio::null();
| ^^^^^^^^^^^^^^^^
|
= help: maybe it is overwritten before being read?
warning: value assigned to `child_stderr` is never read
--> rustjail/src/container.rs:809:13
|
809 | let mut child_stderr = std::process::Stdio::null();
| ^^^^^^^^^^^^^^^^
|
= help: maybe it is overwritten before being read?
warning: value assigned to `stdin` is never read
--> rustjail/src/container.rs:810:13
|
810 | let mut stdin = -1;
| ^^^^^^^^^
|
= help: maybe it is overwritten before being read?
warning: value assigned to `stdout` is never read
--> rustjail/src/container.rs:811:13
|
811 | let mut stdout = -1;
| ^^^^^^^^^^
|
= help: maybe it is overwritten before being read?
warning: value assigned to `stderr` is never read
--> rustjail/src/container.rs:812:13
|
812 | let mut stderr = -1;
| ^^^^^^^^^^
|
= help: maybe it is overwritten before being read?
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
This addresses the following warning:
warning: unnecessary braces around assigned value
--> src/rpc.rs:1411:26
|
1411 | detail.init_daemon = { unistd::getpid() == Pid::from_raw(1) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: remove these braces
|
= note: `#[warn(unused_braces)]` on by default
Fixes: #750
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Pull over kata-deploy-test from the 1.x packaging repository. This is
intended to be used for testing any changes to the kata-deploy
scripting, and does not exercise any new source code changes.
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
This reverts commit c0ea910273.
Two scripts are still required for release and testing, which should
have never been under obs-packaging dir in the first place. Let's
revert, move the scripts / update references to it, and then we can
remove the remaining obs-packaging/ tooling.
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
We can rely on the error handling of the actual HTTP API calls to catch
errors, and don't need to call VmmPing explicitly in advance.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The cloud-hypervisor commit `6d30fe05` introduced a fix on its API for
VFIO device hotplug (`VmAddDevice`), which is required for supporting
VFIO unplug through openAPI calls in kata.
Signed-off-by: Bo Chen <chen.bo@intel.com>
First, most people don't care about CNM. Move that out of main doc.
Second, tc-filter is the default. Let's add a bit more background on
our usage of tc-filter (and clarify why we use this instead of macvtap).
Fixes: #797
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
The systemd method of adding a debug console is not really
user friendly. Since we have added a much more straightforward
method to enable agent debug console, update developer guide to
reflect this.
Fixes: #834
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Add a GitHub action to check that the snap package is generated
correctly. This CI job doesn't test the snap, it just builds it.
Fixes: #838
Signed-off-by: Julio Montes <julio.montes@intel.com>
Pin the openapi-generator-cli container to v4.3.1, the latest stable
release, so that builds are reproducible and the generated code is the
same on all systems.
Signed-off-by: Julio Montes <julio.montes@intel.com>
k8s.io/apimachinery/pkg/api/resource is a memory quantities parser,
we use it to parse the SGX EPC size defined by the `sgx.intel.com/epc`
annotation
Signed-off-by: Julio Montes <julio.montes@intel.com>
Support the `sgx.intel.com/epc` annotation that is defined by the Intel
k8s plugin. This annotation enables SGX, which provides hardware-based
isolation and memory encryption.
For example, use `sgx.intel.com/epc = "64Mi"` to create a container
with 1 EPC section with pre-allocated memory.
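To illustrate what a value such as `64Mi` stands for, here is a minimal Rust sketch of parsing a quantity with a binary suffix into bytes. The runtime itself relies on the Go package k8s.io/apimachinery/pkg/api/resource for this; `parse_epc_size` is a hypothetical helper that only handles plain integers and the Ki/Mi/Gi suffixes.

```rust
/// Parse a memory quantity with an optional binary suffix into bytes.
/// Hypothetical helper: only plain integers and Ki/Mi/Gi are recognised.
fn parse_epc_size(quantity: &str) -> Option<u64> {
    let q = quantity.trim();
    // Split the numeric part from the suffix and pick the multiplier.
    let (digits, multiplier) = if let Some(n) = q.strip_suffix("Ki") {
        (n, 1u64 << 10)
    } else if let Some(n) = q.strip_suffix("Mi") {
        (n, 1u64 << 20)
    } else if let Some(n) = q.strip_suffix("Gi") {
        (n, 1u64 << 30)
    } else {
        (q, 1)
    };
    // Reject non-numeric input and values that overflow u64.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

With this sketch, `parse_epc_size("64Mi")` yields 64 * 1024 * 1024 bytes, and malformed input yields `None` instead of a panic.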
At the time of writing this patch, SGX patches have not landed on the
linux kernel project.
The following github kernel fork contains all the SGX patches for the
host and guest: https://github.com/intel/kvm-sgx
Fixes: #483
Signed-off-by: Julio Montes <julio.montes@intel.com>
We have removed cli support and that means docker support is dropped
for now. Also it doesn't make sense to have so many duplications on each
distribution as we can simply refer to the official docker guide on how
to install docker.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The Rust agent has not used grpc as a submodule for a while; update the
README to reflect the change.
Fixes: #196
Signed-off-by: Yang Bo <bo@hyper.sh>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The QEMU version used on arm is very old. Since new features have been
merged into current QEMU, it's time to upgrade it. As obs-packaging has
been removed, I put the QEMU patch under qemu/patch/5.1.x.
As vxfs has been deprecated in qemu-5.1, it will no longer be present in
configure-hypervisor.sh when the QEMU version is larger than 5.0.
Fixes: #816
Signed-off-by: Edmond AK Dantes <edmond.dantes.ak47@outlook.com>
The 2.0 packaging runtime-release-notes.sh script is using 1.x packaging
kernel URLs. Change them to the 2.0 branch packaging URLs.
Fixes: #829
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Some sections and files were removed in a previous commit;
remove all references to such sections and files to fix the
check-markdown test.
Fixes: #826
Signed-off-by: Julio Montes <julio.montes@intel.com>
We should not checkout to 2.0-dev branch in the clone_tests_repo
function when running in Jenkins CI as it discards changes from
tests repo.
Fixes: #818.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
The code at the end of init_child is unreachable and needs to be removed.
The code after do_exec is unreachable and needs to be removed.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Firecracker exposes metrics through a FIFO file,
using a JSON format. This PR parses
Firecracker's metrics and converts them to Prometheus metrics.
Fixes: #472
Signed-off-by: bin liu <bin@hyper.sh>
Current working directory is a process level resource. We cannot call
chdir in parallel from multiple threads, which would cause cwd confusion
and result in UT failures.
The agent code itself is correct in that chdir is only called from the
spawned child init process. There is one exception: it is also called
in do_create_container(), but it is safe to assume that containers are
never created in parallel (at least for now).
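One common way to keep tests that change the working directory from interfering with each other is to serialize them behind a process-wide lock. The sketch below shows that general idea only; it is not necessarily what this change does, and `CWD_LOCK` and `with_cwd` are hypothetical names.

```rust
use std::env;
use std::io;
use std::path::Path;
use std::sync::Mutex;

// Process-wide lock: the working directory is shared by every thread.
static CWD_LOCK: Mutex<()> = Mutex::new(());

/// Run `f` with the working directory set to `dir`, then restore the
/// previous directory. Callers are serialized through `CWD_LOCK`.
fn with_cwd<T>(dir: &Path, f: impl FnOnce() -> T) -> io::Result<T> {
    // A poisoned lock only means another closure panicked; keep going.
    let _guard = CWD_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let previous = env::current_dir()?;
    env::set_current_dir(dir)?;
    let result = f();
    env::set_current_dir(previous)?;
    Ok(result)
}
```

The lock only helps if every caller that touches the working directory goes through `with_cwd`; code that calls `set_current_dir` directly can still race. If `f` panics, the previous directory is not restored in this sketch.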
Fixes: #782
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Somehow we have not been running static checks for a long time,
and that ended up with a lot of errors.
* Ensure debug options are valid is dropped
* fix snap links
* drop extra CONTRIBUTING.md
* reference kata-pkgsync
* move CODEOWNERS to proper place
* remove extra CODE_OF_CONDUCT.md.
* fix spell checker error on Developer-Guide.md
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Set enable_debug_console=true in Kata's configuration file, and the
runtime will pass `agent.debug_console`
and `agent.debug_console_vport=1026` to the agent.
Fixes: #245
Signed-off-by: bin liu <bin@hyper.sh>
Create "class" and "config" files in the temporary device BDF dir,
and remove the dir created by ioutil.TempDir() when the test finishes.
fixes: #746
Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
The code used `match` as a switch with variable patterns `ev_fd` and
`cg_fd`, but the way Rust interprets the code is that the first
pattern matches all values. The code does not perform as expected.
This addresses the following warning:
warning: unreachable pattern
--> rustjail/src/cgroups/notifier.rs:114:21
|
107 | ev_fd => {
| ----- matches any value
...
114 | cg_fd => {
| ^^^^^ unreachable pattern
|
= note: `#[warn(unreachable_patterns)]` on by default
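The pitfall behind this warning can be shown in a few lines: a bare identifier in a `match` arm is a binding pattern, so it matches any value instead of comparing against an existing variable. The sketch below uses match guards to get the comparison that was intended; `Source` and `classify` are hypothetical names, and this is not necessarily how the notifier code was changed.

```rust
use std::os::unix::io::RawFd;

#[derive(Debug, PartialEq)]
enum Source {
    Event,
    Cgroup,
    Unknown,
}

/// Classify `fd` by comparing it with two known descriptors.
fn classify(fd: RawFd, ev_fd: RawFd, cg_fd: RawFd) -> Source {
    match fd {
        // Writing `ev_fd => ...` here would bind a new variable named
        // `ev_fd` and match every value. A guard compares instead.
        x if x == ev_fd => Source::Event,
        x if x == cg_fd => Source::Cgroup,
        _ => Source::Unknown,
    }
}
```

A plain `if fd == ev_fd { ... } else if fd == cg_fd { ... }` chain is an equally valid way to express the same comparison.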
Fixes: #750
Fixes: #793
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Only allow a whitelist of files to be bind mounted under proc,
and prevent other, potentially malicious mounts onto procfs.
Fixes: #807
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Overly long commit lines are annoying. But sometimes,
we need to be able to force the use of long lines
(for example to reference a URL).
Ironically, I can't refer to the URL that explains this
because of ... the long line check! Hence:
```sh
$ cat <<EOT | tr -d '\n'; echo
See: https://github.com/kata-containers/tests/tree/master/
cmd/checkcommits#handling-long-lines
EOT
```
Maximum body length updated to 150 bytes for parity with:
https://github.com/kata-containers/tests/pull/2848
Fixes: #687.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The latest release of cloud-hypervisor v0.10.0 contains the following
updates: 1) `virtio-block` Support for Multiple Descriptors; 2) Memory
Zones; 3) `Seccomp` Sandbox Improvements; 4) Preliminary KVM HyperV
Emulation Control; 5) various bug fixes and refactoring.
Note that this patch updates the client code of clh's HTTP API in kata,
while the 'versions.yaml' file was updated in an earlier PR.
Fixes: #789
Signed-off-by: Bo Chen <chen.bo@intel.com>
In static-build/qemu-virtiofs/Dockerfile the code which
applies the virtiofs specific patches is spread over several
RUN instructions. Refactor this code so that it runs in a
single RUN and produces a single overlay image.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The qemu and qemu-virtiofs Dockerfiles repeat the code to apply
patches based on the QEMU stable branch being built. Instead, this adds
a common script (qemu/apply_patches.sh) and calls it from the
respective Dockerfiles.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fix a bug in the qemu-virtiofs Dockerfile which ended up not applying
the QEMU patches.
Fixes: #786
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This patch adds a fall-back code path that builds the cloud-hypervisor
static binary from source when downloading the cloud-hypervisor binary
fails. This is useful when we experience network issues, and also
useful for upgrading clh to a non-released version.
Together with the changes in the tests repo
(https://github.com/kata-containers/tests/pull/2862), the Jenkins config
file is also updated with new Execute shell script for the clh CI in the
kata-containers repo. Those two changes fix the regression on clh CI
here. Please check details in the issue below.
Fixes: #781
Fixes: https://github.com/kata-containers/tests/issues/2858
Signed-off-by: Bo Chen <chen.bo@intel.com>
This PR includes these changes:
- use Rust installed by Travis
- install x86_64-unknown-linux-musl
- install rustfmt
- use Travis cache
- delete ci/install_vc.sh
Fixes: #748
Signed-off-by: bin liu <bin@hyper.sh>
Use the relative path of kata-deploy to replace the 1.x packaging URL in
the kata-deploy/README.md file. This fixes the path issue caused by
creating the new branch.
Fixes: #777
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Be more verbose about podman configuration in the output of the data
collection script: get the system configuration as seen by podman and
dump the configuration files when present.
Fixes: #243
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
1. Until we restore docker/moby support, we should use crictl as
developer example.
2. Most of the hyperlinks should point to kata-containers repository.
3. There is no more standalone mode.
Fixes: #767
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Fix the kata-pkgsync tool's docs, change the download path of the
packaging tool in 2.0 release.
Fixes: #773
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Add unit tests for finish_root, read_only_path and mknod_dev
increasing code coverage of mount.rs
Fixes: #284
Signed-off-by: Julio Montes <julio.montes@intel.com>
Use conditional compilation (#[cfg]) to change chroot behaviour
at compilation time. For example, the function will just return
`Ok(())` when the unit tests are being compiled, otherwise real
chroot operation is performed.
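A minimal sketch of the pattern, assuming a wrapper named `do_chroot` (a hypothetical name, not taken from the agent source): two definitions of the same function are selected by `#[cfg(test)]` and `#[cfg(not(test))]`, so unit tests never need the privileges a real chroot requires.

```rust
use std::io;

/// Real implementation, compiled into normal (non-test) builds.
#[cfg(not(test))]
fn do_chroot(path: &str) -> io::Result<()> {
    std::os::unix::fs::chroot(path)
}

/// Stub compiled only when building unit tests: it always succeeds,
/// so callers can be exercised without root privileges.
#[cfg(test)]
fn do_chroot(_path: &str) -> io::Result<()> {
    Ok(())
}
```

Callers use `do_chroot` the same way in both builds; only the body differs. The trade-off is that the real system call is not covered by unit tests and has to be checked by integration tests instead.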
Signed-off-by: Julio Montes <julio.montes@intel.com>
Use conditional compilation (#[cfg]) to change pivot_root behaviour
at compilation time. For example, the function will just return
`Ok(())` when the unit tests are being compiled, otherwise real
pivot_root operation is performed.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Don't use unwrap in `init_rootfs`; return an Error instead, so that
we can write unit tests that don't panic.
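The general idea can be sketched as follows, with a hypothetical `Spec` type and `rootfs_path` helper standing in for the real code: a missing value becomes an `Err` that a test can assert on, where `unwrap` would have aborted the test with a panic.

```rust
use std::fmt;

/// Hypothetical error type naming the field that was missing.
#[derive(Debug, PartialEq)]
struct MissingField(&'static str);

impl fmt::Display for MissingField {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "missing field: {}", self.0)
    }
}

impl std::error::Error for MissingField {}

/// Hypothetical, heavily simplified stand-in for an OCI spec.
struct Spec {
    root: Option<String>,
}

/// Return the rootfs path, or an error when the spec has none.
fn rootfs_path(spec: &Spec) -> Result<&str, MissingField> {
    // `spec.root.as_ref().unwrap()` would panic here when `root` is None.
    spec.root.as_deref().ok_or(MissingField("root"))
}
```

A unit test can then check the failure case directly, for example by asserting that a spec without `root` yields `Err(MissingField("root"))`.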
Signed-off-by: Julio Montes <julio.montes@intel.com>
Add the tempfile crate as a dependency; it will be used in the following
commits to create temporary directories for unit testing.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Use conditional compilation (#[cfg]) to change mount and umount
behaviours at compilation time. For example, such functions will just
return `Ok(())` when the unit tests are being compiled, otherwise real
mount and umount operations are performed.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Update `kata-check` to see if there is a newer version available for
download. Useful for users installing static packages (without a package
manager).
Fixes: #734.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
We should just download the official static build binary instead of
trying to build on our own.
Fixes: #760
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
- agent: add cgroup v2 support
- runtime: Don't use hard-coded crio config
- Generate version file with more information in it.
- ci: replace spaces by tabs as indent
- fix issues with short life time container/exec processes
- action: Add issue to project and move to "In progress" on linked PR
- virtiofsd: fix typo in test code
- agent: setup DNS for guest
- ci: run agent test under root user
- docs: update sandbox apis doc for kata 2.0-dev
- rustjail: fix the issue of invalid cgroup_parent path
- osbuilder: update usage of RUST_AGENT variable
- agent: add retry between doing CPU hotplug and make it online.
- kernel: update to the latest LTS kernel 5.4.60
- osbuilder: fix rootfs build on ppc64le
- kernel: Enabling PTP clock support in kernel
- rootfs-builder: fix unbootable dracut-based initramfs on Fedora
- [fwport-2.0] osbuilder/image-builder: disable reflink
- virtcontainers: Add unit test for utils/compare.go
- reimplement error handling: use anyhow
- docs: update yaml file link for prometheus deployment
- docs: Update the doc for minikube installing kata
- trivial: Fix spelling of "privilege"
- [port] image-builder: disable reflink
- runtime: qemu: reduce boot time and memory footprint
- snap for kata 2.0
- runtime: Fix typo in hotplugVFIODevice()
- drivers: Correct isPCIeDevice logic
- docs: Add documentation for VFIO-AP passthrough
- [fwport-2.0] qemu: update build dependencies to support QEMU 5
- kata-deploy: add ACRN runtime to Docker configuration
- runtime: Add support for VFIO-AP pass-through
- agent: update Cargo files authors
- packaging: adjustment for 2.0 branch
- Fix ephemeral mount issue
- clh: Disable the 'seccomp' option temporarily
- Subject: [PATCH] qemu: add annotations for iommu_platform for s390x v…
- Forward-port: virtiofs: Update virtiofs docs
- Forward-port kata deploy conf
- initrd: Increase Alpine Version to 3.12
- [forward port]: osbuilder: Update yq
- tools: Add Unix socket support to agent-ctl
- agent: Add target optimize for Makefile
- server: Allow address to be specified
- rustjail: default permission of device node should be 666
- packaging: Add VFIO-AP fragment for s390x
- console: Fix crash if debug console disabled
- agent: support guest hooks
- virtcontainers: Add to utils unit tests
- sandbox: Disconnect from agent after VM shutdown
- runtime: Re-vendor GoVMM for hotplugging IBM Adjunct Processor (AP) devices over VFIO
- clh: Port cloud-hypervisor related changes from kata-runtime
- docs: remove outdated dependencies from agent docs
- [forward-port] packaging: s390x kernel config fragments
- action: Fix subsystem check
- osbuilder : ppc64le support for rust agent based rootfs/initrd image
- packaging: add usage instructions for -a (arch_target) option
- rustjail: add the "HOME" env for process
- rustjail: fix the issue of missing set propagation for bind mount
- agent: add unit tests for rustjail/process.rs
- ci: Update experimental kernel tag to enable CLH CI
- virtcontainers: fix outdated example code in api document
- agent: setup the "lo" interface run agent as init
- Fix commit-message-check and do some updates about github actions
- virtcontainers: cleanup codes, delete not used APIs
- Use github action to do Fixes/Length/Subsystem check for commit message
- docs: Remove installation of proxy
- virtcontainers: Add unit test for types/container.go
- shimv2: fix the issue of close IO stream
- docs: Update contributions section in limitations document kata 2.0
- Fix fd leakage in execute_hook
- Kata 2.0-dev port of #2867 (NoReboot Knob)
- qemu: remove multidev in fsdev parameter on arm64
- Makefile: add CLHCMD in arm64-options.mk
- runtime: change un-structured log to structured log
- virtcontainers: Add function to capabilities test
- virtcontainers: Expand unit test coverage for asset
615ffb93 agent: Generate version file with more adequate information in it.
f13ca94e agent: Fix setting of version
c823b4cd agent: Make build remove generated files on clean
357d7885 ci: replace spaces by tabs as indent
22876b2d agent: allow multiple wait on the same process
295f5100 runtime: Don't use hard-coded crio config
6487044f shimv2: trust cached status when deleting containers
325a4f86 shimv2: do not kill a stopped exec process
d7c77b69 runtime: write oom file to notify CRI-O tha OOM occurred
15065e44 agent: add cgroup v2 support
2ce97ec6 virtiofsd: fix typo in test code
b081f26a action: Add issue to project and move to "In progress" on linked PR
6520320f agent: setup DNS for guest
90e0dc88 ci: run agent test under root user
c133a456 rustjail: fix the issue of invalid cgroup_parent path
20a084ae docs: update sandbox apis doc for kata 2.0-dev
d86e7467 agent: add retry between doing CPU hotplug and make it online.
ebd3f316 osbuilder: fix rootfs build on ppc64le
2dfb8bc5 rootfs-builder: fix unbootable dracut-based initramfs on Fedora
2019f00e docs: update yaml file link for prometheus deployment
0be02a8f runtime: qemu: reduce boot time and memory footprint
8b07bc2c agent: fix unit tests - remove rustjail::errors
6c96d666 agent: update Cargo toml and lock
46d7b9b8 agent/rustjail: remove rustjail::errors
fbb79739 agent: Use anyhow for error handling
33759af5 agent: Add anyhow dependency
c192446a agent/rustjail: Use anyhow for error handling
2e3e2ce1 agent/rustjail/capabilities: Use anyhow for error handling
6a4c9b14 agent/rustjail/cgroups: Use anyhow for error handling
359286a8 agent/rustjail: Add anyhow dependency
dd60e56f trivial: Fix spelling of "privilege"
cb999375 runtime: Fix typo in hotplugVFIODevice()
0d198f93 virtcontainers: Add unit test for utils/compare.go
1de9bc0f snap: reimplement snapcraft.yaml to support kata 2.0
85642c32 snap: move snapcraft.yaml to the right place
92dfa463 drivers: Correct isPCIeDevice logic
b4748280 kernel: Remove arm patches for ptp
82efd2f2 kernel: Enabling PTP clock support in kernel
8666e01e qemu/default-configs: update default-config for QEMU 5
2d12da8e qemu: update default-configs
cf3ac9f7 docs: Add documentation for VFIO-AP passthrough
11e8a494 docs: update the docs for minikube installing kata
517dda02 kernel: update to the latest LTS kernel 5.4.60
ae98ea45 obs-packaging: fix wait for obs
f5b71d34 qemu: update build dependencies to support QEMU 5
fcd29a28 osbuilder/image-builder: disable reflink
dae6c7d9 osbuilder: update usage of RUST_AGENT variable
1236e224 runtime: Add support for VFIO-AP pass-through
65970d38 osbuilder: install-yq should not print on success
c624fa74 osbuilder: install musl for aarch64
b24f2cb9 gitignore: ignore vscode directory
cf1b72d6 osbuilder: install rust before sourcing cargo env
7b5ab586 packaging: fix kata-deploy yaml path
76c18aa3 osbuilder: fix alpine agent build
5216815d packaging: make build-kernel.sh work for 2.0
aa3fb4db packaging: make kata-deploy work for 2.0
86a6e0b3 packaging: fix build image scripts
ceebd06b release: add 2.0 release actions
dadab1fe osbuilder: build rust agent by default
1bd58259 packaging: tag releases on kata-containers repo
f56f68bf obs-packaging: adjust for building on kata-containers repo
60245a83 agent: update Cargo files authors
544219d9 mount: fix the issue of epthemeral storage handler
fd8f3ee9 mount: add much more error info using chain_err
10b1deb2 tools: Add Unix socket support to agentl-ctl
f5598a1b Subject: [PATCH] qemu: add annotations for iommu_platform
f879acd6 scripts: Foward port osbuilder scripts to update yq
7be95b15 tools: Simplify error handling in agent-ctl
5b0e6f37 kata-deploy: add ACRN runtime to Docker configuration
adf9ecc5 initrd: Increase Alpine Version to 3.12
32b86a8d agent: Add target optimize for Makefile
26506d83 virtiofs: Update virtiofs docs
bee17d1c kata-deploy: Add containerd configuration to support kata annotations.
219f93ff kata-deploy: Add default privileged_without_host_devices
4b62fc16 clh: Disable the 'seccomp' option temporarily
f7ff6d32 image-builder: disable reflink
0a9b8e0a rustjail: default permission of device node should be 666
81644003 server: Allow address to be specified
bb30759e agent: add guest hooks UT
095ebb8c agent: fix OCI hook handling
03a4d107 agent: support guest hooks
e7bfeb41 agent: construct container bundle in tmpfs location
2ee40027 packaging: Add VFIO-AP fragment for s390x
4c30b255 runtime: Re-vendor GoVMM for VFIO-AP support
282bff9f sandbox: Disconnect from agent after VM shutdown
9f1a3d15 kernel: add s390x fragment
f1350616 kernel: config CONFIG_GENERIC_MSI_IRQ_DOMAIN
b67325c3 kernel: add missing configs
454dd854 kernel: config CONFIG_ PARAVIRT
62b45064 kernel: config CONFIG_NO_HZ_FULL
6dca74ba kernel: moved acpi hotplug config
7c85decc kernel: config CONFIG_PCI_MSI_IRQ_DOMAIN
efe51b29 kernel: fragment for pmem
08d046d9 kernel: config CONFIG_HAVE_NET_DSA
7b49fa12 kernel: fragments not supported on s390x
ccfb73cb agent/agent-ctl: update Cargo.lock
fd13c93c virtcontainers: Add msg to existing utils unit tests
c3fc09b9 virtcontainers: Add to utils unit tests
96582556 docs: remove outdated dependencies from agent docs
d12f920b console: Fix crash if debug console disabled
572de288 sandbox: Remove unnecessary thread
d5fbba3b main: Remove commented out and redundant code
1b2fe4a5 agent: Refactor main function
bac79eee main: Display config in announce
e2952b53 main: Simplify version handling
cfa35a90 action: Fix subsystem check
39b53f44 clh: enable build using Podman
04b156f6 qemu-virtiofs: Update to qemu 5.0 + virtiofs + dax
3ec05a9f clh: Add support to unplug block devices
45e32e1b clh: Set 'Id' explicitly while hotplugging block device
895959d0 clh: Provide cpu topology to API
31594387 clh: opeanapi: update api for cloud hypervisor
89836cd3 versions: cloud-hypervisor 0.9.0
8d5a60ac versions: Update qemu-virtiofs to 5.0
76a64667 clh: Remove the use of deprecated '--memory file=' parameter
bfd78104 packaging: add usage instructions for -a (arch_target) option
ecaa1f9e clh: Enable versions and kernel tag to enable CLH CI for kata 2.0
64b06944 ppc64le: Support for rust agent based rootfs
2511cabb virtcontainers: fix outdated example code in api document
5c7f0016 rustjail: add the "HOME" env for process
58dfd503 rustjail: fix the issue of missing set propagation for bind mount
e79c5727 agent: setup the "lo" interface run agent as init
d0a45637 agent: add unit tests for rustjail/process.rs
2889af77 actions: Run subject-line-length check even if the previous checks failed
9f0fef5a actions: Add commit-body-missing check
d81af48a actions: Do not limit the length of single word in commit body
8c46a41b actions: Fix subsystem checking in github-action
2466ac73 actions: Fix 'Fixes checking' problem by update dependent action
e7d3ba12 virtcontainers: cleanup codes, delete not used APIs
998a6343 docs: Remove installation of proxy
c305911d actions: Use github action to do Fixes/Length/Subsystem check
bd78ccaf shimv2: fix the issue of close IO stream
06834931 agent: Fix fd leaks in execute_hook
b03cd1bf docs: Update contributions section in limitations document kata 2.0
c15ef219 qemu: Set govmmQemu NoReboot config Knob
57269262 qemu: Add test for qemuConfig Knobs
5010e3a3 vendor: update govmm
61d133f9 runtime: change un-structured log to structured log
f24ad25d virtcontainers: Add unit test for types/container.go
1637e9d3 qemu: remove multidev in qemu/fsdev parameter on arm64
b61c9ca2 Makefile: add CLHCMD in arm64-options.mk
e1a79e69 virtcontainers: Add function to capabilities test
d1d5c69b virtcontainers: Expand unit test coverage for asset
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Remove the 1.x branch API methods from 2.0. Keep the same methods on
the VC interface, like the VCImpl struct.
Fixes: #751
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
The version.rs file is now generated to contain up-to-date information
from the makefile, including git commit and the full binary path.
The makefile has also been modified to make it easier to add changes
in generated files based on makefile variables.
Fixes: #740
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Fix the bug where the version string generated by the `Makefile` was not
being passed to the agent, resulting in an "unknown" version.
Fixes: #725.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Until a container is deleted, the agent should allow the runtime to wait
on the same process in parallel, as the Go agent does.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Same as containers, it is possible for an exec process to stop so
quickly that containerd may send a parallel Kill request. We should
just return success in such a case.
Fixes: #716
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
In checkAndMount(), it is not clear why we check IsBlockDevice() and if
DisableBlockDeviceUse == false and then only return "false, nil" instead
of "false, err". Adding a comment to make it a bit more readable.
Fixes: #732
Signed-off-by: Qian Cai <cai@redhat.com>
Let's add a new column to the Official packages table, and let the
maintainers of the official distro packages jump in and add their
names there.
This will help us ping the right people and redirect to them any issues
that are reported against the official packages.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Kata Containers will stop distributing the community packages in favour
of kata-deploy.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Following up a conversation with Ralf Haferkamp, we can safely drop the
instructions for using Kata Containers on SLES 12 SP3 in favour of using
the official builds provided for SLE 15 SP1, and SLE 15 SP2.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's update the openSUSE Installation Guide to reflect the current
information on how to install kata packages provided by the distro
itself.
The official packages are present on Leap 15.2 and Tumbleweed, and can
be just installed. Leap 15.1 is slightly different, as the .repo file
has to be added before the packages can be installed.
Leap 15.0 has been removed as it already reached its EOL.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Although the community packages are present for RHEL, everything about
them is extremely unsupported on the Red Hat side.
Knowing this, we'd be better off simply not mentioning those and, if users
really want to try kata-containers on RHEL, they can simply follow the
CentOS installation guide.
In the future, if the Fedora packages make their way to RHEL, we can add
the information here. However, if we're recommending something
unsupported we'd be better off recommending kata-deploy instead.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's update the Fedora Installation Guide to reflect the current
information on how to install kata packages provided by the distro
itself.
These are official packages and we, as Fedora members, recommend using
kata-containers on Fedora 32 and onwards, as from this version
everything works out-of-the-box. Also, Fedora 31 will reach its EOL as
soon as Fedora 33 is out, which should happen in October.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's update the CentOS Installation Guide to reflect the current
information on how to install kata packages provided by the
Virtualization Special Interest Group.
These are not official CentOS packages, as those are not coming from Red
Hat Enterprise Linux. These are the same packages we have on Fedora and
we have decided to keep them up-to-date and sync'ed on both Fedora and
CentOS, so people can give Kata Containers a try also on CentOS.
The nature of these packages makes me think that those are "as official
as they can be", so that's the reason I've decided to add the
instructions to the "official" table.
Together with the change in the Installation Guide, let's also update
the README and reflect the fact we **strongly recommend** using CentOS
8, with the packages provided by the Virtualization Special Interest
Group, instead of using the CentOS 7 with packages built on OBS.
Fixes: #623
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
A PR now needs *two* labels to be applied before it can be merged.
One label must be a backport label from the list below and the other
a forward port label:
- backport labels:
`needs-backport`, `no-backport-needed`, `backport`.
- forward-port labels:
`needs-forward-port`, `no-forward-port-needed`, `forward-port`.
This is to make the maintainer think carefully before merging a PR
and hopefully maximise efficient porting.
Related: https://github.com/kata-containers/kata-containers/issues/634
Fixes: #639.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The cgroup_parent path is expected to be an absolute path;
add a '/' prefix to the passed cgroup_parent path to make
sure it's an absolute path.
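The normalization described above amounts to something like the following sketch, where `absolute_cgroup_parent` is a hypothetical helper name rather than the function used in rustjail:

```rust
/// Return `cgroup_parent` as an absolute path, adding a leading '/'
/// only when one is not already present.
fn absolute_cgroup_parent(cgroup_parent: &str) -> String {
    if cgroup_parent.starts_with('/') {
        cgroup_parent.to_string()
    } else {
        format!("/{}", cgroup_parent)
    }
}
```

For example, `absolute_cgroup_parent("kubepods/pod1")` returns `/kubepods/pod1`, while a path that is already absolute is returned unchanged.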
Fixes: #336
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Sync the API from the runtime code to the documentation. Remove and add
some APIs in the kata-api-design.md doc, and add a new table for Sandbox
Monitor APIs.
Fixes: #701
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Sometimes the runtime will fail to online CPUs,
because when the runtime calls the QMP command
`device_add`, QEMU doesn't allocate all vCPUs immediately.
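Retrying an operation that may not be ready yet can be sketched with a small generic helper. This is an illustration of the retry idea only, written in Rust; the name `retry`, the attempt count, and the delay are assumptions and do not reflect the values used in the actual change.

```rust
use std::thread;
use std::time::Duration;

/// Call `op` until it succeeds or `attempts` tries have been made,
/// sleeping `delay` between tries. `op` is always called at least once,
/// and the last error is returned on failure.
fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut tries = 0;
    loop {
        tries += 1;
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: give up and report the last error.
            Err(e) if tries >= attempts => return Err(e),
            // Otherwise wait a little and try again.
            Err(_) => thread::sleep(delay),
        }
    }
}
```

A caller would wrap the step that can fail transiently, such as onlining the hotplugged CPUs, in the closure passed to `retry`.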
Fixes: #665
Signed-off-by: bin liu <bin@hyper.sh>
The linux kernel feature RANDOMIZE_BASE improved the security and at
the same time increased the memory footprint of a kata container,
this feature was enabled in kata-containers/packaging#1006.
In order to mitigate this increase in memory consumption, we can
boot containers using the uncompressed kernel.
Reduce boot time by ~5%
Reduce KSM memory footprint by ~14%
Reduce noKSM memory footprint by ~27%
Fixes: #669
Signed-off-by: Julio Montes <julio.montes@intel.com>
`rustjail::errors` was removed in a previous commit, hence some external crates
like `error_chain` are no longer required. Update Cargo.toml and Cargo.lock
to reflect these changes.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Don't use `rustjail::errors` for error handling, since it's not
thread safe and there are better alternatives like `anyhow`.
`anyhow` attaches context to help the person troubleshooting
the error understand where things went wrong, for example:
Current error messages:
```
No such file or directory (os error 2)
```
With `anyhow`:
```
Error: Failed to read config.json
Caused by:
No such file or directory (os error 2)
```
Fixes: #641
Signed-off-by: Julio Montes <julio.montes@intel.com>
anyhow provides `anyhow::Error`, a trait object based error type for
easy idiomatic error handling in Rust applications
Signed-off-by: Julio Montes <julio.montes@intel.com>
Use `.to_string` to wrap up `caps::errors::Error`s since they are not
thread safe, otherwise `cargo build` will fail with the following error:
```
doesn't satisfy `caps::errors::Error: std::marker::Sync`
```
Signed-off-by: Julio Montes <julio.montes@intel.com>
Return `anyhow::Result` from all the functions in this directory.
Add function `io_error_kind_eq` to compare an `anyhow::Error` with an
`io::Error`; this function downcasts the `anyhow::Error`.
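The downcasting idea can be sketched with the standard library alone. The version below uses a boxed `std::error::Error` trait object as a stand-in for `anyhow::Error`, so the signature is an assumption and differs from the function added by this commit.

```rust
use std::error::Error;
use std::io;

/// Stand-in for `anyhow::Error` in this sketch.
type BoxError = Box<dyn Error + Send + Sync + 'static>;

/// Return true if `err` is an `io::Error` of the given kind.
fn io_error_kind_eq(err: &BoxError, kind: io::ErrorKind) -> bool {
    // `downcast_ref` yields `Some` only when the concrete type behind
    // the trait object really is `io::Error`.
    err.downcast_ref::<io::Error>()
        .map_or(false, |e| e.kind() == kind)
}
```

This lets callers keep a single opaque error type while still reacting to specific I/O conditions such as `io::ErrorKind::NotFound`.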
Signed-off-by: Julio Montes <julio.montes@intel.com>
anyhow provides `anyhow::Error`, a trait object based error type for
easy idiomatic error handling in Rust applications.
Signed-off-by: Julio Montes <julio.montes@intel.com>
I noticed the spelling mistake while reviewing another change and
doing a "grep" for "privilege" that turned up nothing.
Fixes: #671
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
In order to use a build system like launchpad, the snapcraft.yaml file
must be in the root directory of the project or under the `snap`
directory, that way launchpad detects that this project can be built
using the `snapcraft` command.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Currently, isPCIeDevice() attempts to determine if a (host) device is
PCI-Express capable by looking up its link speed via the PCI slots
information in sysfs. This is a) complicated and b) wrong. PCI-e
devices don't have to have slots information, so this frequently fails.
Instead determine if devices are PCI-e by checking for the presence of
PCIe extended configuration space by looking at the size of the "config"
file in sysfs.
Forward ported from 6bf93b23 in the Kata 1.x runtime repository.
Fixes: #611
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
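The config-space check described in the isPCIeDevice() change above can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the helper names (`isPCIeConfigSize`, `isPCIeDeviceAt`) and the example device address are hypothetical. It relies only on the fact that conventional PCI configuration space is 256 bytes while PCIe extended configuration space is 4096 bytes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pciConfigSpaceSize is the size in bytes of conventional PCI configuration
// space. PCIe extended configuration space is larger (4096 bytes).
const pciConfigSpaceSize = 256

// isPCIeConfigSize reports whether a sysfs "config" file of the given size
// indicates that PCIe extended configuration space is present.
func isPCIeConfigSize(size int64) bool {
	return size > pciConfigSpaceSize
}

// isPCIeDeviceAt is a hypothetical helper: it stats the "config" file in the
// given sysfs device directory and applies the size check above.
func isPCIeDeviceAt(sysfsDevDir string) (bool, error) {
	fi, err := os.Stat(filepath.Join(sysfsDevDir, "config"))
	if err != nil {
		return false, err
	}
	return isPCIeConfigSize(fi.Size()), nil
}

func main() {
	// The device address below is only an example and may not exist.
	ok, err := isPCIeDeviceAt("/sys/bus/pci/devices/0000:00:00.0")
	fmt.Println(ok, err)
}
```

Keeping the size comparison separate from the sysfs access makes the rule itself easy to unit test without real hardware.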
These patches are causing compilation issues while building on x86.
Remove these while we fix the issue.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Disable the following IPMI configs, since they are not needed
for kata containers; disabling them also fixes the snap job in launchpad:
CONFIG_PCI_IPMI_KCS
CONFIG_PCI_IPMI_BT
CONFIG_IPMI_SSIF
fixes#581
Signed-off-by: Julio Montes <julio.montes@intel.com>
Add guide on how to pass a VFIO-AP device, such as Crypto Express cards
on IBM Z mainframes, to a Kata container. Like the documentation for
VFIO-PCI, this was put in the virtcontainers README.
Fixes: #658
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
The command for installing kata in minikube still used the old path of
the packaging project from the 1.x branch. This commit changes the path
of the packaging files to the 2.0-dev branch.
Fixes: #619
Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
Reimplement the loop that waits for OBS. Look for the packages
that are still building, not for the repos.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Add the following packages as build dependencies to build QEMU
5 in OBS and launchpad (snap)
* libselinux1
* libffi
* libmount
* libblkid
* python3
fixes#1075
Signed-off-by: Julio Montes <julio.montes@intel.com>
Disable reflink when using DAX. Reflink is a xfs feature that cannot be
used together with DAX.
fixes#577
Signed-off-by: Julio Montes <julio.montes@intel.com>
Recognise when a device to be hot-plugged is an IBM Adjunct Processor
(AP) device and execute VFIO AP hot-plug accordingly. Includes unittest
for recognising and uses CCW for addDeviceToBridge in hotplugVFIODevice
if appropriate.
Fixes: #491
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Co-authored-by: Julio Montes <julio.montes@intel.com>
Reviewed-by: Alice Frosi <afrosi@redhat.com>
Since we always build musl kata-agent, there is no need to build
it inside a musl container. We can just build on the host and then
copy the binary to the target rootfs.
There is still a lot to clean up and this should be done for ALL
target distros instead of just alpine. But this is at least working
for alpine first.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
We do not need to clone packaging repository, nor apply
virtio_vsock as virtio-fs-dev has already included that fix.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Anyone can collaborate in the Kata Containers project, so instead of
adding her/his name and email to the Cargo.toml files, use
`The Kata Containers community` as name and
`kata-dev@lists.katacontainers.io` as email.
fixes#643
Signed-off-by: Julio Montes <julio.montes@intel.com>
For ephemeral storage handler, it should return an
empty string instead of the mount destination.
Fixes: #635
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
Rather than specifying the VSOCK address as two CLI options
(`--vsock-cid` and `--vsock-port`), allow the agent's ttRPC server
address to be specified to the `agent-ctl` tool using a single URI
`--server-address` CLI option. Since the ttrpc crate supports VSOCK and
UNIX schemes, this allows the tool to be run inside the VM by specifying
a UNIX address.
Fixes: #549.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
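To illustrate the single-URI `--server-address` idea from the agent-ctl change above, here is a small Go sketch. It is not the tool's code (agent-ctl is written in Rust and uses the ttrpc crate); the `serverAddr` type, the `parseServerAddr` function and the exact URI shapes `vsock://<cid>:<port>` and `unix://<path>` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// serverAddr is a hypothetical parsed form of a server address URI.
type serverAddr struct {
	scheme string // "vsock" or "unix"
	cid    uint32 // VSOCK context ID (vsock only)
	port   uint32 // VSOCK port (vsock only)
	path   string // socket path (unix only)
}

// parseServerAddr splits a URI into its scheme and scheme-specific parts.
// The accepted formats are assumptions, not taken from the agent-ctl source.
func parseServerAddr(uri string) (serverAddr, error) {
	parts := strings.SplitN(uri, "://", 2)
	if len(parts) != 2 || parts[1] == "" {
		return serverAddr{}, fmt.Errorf("invalid server address %q", uri)
	}
	scheme, rest := parts[0], parts[1]
	switch scheme {
	case "unix":
		return serverAddr{scheme: scheme, path: rest}, nil
	case "vsock":
		fields := strings.Split(rest, ":")
		if len(fields) != 2 {
			return serverAddr{}, fmt.Errorf("expected vsock://<cid>:<port>, got %q", uri)
		}
		cid, err := strconv.ParseUint(fields[0], 10, 32)
		if err != nil {
			return serverAddr{}, fmt.Errorf("invalid VSOCK CID: %w", err)
		}
		port, err := strconv.ParseUint(fields[1], 10, 32)
		if err != nil {
			return serverAddr{}, fmt.Errorf("invalid VSOCK port: %w", err)
		}
		return serverAddr{scheme: scheme, cid: uint32(cid), port: uint32(port)}, nil
	default:
		return serverAddr{}, fmt.Errorf("unsupported scheme %q", scheme)
	}
}

func main() {
	for _, uri := range []string{"vsock://3:1024", "unix:///tmp/agent.sock"} {
		addr, err := parseServerAddr(uri)
		fmt.Printf("%+v %v\n", addr, err)
	}
}
```

A single URI keeps the CLI surface small: supporting another transport only needs a new scheme, not new options.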
for s390x virtio devices
Add iommu_platform annotations for qemu for ccw,
other supported devices can also make use of that.
Fixes#603
Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
Don't format the error string before passing to the `anyhow!()` macro
since it can format strings itself.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add an ACRN runtime ('kata-acrn') to the Docker configuration
('/etc/docker/daemon.json').
Fixes: #579
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Update this document to get rid of any nemu mentions.
Add a comment to mention that the number of containers that can be
launched may be limited by the size of `/dev/shm`.
Fixes#572
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
In case of containerd, not all annotations are passed down to the OCI
layer. We need to configure "pod_annotations" field for a runtime class.
This field is a list of annotations that can be passed by Kata as OCI
annotations. Add this as default configuration with kata-deploy.
Fixes: #594
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
For privileged containers, all host devices are passed to the
container. We have done work in crio and containerd to define a
scope of privileged in Kata to prevent this from happening.
Add this as the default as this falls under a best practice to follow
with Kata.
Note that if this flag has been already defined, then this change
does not override it.
Fixes#582
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
We kept observing instabilities from CLH CI jobs periodically (kata
1.x). To separate the random failures caused by `seccomp` from other
failures, this patch disables the 'seccomp' option from clh in kata for
now. We will bring this option back after completing the 'seccomp'
filter lists based on Kata's CI workload. Details are tracked in the
following two issues:
https://github.com/kata-containers/runtime/issues/2899 and
https://github.com/kata-containers/runtime/issues/2901
We are facing a similar challenge to stabilize CI jobs related to
cloud-hypervisor in Kata 2.0. We are disabling the `seccomp` option here
for the same reason. Related issue:
https://github.com/kata-containers/tests/issues/2813
Fixes: #614
Signed-off-by: Bo Chen <chen.bo@intel.com>
Disable reflink when using DAX. Reflink is a xfs feature that cannot be
used together with DAX.
fixes kata-containers/osbuilder#456
fixes#577
Signed-off-by: Julio Montes <julio.montes@intel.com>
Allow the default (VSOCK) ttRPC server address to be changed using a new
`KATA_AGENT_SERVER_ADDR` environment variable (for testing and
debugging).
Fixes: #552.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
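The environment override described above follows a common pattern: use the variable if it is set, otherwise fall back to the built-in default. A minimal Go sketch of that pattern follows. The agent itself is Rust, so this is only an illustration; `resolveServerAddr` is a hypothetical name, the default address passed in `main` is illustrative, and treating an empty value as "unset" is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// serverAddrEnv is the environment variable named in the commit message.
const serverAddrEnv = "KATA_AGENT_SERVER_ADDR"

// resolveServerAddr returns the value of KATA_AGENT_SERVER_ADDR if the lookup
// function reports it as set and non-empty, otherwise the supplied default.
// The lookup function is injected so the logic can be tested without
// modifying the process environment.
func resolveServerAddr(lookup func(string) (string, bool), def string) string {
	if v, ok := lookup(serverAddrEnv); ok && v != "" {
		return v
	}
	return def
}

func main() {
	// Illustrative default only; the agent defines its own default address.
	fmt.Println(resolveServerAddr(os.LookupEnv, "vsock://-1:1024"))
}
```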
Scan guest hooks upon creating new sandbox and append
them to guest OCI spec before running containers.
Fixes: #485
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Add vfio-ap.conf to the s390 kernel config fragments, which includes
the necessary flags for passing an IBM Adjunct Processor (AP) device
over VFIO.
Fixes: #567
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Reviewed-by: alicefr <afrosi@redhat.com>
This is a re-vendor of intel/govmm, with support for hot-plugging IBM
Adjunct Processor (AP) devices over VFIO. This is necessary for
enabling AP device pass-through in Kata (see #491).
39c372a Add support for hot-plugging IBM VFIO-AP devices
f5bdd53 travis: disable amd64 jobs
1af1c0d github: enable github actions
4831c6e travis: Run coveralls after success
cf0f05d qemu: add iommu_platform knob for qemuParams
175ac49 typo fix
Fixes: #565
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
When a one-shot pod dies in CRI-O, the shimv2 process isn't killed until
the pod is actually deleted, even though the VM is shut down. In this
case, the shim appears to busyloop when attempting to talk to the (now
dead) agent via VSOCK. To address this, we disconnect from the agent
after the VM is shut down.
This is especially catastrophic for one-shot pods that may persist for
hours or days, but it also applies to any shimv2 pod where Kata is
configured to use VSOCK for communication.
See github.com/kata-containers/runtime#2719 for details.
Fixes#2719
Signed-off-by: Evan Foster <efoster@adobe.com>
Moved CONFIG_GENERIC_MSI_IRQ_DOMAIN in arch base.conf.
The config is not selected for s390x
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Some kernel configs need additional dependencies:
- CONFIG_NO_HZ depends on
CONFIG_GENERIC_CLOCKEVENTS
- CONFIG_CGROUP_PERF depends on
CONFIG_PERF_EVENTS
CONFIG_HAVE_PERF_EVENTS
- CONFIG_BLK_DEV_LOOP depends on
CONFIG_BLK_DEV
CONFIG_BLOCK
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
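The dependency list above lends itself to a simple consistency check over a set of enabled options. The Go sketch below is hypothetical tooling, not part of the kernel config scripts: the `configDeps` table contains only the dependencies named in the commit message, and `missingDeps` is an invented helper.

```go
package main

import (
	"fmt"
	"sort"
)

// configDeps lists only the dependencies named in the commit message above.
var configDeps = map[string][]string{
	"CONFIG_NO_HZ":        {"CONFIG_GENERIC_CLOCKEVENTS"},
	"CONFIG_CGROUP_PERF":  {"CONFIG_PERF_EVENTS", "CONFIG_HAVE_PERF_EVENTS"},
	"CONFIG_BLK_DEV_LOOP": {"CONFIG_BLK_DEV", "CONFIG_BLOCK"},
}

// missingDeps returns, in sorted order, every dependency of an enabled
// option that is not itself enabled.
func missingDeps(enabled map[string]bool) []string {
	seen := map[string]bool{}
	var missing []string
	for opt, deps := range configDeps {
		if !enabled[opt] {
			continue
		}
		for _, d := range deps {
			if !enabled[d] && !seen[d] {
				seen[d] = true
				missing = append(missing, d)
			}
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	enabled := map[string]bool{
		"CONFIG_CGROUP_PERF": true,
		"CONFIG_PERF_EVENTS": true,
	}
	fmt.Println(missingDeps(enabled))
}
```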
Moved CONFIG_PARAVIRT to each arch base.conf.
CONFIG_PARAVIRT is only defined for x86, arm64 and arm in arch/$arch/Kconfig.
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Moved CONFIG_NO_HZ_FULL config to each arch base.conf.
The config CONFIG_NO_HZ_FULL depends on CONFIG_HAVE_CONTEXT_TRACKING.
See https://github.com/torvalds/linux/blob/a811c1fa0a02c062555b54651065899437bacdbe/kernel/time/Kconfig#L96
The context tracking is not supported on s390x yet.
See https://github.com/torvalds/linux/blob/a811c1fa0a02c062555b54651065899437bacdbe/Documentation/features/time/context-tracking/arch-support.txt#L27
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Moved:
---
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_PNPACPI=y
---
from hotplug to acpi.
In this way, it is possible to skip these configs if the acpi feature is
not supported.
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
The option CONFIG_VIRTIO_PMEM is not supported on s390x.
It requires nvdimm support.
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Add !s390x tag to skip this group of fragments for s390x.
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Expand unit tests for virtcontainers/utils/utils.go to include testing
CleanupFds, CPU calculations, ID string creation, and memory alignment
functions.
Fixes#490
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
The logic for the debug console meant that if the debug console was
_disabled_, the agent was guaranteed to crash on function exit due to
the unsafe code block. Fixed by simplifying the code to use the standard
`Option` idiom for optional values.
Fixes: #554.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Don't create a thread to wait for the ttRPC server to end - it isn't
required as the operation should be blocked on.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Print a simple version string rather than delaying the output
to display a structured version string. The structured output
is potentially more useful but:
- This output is not consistent with other components.
- Delaying the output makes `--version` unusable in some
environments (since a lot of setup is called before the
version string can be output).
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
\h is not a valid metacharacter in javascript which is used in
github-action.
Use \s\t to replace it.
Fixes: #551
Signed-off-by: Tim Zhang <tim@hyper.sh>
[ Port from packaging commit 4e1b5729f47d5f67902e1344521bc5b121673046 ]
Build clh with Podman, to allow building the vmm in the Podman CI.
Virtiofs qemu has to be built as this is required by clh.
Fixes: #461
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from packaging commit cbe53bdb14e303830fa9f2d5a7f3c9161a32f033 ]
Update build scripts for qemu-virtiofs.
- virtiofs-0.3 patches are not needed
- Sync build on how vanilla qemu is built
- Apply patches for virtiofsd if any (none today)
- Apply patches that are used for the qemu vanilla
- Apply patches in order
Fixes: #461
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit 44b58e4151d1fc7debed41274b65c37233a437e3 ]
This patch enables kata+clh to unplug block devices, which is required
to pass cri-o integration tests.
Fixes: #461
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit 03fb9c50c180d3359178c30e06f1122df312ae76 ]
To support unplug block device, we need to set the 'Id' explicitly while
hotplugging devices with cloud-hypervisor HTTP API.
Fixes: #461
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit 39897867bc89667daaafdd141367ec4a5fdc9247 ]
API now requires cpu topology.
Fixes: #461
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit 40f49312a4881c904a1cbdace04c4c697bd2d429 ]
Update api generated by openapi.
Fixes: #461
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit 0dcbbd8dc113878c2aa8c78b5300e4853a7e64be ]
Highlights for cloud-hypervisor version 0.9.0 include:
virtiofs updates to new dax implementation based in qemu 5.0
Fixed random issues caused due to seccomp filters
io_uring Based Block Device Support
If the io_uring feature is enabled and the host kernel supports it then io_uring will be used for block devices. This results in a very significant performance improvement.
Block and Network Device Statistics
Statistics for activity of the virtio network and block devices are now exposed through a new vm.counters HTTP API entry point. These take the form of simple counters which can be used to observe the activity of the VM.
HTTP API Responses
The HTTP API for adding devices now responds with the name that was assigned to the device as well as the PCI BDF.
CPU Topology
A topology parameter has been added to --cpus which allows the configuration of the guest CPU topology allowing the user to specify the numbers of sockets, packages per socket, cores per package and threads per core.
Release Build Optimization
Our release build is now built with LTO (Link Time Optimization) which results in a ~20% reduction in the binary size.
Hypervisor Abstraction
A new abstraction has been introduced, in the form of a hypervisor crate so as to enable the support of additional hypervisors beyond KVM.
Snapshot/Restore Improvements
Multiple improvements have been made to the VM snapshot/restore support that was added in the last release. This includes persisting more vCPU state and in particular preserving the guest paravirtualized clock in order to avoid vCPU hangs inside the guest when running with multiple vCPUs.
Virtio Memory Ballooning Support
A virtio-balloon device has been added, controlled through the resize control, which allows the reclamation of host memory by resizing a memory balloon inside the guest.
Enhancements to ARM64 Support
The ARM64 support introduced in the last release has been further enhanced with support for using PCI for exposing devices into the guest as well as multiple bug fixes. It also now supports using an initramfs when booting.
Intel SGX Support
The guest can now use Intel SGX if the host supports it. Details can be found in the dedicated SGX documentation.
Seccomp Sandbox Improvements
The most frequently used virtio devices are now isolated with their own seccomp filters. It is also now possible to pass --seccomp=log which results in the logging of requests that would have otherwise been denied to further aid development.
Notable Bug Fixes
Our virtio-vsock implementation has been resynced with the implementation from Firecracker and includes multiple bug fixes.
CPU hotplug has been fixed so that it is now possible to add, remove, and re-add vCPUs (#1338)
A workaround is now in place for when KVM reports as available MSRs that are in fact unreadable, preventing snapshot/restore from working correctly (#1543).
virtio-mmio based devices are now more widely tested (#275).
Multiple issues have been fixed with virtio device configuration (#1217)
Console input was wrongly consumed by both virtio-console and the serial. (#1521)
Fixes: #461
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit d803f077c6fd26e4d020643eda415ea315f47e0c ]
Update to qemu 5.0.x with support for virtiofs + dax.
Fixes: #461
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
[ Port from runtime commit 30b40f5505fd46d23b89eb5fb38301d2f7454f35 ]
Along with the release of cloud-hypervisor v0.8.0, this option has been
deprecated. clh now enforces the use of the alternative controls,
e.g. "shared" and "hugepages", which can infer the backing file
paths. Also, we don't use "hugepages" in kata, so we are fine now as the
"shared" control is already enabled.
Fixes: #461
Signed-off-by: Bo Chen <chen.bo@intel.com>
Add usage instructions for the -a option in the script and README;
the currently supported architectures are aarch64/ppc64le/s390x/x86_64.
Fixes: #534
Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
This PR updates the versions for the virtiofs kernel branch and, as
there is a tag based on kernel 5.6, moves the patches to use the tag name.
This PR is needed to enable CLH CI for kata 2.0. This PR is backporting
kata-containers/runtime#2843 and kata-containers/packaging#1098.
Fixes#532
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
For building rust agent on ppc64le, the rust toolchain is built using
the LIBC implementation - gnu instead of musl.
Fixes: #481
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
Some type declarations were changed. The example code here is outdated
compared to example_pod_run_test.go under the virtcontainers directory.
Also add the imports to make clear where the types come from.
Fixes: #507
Signed-off-by: Li Ning <lining_yewu@cmss.chinamobile.com>
When creating a container process/exec process, it should set the
"HOME" env for this process by reading it from /etc/passwd.
Fixes: #498
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
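Looking up a home directory in /etc/passwd, as the HOME change above describes, amounts to finding the entry for the process UID and taking its sixth colon-separated field. The Go sketch below shows that lookup on passwd-formatted text. It is not the agent's Rust implementation; `homeFromPasswd` and the sample entries are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// homeFromPasswd scans passwd-formatted text, where each line is
// name:password:uid:gid:gecos:home:shell, and returns the home directory of
// the first entry whose UID matches.
func homeFromPasswd(passwd string, uid int) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(passwd))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Split(line, ":")
		if len(fields) < 7 {
			continue // skip malformed entries
		}
		id, err := strconv.Atoi(fields[2])
		if err != nil || id != uid {
			continue
		}
		return fields[5], true
	}
	return "", false
}

func main() {
	// Sample data for illustration only.
	passwd := "root:x:0:0:root:/root:/bin/sh\napp:x:1000:1000::/home/app:/bin/sh\n"
	if home, ok := homeFromPasswd(passwd, 1000); ok {
		fmt.Println("HOME=" + home)
	}
}
```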
When bind mounting a container's volumes, the propagation
flags should be set after the bind mount.
Fixes: #530
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
If the line consists of only a single word,
it may be something like a URL (it's certainly very unlikely to be a
normal word if the default lengths are being used), so length
checks won't be applied to it.
Signed-off-by: Tim Zhang <tim@hyper.sh>
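The single-word exemption described above can be sketched as follows. The real check runs in a GitHub action; this Go version is only an illustration, with a hypothetical `lineTooLong` helper and arbitrary limits in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// lineTooLong reports whether line exceeds max characters. Lines made up of
// a single word (for example a long URL) are exempt from the length check.
func lineTooLong(line string, max int) bool {
	if len(strings.Fields(line)) <= 1 {
		return false
	}
	return len([]rune(line)) > max
}

func main() {
	fmt.Println(lineTooLong("https://example.com/a/very/long/path/that/cannot/be/wrapped", 20))
	fmt.Println(lineTooLong("this line has several words and is too long", 20))
}
```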
The Fixes check should pass as long as one of the commits of the
pull request passes the check.
Update the dependent github-action commit-message-checker-with-regex to v0.3.1
shortlog:
d6d9770 commit-message-checker-with-regex: Add input one_pass_all_pass
Fixes: #519
Signed-off-by: Tim Zhang <tim@hyper.sh>
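The "one pass, all pass" behaviour described above means the check succeeds if any commit message in the pull request carries a Fixes line. A rough Go sketch follows. The pattern `fixesRe` is illustrative and is not the expression used by the real action, and `anyCommitHasFixes` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// fixesRe is an illustrative pattern: a line starting with "fixes"
// (any case), an optional colon, optional whitespace, then "#<number>".
var fixesRe = regexp.MustCompile(`(?mi)^fixes:?\s*#\d+`)

// anyCommitHasFixes returns true as soon as one commit message matches.
func anyCommitHasFixes(messages []string) bool {
	for _, m := range messages {
		if fixesRe.MatchString(m) {
			return true
		}
	}
	return false
}

func main() {
	msgs := []string{
		"docs: tweak wording",
		"agent: fix something\n\nFixes: #519\n",
	}
	fmt.Println(anyCommitHasFixes(msgs))
}
```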
This PR removes the installation of proxy in the Developer Guide as it
does not exist on kata 2.0
Fixes#502
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The commit checks do not need to wait for CI dependencies to be
installed; that is a waste of time. We need to show errors ASAP.
And we should display as many problems as possible at once.
Fixes: #487
Signed-off-by: Tim Zhang <tim@hyper.sh>
It should wait until the stdin io copy
has terminated to close the process's io stream,
otherwise, it would miss forwarding some contents
to process stdin.
Fixes: #439
Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
This PR updates the contributions section of the limitations document
for kata 2.0 so that, instead of using the previous runtime repository
as an example, it uses the new one.
Fixes#476
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The Kata architecture does not support rebooting VMs (the lifecycle
being start/exec/kill) and if a VM is killed (e.g. using sysrq-trigger),
the VM does not exit fully and other layers do not notice the state change.
Set the NoReboot config Knob so that govmmQemu.LaunchQemu() runs QEMU
with the --no-reboot command-line option.
Fixes: #2866
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Add unit tests for types/container.go. Tests were adapted from
sandbox_test.go since ContainerState is a sandbox state structure and
the transition tests are the same.
Fixes#451
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
As the current qemu of arm64 is so old, the new multidev parameter
in 9pfsdev is not supported on arm64, so disable it temporarily.
Fixes:#466
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The new alpha release brings in the following changes:
1f8e4f67 docs: Update travis and go report card url
db93a163 runtime: remove mock shim
e5910c9b sandbox: Stop and clean up containers that fail to create
1283febd ci: checkout TRAVIS_BRANCH
d7f75dce docs: remove shim/proxy topics and fix docs links
0b3cbee8 virtcontainers: Add additional unit tests for sandbox
c0720179 package: enable cloud-hypervisor for arm64
07a307b4 virtcontainers: Remove duplicate unit tests
d914f018 virtcontainers: Move unit tests for types/sandbox.go
33b1865e actions: Pin to a particular sha for actions
8564c99e actions: Add github actions to perform DCO check
c5081624 actions: Add action to perform WIP check for pull requests
7bbb9e81 rootfs-builder: Don't modify /sbin/init on the build host
3d467505 device: Ease device access for rootfs device to allow node creation
f554cdec virtcontainers: Add to bridges unit test
1d7d944f fc: refactor --daemonize option
7f3e8959 console-watcher: use console watcher to monitor guest console outputs
1099a288 kata 2.0: delete use_vsock option and proxy abstraction
73bf9329 cgroup: fix the issue of crashed when meet unsupported cgroup
ab7afae6 docs: Clarifying minimum version of containerd for annotations
5b15e9ef runtime: consolidate types definition
c6e4d092 agent: sandbox shared pid namespace support
afcf269c rustjail: fix the issue of missing join pid namespace
f3da6900 docs: add link to CRI Configuration for pods
4291eb17 runtime: add monitor_address to .gitignore
1c56abb7 runtime: virtcontainers: vhost-user-blk/scsi are block device nodes
bbf85170 runtime: add pprof interface for shim
0790ca49 runtime: add pod overhead metrics
ae83c96d Modifie to proper CPU architecture name for ppc64le.
f404f4d9 Modified Makefile to pick up correct architecture name for ppc64le.
cdbba6ac agent: Make LIBC configurable
2afbfcab virtcontainers: print a warning when the device to append is not supported
919fc4cd virtcontainer/cgroup: create cgroup manager after creating the network
a134c2e0 virtcontainers/network: Change signature of Enpoint Attach method
9a9721c2 drivers: change BindDevicetoVFIO signature
66219d16 device: support vfio cold plug
3eb694c5 device: add ColdPlug flag
3cf8b470 runtime: delete Stateful from SandboxConfig
069505e2 runtime: delete unused sub-commands.
a0a96db2 runtime: handle unimplemented RPC call by NotFound status code
bd8f03a5 runtime: remove agent abstraction
41c04648 runtime: fix wrong issue links
83b23665 config: there is no need to check vhost-vosck for FC
d96b3063 docs: add metrics design documents for Kata 2.0
b28b850a versions: Revert "versions: update QEMU to 5.0.0"
5ff53037 tools: fix branch and runime repo
24ea3f01 virtcontainers: GetOOMEvent should have no timeout
1b75daa0 runtime: add new command to collect metrics from Kata containers
5200ac06 runtime: remove old store
186fed2a runtime: add implementation of GetMetrics
0c4c69de agent: add GetMetrics implementation
9fd3e48c agent: add new pb message GetMetrics
9c501f3d agent: device: Allow "VmPath" to be used when adding block devices
15af20b6 versions: update QEMU to 5.0.0
a06d01e1 versions: specify rust version
7ae4376b clh: vsock: Use the updated VsockConfig
d8a333b9 versions: Move to cloud-hypervisor v0.8.0
9177d3a3 virtiofsd: Use cache=auto
d66f2192 cli: Fix kata-env output on Power
94fdec4e clh: Allow add virtiofs args and cache options from config
653df674 kata_agent: Add unit tests
6da49a04 clh: Clear the "PCIAddr" field while blk device hotplug
2d6c0731 kata_agent: Pass "VirtPath" with "PCIAddr" of blk devices to agent
56ae2099 kata_agent: Allow to use "VirtPath" as volume source for blk devices
bdd386ba qemu: Fix rtc parameter is not set to qemu
51a6d60a qemu: Remove PMU feature for Power (ppc64le) platform
3ece4130 runtime: clean up shim abstraction
3a17e7aa qemu: Remove pmu limitation in nested virtualization of amd/ppc64le
06571f03 build: Add "pmu=off" to default cpu_features option
115dfa19 annotations: add cpu_features
fa9d619e qemu: add cpu_features option
520295b9 network: Detect and add static ARP entries
117ce4ac clh: remove slow boot debug flags from kernel cmdline
70137962 clh: Remove vsock log port in kernel cmdline
fd5d1394 clh: Improve hypervisor logging
21f83348 clh: Set 'virtio-blk' as the default block device driver
8b5eed70 clh: Enable disk block device hotplug support
883af9c7 agent: set hostname when running as init
899b75f2 agent: fix the issue of missing found right shell
2a8650ba agent-ctl: add Cargo.lock
a8430b37 gitignore: ignore more files
be9ca0d5 qemu: Don't leak file descriptors in case of error
60606647 virtiofsd: Improve logging
7e250f29 shim: exit out of oom polling if unimplemented
9f8d1baa virtcontainers: tests fix, nit fix
d3b3e8be virtcontainers: x86: Support microvm machine type
19833936 virtcontainers: add support for getOOMEvent agent endpoint to sandbox
7c205be2 virtcontainers: add support for getOOMEvent agent endpoint to sandbox
380f07ec proto: update agent protocol
dbc1c30d versions: Remove golangci-lint and gometalinter entries
6e7dd435 qemu: arm64: Set defaultGICVersion to 3 to limit the max vCPU number
93d1f7b4 versions: Misc changes to descriptions
17b3021b qemu: arm64: Don't detect gic version by /proc/interrupts
4cda90ab dax: enable dax on arm64
7a440254 Makefile: add trace-forwarder/agent-ctl missing targets
61e011e8 vc: Version support check is ineffective in createSandbox
ebfbca03 osbuilder: use newest golang
0fd1eb59 Makefile: add default rule
3f8d4b68 trace-forwarder: add Cargo.lock
b68d4e45 shimv2: Removing function as no longer used
f570a2cd shimv2 : Remove workaround for sharedPidNs
b2cc403e build: Improve top-level Makefile
f2a19966 agent: Rename check rule to test
ea1d799f qemu: Only one element of qemuPaths map is relevant
5dffffd4 qemu: Remove useless table from qemuArchBase
97a02131 qemu: Detect and fail a bad machine type earlier
d6e7a58a qemu: Clarify test with bad machine type
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Add additional test cases that cover more asset types and functions to
increase unit test coverage.
Fixes#424
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This PR fixes the travis and go report card URLs in the runtime README for kata
2.0
Fixes#432
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use 'remap' behaviour to deal with multiple devices being shared with
a 9p export.
Fixes the following warning:
```
9p: Multiple devices detected in same VirtFS export, which might lead to file
ID collisions and severe misbehaviours on guest!
You should either use a separate export for each device shared from host or
use virtfs option 'multidevs=remap'!
```
fixes#378
Signed-off-by: Julio Montes <julio.montes@intel.com>
New features that can improve or impact kata containers:
x86:
VMX features can be enabled/disabled via the "-cpu" flag.
When nested virtualization is enabled with an option like
"-cpu Haswell,+vmx", the set of VMX features will also be constrained to
what was available on the corresponding CPU model.
New "microvm" machine type that has virtio-mmio instead of PCI, and no ACPI
support (so no hotplug too). The new machine type is meant as a baseline
for performance optimizations of QEMU, firmware and guests. While inspired
by Firecracker it is not entirely compatible with it (for example it does
not have Firecracker's userspace IP stack and MicroVM Metadata Service).
Reduce memory footprint when booting uncompressed kernels.
ARM:
We now correctly support more than 256 CPUs when using KVM
The virt board now supports memory hotplugging, when used with a UEFI
guest BIOS and ACPI.
virtio-iommu is now supported with machvirt.
The Cortex-M7 CPU is now supported.
s390:
Using KVM now explicitly requires a host kernel version of at least 3.15
(which includes the 'flic' KVM device). This had been broken since QEMU
2.10 already.
ppc64le:
The pseries machine type now consumes less host resources when running a KVM
guest with XIVE (with a recent enough host kernel). This allows running
more concurrent guests with KVM accelerated XIVE.
NVDIMMs with file backend are now supported and SLOF updated to work with
iommu_platform=on for virtio devices.
Signed-off-by: Julio Montes <julio.montes@intel.com>
A container that is created and added to a sandbox can still fail
the final creation steps. In this case, the container must be stopped
and have its resources cleaned up to prevent leaking sandbox mounts.
Forward port of https://github.com/kata-containers/runtime/pull/2826
Fixes #2816
Signed-off-by: Evan Foster <efoster@adobe.com>
Add tests for state change, empty string failures for Volumes and
Sockets. Change two function names to accurately reflect tests.
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Cloud-hypervisor is now capable of working on arm64, so it's time to
enable it in kata for arm64.
As cloud-hypervisor can only use virtio-fs, a new patch should be
applied to the kernel for virtiofs and some configs should be removed
temporarily.
Fixes: #446
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Remove tests from virtcontainers/sandbox_test.go which were moved to
virtcontainers/types/sandbox_test.go.
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Move unit tests that were in virtcontainers/sandbox_test.go relating
to Socket, Volume, and SandboxState to types/sandbox_test.go.
Change testSandboxStateTransition function to use SandboxState only
instead of Sandbox from virtcontainers/sandbox.go.
Fixes#435
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Since actions can access the github token, let's pin to a
particular sha rather than using master.
Fixes: #437
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
(cherry picked from commit 57b64f35e0)
Action performs a check to verify PR raised has commits
that are signed-off.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
(cherry picked from commit 1b157e5015)
Use github actions for performing WIP checks on PRs.
The action checks for keywords in the subject line
as well as labels.
Fixes: #437
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
(cherry picked from commit 0d96145c29)
Don't modify /sbin/init on the build host when using command `AGENT_INIT="yes" ./rootfs.sh centos` to build rootfs.
Fixes: #430
Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn>
In the docker-in-docker scenario, the nested container created
has the entry "b *:* m" in the list of devices it is allowed to access
under /sys/fs/cgroup/devices/docker/{ctrid}/devices.list.
This entry caused failures when starting a nested container,
because we were denying "m" access to the rootfs block devices.
This change adds "m" access back: the container is allowed to
create a device node for the rootfs device, but it does not get
read-write access to the created device node.
This fixes the docker-in-docker use case while still making sure
the container is not allowed read/write access to the rootfs.
Note that this could also be fixed by simply skipping {"Type : "b"}
while creating the device cgroup with libcontainer.
However, that appears to be undocumented behaviour at this point,
so this approach was not taken.
Fixes: #426
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Add function that creates new bridges to increase unit test coverage
for virtcontainers/types/bridges. Also adds test for address formats.
Fixes: #422
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Import a new console watcher to monitor guest console output. It is
only effective when the enable_debug option is turned on.
Guest console output may include guest kernel debug info, agent debug info,
etc.
Fixes: #389
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
With Kata Containers moving to 2.0, (hybrid-)vsock will be the only
way to communicate directly between host and agent.
kata-proxy, the additional component that handled multiplexing on the
serial port, is therefore no longer needed.
Clean up the related unit tests, and add another mock socket type,
`MockHybridVSock`, to deal with the ttrpc-based hybrid-vsock mock server.
Fixes: #389
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Using pod annotations requires a minimum version of v1.3.0 of containerd
to pass annotations down to kata. This is already somewhat mentioned in
the corresponding how-to, however, it can be mis-read as the minimum
version of kata-containers instead of containerd. This can cause
extended and futile troubleshooting on older distributions such as
Ubuntu 16.04 which ship a version of 1.2.x of containerd. This patch
attempts to clarify this.
Fixes: #690
Signed-off-by: Georg Kunz <georg.kunz@est.tech>
We do not need the vc types translation for network data structures.
Just use the protocol buffer definitions.
Fixes: #415
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Add support for shareProcessNamespace.
Note that this commit only supports a shared pid namespace by
sharing the infrastructure pause container's pid namespace
with the other containers, instead of creating a new pid
namespace different from that of the pause container.
Fixes: #342
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
When checking if a device is an emulated vhost-user-blk or
vhost-user-scsi one, we should not only check for their major number but
also their device node type. They must be block devices.
Fixes: #401
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Makefile determines the architecture by running the uname command,
which outputs ppc64le. However, the Rust toolchain target for the ppc64le
arch is named powerpc64le. This change handles that difference.
Signed-off-by: Abhishek Dasgupta <abdasgupta@in.ibm.com>
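The actual fix lives in the Makefile; the name translation it performs can be sketched in Go as below. The function name is hypothetical, and only the ppc64le case comes from the commit message.

```go
package main

import "fmt"

// rustTargetArch (hypothetical helper) maps the machine name printed by
// `uname -m` to the architecture component of a Rust target triple.
// Only the ppc64le case is taken from the commit message; every other
// name is passed through unchanged.
func rustTargetArch(unameArch string) string {
	if unameArch == "ppc64le" {
		return "powerpc64le"
	}
	return unameArch
}

func main() {
	for _, arch := range []string{"x86_64", "aarch64", "ppc64le"} {
		fmt.Printf("%s -> %s\n", arch, rustTargetArch(arch))
	}
}
```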
Currently the default LIBC used to build the agent is "musl". However,
"musl" is not present in a large portion of the distros *and* "gnu" libc
just works as expected.
Knowing that, let's give whoever builds the project the option to
simply run `make LIBC=gnu` instead of expecting them to go through
the Makefile and replace musl with gnu there.
Fixes: #369
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Print a warning message when the device to append to a QEMU VM is not
supported. This change is just to improve debuggability.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Create the cgroup manager once the network has been created. This way the
list of devices will include the network VFIO devices attached to the sandbox
when the physical endpoint is the network driver.
Fixes: #2774
Signed-off-by: Julio Montes <julio.montes@intel.com>
In order to use the device manager and receiver from the network endpoints,
the signature of the Attach method must change to receive a Sandbox instead of
a Hypervisor. This way devices can be added through the device manager API.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Depending on ColdPlug flag, cold or hot plug vfio devices. The VFIO device
won't be hot removed when such flag is false
Signed-off-by: Julio Montes <julio.montes@intel.com>
Add ColdPlug flag to DeviceInfo and DeviceState to identify whether a device
must be or was cold plugged
Signed-off-by: Julio Montes <julio.montes@intel.com>
For now, the agent returns a NotFound status when getOOMEvents is called; the runtime should handle it correctly.
Fixes: #393
Signed-off-by: bin liu <bin@hyper.sh>
Since FC uses hybrid vsock, there's no need
to check whether vhost vsock is supported by the host.
Fixes: #387
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
This reverts commit 15af20b6da.
Kubernetes tests are failing randomly with QEMU 5.0.0, so let's go back to
QEMU 4.1.1 and debug the failures with QEMU 5.
Depends-on: github.com/kata-containers/tests#2701
Fixes: #379
Signed-off-by: Julio Montes <julio.montes@intel.com>
Kata 2.0 lives in `github.com/kata-containers/kata-containers`, so all scripts
should point to it.
Currently the branch for Kata 2.0 is 2.0-dev, not master, so the branch envar
must be used instead of hardcoding `master` as the default branch.
Signed-off-by: Julio Montes <julio.montes@intel.com>
When the "PCIAddr" (BDF information) is not available, we allow the
predicted "VmPath" (from kata-runtime) to be used to locate the block
device in the agent. This is a special code path for supporting
block-device/volume passthrough w/ cloud-hypervisor when the BDF
information is not available (as of clh v0.8.0).
This mainly ports the changes from kata-agent PR https://github.com/kata-containers/agent/pull/790,
as the related changes from kata-runtime were ported to kata 2.0 earlier
this week (https://github.com/kata-containers/kata-containers/pull/362).
Note that the upstream clh recently added the support of returning BDF
information for hotplugged devices. We will consolidate/remove this
special code path for the next upgrade of clh version in kata.
Fixes: #248
Signed-off-by: Bo Chen <chen.bo@intel.com>
New features that can improve or impact Kata Containers:
x86:
VMX features can be enabled/disabled via the "-cpu" flag.
When nested virtualization is enabled with an option like
"-cpu Haswell,+vmx", the set of VMX features will also be constrained to
what was available on the corresponding CPU model.
New "microvm" machine type that has virtio-mmio instead of PCI, and no ACPI
support (so no hotplug too). The new machine type is meant as a baseline
for performance optimizations of QEMU, firmware and guests. While inspired
by Firecracker it is not entirely compatible with it (for example it does
not have Firecracker's userspace IP stack and MicroVM Metadata Service).
Reduce memory footprint when booting uncompressed kernels.
ARM:
We now correctly support more than 256 CPUs when using KVM
The virt board now supports memory hotplugging, when used with a UEFI
guest BIOS and ACPI.
virtio-iommu is now supported with machvirt.
The Cortex-M7 CPU is now supported.
s390:
Using KVM now explicitly requires a host kernel version of at least 3.15
(which includes the 'flic' KVM device). This had been broken since QEMU
2.10 already.
ppc64le:
The pseries machine type now consumes less host resources when running a KVM
guest with XIVE (with a recent enough host kernel). This allows running
more concurrent guests with KVM accelerated XIVE.
NVDIMMs with file backend are now supported and SLOF updated to work with
iommu_platform=on for virtio devices.
Depends-on: github.com/kata-containers/tests#2694
Fixes: #372
Signed-off-by: Julio Montes <julio.montes@intel.com>
[ port runtime commit 364435a6a18bfbb1277512431040bf085554ffdf ]
The new release of clh v0.8.0 updated the 'VsockConfig' of its HTTP API,
which requires changes on our clh driver.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 17d265af6fc1f0913545bfa64e3e1a497f3e44c0 ]
Major new functionalities added in clh v0.8.0 include Experimental
Snapshot and Restore Support, Experimental ARM64 Support, 5-level guest
paging support, etc. There are also quite a few bug fixes and CLI/API
changes for cleanup. More details can be found in the release note:
https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v0.8.0.
Changes:
52b83969 build, release-notes: Document 0.8.0 release
776f8fc5 build: Update Cargo.lock
3f18f93f docs: Add a guide for testing on AArch64
97a1e5e1 vmm: Exit VMM event loop after guest shutdown for AArch64
5cd1730b vmm: Configure VM on AArch64
917219fa vmm: Enable VCPU for AArch64
b5f1c912 vmm: Enable memory manager for AArch64
eeeb45bb vmm: Enable device manager for AArch64
e9488846 vm-allocator: Enable vm-allocator for AArch64
5343b0ac net_util: Fix usage of deprecated mac_address method
bf37ebdc arch: x86_64: Add 5th level of paging when needed
abd6204d source: Fix file permissions
02ac1820 scripts: Ensure musl-gcc is used by musl build
cc85d896 tests: Extend test_*_reboot with checks on fd leaking
2ae547cf build(deps): bump vmm-sys-util from 0.6.0 to 0.6.1
f3556279 build(deps): bump serde_json from 1.0.54 to 1.0.55
dc034eb3 scripts: Only use musl for the Rust components
176d6716 build: Run musl builds in parallel to glibc builds
083189e5 build(deps): bump vcpkg from 0.2.9 to 0.2.10
2334b521 build(deps): bump syn from 1.0.30 to 1.0.31
99c99c24 build(deps): bump serde_json from 1.0.53 to 1.0.54
96a5e22b resources: kernel: Enable 5 levels of page table
653087d7 vmm: Reduce MMIO address space by 4KiB
5f0b6201 arch: x86_64: Enable CR4 LA57 feature
09fd3259 build: Use fork of vm-memory with less performance impact
5f9e079a device: Add AArch64 RTC PL031 implementation
625bab69 vmm: api: Allow to delete non-booted VMs
313883f6 remove duplicated structure InitrdConfig
afe60808 build(deps): bump synstructure from 0.12.3 to 0.12.4
aa79a92c tests: Add integration test for unprivileged network
9b71ba20 vmm, vm-virtio: Stop always autogenerating a host MAC address
1f8b6fa9 net_util: Allow retrieving the MAC address from the TAP device
929d70bc net_util: Only try and enable the TAP device if it not already enabled
eda9bfc7 vhost_user_fs: Replace the '--sock' parameter with '--socket'
a8cdf2f0 tests,vm-virtio,vmm: Use 'socket' for all CLI/API parameters
90e7accf ch-remote: Show response body from error
e436bbf3 build: Install libfdt in github cross-build workflow
2d13751d aarch64: Porting fdt related files from Firecracker
5a18dd36 aarch64: Porting AArch64 register implementation from Firecracker
d605fda3 aarch64: Porting GIC source files from Firecracker
ce624a6d aarch64: Add memory layout for AArch64
c7d44b88 build(deps): bump quote from 1.0.6 to 1.0.7
7c91dfae build(deps): bump proc-macro-nested from 0.1.4 to 0.1.5
17c16e5c build(deps): bump pin-project from 0.4.19 to 0.4.20
a2398742 build(deps): bump arc-swap from 0.4.6 to 0.4.7
b31fe72e build(deps): bump openssl-sys from 0.9.57 to 0.9.58
96497004 build(deps): bump dirs-sys from 0.3.4 to 0.3.5
eabf43fb Revert "tests: Extend test_*_reboot with checks on fd leaking"
7dc4e913 tests: Extend test_*_reboot with checks on fd leaking
601d898f build(deps): bump pin-project from 0.4.17 to 0.4.19
6ff107af vm-device: Switch to use get_host_address_range in vfio-ioctls
3336e801 vfio: Switch to the vfio-ioctls crate ch branch
d24aa72d vfio: Rename to vfio-ioctls
53ce5298 vfio: Move the PCI implementation to the PCI crate
8f7dc735 vmm: Move Vcpu::configure() to arch crate
969e5e0b vmm: Split configure_system() from load_kernel() for x86_64
20cf21cd vmm: Change booting process to cover AArch64 requirements
61aa4615 vhost_user_net: Implement VIRTIO_RING_F_EVENT_IDX
a4d377a0 vm-virtio: net: Implement VIRTIO_RING_F_EVENT_IDX
f0697073 vm-virtio: net: Handle lost interrupts on restore
a5596020 vm-virtio: Add some info! level debugging interrupt generation
cc51fdb8 vhost_user_net: Use NetQueuePair from vm-virtio
fcc62efc vm-virtio: net: Prepare NetQueuePair for use in vhost-user-net
2dbd1186 vm-virtio: net: Split network handling
237cb184 vm-virtio: net: Add further missing error reporting
36d072e6 vm-virtio: Add error propagation for TAP listener (un)registration
3151b5d8 vm-virtio: net: Refactor to support code reuse
22be88d3 build(deps): bump vfio-bindings from `887b3cf` to `f08cbcb`
6121f462 build(deps): bump vfio-bindings from `46ef9d4` to `887b3cf`
b731e63a build(deps): bump ryu from 1.0.4 to 1.0.5
d2d5ccb1 build(deps): bump proc-macro2 from 1.0.17 to 1.0.18
a1b9131b build(deps): bump syn from 1.0.29 to 1.0.30
2571b279 build(deps): bump vcpkg from 0.2.8 to 0.2.9
57f477ef build(deps): bump syn from 1.0.28 to 1.0.29
8a08ea46 build(deps): bump serde_derive from 1.0.110 to 1.0.111
b8ae30d4 build(deps): bump serde from 1.0.110 to 1.0.111
0a0fb246 build(deps): bump syn from 1.0.27 to 1.0.28
bc2921b2 build(deps): bump regex from 1.3.8 to 1.3.9
917ad530 build(deps): bump regex from 1.3.7 to 1.3.8
aac87196 build(deps): bump vm-memory from 0.2.0 to 0.2.1
4c2e6054 build: Update to latest version of container
c471ae94 Dockerfile: Update to latest Rust toolchain: 1.43.0
c31ad72e build: Address issues found by 1.43.0 clippy
fbd1a6c5 vmm: api: Return complete error responses in handle_http_request()
0728bece vmm: seccomp: Ensure that umask() can be reprogrammed
3497eeff main: Set the umask to 0077
c1d15de7 build(deps): bump syn from 1.0.25 to 1.0.27
a4bb96d4 build(deps): bump libc from 0.2.70 to 0.2.71
bfd52ad8 build(deps): bump linux-loader from `bd01b6d` to `1af92d2`
8f1f9d9e devices: Implement InterruptController on AArch64
b32d3025 devices: Refactor IOAPIC to cover other architectures
d5884180 build(deps): bump syn from 1.0.24 to 1.0.25
83c18de5 build(deps): bump proc-macro-hack from 0.5.15 to 0.5.16
7708b95e build(deps): bump syn from 1.0.23 to 1.0.24
749f2f03 build(deps): bump proc-macro2 from 1.0.15 to 1.0.17
c98d6fd0 build(deps): bump openssl-sys from 0.9.56 to 0.9.57
a9ca493b build(deps): bump proc-macro2 from 1.0.14 to 1.0.15
974c7138 build(deps): bump thiserror from 1.0.18 to 1.0.19
321c479b build(deps): bump proc-macro2 from 1.0.13 to 1.0.14
4f5c8be3 build: Added a workflow to cross-build targetting AArch64
1befae87 build: Fixed build errors and warnings on AArch64
0090ec2d build: Updated development utilities for AArch64
af8292b6 vmm, config, vhost_user_blk: remove "wce" parameter
9101bdd7 vm-virtio: block: Ensure backing file consistency
dc66eee8 vhost_user_block: Ensure backing file consistency
10db2131 vm-virtio: block: Add "writeback" control to Request
b94d9a30 vhost_user_backend: Allow backends to know features that can be used
9d88ba7a vhost_user_block: Use VirtioBlockConfig from vm-virtio
1fac2632 vm-virtio: Use config name as per spec
077a5c36 build(deps): bump syn from 1.0.22 to 1.0.23
a813b57f vm-virtio, vhost_user_{fs,block,backend}: Move EVENT_IDX handling
8ae7a38d build: Use same virtio-bindings version
3947809c vm-virtio: block: Ensure that VIRTIO_BLK_T_FLUSH requests actually sync
ca6edafb build(deps): bump cc from 1.0.53 to 1.0.54
a7f236b8 ci: Extend snapshot/restore to validate virtio-vsock
f442c62b vm-virtio: Implement Snapshottable trait for Vsock
f9759988 ci: Extend snapshot/restore test with virtio-iommu
646d33fe vm-virtio: Set queue fields explicitely during restore
02cbea54 vm-virtio: Implement Snapshottable trait for Iommu
4f89cb05 build(deps): bump linux-loader from `43d1c51` to `bd01b6d`
14db7b0a build(deps): bump addr2line from 0.12.0 to 0.12.1
9f2eddd9 ci: Fix test_serial_off
7c3e19c6 vhost_user_backend, vmm: Close leaked file descriptors
35782bd9 vm-virtio: Close file descriptors created by epoll::create()
039accc1 vhost_user_net, vm-virtio: Interrupt guest when TX queue is updated
c8a081e4 build(deps): bump pin-project from 0.4.16 to 0.4.17
b80a7d01 build(deps): bump vmm-sys-util from 0.5.0 to 0.6.0
e6fd6d63 vhost_user_block: Implement VIRTIO_BLK_F_FLUSH
95e3edda build(deps): bump quote from 1.0.5 to 1.0.6
d760010c build(deps): bump ppv-lite86 from 0.2.6 to 0.2.8
0cde08a7 build(deps): bump hermit-abi from 0.1.12 to 0.1.13
3adfe3fb build(deps): bump syn from 1.0.21 to 1.0.22
85aadd15 build(deps): bump proc-macro2 from 1.0.12 to 1.0.13
c764c212 build(deps): bump thiserror from 1.0.17 to 1.0.18
4366dd92 vm-virtio: block: Add support for VIRTIO_RING_F_EVENT_IDX
5a55fc07 vhost_user_fs: Fix seccomp filter for musl
391508f0 tests: Add tests checking for host MAC address setting
1b8b5ac1 vhost-user_net, vm-virtio, vmm: Permit host MAC address setting
11049401 vmm: seccomp: Add ioctl() commands interface hardware address
59e1361f net_util: tap: Add support for setting tap MAC address
68fc4329 vmm: Update seccomp filters with clock_nanosleep
badf8261 build(deps): bump anyhow from 1.0.30 to 1.0.31
7b10f732 build(deps): bump cc from 1.0.52 to 1.0.53
4120a7de vhost_user_fs: Add seccomp
6aa29bdb vmm: api: Use a common handler for data actions too
0fe223f0 vmm: api: Extend VmAction to reduce code duplication
6ec605a7 vmm: api: Refactor generic action handler
c652625b vmm: api: Add a default implementation for simple PUT requests
a3e8bea0 vmm: api: Move HttpError enum to http module
6aab0a54 vhost_user_fs: Implement support for optional sandboxing
c4bf383f vhost_user_*: Create a vhost::Listener in advance
fa844865 vhost_user_fs: Allow callers to provide a fd for /proc/self/fd
831cff3f vhost_user_fs: Use a fd for /proc/self/fd instead of /proc
ba4ec7fc ci: Extend snapshot_restore_test with hotplug
9e165c2c ci: Enable snapshot/restore integration test
c566f1f0 build(deps): bump once_cell from 1.3.1 to 1.4.0
7ffde295 build(deps): bump backtrace from 0.3.47 to 0.3.48
e9c2dbc8 build(deps): bump anyhow from 1.0.29 to 1.0.30
9ccc7daa build, vmm: Update to latest kvm-ioctls
80aa0a75 tests: Test unplugging virtio-fs
88ec93d0 vmm: config: Add missing "id" from FsConfig parsing
0f89f5ec build(deps): bump anyhow from 1.0.28 to 1.0.29
ab3d374a build(deps): bump syn from 1.0.20 to 1.0.21
35b8992e build(deps): bump thiserror from 1.0.16 to 1.0.17
3415b11d build(deps): bump quote from 1.0.4 to 1.0.5
6989bf05 build(deps): bump backtrace from 0.3.46 to 0.3.47
2991fd2a build(deps): bump libc from 0.2.69 to 0.2.70
c37da600 vmm: Update DeviceTree upon PCI BAR reprogramming
d0ae9d7c vmm: Share the DeviceTree across threads
5e9d2545 vmm: Store and restore virtio-pci BAR resources
02bd50f6 vm-virtio: Add helper to set the configuration BAR value
8a826ae2 vmm: Store and restore virtio-pci device on right PCI slot
98dac352 vmm: Add optional PCI b/d/f to each DeviceNode
1e0ebb76 pci: Allow specific PCI b/d/f to be reserved
e577b64a build(deps): bump syn from 1.0.19 to 1.0.20
36bffff2 tests: Expand the test_large_memory() test to cover lots of vCPUs
b9ba81c3 arch, vmm: Don't build mptable when using ACPI
16ac24d8 tests: Only test "noacpi" test when we don't build with ACPI
bb8d19bb arch: Check RSDP address does not go past memory
1c44e917 build(deps): bump clap from 2.33.0 to 2.33.1
4cd2eccf build(deps): bump signal-hook from 0.1.14 to 0.1.15
308b790c vm-virtio: Implement Snapshottable trait for VirtioPciDevice
6d594286 vm-virtio: Implement Snapshottable trait for VirtioPciCommonConfig
e1701f11 pci: Implement Snapshottable trait for PciConfiguration
376db311 pci: Implement Snapshottable trait for MsixConfig
52ac3779 tests: Remove network interface from test_memory_overhead
b57eeb96 vhost_user_block: Add "queue_size" to --block-backend
5016fcf8 vhost_user_block: Use config::OptionParser to simplify block backend parsing
592de97f vhost_user_net: Use config::OptionParser to simplify net backend parsing
f3f398eb vhost_user_block: Consolidate the vhost-user-block backend syntax
3220292d vhost_user_net: Consolidate the vhost-user-net backend syntax
0d2be3b6 build(deps): bump serde from 1.0.107 to 1.0.110
9d8754c6 build(deps): bump pin-project from 0.4.13 to 0.4.16
9bac13de build(deps): bump serde_json from 1.0.52 to 1.0.53
e8d4a13e build(deps): bump serde_derive from 1.0.107 to 1.0.110
d8f181c5 build(deps): bump futures from 0.3.4 to 0.3.5
1e44ac51 build(deps): bump serde_derive from 1.0.106 to 1.0.107
c197bd6f build(deps): bump serde from 1.0.106 to 1.0.107
475040b2 vm-virtio: Correctly reset the virtqueues
d809f2fe vm-virtio: Add virtio reset() support to MmioDevice
0d720cc3 bin: ch-remote: Ensure ch-remote supports syntax it advertises
74d88c4c build(deps): bump openssl-sys from 0.9.55 to 0.9.56
9adc32a0 tests: Print out details for smaps in test_memory_overhead
250f825f tests: Check that requesting tap name for virtio-net succeeds
006da040 tests: Check tap name provided is used for vhost_user_net tests
54b3329f tests: Add tests that use (non-existing) named tap
6fde2d18 build: Strip the binaries before using/releasing them
a4d23c3c build(deps): bump syn from 1.0.18 to 1.0.19
12e00c0f vmm: cpu: Retry sending signals if necessary
31bde4f5 vmm: Unpark the DeviceManager threads in shutdown
801e72ac vmm: cpu: Unpause vCPU threads
91a4a258 vmm: cpu: When coming out of the pause event check for a kill signal
cd60de8f Revert "vmm: vm: Unpark the threads before shutdown when the current state is paused"
797cd13d build(deps): bump vec_map from 0.8.1 to 0.8.2
f6a71bec vmm: Add unit tests for DeviceTree
64e01684 vmm: Create new module device_tree
3b77be90 vmm: Add device_node!() macro to improve code readability
83ec716e vmm: Create breadth-first search iterator for the DeviceTree
b91ab1e3 vmm: Remove the list of migratable devices
1be70372 vmm: Don't use migratable_devices for restore
bc608439 vmm: Add migratable field to the DeviceNode
7fec020f vmm: Create a dedicated DeviceTree structure
14b379de vmm: Add an identifier field to DeviceNode structure
0805d458 vmm: Add support for multiple children per DeviceNode
daaeba51 vmm: Change Node into DeviceNode
5c7df03e vmm: Store and restore virtio-pmem resources
2e6895d9 vmm: Store and restore virtio-fs resources
987f8215 vmm: Store and restore virtio-mmio resources
9cb1e1cc vmm: Perform MMIO allocation from virtio-mmio device creation
adf29706 vmm: Create devices in different path if restoring the VM
d39f91de vmm: Reorganize DeviceManager creation
89c2a586 vmm: Restore devices following the device tree
52c80cfc vmm: Snapshot and restore DeviceManager state
5b408eec vmm: Create a device tree
a6fde0bb vm-device: Define a Resource
b8841d7a tests: Validate vsock functionality works across a reboot
fec97e05 vm-virtio, vmm: Delete unix socket on shutdown
5109f914 vmm: config: Reject attempts to use VFIO or IOMMU without PCI
cb220ae1 tests: Add some debugging to test_memory_overhead
eb3d9d15 build(deps): bump ssh2 from 0.8.0 to 0.8.1
59b73034 build(deps): bump failure from 0.1.7 to 0.1.8
dd0791d7 build(deps): bump pnet from 0.25.0 to 0.26.0
7660a104 build(deps): bump failure_derive from 0.1.7 to 0.1.8
327d67fa virtio-mem: Return reize error in MemEpollHandler.run
bc318b64 build(deps): bump proc-macro2 from 1.0.10 to 1.0.12
5571c6af build(deps): bump signal-hook from 0.1.13 to 0.1.14
af3d0802 build(deps): bump pnet_macros from 0.25.0 to 0.26.0
678855e8 build(deps): bump term_size from 0.3.1 to 0.3.2
2a16ce7e build(deps): bump quote from 1.0.3 to 1.0.4
99e3a150 build(deps): bump backtrace-sys from 0.1.36 to 0.1.37
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 4645d3e6ef2e99dae1f2b3a7bfded6fc304d3023 ]
Today kata sets `cache=always` for virtiofsd by default. This option is
good for performance, but if the shared files are modified from the
host, the changes are not reflected in the guest because virtiofsd always
uses the cached value.
This patch changes the default to `cache=auto` to fix the consistency
issues. The option can still be set to `always` if the user wants it.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 9ac39116b08148de8e66abfca2e5407bc153af87 ]
kata-env output always shows "VMContainerCapable=false" on Power.
This patch fixes the same.
Signed-off-by: bpradipt@in.ibm.com
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit e5a3211c74e20e9878fd0f5d1c80a3c4354eabd1 ]
Some virtiofsd options can improve compatibility,
for example xattrs for dnf, or cache=auto for file consistency
when files change on the host. Allow users to enable them as required.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 6be76fcd07a3d74ca5521af2feaf966dd6f2c344 ]
This patch adds the unit test for 'handleDeviceBlockVolume()'.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 5b96e01f1ba3b0458539c1c920d0c1aab7d5968e ]
We explicitly set "PCIAddr" to NULL, so that the "VirtPath" field can be
used by the agent to create the container.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 50c1dce137bb3d608daa931c01e4941ed5fdb6cc ]
In case the "PCIAddr" of block devices is not available (e.g.
cloud-hypervisor), we also pass the "VirtPath" to the agent for adding
block devices to the container.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit aea29b64b66f75049cb045f9e41dff2becdbebdc ]
When the "PCIAddr" of block device is not available (e.g. cloud-hypervisor), we
allow to use the "VirtPath" as the volume source for creating containers.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 379f19f7ccd71ebe938d9d6fe3cfe5f05f4f02bf ]
Add default value for Clock, otherwise rtc parameter will be dropped
by Valid function. "host" is the default value in qemu for rtc clock.
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 6b32472c2138536ea7e859360498f175601d9ec9 ]
The bug got introduced in 06571f0
Signed-off-by: bpradipt@in.ibm.com
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 18662e16687453185ff4cf99b495a34e3ea9935f ]
It's up to the user to enable or disable pmu. After the previous commit, the
default pmu option has been set to off.
This patch removes the hard limitation and the related unit test code.
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 41a06d4961f51af4ec4799aaee202c744584f31e ]
The user sometimes doesn't care about pmu usage (e.g. perf tool profiling).
But pmu adds significant overhead to boot time and to virtualization
context switches. E.g. on arm64, if the guest pmu is enabled, kvm has to save
and restore all PMU registers on every guest/host switch.
dmesg comparison:
Before:
[ 0.007620] bus: 'platform': driver_probe_device: matched device pmu with driver armv8-pmu
[ 0.007622] bus: 'platform': really_probe: probing driver armv8-pmu with device pmu
[ 0.036282] hw perfevents: enabled with armv8_pmuv3 PMU driver, 7 counters available
[ 0.036285] driver: 'armv8-pmu': driver_bound: bound to device 'pmu'
[ 0.036295] bus: 'platform': really_probe: bound device pmu to driver armv8-pmu
After:
[ 0.007935] bus: 'platform': driver_probe_device: matched device alarmtimer with driver alarmtimer
[ 0.007937] bus: 'platform': really_probe: probing driver alarmtimer with device alarmtimer
[ 0.007940] driver: 'alarmtimer': driver_bound: bound to device 'alarmtimer'
[ 0.007944] bus: 'platform': really_probe: bound device alarmtimer to driver alarmtimer
Because s390 doesn't support "pmu=off", keep the default CPUFEATURES to be ""
instead of "pmu=off".
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit f03c17d107999fd68da87d98ab3e242ac7843051 ]
So that users can use annotations to set it.
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 0100af18a2afdd6dfcc95129ec6237ba4915b3e5 ]
Control whether the guest can enable or disable some CPU features, e.g. pmu=off,
vmx=off. As discussed in the thread [1], the best approach is to let users
specify them, so this patch adds a new option to the configuration file.
Currently this patch only supports this option in qemu, not in other VMMs.
[1] https://github.com/kata-containers/runtime/pull/2559#issuecomment-603998256
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
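How a user-supplied feature list such as "pmu=off" could end up on the QEMU command line can be sketched as below. The helper name is hypothetical and the "host" CPU model is an assumption made for the example; only the feature strings come from the commit message.

```go
package main

import "fmt"

// cpuParam (hypothetical helper) builds the value passed to QEMU's "-cpu"
// flag from a CPU model and a comma-separated feature list. An empty feature
// list leaves the model untouched.
func cpuParam(model, features string) string {
	if features == "" {
		return model
	}
	return model + "," + features
}

func main() {
	// "host" is an assumed CPU model; the features are user-provided.
	fmt.Println("-cpu", cpuParam("host", "pmu=off"))
	fmt.Println("-cpu", cpuParam("host", ""))
}
```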
[ port from runtime commit 67d3e2c5c5d11738c0c0ff46b1228909a6c81ab0 ]
Some network plugins add static arp entries in the network namespace.
Scan namespace for static entries and pass these on to the
agent to be added within the guest.
If the grpc api is not implemented by the agent because an older agent
is running, check for this and do not error out, to maintain
backward compatibility.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 6c517548429da06d33172c8e135dc9b9a297175d ]
The systemd debug and kernel init call debug flags slow down the boot.
The flags are not really related to the hypervisor and
can be added if needed using extra kernel command line options.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 160e3a7c98043a52032b15cc8f6e32a91b032258 ]
Cloud Hypervisor logs the console via stdout. Using console logs helps
to get not only agent logs but also early boot kernel logs.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit e1ee00d16ed621594a92ce0456eb048362962ff0 ]
Use systemd-cat to collect hypervisor output. The `systemd-cat` program
will open a journal fd and call `cat(1)` to redirect all the output to
the fd. This requires an extra binary to read from hypervisor stdout
(that has combined stdin, stderr and serial terminal). But because it is
just cat, the overhead is minimal, and it is only started in Kata debug mode.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 5e5527204c03036f1d1a6b3122c1e0c3e1d1ba94 ]
The block device driver defaults to 'virtio-scsi' when it is not set in
the hypervisor configuration file, while cloud-hypervisor supports only
'virtio-blk' for its block devices.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit c5f97b24d7a1eaac216f144b2c5429feb3451553 ]
With this patch, the container image can be shared from host with guest
as a block device when the 'devicemapper' is used as the storage driver
for docker. Note: The 'block_device_driver="virtio-blk"' entry is
required in the hypervisor config file to work properly.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
It should iterate over the shells to find an existing shell
command instead of returning an error directly when it
meets an absent shell command.
Fixes: #354
Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
[ port from runtime commit 7b269ff7aa2d62fe12593ff7040798e6c9bd5d65 ]
If we take one of the error paths from setupVirtiofsd() after
opening the fd variable, the fd.Close() function is not called.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 882a82393305a4b11a77744b5fc77b98e42d15b9 ]
Send virtiofsd logs to syslog in the same way that the qemu implementation
does. This requires not waiting for messages from virtiofsd stdout. This
follows the qemu implementation's approach: give the socket fd to virtiofsd.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 86f581068eb9dc4b6862c7415cdc912e111177dd ]
This exits out of polling for OOM events if the getOOMEvent
method is unimplemented.
Signed-off-by: Alex Price <aprice@atlassian.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit b4833a48c81132e5a6b1c25a764cd0ebbdc6afff ]
fix tests and nit
Signed-off-by: Alex Price <aprice@atlassian.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 6aff077901021d9a0075c446dfe281b2487e1487 ]
With the addition of support to govmm for multiple transports (intel/govmm#111)
and microvm (intel/govmm#121) we can now enable support for the 'microvm'
machine type in kata-runtime.
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit 86686b56a2bf7f6dd62f620278ae289564da51d0 ]
This adds support for the getOOMEvent agent endpoint to retrieve OOM
events from the agent.
Signed-off-by: Alex Price <aprice@atlassian.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime commit ee985a608015d81772901c1d9999190495fc9a0a ]
After removing detection of the host gic version, we need to limit the max vCPU
in different cases.
Given that in most cases Kata is running on a gicv3 host, set it as the default
value. If the user really wants to run Kata on a gicv2 host, he/she needs to
set default_maxvcpus in the toml file to 8 instead of 0.
In summary: if the user uses host gicv3 or gicv4, everything is fine.
If the user uses host gicv2, set default_maxvcpus=8.
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
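The rule in the commit above can be restated as a small Python sketch. The function name is hypothetical, and the code only encodes the values given in the message (8 for gicv2, and 0, the value the message says to replace, otherwise).

```python
def suggested_default_maxvcpus(host_gic_version):
    """Return the default_maxvcpus value the commit message recommends."""
    if host_gic_version == 2:
        return 8  # gicv2 hosts need the explicit cap
    if host_gic_version in (3, 4):
        return 0  # leave the existing value of 0 in place
    raise ValueError(f"unexpected GIC version: {host_gic_version}")
```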
[ port from runtime commit c4b5922df2 ]
Most of the description fields have capitalized text;
this change converts some of those that
don't.
Fixed spelling of 'required'.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime repository commit 4d4a153af5cb145215cb6e6e386eac2bcb8c3e32 ]
Commit b4385901da ("qemu/arm64: Detect host GIC version to configure guest
GIC") reads /proc/interrupts to detect the host gic version.
But on a ThunderX2 host with 224 cpus, /proc/interrupts is ~762K bytes.
Hence it costs ~900K bytes of memory overhead.
From the go tool pprof results:
flat flat% sum% cum cum%
976.89kB 100% 100% 976.89kB 100% github.com/kata-containers/runtime/virtcontainers.getHostGICVersion
Although the allocated memory will be freed, it seems worth removing that
to speed up the runtime.
As per [1], there is no perfect way to detect the gic version on the host.
On the qemu side, if we use "gic-version=host", qemu will automatically detect
the version via a kvm ioctl. So we'd better let qemu determine the gic version.
If the user really wants to start a vm with gic-version=2, he/she can set it
in the machine_accelerators option.
[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2014-October/011690.html
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime repository commit e36389e25e ]
After backporting the patch series that enables memory hot remove on aarch64
to v5.4.x, we can finally enable nvdimm/dax on aarch64.
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
[ port from runtime repository commit 7e47046111 ]
If major version matches max supported major, we continue comparing the minor version.
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
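The comparison described in the commit above amounts to the following Python sketch. It assumes versions are `(major, minor)` pairs and that a version equal to the maximum is accepted; neither detail is stated in the message, and the function name is hypothetical.

```python
def version_supported(version, max_supported):
    """Return True if (major, minor) does not exceed max_supported."""
    major, minor = version
    max_major, max_minor = max_supported
    if major != max_major:
        return major < max_major
    # Majors are equal, so the minor version decides.
    return minor <= max_minor
```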
Removing code that existed as a workaround for a bug in
how shared process namespaces were handled in the agent.
That has been long fixed in the agent.
With this, sharedPidNs will now work with shimv2.
Fixes: #337
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Define a set of functions that support the standard rules (build,
install, test, *etc*). Then simply add new components and tools to the
appropriate variable to support all the standard build semantics.
Fixes: #331
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Changed the name of the rule that runs the tests to "test" for
consistency, but retained `check` for backwards compatibility
for now.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The qemuPaths field in qemuArchBase maps from machine type to the default
qemu path. But, by the time we construct it, we already know the machine
type, so that entry ends up being the only one we care about.
So, collapse the map into a single path. As a bonus, the qemuPath()
method can no longer fail.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The supportedQemuMachines array in qemuArchBase has a list of all the
qemu machine types supported for the architecture, with the options
for each. But, the machineType field already tells us which of the
machine types we're actually using, and that's the only entry we
actually care about.
So, drop the table, and just have a single value with the machine type
we're actually using. As a bonus that means the machine() method can
no longer fail, so no longer needs an error return.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, newQemuArch() doesn't return an error. So, if passed an invalid
machine type, it will return a technically valid, but unusable qemuArch
object, which will probably fail with other errors shortly down the track.
Change this, to more cleanly fail the newQemuArch itself, letting us
detect a bad machine type earlier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
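The fail-early idea in the commit above can be sketched in Python. The real newQemuArch() is Go and returns an error value; here an exception plays that role. `SUPPORTED_MACHINES` is an illustrative subset and the class is a stand-in, both assumptions for the example.

```python
SUPPORTED_MACHINES = {"pc", "q35"}  # illustrative subset, not the real table


class QemuArch:
    """Stand-in for the architecture object; only records the machine type."""

    def __init__(self, machine_type):
        self.machine_type = machine_type


def new_qemu_arch(machine_type):
    """Build a QemuArch, rejecting unknown machine types up front."""
    if machine_type not in SUPPORTED_MACHINES:
        raise ValueError(f"unrecognised machine type: {machine_type}")
    return QemuArch(machine_type)
```

With this shape a bad machine type fails at construction time instead of producing an unusable object that fails later.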
The last stanza of TestQemuAmd64Bridges is rather odd. It tries to create
a qemu instance with a machine type of (QemuQ35 + QemuPC), or in other
words "q35pc", which isn't a thing.
What it's asserting about this is that the returned bridges list is empty
despite asking for bridges, so it looks like what this is really trying to
test is for sane behaviour when given a bad machine type.
So, split this out into a separate test, and make it explicit for clarity.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-06-24 21:02:17 +10:00
1341 changed files with 176624 additions and 83252 deletions
Kata Containers is an open source project and community working to build a
standard implementation of lightweight Virtual Machines (VMs) that feel and
perform like containers, but provide the workload isolation and security
advantages of VMs.
- That might affect multiple code repositories.
## Getting started
- Where the raiser is unsure which repositories are affected.
See the [installation documentation](docs/install).
## Documentation
See the [official documentation](docs)
(including [installation guides](docs/install),
[the developer guide](docs/Developer-Guide.md),
[design documents](docs/design) and more).
## Community
To learn more about the project, its community and governance, see the
[community repository](https://github.com/kata-containers/community). This is
the first place to go if you wish to contribute to the project.
## Getting help
See the [community](#community) section for ways to contact us.
### Raising issues
Please raise an issue
[in this repository](https://github.com/kata-containers/kata-containers/issues).
> **Note:**
>
> - If an issue affects only a single component, it should be raised in that
> component's repository.
> If you are reporting a security issue, please follow the [vulnerability reporting process](https://github.com/kata-containers/community#vulnerability-handling)
## Kata Containers repositories
#### Kata Containers 1.x versions
### CI
For older Kata Containers 1.x releases, please raise an issue in the
| [KSM throttler](https://github.com/kata-containers/ksm-throttler) | optional core | Daemon that monitors containers and deduplicates memory to maximize container density on the host. |
| [osbuilder](https://github.com/kata-containers/osbuilder) | infrastructure | See [components](#components). |
| [packaging](https://github.com/kata-containers/packaging) | infrastructure | See [components](#components). |
| [proxy](https://github.com/kata-containers/proxy) | core | Multiplexes communications between the shims, agent and runtime. |
| [runtime](https://github.com/kata-containers/runtime) | core | See [components](#components). |
| [shim](https://github.com/kata-containers/shim) | core | Handles standard I/O and signals on behalf of the container process. |
##### Proxy
> **Note:**
>
> - There are more components for the original Kata Containers 1.x implementation.
> - The current implementation simplifies the design significantly:
> compare the [current](docs/design/architecture.md) and
The [`kata-proxy`](https://github.com/kata-containers/proxy) is a process that
runs on the host and co-ordinates access to the agent running inside the
virtual machine.
### Common repositories
##### Runtime
The following repositories are used by both the current and first generation Kata Containers implementations:
The [`kata-runtime`](src/runtime/README.md) is usually
invoked by a container manager and provides high-level verbs to manage
containers.
| Component | Description | Current | First generation | Notes |
|-|-|-|-|-|
| CI | Continuous Integration configuration files and scripts. | [Kata 2.x](https://github.com/kata-containers/ci/tree/2.0-dev) | [Kata 1.x](https://github.com/kata-containers/ci/tree/master) | |
| kernel | The Linux kernel used by the hypervisor to boot the guest image. | [Kata 2.x][kernel] | [Kata 1.x][kernel] | Patches are stored in the packaging component. |
| tests | Test code. | [Kata 2.x](https://github.com/kata-containers/tests/tree/2.0-dev) | [Kata 1.x](https://github.com/kata-containers/tests/tree/master) | Excludes unit tests which live with the main code. |
| www.katacontainers.io | Contains the source for the [main web site](https://www.katacontainers.io). | [Kata 2.x][github-katacontainers.io] | [Kata 1.x][github-katacontainers.io] | |
##### Shim
### Packaging and releases
The [`kata-shim`](https://github.com/kata-containers/shim) is a process that
runs on the host. It acts as though it is the workload (which actually runs
inside the virtual machine). This shim is required to be compliant with the
## This repo is part of [Kata Containers](https://katacontainers.io)
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).
You MUST choose one of `alpine`, `centos`, `clearlinux`, `debian`, `euleros`, `fedora`, `suse`, and `ubuntu` for `${distro}`. By default `seccomp` packages are not included in the rootfs image. Set `SECCOMP` to `yes` to include them.
> **Note:**
>
> - Check the [compatibility matrix](https://github.com/kata-containers/osbuilder#platform-distro-compatibility-matrix) before creating rootfs.
> - Check the [compatibility matrix](../tools/osbuilder/README.md#platform-distro-compatibility-matrix) before creating rootfs.
> - You must ensure that the *default Docker runtime* is `runc` to make use of
> the `USE_DOCKER` variable. If that is not the case, remove the variable
> from the previous command. See [Checking Docker default runtime](#checking-docker-default-runtime).
@@ -305,15 +305,15 @@ You MUST choose one of `alpine`, `centos`, `clearlinux`, `debian`, `euleros`, `f
> - You should only do this step if you are testing with the latest version of the agent.
`AGENT_INIT` controls if the guest image uses the Kata agent as the guest `init` process. When you create an initrd image,
@@ -352,17 +352,20 @@ You MUST choose one of `alpine`, `centos`, `clearlinux`, `euleros`, and `fedora`
> **Note:**
>
> - Check the [compatibility matrix](https://github.com/kata-containers/osbuilder#platform-distro-compatibility-matrix) before creating rootfs.
> - Check the [compatibility matrix](../tools/osbuilder/README.md#platform-distro-compatibility-matrix) before creating rootfs.
Optionally, add your custom agent binary to the rootfs with the following:
Optionally, add your custom agent binary to the rootfs with the following, `LIBC` default is `musl`, if `ARCH` is `ppc64le`, should set the `LIBC=gnu` and `ARCH=powerpc64le`:
You can build and install the guest kernel image as shown [here](https://github.com/kata-containers/packaging/tree/master/kernel#build-kata-containers-kernel).
You can build and install the guest kernel image as shown [here](../tools/packaging/kernel/README.md#build-kata-containers-kernel).
# Install a hypervisor
When setting up Kata using a [packaged installation method](https://github.com/kata-containers/documentation/tree/master/install#installing-on-a-linux-system), the `qemu-lite` hypervisor is installed automatically. For other installation methods, you will need to manually install a suitable hypervisor.
When setting up Kata using a [packaged installation method](install/README.md#installing-on-a-linux-system), the `qemu-lite` hypervisor is installed automatically. For other installation methods, you will need to manually install a suitable hypervisor.
## Build a custom QEMU
@@ -390,7 +393,7 @@ Your QEMU directory need to be prepared with source code. Alternatively, you can
Kata Containers provides two ways to connect to the guest. One uses a traditional login service, which requires additional work. In contrast, the simple debug console is easy to set up.
### Simple debug console setup
Kata Containers 2.0 supports a shell-simulated *console* for quick debugging. This approach uses VSOCK to
connect to a shell that the agent starts inside the guest. This method only requires the guest image to
contain either `/bin/sh` or `/bin/bash`.
#### Enable agent debug console
Enable `debug_console_enabled` in the `configuration.toml` configuration file:
```
[agent.kata]
debug_console_enabled = true
```
This passes `agent.debug_console agent.debug_console_vport=1026` to the agent as kernel parameters. Sandboxes created with these parameters start a shell in the guest when a new connection is accepted from VSOCK.
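As a rough illustration of the mapping from the configuration option to the kernel parameters, here is a Python sketch. The function name and the dictionary form of the `[agent.kata]` section are assumptions; the parameter strings and port are the ones quoted in this section.

```python
DEBUG_CONSOLE_VPORT = 1026  # port quoted in this section


def agent_kernel_params(agent_config):
    """Return the extra kernel parameters implied by the agent config."""
    params = []
    if agent_config.get("debug_console_enabled", False):
        params.append("agent.debug_console")
        params.append(f"agent.debug_console_vport={DEBUG_CONSOLE_VPORT}")
    return " ".join(params)
```

With the option unset or `false`, the sketch adds no parameters.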
#### Start `kata-monitor`
The `kata-runtime exec` command needs `kata-monitor` to get the sandbox's `vsock` address to connect to, so start `kata-monitor` first.
```
$ sudo kata-monitor
```
`kata-monitor` will serve at `localhost:8090` by default.
#### Connect to debug console
The `kata-runtime exec` command is used to connect to the debug console.
If you want to access the guest OS in the traditional way, see [Traditional debug console setup](#traditional-debug-console-setup).
### Traditional debug console setup
By default you cannot login to a virtual machine, since this can be sensitive
from a security perspective. Also, allowing logins would require additional
packages in the rootfs, which would increase the size of the image used to
@@ -494,12 +516,12 @@ the following steps (using rootfs or initrd image).
> **Note:** The following debug console instructions assume a systemd-based guest
> O/S image. This means you must create a rootfs for a distro that supports systemd.
> Currently, all distros supported by [osbuilder](https://github.com/kata-containers/osbuilder) support systemd
> Currently, all distros supported by [osbuilder](../tools/osbuilder) support systemd
> except for Alpine Linux.
>
> Look for `INIT_PROCESS=systemd` in the `config.sh` osbuilder rootfs config file
> to verify an osbuilder distro supports systemd for the distro you want to build rootfs for.
> For an example, see the [Clear Linux config.sh file](https://github.com/kata-containers/osbuilder/blob/master/rootfs-builder/clearlinux/config.sh).
> For an example, see the [Clear Linux config.sh file](../tools/osbuilder/rootfs-builder/clearlinux/config.sh).
>
> For a non-systemd-based distro, create an equivalent system
> service using that distro’s init system syntax. Alternatively, you can build a distro
@@ -507,9 +529,9 @@ the following steps (using rootfs or initrd image).
> additional packages in the rootfs and add “agent.debug_console” to kernel parameters in the runtime
> config file. This tells the Kata agent to launch the console directly.
>
> Once these steps are taken you can connect to the virtual machine using the [debug console](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#connect-to-the-virtual-machine-using-the-debug-console).
> Once these steps are taken you can connect to the virtual machine using the [debug console](Developer-Guide.md#connect-to-the-virtual-machine-using-the-debug-console).
### Create a custom image containing a shell
#### Create a custom image containing a shell
To login to a virtual machine, you must
[create a custom rootfs](#create-a-rootfs-image) or [custom initrd](#create-an-initrd-image---optional)
@@ -519,46 +541,17 @@ an additional `coreutils` package.
For example using CentOS:
```
$ cd $GOPATH/src/github.com/kata-containers/osbuilder/rootfs-builder
Once all the pull requests to bump versions in all Kata repositories are merged,
tag all the repositories as shown below.
```
$ cd ${GOPATH}/src/github.com/kata-containers/packaging/release
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
$ git checkout <kata-branch-to-release>
$ git pull
$ ./tag_repos.sh -p -b "$BRANCH" tag
@@ -69,34 +68,7 @@
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/master/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is triggered automatically by the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/runtime/releases).
### Create OBS Packages
- We have set up an [Azure Pipelines](https://azure.microsoft.com/en-us/services/devops/pipelines/) job
to trigger generation of Kata packages in [OBS](https://build.opensuse.org/).
Go to the [Azure Pipelines job that creates OBS packages](https://dev.azure.com/kata-containers/release-process/_release?_a=releases&view=mine&definitionId=1).
- Click on "Create release" (blue button, at top right corner).
It should prompt you for variables to be passed to the release job. They should look like:
```
BRANCH="the-kata-branch-that-is-release"
BUILD_HEAD=false
OBS_BRANCH="the-kata-branch-that-is-release"
```
Note: If the release is `Alpha` , `Beta` , or `RC` (that is part of a `master` release), please use `OBS_BRANCH=master`.
The above step shall create OBS packages for Kata for various distributions that Kata supports and test them as well.
- Verify that the packages have built successfully by checking the [Kata OBS project page](https://build.opensuse.org/project/subprojects/home:katacontainers).
- Make sure packages work correctly. This can be done manually or via the [package testing pipeline](http://jenkins.katacontainers.io/job/package-release-testing).
You have to make sure the packages are already published by OBS before this step.
It should prompt you for variables to be passed to the pipeline:
Note: `latest` will verify that a package provides the latest Kata tag in that branch.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/kata-containers/releases).
### Create release notes
@@ -105,12 +77,12 @@
Run the script as shown below:
```
$ cd ${GOPATH}/src/github.com/kata-containers/packaging/release
$ cd ${GOPATH}/src/github.com/kata-containers/kata-containers/tools/packaging/release
# Note: OLD_VERSION is where the script should start to get changes.
architecture. It also supports the [Kubernetes\* Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
and therefore works seamlessly with the [Kubernetes\* Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
through the [CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and
[Containerd CRI Plugin\*](https://github.com/containerd/cri) implementation. In other words, you can transparently
select between the [default Docker and CRI shim runtime (runc)](https://github.com/opencontainers/runc)
`kata-runtime` creates a QEMU\*/KVM virtual machine for each container or pod,
the Docker engine or `kubelet` (Kubernetes) creates respectively.
Kata Containers creates a QEMU\*/KVM virtual machine for each pod that `kubelet` (Kubernetes) creates.

The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](https://github.com/kata-containers/runtime/tree/master/containerd-shim-v2)
is another Kata Containers entrypoint, which
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/containerd-shim-v2)
is the Kata Containers entrypoint, which
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI compatible containers with one shim (the `shimv2`) per Pod instead
of `2N+1` shims (a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself), and no standalone
`kata-proxy` process even if no VSOCK is available.
Before `shimv2` (as done in [Kata Containers 1.x releases](https://github.com/kata-containers/runtime/releases)), we need to create a `containerd-shim` and a [`kata-shim`](https://github.com/kata-containers/shim) for each container and the Pod sandbox itself, plus an optional [`kata-proxy`](https://github.com/kata-containers/proxy) when VSOCK is not available. With `shimv2`, Kubernetes can launch Pod and OCI compatible containers with one shim (the `shimv2`) per Pod instead of `2N+1` shims, and no standalone `kata-proxy` process even if no VSOCK is available.

The container process is then spawned by
[agent](https://github.com/kata-containers/agent), an agent process running
as a daemon inside the virtual machine. `kata-agent` runs a gRPC server in
[`kata-agent`](../../src/agent), an agent process running
as a daemon inside the virtual machine. `kata-agent` runs a [`ttRPC`](https://github.com/containerd/ttrpc-rust) server in
the guest using a VIRTIO serial or VSOCK interface which QEMU exposes as a socket
file on the host. `kata-runtime` uses a gRPC protocol to communicate with
file on the host. `shimv2` uses a `ttRPC` protocol to communicate with
the agent. This protocol allows the runtime to send container management
commands to the agent. The protocol is also used to carry the I/O streams (stdout,
stderr, stdin) between the containers and the manage engines (e.g. Docker Engine).
stderr, stdin) between the containers and the management engines (e.g. CRI-O or containerd).
For any given container, both the init process and all potentially executed
commands within that container, together with their related I/O streams, need
to go through the VIRTIO serial or VSOCK interface exported by QEMU.
In the VIRTIO serial case, a [Kata Containers
proxy (`kata-proxy`)](https://github.com/kata-containers/proxy) instance is
launched for each virtual machine to handle multiplexing and demultiplexing
those commands and streams.
On the host, each container process's removal is handled by a reaper in the higher
layers of the container stack. In the case of Docker or containerd it is handled by `containerd-shim`.
In the case of CRI-O it is handled by `conmon`. For clarity, for the remainder
of this document the term "container process reaper" will be used to refer to
either reaper. As Kata Containers processes run inside their own virtual machines,
the container process reaper cannot monitor, control
or reap them. `kata-runtime` fixes that issue by creating an [additional shim process
(`kata-shim`)](https://github.com/kata-containers/shim) between the container process
reaper and `kata-proxy`. A `kata-shim` instance will both forward signals and `stdin`
streams to the container process on the guest and pass the container `stdout`
and `stderr` streams back up the stack to the CRI shim or Docker via the container process
reaper. `kata-runtime` creates a `kata-shim` daemon for each container and for each
OCI command received to run within an already running container (example, `docker
exec`).
Since Kata Containers version 1.5, the newly introduced `shimv2` has integrated the
functionalities of the reaper, the `kata-runtime`, the `kata-shim`, and the `kata-proxy`.
As a result, there will not be any of the additional processes previously listed.
to go through the VSOCK interface exported by QEMU.
The container workload, that is, the actual OCI bundle rootfs, is exported from the
host to the virtual machine. In the case where a block-based graph driver is
configured, `virtio-scsi` will be used. In all other cases a 9pfs VIRTIO mount point
configured, `virtio-scsi` will be used. In all other cases a `virtio-fs` VIRTIO mount point
will be used. `kata-agent` uses this mount point as the root filesystem for the
container processes.
@@ -136,7 +95,7 @@ The only services running in the context of the mini O/S are the init daemon
is created using libcontainer, creating a container in the same manner that is done
by `runc`.
For example, when `docker run -ti ubuntu date` is run:
For example, when `ctr run -ti ubuntu date` is run:
- The hypervisor will boot the mini-OS image using the guest kernel.
- `systemd`, running inside the mini-OS context, will launch the `kata-agent` in
@@ -155,225 +114,36 @@ The only service running in the context of the initrd is the [Agent](#agent) as
## Agent
[`kata-agent`](https://github.com/kata-containers/agent) is a process running in the
guest as a supervisor for managing containers and processes running within
those containers.
[`kata-agent`](../../src/agent) is a process running in the guest as a supervisor for managing containers and processes running within those containers.
The `kata-agent` execution unit is the sandbox. A `kata-agent` sandbox is a container sandbox defined by a set of namespaces (NS, UTS, IPC and PID). `kata-runtime` can
For the 2.0 release, the `kata-agent` is rewritten in the [Rust programming language](https://www.rust-lang.org/) so that we can minimize its memory footprint while keeping the memory safety of the original Go version of [`kata-agent` used in Kata Containers 1.x](https://github.com/kata-containers/agent). This reduces the memory footprint from tens of megabytes to less than 100 kilobytes, enabling Kata Containers in more use cases like functional computing and edge computing.
The `kata-agent` execution unit is the sandbox. A `kata-agent` sandbox is a container sandbox defined by a set of namespaces (NS, UTS, IPC and PID). `shimv2` can
run several containers per VM to support container engines that require multiple
containers running inside a pod. In the case of docker, `kata-runtime` creates a
single container per pod.
containers running inside a pod.
`kata-agent` communicates with the other Kata components over gRPC.
It also runs a [`yamux`](https://github.com/hashicorp/yamux) server on the same gRPC URL.
The `kata-agent` makes use of [`libcontainer`](https://github.com/opencontainers/runc/tree/master/libcontainer)
to manage the lifecycle of the container. This way the `kata-agent` reuses most
of the code used by [`runc`](https://github.com/opencontainers/runc).
### Agent gRPC protocol
placeholder
`kata-agent` communicates with the other Kata components over `ttRPC`.
## Runtime
`kata-runtime` is an OCI compatible container runtime and is responsible for handling
`containerd-shim-kata-v2` is a [containerd runtime shimv2](https://github.com/containerd/containerd/blob/v1.4.1/runtime/v2/README.md) implementation and is responsible for handling the `runtime v2 shim APIs`, which are similar to [the OCI runtime specification](https://github.com/opencontainers/runtime-spec) but simplify the architecture by loading the runtime once and making RPC calls to handle the various container lifecycle commands. This refinement is an improvement on the OCI specification, which requires the container manager to call the runtime binary multiple times, at least once for each lifecycle command.
`kata-runtime` heavily utilizes the
[virtcontainers project](https://github.com/containers/virtcontainers), which
provides a generic, runtime-specification agnostic, hardware-virtualized containers
library.
`containerd-shim-kata-v2` heavily utilizes the
[virtcontainers package](../../src/runtime/virtcontainers/), which provides a generic, runtime-specification agnostic, hardware-virtualized containers library.
### Configuration
The runtime uses a TOML format configuration file called `configuration.toml`. By
default this file is installed in the `/usr/share/defaults/kata-containers`
directory and contains various settings such as the paths to the hypervisor,
the guest kernel and the mini-OS image.
The runtime uses a TOML format configuration file called `configuration.toml`. By default this file is installed in the `/usr/share/defaults/kata-containers` directory and contains various settings such as the paths to the hypervisor, the guest kernel and the mini-OS image.
The actual configuration file paths can be determined by running:
```
$ kata-runtime --kata-show-default-config-paths
```
Most users will not need to modify the configuration file.
The file is well commented and provides a few "knobs" that can be used to modify
the behavior of the runtime.
The configuration file is also used to enable runtime [debug output](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#enable-full-debug).
### Significant OCI commands
Here we describe how `kata-runtime` handles the most important OCI commands.
command, `kata-runtime` goes through the following steps:
1. Create the network namespace where we will spawn VM and shims processes.
2. Call into the pre-start hooks. One of them should be responsible for creating
the `veth` network pair between the host network namespace and the network namespace
freshly created.
3. Scan the network from the new network namespace, and create a MACVTAP connection
between the `veth` interface and a `tap` interface into the VM.
4. Start the VM inside the network namespace by providing the `tap` interface
previously created.
5. Wait for the VM to be ready.
6. Start `kata-proxy`, which will connect to the created VM. The `kata-proxy` process
will take care of proxying all communications with the VM. Kata has a single proxy
per VM.
7. Communicate with `kata-agent` (through the proxy) to configure the sandbox
inside the VM.
8. Communicate with `kata-agent` to create the container, relying on the OCI
configuration file `config.json` initially provided to `kata-runtime`. This
spawns the container process inside the VM, leveraging the `libcontainer` package.
9. Start `kata-shim`, which will connect to the gRPC server socket provided by the `kata-proxy`. `kata-shim` will spawn a few Go routines to parallelize blocking calls `ReadStdout()` , `ReadStderr()` and `WaitProcess()`. Both `ReadStdout()` and `ReadStderr()` are run through infinite loops since `kata-shim` wants the output of those until the container process terminates. `WaitProcess()` is a unique call which returns the exit code of the container process when it terminates inside the VM. Note that `kata-shim` is started inside the network namespace, to allow upper layers to determine which network namespace has been created and by checking the `kata-shim` process. It also creates a new PID namespace by entering into it. This ensures that all `kata-shim` processes belonging to the same container will get killed when the `kata-shim` representing the container process terminates.
At this point the container process is running inside of the VM, and it is represented
With traditional containers, [`start`](https://github.com/kata-containers/runtime/blob/master/cli/start.go) launches a container process in its own set of namespaces. With Kata Containers, the main task of `kata-runtime` is to ask [`kata-agent`](#agent) to start the container workload inside the virtual machine. `kata-runtime` will run through the following steps:
1. Communicate with `kata-agent` (through the proxy) to start the container workload
inside the VM. If, for example, the command to execute inside of the container is `top`,
the `kata-shim`'s `ReadStdout()` will start returning text output for `top`, and
`WaitProcess()` will continue to block as long as the `top` process runs.
2. Call into the post-start hooks. Usually, this is a no-op since nothing is provided
OCI [`exec`](https://github.com/kata-containers/runtime/blob/master/cli/exec.go) allows you to run an additional command within an already running
container. In Kata Containers, this is handled as follows:
1. A request is sent to the `kata-agent` (through the proxy) to start a new process
inside an existing container running within the VM.
2. A new `kata-shim` is created within the same network and PID namespaces as the
original `kata-shim` representing the container process. This new `kata-shim` is
used for the new exec process.
Now the process started with `exec` is running within the VM, sharing `uts`, `pid`, `mnt` and `ipc` namespaces with the container process.

#### `kill`
When sending the OCI [`kill`](https://github.com/kata-containers/runtime/blob/master/cli/kill.go) command, the container runtime should send a
[UNIX signal](https://en.wikipedia.org/wiki/Unix_signal) to the container process.
A `kill` sending a termination signal such as `SIGKILL` or `SIGTERM` is expected
to terminate the container process. In the context of a traditional container,
this means stopping the container. For `kata-runtime`, this translates to stopping
the container and the VM associated with it.
1. Send a request to kill the container process to the `kata-agent` (through the proxy).
2. Wait for `kata-shim` process to exit.
3. Force kill the container process if the `kata-shim` process didn't return after a
timeout. This is done by communicating with `kata-agent` (through the proxy) and
sending a `SIGKILL` signal to the container process inside the VM.
4. Wait for `kata-shim` process to exit, and return an error if we reach the
timeout again.
5. Communicate with `kata-agent` (through the proxy) to remove the container
configuration from the VM.
6. Communicate with `kata-agent` (through the proxy) to destroy the sandbox
configuration from the VM.
7. Stop the VM.
8. Remove all network configurations inside the network namespace and delete the
namespace.
9. Execute post-stop hooks.
If `kill` was invoked with a non-termination signal, this simply signals the container process. Otherwise, everything has been torn down, and the VM has been removed.
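The termination path above can be sketched as a small simulation. This is a minimal sketch, not the runtime's Go implementation: `FakeShim`, `FakeAgent`, `kill_container`, and the timeout value are hypothetical names and assumptions introduced for illustration only.

```python
TERMINATION_SIGNALS = {"SIGKILL", "SIGTERM"}


class FakeShim:
    """Hypothetical stand-in for the kata-shim process on the host."""

    def __init__(self, exits_after_signals):
        # Number of signals the container needs before the shim exits.
        self.remaining = exits_after_signals

    def wait(self, timeout):
        # A real implementation would block for up to `timeout` seconds.
        # Here we only report whether the shim has already exited.
        return self.remaining <= 0


class FakeAgent:
    """Hypothetical stand-in for kata-agent, reached through the proxy."""

    def __init__(self, shim):
        self.shim = shim
        self.calls = []

    def signal_process(self, sig):
        self.calls.append(("signal", sig))
        self.shim.remaining -= 1

    def remove_container(self):
        self.calls.append(("remove_container",))

    def destroy_sandbox(self):
        self.calls.append(("destroy_sandbox",))


def kill_container(agent, shim, sig, timeout=10.0):
    agent.signal_process(sig)                  # step 1
    if sig not in TERMINATION_SIGNALS:
        return "signalled"                     # non-termination signal: done
    if not shim.wait(timeout):                 # step 2
        agent.signal_process("SIGKILL")        # step 3
        if not shim.wait(timeout):             # step 4
            raise TimeoutError("kata-shim did not exit")
    agent.remove_container()                   # step 5
    agent.destroy_sandbox()                    # step 6
    # Steps 7 to 9 (stop the VM, remove the network namespace,
    # run post-stop hooks) happen on the host and are omitted here.
    return "stopped"
```

With a shim that only exits after two signals, `kill_container(agent, shim, "SIGTERM")` escalates to `SIGKILL` before removing the container and sandbox configuration.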
#### `delete`
[`delete`](https://github.com/kata-containers/runtime/blob/master/cli/delete.go) removes all internal resources related to a container. A running container
cannot be deleted unless the OCI runtime is explicitly being asked to, by using
`--force` flag.
If the sandbox is not stopped, but the particular container process returned on
its own already, the `kata-runtime` will first go through most of the steps a `kill`
would go through for a termination signal. After this process, or if the `sandboxID` was already stopped to begin with, then `kata-runtime` will:
1. Remove container resources. Every file kept under `/var/{lib,run}/virtcontainers/sandboxes/<sandboxID>/<containerID>`.
2. Remove sandbox resources. Every file kept under `/var/{lib,run}/virtcontainers/sandboxes/<sandboxID>`.
At this point, everything related to the container should have been removed from the host system, and no related process should be running.
OCI `state` returns the status of the container. For `kata-runtime`, this means being
able to detect if the container is still running by looking at the state of `kata-shim`
process representing this container process.
1. Get the container status by checking information stored on disk. (clarification needed)
2. Check the `kata-shim` process representing the container.
3. If the container status on disk is `ready` or `running` and the `kata-shim`
process no longer exists, the container has stopped. This means that before
returning the container status, the container has to be properly stopped.
Here are the steps involved in this detection:
   1. Wait for the `kata-shim` process to exit.
   2. Force kill the container process if the `kata-shim` process didn't return after a timeout. This is done by communicating with `kata-agent` (through the proxy) and sending a `SIGKILL` signal to the container process inside the VM.
   3. Wait for the `kata-shim` process to exit, and return an error if we reach the timeout again.
   4. Communicate with `kata-agent` (through the proxy) to remove the container configuration from the VM.
4. Return container status.
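The decision in step 3 can be expressed compactly. The following is a minimal sketch under stated assumptions: `resolve_state` and its parameters are hypothetical, and the `stopped` status name is assumed here rather than taken from the runtime's code.

```python
def resolve_state(stored_status, shim_alive, stop_container):
    """Return the status to report for a container.

    stored_status:  status recorded on disk (for example "ready" or "running")
    shim_alive:     whether the kata-shim process still exists
    stop_container: callback performing the stop sub-steps (wait, force kill,
                    remove the container configuration from the VM)
    """
    if stored_status in ("ready", "running") and not shim_alive:
        # The shim is gone although the on-disk state says the container is
        # alive: stop the container properly before reporting its status.
        stop_container()
        return "stopped"
    return stored_status
```

The callback keeps the sketch independent of how the stop sub-steps are actually carried out.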
## Proxy
Communication with the VM can be achieved by either `virtio-serial` or, if the host
kernel is newer than v4.8, a virtual socket (`vsock`). The default is `virtio-serial`.
The VM will likely be running multiple container processes. In the event `virtio-serial`
is used, the I/O streams associated with each process need to be multiplexed and demultiplexed on the host. On systems with `vsock` support, this component becomes optional.
`kata-proxy` is a process offering access to the VM [`kata-agent`](https://github.com/kata-containers/agent)
to multiple `kata-shim` and `kata-runtime` clients associated with the VM. Its
main role is to route the I/O streams and signals between each `kata-shim`
instance and the `kata-agent`.
`kata-proxy` connects to `kata-agent` on a Unix domain socket that `kata-runtime` provides
while spawning `kata-proxy`.
`kata-proxy` uses [`yamux`](https://github.com/hashicorp/yamux) to multiplex gRPC
requests on its connection to the `kata-agent`.
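To illustrate what multiplexing several streams over one connection involves, here is a minimal sketch. It uses a simplified, hypothetical frame header (stream ID plus payload length) and is not the `yamux` wire format; `mux` and `demux` are names made up for this example.

```python
import struct

# Hypothetical frame header: 4-byte stream ID, 4-byte payload length.
HEADER = struct.Struct(">II")


def mux(frames):
    """Serialize (stream_id, payload) pairs onto a single byte stream."""
    return b"".join(HEADER.pack(sid, len(payload)) + payload
                    for sid, payload in frames)


def demux(data):
    """Split the single byte stream back into per-stream payloads."""
    streams = {}
    offset = 0
    while offset < len(data):
        sid, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams.setdefault(sid, bytearray()).extend(data[offset:offset + length])
        offset += length
    return {sid: bytes(buf) for sid, buf in streams.items()}
```

Interleaved frames from two streams are reassembled in order per stream, which is the property each `kata-shim` relies on when sharing one channel to the agent.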
When proxy type is configured as `proxyBuiltIn`, we do not spawn a separate
process to proxy gRPC connections. Instead a built-in Yamux gRPC dialer is used to connect
directly to `kata-agent`. This is used by CRI container runtime server `frakti` which
calls directly into `kata-runtime`.
## Shim
A container process reaper, such as Docker's `containerd-shim` or CRI-O's `conmon`,
is designed around the assumption that it can monitor and reap the actual container
process. As the container process reaper runs on the host, it cannot directly
monitor a process running within a virtual machine. At most it can see the QEMU
process, but that is not enough. With Kata Containers, `kata-shim` acts as the
container process that the container process reaper can monitor. Therefore
`kata-shim` needs to handle all container I/O streams (`stdout`, `stdin` and `stderr`)
and forward all signals the container process reaper decides to send to the container
process.
`kata-shim` has an implicit knowledge about which VM agent will handle those streams
and signals and thus acts as an encapsulation layer between the container process
reaper and the `kata-agent`. `kata-shim`:
- Connects to `kata-proxy` on a Unix domain socket. The socket URL is passed from
`kata-runtime` to `kata-shim` when the former spawns the latter along with a
`containerID` and `execID`. The `containerID` and `execID` are used to identify
the true container process that the shim process will be shadowing or representing.
- Forwards the standard input stream from the container process reaper into
`kata-proxy` using the gRPC `WriteStdin` API.
- Reads the standard output/error from the container process.
- Forwards signals it receives from the container process reaper to `kata-proxy`
using `SignalProcessRequest` API.
- Monitors terminal changes and forwards them to `kata-proxy` using gRPC `TtyWinResize`
API.
The file is well commented and provides a few "knobs" that can be used to modify the behavior of the runtime and your chosen hypervisor.
The configuration file is also used to enable runtime [debug output](../Developer-Guide.md#enable-full-debug).
## Networking
@@ -386,66 +156,31 @@ In order to do so, container engines will usually add one end of a virtual
ethernet (`veth`) pair into the container networking namespace. The other end of
the `veth` pair is added to the host networking namespace.
This is a very namespace-centric approach as many hypervisors (in particular QEMU)
cannot handle `veth` interfaces. Typically, `TAP` interfaces are created for VM
connectivity.
This is a very namespace-centric approach as many hypervisors/VMMs cannot handle `veth`
interfaces. Typically, `TAP` interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
and virtual machines, `kata-runtime` networking transparently connects `veth`
interfaces with `TAP` ones using MACVTAP:
and virtual machines, Kata Containers networking transparently connects `veth`
Container workloads are shared with the virtualized environment through [9pfs](https://www.kernel.org/doc/Documentation/filesystems/9p.txt).
The devicemapper storage driver is a special case. The driver uses dedicated block
devices rather than formatted filesystems, and operates at the block level rather
than the file level. This knowledge is used to directly use the underlying block
device instead of the overlay file system for the container root file system. The
block device maps to the top read-write layer for the overlay. This approach gives
much better I/O performance compared to using 9pfs to share the container file system.
Container workloads are shared with the virtualized environment through [virtio-fs](https://virtio-fs.gitlab.io/).
The approach above does introduce a limitation in terms of dynamic file copy
in/out of the container using the `docker cp` operations. The copy operation from
host to container accesses the mounted file system on the host-side. This is
not expected to work and may lead to inconsistencies as the block device will
be simultaneously written to from two different mounts. The copy operation from
container to host will work, provided the user calls `sync(1)` from within the
container prior to the copy to make sure any outstanding cached data is written
to the block device.
The [devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/master/snapshots/devmapper) is a special case. The `snapshotter` uses dedicated block devices rather than formatted filesystems, and operates at the block level rather than the file level. This knowledge is used to directly use the underlying block device instead of the overlay file system for the container root file system. The block device maps to the top read-write layer for the overlay. This approach gives much better I/O performance compared to using `virtio-fs` to share the container file system.
Kata Containers has the ability to hotplug and remove block devices, which makes it possible to use block devices for containers started after the VM has been launched.
Kata Containers has the ability to hotplug and remove block devices, which makes it
possible to use block devices for containers started after the VM has been launched.
Users can check to see if the container uses the devicemapper block device as its
rootfs by calling `mount(8)` within the container. If the devicemapper block device
is used, `/` will be mounted on `/dev/vda`. Users can disable direct mounting
of the underlying block device through the runtime configuration.
Users can check to see if the container uses the devicemapper block device as its rootfs by calling `mount(8)` within the container. If the devicemapper block device
is used, `/` will be mounted on `/dev/vda`. Users can disable direct mounting of the underlying block device through the runtime configuration.
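The check described above can be automated by parsing mount information. This is a minimal sketch that works on `/proc/mounts`-style text; the function names and the sample mount tables are made up for illustration, and only the `/dev/vda` device name comes from the text above.

```python
def root_device(mounts_text):
    """Return the device mounted on "/" in /proc/mounts-style text, or None."""
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            device = fields[0]  # a later mount on "/" overrides an earlier one
    return device


def uses_block_rootfs(mounts_text, expected="/dev/vda"):
    """True if the container root file system is the expected block device."""
    return root_device(mounts_text) == expected


# Made-up sample data for the two cases.
BLOCK = "/dev/vda / ext4 rw,relatime 0 0\nproc /proc proc rw,nosuid 0 0\n"
SHARED = "none / virtiofs rw,relatime 0 0\nproc /proc proc rw,nosuid 0 0\n"
```

Inside a container, the input would typically be the contents of `/proc/mounts` rather than the sample strings used here.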
## Kubernetes support
@@ -505,44 +219,13 @@ lifecycle management from container execution through the dedicated
In other words, a Kubelet is a CRI client and expects a CRI implementation to
handle the server side of the interface.
[CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and [Containerd CRI Plugin\*](https://github.com/containerd/cri) are CRI implementations that rely on [OCI](https://github.com/opencontainers/runtime-spec)
[CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and [Containerd\*](https://github.com/containerd/containerd/) are CRI implementations that rely on [OCI](https://github.com/opencontainers/runtime-spec)
compatible runtimes for managing container instances.
Kata Containers is an officially supported CRI-O and Containerd CRI Plugin runtime. It is OCI compatible and therefore aligns with project's architecture and requirements.
However, due to the fact that Kubernetes execution units are sets of containers (also
known as pods) rather than single containers, the Kata Containers runtime needs to
get extra information to seamlessly integrate with Kubernetes.
Kata Containers is an officially supported CRI-O and Containerd runtime. Refer to the following guides on how to set up Kata Containers with Kubernetes:
### Problem statement
The Kubernetes\* execution unit is a pod that has specifications detailing constraints
such as namespaces, groups, hardware resources, security contents, *etc* shared by all
the containers within that pod.
By default the Kubelet will send a container creation request to its CRI runtime for
each pod and container creation. Without additional metadata from the CRI runtime,
the Kata Containers runtime will thus create one virtual machine for each pod and for
each containers within a pod. However the task of providing the Kubernetes pod semantics
when creating one virtual machine for each container within the same pod is complex given
the resources of these virtual machines (such as networking or PID) need to be shared.
The challenge with Kata Containers when working as a Kubernetes\* runtime is thus to know
when to create a full virtual machine (for pods) and when to create a new container inside
a previously created virtual machine. In both cases it will get called with very similar
arguments, so it needs the help of the Kubernetes CRI runtime to be able to distinguish a
pod creation request from a container one.
### Containerd
As of Kata Containers 1.5, using `shimv2` with containerd 1.2.0 or above is the preferred
way to run Kata Containers with Kubernetes ([see the howto](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
The CRI-O will catch up soon ([`kubernetes-sigs/cri-o#2024`](https://github.com/kubernetes-sigs/cri-o/issues/2024)).
Refer to the following how-to guides:
- [How to use Kata Containers and Containerd](/how-to/containerd-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md)
### CRI-O
- [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
- [Run Kata Containers with Kubernetes](../how-to/run-kata-with-k8s.md)
#### OCI annotations
@@ -587,36 +270,10 @@ with a Kubernetes pod:
#### Mixing VM based and namespace based runtimes
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](/how-to/containerd-kata.md#kubernetes-runtimeclass)
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/)
> has been supported and the user can specify runtime without the non-standardized annotations.
One interesting evolution of the CRI-O support for `kata-runtime` is the ability
to run virtual machine based pods alongside namespace ones. With CRI-O and Kata
Containers, one can introduce the concept of workload trust inside a Kubernetes
cluster.
A cluster operator can now tag (through Kubernetes annotations) container workloads
as `trusted` or `untrusted`. The former labels known to be safe workloads while
the latter describes potentially malicious or misbehaving workloads that need the
highest degree of isolation. In a software development context, an example of a `trusted` workload would be a containerized continuous integration engine whereas all
developers applications would be `untrusted` by default. Developers workloads can
be buggy, unstable or even include malicious code and thus from a security perspective
it makes sense to tag them as `untrusted`. A CRI-O and Kata Containers based
Kubernetes cluster handles this use case transparently as long as the deployed
containers are properly tagged. All `untrusted` containers will be handled by Kata Containers and thus run in a hardware virtualized secure sandbox while `runc`, for
example, could handle the `trusted` ones.
CRI-O's default behavior is to trust all pods, except when they're annotated with
`io.kubernetes.cri-o.TrustedSandbox` set to `false`. The default CRI-O trust level
is set through its `configuration.toml` configuration file. Generally speaking,
the CRI-O runtime selection between its trusted runtime (typically `runc`) and its untrusted one (`kata-runtime`) is a function of the pod `Privileged` setting, the `io.kubernetes.cri-o.TrustedSandbox` annotation value, and the default CRI-O trust
level. When a pod is `Privileged`, the runtime will always be `runc`. However, when
a pod is **not** `Privileged` the runtime selection is done as follows:
| | `io.kubernetes.cri-o.TrustedSandbox` not set | `io.kubernetes.cri-o.TrustedSandbox` = `true` | `io.kubernetes.cri-o.TrustedSandbox` = `false` |
| Default CRI-O trust level: `untrusted` | Kata Containers | Kata Containers | Kata Containers |
With `RuntimeClass`, users can define Kata Containers as a `RuntimeClass` and then explicitly specify that a pod be created as a Kata Containers pod. For details, please refer to [How to use Kata Containers and Containerd](../../docs/how-to/containerd-kata.md).
Kata implements the CRI API and supports the [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose container metrics. Users can use these interfaces to get basic metrics about containers.
But unlike `runc`, Kata is a VM-based runtime and has a different architecture.
## Limitations of Kata 1.x and the target of Kata 2.0
Kata 1.x has a number of limitations related to observability that may be obstacles to running Kata Containers at scale.
In Kata 2.0, the following components will be able to provide more details about the system.
- containerd shim v2 (effectively `kata-runtime`)
- Hypervisor statistics
- Agent process
- Guest OS statistics
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata then introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
>
> For Kata 2.0, the main component is the Kata containerd shim v2, although the deprecated `kata-runtime` binary will be maintained for a period of time.
>
> Any mention of the "Kata runtime" in this document should be taken to refer to the Kata containerd shim v2 unless explicitly noted otherwise (for example by referring to it explicitly as the `kata-runtime` binary).
## Metrics architecture
Kata 2.0 metrics strongly depend on [Prometheus](https://prometheus.io/), a graduated project from CNCF.
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the other Kata components on the host. It is the monitoring interface to the Kata runtime and can be used to:
- Get metrics
- Get events
This document covers metrics only, which is currently the only function `kata-monitor` supports.
This is the architecture overview of metrics in Kata Containers 2.0.
For a quick evaluation, you can check out [this how to](../how-to/how-to-set-prometheus-in-k8s.md).
### Kata monitor
`kata-monitor` is a management agent running on a node where many Kata containers are running. The work of `kata-monitor` includes:
> **Note**: A node is a single host system or a node in a K8s cluster.
- Aggregating the metrics of the sandboxes running on this node, and adding a `sandbox_id` label.
- Acting as a Prometheus target: all metrics from the Kata shims on this node are collected by Prometheus indirectly. This reduces the number of targets in Prometheus, and avoids the need to expose each shim's metrics on an `ip:port` address.
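The aggregation step can be pictured as rewriting each sample line to carry a `sandbox_id` label. The following is a minimal sketch, not the `kata-monitor` implementation: `add_label` and `aggregate` are hypothetical helpers, the sample metric lines are made up, and label values are assumed not to need escaping.

```python
def add_label(line, name, value):
    """Add name="value" to one Prometheus text-format sample line."""
    if not line.strip() or line.startswith("#"):
        return line  # blank lines and HELP/TYPE comments pass through
    label = f'{name}="{value}"'
    if "{" in line:
        # Existing label set: insert before the closing brace.
        close = line.rindex("}")
        head, tail = line[:close], line[close:]
        sep = "" if head.endswith("{") else ","
        return head + sep + label + tail
    # No label set yet: create one after the metric name.
    metric, _, rest = line.partition(" ")
    return metric + "{" + label + "} " + rest


def aggregate(per_sandbox):
    """Merge the metrics text of several sandboxes into one response body."""
    out = []
    for sandbox_id, text in per_sandbox.items():
        out.extend(add_label(line, "sandbox_id", sandbox_id)
                   for line in text.splitlines())
    return "\n".join(out)
```

Because every sample carries its own `sandbox_id`, Prometheus only needs to scrape a single target per node.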
Only one `kata-monitor` process runs on each node.
`kata-monitor` uses a different communication channel from the one `containerd` uses to communicate with the Kata shim: the Kata shim listens on a new socket address for communicating with `kata-monitor`.
`kata-monitor` gets the shim's metrics socket file (`monitor_address`) in the same way that `containerd` gets the shim address. The socket is an abstract socket and is saved as the file `abstract` in the same directory as the `address` file for `containerd`.
> **Note**: If no Prometheus server is configured, that is, there are no scrape operations, `kata-monitor` will not do anything on its own initiative.
### Kata runtime
The runtime is responsible for:
- Gathering metrics about the shim process
- Gathering metrics about the hypervisor process
- Gathering metrics about the running sandbox
- Getting metrics from the Kata agent (through `ttrpc`)
### Kata agent
The agent is responsible for:
- Gathering agent process metrics
- Gathering guest OS metrics
In Kata 2.0, the agent will add a new interface:
```protobuf
rpc GetMetrics(GetMetricsRequest) returns (Metrics);

message GetMetricsRequest {}

message Metrics {
    string metrics = 1;
}
```
The `metrics` field is Prometheus-encoded content. This avoids defining a fixed structure in protocol buffers.
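Since the `metrics` field is an opaque string, a consumer has to decode it itself. The sketch below assumes the field uses the Prometheus text exposition format and handles only simple sample lines (no timestamps or histogram handling); `parse_metrics` is a hypothetical helper and the sample payload is made up.

```python
import re

# metric_name, optional {labels}, then the sample value.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Decode a metrics string into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up payload standing in for the content of Metrics.metrics.
PAYLOAD = (
    "# TYPE kata_agent_total_rss gauge\n"
    "kata_agent_total_rss 2048\n"
    'kata_agent_io_stat{item="rchar"} 10\n'
)
```

New metrics can then appear in the payload without any change to the protocol buffer definition, which is the design benefit described above.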
### Performance and overhead
Metrics collection should not become a bottleneck of the system or degrade performance, and should run with minimal overhead.
Requirements:
* Metrics **MUST** be quick to collect
* Metrics **MUST** be small.
* Metrics **MUST** be generated only if there are subscribers to the Kata metrics service
* Metrics **MUST** be stateless
In Kata 2.0, metrics are collected mainly from the `/proc` filesystem and consumed by Prometheus in pull mode. This means that if no Prometheus collector is running, there is zero overhead when nobody is interested in the metrics.
The metrics service also doesn't hold any metrics in memory.
|\*|No Sandbox | 1 Sandbox | 2 Sandboxes |
|---|---|---|---|
|Metrics count| 39 | 106 | 173 |
|Metrics size(bytes)| 9K | 144K | 283K |
|Metrics size(`gzipped`, bytes)| 2K | 10K | 17K |
*Metrics size*: Response size of one Prometheus scrape request.
It is easy to estimate that if there are 10 sandboxes running on the host, the size of one metrics fetch request issued by Prometheus will be about 9 + (144 - 9) * 10 = 1359K, roughly 1.36M (not `gzipped`), or 2 + (10 - 2) * 10 = 82K (`gzipped`). Prometheus supports `gzip` compression, which reduces the response size of every request.
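The estimate is a linear extrapolation from the table: a fixed base size plus a per-sandbox increment. A minimal sketch, with `estimate_size_k` as a hypothetical helper name and all inputs taken from the table above:

```python
def estimate_size_k(base_k, one_sandbox_k, sandboxes):
    """Linear estimate of the scrape response size, in K.

    base_k:        size with no sandbox running
    one_sandbox_k: size with exactly one sandbox running
    sandboxes:     number of sandboxes to estimate for
    """
    per_sandbox_k = one_sandbox_k - base_k
    return base_k + per_sandbox_k * sandboxes


plain_10 = estimate_size_k(9, 144, 10)    # not gzipped
gzipped_10 = estimate_size_k(2, 10, 10)   # gzipped
plain_2 = estimate_size_k(9, 144, 2)      # compare with the measured 283K
```

For two sandboxes the same formula gives 279K against a measured 283K, so the linear model is close to the measurements in the table.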
Here is some test data:
- End-to-end (from the Prometheus server to `kata-monitor`, and `kata-monitor` writing the response back): 20ms (avg)
- Agent (RPC call from shim to agent): 3ms (avg)
Test infrastructure:
- OS: Ubuntu 20.04
- Hardware: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz, 6 Cores, and 16GB memory.
**Scrape interval**
The Prometheus default `scrape_interval` is 1 minute, and it is usually set to 15s. A small `scrape_interval` causes more overhead, so users should set it according to their monitoring needs.
## Metrics list
Listed here are all the metrics supported by Kata 2.0. Some metrics depend on the guest kernel in the VM, so there may be some differences in your environment.
Metrics are categorized by the component they are collected from and for.
> * The labels listed here do not include the `instance` and `job` labels that are added by Prometheus.
> * Notes about metric units:
>   * `Kibibytes`, abbreviated `KiB`. 1 `KiB` equals 1024 B.
>   * For some metrics (like network device statistics from the file `/proc/net/dev`), the unit depends on the label (for example, `recv_bytes` and `recv_packets` have different units).
>   * Most of these metrics are collected from the `/proc` filesystem, so their units are the same as those used in `/proc`. See the `proc(5)` manual page for further details.
### Metric types
Prometheus offers four core metric types.
- Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase.
- Gauge: A gauge metric represents a single numerical value that can go up and down, typically used for measured values like current memory usage.
- Histogram: A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.
- Summary: Like a histogram, a summary samples observations. It can also calculate configurable quantiles over a sliding time window.
See [Prometheus metric types](https://prometheus.io/docs/concepts/metric_types/) for detailed explanations about these metric types.
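The practical difference between a counter and a gauge shows up on the consumer side: a gauge is read as-is, while a counter is usually turned into a rate between two scrapes. A minimal sketch, with `counter_rate` as a hypothetical helper and made-up sample values for a counter such as `kata_agent_process_cpu_seconds_total`:

```python
def counter_rate(previous, current, interval_seconds):
    """Per-second increase of a counter between two scrapes."""
    if current < previous:
        # A counter never decreases on its own, so a smaller value means
        # the process restarted and the counter started again from zero.
        previous = 0.0
    return (current - previous) / interval_seconds


# Two made-up scrapes of a CPU-seconds counter, 15 seconds apart.
cpu_usage = counter_rate(10.0, 13.0, 15)

# A gauge (for example a resident memory size) needs no such processing.
rss_gauge = 20480
```

Here `cpu_usage` is the average fraction of one CPU used over the scrape interval.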
### Kata agent metrics
The agent's metrics contain metrics about the agent process.
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_agent_io_stat`: <br> Agent process IO stat. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/io`)<ul><li>`cancelled_write_byte`</li><li>`rchar`</li><li>`read_bytes`</li><li>`syscr`</li><li>`syscw`</li><li>`wchar`</li><li>`write_bytes`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_proc_stat`: <br> Agent process stat. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/stat`)<ul><li>`cstime`</li><li>`cutime`</li><li>`stime`</li><li>`utime`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_proc_status`: <br> Agent process status. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/status`)<ul><li>`hugetlbpages`</li><li>`nonvoluntary_ctxt_switches`</li><li>`rssanon`</li><li>`rssfile`</li><li>`rssshmem`</li><li>`vmdata`</li><li>`vmexe`</li><li>`vmhwm`</li><li>`vmlck`</li><li>`vmlib`</li><li>`vmpeak`</li><li>`vmpin`</li><li>`vmpte`</li><li>`vmrss`</li><li>`vmsize`</li><li>`vmstk`</li><li>`vmswap`</li><li>`voluntary_ctxt_switches`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_process_start_time_seconds`: <br> Start time of the process since `unix` epoch in seconds. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_total_rss`: <br> Agent process total `rss` size | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_total_time`: <br> Agent process total time | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_agent_total_vm`: <br> Agent process total `vm` size | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
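As an illustration of how the `item` label relates to `/proc`, the sketch below maps a `/proc/<pid>/status` excerpt to `kata_agent_proc_status` sample lines. It is not the agent's implementation: the helper names and the sample excerpt are made up, and values keep the unit reported by `/proc`.

```python
def proc_status_samples(status_text, wanted):
    """Extract numeric fields from /proc/<pid>/status-style text.

    Keys are lower-cased so that, for example, "VmRSS" becomes the
    `vmrss` item listed in the table above.
    """
    samples = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        item = key.strip().lower()
        if not sep or item not in wanted:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            samples[item] = int(fields[0])  # unit (such as kB) is dropped
    return samples


def to_metric_lines(samples):
    """Render the samples as Prometheus text-format lines."""
    return [f'kata_agent_proc_status{{item="{item}"}} {value}'
            for item, value in sorted(samples.items())]


# Made-up excerpt in the format of /proc/<pid>/status.
STATUS = (
    "Name:\tkata-agent\n"
    "VmPeak:\t  120000 kB\n"
    "VmRSS:\t   20480 kB\n"
    "voluntary_ctxt_switches:\t150\n"
)
```

Calling `to_metric_lines(proc_status_samples(STATUS, {"vmrss", "voluntary_ctxt_switches"}))` yields one line per selected item.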
### Firecracker metrics
Metrics for the Firecracker VMM.
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_firecracker_api_server`: <br> Metrics related to the internal API server. | `GAUGE` | | <ul><li>`item`<ul><li>`process_startup_time_cpu_us`</li><li>`process_startup_time_us`</li><li>`sync_response_fails`</li><li>`sync_vmm_send_timeout_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_get_api_requests`: <br> Metrics specific to GET API Requests for counting user triggered actions and/or failures. | `GAUGE` | | <ul><li>`item`<ul><li>`instance_info_count`</li><li>`instance_info_fails`</li><li>`machine_cfg_count`</li><li>`machine_cfg_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_i8042`: <br> Metrics specific to the i8042 device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li><li>`read_count`</li><li>`reset_count`</li><li>`write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_latencies_us`: <br> Performance metrics related for the moment only to snapshots. | `GAUGE` | | <ul><li>`item`<ul><li>`diff_create_snapshot`</li><li>`full_create_snapshot`</li><li>`load_snapshot`</li><li>`pause_vm`</li><li>`resume_vm`</li><li>`vmm_diff_create_snapshot`</li><li>`vmm_full_create_snapshot`</li><li>`vmm_load_snapshot`</li><li>`vmm_pause_vm`</li><li>`vmm_resume_vm`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_logger`: <br> Metrics for the logging subsystem. | `GAUGE` | | <ul><li>`item`<ul><li>`log_fails`</li><li>`metrics_fails`</li><li>`missed_log_count`</li><li>`missed_metrics_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_mmds`: <br> Metrics for the MMDS functionality. | `GAUGE` | | <ul><li>`item`<ul><li>`connections_created`</li><li>`connections_destroyed`</li><li>`rx_accepted`</li><li>`rx_accepted_err`</li><li>`rx_accepted_unusual`</li><li>`rx_bad_eth`</li><li>`rx_count`</li><li>`tx_bytes`</li><li>`tx_count`</li><li>`tx_errors`</li><li>`tx_frames`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_patch_api_requests`: <br> Metrics specific to PATCH API Requests for counting user triggered actions and/or failures. | `GAUGE` | | <ul><li>`item`<ul><li>`drive_count`</li><li>`drive_fails`</li><li>`machine_cfg_count`</li><li>`machine_cfg_fails`</li><li>`network_count`</li><li>`network_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_put_api_requests`: <br> Metrics specific to PUT API Requests for counting user triggered actions and/or failures. | `GAUGE` | | <ul><li>`item`<ul><li>`actions_count`</li><li>`actions_fails`</li><li>`boot_source_count`</li><li>`boot_source_fails`</li><li>`drive_count`</li><li>`drive_fails`</li><li>`logger_count`</li><li>`logger_fails`</li><li>`machine_cfg_count`</li><li>`machine_cfg_fails`</li><li>`metrics_count`</li><li>`metrics_fails`</li><li>`network_count`</li><li>`network_fails`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_rtc`: <br> Metrics specific to the RTC device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_seccomp`: <br> Metrics for the seccomp filtering. | `GAUGE` | | <ul><li>`item`<ul><li>`num_faults`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_signals`: <br> Metrics related to signals. | `GAUGE` | | <ul><li>`item`<ul><li>`sigbus`</li><li>`sigsegv`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_uart`: <br> Metrics specific to the UART device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`flush_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li><li>`read_count`</li><li>`write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vcpu`: <br> Metrics specific to VCPUs' mode of functioning. | `GAUGE` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vmm`: <br> Metrics specific to the machine manager as a whole. | `GAUGE` | | <ul><li>`item`<ul><li>`device_events`</li><li>`panic_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
### Kata guest OS metrics
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_guest_cpu_time`: <br> Guest CPU stat. | `GAUGE` | | <ul><li>`cpu` (CPU no. and total for all CPUs)<ul><li>`0` (CPU 0)</li><li>`1` (CPU 1)</li><li>`total` (for all CPUs)</li></ul></li><li>`item` (Kernel/system statistics, from `/proc/stat`)<ul><li>`guest`</li><li>`guest_nice`</li><li>`idle`</li><li>`iowait`</li><li>`irq`</li><li>`nice`</li><li>`softirq`</li><li>`steal`</li><li>`system`</li><li>`user`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_guest_diskstat`: <br> Disks stat in system. | `GAUGE` | | <ul><li>`disk` (disk name)</li><li>`item` (see `/proc/diskstats`)<ul><li>`discards`</li><li>`discards_merged`</li><li>`flushes`</li><li>`in_progress`</li><li>`merged`</li><li>`reads`</li><li>`sectors_discarded`</li><li>`sectors_read`</li><li>`sectors_written`</li><li>`time_discarding`</li><li>`time_flushing`</li><li>`time_in_progress`</li><li>`time_reading`</li><li>`time_writing`</li><li>`weighted_time_in_progress`</li><li>`writes`</li><li>`writes_merged`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_guest_meminfo`: <br> Statistics about memory usage on the system. | `GAUGE` | | <ul><li>`item` (see `/proc/meminfo`)<ul><li>`active`</li><li>`active_anon`</li><li>`active_file`</li><li>`anon_hugepages`</li><li>`anon_pages`</li><li>`bounce`</li><li>`buffers`</li><li>`cached`</li><li>`cma_free`</li><li>`cma_total`</li><li>`commit_limit`</li><li>`committed_as`</li><li>`direct_map_1G`</li><li>`direct_map_2M`</li><li>`direct_map_4M`</li><li>`direct_map_4k`</li><li>`dirty`</li><li>`hardware_corrupted`</li><li>`high_free`</li><li>`high_total`</li><li>`hugepages_free`</li><li>`hugepages_rsvd`</li><li>`hugepages_surp`</li><li>`hugepages_total`</li><li>`hugepagesize`</li><li>`hugetlb`</li><li>`inactive`</li><li>`inactive_anon`</li><li>`inactive_file`</li><li>`k_reclaimable`</li><li>`kernel_stack`</li><li>`low_free`</li><li>`low_total`</li><li>`mapped`</li><li>`mem_available`</li><li>`mem_free`</li><li>`mem_total`</li><li>`mlocked`</li><li>`mmap_copy`</li><li>`nfs_unstable`</li><li>`page_tables`</li><li>`per_cpu`</li><li>`quicklists`</li><li>`s_reclaimable`</li><li>`s_unreclaim`</li><li>`shmem`</li><li>`shmem_hugepages`</li><li>`shmem_pmd_mapped`</li><li>`slab`</li><li>`swap_cached`</li><li>`swap_free`</li><li>`swap_total`</li><li>`unevictable`</li><li>`vmalloc_chunk`</li><li>`vmalloc_total`</li><li>`vmalloc_used`</li><li>`writeback`</li><li>`writeback_tmp`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_guest_netdev_stat`: <br> Guest net devices stats. | `GAUGE` | | <ul><li>`interface` (network device name)</li><li>`item` (see `/proc/net/dev`)<ul><li>`recv_bytes`</li><li>`recv_compressed`</li><li>`recv_drop`</li><li>`recv_errs`</li><li>`recv_fifo`</li><li>`recv_frame`</li><li>`recv_multicast`</li><li>`recv_packets`</li><li>`sent_bytes`</li><li>`sent_carrier`</li><li>`sent_colls`</li><li>`sent_compressed`</li><li>`sent_drop`</li><li>`sent_errs`</li><li>`sent_fifo`</li><li>`sent_packets`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
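The label layout in the table above (`cpu`, `item`, `sandbox_id`) can be read back from a scrape with a small parser. The following is a minimal sketch, assuming the metrics arrive in the Prometheus text exposition format; the sample values and the sandbox ID `sb-demo` are made up for illustration, and the parser ignores timestamps and escaped quotes in label values.

```python
import re
from collections import defaultdict

# Hypothetical sample scrape; values and sandbox ID are illustrative only.
SAMPLE = """\
# HELP kata_guest_cpu_time Guest CPU stat.
# TYPE kata_guest_cpu_time gauge
kata_guest_cpu_time{cpu="0",item="user",sandbox_id="sb-demo"} 12.5
kata_guest_cpu_time{cpu="0",item="system",sandbox_id="sb-demo"} 3.25
kata_guest_cpu_time{cpu="total",item="user",sandbox_id="sb-demo"} 20.0
"""

# name{label="value",...} value   (no timestamp support in this sketch)
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # unlabelled or unsupported line, skipped in this sketch
        labels = dict(LABEL_RE.findall(match.group("labels")))
        yield match.group("name"), labels, float(match.group("value"))


def cpu_time_by_item(text, cpu="total"):
    """Group kata_guest_cpu_time values by sandbox_id, then by item."""
    grouped = defaultdict(dict)
    for name, labels, value in parse_samples(text):
        if name == "kata_guest_cpu_time" and labels.get("cpu") == cpu:
            grouped[labels["sandbox_id"]][labels["item"]] = value
    return dict(grouped)


if __name__ == "__main__":
    print(cpu_time_by_item(SAMPLE))
    print(cpu_time_by_item(SAMPLE, cpu="0"))
```

The same grouping applies to the other `item`-labelled metrics in these tables by changing the metric name and the extra label (`disk`, `interface`).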
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_monitor_go_gc_duration_seconds`: <br> A summary of the pause duration of garbage collection cycles. | `SUMMARY` | `seconds` | | 2.0.0 |
| `kata_monitor_go_goroutines`: <br> Number of goroutines that currently exist. | `GAUGE` | | | 2.0.0 |
| `kata_monitor_go_info`: <br> Information about the Go environment. | `GAUGE` | | <ul><li>`version` (golang version)<ul><li>`go1.13.9` (environment dependent variable)</li></ul></li></ul> | 2.0.0 |
| `kata_monitor_go_memstats_alloc_bytes`: <br> Number of bytes allocated and still in use. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_alloc_bytes_total`: <br> Total number of bytes allocated, even if freed. | `COUNTER` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_buck_hash_sys_bytes`: <br> Number of bytes used by the profiling bucket hash table. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_frees_total`: <br> Total number of frees. | `COUNTER` | | | 2.0.0 |
| `kata_monitor_go_memstats_gc_cpu_fraction`: <br> The fraction of this program's available CPU time used by the GC since the program started. | `GAUGE` | | | 2.0.0 |
| `kata_monitor_go_memstats_gc_sys_bytes`: <br> Number of bytes used for garbage collection system metadata. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_heap_alloc_bytes`: <br> Number of heap bytes allocated and still in use. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_heap_idle_bytes`: <br> Number of heap bytes waiting to be used. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_heap_inuse_bytes`: <br> Number of heap bytes that are in use. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_heap_objects`: <br> Number of allocated objects. | `GAUGE` | | | 2.0.0 |
| `kata_monitor_go_memstats_heap_released_bytes`: <br> Number of heap bytes released to OS. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_heap_sys_bytes`: <br> Number of heap bytes obtained from system. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_last_gc_time_seconds`: <br> Number of seconds since 1970 of last garbage collection. | `GAUGE` | `seconds` | | 2.0.0 |
| `kata_monitor_go_memstats_lookups_total`: <br> Total number of pointer lookups. | `COUNTER` | | | 2.0.0 |
| `kata_monitor_go_memstats_mallocs_total`: <br> Total number of `mallocs`. | `COUNTER` | | | 2.0.0 |
| `kata_monitor_go_memstats_mcache_inuse_bytes`: <br> Number of bytes in use by `mcache` structures. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_mcache_sys_bytes`: <br> Number of bytes used for `mcache` structures obtained from system. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_mspan_inuse_bytes`: <br> Number of bytes in use by `mspan` structures. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_mspan_sys_bytes`: <br> Number of bytes used for `mspan` structures obtained from system. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_next_gc_bytes`: <br> Number of heap bytes when next garbage collection will take place. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_other_sys_bytes`: <br> Number of bytes used for other system allocations. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_stack_inuse_bytes`: <br> Number of bytes in use by the stack allocator. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_stack_sys_bytes`: <br> Number of bytes obtained from system for stack allocator. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_memstats_sys_bytes`: <br> Number of bytes obtained from system. | `GAUGE` | `bytes` | | 2.0.0 |
| `kata_monitor_go_threads`: <br> Number of OS threads created. | `GAUGE` | | | 2.0.0 |
| `kata_monitor_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | | 2.0.0 |
| `kata_monitor_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | | 2.0.0 |
| `kata_monitor_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | | 2.0.0 |
| Metric name | Type | Units | Labels | Introduced in Kata version |
|---|---|---|---|---|
| `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StartTracingRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.StopTracingRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_fds`: <br> Kata containerd shim v2 open FDs. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_gc_duration_seconds`: <br> A summary of the pause duration of garbage collection cycles. | `SUMMARY` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_goroutines`: <br> Number of goroutines that currently exist. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_info`: <br> Information about the Go environment. | `GAUGE` | | <ul><li>`sandbox_id`</li><li>`version` (golang version)<ul><li>`go1.13.9` (environment dependent variable)</li></ul></li></ul> | 2.0.0 |
| `kata_shim_go_memstats_alloc_bytes`: <br> Number of bytes allocated and still in use. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_alloc_bytes_total`: <br> Total number of bytes allocated, even if freed. | `COUNTER` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_buck_hash_sys_bytes`: <br> Number of bytes used by the profiling bucket hash table. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_frees_total`: <br> Total number of frees. | `COUNTER` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_gc_cpu_fraction`: <br> The fraction of this program's available CPU time used by the GC since the program started. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_gc_sys_bytes`: <br> Number of bytes used for garbage collection system metadata. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_heap_alloc_bytes`: <br> Number of heap bytes allocated and still in use. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_heap_idle_bytes`: <br> Number of heap bytes waiting to be used. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_heap_inuse_bytes`: <br> Number of heap bytes that are in use. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_heap_objects`: <br> Number of allocated objects. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_heap_released_bytes`: <br> Number of heap bytes released to OS. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_heap_sys_bytes`: <br> Number of heap bytes obtained from system. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_last_gc_time_seconds`: <br> Number of seconds since 1970 of last garbage collection. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_lookups_total`: <br> Total number of pointer lookups. | `COUNTER` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_mallocs_total`: <br> Total number of `mallocs`. | `COUNTER` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_mcache_inuse_bytes`: <br> Number of bytes in use by `mcache` structures. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_mcache_sys_bytes`: <br> Number of bytes used for `mcache` structures obtained from system. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_mspan_inuse_bytes`: <br> Number of bytes in use by `mspan` structures. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_mspan_sys_bytes`: <br> Number of bytes used for `mspan` structures obtained from system. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_next_gc_bytes`: <br> Number of heap bytes when next garbage collection will take place. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_other_sys_bytes`: <br> Number of bytes used for other system allocations. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_stack_inuse_bytes`: <br> Number of bytes in use by the stack allocator. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_stack_sys_bytes`: <br> Number of bytes obtained from system for stack allocator. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_memstats_sys_bytes`: <br> Number of bytes obtained from system. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_go_threads`: <br> Number of OS threads created. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_io_stat`: <br> Kata containerd shim v2 process IO statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/io`)<ul><li>`cancelledwritebytes`</li><li>`rchar`</li><li>`readbytes`</li><li>`syscr`</li><li>`syscw`</li><li>`wchar`</li><li>`writebytes`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_pod_overhead_cpu`: <br> Kata Pod overhead for CPU resources (percent). | `GAUGE` | `percent` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_pod_overhead_memory_in_bytes`: <br> Kata Pod overhead for memory resources (bytes). | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_proc_stat`: <br> Kata containerd shim v2 process statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/stat`)<ul><li>`cstime`</li><li>`cutime`</li><li>`stime`</li><li>`utime`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_proc_status`: <br> Kata containerd shim v2 process status. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/status`)<ul><li>`hugetlbpages`</li><li>`nonvoluntary_ctxt_switches`</li><li>`rssanon`</li><li>`rssfile`</li><li>`rssshmem`</li><li>`vmdata`</li><li>`vmexe`</li><li>`vmhwm`</li><li>`vmlck`</li><li>`vmlib`</li><li>`vmpeak`</li><li>`vmpin`</li><li>`vmpmd`</li><li>`vmpte`</li><li>`vmrss`</li><li>`vmsize`</li><li>`vmstk`</li><li>`vmswap`</li><li>`voluntary_ctxt_switches`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_process_start_time_seconds`: <br> Start time of the process since `unix` epoch in seconds. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_shim_process_virtual_memory_max_bytes`: <br> Maximum amount of virtual memory available in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> | 2.0.0 |
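The tables above distinguish `COUNTER` metrics (for example `kata_shim_process_cpu_seconds_total`) from `GAUGE` metrics. A counter only increases, so it is usually read as a rate between two scrapes rather than as an absolute value. The sketch below shows one way to compute such a rate; it is similar in spirit to, but not a reimplementation of, what a monitoring backend does, and the sample numbers in the usage lines are hypothetical.

```python
def counter_rate(prev, curr):
    """Per-second rate between two (timestamp_seconds, value) counter samples.

    Returns None when the interval is not positive. If the value decreased,
    the counter is assumed to have reset (e.g. the process restarted), and the
    current value is taken as the increase since the reset.
    """
    t0, v0 = prev
    t1, v1 = curr
    dt = t1 - t0
    if dt <= 0:
        return None
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / dt


if __name__ == "__main__":
    # Hypothetical scrapes of a CPU-seconds counter taken 10 seconds apart.
    print(counter_rate((0, 10.0), (10, 12.5)))
    # The same interval, but the counter restarted from zero in between.
    print(counter_rate((0, 10.0), (10, 2.0)))
```

Gauges such as `kata_shim_fds` need no such treatment, because each sample is already a point-in-time value.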
@@ -3,50 +3,62 @@ To fulfill the [Kata design requirements](kata-design-requirements.md), and base
- Sandbox based top API
- Storage and network hotplug API
- Plugin frameworks for external proprietary Kata runtime extensions
- Built-in shim and proxy types and capabilities
## Sandbox Based API
### Sandbox Management API
|Name|Description|
|---|---|
|`CreateSandbox(SandboxConfig)`| Create and start a sandbox, and return the sandbox structure.|
|`FetchSandbox(ID)`| Connect to an existing sandbox and return the sandbox structure.|
|`ListSandboxes()`| List all existing sandboxes with status. |
|`CreateSandbox(SandboxConfig, Factory)`| Create a sandbox and its containers, based on `SandboxConfig` and `Factory`. Return the `Sandbox` structure, but do not start them.|
### Sandbox Operation API
|Name|Description|
|---|---|
|`sandbox.Pause()`| Pause the sandbox.|
|`sandbox.Resume()`| Resume the paused sandbox.|
|`sandbox.Release()`| Release a sandbox data structure, close connections to the agent, and quit any goroutines associated with the sandbox. Mostly used for daemon restart.|
|`sandbox.Delete()`| Destroy the sandbox and remove all persistent metadata.|
|`sandbox.Status()`| Get the status of the sandbox and containers.|
|`sandbox.Delete()`| Shut down the VM in which the sandbox runs, then destroy the sandbox and remove all persistent metadata.|
|`sandbox.Monitor()`| Return a context handler for the caller to monitor sandbox callbacks such as error termination.|
|`sandbox.CreateContainer()`| Create new container in the sandbox.|
|`sandbox.DeleteContainer()`| Delete a container from the sandbox.|
|`sandbox.StartContainer()`| Start a container in the sandbox.|
|`sandbox.StatusContainer()`| Get the status of a container in the sandbox.|
|`sandbox.EnterContainer()`| Run a new process in a container.|
|`sandbox.WaitProcess()`| Wait on a process to terminate.|
|`sandbox.Release()`| Release a sandbox data structure, close connections to the agent, and quit any goroutines associated with the Sandbox. Mostly used for daemon restart.|
|`sandbox.Start()`| Start a sandbox and the containers making the sandbox.|
|`sandbox.Stats()`| Get the stats of a running sandbox, return a `SandboxStats` structure.|
|`sandbox.Status()`| Get the status of the sandbox and containers, return a `SandboxStatus` structure.|
|`sandbox.Stop(force)`| Stop a sandbox and destroy the containers in the sandbox. When `force` is true, ignore guest-related stop failures.|
|`sandbox.CreateContainer(contConfig)`| Create new container in the sandbox with the `ContainerConfig` parameter. It will add new container config to `sandbox.config.Containers`.|
|`sandbox.DeleteContainer(containerID)`| Delete a container from the sandbox by `containerID`, return a `Container` structure.|
|`sandbox.EnterContainer(containerID, cmd)`| Run a new process in a container, executing customer's `types.Cmd` command.|
|`sandbox.KillContainer(containerID, signal, all)`| Signal a container in the sandbox by the `containerID`.|
|`sandbox.PauseContainer(containerID)`| Pause a running container in the sandbox by the `containerID`.|
|`sandbox.ProcessListContainer(containerID, options)`| List every process running inside a specific container in the sandbox, return a `ProcessList` structure.|
|`sandbox.ResumeContainer(containerID)`| Resume a paused container in the sandbox by the `containerID`.|
|`sandbox.StartContainer(containerID)`| Start a container in the sandbox by the `containerID`.|
|`sandbox.StatsContainer(containerID)`| Get the stats of a running container, return a `ContainerStats` structure.|
|`sandbox.StatusContainer(containerID)`| Get the status of a container in the sandbox, return a `ContainerStatus` structure.|
|`sandbox.StopContainer(containerID, force)`| Stop a container in the sandbox by the `containerID`.|
|`sandbox.UpdateContainer(containerID, resources)`| Update a running container in the sandbox.|
|`sandbox.WaitProcess(containerID, processID)`| Wait on a process to terminate.|
### Sandbox Hotplug API
|Name|Description|
|---|---|
|`sandbox.AddDevice()`| Add new storage device to the sandbox.|
|`sandbox.AddInterface()`| Add new NIC to the sandbox.|
|`sandbox.RemoveInterface()`| Remove a NIC from the sandbox.|
|`sandbox.ListInterfaces()`| List all NICs and their configurations in the sandbox.|
|`sandbox.UpdateRoutes()`| Update the sandbox route table (e.g. for portmapping support).|
|`sandbox.ListRoutes()`| List the sandbox route table.|
|`sandbox.AddDevice(info)`| Add new storage device `DeviceInfo` to the sandbox, return a `Device` structure.|
|`sandbox.AddInterface(inf)`| Add new NIC to the sandbox.|
|`sandbox.RemoveInterface(inf)`| Remove a NIC from the sandbox.|
|`sandbox.ListInterfaces()`| List all NICs and their configurations in the sandbox, return a `pbTypes.Interface` list.|
|`sandbox.UpdateRoutes(routes)`| Update the sandbox route table (e.g. for portmapping support), return a `pbTypes.Route` list.|
|`sandbox.ListRoutes()`| List the sandbox route table, return a `pbTypes.Route` list.|
### Sandbox Relay API
|Name|Description|
|---|---|
|`sandbox.WinsizeProcess(containerID, processID, Height, Width)`|Relay TTY resize request to a process.|
|`sandbox.WinsizeProcess(containerID, processID, Height, Width)`|Relay TTY resize request to a process.|
|`sandbox.SignalProcess(containerID, processID, signalID, signalALL)`| Relay a signal to a process or all processes in a container.|
|`sandbox.IOStream(containerID, processID)`| Relay a process stdio. Return stdin/stdout/stderr pipes to the process stdin/stdout/stderr streams.|
### Sandbox Monitor API
|Name|Description|
|---|---|
|`sandbox.GetOOMEvent()`| Monitor the OOM events that occur in the sandbox.|
|`sandbox.UpdateRuntimeMetrics()`| Update the `shim/hypervisor` metrics of the running sandbox.|
|`sandbox.GetAgentMetrics()`| Get metrics of the agent and the guest in the running sandbox.|
## Plugin framework for external proprietary Kata runtime extensions
@@ -22,10 +22,10 @@ the multiple hypervisors and virtual machine monitors that Kata supports.
## Mapping container concepts to virtual machine technologies
A typical deployment of Kata Containers will be in Kubernetes by way of a Container Runtime Interface (CRI) implementation. On every node,
Kubelet will interact with a CRI implementor (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI based runtime).
Kubelet will interact with a CRI implementer (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI based runtime).
The CRI API, as defined at the [Kubernetes CRI-API repo](https://github.com/kubernetes/cri-api/), implies a few constructs being supported by the
CRI implementation, and ultimately in Kata Containers. In order to support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI-implementor, Kata must provide the following constructs:
CRI implementation, and ultimately in Kata Containers. In order to support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI-implementer, Kata must provide the following constructs:

@@ -41,14 +41,9 @@ Each hypervisor or VMM varies on how or if it handles each of these.
## Kata Containers Hypervisor and VMM support
Kata Containers is designed to support multiple virtual machine monitors (VMMs) and hypervisors.
Cloud Hypervisor, based on [rust-VMM](https://github.com/rust-vmm), is designed to have a lighter footprint and attack surface. For Kata Containers,
relative to Firecracker, the Cloud Hypervisor configuration provides better compatibility at the expense of exposing additional devices: file system
sharing and direct device assignment. As of the 1.10 release of Kata Containers, Cloud Hypervisor does not support device hotplug, and as a result
does not support updating container resources after boot, or utilizing block based volumes. While Cloud Hypervisor does support VFIO, Kata is still adding
this support. As of 1.10, Kata does not support block based volumes or direct device assignment. See [Cloud Hypervisor device support documentation](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/docs/device_model.md)
for more details on Cloud Hypervisor.
[Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor), based
on [rust-vmm](https://github.com/rust-vmm), is designed to have a
lighter footprint and smaller attack surface for running modern cloud
workloads. Kata Containers with Cloud
Hypervisor provides mostly complete compatibility with Kubernetes
comparable to the QEMU configuration. As of the 1.12 and 2.0.0 release
of Kata Containers, the Cloud Hypervisor configuration supports both CPU
and memory resize, device hotplug (disk and VFIO), file-system sharing through virtio-fs,
block-based volumes, booting from VM images backed by pmem device, and
fine-grained seccomp filters for each VMM thread (e.g. all virtio
- [Run Kata containers with `crictl`](run-kata-with-crictl.md)
- [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
- [How to use Kata Containers and Containerd](containerd-kata.md)
- [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
@@ -13,8 +14,17 @@
- [How to import Kata Containers logs into Fluentd](how-to-import-kata-logs-with-fluentd.md)
## Hypervisors Integration
Currently supported hypervisors with Kata Containers include:
- `qemu`
- `cloud-hypervisor`
- `firecracker`
- `ACRN`
While `qemu` and `cloud-hypervisor` work out of the box with installation of Kata,
some additional configuration is needed in case of `firecracker` and `ACRN`.
Refer to the following guides for additional configuration steps:
- [Kata Containers with Firecracker](https://github.com/kata-containers/documentation/wiki/Initial-release-of-Kata-Containers-with-Firecracker-support)
- [Kata Containers with NEMU](how-to-use-kata-containers-with-nemu.md)
- [Kata Containers with ACRN Hypervisor](how-to-use-kata-containers-with-acrn.md)
## Advanced Topics
@@ -26,3 +36,4 @@
- [How to load kernel modules in Kata Containers](how-to-load-kernel-modules-with-kata.md)
- [How to use Kata Containers with `virtio-mem`](how-to-use-virtio-mem-with-kata.md)
- [How to set sandbox Kata Containers configurations with pod annotations](how-to-set-sandbox-config-kata.md)
- [How to monitor Kata Containers in K8s](how-to-set-prometheus-in-k8s.md)
@@ -57,7 +57,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](https://github.com/kata-containers/runtime/tree/master/containerd-shim-v2)
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/containerd-shim-v2)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
@@ -72,7 +72,7 @@ is implemented in Kata Containers v1.5.0.
### Install Kata Containers
Follow the instructions to [install Kata Containers](https://github.com/kata-containers/documentation/blob/master/install/README.md).
Follow the instructions to [install Kata Containers](../install/README.md).
### Install containerd with CRI plugin
@@ -193,13 +193,16 @@ From Containerd v1.2.4 and Kata v1.6.0, there is a new runtime option supported,
```toml
[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true
[plugins.cri.containerd.runtimes.kata.options]
ConfigPath = "/etc/kata-containers/config.toml"
```
`privileged_without_host_devices` tells containerd that a privileged Kata container should not have direct access to all host devices. If unset, containerd will pass all host devices to the Kata container, which may cause security issues.
The `ConfigPath` option is optional. If you do not specify it, shimv2 first tries to get the configuration file from the environment variable `KATA_CONF_FILE`. If neither is set, shimv2 will use the default Kata configuration file paths (`/etc/kata-containers/configuration.toml` and `/usr/share/defaults/kata-containers/configuration.toml`).
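The lookup order described above can be sketched as follows. This is only an illustration of the precedence (explicit `ConfigPath`, then `KATA_CONF_FILE`, then the default paths), not shimv2's actual code; the function name `resolve_config_candidates` is hypothetical, and the sketch does not check whether the files exist.

```python
import os

# Default locations named in the text above, in the order they are listed.
DEFAULT_CONFIG_PATHS = (
    "/etc/kata-containers/configuration.toml",
    "/usr/share/defaults/kata-containers/configuration.toml",
)


def resolve_config_candidates(config_path=None, environ=None):
    """Return the configuration file paths to try, in priority order.

    config_path stands in for the `ConfigPath` runtime option; environ
    defaults to the process environment and can be replaced for testing.
    """
    environ = os.environ if environ is None else environ
    if config_path:
        return [config_path]  # explicit runtime option wins
    env_path = environ.get("KATA_CONF_FILE")
    if env_path:
        return [env_path]  # otherwise fall back to the environment variable
    return list(DEFAULT_CONFIG_PATHS)  # finally, the default paths


if __name__ == "__main__":
    print(resolve_config_candidates("/etc/kata-containers/config.toml"))
    print(resolve_config_candidates(environ={"KATA_CONF_FILE": "/tmp/kata.toml"}))
    print(resolve_config_candidates(environ={}))
```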
If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0 and also want to specify a configuration file, you can use the following workaround, since the shimv2 accepts an environment variable, `KATA_CONF_FILE` for the configuration file path. Then, you can create a
If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0 and also want to specify a configuration file, you can use the following workaround, since the shimv2 accepts an environment variable, `KATA_CONF_FILE` for the configuration file path. Then, you can create a
> **Warning**: This how-to is for evaluation purposes only; you **SHOULD NOT** run it in production with these configurations.
## Introduction
If you are running Kata Containers in a Kubernetes cluster, the best way to run `kata-monitor` is as a Kubernetes-native `DaemonSet`. `kata-monitor` will then run on the desired Kubernetes nodes, and no extra steps are needed when new nodes join the cluster.
Prometheus also supports Kubernetes service discovery, which can find scrape targets dynamically without explicitly setting `kata-monitor`'s metric endpoints.
## Pre-requisites
You must have a running Kubernetes cluster. If you do not have one, [install a Kubernetes cluster](https://kubernetes.io/docs/setup/) first.
You should also ensure that `kubectl` is working correctly.
> **Note**: More information about Kubernetes integrations:
> - [Run Kata Containers with Kubernetes](run-kata-with-k8s.md)
> - [How to use Kata Containers and Containerd](containerd-kata.md)
> - [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](how-to-use-k8s-with-cri-containerd-and-kata.md)
## Configure Prometheus
Start Prometheus by utilizing our sample manifest:
This will create a new namespace, `prometheus`, and create the following resources:
* `ClusterRole`, `ServiceAccount`, `ClusterRoleBinding` to let Prometheus access the Kubernetes API server.
* `ConfigMap` that contains the minimum configuration needed for Prometheus to run Kubernetes service discovery.
* `Deployment` that runs Prometheus in a `Pod`.
* `Service` of `type` `NodePort` (`30909` in this how-to), so that Prometheus can be accessed through `<hostIP>:30909`. In a production environment, this may instead be a `LoadBalancer` `Service` or an `Ingress` resource.
After the Prometheus server is running, run `curl -s http://<hostIP>:30909/metrics`. If Prometheus is working correctly, you will get a response like this:
```
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
```
This will create a new namespace `kata-system` and a `daemonset` in it.
Once the `daemonset` is running, Prometheus should discover `kata-monitor` as a target. You can open `http://<hostIP>:30909/service-discovery` and find `kubernetes-pods` under the `Service Discovery` list.
This will create a deployment and a service for Grafana in the `prometheus` namespace.
After the Grafana deployment is ready, you can open `http://<hostIP>:30000/` to access the Grafana server. For Grafana 7.0.5, the default user/password is `admin/admin`. You can modify the default account and adjust other security settings by editing the [Grafana configuration](https://grafana.com/docs/grafana/latest/installation/configuration/#security).
To use Grafana to show data from Prometheus, you must create a Prometheus `datasource` and a dashboard.
### Create `datasource`
Open `http://<hostIP>:30000/datasources/new` in your browser and select Prometheus from the list of time series databases.
Normally you only need to set `URL` to `http://<hostIP>:30909` and leave the name at its default, `Prometheus`.
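As an alternative to the browser form, the same two settings (name and URL) can be expressed as a JSON body. The sketch below only builds that body; it assumes Grafana's HTTP API data source endpoint (`POST /api/datasources`) and its `name`, `type`, `url`, and `access` fields, and it uses `192.0.2.10` as a placeholder host IP. Check the Grafana API documentation for your version before relying on it.

```python
import json


def prometheus_datasource_payload(host_ip, node_port=30909, name="Prometheus"):
    """Build a data source definition pointing Grafana at Prometheus.

    host_ip is a placeholder for the node address; 30909 is the NodePort
    used for Prometheus in this how-to.
    """
    return {
        "name": name,
        "type": "prometheus",
        "url": f"http://{host_ip}:{node_port}",
        "access": "proxy",  # Grafana's backend performs the queries
    }


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a stand-in.
    print(json.dumps(prometheus_datasource_payload("192.0.2.10"), indent=2))
```

The printed JSON could then be sent with an authenticated request to the Grafana server at `http://<hostIP>:30000`.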
### Import dashboard
A [sample dashboard](data/dashboard.json) for Kata Containers metrics is provided which can be imported to Grafana for evaluation.
You can import this dashboard using the Grafana UI, or using the `curl` command in a console.
Kata Containers gives users the freedom to customize at a per-pod level by setting
a wide range of Kata-specific annotations in the pod specification.
Some annotations may be [restricted](#restricted-annotations) by the
configuration file for security reasons, notably annotations that could lead the
runtime to execute programs on the host. Such annotations are marked with _(R)_ in
the tables below.
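To make the per-pod customization concrete, the sketch below builds a pod manifest that carries Kata annotations. The annotation keys are taken from the tables in this document; the pod name, image, annotation values, and the `kata` runtime class name are assumptions for illustration. Kubernetes annotation values are strings, so the helper converts numbers and booleans.

```python
import json


def _annotation_value(value):
    """Render a Python value as an annotation string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def kata_pod_manifest(name, image, annotations, runtime_class="kata"):
    """Return a pod manifest (as a dict) with Kata annotations attached.

    runtime_class is an assumed RuntimeClass name; use the one defined in
    your cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                key: _annotation_value(value)
                for key, value in annotations.items()
            },
        },
        "spec": {
            "runtimeClassName": runtime_class,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    manifest = kata_pod_manifest(
        "kata-demo",  # hypothetical pod name
        "busybox",    # hypothetical image
        {
            "io.katacontainers.config.hypervisor.default_vcpus": 2,
            "io.katacontainers.config.hypervisor.default_memory": 2048,
            "io.katacontainers.config.agent.enable_tracing": False,
        },
    )
    # JSON is valid YAML, so the output can be passed to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Whether a given annotation is honored depends on the runtime configuration, since restricted annotations must be explicitly allowed.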
# Kata Configuration Annotations
There are several kinds of Kata configurations and they are listed below.
@@ -26,6 +31,7 @@ There are several kinds of Kata configurations and they are listed below.
| Key | Value Type | Comments |
|-------| ----- | ----- |
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.container_pipe_size` | uint32 | specify the size of the std(in/out) pipes created for containers |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |
@@ -38,17 +44,24 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.block_device_cache_noflush` | `boolean` | Denotes whether flush requests for the device are ignored |
| `io.katacontainers.config.hypervisor.block_device_cache_set` | `boolean` | cache-related options will be set to block devices or not |
| `io.katacontainers.config.hypervisor.block_device_driver` | string | the driver to be used for block device, valid values are `virtio-blk`, `virtio-scsi`, `nvdimm`|
| `io.katacontainers.config.hypervisor.cpu_features` | `string` | Comma-separated list of CPU features to pass to the CPU (QEMU) |
| `io.katacontainers.config.hypervisor.ctlpath` (R) | `string` | Path to the `acrnctl` binary for the ACRN hypervisor |
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | uint32| the default vCPUs assigned for a VM by the hypervisor |
| `io.katacontainers.config.hypervisor.disable_block_device_use` | `boolean` | disallow a block device from being used |
| `io.katacontainers.config.hypervisor.disable_image_nvdimm` | `boolean` | specify if a `nvdimm` device should be used as rootfs for the guest (QEMU) |
| `io.katacontainers.config.hypervisor.disable_vhost_net` | `boolean` | specify if `vhost-net` is not available on the host |
| `io.katacontainers.config.hypervisor.enable_hugepages` | `boolean` | if the memory should be `pre-allocated` from huge pages |
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | enable pre-allocation of the VM memory by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_swap` | `boolean` | enable swap of VM memory |
| `io.katacontainers.config.hypervisor.entropy_source` | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.kernel` | string | the kernel used to boot the container VM |
@@ -69,23 +82,26 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.memory_slots` | uint32| the memory slots assigned to the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.msize_9p` | uint32 | the `msize` for 9p shares |
| `io.katacontainers.config.hypervisor.path` | string | the hypervisor that will run the container VM |
| `io.katacontainers.config.hypervisor.pcie_root_port` | uint32 | specify the number of PCIe Root Port devices. The PCIe Root Port device is used to hot-plug a PCIe device (QEMU) |
| `io.katacontainers.config.hypervisor.shared_fs` | string | the shared file system type, either `virtio-9p` or `virtio-fs` |
| `io.katacontainers.config.hypervisor.use_vsock` | `boolean` | specify use of `vsock` for agent communication |
| `io.katacontainers.config.hypervisor.vhost_user_store_path` (R) | `string` | specify the directory path where vhost-user devices related folders, sockets and device nodes should be (QEMU) |
@@ -49,7 +49,7 @@ This document requires the presence of the ACRN hypervisor and Kata Containers o
$ sudo sed -i "s/$kernel_img/bzImage/g" /mnt/loader/entries/$conf_file
$ sync && sudo umount /mnt && sudo reboot
```
- Kata Containers installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md) steps.
- Kata Containers installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](../Developer-Guide.md) steps.
> **Note:** Create rootfs image and not initrd image.
@@ -82,7 +82,7 @@ $ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```
4. Configure [Docker](https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#update-the-docker-systemd-unit-file) to use `kata-runtime`.
4. Configure [Docker](../Developer-Guide.md#update-the-docker-systemd-unit-file) to use `kata-runtime`.
* [Configure Kata Containers](#configure-kata-containers)
Kata Containers relies by default on the QEMU hypervisor in order to spawn the virtual machines running containers. [NEMU](https://github.com/intel/nemu) is a fork of QEMU that:
- Reduces the number of lines of code.
- Removes all legacy devices.
- Reduces the emulation as far as possible.
## Introduction
This document describes how to run Kata Containers with NEMU, first by explaining how to download, build and install it. Then it walks through the steps needed to update your Kata Containers configuration in order to run with NEMU.
## Pre-requisites
This document requires Kata Containers to be [installed](https://github.com/kata-containers/documentation/blob/master/install/README.md) on your system.
Also, it's worth noting that NEMU only supports the `x86_64` and `aarch64` architectures.
> **Note:** The branch `experiment/automatic-removal` is a branch published by Jenkins after it has applied the automatic removal script to the `topic/virt-x86` branch. The purpose of this code removal is to reduce the source tree by removing files not used by NEMU.
After those commands have successfully returned, you will find the NEMU binary at `$HOME/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt` (__x86__), or `$HOME/build-aarch64/aarch64-softmmu/qemu-system-aarch64` (__ARM__).
You also need the `OVMF` firmware in order to boot the virtual machine's kernel. It can currently be found at this [location](https://github.com/intel/ovmf-virt/releases).
> **Note:** The OVMF firmware will be located at this temporary location until the changes can be pushed upstream.
## Configure Kata Containers
All you need from this section is to modify the configuration file `/usr/share/defaults/kata-containers/configuration.toml` to specify the options related to the hypervisor.
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
@@ -31,7 +31,7 @@
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
-firmware = ""
+firmware = "/usr/share/nemu/OVMF.fd"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
```
As you can see from the snippet above, all you need to change is:
- The path to the hypervisor binary, `/home/foo/build-x86_64_virt/x86_64_virt-softmmu/qemu-system-x86_64_virt` in this example.
- The machine type from `pc` to `virt`.
- The path to the firmware binary, `/usr/share/nemu/OVMF.fd` in this example.
Once you have saved those modifications, you can start a new container:
```bash
$ docker run --runtime=kata-runtime -it busybox
```
And you will be able to verify this new container is running with the NEMU hypervisor by looking for the hypervisor path and the machine type from the `qemu` process running on your system:
@@ -13,7 +13,7 @@ As of the 1.7 release of Kata Containers, [9pfs](https://www.kernel.org/doc/Docu
To help address these limitations, [virtio-fs](https://virtio-fs.gitlab.io/) has been developed. virtio-fs is a shared file system that lets virtual machines access a directory tree on the host. In Kata Containers, virtio-fs can be used to share container volumes, secrets, config-maps, configuration files (hostname, hosts, `resolv.conf`) and the container rootfs on the host with the guest. virtio-fs provides significant performance and POSIX compliance improvements compared to 9pfs.
Enabling of virtio-fs requires changes in the guest kernel as well as the VMM. For Kata Containers, experimental virtio-fs support is enabled through the [NEMU VMM](https://github.com/intel/nemu).
Enabling of virtio-fs requires changes in the guest kernel as well as the VMM. For Kata Containers, experimental virtio-fs support is enabled through `qemu` and `cloud-hypervisor` VMMs.
**Note: virtio-fs support is experimental in the 1.7 release of Kata Containers. Work is underway to improve stability, performance and upstream integration. This is available for early preview - use at your own risk**
@@ -21,31 +21,41 @@ This document describes how to get Kata Containers to work with virtio-fs.
## Pre-requisites
*Before Kata 1.8 this feature required the host to have hugepages support enabled. Enable this with the `sysctl vm.nr_hugepages=1024` command on the host.
Before Kata 1.8 this feature required the host to have hugepages support enabled. Enable this with the `sysctl vm.nr_hugepages=1024` command on the host. In later versions of Kata, virtio-fs leverages `/dev/shm` as the shared memory backend. The default size of `/dev/shm` on a system is typically half of the total system memory. This can pose a physical limit to the maximum number of pods that can be launched with virtio-fs. This can be overcome by increasing the size of `/dev/shm` as shown below:
```bash
$ mount -o remount,size=${desired_shm_size} /dev/shm
```
## Install Kata Containers with virtio-fs support
The Kata Containers NEMU configuration, the NEMU VMM and the `virtiofs` daemon are available in the [Kata Container release](https://github.com/kata-containers/runtime/releases) artifacts starting with the 1.7 release. While the feature is experimental, distribution packages are not supported, but installation is available through [`kata-deploy`](https://github.com/kata-containers/packaging/tree/master/kata-deploy).
The Kata Containers `qemu` configuration with virtio-fs and the `virtiofs` daemon are available in the [Kata Container release](https://github.com/kata-containers/runtime/releases) artifacts starting with the 1.9 release. Installation is available through [distribution packages](https://github.com/kata-containers/documentation/blob/master/install/README.md#supported-distributions) as well as through [`kata-deploy`](https://github.com/kata-containers/packaging/tree/master/kata-deploy).
Install the latest release of Kata as follows:
**Note: Support for virtio-fs was first introduced in the `NEMU` hypervisor in the Kata 1.8 release. This hypervisor has been deprecated.**
Install the latest release of Kata with `kata-deploy` as follows:
This will place the Kata release artifacts in `/opt/kata`, and update Docker's configuration to include a runtime target, `kata-nemu`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/packaging/tree/master/kata-deploy#kubernetes-quick-start).
This will place the Kata release artifacts in `/opt/kata`, and update Docker's configuration to include a runtime target, `kata-qemu-virtiofs`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/packaging/tree/master/kata-deploy#kubernetes-quick-start).
## Run a Kata Container utilizing virtio-fs
Once installed, start a new container, utilizing NEMU + `virtiofs`:
Once installed, start a new container, utilizing `qemu` + `virtiofs`:
```bash
$ docker run --runtime=kata-nemu -it busybox
$ docker run --runtime=kata-qemu-virtiofs -it busybox
```
Verify the new container is running with the NEMU hypervisor as well as using `virtiofsd`. To do this look for the hypervisor path and the `virtiofs` daemon process on the host:
Verify the new container is running with the `qemu` hypervisor as well as using `virtiofsd`. To do this look for the hypervisor path and the `virtiofs` daemon process on the host:
* [Check `redis` server is working](#check-redis-server-is-working)
## What's `cri-tools`
[`cri-tools`](https://github.com/kubernetes-sigs/cri-tools) provides debugging and validation tools for Kubelet Container Runtime Interface (CRI).
`cri-tools` includes two tools: `crictl` and `critest`. `crictl` is the CLI for Kubelet CRI. This document shows how to use `crictl` to run Pods in Kata containers.
> **Note:** `cri-tools` is intended only for debugging and validation purposes. Do not use it to run production workloads.
> **Note:** For how to install and configure `cri-tools` with CRI runtimes like `containerd` or CRI-O, please also refer to other [howtos](./README.md).
## Use `crictl` run Pods in Kata containers
Sample config files in this document can be found [here](./data/crictl/).
* [Run a Kubernetes pod with Kata Containers](#run-a-kubernetes-pod-with-kata-containers)
## Prerequisites
This guide requires Kata Containers available on your system, install-able by following [this guide](https://github.com/kata-containers/documentation/blob/master/install/README.md).
This guide requires Kata Containers available on your system, install-able by following [this guide](../install/README.md).
## Install a CRI implementation
@@ -28,7 +28,7 @@ After choosing one CRI implementation, you must make the appropriate configurati
to ensure it integrates with Kata Containers.
Kata Containers 1.5 introduced the `shimv2` for containerd 1.2.0, reducing the components
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
required to spawn pods and containers, and this is the preferred way to run Kata Containers with Kubernetes ([as documented here](../how-to/how-to-use-k8s-with-cri-containerd-and-kata.md#configure-containerd-to-use-kata-containers)).
An equivalent shim implementation for CRI-O is planned.
@@ -78,7 +78,7 @@ a runtime to be used when the workload cannot be trusted and a higher level of s
is required. An additional flag can be used to let CRI-O know if a workload
should be considered _trusted_ or _untrusted_ by default.
VMCache is a new function that creates VMs ahead of time and caches them until they are needed.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](https://github.com/kata-containers/runtime/blob/master/protocols/cache/cache.proto).
through Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](../../src/runtime/protocols/cache/cache.proto).
The VMCache server creates some VMs and caches them using the factory cache.
It converts a cached VM to gRPC format and transports it when
requested by a client.
@@ -21,9 +21,9 @@ a new sandbox.
### How is this different to VM templating
Both [VM templating](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
Both [VM templating](../how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
When VM templating is enabled, new VMs are created by cloning from a pre-created template VM, and they share the same initramfs, kernel and agent memory in read-only mode. So it saves a lot of memory if there are many Kata Containers running on the same host.
VMCache is not vulnerable to [share memory CVE](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-templating-and-how-do-I-use-it.md#what-are-the-cons) because each VM doesn't share the memory.
VMCache is not vulnerable to [share memory CVE](../how-to/what-is-vm-templating-and-how-do-I-use-it.md#what-are-the-cons) because each VM doesn't share the memory.
@@ -8,7 +8,7 @@ same initramfs, kernel and agent memory in readonly mode. It is very
much like a process fork done by the kernel but here we *fork* VMs.
### How is this different from VMCache
Both [VMCache](https://github.com/kata-containers/documentation/blob/master/how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
Both [VMCache](../how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
When VMCache is enabled, new VMs are created by the VMCache server. So it is not vulnerable to the shared memory CVE because the VMs do not share memory.
VM templating saves a lot of memory if there are many Kata Containers running on the same host.
@@ -46,6 +46,7 @@ overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
- `enable_template = true`
- `initrd =` is set
- `image =` option is commented out or removed
- `shared_fs` should not be `virtio-fs`
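The requirements above can be checked before creating a template. The following is a minimal sketch, not part of the Kata tooling: the function name `check_template_config` is hypothetical, and it assumes the configuration file uses plain `key = value` lines as in the default `configuration.toml`.

```bash
# Check a Kata configuration file against the VM templating requirements above.
# $1: path to configuration.toml. Prints each problem found; returns 0 if none.
check_template_config() {
    local cfg="$1" rc=0
    if ! grep -Eq '^[[:space:]]*enable_template[[:space:]]*=[[:space:]]*true' "$cfg"; then
        echo "enable_template is not set to true"; rc=1
    fi
    if ! grep -Eq '^[[:space:]]*initrd[[:space:]]*=[[:space:]]*"[^"]+"' "$cfg"; then
        echo "initrd is not set"; rc=1
    fi
    # A commented-out line starts with '#', so it does not match here.
    if grep -Eq '^[[:space:]]*image[[:space:]]*=' "$cfg"; then
        echo "image is still set"; rc=1
    fi
    if grep -Eq '^[[:space:]]*shared_fs[[:space:]]*=[[:space:]]*"virtio-fs"' "$cfg"; then
        echo "shared_fs must not be virtio-fs"; rc=1
    fi
    return "$rc"
}

# Example: check_template_config /etc/kata-containers/configuration.toml
```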
Then you can create a VM templating for later usage by calling
The table below provides a brief summary of some of the differences between
the hypervisors:
| Hypervisor | Summary | Features | Limitations | Container Creation speed | Memory density | Use cases | Comment |
|-|-|-|-|-|-|-|-|
| [ACRN] | Safety critical and real-time workloads | | | excellent | excellent | Embedded and IOT systems | For advanced users |
| [Cloud Hypervisor] | Low latency, small memory footprint, small attack surface | Minimal | | excellent | excellent | High performance modern cloud workloads | |
| [Firecracker] | Very slimline | Extremely minimal | Doesn't support all device types | excellent | excellent | Serverless / FaaS | |
| [QEMU] | Lots of features | Lots | | good | good | Good option for most users | All users |
For further details, see the [Virtualization in Kata Containers](design/virtualization.md) document and the official documentation for each hypervisor.
| [Automatic](#automatic-installation) |Run a single command to install a full system |[see table](#supported-distributions) |
| [Using snap](#snap-installation) |Easy to install and automatic updates |any distro that supports snapd |
| [Using official distro packages](#official-packages) |Kata packages provided by Linux distributions official repositories |[see table](#supported-distributions) |
| [Scripted](#scripted-installation) |Generates an installation script which will result in a working system when executed |[see table](#supported-distributions) |
| [Manual](#manual-installation) |Allows the user to read a brief document and execute the specified commands step-by-step |[see table](#supported-distributions) |
| Installation method | Description | Automatic updates | Use case |
|-|-|-|-|
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Using snap](#snap-installation) |Easy to install | yes | Good alternative to official distro packages. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. |
### Supported Distributions
Kata is packaged by the Kata community for:
|Distribution (link to installation guide) | Versions |
[Use `kata-manager`](installing-with-kata-manager.md) to automatically install Kata packages.
> **Note:**
>
> All users are encouraged to use the official distribution versions of Kata
> Containers unless they understand the implications of alternative methods.
### Snap Installation
> **Note:** The snap installation is available for all distributions which support `snapd`.
[Use snap](snap-installation-guide.md) to install Kata Containers from https://snapcraft.io.
### Scripted Installation
[Use `kata-doc-to-script`](installing-with-kata-doc-to-script.md) to generate installation scripts that can be reviewed before they are executed.
### Automatic Installation
[Use `kata-manager`](/utils/README.md) to automatically install a working Kata Containers system.
### Manual Installation
Manual installation instructions are available for [these distributions](#supported-distributions) and document how to:
1. Add the Kata Containers repository to your distro package manager, and import the packages signing key.
2. Install the Kata Containers packages.
3. Install a supported container manager.
4. Configure the container manager to use `kata-runtime` as the default OCI runtime. Or, for Kata Containers 1.5.0 or above, configure the
`io.containerd.kata.v2` to be the runtime shim (see [containerd runtime v2 (shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)
and [How to use Kata Containers and CRI (containerd plugin) with Kubernetes](https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md)).
> **Notes on upgrading**:
> - If you are installing Kata Containers on a system that already has Clear Containers or `runv` installed,
> first read [the upgrading document](../Upgrading.md).
> hosts the Kata Containers packages built by OBS for all the supported architectures.
> Packages are available for the latest and stable releases (more info [here](https://github.com/kata-containers/documentation/blob/master/Stable-Branch-Strategy.md)).
>
> - The following guides apply to the latest Kata Containers release
> (a.k.a. `master` release).
>
> - When choosing a stable release, replace all `master` occurrences in the URLs
> with a `stable-x.y` version available on the [download server](http://download.opensuse.org/repositories/home:/katacontainers:/releases:/).
> **Notes on packages source verification**:
> - The Kata packages hosted on the download server are signed with GPG to ensure integrity and authenticity.
>
> - The public key used to sign packages is available [at this link](https://raw.githubusercontent.com/kata-containers/tests/master/data/rpm-signkey.pub); the fingerprint is `9FDC0CB6 3708CF80 3696E2DC D0B37B82 6063F3ED`.
>
> - Only trust the signing key and fingerprint listed in the previous bullet point. Do not disable GPG checks,
> otherwise packages source and authenticity is not guaranteed.
Follow the [containerd installation guide](container-manager/containerd/containerd-install.md).
## Build from source installation
> **Notes:**
>
> - Power users who decide to build from sources should be aware of the
@@ -115,6 +88,7 @@ who are comfortable building software from source to use the latest component
versions. This is not recommended for normal users.
## Installing on a Cloud Service Platform
* [Amazon Web Services (AWS)](aws-installation-guide.md)
The process for installing Kata itself on bare metal is identical to that of a virtualization-enabled VM.
For detailed information to install Kata on your distribution of choice, see the [Kata Containers installation user guides](https://github.com/kata-containers/documentation/blob/master/install/README.md).
For detailed information to install Kata on your distribution of choice, see the [Kata Containers installation user guides](../install/README.md).
* [Create a Kata-enabled Image](#create-a-kata-enabled-image)
Kata Containers on Google Compute Engine (GCE) makes use of [nested virtualization](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances). Most of the installation procedure is identical to that for Kata on your preferred distribution, but enabling nested virtualization currently requires extra steps on GCE. This guide walks you through creating an image and instance with nested virtualization enabled. Note that `kata-runtime kata-check` checks for nested virtualization, but does not fail if support is not found.
Kata Containers on Google Compute Engine (GCE) makes use of [nested virtualization](https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances). Most of the installation procedure is identical to that for Kata on your preferred distribution, but enabling nested virtualization currently requires extra steps on GCE. This guide walks you through creating an image and instance with nested virtualization enabled. Note that `kata-runtime check` checks for nested virtualization, but does not fail if support is not found.
As a pre-requisite this guide assumes an installed and configured instance of the [Google Cloud SDK](https://cloud.google.com/sdk/downloads). For a zero-configuration option, all of the commands below were tested under [Google Cloud Shell](https://cloud.google.com/shell/) (as of Jun 2018). Verify your `gcloud` installation and configuration:
@@ -101,7 +101,7 @@ If this fails, ensure you created your instance from the correct image and that
The process for installing Kata itself on a virtualization-enabled VM is identical to that for bare metal.
For detailed information to install Kata on your distribution of choice, see the [Kata Containers installation user guides](https://github.com/kata-containers/documentation/blob/master/install/README.md).
For detailed information to install Kata on your distribution of choice, see the [Kata Containers installation user guides](../install/README.md).
For more information on what `kata-manager` can do, refer to the [`kata-manager` page](https://github.com/kata-containers/tests/blob/master/cmd/kata-manager).
Some files were not shown because too many files have changed in this diff