Commit Graph

8400 Commits

Author SHA1 Message Date
David Gibson
415420f689 runtime: Make SetupOCIConfigFile clean up after itself
SetupOCIConfigFile creates a temporary directory with os.MkdirTemp().  This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out, meaning that a /tmp/katatest-
directory was left over after the unit tests ran.

Change to using t.TempDir(), which better matches other parts of the
tests and means the testing framework will handle cleaning it up.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 90b2f5b776)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
David Gibson
688b9abd35 runtime: Don't use fixed /tmp/mountPoint path
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount.  This is not cleaned up after the test.  Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.

Use the built-in t.TempDir() to get an automatically named and deleted
temporary directory instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2eeb5dc223)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
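The two test fixes above rely on t.TempDir() for a uniquely named directory that is removed automatically. As a rough illustration of those two properties outside the testing package, here is a minimal sketch. `withTempDir`, `mountPointIn` and `pathExists` are hypothetical helpers written for this example and are not part of Kata Containers; only the `katatest-` prefix and the `mountPoint` name come from the commit messages.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// withTempDir is a hypothetical helper. It creates a uniquely named
// directory, passes it to fn, and always removes it afterwards, which is
// roughly what t.TempDir() arranges for a test.
func withTempDir(fn func(dir string) error) (err error) {
	dir, err := os.MkdirTemp("", "katatest-")
	if err != nil {
		return err
	}
	// Cleanup runs even when fn returns an error.
	defer func() {
		if rmErr := os.RemoveAll(dir); err == nil {
			err = rmErr
		}
	}()
	return fn(dir)
}

// mountPointIn creates a dummy "mountPoint" directory under dir instead of
// relying on a fixed path such as /tmp/mountPoint.
func mountPointIn(dir string) (string, error) {
	mp := filepath.Join(dir, "mountPoint")
	return mp, os.Mkdir(mp, 0o750)
}

// pathExists reports whether path is present on disk.
func pathExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	var used string
	err := withTempDir(func(dir string) error {
		mp, err := mountPointIn(dir)
		used = mp
		return err
	})
	fmt.Println("error:", err, "left behind:", pathExists(used))
}
```

Because the directory name is generated per call, two runs cannot interfere with each other, and nothing is left behind once the callback returns.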
Francesco Giudici
dc1288de8d kata-monitor: add a README file
Fixes: #3704

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 7b2ff02647)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
bin
78edf827df kata-monitor: add some links when generating pages for browsers
Add some links to rendered webpages for a better user experience,
so that users can jump between pages just by clicking links in the browser.

Fixes: #4061

Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit f8cc5d1ad8)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:10:21 +02:00
Yibo Zhuang
eff74fab0e agent: fsGroup support for direct-assigned volume
Adding two functions set_ownership and
recursive_ownership_change to support changing group id
ownership for a mounted volume.

The set_ownership will be called in common_storage_handler
after mount_storage performs the mount for the volume.
set_ownership will be a noop if the FSGroup field in the
Storage struct is not set which indicates no chown will be
performed. If FSGroup field is specified, then it will
perform the recursive walk of the mounted volume path to
change ownership of all files and directories to the
desired group id. It will also configure the SetGid bit
so that files created in the directory will inherit the
parent directory's group.

If the fsGroupChangePolicy is on root mismatch,
then the group ownership change will be skipped if the root
directory group id already matches the desired group
id and if the SetGid bit is also set on the root directory.

This is the same behavior as what
Kubelet does today when performing the recursive walk
to change ownership.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 92c00c7e84)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:09:22 +02:00
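The agent change above is written in Rust, but the decision logic it describes can be sketched in Go for illustration. This is not the agent's code: `skipOwnershipChange`, `setGroupRecursive` and `applyFSGroup` are hypothetical names, the policy strings are the Kubernetes fsGroupChangePolicy values, and reading the group id through `syscall.Stat_t` assumes a Unix-like system.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"syscall"
)

// Policy values as used by the Kubernetes fsGroupChangePolicy field.
const (
	policyAlways         = "Always"
	policyOnRootMismatch = "OnRootMismatch"
)

// skipOwnershipChange reports whether the recursive walk can be skipped:
// only for the on-root-mismatch policy, and only when the root directory
// already has the desired group and the SetGid bit.
func skipOwnershipChange(policy string, rootGid, wantGid uint32, rootHasSetgid bool) bool {
	if policy != policyOnRootMismatch {
		return false
	}
	return rootGid == wantGid && rootHasSetgid
}

// setGroupRecursive changes the group of everything under root and sets
// the SetGid bit on directories so new files inherit the group.
func setGroupRecursive(root string, gid uint32) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		// -1 leaves the owning user untouched; only the group changes.
		if err := os.Lchown(path, -1, int(gid)); err != nil {
			return err
		}
		if !d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		return os.Chmod(path, info.Mode()|fs.ModeSetgid)
	})
}

// applyFSGroup ties the two steps together for a mounted volume path.
func applyFSGroup(root string, gid uint32, policy string) error {
	info, err := os.Stat(root)
	if err != nil {
		return err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return fmt.Errorf("no ownership information for %s", root)
	}
	if skipOwnershipChange(policy, st.Gid, gid, info.Mode()&fs.ModeSetgid != 0) {
		return nil
	}
	return setGroupRecursive(root, gid)
}

func main() {
	fmt.Println(skipOwnershipChange(policyOnRootMismatch, 3000, 3000, true))
}
```

Keeping the skip decision in a pure function makes the policy easy to check without touching a real volume.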
Yibo Zhuang
01cd58094e proto: fsGroup support for direct-assigned volume
This change adds two fields to the Storage pb

FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.

FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.

These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 6a47b82c81)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Yibo Zhuang
97ad1d55ff runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadata field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadata field.

Add extra fields FsGroup and FSGroupChangePolicy
to the Mount construct for container mounts, which will
be populated when creating block devices by parsing
the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 532d53977e)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
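To make the metadata handling above more concrete, here is a minimal sketch of reading the two keys from a mountInfo document. The `fsGroup` and `fsGroupChangePolicy` key names come from the commit message; the JSON layout (a `metadata` object with string values), the `mountInfo` struct and `parseFSGroup` are assumptions made for this example, not the runtime's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// mountInfo models only the part of mountInfo.json used here. The exact
// JSON layout is an assumption for this sketch.
type mountInfo struct {
	Metadata map[string]string `json:"metadata"`
}

// parseFSGroup returns a nil group when no fsGroup key is present, which
// signals that no ownership change is requested in the guest.
func parseFSGroup(raw []byte) (*uint32, string, error) {
	var mi mountInfo
	if err := json.Unmarshal(raw, &mi); err != nil {
		return nil, "", err
	}
	policy := mi.Metadata["fsGroupChangePolicy"]
	val, ok := mi.Metadata["fsGroup"]
	if !ok {
		return nil, policy, nil
	}
	gid, err := strconv.ParseUint(val, 10, 32)
	if err != nil {
		return nil, "", fmt.Errorf("invalid fsGroup %q: %w", val, err)
	}
	g := uint32(gid)
	return &g, policy, nil
}

func main() {
	raw := []byte(`{"metadata":{"fsGroup":"3000","fsGroupChangePolicy":"OnRootMismatch"}}`)
	gid, policy, err := parseFSGroup(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(*gid, policy)
}
```

Using a pointer for the group id mirrors the nil check described in the commit: a missing key and a group id of 0 remain distinguishable.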
Zhuoyu Tie
b62cced7f4 runtime: no need to write virtiofsd error to log
The scanner reads nothing from the virtiofsd stderr pipe, because the param
'--syslog' redirects stderr to syslog. So there is no need to write
scanner.Text() to the kata log.

Fixes: #4063

Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
(cherry picked from commit 6e79042aa0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Francesco Giudici
8242cfd2be kata-monitor: update the hrefs in the debug/pprof index page
kata-monitor allows retrieving profiling data from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires the sandbox id to be passed via a query string in
the url.

The profiling index page proxied by kata-monitor contains links to all
the data profiles available. However, the links do not contain the
sandbox id included in the request, so they are broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.

Let's add the sandbox id on the fly to each href tag returned by the kata
shim index page before providing the proxied page.

Fixes: #4054

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 86977ff780)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
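The href rewriting described above can be sketched as follows. This is an illustration, not kata-monitor's implementation: `addSandboxToHrefs` is a hypothetical function, and matching double-quoted `href` attributes with a regular expression is a simplification that suits a small generated index page. Only the `sandbox` query parameter name is taken from the commit message.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefRe matches double-quoted href attributes. A full HTML parser would
// be more robust; a regexp keeps the sketch short.
var hrefRe = regexp.MustCompile(`href="([^"]*)"`)

// addSandboxToHrefs appends sandbox=<id> to the query string of every
// link in page, preserving any query parameters already present.
func addSandboxToHrefs(page, sandboxID string) string {
	return hrefRe.ReplaceAllStringFunc(page, func(attr string) string {
		target := hrefRe.FindStringSubmatch(attr)[1]
		u, err := url.Parse(target)
		if err != nil {
			return attr // leave unparsable links untouched
		}
		q := u.Query()
		q.Set("sandbox", sandboxID)
		u.RawQuery = q.Encode()
		return `href="` + u.String() + `"`
	})
}

func main() {
	fmt.Println(addSandboxToHrefs(`<a href="heap">heap</a>`, "mysandbox"))
}
```

Going through `net/url` rather than plain string concatenation means links that already carry a query string, such as a debug level, keep it.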
Feng Wang
a37d4e538f agent: best-effort removing mount point
During container exit, the agent tries to remove all the mount point directories,
which can fail if one is on a read-only filesystem (e.g. device mapper). This commit ignores
the removal failure and logs a warning message.

Fixes: #4043

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aabcebbf58)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
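The best-effort behaviour described above amounts to downgrading a removal failure from an error to a warning. The agent itself is written in Rust; the Go sketch below only illustrates the pattern, and `removeMountPoints` is a hypothetical name.

```go
package main

import (
	"log"
	"os"
)

// removeMountPoints tries to remove every directory in paths. Failures are
// logged as warnings and reported back, but never abort the loop.
func removeMountPoints(paths []string) []string {
	var failed []string
	for _, p := range paths {
		if err := os.Remove(p); err != nil {
			log.Printf("warning: could not remove mount point %s: %v", p, err)
			failed = append(failed, p)
		}
	}
	return failed
}

func main() {
	dir, err := os.MkdirTemp("", "mountpoint-")
	if err != nil {
		log.Fatal(err)
	}
	// The second attempt fails because the directory is already gone,
	// which is logged rather than treated as fatal.
	failed := removeMountPoints([]string{dir, dir})
	log.Printf("%d removal(s) failed", len(failed))
}
```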
Greg Kurz
d1197ee8e5 tools/packaging: Fix error path in 'kata-deploy-binaries.sh -s'
`make kata-tarball` relies on `kata-deploy-binaries.sh -s` which
silently ignores errors, and you may end up with an incomplete
tarball without noticing it because `make`'s exit status is 0.

`kata-deploy-binaries.sh` does set the `errexit` option and all the
code in the script seems to assume that since it doesn't do error
checking. Unfortunately, bash automatically disables `errexit` when
calling a function from a conditional pipeline, like done in the `-s`
case:

	if [ "${silent}" == true ]; then
		if ! handle_build "${t}" &>"$log_file"; then
                ^^^^^^
           this disables `errexit`

and `handle_build` ends with a `tar tvf` that always succeeds.

Adding error checking all over the place isn't really an option
as it would seriously obfuscate the code. Drop the conditional
pipeline instead and print the final error message from a `trap`
handler on the special ERR signal. This requires the `errtrace`
option as `trap`s aren't propagated to functions by default.

Since all outputs of `handle_build` are redirected to the build
log file, some file descriptor duplication magic is needed for
the handler to be able to write to the original stdout and stderr.

Fixes #3757

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit a779e19bee)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
c9c7751184 tools/packaging: Fix usage of kata-deploy-binaries.sh
Add missing documentation for -s .

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0baebd2b37)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
1e62231610 tools/packaging/kata-deploy: Copy install_yq.sh in a dedicated script
'make kata-tarball' sometimes fails early with:

cp: cannot create regular file '[...]/tools/packaging/kata-deploy/local-build/dockerbuild/install_yq.sh': File exists

This happens because all assets are built in parallel using the same
`kata-deploy-binaries-in-docker.sh` script, and thus all try to copy
the `install_yq.sh` script to the same location with the `cp` command.
This is a well known race condition that cannot be avoided without
serialization of `cp` invocations.

Move the copying of `install_yq.sh` to a separate script and ensure
it is called *before* parallel builds. Make the presence of the copy
a prerequisite for each sub-build so that they still can be triggered
individually. Update the GH release workflow to also call this script
before calling `kata-deploy-binaries-in-docker.sh`.

Fixes #3756

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 154c8b03d3)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
8fa64e011d packaging: Eliminate TTY_OPT and NO_TTY variables in kata-deploy
NO_TTY configured whether to add the -t option to docker run.  It makes no
sense for the caller to configure this, since whether you need it depends
on the commands you're running.  Since the point here is to run
non-interactive build scripts, we don't need -t, or -i either.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1ed7da8fc7)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
8f67f9e384 tools/packaging/kata-deploy/local-build: Add build to gitignore
This directory consists entirely of files built during a make kata-tarball,
so it should not be committed to the tree. A symbolic link to this directory
might be created during 'make tarball', ignore it as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[greg: - rearranged the subject to make the subsystem checker happy
       - also ignore the symbolic link created by
         `kata-deploy-binaries-in-docker.sh`]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit bad859d2f8)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Greg Kurz
3049b7760a versions: Bump firecracker to v0.23.4
This release changes Docker images repository from DockerHub to Amazon
ECR. This resolves the `You have reached your pull rate limit` error
when building the firecracker tarball.

Fixes #4001

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0d5f80b803)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
Miao Xia
aedfef29a3 runtime/virtcontainers: Pass the hugepages resources to agent
The hugepages resources claimed by containers should be limited
by cgroup in the guest OS.

Fixes: #3695

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
(cherry picked from commit a2f5c1768e)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
c9e1f72785 agent: Verify that we allocated as many hugepages as we need
allocate_hugepages() writes to the kernel sysfs file to allocate hugepages
in the Kata VM.  However, even if the write succeeds, it's not certain that
the kernel will actually be able to allocate as many hugepages as we
requested.

This patch reads back the file after writing it to check if we were able to
allocate all the required hugepages.

fixes #3816

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 42e35505b0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
David Gibson
ba858e8cd9 agent: Don't attempt to create directories for hugepage configuration
allocate_hugepages() constructs the path for the sysfs directory containing
hugepage configuration, then attempts to create this directory if it does
not exist.

This doesn't make sense: sysfs is a view into kernel configuration. If the
kernel has support for the hugepage size, the directory will already be
there; if it doesn't, trying to create it won't help.

For the same reason, attempting to create the "nr_hugepages" file
itself is pointless, so there's no reason to call
OpenOptions::create(true).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 608e003abc)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 09:05:29 +02:00
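The two hugepage commits above combine into a simple write-then-verify pattern: open the existing sysfs file without creating it, write the requested count, and read the value back. The agent implements this in Rust; the Go sketch below is only an illustration, with `allocateHugepages` and `checkAllocated` as hypothetical names and the file path supplied by the caller.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// checkAllocated compares the value read back from nr_hugepages with the
// number of pages that was requested.
func checkAllocated(readBack string, want uint64) error {
	got, err := strconv.ParseUint(strings.TrimSpace(readBack), 10, 64)
	if err != nil {
		return fmt.Errorf("unexpected nr_hugepages content %q: %w", readBack, err)
	}
	if got < want {
		return fmt.Errorf("allocated %d hugepages, needed %d", got, want)
	}
	return nil
}

// allocateHugepages writes want to the given nr_hugepages file and then
// verifies the result. The file is opened without O_CREATE: if the kernel
// does not expose it, creating it would not help.
func allocateHugepages(nrHugepagesPath string, want uint64) error {
	f, err := os.OpenFile(nrHugepagesPath, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(strconv.FormatUint(want, 10)); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	data, err := os.ReadFile(nrHugepagesPath)
	if err != nil {
		return err
	}
	return checkAllocated(string(data), want)
}

func main() {
	// A regular temporary file stands in for the sysfs file here.
	f, err := os.CreateTemp("", "nr_hugepages-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()
	fmt.Println(allocateHugepages(f.Name(), 4))
}
```

A successful write only means the kernel accepted the request; the read-back is what shows how many pages were really reserved.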
Fabiano Fidêncio
b784763685
Merge pull request #4120 from likebreath/0420/backport_clh_v23.0
stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.0
2022-04-21 14:33:37 +02:00
Fabiano Fidêncio
df2d57e9b8
Merge pull request #4098 from fengwang666/stable-2.4_backport
stable-2.4 | runtime: Base64 encode the direct volume mountInfo path
2022-04-21 12:54:03 +02:00
Bo Chen
bc32eff7b4 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v23.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 29e569aa92)
2022-04-20 15:57:50 -07:00
Bo Chen
984ef5389e versions: Upgrade to Cloud Hypervisor v23.0
Highlights from the Cloud Hypervisor release v23.0: 1) vDPA Support; 2)
Updated OS Support list (Jammy 22.04 added with EOLed versions removed);
3) AArch64 Memory Map Improvements; 4) AMX Support; 5) Bug Fixes;

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v23.0

Fixes: #4101

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 6012c19707)
2022-04-20 15:57:50 -07:00
Feng Wang
adf6493b89 runtime: Base64 encode the direct volume mountInfo path
This is to avoid accidentally deleting multiple volumes.

Fixes #4020

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit 354cd3b9b6)
2022-04-13 22:30:53 -07:00
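The commit message above does not spell out how multiple volumes could be deleted. One plausible reading, stated here as an assumption, is that joining the raw volume path under a common root nests one volume's directory inside another's, so removing the parent removes both. The sketch below contrasts that with a Base64-encoded directory name; `nestedDir`, `encodedDir`, `isInside` and the root path are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"path/filepath"
	"strings"
)

// nestedDir joins the raw volume path under root, so "/a/b/c" ends up
// inside the directory used for "/a/b".
func nestedDir(root, volumePath string) string {
	return filepath.Join(root, volumePath)
}

// encodedDir uses the URL-safe Base64 form of the volume path as a single
// directory name. The encoding contains no path separator, so every
// volume gets its own flat directory directly under root.
func encodedDir(root, volumePath string) string {
	name := base64.URLEncoding.EncodeToString([]byte(volumePath))
	return filepath.Join(root, name)
}

// isInside reports whether child lies below parent in the directory tree.
func isInside(parent, child string) bool {
	return strings.HasPrefix(child, parent+string(filepath.Separator))
}

func main() {
	root := "/example/direct-volumes" // placeholder root for illustration
	fmt.Println(isInside(nestedDir(root, "/a/b"), nestedDir(root, "/a/b/c")))
	fmt.Println(isInside(encodedDir(root, "/a/b"), encodedDir(root, "/a/b/c")))
}
```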
Greg Kurz
10bab3c96a
Merge pull request #4081 from fidencio/wip/stable-2.4-agent-avoid-panic-when-getting-empty-stats
stable-2.4 | agent: Avoid agent panic when reading empty stats
2022-04-13 14:13:13 +02:00
Fabiano Fidêncio
6b41754018 agent: Avoid agent panic when reading empty stats
This was seen in an issue report, where we'd try to unwrap a None value,
leading to a panic.

Fixes: #4077
Related: #4043

Full backtrace:
```
"thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', rustjail/src/cgroups/fs/mod.rs:593:31"
"stack backtrace:"
"   0:     0x7f0390edcc3a - std::backtrace_rs::backtrace::libunwind::trace::hd5eff4de16dbdd15"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5"
"   1:     0x7f0390edcc3a - std::backtrace_rs::backtrace::trace_unsynchronized::h04a775b4c6ab90d6"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5"
"   2:     0x7f0390edcc3a - std::sys_common::backtrace::_print_fmt::h3253c3db9f17d826"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5"
"   3:     0x7f0390edcc3a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h02bfc712fc868664"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22"
"   4:     0x7f0390a91fbc - core::fmt::write::hfd5090d1132106d8"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17"
"   5:     0x7f0390edb804 - std::io::Write::write_fmt::h34acb699c6d6f5a9"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15"
"   6:     0x7f0390edbee0 - std::sys_common::backtrace::_print::hfca761479e3d91ed"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5"
"   7:     0x7f0390edbee0 - std::sys_common::backtrace::print::hf666af0b87d2b5ba"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9"
"   8:     0x7f0390edbee0 - std::panicking::default_hook::{{closure}}::hb4617bd1d4a09097"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50"
"   9:     0x7f0390edb2da - std::panicking::default_hook::h84f684d9eff1eede"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9"
"  10:     0x7f0390edb2da - std::panicking::rust_panic_with_hook::h8e784f5c39f46346"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17"
"  11:     0x7f0390f0c416 - std::panicking::begin_panic_handler::{{closure}}::hef496869aa926670"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:500:13"
"  12:     0x7f0390f0c3b6 - std::sys_common::backtrace::__rust_end_short_backtrace::h8e9b039b8ed3e70f"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18"
"  13:     0x7f0390f0c372 - rust_begin_unwind"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5"
"  14:     0x7f03909062c0 - core::panicking::panic_fmt::h568976b83a33ae59"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14"
"  15:     0x7f039090641c - core::panicking::panic::he2e71cfa6548cc2c"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:48:5"
"  16:     0x7f0390eb443f - <rustjail::cgroups::fs::Manager as rustjail::cgroups::Manager>::get_stats::h85031fc1c59c53d9"
"  17:     0x7f03909c0138 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hfa6e6cd7516f8d11"
"  18:     0x7f0390d697e5 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hffbaa534cfa97d44"
"  19:     0x7f039099c0b3 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hae3ab083a06d0b4b"
"  20:     0x7f0390af9e1e - std::panic::catch_unwind::h1fdd25c8ebba32e1"
"  21:     0x7f0390b7c4e6 - tokio::runtime::task::raw::poll::hd3ebbd0717dac808"
"  22:     0x7f0390f49f3f - tokio::runtime::thread_pool::worker::Context::run_task::hfdd63cd1e0b17abf"
"  23:     0x7f0390f3a599 - tokio::runtime::task::raw::poll::h62954f6369b1d210"
"  24:     0x7f0390f37863 - std::sys_common::backtrace::__rust_begin_short_backtrace::h1c58f232c078bfe9"
"  25:     0x7f0390f4f3dd - core::ops::function::FnOnce::call_once{{vtable.shim}}::h2d329a84c0feed57"
"  26:     0x7f0390f0e535 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h137e5243c6233a3b"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9"
"  27:     0x7f0390f0e535 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7331c46863d912b7"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9"
"  28:     0x7f0390f0e535 - std::sys::unix::thread::Thread::new::thread_start::h1fb20b966cb927ab"
"                               at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17"
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 78f30c33c6)
2022-04-12 18:59:02 +02:00
Fabiano Fidêncio
0ad6f05dee
Merge pull request #4024 from bergwolf/2.4.0-branch-bump
# Kata Containers 2.4.0
2022-04-01 13:46:35 +02:00
Peng Tao
4c9c01a124 release: Kata Containers 2.4.0
- stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
- stable-2.4 | kata-monitor: fix duplicated output when printing usage
- stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
- kata-deploy: fix version bump from -rc to stable
- stable-2.4: release: Include all the rust vendored code into the vendored tarball
- stable-2.4 | tools: release: Do not consider release candidates as stable releases
- agent: Signal the whole process group
- stable-2.4 | docs: Update k8s documentation
- backport main commits to stable 2.4
- stable-2.4: Bump QEMU to 6.2 (bringing then SGX support in)
- runtime: Properly handle ESRCH error when signaling container
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1

f2319d69 release: Adapt kata-deploy for 2.4.0
cae48e9c agent: fix container stop error with signal SIGRTMIN+3
342aa95c kata-monitor: fix duplicated output when printing usage
9f75e226 runtime: add logs around sandbox monitor
363fbed8 runtime: stop getting OOM events when ttrpc: closed error
f840de5a workflows,release: Ship *all* the rust vendored code
952cea5f tools: Add a generate_vendor.sh script
cc965fa0 kata-deploy: fix version bump from -rc to stable
f41cc184 tools: release: Do not consider release candidates as stable releases
e059b50f runtime: Add more debug logs for container io stream copy
71ce6f53 agent: Kill all the container processes of the same cgroup
30fc2c86 docs: Update k8s documentation
24028969 virtcontainers: Run mock hook from build tree rather than system bin dir
4e54aa5a doc: fix filename typo
d815393c manager: Add options to change self test behaviour
4111e1a3 manager: Add option to enable component debug
2918be18 manager: Create containerd link
6b31b068 kernel: fix cve-2022-0847
5589b246 doc: update Intel SGX use cases document
1da88dca tools: update QEMU to 6.2
3e2f9223 runtime: Properly handle ESRCH error when signaling container
4c21cb3e versions: Upgrade to Cloud Hypervisor v22.1

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Peng Tao
f2319d693d release: Adapt kata-deploy for 2.4.0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Bin Liu
98ccf8f6a1
Merge pull request #4008 from wxx213/stable-2.4
stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
2022-04-01 11:29:18 +08:00
Wang Xingxing
cae48e9c9b agent: fix container stop error with signal SIGRTMIN+3
The nix::sys::signal::Signal package api cannot deal with SIGRTMIN+3,
directly use libc function to send the signal.

Fixes: #3990

Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(cherry picked from commit 0d765bd082)
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
2022-03-31 16:49:06 +08:00
snir911
a36103c759
Merge pull request #4003 from fgiudici/kata-monitor_fix_help_backport
stable-2.4 | kata-monitor: fix duplicated output when printing usage
2022-03-30 18:57:17 +03:00
Fabiano Fidêncio
6abbcc551c
Merge pull request #3997 from liubin/backport-2.4
stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
2022-03-30 14:08:55 +02:00
Francesco Giudici
342aa95cc8 kata-monitor: fix duplicated output when printing usage
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:

[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
  -listen-address string
        The address to listen on for HTTP requests. (default ":8090")
  -log-level string
        Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
  -runtime-endpoint string
        Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")

the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.

Fixes: #3998

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit a63bbf9793)
2022-03-30 14:02:54 +02:00
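The duplicated default shown above can be reproduced with the standard flag package alone. The sketch below builds a one-flag FlagSet twice, once with the default repeated in the description and once without. `usageFor` and `defaultMentions` are helper names made up for this example; the flag name, description and default value are taken from the usage output quoted in the commit.

```go
package main

import (
	"bytes"
	"flag"
	"fmt"
	"strings"
)

const defaultEndpoint = "/run/containerd/containerd.sock"

// usageFor returns the help text of a FlagSet holding a single
// runtime-endpoint flag whose description is desc.
func usageFor(desc string) string {
	var buf bytes.Buffer
	fs := flag.NewFlagSet("kata-monitor", flag.ContinueOnError)
	fs.SetOutput(&buf)
	fs.String("runtime-endpoint", defaultEndpoint, desc)
	// PrintDefaults appends (default "...") for non-zero defaults itself.
	fs.PrintDefaults()
	return buf.String()
}

// defaultMentions counts how often the word "default" shows up in usage.
func defaultMentions(usage string) int {
	return strings.Count(usage, "default")
}

func main() {
	before := usageFor(`Endpoint of CRI container runtime service. (default: "` + defaultEndpoint + `")`)
	after := usageFor("Endpoint of CRI container runtime service.")
	fmt.Print(before, after)
}
```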
bin
9f75e226f1 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:40 +08:00
bin
363fbed804 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call; it retries on failure.
For cases of agent shutdown, the retry should stop.

When the runtime hasn't yet detected that the agent has died, we can also check
whether the error is "ttrpc: closed".

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:35 +08:00
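A retry loop that gives up once the connection to the agent is closed could look roughly like the sketch below. It is not the runtime's code: `watchOOMEvents`, `agentConnectionClosed` and the two error variables are hypothetical, and matching on the "ttrpc: closed" text simply follows the wording of the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Stand-in errors for this sketch; a real caller would receive them from
// the agent client.
var (
	errAgentClosed = errors.New("ttrpc: closed")
	errTransient   = errors.New("temporary failure")
)

// agentConnectionClosed reports whether err indicates that the connection
// to the agent is gone, in which case retrying is pointless.
func agentConnectionClosed(err error) bool {
	return err != nil && strings.Contains(err.Error(), "ttrpc: closed")
}

// watchOOMEvents keeps polling getOOMEvent. Other errors are retried
// after retryDelay; a closed connection ends the loop.
func watchOOMEvents(getOOMEvent func() (string, error), onOOM func(containerID string), retryDelay time.Duration) {
	for {
		id, err := getOOMEvent()
		if err == nil {
			onOOM(id)
			continue
		}
		if agentConnectionClosed(err) {
			return
		}
		time.Sleep(retryDelay)
	}
}

func main() {
	results := []error{nil, errTransient, errAgentClosed}
	calls := 0
	watchOOMEvents(func() (string, error) {
		err := results[calls]
		calls++
		return "container-1", err
	}, func(id string) {
		fmt.Println("OOM event for", id)
	}, 10*time.Millisecond)
	fmt.Println("stopped after", calls, "calls")
}
```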
Fabiano Fidêncio
54a638317a
Merge pull request #3988 from bergwolf/github/kata-deploy
kata-deploy: fix version bump from -rc to stable
2022-03-30 11:01:45 +02:00
Peng Tao
8ce6b12b41
Merge pull request #3993 from fidencio/wip/stable-2.4-release-include-all-rust-vendored-code-to-the-vendored-tarball
stable-2.4: release: Include all the rust vendored code into the vendored tarball
2022-03-30 16:10:47 +08:00
Fabiano Fidêncio
f840de5acb workflows,release: Ship *all* the rust vendored code
Instead of only vendoring the code needed by the agent, let's ensure we
vendor all the needed rust code, and let's do it using the newly
introduced generate_vendor.sh script.

Fixes: #3973

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3606923ac8)
2022-03-29 23:27:43 +02:00
Fabiano Fidêncio
952cea5f5d tools: Add a generate_vendor.sh script
This script is responsible for generating a tarball with all the rust
vendored code that is needed for fully building kata-containers on a
disconnected environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2eb07455d0)
2022-03-29 23:27:29 +02:00
Peng Tao
cc965fa0cb kata-deploy: fix version bump from -rc to stable
In such a case, we should bump from the "latest" tag rather than from
current_version.

Fixes: #3986
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-03-29 03:45:27 +00:00
GabyCT
44b1473d0c
Merge pull request #3977 from fidencio/wip/backport-fix-for-3847
stable-2.4 | tools: release: Do not consider release candidates as stable releases
2022-03-28 10:38:47 -06:00
Fupan Li
565efd1bf2
Merge pull request #3975 from bergwolf/github/backport-stable-2.4
agent: Signal the whole process group
2022-03-28 18:26:12 +08:00
Fabiano Fidêncio
f41cc18427 tools: release: Do not consider release candidates as stable releases
During the release of 2.4.0-rc0 @egernst noticed an inconsistency in the
way we handle release tags, as release candidates are being taken as
"stable" releases, while both the kata-deploy tests and the release
action consider this as "latest".

Ideally we should have our own tag for "release candidate", but that's
something that could and should be discussed more extensively outside of
the scope of this quick fix.

For now, let's align the code generating the PR for bumping the release
with what we already do as part of the release action and kata-deploy
test, and tag "-rc"  as latest, regardless of which branch it's coming
from.

Fixes: #3847

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4adf93ef2c)
2022-03-28 11:01:58 +02:00
Feng Wang
e059b50f5c runtime: Add more debug logs for container io stream copy
This can help debugging container lifecycle issues

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:22:22 +08:00
Feng Wang
71ce6f537f agent: Kill all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:21:51 +08:00
Bin Liu
a2b73b60bd
Merge pull request #3960 from cmaf/update-k8s-docs-1-stable-2.4
stable-2.4 | docs: Update k8s documentation
2022-03-25 15:25:25 +08:00
Bin Liu
2ce9ce7b8f
Merge pull request #3954 from bergwolf/github/backport-stable-2.4
backport main commits to stable 2.4
2022-03-25 14:45:17 +08:00
Chelsea Mafrica
30fc2c863d docs: Update k8s documentation
Update documentation with missing step to untaint node to enable
scheduling and update the example to run a pod using the kata runtime
class instead of untrusted workloads, which applies to versions of CRI-O
prior to v1.12.

Fixes #3863

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit 5c434270d1)
2022-03-24 11:22:18 -07:00
David Gibson
24028969c2 virtcontainers: Run mock hook from build tree rather than system bin dir
Running unit tests should generally have minimal dependencies on
things outside the build tree.  It *definitely* shouldn't modify
system wide things outside the build tree.  Currently the runtime
"make test" target does so, though.

Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary.  They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.

Go tests automatically run within the package's directory though, so
there's no need to use a system wide path.  We can use a relative path to
the binary build within the tree just as easily.

fixes #3941

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-24 12:02:00 +08:00