Commit Graph

974 Commits

Author SHA1 Message Date
Eric Ernst
5385ddc560 Merge pull request #7365 from alakesh/symlink-fix
agent: exclude symlinks from recursive ownership change
2023-07-25 11:27:48 -07:00
Alakesh Haloi
371a118ad0 agent: exclude symlinks from recursive ownership change
currently when fsGroup is used with direct-assign, kata agent
recursively changes ownership and permission for each file including
symlinks. However the problem with symlinks is, the permission of
the symlink itself may not be same as the underlying file. So while
doing recursive ownership and permission changes we should skip
symlinks.

Fixes: #7364
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
2023-07-19 20:42:55 -07:00
Chao Wu
935432c36d Merge pull request #7352 from justxuewei/exec-hang
agent: Fix exec hang issues with a backgroud process
2023-07-18 23:02:18 +08:00
Fabiano Fidêncio
25d80fcec2 Merge pull request #6993 from zvonkok/kata-agent-init-mount
agent: Ignore already mounted dev/fs/pseudo-fs
2023-07-18 14:11:44 +02:00
Xuewei Niu
6c91af0a26 agent: Fix exec hang issues with a backgroud process
Issue #4747 and pull request #4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in #7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.

Fixes: #4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-07-16 08:32:45 +08:00
Archana Shinde
b9b8ccca0c Merge pull request #7236 from amshinde/move-guestprotection
kata-ctl: Move GuestProtection code to kata-sys-util
2023-07-13 23:50:17 -07:00
Archana Shinde
9824206820 agent: Make the static checks pass for agent
The static checks for the agent require Cargo.lock to be updated.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
xuejun-xj
a65291ad72 agent: rustjail: update test_mknod_dev
When running cargo test in container, test_mknod_dev may fail sometimes
because of "Operation not permitted". Change the device path to
"/dev/fifo-test" to avoid this case.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
46b81dd7d2 agent: clippy: fix cargo clippy warnings
Replace "if let Ok(_) = ..." with ".is_ok()" method.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
c4771d9e89 agent: Makefile: enable set SECCOMP dynamically
Change ":=" to "?:".

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
Fabiano Fidêncio
275c84e7b5 Revert "agent: fix the issue of exec hang with a backgroud process"
This reverts commit 25d2fb0fde.

The reason we're reverting the commit is because it to check whether
it's the cause for the regression on devmapper tests.

Fixes: #7253
Depends-on: github.com/kata-containers/tests#5705

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 14:27:40 +02:00
Zvonko Kaiser
f72cb2fc12 agent: Remove shadowed function, add slog-term
Remove shadowed get_mounts(), added slog-term as a new crate,
slog can directly log to stdout and we can capture output
in the test-cases that are created in the function to be tested.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-07 11:28:14 +00:00
Zvonko Kaiser
07810bf71f agent: Ignore already mounted dev/fs/pseudo-fs
Using an initrd and setting KATA_INIT=yes meaning we're using the kata-agent
as the init process we need to make sure that the agent is not segfaulting
if mounts are already happened. Some workloads need to configure several
things in the initrd before the kata-agent starts which involves having
/proc or /sys already mounted.

Fixes: #6992

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-07 07:36:04 +00:00
Wedson Almeida Filho
0504bd7254 agent: convert the sl macros to functions
There is nothing in them that requires them to be macros. Converting
them to functions allows for better error messages.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
0860fbd410 agent: convert the ttrpc_error macro to a function
There is nothing in it that requires it to be a macro. Converting it to
a function allows for better error messages.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
0e5d6ce6d7 agent: convert the is_allowed macro to a function
Having a function allows for better error messages from the type checker
and it makes it clearer to callers what can happen. For example:

is_allowed!(req);

Gives no indication that it may result in an early return, and no simple
way for callers to modify the behaviour. It also makes it look like
ownership of `req` is being transferred.

On the other hand,

is_allowed(&req)?;

Indicates that `req` is being borrowed (immutably) and may fail. The
question mark indicates that the caller wants an early return on
failure.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
f680fc52be agent: change AGENT_CONFIG's lazy type to just AgentConfig
Since it is never modified, it doesn't really need a lock of any kind.
Removing the `RwLock` wrapper allows us to remove all `.read().await`
calls when accessing it.

Additionally, `AGENT_CONFIG` already has a static lifetime, so there is
no need to wrap it in a ref-counted heap allocation.

Fixes: #5409

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:27 -03:00
Yushuo
aaa96c749b feat(runtime-rs): modify onlineCpuMemRequest
Some vmms, such as dragonball, will actively help us
perform online cpu operations when doing cpu hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation.

So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be online. The modified
semantics become that the number of online CPUs in the guest
needs to be guaranteed.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
James O. D. Hunt
452f286552 Merge pull request #6764 from byron-marohn/fix_5401
kata-ctl: Switch to slog logging; add --log-level and --json-logging arguments
2023-06-07 16:08:53 +01:00
Yushuo
410bc18143 agent-ctl: fix the compile error
When the version of libc is upgraded to 0.2.145, older getrandom could not adapt
to new API, and this will make agent-ctl fail to compile.

We upgrade the version of `rand`, so the low version of getrandom will no longer
need.

Fixes: #7032

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-06-05 21:48:36 +08:00
Jayant Singh
77519fd120 kata-ctl: Switch to slog logging; add --log-level, --json-logging args
Fixes: #5401, #6654

- Switch kata-ctl from eprintln!()/println!() to structured logging via
  the logging library which uses slog.
- Adds a new create_term_logger() library call which enables printing
  log messages to the terminal via a less verbose / more human readable
  terminal format with colors.
- Adds --log-level argument to select the minimum log level of printed messages.
- Adds --json-logging argument to switch to logging in JSON format.

Co-authored-by: Byron Marohn <byron.marohn@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
2023-06-02 20:13:22 +00:00
Fupan Li
465f5a5ced Merge pull request #4748 from lifupan/main_fix
agent: fix the issue of exec hang with a backgroud process
2023-06-02 10:46:43 +08:00
Zhongtao Hu
cb962b0dc9 Merge pull request #6702 from Apokleos/directvol-common
runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
2023-05-28 12:00:12 +08:00
alex.lyn
5ddc4f94c5 runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
Move the get_volume_mount_info to kata-types/src/mount.rs.
If so, it becomes a common method of DirectVolumeMountInfo
and reduces duplicated code.

Fixes: #6701

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-26 11:18:29 +08:00
Fupan Li
25d2fb0fde agent: fix the issue of exec hang with a backgroud process
When run a exec process in backgroud without tty, the
exec will hang and didn't terminated.

For example:

crictl -i <container id> sh -c 'nohup tail -f /dev/null &'

Fixes: #4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2023-05-26 10:56:46 +08:00
Tim Zhang
5231aff90f Merge pull request #6860 from lifupan/main
netlink: Fix the issue of update_interface
2023-05-26 10:54:07 +08:00
Zhongtao Hu
b076d46db3 agent: handle hotplug virtio-mmio device
As dragonball support hotplug virtio-mmio device, we should handle it in agent

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:22 +08:00
Peng Tao
f6e1b1152c agent: update tokio dependency
To 1.28.1 to bring in the latest fixes.

Fixes: #6881
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 09:36:06 +00:00
fupan
2bda92face netlink: Fix the issue of update_interface
When updating an interface, there's maybe an existed
interface whose name would be the same with the updated
required name, thus it would update failed with interface
name existed error. Thus we should rename the existed interface
with an temporary name and swap it with the previouse interface
name last.

Fixes: #6842

Signed-off-by: fupan <fupan.lfp@antgroup.com>
2023-05-17 16:45:49 +08:00
Feng Wang
ebc8e8e2fd Merge pull request #6773 from jepio/agent-config-error-context
agent: Add context to errors that may occur when AgentConfig file is …
2023-05-16 09:21:34 -07:00
Amulya Meka
76f975e5e6 Merge pull request #6742 from Amulyam24/agent-build
runtime: remove overriding ARCH value by default for ppc64le
2023-05-12 12:34:50 +05:30
Jeremi Piotrowski
022a33de92 agent: Add context to errors when AgentConfig file is missing
When the agent config file is missing, the panic message says "no such file or
directory" but doesn't inform the user about which file was missing. Add
context to the parsing (with filename) and to the from_config_file() calls
(with information where the path is coming from).

Fixes: #6771
Depends-on: github.com/kata-containers/tests#5627
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-05-10 08:43:16 +02:00
Amulyam24
defb643346 runtime: remove overriding ARCH value by default for ppc64le
Currently, ARCH value is being set to powerpc64le by default.
powerpc64le is only right in context of rust and any operation
which might use this variable for a different purpose would fail on ppc64le.

Fixes: #6741

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-04-27 16:17:48 +05:30
Feng Wang
205909fbed runtime: Fix virtiofs fd leak
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.

Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
2023-04-26 15:53:39 -07:00
Fupan Li
ceefd50bd0 Merge pull request #6680 from Tim-Zhang/fix-ut-bad-fd
agent: Fix ut issue caused by fd double closed
2023-04-20 11:18:27 +08:00
Tim Zhang
53c749a9de agent: Fix ut issue caused by fd double closed
Never ever try to close the same fd double times, even in a unit test.

A file descriptor is a number which will be reused, so when you close
the same number twice you may close another file descriptor in the second
time and then there will be an error 'Bad file descriptor (os error 9)'
while the wrongly closed fd is being used.

Fixes: #6679

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-18 23:19:10 +08:00
Tim Zhang
2e3f19af92 agent: fix clippy warnings caused by protobuf3
Fix warnings introduced by protobuf upgrade.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 20:15:49 +08:00
Tim Zhang
4849c56faa agent: Fix unit test issue cuased by protobuf upgrade
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
8af6fc77cd agent: Bump ttrpc from 0.6.0 to 0.7.1
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 18:31:41 +08:00
Greg Kurz
c1fbaae8d6 rustjail: Use CPUWeight with systemd and CgroupsV2
The CPU shares property belongs to CgroupsV1. CgroupsV2 uses CPU weight
instead. The correct value is computed in the latter case but it is passed
to systemd using the legacy property. Systemd rejects the request and the
agent exists with the following error :

        Value specified in CPUShares is out of range: unknown

Replace the "shares" wording with "weight" in the CgroupsV2 code to
avoid confusions. Use the "CPUWeight" property since this is what
systemd expects in this case.

Fixes #6636

References:

https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#CPUWeight=weight
https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#systemd%20252
https://github.com/containers/crun/blob/main/crun.1.md#cpu-controller

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-04-07 17:57:26 +02:00
Christophe de Dinechin
b661e0cf3f rustjail: Add anyhow context for D-Bus connections
In cases where the D-Bus connection fails, add a little additional context about
the origin of the error.

Fixes: 6561

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Suggested-by: Archana Shinde <archana.m.shinde@intel.com>
Spell-checked-by: Greg Kurz <gkurz@redhat.com>
2023-04-03 14:09:34 +02:00
Christophe de Dinechin
7796e6ccc6 rustjail: Fix minor grammatical error in function name
Rename `unit_exist` function to `unit_exists` to match English grammar rule.

Fixes: #6561

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2023-03-30 16:13:37 +02:00
Christophe de Dinechin
41fdda1d84 rustjail: Do not unwrap potential error with cgroup manager
There can be an error while connecting to the cgroups managager, for
example a `ENOENT` if a file is not found. Make sure that this is
reported through the proper channels instead of causing a `panic()`
that does not provide much information.

Fixes: #6561

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Reported-by: Greg Kurz <gkurz@redhat.com>
2023-03-30 16:09:13 +02:00
Fabiano Fidêncio
2fe0733dcb Merge pull request #4582 from BbolroC/vfio-ap
agent: Bring in VFIO-AP device handling again
2023-03-20 11:43:13 +01:00
Hyounggyu Choi
96baa83895 agent: Bring in VFIO-AP device handling again
This PR is a continuing work for (kata-containers#3679).

This generalizes the previous VFIO device handling which only
focuses on PCI to include AP (IBM Z specific).

Fixes: kata-containers#3678
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-03-16 18:14:12 +09:00
Jakob Naucke
f666f8e2df agent: Add VFIO-AP device handling
Initial VFIO-AP support (#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: #3678
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 10:07:48 +09:00
Jakob Naucke
4c527d00c7 agent: Rename VFIO handling to VFIO PCI handling
e.g., split_vfio_option is PCI-specific and should instead be named
split_vfio_pci_option. This mutually affects the runtime, most notably
how the labels are named for the agent.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 07:43:39 +09:00
Jakob Naucke
db89c88f4f agent: Use cfg-if for s390x CCW
Uses fewer lines in upcoming VFIO-AP support.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 07:43:39 +09:00
Jakob Naucke
68a586e52c agent: Use a constant for CCW root bus path
used a function like PCI does, but this is not necessary

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 07:43:39 +09:00
Eduardo Lima (Etrunko)
a8b55bf874 dependency: update cgroups-rs
Huge pages failure with cgroups v2.
https://github.com/kata-containers/cgroups-rs/issues/112

Fixes: #6470

Signed-off-by: Eduardo Lima (Etrunko) <etrunko@redhat.com>
2023-03-15 12:21:12 -03:00