currently when fsGroup is used with direct-assign, kata agent
recursively changes ownership and permission for each file including
symlinks. However the problem with symlinks is, the permission of
the symlink itself may not be same as the underlying file. So while
doing recursive ownership and permission changes we should skip
symlinks.
Fixes: #7364
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
Issue #4747 and pull request #4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in #7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.
Fixes: #4747
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
When running cargo test in container, test_mknod_dev may fail sometimes
because of "Operation not permitted". Change the device path to
"/dev/fifo-test" to avoid this case.
Fixes: #7284
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
This reverts commit 25d2fb0fde.
The reason we're reverting the commit is because it to check whether
it's the cause for the regression on devmapper tests.
Fixes: #7253
Depends-on: github.com/kata-containers/tests#5705
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Remove shadowed get_mounts(), added slog-term as a new crate,
slog can directly log to stdout and we can capture output
in the test-cases that are created in the function to be tested.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Using an initrd and setting KATA_INIT=yes meaning we're using the kata-agent
as the init process we need to make sure that the agent is not segfaulting
if mounts are already happened. Some workloads need to configure several
things in the initrd before the kata-agent starts which involves having
/proc or /sys already mounted.
Fixes: #6992
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
There is nothing in them that requires them to be macros. Converting
them to functions allows for better error messages.
Fixes: #7201
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
There is nothing in it that requires it to be a macro. Converting it to
a function allows for better error messages.
Fixes: #7201
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Having a function allows for better error messages from the type checker
and it makes it clearer to callers what can happen. For example:
is_allowed!(req);
Gives no indication that it may result in an early return, and no simple
way for callers to modify the behaviour. It also makes it look like
ownership of `req` is being transferred.
On the other hand,
is_allowed(&req)?;
Indicates that `req` is being borrowed (immutably) and may fail. The
question mark indicates that the caller wants an early return on
failure.
Fixes: #7201
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Since it is never modified, it doesn't really need a lock of any kind.
Removing the `RwLock` wrapper allows us to remove all `.read().await`
calls when accessing it.
Additionally, `AGENT_CONFIG` already has a static lifetime, so there is
no need to wrap it in a ref-counted heap allocation.
Fixes: #5409
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Some vmms, such as dragonball, will actively help us
perform online cpu operations when doing cpu hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation.
So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be online. The modified
semantics become that the number of online CPUs in the guest
needs to be guaranteed.
Fixes: #5030
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
When the version of libc is upgraded to 0.2.145, older getrandom could not adapt
to new API, and this will make agent-ctl fail to compile.
We upgrade the version of `rand`, so the low version of getrandom will no longer
need.
Fixes: #7032
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Fixes: #5401, #6654
- Switch kata-ctl from eprintln!()/println!() to structured logging via
the logging library which uses slog.
- Adds a new create_term_logger() library call which enables printing
log messages to the terminal via a less verbose / more human readable
terminal format with colors.
- Adds --log-level argument to select the minimum log level of printed messages.
- Adds --json-logging argument to switch to logging in JSON format.
Co-authored-by: Byron Marohn <byron.marohn@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
Move the get_volume_mount_info to kata-types/src/mount.rs.
If so, it becomes a common method of DirectVolumeMountInfo
and reduces duplicated code.
Fixes: #6701
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When run a exec process in backgroud without tty, the
exec will hang and didn't terminated.
For example:
crictl -i <container id> sh -c 'nohup tail -f /dev/null &'
Fixes: #4747
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
When updating an interface, there's maybe an existed
interface whose name would be the same with the updated
required name, thus it would update failed with interface
name existed error. Thus we should rename the existed interface
with an temporary name and swap it with the previouse interface
name last.
Fixes: #6842
Signed-off-by: fupan <fupan.lfp@antgroup.com>
When the agent config file is missing, the panic message says "no such file or
directory" but doesn't inform the user about which file was missing. Add
context to the parsing (with filename) and to the from_config_file() calls
(with information where the path is coming from).
Fixes: #6771
Depends-on: github.com/kata-containers/tests#5627
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Currently, ARCH value is being set to powerpc64le by default.
powerpc64le is only right in context of rust and any operation
which might use this variable for a different purpose would fail on ppc64le.
Fixes: #6741
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.
Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
Never ever try to close the same fd double times, even in a unit test.
A file descriptor is a number which will be reused, so when you close
the same number twice you may close another file descriptor in the second
time and then there will be an error 'Bad file descriptor (os error 9)'
while the wrongly closed fd is being used.
Fixes: #6679
Signed-off-by: Tim Zhang <tim@hyper.sh>
In cases where the D-Bus connection fails, add a little additional context about
the origin of the error.
Fixes: 6561
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Suggested-by: Archana Shinde <archana.m.shinde@intel.com>
Spell-checked-by: Greg Kurz <gkurz@redhat.com>
There can be an error while connecting to the cgroups managager, for
example a `ENOENT` if a file is not found. Make sure that this is
reported through the proper channels instead of causing a `panic()`
that does not provide much information.
Fixes: #6561
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Reported-by: Greg Kurz <gkurz@redhat.com>
This PR is a continuing work for (kata-containers#3679).
This generalizes the previous VFIO device handling which only
focuses on PCI to include AP (IBM Z specific).
Fixes: kata-containers#3678
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Initial VFIO-AP support (#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.
Since then, the general handling logic was improved (#1190), but it
assumed PCI in several places.
In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.
The agent awaits the passed APQNs through uevents and triggers a
rescan directly.
Fixes: #3678
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
e.g., split_vfio_option is PCI-specific and should instead be named
split_vfio_pci_option. This mutually affects the runtime, most notably
how the labels are named for the agent.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>