QEMU has always been started daemonized since the beginning. I
could not find any justification for that though, but it certainly
introduces a problem : QEMU stops logging errors when started this
way, which isn't accaptable from a support standpoint. The QEMU
community discourages the use of -daemonize ; mostly because
libvirt, QEMU's primary consummer, doesn't use this option and
prefers getting errors from QEMU's stderr through a pipe in order
to enforce rollover.
Now that virtcontainers knows how to start QEMU with a pre-
established QMP connection, let's start QEMU without -daemonize.
This requires to handle the reaping of QEMU when it terminates.
Since cmd.Wait() is blocking, call it from a goroutine.
Signed-off-by: Greg Kurz <groug@kaod.org>
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.
cmd.Run() is :
func (c *Cmd) Run() error {
if err := c.Start(); err != nil {
return err
}
return c.Wait()
}
Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().
If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.
Signed-off-by: Greg Kurz <groug@kaod.org>
Running QEMU daemonized ensures that the QMP socket is ready to
accept connections when LaunchQemu() returns. In order to be
able to run QEMU undaemonized, let's handle that part upfront.
Create a listener socket and connect to it. Pass the listener
to QEMU and pass the connected socket to QMP : this ensures
that we cannot fail to establish QMP connection and that we
can detect if QEMU exits before accepting the connection.
This is basically what libvirt does.
Signed-off-by: Greg Kurz <groug@kaod.org>
QEMU's -qmp option can be passed the file descriptor of a socket that
is already in listening mode. This is done with by passing `fd=XXX`
to `-qmp` instead of a path. Note that these two options are mutually
exclusive : QEMU errors out if both are passed, so we check that as
well in the validation function.
While here add the `path=` stanza in the path based case for clarity.
Signed-off-by: Greg Kurz <groug@kaod.org>
When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.
Signed-off-by: Greg Kurz <groug@kaod.org>
After building the binary as usual with `cargo build` run it as follows.
It needs a configuration.toml in which only qemu keys `path`, `kernel`
and `initrd` will initially need to be set. Point them to respective
files e.g. from a kata distribution tarball.
It also needs to be launched from an exported container bundle
directory. One can be created by running
mkdir rootfs
podman export $(podman create busybox) | tar -C ./rootfs -xvf -
runc spec -b .
in a suitable directory.
Then launch the program like this:
KATA_CONF_FILE=/path/to/configuration-qemu.toml /path/to/shim-ctl
Fixes: #5817
Signed-off-by: Pavel Mores <pmores@redhat.com>
This does almost literally nothing so far apart from getting and setting
HypervisorConfig. It's mostly copied from/inspired by dragonball.
Signed-off-by: Pavel Mores <pmores@redhat.com>
If a pod of kata is deployed on a machine, after the machine restarts, the pod status of kata-deploy will be CrashLoopBackOff.
Fixes: #5868
Signed-off-by: SinghWang <wangxin_0611@126.com>
None of the host namespace paths make sense in the guest. Let's clear
them all before sending the spec to the agent.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
We should test is_pid_namespace_enabled before amending the container
spec, where the pid namespace path is cleared and resulting
sandbox_pidns to always being false.
Fixes: #5881
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Strings in Rust don't have \0 at the end, but C does, which leads to `umount2`
in the libc can't get the correct path. Besides, calling `nix::mount::umount2`
to avoid using an unsafe block is a robust solution.
Fixes: #5871
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Standalone share fs should add virtiofs device in setup_device_before_start_vm
and return the storages to mount the directory in guest. And it uses
hypervisor's jailer root directly instead of jail config.
Besides, we tweaked the parameter, so it adapts to rust version virtiofsd
now. And its cache policy which forbids caching is "never" now, instead of
"none". Hence, we change the default cache mode.
Fixes: #5655
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
For now, rng init is too slow for kata3.0/dragonball. Enable
random_trust_cpu can speed up rng init when kernel boot.
Fixes: #5870
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Cgroup manager for a container will always be created.
Thus, dropping the option for LinuxContainer.cgroup_manager
is feasible and could simplify the code.
Fixes: #5778
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Use pidfd_open and poll on newer versions of Linux to wait
for the process to exit. For older versions use existing wait logic
Fixes: #5617
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Fixed the issue when using nonblocking, the `tokio::io::copy()` needing
to handle EAGAIN, resulting in high CPU usage.
Fixes: #5740
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Removed the `Debug` trait for the `ShareFs` and etc. Renamed
`ShareFsMount::upgrade()` and `ShareFsMount::downgrade()` to
`upgrade_to_rw()` and `downgrade_to_ro()`. Protected `mounted_info_set`
with a mutex to avoid race conditions.
Fixes: #5588
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
This commit implemented umonut controls and permission controls. When a volume
is no longer referenced, it will be umounted immediately. When a volume mounted
with readonly permission and a new coming container needs readwrite permission,
the volume should be upgraded to readwrite permission. On the contrary, if a
volume with readwrite permission and no container needs readwrite, then the
volume should be downgraded.
Fixes: #5588
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
Implemented bind mount related managment on the sandbox side, involving bind
mount a volume if it's not mounted before, upgrade permission to readwrite if
there is a new container needs.
Fixes: #5588
Signed-off-by: Xuewei Niu <justxuewei@apache.org>