Commit Graph

956 Commits

Author SHA1 Message Date
Greg Kurz
993ecec932 virtiofsd: Convert legacy -o sub-options to their -- replacement
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.

Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.

Fixes #7111

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit a43ea24dfc)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-21 11:22:44 +02:00
Greg Kurz
2e9125c32b virtiofsd: Drop -o no_posix_lock
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.

We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.

According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :

   The C implementation of the daemon has limited support for
   remote POSIX locks, restricted exclusively to non-blocking
   operations. We tried to implement the same level of
   functionality in #2, but we finally decided against it because,
   in practice most applications will fail if non-blocking
   operations aren't supported.

   Implementing support for non-blocking isn't trivial and will
   probably require extending the kernel interface before we can
   even start working on the daemon side.

There is thus no justification to pass `-o no_posix_lock` anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8e00dc6944)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-21 11:22:35 +02:00
Greg Kurz
407727e1fb virtiofsd: Stop using deprecated -f option
The rust implementation of virtiofsd always runs foreground and
spits a deprecation warning when `-f` is passed.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 2a15ad9788)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-21 11:22:28 +02:00
Beraldo Leal
075a311282 runtime: sending SIGKILL to qemu
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes #6757.
Backport of #6959.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
(cherry picked from commit 0e47cfc4c7)
2023-05-25 13:46:47 -04:00
XDTG
624dc2d222 runtime: use filepath.Clean() to clean the mount path
Fix path check bypassed issuse introduced by #6082,
use filepath.Clean() to clean path before check

Fixes: #6082

Signed-off-by: XDTG <click1799@163.com>
(cherry picked from commit dc86d6dac3)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-03-20 16:25:42 +01:00
James O. D. Hunt
5f6d747e6d
Merge pull request #6272 from cmaf/tracing-clh-returnctx-startVM
runtime: tracing: Fix missing ctx return
2023-02-14 08:17:45 +00:00
Bin Liu
e812c5ce66
Merge pull request #6076 from zhaojizhuang/reconnect
runtime: add reconnect timeout for vhost user block
2023-02-14 10:39:20 +08:00
Archana Shinde
7b4e5751ca
Merge pull request #5007 from larrydewey/update-rpb-main
SEV: Update ReducedPhysBits
2023-02-13 14:56:38 -08:00
Chelsea Mafrica
c453919911 runtime: tracing: Fix missing ctx return
Normally we return the context when creating a trace span so that the
ordering of spans w.r.t. calls is maintained in tracing output. Add
missing context for StartVM() for Cloud Hypervisor.

Fixes #6271

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-02-13 12:37:52 -08:00
zhaojizhuang
ca02c9f512 runtime: add reconnect timeout for vhost user block
Fixes: #6075
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-02-13 14:33:46 +08:00
Bin Liu
8a9392fd9d
Merge pull request #6188 from yahaa/Typo-fix
Typo: change tabs in comment to spaces
2023-02-11 11:19:11 +08:00
Larry Dewey
67b8f0773f SEV: Update ReducedPhysBits
Updating this field, as `cpuid` provides host level data, which is not
what a guest would expect for Reduced Phsycial Bits. In almost all
cases, we should be using `1` for the value here.

Amend: Adding unit test change.

Fixes: #5006

Signed-off-by: Larry Dewey <larry.dewey@amd.com>
2023-02-10 13:19:33 -06:00
yaoyinnan
bdf20b5d26 rootfs: support EROFS filesystem
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.

On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.

Fixes: #6063

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-11 00:44:13 +08:00
Bin Liu
407d3146e6
Merge pull request #6234 from UiPath/fix-clh-timeout
clh: Enforce API timeout only for vm.boot request
2023-02-08 21:33:56 +08:00
Alexandru Matei
ac64b021a6 clh: Enforce API timeout only for vm.boot request
launchClh already has a timeout of 10seconds for launching clh, e.g.
if launchClh or setupVirtiofsDaemon takes a few seconds the context's
deadline will already be expired by the time it reaches bootVM

Fixes #6240
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-02-08 11:14:51 +02:00
Bin Liu
56071c6e7b virtiofsd: change cache mod to const
Change cache mod from literal to const and place them in one place.

Also set default cache mode from `none` to `never` in
`pkg/katautils/config-settings.go.in`.

Fixes: #6151

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-02-08 15:06:52 +08:00
joannejchen
9794c52c65 improvement: Fix naming conventions for span name and log subsystem
Normally, the span name should be the same as the function name, and the log subsystem should not contain spaces.

Fixes #6153

Signed-off-by: joannejchen <chenjjoanne@gmail.com>
2023-02-06 08:25:49 -06:00
yahaa
e071d9251f Typo: change tabs in comment to spaces
Fixes: #6150

Signed-off-by: yahaa <1477765176@qq.com>
2023-02-02 12:08:33 +08:00
Greg Kurz
334c4b8bdc runtime: Drop QEMU log file support
The QEMU log file is essentially about fine grain tracing of QEMU
internals and mostly useful for developpers, not production. Notably,
the log file isn't limited in size, nor rotated in any way. It means
that a container running in the VM could possibly flood the log file
with a guest triggerable trace. For example, on openshift, the log
file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means
that each pod running with the kata runtime could potentially consume
this amount of host RAM which is not acceptable.

Error messages are best collected from QEMU's stderr as kata is doing
now since PR #5736 was merged. Drop support for the QEMU log file
because it doesn't bring any value but can certainly do harm.

Fixes #6173

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-31 09:20:29 +01:00
zhaojizhuang
9092c23a2e runtime: Add hmp for qemu
Fixes: #6092
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-29 14:22:04 +08:00
Greg Kurz
af125b1498
Merge pull request #5736 from gkurz/no-qemu-daemonize
runtime: Start QEMU undaemonized and get logs
2023-01-27 16:33:48 +01:00
Greg Kurz
39fe4a4b6f runtime: Collect QEMU's stderr
LaunchQemu now connects a pipe to QEMU's stderr and makes it
usable by callers through a Go io.ReadCloser object. As
explained in [0], all messages should be read from the pipe
before calling cmd.Wait : introduce a LogAndWait helper to handle
that.

Fixes #5780

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:17 +01:00
Greg Kurz
a5319c6be6 runtime: Start QEMU undaemonized
QEMU has always been started daemonized since the beginning. I
could not find any justification for that though, but it certainly
introduces a problem : QEMU stops logging errors when started this
way, which isn't accaptable from a support standpoint. The QEMU
community discourages the use of -daemonize ; mostly because
libvirt, QEMU's primary consummer, doesn't use this option and
prefers getting errors from QEMU's stderr through a pipe in order
to enforce rollover.

Now that virtcontainers knows how to start QEMU with a pre-
established QMP connection, let's start QEMU without -daemonize.
This requires to handle the reaping of QEMU when it terminates.
Since cmd.Wait() is blocking, call it from a goroutine.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Greg Kurz
bf4e3a618f runtime: Launch QEMU with cmd.Start()
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.

cmd.Run() is :

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().

If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Greg Kurz
8a1723a5cb runtime: Pre-establish the QMP connection
Running QEMU daemonized ensures that the QMP socket is ready to
accept connections when LaunchQemu() returns. In order to be
able to run QEMU undaemonized, let's handle that part upfront.
Create a listener socket and connect to it. Pass the listener
to QEMU and pass the connected socket to QMP : this ensures
that we cannot fail to establish QMP connection and that we
can detect if QEMU exits before accepting the connection.
This is basically what libvirt does.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Tim Zhang
20196048bf
Merge pull request #6030 from liubin/fix/6029-use-system-hugepagesize
runtime: use system pagesize for hugepage test
2023-01-16 16:57:55 +08:00
Eric Ernst
3d573ba579
Merge pull request #6050 from egernst/goos-the-vc
virtcontainers: split out linux-specific bits for mount, factory
2023-01-13 15:28:42 -08:00
Eric Ernst
458fe865ea
Merge pull request #6052 from egernst/add-darwin-skeletons
Add darwin skeletons
2023-01-13 13:14:16 -08:00
Eric Ernst
923cd3fda1 virtcontainers: split out Linux parts from mount
Mount handling is often unique in Linux. Let's ensure that the common
parts remain in mount.go, while Linux speific parts are within a linux
file.

Fixes: #6049

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-13 11:14:56 -08:00
Eric Ernst
f82918f872
Merge pull request #6045 from egernst/fix-6044
Address issues with the initial vCPU pinning functionality
2023-01-13 11:06:42 -08:00
GabyCT
9c6e90fd55
Merge pull request #6043 from GabyCT/topic/fixerrormsg
virtcontainers: Fix misspelling in error message
2023-01-13 09:16:34 -06:00
Eric Ernst
60ff230d80 virtcontainers: Split the factory package into Linux and Darwin bits
- split template
- split factory
- add stubs for darwin

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 16:51:28 -08:00
Samuel Ortiz
ea06fe3afc virtcontainers: Add a Network API skeleton for Darwin
Empty for now.

Fixes: #6051

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:53:28 -08:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Peng Tao
4a4232b851
Merge pull request #6037 from bergwolf/github/no-netns
runtime: fix up disable_netns handling
2023-01-12 09:58:24 +08:00
Eric Ernst
e3d3b72fa2 virtcontainers: use resource control for setting CPU affinity
Let's abstract the CPU affinity, instead of calling linux only code from
sandbox.

Fixes: #6044

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-11 17:55:53 -08:00
Gabriela Cervantes
fc17d7cc41 virtcontainers: Fix misspelling in error message
This PR fixes a misspelling in the error message when it tries to run
a system without Confidential computing support.

Fixes #6042

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-01-11 21:58:07 +00:00
Peng Tao
12fd6ffc1f runtime: fix up disable_netns handling
With `disable_netns=true`, we should never scan the sandbox netns which
is the host netns in such case.

Fixes: #6021
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-11 12:25:24 +00:00
Bin Liu
8551853cfe runtime: use system pagesize for hugepage test
In TestHandleHugepages it will do a mount operation with different pagesizes,
but some systems only support 2M pagesize, test for a 1g pagesize will fail.

This commit try to fix by only mount pagesizes under `/sys/kernel/mm/hugepages`, which are
supported to mount by the OS.

Fixes: #6029

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-11 17:02:58 +08:00
Eric Ernst
07e77f5be7
Merge pull request #5994 from dcantah/virtcontainers_tests_darwin
virtcontainers: tests: Ensure Linux specific tests are just run on Linux
2023-01-10 17:13:28 -08:00
Fabiano Fidêncio
147c56bb8d
Merge pull request #6019 from liubin/fix/6018-virtiofsd-cache-mod
Change cache mode from none to never
2023-01-10 23:12:13 +01:00
Bin Liu
8225d8044e
Merge pull request #6003 from dcantah/fs-skeleton
virtcontainers: fs_share: Add Darwin skeleton
2023-01-10 17:48:45 +08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Eric Ernst
4d53303a7d
Merge pull request #6005 from dcantah/vfw-skeleton
virtcontainers: Add a Virtualization.framework skeleton
2023-01-09 15:50:04 -08:00
Bin Liu
1bae41a4d4
Merge pull request #5996 from dcantah/vfw-initial
virtcontainers: Introduce hypervisor_darwin
2023-01-09 11:37:02 +08:00
Samuel Ortiz
fa9ae9362c virtcontainers: Add a Virtualization.framework skeleton
Fixes: #6004

A Virtualization.framework based Hypervisor implementation.
This is just stubs for now to eventually get this building.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-08 07:40:21 -08:00
Eric Ernst
d48b22bb13 virtcontainers: fs_share: add Darwin skeleton
Fixes: #6002

As a first pass for testing, let's add a skeleton for filesystem
sharing support on Darwin..

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-07 19:56:47 -08:00
Bin Liu
bc8a6423e0
Merge pull request #5986 from dcantah/nydus-nonetns
nydus: net-ns handling needs to be only executed on Linux hosts
2023-01-07 11:19:07 +08:00
Eric Ernst
fafc7a8b1a virtcontainers: tests: Ensure Linux specific tests are just run on Linux
Fixes: #5993

Several tests utilize linux'isms like Mounts, bindmounts, vsock etc.

Let's ensure that these are still tested on Linux, but that we also skip
these tests when on other operating systems (Darwin). This commit just
moves tests; there shouldn't be any functional test changes. While the
tests still won't be runnable on Darwin/other hosts yet, this is a necessary
step forward.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-06 11:09:11 -08:00
Fabiano Fidêncio
efa4fc0b25 clh: Add hotplug support for network devices
This is needed in order to have Moby / Docker working properly with
Cloud Hypervisor, as Moby / Docker relies on hotplugging a network
device to the VM as a preStartHook.

Fixes: #5997

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-01-06 18:59:47 +01:00