Commit Graph

1387 Commits

Author SHA1 Message Date
Larry Dewey
123c867172 SEV: Update ReducedPhysBits
Updating this field, as `cpuid` provides host level data, which is not
what a guest would expect for Reduced Phsycial Bits. In almost all
cases, we should be using `1` for the value here.

Amend: Adding unit test change.

Fixes: #5006

Signed-off-by: Larry Dewey <larry.dewey@amd.com>
(cherry picked from commit 67b8f0773f)
2023-02-14 08:55:53 -08:00
Alexandru Matei
98f60c100c clh: Enforce API timeout only for vm.boot request
launchClh already has a timeout of 10seconds for launching clh, e.g.
if launchClh or setupVirtiofsDaemon takes a few seconds the context's
deadline will already be expired by the time it reaches bootVM

Fixes #6240
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit ac64b021a6)
2023-02-14 08:55:53 -08:00
ls
92f3b11c94 runtime:all APIs are hang in the service.mu
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time

Fixes: #6059

Signed-off-by: ls <335814617@qq.com>
(cherry picked from commit 69fc8de712)
2023-02-14 08:55:53 -08:00
Greg Kurz
92619c833e runtime: Drop QEMU log file support
The QEMU log file is essentially about fine grain tracing of QEMU
internals and mostly useful for developpers, not production. Notably,
the log file isn't limited in size, nor rotated in any way. It means
that a container running in the VM could possibly flood the log file
with a guest triggerable trace. For example, on openshift, the log
file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means
that each pod running with the kata runtime could potentially consume
this amount of host RAM which is not acceptable.

Error messages are best collected from QEMU's stderr as kata is doing
now since PR #5736 was merged. Drop support for the QEMU log file
because it doesn't bring any value but can certainly do harm.

Fixes #6173

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 334c4b8bdc)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
4f3db76780 runtime: Collect QEMU's stderr
LaunchQemu now connects a pipe to QEMU's stderr and makes it
usable by callers through a Go io.ReadCloser object. As
explained in [0], all messages should be read from the pipe
before calling cmd.Wait : introduce a LogAndWait helper to handle
that.

Fixes #5780

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 39fe4a4b6f)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
918c11e46b runtime: Start QEMU undaemonized
QEMU has always been started daemonized since the beginning. I
could not find any justification for that though, but it certainly
introduces a problem : QEMU stops logging errors when started this
way, which isn't accaptable from a support standpoint. The QEMU
community discourages the use of -daemonize ; mostly because
libvirt, QEMU's primary consummer, doesn't use this option and
prefers getting errors from QEMU's stderr through a pipe in order
to enforce rollover.

Now that virtcontainers knows how to start QEMU with a pre-
established QMP connection, let's start QEMU without -daemonize.
This requires to handle the reaping of QEMU when it terminates.
Since cmd.Wait() is blocking, call it from a goroutine.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit a5319c6be6)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
8c4507be21 runtime: Launch QEMU with cmd.Start()
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.

cmd.Run() is :

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().

If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit bf4e3a618f)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
a61fba6d45 runtime: Pre-establish the QMP connection
Running QEMU daemonized ensures that the QMP socket is ready to
accept connections when LaunchQemu() returns. In order to be
able to run QEMU undaemonized, let's handle that part upfront.
Create a listener socket and connect to it. Pass the listener
to QEMU and pass the connected socket to QMP : this ensures
that we cannot fail to establish QMP connection and that we
can detect if QEMU exits before accepting the connection.
This is basically what libvirt does.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8a1723a5cb)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
ad9cb0ba58 govmm: Optionally pass QMP listener to QEMU
QEMU's -qmp option can be passed the file descriptor of a socket that
is already in listening mode. This is done with by passing `fd=XXX`
to `-qmp` instead of a path. Note that these two options are mutually
exclusive : QEMU errors out if both are passed, so we check that as
well in the validation function.

While here add the `path=` stanza in the path based case for clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8a4f08cb0f)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Greg Kurz
d6dd99e986 govmm: Optionally start QMP with a pre-configured connection
When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.

Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 219bb8e7d0)
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-04 17:56:17 +01:00
Peng Tao
bc5bbfa60f versions: update nydusd version
To the latest stable v2.1.1.

Fixes: #5635
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit a636d426d9)
2022-12-19 16:05:03 +01:00
Bo Chen
9cf1af873b runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v28.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #5683

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 36545aa81a)
2022-11-25 17:53:03 +01:00
Alexandru Matei
719017d688 clh: return faster with dead clh process from isClhRunning
Through proactively checking if Cloud Hypervisor process is dead,
this patch provides a faster path for isClhRunning

Fixes: #5623

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit 0e3ac66e76)
2022-11-25 17:53:03 +01:00
Alexandru Matei
569ecdbe76 clh: fast exit from isClhRunning if the process was stopped
Use atomic operations instead of acquiring a mutex in isClhRunning.
This stops isClhRunning from generating a deadlock by trying to
reacquire an already-acquired lock when called via StopVM->terminate.

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit 9ef68e0c7a)
2022-11-25 17:53:03 +01:00
Alexandru Matei
fa8a0ad49b clh: don't try to stop clh multiple times
Avoid executing StopVM concurrently when virtiofs dies as a result of clh
being stopped in StopVM.

Fixes: #5622

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
(cherry picked from commit 2631b08ff1)
2022-11-25 17:53:03 +01:00
Guanglu Guo
8fbf862fa6 cloud-hypervisor: Fix GetThreadIDs function
Get vcpu thread-ids by reading cloud-hypervisor process tasks information.

Fixes: #5568

Signed-off-by: Guanglu Guo <guoguanglu@qiyi.com>
(cherry picked from commit daeee26a1e)
2022-11-25 17:53:03 +01:00
Fabiano Fidêncio
9141acd94c versions: Update Cloud Hypervisor to b4e39427080
An API change, done a long time ago, has been exposed on Cloud
Hypervisor and we should update it on the Kata Containers side to ensure
it doesn't affect Cloud Hypervisor CI and because the change is needed
for an upcoming work to get QAT working with Cloud Hypervisor.

Fixes: #5492

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 9d286af7b4)
2022-11-25 17:53:03 +01:00
Bo Chen
9a0ab92f65 runtime: clh: Use the new API to boot with TDX firmware (td-shim)
The new way to boot from TDX firmware (e.g. td-shim) is using the
combination of '--platform tdx=on' with '--firmware tdshim'.

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 067e2b1e33)
2022-11-25 17:53:03 +01:00
Bo Chen
f3eac35b55 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v27.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 5d63fcf344)
2022-11-25 17:53:03 +01:00
Fabiano Fidêncio
74791ed389 runtime: Fix gofmt issues
It seems that bumping the version of golang and golangci-lint new format
changes are required.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d94718fb30)
2022-11-25 14:10:06 +01:00
Fabiano Fidêncio
778ebb6e60 golang: Stop using io/ioutils
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 16b8375095)
2022-11-25 13:29:34 +01:00
Peng Tao
b8ce291dd0 build: update golang version to 1.19.2
So that we get the latest language fixes.

There is little use to maitain compiler backward compatibility.
Let's just set the default golang version to the latest 1.19.2.

Fixes: #5494
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit eab8d6be13)
2022-11-25 13:29:04 +01:00
Feng Wang
43b0e95800 runtime: store the user name in hypervisor config
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit f914319874)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:26 +02:00
Feng Wang
81801888a2 runtime: make StopVM thread-safe
StopVM can be invoked by multiple threads and needs to be thread-safe

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit 5cafe21770)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:25 +02:00
Feng Wang
fba39ef32d runtime: add more debug logs for non-root user operation
Previously the logging was insufficient and made debugging difficult

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit c3015927a3)
Signed-off-by: Greg Kurz <groug@kaod.org>
2022-09-27 15:58:24 +02:00
Peng Tao
e229a03cc8 runtime: update runc dependency
To bring fix to CVE-2022-29162.

Fixes: #5217
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-27 11:54:37 +08:00
Eric Ernst
9997ab064a sandbox_test: Add test to verify memory hotplug behavior
Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.

We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
f390c122f0 sandbox: don't hotplug too much memory at once
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.

During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).

From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest. (a factor of 48 results in using 75% of
provided memory for hotplug). Using prior example of a guest with 256Mi
RAM, 256 Mi * 48 = 12 Gi; 12GiB is upper end of what we should expect
can be hotplugged successfully into the guest.

Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.

If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.

Fixes: #4847

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
e0142db24f hypervisor: Add GetTotalMemoryMB to interface
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calcualting
how much memory we can add at a time when utilizing ACPI hotplug.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-30 16:37:47 -07:00
Archana Shinde
7d52934ec1
Merge pull request #4798 from amshinde/use-iouring-qemu
Use iouring for qemu block devices
2022-08-26 04:00:24 +05:30
Fabiano Fidêncio
ddc94e00b0
Merge pull request #4982 from fidencio/topic/improve-cloud-hypervisor-plus-tdx-support
TDX: Get TDX working again with Cloud Hypervisor + a minor change on QEMU's code
2022-08-25 08:53:10 +02:00
Fabiano Fidêncio
dc90eae17b qemu: Drop unnecessary tdx_guest kernel parameter
With the current TDX kernel used with Kata Containers, `tdx_guest` is
not needed, as TDX_GUEST is now a kernel configuration.

With this in mind, let's just drop the kernel parameter.

Fixes: #4981

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:43 +02:00
Fabiano Fidêncio
d4b67613f0 clh: Use HVC console with TDX
As right now the TDX guest kernel doesn't support "serial" console,
let's switch to using HVC in this case.

Fixes: #4980

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:40 +02:00
Fabiano Fidêncio
c0cb3cd4d8 clh: Avoid crashing when memory hotplug is not allowed
The runtime will crash when trying to resize memory when memory hotplug
is not allowed.

This happens because we cannot simply set the hotplug amount to zero,
leading is to not set memory hotplug at all, and later then trying to
access the value of a nil pointer.

Fixes: #4979

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:22 +02:00
Fabiano Fidêncio
9f0a57c0eb clh: Increase API and SandboxStop timeouts for TDX
While doing tests using `ctr`, I've noticed that I've been hitting those
timeouts more frequently than expected.

Till we find the root cause of the issue (which is *not* in the Kata
Containers), let's increase the timeouts when dealing with a
Confidential Guest.

Fixes: #4978

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:12 +02:00
Fabiano Fidêncio
c142fa2541 clh: Lift the sharedFS restriction used with TDX
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.

Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.

See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""

Fixes: #4977

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 17:14:05 +02:00
Peng Tao
a06d819b24 runtime: cri-o annotations have been moved to podman
Let's swith to depending on podman which also simplies indirect
dependency on kubernetes components. And it helps to avoid cri-o
security issues like CVE-2022-1708 as well.

Fixes: #4972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 18:11:37 +08:00
Bin Liu
6551d4f25a
Merge pull request #4051 from bergwolf/github/vmx-vm-factory
enable vmx for vm factory
2022-08-24 16:22:37 +08:00
Fabiano Fidêncio
9806ce8615
Merge pull request #4937 from chenhengqi/fix-error-msg
network: Fix error message for setting hardware address on TAP interface
2022-08-19 17:54:58 +02:00
Fabiano Fidêncio
828383bc39
Merge pull request #4933 from likebreath/0816/prepare_clh_v26.0
Upgrade to Cloud Hypervisor v26.0
2022-08-18 18:36:53 +02:00
Peng Tao
f508c2909a runtime: constify splitIrqChipMachineOptions
A simple cleanup.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:09:20 +08:00
Peng Tao
2b0587db95 runtime: VMX is migratible in vm factory case
We are not spinning up any L2 guests in vm factory, so the L1 guest
migration is expected to work even with VMX.

See https://www.linux-kvm.org/page/Nested_Guests

Fixes: #4050
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:08:43 +08:00
Peng Tao
fa09f0ec84 runtime: remove qemuPaths
It is broken that it doesn't list QemuVirt machine type. In fact we
don't need it at all. Just drop it.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:06:10 +08:00
Bo Chen
3a597c2742 runtime: clh: Use the new 'payload' interface
The new 'payload' interface now contains the 'kernel' and 'initramfs'
config.

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:43 -07:00
Bo Chen
16baecc5b1 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v26.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:12 -07:00
Hengqi Chen
8ff5c10ac4 network: Fix error message for setting hardware address on TAP interface
Error out with the correct interface name and hardware address instead.

Fixes: #4944

Signed-off-by: Hengqi Chen <chenhengqi@outlook.com>
2022-08-17 16:42:07 +08:00
Chelsea Mafrica
fcc1e0c617 runtime: tracing: End root span at end of trace
The root span should exist the duration of the trace. Defer ending span
until the end of the trace instead of end of function. Add the span to
the service struct to do so.

Fixes #4902

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-12 13:15:39 -07:00
Bin Liu
cb7f9524be
Merge pull request #4804 from openanolis/anolis/merge_runtime_rs_to_main
runtime-rs:merge runtime rs to main
2022-08-11 08:40:41 +08:00
Tim Zhang
4813a3cef9
Merge pull request #4711 from liubin/fix/4710-wait-nydusd-api-server-ready
nydus: wait nydusd API server ready before mounting share fs
2022-08-10 17:20:17 +08:00
liubin
2ae807fd29 nydus: wait nydusd API server ready before mounting share fs
If the API server is not ready, the mount call will fail, so before
mounting share fs, we should wait the nydusd is started and
the API server is ready.

Fixes: #4710

Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-08 16:18:38 +08:00