Commit Graph

221 Commits

Author SHA1 Message Date
Samuel Ortiz
fa0e9dc6b1 virtcontainers: Make all Linux VMMs only build on Linux
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
c91035d0e1 virtcontainers: Move non QEMU specific constants to hypervisor.go
Hotplugging errors and 9pfs size are not particularily QEMU specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
luodaowen.backend
2d9f89aec7 feature(nydusd): add nydusd support to introduse lazyload ability
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.

Fixes #2724

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-11 21:41:17 +08:00
Samuel Ortiz
e0b264430d virtcontainers: Define a Network interface
And move the Linux implementation into a GOOS specific file.

Fixes #3005

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
844eb61992 virtcontainers: Have CreateVM use a Network reference
We are replacing the NetworkingNamespace structure with the Network
one, so we should have the hypervisor interface switching to it as well.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Jakob Naucke
7ffe9e5198
virtcontainers: Do not add a virtio-rng-ccw device
On s390x, skip adding a virtio-rng device. The on-chip CPACF provides
entropy instead. For Confidential Containers, when using Secure
Execution, entropy attacks on virtio-rng are mitigated.

Fixes: #3598
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-02 17:06:20 +01:00
Julio Montes
1f29478b09 runtime: suppport split firmware
firmware can be split into FIRMWARE_VARS.fd (UEFI variables as
configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI
variables can be customized per each user while UEFI code is kept same.

fixes #3583

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-02-01 13:40:19 -06:00
Fabiano Fidêncio
ec6655af87 govmm: Use govmm from our own pkg
Let's stop using govmm from kata-containers/govmm and let's start using
it from our own repo.

Fixes: #3495

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-01-19 18:02:46 +01:00
Julio Montes
49223e67af runtime: remove enable_swap option
`enable_swap` option was added long time ago to add
`-realtime mlock=off` to the QEMU's command line.
Kata now supports QEMU 6, `-realtime` option has been deprecated and
`mlock=on` is causing unexpected behaviors in kata.
This patch removes support for `enable_swap`, `-realtime` and `mlock=`
since they are causing bugs in kata.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-01-18 11:12:29 -06:00
bin
03546f75a6 runtime: change io/ioutil to io/os packages
Change io/ioutil to io/os packages because io/ioutil package
is deprecated from 1.16:

Discard => io.Discard
NopCloser => io.NopCloser
ReadAll => io.ReadAll
ReadDir => os.ReadDir
ReadFile => os.ReadFile
TempDir => os.MkdirTemp
TempFile => os.CreateTemp
WriteFile => os.WriteFile

Details: https://go.dev/doc/go1.16#ioutil

Fixes: #3265

Signed-off-by: bin <bin@hyper.sh>
2021-12-15 07:31:48 +08:00
bin
40bd34caaf runtime: only call stopVirtiofsd when shared_fs is virtio-fs
If shared_fs is set to virtio-9p, the virtiofsd is not started,
so there is no need to stop it.

Fixes: #3219

Signed-off-by: bin <bin@hyper.sh>
2021-12-07 16:06:26 +08:00
Eric Ernst
ce92cadc7d vc: hypervisor: remove setSandbox
The hypervisor interface implementation should not know a thing about
sandboxes.

Fixes: #2882

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
2227c46c25 vc: hypervisor: use our own logger
This'll end up moving to hypervisors pkg, but let's stop using virtLog,
instead introduce hvLogger.

Fixes: #2884

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
4c2883f7e2 vc: hypervisor: remove dependency on persist API
Today the hypervisor code in vc relies on persist pkg for two things:
1. To get the VM/run store path on the host filesystem,
2. For type definition of the Load/Save functions of the hypervisor
   interface.

For (1), we can simply remove the store interface from the hypervisor
config and replace it with just the path, since this is all we really
need. When we create a NewHypervisor structure, outside of the
hypervisor, we can populate this path.

For (2), rather than have the persist pkg define the structure, let's
let the hypervisor code (soon to be pkg) define the structure. persist
API already needs to call into hypervisor anyway; let's allow us to
define the structure.

We'll probably want to look at following similar pattern for other parts
of vc that we want to make independent of the persist API.

In doing this, we started an initial hypervisors pkg, to hold these
types (avoid a circular dependency between virtcontainers and persist
pkg). Next step will be to remove all other dependencies and move the
hypervisor specific code into this pkg, and out of virtcontaienrs.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
34f23de512 vc: hypervisor: Remove need to get shared address from sandbox
Add shared path as part of the hypervisor config

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
860f30882a virtcontainers: move oci, uuid packages top level
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.

While at it, ensure we run gofmt on the changed files.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
bin
09f7962ff1 runtime: merge virtcontainers/pkg/types into virtcontainers/types
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.

Fixes: #3031

Signed-off-by: bin <bin@hyper.sh>
2021-11-12 15:06:39 +08:00
Chelsea Mafrica
09d5d8836b runtime: tracing: Change method for adding tags
In later versions of OpenTelemetry label.Any() is deprecated. Create
addTag() to handle type assertions of values. Change AddTag() to
variadic function that accepts multiple keys and values.

Fixes #2547

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-11-04 10:19:05 -07:00
Manohar Castelino
f434bcbf6c hypervisor: createSandbox is CreateVM
Last of a series of commits to export the top level
hypervisor generic methods.

s/createSandbox/CreateVM

Fixes #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
76f1ce9e30 hypervisor: startSandbox is StartVM
s/startSandbox/StartVM

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
fd24a695bf hypervisor: waitSandbox is waitVM
renaming...

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
a6385c8fde hypervisor: stopSandbox is StopVM
Renaming. There is no Sandbox specific logic except tracing.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
f989078cd2 hypervisor: resumeSandbox is ResumeVM
renaming...

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
73b4f27c46 hypervisor: saveSandbox is SaveVM
rename

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
7308610c41 hypervisor: pauseSandbox is nothing but PauseVM
renaming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
8f78e1cc19 hypervisor: The SandboxConsole is the VM's console
update naming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
4d47aeef2e hypervisor: Export generic interface methods
This is in preparation for creating a seperate hypervisor package.
Non functional change.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
6baf2586ee hypervisor: Minimal exports of generic hypervisor internal fields
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.

Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Chelsea Mafrica
4ce2b14e60
Merge pull request #2817 from jodh-intel/clh+fc-agent-tracing
Enable agent tracing for hybrid VSOCK hypervisors
2021-10-18 22:01:52 -07:00
James O. D. Hunt
e61f5e2931 runtime: Show socket path in kata-env output
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.

The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.

This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.

Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 11:45:29 +01:00
Feng Wang
adc9e0baaf runtime: fix two bugs in rootless hypervisor
Update the sandbox dir clean up logic to be more appropriate
Add different seeds for randInt() method

Fixes #2770

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-10-08 15:52:42 -07:00
Fupan Li
988eb95621
Merge pull request #2760 from liubin/fix/2759-optimize-code-for-managing-temp-users
runtime: optimize code for managing temp users for rootless mode
2021-10-08 13:49:14 +08:00
bin
bf8f582c1d runtime: optimize code for managing temp users for rootless mode
This commit does two chagnes:

- move code for managing temp users to rootless.go.
- use common function in qemu.go when shutdown the VM.

Fixes: #2759

Signed-off-by: bin <bin@hyper.sh>
2021-10-08 11:04:21 +08:00
David Gibson
b57613f53e
Merge pull request #1682 from dgibson/rescan
Remove forced PCI rescans from agent
2021-09-29 13:03:55 +10:00
Feng Wang
e5fe53f0a9 runtime: fix nil reference in cleanup rootless user
It seems the client (crio) can send multiple requests to stop the Kata VM,
resulting a nil reference if the uid has already been cleaned up by a different thread.

Fixes #2743

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-27 21:28:47 -07:00
David Gibson
ad45c52fbe runtime/device: Record guest PCI path for VFIO devices
For several device types which correspond to a PCI device in the guest
we record the device's PCI path in the guest.  We don't currently do
that for VFIO devices, but we're going to need to for better handling
of SR-IOV devices.

To accomplish this, we have to determine the guest PCI path from the
information the VMM gives us:

For qemu, we query the slot of the device and its bridge from QMP.

For cloud-hypervisor, the device add interface gives us a guest PCI
address.  In fact this represents a design error in the clh API -
there's no way it can really know the guest PCI address in general.
It works in this case, because clh doesn't use PCI bridges, so the
device will always be on the root bus.  Based on that, the PCI path is
simply the device's slot number.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
David Gibson
5c2af3e308 runtime/device: Refactor hotplugVFIODevice() to have common exit path
hotplugVFIODevice() has several different paths depending if we're
plugging into a root port or a PCIE<->PCI bridge and if we're using a
regular or mediated VFIO device.

We're going to want some common code on the successful exit path here,
so refactor the function to allow that without duplication.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
David Gibson
8bbcb06af5 qemu: Disable SHPC hotplug
Under certain circumstances[0] Kata will attempt to use SHPC hotplug
for PCI devices on the guest.  In fact we explicitly enable SHPC on
our PCI to PCI bridges, regardless of the qemu default.

SHPC was designed a long, long time ago for physical hotplugging and
works very poorly for a virtual environment. In particular it has a
mandatory 5s delay to allow a (real, human) operator to back out the
operation if they press a button by mistake. This alone makes it
unusable for a fast start up application like Kata.

Worse, the agent forces a PCI rescan during startup.  That will race
with the SHPC hotplug operation causing the device to go into a bad
state where config space can't be accessed from the guest at all.

The only reason we've sort of gotten away with this is that our
default guest kernel configuration triggers what's arguably a kernel
bug effectively disabling SHPC.  That makes the agent rescan the only
reason we see the new device.

Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on
the q35 machine, we can explicitly disable SHPC in all cases.  It's
nothing but trouble.

fixes #2174

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-23 10:27:26 +10:00
Feng Wang
1cfe59304d runtime: Run QEMU using a non-root user/group
A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.

Fixes #2444

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-17 11:28:44 -07:00
David Gibson
64bb803fcf runtime/qemu: Move from query-cpus to query-cpus-fast
We recently updated to using qemu-6.1 (from qemu 5.2).  Unfortunately one
breaking change in qemu 6.0 wasn't caught by the CI.

The query-cpus QMP command has been removed, replaced by query-cpus-fast
(which has been available since qemu 2.12).  govmm already had support for
query-cpus-fast, we just weren't using it, so the change is quite easy.

fixes #2643

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-15 16:41:26 +10:00
Hui Zhu
bcc9fa3b35 hotplugAddBlockDevice: Use ExecuteBlockdevAddWithDriverCache with swap
Use ExecuteBlockdevAddWithDriverCache with swap in
hotplugAddBlockDevice to handle swap file cannot work OK with
ExecuteBlockdevAddWithCache issue.

Fixes: #2548

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-09-01 14:13:11 +08:00
Peng Tao
c0daa4ebff
Merge pull request #2513 from cmaf/tracing-tracingtags-consistency
tracing: Change runtime tracing tags to vars
2021-08-31 10:25:10 +08:00
Binbin Zhang
ff37f5c798 runtime: Optimize the way slice created
Initialize and assign a value, reducing one append operation

Fixes: #2264

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-28 04:15:59 +08:00
Chelsea Mafrica
8f0f949abf tracing: Move dynamically added attributes to Trace()
Where possible, move attributes added with AddTag() to Trace() call to
reduce the amount of code used for tracing.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-27 08:26:40 -07:00
Chelsea Mafrica
8058e97212 tracing: Change runtime tracing tags to vars
Tracing tags are stored inconsistently throughout the runtime. Change
all instances of tracing tags to variables.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-26 15:55:32 -07:00
Julio Montes
47d95dc1c6 runtime: virtcontainers: fix govet fieldalignment
Fix structures alignment

fixes #2271

Depends-on: github.com/kata-containers/tests#3727

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Hui Zhu
243d4b8689 runtime: Sandbox: Add addSwap and removeSwap
addSwap will create a swap file, hotplug it to hypervisor as a special
block device and let agent to setup it in the guest kernel.
removeSwap will remove the swap file.

Just QEMU support addSwap.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:40 +08:00
Eric Ernst
feeb1ef8b1
Merge pull request #2212 from lifupan/fix_virtiofsd
qemu: stop the virtiofsd specifically
2021-07-12 13:56:04 -07:00
Benjamin Porter
b10e3e22b5
tracing: Consolidate tracing into a new katatrace package
Removes custom trace functions defined across the repo and creates
a single trace function in a new katatrace package. Also moves
span tag management into this package and provides a function to
dynamically add a tag at runtime, such as a container id, etc.

Fixes #1162

Signed-off-by: Benjamin Porter <bporter816@gmail.com>
2021-07-11 14:19:51 -05:00
fupan.lfp
8f76626fd6 qemu: stop the virtiofsd specifically
We'd better stop the virtiofsd specifically after stop qemu,
instead of depending on the qemu's termination to notify virtiofsd
to exit.

Fixes: #2211

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-07-10 17:26:19 +08:00
Fupan Li
2de9c5b41d
Merge pull request #1969 from liubin/feature/1968-pass-span-context-to-agent
Pass span context from runtime to agent to get a full trace #1968
2021-07-03 09:31:02 +08:00
bin
bd5951247c runtime: add spans and attributes for agent/mount
Add more spans and attributes for agent setup, add devices,
and mount volumes.

Fixes: #1968

Signed-off-by: bin <bin@hyper.sh>
2021-07-02 10:07:28 +08:00
Jakob Naucke
8310a3d70a
virtcontainers: Don't fail memory hotplug
Architectures that do not support memory hotplugging will fail when
memory limits are set because that amount is hotplugged. Issue a warning
instead. The long-term solution is virtio-mem.

Fixes: #1412
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-06-24 10:58:06 +02:00
Marcel Apfelbaum
ac6b9c53d2 runtime: Hot-plug virtio-mem device on PCI bridge
Currently the virtio-mem device is hotplugged on the root bus.
This doesn't work for PCIe machines like q35.

Hotplug the virtio-mem device into the pci bridge instead.

Fixes #1953
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2021-06-22 12:34:48 +03:00
Marcel Apfelbaum
789a59549e virtcontainers: Remove the pc machine
Keeping around two different x86 machines has no added value
and require more tests and maintenance. Prefer the q35 machine
since it has more features and drop the pc machine.

Fixes #1953
Depends-on: github.com/kata-containers/tests#3586
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2021-06-22 11:54:07 +03:00
Julio Montes
7834f4127f virtcontainers: change memory_offset to uint64
`memory_offset` is used to increase the maximum amount of memory
supported in a VM, this offset is equal to the NVDIMM/PMEM device that
is hot added, in real use case workloads such devices are bigger than
4G, which is the current limit (uint32).

fixes #2006

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00
Chelsea Mafrica
8ca0207281 tracing: Add sandbox and container ID to trace spans
Add sandbox, container, and hypervisor IDs to trace spans. Note that
some spans in sandbox.go are created with a trace() call from api.go.
These spans have additional attributes set after span creation to
overwrite the api attributes.

Fixes #1878

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-06-02 21:53:54 -07:00
Bin Liu
1673110ee9
Merge pull request #1930 from jcvenegas/kata-moinitor-export-virtiofsd
metrics: Add virtiofsd exporter
2021-06-03 10:38:55 +08:00
Carlos Venegas
2234b73090 metrics: Add virtiofsd exporter
Export proc stats for virtiofsd.

This commit only adds for hypervisors that have support for it.

- qemu
- cloud-hypervisor

Fixes: #1926

Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
2021-06-02 16:06:00 +00:00
Peng Tao
41e04495f4
Merge pull request #1943 from bergwolf/cleanup2
cleanup TODOs in runtime
2021-06-01 14:16:46 +08:00
Chelsea Mafrica
bcde703b36
Merge pull request #1859 from cmaf/tracing-attributes-1
tracing: Make runtime span attributes more consistent
2021-05-31 21:57:58 -07:00
Peng Tao
fee0004ad4 runtime: remove TODO about hot add memory in qemu.go
Already addressed by https://github.com/kata-containers/runtime/pull/786

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-05-29 11:15:50 +08:00
bin
72cd8f5ef6 virtiofsd: refactor qemu.go to use code in virtiofsd.go
CloudHypervisor is using virtiofsd.go to manage virtiofsd process,
but qemu has its code in qemu.go. This commit let qemu to re-use
code in virtiofsd.go to reduce code and improve maintenanceability.

Fixes: #1933

Signed-off-by: bin <bin@hyper.sh>
2021-05-29 11:00:05 +08:00
Chelsea Mafrica
05a46fede0 tracing: Make runtime span attributes more consistent
Span attributes (tags) are not consistent in runtime tracing, so
designate and use core attributes such source, package, subsystem, and
type as span metadata for more understandable output.

Use WithAttributes() during span creation to reduce calls to
SetAttributes().

Modify Trace() in katautils to accept slice of attributes so multiple
functions using different attributes can use it.

Fixes #1852

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-05-27 10:07:11 -07:00
Samuel Ortiz
2c4e4ca1ac
Merge pull request #1590 from devimc/2021-02-02/ConfidentialComputing
Support TDx
2021-05-10 22:19:40 +02:00
Julio Montes
0affe8860d virtcontainers: define confidential guest framework
Define the structure and functions needed to support confidential
guests, this commit doesn't add support for any specific technology,
support for TDX, SEV, PEF and others will be added in following
commits.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-05-06 10:09:05 -05:00
Hui Zhu
7f7c3fc8ec qemu.go: qemu: resizeMemory: Fix virtio-mem resize overflow issue
This commit change sizeByte from uint32 to uint64 to fix overflow issue.

Fixes: #1796

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-05-06 14:13:50 +08:00
Hui Zhu
c9053ea3fb qemu.go: qemu: setupVirtioMem: let sizeMB be multiple of 2Mib
Got:
FATA[0000] run pod sandbox: rpc error: code = Unknown desc = failed to
create containerd task: Add 189759MB virtio-mem-pci fail QMP command
failed: backend memory size must be multiple of 0x200000: unknown

This commit let sizeMB be multiple of 2Mib to fix the issue.

Fixes: #1796

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-05-06 14:13:48 +08:00
Jakob Naucke
3ee61776d6
virtcontainers: Enable virtio-fs on s390x
Allow and configure vhost-user-fs devices (virtio-fs) on s390x. As a
consequence, appendVhostUserDevice now takes a context, which affects
its signature for other architectures.

Fixes: #1753

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:54:08 +02:00
Jakob Naucke
adba4532a4
virtcontainers: Revert "virtcontainers: Allow s390x appendVhostUserDevice"
This reverts commit 7f60911333.
Patch allowed other vhost user devices besides FS not supported on s390x
and failed to attach a CCW device number, which results in the
inavailability to use more devices after vhost-user-fs-ccw.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:43:33 +02:00
Eric Ernst
23a8179184
Merge pull request #1756 from egernst/leave-no-virtiofs-behind
qemu: kill virtiofsd if failure to start VMM
2021-04-27 17:16:33 -07:00
Wainer dos Santos Moschetta
3677640811 runtime/virtcontainers: Fix typo on qmp error msg
"negotiate" was misspelled on qemu's qmp error message.

Fixes #1764
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-04-27 11:52:42 -04:00
Eric Ernst
a57c8ab1be qemu: kill virtiofsd if failure to start VMM
If the QEMU VMM fails to launch, we currently fail to kill virtiofsd,
resulting in leftover processes running on the host. Let's make sure we
kill these, and explicitly cleanup the virtiofs socket on the
filesystem.

Ideally we'll migrate QEMU to utilize the same virtiofsd interface that
CLH uses, but let's fix this bug as a first step.

Fixes: #1755

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-04-26 21:07:20 -07:00
Chelsea Mafrica
1c222c75ac
Merge pull request #1697 from jodh-intel/improve-agent-shutdown-handling
Improve agent shutdown handling
2021-04-20 21:25:36 -07:00
Jakob Naucke
7f60911333
virtcontainers: Allow s390x appendVhostUserDevice
Remove the prohibition of vhost-user devices on s390x, which are by now
supported (e.g. vhost-user-fs-ccw). As a consequence,
appendVhostUserDevice no longer needs an error in its signature.
This enables virtio-fs support on s390x.

Fixes: #1469

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-20 12:20:32 +02:00
James O. D. Hunt
9256e590dc shutdown: Don't sever console watcher too early
Fixed logic used to handle static agent tracing.

For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.

Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.

The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.

Fixes: #1696.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:22:00 +01:00
Jakob Naucke
31ced01eba
virtcontainers: Fix missing contexts in s390x
#1389 has added a context for many signatures to improve trace spans.
Functions specific to s390x lack this. Add context where required. This
affects some common code signatures, since some functions that do not
require context on other architectures do require it on s390x.
Also remove an unnecessary import in test_qemu_s390x.go.

Fixes: #1562

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-03-29 17:49:27 +02:00
Chelsea Mafrica
f3ebbb1f1a runtime: Fix trace span ordering
Return ctx in trace() functions to correct span ordering.

Fixes #1550

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-25 11:43:04 -07:00
Peng Tao
74192d179d runtime: fix static check errors
It turns out we have managed to break the static checker in many
difference places with the absence of static checker in github action.
Let's fix them while enabling static checker in github actions...

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 20:10:19 +08:00
Peng Tao
0153f76b07 runtime: gofmt code
Looks like we have merged a lot of code that is not properly formated.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-03-24 14:37:46 +08:00
Chelsea Mafrica
6b0dc60dda runtime: Fix ordering of trace spans
A significant number of trace calls did not use a parent context that
would create proper span ordering in trace output. Add local context to
functions for use in trace calls to facilitate proper span ordering.
Additionally, change whether trace function returns context in some
functions in virtcontainers and use existing context rather than
background context in bindMount() so that span exists as a child of a
parent span.

Fixes #1355

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-03-16 17:39:28 -07:00
David Gibson
72cb9287a0 vhost-user-blk: Use PciPath type for vhost user devices
VhostUserDeviceAttrs::PCIAddr didn't actually store a PCI address
(DDDD:BB:DD.F), but rather a PCI path.  Use the PciPath type and
rename things to make that clearer.

TestHandleBlockVolume previously used the bizarre value "0001:01"
which is neither a PCI address nor a PCI path for this value.  Change
it to a valid PCI path - it appears the actual value didn't matter for
that test, as long as it was consistent.

Forward port of
3596058c67

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
74f5b5febe runtime/block: Use PciPath type through block code
BlockDrive::PCIAddr doesn't actually store a PCI address
(DDDD:BB:DD.F) but a PCI path.  Use the PciPath type and rename things
to make that clearer.

TestHandleBlockVolume() previously used a bizarre value "0002:01" for
the "PCI address" which was neither an actual PCI address, nor a PCI
path.  Update it to use a PCI path - the actual value appears not to
matter in this test, as long as its consistent throughout.

Forward port of
64751f377b

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
David Gibson
32b40f5fe4 runtime/network: Use PciPath type through network handling
The "PCI address" returned by Endpoint::PciPath() isn't actually a PCI
address (DDDD:BB:DD.F), but rather a PCI path.  Rename and use the
PciPath type to clean this up and the various parts of the network
code connected to it.

Forward port of
3e589713cf

fixes #1040

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-19 09:56:08 +11:00
Fupan Li
5d1432210c
Merge pull request #1352 from liubin/fix/migrate-opentracing-to-opentelemetry
runtime: migrate from opentracing to opentelemetry
2021-02-09 10:18:10 +08:00
bin
17df9b119d runtime: migrate from opentracing to opentelemetry
This commit includes two changes:
- migrate from opentracing to opentelemetry
- add jaeger configuration items

Fixes: #1351

Signed-off-by: bin <bin@hyper.sh>
2021-02-03 17:30:49 +08:00
Jianyong Wu
b7a1f752c0 arm64: enable acpi for qemu/virt.
acpi is enabled for kata 1.x, port and rebase code for 2.x
including:
runtime: enable pflash;
agent: add acpi support for pci bus path;
packaging: enable CONFIG_RTC_DRV_EFI;

Fixes: #1317
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-01-29 22:12:43 +08:00
Eric Ernst
789fd7c1c6 blk-dev: hotplug readonly if applicable
If a block based volume is read only, let's make sure we add as a RO
device

Fixes: #1246

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-12 14:50:54 -08:00
Eric Ernst
12777b26e4 volumes: cleanup / minor refactoring
Update some headers, very minor refactoring

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-01-12 14:50:47 -08:00
Eric Ernst
9a7bcccc8e qemu: no state to save if QEMU isn't running
On pod delete, we were looking to read files that we had just deleted. In particular,
stopSandbox for QEMU was called (we cleanup up vmpath), and then QEMU's
save function was called, which immediately checks for the PID file.

Let's only update the persist store for QEMU if QEMU is actually
running. This'll avoid Error messages being displayed when we are
stopping and deleting a sandbox:

```
level=error msg="Could not read qemu pid file"
```

I reviewed CLH, and it looks like it is already taking appropriate
action, so no changes needed.

Ideally we won't spend much time saving state to persist.json unless
there's an actual error during stop/delete/shutdown path, as the persist will
also be removed after the pod is removed. We may want to optimize this,
as currently we are doing a persist store when deleting each container
(after the sandbox is stopped, VM is killed), and when we stop the sandbox.
This'll require more rework... tracked in:
  https://github.com/kata-containers/kata-containers/issues/1181

Fixes: #1179

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-12-21 11:29:44 -08:00
bin liu
4e3a8c0124 runtime: remove global sandbox variable
Remove global sandbox variable, and save *Sandbox to hypervisor struct.
For some needs, hypervisor may need to use methods from Sandbox.

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-13 09:47:09 +08:00
bin liu
e3510be867 runtime: use one line if statement to check if err is nil for qemu.go
Use `if err := q.qmpSetup(); err != nil` to reduce code and make it easy
to read. And remove checking err if last function call also return an error,
return the function call directly.

Fixes: #1081

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-09 11:42:45 +08:00
Fupan Li
d22c7cf00b
Merge pull request #1013 from liubin/feature/1012-dump-guest-memroy-on-panic
Dump guest memory when kernel panic for QEMU
2020-11-09 09:46:28 +08:00
bin liu
40418f6d88 runtime: add geust memory dump
When guest panic, dump guest kernel memory to host filesystem.
And also includes:
- hypervisor config
- hypervisor version
- and state of sandbox

Fixes: #1012

Signed-off-by: bin liu <bin@hyper.sh>
2020-11-05 16:04:21 +08:00
bin liu
5b065eb599 runtime: change govmm package
Change govmm package name from github.com/intel/govmm
to github.com/kata-containers/govmm

Fixes: #859

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-22 21:27:49 +08:00
Jianyong Wu
9eab301526 arm64: correct bridge type for QEMUVIRT
port forward PR https://github.com/kata-containers/runtime/pull/3017

Fixes: #3016
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2020-10-20 14:09:03 +08:00
David Gibson
cb9993759b runtime: Fix typo in hotplugVFIODevice()
"machineType" is misspelled as "machinneType".

Fixes: #670

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-04 14:28:51 +10:00
Jakob-Naucke
1236e22475
runtime: Add support for VFIO-AP pass-through
Recognise when a device to be hot-plugged is an IBM Adjunct Processor
(AP) device and execute VFIO AP hot-plug accordingly. Includes unittest
for recognising and uses CCW for addDeviceToBridge in hotplugVFIODevice
if appropriate.

Fixes: #491

Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Co-authored-by: Julio Montes <julio.montes@intel.com>
Reviewed-by: Alice Frosi <afrosi@redhat.com>
2020-09-01 10:41:49 +02:00
Qi Feng Huo
f5598a1bc2 Subject: [PATCH] qemu: add annotations for iommu_platform
for s390x virtio devices

Add iommu_platform annotations for qemu for ccw,
other supported devices can also make use of that.

  Fixes #603

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
2020-08-28 11:25:14 +08:00
Liam Merwick
c15ef219e5 qemu: Set govmmQemu NoReboot config Knob
The Kata architecture does not support rebooting VMs (the lifecycle
being start/exec/kill) and if a VM is killed (e.g. using sysrq-trigger),
the VM does not exit fully and other layers do not notice the state change.
Set the NoReboot config Knob so that govmmQemu.LaunchQemu() runs QEMU
with the --no-reboot command-line option.

Fixes: #2866

Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
2020-07-30 16:04:08 +01:00
Penny Zheng
7f3e8959c5 console-watcher: use console watcher to monitor guest console outputs
Import new console watcher to monitor guest console outputs, and will be
only effective when we turn on enable_debug option.
Guest console outputs may include guest kernel debug info, agent debug info,
etc.

Fixes: #389

Signed-off-by: Penny Zheng penny.zheng@arm.com
2020-07-16 05:26:19 +00:00
Penny Zheng
1099a28830 kata 2.0: delete use_vsock option and proxy abstraction
With kata containers moving to 2.0, (hybrid-)vsock will be the only
way to directly communicate between host and agent.
And kata-proxy as additional component to handle the multiplexing on
serial port is also no longer needed.
Cleaning up related unit tests, and also add another mock socket type
`MockHybridVSock` to deal with ttrpc-based hybrid-vsock mock server.

Fixes: #389

Signed-off-by: Penny Zheng penny.zheng@arm.com
2020-07-16 04:20:02 +00:00
Peng Tao
2bff7a16f5
Merge pull request #363 from liubin/feature/delete-sub-commands-332
runtime: delete unused sub-commands.
2020-07-13 11:06:27 +08:00
Julio Montes
2afbfcab99 virtcontainers: print a warning when the device to append is not supported
Print a warning message when the device to append to a QEMU VM is not
supported. This change is just to improve debuggability.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-07-08 09:36:36 -05:00
bin liu
3cf8b470cd runtime: delete Stateful from SandboxConfig
Since all containers are started from shim v2, `Stateful` is not needed.

Fixes: #332

Signed-off-by: bin liu <bin@hyper.sh>
2020-07-08 21:59:44 +08:00
bin liu
41c04648ad runtime: fix wrong issue links
Fix issue links in source codes.

Fixes: #391

Signed-off-by: bin liu <bin@hyper.sh>
2020-07-07 16:35:43 +08:00
Shuicheng Lin
bdd386ba14 qemu: Fix rtc parameter is not set to qemu
[ port from runtime commit 379f19f7ccd71ebe938d9d6fe3cfe5f05f4f02bf ]

Add default value for Clock, otherwise rtc parameter will be dropped
by Valid function. "host" is the default value in qemu for rtc clock.

Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-06-30 04:04:38 -07:00
Jia He
fa9d619e8a qemu: add cpu_features option
[ port from runtime commit 0100af18a2afdd6dfcc95129ec6237ba4915b3e5 ]

To control whether guest can enable/disable some CPU features. E.g. pmu=off,
vmx=off. As discussed in the thread [1], the best approach is to let users
specify them. How about adding a new option in the configuration file.

Currently this patch only supports this option in qemu,no other vmm.

[1] https://github.com/kata-containers/runtime/pull/2559#issuecomment-603998256

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-06-29 20:16:11 -07:00
Christophe de Dinechin
be9ca0d58b qemu: Don't leak file descriptors in case of error
[ port from runtime commit 7b269ff7aa2d62fe12593ff7040798e6c9bd5d65 ]

If we take one of the error paths from setupVirtiofsd() after
opening the fd variable, the fd.Close() function is not called.

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-06-29 01:19:18 -07:00
Peng Tao
9d90906546
Merge pull request #320 from dgibson/cleanups
Clean up some unnecessary data structures
2020-06-26 16:18:16 +08:00
Peng Tao
3bbb97add3
Merge pull request #312 from Pennyzct/network_throttle_on_qemu
rate-limiter: network I/O throttling on VM level
2020-06-25 04:59:44 +08:00
David Gibson
ea1d799f79 qemu: Only one element of qemuPaths map is relevant
The qemuPaths field in qemuArchBase maps from machine type to the default
qemu path.  But, by the time we construct it, we already know the machine
type, so that entry ends up being the only one we care about.

So, collapse the map into a single path.  As a bonus, the qemuPath()
method can no longer fail.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-06-24 21:26:43 +10:00
David Gibson
5dffffd432 qemu: Remove useless table from qemuArchBase
The supportedQemuMachines array in qemuArchBase has a list of all the
qemu machine types supported for the architecture, with the options
for each.  But, the machineType field already tells us which of the
machine types we're actually using, and that's the only entry we
actually care about.

So, drop the table, and just have a single value with the machine type
we're actually using.  As a bonus that means the machine() method can
no longer fail, so no longer needs an error return.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-06-24 21:26:38 +10:00
David Gibson
97a02131c6 qemu: Detect and fail a bad machine type earlier
Currently, newQemuArch() doesn't return an error.  So, if passed an invalid
machine type, it will return a technically valid, but unusable qemuArch
object, which will probably fail with other errors shortly down the track.

Change this, to more cleanly fail the newQemuArch itself, letting us
detect a bad machine type earlier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-06-24 21:07:33 +10:00
Penny Zheng
bd8658e362 rate-limiter: check if hypervisor supports built-in rate limiter
As for some hypervisors, like firecracker, they support built-in rate limiter
to control network I/O bandwidth on VMM level. And for some hypervisors, like qemu,
they don't.

Fixes: #250

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2020-06-24 06:14:34 +00:00
Christophe de Dinechin
487520ff74 qemu: Report all errors on virtiofsd execution
The virtiofs daemon may run into errors other than the file
not existing, e.g. the file may not be executable.

Fixes: #2682

Message is now:
  virtiofs daemon /usr/local/bin/hello returned with error:
  fork/exec /usr/local/bin/virtiofsd: permission denied

instead of
  panic: runtime error: invalid memory address or nil

Fixes: #2582

Message is now:
  virtiofs daemon /usr/local/bin/hello-not-found returned with error:
  fork/exec /usr/local/bin/hello-not-found: no such file or directory

instead of:
  virtiofsd path (/usr/local/bin/hello-no-found) does not exist

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-06-23 22:10:44 +02:00
Julio Montes
ac9cc96a6f
Merge pull request #304 from fidencio/wip/forward_port_2703
[foward port] Add vIOMMU support to qemu q35
2020-06-23 12:20:52 -05:00
Peng Tao
042135949a vc: make host shared path readonly
We need to make sure containers cannot modify host path unless it is explicitly shared to it. Right now we expose an additional top level shared directory to the guest and allow it to be modified. This is less ideal and can be enhanced by following method:
1. create two directories for each sandbox:
  -. /run/kata-containers/shared/sandboxes/$sbx_id/mounts/, a directory to hold all host/guest shared mounts
  -. /run/kata-containers/shared/sandboxes/$sbx_id/shared/, a host/guest shared directory (9pfs/virtiofs source dir)
2. /run/kata-containers/shared/sandboxes/$sbx_id/mounts/ is bind mounted readonly to /run/kata-containers/shared/sandboxes/$sbx_id/shared/, so guest cannot modify it
3. host-guest shared files/directories are mounted one-level under /run/kata-containers/shared/sandboxes/$sbx_id/mounts/ and thus present to guest at one level under /run/kata-containers/shared/sandboxes/$sbx_id/shared/

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-06-23 00:44:44 -07:00
Adrian Moreno
b97287090b qemu: enable iommu on q35
Add a configuration option and a Pod Annotation

If activated:
- Add kernel parameters to load iommu
- Add irqchip=split in the kvm options
- Add a vIOMMU to the VM

Fixes #2694
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2020-06-22 16:37:20 +02:00
Peng Tao
6de95bf36c gomod: update runtime import path
To use the kata-containers repo path.

Most of the change is generated by script:
find . -type f -name "*.go" |xargs sed -i -e \
's|github.com/kata-containers/runtime|github.com/kata-containers/kata-containers/src/runtime|g'

Fixes: #201
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-04-29 18:39:03 -07:00
Peng Tao
a02a8bda66 runtime: move all code to src/runtime
To prepare for merging into kata-containers repository.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-04-27 19:39:25 -07:00