Commit Graph

2069 Commits

Author SHA1 Message Date
Unmesh Deodhar
12c5ef9020 packaging: add support to build OVMF for SEV
SEV requires special OVMF to work with kernel hashes.
Thus, adding changes that builds this custom OVMF for SEV.

Fixes: #6572

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:55 -05:00
Feng Wang
4e0dce6802
Merge pull request #6738 from fengwang666/oss-fix-fd-leak
runtime: Fix virtiofs fd leak
2023-05-08 10:52:36 -07:00
Eduardo Berrocal
a4c0303d89 virtcontainers: Fixed static checks for improved test coverage for fc.go
Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%.
Fixed very simple static check fail on line 202.

Fixes: #266

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-05-07 00:17:36 -07:00
Peng Tao
65670e6b0a
Merge pull request #6699 from zvonkok/cold-plug-vfio
gpu: cold plug VFIO devices
2023-05-05 10:04:29 +08:00
Archana Shinde
b86d32aba9
Merge pull request #6728 from nedsouza/256/tests_coverage_pkg_signals
pkg/signals: Improved test coverage 60% to 100%
2023-05-04 16:19:12 -07:00
Archana Shinde
9443c4aea7
Merge pull request #6729 from nedsouza/259/tests_coverage_virtcontainers_persist
virtcontainers/persist: Improved test coverage 65% to 87.5%
2023-05-04 16:18:55 -07:00
Archana Shinde
09134c30de
Merge pull request #6737 from nedsouza/265/virtcontainers-clh-go-coverage
virtcontainers/clh_test.go: improve unit test coverage
2023-05-04 16:15:43 -07:00
Zvonko Kaiser
13d7f39c71 gpu: Check for VFIO port assignments
Bailing out early if the port is wrong, allowed port settings are
no-port, root-port, switch-port

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-05-03 12:32:33 +00:00
Eduardo Berrocal
03a8cd69c2 virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%.

Fixes: #266

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-28 15:40:45 -07:00
Eduardo Berrocal
6bf1fc6051 virtcontainers/factory: Improved test coverage
Expanded tests on factory_test.go to cover more lines of code. Coverage went from 34% to 41.5% in the case of user-mode run tests,
and from 77.7% to 84% in the case of priviledge-mode run tests.

Fixes: #260

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-27 13:08:35 -07:00
Zvonko Kaiser
138ada049c gpu: Cold Plug VFIO toml setting
Added the cold_plug_vfio setting to the qemu-toml.in with some
epxlanation

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 11:04:45 +00:00
Zvonko Kaiser
f7ad75cb12 gpu: Cold-plug extend the api.md
Make the hypervisorconfig consistent in code and api.md

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 09:35:05 +00:00
Zvonko Kaiser
0fec2e6986 gpu: Add cold-plug test
Cold plug setting is now correctly decoded in toml

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 09:30:24 +00:00
Feng Wang
205909fbed runtime: Fix virtiofs fd leak
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.

Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
2023-04-26 15:53:39 -07:00
Tamas K Lengyel
0f45b0faa9 virtcontainers/clh_test.go: improve unit test coverage
Credit PR to Hackathon Team3

Fixes: #265

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
2023-04-26 19:12:51 +00:00
Zvonko Kaiser
dded731db3 gpu: Add OVMF setting for MMIO aperture
The default size of OVMFs aperture is too low to
initialized PCIe devices with huge BARs

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
2a830177ca gpu: Add fwcfg helper function
Added driver util function for easier handling of VFIO
devices outside of the VFIO module. At the sandbox level
we may need to set options depending if we have a VFIO/PCIe
device, like the fwCfg for confiential guests.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
131f056a12 gpu: Extract VFIO Functions to drivers
Some functions may be used in other modules then only in
the VFIO module, extract them and make them available to
other layers like sandbox.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
c8cf7ed3bc gpu: Add ColdPlug of VFIO devices with devManager
If we have a VFIO device and cold-plug is enabled
we mark each device as ColdPlug=true and let the VFIO
module do the attaching.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
e2b5e7f73b gpu: Add Rawdevices to hypervisor
RawDevics are used to get PCIe device info early before the sandbox
is started to make better PCIe topology decisions

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
6107c32d70 gpu: Assign default value to cold-plug
Make sure the configuration is propagated to the right structs
and the default value is assigned.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
377ebc2ad1 gpu: Add configuration option for cold-plug VFIO
Users can set cold-plug="root-port" to cold plug a VFIO device in QEMU

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
c18ceae109 gpu: Add new struct PCIePort
For the hypervisor to distinguish between PCIe components, adding
a new enum that can be used for hot-plug and cold-plug of PCIe devices

Fixes: #6687

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Eduardo Berrocal
9c38204f13 virtcontainers/persist: Improved test coverage 65% to 87.5%
Expanded tests on manager_test.go to cover more lines of code.

Fixes: #259

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-25 23:53:46 +00:00
Eduardo Berrocal
1c1ee8057c pkg/signals: Improved test coverage 60% to 100%
Expanded tests on signals_test.go to cover more lines of code. 'go test' won't show 100% coverage (only 66.7%), because one test need to spawn a new
process (since it is testing a function that calls os.Exit(1)).

Fixes: #256

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-25 23:34:13 +00:00
Fupan Li
a1568cd2f5
Merge pull request #6676 from zvonkok/gpu-runtime
gpu: Add GPU enabled confguration and runtime
2023-04-19 13:01:49 +08:00
Zvonko Kaiser
a81fff706f gpu: Adding a GPU enabled configuration
We need to set hotplug on pci root port and enable at least one
root port. Also set the guest-hooks-dir to the correct path

Fixes: #6675

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:40:09 +00:00
Zvonko Kaiser
f4f958d53c gpu: Do not pass-through PCI (Host) Bridges
On some systems a GPU is in a IOMMU group with a PCI Bridge and
PCI Host Bridge. Per default no PCI Bridge needs to be passed-through.
When scanning the IOMMU group, ignore devices with a 0x60 class ID prefix.

Fixes: #6663

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:08:23 +00:00
Fabiano Fidêncio
fffe2c6082
Merge pull request #6648 from fidencio/topic/gha-tdx-improvements-and-fixes
gha: tdx: Ensure kata-deploy is removed after the tests run
2023-04-15 00:21:31 +02:00
Fabiano Fidêncio
dc662333df runtime: Increase the dial_timeout
When testing on AKS, we've been hitting the dial_timeout every now and
then.  Let's increase it to 45 seconds (instead of 30) for all the VMMs,
and to 60 seconfs in case of TEEs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 22:42:52 +02:00
Fabiano Fidêncio
f478b9115e clh: tdx: Update timeouts for confidential guest
Booting up TDX takes more time than booting up a normal VM.  Those
values are being already used as part of the CCv0 branch, and we're just
bringing them to the `main` branch as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Alexandru Matei
db2cac34d8 runtime: Don't create socket file in /run/kata
The socket file for shim management is created in /run/kata
and it isn't deleted after the container is stopped. After
running and stopping thousands of containers /run folder
will run out of space.

Fixes #6622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Co-authored-by: Greg Kurz <groug@kaod.org>
2023-04-13 10:21:29 +03:00
Fabiano Fidêncio
3b3656d96d
Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main
tdx: Add artefacts from the latest TDX tools release into main
2023-04-11 20:43:02 +02:00
Fabiano Fidêncio
50ce33b02d
Merge pull request #6205 from fengwang666/non-root-clh
runtime: support non-root for clh
2023-04-11 19:34:00 +02:00
Fabiano Fidêncio
98682805be config: Add configuration for QEMU TDX
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 16:10:35 +02:00
Fabiano Fidêncio
3e15800199 govmm: Directly pass the firmware using -bios with TDX
Since TDX doesn't support readonly memslot, TDVF cannot be mapped as
pflash device and it actually works as RAM. "-bios" option is chosen to
load TDVF.

OVMF is the opensource firmware that implements the TDVF support. Thus
the command line to specify and load TDVF is ``-bios OVMF.fd``

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
3c5ffb0c85 govmm: Set "sept-ve-disable=on"
This is needed since 22ww49.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
ed145365ec runtime/qemu: Drop "kvm-type=tdx"
This is not supported since 22ww49.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
25b3cdd38c virtcontainers: Drop check for the tdx CPU flag
In the recent kernels provided by Intel the `tdx` CPU flag is not
present anymore.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
01bdacb4e4 virtcontainers: Also check /sys/firmwares/tdx for TDX
Let's make sure we also check /sys/firmwares/tdx for TDX guest
protection, as the location may depend on whether TDX Seam is being used
or not.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
James O. D. Hunt
cbe6f04194
Merge pull request #6501 from shippomx/dev_metrics
runtime: add filter metrics with specific names
2023-04-05 15:15:09 +01:00
Miao Xia
0f73515561 runtime: add filter metrics with specific names
The kata monitor metrics API returns a huge size response,
if containers or sandboxs are a large number,
focus on what we need will be harder.

Fixes: #6500

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
2023-03-28 14:56:13 +08:00
Bin Liu
75987aae72
Merge pull request #6408 from jongwu/nydus_rm_hybrid
nydus: upgrad to v2.2.0
2023-03-28 11:07:56 +08:00
James O. D. Hunt
f06f72b5e9
Merge pull request #6467 from jongwu/qemu-uefi-path
qemu/arm64: disable image nvdimm once no firmware offered
2023-03-22 08:43:01 +00:00
Jianyong Wu
ece5edc641 qemu/arm64: disable image nvdimm if no firmware offered
For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there
is no firmware offered, it should be disabled.

Fixes: #6468
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-03-20 18:03:05 +08:00
Hyounggyu Choi
96baa83895 agent: Bring in VFIO-AP device handling again
This PR is a continuing work for (kata-containers#3679).

This generalizes the previous VFIO device handling which only
focuses on PCI to include AP (IBM Z specific).

Fixes: kata-containers#3678
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-03-16 18:14:12 +09:00
Jakob Naucke
f666f8e2df agent: Add VFIO-AP device handling
Initial VFIO-AP support (#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.

Since then, the general handling logic was improved (#1190), but it
assumed PCI in several places.

In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.

The agent awaits the passed APQNs through uevents and triggers a
rescan directly.

Fixes: #3678
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 10:07:48 +09:00
Jakob Naucke
b546eca26f runtime: Generalize VFIO devices
Generalize VFIO devices to allow for adding AP in the next patch.
The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed.

The rationale for the revomal is:

- VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI
- Logic of checking a subtype of mediated device is included in GetVFIODeviceType()
- VFIOPciDeviceMediatedType() can simply fulfill the device addition based
on a type categorized by GetVFIODeviceType()

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 10:06:37 +09:00
Jakob Naucke
4c527d00c7 agent: Rename VFIO handling to VFIO PCI handling
e.g., split_vfio_option is PCI-specific and should instead be named
split_vfio_pci_option. This mutually affects the runtime, most notably
how the labels are named for the agent.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2023-03-16 07:43:39 +09:00
Fabiano Fidêncio
814d07af58
Merge pull request #6463 from sprt/sprt/mshv-compat
runtime: add support for Hyper-V
2023-03-15 18:03:25 +01:00
Henry Beberman
974a5c22f0 runtime: add support for Hyper-V
This adds /dev/mshv to the list of sandbox devices so that VMMs can
create Hyper-V VMs.

In our testing, this also doesn't error out in case /dev/mshv isn't
present.

Fixes #6454.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-03-13 17:13:51 -07:00
Sidhartha Mani
a6c67a161e
runtime: add support for ephemeral mounts to occupy entire sandbox memory
On hotplug of memory as containers are started, remount all ephemeral mounts with size option set to the total sandbox memory

Fixes: #6417

Signed-off-by: Sidhartha Mani <sidhartha_mani@apple.com>
2023-03-10 13:36:02 -08:00
Fabiano Fidêncio
98d611623f
Merge pull request #6361 from etrunko/main
runtime/Makefile: Fix install-containerd-shim-v2 dependency
2023-03-04 13:47:11 +01:00
Jianyong Wu
395645e1ce runtime: hybrid-mode cause error in the latest nydusd
When update the nydusd to 2.2, the argument "--hybrid-mode" cause
the following error:

thread 'main' panicked at 'ArgAction::SetTrue / ArgAction::SetFalse is defaulted'

Maybe we should remove it to upgrad nydusd

Fixes: #6407
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-03-04 12:58:48 +08:00
Eduardo Lima (Etrunko)
a9e2fc8678 runtime/Makefile: Fix install-containerd-shim-v2 dependency
$ make install
make: *** No rule to make target 'containerd-shim-kata-v2', needed by 'install-containerd-shim-v2'.  Stop.

Spotted when building kata-runtime with a different name for
SHIMV2_OUTPUT. For instance, trying to keep different runtime binaries
installed at the same time, one from master and another from lets say,
the CCv0 branch, with the following small change applied.

diff --git a/src/runtime/Makefile b/src/runtime/Makefile
index 95efaff78..2bab9eb75 100644
--- a/src/runtime/Makefile
+++ b/src/runtime/Makefile
@@ -231,7 +231,7 @@ SED = sed

 CLI_DIR = cmd
 SHIMV2 = containerd-shim-kata-v2
-SHIMV2_OUTPUT = $(bCURDIR)/$(SHIMV2)
+SHIMV2_OUTPUT = $(CURDIR)/$(SHIMV2)-ccv0
 SHIMV2_DIR = $(CLI_DIR)/$(SHIMV2)

 MONITOR = kata-monitor

Fixes: #6398

Signed-off-by: Eduardo Lima (Etrunko) <etrunko@redhat.com>
2023-03-01 15:57:30 -03:00
yanggang
b6880c60d3
logging: Correct the code notes
Fix wrong notes for func GetSandboxesStoragePathRust()

Fixes: #6394

Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-03-01 19:20:25 +08:00
Chelsea Mafrica
703589c279
Merge pull request #6369 from XDTG/6082/Fix-path-check-bypassed
runtime: use filepath.Clean() to clean the mount path
2023-02-27 17:24:50 -08:00
Bo Chen
3ac6f29e95 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v30.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #6375

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-02-24 10:20:29 -08:00
XDTG
dc86d6dac3 runtime: use filepath.Clean() to clean the mount path
Fix path check bypassed issuse introduced by #6082,
use filepath.Clean() to clean path before check

Fixes: #6082

Signed-off-by: XDTG <click1799@163.com>
2023-02-24 15:48:09 +08:00
Feng Wang
cbe6ad9034 runtime: support non-root for clh
This change enables to run cloud-hypervisor VMM using a non-root user
when rootless flag is set true in the configuration

Fixes: #2567

Signed-off-by: Feng Wang <fwang@confluent.io>
2023-02-22 13:57:09 -08:00
Amulyam24
e84af6a620 virtiofsd: update to a valid path on ppc64le
Currently the symbolic link for virtiofsd which is used as
a valid path is not updated on every CI run. Fix it by
using the actual path of installation.

Fixes: #6311

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-02-17 16:22:39 +05:30
James O. D. Hunt
5f6d747e6d
Merge pull request #6272 from cmaf/tracing-clh-returnctx-startVM
runtime: tracing: Fix missing ctx return
2023-02-14 08:17:45 +00:00
Bin Liu
e812c5ce66
Merge pull request #6076 from zhaojizhuang/reconnect
runtime: add reconnect timeout for vhost user block
2023-02-14 10:39:20 +08:00
Archana Shinde
7b4e5751ca
Merge pull request #5007 from larrydewey/update-rpb-main
SEV: Update ReducedPhysBits
2023-02-13 14:56:38 -08:00
Chelsea Mafrica
c453919911 runtime: tracing: Fix missing ctx return
Normally we return the context when creating a trace span so that the
ordering of spans w.r.t. calls is maintained in tracing output. Add
missing context for StartVM() for Cloud Hypervisor.

Fixes #6271

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-02-13 12:37:52 -08:00
zhaojizhuang
ca02c9f512 runtime: add reconnect timeout for vhost user block
Fixes: #6075
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-02-13 14:33:46 +08:00
Bin Liu
95602c8c08
Merge pull request #5999 from yaoyinnan/5998/feat/cgroup-metrics
runtime: support cgroup v2 metrics marshal guest metrics
2023-02-11 19:26:24 +08:00
Bin Liu
8a9392fd9d
Merge pull request #6188 from yahaa/Typo-fix
Typo: change tabs in comment to spaces
2023-02-11 11:19:11 +08:00
Bin Liu
ecbd94d80c
Merge pull request #6064 from yaoyinnan/6063/feat/rootfs-erofs
rootfs: support EROFS filesystem
2023-02-11 11:10:23 +08:00
Larry Dewey
67b8f0773f SEV: Update ReducedPhysBits
Updating this field, as `cpuid` provides host level data, which is not
what a guest would expect for Reduced Phsycial Bits. In almost all
cases, we should be using `1` for the value here.

Amend: Adding unit test change.

Fixes: #5006

Signed-off-by: Larry Dewey <larry.dewey@amd.com>
2023-02-10 13:19:33 -06:00
yaoyinnan
bdf20b5d26 rootfs: support EROFS filesystem
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.

On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.

Fixes: #6063

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-11 00:44:13 +08:00
GabyCT
86501d5f6f
Merge pull request #6200 from gkurz/improve-appendFDs-doc
runtime: Improve documentation of appendFDs
2023-02-09 15:50:37 -06:00
yaoyinnan
01765e1734 runtime: support cgroup v2 metrics marshal guest metrics
Support to use cgroup v2 metrics marshal guest metrics.

Fixes: #5998

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-02-09 19:14:09 +08:00
Bin Liu
407d3146e6
Merge pull request #6234 from UiPath/fix-clh-timeout
clh: Enforce API timeout only for vm.boot request
2023-02-08 21:33:56 +08:00
Alexandru Matei
ac64b021a6 clh: Enforce API timeout only for vm.boot request
launchClh already has a timeout of 10seconds for launching clh, e.g.
if launchClh or setupVirtiofsDaemon takes a few seconds the context's
deadline will already be expired by the time it reaches bootVM

Fixes #6240
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-02-08 11:14:51 +02:00
Bin Liu
56071c6e7b virtiofsd: change cache mod to const
Change cache mod from literal to const and place them in one place.

Also set default cache mode from `none` to `never` in
`pkg/katautils/config-settings.go.in`.

Fixes: #6151

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-02-08 15:06:52 +08:00
Bin Liu
71a3b73cb0
Merge pull request #6223 from d3c3mber/rm-unused-shim-config
runtime: remove not used shim configurations
2023-02-08 10:00:52 +08:00
d3c3mber
390916b33c runtime: remove not used shim configurations
ShimPath and ShimDebug are not needed anymore.

Fixes: #6147

Signed-off-by: d3c3mber <tangbo_gl_2022@163.com>
2023-02-07 14:06:12 +08:00
joannejchen
9794c52c65 improvement: Fix naming conventions for span name and log subsystem
Normally, the span name should be the same as the function name, and the log subsystem should not contain spaces.

Fixes #6153

Signed-off-by: joannejchen <chenjjoanne@gmail.com>
2023-02-06 08:25:49 -06:00
GabyCT
7fc35f19eb
Merge pull request #6056 from jongwu/perm_deny
arm64/CI: fix unit test failure on arm64
2023-02-03 10:53:38 -06:00
Jianyong Wu
59f104c022 runtime: skip unit test that fail regularly on aarch64
There are lots of unit test cases fails regularly on aarch64, including
TestIOCopy, create_tmpfs. Temporarily skip it for now and enable it
after them get fixed.

Fixes: #6194
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-02-03 11:34:39 +08:00
Greg Kurz
3c48f2202c runtime: Improve documentation of appendFDs
The cmd.ExtraFiles feature that is used to implement appendFDs takes an
array of arbitray file descriptors and internally renumbers them to be
consecutive starting from 3, using dup2().

This isn't especially obvious : document it for the sake of clarity.

Fixes #6199

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-02-02 12:52:10 +01:00
yahaa
e071d9251f Typo: change tabs in comment to spaces
Fixes: #6150

Signed-off-by: yahaa <1477765176@qq.com>
2023-02-02 12:08:33 +08:00
Peng Tao
a34f36f8f4
Merge pull request #6149 from openanolis/fix_kata_runtime
runtime:fix stat uds path
2023-02-02 11:00:07 +08:00
Greg Kurz
334c4b8bdc runtime: Drop QEMU log file support
The QEMU log file is essentially about fine grain tracing of QEMU
internals and mostly useful for developpers, not production. Notably,
the log file isn't limited in size, nor rotated in any way. It means
that a container running in the VM could possibly flood the log file
with a guest triggerable trace. For example, on openshift, the log
file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means
that each pod running with the kata runtime could potentially consume
this amount of host RAM which is not acceptable.

Error messages are best collected from QEMU's stderr as kata is doing
now since PR #5736 was merged. Drop support for the QEMU log file
because it doesn't bring any value but can certainly do harm.

Fixes #6173

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-31 09:20:29 +01:00
Zhongtao Hu
1e531b44dc runtime:fix stat uds path
os.Stat("unix:///run/vc/sbs/sid/shim-monitor.sock") will fail,
should be os.Stat("/run/vc/sbs/sid/shim-monitor.sock")

Fixes:#6148
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-01-29 15:08:13 +08:00
zhaojizhuang
9092c23a2e runtime: Add hmp for qemu
Fixes: #6092
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-29 14:22:04 +08:00
Greg Kurz
af125b1498
Merge pull request #5736 from gkurz/no-qemu-daemonize
runtime: Start QEMU undaemonized and get logs
2023-01-27 16:33:48 +01:00
Greg Kurz
39fe4a4b6f runtime: Collect QEMU's stderr
LaunchQemu now connects a pipe to QEMU's stderr and makes it
usable by callers through a Go io.ReadCloser object. As
explained in [0], all messages should be read from the pipe
before calling cmd.Wait : introduce a LogAndWait helper to handle
that.

Fixes #5780

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:17 +01:00
Greg Kurz
a5319c6be6 runtime: Start QEMU undaemonized
QEMU has always been started daemonized since the beginning. I
could not find any justification for that though, but it certainly
introduces a problem : QEMU stops logging errors when started this
way, which isn't accaptable from a support standpoint. The QEMU
community discourages the use of -daemonize ; mostly because
libvirt, QEMU's primary consummer, doesn't use this option and
prefers getting errors from QEMU's stderr through a pipe in order
to enforce rollover.

Now that virtcontainers knows how to start QEMU with a pre-
established QMP connection, let's start QEMU without -daemonize.
This requires to handle the reaping of QEMU when it terminates.
Since cmd.Wait() is blocking, call it from a goroutine.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Greg Kurz
bf4e3a618f runtime: Launch QEMU with cmd.Start()
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.

cmd.Run() is :

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().

If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Greg Kurz
8a1723a5cb runtime: Pre-establish the QMP connection
Running QEMU daemonized ensures that the QMP socket is ready to
accept connections when LaunchQemu() returns. In order to be
able to run QEMU undaemonized, let's handle that part upfront.
Create a listener socket and connect to it. Pass the listener
to QEMU and pass the connected socket to QMP : this ensures
that we cannot fail to establish QMP connection and that we
can detect if QEMU exits before accepting the connection.
This is basically what libvirt does.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:09:11 +01:00
Greg Kurz
8a4f08cb0f govmm: Optionally pass QMP listener to QEMU
QEMU's -qmp option can be passed the file descriptor of a socket that
is already in listening mode. This is done with by passing `fd=XXX`
to `-qmp` instead of a path. Note that these two options are mutually
exclusive : QEMU errors out if both are passed, so we check that as
well in the validation function.

While here add the `path=` stanza in the path based case for clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 23:08:48 +01:00
Greg Kurz
219bb8e7d0 govmm: Optionally start QMP with a pre-configured connection
When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-01-24 19:16:47 +01:00
GabyCT
421a33f846
Merge pull request #6096 from dcantah/kataruntime-use_hyp_consts
runtime: Use consts in `kata-runtime check`
2023-01-18 10:54:42 -06:00
Peng Tao
7d1a604bad
Merge pull request #6060 from ls-ggg/6055/service.mu-deadlock
runtime:all APIs are hang in the service.mu
2023-01-18 10:50:00 +08:00
Danny Canter
ba87e0afea runtime: Use consts in kata-runtime check
Fixes: #6095

We're already importing the virtcontainers package so might as well
use the constants for the hypervisor types we're checking against instead
of typing the names out in the switch cases.

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-17 06:55:36 -08:00
Bin Liu
790f45190b
Merge pull request #6074 from zhaojizhuang/enablevhostuserstore
runtime: paas enablevhostuserstore annotation to hypervisor config
2023-01-17 11:43:43 +08:00
Tim Zhang
20196048bf
Merge pull request #6030 from liubin/fix/6029-use-system-hugepagesize
runtime: use system pagesize for hugepage test
2023-01-16 16:57:55 +08:00
ls
69fc8de712 runtime:all APIs are hang in the service.mu
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time

Fixes: #6059

Signed-off-by: ls <335814617@qq.com>
2023-01-16 14:45:37 +08:00
Eric Ernst
807eeaafd0
Merge pull request #6047 from egernst/build-kata-monitor-on-darwin
runtime: Use git rev-parse for the kata-monitor tag
2023-01-13 15:29:00 -08:00
Eric Ernst
3d573ba579
Merge pull request #6050 from egernst/goos-the-vc
virtcontainers: split out linux-specific bits for mount, factory
2023-01-13 15:28:42 -08:00
Eric Ernst
458fe865ea
Merge pull request #6052 from egernst/add-darwin-skeletons
Add darwin skeletons
2023-01-13 13:14:16 -08:00
Eric Ernst
923cd3fda1 virtcontainers: split out Linux parts from mount
Mount handling is often unique in Linux. Let's ensure that the common
parts remain in mount.go, while Linux speific parts are within a linux
file.

Fixes: #6049

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-13 11:14:56 -08:00
Eric Ernst
54f2b296e3
Merge pull request #6048 from egernst/revendor-netlink
vendor: revendor netlink to get latest
2023-01-13 11:08:47 -08:00
Eric Ernst
f82918f872
Merge pull request #6045 from egernst/fix-6044
Address issues with the initial vCPU pinning functionality
2023-01-13 11:06:42 -08:00
GabyCT
9c6e90fd55
Merge pull request #6043 from GabyCT/topic/fixerrormsg
virtcontainers: Fix misspelling in error message
2023-01-13 09:16:34 -06:00
zhaojizhuang
cf1bae3521 runtime: paas enablevhostuserstore annotation to hypervisor config
Fixes: #6073
Signed-off-by: zhaojizhuang <571130360@qq.com>
2023-01-13 17:07:38 +08:00
Eric Ernst
60ff230d80 virtcontainers: Split the factory package into Linux and Darwin bits
- split template
- split factory
- add stubs for darwin

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 16:51:28 -08:00
Samuel Ortiz
76437a9721 runtime: Use git rev-parse for the kata-monitor tag
The .git-commit can be a multiple line file, potentially confusing
the Darwin linker for example.

Fixes: #6046

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 16:01:58 -08:00
Samuel Ortiz
a9626682af virtcontainers: resourcecontrol: Add skeleton for Darwin
Cgroups do not exist on Darwin, so use an empty implementation for
resourcecontrol for the time being. In the process, ensure that the
utilized cgroup handling (ie, isSystemdCgroup) is kept in general file,
since we use this to help assess/constrain the container spec we pass to
the guest.

Fixes: #6051

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:53:28 -08:00
Samuel Ortiz
ea06fe3afc virtcontainers: Add a Network API skeleton for Darwin
Empty for now.

Fixes: #6051

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:53:28 -08:00
Eric Ernst
6ee550e9a5 runtime: vCPUs pinning is sandbox specific, not hypervisor
While at it, make sure we persist this and fix a misc typo.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-12 15:44:25 -08:00
Peng Tao
2b4b825228
Merge pull request #6032 from liubin/fix/6031-add-test-file-to-gitignore
runtime: add test generated file to .gitignore
2023-01-12 15:38:46 +08:00
Peng Tao
4a4232b851
Merge pull request #6037 from bergwolf/github/no-netns
runtime: fix up disable_netns handling
2023-01-12 09:58:24 +08:00
Eric Ernst
e3d3b72fa2 virtcontainers: use resource control for setting CPU affinity
Let's abstract the CPU affinity, instead of calling linux only code from
sandbox.

Fixes: #6044

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-11 17:55:53 -08:00
Eric Ernst
f137048be3 resource-control: add helper function for setting CPU affinity
Let's abstract the CPU affinity

Fixes: #6044

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-11 17:55:53 -08:00
Eric Ernst
73216a8104 vendor: revendor netlink to get latest
This'll address issue where netlink couldn't build on Darwin hosts.

Fixes: #6026

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2023-01-11 17:23:15 -08:00
Gabriela Cervantes
fc17d7cc41 virtcontainers: Fix misspelling in error message
This PR fixes a misspelling in the error message when it tries to run
a system without Confidential computing support.

Fixes #6042

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-01-11 21:58:07 +00:00
Peng Tao
12fd6ffc1f runtime: fix up disable_netns handling
With `disable_netns=true`, we should never scan the sandbox netns which
is the host netns in such case.

Fixes: #6021
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-11 12:25:24 +00:00
Bin Liu
7eb43cec15 runtime: add test generated file to .gitignore
Add test generated file to .gitignore to avoid making the
working directory dirty.

Fixes: #6031

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-11 17:16:06 +08:00
Bin Liu
8551853cfe runtime: use system pagesize for hugepage test
In TestHandleHugepages it will do a mount operation with different pagesizes,
but some systems only support 2M pagesize, test for a 1g pagesize will fail.

This commit try to fix by only mount pagesizes under `/sys/kernel/mm/hugepages`, which are
supported to mount by the OS.

Fixes: #6029

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-11 17:02:58 +08:00
Eric Ernst
07e77f5be7
Merge pull request #5994 from dcantah/virtcontainers_tests_darwin
virtcontainers: tests: Ensure Linux specific tests are just run on Linux
2023-01-10 17:13:28 -08:00
Fabiano Fidêncio
147c56bb8d
Merge pull request #6019 from liubin/fix/6018-virtiofsd-cache-mod
Change cache mode from none to never
2023-01-10 23:12:13 +01:00
Bin Liu
8225d8044e
Merge pull request #6003 from dcantah/fs-skeleton
virtcontainers: fs_share: Add Darwin skeleton
2023-01-10 17:48:45 +08:00
Bin Liu
86a82cace9 runtime: change cache mode from none to never
New Rust virtiofsd's `cache` mode doesn't support `none` mode,
we should use `never` to replace it.

Fixes: #6018

Signed-off-by: Bin Liu <bin@hyper.sh>
2023-01-10 17:29:48 +08:00
Eric Ernst
4d53303a7d
Merge pull request #6005 from dcantah/vfw-skeleton
virtcontainers: Add a Virtualization.framework skeleton
2023-01-09 15:50:04 -08:00
Bin Liu
1bae41a4d4
Merge pull request #5996 from dcantah/vfw-initial
virtcontainers: Introduce hypervisor_darwin
2023-01-09 11:37:02 +08:00
Samuel Ortiz
fa9ae9362c virtcontainers: Add a Virtualization.framework skeleton
Fixes: #6004

A Virtualization.framework based Hypervisor implementation.
This is just stubs for now to eventually get this building.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-08 07:40:21 -08:00
Eric Ernst
d48b22bb13 virtcontainers: fs_share: add Darwin skeleton
Fixes: #6002

As a first pass for testing, let's add a skeleton for filesystem
sharing support on Darwin..

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-07 19:56:47 -08:00
Bin Liu
2c10b37172
Merge pull request #5991 from dcantah/darwin-sigs
runtime: Define Darwin handled signals list
2023-01-07 11:19:48 +08:00
Bin Liu
bc8a6423e0
Merge pull request #5986 from dcantah/nydus-nonetns
nydus: net-ns handling needs to be only executed on Linux hosts
2023-01-07 11:19:07 +08:00
Eric Ernst
fafc7a8b1a virtcontainers: tests: Ensure Linux specific tests are just run on Linux
Fixes: #5993

Several tests utilize linux'isms like Mounts, bindmounts, vsock etc.

Let's ensure that these are still tested on Linux, but that we also skip
these tests when on other operating systems (Darwin). This commit just
moves tests; there shouldn't be any functional test changes. While the
tests still won't be runnable on Darwin/other hosts yet, this is a necessary
step forward.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-06 11:09:11 -08:00
Fabiano Fidêncio
efa4fc0b25 clh: Add hotplug support for network devices
This is needed in order to have Moby / Docker working properly with
Cloud Hypervisor, as Moby / Docker relies on hotplugging a network
device to the VM as a preStartHook.

Fixes: #5997

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-01-06 18:59:47 +01:00
Fabiano Fidêncio
1074d2c1d3 clh: Make vmAddNetPutRequest capable of doing hotplugs
THe only bit needed for having the vmAddNetPutRequest() capable of
dealing with hotplugs, instead of only coldplugs, is making sure it
doesn't error out in case a `200` response is returned.

The 200 response means:
"""
The new device was successfully added to the VM instance.
"""

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-01-06 18:55:55 +01:00
Fabiano Fidêncio
175794458f
Merge pull request #5972 from bergwolf/github/hook
fix moby prestart hook handling
2023-01-06 14:54:39 +01:00
Eric Ernst
9ec8a13985 virtcontainers: introduce hypervisor_darwin
Fixes: #5995

Placeholder skeleton at this point - implementation will be added after
basic build refactoring lands.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-06 02:03:34 -08:00
Peng Tao
8bb68a9f28 vc/network: skip existing endpoints when scanning for new ones
So that addAllEndpoints() becomes re-entrant and we can use it to scan
netns changes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-06 10:01:19 +00:00
Samuel Ortiz
3b4420eb8e runtime: Define Darwin handled signals list
Fixes: #5990

Some signals may not be defined on non Linux host OSes, like
SIGSTKFLT for example. It's also not defined on certain architectures,
but irrelevant for this.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 17:50:47 -08:00
Danny Canter
24b05a99b6 schedcore: Make buildable on !linux
Fixes: #5983

sched-core only makes sense on Linux hosts. Let's add stub/error for
other platforms.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 11:51:04 -08:00
Danny Canter
3886aad199 nydus: net-ns handling needs to be only executed on Linux hosts
Fixes: #5985

With nydus not being its own pkg, it is challenging to implement cleanly
in a virtcontainers package that isn't necesarily Linux-only. The
existing code utilizes network namespace code in order to ensure nydus
is launched in the host netns. This is very Linux specific - so let's
make sure we only carry this out in a linux specific file.

In the Darwin case, to allow for compilation at least, let's add a stub
for doNetNS. Ideally the nydus and vc code can be refactored /
decoupled.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-05 11:48:43 -08:00
Bin Liu
4ab9364aa6
Merge pull request #5946 from dcantah/clarify-var
Runtime: Clarify mutability of global var
2023-01-05 13:08:45 +08:00
Bin Liu
649d2d4b8d
Merge pull request #5964 from openanolis/kata-runtime
kata-runtime: add rust runtime path for kata-runtime exec
2023-01-05 09:35:21 +08:00
Peng Tao
d085389127 vc: fix up UT for CreateSandbox API change
Need to adapt the UT as well.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:42 +08:00
Peng Tao
578a9c25f0 vc: rescan network endpoints after running prestart hooks
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.

Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:41 +08:00
Peng Tao
cb84b0fb02 katautils: run prestart hooks after starting VM
So that we can pass the hypervisor pid to the hook instead of the
runtime process's.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 10:52:32 +00:00
Danny Canter
56e7b5d0fd runtime/Makefile: Get some bits happy on darwin
Substitution in the yq install script doesn't like zsh, and additionally
the version of yq we're using doesn't have a darwin/arm64 build so grab
the amd64 version and let rosetta work its magic.

Additionally swap to abspath from readlink -m for the printing of what binaries
to install, as the -m flag doesn't exist on the BSD variant, and this
should be the same behavior.

Fixes: #5970

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-02 04:19:58 -08:00
Danny Canter
86ee24b33c Runtime: Clarify mutability of global var
Was about to change `urandomdev` to a constant when I realized it's
intentionally mutable so it can be mocked in tests. There's other
comments to the same effect so clarify here as well.

Fixes: #5965

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-02 01:13:34 -08:00
Zhongtao Hu
dae6670628 kata-runtime: add rust runtime path for kata-runtime exec
add rust runtime path for kata-runtime exec

Fixes:#5963
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-12-30 13:34:34 +08:00
Binbin Zhang
99485d871c shim: return hypervisor's pid not shim's pid
update outdated code comments

Fixes: #3234

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2022-12-14 11:16:11 +08:00
Fabiano Fidêncio
f1381eb361
Merge pull request #4813 from ManaSugi/fix/add-selinux-agent
runtime,agent: Add SELinux support for containers inside the guest
2022-12-13 11:24:53 +01:00
Alexandru Matei
d04d45ea05 runtime: use pidfd to wait for processes on Linux
Use pidfd_open and poll on newer versions of Linux to wait
for the process to exit. For older versions use existing wait logic

Fixes: #5617

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-06 16:31:05 +02:00
Alexandru Matei
e9ba0c11d0 runtime: use exponential backoff for process wait
Initial wait period between checks is 1ms, and the
next ones are min(wait_period*5, 50ms)

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-06 16:30:58 +02:00
Alexandru Matei
71491a69c3 runtime: move process wait logic to another function
extract process wait logic to another function

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-05 13:32:04 +02:00
Alexandru Matei
92ebe61fea runtime: reap force killed processes
reap child processes after sending SIGKILL

Fixes #5739

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-05 13:31:58 +02:00
Bin Liu
d4321ab489 runtime: Add identification in version for runtime-rs
Now we are supporting two runtime/shim, the go version,
and the rust version, for debug purposes, we can
add an identification in the version info
to tell us which runtime/shim is used.

Fixes: #5806

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-12-01 15:14:08 +08:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
GabyCT
013752667b
Merge pull request #5776 from liubin/tmp/debug-static-check
ci: let static checks don't depend on build
2022-11-28 07:51:42 -06:00
Bin Liu
6af037d379
Merge pull request #5154 from Yuan-Zhuo/main
agent: support systemd cgroup for kata agent.
2022-11-28 18:40:10 +08:00
Bin Liu
e723bad0af ci: let static checks don't depend on build
Build is a time consumable operation, skip build while let
ci run faster.

Fixes: #5777

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-11-28 15:26:04 +08:00
Bin Liu
a55eb78c32
Merge pull request #5752 from liubin/fix/5750-go-fix-1.19
runtime: go fix code for 1.19
2022-11-26 02:09:02 +08:00
Peng Tao
e32c023d96
Merge pull request #5714 from UiPath/fix-mkdir
runtime: don't fail mkdir if the folder is already created by another process
2022-11-25 17:52:56 +08:00
Bin Liu
1dfd845f51 runtime: go fix code for 1.19
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.

Fixes: #5750

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-11-25 11:29:18 +08:00
Alexandru Matei
4b45e13869 runtime: don't fail mkdir if the folder is already created
Use MkdirAll instead of Mkdir so it doesn't generate an
error when the folder is created by another process

Fixes #5713

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-24 11:20:56 +02:00
Bin Liu
06a604b753
Merge pull request #5720 from YchauWang/wyc-docs-test-22
runtime: add log record to the qemu config method `appendDevices` for…
2022-11-24 13:15:06 +08:00
Peng Tao
b4d0a39f6d
Merge pull request #5723 from fidencio/topic/runtime-bump-containerd-to-v1.6.8
runtime: Use containerd v1.6.8
2022-11-24 11:28:58 +08:00
wangyongchao.bj
30a7ebf430 runtime: Log invalid devices in QEMU config
When the user tried to add new devices to the VM, there is no error info for the invalid
 device. This PR adds a log record to the `appendDevices` for the invalid device of the
 qemu config.

Fixes: #5719

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2022-11-23 09:09:45 +08:00
Fabiano Fidêncio
df3d9878d5
Merge pull request #5695 from darfux/virtiofs-queue-size
runtime: Support virtiofs queue size for qemu and make it configurable
2022-11-22 20:04:30 +01:00
Fabiano Fidêncio
2539f31862 runtime: Use containerd v1.6.8
Let's follow the binary bump used in the CI and also bump the vendored
version of containerd to v1.6.8.

Fixes: #5722

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-22 18:28:30 +01:00
Peng Tao
a636d426d9 versions: update nydusd version
To the latest stable v2.1.1.

Depends-on: github.com/kata-containers/tests#5246
Fixes: #5635
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-11-19 16:33:29 +00:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
Bo Chen
36545aa81a runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v28.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #5683

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-17 09:45:27 -08:00
Fabiano Fidêncio
d94718fb30 runtime: Fix gofmt issues
It seems that bumping the version of golang and golangci-lint new format
changes are required.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 14:16:12 +01:00
Fabiano Fidêncio
16b8375095 golang: Stop using io/ioutils
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 13:43:25 +01:00
Peng Tao
eab8d6be13 build: update golang version to 1.19.2
So that we get the latest language fixes.

There is little use to maitain compiler backward compatibility.
Let's just set the default golang version to the latest 1.19.2.

Fixes: #5494
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-11-16 19:02:39 +01:00
Alexandru Matei
a04afab74d qemu: early exit from Check if the process was stopped
Fixes: #5625

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
7e481f2179 qemu: set stopped only if StopVM is successful
Fixes: #5624

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
0e3ac66e76 clh: return faster with dead clh process from isClhRunning
Through proactively checking if Cloud Hypervisor process is dead,
this patch provides a faster path for isClhRunning

Fixes: #5623

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
9ef68e0c7a clh: fast exit from isClhRunning if the process was stopped
Use atomic operations instead of acquiring a mutex in isClhRunning.
This stops isClhRunning from generating a deadlock by trying to
reacquire an already-acquired lock when called via StopVM->terminate.

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
2631b08ff1 clh: don't try to stop clh multiple times
Avoid executing StopVM concurrently when virtiofs dies as a result of clh
being stopped in StopVM.

Fixes: #5622

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Fabiano Fidêncio
7250be3601
Merge pull request #5584 from fengyehong/clh-thread
cloud-hypervisor: Fix GetThreadIDs function
2022-11-07 08:22:40 +01:00
Guanglu Guo
daeee26a1e cloud-hypervisor: Fix GetThreadIDs function
Get vcpu thread-ids by reading cloud-hypervisor process tasks information.

Fixes: #5568

Signed-off-by: Guanglu Guo <guoguanglu@qiyi.com>
2022-11-05 17:23:19 +08:00
LitFlwr0
2508d39b7c runtime: added vcpus pinning logics
Core VCPU threads pinning logics for issue 4476. Also provided docs.

Fixes:#4476
Signed-off-by: LitFlwr0 <861690705@qq.com>
2022-11-04 17:52:42 +08:00
snir911
288e337a6f
Merge pull request #5434 from Rouzip/remove-doNetNS
add EnterNetNS in virtcontainers
2022-10-30 11:19:07 +02:00
Yuan-Zhuo
d7bb4b5512 agent: support systemd cgroup for kata agent
1. Implemented a rust module for operating cgroups through systemd with the help of zbus (src/agent/rustjail/src/cgroups/systemd).
2. Add support for optional cgroup configuration through fs and systemd at agent (src/agent/rustjail/src/container.rs).
3. Described the usage and supported properties of the agent systemd cgroup (docs/design/agent-systemd-cgroup.md).

Fixes: #4336

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2022-10-25 13:57:09 +08:00
Bo Chen
a151d8ee50
Merge pull request #5493 from fidencio/topic/update-clh
versions: Update Cloud Hypervisor to b4e39427080
2022-10-24 07:54:02 -07:00
Fabiano Fidêncio
190e623c40
Merge pull request #5317 from Champ-Goblem/fix-containerd-stats
shim: Ensure pagesize is set when reporting hugetlb stats
2022-10-24 10:24:49 +02:00
Fabiano Fidêncio
9d286af7b4 versions: Update Cloud Hypervisor to b4e39427080
An API change, done a long time ago, has been exposed on Cloud
Hypervisor and we should update it on the Kata Containers side to ensure
it doesn't affect Cloud Hypervisor CI and because the change is needed
for an upcoming work to get QAT working with Cloud Hypervisor.

Fixes: #5492

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-10-21 20:52:54 +02:00
Rouzip
39363ffbfb
runtime: remove same function
Add EnterNetNS in virtcontainers to remove same function.

FIXes #5394

Signed-off-by: Rouzip <1226015390@qq.com>
2022-10-17 10:59:13 +08:00
Fupan Li
2c88e1cd80
Merge pull request #5302 from liubin/fix/5285-SetFsSharingSupport-comment
runtime: fix incorrect comment for SetFsSharingSupport function
2022-10-09 09:40:31 +08:00
Bin Liu
b556c9b986
Merge pull request #5235 from YchauWang/wyc-qmp-log
virtcontainers: add warn log record for qmp hotplug cpu error
2022-10-09 08:29:09 +08:00
Vijay Dhanraj
435c8f181a acrn: Enable ACRN hypervisor support for Kata 2.x release
Currently ACRN hypervisor support in Kata2.x releases is broken.
This commit re-enables ACRN hypervisor support and also refactors
the code so as to remove dependency on Sandbox.

Fixes #3027

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
2022-10-07 07:40:32 -07:00
Archana Shinde
6e2d39c588
Merge pull request #5311 from likebreath/0930/clh_v27.0
Upgrade to Cloud Hypervisor v27.0
2022-10-04 10:56:00 -07:00
Champ-Goblem
89e62d4edf shim: Ensure pagesize is set when reporting hugetbl stats
The containerd stats method and metrics API are broken with Kata 2.5.x, the stats fail to load and the metrics API responds with status code 500

This seems to be down to the conversion from the stats reported by the agent RPC `StatsContainer` where the field `Pagesize` is not
completed by the `setHugetlbStats` method. In the case where multiple sized tables stats are reported, this causes containerd to register two metrics
with the same label set, rather than each being partitioned by the `page` label.

Fixes: #5316
Signed-off-by: Champ-Goblem <cameron@northflank.com>
2022-10-04 09:16:30 +01:00
Bo Chen
067e2b1e33 runtime: clh: Use the new API to boot with TDX firmware (td-shim)
The new way to boot from TDX firmware (e.g. td-shim) is using the
combination of '--platform tdx=on' with '--firmware tdshim'.

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-03 10:30:54 -07:00
Bo Chen
5d63fcf344 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v27.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-03 10:30:42 -07:00
norbjd
17de94e118 microvm: Remove kernel_irqchip=on option
`kernel_irqchip` option doesn't seem to bring any benefits and, on the
contrary, its usage cause issues when using the microvm machine type.

With this in mind, let's remove it.

Fixes: #1984, #4386

Signed-off-by: norbjd <norbjd@users.noreply.github.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-10-03 11:48:05 +02:00
Bin Liu
68e8a86aec runtime: fix incorrect comment for SetFsSharingSupport function
The comment for SetFsSharingSupport is not suitable, correct the
function name.

Fixes: #5285

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-30 15:44:44 +08:00
Peng Tao
8a2df6b31c
Merge pull request #4931 from jpecholt/snp-support
Added SNP-Support for Kata-Containers
2022-09-27 14:17:54 +08:00
Bin Liu
407e46b1b7
Merge pull request #5218 from bergwolf/github/deps
runtime/runtime-rs: update dependency
2022-09-27 11:02:46 +08:00
wangyongchao.bj
04bbce8dc3 virtcontainers: add warn log record for qmp hotplug cpu error
The qmp command of hotplug cpu failed error was hidden. It didn't friendly for
the user tracing the hotplug cpu error. The PR help us to improve the hotplug
cpu error log. Add real qemu command error log for `failed to hot add vCPUs`.
Through the error message, we can get the reason of the failed qmp command
 for hotplug cpu operation.

Fixes: #5234

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2022-09-23 08:22:30 +08:00
Peng Tao
9628c7df0c runtime: update runc dependency
To bring fix to CVE-2022-29162.

Fixes: #5217
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-09-21 17:21:37 +08:00
Joana Pecholt
ded60173d4 runtime: Enable choice between AMD SEV and SNP
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Joana Pecholt
22bda0838c runtime: Support for AMD SEV-SNP VMs
This commit adds AMD SEV-SNP as a confidential guest option to the
runtime. Information on required components such as OVMF, QEMU and
a kernel supporting SEV-SNP are defined in the versions file and
corresponding configs are added.

Note: The CPU model 'host' provided by the current SNP-QEMU does
not support all SNP capabilities yet, which is why this option is
changed to EPYC-v4.

Note: The guest's physical address space reduction specified with
ReducedPhysBits is 1. Details are can be found in Section 15.34.6
here https://www.amd.com/system/files/TechDocs/24593.pdf

Fixes #4437

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Joana Pecholt
105eda5b9a runtime: Initrd path option added to config
Adds initrd configuration option to the configuration.toml that is
generated for the setup using QEMU.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Feng Wang
f914319874 runtime: store the user name in hypervisor config
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-13 10:32:55 -07:00
Feng Wang
5cafe21770 runtime: make StopVM thread-safe
StopVM can be invoked by multiple threads and needs to be thread-safe

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-12 21:56:15 -07:00
Feng Wang
c3015927a3 runtime: add more debug logs for non-root user operation
Previously the logging was insufficient and made debugging difficult

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-12 21:38:57 -07:00
Eric Ernst
9997ab064a sandbox_test: Add test to verify memory hotplug behavior
Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.

We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
f390c122f0 sandbox: don't hotplug too much memory at once
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.

During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).

From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest. (a factor of 48 results in using 75% of
provided memory for hotplug). Using prior example of a guest with 256Mi
RAM, 256 Mi * 48 = 12 Gi; 12GiB is upper end of what we should expect
can be hotplugged successfully into the guest.

Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.

If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.

Fixes: #4847

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
e0142db24f hypervisor: Add GetTotalMemoryMB to interface
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calcualting
how much memory we can add at a time when utilizing ACPI hotplug.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-30 16:37:47 -07:00
Archana Shinde
7d52934ec1
Merge pull request #4798 from amshinde/use-iouring-qemu
Use iouring for qemu block devices
2022-08-26 04:00:24 +05:30
Fabiano Fidêncio
ddc94e00b0
Merge pull request #4982 from fidencio/topic/improve-cloud-hypervisor-plus-tdx-support
TDX: Get TDX working again with Cloud Hypervisor + a minor change on QEMU's code
2022-08-25 08:53:10 +02:00
Fabiano Fidêncio
dc90eae17b qemu: Drop unnecessary tdx_guest kernel parameter
With the current TDX kernel used with Kata Containers, `tdx_guest` is
not needed, as TDX_GUEST is now a kernel configuration.

With this in mind, let's just drop the kernel parameter.

Fixes: #4981

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:43 +02:00
Fabiano Fidêncio
d4b67613f0 clh: Use HVC console with TDX
As right now the TDX guest kernel doesn't support "serial" console,
let's switch to using HVC in this case.

Fixes: #4980

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:40 +02:00
Fabiano Fidêncio
c0cb3cd4d8 clh: Avoid crashing when memory hotplug is not allowed
The runtime will crash when trying to resize memory when memory hotplug
is not allowed.

This happens because we cannot simply set the hotplug amount to zero,
leading is to not set memory hotplug at all, and later then trying to
access the value of a nil pointer.

Fixes: #4979

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:22 +02:00
Fabiano Fidêncio
9f0a57c0eb clh: Increase API and SandboxStop timeouts for TDX
While doing tests using `ctr`, I've noticed that I've been hitting those
timeouts more frequently than expected.

Till we find the root cause of the issue (which is *not* in the Kata
Containers), let's increase the timeouts when dealing with a
Confidential Guest.

Fixes: #4978

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 20:02:12 +02:00
Fabiano Fidêncio
c142fa2541 clh: Lift the sharedFS restriction used with TDX
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.

Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.

See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""

Fixes: #4977

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-08-24 17:14:05 +02:00
Peng Tao
a06d819b24 runtime: cri-o annotations have been moved to podman
Let's swith to depending on podman which also simplies indirect
dependency on kubernetes components. And it helps to avoid cri-o
security issues like CVE-2022-1708 as well.

Fixes: #4972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-24 18:11:37 +08:00
Bin Liu
6551d4f25a
Merge pull request #4051 from bergwolf/github/vmx-vm-factory
enable vmx for vm factory
2022-08-24 16:22:37 +08:00
Fabiano Fidêncio
9806ce8615
Merge pull request #4937 from chenhengqi/fix-error-msg
network: Fix error message for setting hardware address on TAP interface
2022-08-19 17:54:58 +02:00
Fabiano Fidêncio
828383bc39
Merge pull request #4933 from likebreath/0816/prepare_clh_v26.0
Upgrade to Cloud Hypervisor v26.0
2022-08-18 18:36:53 +02:00
Peng Tao
f508c2909a runtime: constify splitIrqChipMachineOptions
A simple cleanup.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:09:20 +08:00
Peng Tao
2b0587db95 runtime: VMX is migratible in vm factory case
We are not spinning up any L2 guests in vm factory, so the L1 guest
migration is expected to work even with VMX.

See https://www.linux-kvm.org/page/Nested_Guests

Fixes: #4050
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:08:43 +08:00
Peng Tao
fa09f0ec84 runtime: remove qemuPaths
It is broken that it doesn't list QemuVirt machine type. In fact we
don't need it at all. Just drop it.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-08-18 10:06:10 +08:00
Bo Chen
3a597c2742 runtime: clh: Use the new 'payload' interface
The new 'payload' interface now contains the 'kernel' and 'initramfs'
config.

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:43 -07:00
Bo Chen
16baecc5b1 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v26.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:12 -07:00
Hengqi Chen
8ff5c10ac4 network: Fix error message for setting hardware address on TAP interface
Error out with the correct interface name and hardware address instead.

Fixes: #4944

Signed-off-by: Hengqi Chen <chenhengqi@outlook.com>
2022-08-17 16:42:07 +08:00
Chelsea Mafrica
fcc1e0c617 runtime: tracing: End root span at end of trace
The root span should exist the duration of the trace. Defer ending span
until the end of the trace instead of end of function. Add the span to
the service struct to do so.

Fixes #4902

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-08-12 13:15:39 -07:00
Bin Liu
cb7f9524be
Merge pull request #4804 from openanolis/anolis/merge_runtime_rs_to_main
runtime-rs:merge runtime rs to main
2022-08-11 08:40:41 +08:00
Tim Zhang
4813a3cef9
Merge pull request #4711 from liubin/fix/4710-wait-nydusd-api-server-ready
nydus: wait nydusd API server ready before mounting share fs
2022-08-10 17:20:17 +08:00
liubin
2ae807fd29 nydus: wait nydusd API server ready before mounting share fs
If the API server is not ready, the mount call will fail, so before
mounting share fs, we should wait the nydusd is started and
the API server is ready.

Fixes: #4710

Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-08 16:18:38 +08:00
Tim Zhang
8d4d98587f
Merge pull request #4746 from liubin/fix/4745-add-log-field
runtime: explicitly mark the source of the log is from qemu.log
2022-08-08 15:21:01 +08:00
Archana Shinde
c1e3b8f40f govmm: Refactor qmp functions for adding block device
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
598884f374 govmm: Refactor code to get rid of redundant code
Get rid of redundant return values from function.
args and blockdevArgs used to return different values to maintain
compatilibity between qemu versions. These are exactly the same now.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
00860a7e43 qmp: Pass aio backend while adding block device
Allow govmm to pass aio backend while adding block device.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
e1b49d7586 config: Add block aio as a supported annotation
Allow Block AIO to be passed as a per pod annotation.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
ed0f1d0b32 config: Add "block_device_aio" as a config option for qemu
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
chmod100
d8ad16a34e runtime: add unlock before return in sendReq
Unlock is required before return, so there need to add unlock

Fixes: #4827

Signed-off-by: chmod100 <letfu@outlook.com>
2022-08-05 13:30:12 +00:00
Archana Shinde
b6cd2348f5 govmm: Add io_uring as AIO type
io_uring was introduced as a new kernel IO interface in kernel 5.1.
It is designed for higher performance than the older Linux AIO API.
This feature was added in qemu 5.0.

Fixes #4645

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-03 10:43:12 -07:00
Archana Shinde
81cdaf0771 govmm: Correct documentation for Linux aio.
The comments for "native" aio are incorrect. Correct these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-03 10:41:50 -07:00
Zhongtao Hu
adfad44efe Merge remote-tracking branch 'origin/main' into runtime-rs-merge-tmp
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.

Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-01 11:12:48 +08:00
yaoyinnan
5c3155f7e2 runtime: Support for host cgroup v2
Support cgroup v2 on the host. Update vendor containerd/cgroups to add cgroup v2.

Fixes: #3073

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-07-28 10:30:45 +08:00
Bin Liu
85f4e7caf6 runtime: explicitly mark the source of the log is from qemu.log
In qemu.StopVM(), if debug is enabled, the shim will dump logs
from qemu.log, but users don't know which logs are from qemu.log
and shim itself. Adding some additional messages will
help users to distinguish these logs.

Fixes: #4745

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-07-26 16:08:59 +08:00
gntouts
56d49b5073 versions: Update Firecracker version to v1.1.0
This patch upgrades Firecracker version from v0.23.4 to v1.1.0

* Generate swagger models for v1.1.0 (from firecracker.yaml)
* Replace ht_enabled param to smt (API change)
* Remove NUMA-related jailer param --node 0

Fixes: #4673
Depends-on: github.com/kata-containers/tests#4968

Signed-off-by: George Ntoutsos <gntouts@nubificus.co.uk>
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2022-07-26 07:01:26 +00:00
Ji-Xinyou
62182db645 runtime-rs: add unit test for ipvlan endpoint
Add unit test to check the integrity of IPVlanEndpoint::new(...)

Fixes: #4655
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-07-18 15:56:06 +08:00
wllenyj
274598ae56 kata-runtime: add dragonball config check support.
add dragonball config check support.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-14 10:43:50 +08:00
Fabiano Fidêncio
be31207f6e clh: Don't crash if no network device is set by the upper layer
`ctr` doesn't set a network device when creating the sandbox, which
leads to Cloud Hypervisor's driver crashing, see the log below:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55641c23b248]
goroutine 32 [running]:
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.glob..func1(0xc000397900)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:163 +0x128
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).vmAddNetPut(...)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:1348
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).bootVM(0xc000397900, {0x55641c76dfc0, 0xc000454ae0})
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:1378 +0x5a2
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).StartVM(0xc000397900, {0x55641c76dff8, 0xc00044c240},
0x55641b8016fd)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:659 +0x7ee
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*Sandbox).startVM.func2()
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/sandbox.go:1219 +0x190
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*LinuxNetwork).Run.func1({0xc0004a8910, 0x3b})
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:319 +0x1b
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.doNetNS({0xc000048440, 0xc00044c240}, 0xc0005d5b38)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:1045 +0x163
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*LinuxNetwork).Run(0xc000150c80, {0x55641c76dff8, 0xc00044c240}, 0xc00014e4e0)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:318 +0x105
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*Sandbox).startVM(0xc000107d40, {0x55641c76dff8, 0xc0005529f0})
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/sandbox.go:1205 +0x65f
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.createSandboxFromConfig({_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, 0x0, ...}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/api.go:91 +0x346
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.CreateSandbox({_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, 0x0, ...}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/api.go:51 +0x150
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*VCImpl).CreateSandbox(_, {_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, ...}, ...})
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/implementation.go:35 +0x74
github.com/kata-containers/kata-containers/src/runtime/pkg/katautils.CreateSandbox({_, _}, {_, _}, {{0xc0004806c0, 0x9}, 0xc000140110, 0xc00000f7a0,
{0x0, 0x0}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/katautils/create.go:175 +0x8b6
github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.create({0x55641c76dff8, 0xc0004129f0}, 0xc00034a000, 0xc00036a000)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/create.go:147 +0xdea
github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.(*service).Create.func2()
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/service.go:401 +0x32
created by github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.(*service).Create
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/service.go:400 +0x534
```

This bug has been introduced as part of the
https://github.com/kata-containers/kata-containers/pull/4312 PR, which
changed how we add the network device.

In order to avoid the crash, let's simply check whether we have a device
to be added before iterating the list of network devices.

Fixes: #4618

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-13 10:40:21 +02:00
Fabiano Fidêncio
dc3b6f6592 versions: Update Cloud Hypervisor to v25.0
Cloud Hypervisor v25.0 has been released on July 7th, 2022, and brings
the following changes:

**ch-remote Improvements**
The ch-remote command has gained support for creating the VM from a JSON
config and support for booting and deleting the VM from the VMM.

**VM "Coredump" Support**
Under the guest_debug feature flag it is now possible to extract the memory
of the guest for use in debugging with e.g. the crash utility.
(https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4012)

**Notable Bug Fixes**
* Always restore console mode on exit
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4249,
   https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4248)
* Restore vCPUs in numerical order which fixes aarch64 snapshot/restore
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4244)
* Don't try and configure IFF_RUNNING on TAP devices
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4279)
* Propagate configured queue size through to vhost-user backend
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4286)
* Always Program vCPU CPUID before running the vCPU to fix running on Linux
  5.16
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4156)
* Enable ACPI MADT "Online Capable" flag for hotpluggable vCPUs to fix newer
  Linux guest

**Removals**
The following functionality has been removed:

* The mergeable option from the virtio-pmem support has been removed
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3968)
* The dax option from the virtio-fs support has been removed
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3889)

Fixes: #4641

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-12 14:47:58 +00:00
Manabu Sugimoto
4d89476c91 runtime: Fix DisableSelinux config
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.

Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-07-06 15:50:28 +09:00
Fabiano Fidêncio
071dd4c790
Merge pull request #4109 from pmores/drop-in-cfg-files-support
Drop in cfg files support
2022-07-05 22:21:24 +02:00
Peng Tao
a1de394e51
Merge pull request #4550 from liubin/fix/4548-overwrite-mount-type-for-bind-mount
runtime: overwrite mount type to bind for bind mounts
2022-07-04 19:56:26 +08:00
liubin
1f363a386c runtime: overwrite mount type to bind for bind mounts
Some clients like nerdctl may pass mount type of none for volumes/bind mounts,
this will lead to container start fails.

Referring to runc, it overwrites the mount type to bind and ignores the input value.

Fixes: #4548

Signed-off-by: liubin <liubin0329@gmail.com>
2022-07-01 12:13:01 +08:00
GabyCT
02a51e75a7
Merge pull request #4554 from liubin/fix/delete-not-used-console-from-container-config
runtime: delete Console from Cmd type
2022-06-30 11:40:07 -05:00
Fabiano Fidêncio
aa561b49f5
Merge pull request #4540 from fidencio/topic/default_maxmemory
Add `default_maxmemory` config option
2022-06-30 12:08:15 +02:00
GabyCT
2a94261df5
Merge pull request #4549 from liubin/fix/4419-set-status-if-wait-process-failed
shim: set a non-zero return code if the wait process call failed.
2022-06-29 17:04:53 -05:00
Fabiano Fidêncio
1e12d56512
Merge pull request #4469 from egernst/config-validation-refactor
Refactor how hypervisor config validation is handled
2022-06-29 14:42:11 +02:00
liubin
a5a25ed13d runtime: delete Console from Cmd type
There is much code related to this property, but it is not used anymore.

Fixes: #4553

Signed-off-by: liubin <liubin0329@gmail.com>
2022-06-29 17:36:32 +08:00
Pavel Mores
96553e8bd2 runtime: Add documentation of drop-in config file fragments
Added user manual for the drop-in config file fragments feature.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-06-29 10:56:53 +02:00
Pavel Mores
c656457e90 runtime: Add tests of drop-in config file decoding
The tests ensure that interactions between drop-ins and the base
configuration.toml and among drop-ins themselves work as intended,
basically that files are evaluated in the correct order (base file
first, then drop-ins in alphabetical order) and the last one to set
a specific key wins.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-06-29 09:54:39 +02:00
Pavel Mores
99f5ca80fc runtime: Plug drop-in decoding into decodeConfig()
Fixes #4108

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-06-29 09:54:38 +02:00
Pavel Mores
0f9856c465 runtime: Scan drop-in directory, read files and decode them
updateFromDropIn() uses the infrastructure built by previous commits to
ensure no contents of 'tomlConfig' are lost during decoding.   To do
this, we preserve the current contents of our tomlConfig in a clone and
decode a drop-in into the original.  At this point, the original
instance is updated but its Agent and/or Hypervisor fields are
potentially damaged.

To merge, we update the clone's Agent/Hypervisor from the original
instance.   Now the clone has the desired Agent/Hypervisor and the
original instance has the rest, so to finish, we just need to move the
clone's Agent/Hypervisor to the original.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-06-29 09:54:38 +02:00
Pavel Mores
2c1efcc697 runtime: Add helpers to copy fields between tomlConfig instances
These functions take a TOML key - an array of individual components,
e.g. ["agent" "kata" "enable_tracing"], as returned by BurntSushi - and
two 'tomlConfig' instances.  They copy the value of the struct field
identified by the key from the source instance to the target one if
necessary.

This is only done if the TOML key points to structures stored in
maps by 'tomlConfig', i.e. 'hypervisor' and 'agent'.  Nothing needs to
be done in other cases.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-06-29 09:54:38 +02:00
Pavel Mores
20f11877be runtime: Add framework to manipulate config structs via reflection
For 'tomlConfig' substructures stored in Golang maps - 'hypervisor' and
'agent' - BurntSushi doesn't preserve their previous contents as it does
for substructures stored directly (e.g. 'runtime').  We use reflection
to work around this.

This commit adds three primitive operations to work with struct fields
identified by their `toml:"..."` tags - one to get a field value, one to
set a field value and one to assign a source struct field value to the
corresponding field of a target.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2022-06-29 09:54:38 +02:00
liubin
ab5f1c9564 shim: set a non-zero return code if the wait process call failed.
Return code is an int32 type, so if an error occurred, the default value
may be zero, this value will be created as a normal exit code.

Set return code to 255 will let the caller(for example Kubernetes) know
that there are some problems with the pod/container.

Fixes: #4419

Signed-off-by: liubin <liubin0329@gmail.com>
2022-06-29 12:33:32 +08:00
Eric Ernst
e5be5cb086 runtime: device: cleanup outdated comments
Prior device config move didn't update the comments. Let's address this,
and make sure comments match the new path...

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-28 18:22:28 -07:00
Eric Ernst
5f936f268f virtcontainers: config validation is host specific
Ideally this config validation would be in a seperate package
(katautils?), but that would introduce circular dependency since we'd
call it from vc, and it depends on vc types (which, shouldn't be vc, but
probably a hypervisor package instead).

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-28 18:22:28 -07:00
Fabiano Fidêncio
323271403e virtcontainers: Remove unused function
While working on the previous commits, some of the functions become
non-used.  Let's simply remove them.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Fabiano Fidêncio
0939f5181b config: Expose default_maxmemory
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Fabiano Fidêncio
58ff2bd5c9 clh,qemu: Adapt to using default_maxmemory
Let's adapt Cloud Hypervisor's and QEMU's code to properly behave to the
newly added `default_maxmemory` config.

While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced as if a pod requests more memory
than the amount avaiable in the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, mimicing
better the runc behaviour.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Tim Zhang
916ffb75d7
Merge pull request #4432 from liubin/fix/4420-binary-log
shim: support shim v2 logging plugin
2022-06-28 16:29:07 +08:00
Fabiano Fidêncio
afdc960424 hypervisor: Add default_maxmemory configuration
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.

By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 08:32:15 +02:00
Bin Liu
4e30e11b31 shim: support shim v2 logging plugin
Now kata shim only supports stdout/stderr of fifo from
containerd/CRI-O, but shim v2 supports logging plugins,
and nerdctl default will use the binary schema for logs.

This commit will add the others type of log plugins:

- file
- binary

In case of binary, kata shim will receive a stdout/stderr like:

binary:///nerdctl?_NERDCTL_INTERNAL_LOGGING=/var/lib/nerdctl/1935db59

That means the nerdctl process will handle the logs(stdout/stderr)

Fixes: #4420

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-06-28 13:54:22 +08:00
Eric Ernst
bdf5e5229b virtcontainers: validate hypervisor config outside of hypervisor itself
Depending on the user of it, the hypervisor from hypervisor interface
could have differing view on what is valid or not. To help decouple,
let's instead check the hypervisor config validity as part of the
sandbox creation, rather than as part of the CreateVM call within the
hypervisor interface implementation.

Fixes: #4251

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-27 11:53:41 -07:00
Eric Ernst
469e098543 katautils: don't do validation when loading hypervisor config
Policy for whats valid/invalid within the config varies by VMM, host,
and by silicon architecture. Let's keep katautils simple for just
translating a toml to the hypervisor config structure, and leave
validation to virtcontainers.

Without this change, we're doing duplicate validation.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-27 10:13:26 -07:00
Bin Liu
27b1bb5ed9
Merge pull request #4467 from egernst/device-pkg
device package cleanup/refactor
2022-06-27 14:40:53 +08:00
Eric Ernst
e32bf53318 device: deduplicate state structures
Before, we maintained almost identical structures between our persist
API and what we keep for our devices, with the persist API being a
slight subset of device structures.

Let's deduplicate this, now that persist is importing device package.
Json unmarshal of prior persist structure will work fine, since it was
an exact subset of fields.

Fixes: #4468

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Eric Ernst
f97d9b45c8 runtime: device/persist: drop persist dependency from device pkgs
Rather than have device package depend on persist, let's define the
(almost duplicate) structures within device itself, and have the Kata
Container's persist pkg import these.

This'll help avoid unecessary dependencies within our core packages.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Eric Ernst
f9e96c6506 runtime: device: move to top level package
Let's move device package to runtime/pkg instead of being buried under
virtcontainers.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Fabiano Fidêncio
133528dd14
Merge pull request #4503 from amshinde/multi-queue-block
block: Leverage multiqueue for virtio-block
2022-06-23 12:17:11 +02:00
Fabiano Fidêncio
78e27de6c3
Merge pull request #4358 from zvonkok/memreserve
runtime: Add heuristic to get the right value(s) for mem-reserve
2022-06-22 13:41:23 +02:00
Archana Shinde
e227b4c404 block: Leverage multiqueue for virtio-block
Similar to network, we can use multiple queues for virtio-block
devices. This can help improve storage performance.
This commit changes the number of queues for block devices to
the number of cpus for cloud-hypervisor and qemu.

Today the default number of cpus a VM starts with is 1.
Hence the queues used will be 1. This change will help
improve performance when the default cold-plugged cpus is greater
than one by changing this in the config file. This may also help
when we use the sandboxing feature with k8s that passes down
the sum of the resources required down to Kata.

Fixes #4502

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-06-21 12:38:53 -07:00
Eric Ernst
72049350ae
Merge pull request #4288 from fengwang666/enable-qemu-sandbox
runtime: enable sandbox feature on qemu
2022-06-21 09:22:26 -07:00
Zvonko Kaiser
e7e7dc9dfe runtime: Add heuristic to get the right value(s) for mem-reserve
Fixes: #2938

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-06-21 03:44:28 -07:00
Liang Zhou
ef925d40ce runtime: enable sandbox feature on qemu
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.

The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.

Fixes: #2266

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-17 15:30:46 -07:00
Chelsea Mafrica
28995301b3 tracing: Remove whitespace from root span
Remove space from root span name to follow camel casing of other tracing
span names in the runtime and to make parsing easier in testing.

Fixes #4483

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2022-06-17 12:07:37 -07:00
Fabiano Fidêncio
f30fe86dc1
Merge pull request #4456 from Bevisy/fixIssue4454
docs: Update outdated URLs and keep them available
2022-06-16 10:26:24 +02:00
Bin Liu
553ec46115
Merge pull request #4436 from alex-matei/fix/sandbox-mem-overflow
runtime: fix error when trying to parse sandbox sizing annotations
2022-06-16 11:18:24 +08:00
James O. D. Hunt
9766a285a4
Merge pull request #4422 from snir911/dependabot_bumps
deps: Resolve dependabot bumps of containerd, crossbeam-utils, regex
2022-06-15 15:57:53 +01:00
Binbin Zhang
a305bafeef docs: Update outdated URLs and keep them available
By comparing the content of the old url and the new url,
ensure that their content is consistent and does not contain ambiguities

Fixes: #4454

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2022-06-15 16:34:28 +08:00
Fabiano Fidêncio
ac5dbd8598 clh: Improve logging related to the net dev addition
Let's improve the log so we make it clear that we're only *actually*
adding the net device to the Cloud Hypervisor configuration when calling
our own version of VmAddNetPut().

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
0b75522e1f network: Set queues to 1 to ensure we get the network fds
We want to have the file descriptors of the opened tuntap device to pass
them down to the VMMs, so the VMMs don't have to explicitly open a new
tuntap device themselves, as the `container_kvm_t` label does not allow
such a thing.

With this change we ensure that what's currently done when using QEMU as
the hypervisor, can be easily replicated with other VMMs, even if they
don't support multiqueue.

As a side effect of this, we need to close the received file descriptors
in the code of the VMMs which are not going to use them.

Fixes: #3533

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
93b61e0f07 network: Add FFI_NO_PI to the netlink flags
Adding FFI_NO_PI to the netlink flags causes no harm to the supported
and tested hypervisors as when opening the device by its name Cloud
Hypervisor[0], Firecracker[1], and QEMU[2] do set the flag already.

However, when receiving the file descriptor of an opened tutap device
Cloud Hypervisor is not able to set the flag, leaving the guest without
connectivity.

To avoid such an issue, let's simply add the FFI_NO_PI flag to the
netlink flags and ensure, from our side, that the VMMs don't have to set
it on their side when dealing with an already opened tuntap device.

Note that there's a PR opened[3] just for testing that this change
doesn't cause any breakage.

[0]: e52175c2ab/net_util/src/tap.rs (L129)
[1]: b6d6f71213/src/devices/src/virtio/net/tap.rs (L126)
[2]: 3757b0d08b/net/tap-linux.c (L54)
[3]: https://github.com/kata-containers/kata-containers/pull/4292

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
bf3ddc125d clh: Pass the tuntap fds down to Cloud Hypervisor
This is basically a no-op right now, as:
* netPair.TapInterface.VMFds is nil
* the tap name is still passed to Cloud Hypervisor, which is the Cloud
  Hypervisor's first choice when opening a tap device.

In the very near future we'll stop passing the tap name to Cloud
Hypervisor, and start passing the file descriptors of the opened tap
instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
55ed32e924 clh: Take care of the VmAdNetdPut request ourselves
Knowing that VmAddNetPut works as expected, let's switch to manually
building the request and writing it to the appropriate socket.

By doing this it gives us more flexibility to, later on, pass the file
descriptor of the tuntap device to Cloud Hypervisor, as openAPI doesn't
support such operation (it has no notion of SCM Rights).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
01fe09a4ee clh: Hotplug the network devices
Instead of creating the VM with the network device already plugged in,
let's actually add the network device *after* the VM is created, but
*before* the Vm is actually booted.

Although it looks like it doesn't make any functional difference between
what's done in the past and what this commit introduces, this will be
used to workaround a limitation on OpenAPI when it comes to passing down
the network device's file descriptor to Cloud Hypervisor, so Cloud
Hypervisor can use it instead of opening the device by its name on the
VMM side.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:51:02 +00:00
Fabiano Fidêncio
2e07538334 clh: Expose VmAddNetPut
VmAddNetPut is the API provided by the Cloud Hypervisor client (auto
generated) code to hotplug a new network device to the VM.

Let's expose it now as it'll be used as part this series, mostly to
guide the reviewer through the process of what we have to do, as later
on, spoiler alert, it'll end up being removed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:27:30 +00:00
Fabiano Fidêncio
a80eb33cd6
Merge pull request #4308 from fidencio/topic/virtiofsd-switch-to-using-the-rust-version-on-all-arches
runtime: Switch to using the rust version of virtiofsd (all arches but powerpc)
2022-06-13 13:45:51 +02:00
Bin Liu
81acfc1286
Merge pull request #4425 from liubin/fix/4376-change-log-level-of-getoomevent
shim: change the log level for GetOOMEvent call failures
2022-06-13 17:53:11 +08:00
James O. D. Hunt
9b93db0220
Merge pull request #4417 from jodh-intel/docs-monitor-considerations
docs: Add more kata monitor details
2022-06-13 10:51:52 +01:00
Fabiano Fidêncio
1ef0b7ded0 runtime: Switch to using the rust version of virtiofsd (all but power)
So far this has been done for x86_64.  Now that the support for building
and testing has been added for all arches, let's do the second part of
the switch.

We're still not done yet for powerpc, as some a virtifosd crash on the
rust version has been found by the maintainer.

Fixes: #4258, #4260

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-13 10:41:26 +02:00
Alexandru Matei
721ca72a64 runtime: fix error when trying to parse sandbox sizing annotations
Changed bitsize for parsing functions to 64-bit in order to avoid
parsing errors.

Fixes #4435

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-06-11 18:51:10 +03:00
Archana Shinde
aefe11b9ba
Merge pull request #4331 from dgibson/config-enable-iommu-annotation
Allow io.katacontainers.config.hypervisor.enable_iommu annotation by …
2022-06-10 17:43:27 -07:00
James O. D. Hunt
412441308b docs: Add more kata monitor details
Add more detail to the `kata-monitor` doc to allow an admin to make a
more informed decision about where and how to run the daemon.

Fixes: #4416.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-06-09 09:20:11 +01:00
Eric Ernst
4ebf9d38b9
Merge pull request #4310 from egernst/core-sched
shim: add support for core scheduling
2022-06-08 17:42:45 +02:00
Bin Liu
eff4e1017d shim: change the log level for GetOOMEvent call failures
GetOOMEvent is a blocking call that will fail if
the container exit, in this case, it's not an error or warning.

Changing the log level for logs in case of GetOOMEvent call fails
will reduce log noise in a large cluster that has pods
creating/deleting frequently.

Fixes: #4376

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-06-08 22:17:24 +08:00
dependabot[bot]
5d7fb7b7b0 build(deps): bump github.com/containerd/containerd in /src/runtime
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.6.1 to 1.6.6.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.6.1...v1.6.6)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: direct:production
...

Fixes: #4421
Signed-off-by: dependabot[bot] <support@github.com>
2022-06-08 10:54:46 +03:00
David Gibson
8f10e13e07 config: Allow enable_iommu pod annotation by default
Since #902 the `io.katacontainers.config.hypervisor` pod annotations
have only been permitted if explicitly allowed in the global
configuration.  The default global configuration allows no such
annotations.  That's important because several of those annotations
would cause Kata to execute arbitrary binaries, and so were wildly
unsafe.

However, this is inconvenient for the
`io.katacontainers.config.hypervisor.enable_iommu` annotation
specifically, which controls whether the sandbox VM includes a vIOMMU.
A guest side vIOMMU is necessary to implement VFIO passthrough devices
with `vfio_mode = vfio`, so enabling that mode of operation currently
requires a global configuration change, and can't just be enabled
per-pod.

Unlike some of the other hypervisor annotations, the `enable_iommu`
annotation is quite safe.  By default the vIOMMU is not present, so
allowing a user to override it for a pod only improves their
facilities for isolation.  Even if the global default were changed to
enable the vIOMMU, that doesn't compel the guest kernel to use it, so
allowing a user to disable the vIOMMU doesn't materially affect
isolation either.

Therefore, allow the io.katacontainers.config.hypervisor.enable_iommu
annotation to work in the default configurations.

fixes #4330

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-06-04 13:02:05 +10:00
Eric Ernst
430da47215
Merge pull request #4360 from fengwang666/shim-leak
runtime: ignore ESRCH error from stop container
2022-06-02 12:42:19 -07:00
Feng Wang
9726f56fdc runtime: force stop container after the container process exits
Set thestop container force flag to true so that the container state is always set to
“StateStopped” after the container wait goroutine is finished. This is necessary for
the following delete container step to succeed.

Fixes: #4359

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-02 08:17:08 -07:00
Eric Ernst
d2df1209a5 docs: describe kata handling for core-scheduling
Add initial documentation for core-scheduling.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 16:17:00 -07:00
Michael Crosby
22b6a94a84 shim: add support for core scheduling
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.

Containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.

kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html

For Kata specifically, we will look for SCHED_CORE environment variable
to be set to indicate we shuold create a new schedule core domain.

This is equivalent to the containerd shim's PR: e48bbe8394

Fixes: #4309

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
2022-05-31 10:10:40 -07:00
Eric Ernst
65f0cef16c kata-runtime: add iptables CLI to test http endpoint
While end users can connect directly to the shim, let's provide a way to
easily get/set iptables from kata-runtime itself.

Fixes: #4080
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
3201ad0830 shim-client: ensure we check resp status for Put/Post
Without this, potential errors are silently dropped. Let's ensure we
return the error code as well as potenial data from the response.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
0706fb28ac kata-runtime: shmgmt: make url usage consistent
Before, we had a mix of slash, etc. Unfortunately, when cleaning URL
paths, serve mux seems to mangle the request method, resulting in each
request being a GET (instead of PUT or POST).

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
2a09378dd9 shim-client: add support for DoPut
While at it, make sure we check for nil in DoPost

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
640173cfc2 shim-mgmt: Add endpoint handler for interacting with iptables
Add two endpoints: ip6tables, iptables.

Each url handler supports GET and PUT operations. PUT expects
the requests' data to be []bytes, and to contain iptable information in
format to be consumed by iptables-restore.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
0136be22ca virtcontainers: plumb iptable set/get from sandbox to agent
Introduce get/set iptable handling. We add a sandbox API for getting and
setting the IPTables within the guest. This routes it from sandbox
interface, through kata-agent, ultimately making requests to the guest
agent.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
03176a9e09 proto: update generated code based on proto update
Update the generated agent.pb.go code based on proto update.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 08:45:59 -07:00
Fabiano Fidêncio
fff832874e clh: Update to v24.0
This release has been tracked through the v24.0 project.

virtio-iommu specification describes how a device can be attached by default
to a bypass domain. This feature is particularly helpful for booting a VM with
guest software which doesn't support virtio-iommu but still need to access
the device. Now that Cloud Hypervisor supports this feature, it can boot a VM
with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing
the disk image is placed behind a virtual IOMMU.

Multiple checks have been added to the code to prevent devices with identical
identifiers from being created, and therefore avoid unexpected behaviors at boot
or whenever a device was hot plugged into the VM.

Sparse mmap support has been added to both VFIO and vfio-user devices. This
allows the device regions that are not fully mappable to be partially mapped.
And the more a device region can be mapped into the guest address space, the
fewer VM exits will be generated when this device is accessed. This directly
impacts the performance related to this device.

A new serial_number option has been added to --platform, allowing a user to
set a specific serial number for the platform. This number is exposed to the
guest through the SMBIOS.

* Fix loading RAW firmware (#4072)
* Reject compressed QCOW images (#4055)
* Reject virtio-mem resize if device is not activated (#4003)
* Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069)
* Fix algorithm finding HOB memory resources (#3983)

* Refactor interrupt handling (#4083)
* Load kernel asynchronously (#4022)
* Only create ACPI memory manager DSDT when resizable (#4013)

Deprecated features will be removed in a subsequent release and users should
plan to use alternatives

* The mergeable option from the virtio-pmem support has been deprecated
(#3968)
* The dax option from the virtio-fs support has been deprecated (#3889)

Fixes: #4317

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-26 08:51:18 +00:00
Eric Ernst
6d00701ec9
Merge pull request #4298 from yibozhuang/fix-direct-volume
Fix issues with direct-volume stats feature
2022-05-23 15:23:51 -07:00
Yibo Zhuang
4428ceae16 runtime: direct-volume stats use correct name
Today the shim does a translation when doing
direct-volume stats where it takes the source and
returns the mount path within the guest.

The source for a direct-assigned volume is actually
the device path on the host and not the publish
volume path.

This change will perform a lookup of the mount info
during direct-volume stats to ensure that the
device path is provided to the shim for querying
the volume stats.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-20 18:42:47 -07:00
Yibo Zhuang
ffdc065b4c runtime: direct-volume stats update to use GET parameter
The go default http mux AFAIK doesn’t support pattern
routing so right now client is padding the url
for direct-volume stats with a subpath of the volume
path and this will always result in 404 not found returned
by the shim.

This change will update the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, then return
400 BadRequest to the client.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-20 18:41:51 -07:00
Yibo Zhuang
f295953183 runtime: fix incorrect Action function for direct-volume stats
The action function expects a function that returns error
but the current direct-volume stats Action returns
(string, error) which is invalid.

This change fixes the format and print out the stats from
the command instead.

Fixes: #4293

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-20 14:55:00 -07:00
Peng Tao
2c238c8504
Merge pull request #4213 from zvonkok/vfio
runtime: Adding the correct detection of mediated PCIe devices
2022-05-20 15:00:23 +08:00
Fabiano Fidêncio
811ac6a8ce
Merge pull request #4282 from r4f4/runtime-dedup-types-import
runtime: remove duplicate 'types' import
2022-05-19 22:15:36 +02:00
Chelsea Mafrica
d8be0f8e9f
Merge pull request #4281 from r4f4/runtime-qemu-comments
runtime: sync docstrings with function names
2022-05-19 09:17:38 -07:00
Rafael Fonseca
7a5ccd1264 runtime: sync docstrings with function names
The functions were renamed but their docstrings were not.

Fixes #4006

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-19 14:31:47 +02:00
Greg Kurz
fa61bd43ee
Merge pull request #4238 from snir911/wip/legacy_console
qemu: allow using legacy serial device for the console
2022-05-19 14:30:59 +02:00
Rafael Fonseca
ce2e521a0f runtime: remove duplicate 'types' import
Fallout of 09f7962ff

Fixes #4285

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-19 13:49:47 +02:00
Snir Sheriber
f4994e486b runtime: allow annotation configuration to use_legacy_serial
and update the docs and test

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-18 18:58:21 +03:00
Fabiano Fidêncio
c88a48be21
Merge pull request #4271 from r4f4/runtime-err-check-fix
runtime: do not check for EOF error in console watcher
2022-05-18 09:49:48 +02:00
GabyCT
12f0ab120a
Merge pull request #4191 from dgibson/go-test-script
Improve Go unit test script
2022-05-17 10:27:04 -05:00
Rafael Fonseca
8052fe62fa runtime: do not check for EOF error in console watcher
The documentation of the bufio package explicitly says

"Err returns the first non-EOF error that was encountered by the
Scanner."

When io.EOF happens, `Err()` will return `nil` and `Scan()` will return
`false`.

Fixes #4079

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-17 15:14:33 +02:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Snir Sheriber
44814dce19 qemu: treat console kernel params within appendConsole
as it is tightly coupled with the appended console device
additionally have it tested

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:05:31 +03:00
Fabiano Fidêncio
c39852e83f runtime: Use ${LIBEXEC}/virtiofsd as the default virtiofsd path
As now we build and ship the rust version of virtiofsd, which is not
tied to QEMU, we need to update its default location to match with where
we're installing this binary.

Fixes: #4249

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-16 09:30:24 +02:00
David Gibson
e73b70baff runtime: Don't run unit tests verbose by default
go-test.sh by default adds the -v option to 'go test' meaning that output
will be printed from all the passing tests as well as any failing ones.
This results in a lot of output in which it's often difficult to locate the
failing tests you're interested in.

So, remove -v from the default flags.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:22:31 +10:00
David Gibson
f24a6e761f runtime: Consolidate flags setting in unit tests script
One of the responsibilities of the go-test.sh script is setting up the
default flags for 'go test'.  This is constructed across several different
places in the script using several unneeded intermediate variables though.

Consolidate all the flag construction into one place.

fixes #4190

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:22:29 +10:00
David Gibson
cf465feb02 runtime: Don't change test behaviour based on $CI or $KATA_DEV_MODE
go-test.sh changes behaviour based on both the $CI and $KATA_DEV_MODE
variables, but not in a way that makes a lot of sense.

If either one is set it uses the test_coverage path, instead of the
test_local path.  That collects coverage information, as the name
suggests, but it also means it runs the tests twice as root and
non-root, which is very non-obvious.

It's not clear what use case the test_local path is for at all.
Developer local builds will typically have $KATA_DEV_MODE set and CI
builds will have $CI set.  There's essentially no downside to running
coverage all the time - it has little impact on the test runtime.

In addition, if *both* $CI and $KATA_DEV_MODE are set, the script
refuses to run things as root, considering it "unsafe".  While having
both set might be unwise in a general sense, there's not really any
way running sudo can be any more unsafe than it is with either one
set.

So, simplify everything by just always running the test_coverage path.
This leaves the test_local path unused, so we can remove it entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
34c4ac599c runtime: Remove redundant subcommands from go-test.sh
go-test.sh accepts subcommands, however invoking it in the usual way via
the Makefile doesn't use them.  In fact the only remaining subcommand is
"help" and we already have another way of getting the usage information
(-h or --help).  We don't need a second way, so just drop subcommand
handling.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
0aff5aaa39 runtime: Simplify package listing in go-test.sh
go-test.sh defaults to testing all the packages listed by go list, except
for a number filtered out.  It turns out that none of those filters are
necessary any more:
  * We've long required a Go newer than 1.9 which means the vendor filter
    isn't needed
  * The agent filter doesn't do anything now that we've moved to the Kata
    2.x unified repo
  * The tests filters don't hit anything on the list of modules in
    src/runtime (which is the only user of the script)

But since we don't need to filter anything out any more, we don't even need
to iterate through a list ourselves.  We can simply pass "./..." directly
to go test and it will iterate through all the sub-packages itself.

Interestingly this more than doubles the speed of "make test" for me - I
suspect because go test's internal paralellism works better over a larger
pool of tests.

This also lets us remove handling of non-existent coverage files from
test_go_package(), since with default options we will no longer test packages without tests
by default.  If the user explicitly requests testing of a package with no
tests, then failing makes sense.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
557c4cfd00 runtime: Don't chmod coverage files in Go tests
The go-test.sh script has an explicit chmod command, run as root, to
set the mode of the temporary coverage files to 0644.  AFAICT the
point of this is specifically the 004 bit allowing world read access,
so that we can then merge the temporary coverage file into the main
coverage file.

That's a convoluted way of doing things.  Instead we can just run the tail
command which reads the temporary file as the same user that generated it.

In addition, go-test.sh became root to remove that temporary coverage
file.  This is not necessary, since deleting a regular file just requires
write access to the directory, not the file itself.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
04c8b52e04 runtime: Remove HTML coverage option from go-test.sh
The html-coverage option to this script doesn't really alter behaviour
it just does the same thing as normal coverage, then converts the
report to HTML.  That conversion is a single command, plus a chmod to
make the final output mode 0644.  That overrides any umask the user
has set, which doesn't seem like a policy decision this script should
be making.

Nothing in the kata-containers or tests repository uses this, so it doesn't
really make sense to keep this logic inside this script.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
7f76914422 runtime: Add coverage.txt.tmp to gitignore
In addition to coverage.txt, the go-test.sh script creates
coverage.txt.tmp files while running.  These are temporary and
certainly shouldn't be committed, so add them to the gitignore file.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
13c2577004 runtime: Move go testing script locally
The go unit tests for the runtime are invoked by the helper script
ci/go-test.sh.  Which calls the run_go_test() function in ci/lib.sh.  Which
calls into .ci/go-test.sh from the tests repository.

But.. the runtime is the only user of this script, and generally stuff for
unit tests (rather than functional or integration tests) lives in the main
repository, not the tests repository.

So, just move the actual script into src/runtime.  A change to remove it
from the tests repo will follow.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
Zvonko Kaiser
2a1d394147 runtime: Adding the correct detection of mediated PCIe devices
Fixes #4212

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-05-09 00:57:06 -07:00
Fabiano Fidêncio
33a8b70558 clh: Rely on Cloud Hypervisor for generating the device ID
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.

This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed.  Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.

This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.

The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts.  With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.

This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that.  Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.

Fixes: #4176

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 09:04:03 +02:00
Jianyong Wu
982c32358a
Merge pull request #4031 from Jaylyn-Ren/kata-spdk
Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
2022-04-29 12:16:38 +08:00
Fabiano Fidêncio
b6467ddd73 clh: Expose disk rate limiter config
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4139

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:29 +02:00
Fabiano Fidêncio
7580bb5a78 clh: Expose net rate limiter config
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4017

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:13 +02:00
Fabiano Fidêncio
a88adabaae clh: Cloud Hypervisor has a built-in Rate Limiter
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).

Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecraker to do I/O throttling.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:56 +02:00
Fabiano Fidêncio
63c4da03a9 clh: Implement the Disk RateLimiter logic
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the network device configuration.

The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:53 +02:00
Fabiano Fidêncio
511f7f822d config: Add DiskRateLimiter* to Cloud Hypervisor
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:15 +02:00
Fabiano Fidêncio
5b18575dfe hypervisor: Add disk bandwidth and operations rate limiters
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.

The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
  DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the DiskRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:11 +02:00
Fabiano Fidêncio
1cf9469297 clh: Implement the Network RateLimiter logic
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.

The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:26:38 +02:00
Fabiano Fidêncio
00a5b1bda9 utils: Define DefaultRateLimiterRefillTimeMilliSecs
Firecracker's driver doesn't expose the RefillTime option of the rate
limiter to the user.  Instead, it uses a contant value of 1000
miliseconds (1 second).

As we're following Firecracker's driver implementation, let's expose
create a new constant, use it as part of the Firecracker's driver, and
later on re-use it as part of the Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
be1bb7e39f utils: Move FC's function to revert bytes to utils
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function will
be reused by Cloud Hypervisor, when adding the rate limiter logic there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
c9f6496d6d config: Add NetRateLimiter* to Cloud Hypervisor
Let's add the newly added network rate limiter configurations to the
Cloud Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
2d35e6066d hypervisor: Add network bandwidth and operations rate limiters
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.

The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.

The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.

The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
  NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the NetRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
David Gibson
1b931f4203 runtime: Allock mockfs storage to be placed in any directory
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs.  This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens).  If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.

So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting().  In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.

fixes #4140

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:47:59 +10:00
David Gibson
ef6d54a781 runtime: Let MockFSInit create a mock fs driver at any path
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs.  This change allows it to be initialized at any path
given as a parameter.  This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).

For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
5d8438e939 runtime: Move mockfs control global into mockfs.go
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing.  A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.

This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go.  This is slightly cleaner to begin with, and will
allow some further enhancements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
963d03ea8a runtime: Export StoragePathSuffix
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant.  We
duplicate this information in fc.go which also needs it.

Export it from fs.go instead, so it can be used in fc.go.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
1719a8b491 runtime: Don't abuse MockStorageRootPath() for factory tests
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory.  This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.

Instead use t.TempDir() which is for exactly this purpose.  As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
bec59f9e39 runtime: Make bind mount tests better clean up after themselves
There are several tests in mount_test.go which perform a sample bind
mount.  These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint.  Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.

In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer.  Change them all to use the t.Cleanup mechanism.

For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test.  That
stops the test being messed up by a previous run, but messily.  Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:20:35 +10:00
David Gibson
f7ba21c86f runtime: Clean up mock hook logs in tests
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log.  Currently we don't clean up those logs
when the tests are done.  Use a test cleanup function to do this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
David Gibson
90b2f5b776 runtime: Make SetupOCIConfigFile clean up after itself
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp().  This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.

Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
David Gibson
2eeb5dc223 runtime: Don't use fixed /tmp/mountPoint path
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount.  This is not cleaned up after the test.  Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.

Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
Archana Shinde
33e244f284
Merge pull request #4102 from likebreath/0414/clh_v23.0
Upgrade to Cloud Hypervisor v23.0
2022-04-19 06:01:04 -07:00
Chelsea Mafrica
0af13b469d
Merge pull request #4086 from BbolroC/s390x-fix
test: Fix golangci-lint error for s390x
2022-04-15 21:07:09 -07:00
Bin Liu
b19bfac7cd
Merge pull request #4042 from yibozhuang/direct-assign-fsgroup
fsGroup support for direct-assigned volume
2022-04-16 10:23:15 +08:00
Bin Liu
4ec1967542
Merge pull request #4094 from fgiudici/kata-monitor_readme
kata-monitor: add the README file
2022-04-16 08:27:22 +08:00
Bin Liu
362201605e
Merge pull request #4055 from fgiudici/kata-monitor_pprof
kata-monitor: update the hrefs in the debug/pprof index page
2022-04-16 08:12:18 +08:00
Francesco Giudici
7b2ff02647 kata-monitor: add a README file
Fixes: #3704

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-04-15 18:03:23 +02:00
Bo Chen
29e569aa92 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v23.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-04-14 12:56:01 -07:00
Chelsea Mafrica
32f92e75cc
Merge pull request #4021 from fengwang666/direct-volume-bug
runtime: Base64 encode the direct volume mountInfo path
2022-04-13 13:15:38 -07:00
Greg Kurz
4443bb68a4
Merge pull request #4064 from tiezhuoyu/4063/no-need-to-write-error-of-virtiofsd-to-kata-log
runtime: no need to write virtiofsd error to log
2022-04-13 11:59:19 +02:00
Hyounggyu Choi
d136c9c240 test: Fix golangci-lint error for s390x
This is to fix a test failure for the
kata-containers-2.0-ubuntu-20.04-s390x-main-baseline jenkins job

Fixes: #4088

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2022-04-13 09:20:51 +02:00
Fupan Li
66aa07649b
Merge pull request #4062 from liubin/fix/4061-add-links-for-kata-monitor
kata-monitor: add some links when generating pages for browsers
2022-04-13 11:30:21 +08:00
Francesco Giudici
86977ff780 kata-monitor: update the hrefs in the debug/pprof index page
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.

The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.

Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.

Fixes: #4054

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-04-12 15:53:59 +02:00
Zhuoyu Tie
6e79042aa0 runtime: no need to write virtiofsd error to log
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log

Fixes: #4063

Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
2022-04-12 15:59:57 +08:00
Yibo Zhuang
532d53977e runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.

Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
Yibo Zhuang
6a47b82c81 proto: fsGroup support for direct-assigned volume
This change adds two fields to the Storage pb

FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.

FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.

These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
bin
f8cc5d1ad8 kata-monitor: add some links when generating pages for browsers
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.

Fixes: #4061

Signed-off-by: bin <bin@hyper.sh>
2022-04-11 09:29:56 +08:00
bin
9d5b03a1b7 runtime: delete debug option in virtiofsd
virtiofsd's debug will be enabled if hypervisor's debug has been
enabled, this will generate too many noisy logs from virtiofsd.

Unbind the relationship of log level between virtiofsd and
hypervisor, if users want to see debug log of virtiofsd,
can set it by:

  virtio_fs_extra_args = ["-o", "log_level=debug"]

Fixes: #3303

Signed-off-by: bin <bin@hyper.sh>
2022-04-07 19:55:22 +08:00
Greg Kurz
d0d3787233
Merge pull request #3696 from shippomx/main
kata-runtime enable hugepage support
2022-04-06 16:47:04 +02:00
Jaylyn Ren
b975f2e8d2 Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
The vhost-user-blk can be hotplugged on the PCI bridge successfully on
X86, but failed on Arm. However, hotplugging it on Root Port as a PCIe
device can work well on ARM.
Open the "pcie_root_port" in configuration.toml is needed.

Fixes: #4019

Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>
2022-04-06 17:37:51 +08:00
Fabiano Fidêncio
b39caf43f1
Merge pull request #3923 from Jakob-Naucke/no-initrd-se
runtime: Allow and require no initrd for SE
2022-04-05 09:26:07 +02:00
Feng Wang
354cd3b9b6 runtime: Base64 encode the direct volume mountInfo path
This is to avoid accidentally deleting multiple volumes.

Fixes #4020

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-04-04 19:56:46 -07:00
Archana Shinde
e62bc8e7f3
Merge pull request #3915 from Juneezee/test/t.TempDir
test: use `T.TempDir` to create temporary test directory
2022-04-04 01:34:46 -07:00
Fabiano Fidêncio
98750d792b clh: Expose service offload configuration
This configuration option is valid for all the hypervisor that are going
to be used with the confidential containers effort, thus exposing the
configuration option for Cloud Hypervisor as well.

Fixes: #4022

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-01 11:15:55 +02:00
Eng Zer Jun
59c7165ee1
test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

This commit also updates the unit test advice to use `T.TempDir` to
create temporary directory in tests.

Fixes: #3924

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-31 09:31:36 +08:00
snir911
18dc578134
Merge pull request #3999 from fgiudici/kata-monitor_fix_help
kata-monitor: fix duplicated output when printing usage
2022-03-30 18:56:59 +03:00
Francesco Giudici
a63bbf9793 kata-monitor: fix duplicated output when printing usage
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:

[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
  -listen-address string
        The address to listen on for HTTP requests. (default ":8090")
  -log-level string
        Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
  -runtime-endpoint string
        Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")

the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.

Fixes: #3998

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-03-30 11:58:53 +02:00
bin
5e1c30d484 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:59:12 +08:00
bin
fb8be96194 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.

When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:39:01 +08:00
Bin Liu
9495316145
Merge pull request #3962 from yaoyinnan/fix/3750-VirtioMem
runtime: Remove the explicit VirtioMem set and fix the comment
2022-03-29 10:20:05 +08:00
yaoyinnan
66f05c5bcb runtime: Remove the explicit VirtioMem set and fix the comment
Modify the 2Mib in the comment to 4Mib.
VirtioMem is set by configuration file or annotation. And setupVirtioMem is called only when VirtioMem is true.

Fixes: #3750

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-03-28 21:21:38 +08:00
Feng Wang
0928eb9f4e agent: Kill the all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-27 10:06:58 -07:00
Jakob Naucke
ff17c756d2
runtime: Allow and require no initrd for SE
Previously, it was not permitted to have neither an initrd nor an image.
However, this is the exact config to use for Secure Execution, where the
initrd is part of the image to be specified as `-kernel`. Require the
configuration of no initrd for Secure Execution.

Also
- remove redundant code for image/initrd checking -- no need to check in
  `newQemuHypervisorConfig` (calling) when it is also checked in
  `getInitrdAndImage` (called)
- use `QemuCCWVirtio` constant when possible

Fixes: #3922
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-03-25 18:36:12 +01:00
Feng Wang
19f372b5f5 runtime: Add more debug logs for container io stream copy
This can help debugging container lifecycle issues

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-24 21:35:16 -07:00
David Gibson
c77e34de33 runtime: Move mock hook source
src/runtime/virtcontainers/hook/mock contains a simple example hook in Go.
The only thing this is used for is for some tests in
src/runtime/pkg/katautils/hook_test.go.  It doesn't really have anything
to do with the rest of the virtcontainers package.

So, move it next to the test code that uses it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:37:35 +11:00
David Gibson
86723b51ae virtcontainers: Remove unused install/uninstall targets
We've now removed the need to install the mock hook binary for unit tests.
However, it turns out that managing that was the *only* thing that the
install and uninstall targets in the virtcontainers Makefile handled.

So, remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:37:18 +11:00
David Gibson
0e83c95fac virtcontainers: Run mock hook from build tree rather than system bin dir
Running unit tests should generally have minimal dependencies on
things outside the build tree.  It *definitely* shouldn't modify
system wide things outside the build tree.  Currently the runtime
"make test" target does so, though.

Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary.  They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.

Go tests automatically run within the package's directory though, so
there's no need to use a system wide path.  We can use a relative path to
the binary build within the tree just as easily.

fixes #3941

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:34:50 +11:00
David Gibson
e65db838ff virtcontainers: Remove VC_BIN_DIR
The VC_BIN_DIR variable in the virtcontainers Makefile is almost unused.
It's used to generate TEST_BIN_DIR, and it's created in the install target.
However, we also create TEST_BIN_DIR, which is a subdirectory of VC_BIN_DIR
with mkdir -p, so it will necessarily create VC_BIN_DIR along the way.

So we can drop the unnecessary mkdir and expand the definition of
VC_BIN_DIR in the definition of TEST_BIN_DIR.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:53:59 +11:00
David Gibson
c20ad2836c virtcontainers: Remove unused Makefile defines
The INSTALL_EXEC and UNINSTALL_EXEC definitions from the virtcontainers
Makefile (unlike those from the runtime Makefile in the parent directory)
are entirely unused.  Remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:40:57 +11:00
David Gibson
c776bdf4a8 virtcontainers: Remove unused parameter from go-test.sh
The check-go-test target passes the path to the mock hook test binary to
go-test.sh when it invokes it.  But go-test.sh just calls run_go_test from
ci/lib.sh, which invokes a script from the tests repo *without* any
parameters.

That is, this parameter is ignored anyway, so remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:39:22 +11:00
James O. D. Hunt
f8fb0d3bb6
Merge pull request #3322 from Kvasscn/kata_dev_block_driver_option
device: using const strings for block-driver option instead of hard coding
2022-03-21 10:56:25 +00:00
Miao Xia
a2f5c1768e runtime/virtcontainers: Pass the hugepages resources to agent
The hugepages resources claimed by containers should be limited
by cgroup in the guest OS.

Fixes: #3695

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
2022-03-15 18:46:08 +08:00
Feng Wang
aa5ae6b17c runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-14 11:03:05 -07:00
zhanghj
efa19c41eb device: use const strings for block-driver option instead of hard coding
Currently, the block driver option is specifed by hard coding, maybe it
is better to use const string variables instead of hard coded strings.
Another modification is to remove duplicate consts for virtio driver in
manager.go.

Fixes: #3321

Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
2022-03-14 09:20:43 +08:00
Gabriela Cervantes
ffdf961ae9 docs: Update contact link in runtime README
This PR updates the contact link in the runtime README document.

Fixes #3854

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-03-08 16:27:34 +00:00
Bin Liu
deb8ce97a8
Merge pull request #3836 from liubin/fix/minor-fix
Enhancement: fix comments/logs and delete not used function
2022-03-07 17:26:30 +08:00
bin
1b34494b2f runtime: fix invalid comments for pkg/resourcecontrol
Some comments are copied and not adjusted to the
pkg/resourcecontrol package.

Fixes: #3835

Signed-off-by: bin <bin@hyper.sh>
2022-03-05 10:32:31 +08:00
Evan Foster
afc567a9ae storage: make k8s emptyDir creation configurable
This change introduces the `disable_guest_empty_dir` config option,
which allows the user to change whether a Kubernetes emptyDir volume is
created on the guest (the default, for performance reasons), or the host
(necessary if you want to pass data from the host to a guest via an
emptyDir).

Fixes #2053

Signed-off-by: Evan Foster <efoster@adobe.com>
2022-03-04 12:02:42 -08:00
Eric Ernst
1e301482e7
Merge pull request #3406 from fengwang666/direct-blk-assignment
Implement direct-assigned volume
2022-03-04 11:58:37 -08:00
Feng Wang
e76519af83 runtime: small refactor to improve readability
Remove some confusing/duplicate code so it's more readable

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-04 10:00:52 -08:00
Fabiano Fidêncio
7e5f11a52b vendor: Update containerd to 1.6.1
Let's bring in the latest release of Containerd, 1.6.1, released on
March 2nd, 2022.

With this, we take the opportunity to remove containerd/api reference as
we shouldn't need a separate module only for the API.

Here's the list of changes needed in the code due to the bump:
* stop using `grpc.WithInsecure()` as it's been deprecated
  - use `grpc.WithTransportCredentials(insecure.NewCredentials())`
    instead

Fixes: #3820

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-04 10:28:40 +01:00
Fabiano Fidêncio
2af91b23e1
Merge pull request #3281 from jongwu/vcpu_hotplug_arm64
experimentally enable vcpu hotplug and virtio-mem on arm64 in kernel part
2022-03-04 09:14:31 +01:00
Jianyong Wu
42771fa726 runtime: don't set socket and thread for arm/virt
As this is just a initial vcpu hotplug support, thread and socket has
not been supported. So, don't set socket and thread when hotadd cpu for
arm/virt.

Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-03-04 11:22:18 +08:00
Feng Wang
f905161bbb runtime: mount direct-assigned block device fs only once
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
ea51ef1c40 runtime: forward the stat and resize requests from shimv2 to kata agent
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
c39281ad65 runtime: update container creation to work with direct assigned volumes
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
4e00c2377c agent: add grpc interface for stat and resize operations
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
e9b5a25502 runtime: add stat and resize APIs to containerd-shim-v2
To query fs stats and resize fs, the requests need to be passed to
kata agent through containerd-shim-v2. So we're adding to rest APIs
on the shim management endpoint.
Also refactor shim management client to its own go file.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:56:53 -08:00
Feng Wang
6e0090abb5 runtime: persist direct volume mount info
In the direct assigned volume scenario, Kata Containers persists
the information required for managing the volume inside the guest
on host filesystem.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 15:32:12 -08:00
Feng Wang
fa326b4e0f runtime: augment kata-runtime CLI to support direct-assigned volume
Add commands to add, remove, resize and get stats of a direct-assigned volume.
These commands are expected to be consumed by CSI.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 15:32:03 -08:00
Fabiano Fidêncio
a2422cf2a1
Merge pull request #3389 from zhsj/rm-distro-test
katatestutils: remove distro constraints
2022-03-03 23:26:58 +01:00
Fabiano Fidêncio
12af632952
Merge pull request #3814 from fidencio/wip/disable-block-device-use-minor-fixes
Minor fixes for the `disable_block_device_use` comments
2022-03-03 23:26:05 +01:00
Fabiano Fidêncio
af80473496 clh: stop virtofsd if clh fails to boot up the vm
If, for some reason, we're able to launch cloud hypervisor but not able
to boot the VM up, the virtiofsd process would be left behind.

Let's ensure, via defer, that we stop virtiofsd in case of errors.

Fixes: #3819

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 19:10:37 +01:00
Fabiano Fidêncio
c54bc8e657
Merge pull request #3811 from fidencio/wip/clh-tdx-round-2
clh: tdx: Don't use sharedFS with Confidential Guests
2022-03-03 19:03:28 +01:00
Fabiano Fidêncio
97951a2d12 clh: Don't use SharedFS with Confidential Guests
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.

1. virtio-fs, as of now, cannot be part of the trust boundary, so the
   Confidential Guest will not be using it.

2. virtio-block hotplug should be enabled in order to use virtio-block
   for the rootfs (used with the devmapper plugin).

When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```

After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.

@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".

Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.

Fixes: #3810

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:40 +01:00
Fabiano Fidêncio
c30b3a9ff1 clh: Adding a volume is not supported without SharedFS
As mounting volumes into the guest requires SharedFS setup, let's ensure
we error out if trying to do so in a situation where SharedFS is not
supported.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:30 +01:00
Fabiano Fidêncio
f889f1f957 clh: introduce supportsSharedFS()
supportsSharedFS() is a new method to be used to ensure that no SharedFS
specifics are called when, for a reason or another, Cloud Hypervisor is
in a mode where SharedFSs are not supported.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:28 +01:00
Fabiano Fidêncio
54d27ed721 clh: introduce loadVirtiofsDaemon()
Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methos,
let's introduce and use loadVirtiofsDaemon, at it'll also be handy later
in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:38 +01:00
Fabiano Fidêncio
ae2221ea68 clh: introduce stopVirtiofsDaemon()
Similary to the `createVirtiofsDaemon` method, let's introduce and use
its counterpart, as it'll also be handy later in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:26 +01:00
Fabiano Fidêncio
e8bc26f90d clh: introduce setupVirtiofsDaemon()
Similarly to what's been done with the `createVirtiofsDaemon`, let's
create a `setupVirtiofsDaemon` one.

It will also become handy later in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:14 +01:00
Fabiano Fidêncio
413b3b477a clh: introduce createVirtiofsDaemon()
Let's introduce and use a new `createVirtiofsDaemon` method.  Its name
says it all, and it'll be handy later in this series when, spoiler
alert, SharedFS cannot be used (in such cases as in Confidential
Guests).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:02 +01:00
James O. D. Hunt
55cd0c89d8 runtime: Build golang components with extra security options
Enable stack protector and fortify source for golang builds.

Fixes: #3817.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-03 10:41:26 +00:00
Fabiano Fidêncio
76e4f6a2a3 Revert "hypervisors: Confidential Guests do not support Device hotplug"
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 09:59:55 +01:00
Fabiano Fidêncio
fa8b93927c config: qemu: Fix disable_block_device_use comments
virtio-fs, instead of virtio-9p, is the default shared file system type
in case virtio-blk is not used.

Fixes: #3813

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-02 20:43:36 +01:00
Fabiano Fidêncio
9615c8bc9c config: fc: Don't expose disable_block_device_use
Relying on virtio-block is the *only* way to use Firecracker with Kata
Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by
Firecracker.

As configuration doesn't make sense to be exposed, we hardcode the
`false` value in the Firecracker configuration structure.

Fixes: #3813

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-02 20:43:28 +01:00
Bin Liu
2ae8bd696a
Merge pull request #3367 from wfly1998/main
build: always reset ARCH after getting it
2022-03-02 14:42:45 +08:00
Bin Liu
75877f8793
Merge pull request #3187 from Kvasscn/kata_dev_remove_temp_vsock_dir
virtcontainers: remove temp dir created for vsock in test code
2022-03-02 11:05:47 +08:00
Francesco Giudici
7f638dd049
Merge pull request #3764 from Jakob-Naucke/hugepages-test-s390x
virtcontainers: Use available s390x hugepages
2022-03-01 14:33:59 +01:00
Fabiano Fidêncio
4ab35b0899
Merge pull request #3796 from jodh-intel/fix-monitor-listen-address
Fix monitor listen address
2022-03-01 13:51:01 +01:00
Fabiano Fidêncio
97c17085b0
Merge pull request #3770 from Jakob-Naucke/gofmt-vmm-s390x
runtime: Gofmt fixes
2022-03-01 11:34:15 +01:00
James O. D. Hunt
e64c54a2ad monitor: Listen to localhost only by default
Change `kata-monitor` to listen to port `8090` on the local interface
only by default.

> **Note:**
>
> This is a breaking change as previously it listened on all interfaces.

Fixes: #3795.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-01 10:00:43 +00:00
James O. D. Hunt
e6350d3d45 monitor: Fix build options
Removed redundant and duplicated build options to build
`kata-monitor` the same way as the other components:

- `CGO_ENABLED=0` is not necessary.
- `-buildmode=exe` is not necessary since `BUILDFLAGS` already sets the
  build mode.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-01 10:00:43 +00:00
GabyCT
ccb063b848
Merge pull request #3788 from fidencio/wip/update-clh-confidential-guest-comments
Update `confidential_guest` comments
2022-02-28 15:11:01 -06:00
GabyCT
bc1733bb0e
Merge pull request #3774 from egernst/delinux-runtime
cleanup runtime pkgs for Darwin build, add basic Darwin build/unit test
2022-02-28 15:08:09 -06:00
Jakob Naucke
eda8ea154a
runtime: Gofmt fixes
- Mostly blank lines after `+build` -- see
  https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
  `gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go

Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-28 17:24:47 +01:00
Eric Ernst
e355a71860 container: file is not linux specific
This should not be linux specific -- drop restriction.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
b31876eefb device-manager: move linux-only test to a linux-only file
We can't Mkdev on Darwin - let's make sure the vfio test is in a
linux-only file.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
6a5c634490 resourcecontrol: SystemdCgroup check is not necessarily linux specific
This utility function is also used to check the spec that will run in
the guest - no need for this to be linux specific.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
cc58cf6993 resourcecontrol: convert stats dev_t to unit64types
Their types may differ on various host OSes, but
unix.Major|Minor always takes a uint64

Depends-on: github.com/kata-containers/tests#4516
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
5be188cc29 utils: Add darwin stub
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...

Fixes:# 3777

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
ad0449195d virtcontainers: Convert stats dev_t to uint64
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
56751089c0 katautils: Use a syscall wrapper for the hook JSON state
There is no real equivalent of a thread ID on Darwin.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
7d64ae7a41 runtime: Add a syscall wrapper package
It allows to support syscall variations between host OSes.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
abc681ca5f katautils: Add Darwin stub for the netNS API
And move the current implementation into a Linux only file.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Fabiano Fidêncio
de57466212 config: Expand confidential_guest comments
Let's clarify that an error will be reported in case confidential_guest
is enabled, but the hardware where Kata Containers is running doesn't
provide the required feature set.

Fixes: #3787

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 11:57:42 +01:00
Fabiano Fidêncio
641d475fa6 config: clh: Use "Intel TDX" instead of just "TDX"
Let's use "Intel TDX" rather than just "TDX", as it can ease the
understanding of the terminology.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 10:27:21 +01:00
Fabiano Fidêncio
0bafa2def9 config: clh: Mention supported TEEs
Let's mention the supported TEEs to be used with confidential guests.

Right now, Cloud Hyperisor supports only Intel TDX, used together with
TD Shim.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 10:24:33 +01:00
bin
81ed269ed2 runtime: use Cmd.StdoutPipe instead of self-created pipe
Nydusd uses a bufio.Scanner to check if nydusd process has
existed, but stderr/stdout passed to Cmd is self-created pipe,
this pipe will not be closed if the process start failing.

Use standard Cmd.StdoutPipe can close the stdout and kata shim
will detect the existence of the nydusd process, then call cmd.Wait to
reap the process' resources.

Fixes: #3783

Signed-off-by: bin <bin@hyper.sh>
2022-02-28 16:52:49 +08:00
Eric Ernst
3997c962c2
Merge pull request #3767 from tanweernoor/02242022-kata-containers-issue-3631
runtime, config: make selinux configurable
2022-02-26 08:44:29 -08:00
Fabiano Fidêncio
a9ba7c132b clh: Fix typo on HotplugRemoveDevice
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 22:35:32 +01:00
Tanweer Noor
082d538cb4 runtime: make selinux configurable
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml

Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
2022-02-25 10:33:46 -08:00
Fabiano Fidêncio
ea1876f057
Merge pull request #3771 from fidencio/wip/clh-tdx
clh: Add TDX support
2022-02-25 18:45:31 +01:00
Samuel Ortiz
1103f5a4d4 virtcontainers: Use FilesystemSharer for sharing the containers files
Switching to the generic FilesystemSharer brings 2 majors improvements:

1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
   container files and root filesystems with the Kata Linux guest.

Fixes #3622

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
533c1c0e86 virtcontainers: Keep all filesystem sharing prep code to sandbox.go
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
61590bbddc virtcontainers: Add a Linux implementation for the FilesystemSharer
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
03fc1cbd7e virtcontainers: Add a filesystem sharing interface
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.

In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.

This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesysstem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Fabiano Fidêncio
72434333aa clh: Add TDX support
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.

Fixes: #3632

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
a13b4d5ad8 clh: Add firmware to the config file
"firmware" option was already present for a while, but it's never been
exposed to the configuration file before.

Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so called
`td-shim`[0] with Cloud Hypervisor.

[0]: https://github.com/confidential-containers/td-shim

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
a8827e0c78 hypervisors: Confidential Guests do not support NVDIMM
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
f50ff9f798 hypervisors: Confidential Guests do not support Memory hotplug
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
df8ffecde0 hypervisors: Confidential Guests do not support Device hotplug
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
28c4c044e6 hypervisors: Confidential Guests do not support VCPUs hotplug
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".

The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.

One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
29ee870d20 clh: Add confidential_guest to the config file
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.

Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
9621c59691 clh: refactor image / initrd configuration set
This is a small code refactor removing a deadcode based the checks
already done in the generic hypervisor abstraction.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
dcdc412e25 clh: use common kernel params from the hypervisor code
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block

As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
4c164afbac versions: Update Cloud Hypervisor to 5343e09e7b8db
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:

* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
    - This is needed for the TDX support on the cloud hypervisor driver,
      which is part of this very same series.

* openapi: Update the PciBdf types
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
    - This is needed due to a change in a DeviceNode field, which would
      cause a marshalling / demarshalling error when running with a
      version of cloud-hypervisor that includes the TDX fixes mentioned
      above.

* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
    - This is needed due to changes in the scripts used to build Cloud
      Hypervisor, which are used as part of Kata Containers CIs and
      github actions.

      Due to this change, we're also adapting the build scripts as part
      of this very same commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:16 +01:00
Jakob Naucke
bbfe7d6591
Merge pull request #3599 from Jakob-Naucke/no-virtio-rng-ccw
virtcontainers: Do not add a virtio-rng-ccw device
2022-02-25 15:27:02 +01:00
Francesco Giudici
3da6006de4
Merge pull request #3751 from fgiudici/kata-monitor_issue3705
kata-monitor: fix collecting metrics for sandboxes not started through CRI
2022-02-25 14:53:12 +01:00
Jakob Naucke
b2a65f9031
virtcontainers: Use available s390x hugepages
in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so
test with any that are present (default is 1M).

Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-25 13:11:00 +01:00
Amulyam24
cb4230e60e runtime: fix package declaration for ppc64le
Incorrect package name causes build to fail. Fix it
in vm_ppc64le.go

Fixes: #3761

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2022-02-24 15:31:48 +05:30
Eric Ernst
c6cc038364
Merge pull request #3615 from sameo/topic/hypervisor
Make the hypervisor framework not Linux specific
2022-02-23 16:02:00 -08:00
Francesco Giudici
fec26f8e51 kata-monitor: trivial: rename symbols & labels
We introduced collection of sandboxes metadata from the CRI that will be
attached to the sandbox metrics: this will allow to immediately match
sandboxes metrics with CRI workloads.
Rename the symbols from *Kube* to *CRI* as the metadata will be there
every time pods are created through CRI, also if kubernetes is not
installed (e.g., 'crictl runp').

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-02-23 18:34:32 +01:00
Samuel Ortiz
9fd4e5514f runtime: Move the resourcecontrol package one layer up
And try to reduce the number of virtcontainers packages, step by step.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
823faee83a virtcontainers: Rename the cgroups package
To resourcecontrol, and make it consistent with the fact that cgroups
are a Linux implementation of the ResourceController interface.

Fixes: #3601

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
0d1a7da682 virtcontainers: Rename and clean the cgroup interface
We call it a ResourceController, and we make it not so Linux specific.
Now the Linux implementations is the cgroups one.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
ad10e201e1 virtcontainers: cgroups: Move non Linux routine to utils.go
Have an OS agnostic file for sharing routines.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
d49d0b6f39 virtcontainers: cgroups: Define a cgroup interface
And move the current, Linux-specific implementation into
cgroups_linux.go

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Francesco Giudici
3ac52e8193 kata-monitor: fix updating sandbox cache at startup
We now rely on fs events only to update the sandbox cache. This is not
true anyway for sandboxes already present at kata-monitor startup: we
just retrieve the list and add them in the cache only when we get their
CRI metadata. If CRI metadata is not available we will never add them to
the sandbox cache.
Fix this by immediately adding the sandboxes we find at startup time to
the sandbox cache.

Fixes: #3705

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-02-23 11:21:06 +01:00
Francesco Giudici
160bb62138 kata-monitor: bump version to 0.3.0
Since kata-monitor now:
- relies on fs events *only* to update the sandbox cache
- adds CRI meta-data as labels (CRI pod name, namespace and uid)
it deserves a version bump.

Note that while we could let kata-monitor match the runtime version,
kata-monitor will usually work flawlessy with different kata shim
releases: so it makes sense to keep kata-monitor version separated.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-02-23 11:17:02 +01:00
Fabiano Fidêncio
6a9e5f90f7
Merge pull request #3670 from sameo/topic/nerdctl
Support nerdctl OCI hooks
2022-02-22 23:03:33 +01:00
Fabiano Fidêncio
4729fd0fc2
Merge pull request #3736 from liubin/fix/3733-log-events-for-crio
shim: log events for CRI-O
2022-02-22 09:19:37 +01:00
bin
f6fc1621f7 shim: log events for CRI-O
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.

For CRI-O runtime, we can log the events to log file.

Fixes: #3733

Signed-off-by: bin <bin@hyper.sh>
2022-02-22 11:02:50 +08:00
Fabiano Fidêncio
1e9f3c856d
Merge pull request #3553 from fgiudici/kata-monitor_cachefix
kata-monitor: simplify sandbox cache management and attach kubernetes POD metadata to metrics
2022-02-21 13:17:22 +01:00
Peng Tao
031da99914
Merge pull request #3687 from luodw/nydus-clh
nydus: add lazyload support for kata with clh
2022-02-21 19:31:45 +08:00
luodaowen.backend
3175aad5ba virtiofs-nydus: add lazyload support for kata with clh
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.

Fixes #3654

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-19 21:55:31 +08:00
zhanghj
94b831ebf8 virtcontainers: remove temp dir created for vsock in test code
remove temp dir generated by mock.GenerateKataMockHybridVSock().

Fixes: #3186

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2022-02-19 16:59:15 +08:00
Archana Shinde
7db9bef72c
Merge pull request #3718 from Kvasscn/kata_dev_fix_utils_assert_msg
virtcontainers: Remove duplicated assert messages in utils test code
2022-02-18 06:07:16 -08:00
Samuel Ortiz
27de212fe1 runtime: Always add network endpoints from the pod netns
As the container runtime, we're never inspecting, adding or configuring
host networking endpoints.
Make sure we're always do that by wrapping addSingleEndpoint calls into
the pod network namespace.

Fixes #3661

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-18 10:37:07 +01:00
zhanghj
1cee0a9452 virtcontainers: Remove duplicated assert messages in utils test code
Remove duplicated strings in assert.Errorf() and assert.NoErrorf().

Fixes: #3714

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2022-02-18 16:45:05 +08:00
Samuel Ortiz
77c29bfd3b container: Remove VFIO lazy attach handling
With the recently added VFIO fixes and support, we should not need that
anymore.

Fixes #3108

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-17 08:39:44 +01:00
GabyCT
ced5e910d5
Merge pull request #3558 from jodh-intel/docs-rework-readme
docs: Improve top-level README
2022-02-16 16:28:14 -06:00
Fabiano Fidêncio
6f9685fbf5
Merge pull request #3624 from mdlayher/mdl-vsock
runtime: use github.com/mdlayher/vsock@v1.1.0
2022-02-16 23:11:47 +01:00
Samuel Ortiz
26b3f0017c virtcontainers: Split hypervisor into Linux and OS agnostic bits
Keep all the OS agnostic bits in the hypervisor.go and
hypervisor_ARCH.go files.

Fixes #3614

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:15:31 +01:00
Samuel Ortiz
fa0e9dc6b1 virtcontainers: Make all Linux VMMs only build on Linux
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
c91035d0e1 virtcontainers: Move non QEMU specific constants to hypervisor.go
Hotplugging errors and 9pfs size are not particularily QEMU specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
10ae05914c virtcontainers: Move guest protection definitions to hypervisor.go
They're not QEMU specific, other VMMs may implement support for it.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:31 +01:00
Samuel Ortiz
b28d0274ff virtcontainers: Make max vCPU config less QEMU specific
Even though it's still actually defined as the QEMU upper bound,
it's now abstracted away through govmm.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:06:32 +01:00
Samuel Ortiz
a5f6df6a49 govmm: Define the number of supported vCPUs per architecture
Based on qhe QEMU supports on those architectures.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:06:32 +01:00
Fabiano Fidêncio
be2e90469a
Merge pull request #3669 from fidencio/wip/virtiofsd-use-announce-submounts
virtiofsd: Use "-o announce_submounts"
2022-02-16 16:43:18 +01:00
James O. D. Hunt
9818cf7196 docs: Improve top-level and runtime README
Various improvements to the top-level README file:

- Moved the following sections from the runtime's README to the
  top-level README:
  - License
  - Platform support / Hardware requirements
- Added the following sections to the top-level README:
  - Configuration
  - Hypervisors
- Improved formatting of the Documentation section in the top-level
  README.
- Removed some unused named links from the top-level README.

Also improvements to the runtime README:

- Removed confusing mention of the old 1.x runtime name.
- Clarify the binary name for the 2.x runtime and the utility program.

> **Note:**
>
> We cannot currently link to the AMD website as that site's
> configuration causes the CI static checks to fail. See
> https://github.com/kata-containers/tests/issues/4401

Fixes: #3557.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-02-16 09:52:48 +00:00
bin
81a8baa5e5 runtime: add hugepages support
Add hugepages support, port from:
b486387cba

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:53 +08:00
bin
7df677c01e runtime: Update calculateSandboxMemory to include Hugepages Limit
Support hugepages and port from:
96dbb2e8f0

Fixes: #3342

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:37 +08:00
Samuel Ortiz
4f96e3eae3 katautils: Pass the nerdctl netns annotation to the OCI hooks
We need to let nerdctl know which namespace to use when calling the
selected CNI plugin.
See https://github.com/containerd/nerdctl/issues/787

Fixes: #1935

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 18:11:23 +01:00
Samuel Ortiz
a871a33b65 katautils: Run the createRuntime hooks
The preStart hooks are being deprecated over the createRuntime ones.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 17:31:56 +01:00
Samuel Ortiz
d9dfce1453 katautils: Run the preStart hook in the host namespace
The OCI spec is very specific about it:

"The prestart hooks MUST be executed in the runtime namespace."

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 17:31:56 +01:00
Samuel Ortiz
6be6d0a3b3 katautils: Pass the OCI annotations back to the called OCI hooks
That allows us to amend those annotations with information that could be
used when running those hooks.

For example nerdctl will use those annotations to resolve the networking
namespace path in where to run the CNI plugin, i.e. the created pod
networking namespace.

Fixes #3629

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 17:31:56 +01:00
Fabiano Fidêncio
4bd945b67b virtiofsd: Use "-o announce_submounts"
German Maglione, one of the current virtio-fs developers, has brought to
our attention that using "announce-submounts" could help us to prevent
inode number collisions.

This feature was introduced a year ago or so by Hanna Reitz as part of
the 08dce386e77eb9ab044cb118e5391dc9ae11c5a8, and as we already mandate
QEMU >= 6.1.0, let's take advantage of that.

Fixes: #3507

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 08:52:03 +01:00
Yu Li
37df1678ae build: always reset ARCH after getting it
When building with `ARCH=x86_64`, the previous `Makefile` will use it
without checking and cause:

Makefile:319: *** "ERROR: No hypervisors known for architecture x86_64 (looked for: acrn firecracker qemu cloud-hypervisor)".  Stop.

This commit fix the above issue by checking `ARCH` no matter where it
is assigned.

Fixes: #3444

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2022-02-15 14:26:34 +08:00
Shengjing Zhu
3a641b56f6 katatestutils: remove distro constraints
The distro constraint parses os release files, which may not contain
distro version(VERSION_ID field), for example rolling release distributions
like Debian testing, archlinux.

These distro constraints are not used anyway, so removing them instead
of fixing the complex version detection.

Fixes: #1864

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-02-15 02:11:52 +08:00
Fabiano Fidêncio
90fd625d0c versions: Udpate Cloud Hypervisor to 55479a64d237
Let's update cloud-hypervisor to a version that exposes the TDx support
via the OpenAPI's auto-generated code.

Fixes: #3663

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-14 17:32:30 +01:00
James O. D. Hunt
8f80dffead
Merge pull request #3648 from yaoyinnan/index-in-for
runtime: The index variable is initialized multiple times in for
2022-02-14 12:36:46 +00:00
Bin Liu
cf53ec2c71
Merge pull request #2977 from luodw/support_nydus
feature(nydusd): add nydusd support to introduce lazyload ability
2022-02-14 13:08:50 +08:00
Matt Layher
c1ce67d905
runtime: use github.com/mdlayher/vsock@v1.1.0
Fixes #3625
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2022-02-12 19:57:15 -05:00
yaoyinnan
42a878e6c1 runtime: The index variable is initialized multiple times in for
Change the variables `mountTypeFieldIdx := 8`, `mntDestIdx := 4` and `netNsMountType := "nsfs"` to const.

And unify the variable naming style, modify `mntDestIdx` to `mountDestIdx`.

Fixes: #3646

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-02-12 11:10:10 +08:00
luodaowen.backend
2d9f89aec7 feature(nydusd): add nydusd support to introduse lazyload ability
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.

Fixes #2724

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-11 21:41:17 +08:00
Daniel Höxtermann
b19b6938a8 docs: Fix relative links in Markdown
Relative links within this repository allow for easier navigation to
the corresponding file / directory in the current commit / for the
selected version.

Link text was slightly changed / fixed in
- docs/Unit-Test-Advice.md
- docs/how-to/how-to-run-docker-with-kata.md

Fixes #3045

Signed-off-by: Daniel Höxtermann <daniel@hxtm.dev>
2022-02-11 13:49:42 +01:00
Julio Montes
982f14fa66 runtime: support QEMU SGX
Enable SGX in QEMU when `sgx.intel.com/epc` annotation is defined

fixes #3436

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-02-10 09:45:48 -06:00
Samuel Ortiz
07b9d93f5f virtcontainer: Simplify the sandbox network creation flow
We don't need to call NewNetwork() twice, and we can have the VM factory
case return immediatly. That makes the code more readable.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
2c7087ff42 virtcontainers: Make all endpoints Linux only
All of the networking endpoints are Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
49d2cde1e2 virtcontainers: Split network tests into generic and OS specific parts
Some unit tests are generic while others, mostly because they depend on
netlink, are Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
0269077ebf virtcontainers: Remove the netlink package dependency from network.go
Move the netlink dependent code into network_linux.go.
Other OSes will have to provide the same functions.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
7fca5792f7 virtcontainers: Unify Network endpoints management interface
And only have AddEndpoints/RemoveEndpoints for all cases (single
endpoint vs all of them, hotplug or not).

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
c67109a251 virtcontainers: Remove the Network PostAdd method
It's used once by the sandbox code and can be implemented directly
there.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
e0b264430d virtcontainers: Define a Network interface
And move the Linux implementation into a GOOS specific file.

Fixes #3005

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
5e119e90e8 virtcontainers: Rename the Network structure fields and methods
We are converting the Network structure into an interface, so that
different host OSes can have different networking implementations for
Kata.
One step into that direction is to rename all the Network structure
fields and methods to something that is less Linux networking namespace
specific. This will make the Network interface naming consistent.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
b858d0dedf virtcontainers: Make all Network fields private
Prepare for making it a real interface.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
49eee79f5f virtcontainers: Remove the NetworkNamespace structure
It is now replaced with a single Network structure

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
844eb61992 virtcontainers: Have CreateVM use a Network reference
We are replacing the NetworkingNamespace structure with the Network
one, so we should have the hypervisor interface switching to it as well.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
d7b67a7d1a virtcontainers: Network API cleanups and simplifications
Remove unused parameters.
Reduce the number of parameters by deriving some of them (e.g. a
networking config) from their outer structure (e.g. a Sandbox
reference).

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
2edea88369 virtcontainers: Make the Network structure manage endpoints
Endpoints creations, attachement and hotplug are bound to the networking
namespace described through the Network structure.
Making them Network methods is natural and simplifies the code.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
8f48e28325 virtcontainers: Expand the Network structure
For simplicity sake, there should only be one networking structure per
sandbox, as opposed to two (Network and NetworkingNamespace) currently.

This commit start expanding the Network structure in order to eventually
make it the single representation of a virtcontainers sandbox
networking.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Pierre Kohler
5ef522f7c3 runtime: check kvm module sev correctly
Runtime now accepts both `1` and `Y` as valid values for
kvm_amd module parameter kvm_amd.sev.

Fixes #3273

Signed-off-by: Pierre Kohler <pierre.kohler@cysec.systems>
2022-02-07 23:48:47 +01:00