kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-08-24 10:41:43 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	3a4b924226	Merge pull request #6833 from rye-stripe/bugfix/vcpu-pinning resource-control: fix setting CPU affinities on Linux	2023-05-18 08:12:39 +02:00
Fabiano Fidêncio	e762f70920	Merge pull request #6838 from rye-stripe/bugfix/use-enable-vcpus-pinning-from-toml runtime: use enable_vcpus_pinning from toml	2023-05-17 21:30:44 +02:00
Dov Murik	dd7562522a	runtime: pkg/sev: Add kbs utility package for SEV pre-attestation Supports both online and offline modes of interaction with simple-kbs for SEV/SEV-ES confidential guests. Fixes: #6795 Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>	2023-05-16 15:27:32 +03:00
Dov Murik	05de7b2607	runtime: Add sev package The sev package provides utilities for launching AMD SEV and SEV-ES confidential guests. Fixes: #6795 Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>	2023-05-16 15:27:32 +03:00
Peteris Rudzusiks	bdb75fb21e	runtime: use enable_vcpus_pinning from toml Set the default value of runtime's EnableVCPUsPinning to value read from .toml. Fixes: #6836 Signed-off-by: Peteris Rudzusiks <rye@stripe.com>	2023-05-15 21:41:20 +02:00
Peteris Rudzusiks	3e85bf5b17	resource-control: fix setting CPU affinities on Linux With this fix the vCPU pinning feature chooses the correct physical cores to pin the vCPU threads on rather than always using core 0. Fixes #6831 Signed-off-by: Peteris Rudzusiks <rye@stripe.com>	2023-05-15 16:46:36 +02:00
Peng Tao	65670e6b0a	Merge pull request #6699 from zvonkok/cold-plug-vfio gpu: cold plug VFIO devices	2023-05-05 10:04:29 +08:00
Zvonko Kaiser	13d7f39c71	gpu: Check for VFIO port assignments Bail out early if the port is wrong; allowed port settings are no-port, root-port and switch-port. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-05-03 12:32:33 +00:00
Zvonko Kaiser	138ada049c	gpu: Cold Plug VFIO toml setting Added the cold_plug_vfio setting to the qemu-toml.in with some explanation Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-27 11:04:45 +00:00
Zvonko Kaiser	0fec2e6986	gpu: Add cold-plug test Cold plug setting is now correctly decoded in toml Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-27 09:30:24 +00:00
Zvonko Kaiser	2a830177ca	gpu: Add fwcfg helper function Added driver util function for easier handling of VFIO devices outside of the VFIO module. At the sandbox level we may need to set options depending on whether we have a VFIO/PCIe device, like the fwCfg for confidential guests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	131f056a12	gpu: Extract VFIO Functions to drivers Some functions may be used in other modules than just the VFIO module; extract them and make them available to other layers like sandbox. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	c8cf7ed3bc	gpu: Add ColdPlug of VFIO devices with devManager If we have a VFIO device and cold-plug is enabled we mark each device as ColdPlug=true and let the VFIO module do the attaching. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	6107c32d70	gpu: Assign default value to cold-plug Make sure the configuration is propagated to the right structs and the default value is assigned. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	377ebc2ad1	gpu: Add configuration option for cold-plug VFIO Users can set cold-plug="root-port" to cold plug a VFIO device in QEMU Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	c18ceae109	gpu: Add new struct PCIePort For the hypervisor to distinguish between PCIe components, adding a new enum that can be used for hot-plug and cold-plug of PCIe devices Fixes: #6687 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Eduardo Berrocal	1c1ee8057c	pkg/signals: Improved test coverage 60% to 100% Expanded tests on signals_test.go to cover more lines of code. 'go test' won't show 100% coverage (only 66.7%), because one test needs to spawn a new process (since it is testing a function that calls os.Exit(1)). Fixes: #256 Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>	2023-04-25 23:34:13 +00:00
Zvonko Kaiser	f4f958d53c	gpu: Do not pass-through PCI (Host) Bridges On some systems a GPU is in an IOMMU group with a PCI Bridge and PCI Host Bridge. By default no PCI Bridge needs to be passed through. When scanning the IOMMU group, ignore devices with a 0x60 class ID prefix. Fixes: #6663 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-17 10:08:23 +00:00
Alexandru Matei	db2cac34d8	runtime: Don't create socket file in /run/kata The socket file for shim management is created in /run/kata and it isn't deleted after the container is stopped. After running and stopping thousands of containers, the /run folder will run out of space. Fixes #6622 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com> Co-authored-by: Greg Kurz <groug@kaod.org>	2023-04-13 10:21:29 +03:00
Fabiano Fidêncio	3b3656d96d	Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main tdx: Add artefacts from the latest TDX tools release into main	2023-04-11 20:43:02 +02:00
Fabiano Fidêncio	50ce33b02d	Merge pull request #6205 from fengwang666/non-root-clh runtime: support non-root for clh	2023-04-11 19:34:00 +02:00
Fabiano Fidêncio	3e15800199	govmm: Directly pass the firmware using -bios with TDX Since TDX doesn't support readonly memslot, TDVF cannot be mapped as pflash device and it actually works as RAM. "-bios" option is chosen to load TDVF. OVMF is the opensource firmware that implements the TDVF support. Thus the command line to specify and load TDVF is ``-bios OVMF.fd`` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-04-11 15:23:42 +02:00
Fabiano Fidêncio	3c5ffb0c85	govmm: Set "sept-ve-disable=on" This is needed since 22ww49. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-04-11 15:23:42 +02:00
James O. D. Hunt	cbe6f04194	Merge pull request #6501 from shippomx/dev_metrics runtime: add filter metrics with specific names	2023-04-05 15:15:09 +01:00
Miao Xia	0f73515561	runtime: add filter metrics with specific names The kata monitor metrics API returns a very large response when there are many containers or sandboxes, which makes it harder to focus on the metrics we need. Fixes: #6500 Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>	2023-03-28 14:56:13 +08:00
James O. D. Hunt	f06f72b5e9	Merge pull request #6467 from jongwu/qemu-uefi-path qemu/arm64: disable image nvdimm once no firmware offered	2023-03-22 08:43:01 +00:00
Jianyong Wu	ece5edc641	qemu/arm64: disable image nvdimm if no firmware offered For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there is no firmware offered, it should be disabled. Fixes: #6468 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-03-20 18:03:05 +08:00
Hyounggyu Choi	96baa83895	agent: Bring in VFIO-AP device handling again This PR is a continuing work for (kata-containers#3679). This generalizes the previous VFIO device handling which only focuses on PCI to include AP (IBM Z specific). Fixes: kata-containers#3678 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2023-03-16 18:14:12 +09:00
Jakob Naucke	f666f8e2df	agent: Add VFIO-AP device handling Initial VFIO-AP support (#578) was simple, but somewhat hacky; a different code path would be chosen for performing the hotplug, and agent-side device handling was bound to knowing the assigned queue numbers (APQNs) through some other means; plus the code for awaiting them was written for the Go agent and never released. This code also artificially increased the hotplug timeout to wait for the (relatively expensive, thus limited to 5 seconds at the quickest) AP rescan, which is impractical for e.g. common k8s timeouts. Since then, the general handling logic was improved (#1190), but it assumed PCI in several places. In the runtime, introduce and parse AP devices. Annotate them as such when passing to the agent, and include information about the associated APQNs. The agent awaits the passed APQNs through uevents and triggers a rescan directly. Fixes: #3678 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 10:07:48 +09:00
Jakob Naucke	b546eca26f	runtime: Generalize VFIO devices Generalize VFIO devices to allow for adding AP in the next patch. The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed. The rationale for the removal is: - VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI - Logic of checking a subtype of mediated device is included in GetVFIODeviceType() - VFIOPciDeviceMediatedType() can simply fulfill the device addition based on a type categorized by GetVFIODeviceType() Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 10:06:37 +09:00
Jakob Naucke	4c527d00c7	agent: Rename VFIO handling to VFIO PCI handling e.g., split_vfio_option is PCI-specific and should instead be named split_vfio_pci_option. This mutually affects the runtime, most notably how the labels are named for the agent. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 07:43:39 +09:00
Henry Beberman	974a5c22f0	runtime: add support for Hyper-V This adds /dev/mshv to the list of sandbox devices so that VMMs can create Hyper-V VMs. In our testing, this also doesn't error out in case /dev/mshv isn't present. Fixes #6454. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2023-03-13 17:13:51 -07:00
yanggang	b6880c60d3	logging: Correct the code comments Fix wrong comments for func GetSandboxesStoragePathRust() Fixes: #6394 Signed-off-by: yanggang <gang.yang@daocloud.io>	2023-03-01 19:20:25 +08:00
Feng Wang	cbe6ad9034	runtime: support non-root for clh This change makes it possible to run the cloud-hypervisor VMM as a non-root user when the rootless flag is set to true in the configuration Fixes: #2567 Signed-off-by: Feng Wang <fwang@confluent.io>	2023-02-22 13:57:09 -08:00
zhaojizhuang	ca02c9f512	runtime: add reconnect timeout for vhost user block Fixes: #6075 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-02-13 14:33:46 +08:00
Bin Liu	95602c8c08	Merge pull request #5999 from yaoyinnan/5998/feat/cgroup-metrics runtime: support cgroup v2 metrics marshal guest metrics	2023-02-11 19:26:24 +08:00
Bin Liu	ecbd94d80c	Merge pull request #6064 from yaoyinnan/6063/feat/rootfs-erofs rootfs: support EROFS filesystem	2023-02-11 11:10:23 +08:00
yaoyinnan	bdf20b5d26	rootfs: support EROFS filesystem For kata containers, rootfs is used in a read-only way. EROFS can noticeably decrease metadata overhead. In addition to supporting the EROFS file system, this adds a config parameter to switch the file system used by rootfs. Fixes: #6063 Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2023-02-11 00:44:13 +08:00
GabyCT	86501d5f6f	Merge pull request #6200 from gkurz/improve-appendFDs-doc runtime: Improve documentation of appendFDs	2023-02-09 15:50:37 -06:00
yaoyinnan	01765e1734	runtime: support cgroup v2 metrics marshal guest metrics Support to use cgroup v2 metrics marshal guest metrics. Fixes: #5998 Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2023-02-09 19:14:09 +08:00
Bin Liu	56071c6e7b	virtiofsd: change cache modes to consts Change cache modes from literals to consts and place them in one place. Also change the default cache mode from `none` to `never` in `pkg/katautils/config-settings.go.in`. Fixes: #6151 Signed-off-by: Bin Liu <bin@hyper.sh>	2023-02-08 15:06:52 +08:00
Bin Liu	71a3b73cb0	Merge pull request #6223 from d3c3mber/rm-unused-shim-config runtime: remove not used shim configurations	2023-02-08 10:00:52 +08:00
d3c3mber	390916b33c	runtime: remove not used shim configurations ShimPath and ShimDebug are not needed anymore. Fixes: #6147 Signed-off-by: d3c3mber <tangbo_gl_2022@163.com>	2023-02-07 14:06:12 +08:00
GabyCT	7fc35f19eb	Merge pull request #6056 from jongwu/perm_deny arm64/CI: fix unit test failure on arm64	2023-02-03 10:53:38 -06:00
Jianyong Wu	59f104c022	runtime: skip unit tests that fail regularly on aarch64 Many unit test cases fail regularly on aarch64, including TestIOCopy and create_tmpfs. Temporarily skip them and re-enable them once they are fixed. Fixes: #6194 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-02-03 11:34:39 +08:00
Greg Kurz	3c48f2202c	runtime: Improve documentation of appendFDs The cmd.ExtraFiles feature that is used to implement appendFDs takes an array of arbitrary file descriptors and internally renumbers them to be consecutive starting from 3, using dup2(). This isn't especially obvious: document it for the sake of clarity. Fixes #6199 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-02-02 12:52:10 +01:00
Peng Tao	a34f36f8f4	Merge pull request #6149 from openanolis/fix_kata_runtime runtime:fix stat uds path	2023-02-02 11:00:07 +08:00
Greg Kurz	334c4b8bdc	runtime: Drop QEMU log file support The QEMU log file is essentially about fine-grained tracing of QEMU internals and mostly useful for developers, not production. Notably, the log file isn't limited in size, nor rotated in any way. It means that a container running in the VM could possibly flood the log file with a guest-triggerable trace. For example, on openshift, the log file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means that each pod running with the kata runtime could potentially consume this amount of host RAM, which is not acceptable. Error messages are best collected from QEMU's stderr, as kata has done since PR #5736 was merged. Drop support for the QEMU log file because it doesn't bring any value but can certainly do harm. Fixes #6173 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-31 09:20:29 +01:00
Zhongtao Hu	1e531b44dc	runtime:fix stat uds path os.Stat("unix:///run/vc/sbs/sid/shim-monitor.sock") will fail, should be os.Stat("/run/vc/sbs/sid/shim-monitor.sock") Fixes:#6148 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>	2023-01-29 15:08:13 +08:00
zhaojizhuang	9092c23a2e	runtime: Add hmp for qemu Fixes: #6092 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-01-29 14:22:04 +08:00
Greg Kurz	af125b1498	Merge pull request #5736 from gkurz/no-qemu-daemonize runtime: Start QEMU undaemonized and get logs	2023-01-27 16:33:48 +01:00
Greg Kurz	39fe4a4b6f	runtime: Collect QEMU's stderr LaunchQemu now connects a pipe to QEMU's stderr and makes it usable by callers through a Go io.ReadCloser object. As explained in [0], all messages should be read from the pipe before calling cmd.Wait: introduce a LogAndWait helper to handle that. Fixes #5780 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:17 +01:00
Greg Kurz	bf4e3a618f	runtime: Launch QEMU with cmd.Start() LaunchCustomQemu() currently starts QEMU with cmd.Run(), which is supposed to block until the child process terminates. This assumes that QEMU daemonizes itself, otherwise LaunchCustomQemu() would block forever. The virtcontainers package indeed enables the Daemonize knob in the configuration, but having such an implicit dependency on a supposedly configurable setting is ugly and fragile. cmd.Run() is: func (c *Cmd) Run() error { if err := c.Start(); err != nil { return err } return c.Wait() } Let's open-code this: govmm calls cmd.Start() and returns the cmd to virtcontainers, which calls cmd.Wait(). If QEMU doesn't start, e.g. missing binary, there won't be any errors to collect from QEMU output. Just drop these lines in govmm. Similarly, there won't be any log file to read from in virtcontainers. Drop that as well. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:11 +01:00
Greg Kurz	8a4f08cb0f	govmm: Optionally pass QMP listener to QEMU QEMU's -qmp option can be passed the file descriptor of a socket that is already in listening mode. This is done by passing `fd=XXX` to `-qmp` instead of a path. Note that these two options are mutually exclusive: QEMU errors out if both are passed, so we check that as well in the validation function. While here, add the `path=` stanza in the path-based case for clarity. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:08:48 +01:00
Greg Kurz	219bb8e7d0	govmm: Optionally start QMP with a pre-configured connection When QEMU is launched daemonized, we have the guarantee that the QMP socket is available. In order to launch a non-daemonized QEMU, the QMP connection should be created before QEMU is started in order to avoid a race. Introduce a variant of QMPStart() that can use such an existing connection. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 19:16:47 +01:00
Peng Tao	7d1a604bad	Merge pull request #6060 from ls-ggg/6055/service.mu-deadlock runtime: all APIs hang on service.mu	2023-01-18 10:50:00 +08:00
Bin Liu	790f45190b	Merge pull request #6074 from zhaojizhuang/enablevhostuserstore runtime: pass enablevhostuserstore annotation to hypervisor config	2023-01-17 11:43:43 +08:00
ls	69fc8de712	runtime: all APIs hang on service.mu When the vmm process exits abnormally, a goroutine sets s.monitor to nil in the 'watchSandbox' function without acquiring service.mu. Another goroutine that holds service.mu can then block when sending a message to s.monitor, which leads to a deadlock. For example, the wait function in the file .../pkg/containerd-shim-v2/wait.go sends a message to s.monitor after obtaining service.mu, but s.monitor may be nil at this time Fixes: #6059 Signed-off-by: ls <335814617@qq.com>	2023-01-16 14:45:37 +08:00
Eric Ernst	458fe865ea	Merge pull request #6052 from egernst/add-darwin-skeletons Add darwin skeletons	2023-01-13 13:14:16 -08:00
zhaojizhuang	cf1bae3521	runtime: pass enablevhostuserstore annotation to hypervisor config Fixes: #6073 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-01-13 17:07:38 +08:00
Samuel Ortiz	a9626682af	virtcontainers: resourcecontrol: Add skeleton for Darwin Cgroups do not exist on Darwin, so use an empty implementation for resourcecontrol for the time being. In the process, ensure that the utilized cgroup handling (ie, isSystemdCgroup) is kept in general file, since we use this to help assess/constrain the container spec we pass to the guest. Fixes: #6051 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-12 15:53:28 -08:00
Eric Ernst	6ee550e9a5	runtime: vCPUs pinning is sandbox specific, not hypervisor While at it, make sure we persist this and fix a misc typo. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-12 15:44:25 -08:00
Eric Ernst	f137048be3	resource-control: add helper function for setting CPU affinity Let's abstract the CPU affinity Fixes: #6044 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-11 17:55:53 -08:00
Bin Liu	86a82cace9	runtime: change cache mode from none to never New Rust virtiofsd's `cache` mode doesn't support `none` mode, we should use `never` to replace it. Fixes: #6018 Signed-off-by: Bin Liu <bin@hyper.sh>	2023-01-10 17:29:48 +08:00
Bin Liu	2c10b37172	Merge pull request #5991 from dcantah/darwin-sigs runtime: Define Darwin handled signals list	2023-01-07 11:19:48 +08:00
Fabiano Fidêncio	175794458f	Merge pull request #5972 from bergwolf/github/hook fix moby prestart hook handling	2023-01-06 14:54:39 +01:00
Samuel Ortiz	3b4420eb8e	runtime: Define Darwin handled signals list Fixes: #5990 Some signals may not be defined on non Linux host OSes, like SIGSTKFLT for example. It's also not defined on certain architectures, but irrelevant for this. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com> Signed-off-by: Danny Canter <danny@dcantah.dev>	2023-01-05 17:50:47 -08:00
Danny Canter	24b05a99b6	schedcore: Make buildable on !linux Fixes: #5983 sched-core only makes sense on Linux hosts. Let's add stub/error for other platforms. Signed-off-by: Eric Ernst <eric_ernst@apple.com> Signed-off-by: Danny Canter <danny@dcantah.dev>	2023-01-05 11:51:04 -08:00
Peng Tao	d085389127	vc: fix up UT for CreateSandbox API change Need to adapt the UT as well. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 22:30:42 +08:00
Peng Tao	578a9c25f0	vc: rescan network endpoints after running prestart hooks Moby relies on the prestart hooks to configure network endpoints. We should rescan the netns after running them so that the newly added endpoints can be found and plugged to the guest. Fixes: #5941 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 22:30:41 +08:00
Peng Tao	cb84b0fb02	katautils: run prestart hooks after starting VM So that we can pass the hypervisor pid to the hook instead of the runtime process's. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 10:52:32 +00:00
Zhongtao Hu	dae6670628	kata-runtime: add rust runtime path for kata-runtime exec add rust runtime path for kata-runtime exec Fixes:#5963 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>	2022-12-30 13:34:34 +08:00
Binbin Zhang	99485d871c	shim: return hypervisor's pid not shim's pid update outdated code comments Fixes: #3234 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2022-12-14 11:16:11 +08:00
Manabu Sugimoto	c617bbe70d	runtime: Pass SELinux policy for containers to the agent Pass SELinux policy for containers to the agent if `disable_guest_selinux` is set to `false` in the runtime configuration. The `container_t` type is applied to the container process inside the guest by default. Users can also set a custom SELinux policy to the container process using `guest_selinux_label` in the runtime configuration. This will be an alternative configuration of Kubernetes' security context for SELinux because users cannot specify the policy in Kata through Kubernetes's security context. To apply SELinux policy to the container, the guest rootfs must be CentOS that is created and built with `SELINUX=yes`. Fixes: #4812 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2022-11-29 19:07:56 +09:00
Bin Liu	1dfd845f51	runtime: go fix code for 1.19 We have started using golang 1.19 and some features are no longer supported, so run `go fix` to fix them. Fixes: #5750 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-11-25 11:29:18 +08:00
Bin Liu	06a604b753	Merge pull request #5720 from YchauWang/wyc-docs-test-22 runtime: add log record to the qemu config method `appendDevices` for…	2022-11-24 13:15:06 +08:00
wangyongchao.bj	30a7ebf430	runtime: Log invalid devices in QEMU config When the user tried to add new devices to the VM, there is no error info for the invalid device. This PR adds a log record to the `appendDevices` for the invalid device of the qemu config. Fixes: #5719 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2022-11-23 09:09:45 +08:00
Fabiano Fidêncio	df3d9878d5	Merge pull request #5695 from darfux/virtiofs-queue-size runtime: Support virtiofs queue size for qemu and make it configurable	2022-11-22 20:04:30 +01:00
liyuxuan.darfux	3bb145c63a	runtime: Support virtiofs queue size for qemu and make it configurable The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024 by default which is same as clh. Also make this value configurable. Fixes: #5694 Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>	2022-11-19 15:38:11 +08:00
Fabiano Fidêncio	d94718fb30	runtime: Fix gofmt issues It seems that, after bumping the versions of golang and golangci-lint, new formatting changes are required. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-11-17 14:16:12 +01:00
Fabiano Fidêncio	16b8375095	golang: Stop using io/ioutil The package has been deprecated as of Go 1.16 and the same functionality is now provided by either the io or the os package. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-11-17 13:43:25 +01:00
LitFlwr0	2508d39b7c	runtime: added vcpus pinning logics Core VCPU threads pinning logics for issue 4476. Also provided docs. Fixes:#4476 Signed-off-by: LitFlwr0 <861690705@qq.com>	2022-11-04 17:52:42 +08:00
snir911	288e337a6f	Merge pull request #5434 from Rouzip/remove-doNetNS add EnterNetNS in virtcontainers	2022-10-30 11:19:07 +02:00
Fabiano Fidêncio	190e623c40	Merge pull request #5317 from Champ-Goblem/fix-containerd-stats shim: Ensure pagesize is set when reporting hugetlb stats	2022-10-24 10:24:49 +02:00
Rouzip	39363ffbfb	runtime: remove same function Add EnterNetNS in virtcontainers to remove the duplicated function. Fixes #5394 Signed-off-by: Rouzip <1226015390@qq.com>	2022-10-17 10:59:13 +08:00
Vijay Dhanraj	435c8f181a	acrn: Enable ACRN hypervisor support for Kata 2.x release Currently ACRN hypervisor support in Kata 2.x releases is broken. This commit re-enables ACRN hypervisor support and also refactors the code to remove the dependency on Sandbox. Fixes #3027 Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>	2022-10-07 07:40:32 -07:00
Champ-Goblem	89e62d4edf	shim: Ensure pagesize is set when reporting hugetlb stats The containerd stats method and metrics API are broken with Kata 2.5.x: the stats fail to load and the metrics API responds with status code 500. This seems to be down to the conversion from the stats reported by the agent RPC `StatsContainer`, where the field `Pagesize` is not populated by the `setHugetlbStats` method. When stats for multiple hugepage sizes are reported, this causes containerd to register two metrics with the same label set, rather than each being partitioned by the `page` label. Fixes: #5316 Signed-off-by: Champ-Goblem <cameron@northflank.com>	2022-10-04 09:16:30 +01:00
Peng Tao	8a2df6b31c	Merge pull request #4931 from jpecholt/snp-support Added SNP-Support for Kata-Containers	2022-09-27 14:17:54 +08:00
Joana Pecholt	ded60173d4	runtime: Enable choice between AMD SEV and SNP This is based on a patch from @niteeshkd that adds a config parameter to choose between AMD SEV and SEV-SNP VMs as the confidential guest type in case both types are supported. SEV is the default. Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>	2022-09-16 17:51:41 +02:00
Joana Pecholt	22bda0838c	runtime: Support for AMD SEV-SNP VMs This commit adds AMD SEV-SNP as a confidential guest option to the runtime. Information on required components such as OVMF, QEMU and a kernel supporting SEV-SNP is defined in the versions file and corresponding configs are added. Note: The CPU model 'host' provided by the current SNP-QEMU does not support all SNP capabilities yet, which is why this option is changed to EPYC-v4. Note: The guest's physical address space reduction specified with ReducedPhysBits is 1. Details can be found in Section 15.34.6 here https://www.amd.com/system/files/TechDocs/24593.pdf Fixes #4437 Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>	2022-09-16 17:51:41 +02:00
Feng Wang	f914319874	runtime: store the user name in hypervisor config The user name will be used to delete the user instead of relying on uid lookup because uid can be reused. Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-13 10:32:55 -07:00
Feng Wang	c3015927a3	runtime: add more debug logs for non-root user operation Previously the logging was insufficient and made debugging difficult Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-12 21:38:57 -07:00
Archana Shinde	7d52934ec1	Merge pull request #4798 from amshinde/use-iouring-qemu Use iouring for qemu block devices	2022-08-26 04:00:24 +05:30
Peng Tao	a06d819b24	runtime: cri-o annotations have been moved to podman Let's switch to depending on podman, which also simplifies the indirect dependency on kubernetes components. It also helps to avoid cri-o security issues like CVE-2022-1708. Fixes: #4972 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2022-08-24 18:11:37 +08:00
Chelsea Mafrica	fcc1e0c617	runtime: tracing: End root span at end of trace The root span should exist for the duration of the trace. Defer ending the span until the end of the trace instead of the end of the function. Add the span to the service struct to do so. Fixes #4902 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2022-08-12 13:15:39 -07:00
Archana Shinde	c1e3b8f40f	govmm: Refactor qmp functions for adding block device Instead of passing a bunch of arguments to qmp functions for adding block devices, use govmm BlockDevice structure to reduce these. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	598884f374	govmm: Refactor code to get rid of redundant code Get rid of redundant return values from function. args and blockdevArgs used to return different values to maintain compatibility between qemu versions. These are exactly the same now. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	00860a7e43	qmp: Pass aio backend while adding block device Allow govmm to pass aio backend while adding block device. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	e1b49d7586	config: Add block aio as a supported annotation Allow Block AIO to be passed as a per pod annotation. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	ed0f1d0b32	config: Add "block_device_aio" as a config option for qemu This configuration will allow users to choose between different I/O backends for qemu, with the default being io_uring. This will allow users to fall back to a different I/O mechanism while running on kernels older than 5.1. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	b6cd2348f5	govmm: Add io_uring as AIO type io_uring was introduced as a new kernel IO interface in kernel 5.1. It is designed for higher performance than the older Linux AIO API. This feature was added in qemu 5.0. Fixes #4645 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-03 10:43:12 -07:00
Archana Shinde	81cdaf0771	govmm: Correct documentation for Linux aio. The comments for "native" aio are incorrect. Correct these. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-03 10:41:50 -07:00
Zhongtao Hu	adfad44efe	Merge remote-tracking branch 'origin/main' into runtime-rs-merge-tmp To keep runtime-rs up to date, we will merge main into runtime-rs every week. Fixes:#4776 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>	2022-08-01 11:12:48 +08:00
yaoyinnan	5c3155f7e2	runtime: Support for host cgroup v2 Support cgroup v2 on the host. Update vendor containerd/cgroups to add cgroup v2. Fixes: #3073 Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2022-07-28 10:30:45 +08:00
wllenyj	274598ae56	kata-runtime: add dragonball config check support. add dragonball config check support. Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>	2022-07-14 10:43:50 +08:00
Manabu Sugimoto	4d89476c91	runtime: Fix DisableSelinux config Enable the Kata runtime to handle the `disable_selinux` flag properly, so that the runtime configuration controls whether the runtime applies the SELinux label to the VMM process. Fixes: #4599 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2022-07-06 15:50:28 +09:00
Fabiano Fidêncio	071dd4c790	Merge pull request #4109 from pmores/drop-in-cfg-files-support Drop in cfg files support	2022-07-05 22:21:24 +02:00
Peng Tao	a1de394e51	Merge pull request #4550 from liubin/fix/4548-overwrite-mount-type-for-bind-mount runtime: overwrite mount type to bind for bind mounts	2022-07-04 19:56:26 +08:00
liubin	1f363a386c	runtime: overwrite mount type to bind for bind mounts Some clients like nerdctl may pass a mount type of none for volumes/bind mounts, which causes the container to fail to start. runc overwrites the mount type to bind and ignores the input value; do the same here. Fixes: #4548 Signed-off-by: liubin <liubin0329@gmail.com>	2022-07-01 12:13:01 +08:00
GabyCT	02a51e75a7	Merge pull request #4554 from liubin/fix/delete-not-used-console-from-container-config runtime: delete Console from Cmd type	2022-06-30 11:40:07 -05:00
Fabiano Fidêncio	aa561b49f5	Merge pull request #4540 from fidencio/topic/default_maxmemory Add `default_maxmemory` config option	2022-06-30 12:08:15 +02:00
GabyCT	2a94261df5	Merge pull request #4549 from liubin/fix/4419-set-status-if-wait-process-failed shim: set a non-zero return code if the wait process call failed.	2022-06-29 17:04:53 -05:00
Fabiano Fidêncio	1e12d56512	Merge pull request #4469 from egernst/config-validation-refactor Refactor how hypervisor config validation is handled	2022-06-29 14:42:11 +02:00
liubin	a5a25ed13d	runtime: delete Console from Cmd type There is much code related to this property, but it is not used anymore. Fixes: #4553 Signed-off-by: liubin <liubin0329@gmail.com>	2022-06-29 17:36:32 +08:00
Pavel Mores	c656457e90	runtime: Add tests of drop-in config file decoding The tests ensure that interactions between drop-ins and the base configuration.toml and among drop-ins themselves work as intended, basically that files are evaluated in the correct order (base file first, then drop-ins in alphabetical order) and the last one to set a specific key wins. Signed-off-by: Pavel Mores <pmores@redhat.com>	2022-06-29 09:54:39 +02:00
Pavel Mores	99f5ca80fc	runtime: Plug drop-in decoding into decodeConfig() Fixes #4108 Signed-off-by: Pavel Mores <pmores@redhat.com>	2022-06-29 09:54:38 +02:00
Pavel Mores	0f9856c465	runtime: Scan drop-in directory, read files and decode them updateFromDropIn() uses the infrastructure built by previous commits to ensure no contents of 'tomlConfig' are lost during decoding. To do this, we preserve the current contents of our tomlConfig in a clone and decode a drop-in into the original. At this point, the original instance is updated but its Agent and/or Hypervisor fields are potentially damaged. To merge, we update the clone's Agent/Hypervisor from the original instance. Now the clone has the desired Agent/Hypervisor and the original instance has the rest, so to finish, we just need to move the clone's Agent/Hypervisor to the original. Signed-off-by: Pavel Mores <pmores@redhat.com>	2022-06-29 09:54:38 +02:00
Pavel Mores	2c1efcc697	runtime: Add helpers to copy fields between tomlConfig instances These functions take a TOML key - an array of individual components, e.g. ["agent" "kata" "enable_tracing"], as returned by BurntSushi - and two 'tomlConfig' instances. They copy the value of the struct field identified by the key from the source instance to the target one if necessary. This is only done if the TOML key points to structures stored in maps by 'tomlConfig', i.e. 'hypervisor' and 'agent'. Nothing needs to be done in other cases. Signed-off-by: Pavel Mores <pmores@redhat.com>	2022-06-29 09:54:38 +02:00
Pavel Mores	20f11877be	runtime: Add framework to manipulate config structs via reflection For 'tomlConfig' substructures stored in Golang maps - 'hypervisor' and 'agent' - BurntSushi doesn't preserve their previous contents as it does for substructures stored directly (e.g. 'runtime'). We use reflection to work around this. This commit adds three primitive operations to work with struct fields identified by their `toml:"..."` tags - one to get a field value, one to set a field value and one to assign a source struct field value to the corresponding field of a target. Signed-off-by: Pavel Mores <pmores@redhat.com>	2022-06-29 09:54:38 +02:00
liubin	ab5f1c9564	shim: set a non-zero return code if the wait process call failed. Return code is an int32 type, so if an error occurred, its default value may be zero, which would be treated as a normal exit code. Setting the return code to 255 lets the caller (for example, Kubernetes) know that there are some problems with the pod/container. Fixes: #4419 Signed-off-by: liubin <liubin0329@gmail.com>	2022-06-29 12:33:32 +08:00
Eric Ernst	e5be5cb086	runtime: device: cleanup outdated comments Prior device config move didn't update the comments. Let's address this, and make sure comments match the new path... Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-28 18:22:28 -07:00
Tim Zhang	916ffb75d7	Merge pull request #4432 from liubin/fix/4420-binary-log shim: support shim v2 logging plugin	2022-06-28 16:29:07 +08:00
Fabiano Fidêncio	afdc960424	hypervisor: Add default_maxmemory configuration Let's add a `default_maxmemory` configuration, which allows the admins to set the maximum amount of memory to be used by a VM, considering the initial amount + whatever ends up being hotplugged via the pod limits. By default this value is 0 (zero), and it means that the whole physical RAM is the limit. Fixes: #4516 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-28 08:32:15 +02:00
Bin Liu	4e30e11b31	shim: support shim v2 logging plugin Currently the kata shim only supports stdout/stderr as fifos from containerd/CRI-O, but shim v2 supports logging plugins, and nerdctl uses the binary scheme for logs by default. This commit adds the other types of log plugins: - file - binary In the binary case, the kata shim will receive a stdout/stderr like: binary:///nerdctl?_NERDCTL_INTERNAL_LOGGING=/var/lib/nerdctl/1935db59 That means the nerdctl process will handle the logs (stdout/stderr). Fixes: #4420 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-06-28 13:54:22 +08:00
Eric Ernst	469e098543	katautils: don't do validation when loading hypervisor config Policy for what's valid/invalid within the config varies by VMM, host, and silicon architecture. Let's keep katautils simple, just translating a toml to the hypervisor config structure, and leave validation to virtcontainers. Without this change, we're doing duplicate validation. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-27 10:13:26 -07:00
Bin Liu	27b1bb5ed9	Merge pull request #4467 from egernst/device-pkg device package cleanup/refactor	2022-06-27 14:40:53 +08:00
Eric Ernst	e32bf53318	device: deduplicate state structures Before, we maintained almost identical structures between our persist API and what we keep for our devices, with the persist API being a slight subset of device structures. Let's deduplicate this, now that persist is importing device package. Json unmarshal of prior persist structure will work fine, since it was an exact subset of fields. Fixes: #4468 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-26 21:31:29 -07:00
Eric Ernst	f97d9b45c8	runtime: device/persist: drop persist dependency from device pkgs Rather than have the device package depend on persist, let's define the (almost duplicate) structures within device itself, and have the Kata Containers persist pkg import these. This'll help avoid unnecessary dependencies within our core packages. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-26 21:31:29 -07:00
Eric Ernst	f9e96c6506	runtime: device: move to top level package Let's move device package to runtime/pkg instead of being buried under virtcontainers. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-26 21:31:29 -07:00
Eric Ernst	72049350ae	Merge pull request #4288 from fengwang666/enable-qemu-sandbox runtime: enable sandbox feature on qemu	2022-06-21 09:22:26 -07:00
Liang Zhou	ef925d40ce	runtime: enable sandbox feature on qemu Enabling "-sandbox on" in QEMU introduces another protection layer on the host, making the secure container more secure. The option is disabled by default because this feature may introduce some performance cost, even though users can enable /proc/sys/net/core/bpf_jit_enable to reduce the impact. Fixes: #2266 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-06-17 15:30:46 -07:00
Chelsea Mafrica	28995301b3	tracing: Remove whitespace from root span Remove space from root span name to follow camel casing of other tracing span names in the runtime and to make parsing easier in testing. Fixes #4483 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2022-06-17 12:07:37 -07:00
Bin Liu	553ec46115	Merge pull request #4436 from alex-matei/fix/sandbox-mem-overflow runtime: fix error when trying to parse sandbox sizing annotations	2022-06-16 11:18:24 +08:00
Bin Liu	81acfc1286	Merge pull request #4425 from liubin/fix/4376-change-log-level-of-getoomevent shim: change the log level for GetOOMEvent call failures	2022-06-13 17:53:11 +08:00
Alexandru Matei	721ca72a64	runtime: fix error when trying to parse sandbox sizing annotations Changed bitsize for parsing functions to 64-bit in order to avoid parsing errors. Fixes #4435 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2022-06-11 18:51:10 +03:00
Eric Ernst	4ebf9d38b9	Merge pull request #4310 from egernst/core-sched shim: add support for core scheduling	2022-06-08 17:42:45 +02:00
Bin Liu	eff4e1017d	shim: change the log level for GetOOMEvent call failures GetOOMEvent is a blocking call that will fail if the container exits; in this case, the failure is neither an error nor a warning. Changing the log level of the messages logged when GetOOMEvent calls fail will reduce log noise in large clusters where pods are created/deleted frequently. Fixes: #4376 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-06-08 22:17:24 +08:00
Eric Ernst	430da47215	Merge pull request #4360 from fengwang666/shim-leak runtime: ignore ESRCH error from stop container	2022-06-02 12:42:19 -07:00
Feng Wang	9726f56fdc	runtime: force stop container after the container process exits Set the stop container force flag to true so that the container state is always set to “StateStopped” after the container wait goroutine is finished. This is necessary for the following delete container step to succeed. Fixes: #4359 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-06-02 08:17:08 -07:00
Eric Ernst	d2df1209a5	docs: describe kata handling for core-scheduling Add initial documentation for core-scheduling. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 16:17:00 -07:00
Michael Crosby	22b6a94a84	shim: add support for core scheduling In Linux 5.14 (and hopefully some backports), core scheduling allows processes to be co-scheduled within the same domain on SMT-enabled systems. The containerd implementation sets the core sched domain when launching a shim. This allows a clean way for each shim (container/pod) to be in its own domain, and for any additional containers (v2 pods) to be launched with the same domain, as well as any exec'd process added to the container. kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html For Kata specifically, we will look for the SCHED_CORE environment variable to be set to indicate we should create a new core scheduling domain. This is equivalent to the containerd shim's PR: `e48bbe8394` Fixes: #4309 Signed-off-by: Eric Ernst <eric_ernst@apple.com> Signed-off-by: Michael Crosby <michael@thepasture.io>	2022-05-31 10:10:40 -07:00
Eric Ernst	3201ad0830	shim-client: ensure we check resp status for Put/Post Without this, potential errors are silently dropped. Let's ensure we return the error code as well as potential data from the response. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	0706fb28ac	kata-runtime: shmgmt: make url usage consistent Before, we had a mix of slash, etc. Unfortunately, when cleaning URL paths, serve mux seems to mangle the request method, resulting in each request being a GET (instead of PUT or POST). Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	2a09378dd9	shim-client: add support for DoPut While at it, make sure we check for nil in DoPost Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	640173cfc2	shim-mgmt: Add endpoint handler for interacting with iptables Add two endpoints: ip6tables, iptables. Each url handler supports GET and PUT operations. PUT expects the request data to be []byte and to contain iptables information in a format consumable by iptables-restore. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Yibo Zhuang	ffdc065b4c	runtime: direct-volume stats update to use GET parameter The Go default HTTP mux AFAIK doesn’t support pattern routing, so right now the client is padding the URL for direct-volume stats with a subpath of the volume path, and this will always result in a 404 Not Found returned by the shim. This change will update the shim to take the volume path as a GET query parameter instead of a subpath. If the parameter is missing or empty, then return 400 BadRequest to the client. Fixes: #4297 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-05-20 18:41:51 -07:00
Snir Sheriber	f4994e486b	runtime: allow annotation configuration to use_legacy_serial and update the docs and test Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-18 18:58:21 +03:00
Snir Sheriber	c67b9d2975	qemu: allow using legacy serial device for the console This allows getting guest early boot logs, which are usually missed when virtconsole is used. - It utilizes previous work on the govmm side: https://github.com/kata-containers/govmm/pull/203 - unit test added Fixes: #4237 Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-17 12:06:11 +03:00
Fabiano Fidêncio	511f7f822d	config: Add DiskRateLimiter* to Cloud Hypervisor Let's add the newly added disk rate limiter configurations to the Cloud Hypervisor's hypervisor configuration. Right now those are not used anywhere, and there's absolutely no way the users can set those up. That's coming later in this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:27:15 +02:00
Fabiano Fidêncio	5b18575dfe	hypervisor: Add disk bandwidth and operations rate limiters This is the disk counterpart of what was introduced for the network as part of the previous commits in this series. The newly added fields are: * DiskRateLimiterBwMaxRate, defined in bits per second, which is used to control the disk I/O bandwidth at the VM level. * DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which is used to define an initial max rate, which doesn't replenish. * DiskRateLimiterOpsMaxRate, the operations per second equivalent of the DiskRateLimiterBwMaxRate. * DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of the DiskRateLimiterBwOneTimeBurst. For now those extra fields have only been added to the hypervisor's configuration and they'll be used in the coming patches of this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:27:11 +02:00
Fabiano Fidêncio	c9f6496d6d	config: Add NetRateLimiter* to Cloud Hypervisor Let's add the newly added network rate limiter configurations to the Cloud Hypervisor's hypervisor configuration. Right now those are not used anywhere, and there's absolutely no way the users can set those up. That's coming later in this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:22:42 +02:00
Fabiano Fidêncio	2d35e6066d	hypervisor: Add network bandwidth and operations rate limiters In a similar way to what's already exposed as RxRateLimiterMaxRate and TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's configuration. The values added are related to bandwidth and operations rate limiters, which have to be added so we can expose I/O throttling configurations to users using Cloud Hypervisor as their preferred VMM. The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because Cloud Hypervisor exposes a single MaxRate to be used for both inbound and outbound queues. The newly added fields are: * NetRateLimiterBwMaxRate, defined in bits per second, which is used to control the network I/O bandwidth at the VM level. * NetRateLimiterBwOneTimeBurst, also defined in bits per second, which is used to define an initial max rate, which doesn't replenish. * NetRateLimiterOpsMaxRate, the operations per second equivalent of the NetRateLimiterBwMaxRate. * NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of the NetRateLimiterBwOneTimeBurst. For now those extra fields have only been added to the hypervisor's configuration and they'll be used in the coming patches of this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:22:42 +02:00
David Gibson	f7ba21c86f	runtime: Clean up mock hook logs in tests The tests in hook_test.go run a mock hook binary, which does some debug logging to /tmp/mock_hook.log. Currently we don't clean up those logs when the tests are done. Use a test cleanup function to do this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:14:52 +10:00
David Gibson	90b2f5b776	runtime: Make SetupOCIConfigFile clean up after itself SetupOCIConfigFile creates a temporary directory with os.MkdirTemp(). This means the callers need to register a deferred function to remove it again. At least one of them was commented out, meaning that a /temp/katatest- directory was left over after the unit tests ran. Switch to t.TempDir(), which better matches other parts of the tests and means the testing framework will handle cleaning it up. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:14:52 +10:00
Bin Liu	b19bfac7cd	Merge pull request #4042 from yibozhuang/direct-assign-fsgroup fsGroup support for direct-assigned volume	2022-04-16 10:23:15 +08:00
Bin Liu	362201605e	Merge pull request #4055 from fgiudici/kata-monitor_pprof kata-monitor: update the hrefs in the debug/pprof index page	2022-04-16 08:12:18 +08:00
Chelsea Mafrica	32f92e75cc	Merge pull request #4021 from fengwang666/direct-volume-bug runtime: Base64 encode the direct volume mountInfo path	2022-04-13 13:15:38 -07:00
Francesco Giudici	86977ff780	kata-monitor: update the hrefs in the debug/pprof index page kata-monitor allows getting data profiles from the kata shim instances running on the same node by acting as a proxy (e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID). In order to proxy the requests and the responses to the right shim, kata-monitor requires the sandbox id to be passed via a query string in the URL. The profiling index page proxied by kata-monitor contains links to all the available data profiles. However, none of these links include the sandbox id from the request, so they are broken when accessed through kata-monitor. This happens because the profiling index page comes from the kata shim, which will not include the query string provided in the HTTP request. Let's add the sandbox id on the fly to each href tag returned by the kata shim index page before providing the proxied page. Fixes: #4054 Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2022-04-12 15:53:59 +02:00
Yibo Zhuang	532d53977e	runtime: fsGroup support for direct-assigned volume The fsGroup will be specified by the fsGroup key in the direct-assign mountinfo metadata field. This will be set when invoking the kata-runtime binary and providing the key, value pair in the metadata field. Similarly, the fsGroupChangePolicy will also be provided in the mountinfo metadata field. Add extra fields FsGroup and FSGroupChangePolicy to the Mount construct for container mounts, which will be populated when creating block devices by parsing out the mountInfo.json. In handleDeviceBlockVolume of the kata-agent client, it checks whether the mount FSGroup is not nil, which indicates that an fsGroup change is required in the guest, and if so provides the FSGroup field in the protobuf to pass the value to the agent. Fixes #4018 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-04-11 08:41:13 -07:00
bin	f8cc5d1ad8	kata-monitor: add some links when generating pages for browsers Add some links to rendered webpages for a better user experience, letting users jump between pages just by clicking links in browsers. Fixes: #4061 Signed-off-by: bin <bin@hyper.sh>	2022-04-11 09:29:56 +08:00
Fabiano Fidêncio	b39caf43f1	Merge pull request #3923 from Jakob-Naucke/no-initrd-se runtime: Allow and require no initrd for SE	2022-04-05 09:26:07 +02:00
Feng Wang	354cd3b9b6	runtime: Base64 encode the direct volume mountInfo path This is to avoid accidentally deleting multiple volumes. Fixes #4020 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-04-04 19:56:46 -07:00
Eng Zer Jun	59c7165ee1	test: use `T.TempDir` to create temporary test directory The directory created by `T.TempDir` is automatically removed when the test and all its subtests complete. This commit also updates the unit test advice to use `T.TempDir` to create temporary directories in tests. Fixes: #3924 Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-03-31 09:31:36 +08:00
bin	5e1c30d484	runtime: add logs around sandbox monitor For debugging purposes, add some logs. Fixes: #3815 Signed-off-by: bin <bin@hyper.sh>	2022-03-29 16:59:12 +08:00
bin	fb8be96194	runtime: stop getting OOM events when ttrpc: closed error getOOMEvents is a long-waiting call that retries when it fails. When the agent shuts down, the retry should stop. Even before it has been detected that the agent has died, we can also check whether the error is "ttrpc: closed". Fixes: #3815 Signed-off-by: bin <bin@hyper.sh>	2022-03-29 16:39:01 +08:00
Feng Wang	0928eb9f4e	agent: Kill all the container processes of the same cgroup Otherwise the container process might leak and cause an unclean exit. Fixes: #3913 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-27 10:06:58 -07:00
Jakob Naucke	ff17c756d2	runtime: Allow and require no initrd for SE Previously, it was not permitted to have neither an initrd nor an image. However, this is the exact config to use for Secure Execution, where the initrd is part of the image to be specified as `-kernel`. Require the configuration of no initrd for Secure Execution. Also - remove redundant code for image/initrd checking -- no need to check in `newQemuHypervisorConfig` (calling) when it is also checked in `getInitrdAndImage` (called) - use `QemuCCWVirtio` constant when possible Fixes: #3922 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2022-03-25 18:36:12 +01:00
Feng Wang	19f372b5f5	runtime: Add more debug logs for container io stream copy This can help debug container lifecycle issues. Fixes: #3913 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-24 21:35:16 -07:00
David Gibson	c77e34de33	runtime: Move mock hook source src/runtime/virtcontainers/hook/mock contains a simple example hook in Go. The only thing this is used for is for some tests in src/runtime/pkg/katautils/hook_test.go. It doesn't really have anything to do with the rest of the virtcontainers package. So, move it next to the test code that uses it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-23 19:37:35 +11:00
David Gibson	0e83c95fac	virtcontainers: Run mock hook from build tree rather than system bin dir Running unit tests should generally have minimal dependencies on things outside the build tree. It definitely shouldn't modify system-wide things outside the build tree. Currently the runtime "make test" target does so, though. Several of the tests in src/runtime/pkg/katautils/hook_test.go require a sample hook binary. They expect this hook in /usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs the test binary to that location. Go tests automatically run within the package's directory though, so there's no need to use a system-wide path. We can use a relative path to the binary built within the tree just as easily. fixes #3941 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-23 19:34:50 +11:00
Bin Liu	deb8ce97a8	Merge pull request #3836 from liubin/fix/minor-fix Enhancement: fix comments/logs and delete not used function	2022-03-07 17:26:30 +08:00
bin	1b34494b2f	runtime: fix invalid comments for pkg/resourcecontrol Some comments are copied and not adjusted to the pkg/resourcecontrol package. Fixes: #3835 Signed-off-by: bin <bin@hyper.sh>	2022-03-05 10:32:31 +08:00
Evan Foster	afc567a9ae	storage: make k8s emptyDir creation configurable This change introduces the `disable_guest_empty_dir` config option, which allows the user to change whether a Kubernetes emptyDir volume is created on the guest (the default, for performance reasons), or the host (necessary if you want to pass data from the host to a guest via an emptyDir). Fixes #2053 Signed-off-by: Evan Foster <efoster@adobe.com>	2022-03-04 12:02:42 -08:00
Eric Ernst	1e301482e7	Merge pull request #3406 from fengwang666/direct-blk-assignment Implement direct-assigned volume	2022-03-04 11:58:37 -08:00
Fabiano Fidêncio	7e5f11a52b	vendor: Update containerd to 1.6.1 Let's bring in the latest release of Containerd, 1.6.1, released on March 2nd, 2022. With this, we take the opportunity to remove containerd/api reference as we shouldn't need a separate module only for the API. Here's the list of changes needed in the code due to the bump: * stop using `grpc.WithInsecure()` as it's been deprecated - use `grpc.WithTransportCredentials(insecure.NewCredentials())` instead Fixes: #3820 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-04 10:28:40 +01:00
Feng Wang	e9b5a25502	runtime: add stat and resize APIs to containerd-shim-v2 To query fs stats and resize fs, the requests need to be passed to kata agent through containerd-shim-v2. So we're adding two REST APIs on the shim management endpoint. Also refactor the shim management client into its own Go file. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:56:53 -08:00
Feng Wang	6e0090abb5	runtime: persist direct volume mount info In the direct assigned volume scenario, Kata Containers persists the information required for managing the volume inside the guest on host filesystem. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 15:32:12 -08:00
Fabiano Fidêncio	a2422cf2a1	Merge pull request #3389 from zhsj/rm-distro-test katatestutils: remove distro constraints	2022-03-03 23:26:58 +01:00
Fabiano Fidêncio	9615c8bc9c	config: fc: Don't expose disable_block_device_use Relying on virtio-block is the only way to use Firecracker with Kata Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by Firecracker. As exposing this configuration doesn't make sense, we hardcode the `false` value in the Firecracker configuration structure. Fixes: #3813 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-02 20:43:28 +01:00
Fabiano Fidêncio	97c17085b0	Merge pull request #3770 from Jakob-Naucke/gofmt-vmm-s390x runtime: Gofmt fixes	2022-03-01 11:34:15 +01:00
Jakob Naucke	eda8ea154a	runtime: Gofmt fixes - Mostly blank lines after `+build` -- see https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by `gofmt`. - 1.17-style go:build directives are also added. - Spaces in govmm/vmm_s390x.go Fixes: #3769 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2022-02-28 17:24:47 +01:00
Eric Ernst	6a5c634490	resourcecontrol: SystemdCgroup check is not necessarily linux specific This utility function is also used to check the spec that will run in the guest - no need for this to be linux specific. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-02-28 08:01:53 -08:00
Eric Ernst	cc58cf6993	resourcecontrol: convert stats dev_t to uint64 types Their types may differ on various host OSes, but unix.Major/Minor always take a uint64 Depends-on: github.com/kata-containers/tests#4516 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-02-28 08:01:53 -08:00
Samuel Ortiz	56751089c0	katautils: Use a syscall wrapper for the hook JSON state There is no real equivalent of a thread ID on Darwin. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-28 08:01:53 -08:00
Samuel Ortiz	7d64ae7a41	runtime: Add a syscall wrapper package It allows supporting syscall variations between host OSes. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-28 08:01:53 -08:00
Samuel Ortiz	abc681ca5f	katautils: Add Darwin stub for the netNS API And move the current implementation into a Linux only file. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-28 08:01:53 -08:00
Tanweer Noor	082d538cb4	runtime: make selinux configurable removes --tags selinux handling in the makefile (part of it introduced here: `d78ffd6`) and makes selinux configurable via configuration.toml Fixes: #3631 Signed-off-by: Tanweer Noor <tnoor@apple.com>	2022-02-25 10:33:46 -08:00
Fabiano Fidêncio	29ee870d20	clh: Add confidential_guest to the config file ConfidentialGuest is an option already present and exposed for QEMU, which is used for using Kata Containers together with different sorts of Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and SE for s390x. Right now we error out in case confidential_guest is enabled, as we will be implementing the needed blocks for this as part of this series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Francesco Giudici	3da6006de4	Merge pull request #3751 from fgiudici/kata-monitor_issue3705 kata-monitor: fix collecting metrics for sandboxes not started through CRI	2022-02-25 14:53:12 +01:00
Amulyam24	cb4230e60e	runtime: fix package declaration for ppc64le Incorrect package name causes build to fail. Fix it in vm_ppc64le.go Fixes: #3761 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2022-02-24 15:31:48 +05:30
Eric Ernst	c6cc038364	Merge pull request #3615 from sameo/topic/hypervisor Make the hypervisor framework not Linux specific	2022-02-23 16:02:00 -08:00
Francesco Giudici	fec26f8e51	kata-monitor: trivial: rename symbols & labels We introduced collection of sandbox metadata from the CRI that will be attached to the sandbox metrics: this will allow immediately matching sandbox metrics with CRI workloads. Rename the symbols from Kube to CRI, as the metadata will be there every time pods are created through CRI, even if Kubernetes is not installed (e.g., 'crictl runp'). Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2022-02-23 18:34:32 +01:00
Samuel Ortiz	9fd4e5514f	runtime: Move the resourcecontrol package one layer up And try to reduce the number of virtcontainers packages, step by step. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Francesco Giudici	3ac52e8193	kata-monitor: fix updating sandbox cache at startup We now rely only on fs events to update the sandbox cache. This does not cover sandboxes already present at kata-monitor startup, however: we just retrieve the list and add them to the cache only when we get their CRI metadata. If CRI metadata is not available, we will never add them to the sandbox cache. Fix this by immediately adding the sandboxes we find at startup time to the sandbox cache. Fixes: #3705 Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2022-02-23 11:21:06 +01:00
Fabiano Fidêncio	6a9e5f90f7	Merge pull request #3670 from sameo/topic/nerdctl Support nerdctl OCI hooks	2022-02-22 23:03:33 +01:00
Fabiano Fidêncio	4729fd0fc2	Merge pull request #3736 from liubin/fix/3733-log-events-for-crio shim: log events for CRI-O	2022-02-22 09:19:37 +01:00
bin	f6fc1621f7	shim: log events for CRI-O CRI-O starts the shim process without setting TTRPC_ADDRESS, so the event-forwarding goroutine gets errors. For the CRI-O runtime, we can log the events to the log file instead. Fixes: #3733 Signed-off-by: bin <bin@hyper.sh>	2022-02-22 11:02:50 +08:00
Fabiano Fidêncio	1e9f3c856d	Merge pull request #3553 from fgiudici/kata-monitor_cachefix kata-monitor: simplify sandbox cache management and attach kubernetes POD metadata to metrics	2022-02-21 13:17:22 +01:00
luodaowen.backend	3175aad5ba	virtiofs-nydus: add lazyload support for kata with clh As Kata with QEMU already supports lazyload, this PR aims to bring the lazyload ability to Kata with CLH. Fixes #3654 Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>	2022-02-19 21:55:31 +08:00
Samuel Ortiz	b28d0274ff	virtcontainers: Make max vCPU config less QEMU specific Even though it's still actually defined as the QEMU upper bound, it's now abstracted away through govmm. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-16 19:06:32 +01:00


534 Commits