kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-09-14 21:39:26 +00:00

Author	SHA1	Message	Date
Bo Chen	98b7350a1b	virtcontainers: clh: Enable the `seccomp` feature This patch enables the `seccomp` feature from Cloud Hypervisor which provides fine-grained allowed syscalls for each of its worker threads. It brings important security benefits, while would increase memory footprint. Fixes: #2782 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-08 15:07:43 -07:00
Fupan Li	988eb95621	Merge pull request #2760 from liubin/fix/2759-optimize-code-for-managing-temp-users runtime: optimize code for managing temp users for rootless mode	2021-10-08 13:49:14 +08:00
bin	bf8f582c1d	runtime: optimize code for managing temp users for rootless mode This commit does two changes: - move code for managing temp users to rootless.go. - use common function in qemu.go when shutting down the VM. Fixes: #2759 Signed-off-by: bin <bin@hyper.sh>	2021-10-08 11:04:21 +08:00
Bin Liu	10ec4b133c	Merge pull request #2742 from liubin/fix/2741-delete-file-code Delete file virtcontainers-setup.sh	2021-10-07 11:54:47 +08:00
Jianyong Wu	7eac2ec786	protection: add confidential compute frame for arm Even though CCA, which is the confidential compute architecture, is not ready yet, add an empty implementation to avoid a static check error. Fixes: #2789 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-10-06 15:53:36 +02:00
Jianyong Wu	8acfc154de	check: fix typecheck failure in qemu_arm64_test.go fix typecheck failure in qemu_arm64_test.go Fixes: #2789 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-06 15:53:35 +02:00
Amulya Meka	5b02d54e23	virtcontainers: fix lint failure on ppc64le Add nolint for arch specific code to exclude from lint check. Fixes: #2773 Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>	2021-10-06 15:53:35 +02:00
Jakob Naucke	ff9728f032	virtcontainers: nolint guestProtection Exclude from lint checking for it is ultimately only used in architecture-specific code. Fixes: #2273 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-10-06 15:53:35 +02:00
Samuel Ortiz	71ce6cfe9e	runtime: Pass the route IP family to the agent When updating the guest routing table, we should forward the IP family information up to the guest. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:17 +02:00
Samuel Ortiz	99450bd1f7	agent: protos: Add a Family field to the Route payload Our check for the IP family is working as long as we have either a gateway or a destination IP. Some routes are missing both. The RT netlink messages provide the IP family information for each route, so we can carry that piece of information up to the guest. That will allow for a more reliable route IP family determination. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:17 +02:00
Samuel Ortiz	f85fe70231	runtime: vendor: Bump the netlink package dependency We need to be able to get the IP family from the netlink route messages, and the Route.Family field only got recently added to the netlink package. The update generates static check warnings about the call for nethandler.Delete() being deprecated in favor of a Close() call instead. So we include the s/Delete()/Close()/ change as part of this PR. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:01 +02:00
James O. D. Hunt	2ce8d4263c	clh: Suppress hypervisor output to make guest output visible Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor debug is enabled. This is required since `Debug` level: - Is overkill for debugging hypervisor failures. - Effectively hides the output from the guest kernel and userland: CLH generates so much output that the output from the guest gets "lost in the noise" (experiments show that for each full CLH debug message, at most 1 _byte_ of guest output is displayed). Fixes: #2726. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-09-30 14:22:09 +01:00
bin	762922a521	runtime: delete func ConstraintsToVCPUs ConstraintsToVCPUs is not used any more. Fixes: #2741 Signed-off-by: bin <bin@hyper.sh>	2021-09-30 14:44:41 +08:00
bin	4f4854308a	runtime: delete virtcontainers-setup.sh This file is not used anymore. Fixes: #2741 Signed-off-by: bin <bin@hyper.sh>	2021-09-30 14:44:30 +08:00
Bin Liu	4ac7199282	Merge pull request #2494 from rapiz1/clean-up-code virtcontainers: clean up useless code	2021-09-29 22:56:13 +08:00
David Gibson	b57613f53e	Merge pull request #1682 from dgibson/rescan Remove forced PCI rescans from agent	2021-09-29 13:03:55 +10:00
Feng Wang	e5fe53f0a9	runtime: fix nil reference in cleanup rootless user It seems the client (crio) can send multiple requests to stop the Kata VM, resulting in a nil reference if the uid has already been cleaned up by a different thread. Fixes #2743 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-09-27 21:28:47 -07:00
Bin Liu	3217b03b17	Merge pull request #2522 from Bevisy/main-2515 virtcontainers: Fix incorrect scripts path	2021-09-27 21:14:40 +08:00
Bin Liu	39df808f6a	Merge pull request #2695 from YchauWang/wyc-vc-cgroup runtime: clear virtcontainers cgroup duplicated function	2021-09-27 21:12:39 +08:00
David Gibson	aad1a8734f	runtime/device: Give the agent information about VFIO devices We send information about several kinds of devices to the agent so that it can apply specific handling. We don't currently do this with VFIO devices. However we need to do that so that the agent can properly wait for VFIO devices to be ready (previously it did that using a PCI rescan which may not be reliable and has some very bad side effects). This patch collates and sends the relevant information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	ebd7b61884	runtime: Don't repeat GetDeviceByID between appendDevices() and append() Both appendBlockDevice and appendVhostUserBlkDevice start by using GetDeviceByID to lookup the api.Device object corresponding to their ContainerDevice object. However their common caller, appendDevices() has already done this. This changes it so the looked up api.Device is passed to the individual appendDevice() functions. This slightly reduces duplicated work, but more importantly it makes it clearer that append*Device() don't need to check for a nil result from GetDeviceByID, since the caller has already done that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	ad45c52fbe	runtime/device: Record guest PCI path for VFIO devices For several device types which correspond to a PCI device in the guest we record the device's PCI path in the guest. We don't currently do that for VFIO devices, but we're going to need to for better handling of SR-IOV devices. To accomplish this, we have to determine the guest PCI path from the information the VMM gives us: For qemu, we query the slot of the device and its bridge from QMP. For cloud-hypervisor, the device add interface gives us a guest PCI address. In fact this represents a design error in the clh API - there's no way it can really know the guest PCI address in general. It works in this case, because clh doesn't use PCI bridges, so the device will always be on the root bus. Based on that, the PCI path is simply the device's slot number. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	5c2af3e308	runtime/device: Refactor hotplugVFIODevice() to have common exit path hotplugVFIODevice() has several different paths depending if we're plugging into a root port or a PCIE<->PCI bridge and if we're using a regular or mediated VFIO device. We're going to want some common code on the successful exit path here, so refactor the function to allow that without duplication. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	cf36fd87ad	runtime: Fix some leftover go fmt errors A few "go fmt" errors appear to have crept in. Clean them up with "go fmt ./..." in the src/runtime directory. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
zhanghj	57e3712dbd	virtiofs: fix error report in TestVirtiofsdStart when go test running Initialize ctx with context.Background() instead of nil value. Fixes: #2718 Signed-off-by: zhanghj <zhanghj.lc@inspur.com>	2021-09-24 16:06:06 +08:00
Fabiano Fidêncio	279f8e9d03	Merge pull request #2590 from c3d/issue/2589-virtiofsd-perms virtiofs: Create shared directory with 0700 mode, not 0750	2021-09-24 09:16:40 +02:00
Julio Montes	5d2a82fbf9	Merge pull request #2323 from dgibson/acpi-pcihp Replace SHPC with ACPI PCI hotplug for Kata guests	2021-09-23 09:55:31 -05:00
Fabiano Fidêncio	0ececc630f	Merge pull request #2666 from cmaf/tracing-newContainer-logger runtime: tracing: Fix logger passed in newContainer	2021-09-23 13:07:19 +02:00
Fabiano Fidêncio	e33c26ba18	Merge pull request #2622 from YchauWang/wyc-vc-api virtcontainers: update VC SandboxConfig API add SandboxBindMounts field	2021-09-23 13:05:33 +02:00
Fabiano Fidêncio	47170e302a	Merge pull request #2616 from Bevisy/main-2615 sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…	2021-09-23 13:04:18 +02:00
David Gibson	8bbcb06af5	qemu: Disable SHPC hotplug Under certain circumstances[0] Kata will attempt to use SHPC hotplug for PCI devices on the guest. In fact we explicitly enable SHPC on our PCI to PCI bridges, regardless of the qemu default. SHPC was designed a long, long time ago for physical hotplugging and works very poorly for a virtual environment. In particular it has a mandatory 5s delay to allow a (real, human) operator to back out the operation if they press a button by mistake. This alone makes it unusable for a fast start up application like Kata. Worse, the agent forces a PCI rescan during startup. That will race with the SHPC hotplug operation causing the device to go into a bad state where config space can't be accessed from the guest at all. The only reason we've sort of gotten away with this is that our default guest kernel configuration triggers what's arguably a kernel bug effectively disabling SHPC. That makes the agent rescan the only reason we see the new device. Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on the q35 machine, we can explicitly disable SHPC in all cases. It's nothing but trouble. fixes #2174 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-23 10:27:26 +10:00
David Gibson	cc4983eeac	runtime: Remove unused qemuArchBase.appendBridges definition qemuArchBase.appendBridges is never actually used, because the bare qemuArchBase type is itself never used (outside of unit tests). Instead all the subclasses of qemuArchBase override appendBridges() to call the very similar, but not identical genericAppendBridges. So, we can remove the qemuArchBase.appendBridges implementation. Furthermore, all those subclasses override appendBridges() in exactly the same way, and so we can remove those definitions and replace the base class qemuArchBase appendBridges() with that version, calling genericAppendBridges(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-23 10:15:08 +10:00
wangyongchao.bj	3b0c4bf9a0	runtime: clear virtcontainers cgroup duplicated function There are `DeviceToDeviceCgroup` and `deviceToDeviceCgroup` two functions, creating a `specs.LinuxDeviceCgroup` object. We clear the new function `deviceToDeviceCgroup`. Fixes: #2694 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-09-22 15:13:34 +08:00
Fabiano Fidêncio	2bee8bc6bd	Merge pull request #2432 from fengwang666/qemu-rootless runtime: run the QEMU VMM process with a non-root user	2021-09-21 21:37:02 +02:00
Feng Wang	9a6d56f1ab	runtime: fix empty cgroup path validation error An empty cgroup path shouldn't fail cgroup creation Fixes #2674 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-09-20 13:48:09 -07:00
Christophe de Dinechin	48fb1d9203	virtiofs: Create shared directory with 0700 mode, not 0750 A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a core assumption that the file systems being shared are not accessible by any non-privileged user. We currently create the `shared` directory in the sandbox with the default `0750` permissions, which gives read and directory traversal access to the group. There is no real good reason for a non-root user to access the shared directory, and this is potentially dangerous. Fixes: #2589 [1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/ Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2021-09-20 10:47:18 +02:00
Chelsea Mafrica	077b77c178	runtime: tracing: Fix logger passed in newContainer Change logger in Trace call in newContainer from sandbox.Logger() to nil. Passing nil will cause an error to be logged by kataTraceLogger instead of the sandbox logger, which will avoid having the log message report it as part of the sandbox subsystem when it is part of the container subsystem. The kataTraceLogger will not log it as related to the container subsystem, but since the container logger has not been created at this point, and we already use the kataTraceLogger in other instances where a subsystem's logger has not been created yet, this PR makes the call consistent with other code. Fixes #2665 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-09-17 11:41:04 -07:00
Feng Wang	1cfe59304d	runtime: Run QEMU using a non-root user/group A random generated user/group is used to start QEMU VMM process. The /dev/kvm group owner is also added to the QEMU process to grant it access. Fixes #2444 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-09-17 11:28:44 -07:00
Hui Zhu	fff82b4ef5	Merge pull request #2628 from bergwolf/runtime-reorg runtime: refactor commandline code directory	2021-09-17 10:37:22 +08:00
Chelsea Mafrica	6159ef3499	Merge pull request #2626 from YchauWang/wyc-vc-api02 virtcontainers: update VC HypervisorConfig API add three lost fields	2021-09-16 16:46:27 -07:00
Peng Tao	067c44d0b6	runtime: fix UT build failure storeContainer has been removed. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-09-16 19:42:02 +08:00
Samuel Ortiz	7bf96d2457	Merge pull request #2604 from Amulyam24/container_tests virtcontainers: add unit tests for container.go	2021-09-16 11:02:16 +02:00
Bo Chen	d00decc97d	runtime: clh: Enable hugepages support This patch adds the configuration option that allows to use hugepages with Cloud Hypervisor guests. Fixes: #2648 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-09-15 10:43:57 -07:00
David Gibson	64bb803fcf	runtime/qemu: Move from query-cpus to query-cpus-fast We recently updated to using qemu-6.1 (from qemu 5.2). Unfortunately one breaking change in qemu 6.0 wasn't caught by the CI. The query-cpus QMP command has been removed, replaced by query-cpus-fast (which has been available since qemu 2.12). govmm already had support for query-cpus-fast, we just weren't using it, so the change is quite easy. fixes #2643 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-15 16:41:26 +10:00
Samuel Ortiz	9bed2ade0f	virtcontainers: Convert to the new cgroups package API The new API is based on containerd's cgroups package. With that conversion we can simplify the virtcontainers sandbox code and also uniformize our cgroups external API dependency. We now only depend on containerd/cgroups for everything cgroups related. Depends-on: github.com/kata-containers/tests#3805 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-09-14 07:09:34 +02:00
Samuel Ortiz	b42ed39349	virtcontainers: cgroups: Add a containerd API based cgroups package Eventually, we will convert the virtcontainers and the whole Kata runtime code base to only rely on that package. This will make Kata only depends on the simpler containerd cgroups API. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:09:34 +02:00
Samuel Ortiz	f17752b0dc	virtcontainers: container: Do not create and manage container host cgroups The only process we are adding there is the container host one, and there is no such thing anymore. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:09:33 +02:00
Samuel Ortiz	dc7e9bce73	virtcontainers: sandbox: Host cgroups partitioning This is a simplification of the host cgroup handling by partitioning the host cgroups into 2: A sandbox cgroup and an overhead cgroup. The sandbox cgroup is always created and initialized. The overhead cgroup is only available when sandbox_cgroup_only is unset, and is unconstrained on all controllers. The goal of having an overhead cgroup is to be more flexible on how we manage a pod overhead. Having such cgroup will allow for setting a fixed overhead per pod, for a subset of controllers, while at the same time not having the pod being accounted for those resources. When sandbox_cgroup_only is not set, we move all non vCPU threads to the overhead cgroup and let them run unconstrained. When it is set, all pod related processes and threads will run in the sandbox cgroup. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:09:29 +02:00
Samuel Ortiz	f811026c77	virtcontainers: Unconditionally create the sandbox cgroup manager Regardless of the sandbox_cgroup_only setting, we create the sandbox cgroup manager and set the sandbox cgroup path at the same time. Without doing this, the hypervisor constraint routine is mostly a NOP as the sandbox state cgroup path is not initialized. Fixes #2184 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:05:57 +02:00
wangyongchao.bj	a6066404f7	virtcontainers: update VC HypervisorConfig API add three lost fields Sync the virtcontainers api.md document, add `ConfidentialGuest` `EntropySourceList` `GuestSwap` three fields to the HypervisorConfig API. Fixes #2625 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-09-14 10:42:54 +08:00

1241 Commits