kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-08-27 20:18:57 +00:00

Author	SHA1	Message	Date
zhanghj	efa19c41eb	device: use const strings for block-driver option instead of hard coding Currently, the block driver option is specified by hard coding; it may be better to use const string variables instead of hard-coded strings. Another modification is to remove duplicate consts for virtio driver in manager.go. Fixes: #3321 Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>	2022-03-14 09:20:43 +08:00
Eric Ernst	1e301482e7	Merge pull request #3406 from fengwang666/direct-blk-assignment Implement direct-assigned volume	2022-03-04 11:58:37 -08:00
Feng Wang	e76519af83	runtime: small refactor to improve readability Remove some confusing/duplicate code so it's more readable Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-04 10:00:52 -08:00
Fabiano Fidêncio	7e5f11a52b	vendor: Update containerd to 1.6.1 Let's bring in the latest release of Containerd, 1.6.1, released on March 2nd, 2022. With this, we take the opportunity to remove containerd/api reference as we shouldn't need a separate module only for the API. Here's the list of changes needed in the code due to the bump: * stop using `grpc.WithInsecure()` as it's been deprecated - use `grpc.WithTransportCredentials(insecure.NewCredentials())` instead Fixes: #3820 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-04 10:28:40 +01:00
Fabiano Fidêncio	2af91b23e1	Merge pull request #3281 from jongwu/vcpu_hotplug_arm64 experimentally enable vcpu hotplug and virtio-mem on arm64 in kernel part	2022-03-04 09:14:31 +01:00
Jianyong Wu	42771fa726	runtime: don't set socket and thread for arm/virt As this is just initial vcpu hotplug support, thread and socket are not supported yet. So, don't set socket and thread when hot-adding a cpu for arm/virt. Fixes: #3280 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2022-03-04 11:22:18 +08:00
Feng Wang	f905161bbb	runtime: mount direct-assigned block device fs only once Mount the direct-assigned block device fs only once and keep a refcount in the guest. Also use the ro flag inside the options field to determine whether the block device and filesystem should be mounted as ro Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	ea51ef1c40	runtime: forward the stat and resize requests from shimv2 to kata agent Translate the volume path from host-known path to guest-known path and forward the request to kata agent. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	c39281ad65	runtime: update container creation to work with direct assigned volumes During the container creation, it will parse the mount info file of the direct assigned volumes and update the in memory mount object. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	4e00c2377c	agent: add grpc interface for stat and resize operations Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat and resize fs in the guest. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Fabiano Fidêncio	af80473496	clh: stop virtiofsd if clh fails to boot up the vm If, for some reason, we're able to launch cloud hypervisor but not able to boot the VM up, the virtiofsd process would be left behind. Let's ensure, via defer, that we stop virtiofsd in case of errors. Fixes: #3819 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 19:10:37 +01:00
Fabiano Fidêncio	97951a2d12	clh: Don't use SharedFS with Confidential Guests kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but two big things got overlooked while doing that. 1. virtio-fs, as of now, cannot be part of the trust boundary, so the Confidential Guest will not be using it. 2. virtio-block hotplug should be enabled in order to use virtio-block for the rootfs (used with the devmapper plugin). When trying to use cloud-hypervisor with TDX using virtio-fs, we're facing the following error on the guest kernel: ``` virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM ``` After checking and double-checking with virtiofs and cloud-hypervisor developers, it happens as confidential containers might put some limitations on the device, so it can't access all of the guest's memory and that's where this restriction seems to be coming from. Vivek mentioned that virtiofsd does not support VIRTIO_F_ACCESS_PLATFORM (aka VIRTIO_F_IOMMU_PLATFORM) yet, and that for encrypted guests virtiofs may not be the best solution at the moment. @sboeuf put this in a very nice way: "if the virtio-fs driver doesn't support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the virtqueues and the buffers won't be marked as SHARED, meaning the VMM won't have access to it". Interestingly enough, it works with QEMU, and it may be due to some change done on the patched QEMU that @devimc is packaging, but we won't take the path to figure out what the change was and patch cloud-hypervisor in the same way, because of 1. Fixes: #3810 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:49:40 +01:00
Fabiano Fidêncio	c30b3a9ff1	clh: Adding a volume is not supported without SharedFS As mounting volumes into the guest requires SharedFS setup, let's ensure we error out if trying to do so in a situation where SharedFS is not supported. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:49:30 +01:00
Fabiano Fidêncio	f889f1f957	clh: introduce supportsSharedFS() supportsSharedFS() is a new method to be used to ensure that no SharedFS specifics are called when, for a reason or another, Cloud Hypervisor is in a mode where SharedFSs are not supported. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:49:28 +01:00
Fabiano Fidêncio	54d27ed721	clh: introduce loadVirtiofsDaemon() Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methods, let's introduce and use loadVirtiofsDaemon, as it'll also be handy later in this series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:48:38 +01:00
Fabiano Fidêncio	ae2221ea68	clh: introduce stopVirtiofsDaemon() Similarly to the `createVirtiofsDaemon` method, let's introduce and use its counterpart, as it'll also be handy later in this series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:48:26 +01:00
Fabiano Fidêncio	e8bc26f90d	clh: introduce setupVirtiofsDaemon() Similarly to what's been done with the `createVirtiofsDaemon`, let's create a `setupVirtiofsDaemon` one. It will also become handy later in this series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:48:14 +01:00
Fabiano Fidêncio	413b3b477a	clh: introduce createVirtiofsDaemon() Let's introduce and use a new `createVirtiofsDaemon` method. Its name says it all, and it'll be handy later in this series when, spoiler alert, SharedFS cannot be used (in such cases as in Confidential Guests). Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 12:48:02 +01:00
Fabiano Fidêncio	76e4f6a2a3	Revert "hypervisors: Confidential Guests do not support Device hotplug" This reverts commit `df8ffecde0`, as device hotplug is supported and, more than that, is very much needed when using virtio-blk instead of virtio-fs. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 09:59:55 +01:00
Bin Liu	75877f8793	Merge pull request #3187 from Kvasscn/kata_dev_remove_temp_vsock_dir virtcontainers: remove temp dir created for vsock in test code	2022-03-02 11:05:47 +08:00
Francesco Giudici	7f638dd049	Merge pull request #3764 from Jakob-Naucke/hugepages-test-s390x virtcontainers: Use available s390x hugepages	2022-03-01 14:33:59 +01:00
Fabiano Fidêncio	97c17085b0	Merge pull request #3770 from Jakob-Naucke/gofmt-vmm-s390x runtime: Gofmt fixes	2022-03-01 11:34:15 +01:00
GabyCT	bc1733bb0e	Merge pull request #3774 from egernst/delinux-runtime cleanup runtime pkgs for Darwin build, add basic Darwin build/unit test	2022-02-28 15:08:09 -06:00
Jakob Naucke	eda8ea154a	runtime: Gofmt fixes - Mostly blank lines after `+build` -- see https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by `gofmt`. - 1.17-style go:build directives are also added. - Spaces in govmm/vmm_s390x.go Fixes: #3769 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2022-02-28 17:24:47 +01:00
Eric Ernst	e355a71860	container: file is not linux specific This should not be linux specific -- drop restriction. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-02-28 08:01:53 -08:00
Eric Ernst	b31876eefb	device-manager: move linux-only test to a linux-only file We can't Mkdev on Darwin - let's make sure the vfio test is in a linux-only file. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-02-28 08:01:53 -08:00
Eric Ernst	5be188cc29	utils: Add darwin stub Add a stub for utils_darwin to facilitate building this package on Darwin. We can probably drop this empty stub if we have better abstraction for the various parts of virtcontainers that call it today... Fixes: #3777 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-02-28 08:01:53 -08:00
Samuel Ortiz	ad0449195d	virtcontainers: Convert stats dev_t to uint64 We need to convert them to uint64 as their types may differ on various host OSes, but unix.Major|Minor takes a uint64 regardless. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-28 08:01:53 -08:00
bin	81ed269ed2	runtime: use Cmd.StdoutPipe instead of self-created pipe Nydusd uses a bufio.Scanner to check if nydusd process has existed, but stderr/stdout passed to Cmd is self-created pipe, this pipe will not be closed if the process start failing. Use standard Cmd.StdoutPipe can close the stdout and kata shim will detect the existence of the nydusd process, then call cmd.Wait to reap the process' resources. Fixes: #3783 Signed-off-by: bin <bin@hyper.sh>	2022-02-28 16:52:49 +08:00
Eric Ernst	3997c962c2	Merge pull request #3767 from tanweernoor/02242022-kata-containers-issue-3631 runtime, config: make selinux configurable	2022-02-26 08:44:29 -08:00
Fabiano Fidêncio	a9ba7c132b	clh: Fix typo on HotplugRemoveDevice A copy and paste mistake was made and the error on HotplugRemoveDevice() should be about removal and not about addition. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 22:35:32 +01:00
Tanweer Noor	082d538cb4	runtime: make selinux configurable removes --tags selinux handling in the makefile (part of it introduced here: `d78ffd6`) and makes selinux configurable via configuration.toml Fixes: #3631 Signed-off-by: Tanweer Noor <tnoor@apple.com>	2022-02-25 10:33:46 -08:00
Fabiano Fidêncio	ea1876f057	Merge pull request #3771 from fidencio/wip/clh-tdx clh: Add TDX support	2022-02-25 18:45:31 +01:00
Samuel Ortiz	1103f5a4d4	virtcontainers: Use FilesystemSharer for sharing the containers files Switching to the generic FilesystemSharer brings 2 major improvements: 1. Remove container and sandbox specific code from kata_agent.go 2. Allow for non Linux implementations to provide ways to share container files and root filesystems with the Kata Linux guest. Fixes #3622 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-25 17:22:27 +01:00
Samuel Ortiz	533c1c0e86	virtcontainers: Keep all filesystem sharing prep code to sandbox.go With the Linux implementation of the FilesystemSharer interface, we can now remove all host filesystem sharing code from kata_agent and keep it where it belongs: sandbox.go. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-25 17:22:27 +01:00
Samuel Ortiz	61590bbddc	virtcontainers: Add a Linux implementation for the FilesystemSharer This gathers the current kata agent and container filesystem sharing code into a FilesystemSharer implementation. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-25 17:22:27 +01:00
Samuel Ortiz	03fc1cbd7e	virtcontainers: Add a filesystem sharing interface Filesystem sharing here means the ability to share some parts of the host filesystem with the guest. It's mostly about sharing files and container bundle root filesystems. In order to allow for different file and rootfs sharing implementations, we define a FilesystemSharer interface. This interface provides a preparation step, where concrete implementations will be able to e.g. prepare the host filesystem. Then it provides 2 methods, one for sharing any file (regular file or a directory) and another one for sharing a container root filesystem. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-25 17:22:27 +01:00
Fabiano Fidêncio	72434333aa	clh: Add TDX support Let's enable TDX support for Cloud Hypervisor, using td-shim as its desired firmware. Fixes: #3632 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	a8827e0c78	hypervisors: Confidential Guests do not support NVDIMM NVDIMM is also not supported with Confidential Guests and Virtio Block devices should be used instead. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	f50ff9f798	hypervisors: Confidential Guests do not support Memory hotplug Similarly to VCPUs and Device hotplug, Confidential Guests also do not support Memory hotplug. Let's make it clear in the documentation and guard the code on both QEMU and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as being supported when running Confidential Guests. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	df8ffecde0	hypervisors: Confidential Guests do not support Device hotplug Similarly to VCPUs hotplug, Confidential Guests also do not support Device hotplug. Let's make it clear in the documentation and guard the code on both QEMU and Cloud Hypervisor side to ensure we don't advertise Device hotplug as being supported when running Confidential Guests. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	28c4c044e6	hypervisors: Confidential Guests do not support VCPUs hotplug As confidential guests do not support VCPUs hotplug, let's set the "DefaultMaxVCPUs" value to "NumVCPUs". The reason to do this is to ensure that guests will be started with the correct amount of VCPUs, without giving the guest all the possible VCPUs the host could provide. One clear side effect of this limitation is that workloads that would require more VCPUs in their yaml definition will not run in this scenario. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	29ee870d20	clh: Add confidential_guest to the config file ConfidentialGuest is an option already present and exposed for QEMU, which is used for using Kata Containers together with different sorts of Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and SE for s390x. Right now we error out in case confidential_guest is enabled, as we will be implementing the needed blocks for this as part of this series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	9621c59691	clh: refactor image / initrd configuration set This is a small code refactor removing dead code, based on the checks already done in the generic hypervisor abstraction. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	dcdc412e25	clh: use common kernel params from the hypervisor code The hypervisor code already defines 3 common kernel root params for the following cases: * NVDIMM * NVDIMM without DAX support * Virtio Block As parameters used for cloud-hypervisor have an overlap with the ones provided by the NVDIMM case, let's take advantage of that. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:21 +01:00
Fabiano Fidêncio	4c164afbac	versions: Update Cloud Hypervisor to 5343e09e7b8db Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings a few fixes we're interested in, such as: * hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723 - This is needed for the TDX support on the cloud hypervisor driver, which is part of this very same series. * openapi: Update the PciBdf types - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748 - This is needed due to a change in a DeviceNode field, which would cause a marshalling / demarshalling error when running with a version of cloud-hypervisor that includes the TDX fixes mentioned above. * scripts: dev_cli: Don't quote $features_build * scripts: dev_cli: Add --features option - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773 - This is needed due to changes in the scripts used to build Cloud Hypervisor, which are used as part of Kata Containers CIs and github actions. Due to this change, we're also adapting the build scripts as part of this very same commit. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-25 16:49:16 +01:00
Jakob Naucke	bbfe7d6591	Merge pull request #3599 from Jakob-Naucke/no-virtio-rng-ccw virtcontainers: Do not add a virtio-rng-ccw device	2022-02-25 15:27:02 +01:00
Jakob Naucke	b2a65f9031	virtcontainers: Use available s390x hugepages in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so test with any that are present (default is 1M). Depends-on: github.com/kata-containers/kata-containers#3770 Fixes: #3763 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2022-02-25 13:11:00 +01:00
Eric Ernst	c6cc038364	Merge pull request #3615 from sameo/topic/hypervisor Make the hypervisor framework not Linux specific	2022-02-23 16:02:00 -08:00
Samuel Ortiz	9fd4e5514f	runtime: Move the resourcecontrol package one layer up And try to reduce the number of virtcontainers packages, step by step. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Samuel Ortiz	823faee83a	virtcontainers: Rename the cgroups package To resourcecontrol, and make it consistent with the fact that cgroups are a Linux implementation of the ResourceController interface. Fixes: #3601 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Samuel Ortiz	0d1a7da682	virtcontainers: Rename and clean the cgroup interface We call it a ResourceController, and we make it not so Linux specific. Now the Linux implementation is the cgroups one. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Samuel Ortiz	ad10e201e1	virtcontainers: cgroups: Move non Linux routine to utils.go Have an OS agnostic file for sharing routines. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Samuel Ortiz	d49d0b6f39	virtcontainers: cgroups: Define a cgroup interface And move the current, Linux-specific implementation into cgroups_linux.go Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Fabiano Fidêncio	4729fd0fc2	Merge pull request #3736 from liubin/fix/3733-log-events-for-crio shim: log events for CRI-O	2022-02-22 09:19:37 +01:00
bin	f6fc1621f7	shim: log events for CRI-O CRI-O starts the shim process without setting TTRPC_ADDRESS, so the goroutine that forwards events gets errors. For the CRI-O runtime, we can log the events to the log file instead. Fixes: #3733 Signed-off-by: bin <bin@hyper.sh>	2022-02-22 11:02:50 +08:00
Peng Tao	031da99914	Merge pull request #3687 from luodw/nydus-clh nydus: add lazyload support for kata with clh	2022-02-21 19:31:45 +08:00
luodaowen.backend	3175aad5ba	virtiofs-nydus: add lazyload support for kata with clh As kata with qemu already supports lazyload, this PR aims to bring the lazyload ability to kata with clh. Fixes #3654 Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>	2022-02-19 21:55:31 +08:00
zhanghj	94b831ebf8	virtcontainers: remove temp dir created for vsock in test code remove temp dir generated by mock.GenerateKataMockHybridVSock(). Fixes: #3186 Signed-off-by: zhanghj <zhanghj.lc@inspur.com>	2022-02-19 16:59:15 +08:00
Archana Shinde	7db9bef72c	Merge pull request #3718 from Kvasscn/kata_dev_fix_utils_assert_msg virtcontainers: Remove duplicated assert messages in utils test code	2022-02-18 06:07:16 -08:00
Samuel Ortiz	27de212fe1	runtime: Always add network endpoints from the pod netns As the container runtime, we're never inspecting, adding or configuring host networking endpoints. Make sure we always do that by wrapping addSingleEndpoint calls into the pod network namespace. Fixes #3661 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-18 10:37:07 +01:00
zhanghj	1cee0a9452	virtcontainers: Remove duplicated assert messages in utils test code Remove duplicated strings in assert.Errorf() and assert.NoErrorf(). Fixes: #3714 Signed-off-by: zhanghj <zhanghj.lc@inspur.com>	2022-02-18 16:45:05 +08:00
Samuel Ortiz	77c29bfd3b	container: Remove VFIO lazy attach handling With the recently added VFIO fixes and support, we should not need that anymore. Fixes #3108 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-17 08:39:44 +01:00
Fabiano Fidêncio	6f9685fbf5	Merge pull request #3624 from mdlayher/mdl-vsock runtime: use github.com/mdlayher/vsock@v1.1.0	2022-02-16 23:11:47 +01:00
Samuel Ortiz	26b3f0017c	virtcontainers: Split hypervisor into Linux and OS agnostic bits Keep all the OS agnostic bits in the hypervisor.go and hypervisor_ARCH.go files. Fixes #3614 Signed-off-by: Eric Ernst <eric_ernst@apple.com> Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-16 19:15:31 +01:00
Samuel Ortiz	fa0e9dc6b1	virtcontainers: Make all Linux VMMs only build on Linux Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the current virtcontainers implementation is Linux specific. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-16 19:07:34 +01:00
Samuel Ortiz	c91035d0e1	virtcontainers: Move non QEMU specific constants to hypervisor.go Hotplugging errors and 9pfs size are not particularly QEMU specific. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-16 19:07:34 +01:00
Samuel Ortiz	10ae05914c	virtcontainers: Move guest protection definitions to hypervisor.go They're not QEMU specific, other VMMs may implement support for it. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-16 19:07:31 +01:00
Samuel Ortiz	b28d0274ff	virtcontainers: Make max vCPU config less QEMU specific Even though it's still actually defined as the QEMU upper bound, it's now abstracted away through govmm. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-16 19:06:32 +01:00
bin	81a8baa5e5	runtime: add hugepages support Add hugepages support, port from: `b486387cba` Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Signed-off-by: bin <bin@hyper.sh>	2022-02-16 15:14:53 +08:00
bin	7df677c01e	runtime: Update calculateSandboxMemory to include Hugepages Limit Support hugepages and port from: `96dbb2e8f0` Fixes: #3342 Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com> Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Signed-off-by: bin <bin@hyper.sh>	2022-02-16 15:14:37 +08:00
Fabiano Fidêncio	90fd625d0c	versions: Update Cloud Hypervisor to 55479a64d237 Let's update cloud-hypervisor to a version that exposes the TDX support via the OpenAPI's auto-generated code. Fixes: #3663 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-02-14 17:32:30 +01:00
Matt Layher	c1ce67d905	runtime: use github.com/mdlayher/vsock@v1.1.0 Fixes #3625 Signed-off-by: Matt Layher <mdlayher@gmail.com>	2022-02-12 19:57:15 -05:00
luodaowen.backend	2d9f89aec7	feature(nydusd): add nydusd support to introduce lazyload ability Pulling the image is the most time-consuming step in the container lifecycle. This PR introduces nydus to kata containers; it can lazily pull the image when a container starts, which speeds up kata container create and start. Fixes #2724 Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>	2022-02-11 21:41:17 +08:00
Julio Montes	982f14fa66	runtime: support QEMU SGX Enable SGX in QEMU when `sgx.intel.com/epc` annotation is defined fixes #3436 Signed-off-by: Julio Montes <julio.montes@intel.com>	2022-02-10 09:45:48 -06:00
Samuel Ortiz	07b9d93f5f	virtcontainer: Simplify the sandbox network creation flow We don't need to call NewNetwork() twice, and we can have the VM factory case return immediately. That makes the code more readable. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	2c7087ff42	virtcontainers: Make all endpoints Linux only All of the networking endpoints are Linux specific. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	49d2cde1e2	virtcontainers: Split network tests into generic and OS specific parts Some unit tests are generic while others, mostly because they depend on netlink, are Linux specific. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	0269077ebf	virtcontainers: Remove the netlink package dependency from network.go Move the netlink dependent code into network_linux.go. Other OSes will have to provide the same functions. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	7fca5792f7	virtcontainers: Unify Network endpoints management interface And only have AddEndpoints/RemoveEndpoints for all cases (single endpoint vs all of them, hotplug or not). Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	c67109a251	virtcontainers: Remove the Network PostAdd method It's used once by the sandbox code and can be implemented directly there. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	e0b264430d	virtcontainers: Define a Network interface And move the Linux implementation into a GOOS specific file. Fixes #3005 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	5e119e90e8	virtcontainers: Rename the Network structure fields and methods We are converting the Network structure into an interface, so that different host OSes can have different networking implementations for Kata. One step into that direction is to rename all the Network structure fields and methods to something that is less Linux networking namespace specific. This will make the Network interface naming consistent. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	b858d0dedf	virtcontainers: Make all Network fields private Prepare for making it a real interface. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	49eee79f5f	virtcontainers: Remove the NetworkNamespace structure It is now replaced with a single Network structure Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	844eb61992	virtcontainers: Have CreateVM use a Network reference We are replacing the NetworkingNamespace structure with the Network one, so we should have the hypervisor interface switching to it as well. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	d7b67a7d1a	virtcontainers: Network API cleanups and simplifications Remove unused parameters. Reduce the number of parameters by deriving some of them (e.g. a networking config) from their outer structure (e.g. a Sandbox reference). Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	2edea88369	virtcontainers: Make the Network structure manage endpoints Endpoint creation, attachment and hotplug are bound to the networking namespace described through the Network structure. Making them Network methods is natural and simplifies the code. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	8f48e28325	virtcontainers: Expand the Network structure For simplicity's sake, there should only be one networking structure per sandbox, as opposed to two (Network and NetworkingNamespace) currently. This commit starts expanding the Network structure in order to eventually make it the single representation of a virtcontainers sandbox networking. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Pierre Kohler	5ef522f7c3	runtime: check kvm module `sev` correctly Runtime now accepts both `1` and `Y` as valid values for kvm_amd module parameter kvm_amd.sev. Fixes #3273 Signed-off-by: Pierre Kohler <pierre.kohler@cysec.systems>	2022-02-07 23:48:47 +01:00
Eric Ernst	e8eb5e8295	Merge pull request #3609 from egernst/rootless-linux virtcontainers: Split the rootless package into OS specific parts	2022-02-03 12:19:31 -08:00
Jakob Naucke	7ffe9e5198	virtcontainers: Do not add a virtio-rng-ccw device On s390x, skip adding a virtio-rng device. The on-chip CPACF provides entropy instead. For Confidential Containers, when using Secure Execution, entropy attacks on virtio-rng are mitigated. Fixes: #3598 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2022-02-02 17:06:20 +01:00
Julio Montes	1f29478b09	runtime: support split firmware Firmware can be split into FIRMWARE_VARS.fd (UEFI variables as configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI variables can be customized per user while the UEFI code is kept the same. fixes #3583 Signed-off-by: Julio Montes <julio.montes@intel.com>	2022-02-01 13:40:19 -06:00
Samuel Ortiz	14e7f52a91	virtcontainers: Split the rootless package into OS specific parts Move the netns specific bits into a Linux specific file. Fixes: #3607 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-01-28 16:20:28 -08:00
James O. D. Hunt	7c956e0d27	virtcontainers: Enable initrd for Cloud Hypervisor Since CH has supported booting with an initramfs since version 0.7.0 [1], allow an `initrd=` to be specified. Fixes: #3566. [1] - https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v0.7.0 Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2022-01-28 10:49:10 +00:00
Eric Ernst	a5ebeb96c1	Merge pull request #2941 from egernst/sandbox-sizing-feature Sandbox sizing feature	2022-01-27 09:37:57 -08:00
Eric Ernst	8cde54131a	runtime: introduce static sandbox resource management There are software and hardware architectures which do not support dynamically adjusting the CPU and memory resources associated with a sandbox. For these, today, they rely on "default CPU" and "default memory" configuration options for the runtime, either set by annotation or by the configuration toml on disk. In the case of a single container (launched by ctr, or something like "docker run"), we could allow for sizing the VM correctly, since all of the information is already available to us at creation time. In the sandbox / pod container case, it is possible for the upper layer container runtime (i.e., containerd or crio) to send a specific annotation indicating the total workload resource requirements associated with the sandbox creation request. In the case of sizing information not being provided, we will follow the same behavior as today: start the VM with (just) the default CPU/memory. If this information is provided, we'll track this as Workload specific resources, and track default sizing information as Base resources. We will update the hypervisor configuration to utilize Base+Workload resources, thus starting the VM with the appropriate amount of CPU and memory. In this scenario (we start the VM with the "right" amount of CPU/Memory), we do not want to update the VM resources when containers are added, or adjusted in size. This functionality is introduced behind a configuration flag, `static_sandbox_resource_mgmt`. This is defaulted to false for all configurations except Firecracker, which is set to true. This'll greatly improve UX for folks who are utilizing Kata with a VMM or hardware architecture that doesn't support hotplug. Note, users will still be unable to do in place vertical pod autoscaling or other dynamic container/pod sizing with this enabled. Fixes: #3264 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-01-26 09:04:38 -08:00
Braden Rayhorn	fc0e095180	runtime: fix handling container spec's memory limit The OCI container spec specifies a limit of -1 signifies unlimited memory. Update the sandbox memory calculator to reflect this part of the spec. Fixes: #3512 Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>	2022-01-24 13:30:32 -06:00
Bo Chen	2d799cbfa3	virtcontainers: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v21.0. Note: The client code of cloud-hypervisor's (CLH) OpenAPI is automatically generated by openapi-generator [1-2]. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Signed-off-by: Bo Chen <chen.bo@intel.com>	2022-01-20 17:48:10 -08:00
Fabiano Fidêncio	ec6655af87	govmm: Use govmm from our own pkg Let's stop using govmm from kata-containers/govmm and let's start using it from our own repo. Fixes: #3495 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-01-19 18:02:46 +01:00
Julio Montes	49223e67af	runtime: remove enable_swap option The `enable_swap` option was added a long time ago to add `-realtime mlock=off` to QEMU's command line. Kata now supports QEMU 6, the `-realtime` option has been deprecated, and `mlock=on` is causing unexpected behavior in Kata. This patch removes support for `enable_swap`, `-realtime` and `mlock=` since they are causing bugs in Kata. Signed-off-by: Julio Montes <julio.montes@intel.com>	2022-01-18 11:12:29 -06:00
liangxianlong	878ab93c15	runtime: Provide protection for shared data The k.reqHandlers should be protected by locks when used Fixes #3440 Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn>	2022-01-13 14:48:10 +08:00
James O. D. Hunt	ef835b5948	Merge pull request #3418 from yangfeiyu20102011/main runtime: it should rollback when failed in Sandbox AddInterface	2022-01-12 10:22:36 +00:00
bin	85f5ae190e	runtime: close span before return from function in case of error Returning before closing the span causes invalid spans, so the span should be closed before the function returns. Fixes: #3424 Signed-off-by: bin <bin@hyper.sh>	2022-01-11 19:45:41 +08:00
yangfeiyu	b133a2368a	runtime: it should rollback when failed in Sandbox AddInterface When Sandbox AddInterface() is called, it may fail after endpoint.HotAttach, we'd better rollback and call save() in the end. Fixes: #3419 Signed-off-by: yangfeiyu <yangfeiyu20102011@163.com>	2022-01-11 18:43:43 +08:00
Feng Wang	c486c2ca18	agent: fix the broken protobuf generation code After the protocols are moved to upper libs (PR3355), the runtime protocol generation is broken. This fixes it. Fixes: #3414 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-01-10 15:37:00 -08:00
Shengjing Zhu	2d0f9d2d06	vc: remove swagger binary Fixes: #3362 Signed-off-by: Shengjing Zhu <zhsj@debian.org>	2021-12-25 22:41:29 +08:00
Jakob Naucke	137e217b85	docs: Fix outdated k8s link in virtcontainers readme Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-12-22 19:40:25 +01:00
James O. D. Hunt	2ebae2d279	Merge pull request #3287 from jodh-intel/docs-split-arch-doc Split architecture doc into separate files	2021-12-20 10:11:30 +00:00
James O. D. Hunt	6f9efb4043	docs: Move arch doc to separate directory Move the architecture document into a new `docs/design/architecture/` directory in preparation for splitting it into more manageable pieces. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-12-16 12:26:17 +00:00
Eric Ernst	3865a1bcf6	Merge pull request #2918 from egernst/update-container-type-handling update container type handling	2021-12-15 10:41:23 -08:00
Eric Ernst	7a989a8333	runtime: api-test: fixup not clear why this was commented out before -- ensure that we set the appropriate annotation on the sandbox container's annotations to indicate this is a sandbox. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-12-14 18:55:18 -08:00
Eric Ernst	52f79aef91	utils: update container type handling Today we assume that if the CRI/upper layer doesn't provide a container type annotation, it should be treated as a sandbox. Up to this point, a sandbox with a pause container in CRI context and a single container (ala ctr run) are treated the same. For VM sizing and container constraining, it'll be useful to know if this is a sandbox or if this is a single container. In updating this, we cleanup the type handling tests and we update the containerd annotations vendoring. Fixes: #2926 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-12-14 17:59:19 -08:00
bin	03546f75a6	runtime: change io/ioutil to io/os packages Change io/ioutil to io/os packages because the io/ioutil package is deprecated as of Go 1.16: Discard => io.Discard, NopCloser => io.NopCloser, ReadAll => io.ReadAll, ReadDir => os.ReadDir, ReadFile => os.ReadFile, TempDir => os.MkdirTemp, TempFile => os.CreateTemp, WriteFile => os.WriteFile. Details: https://go.dev/doc/go1.16#ioutil Fixes: #3265 Signed-off-by: bin <bin@hyper.sh>	2021-12-15 07:31:48 +08:00
Fabiano Fidêncio	602d87295b	Merge pull request #3226 from liubin/fix/3193-fill-hypervisorconfig runtime/template: Handling new attributes for hypervisor config	2021-12-09 13:29:23 +01:00
Chelsea Mafrica	7522109abc	Merge pull request #3218 from liubin/fix/3217-fix-span-name runtime: correct span name for stopSandbox function	2021-12-07 16:36:14 -08:00
bin	b92babf91b	runtime/template: Handling new attributes for hypervisor config Some new attributes are added to hypervisor config: VMStorePath, RunStorePath and SharedPath. These attributes should be handled in two places: they are reset when checking whether the new hypervisor's config is suitable for the base config, and they are copied from the new hypervisor's config when creating a new VM. Fixes: #3193 Signed-off-by: bin <bin@hyper.sh>	2021-12-07 19:31:03 +08:00
bin	40bd34caaf	runtime: only call stopVirtiofsd when shared_fs is virtio-fs If shared_fs is set to virtio-9p, the virtiofsd is not started, so there is no need to stop it. Fixes: #3219 Signed-off-by: bin <bin@hyper.sh>	2021-12-07 16:06:26 +08:00
bin	33f343ee08	runtime: correct span name for stopSandbox function Normally the span name should be the same as the function name, so change `StopVM` to `stopSandbox`. Fixes: #3217 Signed-off-by: bin <bin@hyper.sh>	2021-12-07 15:59:18 +08:00
Bo Chen	995300260e	virtcontainers: clh: Upgrade to openapi-generator v5.3.0 The latest release of openapi-generator v5.3.0 contains the fix for `dropping err` bug [1]. This patch also re-generated the client code of Cloud Hypervisor to have the bug fixed. [1] https://github.com/OpenAPITools/openapi-generator/pull/10275 Fixes: #3201 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-12-03 08:55:38 -08:00
Fabiano Fidêncio	3fdc97e110	Merge pull request #3183 from fengwang666/nonroot-vhost-bug-fix runtime: enable vhost-net for rootless hypervisor	2021-12-03 10:42:50 +01:00
Feng Wang	b3bcb7b251	runtime: enable vhost-net for rootless hypervisor vhost-net is disabled in the rootless kata runtime feature, which has been abandoned since kata 2.0. I reused the rootless flag for nonroot hypervisor and would like to enable vhost-net. Fixes #3182 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-12-02 21:55:31 -08:00
Bo Chen	4756a04b2d	virtcontainers: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v19.0. Note: The client code of cloud-hypervisor's (CLH) OpenAPI is automatically generated by openapi-generator [1-2]. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-12-02 12:09:12 -08:00
Peng Tao	01b6ffc0a4	Merge pull request #3028 from egernst/hypervisor-hacking Hypervisor cleanup, refactoring	2021-11-26 10:21:49 +08:00
bin	ddc68131df	runtime: delete netmon Netmon is not used anymore. Fixes: #3112 Signed-off-by: bin <bin@hyper.sh>	2021-11-24 15:08:18 +08:00
Eric Ernst	ce92cadc7d	vc: hypervisor: remove setSandbox The hypervisor interface implementation should not know a thing about sandboxes. Fixes: #2882 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 12:20:41 -08:00
Eric Ernst	2227c46c25	vc: hypervisor: use our own logger This'll end up moving to hypervisors pkg, but let's stop using virtLog, instead introduce hvLogger. Fixes: #2884 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 12:20:41 -08:00
Eric Ernst	4c2883f7e2	vc: hypervisor: remove dependency on persist API Today the hypervisor code in vc relies on persist pkg for two things: 1. To get the VM/run store path on the host filesystem, 2. For type definition of the Load/Save functions of the hypervisor interface. For (1), we can simply remove the store interface from the hypervisor config and replace it with just the path, since this is all we really need. When we create a NewHypervisor structure, outside of the hypervisor, we can populate this path. For (2), rather than have the persist pkg define the structure, let's let the hypervisor code (soon to be pkg) define the structure. persist API already needs to call into hypervisor anyway; let's allow us to define the structure. We'll probably want to look at following a similar pattern for other parts of vc that we want to make independent of the persist API. In doing this, we started an initial hypervisors pkg, to hold these types (avoid a circular dependency between virtcontainers and persist pkg). Next step will be to remove all other dependencies and move the hypervisor specific code into this pkg, and out of virtcontainers. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 12:20:41 -08:00
Eric Ernst	34f23de512	vc: hypervisor: Remove need to get shared address from sandbox Add shared path as part of the hypervisor config Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 12:20:41 -08:00
Eric Ernst	c28e5a7807	acrn: remove dependency on sandbox, persistapi datatypes Today, acrn relies on sandbox level information, as well as a store provided by common parts of the hypervisor. As we cleanup the abstractions within our runtime, we need to ensure that there aren't cross dependencies between the sandbox, the persistence logic and the hypervisor. Ensure that ACRN still compiles, but remove the setSandbox usage as well as persist driver setup. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-19 12:20:41 -08:00
Greg Kurz	f80ca66300	Merge pull request #2921 from Amulyam24/template_test virtcontainers: fix failing template test on ppc64le	2021-11-18 17:32:18 +01:00
Amulyam24	d5a18173b9	virtcontainers: fix failing template test on ppc64le If a file/directory doesn't exist, os.Stat() returns an error. Assert the returned value with os.IsNotExist() to prevent it from failing. Fixes: #2920 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2021-11-18 15:37:40 +05:30
Eric Ernst	7e6f2b8d64	vc-utils: don't export unused function Many of these functions are just used in one place throughout the rest of the code base. If we create hypervisor package, network package, etc, we may want to parse this out. Fixes: #3049 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-17 14:12:57 -08:00
Eric Ernst	860f30882a	virtcontainers: move oci, uuid packages top level This will be useful at runtime level; no need for oci or uuid to be subpkg of virtcontainers. While at it, ensure we run gofmt on the changed files. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-17 14:12:57 -08:00
Eric Ernst	8acb3a32b6	virtcontainers: remove unused package nsenter Package is not utilized. Remove. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-17 14:12:57 -08:00
Eric Ernst	4788cb8263	vc-network: remove unused functions Unused functions -- let's clean up! Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-17 14:12:57 -08:00
Eric Ernst	b6ebddd7ef	oci: remove unused function GetContainerType This is unused - we utilize ContainerType directly. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-17 14:12:57 -08:00
Eric Ernst	1e7cb4bc3a	macvlan: drop bridged part of name The fact that we need to "bridge" the endpoint is a bit irrelevant. To be consistent with the rest of the endpoints, let's just call this "macvlan" Fixes: #3050 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-11-16 16:44:29 -08:00
Carlos Venegas	15b5d22e81	Merge pull request #2778 from jcvenegas/clh-race-condition-check clh: Fix race condition that prevent start pods	2021-11-16 14:15:06 -06:00
Carlos Venegas	55412044df	monitor: Fix monitor race condition doing hypervisor.check() The monitor thread checks whether the agent and the VMM are alive every second in a blocking thread. The Cloud Hypervisor API server is single-threaded, so if the monitor does a `check()` while a slow request is still in progress, the monitor check() method will time out. The monitor thread will then stop all the shim-v2 execution. This commit modifies the monitor thread to make it check the status of the hypervisor after 5 seconds. Additionally, the `check()` method from cloud-hypervisor will use the method `clh.isClhRunning(timeout)` with a 10 second timeout. The monitor function has no timeout, so even if `hypervisor.check()` takes more than 10 seconds, the isClhRunning method handles errors doing a VmmPing and retries in case of errors until the timeout is reached. Changing the time to the next check to 5 seconds should not affect any functionality, but it will reduce the overhead of polling the hypervisor. Fixes: #2777 Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>	2021-11-16 18:28:29 +00:00
snir911	b046c1ef6b	Merge pull request #2959 from snir911/wip/cgroups-systemd-fix cgroups: Fix systemd cgroup support	2021-11-15 10:44:45 +02:00
Eric Ernst	e89c06e68b	Merge pull request #3032 from liubin/fix/3031-merge-two-types-packages runtime: merge virtcontainers/pkg/types into virtcontainers/types	2021-11-12 14:23:21 -08:00
bin	09f7962ff1	runtime: merge virtcontainers/pkg/types into virtcontainers/types There are two types packages under virtcontainers, and virtcontainers/pkg/types contains only a little code; merging them into one makes the types package easier to understand and use. Fixes: #3031 Signed-off-by: bin <bin@hyper.sh>	2021-11-12 15:06:39 +08:00
bin	6acedc2531	runtime: delete not used codes Functions EnvVars and GetOCIConfig in runtime/virtcontainers/pkg/oci/utils.go are not used anymore. Fixes: #3029 Signed-off-by: bin <bin@hyper.sh>	2021-11-12 11:35:31 +08:00
Snir Sheriber	bcf181b7ee	cgroups: Fix systemd cgroup support As github.com/containerd/cgroups doesn't support scope units, which are essential in some cases, let's create the cgroups manually and load them through the cgroups API. This is currently done only when there's a single sandbox cgroup (sandbox_cgroup_only=true); otherwise we set it as a static cgroup path as it used to be (until a proper solution for the overhead cgroup under systemd is suggested). Fixes: #2868 Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2021-11-11 08:51:45 +02:00
Bin Liu	04185bd068	Merge pull request #2997 from Jakob-Naucke/lint-protection virtcontainers: Lint protection types	2021-11-11 08:34:48 +08:00
Fabiano Fidêncio	653976c0fd	Merge pull request #3000 from bergwolf/crioptions runtime: Revert "runtime: use containerd package instead of cri-containerd"	2021-11-10 13:41:24 +01:00
Peng Tao	eacfcdec19	runtime: Revert "runtime: use containerd package instead of cri-containerd" This reverts commit `76f16fd1a7` to bring back cri-containerd crioptions parsing so that kata works with older containerd versions like v1.3.9 and v1.4.6. Fixes: #2999 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-11-10 16:06:42 +08:00
Jakob Naucke	b7b89905d4	virtcontainers: Lint protection types Protection types like tdxProtection or seProtection were marked nolint, remove this. As a side effect, ARM needs dummy tests for these. Fixes: #2801 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-11-09 18:36:32 +01:00
James O. D. Hunt	87f676062c	agent: Remove dynamic tracing APIs Remove the `StartTracing` and `StopTracing` agent APIs that toggle dynamic tracing. This is not supported in Kata 2.x, as documented in the [tracing proposals document](https://github.com/kata-containers/kata-containers/pull/2062). Fixes: #2985. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-11-09 08:39:06 +00:00
Chelsea Mafrica	84ccdd8ef2	vendor: update OpenTelemetry to v0.20.0 Update OpenTelemetry from v0.15.0 to v0.20.0. Git log 02d8bdd5 Release v0.20.0 (1837) aa66fe75 OS and Process resource detectors (1788) 7374d679 Fix Links documents (1835) 856f5b84 Add feature request issue template (1831) 0fdc3d78 Remove bundler from Jaeger exporter (1830) 738ef11e Fix flaky global ErrorHandler delegation test (1829) e43d9c00 Update Default Value for Jaeger Exporter Endpoint (1824) 0032bd64 Fix default merging of resource attributes from environment variable (1785) 96c5e4ba Add SpanProcessor example for Span annotation on start (1733) 543c8144 Remove the WithSDKOptions from the Jaeger exporter (1825) 66389ad6 Update function docs in sdk.go (1826) 70bc9eb3 Adds support for timeout on the otlp/gRPC exporter (1821) 081cc61d Update Jaeger exporter convenience functions (1822) 1b9f16d3 Remove the WithDisabled option from Jaeger exporter (1806) 6867faa0 Bump actions/cache from v2.1.4 to v2.1.5 (1818) a2bf04dc Build context pipeline in Jaeger upload process (1809) 2de86f23 Remove locking from Jaeger exporter shutdown/export (1807) 4f9fec29 Add ExportSpans benchmark to Jaeger exporter (1805) d9566abe Fix OTLP testing flake: signal connection from mock collector (1816) a2cecb6e add support for env var configuration to otlp/gRPC (1811) d616df61 Fix flaky OTLP exporter reconnect test (1814) b09df84a Changes stdout to expose the `*sdktrace.TracerProvider` (1800) 04890608 Remove options field from Jaeger exporter (1808) 6db20e00 Remove the abandoned Process struct in Jaeger exporter (1804) 086abf34 docs: use test example to document prometheus.InstallNewPipeline (1796) d0cea04b Bump google.golang.org/api from 0.43.0 to 0.44.0 in /exporters/trace/jaeger (1792) 99c477fe Fixed typo for default service name in Jaeger Exporter (1797) 95fd8f50 Bump google.golang.org/grpc from 1.36.1 to 1.37.0 in /exporters/otlp (1791) 9b251644 Zipkin Exporter: Use default resouce's serviceName as default serivce name 
(1777) (1786) 4d141e47 Add k8s.node.name and k8s.node.uid to semconv (1789) 5c99a34c Fix golint issue caused by incorrect comment (1795) c5d006c0 Update Jaeger environment variables (1752) 58432808 add NewExportPipeline and InstallNewPipeline for otlp (1373) 7d8e6bd7 Zipkin Exporter: Adjust span transformation to comply with the spec (1688) 2817c091 Merge sdk/export/trace into sdk/trace (1778) c61e654c Refactor prometheus exporter tests to match file headers as well (1470) 23422c56 Remove process config for Jaeger exporter (1776) 0d49b592 Add test to check bsp ignores `OnEnd` and `ForceFlush` post Shutdown` (1772) e9aaa04b Record links/events attribute drops independently (1771) 5bbfc22c Make ExportSpans for Jaeger Exporter honor deadline (1773) 0786fe32 Add Bug report issue templates (1775) 3c7facee Add `ExportTimeout` option to batch span processor (1755) c6b92d5b Make TraceFlags spec-compliant (1770) ee687ca5 Bump github.com/itchyny/gojq from 0.12.2 to 0.12.3 in /internal/tools (1774) 52a24774 add support for configuring tls certs via env var to otlp/HTTP (1769) 35cfbc7e Update precedence of event name in Jaeger exporter (1768) 33699d24 Adds semantic conventions for exceptions (1492) 928e3c38 Modify ForceFlush to abort after timeout/cancellation (1757) 3947cab4 Fix testCollectorEndpoint typo and add tag assertions in jaeger_test (1753) ecc635dc add website docs (1747) 07a8d195 Fix Jaeger span status reporting and unify tag keys (1761) 4fa35c90 add partial support for env var config to otlp/HTTP (1758) bf180d0f improve OTLP/gRPC connection errors (1737) d575865b Fix span IsRecording when not sampling (1750) 20c93b01 Update SamplingParameters (1749) 97501a3f Update SpanSnapshot to use parent SpanContext (1748) 604b05cb Store current Span instead of local and remote SpanContext in context.Context (1731) c61f4b6d Set @lizthegrey to emeritus status (1745) b1342fec Bump github.com/golangci/golangci-lint in /internal/tools (1743) 54e1bd19 Bump google.golang.org/api 
from 0.41.0 to 0.43.0 in /exporters/trace/jaeger (1741) 4d25b6a2 Bump github.com/prometheus/client_golang from 1.9.0 to 1.10.0 in /exporters/metric/prometheus (1740) 0a47b66f Bump google.golang.org/grpc from 1.36.0 to 1.36.1 in /exporters/otlp (1739) 26f006b8 Reinstate @paivagustavo as an Approver (1734) 382c7ced Remove hasRemoteParent field from SDK span (1728) 862a5a68 Remove setting error status while recording error with Span from oteltest package (1729) 6defcfdf Remove links on NewRoot spans (1726) a9b2f851 upgrade thrift to v0.14.1 in jaeger exporter (1712) 5a6a854d Bump google.golang.org/protobuf from 1.25.0 to 1.26.0 in /exporters/otlp (1724) 23486213 Migrate to using go.opentelemetry.io/proto/otlp (1713) 5d559b40 Remove makeSamplingDecision func (1711) e24702da Update the TraceContext.Extract docs (1720) 9d4eb1f6 Update dates in CHANGELOG.md for 2021 releases (1723) 2b4fa968 Release v0.19.0 (1710) 4beb7041 sdk/trace: removing ApplyConfig and Config (1693) 1d42be16 Rename WithDefaultSampler TracerProvider option to WithSampler and update docs (1702) 860d5d86 Add flag to determine whether SpanContext is remote (1701) 0fe65e6b Comply with OpenTelemetry attributes specification (1703) 88884351 Bump google.golang.org/api from 0.40.0 to 0.41.0 in /exporters/trace/jaeger (1700) 345f264a breaking(zipkin): removes servicName from zipkin exporter. 
(1697) 62cbf0f2 Populate Jaeger's Span.Process from Resource (1673) 28eaaa9a Add a test to prove the Tracer is safe for concurrent calls (1665) 8b1be11a Rename resource pkg label vars and methods (1692) a1539d44 OpenCensus metric exporter bridge (1444) 77aa218d Fix issue #1490, apply same logic as in the SDK (1687) 9d3416cc Fix synchronization issues in global trace delegate implementation (1686) 58f69f09 Span status from HTTP code: Do not set status message if it can be inferred (1681) 9c305bde Flush metric events prior to shutdown in OTLP example (1678) 66b1135a Fix CHANGELOG (1680) 90bd4ab5 Update employer information for maintainers (1683) 36841913 Remove WithRecord() option from trace.SpanOption when starting a span (1660) 65c7de20 Remove trace prefix from NoOp src files. (1679) e88a091a Make SpanContext Immutable (1573) d75e2680 Avoid overriding configuration of tracer provider (1633) 2b4d5ac3 Bump github.com/golangci/golangci-lint in /internal/tools (1671) 150b868d Bump github.com/google/go-cmp from 0.5.4 to 0.5.5 (1667) 76aa924e Fix the examples target info messaging (1676) a3aa9fda Bump github.com/itchyny/gojq from 0.12.1 to 0.12.2 in /internal/tools (1672) a5edd79e Removed setting error status while recording err as span event (1663) e9814758 chore(zipkin): improves zipkin example to not to depend on timeouts. 
(1566) 3dc91f2d Add ForceFlush method to TracerProvider (1608) bd0bba43 exporter: swap pusher for exporter (1656) 56904859 Update the SimpleSpanProcessor (1612) a7f7abac SpanStatus description set only when status code is set to Error (1662) 05252f40 Jaeger Exporter: Fix minor mapping discrepancies (1626) 238e7c61 Add non-empty string check for attribute keys (1659) e9b9aca8 Add tests for propagation of Sampler Tracestate changes (1655) 875a2583 Add docs on when reviews should be cleared (1556) 7153ef2d Add HTTP/JSON to the otlp exporter (1586) 62e2a0f7 Unexport the simple and batch SpanProcessors (1638) 992837f1 Add TracerProvider tests to oteltest harness (1607) bb4c297e Pre release v0.18.0 (1635) 712c3dcc Fix makefile ci target and coverage test packages (1634) 841d2a58 Rename local var new to not collide with builtin (1610) 13938ab5 Update SpanProcessor docs (1611) e25503a0 Add compatibility tests to CI (1567) 1519d959 Use reasonable interval in sdktrace.WithBatchTimeout (1621) 7d4496e0 Pass metric labels when transforming to gaugeArray (1570) 6d4a5e0d Bump google.golang.org/grpc from 1.35.0 to 1.36.0 in /exporters/otlp (1619) a93393a0 Bump google.golang.org/grpc in /example/prom-collector (1620) e499ca86 Fix validation for tracestate with vendor and add tests (1581) 43886e52 Make timestamps sequential in lastvalue agg check (1579) 37688ef6 revent end-users from implementing some interfaces (1575) 85e696d2 Updating documentation with an working example for creating NewExporter (1513) 562eb28b Unify the Added sections of the unreleased changes (1580) c4cf1aff Fix Windows build of Jaeger tests (1577) 4a163bea Fix stdout TestStdoutTimestamp failure with sleep (1572) bd4701eb Stagger timestamps in exact aggregator tests (1569) b94cd4b2 add code attributes to semconv package (1558) 78c06cef Update docs from gitter to slack for communication (1554) 1307c911 Remove vendor exclude from license-check (1552) 5d2636e5 Bump github.com/golangci/golangci-lint in 
/internal/tools (1565) d7aff473 Vendor Thrift dependency (1551) 298c5a14 Update span limits to conform with OpenTelemetry specification (1535) ecf65d79 Rename otel/label -> otel/attribute (1541) 1b5b6621 Remove resampling on span.SetName (1545) 8da52996 fix: grpc reconnection (1521) 3bce9c97 Add Keys() method to propagation.TextMapCarrier (1544) 0b1a1c72 Make oteltest.SpanRecorder into a concrete type (1542) 7d0e3e52 SDK span no modification after ended (1543) 7de3b58c Remove extra labels types (1314) 73194e44 Bump google.golang.org/api from 0.39.0 to 0.40.0 in /exporters/trace/jaeger (1536) 8fae0a64 Create resource.Default() with required attributes/default values (1507) 76f93422 Release v0.17.0 (1534) 9b242bc4 Organize API into Go modules based on stability and dependencies (1528) e50a1c8c Bump actions/cache from v2 to v2.1.4 (1518) a6aa7f00 Bump google.golang.org/api from 0.38.0 to 0.39.0 in /exporters/trace/jaeger (1517) 38efc875 Code Improvement - Error strings should not be capitalized (1488) 6b340501 Update default branch name (1505) b39fd052 nit: Fix comment to be up-to-date (1510) 186c2953 Fix golint error of package comment form (1487) 9308d662 Bump google.golang.org/api from 0.37.0 to 0.38.0 in /exporters/trace/jaeger (1506) 1952d7b6 Reverse order of attribute precedence when merging two Resources (1501) ad7b4715 Remove build flags for runtime/trace support (1498) 4bf4b690 Remove inaccurate and unnecessary import comment (1481) 7e19eb6a Bump google.golang.org/api from 0.36.0 to 0.37.0 in /exporters/trace/jaeger (1504) c6a4406a Bump github.com/golangci/golangci-lint in /internal/tools (1503) 9524ac09 Update workflows to include main branch as trigger (1497) c066f15e Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /internal/tools (1478) 894e0240 Bump github.com/golangci/golangci-lint in /internal/tools (1477) 71ffba39 Bump google.golang.org/grpc from 1.34.0 to 1.35.0 in /exporters/otlp (1471) 515809a8 Bump github.com/itchyny/gojq from 0.12.0 to 0.12.1 
in /internal/tools (1472) 3e96ad1e gitignore: remove unused example path (1474) c5622777 Histogram aggregator functional options (1434) 0df8cd62 Rename Makefile.proto to avoid interpretation as proto file (1468) 979ff51f Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 (1453) 1df8b3b8 Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /exporters/otlp (1456) 4c30a90a Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /sdk (1455) 5a9f8f6e Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/stdout (1454) 7786f34c Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/zipkin (1457) 4352a7a6 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/otlp (1460) 6990b3b3 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/metric/prometheus (1461) 7af40d22 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/jaeger (1463) f16f1892 Bump google.golang.org/grpc in /example/otel-collector (1465) fe363be3 Move Span Event to API (1452) 43922240 Bump google.golang.org/grpc in /example/prom-collector (1466) 0aadfb27 Prepare release v0.16.0 (1464) 207587b6 Metric histogram aggregator: Swap in SynchronizedMove to avoid allocations (1435) c29c6fd1 Shutdown underlying span exporter while shutting down BatchSpanProcessor (1443) dfece3d2 Combine the Push and Pull metric controllers (1378) 74deeddd Handle tracestate in TraceContext propagator (1447) 49f699d6 Remove Quantile aggregation, DDSketch aggregator; add Exact timestamps (1412) 9c949411 Rename internal/testing to internal/internaltest (1449) 8d809814 Move gRPC driver to a subpackage and add an HTTP driver (1420) 9332af1b Bump github.com/golangci/golangci-lint in /internal/tools (1445) 5ed96e92 Update exporters/otlp Readme.md (1441) bc9cb5e3 Switch CircleCI badge to GitHub Actions (1440) 716ad082 Remove CircleCI config (1439) 0682db1e Adding Security Workflows to GitHub Actions (2/2): gosec workflow (1429) 11f732b8 Adding Security Workflows to 
GitHub Actions (1/2): codeql workflow (1428) 40f1c003 Add Tracestate into the SamplingResult struct (1432) db06c8d1 Flush metric events before shutdown in collector example (1438) f6f458e1 Fix golint issue caused by typo in trace.go (1436) fe9d1f7e Use uint64 Count consistently in metric aggregation (1430) 3a337d0b Bump github.com/golangci/golangci-lint in /internal/tools (1433) 1e4c8321 cleanup: drop the removed examples in gitignore (1427) 5c9221cf Unify endpoint API that related to OTel exporter (1401) 045c3ffe Build scripts: Replace mapfile with read loop for old bash versions (1425) 2def8c3d Add Versioning Documentation (1388) 6bcd1085 Bump github.com/itchyny/gojq from 0.11.2 to 0.12.0 in /internal/tools (1424) 38e76efe Add a split protocol driver for otlp exporter (1418) 439cd313 Add TraceState to SpanContext in API (1340) 35215264 Split connection management away from exporter (1369) add9d933 Bump github.com/prometheus/client_golang from 1.8.0 to 1.9.0 in /exporters/metric/prometheus (1414) 93d426a1 Add @dashpole as a project Approver (1410) 6fe20ef3 Fix small typo (1409) b22d0d70 Mention the getting started guide (1406) 3fb80fb2 Fix duplicate checkout action in GitHub workflow (1407) 2051927b Correct CI workflow syntax (1403) f11a86f7 Fix typo in comment (1402) bdf87a78 Migrate CircleCI ci.yml workflow to GitHub Actions (1382) 4e59dd1f Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /example/otel-collector (1400) 83513f70 Bump google.golang.org/api from 0.32.0 to 0.36.0 in /exporters/trace/jaeger (1398) a354fc41 Bump github.com/prometheus/client_golang from 1.7.1 to 1.8.0 in /exporters/metric/prometheus (1397) 3528e42c Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /exporters/otlp (1396) af114baf Call otel.Handle with non-nil errors (1384) c3c4273e Add RO/RW span interfaces (1360) Fixes #2591 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-11-04 12:30:45 -07:00
Chelsea Mafrica	09d5d8836b	runtime: tracing: Change method for adding tags In later versions of OpenTelemetry label.Any() is deprecated. Create addTag() to handle type assertions of values. Change AddTag() to variadic function that accepts multiple keys and values. Fixes #2547 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-11-04 10:19:05 -07:00
Snir Sheriber	b34ed403c5	cgroups: pass vhost-vsock device to cgroup for the sandbox cgroup Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2021-11-04 10:59:10 +02:00
Snir Sheriber	7362e1e8a9	runtime: remove prefix when cgroups are managed by systemd as done previously in `9949daf4dc` Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2021-11-04 10:13:22 +02:00
Jianyong Wu	e15c8460db	Merge pull request #2265 from rapiz1/simple-ro-mount virtcontainers: simplify read-only mount handling	2021-11-01 10:43:16 +08:00
GabyCT	969b78b01f	Merge pull request #2496 from rapiz1/show-guest-protection cli: Show available guest protection in env output	2021-10-29 17:28:47 -05:00
James O. D. Hunt	2551179e43	Merge pull request #2929 from YchauWang/vc-docs-api virtcontainers: api: update the functions in the api.md docs	2021-10-29 16:01:31 +01:00
James O. D. Hunt	4e2dd41eb6	Merge pull request #1791 from wainersm/virtcontainers-1 virtcontainers: check that both initrd and image are not set	2021-10-29 14:51:07 +01:00
wangyongchao.bj	338ac87516	virtcontainers: api: update the functions in the api.md docs The functions in the virtcontainers API document were out of sync with the Sandbox and VCImpl code. There were also two functions named `CreateSandbox`, differing by one parameter, which was very confusing. This PR syncs the API document with the code. Fixes: #2928 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-10-29 15:36:53 +08:00
Yujia Qiao	e66d0473be	virtcontainers: simplify read-only mount handling Current handling of read-only mounts is a little tricky. However, a clearer solution can be used here: 1. make a private ro bind mount at privateDest to the mount source 2. make a bind mount at mountDest to the mount created in step 1 3. umount the private bind mount created in step 1 One important aspect is that the mount in step 2 is duplicated from the one we created in step 1. So the MS_RDONLY flag is properly preserved in all mounts created in the propagation. Fixes: #2205 Depends-on: github.com/kata-containers/tests#4106 Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-10-28 15:48:41 +08:00
Manabu Sugimoto	3be50adab9	agent: Add support for Seccomp The kata-agent supports seccomp feature based on the OCI runtime specification. This seccomp capability in the kata-agent is enabled by default. However, it is not enforced by default: users need to enable that by setting `disable_guest_seccomp` to `false` in the main configuration file. Fixes: #1476 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-10-27 19:06:13 +09:00
Wainer dos Santos Moschetta	309dae631a	virtcontainers: check that both initrd and image are not set This changed valid() in hypervisor to check the case where both initrd and image path are set; in this case it returns an error. Fixes #1868 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2021-10-26 10:44:23 -04:00
bin	5f306330f4	virtcontainers: delete duplicated notify in watchHypervisor function When hypervisor check failed, the notify function is called twice. Fixes: #2901 Signed-off-by: bin <bin@hyper.sh>	2021-10-26 11:58:26 +08:00
Yujia Qiao	6cc8000cae	cli: Show available guest protection in env output Show available guest protections in the `kata-runtime env` output. Also bump the formatVersion. Fixes: #1982 Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-10-25 21:44:56 +08:00
Yujia Qiao	2063b13805	virtcontainers: Add func AvailableGuestProtections Add functions to return guestProtection as a string slice, which can be then used in `kata-runtime env` output. Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-10-25 21:44:01 +08:00
James O. D. Hunt	ec3aa1694b	Merge pull request #2844 from jongwu/unit_test enable unit test on arm	2021-10-25 10:58:21 +01:00
David Gibson	a0825badf6	Merge pull request #2795 from dgibson/vfio-as-vfio Allow VFIO devices to be used as VFIO devices in the container	2021-10-25 14:25:26 +11:00
David Gibson	34273da98f	runtime/device: Allow VFIO devices to be presented to guest as VFIO devices On a conventional (e.g. runc) container, passing in a VFIO group device, /dev/vfio/NN, will result in the same VFIO group device being available within the container. With Kata, however, the VFIO device will be bound to the guest kernel's driver (if it has one), possibly appearing as some other device (or a network interface) within the guest. This adds a new `vfio_mode` option to alter this. If set to "vfio" it will instruct the agent to remap VFIO devices to the VFIO driver within the guest as well, meaning they will appear as VFIO devices within the container. Unlike a runc container, the VFIO devices will have different names to the host, since the names correspond to the IOMMU groups of the guest and those can't be remapped with namespaces. For now we keep 'guest-kernel' as the value in the default configuration files, to maintain current Kata behaviour. In future we should change this to 'vfio' as the default. That will make Kata's default behaviour more closely resemble OCI specified behaviour. fixes #693 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:31 +11:00
David Gibson	68696e051d	runtime: Add parameter to constrainGRPCSpec to control VFIO handling Currently constrainGRPCSpec always removes VFIO devices from the OCI container spec which will be used for the inner container. For upcoming support for VFIO devices in DPDK usecases we'll need to not do that. As a preliminary to that, add an extra parameter to the function to control whether or not it will remove the VFIO devices from the spec. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:31 +11:00
David Gibson	d9e2e9edb2	runtime: Rename constraintGRPCSpec to improve grammar "constraint" is a noun, "constrain" is the associated verb, which makes more sense in this context. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:31 +11:00
David Gibson	57ab408576	runtime: Introduce "vfio_mode" config variable and annotation In order to support DPDK workloads, we need to change the way VFIO devices will be handled in Kata containers. However, the current method, although it is not remotely OCI compliant has real uses. Therefore, introduce a new runtime configuration field "vfio_mode" to control how VFIO devices will be presented to the container. We also add a new sandbox annotation - io.katacontainers.config.runtime.vfio_mode - to override this on a per-sandbox basis. For now, the only allowed value is "guest-kernel" which refers to the current behaviour where VFIO devices added to the container will be bound to whatever driver in the VM kernel claims them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:29 +11:00
Jianyong Wu	1a96b8ba35	template: disable template unit test on arm Template is broken on arm. here we disable the template unit test temporarily. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-23 15:07:25 +08:00
Jianyong Wu	43b13a4a6d	runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs DefaultMaxVCPUs may be larger than defaultMaxQemuVCPUs; this should be checked and avoided. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-23 15:07:25 +08:00
Manohar Castelino	52268d0ece	hypervisor: Expose the hypervisor itself Export the top level hypervisor type s/hypervisor/Hypervisor Fixes: #2880 Signed-off-by: Manohar Castelino <mcastelino@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:46:02 -07:00
Eric Ernst	a72bed5b34	hypervisor: update tests based on createSandbox->CreateVM change Fixup a couple of broken tests. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	f434bcbf6c	hypervisor: createSandbox is CreateVM Last of a series of commits to export the top level hypervisor generic methods. s/createSandbox/CreateVM Fixes #2880 Signed-off-by: Manohar Castelino <mcastelino@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	76f1ce9e30	hypervisor: startSandbox is StartVM s/startSandbox/StartVM Signed-off-by: Manohar Castelino <mcastelino@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	fd24a695bf	hypervisor: waitSandbox is waitVM renaming... Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	a6385c8fde	hypervisor: stopSandbox is StopVM Renaming. There is no Sandbox specific logic except tracing. Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	f989078cd2	hypervisor: resumeSandbox is ResumeVM renaming... Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	73b4f27c46	hypervisor: saveSandbox is SaveVM rename Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	7308610c41	hypervisor: pauseSandbox is nothing but PauseVM renaming Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	8f78e1cc19	hypervisor: The SandboxConsole is the VM's console update naming Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	4d47aeef2e	hypervisor: Export generic interface methods This is in preparation for creating a separate hypervisor package. Non-functional change. Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	6baf2586ee	hypervisor: Minimal exports of generic hypervisor internal fields Export commonly used hypervisor fields and utility functions. These need to be exposed to allow the hypervisor to be consumed externally. Note: This does not change the hypervisor interface definition. Those changes will be separate commits. Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
GabyCT	03877f3479	Merge pull request #2872 from likebreath/1020/clh_v19.0 Upgrade to Cloud Hypervisor v19.0	2021-10-21 10:26:55 -05:00
James O. D. Hunt	09741272bc	Merge pull request #2783 from likebreath/1001/clh_enable_seccomp virtcontainers: clh: Enable the `seccomp` feature	2021-10-21 09:21:33 +01:00
Bo Chen	8030b6caf0	virtcontainers: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v19.0. Note: The client code of cloud-hypervisor's (CLH) OpenAPI is automatically generated by openapi-generator [1-2]. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-20 15:48:55 -07:00
Chelsea Mafrica	4ce2b14e60	Merge pull request #2817 from jodh-intel/clh+fc-agent-tracing Enable agent tracing for hybrid VSOCK hypervisors	2021-10-18 22:01:52 -07:00
bin	76f16fd1a7	runtime: use containerd package instead of cri-containerd cri-containerd project has been merged into containerd repo, and we should not reference it any more in code and docs. This commit will use containerd package instead of cri-containerd package. Fixes: #2791 Signed-off-by: bin <bin@hyper.sh>	2021-10-19 09:40:20 +08:00
James O. D. Hunt	41c49a7bf5	Merge pull request #2771 from fengwang666/debug-pid runtime: update sandbox root dir cleanup behavior in rootless hypervisor	2021-10-18 17:47:47 +01:00
Christophe de Dinechin	bcffa26305	tracing: Fix typo in "package" tag name The tracing tags for api.go contain `"packages"` as a tag name, whereas all other tags contain `"package"`. Fixes: #2847 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2021-10-15 14:48:00 +02:00
James O. D. Hunt	e61f5e2931	runtime: Show socket path in kata-env output Display a pseudo path to the sandbox socket in the output of `kata-runtime env` for those hypervisors that use Hybrid VSOCK. The path is not a real path since the command does not create a sandbox. The output includes a `{ID}` tag which would be replaced with the real sandbox ID (name) when the sandbox was created. This feature is only useful for agent tracing with the trace forwarder where the configured hypervisor uses Hybrid VSOCK. Note that the features required a new `setConfig()` method to be added to the `hypervisor` interface. This isn't normally needed as the specified hypervisor configuration passed to `setConfig()` is also passed to `createSandbox()`. However the new call is required by `kata-runtime env` to display the correct socket path for Firecracker. The new method isn't wholly redundant for the main code path though as it's now used by each hypervisor's `createSandbox()` call. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-15 11:45:29 +01:00
James O. D. Hunt	321be0f794	tracing: Remove trace mode and trace type Remove the `trace_mode` and `trace_type` agent tracing options as decided in the Architecture Committee meeting. See: - https://github.com/kata-containers/kata-containers/pull/2062 Fixes: #2352. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-15 10:09:38 +01:00
Bo Chen	7b2bfd4eca	virtcontainers: clh: Use 'quiet' as the default kernel parameter The 'quiet' kernel parameter can avoid guest kernel logs while booting, which can reduce boot time. Fix: #2820 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-11 22:06:27 -07:00
Bo Chen	3e24e46c70	virtcontainers: clh: Turn-off serial and virtio-console by default We will need to have console output from the guest only for debugging purposes. As a result, we can turn-off both the serial and virtio-console devices by default for better boot time. Fixes: #2820 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-11 22:06:23 -07:00
Feng Wang	adc9e0baaf	runtime: fix two bugs in rootless hypervisor Update the sandbox dir clean up logic to be more appropriate Add different seeds for randInt() method Fixes #2770 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-10-08 15:52:42 -07:00
Bo Chen	51cbe14584	runtime: Add option "disable_seccomp" to config hypervisor.clh This patch adds an option "disable_seccomp" to the config hypervisor.clh, from which users can disable the `seccomp` feature from Cloud Hypervisor when needed (for debugging purposes). Fixes: #2782 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-08 15:10:30 -07:00
Bo Chen	98b7350a1b	virtcontainers: clh: Enable the `seccomp` feature This patch enables the `seccomp` feature from Cloud Hypervisor which provides fine-grained allowed syscalls for each of its worker threads. It brings important security benefits, though it would increase the memory footprint. Fixes: #2782 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-08 15:07:43 -07:00
Fupan Li	988eb95621	Merge pull request #2760 from liubin/fix/2759-optimize-code-for-managing-temp-users runtime: optimize code for managing temp users for rootless mode	2021-10-08 13:49:14 +08:00
bin	bf8f582c1d	runtime: optimize code for managing temp users for rootless mode This commit makes two changes: - move code for managing temp users to rootless.go. - use a common function in qemu.go when shutting down the VM. Fixes: #2759 Signed-off-by: bin <bin@hyper.sh>	2021-10-08 11:04:21 +08:00
Bin Liu	10ec4b133c	Merge pull request #2742 from liubin/fix/2741-delete-file-code Delete file virtcontainers-setup.sh	2021-10-07 11:54:47 +08:00
Jianyong Wu	7eac2ec786	protection: add confidential compute frame for arm Although CCA, the confidential compute architecture, is not ready yet, add an empty implementation to avoid a static check error. Fixes: #2789 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-10-06 15:53:36 +02:00
Jianyong Wu	8acfc154de	check: fix typecheck failure in qemu_arm64_test.go fix typecheck failure in qemu_arm64_test.go Fixes: #2789 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-06 15:53:35 +02:00
Amulya Meka	5b02d54e23	virtcontainers: fix lint failure on ppc64le Add nolint for arch specific code to exclude from lint check. Fixes: #2773 Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>	2021-10-06 15:53:35 +02:00
Jakob Naucke	ff9728f032	virtcontainers: nolint guestProtection Exclude from lint checking for it is ultimately only used in architecture-specific code. Fixes: #2273 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-10-06 15:53:35 +02:00
Samuel Ortiz	71ce6cfe9e	runtime: Pass the route IP family to the agent When updating the guest routing table, we should forward the IP family information up to the guest. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:17 +02:00
Samuel Ortiz	99450bd1f7	agent: protos: Add a Family field to the Route payload Our check for the IP family is working as long as we have either a gateway or a destination IP. Some routes are missing both. The RT netlink messages provide the IP family information for each route, so we can carry that piece of information up to the guest. That will allow for a more reliable route IP family determination. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:17 +02:00
Samuel Ortiz	f85fe70231	runtime: vendor: Bump the netlink package dependency We need to be able to get the IP family from the netlink route messages, and the Route.Family field only got recently added to the netlink package. The update generates static check warnings about the call for nethandler.Delete() being deprecated in favor of a Close() call instead. So we include the s/Delete()/Close()/ change as part of this PR. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2021-10-01 14:35:01 +02:00
James O. D. Hunt	2ce8d4263c	clh: Suppress hypervisor output to make guest output visible Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor debug is enabled. This is required since `Debug` level: - Is overkill for debugging hypervisor failures. - Effectively hides the output from the guest kernel and userland: CLH generates so much output that the output from the guest gets "lost in the noise" (experiments show that for each full CLH debug message, at most 1 _byte_ of guest output is displayed). Fixes: #2726. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-09-30 14:22:09 +01:00
bin	762922a521	runtime: delete func ConstraintsToVCPUs ConstraintsToVCPUs is not used any more. Fixes: #2741 Signed-off-by: bin <bin@hyper.sh>	2021-09-30 14:44:41 +08:00
bin	4f4854308a	runtime: delete virtcontainers-setup.sh This file is not used anymore. Fixes: #2741 Signed-off-by: bin <bin@hyper.sh>	2021-09-30 14:44:30 +08:00
Bin Liu	4ac7199282	Merge pull request #2494 from rapiz1/clean-up-code virtcontainers: clean up useless code	2021-09-29 22:56:13 +08:00
David Gibson	b57613f53e	Merge pull request #1682 from dgibson/rescan Remove forced PCI rescans from agent	2021-09-29 13:03:55 +10:00
Feng Wang	e5fe53f0a9	runtime: fix nil reference in cleanup rootless user It seems the client (crio) can send multiple requests to stop the Kata VM, resulting in a nil reference if the uid has already been cleaned up by a different thread. Fixes #2743 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-09-27 21:28:47 -07:00
Bin Liu	3217b03b17	Merge pull request #2522 from Bevisy/main-2515 virtcontainers: Fix incorrect scripts path	2021-09-27 21:14:40 +08:00
Bin Liu	39df808f6a	Merge pull request #2695 from YchauWang/wyc-vc-cgroup runtime: clear virtcontainers cgroup duplicated function	2021-09-27 21:12:39 +08:00
David Gibson	aad1a8734f	runtime/device: Give the agent information about VFIO devices We send information about several kinds of devices to the agent so that it can apply specific handling. We don't currently do this with VFIO devices. However we need to do that so that the agent can properly wait for VFIO devices to be ready (previously it did that using a PCI rescan which may not be reliable and has some very bad side effects). This patch collates and sends the relevant information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	ebd7b61884	runtime: Don't repeat GetDeviceByID between appendDevices() and append() Both appendBlockDevice and appendVhostUserBlkDevice start by using GetDeviceByID to lookup the api.Device object corresponding to their ContainerDevice object. However their common caller, appendDevices() has already done this. This changes it so the looked up api.Device is passed to the individual appendDevice() functions. This slightly reduces duplicated work, but more importantly it makes it clearer that append*Device() don't need to check for a nil result from GetDeviceByID, since the caller has already done that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	ad45c52fbe	runtime/device: Record guest PCI path for VFIO devices For several device types which correspond to a PCI device in the guest we record the device's PCI path in the guest. We don't currently do that for VFIO devices, but we're going to need to for better handling of SR-IOV devices. To accomplish this, we have to determine the guest PCI path from the information the VMM gives us: For qemu, we query the slot of the device and its bridge from QMP. For cloud-hypervisor, the device add interface gives us a guest PCI address. In fact this represents a design error in the clh API - there's no way it can really know the guest PCI address in general. It works in this case, because clh doesn't use PCI bridges, so the device will always be on the root bus. Based on that, the PCI path is simply the device's slot number. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	5c2af3e308	runtime/device: Refactor hotplugVFIODevice() to have common exit path hotplugVFIODevice() has several different paths depending if we're plugging into a root port or a PCIE<->PCI bridge and if we're using a regular or mediated VFIO device. We're going to want some common code on the successful exit path here, so refactor the function to allow that without duplication. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
David Gibson	cf36fd87ad	runtime: Fix some leftover go fmt errors A few "go fmt" errors appear to have crept in. Clean them up with "go fmt ./..." in the src/runtime directory. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-27 12:46:33 +10:00
zhanghj	57e3712dbd	virtiofs: fix error report in TestVirtiofsdStart when go test running Initialize ctx with context.Background() instead of nil value. Fixes: #2718 Signed-off-by: zhanghj <zhanghj.lc@inspur.com>	2021-09-24 16:06:06 +08:00
Fabiano Fidêncio	279f8e9d03	Merge pull request #2590 from c3d/issue/2589-virtiofsd-perms virtiofs: Create shared directory with 0700 mode, not 0750	2021-09-24 09:16:40 +02:00
Julio Montes	5d2a82fbf9	Merge pull request #2323 from dgibson/acpi-pcihp Replace SHPC with ACPI PCI hotplug for Kata guests	2021-09-23 09:55:31 -05:00
Fabiano Fidêncio	0ececc630f	Merge pull request #2666 from cmaf/tracing-newContainer-logger runtime: tracing: Fix logger passed in newContainer	2021-09-23 13:07:19 +02:00
Fabiano Fidêncio	e33c26ba18	Merge pull request #2622 from YchauWang/wyc-vc-api virtcontainers: update VC SandboxConfig API add SandboxBindMounts field	2021-09-23 13:05:33 +02:00
Fabiano Fidêncio	47170e302a	Merge pull request #2616 from Bevisy/main-2615 sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…	2021-09-23 13:04:18 +02:00
David Gibson	8bbcb06af5	qemu: Disable SHPC hotplug Under certain circumstances[0] Kata will attempt to use SHPC hotplug for PCI devices on the guest. In fact we explicitly enable SHPC on our PCI to PCI bridges, regardless of the qemu default. SHPC was designed a long, long time ago for physical hotplugging and works very poorly for a virtual environment. In particular it has a mandatory 5s delay to allow a (real, human) operator to back out the operation if they press a button by mistake. This alone makes it unusable for a fast start up application like Kata. Worse, the agent forces a PCI rescan during startup. That will race with the SHPC hotplug operation causing the device to go into a bad state where config space can't be accessed from the guest at all. The only reason we've sort of gotten away with this is that our default guest kernel configuration triggers what's arguably a kernel bug effectively disabling SHPC. That makes the agent rescan the only reason we see the new device. Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on the q35 machine, we can explicitly disable SHPC in all cases. It's nothing but trouble. fixes #2174 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-23 10:27:26 +10:00
David Gibson	cc4983eeac	runtime: Remove unused qemuArchBase.appendBridges definition qemuArchBase.appendBridges is never actually used, because the bare qemuArchBase type is itself never used (outside of unit tests). Instead all the subclasses of qemuArchBase override appendBridges() to call the very similar, but not identical genericAppendBridges. So, we can remove the qemuArchBase.appendBridges implementation. Furthermore, all those subclasses override appendBridges() in exactly the same way, and so we can remove those definitions and replace the base class qemuArchBase appendBridges() with that version, calling genericAppendBridges(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-23 10:15:08 +10:00
wangyongchao.bj	3b0c4bf9a0	runtime: clear virtcontainers cgroup duplicated function There are two functions, `DeviceToDeviceCgroup` and `deviceToDeviceCgroup`, that both create a `specs.LinuxDeviceCgroup` object. We remove the new function `deviceToDeviceCgroup`. Fixes: #2694 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-09-22 15:13:34 +08:00
Fabiano Fidêncio	2bee8bc6bd	Merge pull request #2432 from fengwang666/qemu-rootless runtime: run the QEMU VMM process with a non-root user	2021-09-21 21:37:02 +02:00
Feng Wang	9a6d56f1ab	runtime: fix empty cgroup path validation error An empty cgroup path shouldn't fail cgroup creation Fixes #2674 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-09-20 13:48:09 -07:00
Christophe de Dinechin	48fb1d9203	virtiofs: Create shared directory with 0700 mode, not 0750 A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a core assumption that the file systems being shared are not accessible by any non-privileged user. We currently create the `shared` directory in the sandbox with the default `0750` permissions, which gives read and directory traversal access to the group. There is no real good reason for a non-root user to access the shared directory, and this is potentially dangerous. Fixes: #2589 [1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/ Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2021-09-20 10:47:18 +02:00
Chelsea Mafrica	077b77c178	runtime: tracing: Fix logger passed in newContainer Change logger in Trace call in newContainer from sandbox.Logger() to nil. Passing nil will cause an error to be logged by kataTraceLogger instead of the sandbox logger, which will avoid having the log message report it as part of the sandbox subsystem when it is part of the container subsystem. The kataTraceLogger will not log it as related to the container subsystem, but since the container logger has not been created at this point, and we already use the kataTraceLogger in other instances where a subsystem's logger has not been created yet, this PR makes the call consistent with other code. Fixes #2665 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-09-17 11:41:04 -07:00
Feng Wang	1cfe59304d	runtime: Run QEMU using a non-root user/group A randomly generated user/group is used to start the QEMU VMM process. The /dev/kvm group owner is also added to the QEMU process to grant it access. Fixes #2444 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2021-09-17 11:28:44 -07:00
Hui Zhu	fff82b4ef5	Merge pull request #2628 from bergwolf/runtime-reorg runtime: refactor commandline code directory	2021-09-17 10:37:22 +08:00
Chelsea Mafrica	6159ef3499	Merge pull request #2626 from YchauWang/wyc-vc-api02 virtcontainers: update VC HypervisorConfig API add three lost fields	2021-09-16 16:46:27 -07:00
Peng Tao	067c44d0b6	runtime: fix UT build failure storeContainer has been removed. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-09-16 19:42:02 +08:00
Samuel Ortiz	7bf96d2457	Merge pull request #2604 from Amulyam24/container_tests virtcontainers: add unit tests for container.go	2021-09-16 11:02:16 +02:00
Bo Chen	d00decc97d	runtime: clh: Enable hugepages support This patch adds the configuration option that allows to use hugepages with Cloud Hypervisor guests. Fixes: #2648 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-09-15 10:43:57 -07:00
David Gibson	64bb803fcf	runtime/qemu: Move from query-cpus to query-cpus-fast We recently updated to using qemu-6.1 (from qemu 5.2). Unfortunately one breaking change in qemu 6.0 wasn't caught by the CI. The query-cpus QMP command has been removed, replaced by query-cpus-fast (which has been available since qemu 2.12). govmm already had support for query-cpus-fast, we just weren't using it, so the change is quite easy. fixes #2643 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-15 16:41:26 +10:00
Samuel Ortiz	9bed2ade0f	virtcontainers: Convert to the new cgroups package API The new API is based on containerd's cgroups package. With that conversion we can simplify the virtcontainers sandbox code and also uniformize our cgroups external API dependency. We now only depend on containerd/cgroups for everything cgroups related. Depends-on: github.com/kata-containers/tests#3805 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-09-14 07:09:34 +02:00
Samuel Ortiz	b42ed39349	virtcontainers: cgroups: Add a containerd API based cgroups package Eventually, we will convert the virtcontainers and the whole Kata runtime code base to only rely on that package. This will make Kata depend only on the simpler containerd cgroups API. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:09:34 +02:00
Samuel Ortiz	f17752b0dc	virtcontainers: container: Do not create and manage container host cgroups The only process we are adding there is the container host one, and there is no such thing anymore. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:09:33 +02:00
Samuel Ortiz	dc7e9bce73	virtcontainers: sandbox: Host cgroups partitioning This is a simplification of the host cgroup handling by partitioning the host cgroups into 2: A sandbox cgroup and an overhead cgroup. The sandbox cgroup is always created and initialized. The overhead cgroup is only available when sandbox_cgroup_only is unset, and is unconstrained on all controllers. The goal of having an overhead cgroup is to be more flexible on how we manage a pod overhead. Having such cgroup will allow for setting a fixed overhead per pod, for a subset of controllers, while at the same time not having the pod being accounted for those resources. When sandbox_cgroup_only is not set, we move all non vCPU threads to the overhead cgroup and let them run unconstrained. When it is set, all pod related processes and threads will run in the sandbox cgroup. Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:09:29 +02:00
Samuel Ortiz	f811026c77	virtcontainers: Unconditionally create the sandbox cgroup manager Regardless of the sandbox_cgroup_only setting, we create the sandbox cgroup manager and set the sandbox cgroup path at the same time. Without doing this, the hypervisor constraint routine is mostly a NOP as the sandbox state cgroup path is not initialized. Fixes #2184 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-09-14 07:05:57 +02:00
wangyongchao.bj	a6066404f7	virtcontainers: update VC HypervisorConfig API add three lost fields Sync the virtcontainers api.md document, add `ConfidentialGuest` `EntropySourceList` `GuestSwap` three fields to the HypervisorConfig API. Fixes #2625 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-09-14 10:42:54 +08:00
wangyongchao.bj	bb18cd475c	virtcontainers: update VC SandboxConfig API add SandboxBindMounts field sync the virtcontainers api.md document, add SandboxBindMounts field to the SandboxConfig API. And update the order of the SandboxConfig API fields. Fixes #2621 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-09-14 09:56:47 +08:00
Eric Ernst	967db0cbcc	Merge pull request #2544 from likebreath/0831/upgrade_clh_v18.0 versions: Upgrade to Cloud Hypervisor v18.0	2021-09-13 11:27:45 -07:00
Binbin Zhang	58e77a3c13	sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom If the device has no permission, such as /dev/null, /dev/urandom, it needs to be added into cgroup. Fixes: #2615 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2021-09-13 20:47:16 +08:00
Samuel Ortiz	75ef8c243a	Merge pull request #2603 from Bevisy/main-2539 sandbox: Add device permissions such as /dev/null to cgroup	2021-09-13 11:04:51 +02:00
Anastassios Nanos	62baa48ef5	virtcontainers: fc: parse vcpuID correctly In getThreadIDs(), the cpuID variable is derived from a string that already contains a whitespace. As a result, strings.SplitAfter returns the cpuID with a leading space. This makes any go variant of string to int fail (strconv.ParseInt() in our case). This patch makes sure that the leading space character is removed so the string passed to strconv.ParseInt() is "CPUID" and not " CPUID". This has been caused by a change in the naming scheme of vcpu threads for Firecracker after v0.19.1. Fixes: #2592 Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2021-09-10 09:39:56 +00:00
Bo Chen	f785ff0bf2	virtcontainers: clh: Revert the workaround incorrect default values Given the fix to the bugs of the openapi spec file is included in the Cloud Hypervisor v18.0 [1], this patch reverts the workaround we carried in the CLH driver. This reverts commit `932ee41b3f`. [1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3029 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-09-09 14:52:53 -07:00
Bo Chen	0e0e59dc5f	virtcontainers: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v18.0. Note: The client code of cloud-hypervisor's (CLH) OpenAPI is automatically generated by openapi-generator [1-2]. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-09-09 14:51:55 -07:00
Amulyam24	d865c80986	virtcontainers: add unit tests for container.go Fixes: #268 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2021-09-09 13:09:38 +05:30
Binbin Zhang	71f915c63f	sandbox: Add device permissions such as /dev/null to cgroup adds the default devices for unix such as /dev/null, /dev/urandom to the container's resource cgroup spec Fixes: #2539 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2021-09-09 15:33:24 +08:00
bin	2abc450a4d	test: enable running tests under root user Add tests that run under root user to test special cases. Fixes: #2446 Signed-off-by: bin <bin@hyper.sh>	2021-09-09 14:21:34 +08:00
Bin Liu	103fdd3f6c	Merge pull request #2564 from Bevisy/main-2296 virtcontainers: Remove NewStoreFeature	2021-09-03 10:41:21 +08:00
James O. D. Hunt	f3a1bf3b45	Merge pull request #2552 from bergwolf/license license: drop redundent license files	2021-09-02 14:31:18 +01:00
Binbin Zhang	e2a9e78c9e	virtcontainers: Remove NewStoreFeature remove NewStoreFeature Fixes: #2296 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2021-09-02 21:28:36 +08:00
Peng Tao	256c3b2747	license: drop redundent license files There is no need to keep multiple copies of the license file in different directories. We can just use the top level one for the project. Fixes: #2553 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-09-01 15:10:04 +08:00
Hui Zhu	bcc9fa3b35	hotplugAddBlockDevice: Use ExecuteBlockdevAddWithDriverCache with swap Use ExecuteBlockdevAddWithDriverCache with swap in hotplugAddBlockDevice to handle the issue that a swap file does not work with ExecuteBlockdevAddWithCache. Fixes: #2548 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-09-01 14:13:11 +08:00
Peng Tao	c0daa4ebff	Merge pull request #2513 from cmaf/tracing-tracingtags-consistency tracing: Change runtime tracing tags to vars	2021-08-31 10:25:10 +08:00
Fabiano Fidêncio	67d1f4fd14	Merge pull request #2528 from snir911/main_debuggabillity_sq shimv2: add logging to shimv2 api calls	2021-08-30 15:50:55 +02:00
Peng Tao	a9de761d71	runtime: drop qemu-lite support As the project is not maintained and we have not been testing against it for a long time. Fixes: #2529 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-08-30 16:58:12 +08:00
Snir Sheriber	0c7789fad6	runtime: Add container field to logs and unified field naming Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2021-08-30 10:09:05 +03:00
Bo Chen	b564dd47b6	Merge pull request #2526 from Bevisy/main-2285 runtime: delete types or const that no longer needed	2021-08-29 15:35:03 -07:00
Bin Liu	a89cc0bb5c	Merge pull request #2524 from Bevisy/main-2264 runtime: Optimize the way slice created	2021-08-29 16:00:08 +08:00
Eric Ernst	8771d8c375	Merge pull request #2514 from rapiz1/improve-util-test virtcontainers: simplify tests	2021-08-28 06:41:15 -07:00
Yujia Qiao	a99fcc3af1	virtcontainers: simplify tests Simplify tests in utils_test.go by table-driven tests. Fixes: #2281 Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-08-28 12:35:25 +08:00
Binbin Zhang	39ffd8ee84	runtime: delete types or const that no longer needed type: ProcessListOptions; ProcessList const: SocketTypeVSOCK Fixes: #2285 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2021-08-28 04:09:25 +00:00
Binbin Zhang	ff37f5c798	runtime: Optimize the way slice created Initialize and assign a value, reducing one append operation Fixes: #2264 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2021-08-28 04:15:59 +08:00
Carlos Venegas	fb583780f6	Merge pull request #2488 from likebreath/0823/clh_openapi_generator virtcontainers: clh: Upgrade to the openapi-generator v5.2.1	2021-08-27 14:28:09 -05:00
Binbin Zhang	4751698829	virtcontainers: Fix incorrect scripts path modify to the correct relative path Fixes: #2515 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2021-08-27 19:16:53 +00:00
Chelsea Mafrica	8f0f949abf	tracing: Move dynamically added attributes to Trace() Where possible, move attributes added with AddTag() to Trace() call to reduce the amount of code used for tracing. Fixes #2512 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-08-27 08:26:40 -07:00
Bo Chen	932ee41b3f	virtcontainers: clh: Workaround incorrect default values Two default values defined in 'cloud-hypervisor.yaml' contain typos; this patch manually overwrites them with the correct values as a workaround until the corresponding fix lands in Cloud Hypervisor upstream. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-26 22:53:31 -07:00
Bo Chen	bff38e4f4d	virtcontainers: clh: Fix the unit test This patch fixes the unit tests over clh.go with the updated client code. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-26 22:53:17 -07:00
Bo Chen	d967d3cb37	virtcontainers: clh: Use constructors to ensure proper default value With the updated openapi-generator, the client code now handles optional attributes correctly and assigns the right default values. This patch uses those constructors to make sure the proper default values are used. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-26 22:53:13 -07:00
Chelsea Mafrica	8058e97212	tracing: Change runtime tracing tags to vars Tracing tags are stored inconsistently throughout the runtime. Change all instances of tracing tags to variables. Fixes #2512 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-08-26 15:55:32 -07:00
Bo Chen	a6a2e525de	virtcontainers: clh: Migrate to use the updated client APIs The client code (and APIs) for Cloud Hypervisor has changed dramatically due to the upgrade to `openapi-generator` v5.2.1. This patch migrates the Cloud Hypervisor driver in the kata-runtime to use those updated APIs. The main change in the client code is that it now uses "pointer" types to represent "optional" attributes from the input openapi specification file. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-26 14:04:18 -07:00
Yujia Qiao	814cea9601	virtcontainers: clean up useless code Fixes: #2275 Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-08-24 16:04:34 +08:00
Bo Chen	46eb07e14f	virtcontainers: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor with the updated `openapi-generator` v5.2.1. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-23 16:00:32 -07:00
Bo Chen	80fba4d637	virtcontainers: clh: Upgrade to the openapi-generator v5.2.1 To improve the quality and correctness of the auto-generated code, this patch upgrades the `openapi-generator` to its latest stable release v5.2.1. Fixes: #2487 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-23 15:59:41 -07:00
Bl1tz23	87bbae1bd7	fc: fix version parsing for fc >= 0.25 Allows to use firecracker version >=0.25. Fixes: #2471 Signed-off-by: Bl1tz23 <alex3angle@gmail.com>	2021-08-23 15:09:59 +03:00
wangyongchao.bj	99ab91df3d	docs: update the docs project url from kata 1.x to 2.x changed the document project url in the using-vpp-and-kata.md and runtime experimental README.md files. Fixes: #2418 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2021-08-10 13:51:54 +08:00
Anastassios Nanos	64dd35ba4f	virtcontainers: fc: properly remove jailed block device When running a firecracker instance jailed, block devices are not removed correctly, as the jailerRoot path is not stripped from the PATCH command sent to the FC API. This patch differentiates the jailed case from the non-jailed one and allows the firecracker instance to be properly terminated. Fixes #2387 Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2021-08-04 16:31:56 +00:00
David Gibson	3165095669	runtime/qemu: Use explicit "on" for kernel_irqchip parameter Kata uses the 'kernel_irqchip' machine option to qemu. By default it uses it in what qemu calls the "short-form boolean" with no parameter. That style was deprecated by qemu between 5.2 and 6.0 (commit ccd3b3b8112b) and effectively removed entirely between 6.0 and 6.1 (commit d8fb7d0969d5). Update ourselves for newer qemus by using an explicit "kernel_irqchip=on". Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-08-04 14:34:11 +10:00
Carlos Venegas	27b9a68189	Merge pull request #2365 from sameo/topic/clh-tracing virtcontainers: clh: Do not use the default HTTP client	2021-08-03 12:54:09 -05:00
Hui Zhu	e6408fe670	Container: Add initConfigResourcesMemory and call it in newContainer The swappiness is not applied correctly if only io.katacontainers.container.resource.swappiness is set: $ pod_yaml=pod.yaml $ container_yaml=container.yaml $ image="quay.io/prometheus/busybox:latest" $ cat << EOF > "${pod_yaml}" metadata: name: busybox-sandbox1 EOF $ cat << EOF > "${container_yaml}" metadata: name: busybox-killed-vmm annotations: io.katacontainers.container.resource.swappiness: "100" image: image: "$image" command: - top EOF $ sudo crictl pull $image $ podid=$(sudo crictl runp $pod_yaml) $ cid=$(sudo crictl create $podid $container_yaml $pod_yaml) $ sudo crictl start $cid crictl exec $cid cat /sys/fs/cgroup/memory/memory.swappiness 60 The cause of this issue is that two elements store the resource information: c.config.Resources for calculateSandboxMemory and c.GetPatchedOCISpec() for the agent. This adds initConfigResourcesMemory to Container and calls it in newContainer to handle the issue. Fixes: #2372 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-08-02 16:02:12 +08:00
Fupan Li	fdc42ca7ff	Merge pull request #2324 from jongwu/ro_nv qemu/arm: remove nvdimm/"ReadOnly" option on arm64	2021-08-02 14:14:06 +08:00
Hui Zhu	ee90affc18	newContainer: Initialize c.config.Resources.Memory if it is nil Container start fails if io.katacontainers.container.resource.swap_in_bytes and memory_limit_in_bytes are not set. $ pod_yaml=pod.yaml $ container_yaml=container.yaml $ image="quay.io/prometheus/busybox:latest" $ cat << EOF > "${pod_yaml}" metadata: name: busybox-sandbox1 EOF $ cat << EOF > "${container_yaml}" metadata: name: busybox-killed-vmm annotations: io.katacontainers.container.resource.swappiness: "60" image: image: "$image" command: - top EOF $ sudo crictl pull $image $ podid=$(sudo crictl runp $pod_yaml) $ cid=$(sudo crictl create $podid $container_yaml $pod_yaml) $ sudo crictl start $cid DEBU[0000] get runtime connection DEBU[0000] connect using endpoint 'unix:///var/run/containerd/containerd.sock' with '10s' timeout DEBU[0000] connected successfully using endpoint: unix:///var/run/containerd/containerd.sock DEBU[0000] StartContainerRequest: &StartContainerRequest{ContainerId:4fea91d16f661931fe33acd247efe831ef9e571588ba18b5a16f04c278fd61b8,} DEBU[0000] StartContainerResponse: nil FATA[0000] starting the container "4fea91d16f661931fe33acd247efe831ef9e571588ba18b5a16f04c278fd61b8": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: ttrpc: closed: unknown The failure occurs when c.config.Resources.Memory is nil and the values of io.katacontainers.container.resource.swappiness and io.katacontainers.container.resource.swap_in_bytes are stored in newContainer. This commit initializes c.config.Resources.Memory in newContainer if it is nil. Fixes: #2367 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-08-01 10:03:27 +08:00
Hui Zhu	767a41ce56	updateResources: Log result after calculateSandboxMemory Log result after calculateSandboxMemory in updateResources. Fixes: #2367 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-08-01 09:57:44 +08:00
Samuel Ortiz	760ec4e58a	virtcontainers: clh: Do not use the default HTTP client When enabling tracing with Cloud Hypervisor, we end up establishing 2 connections to 2 different HTTP servers: The Cloud Hypervisor API one that runs over a UNIX socket and the Jaeger endpoint running over UDP. Both connections use the default HTTP golang client instance, and thus share the same transport layer. As the Cloud Hypervisor implementation sets it up to be over a Unix socket, the jaeger uploader ends up going through that transport as well, and sending its spans to the Cloud Hypervisor API server. We fix that by giving the Cloud Hypervisor implementation its own HTTP client instance and we avoid sharing it with anything else in the shim. Fixes #2364 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>	2021-07-30 16:51:01 +02:00
James O. D. Hunt	4f0726bc49	docs: Remove table of contents Removed all TOCs now that GitHub auto-generates them. Also updated the documentation requirements doc removing the requirement to add a TOC. Fixes: #2022. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-07-30 10:58:22 +01:00
Bo Chen	cc0bb9aebc	versions: Upgrade to Cloud Hypervisor v17.0 Highlights from the Cloud Hypervisor release v17.0: 1) ARM64 NUMA support using ACPI; 2) `Seccomp` support for MSHV backend; 3) Hotplug of macvtap devices; 4) Improved SGX support; 5) Inflight tracking for `vhost-user` devices; 6) Bug fixes. Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v17.0 Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by `openapi-generator` [1-2]. As the API changes do not impact usages in Kata, no additional changes in kata's runtime are needed to work with the current version of cloud-hypervisor. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Fixes: #2333 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-07-27 11:56:29 -07:00
Jianyong Wu	77604de80b	qemu/arm: remove nvdimm/"ReadOnly" option on arm64 A new "ReadOnly" option was added to the nvdimm device in qemu and has now been added to kata. However, the qemu used for arm64 is a little old and does not have this feature, so we remove it for arm. Fixes: #2320 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-07-27 20:32:55 +08:00
Gabriela Cervantes	4fbae549e4	docs: Update experimental documentation This PR updates the experimental documentation with the proper reference to kata 2.x Fixes #2317 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2021-07-26 20:29:21 +00:00
Fabiano Fidêncio	116c29c897	cgroups: manager's Set() now takes Resources as its parameter Prior to our bump to runc 1.0.1, the manager's Set() would take a Config as its parameter. Now it takes the Resources directly. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-07-26 11:34:27 +02:00
Fabiano Fidêncio	c0f801c0c4	rootless: RunningInUserNS() is now part of userns namespace Previously part of the "system" namespace, the RunningInUserNS() has been moved to the "userns" namespace. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-07-26 11:34:23 +02:00
Julio Montes	2859600a6f	runtime: virtcontainers: make rootfs image read-only Improve security by making rootfs image read-only, nobody will be able to modify it from the guest. fixes #1916 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-23 13:20:42 -05:00
Julio Montes	aec530904b	runtime: virtcontainers/utils: fix govet fieldalignment Fix structures alignment Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Julio Montes	1e4f7faa77	runtime: virtcontainers/types: fix govet fieldalignment Fix structures alignment Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Julio Montes	bb9495c0b7	runtime: virtcontainers/pkg: fix govet fieldalignment Fix structures alignment Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Julio Montes	80ab91ac2f	runtime: virtcontainers/persist: fix govet fieldalignment Fix structures alignment Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Julio Montes	54bdd01811	runtime: virtcontainers/factory: fix govet fieldalignment Fix structures alignment Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Julio Montes	dd58de368d	runtime: virtcontainers/device: fix govet fieldalignment Fix structures alignment Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Julio Montes	47d95dc1c6	runtime: virtcontainers: fix govet fieldalignment Fix structures alignment fixes #2271 Depends-on: github.com/kata-containers/tests#3727 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Hui Zhu	cb6b7667cd	runtime: Add option "enable_guest_swap" to config hypervisor.qemu This commit adds the option "enable_guest_swap" to config hypervisor.qemu. It will enable swap in the guest. Default false. When enable_guest_swap is enabled, insert a raw file to the guest as the swap device if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness") is bigger than 0. The size of the swap device should be swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes. If swap_in_bytes is not set, the size should be memory_limit_in_bytes. If swap_in_bytes and memory_limit_in_bytes are not set, the size should be default_memory. Fixes: #2201 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-07-19 23:22:06 +08:00
Hui Zhu	a733f537e5	runtime: newContainer: Handle the annotations of SWAP This commit adds code to handle the annotations "io.katacontainers.container.resource.swappiness" and "io.katacontainers.container.resource.swap_in_bytes". It will set the value of "io.katacontainers.resource.swappiness" to c.config.Resources.Memory.Swappiness and set the value of "io.katacontainers.resource.swap_in_bytes" to c.config.Resources.Memory.Swap. Fixes: #2201 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-07-19 23:20:46 +08:00
Hui Zhu	2c835b60ed	ContainerConfig: Set ocispec.Annotations to containerConfig.Annotations ocispec.Annotations is dropped in ContainerConfig. This commit sets it to containerConfig.Annotations in ContainerConfig. Fixes: #2201 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-07-19 23:20:43 +08:00
Hui Zhu	243d4b8689	runtime: Sandbox: Add addSwap and removeSwap addSwap will create a swap file, hotplug it to the hypervisor as a special block device and let the agent set it up in the guest kernel. removeSwap will remove the swap file. Only QEMU supports addSwap. Fixes: #2201 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-07-19 23:20:40 +08:00
Hui Zhu	e1b91986d7	runtime: Update golang proto code for AddSwap Fixes: #2201 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-07-19 23:20:37 +08:00
Fabiano Fidêncio	11d84cca46	Merge pull request #2229 from lifupan/fix_virtiofsd virtiofsd: fix the issue of missing stop virtiofsd	2021-07-19 13:34:59 +02:00
Fabiano Fidêncio	3a9ecbcca5	Merge pull request #2231 from liubin/fix/2230-register-defer-callback-at-early-stage runtime: Register defer function at early stage	2021-07-14 17:50:48 +02:00
fupan.lfp	34828df9a1	virtiofsd: fix the issue of missing stop virtiofsd The virtiofsd's PID wasn't assigned the right pid, which resulted in the kill being skipped. Fixes: #2228 Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>	2021-07-14 21:07:10 +08:00
bin	39546a1070	runtime: delete not used functions Delete some not used functions in sandbox.go Fixes: #2230 Signed-off-by: bin <bin@hyper.sh>	2021-07-14 19:42:50 +08:00
bin	d0bc148fe0	runtime: Register defer function at early stage Registering the defer function at an early stage ensures that it can be called if startSandbox fails. Fixes: #2230 Signed-off-by: bin <bin@hyper.sh>	2021-07-14 17:20:53 +08:00
bin	350acb2d6e	virtcontainers: refactoring code for error handling in sandbox Use a defined error variable to replace in-place errors, and add a shortcut for handling errors returned from function calls. Fixes: #2187 Signed-off-by: bin <bin@hyper.sh>	2021-07-14 14:28:58 +08:00
bin	858f39ef75	virtcontainers: update wrong comments for code Some comments/URL are old or wrong, update them to the correct ones. Fixes: #2187 Signed-off-by: bin <bin@hyper.sh>	2021-07-14 14:28:57 +08:00
bin	e0a19f6a16	virtcontainers: update API documentation Some functions now take context as their first parameter; the documentation should be updated accordingly. Fixes: #2187 Signed-off-by: bin <bin@hyper.sh>	2021-07-14 14:28:57 +08:00
Eric Ernst	feeb1ef8b1	Merge pull request #2212 from lifupan/fix_virtiofsd qemu: stop the virtiofsd specifically	2021-07-12 13:56:04 -07:00
Chelsea Mafrica	61b1a6732b	Merge pull request #2179 from bporter816/bporter816/refactor-tracing tracing: Consolidate tracing into a new katatrace package	2021-07-12 12:42:01 -04:00
bin	9081bee2fd	runtime: return error if clh's binary has not a normal stat When checking whether clh's binary path is valid, return an error even if the error is not an IsNotExist error. Also add errors to the log field when errors occur. Fixes: #2208 Signed-off-by: bin <bin@hyper.sh>	2021-07-12 11:16:35 +08:00
Benjamin Porter	b10e3e22b5	tracing: Consolidate tracing into a new katatrace package Removes custom trace functions defined across the repo and creates a single trace function in a new katatrace package. Also moves span tag management into this package and provides a function to dynamically add a tag at runtime, such as a container id, etc. Fixes #1162 Signed-off-by: Benjamin Porter <bporter816@gmail.com>	2021-07-11 14:19:51 -05:00
fupan.lfp	8f76626fd6	qemu: stop the virtiofsd specifically We'd better stop the virtiofsd specifically after stop qemu, instead of depending on the qemu's termination to notify virtiofsd to exit. Fixes: #2211 Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>	2021-07-10 17:26:19 +08:00
Fabiano Fidêncio	305fb0547d	virtcontainers: Fix `gosimple` issue on client.go For some reason our static check started to get opinionated about code that's been there for ages. One of the suggestions is to improve: ``` INFO: Running golangci-lint on /home/fidencio/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/agent/protocols/client client.go:431:2: S1017: should replace this `if` statement with an unconditional `strings.TrimPrefix` (gosimple) if strings.HasPrefix(sock, "mock:") { ``` And that's what this PR is about. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-07-09 17:18:08 +02:00
Fabiano Fidêncio	89cf168c92	virtcontainers: Ignore a staticcheck error on cpuset.go First of all, cpuset.go just comes from kubernetes and we shouldn't be doing much with this file apart from updating it every now and then (but that's material for another PR). Right now, due to some change on the static checks we use as part of our CI, we started getting issues as: ``` INFO: Running golangci-lint on /home/fidencio/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/cpuset cpuset.go:60:2: SA4005: ineffective assignment to field Builder.done (staticcheck) b.done = true ``` For those, let's just ignore the lint and move on. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-07-09 17:17:12 +02:00
Jakob Naucke	cfd690b638	virtcontainers: Use virtio-blk-ccw on s390x if virtio-blk-pci were to be used Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-07-08 14:59:47 +02:00
Bo Chen	d08603bebb	runtime: Remove the version check for cloud hypervisor It looks like the version check for cloud hypervisor (clh) was added initially when clh was actively evolving its API. We no longer need the version check as clh API has been fairly stable for its recent releases. Fixes: #1991 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-07-06 18:42:59 -07:00
Tim Zhang	3f1aa8ff91	Merge pull request #2084 from liubin/fix/2082-refactor-vc-pkg-oci runtime: refact virtcontainers/pkg/oci	2021-07-06 19:14:10 +08:00
Fupan Li	2de9c5b41d	Merge pull request #1969 from liubin/feature/1968-pass-span-context-to-agent Pass span context from runtime to agent to get a full trace #1968	2021-07-03 09:31:02 +08:00
bin	bd5951247c	runtime: add spans and attributes for agent/mount Add more spans and attributes for agent setup, add devices, and mount volumes. Fixes: #1968 Signed-off-by: bin <bin@hyper.sh>	2021-07-02 10:07:28 +08:00
bin	ae46e7bf97	runtime: pass span context to agent in ttRPC client Pass span context through ttRPC metadata, that agent can get parent from the context to create new sub-spans. Fixes: #1968 Signed-off-by: bin <bin@hyper.sh>	2021-07-02 10:07:14 +08:00
bin	66dd8719e3	runtime: refact virtcontainers/pkg/oci Use common functions wrapping logic of getting values from annotations, parsing bool/uint32/uint64 and setting to struct fields. Fixes: #2082 Signed-off-by: bin <bin@hyper.sh>	2021-07-01 10:14:47 +08:00
fupan.lfp	d671f78952	agent: fix the issue of convert OCI spec to RPC spec Since the rpc spec used an interface to represent the ErrnoRet, the transform function of OCItoGRPC should take care of this case. Depends-on: github.com/kata-containers/tests#3629 Fixes: #1441 Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>	2021-06-30 22:56:59 +08:00
fupan.lfp	f607641a6e	shimv2: fix the issue bring by updating containerd vendor Fix the mismatch brought by upgrading the vendored containerd, cgroup and runtime spec. Fixes: #1441 Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>	2021-06-30 22:56:51 +08:00
Eric Ernst	d0ad388721	Merge pull request #2065 from ManaSugi/format-golang-proto runtime: Format golang proto code	2021-06-30 11:08:57 -07:00
Fabiano Fidêncio	a8bb8269fe	Merge pull request #2047 from Jakob-Naucke/s390x-skip-hotplug virtcontainers: Don't fail memory hotplug	2021-06-28 08:18:31 +02:00
Eric Ernst	064dfb164b	runtime: Add "watchable-mounts" concept for inotify support To workaround virtiofs' lack of inotify support, we'll special case particular mounts which are typically watched, and pass on information to the agent so it can ensure that the mount presented to the container is indeed watchable (see applicable agent commit). This commit will: - identify watchable mounts based on file count and mount source - create a watchable-bind storage object for these mounts to communicate intent to the agent - update the OCI spec to take the updated watchable mount source into account Unit tests added and updated for the newly introduced functionality/functions. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-06-24 10:07:06 -07:00
Eric Ernst	57c0cee0a5	runtime: Cleanup mountSharedDirMounts, shareFile parameters There's no reason to pass the paths; they can be determined when they are actually used. Let's make the return values more comparable to the other mount handling functions (we'll add storage object in future commit), and pass the mount maps as function parameters. ...No functional changes here... Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-06-24 10:07:06 -07:00
Jakob Naucke	8310a3d70a	virtcontainers: Don't fail memory hotplug Architectures that do not support memory hotplugging will fail when memory limits are set because that amount is hotplugged. Issue a warning instead. The long-term solution is virtio-mem. Fixes: #1412 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-06-24 10:58:06 +02:00
David Gibson	c0cc6d5978	Merge pull request #1954 from marcel-apf/remove-pc Remove the pc machine	2021-06-23 12:00:05 +10:00
Marcel Apfelbaum	ac6b9c53d2	runtime: Hot-plug virtio-mem device on PCI bridge Currently the virtio-mem device is hotplugged on the root bus. This doesn't work for PCIe machines like q35. Hotplug the virtio-mem device into the pci bridge instead. Fixes #1953 Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>	2021-06-22 12:34:48 +03:00
Marcel Apfelbaum	789a59549e	virtcontainers: Remove the pc machine Keeping around two different x86 machines has no added value and requires more tests and maintenance. Prefer the q35 machine since it has more features and drop the pc machine. Fixes #1953 Depends-on: github.com/kata-containers/tests#3586 Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>	2021-06-22 11:54:07 +03:00
Manabu Sugimoto	caf5760c45	runtime: Update golang proto code We should update golang proto files. These changes are updated using libprotoc v3.6.1. Fixes: #2064 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-06-19 18:53:56 +09:00
Julio Montes	ecdd137c6f	runtime: do not hot-remove PMEM devices PMEM devices cannot be hot-removed from a running VM. fixes #2018 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-06-18 09:02:03 -05:00
Tim Zhang	90029032b4	Merge pull request #2049 from liubin/2048/fix-log-field runtime: using detail propertites instead of function name in log field	2021-06-17 10:53:12 +08:00
bin	2022c64f94	runtime: using detail propertites instead of function name in log field To print the correct value of kernel parameters, the log field value should not be a function name. And for that qemuArchBase doesn't contain debug flag, so the log contains debug/non-debug parameters. Fixes: #2048 Signed-off-by: bin <bin@hyper.sh>	2021-06-17 00:17:16 +08:00
Julio Montes	361bee91f7	runtime/virtcontrainers: fix alignment structures fix alignment of qemuArchBase and HypervisorConfig structures Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-06-16 07:16:49 -05:00
Julio Montes	7834f4127f	virtcontainers: change memory_offset to uint64 `memory_offset` is used to increase the maximum amount of memory supported in a VM, this offset is equal to the NVDIMM/PMEM device that is hot added, in real use case workloads such devices are bigger than 4G, which is the current limit (uint32). fixes #2006 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-06-16 07:16:49 -05:00
Amulya Meka	ea9bb8e9ad	ppc64le: Adding test for appendProtectionDevice Fixes: #2038 Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>	2021-06-15 10:23:38 +00:00
Fabiano Fidêncio	5a71786986	Merge pull request #1674 from jimcadden/stable-2.0-SEV Support SEV	2021-06-12 16:56:51 +02:00
Fabiano Fidêncio	be31694554	virtcontainers: Fix TestQemuAmd64AppendProtectionDevice() Since SEV support has been added, an implementation mistake was also added to TestQemuAmd64AppendProtectionDevice. appendProtectionDevice() will, as its name says, append the protection device to whatever was there previously. So, when SEV was added, we broke the comparison done for TDX as we didn't append the expected output for TDX to what we already had for SEV. This should be enough to get the tests passing. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>	2021-06-12 08:56:15 -04:00
GabyCT	66e4c77a54	Merge pull request #1993 from likebreath/0610/clh_v16.0 versions: Upgrade to cloud-hypervisor v16.0	2021-06-11 15:11:11 -05:00
Fabiano Fidêncio	24bbcf58d3	Merge pull request #1981 from LiangZhou-CTY/patch-1 runtime: remove the call to storeSandbox at the end of createSandboxFromConfig	2021-06-11 00:30:39 +02:00
Fabiano Fidêncio	8239f6fc17	Merge pull request #1772 from Jakob-Naucke/sec-exec virtcontainers: Add support for Secure Execution	2021-06-11 00:02:01 +02:00
Bo Chen	85c40001da	versions: Upgrade to cloud-hypervisor v16.0 Highlights from the Cloud Hypervisor release v16.0: 1) Improved live migration support; 2) Improved `vhost-user` support; 3) ARM64 ACPI and UEFI support; 4) Bug fixes. Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v16.0 Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by `openapi-generator` [1-2]. As the API changes do not impact usages in Kata, no additional changes in kata's runtime are needed to work with the current version of cloud-hypervisor. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Fixes: #1992 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-10 10:16:39 -07:00
Liang Zhou	3130e66d33	runtime: remove storeSandbox at the end of createSandboxFromConfig Remove storeSandbox() at the end of createSandboxFromConfig(), because the call chain createSandboxFromConfig -> createContainers already calls storeSandbox(). This can improve the startup speed of the container, even if just a little. Fixes: #1980 Signed-off-by: Liang Zhou <zhoul110@chinatelecom.cn>	2021-06-10 11:56:40 +08:00
Tim Zhang	f26837a0f1	Merge pull request #1967 from liubin/fix/1956-add-more-traces-for-network runtime: add more traces for network	2021-06-10 10:56:42 +08:00
Fabiano Fidêncio	51ac042cad	Merge pull request #939 from keloyang/detach factory: Use lazy unmount	2021-06-07 13:26:16 +02:00
Jakob Naucke	c0c05c73e1	virtcontainers: Add support for Secure Execution Secure Execution is a confidential computing technology on s390x (IBM Z & LinuxONE). Enable the correspondent virtualization technology in QEMU (where it is referred to as "Protected Virtualization"). - Introduce enableProtection and appendProtectionDevice functions for QEMU s390x. - Introduce CheckCmdline to check for "prot_virt=1" being present on the kernel command line. - Introduce CPUFacilities and avilableGuestProtection for hypervisor s390x to check for CPU support. Fixes: #1771 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-06-07 10:50:33 +02:00
Jakob Naucke	78f21710e3	virtcontainers/s390x: Put consts into one block Previously, all consts were in single lines in virtcontainers/qemu_s390x.go. Put them into a const block. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-06-07 10:50:30 +02:00
bin	784025bb08	runtime: add more traces for network Add traces for all the endpoint types and the main interface functions. Record errors for some traces. Fixes: #1956 Signed-off-by: bin <bin@hyper.sh>	2021-06-07 11:38:40 +08:00
Chelsea Mafrica	3d0e0b2786	tracing: Add network model to span Trace spans erroneously set the network model to default in all cases. Add function to return network model string and use it to set attribute in spans. Fixes #1878 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-06-02 21:53:54 -07:00
Chelsea Mafrica	8ca0207281	tracing: Add sandbox and container ID to trace spans Add sandbox, container, and hypervisor IDs to trace spans. Note that some spans in sandbox.go are created with a trace() call from api.go. These spans have additional attributes set after span creation to overwrite the api attributes. Fixes #1878 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-06-02 21:53:54 -07:00
Bin Liu	1673110ee9	Merge pull request #1930 from jcvenegas/kata-moinitor-export-virtiofsd metrics: Add virtiofsd exporter	2021-06-03 10:38:55 +08:00
Sandeep Gupta	b26d5b1d08	virtcontainers: Support SEV fixes #1869 Signed-off-by: Jim Cadden <jcadden@ibm.com>	2021-06-02 14:32:50 -04:00
Carlos Venegas	2234b73090	metrics: Add virtiofsd exporter Export proc stats for virtiofsd. This commit only adds for hypervisors that have support for it. - qemu - cloud-hypervisor Fixes: #1926 Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>	2021-06-02 16:06:00 +00:00
Tim Zhang	476ec9bd86	Merge pull request #1948 from liubin/fix/1947-fix-comments runtime: fix some comments and logs	2021-06-02 10:52:01 +08:00
Pradipta Banerjee	604e3a6fa1	Merge pull request #1882 from Amulyam24/pef runtime: Add support for PEF	2021-06-01 12:56:53 +05:30
Peng Tao	41e04495f4	Merge pull request #1943 from bergwolf/cleanup2 cleanup TODOs in runtime	2021-06-01 14:16:46 +08:00
Chelsea Mafrica	bcde703b36	Merge pull request #1859 from cmaf/tracing-attributes-1 tracing: Make runtime span attributes more consistent	2021-05-31 21:57:58 -07:00
bin	b68334a1a8	runtime: fix some comments and logs This commit fixes some comments/logs and adds some logs for debugging. Fixes: #1947 Signed-off-by: bin <bin@hyper.sh>	2021-06-01 09:04:18 +08:00
Bin Liu	d1ac0a1a2c	Merge pull request #1938 from liubin/fix/1933-virtiofsd-refactor virtiofsd: refactor qemu.go to use code in virtiofsd.go	2021-06-01 08:32:56 +08:00
Fabiano Fidêncio	d7b6e3e178	Merge pull request #1942 from bergwolf/cleanup runtime: remove unused doc.go	2021-05-31 22:41:24 +02:00
Peng Tao	1f5b229bef	runtime: remove FIXME in SandboxState about CgroupPath It is in real life usage as we put non constrained sandbox processes (like shim) in a separate cgroup path. Fixes: #1944 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-05-29 13:17:14 +08:00
Peng Tao	fee0004ad4	runtime: remove TODO about hot add memory in qemu.go Already addressed by https://github.com/kata-containers/runtime/pull/786 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-05-29 11:15:50 +08:00
Peng Tao	2e29ef9cab	runtime: remove TODO comment from StatusContainer It is no longer valid as containerd already doesn't treat container pid as host process pid. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-05-29 11:10:32 +08:00
bin	72cd8f5ef6	virtiofsd: refactor qemu.go to use code in virtiofsd.go CloudHypervisor uses virtiofsd.go to manage the virtiofsd process, but qemu has its own code in qemu.go. This commit lets qemu reuse the code in virtiofsd.go to reduce code and improve maintainability. Fixes: #1933 Signed-off-by: bin <bin@hyper.sh>	2021-05-29 11:00:05 +08:00
Peng Tao	0b22c48d2a	runtime: remove unused doc.go It doesn't even contain any actual code there. Fixes: #1941 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-05-29 10:25:29 +08:00
Chelsea Mafrica	05a46fede0	tracing: Make runtime span attributes more consistent Span attributes (tags) are not consistent in runtime tracing, so designate and use core attributes such as source, package, subsystem, and type as span metadata for more understandable output. Use WithAttributes() during span creation to reduce calls to SetAttributes(). Modify Trace() in katautils to accept a slice of attributes so multiple functions using different attributes can use it. Fixes #1852 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-05-27 10:07:11 -07:00
bin	773deca2f6	virtiofsd: Fix file descriptors leak and return correct PID This commit will fix two problems: - Virtiofsd process ID returned to the caller will always be 0, the pid var is never being assigned a value. - Socket listen fd may leak in case of failure of starting virtiofsd process. This is a port of `be9ca0d58b` Fixes: #1931 Signed-off-by: bin <bin@hyper.sh>	2021-05-27 16:51:41 +08:00
Amulyam24	37a426b4c6	runtime: Add support for PEF Protected Execution Facility(PEF) is the confidential computing technology on ppc64le. This PR adds the support for it in Kata. Also re-vendor govmm for the latest changes. Fixes: #1881 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2021-05-25 14:29:42 +00:00
Eric Ernst	7f1030d303	sandbox-bindmount: persist mount information Without this, if the shim dies, we will not have a reliable way to identify what mounts should be cleaned up if `containerd-shim-kata-v2 cleanup` is called for the sandbox. Before this, if you `ctr run` with a sandbox bindmount defined and SIGKILL the containerd-shim-kata-v2, you'll notice the sandbox bindmount left on host. With this change, the shim is able to get the sandbox bindmount information from disk and do the appropriate cleanup. Fixes #1896 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-05-21 12:54:35 -07:00
Eric Ernst	089a7484e1	sandbox: Cleanup if failure to setup sandbox-bindmount occurs If for any reason there's an error when trying to setup the sandbox bindmounts, make sure we roll back any mounts already created when setting up the sandbox. Without this, we'd leave shared directory mount and potentially sandbox-bindmounts on the host. Fixes: #1895 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-05-21 12:54:35 -07:00
Shukui Yang	bd0cde40e7	factory: Use lazy unmount we can have the following case, 1. start kata container with factory feature, this need kata-runtime config to enable factory and use initrd as base image. 2. start a kata container. 3. cd /root; cd /run/vc/vm/template dir, this will make /run/vc/vm/template to be in used. 4. destroy vm template with kata-runtime factory destroy , and check the template mountpoint. we can see the template mountpoints will add everytime we repeat the above steps . [root@centos1 template]# mount |grep template [root@centos1 template]# docker run -ti --rm --runtime untrusted-runtime --net none busybox echo [root@centos1 template]# cd /root; cd /run/vc/vm/template/ [root@centos1 template]# /kata/bin/kata-runtime factory destroy vm factory destroyed [root@centos1 template]# mount |grep template tmpfs on /run/vc/vm/template type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=2105344k) [root@centos1 template]# docker run -ti --rm --runtime untrusted-runtime --net none busybox echo [root@centos1 template]# cd /root; cd /run/vc/vm/template/ [root@centos1 template]# /kata/bin/kata-runtime factory destroy vm factory destroyed [root@centos1 template]# mount |grep template tmpfs on /run/vc/vm/template type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=2105344k) tmpfs on /run/vc/vm/template type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=2105344k) Fixes: #938 Signed-off-by: Shukui Yang <keloyangsk@gmail.com>	2021-05-20 16:18:28 +08:00
Peng Tao	35151f1786	runtime: sandbox delete should succeed after verifying sandbox state Otherwise we might block delete and create orphan containers. Fixes: #1039 Signed-off-by: Peng Tao <bergwolf@hyper.sh> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-05-13 14:05:49 -07:00
Fabiano Fidêncio	ac61e60492	Merge pull request #1790 from snir911/configure_timeout runtime: make dialing timeout configurable	2021-05-11 16:52:05 +02:00
Samuel Ortiz	2c4e4ca1ac	Merge pull request #1590 from devimc/2021-02-02/ConfidentialComputing Support TDx	2021-05-10 22:19:40 +02:00
Snir Sheriber	01b56d6cbf	runtime: make dialing timeout configurable allow to set dialing timeout in configuration.toml default is 30s Fixes: #1789 Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2021-05-10 16:39:37 +03:00
Julio Montes	4f61f4b490	virtcontainers: Support TDX Add support for Intel TDX confidential guests fixes #1332 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-05-06 10:09:05 -05:00
Julio Montes	0affe8860d	virtcontainers: define confidential guest framework Define the structure and functions needed to support confidential guests, this commit doesn't add support for any specific technology, support for TDX, SEV, PEF and others will be added in following commits. Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-05-06 10:09:05 -05:00
Hui Zhu	7f7c3fc8ec	qemu.go: qemu: resizeMemory: Fix virtio-mem resize overflow issue This commit changes sizeByte from uint32 to uint64 to fix the overflow issue. Fixes: #1796 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-05-06 14:13:50 +08:00
Hui Zhu	c9053ea3fb	qemu.go: qemu: setupVirtioMem: let sizeMB be multiple of 2MiB Got: FATA[0000] run pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: Add 189759MB virtio-mem-pci fail QMP command failed: backend memory size must be multiple of 0x200000: unknown This commit makes sizeMB a multiple of 2MiB to fix the issue. Fixes: #1796 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-05-06 14:13:48 +08:00
Julio Montes	88cf3db601	runtime: implement CPUFlags function `CPUFlags` returns a map with all the CPU flags; these CPU flags may help us identify whether a system supports confidential computing or not. Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-05-03 09:33:13 -05:00
Eric Ernst	1c0d3afd55	Merge pull request #1754 from Jakob-Naucke/fix-virtiofs-s390x virtcontainers: Fix virtio-fs on s390x	2021-04-30 09:28:12 -07:00
Fabiano Fidêncio	2e0221125a	Merge pull request #1780 from likebreath/0429/clh_v15.0 versions: Upgrade to cloud-hypervisor v15.0	2021-04-30 18:20:36 +02:00
Fabiano Fidêncio	29fdfcfebc	Merge pull request #1725 from liubin/liubin/1724-not-return-if-get-api-socket-failed clh: return error if apiSocketPath failed	2021-04-30 18:16:45 +02:00
Fabiano Fidêncio	dc23adcd50	Merge pull request #1743 from alrs/fix-runtime-err runtime: fix dropped error	2021-04-30 18:15:22 +02:00
Fabiano Fidêncio	bd486f7bf3	Merge pull request #1720 from ManaSugi/update-seccomp-spec agent: Update seccomp configuration for errnoRet and flags	2021-04-30 10:52:42 +02:00
Bo Chen	1ca6bedf3e	versions: Upgrade to cloud-hypervisor v15.0 Quotes from the cloud-hypervisor release v15.0: This release is the first in a new version numbering scheme to represent that we believe Cloud Hypervisor is maturing and entering a period of stability. With this new release we are beginning our new stability guarantees. Other highlights from the latest release include: 1) Network device rate limiting; 2) Support for runtime control of `virtio-net` guest offload; 3) `--api-socket` supports file descriptor parameter; 4) Bug fixes on `virtio-pmem`, PCI BARs alignment, `virtio-net`, etc.; 5) Deprecation of the "LinuxBoot" protocol for ELF and bzImage in the coming release. Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v15.0 Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by `openapi-generator` [1-2]. As the API changes do not impact usages in Kata, no additional changes in kata's runtime are needed to work with the current version of cloud-hypervisor. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Fixes: #1779 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-04-29 10:56:22 -07:00
Jakob Naucke	3ee61776d6	virtcontainers: Enable virtio-fs on s390x Allow and configure vhost-user-fs devices (virtio-fs) on s390x. As a consequence, appendVhostUserDevice now takes a context, which affects its signature for other architectures. Fixes: #1753 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-04-29 09:54:08 +02:00
Jakob Naucke	adba4532a4	virtcontainers: Revert "virtcontainers: Allow s390x appendVhostUserDevice" This reverts commit `7f60911333`. Patch allowed other vhost user devices besides FS not supported on s390x and failed to attach a CCW device number, which results in the inability to use more devices after vhost-user-fs-ccw. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-04-29 09:43:33 +02:00
Eric Ernst	b20dff8027	Merge pull request #1759 from kata-containers/fix_update Fix the issue that sandbox size is not right after update	2021-04-28 14:48:24 -07:00
Eric Ernst	23a8179184	Merge pull request #1756 from egernst/leave-no-virtiofs-behind qemu: kill virtiofsd if failure to start VMM	2021-04-27 17:16:33 -07:00
Wainer dos Santos Moschetta	3677640811	runtime/virtcontainers: Fix typo on qmp error msg "negotiate" was misspelled on qemu's qmp error message. Fixes #1764 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2021-04-27 11:52:42 -04:00
Hui Zhu	0787ea8073	cgroupsCreate: not set resources to c.config.Resources cgroupsCreate will just keep the CPU resources information but not the others. Setting it to c.config.Resources will clear most of the container's resources. This commit removes it to handle the issue. Fixes: #1758 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-04-27 16:44:30 +08:00
Hui Zhu	831224aa22	Sandbox: Fix ContainerConfig ptr in CreateContainer and createContainers The pointer that is sent to newContainer in CreateContainer and createContainers is not the pointer that points to the address in s.config.Containers. This commit fixes this issue. Fixes: #1758 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-04-27 16:44:22 +08:00
Eric Ernst	a57c8ab1be	qemu: kill virtiofsd if failure to start VMM If the QEMU VMM fails to launch, we currently fail to kill virtiofsd, resulting in leftover processes running on the host. Let's make sure we kill these, and explicitly cleanup the virtiofs socket on the filesystem. Ideally we'll migrate QEMU to utilize the same virtiofsd interface that CLH uses, but let's fix this bug as a first step. Fixes: #1755 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-04-26 21:07:20 -07:00
bin	0d0a520d42	clh: return error if apiSocketPath failed If apiSocketPath failed, should return the error, but not nil Fixes: #1724 Signed-off-by: bin <bin@hyper.sh>	2021-04-25 10:25:42 +08:00
Lars Lehtonen	fc6bb01a7f	runtime: fix dropped error Fixes: #212 Signed-off-by: Lars Lehtonen <lars.lehtonen@gmail.com>	2021-04-24 14:18:50 -07:00
Fabiano Fidêncio	fe2311cd4c	Merge pull request #1739 from pmores/virtiofsd-extra-args-annotation-handling add io.katacontainers.config.hypervisor.virtio_fs_extra_args handling	2021-04-23 23:22:01 +02:00
Pavel Mores	30ff6ee88b	runtime: handle io.katacontainers.config.hypervisor.virtio_fs_extra_args Users can specify extra arguments for virtiofsd in a pod spec using the io.katacontainers.config.hypervisor.virtio_fs_extra_args annotation. However, this annotation was ignored so far by the runtime. This commit fixes the issue by processing the annotation value (if present) and translating it to the corresponding hypervisor configuration item. Fixes #1523 Signed-off-by: Pavel Mores <pmores@redhat.com>	2021-04-23 21:09:28 +02:00
Fabiano Fidêncio	5eaf7a9982	Merge pull request #1049 from c3d/feature/1043-entropy-source-annotation Entropy source annotation	2021-04-23 20:16:11 +02:00
Fabiano Fidêncio	b41d9a99b4	Merge pull request #1703 from lifupan/main_fix fix the issue of missing set fsGroup for EphemeralStorage	2021-04-22 20:29:36 +02:00
Christophe de Dinechin	dcb9f40394	config: Protect annotation for entropy_source It would be undesirable to be given an annotation like "/dev/null". Filter out bad annotation values. Fixes: #1043 Suggested-by: James O. D. Hunt <james.o.hunt@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2021-04-22 15:26:40 +02:00
fupan.lfp	628d55bf4c	kata-agent: fix the issue of fsGroup missing For k8s emptyDir volume, a specific fsGroup would be set for it, thus runtime should pass this fsGroup for EphemeralStorage to guest and set it properly on the emptyDir volume in guest. Fixes: #1580 Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>	2021-04-22 21:08:52 +08:00
Chelsea Mafrica	1c222c75ac	Merge pull request #1697 from jodh-intel/improve-agent-shutdown-handling Improve agent shutdown handling	2021-04-20 21:25:36 -07:00
Manabu Sugimoto	81c5ff1231	agent: Update seccomp configuration for errnoRet and flags Update: - Make the type of errnoRet in oci.proto oneof - Update seccomp_grpc_to_oci so that it can set errnoRet to EPERM if the value is empty. - Update the oci.pb.go based on the above fixes - Add seccomp errnoRet and flags option to configs in rustjail Fixes: #1719 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-04-21 12:16:58 +09:00
Fabiano Fidêncio	4c177b5c40	Merge pull request #1599 from Jakob-Naucke/virtiofs-s390x Enable virtio-fs on s390x	2021-04-20 21:07:15 +02:00
Carlos Venegas	cd27308755	Merge pull request #1432 from dgibson/bug1431 block: Generate PCI path for virtio-blk devices on clh	2021-04-20 12:00:09 -05:00
Fabiano Fidêncio	9df86d28a5	Merge pull request #1678 from cmaf/remove-spans-healthcheck runtime: Disable trace for healthcheck	2021-04-20 18:38:47 +02:00
Jakob Naucke	7f60911333	virtcontainers: Allow s390x appendVhostUserDevice Remove the prohibition of vhost-user devices on s390x, which are by now supported (e.g. vhost-user-fs-ccw). As a consequence, appendVhostUserDevice no longer needs an error in its signature. This enables virtio-fs support on s390x. Fixes: #1469 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-04-20 12:20:32 +02:00
James O. D. Hunt	de2631e711	utils: Make WaitLocalProcess safer Rather than relying on the system clock, use a channel timeout to avoid problems if the system time changed. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-04-15 15:46:42 +01:00
James O. D. Hunt	9256e590dc	shutdown: Don't sever console watcher too early Fixed logic used to handle static agent tracing. For a standard (untraced) hypervisor shutdown, the runtime kills the VM process once the workload has finished. But if static agent tracing is enabled, the agent running inside the VM is responsible for the shutdown. The existing code handled this scenario but did not wait for the hypervisor process to end. The outcome of this being that the console watcher thread was killed too early. Although not a problem for an untraced system, if static agent tracing was enabled, the logs from the hypervisor would be truncated, missing the crucial final stages of the agent's shutdown sequence. The fix necessitated adding a new parameter to the `stopSandbox()` API, which if true requests the runtime hypervisor logic simply to wait for the hypervisor process to exit rather than killing it. Fixes: #1696. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-04-15 15:22:00 +01:00
James O. D. Hunt	51ab870091	utils: Improve WaitLocalProcess Previously, the hypervisors were sending a signal and then checking to see if the process had died by sending the magic null signal (`0`). However, that doesn't work as it was written: the logic was assuming sending the null signal to a process that was dead would return `ESRCH`, but it doesn't: you first need to `wait(2)` for the process before sending that signal. This means that previously, all affected hypervisors would appear to take `timeout` seconds to end, even though they had _already_ finished. Now, the hypervisor's true end time will be seen as we wait for the processes before sending the null signal to ensure the process has finished. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-04-15 14:51:06 +01:00
James O. D. Hunt	507ef6369e	utils: Add waitLocalProcess function Refactored some of the hypervisors to remove the duplicated code used to trigger a shutdown. Also added some unit tests. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-04-15 14:51:03 +01:00
David Gibson	1d5098de70	agent/block: Generate PCI path for virtio-blk devices on clh Currently runtime and agent special case virtio-blk devices under clh, ostensibly because the PCI address information is not available in that case. In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo, which includes a PCI address. That API is broken, because PCI addressing depends on guest (firmware or OS) actions that the hypervisor won't know about. clh only gets away with this because it only uses a single PCI root and never uses PCI bridges, in which case the guest addresses are accurately predictable: they always have domain and bus zero. Until https://github.com/kata-containers/kata-containers/pull/1190, Kata couldn't handle PCI addressing unless there was exactly one bridge, which might be why this was actually special-cased for clh. With #1190 merged, we can handle more general PCI paths, and we can derive a trivial (one element) PCI path from the information that the clh API gives us. We can use that to remove this special case. fixes #1431 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-04-13 13:29:24 +10:00
Chelsea Mafrica	543f9da3ba	runtime: Disable trace for healthcheck With tracing enabled, grpc health check generates a large number of spans which creates too much data for tasks running longer than a few minutes. To solve this, remove span creation from kata agent check() and sendReq() where the majority of the spans come from. Leave contexts in functions for subsequent calls that create spans. Fixes #1395 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-04-09 15:47:00 -07:00
bin	421439c633	API: remove ProcessListContainer/ListProcesses This commit will remove ProcessListContainer API from VCSandbox and ListProcesses from agent.proto. Fixes: #1668 Signed-off-by: bin <bin@hyper.sh>	2021-04-09 17:34:25 +08:00
bin	d75fe95685	virtcontainers: replace newStore by store in Sandbox struct The property name confuses newcomers when reading the code. Since Kata Containers 2.0 will only have one type of store, it's safe to simply replace it with `store`. Fixes: #1660 Signed-off-by: bin <bin@hyper.sh>	2021-04-08 23:59:16 +08:00
GabyCT	0b87fd436f	Merge pull request #1544 from snir911/timeout runtime: increase dial timeout	2021-04-06 16:10:51 -05:00
Peng Tao	d5600641dd	Merge pull request #1603 from lifupan/fix_fsgroup Fix fsgroup	2021-04-06 11:35:03 +08:00
Snir Sheriber	13653e7b55	runtime: increase dial timeout On some setups, starting multiple kata pods (qemu) simultaneously on the same node might cause kata VMs booting time to increase and the pods to fail with: Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed out connecting to vsock 1358662990:1024: unknown Increasing default dialing timeout to 30s should cover most cases. Signed-off-by: Snir Sheriber <ssheribe@redhat.com> Fixes: #1543	2021-04-04 09:37:38 +03:00
Bo Chen	1511d966aa	Merge pull request #1616 from egernst/dechat-deruntime Dechat deruntime	2021-04-01 11:02:27 -07:00
Chelsea Mafrica	4a3282cf1a	Merge pull request #1608 from likebreath/0331/go_fmt_clh_clinet_code runtime: Format auto-generated client code for cloud-hypervisor API	2021-04-01 10:39:02 -07:00
Eric Ernst	a4c125a8b9	trace: move gRPC requests from debug to trace There are many requests to the agent that happen with relatively high frequency when a workload is running (checkRequest, as an example). Let's move from Debug to Trace to avoid bombarding journal. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2021-04-01 09:03:26 -07:00
Fupan Li	5524bc806b	Merge pull request #1612 from liubin/1610/use-concrete-kata-agent-config-type runtime: use concrete KataAgentConfig instead of interface type	2021-04-01 21:26:38 +08:00
bin	6fe48329b5	runtime: use concrete KataAgentConfig instead of interface type Kata Containers 2.0 has only one type of agent, so there is no need to use an interface as the config's type Fixes: #1610 Signed-off-by: bin <bin@hyper.sh>	2021-04-01 13:44:45 +08:00
fupan.lfp	88e58a4f4b	agent: fix the issue of missing pass fsGroup For k8s emptyDir volume, a specific fsGroup would be set for it, thus runtime should pass this fsGroup to guest and set it properly on the emptyDir volume in guest. Fixes: #1580 Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>	2021-04-01 11:33:18 +08:00
Bo Chen	0c38d9ecc4	runtime: Fix the format of the client code of cloud-hypervisor APIs Regenerate the client code with the added `go-fmt` step. No functional changes. Fixes: #1606 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-03-31 14:41:44 -07:00
Bo Chen	52cacf8838	runtime: Format auto-generated client code for cloud-hypervisor API This patch extends the current process of generating client code for cloud-hypervisor API with an additional step, `go-fmt`, which will remove the generated `client/go.mod` file and format all auto-generated code. Fixes: #1606 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-03-31 14:36:24 -07:00
Eric Ernst	c0c7bef2b8	Merge pull request #1592 from likebreath/0330/versions_clh_v0.14.0 versions: Update cloud-hypervisor to release v0.14.1	2021-03-31 12:39:35 -07:00
Bo Chen	84b62dc3b1	versions: Update cloud-hypervisor to release v0.14.1 Highlights for cloud-hypervisor version 0.14.0 include: 1) Structured event monitoring; 2) MSHV improvements; 3) Improved aarch64 platform; 4) Updated hotplug documentation; 6) PTY control for serial and virtio-console; 7) Block device rate limiting; 8) Plan to deprecate the support of "LinuxBoot" protocol and support PVH protocol only. Highlights for cloud-hypervisor version 0.13.0 include: 1) Wider VFIO device support; 2) Improve huge page support; 3) MACvTAP support; 4) VHD disk image support; 5) Improved Virtio device threading; 6) Clean shutdown support via synthetic power button. Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by `openapi-generator` [1-2]. As the API changes do not impact usages in Kata, no additional changes in kata's runtime are needed to work with the latest version of cloud-hypervisor. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Fixes: #1591 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-03-31 11:09:47 -07:00
Orestis Lagkas Nikolos	6255cc1959	virtcontainers/fc: Upgrade Firecracker to v0.23.1 This patch upgrades Firecracker version from v0.21.1 to v0.23.1 * Generate swagger models for v0.23.1 (from firecracker.yaml) * Change uint64 types in TokenBucket object according to rate-limiter implementation (introduced in commit #cfeb966) * Update Firecracker Logger/Metrics to support the new API * Update payload in fc.vmRunning to support the new API * Add Metrics type to fcConfig Fixes: #1518 Signed-off-by: Orestis Lagkas Nikolos <olagkasn@nubificus.co.uk>	2021-03-31 04:55:40 -05:00
Chelsea Mafrica	e5aa4e7eb4	Merge pull request #1563 from Jakob-Naucke/s390x-missing-contexts virtcontainers: Fix missing contexts in s390x	2021-03-30 09:38:28 -07:00
Tim Zhang	b58fb25d88	Merge pull request #1555 from liubin/fix/1554-install-hook-before-test test: install mock hook binary before test	2021-03-30 14:01:56 +08:00
Eric Ernst	24214a536a	Merge pull request #1560 from egernst/fix-1559 container: on cleanup, rm container directory for mounts path	2021-03-29 14:14:52 -07:00
GabyCT	17840cb573	Merge pull request #1546 from devimc/2021-03-24/supportQEMU6 runtime: add support for QEMU 6	2021-03-29 14:33:16 -06:00
Eric Ernst	9a4e866654	container: on cleanup, rm container directory for mounts path A wrong path was being used for container directory when virtiofs is utilized. This resulted in a warning message in logs when a container is killed, or completes: level=warning msg="Could not remove container share dir" Without proper removal, they'd later be cleaned up when the shared path is removed as part of stopping the sandbox. Fixes: #1559 Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2021-03-29 11:39:39 -07:00
Jakob Naucke	31ced01eba	virtcontainers: Fix missing contexts in s390x #1389 has added a context for many signatures to improve trace spans. Functions specific to s390x lack this. Add context where required. This affects some common code signatures, since some functions that do not require context on other architectures do require it on s390x. Also remove an unnecessary import in test_qemu_s390x.go. Fixes: #1562 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-03-29 17:49:27 +02:00
bin	48e5e4f2f3	test: install mock hook binary before test `make test` depends on the mock hook in the virtcontainers directory, so install it before testing. Also run the tests as both a normal user and root in GitHub Actions. Fixes: #1554 Signed-off-by: bin <bin@hyper.sh>	2021-03-29 22:40:45 +08:00
Bin Liu	594c47ab6c	Merge pull request #1553 from bergwolf/ro-volumes runtime: fix virtiofsd RO volume sharing	2021-03-29 20:43:34 +08:00
Peng Tao	e34924488b	runtime: fix virtiofsd RO volume sharing Right now we rely heavily on mount propagation to share host files/directories to the guest. However, because virtiofsd pivots and moves itself to a separate mount namespace, the remount mount is not present in virtiofsd's mount. And it causes guest to be able to write to the host RO volume. To fix it, create a private RO mount and then move it to the host mounts dir so that it will be present readonly in the host-guest shared dir. Fixes: #1552 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-29 13:54:25 +08:00
bin	532ff7c909	runtime: update virtcontainers API documentation Virtcontainers API documentation is outdated, update documentation from the latest source. Fixes: #1455 Signed-off-by: bin <bin@hyper.sh>	2021-03-29 11:50:53 +08:00
Chelsea Mafrica	f3ebbb1f1a	runtime: Fix trace span ordering Return ctx in trace() functions to correct span ordering. Fixes #1550 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-03-25 11:43:04 -07:00
Julio Montes	1555bfd8b5	runtime: add support for QEMU 6 Use `on` and `off` to enable or disable features, `no` prefix is deprecated fixes #1545 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-03-24 10:55:35 -06:00
James O. D. Hunt	2fc7f75724	Merge pull request #1521 from jodh-intel/verify-cid Verify container ID	2021-03-24 13:27:58 +00:00
Peng Tao	74192d179d	runtime: fix static check errors It turns out we have managed to break the static checker in many different places in the absence of a static checker in GitHub Actions. Let's fix them while enabling the static checker in GitHub Actions... Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-24 20:10:19 +08:00
Peng Tao	a2dee1f6a0	runtime: fix vm factory UT failure We need to use different mocked socket otherwise they conflict with each other. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-24 18:21:21 +08:00
Peng Tao	0153f76b07	runtime: gofmt code Looks like we have merged a lot of code that is not properly formatted. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-24 14:37:46 +08:00
Peng Tao	b2ec5a43d5	runtime: fix cleanupSandboxBindMounts panic Found in UT: --- FAIL: TestKataCleanupSandbox (0.00s) panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-23 16:44:47 +08:00
Peng Tao	8e71c4fc7a	runtime: fix missing context argument in mocked sandbox APIs Missing context.Context in several APIs. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-23 16:19:46 +08:00
Peng Tao	8ff62beeb4	runtime: fix vcmock build failure github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/vcmock virtcontainers/pkg/vcmock/container.go:19:10: cannot use c.MockSandbox (type Sandbox) as type virtcontainers.VCSandbox in return argument: Sandbox does not implement virtcontainers.VCSandbox (missing GetHypervisorPid method) github.com/kata-containers/kata-containers/src/runtime/pkg/katautils Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2021-03-23 15:57:07 +08:00
Chelsea Mafrica	3369fc8b4b	Merge pull request #1514 from fgiudici/port_cgroup_fix [forwardport] Fixup systemd cgroup handling	2021-03-19 14:18:03 -07:00
James O. D. Hunt	12e9f7f82c	runtime: Add missing test mock function Added a missing `vcmock.Sandbox.GetHypervisorPid()` function. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-03-17 10:37:47 +00:00
Chelsea Mafrica	4bf84b4b2f	runtime: Add contexts to calls in unit tests Modify calls in unit tests to use context since many functions were updated to accept local context to fix trace span ordering. Fixes #1355 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-03-16 17:39:28 -07:00
Chelsea Mafrica	6b0dc60dda	runtime: Fix ordering of trace spans A significant number of trace calls did not use a parent context that would create proper span ordering in trace output. Add local context to functions for use in trace calls to facilitate proper span ordering. Additionally, change whether trace function returns context in some functions in virtcontainers and use existing context rather than background context in bindMount() so that span exists as a child of a parent span. Fixes #1355 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-03-16 17:39:28 -07:00
Eric Ernsteernst	d7cb3df0d2	cgroups: Add systemd detection when creating cgroup manager Look at the provided cgroup path to determine whether systemd is being used to manage the cgroups. With this, systemd cgroups are being detected and created appropriately for the sandbox. Fixes: #599 Signed-off-by: Eric Ernsteernst <eric@amperecomputing.com> (forward port of https://github.com/kata-containers/runtime/pull/2817) Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2021-03-16 08:27:14 +01:00
Eric Ernsteernst	f659871f55	cgroups: remove unused SystemdCgroup variable and accessor/mutators Since we now detect systemd cgroups from the path, we no longer need to keep this state. Signed-off-by: Eric Ernsteernst <eric@amperecomputing.com> (forward port of https://github.com/kata-containers/runtime/pull/2817) Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2021-03-16 08:26:15 +01:00
Eric Ernst	48ed8f3c4a	runtime: add support for readonly sandbox bindmounts If specified, sandbox_bind_mounts identifies host paths to be mounted (ro) into the sandboxes shared path. This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted (ro) into the shared fs directory on the host, and thus mapped into the guest. If defaults are utilized, these mounts should be available in the guest at `/var/run/kata-containers/shared/containers/sandbox-mounts` These will not be exposed to the container workloads, and are only added for potential guest-services to consume (example: expose certs into the guest that are available on the host). Fixes: #1464 Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2021-03-04 10:04:25 -08:00
fupan.lfp	bc0ac526a2	shimv2: return the hypervisor's pid as the container pid Since the kata's hypervisor process is in the network namespace, which is close to container's process, and some host metrics such as cadvisor can use this pid to access the network namespace to get some network metrics. Thus this commit replace the shim's pid with the hypervisor's pid. Fixes: #1451 Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>	2021-02-24 13:26:05 +08:00
David Gibson	72cb9287a0	vhost-user-blk: Use PciPath type for vhost user devices VhostUserDeviceAttrs::PCIAddr didn't actually store a PCI address (DDDD:BB:DD.F), but rather a PCI path. Use the PciPath type and rename things to make that clearer. TestHandleBlockVolume previously used the bizarre value "0001:01" which is neither a PCI address nor a PCI path for this value. Change it to a valid PCI path - it appears the actual value didn't matter for that test, as long as it was consistent. Forward port of `3596058c67` fixes #1040 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-19 09:56:08 +11:00
David Gibson	74f5b5febe	runtime/block: Use PciPath type through block code BlockDrive::PCIAddr doesn't actually store a PCI address (DDDD:BB:DD.F) but a PCI path. Use the PciPath type and rename things to make that clearer. TestHandleBlockVolume() previously used a bizarre value "0002:01" for the "PCI address" which was neither an actual PCI address, nor a PCI path. Update it to use a PCI path - the actual value appears not to matter in this test, as long as its consistent throughout. Forward port of `64751f377b` fixes #1040 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-19 09:56:08 +11:00
David Gibson	32b40f5fe4	runtime/network: Use PciPath type through network handling The "PCI address" returned by Endpoint::PciPath() isn't actually a PCI address (DDDD:BB:DD.F), but rather a PCI path. Rename and use the PciPath type to clean this up and the various parts of the network code connected to it. Forward port of `3e589713cf` fixes #1040 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-19 09:56:08 +11:00
David Gibson	7e92831c7a	protocols: Update PCI path names / terminology in agent protocol def Now that we have types to represent PCI paths on both the agent and runtime sides, we can update the protocol definition to use clearer terminology. Note that this doesn't actually change the agent protocol, because it just renames a field without changing its field ID or type. While we're there, fix a trivial rustfmt error in src/agent/protocols/build.rs Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-19 09:56:08 +11:00
David Gibson	8e5fd8ee84	runtime: Introduce PciSlot and PciPath types This is a dedicated data type for representing PCI paths, that is, PCI devices described by the slot numbers of the bridges we need to reach them. There are a number of places that uses strings with that structure for things. The plan is to use this data type to consolidate their handling. These are essentially Go equivalents of the pci::Slot and pci::Path types introduced in the Rust agent. Forward port of `185b3ab044` Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-19 09:56:05 +11:00
Eric Ernst	3721351324	runtime: cpuset: when creating container, don't pass cpuset details Today we only clear out the cpuset details when doing an update call on existing container/pods. This works in the case of Kubernetes, but not in the case where we are explicitly setting the cpuset details at boot time. For example, if you are running a single container via docker ala: `docker run --cpuset-cpus 0-3 -it alpine sh` What would happen is the cpuset info would be passed in with the container spec for create container request to the agent. At that point in time, there'd only be the default number of CPUs available in the guest (1), so you'd be left with cpusets set to 0. Next, we'd hotplug the vCPUs, providing 0-4 CPUs in the guest, but the cpuset would never be updated, leaving the application tied to CPU 0. Ouch. Until the day we support cpusets in the guest, let's make sure that we start off clearing the cpuset fields. Fixes: #1405 Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2021-02-11 17:38:15 -08:00
Fupan Li	5d1432210c	Merge pull request #1352 from liubin/fix/migrate-opentracing-to-opentelemetry runtime: migrate from opentracing to opentelemetry	2021-02-09 10:18:10 +08:00
Chelsea Mafrica	38b5a43267	Merge pull request #1318 from jongwu/acpi arm64: enable acpi for qemu/virt.	2021-02-03 16:37:49 -08:00
bin	17df9b119d	runtime: migrate from opentracing to opentelemetry This commit includes two changes: - migrate from opentracing to opentelemetry - add jaeger configuration items Fixes: #1351 Signed-off-by: bin <bin@hyper.sh>	2021-02-03 17:30:49 +08:00
Jianyong Wu	b7a1f752c0	arm64: enable acpi for qemu/virt. acpi is enabled for kata 1.x, port and rebase code for 2.x including: runtime: enable pflash; agent: add acpi support for pci bus path; packaging: enable CONFIG_RTC_DRV_EFI; Fixes: #1317 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-01-29 22:12:43 +08:00
Bo Chen	c2d14cdeea	versions: Update cloud-hypervisor to release v0.12.0 Highlights for cloud-hypervisor version v0.12.0 include: removal of `vhost-user-net` and `vhost-user-block` self spawning, migration of `vhost-user-fs` backend, ARM64 enhancements with full support of `--watchdog` for rebooting, and enhanced `info` HTTP API to include the details of devices used by the VM including VFIO devices. Fixes: #1315 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-01-25 10:58:19 -08:00
Eric Ernst	789fd7c1c6	blk-dev: hotplug readonly if applicable If a block based volume is read only, let's make sure we add as a RO device Fixes: #1246 Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2021-01-12 14:50:54 -08:00
Eric Ernst	12777b26e4	volumes: cleanup / minor refactoring Update some headers, very minor refactoring Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2021-01-12 14:50:47 -08:00
Eric Ernst	542e93d987	Merge pull request #1180 from egernst/qemu-cleanup-check qemu: no state to save if QEMU isn't running	2021-01-06 11:17:54 -08:00
Eric Ernst	9a7bcccc8e	qemu: no state to save if QEMU isn't running On pod delete, we were looking to read files that we had just deleted. In particular, stopSandbox for QEMU was called (we cleanup up vmpath), and then QEMU's save function was called, which immediately checks for the PID file. Let's only update the persist store for QEMU if QEMU is actually running. This'll avoid Error messages being displayed when we are stopping and deleting a sandbox: ``` level=error msg="Could not read qemu pid file" ``` I reviewed CLH, and it looks like it is already taking appropriate action, so no changes needed. Ideally we won't spend much time saving state to persist.json unless there's an actual error during stop/delete/shutdown path, as the persist will also be removed after the pod is removed. We may want to optimize this, as currently we are doing a persist store when deleting each container (after the sandbox is stopped, VM is killed), and when we stop the sandbox. This'll require more rework... tracked in: https://github.com/kata-containers/kata-containers/issues/1181 Fixes: #1179 Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-12-21 11:29:44 -08:00
David Gibson	e004616b02	runtime/network: Fix error reporting in listRoutes() If the upcast from resultingRoutes to *grpc.IRoutes fails, we return (nil, err), but previous code ensures that err is nil at that point, so we return no error. fixes #1206 Forward port of `0ffaeeb5d8` Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-18 14:36:09 +11:00
David Gibson	1ae8e81abb	runtime/network: Correct error reporting in listInterfaces() If the upcast from resultingInterfaces to *grpc.Interfaces fails, we return (nil, err), but previous code ensures that err is nil at that point, so we return no error. Forward port of `b86e904c2d` fixes #1206 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-18 14:35:50 +11:00
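The error-reporting bug described in the two `runtime/network` commits above can be illustrated with a small Go sketch. This is not the runtime's code: the `Interfaces` type and both function names are hypothetical stand-ins, and the "fixed" variant is only one plausible way to report the failed type assertion.

```go
package main

import "fmt"

// Interfaces is a hypothetical stand-in for the gRPC result type
// mentioned in the commit message.
type Interfaces struct {
	Names []string
}

// listInterfacesBuggy mirrors the reported pattern: when the type
// assertion fails it returns err, but err is already known to be nil,
// so the caller sees (nil, nil) and no error at all.
func listInterfacesBuggy(result interface{}) (*Interfaces, error) {
	var err error // stands for the error of an earlier call that succeeded
	ifaces, ok := result.(*Interfaces)
	if !ok {
		return nil, err // bug: err is nil here
	}
	return ifaces, nil
}

// listInterfacesFixed builds a fresh error when the assertion fails.
func listInterfacesFixed(result interface{}) (*Interfaces, error) {
	ifaces, ok := result.(*Interfaces)
	if !ok {
		return nil, fmt.Errorf("unexpected type %T for interfaces result", result)
	}
	return ifaces, nil
}

func main() {
	_, err := listInterfacesBuggy("not an Interfaces value")
	fmt.Println("buggy error:", err)

	_, err = listInterfacesFixed("not an Interfaces value")
	fmt.Println("fixed error:", err)
}
```

With the buggy variant a caller cannot distinguish "no interfaces" from "wrong result type", which is why the failed assertion needs its own error value.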
David Gibson	a19263e58d	agent/protocols: Remove unneeded import from oci.proto oci.proto imports "google/protobuf/wrappers.proto", but doesn't appear to use it, which causes a warning from protoc when we compile it. Remove the import to fix the warning. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-17 13:06:41 +11:00
Bo Chen	647331ace6	runtime: clh: Enforce to call 'cleanupVM' for 'stopSandbox' We should always cleanup the vm directory when doing `stopSandbox`, while we are skipping the cleanup process on some error code paths when using cloud-hypervisor driver. Fixes: #1098 Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-12-01 17:27:44 -08:00
bin liu	fdbf7d3222	virtcontainers: revert CleanupContainer from PR 1079 In PR 1079, CleanupContainer's sandboxID parameter was changed to a VCSandbox, but at cleanup time no VCSandbox has been constructed; we should load it from disk with loadSandboxConfig() in persist.go. This commit reverts parts of #1079 Fixes: #1119 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-17 10:31:33 +08:00
bin liu	4e3a8c0124	runtime: remove global sandbox variable Remove global sandbox variable, and save *Sandbox to hypervisor struct. For some needs, hypervisor may need to use methods from Sandbox. Signed-off-by: bin liu <bin@hyper.sh>	2020-11-13 09:47:09 +08:00
bin liu	290203943c	runtime: delete sandboxlist.go and sandboxlist_test.go Delete sandboxlist.go and sandboxlist_test.go under virtcontainers package. Fixes: #1078 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-13 09:47:09 +08:00
Julio Montes	36f65ce182	runtime: clh: update cloud-hypervisor Update cloud-hypervisor to commit 2706319. Fixes a limitation in OpenAPITools/openapi-generator tool, it's impossible to send go zero types, like false and 0 to cloud-hypervisor because `omitempty` is added if a field is not required. See cloud-hypervisor/cloud-hypervisor#1961 for more information Signed-off-by: Julio Montes <julio.montes@intel.com>	2020-11-12 09:33:56 -06:00
Julio Montes	e1396f0402	runtime: clh: disable virtiofs DAX when FS cache size is 0 Guest consumes 120Mb more of memory when DAX is enabled and the default FS cache size (8G) is used. Disable dax when it is not required reducing guest's memory footprint. Without this patch: ``` 7fdea4000000-7fdee4000000 rw-s 18850589 /memfd:ch_ram (deleted) Size: 1048576 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 187876 kB ``` With this patch: ``` 7fa970000000-7fa9b0000000 rw-s 612001 /memfd:ch_ram (deleted) Size: 1048576 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 57308 kB Pss: 56722 kB ``` fixes #1100 Signed-off-by: Julio Montes <julio.montes@intel.com>	2020-11-12 09:33:56 -06:00
Peng Tao	3c88106f65	Merge pull request #1084 from liubin/fix/1081-clean-codes runtime: clean/refactor code	2020-11-11 10:09:10 +08:00
Bo Chen	359ab16a8f	Merge pull request #1090 from likebreath/1106/clh_upgrade_v0.11.0 versions: Update cloud-hypervisor to release v0.11.0	2020-11-09 15:51:09 -08:00
bin liu	b8414045bf	runtime: remove nsenter remove code for nsenter Fixes: #1081 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-09 11:42:51 +08:00
bin liu	e3510be867	runtime: use one line if statement to check if err is nil for qemu.go Use `if err := q.qmpSetup(); err != nil` to reduce code and make it easy to read. And remove checking err if last function call also return an error, return the function call directly. Fixes: #1081 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-09 11:42:45 +08:00
Fupan Li	d22c7cf00b	Merge pull request #1013 from liubin/feature/1012-dump-guest-memroy-on-panic Dump guest memory when kernel panic for QEMU	2020-11-09 09:46:28 +08:00
Bo Chen	92c1c4c690	versions: Update cloud-hypervisor to release v0.11.0 The release v0.11.0 of cloud-hypervisor features the following changes: 1) Improved Linux Boot Time, 2) `SIGTERM/SIGINT` Interrupt Signal Handling, 3) Default Log Level Changed, 4) `io_uring` support by default for `virtio-block` (on host kernel version 5.8+), 5) Windows Guest Support, 6) New `--balloon` Parameter Added, 7) Experimental `virtio-watchdog` Support, 8) Bug fixes. Fixes: #1089 Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-11-06 16:19:31 -08:00
Archana Shinde	6160043c01	Merge pull request #1077 from likebreath/1103/clh_refactor_device_unplug clh: Consolidate the code path for device unplug	2020-11-06 16:00:56 -08:00
Peng Tao	c7a2b12fab	Merge pull request #1086 from jodh-intel/2.0-dev-fix-annotations annotations: Improve asset annotation handling	2020-11-06 10:29:22 +08:00
James O. D. Hunt	5ced96e96d	hypervisor: Remove unused methods Deleted `HypervisorConfig`'s unused `CustomFirmwareAsset()` and `JailerAssetPath()` methods. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2020-11-05 12:15:47 +00:00
James O. D. Hunt	e82c9daec3	annotations: Improve asset annotation handling Make `asset.go` the arbiter of asset annotations by removing all asset annotations lists from other parts of the codebase. This makes the code simpler, easier to maintain, and more robust. Specifically, the previous behaviour was inconsistent as the following ways: - `createAssets()` in `sandbox.go` was not handling the following asset annotations: - firmware: - `io.katacontainers.config.hypervisor.firmware` - `io.katacontainers.config.hypervisor.firmware_hash` - hypervisor: - `io.katacontainers.config.hypervisor.path` - `io.katacontainers.config.hypervisor.hypervisor_hash` - hypervisor control binary: - `io.katacontainers.config.hypervisor.ctlpath` - `io.katacontainers.config.hypervisor.hypervisorctl_hash` - jailer: - `io.katacontainers.config.hypervisor.jailer_path` - `io.katacontainers.config.hypervisor.jailer_hash` - `addAssetAnnotations()` in the `oci` package was not handling the following asset annotations: - hypervisor: - `io.katacontainers.config.hypervisor.path` - `io.katacontainers.config.hypervisor.hypervisor_hash` - hypervisor control binary: - `io.katacontainers.config.hypervisor.ctlpath` - `io.katacontainers.config.hypervisor.hypervisorctl_hash` - jailer: - `io.katacontainers.config.hypervisor.jailer_path` - `io.katacontainers.config.hypervisor.jailer_hash` This change fixes the bug where specifying a custom hypervisor path via an asset annotation was having no effect. Fixes: #1085. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2020-11-05 12:15:42 +00:00
James O. D. Hunt	0f26f1cd6f	annotations: Add missing hypervisor control annotation Add missing annotation definitions for a hypervisor control binary: - `io.katacontainers.config.hypervisor.ctlpath` - `io.katacontainers.config.hypervisor.hypervisorctl_hash` Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2020-11-05 12:12:58 +00:00
James O. D. Hunt	76064e3e2d	asset: Formatting, grammar and whitespace Improve formatting, grammar and whitespace. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2020-11-05 12:12:51 +00:00
bin liu	40418f6d88	runtime: add guest memory dump When the guest panics, dump guest kernel memory to the host filesystem. The dump also includes: - hypervisor config - hypervisor version - and state of sandbox Fixes: #1012 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-05 16:04:21 +08:00
Peng Tao	a958eaa8d3	runtime: mount shared mountpoint readonly bindmount remount events are not propagated through mount subtrees, so we have to remount the shared dir mountpoint directly. E.g., ``` mkdir -p source dest foo source/foo mount -o bind --make-shared source dest mount -o bind foo source/foo echo bind mount rw mount | grep foo echo remount ro mount -o remount,bind,ro source/foo mount | grep foo ``` would result in: ``` bind mount rw /dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered) /dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered) remount ro /dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered) /dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered) ``` The reason is that bind mount creates new mount structs and attaches them to different mount subtrees. However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the change to mount structs in other subtrees. Fixes: #1061 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-11-04 17:51:49 +08:00
Peng Tao	125e21cea3	runtime: readonly mounts should be readonly bindmount on the host So that we get protected at the VM boundary not just the guest kernel. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-11-04 17:51:49 +08:00
Bo Chen	93d7962510	clh: Consolidate the code path for device unplug In cloud-hypervisor, it provides a single unified way of unplugging devices, e.g. the `/vm.RemoveDevice` HTTP API. Taking advantage of this API, we can simplify our implementation of `hotplugRemoveDevice` in `clh.go`, where we can consolidate similar code paths for different device unplug (e.g. no need to implement `hotplugRemoveBlockDevice` and `hotplugRemoveVfioDevice` separately). We will only need to retrieve the right `deviceID` based on the type of devices, and use the single unified HTTP API for device unplug. Fixes: #1076 Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-11-03 15:46:38 -08:00
Julio Montes	77b50969ea	runtime: cloud-hypervisor: reduce memory footprint Cloud-hypervisor supports DAX, let's enable it to reduce its memory footprint. Before this patch: 19.96M ``` 20448kB -- [/usr/share/kata-containers/kata.img] ``` With this patch: 10.83M ``` 11100kB -- [/usr/share/kata-containers/kata.img] ``` fixes #1056 Signed-off-by: Julio Montes <julio.montes@intel.com>	2020-10-29 14:21:57 -06:00
Peng Tao	b316661818	runtime: remove the unused proto files These are moved to the agent and no longer needed. Fixes: #1028 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-10-25 10:57:38 +08:00
Peng Tao	54e23c8302	agent: move gogo.proto out of the github.com namespace To follow the same namespace scope as other proto files. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-10-25 10:44:53 +08:00
Peng Tao	583e6ed3e5	agent: types.pb.go is not regenerated When types.proto was relocated, types.pb.go is not regenerated and still references the old location. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-10-25 10:35:35 +08:00
Archana Shinde	5f0b83cc54	Merge pull request #1000 from jongwu/pci arm64: correct bridge type for QEMUVIRT	2020-10-22 13:53:27 -07:00
bin liu	5b065eb599	runtime: change govmm package Change govmm package name from github.com/intel/govmm to github.com/kata-containers/govmm Fixes: #859 Signed-off-by: bin liu <bin@hyper.sh>	2020-10-22 21:27:49 +08:00
Jianyong Wu	9eab301526	arm64: correct bridge type for QEMUVIRT port forward PR https://github.com/kata-containers/runtime/pull/3017 Fixes: #3016 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2020-10-20 14:09:03 +08:00
Julio Montes	f162e7e960	Merge pull request #948 from justin-he/max_ports virtcontainers: Append max_ports to virtio-serial device	2020-10-19 08:55:06 -05:00
Jia He	da79b4be67	virtcontainers: Append max_ports to virtio-serial device Allow API consumers to change the maximum number of ports in the virtio-serial devices, setting a lower number of ports can improve the boot time and reduce the attack surface. Before this patch on arm64: [ 0.028664] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.055031] printk: console [hvc0] enabled After this patch on arm64: [ 0.028484] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.031370] printk: console [hvc0] enabled Fixes: #2676 Signed-off-by: Jia He <justin.he@arm.com>	2020-10-16 23:40:54 +08:00
Peng Tao	0d5d69e8cd	Merge pull request #902 from c3d/bug-v2/launchpad-1878234-access Validate runtime annotations	2020-10-16 15:47:45 +08:00
Christophe de Dinechin	c5771be2de	annotations: Correct unit tests to validate new protections Add the verification of some basic protections, namely that: - EnableAnnotations is honored - Dangerous paths cannot be modified if no match - Errors are returned when expected Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	398d79184c	annotations: Split addHypervisorOverrides to reduce complexity Warning from gocyclo during make check: virtcontainers/pkg/oci/utils.go:404:1: cyclomatic complexity 37 of func `addHypervisorConfigOverrides` is high (> 30) (gocyclo) func addHypervisorConfigOverrides(ocispec specs.Spec, config *vc.SandboxConfig, runtime RuntimeConfig) error { ^ Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	b2b3bc7ad8	annotations: Add unit test for checkPathIsInGlobs There are a few interesting corner cases to consider for this function. Fixes: #901 Suggested-by: James O.D. Hunt <james.o.hunt@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	6f52179ce4	annotations: Add unit test for regexpContains function James O.D Hunt: "But also, regexpContains() and checkPathIsInGlobList() seem like good candidates for some unit tests. The "look" obvious, but a few boundary condition tests would be useful I think (filenames with spaces, backslashes, special characters, and relative & absolute paths are also an interesting thought here)." There aren't that many boundary conditions on a list with regexps, if you assume the regexp match function itself works. However, the test is useful in documenting expectations. Fixes: #901 Suggested-by: James O.D. Hunt <james.o.hunt@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	b119427405	annotations: Give better names to local variables in search functions Use more meaningful variable names for clarity. Fixes: #901 Suggested-by: James O.D. Hunt <james.o.hunt@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	b5db114aad	annotations: Rename checkPathIsInGlobList with checkPathIsInGlobs The name is shorter and more specific Fixes: #901 Suggested-by: James O.D. Hunt <james.o.hunt@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	7c6aede5d4	config: Whitelist hypervisor annotations by name Add a field "enable_annotations" to the runtime configuration that can be used to whitelist annotations using a list of regular expressions, which are used to match any part of the base annotation name, i.e. the part after "io.katacontainers.config.hypervisor." For example, the following configuration will match "virtio_fs_daemon", "initrd" and "jailer_path", but not "path" nor "firmware": enable_annotations = [ "virtio.*", "initrd", "_path" ] The default is an empty list of enabled annotations, which disables annotations entirely. If an annotation is rejected, the message is something like: annotation io.katacontainers.config.hypervisor.virtio_fs_daemon is not enabled Fixes: #901 Suggested-by: Peng Tao <tao.peng@linux.alibaba.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
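The matching rule described for `enable_annotations` can be sketched in Go as follows. This is a minimal illustration rather than the runtime's code: the helper name `annotationEnabled` is hypothetical, and it assumes each list entry is an unanchored regular expression applied to the annotation name with the `io.katacontainers.config.hypervisor.` prefix removed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Prefix shared by the hypervisor annotations named in the commit message.
const hypervisorPrefix = "io.katacontainers.config.hypervisor."

// annotationEnabled is a hypothetical helper. It reports whether any
// pattern in enabled matches some part of the base annotation name.
// An empty list therefore enables nothing.
func annotationEnabled(annotation string, enabled []string) (bool, error) {
	base := strings.TrimPrefix(annotation, hypervisorPrefix)
	for _, pattern := range enabled {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return false, err
		}
		if re.MatchString(base) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// The example list from the commit message.
	enabled := []string{"virtio.*", "initrd", "_path"}
	names := []string{"virtio_fs_daemon", "initrd", "jailer_path", "path", "firmware"}
	for _, name := range names {
		ok, err := annotationEnabled(hypervisorPrefix+name, enabled)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-18s enabled=%v\n", name, ok)
	}
}
```

Because the patterns are unanchored, `_path` enables `jailer_path` but not the bare `path` annotation, which has no underscore to match.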
Christophe de Dinechin	f047fced0b	config: Use glob instead of regexp to match paths in annotations When filtering annotations that correspond to paths, e.g. hypervisor.path, it is better to use a glob syntax than a regexp syntax, as it is more usual for paths, and prevents classes of matches that are undesirable in our case, such as matching .. against .* Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	11b9c90cd8	annotations: Fix typo in comment A comment talking about runtime related annotations describes them as being related to the agent. A similar comment for the agent annotations is missing. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	4e89b885d2	config: Protect file_mem_backend against annotation attacks This one could theoretically be used to overwrite data on the host. It seems somewhat less risky than the earlier ones for a number of reasons, but worth protecting a little anyway. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	aae9656d8b	config: Protect vhost_user_store_path against annotation attacks This path could be used to overwrite data on the host. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	b21a829c61	config: Protect ctlpath from annotation attack This also adds annotation for ctlpath which were not present before. It's better to implement the code consistently right now to make sure that we don't end up with a leaky implementation tacked on later. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	27b6620b23	config: Protect jailer_path annotation The jailer_path annotation can be used to execute arbitrary code on the host. Add a jailer_path_list configuration entry providing a list of regular expressions that can be used to filter annotations that represent valid file names. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	2d431c61c6	annotations: Simplify negative logic Replace strange negative logic (!ok -> continue) with positive logic (ok -> do it) Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	2ca9ca892d	config: Add hypervisor path override through annotations The annotation is provided, so it should be respected. Furthermore, it is important to implement it with the appropriate protections similar to what was done for virtiofsd. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	2e093dfd8b	config: Fix typo in function name There was an extra 'p' in addHypervisorVirtioFsOverrides. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Christophe de Dinechin	bf13ff0a3a	config: Protect virtio_fs_daemon annotation Sending the virtio_fs_daemon annotation can be used to execute arbitrary code on the host. In order to prevent this, restrict the values of the annotation to a list provided by the configuration file. Fixes: #901 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2020-10-14 16:10:12 +02:00
Eric Ernst	d8a8fe47fb	cpuset: don't set cpuset.mems in the guest Kata doesn't map any numa topologies in the guest. Let's make sure we clear the Cpuset fields before passing container updates to the guest. Note, in the future we may want to have a vCPU to guest CPU mapping and still include the cpuset.Cpus. Until we have this support, clear this as well. Fixes: #932 Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-13 15:54:03 -07:00
Eric Ernst	88cd712876	sandbox: consider cpusets if quota is not enforced CPUSet cgroup allows for pinning the memory associated with a cpuset to a given numa node. Similar to cpuset.cpus, we should take cpuset.mems into account for the sandbox-cgroup that Kata creates. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-13 15:54:03 -07:00
Eric Ernst	77a463e57a	cpuset: support setting mems for sandbox CPUSet cgroup allows for pinning the memory associated with a cpuset to a given numa node. Similar to cpuset.cpus, we should take cpuset.mems into account for the sandbox-cgroup that Kata creates. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-13 15:54:03 -07:00
Eric Ernst	2d690536b8	cpuset: add cpuset pkg Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing CPUSet calculations on the host. Go mod'ing the original code from k8s.io/kubernetes was very painful, and this is very static, so let's just pull in what we need. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-13 15:54:03 -07:00
Eric Ernst	12cc0ee168	sandbox: don't constrain cpus, mem only cpuset, devices Allow for constraining the cpuset as well as the devices-whitelist. Revert sandbox constraints for cpu/memory, as they break the K8S use case. Can re-add behind a non-default flag in the future. The sandbox CPUSet should be updated every time a container is created, updated, or removed. To facilitate this without rewriting the 'non constrained cgroup' handling, let's add to the Sandbox's cgroupsUpdate function. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-12 21:31:27 -07:00
Eric Ernst	b6cf68a985	cgroups: add ability to update CPUSet Add function for applying a cpuset change to a cgroup Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-12 21:31:27 -07:00
Eric Ernst	b812d4f7fa	virtcontainers: add method for calculating cpuset for sandbox Calculate sandbox's CPUSet as the union of each of the container's CPUSets. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-12 21:31:27 -07:00
Bin Liu	43da14e7b3	Merge pull request #752 from YchauWang/clear-moke-code01 runtime: Clear the VCMock 1.x API Methods from 2.0	2020-10-09 17:41:21 +08:00
Bo Chen	c33ee54a21	clh: Support VFIO device unplug This patch adds the support of VFIO device unplug when using cloud-hypervisor. Fixes: #860 Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-10-05 12:20:13 -07:00
Bo Chen	1f4dfa3166	clh: Remove unnecessary VmmPing We can rely on the error handling of the actual HTTP API calls to catch errors, and don't need to call VmmPing explicitly in advance. Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-10-05 12:17:45 -07:00


1239 Commits