kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-08-15 06:34:03 +00:00

Author	SHA1	Message	Date
Wainer dos Santos Moschetta	309dae631a	virtcontainers: check that both initrd and image are not set This changed valid() in hypervisor to check the case where both initrd and image path are set; in this case it returns an error. Fixes #1868 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2021-10-26 10:44:23 -04:00
James O. D. Hunt	3120b489e3	Merge pull request #2687 from genjuro214/improve-oci-to-grpc agent-ctl: improve the oci_to_grpc code	2021-10-26 13:00:02 +01:00
James O. D. Hunt	a10cfffdff	forwarder: Fix changing log level Fix `-l <log-level>` for the trace forwarder which didn't work previously as it lacked the magic Cargo configuration. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-26 11:02:06 +01:00
James O. D. Hunt	6abccb92ce	forwarder: Drop privileges when using hybrid VSOCK Hybrid VSOCK requires `root` privileges to access the sandbox-specific host-side AF_UNIX socket created by the hypervisor (CLH or FC). However, once the socket has been bound, privileges can be dropped, allowing the forwarder to run as user `nobody`. Fixes: #2905. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-26 11:01:58 +01:00
Bin Liu	8d8604e10f	Merge pull request #2893 from liubin/fix/2892-print-error-instead-of-return agent: do not return error but print it if task wait failed	2021-10-26 17:48:17 +08:00
Lei Li	bf00b8df87	agent-ctl: improve the oci_to_grpc code The oci_to_grpc function handles only some of the OCI fields; others, such as process.env, process.capabilities, and mounts, are not copied from the OCI spec to the gRPC spec. Implement handling to convert those fields as well. Fixes #2686 Signed-off-by: Lei Li <cdlleili@cn.ibm.com>	2021-10-26 16:54:28 +08:00
James O. D. Hunt	b67fa9e450	forwarder: Make explicit root check Rather than generating a potentially misleading error message if the socket bind fails, perform an explicit check for `root` for Hybrid VSOCK. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-26 09:28:26 +01:00
James O. D. Hunt	e377578e08	forwarder: Fix docs socket path Updated the trace forwarder README to ensure the real socket path is created, not the template socket path returned by `kata-runtime env`. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2021-10-26 09:28:26 +01:00
James O. D. Hunt	d1d9e84e9f	Merge pull request #2902 from liubin/fix/2901-delete-duplicated-line virtcontainers: delete duplicated notify in watchHypervisor function	2021-10-26 08:22:11 +01:00
bin	5f306330f4	virtcontainers: delete duplicated notify in watchHypervisor function When hypervisor check failed, the notify function is called twice. Fixes: #2901 Signed-off-by: bin <bin@hyper.sh>	2021-10-26 11:58:26 +08:00
bin	5f5eca6b8e	agent: do not return error but print it if task wait failed Do not return error but print it if task wait failed and let program continue to run the next code. Fixes: #2892 Signed-off-by: bin <bin@hyper.sh>	2021-10-26 11:43:39 +08:00
Jakob Naucke	d2a7b6ff4a	packaging/static-build: s390x fixes - Install OpenSSL for key generation in kernel build - Do not install libpmem - Do not exclude `/share//*.img` files in QEMU tarball since among them are boot loader files critical for IPLing. Fixes: #2895 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2021-10-25 18:47:35 +02:00
Yujia Qiao	6cc8000cae	cli: Show available guest protection in env output Show available guest protections in the `kata-runtime env` output. Also bump the formatVersion. Fixes: #1982 Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-10-25 21:44:56 +08:00
Yujia Qiao	2063b13805	virtcontainers: Add func AvailableGuestProtections Add functions to return guestProtection as a string slice, which can be then used in `kata-runtime env` output. Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>	2021-10-25 21:44:01 +08:00
Fupan Li	3d0fe433c6	Merge pull request #2889 from lht/handle-uevent-remove-actions agent: Handle uevent remove actions	2021-10-25 19:08:20 +08:00
James O. D. Hunt	ec3aa1694b	Merge pull request #2844 from jongwu/unit_test enable unit test on arm	2021-10-25 10:58:21 +01:00
Bin Liu	01fdeb7641	Merge pull request #2891 from ManaSugi/fix/unify-form rustjail: Consistent coding style of LinuxDevice type	2021-10-25 14:03:03 +08:00
Bin Liu	ded864f862	Merge pull request #2568 from Bevisy/main-2254 cli: Fix outdated kata-runtime bash completion	2021-10-25 14:02:13 +08:00
Haitao Li	a13e2f77b8	agent: Handle uevent remove actions uevents with action=remove were ignored, causing the agent to reuse stale data in the device map. This patch adds handling of such uevents. Fixes #2405 Signed-off-by: Haitao Li <lihaitao@gmail.com>	2021-10-25 14:41:32 +11:00
David Gibson	a0825badf6	Merge pull request #2795 from dgibson/vfio-as-vfio Allow VFIO devices to be used as VFIO devices in the container	2021-10-25 14:25:26 +11:00
Peng Tao	e709f11229	Merge pull request #2881 from mcastelino/topic/hypervisor-rename Expose top level hypervisor methods -	2021-10-25 10:25:49 +08:00
David Gibson	34273da98f	runtime/device: Allow VFIO devices to be presented to guest as VFIO devices On a conventional (e.g. runc) container, passing in a VFIO group device, /dev/vfio/NN, will result in the same VFIO group device being available within the container. With Kata, however, the VFIO device will be bound to the guest kernel's driver (if it has one), possibly appearing as some other device (or a network interface) within the guest. This adds a new `vfio_mode` option to alter this. If set to "vfio" it will instruct the agent to remap VFIO devices to the VFIO driver within the guest as well, meaning they will appear as VFIO devices within the container. Unlike a runc container, the VFIO devices will have different names to the host, since the names correspond to the IOMMU groups of the guest and those can't be remapped with namespaces. For now we keep 'guest-kernel' as the value in the default configuration files, to maintain current Kata behaviour. In future we should change this to 'vfio' as the default. That will make Kata's default behaviour more closely resemble OCI specified behaviour. fixes #693 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:31 +11:00
David Gibson	68696e051d	runtime: Add parameter to constrainGRPCSpec to control VFIO handling Currently constrainGRPCSpec always removes VFIO devices from the OCI container spec which will be used for the inner container. For upcoming support for VFIO devices in DPDK usecases we'll need to not do that. As a preliminary to that, add an extra parameter to the function to control whether or not it will remove the VFIO devices from the spec. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:31 +11:00
David Gibson	d9e2e9edb2	runtime: Rename constraintGRPCSpec to improve grammar "constraint" is a noun, "constrain" is the associated verb, which makes more sense in this context. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:31 +11:00
David Gibson	57ab408576	runtime: Introduce "vfio_mode" config variable and annotation In order to support DPDK workloads, we need to change the way VFIO devices will be handled in Kata containers. However, the current method, although it is not remotely OCI compliant has real uses. Therefore, introduce a new runtime configuration field "vfio_mode" to control how VFIO devices will be presented to the container. We also add a new sandbox annotation - io.katacontainers.config.runtime.vfio_mode - to override this on a per-sandbox basis. For now, the only allowed value is "guest-kernel" which refers to the current behaviour where VFIO devices added to the container will be bound to whatever driver in the VM kernel claims them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:29:29 +11:00
David Gibson	730b9c433f	agent/device: Create device nodes for VFIO devices Add and adjust the vfio devices in the inner container spec so that rustjail will create device nodes for them. In order to do that, we also need to make sure the VFIO device node is ready within the guest VM first. That may take (slightly) longer than just the underlying PCI device(s) being ready, because vfio-pci needs to initialize. So, add a helper function that will wait for a specific VFIO device node to be ready, using the existing uevent listening mechanism. It also returns the device node name for the device (though in practice it will always be /dev/vfio/NN where NN is the group number). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	175f9b06e9	rustjail: Allow container devices in subdirectories Many device nodes go directly under /dev, however some are conventionally placed in subdirectories under /dev. For example /dev/vfio/vfio or /dev/pts/ptmx. Currently, attempting to pass such a device into a Kata container will fail because mknod() will get an ENOENT because the parent directory is missing (or an equivalent error for bind_dev()). Correct that by making subdirectories as necessary in create_devices(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	9891efc61f	rustjail: Correct sanity checks on device path For each user supplied device, create_devices() checks that the given path actually is in /dev, by checking that its path starts with /dev and does not contain "..". However, this has subtle errors because it's interpreting the path as a raw string without considering separators. It will accept the path /devfoo which it should not, while it will not accept the valid (though weird) paths /dev/... and /dev/a..b. Correct this by using std::path::Path methods designed for the purpose. Having done this, it's trivial to also generate the relative path that mknod_dev() or bind_dev() will need, so do that at the same time. We also move this logic into a helper function so that we can add some unit tests for it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	d6b62c029e	rustjail: Change mknod_dev() and bind_dev() to take relative device path Both these functions take the absolute path from LinuxDevice and drop the leading '/' to make a relative path. They do that with a simple &dev.path[1..]. That can be technically incorrect in some edge cases such as a path with redundant /s like "//dev//sda". To handle cases like that, have the explicit relative path passed into these functions. For now we calculate it in the same buggy way, but we'll fix that shortly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	2680c0bfee	rustjail: Provide useful context on device node creation errors create_devices() within the rustjail module is responsible for creating device nodes within the (inner) containers. Errors that occur here will be propagated up, but are likely to be low level failures of mknod() - e.g. ENOENT or EACCES - which won't be very useful when reported all the way up to the runtime without the context of what we were trying to do. Add some anyhow context information giving the details of the device we were trying to create when it failed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	42b92b2b05	agent/device: Allow container devname to differ from the host Currently, update_spec_device() assumes that the proper device path in the (inner) container is the same as the device path specified in the outer OCI spec on the host. Usually that's correct. However for VFIO group devices we actually need the container to see the VM's device path, since it's normal to correlate that with IOMMU group information from sysfs which will be different in the guest and which we can't namespace away. So, add an extra "final_path" parameter to update_spec_device() to allow callers to choose the device path that should be used for the inner container. All current callers pass the same thing as container_path, but that will change in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	827a41f973	agent/device: Refactor update_spec_device_list() update_spec_device_list() is used to update the container configuration to change device major/minor numbers configured by the Kata client based on host details to values suitable for the sandbox VM, which may differ. It takes a 'device' object, but the only things it actually uses from there are container_path and vm_path. Refactor this as update_spec_device(), taking the host and guest paths to the device as explicit parameters. This makes the function more self-contained and will enable some future extensions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	8ceadcc5a9	agent/device: Sanity check guest IOMMU groups Each VFIO device passed into the guest could represent a whole IOMMU group of devices on the host. Since these devices aren't DMA isolated from each other, they must appear as the same IOMMU group in the guest as well. The VMM should enforce that for us, but double check it, since things can't work otherwise. This also means we determine the guest IOMMU group for the VFIO device, which we'll be needing later. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	ff59db7534	agent/device: Add function to get IOMMU group for a PCI device For upcoming VFIO extensions we'll need to work with the IOMMU groups of VFIO devices. This helps us towards that by adding pci_iommu_group() to retrieve the IOMMU group (if any) of a given PCI device. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	13b06a35d5	agent/device: Rebind VFIO devices to VFIO driver inside guest VFIO devices can be added to a Kata container and they will be passed through to the sandbox guest. However, inside the guest those devices will bind to a native guest driver, so they will no longer appear as VFIO devices within the guest. This behaviour differs from runc or other conventional container runtimes. This code allows the agent to match the behaviour of other runtimes, if instructed to by kata-runtime. VFIO devices it's informed about with the "vfio" type instead of the existing "vfio-gk" type will be rebound to the vfio-pci driver within the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
David Gibson	e22bd78249	agent/device: Add helper function for binding a guest device to a driver For better VFIO support, we're going to need to take control of which guest driver controls specific guest devices. To assist with that, add the pci_driver_override() function to force a specific guest device to be bound to a specific guest driver. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-25 12:28:33 +11:00
Manabu Sugimoto	b40eedc9f7	rustjail: Consistent coding style of LinuxDevice type Use `"c".to_string` for the device type of `dev/full` in order to be consistent with the coding style of other devices. Fixes: #2890 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2021-10-25 09:15:59 +09:00
Jianyong Wu	57c0f93f54	agent: fix race condition when test watcher The create_tmpfs test fails because of a race condition in the watcher unmount. Quoting James's explanation: 1. Rust runs all tests in parallel. 2. Mounts are a process-wide, not a per-thread resource. The only test that calls watcher.mount() is create_tmpfs(). However, other tests create BindWatcher objects. 3. BindWatcher's drop() implementation calls self.cleanup(), which calls unmount for the mountpoint create_tmpfs() asserts. 4. The other tests are calling unmount whenever a BindWatcher goes out of scope. To avoid that issue, let the tests using BindWatcher in watcher and sandbox.rs run sequentially. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-24 17:31:53 +08:00
Jianyong Wu	1a96b8ba35	template: disable template unit test on arm Template is broken on arm. here we disable the template unit test temporarily. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-23 15:07:25 +08:00
Jianyong Wu	43b13a4a6d	runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs DefaultMaxVCPUs may be larger than defaultMaxQemuVCPUs; this should be checked and avoided. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-23 15:07:25 +08:00
Jianyong Wu	c59c36732b	runtime: current vcpu number should be limited The physical current vcpu number should not be used directly as the largest vcpu number is limited to defaultMaxQemuVCPUs. Here, a new helper is introduced in pkg/katautils/config.go to get current vcpu number. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-23 15:07:25 +08:00
Jianyong Wu	fa922517d9	runtime: kernel version with '+' as suffix panic in parse The current kernel version parse lib can't process suffix '+', as the modified kernel version will add '+' as suffix, thus panic will occur. For example, if the current kernel version is "5.14.0-rc4+", test TestHostNetworkingRequested will panic: --- FAIL: TestHostNetworkingRequested (0.00s) panic: &{DistroName:ubuntu DistroVersion:18.04 KernelVersion:5.11.0-rc3+ Issue: Passed:[] Failed:[] Debug:true ActualEUID:0}: failed to check test constraints: error: Build meta data is empty Here, remove the suffix '+' in kernel version fix helper. Fixes: #2809 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-10-23 15:07:25 +08:00
Manohar Castelino	52268d0ece	hypervisor: Expose the hypervisor itself Export the top level hypervisor type s/hypervisor/Hypervisor Fixes: #2880 Signed-off-by: Manohar Castelino <mcastelino@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:46:02 -07:00
Eric Ernst	a72bed5b34	hypervisor: update tests based on createSandbox->CreateVM change Fixup a couple of broken tests. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	f434bcbf6c	hypervisor: createSandbox is CreateVM Last of a series of commits to export the top level hypervisor generic methods. s/createSandbox/CreateVM Fixes #2880 Signed-off-by: Manohar Castelino <mcastelino@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	76f1ce9e30	hypervisor: startSandbox is StartVM s/startSandbox/StartVM Signed-off-by: Manohar Castelino <mcastelino@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	fd24a695bf	hypervisor: waitSandbox is waitVM renaming... Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	a6385c8fde	hypervisor: stopSandbox is StopVM Renaming. There is no Sandbox specific logic except tracing. Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	f989078cd2	hypervisor: resumeSandbox is ResumeVM renaming... Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Manohar Castelino	73b4f27c46	hypervisor: saveSandbox is SaveVM rename Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
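
Commit 9891efc61f above describes replacing a raw string prefix check on device paths with std::path::Path methods, and deriving the relative path needed by mknod_dev() or bind_dev() in the same step. The following is a minimal sketch of that idea, not the code from the commit: the function name dev_relative_path and the choice to reject /dev itself are assumptions made for illustration.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not the rustjail implementation): returns the path
/// relative to "/" when `path` names something inside /dev, else None.
fn dev_relative_path(path: &str) -> Option<PathBuf> {
    let p = Path::new(path);

    // Path::starts_with compares whole components, so "/devfoo" does not
    // match the prefix "/dev" the way a raw string comparison would.
    if !p.starts_with("/dev") {
        return None;
    }

    // Reject only real ".." components; names such as "..." or "a..b"
    // are ordinary file names and are accepted.
    if p.components().any(|c| c == Component::ParentDir) {
        return None;
    }

    // Assumption: "/dev" on its own is not a usable device path.
    if p == Path::new("/dev") {
        return None;
    }

    // Drop the leading root to get a path relative to the container rootfs.
    p.strip_prefix("/").ok().map(Path::to_path_buf)
}
```

Under these assumptions, dev_relative_path("/dev/vfio/vfio") yields the relative path dev/vfio/vfio, while "/devfoo" and "/dev/../etc/passwd" are rejected. Because Path comparisons work on components, a path with redundant separators such as "//dev//sda" is treated the same as "/dev/sda", which is the edge case commit d6b62c029e mentions for the earlier &dev.path[1..] slicing.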


7230 Commits