kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-09-18 15:28:10 +00:00

Author	SHA1	Message	Date
Jianyong Wu	241c355e07	clh:arm64: use arm AMBA uart for hypervisor debug cloud hypervisor on arm64 only support arm AMBA UART(pl011) as tty. So, the console should be set to "ttyAMA0" instead of "ttyS0" when enable hypervisor debug mode. Fixes: #5080 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-09-15 01:44:23 +00:00
Jeremi Piotrowski	3a1db7a86b	runtime: clh: Support enabling iommu by enabling IOMMU on the default PCI segment. For hotplug to work we need a virtualized iommu and clh exposes one if there is some device or PCI segment that requests it. I would have preferred to add a separate PCI segment for hotplugging vfio devices but unfortunately kata assumes there is only one segment all over the place. See create_pci_root_bus_path(), split_vfio_pci_option() and grep for '0000'. Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on every device that is attached to it, which is why it is sprinkled all over the place. CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for that device. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	fc51e4b9eb	runtime: Check config for supported CLH (cold\|hot)_plug_vfio values The only supported options are hot_plug_vfio=root-port or no-port. cold_plug_vfio not supported yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Peng Tao	55ca7e8aec	Merge pull request #7907 from Xuanqing-Shi/7876/network-devices-naming-conflict runtime: Naming conflict of network devices	2023-09-13 19:29:41 +08:00
shixuanqing	1636abbe1c	runtime: issue with non-empty []Endpoint in RemoveEndpoints In the RemoveEndpoints(), when the endpoints paramete isn't empty, using idx may result in wrong endpoint removals. To improve, directly passing the endpoint parameter helps locate the correct elements within n.eps. Fixes: #7732 Signed-off-by: shixuanqing <1356292400@qq.com> Fixes: #7732 Signed-off-by: shixuanqing <1356292400@qq.com> Update src/runtime/virtcontainers/network_linux.go Co-authored-by: Xuewei Niu <justxuewei@apache.org>	2023-09-13 09:47:18 +00:00
Peng Tao	9766f9090c	Merge pull request #7719 from beraldoleal/nullable Remove gogoproto.nullable extension	2023-09-13 15:11:56 +08:00
shixuanqing	ca4b6b051d	runtime: Naming conflict of network devices When creating a new endpoint, we check existing endpoint names and automatically adjust the naming of the new endpoint to ensure uniqueness. Fixes: #7876 Signed-off-by: shixuanqing <1356292400@qq.com>	2023-09-12 04:29:51 +00:00
Fabiano Fidêncio	6cd5d83a37	Merge pull request #7865 from gkurz/fix-more-virtiofs-args runtime: Fix more virtiofs args	2023-09-09 21:30:16 +02:00
Greg Kurz	72c510d057	runtime/virtiofsd: Drop all references to "--cache=none" This syntax belongs to the legacy C virtiofsd implementation that we don't support anymore since kata-containers 3.1.3 because of other API breaking changes. People have been warned to switch from "none" to "never" since kata-containers 2.5.2. Let's officially do that. The compat code that would convert "none" to "never" isn't needed anymore. Just drop it. Fixes #7864 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-09-08 17:57:30 +02:00
Beraldo Leal	ead724bec1	protocol: removing gogo.nullable feature gogo.nullable is the main gogo.protobuf' feature used here. Since we are trying to remove gogo.protobuf, the first reasonable step seems to be remove this feature. This is a core update, and it will change how the structs are defined. I could spot only a few places using those structs, based on make check/build. Fixes #7723. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-09-08 11:49:01 -04:00
Peng Tao	435e890cd9	Merge pull request #7703 from bergwolf/github/nerdctl-fc runtime: run prestart hooks before starting VM for FC	2023-09-07 10:55:31 +08:00
Greg Kurz	81536f21af	runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr" The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated with the rust implementation. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-09-06 17:50:35 +02:00
Peng Tao	2e4c874726	runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure If we are running FC hypervisor, it is not started when prestart hooks are executed. So we should just ignore such error and just go ahead and run the hooks. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-30 03:06:11 +00:00
Peng Tao	21204caf20	runtime: fail early when starting docker container with FC FC does not support network device hotplug. Let's add a check to fail early when starting containers created by docker. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-30 02:52:01 +00:00
Peng Tao	32fd013716	runtime: run prestart hooks before starting VM for FC Add a new hypervisor capability to tell if it supports device hotplug. If not, we should run prestart hooks before starting new VMs as nerdctl is using the prestart hooks to set up netns. To make nerdctl + FC to work, we need to run the prestart hooks before starting new VMs. Fixes: #6384 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-30 02:52:01 +00:00
Fabiano Fidêncio	d1b54ede29	qemu: tdx: Workaround SMP issue with TDX 1.5 `...,sockets=1,cores=numvcpus,threads=1,...` must be used. Fixes: #7770 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-08-28 13:41:36 +02:00
Archana Shinde	1e34220c41	qemu: tdx: Adapt to the TDX 1.5 stack QEMU for TDX 1.5 makes use of private memory map/unmap. Make changes to govmm to support this. Support for private backing fd for memory is added as knob to the qemu config. Userspace's map/unmap operations are done by fallocate() ioctl on the backing store fd. Reference: https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/ Fixes: #7770 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-08-28 13:41:36 +02:00
Peng Tao	18d42da21e	runtime/fc: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-23 03:47:28 +00:00
Peng Tao	9fda7059a5	runtime/clh: fix image/initrd annotation handling We should make sure annotations are preferred over config options in image and initrd path handling. Fixes: #7705 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-23 03:47:28 +00:00
Peng Tao	1a0092d631	runtime/qemu: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Add a helper function ImageOrInitrdAssetPath to make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-23 03:47:27 +00:00
Fabiano Fidêncio	e107d1d94e	Merge pull request #7574 from microsoft/danmihai1/policy agent: runtime: add Agent Policy feature	2023-08-15 11:29:13 +02:00
Chelsea Mafrica	22465d22f0	Merge pull request #7638 from ManaSugi/fix/virtcontainers-doc docs: Remove installation step in virtcontainers doc	2023-08-14 10:21:57 -07:00
Dan Mihai	ab829d1038	agent: runtime: add the Agent Policy feature Fixes: #7573 To enable this feature, build your rootfs using AGENT_POLICY=yes. The default is AGENT_POLICY=no. Building rootfs using AGENT_POLICY=yes has the following effects: 1. The kata-opa service gets included in the Guest image. 2. The agent gets built using AGENT_POLICY=yes. After this patch, the shim calls SetPolicy if and only if a Policy annotation is attached to the sandbox/pod. When creating a sandbox/pod that doesn't have an attached Policy annotation: 1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses the default agent settings, that might include a default Policy too. 2. If the agent was built using AGENT_POLICY=no, the new sandbox is executed the same way as before this patch. Any SetPolicy calls from the shim to the agent fail if the agent was built using AGENT_POLICY=no. If the agent was built using AGENT_POLICY=yes: 1. The agent reads the contents of a default policy file during sandbox start-up. 2. The agent then connects to the OPA service on localhost and sends the default policy to OPA. 3. If the shim calls SetPolicy: a. The agent checks if SetPolicy is allowed by the current policy (the current policy is typically the default policy mentioned above). b. If SetPolicy is allowed, the agent deletes the current policy from OPA and replaces it with the new policy it received from the shim. A typical new policy from the shim doesn't allow any future SetPolicy calls. 4. For every agent rpc API call, the agent asks OPA if that call should be allowed. OPA allows or not a call based on the current policy, the name of the agent API, and the API call's inputs. The agent rejects any calls that are rejected by OPA. When building using AGENT_POLICY_DEBUG=yes, additional Policy logging gets enabled in the agent. In particular, information about the inputs for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM. These inputs can be useful for investigating API calls that might have been rejected by the Policy. Examples: 1. Load a failing policy file test1.rego on a different machine: opa run --server --addr 127.0.0.1:8181 test1.rego 2. Collect the API inputs from Guest's /tmp/policy.txt and test on the machine where the failing policy has been loaded: curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \ --data-binary @test1-inputs.json Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2023-08-14 17:07:35 +00:00
Manabu Sugimoto	416445e7eb	docs: Remove installation step in virtcontainers doc Remove the installation step in the virtcontainers doc because the virtcontainers install/uninstall targets have been removed by `86723b51ae` and they are not used anymore. Fixes: #7637 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2023-08-14 15:15:24 +09:00
Pradipta Banerjee	ab13ef87ee	runtime: propagate configmap/secrets etc changes for remote-hyp For remote hypervisor, the configmap, secrets, downward-api or project-volumes are copied from host to guest. This patch watches for changes to the host files and copies the changes to the guest. Note that configmap updates takes significantly longer than updates via downward-api. This is similar across runc and Kata runtimes. Fixes: #7210 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Signed-off-by: Julien Ropé <jrope@redhat.com> (cherry picked from commit `3081cd5f8e`) (cherry picked from commit 68ec673bc4d9cd853eee51b21a0e91fcec149aad)	2023-08-11 16:31:08 +01:00
Yohei Ueda	c074ec4df1	runtime: Copy shared files recursively This patch enables recursive file copying when filesystem sharing is not used. Signed-off-by: Yohei Ueda <yohei@jp.ibm.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> (cherry picked from commit `5422a056f2`) (cherry picked from commit 16055ce040bbd724be2916bc518d89b69c9e0ca5) Fixes: #7210	2023-08-11 16:16:52 +01:00
Manabu Sugimoto	cc922be5ec	versions: Update firecracker version to 1.4.0 This patch upgrades Firecracker version from v1.1.0 to v1.4.0. * Generate swagger models for v1.4.0 (from `firecracker.yaml`) - The version of go-swagger used is v0.30.0 * The firecracker v1.4.0 includes the following changes. - Added * Added support for custom CPU templates allowing users to adjust vCPU features exposed to the guest via CPUID, MSRs and ARM registers. * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU as Neoverse N1. * Added support for the virtio-rng entropy device. The device is optional. A single device can be enabled per VM using the /entropy endpoint. * Added a cpu-template-helper tool for assisting with creating and managing custom CPU templates. - Changed * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process. - Fixed * Fixed feature flags in T2S CPU template on Intel Ice Lake. * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host. * Fixed a performance regression in the jailer logic for closing open file descriptors. * A race condition that has been identified between the API thread and the VMM thread due to a misconfiguration of the api_event_fd. * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host. * Fixed passing through cache information from host in CPUID leaf 0x80000006. * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES MSR to 1 in accordance with an Intel microcode update. * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode update. * Fixed passing through cache information from host in CPUID leaf 0x80000005. * Fixed the T2A CPU template to disable SVM (nested virtualization). * Fixed the T2A CPU template to set EferLmsleUnsupported bit (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported. Fixes: #7610 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2023-08-10 16:48:13 +09:00
Wedson Almeida Filho	4fbe0a3a53	runtime: bind-mount mounted block device into container When the mounted block device isn't a layer, we want to mount it into containers, but since it's already mounted with the correct fs (e.g., tar, ext4, etc.) in the pod, we just bind-mount it into the container. Fixes: #7536 Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>	2023-08-03 17:58:39 -03:00
Wedson Almeida Filho	7e1b1949d4	runtime: add support for kata overlays When at least one `io.katacontainers.fs-opt.layer` option is added to the rootfs, it gets inserted into the VM as a layer, and the file system is mounted as an overlay of all layers using the overlayfs driver. Additionally, if the `io.katacontainers.fs-opt.block_device=file` option is present in a layer, it is mounted as a block device backed by a file on the host. Fixes: #7536 Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>	2023-08-03 17:58:39 -03:00
Zvonko Kaiser	cddcde1d40	vfio: Fix vfio device ordering If modeVFIO is enabled we need 1st to attach the VFIO control group device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.Sort the devices starting with device #1 being the VFIO control group device and the next the actuall device(s) /dev/vfio/<group> Fixes: #7493 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-31 11:26:27 +00:00
Zvonko Kaiser	1fc715bc65	s390x: Add AP Attach/Detach test Now that we have propper AP device support add a unit test for testing the correct Attach/Detach of AP devices. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-23 13:44:19 +00:00
Zvonko Kaiser	545de5042a	vfio: Fix tests Now with more elaborate checking of cold\|hot plug ports we needed to update some of the tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:44 +00:00
Zvonko Kaiser	62aa6750ec	vfio: Added better handling of VFIO Control Devices Depending on the vfio_mode we need to mount the VFIO control device additionally into the container. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:42 +00:00
Zvonko Kaiser	dd422ccb69	vfio: Remove obsolete HotplugVFIOonRootBus Removing HotplugVFIOonRootBus which is obsolete with the latest PCI topology changes, users can set cold_plug_vfio or hot_plug_vfio either in the configuration.toml or via annotations. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:25:40 +00:00
Zvonko Kaiser	114542e2ba	s390x: Fixing device.Bus assignment The device.Bus was reset if a specific combination of configuration parameters were not met. With the new PCIe topology this should not happen anymore Fixes: #7381 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:24:26 +00:00
Peng Tao	581be92b25	Merge pull request #4492 from zvonkok/pcie-topology runtime: fix PCIe topology for GPUDirect use-case	2023-07-03 09:17:12 +08:00
Fabiano Fidêncio	6a21e20c63	runtime: Add "none" as a shared_fs option Currently, even when using devmapper, if the VMM supports virtio-fs / virtio-9p, that's used to share a few files between the host and the guest. This needed, as we need to share with the guest contents like secrets, certificates, and configurations, via Kubernetes objects like configMaps or secrets, and those are rotated and must be updated into the guest whenever the rotation happens. However, there are still use-cases users can live with just copying those files into the guest at the pod creation time, and for those there's absolutely no need to have a shared filesystem process running with no extra obvious benefit, consuming memory and even increasing the attack surface used by Kata Containers. For the case mentioned above, we should allow users, making it very clear which limitations it'll bring, to run Kata Containers with devmapper without actually having to use a shared file system, which is already the approach taken when using Firecracker as the VMM. Fixes: #7207 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-30 20:45:00 +02:00
Greg Kurz	a43ea24dfc	virtiofsd: Convert legacy `-o` sub-options to their `--` replacement The `-o` option is the legacy way to configure virtiofsd, inherited from the C implementation. The rust implementation honours it for compatibility but it logs deprecation warnings. Let's use the replacement options in the go shim code. Also drop references to `-o` from the configuration TOML file. Fixes #7111 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:54 +02:00
Greg Kurz	8e00dc6944	virtiofsd: Drop `-o no_posix_lock` The C implementation of virtiofsd had some kind of limited support for remote POSIX locks that was causing some workflows to fail with kata. Commit `432f9bea6e` hard coded `-o no_posix_lock` in order to enforce guest local POSIX locks and avoid the issues. We've switched to the rust implementation of virtiofsd since then, but it emits a warning about `-o` being deprecated. According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 : The C implementation of the daemon has limited support for remote POSIX locks, restricted exclusively to non-blocking operations. We tried to implement the same level of functionality in #2, but we finally decided against it because, in practice most applications will fail if non-blocking operations aren't supported. Implementing support for non-blocking isn't trivial and will probably require extending the kernel interface before we can even start working on the daemon side. There is thus no justification to pass `-o no_posix_lock` anymore. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:39 +02:00
Greg Kurz	2a15ad9788	virtiofsd: Stop using deprecated `-f` option The rust implementation of virtiofsd always runs foreground and spits a deprecation warning when `-f` is passed. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 10:30:40 +02:00
Zvonko Kaiser	72f2cb84e6	gpu: Reset cold or hot plug after overriding If we override the cold, hot plug with an annotation we need to reset the other plugging mechanism to NoPort otherwise both will be enabled. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:51:01 +00:00
Zvonko Kaiser	fbacc09646	gpu: PCIe topology, consider vhost-user-block in Virt In Virt the vhost-user-block is an PCIe device so we need to make sure to consider it as well. We're keeping track of vhost-user-block devices and deduce the correct amount of PCIe root ports. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:39:55 +00:00
Zvonko Kaiser	b11246c3aa	gpu: Various fixes for virt machine type The PCI qom path was not deduced correctly added regex for correct path walking. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:33:57 +00:00
Zvonko Kaiser	40101ea7db	vfio: Added annotation for hot(cold) plug Now it is possible to configure the PCIe topology via annotations and addded a simple test, checking for Invalid and RootPort Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	8f0d4e2612	vfio: Cleanup of Cold and Hot Plug Removed the configuration of PCIeRootPort and PCIeSwitchPort, those values can be deduced in createPCIeTopology Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b5c4677e0e	vfio: Rearrange the bus assignemnt Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup can be used by any module without affecting the topology. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b1aa8c8a24	gpu: Moved the PCIe configs to drivers The hypervisor_state file was the wrong location for the PCIe Port settings, moved everything under device umbrella, where it can be consumed more easily and we do not get into circular deps. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	55a66eb7fb	gpu: Add config to TOML Update cold-plug and hot-plug setting to include bridge, root and switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	da42801c38	gpu: Add config settings tests for hot-plug Updated all references and config settings for hot-plug to match cold-plug Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	de39fb7d38	runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology Fixes: #4491 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00

... 3 4 5 6 7 ...

1248 Commits