kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-08-24 10:41:43 +00:00

Author	SHA1	Message	Date
Julien Ropé	9de65707ca	runtime: stop reporting net dev metrics for the shim As part of the shim network metrics, the shim is reporting network interfaces from the host with no namespace isolation - this gives insight in interfaces not tied to the kata containers, and causes an increase in resource usage for kata metrics. As the shim itself is not using the network (all its communication with other processes is done with local unix sockets), there is no reason to keep gathering and reporting shim-specific network metrics. Actual network usage of the kata containers can be found from the existing hypervisor network metrics (kata_hypervisor_netdev) and from the agent network metrics (kata_guest_netdev_stat). Fixes: #5738 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-02-22 14:00:00 +01:00
Alex Lyn	cf74166d75	Merge pull request #9015 from Apokleos/bugfix-exec-uds runtime: display accurate error msg to avoid misleading users.	2024-02-05 13:50:43 +08:00
Alex Lyn	c6830ceb89	runtime: display accurate error msg to avoid misleading users. The original handling method does not reach user expectations. When the ClientSocketAddress method stats the corresponding path of runtime-rs and has not found it yet, we should return an error message here that includes the reason for the failure (which should be an error display indicating that both runtime-go and runtime-rs were not found). Instead of simply displaying the corresponding path of runtime-rs as the final error message to users. It is also necessary to return the error promptly to the caller for further error handling. Fixes: #8999 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-02-04 16:45:59 +08:00
Guoqiang Ding	7bf1ebe16d	kata-monitor: fix agentUrl from containerd shim Fix the missing leading slash. Fixes: #9013 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-02-04 16:24:13 +08:00
Zhigang Wang	9317e23df1	mount: Reduce the mount points with namespace isolation This patch can reduce load on systemd process, and increase the k8s deployment density when using go runtime. Fixes: #8758 Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com> Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2024-02-01 18:34:24 +08:00
yaoyinnan	b0b8523cea	runtime: modify ValidCgroupPath unit test Modify ValidCgroupPath unit test. Fixes: #8930 Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2024-01-31 14:37:17 +08:00
yaoyinnan	feed5c8ff9	runtime: merged ValidCgroupPath method Merged ValidCgroupPath method to handle cgroupv1 and cgroupv2. Fixes: #8930 Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2024-01-31 14:37:13 +08:00
Kvlil	a4b208a712	runtime: remove SharedVersions field dead code SharedVersion fiel add a versiontable property that isn't supported by upstream QEMU. This is dead code since virtcontainers isn't setting SharedVersions to true. Fixes: #7720 Signed-off-by: Kvlil <kalil.pelissier@gmail.com>	2024-01-22 12:18:42 +00:00
ChengyuZhu6	5318afe273	runtime: support to create VirtualVolume rootfs storages 1) Creating storage for all `io.katacontainers.volume=` messages in rootFs.Options, and then aggregates all storages into `containerStorages`. 2) Creating storage for other data volumes and push them into `volumeStorages`. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2023-11-23 23:22:55 +08:00
Pradipta Banerjee	39e8c84269	runtime: Add support for key annotations to remote hyp In order to support different pod VM instance type via remote hypervisor implementation (cloud-api-adaptor), we need to pass machine_type, default_vcpus and default_memory annotations to cloud-api-adaptor. The cloud-api-adaptor then uses these annotations to spin up the appropriate cloud instance. Reference PR for cloud-api-adaptor https://github.com/confidential-containers/cloud-api-adaptor/pull/1088 Fixes: #7140 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> (based on commit `004f07f076`)	2023-11-17 13:33:27 +00:00
stevenhorsman	26d56678a9	config: Add initial remote hypervisor config - Remote hypervisor template config - Add annotation enablement for machine_type, default_memory and default_vcpus for flexible instance types Fixes: #6349 Signed-off-by: stevenhorsman <steven@uk.ibm.com> (based on commits `7c9a791d67` and `335a456425`)	2023-11-17 13:33:24 +00:00
stevenhorsman	ad63439a3e	runtime: Update the remote hypervisor config Add the SELinux setting to ensure it is passed through to the remote hypervisor Fixes: #5936 Signed-off-by: stevenhorsman <steven@uk.ibm.com> (based on commit `3ef2fd1784`)	2023-11-17 13:32:52 +00:00
Lei Li	50e0d43dad	runtime: Support privileged containers in peer pod VM This patch fixes the issue of running containers with privileged as true. See the discussion at this URL for the details. https://github.com/confidential-containers/cloud-api-adaptor/issues/111 Signed-off-by: Lei Li <cdlleili@cn.ibm.com> (based on commit `c3e6b66051`)	2023-11-17 13:32:52 +00:00
Yohei Ueda	57d4dd8e57	runtime: Support the remote hypervisor type This patch adds the support of the remote hypervisor type. Shim opens a Unix domain socket specified in the config file, and sends TTPRC requests to a external process to control sandbox VMs. Fixes #4482 Co-authored-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Signed-off-by: Yohei Ueda <yohei@jp.ibm.com> (based on commit `f9278f22c3`)	2023-11-17 13:32:49 +00:00
Liu Wenyuan	9542211e71	configuration: add configuration for StratoVirt hypervisor. Add configuration-stratovirt.toml.in to generate the StratoVirt configuration, and parser to deliver config to StratoVirt. Fixes: #7794 Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2023-11-16 20:47:26 +08:00
Fabiano Fidêncio	5e9cf75937	vc: utils: Rename CalculateMilliCPUs() to CalculateCPUsF() With the change done in the last commit, instead of calculating milli cpus, we're actually converting the CPUs to a fraction number, a float. Let's update the function name (and associated vars) to represent that change. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-11-10 18:26:01 +01:00
Fabiano Fidêncio	e477ed0e86	runtime: Improve vCPU allocation for the VMMs First of all, this is a controversial piece, and I know that. In this commit we're trying to make a less greedy approach regards the amount of vCPUs we allocate for the VMM, which will be advantageous mainly when using the `static_sandbox_resource_mgmt` feature, which is used by the confidential guests. The current approach we have basically does: * Gets the amount of vCPUs set in the config (an integer) * Gets the amount of vCPUs set as limit (an integer) * Sum those up * Starts / Updates the VMM to use that total amount of vCPUs The fact we're dealing with integers is logical, as we cannot request 500m vCPUs to the VMMs. However, it leads us to, in several cases, be wasting one vCPU. Let's take the example that we know the VMM requires 500m vCPUs to be running, and the workload sets 250m vCPUs as a resource limit. In that case, we'd do: * Gets the amount of vCPUs set in the config: 1 * Gets the amount of vCPUs set as limit: ceil(0.25) * 1 + ceil(0.25) = 1 + 1 = 2 vCPUs * Starts / Updates the VMM to use 2 vCPUs With the logic changed here, what we're doing is considering everything as float till just before we start / update the VMM. So, the flow describe above would be: * Gets the amount of vCPUs set in the config: 0.5 * Gets the amount of vCPUs set as limit: 0.25 * ceil(0.5 + 0.25) = 1 vCPUs * Starts / Updates the VMM to use 1 vCPUs In the way I've written this patch we introduce zero regressions, as the default values set are still the same, and those will only be changed for the TEE use cases (although I can see firecracker, or any other user of `static_sandbox_resource_mgmt=true` taking advantage of this). There's, though, an implicit assumption in this patch that we'd need to make explicit, and that's that the default_vcpus / default_memory is the amount of vcpus / memory required by the VMM, and absolutely nothing else. Also, the amount set there should be reflected in the podOverhead for the specific runtime class. One other possible approach, which I am not that much in favour of taking as I think it's less clear, is that we could actually get the podOverhead amount, subtract it from the default_vcpus (treating the result as a float), then sum up what the user set as limit (as a float), and finally ceil the result. It could work, but IMHO this is less clear, and less explicit on what we're actually doing, and how the default_vcpus / default_memory should be used. Fixes: #6909 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2023-11-10 18:25:57 +01:00
Beraldo Leal	7641c19f74	runtime: bump containerd for gogo deprecation This update includes necessary changes due to the version bump of containerd and its dependencies. It's part of a broader initiative to phase out gogo protobuf, which has been deprecated, and to align with the current supported libraries. Fixes #7420. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:59 +00:00
Beraldo Leal	16fa2c39e6	protocols: replace gogo/types.Empty and Any by Google versions. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
James O. D. Hunt	3e8cf6959c	runtime: Validate hypervisor section name in config file Previously, if you accidentally modified the name of the hypervisor section in the config file, the default golang runtime gives a cryptic error message ("`VM memory cannot be zero`"). This can be demonstrated using the `kata-runtime` utility program which uses the same golang config package as the actual runtime (`containerd-shim-kata-v2`): ```bash $ kata-runtime env >/dev/null; echo $? 0 $ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml $ kata-runtime env >/dev/null; echo $? VM memory cannot be zero 1 ``` The hypervisor name is now validated so that the behaviour becomes: ```bash $ kata-runtime env >/dev/null; echo $? 0 $ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml $ ./kata-runtime env >/dev/null; echo $? /etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo" 1 ``` Fixes: #8212. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-10-12 13:53:37 +01:00
Zvonko Kaiser	7c934dc7da	gpu: Fix cold-plug of VFIO devices We need to do proper sandbox sizing when we're doing cold-plug introduce CDI, the de-facto standard for enabling devices in containers. containerd will pass-through annotations for accumulated CPU,Memory and now CDI devices. With that information sandbox sizing can be derived correctly. Fixes: #7331 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-09-28 09:49:13 +00:00
Alexandru Matei	2ca781518a	clh: Direct IO support for block devices Clh suports direct i/o for disks. It doesn't offer any support for noflush, removed passing of option to cloud-hypervisor internal config Fixes: #7798 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2023-09-21 14:48:24 +03:00
Dan Mihai	82ff2db460	runtime: support kernel params including spaces Support quoted kernel command line parameters that include space characters. Example: dm-mod.create="dm-verity,,,ro,0 736328 verity 1 /dev/vda1 /dev/vda2 4096 4096 92041 0 sha256 f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4 065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038" Fixes: #8003 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2023-09-19 20:26:38 +00:00
Greg Kurz	1f16b6627b	runtime/qemu: Rework QMP/HMP support PR #6146 added the possibility to control QEMU with an extra HMP socket as an aid for debugging. This is great for development or bug chasing but this raises some concerns in production. The HMP monitor allows to temper with the VM state in a variety of ways. This could be intentionally or mistakenly used to inject subtle bugs in the VM that would be extremely hard if not even impossible to debug. We definitely don't want that to be enabled by default. The feature is currently wired to the `enable_debug` setting in the `[hypervisor.qemu]` section of the configuration file. This setting has historically been used to control "debug output" and it is used as such by some downstream users (e.g. Openshift). Forcing people to have the extra HMP backdoor at the same time is abusive and dangerous. A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give fine control on whether the HMP socket is wanted or not. This setting is still gated by `enable_debug = true` to make it clear it is for debug only. The default is to not have the HMP socket though. This isn't backward compatible with #6416 but it is for the sake of "better safe than sorry". An extra monitor socket makes the QEMU instance untrusted. A warning is thus logged to the journal when one is requested. While here, also allow the user to choose between HMP and QMP for the extra monitor socket. Motivation is that QMP offers way more options to control or introspect the VM than HMP does. Users can also ask for pretty json formatting well suited for human reading. This will improve the debugging experience. This feature is only made visible in the base and GPU configurations of QEMU for now. Fixes #7952 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-09-18 12:13:01 +02:00
Jeremi Piotrowski	bfc93927fb	runtime: Remove redundant check in checkPCIeConfig There is no way for this branch to be hit, as port is only set when it is different than config.NoPort. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	7c4e73b609	runtime: Add test cases for checkPCIeConfig These test cases shows which options are valid for CLH/Qemu, and test that we correctly catch unsupported combinations. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	fc51e4b9eb	runtime: Check config for supported CLH (cold\|hot)_plug_vfio values The only supported options are hot_plug_vfio=root-port or no-port. cold_plug_vfio not supported yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	c2ba29c15b	runtime: Fix data race in ioCopy IoCopy is a tricky function (I don't claim to fully understand its contract), but here is what I see: The goroutine that runs it spawns 3 goroutines - one for each stream to handle (stdin/stdout/stderr). The goroutine then waits for the stream goroutines to exit. The idea is that when the process exits and is closed, the stdout goroutine will be unblocked and close stdin - this should unblock the stdin goroutine. The stderr goroutine will exit at the same time as the stdout goroutine. The iocopy routine then closes all tty.io streams. The problem is that the stdout goroutine decrements the WaitGroup before closing the stdin stream, which causes the iocopy goroutine to race to close the streams. Move the wg.Done() of the stdout routine past the close so that this race becomes impossible. I can't guarantee that this doesn't affect some unspecified behavior. Fixes: #5031 Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-08-31 10:17:38 +02:00
Archana Shinde	1e34220c41	qemu: tdx: Adapt to the TDX 1.5 stack QEMU for TDX 1.5 makes use of private memory map/unmap. Make changes to govmm to support this. Support for private backing fd for memory is added as knob to the qemu config. Userspace's map/unmap operations are done by fallocate() ioctl on the backing store fd. Reference: https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/ Fixes: #7770 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-08-28 13:41:36 +02:00
Dan Mihai	ab829d1038	agent: runtime: add the Agent Policy feature Fixes: #7573 To enable this feature, build your rootfs using AGENT_POLICY=yes. The default is AGENT_POLICY=no. Building rootfs using AGENT_POLICY=yes has the following effects: 1. The kata-opa service gets included in the Guest image. 2. The agent gets built using AGENT_POLICY=yes. After this patch, the shim calls SetPolicy if and only if a Policy annotation is attached to the sandbox/pod. When creating a sandbox/pod that doesn't have an attached Policy annotation: 1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses the default agent settings, that might include a default Policy too. 2. If the agent was built using AGENT_POLICY=no, the new sandbox is executed the same way as before this patch. Any SetPolicy calls from the shim to the agent fail if the agent was built using AGENT_POLICY=no. If the agent was built using AGENT_POLICY=yes: 1. The agent reads the contents of a default policy file during sandbox start-up. 2. The agent then connects to the OPA service on localhost and sends the default policy to OPA. 3. If the shim calls SetPolicy: a. The agent checks if SetPolicy is allowed by the current policy (the current policy is typically the default policy mentioned above). b. If SetPolicy is allowed, the agent deletes the current policy from OPA and replaces it with the new policy it received from the shim. A typical new policy from the shim doesn't allow any future SetPolicy calls. 4. For every agent rpc API call, the agent asks OPA if that call should be allowed. OPA allows or not a call based on the current policy, the name of the agent API, and the API call's inputs. The agent rejects any calls that are rejected by OPA. When building using AGENT_POLICY_DEBUG=yes, additional Policy logging gets enabled in the agent. In particular, information about the inputs for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM. These inputs can be useful for investigating API calls that might have been rejected by the Policy. Examples: 1. Load a failing policy file test1.rego on a different machine: opa run --server --addr 127.0.0.1:8181 test1.rego 2. Collect the API inputs from Guest's /tmp/policy.txt and test on the machine where the failing policy has been loaded: curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \ --data-binary @test1-inputs.json Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2023-08-14 17:07:35 +00:00
Wedson Almeida Filho	7e1b1949d4	runtime: add support for kata overlays When at least one `io.katacontainers.fs-opt.layer` option is added to the rootfs, it gets inserted into the VM as a layer, and the file system is mounted as an overlay of all layers using the overlayfs driver. Additionally, if the `io.katacontainers.fs-opt.block_device=file` option is present in a layer, it is mounted as a block device backed by a file on the host. Fixes: #7536 Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>	2023-08-03 17:58:39 -03:00
Zvonko Kaiser	1fc715bc65	s390x: Add AP Attach/Detach test Now that we have propper AP device support add a unit test for testing the correct Attach/Detach of AP devices. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-23 13:44:19 +00:00
Zvonko Kaiser	545de5042a	vfio: Fix tests Now with more elaborate checking of cold\|hot plug ports we needed to update some of the tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:44 +00:00
Zvonko Kaiser	62aa6750ec	vfio: Added better handling of VFIO Control Devices Depending on the vfio_mode we need to mount the VFIO control device additionally into the container. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:42 +00:00
Zvonko Kaiser	dd422ccb69	vfio: Remove obsolete HotplugVFIOonRootBus Removing HotplugVFIOonRootBus which is obsolete with the latest PCI topology changes, users can set cold_plug_vfio or hot_plug_vfio either in the configuration.toml or via annotations. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:25:40 +00:00
Zvonko Kaiser	114542e2ba	s390x: Fixing device.Bus assignment The device.Bus was reset if a specific combination of configuration parameters were not met. With the new PCIe topology this should not happen anymore Fixes: #7381 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:24:26 +00:00
Peng Tao	581be92b25	Merge pull request #4492 from zvonkok/pcie-topology runtime: fix PCIe topology for GPUDirect use-case	2023-07-03 09:17:12 +08:00
Fabiano Fidêncio	6a21e20c63	runtime: Add "none" as a shared_fs option Currently, even when using devmapper, if the VMM supports virtio-fs / virtio-9p, that's used to share a few files between the host and the guest. This needed, as we need to share with the guest contents like secrets, certificates, and configurations, via Kubernetes objects like configMaps or secrets, and those are rotated and must be updated into the guest whenever the rotation happens. However, there are still use-cases users can live with just copying those files into the guest at the pod creation time, and for those there's absolutely no need to have a shared filesystem process running with no extra obvious benefit, consuming memory and even increasing the attack surface used by Kata Containers. For the case mentioned above, we should allow users, making it very clear which limitations it'll bring, to run Kata Containers with devmapper without actually having to use a shared file system, which is already the approach taken when using Firecracker as the VMM. Fixes: #7207 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-30 20:45:00 +02:00
Zvonko Kaiser	0f454d0c04	gpu: Fixing typos for PCIe topology changes Some comments and functions had typos and wrong capitalization. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-30 08:42:55 +00:00
Zvonko Kaiser	8330fb8ee7	gpu: Update unit tests Some tests are now failing due to the changes how PCIe is handled. Update the test accordingly. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-23 11:16:25 +00:00
Zvonko Kaiser	72f2cb84e6	gpu: Reset cold or hot plug after overriding If we override the cold, hot plug with an annotation we need to reset the other plugging mechanism to NoPort otherwise both will be enabled. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:51:01 +00:00
Zvonko Kaiser	fbacc09646	gpu: PCIe topology, consider vhost-user-block in Virt In Virt the vhost-user-block is an PCIe device so we need to make sure to consider it as well. We're keeping track of vhost-user-block devices and deduce the correct amount of PCIe root ports. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-15 17:39:55 +00:00
Zvonko Kaiser	b11246c3aa	gpu: Various fixes for virt machine type The PCI qom path was not deduced correctly added regex for correct path walking. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:33:57 +00:00
Zvonko Kaiser	40101ea7db	vfio: Added annotation for hot(cold) plug Now it is possible to configure the PCIe topology via annotations and addded a simple test, checking for Invalid and RootPort Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	8f0d4e2612	vfio: Cleanup of Cold and Hot Plug Removed the configuration of PCIeRootPort and PCIeSwitchPort, those values can be deduced in createPCIeTopology Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b5c4677e0e	vfio: Rearrange the bus assignemnt Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup can be used by any module without affecting the topology. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	b1aa8c8a24	gpu: Moved the PCIe configs to drivers The hypervisor_state file was the wrong location for the PCIe Port settings, moved everything under device umbrella, where it can be consumed more easily and we do not get into circular deps. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	55a66eb7fb	gpu: Add config to TOML Update cold-plug and hot-plug setting to include bridge, root and switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	da42801c38	gpu: Add config settings tests for hot-plug Updated all references and config settings for hot-plug to match cold-plug Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	de39fb7d38	runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology Fixes: #4491 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Fabiano Fidêncio	3a4b924226	Merge pull request #6833 from rye-stripe/bugfix/vcpu-pinning resource-control: fix setting CPU affinities on Linux	2023-05-18 08:12:39 +02:00
Fabiano Fidêncio	e762f70920	Merge pull request #6838 from rye-stripe/bugfix/use-enable-vcpus-pinning-from-toml runtime: use enable_vcpus_pinning from toml	2023-05-17 21:30:44 +02:00
Dov Murik	dd7562522a	runtime: pkg/sev: Add kbs utility package for SEV pre-attestation Supports both online and offline modes of interaction with simple-kbs for SEV/SEV-ES confidential guests. Fixes: #6795 Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>	2023-05-16 15:27:32 +03:00
Dov Murik	05de7b2607	runtime: Add sev package The sev package provides utilities for launching AMD SEV and SEV-ES confidential guests. Fixes: #6795 Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>	2023-05-16 15:27:32 +03:00
Peteris Rudzusiks	bdb75fb21e	runtime: use enable_vcpus_pinning from toml Set the default value of runtime's EnableVCPUsPinning to value read from .toml. Fixes: #6836 Signed-off-by: Peteris Rudzusiks <rye@stripe.com>	2023-05-15 21:41:20 +02:00
Peteris Rudzusiks	3e85bf5b17	resource-control: fix setting CPU affinities on Linux With this fix the vCPU pinning feature chooses the correct physical cores to pin the vCPU threads on rather than always using core 0. Fixes #6831 Signed-off-by: Peteris Rudzusiks <rye@stripe.com>	2023-05-15 16:46:36 +02:00
Peng Tao	65670e6b0a	Merge pull request #6699 from zvonkok/cold-plug-vfio gpu: cold plug VFIO devices	2023-05-05 10:04:29 +08:00
Zvonko Kaiser	13d7f39c71	gpu: Check for VFIO port assignments Bailing out early if the port is wrong, allowed port settings are no-port, root-port, switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-05-03 12:32:33 +00:00
Zvonko Kaiser	138ada049c	gpu: Cold Plug VFIO toml setting Added the cold_plug_vfio setting to the qemu-toml.in with some epxlanation Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-27 11:04:45 +00:00
Zvonko Kaiser	0fec2e6986	gpu: Add cold-plug test Cold plug setting is now correctly decoded in toml Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-27 09:30:24 +00:00
Zvonko Kaiser	2a830177ca	gpu: Add fwcfg helper function Added driver util function for easier handling of VFIO devices outside of the VFIO module. At the sandbox level we may need to set options depending if we have a VFIO/PCIe device, like the fwCfg for confiential guests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	131f056a12	gpu: Extract VFIO Functions to drivers Some functions may be used in other modules then only in the VFIO module, extract them and make them available to other layers like sandbox. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	c8cf7ed3bc	gpu: Add ColdPlug of VFIO devices with devManager If we have a VFIO device and cold-plug is enabled we mark each device as ColdPlug=true and let the VFIO module do the attaching. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	6107c32d70	gpu: Assign default value to cold-plug Make sure the configuration is propagated to the right structs and the default value is assigned. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	377ebc2ad1	gpu: Add configuration option for cold-plug VFIO Users can set cold-plug="root-port" to cold plug a VFIO device in QEMU Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Zvonko Kaiser	c18ceae109	gpu: Add new struct PCIePort For the hypervisor to distinguish between PCIe components, adding a new enum that can be used for hot-plug and cold-plug of PCIe devices Fixes: #6687 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-26 09:47:37 +00:00
Eduardo Berrocal	1c1ee8057c	pkg/signals: Improved test coverage 60% to 100% Expanded tests on signals_test.go to cover more lines of code. 'go test' won't show 100% coverage (only 66.7%), because one test need to spawn a new process (since it is testing a function that calls os.Exit(1)). Fixes: #256 Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>	2023-04-25 23:34:13 +00:00
Zvonko Kaiser	f4f958d53c	gpu: Do not pass-through PCI (Host) Bridges On some systems a GPU is in a IOMMU group with a PCI Bridge and PCI Host Bridge. Per default no PCI Bridge needs to be passed-through. When scanning the IOMMU group, ignore devices with a 0x60 class ID prefix. Fixes: #6663 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-04-17 10:08:23 +00:00
Alexandru Matei	db2cac34d8	runtime: Don't create socket file in /run/kata The socket file for shim management is created in /run/kata and it isn't deleted after the container is stopped. After running and stopping thousands of containers /run folder will run out of space. Fixes #6622 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com> Co-authored-by: Greg Kurz <groug@kaod.org>	2023-04-13 10:21:29 +03:00
Fabiano Fidêncio	3b3656d96d	Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main tdx: Add artefacts from the latest TDX tools release into main	2023-04-11 20:43:02 +02:00
Fabiano Fidêncio	50ce33b02d	Merge pull request #6205 from fengwang666/non-root-clh runtime: support non-root for clh	2023-04-11 19:34:00 +02:00
Fabiano Fidêncio	3e15800199	govmm: Directly pass the firmware using -bios with TDX Since TDX doesn't support readonly memslot, TDVF cannot be mapped as pflash device and it actually works as RAM. "-bios" option is chosen to load TDVF. OVMF is the opensource firmware that implements the TDVF support. Thus the command line to specify and load TDVF is ``-bios OVMF.fd`` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-04-11 15:23:42 +02:00
Fabiano Fidêncio	3c5ffb0c85	govmm: Set "sept-ve-disable=on" This is needed since 22ww49. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-04-11 15:23:42 +02:00
James O. D. Hunt	cbe6f04194	Merge pull request #6501 from shippomx/dev_metrics runtime: add filter metrics with specific names	2023-04-05 15:15:09 +01:00
Miao Xia	0f73515561	runtime: add filter metrics with specific names The kata monitor metrics API returns a huge size response, if containers or sandboxs are a large number, focus on what we need will be harder. Fixes: #6500 Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>	2023-03-28 14:56:13 +08:00
James O. D. Hunt	f06f72b5e9	Merge pull request #6467 from jongwu/qemu-uefi-path qemu/arm64: disable image nvdimm once no firmware offered	2023-03-22 08:43:01 +00:00
Jianyong Wu	ece5edc641	qemu/arm64: disable image nvdimm if no firmware offered For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there is no firmware offered, it should be disabled. Fixes: #6468 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-03-20 18:03:05 +08:00
Hyounggyu Choi	96baa83895	agent: Bring in VFIO-AP device handling again This PR is a continuing work for (kata-containers#3679). This generalizes the previous VFIO device handling which only focuses on PCI to include AP (IBM Z specific). Fixes: kata-containers#3678 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2023-03-16 18:14:12 +09:00
Jakob Naucke	f666f8e2df	agent: Add VFIO-AP device handling Initial VFIO-AP support (#578) was simple, but somewhat hacky; a different code path would be chosen for performing the hotplug, and agent-side device handling was bound to knowing the assigned queue numbers (APQNs) through some other means; plus the code for awaiting them was written for the Go agent and never released. This code also artificially increased the hotplug timeout to wait for the (relatively expensive, thus limited to 5 seconds at the quickest) AP rescan, which is impractical for e.g. common k8s timeouts. Since then, the general handling logic was improved (#1190), but it assumed PCI in several places. In the runtime, introduce and parse AP devices. Annotate them as such when passing to the agent, and include information about the associated APQNs. The agent awaits the passed APQNs through uevents and triggers a rescan directly. Fixes: #3678 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 10:07:48 +09:00
Jakob Naucke	b546eca26f	runtime: Generalize VFIO devices Generalize VFIO devices to allow for adding AP in the next patch. The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed. The rationale for the revomal is: - VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI - Logic of checking a subtype of mediated device is included in GetVFIODeviceType() - VFIOPciDeviceMediatedType() can simply fulfill the device addition based on a type categorized by GetVFIODeviceType() Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 10:06:37 +09:00
Jakob Naucke	4c527d00c7	agent: Rename VFIO handling to VFIO PCI handling e.g., split_vfio_option is PCI-specific and should instead be named split_vfio_pci_option. This mutually affects the runtime, most notably how the labels are named for the agent. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2023-03-16 07:43:39 +09:00
Henry Beberman	974a5c22f0	runtime: add support for Hyper-V This adds /dev/mshv to the list of sandbox devices so that VMMs can create Hyper-V VMs. In our testing, this also doesn't error out in case /dev/mshv isn't present. Fixes #6454. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2023-03-13 17:13:51 -07:00
yanggang	b6880c60d3	logging: Correct the code notes Fix wrong notes for func GetSandboxesStoragePathRust() Fixes: #6394 Signed-off-by: yanggang <gang.yang@daocloud.io>	2023-03-01 19:20:25 +08:00
Feng Wang	cbe6ad9034	runtime: support non-root for clh This change enables to run cloud-hypervisor VMM using a non-root user when rootless flag is set true in the configuration Fixes: #2567 Signed-off-by: Feng Wang <fwang@confluent.io>	2023-02-22 13:57:09 -08:00
zhaojizhuang	ca02c9f512	runtime: add reconnect timeout for vhost user block Fixes: #6075 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-02-13 14:33:46 +08:00
Bin Liu	95602c8c08	Merge pull request #5999 from yaoyinnan/5998/feat/cgroup-metrics runtime: support cgroup v2 metrics marshal guest metrics	2023-02-11 19:26:24 +08:00
Bin Liu	ecbd94d80c	Merge pull request #6064 from yaoyinnan/6063/feat/rootfs-erofs rootfs: support EROFS filesystem	2023-02-11 11:10:23 +08:00
yaoyinnan	bdf20b5d26	rootfs: support EROFS filesystem For kata containers, rootfs is used in the read-only way. EROFS can noticably decrease metadata overhead. On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs. Fixes: #6063 Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2023-02-11 00:44:13 +08:00
GabyCT	86501d5f6f	Merge pull request #6200 from gkurz/improve-appendFDs-doc runtime: Improve documentation of appendFDs	2023-02-09 15:50:37 -06:00
yaoyinnan	01765e1734	runtime: support cgroup v2 metrics marshal guest metrics Support to use cgroup v2 metrics marshal guest metrics. Fixes: #5998 Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2023-02-09 19:14:09 +08:00
Bin Liu	56071c6e7b	virtiofsd: change cache mod to const Change cache mod from literal to const and place them in one place. Also set default cache mode from `none` to `never` in `pkg/katautils/config-settings.go.in`. Fixes: #6151 Signed-off-by: Bin Liu <bin@hyper.sh>	2023-02-08 15:06:52 +08:00
Bin Liu	71a3b73cb0	Merge pull request #6223 from d3c3mber/rm-unused-shim-config runtime: remove not used shim configurations	2023-02-08 10:00:52 +08:00
d3c3mber	390916b33c	runtime: remove not used shim configurations ShimPath and ShimDebug are not needed anymore. Fixes: #6147 Signed-off-by: d3c3mber <tangbo_gl_2022@163.com>	2023-02-07 14:06:12 +08:00
GabyCT	7fc35f19eb	Merge pull request #6056 from jongwu/perm_deny arm64/CI: fix unit test failure on arm64	2023-02-03 10:53:38 -06:00
Jianyong Wu	59f104c022	runtime: skip unit test that fail regularly on aarch64 There are lots of unit test cases fails regularly on aarch64, including TestIOCopy, create_tmpfs. Temporarily skip it for now and enable it after them get fixed. Fixes: #6194 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-02-03 11:34:39 +08:00
Greg Kurz	3c48f2202c	runtime: Improve documentation of appendFDs The cmd.ExtraFiles feature that is used to implement appendFDs takes an array of arbitray file descriptors and internally renumbers them to be consecutive starting from 3, using dup2(). This isn't especially obvious : document it for the sake of clarity. Fixes #6199 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-02-02 12:52:10 +01:00
Peng Tao	a34f36f8f4	Merge pull request #6149 from openanolis/fix_kata_runtime runtime:fix stat uds path	2023-02-02 11:00:07 +08:00
Greg Kurz	334c4b8bdc	runtime: Drop QEMU log file support The QEMU log file is essentially about fine grain tracing of QEMU internals and mostly useful for developpers, not production. Notably, the log file isn't limited in size, nor rotated in any way. It means that a container running in the VM could possibly flood the log file with a guest triggerable trace. For example, on openshift, the log file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means that each pod running with the kata runtime could potentially consume this amount of host RAM which is not acceptable. Error messages are best collected from QEMU's stderr as kata is doing now since PR #5736 was merged. Drop support for the QEMU log file because it doesn't bring any value but can certainly do harm. Fixes #6173 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-31 09:20:29 +01:00
Zhongtao Hu	1e531b44dc	runtime:fix stat uds path os.Stat("unix:///run/vc/sbs/sid/shim-monitor.sock") will fail, should be os.Stat("/run/vc/sbs/sid/shim-monitor.sock") Fixes:#6148 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>	2023-01-29 15:08:13 +08:00
zhaojizhuang	9092c23a2e	runtime: Add hmp for qemu Fixes: #6092 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-01-29 14:22:04 +08:00
Greg Kurz	af125b1498	Merge pull request #5736 from gkurz/no-qemu-daemonize runtime: Start QEMU undaemonized and get logs	2023-01-27 16:33:48 +01:00
Greg Kurz	39fe4a4b6f	runtime: Collect QEMU's stderr LaunchQemu now connects a pipe to QEMU's stderr and makes it usable by callers through a Go io.ReadCloser object. As explained in [0], all messages should be read from the pipe before calling cmd.Wait : introduce a LogAndWait helper to handle that. Fixes #5780 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:17 +01:00
Greg Kurz	bf4e3a618f	runtime: Launch QEMU with cmd.Start() LaunchCustomQemu() currently starts QEMU with cmd.Run() which is supposed to block until the child process terminates. This assumes that QEMU daemonizes itself, otherwise LaunchCustomQemu() would block forever. The virtcontainers package indeed enables the Daemonize knob in the configuration but having such an implicit dependency on a supposedly configurable setting is ugly and fragile. cmd.Run() is : func (c *Cmd) Run() error { if err := c.Start(); err != nil { return err } return c.Wait() } Let's open-code this : govmm calls cmd.Start() and returns the cmd to virtcontainers which calls cmd.Wait(). If QEMU doesn't start, e.g. missing binary, there won't be any errors to collect from QEMU output. Just drop these lines in govmm. Similarily there won't be any log file to read from in virtcontainers. Drop that as well. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:09:11 +01:00
Greg Kurz	8a4f08cb0f	govmm: Optionally pass QMP listener to QEMU QEMU's -qmp option can be passed the file descriptor of a socket that is already in listening mode. This is done with by passing `fd=XXX` to `-qmp` instead of a path. Note that these two options are mutually exclusive : QEMU errors out if both are passed, so we check that as well in the validation function. While here add the `path=` stanza in the path based case for clarity. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 23:08:48 +01:00
Greg Kurz	219bb8e7d0	govmm: Optionally start QMP with a pre-configured connection When QEMU is launched daemonized, we have the guarantee that the QMP socket is available. In order to launch a non-daemonized QEMU, the QMP connection should be created before QEMU is started in order to avoid a race. Introduce a variant of QMPStart() that can use such an existing connection. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-01-24 19:16:47 +01:00
Peng Tao	7d1a604bad	Merge pull request #6060 from ls-ggg/6055/service.mu-deadlock runtime:all APIs are hang in the service.mu	2023-01-18 10:50:00 +08:00
Bin Liu	790f45190b	Merge pull request #6074 from zhaojizhuang/enablevhostuserstore runtime: paas enablevhostuserstore annotation to hypervisor config	2023-01-17 11:43:43 +08:00
ls	69fc8de712	runtime:all APIs are hang in the service.mu When the vmm process exits abnormally, a goroutine sets s.monitor to null in the 'watchSandbox' function without getting service.mu, This will cause another goroutine to block when sending a message to s.monitor, and it holds service.mu, which leads to a deadlock. For example, the wait function in the file .../pkg/containerd-shim-v2/wait.go will send a message to s.monitor after obtaining service.mu, but s.monitor may be null at this time Fixes: #6059 Signed-off-by: ls <335814617@qq.com>	2023-01-16 14:45:37 +08:00
Eric Ernst	458fe865ea	Merge pull request #6052 from egernst/add-darwin-skeletons Add darwin skeletons	2023-01-13 13:14:16 -08:00
zhaojizhuang	cf1bae3521	runtime: paas enablevhostuserstore annotation to hypervisor config Fixes: #6073 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-01-13 17:07:38 +08:00
Samuel Ortiz	a9626682af	virtcontainers: resourcecontrol: Add skeleton for Darwin Cgroups do not exist on Darwin, so use an empty implementation for resourcecontrol for the time being. In the process, ensure that the utilized cgroup handling (ie, isSystemdCgroup) is kept in general file, since we use this to help assess/constrain the container spec we pass to the guest. Fixes: #6051 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-12 15:53:28 -08:00
Eric Ernst	6ee550e9a5	runtime: vCPUs pinning is sandbox specific, not hypervisor While at it, make sure we persist this and fix a misc typo. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-12 15:44:25 -08:00
Eric Ernst	f137048be3	resource-control: add helper function for setting CPU affinity Let's abstract the CPU affinity Fixes: #6044 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2023-01-11 17:55:53 -08:00
Bin Liu	86a82cace9	runtime: change cache mode from none to never New Rust virtiofsd's `cache` mode doesn't support `none` mode, we should use `never` to replace it. Fixes: #6018 Signed-off-by: Bin Liu <bin@hyper.sh>	2023-01-10 17:29:48 +08:00
Bin Liu	2c10b37172	Merge pull request #5991 from dcantah/darwin-sigs runtime: Define Darwin handled signals list	2023-01-07 11:19:48 +08:00
Fabiano Fidêncio	175794458f	Merge pull request #5972 from bergwolf/github/hook fix moby prestart hook handling	2023-01-06 14:54:39 +01:00
Samuel Ortiz	3b4420eb8e	runtime: Define Darwin handled signals list Fixes: #5990 Some signals may not be defined on non Linux host OSes, like SIGSTKFLT for example. It's also not defined on certain architectures, but irrelevant for this. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com> Signed-off-by: Danny Canter <danny@dcantah.dev>	2023-01-05 17:50:47 -08:00
Danny Canter	24b05a99b6	schedcore: Make buildable on !linux Fixes: #5983 sched-core only makes sense on Linux hosts. Let's add stub/error for other platforms. Signed-off-by: Eric Ernst <eric_ernst@apple.com> Signed-off-by: Danny Canter <danny@dcantah.dev>	2023-01-05 11:51:04 -08:00
Peng Tao	d085389127	vc: fix up UT for CreateSandbox API change Need to adapt the UT as well. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 22:30:42 +08:00
Peng Tao	578a9c25f0	vc: rescan network endpoints after running prestart hooks Moby relies on the prestart hooks to configure network endpoints. We should rescan the netns after running them so that the newly added endpoints can be found and plugged to the guest. Fixes: #5941 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 22:30:41 +08:00
Peng Tao	cb84b0fb02	katautils: run prestart hooks after starting VM So that we can pass the hypervisor pid to the hook instead of the runtime process's. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 10:52:32 +00:00
Zhongtao Hu	dae6670628	kata-runtime: add rust runtime path for kata-runtime exec add rust runtime path for kata-runtime exec Fixes:#5963 Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>	2022-12-30 13:34:34 +08:00
Binbin Zhang	99485d871c	shim: return hypervisor's pid not shim's pid update outdated code comments Fixes: #3234 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2022-12-14 11:16:11 +08:00
Manabu Sugimoto	c617bbe70d	runtime: Pass SELinux policy for containers to the agent Pass SELinux policy for containers to the agent if `disable_guest_selinux` is set to `false` in the runtime configuration. The `container_t` type is applied to the container process inside the guest by default. Users can also set a custom SELinux policy to the container process using `guest_selinux_label` in the runtime configuration. This will be an alternative configuration of Kubernetes' security context for SELinux because users cannot specify the policy in Kata through Kubernetes's security context. To apply SELinux policy to the container, the guest rootfs must be CentOS that is created and built with `SELINUX=yes`. Fixes: #4812 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2022-11-29 19:07:56 +09:00
Bin Liu	1dfd845f51	runtime: go fix code for 1.19 We have starting to use golang 1.19, some features are not supported later, so run `go fix` to fix them. Fixes: #5750 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-11-25 11:29:18 +08:00
Bin Liu	06a604b753	Merge pull request #5720 from YchauWang/wyc-docs-test-22 runtime: add log record to the qemu config method `appendDevices` for…	2022-11-24 13:15:06 +08:00
wangyongchao.bj	30a7ebf430	runtime: Log invalid devices in QEMU config When the user tried to add new devices to the VM, there is no error info for the invalid device. This PR adds a log record to the `appendDevices` for the invalid device of the qemu config. Fixes: #5719 Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>	2022-11-23 09:09:45 +08:00
Fabiano Fidêncio	df3d9878d5	Merge pull request #5695 from darfux/virtiofs-queue-size runtime: Support virtiofs queue size for qemu and make it configurable	2022-11-22 20:04:30 +01:00
liyuxuan.darfux	3bb145c63a	runtime: Support virtiofs queue size for qemu and make it configurable The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024 by default which is same as clh. Also make this value configurable. Fixes: #5694 Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>	2022-11-19 15:38:11 +08:00
Fabiano Fidêncio	d94718fb30	runtime: Fix gofmt issues It seems that bumping the version of golang and golangci-lint new format changes are required. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-11-17 14:16:12 +01:00
Fabiano Fidêncio	16b8375095	golang: Stop using io/ioutils The package has been deprecated as part of 1.16 and the same functionality is now provided by either the io or the os package. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-11-17 13:43:25 +01:00
LitFlwr0	2508d39b7c	runtime: added vcpus pinning logics Core VCPU threads pinning logics for issue 4476. Also provided docs. Fixes:#4476 Signed-off-by: LitFlwr0 <861690705@qq.com>	2022-11-04 17:52:42 +08:00
snir911	288e337a6f	Merge pull request #5434 from Rouzip/remove-doNetNS add EnterNetNS in virtcontainers	2022-10-30 11:19:07 +02:00
Fabiano Fidêncio	190e623c40	Merge pull request #5317 from Champ-Goblem/fix-containerd-stats shim: Ensure pagesize is set when reporting hugetlb stats	2022-10-24 10:24:49 +02:00
Rouzip	39363ffbfb	runtime: remove same function Add EnterNetNS in virtcontainers to remove same function. FIXes #5394 Signed-off-by: Rouzip <1226015390@qq.com>	2022-10-17 10:59:13 +08:00
Vijay Dhanraj	435c8f181a	acrn: Enable ACRN hypervisor support for Kata 2.x release Currently ACRN hypervisor support in Kata2.x releases is broken. This commit re-enables ACRN hypervisor support and also refactors the code so as to remove dependency on Sandbox. Fixes #3027 Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>	2022-10-07 07:40:32 -07:00
Champ-Goblem	89e62d4edf	shim: Ensure pagesize is set when reporting hugetbl stats The containerd stats method and metrics API are broken with Kata 2.5.x, the stats fail to load and the metrics API responds with status code 500 This seems to be down to the conversion from the stats reported by the agent RPC `StatsContainer` where the field `Pagesize` is not completed by the `setHugetlbStats` method. In the case where multiple sized tables stats are reported, this causes containerd to register two metrics with the same label set, rather than each being partitioned by the `page` label. Fixes: #5316 Signed-off-by: Champ-Goblem <cameron@northflank.com>	2022-10-04 09:16:30 +01:00
Peng Tao	8a2df6b31c	Merge pull request #4931 from jpecholt/snp-support Added SNP-Support for Kata-Containers	2022-09-27 14:17:54 +08:00
Joana Pecholt	ded60173d4	runtime: Enable choice between AMD SEV and SNP This is based on a patch from @niteeshkd that adds a config parameter to choose between AMD SEV and SEV-SNP VMs as the confidential guest type in case both types are supported. SEV is the default. Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>	2022-09-16 17:51:41 +02:00
Joana Pecholt	22bda0838c	runtime: Support for AMD SEV-SNP VMs This commit adds AMD SEV-SNP as a confidential guest option to the runtime. Information on required components such as OVMF, QEMU and a kernel supporting SEV-SNP are defined in the versions file and corresponding configs are added. Note: The CPU model 'host' provided by the current SNP-QEMU does not support all SNP capabilities yet, which is why this option is changed to EPYC-v4. Note: The guest's physical address space reduction specified with ReducedPhysBits is 1. Details are can be found in Section 15.34.6 here https://www.amd.com/system/files/TechDocs/24593.pdf Fixes #4437 Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>	2022-09-16 17:51:41 +02:00
Feng Wang	f914319874	runtime: store the user name in hypervisor config The user name will be used to delete the user instead of relying on uid lookup because uid can be reused. Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-13 10:32:55 -07:00
Feng Wang	c3015927a3	runtime: add more debug logs for non-root user operation Previously the logging was insufficient and made debugging difficult Fixes: #5155 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-09-12 21:38:57 -07:00
Archana Shinde	7d52934ec1	Merge pull request #4798 from amshinde/use-iouring-qemu Use iouring for qemu block devices	2022-08-26 04:00:24 +05:30
Peng Tao	a06d819b24	runtime: cri-o annotations have been moved to podman Let's swith to depending on podman which also simplies indirect dependency on kubernetes components. And it helps to avoid cri-o security issues like CVE-2022-1708 as well. Fixes: #4972 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2022-08-24 18:11:37 +08:00
Chelsea Mafrica	fcc1e0c617	runtime: tracing: End root span at end of trace The root span should exist the duration of the trace. Defer ending span until the end of the trace instead of end of function. Add the span to the service struct to do so. Fixes #4902 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2022-08-12 13:15:39 -07:00
Archana Shinde	c1e3b8f40f	govmm: Refactor qmp functions for adding block device Instead of passing a bunch of arguments to qmp functions for adding block devices, use govmm BlockDevice structure to reduce these. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	598884f374	govmm: Refactor code to get rid of redundant code Get rid of redundant return values from function. args and blockdevArgs used to return different values to maintain compatilibity between qemu versions. These are exactly the same now. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	00860a7e43	qmp: Pass aio backend while adding block device Allow govmm to pass aio backend while adding block device. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	e1b49d7586	config: Add block aio as a supported annotation Allow Block AIO to be passed as a per pod annotation. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00
Archana Shinde	ed0f1d0b32	config: Add "block_device_aio" as a config option for qemu This configuration will allow users to choose between different I/O backends for qemu, with the default being io_uring. This will allow users to fallback to a different I/O mechanism while running on kernels olders than 5.1. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2022-08-05 13:16:34 -07:00

1 2 3 4 5 ...

534 Commits