kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-06-01 03:46:34 +00:00

Author	SHA1	Message	Date
Hyounggyu Choi	419b5ed715	runtime: Add DeviceInfo to Container for VFIO coldplug configuration Even though ociSpec.Linux.Devices is preserved when vfio_mode is VFIO, it has not been updated correctly for coldplug scenarios. This happens because the device info passed to the agent via CreateContainerRequest is dropped by the Kata runtime. This commit ensures that the device info is added to the sandbox's device manager when vfio_mode is VFIO and coldPlugVFIO is true (e.g., vfio-ap-cold), allowing ociSpec.Linux.Devices to be properly updated with the device information before the container is created on the guest. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Fabiano Fidêncio	fefcf7cfa4	acrn: Drop support As we don't have any CI, nor maintainer to keep ACRN code around, we better have it removed than give users the expectation that it should or would work at some point. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-19 16:05:43 +02:00
Ajay Victor	a19f2eacec	runtime: Enable ImageName annotation for remote hypervisor Enables ImageName to support multiple VM images in remote hypervisor scenario Fixes https://github.com/kata-containers/kata-containers/issues/10240 Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>	2024-09-19 14:48:46 +05:30
Fabiano Fidêncio	e477ed0e86	runtime: Improve vCPU allocation for the VMMs First of all, this is a controversial piece, and I know that. In this commit we're trying to take a less greedy approach regarding the amount of vCPUs we allocate for the VMM, which will be advantageous mainly when using the `static_sandbox_resource_mgmt` feature, which is used by the confidential guests. The current approach we have basically does: * Gets the amount of vCPUs set in the config (an integer) * Gets the amount of vCPUs set as limit (an integer) * Sums those up * Starts / Updates the VMM to use that total amount of vCPUs The fact we're dealing with integers is logical, as we cannot request 500m vCPUs to the VMMs. However, it leads us to, in several cases, be wasting one vCPU. Let's take the example that we know the VMM requires 500m vCPUs to be running, and the workload sets 250m vCPUs as a resource limit. In that case, we'd do: * Gets the amount of vCPUs set in the config: 1 * Gets the amount of vCPUs set as limit: ceil(0.25) * 1 + ceil(0.25) = 1 + 1 = 2 vCPUs * Starts / Updates the VMM to use 2 vCPUs With the logic changed here, what we're doing is considering everything as float till just before we start / update the VMM. So, the flow described above would be: * Gets the amount of vCPUs set in the config: 0.5 * Gets the amount of vCPUs set as limit: 0.25 * ceil(0.5 + 0.25) = 1 vCPU * Starts / Updates the VMM to use 1 vCPU In the way I've written this patch we introduce zero regressions, as the default values set are still the same, and those will only be changed for the TEE use cases (although I can see firecracker, or any other user of `static_sandbox_resource_mgmt=true` taking advantage of this). There's, though, an implicit assumption in this patch that we'd need to make explicit, and that's that the default_vcpus / default_memory is the amount of vcpus / memory required by the VMM, and absolutely nothing else. 
Also, the amount set there should be reflected in the podOverhead for the specific runtime class. One other possible approach, which I am not that much in favour of taking as I think it's less clear, is that we could actually get the podOverhead amount, subtract it from the default_vcpus (treating the result as a float), then sum up what the user set as limit (as a float), and finally ceil the result. It could work, but IMHO this is less clear, and less explicit on what we're actually doing, and how the default_vcpus / default_memory should be used. Fixes: #6909 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2023-11-10 18:25:57 +01:00
Zvonko Kaiser	545de5042a	vfio: Fix tests Now with more elaborate checking of cold|hot plug ports we needed to update some of the tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:44 +00:00
Zvonko Kaiser	55a66eb7fb	gpu: Add config to TOML Update cold-plug and hot-plug setting to include bridge, root and switch-port Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Zvonko Kaiser	de39fb7d38	runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology Fixes: #4491 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-14 08:20:24 +00:00
Sidhartha Mani	a6c67a161e	runtime: add support for ephemeral mounts to occupy entire sandbox memory On hotplug of memory as containers are started, remount all ephemeral mounts with size option set to the total sandbox memory Fixes: #6417 Signed-off-by: Sidhartha Mani <sidhartha_mani@apple.com>	2023-03-10 13:36:02 -08:00
zhaojizhuang	ca02c9f512	runtime: add reconnect timeout for vhost user block Fixes: #6075 Signed-off-by: zhaojizhuang <571130360@qq.com>	2023-02-13 14:33:46 +08:00
Eric Ernst	07e77f5be7	Merge pull request #5994 from dcantah/virtcontainers_tests_darwin virtcontainers: tests: Ensure Linux specific tests are just run on Linux	2023-01-10 17:13:28 -08:00
Eric Ernst	fafc7a8b1a	virtcontainers: tests: Ensure Linux specific tests are just run on Linux Fixes: #5993 Several tests utilize linux'isms like Mounts, bindmounts, vsock etc. Let's ensure that these are still tested on Linux, but that we also skip these tests when on other operating systems (Darwin). This commit just moves tests; there shouldn't be any functional test changes. While the tests still won't be runnable on Darwin/other hosts yet, this is a necessary step forward. Signed-off-by: Eric Ernst <eric_ernst@apple.com> Signed-off-by: Danny Canter <danny@dcantah.dev>	2023-01-06 11:09:11 -08:00
Peng Tao	d085389127	vc: fix up UT for CreateSandbox API change Need to adapt the UT as well. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-01-03 22:30:42 +08:00
Eric Ernst	9997ab064a	sandbox_test: Add test to verify memory hotplug behavior Augment the mock hypervisor so that we can validate that ACPI memory hotplug is carried out as expected. We'll augment the number of memory slots in the hypervisor config each time the memory of the hypervisor is changed. In this way we can ensure that large memory hotplugs are broken up into appropriately sized pieces in the unit test. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-08-31 10:32:30 -07:00
Eric Ernst	f9e96c6506	runtime: device: move to top level package Let's move device package to runtime/pkg instead of being buried under virtcontainers. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-06-26 21:31:29 -07:00
Eng Zer Jun	59c7165ee1	test: use `T.TempDir` to create temporary test directory The directory created by `T.TempDir` is automatically removed when the test and all its subtests complete. This commit also updates the unit test advice to use `T.TempDir` to create temporary directory in tests. Fixes: #3924 Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-03-31 09:31:36 +08:00
zhanghj	efa19c41eb	device: use const strings for block-driver option instead of hard coding Currently, the block driver option is specified by hard coding; it is better to use const string variables instead of hard coded strings. Another modification is to remove duplicate consts for virtio driver in manager.go. Fixes: #3321 Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>	2022-03-14 09:20:43 +08:00
Samuel Ortiz	823faee83a	virtcontainers: Rename the cgroups package To resourcecontrol, and make it consistent with the fact that cgroups are a Linux implementation of the ResourceController interface. Fixes: #3601 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-23 15:48:40 +01:00
Samuel Ortiz	77c29bfd3b	container: Remove VFIO lazy attach handling With the recently added VFIO fixes and support, we should not need that anymore. Fixes #3108 Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-17 08:39:44 +01:00
bin	7df677c01e	runtime: Update calculateSandboxMemory to include Hugepages Limit Support hugepages and port from: `96dbb2e8f0` Fixes: #3342 Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com> Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Signed-off-by: bin <bin@hyper.sh>	2022-02-16 15:14:37 +08:00
Samuel Ortiz	5e119e90e8	virtcontainers: Rename the Network structure fields and methods We are converting the Network structure into an interface, so that different host OSes can have different networking implementations for Kata. One step into that direction is to rename all the Network structure fields and methods to something that is less Linux networking namespace specific. This will make the Network interface naming consistent. Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Samuel Ortiz	49eee79f5f	virtcontainers: Remove the NetworkNamespace structure It is now replaced with a single Network structure Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>	2022-02-08 22:27:53 +01:00
Braden Rayhorn	fc0e095180	runtime: fix handling container spec's memory limit The OCI container spec specifies that a limit of -1 signifies unlimited memory. Update the sandbox memory calculator to reflect this part of the spec. Fixes: #3512 Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>	2022-01-24 13:30:32 -06:00
bin	03546f75a6	runtime: change io/ioutil to io/os packages Change io/ioutil to io/os packages because io/ioutil package is deprecated from 1.16: Discard => io.Discard NopCloser => io.NopCloser ReadAll => io.ReadAll ReadDir => os.ReadDir ReadFile => os.ReadFile TempDir => os.MkdirTemp TempFile => os.CreateTemp WriteFile => os.WriteFile Details: https://go.dev/doc/go1.16#ioutil Fixes: #3265 Signed-off-by: bin <bin@hyper.sh>	2021-12-15 07:31:48 +08:00
bin	ddc68131df	runtime: delete netmon Netmon is not used anymore. Fixes: #3112 Signed-off-by: bin <bin@hyper.sh>	2021-11-24 15:08:18 +08:00
Manohar Castelino	4d47aeef2e	hypervisor: Export generic interface methods This is in preparation for creating a separate hypervisor package. Non-functional change. Signed-off-by: Manohar Castelino <mcastelino@apple.com>	2021-10-22 16:45:35 -07:00
Samuel Ortiz	9bed2ade0f	virtcontainers: Convert to the new cgroups package API The new API is based on containerd's cgroups package. With that conversion we can simplify the virtcontainers sandbox code and also uniformize our cgroups external API dependency. We now only depend on containerd/cgroups for everything cgroups related. Depends-on: github.com/kata-containers/tests#3805 Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com> Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-09-14 07:09:34 +02:00
Julio Montes	47d95dc1c6	runtime: virtcontainers: fix govet fieldalignment Fix structures alignment fixes #2271 Depends-on: github.com/kata-containers/tests#3727 Signed-off-by: Julio Montes <julio.montes@intel.com>	2021-07-20 11:59:15 -05:00
Hui Zhu	cb6b7667cd	runtime: Add option "enable_guest_swap" to config hypervisor.qemu This commit adds option "enable_guest_swap" to config hypervisor.qemu. It will enable swap in the guest. Default false. When enable_guest_swap is enabled, insert a raw file to the guest as the swap device if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness") is bigger than 0. The size of the swap device should be swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes. If swap_in_bytes is not set, the size should be memory_limit_in_bytes. If swap_in_bytes and memory_limit_in_bytes are not set, the size should be default_memory. Fixes: #2201 Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-07-19 23:22:06 +08:00
bin	39546a1070	runtime: delete not used functions Delete some not used functions in sandbox.go Fixes: #2230 Signed-off-by: bin <bin@hyper.sh>	2021-07-14 19:42:50 +08:00
Eric Ernst	064dfb164b	runtime: Add "watchable-mounts" concept for inotify support To workaround virtiofs' lack of inotify support, we'll special case particular mounts which are typically watched, and pass on information to the agent so it can ensure that the mount presented to the container is indeed watchable (see applicable agent commit). This commit will: - identify watchable mounts based on file count and mount source - create a watchable-bind storage object for these mounts to communicate intent to the agent - update the OCI spec to take the updated watchable mount source into account Unit tests added and updated for the newly introduced functionality/functions. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-06-24 10:07:06 -07:00
Eric Ernst	57c0cee0a5	runtime: Cleanup mountSharedDirMounts, shareFile parameters There's no reason to pass the paths; they can be determined when they are actually used. Let's make the return values more comparable to the other mount handling functions (we'll add storage object in future commit), and pass the mount maps as function parameters. ...No functional changes here... Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2021-06-24 10:07:06 -07:00
bin	d75fe95685	virtcontainers: replace newStore by store in Sandbox struct The property name confuses newcomers when reading code. Since in Kata Containers 2.0 there will only be one type of store, it's safe to simply replace it with `store`. Fixes: #1660 Signed-off-by: bin <bin@hyper.sh>	2021-04-08 23:59:16 +08:00
Chelsea Mafrica	4bf84b4b2f	runtime: Add contexts to calls in unit tests Modify calls in unit tests to use context since many functions were updated to accept local context to fix trace span ordering. Fixes #1355 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-03-16 17:39:28 -07:00
Chelsea Mafrica	6b0dc60dda	runtime: Fix ordering of trace spans A significant number of trace calls did not use a parent context that would create proper span ordering in trace output. Add local context to functions for use in trace calls to facilitate proper span ordering. Additionally, change whether trace function returns context in some functions in virtcontainers and use existing context rather than background context in bindMount() so that span exists as a child of a parent span. Fixes #1355 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2021-03-16 17:39:28 -07:00
bin liu	fdbf7d3222	virtcontainers: revert CleanupContainer from PR 1079 In PR 1079, CleanupContainer's sandboxID parameter was changed to VCSandbox, but at cleanup no VCSandbox has been constructed, so we should load it from disk by loadSandboxConfig() in persist.go. This commit reverts parts of #1079 Fixes: #1119 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-17 10:31:33 +08:00
bin liu	4e3a8c0124	runtime: remove global sandbox variable Remove global sandbox variable, and save *Sandbox to hypervisor struct. For some needs, hypervisor may need to use methods from Sandbox. Signed-off-by: bin liu <bin@hyper.sh>	2020-11-13 09:47:09 +08:00
bin liu	290203943c	runtime: delete sandboxlist.go and sandboxlist_test.go Delete sandboxlist.go and sandboxlist_test.go under virtcontainers package. Fixes: #1078 Signed-off-by: bin liu <bin@hyper.sh>	2020-11-13 09:47:09 +08:00
Peng Tao	c7a2b12fab	Merge pull request #1086 from jodh-intel/2.0-dev-fix-annotations annotations: Improve asset annotation handling	2020-11-06 10:29:22 +08:00
James O. D. Hunt	e82c9daec3	annotations: Improve asset annotation handling Make `asset.go` the arbiter of asset annotations by removing all asset annotations lists from other parts of the codebase. This makes the code simpler, easier to maintain, and more robust. Specifically, the previous behaviour was inconsistent in the following ways: - `createAssets()` in `sandbox.go` was not handling the following asset annotations: - firmware: - `io.katacontainers.config.hypervisor.firmware` - `io.katacontainers.config.hypervisor.firmware_hash` - hypervisor: - `io.katacontainers.config.hypervisor.path` - `io.katacontainers.config.hypervisor.hypervisor_hash` - hypervisor control binary: - `io.katacontainers.config.hypervisor.ctlpath` - `io.katacontainers.config.hypervisor.hypervisorctl_hash` - jailer: - `io.katacontainers.config.hypervisor.jailer_path` - `io.katacontainers.config.hypervisor.jailer_hash` - `addAssetAnnotations()` in the `oci` package was not handling the following asset annotations: - hypervisor: - `io.katacontainers.config.hypervisor.path` - `io.katacontainers.config.hypervisor.hypervisor_hash` - hypervisor control binary: - `io.katacontainers.config.hypervisor.ctlpath` - `io.katacontainers.config.hypervisor.hypervisorctl_hash` - jailer: - `io.katacontainers.config.hypervisor.jailer_path` - `io.katacontainers.config.hypervisor.jailer_hash` This change fixes the bug where specifying a custom hypervisor path via an asset annotation was having no effect. Fixes: #1085. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2020-11-05 12:15:42 +00:00
Peng Tao	a958eaa8d3	runtime: mount shared mountpoint readonly bindmount remount events are not propagated through mount subtrees, so we have to remount the shared dir mountpoint directly. E.g., ``` mkdir -p source dest foo source/foo mount -o bind --make-shared source dest mount -o bind foo source/foo echo bind mount rw mount | grep foo echo remount ro mount -o remount,bind,ro source/foo mount | grep foo ``` would result in: ``` bind mount rw /dev/xvda1 on /home/ubuntu/source/foo type ext4 (rw,relatime,discard,data=ordered) /dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered) remount ro /dev/xvda1 on /home/ubuntu/source/foo type ext4 (ro,relatime,discard,data=ordered) /dev/xvda1 on /home/ubuntu/dest/foo type ext4 (rw,relatime,discard,data=ordered) ``` The reason is that bind mount creates new mount structs and attaches them to different mount subtrees. However, MS_REMOUNT only looks for existing mount structs to modify and does not try to propagate the change to mount structs in other subtrees. Fixes: #1061 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-11-04 17:51:49 +08:00
Eric Ernst	88cd712876	sandbox: consider cpusets if quota is not enforced CPUSet cgroup allows for pinning the memory associated with a cpuset to a given numa node. Similar to cpuset.cpus, we should take cpuset.mems into account for the sandbox-cgroup that Kata creates. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-13 15:54:03 -07:00
Eric Ernst	77a463e57a	cpuset: support setting mems for sandbox CPUSet cgroup allows for pinning the memory associated with a cpuset to a given numa node. Similar to cpuset.cpus, we should take cpuset.mems into account for the sandbox-cgroup that Kata creates. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-13 15:54:03 -07:00
Eric Ernst	b812d4f7fa	virtcontainers: add method for calculating cpuset for sandbox Calculate sandbox's CPUSet as the union of each of the container's CPUSets. Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>	2020-10-12 21:31:27 -07:00
Chelsea Mafrica	07a307b4b1	virtcontainers: Remove duplicate unit tests Remove tests from virtcontainers/sandbox_test.go which were moved to virtcontainers/types/sandbox_test.go. Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2020-07-24 01:36:12 +00:00
bin liu	bd8f03a5ef	runtime: remove agent abstraction This PR will delete agent abstraction and use Kata agent as the only one agent. Fixes: #377 Signed-off-by: bin liu <bin@hyper.sh>	2020-07-08 10:07:40 +08:00
Peng Tao	6de95bf36c	gomod: update runtime import path To use the kata-containers repo path. Most of the change is generated by script: find . -type f -name "*.go" |xargs sed -i -e \ 's|github.com/kata-containers/runtime|github.com/kata-containers/kata-containers/src/runtime|g' Fixes: #201 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-04-29 18:39:03 -07:00
Peng Tao	a02a8bda66	runtime: move all code to src/runtime To prepare for merging into kata-containers repository. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2020-04-27 19:39:25 -07:00
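
The vCPU arithmetic described in commit e477ed0e86 above can be illustrated with a minimal Go sketch. The function names `oldVCPUs` and `newVCPUs` are hypothetical and are not taken from the Kata runtime; the sketch only reproduces the two calculations spelled out in the commit message (round the limit up and add it to an integer default, versus sum as floats and round up once).

```go
package main

import (
	"fmt"
	"math"
)

// oldVCPUs (hypothetical name) follows the integer-based flow from the
// commit message: the workload limit is rounded up on its own, then added
// to the integer default configured for the VMM.
func oldVCPUs(defaultVCPUs int, limit float64) int {
	return defaultVCPUs + int(math.Ceil(limit))
}

// newVCPUs (hypothetical name) follows the float-based flow: both values
// stay fractional and are rounded up once, just before the VMM would be
// started or updated.
func newVCPUs(defaultVCPUs, limit float64) int {
	return int(math.Ceil(defaultVCPUs + limit))
}

func main() {
	// Example from the commit message: the VMM needs 500m vCPUs and the
	// workload sets a 250m limit.
	fmt.Println(oldVCPUs(1, 0.25))   // 1 + ceil(0.25) = 2
	fmt.Println(newVCPUs(0.5, 0.25)) // ceil(0.5 + 0.25) = 1
}
```

Rounding once at the end is what saves the extra vCPU in this example; with integer defaults and integer limits the two functions give the same result.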

47 Commits
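
The swap-device sizing rules stated in commit cb6b7667cd can be sketched in Go as follows. This is an illustration under assumptions, not the runtime's code: the function name `swapDeviceSize` is hypothetical, "not set" is modelled as zero, and a zero return is used to mean "no swap device". The commit message does not say what happens when `swap_in_bytes` is smaller than `memory_limit_in_bytes`, so the sketch does not guard that case.

```go
package main

import "fmt"

// swapDeviceSize (hypothetical name) returns the swap device size in bytes,
// following the rules in the commit message. A value of 0 for swapInBytes
// or memLimit is assumed to mean "not set".
func swapDeviceSize(swappiness, swapInBytes, memLimit, defaultMem int64) int64 {
	// A swap device is only inserted when swappiness is bigger than 0.
	if swappiness <= 0 {
		return 0
	}
	switch {
	case swapInBytes > 0:
		// swap_in_bytes - memory_limit_in_bytes
		return swapInBytes - memLimit
	case memLimit > 0:
		// swap_in_bytes not set: use memory_limit_in_bytes
		return memLimit
	default:
		// neither set: fall back to default_memory
		return defaultMem
	}
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(swapDeviceSize(60, 3*gib, 2*gib, 2*gib) == gib) // both set
	fmt.Println(swapDeviceSize(60, 0, 2*gib, 4*gib) == 2*gib)   // limit only
	fmt.Println(swapDeviceSize(60, 0, 0, 4*gib) == 4*gib)       // neither set
	fmt.Println(swapDeviceSize(0, 3*gib, 2*gib, 2*gib) == 0)    // swappiness 0
}
```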