kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-07-31 07:19:06 +00:00

Author	SHA1	Message	Date
Eric Ernst	72049350ae	Merge pull request #4288 from fengwang666/enable-qemu-sandbox runtime: enable sandbox feature on qemu	2022-06-21 09:22:26 -07:00
Zvonko Kaiser	e7e7dc9dfe	runtime: Add heuristic to get the right value(s) for mem-reserve Fixes: #2938 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2022-06-21 03:44:28 -07:00
Liang Zhou	ef925d40ce	runtime: enable sandbox feature on qemu Enable "-sandbox on" in qemu can introduce another protect layer on the host, to make the secure container more secure. The default option is disable because this feature may introduce some performance cost, even though user can enable /proc/sys/net/core/bpf_jit_enable to reduce the impact. Fixes: #2266 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-06-17 15:30:46 -07:00
Chelsea Mafrica	28995301b3	tracing: Remove whitespace from root span Remove space from root span name to follow camel casing of other tracing span names in the runtime and to make parsing easier in testing. Fixes #4483 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2022-06-17 12:07:37 -07:00
Fabiano Fidêncio	f30fe86dc1	Merge pull request #4456 from Bevisy/fixIssue4454 docs: Update outdated URLs and keep them available	2022-06-16 10:26:24 +02:00
Bin Liu	553ec46115	Merge pull request #4436 from alex-matei/fix/sandbox-mem-overflow runtime: fix error when trying to parse sandbox sizing annotations	2022-06-16 11:18:24 +08:00
James O. D. Hunt	9766a285a4	Merge pull request #4422 from snir911/dependabot_bumps deps: Resolve dependabot bumps of containerd, crossbeam-utils, regex	2022-06-15 15:57:53 +01:00
Binbin Zhang	a305bafeef	docs: Update outdated URLs and keep them available By comparing the content of the old url and the new url, ensure that their content is consistent and does not contain ambiguities Fixes: #4454 Signed-off-by: Binbin Zhang <binbin36520@gmail.com>	2022-06-15 16:34:28 +08:00
Fabiano Fidêncio	ac5dbd8598	clh: Improve logging related to the net dev addition Let's improve the log so we make it clear that we're only actually adding the net device to the Cloud Hypervisor configuration when calling our own version of VmAddNetPut(). Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:53:09 +00:00
Fabiano Fidêncio	0b75522e1f	network: Set queues to 1 to ensure we get the network fds We want to have the file descriptors of the opened tuntap device to pass them down to the VMMs, so the VMMs don't have to explicitly open a new tuntap device themselves, as the `container_kvm_t` label does not allow such a thing. With this change we ensure that what's currently done when using QEMU as the hypervisor, can be easily replicated with other VMMs, even if they don't support multiqueue. As a side effect of this, we need to close the received file descriptors in the code of the VMMs which are not going to use them. Fixes: #3533 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:53:09 +00:00
Fabiano Fidêncio	93b61e0f07	network: Add FFI_NO_PI to the netlink flags Adding FFI_NO_PI to the netlink flags causes no harm to the supported and tested hypervisors as when opening the device by its name Cloud Hypervisor[0], Firecracker[1], and QEMU[2] do set the flag already. However, when receiving the file descriptor of an opened tutap device Cloud Hypervisor is not able to set the flag, leaving the guest without connectivity. To avoid such an issue, let's simply add the FFI_NO_PI flag to the netlink flags and ensure, from our side, that the VMMs don't have to set it on their side when dealing with an already opened tuntap device. Note that there's a PR opened[3] just for testing that this change doesn't cause any breakage. [0]: `e52175c2ab/net_util/src/tap.rs (L129)` [1]: `b6d6f71213/src/devices/src/virtio/net/tap.rs (L126)` [2]: `3757b0d08b/net/tap-linux.c (L54)` [3]: https://github.com/kata-containers/kata-containers/pull/4292 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:53:09 +00:00
Fabiano Fidêncio	bf3ddc125d	clh: Pass the tuntap fds down to Cloud Hypervisor This is basically a no-op right now, as: * netPair.TapInterface.VMFds is nil * the tap name is still passed to Cloud Hypervisor, which is the Cloud Hypervisor's first choice when opening a tap device. In the very near future we'll stop passing the tap name to Cloud Hypervisor, and start passing the file descriptors of the opened tap instead. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:53:09 +00:00
Fabiano Fidêncio	55ed32e924	clh: Take care of the VmAdNetdPut request ourselves Knowing that VmAddNetPut works as expected, let's switch to manually building the request and writing it to the appropriate socket. By doing this it gives us more flexibility to, later on, pass the file descriptor of the tuntap device to Cloud Hypervisor, as openAPI doesn't support such operation (it has no notion of SCM Rights). Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:53:09 +00:00
Fabiano Fidêncio	01fe09a4ee	clh: Hotplug the network devices Instead of creating the VM with the network device already plugged in, let's actually add the network device after the VM is created, but before the Vm is actually booted. Although it looks like it doesn't make any functional difference between what's done in the past and what this commit introduces, this will be used to workaround a limitation on OpenAPI when it comes to passing down the network device's file descriptor to Cloud Hypervisor, so Cloud Hypervisor can use it instead of opening the device by its name on the VMM side. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:51:02 +00:00
Fabiano Fidêncio	2e07538334	clh: Expose VmAddNetPut VmAddNetPut is the API provided by the Cloud Hypervisor client (auto generated) code to hotplug a new network device to the VM. Let's expose it now as it'll be used as part this series, mostly to guide the reviewer through the process of what we have to do, as later on, spoiler alert, it'll end up being removed. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-14 10:27:30 +00:00
Fabiano Fidêncio	a80eb33cd6	Merge pull request #4308 from fidencio/topic/virtiofsd-switch-to-using-the-rust-version-on-all-arches runtime: Switch to using the rust version of virtiofsd (all arches but powerpc)	2022-06-13 13:45:51 +02:00
Bin Liu	81acfc1286	Merge pull request #4425 from liubin/fix/4376-change-log-level-of-getoomevent shim: change the log level for GetOOMEvent call failures	2022-06-13 17:53:11 +08:00
James O. D. Hunt	9b93db0220	Merge pull request #4417 from jodh-intel/docs-monitor-considerations docs: Add more kata monitor details	2022-06-13 10:51:52 +01:00
Fabiano Fidêncio	1ef0b7ded0	runtime: Switch to using the rust version of virtiofsd (all but power) So far this has been done for x86_64. Now that the support for building and testing has been added for all arches, let's do the second part of the switch. We're still not done yet for powerpc, as some a virtifosd crash on the rust version has been found by the maintainer. Fixes: #4258, #4260 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-06-13 10:41:26 +02:00
Alexandru Matei	721ca72a64	runtime: fix error when trying to parse sandbox sizing annotations Changed bitsize for parsing functions to 64-bit in order to avoid parsing errors. Fixes #4435 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2022-06-11 18:51:10 +03:00
Archana Shinde	aefe11b9ba	Merge pull request #4331 from dgibson/config-enable-iommu-annotation Allow io.katacontainers.config.hypervisor.enable_iommu annotation by …	2022-06-10 17:43:27 -07:00
James O. D. Hunt	412441308b	docs: Add more kata monitor details Add more detail to the `kata-monitor` doc to allow an admin to make a more informed decision about where and how to run the daemon. Fixes: #4416. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2022-06-09 09:20:11 +01:00
Eric Ernst	4ebf9d38b9	Merge pull request #4310 from egernst/core-sched shim: add support for core scheduling	2022-06-08 17:42:45 +02:00
Bin Liu	eff4e1017d	shim: change the log level for GetOOMEvent call failures GetOOMEvent is a blocking call that will fail if the container exit, in this case, it's not an error or warning. Changing the log level for logs in case of GetOOMEvent call fails will reduce log noise in a large cluster that has pods creating/deleting frequently. Fixes: #4376 Signed-off-by: Bin Liu <bin@hyper.sh>	2022-06-08 22:17:24 +08:00
dependabot[bot]	5d7fb7b7b0	build(deps): bump github.com/containerd/containerd in /src/runtime Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.6.1 to 1.6.6. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.6.1...v1.6.6) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: direct:production ... Fixes: #4421 Signed-off-by: dependabot[bot] <support@github.com>	2022-06-08 10:54:46 +03:00
David Gibson	8f10e13e07	config: Allow enable_iommu pod annotation by default Since #902 the `io.katacontainers.config.hypervisor` pod annotations have only been permitted if explicitly allowed in the global configuration. The default global configuration allows no such annotations. That's important because several of those annotations would cause Kata to execute arbitrary binaries, and so were wildly unsafe. However, this is inconvenient for the `io.katacontainers.config.hypervisor.enable_iommu` annotation specifically, which controls whether the sandbox VM includes a vIOMMU. A guest side vIOMMU is necessary to implement VFIO passthrough devices with `vfio_mode = vfio`, so enabling that mode of operation currently requires a global configuration change, and can't just be enabled per-pod. Unlike some of the other hypervisor annotations, the `enable_iommu` annotation is quite safe. By default the vIOMMU is not present, so allowing a user to override it for a pod only improves their facilities for isolation. Even if the global default were changed to enable the vIOMMU, that doesn't compel the guest kernel to use it, so allowing a user to disable the vIOMMU doesn't materially affect isolation either. Therefore, allow the io.katacontainers.config.hypervisor.enable_iommu annotation to work in the default configurations. fixes #4330 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-06-04 13:02:05 +10:00
Eric Ernst	430da47215	Merge pull request #4360 from fengwang666/shim-leak runtime: ignore ESRCH error from stop container	2022-06-02 12:42:19 -07:00
Feng Wang	9726f56fdc	runtime: force stop container after the container process exits Set thestop container force flag to true so that the container state is always set to “StateStopped” after the container wait goroutine is finished. This is necessary for the following delete container step to succeed. Fixes: #4359 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-06-02 08:17:08 -07:00
Eric Ernst	d2df1209a5	docs: describe kata handling for core-scheduling Add initial documentation for core-scheduling. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 16:17:00 -07:00
Michael Crosby	22b6a94a84	shim: add support for core scheduling In linux 5.14 and hopefully some backports, core scheduling allows processes to be co scheduled within the same domain on SMT enabled systems. Containerd impl sets the core sched domain when launching a shim. This allows a clean way for each shim(container/pod) to be in its own domain and any additional containers, (v2 pods) be be launched with the same domain as well as any exec'd process added to the container. kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html For Kata specifically, we will look for SCHED_CORE environment variable to be set to indicate we shuold create a new schedule core domain. This is equivalent to the containerd shim's PR: `e48bbe8394` Fixes: #4309 Signed-off-by: Eric Ernst <eric_ernst@apple.com> Signed-off-by: Michael Crosby <michael@thepasture.io>	2022-05-31 10:10:40 -07:00
Eric Ernst	65f0cef16c	kata-runtime: add iptables CLI to test http endpoint While end users can connect directly to the shim, let's provide a way to easily get/set iptables from kata-runtime itself. Fixes: #4080 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	3201ad0830	shim-client: ensure we check resp status for Put/Post Without this, potential errors are silently dropped. Let's ensure we return the error code as well as potenial data from the response. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	0706fb28ac	kata-runtime: shmgmt: make url usage consistent Before, we had a mix of slash, etc. Unfortunately, when cleaning URL paths, serve mux seems to mangle the request method, resulting in each request being a GET (instead of PUT or POST). Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	2a09378dd9	shim-client: add support for DoPut While at it, make sure we check for nil in DoPost Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	640173cfc2	shim-mgmt: Add endpoint handler for interacting with iptables Add two endpoints: ip6tables, iptables. Each url handler supports GET and PUT operations. PUT expects the requests' data to be []bytes, and to contain iptable information in format to be consumed by iptables-restore. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	0136be22ca	virtcontainers: plumb iptable set/get from sandbox to agent Introduce get/set iptable handling. We add a sandbox API for getting and setting the IPTables within the guest. This routes it from sandbox interface, through kata-agent, ultimately making requests to the guest agent. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 09:27:58 -07:00
Eric Ernst	03176a9e09	proto: update generated code based on proto update Update the generated agent.pb.go code based on proto update. Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2022-05-31 08:45:59 -07:00
Fabiano Fidêncio	fff832874e	clh: Update to v24.0 This release has been tracked through the v24.0 project. virtio-iommu specification describes how a device can be attached by default to a bypass domain. This feature is particularly helpful for booting a VM with guest software which doesn't support virtio-iommu but still need to access the device. Now that Cloud Hypervisor supports this feature, it can boot a VM with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing the disk image is placed behind a virtual IOMMU. Multiple checks have been added to the code to prevent devices with identical identifiers from being created, and therefore avoid unexpected behaviors at boot or whenever a device was hot plugged into the VM. Sparse mmap support has been added to both VFIO and vfio-user devices. This allows the device regions that are not fully mappable to be partially mapped. And the more a device region can be mapped into the guest address space, the fewer VM exits will be generated when this device is accessed. This directly impacts the performance related to this device. A new serial_number option has been added to --platform, allowing a user to set a specific serial number for the platform. This number is exposed to the guest through the SMBIOS. * Fix loading RAW firmware (#4072) * Reject compressed QCOW images (#4055) * Reject virtio-mem resize if device is not activated (#4003) * Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069) * Fix algorithm finding HOB memory resources (#3983) * Refactor interrupt handling (#4083) * Load kernel asynchronously (#4022) * Only create ACPI memory manager DSDT when resizable (#4013) Deprecated features will be removed in a subsequent release and users should plan to use alternatives * The mergeable option from the virtio-pmem support has been deprecated (#3968) * The dax option from the virtio-fs support has been deprecated (#3889) Fixes: #4317 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-05-26 08:51:18 +00:00
Eric Ernst	6d00701ec9	Merge pull request #4298 from yibozhuang/fix-direct-volume Fix issues with direct-volume stats feature	2022-05-23 15:23:51 -07:00
Yibo Zhuang	4428ceae16	runtime: direct-volume stats use correct name Today the shim does a translation when doing direct-volume stats where it takes the source and returns the mount path within the guest. The source for a direct-assigned volume is actually the device path on the host and not the publish volume path. This change will perform a lookup of the mount info during direct-volume stats to ensure that the device path is provided to the shim for querying the volume stats. Fixes: #4297 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-05-20 18:42:47 -07:00
Yibo Zhuang	ffdc065b4c	runtime: direct-volume stats update to use GET parameter The go default http mux AFAIK doesn’t support pattern routing so right now client is padding the url for direct-volume stats with a subpath of the volume path and this will always result in 404 not found returned by the shim. This change will update the shim to take the volume path as a GET query parameter instead of a subpath. If the parameter is missing or empty, then return 400 BadRequest to the client. Fixes: #4297 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-05-20 18:41:51 -07:00
Yibo Zhuang	f295953183	runtime: fix incorrect Action function for direct-volume stats The action function expects a function that returns error but the current direct-volume stats Action returns (string, error) which is invalid. This change fixes the format and print out the stats from the command instead. Fixes: #4293 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-05-20 14:55:00 -07:00
Peng Tao	2c238c8504	Merge pull request #4213 from zvonkok/vfio runtime: Adding the correct detection of mediated PCIe devices	2022-05-20 15:00:23 +08:00
Fabiano Fidêncio	811ac6a8ce	Merge pull request #4282 from r4f4/runtime-dedup-types-import runtime: remove duplicate 'types' import	2022-05-19 22:15:36 +02:00
Chelsea Mafrica	d8be0f8e9f	Merge pull request #4281 from r4f4/runtime-qemu-comments runtime: sync docstrings with function names	2022-05-19 09:17:38 -07:00
Rafael Fonseca	7a5ccd1264	runtime: sync docstrings with function names The functions were renamed but their docstrings were not. Fixes #4006 Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>	2022-05-19 14:31:47 +02:00
Greg Kurz	fa61bd43ee	Merge pull request #4238 from snir911/wip/legacy_console qemu: allow using legacy serial device for the console	2022-05-19 14:30:59 +02:00
Rafael Fonseca	ce2e521a0f	runtime: remove duplicate 'types' import Fallout of `09f7962ff` Fixes #4285 Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>	2022-05-19 13:49:47 +02:00
Snir Sheriber	f4994e486b	runtime: allow annotation configuration to use_legacy_serial and update the docs and test Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-18 18:58:21 +03:00
Fabiano Fidêncio	c88a48be21	Merge pull request #4271 from r4f4/runtime-err-check-fix runtime: do not check for EOF error in console watcher	2022-05-18 09:49:48 +02:00
GabyCT	12f0ab120a	Merge pull request #4191 from dgibson/go-test-script Improve Go unit test script	2022-05-17 10:27:04 -05:00
Rafael Fonseca	8052fe62fa	runtime: do not check for EOF error in console watcher The documentation of the bufio package explicitly says "Err returns the first non-EOF error that was encountered by the Scanner." When io.EOF happens, `Err()` will return `nil` and `Scan()` will return `false`. Fixes #4079 Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>	2022-05-17 15:14:33 +02:00
Snir Sheriber	c67b9d2975	qemu: allow using legacy serial device for the console This allows to get guest early boot logs which are usually missed when virtconsole is used. - It utilizes previous work on the govmm side: https://github.com/kata-containers/govmm/pull/203 - unit test added Fixes: #4237 Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-17 12:06:11 +03:00
Snir Sheriber	44814dce19	qemu: treat console kernel params within appendConsole as it is tightly coupled with the appended console device additionally have it tested Signed-off-by: Snir Sheriber <ssheribe@redhat.com>	2022-05-17 12:05:31 +03:00
Fabiano Fidêncio	c39852e83f	runtime: Use ${LIBEXEC}/virtiofsd as the default virtiofsd path As now we build and ship the rust version of virtiofsd, which is not tied to QEMU, we need to update its default location to match with where we're installing this binary. Fixes: #4249 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-05-16 09:30:24 +02:00
David Gibson	e73b70baff	runtime: Don't run unit tests verbose by default go-test.sh by default adds the -v option to 'go test' meaning that output will be printed from all the passing tests as well as any failing ones. This results in a lot of output in which it's often difficult to locate the failing tests you're interested in. So, remove -v from the default flags. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:22:31 +10:00
David Gibson	f24a6e761f	runtime: Consolidate flags setting in unit tests script One of the responsibilities of the go-test.sh script is setting up the default flags for 'go test'. This is constructed across several different places in the script using several unneeded intermediate variables though. Consolidate all the flag construction into one place. fixes #4190 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:22:29 +10:00
David Gibson	cf465feb02	runtime: Don't change test behaviour based on $CI or $KATA_DEV_MODE go-test.sh changes behaviour based on both the $CI and $KATA_DEV_MODE variables, but not in a way that makes a lot of sense. If either one is set it uses the test_coverage path, instead of the test_local path. That collects coverage information, as the name suggests, but it also means it runs the tests twice as root and non-root, which is very non-obvious. It's not clear what use case the test_local path is for at all. Developer local builds will typically have $KATA_DEV_MODE set and CI builds will have $CI set. There's essentially no downside to running coverage all the time - it has little impact on the test runtime. In addition, if both $CI and $KATA_DEV_MODE are set, the script refuses to run things as root, considering it "unsafe". While having both set might be unwise in a general sense, there's not really any way running sudo can be any more unsafe than it is with either one set. So, simplify everything by just always running the test_coverage path. This leaves the test_local path unused, so we can remove it entirely. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
David Gibson	34c4ac599c	runtime: Remove redundant subcommands from go-test.sh go-test.sh accepts subcommands, however invoking it in the usual way via the Makefile doesn't use them. In fact the only remaining subcommand is "help" and we already have another way of getting the usage information (-h or --help). We don't need a second way, so just drop subcommand handling. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
David Gibson	0aff5aaa39	runtime: Simplify package listing in go-test.sh go-test.sh defaults to testing all the packages listed by go list, except for a number filtered out. It turns out that none of those filters are necessary any more: * We've long required a Go newer than 1.9 which means the vendor filter isn't needed * The agent filter doesn't do anything now that we've moved to the Kata 2.x unified repo * The tests filters don't hit anything on the list of modules in src/runtime (which is the only user of the script) But since we don't need to filter anything out any more, we don't even need to iterate through a list ourselves. We can simply pass "./..." directly to go test and it will iterate through all the sub-packages itself. Interestingly this more than doubles the speed of "make test" for me - I suspect because go test's internal paralellism works better over a larger pool of tests. This also lets us remove handling of non-existent coverage files from test_go_package(), since with default options we will no longer test packages without tests by default. If the user explicitly requests testing of a package with no tests, then failing makes sense. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
David Gibson	557c4cfd00	runtime: Don't chmod coverage files in Go tests The go-test.sh script has an explicit chmod command, run as root, to set the mode of the temporary coverage files to 0644. AFAICT the point of this is specifically the 004 bit allowing world read access, so that we can then merge the temporary coverage file into the main coverage file. That's a convoluted way of doing things. Instead we can just run the tail command which reads the temporary file as the same user that generated it. In addition, go-test.sh became root to remove that temporary coverage file. This is not necessary, since deleting a regular file just requires write access to the directory, not the file itself. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
David Gibson	04c8b52e04	runtime: Remove HTML coverage option from go-test.sh The html-coverage option to this script doesn't really alter behaviour it just does the same thing as normal coverage, then converts the report to HTML. That conversion is a single command, plus a chmod to make the final output mode 0644. That overrides any umask the user has set, which doesn't seem like a policy decision this script should be making. Nothing in the kata-containers or tests repository uses this, so it doesn't really make sense to keep this logic inside this script. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
David Gibson	7f76914422	runtime: Add coverage.txt.tmp to gitignore In addition to coverage.txt, the go-test.sh script creates coverage.txt.tmp files while running. These are temporary and certainly shouldn't be committed, so add them to the gitignore file. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
David Gibson	13c2577004	runtime: Move go testing script locally The go unit tests for the runtime are invoked by the helper script ci/go-test.sh. Which calls the run_go_test() function in ci/lib.sh. Which calls into .ci/go-test.sh from the tests repository. But.. the runtime is the only user of this script, and generally stuff for unit tests (rather than functional or integration tests) lives in the main repository, not the tests repository. So, just move the actual script into src/runtime. A change to remove it from the tests repo will follow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-05-13 13:14:37 +10:00
Zvonko Kaiser	2a1d394147	runtime: Adding the correct detection of mediated PCIe devices Fixes #4212 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2022-05-09 00:57:06 -07:00
Fabiano Fidêncio	33a8b70558	clh: Rely on Cloud Hypervisor for generating the device ID We're currently hitting a race condition on the Cloud Hypervisor's driver code when quickly removing and adding a block device. This happens because the device removal is an asynchronous operation, and we currently do not monitor events coming from Cloud Hypervisor to know when the device was actually removed. Together with this, the sandbox code doesn't know about that and when a new device is attached it'll quickly assign what may be the very same ID to the new device, leading to the Cloud Hypervisor's driver trying to hotplug a device with the very same ID of the device that was not yet removed. This is, in a nutshell, why the tests with Cloud Hypervisor and devmapper have been failing every now and then. The workaround taken to solve the issue is basically not passing down the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor itself generate those, as Cloud Hypervisor does it in a manner that avoids such conflicts. With this addition we have then to keep a map of the device ID and the Cloud Hypervisor's generated ID, so we can properly remove the device. This workaround will probably stay for a while, at least till someone has enough cycles to implement a way to watch the device removal event and then properly act on that. Spoiler alert, this will be a complex change that may not even be worth it considering the race can be avoided with this commit. Fixes: #4176 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-05-04 09:04:03 +02:00
Jianyong Wu	982c32358a	Merge pull request #4031 from Jaylyn-Ren/kata-spdk Virtcontainers: Enable hot plugging vhost-user-blk device on ARM	2022-04-29 12:16:38 +08:00
Fabiano Fidêncio	b6467ddd73	clh: Expose disk rate limiter config With everything implemented, let's now expose the disk rate limiter configuration options in the Cloud Hypervisor configuration file. Fixes: #4139 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:28:29 +02:00
Fabiano Fidêncio	7580bb5a78	clh: Expose net rate limiter config With everything implemented, let's now expose the net rate limiter configuration options in the Cloud Hypervisor configuration file. Fixes: #4017 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:28:13 +02:00
Fabiano Fidêncio	a88adabaae	clh: Cloud Hypervisor has a built-in Rate Limiter The notion of "built-in rate limiter" was added as part of `bd8658e362`, and that commit considered that only Firecracker had a built-in rate limiter, which I think was the case when that was introduced (mid 2020). Nowadays, however, Cloud Hypervisor takes advantage of the very same crate used by Firecraker to do I/O throttling. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:27:56 +02:00
Fabiano Fidêncio	63c4da03a9	clh: Implement the Disk RateLimiter logic Let's take advantage of the newly added DiskRateLimiter* options and apply those to the network device configuration. The logic here is identical to the one already present in the Network part of Cloud Hypervisor's driver. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:27:53 +02:00
Fabiano Fidêncio	511f7f822d	config: Add DiskRateLimiter* to Cloud Hypervisor Let's add the newly added disk rate limiter configurations to the Cloud Hypervisor's hypervisor configuration. Right now those are not used anywhere, and there's absolutely no way the users can set those up. That's coming later in this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:27:15 +02:00
Fabiano Fidêncio	5b18575dfe	hypervisor: Add disk bandwidth and operations rate limiters This is the disk counterpart of the what was introduced for the network as part of the previous commits in this series. The newly added fields are: * DiskRateLimiterBwMaxRate, defined in bits per second, which is used to control the network I/O bandwidth at the VM level. * DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which is used to define an initial max rate, which doesn't replenish. * DiskRateLimiterOpsMaxRate, the operations per second equivalent of the DiskRateLimiterBwMaxRate. * DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of the DiskRateLimiterBwOneTimeBurst. For now those extra fields have only been added to the hypervisor's configuration and they'll be used in the coming patches of this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:27:11 +02:00
Fabiano Fidêncio	1cf9469297	clh: Implement the Network RateLimiter logic Let's take advantage of the newly added NetRateLimiter* options and apply those to the network device configuration. The logic here is quite similar to the one already present in the Firecracker's driver, with the main difference being the single Inbound / Outbound MaxRate and the presence of both Bandwidth and Operations rate limiter. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:26:38 +02:00
Fabiano Fidêncio	00a5b1bda9	utils: Define DefaultRateLimiterRefillTimeMilliSecs Firecracker's driver doesn't expose the RefillTime option of the rate limiter to the user. Instead, it uses a contant value of 1000 miliseconds (1 second). As we're following Firecracker's driver implementation, let's expose create a new constant, use it as part of the Firecracker's driver, and later on re-use it as part of the Cloud Hypervisor's driver. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:22:42 +02:00
Fabiano Fidêncio	be1bb7e39f	utils: Move FC's function to revert bytes to utils Firecracker's revertBytes function, now called "RevertBytes", can be exposed as part of the virtcontainers' utils file, as this function will be reused by Cloud Hypervisor, when adding the rate limiter logic there. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:22:42 +02:00
Fabiano Fidêncio	c9f6496d6d	config: Add NetRateLimiter* to Cloud Hypervisor Let's add the newly added network rate limiter configurations to the Cloud Hypervisor's hypervisor configuration. Right now those are not used anywhere, and there's absolutely no way the users can set those up. That's coming later in this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:22:42 +02:00
Fabiano Fidêncio	2d35e6066d	hypervisor: Add network bandwidth and operations rate limiters In a similar way to what's already exposed as RxRateLimiterMaxRate and TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's configuration. The values added are related to bandwidth and operations rate limiters, which have to be added so we can expose I/O throttling configurations to users using Cloud Hypervisor as their preferred VMM. The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because Cloud Hypervisor exposes a single MaxRate to be used for both inbound and outbound queues. The newly added fields are: * NetRateLimiterBwMaxRate, defined in bits per second, which is used to control the network I/O bandwidth at the VM level. * NetRateLimiterBwOneTimeBurst, also defined in bits per second, which is used to define an initial max rate, which doesn't replenish. * NetRateLimiterOpsMaxRate, the operations per second equivalent of the NetRateLimiterBwMaxRate. * NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of the NetRateLimiterBwOneTimeBurst. For now those extra fields have only been added to the hypervisor's configuration and they'll be used in the coming patches of this very same series. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-28 10:22:42 +02:00
David Gibson	1b931f4203	runtime: Allock mockfs storage to be placed in any directory Currently EnableMockTesting() takes no arguments and will always place the mock storage in the fixed location /tmp/vc/mockfs. This means that one test run can interfere with the next one if anything isn't cleaned up (and there are other bugs which means that happens). If if those were fixed this would allow developers testing on the same machine to interfere with each other. So, allow the mockfs to be placed at an arbitrary place given as a parameter to EnableMockTesting(). In TestMain() we place it under our existing temporary directory, so we don't need any additional cleanup just for the mockfs. fixes #4140 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:47:59 +10:00
David Gibson	ef6d54a781	runtime: Let MockFSInit create a mock fs driver at any path Currently MockFSInit always creates the mockfs at the fixed path /tmp/vc/mockfs. This change allows it to be initialized at any path given as a parameter. This allows the tests in fs_test.go to be simplified, because the by using a temporary directory from t.TempDir(), which is automatically cleaned up, we don't need to manually trigger initTestDir() (which is misnamed, it's actually a cleanup function). For now we still use the fixed path when auto-creating the mockfs in MockAutoInit(), but we'll change that later. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:23:36 +10:00
David Gibson	5d8438e939	runtime: Move mockfs control global into mockfs.go virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for testing. A global variable in virtcontainers/persist/manager.go is used to force use of the mock fs rather than a normal one. This patch moves the global, and the EnableMockTesting() function which sets it into mockfs.go. This is slightly cleaner to begin with, and will allow some further enhancements. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:23:36 +10:00
David Gibson	963d03ea8a	runtime: Export StoragePathSuffix storagePathSuffix defines the file path suffix - "vc" - used for Kata's persistent storage information, as a private constant. We duplicate this information in fc.go which also needs it. Export it from fs.go instead, so it can be used in fc.go. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:23:36 +10:00
David Gibson	1719a8b491	runtime: Don't abuse MockStorageRootPath() for factory tests A number of unit tests under virtcontainers/factory use MockStorageRootPath() as a general purpose temporary directory. This doesn't make sense: the mockfs driver isn't even in use here since we only call EnableMockTesting for the pase virtcontainers package, not the subpackages. Instead use t.TempDir() which is for exactly this purpose. As a bonus it also handles the cleanup, so we don't need MockStorageDestroy any more. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:23:36 +10:00
David Gibson	bec59f9e39	runtime: Make bind mount tests better clean up after themselves There are several tests in mount_test.go which perform a sample bind mount. These need a corresponding unmount to clean up afterwards or attempting to delete the temporary files will fail due to the existing mountpoint. Most of them had such an unmount, but TestBindMountInvalidPgtypes was missing one. In addition, the existing unmounts where done inconsistently - one was simply inline (so wouldn't be executed if the test fails too early) and one is a defer. Change them all to use the t.Cleanup mechanism. For the dummy mountpoint files, rather than cleaning them up after the test, the tests were removing them at the beginning of the test. That stops the test being messed up by a previous run, but messily. Since these are created in a private temporary directory anyway, if there's something already there, that indicates a problem we shouldn't ignore. In fact we don't need to explicitly remove these at all - they'll be removed along with the rest of the private temporary directory. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:20:35 +10:00
David Gibson	f7ba21c86f	runtime: Clean up mock hook logs in tests The tests in hook_test.go run a mock hook binary, which does some debug logging to /tmp/mock_hook.log. Currently we don't clean up those logs when the tests are done. Use a test cleanup function to do this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:14:52 +10:00
David Gibson	90b2f5b776	runtime: Make SetupOCIConfigFile clean up after itself SetupOCIConfigFile creates a temporary directory with os.MkDirTemp(). This means the callers need to register a deferred function to remove it again. At least one of them was commented out meaning that a /temp/katatest- directory was leftover after the unit tests ran. Change to using t.TempDir() which as well as better matching other parts of the tests means the testing framework will handle cleaning it up. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:14:52 +10:00
David Gibson	2eeb5dc223	runtime: Don't use fixed /tmp/mountPoint path Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy directory to mount. This is not cleaned up after the test. Although it is in /tmp, that's still a little messy and can be confusing to a user. In addition, because it uses the same name every time, it allows for one run of the test to interfere with the next. Use the built in t.TempDir() to use an automatically named and deleted temporary directory instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-04-22 14:14:52 +10:00
Archana Shinde	33e244f284	Merge pull request #4102 from likebreath/0414/clh_v23.0 Upgrade to Cloud Hypervisor v23.0	2022-04-19 06:01:04 -07:00
Chelsea Mafrica	0af13b469d	Merge pull request #4086 from BbolroC/s390x-fix test: Fix golangci-lint error for s390x	2022-04-15 21:07:09 -07:00
Bin Liu	b19bfac7cd	Merge pull request #4042 from yibozhuang/direct-assign-fsgroup fsGroup support for direct-assigned volume	2022-04-16 10:23:15 +08:00
Bin Liu	4ec1967542	Merge pull request #4094 from fgiudici/kata-monitor_readme kata-monitor: add the README file	2022-04-16 08:27:22 +08:00
Bin Liu	362201605e	Merge pull request #4055 from fgiudici/kata-monitor_pprof kata-monitor: update the hrefs in the debug/pprof index page	2022-04-16 08:12:18 +08:00
Francesco Giudici	7b2ff02647	kata-monitor: add a README file Fixes: #3704 Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2022-04-15 18:03:23 +02:00
Bo Chen	29e569aa92	virtcontainers: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v23.0. Note: The client code of cloud-hypervisor's (CLH) OpenAPI is automatically generated by openapi-generator [1-2]. [1] https://github.com/OpenAPITools/openapi-generator [2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md Signed-off-by: Bo Chen <chen.bo@intel.com>	2022-04-14 12:56:01 -07:00
Chelsea Mafrica	32f92e75cc	Merge pull request #4021 from fengwang666/direct-volume-bug runtime: Base64 encode the direct volume mountInfo path	2022-04-13 13:15:38 -07:00
Greg Kurz	4443bb68a4	Merge pull request #4064 from tiezhuoyu/4063/no-need-to-write-error-of-virtiofsd-to-kata-log runtime: no need to write virtiofsd error to log	2022-04-13 11:59:19 +02:00
Hyounggyu Choi	d136c9c240	test: Fix golangci-lint error for s390x This is to fix a test failure for the kata-containers-2.0-ubuntu-20.04-s390x-main-baseline jenkins job Fixes: #4088 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2022-04-13 09:20:51 +02:00
Fupan Li	66aa07649b	Merge pull request #4062 from liubin/fix/4061-add-links-for-kata-monitor kata-monitor: add some links when generating pages for browsers	2022-04-13 11:30:21 +08:00
Francesco Giudici	86977ff780	kata-monitor: update the hrefs in the debug/pprof index page kata-monitor allows to get data profiles from the kata shim instances running on the same node by acting as a proxy (e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID). In order to proxy the requests and the responses to the right shim, kata-monitor requires to pass the sandbox id via a query string in the url. The profiling index page proxied by kata-monitor contains the link to all the data profiles available. All the links anyway do not contain the sandbox id included in the request: the links result then broken when accessed through kata-monitor. This happens because the profiling index page comes from the kata shim, which will not include the query string provided in the http request. Let's add on-the-fly the sandbox id in each href tag returned by the kata shim index page before providing the proxied page. Fixes: #4054 Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2022-04-12 15:53:59 +02:00
Zhuoyu Tie	6e79042aa0	runtime: no need to write virtiofsd error to log The scanner reads nothing from viriofsd stderr pipe, because param '--syslog' rediercts stderr to syslog. So there is no need to write scanner.Text() to kata log Fixes: #4063 Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>	2022-04-12 15:59:57 +08:00
Yibo Zhuang	532d53977e	runtime: fsGroup support for direct-assigned volume The fsGroup will be specified by the fsGroup key in the direct-assign mountinfo metadate field. This will be set when invoking the kata-runtime binary and providing the key, value pair in the metadata field. Similarly, the fsGroupChangePolicy will also be provided in the mountinfo metadate field. Adding an extra fields FsGroup and FSGroupChangePolicy in the Mount construct for container mount which will be populated when creating block devices by parsing out the mountInfo.json. And in handleDeviceBlockVolume of the kata-agent client, it checks if the mount FSGroup is not nil, which indicates that fsGroup change is required in the guest, and will provide the FSGroup field in the protobuf to pass the value to the agent. Fixes #4018 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-04-11 08:41:13 -07:00
Yibo Zhuang	6a47b82c81	proto: fsGroup support for direct-assigned volume This change adds two fields to the Storage pb FSGroup which is a group id that the runtime specifies to indicate to the agent to perform a chown of the mounted volume to the specified group id after mounting is complete in the guest. FSGroupChangePolicy which is a policy to indicate whether to always perform the group id ownership change or only if the root directory group id does not match with the desired group id. These two fields will allow CSI plugins to indicate to Kata that after the block device is mounted in the guest, group id ownership change should be performed on that volume. Fixes #4018 Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>	2022-04-11 08:41:13 -07:00
bin	f8cc5d1ad8	kata-monitor: add some links when generating pages for browsers Add some links to rendered webpages for better user experience, let users can jump to pages only by clicking links in browsers. Fixes: #4061 Signed-off-by: bin <bin@hyper.sh>	2022-04-11 09:29:56 +08:00
bin	9d5b03a1b7	runtime: delete debug option in virtiofsd virtiofsd's debug will be enabled if hypervisor's debug has been enabled, this will generate too many noisy logs from virtiofsd. Unbind the relationship of log level between virtiofsd and hypervisor, if users want to see debug log of virtiofsd, can set it by: virtio_fs_extra_args = ["-o", "log_level=debug"] Fixes: #3303 Signed-off-by: bin <bin@hyper.sh>	2022-04-07 19:55:22 +08:00
Greg Kurz	d0d3787233	Merge pull request #3696 from shippomx/main kata-runtime enable hugepage support	2022-04-06 16:47:04 +02:00
Jaylyn Ren	b975f2e8d2	Virtcontainers: Enable hot plugging vhost-user-blk device on ARM The vhost-user-blk can be hotplugged on the PCI bridge successfully on X86, but failed on Arm. However, hotplugging it on Root Port as a PCIe device can work well on ARM. Open the "pcie_root_port" in configuration.toml is needed. Fixes: #4019 Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>	2022-04-06 17:37:51 +08:00
Fabiano Fidêncio	b39caf43f1	Merge pull request #3923 from Jakob-Naucke/no-initrd-se runtime: Allow and require no initrd for SE	2022-04-05 09:26:07 +02:00
Feng Wang	354cd3b9b6	runtime: Base64 encode the direct volume mountInfo path This is to avoid accidentally deleting multiple volumes. Fixes #4020 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-04-04 19:56:46 -07:00
Archana Shinde	e62bc8e7f3	Merge pull request #3915 from Juneezee/test/t.TempDir test: use `T.TempDir` to create temporary test directory	2022-04-04 01:34:46 -07:00
Fabiano Fidêncio	98750d792b	clh: Expose service offload configuration This configuration option is valid for all the hypervisor that are going to be used with the confidential containers effort, thus exposing the configuration option for Cloud Hypervisor as well. Fixes: #4022 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-04-01 11:15:55 +02:00
Eng Zer Jun	59c7165ee1	test: use `T.TempDir` to create temporary test directory The directory created by `T.TempDir` is automatically removed when the test and all its subtests complete. This commit also updates the unit test advice to use `T.TempDir` to create temporary directory in tests. Fixes: #3924 Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-03-31 09:31:36 +08:00
snir911	18dc578134	Merge pull request #3999 from fgiudici/kata-monitor_fix_help kata-monitor: fix duplicated output when printing usage	2022-03-30 18:56:59 +03:00
Francesco Giudici	a63bbf9793	kata-monitor: fix duplicated output when printing usage (default: "/run/containerd/containerd.sock") is duplicated when printing kata-monitor usage: [root@kubernetes ~]# kata-monitor --help Usage of kata-monitor: -listen-address string The address to listen on for HTTP requests. (default ":8090") -log-level string Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info") -runtime-endpoint string Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock") the golang flag package takes care of adding the defaults when printing usage. Remove the explicit print of the value so that it would not be printed on screen twice. Fixes: #3998 Signed-off-by: Francesco Giudici <fgiudici@redhat.com>	2022-03-30 11:58:53 +02:00
bin	5e1c30d484	runtime: add logs around sandbox monitor For debugging purposes, add some logs. Fixes: #3815 Signed-off-by: bin <bin@hyper.sh>	2022-03-29 16:59:12 +08:00
bin	fb8be96194	runtime: stop getting OOM events when ttrpc: closed error getOOMEvents is a long-waiting call, it will retry when failed. For cases of agent shutdown, the retry should stop. When the agent hasn't detected agent has died, we can also check whether the error is "ttrpc: closed". Fixes: #3815 Signed-off-by: bin <bin@hyper.sh>	2022-03-29 16:39:01 +08:00
Bin Liu	9495316145	Merge pull request #3962 from yaoyinnan/fix/3750-VirtioMem runtime: Remove the explicit VirtioMem set and fix the comment	2022-03-29 10:20:05 +08:00
yaoyinnan	66f05c5bcb	runtime: Remove the explicit VirtioMem set and fix the comment Modify the 2Mib in the comment to 4Mib. VirtioMem is set by configuration file or annotation. And setupVirtioMem is called only when VirtioMem is true. Fixes: #3750 Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>	2022-03-28 21:21:38 +08:00
Feng Wang	0928eb9f4e	agent: Kill the all the container processes of the same cgroup Otherwise the container process might leak and cause an unclean exit Fixes: #3913 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-27 10:06:58 -07:00
Jakob Naucke	ff17c756d2	runtime: Allow and require no initrd for SE Previously, it was not permitted to have neither an initrd nor an image. However, this is the exact config to use for Secure Execution, where the initrd is part of the image to be specified as `-kernel`. Require the configuration of no initrd for Secure Execution. Also - remove redundant code for image/initrd checking -- no need to check in `newQemuHypervisorConfig` (calling) when it is also checked in `getInitrdAndImage` (called) - use `QemuCCWVirtio` constant when possible Fixes: #3922 Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2022-03-25 18:36:12 +01:00
Feng Wang	19f372b5f5	runtime: Add more debug logs for container io stream copy This can help debugging container lifecycle issues Fixes: #3913 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-24 21:35:16 -07:00
David Gibson	c77e34de33	runtime: Move mock hook source src/runtime/virtcontainers/hook/mock contains a simple example hook in Go. The only thing this is used for is for some tests in src/runtime/pkg/katautils/hook_test.go. It doesn't really have anything to do with the rest of the virtcontainers package. So, move it next to the test code that uses it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-23 19:37:35 +11:00
David Gibson	86723b51ae	virtcontainers: Remove unused install/uninstall targets We've now removed the need to install the mock hook binary for unit tests. However, it turns out that managing that was the only thing that the install and uninstall targets in the virtcontainers Makefile handled. So, remove them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-23 19:37:18 +11:00
David Gibson	0e83c95fac	virtcontainers: Run mock hook from build tree rather than system bin dir Running unit tests should generally have minimal dependencies on things outside the build tree. It definitely shouldn't modify system wide things outside the build tree. Currently the runtime "make test" target does so, though. Several of the tests in src/runtime/pkg/katautils/hook_test.go require a sample hook binary. They expect this hook in /usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs the test binary to that location. Go tests automatically run within the package's directory though, so there's no need to use a system wide path. We can use a relative path to the binary build within the tree just as easily. fixes #3941 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-23 19:34:50 +11:00
David Gibson	e65db838ff	virtcontainers: Remove VC_BIN_DIR The VC_BIN_DIR variable in the virtcontainers Makefile is almost unused. It's used to generate TEST_BIN_DIR, and it's created in the install target. However, we also create TEST_BIN_DIR, which is a subdirectory of VC_BIN_DIR with mkdir -p, so it will necessarily create VC_BIN_DIR along the way. So we can drop the unnecessary mkdir and expand the definition of VC_BIN_DIR in the definition of TEST_BIN_DIR. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-22 16:53:59 +11:00
David Gibson	c20ad2836c	virtcontainers: Remove unused Makefile defines The INSTALL_EXEC and UNINSTALL_EXEC definitions from the virtcontainers Makefile (unlike those from the runtime Makefile in the parent directory) are entirely unused. Remove them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-22 16:40:57 +11:00
David Gibson	c776bdf4a8	virtcontainers: Remove unused parameter from go-test.sh The check-go-test target passes the path to the mock hook test binary to go-test.sh when it invokes it. But go-test.sh just calls run_go_test from ci/lib.sh, which invokes a script from the tests repo without any parameters. That is, this parameter is ignored anyway, so remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-03-22 16:39:22 +11:00
James O. D. Hunt	f8fb0d3bb6	Merge pull request #3322 from Kvasscn/kata_dev_block_driver_option device: using const strings for block-driver option instead of hard coding	2022-03-21 10:56:25 +00:00
Miao Xia	a2f5c1768e	runtime/virtcontainers: Pass the hugepages resources to agent The hugepages resources claimed by containers should be limited by cgroup in the guest OS. Fixes: #3695 Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>	2022-03-15 18:46:08 +08:00
Feng Wang	aa5ae6b17c	runtime: Properly handle ESRCH error when signaling container Currently kata shim v2 doesn't translate ESRCH signal, causing container fail to stop and shim leak. Fixes: #3874 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-14 11:03:05 -07:00
zhanghj	efa19c41eb	device: use const strings for block-driver option instead of hard coding Currently, the block driver option is specifed by hard coding, maybe it is better to use const string variables instead of hard coded strings. Another modification is to remove duplicate consts for virtio driver in manager.go. Fixes: #3321 Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>	2022-03-14 09:20:43 +08:00
Gabriela Cervantes	ffdf961ae9	docs: Update contact link in runtime README This PR updates the contact link in the runtime README document. Fixes #3854 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2022-03-08 16:27:34 +00:00
Bin Liu	deb8ce97a8	Merge pull request #3836 from liubin/fix/minor-fix Enhancement: fix comments/logs and delete not used function	2022-03-07 17:26:30 +08:00
bin	1b34494b2f	runtime: fix invalid comments for pkg/resourcecontrol Some comments are copied and not adjusted to the pkg/resourcecontrol package. Fixes: #3835 Signed-off-by: bin <bin@hyper.sh>	2022-03-05 10:32:31 +08:00
Evan Foster	afc567a9ae	storage: make k8s emptyDir creation configurable This change introduces the `disable_guest_empty_dir` config option, which allows the user to change whether a Kubernetes emptyDir volume is created on the guest (the default, for performance reasons), or the host (necessary if you want to pass data from the host to a guest via an emptyDir). Fixes #2053 Signed-off-by: Evan Foster <efoster@adobe.com>	2022-03-04 12:02:42 -08:00
Eric Ernst	1e301482e7	Merge pull request #3406 from fengwang666/direct-blk-assignment Implement direct-assigned volume	2022-03-04 11:58:37 -08:00
Feng Wang	e76519af83	runtime: small refactor to improve readability Remove some confusing/duplicate code so it's more readable Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-04 10:00:52 -08:00
Fabiano Fidêncio	7e5f11a52b	vendor: Update containerd to 1.6.1 Let's bring in the latest release of Containerd, 1.6.1, released on March 2nd, 2022. With this, we take the opportunity to remove containerd/api reference as we shouldn't need a separate module only for the API. Here's the list of changes needed in the code due to the bump: * stop using `grpc.WithInsecure()` as it's been deprecated - use `grpc.WithTransportCredentials(insecure.NewCredentials())` instead Fixes: #3820 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-04 10:28:40 +01:00
Fabiano Fidêncio	2af91b23e1	Merge pull request #3281 from jongwu/vcpu_hotplug_arm64 experimentally enable vcpu hotplug and virtio-mem on arm64 in kernel part	2022-03-04 09:14:31 +01:00
Jianyong Wu	42771fa726	runtime: don't set socket and thread for arm/virt As this is just a initial vcpu hotplug support, thread and socket has not been supported. So, don't set socket and thread when hotadd cpu for arm/virt. Fixes: #3280 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2022-03-04 11:22:18 +08:00
Feng Wang	f905161bbb	runtime: mount direct-assigned block device fs only once Mount the direct-assigned block device fs only once and keep a refcount in the guest. Also use the ro flag inside the options field to determine whether the block device and filesystem should be mounted as ro Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	ea51ef1c40	runtime: forward the stat and resize requests from shimv2 to kata agent Translate the volume path from host-known path to guest-known path and forward the request to kata agent. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	c39281ad65	runtime: update container creation to work with direct assigned volumes During the container creation, it will parse the mount info file of the direct assigned volumes and update the in memory mount object. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	4e00c2377c	agent: add grpc interface for stat and resize operations Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat and resize fs in the guest. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:57:02 -08:00
Feng Wang	e9b5a25502	runtime: add stat and resize APIs to containerd-shim-v2 To query fs stats and resize fs, the requests need to be passed to kata agent through containerd-shim-v2. So we're adding to rest APIs on the shim management endpoint. Also refactor shim management client to its own go file. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 18:56:53 -08:00
Feng Wang	6e0090abb5	runtime: persist direct volume mount info In the direct assigned volume scenario, Kata Containers persists the information required for managing the volume inside the guest on host filesystem. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 15:32:12 -08:00
Feng Wang	fa326b4e0f	runtime: augment kata-runtime CLI to support direct-assigned volume Add commands to add, remove, resize and get stats of a direct-assigned volume. These commands are expected to be consumed by CSI. Fixes: #3454 Signed-off-by: Feng Wang <feng.wang@databricks.com>	2022-03-03 15:32:03 -08:00
Fabiano Fidêncio	a2422cf2a1	Merge pull request #3389 from zhsj/rm-distro-test katatestutils: remove distro constraints	2022-03-03 23:26:58 +01:00
Fabiano Fidêncio	12af632952	Merge pull request #3814 from fidencio/wip/disable-block-device-use-minor-fixes Minor fixes for the `disable_block_device_use` comments	2022-03-03 23:26:05 +01:00
Fabiano Fidêncio	af80473496	clh: stop virtofsd if clh fails to boot up the vm If, for some reason, we're able to launch cloud hypervisor but not able to boot the VM up, the virtiofsd process would be left behind. Let's ensure, via defer, that we stop virtiofsd in case of errors. Fixes: #3819 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2022-03-03 19:10:37 +01:00
Fabiano Fidêncio	c54bc8e657	Merge pull request #3811 from fidencio/wip/clh-tdx-round-2 clh: tdx: Don't use sharedFS with Confidential Guests	2022-03-03 19:03:28 +01:00

1 2 3 4 5 ...

1387 Commits