1251 Commits

Author SHA1 Message Date
Peng Tao
d085389127 vc: fix up UT for CreateSandbox API change
Need to adapt the UT as well.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:42 +08:00
Peng Tao
578a9c25f0 vc: rescan network endpoints after running prestart hooks
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.

Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-01-03 22:30:41 +08:00
Danny Canter
86ee24b33c Runtime: Clarify mutability of global var
Was about to change `urandomdev` to a constant when I realized it's
intentionally mutable so it can be mocked in tests. There's other
comments to the same effect so clarify here as well.

Fixes: #5965

Signed-off-by: Danny Canter <danny@dcantah.dev>
2023-01-02 01:13:34 -08:00
Fabiano Fidêncio
f1381eb361 Merge pull request #4813 from ManaSugi/fix/add-selinux-agent
runtime,agent: Add SELinux support for containers inside the guest
2022-12-13 11:24:53 +01:00
Alexandru Matei
d04d45ea05 runtime: use pidfd to wait for processes on Linux
Use pidfd_open and poll on newer versions of Linux to wait
for the process to exit. For older versions use existing wait logic

Fixes: #5617

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-06 16:31:05 +02:00
Alexandru Matei
e9ba0c11d0 runtime: use exponential backoff for process wait
Initial wait period between checks is 1ms, and the
next ones are min(wait_period*5, 50ms)

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-06 16:30:58 +02:00
Alexandru Matei
71491a69c3 runtime: move process wait logic to another function
extract process wait logic to another function

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-05 13:32:04 +02:00
Alexandru Matei
92ebe61fea runtime: reap force killed processes
reap child processes after sending SIGKILL

Fixes #5739

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-12-05 13:31:58 +02:00
Manabu Sugimoto
c617bbe70d runtime: Pass SELinux policy for containers to the agent
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.

Fixes: #4812

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-11-29 19:07:56 +09:00
Bin Liu
6af037d379 Merge pull request #5154 from Yuan-Zhuo/main
agent: support systemd cgroup for kata agent.
2022-11-28 18:40:10 +08:00
Bin Liu
a55eb78c32 Merge pull request #5752 from liubin/fix/5750-go-fix-1.19
runtime: go fix code for 1.19
2022-11-26 02:09:02 +08:00
Peng Tao
e32c023d96 Merge pull request #5714 from UiPath/fix-mkdir
runtime: don't fail mkdir if the folder is already created by another process
2022-11-25 17:52:56 +08:00
Bin Liu
1dfd845f51 runtime: go fix code for 1.19
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.

Fixes: #5750

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-11-25 11:29:18 +08:00
Alexandru Matei
4b45e13869 runtime: don't fail mkdir if the folder is already created
Use MkdirAll instead of Mkdir so it doesn't generate an
error when the folder is created by another process

Fixes #5713

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-24 11:20:56 +02:00
Fabiano Fidêncio
df3d9878d5 Merge pull request #5695 from darfux/virtiofs-queue-size
runtime: Support virtiofs queue size for qemu and make it configurable
2022-11-22 20:04:30 +01:00
Peng Tao
a636d426d9 versions: update nydusd version
To the latest stable v2.1.1.

Depends-on: github.com/kata-containers/tests#5246
Fixes: #5635
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-11-19 16:33:29 +00:00
liyuxuan.darfux
3bb145c63a runtime: Support virtiofs queue size for qemu and make it configurable
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.

Fixes: #5694

Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
2022-11-19 15:38:11 +08:00
Bo Chen
36545aa81a runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v28.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #5683

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-17 09:45:27 -08:00
Fabiano Fidêncio
d94718fb30 runtime: Fix gofmt issues
It seems that bumping the version of golang and golangci-lint new format
changes are required.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 14:16:12 +01:00
Fabiano Fidêncio
16b8375095 golang: Stop using io/ioutils
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-11-17 13:43:25 +01:00
Alexandru Matei
a04afab74d qemu: early exit from Check if the process was stopped
Fixes: #5625

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
7e481f2179 qemu: set stopped only if StopVM is successful
Fixes: #5624

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
0e3ac66e76 clh: return faster with dead clh process from isClhRunning
Through proactively checking if Cloud Hypervisor process is dead,
this patch provides a faster path for isClhRunning

Fixes: #5623

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
9ef68e0c7a clh: fast exit from isClhRunning if the process was stopped
Use atomic operations instead of acquiring a mutex in isClhRunning.
This stops isClhRunning from generating a deadlock by trying to
reacquire an already-acquired lock when called via StopVM->terminate.

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Alexandru Matei
2631b08ff1 clh: don't try to stop clh multiple times
Avoid executing StopVM concurrently when virtiofs dies as a result of clh
being stopped in StopVM.

Fixes: #5622

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2022-11-10 22:43:32 +02:00
Fabiano Fidêncio
7250be3601 Merge pull request #5584 from fengyehong/clh-thread
cloud-hypervisor: Fix GetThreadIDs function
2022-11-07 08:22:40 +01:00
Guanglu Guo
daeee26a1e cloud-hypervisor: Fix GetThreadIDs function
Get vcpu thread-ids by reading cloud-hypervisor process tasks information.

Fixes: #5568

Signed-off-by: Guanglu Guo <guoguanglu@qiyi.com>
2022-11-05 17:23:19 +08:00
LitFlwr0
2508d39b7c runtime: added vcpus pinning logics
Core VCPU threads pinning logics for issue 4476. Also provided docs.

Fixes:#4476
Signed-off-by: LitFlwr0 <861690705@qq.com>
2022-11-04 17:52:42 +08:00
snir911
288e337a6f Merge pull request #5434 from Rouzip/remove-doNetNS
add EnterNetNS in virtcontainers
2022-10-30 11:19:07 +02:00
Yuan-Zhuo
d7bb4b5512 agent: support systemd cgroup for kata agent
1. Implemented a rust module for operating cgroups through systemd with the help of zbus (src/agent/rustjail/src/cgroups/systemd).
2. Add support for optional cgroup configuration through fs and systemd at agent (src/agent/rustjail/src/container.rs).
3. Described the usage and supported properties of the agent systemd cgroup (docs/design/agent-systemd-cgroup.md).

Fixes: #4336

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2022-10-25 13:57:09 +08:00
Fabiano Fidêncio
9d286af7b4 versions: Update Cloud Hypervisor to b4e39427080
An API change, done a long time ago, has been exposed on Cloud
Hypervisor and we should update it on the Kata Containers side to ensure
it doesn't affect Cloud Hypervisor CI and because the change is needed
for an upcoming work to get QAT working with Cloud Hypervisor.

Fixes: #5492

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-10-21 20:52:54 +02:00
Rouzip
39363ffbfb runtime: remove same function
Add EnterNetNS in virtcontainers to remove same function.

FIXes #5394

Signed-off-by: Rouzip <1226015390@qq.com>
2022-10-17 10:59:13 +08:00
Fupan Li
2c88e1cd80 Merge pull request #5302 from liubin/fix/5285-SetFsSharingSupport-comment
runtime: fix incorrect comment for SetFsSharingSupport function
2022-10-09 09:40:31 +08:00
Bin Liu
b556c9b986 Merge pull request #5235 from YchauWang/wyc-qmp-log
virtcontainers: add warn log record for qmp hotplug cpu error
2022-10-09 08:29:09 +08:00
Vijay Dhanraj
435c8f181a acrn: Enable ACRN hypervisor support for Kata 2.x release
Currently ACRN hypervisor support in Kata2.x releases is broken.
This commit re-enables ACRN hypervisor support and also refactors
the code so as to remove dependency on Sandbox.

Fixes #3027

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
2022-10-07 07:40:32 -07:00
Archana Shinde
6e2d39c588 Merge pull request #5311 from likebreath/0930/clh_v27.0
Upgrade to Cloud Hypervisor v27.0
2022-10-04 10:56:00 -07:00
Bo Chen
067e2b1e33 runtime: clh: Use the new API to boot with TDX firmware (td-shim)
The new way to boot from TDX firmware (e.g. td-shim) is using the
combination of '--platform tdx=on' with '--firmware tdshim'.

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-03 10:30:54 -07:00
Bo Chen
5d63fcf344 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v27.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #5309

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-03 10:30:42 -07:00
norbjd
17de94e118 microvm: Remove kernel_irqchip=on option
`kernel_irqchip` option doesn't seem to bring any benefits and, on the
contrary, its usage cause issues when using the microvm machine type.

With this in mind, let's remove it.

Fixes: #1984, #4386

Signed-off-by: norbjd <norbjd@users.noreply.github.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-10-03 11:48:05 +02:00
Bin Liu
68e8a86aec runtime: fix incorrect comment for SetFsSharingSupport function
The comment for SetFsSharingSupport is not suitable, correct the
function name.

Fixes: #5285

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-09-30 15:44:44 +08:00
Peng Tao
8a2df6b31c Merge pull request #4931 from jpecholt/snp-support
Added SNP-Support for Kata-Containers
2022-09-27 14:17:54 +08:00
wangyongchao.bj
04bbce8dc3 virtcontainers: add warn log record for qmp hotplug cpu error
The qmp command of hotplug cpu failed error was hidden. It didn't friendly for
the user tracing the hotplug cpu error. The PR help us to improve the hotplug
cpu error log. Add real qemu command error log for `failed to hot add vCPUs`.
Through the error message, we can get the reason of the failed qmp command
 for hotplug cpu operation.

Fixes: #5234

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2022-09-23 08:22:30 +08:00
Joana Pecholt
ded60173d4 runtime: Enable choice between AMD SEV and SNP
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Joana Pecholt
22bda0838c runtime: Support for AMD SEV-SNP VMs
This commit adds AMD SEV-SNP as a confidential guest option to the
runtime. Information on required components such as OVMF, QEMU and
a kernel supporting SEV-SNP are defined in the versions file and
corresponding configs are added.

Note: The CPU model 'host' provided by the current SNP-QEMU does
not support all SNP capabilities yet, which is why this option is
changed to EPYC-v4.

Note: The guest's physical address space reduction specified with
ReducedPhysBits is 1. Details are can be found in Section 15.34.6
here https://www.amd.com/system/files/TechDocs/24593.pdf

Fixes #4437

Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
2022-09-16 17:51:41 +02:00
Feng Wang
f914319874 runtime: store the user name in hypervisor config
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-13 10:32:55 -07:00
Feng Wang
5cafe21770 runtime: make StopVM thread-safe
StopVM can be invoked by multiple threads and needs to be thread-safe

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-12 21:56:15 -07:00
Feng Wang
c3015927a3 runtime: add more debug logs for non-root user operation
Previously the logging was insufficient and made debugging difficult

Fixes: #5155

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-09-12 21:38:57 -07:00
Eric Ernst
9997ab064a sandbox_test: Add test to verify memory hotplug behavior
Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.

We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
f390c122f0 sandbox: don't hotplug too much memory at once
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.

During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).

From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest. (a factor of 48 results in using 75% of
provided memory for hotplug). Using prior example of a guest with 256Mi
RAM, 256 Mi * 48 = 12 Gi; 12GiB is upper end of what we should expect
can be hotplugged successfully into the guest.

Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.

If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.

Fixes: #4847

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-31 10:32:30 -07:00
Eric Ernst
e0142db24f hypervisor: Add GetTotalMemoryMB to interface
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calcualting
how much memory we can add at a time when utilizing ACPI hotplug.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-08-30 16:37:47 -07:00