kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-08-18 07:58:36 +00:00

Author	SHA1	Message	Date
stevenhorsman	779754dcf6	runtime: Support policy in remote hypervisor Move the `sandbox.agent.setPolicy` call out of the remoteHypervisor `if` block, so we can use the policy implementation on peer pods. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-19 16:43:53 +01:00
markyangcc	a28bf266f9	runtime: fix missing VhostUserDeviceReconnect parameter assignment Commit 'ca02c9f5124e' implements the vhost-user-blk reconnection functionality. However, it missed assigning VhostUserDeviceReconnect when creating the QEMU HypervisorConfig, resulting in VhostUserDeviceReconnect always being set to the default value 0. The real change is this line; most of the other changes are caused by go format: return vc.HypervisorConfig{ // ... VhostUserDeviceReconnect: h.VhostUserDeviceReconnect, }, nil Fixes: #9848 Signed-off-by: markyangcc <mmdou3@163.com>	2024-06-18 12:15:10 +08:00
Wainer dos Santos Moschetta	bdbee78517	runtime: allow default_{vcpus,memory} annotations to qemu-coco-dev This is a counterpart of commit `abf52420a4` for the qemu-coco-dev configuration. By allowing default_vcpu and default_memory annotations users can fine-tune the VM based on the size of the container image to avoid issues related with pulling large images in the guest. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-17 18:59:52 -03:00
Wainer dos Santos Moschetta	baa8d9d99c	runtime: set shared_fs=none to qemu-coco-dev configuration Just like the TEE configurations (sev, snp, tdx) we want to have the qemu-coco-dev using shared_fs=none. Fixes: #9676 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-17 18:42:46 -03:00
Bo Chen	a68aeca356	Merge pull request #9575 from likebreath/0430/clh_v39.0 versions: Upgrade to Cloud Hypervisor v39.0	2024-06-14 09:10:19 -07:00
Steve Horsman	ab8a9882c1	Merge pull request #9818 from EmmEff/fix-spelling runtime: fix minor spelling issues	2024-06-14 13:12:56 +01:00
Steve Horsman	99bf95f773	Merge pull request #9827 from littlejawa/fix_panic_on_metrics_gathering runtime: avoid panic on metrics gathering	2024-06-14 11:12:43 +01:00
Mike Frisch	c2f61b0fe3	runtime: spelling fixes Minor spelling fixes in runtime log messages. Signed-off-by: Mike Frisch <mikef17@gmail.com>	2024-06-13 12:11:34 -04:00
Greg Kurz	b85b1c1058	Merge pull request #9790 from gkurz/kill-some-dead-runtime-code Kill some dead runtime code	2024-06-13 15:45:51 +02:00
Julien Ropé	9c86eb1d35	runtime: avoid panic on metrics gathering While running with a remote hypervisor, whenever kata-monitor tries to access metrics from the shim, the shim does a "panic" and no metric can be gathered. The function GetVirtioFsPid() is called on metrics gathering, and had a call to "panic()". Since there is no virtiofs process for remote hypervisor, the right implementation is to return nil. The caller expects that, and will skip metrics gathering for virtiofs. Fixes: #9826 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-06-12 10:02:44 +02:00
Xuewei Niu	92cc5e0adb	Merge pull request #9781 from gaohuatao-1/ght/shm	2024-06-12 12:39:28 +08:00
Greg Kurz	1acf8d0c35	govmm: Drop QEMU's `NoShutdown` knob Code is not used. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-11 19:55:54 +02:00
Greg Kurz	cb5b548ad7	govmm: Drop QEMU's `Daemonize` knob Code isn't used anymore. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-11 19:55:54 +02:00
Greg Kurz	33eaf69d5f	virtcontainers: Drop QEMU's `Daemonize` knob QEMU isn't started as daemon anymore and this won't change (see #5736 for details). Drop the related code. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-11 19:55:54 +02:00
Bo Chen	2398442c58	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v39.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #8694, #9574 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-06-11 09:42:17 -07:00
gaohuatao	638e9acf89	runtime: fix the bug of func countFiles When the total number of files observed is greater than limit, return (-1, err). When the returned err is not nil, the func countFiles should return -1. Fixes:#9780 Signed-off-by: gaohuatao <gaohuatao@bytedance.com>	2024-06-11 18:17:18 +08:00
Niteesh Dubey	62d3d7c58f	runtime: enable kernel-hashes for SNP confidential container This is required to provide the hashes of kernel, initrd and cmdline needed during the attestation of the coco. Fixes: #9150 Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-06-05 15:02:02 +00:00
Fabiano Fidêncio	138ef2c55f	Merge pull request #9678 from AdithyaKrishnan/main TEEs: Skip a few CI tests for SEV/SNP	2024-06-04 23:42:51 +02:00
Ryan Savino	6db08ed620	runtime: sev: snp: Use shared_fs=none Disabling 9p for SEV and SNP TEEs. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:16 -05:00
Beraldo Leal	c99ba42d62	deps: bumping yq to v4.40.7 Since yq frequently updates, let's upgrade to a version from February to bypass potential issues with versions 4.41-4.43 for now. We can always upgrade to the newest version if necessary. Fixes #9354 Depends-on:github.com/kata-containers/tests#5818 Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	4f6732595d	ci: skip go version check golang.mk is not ready to deal with non-GOPATH installs. This is breaking tests on s390x. Since previous steps here are installing go and yq our way, we can skip this additional check. A full refactor of golang.mk would be needed to work with different paths. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Zvonko Kaiser	d4832b3b74	vfio: Fix hot-unplug We need to remove the device from the tracking map; otherwise a container restart will increment the bus index and we will run out of root ports and crash the machine. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 07:37:30 +00:00
Zvonko Kaiser	4c93bb2d61	qemu: Add CDI device handling for any container type We need special handling for pod_sandbox, pod_container and single_container how and when to inject CDI devices Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-27 10:13:01 +00:00
Zvonko Kaiser	c7b41361b2	gpu: reintroduce pcie_root_port and add pcie_switch_port In Kubernetes we still do not have proper VM sizing at sandbox creation level. This KEP tries to mitigate that: kubernetes/enhancements#4113 but it can take some time until Kube and containerd or other runtimes have those changes rolled out. Before, we used a static config of VFIO ports, and we introduced CDI support, which needs a patched containerd. We want to eliminate the patched containerd in the GPU case as well. Fixes: #8860 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-27 10:13:01 +00:00
Steve Horsman	b89c3e35dd	Merge pull request #9583 from cncal/update_check_error_message runtime: make kata-runtime check error more understandable when /dev/kvm doesn't exist	2024-05-24 17:49:43 +01:00
Fabiano Fidêncio	d83cf39ba1	Merge pull request #9680 from kata-containers/dependabot/go_modules/src/runtime/go_modules-5e29427af7 build(deps): bump golang.org/x/net from 0.24.0 to 0.25.0 in /src/runtime in the go_modules group across 1 directory	2024-05-23 12:55:29 +02:00
Fabiano Fidêncio	0e33ecf7fc	Merge pull request #9653 from JakubLedworowski/fixes-9497-ensure-quote-generation-service-is-added-to-qemu-cmd-2 runtime: Enable connection to Quote Generation Service (QGS)	2024-05-22 15:49:23 +02:00
Fabiano Fidêncio	94f7bbf253	Merge pull request #9682 from fidencio/topic/allow-increasing-cpus-and-memory-via-annotation-for-tdx runtime: tdx: Allow default_{cpu,memory} annotations	2024-05-22 12:07:28 +02:00
Jakub Ledworowski	fc680139e5	runtime: Enable connection to Quote Generation Service (QGS) For the TD attestation to work the connection to QGS on the host is needed. By default QGS runs on vsock port 4050, but can be modified by the host owner. Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below: -object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}' Fixes: #9497 Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>	2024-05-22 11:16:24 +02:00
Alex Lyn	ce030d1804	Merge pull request #9641 from cmaf/runtime-resize-mem-1 runtime: Add missing check in ResizeMemory for CH	2024-05-22 14:05:30 +08:00
Alex Lyn	b7af00be2a	Merge pull request #9624 from cncal/bugfix_duplicated_devices runtime: fix duplicated devices requested to the agent	2024-05-22 12:45:46 +08:00
Steve Horsman	f41f642b90	Merge pull request #9635 from kata-containers/dependabot/go_modules/src/runtime/go_modules-f0df977846 build(deps): bump github.com/containerd/containerd from 1.7.11 to 1.7.16 in /src/runtime in the go_modules group across 1 directory	2024-05-21 21:19:32 +01:00
Steve Horsman	9b0ed3dfa7	Merge pull request #9657 from ajaypvictor/remote-hyp-annotations runtime: Disable number of cpu comparison on remote hypervisor scenario	2024-05-21 21:19:12 +01:00
stevenhorsman	865fa9da15	runtime: Resolve go static-checks failure Remove `rand.Seed` call to resolve the following failure: ``` rand.Seed is deprecated: As of Go 1.20 there is no reason to call Seed with a random value. ``` The go rand.Seed docs: https://pkg.go.dev/math/rand@go1.20#Seed back this up and states: > If Seed is not called, the generator is seeded randomly at program startup. so I believe we can just delete the call. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 11:08:59 +01:00
Fabiano Fidêncio	abf52420a4	runtime: tdx: Allow default_{cpu,memory} annotations For now, let's allow the users to set the default_cpu and default_memory when using TDX, as they may hit issues related to the size of the container image that must be pulled and unpacked inside the guest. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-21 10:26:39 +02:00
stevenhorsman	75a201389d	runtime: update go version in go.mod - We bumped the golang version used in our CI, but `make vendor` fails unless the go version in the runtime go.mod is increased, so update it and run go mod tidy Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 09:11:46 +01:00
dependabot[bot]	735185b15c	build(deps): bump github.com/containerd/containerd Bumps the go_modules group with 1 update in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd). Updates `github.com/containerd/containerd` from 1.7.11 to 1.7.16 - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.11...v1.7.16) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: direct:production dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-21 09:11:46 +01:00
Ajay Victor	abe607b0c7	runtime: Disable number of cpu comparison on remote hypervisor scenario Fixes https://github.com/kata-containers/kata-containers/issues/9238 Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>	2024-05-21 13:34:21 +05:30
dependabot[bot]	01868b2849	--- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-20 22:06:41 +00:00
Fabiano Fidêncio	f2de259387	runtime: tdx: Use shared_fs=none We shouldn't be using 9p, at all, with TEEs, as of right now we have no way to ensure the channels are encrypted. The way to work around this for now is using guest pull, either with containerd + nydus snapshotter or with CRI-O; or even tardev snapshotter for pulling on the host (which is the approach used by MSFT). This is only done for TDX for now, leaving the generic, AMD, and IBM related stuff for the folks working on those to switch and debug possible issues on their environment. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:47:09 +02:00
Chelsea Mafrica	5d2af555da	runtime: Add missing check in ResizeMemory for CH ResizeMemory for Cloud Hypervisor is missing a check for the new requested memory being greater than the max hotplug size after alignment. Add the check, and since an earlier check for this sets requested memory to the max hotplug size, do the same in the post-alignment check. Fixes #9640 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-05-15 11:29:18 -07:00
cncal	232db2d906	runtime: fix duplicated devices requested to the agent By default, when a container is created with the `--privileged` flag, all devices in `/dev` from the host are mounted into the guest. If there is a block device (e.g. `/dev/dm`) followed by a generic device (e.g. `/dev/null`), two identical block devices (`/dev/dm`) would be requested to the kata agent, causing the agent to exit with the error: > Conflicting device updates for /dev/dm-2 As the generic device type does not hit any cases defined in `switch`, the variable `kataDevice`, which is defined outside of the loop, still holds the value of the previous block device rather than `nil`. Defining `kataDevice` in the loop fixes this bug. Signed-off-by: cncal <flycalvin@qq.com>	2024-05-12 16:38:37 +08:00
Fabiano Fidêncio	77f457c0e1	runtime: tdx: Drop sept-ve-disable=on This was needed when we were using an old (and not maintained anymore) host stack. Considering what we have as part of the distros today, this can simply be dropped, as I cannot find any reference to it being needed in any up-to-date documentation. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	416d00228c	Revert "qemu: tdx: Adapt command line" (partially) This reverts commit `b7cccfa019`. The `private=on` bit has never made its way upstream, and was removed from the latest iteration that we're using. With that in mind, let's revert its usage in the code. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	1c3037fd25	Revert "govmm: tdx: Expose the private=on|off knob" This reverts commit `582b5b6b19`. The `private=on` bit has never made its way upstream, and was removed from the latest iteration that we're using. With that in mind, let's revert its addition, and later on its usage in the code. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	f48450b360	runtime: config: tdx: Add QEMU / OVMF placeholder var Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT variables instead of actually setting a path, so we can easily replace those as part of our deployment scripts. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Alex Lyn	875e6e3815	Merge pull request #9601 from cncal/fix_redundant_log qemu: the error is logged only when it occurs	2024-05-08 08:59:01 +08:00
GabyCT	a564422b7b	Merge pull request #9582 from cncal/main build: fix the confusing build message if yq doesn't exist in GOPATH/bin	2024-05-07 09:34:27 -06:00
Fabiano Fidêncio	ddf6b367c7	Merge pull request #9568 from kata-containers/dependabot/go_modules/src/runtime/go_modules-22ef55fa20 build(deps): bump the go_modules group across 5 directories with 8 updates	2024-05-07 13:14:48 +02:00
cncal	15d511af97	qemu: the error is logged only when it occurs Every time I create a container on an arm64 machine, containerd/kata logs a redundant warning as follows: ``` shell time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2 pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor ``` I added an error check so that the error is logged only when it occurs. Signed-off-by: cncal <flycalvin@qq.com>	2024-05-07 14:28:04 +08:00
cncal	48d873b52b	build: fix the confusing build message if yq doesn't exist in GOPATH/bin The build message shows that yq was not found when I tried to build runtime binaries, but I've actually installed yq by yum install. Signed-off-by: cncal <flycalvin@qq.com>	2024-05-03 08:34:45 +08:00
cncal	9caa7beb1f	runtime: make kata-runtime check error more understandable If device /dev/kvm does not exist, kata-runtime check would fail with an ambiguous error message 'no such file or directory'. I added a little more detail to make it understandable, and it will look like: ``` ERRO[0000] cannot open kvm device: no such file or directory arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime ERRO[0000] no such file or directory arch=arm64 name=kata-runtime pid=2849085 source=runtime no such file or directory ``` Signed-off-by: cncal <flycalvin@qq.com>	2024-05-03 08:29:08 +08:00
Zvonko Kaiser	e5e0983b56	Merge pull request #9476 from zvonkok/nvidia-config-tomls config: Add NVIDIA GPU SNP, TDX configuration files	2024-05-02 10:27:10 +02:00
stevenhorsman	3c2232d898	runtime: fix testVersionString logic - The testVersionString logic uses a regex to check that the ociVersion is displayed correctly, but with the new go module that version has a `+` in it, so we need to quote it to escape special characters Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-30 10:54:49 +01:00
dependabot[bot]	391bc35805	build(deps): bump the go_modules group across 5 directories with 8 updates Bumps the go_modules group with 2 updates in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd) and [github.com/containers/podman/v4](https://github.com/containers/podman). Bumps the go_modules group with 4 updates in the /src/tools/csi-kata-directvolume directory: [golang.org/x/sys](https://github.com/golang/sys), google.golang.org/protobuf, [golang.org/x/net](https://github.com/golang/net) and [google.golang.org/grpc](https://github.com/grpc/grpc-go). Bumps the go_modules group with 2 updates in the /src/tools/log-parser directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3. Bumps the go_modules group with 2 updates in the /tests directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3. Bumps the go_modules group with 2 updates in the /tools/testing/kata-webhook directory: [golang.org/x/sys](https://github.com/golang/sys) and [golang.org/x/net](https://github.com/golang/net). 
Updates `github.com/containerd/containerd` from 1.7.2 to 1.7.11 - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.2...v1.7.11) Updates `github.com/containers/podman/v4` from 4.2.0 to 4.9.4 - [Release notes](https://github.com/containers/podman/releases) - [Changelog](https://github.com/containers/podman/blob/v4.9.4/RELEASE_NOTES.md) - [Commits](https://github.com/containers/podman/compare/v4.2.0...v4.9.4) Updates `google.golang.org/protobuf` from 1.29.1 to 1.33.0 Updates `github.com/cyphar/filepath-securejoin` from 0.2.3 to 0.2.4 - [Release notes](https://github.com/cyphar/filepath-securejoin/releases) - [Commits](https://github.com/cyphar/filepath-securejoin/compare/v0.2.3...v0.2.4) Updates `golang.org/x/sys` from 0.15.0 to 0.19.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `google.golang.org/protobuf` from 1.31.0 to 1.33.0 Updates `golang.org/x/net` from 0.19.0 to 0.23.0 - [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0) Updates `google.golang.org/grpc` from 1.59.0 to 1.63.2 - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.59.0...v1.63.2) Updates `golang.org/x/sys` from 0.0.0-20191026070338-33540a1f6037 to 0.1.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `gopkg.in/yaml.v3` from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0 Updates `golang.org/x/sys` from 0.0.0-20220429233432-b5fbb4746d32 to 0.19.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `gopkg.in/yaml.v3` from 3.0.0-20210107192922-496545a6307b to 3.0.0 Updates `golang.org/x/sys` from 0.15.0 to 0.19.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `golang.org/x/net` from 0.19.0 to 0.23.0 - 
[Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: direct:production dependency-group: go_modules - dependency-name: github.com/containers/podman/v4 dependency-type: direct:production dependency-group: go_modules - dependency-name: google.golang.org/protobuf dependency-type: direct:production dependency-group: go_modules - dependency-name: github.com/cyphar/filepath-securejoin dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: google.golang.org/protobuf dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/net dependency-type: direct:production dependency-group: go_modules - dependency-name: google.golang.org/grpc dependency-type: direct:production dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: gopkg.in/yaml.v3 dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: gopkg.in/yaml.v3 dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/net dependency-type: indirect dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2024-04-30 09:46:13 +01:00
Wainer dos Santos Moschetta	42fb5d7760	runtime: new qemu-coco-dev configuration Created a new configuration to configure Kata for CoCo without requiring TEE hardware, so as to allow developers to implement/test/debug platform-agnostic code on their workstations. It will also ease testing of CoCo features on CI with non-TEE supported VMs. This is based off the qemu configuration. The following differences were applied: - switched to confidential guest image/initrd - switched to confidential kernel - switched to 9p shared_fs Fixes #9487 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-29 05:45:10 -03:00
Dan Mihai	89c85dfe84	Merge pull request #9432 from UiPath/fix-clh-wait clh: isClhRunning waits for full timeout when clh exits	2024-04-23 13:02:45 -07:00
Greg Kurz	42a79801f3	Merge pull request #9524 from littlejawa/fix_createruntime_hook_not_called runtime: Call CreateRuntime hooks at container creation time	2024-04-23 13:43:36 +02:00
Julien Ropé	70e798ed35	runtime: Call CreateRuntime hooks at container creation time CreateRuntime hooks are called at the CreateSandbox time, but not after CreateContainer. Fixes: #9523 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-04-19 10:25:02 +02:00
Adil Sadik	1c5ca0c915	runtime: update golang.org/x/net updates golang.org/x/net to newer version that closes some reported vulnerabilities and security issues Fixes #9486 Signed-off-by: Adil Sadik <sparky.005@gmail.com>	2024-04-18 10:55:02 -04:00
Zvonko Kaiser	eda3bfe2ef	config: Add NVIDIA GPU SNP, TDX configuration files Fixes: #9475 For TDX and SNP add NVIDIA specific configuration files Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-04-17 12:49:13 +00:00
Alexandru Matei	54923164b5	clh: isClhRunning waits for full timeout when clh exits isClhRunning uses signal 0 to test whether the process is still alive or not. This doesn't work because the process is a direct child of the shim. Once it is dead the process becomes zombie. Since no one waits for it the process lingers until its parent dies and init reaps it. Hence sending signal 0 in isClhRunning will always return success whether the process is dead or not. This patch calls wait to reap the process, if it succeeds that means it is our child process, if not we send the signal. Fixes: #9431 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-04-12 11:31:53 +03:00
Greg Kurz	8b996b9307	Merge pull request #9331 from egernst/foobar katautils: check number of cores on the system instead of go runtime	2024-04-08 18:38:49 +02:00
Greg Kurz	be8f0cb520	Merge pull request #9402 from deagon/feat/debug-threads qemu: show the thread name when enable the hypervisor.debug option	2024-04-08 11:04:36 +02:00
Fabiano Fidêncio	cdb8531302	hypervisor: Simplify TDX protection detection Let's rely on the kvm module 'tdx' parameter to do so. This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX adoption (https://github.com/intel/tdx-linux) stacks. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Fabiano Fidêncio	b7cccfa019	qemu: tdx: Adapt command line This commit is a mess, but I'm not exactly sure what's the best way to make it less messy, as we're getting QEMU TDX to work while partially reverting `1e34220c41`. With that said, let me cover the content of this commit. Firstly, we're reverting all the changes related to "memory-backend-memfd-private", as that's what was used with the previous host stack, but it seems it didn't fly upstream. Secondly, in order to get QEMU to properly work with TDX, we need to enforce the 'private=on' knob and use the "memory-backend-ram", and we're doing so, and also making sure to test the `private=on` newly added knob. I'm sorry for the confusion, I understand this is not optimal, I just don't see an easy path to do changes without leaving the code broken during those changes. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Fabiano Fidêncio	6b4cc5ea6a	Revert "qemu: tdx: Workaround SMP issue with TDX 1.5" This reverts commit `d1b54ede29`. Conflicts: src/runtime/virtcontainers/qemu.go This commit was a hack that was needed in order to get QEMU + TDX to work atop of the stack our CI was running on. As we're moving to "the officially supported by distros" host OS, we need to get rid of this. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:52 +02:00
Fabiano Fidêncio	582b5b6b19	govmm: tdx: Expose the private=on|off knob The private=on|off knob is required in order to properly launch a TDX guest VM. This is a brand new property that is part of the still in-flight patches adding TDX support on QEMU. Please, see: `3fdd8072da` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:52 +02:00
Eric Ernst	da01bccd36	katautils: check number of cores on the system instead of go runtime We used to utilize go runtime's "NumCPU()", which will give the number of cores available to the Go runtime, which may be a subset of physical cores if the shim is started from within a cpuset. From the function's description: "NumCPU returns the number of logical CPUs usable by the current process." As an example, if containerd is run from within a smaller CPUset, the maximum size of a pod will be dictated by this CPUset, instead of what will be available on the rest of the system. Since the shim will be moved into its own cgroup that may have a different CPUset, let's stick with checking physical cores. This also aligns with what we have documented for maxVCPU handling. In the event we fail to read /proc/cpuinfo, let's use the Go runtime. Fixes: #9327 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2024-04-03 16:09:16 -07:00
Guoqiang Ding	cd0c31e185	qemu: show the thread name when enable the hypervisor.debug option Add debug-threads=on in the name argument if debug enabled. Fixes: #9400 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-04-03 10:36:52 +08:00
Dan Mihai	600f9266f3	runtime: remove stream copy infinite loop This reverts commit `1c5693be86`. Avoid apparent infinite loop when ReadStreamRequest is blocked by policy - for some of the pods. When running the k8s-limit-range.bats test with Policy enabled, the Shim + VMM never get terminated on my cluster. Not sure why the sandbox clean-up works better for other tests, but the k8s-limit-range test pod gets stuck in an infinite loop: stdout io stream copy error happens: error = %wrpc error: code = PermissionDenied desc = \"ReadStreamRequest is blocked by policy ... policy check: ReadStreamRequest ... stdout io stream copy error happens: error = %wrpc error: code = PermissionDenied desc = \"ReadStreamRequest is blocked by policy ... policy check: ReadStreamRequest ... Fixes: #9380 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-03-28 22:43:28 +00:00
Tobin Feldman-Fitzthum	9856fe5bea	runtime: remove ServiceOffload parameter Since we no longer use the service_offload configuration, remove the ServiceOffload field from the image struct. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum	a18c7ca307	runtime: remove unimplemented CoCo configurations These experimental options were added 2 years ago in anticipation of features that would be added in CoCo. These do not match the features that were eventually added and will soon be ported to main. Fixes: #8047 Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:06 -05:00
Chengyu Zhu	e66a5cb54d	Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout Support to set timeout to pull large image in guest	2024-03-28 00:34:08 +08:00
Greg Kurz	e1068da1a0	Merge pull request #9326 from gkurz/draft-release Only tag and publish the release when it is fully ready	2024-03-27 15:59:59 +01:00
ChengyuZhu6	c2dc13ebaa	runtime: support to configure CreateContainer Timeout in configurations support to configure CreateContainerRequestTimeout in the configurations. e.g.: [runtime] ... create_container_timeout = 300 Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout. In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 21:58:41 +08:00
Greg Kurz	693c9487d4	docs: Adjust release documentation Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto deprecated by the re-design of the release process described in #9064. Remove this file and all its references in the repo. The `## Versioning` section has some useful information though. It is moved to `docs/Release-Process.md`. The documentation of the `PATCH` field is adapted according to new workflow. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>	2024-03-27 12:41:48 +01:00
ChengyuZhu6	2224f6d63f	runtime: support to configure CreateContainer timeout in annotation Support to configure CreateContainerRequestTimeout in the annotations. e.g.: annotations: "io.katacontainers.config.runtime.create_container_timeout": "300" Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout. In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 15:44:29 +08:00
ChengyuZhu6	39bd462431	runtime: support to set timeout for CreateContainerRequest In the situation to pull images in the guest #8484, it’s important to account for pulling large images. Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout. However, this duration may prove insufficient for pulling larger images, such as those containing AI models. Consequently, we must devise a method to extend the timeout period for large image pull. Fixes: #8141 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 15:44:29 +08:00
Chelsea Mafrica	d69514766e	src: Remove references to files in tests repo Change scripts and source that uses files in the tests repo to use the corresponding file in the current repo. Fixes #9165 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-03-25 15:09:52 -07:00
ChengyuZhu6	ba242b0198	runtime: support different cri container type check Support handling the image-guest-pull block volume from different CRIs, including cri-o and containerd. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-19 18:05:59 +01:00
ChengyuZhu6	965da9bc9b	runtime: support to pass image information to guest by KataVirtualVolume support to pass image information to guest by KataVirtualVolumeImageGuestPullType in KataVirtualVolume, which will be used to pull image on the guest. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-19 17:22:36 +01:00
Alexandru Matei	617b0114b3	clh: initialize clh pid before using it The PID needs to be initialized before calling isClhRunning. waitVMM() uses isClhRunning and is called by launchClh() just before returning from function. Fixes: #9230 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-03-09 13:53:51 +02:00
Steve Horsman	54e5ce2464	Merge pull request #9154 from chungeun-choi/change-deprecated-package fixed - Change the deprecated module from 'io/util' to util. 'io/util…	2024-03-08 15:05:43 +00:00
Chungeun Choi	bad263f399	runtime: Replace deprecated module "io/ioutil" with "io" This change updates the module import to use 'io' instead of the deprecated 'io/ioutil'. Fixes: #9166 Signed-off-by: Chungeun Choi <ce.choi@okestro.com>	2024-03-07 10:56:06 +00:00
Linda Yu	1c5693be86	stream: repeat copybuffer if it is blocked by policy copyBuffer returns and the streams will be closed when error occurs. If the error contains "blocked by policy" it means the log output is disabled by policy with "ReadStreamRequest" and "WriteStreamRequest" set to false. But at this moment, we want the real stream still working (not be seen) because we might want to enable logging for debugging purpose, so we repeat copybuffer in this case to avoid streams being closed. Fixes: #8797 Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-03-07 15:00:23 +08:00
Linda Yu	eda419cb03	kata-runtime: add set policy function to kata-runtime Logging/debugging information may be disabled in production for security reasons, but we should still provide a way for customers to get logging information at runtime. This PR implements a setpolicy function in the kata-runtime tool; it can set the whole policy, not only the logging rules. setpolicy triggers remote attestation, which means that before setting a policy at runtime, the user has to reconfigure the new policy hash in KBS/AS. usage: kata-runtime policy set policy.rego --sandbox-id XXXXXXXX Fixes: #8797 Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-03-07 15:00:23 +08:00
Fupan Li	628f57aca0	Merge pull request #9193 from UiPath/fix/clh-dax clh: Enable DAX for rootfs	2024-03-05 09:39:22 +08:00
Liu Bo	b6f8355ea3	katautils: fix panic on tracing. This fixes a panic on tracing on container exit. The root cause is that global var needs to be set by "=" instead of ":=". Fixes: #9102 Signed-off-by: Liu Bo <liub.liubo@gmail.com>	2024-02-29 18:40:23 -08:00
Alexandru Matei	6856e8f678	clh: Enable DAX for rootfs Fixes: #9192 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-02-29 18:01:47 +02:00
Dan Mihai	352e2af5f0	Merge pull request #9153 from microsoft/danmihai1/clh-bootVM-timeout runtime: clh: minimum 10s timeout for CreateVM + BootVM	2024-02-27 09:58:01 -08:00
Dan Mihai	f4509b806b	runtime: clh: minimum 10s timeout for CreateVM + BootVM Relax the timeout for calling CLH's CreateVM + BootVM APIs. When hitting the older 1s timeout, killing a half-booted Guest and retrying the same boot sequence could have been wasteful and could have resulted in unstable CI testing on slower Hosts. Fixes: #9152 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-02-24 19:15:57 +00:00
Julien Ropé	9de65707ca	runtime: stop reporting net dev metrics for the shim As part of the shim network metrics, the shim is reporting network interfaces from the host with no namespace isolation - this gives insight in interfaces not tied to the kata containers, and causes an increase in resource usage for kata metrics. As the shim itself is not using the network (all its communication with other processes is done with local unix sockets), there is no reason to keep gathering and reporting shim-specific network metrics. Actual network usage of the kata containers can be found from the existing hypervisor network metrics (kata_hypervisor_netdev) and from the agent network metrics (kata_guest_netdev_stat). Fixes: #5738 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-02-22 14:00:00 +01:00
Archana Shinde	6d38fa1530	network: Try removing as many changes as possible during network cleanup In case an error is encountered while removing a network endpoint during network cleanup, we currently return immediately with the error. With this change, in case of error we simply log the error and proceed towards removing the next endpoint. With this, we can clean up the network changes made by the shim as much as possible. This is especially important when multiple interfaces are passed to the network namespace using a network plugin like multus. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-02-20 06:08:05 -08:00
Archana Shinde	b005cda689	network: Move up defer block to cleanup network Move the defer for cleaning up the network before the call to add network. This way, any change made by add network is reverted in case of failure. This is particularly important for physical network interfaces, as with this step we make sure that the driver for the physical interface is reverted back to the original host driver. Without this, the physical network interface will remain bound to vfio. Fixes: #8646 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-02-20 06:06:42 -08:00
ChengyuZhu6	96c297cb37	runtime: fix checksum mismatch error in `make vendor` Fix checksum mismatch error in `make vendor`. Fixes: #9111 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-02-18 22:22:38 +08:00
Fabiano Fidêncio	eea4277fbf	runtime: Update runc to v1.1.12 Although we don't seem to be affected by https://nvd.nist.gov/vuln/detail/CVE-2024-21626, we vendor and use the runc package in a few different places of our code, and we better update the package to its latest release. Fixes: #9097 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-14 23:13:39 +01:00
Niteesh Dubey	3e383674f8	runtime: fix creation of SEV confidential container on SNP enabled host. This is needed to fix the bug which is not allowing to create SEV container on SNP enabled host anymore. This is a regression that was introduced as part of the following commit: `de39fb7d38` Fixes: #9036 Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-02-06 19:01:30 +00:00
Alex Lyn	1ab9a21492	Merge pull request #8552 from deagon/fix/missing-port-type runtime: missing port type in the DeviceInfo	2024-02-06 10:56:46 +08:00
Fabiano Fidêncio	1362918ff0	Merge pull request #9011 from fidencio/topic/switch-to-using-the-confidential-rootfs runtime: Replace TEE specific initrd / image for the confidential one	2024-02-05 10:43:12 +01:00
Guoqiang Ding	6068faf40b	runtime: failed to run in the case of ColdPlugVFIO Add the missing port type in the DeviceInfo. Fixes: #9014 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-02-05 17:30:11 +08:00
Alex Lyn	cf74166d75	Merge pull request #9015 from Apokleos/bugfix-exec-uds runtime: display accurate error msg to avoid misleading users.	2024-02-05 13:50:43 +08:00
Alex Lyn	c6830ceb89	runtime: display accurate error msg to avoid misleading users. The original handling method does not reach user expectations. When the ClientSocketAddress method stats the corresponding path of runtime-rs and has not found it yet, we should return an error message here that includes the reason for the failure (which should be an error display indicating that both runtime-go and runtime-rs were not found). Instead of simply displaying the corresponding path of runtime-rs as the final error message to users. It is also necessary to return the error promptly to the caller for further error handling. Fixes: #8999 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-02-04 16:45:59 +08:00
Guoqiang Ding	7bf1ebe16d	kata-monitor: fix agentUrl from containerd shim Fix the missing leading slash. Fixes: #9013 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-02-04 16:24:13 +08:00
Fabiano Fidêncio	e4258d8694	runtime: Use confidential image / initrd instead of TEE specific ones Now that we have a confidential image / initrd being built, instead of a specific one for each TEE, let's use it everywhere possible. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-03 13:20:14 +01:00
Fabiano Fidêncio	3755c69165	runtime: makefile: remove SNP specific kernel references As this is not used anymore, we can go ahead and just remove it Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 21:12:21 +01:00
Fabiano Fidêncio	57b132f94c	runtime: makefile: remove SEV specific kernel references As this is not used anymore, we can go ahead and just remove it Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 21:12:21 +01:00
Fabiano Fidêncio	2562d23242	runtime: makefile: remove TDX specific kernel references As this is not used anymore, we can go ahead and just remove it. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 21:11:43 +01:00
Fabiano Fidêncio	f4e3c936d8	runtime: snp: config: Use the confidential kernel As we're building a single confidential kernel, we should rely on it rather than keep using the specific ones for TDX / SEV / SNP. However, for debuggability's sake, let's do this change TEE by TEE. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 21:11:36 +01:00
Fabiano Fidêncio	8731366d7b	runtime: sev: config: Use the confidential kernel As we're building a single confidential kernel, we should rely on it rather than keep using the specific ones for TDX / SEV / SNP. However, for debuggability's sake, let's do this change TEE by TEE. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 21:11:36 +01:00
Fabiano Fidêncio	6cbdba7268	runtime: tdx: config: Use the confidential kernel As we're building a single confidential kernel, we should rely on it rather than keep using the specific ones for TDX / SEV / SNP. However, for debuggability's sake, let's do this change TEE by TEE. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 17:13:06 +01:00
Fabiano Fidêncio	a618461d3a	runtime: Add confidential kernel to the makefile With this we can properly generate and use the `-confidential` kernel, which supports SEV / SNP / TDX, as part of our configuration files. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-02-02 17:13:05 +01:00
Zhigang Wang	9317e23df1	mount: Reduce the mount points with namespace isolation This patch can reduce load on systemd process, and increase the k8s deployment density when using go runtime. Fixes: #8758 Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com> Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2024-02-01 18:34:24 +08:00
Alex Lyn	cf26c16017	Merge pull request #8931 from yaoyinnan/8930/feat/merge-ValidCgroupPath runtime: merged ValidCgroupPath method	2024-02-01 12:53:55 +08:00
yaoyinnan	9aa1ed805a	runtime: add SingleContainer when obtaining OCI Spec When creating a cgroup, add a SingleContainer when obtaining the OCI Spec to apply to ctr, podman, etc. Fixes: #5240 Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2024-01-31 15:24:07 +08:00
yaoyinnan	b0b8523cea	runtime: modify ValidCgroupPath unit test Modify ValidCgroupPath unit test. Fixes: #8930 Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2024-01-31 14:37:17 +08:00
yaoyinnan	feed5c8ff9	runtime: merged ValidCgroupPath method Merged ValidCgroupPath method to handle cgroupv1 and cgroupv2. Fixes: #8930 Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2024-01-31 14:37:13 +08:00
Kvlil	a4b208a712	runtime: remove SharedVersions field dead code The SharedVersions field adds a versiontable property that isn't supported by upstream QEMU. This is dead code since virtcontainers isn't setting SharedVersions to true. Fixes: #7720 Signed-off-by: Kvlil <kalil.pelissier@gmail.com>	2024-01-22 12:18:42 +00:00
Amulyam24	394777291d	runtime: fix failing unit tests on ppc64le A few CPU related test cases were failing as the version was being verified against Power8 while the CI machine is Power9. Fixes: #5531 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2024-01-18 16:31:13 +01:00
Hyounggyu Choi	540a2a7fb1	runtime: Allow no initrd path for IBM Z Secure Execution This is to reintroduce a configuration rule for IBM Z Secure Execution, where no initrd path should be configured. For the TEE of interest, only a kernel image should be specified with `confidential_guest=true`. Fixes: #8692 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2023-12-19 11:21:16 +01:00
Hyounggyu Choi	588f639a69	Merge pull request #6755 from BbolroC/add-se-artifacts-to-main packaging: Add IBM Z SE artifacts to main	2023-12-08 05:17:38 +01:00
Fabiano Fidêncio	d149b9f9ca	Merge pull request #7231 from wainersm/measured_rootfs-improvements Build for measured rootfs improvements	2023-12-05 22:20:33 +01:00
Hyounggyu Choi	bb1d4adaa9	config: add SE configuration This is to add SE configuration which is used by kata runtime. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2023-12-04 21:08:49 +01:00
yuchen.cc	1cd1558a92	mount: support checking multiple kinds of block device driver Device mapper is the only supported block device driver so far, which seems limiting. Kata Containers can work well with other block devices. It is necessary to enhance supporting of multiple kinds of host block device. Fixes #4714 Signed-off-by: yuchen.cc <yuchen.cc@alibaba-inc.com>	2023-12-01 11:59:30 +08:00
Steve Horsman	c6110284d5	Merge pull request #8520 from stevenhorsman/hypervisor-ttrpc runtime: Update hypervisor generated code	2023-11-30 10:01:56 +00:00
Fabiano Fidêncio	f15e16b692	Revert "runtime: confidential: Do not set the max_vcpu to cpu" This reverts commit `b0157ad73a`. ``` commit `b0157ad73a` Refs: 3.3.0-alpha0-124-gb0157ad73 Author: Fabiano Fidêncio <fabiano.fidencio@intel.com> AuthorDate: Fri Aug 11 14:55:11 2023 +0200 Commit: Fabiano Fidêncio <fabiano.fidencio@intel.com> CommitDate: Fri Nov 10 12:58:20 2023 +0100 runtime: confidential: Do not set the max_vcpu to cpu We don't have to do this since we're relying on the `static_sandbox_resource_mgmt` feature, which gives us the correct amount of memory and CPUs to be allocated. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> ``` This commit was removing a requirement that was made previously, but due to the SMP issue we're facing with the QEMU used for TDX (see commit d1b54ede290e95762099fff4e0bcdad10f816126), QEMU will fail to start due to: ``` Invalid CPU topology: product of the hierarchy must match maxcpus: sockets (1) * dies (1) * cores (1) * threads (1) != maxcpus (240) ``` This has no effect on the SEV / SNP workflow and hopefully we'll be able to re-revert this soon enough, when this gets solved on the QEMU side. Last but not least, this is not a "clean" revert as we're using conf.NumVCPUs() instead of conf.NumVCPUs, to ensure we're dealing with uint32. Fixes: #8532 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-11-30 00:41:27 +01:00
stevenhorsman	47b8c3181f	runtime: remote hypervisor updates to ttrpc - Update the remote hypervisor code to match the re-genned code for the ttrpc Hypervisor Service Fixes: #8519 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2023-11-29 18:04:40 +00:00
stevenhorsman	613c75ba8c	runtime: Update hypervisor generated code Update to use ttrpc_out instead of grpc_out Fixes: #8519 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2023-11-29 18:04:40 +00:00
Wainer dos Santos Moschetta	a13eecf7f3	runtime(-rs): add clean-generated-files target The new clean-generated-files make target allows for removing the generated files (including the configuration.toml files). The tools/packaging/static-build/shim-v2/build.sh script now uses that target to always force the re-generation of those files. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2023-11-28 11:21:53 -03:00
James O. D. Hunt	45cc417a4e	Merge pull request #8461 from jodh-intel/update-codeowners CODEOWNERS: Expand scope	2023-11-27 15:38:39 +00:00
Fabiano Fidêncio	bb4c51a5e0	Merge pull request #8494 from ChengyuZhu6/kata_virtual_volume runtime: Pass `KataVirtualVolume` to the guest as devices in go runtime	2023-11-27 16:02:28 +01:00
ChengyuZhu6	5318afe273	runtime: support to create VirtualVolume rootfs storages 1) Creating storage for all `io.katacontainers.volume=` messages in rootFs.Options, and then aggregates all storages into `containerStorages`. 2) Creating storage for other data volumes and push them into `volumeStorages`. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2023-11-23 23:22:55 +08:00
ChengyuZhu6	0b4f7c2ee7	runtime: redefine and add functions to handle VirtualVolume to storage 1) Extract function `handleBlockVolume` to create Storage only. 2) Add functions to handle KataVirtualVolume device and construct corresponding storages. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2023-11-23 23:07:32 +08:00
ChengyuZhu6	bd099fbda9	runtime: extend SharedFile to support multiple storage devices To enhance the construction and administration of `KataVirtualVolume` storages, this commit expands the 'sharedFile' structure to manage both rootfs storages (`containerStorages`), including `KataVirtualVolume`, and other data volume storages (`volumeStorages`). NOTE: `volumeStorages` is intended for future extensions to support Kubernetes data volumes. Currently, `KataVirtualVolume` is exclusively employed for container rootfs, hence only `containerStorages` is actively utilized. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2023-11-23 23:05:14 +08:00
ChengyuZhu6	e4f33ac141	runtime: add functions to create devices in KataVirtualVolume The snapshotter will place `KataVirtualVolume` information into 'rootfs.options' and commence with the prefix 'io.katacontainers.volume='. The purpose of this commit is to transform the encapsulated KataVirtualVolume data into device information. Fixes: #8495 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Co-authored-by: Feng Wang <feng.wang@databricks.com> Co-authored-by: Samuel Ortiz <sameo@linux.intel.com> Co-authored-by: Wedson Almeida Filho <walmeida@microsoft.com>	2023-11-23 23:05:13 +08:00
Dan Mihai	756022787c	Merge pull request #8239 from Sumynwa/sumsharma/fix_configmap_update_propagation runtime: Fix configmap/secrets updates with FS sharing disabled	2023-11-23 06:50:53 -08:00
Fabiano Fidêncio	9445a967b6	Merge pull request #8471 from ChengyuZhu6/kata-virtual-volume runtime: Introduce `KataVirtualVolume` structure into go runtime	2023-11-20 21:58:27 +01:00
ChengyuZhu6	1353b14e6c	runtime: Add KataVirtualVolume struct in runtime Add the corresponding data structure in the runtime part according to kata-containers/kata-containers/pull/7698. Fixes: #8472 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2023-11-19 13:30:32 +08:00
Pradipta Banerjee	39e8c84269	runtime: Add support for key annotations to remote hyp In order to support different pod VM instance type via remote hypervisor implementation (cloud-api-adaptor), we need to pass machine_type, default_vcpus and default_memory annotations to cloud-api-adaptor. The cloud-api-adaptor then uses these annotations to spin up the appropriate cloud instance. Reference PR for cloud-api-adaptor https://github.com/confidential-containers/cloud-api-adaptor/pull/1088 Fixes: #7140 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> (based on commit `004f07f076`)	2023-11-17 13:33:27 +00:00
Yohei Ueda	2910e333a8	runtime: Use static resource in remote hypervisor This patch updates the template configuration file for the remote hypervisor to set static_sandbox_resource_mgmt to be true. The remote hypervisor uses the peer pod config to determine the sandbox size, so requires this to be set to true by default. Fixes: #6616 Signed-off-by: Yohei Ueda <yohei@jp.ibm.com> (based on commit `938447803b`)	2023-11-17 13:33:27 +00:00
stevenhorsman	26d56678a9	config: Add initial remote hypervisor config - Remote hypervisor template config - Add annotation enablement for machine_type, default_memory and default_vcpus for flexible instance types Fixes: #6349 Signed-off-by: stevenhorsman <steven@uk.ibm.com> (based on commits `7c9a791d67` and `335a456425`)	2023-11-17 13:33:24 +00:00
stevenhorsman	ad63439a3e	runtime: Update the remote hypervisor config Add the SELinux setting to ensure it is passed through to the remote hypervisor Fixes: #5936 Signed-off-by: stevenhorsman <steven@uk.ibm.com> (based on commit `3ef2fd1784`)	2023-11-17 13:32:52 +00:00
Lei Li	50e0d43dad	runtime: Support privileged containers in peer pod VM This patch fixes the issue of running containers with privileged as true. See the discussion at this URL for the details. https://github.com/confidential-containers/cloud-api-adaptor/issues/111 Signed-off-by: Lei Li <cdlleili@cn.ibm.com> (based on commit `c3e6b66051`)	2023-11-17 13:32:52 +00:00
Yohei Ueda	57d4dd8e57	runtime: Support the remote hypervisor type This patch adds the support of the remote hypervisor type. Shim opens a Unix domain socket specified in the config file, and sends ttRPC requests to an external process to control sandbox VMs. Fixes #4482 Co-authored-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Signed-off-by: Yohei Ueda <yohei@jp.ibm.com> (based on commit `f9278f22c3`)	2023-11-17 13:32:49 +00:00
Yohei Ueda	8ac9a22097	runtime: Add hypervisor proto to support peer pod VMs This patch adds a protobuf definition of the remote hypervisor type. Signed-off-by: Yohei Ueda <yohei@jp.ibm.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> (based on commit `150e8aba6d`)	2023-11-17 13:31:09 +00:00
Sumedh Alok Sharma	4aaf54bdad	runtime: Fix configmap/secrets update propagation with FS sharing disabled This PR fixes the propagation of updates to Kubernetes configmaps, secrets, etc. when filesystem sharing is disabled. The commit introduces the changes below, with some limitations: - creates a new timestamped directory in the guest - updates the '..data' symlink - creates user-visible symlinks to the newly created secrets. - Limitation: The older timestamped directory and stale user-visible symlinks remain in the guest due to the missing DELETE API in the agent. Fixes: #7398 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2023-11-17 13:01:23 +05:30
James O. D. Hunt	4a4fc9c648	CODEOWNERS: Expand scope Improve the `CODEOWNERS` file by specifying more groups. Since GitHub automatically checks the `CODEOWNERS` file when a PR is created and adds all matching groups as reviewers for the PR, this may help reduce the PR backlog since the right people will be alerted and requested to review the PR. That should improve the quality of reviews (and thus the quality of the landed code). It may also have a positive effect on PR velocity. > Note: > > This PR combines the other `CODEOWNERS` files so we have > a single, visible, top-level file. See: https://github.com/kata-containers/community/issues/253 Fixes: #3804. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-11-16 16:09:20 +00:00
Liu Wenyuan	c77e990c3e	tests: Enable tests for StratoVirt hypervisor This commit enables the StratoVirt hypervisor to be tested in kata GHA, including k8s, metrics, cri-containerd, nydus and so on. It also adds some unit tests for StratoVirt to make sure it works. Fixes: #7794 Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2023-11-16 20:47:26 +08:00
Liu Wenyuan	9542211e71	configuration: add configuration for StratoVirt hypervisor. Add configuration-stratovirt.toml.in to generate the StratoVirt configuration, and parser to deliver config to StratoVirt. Fixes: #7794 Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2023-11-16 20:47:26 +08:00
Liu Wenyuan	561c85be54	build: Makefile for StratoVirt hypervisor Add support for building StratoVirt hypervisor, including x86_64 and arm64. Fixes: #7794 Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2023-11-16 20:47:26 +08:00
Liu Wenyuan	26966c8469	virtcontainers: Add StratoVirt as a supported hypervisor Initial support of the MicroVM machine type of StratoVirt hypervisor for the kata go runtime. Fixes: #7794 Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>	2023-11-16 20:47:24 +08:00
Fabiano Fidêncio	5e9cf75937	vc: utils: Rename CalculateMilliCPUs() to CalculateCPUsF() With the change done in the last commit, instead of calculating milli cpus, we're actually converting the CPUs to a fraction number, a float. Let's update the function name (and associated vars) to represent that change. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-11-10 18:26:01 +01:00
Fabiano Fidêncio	e477ed0e86	runtime: Improve vCPU allocation for the VMMs First of all, this is a controversial piece, and I know that. In this commit we're trying to make a less greedy approach regarding the amount of vCPUs we allocate for the VMM, which will be advantageous mainly when using the `static_sandbox_resource_mgmt` feature, which is used by the confidential guests. The current approach we have basically does: * Gets the amount of vCPUs set in the config (an integer) * Gets the amount of vCPUs set as limit (an integer) * Sum those up * Starts / Updates the VMM to use that total amount of vCPUs The fact we're dealing with integers is logical, as we cannot request 500m vCPUs to the VMMs. However, it leads us to, in several cases, be wasting one vCPU. Let's take the example that we know the VMM requires 500m vCPUs to be running, and the workload sets 250m vCPUs as a resource limit. In that case, we'd do: * Gets the amount of vCPUs set in the config: 1 * Gets the amount of vCPUs set as limit: ceil(0.25) * 1 + ceil(0.25) = 1 + 1 = 2 vCPUs * Starts / Updates the VMM to use 2 vCPUs With the logic changed here, what we're doing is considering everything as float till just before we start / update the VMM. So, the flow described above would be: * Gets the amount of vCPUs set in the config: 0.5 * Gets the amount of vCPUs set as limit: 0.25 * ceil(0.5 + 0.25) = 1 vCPUs * Starts / Updates the VMM to use 1 vCPUs In the way I've written this patch we introduce zero regressions, as the default values set are still the same, and those will only be changed for the TEE use cases (although I can see firecracker, or any other user of `static_sandbox_resource_mgmt=true` taking advantage of this). There's, though, an implicit assumption in this patch that we'd need to make explicit, and that's that the default_vcpus / default_memory is the amount of vcpus / memory required by the VMM, and absolutely nothing else. 
Also, the amount set there should be reflected in the podOverhead for the specific runtime class. One other possible approach, which I am not that much in favour of taking as I think it's less clear, is that we could actually get the podOverhead amount, subtract it from the default_vcpus (treating the result as a float), then sum up what the user set as limit (as a float), and finally ceil the result. It could work, but IMHO this is less clear, and less explicit on what we're actually doing, and how the default_vcpus / default_memory should be used. Fixes: #6909 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2023-11-10 18:25:57 +01:00
Fabiano Fidêncio	b0157ad73a	runtime: confidential: Do not set the max_vcpu to cpu We don't have to do this since we're relying on the `static_sandbox_resource_mgmt` feature, which gives us the correct amount of memory and CPUs to be allocated. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-11-10 12:58:20 +01:00
Archana Shinde	1611723465	Merge pull request #8379 from likebreath/1103/clh_v36.0 Upgrade to Cloud Hypervisor v36.0	2023-11-08 21:10:41 -08:00
Archana Shinde	268d4d622f	Merge pull request #8389 from justxuewei/vm-capable-test runtime: Fix TestCheckHostIsVMContainerCapable instability issue	2023-11-08 12:14:04 -08:00
Archana Shinde	92a517156c	Merge pull request #8367 from amshinde/add-nerdctl-ipvlan-test network: Fix network hotplug for ipvlan and macvlan endpoints for qemu and add tests	2023-11-08 11:45:13 -08:00
Xuewei Niu	acd9057c7b	runtime: Fix TestCheckHostIsVMContainerCapable instability issue TestCheckHostIsVMContainerCapable removes sysModuleDir to simulate a case where the kernel modules are not loaded. However, checkKernelModules() executes modprobe <module> if a module is not found in that directory. Loading those modules must therefore be denied temporarily. Fixes: #8390 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2023-11-08 22:40:08 +08:00
Archana Shinde	a6272733e7	network: Fix network hotplug for ipvlan and macvlan endpoints. Since moving from network coldplug to hotplug, the only case verified was veth endpoints. Support for network hotplug for ipvlan and macvlan was broken/not added. Fix it. Fixes: #8391 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2023-11-07 10:13:51 -08:00
Beraldo Leal	dd530ba8ee	tests: fixes AMD errors TestCheckHostIsVMContainerCapable is failing on AMD machines. kata-check_amd64_test.go:96 has no AMD modules, also getCPUType is missing. Fixes #8384. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:59 +00:00
Beraldo Leal	7641c19f74	runtime: bump containerd for gogo deprecation This update includes necessary changes due to the version bump of containerd and its dependencies. It's part of a broader initiative to phase out gogo protobuf, which has been deprecated, and to align with the current supported libraries. Fixes #7420. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:59 +00:00
Beraldo Leal	16fa2c39e6	protocols: replace gogo/types.Empty and Any by Google versions. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Beraldo Leal	5d88c78a6e	protocols: generating agent.pb.go `a3b003c345` modified agent but agent.pb.go was not updated. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-11-06 16:49:58 +00:00
Bo Chen	071667f1ca	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v35.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #8378 Signed-off-by: Bo Chen <chen.bo@intel.com>	2023-11-03 10:47:06 -07:00
Fabiano Fidêncio	40cc397218	Merge pull request #8255 from cmaf/migrate-checks-fixes-links docs: Fix broken links	2023-11-01 14:46:30 +01:00
David Esparza	2a17d3889e	Merge pull request #8334 from amshinde/ipvlan-nerdctl-fix network: Fix network attach for ipvlan and macvlan	2023-10-30 16:00:32 -06:00
Archana Shinde	f53f86884f	network: Fix network attach for ipvlan and macvlan We used the approach of cold-plugging the network interface for pre-shimv2 support for docker. Since the hotplug approach was not required, we never really got to implementing hotplug support for certain network endpoints, ipvlan and macvlan being among them. Since moving to the shimv2 interface as the default for the runtime, we switched to hotplugging the network interface for supporting docker and nerdctl. This was done for veth endpoints only. Implement the hot-attach APIs for ipvlan and macvlan as well to support ipvlan and macvlan networks with docker and nerdctl. Fixes: #8333 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2023-10-27 21:42:37 -07:00
Chelsea Mafrica	0608e20a01	docs: Fix broken links Update broken links so that static checks pass. Fixes #8254 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2023-10-26 10:17:01 -07:00
James O. D. Hunt	d707fa2c0d	kata-runtime/kata-ctl: Add security details to output Add the hypervisor security details to the output of the `kata-runtime env` and `kata-ctl env` commands so the user can see, amongst other things, the value of `confidential_guest`. Fixes: #8313. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-10-25 16:34:42 +01:00
Fabiano Fidêncio	328ba0da99	Merge pull request #7647 from jongwu/use_pcie_virt AArch64: runtime: use pcie root port to do pci/pcie device hotplug	2023-10-25 09:17:13 +02:00
James O. D. Hunt	048cc70654	Merge pull request #8213 from jodh-intel/validate-hypervisor-cfg-name runtime: Validate hypervisor section name in config file	2023-10-19 07:40:58 +01:00
Jianyong Wu	f9c9d8f645	runtime: QemuVirt: hotadd virtio-mem dev to pcie root port Hotplug virtio-mem device to pcie root port for Qemu Virt. Fixes: #7646 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-10-18 06:35:57 +00:00
Jianyong Wu	ef18c9550c	runtime:qemuvirt: hotadd net dev to pcie root port Hotplug network device to pcie root port as this is the only way on QemuVirt. Fixes: #7646 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-10-18 06:35:57 +00:00
Jianyong Wu	f1aec98f9d	qemu/virt: use pcie_root_port to do device hotplug for virt ACPI PCI device hotplug on qemu virt is not supported. The only way to hotplug a pci device is the pcie native way. Thus we need to create pcie root ports by default. The number of pcie root ports depends on the following: 1. one reserved for the network device by default; 2. virtio-mem dev; 3. enough ports for vhost user blk devs; Fixes: #7646 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-10-18 06:35:57 +00:00
Jianyong Wu	28a41e1d16	runtime: add a new API for Network interface Add GetEndpointsNum API for Network Interface to get the number of network endpoints. This is used to calculate the number of pcie root ports for QemuVirt. Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-10-18 06:35:57 +00:00
James O. D. Hunt	3e8cf6959c	runtime: Validate hypervisor section name in config file Previously, if you accidentally modified the name of the hypervisor section in the config file, the default golang runtime gives a cryptic error message ("`VM memory cannot be zero`"). This can be demonstrated using the `kata-runtime` utility program which uses the same golang config package as the actual runtime (`containerd-shim-kata-v2`): ```bash $ kata-runtime env >/dev/null; echo $? 0 $ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml $ kata-runtime env >/dev/null; echo $? VM memory cannot be zero 1 ``` The hypervisor name is now validated so that the behaviour becomes: ```bash $ kata-runtime env >/dev/null; echo $? 0 $ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml $ ./kata-runtime env >/dev/null; echo $? /etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo" 1 ``` Fixes: #8212. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-10-12 13:53:37 +01:00
Peng Tao	d7660d82a0	runtime: unify gopkg.in/yaml.v3 to v3.0.1 The older versions have Denial of Service issues. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-10-10 03:56:45 +00:00
Peng Tao	fc9a107e8e	runtime: unify swag and testify dependency So that we don't need to depend on that many versions of them. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-10-10 03:56:45 +00:00
Peng Tao	79ebb959c5	runtime: update runc dependency to v1.1.9 To pick up security fixes. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-10-10 03:56:45 +00:00
Peng Tao	7f3e8bd65e	runtime: unify golang.org/x/text to v0.7.0 The older versions contain security issues. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-10-10 03:56:45 +00:00
Peng Tao	df325ae371	runtime: update golang.org/x/net to v0.7.0 To pick up fix for the following issue: A maliciously crafted HTTP/2 stream could cause excessive CPU consumption in the HPACK decoder, sufficient to cause a denial of service from a small number of small requests. Fixes: #8190 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-10-10 03:56:39 +00:00
Zvonko Kaiser	7c934dc7da	gpu: Fix cold-plug of VFIO devices We need to do proper sandbox sizing when we're doing cold-plug. Introduce CDI, the de-facto standard for enabling devices in containers. containerd will pass through annotations for accumulated CPU, Memory and now CDI devices. With that information sandbox sizing can be derived correctly. Fixes: #7331 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-09-28 09:49:13 +00:00
Greg Kurz	defbb64ac8	Merge pull request #8036 from rye-stripe/bugfix/overhead-metrics runtime: fix reading cgroup stats of sandboxes	2023-09-27 19:39:55 +02:00
Bo Chen	dfd0c9fa9a	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v35.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #8057 Signed-off-by: Bo Chen <chen.bo@intel.com>	2023-09-25 12:22:37 -07:00
Peteris Rudzusiks	94e2ccc2d5	runtime: fix reading cgroup stats of sandboxes The cgroup stats come from resourcecontrol package in the form of pointers to structs. The sandbox Stat() method incorrectly was expecting structs. This caused the cpu and memory stats to always be 0, which in turn caused incorrect pod overhead metrics. Fixes #8035 Signed-off-by: Peteris Rudzusiks <rye@stripe.com>	2023-09-21 17:00:53 +02:00
Alexandru Matei	d507d189bb	fc: Add support for noflush cache option Firecracker supports noflush semantic via Unsafe cache type. There is no support for direct i/o, remove it from config file Fixes: #7823 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2023-09-21 14:48:24 +03:00
Alexandru Matei	2ca781518a	clh: Direct IO support for block devices Clh supports direct i/o for disks. It doesn't offer any support for noflush, so passing that option to the cloud-hypervisor internal config is removed. Fixes: #7798 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2023-09-21 14:48:24 +03:00
Wainer Moschetta	87e64a07ed	Merge pull request #7979 from beraldoleal/gogo-removal protocol: remove gogoprotobuff tests	2023-09-20 22:38:10 -03:00
Beraldo Leal	730ef51693	deps: updating dependencies Updating dependencies after make check, make test. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-09-19 16:54:35 -04:00
Dan Mihai	82ff2db460	runtime: support kernel params including spaces Support quoted kernel command line parameters that include space characters. Example: dm-mod.create="dm-verity,,,ro,0 736328 verity 1 /dev/vda1 /dev/vda2 4096 4096 92041 0 sha256 f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4 065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038" Fixes: #8003 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2023-09-19 20:26:38 +00:00
Beraldo Leal	604a9dd673	protocol: remove gogoprotobuff tests This is part of a bigger effort to drop gogoprotobuff from our code base. IIUC, those options are basically used by *pb_test.go, and since we are dropping gogoprotobuff and those are auto generated tests, let's just remove it. Fixes #7978. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-09-19 12:55:42 -04:00
Fabiano Fidêncio	84c0d59d23	Merge pull request #7985 from fidencio/topic/clh-use-static_sandbox_resource_mgmt-as-default-on-arm clh: arm: Use static_sandbox_resource_mgmt=true	2023-09-19 09:25:34 +02:00
Fabiano Fidêncio	c3ee913bf6	Merge pull request #7953 from gkurz/extra-monitor-socket runtime/qemu: Rework QMP/HMP support	2023-09-18 19:04:14 +02:00
Fabiano Fidêncio	72599f1911	clh: arm: Use static_sandbox_resource_mgmt=true Users have noticed that this is needed, as CLH does not yet implement a way to hotplug resources on aarch64. With this patch, when building for x86_64, I can see that this is the resulting config: ``` $ ARCH=amd64 make ... $ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt static_sandbox_resource_mgmt=false ``` And when building for aarch64: ``` $ ARCH=arm64 make ... $ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt static_sandbox_resource_mgmt=true ``` Fixes: #7941 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-09-18 14:14:10 +02:00
Jeremi Piotrowski	dfa6af54df	Merge pull request #7806 from jongwu/clh_serial clh:arm64: use arm AMBA UART for hypervisor debug	2023-09-18 12:29:07 +02:00
Greg Kurz	1f16b6627b	runtime/qemu: Rework QMP/HMP support PR #6146 added the possibility to control QEMU with an extra HMP socket as an aid for debugging. This is great for development or bug chasing but this raises some concerns in production. The HMP monitor allows tampering with the VM state in a variety of ways. This could be intentionally or mistakenly used to inject subtle bugs in the VM that would be extremely hard if not even impossible to debug. We definitely don't want that to be enabled by default. The feature is currently wired to the `enable_debug` setting in the `[hypervisor.qemu]` section of the configuration file. This setting has historically been used to control "debug output" and it is used as such by some downstream users (e.g. Openshift). Forcing people to have the extra HMP backdoor at the same time is abusive and dangerous. A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give fine control on whether the HMP socket is wanted or not. This setting is still gated by `enable_debug = true` to make it clear it is for debug only. The default is to not have the HMP socket though. This isn't backward compatible with #6416 but it is for the sake of "better safe than sorry". An extra monitor socket makes the QEMU instance untrusted. A warning is thus logged to the journal when one is requested. While here, also allow the user to choose between HMP and QMP for the extra monitor socket. Motivation is that QMP offers way more options to control or introspect the VM than HMP does. Users can also ask for pretty json formatting well suited for human reading. This will improve the debugging experience. This feature is only made visible in the base and GPU configurations of QEMU for now. Fixes #7952 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-09-18 12:13:01 +02:00
Peng Tao	6eedd9b0b9	Merge pull request #7738 from Xuanqing-Shi/7732/handle-non-empty-endpoints-in-RemoveEndpoints runtime: incorrect handling of non-empty []Endpoint parameter in Remo…	2023-09-18 10:58:28 +08:00
Jianyong Wu	241c355e07	clh:arm64: use arm AMBA uart for hypervisor debug cloud hypervisor on arm64 only supports arm AMBA UART (pl011) as tty. So, the console should be set to "ttyAMA0" instead of "ttyS0" when hypervisor debug mode is enabled. Fixes: #5080 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2023-09-15 01:44:23 +00:00
Jeremi Piotrowski	3a1db7a86b	runtime: clh: Support enabling iommu by enabling IOMMU on the default PCI segment. For hotplug to work we need a virtualized iommu and clh exposes one if there is some device or PCI segment that requests it. I would have preferred to add a separate PCI segment for hotplugging vfio devices but unfortunately kata assumes there is only one segment all over the place. See create_pci_root_bus_path(), split_vfio_pci_option() and grep for '0000'. Enabling the IOMMU on the default PCI segment requires enabling IOMMU on every device that is attached to it, which is why it is sprinkled all over the place. CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for that device. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	bfc93927fb	runtime: Remove redundant check in checkPCIeConfig There is no way for this branch to be hit, as port is only set when it is different than config.NoPort. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	7c4e73b609	runtime: Add test cases for checkPCIeConfig These test cases show which options are valid for CLH/Qemu, and test that we correctly catch unsupported combinations. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	fc51e4b9eb	runtime: Check config for supported CLH (cold|hot)_plug_vfio values The only supported options are hot_plug_vfio=root-port or no-port. cold_plug_vfio is not supported yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Jeremi Piotrowski	509771e6f5	runtime: clh: Add hot_plug_vfio entry to config hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have not implemented support for cold_plug_vfio in CLH yet. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-09-14 14:23:28 +02:00
Peng Tao	55ca7e8aec	Merge pull request #7907 from Xuanqing-Shi/7876/network-devices-naming-conflict runtime: Naming conflict of network devices	2023-09-13 19:29:41 +08:00
shixuanqing	1636abbe1c	runtime: issue with non-empty []Endpoint in RemoveEndpoints In RemoveEndpoints(), when the endpoints parameter isn't empty, using idx may result in wrong endpoint removals. To improve this, directly passing the endpoint parameter helps locate the correct elements within n.eps. Fixes: #7732 Signed-off-by: shixuanqing <1356292400@qq.com> Update src/runtime/virtcontainers/network_linux.go Co-authored-by: Xuewei Niu <justxuewei@apache.org>	2023-09-13 09:47:18 +00:00
Peng Tao	9766f9090c	Merge pull request #7719 from beraldoleal/nullable Remove gogoproto.nullable extension	2023-09-13 15:11:56 +08:00
shixuanqing	ca4b6b051d	runtime: Naming conflict of network devices When creating a new endpoint, we check existing endpoint names and automatically adjust the naming of the new endpoint to ensure uniqueness. Fixes: #7876 Signed-off-by: shixuanqing <1356292400@qq.com>	2023-09-12 04:29:51 +00:00
James O. D. Hunt	c0f697fcc5	runtime: Allow kernel_params annotation To support the removal of the `initcall_debug` and `earlyprintk=` options from the default guest kernel cmdline, add `kernel_params` to the list of enabled annotations to allow those kernel options (or others) to be set using `kata-deploy` for either runtime. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2023-09-11 12:12:12 +01:00
Fabiano Fidêncio	6cd5d83a37	Merge pull request #7865 from gkurz/fix-more-virtiofs-args runtime: Fix more virtiofs args	2023-09-09 21:30:16 +02:00
Greg Kurz	72c510d057	runtime/virtiofsd: Drop all references to "--cache=none" This syntax belongs to the legacy C virtiofsd implementation that we don't support anymore since kata-containers 3.1.3 because of other API breaking changes. People have been warned to switch from "none" to "never" since kata-containers 2.5.2. Let's officially do that. The compat code that would convert "none" to "never" isn't needed anymore. Just drop it. Fixes #7864 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-09-08 17:57:30 +02:00
Beraldo Leal	ead724bec1	protocol: removing gogo.nullable feature gogo.nullable is the main gogo.protobuf feature used here. Since we are trying to remove gogo.protobuf, the first reasonable step seems to be to remove this feature. This is a core update, and it will change how the structs are defined. I could spot only a few places using those structs, based on make check/build. Fixes #7723. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-09-08 11:49:01 -04:00
Peng Tao	435e890cd9	Merge pull request #7703 from bergwolf/github/nerdctl-fc runtime: run prestart hooks before starting VM for FC	2023-09-07 10:55:31 +08:00
Greg Kurz	81536f21af	runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr" The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated with the rust implementation. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-09-06 17:50:35 +02:00
Fabiano Fidêncio	b1dd09a4d3	runtime: Allow virtio_fs_extra_args annotation Some use cases may just require passing extra arguments to virtiofsd, and having this disabled by default makes it impossible to set when using kata-deploy, as changes in the configuration file would be overwritten by the daemon-set. With this in mind, let's allow users to pass whatever they need (and here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg. Fixes: #7853 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-09-06 17:11:16 +02:00
Dan Mihai	d0e0610679	runtime: config: use the SEV initrd for SNP Thanks Unmesh Deodhar! Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2023-09-01 14:28:08 +00:00
Fabiano Fidêncio	67fed26f18	runtime: Use TDX image with in the qemu-tdx config Let's make sure we use the TDX image as part of the QEMU TDX configuration, which will help us to have the policies tested here. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-09-01 14:28:08 +00:00
Jeremi Piotrowski	bde06758b1	Merge pull request #7761 from jepio/iocopy-fix-race runtime: Fix data race in ioCopy	2023-09-01 09:30:54 +02:00
Jeremi Piotrowski	c2ba29c15b	runtime: Fix data race in ioCopy IoCopy is a tricky function (I don't claim to fully understand its contract), but here is what I see: The goroutine that runs it spawns 3 goroutines - one for each stream to handle (stdin/stdout/stderr). The goroutine then waits for the stream goroutines to exit. The idea is that when the process exits and is closed, the stdout goroutine will be unblocked and close stdin - this should unblock the stdin goroutine. The stderr goroutine will exit at the same time as the stdout goroutine. The iocopy routine then closes all tty.io streams. The problem is that the stdout goroutine decrements the WaitGroup before closing the stdin stream, which causes the iocopy goroutine to race to close the streams. Move the wg.Done() of the stdout routine past the close so that this race becomes impossible. I can't guarantee that this doesn't affect some unspecified behavior. Fixes: #5031 Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>	2023-08-31 10:17:38 +02:00
Peng Tao	2e4c874726	runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure If we are running FC hypervisor, it is not started when prestart hooks are executed. So we should just ignore such error and just go ahead and run the hooks. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-30 03:06:11 +00:00
Peng Tao	21204caf20	runtime: fail early when starting docker container with FC FC does not support network device hotplug. Let's add a check to fail early when starting containers created by docker. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-30 02:52:01 +00:00
Peng Tao	32fd013716	runtime: run prestart hooks before starting VM for FC Add a new hypervisor capability to tell if it supports device hotplug. If not, we should run prestart hooks before starting new VMs as nerdctl is using the prestart hooks to set up netns. To make nerdctl + FC work, we need to run the prestart hooks before starting new VMs. Fixes: #6384 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-30 02:52:01 +00:00
Beraldo Leal	00e7ffd988	tests: check vmx only on Intel machines When running on AMD machines, those tests will fail because there is no vmx flag. Following other tests that check for cpuType, let's adapt them to restrict vmx only to Intel machines. Fixes #7788. Related #5066 Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-08-29 20:04:31 -04:00
Beraldo Leal	80146f2078	tests: Fixes cpuType check on AMD machines cpuType is not initialized yet, so it gets 0 (Intel) by default, failing on AMD machines. Fixes #7785 Signed-off-by: Beraldo Leal <bleal@redhat.com>	2023-08-29 17:04:07 -04:00
Fabiano Fidêncio	d1b54ede29	qemu: tdx: Workaround SMP issue with TDX 1.5 `...,sockets=1,cores=numvcpus,threads=1,...` must be used. Fixes: #7770 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-08-28 13:41:36 +02:00
Archana Shinde	1e34220c41	qemu: tdx: Adapt to the TDX 1.5 stack QEMU for TDX 1.5 makes use of private memory map/unmap. Make changes to govmm to support this. Support for private backing fd for memory is added as knob to the qemu config. Userspace's map/unmap operations are done by fallocate() ioctl on the backing store fd. Reference: https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/ Fixes: #7770 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-08-28 13:41:36 +02:00
Peng Tao	18d42da21e	runtime/fc: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-23 03:47:28 +00:00
Peng Tao	9fda7059a5	runtime/clh: fix image/initrd annotation handling We should make sure annotations are preferred over config options in image and initrd path handling. Fixes: #7705 Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-23 03:47:28 +00:00
Peng Tao	1a0092d631	runtime/qemu: fix image/initrd annotation handling Right now if we configure an image annotation and have a config file setting initrd, the initrd config would override the image annotation. Add a helper function ImageOrInitrdAssetPath to make sure annotations are preferred over config options in image and initrd path handling. Signed-off-by: Peng Tao <bergwolf@hyper.sh>	2023-08-23 03:47:27 +00:00
Fabiano Fidêncio	e107d1d94e	Merge pull request #7574 from microsoft/danmihai1/policy agent: runtime: add Agent Policy feature	2023-08-15 11:29:13 +02:00
Chelsea Mafrica	22465d22f0	Merge pull request #7638 from ManaSugi/fix/virtcontainers-doc docs: Remove installation step in virtcontainers doc	2023-08-14 10:21:57 -07:00
Dan Mihai	ab829d1038	agent: runtime: add the Agent Policy feature Fixes: #7573 To enable this feature, build your rootfs using AGENT_POLICY=yes. The default is AGENT_POLICY=no. Building rootfs using AGENT_POLICY=yes has the following effects: 1. The kata-opa service gets included in the Guest image. 2. The agent gets built using AGENT_POLICY=yes. After this patch, the shim calls SetPolicy if and only if a Policy annotation is attached to the sandbox/pod. When creating a sandbox/pod that doesn't have an attached Policy annotation: 1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses the default agent settings, that might include a default Policy too. 2. If the agent was built using AGENT_POLICY=no, the new sandbox is executed the same way as before this patch. Any SetPolicy calls from the shim to the agent fail if the agent was built using AGENT_POLICY=no. If the agent was built using AGENT_POLICY=yes: 1. The agent reads the contents of a default policy file during sandbox start-up. 2. The agent then connects to the OPA service on localhost and sends the default policy to OPA. 3. If the shim calls SetPolicy: a. The agent checks if SetPolicy is allowed by the current policy (the current policy is typically the default policy mentioned above). b. If SetPolicy is allowed, the agent deletes the current policy from OPA and replaces it with the new policy it received from the shim. A typical new policy from the shim doesn't allow any future SetPolicy calls. 4. For every agent rpc API call, the agent asks OPA if that call should be allowed. OPA allows or not a call based on the current policy, the name of the agent API, and the API call's inputs. The agent rejects any calls that are rejected by OPA. When building using AGENT_POLICY_DEBUG=yes, additional Policy logging gets enabled in the agent. In particular, information about the inputs for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM. 
These inputs can be useful for investigating API calls that might have been rejected by the Policy. Examples: 1. Load a failing policy file test1.rego on a different machine: opa run --server --addr 127.0.0.1:8181 test1.rego 2. Collect the API inputs from Guest's /tmp/policy.txt and test on the machine where the failing policy has been loaded: curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \ --data-binary @test1-inputs.json Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2023-08-14 17:07:35 +00:00
Manabu Sugimoto	416445e7eb	docs: Remove installation step in virtcontainers doc Remove the installation step in the virtcontainers doc because the virtcontainers install/uninstall targets have been removed by `86723b51ae` and they are not used anymore. Fixes: #7637 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2023-08-14 15:15:24 +09:00
stevenhorsman	8815ed0665	runtime: Remove config warnings Remove configuration file shared_fs = none warnings now that there is a solution to updating configMaps, secrets etc Fixes: #7210 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2023-08-11 16:31:08 +01:00
Pradipta Banerjee	ab13ef87ee	runtime: propagate configmap/secrets etc changes for remote-hyp For remote hypervisor, the configmap, secrets, downward-api or project-volumes are copied from host to guest. This patch watches for changes to the host files and copies the changes to the guest. Note that configmap updates take significantly longer than updates via downward-api. This is similar across runc and Kata runtimes. Fixes: #7210 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com> Signed-off-by: Julien Ropé <jrope@redhat.com> (cherry picked from commit `3081cd5f8e`) (cherry picked from commit 68ec673bc4d9cd853eee51b21a0e91fcec149aad)	2023-08-11 16:31:08 +01:00
Yohei Ueda	c074ec4df1	runtime: Copy shared files recursively This patch enables recursive file copying when filesystem sharing is not used. Signed-off-by: Yohei Ueda <yohei@jp.ibm.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> (cherry picked from commit `5422a056f2`) (cherry picked from commit 16055ce040bbd724be2916bc518d89b69c9e0ca5) Fixes: #7210	2023-08-11 16:16:52 +01:00
Manabu Sugimoto	cc922be5ec	versions: Update firecracker version to 1.4.0 This patch upgrades Firecracker version from v1.1.0 to v1.4.0. * Generate swagger models for v1.4.0 (from `firecracker.yaml`) - The version of go-swagger used is v0.30.0 * The firecracker v1.4.0 includes the following changes. - Added * Added support for custom CPU templates allowing users to adjust vCPU features exposed to the guest via CPUID, MSRs and ARM registers. * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU as Neoverse N1. * Added support for the virtio-rng entropy device. The device is optional. A single device can be enabled per VM using the /entropy endpoint. * Added a cpu-template-helper tool for assisting with creating and managing custom CPU templates. - Changed * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process. - Fixed * Fixed feature flags in T2S CPU template on Intel Ice Lake. * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host. * Fixed a performance regression in the jailer logic for closing open file descriptors. * A race condition that has been identified between the API thread and the VMM thread due to a misconfiguration of the api_event_fd. * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host. * Fixed passing through cache information from host in CPUID leaf 0x80000006. * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES MSR to 1 in accordance with an Intel microcode update. * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode update. * Fixed passing through cache information from host in CPUID leaf 0x80000005. * Fixed the T2A CPU template to disable SVM (nested virtualization). 
* Fixed the T2A CPU template to set EferLmsleUnsupported bit (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported. Fixes: #7610 Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>	2023-08-10 16:48:13 +09:00
Wedson Almeida Filho	4fbe0a3a53	runtime: bind-mount mounted block device into container When the mounted block device isn't a layer, we want to mount it into containers, but since it's already mounted with the correct fs (e.g., tar, ext4, etc.) in the pod, we just bind-mount it into the container. Fixes: #7536 Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>	2023-08-03 17:58:39 -03:00
Wedson Almeida Filho	7e1b1949d4	runtime: add support for kata overlays When at least one `io.katacontainers.fs-opt.layer` option is added to the rootfs, it gets inserted into the VM as a layer, and the file system is mounted as an overlay of all layers using the overlayfs driver. Additionally, if the `io.katacontainers.fs-opt.block_device=file` option is present in a layer, it is mounted as a block device backed by a file on the host. Fixes: #7536 Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>	2023-08-03 17:58:39 -03:00
Zvonko Kaiser	cddcde1d40	vfio: Fix vfio device ordering If modeVFIO is enabled we need 1st to attach the VFIO control group device /dev/vfio/vfio and 2nd the actual device(s) afterwards. Sort the devices starting with device #1 being the VFIO control group device and the next the actual device(s) /dev/vfio/<group> Fixes: #7493 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-31 11:26:27 +00:00
Zvonko Kaiser	1fc715bc65	s390x: Add AP Attach/Detach test Now that we have proper AP device support add a unit test for testing the correct Attach/Detach of AP devices. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-23 13:44:19 +00:00
Zvonko Kaiser	545de5042a	vfio: Fix tests Now with more elaborate checking of cold|hot plug ports we needed to update some of the tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:44 +00:00
Zvonko Kaiser	62aa6750ec	vfio: Added better handling of VFIO Control Devices Depending on the vfio_mode we need to mount the VFIO control device additionally into the container. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 13:42:42 +00:00
Zvonko Kaiser	dd422ccb69	vfio: Remove obsolete HotplugVFIOonRootBus Removing HotplugVFIOonRootBus which is obsolete with the latest PCI topology changes, users can set cold_plug_vfio or hot_plug_vfio either in the configuration.toml or via annotations. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:25:40 +00:00
Zvonko Kaiser	114542e2ba	s390x: Fixing device.Bus assignment The device.Bus was reset if a specific combination of configuration parameters were not met. With the new PCIe topology this should not happen anymore Fixes: #7381 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-07-20 07:24:26 +00:00
Peng Tao	581be92b25	Merge pull request #4492 from zvonkok/pcie-topology runtime: fix PCIe topology for GPUDirect use-case	2023-07-03 09:17:12 +08:00
Fabiano Fidêncio	6a21e20c63	runtime: Add "none" as a shared_fs option Currently, even when using devmapper, if the VMM supports virtio-fs / virtio-9p, that's used to share a few files between the host and the guest. This is needed, as we need to share with the guest contents like secrets, certificates, and configurations, via Kubernetes objects like configMaps or secrets, and those are rotated and must be updated into the guest whenever the rotation happens. However, there are still use-cases where users can live with just copying those files into the guest at the pod creation time, and for those there's absolutely no need to have a shared filesystem process running with no extra obvious benefit, consuming memory and even increasing the attack surface used by Kata Containers. For the case mentioned above, we should allow users, making it very clear which limitations it'll bring, to run Kata Containers with devmapper without actually having to use a shared file system, which is already the approach taken when using Firecracker as the VMM. Fixes: #7207 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-30 20:45:00 +02:00
Zvonko Kaiser	0f454d0c04	gpu: Fixing typos for PCIe topology changes Some comments and functions had typos and wrong capitalization. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-30 08:42:55 +00:00
Zvonko Kaiser	8330fb8ee7	gpu: Update unit tests Some tests are now failing due to the changes in how PCIe is handled. Update the tests accordingly. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2023-06-23 11:16:25 +00:00
Greg Kurz	a43ea24dfc	virtiofsd: Convert legacy `-o` sub-options to their `--` replacement The `-o` option is the legacy way to configure virtiofsd, inherited from the C implementation. The rust implementation honours it for compatibility but it logs deprecation warnings. Let's use the replacement options in the go shim code. Also drop references to `-o` from the configuration TOML file. Fixes #7111 Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:54 +02:00
Greg Kurz	8e00dc6944	virtiofsd: Drop `-o no_posix_lock` The C implementation of virtiofsd had some kind of limited support for remote POSIX locks that was causing some workflows to fail with kata. Commit `432f9bea6e` hard coded `-o no_posix_lock` in order to enforce guest local POSIX locks and avoid the issues. We've switched to the rust implementation of virtiofsd since then, but it emits a warning about `-o` being deprecated. According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 : The C implementation of the daemon has limited support for remote POSIX locks, restricted exclusively to non-blocking operations. We tried to implement the same level of functionality in #2, but we finally decided against it because, in practice most applications will fail if non-blocking operations aren't supported. Implementing support for non-blocking isn't trivial and will probably require extending the kernel interface before we can even start working on the daemon side. There is thus no justification to pass `-o no_posix_lock` anymore. Signed-off-by: Greg Kurz <groug@kaod.org>	2023-06-16 11:42:39 +02:00

2065 Commits