Commit Graph

1902 Commits

Author SHA1 Message Date
Alex Lyn
1684c1962c runtime: Fix runtime/cdi panic with assignment to entry in nil map
It will panic when users do GPU vfio passthrough with cdi in runtime.
The root cause is that CustomSpec.Annotations is nil when new element
added.
To address this issue, initialization is introduced when it's nil.

Fixes #10266

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-09-09 20:15:10 +08:00
Fabiano Fidêncio
65a4562050 runtime: qemu: tdx: Add omitempty to QuoteGenerationSocket
I know right now we're always passing a value for that, but this doesn't
really have to be set unless attestation is used.  Thus, let's also omit
it in case it's empty.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-06 15:05:55 +02:00
Fabiano Fidêncio
7818484120 runtime: qemu: tdx: Support mrconfigid / mrowner/ mrownerconfig
This is a quick and simple pre-req for supporting initData, which will
take advantage of the mrconfigid in the TDX case.

While already adding mrconfigid, which is hardcoded empty right now,
let's do the same for mrowner and mrownerconfig, and leave it prepared
for future expansions.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-06 15:05:54 +02:00
Fabiano Fidêncio
8285957678 runtime: qemu: Rename prepareObjectWithTDXQgs to prepareTDXObject
The reason we're relying on yet another function to do so is because the
TDX object will be used in its qom / qapi json format.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-06 14:36:09 +02:00
Fabiano Fidêncio
7d048f5963 Merge pull request #10254 from fidencio/topic/remove-amd-specific-warning-from-non-amd-systems
runtime: Don't error out about SNP cert path on non SNP platforms
2024-09-04 23:42:32 +02:00
Fabiano Fidêncio
b10256a7ca runtime: Don't error out about SNP cert path on non SNP platforms
This error is specific to SNP platforms, so let's make sure we only
error this out when an SNP platform is used.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-04 11:54:52 +02:00
Silenio Quarti
9e1388728e runtime: fix bad default machine_type for remote hypervisor
Fixes: #10249

Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2024-09-03 20:53:19 -04:00
Alex Lyn
f24983b3cf Merge pull request #10210 from l8huang/cold-vf
runtime: check if  cold_plug_vfio is enabled before create PhysicalEndpoint
2024-08-27 15:23:55 +08:00
Silenio Quarti
11ba8f05ca runtime: Allow machine_type in kata config for remote hypervisors
Fixes: #10211

Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2024-08-26 10:17:40 -04:00
Lei Huang
70168a467d runtime: check if cold_plug_vfio is enabled before create PhysicalEndpoint
PhysicalEndpoint unbinds its VF interface and rebinds it as a VFIO device,
then cold-plugs the VFIO device into the guest kernel.

When `cold_plug_vfio` is set to "no-port", cold-plugging the VFIO device
will fail.

This change checks if `cold_plug_vfio` is enabled before creating PhysicalEndpoint
to avoid unnecessary VFIO rebind operations.

Fixes: #10162

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-08-23 15:42:17 -07:00
Bo Chen
a0bd78b358 Merge pull request #10205 from likebreath/0819/upgrade_clh_v41.0
Upgrade to Cloud Hypervisor v41.0
2024-08-23 10:01:41 -07:00
Bo Chen
254f8bca74 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v41.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #10203

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-08-22 11:05:54 -07:00
Hyounggyu Choi
7d0aba1a24 runtime: Enable to get cdh_api_timeout from configuration file
This commit allows `cdh_api_timeout` to be configured from the configuration file.
The configuration is commented out with specifying a default value (50s) because
the default value is configured in the agent.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-08-22 14:47:37 +02:00
Fabiano Fidêncio
e0ae398a2e Merge pull request #10151 from squarti/rootdir2
runtime: Files are not synced between host and guest VMs
2024-08-18 12:32:52 +02:00
ChengyuZhu6
ca05aca548 runtime: Add specific error message for gRPC request timeouts
Improved error handling to provide clearer feedback on request failures.

For example:
Improve createcontainer request timeout error message from
"Error: failed to create containerd task: failed to create shim task:context deadline exceed"
to "Error: failed to create containerd task: failed to create shim task: CreateContainerRequest timed out: context deadline exceed".

Fixes: #10173 -- part II

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-08-16 20:24:48 +08:00
Silenio Quarti
5d815ffde1 runtime: Files are not synced between host and guest VMs
This PR resolves the default kubelet root dir symbolic link and
uses it as the absolute path for the fs watcher regexs

Fixes: https://github.com/kata-containers/kata-containers/issues/9986

Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2024-08-15 23:19:08 -04:00
Fabiano Fidêncio
44b08b84b0 Merge pull request #10113 from Freax13/fix/no-scsi-off
qemu: don't emit scsi parameter
2024-08-08 16:23:36 +02:00
Fabiano Fidêncio
f33f2d09f7 runtime: image-pull: Make it work with nerdctl
Our code for handling images being pulled inside the guest relies on a
containerType ("sandbox" or "container") being set as part of the
container annotations, which is done by the CRI Engine being used, and
depending on the used CRI Engine we check for a specfic annotation
related to the image-name, which is then passed to the agent.

However, when running kata-containers without kubernetes, specifically
when using `nerdctl`, none of those annotations are set at all.

One thing that we can do to allow folks to use `nerdctl`, however, is to
take advantage of the `--label` flag, and document on our side that
users must pass `io.kubernetes.cri.image-name=$image_name` as part of
the label.

By doing this, and changing our "fallback" so we can always look for
such annotation, we ensure that nerdctl will work when using the nydus
snapshotter, with kata-containers, to perform image pulling inside the
pod sandbox / guest.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-08-06 17:07:45 +02:00
Tom Dohrmann
322c80e7c8 qemu: don't emit scsi parameter
This parameter has been deprecated for a long time and QEMU 9.1.0 finally removes it.

Fixes: kata-containers#10112
Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
2024-08-02 07:30:39 +02:00
Qi Feng Huo
a113fc93c8 initdata: fix unit test code for initdata annotation
Added ut code for initdata annotation

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
2024-07-26 18:24:05 +08:00
Qi Feng Huo
8d61029676 initdata: add unit test code for initdata annotation
Added ut code for initdata annotation

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
2024-07-26 14:20:57 +08:00
Qi Feng Huo
b80057dfb5 initdata: Merge branch 'main' into annotation
- Merge branch 'main' into feature branch annotation
2024-07-26 14:01:04 +08:00
Archana Shinde
1636c201f4 network: Implement network hotunplug for physical endpoints
Similar to HotAttach, the HotDetach method signature for network
endoints needs to be changed as well to allow for the method to make
use of device manager to manage the hot unplug of physical network
devices.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:41 -07:00
Archana Shinde
c6390f2a2a vfio: Introduce function to get vfio dev path
This function will be later used to get the vfio dev path.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:41 -07:00
Archana Shinde
1e304e6307 network: Implement hotplug for physical endpoints
Enable physical network interfaces to be hotplugged.
For this, we need to change the signature of the HotAttach method
to make use of Sandbox instead of Hypervisor. Similar approach was
followed for Attach method, but this change was overlooked for
HotAttach.
The signature change is required in order to make use of
device manager and receiver for physical network
enpoints.

Fixes: #8405

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:40 -07:00
Archana Shinde
2fef4bc844 vfio: use driver_override field for device binding.
The current implementation for device binding using driver bind/unbind
and new_id fails in the scenario when the physical device is not bound
to a driver before assigning it to vfio.
There exists and updated mechanism to accomplish the same that does not
have the same issue as above.
The driver_override field for a device allows us to specify the driver for a device
rather than relying on the bound driver to provide a positive match of the
device. It also has other advantages referenced here:
https://patchwork.kernel.org/project/linux-pci/patch/1396372540.476.160.camel@ul30vt.home/

So use the updated driver_override mechanism for binding/unbinding a
physical device/virtual function to vfio-pci.

Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:40 -07:00
Qi Feng Huo
4d66ee1935 initdata: add initdata annotation in hypervisor config
- Add Initdata annotation for hypervisor config, so that it can be passed when CreateVM

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
2024-07-11 10:56:18 +08:00
Silenio Quarti
8260ce8d15 runtime: Initialize SharedFS for remote hypervisor
Sets SharedFS config to NoSharedFS for remote hypervisor in order to start the file watcher which syncs files from the host to the guest VMs. 

Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2024-07-10 14:31:25 -03:00
Xuewei Niu
7f71eac6de Merge pull request #9868 from l8huang/dan
runtime: implement DAN in Go kata-runtime
2024-07-10 19:09:46 +08:00
Lei Huang
171d298dea runtime: implement DAN in Go kata-runtime
The DAN feature has already been implemented in kata-runtime-rs, and
this commit brings the same capability to the Go kata-runtime.

Fixes: #9758

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-07-10 00:22:30 -07:00
Niteesh Dubey
529660fafb runtime: pass certificates for SNP coco
This will be used to get extended attestation report.

Fixes: #9805

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-07-09 03:46:00 +00:00
Amulya Meka
dd12089e0d Merge pull request #9914 from Amulyam24/qemu-fix
kata-deploy: fix qemu static build on ppc64le
2024-07-02 10:45:03 +05:30
Amulyam24
259ec408b5 kata-deploy: fix qemu static build for v8.2.1 on ppc64le
Do not install the packages librados-dev and librbd-dev as they are not needed for building static qemu.

Add machine option cap-ail-mode-3=off while creating the VM to qemu cmdline.
Fixes: #9893

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-07-01 14:56:43 +05:30
Bo Chen
25e3cab028 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v40.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #9929

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-06-27 09:59:00 -07:00
Alex Lyn
d66c214ae7 Merge pull request #9849 from markyangcc/main
runtime: fix missing of VhostUserDeviceReconnect parameter assignment
2024-06-27 21:48:37 +08:00
Zvonko Kaiser
e0aa54301f gpu: Missing separator
Add the correct separator for replacement

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-26 10:40:35 +00:00
Wainer Moschetta
f7e0d6313b Merge pull request #9865 from wainersm/qemu-coco-dev_updates
runtime: updates to qemu-coco-dev configuration
2024-06-21 10:14:30 -03:00
stevenhorsman
779754dcf6 runtime: Support policy in remote hypervisor
Move the `sandbox.agent.setPolicy` call out of the remoteHypervisor
if, block, so we can use the policy implementation on peer pods

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-19 16:43:53 +01:00
markyangcc
a28bf266f9 runtime: fix missing of VhostUserDeviceReconnect parameter assignment
Commit 'ca02c9f5124e' implements the vhost-user-blk reconnection functionality,
However, it has missed assigning VhostUserDeviceReconnect when new the QEMU
HypervisorConfig, resulting in VhostUserDeviceReconnect always set to default value 0.

Real change is this line, most of changes caused by go format,

return vc.HypervisorConfig{
	// ...
	VhostUserDeviceReconnect: h.VhostUserDeviceReconnect,
}, nil

Fixes: #9848
Signed-off-by: markyangcc <mmdou3@163.com>
2024-06-18 12:15:10 +08:00
Wainer dos Santos Moschetta
bdbee78517 runtime: allow default_{vcpus,memory} annotations to qemu-coco-dev
This is a counterpart of commit abf52420a4 for the qemu-coco-dev
configuration. By allowing default_vcpu and default_memory annotations
users can fine-tune the VM based on the size of the container
image to avoid issues related with pulling large images in the guest.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-17 18:59:52 -03:00
Wainer dos Santos Moschetta
baa8d9d99c runtime: set shared_fs=none to qemu-coco-dev configuration
Just like the TEE configurations (sev, snp, tdx) we want to have the
qemu-coco-dev using shared_fs=none.

Fixes: #9676
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-17 18:42:46 -03:00
Bo Chen
a68aeca356 Merge pull request #9575 from likebreath/0430/clh_v39.0
versions: Upgrade to Cloud Hypervisor v39.0
2024-06-14 09:10:19 -07:00
Steve Horsman
ab8a9882c1 Merge pull request #9818 from EmmEff/fix-spelling
runtime: fix minor spelling issues
2024-06-14 13:12:56 +01:00
Steve Horsman
99bf95f773 Merge pull request #9827 from littlejawa/fix_panic_on_metrics_gathering
runtime: avoid panic on metrics gathering
2024-06-14 11:12:43 +01:00
Mike Frisch
c2f61b0fe3 runtime: spelling fixes
Minor spelling fixes in runtime log messages.

Signed-off-by: Mike Frisch <mikef17@gmail.com>
2024-06-13 12:11:34 -04:00
Greg Kurz
b85b1c1058 Merge pull request #9790 from gkurz/kill-some-dead-runtime-code
Kill some dead runtime code
2024-06-13 15:45:51 +02:00
Julien Ropé
9c86eb1d35 runtime: avoid panic on metrics gathering
While running with a remote hypervisor, whenever kata-monitor tries to access
metrics from the shim, the shim does a "panic" and no metric can be gathered.

The function GetVirtioFsPid() is called on metrics gathering, and had a call
to "panic()". Since there is no virtiofs process for remote hypervisor, the
right implementation is to return nil. The caller expects that, and will skip
metrics gathering for virtiofs.

Fixes: #9826

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-06-12 10:02:44 +02:00
Xuewei Niu
92cc5e0adb Merge pull request #9781 from gaohuatao-1/ght/shm 2024-06-12 12:39:28 +08:00
Greg Kurz
1acf8d0c35 govmm: Drop QEMU's NoShutdown knob
Code is not used.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Greg Kurz
cb5b548ad7 govmm: Drop QEMU's Daemonize knob
Code isn't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00