runtime: keep cold-plug VFIO devices in guest-kernel mode

Container.createDevices was dropping cold-plug VFIO entries from the
container's deviceInfos whenever vfio_mode = "guest-kernel", which
in turn meant the agent's CreateContainer request carried no
vfio-pci-gk device entry and sandbox.pcimap[cid] stayed empty. The
SR-IOV device plugin still set PCIDEVICE_<RES>=<host-BDF> on the
workload container, so update_env_pci then aborted with
"No PCI mapping found for container <id>" and the container failed
with CrashLoopBackOff.

Include cold-plug VFIO devices in deviceInfos for both VFIO modes.
In guest-kernel mode the existing vfio-pci-gk agent handler returns
dev: None, so /dev/vfio/<group> is not materialised in the container
spec (constrainGRPCSpec(stripVfio=true) already strips it from the
grpc spec), but the handler still records the host->guest PCI mapping
in sandbox.pcimap[cid], so env-var translation works.
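
To illustrate why the mapping matters, here is a minimal Go sketch of the
env-var translation step. It is not the agent's code (the agent is a
separate component and update_env_pci is only named above); pciMap,
translatePCIEnv, the resource name and the PCI addresses are all
hypothetical, and the comma-separated value format is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pciMap is a hypothetical stand-in for sandbox.pcimap[cid]:
// host PCI address (BDF) -> guest PCI address.
type pciMap map[string]string

// translatePCIEnv rewrites PCIDEVICE_* variables from host to guest
// addresses and fails when no mapping is available, mirroring the
// failure described in the commit message.
func translatePCIEnv(env []string, m pciMap) ([]string, error) {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		name, val, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(name, "PCIDEVICE_") {
			out = append(out, kv) // unrelated variable, pass through
			continue
		}
		if len(m) == 0 {
			return nil, fmt.Errorf("no PCI mapping found for container")
		}
		// Assumption: the value may list several host BDFs, comma-separated.
		hosts := strings.Split(val, ",")
		guests := make([]string, 0, len(hosts))
		for _, h := range hosts {
			g, found := m[h]
			if !found {
				return nil, fmt.Errorf("no guest address for host device %s", h)
			}
			guests = append(guests, g)
		}
		out = append(out, name+"="+strings.Join(guests, ","))
	}
	return out, nil
}

func main() {
	env := []string{"PATH=/bin", "PCIDEVICE_EXAMPLE_COM_NIC=0000:3b:00.1"}

	// With the mapping recorded, translation succeeds.
	fmt.Println(translatePCIEnv(env, pciMap{"0000:3b:00.1": "0000:02:00.0"}))

	// With an empty map (the situation before this commit), it fails.
	fmt.Println(translatePCIEnv(env, pciMap{}))
}
```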

devManager.NewDevice calls FindDevice first, which matches the
already cold-plugged sandbox-level device by HostPath / major / minor,
so this does not double-attach.
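
The find-before-create behaviour can be sketched as follows. This is a
simplified illustration, not the runtime's device manager: deviceInfo,
deviceManager, findDevice and newDevice are hypothetical stand-ins, and
the path and major/minor numbers are made up.

```go
package main

import "fmt"

// deviceInfo is a hypothetical, reduced identity of a host device.
type deviceInfo struct {
	HostPath     string
	Major, Minor int64
}

type device struct {
	ID   string
	Info deviceInfo
}

// deviceManager is a hypothetical stand-in for the sandbox device manager.
type deviceManager struct {
	devices []*device
}

// findDevice returns an already-known device with the same
// HostPath/major/minor, or nil if there is none.
func (m *deviceManager) findDevice(info deviceInfo) *device {
	for _, d := range m.devices {
		if d.Info == info {
			return d
		}
	}
	return nil
}

// newDevice looks the device up first and only creates a new entry on a
// miss, so asking twice for the same host device yields one entry.
func (m *deviceManager) newDevice(info deviceInfo) *device {
	if d := m.findDevice(info); d != nil {
		return d
	}
	d := &device{ID: fmt.Sprintf("dev-%d", len(m.devices)), Info: info}
	m.devices = append(m.devices, d)
	return d
}

func main() {
	m := &deviceManager{}
	info := deviceInfo{HostPath: "/dev/vfio/42", Major: 243, Minor: 0}

	sandboxDev := m.newDevice(info)   // cold-plugged at sandbox level
	containerDev := m.newDevice(info) // requested again for the container

	fmt.Println(sandboxDev == containerDev, len(m.devices)) // prints "true 1"
}
```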

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Fabiano Fidêncio
2026-05-27 15:32:05 +02:00
parent 9893b6dc03
commit e6777f0866
2 changed files with 17 additions and 2 deletions

@@ -1106,7 +1106,22 @@ func (c *Container) createDevices(ctx context.Context, contConfig *ContainerConf
 // device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.
 // Sort the devices starting with device #1 being the VFIO control group
 // device and the next the actuall device(s) /dev/vfio/<group>
-if coldPlugVFIO && c.sandbox.config.VfioMode == config.VFIOModeVFIO {
+//
+// Cold-plug VFIO devices must also reach the agent in
+// `VfioMode == GuestKernel`. The agent's `vfio-pci-gk` handler
+// returns `dev: None` (so /dev/vfio/<group> is *not* materialised in
+// the container spec — `constrainGRPCSpec(stripVfio=true)` will have
+// already removed it from `grpcSpec.Linux.Devices`), but it still
+// records the host->guest PCI mapping into `sandbox.pcimap[cid]`.
+// Without that mapping, `update_env_pci` cannot translate the
+// `PCIDEVICE_<RES>=<host-BDF>` env vars set by the SR-IOV device
+// plugin and aborts the container creation with
+// "No PCI mapping found for container <id>".
+//
+// `devManager.NewDevice` calls `FindDevice` first, which matches the
+// already-cold-plugged sandbox-level device by HostPath/major/minor,
+// so this does not double-attach.
+if coldPlugVFIO {
 // DeviceInfo should still be added to the sandbox's device manager
 // if vfio_mode is VFIO and coldPlugVFIO is true (e.g. vfio-ap-cold).
 // This ensures that ociSpec.Linux.Devices is updated with

@@ -1284,7 +1284,7 @@ func TestKataAgentCreateContainerVFIODevices(t *testing.T) {
 hotPlugVFIO: config.NoPort,
 coldPlugVFIO: config.BridgePort,
 vfioMode: config.VFIOModeGuestKernel,
-expectVFIODev: false,
+expectVFIODev: true,
 },
 }
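
The flipped test expectation follows from the condition change in
createDevices. A minimal sketch of the old and new predicates, using
hypothetical names (vfioMode, keepColdPlugBefore, keepColdPlugAfter)
rather than the runtime's own types:

```go
package main

import "fmt"

// vfioMode is a hypothetical stand-in for the runtime's VFIO mode setting.
type vfioMode int

const (
	modeVFIO vfioMode = iota
	modeGuestKernel
)

// keepColdPlugBefore models the old condition: cold-plug VFIO devices
// were only kept in deviceInfos when the mode was VFIO.
func keepColdPlugBefore(coldPlug bool, mode vfioMode) bool {
	return coldPlug && mode == modeVFIO
}

// keepColdPlugAfter models the new condition: cold-plug VFIO devices are
// kept regardless of the VFIO mode.
func keepColdPlugAfter(coldPlug bool, _ vfioMode) bool {
	return coldPlug
}

func main() {
	for _, mode := range []vfioMode{modeVFIO, modeGuestKernel} {
		fmt.Println(mode, keepColdPlugBefore(true, mode), keepColdPlugAfter(true, mode))
	}
}
```

Only the cold-plug, guest-kernel combination changes, which is the one
test case whose expectVFIODev value the diff updates.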