Both these functions take the absolute path from LinuxDevice and drop the
leading '/' to make a relative path. They do that with a simple
&dev.path[1..]. That can be technically incorrect in some edge cases such
as a path with redundant /s like "//dev//sda".
To handle cases like that, have the explicit relative path passed into
these functions. For now we calculate it in the same buggy way, but we'll
fix that shortly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
create_devices() within the rustjail module is responsible for creating
device nodes within the (inner) containers. Errors that occur here will
be propagated up, but are likely to be low level failures of mknod() - e.g.
ENOENT or EACCESS - which won't be very useful without context when
reported all the way up to the runtime without the context of what we were
trying to do.
Add some anyhow context information giving the details of the device we
were trying to create when it failed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, update_spec_device() assumes that the proper device path in the
(inner) container is the same as the device path specified in the outer OCI
spec on the host.
Usually that's correct. However for VFIO group devices we actually need
the container to see the VM's device path, since it's normal to correlate
that with IOMMU group information from sysfs which will be different in the
guest and which we can't namespace away.
So, add an extra "final_path" parameter to update_spec_device() to allow
callers to chose the device path that should be used for the inner
container. All current callers pass the same thing as container_path, but
that will change in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_device_list() is used to update the container configuration to
change device major/minor numbers configured by the Kata client based on
host details to values suitable for the sandbox VM, which may differ. It
takes a 'device' object, but the only things it actually uses from there
are container_path and vm_path.
Refactor this as update_spec_device(), taking the host and guest paths to
the device as explicit parameters. This makes the function more
self-contained and will enable some future extensions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each VFIO device passed into the guest could represent a whole IOMMU group
of devices on the host. Since these devices aren't DMA isolated from each
other, they must appear as the same IOMMU group in the guest as well.
The VMM should enforce that for us, but double check it, since things can't
work otherwise. This also means we determine the guest IOMMU group for the
VFIO device, which we'll be needing later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For upcoming VFIO extensions we'll need to work with the IOMMU groups of
VFIO devices. This helps us towards that by adding pci_iommu_group() to
retrieve the IOMMU group (if any) of a given PCI device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VFIO devices can be added to a Kata container and they will be passed
through to the sandbox guest. However, inside the guest those devices
will bind to a native guest driver, so they will no longer appear as VFIO
devices within the guest. This behaviour differs from runc or other
conventional container runtimes.
This code allows the agent to match the behaviour of other runtimes,
if instructed to by kata-runtime. VFIO devices it's informed about
with the "vfio" type instead of the existing "vfio-gk" type will be
rebound to the vfio-pci driver within the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For better VFIO support, we're going to need to take control of which guest
driver controls specific guest devices. To assist with that, add the
pci_driver_override() function to force a specific guest device to be
bound to a specific guest driver.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use `"c".to_string` in the device type of `dev/full`
in order to consistent with the coding style of other devices
Fixes: #2890
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
create_tmpfs won't pass as the race condition in watcher umount. quote
James's words here:
1. Rust runs all tests in parallel.
2. Mounts are a process-wide, not a per-thread resource.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.
To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The physical current vcpu number should not be used directly as the
largest vcpu number is limited to defaultMaxQemuVCPUs.
Here, a new helper is introduced in pkg/katautils/config.go to get
current vcpu number.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The current kernel version parse lib can't process suffix '+', as the
modified kernel version will add '+' as suffix, thus panic will occur.
For example, if the current kernel version is "5.14.0-rc4+", test
TestHostNetworkingRequested will panic:
--- FAIL: TestHostNetworkingRequested (0.00s)
panic: &{DistroName:ubuntu DistroVersion:18.04
KernelVersion:5.11.0-rc3+ Issue: Passed:[] Failed:[] Debug:true
ActualEUID:0}: failed to check test constraints: error: Build meta data
is empty
Here, remove the suffix '+' in kernel version fix helper.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Export the top level hypervisor type
s/hypervisor/Hypervisor
Fixes: #2880
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Last of a series of commits to export the top level
hypervisor generic methods.
s/createSandbox/CreateVM
Fixes#2880
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.
Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
This is to bump the OOT QAT 1.7 driver version to the
latest version. I dida test on my QAT enabled system and
everything functioned as expected.
Fixes: #2877
Signed-off-by: Eric Adams <eric.adams@intel.com>
Highlights from the Cloud Hypervisor release v19.0: 1) Improved PTY
handling for serial and virtio-console; 2) PCI boot time optimisations;
3) Improved TDX support; 4) Live migration enhancements (support with
virtio-mem and virtio-balloon); 5) virtio-mem support with vfio-user; 6)
AArch64 for virtio-iommu; 7) Various bug fixes for live-migration and
VFIO passthrough.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v19.0Fixes: #2871
Signed-off-by: Bo Chen <chen.bo@intel.com>
The upstream kernel SGX support has changed drastically since
the initial version of the Intel SGX use case doc was written.
The updated use case documents how to easily setup SGX with
Kata Containers running in a Kubernetes cluster.
Fixes: #2811
Depends-on: github.com/kata-containers/tests#4079
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Co-authored-by: James O. D. Hunt <james.o.hunt@intel.com>