Commit Graph

7290 Commits

Author SHA1 Message Date
Manabu Sugimoto
45e7c2cab1 static-checks: Add step for installing libseccomp
This adds a step for installing libseccomp because the kata-agent
supports seccomp feature.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
Manabu Sugimoto
a3647e3486 osbuilder: Set up libseccomp library
The osbuilder needs to set up libseccomp library to build the kata-agent
because the kata-agent supports seccomp currently.
The library is built from the sources to create a static library for musl libc.
In addition, environment variables for the libseccomp crate are set to
link the library statically.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
Manabu Sugimoto
3be50adab9 agent: Add support for Seccomp
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
James O. D. Hunt
4d4a15d6ce
Merge pull request #2057 from wainersm/fix_kata-deploy-ci
ci: test-kata-deploy: Get rid of slash-command-action action
2021-10-27 10:08:12 +01:00
Peng Tao
03a9411884
Merge pull request #2878 from eadamsintel/update-qat-dockerfile
This is to bump the OOT QAT 1.7 driver version to the latest version.…
2021-10-27 17:00:04 +08:00
Samuel Ortiz
4280415149 agent: Fix the configuration sample file
All endpoint names share the `Request` suffix.
Also, the current list is based on functions, not requests.

Fixes #2916

Reported-by: Jakob Naucke <jakob.naucke@ibm.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-27 06:02:33 +02:00
Bo Chen
bf5f42d411
Merge pull request #2906 from jodh-intel/trace-forwarder-drop-privs
forwarder: Drop privileges when using hybrid VSOCK
2021-10-26 13:24:01 -07:00
Chelsea Mafrica
8f33e6f593
Merge pull request #2896 from Jakob-Naucke/static
packaging/static-build: s390x fixes
2021-10-26 11:53:34 -07:00
Wainer dos Santos Moschetta
b0bc71f463 ci: test-kata-deploy: Get rid of slash-command-action action
There is a problem with slash-command-action which is on absence of a slash command
the job fails (instead of simply ignore, i.e., skip). This is documented on
https://github.com/xt0rted/slash-command-action/issues/124. There is a workaround
also documented on that issue, but here instead let's get rid of the action.

In this new implementation all comments sent to the pull request are parsed, if any
starts with "/test_kata-deploy" then the job is triggered.

Fixes #2836
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-10-26 11:36:13 -04:00
Wainer dos Santos Moschetta
309dae631a virtcontainers: check that both initrd and image are not set
This changed valid() in hypervisor to check the case where both
initrd and image path are set; in this case it returns an error.

Fixes #1868
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-10-26 10:44:23 -04:00
James O. D. Hunt
3120b489e3
Merge pull request #2687 from genjuro214/improve-oci-to-grpc
agent-ctl: improve the oci_to_grpc code
2021-10-26 13:00:02 +01:00
James O. D. Hunt
a10cfffdff forwarder: Fix changing log level
Fix `-l <log-level>` for the trace forwarder which didn't work
previously as it lacked the magic Cargo configuration.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-26 11:02:06 +01:00
James O. D. Hunt
6abccb92ce forwarder: Drop privileges when using hybrid VSOCK
Hybrid VSOCK requires `root` privileges to access the sandbox-specific
host-side AF_UNIX socket created by the hypervisor (CLH or FC). However,
once the socket has been bound, privileges can be dropped, allowing the
forwarder to run as user `nobody`.

Fixes: #2905.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-26 11:01:58 +01:00
Bin Liu
8d8604e10f
Merge pull request #2893 from liubin/fix/2892-print-error-instead-of-return
agent: do not return error but print it if task wait failed
2021-10-26 17:48:17 +08:00
Lei Li
bf00b8df87 agent-ctl: improve the oci_to_grpc code
The oci_to_grpc function just handles part of oci fields,
and others are not copied from oci spec to grpc spec,
such as process.env, process.capabilities, mounts and so on.
Try to implement more handlings to convert thoses fields.

Fixes #2686

Signed-off-by: Lei Li <cdlleili@cn.ibm.com>
2021-10-26 16:54:28 +08:00
James O. D. Hunt
b67fa9e450 forwarder: Make explicit root check
Rather than generating a potentially misleading error message if the
socket bind fails, perform an explicit check for `root` for Hybrid
VSOCK.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-26 09:28:26 +01:00
James O. D. Hunt
e377578e08 forwarder: Fix docs socket path
Updated the trace forwarder README to ensure the real socket path is
created, not the template socket path returned by `kata-runtime env`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-26 09:28:26 +01:00
James O. D. Hunt
d1d9e84e9f
Merge pull request #2902 from liubin/fix/2901-delete-duplicated-line
virtcontainers: delete duplicated notify in watchHypervisor function
2021-10-26 08:22:11 +01:00
bin
5f306330f4 virtcontainers: delete duplicated notify in watchHypervisor function
When hypervisor check failed, the notify function is called twice.

Fixes: #2901

Signed-off-by: bin <bin@hyper.sh>
2021-10-26 11:58:26 +08:00
bin
5f5eca6b8e agent: do not return error but print it if task wait failed
Do not return error but print it if task wait failed
and let program continue to run the next code.

Fixes: #2892

Signed-off-by: bin <bin@hyper.sh>
2021-10-26 11:43:39 +08:00
Jakob Naucke
d2a7b6ff4a
packaging/static-build: s390x fixes
- Install OpenSSL for key generation in kernel build
- Do not install libpmem
- Do not exclude `*/share/*/*.img` files in QEMU tarball since among
  them are boot loader files critical for IPLing.

Fixes: #2895
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-25 18:47:35 +02:00
Yujia Qiao
6cc8000cae
cli: Show available guest protection in env output
Show available guest protections in the
`kata-runtime env` output. Also bump the formatVersion.

Fixes: #1982

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-25 21:44:56 +08:00
Yujia Qiao
2063b13805
virtcontainers: Add func AvailableGuestProtections
Add functions to return guestProtection as a string slice, which
can be then used in `kata-runtime env` output.

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-25 21:44:01 +08:00
Fupan Li
3d0fe433c6
Merge pull request #2889 from lht/handle-uevent-remove-actions
agent: Handle uevent remove actions
2021-10-25 19:08:20 +08:00
James O. D. Hunt
ec3aa1694b
Merge pull request #2844 from jongwu/unit_test
enable unit test on arm
2021-10-25 10:58:21 +01:00
Bin Liu
01fdeb7641
Merge pull request #2891 from ManaSugi/fix/unify-form
rustjail: Consistent coding style of LinuxDevice type
2021-10-25 14:03:03 +08:00
Bin Liu
ded864f862
Merge pull request #2568 from Bevisy/main-2254
cli: Fix outdated kata-runtime bash completion
2021-10-25 14:02:13 +08:00
Haitao Li
a13e2f77b8 agent: Handle uevent remove actions
uevents with action=remove was ignored causing the agent to reuse stale
data in the device map. This patch adds handling of such uevents.

Fixes #2405

Signed-off-by: Haitao Li <lihaitao@gmail.com>
2021-10-25 14:41:32 +11:00
David Gibson
a0825badf6
Merge pull request #2795 from dgibson/vfio-as-vfio
Allow VFIO devices to be used as VFIO devices in the container
2021-10-25 14:25:26 +11:00
Peng Tao
e709f11229
Merge pull request #2881 from mcastelino/topic/hypervisor-rename
Expose top level hypervisor methods -
2021-10-25 10:25:49 +08:00
David Gibson
34273da98f runtime/device: Allow VFIO devices to be presented to guest as VFIO devices
On a conventional (e.g. runc) container, passing in a VFIO group device,
/dev/vfio/NN, will result in the same VFIO group device being available
within the container.

With Kata, however, the VFIO device will be bound to the guest kernel's
driver (if it has one), possibly appearing as some other device (or a
network interface) within the guest.

This add a new `vfio_mode` option to alter this.  If set to "vfio" it will
instruct the agent to remap VFIO devices to the VFIO driver within the
guest as well, meaning they will appear as VFIO devices within the
container.

Unlike a runc container, the VFIO devices will have different names to the
host, since the names correspond to the IOMMU groups of the guest and those
can't be remapped with namespaces.

For now we keep 'guest-kernel' as the value in the default configuration
files, to maintain current Kata behaviour.  In future we should change this
to 'vfio' as the default.  That will make Kata's default behaviour more
closely resemble OCI specified behaviour.

fixes #693

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
68696e051d runtime: Add parameter to constrainGRPCSpec to control VFIO handling
Currently constrainGRPCSpec always removes VFIO devices from the OCI
container spec which will be used for the inner container.  For
upcoming support for VFIO devices in DPDK usecases we'll need to not
do that.

As a preliminary to that, add an extra parameter to the function to
control whether or not it will remove the VFIO devices from the spec.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
d9e2e9edb2 runtime: Rename constraintGRPCSpec to improve grammar
"constraint" is a noun, "constrain" is the associated verb, which makes
more sense in this context.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
57ab408576 runtime: Introduce "vfio_mode" config variable and annotation
In order to support DPDK workloads, we need to change the way VFIO devices
will be handled in Kata containers.  However, the current method, although
it is not remotely OCI compliant has real uses.  Therefore, introduce a new
runtime configuration field "vfio_mode" to control how VFIO devices will be
presented to the container.

We also add a new sandbox annotation -
io.katacontainers.config.runtime.vfio_mode - to override this on a
per-sandbox basis.

For now, the only allowed value is "guest-kernel" which refers to the
current behaviour where VFIO devices added to the container will be bound
to whatever driver in the VM kernel claims them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:29 +11:00
David Gibson
730b9c433f agent/device: Create device nodes for VFIO devices
Add and adjust the vfio devices in the inner container spec so that
rustjail will create device nodes for them.

In order to do that, we also need to make sure the VFIO device node is
ready within the guest VM first.  That may take (slightly) longer than
just the underlying PCI device(s) being ready, because vfio-pci needs
to initialize.  So, add a helper function that will wait for a
specific VFIO device node to be ready, using the existing uevent
listening mechanism.  It also returns the device node name for the
device (though in practice it will always /dev/vfio/NN where NN is the
group number).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
175f9b06e9 rustjail: Allow container devices in subdirectories
Many device nodes go directly under /dev, however some are conventionally
placed in subdirectories under /dev.  For example /dev/vfio/vfio or
/dev/pts/ptmx.

Currently, attempting to pass such a device into a Kata container will fail
because mknod() will get an ENOENT because the parent directory is missing
(or an equivalent error for bind_dev()).

Correct that by making subdirectories as necessary in create_devices().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
9891efc61f rustjail: Correct sanity checks on device path
For each user supplied device, create_devices() checks that the given path
actually is in /dev, by checking that its path starts with /dev and does
not contain "..".

However, this has subtle errors because it's interpreting the path as a raw
string without considering separators.  It will accept the path /devfoo
which it should not, while it will not accept the valid (though weird)
paths /dev/... and /dev/a..b.

Correct this by using std::path::Path methods designed for the purpose.
Having done this, it's trivial to also generate the relative path that
mknod_dev() or bind_dev() will need, so do that at the same time.

We also move this logic into a helper function so that we can add some unit
tests for it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
d6b62c029e rustjail: Change mknod_dev() and bind_dev() to take relative device path
Both these functions take the absolute path from LinuxDevice and drop the
leading '/' to make a relative path.  They do that with a simple
&dev.path[1..].  That can be technically incorrect in some edge cases such
as a path with redundant /s like "//dev//sda".

To handle cases like that, have the explicit relative path passed into
these functions.  For now we calculate it in the same buggy way, but we'll
fix that shortly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
2680c0bfee rustjail: Provide useful context on device node creation errors
create_devices() within the rustjail module is responsible for creating
device nodes within the (inner) containers.  Errors that occur here will
be propagated up, but are likely to be low level failures of mknod() - e.g.
ENOENT or EACCESS - which won't be very useful without context when
reported all the way up to the runtime without the context of what we were
trying to do.

Add some anyhow context information giving the details of the device we
were trying to create when it failed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
42b92b2b05 agent/device: Allow container devname to differ from the host
Currently, update_spec_device() assumes that the proper device path in the
(inner) container is the same as the device path specified in the outer OCI
spec on the host.

Usually that's correct.  However for VFIO group devices we actually need
the container to see the VM's device path, since it's normal to correlate
that with IOMMU group information from sysfs which will be different in the
guest and which we can't namespace away.

So, add an extra "final_path" parameter to update_spec_device() to allow
callers to chose the device path that should be used for the inner
container.  All current callers pass the same thing as container_path, but
that will change in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
827a41f973 agent/device: Refactor update_spec_device_list()
update_spec_device_list() is used to update the container configuration to
change device major/minor numbers configured by the Kata client based on
host details to values suitable for the sandbox VM, which may differ.  It
takes a 'device' object, but the only things it actually uses from there
are container_path and vm_path.

Refactor this as update_spec_device(), taking the host and guest paths to
the device as explicit parameters.  This makes the function more
self-contained and will enable some future extensions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
8ceadcc5a9 agent/device: Sanity check guest IOMMU groups
Each VFIO device passed into the guest could represent a whole IOMMU group
of devices on the host.  Since these devices aren't DMA isolated from each
other, they must appear as the same IOMMU group in the guest as well.

The VMM should enforce that for us, but double check it, since things can't
work otherwise.  This also means we determine the guest IOMMU group for the
VFIO device, which we'll be needing later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
ff59db7534 agent/device: Add function to get IOMMU group for a PCI device
For upcoming VFIO extensions we'll need to work with the IOMMU groups of
VFIO devices.  This helps us towards that by adding pci_iommu_group() to
retrieve the IOMMU group (if any) of a given PCI device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
13b06a35d5 agent/device: Rebind VFIO devices to VFIO driver inside guest
VFIO devices can be added to a Kata container and they will be passed
through to the sandbox guest.  However, inside the guest those devices
will bind to a native guest driver, so they will no longer appear as VFIO
devices within the guest.  This behaviour differs from runc or other
conventional container runtimes.

This code allows the agent to match the behaviour of other runtimes,
if instructed to by kata-runtime.  VFIO devices it's informed about
with the "vfio" type instead of the existing "vfio-gk" type will be
rebound to the vfio-pci driver within the guest.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
David Gibson
e22bd78249 agent/device: Add helper function for binding a guest device to a driver
For better VFIO support, we're going to need to take control of which guest
driver controls specific guest devices.  To assist with that, add the
pci_driver_override() function to force a specific guest device to be
bound to a specific guest driver.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:28:33 +11:00
Manabu Sugimoto
b40eedc9f7 rustjail: Consistent coding style of LinuxDevice type
Use `"c".to_string` in the device type of `dev/full`
in order to consistent with the coding style of other devices

Fixes: #2890

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-25 09:15:59 +09:00
Jianyong Wu
57c0f93f54 agent: fix race condition when test watcher
create_tmpfs won't pass as the race condition in watcher umount. quote
James's words here:

1. Rust runs all tests in parallel.
2. Mounts are a process-wide, not a per-thread resource.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.

To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-24 17:31:53 +08:00
Jianyong Wu
1a96b8ba35 template: disable template unit test on arm
Template is broken on arm. here we disable the template unit test
temporarily.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00
Jianyong Wu
43b13a4a6d runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00
Jianyong Wu
c59c36732b runtime: current vcpu number should be limited
The physical current vcpu number should not be used directly as the
largest vcpu number is limited to defaultMaxQemuVCPUs.
Here, a new helper is introduced in pkg/katautils/config.go to get
current vcpu number.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00