Network device hotplugging will use the same infrastructure (Netdev,
DeviceVirtioNet) as coldplugging, i.e. QemuCmdLine. To make the code
of network device setup visible outside of QemuCmdLine we factor it out
to a non-member function `get_network_device()` and make QemuCmdLine just
delegate to it.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The function takes a whole QemuCmdLine but only actually uses
HypervisorConfig. We increase callability of the function by limiting
its interface to what it needs. This will come handy shortly.
Signed-off-by: Pavel Mores <pmores@redhat.com>
At least one PCI bridge is necessary to hotplug PCI devices. We only
support PCI (at this point at least) since that's what the go runtime
does (note that looking at the code in virtcontainers it might seem that
other bus types are supported, however when the bridge objects are passed
to govmm, all but PCI bridges are actually ignored). The entire logic of
bridge setup is lifted from runtime-go for compatibility's sake.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Inorder to support sandbox api, intorduce the sandbox_config
struct and split the sandbox start stage from init process.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Docker cannot exit normally after the container process exits when
used with runtime-rs since it doesn't receive the exit event. This
commit enable runtime-rs to send TaskExit to containerd after process
exits.
Also, it moves "system_time_into" and "option_system_time_into" from
crates/runtimes/common/src/types/trans_into_shim.rs to a new utility
mod.
Signed-off-by: Sicheng Liu <lsc2001@outlook.com>
When the sandbox api was enabled, the pasue container
wouldn't be created, thus the shared sandbox pidns
should be fallbacked to the first container's init process,
instead of return any error here.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
When using network adapters that support SR-IOV, a VFIO device can be
plugged into a guest VM and claimed as a network interface. This can
significantly enhance network performance.
Fixes: #9758
Signed-off-by: Lei Huang <leih@nvidia.com>
rename the task_service to service, in order to
incopperate with the following added sandbox
services.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In order to make different from sandbox request/response, this commit
changed the task request/response to TaskRequest/TaskResponse.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the wait_vm would be called before calling stop_vm,
which would take the reader lock, thus blocking the stop_vm
getting the writer lock, which would trigge the dead lock.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the block_on would block on the current thread
which would prevent other async tasks to be run on this
worker thread, thus change it to use the async task for
this task.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This PR introduces support for selectively compiling Dragonball in
runtime-rs. By default, Dragonball will continue to be compiled into
the containerd-shim-kata-v2 executable, but users now have the option
to disable Dragonball compilation.
Fixes#10310
Signed-off-by: sidney chang <2190206983@qq.com>
Add cdi devices including ContainerDevice definition and
annotation_container_device method to annotate vfio device
in OCI Spec annotations which is inserted into Guest with
its mapping of vendor-class and guest pci path.
Fixes#10145
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We need vfio device's properties device, vendor and
class, but we can only get property device and vendor.
just extend it with class is ok.
Fixes#10145
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This change technically affects the path for enabled guest selinux as well,
however since this is not implemented in runtime-rs anyway nothing should
break. When guest selinux support is added this change will come handy.
Signed-off-by: Pavel Mores <pmores@redhat.com>
If guest selinux is off the runtime has to ensure that container OCI spec
contains no selinux labels for the container rootfs and process. Failure
to do so causes kata agent to try and apply the labels which fails since
selinux is not enabled in guest, which in turn causes container launch
to fail.
This is largely inspired by golang runtime(*) with a slight deviation
in ordering of checks. This change simply checks the disable_guest_selinux
config setting and if it's true it clears both rootfs and process label if
necessary. Golang runtime, on the other hand, seems to first check if
process label is non-empty and only then it checks the config setting,
meaning that if process label is empty the rootfs label is not reset
even if it's non-empty. Frankly, this looks like a potential bug though
probably unlikely to manifest since it can be assumed that the labels are
either both empty, or both non-empty.
(*) 4fd4b02f2e/src/runtime/virtcontainers/kata_agent.go (L1005)
Signed-off-by: Pavel Mores <pmores@redhat.com>
In order to handle the setting we have to first parse it and make its
value available to the rest of the program.
The yes() function is added to comply with serde which seems to insist
on default values being returned from functions. Long term, this is
surely not the best place for this function to live, however given that
this is currently the first and only place where it's used it seems
appropriate to put it near its use. If it ends up being reused elsewhere
a better place will surely emerge.
Signed-off-by: Pavel Mores <pmores@redhat.com>
kata-shim was not reporting `inactive_file` in memory stat.
This memory is deducted by containerd when calculating the size of container working set, as it can be paged out by the operating
system under memory pressure. Without reporting `inactive_file`, containerd will over report container memory usage.
[Here](https://github.com/containerd/containerd/blob/v1.7.22/pkg/cri/server/container_stats_list_linux.go#L117) is where containerd
deducts `inactive_file` from memory usage.
Note that kata-shim correctly reports `total_inactive_file` for cgroup v1, but this was not implemented for cgroup v2.
This commit:
- Adds code in kata-shim to report "inactive_file" memory for cgroup v2
- Implements reporting of all available cgroup v2 memory stats to containerd
- Uses defensive coding to avoid assuming existence of any memory.stat fields
The list of available cgroup v2 memory stats defined by containerd can be found
[here](https://pkg.go.dev/github.com/containerd/cgroups/v2/stats#MemoryStat).
Fixes#10280
Signed-off-by: Alex Man <alexman@stripe.com>
This patch adds support to call kata agents SetPolicy
API. Also adds tests for SetPolicy API using agent-ctl.
Fixes#9711
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
agent built with policy feature initializes the policy engine using a
policy document from a default path, which is installed & linked during
UVM rootfs build. This commit adds support to provide a default agent
policy as environment variable.
This targets development/testing scenarios where kata-agent
is wanted to be started as a local process.
Fixes#10301
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
Add two parameters for enabling cosign signature image verification.
- `enable_signature_verification`: to activate signature verification
- `image_policy`: URI of the image policy
config
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
new version of the anyhow crate has changed the backtrace capture thus
unit tests of kata-agent that compares a raised error with an expected
one would fail. To fix this, we need only panics to have backtraces,
thus set `RUST_BACKTRACE=1` and `RUST_LIB_BACKTRACE=0` for tests due to
document
https://docs.rs/anyhow/latest/anyhow/
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
It's a prerequisite PR to make built-in vmm dragonball compilation
options configurable.
Extract TAP device-related code from dragonball's dbs_utils into a
separate library within the runtime-rs hypervisor module.
To enhance functionality and reduce dependencies, the extracted code
has been reimplemented using the libc crate and the ifreq structure.
Fixes#10182
Signed-off-by: sidney chang <2190206983@qq.com>
Remove the recently added default UID/GID values, because the genpolicy
design is to initialize those fields before this new code path gets
executed.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Some container images are configured such that the user (and group)
under which their entrypoint should run is not a number (or pair of
numbers), but a user name.
For example, in a Dockerfile, one might write:
> USER 185
indicating that the entrypoint should run under UID=185.
Some images, however, might have:
> RUN groupadd --system --gid=185 spark
> RUN useradd --system --uid=185 --gid=spark spark
> ...
> USER spark
indicating that the UID:GID pair should be resolved at runtime via
/etc/passwd.
To handle such images correctly, read through all /etc/passwd files in
all layers, find the latest version of it (i.e., the top-most layer with
such a file), and, in so doing, ensure that whiteouts of this file are
respected (i.e., if one layer adds the file and some subsequent layer
removes it, don't use it).
Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>
Disabling the UID Policy rule was a workaround for #9928. Re-enable
that rule here and add a new test/CI temporary workaround for this
issue. This new test workaround will be removed after fixing #9928.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>