Commit Graph

1239 Commits

Author SHA1 Message Date
Bo Chen
16baecc5b1 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v26.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #4952

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-17 12:23:12 -07:00
Hengqi Chen
8ff5c10ac4 network: Fix error message for setting hardware address on TAP interface
Error out with the correct interface name and hardware address instead.

Fixes: #4944

Signed-off-by: Hengqi Chen <chenhengqi@outlook.com>
2022-08-17 16:42:07 +08:00
Bin Liu
cb7f9524be
Merge pull request #4804 from openanolis/anolis/merge_runtime_rs_to_main
runtime-rs:merge runtime rs to main
2022-08-11 08:40:41 +08:00
Tim Zhang
4813a3cef9
Merge pull request #4711 from liubin/fix/4710-wait-nydusd-api-server-ready
nydus: wait nydusd API server ready before mounting share fs
2022-08-10 17:20:17 +08:00
liubin
2ae807fd29 nydus: wait nydusd API server ready before mounting share fs
If the API server is not ready, the mount call will fail, so before
mounting share fs, we should wait the nydusd is started and
the API server is ready.

Fixes: #4710

Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
2022-08-08 16:18:38 +08:00
Tim Zhang
8d4d98587f
Merge pull request #4746 from liubin/fix/4745-add-log-field
runtime: explicitly mark the source of the log is from qemu.log
2022-08-08 15:21:01 +08:00
Archana Shinde
c1e3b8f40f govmm: Refactor qmp functions for adding block device
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
00860a7e43 qmp: Pass aio backend while adding block device
Allow govmm to pass aio backend while adding block device.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
e1b49d7586 config: Add block aio as a supported annotation
Allow Block AIO to be passed as a per pod annotation.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
Archana Shinde
ed0f1d0b32 config: Add "block_device_aio" as a config option for qemu
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-08-05 13:16:34 -07:00
chmod100
d8ad16a34e runtime: add unlock before return in sendReq
Unlock is required before return, so there need to add unlock

Fixes: #4827

Signed-off-by: chmod100 <letfu@outlook.com>
2022-08-05 13:30:12 +00:00
Zhongtao Hu
adfad44efe Merge remote-tracking branch 'origin/main' into runtime-rs-merge-tmp
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.

Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2022-08-01 11:12:48 +08:00
yaoyinnan
5c3155f7e2 runtime: Support for host cgroup v2
Support cgroup v2 on the host. Update vendor containerd/cgroups to add cgroup v2.

Fixes: #3073

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-07-28 10:30:45 +08:00
Bin Liu
85f4e7caf6 runtime: explicitly mark the source of the log is from qemu.log
In qemu.StopVM(), if debug is enabled, the shim will dump logs
from qemu.log, but users don't know which logs are from qemu.log
and shim itself. Adding some additional messages will
help users to distinguish these logs.

Fixes: #4745

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-07-26 16:08:59 +08:00
gntouts
56d49b5073 versions: Update Firecracker version to v1.1.0
This patch upgrades Firecracker version from v0.23.4 to v1.1.0

* Generate swagger models for v1.1.0 (from firecracker.yaml)
* Replace ht_enabled param to smt (API change)
* Remove NUMA-related jailer param --node 0

Fixes: #4673
Depends-on: github.com/kata-containers/tests#4968

Signed-off-by: George Ntoutsos <gntouts@nubificus.co.uk>
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2022-07-26 07:01:26 +00:00
Ji-Xinyou
62182db645 runtime-rs: add unit test for ipvlan endpoint
Add unit test to check the integrity of IPVlanEndpoint::new(...)

Fixes: #4655
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2022-07-18 15:56:06 +08:00
wllenyj
274598ae56 kata-runtime: add dragonball config check support.
add dragonball config check support.

Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
2022-07-14 10:43:50 +08:00
Fabiano Fidêncio
be31207f6e clh: Don't crash if no network device is set by the upper layer
`ctr` doesn't set a network device when creating the sandbox, which
leads to Cloud Hypervisor's driver crashing, see the log below:
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55641c23b248]
goroutine 32 [running]:
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.glob..func1(0xc000397900)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:163 +0x128
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).vmAddNetPut(...)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:1348
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).bootVM(0xc000397900, {0x55641c76dfc0, 0xc000454ae0})
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:1378 +0x5a2
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*cloudHypervisor).StartVM(0xc000397900, {0x55641c76dff8, 0xc00044c240},
0x55641b8016fd)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/clh.go:659 +0x7ee
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*Sandbox).startVM.func2()
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/sandbox.go:1219 +0x190
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*LinuxNetwork).Run.func1({0xc0004a8910, 0x3b})
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:319 +0x1b
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.doNetNS({0xc000048440, 0xc00044c240}, 0xc0005d5b38)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:1045 +0x163
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*LinuxNetwork).Run(0xc000150c80, {0x55641c76dff8, 0xc00044c240}, 0xc00014e4e0)
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/network_linux.go:318 +0x105
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*Sandbox).startVM(0xc000107d40, {0x55641c76dff8, 0xc0005529f0})
	/home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/sandbox.go:1205 +0x65f
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.createSandboxFromConfig({_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, 0x0, ...}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/api.go:91 +0x346
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.CreateSandbox({_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, 0x0, ...}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/api.go:51 +0x150
github.com/kata-containers/kata-containers/src/runtime/virtcontainers.(*VCImpl).CreateSandbox(_, {_, _}, {{0x0, 0x0, 0x0}, {0xc000385a00, 0x1, 0x1},
{0x55641d033260, ...}, ...})
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/implementation.go:35 +0x74
github.com/kata-containers/kata-containers/src/runtime/pkg/katautils.CreateSandbox({_, _}, {_, _}, {{0xc0004806c0, 0x9}, 0xc000140110, 0xc00000f7a0,
{0x0, 0x0}, ...}, ...)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/katautils/create.go:175 +0x8b6
github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.create({0x55641c76dff8, 0xc0004129f0}, 0xc00034a000, 0xc00036a000)
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/create.go:147 +0xdea
github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.(*service).Create.func2()
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/service.go:401 +0x32
created by github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2.(*service).Create
        /home/ubuntu/go/src/github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2/service.go:400 +0x534
```

This bug has been introduced as part of the
https://github.com/kata-containers/kata-containers/pull/4312 PR, which
changed how we add the network device.

In order to avoid the crash, let's simply check whether we have a device
to be added before iterating the list of network devices.

Fixes: #4618

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-13 10:40:21 +02:00
Fabiano Fidêncio
dc3b6f6592 versions: Update Cloud Hypervisor to v25.0
Cloud Hypervisor v25.0 has been released on July 7th, 2022, and brings
the following changes:

**ch-remote Improvements**
The ch-remote command has gained support for creating the VM from a JSON
config and support for booting and deleting the VM from the VMM.

**VM "Coredump" Support**
Under the guest_debug feature flag it is now possible to extract the memory
of the guest for use in debugging with e.g. the crash utility.
(https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4012)

**Notable Bug Fixes**
* Always restore console mode on exit
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4249,
   https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4248)
* Restore vCPUs in numerical order which fixes aarch64 snapshot/restore
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4244)
* Don't try and configure IFF_RUNNING on TAP devices
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4279)
* Propagate configured queue size through to vhost-user backend
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4286)
* Always Program vCPU CPUID before running the vCPU to fix running on Linux
  5.16
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4156)
* Enable ACPI MADT "Online Capable" flag for hotpluggable vCPUs to fix newer
  Linux guest

**Removals**
The following functionality has been removed:

* The mergeable option from the virtio-pmem support has been removed
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3968)
* The dax option from the virtio-fs support has been removed
  (https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3889)

Fixes: #4641

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-07-12 14:47:58 +00:00
GabyCT
02a51e75a7
Merge pull request #4554 from liubin/fix/delete-not-used-console-from-container-config
runtime: delete Console from Cmd type
2022-06-30 11:40:07 -05:00
Fabiano Fidêncio
aa561b49f5
Merge pull request #4540 from fidencio/topic/default_maxmemory
Add `default_maxmemory` config option
2022-06-30 12:08:15 +02:00
GabyCT
2a94261df5
Merge pull request #4549 from liubin/fix/4419-set-status-if-wait-process-failed
shim: set a non-zero return code if the wait process call failed.
2022-06-29 17:04:53 -05:00
liubin
a5a25ed13d runtime: delete Console from Cmd type
There is much code related to this property, but it is not used anymore.

Fixes: #4553

Signed-off-by: liubin <liubin0329@gmail.com>
2022-06-29 17:36:32 +08:00
liubin
ab5f1c9564 shim: set a non-zero return code if the wait process call failed.
Return code is an int32 type, so if an error occurred, the default value
may be zero, this value will be created as a normal exit code.

Set return code to 255 will let the caller(for example Kubernetes) know
that there are some problems with the pod/container.

Fixes: #4419

Signed-off-by: liubin <liubin0329@gmail.com>
2022-06-29 12:33:32 +08:00
Eric Ernst
5f936f268f virtcontainers: config validation is host specific
Ideally this config validation would be in a seperate package
(katautils?), but that would introduce circular dependency since we'd
call it from vc, and it depends on vc types (which, shouldn't be vc, but
probably a hypervisor package instead).

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-28 18:22:28 -07:00
Fabiano Fidêncio
323271403e virtcontainers: Remove unused function
While working on the previous commits, some of the functions become
non-used.  Let's simply remove them.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Fabiano Fidêncio
58ff2bd5c9 clh,qemu: Adapt to using default_maxmemory
Let's adapt Cloud Hypervisor's and QEMU's code to properly behave to the
newly added `default_maxmemory` config.

While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced as if a pod requests more memory
than the amount avaiable in the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, mimicing
better the runc behaviour.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 21:19:24 +02:00
Fabiano Fidêncio
afdc960424 hypervisor: Add default_maxmemory configuration
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.

By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.

Fixes: #4516

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-28 08:32:15 +02:00
Eric Ernst
bdf5e5229b virtcontainers: validate hypervisor config outside of hypervisor itself
Depending on the user of it, the hypervisor from hypervisor interface
could have differing view on what is valid or not. To help decouple,
let's instead check the hypervisor config validity as part of the
sandbox creation, rather than as part of the CreateVM call within the
hypervisor interface implementation.

Fixes: #4251

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-27 11:53:41 -07:00
Eric Ernst
469e098543 katautils: don't do validation when loading hypervisor config
Policy for whats valid/invalid within the config varies by VMM, host,
and by silicon architecture. Let's keep katautils simple for just
translating a toml to the hypervisor config structure, and leave
validation to virtcontainers.

Without this change, we're doing duplicate validation.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-27 10:13:26 -07:00
Bin Liu
27b1bb5ed9
Merge pull request #4467 from egernst/device-pkg
device package cleanup/refactor
2022-06-27 14:40:53 +08:00
Eric Ernst
f97d9b45c8 runtime: device/persist: drop persist dependency from device pkgs
Rather than have device package depend on persist, let's define the
(almost duplicate) structures within device itself, and have the Kata
Container's persist pkg import these.

This'll help avoid unecessary dependencies within our core packages.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Eric Ernst
f9e96c6506 runtime: device: move to top level package
Let's move device package to runtime/pkg instead of being buried under
virtcontainers.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-06-26 21:31:29 -07:00
Fabiano Fidêncio
133528dd14
Merge pull request #4503 from amshinde/multi-queue-block
block: Leverage multiqueue for virtio-block
2022-06-23 12:17:11 +02:00
Fabiano Fidêncio
78e27de6c3
Merge pull request #4358 from zvonkok/memreserve
runtime: Add heuristic to get the right value(s) for mem-reserve
2022-06-22 13:41:23 +02:00
Archana Shinde
e227b4c404 block: Leverage multiqueue for virtio-block
Similar to network, we can use multiple queues for virtio-block
devices. This can help improve storage performance.
This commit changes the number of queues for block devices to
the number of cpus for cloud-hypervisor and qemu.

Today the default number of cpus a VM starts with is 1.
Hence the queues used will be 1. This change will help
improve performance when the default cold-plugged cpus is greater
than one by changing this in the config file. This may also help
when we use the sandboxing feature with k8s that passes down
the sum of the resources required down to Kata.

Fixes #4502

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2022-06-21 12:38:53 -07:00
Eric Ernst
72049350ae
Merge pull request #4288 from fengwang666/enable-qemu-sandbox
runtime: enable sandbox feature on qemu
2022-06-21 09:22:26 -07:00
Zvonko Kaiser
e7e7dc9dfe runtime: Add heuristic to get the right value(s) for mem-reserve
Fixes: #2938

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-06-21 03:44:28 -07:00
Liang Zhou
ef925d40ce runtime: enable sandbox feature on qemu
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.

The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.

Fixes: #2266

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-17 15:30:46 -07:00
Fabiano Fidêncio
ac5dbd8598 clh: Improve logging related to the net dev addition
Let's improve the log so we make it clear that we're only *actually*
adding the net device to the Cloud Hypervisor configuration when calling
our own version of VmAddNetPut().

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
0b75522e1f network: Set queues to 1 to ensure we get the network fds
We want to have the file descriptors of the opened tuntap device to pass
them down to the VMMs, so the VMMs don't have to explicitly open a new
tuntap device themselves, as the `container_kvm_t` label does not allow
such a thing.

With this change we ensure that what's currently done when using QEMU as
the hypervisor, can be easily replicated with other VMMs, even if they
don't support multiqueue.

As a side effect of this, we need to close the received file descriptors
in the code of the VMMs which are not going to use them.

Fixes: #3533

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
93b61e0f07 network: Add FFI_NO_PI to the netlink flags
Adding FFI_NO_PI to the netlink flags causes no harm to the supported
and tested hypervisors as when opening the device by its name Cloud
Hypervisor[0], Firecracker[1], and QEMU[2] do set the flag already.

However, when receiving the file descriptor of an opened tutap device
Cloud Hypervisor is not able to set the flag, leaving the guest without
connectivity.

To avoid such an issue, let's simply add the FFI_NO_PI flag to the
netlink flags and ensure, from our side, that the VMMs don't have to set
it on their side when dealing with an already opened tuntap device.

Note that there's a PR opened[3] just for testing that this change
doesn't cause any breakage.

[0]: e52175c2ab/net_util/src/tap.rs (L129)
[1]: b6d6f71213/src/devices/src/virtio/net/tap.rs (L126)
[2]: 3757b0d08b/net/tap-linux.c (L54)
[3]: https://github.com/kata-containers/kata-containers/pull/4292

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
bf3ddc125d clh: Pass the tuntap fds down to Cloud Hypervisor
This is basically a no-op right now, as:
* netPair.TapInterface.VMFds is nil
* the tap name is still passed to Cloud Hypervisor, which is the Cloud
  Hypervisor's first choice when opening a tap device.

In the very near future we'll stop passing the tap name to Cloud
Hypervisor, and start passing the file descriptors of the opened tap
instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
55ed32e924 clh: Take care of the VmAdNetdPut request ourselves
Knowing that VmAddNetPut works as expected, let's switch to manually
building the request and writing it to the appropriate socket.

By doing this it gives us more flexibility to, later on, pass the file
descriptor of the tuntap device to Cloud Hypervisor, as openAPI doesn't
support such operation (it has no notion of SCM Rights).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:53:09 +00:00
Fabiano Fidêncio
01fe09a4ee clh: Hotplug the network devices
Instead of creating the VM with the network device already plugged in,
let's actually add the network device *after* the VM is created, but
*before* the Vm is actually booted.

Although it looks like it doesn't make any functional difference between
what's done in the past and what this commit introduces, this will be
used to workaround a limitation on OpenAPI when it comes to passing down
the network device's file descriptor to Cloud Hypervisor, so Cloud
Hypervisor can use it instead of opening the device by its name on the
VMM side.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:51:02 +00:00
Fabiano Fidêncio
2e07538334 clh: Expose VmAddNetPut
VmAddNetPut is the API provided by the Cloud Hypervisor client (auto
generated) code to hotplug a new network device to the VM.

Let's expose it now as it'll be used as part this series, mostly to
guide the reviewer through the process of what we have to do, as later
on, spoiler alert, it'll end up being removed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-14 10:27:30 +00:00
Eric Ernst
0136be22ca virtcontainers: plumb iptable set/get from sandbox to agent
Introduce get/set iptable handling. We add a sandbox API for getting and
setting the IPTables within the guest. This routes it from sandbox
interface, through kata-agent, ultimately making requests to the guest
agent.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 09:27:58 -07:00
Eric Ernst
03176a9e09 proto: update generated code based on proto update
Update the generated agent.pb.go code based on proto update.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-05-31 08:45:59 -07:00
Fabiano Fidêncio
fff832874e clh: Update to v24.0
This release has been tracked through the v24.0 project.

virtio-iommu specification describes how a device can be attached by default
to a bypass domain. This feature is particularly helpful for booting a VM with
guest software which doesn't support virtio-iommu but still need to access
the device. Now that Cloud Hypervisor supports this feature, it can boot a VM
with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing
the disk image is placed behind a virtual IOMMU.

Multiple checks have been added to the code to prevent devices with identical
identifiers from being created, and therefore avoid unexpected behaviors at boot
or whenever a device was hot plugged into the VM.

Sparse mmap support has been added to both VFIO and vfio-user devices. This
allows the device regions that are not fully mappable to be partially mapped.
And the more a device region can be mapped into the guest address space, the
fewer VM exits will be generated when this device is accessed. This directly
impacts the performance related to this device.

A new serial_number option has been added to --platform, allowing a user to
set a specific serial number for the platform. This number is exposed to the
guest through the SMBIOS.

* Fix loading RAW firmware (#4072)
* Reject compressed QCOW images (#4055)
* Reject virtio-mem resize if device is not activated (#4003)
* Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069)
* Fix algorithm finding HOB memory resources (#3983)

* Refactor interrupt handling (#4083)
* Load kernel asynchronously (#4022)
* Only create ACPI memory manager DSDT when resizable (#4013)

Deprecated features will be removed in a subsequent release and users should
plan to use alternatives

* The mergeable option from the virtio-pmem support has been deprecated
(#3968)
* The dax option from the virtio-fs support has been deprecated (#3889)

Fixes: #4317

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-26 08:51:18 +00:00
Peng Tao
2c238c8504
Merge pull request #4213 from zvonkok/vfio
runtime: Adding the correct detection of mediated PCIe devices
2022-05-20 15:00:23 +08:00
Fabiano Fidêncio
811ac6a8ce
Merge pull request #4282 from r4f4/runtime-dedup-types-import
runtime: remove duplicate 'types' import
2022-05-19 22:15:36 +02:00
Chelsea Mafrica
d8be0f8e9f
Merge pull request #4281 from r4f4/runtime-qemu-comments
runtime: sync docstrings with function names
2022-05-19 09:17:38 -07:00
Rafael Fonseca
7a5ccd1264 runtime: sync docstrings with function names
The functions were renamed but their docstrings were not.

Fixes #4006

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-19 14:31:47 +02:00
Greg Kurz
fa61bd43ee
Merge pull request #4238 from snir911/wip/legacy_console
qemu: allow using legacy serial device for the console
2022-05-19 14:30:59 +02:00
Rafael Fonseca
ce2e521a0f runtime: remove duplicate 'types' import
Fallout of 09f7962ff

Fixes #4285

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-19 13:49:47 +02:00
Snir Sheriber
f4994e486b runtime: allow annotation configuration to use_legacy_serial
and update the docs and test

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-18 18:58:21 +03:00
Rafael Fonseca
8052fe62fa runtime: do not check for EOF error in console watcher
The documentation of the bufio package explicitly says

"Err returns the first non-EOF error that was encountered by the
Scanner."

When io.EOF happens, `Err()` will return `nil` and `Scan()` will return
`false`.

Fixes #4079

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-17 15:14:33 +02:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Snir Sheriber
44814dce19 qemu: treat console kernel params within appendConsole
as it is tightly coupled with the appended console device
additionally have it tested

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:05:31 +03:00
Zvonko Kaiser
2a1d394147 runtime: Adding the correct detection of mediated PCIe devices
Fixes #4212

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-05-09 00:57:06 -07:00
Fabiano Fidêncio
33a8b70558 clh: Rely on Cloud Hypervisor for generating the device ID
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.

This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed.  Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.

This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.

The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts.  With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.

This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that.  Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.

Fixes: #4176

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 09:04:03 +02:00
Jianyong Wu
982c32358a
Merge pull request #4031 from Jaylyn-Ren/kata-spdk
Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
2022-04-29 12:16:38 +08:00
Fabiano Fidêncio
a88adabaae clh: Cloud Hypervisor has a built-in Rate Limiter
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).

Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecraker to do I/O throttling.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:56 +02:00
Fabiano Fidêncio
63c4da03a9 clh: Implement the Disk RateLimiter logic
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the network device configuration.

The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:53 +02:00
Fabiano Fidêncio
5b18575dfe hypervisor: Add disk bandwidth and operations rate limiters
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.

The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
  DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the DiskRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:11 +02:00
Fabiano Fidêncio
1cf9469297 clh: Implement the Network RateLimiter logic
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.

The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:26:38 +02:00
Fabiano Fidêncio
00a5b1bda9 utils: Define DefaultRateLimiterRefillTimeMilliSecs
Firecracker's driver doesn't expose the RefillTime option of the rate
limiter to the user.  Instead, it uses a contant value of 1000
miliseconds (1 second).

As we're following Firecracker's driver implementation, let's expose
create a new constant, use it as part of the Firecracker's driver, and
later on re-use it as part of the Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
be1bb7e39f utils: Move FC's function to revert bytes to utils
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function will
be reused by Cloud Hypervisor, when adding the rate limiter logic there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
2d35e6066d hypervisor: Add network bandwidth and operations rate limiters
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.

The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.

The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.

The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
  NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the NetRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
David Gibson
1b931f4203 runtime: Allock mockfs storage to be placed in any directory
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs.  This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens).  If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.

So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting().  In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.

fixes #4140

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:47:59 +10:00
David Gibson
ef6d54a781 runtime: Let MockFSInit create a mock fs driver at any path
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs.  This change allows it to be initialized at any path
given as a parameter.  This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).

For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
5d8438e939 runtime: Move mockfs control global into mockfs.go
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing.  A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.

This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go.  This is slightly cleaner to begin with, and will
allow some further enhancements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
963d03ea8a runtime: Export StoragePathSuffix
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant.  We
duplicate this information in fc.go which also needs it.

Export it from fs.go instead, so it can be used in fc.go.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
1719a8b491 runtime: Don't abuse MockStorageRootPath() for factory tests
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory.  This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.

Instead use t.TempDir() which is for exactly this purpose.  As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
bec59f9e39 runtime: Make bind mount tests better clean up after themselves
There are several tests in mount_test.go which perform a sample bind
mount.  These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint.  Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.

In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer.  Change them all to use the t.Cleanup mechanism.

For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test.  That
stops the test being messed up by a previous run, but messily.  Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:20:35 +10:00
David Gibson
2eeb5dc223 runtime: Don't use fixed /tmp/mountPoint path
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount.  This is not cleaned up after the test.  Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.

Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
Archana Shinde
33e244f284
Merge pull request #4102 from likebreath/0414/clh_v23.0
Upgrade to Cloud Hypervisor v23.0
2022-04-19 06:01:04 -07:00
Bin Liu
b19bfac7cd
Merge pull request #4042 from yibozhuang/direct-assign-fsgroup
fsGroup support for direct-assigned volume
2022-04-16 10:23:15 +08:00
Bo Chen
29e569aa92 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v23.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-04-14 12:56:01 -07:00
Chelsea Mafrica
32f92e75cc
Merge pull request #4021 from fengwang666/direct-volume-bug
runtime: Base64 encode the direct volume mountInfo path
2022-04-13 13:15:38 -07:00
Zhuoyu Tie
6e79042aa0 runtime: no need to write virtiofsd error to log
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log

Fixes: #4063

Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
2022-04-12 15:59:57 +08:00
Yibo Zhuang
532d53977e runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.

Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
Yibo Zhuang
6a47b82c81 proto: fsGroup support for direct-assigned volume
This change adds two fields to the Storage pb

FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.

FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.

These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
bin
9d5b03a1b7 runtime: delete debug option in virtiofsd
virtiofsd's debug will be enabled if hypervisor's debug has been
enabled, this will generate too many noisy logs from virtiofsd.

Unbind the relationship of log level between virtiofsd and
hypervisor, if users want to see debug log of virtiofsd,
can set it by:

  virtio_fs_extra_args = ["-o", "log_level=debug"]

Fixes: #3303

Signed-off-by: bin <bin@hyper.sh>
2022-04-07 19:55:22 +08:00
Greg Kurz
d0d3787233
Merge pull request #3696 from shippomx/main
kata-runtime enable hugepage support
2022-04-06 16:47:04 +02:00
Jaylyn Ren
b975f2e8d2 Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
The vhost-user-blk can be hotplugged on the PCI bridge successfully on
X86, but failed on Arm. However, hotplugging it on Root Port as a PCIe
device can work well on ARM.
Open the "pcie_root_port" in configuration.toml is needed.

Fixes: #4019

Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>
2022-04-06 17:37:51 +08:00
Fabiano Fidêncio
b39caf43f1
Merge pull request #3923 from Jakob-Naucke/no-initrd-se
runtime: Allow and require no initrd for SE
2022-04-05 09:26:07 +02:00
Feng Wang
354cd3b9b6 runtime: Base64 encode the direct volume mountInfo path
This is to avoid accidentally deleting multiple volumes.

Fixes #4020

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-04-04 19:56:46 -07:00
Eng Zer Jun
59c7165ee1
test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

This commit also updates the unit test advice to use `T.TempDir` to
create temporary directory in tests.

Fixes: #3924

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-31 09:31:36 +08:00
bin
5e1c30d484 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:59:12 +08:00
yaoyinnan
66f05c5bcb runtime: Remove the explicit VirtioMem set and fix the comment
Modify the 2Mib in the comment to 4Mib.
VirtioMem is set by configuration file or annotation. And setupVirtioMem is called only when VirtioMem is true.

Fixes: #3750

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-03-28 21:21:38 +08:00
Jakob Naucke
ff17c756d2
runtime: Allow and require no initrd for SE
Previously, it was not permitted to have neither an initrd nor an image.
However, this is the exact config to use for Secure Execution, where the
initrd is part of the image to be specified as `-kernel`. Require the
configuration of no initrd for Secure Execution.

Also
- remove redundant code for image/initrd checking -- no need to check in
  `newQemuHypervisorConfig` (calling) when it is also checked in
  `getInitrdAndImage` (called)
- use `QemuCCWVirtio` constant when possible

Fixes: #3922
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-03-25 18:36:12 +01:00
David Gibson
c77e34de33 runtime: Move mock hook source
src/runtime/virtcontainers/hook/mock contains a simple example hook in Go.
The only thing this is used for is for some tests in
src/runtime/pkg/katautils/hook_test.go.  It doesn't really have anything
to do with the rest of the virtcontainers package.

So, move it next to the test code that uses it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:37:35 +11:00
David Gibson
86723b51ae virtcontainers: Remove unused install/uninstall targets
We've now removed the need to install the mock hook binary for unit tests.
However, it turns out that managing that was the *only* thing that the
install and uninstall targets in the virtcontainers Makefile handled.

So, remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:37:18 +11:00
David Gibson
e65db838ff virtcontainers: Remove VC_BIN_DIR
The VC_BIN_DIR variable in the virtcontainers Makefile is almost unused.
It's used to generate TEST_BIN_DIR, and it's created in the install target.
However, we also create TEST_BIN_DIR, which is a subdirectory of VC_BIN_DIR
with mkdir -p, so it will necessarily create VC_BIN_DIR along the way.

So we can drop the unnecessary mkdir and expand the definition of
VC_BIN_DIR in the definition of TEST_BIN_DIR.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:53:59 +11:00
David Gibson
c20ad2836c virtcontainers: Remove unused Makefile defines
The INSTALL_EXEC and UNINSTALL_EXEC definitions from the virtcontainers
Makefile (unlike those from the runtime Makefile in the parent directory)
are entirely unused.  Remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:40:57 +11:00
David Gibson
c776bdf4a8 virtcontainers: Remove unused parameter from go-test.sh
The check-go-test target passes the path to the mock hook test binary to
go-test.sh when it invokes it.  But go-test.sh just calls run_go_test from
ci/lib.sh, which invokes a script from the tests repo *without* any
parameters.

That is, this parameter is ignored anyway, so remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:39:22 +11:00
James O. D. Hunt
f8fb0d3bb6
Merge pull request #3322 from Kvasscn/kata_dev_block_driver_option
device: using const strings for block-driver option instead of hard coding
2022-03-21 10:56:25 +00:00
Miao Xia
a2f5c1768e runtime/virtcontainers: Pass the hugepages resources to agent
The hugepages resources claimed by containers should be limited
by cgroup in the guest OS.

Fixes: #3695

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
2022-03-15 18:46:08 +08:00
Feng Wang
aa5ae6b17c runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-14 11:03:05 -07:00
zhanghj
efa19c41eb device: use const strings for block-driver option instead of hard coding
Currently, the block driver option is specifed by hard coding, maybe it
is better to use const string variables instead of hard coded strings.
Another modification is to remove duplicate consts for virtio driver in
manager.go.

Fixes: #3321

Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
2022-03-14 09:20:43 +08:00
Eric Ernst
1e301482e7
Merge pull request #3406 from fengwang666/direct-blk-assignment
Implement direct-assigned volume
2022-03-04 11:58:37 -08:00
Feng Wang
e76519af83 runtime: small refactor to improve readability
Remove some confusing/duplicate code so it's more readable

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-04 10:00:52 -08:00
Fabiano Fidêncio
7e5f11a52b vendor: Update containerd to 1.6.1
Let's bring in the latest release of Containerd, 1.6.1, released on
March 2nd, 2022.

With this, we take the opportunity to remove containerd/api reference as
we shouldn't need a separate module only for the API.

Here's the list of changes needed in the code due to the bump:
* stop using `grpc.WithInsecure()` as it's been deprecated
  - use `grpc.WithTransportCredentials(insecure.NewCredentials())`
    instead

Fixes: #3820

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-04 10:28:40 +01:00
Fabiano Fidêncio
2af91b23e1
Merge pull request #3281 from jongwu/vcpu_hotplug_arm64
experimentally enable vcpu hotplug and virtio-mem on arm64 in kernel part
2022-03-04 09:14:31 +01:00
Jianyong Wu
42771fa726 runtime: don't set socket and thread for arm/virt
As this is just a initial vcpu hotplug support, thread and socket has
not been supported. So, don't set socket and thread when hotadd cpu for
arm/virt.

Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-03-04 11:22:18 +08:00
Feng Wang
f905161bbb runtime: mount direct-assigned block device fs only once
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
ea51ef1c40 runtime: forward the stat and resize requests from shimv2 to kata agent
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
c39281ad65 runtime: update container creation to work with direct assigned volumes
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
4e00c2377c agent: add grpc interface for stat and resize operations
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Fabiano Fidêncio
af80473496 clh: stop virtofsd if clh fails to boot up the vm
If, for some reason, we're able to launch cloud hypervisor but not able
to boot the VM up, the virtiofsd process would be left behind.

Let's ensure, via defer, that we stop virtiofsd in case of errors.

Fixes: #3819

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 19:10:37 +01:00
Fabiano Fidêncio
97951a2d12 clh: Don't use SharedFS with Confidential Guests
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.

1. virtio-fs, as of now, cannot be part of the trust boundary, so the
   Confidential Guest will not be using it.

2. virtio-block hotplug should be enabled in order to use virtio-block
   for the rootfs (used with the devmapper plugin).

When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```

After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.

@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".

Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.

Fixes: #3810

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:40 +01:00
Fabiano Fidêncio
c30b3a9ff1 clh: Adding a volume is not supported without SharedFS
As mounting volumes into the guest requires SharedFS setup, let's ensure
we error out if trying to do so in a situation where SharedFS is not
supported.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:30 +01:00
Fabiano Fidêncio
f889f1f957 clh: introduce supportsSharedFS()
supportsSharedFS() is a new method to be used to ensure that no SharedFS
specifics are called when, for a reason or another, Cloud Hypervisor is
in a mode where SharedFSs are not supported.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:28 +01:00
Fabiano Fidêncio
54d27ed721 clh: introduce loadVirtiofsDaemon()
Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methos,
let's introduce and use loadVirtiofsDaemon, at it'll also be handy later
in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:38 +01:00
Fabiano Fidêncio
ae2221ea68 clh: introduce stopVirtiofsDaemon()
Similary to the `createVirtiofsDaemon` method, let's introduce and use
its counterpart, as it'll also be handy later in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:26 +01:00
Fabiano Fidêncio
e8bc26f90d clh: introduce setupVirtiofsDaemon()
Similarly to what's been done with the `createVirtiofsDaemon`, let's
create a `setupVirtiofsDaemon` one.

It will also become handy later in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:14 +01:00
Fabiano Fidêncio
413b3b477a clh: introduce createVirtiofsDaemon()
Let's introduce and use a new `createVirtiofsDaemon` method.  Its name
says it all, and it'll be handy later in this series when, spoiler
alert, SharedFS cannot be used (in such cases as in Confidential
Guests).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:02 +01:00
Fabiano Fidêncio
76e4f6a2a3 Revert "hypervisors: Confidential Guests do not support Device hotplug"
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 09:59:55 +01:00
Bin Liu
75877f8793
Merge pull request #3187 from Kvasscn/kata_dev_remove_temp_vsock_dir
virtcontainers: remove temp dir created for vsock in test code
2022-03-02 11:05:47 +08:00
Francesco Giudici
7f638dd049
Merge pull request #3764 from Jakob-Naucke/hugepages-test-s390x
virtcontainers: Use available s390x hugepages
2022-03-01 14:33:59 +01:00
Fabiano Fidêncio
97c17085b0
Merge pull request #3770 from Jakob-Naucke/gofmt-vmm-s390x
runtime: Gofmt fixes
2022-03-01 11:34:15 +01:00
GabyCT
bc1733bb0e
Merge pull request #3774 from egernst/delinux-runtime
cleanup runtime pkgs for Darwin build, add basic Darwin build/unit test
2022-02-28 15:08:09 -06:00
Jakob Naucke
eda8ea154a
runtime: Gofmt fixes
- Mostly blank lines after `+build` -- see
  https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
  `gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go

Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-28 17:24:47 +01:00
Eric Ernst
e355a71860 container: file is not linux specific
This should not be linux specific -- drop restriction.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
b31876eefb device-manager: move linux-only test to a linux-only file
We can't Mkdev on Darwin - let's make sure the vfio test is in a
linux-only file.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
5be188cc29 utils: Add darwin stub
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...

Fixes:# 3777

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
ad0449195d virtcontainers: Convert stats dev_t to uint64
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
bin
81ed269ed2 runtime: use Cmd.StdoutPipe instead of self-created pipe
Nydusd uses a bufio.Scanner to check if nydusd process has
existed, but stderr/stdout passed to Cmd is self-created pipe,
this pipe will not be closed if the process start failing.

Use standard Cmd.StdoutPipe can close the stdout and kata shim
will detect the existence of the nydusd process, then call cmd.Wait to
reap the process' resources.

Fixes: #3783

Signed-off-by: bin <bin@hyper.sh>
2022-02-28 16:52:49 +08:00
Eric Ernst
3997c962c2
Merge pull request #3767 from tanweernoor/02242022-kata-containers-issue-3631
runtime, config: make selinux configurable
2022-02-26 08:44:29 -08:00
Fabiano Fidêncio
a9ba7c132b clh: Fix typo on HotplugRemoveDevice
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 22:35:32 +01:00
Tanweer Noor
082d538cb4 runtime: make selinux configurable
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml

Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
2022-02-25 10:33:46 -08:00
Fabiano Fidêncio
ea1876f057
Merge pull request #3771 from fidencio/wip/clh-tdx
clh: Add TDX support
2022-02-25 18:45:31 +01:00
Samuel Ortiz
1103f5a4d4 virtcontainers: Use FilesystemSharer for sharing the containers files
Switching to the generic FilesystemSharer brings 2 majors improvements:

1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
   container files and root filesystems with the Kata Linux guest.

Fixes #3622

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
533c1c0e86 virtcontainers: Keep all filesystem sharing prep code to sandbox.go
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
61590bbddc virtcontainers: Add a Linux implementation for the FilesystemSharer
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
03fc1cbd7e virtcontainers: Add a filesystem sharing interface
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.

In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.

This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesysstem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Fabiano Fidêncio
72434333aa clh: Add TDX support
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.

Fixes: #3632

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
a8827e0c78 hypervisors: Confidential Guests do not support NVDIMM
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
f50ff9f798 hypervisors: Confidential Guests do not support Memory hotplug
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
df8ffecde0 hypervisors: Confidential Guests do not support Device hotplug
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
28c4c044e6 hypervisors: Confidential Guests do not support VCPUs hotplug
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".

The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.

One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
29ee870d20 clh: Add confidential_guest to the config file
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.

Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
9621c59691 clh: refactor image / initrd configuration set
This is a small code refactor removing a deadcode based the checks
already done in the generic hypervisor abstraction.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
dcdc412e25 clh: use common kernel params from the hypervisor code
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block

As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
4c164afbac versions: Update Cloud Hypervisor to 5343e09e7b8db
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:

* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
    - This is needed for the TDX support on the cloud hypervisor driver,
      which is part of this very same series.

* openapi: Update the PciBdf types
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
    - This is needed due to a change in a DeviceNode field, which would
      cause a marshalling / demarshalling error when running with a
      version of cloud-hypervisor that includes the TDX fixes mentioned
      above.

* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
    - This is needed due to changes in the scripts used to build Cloud
      Hypervisor, which are used as part of Kata Containers CIs and
      github actions.

      Due to this change, we're also adapting the build scripts as part
      of this very same commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:16 +01:00
Jakob Naucke
bbfe7d6591
Merge pull request #3599 from Jakob-Naucke/no-virtio-rng-ccw
virtcontainers: Do not add a virtio-rng-ccw device
2022-02-25 15:27:02 +01:00
Jakob Naucke
b2a65f9031
virtcontainers: Use available s390x hugepages
in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so
test with any that are present (default is 1M).

Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-25 13:11:00 +01:00
Eric Ernst
c6cc038364
Merge pull request #3615 from sameo/topic/hypervisor
Make the hypervisor framework not Linux specific
2022-02-23 16:02:00 -08:00
Samuel Ortiz
9fd4e5514f runtime: Move the resourcecontrol package one layer up
And try to reduce the number of virtcontainers packages, step by step.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
823faee83a virtcontainers: Rename the cgroups package
To resourcecontrol, and make it consistent with the fact that cgroups
are a Linux implementation of the ResourceController interface.

Fixes: #3601

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
0d1a7da682 virtcontainers: Rename and clean the cgroup interface
We call it a ResourceController, and we make it not so Linux specific.
Now the Linux implementations is the cgroups one.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
ad10e201e1 virtcontainers: cgroups: Move non Linux routine to utils.go
Have an OS agnostic file for sharing routines.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
d49d0b6f39 virtcontainers: cgroups: Define a cgroup interface
And move the current, Linux-specific implementation into
cgroups_linux.go

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Fabiano Fidêncio
4729fd0fc2
Merge pull request #3736 from liubin/fix/3733-log-events-for-crio
shim: log events for CRI-O
2022-02-22 09:19:37 +01:00
bin
f6fc1621f7 shim: log events for CRI-O
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.

For CRI-O runtime, we can log the events to log file.

Fixes: #3733

Signed-off-by: bin <bin@hyper.sh>
2022-02-22 11:02:50 +08:00
Peng Tao
031da99914
Merge pull request #3687 from luodw/nydus-clh
nydus: add lazyload support for kata with clh
2022-02-21 19:31:45 +08:00
luodaowen.backend
3175aad5ba virtiofs-nydus: add lazyload support for kata with clh
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.

Fixes #3654

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-19 21:55:31 +08:00
zhanghj
94b831ebf8 virtcontainers: remove temp dir created for vsock in test code
remove temp dir generated by mock.GenerateKataMockHybridVSock().

Fixes: #3186

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2022-02-19 16:59:15 +08:00
Archana Shinde
7db9bef72c
Merge pull request #3718 from Kvasscn/kata_dev_fix_utils_assert_msg
virtcontainers: Remove duplicated assert messages in utils test code
2022-02-18 06:07:16 -08:00
Samuel Ortiz
27de212fe1 runtime: Always add network endpoints from the pod netns
As the container runtime, we're never inspecting, adding or configuring
host networking endpoints.
Make sure we're always do that by wrapping addSingleEndpoint calls into
the pod network namespace.

Fixes #3661

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-18 10:37:07 +01:00
zhanghj
1cee0a9452 virtcontainers: Remove duplicated assert messages in utils test code
Remove duplicated strings in assert.Errorf() and assert.NoErrorf().

Fixes: #3714

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2022-02-18 16:45:05 +08:00
Samuel Ortiz
77c29bfd3b container: Remove VFIO lazy attach handling
With the recently added VFIO fixes and support, we should not need that
anymore.

Fixes #3108

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-17 08:39:44 +01:00
Fabiano Fidêncio
6f9685fbf5
Merge pull request #3624 from mdlayher/mdl-vsock
runtime: use github.com/mdlayher/vsock@v1.1.0
2022-02-16 23:11:47 +01:00
Samuel Ortiz
26b3f0017c virtcontainers: Split hypervisor into Linux and OS agnostic bits
Keep all the OS agnostic bits in the hypervisor.go and
hypervisor_ARCH.go files.

Fixes #3614

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:15:31 +01:00
Samuel Ortiz
fa0e9dc6b1 virtcontainers: Make all Linux VMMs only build on Linux
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
c91035d0e1 virtcontainers: Move non QEMU specific constants to hypervisor.go
Hotplugging errors and 9pfs size are not particularily QEMU specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
10ae05914c virtcontainers: Move guest protection definitions to hypervisor.go
They're not QEMU specific, other VMMs may implement support for it.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:31 +01:00
Samuel Ortiz
b28d0274ff virtcontainers: Make max vCPU config less QEMU specific
Even though it's still actually defined as the QEMU upper bound,
it's now abstracted away through govmm.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:06:32 +01:00
bin
81a8baa5e5 runtime: add hugepages support
Add hugepages support, port from:
b486387cba

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:53 +08:00
bin
7df677c01e runtime: Update calculateSandboxMemory to include Hugepages Limit
Support hugepages and port from:
96dbb2e8f0

Fixes: #3342

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:37 +08:00
Fabiano Fidêncio
90fd625d0c versions: Udpate Cloud Hypervisor to 55479a64d237
Let's update cloud-hypervisor to a version that exposes the TDx support
via the OpenAPI's auto-generated code.

Fixes: #3663

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-14 17:32:30 +01:00
Matt Layher
c1ce67d905
runtime: use github.com/mdlayher/vsock@v1.1.0
Fixes #3625
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2022-02-12 19:57:15 -05:00
luodaowen.backend
2d9f89aec7 feature(nydusd): add nydusd support to introduse lazyload ability
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.

Fixes #2724

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-11 21:41:17 +08:00
Julio Montes
982f14fa66 runtime: support QEMU SGX
Enable SGX in QEMU when `sgx.intel.com/epc` annotation is defined

fixes #3436

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-02-10 09:45:48 -06:00
Samuel Ortiz
07b9d93f5f virtcontainer: Simplify the sandbox network creation flow
We don't need to call NewNetwork() twice, and we can have the VM factory
case return immediatly. That makes the code more readable.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
2c7087ff42 virtcontainers: Make all endpoints Linux only
All of the networking endpoints are Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
49d2cde1e2 virtcontainers: Split network tests into generic and OS specific parts
Some unit tests are generic while others, mostly because they depend on
netlink, are Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
0269077ebf virtcontainers: Remove the netlink package dependency from network.go
Move the netlink dependent code into network_linux.go.
Other OSes will have to provide the same functions.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
7fca5792f7 virtcontainers: Unify Network endpoints management interface
And only have AddEndpoints/RemoveEndpoints for all cases (single
endpoint vs all of them, hotplug or not).

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
c67109a251 virtcontainers: Remove the Network PostAdd method
It's used once by the sandbox code and can be implemented directly
there.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
e0b264430d virtcontainers: Define a Network interface
And move the Linux implementation into a GOOS specific file.

Fixes #3005

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
5e119e90e8 virtcontainers: Rename the Network structure fields and methods
We are converting the Network structure into an interface, so that
different host OSes can have different networking implementations for
Kata.
One step into that direction is to rename all the Network structure
fields and methods to something that is less Linux networking namespace
specific. This will make the Network interface naming consistent.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
b858d0dedf virtcontainers: Make all Network fields private
Prepare for making it a real interface.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
49eee79f5f virtcontainers: Remove the NetworkNamespace structure
It is now replaced with a single Network structure

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
844eb61992 virtcontainers: Have CreateVM use a Network reference
We are replacing the NetworkingNamespace structure with the Network
one, so we should have the hypervisor interface switching to it as well.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
d7b67a7d1a virtcontainers: Network API cleanups and simplifications
Remove unused parameters.
Reduce the number of parameters by deriving some of them (e.g. a
networking config) from their outer structure (e.g. a Sandbox
reference).

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
2edea88369 virtcontainers: Make the Network structure manage endpoints
Endpoints creations, attachement and hotplug are bound to the networking
namespace described through the Network structure.
Making them Network methods is natural and simplifies the code.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Samuel Ortiz
8f48e28325 virtcontainers: Expand the Network structure
For simplicity sake, there should only be one networking structure per
sandbox, as opposed to two (Network and NetworkingNamespace) currently.

This commit start expanding the Network structure in order to eventually
make it the single representation of a virtcontainers sandbox
networking.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-08 22:27:53 +01:00
Pierre Kohler
5ef522f7c3 runtime: check kvm module sev correctly
Runtime now accepts both `1` and `Y` as valid values for
kvm_amd module parameter kvm_amd.sev.

Fixes #3273

Signed-off-by: Pierre Kohler <pierre.kohler@cysec.systems>
2022-02-07 23:48:47 +01:00
Eric Ernst
e8eb5e8295
Merge pull request #3609 from egernst/rootless-linux
virtcontainers: Split the rootless package into OS specific parts
2022-02-03 12:19:31 -08:00
Jakob Naucke
7ffe9e5198
virtcontainers: Do not add a virtio-rng-ccw device
On s390x, skip adding a virtio-rng device. The on-chip CPACF provides
entropy instead. For Confidential Containers, when using Secure
Execution, entropy attacks on virtio-rng are mitigated.

Fixes: #3598
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-02 17:06:20 +01:00
Julio Montes
1f29478b09 runtime: suppport split firmware
firmware can be split into FIRMWARE_VARS.fd (UEFI variables as
configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI
variables can be customized per each user while UEFI code is kept same.

fixes #3583

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-02-01 13:40:19 -06:00
Samuel Ortiz
14e7f52a91 virtcontainers: Split the rootless package into OS specific parts
Move the netns specific bits into a Linux specific file.

Fixes: #3607

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-01-28 16:20:28 -08:00
James O. D. Hunt
7c956e0d27 virtcontainers: Enable initrd for Cloud Hypervisor
Since CH has supported booting with an initramfs since version 0.7.0
[1], allow an `initrd=` to be specified.

Fixes: #3566.

[1] - https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v0.7.0

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-01-28 10:49:10 +00:00
Eric Ernst
a5ebeb96c1
Merge pull request #2941 from egernst/sandbox-sizing-feature
Sandbox sizing feature
2022-01-27 09:37:57 -08:00
Eric Ernst
8cde54131a runtime: introduce static sandbox resource management
There are software and hardware architectures which do not support
dynamically adjusting the CPU and memory resources associated with a
sandbox. For these, today, they rely on "default CPU" and "default
memory" configuration options for the runtime, either set by annotation
or by the configuration toml on disk.

In the case of a single container (launched by ctr, or something like
"docker run"), we could allow for sizing the VM correctly, since all of
the information is already available to us at creation time.

In the sandbox / pod container case, it is possible for the upper layer
container runtime (ie, containerd or crio) could send a specific
annotation indicating the total workload resource requirements
associated with the sandbox creation request.

In the case of sizing information not being provided, we will follow
same behavior as today: start the VM with (just) the default CPU/memory.

If this information is provided, we'll track this as Workload specific
resources, and track default sizing information as Base resources. We
will update the hypervisor configuration to utilize Base+Workload
resources, thus starting the VM with the appropriate amount of CPU and
memory.

In this scenario (we start the VM with the "right" amount of
CPU/Memory), we do not want to update the VM resources when containers
are added, or adjusted in size.

This functionality is introduced behind a configuration flag,
`static_sandbox_resource_mgmt`. This is defaulted to false for all
configurations except Firecracker, which is set to true.

This'll greatly improve UX for folks who are utilizing
Kata with a VMM or hardware architecture that doesn't support hotplug.

Note, users will still be unable to do in place vertical pod autoscaling
or other dynamic container/pod sizing with this enabled.

Fixes: #3264

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-01-26 09:04:38 -08:00
Braden Rayhorn
fc0e095180
runtime: fix handling container spec's memory limit
The OCI container spec specifies a limit of -1 signifies
unlimited memory. Update the sandbox memory calculator
to reflect this part of the spec.

Fixes: #3512

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-01-24 13:30:32 -06:00
Bo Chen
2d799cbfa3 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v21.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-01-20 17:48:10 -08:00
Fabiano Fidêncio
ec6655af87 govmm: Use govmm from our own pkg
Let's stop using govmm from kata-containers/govmm and let's start using
it from our own repo.

Fixes: #3495

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-01-19 18:02:46 +01:00
Julio Montes
49223e67af runtime: remove enable_swap option
`enable_swap` option was added long time ago to add
`-realtime mlock=off` to the QEMU's command line.
Kata now supports QEMU 6, `-realtime` option has been deprecated and
`mlock=on` is causing unexpected behaviors in kata.
This patch removes support for `enable_swap`, `-realtime` and `mlock=`
since they are causing bugs in kata.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-01-18 11:12:29 -06:00
liangxianlong
878ab93c15 runtime: Provide protection for shared data
The k.reqHandlers should be protected by locks when used

Fixes #3440

Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn>
2022-01-13 14:48:10 +08:00
James O. D. Hunt
ef835b5948
Merge pull request #3418 from yangfeiyu20102011/main
runtime: it should rollback when failed in Sandbox AddInterface
2022-01-12 10:22:36 +00:00
bin
85f5ae190e runtime: close span before return from function in case of error
Return before closing span will cause invalid spans, so span should
be closed before function return.

Fixes: #3424

Signed-off-by: bin <bin@hyper.sh>
2022-01-11 19:45:41 +08:00
yangfeiyu
b133a2368a runtime: it should rollback when failed in Sandbox AddInterface
When Sandbox AddInterface() is called, it may fail after endpoint.HotAttach,
we'd better rollback and call save() in the end.

Fixes: #3419

Signed-off-by: yangfeiyu <yangfeiyu20102011@163.com>
2022-01-11 18:43:43 +08:00
Feng Wang
c486c2ca18 agent: fix the broken protobuf generation code
After the protocols are moved to upper libs (PR3355),
the runtime protocol generation is broken. This fixes it.

Fixes: #3414

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-01-10 15:37:00 -08:00
Shengjing Zhu
2d0f9d2d06 vc: remove swagger binary
Fixes: #3362

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2021-12-25 22:41:29 +08:00
Jakob Naucke
137e217b85
docs: Fix outdated k8s link
in virtcontainers readme

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-12-22 19:40:25 +01:00
James O. D. Hunt
2ebae2d279
Merge pull request #3287 from jodh-intel/docs-split-arch-doc
Split architecture doc into separate files
2021-12-20 10:11:30 +00:00
James O. D. Hunt
6f9efb4043 docs: Move arch doc to separate directory
Move the architecture document into a new `docs/design/architecture/` directory
in preparation for splitting it into more manageable pieces.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-12-16 12:26:17 +00:00
Eric Ernst
3865a1bcf6
Merge pull request #2918 from egernst/update-container-type-handling
update container type handling
2021-12-15 10:41:23 -08:00
Eric Ernst
7a989a8333 runtime: api-test: fixup
not clear why this was commented out before -- ensure that we set
approprate annotation on the sandbox container's annotations to indicate
this is a sandbox.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-12-14 18:55:18 -08:00
Eric Ernst
52f79aef91 utils: update container type handling
Today we assume that if the CRI/upper layer doesn't provide a container
type annotation, it should be treated as a sandbox. Up to this point, a
sandbox with a pause container in CRI context and a single container
(ala ctr run) are treated the same.

For VM sizing and container constraining, it'll be useful to know if
this is a sandbox or if this is a single container.

In updating this, we cleanup the type handling tests and we update the
containerd annotations vendoring.

Fixes: #2926

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-12-14 17:59:19 -08:00
bin
03546f75a6 runtime: change io/ioutil to io/os packages
Change io/ioutil to io/os packages because io/ioutil package
is deprecated from 1.16:

Discard => io.Discard
NopCloser => io.NopCloser
ReadAll => io.ReadAll
ReadDir => os.ReadDir
ReadFile => os.ReadFile
TempDir => os.MkdirTemp
TempFile => os.CreateTemp
WriteFile => os.WriteFile

Details: https://go.dev/doc/go1.16#ioutil

Fixes: #3265

Signed-off-by: bin <bin@hyper.sh>
2021-12-15 07:31:48 +08:00
Fabiano Fidêncio
602d87295b
Merge pull request #3226 from liubin/fix/3193-fill-hypervisorconfig
runtime/template: Handling new attributes for hypervisor config
2021-12-09 13:29:23 +01:00
Chelsea Mafrica
7522109abc
Merge pull request #3218 from liubin/fix/3217-fix-span-name
runtime: correct span name for stopSandbox function
2021-12-07 16:36:14 -08:00
bin
b92babf91b runtime/template: Handling new attributes for hypervisor config
Some new attributes are added to hypervisor config:
- VMStorePath
- RunStorePath
- SharedPath

These attributes should be handled in two places:

- reset when check the new hypervisor's config is suitable
  to the base config.
- copy from new hypervisor's config when create new VM

Fixes: #3193

Signed-off-by: bin <bin@hyper.sh>
2021-12-07 19:31:03 +08:00
bin
40bd34caaf runtime: only call stopVirtiofsd when shared_fs is virtio-fs
If shared_fs is set to virtio-9p, the virtiofsd is not started,
so there is no need to stop it.

Fixes: #3219

Signed-off-by: bin <bin@hyper.sh>
2021-12-07 16:06:26 +08:00
bin
33f343ee08 runtime: correct span name for stopSandbox function
Normally the span name should be the same as function
name, so chagne `StopVM` to `stopSandbox`.

Fixes: #3217

Signed-off-by: bin <bin@hyper.sh>
2021-12-07 15:59:18 +08:00
Bo Chen
995300260e virtcontainers: clh: Upgrade to openapi-generator v5.3.0
The latest release of openapi-generator v5.3.0 contains the fix for
`dropping err` bug [1]. This patch also re-generated the client code of
Cloud Hypervisor to have the bug fixed.

[1] https://github.com/OpenAPITools/openapi-generator/pull/10275

Fixes: #3201

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-12-03 08:55:38 -08:00
Fabiano Fidêncio
3fdc97e110
Merge pull request #3183 from fengwang666/nonroot-vhost-bug-fix
runtime: enable vhost-net for rootless hypervisor
2021-12-03 10:42:50 +01:00
Feng Wang
b3bcb7b251 runtime: enable vhost-net for rootless hypervisor
vhost-net is disabled in the rootless kata runtime feature, which has been abandoned since kata 2.0.
I reused the rootless flag for nonroot hypervisor and would like to enable vhost-net.

Fixes #3182

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-12-02 21:55:31 -08:00
Bo Chen
4756a04b2d virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v19.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-12-02 12:09:12 -08:00
Peng Tao
01b6ffc0a4
Merge pull request #3028 from egernst/hypervisor-hacking
Hypervisor cleanup, refactoring
2021-11-26 10:21:49 +08:00
bin
ddc68131df runtime: delete netmon
Netmon is not used anymore.

Fixes: #3112

Signed-off-by: bin <bin@hyper.sh>
2021-11-24 15:08:18 +08:00
Eric Ernst
ce92cadc7d vc: hypervisor: remove setSandbox
The hypervisor interface implementation should not know a thing about
sandboxes.

Fixes: #2882

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
2227c46c25 vc: hypervisor: use our own logger
This'll end up moving to hypervisors pkg, but let's stop using virtLog,
instead introduce hvLogger.

Fixes: #2884

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
4c2883f7e2 vc: hypervisor: remove dependency on persist API
Today the hypervisor code in vc relies on persist pkg for two things:
1. To get the VM/run store path on the host filesystem,
2. For type definition of the Load/Save functions of the hypervisor
   interface.

For (1), we can simply remove the store interface from the hypervisor
config and replace it with just the path, since this is all we really
need. When we create a NewHypervisor structure, outside of the
hypervisor, we can populate this path.

For (2), rather than have the persist pkg define the structure, let's
let the hypervisor code (soon to be pkg) define the structure. persist
API already needs to call into hypervisor anyway; let's allow us to
define the structure.

We'll probably want to look at following similar pattern for other parts
of vc that we want to make independent of the persist API.

In doing this, we started an initial hypervisors pkg, to hold these
types (avoid a circular dependency between virtcontainers and persist
pkg). Next step will be to remove all other dependencies and move the
hypervisor specific code into this pkg, and out of virtcontaienrs.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
34f23de512 vc: hypervisor: Remove need to get shared address from sandbox
Add shared path as part of the hypervisor config

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Eric Ernst
c28e5a7807 acrn: remove dependency on sandbox, persistapi datatypes
Today, acrn relies on sandbox level information, as well as a store
provided by common parts of the hypervisor. As we cleanup the
abstractions within our runtime, we need to ensure that there aren't
cross dependencies between the sandbox, the persistence logic and the
hypervisor.

Ensure that ACRN still compiles, but remove the setSandbox usage as
well as persist driver setup.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-19 12:20:41 -08:00
Greg Kurz
f80ca66300
Merge pull request #2921 from Amulyam24/template_test
virtcontainers: fix failing template test on ppc64le
2021-11-18 17:32:18 +01:00
Amulyam24
d5a18173b9 virtcontainers: fix failing template test on ppc64le
If a file/directory doesn't exist, os.Stat() returns an
error. Assert the returned value with os.IsNotExist() to
prevent it from failing.

Fixes: #2920

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2021-11-18 15:37:40 +05:30
Eric Ernst
7e6f2b8d64 vc-utils: don't export unused function
Many of these functions are just used on one place throughout the rest
of the code base. If we create hypervisor package, newtork package, etc, we may want to
parse this out.

Fixes: #3049

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
Eric Ernst
860f30882a virtcontainers: move oci, uuid packages top level
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.

While at it, ensure we run gofmt on the changed files.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
Eric Ernst
8acb3a32b6 virtcontainers: remove unused package nsenter
Package is not utilized. Remove.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
Eric Ernst
4788cb8263 vc-network: remove unused functions
Unused functions -- let's clean up!

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
Eric Ernst
b6ebddd7ef oci: remove unused function GetContainerType
This is unused - we utilize ContainerType directly.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-17 14:12:57 -08:00
Eric Ernst
1e7cb4bc3a macvlan: drop bridged part of name
The fact that we need to "bridge" the endpoint is a bit irrelevant. To
be consistent with the rest of the endpoints, let's just call this
"macvlan"

Fixes: #3050

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-11-16 16:44:29 -08:00
Carlos Venegas
15b5d22e81
Merge pull request #2778 from jcvenegas/clh-race-condition-check
clh: Fix race condition that prevent start pods
2021-11-16 14:15:06 -06:00
Carlos Venegas
55412044df monitor: Fix monitor race condition doing hypervisor.check()
The thread monitor will check if the agent and the VMM are alive every
second in a blocking thread. The Cloud hypervisor API server is
single-threaded, if the monitor does a `check()`, while a slow request
is still in progress, the monitor check() method will timeout. The
monitor thread will stop all the shim-v2 execution.

This commit modifies the monitor thread to make it check the status of
the hypervisor after 5 seconds. Additionally, the `check()` method from
cloud-hypervisor will use the method `clh.isClhRunning(timeout)` with a
10 seconds timeout. The monitor function does no timeout, so even if
`hypervisor.check()` takes more 10 seconds, the isClhRunning method
handles errors doing a VmmPing and retry in case of errors until the
timeout is reached.

Reduce the time to the next check to 5 should not affect any functionality,
but it will reduce the overhead polling the hypervisor.

Fixes: #2777

Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
2021-11-16 18:28:29 +00:00
snir911
b046c1ef6b
Merge pull request #2959 from snir911/wip/cgroups-systemd-fix
cgroups: Fix systemd cgroup support
2021-11-15 10:44:45 +02:00
Eric Ernst
e89c06e68b
Merge pull request #3032 from liubin/fix/3031-merge-two-types-packages
runtime: merge virtcontainers/pkg/types into virtcontainers/types
2021-11-12 14:23:21 -08:00
bin
09f7962ff1 runtime: merge virtcontainers/pkg/types into virtcontainers/types
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.

Fixes: #3031

Signed-off-by: bin <bin@hyper.sh>
2021-11-12 15:06:39 +08:00
bin
6acedc2531 runtime: delete not used codes
Functions EnvVars and GetOCIConfig in runtime/virtcontainers/pkg/oci/utils.go
are not used anymore.

Fixes: #3029

Signed-off-by: bin <bin@hyper.sh>
2021-11-12 11:35:31 +08:00
Snir Sheriber
bcf181b7ee cgroups: Fix systemd cgroup support
As github.com/containerd/cgroups doesn't support scope
units which are essential in some cases lets create
the cgroups manually and load it trough the cgroups
api
This is currently done only when there's single sandbox
cgroup (sandbox_cgroup_only=true), otherwise we set it
as static cgroup path as it used to be (until a proper
soultion for overhead cgroup under systemd will be
suggested)

Fixes: #2868
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-11-11 08:51:45 +02:00
Bin Liu
04185bd068
Merge pull request #2997 from Jakob-Naucke/lint-protection
virtcontainers: Lint protection types
2021-11-11 08:34:48 +08:00
Fabiano Fidêncio
653976c0fd
Merge pull request #3000 from bergwolf/crioptions
runtime: Revert "runtime: use containerd package instead of cri-containerd"
2021-11-10 13:41:24 +01:00
Peng Tao
eacfcdec19 runtime: Revert "runtime: use containerd package instead of cri-containerd"
This reverts commit 76f16fd1a7 to bring
back cri-containerd crioptions parsing so that kata works with older
containerd versions like v1.3.9 and v1.4.6.

Fixes: #2999
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-11-10 16:06:42 +08:00
Jakob Naucke
b7b89905d4
virtcontainers: Lint protection types
Protection types like tdxProtection or seProtection were marked nolint,
remove this. As a side effect, ARM needs dummy tests for these.

Fixes: #2801
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-11-09 18:36:32 +01:00
James O. D. Hunt
87f676062c agent: Remove dynamic tracing APIs
Remove the `StartTracing` and `StopTracing` agent APIs that toggle
dynamic tracing. This is not supported in Kata 2.x, as documented in the
[tracing proposals document](https://github.com/kata-containers/kata-containers/pull/2062).

Fixes: #2985.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-11-09 08:39:06 +00:00
Chelsea Mafrica
84ccdd8ef2 vendor: update OpenTelemetry to v0.20.0
Update OpenTelemetry from v0.15.0 to v0.20.0.

    Git log

    02d8bdd5 Release v0.20.0 (1837)
    aa66fe75 OS and Process resource detectors (1788)
    7374d679 Fix Links documents (1835)
    856f5b84 Add feature request issue template (1831)
    0fdc3d78 Remove bundler from Jaeger exporter (1830)
    738ef11e Fix flaky global ErrorHandler delegation test (1829)
    e43d9c00  Update Default Value for Jaeger Exporter Endpoint  (1824)
    0032bd64 Fix default merging of resource attributes from environment variable (1785)
    96c5e4ba Add SpanProcessor example for Span annotation on start (1733)
    543c8144 Remove the WithSDKOptions from the Jaeger exporter (1825)
    66389ad6 Update function docs in sdk.go (1826)
    70bc9eb3 Adds support for timeout on the otlp/gRPC exporter (1821)
    081cc61d Update Jaeger exporter convenience functions (1822)
    1b9f16d3 Remove the WithDisabled option from Jaeger exporter (1806)
    6867faa0 Bump actions/cache from v2.1.4 to v2.1.5 (1818)
    a2bf04dc Build context pipeline in Jaeger upload process (1809)
    2de86f23 Remove locking from Jaeger exporter shutdown/export (1807)
    4f9fec29 Add ExportSpans benchmark to Jaeger exporter (1805)
    d9566abe Fix OTLP testing flake: signal connection from mock collector (1816)
    a2cecb6e add support for env var configuration to otlp/gRPC (1811)
    d616df61 Fix flaky OTLP exporter reconnect test (1814)
    b09df84a Changes stdout to expose the `*sdktrace.TracerProvider` (1800)
    04890608 Remove options field from Jaeger exporter (1808)
    6db20e00 Remove the abandoned Process struct in Jaeger exporter (1804)
    086abf34 docs: use test example to document prometheus.InstallNewPipeline (1796)
    d0cea04b Bump google.golang.org/api from 0.43.0 to 0.44.0 in /exporters/trace/jaeger (1792)
    99c477fe Fixed typo for default service name in Jaeger Exporter (1797)
    95fd8f50 Bump google.golang.org/grpc from 1.36.1 to 1.37.0 in /exporters/otlp (1791)
    9b251644 Zipkin Exporter: Use default resouce's serviceName as default serivce name (1777) (1786)
    4d141e47 Add k8s.node.name and k8s.node.uid to semconv (1789)
    5c99a34c Fix golint issue caused by incorrect comment (1795)
    c5d006c0 Update Jaeger environment variables (1752)
    58432808 add NewExportPipeline and InstallNewPipeline for otlp (1373)
    7d8e6bd7 Zipkin Exporter: Adjust span transformation to comply with the spec (1688)
    2817c091 Merge sdk/export/trace into sdk/trace (1778)
    c61e654c Refactor prometheus exporter tests to match file headers as well (1470)
    23422c56 Remove process config for Jaeger exporter (1776)
    0d49b592 Add test to check bsp ignores `OnEnd` and `ForceFlush` post Shutdown` (1772)
    e9aaa04b Record links/events attribute drops independently (1771)
    5bbfc22c Make ExportSpans for Jaeger Exporter honor deadline (1773)
    0786fe32 Add Bug report issue templates (1775)
    3c7facee Add `ExportTimeout` option to batch span processor (1755)
    c6b92d5b Make TraceFlags spec-compliant (1770)
    ee687ca5 Bump github.com/itchyny/gojq from 0.12.2 to 0.12.3 in /internal/tools (1774)
    52a24774 add support for configuring tls certs via env var to otlp/HTTP (1769)
    35cfbc7e Update precedence of event name in Jaeger exporter (1768)
    33699d24 Adds semantic conventions for exceptions (1492)
    928e3c38 Modify ForceFlush to abort after timeout/cancellation (1757)
    3947cab4 Fix testCollectorEndpoint typo and add tag assertions in jaeger_test (1753)
    ecc635dc add website docs (1747)
    07a8d195 Fix Jaeger span status reporting and unify tag keys (1761)
    4fa35c90 add partial support for env var config to otlp/HTTP (1758)
    bf180d0f improve OTLP/gRPC connection errors (1737)
    d575865b Fix span IsRecording when not sampling (1750)
    20c93b01 Update SamplingParameters (1749)
    97501a3f Update SpanSnapshot to use parent SpanContext (1748)
    604b05cb Store current Span instead of local and remote SpanContext in context.Context (1731)
    c61f4b6d Set @lizthegrey to emeritus status (1745)
    b1342fec Bump github.com/golangci/golangci-lint in /internal/tools (1743)
    54e1bd19 Bump google.golang.org/api from 0.41.0 to 0.43.0 in /exporters/trace/jaeger (1741)
    4d25b6a2 Bump github.com/prometheus/client_golang from 1.9.0 to 1.10.0 in /exporters/metric/prometheus (1740)
    0a47b66f Bump google.golang.org/grpc from 1.36.0 to 1.36.1 in /exporters/otlp (1739)
    26f006b8 Reinstate @paivagustavo as an Approver (1734)
    382c7ced Remove hasRemoteParent field from SDK span (1728)
    862a5a68 Remove setting error status while recording error with Span from oteltest package (1729)
    6defcfdf Remove links on NewRoot spans (1726)
    a9b2f851 upgrade thrift to v0.14.1 in jaeger exporter (1712)
    5a6a854d Bump google.golang.org/protobuf from 1.25.0 to 1.26.0 in /exporters/otlp (1724)
    23486213 Migrate to using go.opentelemetry.io/proto/otlp (1713)
    5d559b40 Remove makeSamplingDecision func (1711)
    e24702da Update the TraceContext.Extract docs (1720)
    9d4eb1f6 Update dates in CHANGELOG.md for 2021 releases (1723)
    2b4fa968 Release v0.19.0 (1710)
    4beb7041 sdk/trace: removing ApplyConfig and Config (1693)
    1d42be16 Rename WithDefaultSampler TracerProvider option to WithSampler and update docs (1702)
    860d5d86 Add flag to determine whether SpanContext is remote (1701)
    0fe65e6b Comply with OpenTelemetry attributes specification (1703)
    88884351 Bump google.golang.org/api from 0.40.0 to 0.41.0 in /exporters/trace/jaeger (1700)
    345f264a breaking(zipkin): removes servicName from zipkin exporter. (1697)
    62cbf0f2 Populate Jaeger's Span.Process from Resource (1673)
    28eaaa9a Add a test to prove the Tracer is safe for concurrent calls (1665)
    8b1be11a Rename resource pkg label vars and methods (1692)
    a1539d44 OpenCensus metric exporter bridge (1444)
    77aa218d Fix issue #1490, apply same logic as in the SDK (1687)
    9d3416cc Fix synchronization issues in global trace delegate implementation (1686)
    58f69f09 Span status from HTTP code: Do not set status message if it can be inferred (1681)
    9c305bde Flush metric events prior to shutdown in OTLP example (1678)
    66b1135a Fix CHANGELOG (1680)
    90bd4ab5 Update employer information for maintainers (1683)
    36841913 Remove WithRecord() option from trace.SpanOption when starting a span (1660)
    65c7de20 Remove trace prefix from NoOp src files. (1679)
    e88a091a Make SpanContext Immutable (1573)
    d75e2680 Avoid overriding configuration of tracer provider (1633)
    2b4d5ac3 Bump github.com/golangci/golangci-lint in /internal/tools (1671)
    150b868d Bump github.com/google/go-cmp from 0.5.4 to 0.5.5 (1667)
    76aa924e Fix the examples target info messaging (1676)
    a3aa9fda Bump github.com/itchyny/gojq from 0.12.1 to 0.12.2 in /internal/tools (1672)
    a5edd79e Removed setting error status while recording err as span event (1663)
    e9814758 chore(zipkin): improves zipkin example to not to depend on timeouts. (1566)
    3dc91f2d Add ForceFlush method to TracerProvider (1608)
    bd0bba43 exporter: swap pusher for exporter (1656)
    56904859 Update the SimpleSpanProcessor (1612)
    a7f7abac  SpanStatus description set only when status code is set to Error (1662)
    05252f40 Jaeger Exporter: Fix minor mapping discrepancies (1626)
    238e7c61 Add non-empty string check for attribute keys (1659)
    e9b9aca8 Add tests for propagation of Sampler Tracestate changes (1655)
    875a2583 Add docs on when reviews should be cleared (1556)
    7153ef2d Add HTTP/JSON to the otlp exporter (1586)
    62e2a0f7 Unexport the simple and batch SpanProcessors (1638)
    992837f1 Add TracerProvider tests to oteltest harness (1607)
    bb4c297e Pre release v0.18.0 (1635)
    712c3dcc Fix makefile ci target and coverage test packages (1634)
    841d2a58 Rename local var new to not collide with builtin (1610)
    13938ab5 Update SpanProcessor docs (1611)
    e25503a0 Add compatibility tests to CI (1567)
    1519d959 Use reasonable interval in sdktrace.WithBatchTimeout (1621)
    7d4496e0 Pass metric labels when transforming to gaugeArray (1570)
    6d4a5e0d Bump google.golang.org/grpc from 1.35.0 to 1.36.0 in /exporters/otlp (1619)
    a93393a0 Bump google.golang.org/grpc in /example/prom-collector (1620)
    e499ca86 Fix validation for tracestate with vendor and add tests (1581)
    43886e52 Make timestamps sequential in lastvalue agg check (1579)
    37688ef6 revent end-users from implementing some interfaces (1575)
    85e696d2 Updating documentation with an working example for creating NewExporter (1513)
    562eb28b Unify the Added sections of the unreleased changes (1580)
    c4cf1aff Fix Windows build of Jaeger tests (1577)
    4a163bea Fix stdout TestStdoutTimestamp failure with sleep (1572)
    bd4701eb Stagger timestamps in exact aggregator tests (1569)
    b94cd4b2 add code attributes to semconv package (1558)
    78c06cef Update docs from gitter to slack for communication (1554)
    1307c911 Remove vendor exclude from license-check (1552)
    5d2636e5 Bump github.com/golangci/golangci-lint in /internal/tools (1565)
    d7aff473 Vendor Thrift dependency (1551)
    298c5a14 Update span limits to conform with OpenTelemetry specification (1535)
    ecf65d79 Rename otel/label -> otel/attribute (1541)
    1b5b6621 Remove resampling on span.SetName (1545)
    8da52996 fix: grpc reconnection  (1521)
    3bce9c97 Add Keys() method to propagation.TextMapCarrier (1544)
    0b1a1c72 Make oteltest.SpanRecorder into a concrete type (1542)
    7d0e3e52 SDK span no modification after ended (1543)
    7de3b58c Remove extra labels types (1314)
    73194e44 Bump google.golang.org/api from 0.39.0 to 0.40.0 in /exporters/trace/jaeger (1536)
    8fae0a64 Create resource.Default() with required attributes/default values (1507)
    76f93422 Release v0.17.0 (1534)
    9b242bc4 Organize API into Go modules based on stability and dependencies (1528)
    e50a1c8c Bump actions/cache from v2 to v2.1.4 (1518)
    a6aa7f00 Bump google.golang.org/api from 0.38.0 to 0.39.0 in /exporters/trace/jaeger (1517)
    38efc875 Code Improvement - Error strings should not be capitalized (1488)
    6b340501 Update default branch name (1505)
    b39fd052 nit: Fix comment to be up-to-date (1510)
    186c2953 Fix golint error of package comment form (1487)
    9308d662 Bump google.golang.org/api from 0.37.0 to 0.38.0 in /exporters/trace/jaeger (1506)
    1952d7b6 Reverse order of attribute precedence when merging two Resources (1501)
    ad7b4715 Remove build flags for runtime/trace support (1498)
    4bf4b690 Remove inaccurate and unnecessary import comment (1481)
    7e19eb6a Bump google.golang.org/api from 0.36.0 to 0.37.0 in /exporters/trace/jaeger (1504)
    c6a4406a Bump github.com/golangci/golangci-lint in /internal/tools (1503)
    9524ac09 Update workflows to include main branch as trigger (1497)
    c066f15e Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /internal/tools (1478)
    894e0240 Bump github.com/golangci/golangci-lint in /internal/tools (1477)
    71ffba39 Bump google.golang.org/grpc from 1.34.0 to 1.35.0 in /exporters/otlp (1471)
    515809a8 Bump github.com/itchyny/gojq from 0.12.0 to 0.12.1 in /internal/tools (1472)
    3e96ad1e gitignore: remove unused example path (1474)
    c5622777 Histogram aggregator functional options (1434)
    0df8cd62 Rename Makefile.proto to avoid interpretation as proto file (1468)
    979ff51f Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 (1453)
    1df8b3b8 Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /exporters/otlp (1456)
    4c30a90a Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /sdk (1455)
    5a9f8f6e Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/stdout (1454)
    7786f34c Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/zipkin (1457)
    4352a7a6 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/otlp (1460)
    6990b3b3 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/metric/prometheus (1461)
    7af40d22 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/jaeger (1463)
    f16f1892 Bump google.golang.org/grpc in /example/otel-collector (1465)
    fe363be3 Move Span Event to API (1452)
    43922240 Bump google.golang.org/grpc in /example/prom-collector (1466)
    0aadfb27 Prepare release v0.16.0 (1464)
    207587b6 Metric histogram aggregator: Swap in SynchronizedMove to avoid allocations (1435)
    c29c6fd1 Shutdown underlying span exporter while shutting down BatchSpanProcessor (1443)
    dfece3d2 Combine the Push and Pull metric controllers (1378)
    74deeddd Handle tracestate in TraceContext propagator  (1447)
    49f699d6 Remove Quantile aggregation, DDSketch aggregator; add Exact timestamps (1412)
    9c949411 Rename internal/testing to internal/internaltest (1449)
    8d809814 Move gRPC driver to a subpackage and add an HTTP driver (1420)
    9332af1b Bump github.com/golangci/golangci-lint in /internal/tools (1445)
    5ed96e92 Update exporters/otlp Readme.md (1441)
    bc9cb5e3 Switch CircleCI badge to GitHub Actions (1440)
    716ad082 Remove CircleCI config (1439)
    0682db1e Adding Security Workflows to GitHub Actions (2/2): gosec workflow (1429)
    11f732b8 Adding Security Workflows to GitHub Actions (1/2): codeql workflow (1428)
    40f1c003 Add Tracestate into the SamplingResult struct (1432)
    db06c8d1 Flush metric events before shutdown in collector example (1438)
    f6f458e1 Fix golint issue caused by typo in trace.go (1436)
    fe9d1f7e Use uint64 Count consistently in metric aggregation (1430)
    3a337d0b Bump github.com/golangci/golangci-lint in /internal/tools (1433)
    1e4c8321 cleanup: drop the removed examples in gitignore (1427)
    5c9221cf Unify endpoint API that related to OTel exporter (1401)
    045c3ffe Build scripts: Replace mapfile with read loop for old bash versions (1425)
    2def8c3d Add Versioning Documentation (1388)
    6bcd1085 Bump github.com/itchyny/gojq from 0.11.2 to 0.12.0 in /internal/tools (1424)
    38e76efe Add a split protocol driver for otlp exporter (1418)
    439cd313 Add TraceState to SpanContext in API (1340)
    35215264 Split connection management away from exporter (1369)
    add9d933 Bump github.com/prometheus/client_golang from 1.8.0 to 1.9.0 in /exporters/metric/prometheus (1414)
    93d426a1 Add @dashpole as a project Approver (1410)
    6fe20ef3 Fix small typo (1409)
    b22d0d70 Mention the getting started guide (1406)
    3fb80fb2 Fix duplicate checkout action in GitHub workflow (1407)
    2051927b Correct CI workflow syntax (1403)
    f11a86f7 Fix typo in comment (1402)
    bdf87a78 Migrate CircleCI ci.yml workflow to GitHub Actions (1382)
    4e59dd1f Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /example/otel-collector (1400)
    83513f70 Bump google.golang.org/api from 0.32.0 to 0.36.0 in /exporters/trace/jaeger (1398)
    a354fc41 Bump github.com/prometheus/client_golang from 1.7.1 to 1.8.0 in /exporters/metric/prometheus (1397)
    3528e42c Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /exporters/otlp (1396)
    af114baf Call otel.Handle with non-nil errors (1384)
    c3c4273e Add RO/RW span interfaces (1360)

Fixes #2591

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-11-04 12:30:45 -07:00
Chelsea Mafrica
09d5d8836b runtime: tracing: Change method for adding tags
In later versions of OpenTelemetry label.Any() is deprecated. Create
addTag() to handle type assertions of values. Change AddTag() to
variadic function that accepts multiple keys and values.

Fixes #2547

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-11-04 10:19:05 -07:00
Snir Sheriber
b34ed403c5 cgroups: pass vhost-vsock device to cgroup
for the sandbox cgroup

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-11-04 10:59:10 +02:00
Snir Sheriber
7362e1e8a9 runtime: remove prefix when cgroups are managed by systemd
as done previously in 9949daf4dc

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-11-04 10:13:22 +02:00
Jianyong Wu
e15c8460db
Merge pull request #2265 from rapiz1/simple-ro-mount
virtcontainers: simplify read-only mount handling
2021-11-01 10:43:16 +08:00
GabyCT
969b78b01f
Merge pull request #2496 from rapiz1/show-guest-protection
cli: Show available guest protection in env output
2021-10-29 17:28:47 -05:00
James O. D. Hunt
2551179e43
Merge pull request #2929 from YchauWang/vc-docs-api
virtcontainers: api: update the functions in the api.md docs
2021-10-29 16:01:31 +01:00
James O. D. Hunt
4e2dd41eb6
Merge pull request #1791 from wainersm/virtcontainers-1
virtcontainers: check that both initrd and image are not set
2021-10-29 14:51:07 +01:00
wangyongchao.bj
338ac87516 virtcontainers: api: update the functions in the api.md docs
Virtcontainers API document functions weren't sync with the codes Sandbox and VCImpl.
And we have two functions named `CreateSandbox` functions, diff by one parameter,
very confused. So this pr sync the codes to api documents.

Fixes: #2928

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2021-10-29 15:36:53 +08:00
Yujia Qiao
e66d0473be virtcontainers: simplify read-only mount handling
Current handling of read-only mounts is a little tricky.
However, a clearer solution can be used here:
  1. make a private ro bind mount at privateDest to the mount source
  2. make a bind mount at mountDest to the mount created in step 1
  3. umount the private bind mount created in step 1
One important aspect is that the mount in step 2 is duplicated from
the one we created in step 1. So the MS_RDONLY flag is properly
preserved in all mounts created in the propagtion.

Fixes: #2205

Depends-on: github.com/kata-containers/tests#4106

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-28 15:48:41 +08:00
Manabu Sugimoto
3be50adab9 agent: Add support for Seccomp
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.

Fixes: #1476

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-10-27 19:06:13 +09:00
Wainer dos Santos Moschetta
309dae631a virtcontainers: check that both initrd and image are not set
This changed valid() in hypervisor to check the case where both
initrd and image path are set; in this case it returns an error.

Fixes #1868
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-10-26 10:44:23 -04:00
bin
5f306330f4 virtcontainers: delete duplicated notify in watchHypervisor function
When hypervisor check failed, the notify function is called twice.

Fixes: #2901

Signed-off-by: bin <bin@hyper.sh>
2021-10-26 11:58:26 +08:00
Yujia Qiao
6cc8000cae
cli: Show available guest protection in env output
Show available guest protections in the
`kata-runtime env` output. Also bump the formatVersion.

Fixes: #1982

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-25 21:44:56 +08:00
Yujia Qiao
2063b13805
virtcontainers: Add func AvailableGuestProtections
Add functions to return guestProtection as a string slice, which
can be then used in `kata-runtime env` output.

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-10-25 21:44:01 +08:00
James O. D. Hunt
ec3aa1694b
Merge pull request #2844 from jongwu/unit_test
enable unit test on arm
2021-10-25 10:58:21 +01:00
David Gibson
a0825badf6
Merge pull request #2795 from dgibson/vfio-as-vfio
Allow VFIO devices to be used as VFIO devices in the container
2021-10-25 14:25:26 +11:00
David Gibson
34273da98f runtime/device: Allow VFIO devices to be presented to guest as VFIO devices
On a conventional (e.g. runc) container, passing in a VFIO group device,
/dev/vfio/NN, will result in the same VFIO group device being available
within the container.

With Kata, however, the VFIO device will be bound to the guest kernel's
driver (if it has one), possibly appearing as some other device (or a
network interface) within the guest.

This add a new `vfio_mode` option to alter this.  If set to "vfio" it will
instruct the agent to remap VFIO devices to the VFIO driver within the
guest as well, meaning they will appear as VFIO devices within the
container.

Unlike a runc container, the VFIO devices will have different names to the
host, since the names correspond to the IOMMU groups of the guest and those
can't be remapped with namespaces.

For now we keep 'guest-kernel' as the value in the default configuration
files, to maintain current Kata behaviour.  In future we should change this
to 'vfio' as the default.  That will make Kata's default behaviour more
closely resemble OCI specified behaviour.

fixes #693

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
68696e051d runtime: Add parameter to constrainGRPCSpec to control VFIO handling
Currently constrainGRPCSpec always removes VFIO devices from the OCI
container spec which will be used for the inner container.  For
upcoming support for VFIO devices in DPDK usecases we'll need to not
do that.

As a preliminary to that, add an extra parameter to the function to
control whether or not it will remove the VFIO devices from the spec.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
d9e2e9edb2 runtime: Rename constraintGRPCSpec to improve grammar
"constraint" is a noun, "constrain" is the associated verb, which makes
more sense in this context.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:31 +11:00
David Gibson
57ab408576 runtime: Introduce "vfio_mode" config variable and annotation
In order to support DPDK workloads, we need to change the way VFIO devices
will be handled in Kata containers.  However, the current method, although
it is not remotely OCI compliant has real uses.  Therefore, introduce a new
runtime configuration field "vfio_mode" to control how VFIO devices will be
presented to the container.

We also add a new sandbox annotation -
io.katacontainers.config.runtime.vfio_mode - to override this on a
per-sandbox basis.

For now, the only allowed value is "guest-kernel" which refers to the
current behaviour where VFIO devices added to the container will be bound
to whatever driver in the VM kernel claims them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-10-25 12:29:29 +11:00
Jianyong Wu
1a96b8ba35 template: disable template unit test on arm
Template is broken on arm. here we disable the template unit test
temporarily.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00
Jianyong Wu
43b13a4a6d runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.

Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-23 15:07:25 +08:00
Manohar Castelino
52268d0ece hypervisor: Expose the hypervisor itself
Export the top level hypervisor type

s/hypervisor/Hypervisor

Fixes: #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:46:02 -07:00
Eric Ernst
a72bed5b34 hypervisor: update tests based on createSandbox->CreateVM change
Fixup a couple of broken tests.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
f434bcbf6c hypervisor: createSandbox is CreateVM
Last of a series of commits to export the top level
hypervisor generic methods.

s/createSandbox/CreateVM

Fixes #2880

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
76f1ce9e30 hypervisor: startSandbox is StartVM
s/startSandbox/StartVM

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
fd24a695bf hypervisor: waitSandbox is waitVM
renaming...

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
a6385c8fde hypervisor: stopSandbox is StopVM
Renaming. There is no Sandbox specific logic except tracing.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
f989078cd2 hypervisor: resumeSandbox is ResumeVM
renaming...

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
73b4f27c46 hypervisor: saveSandbox is SaveVM
rename

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
7308610c41 hypervisor: pauseSandbox is nothing but PauseVM
renaming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
8f78e1cc19 hypervisor: The SandboxConsole is the VM's console
update naming

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
4d47aeef2e hypervisor: Export generic interface methods
This is in preparation for creating a seperate hypervisor package.
Non functional change.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
Manohar Castelino
6baf2586ee hypervisor: Minimal exports of generic hypervisor internal fields
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.

Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.

Signed-off-by: Manohar Castelino <mcastelino@apple.com>
2021-10-22 16:45:35 -07:00
GabyCT
03877f3479
Merge pull request #2872 from likebreath/1020/clh_v19.0
Upgrade to Cloud Hypervisor v19.0
2021-10-21 10:26:55 -05:00
James O. D. Hunt
09741272bc
Merge pull request #2783 from likebreath/1001/clh_enable_seccomp
virtcontainers: clh: Enable the `seccomp` feature
2021-10-21 09:21:33 +01:00
Bo Chen
8030b6caf0 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v19.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-20 15:48:55 -07:00
Chelsea Mafrica
4ce2b14e60
Merge pull request #2817 from jodh-intel/clh+fc-agent-tracing
Enable agent tracing for hybrid VSOCK hypervisors
2021-10-18 22:01:52 -07:00
bin
76f16fd1a7 runtime: use containerd package instead of cri-containerd
cri-containerd project has been merged into containerd repo, and
we should not reference it any more in code and docs.

This commit will use containerd package instead of cri-containerd
package.

Fixes: #2791

Signed-off-by: bin <bin@hyper.sh>
2021-10-19 09:40:20 +08:00
James O. D. Hunt
41c49a7bf5
Merge pull request #2771 from fengwang666/debug-pid
runtime: update sandbox root dir cleanup behavior in rootless hypervisor
2021-10-18 17:47:47 +01:00
Christophe de Dinechin
bcffa26305 tracing: Fix typo in "package" tag name
The tracing tags for api.go contain `"packages"` as a tag name,
whereas all other tags contain `"package"`.

Fixes: #2847

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-10-15 14:48:00 +02:00
James O. D. Hunt
e61f5e2931 runtime: Show socket path in kata-env output
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.

The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.

This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.

Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 11:45:29 +01:00
James O. D. Hunt
321be0f794 tracing: Remove trace mode and trace type
Remove the `trace_mode` and `trace_type` agent tracing options as
decided in the Architecture Committee meeting.

See:

- https://github.com/kata-containers/kata-containers/pull/2062

Fixes: #2352.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-10-15 10:09:38 +01:00
Bo Chen
7b2bfd4eca virtcontainers: clh: Use 'quiet' as the default kernel parameter
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.

Fix: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-11 22:06:27 -07:00
Bo Chen
3e24e46c70 virtcontainers: clh: Turn-off serial and virtio-console by default
We will need to have console output from the guest only for debugging
purposes. As a result, we can turn-off both the serial and
virtio-console devices by default for better boot time.

Fixes: #2820

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-11 22:06:23 -07:00
Feng Wang
adc9e0baaf runtime: fix two bugs in rootless hypervisor
Update the sandbox dir clean up logic to be more appropriate
Add different seeds for randInt() method

Fixes #2770

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-10-08 15:52:42 -07:00
Bo Chen
51cbe14584 runtime: Add option "disable_seccomp" to config hypervisor.clh
This patch adds an option "disable_seccomp" to the config
hypervisor.clh, from which users can disable the `seccomp`
feature from Cloud Hypervisor when needed (for debugging purposes).

Fixes: #2782

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-08 15:10:30 -07:00
Bo Chen
98b7350a1b virtcontainers: clh: Enable the seccomp feature
This patch enables the `seccomp` feature from Cloud Hypervisor which
provides fine-grained allowed syscalls for each of its worker
threads. It brings important security benefits, while would increase
memory footprint.

Fixes: #2782

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-08 15:07:43 -07:00
Fupan Li
988eb95621
Merge pull request #2760 from liubin/fix/2759-optimize-code-for-managing-temp-users
runtime: optimize code for managing temp users for rootless mode
2021-10-08 13:49:14 +08:00
bin
bf8f582c1d runtime: optimize code for managing temp users for rootless mode
This commit does two chagnes:

- move code for managing temp users to rootless.go.
- use common function in qemu.go when shutdown the VM.

Fixes: #2759

Signed-off-by: bin <bin@hyper.sh>
2021-10-08 11:04:21 +08:00
Bin Liu
10ec4b133c
Merge pull request #2742 from liubin/fix/2741-delete-file-code
Delete file virtcontainers-setup.sh
2021-10-07 11:54:47 +08:00
Jianyong Wu
7eac2ec786
protection: add confidential compute frame for arm
Even CCA, which is the confidential compute archtecture, has not been
ready, add a empty implementation to avoid static check error.

Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-10-06 15:53:36 +02:00
Jianyong Wu
8acfc154de
check: fix typecheck failure in qemu_arm64_test.go
fix typecheck failure in qemu_arm64_test.go

Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-10-06 15:53:35 +02:00
Amulya Meka
5b02d54e23
virtcontainers: fix lint failure on ppc64le
Add nolint for arch specific code to exclude
from lint check.

Fixes: #2773

Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
2021-10-06 15:53:35 +02:00
Jakob Naucke
ff9728f032
virtcontainers: nolint guestProtection
Exclude from lint checking for it is ultimately only used in
architecture-specific code.

Fixes: #2273
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-10-06 15:53:35 +02:00
Samuel Ortiz
71ce6cfe9e runtime: Pass the route IP family to the agent
When updating the guest routing table, we should forward the IP family
information up to the guest.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:17 +02:00
Samuel Ortiz
99450bd1f7 agent: protos: Add a Family field to the Route payload
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:17 +02:00
Samuel Ortiz
f85fe70231 runtime: vendor: Bump the netlink package dependency
We need to be able to get the IP family from the netlink route meesages,
and the Route.Family field only got recently added to the netlink
package.

The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2021-10-01 14:35:01 +02:00
James O. D. Hunt
2ce8d4263c clh: Suppress hypervisor output to make guest output visible
Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor
debug is enabled. This is required since `Debug` level:

- Is overkill for debugging hypervisor failures.
- Effectively hides the output from the guest kernel and userland: CLH
  generates so much output that the output from the guest gets "lost in
  the noise" (experiments show that for each full CLH debug message, at most
  1 _byte_ of guest output is displayed).

Fixes: #2726.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-09-30 14:22:09 +01:00
bin
762922a521 runtime: delete func ConstraintsToVCPUs
ConstraintsToVCPUs is not used any more.

Fixes: #2741

Signed-off-by: bin <bin@hyper.sh>
2021-09-30 14:44:41 +08:00
bin
4f4854308a runtime: delete virtcontainers-setup.sh
This file is not used anymore.

Fixes: #2741

Signed-off-by: bin <bin@hyper.sh>
2021-09-30 14:44:30 +08:00
Bin Liu
4ac7199282
Merge pull request #2494 from rapiz1/clean-up-code
virtcontainers: clean up useless code
2021-09-29 22:56:13 +08:00
David Gibson
b57613f53e
Merge pull request #1682 from dgibson/rescan
Remove forced PCI rescans from agent
2021-09-29 13:03:55 +10:00
Feng Wang
e5fe53f0a9 runtime: fix nil reference in cleanup rootless user
It seems the client (crio) can send multiple requests to stop the Kata VM,
resulting a nil reference if the uid has already been cleaned up by a different thread.

Fixes #2743

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-27 21:28:47 -07:00
Bin Liu
3217b03b17
Merge pull request #2522 from Bevisy/main-2515
virtcontainers: Fix incorrect scripts path
2021-09-27 21:14:40 +08:00
Bin Liu
39df808f6a
Merge pull request #2695 from YchauWang/wyc-vc-cgroup
runtime: clear virtcontainers cgroup duplicated function
2021-09-27 21:12:39 +08:00
David Gibson
aad1a8734f runtime/device: Give the agent information about VFIO devices
We send information about several kinds of devices to the agent so
that it can apply specific handling.  We don't currently do this with
VFIO devices.  However we need to do that so that the agent can
properly wait for VFIO devices to be ready (previously it did that
using a PCI rescan which may not be reliable and has some very bad
side effects).

This patch collates and sends the relevant information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
David Gibson
ebd7b61884 runtime: Don't repeat GetDeviceByID between appendDevices() and append*()
Both appendBlockDevice and appendVhostUserBlkDevice start by using
GetDeviceByID to lookup the api.Device object corresponding to their
ContainerDevice object.  However their common caller, appendDevices() has
already done this.

This changes it so the looked up api.Device is passed to the individual
append*Device() functions.  This slightly reduces duplicated work, but more
importantly it makes it clearer that append*Device() don't need to check
for a nil result from GetDeviceByID, since the caller has already done
that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
David Gibson
ad45c52fbe runtime/device: Record guest PCI path for VFIO devices
For several device types which correspond to a PCI device in the guest
we record the device's PCI path in the guest.  We don't currently do
that for VFIO devices, but we're going to need to for better handling
of SR-IOV devices.

To accomplish this, we have to determine the guest PCI path from the
information the VMM gives us:

For qemu, we query the slot of the device and its bridge from QMP.

For cloud-hypervisor, the device add interface gives us a guest PCI
address.  In fact this represents a design error in the clh API -
there's no way it can really know the guest PCI address in general.
It works in this case, because clh doesn't use PCI bridges, so the
device will always be on the root bus.  Based on that, the PCI path is
simply the device's slot number.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
David Gibson
5c2af3e308 runtime/device: Refactor hotplugVFIODevice() to have common exit path
hotplugVFIODevice() has several different paths depending if we're
plugging into a root port or a PCIE<->PCI bridge and if we're using a
regular or mediated VFIO device.

We're going to want some common code on the successful exit path here,
so refactor the function to allow that without duplication.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
David Gibson
cf36fd87ad runtime: Fix some leftover go fmt errors
A few "go fmt" errors appear to have crept it.  Clean them up with
"go fmt ./..." in the src/runtime directory.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-27 12:46:33 +10:00
zhanghj
57e3712dbd virtiofs: fix error report in TestVirtiofsdStart when go test running
Initialize ctx with context.Background() instead of nil value.

Fixes: #2718

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2021-09-24 16:06:06 +08:00
Fabiano Fidêncio
279f8e9d03
Merge pull request #2590 from c3d/issue/2589-virtiofsd-perms
virtiofs: Create shared directory with 0700 mode, not 0750
2021-09-24 09:16:40 +02:00
Julio Montes
5d2a82fbf9
Merge pull request #2323 from dgibson/acpi-pcihp
Replace SHPC with ACPI PCI hotplug for Kata guests
2021-09-23 09:55:31 -05:00
Fabiano Fidêncio
0ececc630f
Merge pull request #2666 from cmaf/tracing-newContainer-logger
runtime: tracing: Fix logger passed in newContainer
2021-09-23 13:07:19 +02:00
Fabiano Fidêncio
e33c26ba18
Merge pull request #2622 from YchauWang/wyc-vc-api
virtcontainers: update VC SandboxConfig API add SandboxBindMounts field
2021-09-23 13:05:33 +02:00
Fabiano Fidêncio
47170e302a
Merge pull request #2616 from Bevisy/main-2615
sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
2021-09-23 13:04:18 +02:00
David Gibson
8bbcb06af5 qemu: Disable SHPC hotplug
Under certain circumstances[0] Kata will attempt to use SHPC hotplug
for PCI devices on the guest.  In fact we explicitly enable SHPC on
our PCI to PCI bridges, regardless of the qemu default.

SHPC was designed a long, long time ago for physical hotplugging and
works very poorly for a virtual environment. In particular it has a
mandatory 5s delay to allow a (real, human) operator to back out the
operation if they press a button by mistake. This alone makes it
unusable for a fast start up application like Kata.

Worse, the agent forces a PCI rescan during startup.  That will race
with the SHPC hotplug operation causing the device to go into a bad
state where config space can't be accessed from the guest at all.

The only reason we've sort of gotten away with this is that our
default guest kernel configuration triggers what's arguably a kernel
bug effectively disabling SHPC.  That makes the agent rescan the only
reason we see the new device.

Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on
the q35 machine, we can explicitly disable SHPC in all cases.  It's
nothing but trouble.

fixes #2174

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-23 10:27:26 +10:00
David Gibson
cc4983eeac runtime: Remove unused qemuArchBase.appendBridges definition
qemuArchBase.appendBridges is never actually used, because the bare
qemuArchBase type is itself never used (outside of unit tests).  Instead
*all* the subclasses of qemuArchBase override appendBridges() to call
the very similar, but not identical genericAppendBridges.  So, we can
remove the qemuArchBase.appendBridges implementation.

Furthermore, all those subclasses override appendBridges() in exactly
the same way, and so we can remove *those* definitions and replace the
base class qemuArchBase appendBridges() with that version, calling
genericAppendBridges().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-23 10:15:08 +10:00
wangyongchao.bj
3b0c4bf9a0 runtime: clear virtcontainers cgroup duplicated function
There are `DeviceToDeviceCgroup` and `deviceToDeviceCgroup` two functions,
 creating a `specs.LinuxDeviceCgroup` object. We clear the new function `deviceToDeviceCgroup`.

Fixes: #2694

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2021-09-22 15:13:34 +08:00
Fabiano Fidêncio
2bee8bc6bd
Merge pull request #2432 from fengwang666/qemu-rootless
runtime: run the QEMU VMM process with a non-root user
2021-09-21 21:37:02 +02:00
Feng Wang
9a6d56f1ab runtime: fix empty cgroup path validation error
An empty cgroup path shouldn't fail cgroup creation

Fixes #2674

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-20 13:48:09 -07:00
Christophe de Dinechin
48fb1d9203 virtiofs: Create shared directory with 0700 mode, not 0750
A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no real good reason for a non-root user to access
the shared directory, and this is potentially dangerous.

Fixes: #2589

[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-09-20 10:47:18 +02:00
Chelsea Mafrica
077b77c178 runtime: tracing: Fix logger passed in newContainer
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.

The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.

Fixes #2665

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-09-17 11:41:04 -07:00
Feng Wang
1cfe59304d runtime: Run QEMU using a non-root user/group
A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.

Fixes #2444

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2021-09-17 11:28:44 -07:00
Hui Zhu
fff82b4ef5
Merge pull request #2628 from bergwolf/runtime-reorg
runtime: refactor commandline code directory
2021-09-17 10:37:22 +08:00
Chelsea Mafrica
6159ef3499
Merge pull request #2626 from YchauWang/wyc-vc-api02
virtcontainers: update VC HypervisorConfig API add three lost fields
2021-09-16 16:46:27 -07:00
Peng Tao
067c44d0b6 runtime: fix UT build failure
storeContainer has been removed.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-09-16 19:42:02 +08:00
Samuel Ortiz
7bf96d2457
Merge pull request #2604 from Amulyam24/container_tests
virtcontainers: add unit tests for container.go
2021-09-16 11:02:16 +02:00
Bo Chen
d00decc97d runtime: clh: Enable hugepages support
This patch adds the configuration option that allows to use hugepages
with Cloud Hypervisor guests.

Fixes: #2648

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-09-15 10:43:57 -07:00
David Gibson
64bb803fcf runtime/qemu: Move from query-cpus to query-cpus-fast
We recently updated to using qemu-6.1 (from qemu 5.2).  Unfortunately one
breaking change in qemu 6.0 wasn't caught by the CI.

The query-cpus QMP command has been removed, replaced by query-cpus-fast
(which has been available since qemu 2.12).  govmm already had support for
query-cpus-fast, we just weren't using it, so the change is quite easy.

fixes #2643

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-15 16:41:26 +10:00
Samuel Ortiz
9bed2ade0f virtcontainers: Convert to the new cgroups package API
The new API is based on containerd's cgroups package.
With that conversion we can simpligy the virtcontainers sandbox code and
also uniformize our cgroups external API dependency. We now only depend
on containerd/cgroups for everything cgroups related.

Depends-on: github.com/kata-containers/tests#3805
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-09-14 07:09:34 +02:00
Samuel Ortiz
b42ed39349 virtcontainers: cgroups: Add a containerd API based cgroups package
Eventually, we will convert the virtcontainers and the whole Kata
runtime code base to only rely on that package.

This will make Kata only depends on the simpler containerd cgroups API.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:09:34 +02:00
Samuel Ortiz
f17752b0dc virtcontainers: container: Do not create and manage container host cgroups
The only process we are adding there is the container host one, and
there is no such thing anymore.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:09:33 +02:00
Samuel Ortiz
dc7e9bce73 virtcontainers: sandbox: Host cgroups partitioning
This is a simplification of the host cgroup handling by partitioning the
host cgroups into 2: A sandbox cgroup and an overhead cgroup.

The sandbox cgroup is always created and initialized. The overhead
cgroup is only available when sandbox_cgroup_only is unset, and is
unconstrained on all controllers. The goal of having an overhead cgroup
is to be more flexible on how we manage a pod overhead. Having such
cgroup will allow for setting a fixed overhead per pod, for a subset of
controllers, while at the same time not having the pod being accounted
for those resources.

When sandbox_cgroup_only is not set, we move all non vCPU threads
to the overhead cgroup and let them run unconstrained. When it is set,
all pod related processes and threads will run in the sandbox cgroup.

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:09:29 +02:00
Samuel Ortiz
f811026c77 virtcontainers: Unconditionally create the sandbox cgroup manager
Regardless of the sandbox_cgroup_only setting, we create the sandbox
cgroup manager and set the sandbox cgroup path at the same time.

Without doing this, the hypervisor constraint routine is mostly a NOP as
the sandbox state cgroup path is not initialized.

Fixes #2184

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-09-14 07:05:57 +02:00
wangyongchao.bj
a6066404f7 virtcontainers: update VC HypervisorConfig API add three lost fields
Sync the virtcontainers api.md document, add `ConfidentialGuest` `EntropySourceList` `GuestSwap` three
 fields to the HypervisorConfig API.

Fixes #2625

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2021-09-14 10:42:54 +08:00
wangyongchao.bj
bb18cd475c virtcontainers: update VC SandboxConfig API add SandboxBindMounts field
sync the virtcontainers api.md document, add SandboxBindMounts field to the SandboxConfig API.
And update the order of the SandboxConfig API fields.

Fixes #2621

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2021-09-14 09:56:47 +08:00
Eric Ernst
967db0cbcc
Merge pull request #2544 from likebreath/0831/upgrade_clh_v18.0
versions: Upgrade to Cloud Hypervisor v18.0
2021-09-13 11:27:45 -07:00
Binbin Zhang
58e77a3c13 sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.

Fixes: #2615

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-13 20:47:16 +08:00
Samuel Ortiz
75ef8c243a
Merge pull request #2603 from Bevisy/main-2539
sandbox: Add device permissions such as /dev/null to cgroup
2021-09-13 11:04:51 +02:00
Anastassios Nanos
62baa48ef5 virtcontainers: fc: parse vcpuID correctly
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any go variant of string to int
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".

This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.

Fixes: #2592

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-09-10 09:39:56 +00:00
Bo Chen
f785ff0bf2 virtcontainers: clh: Revert the workaround incorrect default values
Given the fix to the bugs of the openapi spec file is included in the
Cloud Hypervisor v18.0 [1], this patch reverts the workaround we carried
in the CLH driver.

This reverts commit 932ee41b3f.

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3029

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-09-09 14:52:53 -07:00
Bo Chen
0e0e59dc5f virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v18.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-09-09 14:51:55 -07:00
Amulyam24
d865c80986 virtcontainers: add unit tests for container.go
Fixes: #268

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2021-09-09 13:09:38 +05:30
Binbin Zhang
71f915c63f sandbox: Add device permissions such as /dev/null to cgroup
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec

Fixes: #2539

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-09 15:33:24 +08:00
bin
2abc450a4d test: enable running tests under root user
Add tests that run under root user to test special cases.

Fixes: #2446

Signed-off-by: bin <bin@hyper.sh>
2021-09-09 14:21:34 +08:00
Bin Liu
103fdd3f6c
Merge pull request #2564 from Bevisy/main-2296
virtcontainers: Remove NewStoreFeature
2021-09-03 10:41:21 +08:00
James O. D. Hunt
f3a1bf3b45
Merge pull request #2552 from bergwolf/license
license: drop redundent license files
2021-09-02 14:31:18 +01:00
Binbin Zhang
e2a9e78c9e virtcontainers: Remove NewStoreFeature
remove NewStoreFeature

Fixes: #2296

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-09-02 21:28:36 +08:00
Peng Tao
256c3b2747 license: drop redundent license files
There is no need to keep multiple copies of the license file in
different directory. We can just use the top level one for the project.

Fixes: #2553
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-09-01 15:10:04 +08:00
Hui Zhu
bcc9fa3b35 hotplugAddBlockDevice: Use ExecuteBlockdevAddWithDriverCache with swap
Use ExecuteBlockdevAddWithDriverCache with swap in
hotplugAddBlockDevice to handle swap file cannot work OK with
ExecuteBlockdevAddWithCache issue.

Fixes: #2548

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-09-01 14:13:11 +08:00
Peng Tao
c0daa4ebff
Merge pull request #2513 from cmaf/tracing-tracingtags-consistency
tracing: Change runtime tracing tags to vars
2021-08-31 10:25:10 +08:00
Fabiano Fidêncio
67d1f4fd14
Merge pull request #2528 from snir911/main_debuggabillity_sq
shimv2: add logging to shimv2 api calls
2021-08-30 15:50:55 +02:00
Peng Tao
a9de761d71 runtime: drop qemu-lite support
As the project is not maintained and we have not been testing against it
for a long time.

Fixes: #2529
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2021-08-30 16:58:12 +08:00
Snir Sheriber
0c7789fad6 runtime: Add container field to logs
and unified field naming

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2021-08-30 10:09:05 +03:00
Bo Chen
b564dd47b6
Merge pull request #2526 from Bevisy/main-2285
runtime: delete types or const that no longer needed
2021-08-29 15:35:03 -07:00
Bin Liu
a89cc0bb5c
Merge pull request #2524 from Bevisy/main-2264
runtime: Optimize the way slice created
2021-08-29 16:00:08 +08:00
Eric Ernst
8771d8c375
Merge pull request #2514 from rapiz1/improve-util-test
virtcontainers: simplify tests
2021-08-28 06:41:15 -07:00
Yujia Qiao
a99fcc3af1 virtcontainers: simplify tests
Simplify tests in utils_test.go by table-driven tests.

Fixes: #2281

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-08-28 12:35:25 +08:00
Binbin Zhang
39ffd8ee84 runtime: delete types or const that no longer needed
type: ProcessListOptions; ProcessList
const: SocketTypeVSOCK

Fixes: #2285

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-28 04:09:25 +00:00
Binbin Zhang
ff37f5c798 runtime: Optimize the way slice created
Initialize and assign a value, reducing one append operation

Fixes: #2264

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-28 04:15:59 +08:00
Carlos Venegas
fb583780f6
Merge pull request #2488 from likebreath/0823/clh_openapi_generator
virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
2021-08-27 14:28:09 -05:00
Binbin Zhang
4751698829 virtcontainers: Fix incorrect scripts path
modify to the correct relative path

Fixes: #2515

Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2021-08-27 19:16:53 +00:00
Chelsea Mafrica
8f0f949abf tracing: Move dynamically added attributes to Trace()
Where possible, move attributes added with AddTag() to Trace() call to
reduce the amount of code used for tracing.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-27 08:26:40 -07:00
Bo Chen
932ee41b3f virtcontainers: clh: Workaround incorrect default values
Two default values defined in the 'cloud-hypervisor.yaml' have typo, and this
patch manually overwrites them with the correct value as a workaround
before the corresponding fix is landed to Cloud Hypervisor upstream.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-26 22:53:31 -07:00
Bo Chen
bff38e4f4d virtcontainers: clh: Fix the unit test
This patch fixes the unit tests over clh.go with the updated client code.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-26 22:53:17 -07:00
Bo Chen
d967d3cb37 virtcontainers: clh: Use constructors to ensure proper default value
With the updated openapi-generator, the client code now handles optional
attributes correctly, and ensures to assign the right default
values. This patch enables to use those constructors to make sure the
proper default values being used.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-26 22:53:13 -07:00
Chelsea Mafrica
8058e97212 tracing: Change runtime tracing tags to vars
Tracing tags are stored inconsistently throughout the runtime. Change
all instances of tracing tags to variables.

Fixes #2512

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-08-26 15:55:32 -07:00
Bo Chen
a6a2e525de virtcontainers: clh: Migrate to use the updated client APIs
The client code (and APIs) for Cloud Hypervisor has been changed
dramatically due to the upgrade to `openapi-generator` v5.2.1. This
patch migrate the Cloud Hypervisor driver in the kata-runtime to use
those updated APIs.

The main change from the client code is that it now uses "pointer" type
to represent "optional" attributes from the input openapi specification
file.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-26 14:04:18 -07:00
Yujia Qiao
814cea9601 virtcontainers: clean up useless code
Fixes: #2275

Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
2021-08-24 16:04:34 +08:00
Bo Chen
46eb07e14f virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor with the
updated `openapi-generator` v5.2.1.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-23 16:00:32 -07:00
Bo Chen
80fba4d637 virtcontainers: clh: Upgrade to the openapi-generator v5.2.1
To improve the quality and correctness of the auto-generated code, this
patch upgrade the `openapi-generator` to its latest stable release
v5.2.1.

Fixes: #2487

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-23 15:59:41 -07:00
Bl1tz23
87bbae1bd7 fc: fix version parsing for fc >= 0.25
Allows to use firecracker version >=0.25.

Fixes: #2471

Signed-off-by: Bl1tz23 <alex3angle@gmail.com>
2021-08-23 15:09:59 +03:00
wangyongchao.bj
99ab91df3d docs: update the docs project url from kata 1.x to 2.x
changed the document project url in the using-vpp-and-kata.md and
runtime experimental README.md files.

Fixes: #2418

Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
2021-08-10 13:51:54 +08:00
Anastassios Nanos
64dd35ba4f virtcontainers: fc: properly remove jailed block device
When running a firecracker instance jailed, block devices
are not removed correctly, as the jailerRoot path is not
stripped from the PATCH command sent to the FC API.

This patch differentiates the jailed case from the non-jailed
one and allows the firecracker instance to be properly
terminated.

Fixes #2387

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2021-08-04 16:31:56 +00:00
David Gibson
3165095669 runtime/qemu: Use explicit "on" for kernel_irqchip parameter
Kata uses the 'kernel_irqchip' machine option to qemu.  By default it
uses it in what qemu calls the "short-form boolean" with no parameter.
That style was deprecated by qemu between 5.2 and 6.0 (commit
ccd3b3b8112b) and effectively removed entirely between 6.0 and 6.1
(commit d8fb7d0969d5).

Update ourselves for newer qemus by using an explicit
"kernel_irqchip=on".

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-08-04 14:34:11 +10:00
Carlos Venegas
27b9a68189
Merge pull request #2365 from sameo/topic/clh-tracing
virtcontainers: clh: Do not use the default HTTP client
2021-08-03 12:54:09 -05:00
Hui Zhu
e6408fe670 Container: Add initConfigResourcesMemory and call it in newContainer
The swappiness is not right if just set
io.katacontainers.container.resource.swappiness:
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
  name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
  name: busybox-killed-vmm
annotations:
  io.katacontainers.container.resource.swappiness: "100"
image:
  image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
crictl exec $cid cat /sys/fs/cgroup/memory/memory.swappiness
60

The cause of this issue is there are two elements store the resources
infomation.  They are c.config.Resources for calculateSandboxMemory and
c.GetPatchedOCISpec() for agent.
This add initConfigResourcesMemory to Container and call it in
newContainer to handle the issue.

Fixes: #2372

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-02 16:02:12 +08:00
Fupan Li
fdc42ca7ff
Merge pull request #2324 from jongwu/ro_nv
qemu/arm: remove nvdimm/"ReadOnly" option on arm64
2021-08-02 14:14:06 +08:00
Hui Zhu
ee90affc18 newContainer: Initialize c.config.Resources.Memory if it is nil
container start fail if
io.katacontainers.container.resource.swap_in_bytes and
memory_limit_in_bytes are not set.
$ pod_yaml=pod.yaml
$ container_yaml=container.yaml
$ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
  name: busybox-sandbox1
EOF
$ cat << EOF > "${container_yaml}"
metadata:
  name: busybox-killed-vmm
annotations:
  io.katacontainers.container.resource.swappiness: "60"
image:
  image: "$image"
command:
- top
EOF
$ sudo crictl pull $image
$ podid=$(sudo crictl runp $pod_yaml)
$ cid=$(sudo crictl create $podid $container_yaml $pod_yaml)
$ sudo crictl start $cid
DEBU[0000] get runtime connection
DEBU[0000] connect using endpoint
'unix:///var/run/containerd/containerd.sock' with '10s' timeout
DEBU[0000] connected successfully using endpoint:
unix:///var/run/containerd/containerd.sock
DEBU[0000] StartContainerRequest:
&StartContainerRequest{ContainerId:4fea91d16f661931fe33acd247efe831ef9e571588ba18b5a16f04c278fd61b8,}
DEBU[0000] StartContainerResponse: nil
FATA[0000] starting the container
"4fea91d16f661931fe33acd247efe831ef9e571588ba18b5a16f04c278fd61b8": rpc
error: code = Unknown desc = failed to create containerd task: failed to
create shim: ttrpc: closed: unknown

The cause of fail if if c.config.Resources.Memory is nil, values of
io.katacontainers.container.resource.swappiness and
io.katacontainers.container.resource.swap_in_bytes will be store in
newContainer.

This commit initialize c.config.Resources.Memory if it is nil in
newContainer.

Fixes: #2367

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-01 10:03:27 +08:00
Hui Zhu
767a41ce56 updateResources: Log result after calculateSandboxMemory
Log result after calculateSandboxMemory in updateResources.

Fixes: #2367

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-08-01 09:57:44 +08:00
Samuel Ortiz
760ec4e58a virtcontainers: clh: Do not use the default HTTP client
When enabling tracing with Cloud Hypervisor, we end up establishing 2
connections to 2 different HTTP servers: The Cloud Hypervisor API one
that runs over a UNIX socket and the Jaeger endpoint running over UDP.

Both connections use the default HTTP golang client instance, and thus
share the same transport layer. As the Cloud Hypervisor implementation
sets it up to be over a Unix socket, the jaeger uploader ends up going
through that transport as well, and sending its spans to the Cloud
Hypervisor API server.

We fix that by giving the Cloud Hypervisor implementation its own HTTP
client instance and we avoid sharing it with anything else in the shim.

Fixes #2364

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
2021-07-30 16:51:01 +02:00
James O. D. Hunt
4f0726bc49 docs: Remove table of contents
Removed all TOCs now that GitHub auto-generates them.

Also updated the documentation requirements doc removing the requirement
to add a TOC.

Fixes: #2022.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-07-30 10:58:22 +01:00
Bo Chen
cc0bb9aebc versions: Upgrade to Cloud Hypervisor v17.0
Highlights from the Cloud Hypervisor release v17.0: 1) ARM64 NUMA
support using ACPI; 2) `Seccomp` support for MSHV backend; 3) Hotplug of
macvtap devices; 4) Improved SGX support; 5) Inflight tracking for
`vhost-user` devices; 6) Bug fixes.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v17.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #2333

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-27 11:56:29 -07:00
Jianyong Wu
77604de80b qemu/arm: remove nvdimm/"ReadOnly" option on arm64
There is a new "ReadOnly" option added to nvdimm device in qemu
and now added to kata. However, qemu used for arm64 is a little
old and has no this feature. Here we remove this feature for arm.

Fixes: #2320
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-07-27 20:32:55 +08:00
Gabriela Cervantes
4fbae549e4 docs: Update experimental documentation
This PR updates the experimental documentation with the proper reference
to kata 2.x

Fixes #2317

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2021-07-26 20:29:21 +00:00
Fabiano Fidêncio
116c29c897 cgroups: manager's Set() now takes Resources as its parameter
Pior our bump to runc 1.0.1 the manager's Set() would take a Config as
its parameter.  Now it takes the Resources directly.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-07-26 11:34:27 +02:00
Fabiano Fidêncio
c0f801c0c4 rootless: RunningInUserNS() is now part of userns namespace
Previously part of the "system" namespace, the RunningInUserNS() has
been moved to the "userns" namespace.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-07-26 11:34:23 +02:00
Julio Montes
2859600a6f runtime: virtcontainers: make rootfs image read-only
Improve security by making rootfs image read-only, nobody
will be able to modify it from the guest.

fixes #1916

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-23 13:20:42 -05:00
Julio Montes
aec530904b runtime: virtcontainers/utils: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
1e4f7faa77 runtime: virtcontainers/types: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
bb9495c0b7 runtime: virtcontainers/pkg: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
80ab91ac2f runtime: virtcontainers/persist: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
54bdd01811 runtime: virtcontainers/factory: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
dd58de368d runtime: virtcontainers/device: fix govet fieldalignment
Fix structures alignment

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Julio Montes
47d95dc1c6 runtime: virtcontainers: fix govet fieldalignment
Fix structures alignment

fixes #2271

Depends-on: github.com/kata-containers/tests#3727

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-07-20 11:59:15 -05:00
Hui Zhu
cb6b7667cd runtime: Add option "enable_guest_swap" to config hypervisor.qemu
This commit add option "enable_guest_swap" to config hypervisor.qemu.
It will enable swap in the guest. Default false.
When enable_guest_swap is enabled, insert a raw file to the guest as the
swap device if the swappiness of a container (set by annotation
"io.katacontainers.container.resource.swappiness") is bigger than 0.
The size of the swap device should be
swap_in_bytes (set by annotation
"io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
If swap_in_bytes and memory_limit_in_bytes is not set, the size should be
default_memory.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:22:06 +08:00
Hui Zhu
a733f537e5 runtime: newContainer: Handle the annotations of SWAP
This commit add code to handle the annotations
"io.katacontainers.container.resource.swappiness" and
"io.katacontainers.container.resource.swap_in_bytes".
It will set the value of "io.katacontainers.resource.swappiness" to
c.config.Resources.Memory.Swappiness and set the value of
"io.katacontainers.resource.swap_in_bytes" to
c.config.Resources.Memory.Swap.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:46 +08:00
Hui Zhu
2c835b60ed ContainerConfig: Set ocispec.Annotations to containerConfig.Annotations
ocispec.Annotations is dropped in ContainerConfig.
This commit let it to be set to containerConfig.Annotations in
ContainerConfig.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:43 +08:00
Hui Zhu
243d4b8689 runtime: Sandbox: Add addSwap and removeSwap
addSwap will create a swap file, hotplug it to hypervisor as a special
block device and let agent to setup it in the guest kernel.
removeSwap will remove the swap file.

Just QEMU support addSwap.

Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:40 +08:00
Hui Zhu
e1b91986d7 runtime: Update golang proto code for AddSwap
Fixes: #2201

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-07-19 23:20:37 +08:00
Fabiano Fidêncio
11d84cca46
Merge pull request #2229 from lifupan/fix_virtiofsd
virtiofsd: fix the issue of missing stop virtiofsd
2021-07-19 13:34:59 +02:00
Fabiano Fidêncio
3a9ecbcca5
Merge pull request #2231 from liubin/fix/2230-register-defer-callback-at-early-stage
runtime: Register defer function at early stage
2021-07-14 17:50:48 +02:00
fupan.lfp
34828df9a1 virtiofsd: fix the issue of missing stop virtiofsd
The virtiofsd's PID wan't assigned the right pid,
which will result skipping kill it.

Fixes: #2228

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-07-14 21:07:10 +08:00
bin
39546a1070 runtime: delete not used functions
Delete some not used functions in sandbox.go

Fixes: #2230

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 19:42:50 +08:00
bin
d0bc148fe0 runtime: Register defer function at early stage
Register defer function at early stage ensure that
it can be called if the startSandbox fails.

Fixes: #2230

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 17:20:53 +08:00
bin
350acb2d6e virtcontainers: refactoring code for error handling in sandbox
Use a defined error variable replade inplace error, and shortcut
for handling errors returned from function calls.

Fixes: #2187

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 14:28:58 +08:00
bin
858f39ef75 virtcontainers: update wrong comments for code
Some comments/URL are old or wrong, update them
to the correct ones.

Fixes: #2187

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 14:28:57 +08:00
bin
e0a19f6a16 virtcontainers: update API documentation
Some functions add context as its first parameter,
the documentation should update.

Fixes: #2187

Signed-off-by: bin <bin@hyper.sh>
2021-07-14 14:28:57 +08:00
Eric Ernst
feeb1ef8b1
Merge pull request #2212 from lifupan/fix_virtiofsd
qemu: stop the virtiofsd specifically
2021-07-12 13:56:04 -07:00
Chelsea Mafrica
61b1a6732b
Merge pull request #2179 from bporter816/bporter816/refactor-tracing
tracing: Consolidate tracing into a new katatrace package
2021-07-12 12:42:01 -04:00
bin
9081bee2fd runtime: return error if clh's binary has not a normal stat
When checking clh's binary path if valid, return error even
though the error is not a IsNotExist error.

And add errors to log filed when errors occurred.

Fixes: #2208

Signed-off-by: bin <bin@hyper.sh>
2021-07-12 11:16:35 +08:00
Benjamin Porter
b10e3e22b5
tracing: Consolidate tracing into a new katatrace package
Removes custom trace functions defined across the repo and creates
a single trace function in a new katatrace package. Also moves
span tag management into this package and provides a function to
dynamically add a tag at runtime, such as a container id, etc.

Fixes #1162

Signed-off-by: Benjamin Porter <bporter816@gmail.com>
2021-07-11 14:19:51 -05:00
fupan.lfp
8f76626fd6 qemu: stop the virtiofsd specifically
We'd better stop the virtiofsd specifically after stop qemu,
instead of depending on the qemu's termination to notify virtiofsd
to exit.

Fixes: #2211

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-07-10 17:26:19 +08:00
Fabiano Fidêncio
305fb0547d virtcontainers: Fix gosimple issue on client.go
For some reason our static check started to get opinionated about code
that's been there for ages.

One of the suggestions is to improve:
```
INFO: Running golangci-lint on /home/fidencio/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/agent/protocols/client
client.go:431:2: S1017: should replace this `if` statement with an unconditional `strings.TrimPrefix` (gosimple)
	if strings.HasPrefix(sock, "mock:") {
```

And that's what this PR is about.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-07-09 17:18:08 +02:00
Fabiano Fidêncio
89cf168c92 virtcontainers: Ignore a staticcheck error on cpuset.go
First of all, cpuset.go just comes from kubernetes and we shouldn't be
doing much with this file apart from updating it every now and then
(but that's material for another PR).

Right now, due to some change on the static checks we use as part of our
CI, we started getting issues as:
```
INFO: Running golangci-lint on /home/fidencio/go/src/github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/cpuset
cpuset.go:60:2: SA4005: ineffective assignment to field Builder.done (staticcheck)
	b.done = true
```

For those, let's just ignore the lint and move on.

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
2021-07-09 17:17:12 +02:00
Jakob Naucke
cfd690b638
virtcontainers: Use virtio-blk-ccw on s390x
if virtio-blk-pci were to be used

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-07-08 14:59:47 +02:00
Bo Chen
d08603bebb runtime: Remove the version check for cloud hypervisor
It looks like the version check for cloud hypervisor (clh) was added
initially when clh was actively evolving its API. We no longer need the
version check as clh API has been fairly stable for its recent releases.

Fixes: #1991

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-06 18:42:59 -07:00
Tim Zhang
3f1aa8ff91
Merge pull request #2084 from liubin/fix/2082-refactor-vc-pkg-oci
runtime: refact virtcontainers/pkg/oci
2021-07-06 19:14:10 +08:00
Fupan Li
2de9c5b41d
Merge pull request #1969 from liubin/feature/1968-pass-span-context-to-agent
Pass span context from runtime to agent to get a full trace #1968
2021-07-03 09:31:02 +08:00
bin
bd5951247c runtime: add spans and attributes for agent/mount
Add more spans and attributes for agent setup, add devices,
and mount volumes.

Fixes: #1968

Signed-off-by: bin <bin@hyper.sh>
2021-07-02 10:07:28 +08:00
bin
ae46e7bf97 runtime: pass span context to agent in ttRPC client
Pass span context through ttRPC metadata, that
agent can get parent from the context to create
new sub-spans.

Fixes: #1968

Signed-off-by: bin <bin@hyper.sh>
2021-07-02 10:07:14 +08:00
bin
66dd8719e3 runtime: refact virtcontainers/pkg/oci
Use common functions wrapping logic of getting values
from annotations, parsing bool/uint32/uint64 and setting
to struct fields.

Fixes: #2082

Signed-off-by: bin <bin@hyper.sh>
2021-07-01 10:14:47 +08:00
fupan.lfp
d671f78952 agent: fix the issue of convert OCI spec to RPC spec
Since the rpc spec used an interface to represen the ErrnoRet,
thus the transform function of OCItoGRPC should take care of
this case.

Depends-on: github.com/kata-containers/tests#3629

Fixes: #1441

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-06-30 22:56:59 +08:00
fupan.lfp
f607641a6e shimv2: fix the issue bring by updating containerd vendor
Fix the mismatch bring by the upgrading of vendor of  containerd,
cgroup and runtime spec.

Fixes: #1441

Signed-off-by: fupan.lfp <fupan.lfp@antgroup.com>
2021-06-30 22:56:51 +08:00
Eric Ernst
d0ad388721
Merge pull request #2065 from ManaSugi/format-golang-proto
runtime: Format golang proto code
2021-06-30 11:08:57 -07:00
Fabiano Fidêncio
a8bb8269fe
Merge pull request #2047 from Jakob-Naucke/s390x-skip-hotplug
virtcontainers: Don't fail memory hotplug
2021-06-28 08:18:31 +02:00
Eric Ernst
064dfb164b runtime: Add "watchable-mounts" concept for inotify support
To workaround virtiofs' lack of inotify support, we'll special case
particular mounts which are typically watched, and pass on information
to the agent so it can ensure that the mount presented to the container
is indeed watchable (see applicable agent commit).

This commit will:
 - identify watchable mounts based on file count and mount source
 - create a watchable-bind storage object for these mounts to
   communicate intent to the agent
 - update the OCI spec to take the updated watchable mount source into account

Unit tests added and updated for the newly introduced
functionality/functions.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-06-24 10:07:06 -07:00
Eric Ernst
57c0cee0a5 runtime: Cleanup mountSharedDirMounts, shareFile parameters
There's no reason to pass the paths; they can be
determined when they are actually used.

Let's make the return values more comparable to the other mount handling
functions (we'll add storage object in future commit), and pass the mount maps as
function parameters.

...No functional changes here...

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-06-24 10:07:06 -07:00
Jakob Naucke
8310a3d70a
virtcontainers: Don't fail memory hotplug
Architectures that do not support memory hotplugging will fail when
memory limits are set because that amount is hotplugged. Issue a warning
instead. The long-term solution is virtio-mem.

Fixes: #1412
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-06-24 10:58:06 +02:00
David Gibson
c0cc6d5978
Merge pull request #1954 from marcel-apf/remove-pc
Remove the pc machine
2021-06-23 12:00:05 +10:00
Marcel Apfelbaum
ac6b9c53d2 runtime: Hot-plug virtio-mem device on PCI bridge
Currently the virtio-mem device is hotplugged on the root bus.
This doesn't work for PCIe machines like q35.

Hotplug the virtio-mem device into the pci bridge instead.

Fixes #1953
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2021-06-22 12:34:48 +03:00
Marcel Apfelbaum
789a59549e virtcontainers: Remove the pc machine
Keeping around two different x86 machines has no added value
and require more tests and maintenance. Prefer the q35 machine
since it has more features and drop the pc machine.

Fixes #1953
Depends-on: github.com/kata-containers/tests#3586
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2021-06-22 11:54:07 +03:00
Manabu Sugimoto
caf5760c45 runtime: Update golang proto code
We should update golang proto files.
These changes are updated using libprotoc v3.6.1.

Fixes: #2064

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-06-19 18:53:56 +09:00
Julio Montes
ecdd137c6f runtime: do not hot-remove PMEM devices
PMEM devices cannot be hot-removed from a running VM.

fixes #2018

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-18 09:02:03 -05:00
Tim Zhang
90029032b4
Merge pull request #2049 from liubin/2048/fix-log-field
runtime: using detail propertites instead of function name in log field
2021-06-17 10:53:12 +08:00
bin
2022c64f94 runtime: using detail propertites instead of function name in log field
To print the correct value of kernel parameters, the log field
value should not be a function name. And for that qemuArchBase
doesn't contain debug flag, so the log contains debug/non-debug
parameters.

Fixes: #2048

Signed-off-by: bin <bin@hyper.sh>
2021-06-17 00:17:16 +08:00
Julio Montes
361bee91f7 runtime/virtcontrainers: fix alignment structures
fix alignment of qemuArchBase and HypervisorConfig structures

Signed-off-by: Julio Montes <julio.montes@intel.com>
2021-06-16 07:16:49 -05:00