Commit Graph

70 Commits

Author SHA1 Message Date
Julio Montes
ed7240b40f virtcontainers: move device operations to a more generic place
move device operations to a more generic place where they can be used
in any hypervisor implementation.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-10-02 16:56:28 +00:00
Gabi Beyer
5f0799f1b7 vc: add rootless dir to path variables
Modify some path variables to be functions that return the path
with the rootless directory prefix if running rootlessly.

Fixes: #1827

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-09-26 16:17:16 +02:00
Julio Montes
5ac6e9a897 virtcontainers: make socket generation hypervisor specific
Kata support several hypervisor and not all hypervisor support the
same type of sockets, for example QEMU support vsock and unix sockets, while
firecracker only support hybrid vsocks, hence sockets generations should be
hypervisor specific

fixes #2027

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-09-19 19:39:07 +00:00
Julio Montes
880bb2b7b8 virtcontainers: introducing HybridVSock type
This new socket type is currently supported only by the firecracker hypervisor.
For more details about its internal implementation see:
https://github.com/firecracker-microvm/firecracker/blob/master/docs/vsock.md

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-09-19 11:25:11 -05:00
GabyCT
5ff0ef9377 Merge pull request #1971 from renzhengeek/renzhen/virtio-fs-dev
virtio-fs: add virtio_fs_extra_args for virtiofsd
2019-09-09 09:33:28 -05:00
Eric Ren
712e06ae84 virtio-fs: add virtio_fs_extra_args for virtiofsd
Since virtio-fs is under active development, more
options will be added increasingly. To avaoid frequent
change on runtime side to handle option changes, use
one mingled arg to ease testing new option/feature of
virtiofsd.

See `virtiofsd -h` for more option details.

Fixes: #1999
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
2019-08-24 09:16:38 +08:00
Peng Tao
0075bf85ba hypervisor: allow to return a slice of pids
so that for qemu, we can save and export virtiofsd pid,
and put it to the same cgroup as the qemu process.

Fixes: #1972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-08-21 11:37:01 +08:00
Peng Tao
6c77d76f24 qemu: check guest status with qmp query-status
When guest panics or stops with unexpected internal
error, qemu process might still be running but we can
find out such situation with qmp. Then monitor can still
report such failures to watchers.

Fixes: #1963
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-08-16 12:58:25 +00:00
Wei Zhang
7d5e48f1b5 persist: manage "hypervisor.json" with new store
Fixes #803

Merge "hypervisor.json" into "persist.json", so the new store can take
care of hypervisor data now.

Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
2019-07-23 17:09:11 +08:00
Manohar Castelino
78ea50c36c virtcontainers: Jailer: Add jailer support for firecracker
Firecracker provides a jailer to constrain the VMM. Use this
jailer to launch the firecracker VMM instead of launching it
directly from the kata-runtime.

The jailer will ensure that the firecracker VMM will run
in its own network and mount namespace. All assets required
by the VMM have to be present within these namespaces.
The assets need to be copied or bind mounted into the chroot
location setup by jailer in order for firecracker to access
these resouces. This includes files, device nodes and all
other assets.

Jailer automatically sets up the jail to have access to
kvm and vhost-vsock.

If a jailer is not available (i.e. not setup in the toml)
for a given hypervisor the runtime will act as the jailer.

Also enhance the hypervisor interface and unit tests to
include the network namespace. This allows the hypervisor
to choose how and where to lauch the VMM process, vs
virtcontainers directly launching the VMM process.

Fixes: #1129

Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
2019-07-11 21:32:36 +00:00
Vijay Dhanraj
d9a4157841 virtcontainers: Add support for launching/managing ACRN based VMs
This patch adds the following,
1. Implement Sandbox management APIs for ACRN.
2. Implement Sandbox operation APIs for ACRN.
3. Add support for hot-plugging virtio-blk based
(using blk rescan feature) container rootfs to ACRN.
4. Prime devices, image and kernel parameters for
launching VM using ACRN.

v2->v3:
Incrementing index to keep track of virtio-blk devices
created. This change removes the workaround introduced
in block.go.

v1->v2:
1. Created issue #1785 to address the UUID TODO item.
2. Removed dead code.
3. Fixed formatting of log messages.
4. Fixed year in copyright message.
5. Removed acrn_amd64.go file as there are no amd64 specific
   changes. Moved the code to acrn_arch_base.go.

Fixes: #1778

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
2019-07-10 10:49:24 -07:00
Vijay Dhanraj
828e0a2205 pkg/katautils: Add support for ACRN hypervisor config
This patch adds support for,
1. Extracting and configuring ACRN hypervisor from toml.
2. Add ACRN hypervisor ctl for controlling ACRN hypervisor.
This will be used for updating virtio-blk based
container rootfs using blk rescan feature.

v2->v3:
Fixed acrnctl path.

v1->v2:
Trimmed hypervisor config options as needed by ACRN.

Fixes: #1778

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
2019-07-10 10:49:24 -07:00
Penny Zheng
7e6fcddefa kernelRootParams: define agnostic commonkernelRootParams
Let's define agnostic commonkernelRootParams for all hypervisors,
including qemu, firecracker, etc. for now, it has two scenarios,
one for NVDIMM, one for virtio-blk.

Fixes: #1642

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-05-29 15:12:56 +08:00
Ganesh Maharaj Mahalingam
a41894da18 runtime: Enable file based backend
A file based memory backend mapped to the host, fot eg: '/dev/shm' will
be used by virtio-fs for performance reasons. This change is a generic
implementation of that for kata. This will be enabled default for
virtio-fs negating the need to enable hugepages in that scenario. This
option can be used without virtio-fs by setting 'file_mem_backend' to
the location in the configuration file. Default value is an empty
string.

Fixes: #1656
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2019-05-23 20:47:42 -07:00
Dr. David Alan Gilbert
75f75862c2 virtiofs: Add cache option
Several cache modes are supported by virtio-fs.  They affect the
performance and consistency characteristics of the file system.

For the time being cache="none" is recommended, but the other modes can
be experimented with.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-05-05 11:32:34 -06:00
Dr. David Alan Gilbert
6767c1a358 virtiofs: Add cache size option
Add VirtioFSCacheSize aka virtio_fs_cache_size option
to set the size (in MiB) of the DAX cache.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-05-05 11:32:34 -06:00
Stefan Hajnoczi
d690dff164 config: add virtio_fs_daemon string
Add a config option for the virtio-fs vhost-user daemon path.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-01 10:55:31 -04:00
Stefan Hajnoczi
9e87fa21cf config: add shared_fs option
Add a config option to select between virtio-9p and virtiofs.  This
option currently has no effect and will be used in a later patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-01 10:55:31 -04:00
Hui Zhu
16fe8553af qemu: Remove the storage directories if qemu get from the factory
Store related in directory /var/lib/vc/sbs and /run/vc/sbs if
vm template is enabled.
The cause is NewVM and NewVMFromGrpc will create vcStore with
VM's ID and set it as store of hypervisor if the factory is enabled.

This commit record the VM's ID to HypervisorConfig.VMid and remove
directories in qemu.cleanupVM to handle the issue.

Fixes: #1452

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2019-04-10 11:11:45 +08:00
Penny Zheng
47670fcf73 memoryDevice: reconstruct memoryDevice
If kata-runtime supports memory hotplug via probe interface, we need to reconstruct
memoryDevice to store relevant status, which are addr and probe. addr specifies the
physical address of the memory device, and probe determines it is hotplugged via
acpi-driven or probe interface.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:20 +08:00
Peng Tao
6fda03ec92 hypervisor: make getThreadIDs return vcpu to threadid mapping
We need such mapping information to put vcpus in container cpuset properly.

Fixes: #1435

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-02 15:51:27 +08:00
Ganesh Maharaj Mahalingam
f4428761cb lint: Update go linter from gometalinter to golangci-lint.
gometalinter is deprecated and will be archived April '19. The
suggestion is to switch to golangci-lint which is apparently 5x faster
than gometalinter.

Partially Fixes: #1377

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2019-03-25 08:48:13 -07:00
Hui Zhu
90704c8bb6 VMCache: the core and the client
VMCache is a new function that creates VMs as caches before using it.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through Unix socket.  The protocol is gRPC in protocols/cache/cache.proto.
The VMCache server will create some VMs and cache them by factory cache.
It will convert the VM to gRPC format and transport it when gets
requestion from clients.
Factory grpccache is the VMCache client.  It will request gRPC format
VM and convert it back to a VM.  If VMCache function is enabled,
kata-runtime will request VM from factory grpccache when it creates
a new sandbox.

VMCache has two options.
vm_cache_number specifies the number of caches of VMCache:
unspecified or == 0   --> VMCache is disabled
> 0                   --> will be set to the specified number
vm_cache_endpoint specifies the address of the Unix socket.

This commit just includes the core and the client of VMCache.

Currently, VM cache still cannot work with VM templating and vsock.
And just support qemu.

Fixes: #52

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2019-03-08 10:05:59 +08:00
Julio Montes
a1c85902f6 virtcontainers: add method to get hypervisor PID
hypervisor PID can be used to move the whole process and its
threads into a new cgroup.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-13 18:01:14 -06:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Samuel Ortiz
b25f43e865 virtcontainers: Add Capabilities to the types package
In order to move the hypervisor implementations into their own package,
we need to put the capabilities type into the types package.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
67e696bf62 virtcontainers: Add Asset to the types package
In order to move the hypervisor implementations into their own package,
we need to put the asset type into the types package and break the
hypervisor->asset->virtcontainers->hypervisor cyclic dependency.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
cf22f402d8 virtcontainers: Remove the hypervisor waitSandbox method
We always call waitSandbox after we start the VM (startSandbox), so
let's simplify the hypervisor interface and integrate waiting for the VM
into startSandbox.
This makes startSandbox a blocking call, but that is practically the
case today.

Fixes: #1009

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:38:33 +01:00
Samuel Ortiz
763bf18daa virtcontainers: Remove the hypervisor init method
We always combine the hypervisor init and createSandbox, because what
we're trying to do is simply that: Set the hypervisor and have it create
a sandbox.

Instead of keeping a method with vague semantics, remove init and
integrate the actual hypervisor setup phase into the createSandbox one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:37:20 +01:00
Hui Zhu
dd28ff5986 memory: Add new option memory_offset
This value will be plused to max memory of hypervisor.
It is the memory address space for the NVDIMM devie.
If set block storage driver (block_device_driver) to "nvdimm",
should set memory_offset to the size of block device.

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2018-12-24 15:36:25 +08:00
Peng Tao
bf1a5ce000 sandbox: cleanup sandbox if creation failed
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.

Fixes: #1057

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-12-21 13:46:16 +08:00
Sebastien Boeuf
e14071f2bd Merge pull request #1045 from mcastelino/topic/firecracker-virtio-mmio
Firecracker: virtio mmio support
2018-12-20 19:47:01 -08:00
Manohar Castelino
0d84d799ea virtio-mmio: Add support for virtio-mmio
Start adding support for virtio-mmio devices starting with block.
The devices show within the vm as vda, vdb,... based on order of
insertion and such within the VM resemble virtio-blk devices.

They need to be explicitly differentiated to ensure that the
agent logic within the VM can discover and mount them appropropriately.
The agent uses PCI location to discover them for virtio-blk.
For virtio-mmio we need to use the predicted device name for now.

Note: Kata used a disk for the VM rootfs in the case of Firecracker.
(Instead of initrd or virtual-nvdimm). The Kata code today does not
handle this case properly.

For now as Firecracker is the only Hypervisor in Kata that
uses virtio-mmio directly offset the drive index to comprehend
this.

Longer term we should track if the rootfs is setup as a block
device explicitly.

Fixes: #1046

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
2018-12-20 15:08:51 -08:00
Manohar Castelino
e65bafa793 virtcontainers: Add firecracker as a supported hypervisor
Add firecracker as a supported hypervisor. This connects the
newly defined firecracker implementation as a supported
hypervisor.

Move operation definition to the common hypervisor code.

Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
2018-12-20 11:54:59 -08:00
Jose Carlos Venegas Munoz
618cfbf1db vc: sandbox: Let sandbox manage VM resources.
- Container only is responsable of namespaces and cgroups
inside the VM.

- Sandbox will manage VM resources.

The resouces has to be re-calculated and updated:

- Create new Container: If a new container is created the cpus and memory
may be updated.

- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.

To manage the resources from sandbox the hypervisor interaface adds two methods.

- resizeMemory().

This function will be used by the sandbox to request
increase or decrease the VM memory.

- resizeCPUs()

vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.

The CPUs calculations use the container cgroup information all the time.

This should allow do better calculations.

For example.

2 containers in a pod.

container 1 cpus = .5
container 2 cpus = .5

Now:
Sandbox requested vcpus 1

Before:
Sandbox requested vcpus 2

When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.

If we would updated them the sandbox calculations would remove already
removed vcpus.

This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.

Fixes: #833

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-12-13 16:33:14 -06:00
Julio Montes
976f5b2a6e Merge pull request #990 from alicefr/s390x
s390x: add support for s390x
2018-12-11 10:57:27 -06:00
Alice Frosi
6f83061139 s390x: add support for s390x
The PR adds the support for s390x.

In the case of CCW devices, the vhost-user devices are not supported.
See #659. An error message is thrown if they tried to be used.

Memory hotplug is not supported on s390 yet and an error message is thrown.

The VirtioNetPCI has been changed to VirtioNet. The generalization
allows to set the VirtioNet to the correct CCW device for s390x.

Fixes: #666

Co-authored-by: Yash D Jain ydjainopensource@gmail.com
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
2018-12-11 12:32:17 +01:00
Hui Zhu
f6511471d4 block: Add cache-related options for block devices
Add block_device_cache_set, block_device_cache_direct and
block_device_cache_noflush.
They are cache-related options for block devices that are described in
https://github.com/qemu/qemu/blob/master/qapi/block-core.json.
block_device_cache_direct denotes whether use of O_DIRECT (bypass the host
page cache) is enabled.  block_device_cache_noflush denotes whether flush
requests for the device are ignored.
The json said they are supported since 2.9.
So add block_device_cache_set to control the cache options set to block
devices or not.  It will help to support the old version qemu.

Fixes: #956

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2018-12-06 18:07:44 +08:00
Felix Abecassis
33abb3ecf8 cli: add guest hook path option in the configuration file
Add support for specifying an optional drop-in path for guest OCI hooks.
This is the runtime side for leveraging the agent change introduced in
kata-containers/agent@980023ec62

Fixes: #720

Co-authored-by: Edward Guzman <eguzman@nvidia.com>
Co-authored-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
2018-10-29 13:06:22 -07:00
Wei Zhang
34fe3b9d6d cgroups: add host cgroup support
Fixes #344

Add host cgroup support for kata.

This commits only adds cpu.cfs_period and cpu.cfs_quota support.

It will create 3-level hierarchy, take "cpu" cgroup as an example:

```
/sys/fs/cgroup
|---cpu
   |---kata
      |---<sandbox-id>
         |--vcpu
      |---<sandbox-id>
```

* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
2018-10-27 09:41:35 +08:00
Jose Carlos Venegas Munoz
41619e4f83 vc: qemu: Add option to change entropy source
This adds a config option to choose the VM entropy
source.

Fixes: #702

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-09-25 17:54:32 -05:00
Jose Carlos Venegas Munoz
19801bf784 config: Add Memory slots configuration.
Add configuration to decide the amount of slots that will be used in a VM

- This will limit the amount of times that memory can be hotplugged.
- Use memory slots provided by user.
- tests: aling struct

cli: kata-env: Add memory slots info.

- Show the slots to be added to the VM.

```diff
[Hypervisor]
  MachineType = "pc"
  Version = "QEMU ..."
  Path = "/opt/kata/bin/qemu-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  Msize9p = 8192
+  MemorySlots = 10
  Debug = false
  UseVSock = false
```

Fixes: #751

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-09-21 10:57:00 -05:00
Ruidong
225e10cfc4 cli: add configuration option to enable/disable vhost_net
Add `disable_vhost_net` option to enable or disable the use of
vhost_net. Vhost_net can improve network performance.

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-09-14 00:14:03 +08:00
Archana Shinde
2f552fbf43 hypervisor: Add hypervisor interface to return config
This api will allow the config to be accessed by other subsystems
such as network.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-09-12 12:02:15 -07:00
Peng Tao
a1537a5271 hypervisor: rename DefaultVCPUs and DefaultMemSz
Now that we only use hypervisor config to set them, they
are not overridden by other configs. So drop the default prefix.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-09-06 21:04:56 +08:00
Peng Tao
ce288652d5 virtcontainers: remove sandboxConfig.VMConfig
We can just use hyprvisor config to specify the memory size
of a guest. There is no need to maintain the extra place just
for memory size.

Fixes: #692

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-09-06 14:15:56 +08:00
James O. D. Hunt
d0679a6fd1 tracing: Add tracing support to virtcontainers
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.

Note that not every function is traced; we can add more traces as
desired.

Fixes #566.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
Archana Shinde
31e2925a9a vfio: Add configuration to support VFIO hotplug on root bus
We need this configuration due to a limitation in seabios
firmware in handling hotplug for PCI devices with large BARS.
Long term, this needs to be fixed in the firmware.

Fixes #594

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-08-20 11:36:21 -07:00
Julio Montes
33643797ad virtcontainers: Use vsock if host support it
When the hypervisor option `use_vsock` is true the runtime will check for vsock
support. If vsock is supported, not proxy will be used and the shims
will connect to the VM using VSOCKS. This flag is true by default, so will use
VSOCK when possible and no proxy will be started.

fixes #383

Signed-off-by: Jose Carlos Venegas Munoz jose.carlos.venegas.munoz@intel.com
Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-07-31 15:38:45 -05:00
Julio Montes
4680e58e08 cli: add configuration option to enable/disable vsocks
Add `use_vsock` option to enable or disable the use of vsocks
for communication between host and guest.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-07-31 13:52:43 -05:00