Commit Graph

30 Commits

Author SHA1 Message Date
Julio Montes
5ac6e9a897 virtcontainers: make socket generation hypervisor specific
Kata support several hypervisor and not all hypervisor support the
same type of sockets, for example QEMU support vsock and unix sockets, while
firecracker only support hybrid vsocks, hence sockets generations should be
hypervisor specific

fixes #2027

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-09-19 19:39:07 +00:00
Peng Tao
0075bf85ba hypervisor: allow to return a slice of pids
so that for qemu, we can save and export virtiofsd pid,
and put it to the same cgroup as the qemu process.

Fixes: #1972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-08-21 11:37:01 +08:00
Peng Tao
6c77d76f24 qemu: check guest status with qmp query-status
When guest panics or stops with unexpected internal
error, qemu process might still be running but we can
find out such situation with qmp. Then monitor can still
report such failures to watchers.

Fixes: #1963
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-08-16 12:58:25 +00:00
Wei Zhang
7d5e48f1b5 persist: manage "hypervisor.json" with new store
Fixes #803

Merge "hypervisor.json" into "persist.json", so the new store can take
care of hypervisor data now.

Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
2019-07-23 17:09:11 +08:00
Manohar Castelino
78ea50c36c virtcontainers: Jailer: Add jailer support for firecracker
Firecracker provides a jailer to constrain the VMM. Use this
jailer to launch the firecracker VMM instead of launching it
directly from the kata-runtime.

The jailer will ensure that the firecracker VMM will run
in its own network and mount namespace. All assets required
by the VMM have to be present within these namespaces.
The assets need to be copied or bind mounted into the chroot
location setup by jailer in order for firecracker to access
these resouces. This includes files, device nodes and all
other assets.

Jailer automatically sets up the jail to have access to
kvm and vhost-vsock.

If a jailer is not available (i.e. not setup in the toml)
for a given hypervisor the runtime will act as the jailer.

Also enhance the hypervisor interface and unit tests to
include the network namespace. This allows the hypervisor
to choose how and where to lauch the VMM process, vs
virtcontainers directly launching the VMM process.

Fixes: #1129

Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
2019-07-11 21:32:36 +00:00
Penny Zheng
47670fcf73 memoryDevice: reconstruct memoryDevice
If kata-runtime supports memory hotplug via probe interface, we need to reconstruct
memoryDevice to store relevant status, which are addr and probe. addr specifies the
physical address of the memory device, and probe determines it is hotplugged via
acpi-driven or probe interface.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:20 +08:00
Peng Tao
6fda03ec92 hypervisor: make getThreadIDs return vcpu to threadid mapping
We need such mapping information to put vcpus in container cpuset properly.

Fixes: #1435

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-02 15:51:27 +08:00
Hui Zhu
90704c8bb6 VMCache: the core and the client
VMCache is a new function that creates VMs as caches before using it.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through Unix socket.  The protocol is gRPC in protocols/cache/cache.proto.
The VMCache server will create some VMs and cache them by factory cache.
It will convert the VM to gRPC format and transport it when gets
requestion from clients.
Factory grpccache is the VMCache client.  It will request gRPC format
VM and convert it back to a VM.  If VMCache function is enabled,
kata-runtime will request VM from factory grpccache when it creates
a new sandbox.

VMCache has two options.
vm_cache_number specifies the number of caches of VMCache:
unspecified or == 0   --> VMCache is disabled
> 0                   --> will be set to the specified number
vm_cache_endpoint specifies the address of the Unix socket.

This commit just includes the core and the client of VMCache.

Currently, VM cache still cannot work with VM templating and vsock.
And just support qemu.

Fixes: #52

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2019-03-08 10:05:59 +08:00
Julio Montes
a1c85902f6 virtcontainers: add method to get hypervisor PID
hypervisor PID can be used to move the whole process and its
threads into a new cgroup.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-13 18:01:14 -06:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Samuel Ortiz
b25f43e865 virtcontainers: Add Capabilities to the types package
In order to move the hypervisor implementations into their own package,
we need to put the capabilities type into the types package.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
cf22f402d8 virtcontainers: Remove the hypervisor waitSandbox method
We always call waitSandbox after we start the VM (startSandbox), so
let's simplify the hypervisor interface and integrate waiting for the VM
into startSandbox.
This makes startSandbox a blocking call, but that is practically the
case today.

Fixes: #1009

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:38:33 +01:00
Samuel Ortiz
763bf18daa virtcontainers: Remove the hypervisor init method
We always combine the hypervisor init and createSandbox, because what
we're trying to do is simply that: Set the hypervisor and have it create
a sandbox.

Instead of keeping a method with vague semantics, remove init and
integrate the actual hypervisor setup phase into the createSandbox one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:37:20 +01:00
Peng Tao
bf1a5ce000 sandbox: cleanup sandbox if creation failed
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.

Fixes: #1057

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-12-21 13:46:16 +08:00
Jose Carlos Venegas Munoz
618cfbf1db vc: sandbox: Let sandbox manage VM resources.
- Container only is responsable of namespaces and cgroups
inside the VM.

- Sandbox will manage VM resources.

The resouces has to be re-calculated and updated:

- Create new Container: If a new container is created the cpus and memory
may be updated.

- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.

To manage the resources from sandbox the hypervisor interaface adds two methods.

- resizeMemory().

This function will be used by the sandbox to request
increase or decrease the VM memory.

- resizeCPUs()

vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.

The CPUs calculations use the container cgroup information all the time.

This should allow do better calculations.

For example.

2 containers in a pod.

container 1 cpus = .5
container 2 cpus = .5

Now:
Sandbox requested vcpus 1

Before:
Sandbox requested vcpus 2

When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.

If we would updated them the sandbox calculations would remove already
removed vcpus.

This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.

Fixes: #833

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-12-13 16:33:14 -06:00
Wei Zhang
34fe3b9d6d cgroups: add host cgroup support
Fixes #344

Add host cgroup support for kata.

This commits only adds cpu.cfs_period and cpu.cfs_quota support.

It will create 3-level hierarchy, take "cpu" cgroup as an example:

```
/sys/fs/cgroup
|---cpu
   |---kata
      |---<sandbox-id>
         |--vcpu
      |---<sandbox-id>
```

* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
2018-10-27 09:41:35 +08:00
Zichang Lin
36306e283c sandbox/virtcontainers: modify tests relate to memory hotplug.
Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
2018-10-17 23:01:13 -04:00
Jose Carlos Venegas Munoz
1f5792ecbb test: fix unit test nil pointer.
Add filesystem to qemu object.
Fix mock_hypervisor

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-10-02 15:58:08 -05:00
Archana Shinde
2f552fbf43 hypervisor: Add hypervisor interface to return config
This api will allow the config to be accessed by other subsystems
such as network.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-09-12 12:02:15 -07:00
Peng Tao
ce288652d5 virtcontainers: remove sandboxConfig.VMConfig
We can just use hyprvisor config to specify the memory size
of a guest. There is no need to maintain the extra place just
for memory size.

Fixes: #692

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-09-06 14:15:56 +08:00
James O. D. Hunt
d0679a6fd1 tracing: Add tracing support to virtcontainers
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.

Note that not every function is traced; we can add more traces as
desired.

Fixes #566.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
Peng Tao
7a6f205970 virtcontainers: keep qmp connection when possible
For each time a sandbox structure is created, we ensure s.Release()
is called. Then we can keep the qmp connection as long as Sandbox
pointer is alive.

All VC interfaces are still stateless as s.Release() is called before
each API returns.

OTOH, for VCSandbox APIs, FetchSandbox() must be paired with s.Release,
the same as before.

Fixes: #500

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-23 08:37:55 +08:00
Peng Tao
28b6104710 qemu: prepare for vm templating support
1. support qemu migration save operation
2. setup vm templating parameters per hypervisor config
3. create vm storage path when it does not exist. This can happen when
an empty guest is created without a sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 12:44:58 +08:00
Peng Tao
7f20dd89a3 hypervisor: cleanup valid method
The boolean return value is not necessary.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 10:49:25 +08:00
Peng Tao
18e6a6effc hypervisor: decouple hypervisor from sandbox
A hypervisor implementation does not need to depend on a sandbox
structure. Decouple them in preparation for vm factory.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-19 10:49:25 +08:00
Julio Montes
4527a8066a virtcontainers/qemu: honour CPU constrains
Don't fail if a new container with a CPU constraint was added to
a POD and no more vCPUs are available, instead apply the constraint
and let kernel balance the resources.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-14 17:33:31 -05:00
James O. D. Hunt
bce9edd277 socket: Enforce socket length
A Unix domain socket is limited to 107 usable bytes on Linux. However,
not all code creating socket paths was checking for this limits.

Created a new `utils.BuildSocketPath()` function (with tests) to
encapsulate the logic and updated all code creating sockets to use it.

Fixes #268.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-05-09 11:36:24 +01:00
Graham whaley
d6c3ec864b license: SPDX: update all vc files to use SPDX style
When imported, the vc files carried in the 'full style' apache
license text, but the standard for kata is to use SPDX style.
Update the relevant files to SPDX.

Fixes: #227

Signed-off-by: Graham whaley <graham.whaley@intel.com>
2018-04-18 13:43:15 +01:00
Peng Tao
6107694930 runtime: rename pod to sandbox
As agreed in [the kata containers API
design](https://github.com/kata-containers/documentation/blob/master/design/kata-api-design.md),
we need to rename pod notion to sandbox. The patch is a bit big but the
actual change is done through the script:
```
sed -i -e 's/pod/sandbox/g' -e 's/Pod/Sandbox/g' -e 's/POD/SB/g'
```

The only expections are `pod_sandbox` and `pod_container` annotations,
since we already pushed them to cri shims, we have to use them unchanged.

Fixes: #199

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-13 09:32:51 +08:00
Samuel Ortiz
24eff72d82 virtcontainers: Initial import
This is a virtcontainers 1.0.8 import into Kata Containers runtime.

virtcontainers is a Go library designed to manage hardware virtualized
pods and containers. It is the core Clear Containers framework and will
become the core Kata Containers framework, as discussed at
https://github.com/kata-containers/runtime/issues/33

Some more more pointers:

virtcontainers README, including some design and architecure notes:
https://github.com/containers/virtcontainers/blob/master/README.md

virtcontainers 1.0 API:
https://github.com/containers/virtcontainers/blob/master/documentation/api/1.0/api.md

Fixes #40

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2018-03-13 00:49:46 +01:00