Commit Graph

70 Commits

Author SHA1 Message Date
Yang, Wei
071030b784 shimv2: Close vhostfd after vm get vhostfd
If kata containers is using vfio and vhost net,the unbinding
of vfio would be hang. In the scenario, vhost net kernel thread
takes a reference to the qemu's mm, and the reference also includes
the mmap regions on the vfio device file. so vhost kernel thread
would be not released when qemu is killed as the vhost file
descriptor still is opened by shim v2 process, and the vfio device
is not released because there's still a reference to the mmap.

Fixes: #1669

Signed-off-by: Yang, Wei <w90p710@gmail.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-05-16 13:31:11 +08:00
Wei Zhang
297097779e persist: save/load GuestMemoryHotplugProbe
Support saving/loading `GuestMemoryHotplugProbe` from sandbox state.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-09 14:39:04 +08:00
Wei Zhang
4c192139cf newstore: remove file "devices.json"
When using experimental feature "newstore", we save and load devices
information from `persist.json` instead of `devices.json`, in such case,
file `devices.json` isn't needed anymore, so remove it.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-05-06 14:40:08 +08:00
Salvador Fuentes
bc9b9e2af6 vc: Revert "vc: change container rootfs to be a mount"
This reverts commit 196661bc0d.

Reverting because cri-o with devicemapper started
to fail after this commit was merged.

Fixes: #1574.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2019-04-23 08:56:36 -05:00
Peng Tao
196661bc0d vc: change container rootfs to be a mount
We can use the same data structure to describe both of them.
So that we can handle them similarly.

Fixes: #1566

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-20 00:42:25 -07:00
Wei Zhang
b42fde69c0 persist: demo code for persist api
Demonstrate how to make use of `virtcontainer/persist/api` data structure
package.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-04-19 15:33:53 +08:00
Peng Tao
cf90751638 vc: export vc error types
So that shimv2 can convert it into grpc errors.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2019-04-12 02:01:02 -07:00
Fupan Li
c9a3b933f8 Merge pull request #1427 from Ace-Tang/fix-qemu-leak
qemu: fix qemu leak when failed to start container
2019-04-02 23:32:11 +08:00
lifupan
628ea46c58 virtcontainers: change container's rootfs from string to mount alike struct
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
    RootFs struct {
            // Source specify the BlockDevice path
            Source string
            // Target specify where the rootfs is mounted if it has been mounted
            Target string
            // Type specifies the type of filesystem to mount.
            Type string
            // Options specifies zero or more fstab style mount options.
            Options []string
            // Mounted specifies whether the rootfs has be mounted or not
            Mounted bool
     }

If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.

Fixes:#1158

Signed-off-by: lifupan <lifupan@gmail.com>
2019-04-02 10:54:05 +08:00
Ace-Tang
096fa046f8 qemu: fix qemu leak when failed to start container
do cleanup inside startVM() if start vm get error

Fixes: #1426

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-03-28 19:38:56 +08:00
James O. D. Hunt
c759cf5f37 tracing: Fix tracing
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.

Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.

Fixes #1277.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-03-04 11:02:31 +00:00
James O. D. Hunt
f540a80354 store: Add SetLogger API
Add a `store.SetLogger()` API to allow the store package to log with the
standard set of fields (as expected by the log parser [1].

Fixes #1297.

---

[1] - https://github.com/kata-containers/tests/tree/master/cmd/log-parser#logfile-requirements

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-02-28 10:56:26 +00:00
Julio Montes
5201860bb0 virtcontainers: reimplement sandbox cgroup
All containers run in different cgroups even the sandbox, with this new
implementation the sandbox cpu cgroup wil be equal to the sum of all its
containers and the hypervisor process will be placed there impacting to the
containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.

**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html

```
+=============================================+
|         | 6 threads 6cpus | 1 thread 1 cpu  |
+=============================================+
| current |   40 seconds    |   122 seconds   |
+==============================================
|   new   |   37 seconds    |   124 seconds   |
+==============================================
```

current = current cgroups implementation
new = new cgroups implementation

**workload**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: c-ray
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  restartPolicy: Never
  containers:
  - name: c-ray-1
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 6
  - name: c-ray-2
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 1
```

fixes #1153

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Samuel Ortiz
b05dbe3886 runtime: Convert to the new internal types package
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.

This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.

Fixes: #1095

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:43:33 +01:00
Samuel Ortiz
3ab7d077d1 virtcontainers: Alias for pkg/types
Since we're going to have both external and internal types packages, we
alias the external one as vcTypes. And the internal one will be usable
through the types namespace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:24:06 +01:00
Samuel Ortiz
acf833cb4a virtcontainers: Call agent startSandbox from startVM
We always ask the agent to start the sandbox when we start the VM, so we
should simply call agent.startSandbox from startVM instead of open
coding those.
This slightly simplifies the complex createSandboxFromConfig routine.

Fixes: #1011

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-01-07 09:56:04 -08:00
Samuel Ortiz
ebf8547c38 virtcontainers: Remove useless startSandbox wrapper
startSandbox() wraps a single operation (sandbox.Start()), so we can
remove it and make the code easier to read/follow.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-07 09:48:22 -08:00
Peng Tao
bf1a5ce000 sandbox: cleanup sandbox if creation failed
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.

Fixes: #1057

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-12-21 13:46:16 +08:00
Sebastien Boeuf
fa9b15dafe virtcontainers: Return the appropriate container status
When our runtime is asked for the container status, we also handle
the scenario where the container is stopped if the shim process for
that container on the host has terminated.

In the current implementation, we retrieve the container status
before stopping the container, causing a wrong status to be returned.
The wait for the original go-routine's completion was done in a defer
within the caller of statusContainers(), resulting in the
statusContainer()'s values to return the pre-stopped value.

This bug is first observed when updating to docker v18.09/containerd
v1.2.0. With the current implementation, containerd-shim receives the
TaskExit when it detects kata-shim is terminating. When checking the
container state, however, it does not get the expected "stopped" value.

The following commit resolves the described issue by simplifying the
locking used around the status container calls. Originally
StatusContainer would request a read lock. If we needed to update the
container status in statusContainer, we'd start a go-routine which
would request a read-write lock, waiting for the original read lock to
be released.  Can't imagine a bug could linger in this logic. We now
just request a read-write lock in the caller (StatusContainer),
skipping the need for a separate go-routine and defer. This greatly
simplifies the logic, and removes the original bug.

Fixes #926

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-28 20:10:34 -08:00
Sebastien Boeuf
982381bff0 api: Cleanup StartContainer()
Simple patch reducing the complexity of StartContainer().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:56 -08:00
Sebastien Boeuf
57773816b3 sandbox: Create and export Pause/ResumeContainer() to the API level
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless functions
PauseContainer() and ResumeContainer(), which would recreate a
new sandbox pointer and the corresponding ones for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:50 -08:00
Sebastien Boeuf
b298ec4228 sandbox: Create and export ProcessListContainer() to the API level
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless function
ProcessListContainer(), which would recreate a new sandbox pointer
and the corresponding ones for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:44 -08:00
Sebastien Boeuf
3add296f78 sandbox: Create and export KillContainer() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function KillContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:37 -08:00
Sebastien Boeuf
76537265cb sandbox: Create and export StopContainer() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:31 -08:00
Sebastien Boeuf
109e12aa56 sandbox: Export Stop() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:24 -08:00
Sebastien Boeuf
6c3e266eb9 sandbox: Export Start() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StartSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:04 -08:00
Sebastien Boeuf
7bf84d05ad types: Replace agent/pkg/types with virtcontainers/pkg/types
This commit replaces every place where the "types" package from the
Kata agent was used, with the new "types" package from virtcontainers.

In order to do so, it introduces a few translation functions between
the agent and virtcontainers types, since this is needed by the kata
agent implementation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-02 08:46:11 -07:00
Wei Zhang
34fe3b9d6d cgroups: add host cgroup support
Fixes #344

Add host cgroup support for kata.

This commits only adds cpu.cfs_period and cpu.cfs_quota support.

It will create 3-level hierarchy, take "cpu" cgroup as an example:

```
/sys/fs/cgroup
|---cpu
   |---kata
      |---<sandbox-id>
         |--vcpu
      |---<sandbox-id>
```

* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
2018-10-27 09:41:35 +08:00
Sebastien Boeuf
309dcf9977 vendor: Update the agent vendoring based on pkg/types
Some agent types definition that were generic enough to be reused
everywhere, have been split from the initial grpc package.

This prevents from importing the entire protobuf package through
the grpc one, and prevents binaries such as kata-netmon to stay
in sync with the types definitions.

Fixes #856

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-10-26 09:35:59 -07:00
Frank Cao
633f4567f3 Merge pull request #825 from jodh-intel/add-trace-to-remaining-api-funcs
virtcontainers: Add missing API trace calls
2018-10-18 16:53:30 +08:00
Graham Whaley
0a652a1ab8 Merge pull request #786 from linzichang/master
sandbox/virtcontainers: memory resource hotplug when create container.
2018-10-18 09:43:24 +01:00
Ruidong Cao
3f39d6e807 virtcontainers: Add missing API release calls
Add missing release sandbox calls to network related functions in
virtcontainers API.

Fixes #732.

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-10-18 06:58:04 +08:00
James O. D. Hunt
ee9275fedb virtcontainers: Add missing API trace calls
Add missing trace calls to remaining public virtcontainers API
functions.

Fixes #824.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-10-17 11:34:43 +01:00
Zichang Lin
8e2ee686bd sandbox/virtcontainers: memory resource hotplug when create container.
When create sandbox, we setup a sandbox of 2048M base memory, and
then hotplug memory that is needed for every new container. And
we change the unit of c.config.Resources.Mem from MiB to Byte in
order to prevent the 4095B < memory < 1MiB from being lost.

Depends-on:github.com/kata-containers/tests#813

Fixes #400

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
2018-10-15 10:37:29 +08:00
Clare Chen
12a0354084 sandbox: get and store guest details.
Get and store guest details after sandbox is completely created.
And get memory block size from sandbox state file when check
hotplug memory valid.

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
2018-09-17 07:00:46 -04:00
Sebastien Boeuf
9c6ed93f80 hook: Move OCI hooks handling to the CLI
The CLI being the implementation of the OCI specification, and the
hooks being OCI specific, it makes sense to move the handling of any
OCI hooks to the CLI level. This changes allows the Kata API to
become OCI agnostic.

Fixes #599

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-08-24 15:07:27 -07:00
James O. D. Hunt
d0679a6fd1 tracing: Add tracing support to virtcontainers
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.

Note that not every function is traced; we can add more traces as
desired.

Fixes #566.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
James O. D. Hunt
90970d94c0 tracing: Add trace spans to virtcontainers APIs
Create spans for all `virtcontainers` API functions.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
James O. D. Hunt
c200b28dc7 tracing: Add context to virtcontainers API
Add a `context.Context` parameter to all the virtcontainers API's to
support tracing.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
Ruidong Cao
1a17200cc8 virtcontainers: add sandbox hotplug network API
Add sandbox hotplug network API to meet design

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-08-16 16:10:10 +08:00
Wei Zhang
6e6be98b15 devices: add interface "sandbox.AddDevice"
Fixes #50 .

Add new interface sandbox.AddDevice, then for Frakti use case, a device
can be attached to sandbox before container is created.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-08-15 15:24:12 +08:00
James O. D. Hunt
58448bbcb8 logging: Allow SetLogger to be called multiple times
Now that the `SetLogger()` functions accept a `logrus.Entry`, they can
access the fields that have already been set for the logger and
re-apply them if `SetLogger()` is called multiple times.

This fixes a bug whereby the logger functions -- which are necessarily
called multiple times [1] -- previously ended up applying any new fields
the specified logger contained, but erroneously removing any additional
fields added since `SetLogger()` was last called.

Partially fixes #519.

--
[1] - https://github.com/kata-containers/runtime/pull/468

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-07-30 15:32:41 +01:00
James O. D. Hunt
029e7ca680 api: Change logger functions to accept a log entry
Rather than accepting a `logrus.FieldLogger` interface type, change all
the `SetLogger()` functions to accept a `logrus.Entry`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-07-30 15:32:41 +01:00
James O. D. Hunt
dfb758a82d logging: Remove duplicate arch field in vc
As of #521, the runtime now adds the `arch` log field so
`virtcontainers` doesn't need to set it too.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-07-30 15:32:41 +01:00
Haomin Tsai
8939fd802f Merge pull request #351 from woshijpf/fix-no-kata-agent
virtcontainers: process the case that kata-agent doesn't start in VM
2018-07-24 19:47:08 +08:00
flyflypeng
7103c4f14a virtcontainers: add qemu process rollback
If some errors occur after qemu process start, then we need to
rollback to kill qemu process

Fixes: #297

Signed-off-by: flyflypeng <jiangpengfei9@huawei.com>
2018-07-24 21:36:57 +08:00
flyflypeng
daebbd1e93 virtcontainers: add rollback to remove sandbox network
If error occurs after sandbox network created successfully, we need to rollback
to remove the created sandbox network

Fixes: #297

Signed-off-by: flyflypeng <jiangpengfei9@huawei.com>
2018-07-24 21:34:58 +08:00
Peng Tao
d69fbcf17f sandbox: add stateful sandbox config
When enabled, do not release in memory sandbox resources in VC APIs,
and callers are expected to call sandbox.Release() to release the in
memory resources.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-23 09:54:02 +08:00
Peng Tao
7a6f205970 virtcontainers: keep qmp connection when possible
For each time a sandbox structure is created, we ensure s.Release()
is called. Then we can keep the qmp connection as long as Sandbox
pointer is alive.

All VC interfaces are still stateless as s.Release() is called before
each API returns.

OTOH, for VCSandbox APIs, FetchSandbox() must be paired with s.Release,
the same as before.

Fixes: #500

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-07-23 08:37:55 +08:00