Commit Graph

184 Commits

Author SHA1 Message Date
Penny Zheng
3bfcdf755a agent: add interface memHotplugByProbe
we need to notify guest kernel about memory hot-added event via probe interface.
hot-added memory deivce should be sliced into the size of memory section.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:20 +08:00
Penny Zheng
47670fcf73 memoryDevice: reconstruct memoryDevice
If kata-runtime supports memory hotplug via probe interface, we need to reconstruct
memoryDevice to store relevant status, which are addr and probe. addr specifies the
physical address of the memory device, and probe determines it is hotplugged via
acpi-driven or probe interface.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:20 +08:00
Penny Zheng
30a6a7de39 agent: acquire memory hotplug probe info via GetGuestDetails
In order to support memory hotplug via probe interface in kata-runtime,
firstly, we need to verify whether guest kernel is capable of that.

Fixes: #1149

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
2019-04-04 17:03:19 +08:00
Fupan Li
c9a3b933f8 Merge pull request #1427 from Ace-Tang/fix-qemu-leak
qemu: fix qemu leak when failed to start container
2019-04-02 23:32:11 +08:00
Peng Tao
25d21060e3 Merge pull request #1412 from lifupan/shimv2mount
shimv2: optionally plug rootfs block storage instead of mounting it
2019-04-02 15:30:40 +08:00
lifupan
628ea46c58 virtcontainers: change container's rootfs from string to mount alike struct
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
    RootFs struct {
            // Source specify the BlockDevice path
            Source string
            // Target specify where the rootfs is mounted if it has been mounted
            Target string
            // Type specifies the type of filesystem to mount.
            Type string
            // Options specifies zero or more fstab style mount options.
            Options []string
            // Mounted specifies whether the rootfs has be mounted or not
            Mounted bool
     }

If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.

Fixes:#1158

Signed-off-by: lifupan <lifupan@gmail.com>
2019-04-02 10:54:05 +08:00
Ace-Tang
096fa046f8 qemu: fix qemu leak when failed to start container
do cleanup inside startVM() if start vm get error

Fixes: #1426

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-03-28 19:38:56 +08:00
Ganesh Maharaj Mahalingam
f4428761cb lint: Update go linter from gometalinter to golangci-lint.
gometalinter is deprecated and will be archived April '19. The
suggestion is to switch to golangci-lint which is apparently 5x faster
than gometalinter.

Partially Fixes: #1377

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
2019-03-25 08:48:13 -07:00
Jose Carlos Venegas Munoz
0061e166d4 virtcontainers: move resource calculation to its own function
Make cpu and memory calculation in a different function
this help to reduce the function complexity and easy  unit test.

Fixes: #1296

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2019-03-11 12:17:01 -06:00
Wei Zhang
da80c70c0c config: enhance Feature structure
Fixes #1226

Add more fields to better describe an experimental feature.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-03-10 22:44:41 +08:00
Wei Zhang
050f03bb36 config: Add config flag "experimental"
Fixes #1226

Add new flag "experimental" for supporting underworking features.
Some features are under developing which are not ready for release,
there're also some features which will break compatibility which is not
suitable to be merged into a kata minor release(x version in x.y.z)

For getting these features above merged earlier for more testing, we can
mark them as "experimental" features, and move them to formal features
when they are ready.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2019-03-12 11:03:28 +08:00
James O. D. Hunt
c759cf5f37 tracing: Fix tracing
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.

Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.

Fixes #1277.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2019-03-04 11:02:31 +00:00
Eric Ernst
dc2650889c virtcontainers: fix vCPU calculation errors
We were grabbing a running total of quota and period for each container
and then calculating the number of resulting vCPUs. Summing period
doesn't make sense.  To simplify, let's just calculate mCPU per
container, keep a running total of mCPUs requested, and then translate
to sandbox vCPUs after.

Fixes: #1292

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-02-28 08:13:04 -08:00
Julio Montes
5201860bb0 virtcontainers: reimplement sandbox cgroup
All containers run in different cgroups even the sandbox, with this new
implementation the sandbox cpu cgroup wil be equal to the sum of all its
containers and the hypervisor process will be placed there impacting to the
containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.

**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html

```
+=============================================+
|         | 6 threads 6cpus | 1 thread 1 cpu  |
+=============================================+
| current |   40 seconds    |   122 seconds   |
+==============================================
|   new   |   37 seconds    |   124 seconds   |
+==============================================
```

current = current cgroups implementation
new = new cgroups implementation

**workload**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: c-ray
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
spec:
  restartPolicy: Never
  containers:
  - name: c-ray-1
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 6
  - name: c-ray-2
    image: docker.io/devimc/c-ray:latest
    imagePullPolicy: IfNotPresent
    args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
          "/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
    resources:
      limits:
        cpu: 1
```

fixes #1153

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Samuel Ortiz
18dcd2c2f7 virtcontainers: Decouple the network API from the sandbox one
In order to fix #1059, we want to create a hypervisor package. Some of
the hypervisor implementations (qemu) depend on the network and endpoint
interfaces. We can not have a virtcontainers -> hypervisor -> network,
endpoint -> virtcontainers cyclic dependency.
So before creating the hypervisor package, we need to decouple the
network API from the virtcontainers one.

Fixes: #1180

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-25 15:25:49 +01:00
Samuel Ortiz
b39cb1d13a virtcontainers: Remove the network interface
There's only one real implementer of the network interface and no real
need to implement anything else. We can just go ahead and remove this
abstraction.

Fixes: #1179

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-25 15:25:46 +01:00
Peng Tao
d314e2d0b7 agent: clean up share path created by the agent
The agent code creates a directory at
`/run/kata-containers/shared/sandboxes/sbid/` to hold shared data
between host and guest. We need to clean it up when removing a sandbox.

Fixes: #1138

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2019-01-21 14:10:59 +08:00
Samuel Ortiz
67e696bf62 virtcontainers: Add Asset to the types package
In order to move the hypervisor implementations into their own package,
we need to put the asset type into the types package and break the
hypervisor->asset->virtcontainers->hypervisor cyclic dependency.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
cf22f402d8 virtcontainers: Remove the hypervisor waitSandbox method
We always call waitSandbox after we start the VM (startSandbox), so
let's simplify the hypervisor interface and integrate waiting for the VM
into startSandbox.
This makes startSandbox a blocking call, but that is practically the
case today.

Fixes: #1009

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:38:33 +01:00
Samuel Ortiz
763bf18daa virtcontainers: Remove the hypervisor init method
We always combine the hypervisor init and createSandbox, because what
we're trying to do is simply that: Set the hypervisor and have it create
a sandbox.

Instead of keeping a method with vague semantics, remove init and
integrate the actual hypervisor setup phase into the createSandbox one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 19:37:20 +01:00
Samuel Ortiz
b05dbe3886 runtime: Convert to the new internal types package
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.

This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.

Fixes: #1095

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:43:33 +01:00
Samuel Ortiz
3ab7d077d1 virtcontainers: Alias for pkg/types
Since we're going to have both external and internal types packages, we
alias the external one as vcTypes. And the internal one will be usable
through the types namespace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:24:06 +01:00
James O. D. Hunt
36c267a1d2 Merge pull request #1085 from bergwolf/containerd
cli: allow to kill a stopped container and sandbox
2019-01-08 08:44:10 +00:00
James O. D. Hunt
38c9cd2b85 Merge pull request #689 from nitkon/seccomp
virtcontainers: Pass seccomp profile inside VM
2019-01-08 08:42:07 +00:00
Nitesh Konkar
c2c9c844e2 virtcontainers: Conditionally pass seccomp profile
Pass Seccomp profile to the agent only if
the configuration.toml allows it to be passed
and the agent/image is seccomp capable.

Fixes: #688

Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
2019-01-08 10:22:23 +05:30
Peng Tao
bf2813fee8 cli: allow to kill a stopped container and sandbox
cri containerd calls kill on stopped sandbox and if we
fail the call, it can cause `cri stopp` command to fail
too.

Fixes: #1084

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2019-01-08 11:19:25 +08:00
Samuel Ortiz
09168ccda7 virtcontainers: Call stopVM() from sandbox.Stop()
Now that stopVM() also calls agent.stopSandbox(), we can have the
sandbox Stop() call using stopVM() directly and avoid code duplication.

Fixes: #1011

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-07 09:56:58 -08:00
Samuel Ortiz
acf833cb4a virtcontainers: Call agent startSandbox from startVM
We always ask the agent to start the sandbox when we start the VM, so we
should simply call agent.startSandbox from startVM instead of open
coding those.
This slightly simplifies the complex createSandboxFromConfig routine.

Fixes: #1011

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2019-01-07 09:56:04 -08:00
Peng Tao
bf1a5ce000 sandbox: cleanup sandbox if creation failed
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.

Fixes: #1057

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-12-21 13:46:16 +08:00
Jose Carlos Venegas Munoz
0d80202573 vc:sandbox: rename newcontainer to fetchcontainer.
The containers is not new but fech from an existing one.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-12-13 16:33:24 -06:00
Jose Carlos Venegas Munoz
618cfbf1db vc: sandbox: Let sandbox manage VM resources.
- Container only is responsable of namespaces and cgroups
inside the VM.

- Sandbox will manage VM resources.

The resouces has to be re-calculated and updated:

- Create new Container: If a new container is created the cpus and memory
may be updated.

- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.

To manage the resources from sandbox the hypervisor interaface adds two methods.

- resizeMemory().

This function will be used by the sandbox to request
increase or decrease the VM memory.

- resizeCPUs()

vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.

The CPUs calculations use the container cgroup information all the time.

This should allow do better calculations.

For example.

2 containers in a pod.

container 1 cpus = .5
container 2 cpus = .5

Now:
Sandbox requested vcpus 1

Before:
Sandbox requested vcpus 2

When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.

If we would updated them the sandbox calculations would remove already
removed vcpus.

This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.

Fixes: #833

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-12-13 16:33:14 -06:00
Sebastien Boeuf
57773816b3 sandbox: Create and export Pause/ResumeContainer() to the API level
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless functions
PauseContainer() and ResumeContainer(), which would recreate a
new sandbox pointer and the corresponding ones for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:50 -08:00
Sebastien Boeuf
b298ec4228 sandbox: Create and export ProcessListContainer() to the API level
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless function
ProcessListContainer(), which would recreate a new sandbox pointer
and the corresponding ones for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:44 -08:00
Sebastien Boeuf
3add296f78 sandbox: Create and export KillContainer() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function KillContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:37 -08:00
Sebastien Boeuf
76537265cb sandbox: Create and export StopContainer() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:31 -08:00
Sebastien Boeuf
109e12aa56 sandbox: Export Stop() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:24 -08:00
Sebastien Boeuf
6c3e266eb9 sandbox: Export Start() to the API level
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StartSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.

Fixes #903

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-12 15:15:04 -08:00
Sebastien Boeuf
51997775bd virtcontainers: Rely on new interface LinkType field
Now that Interface structure includes the useful information about
the type of interface, Kata does not need to do any assumption about
the type of interface that needs to be added.

Fixes #866

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-02 08:46:11 -07:00
Sebastien Boeuf
7bf84d05ad types: Replace agent/pkg/types with virtcontainers/pkg/types
This commit replaces every place where the "types" package from the
Kata agent was used, with the new "types" package from virtcontainers.

In order to do so, it introduces a few translation functions between
the agent and virtcontainers types, since this is needed by the kata
agent implementation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-11-02 08:46:11 -07:00
Peng Tao
e9aa870255 network: enable network hotplug for vm factory
After we scan the netns, we should hotplug the network interface to
the guest after it is kicked off running.

Fixes: #871

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-11-01 09:33:16 +08:00
Wei Zhang
34fe3b9d6d cgroups: add host cgroup support
Fixes #344

Add host cgroup support for kata.

This commits only adds cpu.cfs_period and cpu.cfs_quota support.

It will create 3-level hierarchy, take "cpu" cgroup as an example:

```
/sys/fs/cgroup
|---cpu
   |---kata
      |---<sandbox-id>
         |--vcpu
      |---<sandbox-id>
```

* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
2018-10-27 09:41:35 +08:00
Sebastien Boeuf
309dcf9977 vendor: Update the agent vendoring based on pkg/types
Some agent types definition that were generic enough to be reused
everywhere, have been split from the initial grpc package.

This prevents from importing the entire protobuf package through
the grpc one, and prevents binaries such as kata-netmon to stay
in sync with the types definitions.

Fixes #856

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-10-26 09:35:59 -07:00
Ruidong Cao
14e5437cae cli: add configuration option to use or not use host netns
If `disable_new_netns` set to true, create VM and shim processes in the host netns

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-10-22 21:06:58 +08:00
Ruidong Cao
6935279beb network: add new NetInterworkingModel "none" and endpoint type TapEndpoint
This model is for not creating a new net ns for VM and directly
creating taps in the host net ns.

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-10-22 21:06:58 +08:00
Ruidong Cao
f8f29622a4 virtcontainers: refactor hotplug qmp functions
Refactor these functions so differernt types of endpoints can use a unified
function to hotplug nics.

Fixes #731

Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
2018-10-22 21:06:56 +08:00
Archana Shinde
b04691e229 network: Collapse log calls for endpoint Attach and Detach
Log Attach, Detach, HotAttach and HotDetach at a single
location.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-10-11 14:45:57 -07:00
Julio Montes
9e606b3da8 virtcontainers: revert "fix shared dir resource remaining"
This reverts commit 8a6d383715.

Don't remove all directories in the shared directory because
`docker cp` re-mounts all the mount points specified in the
config.json causing serious problems in the host.

fixes #777

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-09-24 12:15:09 -05:00
Sebastien Boeuf
b59ea21e4f Merge pull request #752 from jcvenegas/memory-slots-config
config: Add Memory slots config
2018-09-21 11:53:04 -07:00
Archana Shinde
38734bd7c6 Merge pull request #761 from caoruidong/add-inf
virtcontainers: support vhost and physical endpoints in AddInterface
2018-09-21 10:54:52 -07:00