Modify some path variables to be functions that return the path
with the rootless directory prefix if running rootlessly.
Fixes: #1827
Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
Fixes: #2023
We can get OCI spec config from bundle instead of annotations, so this
field isn't necessary.
Signed-off-by: Wei Zhang <weizhang555.zw@gmail.com>
When a new sandbox is created, join to its cgroup path
this will create all proxy, shim, etc in the sandbox cgroup.
Fixes: #1879
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
When shimv2 was killed by accident, containerd would try to
launch a new shimv2 binarry to cleanup the container. In order
to avoid race condition, the cleanup should be done serialized
in a sandbox. Thus adding a new api to do this by locking the
sandbox.
Fixes:#1832
Signed-off-by: lifupan <lifupan@gmail.com>
in create sandbox, if process error, should remove network without judge
NetNsCreated is true, since network is created by kata and should be
removed by kata, and network.Remove has judged if need to delete netns
depend on NetNsCreated
Fixes: #1920
Signed-off-by: Ace-Tang <aceapril@126.com>
When force is true, ignore any guest related errors. This can
be used to stop a sandbox when hypervisor process is dead.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
If kata containers is using vfio and vhost net,the unbinding
of vfio would be hang. In the scenario, vhost net kernel thread
takes a reference to the qemu's mm, and the reference also includes
the mmap regions on the vfio device file. so vhost kernel thread
would be not released when qemu is killed as the vhost file
descriptor still is opened by shim v2 process, and the vfio device
is not released because there's still a reference to the mmap.
Fixes: #1669
Signed-off-by: Yang, Wei <w90p710@gmail.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
When using experimental feature "newstore", we save and load devices
information from `persist.json` instead of `devices.json`, in such case,
file `devices.json` isn't needed anymore, so remove it.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
This reverts commit 196661bc0d.
Reverting because cri-o with devicemapper started
to fail after this commit was merged.
Fixes: #1574.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
We can use the same data structure to describe both of them.
So that we can handle them similarly.
Fixes: #1566
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
container's rootfs is a string type, which cannot represent a
block storage backed rootfs which hasn't been mounted.
Change it to a mount alike struct as below:
RootFs struct {
// Source specify the BlockDevice path
Source string
// Target specify where the rootfs is mounted if it has been mounted
Target string
// Type specifies the type of filesystem to mount.
Type string
// Options specifies zero or more fstab style mount options.
Options []string
// Mounted specifies whether the rootfs has be mounted or not
Mounted bool
}
If the container's rootfs has been mounted as before, then this struct can be
initialized as: RootFs{Target: <rootfs>, Mounted: true} to be compatible with
previous case.
Fixes:#1158
Signed-off-by: lifupan <lifupan@gmail.com>
The store refactor (#1066) inadvertently broke runtime tracing as it
created new contexts containing trace spans.
Reworking the store changes to re-use the existing context resolves the
problem since runtime tracing assumes a single context.
Fixes#1277.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
All containers run in different cgroups even the sandbox, with this new
implementation the sandbox cpu cgroup wil be equal to the sum of all its
containers and the hypervisor process will be placed there impacting to the
containers running in the sandbox (VM). The default number of vcpus is
used when the sandbox has no constraints. For example, if default_vcpus
is 2, then quota will be 200000 and period 100000.
**c-ray test**
http://www.futuretech.blinkenlights.nl/c-ray.html
```
+=============================================+
| | 6 threads 6cpus | 1 thread 1 cpu |
+=============================================+
| current | 40 seconds | 122 seconds |
+==============================================
| new | 37 seconds | 124 seconds |
+==============================================
```
current = current cgroups implementation
new = new cgroups implementation
**workload**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: c-ray
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
restartPolicy: Never
containers:
- name: c-ray-1
image: docker.io/devimc/c-ray:latest
imagePullPolicy: IfNotPresent
args: ["-t", "6", "-s", "1600x1200", "-r", "8", "-i",
"/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
resources:
limits:
cpu: 6
- name: c-ray-2
image: docker.io/devimc/c-ray:latest
imagePullPolicy: IfNotPresent
args: ["-t", "1", "-s", "1600x1200", "-r", "8", "-i",
"/c-ray-1.1/sphfract", "-o", "/tmp/output.ppm"]
resources:
limits:
cpu: 1
```
fixes#1153
Signed-off-by: Julio Montes <julio.montes@intel.com>
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.
This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.
Fixes: #1099
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.
This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.
Fixes: #1095
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Since we're going to have both external and internal types packages, we
alias the external one as vcTypes. And the internal one will be usable
through the types namespace.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We always ask the agent to start the sandbox when we start the VM, so we
should simply call agent.startSandbox from startVM instead of open
coding those.
This slightly simplifies the complex createSandboxFromConfig routine.
Fixes: #1011
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
startSandbox() wraps a single operation (sandbox.Start()), so we can
remove it and make the code easier to read/follow.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This includes cleaning up the sandbox on disk resources,
and closing open fds when preparing the hypervisor.
Fixes: #1057
Signed-off-by: Peng Tao <bergwolf@gmail.com>
When our runtime is asked for the container status, we also handle
the scenario where the container is stopped if the shim process for
that container on the host has terminated.
In the current implementation, we retrieve the container status
before stopping the container, causing a wrong status to be returned.
The wait for the original go-routine's completion was done in a defer
within the caller of statusContainers(), resulting in the
statusContainer()'s values to return the pre-stopped value.
This bug is first observed when updating to docker v18.09/containerd
v1.2.0. With the current implementation, containerd-shim receives the
TaskExit when it detects kata-shim is terminating. When checking the
container state, however, it does not get the expected "stopped" value.
The following commit resolves the described issue by simplifying the
locking used around the status container calls. Originally
StatusContainer would request a read lock. If we needed to update the
container status in statusContainer, we'd start a go-routine which
would request a read-write lock, waiting for the original read lock to
be released. Can't imagine a bug could linger in this logic. We now
just request a read-write lock in the caller (StatusContainer),
skipping the need for a separate go-routine and defer. This greatly
simplifies the logic, and removes the original bug.
Fixes#926
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless functions
PauseContainer() and ResumeContainer(), which would recreate a
new sandbox pointer and the corresponding ones for containers.
Fixes#903
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases such as containerd-shim-v2 where
we would have a long running process holding the sandbox pointer,
there would be no reason to call into the stateless function
ProcessListContainer(), which would recreate a new sandbox pointer
and the corresponding ones for containers.
Fixes#903
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function KillContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.
Fixes#903
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopContainer(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.
Fixes#903
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StopSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.
Fixes#903
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases such as containerd-shim-v2 where we
would have a long running process holding the sandbox pointer, there
would be no reason to call into the stateless function StartSandbox(),
which would recreate a new sandbox pointer and the corresponding ones
for containers.
Fixes#903
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces every place where the "types" package from the
Kata agent was used, with the new "types" package from virtcontainers.
In order to do so, it introduces a few translation functions between
the agent and virtcontainers types, since this is needed by the kata
agent implementation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fixes#344
Add host cgroup support for kata.
This commits only adds cpu.cfs_period and cpu.cfs_quota support.
It will create 3-level hierarchy, take "cpu" cgroup as an example:
```
/sys/fs/cgroup
|---cpu
|---kata
|---<sandbox-id>
|--vcpu
|---<sandbox-id>
```
* `vc` cgroup is common parent for all kata-container sandbox, it won't be removed
after sandbox removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox, it contains all other qemu
threads except for vcpu threads. In future, we can consider putting all shim
processes and proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains vcpu threads from qemu. Currently cpu quota and period
constraint applies to this cgroup.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>
Some agent types definition that were generic enough to be reused
everywhere, have been split from the initial grpc package.
This prevents from importing the entire protobuf package through
the grpc one, and prevents binaries such as kata-netmon to stay
in sync with the types definitions.
Fixes#856
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When create sandbox, we setup a sandbox of 2048M base memory, and
then hotplug memory that is needed for every new container. And
we change the unit of c.config.Resources.Mem from MiB to Byte in
order to prevent the 4095B < memory < 1MiB from being lost.
Depends-on:github.com/kata-containers/tests#813
Fixes#400
Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
Get and store guest details after sandbox is completely created.
And get memory block size from sandbox state file when check
hotplug memory valid.
Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
The CLI being the implementation of the OCI specification, and the
hooks being OCI specific, it makes sense to move the handling of any
OCI hooks to the CLI level. This changes allows the Kata API to
become OCI agnostic.
Fixes#599
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.
Note that not every function is traced; we can add more traces as
desired.
Fixes#566.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Fixes#50 .
Add new interface sandbox.AddDevice, then for Frakti use case, a device
can be attached to sandbox before container is created.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>