Commit Graph

122 Commits

Author SHA1 Message Date
Ganesh Maharaj Mahalingam
27a92f94c8 runtime: Fix rootfs mount assumptions
This patch fixes the issue where various version of snapshotters,
overlay, block based graphdriver, containerd-shim-v2 overlay, block
based snapshotters mount & create rootfs differently and kata should be
able to handle them all.

The current version of the code always assumes that a folder named
'rootfs' exists within the mount device and that is the path the
container should start at. This patch checks the existing mount point
and if it is the same as the rootFs passed to the container, we no
longer add a suffix to the container's rootfs path.

Fixes: #1325

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Co-Authored-by: Manohar Castelino <manohar.r.castelino@intel.com>
2019-03-05 13:41:37 -08:00
Ace-Tang
454775fb97 cgroups: fix failed to remove sandbox cgroup
sandbox cgroup use V1NoConstraints, this only create memory subsystem,
but when delete, load parent cgroup always use `cgroups.V1`, so other
subsystem path can not be find, sandbox cgroup can not be deleted.

Fixes: #1263

Signed-off-by: Ace-Tang <aceapril@126.com>
2019-02-21 17:34:34 +08:00
Julio Montes
62c393c119 virtcontainers: change container's state to stop asap
container is killed by force, container's state MUST change its state to stop
immediately to avoid leaving it in a bad state.

fixes #1088

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Julio Montes
9758cdba7c virtcontainers: move cpu cgroup implementation
cpu cgroups are container's specific hence all containers even the sandbox
should be able o create, delete and update their cgroups. The cgroup crated
matches with the cgroup path passed by the containers manager.

fixes #1117
fixes #1118
fixes #1021

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-02-19 13:13:44 -06:00
Samuel Ortiz
7b0376f3d3 virtcontainers: Fix container.go cyclomatic complexity
With the Stores conversion, the newContainer() cyclomatic complexity
went over 15. We fix that by extracting the block devices creation
routine out of newContainer.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:33 +01:00
Samuel Ortiz
fad23ea54e virtcontainers: Conversion to Stores
We convert the whole virtcontainers code to use the store package
instead of the resource_storage one. The resource_storage removal will
happen in a separate change for a more logical split.

This change is fairly big but mostly does not change the code logic.
What really changes is when we create a store for a container or a
sandbox. We now need to explictly do so instead of just assigning a
filesystem{} instance. Other than that, the logic is kept intact.

Fixes: #1099

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-02-07 00:59:29 +01:00
Frank Cao
6c3277e013 Merge pull request #1126 from jcvenegas/allow-update-on-ready
update: allow do update on ready.
2019-01-18 11:03:12 +08:00
Jose Carlos Venegas Munoz
7228bab79b container: update: Allow updates once container is created
Before, we would only allow for a container-update command
to proceed if the container was in the running state. So
long as the container is created, this should be allowed.

This was found using the `static` policy for Kubernetes CPU
manager[1]. Where the `update` command is called after the
`create` runtime command (when the container state is `ready`).

[1] https://github.com/kubernetes/community/blob/95a4a1/contributors/design-proposals/node/cpu-manager.md#example-scenarios-and-interactions

Fixes: #1083

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2019-01-16 17:15:00 -05:00
Samuel Ortiz
b25f43e865 virtcontainers: Add Capabilities to the types package
In order to move the hypervisor implementations into their own package,
we need to put the capabilities type into the types package.

Fixes: #1119

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-14 20:30:06 +01:00
Samuel Ortiz
b05dbe3886 runtime: Convert to the new internal types package
We can now remove all the sandbox shared types and convert the rest of
the code to using the new internal types package.

This commit includes virtcontainers, cli and containerd-shim changes in
one atomic change in order to not break bisect'ibility.

Fixes: #1095

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-01-08 14:43:33 +01:00
Peng Tao
bf2813fee8 cli: allow to kill a stopped container and sandbox
cri containerd calls kill on stopped sandbox and if we
fail the call, it can cause `cri stopp` command to fail
too.

Fixes: #1084

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2019-01-08 11:19:25 +08:00
Frank Cao
174e0c98bc Merge pull request #963 from running99/master
container: Use lazy unmount
2018-12-26 09:50:44 +08:00
Sebastien Boeuf
83e38c959a mounts: Ignore existing mounts if they cannot be honored
In case we use an hypervisor that cannot support filesystem sharing,
we copy files over to the VM rootfs through the gRPC protocol. This
is a nice workaround, but it only works with regular files, which
means no device file, no socket file, no directory, etc... can be
sent this way.

This is a limitation that we accept here, by simply ignoring those
non-regular files.

Fixes #1068

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
2018-12-21 15:38:06 +00:00
running
c099be56da container: Use lazy unmount
Unmount recursively to unmount bind-mounted volumes.
Fixes: #965
Signed-off-by: Ning Lu <crossrunning@outlook.com>
2018-12-21 11:11:58 +08:00
Julio Montes
378d8157a6 virtcontainers: copy or bind mount shared file
Copy files to contaier's rootfs if hypervisor doesn't supports filesystem
sharing, otherwise bind mount them in the shared directory.

see #1031

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-12-19 09:58:44 -06:00
Jose Carlos Venegas Munoz
618cfbf1db vc: sandbox: Let sandbox manage VM resources.
- Container only is responsable of namespaces and cgroups
inside the VM.

- Sandbox will manage VM resources.

The resouces has to be re-calculated and updated:

- Create new Container: If a new container is created the cpus and memory
may be updated.

- Container update: The update call will change the cgroups of a container.
the sandbox would need to resize the cpus and VM depending the update.

To manage the resources from sandbox the hypervisor interaface adds two methods.

- resizeMemory().

This function will be used by the sandbox to request
increase or decrease the VM memory.

- resizeCPUs()

vcpus are requested to the hypervisor based
on the sum of all the containers in the sandbox.

The CPUs calculations use the container cgroup information all the time.

This should allow do better calculations.

For example.

2 containers in a pod.

container 1 cpus = .5
container 2 cpus = .5

Now:
Sandbox requested vcpus 1

Before:
Sandbox requested vcpus 2

When a update request is done only some atributes have
information. If cpu and quota are nil or 0 we dont update them.

If we would updated them the sandbox calculations would remove already
removed vcpus.

This commit also moves the sandbox resource update call at container.update()
just before the container cgroups information is updated.

Fixes: #833

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-12-13 16:33:14 -06:00
Hui Zhu
193b324242 newContainer: Not attach device if it is a CDROM
Got "docker: Error response from daemon: OCI runtime create failed:
QMP command failed: unknown." when "docker run --privileged" with kata.
In qemu part, it got:
"Could not open '/dev/sr0': Read-only file system"
or
"No medium found"
The cause is qemu need open block device to get its status.
But /dev/sr0 is a CDROM that cannot be opened.

This patch let newContainer doesn't attach device if it is a CDROM
to handle the issue.

Fixes #829

Signed-off-by: Hui Zhu <teawater@hyper.sh>
2018-11-07 17:28:06 +08:00
Zichang Lin
36306e283c sandbox/virtcontainers: modify tests relate to memory hotplug.
Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
2018-10-17 23:01:13 -04:00
Clare Chen
14f480af8f sandbox/virtcontainers: combine addResources and updateResources
addResources is just a special case of updateResources. Combine the shared codes
so that we do not maintain the two pieces of identical code.

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
2018-10-15 10:39:08 +08:00
Zichang Lin
8e2ee686bd sandbox/virtcontainers: memory resource hotplug when create container.
When create sandbox, we setup a sandbox of 2048M base memory, and
then hotplug memory that is needed for every new container. And
we change the unit of c.config.Resources.Mem from MiB to Byte in
order to prevent the 4095B < memory < 1MiB from being lost.

Depends-on:github.com/kata-containers/tests#813

Fixes #400

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
2018-10-15 10:37:29 +08:00
Jose Carlos Venegas Munoz
4697cf3c79 memory: update: Update state using the memory removed.
If the memory is reduced , its cgroup in the VM was updated properly. But the
runtime assumed that the memory was also removed from the VM.

Then when it is added more memory again, more is added (but not needed).

Fixes: #801

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-10-02 14:38:21 -05:00
Julio Montes
9e606b3da8 virtcontainers: revert "fix shared dir resource remaining"
This reverts commit 8a6d383715.

Don't remove all directories in the shared directory because
`docker cp` re-mounts all the mount points specified in the
config.json causing serious problems in the host.

fixes #777

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-09-24 12:15:09 -05:00
Clare Chen
12a0354084 sandbox: get and store guest details.
Get and store guest details after sandbox is completely created.
And get memory block size from sandbox state file when check
hotplug memory valid.

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
2018-09-17 07:00:46 -04:00
Clare Chen
13bf7d1bbc virtcontainers: hotplug memory with kata-runtime update command
Add support for using update command to hotplug memory to vm.
Connect kata-runtime update interface with hypervisor memory hotplug
feature.

Fixes #625

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
2018-09-17 05:02:18 -04:00
James O. D. Hunt
ed1e343b93 Merge pull request #655 from WeiZhang555/add-ref-counter-for-devices
Add ref counter for devices
2018-09-06 09:51:07 +01:00
fupan
a5478b93e0 virtcontainers: wait until process exited before RemoveContainer
RemoveContainer is called right after SignalProcess(SIGKILL), the container
process might be still running and container Destroy() will fail, thus it's better
to wait on this process exited before to issue RemoveContainer.

Fixes: #690

Signed-off-by: fupan <lifupan@gmail.com>
2018-09-03 12:18:12 +08:00
Wei Zhang
c518b1ef00 device: use devicemanager to manage rootfs block
Fixes #635

When container rootfs is block based in devicemapper use case, we can re-use
sandbox device manager to manage rootfs block plug/unplug, we don't detailed
description of block in container state file, instead we only need a Block index
referencing sandbox device.

Remove `HotpluggedDrive` and `RootfsPCIAddr` from state file because it's not
necessary any more.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-08-31 19:30:08 +08:00
Wei Zhang
affd6e3216 devices: add reference count for devices.
Fixes #635

Remove `Hotplugged bool` field from device and add two new fields
instead:
* `RefCount`: how many references to this device. One device can be
referenced(`NewDevice()`) many times by same/different container(s),
two devices are regarded identical if they have same hostPath
* `AttachCount`: how many times this device has been attached. A device
can only be hotplugged once to the qemu, every new Attach command will
add the AttachCount, and real `Detach` will be done only when
`AttachCount == 0`

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-08-31 09:53:01 +08:00
James O. D. Hunt
d0679a6fd1 tracing: Add tracing support to virtcontainers
Add additional `context.Context` parameters and `struct` fields to allow
trace spans to be created by the `virtcontainers` internal functions,
objects and sub-packages.

Note that not every function is traced; we can add more traces as
desired.

Fixes #566.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-08-22 08:24:58 +01:00
Wei Zhang
6e6be98b15 devices: add interface "sandbox.AddDevice"
Fixes #50 .

Add new interface sandbox.AddDevice, then for Frakti use case, a device
can be attached to sandbox before container is created.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-08-15 15:24:12 +08:00
Sebastien Boeuf
16600efc1d Merge pull request #531 from WeiZhang555/bugfix
re-add: refactor device manager
2018-08-02 07:32:02 -07:00
Wei Zhang
b7464899ec devices: address some comments
Address some review comments:
* remove unnecessary rollback logics
* add vfio hot unplug handling.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-31 10:05:56 +08:00
Wei Zhang
f905c16f21 device-manager: refactor device manger
Fixes #50

This commit imports a big logic change:
* host device to be attached or appended now is sandbox level resources,
one device should bind to sandbox/hypervisor first, then container could
reference it via device's unique ID.
* attach or detach device should go through the device manager interface
instead of the device interface.
* allocate device ID in global device mapper to guarantee every device
has a uniq device ID and there won't be any ID collision.

With this change, there will some changes on data format on disk for sandbox
and container, these changes also make a breakage of backward compatibility.

New persist data format:
* every sandbox will get a new "devices.json" file under "/run/vc/sbs/<sid>/"
which saves detailed device information, this also conforms to the concept that
device should be sandbox level resource.
* every container uses a "devices.json" file but with new data format:
```
[
  {
    "ID": "b80d4736e70a471f",
    "ContainerPath": "/dev/zero"
  },
  {
    "ID": "6765a06e0aa0897d",
    "ContainerPath": "/dev/null"
  }
]
```
`ID` should reference to a device in a sandbox, `ContainerPath` indicates device
path inside a container.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-07-31 10:03:57 +08:00
Wei Zhang
1194154309 devices: use device manager to manage all devices
Fixes #50

Previously the devices are created with device manager and laterly
attached to hypervisor with "device.Attach()", this could work, but
there's no way to remember the reference count for every device, which
means if we plug one device to hypervisor twice, it's truly inserted
twice, but actually we only need to insert once but use it in many
places.

Use device manager as a consolidated entrypoint of device management can
give us a way to handle many "references" to single device, because it
can save all devices and remember it's use count.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-31 09:59:29 +08:00
James O. D. Hunt
763a1b6265 logging: Remove unnecessary fields and use standard names
Ensure the entire codebase uses `"sandbox"` and `"container"` log
fields for the sandboxID and containerID respectively.

Simplify code where fields can be dropped.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-07-30 15:32:41 +01:00
Sebastien Boeuf
927487c142 revert: "virtcontainers: support pre-add storage for frakti"
This PR got merged while it had some issues with some shim processes
being left behind after k8s testing. And because those issues were
real issues introduced by this PR (not some random failures), now
the master branch is broken and new pull requests cannot get the
CI passing. That's the reason why this commit revert the changes
introduced by this PR so that we can fix the master branch.

Fixes #529

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-07-27 09:39:56 -07:00
Wei Zhang
8391b20805 devices: address some comments
Address some review comments:
* remove unnecessary rollback logics
* add vfio hot unplug handling.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-26 14:15:52 +08:00
Wei Zhang
7f5989f06c device-manager: refactor device manger
Fixes #50

This commit imports a big logic change:
* host device to be attached or appended now is sandbox level resources,
one device should bind to sandbox/hypervisor first, then container could
reference it via device's unique ID.
* attach or detach device should go through the device manager interface
instead of the device interface.
* allocate device ID in global device mapper to guarantee every device
has a uniq device ID and there won't be any ID collision.

With this change, there will some changes on data format on disk for sandbox
and container, these changes also make a breakage of backward compatibility.

New persist data format:
* every sandbox will get a new "devices.json" file under "/run/vc/sbs/<sid>/"
which saves detailed device information, this also conforms to the concept that
device should be sandbox level resource.
* every container uses a "devices.json" file but with new data format:
```
[
  {
    "ID": "b80d4736e70a471f",
    "ContainerPath": "/dev/zero"
  },
  {
    "ID": "6765a06e0aa0897d",
    "ContainerPath": "/dev/null"
  }
]
```
`ID` should reference to a device in a sandbox, `ContainerPath` indicates device
path inside a container.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-07-26 14:09:53 +08:00
Wei Zhang
2885eb0532 devices: use device manager to manage all devices
Fixes #50

Previously the devices are created with device manager and laterly
attached to hypervisor with "device.Attach()", this could work, but
there's no way to remember the reference count for every device, which
means if we plug one device to hypervisor twice, it's truly inserted
twice, but actually we only need to insert once but use it in many
places.

Use device manager as a consolidated entrypoint of device management can
give us a way to handle many "references" to single device, because it
can save all devices and remember it's use count.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
2018-07-26 11:33:28 +08:00
c00416947
8a6d383715 virtcontainers : fix shared dir resource remaining
Before this patch shared dir will reamin when sandox
has already removed, espacilly for kata-agent mod.

Do clean up shared dirs after all mounts are umounted.

Fixes: #291

Signed-off-by: Haomin <caihaomin@huawei.com>
2018-06-19 20:32:07 +08:00
Julio Montes
b99cadb553 virtcontainers: add pause and resume container to the API
Pause and resume container functions allow us to just pause/resume a
specific container not all the sanbox, in that way different containers
can be paused or running in the same sanbox.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-31 09:38:13 -05:00
Archana Shinde
d885782df1 namespace: Check if pid namespaces need to be shared
k8s provides a configuration for sharing PID namespace
among containers. In case of crio and cri plugin, an infra
container is started first. All following containers are
supposed to share the pid namespace of this container.

In case a non-empty pid namespace path is provided for a container,
we check for the above condition while creating a container
and pass this out to the kata agent in the CreatContainer
request as SandboxPidNs flag. We clear out the PID namespaces
in the configuration passed to the kata agent.

Fixes #343

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-05-30 13:34:24 -07:00
Peng Tao
be82c7fc6f Merge pull request #299 from jshachm/implement-events-command
cli :Implement events command
2018-05-18 15:35:52 +08:00
c00416947
1205e347f2 cli: implement events command
Events cli display container events such as cpu,
memory, and IO usage statistics.

By now OOM notifications and intel RDT are not fully supproted.

Fixes: #186

Signed-off-by: Haomin <caihaomin@huawei.com>
2018-05-18 09:17:49 +08:00
Julio Montes
4527a8066a virtcontainers/qemu: honour CPU constrains
Don't fail if a new container with a CPU constraint was added to
a POD and no more vCPUs are available, instead apply the constraint
and let kernel balance the resources.

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-14 17:33:31 -05:00
Julio Montes
81f376920e cli: implement update command
Update command is used to update container's resources at run time.
All constraints are applied inside the VM to each container cgroup.
By now only CPU constraints are fully supported, vCPU are hot added
or removed depending of the new constraint.

fixes #189

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-05-08 07:26:38 -05:00
Zhang Wei
f4a453b86c virtcontainers: address some comments
* Move makeNameID() func to virtcontainers/utils file as it's a generic
function for making name and ID.
* Move bindDevicetoVFIO() and bindDevicetoHost() to vfio driver package.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Zhang Wei
366558ad5b virtcontainers: refactor device.go to device manager
Fixes #50

This is done for decoupling device management part from other parts.
It seperate device.go to several dirs and files:

```
virtcontainers/device
├── api
│   └── interface.go
├── config
│   └── config.go
├── drivers
│   ├── block.go
│   ├── generic.go
│   ├── utils.go
│   ├── vfio.go
│   ├── vhost_user_blk.go
│   ├── vhost_user.go
│   ├── vhost_user_net.go
│   └── vhost_user_scsi.go
└── manager
    ├── manager.go
    └── utils.go
```

* `api` contains interface definition of device management, so upper level caller
should import and use the interface, and lower level should implement the interface.
it's bridge to device drivers and callers.
* `config` contains structed exported data.
* `drivers` contains specific device drivers including block, vfio and vhost user
devices.
* `manager` exposes an external management package with a `DeviceManager`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2018-05-08 10:24:26 +08:00
Peng Tao
1bb6ab9e22 api: add sandbox iostream API
It returns stdin, stdout and stderr stream of the specified process in
the container.

Fixes: #258

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-04 15:38:32 +08:00
Peng Tao
bf4ef4324e API: add sandbox winsizeprocess api
It sends tty resize request to the agent to resize a process's tty
window.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-04 15:38:32 +08:00