Commit Graph

342 Commits

Author SHA1 Message Date
Sebastien Boeuf
87aa1d77ed
Merge pull request #252 from bergwolf/sandbox_api_1
API: support sandbox monitor operation
2018-05-01 10:01:17 -07:00
Peng Tao
9d1311d0ee kata_agent: refactor sendReq
CI complains about cyclomatic complexity in sendReq.

warning: cyclomatic complexity 16 of function (*kataAgent).sendReq() is
high (> 15) (gocyclo)

Refactor it a bit to avoid this error. I'm not a big fan of the new code
but it is done this way because golang does not support generics.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-01 22:42:39 +08:00
Peng Tao
35ebadcedc api: add sandbox Monitor API
It monitors the sandbox status and returns an error channel to let
caller watch it.

Fixes: #251

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-01 22:42:33 +08:00
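The error-channel pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the virtcontainers code: the `sandbox` type, `newSandbox`, `reportFailure` and the exact `Monitor` signature are all assumptions made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// sandbox is a hypothetical stand-in for the real sandbox type.
type sandbox struct {
	failures chan error // buffered so a report never blocks the reporter
}

// newSandbox is a hypothetical constructor used only by this sketch.
func newSandbox() *sandbox {
	return &sandbox{failures: make(chan error, 1)}
}

// Monitor returns a receive-only channel carrying sandbox failures.
// The caller watches it instead of polling the sandbox status.
func (s *sandbox) Monitor() <-chan error {
	return s.failures
}

// reportFailure is a hypothetical hook that a watcher would call
// when it notices the sandbox has failed.
func (s *sandbox) reportFailure(msg string) {
	select {
	case s.failures <- errors.New(msg):
	default: // a failure is already pending; drop the duplicate
	}
}

func main() {
	s := newSandbox()
	watch := s.Monitor()
	s.reportFailure("agent connection lost")
	fmt.Println("sandbox failed:", <-watch)
}
```

Returning a receive-only channel keeps the caller from writing to or closing it by accident.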
Eric Ernst
70b3c774f8
Merge pull request #263 from bergwolf/sandbox_pointer
virtcontainers: always pass sandbox as a pointer
2018-05-01 07:33:33 -07:00
Peng Tao
5fb4768f83 virtcontainers: always pass sandbox as a pointer
Currently we sometimes pass it as a pointer and other times not. As
a result, the view of sandbox across virtcontainers may not be the same
and it costs extra memory copy each time we pass it by value. Fix it
by ensuring sandbox is always passed by pointers.

Fixes: #262

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-05-01 20:50:07 +08:00
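The diverging "view of sandbox" mentioned above is easy to reproduce in isolation. The sketch below uses a hypothetical one-field `sandbox` struct and two made-up helpers; it only illustrates Go's value versus pointer semantics, not the actual virtcontainers types.

```go
package main

import "fmt"

// sandbox is a hypothetical, heavily simplified sandbox type.
type sandbox struct {
	state string
}

// stopByValue receives a copy, so the caller's sandbox is unchanged.
func stopByValue(s sandbox) {
	s.state = "stopped"
}

// stopByPointer updates the one sandbox shared with the caller.
func stopByPointer(s *sandbox) {
	s.state = "stopped"
}

func main() {
	s := sandbox{state: "running"}

	stopByValue(s)
	fmt.Println("after stopByValue:", s.state)

	stopByPointer(&s)
	fmt.Println("after stopByPointer:", s.state)
}
```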
Jose Carlos Venegas Munoz
4d73637829 versions: Move to k8s 1.10
Move to k8s 1.10

Fixes: #277

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-05-01 02:15:11 -05:00
Sebastien Boeuf
8d897f407f
Merge pull request #238 from jodh-intel/collect-script-support-initrd+osbuilder-file
Tidy up and add support for initrd and osbuilder metadata file
2018-04-30 16:00:15 -07:00
Eric Ernst
ff9b2bd04e
Merge pull request #256 from sboeuf/improve_container_search_CLI
cli: Optimize container research
2018-04-30 14:41:40 -07:00
Sebastien Boeuf
e6f066b828 cli: Optimize container research
This commit will allow for better performance regarding the time spent
to retrieve the sandbox ID related to a container ID.

The way it works is by relying on a specific mapping between container
IDs and sandbox IDs, meaning the CLI can retrieve the sandbox ID
related to a container ID directly. This lowers complexity from
O(n²) to O(1), because we don't need to call into ListPod(), which was
parsing all the pods and all the containers on the system every time
the CLI needed to retrieve this mapping.

This commit also updates the unit tests as a consequence. Most of
them are affected, since they all relied on ListPod() before.

Fixes #212

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-30 10:53:08 -07:00
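The idea of a direct container-to-sandbox mapping can be sketched with an in-memory map. This is an assumption-laden illustration: the commit message does not say how the mapping is stored, and `sandboxIndex` with its methods is a hypothetical name invented here.

```go
package main

import "fmt"

// sandboxIndex is a hypothetical direct mapping from container ID to
// sandbox ID. It replaces a scan over every sandbox and container.
type sandboxIndex struct {
	byContainer map[string]string
}

func newSandboxIndex() *sandboxIndex {
	return &sandboxIndex{byContainer: make(map[string]string)}
}

// add records which sandbox a container belongs to.
func (i *sandboxIndex) add(containerID, sandboxID string) {
	i.byContainer[containerID] = sandboxID
}

// remove forgets a container, for example after it has been deleted.
func (i *sandboxIndex) remove(containerID string) {
	delete(i.byContainer, containerID)
}

// lookup is a single map access, with no listing of other sandboxes.
func (i *sandboxIndex) lookup(containerID string) (string, error) {
	sandboxID, ok := i.byContainer[containerID]
	if !ok {
		return "", fmt.Errorf("no sandbox found for container %q", containerID)
	}
	return sandboxID, nil
}

func main() {
	idx := newSandboxIndex()
	idx.add("container-1", "sandbox-a")
	idx.add("container-2", "sandbox-a")

	sandboxID, err := idx.lookup("container-2")
	fmt.Println(sandboxID, err)
}
```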
Graham Whaley
f92d7dd1c1
Merge pull request #275 from sboeuf/fix_k8s_shim_killed
virtcontainers: Properly remove the container when shim gets killed
2018-04-30 16:51:34 +01:00
Sebastien Boeuf
e78941e3e5
Merge pull request #272 from amshinde/pass-bundle-in-hooks
hooks: Send the bundle path in the state that is sent with hooks
2018-04-30 07:28:27 -07:00
Sebastien Boeuf
789dbca6d6 virtcontainers: Properly remove the container when shim gets killed
Here is an interesting case I have been debugging. I was trying to
understand why a "kubeadm reset" was not working for kata-runtime
compared to runc. In this case, the only pod started with Kata is
the kube-dns pod. For some reason, when this pod is stopped and
removed, its containers receive some signals, 2 of them being SIGTERM
signals, which seems the way to properly stop them, but the third
container receives a SIGCONT. Obviously, nothing happens in this
case, but apparently CRI-O considers this should be the end of the
container and after a few seconds, it kills the container process
(being the shim in Kata case). Because it is using a SIGKILL, the
signal does not get forwarded to the agent because the shim itself
is killed right away. After this happened, CRI-O calls into
"kata-runtime state", we detect the shim is not running anymore
and we try to stop the container. The code will eventually call
into agent.RemoveContainer(), but this will fail and return an
error because inside the agent, the container is still running.

The approach to solve this issue here is to send a SIGKILL signal
to the container after the shim has been waited for. This call does
not check the returned error because in most regular use cases it
will fail: the shim not being there usually means the container
inside the VM has already terminated.
In case the shim has been killed without the possibility to
forward the signal (as described in the first paragraph), the SIGKILL
will work and will allow the following call to agent.stopContainer()
to proceed with the removal of the container inside the agent.

Fixes #274

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-27 18:36:27 -07:00
Archana Shinde
a301a9e641 hooks: Send the bundle path in the state that is sent with hooks
We currently just send the pid in the state. While OCI specifies
a few other fields as well, this commit just adds the bundle path
and the container id to the state. This should fix the errors seen
with hooks that rely on the bundle path.

Other fields, like the running "state" string, have been left out, as
this would need sending the strings that OCI recognises. Hooks have been
implemented in virtcontainers and sending the state string would
require calling into OCI specific code in virtcontainers.

The bundle path again is OCI specific, but this can be accessed
using annotations. Hooks really need to be moved to the cli as they
are OCI specific. This however needs network hotplug to be implemented
first so that the hooks can be called from the cli after the
VM has been created.

Fixes #271

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-04-27 15:48:58 -07:00
James O. D. Hunt
d4225ede2f
Merge pull request #260 from jcvenegas/go1.10
versions: move to go 1.10
2018-04-27 14:50:24 +01:00
Jose Carlos Venegas Munoz
20432dd99f versions: ci: Move to go 1.10
Change the latest working go version for kata.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-04-26 21:26:51 -05:00
Eric Ernst
ff3518e3ec
Merge pull request #232 from sboeuf/fix_openshift_k8s
cli: Don't wait for OCI delete to stop the sandbox
2018-04-26 15:38:48 -07:00
Jose Carlos Venegas Munoz
9830810684 vendor: update covertool.
Update covertool to allow running tests with go 1.10

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-04-26 11:38:15 -05:00
Sebastien Boeuf
644489b6e7 virtcontainers: Fix gofmt issues for Go 1.10
Now that our CI has moved to Go 1.10, we need to update one file
that is not formatted as the new gofmt (1.10) expects it to be
formatted.

Fixes #249

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-26 11:38:15 -05:00
Julio Montes
31eb51ee7d
Merge pull request #244 from jodh-intel/backtrace-on-sigusr1
cli: Backtrace on SIGUSR1
2018-04-26 07:49:10 -05:00
James O. D. Hunt
6191ddffb3 cli: Backtrace on SIGUSR1
Rework the signal handling code so that if debug is enabled and a
`SIGUSR1` signal is received, backtrace to the system log but continue
to run.

Added some basic tests for the signal handling code.

Fixes #241.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-26 11:39:20 +01:00
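A non-fatal backtrace signal of this kind could look roughly like the sketch below. It is not the kata-runtime code. Several things are assumptions: the backtrace goes to stderr rather than the system log, `SIGUSR1` is silently ignored when debug is off, and the names `handleSignal`, `backtrace`, `sigBacktrace` and `sigFatal` are hypothetical. It relies on Unix-only signals.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// Hypothetical names for the two signals this sketch distinguishes.
var (
	sigBacktrace os.Signal = syscall.SIGUSR1
	sigFatal     os.Signal = syscall.SIGTERM
)

// backtrace returns the stacks of all goroutines as text
// (truncated if they exceed the 64 KiB buffer).
func backtrace() string {
	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true)
	return string(buf[:n])
}

// handleSignal reports whether the process should keep running.
// Assumption: without debug, the backtrace signal is ignored.
func handleSignal(sig os.Signal, debug bool) bool {
	if sig == sigBacktrace {
		if debug {
			fmt.Fprintln(os.Stderr, backtrace())
		}
		return true
	}
	return false
}

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGUSR1, syscall.SIGTERM)

	// Send ourselves SIGUSR1 so the example terminates on its own.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		fmt.Println("kill failed:", err)
		return
	}

	sig := <-sigCh
	fmt.Println("keep running:", handleSignal(sig, true))
}
```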
Sebastien Boeuf
07af4edea9 cli: Stop the sandbox on a KILL
The same way a caller of "kata-runtime kill 12345" expects
the container 12345 to be killed, the same call to a container
representing a sandbox should actually kill the sandbox, meaning
it would be stopped after the container has been killed.

This way, the caller knows the VM is stopped after kill returns.
This is an issue raised by Openshift and Kubernetes tests. They
call into delete way after the call to kill has been submitted,
and in the meantime they kill all processes related to the container,
meaning they do kill the VM before we could do it ourselves. In this
case, the delete responsible for stopping the VM comes too late and it
returns an error when trying to destroy the sandbox while trying to
communicate with the agent since the VM is not here anymore.

This commit addresses this issue by letting "kill" call into
StopSandbox() if the command relates to a sandbox instead of
a simple container.

Fixes #246

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-25 09:07:34 -07:00
Sebastien Boeuf
163a081776 cli: Check sandbox state before to issue a StopSandbox
The way a delete works, it was always trying to stop the sandbox, even
when the force flag was not enabled. Because we want to be able to stop
the sandbox from a kill command, this means a sandbox stop might be
called twice, and we don't want the second stop to fail, leading to the
failure of the delete command.

That's why this commit checks the sandbox status before trying to
stop the sandbox.

Fixes #246

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-04-25 09:01:53 -07:00
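The state check described in this commit amounts to making the stop idempotent from the caller's side. Below is a minimal sketch under stated assumptions: the `sandbox` type, its state strings, and `stopIfRunning` are hypothetical and do not mirror the real CLI code.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	stateRunning = "running"
	stateStopped = "stopped"
)

// sandbox is a hypothetical, simplified sandbox with a single state.
type sandbox struct {
	state string
}

// stop fails when the sandbox is not running, which is what would
// make a second, unconditional stop break the delete command.
func (s *sandbox) stop() error {
	if s.state != stateRunning {
		return errors.New("sandbox is not running")
	}
	s.state = stateStopped
	return nil
}

// stopIfRunning checks the state first, so that calling it from
// both kill and delete is harmless.
func stopIfRunning(s *sandbox) error {
	if s.state != stateRunning {
		return nil
	}
	return s.stop()
}

func main() {
	s := &sandbox{state: stateRunning}
	fmt.Println("kill:", stopIfRunning(s))
	fmt.Println("delete:", stopIfRunning(s))
}
```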
James O. D. Hunt
fc8d913713 cli: Whitespace fix
Remove blank line.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-25 16:53:46 +01:00
James O. D. Hunt
7c6856f2a9 cli: Rename fatal.go to signals.go
The fatal file is going to also deal with non-fatal signals so rename
it.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-25 16:53:46 +01:00
Sebastien Boeuf
45e3f858f0
Merge pull request #255 from chavafg/topic/downgrade-go-version
versions: change newest supported go version
2018-04-24 14:39:56 -07:00
Salvador Fuentes
cf7491665b versions: change newest supported go version
change from go1.10 to 1.9.2.
Our static checks and unit tests fail when using
go 1.10. Since we use go 1.9.2 to test in our CI,
reflect this version in versions.yaml

By doing this, we will be able to remove the hardcoded version
from the jenkins scripts and instead install golang using
`.ci/install_go.sh` from the tests repository. And when moving
to go1.10 using a PR, the CI will test that the static checks
and unit tests pass correctly.

Fixes: #254.

Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
2018-04-24 12:48:55 -05:00
Jose Carlos Venegas Munoz
7bb4e0470c
Merge pull request #240 from jcvenegas/versions-cri
versions: Add cri-containerd to versions file.
2018-04-24 10:32:34 -05:00
Sebastien Boeuf
d931d2902d
Merge pull request #218 from bergwolf/sandbox_api
api: add sandbox operation APIs
2018-04-24 07:21:36 -07:00
Peng Tao
29ce01fd11 api: add sandbox EnterContainer API
And make VC EnterContainer a wrapper of it.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:33:51 +08:00
Peng Tao
488c3ee353 api: add sandbox Status API
It returns the status of current sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:33:47 +08:00
Peng Tao
b3d9683743 api: add sandbox StatusContainer API
It retrieves container status from sandbox.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:32:54 +08:00
Peng Tao
4b30446217 api: add sandbox startcontainer API
And make VC.StartContainer a wrapper of it.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Peng Tao
d9144c8514 api: add sandbox DeleteContainer API
DeleteContainer in api.go is now a wrapper of it.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Peng Tao
f6aa8a23fc api: add sandbox CreateContainer API
And make CreateContainer in api.go a wrapper of it.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Peng Tao
ef89131b85 api: add sandbox Delete API
By exporting the existing sandbox delete() function.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Peng Tao
5165de0d76 api: add sandbox pause and resume API
By exporting the existing sandbox operations.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Peng Tao
eb23771d5a api: add sandbox release API
It disconnects the agent connection and removes the sandbox
from global sandbox list.

A new option `LongLiveConn` is also added to kata
agent's configuration. When set, the API caller is expected
to call sandbox.Release() to drop the agent connection explicitly.

`proxyBuiltIn` is moved out of agent state because we can always
retrieve it from sandbox config instead.

Fixes: #217

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Peng Tao
d189be8579 api: add FetchSandbox
It finds an existing sandbox and returns it.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
2018-04-24 15:30:53 +08:00
Jose Carlos Venegas Munoz
336aa93e6c versions: Add cri-containerd to versions file.
- Add latest release from cri-containerd.

Fixes: #239

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2018-04-23 20:06:54 -05:00
Sebastien Boeuf
76af465724
Merge pull request #243 from jodh-intel/fix-TestIsHostDevice-test
virtcontainers: Fix TestIsHostDevice test as non-root
2018-04-23 11:27:55 -07:00
James O. D. Hunt
53d73e56e0 virtcontainers: Fix TestIsHostDevice test as non-root
Don't attempt to create a file below `/dev` when running as non-`root`.

Move the logic into a new `TestIsHostDeviceCreateFile` test and skip
unless `root`.

Fixes #242.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-23 14:29:13 +01:00
Sebastien Boeuf
de32be7eed
Merge pull request #211 from amshinde/assign-bridge-addr
Assign address to a pci bridge while appending it
2018-04-20 14:52:31 -07:00
James O. D. Hunt
9dceb3eed1 scripts: Added initrd support to collect script
The collect script is now able to extract the osbuilder metadata
from an initrd image.

Fixes #237.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-20 16:55:10 +01:00
James O. D. Hunt
72056eb89b scripts: Collect script now shows osbuilder file
Changed the collect script to display the contents of the
osbuilder metadata file which provides details of the image.

Partially fixes #237.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-20 16:55:01 +01:00
James O. D. Hunt
4281bc3543 scripts: Make collect script variable local
Added a missing `local` in `get_image_file()`.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-20 16:46:45 +01:00
James O. D. Hunt
fbd28085d3 scripts: Make more collect script variables read only
Changed some important global variables to be read-only.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2018-04-20 16:44:00 +01:00
Sebastien Boeuf
dec01c1ec0
Merge pull request #236 from devimc/cpu/cpuset
virtcontainers: kata_agent: enable cpus and mem sets
2018-04-20 00:43:43 -07:00
Sebastien Boeuf
397decb051
Merge pull request #220 from amshinde/revert-dev-mount
Handle device nodes and regular files in /dev
2018-04-19 15:00:02 -07:00
Julio Montes
e9404cc9e0 virtcontainers: kata_agent: enable cpus and mem sets
this patch is to honour docker `--cpuset-cpus` and
`--cpuset-mems` options.

fixes #221

Signed-off-by: Julio Montes <julio.montes@intel.com>
2018-04-19 13:16:46 -05:00
Archana Shinde
71c7a9c13e virtcontainers: Handle regular files in /dev
The k8s test creates a log file in /dev under
/dev/termination-log, which is not the right place to create
logs, but we need to handle this. With this commit, we handle
regular files under /dev by passing them as 9p shares. All other
special files including device files and directories
are not passed as 9p shares as these are specific to the host.
Any operations on these in the guest would fail anyways.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2018-04-19 10:59:26 -07:00
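The rule stated in this commit, that only regular files under /dev are shared over 9p, can be sketched as a small predicate on the file mode. The function and variable names are hypothetical. Treating every path outside /dev as shareable is an assumption of the sketch, not something the commit message says.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Example modes used to exercise the predicate without touching
// the real filesystem.
var (
	deviceMode = os.ModeDevice | os.ModeCharDevice | 0660
	dirMode    = os.ModeDir | 0755
)

// isUnderDev reports whether path is /dev itself or lives below it.
func isUnderDev(path string) bool {
	clean := filepath.Clean(path)
	return clean == "/dev" || strings.HasPrefix(clean, "/dev/")
}

// shareOver9p reports whether a mount source should be passed to
// the guest as a 9p share. Under /dev only regular files qualify;
// device nodes and directories are host specific.
// Assumption: paths outside /dev are always shared.
func shareOver9p(path string, mode os.FileMode) bool {
	if !isUnderDev(path) {
		return true
	}
	return mode.IsRegular()
}

func main() {
	fmt.Println(shareOver9p("/dev/termination-log", 0644))
	fmt.Println(shareOver9p("/dev/null", deviceMode))
	fmt.Println(shareOver9p("/dev/shm", dirMode))
}
```

In real code the mode would come from a call such as `os.Stat`, which is left out here to keep the sketch independent of the host.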