Commit Graph

5354 Commits

Author SHA1 Message Date
Christophe de Dinechin
2d65b3bfd8 config: Protect jailer_path annotation
The jailer_path annotation can be used to execute arbitrary code on
the host. Add a jailer_path_list configuration entry providing a list
of regular expressions that can be used to filter annotations that
represent valid file names.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
fe5e1cf2e1 config: Add examples for path_list configuration
The path_list configuration gives a series of regular expressions that
limit which values are acceptable through annotations in order to
avoid kata launching arbitrary binaries on the host when receiving an
annotation.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
3f7bcf54f0 annotations: Simplify negative logic
Replace strange negative logic  (!ok -> continue) with positive
logic (ok -> do it)

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
80144fc415 config: Add hypervisor path override through annotations
The annotation is provided, so it should be respected.
Furthermore, it is important to implement it with the appropriate
protetions similar to what was done for virtiofsd.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2f5f35608a config: Fix typo in function name
There was an extra 'p' in addHypervisorVirtioFsOverrides.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
2faafbdd3a config: Protect virtio_fs_daemon annotation
Sending the virtio_fs_daemon annotation can be used to execute
arbitrary code on the host. In order to prevent this, restrict the
values of the annotation to a list provided by the configuration
file.

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Christophe de Dinechin
9e5ed41511 config: Add 'List' alternates for hypervisor configuration paths
Paths mentioned in the hypervisor configuration can be overriden
using annotations, which is potentially dangerous. For each path,
add a 'List' variant that specifies the list of acceptable values
from annotations.

Bug: https://bugs.launchpad.net/katacontainers.io/+bug/1878234

Fixes: #901

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2020-10-18 00:40:16 +08:00
Peng Tao
b33d4fe708 agent: fix panic on malformed device resource in container update
Somehow containerd is sending a malformed device in update API. While it
should not happen, we should not panic either.

Fixes: #946
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Eric Ernst
183823398d cpuset: add cpuset pkg
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
bfbbe8ba6b cpuset: don't set cpuset.mems in the guest
Kata doesn't map any numa topologies in the guest. Let's make sure we
clear the Cpuset fields before passing container updates to the
guest.

Note, in the future we may want to have a vCPU to guest CPU mapping and
still include the cpuset.Cpus. Until we have this support, clear this as
well.

Fixes: #932

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
5c21ec278c sandbox: consider cpusets if quota is not enforced
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
9bb0d48d56 cpuset: support setting mems for sandbox
CPUSet cgroup allows for pinning the memory associated with a cpuset to
a given numa node. Similar to cpuset.cpus, we should take cpuset.mems
into account for the sandbox-cgroup that Kata creates.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
64a2ef62e0 virtcontainers: add method for calculating cpuset for sandbox
Calculate sandbox's CPUSet as the union of each of the container's
CPUSets.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
a441f21c40 cpuset: add cpuset pkg
Pulled from 1.18.4 Kubernetes, adding the cpuset pkg for managing
CPUSet calculations on the host. Go mod'ing the original code from
k8s.io/kubernetes was very painful, and this is very static, so let's
just pull in what we need.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
James O. D. Hunt
ce54090f25 docs: Update upgrading guide
Update the upgrading guide for 2.0.

Fixes: #928.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:40:16 +08:00
Ychau Wang
e884fef483 docs: update the build kata containers kernel document
Update the build kata containers kernel document for 2.0 release. Fixed
the 1.x release project paths and urls, using the kata-containers
project file paths and urls.

Fixes: #929

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-18 00:40:16 +08:00
David Gibson
9c16643c12 agent/device: Check type as well as major:minor when looking up devices
To update device resource entries from host to guest, we search for
the right entry by host major:minor numbers, then later update it.
However block and character devices exist in separate major:minor
namespaces so we could have one block and one character device with
matching major:minor and thus incorrectly update both with the details
for whichever device is processed second.

Add a check on device type to prevent this.

Port from the Kata 1 Go agent
https://github.com/kata-containers/agent/commit/27ebdc9d2761

Fixes: #703

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
4978c9092c agent/device: Index all devices in spec before updating them
The agent needs to update device entries in the OCI spec so that it
has the correct major:minor numbers for the guest, which may differ
from the host.

Entries in the main device list are looked up by device path, but
entries in the device resources list are looked up by (host)
major:minor.  This is done one device at a time, updating as we go in
update_spec_device_list().

But since the host and guest have different namespaces, one device
might have the same major:minor as a different device on the host.  In
that case we could update one resource entry to the correct guest
values, then mistakenly update it again because it now matches a
different host device.

To avoid this, rather than looking up and updating one by one, we make
all the lookups in advance, creating a map from (host) device path to
the indices in the spec where the device and resource entries can be
found.

Port from the Go agent in Kata 1,
https://github.com/kata-containers/agent/commit/d88d46849130

Fixes: #703

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
a7ba362f92 agent/device: Forward port update_spec_device_list() unit test
The Kata 1 Go agent included a unit test for updateSpecDeviceList, but no
such unit test exists for the Rust agent's equivalent
update_spec_device_list().  Port the Kata1 test to Rust.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
David Gibson
230a9833f8 agent/device: update_spec_device_list() should error if dev not found
If update_spec_device_list() is given a device that can't be found in the
OCI spec, it currently does nothing, and returns Ok(()).  That doesn't
seem like what we'd expect and is not what the Go agent in Kata 1 does.

Change it to return an error in that case, like Kata 1.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-18 00:40:16 +08:00
Eric Ernst
a6d9fd4118 sandbox: don't constrain cpus, mem only cpuset, devices
Allow for constraining the cpuset as well as the devices-whitelist . Revert
sandbox constraints for cpu/memory, as they break the K8S use case. Can
re-add behind a non-default flag in the future.

The sandbox CPUSet should be updated every time a container is created,
updated, or removed.

To facilitate this without rewriting the 'non constrained cgroup'
handling, let's add to the Sandbox's cgroupsUpdate function.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
8f0cb2f1ea cgroups: add ability to update CPUSet
Add function for applying a cpuset change to a cgroup

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
Eric Ernst
cbdae44992 agent: fix errorneous parsing for guest block size
We were assuming base 10 string before, when the block size from sysfs
is actually a hex string. Let's fix that.

Fixes: #908

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2020-10-18 00:40:16 +08:00
James O. D. Hunt
97acaa8124 docs: Add containerd install guide
Create a containerd installation guide and a new `kata-manager` script
for 2.0 that automated the steps outlined in the guide.

Also cleaned up and improved the installation documentation in various
ways, the most significant being:

- Added legacy install link for 1.x installs.
- Official packages section:
  - Removed "Contact" column (since it was empty!)
  - Reworded "Versions" column to clarify the versions are a minimum
    (to reduce maintenance burden).
  - Add a column to show which installation methods receive automatic updates.
  - Modified order of installation options in table and document to
    de-emphasise automatic installation and promote official packages
    and snap more.
- Removed sections no longer relevant for 2.0.

Fixes: #738.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2020-10-18 00:40:16 +08:00
bin liu
23246662b2 agent: use ok_or/map_err instead of match
Sometimes `Option.or_or` and `Result.map_err` may be simpler
than match statement. Especially in rpc.rs, there are
many `ctr.get_process` and `sandbox.get_container` which
are using `match`.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
ebe5ad1386 rustjail: use Iterator to manipulate vector elements
Use Iterator can save codes, and make code more readable

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
c9497c88e4 rustjail: delete codes commented out
There are some uses/codes/struct fields are commented out, and
may not turn into  un-comment these codes, so delete these comments.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
d5d9928f97 rustjail: delete unused test code
The auto generated test code is no meanings, delete it.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
f70892a5bb agent: use chain of Result to avoid early return
Use rust `Result`'s `or_else`/`and_then` can write clean codes.
And can avoid early return by check wether the `Result`
is `Ok` or `Err`.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
ab64780a0b agent: update not accurate comments
This commit includes:
- update comments that not matched the function name
- file path with doubled slash

Fixes: #922

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
9e064ba192 agent: use macro to simplify parse_cmdline function in config.rs
In function parse_cmdline there are some similar codes, if we want
to add more commandline arguments, the code will grow too long.
Use macro can reduce some codes with the same logic/processing.

Fixes: #914

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
42c48f54ed agent: add blank lines between methods
In rpc.rs, there are no blank lines between methods, this commit
add blank lines for these methods.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
d3a36fa06f agent: delete unused field in agentService
The code is for test, and not needed now.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
fa546600ff agent: use no-named closure to reduce codes
For simple closures, inline closures can save codes.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
efddcb4ab8 agent: use a local fn to reduce duplicated codes
The same codes used twices, aggregated into a function can
reduce codes.

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
7bb3e562bc packaging: fix cloud-hypervisor binary path
1. ensure build-static-clh.sh puts cloud-hypervisor under ./cloud-hypervisor directory
2. install cloud-hypervisor/cloud-hypervisor binary

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
7b53041bad packaging: fix missing cloud_hypervisor_repo
It is needed in order to build from source.

Fixes: #916
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
38212ba6d8 packaging: apply qemu v5.1 stable fixes
Qemu v5.1 was released with an affending commit 9b3a35ec82
(virtio: verify that legacy support is not accidentally on).
As a result, it breaks commandline compatiblilities for old qemu
users. Upstream qemu has fixed it but no release has been put out yet.
Let's apply these fixes by hand for now.

Refs: https://www.mail-archive.com/qemu-devel@nongnu.org/msg729556.html

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Jianyong Wu
fb7e9b4f32 agent: fix aarch64 build
aarch64 needs libgcc to resolve some non-builtin symbols.

Fixes: #909
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
0cfcbf79b8 docs: add namespace key to pod/container config files
If no namespace field in config files, CRI-O will failed:
 setting pod sandbox name and id: cannot generate pod name without namespace

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
bin liu
997f1f6cd0 docs: Add crictl example json files
Add basic sample pod/container config files to show
how to use `crictl` with Kata containers.

Fixes: #881

Signed-off-by: bin liu <bin@hyper.sh>
2020-10-18 00:40:16 +08:00
Ychau Wang
f60f43af6b runtime: Clear the VCMock 1.x API Methods from 2.0
Clear the 1.x branch api methods in the 2.0. Keep the same methods to
the VC interface, like the VCImpl struct.

Fixes: #751

Signed-off-by: Ychau Wang <wangyongchao.bj@inspur.com>
2020-10-18 00:40:16 +08:00
Julio Montes
1789527d61 ci: snap: add event filtering
Run the snap CI on every PR is not needed. Don't run the snap CI
on PRs that don't change the source code (*.go/*.rs), a configuration
file or Makefile.

fixes #896

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Peng Tao
999f67d573 agent: do not follow link when mounting container proc and sysfs
Attackers might use it to explore other containers in the same pod.
While it is still safe to allow it, we can just close the race window
like runc does.

Fixes: #885
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
cb2255f199 agent: set init process non-dumpable
On old kernels (like v4.9), kernel applies CLOECEC in wrong order w.r.t.
dumpable task flags. As a result, we might leak guest file descriptor to
containers. This is a former runc CVE-2016-9962 and still applies to
kata agent. Although Kata container is still valid at protecting the
host, we should not leak extra resources to user containers.

This sets the init processes that join and setup the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other ) namespace.

This settings is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.

This prevents parent processes, the pid 1 of the container, to ptrace
the init process before it drops caps and other sets LSMs.

The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.

Refs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=613cc2b6f272c1a8ad33aefa21cad77af23139f7
https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318
opencontainers/runc@50a19c6
https://nvd.nist.gov/vuln/detail/CVE-2016-9962

Fixes: #890
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Peng Tao
2a6c9eec74 agent-ctl: include cargo lock updates
Simply running `make` would generate some cargo lock updates for
agent-ctl. Let's include them so that we have fixed dependencies.

Fixes: #883
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2020-10-18 00:40:16 +08:00
Julio Montes
eaff5de37a versions: add plugins section
plugins sections contains the details of plugins required for
the components or testing.

Add sriov-network-device-plugin url and version that are consumed
by the VFIO test in the tests repository.

fixes #879

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Jose Carlos Venegas Munoz
4f1d23b651 virtiofs: Disable DAX
virtiofs DAX support is not stable today, there are
a few corner cases to make it default.

Fixes: #862
Fixes: #875

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-10-18 00:40:16 +08:00
Julio Montes
6d80df9831 snap: specify python version
In order to avoid `unmet dependencies` error in the CI,
the python version must be specified in the yaml.

fixes #877

Signed-off-by: Julio Montes <julio.montes@intel.com>
2020-10-18 00:40:16 +08:00
Ralf Haferkamp
a116ce0b75 osbuilder: Create target directory for agent
When building with AGENT_SOURCE_BIN pointing to an already built
kata-agent binary, the target directory needs to be created in the
rootfs tree.

Fixes #873

Signed-off-by: Ralf Haferkamp <rhafer@suse.com>
2020-10-18 00:40:16 +08:00