Commit Graph

9129 Commits

Author SHA1 Message Date
Bin Liu
383be2203a agent: Add a macro to skip a loop easier
Add a macro to skip a loop easier without using a
if {} else {} condition check.

Fixes: #4185

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-04-30 20:45:41 +08:00
Bin Liu
c633780ba7
Merge pull request #4119 from bradenrayhorn/test-create-logger-task
agent: add tests for create_logger_task function
2022-04-30 19:48:07 +08:00
Bin Liu
97d7b1845b runk: use custom Kill command to support --all option
runk uses liboci-cli crate to parse command line options,
but liboci-cli does not support --all option for kill command,
though this is the runtime spec behavior.

But crictl will issue kill --all command when stopping containers,
as a workaround, we use a custom kill command instead of the one
provided by liboci-cli.

Fixes: #4182

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-04-30 19:34:18 +08:00
Fabiano Fidêncio
1dd6f85a17
Merge pull request #4178 from liubin/4177
runk: set BinaryName for runk for containerd
2022-04-29 21:17:37 +02:00
Champ-Goblem
1b7fd19acb rootfs: Fix chronyd.service failing on boot
In at least kata versions 2.3.3 and 2.4.0 it was noticed that the guest
operating system's clock would drift out of sync slowly over time
whilst the pod was running.

This had previously been raised and fixed in the old reposity via [1].
In essence kvm_ptp and chrony were paired together in order to
keep the system clock up to date with the host.

In the recent versions of kata metioned above,
the chronyd.service fails upon boot with status `266/NAMESPACE`
which seems to be due to the fact that the `/var/lib/chrony`
directory no longer exists.

This change sets the `/var/lib/chrony` directory for the `ReadWritePaths`
to be ignored when the directory does not exist, as per [2].

[1] https://github.com/kata-containers/runtime/issues/1279
[2] https://www.freedesktop.org/software/systemd
/man/systemd.exec.html#ReadWritePaths=

Fixes: #4167
Signed-off-by: Champ-Goblem <cameron_mcdermott@yahoo.co.uk>
2022-04-29 17:15:29 +01:00
Bin Liu
7772f7dd99 runk: set BinaryName for runk for containerd
The default runtime for io.containerd.runc.v2 is runc,
to use runk, the containerd configuration should set the
default runtime to runk or add BinaryName options for the
runtime.

Fixes: #4177

Signed-off-by: Bin Liu <bin@hyper.sh>
2022-04-29 22:26:32 +08:00
James O. D. Hunt
cc839772d3
Merge pull request #2785 from ManaSugi/standard-container-runtime
tools: Add a Rust-based standard OCI container runtime based on Kata agent
2022-04-29 13:20:59 +01:00
James O. D. Hunt
2d5f11501c
Merge pull request #4083 from bradenrayhorn/test-parse-mount-table
rustjail: add tests for parse_mount_table
2022-04-29 11:34:22 +01:00
Jianyong Wu
982c32358a
Merge pull request #4031 from Jaylyn-Ren/kata-spdk
Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
2022-04-29 12:16:38 +08:00
Feng Wang
da11c21b4a
Merge pull request #3248 from fengwang666/direct-blk-design
docs: repropose direct-assigned volume
2022-04-28 16:55:50 -07:00
Feng Wang
7ffe5a16f2 docs: Direct-assigned volume design
Detail design description on direct-assigned volume

Fixes: #1468

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-04-28 14:47:36 -07:00
Julio Montes
ea857bb1b8
Merge pull request #4172 from devimc/2022-04-28/fixQEMU
versions: change qemu tdx url and tag
2022-04-28 15:31:52 -05:00
Archana Shinde
9fdc88101f
Merge pull request #3907 from zvonkok/nvidia
doc: Update for NVIDIA GPUs
2022-04-28 12:42:44 -07:00
Julio Montes
081f6de874 versions: change qemu tdx url and tag
https://github.com/intel/qemu-dcp is the new repo that supports
qemu with Intel TDX

fixes #4171

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-04-28 13:46:11 -05:00
Chelsea Mafrica
3f069c7acb
Merge pull request #4166 from jodh-intel/agent-ctl-fix-abstract
agent-ctl: Fix abstract socket connections
2022-04-28 10:17:28 -07:00
James O. D. Hunt
666aee54d2 docs: Add VSOCK localhost example for agent-ctl
Update the `agent-ctl` docs to show how to use a VSOCK local address
when running the agent and the tool in the same environment. This is an
alternative to using a Unix socket.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-04-28 13:33:23 +01:00
James O. D. Hunt
86d348e065 docs: Use VM term in agent-ctl doc
Use the standard "VM" acronym to mean Virtual Machine.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-04-28 13:33:19 +01:00
James O. D. Hunt
4b9b62bb3e agent-ctl: Fix abstract socket connections
Unbreak the `agent-ctl` tool connecting to the agent with a Unix domain
socket.

It appears that [1] changed the behaviour of connecting to the agent
using a local Unix socket (which is not used by Kata under normal
operation).

The change can be seen by reverting to commit
72b8144b56 (the one before [1]) and
running the agent manually as:

```bash
$ sudo KATA_AGENT_SERVER_ADDR=unix:///tmp/foo.socket target/x86_64-unknown-linux-musl/release/kata-agent
```

Before [1], in another terminal we see this:

```bash
$ sudo lsof -U 2>/dev/null |grep foo|awk '{print $9}'
@/tmp/foo.socket@
```

But now, we see the following:

```bash
$ sudo lsof -U 2>/dev/null |grep foo|awk '{print $9}'
@/tmp/foo.socket
```

Note the last byte which represents a nul (`\0`) value.

The `agent-ctl` tool used to add that trailing nul but now it seems to not
be needed, so this change removes it, restoring functionality. No
external changes are necessary so the `agent-ctl` tool can connect to
the agent as below like this:

```bash
$ cargo run -- -l debug connect --server-address "unix://@/tmp/foo.socket" --bundle-dir "$bundle_dir" -c Check -c GetGuestDetails
```

[1] - https://github.com/kata-containers/kata-containers/issues/3124

Fixes: #4164.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-04-28 13:33:09 +01:00
Fabiano Fidêncio
c4dd029566
Merge pull request #4135 from fidencio/topic/clh-net-rate-limitting
Implement network and disk rate limiter for Cloud Hypervisor
2022-04-28 13:33:10 +02:00
Fabiano Fidêncio
9fb9c80fd3
Merge pull request #4161 from fidencio/topic/kata-deploy-plus-rke2
kata-deploy: Add support to RKE2
2022-04-28 11:35:11 +02:00
Fabiano Fidêncio
b6467ddd73 clh: Expose disk rate limiter config
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4139

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:29 +02:00
Fabiano Fidêncio
7580bb5a78 clh: Expose net rate limiter config
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4017

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:13 +02:00
Fabiano Fidêncio
a88adabaae clh: Cloud Hypervisor has a built-in Rate Limiter
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).

Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecraker to do I/O throttling.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:56 +02:00
Fabiano Fidêncio
63c4da03a9 clh: Implement the Disk RateLimiter logic
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the network device configuration.

The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:53 +02:00
Fabiano Fidêncio
511f7f822d config: Add DiskRateLimiter* to Cloud Hypervisor
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:15 +02:00
Fabiano Fidêncio
5b18575dfe hypervisor: Add disk bandwidth and operations rate limiters
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.

The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
  DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the DiskRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:11 +02:00
Fabiano Fidêncio
1cf9469297 clh: Implement the Network RateLimiter logic
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.

The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:26:38 +02:00
Fabiano Fidêncio
00a5b1bda9 utils: Define DefaultRateLimiterRefillTimeMilliSecs
Firecracker's driver doesn't expose the RefillTime option of the rate
limiter to the user.  Instead, it uses a contant value of 1000
miliseconds (1 second).

As we're following Firecracker's driver implementation, let's expose
create a new constant, use it as part of the Firecracker's driver, and
later on re-use it as part of the Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
be1bb7e39f utils: Move FC's function to revert bytes to utils
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function will
be reused by Cloud Hypervisor, when adding the rate limiter logic there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
c9f6496d6d config: Add NetRateLimiter* to Cloud Hypervisor
Let's add the newly added network rate limiter configurations to the
Cloud Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
2d35e6066d hypervisor: Add network bandwidth and operations rate limiters
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.

The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.

The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.

The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
  NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the NetRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Braden Rayhorn
b0e439cb66
rustjail: add tests for parse_mount_table
Add tests for parse_mount_table function in rustjail/src/mount.rs.
Includes some minor refactoring improve the testability of the
function and improve its error values.

Fixes: #4082

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-04-27 20:06:01 -05:00
Chelsea Mafrica
ab067cf074
Merge pull request #4163 from GabyCT/topic/fixdoccontainerd
docs: Update containerd link to installation guide
2022-04-27 16:18:57 -07:00
Fabiano Fidêncio
ccb0183934 kata-deploy: Add support to RKE2
"RKE2 - Rancher's Next Generation Kuberentes Distribution" can easily be
supported by kata-deploy with some simple adjustments to what we've been
relying on for "k3s".

The main differences between k3s and RKE2 are, basically:
1. The location where the containerd configuration is stored
   - k3s: /var/lib/rancher/k3s/agent/etc/containerd/
   - rke2: /var/lib/rancher/rke2/agent/etc/containerd/
2. The name of the systemd services used:
   - k3s: k3s.service or k3s-agent.service
   - rke2: rke2-server.service or rke2-agent.service

Knowing this, let's add a new overlay for RKE2, adapt the kata-deploy
and the kata-cleanup scripts, and that's it.

Fixes: #4160

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 19:05:36 +02:00
Fabiano Fidêncio
9d39362e30 kata-deploy: Reestructure the installing section
Let's move the specific installation instructions, such as for k3s,
upper in the document.

This helps reading (and also skipping) according to what the user
is looking for.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-04-27 19:05:36 +02:00
Fabiano Fidêncio
18d27f7949 kata-deploy: Add a missing $ prefix in the README
Commit short-log says it all.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-04-27 19:05:36 +02:00
Gabriela Cervantes
6948b4b360 docs: Update containerd link to installation guide
This PR updates the containerd url link for the installation guide

Fixes #4162

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-04-27 16:52:53 +00:00
Manabu Sugimoto
b221a2590f tools: Add runk
Add a Rust-based standard OCI container runtime based on
Kata agent.

You can build and install runk as follows:

```sh
$ cd src/tools/runk
$ make
$ sudo make install
$ runk --help
```

Fixes: #2784

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-04-28 00:48:57 +09:00
Manabu Sugimoto
2c218a07b9 agent: Modify Kata agent for runk
Generate an oci-kata-agent which is a customized agent to be
called from runk which is a Rust-based standard OCI container
runtime based on Kata agent.

Fixes: #2784

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2022-04-28 00:48:57 +09:00
Zvonko Kaiser
dd4bd7f471 doc: Added initial doc update for NV GPUs
Fixed rpm vs deb references
Update to the shell portion

Fixes #3379

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-04-27 16:38:35 +02:00
James O. D. Hunt
d02db3a268
Merge pull request #4156 from Kvasscn/kata_dev_fix_docs_pc_machine
docs: remove pc machine type supports
2022-04-27 11:55:58 +01:00
James O. D. Hunt
0a6e7d443e
Merge pull request #3910 from etrunko/agent_random
Agent: Unit tests for random.rs
2022-04-27 09:41:02 +01:00
James O. D. Hunt
7b20707197
Merge pull request #4107 from garrettmahin/test-mount-grpc-to-oci
rustjail: Add tests for mount_grpc_to_oci
2022-04-27 08:50:24 +01:00
Fabiano Fidêncio
411053e2bd
Merge pull request #4152 from gkurz/fix-clh-build
packaging: Fix broken path in `build-static-clh.sh`
2022-04-27 08:59:43 +02:00
Jason Zhang
832c33d5b5 docs: remove pc machine type supports
Currently the 'pc' machine type is no longer supported in kata configuration,
so remove it in the design docs.

Fixes: #4155

Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
2022-04-27 11:28:03 +08:00
Greg Kurz
b658dccc5f tools: fix typo in clh directory name
This allows to get released binaries again.

Fixes: #4151

Signed-off-by: Greg Kurz <groug@kaod.org>
2022-04-26 17:57:32 +02:00
Greg Kurz
afbd60da27 packaging: Fix clh build from source fall-back
If we fail to download the clh binary, we fall-back to build from source.
Unfortunately, `pull_clh_released_binary()` leaves a `cloud_hypervisor`
directory behind, which causes `build_clh_from_source()` not to clone
the git repo:

    [ -d "${repo_dir}" ] || git clone "${cloud_hypervisor_repo}"

When building from a kata-containers git repo, the subsequent calls
to `git` in this function thus apply to the kata-containers repo and
eventually fail, e.g.:

+ git checkout v23.0
error: pathspec 'v23.0' did not match any file(s) known to git

It doesn't quite make sense actually to keep an existing directory the
content of which is arbitrary when we want to it to contain a specific
version of clh. Just remove it instead.

Fixes: #4151

Signed-off-by: Greg Kurz <groug@kaod.org>
2022-04-26 17:57:32 +02:00
Peng Tao
5b6e45ed6c
Merge pull request #4141 from dgibson/cleanup-tmp
Fix Go unit tests to clean up /tmp after themselves
2022-04-26 15:43:34 +08:00
Garrett Mahin
4b9e78b837 rustjail: Add tests for mount_grpc_to_oci
Add test coverage for mount_grpc_to_oci in rustjail/src/lib.rs

Fixes: #4106

Signed-off-by: Garrett Mahin <garrett.mahin@gmail.com>
2022-04-25 08:37:17 -05:00
James O. D. Hunt
bc919cc54c
Merge pull request #4122 from bradenrayhorn/test-mount-from
rustjail: add tests for mount_from function
2022-04-25 11:55:21 +01:00