Commit Graph

11257 Commits

Author SHA1 Message Date
Fabiano Fidêncio
034d7aab87 tests: k8s: Ensure the runtime classes are properly created
With these 2 simple checks we can ensure that we do not regress on the
behaviour of allowing the runtime classes / default runtime class to be
created by the kata-deploy payload.

Fixes: #7491

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:46:04 +02:00
Fabiano Fidêncio
1e8fe131bd k8s: tests: Take advantage of SHIMS and DEFAULT_SHIM env vars
We don't have to do any sed to replace the runtimeclass being used by
the moment we start taking advantage of the `DEFAULT_SHIM` environment
variable exposed merged in the previous commits.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:15:34 +02:00
Jiang Liu
311671abb5
Merge pull request #7552 from jiangliu/agent-r1
Fix mimor bugs and improve coding stype of agent rpc/sandbox/mount
2023-08-09 13:19:02 +08:00
Jiang Liu
baabfa9f1f agent: refine implementation of mount related code
Refine implementation of mount by:
- log message with `path.display()` instead of `{:?}`
- add prefix "_" to unused variables
- pass by reference instead of by value to avoid creating redundant
  array
- exactly matching prefix "fsgid=" instead of "fsgid"
- avoid redundant clone() operations

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:03 +08:00
Jiang Liu
98ba211a34 agent: fix a bug in update_ephemeral_mounts()
There's a bug in function update_ephemeral_mounts() which only handles
the first storage object and ignores all other storage objects.

Fixes: #7551

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:02 +08:00
Jiang Liu
5333618d70 agent: make add_storage() take &[Storage] instead of Vec<Storage>
Simplify add_storage() by taking &[Storage] instead of Vec<Storage>.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:01 +08:00
Jiang Liu
37f34781d1 agent: simplify function online_cpu_memory()
Simplify function online_cpu_memory() by on calling update_cpuset_path()
for containers with cpuset configured.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:00 +08:00
Jiang Liu
d3c5422379 agent: refine style of code related to sandbox
Refine style of code related to sandbox by:
- remove unnecessary comments for caller to take lock, we have already taken
  `&mut self`.
- change "*count < 1 " to "*count == 0", `count` is type of u32.
- make remove_sandbox_storage() to take `&mut self` instead of `&self`.
- group related function to each others
- avoid search the map twice in function find_process()
- avoid unwrap() in function run_oom_event_monitor()
- avoid unwrap() in online_resources()

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:59 +08:00
Jiang Liu
71a9f67781 agent: avoid unwrap() in function do_remove_container()
Avoid unwrap() in function do_remove_container(), and also make
implmementation symmetric for both timeout and non-timeout cases.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:58 +08:00
Jiang Liu
84badd89d7 agent: avoid clone objects when possible
Optimize agent rpc implementation by:
- avoid clone objects when possible
- avoid unwrap() when possible
- explictly drop object to ensure order

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:56 +08:00
Chao Wu
b098960442
Merge pull request #7581 from justxuewei/bump-versions
deps: Bump dependent crate versions
2023-08-08 15:16:57 +08:00
Chao Wu
24bf637835
Merge pull request #7500 from pmores/fix-queue-num-in-dragonball-share-fs
fix number of queues handling in dragonball share fs device
2023-08-08 12:07:25 +08:00
Xuewei Niu
b23c5ed155 deps: Bump dependent crate versions
This pull request is mainly for updating vm-memory and vmm-sys-util.

The affacted crates include:

- vm-memory: from 0.9.0 to 0.10.0
- vmm-sys-util: from 0.10.0 to 0.11.0
- virtio-queue: from 0.6.0 to 0.7.0
- fuse-backend-rs: from 0.10.4 to 0.10.5
- linux-loader: from 0.6.0 to 0.8.0
- nydus-api: from 0.3.0 to 0.3.1
- nydus-rafs: from 0.3.1 to 0.3.2
- nydus-storage: from 0.6.3 to 0.6.4

Fixes: #0000

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-08 11:54:09 +08:00
Fupan Li
5a20d8dcaf
Merge pull request #7383 from justxuewei/dan
runtime-rs: Introduce directly attachable network
2023-08-08 09:54:28 +08:00
Chelsea Mafrica
553fd79ea9
Merge pull request #7572 from GabyCT/topic/resnet50fp32
metrics: General improvements to mobilenet tensorflow test
2023-08-07 13:33:28 -07:00
GabyCT
194120b679
Merge pull request #7540 from GabyCT/topic/enableiperf
gha: Add iperf network metrics
2023-08-07 13:40:02 -06:00
Gabriela Cervantes
863283716d metrics: General improvements to mobilenet tensorflow test
This PR renames the mobilenet tensorflow test to have a more specific
tensorflow name mainly because tensorflow has different configurations
and we will add more tensorflow tests so we want to distinguish each
tensorflow test.

Fixes #7571

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:50:00 +00:00
Gabriela Cervantes
3c319d8d4c metrics: Add iperf to gha run script
This PR adds iperf to gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:20:00 +00:00
Gabriela Cervantes
5b5caf8908 gha: Add iperf network metrics
This PR adds the iperf network metrics to the github actions
for kata metrics.

Fixes #7535

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:20:00 +00:00
Chelsea Mafrica
4559caf619
Merge pull request #7467 from ManaSugi/doc/use-k8-control-plane
docs: Use control-plane term instead of master
2023-08-06 23:40:51 -07:00
Fabiano Fidêncio
b365bef570
Merge pull request #7191 from wedsonaf/avoid-clones
agent: avoid unnecessary calls to `Arc::clone`
2023-08-06 15:34:07 +02:00
GabyCT
7144acb2a5
Merge pull request #7527 from GabyCT/topic/latency
metrics: Add network latency test
2023-08-04 15:54:07 -06:00
Gabriela Cervantes
66db5b5350 metrics: Add latency test to network README
This PR adds latency test to network README for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-04 20:27:27 +00:00
Wedson Almeida Filho
c36572418f agent: avoid unnecessary calls to Arc::clone
These calls cause two extra atomic instructions each time they're used,
one to increment and another one to decrement the refcount.

Since we don't need them because the referred value is guaranteed to
outlive the function, remove the calls.

Fixes: #7190

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 20:53:05 -03:00
Fabiano Fidêncio
8c03deac3a
Merge pull request #7106 from wedsonaf/image-pulling
Image pulling on the host
2023-08-04 01:08:42 +02:00
Wedson Almeida Filho
4fbe0a3a53 runtime: bind-mount mounted block device into container
When the mounted block device isn't a layer, we want to mount it into
containers, but since it's already mounted with the correct fs (e.g.,
tar, ext4, etc.) in the pod, we just bind-mount it into the container.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
7e1b1949d4 runtime: add support for kata overlays
When at least one `io.katacontainers.fs-opt.layer` option is added to
the rootfs, it gets inserted into the VM as a layer, and the file system
is mounted as an overlay of all layers using the overlayfs driver.

Additionally, if the `io.katacontainers.fs-opt.block_device=file` option
is present in a layer, it is mounted as a block device backed by a file
on the host.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6c867d9e86 agent: add io.katacontainers.fs-opt.overlay-rw option
This causes the overlay-fs driver to add the `upperdir` and `workdir`
options to an overlay-fs mount so that the mount becomes writable using
a discardable directory under the container id.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6163c35657 agent: skip mount options that start with "io.katacontainers."
This is so that file systems don't fail when we pass kata-specific
options from the snapshotter to kata.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Fabiano Fidêncio
fa35afa982
Merge pull request #7542 from wedsonaf/ci-fix
Use version 0.10.4 of `fuse-backend-rs`
2023-08-03 22:50:11 +02:00
Wedson Almeida Filho
b2ff97aa01 dragonball: use version 0.10.4 of fuse-backend-rs
Version 0.10.5, which was just released, breaks `nydus-storage`.

This is a workaround to fix the CI which is blocking other PRs.

Fixes: #7541

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 14:15:17 -03:00
Fabiano Fidêncio
ebdae7cfdf
Merge pull request #7520 from jepio/host-systemctl
kata-deploy: Use host's systemctl
2023-08-03 13:53:28 +02:00
Fabiano Fidêncio
e2755a47b8
Merge pull request #7524 from fidencio/revert-kata-deploy-changes-after-3.2.0-rc0-release
release: Revert kata-deploy changes after 3.2.0-rc0 release
2023-08-03 11:28:43 +02:00
Fabiano Fidêncio
1163fc9de2 release: Revert kata-deploy changes after 3.2.0-rc0 release
As 3.2.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-03 10:08:20 +02:00
Xuewei Niu
3958a39d07 runtime-rs: Introduce directly attachable network
Kata containers as VM-based containers are allowed to run in the host
netns. That is, the network is able to isolate in the L2. The network
performance will benefit from this architecture, which eliminates as many
hops as possible. We called it a Directly Attachable Network (DAN for
short).

The network devices are placed at the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in the format of JSON,
including device name, type, and network info. At the very beginning stage,
the DAN only supports host tap devices. More devices, like the DPDK, will
be supported in later versions.

The format of file looks like as below:

```json
{
	"netns": "/path/to/netns",
	"devices": [{
		"name": "eth0",
		"guest_mac": "xx:xx:xx:xx:xx",
		"device": {
			"type": "vhost-user",
			"path": "/tmp/test",
			"queue_num": 1,
			"queue_size": 1
		},
		"network_info": {
			"interface": {
				"ip_addresses": ["192.168.0.1/24"],
				"mtu": 1500,
				"ntype": "tuntap",
				"flags": 0
			},
			"routes": [{
				"dest": "172.18.0.0/16",
				"source": "172.18.0.1",
				"gateway": "172.18.31.1",
				"scope": 0,
				"flags": 0
			}],
			"neighbors": [{
				"ip_address": "192.168.0.3/16",
				"device": "",
				"state": 0,
				"flags": 0,
				"hardware_addr": "xx:xx:xx:xx:xx"
			}]
		}
	}]
}
```

Fixes: #1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-03 15:33:34 +08:00
David Esparza
7d1c48c881
Merge pull request #7530 from dborquez/fix_check_running_processes
metrics: stop kata components before start a metric test.
2023-08-02 23:51:27 -06:00
Zhongtao Hu
e719423262
Merge pull request #7127 from cmaf/runtime-rs-ch-blk-2
runtime-rs: Add block device handling for cloud hypervisor
2023-08-03 09:46:32 +08:00
David Esparza
1e15369e59
metrics: Improve naming testing containers in launch times test
This commit provides a new way to name the containers used
in the launch-times-test in this form:
'kata_launch_times_RANDOM_NUMBER', where RANDOM_NUMBER is
in the 0-1000 range.

Fixes: #7529

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-02 17:04:55 -06:00
David Esparza
5dbe88330f
metrics: Clean kata components before start a metric test.
This PR kills all kata components before start a new
metric test.

Fixes: #7528

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-02 17:04:51 -06:00
Fabiano Fidêncio
d424f3c595
Merge pull request #7523 from fidencio/3.2.0-rc0-branch-bump
# Kata Containers 3.2.0-rc0
2023-08-02 20:04:37 +02:00
Zvonko Kaiser
cf8899f260
Merge pull request #7494 from zvonkok/vfio-mode
vfio: Fix vfio device ordering
2023-08-02 19:45:22 +02:00
Gabriela Cervantes
3b45060b61 metrics: Add latency server yaml
This PR adds latency server yaml for kubernetes test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:52:17 +00:00
Gabriela Cervantes
9bb8451df5 metrics: Add latency client yaml
This PR adds latency client yaml for the kubernetes test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:50:51 +00:00
Gabriela Cervantes
64fdb98704 metrics: Add network latency test
This PR adds network latency test for kata metrics.

Fixes #7526

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:46:48 +00:00
Chelsea Mafrica
a81ad3b587 runtime-rs: Add block device handling in cloud hypervisor
Add functions for adding a block device to a container for CH.

Fixes #6690

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-08-02 09:18:48 -07:00
David Esparza
542012c8be
Merge pull request #7503 from GabyCT/topic/ghafio
metrics: Add FIO test to gha for kata metrics CI
2023-08-02 10:05:09 -06:00
David Esparza
5979f3790b
Merge pull request #7516 from GabyCT/topic/addiperf
metrics: Add iperf3 network test
2023-08-02 10:04:51 -06:00
Fabiano Fidêncio
006ecce49a release: Kata Containers 3.2.0-rc0
- ci-on-push: Make the CI also run for the stable-* branches
- ci: k8s: Do not fail when gathering info on AKS nodes
- kata-deploy: enable cross build for non-x86
- runtime-rs: add support for gather metrics in runtime-rs
- kata-ctl: add monitor subcommand for runtime-rs
- release: release-note.sh: Fix typos and reference to images
- metrics: Add sysbench performance test
- Simplify implementation of runtime-rs/service

6ad16d497 release: Adapt kata-deploy for 3.2.0-rc0
025596b28 ci-on-push: Make the CI also run for the stable-* branches
7ffc0c122 static-build: enable cross build for qemu
35d6d86ab static-build: enable cross-build for image build
2205fb9d0 static-build: enable cross build for virtiofsd
11631c681 static-build: enable cross build for shim-v2
7923de899 static-build: cross build kernel
e2c31fce2 kata-deploy: enable cross build for kata deploy script
2fc5f0e2e kata-depoly: prepare env for cross build in lib.sh
f5e9985af release: release-note.sh: Fix typos and reference to images
f910c66d6 ci: k8s: Do not fail when gathering info on AKS nodes
632818176 metrics: Add k8s sysbench documentation
b3901c46d runtime-rs: ignore errors during clean up sandbox resources
5a1b5d367 metrics: Add sysbench pod yaml
ad413d164 metrics: Add sysbench dockerfile
151256011 metrics: Add sysbench performance test
62e328ca5 runtime-rs: refine implementation of TaskService
458e1bc71 runtime-rs: make send_message() as an method of ServiceManager
1cc1c81c9 runtime-rs: fix possibe bug in ServiceManager::run()
1a5f90dc3 runtime-rs: simplify implementation of service crate
731e7c763 kata-ctl: add monitor subcommand for runtime-rs The previous kata-monitor in golang could not communicate with runtime-rs to gather metrics due to different sandbox addresses. This PR adds the subcommand monitor in kata-ctl to gather metrics from runtime-rs and monitor itself.
d74639d8c kata-ctl: provide the global TIMEOUT for creating MgmtClient
02cc4fe9d runtime-rs: add support for gather metrics in runtime-rs

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 16:59:41 +02:00
Fabiano Fidêncio
6ad16d4977 release: Adapt kata-deploy for 3.2.0-rc0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 16:59:41 +02:00
Fabiano Fidêncio
4e812009f5
Merge pull request #7519 from fidencio/topic/gha-ci-run-on-stable-branches
ci-on-push: Make the CI also run for the stable-* branches
2023-08-02 16:13:06 +02:00