Commit Graph

11526 Commits

Author SHA1 Message Date
Zhongtao Hu
f0440a9cfe
Merge pull request #7742 from frezcirno/fix-log-forwarder-loop
runtime-rs: check peer close in log_forwarder
2023-08-26 10:44:09 +08:00
Fabiano Fidêncio
16a610d788
Merge pull request #7758 from fidencio/topic/gha-avoid-fail-fast-till-everything-is-ultra-stable
gha: Avoid "fail-fast" in tests that are known to be flaky
2023-08-25 16:49:26 +02:00
Jiang Liu
91db888d83
Merge pull request #7602 from jiangliu/agent-storage
Refine storage device management for kata-agent
2023-08-25 22:20:18 +08:00
Zixuan Tan
dffc16e5b3 runtime-rs: check peer close in log_forwarder
The log_forwarder task does not check whether the peer has closed. This
causes a meaningless loop between “kata vm exit”, when the peer closes,
and “ShutdownContainer RPC received”, which aborts the log forwarder.

This patch fixes the problem.

Fixes: #7741

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2023-08-25 19:00:07 +08:00
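The peer-close check described in this commit can be illustrated with a minimal Python sketch. It is not the runtime-rs code: `forward_logs`, `stream`, and `sink` are hypothetical names, and a zero-length read stands in for "the peer has closed".

```python
import io


def forward_logs(stream, sink, chunk_size=4096):
    """Copy chunks from `stream` to `sink` until the peer closes.

    A read that returns zero bytes means end-of-stream. Without this
    check the loop would keep spinning on empty reads until something
    else aborts it.
    """
    forwarded = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # zero-length read: treat as "peer closed" and stop
            break
        sink.append(chunk)
        forwarded += len(chunk)
    return forwarded


# Hypothetical usage: an in-memory stream plays the role of the VM console.
sink = []
total = forward_logs(io.BytesIO(b"line 1\nline 2\n"), sink, chunk_size=8)
```

The only point of the sketch is the early `break`; the real task is asynchronous and forwards to a logger rather than a list.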
Jiang Liu
aaa5ab1264 agent: simplify storage device by removing StorageDeviceObject
Simplify storage device implementation by removing StorageDeviceObject.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-25 17:23:16 +08:00
Fabiano Fidêncio
fb49d5d7ce gha: Avoid "fail-fast" in tests that are known to be flaky
Otherwise we'll have to re-run all the tests due to a flaky behaviour in
one of the parts.

Fixes: #7757

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-25 10:00:17 +02:00
Dan Mihai
183f51d6f6 tests: use unique test name
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.

Fixes: #7753

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-25 03:41:06 +00:00
Dan Mihai
6a974679f2 tests: delete k8s deployment at the test's end
At the end of k8s-kill-all-process-in-container.bats, delete the
deployment it created.

Fixes: #7752

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-25 03:34:37 +00:00
David Esparza
686eb3878b
Merge pull request #7751 from GabyCT/topic/unusednhwc
metrics: Remove unused variable in tensorflow nhwc script
2023-08-24 18:34:06 -06:00
Fabiano Fidêncio
f1d8e1f513
Merge pull request #7747 from fidencio/topic/kata-deploy-dont-try-to-remove-opt-kata
kata-deploy: Don't try to remove /opt/kata
2023-08-24 18:56:52 +02:00
Gabriela Cervantes
32a778b6da metrics: Remove unused variable in tensorflow nhwc script
This PR removes an unused variable in the tensorflow nhwc script.

Fixes #7750

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-24 15:54:27 +00:00
David Esparza
875a85ee14
Merge pull request #7736 from GabyCT/topic/tensorflowfp32
metrics: Add TensorFlow ResNet50 FP32 benchmark
2023-08-24 08:56:24 -06:00
Fabiano Fidêncio
d8f3ce6497 kata-deploy: Don't try to remove /opt/kata
The directory is a host path mount and cannot be removed from within the
container.  What we actually want to remove is whatever is inside that
directory.

This may raise errors like:
```
rm: cannot remove '/opt/kata/': Device or resource busy
```

Fixes: #7746

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-24 13:57:36 +02:00
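The idea in this commit, removing what is inside a mount point rather than the mount point itself, can be sketched in Python as follows. This is an illustration only: kata-deploy is a shell script, and `remove_contents` plus the temporary directory standing in for `/opt/kata` are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def remove_contents(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Removing a host path mount from inside the container fails with
    "Device or resource busy", so only its entries are removed.
    """
    root = Path(directory)
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()  # regular file or symlink


# Hypothetical usage: a temporary directory stands in for the mount point.
demo = Path(tempfile.mkdtemp())
(demo / "bin").mkdir()
(demo / "bin" / "kata-runtime").write_text("stub")
(demo / "VERSION").write_text("stub")
remove_contents(demo)
```

After the call the directory still exists and is empty, which is the state the commit message asks for.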
Jeremi Piotrowski
71c90b994a
Merge pull request #7745 from jepio/vfio-part-0
gha: vfio: Run on Ubuntu 23.04 runner
2023-08-24 12:15:19 +02:00
Greg Kurz
9991772b26
Merge pull request #7718 from littlejawa/fix_filemode_when_zero
kata-agent: use default filemode for block device when it is set to 0
2023-08-24 11:40:28 +02:00
Jeremi Piotrowski
936e8091a7 gha: vfio: Run on Ubuntu 23.04 runner
The vfio test requires nested-nested virtualization:

L0 Azure host
-> L1 Ubuntu VM
  -> L2 Fedora VM
    -> L3 Kata

This hits a kernel bug on v5.15 but works quite nicely on the v6.2 kernel
included in Ubuntu 23.04. We can switch back to Ubuntu 22.04 when they roll out
v6.2.

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-24 10:10:02 +02:00
Jiang Liu
0e7248264d agent: move storage device related code into dedicated files
Move storage device related code into dedicated files.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:48:51 +08:00
Xuewei Niu
268e846558 runtime-rs: Fix volumes and rootfs cleanup issues
There are several processes for container exit:

- Non-detach mode: `Wait` request is sent by containerd, then
  `wait_process()` will be called eventually.
- Detach mode: `Wait` request is not sent, so `wait_process()` won’t be
  called.
    - Killed by ctr: For example, a container runs `tail -f /dev/null`, and
      is killed by `sudo ctr t kill -a -s SIGTERM <CID>`. Kill request is
      sent, then `kill_process()` will be called. User executes `sudo ctr c
      rm <CID>`, `Delete` request is sent, then `delete_process()` will be
      called.
    - Exited on its own: For example, a container runs `sleep 1s`. The
      container’s state goes to `Stopped` after 1 second. User executes
      the delete command as above.

Where should container cleanup happen?

- `wait_process()`: No, because it won’t be called in detach mode.
- `delete_process()`: No, because it depends on when the user executes the
  delete command.
- `run_io_wait()`: Yes. A container is considered exited once its IO ends,
  and this is always called once a container is launched.

Fixes: #7713

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-24 13:23:47 +08:00
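The reasoning in this commit, that cleanup belongs in the one path that always runs, can be modelled with a small Python sketch. The method names mirror the commit message, but the class, its fields, and the idempotent `_cleanup` helper are hypothetical and do not reflect the runtime-rs implementation.

```python
class Container:
    """Toy model of a container whose cleanup is tied to the end of its IO."""

    def __init__(self, io_chunks):
        self._io = iter(io_chunks)
        self.cleanup_calls = 0
        self.cleaned_up = False

    def _cleanup(self):
        # Idempotent: volumes and rootfs are released at most once.
        if self.cleaned_up:
            return
        self.cleaned_up = True
        self.cleanup_calls += 1

    def run_io_wait(self):
        # Always started when the container launches; returns once IO ends.
        for _ in self._io:
            pass
        self._cleanup()

    def wait_process(self):
        # Only reached in non-detach mode, so no cleanup here.
        return self.cleaned_up

    def delete_process(self):
        # Timing depends on the user, so no cleanup here either.
        return self.cleaned_up


# Hypothetical usage: a detached container that exits on its own.
detached = Container(io_chunks=[b"out"])
detached.run_io_wait()
```

In this model a detached container is cleaned up even though `wait_process()` and `delete_process()` are never called.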
Jiang Liu
8f49ee33b2 agent: refine storage related code a bit
Refine storage related code by:
- remove the STORAGE_HANDLER_LIST
- define type alias
- move code near to its caller

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:10 +08:00
Jiang Liu
60ca12ccb0 agent: switch to new storage subsystem
Switch to new storage subsystem to create a StorageDevice for each
storage object.

Fixes: #7614

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:09 +08:00
Jiang Liu
fcbda0b419 kata-types: introduce StorageDevice and StorageHandlerManager
Introduce StorageDevice and StorageHandlerManager, which will be used
to refine storage device management for kata-agent.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:08:55 +08:00
Jiang Liu
b03b1f6134 agent: simplify the way to manage storage object
Simplify the way to manage storage objects, and introduce
StorageStateCommon structures for coming extensions.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:58:24 +08:00
Jiang Liu
8392c71bf2 sys-util: support more mount flags in parse_mount_options()
Support more mount flags in parse_mount_options().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:39 +08:00
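As a rough illustration of what parsing mount options into flags involves, here is a Python sketch. It is not a port of `parse_mount_options()` from kata-sys-util: the option table below is a small assumed subset, and returning a `(flags, data)` pair is an assumption about the general shape. The numeric values are the Linux `MS_*` constants.

```python
# Linux mount(2) flag values.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8

# Assumed subset of option names: (flag, True to set / False to clear).
FLAG_TABLE = {
    "ro": (MS_RDONLY, True),
    "rw": (MS_RDONLY, False),
    "nosuid": (MS_NOSUID, True),
    "suid": (MS_NOSUID, False),
    "nodev": (MS_NODEV, True),
    "dev": (MS_NODEV, False),
    "noexec": (MS_NOEXEC, True),
    "exec": (MS_NOEXEC, False),
}


def parse_mount_options(options):
    """Split mount options into a flag bitmask and a filesystem data string."""
    flags = 0
    data = []
    for opt in options:
        if opt in FLAG_TABLE:
            flag, enable = FLAG_TABLE[opt]
            flags = flags | flag if enable else flags & ~flag
        else:
            data.append(opt)  # unknown options are passed to the filesystem
    return flags, ",".join(data)


# Hypothetical usage.
flags, data = parse_mount_options(["ro", "nosuid", "size=64m"])
```

Supporting more mount flags then amounts to extending the table; options that are not flags stay in the data string.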
Jiang Liu
c00d8f3d48 agent: use create_mount_destination() from kata-sys-util
Use create_mount_destination() from kata-sys-util crate to reduce
redundant code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:38 +08:00
Jiang Liu
5e867f0538 types: add more mount related constants
Add more mount related constants.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:36 +08:00
Jiang Liu
880e6c9a76 agent: use function from kata-sys-utils to reduce code
Use function get_linux_mount_info() from kata-sys-util crate to share
common code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:34 +08:00
QuanweiZhou
a6921dd837
Merge pull request #7698 from jiangliu/virtual-volume
kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
2023-08-24 11:50:39 +08:00
Fabiano Fidêncio
7705c5962e
Merge pull request #7728 from ManaSugi/fix/typo-test-toml
libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
2023-08-23 23:55:41 +02:00
GabyCT
c1712e1930
Merge pull request #7737 from jepio/fix-local-build
local-build: Remove GID before creating group
2023-08-23 12:26:39 -06:00
Jeremi Piotrowski
3b881fbc0e local-build: Remove GID before creating group
docker install now creates a group with gid 999 which happens to match what we
need to get docker-in-docker to work. Remove the group first as we don't need
it.

Fixes: #7726
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-23 18:58:38 +02:00
David Esparza
ebce5d25a9
Merge pull request #7734 from fidencio/topic/kata-deploy-fix-removal
kata-deploy: Avoid failing on content removal
2023-08-23 10:29:57 -06:00
Gabriela Cervantes
959ca49447 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
This PR adds the TensorFlow ResNet50 FP32 Dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-23 16:24:58 +00:00
Gabriela Cervantes
4b7d72c4a8 metrics: Add TensorFlow ResNet50 FP32 benchmark
This PR adds the TensorFlow ResNet50 FP32 benchmark for kata metrics.

Fixes #7735

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-23 16:21:09 +00:00
Fabiano Fidêncio
e7e4cc2182
Merge pull request #7716 from bergwolf/github/image-initrd-assets
runtime: fix image and initrd assets handling
2023-08-23 18:02:15 +02:00
Fabiano Fidêncio
5cba38c175 kata-deploy: Avoid failing on content removal
We can simply use `rm -f` all over the place and avoid the container
returning any error.

Fixes: #7733

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-23 16:49:26 +02:00
Peng Tao
18d42da21e runtime/fc: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Make sure annotations are preferred over config options in image and initrd
path handling.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
9fda7059a5 runtime/clh: fix image/initrd annotation handling
We should make sure annotations are preferred over
config options in image and initrd path handling.

Fixes: #7705
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
1a0092d631 runtime/qemu: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Add a helper function ImageOrInitrdAssetPath to make sure annotations
are preferred over config options in image and initrd path handling.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:27 +00:00
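The precedence rule behind these three commits (annotations win over config file options) can be sketched in Python. The real helper, `ImageOrInitrdAssetPath`, lives in the Go runtime and its signature is not shown in this log; the function below, its dictionary keys, and the choice to check `image` before `initrd` within one source are all assumptions.

```python
def image_or_initrd_asset_path(annotations, config):
    """Return (kind, path), preferring annotations over config options.

    Both arguments are plain dicts with optional "image" and "initrd"
    keys (hypothetical names, not the real annotation keys).
    """
    for source in (annotations, config):  # annotations are consulted first
        for kind in ("image", "initrd"):
            path = source.get(kind)
            if path:
                return kind, path
    raise ValueError("neither image nor initrd is configured")


# Hypothetical usage: the case described in the commit message.
annotations = {"image": "/annotated/kata.img"}
config = {"initrd": "/configured/kata.initrd"}
chosen = image_or_initrd_asset_path(annotations, config)
```

With this ordering the initrd from the config file no longer overrides the image annotation, and the config is only used when no annotation is set.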
Manabu Sugimoto
22d8f335d6 libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
Change `pdisable_guest_seccomp` to `disable_guest_seccomp`

Fixes: #7727

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-23 12:08:18 +09:00
GabyCT
b8990c0490
Merge pull request #7722 from GabyCT/topic/adddiskreadme
metrics: Add disk link to README
2023-08-22 12:29:54 -06:00
GabyCT
514d3d42b8
Merge pull request #7712 from GabyCT/topic/fixfiopath
metrics: Fix FIO path
2023-08-22 12:28:28 -06:00
Gabriela Cervantes
8afd158cef metrics: Add disk link to README
This PR adds a disk link to the README documentation for kata metrics.

Fixes #7721

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-22 16:20:31 +00:00
Julien Ropé
40914b25d4 kata-agent: use default filemode for block device when it is set to 0
When the FileMode field for the device is unset (0), use a default value instead
to allow the use of the device from the container.
This behaviour is seen from cri-o typically.

Note: this is what runc is doing, which is why regular containers don't have an
issue. This change makes sure kata behaves the same as runc.

Fixes: #7717

Signed-off-by: Julien Ropé <jrope@redhat.com>
2023-08-22 16:08:14 +02:00
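The fallback described in this commit can be shown with a short Python sketch. The function name is hypothetical, and the default of `0o666` is an assumption chosen for illustration; the commit message only says that a default value is used when the field is 0.

```python
# Assumed default permission bits for a device node (illustrative value).
DEFAULT_DEVICE_MODE = 0o666


def effective_file_mode(file_mode):
    """Return the mode to use for a device node.

    An unset (None) or zero FileMode falls back to the default so the
    device stays usable from inside the container.
    """
    if not file_mode:
        return DEFAULT_DEVICE_MODE
    return file_mode


# Hypothetical usage: cri-o typically leaves the field at 0.
mode_from_crio = effective_file_mode(0)
mode_explicit = effective_file_mode(0o640)
```

An explicitly set mode is passed through unchanged; only the unset case is altered.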
Fabiano Fidêncio
8032797418
Merge pull request #7708 from microsoft/danmihai1/kata-deploy-log
gha: capture additional kata-deploy output
2023-08-21 23:43:51 +02:00
David Esparza
d2c130ea69
Merge pull request #7710 from GabyCT/topic/fixpytorch1
metrics: Use function from metrics common in pytorch script
2023-08-21 15:31:24 -06:00
Gabriela Cervantes
eee2ee6eeb metrics: Fix FIO path
This PR fixes the FIO path for the FIO files.

Fixes #7711

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-21 21:06:04 +00:00
David Esparza
9347051592
Merge pull request #7666 from dborquez/metrics_improve_fio_test
metrics: Enable kata runtime in K8s for FIO test.
2023-08-21 13:51:57 -06:00
Gabriela Cervantes
39bc3488f5 metrics: Use function from metrics common in pytorch script
This PR uses a function from metrics common in the pytorch script.

Fixes #7709

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-21 16:12:35 +00:00
Dan Mihai
400eb88743 gha: capture additional kata-deploy output
10 lines can be insufficient for diagnostics.

Fixes: #7707

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-21 15:58:57 +00:00
GabyCT
700759232f
Merge pull request #7690 from GabyCT/topic/fixpytorch
metrics: Fix README for pytorch
2023-08-21 09:50:14 -06:00